I thought I would write a post on the relationship between bit depth and signal-to-noise ratio. The two are closely linked, and the relationship can get quite complicated depending on how deep into the analysis you want to go.
The simplest rule of thumb, one that most of us know, is that the signal-to-noise ratio (SNR) in a simple linear PCM representation is 6dB per bit. So a 16-bit system will have a signal-to-noise ratio of 96dB. This is a pretty good approximation to the real answer, which is that the SNR is 1.76dB plus 6.02dB per bit. So in reality a 16-bit system has a theoretical maximum SNR of 98.08dB. Why is that?
It is the nature of digital audio that when we digitize a signal we must unavoidably incur a quantization error. This is the difference between the actual instantaneous value of the signal and the nearest quantization level, which is what we actually record. Sometimes we will round up to the nearest quantization level, and sometimes down. Either way, the magnitude of the quantization error will always lie between zero and one-half of the Least Significant Bit (LSB). It is the analysis of this quantization error that gives rise to the equation in the previous paragraph. The takeaway here is that any attempt to digitally encode a real-world signal must give rise to a background noise floor of quantization noise, and we can predict what its level ought to be in any given system.
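For anyone who wants to see where those numbers come from, here is the standard back-of-envelope derivation, under the usual assumption that the quantization error is uniformly distributed across one LSB (call the step size q):

```latex
% RMS of an error uniformly distributed on [-q/2, +q/2]:
e_{\mathrm{rms}} = \sqrt{\frac{1}{q}\int_{-q/2}^{+q/2} e^{2}\,de}
                 = \frac{q}{\sqrt{12}}

% A full-scale sine in an N-bit system has peak 2^{N-1}q and RMS 2^{N-1}q/\sqrt{2}:
\mathrm{SNR} = 20\log_{10}\frac{2^{N-1}q/\sqrt{2}}{q/\sqrt{12}}
             = 20\log_{10}\!\left(2^{N}\sqrt{3/2}\right)
             = 6.02\,N + 1.76\ \mathrm{dB}
```

Plug in N = 16 and out pops the 98.08dB figure above.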
These days anybody can download a high-quality free audio analysis program such as Audacity, which is available for both Mac and Windows. We can use it to perform a frequency analysis of any music track we like. Suppose you obtain a 16-bit test track containing a single tone at some arbitrary frequency (1kHz is often chosen) at full scale (0dB) and plot its frequency analysis. What do you expect to see?
It is easy to imagine that the answer would be a continuous background at -98dB, with a single spike at 1kHz going up to 0dB. But that is not what we see. Instead, the background will be a lot lower than -98dB, and there will be a few additional low-level peaks that we can’t explain. For the purposes of this post I am not interested in the low-level peaks. These are all related to the non-random nature of quantization noise; they can be eliminated by adding dither noise, and we can conveniently ignore them for now. Aside from that, what is happening? Why is the noise floor quite a long way below -98dB?
The answer is that it is the sum total of all the quantization noise that amounts to -98dB. But that noise is distributed pretty much equally among all the frequencies between zero and one-half of the sample rate. When we plot the frequency analysis in Audacity we see how that noise is distributed across the frequency space. Each fragment of noise at each frequency is well below -98dB, but taken together they all add up, more or less, to -98dB. The next question is a little trickier. If the background noise on a frequency analysis is actually lower than -98dB, just how low should it be? And can we do anything useful with it?
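If you want to see this for yourself without firing up Audacity, here is a minimal numpy sketch (my own illustration, not anything out of a real analyzer): it quantizes a full-scale tone to 16 bits, measures the total error power, and compares it with what the individual FFT bins show.

```python
import numpy as np

fs, n = 44100, 1 << 14                  # 16,384 samples, ~0.37s of audio
t = np.arange(n) / fs
x = np.sin(2 * np.pi * 997.0 * t)       # full-scale tone near 1kHz

q = 2.0 ** -15                          # one LSB on a +/-1.0 scale, 16 bits
err = np.round(x / q) * q - x           # the quantization error alone

# Total error power, relative to a full-scale sine (power 0.5):
total_db = 10 * np.log10(np.mean(err ** 2) / 0.5)
print(f"total quantization noise: {total_db:.1f} dB")   # ~ -98dB

# The same error seen through an FFT: one-sided per-bin power
pspec = 2 * np.abs(np.fft.rfft(err)) ** 2 / n ** 2
print(f"typical bin: {10 * np.log10(np.median(pspec) / 0.5):.1f} dB")  # far below -98
print(f"sum of bins: {10 * np.log10(np.sum(pspec) / 0.5):.1f} dB")     # back to ~ -98
```

The exact numbers wobble a little with the tone frequency, but the pattern is the point: every individual bin sits way below -98dB, yet together they account for all of it.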
The answer to this is not as cut and dried as you might like it to be. Remember that the noise is divided out, more or less evenly, among all of the different frequencies. Each point on that frequency analysis curve is NOT simply a measure of the noise at that particular frequency. What it actually shows is the sum total of the noise at all frequencies in the immediate vicinity of that frequency. When we do the frequency analysis, we can stipulate how many frequencies we want to divide the audio band into. The more frequencies we choose, the fewer of them will lie within the immediate vicinity of each point, and the lower the noise floor will appear. So the value of the measured noise floor depends on the resolution with which we choose to plot it. Surely that can’t make sense?
But it does make perfect sense, and here is why. The frequency plot is the outcome of a Fourier analysis, which takes a chunk of raw audio data and analyzes its frequency content. The number of frequencies it spits out is equal to one-half of the number of audio samples it analyzes. So if you want to increase the number of frequencies you have to increase the number of audio samples, which means analyzing a chunk of music of longer duration. For example, if I perform a Fourier analysis with 16,384 samples I can reduce the apparent noise floor by more than 20dB. But 16,384 samples is more than one third of a second of music at a 44.1kHz sample rate, and this is important.
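Putting rough numbers on that: if the -98dB total really is spread evenly across n/2 bins, then every doubling of the analysis length drops the apparent per-bin floor by about 3dB. A quick sanity check (a simplification: windowing and the analyzer’s own scaling will shift the absolute numbers):

```python
import math

# Apparent per-bin noise floor vs. FFT length, assuming a -98dB total
# spread evenly across n/2 frequency bins
for n in (1024, 4096, 16384):
    floor = -98.1 - 10 * math.log10(n / 2)
    print(f"{n:6d} samples ({n / 44100:.2f}s): per-bin floor ~ {floor:.0f}dB")
```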
Let’s go back to the notion of the noise floor actually being well below the quantization noise limit, which is in principle the lowest-level signal that can be digitally encoded. If the floor is, for example, 20dB below that limit, it implies that I should be able to encode a signal at a level approaching 20dB below the quantization limit. And this is quite correct - I am indeed able to do that. But there are swings and roundabouts to be negotiated. If I need one third of a second of music to get the noise floor down below the level of the sub-quantization signal that I want to encode, then it also follows that the signal must persist for a full third of a second in order to appear above that noise. So, to the extent that I can actually make this happen, it is a pure party trick and has no practical value. The constituent parts of real music signals that exist below the -98dB quantization limit of 16-bit audio are not pure tones of extended duration.
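Here is the party trick sketched in numpy (with one caveat the paragraph glosses over: you need the dither mentioned earlier, because without it a tone smaller than half an LSB simply rounds away to silence):

```python
import numpy as np

fs, n = 44100, 1 << 14
q = 2.0 ** -15                                  # one 16-bit LSB
k = 372                                         # a bin-centred frequency, ~1001Hz
x = 10 ** (-115 / 20) * np.sin(2 * np.pi * k * np.arange(n) / n)
# ^ a pure tone 17dB below the -98dB quantization limit

rng = np.random.default_rng(0)
dither = (rng.random(n) - rng.random(n)) * q    # TPDF dither, +/-1 LSB peak
xq = np.round((x + dither) / q) * q             # quantize to 16 bits

# One-sided amplitude spectrum, in dB relative to a full-scale sine
spec = np.abs(np.fft.rfft(xq)) / (n / 2)
db = 20 * np.log10(np.maximum(spec, 1e-12))
print(f"tone bin: {db[k]:.0f} dB, median bin: {np.median(db):.0f} dB")
# the tone reads ~ -115dB, standing well clear of a per-bin floor down
# in the -130s, but only because all 16,384 samples went into the FFT
```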
But the story doesn’t end there. Recall that I said the quantization noise is more or less divided equally among all the frequencies below one-half of the sample rate. Well, it needn’t be. It is possible to skew the distribution so that more of it goes into some frequencies than others. This is only useful if there are “unused” frequencies available where there is no signal content, which we can filter out later. We can then reduce the amount of quantization noise at the frequencies of interest, at the expense of increasing it at the “unused” frequencies. With a sample rate of 44.1kHz, though, there are no such “unused” frequencies. The 44.1kHz sample rate was devised precisely because all of the audio frequency band (as we understood it at the time) is contained neatly within its encodable bandwidth. To address this we would need to increase the sample rate quite considerably, so that a whole new range of “unused” high frequencies becomes accessible. We can then pull some of the quantization noise out of the audio frequencies and put it into those high frequencies instead. This process is called “Noise Shaping”, and what is really interesting about it is that the noise shaping process itself can also be used to back-fill usable “signal” into the newly created gap between the quantization noise limit and the locally reduced noise floor.
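A minimal sketch of the mechanism, a first-order error-feedback quantizer (real-world noise shapers use higher-order filters, often tuned to psychoacoustic curves, but the principle is the same):

```python
import numpy as np

def noise_shaped_quantize(x, q):
    """Quantize signal x with step q, feeding each sample's quantization
    error back into the next sample. The error spectrum picks up a
    (1 - z^-1) high-pass response: less noise at low frequencies, more
    noise up near one-half of the sample rate."""
    y = np.empty_like(x)
    e = 0.0
    for i, s in enumerate(x):
        v = s - e                     # pre-correct with the previous error
        y[i] = np.round(v / q) * q    # coarse quantization
        e = y[i] - v                  # error to carry forward
    return y
```

Run a spectrum over y - x and compare it with the flat error of plain rounding: the total noise power actually comes out slightly higher, but it has been shoved up toward one-half of the sample rate, where (at a high enough sample rate) nothing we care about lives.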
Taken to its limits, this process becomes very interesting. By reducing the bit depth all the way down to 1 bit, the quantization noise itself gets to be massive, at -7.78dB. But by increasing the sample rate all the way up to 2.8MHz, we create enough “unused” frequency space that we can “shape” an additional 110dB (or more) of noise out of the audio bandwidth and stick it where the sun don’t shine. Sound familiar?…