Some of you are apparently quite skeptical about the claims I made regarding dither, even the most basic ones. This skepticism, it seems to me, is born mostly of some false assumptions, so I think this post might be useful in clearing those up.
I have posted below (sorry, but I cannot find a way to intersperse them throughout the text) some graphs showing a selection of Fourier Transforms I have prepared. To produce them, I created some WAV files containing a 0dB pure tone at a frequency of 344Hz, and added various levels of dither. The files are all 16-bit, 44.1kHz, which is standard CD resolution. The choice of frequency is very important, and I will come back to that later. For now, the important point is that it was chosen to enable me to illustrate my point with the most clarity, and not to mislead.
Skip this paragraph if technical disclosure details bore you. At BitPerfect we have developed our own audio analyzer, which has unusually high resolution. Unfortunately, this analyzer is not available to you, so you cannot use it to confirm my results. Also, its graphical output is somewhat utilitarian to say the least, and does not lend itself to illustrating nice on-line posts. The FFTs I am showing were instead created using Audacity, which is a free program you can download yourself if you are interested. For those who want to do just that, the FFTs were done with 16,384 samples using a Blackman-Harris window function (number of terms not specified), and exported to text files for graphing in MS Excel.
The first graph is labelled "Undithered". This, as its name suggests, is a pure tone with no dithering applied. I want to draw your attention to two features.
First, the background noise level is down at something like -180dB. Many of you will be wondering how a 16-bit audio file can have a background noise level so low that it would take 30 bits to encode it. Good question. The answer is that the file itself contains no information whatsoever at those frequencies. The correct value for the background noise level would actually be minus infinity. What you are seeing instead is a consequence of the fact that the mathematics of the FFT are done by a computer whose calculations are performed using numbers with finite precision. Each of the thousands of calculations that go into producing every result carries forward a tiny rounding error, and the accumulated error is the result you see. The background noise represents the mathematical limit of Audacity's FFT algorithm. By comparison, BitPerfect's own analyzer has a background noise level which is 100dB lower!
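For anyone who wants to check the bit-depth arithmetic behind the "30 bits" remark, here is a minimal Python sketch. It uses the usual rule of thumb that each bit is worth 20·log10(2) ≈ 6.02dB; the function names are my own, purely for illustration.

```python
import math

def db_per_bit():
    """Each extra bit doubles the number of levels: 20*log10(2) dB."""
    return 20 * math.log10(2)

def bits_for_level(level_db):
    """Bits needed for one quantization step to sit at level_db
    relative to full scale (rule-of-thumb estimate only)."""
    return abs(level_db) / db_per_bit()

print(round(db_per_bit(), 2))           # 6.02
print(round(16 * db_per_bit(), 1))      # 96.3
print(round(bits_for_level(-180), 1))   # 29.9
```

So 16 bits correspond to roughly 96dB, and a -180dB floor corresponds to roughly 30 bits.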
Second, there are a whole bunch of spikes starting at about 1kHz, and stretching out up to 22.05kHz. This, you might think, is not unusual. These are the harmonics of the 344Hz base tone (the 2nd harmonic, and one or two others, are missing), and represent the harmonic distortion spectrum of the QE (Quantization Error). It is interesting to note that the QE here comprises 100% harmonic components and no measurable un-correlated noise. The QE can therefore be seen to be entirely a distortion-based problem. The highest peaks are at -100dB, and most of the rest are between -100dB and -120dB. The point has previously been made that Harmonic Distortion is more objectionable to the human ear than Noise, so I won't belabour that one. However, this graph also lays bare the falsehood that 16-bit data encodes nothing below the -96dB level of 16-bit resolution. For sure it cannot encode a SIGNAL at those low levels, but it certainly can encode the unwanted consequences of quantizing at the 16-bit level.
Time to expand on the carefully chosen frequency, 344Hz. The period of my test tone is an integer number of samples at 44.1kHz: 128 samples, to be exact. (Strictly speaking, a 128-sample period corresponds to 44,100/128 ≈ 344.53Hz, a shade above the nominal 344Hz.) This was chosen precisely because at such frequencies the QE does indeed comprise nothing but harmonic distortion. As we move away from these frequencies it becomes more of a mix of distortion-like components plus true noise. I could likewise choose specific frequencies which have the property that the QE comprises 100% noise and no distortion-like components. But I have chosen a frequency that best illustrates my point, and you need to be aware that not all frequencies behave the same way.
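The reason an integer-sample period produces pure harmonic distortion can be checked directly: if the tone repeats exactly every 128 samples, so does the rounding error, and an error that repeats with the tone can only contain harmonics of the tone. The sketch below is an illustration under my own assumptions, not the procedure used for the graphs. It assumes a peak value of 32767 for the 0dB tone, and it uses 44,100/128 = 344.53125Hz as the frequency whose period is exactly 128 samples. All names are hypothetical.

```python
import math

FS = 44100              # sample rate in Hz
PERIOD = 128            # samples per cycle
FREQ = FS / PERIOD      # 344.53125 Hz, the exact integer-period frequency
FULL_SCALE = 32767      # assumed peak value of a 0dB 16-bit tone

def quantization_error(n, freq):
    """Rounding error (in LSBs) of sample n of an undithered tone."""
    ideal = FULL_SCALE * math.sin(2 * math.pi * (freq * n / FS))
    return round(ideal) - ideal

def repeats_every(period, freq, count=1024):
    """True if the error at sample n matches the error at n + period."""
    return all(
        abs(quantization_error(n, freq) - quantization_error(n + period, freq)) < 1e-6
        for n in range(count)
    )

print(repeats_every(PERIOD, FREQ))     # True
print(repeats_every(PERIOD, 1000.0))   # False
```

The 1kHz tone is just an arbitrary counter-example whose period (44.1 samples) does not fit into 128 samples.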
In a moment, I will ask you to take a look at the second graph, entitled "LSB dither". This is the exact same data, but this time with a dither signal added. This dither is TPDF dither, with a peak-to-peak amplitude of one LSB (Least Significant Bit). The LSB is the separation between individual quantization steps. In other words, the magnitude of the dither is ±0.5 of one LSB, which means it is pretty much of the same magnitude as the QE itself. Before you look at the graph, I would challenge you to ask yourself what you expect to see. What do you think it will do to the background noise level? And what do you think it will do to the QE distortion peaks? OK, time to take a look.
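For readers unfamiliar with TPDF (triangular probability density function) dither, one common way to generate it is to add two independent uniform random values together. The sketch below assumes that construction and works in units of one LSB. The function names are hypothetical, and this is not necessarily how the test files were made.

```python
import random

def tpdf_dither(peak_to_peak_lsb, rng):
    """One TPDF dither value in LSBs: the sum of two independent uniform
    values, each spanning half of the requested peak-to-peak range."""
    quarter = peak_to_peak_lsb / 4.0
    return rng.uniform(-quarter, quarter) + rng.uniform(-quarter, quarter)

def dithered_quantize(sample_lsb, peak_to_peak_lsb, rng):
    """Add dither to an unquantized sample (in LSBs), then round it."""
    return round(sample_lsb + tpdf_dither(peak_to_peak_lsb, rng))

rng = random.Random(1)  # fixed seed so the run is repeatable
values = [tpdf_dither(1.0, rng) for _ in range(100_000)]
print(min(values) >= -0.5 and max(values) <= 0.5)   # True
```

With a peak-to-peak setting of one LSB, every dither value falls within ±0.5 LSB, as described above.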
The first obvious thing is that the background noise has increased to about -125dB. This is real now, and is no longer an artefact of the FFT algorithm. The thoughtful ones among you will immediately ask why the noise floor is not nearer to -96dB. After all, this is roughly the magnitude of the dither signal we added in. The answer is quite simple. The Total Noise Power that we added may well be closer to -96dB, but that noise is spread out over all of the possible frequencies (0 - 22.05kHz). The portion of the total dither noise found in each of the frequency bins is correspondingly lower. Hence the -125dB noise floor.
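The "spread over all the bins" effect can be put into rough numbers. The sketch below assumes white noise shared equally among the N/2 bins of an N-point FFT. It deliberately ignores the window function and however the analyzer scales its display, so it illustrates the mechanism rather than reproducing the plotted -125dB figure. The function names are my own.

```python
import math

def bin_spread_db(fft_size):
    """dB by which white noise drops when shared among fft_size/2 bins."""
    return 10 * math.log10(fft_size // 2)

def per_bin_floor_db(total_noise_db, fft_size):
    """Idealised per-bin noise level for a given total noise power."""
    return total_noise_db - bin_spread_db(fft_size)

print(round(bin_spread_db(16384), 1))             # 39.1
print(round(per_bin_floor_db(-96.0, 16384), 1))   # -135.1
```

The same arithmetic shows that doubling the FFT size lowers the displayed floor by about 3dB without changing the total noise at all.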
The next thing that you will notice is that the QE distortion peaks are still visible. This shows that this amount of dither is not enough to entirely eliminate the distortion components. However, if you look very closely, you will see that the magnitude of the distortion peaks has gone down by a decent 10-20dB. So our dithering has at least reduced the existing distortion, and not just masked it. That's a pretty interesting result. What is happening here is that the dither is so small that for many of the samples it fails to actually change the QE. Therefore the total QE is a mixture of enough unchanged values to still encode a (reduced) amount of harmonic distortion, and some dithered values which encode noise. The magnitude of the QE distortion peaks falls, and the noise floor rises.
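One way to see "reduced but not eliminated" in miniature is to quantize a constant input many times and look at the average error. The Monte Carlo sketch below assumes an input sitting 0.25 LSB above a quantization level and the two-uniform TPDF construction. The names are hypothetical and the numbers are statistical estimates.

```python
import random

def tpdf_dither(peak_to_peak_lsb, rng):
    """TPDF dither in LSBs, built from two independent uniform values."""
    quarter = peak_to_peak_lsb / 4.0
    return rng.uniform(-quarter, quarter) + rng.uniform(-quarter, quarter)

def mean_error(input_lsb, peak_to_peak_lsb, trials=200_000, seed=0):
    """Average of (quantized value - input) over many dithered trials."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += round(input_lsb + tpdf_dither(peak_to_peak_lsb, rng)) - input_lsb
    return total / trials

# 0.0 means no dither at all; 1.0 is the one-LSB peak-to-peak setting
for pp in (0.0, 1.0):
    print(pp, mean_error(0.25, pp))
```

Without dither the average error is the full -0.25 LSB. With one LSB of peak-to-peak dither it should come out near -0.125 LSB: smaller, but still tied to where the input sits between levels, which is what keeps some distortion alive.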
Now we'll increase the amount of dither to ±1 LSB, or two LSBs peak-to-peak. Will that manage to completely eliminate the QE distortion peaks? What do you think? Please look at the graph labelled "2LSB dither". Compared to the previous graph, the background noise level has gone up by about 3dB. But this time all of the QE distortion peaks have been completely eliminated. The highest peaks have been suppressed by over 20dB compared to the undithered result.
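If you would like to try something similar without an FFT, the sketch below swaps in a different technique: it averages the quantization error over many cycles of the tone. Whatever survives the averaging repeats with the tone and is therefore distortion, while uncorrelated noise shrinks as more cycles are averaged. It assumes a 128-sample period, a peak value of 32767, no clipping, and the two-uniform TPDF construction. All names are hypothetical.

```python
import math
import random

PERIOD = 128         # samples per cycle of the test tone
FULL_SCALE = 32767   # assumed peak value of a 0dB 16-bit tone

def tpdf_dither(peak_to_peak_lsb, rng):
    """TPDF dither in LSBs, built from two independent uniform values."""
    quarter = peak_to_peak_lsb / 4.0
    return rng.uniform(-quarter, quarter) + rng.uniform(-quarter, quarter)

def correlated_error_power(peak_to_peak_lsb, cycles=128, seed=0):
    """Mean-square (LSB^2) of the error that repeats with the tone,
    estimated by averaging the error over `cycles` cycles."""
    rng = random.Random(seed)
    folded = [0.0] * PERIOD
    for n in range(PERIOD * cycles):
        phase = n % PERIOD  # keeps the tone exactly periodic
        ideal = FULL_SCALE * math.sin(2 * math.pi * phase / PERIOD)
        folded[phase] += round(ideal + tpdf_dither(peak_to_peak_lsb, rng)) - ideal
    return sum((e / cycles) ** 2 for e in folded) / PERIOD

def to_db(power, reference):
    """Power ratio expressed in dB."""
    return 10 * math.log10(power / reference)

undithered = correlated_error_power(0.0)
for pp in (1.0, 2.0):
    for cycles in (128, 1024):
        level = to_db(correlated_error_power(pp, cycles), undithered)
        print(f"{pp} LSB p-p, {cycles} cycles: {level:.1f} dB re undithered")
```

The thing to look for is that the one-LSB result barely moves when more cycles are averaged (there is a real residual), whereas the two-LSB result keeps falling (only noise is left).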
What happens if we increase the dither even further? I have added further graphs to illustrate that. Again, before you look, ask yourself what you expect to see. These graphs are labelled "3LSB dither" and "4LSB dither". If you predicted that the background noise level would go up, and the QE distortion peaks would remain totally suppressed, well done. You have learned well, Grasshopper.
Finally, just to make life easier I have added two final summary graphs. "Combo" superimposes all the curves onto one graph to make comparisons easier, and "Zoom Combo" zooms in on the area between 15kHz and 20kHz and shows perfectly how the 2LSB (±1LSB) dither has completely suppressed the QE distortion peaks at the cost of very little additional noise.
That was a relatively simplistic analysis, but I think it gets the point across quite well. Quantization Error can add distortion, and dithering can really show it the door. It is all very clever stuff. Just don't read anything more into it than I have tried to show you. There are many more complexities lurking in the mathematical murk to trip up anyone who wants to use this type of data to make glib generalizations regarding the bigger picture.