When you want to pull the wool over someone's eyes, the easiest way to do it is to employ indisputable facts, and present them in such a way as to enable the listener to draw an inappropriate conclusion, usually with the aid of some logical legerdemain. Sometimes they won't bite. So you move on to myths. You invoke something that has been repeated so often that, through plain and simple brainwashing, it has come to be regarded as fact even though it has no factual basis to support it. Sometimes even the myth fails. So you have little choice but to move on to misconceptions. Misconceptions are a tricky animal, because they usually have their origins in indisputable facts. What you do is take a fact, tack something onto it that is not itself supported by facts, and hope to get away with passing it off as a fact by association. And if all else fails, you just have to lie.
Most of the time deceptions are manifestations of plain and simple dishonesty. But sometimes misconceptions themselves worm their way into our knowledge base in such a way that we forget where the lines lie between the underlying facts and the remora-like add-ons that attached themselves like suckers. To create a misconception you need three simple ingredients. First, you need a subject matter which is inherently complicated, but which can be easily described in simple terms so that people can feel comfortable with it to a certain level. Next, you need something which is actually incorrect. Third, you need a logical link - an argument or demonstration by which the flawed conclusion becomes conflated with the factual aspects of the subject matter.
There are many misconceptions that plague the complex world of digital audio. I am going to try and clarify one of them for you, because you will hear it repeated ad nauseam. It is the one where you will be told that by careful application of dither, you can extract signals whose amplitude lies below the LSB (least significant bit), and which, apparently, cannot otherwise be encoded.
I want to introduce you first to the concept of image enhancement. Many of you will have come across a scene in a movie or TV show where a blurred image is magically sharpened to incredible resolution with little more than a handful of keystrokes on a computer. It is utter balderdash. A misconception. But behind it lie some real facts. I have seen real demonstrations of blurry indistinct video of a military nature, where a computer is asked to uncover the presence of, say, a tank. When the magic button is pressed, a tank does indeed appear out of the murk. These demonstrations are very compelling and very convincing - and, yes, are factual. The key element of what is happening is that in order to see a tank, you have to be looking for a tank. If the murk hides a car, a hot-dog vendor, or even Osama Bin Laden holding a bazooka, you will never see them. You need to be looking specifically for them. The technology works because the complex object hiding in the murk of the image disturbs the murk ever so slightly, and by correlating the disturbance with what we know of the appearance of a tank, we can infer, to a greater or lesser degree, the presence of the tank. It is important to note that, so long as we are looking for the tank, we will never be able to perceive Osama and his bazooka. We have to be looking specifically for him. And his bazooka.
Applying this to audio dither, the same arguments hold true. Yes, it is possible to take 16-bit audio data and apply the right sort of dither, and then observe a recorded pure tone at -120dB, which is some 20dB below the Signal-to-(Quantization)-Noise ratio of the 16-bit format. At roughly 6dB per bit, that's between 3 and 4 bits below the level of the 16th bit. This is because we were looking for that specific pure tone using a Fourier Transform. Music is NOT being encoded at -120dB. We are simply inferring the presence of the -120dB tone through its residual interaction with the (dithered) noise. In fact the residual evidence of the tone was only inserted in the first place during the dithering process.
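For anyone who wants to try the party trick, here is a minimal sketch of it, and no more than a sketch. It assumes a full scale of 32768 LSBs, plain TPDF dither of +/-1 LSB peak, and a single-bin correlation against the known tone frequency standing in for the Fourier Transform. The function name tone_level_dbfs and all of its parameters are my own inventions for illustration.

```python
import math
import random

def tone_level_dbfs(n=1 << 17, cycles=3000, tone_db=-120.0, dither=True, seed=1):
    """Quantize a faint sine to the 16-bit grid, then estimate its level by
    correlating against the frequency we already know it has."""
    rng = random.Random(seed)
    full_scale = 32768.0                          # assumed: LSBs per unit amplitude
    amp_lsb = full_scale * 10 ** (tone_db / 20)   # tone peak, a small fraction of 1 LSB
    w = 2 * math.pi * cycles / n                  # whole number of cycles in n samples
    re = im = 0.0
    for i in range(n):
        x = amp_lsb * math.sin(w * i)
        if dither:
            x += rng.random() - rng.random()      # TPDF dither, +/-1 LSB peak
        q = math.floor(x + 0.5)                   # round to the nearest LSB
        re += q * math.cos(w * i)                 # single-bin DFT: we look only
        im += q * math.sin(w * i)                 # at the frequency of "the tank"
    amp = 2 * math.hypot(re, im) / n              # estimated tone peak in LSBs
    return 20 * math.log10(amp / full_scale) if amp > 0 else float("-inf")

if __name__ == "__main__":
    print("with dither:   ", tone_level_dbfs(dither=True))
    print("without dither:", tone_level_dbfs(dither=False))
```

With dither the estimate should land close to -120dB. Without it every sample rounds to zero and there is nothing to find. Note that the correlation is told the tone's frequency in advance, which is exactly the "looking for a tank" condition.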
Here is the true test of whether one can encode music below the 16th bit purely through the magic of dither. Take an undithered 16-bit recording. Apply 20dB of digital attenuation using whatever dithering technique you like. Mathematics says that the bottom 3 to 4 bits of the music data (20dB is a factor of 10, or about 3.3 bits) will be pushed below the level of the 16th bit, where they are simply lost forever. However, according to the misconception, those least significant bits of the 16-bit data stream have somehow been safely preserved by the dither, all ready to be reconstructed. Now take the attenuated data stream and apply 20dB of gain. Have you managed to reconstruct the original music data? No, you haven't. Not even close.
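The test is easy enough to sketch in a few lines. What follows is only an illustration under my own assumptions: a 440Hz sine stands in for the music, the dither is plain TPDF at +/-1 LSB peak, and the names round_trip, original and restored are hypothetical. Substitute whatever dithering technique you prefer.

```python
import math
import random

def round_trip(samples, rng, atten=10):
    """Attenuate 16-bit samples by a factor of `atten` (10 = 20dB) with TPDF
    dither, requantize to whole LSBs, then apply the same factor as gain."""
    out = []
    for s in samples:
        d = rng.random() - rng.random()        # TPDF dither, +/-1 LSB peak
        q = math.floor(s / atten + d + 0.5)    # attenuated value on the 16-bit grid
        out.append(q * atten)                  # 20dB of make-up gain
    return out

rng = random.Random(0)
# Stand-in for an undithered 16-bit recording: one second of a 440Hz sine.
original = [round(20000 * math.sin(2 * math.pi * 440 * i / 44100))
            for i in range(44100)]
restored = round_trip(original, rng)

matches = sum(1 for a, b in zip(original, restored) if a == b)
err_rms = math.sqrt(sum((a - b) ** 2 for a, b in zip(original, restored))
                    / len(original))
print("samples restored exactly:", matches, "of", len(original))
print("RMS error in LSBs:", err_rms)
```

Every restored sample is a multiple of 10, so the low-order detail of the original cannot have survived. What comes back is the music plus an error of several LSBs RMS, not the original data.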
This is not to say that dither is not useful or valuable. The fact is that it CAN push the measured noise floor below the theoretical quantization noise limit over a certain range of the audio bandwidth. This is useful and valuable. But to demonstrate the extraction of test tones from below that limit is only a mathematical party trick, and nothing more should be inferred from it.
I close with a rhyme, wherein the landlord of a country Inn uses his own form of dither to fit ten men into nine bedrooms:
Ten weary travellers, cold and wet
To a country Inn did come.
The night was cold, their clothes were damp,
Their hands and feet were numb.
"Come in, come in", the landlord cried,
"A room for all ye men,
"But I have only nine spare beds,
"And ye are numbered ten."
"Then one of us shall take the floor,
"For none of us are gay!"
"Nay, nay, my friends", the landlord cried,
"There is an easy way."
Two men he placed in room marked A,
The third he lodged in B,
The fourth and fifth in C and D,
The sixth in bedroom E.
Seven, eight, nine, in F, G, H,
Then back to A did fly,
Wherein remained the tenth and last,
And lodged him safe in I.
Nine beds had he, and yet had one
For each of travellers ten.
And this is it that puzzles me,
And many wiser men.