Sunday 31 March 2013

So you think you understand Digital Audio?

I'm willing to bet that most of you don't.  In principle it is all very straightforward.  But in practice it really doesn't quite pan out that way.

Let's start off with all the stuff that you do know.

In digital audio, the music waveform is "sampled" on a regular basis.  Sampling means that the instantaneous magnitude of the waveform is measured and the resultant value stored somewhere.  The Sample Rate tells you how often these samples are measured.  The standard used in Compact Discs is 44.1kHz.  This means that the music waveform is measured 44,100 times per second, and the results of each measurement are stored.  The device that performs this sampling is called an ADC (Analog-to-Digital Converter).
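To make that concrete, here is a minimal Python sketch of what sampling amounts to. The 1 kHz test tone, the 10 ms duration, and the name `sample_sine` are my own assumptions for illustration; a real ADC measures a voltage rather than evaluating a formula.

```python
import math

SAMPLE_RATE = 44_100  # samples per second, the Compact Disc standard


def sample_sine(freq_hz, duration_s, sample_rate=SAMPLE_RATE):
    """Measure a sine wave's instantaneous value at regular intervals.

    The sine wave stands in for the music waveform (an assumption of this
    sketch); each list entry is one stored measurement.
    """
    n_samples = round(duration_s * sample_rate)
    return [
        math.sin(2 * math.pi * freq_hz * n / sample_rate)
        for n in range(n_samples)
    ]


samples = sample_sine(1_000, 0.01)  # 10 ms of a 1 kHz tone
print(len(samples), "measurements taken")
```

Ten milliseconds of music at 44.1kHz therefore produces 441 stored measurements, one every 1/44,100 of a second.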

Obviously, it is important that the instantaneous waveform is measured with great precision, and this precision is for the most part reflected by the "Bit Depth" with which the result is digitally stored.  The standard used in Compact Discs is 16 bits.  This means that each measurement is stored as a 16-bit whole number, which can take any one of 65,536 distinct values.  On a CD these are signed values running from -32,768 to +32,767.  Here, the largest possible positive amplitude of the musical waveform that can be recorded corresponds to the number +32,767, the largest possible negative amplitude (music signals vary between positive and negative) corresponds to the number -32,768, and silence sits at zero.  By contrast, most high resolution recordings capture the musical waveform as 24-bit numbers.  These can take any one of 16,777,216 distinct values (from -8,388,608 to +8,388,607) and so obviously are able to capture the musical waveform in a lot greater detail.
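A rough sketch of what bit depth buys you, again in Python. The function names `quantize` and `dequantize` are hypothetical, and I am assuming signed whole-number codes centred on zero with the waveform scaled to lie between -1.0 and +1.0; adding half the range to each code would give the equivalent codes counted up from zero.

```python
import math


def quantize(x, bits):
    """Round a value in [-1.0, 1.0] to the nearest signed whole-number code."""
    full_scale = 2 ** (bits - 1) - 1   # 32,767 for 16 bits
    x = max(-1.0, min(1.0, x))         # clip anything beyond full scale
    return round(x * full_scale)


def dequantize(code, bits):
    """Convert a stored code back to a value in [-1.0, 1.0]."""
    return code / (2 ** (bits - 1) - 1)


x = math.sin(1.0)  # an arbitrary instantaneous amplitude
for bits in (16, 24):
    code = quantize(x, bits)
    error = abs(x - dequantize(code, bits))
    print(f"{bits}-bit: {2 ** bits:,} levels, code {code}, error {error:.2e}")
```

The rounding error can never exceed half of one step, and a 24-bit step is 256 times smaller than a 16-bit step.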

When we play back our digital music, all we have to do is re-create the musical waveform using the stored numbers, and this is where your DAC (Digital-to-Analog Converter) comes in.  Using the example of music stored in the CD format (which is commonly written in audio shorthand as 16/44.1) the job of the DAC is to grab a data value 44,100 times per second, each time creating an instantaneous voltage corresponding to the precise value encoded in the 16-bit data value.  If the DAC can do this accurately, then it will recreate the original musical waveform with a precision limited only by the extent to which the original music signal can be accurately reflected by 16-bit numbers.  Also - and this is obvious if you think about it - the timing of the 44,100 samples per second has to be exactly the same timing as when the music was originally sampled.

Now, here's the bit you probably don't know.

Unfortunately, one can be lulled into a false sense of security by this simplistic picture.  The reality is actually quite different.  You see, it turns out that it is a frighteningly complicated and prohibitively expensive task to build either ADCs or DACs that do the job I have just described.  So, given that most mobile phones contain both an ADC and a DAC, how is it we manage to get around this?

The answer is a technologically and mathematically challenging concept called "Sigma-Delta Modulation" (SDM).  This is basically an ultra-high-speed bit stream comprising only ones and zeros, such that at any point in time the magnitude of the encoded signal is reflected by the relative preponderance of ones over zeros.  If the bit-stream comprises almost all ones, then this would represent the maximum possible signal amplitude.  If almost all zeros, it would represent the minimum (or most negative) possible signal amplitude.  The beauty of SDM is that - without any signal processing whatsoever - the bitstream can be fed directly into the input of an amplifier, and it takes little more than an analog low-pass filter to convert it into music.  This is precisely how "Class-D" amplifiers work.
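For the curious, the simplest textbook form of the idea, a first-order modulator, can be sketched in a few lines of Python. This is my own illustration and not a description of any particular chip: the names `sigma_delta` and `ones_density` are hypothetical, and real converters use higher-order designs running at rates of millions of bits per second.

```python
def sigma_delta(samples):
    """First-order sigma-delta modulator (illustrative sketch only).

    Takes input values in [-1.0, 1.0] and returns a list of 0/1 bits whose
    density of ones tracks the input amplitude.
    """
    integrator = 0.0
    feedback = 0.0                      # previous output expressed as -1.0 or +1.0
    bits = []
    for x in samples:
        integrator += x - feedback      # accumulate input-minus-output error
        bit = 1 if integrator >= 0 else 0
        feedback = 1.0 if bit else -1.0
        bits.append(bit)
    return bits


def ones_density(bits):
    """Fraction of the bit stream that is ones."""
    return sum(bits) / len(bits)


for level in (-1.0, -0.5, 0.0, 0.5, 1.0):
    stream = sigma_delta([level] * 10_000)
    print(f"input {level:+.1f} -> fraction of ones {ones_density(stream):.3f}")
```

A steady input at full positive scale yields a stream of all ones, full negative scale yields all zeros, and everything in between shows up as a proportionate mix of the two.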

Although it is beyond the scope of this note to describe how, an ADC whose output is an SDM bit stream is a cheap thing to build, despite being an incredibly complicated thing to describe or even understand functionally.

So, in reality, with so few exceptions as to be not worth mentioning, all digital recordings are created using an SDM-based ADC, followed by a mathematically driven signal processor which converts the SDM bitstream to a PCM data file.  And likewise, all DACs take the PCM data files they receive and put them through a mathematically driven signal processor which converts them back to SDM, which is then converted to analog using a simple Class-D output stage.

Why should you be concerned by all this?  Well, what you need to know is that, from a mathematical perspective, SDM and PCM are mutually incompatible formats.  Although both are at their roots nothing more than numbers, you cannot convert losslessly from one format to the other.  The conversion process invariably results in the musical data being irrecoverably "smeared" in the time domain.  So, in a real-world ADC, rather than actually sampling the music at fixed intervals in time, what the ADC is actually doing is calculating what the instantaneous amplitude of the music ought to be using the SDM bitstream as its reference.  Likewise for the DAC.
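To give a feel for what "calculating" a PCM value from a bitstream involves, here is a deliberately crude Python sketch. It is emphatically an assumption on my part and not how any real converter is built: it uses a toy first-order modulator, a hypothetical 64 bits per PCM sample, and a plain block average where real designs use long, carefully designed digital filters. The names `sigma_delta`, `decimate` and `BITS_PER_SAMPLE` are mine.

```python
def sigma_delta(samples):
    """Toy first-order modulator: inputs in [-1.0, 1.0] -> list of 0/1 bits."""
    integrator, feedback, bits = 0.0, 0.0, []
    for x in samples:
        integrator += x - feedback
        bit = 1 if integrator >= 0 else 0
        feedback = 1.0 if bit else -1.0
        bits.append(bit)
    return bits


def decimate(bits, factor, pcm_bits=16):
    """Turn each block of `factor` bits into one signed PCM code.

    The block average is the crudest possible low-pass filter; it is used
    here only to show that every PCM value is computed from many bits.
    """
    full_scale = 2 ** (pcm_bits - 1) - 1
    pcm = []
    for start in range(0, len(bits) - factor + 1, factor):
        block = bits[start:start + factor]
        mean = 2 * sum(block) / factor - 1   # density of ones mapped to [-1, 1]
        pcm.append(round(mean * full_scale))
    return pcm


BITS_PER_SAMPLE = 64                         # hypothetical oversampling ratio
stream = sigma_delta([0.25] * (BITS_PER_SAMPLE * 100))
pcm = decimate(stream, BITS_PER_SAMPLE)
print(len(stream), "bits ->", len(pcm), "PCM samples")
print("first few codes:", pcm[:5], "ideal:", round(0.25 * 32767))
```

Each PCM code here is an estimate assembled from a window of 64 consecutive bits rather than a measurement taken at a single instant, so with this crude filter the individual codes only approximate the ideal value.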

Many smart people will absolutely insist that PCM-based digital audio fundamentally has this, that, or another degree of perfection, based on somebody's published theoretical analysis, and if you claim to hear otherwise you are obviously fooling yourself.  And based on pure-PCM assumptions, these analyses are often very convincing.  Yet most serious audiophiles continue to hold that digital audio is on various levels less satisfying than good old analog.

I have developed a strong suspicion that many, many of the ills which we ascribe to digital audio may in fact be caused not by fundamental limitations of the PCM format, but by the sonically disruptive SDM-to-PCM and PCM-to-SDM converters that live unheralded in the ADCs and DACs that inhabit the playback chain.  Over time - quite a long time, I expect - I plan to experiment with this idea.