Tuesday 12 November 2013

What, exactly, is DSD? - II. Getting in Shape

Last week, we learned that by adopting a PCM format, we also commit ourselves to radical low-pass filtering in both the ADC and DAC stages in order to eliminate the fundamental problem of aliasing. Yesterday we learned that we can use oversampling and Noise Shaping to overcome some of the limitations imposed by Bit Depth in PCM systems. Taking both together, we learned that by increasing both the Bit Depth and the Sample Rate we can make inroads into the audible effects of both of these limitations.

In practice, there is no point in extending the Bit Depth beyond 24 bits. This represents a dynamic range of 144dB, and no recording system we know of can present analog waveforms with that level of dynamic range to the input of an ADC. On the other hand, even by extending the Sample Rate out to 384kHz (the highest at which I have ever seen any commercially available music offered), the brick-wall filter requirements are still within the territory where we would anticipate its effects to be audible. A 24/384 file is approximately 13 times the size of its 16/44.1 equivalent. That gets to be an awfully big file. In order for the filter requirements to be ameliorated to the point where we are no longer concerned with their sonic impact, the Sample Rate needs to be out in the MHz range. But a 24-bit 2.82MHz file would be a whopping 96 times the size of its 16/44.1 counterpart, which is to say nearly 100 times. Clearly this takes us places we don’t want to go.
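For anyone who wants to check the file-size arithmetic, here is a minimal sketch. It assumes uncompressed audio with the same number of channels in each file, so that size scales simply as Bit Depth times Sample Rate. The function name `size_ratio` is made up for illustration, and 2.82MHz is taken to mean exactly 64 x 44.1kHz.

```python
def size_ratio(bit_depth, sample_rate_hz, ref_bits=16, ref_rate_hz=44_100):
    """Size of an uncompressed file relative to a 16/44.1 file.

    Assumes size is proportional to bit depth x sample rate
    (same channel count, same duration, no compression).
    """
    return (bit_depth * sample_rate_hz) / (ref_bits * ref_rate_hz)


# "2.82MHz" taken as exactly 64 times the CD sample rate.
DSD_RATE_HZ = 64 * 44_100  # 2,822,400 Hz

ratio_24_384 = size_ratio(24, 384_000)           # the 24/384 case
ratio_24_dsd_rate = size_ratio(24, DSD_RATE_HZ)  # 24-bit at 2.82MHz

print(f"24/384 vs 16/44.1:         {ratio_24_384:.2f}x")
print(f"24-bit 2.82MHz vs 16/44.1: {ratio_24_dsd_rate:.0f}x")
```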

But wait! Didn’t we just learn that by oversampling and Noise Shaping we can access dynamic range below the limitation imposed by the Bit Depth? Increasing the Sample Rate by a factor of 64 to 2.82MHz would mean that our audio frequencies (20Hz - 20kHz) are all going to be massively oversampled. Perhaps we can reduce the Bit Depth? Well, with oversampling alone, all we can do is shave a paltry 3 bits or so off our Bit Depth, since each doubling of the Sample Rate buys only about half a bit. But do not get discouraged. With Noise Shaping it turns out we can reduce it all the way down to 1 bit. A 1-bit 2.82MHz file is only 4 times larger than its 16/44.1 equivalent, which is actually quite manageable. But really? Can we get more than 100dB of dynamic range from a 1-bit system just by sampling at 2.82MHz?
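Here is a rough back-of-the-envelope check of those two numbers. It leans on a textbook idealization that I am assuming rather than proving: the quantization error behaves like white noise spread evenly up to half the Sample Rate, so oversampling by some ratio leaves only that fraction of the noise power in the audio band. Under that assumption, each doubling of the Sample Rate is worth about 3dB, or half a bit. Treat the result as a ballpark, not a specification.

```python
import math

CD_RATE_HZ = 44_100
OSR = 64                          # oversampling ratio relative to 44.1kHz
DSD_RATE_HZ = OSR * CD_RATE_HZ    # 2,822,400 Hz

# White-noise assumption: in-band noise power falls in proportion to the OSR.
gain_db = 10 * math.log10(OSR)    # dynamic range gained by oversampling alone
gain_bits = 0.5 * math.log2(OSR)  # the same gain expressed in bits (~6dB per bit)

# Size of a 1-bit 2.82MHz stream relative to 16/44.1, assuming
# size is proportional to bit depth x sample rate.
one_bit_size_ratio = (1 * DSD_RATE_HZ) / (16 * CD_RATE_HZ)

print(f"Oversampling gain: {gain_db:.1f} dB, about {gain_bits:.1f} bits")
print(f"1-bit 2.82MHz vs 16/44.1: {one_bit_size_ratio:.0f}x")
```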

Yes, we can, but I am not going anywhere near the mathematics that spit out those numbers. That is the preserve of experts only. But here’s what we do. When we encode data with a 1-bit number, the quantization error is absolutely massive, and can be anywhere between +100% and -100% of the signal itself. Without any form of Noise Shaping, this quantization noise would in practice sit at a level of around -20dB (due to the effect of oversampling alone) but would extend all the way out to a frequency of 1.41MHz. But because of the massive amount of oversampling, we can attempt to use Noise Shaping to depress the quantization noise in the region of 0-20kHz, at the expense of increasing it at frequencies above, say, 100kHz. In other words, we would “shape” it out of the audio band and up into the frequency range where we are confident no musical information lives, and plan on filtering it out later. We didn’t choose that sampling rate of 2.82MHz by accident. It turns out that is the lowest Sample Rate at which we can get the noise down well below -100dB over the entire audio frequency bandwidth.
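To make the idea of "shaping" a little more concrete, here is a toy sketch in Python. It is emphatically not how a real DSD encoder works: it uses a first-order feedback loop, the simplest noise shaper there is, and it will get nowhere near 100dB. All the names and numbers (`N`, the 0.5 amplitude, the test tone) are my own choices for illustration. It compares the error left in the audio band by a plain 1-bit quantizer against the error left by the same quantizer wrapped in a feedback loop.

```python
import math

OSR = 64                        # oversampling ratio relative to 44.1kHz
N = 4096                        # length of the test signal (illustrative)
IN_BAND_BINS = N // (2 * OSR)   # DFT bins below fs/128, roughly the audio band


def plain_one_bit(samples):
    """1-bit quantizer with no noise shaping: keep only the sign."""
    return [1.0 if x >= 0 else -1.0 for x in samples]


def first_order_delta_sigma(samples):
    """1-bit quantizer inside a first-order feedback (noise shaping) loop."""
    integrator = 0.0
    previous_output = 0.0
    bits = []
    for x in samples:
        # Accumulate the difference between the input and the last output bit.
        integrator += x - previous_output
        previous_output = 1.0 if integrator >= 0 else -1.0
        bits.append(previous_output)
    return bits


def in_band_error_power(bits, samples):
    """Power of (bits - samples) in the lowest IN_BAND_BINS bins of a direct DFT."""
    error = [b - x for b, x in zip(bits, samples)]
    power = 0.0
    for k in range(1, IN_BAND_BINS + 1):
        re = sum(e * math.cos(2 * math.pi * k * n / N) for n, e in enumerate(error))
        im = sum(e * math.sin(2 * math.pi * k * n / N) for n, e in enumerate(error))
        power += 2 * (re * re + im * im) / (N * N)
    return power


# Test tone: 3 cycles in N samples (about 2kHz at 2.82MHz), at half of full scale.
signal = [0.5 * math.sin(2 * math.pi * 3 * n / N) for n in range(N)]

plain_power = in_band_error_power(plain_one_bit(signal), signal)
shaped_power = in_band_error_power(first_order_delta_sigma(signal), signal)

print(f"In-band error, plain 1-bit:  {10 * math.log10(plain_power):.1f} dB")
print(f"In-band error, noise shaped: {10 * math.log10(shaped_power):.1f} dB")
```

Both versions emit nothing but +1 and -1. The only difference is that the feedback loop pushes the bulk of the error up to high frequencies, where it no longer counts against the audio band.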

It turns out that converting this format back to analog is much easier to implement than it is for multi-bit PCM. Because we only encode 1 bit, we only have to create an output voltage of either Maximum or Minimum. We are not concerned with generating seriously accurate intermediate voltages. To generate this output, all we have to do is switch back and forth between Maximum and Minimum according to the bit stream. This switching can be done very accurately indeed. Then, having generated this binary waveform, all we have to do is pass it through a low-pass filter. Job done.
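The "switch, then low-pass filter" recipe can also be sketched in a few lines. Again, this is a toy under loudly stated assumptions: the bit stream comes from a simple first-order feedback loop rather than a real DSD encoder, and a 64-sample moving average stands in for the analog low-pass filter, which is about as crude a filter as one could pick. The names (`one_bit_stream`, `moving_average`, `WINDOW`) are hypothetical.

```python
import math

N = 4096       # length of the test signal (illustrative)
WINDOW = 64    # moving-average length, a crude stand-in for the analog filter


def one_bit_stream(samples):
    """Toy first-order noise-shaped 1-bit encoder (not a real DSD modulator)."""
    integrator = 0.0
    previous_output = 0.0
    bits = []
    for x in samples:
        integrator += x - previous_output
        previous_output = 1.0 if integrator >= 0 else -1.0
        bits.append(previous_output)
    return bits


def moving_average(values, window):
    """Average of each run of `window` consecutive values."""
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]
        if i >= window - 1:
            out.append(running / window)
    return out


# Slow test tone at half of full scale.
signal = [0.5 * math.sin(2 * math.pi * 3 * n / N) for n in range(N)]

# "DAC": the bits are already the Maximum/Minimum switching waveform (+1/-1);
# all that remains is to low-pass filter them.
bits = one_bit_stream(signal)
recovered = moving_average(bits, WINDOW)
reference = moving_average(signal, WINDOW)  # same filter applied to the input

worst_error = max(abs(r - s) for r, s in zip(recovered, reference))
print(f"Worst-case difference after filtering: {worst_error:.4f}")
```

Even with such a primitive filter, the filtered bit stream tracks the filtered input closely. A real low-pass filter, fed by a higher-order noise shaper, does far better.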

This is a pretty interesting result. We have managed to eliminate the need for those nasty brick-wall filters at both the ADC and DAC, and at the same time capture a signal with exceptional dynamic range across the audio bandwidth. This, my friends, is DSD.

As with a lot of things, when you peek under the hood, things always get a little more complicated, and I will address some of those complications tomorrow.