Last week, we learned that by adopting a PCM format, we also constrain
ourselves with the need to employ radical low-pass filtering in both the
ADC and DAC stages in order to eliminate the fundamental problem of
aliasing. Yesterday we learned that we can use oversampling and noise
shaping to overcome some of the limitations imposed by Bit Depth in PCM
systems. Taking both together, we learned that by increasing both the
Bit Depth and the Sample Rate we can make inroads into the audible
effects of both of these limitations.

In practice, there is no point in extending the Bit Depth beyond 24
bits. This represents a dynamic range of 144dB, and no recording system
we know of can present analog waveforms with that level of dynamic
range to the input of an ADC. On the other hand, even by extending the
Sampling Rate out to 384kHz (the highest at which I have ever seen any
commercially available music offered), the brick-wall filter
requirements are still within the territory where we would anticipate
their effects to be audible. A 24/384 file is approximately 13 times
the size of its 16/44.1 equivalent. That gets to be an awfully big
file. In order for the filter requirements to be ameliorated to the
point where we are no longer concerned with their sonic impact, the
sample rate needs to be out in the MHz range. But a 24-bit 2.82MHz file
would be a whopping 96 times the size of its 16/44.1 counterpart.
Clearly this takes us places we don’t want to go.
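The size comparisons above are simple arithmetic, and can be checked with a short sketch. It assumes uncompressed data with the same number of channels, the usual rule of thumb of about 6dB of dynamic range per bit, and that "2.82MHz" means exactly 64 x 44.1kHz = 2.8224MHz. The function names are made up for illustration.

```python
import math

def dynamic_range_db(bits):
    """Theoretical dynamic range of an N-bit quantizer (about 6dB per bit)."""
    return 20 * math.log10(2 ** bits)

def size_ratio(bits, rate_hz, ref_bits=16, ref_rate_hz=44_100):
    """Raw data rate relative to 16/44.1, assuming the same channel count."""
    return (bits * rate_hz) / (ref_bits * ref_rate_hz)

print(round(dynamic_range_db(24), 1))       # 144.5, the "144dB" figure
print(round(size_ratio(24, 384_000), 1))    # 13.1
print(round(size_ratio(24, 2_822_400), 1))  # 96.0 (assumes 64 x 44.1kHz)
```

The same `size_ratio` helper works for any other combination of Bit Depth and Sample Rate you care to try.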

But wait! Didn’t we just learn that by oversampling and Noise Shaping
we can access dynamic range below the limitation imposed by the Bit
Depth? Increasing the sample rate by a factor of 64 to 2.82MHz would
mean that our audio frequencies (20Hz - 20kHz) are all going to be
massively oversampled. Perhaps we can reduce the Bit Depth? Well, with
oversampling alone, all we can do is shave a paltry 3 bits or so off
our Bit Depth (each doubling of the sample rate buys only about half a
bit). But do not get discouraged: with Noise Shaping it turns out we
can reduce it all the way down to 1-bit. A 1-bit 2.82MHz file is only
4 times the size of its 16/44.1 equivalent, which is actually quite
manageable. But really? Can we get more than 100dB of dynamic range
from a 1-bit system just by sampling at 2.82MHz?

Yes, we can, but I am not going anywhere near the
mathematics that spit out those numbers. That is the preserve of
experts only. But here’s what we do. When we encode data with a 1-bit
number, the quantization error is absolutely massive, and can be
anywhere between +100% and -100% of the signal itself. Without any form
of noise shaping, this quantization noise would in practice sit at a
level of around -20dB (due to the effect of oversampling alone) but
would extend all the way out to a frequency of 1.41MHz. But because of
the massive amount of oversampling, we can attempt to use Noise Shaping
to depress the quantization noise in the region of 0-20kHz, at the
expense of increasing it at frequencies above, say, 100kHz. In other
words, we would “shape” it out of the audio band and up into the
frequency range where we are confident no musical information lives, and
plan on filtering it out later. We didn’t choose that sampling rate of
2.82MHz by accident. It turns out that is the lowest sample rate at
which we can get the noise down well below -100dB over the entire audio
frequency bandwidth.
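To make the idea a little more concrete, here is a minimal sketch of a 1-bit noise-shaping encoder. It is a first-order delta-sigma loop, chosen purely because it is the simplest example of the technique. Real DSD encoders use more elaborate, higher-order designs to reach the noise figures quoted above, and nothing here should be read as how any particular product does it. The function name and the test signal are assumptions for illustration.

```python
import math

def delta_sigma_1bit(samples):
    """Encode samples in [-1, 1] as a stream of +1/-1 values.

    First-order loop (an illustrative simplification): the integrator
    accumulates the difference between the input and the previous output
    bit, which pushes the quantization error toward high frequencies.
    """
    integrator = 0.0
    previous_bit = 0.0
    bits = []
    for x in samples:
        integrator += x - previous_bit
        bit = 1.0 if integrator >= 0.0 else -1.0
        bits.append(bit)
        previous_bit = bit
    return bits

# Hypothetical test signal: a half-scale 1kHz sine sampled at 64 x 44.1kHz.
RATE_HZ = 64 * 44_100
signal = [0.5 * math.sin(2 * math.pi * 1_000 * n / RATE_HZ)
          for n in range(RATE_HZ // 100)]
bits = delta_sigma_1bit(signal)

# Averaging the bits over a short window roughly recovers the input there.
window = 128
start = 640  # window centred near the first positive peak of the sine
print(sum(signal[start:start + window]) / window)
print(sum(bits[start:start + window]) / window)
```

Each individual bit is a terrible approximation of the signal, but the local average of the bits tracks it closely, which is the whole point: the error has been moved to frequencies far above the audio band.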

When it comes to converting this signal back to analog, it turns out
this format is much easier to implement than multi-bit PCM.
Because we only encode 1-bit, we only have to create an output voltage
of either Maximum or Minimum. We are not concerned with generating
seriously accurate intermediate voltages. To generate this output, all
we have to do is switch back and forth between Maximum and Minimum
according to the bit stream. This switching can be done very accurately
indeed. Then, having generated this binary waveform, all we have to do
is pass it through a low pass filter. Job done.
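The "switch, then low-pass filter" recipe can be imitated in software. The sketch below uses a one-pole filter, which behaves roughly like a single resistor and capacitor. That is an assumption made to keep the example short; a real DAC output stage would use a steeper analog filter. The bit stream is a made-up repeating pattern rather than real music data.

```python
def one_pole_low_pass(bits, alpha=0.01):
    """Smooth a +1/-1 stream; behaves like a simple RC low-pass filter."""
    level = 0.0
    smoothed = []
    for bit in bits:
        # Move a small fraction of the way toward the current bit value.
        level += alpha * (bit - level)
        smoothed.append(level)
    return smoothed

# Hypothetical bit stream: three Maximums for every Minimum,
# so its average level is (3 - 1) / 4 = 0.5.
bits = [1.0, 1.0, 1.0, -1.0] * 1000
analog = one_pole_low_pass(bits)
print(round(analog[-1], 1))  # settles near 0.5
```

The output never takes the values +1 or -1 that the switch actually produces. The filter turns the density of Maximums into an intermediate voltage, with only a small residual ripple left over.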

This is a pretty interesting result. We have managed to eliminate the
need for those nasty brick-wall filters at both the ADC and DAC, and at
the same time capture a signal with exceptional dynamic range across the
audio bandwidth. This, my friends, is DSD.

As with a lot of things, when you peek under the hood it all gets a
little more complicated, and I will address some of those complications
tomorrow.