We learned over the last couple of days how DSD works as a format, and
what its basic parameters are. It is a 1-bit system, sampled at
2.82MHz, relying heavily on oversampling and Noise Shaping. We didn’t
say much about the actual mechanism of Noise Shaping because, frankly,
it relies on some pretty dense mathematics. As a result, we didn’t say
much about what the resultant data stream actually represents either.

We learned that each individual bit is somehow like an Opinion Poll, where
instead of asking the bit to tell us what the signal value is, we ask
it whether it thinks it should be a one or a zero. The bit is like an
individual respondent - it doesn’t really know, but it has a black &
white opinion, which might be right or wrong. But by asking the
question of enough bits, we can average out the responses and come up
with a consensus value. So each individual bit does not represent the
actual value of the signal, but on the other hand an average of all the
bits in the vicinity gets pretty close! So, at any point in time, in
order to represent the signal, some of the bits are ones and some are
zeros, and, to a first approximation, it does not matter too much how
those ones and zeros are distributed. But here is a quick peek into
Noise Shaping: it works by taking advantage of the freedom we have in
distributing the ones and zeros. It is precisely those choices that
shape the noise.
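
To make the opinion-poll picture concrete, here is a minimal sketch of a
first-order Sigma Delta Modulator in Python. It is an illustration only:
real DSD encoders use much higher-order modulators, and the function
names, the test signals, and the 64-bit averaging window are my own
assumptions rather than anything taken from a real encoder.

```python
import math

DSD_RATE = 2_822_400  # 1-bit sample rate in Hz


def sigma_delta_first_order(samples):
    """Turn samples in the range [-1, 1] into a list of 1-bit values (1 or 0)."""
    bits = []
    integrator = 0.0
    feedback = 0.0  # previous output bit, expressed as +1.0 or -1.0
    for x in samples:
        # Accumulate the running error between the input and the bits sent so far.
        integrator += x - feedback
        bit = 1 if integrator >= 0.0 else 0
        feedback = 1.0 if bit else -1.0
        bits.append(bit)
    return bits


def consensus(bits, start, length):
    """Average a window of bits (1 -> +1, 0 -> -1): the 'opinion poll' result."""
    window = bits[start:start + length]
    return sum(1.0 if b else -1.0 for b in window) / len(window)


# A steady signal of 0.5: three bits in every four come out as ones.
steady_bits = sigma_delta_first_order([0.5] * 4000)
steady_estimate = consensus(steady_bits, 0, 4000)

# A 1 kHz sine of amplitude 0.5, polled over 64 bits around its first peak.
sine = [0.5 * math.sin(2.0 * math.pi * 1000.0 * n / DSD_RATE) for n in range(3000)]
sine_bits = sigma_delta_first_order(sine)
peak_index = round(DSD_RATE / 4000)  # a quarter of a cycle in
peak_estimate = consensus(sine_bits, peak_index - 32, 64)

print(steady_estimate, peak_estimate)
```

Any single entry in either bit list is just a one or a zero, yet both
averages land close to the signal value of 0.5 at the point being polled.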

An interesting way of looking at it is that the signal itself represents
the probability that the value of any given bit will be a one. If the
probability is higher,
a higher proportion of the bits will be ones, and if it is lower the
proportion will be correspondingly lower. As the waveform oscillates
between high and low, so the relative preponderance of ones over zeros
in the bitstream oscillates between high and low. The value of any one
individual bit - whether it is a one or a zero - says very, very little
about the underlying signal. That is quite a remarkable property. An
individual bit could be subject to some sort of reading error and come
out totally wrong, and provided there are a small enough number of such
errors, it is arguable that you would never actually know that the error
happened!

Compare this with PCM. In a PCM signal, we can
argue that every single bit means something. It says something highly
specific about the value of the signal at some specific point in time.
Some bits say more important things than others. For example, the Most
Significant Bit (MSB) tells us whether the signal is positive or
negative. If there is a reading error and that comes out wrong, the
impact on the resultant signal can be massive. Because every bit in a
PCM system has a specific meaning, and every bit in a DSD system has a
nebulous meaning, it should be no surprise that there is no mathematical
one-to-one correspondence between PCM data and DSD data. Sure, you can
convert PCM to DSD, and vice versa, but there is no mathematical
identity that links the two - unlike a signal and its Fourier Transform,
each of which is a direct representation of the other in a different
form. Any transformation from one to the other is therefore subject to a
lossy algorithm. Of course, an appropriate choice of algorithm can
minimize the loss, but the twain are fundamentally incompatible.
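
The difference in what a single bit means can be illustrated with a toy
comparison of one reading error in each format. This is a sketch under
my own assumptions: PCM samples are taken to be 16-bit two's complement,
and a plain 64-bit average stands in for the low-pass filter a real DSD
DAC would apply. The helper names flip_pcm_bit and window_average are
hypothetical.

```python
def flip_pcm_bit(sample, bit_index):
    """Flip one bit of a signed 16-bit two's-complement sample."""
    raw = sample & 0xFFFF          # view the sample as 16 raw bits
    raw ^= 1 << bit_index          # the reading error
    return raw - 0x10000 if raw & 0x8000 else raw


def window_average(bits):
    """Average a run of DSD bits, counting 1 as +1 and 0 as -1."""
    return sum(1 if b else -1 for b in bits) / len(bits)


# PCM: the same kind of error matters a lot or a little depending on the bit.
pcm_sample = 1000
pcm_msb_flipped = flip_pcm_bit(pcm_sample, 15)   # sign bit read wrongly
pcm_lsb_flipped = flip_pcm_bit(pcm_sample, 0)    # least significant bit read wrongly

# DSD: 64 bits whose average is 0.5, then the same bits with one read wrongly.
dsd_bits = [1, 1, 1, 0] * 16
dsd_damaged = list(dsd_bits)
dsd_damaged[0] = 0

print(pcm_sample, pcm_msb_flipped, pcm_lsb_flipped)            # 1000 -31768 1001
print(window_average(dsd_bits), window_average(dsd_damaged))   # 0.5 0.46875
```

Every DSD bit carries the same small weight (here 2/64 of full scale),
whereas the weight of a PCM bit depends entirely on its position in the
word.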

However, let us look at some similarities. Let’s look at the function
of the DAC. For a PCM DAC, its job is to recreate the values of a
voltage encoded by the data at each sample point. Those voltages go up
and down according to the data in the PCM data stream. We just need to
pass that waveform through a low-pass filter and the result is music.
Now let’s compare that with DSD. For a DSD DAC, its job is to recreate
the values of a voltage encoded by the data at each sample point. Those
voltages go up and down according to the data in the DSD data stream.
We just need to pass that waveform through a low-pass filter and the
result is music. Hang on one minute … wasn’t that just the same thing?
Yes it was. For 16/44.1 (CD) audio, the PCM DAC is tasked with
creating an output voltage with 16-bit precision, 44,100 times a second.
On the other hand, for DSD the DSD DAC is tasked with creating an
output voltage with 1-bit precision, 2,822,400 times a second. In each
case the final result is obtained by passing the output waveform through
a low-pass filter.
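
The numbers in that comparison are easy to check. The following is plain
arithmetic on the figures quoted above, per channel; the variable names
are mine.

```python
CD_SAMPLE_RATE = 44_100        # samples per second
CD_BITS_PER_SAMPLE = 16
DSD_SAMPLE_RATE = 2_822_400    # samples per second
DSD_BITS_PER_SAMPLE = 1

cd_bits_per_second = CD_SAMPLE_RATE * CD_BITS_PER_SAMPLE
dsd_bits_per_second = DSD_SAMPLE_RATE * DSD_BITS_PER_SAMPLE

# How many DSD samples arrive in the time of one CD sample.
samples_ratio = DSD_SAMPLE_RATE // CD_SAMPLE_RATE
# How much more raw data DSD moves than CD.
data_ratio = dsd_bits_per_second / cd_bits_per_second

print(cd_bits_per_second)   # 705600
print(dsd_bits_per_second)  # 2822400
print(samples_ratio)        # 64
print(data_ratio)           # 4.0
```

So the DSD DAC makes 64 coarse decisions in the time the CD DAC makes one
fine one, and it does so using four times the raw data.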

That is an interesting observation.
Although the data encoded by PCM and DSD are fundamentally different -
we just got through describing how they mean fundamentally
different things - now we hear that the process for converting both to
analog is exactly the same? Yes. Strange but true. From a
functionality perspective, as far as a DAC is concerned, DSD and PCM are
the same thing!

By the way, I have mentioned how we can add
Noise Shaped dither to a PCM signal and in doing so encode data below
the resolution limit of the LSB. Our notional view of PCM is that the
data stream explicitly encodes the value of the waveform at a sequence
of instants in time, and yet, if we have encoded data below the level of
the LSB, that data cannot be encoded in that manner. Instead, by Noise Shaping, it is
somehow captured in the way the multi-bit data stream evolves over
time. Rather like DSD, you might say! There is definitely a grey area
when it comes to calling one thing PCM and another thing DSD.
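
A small sketch can show how a multi-bit stream carries information below
its LSB. It uses simple first-order error feedback, which is one basic
form of noise shaping, and leaves out the random dither component for
clarity, so treat it as a simplified stand-in rather than the method any
particular tool uses. Sample values are in units of one LSB, and the
function name is hypothetical.

```python
def requantize_with_error_feedback(samples):
    """Round each sample to a whole LSB, carrying the rounding error forward."""
    out = []
    carried_error = 0.0
    for x in samples:
        wanted = x + carried_error      # input plus what earlier rounding lost
        q = round(wanted)               # nearest whole LSB
        carried_error = wanted - q      # remember what this rounding lost
        out.append(q)
    return out


# A steady level of one quarter of an LSB.
signal = [0.25] * 4000

plain = [round(x) for x in signal]
shaped = requantize_with_error_feedback(signal)

print(sum(plain) / len(plain))     # 0.0
print(sum(shaped) / len(shaped))   # 0.25
```

Plain rounding loses the quarter-LSB level entirely. With error feedback,
no single sample holds the value 0.25, but one sample in four comes out
as a one, so the level survives in how the stream evolves over time.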

We started off this series of posts by mentioning the different
‘flavours’ of DSD that are cropping up out there. Now that I have set
the table, I can finally return to that.

DSD in its 1-bit 2.82MHz form is the only form that can be described correctly (and
pedantically) as DSD. We saw how it represents the lowest sample rate
at which a 1-bit system could be Noise Shaped to deliver a combination
of dynamic range and frequency response which at least equalled that
delivered by CD. What it in fact delivers is a significant improvement
in dynamic range, and more of a loosening in the restrictions on
high-frequency response imposed by CD than a major extension of it. In
any case, that is enough for most listeners to come out in favour of its
clear superiority. However, a substantial body of opinion holds
that by increasing the sample rate yet further, we can achieve a
valuable extension of the high-frequency response. (In principle, we
could also increase the dynamic range, but DSD is already capable of
exceeding the dynamic range of real-world music signals.) People are
already experimenting with doubling, quadrupling, and even octupling
1-bit sample rates. Terminology for these variants is settling on
DSD128, DSD256, and DSD512 respectively (with actual DSD being referred
to as DSD64). Why do this? Partially because we can. But - early days
yet - reports are emerging of listeners who are declaring them to be
significantly superior.
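
The naming follows directly from the sample rates: the number in each
name is the multiple of the 44.1kHz CD sample rate. The short script
below just tabulates that arithmetic. The Nyquist column (half the
sample rate) is added by me for orientation only; it says nothing about
the usable audio bandwidth, which depends on the Noise Shaping.

```python
CD_SAMPLE_RATE = 44_100  # Hz

dsd_rates = {}
for multiple in (64, 128, 256, 512):
    name = f"DSD{multiple}"
    dsd_rates[name] = multiple * CD_SAMPLE_RATE

for name, rate in dsd_rates.items():
    print(f"{name}: {rate / 1e6:.4f} MHz sample rate, {rate / 2e6:.4f} MHz Nyquist")
```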

There are additionally formats - mostly
proprietary ones which only exist ephemerally within DAC chips or
pro-audio workstations - which replace the 1-bit quantization with
multi-bit quantization. These have occasionally been referred to as
“DSD-Wide”. I won’t go into that in much detail, but there are some
interesting reasons you might want to use multi-bit quantizers. Some
established authorities in digital audio - most notably Stanley
Lipschitz of the University of Waterloo - have come out against DSD
largely because of its 1-bit quantizers. Lipschitz’ most significant
objection is a valid one. In order to create a DSD (in its broadest
sense) bitstream, a Sigma Delta Modulator is used. For these modulators
to achieve the required level of audio performance, they must
incorporate high-order filters to perform the Noise Shaping. These
high-order modulators turn out to be unstable if you use a 1-bit
quantizer, but can be made stable by adopting a multi-bit quantizer. In
practical terms, though, many of Lipschitz’ objections have been
addressed in real-world systems, so I won’t pursue that topic further.

But ever since SACD (which uses the DSD system) first came out, designers
of DSD DACs have recognized that the DAC’s performance can be significantly
improved by using one of these multi-bit “extended-DSD” formats. So, internally,
the majority of such chipsets convert the incoming DSD to their choice
of “extended-DSD” format, and do the actual DAC work there. The
conversion involves first passing the DSD bitstream through a low-pass
filter, with the result being a PCM data stream using an ultra-high
resolution floating-point data format sampled at 2.82MHz. This is then
instantly oversampled to the required sample rate and converted to the
“extended-DSD” format using a digital SDM. Unfortunately, the low-pass
filter needs to share some of the undesirable characteristics of the
brick-wall filters that characterize all PCM formats because of all the
high-frequency content that has been shaped into the ultrasonic region.
So it is likely that the proponents of DSD128, DSD256, and so forth,
are onto something if those formats can be converted directly in the DAC
without any “extended-DSD” reformatting.
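
The internal conversion described above can be caricatured in a few
lines. This sketch makes heavy simplifying assumptions that are mine,
not any chip maker's: a 32-tap running average stands in for the
low-pass filter, the oversampling step is skipped, and a first-order
error-feedback modulator with a 17-level quantizer stands in for the
digital SDM. Real chipsets use far more sophisticated filters and
higher-order modulators.

```python
def running_average_lowpass(bits, taps=32):
    """Crude low-pass filter: mean of the most recent `taps` bits (1 -> +1, 0 -> -1)."""
    history = [0] * taps   # ring buffer of recent values, zero before the stream starts
    total = 0
    out = []
    for i, b in enumerate(bits):
        value = 1 if b else -1
        total += value - history[i % taps]
        history[i % taps] = value
        out.append(total / taps)
    return out


def multibit_modulator(samples, levels=17):
    """First-order error-feedback modulator with a `levels`-step quantizer on [-1, 1]."""
    step = 2.0 / (levels - 1)
    carried_error = 0.0
    out = []
    for x in samples:
        wanted = x + carried_error
        q = round(wanted / step) * step
        q = max(-1.0, min(1.0, q))       # stay inside the quantizer's range
        carried_error = wanted - q
        out.append(q)
    return out


# A 1-bit stream whose ones outnumber zeros two to one (average value 1/3).
dsd_bits = [1, 1, 0] * 1000

high_res = running_average_lowpass(dsd_bits)   # multi-valued, same sample rate
wide = multibit_modulator(high_res)            # re-quantized to 17 levels

tail = wide[-1500:]
print(sorted(set(tail)))
print(sum(tail) / len(tail))
```

The 1-bit input only ever takes two values, while the re-quantized
stream moves between several closely spaced levels whose average still
tracks the original signal level of one third.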

I hope you found these posts, which take a peek under the hood of DSD, to be informative
and interesting. Although the mathematics of PCM can be challenging at
times, those of DSD are that and more, in spades. It is likely that
progress in this field will continue to be made. In the meantime,
condensing it into a form suitable for digestion by the layman remains a
challenge of its own :)