BitPerfect: What, exactly, is DSD? - III. It’s a bird? It’s a ‘plane?

We learned over the last couple of days how DSD works as a format, and what its basic parameters are. It is a 1-bit system, sampled at 2.82MHz, relying heavily on oversampling and Noise Shaping. We didn’t say much about the actual mechanism of Noise Shaping because, frankly, it relies on some pretty dense mathematics, and for the same reason we didn’t say much about what the resulting data stream actually represents.

We learned that each individual bit is somewhat like a respondent in an Opinion Poll. Instead of asking the bit to tell us what the signal value is, we ask it whether it thinks it should be a one or a zero. Like an individual respondent, the bit doesn’t really know, but it has a black & white opinion, which might be right or wrong. By putting the question to enough bits, we can average out the responses and arrive at a consensus value. So no individual bit represents the actual value of the signal, but an average of all the bits in the vicinity gets pretty close! At any point in time, some of the bits representing the signal are ones and some are zeros, and, to a first approximation, it does not matter too much how those ones and zeros are distributed. And here is a quick peek into Noise Shaping: it works precisely by exploiting that freedom of choice in how the ones and zeros are distributed.
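To make the Opinion Poll idea concrete, here is a minimal Python sketch of a first-order, 1-bit Sigma Delta Modulator. It is a textbook toy chosen for illustration, not the much higher-order modulator used to create real DSD, and the function names are hypothetical. It encodes a constant level of +0.5 (counting every one as +1 and every zero as -1); about three bits in four come out as ones, and their average lands very close to +0.5.

```python
def sigma_delta_1bit(signal):
    """Toy first-order 1-bit Sigma Delta Modulator (illustrative assumption,
    not a production DSD encoder). Input samples should lie in [-1.0, 1.0]."""
    bits = []
    integrator = 0.0
    feedback = 0.0  # previous output bit, expressed as -1.0 or +1.0
    for x in signal:
        integrator += x - feedback           # accumulate the running error
        bit = 1 if integrator >= 0.0 else 0  # the bit's black & white "opinion"
        feedback = 1.0 if bit else -1.0
        bits.append(bit)
    return bits


def consensus(bits):
    """Average the 'votes': a one counts as +1.0, a zero as -1.0."""
    return sum(1.0 if b else -1.0 for b in bits) / len(bits)


bits = sigma_delta_1bit([0.5] * 1000)
print(sum(bits) / len(bits))  # fraction of ones, close to 0.75
print(consensus(bits))        # close to 0.5
```

Feeding in a slowly varying input instead, such as a low-frequency sine, makes the proportion of ones rise and fall with the waveform.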

An interesting way of looking at it is that the signal itself represents the probability that any given bit will be a one rather than a zero. Where the probability is higher, a higher proportion of the bits will be ones, and where it is lower the proportion will be correspondingly lower. As the waveform oscillates between high and low, so the preponderance of ones over zeros in the bitstream oscillates between high and low. The value of any one individual bit, whether it is a one or a zero, says very, very little about the underlying signal. That is quite a remarkable property. An individual bit could suffer a reading error and come out totally wrong, and provided such errors are rare enough, it is arguable that you would never actually know that the error happened!
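The robustness claim is easy to check numerically. This hypothetical Python sketch takes a stretch of bitstream in which three bits in four are ones (an average level of +0.5, counting a one as +1 and a zero as -1), corrupts three randomly chosen bits, and compares the averages. Each flipped bit can move the average of 1,024 bits by only 2/1024, so three errors shift it by at most about 0.006. The fixed repeating pattern is an assumption made for simplicity; a real DSD stream would not be this regular.

```python
import random


def average_level(bits):
    """Consensus value of a run of bits: a one counts as +1.0, a zero as -1.0."""
    return sum(1.0 if b else -1.0 for b in bits) / len(bits)


def flip_bits(bits, positions):
    """Return a copy of `bits` with the bits at `positions` inverted (simulated read errors)."""
    corrupted = list(bits)
    for p in positions:
        corrupted[p] ^= 1
    return corrupted


clean = [1, 1, 1, 0] * 256                 # 1,024 bits, 75% ones -> level +0.5
rng = random.Random(2013)                  # fixed seed so the run is repeatable
errors = rng.sample(range(len(clean)), 3)  # three distinct bit positions
dirty = flip_bits(clean, errors)

print(average_level(clean))  # 0.5
print(average_level(dirty))  # within 6/1024 of 0.5
```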

Compare this with PCM. In a PCM signal, we can argue that every single bit means something. It says something highly specific about the value of the signal at a specific point in time. Some bits say more important things than others. For example, in the usual two’s-complement format, the Most Significant Bit (MSB) tells us whether the signal is positive or negative. If a reading error makes that bit come out wrong, the impact on the resultant signal can be massive. Because every bit in a PCM system has a specific meaning, and every bit in a DSD system has a nebulous meaning, it should be no surprise that there is no mathematical one-to-one correspondence between PCM data and DSD data. Sure, you can convert PCM to DSD, and vice versa, but no mathematical identity links the two (unlike a signal and its Fourier Transform, each of which is a direct representation of the other in a different form). Any conversion from one to the other is therefore necessarily lossy. Of course, an appropriate choice of algorithm can minimize the loss, but the twain are fundamentally incompatible.
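By contrast, here is what a single bit error does to a PCM sample. This Python sketch assumes 16-bit two’s-complement samples, as used on CD, and the helper names are hypothetical. Flipping the Least Significant Bit of the value 1000 changes it by one step; flipping the MSB turns it into a large negative value.

```python
def to_signed16(word):
    """Interpret a 16-bit pattern (0..65535) as a two's-complement signed value."""
    return word - 0x10000 if word & 0x8000 else word


def flip_bit16(sample, bit):
    """Simulate a read error: invert one bit (0 = LSB, 15 = MSB) of a signed 16-bit sample."""
    word = sample & 0xFFFF  # the sample's two's-complement bit pattern
    return to_signed16(word ^ (1 << bit))


sample = 1000
print(flip_bit16(sample, 0))   # 1001: LSB error, off by one step
print(flip_bit16(sample, 15))  # -31768: MSB error, the sign flips and the value jumps
```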

However, let us look at some similarities. Let’s look at the function of the DAC. For a PCM DAC, its job is to recreate the values of a voltage encoded by the data at each sample point. Those voltages go up and down according to the data in the PCM data stream. We just need to pass that waveform through a low-pass filter and the result is music. Now let’s compare that with DSD. For a DSD DAC, its job is to recreate the values of a voltage encoded by the data at each sample point. Those voltages go up and down according to the data in the DSD data stream. We just need to pass that waveform through a low-pass filter and the result is music. Hang on one minute … wasn’t that just the same thing? Yes, it was. For 16/44.1 (CD) audio, the PCM DAC is tasked with creating an output voltage with 16-bit precision, 44,100 times a second. On the other hand, for DSD, the DAC is tasked with creating an output voltage with 1-bit precision, 2,822,400 times a second. In each case the final result is obtained by passing the output waveform through a low-pass filter.
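The arithmetic behind that comparison fits in a few lines of Python. The sketch assumes stereo and counts only raw audio bits, ignoring any file or disc framing; the function name is hypothetical. DSD64 runs at exactly 64 times the CD sample rate, and its raw stereo bit rate works out at four times that of CD, even though each DSD sample carries only 1 bit.

```python
CD_RATE_HZ = 44_100
DSD64_RATE_HZ = 2_822_400


def raw_bit_rate(bits_per_sample, sample_rate_hz, channels=2):
    """Raw audio payload in bits per second, before any container overhead."""
    return bits_per_sample * sample_rate_hz * channels


cd = raw_bit_rate(16, CD_RATE_HZ)      # 16-bit words, 44,100 per second
dsd = raw_bit_rate(1, DSD64_RATE_HZ)   # 1-bit words, 2,822,400 per second

print(DSD64_RATE_HZ // CD_RATE_HZ)  # 64
print(cd)                           # 1411200
print(dsd)                          # 5644800
print(dsd / cd)                     # 4.0
print(2 ** 16, 2 ** 1)              # output levels per sample: 65536 vs 2
```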

That is an interesting observation. Although the data encoded by PCM and DSD are fundamentally different - we just got through with describing how they mean fundamentally different things - now we hear that the process for converting both to analog is exactly the same? Yes. Strange but true. From a functionality perspective, as far as a DAC is concerned, DSD and PCM are the same thing!

By the way, I have mentioned how we can add Noise Shaped dither to a PCM signal and in doing so encode information below the resolution limit of the LSB. Our notional view of PCM is that the data stream explicitly encodes the value of the waveform at a sequence of instants in time, and yet, if we have encoded information below the LSB, it cannot have been encoded in that manner. Instead, thanks to Noise Shaping, it is somehow captured in the way the multi-bit data stream evolves over time. Rather like DSD, you might say! There is definitely a grey area when it comes to calling one thing PCM and another thing DSD.
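Here is a minimal Python sketch of that effect, using first-order error-feedback Noise Shaping when rounding to whole LSB steps. For clarity it leaves out the random dither component, so it illustrates the principle rather than a practical dither scheme, and the function names are hypothetical. A constant level of 0.3 LSB simply vanishes under plain rounding, but survives, on average, in the noise-shaped output, even though every individual sample is a whole number.

```python
def plain_requantize(samples):
    """Round each sample to the nearest whole LSB; anything below half an LSB is lost."""
    return [round(x) for x in samples]


def noise_shaped_requantize(samples):
    """First-order error-feedback requantizer: each rounding error is subtracted
    from the next sample, pushing the error towards high frequencies."""
    out = []
    error = 0.0
    for x in samples:
        v = x - error    # compensate for the previous rounding error
        y = round(v)     # whole-LSB output sample
        error = y - v    # this sample's error, fed back into the next one
        out.append(y)
    return out


def mean(xs):
    return sum(xs) / len(xs)


signal = [0.3] * 1000  # a level of 0.3 LSB, below the resolution limit
plain = plain_requantize(signal)
shaped = noise_shaped_requantize(signal)

print(mean(plain))   # 0.0: the signal is gone
print(mean(shaped))  # close to 0.3
```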

We started off this series of posts by mentioning the different ‘flavours’ of DSD that are cropping up out there. Now that I have set the table, I can finally return to that.

DSD in its 1-bit, 2.82MHz form is the only form that can correctly (and pedantically) be described as DSD. We saw how it represents the lowest sample rate at which a 1-bit system could be Noise Shaped to deliver a combination of dynamic range and frequency response at least equal to that of CD. What it actually delivers is a substantial improvement in dynamic range, and more of a loosening of the restrictions CD imposes on high-frequency response than a major extension of it. Even so, that is enough for most listeners to judge it clearly superior. However, a considerable body of opinion holds that by increasing the sample rate yet further, we can achieve a valuable extension of the high-frequency response. (In principle, we could also increase the dynamic range, but DSD is already capable of exceeding the dynamic range of real-world music signals.) People are already experimenting with doubling, quadrupling, and even octupling 1-bit sample rates. Terminology for these variants is settling on DSD128, DSD256, and DSD512 respectively (with actual DSD being referred to as DSD64). Why do this? Partly because we can. But, although it is early days yet, reports are emerging of listeners who declare them to be significantly superior.
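The names follow directly from the sample rates, each a multiple of the 44.1kHz CD rate. This short Python sketch simply tabulates them; the multipliers are taken from the names themselves, and the function name is hypothetical.

```python
CD_RATE_HZ = 44_100


def dsd_rates():
    """Sample rates of DSD64 (standard DSD) and its 2x, 4x and 8x variants."""
    return {f"DSD{m}": m * CD_RATE_HZ for m in (64, 128, 256, 512)}


for name, rate in dsd_rates().items():
    print(f"{name}: {rate:,} Hz ({rate / 1e6:.2f} MHz)")
```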

There are additionally formats, mostly proprietary ones which exist only ephemerally within DAC chips or pro-audio workstations, which replace the 1-bit quantization with multi-bit quantization. These have occasionally been referred to as “DSD-Wide”. I won’t go into that in much detail, but there are some interesting reasons you might want to use multi-bit quantizers. Some established authorities in digital audio, most notably Stanley Lipshitz of the University of Waterloo, have come out against DSD largely because of its 1-bit quantizers. Lipshitz’s most significant objection is a valid one. In order to create a DSD (in its broadest sense) bitstream, a Sigma Delta Modulator (SDM) is used. For these modulators to achieve the required level of audio performance, they must be high-order designs, using high-order loop filters to perform the Noise Shaping. High-order modulators turn out to be prone to instability if you use a 1-bit quantizer, but can be made stable by adopting a multi-bit quantizer. In practical terms, though, many of Lipshitz’s objections have been addressed in real-world systems, so I won’t pursue that topic further.
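Structurally, the only change in a multi-bit (“DSD-Wide”) modulator is the quantizer. The hypothetical Python sketch below is a first-order modulator whose quantizer can have any number of evenly spaced output levels. It is far too simple to exhibit the high-order instability discussed above; it only shows that, with more levels, each output sample lands much closer to the input while the long-term average still tracks it.

```python
def quantize(v, levels):
    """Uniform quantizer with `levels` evenly spaced outputs spanning [-1.0, 1.0]."""
    step = 2.0 / (levels - 1)
    index = round((v + 1.0) / step)
    index = max(0, min(levels - 1, index))  # clip to the available output levels
    return -1.0 + index * step


def sdm_first_order(signal, levels=2):
    """Toy first-order Sigma Delta Modulator; levels=2 is the 1-bit case."""
    out = []
    integrator = 0.0
    feedback = 0.0
    for x in signal:
        integrator += x - feedback
        feedback = quantize(integrator, levels)
        out.append(feedback)
    return out


signal = [0.3] * 1000
one_bit = sdm_first_order(signal, levels=2)
multi_bit = sdm_first_order(signal, levels=17)  # 17 levels: a step of 0.125

for name, out in (("1-bit", one_bit), ("17-level", multi_bit)):
    worst = max(abs(y - 0.3) for y in out)
    average = sum(out) / len(out)
    print(f"{name}: worst sample error {worst:.3f}, average {average:.3f}")
```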

But ever since SACD (which uses the DSD system) first came out, designers of DSD DACs have recognized that the DAC’s performance can be significantly improved by using one of the “extended-DSD” formats. So, internally, the majority of such chipsets convert the incoming DSD to their chosen “extended-DSD” format, and do the actual DAC work there. The conversion involves first passing the DSD bitstream through a low-pass filter, with the result being a PCM data stream in an ultra-high-resolution floating-point data format, sampled at 2.82MHz. This is then immediately oversampled to the required sample rate and converted to the “extended-DSD” format using a digital SDM. Unfortunately, because of all the high-frequency noise that Noise Shaping has pushed into the ultrasonic region, that low-pass filter must share some of the undesirable characteristics of the brick-wall filters that characterize all PCM formats. So it is likely that the proponents of DSD128, DSD256, and so forth are onto something, if those formats can be converted directly in the DAC without any “extended-DSD” reformatting.
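The first step of that internal conversion, low-pass filtering the bitstream into multi-bit PCM, can be sketched crudely in Python. Here a plain block average over 64 bits stands in for the low-pass filter, which also brings the rate down from 2.82MHz to 44.1kHz; this is an assumption for illustration only, since real converters use far sharper filters. A toy first-order modulator, not a real DSD encoder, generates the test bitstream from one slow sine cycle, and all names are hypothetical.

```python
import math


def sdm_1bit(signal):
    """Toy first-order 1-bit modulator, used here only to produce a test bitstream.
    Bits are represented as -1.0 / +1.0."""
    out, integrator, feedback = [], 0.0, 0.0
    for x in signal:
        integrator += x - feedback
        feedback = 1.0 if integrator >= 0.0 else -1.0
        out.append(feedback)
    return out


def boxcar_decimate(samples, factor):
    """Crude low-pass filter plus decimation: average each non-overlapping block of
    `factor` samples into one multi-bit output value."""
    return [sum(samples[i:i + factor]) / factor
            for i in range(0, len(samples) - factor + 1, factor)]


FACTOR = 64          # 2,822,400 Hz / 64 = 44,100 Hz
N = FACTOR * 200     # 200 output samples
signal = [0.5 * math.sin(2 * math.pi * n / N) for n in range(N)]

bits = sdm_1bit(signal)
pcm = boxcar_decimate(bits, FACTOR)
reference = boxcar_decimate(signal, FACTOR)  # the same filter applied to the input

print(len(pcm))                                         # 200
print(max(abs(p - r) for p, r in zip(pcm, reference)))  # under 0.1
```

With only first-order noise shaping and a block average for a filter, the residual error is far larger than in a real converter; the sketch shows the shape of the process, not its quality.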

I hope you found these posts which take a peek under the hood of DSD to be informative and interesting. Although the mathematics of PCM can be challenging at times, those of DSD are that and more, in spades. It is likely that progress in this field will continue to be made. In the meantime, condensing it into a form suitable for digestion by the layman remains a challenge of its own :)

Wednesday, 13 November 2013
