Mark Waldrep (aka ‘Dr. AIX’) has put a couple of DSD posts on his RealHD-Audio web site this month. Mark writes quite knowledgeably on audiophile matters, but is prone to a ‘you-can’t-argue-with-the-facts’ attitude predicated on an overly simplistic subset of what actually comprises ‘the facts’. In particular, Mark insists that 24-bit, 96kHz PCM is better than DSD, and one of the posts I am referring to discusses his abject bewilderment that 530 people (and counting) on the ‘Computer Audiophile’ blog would go to the trouble of participating in a thread which actively debates this assertion. He writes as though it were a self-evident ‘night-follows-day’ kind of an issue, almost a point of theology.
Let’s look at some of those facts. First of all, properly-dithered 24-bit PCM has a theoretical noise floor within a dB or so of 144dB below full scale, whereas DSD64 rarely approaches within even 20dB of that. No argument from me there. Also, he points out that DSD64’s noise shaping process produces a massive amount of ultrasonic noise, which starts to appear just above the audio band and continues at a very, very high level all the way out to over 1MHz, which, he argues, all but subsumes the audio signal unless it is filtered out. We’ll grant him some hyperbolic license, and agree that, technically, what he says is correct.
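For anyone who wants to check where the 144dB figure comes from, here is a minimal sketch. It assumes the simplest possible definition, namely the ratio between full scale and a single quantization step, and ignores the small corrections that dither and the choice of test signal would add. The function name is mine, purely for illustration.

```python
import math

def quantization_range_db(bits: int) -> float:
    """Ratio of full scale to one quantization step, expressed in dB."""
    return 20 * math.log10(2 ** bits)

# Each extra bit is worth roughly 6dB under this simple definition.
for bits in (16, 24):
    print(f"{bits}-bit: {quantization_range_db(bits):.1f} dB")
```

Under that assumption, 16 bits works out to a little over 96dB and 24 bits to a little over 144dB.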
Another ‘fact’ is, though, that much to Waldrep’s chagrin, there is a substantial body of opinion out there that would prefer to listen to DSD over 24/96. Why should this be, given that the above technical arguments (and others that you could also add into the mix with which I might also tend to agree) evidently set forth ‘the facts’? Yes, why indeed… and the answer is simple to state, but complex in scope. The main reason is that the pro-PCM arguments conveniently ignore the most critical aspect that differentiates the sound quality, which is the business of getting the audio signal into the PCM format in the first place. Let’s take a look at that.
If we are to encode an audio signal in PCM format, the most obvious way to approach the problem is using a sample-and-hold circuit. This circuit looks at the incoming waveform, grabs hold of it at one specific instant, and ‘holds’ that value for the remainder of the sampling period. By ‘holding’ the signal, we keep the value we want to measure steady for long enough to actually measure it.
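As a rough illustration of the idea, and nothing more than that, here is a sketch of an idealized sample-and-hold. The function name, the 1kHz test tone and the 8kHz sample rate are all assumptions of mine, and a real circuit has acquisition time and droop that this ignores.

```python
import math

def sample_and_hold(signal, sample_rate_hz, duration_s, substeps=4):
    """Return (time, held_value) pairs for an idealized sample-and-hold.

    The signal is grabbed once at the start of each sampling period and
    that value is then repeated for the rest of the period.
    """
    period = 1.0 / sample_rate_hz
    n_samples = int(round(duration_s * sample_rate_hz))
    out = []
    for n in range(n_samples):
        held = signal(n * period)              # grab at the sampling instant
        for k in range(substeps):              # hold through the period
            out.append((n * period + k * period / substeps, held))
    return out

def tone(t):
    """Hypothetical 1kHz sine test tone."""
    return math.sin(2 * math.pi * 1000 * t)

staircase = sample_and_hold(tone, sample_rate_hz=8000, duration_s=0.001)
```

The result is the familiar staircase: within each sampling period the value does not move, which is what gives the converter time to measure it.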
Next we have to assign a digital value to this sampled voltage, and there are a couple of distinct ways to do this. One technique involves comparing the sampled signal level to the instantaneous value of a sawtooth waveform generated by a precision clock. As soon as the comparator detects that the instantaneous value of the sawtooth waveform has exceeded the value of the sampled waveform, by looking at the number of clock cycles that have passed we can calculate a digital value for the sampled waveform. Another technique is a ‘flash ADC’ where a number of simultaneous comparisons are made to precise DC values, each being a unique digital level. Obviously, for a 16-bit ADC this would mean 65,535 comparator circuits! That’s doable, but rather expensive. Think of it as the ADC equivalent of an R-2R ladder DAC. Yet another method is a hybrid of the two, where a sequence of comparators successively home in on the final result through a series of successive approximations whose logic I won’t attempt to unravel here. Each of these methods is limited by the accuracy of both the timer clock and the reference voltage levels.
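For the curious, the three approaches can be sketched in a few lines each. This is an idealized model under my own assumptions (perfect comparators, a perfectly linear ramp, an input between zero and the reference voltage), and the function names are hypothetical rather than taken from any real converter.

```python
def flash_comparator_count(bits: int) -> int:
    """A flash ADC needs one comparator per threshold between adjacent codes."""
    return 2 ** bits - 1

def ramp_convert(v_in: float, v_ref: float, bits: int) -> int:
    """Count clock ticks until a rising ramp passes the held input voltage."""
    step = v_ref / 2 ** bits                 # ramp rise per clock tick
    ticks = 0
    while ticks < 2 ** bits - 1 and (ticks + 1) * step <= v_in:
        ticks += 1
    return ticks

def sar_convert(v_in: float, v_ref: float, bits: int) -> int:
    """Successive approximation: decide one bit per comparison, MSB first."""
    step = v_ref / 2 ** bits
    code = 0
    for bit in reversed(range(bits)):
        trial = code | (1 << bit)            # tentatively set this bit
        if trial * step <= v_in:             # keep it if we have not overshot
            code = trial
    return code

print(flash_comparator_count(16))            # 65535
```

The ramp version may need up to 2^N - 1 clock ticks per sample, the flash version needs 2^N - 1 comparators but only one step, and successive approximation needs just N comparisons. In this idealized model all three land on the same code; in real hardware each is only as good as its clock and its reference voltages.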
Ultimately, in mixed-signal electronics (circuits with both analog and digital functions), it ends up being far easier to achieve a clock of arbitrary precision than a reference voltage of arbitrary precision. Way more so, in fact. For this reason, sample-and-hold ADC architectures have fallen from favour in the world of high-end audio. Instead, a technique called Sigma-Delta Modulation is used. You will recognize this term - it is the architecture that is used to create the 1-bit bitstream used in DSD. The SDM-ADC has for all practical purposes totally eliminated the sample-and-hold architectures in audio applications.
In an SDM-ADC, the trade-off between clock precision and reference voltage precision is resolved entirely in favour of the clock, which can be made as accurate as we want. In effect, we increase the sample rate to something many, many times higher than what is actually required, and accept a significantly reduced measurement accuracy. The inaccuracy of the instantaneous measurements is taken care of by a combination of averaging due to massive over-sampling and local feedback within the SDM. That will have to do in terms of an explanation, because an SDM is a conceptually complex beast, particularly in its analog form. In any case, the output of the SDM is a digital bitstream which can be 1-bit, but in reality is often 3-5 bits deep. The PCM output data is obtained on-chip by a digital conversion process similar to that which happens within DSD Master.
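To give a flavour of how feedback plus averaging can rescue a very coarse measurement, here is a toy first-order, 1-bit modulator in the digital domain. It is a textbook simplification and an assumption on my part: real SDM-ADCs use higher-order analog loops, and the plain block average below is only a crude stand-in for a proper decimation filter. All names are hypothetical.

```python
def sigma_delta_1bit(samples):
    """Toy first-order modulator: inputs in [-1, 1], outputs +1.0 or -1.0."""
    integrator = 0.0
    bits = []
    for x in samples:
        bit = 1.0 if integrator >= 0.0 else -1.0   # 1-bit quantizer
        integrator += x - bit                      # feed back the error
        bits.append(bit)
    return bits

def block_average(bits, block):
    """Crude decimator: average each consecutive block of bits."""
    return [sum(bits[i:i + block]) / block
            for i in range(0, len(bits) - block + 1, block)]

# A steady input of 0.25, oversampled by an assumed factor of 64.
oversampled = [0.25] * 6400
bitstream = sigma_delta_1bit(oversampled)
pcm_like = block_average(bitstream, 64)
print(pcm_like[:3])
```

Each individual bit is a hopeless measurement of 0.25, being either +1 or -1, yet the averages sit very close to the true value. That is the trade of amplitude precision for time precision in miniature.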
As you know, if you are going to encode an analog signal in a PCM format, the price you have to pay is to strictly band-limit the signal to less than one half of the sample rate prior to encoding it. This involves putting the signal through a ‘brick wall’ filter which removes all of the signal above a certain frequency while leaving everything below that frequency untouched. In a sample-and-hold ADC this is performed using an all-analog filter located within the input stage of the ADC. In the SDM-ADC it is performed in the digital domain during the conversion from the 1-bit (or 3-5 bit) bitstream to the PCM output.
Brick wall filters are nasty things. Let’s look at a loudspeaker crossover filter as an example of a simple low-pass analog filter that generally can’t be avoided in our audio chain. The simplest filter is a single-stage filter with a cut-off slope of 6dB per octave (6dB/8ve). Steeper filters are considered to be progressively more intrusive due to phase disturbances which they introduce, although in practical designs steeper filters are often necessary to get around still greater issues elsewhere. Now compare that to a brick-wall ‘anti-aliasing’ filter. For 16/44.1 audio, this needs to pass all frequencies up to 20kHz, yet attenuate all frequencies above 22.05kHz by at least 96dB. That means a slope of at least 300dB/8ve is required.
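The arithmetic behind that slope figure is easy to check. The sketch below simply divides the required attenuation by the width of the transition band measured in octaves; the function name is mine, and treating the roll-off as a constant slope across the transition band is a simplifying assumption.

```python
import math

def required_slope_db_per_octave(passband_hz, stopband_hz, attenuation_db):
    """Average slope needed to reach the attenuation across the transition band."""
    octaves = math.log2(stopband_hz / passband_hz)
    return attenuation_db / octaves

slope = required_slope_db_per_octave(20_000, 22_050, 96)
print(f"{slope:.0f} dB/8ve")
```

The gap between 20kHz and 22.05kHz is only about a seventh of an octave, so under these assumptions the result comes out at well over 600dB/8ve, comfortably beyond the 300dB/8ve minimum quoted above.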
If we confine ourselves purely to digital anti-aliasing filters used in an SDM-ADC, a slope of 300dB/8ve inevitably requires an ‘elliptic’ filter. Whole books have been devoted to elliptic filters, so I shall confine myself to saying that these filters have rather ugly phase responses. In principle they also have a degree of pass-band ripple, but I am willing to stipulate to an argument that such ripple is practically inaudible. The phase argument is another matter, though. Although conventional wisdom has it that phase distortion is inaudible, there is an increasing body of anecdotal evidence that suggests the opposite is the case. One of the core pillars of Meridian’s recent MQA initiative is based on the assumed superiority of “minimum phase” filter architectures, for example.
By increasing the sample rate of PCM we can actually reduce the aggression required of our anti-aliasing filters. I have written a previous post on this subject, but the bottom line is that only at sample rates above the 8Fs family (352.8/384kHz) can anti-aliasing filters be implemented with sufficiently low phase distortion. And Dr. AIX pooh-poohs even 24/352.8 (aka ‘DXD’) as a credible format for high-end audio. Here at BitPerfect we are persuaded by the notion that the sound of digital audio is actually the sound of the anti-aliasing filters that are necessary for its existence, and that the characteristic that predominantly governs this is their phase response.
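To see how quickly the filter requirement relaxes as the sample rate rises, here is a small comparison. The assumptions are mine and are there only to keep the comparison simple: the pass band always ends at 20kHz, the stop band always starts at half the sample rate, and the required attenuation is held at 96dB throughout.

```python
import math

PASSBAND_HZ = 20_000      # assumed upper edge of the audio band
ATTENUATION_DB = 96       # assumed stop-band attenuation, held fixed

def transition_octaves(sample_rate_hz):
    """Width of the gap between the pass band and Nyquist, in octaves."""
    nyquist = sample_rate_hz / 2
    return math.log2(nyquist / PASSBAND_HZ)

slopes = {}
for rate in (44_100, 88_200, 176_400, 352_800):
    octaves = transition_octaves(rate)
    slopes[rate] = ATTENUATION_DB / octaves
    print(f"{rate / 1000:.1f}kHz: {octaves:.2f} octaves, "
          f"{slopes[rate]:.0f} dB/8ve")
```

Every doubling of the sample rate adds a full octave of breathing room, so the required slope drops from many hundreds of dB/8ve at 44.1kHz to a few tens of dB/8ve by 352.8kHz.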
PCM requires an anti-aliasing filter, whereas DSD does not (actually, strictly speaking it does, but it is such a gentle filter that you could not with any kind of a straight face describe it as a ‘brick-wall’ filter). DSD has no inherent phase distortion resulting from a required filter. Instead, it has ultrasonic noise, and this is where Dr. AIX’s argument encounters difficulties. The simple solution is to filter it out. However, if we read his post, he seems to think that no such filtering is used … I quote: "It’s supposed to be out of the audio band but there is no ‘audio band’ for your playback equipment". Seriously? All it calls for is a filter similar to PCM’s ‘anti-aliasing’ filter, except not nearly as rigorous in its requirements.
Let me tell you how DSD Master approaches this in our DSD-to-PCM conversions. We know that, for 24/176.4 PCM conversions for example, we need only concern ourselves in a strict sense with that portion of the ultrasonic noise above 88.2kHz. It needs to be attenuated by at least 144dB or we will get aliasing. However, the steepness of the filter and its phase response are governed by the filter’s cut-off frequency. For the filters we use, the phase response remains pretty much linear up to about 80% of this frequency. Therefore we have some design freedom to push this frequency out as far as we want, and we choose to place it at a high enough frequency that the phase response remains quasi-linear across the entire audio band. Of course, the further we push it out, the more of the ultrasonic noise is allowed to remain in the encoded PCM data.
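The numbers in that paragraph can be tied together with a couple of lines of arithmetic. This is only a back-of-envelope sketch: it takes the ‘about 80%’ figure at face value, assumes the audio band ends at 20kHz, and uses function names of my own invention. It is not a description of the actual DSD Master filter design.

```python
def nyquist_hz(sample_rate_hz):
    """Highest frequency a PCM stream at this sample rate can represent."""
    return sample_rate_hz / 2

def min_cutoff_hz(band_edge_hz, linear_fraction=0.8):
    """Lowest cut-off that keeps the band inside the quasi-linear phase region."""
    return band_edge_hz / linear_fraction

print(nyquist_hz(176_400))            # 88200.0
print(round(min_cutoff_hz(20_000)))   # 25000
```

So, under those assumptions, any cut-off from roughly 25kHz upwards keeps the whole audio band in the quasi-linear region, and everything between there and 88.2kHz is room in which to trade residual ultrasonic noise against filter gentleness.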
As an aside, you might well ask: If the ultrasonic noise is inaudible, then why do we have to filter it out in the first place? And that would indeed be a good question. According to auditory measurements, it is simple to determine that humans can’t hear anything above 20kHz - or even less as we age. However, more elaborate investigations indicate that we do respond subconsciously to ultrasonic stimuli that we cannot otherwise demonstrate that we hear. So it remains an interesting open question whether the presence of heavy ultrasonic content would actually have an impact on our perception of the sound. On the other hand, a lot of audio equipment is not designed to handle a heavy ultrasonic signal content. We know of one high-end TEAC DAC that could not lock onto a signal that contained even a modest -60dB of ultrasonic content (that problem, once identified, was quickly fixed with a firmware update). Such are probably as good reasons as any to want to filter it out.
So what do we do with the DSD content above 20kHz? In developing DSD Master we take the view that the content of this frequency range contains both the high-frequency content of the original signal (if any), plus the added high frequency noise created by the SDM’s noise-shaping process. We try to maintain any high frequency content within the signal flat up to 30kHz, and then begin our roll-off above that. Consequently, our DSD conversions at high sample rates (88.2kHz and above) do contain a significant ultrasonic peak in the 35-40kHz range. However, that peak is limited to about -80dB, which is way too low to either be audible(!) or to cause instability in anyone’s electronics. Meanwhile, the phase response is quasi-linear up to the point at which the ultrasonic noise rises above the signal level.
In designing DSD Master, we make those design compromises on the basis that the purpose of these conversions is to be used for final listening purposes. But if a similar functionality is being designed for the internal conversion stage of a PCM SDM-ADC then we know that a residual ultrasonic noise peak in the output data is not going to be acceptable. In our view, this means that design choices will be made which do not necessarily coincide with the best possible sound quality.
As a final point, all the above observations are specific to ‘regular’ DSD (aka ‘DSD64’). The problem with ultrasonic noise pretty much goes away with DSD128 and above, something I have also written about in detail in a previous post.
So, from the foregoing, purely from a logical point of view, it seems somewhat contradictory for Dr. AIX to suggest that 24/96 PCM is inherently better than DSD, since DSD comes directly out of an SDM in its native form, whereas PCM is derived through digital manipulation of an SDM output with, among other things, a ‘brick-wall’ filter with a less-than-optimal configuration. I’ll also point out that his argument suggests that DSD (i.e. the output of an SDM) will not deliver the full bit depth that he offers up as a key distinguishing feature of 24/96. Of course, those arguments apply only to ‘purist’ recordings which seek to capture the microphone output as naturally as possible. In that way the discussion is not coloured by any post-processing of the signal, which in any case is not possible in the native DSD domain.