Tuesday, 23 April 2013

The Nutty Professor

From some of my recent posts you will have observed that BitPerfect has been heavily involved in DSD over the past several weeks. DSD is a form of Sigma-Delta Modulation (SDM), which, as I have pointed out, is a mathematically challenging concept. Just to grasp its most basic form is quite an achievement in its own right, but as soon as you think you have got your head around it you learn that there are yet further wrinkles you need to understand, and it just goes on and on and on. It is very dense in its reliance on mathematics, and in fact you could earn a PhD studying and developing ever better forms of SDM, or coming up with newer and deeper understandings regarding distortion, stability, and the like.

For BitPerfect, we have been looking to find some “grown-up help”, in the form of a person or persons in the world of academia who can (a) help us to better understand the concepts; (b) help us to steer a path through the state-of-the-art in terms of both current implementations and the latest theoretical developments; (c) help us to avoid re-inventing wheels wherever possible; and (d) simply help to sort out facts from nonsense.  The last one of these is quite important – more so than you might imagine – because there is a lot of nonsense out there, mixed in with all the facts, and you really don’t want to waste brain cycles on any of it.

You would think it would be easy to develop the sort of relationships we are looking for, but not so. Facts and nonsense still get in the way. Take the Nutty Professor I recently met with. This gentleman is head of faculty of a group which calls itself something along the lines of Faculty of Digital Music Technology (I’m not going to identify this person). Our conversation got off on the wrong foot when, right off the bat, he insisted that DSD and PCM were in essence the same thing, and that you could losslessly convert between one format and the other (just as you can between FLAC and Apple Lossless). In his view, both were simply digital storage formats and so they HAD to have direct equivalence. He was quite adamant about this, but didn't want to justify it. I was to accept it as a fact. Since a significant element in what I was looking for was clarity of thought on matters such as precisely this, I came away from the encounter somewhat disappointed. At that point in time I wished I had the necessary understanding to present at least a simple argument to the Nutty Professor to counter his position, but I didn’t have one.

Today, I do – which I think is sufficiently elegant that I want to share it with you.  And I don’t think you need a background in mathematics to grasp it.

Refer to the graph below. I have plotted signal-to-noise ratio (SNR) as a function of frequency. The red line is a curve which is typical of DSD. It sits very low across the frequency range that is important for high quality music playback (20Hz – 20kHz), and rises very dramatically at higher frequencies. This is the famous Noise Shaping (that I described in yesterday’s post) in action. Superimposed upon that is the blue line representing PCM in its 24-bit 88.2kHz form. One simple way to interpret these curves is that each format is capable of fully encoding any musical signal at any point above its own line, and is incapable of fully encoding anything below that line.

Suppose we have music encoded in the DSD format, and we convert it to 24/88.2 PCM format. If we do this, all of the musical information represented by the hashed region labeled [A] must by necessity be lost. This information is encoded into the DSD data stream, but cannot be represented by the PCM data stream. Likewise, suppose we convert the 24/88.2 PCM data stream to DSD. In this case, all of the musical information represented by the hashed region labeled [B] must by necessity be lost. This information is encoded into the PCM data stream, but cannot be represented by the DSD data stream. Regardless of whether we are converting DSD to PCM or the other way round, information is being lost.

Of course, there is an argument to be made regarding lost information, that if it represents something inaudible, then we can afford to throw it away. In the example I have shown, the information contained in both the [A] and [B] regions is arguably inaudible. But don’t tell me that the conversion is lossless. With a computer it is quite trivial to convert back and forth as often as you like between FLAC and Apple Lossless. You can do it hundreds, thousands, even millions of times (if you are prepared to wait) and the music will remain unchanged. Do the same thing between DSD and 24/88.2 PCM, and even after a hundred cycles the music will be all but unlistenable.
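
For anyone who would like to poke at the round-trip argument on their own machine, here is a rough Python sketch. It is a toy under heavy assumptions, not a model of any real converter: zlib stands in for a lossless codec such as FLAC or Apple Lossless, the "DSD" side is a bare first-order sigma-delta modulator at 64x oversampling, and the decoder is a crude block average. All of the function names are made up for this illustration.

```python
import math
import zlib

OSR = 64  # assumed oversampling ratio for the toy 1-bit stream


def quantize16(samples):
    """Round floats in [-1, 1] to 16-bit signed integers."""
    return [max(-32768, min(32767, round(s * 32767))) for s in samples]


def sdm_encode(pcm, osr=OSR):
    """Toy first-order sigma-delta modulator: floats -> stream of +1/-1."""
    bits, integrator, feedback = [], 0.0, 0.0
    for x in pcm:
        for _ in range(osr):  # zero-order-hold upsampling
            integrator += x - feedback
            feedback = 1.0 if integrator >= 0.0 else -1.0
            bits.append(feedback)
    return bits


def sdm_decode(bits, osr=OSR):
    """Crude decimator: average each block of `osr` bits."""
    return [sum(bits[i:i + osr]) / osr for i in range(0, len(bits), osr)]


def lossless_round_trips(samples, cycles):
    """Compress and decompress repeatedly (zlib as a stand-in codec)."""
    data = b"".join(s.to_bytes(2, "little", signed=True) for s in samples)
    for _ in range(cycles):
        data = zlib.decompress(zlib.compress(data))
    return [int.from_bytes(data[i:i + 2], "little", signed=True)
            for i in range(0, len(data), 2)]


def sdm_round_trips(samples, cycles):
    """Convert PCM -> 1-bit stream -> PCM repeatedly."""
    current = list(samples)
    for _ in range(cycles):
        floats = [s / 32767 for s in current]
        current = quantize16(sdm_decode(sdm_encode(floats)))
    return current


def rms_error(a, b):
    """Root-mean-square difference between two equal-length sample lists."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


# 10 ms of a 1 kHz sine at half scale, sampled at 44.1 kHz
original = quantize16(0.5 * math.sin(2 * math.pi * 1000 * n / 44100)
                      for n in range(441))

if __name__ == "__main__":
    print("lossless, 100 cycles identical:",
          lossless_round_trips(original, 100) == original)
    for cycles in (1, 10, 100):
        err = rms_error(original, sdm_round_trips(original, cycles))
        print(f"1-bit round trip, {cycles} cycles: RMS error = {err:.1f}")
```

The lossless loop hands back exactly the samples it was given, however many times it runs. The 1-bit loop does not, even after a single pass. The actual error figures come from the toy modulator and filter and should not be read as a measurement of real DSD or PCM equipment; the point is only that one kind of conversion is an identity and the other is not.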

The Nutty Professor will not be advising BitPerfect.