The Lucy Show
To the extent that I can claim to be a qualified anything, that would be a Physicist. I am not a biologist, geneticist, anthropologist, or theologian for that matter. But it doesn’t stop my mind from wandering into these areas from time to time. And recently, I have been thinking a bit about evolution.
While working out on my cross-trainer I like to watch a TV show to alleviate the boredom. Preferably something that will interest me sufficiently to extend my workout to the point where it actually does me some good. Recently, I saw a show that made some unconvincing point or other about “Lucy” - the one who was supposedly an Australopithecus afarensis who lived some 3-odd million years ago. The gist of the program seemed to be founded on the notion that Lucy was a common ancestor of all humanity, and that we are therefore all her direct descendants. It is all speculation, of course, but it got me to thinking about what it means to be descended from someone or something. Because, unless you buy into some sort of creation theory, we all, ultimately, have to be descendants of something that first appeared in the primordial ooze. And I got to thinking about that.
The first thing that strikes me is the notion of being descended from someone. Usually, long-chain blood lines descend down from a given person, and not up to him or her. This is because the historic record is light on the general, and heavy on the specific. So, if you want to trace your ancestry, you probably won’t have to go back very far before you’ll strike out, with apparently no traces remaining of any records regarding certain individuals. In my case, my family tree peters out after only three or four generations. Perhaps not a bad thing, I sometimes think.
But one thing we can be pretty sure of is that every human being who ever lived had one extant mother and one extant father (OK, all but one, if that is a point you want to argue). So, starting with myself, I can say with certainty that I had two parents, four grandparents, eight great-grandparents, sixteen great-great grandparents, and so on. I may never know who they all were, but I know they did exist.
You can imagine a hypothetical map of the world on your computer screen, with a slider that controls the date going back as far in history as you want. As you move the slider, a pixel lights up showing the whereabouts of every one of your direct ancestors who was alive on the corresponding date. If such a thing could ever exist, I wonder what it would show. My father is a Scot and my mother Austrian, so I imagine that for the first few hundred years or so mostly Scotland and Austria would be lit up.
For all of my most recent ancestors, it is a fair assumption that they are all distinct individuals. In other words, that there was no cross-pollination (if I may put it that way) no matter how far removed the individuals were in my family tree. Realistically, though, that approximation is going to falter if you include a sufficient number of generations. Therefore, as we go further back in time the net number of my ancestors stops growing at an exponential rate. But while the number of my ancestors grows with the steady rolling back of the clock, the total number of humans on the planet shrinks. The growing ancestral base and the shrinking overall population must surely meet somewhere.
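To put a rough number on where those two curves could meet, here is a minimal sketch of the doubling arithmetic. The population figure and the generation length are my own round-number assumptions, not measured values, and the sketch deliberately ignores the overlap between ancestors discussed above.

```python
def generations_until_ancestors_exceed(population):
    """Smallest n for which 2**n nominal ancestors exceed `population`."""
    n = 0
    while 2 ** n <= population:
        n += 1
    return n


WORLD_POPULATION = 7_000_000_000  # assumed round figure for today's population
YEARS_PER_GENERATION = 25         # assumed average generation length

n = generations_until_ancestors_exceed(WORLD_POPULATION)
print(n, "generations, or roughly", n * YEARS_PER_GENERATION, "years")
```

With those assumed figures the nominal ancestor count overtakes even today's population after 33 generations. Since the population of the past was smaller than today's, the real crossover would come sooner than that.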
It seems reasonable that the same line of logic should apply to us all. In other words, if we go back far enough in time, we should find that we are all descended from the same group of humans, no matter how disparate geographically, culturally, or any-other-ally. But this group would not comprise all of humanity at that point in time. Some of those individuals will die childless. Others will bear children who will die childless, and so forth. So the entire human race at that point will comprise two groups of people. Those who are the direct ancestors of every living person in the world today, and those whose bloodlines died out completely in the intervening millennia.
So the questions that I arrived at were these. What would the expected ratio be of ancestors to non-ancestors? Would we expect it to be a relatively large percentage or a relatively small percentage? And in particular, if the latter, what would it take for that small percentage to actually be one person? Is that even possible? I have never seen this line of thinking expanded upon, but one thing I have learned is that whenever something like that crosses my mind, it has always previously crossed the mind of someone who is a proper expert in the field. Maybe one day I’ll get to hear that expert’s opinion. But, in the meantime, it seems highly improbable to me that the ancestral percentage would even be a minority, let alone a minority of one.
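For anyone who wants to play with the question, a toy simulation is one way to explore it. The sketch below is built entirely on simplifying assumptions of mine: a constant population size, generations that do not overlap, and every child picking two distinct parents at random from the previous generation. Real populations are nothing like this, so treat whatever it prints as an illustration of the idea and not as an answer.

```python
import random


def count_ancestors(pop_size=200, generations=40, seed=1):
    """Return (founders everyone descends from, founders with any descendant)."""
    rng = random.Random(seed)
    # Each individual is represented by the set of founders it descends from.
    current = [{i} for i in range(pop_size)]
    for _ in range(generations):
        children = []
        for _ in range(pop_size):
            mother, father = rng.sample(range(pop_size), 2)
            children.append(current[mother] | current[father])
        current = children
    universal = set.intersection(*current)  # ancestors of every living person
    surviving = set.union(*current)         # ancestors of at least one person
    return len(universal), len(surviving)


universal, surviving = count_ancestors()
print("founders who are ancestors of everyone:", universal)
print("founders with at least one living descendant:", surviving)
```

Varying pop_size, generations, and seed shows how the split between universal ancestors and extinct bloodlines behaves under these particular assumptions.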
The next point concerns what you might term the crossing of the man-ape barrier. This troubles a lot of people. Scientists dig up ancient skeletons and fossils and assign them to categories such as human, proto-human, and ape. Actually, they are a lot more scientific about it, but you get my drift. The theory of evolution provides a mechanism or road-map for the development of ape into proto-human and proto-human into human, but has little to say on the specifics. Meanwhile, all we have in our historical record are a seriously limited number of archaeological specimens that we can do little with other than to fit them into a timeline.
The transformation of proto-human into human took - I don’t know - let’s call it a million years. Yet we only have specific archaeological specimens - for example the proto-human and the human. We ordinary folk look at them - and also at the artists’ renderings of what the original individuals may have looked like - and many people have a hard time grasping how it is at all possible for one to become the other. Of course, if we had a perfect fossil record - say, one for every thousand years over the span of that million years - we might be able to understand and communicate convincingly how the development played out. But we don’t. And so we can’t. We can just make guesses - albeit highly-informed and very well-educated guesses.
These things happened over timescales so vast that all of recorded history is just a blink right at the end. It is wrong to think of evolution as a set of stable eras characterized by specific inhabitant species, separated by periods of transition. Certainly major transformative periods did occur, such as the Permian/Triassic, Triassic/Jurassic, and Cretaceous/Tertiary (K/T) boundaries, but in general, for the last 66 million years, evolution has been a continuous thing. We are evolving today as a species at least as quickly as - if not orders of magnitude faster than - our ancestors did as they transitioned from proto-human to human. We’ve just not been around for long enough to be able to observe it.
Let’s close with my computerized ancestral map, and slide the time dial back to the age when proto-humans were evolving into humans. Assuming that all humanity does not derive from The Lucy Show after all, my ancestral map will become a map of proto-human occupation. Slide it back a bit more and it will reflect ape occupation. Slide it back even further - and then what? At that point as far as I can tell all science has to offer is speculative at best. Apes date back to the Cretaceous period. So, although we pretty much certainly were not descended from dinosaurs, it seems likely that some of our ancestors will have been eaten by them (although if a T-Rex ate some of my relatives, it would perish from alcohol poisoning). On the other hand, the well-known Dimetrodon - a lizard-like creature characterized by a spiny sail along its back, and recognized by five-year-olds everywhere - is quite possibly an ancestor of today’s mammals.
At the far end of its travel, my ancestral map ends up in the primordial soup, presumably as a population of bacteria. But if it did, then so did yours!…
Monday 22 September 2014
Thursday 18 September 2014
OS X 10.9.5
While waiting for the results of the Scottish Independence referendum to trickle out, I installed the latest OS X 10.9.5 and gave it a quick workout. So far so good. I can see no reason why BitPerfect Users should not upgrade.
Of course, like the Scottish Referendum, it may not look so rosy by tomorrow. :)
Friday 12 September 2014
Has DSD met its Waterloo?
In May of 2001, Stanley Lipshitz and John Vanderkooy of the University of Waterloo, in Canada, published a paper titled “Why 1-bit Sigma-Delta Conversion is Unsuitable for High-Quality Applications”. In the paper’s Abstract (a kind of introductory paragraph summing up what the paper is all about) they made some unusually in-your-face pronouncements, including “We prove this fact.”, and “The audio industry is misguided if it adopts 1-bit sigma-delta conversion as the basis for any high quality processing, archiving, or distribution format…”. DSD had, apparently, met its Waterloo.
What was the basis of their arguments? Quite simple, really. They focussed on the problem of dither. As I mentioned in an earlier post, with a 1-bit system the quantization error is enormous. We rely on dither to eliminate it, and we can prove mathematically that TPDF dither at a depth of ±1LSB is necessary to deal with it. But with a 1-bit system, ±1LSB exceeds the full modulation depth. Applying ±1LSB of TPDF dither to a 1-bit signal will subsume not only the distortion components of the quantization error, but also the entire signal itself. Lipshitz and Vanderkooy study the phenomenon in some detail.
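The mismatch is easy to see with a little arithmetic. The sketch below compares the peak-to-peak span of ±1LSB dither with the full range of the quantizer at various bit depths. The function name and the choice of bit depths are mine, purely for illustration.

```python
def dither_to_range_ratio(bits):
    """Span of +/-1 LSB dither divided by the quantizer's full range."""
    full_range_lsb = 2 ** bits - 1  # lowest to highest level, measured in LSBs
    dither_span_lsb = 2             # +/-1 LSB is 2 LSBs peak to peak
    return dither_span_lsb / full_range_lsb


for bits in (24, 16, 8, 1):
    print(bits, "bit:", dither_to_range_ratio(bits))
```

At 16 bits the dither occupies a tiny fraction of the available range. At 1 bit the two quantization levels are only one LSB apart, so the dither spans twice the entire range.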
They then go on to characterize the behaviour of SDMs. SDMs and noise shapers are more or less the same thing. I described how they work a couple of posts back, so you should read that if you missed it first time round. An SDM goes unstable (or ‘overloads’) if the signal presented to the quantizer is so large as to cause the quantizer to clip. As Lipshitz and Vanderkooy observe, a 1-bit SDM must clip if it is dithered at ±1LSB. In other words, if you take steps to prevent it from overloading, then those same steps will have the effect that distortions and other unwanted artifacts can no longer be eliminated.
They also do some interesting analysis to counter some of the data shown by the proponents of DSD, which purport to demonstrate that by properly optimizing the SDM, any residual distortions will remain below the level of the noise. Lipshitz and Vanderkooy show that this is a limitation of the measurement technique rather than the data, and that if the signal is properly analyzed, the actual noise levels are found to be lower but the distortion levels are not, and do in fact stand proud of the noise.
Lipshitz and Vanderkooy do not suggest that SDMs themselves are inadequate. The quantizer at the output of an SDM is not constrained to being only a single-bit quantizer. It can just as easily have a multi-bit output. In fact they go on to state that “… a multi-bit SDM is in principle perfect, in that its only contribution is the addition of a benign … noise spectrum”. This, they point out, is the best that any system, digital or analog, can do.
The concept of a stable SDM with a multi-bit output is what underlies the majority of chipset-based DAC designs today, such as those from Wolfson, ESS, Cirrus Logic, and AKM. These types of DAC upsample any incoming signal - whether PCM or DSD - using a high sample rate SDM with a small number of bits in the quantizer - usually not more than three - driving a simplified multi-bit analog conversion stage.
Lipshitz and Vanderkooy’s paper was of course subjected to counter-arguments, mostly (but not exclusively) from within the Sony/Philips sphere of influence. This spawned a bit of thrust and counter-thrust, but by and large further interest within the academic community completely dried up within a very short time. The prevailing opinion appears to accept the validity of Lipshitz and Vanderkooy from a mathematical perspective, but is willing to also accept that once measures are taken to keep any inherent imperfections of 1-bit audio below certain presumed limits of audibility, 1-bit audio bitstreams can indeed be made to work extremely well.
Where we have reached from a theoretical perspective is the point where our ability to actually implement DSD in the ADC and DAC domains is more of a limiting factor than our ability to understand the perfectibility (or otherwise) of the format itself. Most of the recently published research on 1-bit audio focuses instead on the SDMs used to construct ADCs. These are implemented in silicon on mixed-signal ICs, and are often quite stunningly complex. Power consumption, speed, stability, and chip size are the areas that interest researchers. From a practicality perspective, 1-bit audio has broad applicability and interest beyond the limited sphere of high-end audio, which alone cannot come close to justifying such an active level of R&D. Interestingly though, few and far between are the papers on DACs.
For all that, the current resurgence of DSD which has swept the high-end audio scene grew up after the Lipshitz and Vanderkooy debate had blown over. Clearly, the DSD movement did NOT meet its Waterloo in Lipshitz and Vanderkooy. Its new-found popularity is based not on arcane adherence to theoretical tenets, but on broadly-based observations that, to many ears, DSD persists in sounding better than PCM. It is certainly true that the very best audio that I personally have ever heard was from DSD sources, played back through the Light Harmonic Da Vinci Dual DAC. However, using my current reference, a PS Audio DirectStream DAC, I do not hear any significant difference at all between DSD and the best possible PCM transcodes.
There is no doubt in my mind that we haven’t heard the last of this. We just need to be true to ourselves at all times and keep an open mind. The most important thing is to not allow ourselves to become too tightly wed to one viewpoint or another to the extent that we become blinkered.
Thursday 11 September 2014
iTunes 11.4
I have been testing the latest iTunes update (11.4) and it seems to be working fine. Initially I was concerned that it had caused my RMBP to grind to a halt and require a re-boot, but this did not repeat on my Mac Mini so I am thinking that was an unrelated issue. It has been working fine on both Macs since then.
It looks like BitPerfect users can install it with confidence.
DSD64 vs DSD128
As a quick follow-up to my post on noise shaping, I wanted to make some comments on DSD playback. DSD’s specification flies quite close to the edge, in that its noise shaping causes ultrasonic noise to begin to rise almost immediately above the commonly accepted upper limit of the audio band (20kHz). This means that if DSD is directly converted to analog, the analog signal needs to go through a very aggressive low-pass filter which strips off this ultrasonic noise while leaving the audio frequencies intact. Such a filter is very similar in its performance to the anti-aliasing filters required in order to digitally sample an analog signal at 44.1kHz. These aggressive filters almost certainly have audible consequences, although there is no widely-held agreement as to what they are.
In order to get around that, the playback standard for SACD players provides for an upsampling of the DSD signal to double the sample rate, which we nowadays are referring to as DSD128. With DSD128 we can arrange for the ultrasonic noise to start its rise somewhere north of 40kHz. When we convert this to analog, the filters required can be much more benign, and can extend the audio band’s flat response all the way out to 30kHz and beyond. Many audiophiles consider such characteristics to be more desirable. By the way, we don’t have to stop at DSD128, nor do we have to restrict ourselves to 1-bit formats, but those are entirely separate discussions.
If that was all there was to it, life would be simple. But it isn’t. The problem is that the original DSD signal (which I shall henceforth refer to as DSD64 for clarity) still contains ultrasonic noise from about 20-something kHz upwards. This is now part of the signal, and cannot be unambiguously separated from it. If nothing is done about it, it will still be present in the output even after remodulating it to DSD128. So you need to filter it out before upsampling to DSD128, using a filter with similar performance to the one we just discussed and trashed as a possible solution in the analog domain.
The saving grace is that this can now be a digital filter. There are three advantages that digital filters have over analog filters. The first is that they approach very closely the behavior of theoretically perfect filters, something which analog filters do not. This makes the design of a good digital filter massively easier as a practical matter than that of an equivalent analog filter. The second advantage is that digital filters have a wider design space than analog filters, and some performance characteristics can be attained using them that are not possible using analog filters. The third advantage is that analog filters are constructed using circuit elements which include capacitors, inductors, and resistors - components which high-end audio circuit designers will tell you can (and do) contribute to the sound quality. Well-designed digital filters have no equivalent sonic signatures.
So - good news - the unwanted ultrasonic noise can be filtered out digitally with less sonic degradation than we could achieve with an analog filter. Once the DSD64 is digitally filtered, it can be upsampled to 5.6MHz and processed into DSD128 using a Sigma-Delta Modulator (SDM). It is an unresolved question in digital audio whether an SDM introduces any audible sonic degradation. Together with the question of whether a 1-bit representation is adequate for the purposes of high-fidelity representation of an audio signal, these are the core technical issues at the heart of the PCM-vs-DSD debate.
So the difference between something that was converted from DSD64 to DSD128, and something that was recorded directly to DSD128, is that the former has been filtered to remove ultrasonic artifacts adjacent to the audio frequency band, and the latter has not. If DSD128 sounds better than DSD64 it is because it dispenses with that filtering (and re-modulation) requirement. Such arguments can be further extended to DSD256, DSD512, and the like. The higher the 1-bit sampling frequency, the further the onset of ultrasonic noise can be pushed away from the audio band, and the more benign the filtering can be to remove it for playback.
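For reference, the number in each DSD name is simply a multiple of the 44.1kHz CD sample rate, which is where the 5.6MHz figure for DSD128 comes from. A quick sketch of the arithmetic (the function name is mine):

```python
BASE_RATE_HZ = 44_100  # CD sample rate, the base that the DSD multiples refer to


def dsd_rate_hz(multiple):
    """1-bit sample rate in Hz for DSD64, DSD128, and so on."""
    return multiple * BASE_RATE_HZ


for multiple in (64, 128, 256, 512):
    print(f"DSD{multiple}: {dsd_rate_hz(multiple) / 1e6:.4f} MHz")
```

DSD64 works out to 2.8224MHz and DSD128 to 5.6448MHz, with each further step doubling the rate again.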
It is interesting to conclude with the observation that, unlike the situation with 44.1kHz PCM, DSD64 allows the encoded signal to retain its frequency spectrum all the way out to 1MHz and beyond, if you wanted. By contrast, 44.1kHz PCM requires the original analog signal itself to be strictly filtered to eliminate all content above a meager 22.05kHz. DSD64 retains the full bandwidth of the signal, but allows it to be submerged by extremely high levels of added noise. In the end you still have to filter out the noise - and any remaining signal components with it - but at least the original signal is still present.
Tuesday 9 September 2014
Noise Shaping - What Does It Do?
If you read these posts regularly you will know that any attempt to digitize a music signal - to reduce it to a finite numerical representation - will ultimately result in some degree of quantization error. This is because you cannot represent the waveform with an absolute degree of precision. It is a bit like trying to express 1/3 as a decimal (0.3333333…) - the more 3’s you write, the more accurate it is - the less the ‘quantization error’ - but the error is still there. This quantization error comprises both noise and distortion. The distortion components are those which are related to the signal itself (mathematically we use the term ‘correlated’), and generally can be held to represent sonic defects. The noise is unrelated to the signal (mathematically ‘uncorrelated’) and represents the sort of background noise that we can in practice “tune out” without it adversely affecting our perception of the sound quality.
The way to eliminate the distortion caused by quantization is simply to add some more noise to it. But not so much as to totally subsume the distortions. It turns out that if we add just the right amount of noise, it doesn’t so much bury the distortion as cause it to shrink. If we do it right, it shrinks to a level just below the newly-added noise floor. Of course, this noise floor is now slightly higher than before, but this is perceived to sound better than the lower noise level with the higher distortion. The process of deliberately adding noise is called ‘dither’, and we can mathematically analyze exactly how much noise, and what type of noise, is necessary to accomplish the desired result. The answer is ‘TPDF’ dither (it doesn’t matter if you don’t know what that means) at the level of the Least Significant Bit (LSB). This means that the greater the Bit Depth of your signal, the less the amount of noise you have to add to ensure the absence of distortion components in the quantization error.
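A toy experiment gives a feel for what the dither is doing. The sketch below quantizes a constant value to the nearest whole number (so 1 LSB = 1.0), with and without TPDF dither, and reports the average error. The function names, the test values, and the use of two summed uniform random numbers to generate the triangular noise are my own illustrative choices, and this is a numerical check, not the mathematical proof.

```python
import math
import random


def quantize(x):
    """Round to the nearest integer level (1 LSB = 1.0)."""
    return math.floor(x + 0.5)


def tpdf_dither(rng):
    """Triangular-PDF noise spanning +/-1 LSB: the sum of two uniform values."""
    return rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)


def mean_error(x, dithered, trials=100_000, seed=0):
    """Average quantization error for a constant input x."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        d = tpdf_dither(rng) if dithered else 0.0
        total += quantize(x + d) - x
    return total / trials


for x in (0.1, 0.3, 0.45):
    print(x, round(mean_error(x, False), 3), round(mean_error(x, True), 3))
```

Without dither the error is locked to the signal: an input of 0.3 always comes out as 0, so the error is always -0.3. With dither the individual errors are larger, but they average out to approximately zero whatever the input value, which is the sense in which the error has been turned from distortion into noise.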
Explaining and understanding exactly why that works is beyond the scope of this post, but I should point out that the analysis leads to some deeper and more profound insights, the implications of which I want to talk about. Essentially, the idea of dither is this: when you digitize an analog signal (or reduce the bit depth of a digital signal - same thing) you are not constrained to always choose the nearest quantization level. Sometimes good things can happen if you instead choose a different quantization level, as we shall see.
One thing that is easy to grasp is the concept of averaging. If you count the number of people who live in a house, the answer is always an integer number. But if you average over several houses, the average number of occupants can be a fractional number - for example 2.59. Yet you will never look in an individual house and see 2.59 people. It is the same with digital audio. By measuring something multiple times, you can get an “average” value, which has more precision than the bit depth with which the values are measured. In digital audio we call this “oversampling”.
Recall also that in order to digitally sample an analog waveform, we need a sample rate which is at least twice that of the highest frequency present in the waveform. An audio waveform contains many frequencies, ranging from deep bass to high treble, so the sampling frequency must be at least twice that of the highest treble frequencies. Clearly, therefore, the sampling frequency is going to be many, many times higher than what we would need to capture the lower frequencies alone. You could argue therefore, that the lowest frequencies are highly oversampled, and that the possibility therefore ought to exist to record their content at a precision which, thanks to "averaging", is greater than the nominal bit depth. And you would be right.
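To see just how oversampled the low frequencies are, divide the actual sample rate by the minimum rate a given frequency would need, which is twice that frequency. The function name and the example frequencies below are my own choices:

```python
def oversampling_ratio(sample_rate_hz, frequency_hz):
    """How many times faster we sample than this frequency strictly requires."""
    return sample_rate_hz / (2 * frequency_hz)


for frequency_hz in (20_000, 1_000, 100):
    print(frequency_hz, "Hz:", oversampling_ratio(44_100, frequency_hz))
```

At 44.1kHz a 20kHz tone is barely oversampled at all, while a 100Hz bass note is sampled more than 200 times faster than it needs to be.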
Noise shaping takes advantage of the fact that the lower frequencies are inherently over-sampled, and allows us to push the background noise level at these lower frequencies down below what would otherwise be the limit imposed by the fixed bit depth. In fact it even allows us to encode signals below the level of the LSB, right down to that noise floor. You would think it wouldn’t be possible, but it is, because of the fact that the low frequencies as quite highly oversampled. In effect, you can think of the low frequency information as being encoded by averaging it over a number of samples. In reality it is a lot more complicated than that, but that simplistic picture is essentially correct.
Like playing a flute, actually doing the noise shaping is a lot more difficult than talking about how to do it. A noise shaping circuit (or, in the digital domain, algorithm) is conceptually simple. You take the output of the quantizer and subtract it from its input. The result is the quantization error. You pass that through a carefully designed filter and subtract its output in turn from the original input signal. You are in effect putting the quantization error into a negative feedback loop. In designing such a noise shaper, though, you must not ask it to do the impossible, otherwise it won’t work and will go unstable. What you must do is recognize that only the low frequencies can benefit from noise shaping, so the filter must be a low-pass filter, and pass only the low frequency components of the quantization error through the feedback loop. This negative feedback in effect tries to reduce the quantization error only at those low frequencies.
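The loop described above can be sketched in a few lines. This is only a toy under stated assumptions: the "carefully designed filter" is replaced by the simplest possible choice, a one-sample delay (a first-order error-feedback shaper), the quantizer rounds to whole steps, and TPDF dither is added ahead of it. The names are hypothetical.

```python
import math
import random

def noise_shape(samples, seed=1):
    """Requantize to whole steps (1 step = 1 LSB) with first-order error feedback."""
    rng = random.Random(seed)
    previous_error = 0.0
    output = []
    for x in samples:
        shaped_input = x - previous_error             # subtract the fed-back error from the input
        dither = rng.random() - rng.random()          # TPDF dither, within +/- 1 LSB
        y = math.floor(shaped_input + dither + 0.5)   # the quantizer
        previous_error = y - shaped_input             # quantizer output minus its (undithered) input
        output.append(y)
    return output

level = 0.3                          # a constant signal well below 1 LSB
shaped = noise_shape([level] * 1000)
average = sum(shaped) / len(shaped)
print(average)
```

Every output sample is a whole number, yet the errors telescope: each one is cancelled by the correction applied to the next sample, so the sum of the output can differ from the sum of the input only by the last error term. Over 1000 samples the average therefore lands within a couple of thousandths of 0.3, which is the "averaging" picture in action.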
But there’s no free lunch. All of those quantization errors can’t just be made to go away. Because of the low-pass filter, the higher frequency components of the quantization error are not subject to the same negative feedback and so the actual quantization error becomes dominated by high frequency components. The low frequency components end up being suppressed at the expense of increases in the high frequency components. This is why it is called “Noise Shaping”. It would be more accurate to refer to it as “Quantization Error Shaping”, but that trips less fluidly off the tongue. What we have done is to select quantization levels that are not necessarily those with the lowest individual quantization error, but as a result have nonetheless ended up with an improved performance.
At this point, a good question might be to ask just how much we can suppress the quantization error noise? And there is an answer to that. It is referred to as “Gerzon & Craven”, after the authors who published the first analysis of the subject in 1989. What Gerzon & Craven says is that if we plot the quantization noise on a dB scale against the frequency on a linear scale, as we use noise shaping to push the quantization noise floor down at the low frequency end, we plot out a new curve. There is an area that appears between the old and new curves. Then, at higher frequencies, noise shaping requires us to pull the noise floor up above the existing noise floor. Again, an area appears between the old curve and the new one. Gerzon and Craven tells us that the two areas must be equal. Since there is a fundamental limit on how high we can pull up the high frequency noise floor (we can’t pull it up higher than 0dB), it follows that there is a practical limit on how much we can push down the low frequency noise. In practice, however, too high a degree of noise shaping requires highly aggressive filters, and these can end up dominating the issue due to practical problems of their own.
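For readers who prefer symbols, the equal-areas statement can be written roughly as follows. The notation is my own assumption, not taken from the original paper: N(f) is the factor by which the shaper multiplies the noise amplitude at frequency f, f_s is the sample rate, and f_N is the Nyquist frequency.

```latex
\[
  \frac{1}{f_N}\int_{0}^{f_N} 20\log_{10}\bigl|N(f)\bigr|\,\mathrm{d}f \;\ge\; 0 ,
  \qquad f_N = \tfrac{1}{2} f_s ,
\]
```

Equality holds when the shaping filter is minimum phase, which is the equal-areas case: every dB of noise pushed down below the old floor at some frequencies has to reappear above it at others. Any other filter does worse, with more area above the line than below it.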
For a lot of applications, the high frequency area overlaps with the signal bandwidth. A perfect example is 16/44.1 “red book” audio. The high frequency area goes up to 22.05kHz, of which the audio bandwidth is taken to comprise up to 20kHz. Any noise shaping done on 16/44.1 audio must therefore introduce audible high frequency noise. It must therefore be done - if it is done at all - very judiciously.
There are two very important things to bear in mind about noise shaping. The first is that the high frequency content is crucial to both the low frequency noise suppression and low-level signal encoding. In a real sense, those effects are actually encoded by the high frequency noise itself. If you were to pass the noise-shaped signal through a low-pass filter that cuts out only the high frequency noise, then as soon as you re-quantized the output of the filter to the bit depth of the original signal, all of that information would be lost again.
The second thing is that the noise-shaped noise is now part of the signal, and cannot be separated out. This is of greatest importance in applications such as 16/44.1 where the signal and the shaped noise share a part of the spectrum. Every time you add noise-shaped dither to such a signal as part of a processing stage, you end up adding to the high frequency noise. Considering that noise shaping may easily add 20dB of high frequency noise, this is a very important consideration.
All this is fundamental to the design of DSD, which is built upon the foundation of noise shaping. A 1-bit bitstream has a noise floor of nominally -6dB, which is useless for high quality audio. But if we can use noise-shaping to push it down to, say, -120dB over the audio bandwidth, then all of a sudden it becomes interesting. In order to do that, we would need an awful lot of high frequency headroom into which we can shape all the resultant noise. Additionally, we only have an absolute minimum of headroom into which we can push all this noise. We will need something like 1,000kHz of high frequency space in which to shape all this noise. Enter DSD, which has 1.4MHz available, and practical SDMs can just about be designed to do the job.
If we can double the sample rate of DSD and get what we now refer to as DSD128, or even increase it further to DSD256, DSD512, etc, then we can not only suppress the noise floor across the audio bandwidth, but also well into the ultrasonic region, so that it is totally removed from any audio content. Perhaps this is why those higher flavours of DSD have their adherents.
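The arithmetic behind those numbers is easy to check. The sketch below assumes the usual convention that DSD64 runs at 64 times the CD sample rate of 44.1kHz, with the higher flavours scaling in proportion, and takes the audio band to end at 20kHz; the function name is just for illustration.

```python
CD_RATE_HZ = 44_100        # CD sample rate
AUDIO_BAND_HZ = 20_000     # nominal top of the audio band

def dsd_rates(multiple):
    """Return (sample rate, Nyquist frequency, space above the audio band) in Hz."""
    sample_rate = multiple * CD_RATE_HZ
    nyquist = sample_rate / 2
    return sample_rate, nyquist, nyquist - AUDIO_BAND_HZ

for multiple in (64, 128, 256, 512):
    rate, nyquist, space = dsd_rates(multiple)
    print(f"DSD{multiple}: {rate / 1e6:.4f} MHz sample rate, "
          f"{nyquist / 1e6:.4f} MHz Nyquist, {space / 1e6:.4f} MHz above the audio band")
```

DSD64 comes out at a 2.8224MHz sample rate, which is where the 1.4MHz of available bandwidth comes from, and each doubling of the rate doubles the room into which the shaped noise can be pushed.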
I want to finish with some comments related to the paragraph above where I talk about how the HF noise is integral to the LF performance gains. I want to discuss how this applies to DSD. Obviously, I have to strip off the HF noise before I can play the track. But if I can’t do that without regressing to 1-bit audio with a -6dB noise floor, how is it of any practical use? The answer is that the HF content is only crucial while the signal remains in the 1-bit domain. As soon as I free it from the shackles of 1-bit representation, all bets are off. Converting it to analog is one way of releasing those shackles. I can then use an analog filter to strip off the ultrasonic noise. Converting it to a 64-bit digital format would be another. In the 64-bit domain, for example, 1 and 0 become 1.0000000000000000E+000 and 0.0000000000000000E+000 respectively, and any quantization errors all of a sudden become vanishingly small. In the 64-bit digital domain I can do all sorts of useful and interesting things, like digitally filter out all the HF noise, which is now superfluous. But if I ever want to return it to the 1-bit domain, I need to go through the whole high-performance SDM once again, which would serve to add it right back in.
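To make that concrete, here is a toy round trip. It is a sketch under loud assumptions: the modulator is a first-order sigma-delta loop of my own devising (real DSD modulators are far higher order), the bits are stored as +1 and -1, and the "filter" is nothing more than a moving average. None of the names come from any real DSD tool.

```python
import math

def modulate(samples):
    """Toy first-order sigma-delta modulator producing +1/-1 bits."""
    integrator, y, bits = 0.0, 0, []
    for x in samples:
        integrator += x - y                 # accumulate input minus the previous output bit
        y = 1 if integrator >= 0 else -1    # the 1-bit quantizer
        bits.append(y)
    return bits

def moving_average(values, width):
    """Crude low-pass filter: the mean of each run of `width` consecutive values."""
    acc = sum(values[:width])
    out = [acc / width]
    for i in range(width, len(values)):
        acc += values[i] - values[i - width]
        out.append(acc / width)
    return out

N, PERIOD, WIDTH = 16384, 8192, 256
signal = [0.5 * math.sin(2 * math.pi * n / PERIOD) for n in range(N)]

bits = modulate(signal)                     # the 1-bit domain
floats = [float(b) for b in bits]           # free of the 1-bit shackles
filtered = moving_average(floats, WIDTH)    # strip off the HF noise

# Compare each filtered value with the input sample at the centre of its window.
worst_error = max(abs(v - signal[k + WIDTH // 2]) for k, v in enumerate(filtered))

# Naively forcing the filtered result back to 1 bit throws the information away again.
requantized = [1 if v >= 0 else -1 for v in filtered]
crude_error = min(abs(q - signal[k + WIDTH // 2]) for k, q in enumerate(requantized))

print(worst_error, crude_error)
```

In the floating-point domain the filtered output follows the sine wave closely, even though the raw stream contains nothing but +1 and -1. Force the filtered result straight back to one bit without a modulator and every single sample is off by at least half of full scale, which is exactly why a return trip has to go through the SDM again.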