Thursday, 18 September 2014

OS X 10.9.5

While waiting for the results of the Scottish Independence referendum to trickle out, I installed the latest OS X 10.9.5 and gave it a quick workout.  So far so good.  I can see no reason why BitPerfect Users should not upgrade.

Of course, like the Scottish Referendum, it may not look so rosy by tomorrow. :)

Friday, 12 September 2014

Has DSD met its Waterloo?

In May of 2001, Stanley Lipshitz and John Vanderkooy of the University of Waterloo, in Canada, published a paper titled “Why 1-bit Sigma-Delta Conversion is Unsuitable for High-Quality Applications”.  In the paper’s Abstract (a kind of introductory paragraph summing up what the paper is all about) they made some unusually in-your-face pronouncements, including “We prove this fact.”, and “The audio industry is misguided if it adopts 1-bit sigma-delta conversion as the basis for any high quality processing, archiving, or distribution format…”.  DSD had, apparently, met its Waterloo.

What was the basis of their arguments?  Quite simple, really.  They focussed on the problem of dither.  As I mentioned in an earlier post, with a 1-bit system the quantization error is enormous.  We rely on dither to eliminate it, and we can prove mathematically that TPDF dither at a depth of ±1LSB is necessary to deal with it.  But with a 1-bit system, ±1LSB exceeds the full modulation depth.  Applying ±1LSB of TPDF dither to a 1-bit signal will subsume not only the distortion components of the quantization error, but also the entire signal itself.  Lipshitz and Vanderkooy study the phenomenon in some detail.
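A quick numpy sketch makes the arithmetic behind that claim concrete.  Assume, as is conventional, 1-bit output levels of ±1, so that 1 LSB spans the full range of 2.0.  Then ±1LSB TPDF dither, on its own and with no signal present at all, drives the quantizer input beyond full scale a quarter of the time:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 100_000

# A 1-bit quantizer has two output levels, say -1 and +1, so 1 LSB = 2.0.
lsb = 2.0

# Proper TPDF dither spans ±1 LSB - here ±2.0 - formed as the sum of two
# independent uniform sources each spanning ±0.5 LSB.
tpdf = rng.uniform(-lsb / 2, lsb / 2, n) + rng.uniform(-lsb / 2, lsb / 2, n)

# Even with NO signal at all, the dither alone exceeds full scale (|x| > 1)
# a quarter of the time.  Add any signal and it only gets worse.
print(np.mean(np.abs(tpdf) > 1.0))   # ~0.25
```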

They then go on to characterize the behaviour of SDMs.  SDMs and noise shapers are more or less the same thing.  I described how they work a couple of posts back, so you should read that if you missed it first time round.  An SDM goes unstable (or ‘overloads’) if the signal presented to the quantizer is so large as to cause the quantizer to clip.  As Lipshitz and Vanderkooy observe, a 1-bit SDM must clip if it is dithered at ±1LSB.  In other words, if you take steps to prevent it from overloading, then those same steps will have the effect that distortions and other unwanted artifacts can no longer be eliminated.

They also do some interesting analysis to counter some of the data shown by the proponents of DSD, which purport to demonstrate that by properly optimizing the SDM, any residual distortions will remain below the level of the noise.  Lipshitz and Vanderkooy show that this is a limitation of the measurement technique rather than a property of the data, and that if the signal is properly analyzed, the actual noise levels are found to be lower but the distortion levels are not, and do in fact stand proud of the noise.

Lipshitz and Vanderkooy do not suggest that SDMs themselves are inadequate.  The quantizer at the output of an SDM is not constrained to being only a single-bit quantizer.  It can just as easily have a multi-bit output.  In fact they go on to state that “… a multi-bit SDM is in principle perfect, in that its only contribution is the addition of a benign … noise spectrum”.  This, they point out, is the best that any system, digital or analog, can do.

The concept of a stable SDM with a multi-bit output is what underlies the majority of chipset-based DAC designs today, such as those from Wolfson, ESS, Cirrus Logic, and AKM.  These types of DAC upsample any incoming signal - whether PCM or DSD - using a high sample rate SDM with a small number of bits in the quantizer - usually not more than three - driving a simplified multi-bit analog conversion stage.

Lipshitz and Vanderkooy’s paper was of course subjected to counter-arguments, mostly (but not exclusively) from within the Sony/Philips sphere of influence.  This spawned a bit of thrust and counter-thrust, but by and large further interest within the academic community completely dried up within a very short time.  The prevailing opinion appears to accept the validity of Lipshitz and Vanderkooy from a mathematical perspective, but is willing to also accept that once measures are taken to keep any inherent imperfections of 1-bit audio below certain presumed limits of audibility, 1-bit audio bitstreams can indeed be made to work extremely well.

Where we have reached from a theoretical perspective is the point where our ability to actually implement DSD in the ADC and DAC domains is more of a limiting factor than our ability to understand the perfectibility (or otherwise) of the format itself.  Most of the recently published research on 1-bit audio focuses instead on the SDMs used to construct ADCs.  These are implemented in silicon on mixed-signal ICs, and are often quite stunningly complex.  Power consumption, speed, stability, and chip size are the areas that interest researchers.  From a practicality perspective, 1-bit audio has broad applicability and interest beyond the limited sphere of high-end audio, which alone cannot come close to justifying such an active level of R&D.  Interestingly though, few and far between are the papers on DACs.

For all that, the current resurgence of DSD which has swept the high-end audio scene grew up after the Lipshitz and Vanderkooy debate had blown over.  Clearly, the DSD movement did NOT meet its Waterloo in Lipshitz and Vanderkooy.  Its new-found popularity is based not on arcane adherence to theoretical tenets, but on broadly-based observations that, to many ears, DSD persists in sounding better than PCM.  It is certainly true that the very best audio that I personally have ever heard was from DSD sources, played back through the Light Harmonic Da Vinci Dual DAC.  However, using my current reference, a PS Audio DirectStream DAC, I do not hear any significant difference at all between DSD and the best possible PCM transcodes.

There is no doubt in my mind that we haven’t heard the last of this.  We just need to be true to ourselves at all times and keep an open mind.  The most important thing is to not allow ourselves to become too tightly wed to one viewpoint or another to the extent that we become blinkered.

Thursday, 11 September 2014

iTunes 11.4

I have been testing the latest iTunes update (11.4) and it seems to be working fine. Initially I was concerned that it had caused my RMBP to grind to a halt and require a re-boot, but this did not repeat on my Mac Mini so I am thinking that was an unrelated issue. It has been working fine on both Macs since then.

It looks like BitPerfect users can install it with confidence.

DSD64 vs DSD128

As a quick follow-up to my post on noise shaping, I wanted to make some comments on DSD playback.  DSD’s specification flies quite close to the edge, in that its noise shaping causes ultrasonic noise to begin to rise almost immediately above the commonly accepted upper limit of the audio band (20kHz).  This means that if DSD is directly converted to analog, the analog signal needs to go through a very aggressive low-pass filter which strips off this ultrasonic noise while leaving the audio frequencies intact.  Such a filter is very similar in its performance to the anti-aliasing filters required in order to digitally sample an analog signal at 44.1kHz.  These aggressive filters almost certainly have audible consequences, although there is no widely-held agreement as to what they are.

In order to get around that, the playback standard for SACD players provides for an upsampling of the DSD signal to double the sample rate, which we nowadays refer to as DSD128.  With DSD128 we can arrange for the ultrasonic noise to start its rise somewhere north of 40kHz.  When we convert this to analog, the filters required can be much more benign, and can extend the audio band’s flat response all the way out to 30kHz and beyond.  Many audiophiles consider such characteristics to be more desirable.  By the way, we don’t have to stop at DSD128, nor do we have to restrict ourselves to 1-bit formats, but those are entirely separate discussions.

If that was all there was to it, life would be simple.  But it isn’t.  The problem is that the original DSD signal (which I shall henceforth refer to as DSD64 for clarity) still contains ultrasonic noise from about 20-something kHz upwards.  This noise is now part of the signal, and cannot be unambiguously separated from it.  If nothing is done about it, it will still be present in the output of the DSD128 conversion.  So you need to filter it out before upsampling to DSD128, using a filter with similar performance to the one we just discussed and trashed as a possible solution in the analog domain.

The saving grace is that this can now be a digital filter.  There are three advantages that digital filters have over analog filters.  The first is that they approach very closely the behavior of theoretically perfect filters, something which analog filters do not.  This makes the design of a good digital filter massively easier as a practical matter than that of an equivalent analog filter.  The second advantage is that digital filters have a wider design space than analog filters, and some performance characteristics can be attained using them that are not possible using analog filters.  The third advantage is that analog filters are constructed using circuit elements which include capacitors, inductors, and resistors - components which high-end audio circuit designers will tell you can (and do) contribute to the sound quality.  Well-designed digital filters have no equivalent sonic signatures.
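By way of illustration, here is a minimal windowed-sinc low-pass filter in numpy - nothing like a production design, and the parameters are chosen purely for the demonstration - yet even this simple construction already behaves close to the textbook ideal:

```python
import numpy as np

def windowed_sinc_lowpass(cutoff, fs, numtaps=255):
    """Linear-phase low-pass FIR designed by the windowed-sinc method.
    cutoff and fs in Hz; numtaps odd so the delay is a whole number of samples."""
    n = np.arange(numtaps) - (numtaps - 1) / 2
    h = np.sinc(2 * cutoff / fs * n)      # ideal brick-wall impulse response...
    h *= np.blackman(numtaps)             # ...truncated and windowed to tame ripple
    return h / h.sum()                    # normalize to unity gain at DC

fs = 96_000
h = windowed_sinc_lowpass(cutoff=20_000, fs=fs)
t = np.arange(4096) / fs
passband = np.convolve(np.sin(2 * np.pi * 1_000 * t), h, mode="same")
stopband = np.convolve(np.sin(2 * np.pi * 40_000 * t), h, mode="same")
print(np.max(np.abs(passband[500:-500])))   # a 1kHz tone passes essentially untouched
print(np.max(np.abs(stopband[500:-500])))   # a 40kHz tone is attenuated by some 70dB
```

An equivalent analog filter with this kind of stopband rejection and passband flatness would be a serious engineering undertaking; here it is a dozen lines.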

So - good news - the unwanted ultrasonic noise can be filtered out digitally with less sonic degradation than we could achieve with an analog filter.  Once the DSD64 is digitally filtered, it can be upsampled to 5.6MHz and processed into DSD128 using a Sigma-Delta Modulator (SDM).  It is an unresolved question in digital audio whether an SDM introduces any audible sonic degradation.  Together with the question of whether a 1-bit representation is adequate for the purposes of high-fidelity representation of an audio signal, these are the core technical issues at the heart of the PCM-vs-DSD debate.

So the difference between something that was converted from DSD64 to DSD128, and something that was recorded directly to DSD128, is that the former has been filtered to remove ultrasonic artifacts adjacent to the audio frequency band, and the latter has not.  If DSD128 sounds better than DSD64 it is because it dispenses with that filtering (and re-modulation) requirement.  Such arguments can be further extended to DSD256, DSD512, and the like.  The higher the 1-bit sampling frequency, the further the onset of ultrasonic noise can be pushed away from the audio band, and the more benign the filtering can be to remove it for playback.

It is interesting to conclude with the observation that, unlike the situation with 44.1kHz PCM, DSD64 allows the encoded signal to retain its frequency spectrum all the way out to 1MHz and beyond, if you want it to.  By contrast, 44.1kHz PCM requires the original analog signal itself to be strictly filtered to eliminate all content above a meager 22.05kHz.  DSD64 retains the full bandwidth of the signal, but allows it to be submerged by extremely high levels of added noise.  In the end you still have to filter out the noise - and any remaining signal components with it - but at least the original signal is still present.

Tuesday, 9 September 2014

Noise Shaping - What Does It Do?

If you read these posts regularly you will know that any attempt to digitize a music signal - to reduce it to a finite numerical representation - will ultimately result in some degree of quantization error.  This is because you cannot represent the waveform with an absolute degree of precision.  It is a bit like trying to express 1/3 as a decimal (0.3333333 ……) - the more 3’s you write, the more accurate it is - the less the ‘quantization error’ - but the error is still there.  This quantization error comprises both noise and distortion.  The distortion components are those which are related to the signal itself (mathematically we use the term ‘correlated’), and generally can be held to represent sonic defects.  The noise is unrelated to the signal (mathematically ‘uncorrelated’) and represents the sort of background noise that we can in practice “tune out” without it adversely affecting our perception of the sound quality.

The way to eliminate the distortion caused by quantization is simply to add some more noise to it.  But not so much as to totally subsume the distortions.  It turns out that if we add just the right amount of noise, it doesn’t so much bury the distortion as cause it to shrink.  If we do it right, it shrinks to a level just below the newly-added noise floor.  Of course, this noise floor is now slightly higher than before, but this is perceived to sound better than the lower noise level with the higher distortion.  The process of deliberately adding noise is called ‘dither’, and we can mathematically analyze exactly how much noise, and what type of noise, is necessary to accomplish the desired result.  The answer is ‘TPDF’ dither (it doesn’t matter if you don’t know what that means) at a level of ±1 Least Significant Bit (LSB).  This means that the greater the Bit Depth of your signal, the less the amount of noise you have to add to ensure the absence of distortion components in the quantization error.
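A small numpy sketch - purely an illustration, not anyone’s production dither - shows what TPDF dither buys you.  Here a sine wave with an amplitude below half an LSB quantizes to dead silence without dither, leaving an error that is perfectly correlated with the signal; with ±1 LSB TPDF dither the error becomes uncorrelated noise:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1 << 16
signal = 0.4 * np.sin(2 * np.pi * 331 * np.arange(n) / n)   # amplitude < 0.5 LSB

# TPDF dither spanning ±1 LSB: the sum of two independent ±0.5 LSB uniforms.
tpdf = rng.uniform(-0.5, 0.5, n) + rng.uniform(-0.5, 0.5, n)

err_plain    = np.round(signal) - signal           # undithered: output is silence
err_dithered = np.round(signal + tpdf) - signal    # dithered: output tracks signal

# Distortion is error *correlated* with the signal; noise is uncorrelated.
corr = lambda e: abs(np.corrcoef(e, signal)[0, 1])
print(corr(err_plain))      # ~1.0: the error IS the (negated) signal
print(corr(err_dithered))   # ~0.0: pure uncorrelated noise
```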

Explaining and understanding exactly why that works is beyond the scope of this post, but I should point out that the analysis leads to some deeper and more profound insights, the implications of which I want to talk about.  Essentially, the idea of dither is this: when you digitize an analog signal (or reduce the bit depth of a digital signal - same thing) you are not constrained to always choose the nearest quantization level.  Sometimes good things can happen if you instead choose a different quantization level, as we shall see.

One thing that is easy to grasp is the concept of averaging.  If you count the number of people who live in a house, the answer is always an integer number.  But if you average over several houses, the average number of occupants can be a fractional number - for example 2.59.  Yet you will never look in an individual house and see 2.59 people.  It is the same with digital audio.  By measuring something multiple times, you can get an “average” value, which has more precision than the bit depth with which the values are measured.  In digital audio we call this “oversampling”.
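The averaging idea can be demonstrated in a couple of lines (the 2.59 is just the example figure from above):

```python
import numpy as np

rng = np.random.default_rng(1)
true_value = 2.59                      # the average household occupancy from above

# Each individual reading must be a whole number of people, but adding
# ±0.5 of random "dither" before rounding lets the average converge on
# the fractional truth - this is the essence of what oversampling buys.
dither = rng.uniform(-0.5, 0.5, 100_000)
readings = np.round(true_value + dither)
print(sorted(set(readings.tolist())))   # every reading is an integer: [2.0, 3.0]
print(readings.mean())                  # ~2.59
```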

Recall also that in order to digitally sample an analog waveform, we need a sample rate which is at least twice that of the highest frequency present in the waveform.  An audio waveform contains many frequencies, ranging from deep bass to high treble, so the sampling frequency must be at least twice that of the highest treble frequencies.  Clearly, therefore, the sampling frequency is going to be many, many times higher than what we would need to capture the lower frequencies alone.  You could say, therefore, that the lowest frequencies are highly oversampled, and that the possibility therefore ought to exist to record their content at a precision which is greater than the nominal bit depth.  And you would be right.

Noise shaping takes advantage of the fact that the lower frequencies are inherently over-sampled, and allows us to push the background noise level at these lower frequencies down below what would otherwise be the limit imposed by the fixed bit depth.  In fact it even allows us to encode signals below the level of the LSB, right down to that noise floor.  You would think it wouldn’t be possible, but it is, because the low frequencies are quite highly oversampled.  In effect, you can think of the low frequency information as being encoded by averaging it over a number of samples.  In reality it is a lot more complicated than that, but that simplistic picture is essentially correct.

Like playing a flute, actually doing the noise shaping is a lot more difficult than talking about how to do it.  A noise shaping circuit (or, in the digital domain, algorithm) is conceptually simple.  You take the output of the quantizer and subtract it from its input.  The result is the quantization error.  You pass that through a carefully designed filter and  subtract its output in turn from the original input signal.  You are in effect putting the quantization error into a negative feedback loop.  In designing the noise shaper you must not ask it to do the impossible, otherwise it won’t work and will go unstable.  What you must do is recognize that only the low frequencies can benefit from noise shaping, so the filter passes only the low frequency components of the quantization error through the feedback loop.  This negative feedback in effect tries to reduce the quantization error only at those low frequencies.
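That loop can be sketched in a few lines of numpy.  This is a toy first-order error-feedback shaper - the “carefully designed filter” here is nothing more than a one-sample delay - with TPDF dither inside the loop, so treat it as a conceptual illustration rather than a usable design:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 8192
x = 0.25 * np.sin(2 * np.pi * np.arange(n) * 5 / n)       # a signal below 1 LSB
tpdf = rng.uniform(-0.5, 0.5, n) + rng.uniform(-0.5, 0.5, n)

def noise_shape(x, dither):
    """First-order error feedback: quantize (input - previous error), so the
    total error spectrum is shaped by (1 - z^-1): quiet at DC, loud at Nyquist."""
    y = np.empty_like(x)
    e = 0.0
    for i in range(len(x)):
        v = x[i] - e                        # subtract the fed-back error
        y[i] = np.round(v + dither[i])      # dithered quantizer, 1 LSB = 1.0
        e = y[i] - v                        # this step's error, fed back next time
    return y

y = noise_shape(x, tpdf)
E = np.abs(np.fft.rfft(y - x))
print(E[1:100].mean(), E[-100:].mean())     # low-frequency error << high-frequency
```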

But there’s no free lunch.  All of those quantization errors can’t just be made to go away.  The higher frequency components of the quantization error are not subject to the same negative feedback and so the actual quantization error becomes dominated by high frequency components.  The low frequency components end up being suppressed at the expense of increases in the high frequency components.  This is why it is called “Noise Shaping”.  It might better be referred to as “Quantization Error Shaping”, but that trips less fluidly off the tongue.  What we have done is to select quantization levels that are not necessarily those with the lowest individual quantization error, but as a result have nonetheless ended up with an improved performance.

At this point, a good question might be to ask just how much we can suppress the quantization error noise.  And there is an answer to that.  It is referred to as the “Gerzon & Craven” theorem, after the authors who published the first analysis of the subject in 1989.  What Gerzon & Craven says is that if we plot the quantization noise on a dB scale against the frequency on a linear scale, as we use noise shaping to push the quantization noise floor down at the low frequency end, we plot out a new curve.  There is an area that appears between the old and new curves.  Then, at higher frequencies, noise shaping requires us to pull the noise floor up above the existing noise floor.  Again, an area appears between the old curve and the new one.  Gerzon and Craven tell us that the two areas must be equal.  Since there is a fundamental limit on how high we can pull up the high frequency noise floor (we can’t pull it up higher than 0dB), it follows that there is a practical limit on how much we can push down the low frequency noise.  In practice, however, too high a degree of noise shaping requires highly aggressive filters, and these can end up dominating the issue due to practical problems of their own.
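For a first-order noise shaper, whose noise transfer function is NTF(z) = 1 - z^-1, the area rule can be checked numerically: the dB curve of |NTF| against linear frequency must average to exactly 0dB, with the area carved out below the unshaped floor equal to the area piled up above it.

```python
import numpy as np

# Noise transfer function of a first-order noise shaper: NTF(z) = 1 - z^-1.
# Gerzon & Craven's rule says its dB-vs-linear-frequency curve averages to 0dB.
M = 1_000_000
w = np.pi * (np.arange(M) + 0.5) / M        # frequencies 0..Nyquist (midpoints,
                                            # avoiding the -infinity at w = 0)
ntf_db = 20 * np.log10(np.abs(1 - np.exp(-1j * w)))
print(ntf_db.mean())                        # ~0.0 dB
print(ntf_db[0], ntf_db[-1])                # deep cut at DC, ~+6dB at Nyquist
```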

For a lot of applications, the high frequency area overlaps with the signal bandwidth.  A perfect example is 16/44.1 “red book” audio.  The high frequency area goes up to 22.05kHz, of which the audio bandwidth is taken to comprise up to 20kHz.  Any noise shaping done on 16/44.1 audio must therefore introduce audible high frequency noise.  It must therefore be done - if it is done at all - very judiciously.

There are two very important things to bear in mind about noise shaping.  The first is that the high frequency content is crucial to both the low frequency noise suppression and low-level signal encoding.  In a real sense, those effects are actually encoded by the high frequency noise itself.  If you were to pass the noise-shaped signal through a low-pass filter that cuts out only the high frequency noise, then as soon as you re-quantized the output of the filter to the bit depth of the original signal, all of that information would be lost again.

The second thing is that the noise-shaped noise is now part of the signal, and cannot be separated out.  This is of greatest importance in applications such as 16/44.1 where the signal and the shaped noise share a part of the spectrum.  Every time you add noise-shaped dither to such a signal as part of a processing stage, you end up adding to the high frequency noise. Considering that noise shaping may easily add 20dB of high frequency noise, this is a very important consideration.

All this is fundamental to the design of DSD, which is built upon the foundation of noise shaping.  A 1-bit bitstream has a noise floor of nominally -6dB, which is useless for high quality audio.  But if we can use noise-shaping to push it down to, say, -120dB over the audio bandwidth, then all of a sudden it becomes interesting.  In order to do that, we need an awful lot of high frequency headroom into which we can shape all the resultant noise - something like 1MHz of spectrum.  Enter DSD, which has 1.4MHz available, and practical SDMs can just about be designed to do the job.
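A toy first-order 1-bit SDM illustrates the principle.  Real DSD modulators are typically fifth order or higher, so treat this purely as a sketch of the mechanism, not of DSD-grade performance:

```python
import numpy as np

def sdm_1bit(x):
    """Toy first-order 1-bit sigma-delta modulator: integrate the difference
    between the input and the previous output, then quantize to just its sign."""
    y = np.empty_like(x)
    acc = 0.0
    for i, s in enumerate(x):
        acc += s - (y[i - 1] if i else 0.0)   # integrator tracks the running error
        y[i] = 1.0 if acc >= 0 else -1.0      # the 1-bit quantizer
    return y

fs = 2_822_400                               # the DSD64 sample rate
t = np.arange(1 << 15) / fs
x = 0.5 * np.sin(2 * np.pi * 1_000 * t)      # a 1kHz tone at half scale
y = sdm_1bit(x)

# Despite being only ±1, the bitstream tracks the audio: even a crude
# 64-sample moving average recovers a recognizable sine wave.
recovered = np.convolve(y, np.ones(64) / 64, mode="same")
print(np.corrcoef(recovered[64:-64], x[64:-64])[0, 1])   # close to 1
```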

If we can double the sample rate of DSD and get what we now refer to as DSD128, or even increase it further to DSD256, DSD512, etc, then we can not only suppress the noise floor across the audio bandwidth, but also well into the ultrasonic region, so that it is totally removed from any audio content.  Perhaps this is why those higher flavours of DSD have their adherents.

I want to finish with some comments related to the paragraph above where I talk about how the HF noise is integral to the LF performance gains.  I want to discuss how this applies to DSD.  Obviously, I have to strip off the HF noise before I can play the track.  But if I can’t do that without regressing to 1-bit audio with a -6dB noise floor, how is it of any practical use?  The answer is that the HF content is only crucial while the signal remains in the 1-bit domain.  As soon as I free it from the shackles of 1-bit representation, all bets are off.  Converting it to analog is one way of releasing those shackles.  I can then use an analog filter to strip off the ultrasonic noise.  Converting it to a 64-bit digital format would be another.  In the 64-bit domain, for example, 1 and 0 become 1.0000000000000000E+000 and 0.0000000000000000E+000 respectively, and any quantization errors all of a sudden become vanishingly small.  In the 64-bit digital domain I can do all sorts of useful and interesting things, like digitally filter out all the HF noise, which is now superfluous.  But if I ever want to return it to the 1-bit domain, I need to go through the whole high-performance SDM once again, which would serve to add it right back in.

Tuesday, 26 August 2014

AirPlay Sticky

I have posted a new "sticky" in the Pages section, regarding AirPlay.

Tuesday, 19 August 2014

Digital Filters

I’m just sittin' in the morning sun, and I'll be sittin' when the evening comes, watching the ships roll in.  Then I watch them roll away again.  I'm sittin' on the dock of the bay watchin' the tide roll away.  I'm just sittin' on the dock of the bay, wastin' time.

But it’s not an entire waste of time.  Watching the tide is an excellent metaphor for digital audio, and one I like to turn to often.  Tides are an extremely low-frequency phenomenon, having a period of about 12.5 hours.  Armed with nothing more technically sophisticated than a notebook and pencil, we can make observations of the tide at a much higher frequency.  Noting the depth of the water as often as once a minute would not be unduly challenging, except that it might get a bit old after the first hour or so.

Measuring the depth of the water is especially easy when you sit right next to one of those harbour markers that read off the water depth in feet.  But it still presents you with problem No 1, which is that the water just won’t sit still.  The surface continues to bob up and down, sometimes by quite a large amount, driven by wind, the wakes of boats, and any number of other factors.  Expressing the problem in technical terms, even though the tide itself has a long, slow period, the thing you actually measure - the position of the water surface - changes rather dramatically in the short interval during which you want to actually measure it.  In other words, the thing you are trying to observe has both high frequency and low frequency components, and the high frequency components are doing nothing other than getting in the way.

Anyway, after a few days of this, you’ll have enough data to head back to your lab and process it.  With little more than a sheet of graph paper you can plot out your data and very quickly you will see the long slow period of the tide dominate the picture.  Job done.  For greater accuracy, you can pass your data through a low-pass filter, and get rid of all the high frequency components that don’t actually mean anything in the context of tidal analysis, and end up with the actual waveform of the tide itself, which, depending on which Bay your Dock is located in, may not be truly sinusoidal.

Digital filters work in a very simple way.  You take the current data point and add to it a certain fraction of the previous data point, and then a different fraction of the data point before that, and so on.  You can also add in a fraction of the previous output value of the filter, plus another fraction of the value before that, and so on.  Each of those fractions are called the “coefficients” of the filter, and their values drop out of the “pole dancing” exercise I described a few posts back.  Depending on the requirements of the filter, there can be many of these coefficients, sometimes even thousands of them.
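The difference equation just described can be written down directly.  Here is a minimal numpy sketch, using the tide-watching example, with a plain moving average standing in for a properly designed filter (all the figures below are invented for the illustration):

```python
import numpy as np

def apply_filter(x, b, a=()):
    """Direct-form difference equation, exactly as described above:
    y[n] = b[0]x[n] + b[1]x[n-1] + ... + a[0]y[n-1] + a[1]y[n-2] + ...
    b holds the input coefficients, a the feedback (previous-output) ones."""
    y = np.zeros(len(x))
    for n in range(len(x)):
        y[n] = sum(b[k] * x[n - k] for k in range(len(b)) if n >= k) \
             + sum(a[k] * y[n - 1 - k] for k in range(len(a)) if n >= k + 1)
    return y

# Minute-by-minute "tide" readings: a slow 12.5-hour sine plus surface chop.
rng = np.random.default_rng(3)
t = np.arange(2250)                                  # three tidal periods, in minutes
tide = 2.0 * np.sin(2 * np.pi * t / 750)
readings = tide + rng.normal(0, 0.5, len(t))

smoothed = apply_filter(readings, b=np.ones(61) / 61)   # 61-minute moving average
delay = 30                                              # this filter's group delay
err = smoothed[100:] - tide[100 - delay:-delay]
print(np.std(readings - tide), np.std(err))             # the chop shrinks markedly
```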

It is clear, therefore, that the output of a digital filter contains fragments of a great many of the previous values of the input signal, in some cases fragments of all of them.  This gives rise to the misleading, but conceptually useful idea that a digital filter “smears” the input signal in time, to the detriment of its impulse response.  The truth is, though, that the behaviour of the filter is described exactly by its transfer function.  And the transfer function, as described in my earlier post, encapsulates both the frequency response and the phase response, which, together, serve to define the impulse response.

Given that the primary purpose of a filter is to have a certain frequency response characteristic, the best way to look at the phase response and the impulse response is to consider that they are different ways of viewing the same set of attributes.  The phase response looks at them in the frequency domain, and the impulse response in the time domain.  Both are equally valid, and, from an audio perspective, both serve to attempt to characterize the way the filter ‘sounds’.  We know that a low-pass filter will cut off the high frequencies, and we can easily hear the effect of that.  But there are different ways to implement the filter, each having more or less the same frequency response.  The differences between them lie in their phase and impulse responses, and these - in principle at least - have the potential to impact the sound.

In audio applications, many - I dare even say most - audiophiles report being able to hear the difference between different filters having the same frequency response.  Clearly, therefore, it would make sense to suggest that the audible differences are encapsulated in the filters’ phase and impulse responses.  It is therefore natural to suggest that certain features which one may observe in a phase response or impulse response are either more desirable or less desirable than others.  The trouble is that there is little in the way of sound (no pun intended) logic behind that.

In one of my earlier posts I described the effects of phase on the frequency components of a square wave.  I talked about how simple changes in the phase of those components could give rise to rather dramatic changes in the waveform, turning it from a good approximation to a square wave, into a mushy jumble.  Yet, despite the apparently dramatic impact upon the waveform, listening tests performed many times by many testers on many groups of subjects have tended to demonstrate quite conclusively that humans - even golden-eared audiophiles - cannot reliably detect differences in sounds due solely to phase changes.
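That square-wave demonstration is easy to reproduce in numpy: scramble the phases of the odd harmonics and the waveform changes shape dramatically, while the magnitude spectrum - the thing those listening tests suggest our ears actually respond to - stays exactly the same:

```python
import numpy as np

rng = np.random.default_rng(4)
n, f0 = 8192, 8                         # 8 cycles of the fundamental in the buffer
t = np.arange(n) / n
harmonics = range(1, 40, 2)             # the odd harmonics of a square wave

square   = sum(np.sin(2 * np.pi * k * f0 * t) / k for k in harmonics)
scramble = sum(np.sin(2 * np.pi * k * f0 * t + rng.uniform(0, 2 * np.pi)) / k
               for k in harmonics)      # same amplitudes, random phases

mag = lambda x: np.abs(np.fft.rfft(x))
print(np.max(np.abs(mag(square) - mag(scramble))))  # ~0: identical spectra
print(np.max(np.abs(square - scramble)))            # yet the waveforms differ
```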

One limitation - and, in my view, a major one - of these tests is that they can only really be performed unambiguously using electronically generated test waveforms.  The currently accepted model of human audio perception is that our brains compare what we hear to some internalized version of what we think we ought to be hearing, and that our assessment of the ‘fidelity’ of the sound somehow arises from differences between the internalized model and what we are actually hearing.  If we are listening to synthesized test tones, then in practice we don’t have a good internalized model to compare them against, and so we don’t have a solid foundation against which we can hope to detect subtle differences.  Think about it.  When have you ever heard a synthesized tone and thought “that sounds natural”, or even “that sounds unnatural”?  So a good picture of the phase response of a filter does not give us a comprehensive theoretical basis on which to hang our hats when we try to assess whether one filter is going to sound better or worse than another.

Impulse response, on the other hand, does generate a lot of opinion.  The problem here is that we can look at a depiction of an impulse response, and convince ourselves that we either like it or don’t, and rationalize a basis for doing so.  Impulse responses are characterized typically by decayed ringing following the impulse, and sometimes also by what we call pre-ringing, the appearance of a ringing phenomenon before the actual impulse.  We tend to recoil from impulse responses with a pronounced pre-ring, because we imagine hearing transient events like a cymbal strike and having them somehow commence before the actual strike itself, like a tape played backwards.  Surely, we say, with the confidence of Basil Fawlty stating the bleedin’ obvious, pre-ringing is a bad thing.

Well, no.  The thing is, a cymbal strike and an impulse are not the same thing.  An impulse is a decidedly non-musical mathematical construct.  And while, yes, a filter with a pre-ringing impulse response can be shown to create actual artifacts in advance of transient events, these can be expressed as being due to phase response characteristics which advance the phases of certain frequency components more than others.  The fact is that the pre-ringing in an impulse response does not result in comparable pre-ringing in a real-world music waveform after it has passed through the filter.  It reflects only the relative phases of the remaining frequency components after the filter has done its job.

Having said that, though, the preponderance of anecdotal evidence does tend to favour the sonic characteristics of filters which exhibit minimal pre-ringing in their impulse responses.  So, while it may be acceptable to channel your inner Basil Fawlty and hold forth on the need for a ‘good’ impulse response, you might want to stop short of attempting to explain why that should be.

Why the fixation on filters with a pre-ringing impulse response?  Simply because they are the easiest to design, and to implement in real-time implementations.  If you don’t want the pre-ringing, that’s fine.  But you have a lot more in the way of obstacles to overcome in the design and implementation.  To be fair, in most audiophile applications these obstacles and limitations are not really a major concern.  But any time a filter is required where the designer is not (or chooses not to be) constrained by audiophile sensitivities, you are going to get a filter with a pre-ring in the impulse response.

The final matter I want to raise for this post is the question of whether or not a digital filter is lossy.  In a sense that is a dumb question.  Of course a filter is lossy.  If it wasn’t lossy it wouldn’t be doing anything in the first place.  Filters are all about throwing away something you don’t want and keeping something you do want.  You can’t run your music through an “inverse” filter and regain the original waveform, because the stuff that was lost is gone forever and cannot be recreated out of thin air.  But if the input signal comprises a mixture of something you want (music) and something you don’t want (noise), and if the music lives in one band of frequencies and the noise in another band of frequencies, then in principle we can design a filter that will remove the noise and leave the music intact.

There are not many circumstances where that is possible to such an extent that the filter leaves the music “losslessly” intact, but setting that aside, what would it mean if we could?  Suppose I had a music waveform - for example a 24/384 recording that was bandwidth limited in the analog domain to, say, 20kHz.  Suppose I then synthesized another waveform, this time a 24/384 waveform comprising noise limited to the frequency band 100-175kHz.  Suppose I mixed the two together.  Here, then, is the notion of a “lossless” low-pass filter.  If I designed a filter that passed DC to 20kHz, and attenuated everything at 100kHz and above, it would be a truly lossless filter if I could pass my mixed signal through it and find that the output was identical to the original music waveform.  [I haven’t actually done that experiment, but it strikes me it would be an interesting one to try.]
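
For what it’s worth, here is a sketch of that experiment in Python (using NumPy; pure tones stand in for both the “music” and the noise, and the filter length and cutoff are arbitrary choices of mine):

```python
import numpy as np

fs = 384000
t = np.arange(8192) / fs
rng = np.random.default_rng(0)

# The "music": a handful of tones, all below 20kHz.
music = sum(np.sin(2 * np.pi * f * t) for f in (440.0, 5500.0, 18000.0))

# The "noise": tones confined to the 100-175kHz band.
noise = sum(0.3 * np.sin(2 * np.pi * f * t + rng.uniform(0, 2 * np.pi))
            for f in (101000.0, 140000.0, 174000.0))

# A long linear-phase low-pass filter, cutoff midway between the bands.
n_taps = 1023
n = np.arange(n_taps) - (n_taps - 1) / 2
h = np.sinc(2 * 60000 / fs * n) * np.blackman(n_taps)
h /= h.sum()

out = np.convolve(music + noise, h, mode='same')

# Compare away from the edges, where the filter has fully settled.
err = float(np.max(np.abs(out[2048:-2048] - music[2048:-2048])))
print(err)
```

In this toy version the residual error is limited by the filter’s passband ripple and stopband attenuation - not mathematically zero, but vanishingly small.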

What are the things that govern whether this “lossless” characteristic can actually be met?  The first is the actual filter design.  Our goal is for the filter to pass all frequencies from DC to 20kHz (the “passband”), and it is not hard to do that to a first approximation.  But if we look closely we see that the passband of the filter is not actually flat.  It contains a residual ripple.  The size and frequency of this ripple are determined by the number of poles and zeroes in the filter design.  The more you allow, the smaller you can make the ripple, but you can’t make it go away entirely.  Are these passband ripples actually audible, either directly or via inescapable secondary effects?  That is a good question, and I don’t think we have definitive answers to it.
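
You can see this trade-off directly with an equiripple (Parks-McClellan) design, for example via SciPy’s remez routine.  In the sketch below the band edges are arbitrary choices of mine; doubling the number of taps shrinks the passband ripple considerably, but never to zero:

```python
import numpy as np
from scipy.signal import remez, freqz

def passband_ripple(n_taps):
    # Equiripple low-pass: pass DC to 0.05*fs, stop above 0.07*fs.
    h = remez(n_taps, [0.0, 0.05, 0.07, 0.5], [1.0, 0.0], fs=1.0)
    w, H = freqz(h, worN=8192, fs=1.0)
    # Worst deviation from unity gain across the passband.
    passband = np.abs(H[w <= 0.05])
    return float(np.max(np.abs(passband - 1.0)))

r_small = passband_ripple(101)
r_large = passband_ripple(201)
print(r_small, r_large)
```

The ripple falls steeply as taps are added, but it never actually reaches zero - exactly the behaviour described above.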

The second thing that governs the “losslessness” of a digital filter is computational rounding error.  Every time you add together two N-bit numbers the answer is an (N+1)-bit number.  But in a computer, if you are only able to represent the answer in an N-bit environment, you have to throw away the least significant bit of your answer.  It is much worse for multiplication and division.  Therefore, the more additions and multiplications you do, the more times you have to throw away bits of useful information.  If you do it enough times, it is possible to end up replacing all of your data with noise!  In real-world computers, the solution is to use as many bits as possible to represent your data.  In most computers the largest bit depth available is 64 bits, which is a huge quantity.  Also, many Intel CPUs actually perform their floating point arithmetic internally in an 80-bit structure, which helps a little.  All of this means that with a 64-bit data engine, you can implement filters with many, many poles and not lose data to rounding errors.
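
The effect is easy to demonstrate (a sketch in Python/NumPy).  A 32-bit float carries a 24-bit significand, so once a running total reaches 2^24, adding 1.0 to it changes nothing at all - the bit is simply thrown away.  At 64 bits, with a 53-bit significand, the same addition is exact:

```python
import numpy as np

# float32 carries a 24-bit significand, so at 2**24 the spacing
# between adjacent representable values is 2.0 - adding 1.0 is
# rounded away entirely, and that information is gone for good.
total32 = np.float32(2 ** 24)
lost = total32 + np.float32(1.0)      # rounds straight back to 2**24

# float64 carries a 53-bit significand - the same addition is exact.
total64 = np.float64(2 ** 24) + 1.0

print(lost == total32, total64 == 2 ** 24 + 1)
```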

The best illustration of rounding errors is to use Google to hunt down data on the background noise level produced by Fourier transforms of different lengths, using different data representations and different FFT algorithms (which determine the number of additions/multiplications needed to execute the transform) - and different CPUs!  I have done this before, but I can’t remember where I found the information.
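
It is also easy to measure directly.  The sketch below (Python, using SciPy’s fft module, which performs the transform at the precision of the data it is given) runs a 65,536-point FFT and its inverse at 32-bit and 64-bit precision; whatever fails to survive the round trip is pure computational rounding noise:

```python
import numpy as np
from scipy import fft

rng = np.random.default_rng(1)
x = rng.standard_normal(65536)

def roundtrip_noise(dtype):
    # Forward and inverse FFT at the chosen precision; anything that
    # fails to survive the round trip is rounding noise.
    y = x.astype(dtype)
    back = fft.ifft(fft.fft(y))
    return float(np.max(np.abs(back.real - y)))

e32 = roundtrip_noise(np.float32)
e64 = roundtrip_noise(np.float64)
print(e32, e64)
```

Try varying the transform length: the noise floor creeps upward as the operation count grows, which is precisely the effect described above.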

So there we have it.  There are four areas in which a filter may have an audible signature: (i) the frequency response (well, duh - but I mean the residual ripple in the passband), (ii) features of the phase response, (iii) features of the impulse response, and (iv) noise due to computational rounding errors.  The first three are intimately connected, but the question is whether there are measurable factors in those specific views into the fundamental performance of the filter which can be tracked down and quantified as to their sonic impact.  The state of the art has not yet reached that point, and will not have done so until the golden-eared audiophile community and the scientific community agree that it has.