I’m just sittin' in the morning sun, and I'll be sittin' when the evening comes, watching the ships roll in. Then I watch them roll away again. I'm sittin' on the dock of the bay watchin' the tide roll away. I'm just sittin' on the dock of the bay, wastin' time.
But it’s not an entire waste of time. Watching the tide is an excellent metaphor for digital audio, and one I like to turn to often. Tides are an extremely low-frequency phenomenon, having a period of about 12.5 hours. Armed with nothing more technically sophisticated than a notebook and pencil, we can make observations of the tide at a much higher frequency. Noting the depth of the water as often as once a minute would not be unduly challenging, except that it might get a bit old after the first hour or so.
Measuring the depth of the water is especially easy when you're sittin' on the dock of the bay, right next to one of those harbour markers that read off the water depth in feet. But it still presents you with problem No. 1, which is that the water just won’t sit still. The surface continues to bob up and down, sometimes by quite a large amount, driven by wind, the wakes of boats, and any number of other factors. Expressing the problem in technical terms, even though the tide itself has a long, slow period, the thing you actually measure - the position of the water surface - changes rather dramatically in the short interval during which you want to measure it. In other words, the thing you are trying to observe has both high frequency and low frequency components, and the high frequency components do nothing but get in the way.
Anyway, after a few days of this, you’ll have enough data to head back to your lab and process it. With little more than a sheet of graph paper you can plot out your data, and very quickly you will see the long, slow period of the tide dominate the picture. Job done. For greater accuracy, you can pass your data through a low-pass filter to get rid of all the high frequency components that don’t actually mean anything in the context of tidal analysis, and end up with the waveform of the tide itself, which, depending on which Bay your Dock is located in, may not be truly sinusoidal.
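If you would rather skip the graph paper, the same exercise takes only a few lines of code. Here is a minimal sketch, assuming one depth reading per minute over three days; the tide amplitude, the surface chop, and the cutoff frequency are all made-up illustrative numbers:

```python
import numpy as np
from scipy.signal import butter, filtfilt

# Hypothetical data: one depth reading per minute for three days,
# i.e. a slow ~12.5-hour tide plus fast surface chop.
fs = 1.0 / 60.0                     # sample rate: one sample per 60 s, in Hz
t = np.arange(3 * 24 * 60) * 60.0   # three days of timestamps, in seconds
tide = 2.0 * np.sin(2 * np.pi * t / (12.5 * 3600))  # ~12.5 h period
chop = 0.3 * np.random.randn(t.size)                # wind, wakes, etc.
depth = 5.0 + tide + chop

# Low-pass at ~1 cycle/hour: far above the tide, far below the chop.
cutoff = 1.0 / 3600.0                # Hz
b, a = butter(4, cutoff / (fs / 2))  # 4th-order Butterworth, normalized cutoff
smooth = filtfilt(b, a, depth)       # zero-phase filtering of the whole record
```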
Digital filters work in a very simple way. You take the current data point and add to it a certain fraction of the previous data point, and then a different fraction of the data point before that, and so on. You can also add in a fraction of the previous output value of the filter, plus another fraction of the value before that, and so on. Each of those fractions is called a “coefficient” of the filter, and their values drop out of the “pole dancing” exercise I described a few posts back. Depending on the requirements of the filter, there can be many of these coefficients, sometimes even thousands of them.
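Written out as code, that running sum of weighted inputs and past outputs is just a pair of loops. Here is a minimal sketch of the general recipe; the two coefficient lists at the bottom are purely illustrative stand-ins for what a real design exercise would produce:

```python
import numpy as np

def apply_filter(x, b, a):
    """y[n] = sum(b[k]*x[n-k]) - sum(a[k]*y[n-k]), with a[0] taken as 1."""
    y = np.zeros(len(x))
    for n in range(len(x)):
        # fractions of the current and previous inputs
        for k in range(len(b)):
            if n - k >= 0:
                y[n] += b[k] * x[n - k]
        # fractions of the previous outputs (the recursive part)
        for k in range(1, len(a)):
            if n - k >= 0:
                y[n] -= a[k] * y[n - k]
    return y

# Illustrative coefficients only - a gentle one-pole smoother:
# y[n] = 0.25*x[n] + 0.75*y[n-1]
b = [0.25]
a = [1.0, -0.75]
```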
It is clear, therefore, that the output of a digital filter contains fragments of a great many of the previous values of the input signal, in some cases fragments of all of them. This gives rise to the misleading, but conceptually useful idea that a digital filter “smears” the input signal in time, to the detriment of its impulse response. The truth is, though, that the behaviour of the filter is described exactly by its transfer function. And the transfer function, as described in my earlier post, encapsulates both the frequency response and the phase response, which, together, serve to define the impulse response.
Given that the primary purpose of a filter is to have a certain frequency response characteristic, the best way to look at the phase response and the impulse response is to consider that they are different ways of viewing the same set of attributes. The phase response looks at them in the frequency domain, and the impulse response in the time domain. Both are equally valid, and, from an audio perspective, both attempt to characterize the way the filter ‘sounds’. We know that a low-pass filter will cut off the high frequencies, and we can easily hear the effect of that. But there are different ways to implement the filter, each having more or less the same frequency response. The differences between them lie in their phase and impulse responses, and these - in principle at least - have the potential to impact the sound.
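For the curious, all three views can be pulled out of the same set of coefficients. A sketch, using an arbitrary low-pass design as the guinea pig:

```python
import numpy as np
from scipy.signal import butter, freqz, lfilter

b, a = butter(6, 0.2)   # an arbitrary 6th-order low-pass, for illustration

# Frequency domain: one complex transfer function...
w, h = freqz(b, a)
magnitude_db = 20 * np.log10(np.abs(h))   # ...gives the frequency response
phase = np.unwrap(np.angle(h))            # ...and the phase response

# Time domain: feed the filter a unit impulse to get the impulse response.
impulse = np.zeros(256)
impulse[0] = 1.0
impulse_response = lfilter(b, a, impulse)
```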
In audio applications, many - I dare even say most - audiophiles report being able to hear the difference between different filters having the same frequency response. Clearly, therefore, it would make sense to suggest that the audible differences are encapsulated in the filters’ phase and impulse responses. It is therefore natural to suggest that certain features which one may observe in a phase response or impulse response are either more desirable or less desirable than others. The trouble is that there is little in the way of sound (no pun intended) logic behind that.
In one of my earlier posts I described the effects of phase on the frequency components of a square wave. I talked about how simple changes in the phase of those components could give rise to rather dramatic changes in the waveform, turning it from a good approximation to a square wave, into a mushy jumble. Yet, despite the apparently dramatic impact upon the waveform, listening tests performed many times by many testers on many groups of subjects have tended to demonstrate quite conclusively that humans - even golden-eared audiophiles - cannot reliably detect differences in sounds due solely to phase changes.
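That demonstration is easy to reproduce numerically. Here is a sketch: build a square wave from its odd harmonics, then rebuild it with the same amplitudes but randomly scrambled phases. The magnitude spectrum is identical; the waveform is a mushy jumble:

```python
import numpy as np

fs = 48000
t = np.arange(fs) / fs          # one second of samples
f0 = 200.0                      # fundamental frequency, Hz
harmonics = range(1, 40, 2)     # a square wave is built from odd harmonics

# A good approximation to a square wave: harmonics in phase, amplitudes 1/k.
square = sum(np.sin(2 * np.pi * k * f0 * t) / k for k in harmonics)

# Same amplitudes, scrambled phases: identical magnitude spectrum,
# dramatically different-looking waveform.
rng = np.random.default_rng(0)
mush = sum(np.sin(2 * np.pi * k * f0 * t + rng.uniform(0, 2 * np.pi)) / k
           for k in harmonics)
```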
One limitation - and, in my view, a major one - of these tests is that they can only really be performed unambiguously using electronically generated test waveforms. The currently accepted model of human audio perception is that our brains compare what we hear to some internalized version of what we think we ought to be hearing, and that our assessment of the ‘fidelity’ of the sound somehow arises from differences between the internalized model and what we are actually hearing. If we are listening to synthesized test tones, then in practice we don’t have a good internalized model to compare them against, and so we don’t have a solid foundation on which we can hope to detect subtle differences. Think about it. When have you ever heard a synthesized tone and thought “that sounds natural”, or even “that sounds unnatural”? So a good picture of the phase response of a filter does not give us a comprehensive theoretical basis on which to hang our hats when we try to assess whether one filter is going to sound better or worse than another.
Impulse response, on the other hand, does generate a lot of opinion. The problem here is that we can look at a depiction of an impulse response, convince ourselves that we either like it or don’t, and rationalize a basis for doing so. Impulse responses are typically characterized by decaying ringing following the impulse, and sometimes also by what we call pre-ringing, the appearance of a ringing phenomenon before the actual impulse. We tend to recoil from impulse responses with a pronounced pre-ring, because we imagine hearing transient events like a cymbal strike and having them somehow commence before the actual strike itself, like a tape played backwards. Surely, we say, with the confidence of Basil Fawlty stating the bleedin’ obvious, pre-ringing is a bad thing.
Well, no. The thing is, a cymbal strike and an impulse are not the same thing. An impulse is a decidedly non-musical mathematical construct. And while, yes, a filter with a pre-ringing impulse response can be shown to create actual artifacts in advance of transient events, these can be expressed as being due to phase response characteristics which advance the phases of certain frequency components more than others. The fact is that the pre-ringing in an impulse response does not result in comparable pre-ringing in a real-world music waveform after it has passed through the filter. It reflects only the relative phases of the remaining frequency components after the filter has done its job.
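Both behaviours can be seen side by side, starting from essentially the same design target. Here is a sketch comparing a symmetric (linear-phase) FIR low-pass, which pre-rings, with a minimum-phase relative that pushes all of its ringing after the peak. Note that scipy's minimum-phase construction only approximates the original magnitude response, which doesn't matter for seeing where the ringing goes:

```python
import numpy as np
from scipy.signal import firwin, minimum_phase

# A linear-phase FIR low-pass: the impulse response is symmetric,
# so half of its ringing sits *before* the main peak.
lin = firwin(127, 0.25)

# A minimum-phase relative of the same design: a similar filtering job,
# but all of the ringing now trails the peak.
minph = minimum_phase(lin)

print(np.argmax(np.abs(lin)))    # peak in the middle: pre-ring ahead of it
print(np.argmax(np.abs(minph)))  # peak at the start: no pre-ring
```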
Having said that, though, the preponderance of anecdotal evidence does tend to favour the sonic characteristics of filters which exhibit minimal pre-ringing in their impulse responses. So, while it may be acceptable to channel your inner Basil Fawlty and hold forth on the need for a ‘good’ impulse response, you might want to stop short of attempting to explain why that should be.
Why the fixation on filters with a pre-ringing impulse response? Simply because they are the easiest to design, and the easiest to implement in real time. If you don’t want the pre-ringing, that’s fine. But you have a lot more in the way of obstacles to overcome in the design and implementation. To be fair, in most audiophile applications these obstacles and limitations are not really a major concern. But any time a filter is required where the designer is not (or chooses not to be) constrained by audiophile sensitivities, you are going to get a filter with a pre-ring in the impulse response.
The final matter I want to raise for this post is the question of whether or not a digital filter is lossy. In a sense that is a dumb question. Of course a filter is lossy. If it wasn’t lossy it wouldn’t be doing anything in the first place. Filters are all about throwing away something you don’t want and keeping something you do want. You can’t run your music through an “inverse” filter and regain the original waveform, because the stuff that was lost is gone forever and cannot be recreated out of thin air. But if the input signal comprises a mixture of something you want (music) and something you don’t want (noise), and if the music lives in one band of frequencies and the noise in another band of frequencies, then in principle we can design a filter that will remove the noise and leave the music intact.
There are not many circumstances where that is possible to such an extent that the filter leaves the music “losslessly” intact, but setting that aside, what would it mean if we could? Suppose I had a music waveform - for example a 24/384 recording that was bandwidth limited in the analog domain to, say, 20kHz. Suppose I then synthesized another waveform, this time a 24/384 waveform comprising noise limited to the frequency band 100-175kHz. Suppose I mixed the two together. Here, then, is the notion of a “lossless” low-pass filter. If I designed a filter that passed DC to 20kHz, and attenuated everything at 100kHz and above, it would be a truly lossless filter if I could pass my mixed signal through it and find that the output was identical to the original music waveform. [I haven’t actually done that experiment, but it strikes me it would be an interesting one to try.]
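Since I haven’t done the experiment, take this as nothing more than a sketch of how it might look in code. The sample rate and band edges are the ones from the thought experiment above; the “music” is a stand-in bundle of test tones, and the final print is a crude figure of merit for how far from lossless the filter falls:

```python
import numpy as np
from scipy.signal import firwin, fftconvolve

fs = 384000
t = np.arange(fs) / fs   # one second at the 24/384 sample rate

# Stand-in "music": a handful of tones, all below 20kHz.
music = sum(np.sin(2 * np.pi * f * t) for f in (440, 3000, 11000, 19000))

# The unwanted "noise": a random signal band-limited to 100-175kHz.
rng = np.random.default_rng(1)
bp = firwin(2047, [100e3, 175e3], pass_zero=False, fs=fs)
noise = fftconvolve(rng.standard_normal(t.size), bp, mode="same")

# The candidate "lossless" low-pass: flat to 20kHz, gone well before 100kHz.
lp = firwin(2047, 60e3, fs=fs)
out = fftconvolve(music + noise, lp, mode="same")

# How close did we get? Compare the output to the original music, trimming
# the edges where the convolution has not fully settled.
err = (out - music)[4096:-4096]
print(20 * np.log10(np.max(np.abs(err)) / np.max(np.abs(music))))
```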
What are the things that govern whether this “lossless” characteristic can actually be met? The first is in the actual filter design. Our goal is for the filter to pass all frequencies from DC to 20kHz (the “passband”), and it is not hard to do that to a first approximation. But if we look closely we see that the passband of the filter is not actually flat. It contains a residual ripple. The size and frequency of this ripple are determined by the number of poles and zeroes in the filter design. The more you allow, the smaller you can make the ripple, but you can’t make it totally go away. Are these passband ripples actually audible, either directly or via inescapable secondary effects? That is a good question, and I don’t think we have definitive answers to it.
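You can put numbers on that trade-off. A sketch, using equiripple FIR designs with fixed band edges and counting how the passband ripple shrinks as you spend more taps (i.e. more zeroes) on the problem:

```python
import numpy as np
from scipy.signal import remez, freqz

# For fixed band edges, more taps buys a smaller passband ripple.
for numtaps in (31, 63, 127):
    # Equiripple low-pass: passband 0-0.2, stopband 0.3-0.5 (fs normalized to 1)
    taps = remez(numtaps, [0, 0.2, 0.3, 0.5], [1, 0])
    w, h = freqz(taps, worN=8192)
    passband = np.abs(h[w <= 0.2 * 2 * np.pi])   # w is in radians per sample
    ripple_db = 20 * np.log10(passband.max() / passband.min())
    print(numtaps, round(ripple_db, 4))          # peak-to-peak ripple, in dB
```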
The second thing that governs the “losslessness” of a digital filter is computational rounding error. Every time you add together two N-bit numbers the answer is an (N+1)-bit number. But in a computer, if you are only able to represent the answer in an N-bit environment, you have to throw away the least significant bit of your answer. It is much worse for multiplication and division. Therefore, the more additions and multiplications you do, the more times you have to throw away bits of useful information. If you do it enough times, it is possible to end up replacing all of your data with noise! In real-world computers, the solution is to use as many bits as possible to represent your data. In most computers the largest bit depth available is 64 bits, which is a truly huge quantity. Also, many Intel CPUs actually perform their floating point arithmetic internally in an 80-bit structure, which helps a little. All of this means that with a 64-bit data engine, you can implement filters with many, many poles and not lose data to rounding errors.
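The effect is easy to provoke on demand. Here is a sketch that sums the same million numbers with a naive running total - one rounding event per addition - at 32-bit and 64-bit precision, and compares each against an effectively exact reference:

```python
import math
import numpy as np

rng = np.random.default_rng(2)
x64 = rng.standard_normal(1_000_000)   # reference data in 64-bit floats
x32 = x64.astype(np.float32)           # the same data squeezed into 32 bits

def naive_sum(a):
    """A running total: every addition is a fresh rounding event."""
    s = a.dtype.type(0)
    for v in a:
        s += v
    return float(s)

exact = math.fsum(x64)                 # tracks every bit; effectively exact
print(abs(naive_sum(x32) - exact))     # error after a million 32-bit roundings
print(abs(naive_sum(x64) - exact))     # the same sum in 64 bits: vastly smaller
```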
The best way to get a feel for rounding errors is to use Google to hunt down data on the background noise level produced by Fourier Transforms of different lengths, using different data representations, and different FFT algorithms (which relates to the number of additions/multiplications needed to execute the transform) - and different CPUs! I have done this before, but I can’t remember where I found the information.
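Alternatively, you can measure a version of it yourself. A sketch: push white noise through a forward and inverse FFT at each precision and see what noise floor the round trip leaves behind. I've used scipy's FFT here because, unlike numpy's, it carries out the 32-bit case in genuine single precision:

```python
import numpy as np
from scipy import fft

rng = np.random.default_rng(3)
for n in (1024, 65536, 1048576):                  # FFT lengths
    x = rng.standard_normal(n)
    for dtype in (np.float32, np.float64):
        xd = x.astype(dtype)
        # Forward then inverse transform should be a perfect round trip;
        # whatever is left over is accumulated rounding noise.
        err = (fft.ifft(fft.fft(xd)).real - xd).astype(np.float64)
        floor_db = 10 * np.log10(np.mean(err ** 2))
        print(n, dtype.__name__, round(floor_db, 1))
```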
So there we have it. There are four areas in which a filter may have an audible signature: (i) the frequency response (well, duh, but I mean residual ripple in the passband of the frequency response), (ii) features of the phase response, (iii) features of the impulse response, and (iv) noise due to computational rounding errors. Of those, the first three are intimately connected, but the question is whether there are measurable factors in those specific views into the fundamental performance of the filter which can be tracked down and quantified as to their sonic impact. The state-of-the-art has not yet reached that point, and will not do so until such time as the consensus view of the golden-eared audiophile community is in agreement with the scientific community that it has.