We already know that a digital waveform can be transformed, using a Fourier Transform, into a different representation where each data point represents a particular frequency, and the magnitude of the transform at that data point represents the amount of that frequency that is present in the original signal.
This is interesting, because we humans are able to perceive both of these aspects of a sound’s frequency content. If the frequency itself changes - increases or decreases - we perceive the pitch to go up or down. And if the magnitude changes - increases or decreases - we perceive the volume to get louder or quieter. Between them, these two things would appear to totally define how we perceive (or, if you prefer, “hear”) audio signals. Interestingly enough, a physical analysis of how the human hearing system actually works suggests that it is those separate individual frequencies, rather than the waveform itself in its full complexity, that our ears respond to.
If we take all the frequencies in the Fourier Transform and create a sine wave from each one, with an amplitude equal to the magnitude of the Fourier Transform at that frequency, and add them all together, the sum of all these sine waves will be the exact original waveform. But there are a couple of wrinkles to bear in mind. The first is that this is only strictly true if the original waveform used to create the Fourier Transform was of infinite duration, producing a Fourier Transform with an infinite number of frequencies. For the purposes of this post we can safely ignore that limitation. The second is that we need to know the relative phase of each frequency component.
I wrote in a previous post how we can decompose a square wave into its constituent frequency components and use those to reconstruct the square wave. However, if we change the phase of these individual frequency components - which describes how the individual sine waves “line up” against each other - then we end up changing the shape of the original square wave. Indeed, the change can be rather dramatic. In other words, changing the phases of a waveform’s component frequencies can significantly alter the waveform’s shape without changing any of its component frequencies or their magnitudes. To a first approximation, changes in the phase response of an audio system are considered not to be audible. However, at the bleeding edge where audiophiles live that is not so clear.
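As a rough illustration of that point, here is a minimal Python sketch that builds a square wave approximation from its odd harmonics and then rebuilds it with every component shifted by 90 degrees. The function name `synthesize`, the cut-off at the 19th harmonic, and the 512-sample period are all arbitrary choices made for this example.

```python
import math

def synthesize(harmonics, phases, n_samples=512):
    """Sum sine components over one period of the fundamental.

    harmonics: list of (harmonic_number, amplitude) pairs
    phases:    list of phase offsets in radians, one per harmonic
    """
    out = []
    for i in range(n_samples):
        t = i / n_samples
        out.append(sum(a * math.sin(2 * math.pi * k * t + p)
                       for (k, a), p in zip(harmonics, phases)))
    return out

def rms(samples):
    return math.sqrt(sum(x * x for x in samples) / len(samples))

# Odd harmonics of a unit square wave, amplitude 4/(pi*k), cut off at k = 19.
harmonics = [(k, 4 / (math.pi * k)) for k in range(1, 20, 2)]

# Components lined up as in a square wave.
aligned = synthesize(harmonics, [0.0] * len(harmonics))
# Identical frequencies and magnitudes, but every phase moved by 90 degrees.
shifted = synthesize(harmonics, [math.pi / 2] * len(harmonics))

peak_aligned = max(abs(x) for x in aligned)
peak_shifted = max(abs(x) for x in shifted)
print(f"peak (aligned): {peak_aligned:.3f}, peak (shifted): {peak_shifted:.3f}")
print(f"rms  (aligned): {rms(aligned):.3f}, rms  (shifted): {rms(shifted):.3f}")
```

Both waveforms contain exactly the same frequencies at exactly the same magnitudes, and their RMS levels match, yet the shifted version is a train of tall spikes rather than anything resembling a square wave.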
The Fourier Transform I mentioned in fact encodes both the magnitude and the phase information because the transformation actually produces complex numbers (numbers having two components which we term Real and Imaginary). From these two components we can calculate both values: the magnitude is the square root of the sum of their squares, and the phase is the angle whose tangent is the ratio of the Imaginary component to the Real component. This is one example of how the phase and frequency responses of an audio system are tightly intertwined.
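To make the complex-number point concrete, here is a small sketch using a naive discrete Fourier Transform written in plain Python. The test signal (amplitude 0.5, three cycles per frame, phase 0.7 radians) and the helper name `dft` are invented for this example, and the phase is measured against a cosine reference.

```python
import cmath
import math

def dft(x):
    """Naive O(n^2) discrete Fourier Transform; fine for short examples."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

n = 64
# Hypothetical test signal: 3 cycles per frame, amplitude 0.5, phase 0.7 rad.
signal = [0.5 * math.cos(2 * math.pi * 3 * t / n + 0.7) for t in range(n)]

spectrum = dft(signal)
component = spectrum[3]                  # complex value at 3 cycles per frame

magnitude = abs(component) * 2 / n       # sqrt(Re^2 + Im^2), scaled to amplitude
phase = cmath.phase(component)           # atan2(Im, Re), in radians

print(f"Real = {component.real:.4f}, Imaginary = {component.imag:.4f}")
print(f"magnitude = {magnitude:.4f}, phase = {phase:.4f} rad")
```

The single complex value at that frequency hands back both the amplitude and the phase that went into the signal.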
We are used to demanding that anything which affects an audio system has a frequency response that meets our objectives. This applies equally in the analog domain (whether we apply it to circuits such as amplifiers or components such as transistors) as in the digital domain (where we can apply it to simple filters or elaborate MP3 encoders). We are familiar with the common requirement for flat frequency response across the audio bandwidth because we know that we can “hear” these frequencies clearly. But all of those systems, analog and digital, also have an associated phase response.
Some types of phase response are quite trivial. For example, if the phase response is linear, which means that the phase plotted against frequency is a straight line, this means simply that the signal has been delayed by a fixed amount of time. More generally, if we look at the phase response plot (phase vs frequency), the slope of the line at any frequency tells us how much that frequency is delayed by. Clearly, if the slope is constant, which is what a linear phase response means, all frequencies will be delayed by the same amount, and the effect will be a fixed delay applied to the entire signal. However, if the slope varies with frequency, it means that different delays apply to different frequencies and the result will be a degree of waveform distortion as discussed regarding the square wave.
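The link between a linear phase response and a fixed delay can be checked numerically. The sketch below, which assumes a short 16-sample frame and a 3-sample delay chosen purely for illustration, applies a phase shift proportional to frequency to a signal's transform and then transforms back. Because the transform treats the frame as periodic, the delay wraps around the end of the frame.

```python
import cmath
import math

def dft(x, inverse=False):
    """Naive discrete Fourier Transform, or its inverse when inverse=True."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out

n = 16
delay = 3  # samples; an arbitrary choice for this example

# Hypothetical test signal: two isolated pulses.
signal = [0.0] * n
signal[2] = 1.0
signal[5] = -0.5

spectrum = dft(signal)
# Phase shift proportional to frequency index k: a straight line on the phase plot.
shifted = [value * cmath.exp(-2j * math.pi * k * delay / n)
           for k, value in enumerate(spectrum)]
delayed = [v.real for v in dft(shifted, inverse=True)]

print([round(v, 3) for v in delayed])
```

The pulses come back unchanged in shape at positions 5 and 8, which is the original signal moved three samples later and nothing more.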
So, we have clear ideas about errors in the magnitude of the frequency response. We classify these as dips, humps, roll-offs, etc, in the frequency response, and we have expectations as to how we expect these defects to sound, plus a reasonably well-cultivated language with which to describe those sounds. But we are still trying to develop an equivalent understanding of phase responses.
One development I don’t like is to focus on the impulse response, and to ascribe features of the impulse response to corresponding qualities in the output waveform. So, for example, pre-ringing in the impulse response is imagined to give rise to “pre-ringing” in the output waveform, which is presumed to be a BAD THING. This loses sight of a simple truth. If you mathematically analyze a pure perfect square wave and remove all of its components above a certain frequency, what you get is pre-ringing before each step, and post-ringing after it. We’re not talking about a filter here, we’re talking about what the waveform inherently looks like if its high frequency components were absent, which they need to be if we are going to encode it digitally.
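This is easy to confirm without any filter at all. The sketch below simply adds up the harmonics of a perfect square wave and stops at the 19th (an arbitrary cut-off for the example, as are the names and the 1000-sample period), then looks at the samples on either side of the falling edge at the half-way point.

```python
import math

N_SAMPLES = 1000        # samples per period, arbitrary for this example
HIGHEST_HARMONIC = 19   # everything above this is simply left out

def band_limited_square(t):
    """Fourier series of a unit square wave, truncated at HIGHEST_HARMONIC."""
    return sum(4 / (math.pi * k) * math.sin(2 * math.pi * k * t)
               for k in range(1, HIGHEST_HARMONIC + 1, 2))

wave = [band_limited_square(i / N_SAMPLES) for i in range(N_SAMPLES)]

edge = N_SAMPLES // 2                          # falling edge at t = 0.5
pre_peak = max(wave[edge - 100:edge])          # largest value just before the edge
post_trough = min(wave[edge + 1:edge + 101])   # smallest value just after it

print(f"overshoot before the edge: {pre_peak:.3f}")
print(f"undershoot after the edge: {post_trough:.3f}")
```

The waveform overshoots its nominal level of 1 before the edge and undershoots -1 by the same amount after it, so the ringing appears equally on both sides of the step.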
You might argue that a perfect phase response would be a zero-phase response, where there is no phase error whatsoever at any frequency. Such characteristics cannot be achieved at all in the analog domain, but in the digital domain there are various ways of accomplishing it. However, it can be shown mathematically that all zero-phase filters must have a symmetrical impulse response. In other words, whatever post-ring your filter has after the impulse, it will have the exact same pre-ring before it. This, by the way, is another way of describing what happened to the pure perfect square wave.
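A toy example shows the connection between symmetry and zero phase. The three-tap smoothing filter below is not any particular product's filter, just the simplest symmetric one I could pick; the dictionary layout (sample index mapped to coefficient) and the function name are assumptions of this sketch. Centred on the impulse it has zero phase at every frequency tested, and the same taps pushed one sample later give a phase that is a straight line, which is a pure one-sample delay.

```python
import cmath
import math

def frequency_response(taps, omega):
    """taps maps sample index (which may be negative) to coefficient."""
    return sum(c * cmath.exp(-1j * omega * n) for n, c in taps.items())

# Symmetric about n = 0: needs one sample from the "future", so not real-time.
symmetric = {-1: 0.25, 0: 0.5, 1: 0.25}
# The same taps delayed by one sample so that the filter is causal.
causal = {0: 0.25, 1: 0.5, 2: 0.25}

omegas = [math.pi * i / 8 for i in range(8)]   # 0 up to 7*pi/8 rad/sample

zero_phase = [cmath.phase(frequency_response(symmetric, w)) for w in omegas]
linear_phase = [cmath.phase(frequency_response(causal, w)) for w in omegas]

for w, zp, lp in zip(omegas, zero_phase, linear_phase):
    print(f"omega = {w:.3f}  zero-phase: {zp:+.3f}  delayed: {lp:+.3f}")
```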
Another impulse response characteristic that gets a lot of favourable press is the Minimum Phase filter. This is a misleading title because, although such a filter does mathematically minimize the net phase error, there is no theoretical basis upon which to suppose that a monotonic relationship exists between the accumulated net phase error and an observed deterioration in the sound quality. For example, linear phase filters exhibiting no waveform distortion can in principle have significantly different fixed delays, with correspondingly significant differences in their net phase error, yet with no difference whatsoever in the fidelity of their output signals. On the other hand, Minimum Phase filters do concentrate the filter’s “energy” as much as possible into the “early” part of its impulse response, which can mean that it is more mathematically “efficient”, which may make for either a better-designed filter, or a more accurate implementation of the filter’s design (sorry for the “air quotes”, but this is a topic that could take up a whole post of its own).
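The "early energy" property can be seen in the smallest possible example. The two-tap filters below are hypothetical, chosen only because one is the time-reverse of the other: they have identical magnitude responses, but the minimum phase version puts most of its energy in the first tap.

```python
import cmath
import math

def magnitude_response(taps, omega):
    """Magnitude of a causal FIR filter's frequency response at omega."""
    return abs(sum(c * cmath.exp(-1j * omega * n) for n, c in enumerate(taps)))

def cumulative_energy(taps):
    """Running total of squared tap values, from the first tap onwards."""
    total = 0.0
    running = []
    for c in taps:
        total += c * c
        running.append(total)
    return running

minimum_phase = [1.0, 0.5]   # zero at z = -0.5, inside the unit circle
maximum_phase = [0.5, 1.0]   # same taps reversed, zero at z = -2

omegas = [math.pi * i / 8 for i in range(9)]
same_magnitude = all(
    abs(magnitude_response(minimum_phase, w) - magnitude_response(maximum_phase, w)) < 1e-12
    for w in omegas
)

early_min = cumulative_energy(minimum_phase)
early_max = cumulative_energy(maximum_phase)
print(same_magnitude, early_min, early_max)
```

After the first tap the minimum phase filter has already delivered 1.0 of its total 1.25 units of energy, against 0.25 for its reversed twin, even though a magnitude-only measurement cannot tell the two apart.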
One thing I must be clear on is that this discussion is purely a technical one. I discuss the technical properties of phase and impulse responses, but I don’t hold up a hand and claim that one thing is better than the other. Someone may state an opinion that such-and-such a filter sounds better than so-and-so’s filter because it has a “better” impulse response. I might agree or disagree with the opinion regarding which filter sounds best, but I will argue against attributing the finding to certain properties of the impulse response without a good model to account for why the properties advocated should be beneficial. As regards the impulse responses no such “good” model yet exists (that I know of).
Where I do stand from a philosophical standpoint is that I like zero-phase responses and linear phase responses because these contribute no waveform distortion at the output. For that reason, we are, here at BitPerfect, developing a zero-phase DSP engine that, if successful, we will be able to apply quite broadly. We will try it out first in our DSD Master DSD-to-PCM conversion engine, where I am convinced that it will provide PCM conversions that are, finally, indistinguishable from the DSD originals. If listening tests prove us out, we will release it. From there it will migrate to SRC, where I believe it will deliver an SRC solution superior to the industry-leading Izotope product (which is too expensive for us to use cost-effectively). Finally, it will appear in our new design for a seriously good graphical equalizer package that is in early-stage development, with possible application to room-correction technology.