*sticky*" in the Pages section, regarding AirPlay.

## Tuesday 26 August 2014

## Tuesday 19 August 2014

### Digital Filters

*I’m just sittin' in the morning sun, and I'll be sittin' when the evening comes, watching the ships roll in. Then I watch them roll away again. I'm sittin' on the dock of the bay watchin' the tide roll away. I'm just sittin' on the dock of the bay, wastin' time.*

But it’s not an entire waste of time. Watching the tide is an excellent metaphor for digital audio, and one I like to turn to often. Tides are an extremely low-frequency phenomenon, having a period of about 12.5 hours. Armed with nothing more technically sophisticated than a notebook and pencil, we can make observations of the tide at a much higher frequency. Noting the depth of the water as often as once a minute would not be unduly challenging, except that it might get a bit old after the first hour or so.

Measuring the depth of the water is especially easy when you're sittin' on the dock of the bay, right next to one of those harbour markers that read off the water depth in feet. But it still presents you with problem No 1, which is that the water just won’t sit still. The surface continues to bob up and down, sometimes by quite a large amount, driven by wind, the wakes of boats, and any number of other factors. Expressing the problem in technical terms, even though the tide itself has a long, slow period, the thing you actually measure - the position of the water surface - changes rather dramatically in the short interval during which you want to actually measure it. In other words, the thing you are trying to observe has both high frequency and low frequency components, and the high frequency components are doing nothing other than getting in the way.

Anyway, after a few days of this, you’ll have enough data to head back to your lab and process it. With little more than a sheet of graph paper you can plot out your data and very quickly you will see the long slow period of the tide dominate the picture. Job done. For greater accuracy, you can pass your data through a low-pass filter, and get rid of all the high frequency components that don’t actually mean anything in the context of tidal analysis, and end up with the actual waveform of the tide itself, which, depending on which Bay your Dock is located in, may not be truly sinusoidal.
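For anyone who would rather not sit on the dock for three days, here is a minimal sketch of the same exercise. The tide amplitude, the period of the surface chop, and the one-hour averaging window are all numbers I have made up for illustration, and a simple moving average stands in for a properly designed low-pass filter.

```python
import math

TIDE_PERIOD_MIN = 12.5 * 60   # tidal period from the text, in minutes
CHOP_PERIOD_MIN = 7.0         # hypothetical period of the surface chop, in minutes


def tide(minute):
    """The slow waveform we are trying to recover (amplitude is an assumption)."""
    return 3.0 * math.sin(2 * math.pi * minute / TIDE_PERIOD_MIN)


def reading(minute):
    """What the harbour marker actually shows: tide plus high-frequency chop."""
    return tide(minute) + 0.5 * math.sin(2 * math.pi * minute / CHOP_PERIOD_MIN)


def moving_average(samples, width):
    """Crude low-pass filter: replace each reading by the mean of its neighbours."""
    half = width // 2
    out = []
    for i in range(len(samples)):
        lo = max(0, i - half)
        hi = min(len(samples), i + half + 1)
        out.append(sum(samples[lo:hi]) / (hi - lo))
    return out


raw = [reading(m) for m in range(3 * 24 * 60)]   # three days, one reading a minute
smooth = moving_average(raw, 61)                 # roughly a one-hour window

interior = range(30, len(raw) - 30)              # skip the ends, where the window is short
print("worst raw error:     ", max(abs(raw[m] - tide(m)) for m in interior))
print("worst filtered error:", max(abs(smooth[m] - tide(m)) for m in interior))
```

The averaging window is long compared with the chop but short compared with the tide, which is the whole trick: the chop largely cancels itself out while the tide passes through almost untouched.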

Digital filters work in a very simple way. You take the current data point and add to it a certain fraction of the previous data point, and then a different fraction of the data point before that, and so on. You can also add in a fraction of the previous output value of the filter, plus another fraction of the value before that, and so on. Those fractions are called the “coefficients” of the filter, and their values drop out of the “pole dancing” exercise I described a few posts back. Depending on the requirements of the filter, there can be many of these coefficients, sometimes even thousands of them.
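That recipe can be sketched in a few lines. The function name and the coefficient values below are my own, chosen purely for illustration, and I have followed the wording above by adding the feedback terms (many textbooks write them with the opposite sign).

```python
def digital_filter(samples, feedforward, feedback=()):
    """Weighted sum of current/previous inputs plus weighted previous outputs.

    feedforward[k] multiplies the input k samples ago (k = 0 is the current one).
    feedback[k-1] multiplies the output k samples ago (k = 1 is the previous one).
    """
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for k, coeff in enumerate(feedforward):
            if n - k >= 0:
                acc += coeff * samples[n - k]
        for k, coeff in enumerate(feedback, start=1):
            if n - k >= 0:
                acc += coeff * out[n - k]
        out.append(acc)
    return out


impulse = [1.0] + [0.0] * 7

# Inputs only: the response to a single impulse dies after two samples.
print(digital_filter(impulse, [0.5, 0.5]))

# With feedback: every output carries a fragment of every earlier input.
print(digital_filter(impulse, [0.5], [0.5]))
```

The second call never quite returns to zero, which is the sense in which a filter with feedback contains fragments of all of the previous input values.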

It is clear, therefore, that the output of a digital filter contains fragments of a great many of the previous values of the input signal, in some cases fragments of all of them. This gives rise to the misleading, but conceptually useful idea that a digital filter “smears” the input signal in time, to the detriment of its impulse response. The truth is, though, that the behaviour of the filter is described exactly by its transfer function. And the transfer function, as described in my earlier post, encapsulates both the frequency response and the phase response, which, together, serve to define the impulse response.

Given that the primary purpose of a filter is to have a certain frequency response characteristic, the best way to look at the phase response and the impulse response is to consider that they are different ways of viewing the same set of attributes. The phase response looks at them in the frequency domain, and the impulse response in the time domain. Both are equally valid, and, from an audio perspective, both serve to attempt to characterize the way the filter ‘sounds’. We know that a low-pass filter will cut off the high frequencies, and we can easily hear the effect of that. But there are different ways to implement the filter, each having more or less the same frequency response. The differences between them lie in their phase and impulse responses, and these - in principle at least - have the potential to impact the sound.
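To make that concrete, here is a small sketch using two deliberately tiny filters of my own invention: the same two coefficients, in opposite order. Evaluating each at a handful of frequencies (44.1kHz sample rate assumed) shows identical magnitudes but different phases.

```python
import cmath
import math


def response(coeffs, freq, sample_rate):
    """Complex response of an inputs-only filter at one frequency."""
    w = 2 * math.pi * freq / sample_rate
    return sum(c * cmath.exp(-1j * w * k) for k, c in enumerate(coeffs))


filter_a = [1.0, 0.5]   # hypothetical coefficients, for illustration only
filter_b = [0.5, 1.0]   # the same coefficients in reverse order

for freq in (1000, 5000, 10000, 20000):
    ha = response(filter_a, freq, 44100)
    hb = response(filter_b, freq, 44100)
    print(f"{freq:>6} Hz  |A|={abs(ha):.4f}  |B|={abs(hb):.4f}  "
          f"phase A={cmath.phase(ha):+.3f}  phase B={cmath.phase(hb):+.3f} rad")
```

Their impulse responses are simply the coefficient lists themselves, one front-loaded and one back-loaded, so this is the same difference seen from the time domain.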

In audio applications, many - I dare even say most - audiophiles report being able to hear the difference between different filters having the same frequency response. Clearly, therefore, it would make sense to suggest that the audible differences are encapsulated in the filters’ phase and impulse responses. It is therefore natural to suggest that certain features which one may observe in a phase response or impulse response are either more desirable or less desirable than others. The trouble is that there is little in the way of sound (no pun intended) logic behind that.

In one of my earlier posts I described the effects of phase on the frequency components of a square wave. I talked about how simple changes in the phase of those components could give rise to rather dramatic changes in the waveform, turning it from a good approximation to a square wave, into a mushy jumble. Yet, despite the apparently dramatic impact upon the waveform, listening tests performed many times by many testers on many groups of subjects have tended to demonstrate quite conclusively that humans - even golden-eared audiophiles - cannot reliably detect differences in sounds due solely to phase changes.

One limitation - and, in my view, a major one - of these tests is that they can only really be performed unambiguously using electronically generated test waveforms. The currently accepted model of human audio perception is that our brains compare what we hear to some internalized version of what we think we ought to be hearing, and that our assessment of the ‘fidelity’ of the sound somehow arises from differences between the internalized model and what we are actually hearing. If we are listening to synthesized test tones, then in practice we don’t have a good internalized model to compare it against, and so we don’t have solid foundation against which we can hope to detect subtle differences. Think about it. When have you ever heard a synthesized tone and thought “that sounds natural”, or even “that sounds unnatural”? So a good picture of the phase response of a filter does not give us a comprehensive theoretical basis on which to hang our hats when we try to assess whether one filter is going to sound better or worse than another.

Impulse response, on the other hand, does generate a lot of opinion. The problem here is that we can look at a depiction of an impulse response, and convince ourselves that we either like it or don’t, and rationalize a basis for doing so. Impulse responses are characterized typically by decayed ringing following the impulse, and sometimes also by what we call pre-ringing, the appearance of a ringing phenomenon before the actual impulse. We tend to recoil from impulse responses with a pronounced pre-ring, because we imagine hearing transient events like a cymbal strike and having them somehow commence before the actual strike itself, like a tape played backwards. Surely, we say, with the confidence of Basil Fawlty stating the bleedin’ obvious, pre-ringing is a **bad thing**.

Well, no. The thing is, a cymbal strike and an impulse are not the same thing. An impulse is a decidedly non-musical mathematical construct. And while, yes, a filter with a pre-ringing impulse response can be shown to create actual artifacts in advance of transient events, these can be expressed as being due to phase response characteristics which advance the phases of certain frequency components more than others. The fact is that the pre-ringing in an impulse response does not result in comparable pre-ringing in a real-world music waveform after it has passed through the filter. It reflects only the relative phases of the remaining frequency components after the filter has done its job.

Having said that, though, the preponderance of anecdotal evidence does tend to favour the sonic characteristics of filters which exhibit minimal pre-ringing in their impulse responses. So, while it may be acceptable to channel your inner Basil Fawlty and hold forth on the need for a ‘good’ impulse response, you might want to stop short of attempting to explain why that should be.

Why the fixation on filters with a pre-ringing impulse response? Simply because they are the easiest to design, and to implement in real-time implementations. If you don’t want the pre-ringing, that’s fine. But you have a lot more in the way of obstacles to overcome in the design and implementation. To be fair, in most audiophile applications these obstacles and limitations are not really a major concern. But any time a filter is required where the designer is not (or chooses not to be) constrained by audiophile sensitivities, you are going to get a filter with a pre-ring in the impulse response.
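As an illustration of how easy the pre-ringing kind is to produce, here is a sketch of one textbook recipe: a sinc function shaped by a Hann window. The recipe, the 31 coefficients, and the 5kHz cutoff are my choices for the example, not a description of any particular product's filter. Because the coefficients are symmetric about the middle, whatever ringing follows the main peak is mirrored in front of it.

```python
import math


def windowed_sinc_lowpass(num_taps, cutoff, sample_rate):
    """Symmetric low-pass coefficients: an ideal sinc shaped by a Hann window."""
    fc = cutoff / sample_rate                 # cutoff as a fraction of the sample rate
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        t = n - mid
        ideal = 2 * fc if t == 0 else math.sin(2 * math.pi * fc * t) / (math.pi * t)
        window = 0.5 - 0.5 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [tap / total for tap in taps]      # normalise for unity gain at DC


# For an inputs-only filter the impulse response is just the coefficient list.
taps = windowed_sinc_lowpass(31, 5000, 44100)
centre = len(taps) // 2
print("before the peak:", [round(t, 4) for t in taps[:centre]])
print("peak:           ", round(taps[centre], 4))
print("after the peak: ", [round(t, 4) for t in taps[centre + 1:]])
```

Removing the half that precedes the peak without wrecking the frequency response is where the extra design effort goes.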

The final matter I want to raise for this post is the question of whether or not a digital filter is lossy. In a sense that is a dumb question. Of course a filter is lossy. If it wasn’t lossy it wouldn’t be doing anything in the first place. Filters are all about throwing away something you don’t want and keeping something you do want. You can’t run your music through an “inverse” filter and regain the original waveform, because the stuff that was lost is gone forever and cannot be recreated out of thin air. But if the input signal comprises a mixture of something you want (music) and something you don’t want (noise), and if the music lives in one band of frequencies and the noise in another band of frequencies, then in principle we can design a filter that will remove the noise and leave the music intact.

There are not many circumstances where that is possible to such an extent that the filter leaves the music “losslessly” intact, but setting that aside, what would it mean if we could? Suppose I had a music waveform - for example a 24/384 recording that was bandwidth limited in the analog domain to, say, 20kHz. Suppose I then synthesized another waveform, this time a 24/384 waveform comprising noise limited to the frequency band 100-175kHz. Suppose I mixed the two together. Here, then, is the notion of a “lossless” low-pass filter. If I designed a filter that passed DC to 20kHz, and attenuated everything at 100kHz and above, it would be a truly lossless filter if I could pass my mixed signal through it and find that the output was identical to the original music waveform. [*I haven’t actually done that experiment, but it strikes me it would be an interesting one to try.*]
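A toy version of that experiment is easy enough to sketch, with some loud caveats. The “music” here is just two sine waves below 20kHz, the “noise” is two sine waves inside the 100-175kHz band, and the filter is a 63-coefficient Blackman-windowed sinc with its cutoff at 60kHz. All of those are stand-ins I have picked for illustration; a real test would use real recordings and real noise.

```python
import math

FS = 384_000   # sample rate of the 24/384 example, in Hz
COUNT = 2048   # number of samples in the toy signal (an arbitrary choice)


def lowpass_taps(num_taps, cutoff):
    """Symmetric windowed-sinc low-pass coefficients (Blackman window)."""
    fc = cutoff / FS
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        t = n - mid
        ideal = 2 * fc if t == 0 else math.sin(2 * math.pi * fc * t) / (math.pi * t)
        phase = 2 * math.pi * n / (num_taps - 1)
        window = 0.42 - 0.5 * math.cos(phase) + 0.08 * math.cos(2 * phase)
        taps.append(ideal * window)
    total = sum(taps)
    return [tap / total for tap in taps]


def tone(freq, amplitude):
    return [amplitude * math.sin(2 * math.pi * freq * n / FS) for n in range(COUNT)]


music = [a + b for a, b in zip(tone(1_000, 1.0), tone(15_000, 0.5))]      # stand-in music
noise = [a + b for a, b in zip(tone(120_000, 0.5), tone(170_000, 0.3))]   # stand-in noise
mixed = [m + n for m, n in zip(music, noise)]

taps = lowpass_taps(63, 60_000)
delay = (len(taps) - 1) // 2   # a symmetric filter delays its output by this many samples


def residual(signal):
    """Largest difference between the filtered signal and the (delayed) music."""
    worst = 0.0
    for n in range(len(taps) - 1, COUNT):
        filtered = sum(tap * signal[n - k] for k, tap in enumerate(taps))
        worst = max(worst, abs(filtered - music[n - delay]))
    return worst


print("worst error before filtering:", max(abs(x - m) for x, m in zip(mixed, music)))
print("worst error after filtering: ", residual(mixed))
```

The error after filtering comes out small but not zero, and the output is also delayed by 31 samples, which I have compensated for before comparing. Whether a leftover of that size counts as “lossless” is exactly the question.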

What are the things that govern whether this “lossless” characteristic can actually be met? The first is in the actual filter design. Our goal is for the filter to pass all frequencies from DC to 20kHz (the “passband”), and it is not hard to do that to a first approximation. But if we look closely we see that the passband of the filter is not actually flat. It contains a residual ripple. The size and frequency of this ripple are determined by the number of poles and zeroes in the filter design. The more you allow, the smaller you can make the ripple, but you can’t make it totally go away. Are these passband ripples actually audible, either directly or via inescapable secondary effects? That is a good question, and I don’t think we have definitive answers to it.

The second thing that governs the “losslessness” of a digital filter is computational rounding errors. Every time you add together two N-bit numbers the answer is an (N+1)-bit number. But in a computer, if you are only able to represent the answer in an N-bit environment, you have to throw away the least significant bit of your answer. It is much worse for multiplication and division. Therefore, the more additions and multiplications you do, the more times you have to throw away bits of useful information. If you do it enough times, it is possible to actually end up replacing all of your data with noise! In real-world computers, the solution is to use as many bits as possible to represent your data. In most computers the largest bit depth available is 64 bits, which is really a huge quantity. Also, many Intel CPUs actually perform their floating point arithmetic internally in an 80-bit structure, which helps a little. All of this means that with a 64-bit data engine, you can implement filters with many, many poles and not lose data to rounding errors.
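You can watch this happen with nothing more than repeated addition. The sketch below adds 0.1 to a running total 100,000 times, once rounding the total to 32-bit floating point after every step and once leaving it in Python's native 64-bit format. The step size and the count are arbitrary choices of mine, and the 32-bit rounding is emulated with the standard `struct` module.

```python
import struct


def to_float32(value):
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", value))[0]


def accumulate(step, count, narrow):
    """Add `step` to a running total `count` times, optionally in 32-bit precision."""
    if narrow:
        step = to_float32(step)
    total = 0.0
    for _ in range(count):
        total += step
        if narrow:
            total = to_float32(total)   # throw away the bits that do not fit
    return total


COUNT = 100_000
exact = COUNT / 10                      # the answer we would get with perfect arithmetic
err32 = abs(accumulate(0.1, COUNT, narrow=True) - exact)
err64 = abs(accumulate(0.1, COUNT, narrow=False) - exact)
print("error with 32-bit totals:", err32)
print("error with 64-bit totals:", err64)
```

Neither total is exact, but the 64-bit one is wrong by many orders of magnitude less, which is the argument for using as many bits as you can get.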

The best expression of rounding errors is to use Google to hunt down data for the background noise level produced by Fourier Transforms of different lengths, using different data representations, and different FFT algorithms (which relates to the number of additions/multiplications needed to execute the transform) - and different CPUs! I have done this before, but I can’t remember where I found the information.

So there we have it. There are four areas in which a filter may have an audible signature: (i) the frequency response (well, duh, but I mean residual ripple in the passband of the frequency response), (ii) features of the phase response, (iii) features of the impulse response, (iv) noise due to computational rounding errors. Of those, the first three are intimately connected, but the question is whether or not there are measurable factors in those specific views into the fundamental performance of the filter, which can be tracked down and quantified as to their sonic impact. The state-of-the-art has not yet reached that point, and will not do so until such time as the consensus view of the golden-eared audiophile community is in agreement with the scientific community that it has.

## Saturday 16 August 2014

### Someone is watching you...

My brother-in-law has a great knack for sending just the right birthday card. Here, for your reading pleasure, is the (lengthy) inscription on this year's offering.

The burglar has been watching the house for a while. He is certain there is nobody at home, and decides the timing is right to break in. He knows the occupant has an expensive stereo and makes his way straight for it. Nice! He disconnects all the cables from the PS Audio DirectStream DAC and is just about to lift it from the shelf when, from the darkness at the back of the room, he hears a quiet voice.

*"Jesus is watching you."*

Momentarily shocked, the burglar freezes and slowly turns round. Seeing nothing, he puts it down to his imagination and turns his attention back to the DirectStream.

*"Jesus is watching you"*, says the voice again.

This time the clearly nervous burglar switches his flashlight on and scans the darkened room with it. The beam comes to rest on a parrot, sitting in a cage.

*"Did you just say that?"* asks the startled burglar.

*"Yes I did,"* replies the parrot, *"but I was only trying to warn you."*

*"You?"* asks the incredulous burglar. *"And who the hell are you, to be warning me?"*

*"My name is Moses"* says the parrot.

*"Moses?"* says the bemused burglar. *"What kind of people call their parrot Moses?"*

The parrot squawks, and thinks for a moment before replying.

*"The kind of people who call their Rottweiler, 'Jesus'."*

## Thursday 14 August 2014

### Understanding Fourier Transforms

Now that highly sophisticated analytical Apps such as Audacity are available as a free download, anybody can take an audio track and immediately obtain a Fourier Transform of it, without having to know a thing about it. And that is fine, so long as all you want is to take a quick and dirty look at the spectral content of the music. But if you want to be able to take it beyond a superficial glance, it would help to understand some of the fundamentals about what a Fourier Transform shows you, and what it doesn’t.

The first thing you need to know is the difference between a Fourier Transform and a DTFT or Discrete Time Fourier Transform. A Fourier Transform is a mathematical operation performed upon a function. That function may well be, for example, a musical waveform. However, when we want to get that waveform into a computer to actually **do** the analysis, all we can import is an audio file. An audio file is not the waveform itself, but merely a representation of the waveform. Therefore we do not have the waveform itself upon which to perform the transform, only the numbers which represent it. A DTFT is the digital equivalent of a Fourier Transform, performed upon a digital representation of the waveform.

The next thing you need to know is that a Fourier Transform requires the complete waveform, which means we need to know what it looked like since time began, and what it is going to look like in the infinite future. Real waveforms - particularly digitized representations of real waveforms - are not like that. They have a defined start and a defined stop. The DTFT gets around that by assuming that the waveform between the start and stop points repeats *ad infinitum* into both the past and the future. For the most part, this is a limitation that we can comfortably get around.

The above notwithstanding, for the remainder of this post I shall use the term Fourier Transform, even though everything I am discussing is in fact a DTFT. It just makes for easier reading.

The Fourier Transform (by which I mean DTFT, remember?) of a waveform represented by N samples is another set of N data points. These data points represent frequencies. The lowest frequency is DC, or zero Hertz (0Hz), and the rest step upwards from there in equal increments of Fs/N, where Fs is the sample rate of the waveform, so that the highest falls one increment short of Fs itself. So the more samples there are in the waveform that goes into the Fourier Transform, the more detailed will be the spectrum of frequencies in the output.

Those of you who have your antennae switched on will have noticed already that the output of the Fourier Transform includes all of the frequencies from zero to Fs, whereas, as we know, digital sampling can only record frequencies which are less than half of the sampling frequency (Fs/2). This is because all of the frequencies above Fs/2 are in fact the aliases of the frequencies below Fs/2, and if you look closely you will see that the spectrum above Fs/2 exactly mirrors that below Fs/2. For this reason, and because the alias frequencies have no analytical value whatsoever, it is normal practice for analytical software to display only those frequencies below Fs/2.
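Both points, the linear spacing of the output frequencies and the mirror image above Fs/2, can be seen in a small sketch. It evaluates the transform directly from its definition, using the standard convention that output number k sits at k times Fs/N. The 8kHz sample rate, the 64 samples, and the 625Hz test tone are hypothetical values picked so that the tone lands exactly on one of the output frequencies.

```python
import cmath
import math


def dft(samples):
    """Direct evaluation of the transform: N outputs, each a sum over N inputs."""
    n = len(samples)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(samples))
        for k in range(n)
    ]


def bin_frequency(k, num_samples, sample_rate):
    """Frequency represented by output number k."""
    return k * sample_rate / num_samples


FS = 8000       # hypothetical sample rate, Hz
N = 64          # number of samples in the snippet
TONE = 625.0    # falls exactly on output 5, since 5 * 8000 / 64 = 625

signal = [math.cos(2 * math.pi * TONE * i / FS) for i in range(N)]
magnitudes = [abs(value) for value in dft(signal)]

peak = max(range(N // 2 + 1), key=lambda k: magnitudes[k])
print("peak at output", peak, "=", bin_frequency(peak, N, FS), "Hz")
print("its mirror is output", N - peak, "=", bin_frequency(N - peak, N, FS), "Hz")
print("magnitudes:", magnitudes[peak], "and", magnitudes[N - peak])
```

The mirror-image output carries no new information, which is why analysis software does not bother to draw it.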

The Fourier Transform (DTFT - last reminder) is a very complicated mathematical operation, and we are indeed fortunate to live in an age where a desktop computer can perform it with ease. Nonetheless, it doesn’t hurt to perform our Fourier Transform in the most efficient manner possible. By far the most effective way to slash processing time is to restrict the Fourier Transform to snippets of music waveform where the number of samples is a power of two, which allows the use of a Fast Fourier Transform (FFT) algorithm. Even so, the processing time grows faster than the number of samples (roughly as N log N for an FFT, against N squared for a direct calculation), and the memory needed to store intermediate results grows too. This usually limits Fourier Transforms in analytical software to something like 65,536 samples (2 raised to the power 16). Many Apps limit you to less - for example Audacity limits you to 16,384.
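Here is a sketch of why the power-of-two restriction pays off. The direct calculation does N sums of N terms each; the classic radix-2 approach instead splits the samples into even and odd halves, transforms each half, and stitches the results together, which only works cleanly if the length keeps halving all the way down to one. This is a bare-bones textbook version for illustration, not what any particular App uses.

```python
import cmath
import math


def direct_dft(samples):
    """The definition, evaluated literally: about N * N operations."""
    n = len(samples)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(samples))
        for k in range(n)
    ]


def fft(samples):
    """Recursive radix-2 transform: about N * log2(N) operations."""
    n = len(samples)
    if n == 1:
        return list(samples)
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft(samples[0::2])
    odd = fft(samples[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


data = [math.sin(0.3 * i) + 0.5 * math.cos(1.1 * i) for i in range(64)]
worst = max(abs(a - b) for a, b in zip(direct_dft(data), fft(data)))
print("largest disagreement between the two methods:", worst)

for size in (1024, 16_384, 65_536):
    print(size, "samples: roughly", size * size, "operations direct,",
          size * int(math.log2(size)), "with the fast method")
```

The two methods agree to within rounding error, so the saving costs nothing in accuracy.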

If the music is sampled at 44.1kHz, then 16,384 samples amount to less than half a second of music. With high-resolution music it is correspondingly less. The fewer the samples and the higher the sample rate, the shorter the duration of the sampled window. Therefore if you are making observations regarding the spectral content of a music track, you probably need to be careful to look at multiple Fourier Transforms throughout the track because the spectral content typically evolves dramatically during the course of a complete track.

Let’s consider a snippet of a music waveform comprising N samples, spaced at regular intervals of t = (1/Fs), where Fs is the sample rate. The total duration of the snippet is therefore given by T = Nt = N/Fs. The Fourier Transform will comprise a spectrum of N frequencies starting at zero and stopping just short of Fs. These are spaced linearly at intervals of Fs/N, which is equal to 1/T regardless of the sample rate. That’s an important result. The longer the duration of the snippet, the more accurately we can analyze its frequency content.
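The arithmetic is worth running once for a 16,384-sample snippet at a few common sample rates. The function names are mine, and I am using the standard relations that the snippet lasts N/Fs seconds and that the output frequencies sit Fs/N apart.

```python
def window_duration(num_samples, sample_rate):
    """Length of the snippet in seconds."""
    return num_samples / sample_rate


def bin_spacing(num_samples, sample_rate):
    """Gap between adjacent output frequencies in Hz."""
    return sample_rate / num_samples


for rate in (44_100, 96_000, 192_000):
    duration = window_duration(16_384, rate)
    spacing = bin_spacing(16_384, rate)
    print(f"{rate} Hz: {duration:.4f} s of music, frequencies {spacing:.2f} Hz apart, "
          f"spacing x duration = {spacing * duration:.3f}")
```

Whatever the sample rate, the spacing multiplied by the duration comes out as one, so the only way to get finer frequency detail is a longer snippet.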

What does this mean if our music contains frequencies which are between those linearly spaced individual frequencies? This question goes to the root of digital audio. It means that a set of N samples, sampled at a rate of Fs, cannot tell us anything about the frequency content to an accuracy of better than Fs/N for the simple reason that **it cannot contain** any such information in the first place. The original waveform may contain frequencies known to a very precise degree of accuracy, but once you extract a subset of N samples, without the rest of the waveform to support them those N samples do not contain this additional information. This is perhaps a more palatable conclusion if you take it to its logical extreme. If I choose a snippet of only **two** samples from the original waveform and put those through a Fourier Transform, I end up with N=2 frequencies, DC and Fs/2, with nothing in between! It is self-evident that two consecutive samples can tell me absolutely nothing about the original waveform from which they were extracted.

It is important to realize that Fourier analysis tells you not so much how much information you can extract from the data, rather it tells you how much is actually in there in the first place.

Earlier on I mentioned that the DTFT requires us to make an assumption that the snippet of music that goes into the analysis repeats *ad infinitum* into both the past and the future. This is not in and of itself a problem, but there is a practical problem associated with it. Where the repeating waveforms meet each other there is going to be some sort of discontinuity. It is possible that there will be a smooth transition, but in the general case you are going to get an abrupt mismatch. This mismatch will generate a whole bunch of spurious frequencies, and you are not going to be able to tell which ones originate in the waveform, and which are associated with the discontinuity. The effect these spurious frequencies have in practice is that they serve to broaden the existing spectral lines, and spread them out further into the spectrum. In the limit, they will set the noise floor below which the Fourier Transform cannot extract spectral information, or more accurately, below which the waveform is, as a consequence, not capable of storing information. Because these stitching effects are more or less random, there is no way to model what the characteristics of their associated spectral spuriae will be.

Since they originate at the place where the finite waveform periodically stitches together with itself, it should be possible to ‘mask’ the effect of the discontinuity by ‘windowing’ the waveform with a window function which falls to zero at the interface points. Note that this kind of treatment cannot make the problem go away, since it is inherent in the data to begin with. But every window function will have the effect of adding its own particular spuriae to the Fourier Transform. So it may be of some use to choose a window function whose spectral spuriae are of a known characteristic, which at least gives you the possibility to choose a type whose characteristics are less deleterious to a particular property that you are interested in, but perhaps more so to one you are less interested in.
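The stitching problem is easy to provoke deliberately. In the sketch below, one 64-sample snippet holds exactly 8 cycles of a sine wave, so its end joins up seamlessly with its own beginning, and another holds 8.5 cycles, so it does not. The snippet length, the cycle counts, and the choice of output 20 as a "far away" frequency are all mine, for illustration only.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Magnitude of each output of a directly evaluated transform."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(samples)))
        for k in range(n)
    ]


N = 64
FAR_BIN = 20   # well away from the tone, which sits at or near output 8


def tone(cycles):
    """A sine wave completing `cycles` periods within the N-sample snippet."""
    return [math.sin(2 * math.pi * cycles * i / N) for i in range(N)]


seamless = dft_magnitudes(tone(8.0))     # repeats with no discontinuity
mismatched = dft_magnitudes(tone(8.5))   # abrupt mismatch where the copies meet

print("seamless snippet,   magnitude at output", FAR_BIN, ":", seamless[FAR_BIN])
print("mismatched snippet, magnitude at output", FAR_BIN, ":", mismatched[FAR_BIN])
```

In the seamless case the far-away output is zero to within rounding error. In the mismatched case it is not, even though the waveform itself contains nothing at that frequency.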

For example, you may be more interested in knowing the absolute magnitude of the signal content at a certain frequency as accurately as possible, and less interested in the magnitude of the noise floor through which it rises. Or it may be the other way round. Or you may be interested to know whether a feature in the vicinity of a certain frequency is just one broad spectral line or in fact comprises two frequencies close together. You may wish to know the line width of a certain frequency. All of these things can be addressed by choosing a particular window function whose properties are best suited to your requirements.

Common window functions include Rectangular, Hamming, Hanning, Bartlett, Blackman, Blackman-Harris, Gaussian, Flat-Top, Tukey, and many others. There are no real hard and fast rules, but a good choice of window for accurate resolution of frequency would be Rectangular; for lowest spectral leakage Blackman or Blackman-Harris; and for amplitude accuracy Flat-Top. Unfortunately, not all signal processing Apps offer the same selection of windowing functions, and many of those windowing functions come with their own sets of parameters and selectable types. So if you are using a Fourier Transform to make or illustrate a particular point or observation, it may be worth the effort to set about selecting an appropriate window function.
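To make the trade-off a little more concrete, here is a rough sketch in plain Python. It is only an illustration under my own assumptions: a 64-sample record holding 10.5 cycles of a sine wave (so the ends deliberately do not join up), a slow textbook DFT, and a Hann window written out by hand. The helper names are made up for this example. It compares how much energy leaks into bins well away from the tone with and without the window.

```python
import cmath
import math

def dft_magnitudes(samples):
    """Magnitude of each DFT bin up to Nyquist (plain O(N^2) DFT)."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(samples)))
        for k in range(n // 2)
    ]

def hann(n):
    """Hann window: falls to zero where the record stitches onto itself."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]

def far_leakage_db(spectrum, first_bin=20):
    """Energy in bins from first_bin upward, relative to the peak bin, in dB."""
    far_energy = sum(m * m for m in spectrum[first_bin:])
    return 10 * math.log10(far_energy / max(spectrum) ** 2)

N = 64
# 10.5 cycles in the record, so the repeated waveform has a mismatch at the join
tone = [math.sin(2 * math.pi * 10.5 * i / N) for i in range(N)]

rect_spectrum = dft_magnitudes(tone)  # no window = Rectangular window
hann_spectrum = dft_magnitudes([w * x for w, x in zip(hann(N), tone)])

print("Rectangular far leakage (dB):", far_leakage_db(rect_spectrum))
print("Hann far leakage (dB):       ", far_leakage_db(hann_spectrum))
```

The windowed version should show far less energy in the distant bins, at the price of a somewhat broader peak around the tone itself, which is exactly the kind of compromise described above.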

So there you are. A brief introduction to Fourier Transforms for the casual user. I hope you found it to be of some value.

## Wednesday 13 August 2014

### Pole Dancing

I have mentioned in previous posts that the frequency, phase, and impulse response of filters are inextricably tied together. Filter designers need to know exactly how and why these parameters are linked if they are going to be able to design effective filters, which means that a mathematical model for filters is required. Such models need to be able to describe both digital and analog filters equally well; indeed a fundamentally correct model should have precisely that property. But what does that have to do with pole dancing? Read on…

Some of the most challenging mathematical problems are routinely addressed by the trick of ‘transformation’. You re-state the problem by ‘transforming’ the data from one frame of reference to another. A transformation is any change made to a system such that everything in the transformed system corresponds to something unique in the original system and vice versa. You would do this because the problem, when expressed in terms of the new frame of reference, becomes soluble. The general goal is to find an appropriate transformation, one which expresses the problem in a form within which the solution can be identified. Take for example the problem of finding the cube root of a number. Not easy to perform by itself. But if you can ‘transform’ the number to its logarithm, the cube root is found simply by dividing the logarithm by three and transforming the result back again.
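The logarithm trick is easy enough to sketch in a few lines of Python. This is just an illustration of the transform, solve, transform-back pattern; the function name is my own invention, and it assumes a positive input.

```python
import math

def cube_root_via_log(x):
    """Cube root of a positive number by way of its logarithm."""
    log_x = math.log(x)        # transform into 'log space'
    log_root = log_x / 3       # the hard problem is now a simple division
    return math.exp(log_root)  # transform back to ordinary numbers

print(cube_root_via_log(27.0))
```

The hard step (taking a root) never happens directly; all the work is done by the forward and inverse transformations.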

Many of the most challenging problems in mathematics are ultimately addressed by transformation. A little over 20 years ago, Fermat’s Last Theorem remained unproven. This theorem simply states that no three positive integers exist such that the cube of one of them is equal to the sum of the cubes of the other two (and likewise for all powers higher than three). Although a simple problem that anyone can understand, it was ultimately solved by Andrew Wiles by employing the most fantastical transformation to express the problem in terms of constructs called “elliptic curves”, and solving the equivalent expression of the problem in that space, itself a gargantuan challenge. This is perhaps the most extreme example of a transformation, one which takes a concept which most laymen would have no trouble understanding, and renders it in a form accessible only to the most seriously skilled of experts.

At a simpler level, the Fourier Transform is an example understood by most audiophiles. By applying it to data representing a musical signal, we end up with a finely detailed representation of the spectral content of the music. This is information which is not readily apparent from inspection of the original music signal, and renders the music in such a form that we can analyze and manipulate its spectral content, which we could not do with the waveform alone. At the same time, the Fourier Transformed representation does not allow us to play the music, or inspect it for artifacts such as clipping or level setting.

Another problem for audiophiles is the design of filters. Filters are crucial to every aspect of audio reproduction. They are used in power supplies to help turn AC power into DC rail voltages. They are used to restrict gain stages to frequencies at which they don’t oscillate. They are used to prevent DC leakage from one circuit to another. They are used in loudspeaker crossovers to ensure drive units are asked to play only those frequencies for which they were designed. They are used on LP cutting lathes to suppress bass frequencies and enhance treble frequencies - and in phono preamplifiers to correct for it. And they are widely used in digital audio for a number of different purposes.

Filter design and analysis is surprisingly tricky, and this is where this post starts to get hairy. Modern filter theory makes use of a transformation called the z-Transform. This is closely related to the Fourier Transform (the Fourier Transform of a sampled signal is in fact a special case of the z-Transform, namely its values around the circle of radius one centred on the origin). The z-Transform takes an audio signal and transforms it into a new representation on a two-dimensional surface called z-space. This 2-dimensionality arises because z is a complex number, and complex numbers have two components - a ‘real’ part and an ‘imaginary’ part. If you represent the ‘real’ part on an x-axis and the ‘imaginary’ part on a y-axis, then all values of z can be represented as points on the 2-dimensional x-y surface.

It can be quite difficult to get your head around the concept of z-space, but think of it as a representation of frequency plus some additional attributes. Having transformed your music into z-space with the z-Transform, the behaviour of any filter can then be conveniently described by a function, usually written as H(z) and referred to as the transfer function. If we multiply the value of the z-Transform at every point in z-space by the value of H(z) at that point in z-space, we get a modified z-Transform. If we were to apply the reverse (or inverse) z-Transform to this modified data we would end up with a modified audio signal - what we get is the result of passing the original signal through the filter. It may sound complicated, but it is a lot simpler (and more accurate) than any other general treatment. The bottom line is that the function H(z) is a complete description of the filter and can be used on its own to extract any information we want regarding the performance of the filter. The behaviour of H(z) has some unexpected benefits. If H(z) = 1/z then the result of that filter is simply to delay the audio signal by one sample. This has interesting implications for digital filter design.
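If you want to convince yourself of the one-sample-delay claim, a small sketch helps. The code below assumes a short made-up signal and evaluates the z-Transform directly from its definition, X(z) = sum of x[n] times z to the power -n, at one arbitrarily chosen point in z-space. The function name and the test values are mine, purely for illustration.

```python
def z_transform(samples, z):
    """Evaluate X(z) = sum of x[n] * z**(-n) for a finite signal."""
    return sum(x * z ** (-n) for n, x in enumerate(samples))

signal = [1.0, 0.5, -0.25, 2.0]
delayed = [0.0] + signal           # the same signal, one sample later

z = complex(0.8, 0.6)              # an arbitrary point in z-space

direct = z_transform(delayed, z)                 # transform of the delayed signal
via_filter = z_transform(signal, z) * (1 / z)    # multiply by H(z) = 1/z

print(abs(direct - via_filter))    # difference should be rounding error only
```

Multiplying by 1/z in z-space and delaying by one sample in the time domain give the same answer, which is why powers of 1/z end up mapping so neatly onto the delay elements of a digital filter.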

The function H(z) can be absolutely anything you want it to be. However, having said that, there is nothing to prevent you from specifying a H(z) function which is unstable, has no useful purpose, or cannot be implemented. For all practical purposes, we are only really interested in transfer functions that are both stable and useful. Being useful means that we have to be able to implement it either as an analog or a digital filter, and the z-Transform methodology provides for both. The things that make a H(z) transfer function useful are its ‘Poles’ and ‘Zeros’. Poles are values of z for which H(z) is infinite (referred to as a singularity), and zeros are values of z for which H(z) is zero. Designing a filter then becomes a matter of placing poles and zeros in strategic positions in z-space. I have decided to call this ‘Pole Dancing’. Pole dancing requires great skill if your objectives are to be satisfied. It can take many forms, but the ones which have been proven over time to work best rely on certain specific steps, and you are best advised to stick with them. It is most effective when done by Pros, or at least by experienced amateurs.
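To give a feel for what ‘unstable’ means here, consider the simplest possible case: a filter with a single pole, H(z) = 1 / (1 - p/z). For digital filters the standard result is that a pole is safe if it lies inside the circle of radius one and troublesome if it lies outside. The sketch below (function name and pole values are my own choices) feeds a single impulse into such a filter for two different pole positions.

```python
def one_pole_impulse_response(pole, length):
    """Impulse response of H(z) = 1 / (1 - pole/z), i.e. y[n] = x[n] + pole * y[n-1]."""
    out = []
    previous = 0.0
    for n in range(length):
        x = 1.0 if n == 0 else 0.0     # a single impulse at n = 0
        previous = x + pole * previous
        out.append(previous)
    return out

stable = one_pole_impulse_response(0.9, 50)     # pole inside the unit circle
unstable = one_pole_impulse_response(1.1, 50)   # pole outside the unit circle

print("after 50 samples:", stable[-1], unstable[-1])
```

With the pole at 0.9 the response dies away; with the pole at 1.1 it keeps growing for ever, which is not a property you want in anything connected to your loudspeakers.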

Once you have danced your pole dance, it is then a relatively simple matter to use the poles and zeros you placed on your z-space to prepare the equation which describes H(z), and to re-arrange it into a convenient form. For digital filter design, the most convenient form is a polynomial in (1/z) divided by another polynomial in (1/z), in which case the coefficients of the two polynomials turn out to be the precise coefficients of the digital filter. The poles and zeros also translate into inductors and capacitors if you are designing an analog filter.
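Here is a minimal sketch of that last step for a digital filter, under some assumptions of my own: one zero at z = -1 and one pole at z = 0.5 (picked only because the arithmetic is easy to follow), no attempt to normalise the gain, and helper names invented for the example. It expands the zeros and poles into the two polynomials in (1/z), then uses those coefficients directly in the filter's difference equation.

```python
def poly_from_roots(roots):
    """Coefficients of the product of (1 - r/z) over all roots, in powers of 1/z."""
    coeffs = [1.0 + 0j]
    for r in roots:
        # multiply the running polynomial by (1 - r * z^-1)
        coeffs = [c - r * p for c, p in zip(coeffs + [0], [0] + coeffs)]
    # real roots or conjugate pairs leave only real coefficients
    return [c.real for c in coeffs]

def run_filter(b, a, samples):
    """Difference equation for H(z) = B(1/z) / A(1/z); a[0] is assumed to be 1."""
    out = []
    for n in range(len(samples)):
        acc = sum(b[k] * samples[n - k] for k in range(len(b)) if n - k >= 0)
        acc -= sum(a[k] * out[n - k] for k in range(1, len(a)) if n - k >= 0)
        out.append(acc)
    return out

zeros = [-1.0]   # hypothetical zero placement
poles = [0.5]    # hypothetical pole placement

b = poly_from_roots(zeros)   # numerator coefficients
a = poly_from_roots(poles)   # denominator coefficients

impulse = [1.0, 0.0, 0.0, 0.0]
print(b, a, run_filter(b, a, impulse))
```

The numerator coefficients weight the current and previous input samples, and the denominator coefficients weight the previous output samples, so the numbers that fall out of the algebra are literally the numbers you type into the filter.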

Once I have the transfer function H(z) nailed down, I can use it to calculate the frequency response, phase response, and impulse response of the filter by replacing z with its complex frequency representation. Recall that z is a complex number, having both ‘real’ and ‘imaginary’ parts. It therefore follows that H(z) itself is a complex function, and likewise has real and imaginary parts. The frequency response is obtained by evaluating the ‘magnitude’ of H(z), and the phase response is obtained by evaluating its ‘angle’. [*The square of the magnitude is the sum of the squares of the ‘real’ and ‘imaginary’ parts; the angle is that whose TAN() is given by the ratio of the ‘imaginary’ to the ‘real’ parts*]. Finally, the impulse response is obtained by the more complicated business of taking the inverse z-Transform of the entire H(z).

You can see why the phase, frequency and impulse responses are so intimately related. You have three functions, each described by only two variables - the ‘real’ and ‘imaginary’ parts of H(z). With three functions described by two variables, it is mathematically impossible to change any one without also changing at least one of the other two.
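The magnitude-and-angle recipe can be sketched as follows. I am assuming a toy transfer function with a zero at z = -1 and a pole at z = 0.5 (my own choice, not any particular real filter), and I am taking ‘replacing z with its complex frequency representation’ to mean the usual substitution of a point on the circle of radius one, with frequency expressed as a fraction of the sample rate.

```python
import cmath
import math

def H(z):
    """Hypothetical example filter: zero at z = -1, pole at z = 0.5."""
    return (1 + 1 / z) / (1 - 0.5 / z)

def response(freq_fraction):
    """Magnitude and phase at a frequency given as a fraction of the sample rate."""
    z = cmath.exp(2j * math.pi * freq_fraction)       # point on the unit circle
    h = H(z)
    magnitude = math.sqrt(h.real ** 2 + h.imag ** 2)  # sum of squares, then root
    angle = math.atan2(h.imag, h.real)                # angle whose tan is imag/real
    return magnitude, angle

for f in (0.0, 0.125, 0.25, 0.5):
    mag, ang = response(f)
    print(f, mag, ang)
```

Both numbers come out of the same complex value of H(z) at each frequency, which is the heart of the point about the responses being tied together: you cannot nudge one without the underlying real and imaginary parts moving too.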

You also get a hint of where the different ‘types’ of filter come from. Many of you will have come across the terms Butterworth, Chebyshev, Bessel, or Elliptic. These are classes of filter optimized for different specific attributes (I won’t go into those here). I mentioned that the transfer function H(z) can in principle be anything you want it to be. It turns out that each of those four filter types corresponds to having its H(z) function belong to one of four clearly-defined forms (which I also won’t attempt to describe).

Finally, you may be thinking that you can get any arbitrary filter characteristic you desire by specifying exactly what you require in terms of any two of the frequency/phase/impulse responses, and working back to the H(z) function that corresponds to it. And you would be right - you can do that. But then you will find that your clever new H(z) is either unstable, or cannot be realized using any real-world digital or analog filter structures.

In which case, maybe you need to turn to a Pole Dancer!
