Music is, fundamentally, a set of variations in air pressure that our ears are able to interpret in a meaningful way. Making music means creating those air pressure variations from scratch. Reproducing music means using a representation of these air pressure variations to create as close a facsimile as possible. For that to work we need a means to store that representation. Analog systems store it as variations in the magnetic field on a tape, or as variations in the depth of a groove spiralling round a disc or a cylinder. These systems work, but there are limitations imposed by the physical properties of the tapes, discs, and cylinders, which dictate just how much information can be stored on them. Digital systems store the same information as a set of numbers and, as with analog systems, there are limitations, this time imposed by mathematics rather than by any fundamental physical property of the digital storage medium.
The idea behind a digital representation of an analog waveform is that you measure the analog waveform on a periodic basis and store the resulting numbers. It has to be a periodic basis, because when it comes time to re-create the original waveform, you have to know the exact time when each measurement was made as well as the exact value that was measured. You could choose to store the time value as an additional data point of its own, but that would make for an extremely complicated system. Instead, we adopt the convention that the measurements are made at exactly regular intervals, a certain specific number of times per second. This is what we call the sample rate, and we call this method of digital representation PCM (Pulse Code Modulation). The sample rate turns out to have a major impact on how the resulting PCM recording will sound.
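To make the idea concrete, here is a minimal sketch of PCM sampling in Python. It measures a test sine wave at regular intervals and rounds each measurement to an integer. The function name `sample_pcm`, the 1kHz test tone, and the choice of 16-bit integers are assumptions made purely for illustration; note that no time value is stored, because the time of each measurement is implied by its position in the list.

```python
import math

def sample_pcm(freq_hz, sample_rate_hz, num_samples, bits=16):
    """Measure a unit-amplitude sine wave at regular intervals and
    round each measurement to a signed integer of the given bit depth."""
    full_scale = 2 ** (bits - 1) - 1  # largest positive value, 32767 for 16 bits
    samples = []
    for n in range(num_samples):
        t = n / sample_rate_hz  # the time is implied by the sample index n
        value = math.sin(2 * math.pi * freq_hz * t)
        samples.append(round(value * full_scale))
    return samples

# Hypothetical example: the first 8 samples of a 1kHz tone at 44,100 samples per second
pcm = sample_pcm(freq_hz=1000, sample_rate_hz=44100, num_samples=8)
print(pcm)
```

The only things a player needs in order to place these numbers back in time are the list itself and the sample rate.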
Music is generally held to occupy the frequencies between 20Hz and 20kHz. Very few people can hear frequencies as high as 20kHz, and for pretty well all of us, this upper limit of hearing falls progressively as we age. But, in general we hold to the idea that to record music faithfully, we need to record all frequencies from 20Hz to 20kHz. How does this impact the choice of sample rate for a PCM system? The main consideration here is the well-known Nyquist-Shannon sampling theorem, which tells us that in order to make a digital representation of an analog signal of a certain frequency, the sampling rate must be more than double that frequency. It should be noted that this is not an approximation. It is a mathematical fact. We call the frequency which is one-half of the sample rate the Nyquist Frequency. The Nyquist-Shannon theorem informs us that a PCM system can capture only those frequencies which lie below the Nyquist Frequency.
If music occupies a frequency range which tops out at 20kHz, then in order to represent it faithfully in a digital system Nyquist-Shannon says that we need to sample it at a sample rate no lower than 40kHz. If that were all there was to it, life would be simple. But Nyquist-Shannon theory tells us some other things too. What happens if we try to encode a frequency that is above the Nyquist Frequency? The answer is that it gets encoded very well. But, unfortunately, the result is indistinguishable from what you would get if you instead recorded a certain frequency below the Nyquist Frequency. If the Nyquist Frequency were 20kHz, then a 21kHz signal would be encoded exactly the same as a 19kHz signal; a 22kHz signal would be the same as an 18kHz signal; a 23kHz signal would be the same as a 17kHz signal; and so on. This effect is called Aliasing (or Mirroring). All information existing in the recording above the Nyquist Frequency would be Aliased (or Mirrored) to a corresponding frequency below it. Such effects are - surprise! - destructive to the sound quality.
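The mirroring described above is easy to check numerically. The sketch below, using hypothetical helper names `alias_frequency` and `sample_cosine`, folds a frequency back around the Nyquist Frequency and then confirms that a 21kHz cosine and a 19kHz cosine, both sampled at 40kHz, yield the same sample values. Cosines are used because their samples match exactly; with sines the aliased samples come out inverted in sign, which is still the same frequency.

```python
import math

def alias_frequency(freq_hz, sample_rate_hz):
    """Return the frequency at or below the Nyquist Frequency that
    freq_hz cannot be distinguished from once it has been sampled."""
    folded = freq_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)

def sample_cosine(freq_hz, sample_rate_hz, num_samples):
    """Sample a unit-amplitude cosine at regular intervals."""
    return [math.cos(2 * math.pi * freq_hz * n / sample_rate_hz)
            for n in range(num_samples)]

SAMPLE_RATE = 40_000  # gives a Nyquist Frequency of 20kHz, as in the text

for f in (21_000, 22_000, 23_000):
    print(f, "Hz is stored as", alias_frequency(f, SAMPLE_RATE), "Hz")

# Compare the actual sample values of a 21kHz tone and a 19kHz tone
high = sample_cosine(21_000, SAMPLE_RATE, 16)
low = sample_cosine(19_000, SAMPLE_RATE, 16)
max_diff = max(abs(a - b) for a, b in zip(high, low))
print("largest difference between the two sample sets:", max_diff)
```

The difference is zero apart from floating-point rounding, so once the samples are stored there is no way to tell which of the two tones was recorded.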
The solution to this problem is to pass the analog signal through a low-pass filter whose function is to filter out all the high frequencies. This is not as simple as it sounds. In theory you would want a filter that massively attenuates everything above 20kHz and nothing below it. This type of filter is called a brick-wall filter, for obvious reasons. The problem is that a real-world brick-wall filter makes the transition from flat to attenuating over a range of frequencies that you might think of as a no-man’s land. Within the no-man’s land the attenuation of the filter is not high enough to prevent aliasing, and not low enough to avoid audibly affecting the music signal. Therefore the no-man’s land must occupy a range of frequencies above the maximum frequency of the music content, but below the Nyquist Frequency. For this reason, the Nyquist Frequency should always be higher than the maximum signal frequency.
It turns out that making this no-man’s land as small as it can practically be is a question of how we design the brick-wall filter, something I will come back to in a moment. Anyway, applying this kind of thinking to the case in point, we end up moving the Nyquist Frequency up a bit from 20kHz to 22.05kHz. Recall that the sample rate is twice the Nyquist Frequency. That puts the sample rate at a familiar number - 44,100 samples per second. This is the thinking that gave birth to the CD format.
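The arithmetic is simple enough to write out. This small sketch (the constant and function names are illustrative, not taken from any standard) computes the Nyquist Frequency for the CD sample rate and the width of the no-man's land left over for the filter.

```python
def nyquist_frequency(sample_rate_hz):
    """The Nyquist Frequency is one-half of the sample rate."""
    return sample_rate_hz / 2

CD_SAMPLE_RATE_HZ = 44_100
MAX_AUDIO_HZ = 20_000  # assumed upper limit of the music content

nyquist_hz = nyquist_frequency(CD_SAMPLE_RATE_HZ)
transition_band_hz = nyquist_hz - MAX_AUDIO_HZ

print("Nyquist Frequency:", nyquist_hz, "Hz")          # 22050.0 Hz
print("room for the filter:", transition_band_hz, "Hz")  # 2050.0 Hz
```

So the filter has just 2.05kHz in which to go from passing everything to blocking everything.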
At this point we are still not quite finished with Aliasing. Recall that signals above the Nyquist Frequency that are encoded into the data stream cannot be distinguished from their Aliased counterparts below the Nyquist Frequency. The same is true in reverse during playback. For every frequency the DAC generates below the Nyquist Frequency, it also generates a companion at its Alias frequency. All those aliases are above the Nyquist Frequency, and we need to filter those out during playback. This requires another brick wall filter similar to the one we implemented for the recording process.
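As a rough illustration of where those playback companions land, the sketch below lists them for a single tone. It assumes the standard result that a sampled signal's spectrum repeats around every whole multiple of the sample rate, so a tone at f has companions at k × sample rate ± f; the function name `image_frequencies` and the 1kHz test tone are hypothetical.

```python
def image_frequencies(freq_hz, sample_rate_hz, num_multiples=2):
    """List the companion frequencies that appear around the first few
    whole multiples of the sample rate for a tone at freq_hz."""
    images = []
    for k in range(1, num_multiples + 1):
        images.append(k * sample_rate_hz - freq_hz)
        images.append(k * sample_rate_hz + freq_hz)
    return images

SAMPLE_RATE_HZ = 44_100
images = image_frequencies(1_000, SAMPLE_RATE_HZ)
print(images)  # [43100, 45100, 87200, 89200]

# Every companion lies above the Nyquist Frequency of 22,050Hz
print(all(f > SAMPLE_RATE_HZ / 2 for f in images))  # True
```

The lowest companion of a tone at f sits at the sample rate minus f, so a tone right at 20kHz has one at 24.1kHz, uncomfortably close to the audio band.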
Summarizing the above, then, all we need is a sample rate a little bit above twice the maximum frequency we want to record, plus a brick-wall filter, and we’re good to go. Whoa boy! Not so fast…
There were two assumptions that we made along the way, one overtly, and one covertly by accepting something blindly without questioning it. The first assumption was that it is acceptable to restrict the frequency content to 20kHz because nobody can hear anything above that. It turns out this is not quite correct, depending on how pedantic you want to be about defining the word “hear”. Hot off the press, the latest research has thrown up an interesting result. Working with subjects who have taken a conventional listening test, and who are clearly unable to discern any audio above 20kHz, scientists have wired their brains up to the latest in scientific instruments, and have shown that their brains do in fact react quite unambiguously to the presence of audio signals at frequencies significantly above 20kHz, which the subjects themselves appear to be blissfully unaware of. This is ongoing research, so it is too early to draw conclusions as to what this means, but maybe it points to a rationale for extending the bandwidth of our recordings up from 20kHz. But how far? 30kHz? 50kHz? We don’t have any answers yet.
The second assumption is less esoteric. We dismissed the brick-wall filters as just another circuit element that we could add at our whim. We declined to consider what - if anything - their audible contribution might be. This was not wise. With 16-bit audio, this filter’s job is to be flat up to 20kHz, and thereafter to roll off rapidly to the point where it is 96dB down by the time it reaches 22.05kHz. That is one monster mother of a filter, and typically would involve a huge component count including capacitors, inductors, and in many cases op amps, all of which are the sorts of components high-end audio circuit designers go to fantastic lengths to eliminate from their signal paths. Even with the best conceivable design in the world, where the filter’s frequency response is nice and flat, and its phase response is nice and linear, and it still meets its attenuation requirements, such filters are going to have an audible impact on the signal passing through them. And you have one of them at each of the A-to-D and D-to-A ends of the digital audio chain. There is an argument to be made that the sound of CD is not so much the sound of 16 bits and 44.1kHz as the sound of an analog brick-wall filter.
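To get a feel for just how steep that is, here is a back-of-envelope sketch. It derives the 96dB figure from the 16-bit word length and works out the slope the filter would need between 20kHz and 22.05kHz. The final step leans on the textbook rule of thumb that a simple filter rolls off at about 6dB per octave per order; that is an assumption used only to indicate scale, since real designs such as elliptic filters achieve a much sharper transition per order.

```python
import math

BITS = 16
PASSBAND_EDGE_HZ = 20_000  # the filter must stay flat up to here
NYQUIST_HZ = 22_050        # and be fully attenuating by here

# Ratio between the largest and smallest 16-bit steps, expressed in dB
dynamic_range_db = 20 * math.log10(2 ** BITS)

# Width of the no-man's land, measured in octaves
octaves = math.log2(NYQUIST_HZ / PASSBAND_EDGE_HZ)

# Slope needed to drop by the full dynamic range within that width
slope_db_per_octave = dynamic_range_db / octaves

# Rule-of-thumb assumption: roughly 6.02dB per octave for each filter order
approx_order = math.ceil(slope_db_per_octave / 6.02)

print(f"dynamic range: {dynamic_range_db:.1f} dB")
print(f"transition width: {octaves:.3f} octaves")
print(f"required slope: {slope_db_per_octave:.0f} dB per octave")
print(f"naive filter order: about {approx_order}")
```

A slope of several hundred dB per octave squeezed into about a seventh of an octave is the "monster" in question.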
So how do different sample rates help here? I will discuss this tomorrow in Part II.