Clipping is a
term that originated with analog audio, and refers to the situation where the
magnitude of the signal rises to a level that is larger than the ability of the
medium to store it, or the electronics to deliver it. For example, with magnetic tape, the signal
is stored by embedding a magnetic signal onto the tape. But if the musical signal gets beyond a
certain level, the tape will not have the magnetic capacity to store a large
enough magnetic signal. Same thing with
an amplifier – if you amplify a signal enough, you will eventually run out of
voltage (or current) at the output.
With magnetic
tape, as well as – generally speaking – with good old-fashioned vacuum tube
amplifiers, when the signal level approaches and exceeds the maximum the system
was designed to handle, the musical peaks get gradually compressed, so
gradually in fact that you mostly don’t notice it happening. This so-called “soft clipping” meant that
clipping was not the most crucial sonically degrading issue
faced by early audio designers.
This all
changed with the advent of solid-state electronics. Your typical transistor amplifier does not
soft-clip. It hard-clips. This means that when it tries to deliver an
output voltage larger than the maximum it was designed for, the output
voltage just sits at the maximum value and stays there until the output signal
drops below that maximum value. The peak
of the signal is just wiped out, and the signal waveform develops a flat-topped
appearance everywhere this hard-clip occurs.
Imagine Shaquille O’Neal walking through your front door, and instead
of gracefully ducking to avoid bumping his head, the door simply chops his head
off. The effect on the music is
similarly messy.
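To make the difference concrete, here is a minimal Python sketch of the two behaviours. The function names are hypothetical, and using `tanh` for the soft-clip curve is purely an illustrative assumption; real tape and tube circuits each have their own compression characteristics.

```python
import math

def hard_clip(x, limit=1.0):
    """Hold the output at +/-limit whenever the input tries to exceed it."""
    return max(-limit, min(limit, x))

def soft_clip(x, limit=1.0):
    """Compress peaks gradually. tanh is an assumed, illustrative curve."""
    return limit * math.tanh(x / limit)

# One cycle of a sine wave driven to 1.5 times the limit.
n = 16
sine = [1.5 * math.sin(2 * math.pi * k / n) for k in range(n)]
hard = [hard_clip(s) for s in sine]  # flat-topped: several samples sit at the limit
soft = [soft_clip(s) for s in sine]  # rounded: the peak approaches the limit smoothly
```

Printing `hard` shows a run of consecutive samples pinned at exactly 1.0 (the flat top), whereas `soft` never quite reaches the limit.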
In digital
audio, the effect of clipping can actually be even worse! Let’s look at what happens when a signal is
clipped. The easiest way to do that is
to consider the clipping as being an error signal which is added to the music
signal. This error signal comprises
nothing but the peaks that got chopped off.
If we analyze this signal, we find that it has frequency components
which extend from within the audio bandwidth (which is considered to be about
16Hz – 20,000Hz) on up into frequency ranges above the audio bandwidth. In analog space, we can generally just ignore
any components above the audio bandwidth because we can’t hear them
anyway. But in digital audio we can’t do
that.
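The error-signal view can be sketched in a few lines of Python. This is only an illustration under assumed values (a single sine cycle driven to 1.5 times the limit, 64 samples), with a small hand-written DFT helper so that nothing beyond the standard library is needed.

```python
import math

def hard_clip(x, limit=1.0):
    return max(-limit, min(limit, x))

n = 64  # samples in one cycle of the test tone (assumed value)
original = [1.5 * math.sin(2 * math.pi * k / n) for k in range(n)]
clipped = [hard_clip(s) for s in original]

# clipped = original + error, so the error is just the chopped-off peaks
# (with the opposite sign) and zero everywhere else.
error = [c - o for c, o in zip(clipped, original)]

def dft_magnitude(signal, bin_index):
    """Magnitude of one DFT bin, normalised by the signal length."""
    size = len(signal)
    re = sum(s * math.cos(2 * math.pi * bin_index * k / size) for k, s in enumerate(signal))
    im = -sum(s * math.sin(2 * math.pi * bin_index * k / size) for k, s in enumerate(signal))
    return math.hypot(re, im) / size

third_in_original = dft_magnitude(original, 3)  # essentially zero
third_in_error = dft_magnitude(error, 3)        # clearly non-zero
```

The original tone has energy only at its own frequency (bin 1), but the error signal also carries energy at the third harmonic and at higher odd multiples, which is where the out-of-band content comes from.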
Typical
digital audio has a sampling frequency of 44,100Hz, the standard developed for
the Compact Disc. There is a firm and
fixed mathematical law that says if we want to sample a waveform at a certain
frequency, then we have to make sure that the waveform contains no frequencies
above exactly one half of the sampling frequency. This frequency is termed the “Nyquist”
frequency. For CD, that means it has to
have no content at any frequency above 22,050Hz. What happens if you try to encode a signal
at, say, “N” Hz ABOVE the Nyquist
frequency? What you find is that the
result you get is EXACTLY THE SAME as
you would have got if instead the signal was “N” Hz BELOW the Nyquist frequency.
When you play back this signal, it is not the original high frequencies
you will hear, but the “bogus” lower ones. This effect is called mirroring (more commonly known as aliasing), and is a
very audibly destructive artifact. It
explains why the original analog signal has to be very tightly filtered prior
to being sampled, to eliminate all traces of any frequency components above the
Nyquist frequency.
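The “N Hz above equals N Hz below” claim is easy to check numerically. The sketch below assumes N = 1,000Hz and uses cosine waves; with sine waves the samples match except for an inverted sign, which changes the phase but not the frequency you hear.

```python
import math

SAMPLE_RATE = 44100
NYQUIST = SAMPLE_RATE / 2
N = 1000  # assumed offset from the Nyquist frequency, in Hz

def sampled_cosine(freq_hz, count=64):
    """Sample a unit-amplitude cosine of the given frequency at SAMPLE_RATE."""
    return [math.cos(2 * math.pi * freq_hz * k / SAMPLE_RATE) for k in range(count)]

above = sampled_cosine(NYQUIST + N)  # 23,050Hz: too high to encode properly
below = sampled_cosine(NYQUIST - N)  # 21,050Hz: a legitimate in-band tone

largest_difference = max(abs(a - b) for a, b in zip(above, below))
```

`largest_difference` comes out at the level of floating-point rounding error: once sampled, the two tones are indistinguishable.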
Back to
clipping. If you take a perfectly good
signal in the digital domain, and perform some signal processing on it, then
the possibility generally exists that the resultant signal will contain peaks
that are above the maximum value that can be represented by the digital
encoding system. What do you do with
those peaks? The easiest thing is to
“clip” them at the digital maximum, so that just as with analog clipping in a
solid-state amplifier, each sample that works out to be above the digital
maximum is encoded as a digital maximum.
You will have, in effect, encoded a waveform containing frequency
components above the Nyquist frequency.
When you play back that signal, those otherwise inaudible components
will be recreated as audible components at corresponding frequencies below the
Nyquist frequency. This will sound even
worse than hard-clipping in an amplifier.
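As a worked example of where those components land, the sketch below folds the harmonics of a clipped tone back into the audio band. The 15,000Hz tone is an assumed value, and `alias_frequency` is a hypothetical helper name. Symmetrical hard clipping of a pure tone produces odd harmonics, so the third and fifth are the ones shown.

```python
def alias_frequency(freq_hz, sample_rate_hz):
    """Frequency at which a tone of freq_hz is reproduced after sampling."""
    folded = freq_hz % sample_rate_hz
    nyquist = sample_rate_hz / 2
    # Anything between Nyquist and the sample rate mirrors back down.
    return sample_rate_hz - folded if folded > nyquist else folded

SAMPLE_RATE = 44100
FUNDAMENTAL = 15000  # assumed test tone, in Hz

# Harmonic number -> frequency actually heard on playback.
harmonics = {n: alias_frequency(n * FUNDAMENTAL, SAMPLE_RATE) for n in (1, 3, 5)}
# harmonics == {1: 15000, 3: 900, 5: 13200}
```

The third harmonic at 45,000Hz would be inaudible in an analog system, but after digital clipping it turns up at 900Hz, which bears no musical relationship to the 15,000Hz tone.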
The solution
is to use mathematics to “re-shape” the portion of the signal that is being
driven into clipping, in such a way as to remove all of the unwanted
high-frequency components. Of course,
there will be a sonic price to pay, even for this. But once you have driven the signal into
overload in the first place, there is no escaping without some sort of penalty.
This sort of
situation arises in general with any form of signal processing, but “mirroring”
is most commonly encountered when down-sampling from a higher sample rate to a
lower one, particularly when the source material is derived from DSD, which has (by design) a
lot of high-frequency noise. In general,
you have to assume that the higher-rate-sampled “source” data can contain
frequency components anywhere below its own Nyquist frequency. But some of those frequencies can still be
higher than the Nyquist frequency of the lower sample rate which is the
“target” of the conversion. So, unless
you know for certain that the “source” material contains no
frequency content above the Nyquist frequency of the “target”, your
down-sampling process needs to incorporate an appropriately designed low-pass digital
filter.
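A rough Python sketch of filtered down-sampling follows. Everything specific in it is an assumption chosen for illustration: an 88,200Hz source, a 44,100Hz target, a 101-tap Hamming-windowed sinc filter with a 20,000Hz cutoff, and the helper names themselves. A production sample-rate converter would use a far more carefully designed filter.

```python
import math

def lowpass_taps(cutoff_hz, sample_rate_hz, num_taps=101):
    """Hamming-windowed sinc low-pass FIR, normalised to unity gain at DC."""
    fc = cutoff_hz / sample_rate_hz
    mid = (num_taps - 1) / 2
    taps = []
    for i in range(num_taps):
        t = i - mid
        ideal = 2 * fc if t == 0 else math.sin(2 * math.pi * fc * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * i / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [h / total for h in taps]

def downsample(signal, factor, taps):
    """Low-pass filter the signal, keeping only every factor-th output sample."""
    out = []
    for start in range(0, len(signal) - len(taps) + 1, factor):
        out.append(sum(h * signal[start + j] for j, h in enumerate(taps)))
    return out

def tone(freq_hz, sample_rate_hz, count):
    return [math.sin(2 * math.pi * freq_hz * k / sample_rate_hz) for k in range(count)]

SOURCE_RATE = 88200  # the target rate is 44,100Hz, so its Nyquist frequency is 22,050Hz
taps = lowpass_taps(20000, SOURCE_RATE)

kept = downsample(tone(5000, SOURCE_RATE, 2000), 2, taps)      # in-band tone survives
removed = downsample(tone(30000, SOURCE_RATE, 2000), 2, taps)  # out-of-band tone is suppressed
naive = tone(30000, SOURCE_RATE, 2000)[::2]                    # no filter: mirrors to 14,100Hz
```

Comparing peak levels, `kept` stays close to full amplitude and `removed` is reduced to almost nothing, while `naive` retains its full amplitude as a mirrored tone that was never in the music.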