There
are two uses of the word “compression” in the world of digital audio. The
first is dynamic compression. This is where we want to increase the
volume of a track, but in doing so we make the loudest bits so
loud that their signal level is larger than the maximum value the
format can encode. Here we would use “dynamic compression” to
selectively reduce the gain on those loudest passages so that they fit
inside the available headroom. This note is not about dynamic
compression. Instead it is all about file compression.
File
compression is a process that takes a computer file occupying a
certain number of megabytes of storage space, and manipulates it so that
it occupies fewer megabytes. Ideally, but not necessarily
always, this compression is lossless, by which we mean that identical
raw data can be extracted from both the original file and the compressed
file. There are two reasons for wanting to do this: to reduce the
amount of storage space required to store the file, and to reduce the
bandwidth required to transmit the file from one place to another within a
constrained amount of time.
Most of the time, we find that
everyday computer files can be readily compressed. Why is this? In the
software world, the format of a file is typically chosen so as to allow
the computer to write data to the file, and read data from the file, in
an efficient manner. Scant regard is often paid to the resultant
efficiency of data storage. An example might be a simple text file. The
basic ASCII character set needs only 7 bits to encode each character. However,
computer files are typically written in chunks of 8 bits, called bytes.
So every time we want to write a character we use up 8 bits of storage
when in practice we only needed 7 bits. A simple file compression
technique can use this observation to recover the unused storage space
and reduce the file size by one eighth. With more complex file
structures, a general-purpose strategy is not so obvious. Native music
file formats are similarly inefficient.
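The 7-bits-in-8 observation can be sketched in a few lines of Python. This is only an illustration, not how any real compression utility works: the function names `pack_7bit` and `unpack_7bit` are made up for this note, and the sketch assumes the text contains nothing but plain ASCII characters.

```python
def pack_7bit(text: str) -> bytes:
    """Pack plain ASCII characters back to back, 7 bits each (hypothetical helper)."""
    data = text.encode("ascii")      # raises UnicodeEncodeError on non-ASCII input
    bits = 0                         # one big integer used as a bit accumulator
    for byte in data:
        bits = (bits << 7) | byte    # append the 7 useful bits of each character
    total_bits = 7 * len(data)
    pad = (-total_bits) % 8          # zero bits needed to fill the final byte
    bits <<= pad
    return bits.to_bytes((total_bits + pad) // 8, "big")


def unpack_7bit(packed: bytes, n_chars: int) -> str:
    """Reverse pack_7bit; the character count has to be stored separately."""
    bits = int.from_bytes(packed, "big")
    bits >>= 8 * len(packed) - 7 * n_chars   # drop the padding bits
    chars = []
    for i in range(n_chars):
        shift = 7 * (n_chars - 1 - i)
        chars.append(chr((bits >> shift) & 0x7F))
    return "".join(chars)


if __name__ == "__main__":
    message = "Hello, world! " * 8
    packed = pack_7bit(message)
    print(len(message), "bytes before,", len(packed), "bytes after")
    print(unpack_7bit(packed, len(message)) == message)
```

Every eight characters fit into seven bytes, which is where the one-eighth saving comes from. The cost is that the file can no longer be read one byte per character, which is exactly the kind of convenience that everyday file formats prefer to keep.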
Anybody who has used a
zipping program to make a ZIP file to transmit a file over the Internet
will be familiar with lossless compression. ZIP is a
general-purpose lossless file compression format. Some files, for
example Bitmap (BMP) image files, will compress very nicely into much
smaller ZIP files. On the other hand, files such as JPG images are very
seldom reduced at all in file size by zipping. This is because the file
format used for BMP files is particularly inefficient, whereas by
contrast the file format for JPG files is highly efficient. In
principle, any computer file can be reduced in size by a well-chosen
lossless compression utility, unless the file format was designed to
store its data efficiently in the first place.
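The BMP-versus-JPG effect is easy to reproduce with Python's standard `zlib` module, which implements DEFLATE, the method most ZIP files use. The two byte strings below are stand-ins invented for this sketch, not real image files: one is highly repetitive (like the pixels of a plain white bitmap), the other is random bytes, which behave much like data that has already been compressed.

```python
import os
import zlib


def compressed_fraction(data: bytes) -> float:
    """Return compressed size divided by original size (smaller is better)."""
    return len(zlib.compress(data, 9)) / len(data)


# Stand-in for an inefficient format: 100,000 identical white RGB pixels.
redundant = bytes([255, 255, 255]) * 100_000

# Stand-in for an already-efficient format: bytes with no pattern to exploit.
dense = os.urandom(300_000)

if __name__ == "__main__":
    print("repetitive data shrinks to", compressed_fraction(redundant), "of its size")
    print("random data shrinks to", compressed_fraction(dense), "of its size")
```

The repetitive data collapses to a tiny fraction of its original size, while the random data does not shrink at all and typically grows by a few bytes of overhead.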
In general, the
more we know about a file, and about the data that the file contains,
the more freedom we can have in selecting an optimum strategy to
compress it. With music files there are a number of attributes that can
be exploited to effect lossless compression. Here are two of the easier
to describe attributes: (i) Because music files encode a waveform, and
because the waveform is not totally random (in which case it would be
noise, not music), we can use the waveform’s immediate past to predict
what its immediate future might look like, and encode instead the
differences between the predictions and the actual values. This is used
very effectively in many well-known lossless encoders. (ii) Stereo
music content is often dominated by centred images, which contain identical
information in the right and left channels. If, instead of encoding L
and R, we encode L+R and L-R, we end up with waveforms that are
more readily susceptible to other compression methodologies (the L-R
signal, in particular, is close to zero wherever the image is centred).
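Both attributes can be sketched in plain Python. The predictor used here (guess that the waveform continues in a straight line through its last two samples) is a deliberately simple one picked for illustration, and the function names and the 440 Hz test tone are assumptions of this sketch rather than anything taken from a particular encoder.

```python
import math


def predict_residuals(samples):
    """Predict x[n] as 2*x[n-1] - x[n-2] and keep only the prediction errors."""
    residuals = list(samples[:2])            # first two samples are stored as-is
    for n in range(2, len(samples)):
        prediction = 2 * samples[n - 1] - samples[n - 2]
        residuals.append(samples[n] - prediction)
    return residuals


def restore_from_residuals(residuals):
    """Rebuild the exact original samples from the stored prediction errors."""
    samples = list(residuals[:2])
    for n in range(2, len(residuals)):
        prediction = 2 * samples[n - 1] - samples[n - 2]
        samples.append(residuals[n] + prediction)
    return samples


def to_mid_side(left, right):
    """Replace L and R with L+R ('mid') and L-R ('side')."""
    mid = [l + r for l, r in zip(left, right)]
    side = [l - r for l, r in zip(left, right)]
    return mid, side


def from_mid_side(mid, side):
    """Recover L and R exactly: mid+side is 2L and mid-side is 2R."""
    left = [(m + s) // 2 for m, s in zip(mid, side)]
    right = [(m - s) // 2 for m, s in zip(mid, side)]
    return left, right


# Illustrative test signal: a 440 Hz tone sampled at 44.1 kHz, centred in the image.
tone = [round(10000 * math.sin(2 * math.pi * 440 * n / 44100)) for n in range(1000)]

if __name__ == "__main__":
    residuals = predict_residuals(tone)
    print("largest sample:", max(abs(s) for s in tone))
    print("largest residual:", max(abs(r) for r in residuals[2:]))
    mid, side = to_mid_side(tone, tone)
    print("largest side value:", max(abs(s) for s in side))
```

Neither step makes the data smaller by itself. What they do is turn large numbers into small ones (and, for a centred image, an entire channel into zeros), and small numbers are what the later coding stages can store in fewer bits. Both steps are exactly reversible, so nothing is lost.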
Despite the effectiveness of these methods, there are still realistic
limits on how much a native music file can be compressed without losing
data. For most music this averages out at around 50%. To reduce file
sizes by more than that, it is necessary to adopt lossy compression
features. Lossy is exactly what it says it is. In order to further
reduce the file size, we take something that we think you probably can’t
hear and we throw it away. Lossy compression makes great use of the
findings of the field of psychoacoustics in order to help us decide
what, exactly, you ‘probably’ can’t hear. Lossy compression technology
is fabulously creative, extremely clever, and very interesting, but for all that it still makes your music sound worse.
MP3 is the
granddaddy of lossy audio compression technologies. I do not propose to
go into detail about how MP3 does its thing, but at its core it makes
use of a key finding of psychoacoustics, that of ‘masking’. Masking
is the observation that one sound can hide another, and that certain
sounds are more effectively masked by some sounds than by others. For
example, a louder sound masks a quieter one (well, duh!). Also, a sound
at one frequency effectively masks other sounds at adjacent frequencies.
So if we can identify and extract one element of a waveform, and
determine that it is ‘masked’ by another one, then we could, for
example, encode the ‘masked’ element using a much lower bit depth.
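To give a feel for what “a much lower bit depth” means, here is a rough sketch that squeezes 16-bit sample values into 8 bits and expands them again. It is a simplification under stated assumptions: real lossy encoders apply this kind of coarsening to frequency-domain data rather than to raw samples, and the function names and sample values are invented for this note.

```python
def reduce_bit_depth(samples, bits_to_drop):
    """Discard the low-order bits of each integer sample (this is the lossy step)."""
    return [s >> bits_to_drop for s in samples]


def restore_bit_depth(codes, bits_to_drop):
    """Scale the codes back up, landing in the middle of each quantisation step."""
    half_step = (1 << bits_to_drop) >> 1
    return [(c << bits_to_drop) + half_step for c in codes]


if __name__ == "__main__":
    masked_element = [-20000, -3, 0, 5, 1234, 32767]   # made-up 16-bit samples
    codes = reduce_bit_depth(masked_element, 8)        # now fits in 8 bits
    approx = restore_bit_depth(codes, 8)
    for original, restored in zip(masked_element, approx):
        print(original, "->", restored)
```

The restored values are close to the originals but not identical. That difference is the data that has been thrown away, and the bet is that the masking sound stops you from noticing it.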
MP3 sets about breaking the music into as many as
576 frequency bands, the contents of which are then scaled up or down
according to the aforementioned psychoacoustic principles, and end up
being encoded using a technique called “Huffman Coding”, by which the
most commonly-occurring values are encoded using fewer bits than the
less-common values (quite simple, yet really rather clever). Using this
approach we can, in effect, controllably reduce the resolution of the
encoded music, reducing it more for those elements in the music which
are ‘masked’, and less for those doing the masking. The Huffman Codes
are typically stored in one or more look-up tables, and by choosing an
appropriate table we can end up with a larger or smaller effective bit
rate.
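Huffman Coding itself is compact enough to sketch. The version below builds its code table from the data it is given, which is an assumption made for illustration (as noted above, MP3 picks from look-up tables instead), and the function names and example values are made up for this note.

```python
import heapq
from collections import Counter


def huffman_code(values):
    """Build a table mapping each value to a bit string; common values get short ones."""
    counts = Counter(values)
    if not counts:
        return {}
    if len(counts) == 1:
        return {next(iter(counts)): "0"}
    # Heap entries are (weight, tie_breaker, {value: code_so_far}).
    heap = [(w, i, {v: ""}) for i, (v, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, codes1 = heapq.heappop(heap)   # the two least common groups...
        w2, _, codes2 = heapq.heappop(heap)
        merged = {v: "0" + c for v, c in codes1.items()}
        merged.update({v: "1" + c for v, c in codes2.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))   # ...become one group
        next_id += 1
    return heap[0][2]


def encode(values, code):
    """Concatenate the bit strings for a sequence of values."""
    return "".join(code[v] for v in values)


def decode(bits, code):
    """Read bits until they match a table entry; no code is a prefix of another."""
    reverse = {c: v for v, c in code.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in reverse:
            out.append(reverse[current])
            current = ""
    return out


if __name__ == "__main__":
    # Made-up quantised values: zero is very common, 7 is rare.
    values = [0] * 50 + [1] * 25 + [-1] * 15 + [7] * 10
    table = huffman_code(values)
    bits = encode(values, table)
    print(table)
    print(len(bits), "bits, against", 2 * len(values), "for a fixed 2 bits per value")
    print(decode(bits, table) == values)
```

With four distinct values a fixed-length scheme needs 2 bits per value. Here the most common value gets a 1-bit code and the rarest get 3 bits, so the total comes out lower. Note that this step is itself lossless; the loss in MP3 happens earlier, when the values are scaled and rounded.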
In effect, lossy compression techniques employ much more
in the way of signal processing than lossless compression in order to
identify and extract which components can be effectively thrown away
while minimizing (note, never eliminating) the audible deterioration in
the perceived sound quality. For this reason, more recent encoders such
as AAC (popularized by Apple), which are more elaborate and require more processing
power than MP3, tend to sound better at equivalent bit rates.