Most of us understand that PCM audio data “samples”
(measures) the music signal many times a second (44,100 times a second for a
CD) and stores the result in a number.
For a CD this number is a 16-bit number.
A 16-bit number can take on whole-number values anywhere between 0 and
65,535. Whole-number values means it can
take on values such as 27,995 and 13,288.
But it cannot take on values such as 1.316 or 377½. Whilst this works just fine, recall that the
music waveforms that we are trying to measure swing from positive to negative,
and not from zero to a positive number.
But it turns out we can work around that. You see, an interesting property of binary
numbers and the inner workings of computers can be brought to bear.
A 16-bit number is just a string
of 16 digits which can be either one or zero.
Here is an example: the number 13,244 expressed in ordinary 16-bit
binary form is 0011001110111100. (I hope I
don’t need to explain binary numbers to you.)
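For readers who like to check such things, here is a minimal Python sketch that prints the 16-bit binary form of 13,244 and confirms that all zeros and all ones correspond to 0 and 65,535. The helper names `to_bits` and `from_bits` are made up for illustration.

```python
def to_bits(value: int, n_bits: int = 16) -> str:
    """Return the ordinary (unsigned) binary form of value, padded to n_bits digits."""
    if not 0 <= value < 2 ** n_bits:
        raise ValueError(f"{value} does not fit in {n_bits} unsigned bits")
    return format(value, f"0{n_bits}b")


def from_bits(bits: str) -> int:
    """Interpret a string of ones and zeros as an unsigned whole number."""
    return int(bits, 2)


print(to_bits(13_244))      # 0011001110111100
print(from_bits("0" * 16))  # 0
print(from_bits("1" * 16))  # 65535
```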
If this were all zeros it would represent 0, and if it were all ones, it
would represent 65,535. But there are actually
different ways in which to interpret a sequence of 16 binary digits, and one of
these is called “Twos Complement”.
Before going into that, I want to
talk about 15-bit numbers. A 15-bit
number can take on values between 0 and 32,767.
Wouldn’t it be nice if we could encode our music as one 15-bit number
representing 0 to +32,767 for all those times when the musical waveform swings
positive, and another 15-bit number representing 0 to -32,767 for all those
times when the musical waveform swings negative? In fact, we can do that very easily. We take a 16-bit number, and reserve one of
the bits (say, the most significant bit) to read 0 to represent a positive
number and 1 to represent a negative number, and use the remaining 15 bits to
say how positive (or negative) it is!
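The scheme just described, a sign bit followed by a 15-bit magnitude, is usually called sign-magnitude. A short Python sketch of it, assuming 16-bit words and using hypothetical helper names, makes one of its quirks visible: two different bit patterns decode to zero.

```python
def sign_magnitude_encode(value: int) -> int:
    """Bit 15 holds the sign (1 = negative); bits 0-14 hold the magnitude."""
    magnitude = abs(value)
    if magnitude > 0x7FFF:
        raise ValueError("magnitude must fit in 15 bits")
    sign = 0x8000 if value < 0 else 0
    return sign | magnitude


def sign_magnitude_decode(pattern: int) -> int:
    """Read a 16-bit sign-magnitude pattern back as a signed value."""
    magnitude = pattern & 0x7FFF
    return -magnitude if pattern & 0x8000 else magnitude


print(format(sign_magnitude_encode(1000), "016b"))
print(format(sign_magnitude_encode(-1000), "016b"))
# 0000000000000000 and 1000000000000000 both mean zero:
print(sign_magnitude_decode(0x0000), sign_magnitude_decode(0x8000))  # 0 0
```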
Are you with me so far?
We need to make one small
modification. Both the positive and the
negative swings encode the value zero.
We can’t have two different bit patterns both representing the same value, so
we need to fix that. What we do is we
say that the negative waveform swings encode the numbers -1 to -32,768 so that
the value zero is only encoded as part of the positive waveform swing. So now we have a system where we can encode
the values from -32,768 to +32,767 which makes us very happy.
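Let’s do a simple thing. Leave the numbers from 0 to +32,767 exactly as they
are, and add 65,536 to each of the negative numbers from -32,768 to -1. The
negative numbers become 32,768 to 65,535, so all together we end
up with numbers that range from 0 to 65,535.
This is our original 16-bit number! Each of the 65,536 possible bit patterns
now stands for exactly one value between -32,768 and +32,767, and the most
significant bit is still 1 for every negative value and 0 for every positive one,
although the remaining 15 bits no longer hold the plain magnitude. (Adding 32,768
to every value would also give 0 to 65,535, but that is a different scheme, called
offset binary, in which the most significant bit is inverted.)
What we have done, in a roundabout way, is to create the “Twos Complement” of our 16-bit
number. The twos complement lets us
express 16-bit data in a form that covers both positive and negative values.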
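Here is a minimal Python sketch of that mapping for an N-bit word. The function names are illustrative, not taken from any library. As a cross-check, it compares each result against Python's standard `struct` module, which packs a signed 16-bit integer (format `'h'`) into the same bytes as the corresponding unsigned pattern (format `'H'`).

```python
import struct


def to_twos_complement(value: int, n_bits: int = 16) -> int:
    """Return the unsigned N-bit pattern (0 .. 2**N - 1) for a signed value."""
    lo, hi = -(1 << (n_bits - 1)), (1 << (n_bits - 1)) - 1
    if not lo <= value <= hi:
        raise ValueError(f"{value} is outside {lo}..{hi}")
    # Non-negative values keep their ordinary pattern; negatives get 2**N added.
    return value + (1 << n_bits) if value < 0 else value


def from_twos_complement(pattern: int, n_bits: int = 16) -> int:
    """Return the signed value represented by an unsigned N-bit pattern."""
    if not 0 <= pattern < (1 << n_bits):
        raise ValueError(f"{pattern} does not fit in {n_bits} bits")
    # A set most significant bit means the value is negative.
    return pattern - (1 << n_bits) if pattern & (1 << (n_bits - 1)) else pattern


for v in (0, 1, 32_767, -1, -32_768):
    pattern = to_twos_complement(v)
    # struct packs signed '<h' to the same two bytes as our unsigned '<H' pattern.
    assert struct.pack("<h", v) == struct.pack("<H", pattern)
    assert from_twos_complement(pattern) == v
    print(f"{v:>7} -> {pattern:5d} = {pattern:016b}")
```

The output pairs -1 with 65,535 (all ones) and -32,768 with 32,768 (a one followed by fifteen zeros).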
It turns out that this makes
computers very happy as well, because twos complement numbers can be added,
subtracted, and multiplied using exactly the same binary operations the hardware
already uses for ordinary unsigned numbers: keep only the lowest 16 bits of the
result and you get the correct signed answer (provided that answer fits in 16 bits).
So we can manipulate them
in exactly the same way as we do regular integers. In fact, twos complement representation is so
inherently useful to computers that programmers use a friendlier name for
them: Signed Integers.
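To see that claim in action, the Python sketch below takes two signed values, converts them to their 16-bit patterns, does plain unsigned addition, subtraction, and multiplication on those patterns, throws away everything above bit 15, and reads the results back as signed numbers. The helper names are hypothetical; the values were chosen so that the true product fits in 16 bits.

```python
MASK = 0xFFFF  # keep only the lowest 16 bits, as a 16-bit register would


def to_pattern(v: int) -> int:
    """Twos complement bit pattern of v (Python's & yields this for negatives)."""
    return v & MASK


def to_signed(p: int) -> int:
    """Read a 16-bit pattern back as a signed value."""
    return p - 0x10000 if p & 0x8000 else p


a, b = -120, 70
pa, pb = to_pattern(a), to_pattern(b)

# Ordinary unsigned arithmetic on the patterns, discarding anything above bit 15.
total = (pa + pb) & MASK
difference = (pa - pb) & MASK
product = (pa * pb) & MASK

print(to_signed(total), to_signed(difference), to_signed(product))  # -50 -190 -8400
```

No step in the arithmetic ever looks at the sign; only the final reinterpretation does.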
Twos Complement (or Signed
Integer) representation is such a huge convenience for computer audio that
most audio processing uses this representation.
Amongst other things, simple signal processing functions like Digital
Volume Control are more efficient to code with Signed Integers.
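As an illustration of why, here is one common way to write a digital volume control on signed 16-bit samples: a single signed multiply and a shift per sample, with no special handling for the negative swings. This is a minimal sketch, not a description of any particular product. It assumes the gain is expressed in Q15 fixed point (gain = `gain_q15` / 32,768, attenuation only), and the name `apply_volume` is hypothetical.

```python
def apply_volume(samples: list[int], gain_q15: int) -> list[int]:
    """Scale signed 16-bit samples by gain_q15 / 32768 (Q15 fixed-point gain).

    gain_q15 is assumed to lie in 0..32767, i.e. attenuation only, so the
    result always stays inside the signed 16-bit range.
    """
    if not 0 <= gain_q15 <= 0x7FFF:
        raise ValueError("gain_q15 must be in 0..32767")
    # Python's >> on a negative int is an arithmetic shift (rounds toward -inf),
    # so positive and negative samples go through exactly the same expression.
    return [(s * gain_q15) >> 15 for s in samples]


half = 1 << 14  # 16384 / 32768 = 0.5, roughly -6 dB
print(apply_volume([32_767, -32_768, 1000, -1000], half))  # [16383, -16384, 500, -500]
```

Truncating with a shift always rounds downward, which adds a tiny negative bias; a real implementation might round or dither instead.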
There is one thing to bear in
mind, though, and it catches a lot of people out. Recall that the negative swing reaches a
larger maximum magnitude than the positive swing (-32,768 versus +32,767 in 16 bits).
Here I am going to shift the discussion from the illustrative example of
16-bit numbers to the more general case of N-bit numbers. The largest negative swing that can be
encoded is -2^(N-1), whereas the largest positive swing that can be encoded is
2^(N-1) - 1. What matters is that the ratio between the two magnitudes,
2^(N-1) / (2^(N-1) - 1), is not constant but depends on N,
the bit depth. This comes into play if
you are designing a D-to-A Converter with separate DACs for the negative and
positive voltage swings. You need to
design it such that the negative and positive sides would both reach the same peak
output for an input of magnitude 2^(N-1), while recognizing that the positive
side can never see that value in practice, since it can only ever receive a maximum
signal of 2^(N-1) - 1. If it ever
receives a signal of +2^(N-1), that would indicate an error in its internal
processing algorithms.
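A quick Python calculation shows how that ratio shrinks as the bit depth grows. The helper `peak_ratio` is a hypothetical name, and the dB column is simply 20·log10 of the ratio.

```python
import math


def peak_ratio(n_bits: int) -> float:
    """Largest negative magnitude divided by the largest positive value."""
    most_negative = 2 ** (n_bits - 1)      # magnitude of -2**(N-1)
    most_positive = 2 ** (n_bits - 1) - 1  # +2**(N-1) - 1
    return most_negative / most_positive


for n in (8, 16, 24, 32):
    r = peak_ratio(n)
    print(f"N={n:2d}: ratio {r:.12f} ({20 * math.log10(r):.6f} dB)")
```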
Similar considerations exist when
normalizing the output of a DSP stage (which should properly be in floating
point format) for rendering to integer format.
The processed floating point data is typically normalized to ±1.0, and it would be an error to
map this to ±2^(N-1) in Twos
Complement integer space, because +1.0 would then land on +2^(N-1), which cannot be
represented. The positive voltage swing would be clipped at its peak (or, if nothing
clamps the value, wrap around to the most negative code). Instead it
must be mapped to ±(2^(N-1) - 1).
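Here is a minimal Python sketch of that final conversion step, assuming float samples normalized to ±1.0 and that anything outside that range should be clipped rather than allowed to wrap. The function name `float_to_int` is hypothetical. It also shows what happens with the naive ±2^(N-1) scale when nothing clamps the result.

```python
def float_to_int(x: float, n_bits: int = 24) -> int:
    """Map a normalized float (-1.0 .. +1.0) to an n_bits Twos Complement code."""
    full_scale = (1 << (n_bits - 1)) - 1  # 2**(N-1) - 1, not 2**(N-1)
    code = round(x * full_scale)          # nearest integer (ties go to even)
    # Clamp any overs from the DSP stage into the representable range.
    return max(-full_scale - 1, min(full_scale, code))


N = 24
good = float_to_int(1.0, N)
naive = round(1.0 * (1 << (N - 1)))  # 2**23 = 8388608: one past the largest code
# Without clamping, storing that value in 24 bits wraps it to the most negative code.
wrapped = (naive + (1 << (N - 1))) % (1 << N) - (1 << (N - 1))
print(good, naive, wrapped)  # 8388607 8388608 -8388608
```

With the correct scale, +1.0 and -1.0 land symmetrically on +8,388,607 and -8,388,607, and the extra negative code -8,388,608 is simply never used for in-range input.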
Such things make a difference when
you operate at the cutting edge of sound quality.