Monday, 6 May 2013

Two’s Complement

Most of us understand that PCM audio data “samples” (measures) the music signal many times a second (44,100 times a second for a CD) and stores each result as a number.  For a CD this number is a 16-bit number.  A 16-bit number can take on whole-number values anywhere between 0 and 65,535.  Whole-number values means it can take on values such as 27,995 and 13,288, but it cannot take on values such as 1.316 or 377½.  Whilst this works just fine, recall that the music waveforms that we are trying to measure swing from positive to negative, and not from zero to a positive number.  But it turns out we can work around that.  You see, an interesting property of binary numbers and the inner workings of computers can be brought to bear.

A 16-bit number is just a string of 16 digits, each of which can be either one or zero.  Here is an example, the number 13,244 expressed in ordinary 16-bit binary form: 0011001110111100.  (I hope I don’t need to explain binary numbers to you.)  If this were all zeros it would represent 0, and if it were all ones, it would represent 65,535.  But there are actually different ways in which to interpret a sequence of 16 binary digits, and one of these is called “Two’s Complement”.
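For anyone who wants to check the example, here is a small Python sketch that converts 13,244 to its 16-bit binary string and back.  It uses only Python's built-in format and int functions; the variable names are just for illustration.

```python
# Convert the example value to a 16-digit binary string and back again.
value = 13244
bits = format(value, "016b")      # zero-padded to 16 binary digits
print(bits)                       # 0011001110111100
print(int(bits, 2))               # 13244

# The two extremes of an ordinary (unsigned) 16-bit number.
all_zeros = int("0" * 16, 2)
all_ones = int("1" * 16, 2)
print(all_zeros, all_ones)        # 0 65535
```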

Before going into that, I want to talk about 15-bit numbers.  A 15-bit number can take on values between 0 and 32,767.  Wouldn’t it be nice if we could encode our music as one 15-bit number representing 0 to +32,767 for all those times when the musical waveform swings positive, and another 15-bit number representing 0 to -32,767 for all those times when the musical waveform swings negative?  In fact, we can do that very easily.  We take a 16-bit number, and reserve one of the bits (say, the most significant bit) to read 0 to represent a positive number and 1 to represent a negative number, and use the remaining 15 bits to say how positive (or negative) it is!  Are you with me so far?

We need to make one small modification.  Both the positive and the negative swings encode the value zero.  We can’t have two different numbers both representing the same value, so we need to fix that.  What we do is we say that the negative waveform swings encode the numbers -1 to -32,768 so that the value zero is only encoded as part of the positive waveform swing.  So now we have a system where we can encode the values from -32,768 to +32,767 which makes us very happy.
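The "two zeros" problem can be seen in a few lines of Python.  This is only a sketch of the sign-bit-plus-15-bit-magnitude scheme described above; the function name decode_sign_magnitude is made up for the example.

```python
def decode_sign_magnitude(pattern):
    """Read a 16-bit pattern as a sign bit followed by a 15-bit magnitude."""
    magnitude = pattern & 0x7FFF          # lower 15 bits: how big
    negative = bool(pattern & 0x8000)     # top bit: which direction
    return -magnitude if negative else magnitude

# Two different bit patterns both decode to zero.
print(decode_sign_magnitude(0x0000))      # "positive" zero
print(decode_sign_magnitude(0x8000))      # "negative" zero

# So the 65,536 possible patterns only cover 65,535 distinct values.
distinct = len({decode_sign_magnitude(p) for p in range(1 << 16)})
print(distinct)
```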

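Let’s do a simple thing.  Leave each of our numbers from 0 to +32,767 alone, and add 65,536 to each of our numbers from -32,768 to -1.  We end up with numbers that range from 0 to 65,535, with the negative values occupying 32,768 to 65,535, so that they all have their most significant bit set to 1.  This is our original 16-bit number!  What we have done, in a roundabout way, is to create the “Two’s Complement” representation of our 16-bit number.  (If we had instead added 32,768 to every number, we would have the closely related “Offset Binary” form, which differs from two’s complement only in that the most significant bit is inverted.)  Two’s complement lets us express 16-bit data in a form that covers both positive and negative values.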
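Here is a minimal Python sketch of the 16-bit mapping, assuming the standard definition of two's complement in which a negative value is stored as that value plus 65,536.  The function names are my own invention for this example.  The last loop also checks the relationship with simply adding 32,768 to every sample: the two results differ only in the top bit.

```python
BITS = 16
MODULUS = 1 << BITS            # 65,536
SIGN_BIT = 1 << (BITS - 1)     # 32,768

def to_twos_complement(sample):
    """Map a signed sample (-32,768..+32,767) to a 16-bit pattern (0..65,535)."""
    if not -SIGN_BIT <= sample < SIGN_BIT:
        raise ValueError("sample does not fit in 16 bits")
    # Python's % with a positive modulus never returns a negative result,
    # so negative samples come out as sample + 65,536.
    return sample % MODULUS

def from_twos_complement(pattern):
    """Recover the signed sample from a 16-bit pattern."""
    return pattern - MODULUS if pattern & SIGN_BIT else pattern

for sample in (0, 1, 32767, -1, -32768):
    pattern = to_twos_complement(sample)
    print(sample, format(pattern, "016b"), from_twos_complement(pattern))

# Flipping the top bit of the two's complement pattern gives sample + 32,768.
for sample in (0, 1, 32767, -1, -32768):
    assert to_twos_complement(sample) ^ SIGN_BIT == sample + SIGN_BIT
```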

It turns out that this makes computers very happy as well, because the same binary addition, subtraction, and multiplication operations that work on ordinary unsigned integers also give the right answers for numbers represented in two’s complement, provided the result fits within the available bits.  So we can manipulate them in exactly the same way as we do regular integers.  In fact, two’s complement representation is so inherently useful to computers that programmers use an even more friendly term for such numbers: Signed Integers.
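To make that concrete, here is a Python sketch that does the arithmetic on the raw 16-bit patterns as if they were plain unsigned numbers, keeps only the lowest 16 bits of each result, and only then reads the answer back as a signed value.  The helper name as_signed and the sample values are assumptions chosen so that every result fits in 16 bits.

```python
MASK = 0xFFFF                     # keep only the lowest 16 bits

def as_signed(pattern):
    """Interpret a 16-bit pattern as a two's complement value."""
    return pattern - 0x10000 if pattern & 0x8000 else pattern

a, b = -120, 34
ua, ub = a & MASK, b & MASK       # the raw 16-bit patterns for a and b

# Plain unsigned arithmetic on the patterns, truncated to 16 bits.
total = as_signed((ua + ub) & MASK)
difference = as_signed((ua - ub) & MASK)
product = as_signed((ua * ub) & MASK)

print(total, difference, product)  # -86 -154 -4080
```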

Two’s Complement (or Signed Integer) representation is such a huge convenience for computer audio that most audio processing uses it.  Amongst other things, simple signal processing functions like Digital Volume Control are more efficient to code with Signed Integers.
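As a rough illustration of why a volume control is simple with signed integers, here is a Python sketch that scales 16-bit samples with one integer multiply and one shift per sample.  The fixed-point gain format (a gain of 1.0 stored as 32,768) and the function name apply_volume are assumptions of mine, not a description of any particular product.

```python
GAIN_SHIFT = 15   # assumed fixed-point format: a gain of 1.0 is stored as 32,768

def apply_volume(samples, gain):
    """Scale signed 16-bit samples by a gain between 0.0 and 1.0."""
    gain_fixed = int(round(gain * (1 << GAIN_SHIFT)))
    # The same multiply-and-shift works for positive and negative samples.
    # Note that >> rounds toward negative infinity, not toward zero.
    return [(s * gain_fixed) >> GAIN_SHIFT for s in samples]

halved = apply_volume([1000, -1000, 32767, -32768], 0.5)
print(halved)     # [500, -500, 16383, -16384]
```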

There is one thing to bear in mind, though, and it catches a lot of people out.  Recall that the negative swing encodes a higher maximum number than the positive swing.  Here I am going to shift the discussion from the illustrative example of 16-bit numbers to the more general case of N-bit numbers.  The largest negative swing that can be encoded has a magnitude of 2^(N-1), whereas the largest positive swing that can be encoded is 2^(N-1) - 1.  The important point is that the ratio between the two is not constant: it depends on N, the bit depth.  This comes into play if you are designing a D-to-A Converter with separate DACs for the negative and positive voltage swings.  You need to design it such that the negative and positive sides both reach the same peak output with an input signal of 2^(N-1), while recognizing that the positive side can never see it in practice, since it should only ever receive a maximum signal of 2^(N-1) - 1.  If it ever receives a signal of 2^(N-1) this would indicate an error in its internal processing algorithms.
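The dependence on bit depth is easy to tabulate.  This Python sketch prints the two limits and the ratio of their magnitudes for a few common values of N; the function name swing_limits is just a label for the example.

```python
def swing_limits(n_bits):
    """Return (most negative, most positive) values of an N-bit signed integer."""
    most_negative = -(1 << (n_bits - 1))       # -2^(N-1)
    most_positive = (1 << (n_bits - 1)) - 1    # 2^(N-1) - 1
    return most_negative, most_positive

for n_bits in (8, 16, 24, 32):
    lo, hi = swing_limits(n_bits)
    ratio = -lo / hi                           # slightly above 1, shrinking with N
    print(n_bits, lo, hi, ratio)
```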

Similar considerations exist when normalizing the output of a DSP stage (which should properly be in floating point format) for rendering to integer format.  The processed floating point data is typically normalized to ±1.0000 and it would be an error to map this to ±2^(N-1) in Two’s Complement integer space, because this would result in clipping of the positive voltage swing at its peak.  Instead it must be mapped to ±(2^(N-1) - 1).
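Here is a minimal Python sketch of that mapping, assuming the floating point data really is normalized to ±1.0.  The function name float_to_int, the guard against out-of-range input and the choice of rounding are my own assumptions; a real converter would usually add dither as well, which I have left out.

```python
def float_to_int(x, n_bits=16):
    """Map a float in [-1.0, +1.0] to a signed N-bit integer."""
    full_scale = (1 << (n_bits - 1)) - 1      # 2^(N-1) - 1, e.g. 32,767
    x = max(-1.0, min(1.0, x))                # guard against out-of-range input
    return int(round(x * full_scale))

print(float_to_int(1.0), float_to_int(-1.0), float_to_int(0.0))  # 32767 -32767 0

# The mistaken mapping: scaling by 2^(N-1) overshoots the positive limit.
naive_peak = int(round(1.0 * (1 << 15)))
print(naive_peak, naive_peak > 32767)         # 32768 True
```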

Such things make a difference when you operate at the cutting edge of sound quality.