Thursday, 19 September 2013

Signal-to-Noise Ratio and the Cocktail Party

I want to go over some potentially obvious stuff mainly because I want it to set the table for tomorrow's post.  You have all heard of the old chestnut where, if you focus clearly, it is often possible to pick out one individual conversation from among the hubbub of a noisy cocktail party.  There may be a hundred people all talking at the same volume.  Together, this forms the noise, and it sounds like we have a hundred times more Noise than the Signal which we are trying to extract from it.  Clearly, the Noise overwhelms the Signal.  Yet most of us have already performed this social experiment, so we know it is not that hard to do.  What, then, is going on here?

To understand this, we need to go back to the concept of noise.  What exactly is Noise, and what makes it different from Signal?  Basically, noise occurs whenever what we are observing appears to be random.  Consider a sequence of random numbers.  What makes them random is that we can discern no pattern or sequence within them, regardless of the level of analytical sophistication, whether real or hypothetical, that we can bring to bear upon them.  If any such pattern can be established, then the numbers are no longer random.  Actually, generating truly random numbers is an astonishingly challenging task, as any expert in cryptography will tell you.  In audio, if the signal - whether an analog signal or a digital representation thereof - is totally random, then it consists entirely of noise.

Having said that, there are many different flavours of random.  For example, we can generate a sequence of random numbers that lie between 0 and 1.  Or between -10 and +10.  The other interesting thing is that we can generate random numbers where all the different numbers do not actually have the same chance of appearing.  But is that random, you ask?  Yes it is, and here is an experiment you can do yourself.  Toss two coins (we will assume this to be a truly random process).  Repeat this as often as you like and make a tally of the outcomes.  Two heads or two tails will each appear about a quarter of the time.  But the combination of a head and a tail will appear about half the time.  In audio, the equivalent is what we call Noise Colours.  The noise signal itself may be random, but its frequency content can have any distribution that we like.  For example, White Noise has equal power at all frequencies, whereas Pink Noise has progressively less power the higher the frequency goes.
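If you would rather not toss coins all afternoon, the experiment can be simulated.  What follows is only a rough sketch in Python: the function name, the seed and the number of tosses are my own choices for illustration, and a software random number generator is of course only pseudo-random, so it stands in for the "truly random" coins rather than reproducing them.

```python
import random
from collections import Counter

def tally_two_coins(n_tosses, seed=0):
    """Toss two fair coins n_tosses times and count the three outcomes."""
    rng = random.Random(seed)  # seeded so the run is repeatable (an assumption)
    labels = ("two tails", "one of each", "two heads")
    counts = Counter()
    for _ in range(n_tosses):
        heads = rng.randint(0, 1) + rng.randint(0, 1)  # number of heads: 0, 1 or 2
        counts[labels[heads]] += 1
    return counts

n_tosses = 100_000
counts = tally_two_coins(n_tosses)
for outcome, count in sorted(counts.items()):
    print(f"{outcome}: {count / n_tosses:.3f}")
```

The three proportions should settle close to a quarter, a half and a quarter.  Every individual toss is random, yet the outcomes are not equally likely.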

A signal at a certain frequency can only be separated from the noise if its magnitude is higher than the magnitude of the fraction of the noise which is at that frequency.  Let's go back to the cocktail party.  A hundred people are talking, all at the same volume.  But you are only interested in your boss, who is talking to the company chairman.  You can hear him discussing his thoughts on the new Vice President, an appointment that everyone expects to be announced soon.  Your boss's voice is like one frequency component in an audio spectrum, where all the other people's voices represent other frequency components.  By concentrating only on your boss's frequency, you can tune out all the other frequencies and listen in on his conversation.  Provided, that is, his voice stays above the residual background noise at that frequency.  So, finally the boss leans forward, and lowering his voice, tells the chairman who the new Vice President will be.  But - dammit all! - by lowering his voice, he has reduced it below the level of the residual background noise.  And you can no longer make out what he says.  But at least there is a lesson to take away.  When the signal drops below the overall noise level, it is still possible to recover it.  But when it drops below the level of that component of the noise which is at the frequency of the signal, then it is irretrievably lost.  If the signal is fainter than the noise at its own frequency, it simply means that what you are listening to is indistinguishable from random.  Your only option is to change the way you measure the signal.

So how do we know what the level of the signal is at a particular frequency, and how do we know what the background noise is?  The mathematical tool we use to analyze the frequency content of a signal is the Fourier Transform.  It is called a Transform, because the original audio data is transformed into something that bears no immediately obvious resemblance to it, and yet contains all of the information necessary to enable it to be transformed back into the exact original data.  If you want to see what the math looks like, look it up on Wikipedia!  The Fourier Transform of an audio signal turns out to be a representation of the frequency content of the audio signal.  It is a mathematically exact representation.  If there is any frequency information that cannot be precisely extracted from the Fourier Transform, this is simply because that information does not actually exist in the original signal.  Conversely, if you see something in the Fourier Transform, then, whether you like it or not, that means it is also in the original signal.
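To make the "transform there and back again" idea a little more concrete, here is a small sketch of a discrete Fourier Transform and its inverse in plain Python.  The test signal (two tones over 32 samples) and the function names are my own assumptions for illustration, and the transform is the naive textbook form rather than the fast FFT algorithm any real audio software would use.  On a computer, "exact" also means "exact to within floating-point rounding".

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier Transform: time samples -> frequency bins."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Inverse transform: frequency bins -> time samples."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

n = 32
# Hypothetical test signal: a tone at bin 3 plus a quieter tone at bin 7.
samples = [math.sin(2 * math.pi * 3 * t / n) + 0.5 * math.cos(2 * math.pi * 7 * t / n)
           for t in range(n)]

spectrum = dft(samples)
restored = [value.real for value in idft(spectrum)]

# Largest difference between the original samples and the round-tripped ones.
max_error = max(abs(a - b) for a, b in zip(samples, restored))
print("round-trip error:", max_error)
print("magnitude at bin 3:", abs(spectrum[3]))
print("magnitude at bin 7:", abs(spectrum[7]))
```

The spectrum looks nothing like the original samples, yet the round-trip error is down at the level of rounding noise, and the two tones show up as peaks at their own bins, with the quieter tone giving a peak half the height.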

Taking our noisy cocktail party analogy, we can see what is necessary for us to identify a signal within a noisy environment.  We have to strip everything away that we can identify as not being part of the bit of the signal we are interested in, and focus just on those aspects of the data that could actually be the signal.  Provided we limit our thinking to the frequency domain, we can think this through quite nicely.  Within the data, we will be able to identify the presence of a signal at a certain frequency, but only if the magnitude of the signal is higher than the magnitude of all of the noise that is within a narrow band of frequencies surrounding the one we are looking for.  And we can use a Fourier Transform to see whether that is in fact the case.
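As a toy illustration of that last point, the sketch below buries a single tone in random noise that is, overall, nearly three times louder, and then looks for the tone in the frequency domain.  Everything here is an assumption made for the sake of the example: the sample count, the tone's frequency bin and amplitude, the Gaussian noise and the seed.  The transform is again the naive form, written out so that the example needs nothing beyond standard Python.

```python
import cmath
import math
import random

N = 1024         # number of samples (assumed)
TONE_BIN = 100   # the "boss's" frequency, as a DFT bin number (assumed)

def magnitude_spectrum(x):
    """Magnitudes of DFT bins 0 .. n/2 - 1, using a naive O(n^2) transform."""
    n = len(x)
    twiddle = [cmath.exp(-2j * math.pi * m / n) for m in range(n)]
    return [abs(sum(x[t] * twiddle[(k * t) % n] for t in range(n)))
            for k in range(n // 2)]

def party(tone_amplitude, seed=1):
    """One tone plus unit-variance Gaussian noise (the rest of the party)."""
    rng = random.Random(seed)
    return [tone_amplitude * math.sin(2 * math.pi * TONE_BIN * t / N)
            + rng.gauss(0.0, 1.0)
            for t in range(N)]

tone_amplitude = 0.5
tone_rms = tone_amplitude / math.sqrt(2)   # about 0.35, against a noise RMS of about 1.0

spectrum = magnitude_spectrum(party(tone_amplitude))

# Skip bin 0 (the DC offset) and find the strongest frequency component.
peak_bin = max(range(1, len(spectrum)), key=spectrum.__getitem__)

# Use the median bin magnitude as a rough estimate of the noise in any one bin.
noise_floor = sorted(spectrum[1:])[len(spectrum) // 2]

print("tone RMS:", round(tone_rms, 3))
print("strongest bin:", peak_bin)
print("tone bin / noise floor:", round(spectrum[TONE_BIN] / noise_floor, 1))
```

The noise has its energy shared out across all of the bins, whereas the tone has all of its energy concentrated in just one, so the tone stands well clear of the noise at its own frequency even though it is much quieter than the noise as a whole.  Make `tone_amplitude` small enough and the tone bin sinks back into the noise floor, which is the boss lowering his voice.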

It sounds like a whole load of stuff and nonsense, but tomorrow we'll look at a practical example.