Here at BitPerfect we unabashedly took our name from the term of art “bit perfect” which, in the early days of computer audio, was considered to be the most important attribute a computer-based audio system needed to have in order to sound good. Today, we realize that being “bit perfect”, whilst a laudable objective, is neither a requirement for great sound, nor a guarantee of achieving it. Time to talk about what “bit perfect” means.
Your computer-based music collection comprises a bunch of music files. Each of these contains a digital representation of the music - basically a whole bunch of numbers. The job of the audio playback software is to read those numbers out of the file and send them to an audio output device, whose function is to turn them into music. The theory of “bit perfect” playback is that if the numbers that reach the audio output device are the exact same numbers as those in the music file, then there is not much more that the computer and its software can do to improve upon the resultant sound quality.
Let’s stick with that notion for a while.
Why did people have any concerns that the computer might do anything different in the first place? The answer is that no computer was ever designed to be, first and foremost, an audiophile-grade music source. Computers generate all sorts of sounds, including warning beeps and notification chimes, as well as proper audio content. These audio events may be generated in different audio formats (after all, why generate a “beep” in CD-quality 16/44.1?). The role of the audio subsystem in a computer is to manage all of these audio sources in as seamless a manner as possible. Additionally, it often provides “added value” functionality, such as equalizers, stereo image enhancers, and loudness control. Audio signals, from whatever source, are fed into the audio subsystem and routed through all of this signal processing functionality. Sometimes, in order to meet objectives deemed desirable by the system’s designers, there may be additional processing such as sample rate conversion. The upshot of all this is that, in many cases, the bits that reach the audio output device are no longer the same as those that were in the original music file, and the modifications made along the way as often as not degrade the sound. So a very good first step is to establish a “bit perfect” playback chain as a baseline, and move on from there.
But are all departures from “bit perfect”-ness destructive to sound quality? It turns out that no, they are not. Let’s look at what happens when you manipulate a signal.
Digital volume control is an obvious example. In a previous post I set out some thoughts on the subject. Basically, every 6dB of attenuation results in the loss of one bit of audio resolution, so it would seem to follow that digital volume control imposes a de facto loss of sound quality. But consider the alternative: if volume control is instead performed in the analog domain, the analog signals encoded by that ‘lost’ bit (the Least Significant Bit, or LSB) are themselves attenuated, pushed further down into the amplifier’s background noise. If the sounds encoded by the LSB lie below the background noise level of the amplifier, they should not be audible in either case. With 16-bit audio, the noise floor of the best amplifiers can lie below the noise floor of the encoded signal, so it is arguable that the resolution loss introduced by digital volume control can be audible - it is certainly measurable - but with 24-bit audio the noise floor of the encoded signal is always swamped by amplifier noise, and so the loss should never be audible. However, this argument assumes that analog-domain volume control has zero audible impact of its own, and most (but not all) audio designers accept that this is not the case.
Beyond bit reduction, digital volume control involves recalculating the signal level at every sample point. The answer spit out by the algorithm may not be an exact 16-bit (or 24-bit) value, and so a quantization step is inevitably introduced, and a further quantization error is encoded into the audio data stream. As pointed out in another of my previous posts, quantization error can be rendered virtually inaudible - and certainly less objectionable - by the judicious use of dither. Most audio authorities agree that both quantization noise and dither can be audible on 16-bit audio, but that dither is way less objectionable than undithered quantization error. Both are generally held to be inaudible with 24-bit data. Therefore digital volume control on 16-bit data is normally best performed with dither.
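To make those two steps concrete - attenuation in high-precision arithmetic, then dithered re-quantization - here is a minimal sketch. The choice of Python and NumPy, and every name in it, is mine purely for illustration; real playback software will differ in the details.

```python
import numpy as np

def digital_volume(samples, attenuation_db, dither=True):
    """Attenuate 16-bit samples in the digital domain, then re-quantize."""
    gain = 10.0 ** (-attenuation_db / 20.0)      # convert dB to a linear gain
    scaled = samples.astype(np.float64) * gain   # do the arithmetic in 64-bit float

    if dither:
        # TPDF dither spanning +/-1 LSB: the sum of two independent uniform
        # random values, added before quantization to decorrelate the
        # quantization error from the music.
        rng = np.random.default_rng()
        scaled += rng.uniform(-0.5, 0.5, scaled.shape)
        scaled += rng.uniform(-0.5, 0.5, scaled.shape)

    # The quantization step: round to the nearest 16-bit integer value.
    return np.clip(np.round(scaled), -32768, 32767).astype(np.int16)

# 12dB of attenuation discards two bits of resolution.
samples = np.array([32767, 1000, 1, -32768], dtype=np.int16)
print(digital_volume(samples, 12.0))
```

Skip the two dither lines and what remains is the plain, and more objectionable, undithered quantization error described above.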
So the question is not whether a volume control degrades the sound, but which method degrades it least. Certainly, in every experiment I have performed with my own ears, digital volume control has proven sonically superior to analog with 24-bit source material. Clearly, a digitally-attenuated audio stream is inherently not “bit perfect”, so here is one example where “bit perfect”-ness may not be an a priori requirement for optimal sound quality.
Musical data consists, typically, of a bunch of 16-bit (or 24-bit) numbers. This means that they take on whole number values only, between zero and 65,535 (or 16,777,215). Signal processing - of any description - involves mathematical manipulation of those numbers. Let’s consider something simple, such as adding two 16-bit numbers together. Suppose both of those 16-bit numbers are the maximum possible value of 65,535. The answer will be 131,070. But we cannot store that as a 16-bit integer, whose maximum value is 65,535! This is a fundamental problem: the sum of two 16-bit numbers is a 17-bit number and, in general, the sum of two N-bit numbers is an (N+1)-bit number. The situation is more alarming still with multiplication. The product of two 16-bit numbers is a 32-bit number - more generally, the product of two N-bit numbers is a (2N)-bit number. So if you want to do arithmetic using integer data, you need to take special measures to account for these difficulties.
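A few lines of Python make the point - I am using NumPy here (my choice, nothing more) because its fixed-width integers behave like the 16-bit storage we are discussing, wrapping around silently when the answer no longer fits:

```python
import numpy as np

a = np.array([65535], dtype=np.uint16)  # the maximum 16-bit value
b = np.array([65535], dtype=np.uint16)

print(a + b)                    # [65534] -- wraps modulo 2**16; the 17th bit is lost
print(a.astype(np.uint32) + b)  # [131070] -- widen first and the true sum survives
print(a.astype(np.uint64) * b)  # [4294836225] -- a product needs up to 32 bits
```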
Generally, signal processing becomes easier if you transform your integers into floating point numbers. I apologize if the next bit gets too heavy, but I put it into a paragraph of its own that you can skip if you prefer.
Floating point numbers come in two parts, a magnitude and a scaling factor. In a 32-bit floating point number, the magnitude (including its sign) occupies 24 of the bits and the scaling factor the other 8 bits. The magnitude ranges between roughly -1 and +1, and the scaling factor is an 8-bit number which ranges, in effect, between about -126 and +127 (the actual scaling factor is 2 raised to the power of that number). 32-bit floating point numbers therefore have approximately 7 significant figures of precision, and can represent values as large as about 10 raised to the power 38, and as small as 10 raised to the power -38. The value in using 32-bit floating point format is that whatever the value being represented, it is always represented with full 24-bit precision (equivalent to about 7 significant figures in decimal notation) across more than 70 orders of magnitude of dynamic range. The down-side is that if you instead devoted all 32 bits to an integer representation you would have the equivalent of nearly 10 significant figures of precision, but with no dynamic range at all. By using 64-bit floating point numbers the benefits get even greater - the precision is equivalent to nearly 16 significant figures (a 53-bit magnitude), and the dynamic range is, for all practical signal processing purposes, unlimited.
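For the reader who didn’t skip that paragraph, the same facts can be pulled straight out of the two formats - this little sketch (Python again, an assumption of mine rather than anything the argument depends on) prints each format’s precision and range, and then shows 32-bit float running out of significant figures right at the 24-bit boundary:

```python
import numpy as np

for dtype in (np.float32, np.float64):
    info = np.finfo(dtype)
    print(dtype.__name__,
          "significand bits:", info.nmant + 1,   # stored bits plus the implicit leading 1
          "decimal digits: ~", info.precision,
          "range: %.1e .. %.1e" % (info.tiny, info.max))

# A 24-bit significand holds about 7 decimal digits: 2**24 is the edge.
print(np.float32(16777216.0) + np.float32(1.0))  # 16777216.0 -- the added 1 is lost
print(np.float64(16777216.0) + np.float64(1.0))  # 16777217.0 -- float64 keeps it
```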
Why is any of this important? Well, generally speaking, depending on what type of manipulations you want to do on the audio data, it might not be. Volume control, for example, can be accomplished just as effectively (if, admittedly, less conveniently) on integer data as on float data. With more advanced manipulations, however, such as Sample Rate Conversion, the benefits of floating point begin to emerge. One calculation that often arises in signal processing is to calculate the difference between two numbers, and multiply the result by a third number. Where this type of calculation trips up the unwary is when the first two numbers are nearly identical, and the third number is very large. This can create what we term Data Precision Errors. I will illustrate this problem using an example that employs three 16-bit numbers:
First of all, I will take two 16-bit numbers, 14,303 and 14,301, and take the difference between them. The answer is 2. I will then multiply that by a third 16-bit number, 7,000. The answer is 14,000. Seems straightforward, no? Well, the difference I got was 2, and that answer has a precision of just one significant figure. In other words, it could have been 1, 2, or 3, but it could never have been 1.09929. Consequently, when I multiplied my difference by 7,000 the result could have been 7,000, 14,000, or 21,000. It could never have been 7,695.03, for example. Now, if my starting numbers (14,303 and 14,301) were raw data, then there is no further argument. But suppose instead that those numbers were the result of a prior calculation whose outcomes were actually 14,302.58112 and 14,301.48183. The five significant figures after the decimal point got lost because the 16-bit format could not represent them, and the results were rounded up or down to the nearest 16-bit integer. The difference between the two, instead of being 2, should have been 1.09929, and the result, when multiplied by 7,000, should have been 7,695.03 instead of 14,000. That error is very big indeed. That is the difference between using 16-bit integer format and 64-bit float format (the differences are admittedly exaggerated by my unfair comparison of 16-bit Int to 64-bit Float, but it serves to illustrate the point). In a complicated process like Sample Rate Conversion, these types of calculations are performed millions of times per second on the raw audio data, and something as simple as the choice of numerical format can make a big difference in how the outcome is judged qualitatively.
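Here is that exact example rendered in a few lines of Python (again my own sketch - the choice of language is incidental, and Python’s floats happen to be 64-bit doubles, which suits the comparison):

```python
# The outcomes of the imaginary prior calculation, at full precision:
exact_a, exact_b = 14302.58112, 14301.48183

# A 16-bit integer pipeline must first round them to whole numbers:
int_a, int_b = round(exact_a), round(exact_b)   # 14303 and 14301

print((int_a - int_b) * 7000)      # 14000 -- built from a one-digit difference
print((exact_a - exact_b) * 7000)  # ~7695.03 -- what full precision preserves
```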
The point of all that is that, depending on your DAC, you may get better audio performance by upsampling to a higher sample rate. Self-evidently, upsampled data is no longer “bit perfect”, and the minutiae of how the upsampling is done can and will impact the sound quality of the result.
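By way of illustration only - the argument above doesn’t prescribe any particular tool - here is what a 2x upsample might look like using SciPy’s polyphase resampler. The windowed-FIR filter it applies by default is exactly the kind of minutia that differentiates one upsampler from another:

```python
import numpy as np
from scipy.signal import resample_poly

# One second of a 1kHz tone sampled at 44.1kHz.
fs = 44100
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 1000 * t)

# Polyphase upsampling by 2 to 88.2kHz; a low-pass FIR filter is applied
# internally to suppress the images created by the rate change.
y = resample_poly(x, up=2, down=1)
print(len(x), len(y))   # 44100 88200
```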
It is no longer either sufficient or necessary for a computer-based audio playback system to be “bit perfect”. Of course, if you have configured your system to be truly “bit perfect” and it fails to deliver “bit perfect” performance, then something is wrong, and it is unlikely to sound good. But as long as the playback software does its job competently, it should not be a fundamental cause for concern if the result is not strictly “bit perfect”. All you need, really, is for it to be “BitPerfect” :)