Monday 22 June 2015

Dynamic Compression

Most computer users will be familiar with data compression.  This was a godsend at the dawn of the internet age when internet connections were achieved via dial-up modems with bandwidths restricted to clockwork speeds.  The first ever document I received over the internet was a 1MB WordPerfect file, and the transmission took about three hours using a modem that set me back $280.  Even so, this was still a great thing, given that the file wouldn’t fit on a 5.25” floppy disk which could otherwise have been posted to me.  I didn’t have PKZip at that time, but eventually a colleague introduced me to it.  For many years thereafter nobody would ever consider sending an e-mail attachment without first “zipping” it.  Zipping a word processor file could reduce the file size by enormous factors of 5X or more.  Data compression was, and still is, a great thing.  In audio, formats like FLAC and Apple Lossless use data compression to reduce the size of an audio file without compromising its audio content.  By contrast, formats like MP3 and AAC go a step further and irretrievably delete some of the audio content to make the file smaller yet.  But dynamic compression is a different beast entirely. 

When an audio signal continues to increase in volume, at some point you will run into a limitation.  For example, beyond a certain (catastrophically loud) volume, air itself loses the ability to faithfully transmit a sound.  If you drive your loudspeaker with too many Watts, the drive units will self-destruct.  If you feed your amplifier’s inputs with too large a signal, its outputs will clip.  If you try to record too loud a signal onto analog tape, the tape will distort.  And if you try to encode too large a signal in a digital format … well, you can’t get there from here, and you just have to encode something else instead - typically digital hard clipping.

Therefore, whether in today’s digital age or in the analog age of yore, anybody who is tasked with capturing and recording an analog signal has to be concerned with level matching.  If you turn the signal level up too high, you will encounter one of the previously mentioned problems (hopefully one of the last two).  If you turn it down too low, the sound will eventually descend into the noise and be lost.  However, analog tape has a built-in antidote.  It turns out that if you overload an analog tape, the overload is managed ‘gracefully’, which means that you can record at a level higher than the linear maximum and it won’t sound too bad.

In fact, not only does it not sound too bad, but if you play back the resultant recording over a low-fidelity system like a radio or a boom box, it can actually sound better than a recording that properly preserves the full dynamic range.  This is because the dynamic range of a high-quality recording is greater than the low-fi system is able to reproduce, and the result can sound quiet and lifeless.  By allowing the analog tape to saturate, the dynamic range of the recorded signal is effectively reduced (or ‘compressed’), and better matched to that of a low-fi system.  Indeed, on all but the very finest systems, most people find a little dynamic compression slightly preferable to none at all.  That is a problem for those of us fortunate enough to enjoy the finest systems, whose revealing nature tends to deliver the opposite result.

With analog tape, managing dynamic compression through tape saturation is a finely balanced skill.  It is not something that you can easily bend to your design.  It’s sometimes considered to be more of an art than a science.  On the other hand, in the digital domain, dynamic compression can be tailored umpteen different ways according to your whim, and you can dial in just the right amount if you believe your recording needs it.  Most digital dynamic compression algorithms are seriously simple, being nothing more than a non-linear transfer function based on quadratic, cubic, sinusoidal, exponential, hyperbolic tangent, or reciprocal functions (to name but a few).  Ideally, the transfer function would remain linear up to a point, above which the non-linearity would progressively kick in, and the better-regarded algorithms (such as the cubic) do behave like that.  But most serious listeners agree that digital dynamic compression never sounds as good as ‘natural’ dynamic compression from magnetic tape.  Maybe this is one of the reasons analog still has its strong adherents.
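To make the idea of a transfer function concrete, here is a minimal Python sketch of three static curves: plain hard clipping, plus two "soft knee" curves (cubic and hyperbolic tangent) that stay linear up to a threshold and then bend progressively towards full scale.  The threshold of 0.5 and the exact curve shapes are illustrative assumptions, not taken from any particular product.

```python
import math


def hard_clip(x: float) -> float:
    """Digital hard clipping: linear up to full scale (+/-1.0), flat beyond it."""
    return max(-1.0, min(1.0, x))


def cubic_knee(x: float, threshold: float = 0.5) -> float:
    """Linear below `threshold`, then a cubic curve that levels off at exactly 1.0.

    With d = |x| - threshold and k = 1.5 * (1 - threshold), the curve above the
    threshold is y = threshold + d - d**3 / (3 * k**2) for d < k, and 1.0 beyond.
    It joins the linear part with the same value, slope and curvature, and
    reaches 1.0 with zero slope at d = k.
    """
    a = abs(x)
    if a <= threshold:
        return x
    d = a - threshold
    k = 1.5 * (1.0 - threshold)
    y = 1.0 if d >= k else threshold + d - d ** 3 / (3.0 * k ** 2)
    return math.copysign(y, x)


def tanh_knee(x: float, threshold: float = 0.5) -> float:
    """Linear below `threshold`, then a tanh curve approaching 1.0 asymptotically."""
    a = abs(x)
    if a <= threshold:
        return x
    span = 1.0 - threshold
    return math.copysign(threshold + span * math.tanh((a - threshold) / span), x)


if __name__ == "__main__":
    for x in (0.25, 0.75, 1.5):
        print(f"x={x}: hard={hard_clip(x):.3f} cubic={cubic_knee(x):.3f} tanh={tanh_knee(x):.3f}")
```

All three curves agree below the threshold; above it, the cubic and tanh curves bend smoothly instead of stopping dead, which is the behaviour described above.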

The thing about digital dynamic compression is that, once it kicks in, its effect on the sound is rather drastic.  Harmonic distortion components at levels as high as -20dB are common.  Moreover, the technique can create substantial harmonic distortion components above the Nyquist frequency, which get mirrored (aliased) down into the audio band, where they appear as inharmonic frequencies that are subjectively far more objectionable than harmonic ones.  It also creates large intermodulation distortion artifacts, which are equally undesirable.
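The folding of harmonics back below Nyquist is simple arithmetic.  The Python sketch below computes where a component lands after sampling, then checks the result numerically by hard clipping a 9 kHz sine at 44.1 kHz and measuring a single DFT bin.  The 9 kHz tone and the 2X (6dB) overdrive are arbitrary illustrative choices.

```python
import cmath
import math


def aliased_frequency(f: float, fs: float) -> float:
    """Frequency (Hz) at which a component at f Hz appears after sampling at fs Hz."""
    f = f % fs                            # sampling cannot distinguish f from f +/- fs
    return fs - f if f > fs / 2 else f    # components above Nyquist mirror downwards


def bin_magnitude(x, f, fs):
    """Magnitude of the single DFT bin at f Hz (exact when len(x) == fs and f is whole)."""
    acc = sum(v * cmath.exp(-2j * math.pi * f * n / fs) for n, v in enumerate(x))
    return abs(acc) / len(x)


fs, f0 = 44100, 9000
# A symmetric clipper generates odd harmonics; none of them lands on a multiple of 9 kHz.
for k in (3, 5, 7, 9):
    print(f"harmonic {k}: {k * f0} Hz -> appears at {aliased_frequency(k * f0, fs)} Hz")

tone = [2.0 * math.sin(2 * math.pi * f0 * n / fs) for n in range(fs)]  # 1 s, 6dB too hot
clipped = [max(-1.0, min(1.0, v)) for v in tone]
print("900 Hz content, clean  :", bin_magnitude([v / 2 for v in tone], 900, fs))
print("900 Hz content, clipped:", bin_magnitude(clipped, 900, fs))
```

The 5th harmonic at 45 kHz, for example, reappears at 900 Hz, far below the 9 kHz fundamental and with no harmonic relationship to it.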

There are papers out there which do a very thorough job of analyzing what various dynamic compression systems, both real and theoretical, could do if they were implemented, and the conclusions they come to are pretty consistent.  Digital dynamic compression fundamentally sucks, and there’s not much you can do about it.  Having said that, if you have some understanding of how compression works, are willing to limit the amount of applied compression judiciously, and have sufficient computing power available, you can bring to bear a whole grab-bag of tricks to minimize these distortions.  Such techniques include side-chain processing (where several analyses of the signal happen in parallel as inputs to the core compression tool), look-ahead (analysis of the upcoming input signal, obviously not for real-time applications), advanced filtering (which seeks to remove unwanted distortions by filtering them out), and active attack/release control (which governs how audible the sudden onset of compression is).  Sophisticated pro-audio tools can bring all these techniques - and more - to the party.
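For readers who want to see what side-chain analysis, attack/release control and look-ahead mean in practice, here is a deliberately minimal feed-forward compressor in Python.  It is a generic textbook structure, not a reconstruction of any particular tool: the side chain measures the instantaneous level, a static threshold/ratio curve turns it into a target gain, and one-pole smoothing with separate attack and release times governs how quickly that gain moves.  Look-ahead is implemented by delaying the audio path relative to the side chain, which is why it costs latency.  All parameter names and defaults are illustrative assumptions.

```python
import math


def compress(samples, fs, threshold_db=-20.0, ratio=4.0,
             attack_ms=5.0, release_ms=100.0, lookahead_ms=0.0):
    """Minimal feed-forward compressor (illustrative sketch, not production code)."""
    if attack_ms <= 0 or release_ms <= 0 or ratio < 1:
        raise ValueError("attack/release must be > 0 ms and ratio >= 1")
    # One-pole smoothing coefficients: closer to 1.0 means slower movement.
    att = math.exp(-1.0 / (fs * attack_ms / 1000.0))
    rel = math.exp(-1.0 / (fs * release_ms / 1000.0))
    delay = int(round(fs * lookahead_ms / 1000.0))
    gain_db = 0.0  # smoothed gain; negative values mean gain reduction
    out = []
    for n, x in enumerate(samples):
        # Side chain: analyse the *undelayed* input.
        level_db = 20.0 * math.log10(max(abs(x), 1e-9))
        over_db = level_db - threshold_db
        target_db = -over_db * (1.0 - 1.0 / ratio) if over_db > 0 else 0.0
        # Attack when more reduction is wanted, release when less.
        coeff = att if target_db < gain_db else rel
        gain_db = coeff * gain_db + (1.0 - coeff) * target_db
        # Audio path: delayed by the look-ahead, so gain changes lead the transient.
        delayed = samples[n - delay] if n >= delay else 0.0
        out.append(delayed * 10.0 ** (gain_db / 20.0))
    return out


fs = 8000
loud = [math.sin(2 * math.pi * 200 * n / fs) for n in range(fs)]   # 0 dBFS peaks
quiet = [0.05 * v for v in loud]                                   # about -26 dBFS
print(max(abs(v) for v in compress(loud, fs)[fs // 2:]))          # well below 1.0
print(compress(quiet, fs) == quiet)                                # below threshold: untouched
```

Making the attack much faster than the release, as here, lets the gain clamp down quickly on peaks but recover slowly, which avoids audibly "pumping" on every cycle of the waveform.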

Dynamic compression as a serious issue of sound quality came to a head (or descended to its depths, depending on your viewpoint) during the early 2000s with the so-called “loudness wars”.  The music industry was coming to terms with the notion that a lot of popular music was being listened to in MP3 format on portable players of limited fidelity.  While with their left hands the labels were trying their best to prevent the proliferation of music in the MP3 format, with their right hands they were recognizing that if music was going to be listened to on portable systems with restricted dynamic range, it might sound better if the recordings themselves had a similarly restricted dynamic range.  It is a well-known psychoacoustic effect that, when comparing two similar recordings, people overwhelmingly tend to perceive the louder one to be better, and dynamic compression is a way to increase the perceived loudness of a recording.  The labels therefore started falling over themselves to release recordings with more and more “loudness”, or put another way, with more and more dynamic compression.
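The mechanism behind the loudness war is easy to demonstrate.  A CD cannot go above 0 dBFS peak, so the only way to make a record "louder" is to raise its average level relative to its peaks, i.e. to reduce its crest factor.  The Python sketch below uses a tanh curve as a stand-in compressor (an illustrative assumption), renormalises to the same full-scale peak, and compares RMS levels.  RMS is only a crude proxy for perceived loudness; standardised loudness measures such as ITU-R BS.1770 add frequency weighting and gating.

```python
import math


def rms(x):
    return math.sqrt(sum(v * v for v in x) / len(x))


def peak(x):
    return max(abs(v) for v in x)


def crest_factor_db(x):
    """Peak-to-RMS ratio in dB: large for dynamic material, small for squashed material."""
    return 20.0 * math.log10(peak(x) / rms(x))


def squash_and_normalise(x, drive=5.0):
    """Stand-in 'loudness' processing: tanh compression, then peak-normalise to 1.0."""
    y = [math.tanh(drive * v) for v in x]
    p = peak(y)
    return [v / p for v in y]


fs = 8000
# Toy "music": alternating loud (full-scale) and quiet (-20dB) passages of a 200 Hz tone.
music = [(1.0 if (n // fs) % 2 == 0 else 0.1) * math.sin(2 * math.pi * 200 * n / fs)
         for n in range(4 * fs)]
squashed = squash_and_normalise(music)
for name, sig in (("original", music), ("squashed", squashed)):
    print(f"{name}: peak {peak(sig):.2f}, RMS {20 * math.log10(rms(sig)):.1f} dBFS, "
          f"crest factor {crest_factor_db(sig):.1f} dB")
```

Both versions peak at exactly full scale, yet the squashed one has a higher RMS level, and the price is that the quiet passages are no longer quiet.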

Take U2’s “How to Dismantle an Atomic Bomb”, released in 2004.  This album is a downright disgrace.  It sounds absolutely appalling.  I bought it when it came out and haven’t listened seriously to it since.  And if there is any doubt as to why that might be, just take a look at the attached screenshot image.  These are waveform envelopes obtained using Adobe Audition.  The top track is “Vertigo” from this album.  The bottom track is “With or Without You” from their 1987 release The Joshua Tree.  Both are ripped from the standard commercial CD releases.  The difference is laughable.  You can clearly see how the top one has been driven deeply into dynamic compression.

To attempt to quantify this effect, the “Loudness War” website endorses a free tool called the Tischmeyer Technology (TT) Dynamic Range Meter.  This measures Vertigo as DR5, which it classifies as “Bad” (DR0 - DR7), and With or Without You as DR12, which is in the “Transition” range (DR8 - DR13) but getting close to “Good” (which starts at DR14).  All else being equal, the higher the number the better the sound, but the numerical result is quite dependent on the program material.  Next time you play an album, see if it is listed on the Loudness War website and check its rating.  If it isn’t listed, it is a simple job to download the free TT Dynamic Range Meter, measure the album yourself, and upload the data.
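For the curious, the sketch below shows roughly what a DR-style figure measures: how far the loudest peaks sit above the average level of the loudest passages.  It is loosely modelled on publicly circulated descriptions of the DR algorithm, but the details (3-second blocks, the loudest 20% of blocks, the second-highest block peak, and an RMS scaled by √2 so that a full-scale sine reads 0 dB) are assumptions here.  It handles a single channel only and is not guaranteed to reproduce the TT meter's readings.  The band boundaries come straight from the ranges quoted above.

```python
import math


def dr_band(dr: int) -> str:
    """Classify an integer DR value using the bands quoted in the text."""
    if dr <= 7:
        return "Bad"          # DR0 - DR7
    if dr <= 13:
        return "Transition"   # DR8 - DR13
    return "Good"             # DR14 and above


def approx_dr(samples, fs, block_s=3.0):
    """Rough, unofficial DR-style estimate (dB) for one channel of samples in [-1, 1]."""
    n = int(fs * block_s)
    blocks = [samples[i:i + n] for i in range(0, len(samples) - n + 1, n)]  # drops partial tail
    if len(blocks) < 2:
        raise ValueError("need at least two full blocks")
    # Per-block RMS, scaled by sqrt(2) so a full-scale sine reads 1.0 (same as its peak).
    rms = sorted((math.sqrt(2.0 * sum(v * v for v in b) / n) for b in blocks), reverse=True)
    peaks = sorted((max(abs(v) for v in b) for b in blocks), reverse=True)
    top = rms[:max(1, round(0.2 * len(rms)))]            # loudest 20% of blocks
    rms_top = math.sqrt(sum(r * r for r in top) / len(top))
    return 20.0 * math.log10(peaks[1] / rms_top)         # second-highest block peak


fs = 8000
sine = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(10 * fs)]
print(round(approx_dr(sine, fs)), dr_band(round(approx_dr(sine, fs))))  # 0 Bad
print(dr_band(5), dr_band(12), dr_band(14))                             # Bad Transition Good
```

A pure sine has no dynamics at all and scores zero; a quiet signal with occasional full-scale peaks scores high, which is why heavily compressed tracks like Vertigo end up at the bottom of the scale.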

And it isn’t just the music business that faces this issue.  Incredibly, I also encounter it in the ultra-low-fi world of the TV sound track.  Just when you thought plain old dynamic compression was bad enough, the more aggressive “loudness shaping” algorithms also heavily modulate the volume of the sound track, winding it up during “quiet” passages when there is no dialog, or even between breaths during the dialog itself.  This has the effect of raising the background noise to the same loudness level as the dialog - and you can plainly hear it winding up and down - making watching the TV show a most unpleasant experience.  For me, for example, it ruined the last season of “House”.  I can’t begin to imagine how bad a TV set would have to be for such measures to be remotely beneficial.
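The "winding up" effect described above is what you get from any automatic gain control that tries to hold the short-term level constant.  The Python sketch below is a naive, hypothetical AGC (not the algorithm any broadcaster actually uses): it tracks a running mean-square level and applies whatever gain brings that level to a target, up to a maximum.  Feeding it a loud "dialog" passage followed by a near-silent "pause" containing only low-level noise shows the noise being hauled up towards the dialog level.

```python
import math
import random


def naive_agc(samples, fs, target_rms=0.1, window_ms=50.0, max_gain=30.0):
    """Hypothetical loudness-levelling AGC: push short-term RMS towards target_rms."""
    a = math.exp(-1.0 / (fs * window_ms / 1000.0))  # one-pole smoothing of the power
    ms, out = 0.0, []
    for x in samples:
        ms = a * ms + (1.0 - a) * x * x               # running mean-square estimate
        gain = min(max_gain, target_rms / math.sqrt(ms)) if ms > 0 else max_gain
        out.append(gain * x)
    return out


def rms(x):
    return math.sqrt(sum(v * v for v in x) / len(x))


fs = 8000
rng = random.Random(0)
dialog = [0.3 * math.sin(2 * math.pi * 200 * n / fs) for n in range(fs)]  # 1 s "speech"
pause = [rng.uniform(-0.003, 0.003) for _ in range(fs)]                   # 1 s room noise
y = naive_agc(dialog + pause, fs)
half = fs // 2  # measure only the settled second half of each section
before = 20 * math.log10(rms(pause[half:]) / rms(dialog[half:]))
after = 20 * math.log10(rms(y[fs + half:]) / rms(y[half:fs]))
print(f"noise relative to dialog: {before:.1f} dB before AGC, {after:.1f} dB after")
```

With these (arbitrary) settings, the gap between the background noise and the dialog shrinks from around 40dB to only a handful of dB, and the transitions between the two are exactly the audible winding up and down.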

As a final observation, for the purists who like to work in DSD, there are a couple of important considerations to bear in mind.  The first is that, in native DSD mode, you simply cannot do any sort of signal processing whatsoever - not even something as trivial as volume control (fade-in/fade-out, for example), let alone dynamic compression.  You have to convert to PCM to do that and then convert back to DSD, which most DSD purists find unacceptable.  The second concerns the Sigma-Delta Modulators which convert analog (or PCM digital) signals to DSD format, and it warrants a discussion all of its own.
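To see why even a simple fade forces a trip out of the 1-bit domain, consider the toy example below.  It uses a first-order sigma-delta modulator, far cruder than the high-order modulators used for real DSD, purely to generate a ±1 bitstream.  Multiplying that stream by a gain of 0.5 (about -6dB) produces samples of ±0.5, which a 1-bit format simply cannot represent; getting back to 1-bit requires running a modulator again on what is now a multi-bit signal, which is effectively the PCM round trip described above.

```python
def sdm1(x):
    """Toy first-order sigma-delta modulator: samples in [-1, 1] -> a +/-1 bitstream."""
    integ, prev, bits = 0.0, 0.0, []
    for v in x:
        integ += v - prev                   # accumulate input minus previous output
        prev = 1.0 if integ >= 0 else -1.0  # 1-bit quantiser
        bits.append(prev)
    return bits


def mean(x):
    return sum(x) / len(x)


bits = sdm1([0.25] * 1000)                  # 1-bit stream whose average is about 0.25
faded = [0.5 * b for b in bits]             # naive "volume control" on the bitstream
print(sorted(set(bits)), sorted(set(faded)))  # [-1.0, 1.0] [-0.5, 0.5]
remodulated = sdm1(faded)                   # back to 1-bit needs a fresh modulation pass
print(round(mean(bits), 3), round(mean(remodulated), 3))
```

The remodulated stream carries the faded signal correctly on average, but it is a brand-new bitstream, not the original one with a volume change applied.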

As you increase the signal level in these modulators, the outcome is far from straightforward.  Overloading the modulator can make it go unstable in an unpredictable manner.  For that reason, the SACD standard requires the analog signal level encoded in DSD to be 6dB below the theoretical maximum that the format can support.  But interesting things happen if you over-drive the modulator.  Most contain special circuits or algorithms which detect the onset of instability and apply corrective measures.  This means that the modulators can normally accept inputs that exceed the supposed -6dB limit, with a penalty limited to a slight increase in distortion.  Keep pushing it further, though, and the modulator self-resets, resulting in an audible click.
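The toy first-order modulator below illustrates, in a very simplified way, what "going unstable" means: its internal integrator stays bounded for inputs within full scale but grows without limit once the input exceeds it, because the ±1 output can no longer keep up.  A crude "self-reset" that zeroes the integrator when it passes a limit keeps things bounded at the cost of a discontinuity, the digital counterpart of the click described above.  Real DSD modulators are of much higher order and can become unstable below full scale, which is what the -6dB (50% modulation) margin protects against; the first-order structure, the reset threshold and the input levels here are illustrative assumptions only.

```python
def run_sdm1(level, n=10_000, reset_limit=None):
    """Drive a toy first-order SDM with a constant input.

    Returns (largest |integrator| value seen, number of resets performed).
    """
    integ, prev, worst, resets = 0.0, 0.0, 0.0, 0
    for _ in range(n):
        integ += level - prev
        if reset_limit is not None and abs(integ) > reset_limit:
            integ, resets = 0.0, resets + 1    # crude "self-reset" on detected runaway
        prev = 1.0 if integ >= 0 else -1.0
        worst = max(worst, abs(integ))
    return worst, resets


print(f"-6dB is a modulation depth of {10 ** (-6 / 20):.3f}")  # 0.501
for level in (0.5, 0.9, 1.2):
    print(level, run_sdm1(level), run_sdm1(level, reset_limit=4.0))
```

At 0.5 and 0.9 the integrator never strays far from zero; at 1.2 it runs away unless the reset intervenes, and every reset is a jump in the modulator's state.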

In a sense, if you are a recording engineer, DSD is a bit like analog tape on steroids.  If your signal exceeds the -6dB limit then to a large degree you are going to be able to get away with it, unlike the situation with PCM digital, where the signal will either clip or the dynamic compressor will cut in.  With DSD you get the ‘graceful’ overload of analog tape, but without the associated dynamic compression.  The result is probably the best of all worlds.  Interestingly, our DSD Master tool gives us an accurate view of whether or not the recording/mastering engineer has “pushed” the recording beyond the -6dB guideline, and you would be seriously surprised at the extent to which such behaviour appears to be the norm.
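One way to check whether a DSD stream has been pushed past the -6dB guideline is to low-pass filter the bitstream back into a multi-level signal and look at its peaks.  The Python sketch below does this with a simple moving average and compares the result with 50% modulation (20·log10(0.5), about -6.02dB).  It is a hypothetical illustration only and makes no claim about how the DSD Master tool works internally: a moving average is a poor low-pass filter, and with real material the shaped ultrasonic noise in a DSD stream would inflate the measured peaks unless a much better filter were used.  The 64-bit window is an arbitrary assumption.

```python
import math

SACD_LIMIT_DB = 20.0 * math.log10(0.5)   # 50% modulation, about -6.02dB


def modulation_peak_db(bits, window=64):
    """Peak level of a +/-1 bitstream after a moving-average low-pass, in dB re full modulation."""
    if len(bits) < window:
        raise ValueError("bitstream shorter than the averaging window")
    acc = sum(bits[:window])
    peak = abs(acc)
    for i in range(window, len(bits)):
        acc += bits[i] - bits[i - window]   # slide the window one bit forward
        peak = max(peak, abs(acc))
    return 20.0 * math.log10(peak / window) if peak > 0 else float("-inf")


def exceeds_sacd_guideline(bits, window=64):
    return modulation_peak_db(bits, window) > SACD_LIMIT_DB


hot = [1.0] * 9 + [-1.0]            # repeating pattern averaging 0.8
modest = [1.0] * 5 + [-1.0] * 3     # repeating pattern averaging 0.25
for name, pattern in (("hot", hot), ("modest", modest)):
    stream = pattern * 500
    print(name, round(modulation_peak_db(stream), 2), exceeds_sacd_guideline(stream))
```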