Wednesday 27 November 2013

An Arm-Waving Argument

Nothing whatsoever to do with audio, this post.  Question:  How do you know when an Italian is speaking?  Answer:  His arms are moving!

That’s a stereotype as old as the hills, and one with a lot of truth to it.  The fact that people - of all nationalities - tend to wave their arms and gesture as they speak is something that has always fascinated me.  The fact that I do it myself, despite being stolidly Anglo-Saxon, fascinates me just as much.  I find this aspect of human behaviour endlessly amusing.  For example, what is the purpose of gesturing extravagantly while talking on your cell phone?  Why do I do it myself from time to time?

This morning, while working out on my cross-trainer, I think I came up with a crazy insight, and I thought I would share it with you.  On the TV set was a travelogue show.  The host was taking a leisurely stroll with a guest, discussing the history of Buenos Aires, as it happens.  The guest was doing most of the talking.  At the beginning his arms remained at his sides.  Then they began to gradually lift in front of him as he spoke, and finally began to adopt mild gestures.

I considered the mechanics of walking, talking, and gesturing.  What else is there to do when you’re stuck on a treadmill?  Let’s start with talking.  In order to talk, you need to establish an overpressure in your diaphragm to drive the vibrations in your vocal cords.  The process of tensing your diaphragm involves tensing your abdominal muscles.  Try saying something out loud right now, and note how your diaphragm and abs both tense up.  When you are standing up, and also when you are walking slowly, your abdominal muscles are also part of the process of staying in balance.  They will be more tense, in general, than when you are sitting down.

So now imagine you are standing up, maybe even walking, and decide you are going to say something.  The first thing that happens is that your diaphragm tenses up to supply an overpressure.  This requires your abs to tighten slightly.  The tightening of your abs causes your upper body to want to bend slightly forward.  But you don’t want to tip forward, so your autonomic nervous system automatically compensates by raising your arms in front of you.  The angular momentum of your arms rising in front of you counterbalances the angular momentum of your upper body bending forward, and this balance means that you don’t tip over.

Now you start to actually speak.  This involves temporarily reducing the overpressure in your diaphragm to allow a controlled release of air through the vocal cords.  The reduced overpressure is accomplished, at least partially, by releasing the tension in the abs.  This then releases the forward bend in the upper body.  The raised arms now need to begin to lower again to provide the angular momentum to counterbalance it.

So here is the summary of what I have just described.  When a person starts the process of speaking, his arms first come up.  With each utterance the arms gesture forwards again, and in the pauses between utterances come back up again.  When the speaking is over, the arms can come back down.

What about the TV guest in Buenos Aires?  Well, I think that it all boils down to your core body strength and endurance.  If you are in good shape, and particularly if your core is in good shape, your body is less likely to tilt in response to a slight tightening of the abs.  Your back and other core muscles will tend to compensate automatically.  But if not, then as you stroll slowly along, chatting as you go, your abs are being lightly exercised, and after a while your core muscles will gradually tire, and you will need to use your arms to assist.  This is what happened to the Argentinian, who did not give the impression of being particularly buff.  At the start of his short stroll, he needed no arm assist.  Then, as he tired, his arms would rise - barely so - as he spoke.  By the end of the chat, his arms were all the way up, and he was gesturing with each utterance.

My thought is that the root of gesturing as we speak - which is common to all cultures and not just Italians - must lie in some sort of bio-mechanical response such as this.  I thought that was a pretty cool idea.  Any anthropologists reading this?

Tuesday 26 November 2013

Upon Reflection

Positioning your loudspeakers in your listening room for optimum performance is an arcane art.  Three factors must be taken into account.  First, you want to avoid exciting your listening room’s natural bass resonances; second, you want to throw a good and accurate stereo image; and third, there will be any number of purely practical considerations that you cannot avoid and have to work around - for example, it's best if you don’t block the doorway.

The first of these factors is well understood, although not to the extent that the correct answer can be exactly derived a priori.  The final solution will depend on the acoustic properties of each of the listening room’s walls, floor, and ceiling, as well as the speaker’s dispersion pattern, and not forgetting all of the furnishings.  There is a commonly adopted tool called the Rule of Thirds, where the speakers are each placed one third of the way into the room from the side wall, and one third of the way in from the back wall.  The listener then sits one third of the way in from the wall behind him.  This is usually a good place to start.  A variant of this rule is the Rule of Fifths, which is pretty much the same with all the “Thirds” replaced by “Fifths”.  But this post is not about this aspect, so let’s move on.

The third factor is also something that I cannot help you with in a Facebook post.  You had best check out Good Housekeeping, or something.  So this post is going to focus on the second factor, obtaining a good stereo image - indeed this post is about one very specific aspect of it.

It turns out that, of all the factors via which speaker positioning affects the creation of a solid stereo image, the most important is usually the so-called First Reflection.  Sound leaving the loudspeaker cones makes its way to your ear either directly, or by reflection off the walls, floor, and ceiling.  The sound can take many paths, bouncing all round the room, but the most important path is the one that travels from speaker to ear via a bounce off one surface only.  In most listening rooms, the ceiling is generally flat and unobstructed and is the same across the whole room.  Therefore the sound from either speaker will bounce off the ceiling in a similar manner on its journey to your ear.  As a consequence it does not generally impact the stereo image, at least to a first approximation.  The same can be said for the floor, although in most situations furniture and carpeting can interrupt the path.  However, the walls affect the speakers asymmetrically.  The right speaker is close to the right wall but far from the left wall, so its First Reflection will be dominated by reflections off the right wall.  The opposite will be the case for the left speaker.  This asymmetry is partly responsible for the perceived stereo imaging of the system.

There are two things the user can use to control these First Reflections from the side walls.  The first is to adjust the proximity of the speaker to the side wall.  The closer it is, the less the time delay between the arrival at the ear of the direct signal and the reflected signal.  The second is to adjust the speaker’s toe-in (the extent to which the speaker points towards the centre of the room rather than straight along the room’s axis).  Unless you have a true omnidirectional loudspeaker, the speaker’s horizontal dispersion pattern will peak at its straight-ahead position, gradually falling off as you move to one side or the other.  Therefore, the amount of toe-in controls the proportion of the reflected signal to the direct signal.  If your listening room has sonically reflective side walls (plain painted walls, for example), you will probably require a greater degree of toe-in than if you have heavily textured wallpapered side walls, or furniture that will scatter the sound as opposed to reflecting it (such as bookshelves).
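To put a rough number on the first of those two controls, here is a minimal Python sketch of the extra time the side-wall reflection takes compared with the direct sound.  It treats the wall as a perfect mirror, so that the reflected path is simply the straight line from a "mirror image" of the speaker on the far side of the wall.  The room coordinates, the speed of sound (343 m/s), and the function name are all my own assumptions for illustration, not measurements from any real room.

```python
import math

SPEED_OF_SOUND = 343.0  # metres per second, assumed value for air at about 20 C


def first_reflection_delay_ms(speaker, listener, wall_x=0.0):
    """Delay (ms) between the direct sound and the side-wall first reflection.

    speaker and listener are (x, y) positions in metres on a floor plan,
    and the side wall is the line x = wall_x (a hypothetical layout).
    """
    sx, sy = speaker
    lx, ly = listener
    # Length of the straight path from speaker to listener.
    direct = math.hypot(lx - sx, ly - sy)
    # Mirror the speaker in the wall; the reflected path has the same
    # length as the straight line from this mirror image to the listener.
    mirror_x = 2 * wall_x - sx
    reflected = math.hypot(lx - mirror_x, ly - sy)
    return (reflected - direct) / SPEED_OF_SOUND * 1000.0


if __name__ == "__main__":
    listener = (2.0, 4.0)  # assumed listening position
    for distance_to_wall in (0.5, 1.0, 1.5):
        delay = first_reflection_delay_ms((distance_to_wall, 1.0), listener)
        print(f"speaker {distance_to_wall:.1f} m from wall: {delay:.2f} ms")
```

With this assumed layout the delay shrinks as the speaker moves closer to the wall.  The sketch says nothing about toe-in, which depends on the dispersion pattern of the particular loudspeaker.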

I have attached two photographs of my own listening room.  The side walls are flat and painted, and are quite reflective, therefore my loudspeakers have quite a large degree of toe-in.  Also, along the wall beside one of them I have a glass door close to the speaker.  With the door closed, the First Reflection comes off a combination of the wall and the glass door.  However, with the door open, the First Reflection comes off a combination of the glass door and a big hole (which clearly does not reflect at all).  Therefore, on my system, the imaging is severely impacted if I listen with the door open.

The other thing you need to bear in mind is that your best strategy is to control these First Reflections rather than work to merely eliminate them.  As a rule, placing highly absorbent panels right where the Reflection is going to strike is not going to help the sound too much.  The fact that reflections are so important generally means that you don’t want your room to be too acoustically dead.  An empty room with painted flat walls can have a horrible echoing acoustic, but it only takes a small amount of furnishing to break it up.  The echoing or “liveliness” of a room is usually measured by a property called RT60.  This is the time it takes for a reverberation in the room (caused, for example, by clapping your hands) to fall to 60dB below its initial value.  A good number for a listening room would be 0.3 - 0.5 seconds.  If your room has a larger RT60 value, then you will probably need to deaden it with a judiciously placed acoustic panel.  But how big of a panel, and where to place it, is a very complicated subject in itself.  My room has a big absorbing panel, about 6’ x 4’, affixed to the ceiling between and behind the speakers.  I also prefer to listen with the heavy floor-to-ceiling curtains on the wall behind my listening chair drawn.

Of course, every time you make a significant change to the acoustics of your listening room, the chances are good that you are going to need to reposition your speakers.  Changes that affect the RT60 may well impact the optimum positioning, so you may have to go through the whole procedure again.  Reposition, then fine-tune the toe-in and the tilt.  My B&W 802 Diamonds weigh 160lb each, and are the most cussedly awkward things to grasp if you ever want to move them, so that is something I don’t like to get involved with on a whim.  Because of the First Reflection factor, if your listening room is such that the First Reflection surface has a high acoustic reflectivity, then be aware that the distance of the speaker from the side wall will probably have to be set to an accuracy of half an inch.  Likewise, the toe-in and tilt can require great precision for optimal results.

If your loudspeakers are not set up to image as well as they can, then you are going to find it that much harder to optimize other aspects of your system setup.

Monday 25 November 2013

Learning to Listen

There was a time - and this may surprise you - when a Hi-Fi reviewer’s job was to install whatever he was reviewing on his lab bench and measure the bejesus out of it. When I first got into Hi-Fi, in England back in the 1970s, one of the senior reviewers in the vibrant Hi-Fi magazine scene was Gordon J. King. Gordon lived close by and I got to meet him. Gordon would never dream of connecting an amplifier to a pair of speakers and playing music through it. He would measure power output, distortion, frequency response, anything he could put a number to. But he would never let it near his sound system (which was pretty weird, and which he never did let me listen to).

When Naim released the radical 250 Power Amplifier, Julian Vereker didn’t pause to think before sending one out to Gordon for review. Now, the first 250s had a tendency to oscillate in the ultrasonic without an inductive load. In fact, the user manual went to great lengths to specify the type of speaker cable which was necessary to avoid this problem in practice. Not a person to pay any attention to matters as mundane as loudspeaker cables, Gordon immediately installed the 250 on his lab bench and connected a rheostat across its output terminals, which, for the duration of his test, was all he would ever connect to it. Needless to say, it measured terribly, right up until he blew out its output stage measuring its power delivery capability. It was sent back to a horrified Julian Vereker, who repaired it and sent it back. It blew up for a second time. Gordon gave the Naim 250 a terrible review.

At one point, after he had retired, Gordon gave me a high-end Denon receiver, a product he considered one of the best amplifiers he had ever reviewed. That Denon sounded absolutely appalling when I hooked it up. I gave it back. As life would have it, I replaced the Denon with … a Naim 250. It was absolutely superb sounding.

A few years earlier, “Hi-Fi Answers” was one of the many UK Hi-Fi magazines sold in the high street newsagents. It was not particularly notable, but its hook was an expanded Q&A section where readers could write in for advice. In about 1980, Keith Howard took over as Editor, and soon Hi-Fi Answers had a radical editorial makeover. Word got around that every single question that was posed on their Q&A pages was answered with instructions to purchase a Linn Sondek turntable, Linn speakers, and Naim amplification. It didn’t seem to matter what the question was, the answer was always “Linn/Naim”. Additionally, Hi-Fi equipment was now reviewed solely by listening to it, with not a single measurement playing any role in the evaluation process. It really was quite a radical departure, back in those days, to talk about how an **amplifier** sounded! Let alone a turntable, or a tonearm. Finally, they propounded a radical new philosophy of “source first”, where the most important component in a Hi-Fi system was the turntable, followed, in order, by the tonearm, cartridge, preamp, power amp, and loudspeakers. All this was almost a total inversion of the received wisdom of the day. This radical approach interested a young me, as I had by that time gone through many stages of incremental system upgrades. Each time the system indubitably sounded better after the upgrade, but after the new car smell wore off I was left with the uneasy feeling that nothing much of substance had actually changed. I could hear apparently higher fidelity, but the new system never really rocked my boat any more than did the previous incarnation. Meanwhile, Hi-Fi Answers promised audio nirvana if only I would buy a Linn Sondek. It was time I found out what all the fuss was about.

I found myself in London one weekday afternoon, so I figured I could spend some time in one of the city’s Linn dealers and I wouldn’t be getting in anybody’s way. I can’t remember its name, but I had read that Hi-Fi Answers’ Jimmy Hughes, the leading light of the new “just listen to it” school of equipment reviewing, used to work there. One of the sales staff duly introduced himself and inquired what I was looking for. I explained. He installed me in one of their many private (single-speaker) listening rooms and spent about two hours giving me a one-on-one lesson in listening to Hi-Fi. It went like this. I happened to have a couple of albums that I had just bought. One of them was a Sibelius Violin Concerto, although I don’t remember who the violinist was. He started off by asking why I had bought that record. This was an immediate problem, since I had only bought it because it was in a clearance bin and seemed worth a punt. But, surrounded by equipment I could never afford, and a smoothly urbane salesman I didn’t want to offend, I really didn’t want to say that. So I offered some appropriate-sounding platitudes. The salesman wouldn’t give up on it, though - he wanted to play it for me on a popular low-end turntable, and we duly listened for a while. At the end, he interrogated me on my thoughts regarding the soloist’s performance. Bless him, the salesman listened patiently to my clearly clueless response. I had no real opinion regarding the soloist’s performance, and I'm sure the salesman knew it. Now we switched the record to a Linn Sondek turntable, fitted with a Linn Ittok arm and a Linn Asak cartridge. I was asked to listen again, and answer the same question.

During those first 10 minutes of exposure to the Linn, I got it. It was a jaw-dropping experience. All of a sudden, everything made sense. Like being struck by Cupid’s arrow, I immediately knew that the “source first” concept was the real deal, and that the Linn was for me. The salesman took me through many more albums, each one carefully chosen to illustrate a point he wanted to make. We listened to Sondeks with different arms and cartridges. Each point he wanted to make was a lesson I needed to absorb.

What I learned in that store that afternoon has been the basis of my approach to Hi-Fi ever since, and I don’t feel it has ever let me down. And I have no intention of trying to set it out in print here, because words alone can’t and don’t fully capture it. Only the experience does. Only the experience can, and preferably with the assistance of a really good teacher. But if I could distill the essence of it, it would be this: Does the performance **communicate** with you? The value of music cannot lie solely within the notes and words, but must derive from the performers’ interpretation of them. Sure, it takes technical chops to perform the piece, but what makes it worth listening to in the first place should be the same as what makes it worth committing to tape in a studio in the first place. The performer must surely have something to say, so is he **communicating** that to you as you listen?

I ended up confessing to the salesman that I could not remotely afford a Linn Sondek, and he was cool with that. But I did start saving, and in a little over a year I bought a Rega Planar 3 turntable, and a little over a year after that, replaced it with a Linn Sondek. My journey, which had begun about eight years earlier, only now started to make real forward progress. It was shortly after taking possession of the Sondek that Gordon J. King gave me the Denon Receiver. And it was after I gave it back to him that I wrangled a Naim Preamp and a 250 on long-term loan. I finally had that “Linn/Naim” system. Eventually, the Linn and Naim were both replaced, but now each upgrade came with a concomitant and lasting improvement in the pleasure to be had from the system.

Back then, the Hi-Fi world was different to what it is now. There were a very small number of manufacturers offering equipment with truly high-end performance, and a large majority whose products fell seriously, seriously short. It was a market in which the “Linn/Naim” message could - and did - resonate. Today, the picture is very different. You have to go a long way to find a truly bad product, and the choice of seriously, seriously good equipment can be almost bewildering. You know, as I write this, it occurs to me that maybe life was indeed much simpler when all you needed to know was “Linn/Naim”, “Linn/Naim”, and “Linn/Naim”. Nostalgia ain’t what it used to be.

Saturday 23 November 2013

A Sense Of Scale

When I was a kid, growing up in a rough area of Glasgow, we were all taught music at school - even at elementary school.  I have a memory going back to about age eight, sitting in a classroom that was right next to the school gym.  I recall it containing gym equipment.  And I recall the teacher writing two very strange words on the blackboard - “Beethoven” and “Mozart”.  Frankly, I don’t remember much else about it.  I do know that we were taught the so-called “Tonic Solfa” - Do, Re, Mi, Fa, So, La, Ti, Do - which is in musical parlance the major scale.  On a piano keyboard this is easily played as C, D, E, F, G, A, B, C.  I think it is sad that this sort of thing is no longer taught in most schools as part of the core syllabus.

I think we also all know that those notes I mentioned form only the white keys on the piano keyboard, and that there are also black keys that sit between them, set back slightly from the front of the keyboard.  Every pair of white notes has a black note between them, save for E/F and B/C.  This gives the piano keyboard its characteristic pattern of black keys, which alternate up and down the keyboard in groups of two and three.  It is this breakup of the symmetry that allows us to immediately identify which note is which.  For instance, the C is the white key immediately to the left of the group of two black keys.  The other thing most of us know is that every black note has two names - the black note between C and D can be called either C-sharp (written C#) or D-flat (written D♭).  And if you didn’t know that before, well you do now!

Any performing musician will tell you that it is critically important to get your instruments in tune before you start playing.  And if you are in a band, it is important that all instruments are in tune with each other.  Some instruments (most notably stringed instruments) have a propensity to go out of tune easily and need frequent tune-ups, some even during the course of a performance.  Even the very slightest detuning will affect how the performance sounds.  Let’s take a close look at what this tuning is all about, and in the process we will learn some very interesting things.

Something else that I think you all understand is that the pitch of a note is determined by its frequency.  The higher the frequency, the higher the note.  And as we play the scale from C to the next C above it (I could denote those notes as C0 and C1 respectively), we find that the frequency of C1 is precisely double the frequency of C0.  In fact, each time we double the frequency of any note, what we get is the same note an octave higher.  This means, mathematically, that the individual notes appear to be linearly spaced on a logarithmic scale.  If we arbitrarily assign a frequency to a specific note by way of a standard (the musical world now defines the note A as 440Hz, but I am going to use 400Hz here because it keeps the arithmetic simple), we can therefore attempt to define the musical scale by defining each of the adjacent 12 notes on the scale (7 white notes and 5 black notes) as having frequencies which are separated by a ratio given by the 12th root of 2.  If you don’t understand that, or can’t follow it, don’t worry - it is not mission-critical here.  What I have described is called the “Equal-Tempered Scale”.  With this tuning, any piece can be played in any key and will sound absolutely the same, apart from the shift in pitch.  Sounds sensible, no?
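For anyone who would rather see the 12th-root-of-2 business as numbers, here is a minimal Python sketch that builds the twelve notes from A up to the next A.  It uses the 400Hz reference for A from this post.  The function name is my own invention, and I have written every black note as a sharp purely for convenience.

```python
# Each semitone step multiplies the frequency by the 12th root of 2,
# so twelve steps multiply it by exactly 2 (one octave).
SEMITONE = 2 ** (1 / 12)

# The twelve notes from A up to G#, black notes spelled as sharps.
NOTE_NAMES = ["A", "A#", "B", "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#"]


def equal_tempered_scale(reference_hz=400.0):
    """Return a dict of note name -> frequency, starting from A."""
    return {
        name: reference_hz * SEMITONE ** step
        for step, name in enumerate(NOTE_NAMES)
    }


if __name__ == "__main__":
    for name, freq in equal_tempered_scale().items():
        print(f"{name:<2} {freq:7.2f} Hz")
```

Calling `equal_tempered_scale(440.0)` instead would give the same pattern at a different reference pitch, since only the ratios between the notes matter.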

As I mentioned earlier, if you double the frequency of a note you get the same note an octave higher.  If you triple it, you get the note which is the musical interval of “one fifth” above that.  In other words, if doubling the frequency of A0 gives us A1, then tripling it gives us E1.  By the same logic, we can halve the frequency of E1 and get E0.  So, multiplying a frequency by one-and-a-half times, we get the note which is a musical fifth above it.  Qualitatively, the interval of one-fifth plays very harmoniously on the ear, so it makes great sense to use this simple frequency relationship to provide an absolute definition for these notes.  So now we can have A0=400Hz and E0=600Hz.

The fourth harmonic of 400Hz is another A at 1600Hz, so let's look at the fifth harmonic.  This gives us the musical interval of “one third” above the fourth harmonic.  This turns out to be the note C#2.  So we can halve that frequency to get C#1, and halve it again to get C#0.  The notes A, C#, and E together make the triad chord of A-Major, which is very harmonious on the ear, so we could use this relationship to additionally define C#0=500Hz.

We have established that we go up in pitch by an interval of one-fifth each time we multiply the frequency by one-and-a-half times.  Bear with me now - this is what makes it interesting.  Starting with A0 we can keep doing this, dividing the answer by two where necessary to bring the resultant tone down into the range of pitches between A0 and A1.  If we keep on doing this, it turns out we can map out every last note between A0 and A1.  The first fifth gives us the note E.  The next one B.  Then F#.  Then C#.  Let’s pause here and do the math.  This calculation ends up defining C# as 506.25Hz.  However, we previously worked out, by calculating the fifth harmonic, that C# should be 500Hz!  Why is there a discrepancy?  In fact, the discrepancy only gets worse.  Once we extend this analysis all the way until we re-define A, instead of getting 400Hz again we end up with 405.46Hz.  And what about the “Equal-Tempered Scale” I mentioned earlier - where does that fit in?  That calculation defines a frequency for C# of 503.97Hz.
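If you want to check my arithmetic, here is a minimal Python sketch of the three calculations.  It stacks fifths from A=400Hz, halving whenever the result leaves the octave, and compares the C# it produces with the fifth-harmonic value and the equal-tempered one.  The function and variable names are my own, and for the later notes in the chain I have used the familiar names (F, C, G, D, A) rather than the strictly correct sharps and double-sharps, which is a simplification.

```python
A0 = 400.0  # the reference A used in this post


def fold_into_octave(freq, base=A0):
    """Halve freq until it lies in the octave from base up to 2 * base."""
    while freq >= 2 * base:
        freq /= 2
    return freq


def stack_fifths(base=A0):
    """Go up twelve fifths (x1.5 each), folding back into one octave."""
    # Names of the notes reached by successive fifths above A,
    # using simplified (enharmonic) spellings for the last five.
    names = ["E", "B", "F#", "C#", "G#", "D#", "A#", "F", "C", "G", "D", "A"]
    result = {}
    freq = base
    for name in names:
        freq = fold_into_octave(freq * 1.5, base)
        result[name] = freq
    return result


if __name__ == "__main__":
    stacked = stack_fifths()
    harmonic_c_sharp = fold_into_octave(5 * A0)  # fifth harmonic, two octaves down
    equal_c_sharp = A0 * 2 ** (4 / 12)           # four semitones above A

    print(f"C# from stacked fifths:  {stacked['C#']:.2f} Hz")
    print(f"C# from fifth harmonic:  {harmonic_c_sharp:.2f} Hz")
    print(f"C# equal-tempered:       {equal_c_sharp:.2f} Hz")
    print(f"A after twelve fifths:   {stacked['A']:.2f} Hz")
```

The three values for C# disagree with each other, and the A at the end of the chain does not land back on 400Hz, which is the discrepancy described above.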

The problem lies in the definition of the interval of one-fifth.  On one hand we have a qualitative definition that we get by observing that a note will play very harmoniously with another note that has a frequency exactly one-and-a-half times higher.  On the other, we have a more elaborate structural definition that says we can divide an octave into twelve equally-spaced tones, assign the tones the names A through G, plus some black notes (sharps/flats), and define one-fifth as the interval between any seven adjacent tones.  I have just shown that the two are mathematically incompatible.  Our structural approach gives us a structure where we can play any tune, in any key, and defines an “Equal-Tempered” scale, but our harmonic-based approach is based on specific intervals that “sound” better.  How do we solve this conundrum?

This was a question faced by the early masters of keyboard-based instruments, where each individual note can be precisely tuned at will to a degree of precision that was not previously attainable by other instruments.  All this took place in the early part of the 18th Century, back in the time of our old friend Johann Sebastian Bach.  It turns out they were very attuned to this issue (no pun intended).  The problem was, if you tuned a keyboard to the “Equal-Tempered” tuning, then pieces of real music played on it did not sound at all satisfactory.  So if the “Equal-Tempered” tunings sounded wrong, what basis could you use to establish something better?  There isn’t a simple answer for that.  Every alternative will, by pure definition, have the property that a piece played in one key will sound slightly different played in another key.  What you want is that the different keys have the property of each having a sound which we accept may be different in character, but such that none of them sound “bad” in the way that the “Equal-Tempered” tuning does.

This problem shares many aspects with the debate between advocates of tube vs solid-state amplifiers, of horn-loaded vs conventionally dispersive loudspeakers, even of digital vs analog.  If the solution is to be found in a consensus opinion of a qualitative nature, there is always going to be a divergence of opinion at some point.  In Bach’s time, there was a consensus which emerged in favour of what is termed “Well-Tempered” tuning.  I won’t go into the specifics regarding how that particular tuning is derived, but in short this is now the basis of all modern Western music.  Bach wrote a well-known collection of keyboard pieces titled “The Well-Tempered Klavier” whose function is to illustrate the different tonal character of the different musical keys which arise from this tuning.

One thing which emerges as a result of all this is that the tonal palette of a composition is determined, to a certain degree, by the key in which it is written.  This is what is behind the habit of classical composers to name and identify their major works by the key in which they are written.  You may have wondered why Beethoven’s ninth symphony was written in D-Minor, or, given that it had to have been written in some key, why the key always gets a mention.  If so, well now you know.

Here is a web site that explores the “character” of each of the different keys.  Of course, since this is a purely qualitative assessment, YMMV.  Enjoy!…

Wednesday 20 November 2013

Mahler’s Mysterious Symphony No 7

Mahler’s 7th Symphony stands unique among the composer’s symphonic cycle for many, many reasons.  Most of all, there remains huge uncertainty over what it is actually about.  Does it have an over-arching message, or programme?  For the conductor, it presents huge difficulties in determining what it is, musically, that you want your interpretation to say.  The magnitude of this uncertainty is not to be underestimated.  Indeed, there has been at least one major international conference of musicologists devoted exclusively to analysis and interpretation of this one piece.

What did Mahler think about it?  The composer was known to be very particular about his compositions, and was an acknowledged master of complex musical form.  Each of his symphonies has a clearly discernible span, making a journey from one place to another, or examining a set of complex emotions or feelings with great clarity.  Analysts have long pondered over the Symphony’s 5-movement structure and tried to tie in the meanings of the outer movements in relation to the inner three.  You would have thought Mahler himself would have recognized such weaknesses, and yet he expressed himself more satisfied with the 7th than with any of his other symphonies.  He obviously saw something different in it.

Mahler undertook work on the 7th immediately after finishing his 6th Symphony, a relentlessly somber and anguished composition.  Yet none of these tragic elements make their way into the 7th Symphony.  It is clearly its own piece, born of its own musical ideas.  He began by composing what would become the 2nd and 4th movements, both called “Nachtmusik” - hence giving the Symphony its commonly used sobriquet “Song of the Night”.  Between those two is the Scherzo, another sinister-sounding movement of evidently nocturnal character.  What ties these three central movements together?  The answer to this must surely be the key to unlocking the mystery of the whole symphony.  Let’s look at them more closely.

The second movement is a beautifully crafted evocation of a quiet night in the forest.  We hear small animals scurrying about, calling to each other.  Hints left and right of things that might be happening if only we could see.  There is a kind of musical darkness that is evocative without being revealing, if I might put it that way.  It is almost a pastoral setting in total darkness.  Yet this darkness is one without any sense of menace.  Like Beethoven’s 6th Symphony’s stroll through the countryside on a fine summer’s day, this movement is a stroll through the forest in the middle of the night.  Humans have a natural trepidation when faced with darkness and night, and this movement seems to want to illustrate that it needn’t be so.  It is uplifting music.

But then along comes the Scherzo.  Now our community of nightlife is scurrying about with an obvious sense of nervousness, with an unspoken threat of something dangerous lurking unseen and probably very close by.  The Scherzo is unsettled from beginning to end.  Even as calm tries to break out from time to time, it is a nervous calm, and never seems to entirely free itself from the dangers hiding in the background.  But these dangers seem to be content, for the time being, to lurk, and never manage to leap forward and give their fears a name.

The fourth movement is the second “Nachtmusik” movement, and is a different beast entirely.  Here the protagonist is taking a leisurely, moonlit, late-evening stroll.  The restrained urgency of the forest has gone, along with its menagerie of small furry animals.  The feral menace of the Scherzo has evaporated, and instead the charms of the night are assembled to serenade us.  We are left with an overwhelming impression of contentment.

These three movements are the core of the Symphony, and were written first of all, with the 1st and 5th movements not being added by Mahler until the following year.  I think Mahler had said most of what he wanted to say in these three movements, but realized that they did not stand up on their own as a Symphony without the weighty bookends of suitable opening and closing movements.  I think this is what was in his mind when he knocked out the 1st and 5th movements in little over a month in the summer of 1905.

The first movement is one big introduction.  It is seen by many analysts as representing daybreak, and indeed it can be readily interpreted - on its own - in that light.  But it doesn’t really make a lot of sense to celebrate daybreak before three movements which set about celebrating night.  It is my contention that the first movement celebrates not nightfall as such, but - and here there is no word for what I want to say, so I am going to have to make one up - “nightbreak”.  We live in a daylight world, and in our world day breaks and night falls.  But in a nocturnal world the opposite happens, night breaks and day falls.  So the 1st movement of Mahler’s 7th represents “nightbreak” as a dawning process.  That it takes its time doing so is necessary mainly for the purposes of creating an opening movement of suitable weight and depth.

The opposite happens in the finale.  Here the night is gradually giving way to day.  The dark tonal colours give way to conventionally brighter ones, and the music works its way to a celebratory conclusion.  We have been blessed with another wonderful night, which is now drawing to its conclusion with the dawn, and God willing, once the day passes the music’s nocturnal protagonist can hopefully look forward to the next night.

Because of the interpretive difficulties I have mentioned, there are many different and viable performances of this difficult work available on record.  I must admit, this has always been a very tough symphony for me, as nobody has yet come up with an interpretation that - to my ears at least - makes real sense of this challenging symphony.  I would have said that, with my hand on my heart, I haven’t yet heard a single recording I can recommend.

But that has now changed.

I recently posted about the Michael Tilson Thomas recording with the San Francisco Symphony, which is being made available in DSD by Blue Coast Records at a stunning price (until the end of November).  It is a stunning recording too.  This is finally the definitive Mahler 7th for me.  Those of you who already know the piece can be forgiven for wondering what the hell I am talking about in my out-of-left-field analysis.  But I think Tilson Thomas just about nails it for me in this recording.  In particular, the middle three movements are spectacularly spot on - quite the best I have yet heard.  Only the final movement is arguably weak.  The first movement is a great exposition of my “nightbreak” Introduction theory - and has the amusing bonus that the famous Star Trek theme which makes its appearance half way through is voiced to sound just like it does on the TV show!  I would expect nothing less from a bunch of San Franciscans!  The core central movements are breathtakingly magnificent.  A truly captivating performance.  Well done, Tilson Thomas, and quite unexpected given his unconvincingly austere rendition of the 1st (albeit a superbly recorded, unconvincingly austere rendition), also available from Blue Coast.

I provided a link to it in the previous post I mentioned, so here instead is a link to a YouTube video of Tilson Thomas and the San Francisco Symphony performing the 7th at a Prom concert in London, England a couple of years back.  Nowhere near as polished as the recording, from an interpretative standpoint, but still an hour and a half of compelling viewing.

Monday 18 November 2013

What is a Symphony?

Most of you who do not make a habit of listening to classical music will have heard of a Symphony, and know that it is some sort of portentous orchestral piece listened to by highbrow types wearing appreciative frowns. But I suspect that a much smaller proportion have some clear idea of what a Symphony actually is, and why it is at all important. If you are interested to learn a little more, this post is for you. But be forewarned - I am not a trained musicologist, so if you like what you read here, don’t treat it as gospel, but rather as inspiration to read further, from more authoritative sources.

The term “Symphony” actually has its roots in words of ancient Greek origin originally used to describe certain musical instruments. They have been applied to pipes, stringed instruments, a primitive hurdy-gurdy, and even drums. By the middle ages, similar words were being used for musical compositions of various forms. It is not until the eighteenth century that composers - most prominently Haydn and Mozart - began using the term Symphony to describe a particular form of orchestral composition that we may find familiar today.

Beginning in the Renaissance, the wealthiest European monarchs and princely classes began to assemble troupes of resident musicians in their courts. Although churches had for centuries maintained elaborate choirs, and travelling troubadours have been mentioned in the historical record since time immemorial, it was really only in this period that the concept of what we would now identify as an orchestra began to take shape. Since orchestras didn’t heretofore exist, it follows that composers of orchestral music also didn’t exist either, and the two had to develop and evolve hand in hand. Court composers composed, as a rule, at their masters’ pleasure. They wrote what they were told to write, rather than what they were inspired to write. The purpose of the “orchestra” was mainly to provide music to dance to, although special pieces were sometimes commissioned from the court composer for ceremonial occasions.

As music and musicianship grew, so the scope of compositions began to grow in order to highlight the advancing skills of the performers. Musical forms began to develop which would showcase these talents, and compositional styles emerged which would enable these performers to express their talents in the form of extended playing pieces where they would elaborate both their own playing skills, and the composer’s evolving compositional ideas. Specialist composers began to emerge, culminating in Johann Sebastian Bach, who would go on to codify many of the compositional and structural building blocks which continue to underpin all western music today. It might surprise many readers to learn that today’s pop & rock music adheres very firmly to the principles first set forth by Bach, far more so than do its modern classical counterparts.

By the late 18th century, specialist composers had fully emerged, brimming - indeed exploding - with musical ideas. Many of those ideas involved utilizing the seemingly unlimited expressive potential of the musical ensemble we call an orchestra, but there were few accepted musical forms which composers could use to realize these ambitions. What emerged was the Symphony. Musical forms did exist for shorter, simpler pieces. What the new classical symphonists did was to establish ways of stitching together groups of smaller pieces to make an interesting new whole, which they called a Symphony.

Haydn and Mozart established that a Symphony could be constructed by taking a simple, but highly structured established form such as a Sonata (think Lennon & McCartney) and combining it first with a slower piece and then with a faster piece by way of contrast, and concluding with an up-tempo musical form (such as a Rondo) which has a propensity to drive towards a satisfying and natural conclusion. Eventually, composers would learn to link the four “movements” together by thematic, harmonic, or tonal elements. In any case, the idea was that the four movements would together express musical ideas that exceeded the sum of their parts.

In the next century, particularly thanks to Beethoven, the Symphony grew to become the ultimate expression of compositional ideas. When a composer designates a work a Symphony, it implies both the deployment of the highest levels of musical sophistication, and great seriousness of purpose. Indeed many composers were (and are) reluctant to apply the term to compositions which in their minds failed to meet their personal expectations of what the form demands.

So what, then, does the form demand? As time has gone on, the answer to that has grown increasingly abstract. In my view, what it demands more than anything else is structure, which sounds terribly pompous, so I need to describe what I mean by that. Structure is the framework upon which the music expresses its message. I think the easiest possible way to explain that is to listen to the first movement of Beethoven’s 5th symphony (with Carlos Kleiber conducting the Vienna Philharmonic Orchestra, if you can get hold of it). Everybody knows the famous 4-note motif which opens the piece - DA-DA-DA-DAAAAA!, and then repeats one tone lower. The entire first movement is all about Beethoven explaining to us what he means by that 4-note motif. The piece sets about exploring and developing it in different ways. We hear it in different keys, at different pitches, played by different instruments and by the orchestra in unison, at different tempi, as the main theme and as part of the orchestra’s chattering accompaniment. It starts off famously as an interrogatory statement - three notes and then down a third with a portentous dwell on the fourth note. By the end of the movement the motif has modulated into a triumphant phrase - three notes and then up a fourth, with the fourth note punched out like an exclamation point. The opening of the movement has asked a (musical) question, then gone on to explore the matter in some detail, and finished with a definitive answer. This is what I mean by structure. By the time the movement is over, I feel I know all I need to know about the 4-note motif, or at least all that Beethoven has to say about it.

A symphony can be a mammoth piece - some are over an hour long. Four movements is traditional, but five or six are common. What is needed to make a symphony work is that its musical message must be properly conveyed across its whole. It needs to feel incomplete if any parts are missing. It needs to feel wrong if the movements are played in the wrong order. And above all it needs to give up its mysteries reluctantly; it doesn’t want to be a cheap date - it wants your commitment too. A symphony is all about that structure, how its musical ideas are developed both within the individual movements, and also across the entirety of the work. These musical ideas may not be overt - indeed they can be totally hidden in such a way that experts have never managed to fully uncover them in over a hundred years. It may even be that the composer himself only knows those things in his subconscious. Some symphonies are programmatic - which is to say that the composer himself has acknowledged that it sets about telling a particular story - a fine example is the 7th Symphony of Shostakovich which represents the siege of Leningrad in WWII. Some symphonies express acknowledged thoughts, emotions, and musical recollections evoking a particular subject - such as Mendelssohn’s Italian (No 4) and Scottish (No 3) symphonies and Corigliano’s 1st symphony (prompted by the AIDS epidemic). Many entire symphonic oeuvres were prompted by profoundly religious (e.g. Bruckner) or existential (e.g. Mahler) emotions.

You can’t talk about the Symphony without talking about the dreaded “curse of the ninth”. Beethoven wrote nine symphonies then died. Shortly afterwards, Schubert died with his 9 symphonies (one unfinished) in the bag. Then came Dvorak, Bruckner, and Mahler. There are others, including the English composer Ralph Vaughan Williams. Arnold Schoenberg wrote “It seems that the Ninth is a limit. He who wants to go beyond it must pass away … Those who have written a Ninth stood too close to the hereafter.” Some composers went to great lengths to avoid writing a ninth symphony without getting the tenth safely in the bag immediately afterwards. These include Gustav Mahler, who instead titled what would have been his ninth symphony “Das Lied Von Der Erde”. With that safely published he wrote his formal 9th symphony … and then expired with his 10th barely begun. Amusing though it might be, the “curse of the ninth” is of course a fallacy, but one which remains acknowledged by many contemporary composers as a superstition in whose eye they really don’t want to poke a stick.

Some great composers wrote little of note outside of their symphonic output. Others never once in long and productive careers turned their hand to the format - Wagner and Verdi spring to mind. There are a few who were strangely reluctant to approach the form - Stravinsky composed four of them, but pointedly refused to assign numbers to them. In any case, the most important aspect of a Symphony is that - with very few exceptions - they reflect the composer’s most sincere, and personally committed works. They are therefore often listed amongst their composer’s most significant, most important works. And they are also among the most performed and recorded.

Here is a list of Symphonies that might go easy on the ear of a new listener interested in exploring the oeuvre, with some recommended recordings:

Mozart: Symphony No 40 (Mackerras, Prague Chamber, Telarc)
Beethoven: Symphony No 5 (Kleiber, Vienna Philharmonic, DG)
Brahms: Symphony No 4 (Kleiber, Vienna Philharmonic, DG)
Dvorak: Symphony No 8 (Kertesz, LSO, Decca)
Tchaikovsky: Symphony No 6 (Haitink, Royal Concertgebouw, Philips)

And a few that might challenge the already initiated:

Nielsen: Symphony No 5 (Davis, LSO, LSO Live!)
Mahler: Symphony No 7 (Tilson Thomas, SF Symphony, Blue Coast)
Vaughan Williams: Symphony No 5 (Boult, London Philharmonic, EMI)
Corigliano: Symphony No 1 (Barenboim, Chicago Symphony, Erato)
Shostakovich: Symphony No 7 (Haitink, London Philharmonic, Decca)

Sunday 17 November 2013

A phenomenal offer from Cookie Marenco of Blue Coast Records!!

California-based Blue Coast Records is a pioneering producer of downloadable DSD recordings. Cookie insists that all her recordings are 100% analog to DSD encodings, with no intermediate PCM conversions in any form. This is quite important, because it means that all mixing, panning, fading, etc has to be done entirely in the analog domain since the DSD format does not enable this to be done in the digital domain.

Blue Coast's DSD offerings are mostly recorded in her own studio, using a methodology she refers to as "ESE" (Extended Sound Environment). These are some of the finest recordings you will ever own. Cookie also sells a very limited selection of recordings from other studios whose work meets her exacting requirements. A long-standing personal history with Sony means that she is now able to offer a selection of Mahler symphonies recorded by The San Francisco Symphony, conducted by Michael Tilson Thomas. At the moment, symphonies 1, 2, 4 and 7 are offered, and it is to be hoped that this will be expanded in due course to the whole cycle.

Typically, these specialist recordings are very, very expensive. We're looking at $50 - $75 here. But for the month of November, Cookie is making Mahler's 7th Symphony available for ONLY $12. That's right, a stunning, no-compromise DSD download of a Mahler Symphony that typically comes on a double CD, for just 12 bucks. Your choice of original DSD or PCM (24/96 or 16/44). Please hurry to buy this before they rush her off in a straitjacket for some recuperation time in a local "health spa".

At BitPerfect we love our Mahler. We worship it. Thank you Cookie!

Thursday 14 November 2013


Either way you look at them, the high-end loudspeakers produced by Wilson Audio have a certain unmistakable ‘house style’ aesthetic. They have a well-known ‘house sound’ too, and it may float your boat or it may not, but in any case it appears to this observer that Chez Wilson, form follows function. And now, to boot, form can follow function in any colour you like! As to price - well, if you have to ask, you can't afford it!

I have spent time with Wilson’s Sophia III and with their Sasha W/P models. But I want to talk about their higher-end models, the Alexia and Alexandria XLF. These have the midrange drivers and tweeters in a separate box which is mounted above the bass bin inside a frame which allows them to be tilted through a quite surprising range of settings, the idea being, as I understand it, to allow for very precise time alignment depending on where the listener is located. As a rule, the bigger the speaker, the greater the physical separation between the drive units, and, therefore, the greater is the potential benefit to be had by getting the temporal alignment just so. At least, that’s the theory.

Tim spent some time observing Peter McGrath setting up a pair of Alexias. This involves positioning them in the room in the usual way, and then aligning the upper bins. The way the design works, as you might expect, this is very easy to do. The surprising thing was, however, the effect of getting the time alignment right. Wilsons are well known for, among other things, their holographic imaging properties. What Tim heard was how incredibly the image just seems to snap into place when you get the alignment right. It took Peter McGrath just 10 minutes to do the whole job, but there again he knows what he is doing! Interestingly enough, the image snapped into place not just for the lucky person in the sweet spot, but for quite a range of other listening positions too. Tim says they are comfortably the best speakers he has ever heard - and this from a guy who owns Stax SR-009's.

Recently, I spent some time refining the set-up of my own speakers. My B&W 802 Diamonds are not quite in the Wilson league for imaging, but they are still pretty good. However my listening room’s dimensions are unkind, and every now and then, having pondered long and hard over what problem I should be trying to solve, I try my hand at some room treatment work. It’s a never-ending process. In this case, I built a massive absorbing panel, about 6’ x 4’, and located it on the ceiling above the speakers, towards the back of the room. When you do stuff like this, it throws your previously optimized speaker set-up out of whack, and you have to start all over again.

I ended up moving my speakers a little more than 4 inches closer together, but that is typical of the sort of positioning accuracy you need to be bearing in mind. I had got the tonal balance where I wanted it, and the imaging was sort of correct. Instruments and performers were all where they should have been, but the ‘holographic’ element was missing - you could locate the position of instruments reasonably well, but somehow you could not just shut your eyes and visualize the performer. Trying to get this right, there are a couple of recordings I like to go to. These are inevitably recordings I played through the Wilson Sophia III’s and for which, as a result, I had a good idea of what I ought to have been hearing imaging-wise. And I wasn’t hearing it.

I remembered what Tim said about the Alexias, and how Peter solved that problem by the simple expedient of tilting the mid/tweeter unit forward in its frame. My 802’s don’t have that adjustment. But then I thought why not just try tilting the whole kit & caboodle forward? I did. Nothing happened. So I tilted them a bit more. Still nada. By that time I had run out of adjustment range on the 802’s very beefy threaded spikes. So I found some wood to prop up the rear spikes and tilted them as far forward as I dared (802's are deceptively heavy). Well, that did the trick. All of a sudden the soundstage deepened and widened, and individual instruments began to occupy a more definable space. In particular, vocalists now appear tightly located, centre stage, just behind the plane of the speakers, and just in front of the drum kit. Kunzel's 1812 cannons are amazingly precisely located. Job done!

The rear spikes now sit in cups on a pair of Black Dahlia mounts, and everything is pretty solid. With the tilt, I found I needed to position them a couple of inches further back, but that’s fine - nobody can get behind them now (have you noticed how people always seem to be irresistibly drawn to the rears of large loudspeakers?) and accidentally topple them forwards. See the photograph below for an indication of the degree of tilt.

I’m not sure quite why this tilting has the effect it has. The design of the 802’s is such that the vertical and horizontal dispersion are probably very similar, outside of the crossover region at any rate. Perhaps I am reducing the energy reflected off the ceiling, but that is speculation, and well outside my sphere of competence. In any case tilting is surely a tool we can all add to our room-tuning arsenal. It will certainly be a big part of mine for some time to come. At least until I can afford Alexias … 

Wednesday 13 November 2013

What, exactly, is DSD? - III. It’s a bird? It’s a ‘plane?

We learned over the last couple of days how DSD works as a format, and what its basic parameters are. It is a 1-bit system, sampled at 2.82MHz, relying heavily on oversampling and Noise Shaping. We didn’t say much about the actual mechanism of Noise Shaping because, frankly, it relies on some pretty dense mathematics. So we didn’t say too much about what the resultant data stream actually represents.

We learned that each individual bit is somehow like an Opinion Poll, where instead of asking the bit to tell us what the signal value is, we ask it whether it thinks it should be a one or a zero. The bit is like an individual respondent - it doesn’t really know, but it has a black & white opinion, which might be right or wrong. But by asking the question of enough bits, we can average out the responses and come up with a consensus value. So each individual bit does not represent the actual value of the signal, but on the other hand an average of all the bits in the vicinity gets pretty close! So, at any point in time, in order to represent the signal, some of the bits are ones and some are zeros, and, to a first approximation, it does not matter too much how those ones and zeros are distributed. But here is a quick peek into Noise Shaping. Noise Shaping works by taking advantage of the choices in distributing the ones and zeros. It is precisely those choices that give rise to the Noise Shaping.

An interesting way of looking at it is that the signal itself represents the probability that the value of the bit will be a one or a zero. If the probability is higher, a higher proportion of the bits will be ones, and if it is lower the proportion will be correspondingly lower. As the waveform oscillates between high and low, so the relative preponderance of ones over zeros in the bitstream oscillates between high and low. The value of any one individual bit - whether it is a one or a zero - says very, very little about the underlying signal. That is quite a remarkable property. An individual bit could be subject to some sort of reading error and come out totally wrong, and provided there are a small enough number of such errors, it is arguable that you would never actually know that the error happened!
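To make the "probability" picture concrete, here is a minimal Python sketch. It is not how a real DSD encoder works (there is no Noise Shaping here at all); it simply assumes each bit is drawn at random with a probability of being a one that follows the signal level, which is my own simplification for illustration. The function names, the 4096-bit run length and the fixed random seed are all arbitrary choices of mine.

```python
import random

def bits_from_probability(signal, rng):
    """Emit one bit per sample; P(bit == 1) = (s + 1) / 2 for s in [-1, 1]."""
    return [1 if rng.random() < (s + 1) / 2 else 0 for s in signal]

def average_level(bits):
    """Map the fraction of ones back to a signal level in [-1, 1]."""
    return 2 * sum(bits) / len(bits) - 1

rng = random.Random(0)      # fixed seed so the run is repeatable
level = 0.5                 # a constant signal at half of full scale
bits = bits_from_probability([level] * 4096, rng)

estimate = average_level(bits)

# Corrupt a single bit, as a reading error might, and average again.
damaged = bits[:]
damaged[100] ^= 1
damaged_estimate = average_level(damaged)

print(f"fraction of ones:      {sum(bits) / len(bits):.3f}")
print(f"recovered level:       {estimate:.3f}")
print(f"after one flipped bit: {damaged_estimate:.3f}")
```

Roughly three quarters of the bits come out as ones, the average lands close to the original 0.5, and flipping one bit out of 4096 moves that average by only 2/4096 of full scale.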

Compare this with PCM. In a PCM signal, we can argue that every single bit means something. It says something highly specific about the value of the signal at some specific point in time. Some bits say more important things than others. For example, the Most Significant Bit (MSB) tells us whether the signal is positive or negative. If there is a reading error and that comes out wrong, the impact on the resultant signal can be massive. Because every bit in a PCM system has a specific meaning, and every bit in a DSD system has a nebulous meaning, it should be no surprise that there is no mathematical one-to-one correspondence between PCM data and DSD data. Sure, you can convert PCM to DSD, and vice versa, but there is no mathematical identity that links the two - unlike a signal and its Fourier Transform, each of which is a direct representation of the other in a different form. Any transformation from one to the other is therefore subject to a lossy algorithm. Of course, an appropriate choice of algorithm can minimize the loss, but the twain are fundamentally incompatible.
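The point about some PCM bits mattering more than others can be shown with a few lines of Python. This is only a sketch, assuming 16-bit two's-complement samples (the usual CD representation); the helper name and the sample value of 1000 are made up for the example.

```python
def flip_bit_16(sample, bit):
    """Flip one bit of a signed 16-bit two's-complement sample."""
    raw = sample & 0xFFFF                 # view the sample as unsigned 16-bit
    raw ^= 1 << bit                       # flip the chosen bit
    return raw - 0x10000 if raw & 0x8000 else raw   # convert back to signed

sample = 1000                             # a quiet positive sample
msb_error = flip_bit_16(sample, 15)       # reading error in the sign bit
lsb_error = flip_bit_16(sample, 0)        # reading error in the lowest bit

print(msb_error, lsb_error)               # prints: -31768 1001
```

A single error in the MSB turns a quiet positive sample into a near full-scale negative one, whereas the same error in the LSB changes the value by one step.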

However, let us look at some similarities. Let’s look at the function of the DAC. For a PCM DAC, its job is to recreate the values of a voltage encoded by the data at each sample point. Those voltages go up and down according to the data in the PCM data stream. We just need to pass that waveform through a low-pass filter and the result is music. Now let’s compare that with DSD. For a DSD DAC, its job is to recreate the values of a voltage encoded by the data at each sample point. Those voltages go up and down according to the data in the DSD data stream. We just need to pass that waveform through a low-pass filter and the result is music. Hang on one minute … wasn’t that just the same thing? Yes it was. For 16/44.1 (CD) audio, the PCM DAC is tasked with creating an output voltage with 16-bit precision, 44,100 times a second. On the other hand, for DSD the DSD DAC is tasked with creating an output voltage with 1-bit precision, 2,822,400 times a second. In each case the final result is obtained by passing the output waveform through a low-pass filter. 

That is an interesting observation. Although the data encoded by PCM and DSD are fundamentally different - we just got through with describing how they mean fundamentally different things - now we hear that the process for converting both to analog is exactly the same? Yes. Strange but true. From a functionality perspective, as far as a DAC is concerned, DSD and PCM are the same thing! 

By the way, I have mentioned how we can add Noise Shaped dither to a PCM signal and in doing so encode data below the resolution limit of the LSB. Our notional view of PCM is that the data stream explicitly encodes the value of the waveform at a sequence of instants in time, and yet, if we have encoded sub-dynamic data, that data cannot be encoded in that manner. Instead, by Noise Shaping, it is somehow captured in the way the multi-bit data stream evolves over time. Rather like DSD, you might say! There is definitely a grey area when it comes to calling one thing PCM and another thing DSD.

We started off this series of posts by mentioning the different ‘flavours’ of DSD that are cropping up out there. Now that I have set the table, I can finally return to that.

DSD in its 1-bit 2.82MHz form is the only form that can be described correctly (and pedantically) as DSD. We saw how it represents the lowest sample rate at which a 1-bit system could be Noise Shaped to deliver a combination of dynamic range and frequency response which at least equalled that delivered by CD. What it in fact delivers is a significant improvement in dynamic range, and more of a loosening in the restrictions on high-frequency response imposed by CD than a major extension of it. In any case, that is enough for most listeners to come out in favour of its significant superiority. However, a significant body of opinion holds that by increasing the sample rate yet further, we can achieve a valuable extension of the high-frequency response. (In principle, we could also increase the dynamic range, but DSD is already capable of exceeding the dynamic range of real-world music signals). People are already experimenting with doubling, quadrupling, and even octupling 1-bit sample rates. Terminology for these variants is settling on DSD128, DSD256, and DSD512 respectively (with actual DSD being referred to as DSD64). Why do this? Partially because we can. But - early days yet - reports are emerging of listeners who are declaring them to be significantly superior. 
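For reference, the numbers in those names are multiples of the CD sample rate of 44.1kHz, so the actual rates are easy to work out. A small Python sketch (the function name is mine):

```python
CD_RATE_HZ = 44_100

def dsd_rate_hz(multiple):
    """Sample rate of a 1-bit stream running at `multiple` times the CD rate."""
    return multiple * CD_RATE_HZ

for multiple in (64, 128, 256, 512):
    print(f"DSD{multiple}: {dsd_rate_hz(multiple):,} Hz")
```

This gives 2,822,400 Hz for DSD64, and 5,644,800, 11,289,600 and 22,579,200 Hz for the doubled, quadrupled and octupled variants.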

There are additionally formats - mostly proprietary ones which only exist ephemerally within DAC chips or pro-audio workstations - which replace the 1-bit quantization with multi-bit quantization. These have occasionally been referred to as “DSD-Wide”. I won’t go into that in much detail, but there are some interesting reasons you might want to use multi-bit quantizers. Some established authorities in digital audio - most notably Stanley Lipschitz of the University of Waterloo - have come out against DSD largely because of its 1-bit quantizers. Lipschitz’ most significant objection is a valid one. In order to create a DSD (in its broadest sense) bitstream, a Sigma Delta Modulator is used. For these modulators to achieve the required level of audio performance, they must incorporate high-order modulators to perform the Noise Shaping. These high order modulators turn out to be unstable if you use a 1-bit quantizer, but can be made stable by adopting a multi-bit quantizer. In practical terms, though, many of Lipschitz’ objections have been addressed in real-world systems, so I won’t pursue that topic further.

But ever since SACD (which uses the DSD system) first came out, DSD DACs have recognized that the DAC’s performance can be significantly improved by using one of the “extended-DSD” formats. So, internally, the majority of such chipsets convert the incoming DSD to their choice of “extended-DSD” format, and do the actual DAC work there. The conversion involves first passing the DSD bitstream through a low-pass filter, with the result being a PCM data stream using an ultra-high resolution floating-point data format sampled at 2.82MHz. This is then instantly oversampled to the required sample rate and converted to the “extended-DSD” format using a digital SDM. Unfortunately, the low-pass filter needs to share some of the undesirable characteristics of the brick-wall filters that characterize all PCM formats because of all the high-frequency content that has been shaped into the ultrasonic region. So it is likely that the proponents of DSD128, DSD256, and so forth, are onto something if those formats can be converted directly in the DAC without any “extended-DSD” reformatting. 

I hope you found these posts which take a peek under the hood of DSD to be informative and interesting. Although the mathematics of PCM can be challenging at times, those of DSD are that and more, in spades. It is likely that progress in this field will continue to be made. In the meantime, condensing it into a form suitable for digestion by the layman remains a challenge of its own :)

Tuesday 12 November 2013

What, exactly, is DSD? - II. Getting in Shape

Last week, we learned that by adopting a PCM format, we also constrain ourselves with the need to employ radical low-pass filtering in both the ADC and DAC stages in order to eliminate the fundamental problem of aliasing. Yesterday we learned that we can use oversampling and noise shaping to overcome some of the limitations imposed by Bit Depth in PCM systems. Taking both together, we learned that by increasing both the Bit Depth and the Sample Rate we can make inroads into the audible effects of both of these limitations.

In practice, there is no point in extending the Bit Depth beyond 24 bits. This represents a dynamic range of 144dB, and no recording system we know of can present analog waveforms with that level of dynamic range to the input of an ADC. On the other hand, even by extending the Sampling Rate out to 384kHz (the largest at which I have ever seen any commercially available music made available), the brick-wall filter requirements are still within the territory where we would anticipate its effects to be audible. A 24/384 file is approximately 13 times the size of its 16/44.1 equivalent. That gets to be an awfully big file. In order for the filter requirements to be ameliorated to the point where we are no longer concerned with their sonic impact the sample rate needs to be out in the MHz range. But a 24-bit 2.82MHz file would be a whopping 100 times the size of its 16/44.1 counterpart. Clearly this takes us places we don’t want to go.
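The arithmetic behind those figures is simple enough to check. The Python sketch below assumes uncompressed audio with the same number of channels in each case, so that file size scales as Bit Depth times Sample Rate, and uses the usual 20·log10(2^N) rule of thumb for the dynamic range of an N-bit system. The function names are mine.

```python
import math

def size_ratio(bits, sample_rate_hz, ref_bits=16, ref_rate_hz=44_100):
    """File size relative to 16/44.1, for the same duration and channel count."""
    return (bits * sample_rate_hz) / (ref_bits * ref_rate_hz)

def dynamic_range_db(bits):
    """Theoretical dynamic range of an N-bit system: 20 * log10(2 ** N)."""
    return 20 * math.log10(2 ** bits)

print(f"24/384 vs CD:         {size_ratio(24, 384_000):.1f}x")
print(f"24-bit 2.82MHz vs CD: {size_ratio(24, 2_822_400):.1f}x")
print(f"16-bit dynamic range: {dynamic_range_db(16):.0f} dB")
print(f"24-bit dynamic range: {dynamic_range_db(24):.0f} dB")
```

The "100 times" in the text is a rounding of the exact factor, which is 96 (24/16 multiplied by 64).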

But wait! Didn’t we just learn that by oversampling and Noise Shaping we can access dynamic range below the limitation imposed by the Bit Depth? Increasing the sample rate by a factor of 64 to 2.82MHz would mean that our audio frequencies (20Hz - 20kHz) are all going to be massively oversampled. Perhaps we can reduce the Bit Depth? Well, with oversampling alone, all we can do is shave a paltry 3 bits or so off our Bit Depth (six doublings of the sample rate at 3dB each comes to only 18dB). But do not get discouraged: with Noise Shaping it turns out we can reduce it all the way down to 1-bit. A 1-bit 2.82MHz file is only 4 times larger than its 16/44.1 equivalent, which is actually quite manageable. But really? Can we get more than 100dB of dynamic range from a 1-bit system just by sampling at 2.82MHz?

Yes, we can, but I am not going anywhere near the mathematics that spit out those numbers. That is the preserve of experts only. But here’s what we do. When we encode data with a 1-bit number, the quantization error is absolutely massive, and can be anywhere between +100% and -100% of the signal itself. Without any form of noise shaping, this quantization noise would in practice sit at a level of around -20dB (due to the effect of oversampling alone) but would extend all the way out to a frequency of 1.41MHz. But because of the massive amount of oversampling, we can attempt to use Noise Shaping to depress the quantization noise in the region of 0-20kHz, at the expense of increasing it at frequencies above, say, 100kHz. In other words, we would “shape” it out of the audio band and up into the frequency range where we are confident no musical information lives, and plan on filtering it out later. We didn’t choose that sampling rate of 2.82MHz by accident. It turns out that is the lowest sample rate at which we can get the noise down well below -100dB over the entire audio frequency bandwidth.

To convert this signal back to analog, it turns out this format is much easier to implement than multi-bit PCM. Because we only encode 1-bit, we only have to create an output voltage of either Maximum or Minimum. We are not concerned with generating seriously accurate intermediate voltages. To generate this output, all we have to do is switch back and forth between Maximum and Minimum according to the bit stream. This switching can be done very accurately indeed. Then, having generated this binary waveform, all we have to do is pass it through a low pass filter. Job done.
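To make the idea concrete, here is a toy sketch in Python. It is emphatically not how a real DSD encoder works (those use much higher-order modulators, whose design I promised to stay away from). It is a first-order sigma-delta modulator, the simplest member of the family, followed by a crude moving-average low-pass filter standing in for the analog one. All of the names are my own inventions for illustration.

```python
import math

DSD_RATE = 64 * 44_100  # 2,822,400 samples per second


def sigma_delta_1bit(samples):
    """First-order sigma-delta modulator. Input in [-1, +1], output +1.0 or -1.0."""
    integrator = 0.0
    feedback = 0.0  # previous output bit, subtracted from the next input
    bits = []
    for x in samples:
        integrator += x - feedback
        feedback = 1.0 if integrator >= 0.0 else -1.0
        bits.append(feedback)
    return bits


def moving_average(values, width):
    """Crude low-pass filter: the mean of every run of `width` consecutive values."""
    running = sum(values[:width])
    out = [running / width]
    for i in range(width, len(values)):
        running += values[i] - values[i - width]
        out.append(running / width)
    return out


# Roughly one cycle of a 1kHz sine wave at half of full scale.
signal = [0.5 * math.sin(2 * math.pi * 1_000 * n / DSD_RATE) for n in range(2_822)]
bits = sigma_delta_1bit(signal)

# "Switch between Maximum and Minimum, then low-pass filter."
recovered = moving_average(bits, 64)
reference = moving_average(signal, 64)
worst = max(abs(r - s) for r, s in zip(recovered, reference))
print(f"worst-case error after filtering: {worst:.3f}")
```

Even this simplest possible modulator tracks the sine wave to within a few percent of full scale after filtering, despite every individual sample being either flat-out Maximum or flat-out Minimum.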

This is a pretty interesting result. We have managed to eliminate the need for those nasty brick-wall filters at both the ADC and DAC, and at the same time capture a signal with exceptional dynamic range across the audio bandwidth. This, my friends, is DSD.

As with a lot of things, when you peek under the hood, things always get a little more complicated, and I will address some of those complications tomorrow.

Monday 11 November 2013

What, exactly, is DSD? - I. Opinion Polls

Being strictly accurate, DSD (Direct Stream Digital) is a term coined by Sony and Philips, and refers to a very specific audio protocol. It is a 1-bit Sigma-Delta Modulated data stream encoded at a sample rate of 2.8224MHz. However, the term has now been arbitrarily widened by the audio community at large, to the point where we find it employed to apply generically to an ever-widening family of Sigma-Delta Modulated audio data streams. We read the terms Double-DSD, Quadruple DSD, and “DSD-Wide” applied to various SDM-based audio formats, so that DSD has become a catch-all term somewhat like PCM. There are many flavours of it, and some are claimed to be better than others.

So, time to take a closer look at DSD in its broadest sense, and hopefully wrap some order around the confusion.

Strangely enough, the best place to start is via a detour into the topic of dither which I discussed a couple of weeks back. You will recall how I showed that a 16-bit audio signal with a maximum dynamic range of 96dB can, when appropriately dithered, be shown using Fourier Analysis to have a noise floor that can be as low as -120dB. I dismissed that as a digital party trick, which in that context it is. But this time it is apropos to elaborate on that.

The question is, can I actually encode a waveform that has an amplitude below -96dB using 16-bit data? Yes I can, but only if I take advantage of a process called “oversampling”. Oversampling works a bit like an opinion poll. If I ask your opinion on whether Joe or Fred will win the next election, your response may be right or may be wrong, but it has limited value as a predictor of outcome. However, if I ask 10,000 people, their collective opinions may prove to be a more reliable measure. What I have done in asking 10,000 people is to “oversample” the problem. The more people I poll, the more accurate the outcome should be. Additionally, instead of just predicting that Joe will win (sorry, Fred), I start to be able to predict exactly how many points he will win by, even though my pollster never even asked that question in the first place!

In digital audio, you will recall that I showed how an audio signal needs to be sampled at a frequency which is at least twice the highest frequency in the audio signal. I can, of course, sample it at any frequency higher than that. Sampling at a higher frequency than is strictly necessary is called “oversampling”. There is a corollary to this. All frequencies in the audio signal that are lower than the highest frequency are therefore inherently being oversampled. The lowest frequencies are being oversampled the most, and highest frequencies the least. Oversampling gives me “information space” I can use to encode a “sub-dynamic” (my term) signal. Here’s how…

At this point I wrote and then deleted three very dense and dry paragraphs which described, and illustrated with examples, the mathematics of how oversampling works. But I had to simplify it too much to make it readable, in which form it was too easy to misinterpret, so they had to go. Instead, I will somewhat bluntly present the end result: The higher the oversampling rate, the deeper we can go below the theoretical PCM limit. More precisely, each time we double the sample rate, we can encode an additional 3dB of dynamic range. But there’s no free lunch to be had. Simple information theory says we can’t encode something below the level of the Least Significant Bit (LSB), and yet that’s what we appear to have done. The extra “information” must be encoded elsewhere in the data, and it is. In this case it is encoded as high levels of harmonic distortion. The harmonic distortion is the mathematical price we pay for encoding our “sub-dynamic” signal. This is a specific example of a more general mathematical consequence, which says that if we use the magic of oversampling to encode signals below the level of the LSB, other signals - think of them as aliases if you like - are going to appear at higher frequencies, and there is nothing we can do about that.
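The "3dB per doubling" rule is easy to play with. Here is a small Python sketch of it, assuming the usual textbook idealization that the quantization error behaves like white noise spread evenly up to the Nyquist Frequency (the function names are mine):

```python
import math

DB_PER_BIT = 20.0 * math.log10(2.0)  # about 6.02dB of dynamic range per bit


def oversampling_gain_db(oversampling_ratio):
    """Extra in-band dynamic range from oversampling alone (no noise shaping)."""
    return 10.0 * math.log10(oversampling_ratio)


for ratio in (2, 4, 16, 64):
    gain = oversampling_gain_db(ratio)
    print(f"{ratio:3d}x oversampling: {gain:5.2f}dB, about {gain / DB_PER_BIT:.1f} bits")
```

Note how slowly the benefit accumulates: each extra bit of resolution costs a fourfold increase in sample rate.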

Let’s go back again to dither, and consider a technique - Noise Shaping - that I mentioned in a previous post. Noise shaping relates to the fact that when we quantize a signal in a digital representation, the resultant quantization error looks like a noise signal added to the waveform. What is the spectrum of this noise signal? It turns out that we have a significant level of control over what it can look like. At lower frequencies we can squeeze that noise down to levels way below the value of the LSB, lower even than can be achieved by oversampling alone, but at the expense of huge amounts of additional noise popping up at higher frequencies. That high-frequency noise is the "aliases" of the sub-dynamic "low frequency" information that our Noise Shaping has encoded - even if that low frequency information is silence(!). This is what we mean by Noise Shaping - we “shape” the quantization noise so that it is lower at low frequencies and higher at high frequencies. For CD audio, those high frequencies must all be within the audio frequency range, and as a consequence, you have to be very careful in deciding where and when (and even whether) to use it, and what “shape” you want to employ. Remember - no free lunch.
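Here is about the simplest noise shaper I can think of, sketched in Python. It is first-order error feedback: each sample's quantization error is subtracted from the next sample before that one is quantized. Real noise shapers use more elaborate filters to get a more useful "shape", so treat this purely as an illustration; the names are hypothetical, and one LSB is taken to be 1.0.

```python
def quantize_plain(samples):
    """Round each sample to the nearest whole LSB, independently."""
    return [float(round(x)) for x in samples]


def quantize_noise_shaped(samples):
    """Round to whole LSBs, feeding each sample's error into the next one."""
    out = []
    error = 0.0
    for x in samples:
        target = x - error          # compensate for the previous error
        q = float(round(target))
        error = q - target          # error carried forward to the next sample
        out.append(q)
    return out


# A steady signal sitting at 0.3 of an LSB: well below what plain rounding can encode.
signal = [0.3] * 1000

plain_mean = sum(quantize_plain(signal)) / len(signal)
shaped_mean = sum(quantize_noise_shaped(signal)) / len(signal)
print(f"plain rounding average:  {plain_mean:.4f}")
print(f"noise-shaped average:    {shaped_mean:.4f}")
```

Plain rounding loses the signal entirely, whereas the noise-shaped output flips rapidly between 0 and 1 in a pattern whose average recovers the 0.3. That rapid flipping is the high-frequency noise we paid for it. No free lunch.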

But if we increase the sample rate we also increase the high frequency space above the limit of audibility. Perhaps we can use it as a place to park all that “shaped” high-frequency noise? Tomorrow, we’ll find out.

Wednesday 6 November 2013

iTunes 11.1.3

I have been using iTunes 11.1.3 all day without encountering any problems.  It should be fine for BitPerfect users to download and install it.

iTunes 11.1.3 released

Today we are testing the latest iTunes update 11.1.3 for compatibility with BitPerfect.  I will post my findings later in the day.

Tuesday 5 November 2013

To upsample or not to upsample; that is the question.

Back in late March, I posted some introductory comments here regarding how DACs actually function.  Anyway, following my recent posts on Sample Rate I thought it might be apropos to revisit that subject.

Today’s DACs, with a few very rare (and expensive) exceptions, all use a process called Sigma Delta Modulation (SDM, sometimes also written DSM) to generate their output signal.  A nice way to look at SDM DACs is to visualize them as upsampling their output to a massively high frequency - sometimes 64, 128 or 256 times 44.1kHz, but often higher than that - and taking advantage of the ability to use a very benign analog filter at the output.  That is a gross over-simplification, but for the purposes of the point I am trying to make today, it is good enough.

Doing such high-order up-conversion utilizes a great deal of processing power, and providing that processing power adds cost.  Additionally, the manufacturers of the most commonly used DAC chipsets are very coy about their internal architectures, and don’t disclose the most significant details behind their approaches.  I would go so far as to say that some DAC manufacturers actually misunderstand how the DAC chipsets which they buy actually work, and publish misleading information (I have to assume this is not done intentionally) about how their product functions.  Much of this centres around cavalier usage of the terms ‘upsampling’ and ‘oversampling’.  Finally, some DAC manufacturers use DAC chipsets with prodigious on-chip DSP capability (such as the mighty ESS Sabre 9018), and then fail to make full use of it in their implementations.

Let’s study a hypothetical example.  We’ll take a 44.1kHz audio stream that our DAC chip needs to upsample by a factor of 64 to 2.82MHz, before passing it through its SDM.  The best way to do this would be using a no-holds-barred high-performance sample rate converter.  However, there are some quite simple alternatives, the simplest of which would be to just repeat each of the original 44.1kHz samples 64 times until the next sample comes along.  What this does is to encode the “stairstep” representation of digital audio we often have in mind, in fine detail.  This is acceptable, because, in truth, the 44.1kHz audio stream does not contain one jot of additional information.  Personally, I would refer to this as oversampling rather than upsampling, but you cannot rely on DAC manufacturers doing likewise.
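In code, the sample-repetition approach is almost embarrassingly simple.  A minimal Python sketch (the function name is mine, and no real DAC chipset is being described here):

```python
def oversample_by_repetition(samples, factor):
    """Hold each input sample for `factor` output samples (the "stairstep")."""
    return [s for s in samples for _ in range(factor)]


output_rate_hz = 64 * 44_100          # 2,822,400 Hz
stairstep = oversample_by_repetition([0.0, 0.5, 0.25], 4)
print(output_rate_hz, stairstep)
```

Nothing is computed and nothing is interpolated, which is exactly why no new information is created.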

If we are going to use this approach, though, it leads us down a certain path.  It results in the accurate recreation of the stairstep waveform at the output of the DAC.  Even though we have oversampled by a factor of 64 in our SDM process, the output of our DAC has been a faithful reproduction of a 44.1kHz sampled waveform.  This waveform therefore needs to go through an analog brick-wall filter to strip out the aliases which are embedded within the stairstep.  This is exactly as we discussed in my last post on Sample Rates.

In principle, therefore, by upsampling (using proper Sample Rate Conversion) our 44.1kHz audio by a factor of 2 or 4 prior to sending it to the DAC, we can avail ourselves of the possibility that the DAC can instead implement a less aggressive, and better-sounding, brick-wall filter at its output.  That would be nice.  But that is not the way many (and maybe even most) DACs that use this approach are built.  Instead, they use the same analog brick-wall filter at high sample rates as they do at 44.1kHz (because switching analog filters in and out makes for complicated - read expensive - circuitry).  If your DAC does this you would not expect to hear anything at all in the way of sonic improvement by asking BitPerfect (or whatever other audio software you use) to upsample for you.

So let’s go back a couple of paragraphs, and instead of our DAC oversampling the incoming 44.1kHz waveform, suppose it actually upsamples it using a high quality SRC algorithm.  Bear in mind that all of the audio content up to 20kHz in a 44.1kHz audio stream is aliased within the frequency band from 24.1kHz to 44.1kHz.  If we are to upsample this, we should really strip the aliases out using a digital brick-wall filter.  Done this way, the result is a clean signal that we can pass into the SDM, and which is precisely regenerated, without the stairstep, at the DAC’s output.  So we no longer need that aggressive, sonically worrisome, analog brick-wall filter.

Let’s take another look at these last two scenarios.  One had an aggressive analog brick-wall filter at the output, but the other had essentially the same brick-wall filter implemented digitally at an intermediate processing stage.  If the two sound at all different, it can only be because the two filters sounded different.  Is this possible?  In fact, yes it is, and there are two reasons for that.  The first, as I mentioned in a previous post, is that an analog filter has sonic characteristics which derive from both its design, and from the sonic characteristics of the components with which it is constructed.  The digital equivalent - IF (a big IF) properly implemented - only has sonic consequences arising from its design.  There is a further point, which is that digital filters can be designed to have certain characteristics which their analog counterparts cannot, but that fact serves only as a distraction here.  The bottom line here is that, if properly designed, a diligent DAC designer ought to be able to achieve better sound with this ‘upsampling’ approach than with the previously discussed ‘oversampling’ approach (again, I must emphasize this is MY usage of those terminologies, and is not necessarily everybody else’s).

Using the ‘upsampling’ approach I have just described, it should once again make little difference whether you send your music to the DAC at its native sample rate, or if you upsample it first using BitPerfect (or whatever).  However, all this assumes that the upsampling algorithm used by the DAC is at least as good as the one used by BitPerfect.  There is no guarantee that this will be so, in which case you may find that you get improved results by using BitPerfect to upsample for you to the maximum supported by your DAC.  And you should use one of the SoX upsampling algorithms provided by BitPerfect, rather than CoreAudio.

The bottom line here is that you should expect your DAC to sound better (or at least as good) with your music sent to it at its native sample rate than with it upsampled by BitPerfect.  And if it doesn’t, the difference is probably down to BitPerfect’s upsampling algorithm sounding better than the one implemented in your DAC’s DSP firmware.

So, in summary, in light of all the above, our recommendation here at BitPerfect is that you do NOT use BitPerfect to upsample for you, unless you have conducted some extensive listening tests and determined that upsampling sounds better in your system.  These tests should include serious auditioning of BitPerfect’s three SoX algorithms.

Monday 4 November 2013

Masterchef Junior

I confess to having a weakness for good food and good wine, as well as good sound.  I enjoy cooking, and am not too bad at it, although I offer no pretensions to being any sort of chef.  So it is not surprising that I also get a kick out of watching the TV show Masterchef.

If you don’t know the premise of the show, it goes like this.  Two dozen of the best (amateur) home cooks in America are set cooking challenges by three top celebrity chefs, headed up by the fearsome Gordon Ramsay.  Each week one of them gets eliminated.  At the end of it all, the surviving chef wins the big prize.  The thing is, the challenges these amateur chefs get set are quite mind-blowingly difficult, and in addition they have to compete under very serious time pressures.  Watching the show, I always find myself thinking that the best 24 professional chefs in the country - and certainly ANY of the contestants in Ramsay’s companion show “Hell’s Kitchen” - would find the competition no less challenging.

So, to my astonishment, the producers at Masterchef came up with the notion of Masterchef Junior, where the same format would instead be opened to the 24 best chefs in America, but this time in the age range 10-13 years old.  The challenges faced by these junior chefs would be no less formidable than those faced by the adults.

Here’s the thing, though.  If you had pitched that idea to me before I had seen the show, I would have laughed and said that Masterchef Junior would be of marginal interest, and then only to Soccer Moms.  The reality turned out to be rather different.

Instead we were treated to the sight of 10-year-olds cooking stunningly (and I mean stunningly - things I couldn’t begin to imagine taking on) complex foods, with no preparation, under the very same pressure-cooker time constraints, and held accountable to the same unyielding standards, as their adult counterparts.  It blows my mind.  Imagine dining in the most expensive restaurant you know of, having a great meal, and being introduced to the chef only to find that he or she is still at elementary school.  And, cynical as I am regarding the so-called “unscripted” nature of TV reality shows, I find it hard to believe that all of this is not very real.

I happen to believe that the current generation of children growing up in North America is doing so with the greatest sense of entitlement of any generation that has ever lived, coupled with the least intention of developing the skills necessary to make good on those expectations.

That said, I now know that there are at least 24 kids out there who, in whatever direction their lives and careers will eventually take them, have truly enormous - dare I say unlimited - potential.


Sunday 3 November 2013

Sample Rate Matters - II.

In yesterday’s post we found ourselves wondering whether a high-rez recording needs to expand its high frequency limit beyond 20kHz, and whether squeezing a brick-wall filter into the gap between 20kHz and 22.05kHz is really that good of an idea.  Today we will look at what we might be able to do about those things.

First, let’s ignore the extension of the audio bandwidth above 20kHz and look at the simple expedient of doubling the sample rate from 44.1kHz to 88.2kHz.  Our Nyquist Frequency will now go up from 22.05kHz to 44.1kHz.  Two things are going to happen, which are quite interesting.  To understand these we must look back at the two brick-wall filters we introduced yesterday, one protecting the A-to-D converter (ADC) from receiving input signals above the Nyquist Frequency, and the other protecting the output of the D-to-A converter (DAC) from generating aliased components of the audio signal at frequencies above the Nyquist Frequency.  They were, to all intents and purposes, identical filters.  In reality, not so, and at double the sample rate it becomes evident that they have slightly different jobs to do.

We start by looking at the filter protecting the input to the ADC.  That filter still has to provide no attenuation at all at 20kHz and below, but now the 96dB attenuation it must provide need only happen at 44.1kHz and above.  That requirement used to be 22.05kHz and above.  The distance between the highest signal frequency and the Nyquist Frequency (the roll-over band) is now over 10 times wider than it was before!  That is a big improvement.  But let’s not get carried away by that - it is still a significant filter, one having a roll-off rate of nearly 100dB per octave.  By comparison, a simple RC filter has a roll-off rate of only 6dB per octave.

Now we’ll look at the filter that removes the aliasing components from the output of the DAC.  Those components are aliases of the signal frequencies that are all below 20kHz.  As described in Part I, those aliases will be generated within the band of frequencies that lies between 68.2kHz and 88.2kHz.  If there is no signal above 20kHz, then there will be no aliasing components below 68.2kHz.  Therefore the requirements for the DAC’s anti-aliasing filter are a tad easier still.  We still need our brick wall filter to be flat below 20kHz, but now it can afford to roll over more slowly, and only needs to reach 96dB at 68.2kHz.

Doubling the sample rate yet again gives us more of the same.  The sample rate is now 176.4kHz and its Nyquist Frequency is 88.2kHz.  The DAC filter does not need to roll off until 156.4kHz!  These filters are significantly more benign.  In fact, you can argue that since the aliasing components will all be above 156.4kHz they will be completely inaudible anyway - and might not in fact even be reproducible by your loudspeakers!  Some DAC designs therefore do away entirely with the anti-aliasing filters when the sample rate is high enough.
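The numbers in the last few paragraphs all come from the same two subtractions, so here they are as a small Python sketch.  The function and key names are mine, and the 20kHz audio limit is the assumption used throughout this post.

```python
AUDIO_LIMIT_HZ = 20_000


def filter_requirements(sample_rate_hz, audio_limit_hz=AUDIO_LIMIT_HZ):
    """Where the ADC and DAC filters must reach full attenuation."""
    nyquist_hz = sample_rate_hz / 2
    lowest_alias_hz = sample_rate_hz - audio_limit_hz
    return {
        "nyquist_hz": nyquist_hz,                           # ADC filter must be done by here
        "adc_transition_hz": nyquist_hz - audio_limit_hz,   # room the ADC filter has to roll off
        "lowest_alias_hz": lowest_alias_hz,                 # DAC filter must be done by here
        "dac_transition_hz": lowest_alias_hz - audio_limit_hz,
    }


for rate in (44_100, 88_200, 176_400):
    req = filter_requirements(rate)
    print(
        f"{rate / 1000:6.1f}kHz: ADC by {req['nyquist_hz'] / 1000:6.2f}kHz, "
        f"DAC by {req['lowest_alias_hz'] / 1000:6.1f}kHz"
    )

widening = (
    filter_requirements(88_200)["adc_transition_hz"]
    / filter_requirements(44_100)["adc_transition_hz"]
)
print(f"ADC transition band is {widening:.1f} times wider at 88.2kHz")
```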

You can keep on increasing the sample rate, and make further similar gains.

Obviously, the higher sample rates also give us the option of encoding audio signals with a correspondingly higher bandwidth.  Equally obviously, that advantage comes at the expense of some of the filter gains, which vanish completely once the desired audio frequency bandwidth is extended all the way out to the new Nyquist Frequency.  But even so, by extending the high frequency limit of the audio signal out to 30kHz, little is given up in filter performance, particularly with a sample rate of 176.4kHz.

So far I have only mentioned sample rates which are multiples of 44.1kHz, whereas we know that 96kHz and 192kHz are popular choices also.  From the point of view of the above arguments concerning brick-wall filters, 96kHz vs 88.2kHz (for example) makes no difference whatsoever.  However, there are other factors which come into play when you talk about the 48kHz family of sample rates vs the 44.1kHz family.  These are all related to what we call Sample Rate Conversion (SRC).

If you want to double the sample rate, one simple way to look at it is that you can keep all your original data, and just interpolate one additional data point between each existing data point.  However, if you convert from one sample rate to another which is not a convenient multiple of the first, then very very few - in fact, in some cases none - of the sample points in the original data will coincide with the required sample points for the new data.  Therefore more of the data - and in extreme cases all of the data - has to be interpolated.  Now, don’t get me wrong here.  There is nothing fundamentally wrong with interpolating.  But, without wanting to get overly mathematical, high quality interpolation requires a high quality algorithm, astutely implemented.  It is not too hard to make one of lesser quality, or to take a good one and implement it poorly.
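You can work out just how few of the original sample points survive a given conversion.  This Python sketch assumes both streams start at the same instant and that the rates are whole numbers of Hz; the function name is my own.

```python
from fractions import Fraction
from math import gcd


def coincident_fraction(source_rate_hz, target_rate_hz):
    """Fraction of source samples that land exactly on a target sample instant."""
    # Source sample n sits at time n / source_rate. It coincides with a target
    # sample only when n is a multiple of source_rate / gcd(source, target).
    return Fraction(gcd(source_rate_hz, target_rate_hz), source_rate_hz)


for target in (88_200, 176_400, 96_000, 192_000):
    print(f"44.1kHz -> {target / 1000:5.1f}kHz: {coincident_fraction(44_100, target)}")
```

Going to 88.2kHz or 176.4kHz, every original sample can be kept.  Going to 96kHz or 192kHz, only one original sample in every 147 lines up, and everything else has to be interpolated.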

Downconverting - going from a high sample rate to a lower one - is fraught with even more perils.  For example, going from 88.2kHz sample rate to 44.1kHz sounds easy.  We just delete every second data point.  You wouldn’t believe how many people do that, because it is easy.  But by doing so you make a HUGE assumption.  You see, 88.2kHz data has a Nyquist Frequency of 44.1kHz and therefore has the capability to encode signals at any frequency up to 44.1kHz.  However, music with a sample rate of 44.1kHz can only encode signals up to 22.05kHz.  Any signals above this frequency will be irrecoverably aliased down into the audio band.  Therefore, when converting from any sample rate to any lower sample rate, it is necessary to perform brick-wall filtering - this time in the digital domain - to eliminate frequency content above the Nyquist Frequency of the target sample rate.  This makes down-conversion a more challenging task than up-conversion if high quality is paramount.
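If you want to see that HUGE assumption fail, here is a small Python demonstration.  It takes a 30kHz tone (chosen by me purely as an example of content above 22.05kHz) sampled at 88.2kHz, deletes every second sample, and compares the result with a 14.1kHz tone sampled at 44.1kHz.

```python
import math


def cosine_tone(freq_hz, rate_hz, count):
    """`count` samples of a unit-amplitude cosine at `freq_hz`."""
    return [math.cos(2 * math.pi * freq_hz * n / rate_hz) for n in range(count)]


ultrasonic = cosine_tone(30_000, 88_200, 2_000)   # legal at 88.2kHz
decimated = ultrasonic[::2]                        # "just delete every second data point"

alias = cosine_tone(44_100 - 30_000, 44_100, 1_000)  # a 14.1kHz tone at 44.1kHz

max_difference = max(abs(a - b) for a, b in zip(decimated, alias))
print(f"largest difference between the two: {max_difference:.2e}")
```

The two sets of samples are identical to within rounding error.  The inaudible 30kHz tone has turned into a very audible 14.1kHz one, and once that has happened no amount of filtering can tell it apart from real music at that frequency.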

Time to summarize the salient points regarding sample rates.

1.  Higher sample rates are not fundamentally (i.e. mathematically) necessary to encode the best quality sound, but they can ameliorate (or even eliminate) the need for brick-wall filters which can be quite bad for sound quality.

2.  Higher sample rates can encode higher frequencies than lower sample rates.  Emerging studies suggest that human perception may extend to frequencies higher than can be captured by CD’s 44.1kHz sample rate standard.

3.  Chances are that a high sample rate music track was produced by transcoding from an original which may have been at some other sample rate.  There is absolutely no way of knowing what the original was by examining the file, although pointers can be suggestive.

4.  There is no fundamental reason why 96kHz music cannot be as good as 88.2kHz music.  Likewise 192kHz and 176.4kHz.  However, since almost all music is derived from masters using a sample rate which is a multiple of 44.1kHz, if you purchase 24/96 or 24/192 music your hope is that high quality SRC algorithms were used to prepare them.

5.  Try to bear in mind that if your high-res downloads are being offered at 96kHz and 192kHz, your music vendor may well be run by people who pay more attention to their Marketing department than their Engineering department.  That’s not an infallible rule of thumb, but it’s a reasonable one.  (Incidentally, that is what happened to BlackBerry.  It’s why they are close to bankruptcy.)