Saturday, 3 October 2015

BitPerfect v3 Released

BitPerfect v3 includes our new Generation-IV audio engine, plus some substantial improvements to the way we communicate with iTunes.  And we have re-instated support for Sound Check.  Additional features include:
    •    Fixed bugs which affected certain multi-channel DACs.
    •    Audio Output Device selection from the drop-down menu.
    •    Menu bar icon color now indicates the playback sample rate.
    •    Support for OS X ‘Dark Mode’.
    •    Improved automatic detection of Access Permissions.
    •    Visible confirmation of Integer Mode playback.

BitPerfect v3 is a free upgrade for all existing BitPerfect customers.

Friday, 25 September 2015


Our warmest congratulations go out to Channel Classics, one of our favourite record labels, on being recognized with Gramophone's "Label of the Year, 2015" award.  Channel Classics are making some of the best classical recordings on the market, not only in terms of absolutely impeccable audio quality, but also from the perspective of artistic merit.

Well done to Jared Sacks and his team!

Monday, 21 September 2015

Odd iTunes 12.3 Bug

Here’s something for you to consider checking out.

If you have a headless Mac Mini with no monitor attached, and you have downloaded the latest iTunes, try this. Click on the orange ‘minimize’ button at the top left. Your Mac will immediately jump to the UI Login Screen. I have reported this to Apple via the official bug channel, but thus far no response.

Wednesday, 9 September 2015

Sound Check

iTunes has a feature called sound check, which scans all of the tracks in your music library and determines a level of volume adjustment that can be applied to each track so that they play back at a more or less comparable volume.  BitPerfect used to support sound check, but along the way iTunes changed the way in which they implemented it internally and so we dropped support for it.  Over the last few updates to iTunes, however, the sound check feature has remained pretty much stable and unchanged, so we decided maybe it is time to implement it again in BitPerfect.

Nobody is quite sure how sophisticated the algorithm is with which iTunes determines the amount of volume adjustment to be applied by sound check.  But I wouldn’t be betting my house on it being anything beyond something relatively crude.  For sure, the net effect of enabling sound check is to populate a metadata tag which shows up in the “Get Info…” dialog window under the “File” tab.  This number basically tells you how much volume adjustment is applied to the track by iTunes in order to “normalize” its playback volume.

These numbers can be negative, in which case the track will be attenuated for playback, or positive in which case gain needs to be applied.  If the volume number is 0dB it appears that this “Volume” tag stays blank.  Of course attenuation is not a problem, but if we apply gain then we have to be aware of the possibility that the loudest parts of the track may be clipped.  Quite how this is handled in iTunes is not clear.  It is likely that iTunes uses a crude soft-clipping algorithm.  I have previously posted on what I think about that.  BitPerfect does not go in for crude anything.

The biggest problem with clipping, soft or otherwise, is the creation of aliased harmonics.  The best approach is therefore to upsample as far as you can before soft clipping so that the aliases will lie predominantly outside of the audio bandwidth and then down-sample, using the anti-aliasing filter to eliminate the aliases.  All in all a rather hefty processing loop for something that at the end of the day is still going to introduce very significant distortions.  I suppose this is something we might introduce in the future as a specialist plug-in if the demand is there.

What used to happen was that iTunes wrote the volume adjustment information into each file’s metadata.  BitPerfect could then read this value when playing back the file.  However, for some reason, iTunes made a change to that system.  Now, it only writes that information into some files and not others.  Those that get this data written to them include MP3 and AAC formats, while those that don’t include Apple Lossless.  I have no idea who it was in Apple that decided this was a GOOD IDEA.  Since Apple Lossless files represent the vast majority of files listened to by BitPerfect users, it means that we cannot support sound check using this method.  I suppose the argument is that you shouldn’t be using sound check if you’re listening to Apple Lossless music, and maybe particularly not if you are listening via BitPerfect, but that’s a little bit too Orwellian for me.

The other way to get this information is by interrogating the iTunes Library Framework, one of the many inter-process communication protocols that Apple provides, none of which work very well.  We have been down this road before with Apple.  The way it works is that when BitPerfect loads it asks iTunes for an iTunes Library Framework Object.  Sounds simple enough.  However, last time we tried this we found that on some systems when BitPerfect tries to load the Framework Object, iTunes refuses to respond.  We could never figure out what it was about a system that made iTunes behave that way.  At that time, the reason we tried using the Framework Object was to manage the Access Permissions scan to improve the scanning speed.  Consequently, since Access Permissions are a necessity, we were forced to drop that method entirely.

But now we are looking at it again as a possible way to re-introduce support for sound check.  And we’re finding that the old problem is still there.  On one of our three main test platforms we cannot get iTunes to deliver the Framework Object to BitPerfect.  On the others it is not a problem at all.  If the Framework Object is delivered to BitPerfect then it can support sound check, but if it is not then it has no way to access the sound check values.

We are wondering what to do about that in the short term.  We have a new update to BitPerfect that we want to release soon.  Do we release it with sound check support knowing that for some people - and we have no way of knowing how many - it isn’t going to work?  Or do we release it without sound check support?

I’m thinking we’ll release it with the problematic sound check support.  For those who are unaffected by the problem they will have access to the sound check function.  For those who encounter the problem the sound check function won’t work, but it would be no different to if we hadn’t included it in the first place.  Some feedback from concerned users would be welcome :)

Tuesday, 8 September 2015

Password Security

A lot of fuss has been made recently about the hackers who released all of the private information held by the company Ashley Madison, which provides a supposed intermediary service for would-be adulterers.  Private information pertaining to millions upon millions of users has been released into the public domain.

What does that mean, though?  If you are one of Ashley Madison’s customers you may be desperately wondering who knows what your name is, where you live, your credit card information - even, perhaps, intimate details of your sexual preferences - what are the prospects for that information also making its way into the public eye?  I thought I would talk a little bit about encryption, how it works … and how to crack it.

Password-based encryption systems work broadly on the basis of either encryptions or hashes.  The difference between the two is that encryption systems are two way functions, whereas hash systems are one-way functions.  With an encryption-based system for every output value there is only one possible input value that could give rise to it, whereas with a hash system every output value has an unmanageably large number of possible input values that could give rise to it.  A trivial, but quite inadequate, example of a hash system might be a function which returns 1 if the input is odd and 0 if it is even.  Clearly, given an output value of 1 or 0 there is no way to determine what the input value was, beyond narrowing it down to an odd or even number.
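The odd/even example is simple enough to write out.  What follows is only a sketch of that toy function (the name `parity_hash` is my own, purely for illustration), and it is obviously not something you would use to protect anything.

```python
def parity_hash(value: int) -> int:
    """Toy one-way function: returns 1 for an odd input and 0 for an even one."""
    return value % 2


# Many different inputs collapse onto the same output, so the output
# alone cannot tell you which input produced it.
inputs = [3, 17, 1001, 4, 42]
outputs = [parity_hash(v) for v in inputs]
print(outputs)
```

Three quite different inputs all produce a 1, which is exactly why the output cannot be run backwards to recover the input.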

With an encryption system the designer is basically asserting that there is no known way to reverse the encryption.  The risk is that if such a reversal algorithm is ever discovered it instantly unlocks every password ever encrypted using that system.  By contrast, with a hash system the designer is asserting that reversal is fundamentally not possible.  But in this case the risk is that you can unlock the password by stumbling upon one of a colossal number of possible input values (of which the true password is just one).  Both methods have their advantages, and are widely used.  But note that, in most common parlance, the output of a password security algorithm - whether an encryption or a hashing function - is usually referred to as a ‘hash’.

Typically, in a security application, the input value is a password.  What we have in principle is a system where I can publish both the encryption/hash algorithm and the hash value itself, and nobody can use that information to work out what the password was which created that hash value.  Anybody can make a guess at the password and pass it through the hash algorithm, but unless the guess is correct the result will not match the hash value.  The key advantage is that the open-text password itself is never stored anywhere.  Just about every password-based authentication system today is based upon these principles.

So how do hackers set about cracking those passwords?  The reality is that there is only one way of doing it.  What you do is make a guess at the password, pass it through the hash algorithm, and compare the result to the open-text hash value.  If you guessed right, you’ve cracked it.  If not, you try again with a different guess at the password.  Simple, really.  How long do you think it would take before they guessed right?
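The guess-hash-compare loop just described can be sketched in a few lines.  This is a minimal illustration under my own assumptions: plain SHA-256 stands in for whatever algorithm a real site might use, and the function names and the tiny candidate list are hypothetical.

```python
import hashlib
from typing import List, Optional


def hash_password(password: str) -> str:
    """Stand-in hash function (plain SHA-256), used only for this illustration."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def guess_password(target_hash: str, guesses: List[str]) -> Optional[str]:
    """Hash each guess in turn and return the first one whose hash matches."""
    for guess in guesses:
        if hash_password(guess) == target_hash:
            return guess
    return None


stored_hash = hash_password("fred69")  # the value a site would keep on file
candidates = ["password", "123456", "fred", "fred69"]
found = guess_password(stored_hash, candidates)
print(found)
```

The loop only succeeds because the password happens to be on the candidate list.  Take it off the list and the function returns nothing at all.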

What serious ‘professional’ hackers do is to compile lists of ‘known’ passwords.  These are combinations of known names, known places, common numbers (such as dates) and other common candidates.  Also, every time a hacker cracks or otherwise obtains a password, they will add it to a list of known passwords, where it can be used to crack similar passwords the owner may be using elsewhere.  Such lists are maintained in various dark corners of the internet.  The list might, for example, contain the word ‘FRED’.  People accessing these lists will use that to automatically derive variants such as “fred”, “fReD”, “FR3D”, “fred69”, and “DERF”.  Hackers using specially-configured computers will burn through these lists, passing each and every possible password through the hash algorithm until they find a match, at which point all of your personal information protected by that password is immediately at their fingertips.  However, such lists usually contain over 10 million passwords.

While this analysis may or may not give you pause for thought, it is based on a simple inconvenient truth.  How long it takes an attack to crack your password depends on how ‘secure’ your password is.  An ‘unsecure’ password will appear near the top of the list as one of the best guesses available.  It will be cracked pretty quickly.  A more secure password won’t get tried until later in the process and will take a lot longer to crack.  But a highly secure password is one that won’t actually get tried at all for the simple reason that it won’t even be on the list.  It won’t be cracked at all.

The fact is, the vast majority of password choices used by ordinary people today are not secure enough to avoid appearing on one of those nefarious lists.  If your password doesn’t appear on a list, or cannot be derived from a root that appears on a list, then it is never going to be cracked no matter whose computer is doing the cracking.  Bear this in mind next time you enter a password somewhere!  The problem is that people don’t like properly secure passwords because they are almost impossible to remember.  But there’s no simple way around that.  Here is an example of a highly secure 48-character password (the degree of security increases exponentially with the number and variety of characters used):


If this is what you had to type in to access your Ashley Madison account, you can console yourself that it was never going to be cracked.  Mind you, it might take you all day, and several frustrating attempts, just to enter it successfully each time you log in!  Such is the price to be paid for a comfortable level of security.
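To put a rough number on “increases exponentially”, here is a small sketch that builds a random 48-character password and estimates its strength in bits.  The assumptions are mine: the password is drawn uniformly at random from the 94 printable ASCII letters, digits and punctuation marks, and the function names are hypothetical.

```python
import math
import secrets
import string


def entropy_bits(length: int, alphabet_size: int) -> float:
    """Bits of entropy for a password whose characters are chosen uniformly at random."""
    return length * math.log2(alphabet_size)


def random_password(length: int, alphabet: str) -> str:
    """Build a password from cryptographically strong random choices."""
    return "".join(secrets.choice(alphabet) for _ in range(length))


# Letters, digits and punctuation: 94 symbols in total.
alphabet = string.ascii_letters + string.digits + string.punctuation
password = random_password(48, alphabet)

bits_48 = entropy_bits(48, len(alphabet))  # 48 random characters from 94 symbols
bits_8_lower = entropy_bits(8, 26)         # 8 random lower-case letters
print(password, round(bits_48), round(bits_8_lower))
```

Every extra bit doubles the number of guesses needed, so the gap between roughly 38 bits and roughly 315 bits is far bigger than the two numbers make it look.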

One of the most commonly used hash algorithms is SHA-2, designed by the NSA.  Less secure is MD5 which is often used to verify the accuracy of a CD rip (which is not a security application).  More secure would be something seriously potent like bcrypt, scrypt, or PBKDF2.  As a reasonable approximation, what makes an algorithm more ‘secure’ is largely down to the amount of computer time taken to calculate the hash value, as this also increases the time taken by a hacker to churn through a list of passwords.  Additionally, algorithms like scrypt work by requiring the computer to have a very substantial amount of RAM, making the required hardware much more expensive, while at the same time compromising a hacker’s ability to configure an attacking computer to perform multiple operations in parallel.  Despite this, the scrypt algorithm, for example, is deceptively simple, amounting to only a few tens of lines of code.

Many experts in computer and internet security are suspicious of NSA-designed algorithms such as SHA-1 and SHA-2, due to concerns that NSA may potentially only approve encryption systems that they are reasonably confident they can crack.  Such considerations are highly speculative.  However, a more recent development, SHA-3, was approved following a competition among non-NSA designers.

Even though what I have described sounds virtually uncrackable you would be surprised at how clever people are when it comes to mathematically analyzing encryption systems and devising increasingly more efficient ways of attacking them.  One such attack is known as a “rainbow table”.  It uses the formidable data storage capacity of modern computer and network technology to store vast lists of what you might term ‘intermediate results’.  By referring to those rainbow tables, hackers can seriously short-circuit the time taken to crack a password.  To get around that, most encryption and hash algorithms use a technique called “salting”, which adds various random characters to the user’s password before encrypting it.  These random characters, called the “salt”, can be stored in plaintext alongside the hash result without compromising the integrity of the encryption.  Salting has the effect of disrupting a rainbow table and rendering it useless.
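Salting is easy to demonstrate with Python’s standard library.  The sketch below uses PBKDF2 (one of the slower algorithms mentioned earlier) with a 16-byte random salt.  The iteration count, the salt length and the function names are illustrative assumptions of mine, not a recommendation and not a description of any particular web site’s system.

```python
import hashlib
import hmac
import os

ITERATIONS = 50_000  # illustrative work factor only


def hash_with_salt(password: str, salt: bytes) -> bytes:
    """Derive a hash from the password and salt using PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


def create_record(password: str):
    """Pick a fresh random salt and return (salt, hash); both can be stored openly."""
    salt = os.urandom(16)
    return salt, hash_with_salt(password, salt)


def verify(password: str, salt: bytes, stored_hash: bytes) -> bool:
    """Re-derive the hash with the stored salt and compare in constant time."""
    return hmac.compare_digest(hash_with_salt(password, salt), stored_hash)


# The same password stored twice ends up with two different hashes.
salt_a, hash_a = create_record("fred69")
salt_b, hash_b = create_record("fred69")
print(hash_a != hash_b, verify("fred69", salt_a, hash_a))
```

Because every record has its own salt, a table of precomputed results would have to be rebuilt for each record, which defeats the purpose of precomputing it.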

Back to Ashley Madison.  It turns out that their password encryption system uses a 10-round bcrypt, which is an encryption-based algorithm.  A computer security expert recently analyzed the released Ashley Madison password hashes using a computer optimized for cracking.  He had a list of 14.3 million clear-text passwords.  He loaded the first 6 million Ashley Madison password hashes into his program to attack them.  From his first 6 days of intense cracking he estimated that it would take him 19,000 years to crack all 6 million passwords, and 117,000 years to crack the whole database.  On that basis you would rate Ashley Madison’s password encryption system as very good indeed - relatively few web sites implement anything as powerful as a 10-round bcrypt.  As a sobering thought, the researcher estimated that if Ashley Madison had been using plain unsalted MD5, the passwords of their entire user list could be cracked in a mere 3 seconds!  Note that this is basically an individual with a highly-sophisticated but still relatively ordinary commercial computer.  A government, with a room full of super-computers dedicated to the task, would probably make mincemeat of those numbers.

But the problem with the Ashley Madison episode is not that people’s passwords were cracked.  They weren’t.  The hackers actually hacked directly into the company’s database, which gave them direct access to the user data without needing to know the password in the first place.  And in this case, it would appear that at least part of the database itself was not encrypted, something that is called for by compliance standards such as the payment card industry’s PCI and the US government’s HIPAA (for financial and medical data respectively).  There may be a case to make that this was an ill-considered and avoidable oversight given the thought that went into their implementation of a password encryption system.

It is important to understand that the Ashley Madison hack does not mean that the clients’ passwords were compromised.  They were in the sense that the leaked password hash values give hackers everywhere useful information which they could use to attempt to discover what the passwords were.  But as we have seen, the task of churning that data into passwords is a formidable one.  However, for anyone who cares to try, the first passwords to be cracked will be the least secure ones.  The computer security expert I mentioned earlier reported cracking his first 4,096 passwords in only 6 days, and you would imagine that few (if any) of those fell outside of the ‘unsecure’ category.  The trouble is you would imagine that the sort of person who uses a low-security password is also the sort of person that re-uses that same password - either identically or with minor variations - for most of their other secure accounts.  Given that the Ashley Madison database exposes the users’ e-mail addresses in open-text, this gives hackers a significant opportunity to target those individuals and go after their banking and other financial data.  This, ultimately, is probably the biggest threat posed by the Ashley Madison hack.

So, in conclusion, it is probably a good time to ask yourself an important question.  Given what you’ve just read, just how secure do you think your own passwords are?

Wednesday, 26 August 2015

Phase Value

We already know that a digital waveform can be transformed, using a Fourier Transform, into a different representation where each data point represents a certain particular frequency, and the magnitude of the transform at that data point represents the amount of that frequency that is present in the original signal.

This is interesting, because we humans are able to perceive both of these aspects of a sound’s frequency content.  If the frequency itself changes - increases or decreases - we perceive the pitch to go up or down.  And if the magnitude changes - increases or decreases - we perceive the volume to get louder or quieter.  Between them, these two things would appear to totally define how we perceive (or, if you prefer, “hear”) audio signals.  Interestingly enough, a physical analysis of how the human hearing system actually works suggests that it is those separate individual frequencies, rather than the waveform itself in its full complexity, that our ears respond to.

If we take all the frequencies in the Fourier Transform and create a sine wave from each one, whose magnitude is the magnitude of the Fourier Transform, and add them all together, the sum total of all these sine waves will be the exact original waveform.  But there are a couple of wrinkles to bear in mind.  The first is that this is only strictly true if the original waveform used to create the Fourier Transform was of infinite duration, producing a Fourier Transform with an infinite number of frequencies.  For the purposes of this post we can safely ignore that limitation.  The second is that we need to know the relative phase of each frequency component.

I wrote in a previous post how we can decompose a square wave into its constituent frequency components and use those to reconstruct the square wave.  However, if we change the phase of these individual frequency components - which describes how the individual sine waves “line up” against each other - then we end up changing the shape of the original square wave.  Indeed, the change can be rather dramatic.  In other words, changing the phases of a waveform’s component frequencies can significantly alter the waveform’s shape without changing any of its component frequencies or their magnitudes.  To a first approximation, changes in the phase response of an audio system are considered not to be audible.  However, at the bleeding edge where audiophiles live that is not so clear.
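Here is a small sketch of that effect.  It builds a rough square wave from its first ten odd harmonics, then builds a second waveform from exactly the same harmonics at exactly the same magnitudes, but with every component shifted in phase by 90 degrees.  The harmonic count, the number of points and the function name are arbitrary choices of mine for illustration.

```python
import math

HARMONICS = range(1, 20, 2)  # odd harmonics 1, 3, ..., 19
POINTS = 1000                # samples across one period


def build_wave(phase_shift: float):
    """Sum the harmonics with magnitude 1/n, each offset by the same phase shift."""
    wave = []
    for i in range(POINTS):
        x = 2 * math.pi * i / POINTS
        total = sum(math.sin(n * x + phase_shift) / n for n in HARMONICS)
        wave.append(4 / math.pi * total)
    return wave


square_like = build_wave(0.0)       # the familiar square-wave approximation
reshaped = build_wave(math.pi / 2)  # same magnitudes, different phases

peak_square = max(abs(v) for v in square_like)
peak_reshaped = max(abs(v) for v in reshaped)
print(round(peak_square, 2), round(peak_reshaped, 2))
```

The first waveform peaks a little above 1, as a square wave should.  The second, which contains identical frequencies at identical magnitudes, has sharp spikes well over twice as high and looks nothing like a square wave.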

The Fourier Transform I mentioned in fact encodes both the magnitude and the phase information because the transformation actually produces complex numbers (numbers having two components which we term Real and Imaginary).  We can massage these two components to yield both the phase and the magnitude.  This is one example of how the phase and frequency responses of an audio system are tightly intertwined.
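To make the “massaging” concrete, the sketch below computes a naive discrete Fourier Transform of a short made-up signal, splits each complex result into a magnitude and a phase, and then rebuilds the signal by summing one cosine per frequency.  The function names `dft` and `rebuild` and the eight sample values are hypothetical, chosen only for illustration.

```python
import cmath
import math


def dft(samples):
    """Naive discrete Fourier transform: one complex number per frequency bin."""
    n = len(samples)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples))
        for k in range(n)
    ]


def rebuild(magnitudes, phases):
    """Sum one cosine per bin, using that bin's magnitude and phase."""
    n = len(magnitudes)
    return [
        sum(
            m * math.cos(2 * math.pi * k * t / n + p)
            for k, (m, p) in enumerate(zip(magnitudes, phases))
        ) / n
        for t in range(n)
    ]


signal = [0.0, 1.0, 0.5, -0.25, -1.0, 0.3, 0.8, -0.6]
spectrum = dft(signal)

magnitudes = [abs(c) for c in spectrum]       # how much of each frequency
phases = [cmath.phase(c) for c in spectrum]   # how each component lines up

restored = rebuild(magnitudes, phases)
print([round(v, 6) for v in restored])
```

With both the magnitudes and the phases in hand, the rebuilt samples match the originals to within rounding error.  Throw the phases away and the rebuilt waveform no longer matches.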

We are used to demanding that anything which affects an audio system has a frequency response that meets our objectives.  This applies equally in the analog domain (whether we apply it to circuits such as amplifiers or components such as transistors) as in the digital domain (where we can apply it to simple filters or elaborate MP3 encoders).  We are familiar with the common requirement for flat frequency response across the audio bandwidth because we know that we can “hear” these frequencies clearly.  But all of those systems, analog and digital, also have an associated phase response.

Some types of phase response are quite trivial.  For example, if the phase response is linear, which means that the phase is linear with frequency, this means simply that the signal has been delayed by a fixed amount of time.  More generally if we look at the phase response plot (phase vs frequency), the slope of the line at any frequency tells us how much that frequency is delayed by.  Clearly, if the slope is constant (in other words, the plot is a straight line), all frequencies will be delayed by the same amount, and the effect will be a fixed delay applied to the entire signal.  However, if the slope is anything other than constant, it means that different delays apply to different frequencies and the result will be a degree of waveform distortion as discussed regarding the square wave.
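The link between a linear phase response and a plain delay can be checked numerically.  In the sketch below each frequency bin of a short made-up signal is rotated by a phase proportional to its frequency, and the result is transformed back.  Two assumptions to flag: the function names are my own, and because a discrete Fourier Transform treats the signal as periodic, the delay shows up as a circular shift.

```python
import cmath
import math


def dft(samples, inverse=False):
    """Naive discrete Fourier transform, or its inverse when inverse=True."""
    n = len(samples)
    sign = 1 if inverse else -1
    out = [
        sum(x * cmath.exp(sign * 2j * math.pi * k * t / n) for t, x in enumerate(samples))
        for k in range(n)
    ]
    return [v / n for v in out] if inverse else out


def apply_linear_phase(samples, delay):
    """Rotate bin k by a phase of -2*pi*k*delay/n, i.e. a phase linear in frequency."""
    n = len(samples)
    spectrum = dft(samples)
    rotated = [c * cmath.exp(-2j * math.pi * k * delay / n) for k, c in enumerate(spectrum)]
    return [v.real for v in dft(rotated, inverse=True)]


signal = [0.0, 1.0, 0.5, -0.25, -1.0, 0.3, 0.8, -0.6]
delayed = apply_linear_phase(signal, 3)

# The same samples, moved three places later (wrapping around at the end).
expected = signal[-3:] + signal[:-3]
print([round(v, 6) for v in delayed])
```

The shape of the waveform is untouched.  Only its position in time has changed.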

So, we have clear ideas about errors in the magnitude of the frequency response.  We classify these as dips, humps, roll-offs, etc, in the frequency response, and we have expectations as to how we expect these defects to sound, plus a reasonably well-cultivated language with which to describe those sounds.  But we are still trying to develop an equivalent understanding of phase responses.

One development I don’t like is to focus on the impulse response, and to ascribe features of the impulse response to corresponding qualities in the output waveform.  So, for example, pre-ringing in the impulse response is imagined to give rise to “pre-ringing” in the output waveform, which is presumed to be a BAD THING.  This loses sight of a simple truth.  If you mathematically analyze a pure perfect square wave and remove all of its components above a certain frequency, what you get is pre-ringing before each step, and post-ringing after it.  We’re not talking about a filter here, we’re talking about what the waveform inherently looks like if its high frequency components were absent, which they need to be if we are going to encode it digitally.
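This is easy to verify for yourself.  The sketch below evaluates a square wave that has been stripped of everything above its 19th harmonic (an arbitrary cut-off I picked for illustration) at a series of points just before and just after the step.  No filter is involved, only the Fourier series of the square wave itself.

```python
import math


def band_limited_square(x: float, highest_harmonic: int) -> float:
    """Fourier series of a unit square wave, keeping odd harmonics up to the limit."""
    return 4 / math.pi * sum(
        math.sin(n * x) / n for n in range(1, highest_harmonic + 1, 2)
    )


HIGHEST = 19
STEP = math.pi / 200  # spacing of the evaluation points

# The step from -1 to +1 sits at x = 0.
before_step = [band_limited_square(-i * STEP, HIGHEST) for i in range(1, 100)]
after_step = [band_limited_square(i * STEP, HIGHEST) for i in range(1, 100)]

pre_ring_extreme = min(before_step)   # dips below -1 ahead of the step
post_ring_extreme = max(after_step)   # overshoots +1 after the step
print(round(pre_ring_extreme, 3), round(post_ring_extreme, 3))
```

The waveform swings past -1 before the step by exactly as much as it swings past +1 afterwards.  The ringing on both sides is simply what a square wave looks like once its high frequencies are gone.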

You might argue that a perfect phase response would be a zero-phase response, where there is no phase error whatsoever at each and every frequency.  Such characteristics cannot be achieved at all in the analog domain, but in the digital domain there are various ways of accomplishing it.  However, it can be shown mathematically that all zero-phase filters must have a symmetrical impulse response.  In other words, whatever post-ring your filter has, it will have the exact same pre-ring before the impulse.  This, by the way, is another way of describing what happened to the pure perfect square wave.
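One of the simplest ways to get a zero-phase response in the digital domain is to run a filter over the data forwards and then again backwards.  The sketch below does that with a basic one-pole low-pass filter and looks at the resulting impulse response.  This is a generic textbook technique with made-up names and coefficient, shown only to illustrate the symmetry.  It is not a description of BitPerfect’s engine.

```python
def one_pole_lowpass(samples, a=0.5):
    """Simple recursive low-pass filter: y[n] = (1 - a) * x[n] + a * y[n - 1]."""
    out, previous = [], 0.0
    for x in samples:
        previous = (1 - a) * x + a * previous
        out.append(previous)
    return out


def zero_phase(samples, a=0.5):
    """Filter forwards, then filter the reversed result, then reverse it back."""
    forward = one_pole_lowpass(samples, a)
    backward = one_pole_lowpass(forward[::-1], a)
    return backward[::-1]


CENTER = 50
impulse = [0.0] * (2 * CENTER + 1)
impulse[CENTER] = 1.0

one_pass = one_pole_lowpass(impulse)  # ordinary filter: rings only after the impulse
two_pass = zero_phase(impulse)        # zero-phase: rings equally before and after
print(one_pass[CENTER - 1], round(two_pass[CENTER - 1], 4), round(two_pass[CENTER + 1], 4))
```

The single pass produces nothing ahead of the impulse.  The forward-backward version produces a mirror image of its tail on the leading side.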

Another impulse response characteristic that gets a lot of favourable press is the Minimum Phase filter.  This is a misleading title because, although it does mathematically minimize the net phase error, it lacks a theoretical basis upon which to suppose a monotonic relationship exists between the accumulated net phase error and an observed deterioration in the sound quality.  For example, linear phase filters exhibiting no waveform distortion can in principle have significantly different fixed delays, with corresponding significant differences in their net phase error, yet with no difference whatsoever in the fidelity of their output signals.  On the other hand, Minimum Phase filters do concentrate the filter’s “energy” as much as possible into the “early” part of its impulse response, which can mean that it is more mathematically “efficient”, which may make for either a better-designed filter, or a more accurate implementation of the filter’s design (sorry for the “air quotes”, but this is a topic that could take up a whole post of its own).

One thing I must be clear on is that this discussion is purely a technical one.  I discuss the technical properties of phase and impulse responses, but I don’t hold up a hand and claim that one thing is better than the other.  Someone may state an opinion that such-and-such a filter sounds better than so-and-so’s filter because it has a “better” impulse response.  I might agree or disagree with the opinion regarding which filter sounds best, but I will argue against attributing the finding to certain properties of the impulse response without a good model to account for why the properties advocated should be beneficial.  As regards the impulse responses no such “good” model yet exists (that I know of).

Where I do stand from a philosophical standpoint is that I like zero-phase responses and linear phase responses because these contribute no waveform distortion at the output.  For that reason, we are, here at BitPerfect, developing a zero-phase DSP engine that, if successful, we will be able to apply quite broadly.  We will try it out first in our DSD Master DSD-to-PCM conversion engine, where I am convinced that it will provide PCM conversions that are, finally, indistinguishable from the DSD originals.  If listening tests prove us out, we will release it.  From there it will migrate to SRC, where I believe it will deliver an SRC solution superior to the industry-leading Izotope product (which is too expensive for us to use cost-effectively).  Finally, it will appear in our new design for a seriously good graphical equalizer package that is in early-stage development, with possible application to room-correction technology.

Thursday, 13 August 2015

Audio Files for Audiophiles

A few years back I purchased a Windows App called dBpoweramp.  It met my needs for a while.  Upon installation, I learned that the App supports a huge number of different music file formats.  Today, that list reads:  AIFF, ALAC, CDA, FLAC, MP3, WAV, AC3, AAC, SND, DFF, DSF, MID, APE, MPP, OGG, OPUS, WVC, W64, WMA, OFR, RA, SHN, SPX, TTA, plus a number of variants.  Who knew there were so many audio formats?  I for one have never heard of most of these.  Counting through them, I have only ever used eight of ’em, and of the rest I have only ever come across three.  Well, good for dBpoweramp!  I can sleep comfortably knowing that if I ever want to convert a TTA file to OFR I probably have just the tool for the job.

Music file formats arise to fill a need, and each and every one of those file formats I mentioned represents a need which went unmet at the time the format was devised.  Actually, I even invented an audio file format of my own, way back in 1979.  In my lab at work I had a Commodore Pet computer which was attached to an X-Y graphic printer.  I used the Pet to control a laser test apparatus and had the printer output the results graphically.  As the printer’s two stepper motors (one for each axis) drove the pen holder across the paper, the tone of each motor would sound a certain note.  By having the printer draw out a certain pattern I could get it to play “God Save the Queen”.  Not very imaginative, I agree, but it was quite a party trick in its day.  I then wrote a program that would allow you to compose a tune which you could then play on the printer.  Finally, I devised a simple format with which to store those instructions in a file which the Commodore Pet saved on its audio-cassette tape drive.  I could conceivably claim to have developed one of the world’s first audio file formats!  Looking back, the Zeitgeist was quite delicious - a computer audio file stored in digital form on an analog audio cassette tape.

But back to the myriad file formats supported by dBpoweramp.  Each one has a purpose, and I suppose not all of those involve the distribution of music for commercial or recreational purposes.  For what it’s worth, the developers of iTunes could have arranged for it to support all of these weird and wonderful file formats too, but they didn’t.  In some cases there are good technical reasons why they would elect not to support a particular file type.  In others it is a matter of choice.  Some of those formats are Audio-Video formats, and iTunes is, after all, a multi-media platform.  But for the purposes of this post I am going to constrain the discussion to audio-only playback.

Not just the developers of iTunes, but every developer who writes an audio playback App has to decide for themselves which of those file formats (and, perhaps, others too) their App is going to support.  I am going to break these formats down into four camps - Uncompressed, Lossless Compressed, Lossy Compressed, and DSD.  Let’s look at each one, and discuss how they handle the audio data.

The simplest audio file formats contain uncompressed audio data.  The actual audio data itself is written straight into the file.  It is not manipulated or massaged in any way.  The advantage of doing it this way is that the audio data can be both written and read with the minimum of fuss.  The two most commonly used examples of this type of file format are AIFF (released by Apple in 1987) and WAV (released by Microsoft in 1991).  iTunes will happily load either file type.

Back in those days the file size of an AIFF or WAV file was utterly prohibitive.  A five-minute track ripped from a CD would require a file size of 53MB which represented something like three times the capacity of a good-sized hard disk drive at that time.  Clearly, if computers were going to be able to handle digital audio something needed to be done to reduce the file size.  To address this problem, during the early 1990’s the Fraunhofer Institute in Germany developed what we now call the MP3 file format.  What this does is, effectively, to figure out which parts of an audio signal are the least audible and throw them away.  By throwing away more and more of the audio signal the file size can be reduced rather dramatically.  This approach is referred to as Lossy Compression, because it compresses the file size but loses data (and therefore sound quality) along the way.
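The 53MB figure follows directly from the CD audio parameters: 44,100 samples per second, 16 bits (2 bytes) per sample, and two channels.  The short sketch below does the arithmetic, ignoring the small file header; the function name is my own.

```python
SAMPLE_RATE_HZ = 44_100  # CD sample rate
BYTES_PER_SAMPLE = 2     # 16-bit samples
CHANNELS = 2             # stereo


def uncompressed_size_bytes(seconds: int) -> int:
    """Raw audio payload size for CD-quality stereo, excluding any file header."""
    return seconds * SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS


five_minutes = uncompressed_size_bytes(5 * 60)
size_mb = five_minutes / 1_000_000
print(five_minutes, round(size_mb, 2))
```

That works out to 52,920,000 bytes, or just under 53MB for five minutes of music.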

The first MP3 codec was released in 1995.  In 1997 the AAC format was standardized as a successor to MP3, and Apple went on to adopt it as its preferred lossy format.  Structurally, AAC is very similar to MP3 but has some significant differences aimed at improving the subjective audio quality.  However, each format requires a separate codec to be able to read it.

By the end of the decade the combination of the MP3 codec and the ready availability of hard discs with capacities exceeding 100MB had ushered in the age of computer audio.  As always, there was a fringe element who still preferred the improved sound quality of uncompressed WAV and AIFF files, but who were still troubled by the enormous file sizes.  Programs like PKZip proved that ordinary computer files could be compressed to a smaller file size and subsequently regenerated in their exact original form.  However, PKZip did not do a very good job of reducing the file size of audio files.  A dedicated lossless compressor was needed, one specifically optimized around the characteristics of audio data.  In 2001 the first FLAC format specification was released.  The FLAC codec could produce compressed files that are approximately 50% of the size of the original WAV or AIFF file.  In 2004 Apple introduced their own lossless compression format ALAC (or Apple Lossless).

In 1999, Sony and Philips tried and failed to launch the SACD format as a successor to the ubiquitous CD.  SACD uses a radically different form of audio encoding called DSD.  Ultimately, the SACD launch flopped, although the format has never actually gone away, and the DSD format acquired its own band of loyal followers.  The developers of SACD each developed a file format that could handle DSD data - the DFF format developed by Philips, and the DSF format developed by Sony.  By 2011, DSD enthusiasts had demonstrated the ability to manage DFF and DSF files on their computers, and to transmit DSD data to a DAC, and the first DSD-compatible DACs trickled onto the market.  Consumer-level DSD recording equipment is also now available, and produces output files in either DSF or DFF format - bizarrely, they rarely offer a choice of formats.

Today, although other file formats do persist, the computer audio market has more or less settled down to four format types, with two competing format offerings for each type.  AIFF (Apple) and WAV (everybody else) for uncompressed audio; ALAC (Apple) and FLAC (everybody else) for lossless compression; and AAC (Apple) and MP3 (everybody else) for lossy compression.  DSF and DFF continue to duke it out in the DSD world.  Note that, except for DSD which Apple does not support in any form, the formats have shaken down into pairs of Apple and everybody else.  Why is this?

Frankly, there is absolutely no reason why any software player should not be able to support all of these file formats.  The process of reading (or writing) any of them is quite straightforward.  Yet, Apple originally refused to support WAV and MP3 formats in its iTunes software and iPod players, instead requiring users to use its own AIFF and AAC formats.  In fact, to this day Apple products continue to refuse to support FLAC files, instead requiring its customers to use ALAC.  From a functionality viewpoint none of this really matters.  ALAC and FLAC can be seamlessly transformed from one to the other and back again using high quality free software (as can AIFF and WAV, AAC and MP3).  But this is not what customers want.  So why is it that Apple takes this unhelpful stance?

The reason is simple.  From a business perspective, Apple’s entire iTunes ecosystem exists not to provide you with a platform on which to manage and play your music, but as a platform to sell you the music that you listen to.  Apple’s business model is for you to buy your music from them rather than from anybody else.  Therefore when you buy music from the iTunes Store it comes in AAC format only and not in MP3 or FLAC.  But if you buy your music virtually anywhere else, it only comes in the MP3 and FLAC formats.  Virtually nobody outside of Apple is interested in selling AAC or ALAC files.

But why bother in the first place?  Apple isn’t actually selling any ALAC files on its iTunes Store, so you have to wonder what their thinking is.  Do they consider that they are motivating me to buy AAC files from Apple instead of FLAC files from someone else?  Really?  Hey, maybe they’re right.  Maybe that’s exactly what we do.  It has also been suggested that Apple is scared of becoming the target of a patent troll if they start offering FLAC support, but that seems to be an even more feeble explanation.  Google have been supporting FLAC in Android for some time now, and have not attracted any trolls’ attention that we know of.  In any case, nobody is sure what patents FLAC might possibly be infringing, given that it is all open-source.  But given the size of Big Apple, they would certainly make for a tasty target.

Interestingly, with the overwhelming consumer embrace of MP3, Apple realized very early on that if they were going to continue refusing to support MP3 they could risk losing out on the whole mobile music opportunity to one of the competing platforms such as Rio, Zune, Nomad/Zen and others.  Deciding to support MP3 was a key tactical business decision that took the wind out of their competitors’ sails and ultimately paved the way for the total dominance of iPod and iTunes.  Today, despite the overwhelming consumer embrace of FLAC, there is no such pressure on Apple to encourage them towards supporting FLAC.

At one time there was an App called Fluke which allowed users to import FLAC files into iTunes.  Unfortunately, that loophole relied on a 32-bit OS kernel, and as a result Fluke no longer works with OS X 10.7 (Lion) and up.  Just to be clear, there are absolutely no technical reasons whatsoever that prevent Apple from supporting FLAC files.  It would be a trivial move for them to make, if they wanted to.  Their refusal to support FLAC is entirely a tactical decision on their part.

The situation with DSD is significantly different.  OS X and iOS are both fundamentally incapable of supporting DSD.  It would require significant changes to the way their audio subsystems work in order for that to happen, and, being honest, I see some fundamental issues that they would face if they ever considered doing that.  Consequently, I don’t see DSD being supported by Apple in any form for the foreseeable future.  The way the audio industry has got around that is with the DoP data transmission format.  This dresses up native DSD data so that it looks like PCM, which OS X can then be fooled into sending to your DAC, but it means that any Mac Apps which support DSD would have to be extremely careful how they went about it.  BitPerfect, for example, can do that, and iTunes can’t.  This is different from the situation with FLAC files.  Whereas iTunes would have no problems reading a FLAC file if Apple chose to let it, it would have absolutely no idea what to make of a DSD file.  You might as well ask it to load an Excel spreadsheet.
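To give a flavour of how DoP dresses up DSD as PCM, here is a rough sketch of the packing for a single channel.  It follows the published DoP convention of placing 16 DSD bits in the lower 16 bits of each 24-bit PCM sample, with the top 8 bits carrying a marker that alternates between 0x05 and 0xFA.  The function names are hypothetical, the input is assumed to be DSD bytes already ordered oldest-bit-first, and this is in no way a description of how BitPerfect itself is implemented.

```python
DOP_MARKERS = (0x05, 0xFA)  # alternating marker bytes defined by the DoP convention


def pack_dop(dsd_bytes: bytes) -> list[int]:
    """Pack one channel of DSD data into 24-bit DoP sample values.

    Assumes dsd_bytes is ordered oldest-bit-first within each byte.
    """
    if len(dsd_bytes) % 2:
        raise ValueError("need an even number of DSD bytes (16 bits per DoP sample)")
    words = []
    for i in range(0, len(dsd_bytes), 2):
        marker = DOP_MARKERS[(i // 2) % 2]
        words.append((marker << 16) | (dsd_bytes[i] << 8) | dsd_bytes[i + 1])
    return words


def unpack_dop(words: list[int]) -> bytes:
    """Recover the DSD bytes, checking the alternating marker on the way."""
    out = bytearray()
    for n, word in enumerate(words):
        if (word >> 16) & 0xFF != DOP_MARKERS[n % 2]:
            raise ValueError(f"sample {n} does not carry the expected DoP marker")
        out += bytes(((word >> 8) & 0xFF, word & 0xFF))
    return bytes(out)


dsd = bytes([0xAA, 0x55, 0x0F, 0xF0])
dop = pack_dop(dsd)
print([f"0x{w:06X}" for w in dop])
print("round trip ok:", unpack_dop(dop) == dsd)
```

Because each PCM sample carries 16 DSD bits, standard DSD at 2.8224MHz ends up travelling as 24-bit PCM at 176.4kHz.  The marker bytes are how a DoP-aware DAC recognizes the stream, and they are also why nothing in the playback chain can be allowed to alter the samples on their way to the DAC.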

In order for BitPerfect to manage DSD playback, we have created what we call the Hybrid-DSD file format.  Hybrid-DSD files are ALAC files that iTunes recognizes, and can import and play normally.  However, they also contain the native DSD audio data as a sort of “trojan horse” payload.  If iTunes plays a Hybrid-DSD file it plays the ordinary ALAC content.  But if BitPerfect plays the file it plays the DSD content.  We really like that system.  Other software players have instead adopted the idea of a “proxy” file.  This is a similar thing, but instead of containing ordinary ALAC music plus the DSD payload, they contain no music and include information that enables the playback software to locate the original DSF or DFF file.  Some may like the proxy file format, indeed some may prefer it, but we don’t, and this isn’t the place to discuss that.

It has often been suggested that BitPerfect could adopt a mechanism similar to either the Hybrid-DSD file or the proxy file to import FLAC files into iTunes.  And yes, we could do that.  But frankly, we believe the proper solution to that problem is to simply transcode the FLAC files to ALAC using a free App such as XLD.  It is simple and effective, and the ALAC files can just as easily be transcoded back into FLAC form if needed.

The final topic I want to cover in this post is Digital Rights Management (DRM).  This is a method by which the audio content in the file is encrypted in such a way as to prevent someone who does not “own” an audio file from playing it.  In other words, it is an anti-piracy technique.  Files containing DRM are pretty much indistinguishable from files that do not contain it, and most audio file formats support the inclusion of DRM (I am given to understand that FLAC does not, but I am not 100% sure).  For example, Apple included DRM in almost all of the music downloads sold on iTunes between 2004 and 2009.

DRM is something that tends to get forced on the distributors (e.g. the iTunes Store) by content providers (e.g. the record labels), and is a major inconvenience for absolutely everybody involved in the playback chain.  Between 2004 and 2009 Apple grew to hold sufficient clout that they could dictate to the content providers their intention to discontinue supporting DRM.  Today, DRM is a non-factor, although the new Apple Music service, plus TIDAL, and other streaming-based services which offer off-line storage are now re-introducing it.  The advance and retreat of DRM is an interesting barometer of who has the upper hand at any time in the music business between the distributors and the content providers.