Thursday 25 April 2013

Ripping your CD Collection – II. Which File Format?

The process of extracting the music from a CD and placing it in a set of computer files is called “ripping”.  When you come to rip a CD, the first decision you have to make is which file format you are going to use.  There are several of them.  All of them have both advantages and disadvantages.  It is useful to understand what these are so you can make an informed choice.

The first, and most dramatic, distinction is between lossless and lossy files.  This arises because, in order to minimize the amount of hard disk space taken up by your music files, or alternatively to maximize the number of music files you can fit on any given hard disk, you usually want to store your music in the smallest convenient file size.  Much like “zipping” a regular computer file to get it small enough to attach to an e-mail message, music files can be “compressed” down to a more manageable size.  This compression can be either lossy or lossless.  Lossy compression results in a much smaller file size, but at the cost of some loss in quality.  Generally, the greater the compression, the smaller the file size and the lower the quality.  The term “lossy” is used because some of the musical data is irretrievably lost in the process and can never be recovered.  I never recommend ripping a CD to a lossy format unless you are very clear in your mind that you really need smaller file sizes, that you are prepared to accept compromised sound quality to get them, and that if you ever change your mind you will have to rip all of your CDs over again.
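To make the distinction concrete, here is a minimal Python sketch.  It is only an analogy: zlib stands in for a lossless music codec, and throwing away the low 8 bits of each sample stands in for lossy compression.  Real codecs such as FLAC or MP3 work quite differently, and the test tone and variable names are made up for illustration.

```python
import math
import struct
import zlib

# One second of a 440 Hz tone as 16-bit mono samples (a synthetic stand-in for music).
SAMPLE_RATE = 44_100
samples = [round(20_000 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE))
           for n in range(SAMPLE_RATE)]
pcm = struct.pack(f"<{len(samples)}h", *samples)

# Lossless: compress, then decompress, and compare with the original bytes.
packed = zlib.compress(pcm, 9)
lossless_ok = zlib.decompress(packed) == pcm

# Lossy (crude stand-in): keep only the top 8 bits of each 16-bit sample.
coarse = [s >> 8 for s in samples]      # half the data per sample
restored = [c << 8 for c in coarse]     # best reconstruction available afterwards
lossy_ok = restored == samples

print(lossless_ok)  # True: every bit comes back
print(lossy_ok)     # False: the discarded bits are gone for good
```

The lossless round trip returns exactly the bytes that went in, whereas nothing done to `coarse` afterwards can bring back the detail that was thrown away.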

Lossless file formats store the music data in such a way that all of the music data that was on the original CD can be precisely recreated during playback, bit for bit, each and every time.  This can be done either with or without compression.  The music data on a CD comprises 16 bits (2 bytes) of data, per channel, 44,100 times per second.  With two channels (stereo), every second of music therefore requires 2 x 2 x 44,100 = 176,400 bytes, or 176.4kB, of disk space.  Lossless compression techniques can reduce this disk space requirement.  The amount of compression that can be achieved will vary depending on the musical content.  Some tracks will compress more than others.  But a rough guideline is that a lossless compressed file will use about one-half to two-thirds of the disk space of an uncompressed file.  This allows you to make a rough estimate of how much disk space you will need to store your entire collection.  Another (very) rough guideline is to allow for 200-300MB per CD (if compressed) for rock and jazz, and 300-400MB per CD (if compressed) for classical.  YMMV.
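The arithmetic is easy to check for yourself.  The sketch below assumes two channels (stereo), counts 1 MB as 1,000,000 bytes, and uses a 45-minute album purely as an illustrative length; the function names are my own.

```python
BITS_PER_SAMPLE = 16
CHANNELS = 2           # stereo (assumption: the CD figures are quoted per channel)
SAMPLE_RATE = 44_100   # samples per second, per channel


def bytes_per_second():
    """Uncompressed CD audio data rate in bytes per second."""
    return BITS_PER_SAMPLE // 8 * CHANNELS * SAMPLE_RATE


def album_size_mb(minutes, ratio=1.0):
    """Approximate size in MB for `minutes` of music.

    `ratio` is the compressed size as a fraction of the uncompressed size
    (1.0 means uncompressed).
    """
    return bytes_per_second() * minutes * 60 * ratio / 1_000_000


print(bytes_per_second())                # 176400
print(round(album_size_mb(45)))          # 476 (uncompressed)
print(round(album_size_mb(45, 1 / 2)))   # 238
print(round(album_size_mb(45, 2 / 3)))   # 318
```

For a 45-minute album, the one-half to two-thirds guideline gives roughly 240-320MB, which sits comfortably alongside the per-CD figures quoted above.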

There are two major uncompressed formats in use today, WAV and AIFF.  Both are, to all intents and purposes, identical.  The differences are marginal.  The former was developed by Microsoft for use on Windows machines, and the latter by Apple for use on Macs.  In reality, there is nothing to stop a Mac from reading a WAV file, or a Windows machine from reading an AIFF file.  It is just a question of whether or not the software you are running supports that file format.

There are also two major lossless compressed formats in use today, FLAC and Apple Lossless (also called ALAC, and sometimes ALE).  FLAC is an open-source format which has become widely adopted, and is now very close to being the de facto industry standard.  The latest version of the FLAC spec also includes an “uncompressed” option.  Apple Lossless, on the other hand, was developed by Apple for use with iTunes.  It was originally a private format, and had to be reverse-engineered before third-party software could support it.  Apple did eventually release the codec as open source in late 2011, but some minor incompatibility issues still surface from time to time.  It has no real use other than with iTunes, and lives on only because Apple still refuses to support the FLAC standard.  Apple Lossless files usually have the extension “.m4a”.

The two major lossy compression formats are MP3 (used everywhere – even in iTunes) and AAC (best known as the format used by iTunes and the iTunes Store).  I am not going to discuss lossy formats any further.  As they say at Ruth’s Chris Steak House, “customers wishing to order their steaks well done are invited to dine elsewhere”.

You will read in some places that music stored in different lossless file formats actually sounds different.  This appears to stretch credulity somewhat.  Let me state for the record that if you are using BitPerfect there is absolutely no possibility of this happening.  At the start of playback, the file is opened, read, decoded, and loaded into memory.  This process normally takes less than five seconds (but can take longer for some higher-resolution music tracks).  Once it is complete, precisely the same data will reside in precisely the same memory location, regardless of the file format.  For the remainder of playback there is no possible mechanism by which the file format can influence the sound quality.  Arguably, if you use different software to play back the music, and the music is streamed from disk rather than from memory, then the slight differences in disk and CPU activity needed to access the different file formats could conceivably be reflected in the resultant sound quality.  I have never personally heard any differences, though.
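The point about identical data in memory can be illustrated with a small Python sketch.  It is not BitPerfect's code, and it ignores file headers entirely; it only relies on the fact that WAV stores 16-bit samples little-endian while AIFF stores them big-endian.  The sample values are arbitrary.

```python
import struct

samples = [0, 1, -1, 12345, -32768, 32767]   # a few arbitrary 16-bit sample values
count = len(samples)

# The same samples laid out the way each format stores them on disk.
wav_bytes = struct.pack(f"<{count}h", *samples)    # little-endian, as in WAV
aiff_bytes = struct.pack(f">{count}h", *samples)   # big-endian, as in AIFF

# Decoding each layout hands the player the same numbers.
from_wav = list(struct.unpack(f"<{count}h", wav_bytes))
from_aiff = list(struct.unpack(f">{count}h", aiff_bytes))

print(wav_bytes == aiff_bytes)   # False: the bytes on disk differ
print(from_wav == from_aiff)     # True: the decoded samples are identical
```

The files differ on disk, but once decoded there is nothing left to tell them apart.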

It is really important to understand that the different file formats store their metadata in different ways.  WAV files, for example, normally the first format that springs to people’s minds once they have dismissed MP3, support only a very limited number of metadata fields – few enough to be a serious strike against them in my view.  Some people modify the WAV format to include metadata in the ID3 format, which is a comprehensive metadata standard.  Unfortunately, this results in non-standard WAV files which your choice of playback software may have trouble reading.  Apple’s AIFF format supports ID3 out of the box, but Apple Lossless uses the QuickTime metadata format, a symptom of its “Apple proprietary” origins.  FLAC supports a comprehensive metadata format called Vorbis Comments, which are flexible and easy to read and write, but the standards that define what the fields should be and what should go in them are very lax indeed.  This is both an advantage (since you can define whatever metadata implementation you want) and a disadvantage (since the software that reads the metadata may not interpret it in the same way as the software that wrote it).  Having said that, this is only a problem if you want to store “extended” metadata that goes beyond the commonly implemented “standard” fields, in which case there are no existing standards that you can adhere to anyway, regardless of which file/metadata format you choose.
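To give a feel for how simple, and how loose, Vorbis Comments are, here is a minimal Python sketch.  Each comment is just a NAME=value string, field names are case-insensitive, and the same name may appear more than once.  The function name and the example tags are invented for illustration, and the sketch works on plain strings rather than reading a real FLAC file.

```python
def parse_vorbis_comments(lines):
    """Group NAME=value strings into a dict of lists, keyed by upper-case name."""
    tags = {}
    for line in lines:
        name, sep, value = line.partition("=")   # split at the first "=" only
        if not sep:
            continue                             # not a NAME=value pair; skip it
        tags.setdefault(name.upper(), []).append(value)
    return tags


# Hypothetical tags: two ARTIST entries and one home-made field.
example = [
    "ARTIST=Example Artist",
    "artist=Guest Artist",
    "TITLE=Example Title",
    "MOOD=late night",
]
tags = parse_vorbis_comments(example)

print(tags["ARTIST"])   # ['Example Artist', 'Guest Artist']
print(tags["MOOD"])     # ['late night']
```

Nothing stops a program from writing a field like MOOD, and nothing obliges another program to know what to do with it, which is exactly the trade-off described above.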

Since having good metadata is in my view the principal raison d’être for moving to computer audio in the first place, this argues against using WAV files.  FLAC has become the de facto standard for lossless downloaded music, but the big strike against FLAC is that you cannot load FLAC files into iTunes.  So you cannot use FLAC with BitPerfect (for the moment).  AIFF and Apple Lossless appear to be good bets, but in reality there is limited enthusiasm for these formats outside of the Apple ecosystem (although, to be fair, that is slowly changing).  At the root of this is a battle between Apple and the rest of the world for your music download dollars.

Please read the previous paragraph again.  There are no simple answers to the conundrum posed by it.

Most of the music I download is only available in FLAC format, but I cannot load these files into iTunes for playback using BitPerfect.  So my own approach is to transcode them to Apple Lossless immediately after downloading using a free App called XLD.  If I ever wanted to, it would be just as easy to convert them back to FLAC with absolutely zero loss of quality.  There are both free and paid Apps available on both Windows and Mac platforms which convert freely between lossless formats, so it is not really too big of a deal to convert an entire library from one format to another should the need arise.
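If you prefer the command line to a GUI App like XLD, the same kind of batch conversion can be scripted.  The Python sketch below is an assumption-laden illustration, not my own workflow: it assumes the free ffmpeg tool is installed, uses its built-in `alac` encoder, and only builds the commands rather than running them.  The `-vn` option tells ffmpeg to skip any embedded cover art, so artwork would need to be handled separately.  The function names and paths are hypothetical.

```python
from pathlib import Path


def alac_command(flac_path):
    """Build an ffmpeg command that transcodes one FLAC file to Apple Lossless."""
    src = Path(flac_path)
    dst = src.with_suffix(".m4a")   # same name and folder, new extension
    # -vn: ignore embedded cover art; -c:a alac: use ffmpeg's Apple Lossless encoder
    return ["ffmpeg", "-i", str(src), "-vn", "-c:a", "alac", str(dst)]


def library_commands(root):
    """Build one command for every .flac file found under `root`."""
    return [alac_command(p) for p in sorted(Path(root).rglob("*.flac"))]


cmd = alac_command("Downloads/track01.flac")
print(cmd[-1])   # path of the .m4a file that would be written
```

Each command could then be run with `subprocess.run(cmd, check=True)`.  Try it on a copy of a few files first, and check that the tags survive, before pointing it at an entire library.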

Back to Part I.
Part III can be found here

An expanded discussion of audio file formats can be found here.