Friday, 26 April 2013

Ripping your CD Collection – III. Data Grooming

I hope I have managed to get across in my previous posts that the central benefit behind ripping your files and playing from a computer lies in the ability to use the metadata to enhance your playback experience.  What you will be able to do with your music collection will be limited – to a large extent – by the quality of your metadata.  So I want to spend some time on what is often called “metadata grooming” before getting round to the actual process of ripping the CDs and embedding the metadata into the resultant files.

For most users, this will not present too much of a challenge.  The metadata structures, and the way that current software uses them, were pretty much defined back in the nineties by techno-geeks who did the work in order to fill unserviced gaps in their own needs.  So, if you listen mostly to rock, pop, and other modern musical genres, you too will probably find that the existing metadata structures meet most, if not all, of your needs.  But if you listen to classical music, you will find that the opposite is true.  For this reason, I will devote a separate post to data grooming for classical music listeners.  This current post considers only the existing mainstream musical needs.

Metadata is just Fields and Content, and how the two match up.  The Fields are the names given to the specific categories of metadata.  Typical Fields are Album, Artist, Title, and so forth.  The Content is what goes in the Field.  So an item of Content which might be “The Rise And Fall Of Ziggy Stardust And The Spiders From Mars” needs to go into a Field called “Album” and not one called “Artist”.  Data Grooming is basically the process of adjusting the Content to make sure it is properly descriptive of the Field, and in a form that will provide the most utility to you when it comes time to use it to browse your music collection.  But don’t worry too much, because most of the time that is going to happen automatically without your having to think about it.

One thing that is very important to grasp is that Apps such as iTunes provide only perfunctory support for the richness that good metadata provides.  Here is an obvious example.  Most Beatles songs were written by Lennon & McCartney.  So what do you enter into the “Composer” Field?  You have several approaches you can take.  First, you can enter “Lennon & McCartney”.  Second, you can enter “John Lennon & Paul McCartney”.  Or, if your name is Paul McCartney, you can write “Paul McCartney & John Lennon”.  So what happens if you want to browse your music collection to find cover versions of Beatles songs?  If you use Column Browser to list the “Composers”, you will find separate entries for all three of those variants, and they will not be adjacent to each other because they get listed in alphabetical order.  You might scroll down to “Lennon & McCartney” and not realize that the other entries exist further up and further down the Composers list.

Data Grooming is the process of finding and correcting these sorts of ambiguities.  And the first step in correcting them is to do your best to make sure they don’t happen in the first place, although if you buy downloaded music you don’t have any control over the metadata which has already been embedded into it.  When you rip your own CDs, you have the opportunity to perform a first pass over the metadata and make sure it conforms to one consistent standard.  Of course, you need to put some thought into what that standard should be.  Whatever you cannot correct at rip (or download) time, you will have to "groom" afterwards.
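To make the "finding" part concrete, here is a minimal Python sketch of how variant spellings of the same composer credit might be spotted.  It is only an illustration, not something any particular App does: the function names `composer_key` and `find_variants` are my own inventions, and it assumes credits are joined with "&" and that the last word of each name is the surname.

```python
from collections import defaultdict


def composer_key(credit):
    """Reduce a composer credit to a sorted tuple of lower-cased surnames.

    Assumes names are joined with '&' and the surname is the last word.
    """
    parts = [p.strip() for p in credit.split("&")]
    surnames = [p.split()[-1].lower() for p in parts if p]
    return tuple(sorted(surnames))


def find_variants(credits):
    """Group credits that share a key and return only the ambiguous groups."""
    groups = defaultdict(set)
    for credit in credits:
        groups[composer_key(credit)].add(credit)
    return {key: sorted(names) for key, names in groups.items() if len(names) > 1}


library_composers = [
    "Lennon & McCartney",
    "John Lennon & Paul McCartney",
    "Paul McCartney & John Lennon",
    "Bob Dylan",
]

for key, names in find_variants(library_composers).items():
    print(key, "->", names)
```

Run against the Beatles example, this groups all three spellings under one key and leaves "Bob Dylan" alone, which gives you a short list of entries to groom by hand.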

Think about how you want to use all that metadata.  When you browse through the list of Composers – and believe me that list can quickly grow to be pretty darned big – how do you want all the entries to appear?  Do you want to see “Bob Dylan” or “Dylan, Bob”?  If the former, “Bob Dylan” will be listed between “Bob Crew” and “Bob Feldman”.  If the latter, he will appear between “Dvořák, Antonín” and “Earle, Steve”.  It’s all about what makes the most sense to you, and you really need to spend time thinking about it before you start ripping.  But at the same time, you should bear in mind that the most popular nomenclature is “Bob Dylan” and that this will be what is employed in most everything you download, so if you want to standardize on “Dylan, Bob”, you need to be prepared to do a lot of Data Grooming to correct these entries.  Of course, some of you are going to believe it is only right and proper to use “Zimmerman, Robert”...
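If you do decide to standardize on the "Dylan, Bob" form, the conversion itself is simple enough to sketch in Python.  This is a rough illustration under one big assumption, namely that the surname is always the last word.  That breaks for names like "Ludwig van Beethoven", so treat it as a first pass to be checked by eye.  The name `to_last_first` is hypothetical.

```python
def to_last_first(name):
    """Convert 'First Last' to 'Last, First'.

    Names already containing a comma, and single-word names such as
    band names, are returned unchanged. Assumes the surname is the
    last word, which is not true for every name.
    """
    if "," in name:
        return name
    parts = name.split()
    if len(parts) < 2:
        return name
    return f"{parts[-1]}, {' '.join(parts[:-1])}"


composers = ["Steve Earle", "Bob Dylan", "Antonín Dvořák"]
for entry in sorted(to_last_first(c) for c in composers):
    print(entry)
```

Sorting the converted names puts Dylan between Dvořák and Earle, as described above.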

Another important aspect to be aware of is that most metadata standards actually support multi-valued entries.  So we can enter TWO items of Content for the “Composer” in our Beatles collection.  “John Lennon” and “Paul McCartney” can appear as two separate entries in the list of composers, and any search for songs written by “John Lennon” (or “Lennon, John”…) will come up with songs he co-wrote with others as well as songs he wrote by himself.  However – big however – you need to be aware that a simple software App such as iTunes does not support multi-valued fields.  “A Day In The Life” would show up in iTunes as being composed by “John Lennon/Paul McCartney” if the file was in Apple Lossless format, and “John Lennon;Paul McCartney” if the file was in AIFF format, since the two formats specify different delimiters to separate individual content items in a multi-valued field.  (Interestingly, the Apple Lossless specification means that the band AC/DC would be treated as two separate Artists, “AC” and “DC”.  Ha Ha!).
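The delimiter business is easier to see in a few lines of Python.  This sketch simply takes the two delimiters as I have described them above ("/" for Apple Lossless, ";" for AIFF) and treats them as assumptions; it does not read real files, and the names `DELIMITERS` and `split_values` are made up for illustration.

```python
# Delimiters as described in this post for each format. They are
# assumptions for this sketch, not values read from the format specs.
DELIMITERS = {"alac": "/", "aiff": ";"}


def split_values(raw, file_format):
    """Split a flattened multi-valued field into its individual values."""
    delimiter = DELIMITERS[file_format]
    return [value.strip() for value in raw.split(delimiter) if value.strip()]


print(split_values("John Lennon/Paul McCartney", "alac"))
print(split_values("John Lennon;Paul McCartney", "aiff"))

# The AC/DC problem: a '/' delimiter cannot tell a band name from a list.
print(split_values("AC/DC", "alac"))
print(split_values("AC/DC", "aiff"))
```

The last two lines show why the choice of delimiter matters: with "/" the band is split in two, while with ";" it survives intact.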

I have focused on Composers here, because it is convenient, but the same applies to Artists.  Take the album “Supernatural”.  This is a Santana album, and so the Album Artist would be “Santana”.  However, each track features a different guest vocalist.  Therefore a good strategy would be, for each track, to enter “Santana” as the Album Artist, and to have multiple values for the Artist field, “Santana” (or “Carlos Santana”, or “Santana, Carlos”, according to your personal preference) together with “Rob Thomas”, etc.  Note that there can be any number of multiple entries.
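Here is how that strategy might look as a data structure, sketched in Python.  The `Track` class and the two helper functions are hypothetical, and apart from "Smooth" (the track featuring Rob Thomas) the track title and guest name are placeholders rather than real entries from the album.

```python
from dataclasses import dataclass, field


@dataclass
class Track:
    title: str
    album: str
    album_artist: str                             # one value per album
    artists: list = field(default_factory=list)   # any number of values


def tracks_featuring(tracks, artist):
    """Titles of all tracks on which the given artist appears."""
    return [t.title for t in tracks if artist in t.artists]


def albums_by(tracks, album_artist):
    """Albums filed under the given Album Artist."""
    return sorted({t.album for t in tracks if t.album_artist == album_artist})


LIBRARY = [
    Track("Smooth", "Supernatural", "Santana", ["Santana", "Rob Thomas"]),
    # Placeholder entry: title and guest are illustrative only.
    Track("Hypothetical Track 2", "Supernatural", "Santana",
          ["Santana", "Guest Vocalist"]),
]

print(tracks_featuring(LIBRARY, "Rob Thomas"))
print(albums_by(LIBRARY, "Santana"))
```

The point of the design is that browsing by Album Artist keeps the album in one piece, while a search on the Artist field still finds every guest.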

My view is that multi-valued fields are a HUGE benefit.  The fact that iTunes doesn’t handle them properly today is NOT in my view sufficient reason not to take full advantage of them.  If you put it off until such time as Apps improve to support them, you may find the size of the task will have become daunting.

When you use a music player App (such as iTunes) to edit your metadata, one thing you need to be sure about is whether the App only updates the metadata within its own internal database, or whether it also updates the metadata embedded within the individual files to reflect the changes you made.  It should be your objective to keep the metadata embedded within the files current, because you want the flexibility of being able to move from your existing music player App to any better one that comes along, without leaving your precious groomed metadata behind.  I don’t use iTunes to groom metadata, so I am not 100% sure, but I think it does update the embedded metadata whenever you make an edit to the “Get Info…” page.
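If you are unsure what your App does, one rough way to find out is to fingerprint a file before and after making an edit.  The Python sketch below does this with a SHA-256 hash; the function names are my own, and the approach is only a heuristic.  A changed hash tells you the file was rewritten, not which tag was changed, and an unchanged hash suggests the edit only went into the App's database.

```python
import hashlib


def file_fingerprint(path):
    """Return the SHA-256 hex digest of a file's entire contents."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in 1 MiB chunks so large audio files do not fill memory.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def embedded_data_changed(fingerprint_before, path):
    """True if the file on disk no longer matches the earlier fingerprint."""
    return file_fingerprint(path) != fingerprint_before
```

To use it, record `file_fingerprint(path)` for one track, edit that track's metadata in your App, and then call `embedded_data_changed` with the recorded value and the same path.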

My own practice is to use a totally separate App to perform Data Grooming.  That App is MusicBee, a free App that runs only on Windows.  I just like the convenience of its user interface.  Plus, I can use it to play music while I’m working!  My process for adding new music to my library is (1) rip or download it on my Windows machine using EAC; (2) groom the metadata on the Windows machine using MusicBee; (3) make an Apple Lossless copy on the Windows machine using dBpoweramp; (4) move everything to my NAS; and (5) import the Apple Lossless files into iTunes.  Again, not for everybody, but it’s what I do.

Back to Part II.
Part IV can be found here.