The 3 Cs of Music Library Management – Completeness, Correctness, Consistency


The following is a guest post by our friend Dan Gravell over at Elsten Software. He is as obsessed with cleaning up music libraries as we are and will regularly be offering some great perspectives on music library management, metadata, and the like.


Early on, building a digital music collection is a doddle. Once you’ve learned how to
rip, buy and download music you begin to feel like you’re amassing the largest
digital jukebox you’ve ever seen, and it’s all under your control.

It seems so easy. If you rip music, automatic CD ripper software not only transfers
the music into computer music files, it also ‘tags’ the music with metadata such
as when the music was released, what genre it fits into and more. If you buy and download
music these tags are almost certainly already present in the music files you download.

But over time, music library management begins to get more difficult. The first reason
is simply the size of your collection: a larger collection requires more effort to
maintain. For instance, if you decide you want to add a ‘conductor’ tag to all of your
classical music, the effort involved will be directly proportional to the amount of
classical music you have.

The second reason is that the sources of your music tend to become more diverse. You
might start out ripping your old CDs, but then decide you want to purchase new music
online and download it. The trouble is that the standards used for the CDs that you ripped
might be different to those applied at the music store you downloaded new tracks from.
For instance, album art provided with the downloaded music might be of much lower
quality than the album art you have found for your ripped CDs. If you’ve
got an iPad to show off that beautiful album art you’ll be annoyed by the lower quality
art, and want to achieve a more consistent standard of art across your library.

In the main, it’s not the audio itself that becomes difficult to manage; it’s the metadata
and ancillary items like album art. These are the essential building blocks of a
music library, because they help us find, browse and search our music collections when
we want to play music. And because they are central to our music experience, when
they become untidy it’s obvious, and the more untidy they get the more you want to
do something about it.

The three Cs of music library management are the three measures by which the tidiness of
your music library is judged. They are:

* Completeness
* Correctness
* Consistency

Completeness is a measure of how fully your music is tagged. For instance, does each piece of
music you have in your music library have a genre or a year tag?

Completeness is probably the most obvious of our measures. It’s at its in-your-face
worst when you have an album where there are no tags at all. Here, I’m talking about
the dreaded “Unknown album” everyone seems to have in the deepest recesses
of their music collection. But it also occurs in more subtle incarnations. You can have
music that is otherwise perfectly tagged, with well-formed album and artist names, but
with no data about the year of release, or what genre the music is in.
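
To make that concrete, here is a minimal sketch of a completeness check. It assumes each track has already been read into a plain dictionary of tag names and values; the `REQUIRED_TAGS` list and the `missing_tags` helper are made up for illustration and are not taken from any particular tagger.

```python
# Tags every track is expected to carry (an assumed list; adjust to taste).
REQUIRED_TAGS = ("album", "artist", "genre", "year")


def missing_tags(track, required=REQUIRED_TAGS):
    """Return the required tag names that are absent or blank on a track."""
    return [name for name in required if not str(track.get(name) or "").strip()]


# Hypothetical library: one fully tagged track, one "Unknown album" style track.
library = [
    {"title": "Changes", "album": "Hunky Dory", "artist": "David Bowie",
     "genre": "Rock", "year": "1971"},
    {"title": "Track 01", "album": "", "artist": None},
]

for track in library:
    gaps = missing_tags(track)
    if gaps:
        print(f"{track['title']}: missing {', '.join(gaps)}")
```

A blank or whitespace-only value is treated the same as a missing tag here, since a music player can do no more with an empty genre than with no genre at all.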

Completeness is important because the metadata we are discussing is at the heart of how
you browse and search your music collection. When you scroll through a list of the genres
in your music collection there’s no magic going on; your music player is simply showing
all the genres that are ‘tagged’ in your music files and, when you click through on a
genre, all albums that are associated with that genre tag. If the tag doesn’t exist, the
music player cannot list the music. This makes it harder to search or choose music to play.

What causes a lack of completeness? Typically, CD rippers are preconfigured to only
tag a constrained set of information types. Album name, genre and so on are all pretty
typical, but tagging can get quite exotic at times. For instance, for classical music
all manner of metadata can record soloists, orchestras, conductors and performers. It’s
unlikely that your CD ripper will tag these items. If you’re lucky, downloading music
online might provide more metadata, but that’s far from guaranteed.

The remedy tends to be an automatic music tagger based upon audio fingerprinting. Audio
fingerprinting attempts to recognise music from the actual audio data located within a
music file. Once recognised, its metadata can be looked up online and the file automatically
tagged. In addition, for music that is partly tagged with key identifiers like album
and artist name, bliss can be used to look up data online, or you can do the job yourself
manually with a normal music tagger and a browser to find the information to tag.

There are two types of correctness: semantic correctness and syntactic correctness. Really
I should probably have broken them up, but then “The 3 Cs” had a nice ring to it… so
I didn’t.

Semantic correctness, or rather semantic _in_correctness, is easy for humans to recognise.
It’s that feeling when you scroll through your music library and see ‘Hunky Dory’ listed as being released in 1999. It’s only when you stop to think about it that you remember it was *re*released in 1999, but still, you want the year of release to denote the original year of release.

Much data is pretty objective. Album name (mostly), year of release and so on. Other data
is more subjective, such as style, genre and mood. Online music information databases
are the best source for correcting information. For the more subjective data, which tend
to explode to thousands of different values, some process of data classification and grouping
is required, as is done with bliss’s genre consolidation.

Syntactic correctness gets more… nerdy. Here, we’re talking about tags, and the exact
format they take. Most are straightforward: album name is just a bunch of letters (‘character
strings’, to get all nerdy again). Some require a special format, however. Year of release
is generally a four-digit number denoting the year, but what if there are two numbers? What
if there’s a full year, month and day? Browsing music by year will be more difficult
to do if there’s a variety of tag syntaxes used.
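
As a rough sketch of what tidying up year syntax could look like, the snippet below pulls a single four-digit year out of a few common tag layouts. The `normalise_year` name is hypothetical, and taking the *first* year when a tag holds two is an assumption on my part; you might prefer the earliest, or to flag such tags for manual review.

```python
import re

# A plausible four-digit year (1000-2999) that is not part of a longer number.
YEAR_PATTERN = re.compile(r"(?<!\d)([12]\d{3})(?!\d)")


def normalise_year(raw):
    """Return the first four-digit year found in a tag value, or None."""
    match = YEAR_PATTERN.search(raw or "")
    return match.group(1) if match else None


# A mix of syntaxes: bare year, full dates, two years, and unusable values.
for raw in ["1971", "1971-12-17", "17/12/1971", "1971/1999", "71", ""]:
    print(repr(raw), "->", normalise_year(raw))
```

Values with no recognisable year come back as `None` rather than being guessed at, so they can be listed and fixed by hand.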

Completeness and correctness are obvious and, with a fair bit of work, they are both
solvable. Consistency, my personal favourite, is less tractable and the hardest to get right.

Consistency is generally concerned with the semantic meaning of tags. I touched on semantics above when discussing correctness: it’s how music is classified by human-invented, fluffy notions
like genre. Genre’s a good example because in many people’s libraries their genres are all
over the shop. Take a look at your own library now and I bet you have some genres uselessly
broad (“rock” or “general”) and some confusingly specific (“80’s boomtown revival”). If you
have a neatly tended set of genres then I doff my cap to you.

Well, I might “doff” it, but consistency doesn’t end there. Sure, working out which genres
to allow, and how to reclassify the albums that fall outside them, is the
first step. But the next step is keeping them that way. What happens when you add a new
album, downloaded from online, which doesn’t obey your genre list? The best case is that you
simply change the genre to one of your allowed genres. The worst case, however, is that you
realise that, actually, the genre is representative of the album and you want to keep it. So,
rather than modifying the album you just imported, you must instead assess *every* other album
in your collection to see whether it falls into the newly allowed genre. And as your library gets
larger, this gets harder.
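
The first half of that job, spotting albums that stray from your genre list, is easy enough to sketch. The allow-list and album data below are invented for illustration, and this is not how bliss itself works; it simply shows the kind of rule being described.

```python
from collections import defaultdict

# An assumed allow-list; the genre names are illustrative only.
ALLOWED_GENRES = {"Rock", "Jazz", "Classical", "Electronic"}


def genre_violations(albums, allowed=ALLOWED_GENRES):
    """Group album titles by any genre that is not on the allow-list."""
    allowed_lower = {genre.casefold() for genre in allowed}
    offenders = defaultdict(list)
    for album in albums:
        genre = (album.get("genre") or "").strip()
        # Compare case-insensitively so "rock" and "Rock" count as the same genre.
        if genre.casefold() not in allowed_lower:
            offenders[genre or "(no genre)"].append(album["title"])
    return dict(offenders)


# Hypothetical albums, as they might look after a fresh import.
albums = [
    {"title": "Hunky Dory", "genre": "Rock"},
    {"title": "Album A", "genre": "80's boomtown revival"},
    {"title": "Album B", "genre": "rock"},
    {"title": "Album C", "genre": ""},
]

print(genre_violations(albums))
```

What this cannot do is the hard part: deciding whether a newly allowed genre should also apply to albums already in the collection. That still takes a human judgement for each album.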

It’s not just genre either. Consistency affects all manner of metadata. Here’s a common example:
what I call “disc number artifacts”, in other words “disk II” or “(disc A)” that you sometimes
see in album names for multi-disc albums. If you are in any way obsessive about such things it
will probably annoy you to see inconsistency in the spelling of “disc” or the formatting of
the number. Either way, that’s just another example of many possible types of inconsistency.
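
For the curious, here is a small sketch of how disc number artifacts might be detected and split off from an album name. The set of spellings and numbering styles it recognises is an assumption chosen to cover the examples above, and `split_disc_artifact` is a hypothetical helper rather than part of any real tool.

```python
import re

# A trailing disc marker such as "(disc A)", "[Disc 2]", "disk II" or "CD1".
# The accepted spellings and number styles are assumptions for illustration.
DISC_ARTIFACT = re.compile(
    r"[\s\-,:]*[\(\[]?\s*\b(?:disc|disk|cd)\s*([0-9]+|[ivx]+|[a-d])\s*[\)\]]?\s*$",
    re.IGNORECASE,
)

ROMAN = {"i": 1, "ii": 2, "iii": 3, "iv": 4, "v": 5,
         "vi": 6, "vii": 7, "viii": 8, "ix": 9, "x": 10}


def split_disc_artifact(album_name):
    """Return (album name without the disc marker, disc number or None)."""
    match = DISC_ARTIFACT.search(album_name)
    if not match:
        return album_name, None
    token = match.group(1).lower()
    if token.isdigit():
        number = int(token)
    elif token in ROMAN:
        number = ROMAN[token]
    elif len(token) == 1 and token in "abcd":
        number = ord(token) - ord("a") + 1  # disc A -> 1, disc B -> 2, ...
    else:
        return album_name, None  # unrecognised numeral: leave the name alone
    return album_name[:match.start()].rstrip(), number


for name in ["Greatest Hits (disc A)", "Greatest Hits disk II",
             "Greatest Hits [Disc 2]", "Discovery"]:
    print(name, "->", split_disc_artifact(name))
```

Once the number is separated out it can be written to a dedicated disc number tag in one consistent format, leaving the album name itself clean.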

Consistency is the reason I wrote bliss, a way of codifying the consistency rules
for your music library. I also made it (optionally) fully automatic, so when music is added the rules get applied without you doing anything. To ensure consistency you need rules, so the alternative
to using software is to write them down, intelligibly, and make sure you enforce them each
time you add music.

Phew! I hope my classification of different types of music library tidiness was interesting,
or at least provided food for thought!
