How to future-proof digital metadata

For example, I have quite large collection of books, audio, photos etc.

Setting aside the question of preserving the data itself, it is equally important to make it easily searchable and catalog-able, without having to know how to parse the file contents (eg EXIF tags)

So I tried a number of software packages that handle libraries, but they all seem to get obsolete real fast.

Right now I am using a combination of custom-written wiki software based on CouchDB (because it's easy to replicate) together with JSON-formatted descriptors in folders. It is not pretty but it works.

Are there any "best practices" for storing metadata for file collections?

wizzard0

file-management
metadata
library-management
organization

Comments

Answer by Trevor Owens

Libraries, Archives and Museums seem to have generally gone with creating stacks of XML files with basic metadata in them. See for example MODS, EAD, or Dublin Core XML. As far as I'm concerned, what really matters is that there is textual information on hand. So if that's in XML files: Great. If it's in EXIF tags, or ID3 tags: that's great too. Frankly, that embeded EXIF or ID3 tag information has some particularly nice benifits to it, in that it sticks with the file. At the end of the day, you can open a photo or a mp3 in a text editor and see a bunch of nonsense and those tags too so the information sticks with the files. (This post has an example of seeing ID3 tags in a text editor.

Comments

Nicholas Webb: I'd add that if you use a database for day-to-day metadata management, build it around standards (to the extent that they fit your particular needs) and include the ability to export XML. Don't put this off until it's time to migrate everything out! I use DSpace to manage a mixed collection of audio, text and image objects described with Dublin Core. Descriptive and provenance metadata are stored in a Postgres database for search and retrieval, but DSpace can generate output packages at different levels of granularity (item, collection, entire repository) with this data as METS/MODS XML.

Answer by Nick Krabbenhoeft

Modern file systems are basically databases of metadata with pointers to bitstreams. This is great for accessibility since the database can be queried quickly, but not great for preservation since the database can become disassociated from the bitstreams. In more complex preservation systems like Safety Deposit Box/Preservica (this is a white paper about their work with the Swiss Federal Archives) the system can store one copy of the metadata in a database for access and a second copy in a package with the digital object on the storage medium.

Comments

Answer by John Lovejoy

I would suggest that a database would be the best place to store the metadata about the documents. This could be commercial off the shelf library/records software or home grown software. I imagine that trying to shoehorn metadata into a wiki would not be ideal.

You should not rely on the metadata embedded into the documents (eg EXIF, document properties), because it is likely to be a subset of all the metadata about that document. You have already flagged the problems associated with trying to parse the embedded metadata for retreival purposes. Also, different document types have different capacity for storing tags - you would have to try to ensure that it was easy to identify all incarnations of the same data (DC.Title vs vs title vs subject, for example)

No matter what database solution you use for your metadata you will need to determine your preservation strategy for that database (you mention the problem of the risk of obsolescence).

Your strategy would probably be migrating to a newer platform when the time is right. Dumping the information to an XML file (or two) could be an intermediate step in the migration process. It would all depend on the volume of data that needs to be migrated.