Choosing a metadata standard

Posted by Dorothea Salo | Uncategorized | Friday 28 July 2006 1:47 pm

MARC, MODS, MARCXML, TEI, EAD, Dublin Core, METS, RDF, topic maps, ETD-MS… metadata standards abound. How to pick the right ones?

Shockingly, technical merit is close to the bottom of the decision stack. Many other concerns come first.

  • What is the problem domain? Pick the right tool for the job—or at least discard obviously wrong tools. If you are marking up metadata for electronic theses, EAD is not going to help you, designed as it is for archival finding aids. ETD-MS is what you want. METS and DIDL are at heart administrative and structural metadata; they are designed for fitting complex groups of digital files together. Don’t look at them if all you need is a standard for bibliographic-style descriptive metadata. If you’re trying to get your catalogue data out of MARC into something more hackable, you have a few choices—but EAD, ETD-MS, METS, and DIDL are not among them.

    Be aware, too, of the distinction between a metadata standard and a meta-standard that applies to metadata. Saying that you want “XML metadata” is meaningless unqualified; many metadata standards are expressed in XML (EAD, METS, and the TEI Header for starters) and others not natively tied to XML can still be expressed in it (such as Dublin Core and even MARC). Some standards contain “envelopes” for others; for example, METS can include or point to any number of different kinds of descriptive metadata.

  • Is the choice baked into the system? If you are starting up a repository that will emit OAI-PMH records, get used to Dublin Core, because OAI-PMH requires every repository to offer unqualified Dublin Core as its baseline metadata format. If Dublin Core is good enough for your modest purposes, stop there; the decision has been made for you.

  • What are similar projects using? A literature search for the ever-present “How I Done It Good” articles may actually be useful. It can’t hurt to contact some of the project principals directly, either; they will have invaluable information about tradeoffs and pitfalls.

  • What else do you have to interoperate with? Will any of your metadata go into your MARC catalogue? You’ll want to make sure you can find or construct an appropriate crosswalk. Is OAI-PMH in your future? Examine Dublin Core. Do you want the metadata viewable on the Web? Then ask how easy it is to query from a database, or transform directly to HTML. Want other people to use it? Then pick something easily explained and manipulated—even if it’s not a library-created standard.

  • What kind of usage infrastructure is there? Rolling your own infrastructure is tedious at best. The more training materials and venues that exist for a metadata standard, the easier it will be to learn and ramp up in production. The more software that already exists for creating, storing, querying, and displaying this metadata, the less you have to create.

    Be careful, though; if a given metadata standard is poorly documented or only supported by expensive proprietary software, what expense and hassle are you locking yourself into if you adopt it? I rarely recommend topic maps because topic-map software implementations are so expensive, much though I love them in theory. Expensive, convoluted, and proprietary systems are also one reason many systems librarians dislike MARC.

  • What will this metadata do? If it has to be stored in a relational database, XML-based metadata schemas may not be the best choice (though many are at least feasible). If you need non-experts to create metadata (especially through a web form), highly granular or complex metadata standards are likely not the best choice.

  • Is it a good standard that encourages good metadata? This comes last on the list, but that doesn’t mean you should ignore it. Check for the right level of granularity, ease of creation, ease of access and comprehension, flexibility for hacking and recombination, solid best practices, and a lively support community.
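To make the OAI-PMH point in the list above concrete, here is a rough sketch of what a baseline Dublin Core record looks like when wrapped in the oai_dc container. It uses only Python's standard library. The function name `build_oai_dc` and the sample values are made up for illustration; only the two namespace URIs come from the OAI-PMH and Dublin Core specifications.

```python
import xml.etree.ElementTree as ET

# Namespace URIs defined by OAI-PMH (container) and Dublin Core (elements).
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Ask ElementTree to serialize with the conventional prefixes.
ET.register_namespace("oai_dc", OAI_DC_NS)
ET.register_namespace("dc", DC_NS)


def build_oai_dc(fields):
    """Build an oai_dc element from a dict of DC element name -> list of values.

    Hypothetical helper: every Dublin Core element is optional and repeatable,
    so each value simply becomes one child element.
    """
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for name, values in fields.items():
        for value in values:
            child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
            child.text = value
    return root


# Invented sample data for an electronic thesis.
record = build_oai_dc({
    "title": ["A Hypothetical Thesis on Metadata"],
    "creator": ["Doe, Jane"],
    "date": ["2006"],
    "type": ["Text"],
})
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Fifteen flat, optional, repeatable elements are about all there is to it, which is both why Dublin Core is easy to explain to non-experts and why richer projects tend to outgrow it.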

Above all, don’t panic. In these days of crosswalks, your library can probably recover from a bad decision without too much expense, as long as you’re not tossing out an expensively customized infrastructure along with it. (The travails of MARC are instructive here. Converting MARC to something else is already fairly feasible. The problem we have yet to solve is what to do about all our systems that depend on MARC!)
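For readers who have not met one, a crosswalk is at heart a lookup table from fields in one standard to fields in another. The sketch below is a toy, not an official mapping: the handful of MARC-to-Dublin-Core pairs, the input shape (a list of tag and subfield tuples), and the function name `marc_to_dc` are all assumptions chosen for illustration. Real crosswalks cover many more fields and special cases.

```python
# Illustrative mapping only: (MARC tag, subfield code) -> Dublin Core element.
CROSSWALK = {
    ("245", "a"): "title",
    ("100", "a"): "creator",
    ("650", "a"): "subject",
    ("260", "b"): "publisher",
    ("260", "c"): "date",
}


def marc_to_dc(marc_fields):
    """Convert a simplified MARC record to a dict of Dublin Core lists.

    marc_fields is assumed to be a list of (tag, {subfield_code: value}) tuples.
    """
    dc = {}
    for tag, subfields in marc_fields:
        for code, value in subfields.items():
            element = CROSSWALK.get((tag, code))
            if element is None:
                continue  # unmapped data is dropped: crosswalks are usually lossy
            # Trim trailing ISBD-style punctuation such as " /" or ",".
            dc.setdefault(element, []).append(value.rstrip(" /:;,"))
    return dc


# Invented sample record.
sample = [
    ("100", {"a": "Doe, Jane"}),
    ("245", {"a": "Metadata for the rest of us /", "c": "Jane Doe."}),
    ("260", {"a": "Madison :", "b": "Hypothetical Press,", "c": "2006"}),
    ("650", {"a": "Metadata"}),
    ("650", {"a": "Cataloging"}),
]
dc_record = marc_to_dc(sample)
print(dc_record)
```

Note what falls on the floor: the statement of responsibility and the place of publication have no home in this toy mapping. That loss is cheap to accept when converting the data, and it is the systems built around the original format that make the switch expensive.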

Read before you choose; pilot before you implement; evaluate wisely—and all should be well.

Canvassing for topics

Posted by Dorothea Salo | Uncategorized | Monday 10 July 2006 7:07 pm

Now that TechEssence has had a chance to settle down to business, it seems a good opportunity to canvass readers about topics they’d like to see us address. I know we’ve got some left over from last time, but I also suspect that new readers have new ideas. Please, leave a comment with yours!