Common pitfalls in vocabulary selection


From the TC 372 Workshop Compendium

The EN 15907 standard often refers to a controlled vocabulary from which values for certain elements should be taken. There are several approaches to selecting, defining, or using such vocabularies, some of which can cause problems.



Bibliographic standards such as MARC employ a multitude of code lists, some of which could be useful for filmographic metadata.

Some scrutiny is recommended, however, before a code list is adopted from a community with a different focus. Take a code list for musical works that includes the term "motion picture music": how would we describe a flamenco theme that was composed for a film?

This is an example of a vocabulary with non-disjoint concepts. Motion picture music can belong to any musical genre. The term classifies music by its use rather than by its artistic style. Mixing several aspects in one list often creates a dilemma for indexers, forcing them to make more or less arbitrary decisions.
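One way out of the dilemma is to record each aspect in its own field. The following is a minimal sketch under that assumption; the two term sets and the `MusicIndexEntry` class are hypothetical and are not taken from MARC or EN 15907.

```python
from dataclasses import dataclass

# Hypothetical term lists, one per aspect. Within each list the
# concepts are meant to be disjoint.
USAGE_TERMS = {"motion picture music", "incidental music", "concert music"}
STYLE_TERMS = {"flamenco", "jazz", "symphonic"}


@dataclass(frozen=True)
class MusicIndexEntry:
    usage: str  # what the music was written or used for
    style: str  # its artistic style

    def __post_init__(self):
        # Reject values that belong to the wrong aspect or to no list at all.
        if self.usage not in USAGE_TERMS:
            raise ValueError(f"unknown usage term: {self.usage!r}")
        if self.style not in STYLE_TERMS:
            raise ValueError(f"unknown style term: {self.style!r}")


# The flamenco theme composed for a film no longer forces a choice:
entry = MusicIndexEntry(usage="motion picture music", style="flamenco")
```

With separate fields the indexer records both facts, and a search for either flamenco or film music will find the work.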



Maintaining your own country code list is generally not a good idea, at least for those countries where the U.N. and the ISO 3166 committee will do the maintenance work for you.

Alternative lists will be needed for countries that no longer exist, for former colonies, and for non-sovereign geographic entities in general.
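The division of labour described above might be sketched as follows: codes are resolved against ISO 3166 first, and a local list is consulted only for entities the standard list does not cover. The three ISO entries are a tiny excerpt for illustration; the `local:` codes and the `resolve_country` function are invented for this sketch.

```python
from typing import Tuple

# Small excerpt of ISO 3166-1 alpha-2 codes. A real system would load
# the complete published list instead of maintaining its own copy.
ISO_3166_1_EXCERPT = {"DE": "Germany", "ES": "Spain", "FR": "France"}

# Hypothetical supplement for entities outside the current ISO list.
# The "local:" prefix keeps these codes from colliding with ISO codes.
LOCAL_SUPPLEMENT = {
    "local:gdr": "German Democratic Republic",
    "local:cs": "Czechoslovakia",
}


def resolve_country(code: str) -> Tuple[str, str]:
    """Return (source list, name) for a code, preferring ISO 3166."""
    if code in ISO_3166_1_EXCERPT:
        return ("iso3166-1", ISO_3166_1_EXCERPT[code])
    if code in LOCAL_SUPPLEMENT:
        return ("local", LOCAL_SUPPLEMENT[code])
    raise KeyError(f"unknown country code: {code!r}")
```

For example, `resolve_country("ES")` is answered from the ISO excerpt, while `resolve_country("local:cs")` falls through to the supplement. Keeping the two namespaces visibly distinct avoids creating yet another encoding that is incompatible with ISO 3166.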

One example is a list maintained by the Max Planck Institute for Intellectual Property Law, which duplicates most of ISO 3166 with an incompatible encoding.



Categories that make sense to in-house staff do not necessarily make sense to the rest of the world.

While British Museum employees may find it natural to look for badges in the coins and medals department, few users of the museum's collection database would think of looking there.



Unsupervised automatic indexing will almost inevitably produce a significant fraction of ludicrous results.

In one such case, a book was assigned to the Music category by analysing titles for occurrences of supposedly related terms, such as the names of instruments.
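How such a misassignment comes about can be shown with a minimal sketch of title-based keyword matching. The term list, the `naive_categories` function, and both titles are invented for illustration; they do not reproduce the system or the book behind the case above.

```python
import re

# Hypothetical list of "music-related" trigger words.
INSTRUMENT_TERMS = {"organ", "recorder", "horn", "triangle"}


def naive_categories(title: str) -> set:
    """Assign 'Music' whenever any instrument name occurs in the title."""
    words = set(re.findall(r"[a-z]+", title.lower()))
    return {"Music"} if words & INSTRUMENT_TERMS else set()


# A plausible hit and a ludicrous one (both titles are made up):
print(naive_categories("Bach's Organ Works"))
print(naive_categories("Organ Donation: A Practical Guide"))
```

Both calls return the Music category, because the matcher has no notion of word sense. Without human review, or at least a check against other metadata, such errors go unnoticed.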
