Difference between revisions of "Common pitfalls in vocabulary selection"

From filmstandards.org


Revision as of 09:24, 5 April 2011

From the TC 372 Workshop Compendium

The EN 15907 standard often refers to a controlled vocabulary from which values for certain elements should be taken. There are several approaches to selecting, defining, or using such vocabularies, some of which can cause problems.


Marc-music.png

From: http://www.loc.gov/standards/valuelist/marcmuscomp.html

Bibliographic standards such as MARC employ a multitude of code lists, some of which could be useful for filmographic metadata.

Some scrutiny is recommended, however, before adopting a code list from a community with a different focus. Taking the MARC music code list above as an example, how would we describe a flamenco theme that was composed for a film?

Voc-country.png

From: http://www.ip.mpg.de/ww/en/pub/library/research/systematic/countrycode.cfm

Maintaining your own country code list is generally not a good idea, at least for those countries where the U.N. and the ISO 3166 committee will do the maintenance work for you.

Alternative lists will be needed for countries that no longer exist, for former colonies, and for non-sovereign geographic entities in general.

This is a section from a list maintained by the Max Planck Institute for Intellectual Property Law, duplicating most of ISO 3166 with an incompatible encoding.
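The division of labour described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a recommended implementation: the two dictionaries hold only a tiny hand-picked subset of codes, and the function name `resolve_country` is hypothetical. The supplementary list here uses ISO 3166-3 four-letter codes for former countries, which is one possible choice; an archive may need a different or larger alternative list.

```python
# Illustrative subset only. A real system would load the complete,
# externally maintained ISO 3166-1 list instead of hard-coding it.
ISO_3166_1_ALPHA2 = {
    "DE": "Germany",
    "FR": "France",
    "GB": "United Kingdom",
}

# Supplementary list for countries that no longer exist
# (assumption: ISO 3166-3 four-letter codes are used for these).
FORMER_COUNTRIES = {
    "DDDE": "German Democratic Republic",
    "SUHH": "USSR",
    "CSHH": "Czechoslovakia",
}


def resolve_country(code):
    """Return (name, source list) for a code, or raise ValueError."""
    key = code.strip().upper()
    if key in ISO_3166_1_ALPHA2:
        return ISO_3166_1_ALPHA2[key], "ISO 3166-1"
    if key in FORMER_COUNTRIES:
        return FORMER_COUNTRIES[key], "ISO 3166-3"
    # Unknown codes are rejected rather than silently added to a local list.
    raise ValueError(f"unknown country code: {code!r}")
```

Keeping the two lists separate makes it visible which values are maintained externally and which are the archive's own responsibility.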

British-museum-1.png

From: http://www.britishmuseum.org/research/search_the_collection_database.aspx

Categories that make sense to in-house staff do not necessarily make sense to the rest of the world.

While British Museum employees may find it natural to look for badges in the coins and medals department, few users of the collection database will think of looking there.

Ulib-book.png

From: http://www.ulib.org/

Unsupervised automatic indexing will almost inevitably produce a significant fraction of ludicrous results.

This book has been assigned to the "Music" category by analysing titles for occurrences of supposedly related terms such as "instruments".
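The failure mode is easy to reproduce. The following Python sketch is a deliberately naive title-matching indexer, written only to illustrate the problem; it is not the method used by the site shown above, and the term lists, category names, and sample title are all hypothetical.

```python
import re

# Hypothetical term lists: each category is triggered by any listed word.
CATEGORY_TERMS = {
    "Music": {"instruments", "orchestra", "symphony"},
    "Medicine": {"surgery", "anatomy"},
}


def naive_categories(title):
    """Assign every category whose trigger words occur in the title."""
    words = set(re.findall(r"[a-z]+", title.lower()))
    return sorted(
        category
        for category, terms in CATEGORY_TERMS.items()
        if words & terms
    )


# A handbook on surgical tools is filed under Music and nowhere else,
# because "instruments" matches while "surgical" is not a listed term.
print(naive_categories("Surgical Instruments and Their Care"))
```

Without human review or at least some context beyond single words, such matches go unnoticed until a user stumbles over them.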
