Common pitfalls in vocabulary selection

From filmstandards.org


Revision as of 09:18, 5 April 2011

From the TC 372 Workshop Compendium

The EN 15907 standard often refers to a controlled vocabulary from which the values of certain elements should be taken. Such vocabularies can be selected, defined, or used in several ways, and some of these approaches cause problems.


[Figure: Marc-music.png, from http://www.loc.gov/standards/valuelist/marcmuscomp.html]

Bibliographic standards such as MARC employ a multitude of code lists, some of which could be useful for filmographic metadata.

Some scrutiny is recommended, however, before adopting a code list from a community with a different focus. Taking this list as an example, how would we describe a flamenco theme that was composed for a film?
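One way to make the question concrete is to model the adopted code list as a field that accepts exactly one value. The sketch below is hypothetical throughout: `FormOfComposition`, its values, and the catch-all `MULTIPLE` member are invented for illustration and are not quoted from the MARC list, and the single-value restriction is an assumption about how such a list might be applied. A flamenco theme written for a film fits two categories, so a single-valued field forces the cataloguer either to drop one of them or to fall back on a catch-all.

```python
from enum import Enum
from typing import List, Set


class FormOfComposition(Enum):
    """Hypothetical excerpt of a code list organised by musical form or genre.

    The values are illustrative placeholders, not codes from the MARC list.
    """
    FLAMENCO = "flamenco"
    FILM_MUSIC = "film-music"
    MULTIPLE = "multiple"  # hypothetical catch-all for works fitting several forms


def describe_single(applicable: Set[FormOfComposition]) -> FormOfComposition:
    """Mimic a field that can hold exactly one code.

    When several categories apply, the only remaining choice is the
    catch-all value, so the specific information is lost.
    """
    if not applicable:
        raise ValueError("at least one category must apply")
    if len(applicable) == 1:
        return next(iter(applicable))
    return FormOfComposition.MULTIPLE


def describe_multi(applicable: Set[FormOfComposition]) -> List[str]:
    """Mimic a repeatable field that keeps every applicable code (sorted for stable output)."""
    return sorted(form.value for form in applicable)


flamenco_film_theme = {FormOfComposition.FLAMENCO, FormOfComposition.FILM_MUSIC}
print(describe_single(flamenco_film_theme).value)  # multiple
print(describe_multi(flamenco_film_theme))         # ['film-music', 'flamenco']
```

Whether a repeatable field or a different list is the better remedy depends on how the relevant EN 15907 element is defined; the sketch only shows why the fit should be checked before a list is adopted.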

[Figure: Voc-country.png, from http://www.ip.mpg.de/ww/en/pub/library/research/systematic/countrycode.cfm]

Maintaining your own country code list is generally not a good idea, at least for those countries where the U.N. and the ISO 3166 committee will do the maintenance work for you.

Alternative lists will be needed for countries that no longer exist, for former colonies, and for non-sovereign geographic entities in general.

The figure shows a section of a list maintained by the Max Planck Institute for Intellectual Property Law, which duplicates most of ISO 3166 but uses an incompatible encoding.
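A minimal sketch of the approach described above: treat ISO 3166-1 alpha-2 as the primary list, keep a small supplementary list only for entities it does not cover, and map any in-house codes onto ISO rather than maintaining them in parallel. Everything in it is an assumption for illustration: the ISO excerpt is hard-coded (a real system would load the complete, current list), the `x-` prefix for supplementary entries is a hypothetical convention chosen so they cannot collide with ISO codes, and `LOCAL_TO_ISO` stands in for some in-house encoding.

```python
# Hypothetical excerpt of ISO 3166-1 alpha-2 with informal English names.
# A real system would load the complete, current list instead.
ISO_3166_1_ALPHA2 = {
    "AT": "Austria",
    "DE": "Germany",
    "FR": "France",
    "GB": "United Kingdom",
}

# Supplementary entries for entities outside ISO 3166-1, such as countries
# that no longer exist. The "x-" prefix is a hypothetical convention that
# keeps these keys visibly separate from ISO codes.
SUPPLEMENTARY = {
    "x-austria-hungary": "Austria-Hungary",
    "x-prussia": "Kingdom of Prussia",
}

# Hypothetical in-house codes, mapped onto ISO 3166-1 instead of being
# maintained as a parallel list.
LOCAL_TO_ISO = {
    "A": "AT",
    "D": "DE",
    "F": "FR",
    "UK": "GB",
}


def normalise(code: str) -> str:
    """Return the code to store: an ISO 3166-1 code or a supplementary key.

    Unknown codes raise KeyError instead of silently becoming new private codes.
    """
    if code in ISO_3166_1_ALPHA2 or code in SUPPLEMENTARY:
        return code
    if code in LOCAL_TO_ISO:
        return LOCAL_TO_ISO[code]
    raise KeyError(f"unknown country code: {code!r}")


print(normalise("UK"))         # GB
print(normalise("FR"))         # FR
print(normalise("x-prussia"))  # x-prussia
```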

[Figure: British-museum-1.png, from http://www.britishmuseum.org/research/search_the_collection_database.aspx]

Categories that make sense to in-house staff do not necessarily make sense to the rest of the world.

British Museum staff may find it natural to look for badges in the coins and medals department, but few users of the collection database would think of looking there.

[Figure: Ulib-book.png, from http://www.ulib.org/]

Unsupervised automatic indexing will almost inevitably produce a significant fraction of ludicrous results.

The book shown here was assigned to the Music category because its title was analysed for supposedly music-related terms, such as the names of instruments.
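The failure mode can be reproduced with a few lines of naive keyword matching. This is a hypothetical sketch, not the method used by the site in the figure: `MUSIC_TERMS` and the titles are examples chosen for illustration, and the classifier looks only at surface word forms. Every title below is assigned to Music, although only the first is plausibly about music.

```python
import re
from typing import Optional

# Hypothetical list of terms assumed to signal the Music category.
MUSIC_TERMS = {"drum", "guitar", "horn", "organ", "piano", "violin"}


def naive_category(title: str) -> Optional[str]:
    """Return "Music" if any word in the title is in MUSIC_TERMS, else None.

    Like unsupervised keyword indexing, this matches word forms only and
    has no idea what the title is actually about.
    """
    words = set(re.findall(r"[a-z]+", title.lower()))
    return "Music" if words & MUSIC_TERMS else None


titles = [
    "A Beginner's Guide to the Violin",   # plausibly about music
    "Sailing Round Cape Horn",            # geography, not brass instruments
    "Organ Transplantation in Practice",  # medicine
    "The Tin Drum",                       # a novel, and a film
]
for title in titles:
    print(f"{title}: {naive_category(title)}")
```

A human review step could catch cases like these; the sketch only shows why matching on words alone cannot.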