Common pitfalls in vocabulary selection
From filmstandards.org
Latest revision as of 12:17, 9 May 2011
From the TC 372 Workshop Compendium
The EN 15907 standard often refers to a controlled vocabulary from which values for certain elements should be taken. There are several approaches to selecting, defining, or using such vocabularies, some of which can cause problems.
From: http://www.loc.gov/standards/valuelist/marcmuscomp.html

Bibliographic standards such as MARC employ a multitude of code lists, some of which could be useful for filmographic metadata. Some scrutiny is recommended, however, before a code list is adopted from a community with a different focus. Taking the code list at the source above as an example, how would we describe a flamenco theme that was composed for a film?

This is an example of a vocabulary with non-disjoint concepts. Motion picture music can belong to any musical genre: the term describes music in terms of its usage rather than its artistic style. Mixing several aspects in one list often causes a dilemma for indexers, forcing them to make more or less arbitrary decisions.
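One way to avoid the dilemma is to record each aspect in its own field, so that an indexer never has to choose between usage and style. The following is only a minimal sketch of that idea: the class name, the field names, and the two small value sets are hypothetical and are not taken from MARC or EN 15907.

```python
from dataclasses import dataclass

# Hypothetical facet values for illustration only; not taken from any standard.
STYLES = {"flamenco", "jazz", "waltz"}
USAGES = {"motion picture music", "incidental music", "concert music"}


@dataclass(frozen=True)
class MusicDescription:
    """Describes a piece of music with one value per aspect."""

    style: str  # artistic style of the music
    usage: str  # purpose the music was written for

    def __post_init__(self):
        # Each facet is validated against its own disjoint value list.
        if self.style not in STYLES:
            raise ValueError(f"unknown style: {self.style!r}")
        if self.usage not in USAGES:
            raise ValueError(f"unknown usage: {self.usage!r}")


# The flamenco film theme no longer forces an arbitrary choice.
theme = MusicDescription(style="flamenco", usage="motion picture music")
```

Because each list covers a single aspect, a value such as "motion picture music" cannot be entered as a style by mistake.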
From: http://www.ip.mpg.de/ww/en/pub/library/research/systematic/countrycode.cfm

Maintaining your own country code list is generally not a good idea, at least for those countries for which the U.N. and the ISO 3166 committee already do the maintenance work for you.

Alternative lists will be needed for countries that no longer exist, for former colonies, and for non-sovereign geographic entities in general.

The example is a section from a list maintained by the Max Planck Institute for Intellectual Property Law, which duplicates most of ISO 3166 with an incompatible encoding.
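The division of labour described above can be sketched as a lookup that consults the externally maintained list first and falls back to a small local supplement only for entities the external list does not cover. This is a sketch under assumptions: the three ISO 3166-1 alpha-2 codes are a tiny hand-copied excerpt, and the `x-` prefixed supplement codes and the function name are hypothetical.

```python
from typing import Tuple

# Small excerpt of ISO 3166-1 alpha-2 codes; the labels are informal display names.
ISO_3166_1_EXCERPT = {"DE": "Germany", "ES": "Spain", "FR": "France"}

# Hypothetical local supplement for entities the excerpt above does not cover.
# The "x-" prefix keeps local codes from colliding with two-letter ISO codes.
LOCAL_SUPPLEMENT = {
    "x-gdr": "German Democratic Republic",
    "x-yug": "Yugoslavia",
}


def resolve_country(code: str) -> Tuple[str, str]:
    """Return (scheme, label) for a code, preferring the ISO list."""
    iso_key = code.upper()
    if iso_key in ISO_3166_1_EXCERPT:
        return ("iso3166-1", ISO_3166_1_EXCERPT[iso_key])
    local_key = code.lower()
    if local_key in LOCAL_SUPPLEMENT:
        return ("local", LOCAL_SUPPLEMENT[local_key])
    raise ValueError(f"unknown country code: {code!r}")
```

Returning the scheme alongside the label keeps it visible which values are maintained externally and which ones remain a local responsibility.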
From: http://www.britishmuseum.org/research/search_the_collection_database.aspx

Categories that make sense to in-house staff do not necessarily make sense to the rest of the world.

While British Museum employees may find it natural to look for badges in the coins and medals department, few users of the collection database will think of looking there.
From: http://www.ulib.org/

Unsupervised automatic indexing will almost inevitably produce a significant fraction of ludicrous results.

The book in this example was assigned to the Music category because titles were analysed for occurrences of supposedly related terms such as instruments.
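The failure mode is easy to reproduce. The sketch below is a deliberately naive title matcher, not the method used by the site cited above: the keyword set (apart from "instruments", which the text mentions), the function name, and the sample titles are all hypothetical.

```python
# Hypothetical keyword list; only "instruments" is mentioned in the text above.
KEYWORDS = {"Music": {"instruments", "orchestra", "symphony"}}


def suggest_categories(title: str) -> list:
    """Suggest every category that has a keyword occurring in the title."""
    words = {word.strip(".,;:()").lower() for word in title.split()}
    return sorted(cat for cat, terms in KEYWORDS.items() if words & terms)


# A hypothetical medical title is matched on the word "instruments" alone.
print(suggest_categories("A Handbook of Surgical Instruments"))
```

Treating such output as a suggestion for a human indexer to confirm, rather than as a final assignment, is one way to keep results like this out of the catalogue.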