No entity without identity

From filmstandards.org

Revision as of 16:09, 4 April 2011 by Dbalzer

Where philosophers can help

Any statement about something must identify the thing in question. Introducing description levels into metadata also means introducing different concepts of identity.

Jazzgossen-rels-id.png

Graph based on information from the Swedish Film Database

In the relationship graph from the preceding session we identified instances of Cinematographic Work by a title, and instances of Agent by a name.

As databases grow, titles and names can quickly become ambiguous, while machines require unambiguous identifiers to operate on relationships. Moreover, in this example we have an instance of an Event entity, for which no "natural" identifier exists.

Imdb-agentname.png

From: Internet Movie Database, http://www.imdb.com/ accessed 13-Oct-2010

For many years, the creators of the Internet Movie Database (and also those of Wikipedia) believed that all things of interest could be identified uniquely and persistently by a name, a title, or similar.

This identifier scheme turned out to be difficult to manage as the databases grew in size.

In recent years, both databases have introduced non-semantic identifiers in the form of numbering schemes that remain hidden from the user interface.
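The difference between a name and a non-semantic identifier can be illustrated with a minimal sketch. The `Registry` class, its method names, and the sample label below are hypothetical and are not taken from either database; the sketch only assumes that each new record receives the next number from a counter.

```python
import itertools


class Registry:
    """Hypothetical registry that assigns a running number to each record."""

    def __init__(self):
        self._counter = itertools.count(1)  # non-semantic numbering scheme
        self._records = {}                  # identifier -> display label

    def register(self, label):
        # The identifier carries no meaning; the label may repeat freely.
        ident = next(self._counter)
        self._records[ident] = label
        return ident

    def find_by_label(self, label):
        # A label lookup can return several records; an identifier cannot.
        return [i for i, text in self._records.items() if text == label]


registry = Registry()
first = registry.register("Anna Berg")   # invented name, for illustration only
second = registry.register("Anna Berg")  # a different person with the same name
matches = registry.find_by_label("Anna Berg")
```

Here the two records share a label but remain distinguishable, because relationships would refer to `first` or `second` rather than to the name.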

Zensurkarte.jpg

No bureaucracy without identifiers.

An identifier such as "58981" can only be unique within a particular scope, in this case, the set of German film censorship records.

In databases, the scope of an identifier is usually limited to an entity from the data model. In this way, different entities can share the same set of identifiers, e.g. a Cinematographic Work 12345 can be distinguished from an Agent 12345.
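As a rough sketch of entity-scoped identifiers, the pair of entity name and number can be treated as the actual key. The class name `ScopedId` and the record descriptions are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScopedId:
    """Hypothetical key: a number that is unique only within one entity."""
    entity: str  # the scope, e.g. "CinematographicWork" or "Agent"
    number: int  # the identifier within that scope


work = ScopedId("CinematographicWork", 12345)
agent = ScopedId("Agent", 12345)

# Both keys use the number 12345, yet they address different records.
records = {
    work: "a cinematographic work record",
    agent: "an agent record",
}
```

Because the entity is part of the key, the same number can be reused across entities without collision.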

In the Linked Open Data paradigm, the scope of an identifier is determined by a namespace. All Uniform Resource Identifiers (URIs) must contain a component that identifies a namespace.
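A minimal sketch of namespace-scoped identifiers follows. The `example.org` namespaces and the helper names `mint_uri` and `split_uri` are hypothetical, and splitting at the last `/` or `#` is a common convention assumed here, not a rule stated in the text.

```python
# Hypothetical namespaces; a real dataset would publish its own.
WORK_NS = "http://example.org/work/"
AGENT_NS = "http://example.org/agent/"


def mint_uri(namespace, local_id):
    """Combine a namespace with a local identifier into a URI."""
    return namespace + local_id


def split_uri(uri):
    """Split a URI into (namespace, local identifier) at the last '/' or '#'."""
    cut = max(uri.rfind("/"), uri.rfind("#")) + 1
    return uri[:cut], uri[cut:]


work_uri = mint_uri(WORK_NS, "12345")
agent_uri = mint_uri(AGENT_NS, "12345")
namespace, local_id = split_uri(work_uri)
```

The local identifier "12345" appears in both URIs, but the namespace component keeps the two identifiers distinct across datasets.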