1
CS 430: Information Discovery
Lecture 5
Descriptive Metadata 1
Library Catalogs
Dublin Core
2
Course Administration
3
Descriptive Metadata
Some methods of information discovery search descriptive metadata about the objects.
Metadata typically consists of a catalog or indexing record, or an abstract, with one record for each object.
• Catalog: metadata records that have a consistent structure, organized according to systematic rules.
• Abstract: a free text record that summarizes a longer document.
• Indexing record: less formal than a catalog record, but more structured than a simple abstract.
4
Descriptive Metadata
• Usually stored separately from the objects that it describes, but sometimes embedded in them.
• Usually the metadata is a set of text fields.
Textual metadata can be used to describe non-textual objects, e.g., software, images, music.
5
Descriptive Metadata
Information discovery is often most effective when applied to metadata rather than raw information
• Allows fielded searching
author = "Goethe"
• Suitable for non-textual material
type = "picture" and subject = "Ithaca"
• Can be used with controlled vocabulary
language = "en"
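A minimal sketch of fielded searching, assuming each metadata record is a plain dictionary of text fields. The sample records and the `fielded_search` helper are hypothetical and exist only to illustrate the three queries above.

```python
# Hypothetical metadata records: one dictionary of text fields per object.
RECORDS = [
    {"author": "Goethe", "title": "Faust", "type": "text", "language": "de"},
    {"title": "Photograph of a waterfall", "type": "picture",
     "subject": "Ithaca", "language": "en"},
    {"title": "Street map", "type": "map", "subject": "Ithaca", "language": "en"},
]


def fielded_search(records, **criteria):
    """Return the records whose fields equal every given field=value pair."""
    return [
        record
        for record in records
        if all(record.get(field) == value for field, value in criteria.items())
    ]


# The three example queries from the slide.
by_author = fielded_search(RECORDS, author="Goethe")
pictures_of_ithaca = fielded_search(RECORDS, type="picture", subject="Ithaca")
in_english = fielded_search(RECORDS, language="en")
```

The second query works even though the object is an image, because the search only ever touches the textual metadata.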
6
Origins of Library Catalogs
Bibliographic Objective:
• To bring together like items
• To differentiate among similar ones
Sir Anthony Panizzi, Keeper of Printed Books (1837-56) and Principal Librarian (1856-66) at the British Museum.
His Ninety-One Rules (1841) were the basis of modern catalogue rules.
7
Origins of Library Catalogs
Information Discovery:
• to enable a person to find a book of which either the author, title or subject is known
• to show what the library has by a given author, on a given subject, or in a given kind of literature
• to assist in the choice of a book as to its edition (bibliographically) or to its character (literary or topical).
Charles Ammi Cutter, Librarian of the Boston Athenaeum
Rules for a Dictionary Catalog, 1874
8
Origins of Library Catalogs
Classification:
Division of subject matter into a hierarchy. Typically used in libraries to provide a subject-based order for shelving books.
Melvil Dewey, Acting Librarian of Amherst College (1874)
The Dewey Decimal system of book classification uses the numbers 000 to 999 to cover the general fields of knowledge, and decimals to fit special subjects.
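A minimal sketch of how the hierarchy shows up in the number itself: the hundreds digit selects one of ten main classes, and the decimals refine the subject. The class headings below are the current Dewey main-class names, not taken from the slide, and the `main_class` helper is hypothetical.

```python
# The ten main classes of the Dewey Decimal Classification,
# keyed by the hundreds digit (headings are an addition, not from the slide).
MAIN_CLASSES = {
    0: "Computer science, information and general works",
    1: "Philosophy and psychology",
    2: "Religion",
    3: "Social sciences",
    4: "Language",
    5: "Science",
    6: "Technology",
    7: "Arts and recreation",
    8: "Literature",
    9: "History and geography",
}


def main_class(number: str) -> str:
    """Return the main class for a Dewey number such as '509.123'."""
    whole = int(number.split(".")[0])  # the part before the decimals: 000-999
    if not 0 <= whole <= 999:
        raise ValueError(f"not a Dewey number: {number!r}")
    return MAIN_CLASSES[whole // 100]
```

For example, `main_class("509.123")` falls in the 500s and so returns `"Science"`.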
9
Technology
Materials to be catalogued:
• Originally books
• Extended to serials, maps, music, etc., but concepts still rely heavily on experience with books
Form of catalog:
• Entries in books (Panizzi)
• Index cards (Cutter)
• Online databases (Kilgour)
[Library Cataloguing will be continued in Lecture 6.]
10
Catalogs as Investments
Costs:
• Conventional catalog records are created by skilled librarians (estimated cost: $100 per record).
• OCLC's catalog has 43 million records. At $100 per record, the total investment is several billion dollars (about $4.3 billion).
Cataloguing Standards:
• Enable libraries to share records
• Combine records of the past with records created today
• Allow readers and librarians to move between libraries
11
Dublin Core
Simple set of metadata elements for online information
• 15 basic elements
• intended for all types and genres of material
• all elements optional
• all elements repeatable
Developed by an international group chaired by Stuart Weibel since 1995.
(Diane Hillmann and Carl Lagoze of Cornell are very active in this group.)
13
Dublin Core
publisher: OCLC
creator: Weibel, Stuart L.
creator: Miller, Eric J.
title: Dublin Core Reference Page
date: 1996-05-28
format: text/html (MIME type)
language: en (English)
identifier: http://purl.org/dc/documents/rec-dces-199809.htm#
14
Dublin Core with Meta Tags
<meta name="publisher" content="OCLC">
<meta name="creator" content="Weibel, Stuart L.">
<meta name="creator" content="Miller, Eric J.">
<meta name="title" content="Dublin Core Reference Page">
<meta name="date" content="1996-05-28">
<meta name="format" content="text/html">
<meta name="language" content="en">
<meta name="identifier" content="http://purl.org/dc/documents/rec-dces-199809.htm#">
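A minimal sketch of how such tags could be generated from a record, assuming the record is held as a list of (element, value) pairs so that repeatable elements such as creator can appear more than once. The `to_meta_tags` helper is hypothetical; the values are those of the record above.

```python
from html import escape

# The record from the slide, as (element, value) pairs.
# A list of pairs is used because every element is repeatable.
RECORD = [
    ("publisher", "OCLC"),
    ("creator", "Weibel, Stuart L."),
    ("creator", "Miller, Eric J."),
    ("title", "Dublin Core Reference Page"),
    ("date", "1996-05-28"),
    ("format", "text/html"),
    ("language", "en"),
    ("identifier", "http://purl.org/dc/documents/rec-dces-199809.htm#"),
]


def to_meta_tags(record):
    """Render one HTML <meta> tag per (element, value) pair."""
    return "\n".join(
        f'<meta name="{escape(name, quote=True)}" '
        f'content="{escape(value, quote=True)}">'
        for name, value in record
    )


print(to_meta_tags(RECORD))
```

Escaping the values matters in practice, since a title containing a double quote would otherwise end the `content` attribute early.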
15
Dublin Core elements
1. Title: The name given to the resource by the creator or publisher.
2. Creator: The person or organization primarily responsible for the intellectual content of the resource. For example, authors in the case of written documents; artists, photographers, or illustrators in the case of visual resources.
3. Subject: The topic of the resource. Typically, subject will be expressed as keywords or phrases that describe the subject or content of the resource. The use of controlled vocabularies and formal classification schemes is encouraged.
16
Dublin Core elements
4. Description: A textual description of the content of the resource, including abstracts in the case of document-like objects or content descriptions in the case of visual resources.
5. Publisher: The entity responsible for making the resource available in its present form, such as a publishing house, a university department, or a corporate entity.
6. Contributor: A person or organization not specified in a creator element who has made significant intellectual contributions to the resource but whose contribution is secondary to any person or organization specified in a creator element (for example, editor, transcriber, and illustrator).
17
Dublin Core elements
7. Date: A date associated with the creation or availability of the resource.
8. Type: The category of the resource, such as home page, novel, poem, working paper, preprint, technical report, essay, dictionary.
9. Format: The data format of the resource, used to identify the software and possibly hardware that might be needed to display or operate the resource.
10. Identifier: A string or number used to uniquely identify the resource. Examples for networked resources include URLs and URNs.
18
Dublin Core elements
11. Source: Information about a second resource from which the present resource is derived.
12. Language: The language of the intellectual content of the resource.
13. Relation: An identifier of a second resource and its relationship to the present resource. This element permits links between related resources and resource descriptions to be indicated. Examples include an edition of a work (IsVersionOf), or a chapter of a book (IsPartOf).
19
Dublin Core elements
14. Coverage: The spatial locations and temporal durations characteristic of the resource.
15. Rights: A rights management statement, an identifier that links to a rights management statement, or an identifier that links to a service providing information about rights management for the resource.
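A minimal sketch of the fifteen elements as a data structure, assuming a record is a dictionary that maps an element name to a list of values. Because every element is optional and repeatable, the only check needed at this level is that each name is one of the fifteen. The `unknown_elements` helper is hypothetical.

```python
# The fifteen Dublin Core elements, in the order listed above.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def unknown_elements(record):
    """Return the names in a record that are not Dublin Core elements.

    `record` maps an element name to a list of values. An empty record
    is valid (all elements are optional), and a list may hold several
    values (all elements are repeatable).
    """
    return sorted(name for name in record if name.lower() not in DC_ELEMENTS)


record = {
    "title": ["Dublin Core Reference Page"],
    "creator": ["Weibel, Stuart L.", "Miller, Eric J."],  # repeated element
}
```

Here `unknown_elements(record)` returns an empty list, while a record that used `author` in place of `creator` would have that name reported.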
20
Qualifiers
Element qualifiers
Example: Date
DC.Date -> Created: 1997-11-01
DC.Date -> Issued: 1997-11-15
DC.Date -> Available: 1997-12-01/1998-06-01
DC.Date -> Valid: 1998-01-01/1998-06-01
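A minimal sketch of how the qualified date values above could be interpreted, assuming each value is either a single ISO 8601 date or a start/end range separated by a slash, as in the examples. The helper names are hypothetical.

```python
from datetime import date

# The qualified dates from the slide, keyed by element qualifier.
QUALIFIED_DATES = {
    "Created": "1997-11-01",
    "Issued": "1997-11-15",
    "Available": "1997-12-01/1998-06-01",
    "Valid": "1998-01-01/1998-06-01",
}


def parse_date_value(value):
    """Return (start, end) dates; a single date gives start == end."""
    start_text, _, end_text = value.partition("/")
    start = date.fromisoformat(start_text)
    end = date.fromisoformat(end_text) if end_text else start
    if end < start:
        raise ValueError(f"range ends before it starts: {value!r}")
    return start, end


def is_within(day, value):
    """True if `day` falls inside the date or range described by `value`."""
    start, end = parse_date_value(value)
    return start <= day <= end
```

With these values, 15 December 1997 is inside the Available range but not yet inside the Valid range, which is the kind of distinction an unqualified date element cannot express.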
21
Qualifiers
Value qualifiers
Example: Subject
DC.Subject -> DDC: 509.123
DC.Subject -> LCSH: Digital libraries--United States
23
Dublin Core with qualifiers
<title>Digital Libraries and the Problem of Purpose</title>
<creator>David M. Levy</creator>
<publisher>Corporation for National Research Initiatives</publisher>
<date date-type = "publication">January 2000</date>
<type resource-type = "work">article</type>
<identifier uri-type = "DOI">10.1045/january2000-levy</identifier>
<identifier uri-type = "URL">http://www.dlib.org/dlib/january00/01levy.html</identifier>
<language>English</language>
<rights>Copyright (c) David M. Levy</rights>
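A minimal sketch of reading such a record with Python's standard library, assuming the elements are wrapped in a single root element so that the fragment parses as XML. The `<record>` wrapper and the `parse_record` helper are hypothetical and not part of the slide.

```python
import xml.etree.ElementTree as ET

# The elements from the slide, copied as-is.
FRAGMENT = """
<title>Digital Libraries and the Problem of Purpose</title>
<creator>David M. Levy</creator>
<publisher>Corporation for National Research Initiatives</publisher>
<date date-type = "publication">January 2000</date>
<type resource-type = "work">article</type>
<identifier uri-type = "DOI">10.1045/january2000-levy</identifier>
<identifier uri-type = "URL">http://www.dlib.org/dlib/january00/01levy.html</identifier>
<language>English</language>
<rights>Copyright (c) David M. Levy</rights>
"""


def parse_record(fragment):
    """Return a list of (element, qualifier, value) triples.

    The qualifier is the value of the element's first attribute,
    or None when the element is unqualified.
    """
    # Assumption: a <record> wrapper supplies the single root XML requires.
    root = ET.fromstring(f"<record>{fragment}</record>")
    entries = []
    for child in root:
        qualifier = next(iter(child.attrib.values()), None)
        entries.append((child.tag, qualifier, (child.text or "").strip()))
    return entries


for element, qualifier, value in parse_record(FRAGMENT):
    print(element, qualifier, value)
```

A client that does not understand qualifiers can ignore the second item of each triple and still obtain a usable unqualified record.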
24
Limits of Dublin Core
Complex objects
• Article within a journal
• A thumbnail of another image
• The March 28 final edition of a newspaper
[Figure: a complete object, its sub-objects, and their metadata records]
25
Flat v. linked records
Flat record
All information about an item is held in a single Dublin Core record, including information about related items.
• Convenient for access and preservation
• Information is repeated, which creates a maintenance problem
Linked record
Related information is held in separate records, with a link from the item record.
• Less convenient for access and preservation
• Information is stored once
Compare with normal forms in relational databases
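A minimal sketch of the trade-off, assuming two articles that appear in the same serial. The serial name and ISSN are those of D-Lib Magazine; the article titles, field names, and the `resolve` helper are hypothetical placeholders.

```python
# Flat: the serial's details are copied into every article record.
FLAT = [
    {"title": "Article A", "serial_name": "D-Lib Magazine", "issn": "1082-9873"},
    {"title": "Article B", "serial_name": "D-Lib Magazine", "issn": "1082-9873"},
]

# Linked: the serial's details are stored once, keyed by ISSN,
# and each article record holds only a link (the key).
SERIALS = {"1082-9873": {"serial_name": "D-Lib Magazine"}}
LINKED = [
    {"title": "Article A", "serial": "1082-9873"},
    {"title": "Article B", "serial": "1082-9873"},
]


def resolve(article, serials):
    """Follow the link to rebuild the flat view of a linked record."""
    serial = serials[article["serial"]]
    return {"title": article["title"], "issn": article["serial"], **serial}


# Both forms carry the same information.
assert [resolve(article, SERIALS) for article in LINKED] == FLAT
```

Correcting the serial name takes one change in `SERIALS` but one change per article in `FLAT`, while reading a linked record always costs an extra lookup. This is the same trade-off that normalization makes in a relational database.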
26
Dublin Core with flat record extension
Continuation of the record for the article in D-Lib Magazine:
<relation rel-type = "InSerial">
<serial-name>D-Lib Magazine</serial-name>
<issn>1082-9873</issn>
<volume>6</volume>
<issue>1</issue>
</relation>
27
Events
[Figure: Version 1 plus new material yields Version 2]
Should Version 2 have its own record, or should the extra information be added to the Version 1 record?
How are these events represented in Dublin Core?
28
Minimalist versus structuralist
Minimalist
15 elements, no qualifiers, suitable for non-professionals
encourage creators to provide metadata
Structuralist
15 elements, qualifiers, RDF, detailed coding rules
will require trained metadata experts
[For an example of how complex Dublin Core can become, see the source of: http://purl.org/dc/documents/rec-dces-199809.htm#]
29
Dublin Core in many languages
See:
Thomas Baker, Languages for Dublin Core, D-Lib Magazine, December 1998, http://www.dlib.org/dlib/december98/12baker.html