CS 430: Information Discovery

Lecture 5

Descriptive Metadata 1

Libraries, Catalogs, Dublin Core

Course Administration

Descriptive Metadata

Some methods of information discovery search descriptive metadata about the objects.

Metadata typically consists of a catalog record, an indexing record, or an abstract, with one record for each object.

• Catalog: metadata records that have a consistent structure, organized according to systematic rules.

• Abstract: a free-text record that summarizes a longer document.

• Indexing record: less formal than a catalog record, but more structured than a simple abstract.

Descriptive Metadata

• Usually stored separately from the objects that it describes, but sometimes is embedded in the objects.

• Usually the metadata is a set of text fields.

Textual metadata can be used to describe non-textual objects, e.g., software, images, music

Descriptive metadata

Information discovery is often most effective when applied to metadata rather than raw information

• Allows fielded searching

author = "Goethe"

• Suitable for non-textual material

type = "picture" and subject = "Ithaca"

• Can be used with controlled vocabulary

language = "en"
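The three queries above can be sketched in a few lines of Python. This is only an illustration, assuming each metadata record is a dictionary from field name to a list of values; the sample records and the helpers `matches` and `search` are hypothetical and not part of the lecture.

```python
# Hypothetical metadata records: one per object, each field holds a list of values.
RECORDS = [
    {"author": ["Goethe"], "title": ["Faust"], "type": ["text"], "language": ["de"]},
    {"title": ["Cascadilla Gorge"], "type": ["picture"], "subject": ["Ithaca"]},
    {"author": ["Goethe"], "title": ["Faust (translation)"], "type": ["text"], "language": ["en"]},
]


def matches(record, **criteria):
    """Return True if every field = value criterion is satisfied by the record."""
    return all(value in record.get(field, []) for field, value in criteria.items())


def search(records, **criteria):
    """Fielded search: keep the records that satisfy all criteria (logical AND)."""
    return [r for r in records if matches(r, **criteria)]


by_author = search(RECORDS, author="Goethe")
pictures = search(RECORDS, type="picture", subject="Ithaca")
english = search(RECORDS, language="en")
```

The picture is found through its textual metadata alone, which is the point of the second bullet: the search never inspects the image itself.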

Origins of Library Catalogs

Bibliographic Objective:

• To bring together like items

• To differentiate among similar ones

Sir Anthony Panizzi, Keeper of Printed Books (1837-56) and then Principal Librarian (1856-66) at the British Museum.

His Ninety-One Rules (1841) were the basis of modern catalogue rules.

Origins of Library Catalogs

Information Discovery:

• to enable a person to find a book of which either the author, title or subject is known

• to show what the library has by a given author, on a given subject, or in a given kind of literature

• to assist in the choice of a book as to its edition (bibliographically) or to its character (literary or topical).

Charles Ammi Cutter, Librarian of the Boston Athenaeum

Rules for a Dictionary Catalog, 1876

Origins of Library Catalogs

Classification:

Division of subject matter into a hierarchy. Typically used in libraries to provide a subject-based order for shelving books.

Melvil Dewey, Acting Librarian of Amherst College (1874)

The Dewey Decimal system of book classification uses the numbers 000 to 999 to cover the general fields of knowledge, and decimals to fit special subjects.
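As a rough sketch of how the hierarchy works, the first digit of a Dewey number selects one of ten main classes, and the full decimal gives the shelving order. The class labels below are paraphrased from the present-day summaries and should be checked against the official schedules; the function names are hypothetical.

```python
# The ten DDC main classes, keyed by the hundreds digit (labels paraphrased).
MAIN_CLASSES = {
    0: "Computer science, information and general works",
    1: "Philosophy and psychology",
    2: "Religion",
    3: "Social sciences",
    4: "Language",
    5: "Science",
    6: "Technology",
    7: "Arts and recreation",
    8: "Literature",
    9: "History and geography",
}


def main_class(number):
    """Map a DDC number such as '509.123' to its main class label."""
    whole = number.split(".")[0]
    if len(whole) != 3 or not whole.isdigit():
        raise ValueError(f"not a DDC number: {number!r}")
    return MAIN_CLASSES[int(whole[0])]


def shelf_order(numbers):
    """Sort DDC numbers into shelving order, comparing them as decimals."""
    return sorted(numbers, key=float)


label = main_class("509.123")
shelved = shelf_order(["509.123", "025.04", "004.6"])
```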

Technology

Materials to be catalogued:

• Originally books

• Extended to serials, maps, music, etc., but concepts still rely heavily on experience with books

Form of catalog:

• Entries in books (Panizzi)

• Index cards (Cutter)

• Online databases (Kilgour)

[Library Cataloguing will be continued in Lecture 6.]

Catalogs as Investments

Costs:

• Conventional catalog records are created by skilled librarians (estimated cost: $100 per record).

• OCLC's catalog has 43 million records. At roughly $100 per record, the total investment is several billion dollars (about $4.3 billion).

Cataloguing Standards:

• Enable libraries to share records

• Combine records of the past with records created today

• Allow readers and librarians to move between libraries

Dublin Core

Simple set of metadata elements for online information

• 15 basic elements

• intended for all types and genres of material

• all elements optional

• all elements repeatable

Developed by an international group chaired by Stuart Weibel since 1995.

(Diane Hillmann and Carl Lagoze of Cornell are very active in this group.)

Dublin Core

publisher: OCLC

creator: Weibel, Stuart L.

creator: Miller, Eric J.

title: Dublin Core Reference Page

date: 1996-05-28

format: text/html (MIME type)

language: en (English)

identifier: http://purl.org/dc/documents/rec-dces-199809.htm#
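Because every element is optional and repeatable, a natural in-memory form for such a record is a mapping from element name to a list of values. The following is a minimal sketch under that assumption; `parse_record` is a hypothetical helper, and the identifier line is left out only to keep the example short.

```python
from collections import defaultdict

# The 15 element names of the Dublin Core element set.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}

RECORD_TEXT = """\
publisher: OCLC
creator: Weibel, Stuart L.
creator: Miller, Eric J.
title: Dublin Core Reference Page
date: 1996-05-28
format: text/html
language: en
"""


def parse_record(text):
    """Parse 'element: value' lines into a dict of lists (elements may repeat)."""
    record = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split at the first colon only, so values may contain colons.
        name, _, value = line.partition(":")
        name = name.strip().lower()
        if name not in DC_ELEMENTS:
            raise ValueError(f"unknown element: {name}")
        record[name].append(value.strip())
    return dict(record)


record = parse_record(RECORD_TEXT)
```

The repeated creator lines become a two-item list, and elements that were not supplied (subject, rights, and so on) are simply absent.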

Dublin Core with Meta Tags

<meta name="publisher" content="OCLC">

<meta name="creator" content="Weibel, Stuart L.">

<meta name="creator" content="Miller, Eric J.">

<meta name="title" content="Dublin Core Reference Page">

<meta name="date" content="1996-05-28">

<meta name="format" content="text/html">

<meta name="language" content="en">

<meta name="identifier" content="http://purl.org/dc/documents/rec-dces-199809.htm#">
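A harvester could recover the record from such a page with Python's standard `html.parser` module. This is a sketch, not a description of any particular crawler; the class name `MetaCollector` and the shortened sample page are assumptions made for illustration.

```python
from html.parser import HTMLParser

HTML = """
<head>
<meta name="publisher" content="OCLC">
<meta name="creator" content="Weibel, Stuart L.">
<meta name="creator" content="Miller, Eric J.">
<meta name="title" content="Dublin Core Reference Page">
<meta name="date" content="1996-05-28">
</head>
"""


class MetaCollector(HTMLParser):
    """Collect <meta name=... content=...> pairs, keeping repeated names."""

    def __init__(self):
        super().__init__()
        self.metadata = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name, content = attributes.get("name"), attributes.get("content")
        if name is not None and content is not None:
            # Store values in a list because elements are repeatable.
            self.metadata.setdefault(name.lower(), []).append(content)


collector = MetaCollector()
collector.feed(HTML)
metadata = collector.metadata
```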

Dublin Core elements

1. Title The name given to the resource by the creator or publisher.

2. Creator The person or organization primarily responsible for the intellectual content of the resource. For example, authors in the case of written documents, artists, photographers, or illustrators in the case of visual resources.

3. Subject The topic of the resource. Typically, subject will be expressed as keywords or phrases that describe the subject or content of the resource. The use of controlled vocabularies and formal classification schemes is encouraged.

Dublin Core elements

4. Description A textual description of the content of the resource, including abstracts in the case of document-like objects or content descriptions in the case of visual resources.

5. Publisher The entity responsible for making the resource available in its present form, such as a publishing house, a university department, or a corporate entity.

6. Contributor A person or organization not specified in a creator element who has made significant intellectual contributions to the resource but whose contribution is secondary to any person or organization specified in a creator element (for example, editor, transcriber, and illustrator).

Dublin Core elements

7. Date A date associated with the creation or availability of the resource.

8. Type The category of the resource, such as home page, novel, poem, working paper, preprint, technical report, essay, dictionary.

9. Format The data format of the resource, used to identify the software and possibly hardware that might be needed to display or operate the resource.

10. Identifier A string or number used to uniquely identify the resource. Examples for networked resources include URLs and URNs.

Dublin Core elements

11. Source Information about a second resource from which the present resource is derived.

12. Language The language of the intellectual content of the resource.

13. Relation An identifier of a second resource and its relationship to the present resource. This element permits links between related resources and resource descriptions to be indicated. Examples include an edition of a work (IsVersionOf), or a chapter of a book (IsPartOf).

Dublin Core elements

14. Coverage The spatial locations and temporal durations characteristic of the resource.

15. Rights A rights management statement, an identifier that links to a rights management statement, or an identifier that links to a service providing information about rights management for the resource.

Qualifiers

Element qualifiers

Example: Date

DC.Date -> Created: 1997-11-01

DC.Date -> Issued: 1997-11-15

DC.Date -> Available: 1997-12-01/1998-06-01

DC.Date -> Valid: 1998-01-01/1998-06-01

Qualifiers

Value qualifiers

Example: Subject

DC.Subject -> DDC: 509.123

DC.Subject -> LCSH: Digital libraries--United States
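One way to picture qualifiers is as an extra slot on each statement: an element qualifier narrows the meaning of the element (which kind of date), while a value qualifier names the scheme the value is drawn from (DDC, LCSH). The sketch below assumes a simple (element, qualifier, value) triple; the helpers `unqualified` and `in_period` are hypothetical. Dropping the qualifier leaves a plain 15-element record.

```python
from datetime import date

# (element, qualifier, value) triples modeled on the qualifier examples.
STATEMENTS = [
    ("date", "created", "1997-11-01"),
    ("date", "issued", "1997-11-15"),
    ("date", "valid", "1998-01-01/1998-06-01"),
    ("subject", "DDC", "509.123"),
    ("subject", "LCSH", "Digital libraries--United States"),
]


def unqualified(statements):
    """Drop the qualifiers, leaving a simple element -> list-of-values record."""
    record = {}
    for element, _qualifier, value in statements:
        record.setdefault(element, []).append(value)
    return record


def in_period(day, period):
    """True if `day` falls inside a 'start/end' period of ISO 8601 dates."""
    start, end = (date.fromisoformat(part) for part in period.split("/"))
    return start <= day <= end


simple = unqualified(STATEMENTS)
valid_period = next(v for e, q, v in STATEMENTS if (e, q) == ("date", "valid"))
```

A client that does not understand qualifiers can still use `simple`, although it can no longer tell a creation date from a validity period.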

Dublin Core with qualifiers

<title>Digital Libraries and the Problem of Purpose</title>

<creator>David M. Levy</creator>

<publisher>Corporation for National Research Initiatives</publisher>

<date date-type = "publication">January 2000</date>

<type resource-type = "work">article</type>

<identifier uri-type = "DOI">10.1045/january2000-levy</identifier>

<identifier uri-type = "URL">http://www.dlib.org/dlib/january00/01levy.html</identifier>

<language>English</language>

<rights>Copyright (c) David M. Levy</rights>
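A record in this form can be read with Python's standard `xml.etree.ElementTree`. The sketch below wraps a few of the elements in a hypothetical `<record>` root, since the slide shows only the child elements, and treats the single attribute on each element as its qualifier.

```python
import xml.etree.ElementTree as ET

# A subset of the elements, wrapped in a hypothetical <record> root so that
# the fragment is a well-formed XML document.
XML = """
<record>
  <title>Digital Libraries and the Problem of Purpose</title>
  <creator>David M. Levy</creator>
  <date date-type="publication">January 2000</date>
  <identifier uri-type="DOI">10.1045/january2000-levy</identifier>
  <identifier uri-type="URL">http://www.dlib.org/dlib/january00/01levy.html</identifier>
</record>
"""

root = ET.fromstring(XML)

# Each statement is (element, qualifier-or-None, value); an element carries
# at most one qualifier attribute in this record.
statements = [
    (child.tag, next(iter(child.attrib.values()), None), child.text)
    for child in root
]

# Qualified lookup: pick the identifier whose uri-type is "DOI".
doi = root.find("identifier[@uri-type='DOI']").text
```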

Limits of Dublin Core

Complex objects

• Article within a journal

• A thumbnail of another image

• The March 28 final edition of a newspaper

[Figure: a complete object, its sub-objects, and their metadata records]

Flat v. linked records

Flat record

All information about an item is held in a single Dublin Core record, including information about related items.

• convenient for access and preservation

• information is repeated, which creates a maintenance problem

Linked record

Related information is held in separate records, with a link from the item record.

• less convenient for access and preservation

• information is stored once

Compare with normal forms in relational databases
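The trade-off can be made concrete with a small sketch. The article titles, the key `"dlib"`, and the helper `resolve` are hypothetical; only the serial name and ISSN come from the D-Lib Magazine example used in this lecture.

```python
# Flat: the serial's details are copied into every article record.
flat_articles = [
    {"title": "Article A", "serial_name": "D-Lib Magazine", "issn": "1082-9873"},
    {"title": "Article B", "serial_name": "D-Lib Magazine", "issn": "1082-9873"},
]

# Linked: the serial is described once; articles refer to it by key.
serials = {"dlib": {"serial_name": "D-Lib Magazine", "issn": "1082-9873"}}
linked_articles = [
    {"title": "Article A", "serial": "dlib"},
    {"title": "Article B", "serial": "dlib"},
]


def resolve(article, serials):
    """Follow the link to build the flat view of a linked record."""
    merged = {k: v for k, v in article.items() if k != "serial"}
    merged.update(serials[article["serial"]])
    return merged


# A correction to the serial touches one linked record ...
serials["dlib"]["serial_name"] = "D-Lib Magazine (corrected)"
# ... but must be repeated for every flat record that copied the old value.
stale = [a for a in flat_articles if a["serial_name"] != serials["dlib"]["serial_name"]]
```

The flat records are self-contained, which is why they are convenient for access and preservation, but both now carry a stale serial name. The linked records pick up the correction at the cost of an extra lookup in `resolve`.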

Dublin Core with flat record extension

Continuation

<relation rel-type = "InSerial">

<serial-name>D-Lib Magazine</serial-name>

<issn>1082-9873</issn>

<volume>6</volume>

<issue>1</issue>

</relation>

Events

[Figure: Version 1 plus new material yields Version 2]

Should Version 2 have its own record, or should extra information be added to the Version 1 record?

How are these represented in Dublin Core?

Minimalist versus structuralist

Minimalist

• 15 elements, no qualifiers, suitable for non-professionals

• encourages creators to provide metadata

Structuralist

• 15 elements, qualifiers, RDF, detailed coding rules

• will require trained metadata experts

[For an example of how complex Dublin Core can become, see the source of: http://purl.org/dc/documents/rec-dces-199809.htm#]

Dublin Core in many languages

See:

Thomas Baker, Languages for Dublin Core, D-Lib Magazine, December 1998, http://www.dlib.org/dlib/december98/12baker.html