Metadata standards and interoperability

  • Upload
    melva

  • Metadata standards and interoperability

  • The world of standards

    A standard is any agreed-upon means of doing something.

    Standards can be formally created and adopted or merely customary.

    With standards, products and processes have a certain level of consistency and predictability that can make production and use more efficient.

  • Goals of metadata standards

    Metadata standards enable more reliable and consistent description. For example, when catalogers agree to record the first names and last names of resource creators in separate fields, search results listed by author can be alphabetized correctly and read easily, whether the display puts the first name or the last name first.

    Reliable description facilitates the sharing of data across different systems, that is, interoperability.
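
    The first-name/last-name example can be sketched in a few lines of Python. This is only an illustration: the Creator class, its field names, and the sample names are assumptions, not part of any metadata standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Creator:
    # Hypothetical record layout: the two name parts live in separate fields.
    first_name: str
    last_name: str


def sort_key(creator: Creator) -> tuple[str, str]:
    # Alphabetize by last name, then first name, ignoring case.
    return (creator.last_name.casefold(), creator.first_name.casefold())


def display(creator: Creator, last_first: bool = True) -> str:
    # The display order is a presentation choice; it does not affect sorting.
    if last_first:
        return f"{creator.last_name}, {creator.first_name}"
    return f"{creator.first_name} {creator.last_name}"


creators = [
    Creator("Virginia", "Woolf"),
    Creator("Jane", "Austen"),
    Creator("Toni", "Morrison"),
]
by_author = sorted(creators, key=sort_key)
last_first = [display(c) for c in by_author]
first_last = [display(c, last_first=False) for c in by_author]
print(last_first)
print(first_last)
```

    Because the sort key comes from the fields rather than from the display string, both displays list the creators in the same alphabetical order.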

  • Interoperability: for money as well as love

    Interoperable records facilitate information access and exchange across contexts, not just in cultural heritage. A distributor like Amazon sells products from many different providers. To the extent that Amazon can get interoperable product records from its suppliers, its job is easier, and you can find that book, or pair of shoes, or compost bin.

  • Interoperability: for money as well as love

    Interoperable wine records? Astor Wines, Wine.com, Sherry-Lehmann, 67 Wine.

  • Types of standards

    Elings and Waibel describe four types of metadata standards:

    Data structure (attributes, elements, or fields): Dublin Core, CDWA (museums), EAD (archives), MARC (libraries).

    Data content (values): CCO (museums), RDA (libraries), DACS (archives).

    Data format: XML (and... MARC; EAD is also built around XML).

    Data exchange: Z39.50 and OAI.

    These are useful categories, but standards may straddle them. You could say, for example, that MARC reflects RDA and not the other way around: although MARC defines data fields in a technical sense, RDA defines the content with which the fields are populated and, to some degree, conceptually determines the MARC fields. In practice the two become functionally intertwined.

  • Multiple standards at work

    A cataloger uses RDA to determine:

    That a book's title should be part of its description.

    The wording, spelling, capitalization, and punctuation of the title.

    The cataloger uses MARC to record the title information in a consistent form that computers can process.

  • Multiple standards at work

    Two computer networks can use Z39.50 to determine how to exchange their MARC catalog records.

    The result? A user at Library A can search Library B's catalog and not discern a difference in the way that information is structured and presented. It just works.

  • Multiple standards at work

    An archivist uses EAD to determine that an archival finding aid should include a scope and content note.

    The archivist uses DACS for guidance on what to include in the scope and content note and how to express that content.

  • Multiple standards at work

    The archivist uses EAD to include the scope and content note in a machine-accessible document.

    The result? A researcher can access finding aids from Archive A online, and these documents have similar content and structure. Other areas of the finding aid document might appear as links in the Scope and Content note.

  • Multiple standards at work

    A museum curator is documenting a new acquisition in proprietary museum database software. The collection management system includes a field for the Work Type, which is a core attribute from CDWA. Guidance for describing the work type is given in CCO. The Art and Architecture Thesaurus (AAT) includes vocabulary terms that can be used to describe the work type.

  • Multiple standards at work

    Later, collection data is mapped from CDWA (data structure) to the Europeana Data Model (EDM) (data structure), for aggregation into Europeana and subsequent data reuse.

    In this mapping, the proprietary database format (data format) is translated to the EDM's RDF/XML schema (data format).

  • Developing and adopting standards

    Organizations agree to adopt standards because the benefits of creating products or services that work together can be great.

    However, developing standards and forging that agreement can be a difficult process.

    Metadata content standards can be complicated to use, and they leave plenty of room for interpretive flexibility.

  • Content standards: considerations

    Why are content standards so complicated? Because documents are various!

    Most content standards will try to implement a few basic guidelines supplemented by rules and options for special cases.

    Ideally, the basic guidelines will be based on clearly articulated goals and principles.

  • Example: RDA goals

    RDA has articulated a concrete set of descriptive goals and principles. A few goals:

    Enable description of any resource (not just printed materials).

    Align with the FRBR conceptual model (works, expressions, manifestations, items) and its objectives (finding, selecting, understanding, and so on).

    Create content descriptions that can be used in multiple encodings and displays.

    Retain backward compatibility with existing records.

  • Example: RDA principles

    One principle is that descriptions should reflect the resource's representation of itself.

    This is a longstanding principle in library cataloging: where possible, description = transcription.

    This can be linked to the objective of finding known items: the catalog description should match how the item is known to others, which is most likely from the item itself.

  • Example: RDA guidelines

    This principle of transcription underlies the basic guideline for RDA titles, which is that the title proper, or primary title, should come from the preferred source of information, which for books is the title page.

    While the wording comes from the title page, though, the capitalization and punctuation are standardized for all titles.

  • Example: RDA special cases

    What if...

    Some introductory words on the title page seem like they're not really part of the title (e.g., Walt Disney Presents Sleeping Beauty)?

    The title is given in two languages (e.g., Canadian Literature/Littérature canadienne)?

    There is a spelling mistake in the title?

    The document is a manifestation of a commonly known work but has a slightly different title than most manifestations (e.g., William Shakespeare's Hamlet)?

    A subtitle appears under what seems to be the main title (e.g., Museum Informatics an introductory textbook)?

    The title is over one paragraph long?

  • Keeping standards relevant

    Standards are immediately out of date.

    Particular institutions, such as the Library of Congress, will issue their own rules for interpreting the standards, which smaller organizations (such as the University of Texas) may or may not choose to adopt.

  • Levels of interoperability

    Different kinds of standards enable different kinds of interoperability. Let's say someone gives you a metadata record to incorporate in your database of records from your schema. What can you do with it?

    Your computer can read the file: system interoperability.

    Your database understands the file format: syntax interoperability.

    The attributes match other records in the database: structural interoperability.

    The values in the fields are consistent with other records in the database: semantic interoperability.
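
    The four levels can be read as successive checks on an incoming record. The sketch below is a simplification under stated assumptions: the record arrives as JSON bytes, the local schema has three made-up attributes, and "semantic" means only that the work type comes from a small hypothetical vocabulary.

```python
import json

# Hypothetical local schema and controlled vocabulary, for illustration only.
LOCAL_ATTRIBUTES = {"title", "creator", "work_type"}
WORK_TYPE_VOCABULARY = {"painting", "sculpture", "photograph"}


def interoperability_levels(raw: bytes) -> list[str]:
    """Return the levels an incoming record reaches, in order."""
    levels: list[str] = []

    # System: can the computer read the file at all?
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return levels
    levels.append("system")

    # Syntax: does the database understand the file format (JSON here)?
    try:
        record = json.loads(text)
    except json.JSONDecodeError:
        return levels
    if not isinstance(record, dict):
        return levels
    levels.append("syntax")

    # Structural: do the attributes match those of the local records?
    if set(record) != LOCAL_ATTRIBUTES:
        return levels
    levels.append("structural")

    # Semantic: are the values consistent with the local vocabulary?
    work_type = record["work_type"]
    if not isinstance(work_type, str) or work_type not in WORK_TYPE_VOCABULARY:
        return levels
    levels.append("semantic")
    return levels


good = b'{"title": "Untitled", "creator": "Unknown", "work_type": "painting"}'
print(interoperability_levels(good))
```

    Each check stops at the first failure, which mirrors the idea that a higher level of interoperability presupposes the lower ones.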

  • Derivation

    New schemas are subsets, supersets, or direct translations of existing schemas:

    CDWA Lite is a subset of CDWA (removes some attributes).

    French Dublin Core is a translated version of Dublin Core (same attributes, different labels).

    Gateway to Educational Materials (GEM) adds elements to Dublin Core.
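
    The three kinds of derivation map onto simple set relations. In the sketch below the attribute names are a small illustrative sample, the added "audience" element and the French labels are assumptions, and none of the sets reproduces a real schema in full.

```python
# A few attribute names standing in for a base schema (illustrative sample).
BASE = {"title", "creator", "subject", "date"}


def relation(derived: set[str], base: set[str]) -> str:
    """Classify a derived schema by comparing its attribute set to the base."""
    if derived == base:
        return "same attributes"
    if derived < base:
        return "subset"
    if derived > base:
        return "superset"
    return "mixed"


# Subset: some attributes removed.
lite = {"title", "creator"}

# Superset: an element added (the name is an assumption for this sketch).
extended = BASE | {"audience"}

# Translation: same attributes, different display labels (unofficial labels).
french_labels = {
    "title": "Titre",
    "creator": "Créateur",
    "subject": "Sujet",
    "date": "Date",
}

print(relation(lite, BASE))
print(relation(extended, BASE))
print(relation(set(french_labels), BASE))
```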

  • Application profiles

    Application profiles mix attributes from different existing schemas or mix usage rules for attributes from different existing schemas. The application profile for the Digital Public Library of America (DPLA) uses elements from:

    Dublin Core.

    The Europeana Data Model (EDM).

    A Basic Geo schema created by the W3C (wgs84) for simple geographic information.

    The DPLA itself (published separately from the profile).
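
    One way to picture an application profile is as a table of prefixed property names, each carrying its own usage rule. The sketch below is not a copy of the DPLA profile: the selection of properties and the mandatory flags are assumptions, and the dpla: entry is a hypothetical placeholder.

```python
# Illustrative profile: prefixed property name -> usage rule.
PROFILE = {
    "dcterms:title": {"mandatory": True},
    "dcterms:creator": {"mandatory": False},
    "edm:isShownAt": {"mandatory": True},
    "wgs84_pos:lat": {"mandatory": False},
    "wgs84_pos:long": {"mandatory": False},
    "dpla:exampleProperty": {"mandatory": False},  # hypothetical placeholder
}


def schemas_used(profile: dict[str, dict]) -> list[str]:
    """List the source schemas (by prefix) that a profile draws on."""
    return sorted({name.split(":", 1)[0] for name in profile})


def mandatory_properties(profile: dict[str, dict]) -> list[str]:
    """List the properties the profile requires, whatever their origin."""
    return sorted(name for name, rule in profile.items() if rule["mandatory"])


print(schemas_used(PROFILE))
print(mandatory_properties(PROFILE))
```

    The point of the sketch is that the profile, not the source schema, decides which elements are used and how strictly.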

  • Crosswalks

    Crosswalks are mappings from one schema to another.

    For example, a crosswalk might specify that the Title element in CDWA should be mapped to the Title element in Dublin Core.

    Crosswalks can map only schema elements that are semantically equivalent, or they can map semantically close elements to each other.
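
    A crosswalk can be sketched as a lookup table that also records how close each match is. Only the Title-to-Title rule comes from the example above; the other rules, the "exact"/"close" labels, and the sample record are assumptions for illustration.

```python
# Source element -> (target element, match quality).
CROSSWALK = {
    "Title": ("title", "exact"),        # from the example in the text
    "Creator": ("creator", "exact"),    # assumed for this sketch
    "Work Type": ("type", "close"),     # assumed semantically close match
}


def apply_crosswalk(record, crosswalk, allow_close=True):
    """Map a source record to the target schema.

    Returns the mapped record and the list of source elements left unmapped.
    """
    mapped: dict[str, list[str]] = {}
    unmapped: list[str] = []
    for element, value in record.items():
        rule = crosswalk.get(element)
        if rule is None or (rule[1] == "close" and not allow_close):
            unmapped.append(element)
            continue
        target, _quality = rule
        # Several source elements may land in the same target element.
        mapped.setdefault(target, []).append(value)
    return mapped, unmapped


source = {"Title": "Water Lilies", "Work Type": "painting", "Inscription": "signed"}
loose, loose_unmapped = apply_crosswalk(source, CROSSWALK)
strict, strict_unmapped = apply_crosswalk(source, CROSSWALK, allow_close=False)
print(loose, loose_unmapped)
print(strict, strict_unmapped)
```

    The allow_close flag makes the trade-off visible: a strict crosswalk loses less meaning per element but leaves more elements unmapped.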

  • Switching languages

    Switches map multiple schemas to a single switching language.

    For example, multiple content schemas could all be mapped to Dublin Core. The Dublin Core content could then in turn be mapped to something else. (This is more efficient than mapping each individual schema to the result.)
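
    The efficiency argument is a matter of counting: with n source schemas and m targets, direct mapping needs n × m crosswalks, while a switching language needs n + m. The sketch below illustrates both the count and the two-step translation; every mapping table and schema name in it is a made-up example, not a published crosswalk.

```python
# Crosswalks into the switching language (hub element names are illustrative).
TO_HUB = {
    "cdwa": {"Title": "title", "Work Type": "type"},
    "marc": {"245": "title", "655": "type"},
}

# Crosswalks out of the switching language to a hypothetical display schema.
FROM_HUB = {
    "display": {"title": "Name", "type": "Category"},
}


def translate(record: dict[str, str], source: str, target: str) -> dict[str, str]:
    """Translate a record in two steps: source -> hub -> target."""
    to_hub = TO_HUB[source]
    from_hub = FROM_HUB[target]
    hub_record = {to_hub[k]: v for k, v in record.items() if k in to_hub}
    return {from_hub[k]: v for k, v in hub_record.items() if k in from_hub}


def crosswalks_needed(n_sources: int, n_targets: int) -> dict[str, int]:
    """Compare direct pairwise mapping with mapping through a hub."""
    return {"direct": n_sources * n_targets, "via_hub": n_sources + n_targets}


from_cdwa = translate({"Title": "Water Lilies", "Work Type": "painting"}, "cdwa", "display")
from_marc = translate({"245": "Hamlet", "655": "Tragedies"}, "marc", "display")
print(from_cdwa, from_marc)
print(crosswalks_needed(10, 5))
```

    Adding an eleventh source schema then costs one new crosswalk into the hub rather than one per target.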

    Imagine a multilingual conversation in which everyone has a different native language but speaks French...

  • Frameworks

    A framework is a basic set of concepts and specifications that are agreed upon by a particular group.

    For example, the Warwick Framework is an early specification that designates the idea of a container as an aggregation of metadata sets, or packages.

    Agreements on ideas like containers and packages facilitate the sharing of different sorts of units. (The DPLA, for example, relies on service hubs that aggregate metadata sets from individual contributing institutions.)

  • Registries

    Registries publish information about metadata schemas.

    Registries constitute reference information that facilitates the development of new application profiles, crosswalks, and so on.

    Open Metadata Registry

  • Aggregated infrastructures

    Some examples of systems that are enabled via all of this stuff:

    Europeana, the European cultural heritage data aggregation.

    The Digital Public Library of America (DPLA).

    Europeana and the DPLA describe themselves primarily as platforms: they want you (really, they want you) to create applications and other cool stuff with the data (really metadata) that they aggregate and publish.

  • Schema assignment notes

    Consider whether attributes should be:

    Mandatory or optional.

    Repeatable.

    You might include general guidelines that apply to all attributes in your schema, as well as guidelines for each attribute. (Check the CDP best practice guidelines for an example.)

    The title proper goes in the 245 field.

    Crosswalks can also map between encodings.
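
    For the schema assignment, the mandatory/optional and repeatable decisions can be written down as data and checked mechanically. The following is a minimal sketch: the Attribute class, the three-attribute schema, and the wording of the messages are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attribute:
    name: str
    mandatory: bool
    repeatable: bool


# A hypothetical three-attribute schema.
SCHEMA = [
    Attribute("title", mandatory=True, repeatable=False),
    Attribute("creator", mandatory=False, repeatable=True),
    Attribute("date", mandatory=False, repeatable=False),
]


def validate(record: dict[str, list[str]]) -> list[str]:
    """Return a list of problems; an empty list means the record conforms."""
    problems: list[str] = []
    known = {attribute.name for attribute in SCHEMA}
    for attribute in SCHEMA:
        values = record.get(attribute.name, [])
        if attribute.mandatory and not values:
            problems.append(f"{attribute.name}: mandatory attribute is missing")
        if not attribute.repeatable and len(values) > 1:
            problems.append(f"{attribute.name}: attribute is not repeatable")
    for name in record:
        if name not in known:
            problems.append(f"{name}: not defined in the schema")
    return problems


ok = validate({"title": ["Hamlet"], "creator": ["Shakespeare, William", "Anonymous"]})
bad = validate({"date": ["1603", "1604"], "color": ["red"]})
print(ok)
print(bad)
```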