Metadata Management and Tools August 1, 2013 Data Curation Course



Page 1: Metadata Management and Tools

Metadata Management and Tools

August 1, 2013

Data Curation Course

Page 2: Metadata Management and Tools

Outline

• General information about metadata

• Metadata and the data life cycle

• DDI – a specification for documenting social, behavioral and economic data

• Exercises

Page 3: Metadata Management and Tools

Defining Metadata

• Metadata are commonly described as “data about data”

• Metadata serve as a “bridge” between the data producer and the data user

• Metadata bring data to life, helping users interpret and understand the data

Page 4: Metadata Management and Tools

Simple Example

[Figure: a simple example shown in progressively improved versions, labeled “Bad”, “Better…”, and “Best (Rich, Structured)”]

Page 5: Metadata Management and Tools

Importance of Metadata

• John MacInnes, Professor of Sociology, The University of Edinburgh, talks about the issues in using secondary data.*

• http://www.youtube.com/watch?v=xlQMVV7VJtA

* Video courtesy of MANTRA Research Data Management Training -- http://datalib.edina.ac.uk/mantra/

Page 6: Metadata Management and Tools

Concerns About Creating Metadata

• Concern: workload required to capture accurate robust metadata
  – Solution: incorporate metadata creation into data development process – distribute the effort

• Concern: time and resources to create, manage, and maintain metadata
  – Solution: include in grant budget and schedule

• Concern: readability / usability of metadata
  – Solution: use a standardized metadata format

• Concern: discipline specific information and ontologies
  – Solution: ‘profile’ standard to require specific information and use specific values

DataONE Education Module: Metadata. DataONE. Retrieved July 19, 2013

Page 7: Metadata Management and Tools

Metadata Types

• Types of metadata, by content:*

  – Descriptive: Intellectual content and contextual information relevant to understanding and interpreting data

  – Technical: Physical and digital features of a data resource

  – Structural: Configuration of a resource, connections and relationships among parts, or among related resources

*Adapted from Jenn Riley, Seeing Standards: A Visualization of the Metadata Universe

Page 8: Metadata Management and Tools

Metadata and the Data Life Cycle

• Metadata-driven life cycle: Metadata are created, but also used and reused at every stage of the data life cycle

• Ideally, metadata continue to accumulate to provide a complete record of the evolution of a dataset

Page 9: Metadata Management and Tools

Metadata and the Data Life Cycle

Rich metadata = smooth life cycle, high quality data

Page 10: Metadata Management and Tools

Structured Metadata

• Enhances the value and usability of metadata

• A consistent, predictable metadata structure enables
  – More effective searches
  – Automated management and processing
  – Resource sharing
  – Interoperability

• Standardization leads to greater efficiency
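
The benefits listed above can be illustrated with a minimal sketch. The field names (`title`, `keywords`, `time_period`) and the three records are hypothetical, invented only for illustration; they are not taken from any metadata standard. The point is that once every record follows the same predictable structure, searching and completeness checks can be automated.

```python
from typing import Any, Dict, List

# Hypothetical required fields for this toy example (not from any standard).
REQUIRED_FIELDS = ("title", "keywords", "time_period")

# Hypothetical study records that all share one predictable structure.
records: List[Dict[str, Any]] = [
    {"title": "Household Survey A", "keywords": ["income", "health"], "time_period": "2010"},
    {"title": "School Survey B", "keywords": ["education"], "time_period": "2011"},
    {"title": "Untitled file", "keywords": []},  # deliberately missing time_period
]


def missing_fields(record: Dict[str, Any]) -> List[str]:
    """Automated management: report which required fields a record lacks."""
    return [name for name in REQUIRED_FIELDS if name not in record]


def search_by_keyword(recs: List[Dict[str, Any]], keyword: str) -> List[str]:
    """More effective search: case-insensitive match on the keywords field."""
    wanted = keyword.lower()
    return [
        r["title"]
        for r in recs
        if wanted in (k.lower() for k in r.get("keywords", []))
    ]


print(search_by_keyword(records, "Health"))
print(missing_fields(records[2]))
```

With free-text documentation, neither function could be written without guessing where each piece of information lives.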

Page 12: Metadata Management and Tools

Standards

Cartoon courtesy of XKCD.com

Page 13: Metadata Management and Tools

What is DDI?

• A metadata standard of and for the community

• Two major development lines
  – DDI Codebook
  – DDI Lifecycle

• Metadata for both human and machine consumption

• Additional specifications:
  – Controlled vocabularies
  – RDF vocabularies for use with Linked Data

Page 14: Metadata Management and Tools

DDI Background and History

• Its development started in the mid-1990s, as a grant-funded effort initiated and organized by ICPSR, with international participation

• First version published in February 2000

Page 15: Metadata Management and Tools

Background and History Continued

• The DDI Alliance was formed in 2003 to support and develop the DDI standard

  http://www.ddialliance.org/

• Ever-growing number of DDI users; large multinational projects
  – CESSDA data portal (20 European data archives)
  – International Household Survey Network – IHSN (developing countries from Africa, Asia, former Soviet Union, and more recently, Latin America)

Page 16: Metadata Management and Tools

DDI Members and Projects Worldwide

Page 17: Metadata Management and Tools

DDI Specification

• The first versions of DDI (1.0 through 2.1) were document- and codebook-centric

• Version 3.0 was published in April 2008 to document the data life cycle

Page 18: Metadata Management and Tools

RDF Vocabularies for Semantic Web

• DDI-RDF Discovery Vocabulary
  o For publishing metadata about datasets into the Web of Linked Data
  o Based on DDI Codebook and DDI Lifecycle

• XKOS
  o RDF vocabulary for describing statistical classifications, which is an extension of the popular SKOS vocabulary

Publication expected in second half of 2013

Page 19: Metadata Management and Tools

DDI of the Future

• Robust and persistent data model (for the metadata), with extension possibilities, variety of technical expressions

• Complete data life cycle coverage

• Broadened focus for new research domains

• Simpler specification that is easier to understand and use, including better documentation

Page 20: Metadata Management and Tools

Benefits of DDI Approach

• Rich content (currently over 800 items)

• Metadata reuse across the life cycle

• Machine-actionability

• Data management and curation

• Support for longitudinal data and comparison

Page 21: Metadata Management and Tools

Metadata Reuse

Page 22: Metadata Management and Tools

DDI Alignment with Other Metadata Standards

• MARC: DDI-C, DDI-L

• Dublin Core: DDI-C, DDI-L

• SDMX (Statistical Data and Metadata Exchange): DDI-L

• ISO 11179 (Metadata Registries): DDI-L

• FGDC (Digital Geospatial Metadata): DDI-L

• ISO 19115 (Geographic Information Metadata): DDI-L

• PREMIS (Preservation Metadata), METS (Metadata Encoding and Transmission): under consideration
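
Alignment between standards is usually put to work through a crosswalk, a table that maps fields in one standard to fields in another. The sketch below shows the idea for DDI-C and Dublin Core. The mapping is an illustrative assumption made for this example, not the official DDI-to-Dublin-Core mapping, and the input record is hypothetical.

```python
from typing import Dict, List

# Illustrative crosswalk from DDI-C element names to Dublin Core element names.
# ASSUMPTION: these pairings are a plausible toy mapping, not an official one.
CROSSWALK: Dict[str, str] = {
    "titl": "title",
    "AuthEnty": "creator",
    "abstract": "description",
    "keyword": "subject",
    "geogCover": "coverage",
    "IDNo": "identifier",
}


def to_dublin_core(ddi_fields: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Translate a flat dict of DDI-C fields into Dublin Core fields.

    Fields with no entry in CROSSWALK are skipped, which shows how detail
    can be lost when moving from a richer standard to a simpler one.
    """
    dc: Dict[str, List[str]] = {}
    for ddi_name, values in ddi_fields.items():
        dc_name = CROSSWALK.get(ddi_name)
        if dc_name is None:
            continue  # no mapping in this toy crosswalk
        dc.setdefault(dc_name, []).extend(values)
    return dc


# Hypothetical input record.
example = {"titl": ["Example Study"], "keyword": ["income", "health"], "dataKind": ["survey data"]}
print(to_dublin_core(example))
```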

Page 23: Metadata Management and Tools

DDI-L or DDI-C?

• DDI-L
  – Complex data (hierarchical, longitudinal, comparative)
  – Metadata-driven survey design (building questionnaires)
  – Multiple languages
  – Detailed geographic information
  – Metadata reuse across the data life cycle
  – Reusable resources: question/concept/variable banks, registries of organizations and individuals, etc.

Page 24: Metadata Management and Tools

DDI-L or DDI-C?

• DDI-C
  – Documentation of simple, survey-type data
  – Catalog records, involving mainly study-level descriptions (most new features in DDI-L relate to documenting data at item/variable level)

• Both DDI-C and DDI-L may be used within the same organization

• ICPSR uses DDI-C but has translation to DDI-L for study-level records

Page 25: Metadata Management and Tools

DDI-C Structure and Contents

DDI-C main sections:

1. Document Description
   Self-referencing information about the DDI instance at hand. Usually for internal use, not publicly displayed

2. Study Description
   General information about the study. Input is usually the introductory part of a codebook, describing the study scope, methodology, topical/temporal coverage, etc. In DDI-C this section also includes data access and availability information

3. File Description
   Describes physical characteristics of data file(s) – name, format, structure, dimensions

4. Data Description
   Detailed description of each variable, including variable groups if applicable. Special subsection for documenting census-type aggregate data

• Other (Study Related) Materials
   References, or contains materials used in the production of the study or useful in the analysis of the data

For complete content and Tag Library see http://www.ddialliance.org/Specification/DDI-Codebook/2.1/DTD/Documentation/DDI2-1-tree.html
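
As a rough illustration of this layout, the sketch below builds an empty DDI-C skeleton with Python's standard `xml.etree.ElementTree` module. The element names (`codeBook`, `docDscr`, `stdyDscr`, `fileDscr`, `dataDscr`, `otherMat`) are given as commonly used in DDI Codebook 2.x and should be checked against the Tag Library linked above. Namespaces, required attributes, and all real content are omitted, so the output is not a valid DDI instance.

```python
import xml.etree.ElementTree as ET

# Main DDI-C sections in document order: (element name, description).
# ASSUMPTION: names as used in DDI Codebook 2.x; verify against the Tag Library.
SECTIONS = [
    ("docDscr", "Document Description"),
    ("stdyDscr", "Study Description"),
    ("fileDscr", "File Description"),
    ("dataDscr", "Data Description"),
    ("otherMat", "Other (Study Related) Materials"),
]


def build_codebook_skeleton() -> ET.Element:
    """Return a codeBook element holding one empty child per main section."""
    root = ET.Element("codeBook")
    for tag, description in SECTIONS:
        section = ET.SubElement(root, tag)
        # An XML comment marks where real content for this section would go.
        section.append(ET.Comment(f" {description} "))
    return root


skeleton = build_codebook_skeleton()
print(ET.tostring(skeleton, encoding="unicode"))
```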

Page 26: Metadata Management and Tools

Study-level DDI Elements at ICPSR

• Study ID (Number, DOI)
• Title, Alternate Title
• Author/Primary Investigator
• Bibliographic Citation
• Funding Information
• Abstract
• Keywords/Topic Classification
• Series Information
• Geographic Coverage
• Time Period Covered
• Time Method
• Date(s) of Collection
• Mode of Collection
• Universe
• Sampling
• Unit of Analysis
• Response Rates
• Weighting Information
• Data Type
• Extent of Processing
• Access Conditions/Restrictions
• Version History
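
Because these elements sit at predictable places in the XML, a few of them can be pulled out with very little code. The sketch below parses a made-up DDI-C-style fragment. The study, its ID, and its abstract are hypothetical; the element paths follow DDI Codebook 2.x as commonly documented and should be checked against the Tag Library. Real instances usually declare an XML namespace, which this simplified sample omits.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional, Union

# Hypothetical, simplified DDI-C-style fragment (no namespace, invented content).
SAMPLE = """
<codeBook>
  <stdyDscr>
    <citation>
      <titlStmt>
        <titl>Example Study of Household Income</titl>
        <IDNo>EX-0001</IDNo>
      </titlStmt>
      <rspStmt><AuthEnty>Doe, Jane</AuthEnty></rspStmt>
    </citation>
    <stdyInfo>
      <subject>
        <keyword>income</keyword>
        <keyword>households</keyword>
      </subject>
      <abstract>A made-up abstract used only for illustration.</abstract>
    </stdyInfo>
  </stdyDscr>
</codeBook>
"""

Summary = Dict[str, Union[Optional[str], List[Optional[str]]]]


def study_summary(xml_text: str) -> Summary:
    """Extract a handful of study-level fields from a DDI-C-style document."""
    root = ET.fromstring(xml_text.strip())
    study = root.find("stdyDscr")
    if study is None:
        raise ValueError("no stdyDscr section found")
    return {
        "id": study.findtext("citation/titlStmt/IDNo"),
        "title": study.findtext("citation/titlStmt/titl"),
        "author": study.findtext("citation/rspStmt/AuthEnty"),
        "abstract": study.findtext("stdyInfo/abstract"),
        "keywords": [k.text for k in study.findall("stdyInfo/subject/keyword")],
    }


summary = study_summary(SAMPLE)
print(summary["title"], summary["keywords"])
```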

Page 27: Metadata Management and Tools

Study-level DDI at ICPSR

• Leveraged in several ways

o Data discovery -- Forms basis of Solr/Lucene faceted search

o Repurposing -- Record is reused across ICPSR’s topical archive sites

o Interoperating -- Records shared with Data-PASS, ODESI, and CESSDA archives

o Study Overview -- Becomes PDF overview bundled with each download

Example: www.icpsr.umich.edu/icpsrweb/ICPSR/studies/30103
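
To make the data discovery point concrete, here is a minimal sketch of what faceted search computes. It does not use Solr or Lucene and is not ICPSR's implementation; it only shows, in plain Python over hypothetical records and field names, how consistent study-level metadata lets a system count studies per facet value and then narrow the result set.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical study-level records; titles, fields, and values are invented.
studies: List[Dict[str, str]] = [
    {"title": "Study A", "geography": "United States", "data_type": "survey data"},
    {"title": "Study B", "geography": "United States", "data_type": "administrative records"},
    {"title": "Study C", "geography": "Canada", "data_type": "survey data"},
]


def facet_counts(records: List[Dict[str, str]], field: str) -> Counter:
    """Count how many records carry each value of the given field."""
    return Counter(r[field] for r in records if field in r)


def filter_by(records: List[Dict[str, str]], field: str, value: str) -> List[Dict[str, str]]:
    """Narrow the result set to records whose field equals the chosen value."""
    return [r for r in records if r.get(field) == value]


print(facet_counts(studies, "geography"))
print([r["title"] for r in filter_by(studies, "data_type", "survey data")])
```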

Page 28: Metadata Management and Tools

DDI at ICPSR: Study-level Metadata Editor

Page 29: Metadata Management and Tools

DDI at ICPSR: Study-level Metadata Editor

Page 30: Metadata Management and Tools

Variable-level DDI elements at ICPSR

• Variable name and ID
• Variable label
• Question text
• Descriptive variable text
• Category labels and values (responses)
• Category statistics (frequencies)
• Summary statistics
• Variable format
• Notes
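
Several of these elements can be sketched as a single DDI-C-style `var` element. The example below computes category frequencies from raw responses and writes them next to the labels. The variable name, label, question text, and responses are hypothetical; the element names (`var`, `labl`, `qstn`, `qstnLit`, `catgry`, `catValu`, `catStat`) follow DDI Codebook 2.x as commonly documented and should be verified against the Tag Library. Namespaces and most optional attributes are left out.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from typing import Dict, List


def build_var(
    name: str,
    label: str,
    question: str,
    categories: Dict[int, str],
    responses: List[int],
) -> ET.Element:
    """Build a simplified DDI-C-style <var> element with frequencies."""
    counts = Counter(responses)
    # The "V_" ID prefix is an arbitrary choice made for this example.
    var = ET.Element("var", {"ID": f"V_{name}", "name": name})
    ET.SubElement(var, "labl").text = label
    qstn = ET.SubElement(var, "qstn")
    ET.SubElement(qstn, "qstnLit").text = question
    for value, category_label in categories.items():
        catgry = ET.SubElement(var, "catgry")
        ET.SubElement(catgry, "catValu").text = str(value)
        ET.SubElement(catgry, "labl").text = category_label
        # Category statistic: how many responses fall into this category.
        ET.SubElement(catgry, "catStat", {"type": "freq"}).text = str(counts[value])
    return var


# Hypothetical variable and responses, for illustration only.
var = build_var(
    name="Q1",
    label="Hypothetical yes/no item",
    question="Hypothetical question text?",
    categories={1: "Yes", 2: "No"},
    responses=[1, 2, 1, 1],
)
print(ET.tostring(var, encoding="unicode"))
```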

Page 31: Metadata Management and Tools

Variable-level DDI at ICPSR

• Variable-level DDI leveraged in several ways

  o Search -- Permits search of variables within a dataset/series
  o Search across ICPSR -- Serves as foundation for Social Science Variables Database
  o Integration with online analysis
  o Codebook with frequencies -- Enables generation of PDF documentation

• Example: http://www.icpsr.umich.edu/icpsrweb/ICPSR/ssvd/studies/30103/datasets/1/variables/Q25

Page 32: Metadata Management and Tools

Tools for generating DDI metadata

• Nesstar Publisher
  – DDI-C, study, file, and variable level

• Colectica
  – DDI-L configuration, study and variable level
  – Both DDI-C and DDI-L compatible (import and export)
  – Exports DDI and PDF, HTML, RTF documentation (no need to re-convert to presentation formats)

• Colectica for Excel

Page 33: Metadata Management and Tools

Tools continued

• XCONVERT (SDA Berkeley)
  – DDI-C, variable level: converts SAS, SPSS, or Stata syntax into DDI-XML, without frequencies

• Stat/Transfer (v. 11)
  – DDI-L, variable level: no frequencies

• MQDS tool
  – Exports Blaise to DDI-L to create study documentation

Page 34: Metadata Management and Tools

Tools continued

• More DDI tools can be found here: http://www.ddialliance.org/resources/tools

Page 35: Metadata Management and Tools

Questions?