GRAD 521, Research Data Management Winter 2014 – Lecture 9 Amanda L. Whitmire, Asst. Professor Data documentation through metadata


Lesson topics

1. Definition of metadata

2. Examine information included in a metadata record

3. Examples of metadata standards and how to choose

4. Illustrate the value of metadata to data users, data providers, and organizations

5. Describe the utility of metadata for a variety of scenarios beyond discovery


The data lifecycle


Data collection

CC images on Flickr by Justin See, CIMMYT, acordova, kukkurovaca, SEDAC, and ISAS


From field notes to datasets

Average temperature of observation for each species

Species                  | Average Temperature | Temperature Standard Deviation | Number of Observations | Minimum Temperature | Maximum Temperature
Northern Red-legged Frog |  4.4                | ---                            |   1                    |  4.4                |  4.4
Tailed Frog              |  7.0                |  3.0                           |   3                    |  4                  | 10
Arizona Toad             | 10.0                | ---                            |   1                    | 10                  | 10
Strecker's Chorus Frog   | 10.5                |  2.0                           |  11                    |  9                  | 16
Oregon Spotted Frog      | 11.0                | 15.5                           |   2                    |  0                  | 22
New Jersey Chorus Frog   | 11.5                |  4.5                           |  17                    |  3                  | 22
Wood Frog                | 12.5                |  5.5                           | 897                    |  0                  | 28.8
Spring Peeper            | 13.2                |  5.6                           | 569                    | -1                  | 32
Red-legged Frog          | 13.3                |  5.9                           |  16                    |  4                  | 27
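
A minimal sketch of how one row of such a summary could be computed from raw field observations. The function name `summarize` is hypothetical, and the input values are inferred from the table rather than taken from the original field notes: a species with two observations, a minimum of 0, and a maximum of 22 must have been observed at 0 and 22. The sample standard deviation of those two values is about 15.56, close to the 15.5 reported. With a single observation the sample standard deviation is undefined, which is presumably why the table shows `---`.

```python
import statistics

def summarize(temperatures):
    """Return summary statistics for one species' observation temperatures."""
    n = len(temperatures)
    return {
        "average": statistics.mean(temperatures),
        # The sample standard deviation needs at least two observations.
        "std_dev": statistics.stdev(temperatures) if n > 1 else None,
        "n": n,
        "minimum": min(temperatures),
        "maximum": max(temperatures),
    }

# Two observations consistent with the Oregon Spotted Frog row (min 0, max 22).
oregon_spotted = summarize([0, 22])

# One observation, as in the Arizona Toad row.
arizona_toad = summarize([10])
```

Without documentation, a reader of the table cannot tell whether the deviation is a sample or population statistic, or what the temperature units are; that is exactly the kind of detail metadata should record.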


From datasets to published papers

CC image by Heather Kennedy on Flickr


Working with data

When you provide data to someone else, what types of information would you want to include with the data?

When you receive a dataset from an external source, what types of details do you want to know about the data?


Working with data

Providing data:
o Why were the data created?
o What limitations, if any, do the data have?
o What do the data mean?
o How should the data be cited if they are re-used in a new study?

Receiving data:
o What are the data gaps?
o What processes were used for creating the data?
o Are there any fees associated with the data?
o At what scale were the data created?
o What do the values in the tables mean?
o What software do I need in order to read the data?
o What projection are the data in?
o Can I give these data to someone else?


What is metadata?

“Data about data”

“Structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource.”

NISO, Understanding Metadata


Metadata

“The metadata accompanying your data should be written for a user 20 years into the future -- what does that person need to know to use your data properly? Prepare the metadata for a user who is unfamiliar with your project, methods, or observations.”

Oak Ridge National Laboratory Distributed Active Archive Center for Biogeochemical Dynamics (ORNL DAAC)


What is metadata?

WHO created the data?
WHAT is the content of the data?
WHEN were the data created?
WHERE is it geographically?
HOW were the data developed?
WHY were the data developed?

Photo by Michelle Chang. All Rights Reserved

Metadata is: Data ‘reporting’
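
The six questions can be treated as a checklist. The sketch below is one hypothetical way to do that: the field names, the `unanswered` helper, and every value in `record` are placeholders invented for illustration, not part of any metadata standard.

```python
QUESTIONS = ("who", "what", "when", "where", "how", "why")

def unanswered(record):
    """Return the questions that a metadata record leaves blank."""
    return [q for q in QUESTIONS if not record.get(q, "").strip()]

# Hypothetical record; every value is a placeholder, not real project data.
record = {
    "who": "Name and contact details of the data creator",
    "what": "Temperature at each amphibian observation, by species",
    "when": "",   # collection dates not yet recorded
    "where": "",  # study area not yet recorded
    "how": "Field observations transcribed from notebooks",
    "why": "Course example",
}

missing = unanswered(record)
```

Here `missing` lists the questions a future user could not answer from the record alone.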


Levels of metadata

PROJECT LEVEL: Descriptive information

DATA LEVEL: Granular information


Metadata in real life

You use it all the time…


Metadata standards

Dublin Core (DC), Darwin Core (DwC), EML, DDI, NBII, FGDC/CSDGM, ISO 19139, ISO 19115, DIF, LDIF, e-GMS, AGLS, METS, MODS, PREMIS, OAI-PMH, MARC, CDWA, CIDOC/CRM, DACS, DIG35, GILS, GML, ISBD, LCSH, KML, MARCXML, MEI, MODS, MIX, OAIS, ANSI/NISO Z39.88, PB Core, PRISM, QDC, RDF, SGML, VSO, XML, XMP


What is a metadata standard?

A standard provides a structure to describe data with:
o Common terms to allow consistency between records
o Common definitions for easier interpretation
o Common language for ease of communication
o Common structure to quickly locate information

In search and retrieval, standards provide:
o Documentation structure in a reliable and predictable format for computer interpretation
o A uniform summary description of the dataset

CC image by ccarlstead on Flickr


What does a metadata record look like?

Ocean Currents and Biogeochemistry: Nearshore Water Profiles (Monthly CTD and Chemistry; SBC-LTER) (web link)

New York City Community Health Survey, 2009 (ICPSR) (web link)

Mountain hemlock tree-ring width chronologies from the western Oregon Cascade Mountains (USFS Research Data Archive) (web link)
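
To give a rough sense of what a machine-readable record looks like, here is a minimal sketch that writes three Dublin Core elements as XML using Python's standard library. It is an illustration only: the archives listed above publish far richer records, most likely in other standards, and the wrapper element name `metadata` and the helper `build_record` are assumptions. The title and publisher are taken from the third example above; `Dataset` is a term from the DCMI Type vocabulary.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

def build_record(fields):
    """Build a small XML record with one Dublin Core element per field."""
    root = ET.Element("metadata")  # wrapper name is an assumption
    for name, value in fields.items():
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return root

record = build_record({
    "title": "Mountain hemlock tree-ring width chronologies "
             "from the western Oregon Cascade Mountains",
    "publisher": "USFS Research Data Archive",
    "type": "Dataset",
})

xml_text = ET.tostring(record, encoding="unicode")
```

Because every element name comes from a shared standard, software that has never seen this dataset can still find its title and publisher.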


Muddiest point…

What did you find unclear about the concept of metadata?


Concerns about creating metadata

Even if the value of data documentation is recognized, concerns remain as to the effort required to create metadata that effectively describe the data.


Concerns about creating metadata

Concern | Solution
workload required to capture accurate, robust metadata | incorporate metadata creation into the data development process; distribute the effort
time and resources to create, manage, and maintain metadata | include in grant budget and schedule
readability / usability of metadata | use a standardized metadata format
discipline-specific information and ontologies | ‘profile’ a standard to require specific information and use specific values
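
The idea of a profile, requiring specific information and restricting fields to specific values, can be sketched as a simple check. Everything named here is hypothetical: the required fields, the controlled vocabulary, and the `check_profile` function are invented for illustration and do not come from any published profile.

```python
# Hypothetical profile: fields every record must fill in.
REQUIRED_FIELDS = ("title", "abstract", "keywords")

# Hypothetical controlled vocabulary for the keywords field.
ALLOWED_KEYWORDS = {"amphibians", "temperature", "phenology"}

def check_profile(record):
    """Return a list of ways the record fails the profile (empty if none)."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not record.get(field):
            problems.append(f"missing required field: {field}")
    for keyword in record.get("keywords", []):
        if keyword not in ALLOWED_KEYWORDS:
            problems.append(f"keyword not in controlled vocabulary: {keyword}")
    return problems

problems = check_profile({
    "title": "Frog observation temperatures",
    "keywords": ["temperature", "frogs"],
})
```

In this example the record fails twice: it has no abstract, and `frogs` is not an allowed keyword.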


The value of metadata

Metadata helps…
o Data creators
o Data users
o Organizations


What is the value to data creators?

Metadata allows data creators to:
o Avoid data duplication
o Share reliable information
o Publicize efforts: promote the work of a scientist and his/her contributions to a field of study

CC image by US Embassy Guyana on Flickr


What is the value to data users?

Metadata gives a user the ability to:
o Search, retrieve, and evaluate data set information from both inside and outside an organization
o Find data: Determine what data exist for a geographic location and/or topic
o Determine applicability: Decide if a data set meets a particular need
o Discover how to acquire the dataset you identified; process and use the dataset

CC image by ASEE on Flickr


What is the value to organizations?

Metadata helps ensure an organization’s investment in data
o Documentation of data processing steps, quality control, definitions, data uses, and restrictions
o Ability to use data after initial intended purpose

Transcends people & time
o Offers data permanence
o Creates institutional memory

Advertises an organization’s research
o Creates possible new partnerships and collaborations through data sharing


Information Entropy

[Figure: DATA DETAILS (vertical axis) against TIME (horizontal axis), annotated with the stages below. From Michener et al 1997]

Time of data development

Specific details about problems with individual items or specific dates are lost relatively rapidly

General details about datasets are lost through time

Accident or technology change may make data unusable

Retirement or career change makes access to “mental storage” difficult or unlikely

Loss of data developer leads to loss of remaining information


Information Entropy

[Figure: DATA DETAILS (vertical axis) against TIME (horizontal axis)]

Sound information management, including metadata development, can arrest the loss of dataset detail.


A closer look: the utility of metadata

Metadata can support:
o data distribution
o data management
o [project management]

If it is:
o considered a component of the data
o created during data development
o populated with rich content

[Workflow diagram; labels: collect, derive, classify, planimetric, imagery, analysis, alternative, committee review, charette, PLAN; "meta" appears at several steps]


Data distribution via metadata

metadata publication

data portals

data discovery


Distribution: data discovery

The descriptive content of the metadata file can be used to identify, assess, and access available data resources.

IDENTIFY
• keywords
• geographic location
• time period
• attributes

ASSESS
• use constraints
• access constraints
• data quality
• availability/pricing

ACCESS
• online access
• order process
• contacts
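
The identify step can be sketched as a filter over a collection of metadata records. The catalog below is entirely hypothetical (titles, keywords, years, and constraints are invented), as are the field names and the `find_records` function; a real portal would query records written to a metadata standard.

```python
def find_records(records, keyword=None, year=None):
    """Identify records by keyword and/or by a year within their time period."""
    matches = []
    for record in records:
        if keyword is not None and keyword not in record["keywords"]:
            continue
        if year is not None and not (record["start_year"] <= year <= record["end_year"]):
            continue
        matches.append(record)
    return matches

# Hypothetical catalog; all values are invented for illustration.
catalog = [
    {"title": "Dataset A", "keywords": ["ocean", "chemistry"],
     "start_year": 2000, "end_year": 2012,
     "use_constraints": "cite the data provider", "online_access": True},
    {"title": "Dataset B", "keywords": ["ocean", "currents"],
     "start_year": 2010, "end_year": 2013,
     "use_constraints": "none", "online_access": False},
    {"title": "Dataset C", "keywords": ["health", "survey"],
     "start_year": 2009, "end_year": 2009,
     "use_constraints": "registration required", "online_access": True},
]

# IDENTIFY: ocean datasets that cover 2005.
identified = find_records(catalog, keyword="ocean", year=2005)

# ASSESS and ACCESS: read the constraints and access details of each match.
summary = [(r["title"], r["use_constraints"], r["online_access"]) for r in identified]
```

None of these steps touch the data themselves; the metadata alone is enough to decide whether a dataset is worth requesting.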


Distribution: metadata publication

A metadata collection can be published to the internet via:

o website catalog
o web accessible folder (WAF)
o Z39.50 metadata clearinghouse
o metadata service
o geospatial data portal

[Diagram labels: User Query, Internet, Metadata Collection, Internet / Intranet, Dataset]


Distribution: data portals

Examples of metadata search portals:

Data.gov: Federal e-gov geospatial data portal
http://www.geo.data.gov

Metacat: Repository for data and metadata
http://knb.ecoinformatics.org/index.jsp

US Geological Survey: USGS Core Science Metadata Clearinghouse
http://mercury.ornl.gov/clearinghouse

ICPSR: Political and Social Science data portal


Data management via metadata

Data Accountability

Discovery & Re-use

Maintenance & Update

Data Liability


Management: maintenance & update

Metadata records can be used to track data provenance and accuracy.

Data Maintenance:
• Are the data current?
o Do we have data older than ten years?
o Were the data collected before some political or geophysical event that resulted in significant change?
• Are the data valid?
o Were they created prior to the most current source data?
o Were they created prior to the most current methodologies?

Data Update:
• Contact information
• Distribution policies, availability, pricing, URLs
• New derivations of the dataset
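
The "older than ten years" question is easy to automate once each record carries a date. A minimal sketch, assuming a hypothetical `data_date` field and invented records; the reference date is passed in explicitly so the result does not depend on when the code is run.

```python
from datetime import date

def needs_review(record, today, max_age_years=10):
    """Flag a record whose data are older than max_age_years."""
    age_years = (today - record["data_date"]).days / 365.25
    return age_years > max_age_years

# Hypothetical records; titles and dates are invented for illustration.
records = [
    {"title": "Dataset A", "data_date": date(2001, 6, 1)},
    {"title": "Dataset B", "data_date": date(2012, 6, 1)},
]

stale = [r["title"] for r in records if needs_review(r, today=date(2014, 3, 1))]
```

The same pattern extends to the validity questions, for example comparing `data_date` with the date of a known event or of the latest source data.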


Discovery: data reuse

If you create metadata, other people can discover your data

If you create metadata, you can find your own data

CC image by Oceanit Daily Photo on Flickr


Management: data discovery & reuse

Find your data by:
o themes / attributes
o geographic location
o time ranges
o analytical methods used
o sources & contributors
o data quality

Discoverable data is usable data!

CC image by NASA Goddard Space Flight Center on Flickr


Management: data accountability

Metadata allows you to repeat a scientific process if:
o methodologies are defined
o variables are defined
o analytical parameters are defined

Metadata allows you to defend your scientific process:
o demonstrate process
o an increasingly GIS-savvy public requires metadata for consumer information

[Diagram labels: INPUT, RESULTS]


Management: data accountability

Metadata is an exercise in data accountability. It requires you to assess:

o What do you know about the dataset?
o What don’t you know about the dataset?
o What should you know about the dataset?

Are you willing to associate yourself with the metadata record?


Management: data liability

Metadata is a declaration of:

Purpose
o the originator’s intended application of the data

Use Constraints
o inappropriate applications of the data

Completeness
o features or geographies excluded from the data

Distribution Liability
o explicit liability of the data producer and assumed liability of the consumer

What to do…

What not to do…


Review: the utility of metadata

Metadata can support:

Data distribution
o discovery
o metadata publication
o data portals

Data management
o maintenance & update
o discovery & reuse
o data accountability
o data liability

[Project management]


Choosing Metadata Standards

Image courtesy of Viv Hutchinson


Multiple standards exist

Darwin Core | biological diversity, taxonomy

Dublin Core | general

DDI (Data Documentation Initiative) | social & behavioral sci.

DIF (Directory Interchange Format) | environmental sci.

EML (Ecological Metadata Language) | ecology, biology

ISO 19115 | geographic data

Browse by discipline: http://www.dcc.ac.uk/resources/metadata-standards


Comparing metadata standards

EML | FGDC
Title | Title
Abstract | Abstract
Entity Description | Entity Type Definition
Intellectual Rights | Use Constraints
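
Because the two standards collect similar information, a record can be translated between them with a crosswalk. The sketch below uses the four field pairs from the comparison above; it works on the human-readable field labels, not the actual XML element names of either standard, and the `crosswalk` function and the sample record are hypothetical.

```python
# Field pairs taken from the EML / FGDC comparison above.
EML_TO_FGDC = {
    "Title": "Title",
    "Abstract": "Abstract",
    "Entity Description": "Entity Type Definition",
    "Intellectual Rights": "Use Constraints",
}

def crosswalk(eml_record):
    """Rename EML field labels to FGDC labels; report fields with no mapping."""
    converted, unmapped = {}, []
    for field, value in eml_record.items():
        if field in EML_TO_FGDC:
            converted[EML_TO_FGDC[field]] = value
        else:
            unmapped.append(field)
    return converted, unmapped

# Hypothetical record; values are placeholders.
fgdc_record, unmapped = crosswalk({
    "Title": "Frog observation temperatures",
    "Intellectual Rights": "Placeholder rights statement",
    "Sampling Notes": "Placeholder text with no mapping in this table",
})
```

Fields without a counterpart are returned separately rather than dropped, since deciding where they belong in the target standard needs a human.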


Choosing a metadata standard

Many standards collect similar information

Factors to consider:
1. Your data type
• raster/vector GIS data, images, surveys/text, etc.
2. Organization [funder] policies
3. Future preservation/sharing location
4. Tools to support creation & distribution
5. Other factors: Availability of human support; instructional materials; use of controlled vocabularies; output formats


Summary
o Metadata is documentation of data
o A metadata record captures critical information about the content of a dataset
o Metadata allows data to be discovered, accessed, and re-used
o A metadata standard provides structure and consistency to data documentation
o Standards and tools vary; select according to defined criteria such as data type, organizational guidance, and available resources
o Metadata is of critical importance to data developers, data users, and organizations
o Metadata can be effectively used for:
• data distribution
• data management
• project management
o Metadata completes a dataset.

Creating robust metadata is in your OWN best interest!


On Thursday

Barnard Classroom, 5th Floor