Science Metadata Viv Hutchison Core Science Analytics and Synthesis US Geological Survey Denver, CO [email protected] Data Management Practices for Early Career Scientists NACP All Investigator Meeting February 3, 2013


Page 1: Science Metadata

Science Metadata

Viv Hutchison
Core Science Analytics and Synthesis
US Geological Survey
Denver, CO
[email protected]

Data Management Practices for Early Career Scientists
NACP All Investigator Meeting
February 3, 2013

Page 2: Science Metadata

NACP Best Data Management Practices, February 3, 2013

Topics

• Examine information included in a metadata record
• Examples of metadata standards and how to choose one to use
• Illustrate the value of metadata to data users, data providers, and organizations
• Tips on how to write quality metadata records
• Publishing metadata

CC image by Alec Couros on Flickr

Page 3: Science Metadata


The Data Life Cycle


Page 4: Science Metadata

Data Collection

CC images by Justin See, CIMMYT, acordova, kukkurovaca, SEDAC, and ISAS on Flickr

Page 5: Science Metadata

From Field Notes to Datasets

Average Temperature of Observation for Each Species

Species | Average Temperature | Temperature Standard Deviation | Number of Observations | Minimum Temperature | Maximum Temperature
Northern Red-legged Frog | 4.4 | --- | 1 | 4.4 | 4.4
Tailed Frog | 7.0 | 3.0 | 3 | 4 | 10
Arizona Toad | 10.0 | --- | 1 | 10 | 10
Strecker's Chorus Frog | 10.5 | 2.0 | 11 | 9 | 16
Oregon Spotted Frog | 11.0 | 15.5 | 2 | 0 | 22
New Jersey Chorus Frog | 11.5 | 4.5 | 17 | 3 | 22
Wood Frog | 12.5 | 5.5 | 897 | 0 | 28.8
Spring Peeper | 13.2 | 5.6 | 569 | -1 | 32
Red-legged Frog | 13.3 | 5.9 | 16 | 4 | 27
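As a rough illustration of how a summary table like the one above might be derived from field notes, here is a minimal Python sketch. The function name `summarize` and the `field_notes` dictionary are assumptions made for this example, not part of the original slides. The two input rows are the only ones that can be recovered from the table itself: a species with two observations, a minimum of 0, and a maximum of 22 must have been observed at 0 and 22, and a species with one observation has that single value. The sketch also assumes the table reports a sample standard deviation, which would explain the "---" entries for species with a single observation.

```python
from statistics import mean, stdev


def summarize(temps):
    """Return (average, standard deviation, count, minimum, maximum)."""
    n = len(temps)
    # A sample standard deviation needs at least two observations;
    # None stands in for the "---" shown in the table.
    sd = stdev(temps) if n > 1 else None
    return mean(temps), sd, n, min(temps), max(temps)


# Hypothetical field notes: species -> temperature at each observation.
field_notes = {
    "Oregon Spotted Frog": [0, 22],
    "Arizona Toad": [10],
}

summary = {species: summarize(temps) for species, temps in field_notes.items()}

for species, row in summary.items():
    print(species, row)
```

For the Oregon Spotted Frog the computed standard deviation is about 15.56, which is consistent with the 15.5 shown on the slide if that value was truncated rather than rounded.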

Page 6: Science Metadata

From Datasets to Published Papers

CC image by Heather Kennedy on Flickr

Page 7: Science Metadata

Metadata is a critical part of the data picture

CC image by I like on Flickr

Page 8: Science Metadata

Think about Scenarios in Working with Data

• Providing data to another researcher:
  – Why were the data created?
  – What limitations, if any, do the data have?
  – What do the data mean?
  – How should the data be cited if they are re-used in a new study?

• Receiving data from another researcher:
  – What are the data gaps?
  – What processes were used for creating the data?
  – Are there any fees associated with the data?
  – At what scale were the data created?
  – What do the values in the tables mean?
  – What software do I need in order to read the data?
  – What projection are the data in?
  – Can I give these data to someone else?

Page 9: Science Metadata

Why Care About Metadata?

• Fourth Paradigm: scientific breakthroughs will increasingly be powered by advanced computing capabilities that help researchers manipulate and explore massive datasets.
• “Metadata must be preserved when scientific data is generated…” -- Jim Gray, The Fourth Paradigm
• The greater the distance in time and space between the data producer and the re-user, the more detailed the metadata must be.

Page 10: Science Metadata

Metadata: Why Care?

“Please forgive my paranoia about protocols, standards, and data review. I'm in the latter stages of a long career with USGS (30 years, and counting), and have experienced much. Experience is the knowledge you get just after you needed it.

Several times, I've seen colleagues called into court in order to testify about conditions they have observed.

Without a strong tradition of constant review and approval of basic data, they would've been in deep trouble under cross-examination. Instead, they were able to produce field notes, data approval records, and the like, to back up their testimony.

It's one thing to be questioned by a college student who is working on a project for school. It's another entirely to be grilled by an attorney under oath with the media present.”

Nelson Williams
Eastern Region
USGS Water

Page 11: Science Metadata


Metadata: Why Care?

The climate scientists at the centre of a media storm over leaked emails were yesterday cleared of accusations that they fudged their results and silenced critics, but a review found they had failed to be open enough about their work.

Senior climatologists were accused of manipulating important global temperature data

Investigations emphasized need for data to be more open to ensure credibility and avoid future misguided controversy

Metadata aids in open science


Page 12: Science Metadata

Metadata: Why Care?

“Planet hidden in Hubble archives”, Science News (Feb. 27, 2009)

A new image processing technique reveals something not before seen in this Hubble Space Telescope image taken 11 years ago: A faint planet (arrows), the outermost of three discovered with ground-based telescopes last year around the young star HR 8799.
D. Lafrenière et al., Astrophysical Journal Letters

“The first thing it tells you is how valuable maintaining long-term archives can be. Here is a major discovery that’s been lurking in the data for about 10 years!” comments Matt Mountain, director of the Space Telescope Science Institute in Baltimore, which operates Hubble.

…Metadata is critical in maintaining data in archives – for understanding data you discover

Page 13: Science Metadata

The Value of Metadata

Metadata helps…

• Data developers
• Data users
• Organizations

Page 14: Science Metadata

What is the Value to Data Developers?

• Metadata allows data developers to:
  – Avoid data duplication
  – Share reliable information
  – Publicize efforts – promote the work of a scientist and his/her contributions to a field of study
  – Reduce workload

CC image by US Embassy Guyana on Flickr

Page 15: Science Metadata

What is the Value to Data Users?

• Metadata gives a user the ability to:
  – Search, retrieve, and evaluate data set information from both inside and outside an organization
  – Find data: Determine what data exist for a geographic location and/or topic
  – Determine applicability: Decide if a data set meets a particular need
  – Discover how to acquire the dataset you identified; process and use the dataset

CC image by ASEE on Flickr

Page 16: Science Metadata

What is the Value to Organizations?

• Metadata helps ensure an organization’s investment in data:
  – Documentation of data processing steps, quality control, definitions, data uses, and restrictions
  – Ability to use data after initial intended purpose
• Transcends people and time:
  – Offers data permanence
  – Creates institutional memory
• Advertises an organization’s research:
  – Creates possible new partnerships and collaborations through data sharing

CC image by mambol on Flickr

Page 17: Science Metadata

Information Entropy

Figure: DATA DETAILS (vertical axis) versus TIME (horizontal axis), beginning at the time of data development (From Michener et al 1997)

• Specific details about problems with individual items or specific dates are lost relatively rapidly
• General details about datasets are lost through time
• Accident or technology change may make data unusable
• Retirement or career change makes access to “mental storage” difficult or unlikely
• Loss of data developer leads to loss of remaining information

Page 18: Science Metadata

Memory Check

50% change in global average

Why?

i checked my 2002 email archives, and here is what i found out:

it appears that the current 3rd generation algorithm was implemented into operations around Oct-Nov 2002 time frame. cannot say more precisely, as all email correspondence i am looking at, talks about this indirectly. (maybe it's what's refered to as the Phase II algorithm.) At the same time, we had implemented quite a few other changes fixing data bugs and formats: view angle problem, increased digitization in all channel's reflectances and AODs, etc.

The jump is deemed due to introducing 3rd generation algorithm, which replaced the 2nd generation. The new numbers (~0.08) look more realistic than the previous ones (~0.05 or so). The changes seen in the data is close to the expected effect of this change. The 3rd gen alg takes into account the exact spectral response, whereas the 2nd gen is generic ("one size fits all").

hopefully this settles the issue..

Page 19: Science Metadata

Information Entropy

Figure: DATA DETAILS (vertical axis) versus TIME (horizontal axis)

Sound information management, including metadata development, can arrest the loss of dataset detail.

Page 20: Science Metadata

Still…There are Occasional Concerns About Creating Metadata

Even if the value of data documentation is recognized, concerns remain as to the effort required to create metadata that effectively describe the data.

CC image by waterlilysage on Flickr

Page 21: Science Metadata

Let’s Address these Concerns…

Concern | Solution
workload required to capture accurate, robust metadata | incorporate metadata creation into data development process – distribute the effort
time and resources to create, manage, and maintain metadata | include in grant budget and schedule
readability / usability of metadata | use a standardized metadata format
discipline specific information and ontologies | use ‘profile’ standard to require specific information and use specific values

Page 22: Science Metadata


Selecting a Standard


Page 23: Science Metadata

Choosing a Metadata Standard

Many standards collect similar information…factors to consider:

Your data type:
• Are you working mainly with GIS data? Raster/vector or point data? Do you have biological or shoreline information in your dataset?
  – Consider the FGDC Content Standard for Digital Geospatial Metadata with one of its profiles: the Biological Data Profile or the Shoreline Data Profile.
• Are you working with data retrieved from instruments such as monitoring stations or satellites? Are you using geospatial data services such as web-mapping applications or data modeling?
  – If so, then consider using the ISO 19115-2 standard
• Are you mainly working with ecological data?
  – Consider Ecological Metadata Language (EML)

Page 24: Science Metadata

Choosing a Metadata Standard

– Your organization’s policies: do they state which standard to use?
– What tools are available to create metadata? Examples of tools:
    FGDC CSDGM: Mermaid (NOAA), Metavist (Forest Service), Online Metadata Editor (USGS)
    EML: Morpho (KNB)
    ISO: XML Spy or Oxygen, CatMD

Other factors: availability of human support; instructional materials; use of controlled vocabularies; output formats
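The selection questions on the two "Choosing a Metadata Standard" slides can be read as a simple lookup. The Python sketch below is only an illustration of that reading: the function name `suggest_standard`, the category keys, and the `org_policy` parameter are all assumptions invented for this example, and real choices will also depend on the tools and support available.

```python
from typing import Optional

# Hypothetical category keys mapped to the suggestions given on the slides.
SUGGESTIONS = {
    "gis": "FGDC CSDGM",
    "biological": "FGDC CSDGM with the Biological Data Profile",
    "shoreline": "FGDC CSDGM with the Shoreline Data Profile",
    "instrument": "ISO 19115-2",
    "geospatial_service": "ISO 19115-2",
    "ecological": "Ecological Metadata Language (EML)",
}


def suggest_standard(data_type: str, org_policy: Optional[str] = None) -> str:
    """Suggest a metadata standard for a data type.

    If the organization's policy names a standard, that takes precedence.
    """
    if org_policy:
        return org_policy
    key = data_type.strip().lower()
    return SUGGESTIONS.get(key, "no suggestion: check organizational guidance")


print(suggest_standard("ecological"))
print(suggest_standard("gis", org_policy="ISO 19115-2"))
```

Putting the organizational policy first reflects the slide's point that policy may already settle the question before data type is considered.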

Page 25: Science Metadata


Writing Quality Metadata


Page 26: Science Metadata

Steps to Create Quality Metadata

• Organize your information
  – Did you write a project abstract to obtain funding for your proposal? Re-use it in your metadata!
  – Did you use a lab notebook or other notes during the data development process that define measurements and other parameters?
  – Do you have the contact information for colleagues you worked with?
  – What about citations for other data sources you used in your project?

CC image on Google Images

Page 27: Science Metadata

Steps to Create Quality Metadata

• Write your metadata using a metadata tool

Page 28: Science Metadata

Steps to Create Quality Metadata

• Review for accuracy and completeness
• Have someone else read your record
• Revise the record, based on comments from your reviewer
• Review once more before you publish

CC images by mujalifah and Shelly Munkberg on Flickr

Page 29: Science Metadata

Tips for Writing Quality Metadata

• Do not use jargon -- define technical terms and acronyms:
  – CA, LA, GPS, GIS : what do these mean?
• Clearly state data limitations
  – E.g., data set omissions, completeness of data
  – Express considerations for appropriate re-use of the data
• Use “none” or “unknown” meaningfully
  – None usually means that you knew about data and nothing existed (e.g., a “0” cubic feet per second discharge value)
  – Unknown means that you don’t know whether that data existed or not (e.g., a null value)

CC image by kruuscht on Flickr
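The "none" versus "unknown" distinction above maps naturally onto the difference between a measured zero and a missing value. The Python sketch below illustrates that mapping under stated assumptions: the function name `describe_discharge` and the choice of `None` to represent a null value are illustrative, not something the slides prescribe.

```python
from typing import Optional


def describe_discharge(value: Optional[float]) -> str:
    """Describe a stream discharge reading for a metadata record."""
    if value is None:
        # Null: we do not know whether a measurement existed.
        return "unknown"
    if value == 0:
        # A real measurement that found no flow.
        return "none"
    return f"{value} cubic feet per second"


for reading in (0, None, 12.5):
    print(repr(reading), "->", describe_discharge(reading))
```

The null check comes first on purpose: a test such as `if not value` would treat a measured 0 and a missing value the same way, which is exactly the confusion the tip warns against.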

Page 30: Science Metadata

Tips for Writing Quality Metadata

Titles, Titles, Titles…

• Titles are critical in helping readers find your data
  – While individuals are searching for the most appropriate data sets, they are most likely going to use the title as the first criterion to determine if a dataset meets their needs.
  – Treat the title as the opportunity to sell your dataset.
• A complete title includes: What, Where, When, Who, and Scale
• An informative title includes: topic, timeliness of the data, specific information about place and geography

Page 31: Science Metadata

Tips for Writing Quality Metadata

• A Clear Choice: Which title is better?
  – Rivers
  OR
  – Greater Yellowstone Rivers from 1:126,700 U.S. Forest Service Visitor Maps (1961-1983)

Greater Yellowstone (where) Rivers (what) from 1:126,700 (scale) U.S. Forest Service (who) Visitor Maps (1961-1983) (when)

CC image by dolfi on Flickr
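One way to make the What, Where, When, Who, and Scale checklist concrete is to assemble the title from named parts and refuse to build it when a part is missing. The Python sketch below does that for the Yellowstone example. The function name `build_title`, the extra `source` parameter (for "Visitor Maps"), and the word order are assumptions chosen to reproduce the slide's title; they are not a rule from any metadata standard.

```python
def build_title(what, where, when, who, scale, source):
    """Assemble a dataset title and fail loudly if a component is missing."""
    parts = {
        "what": what, "where": where, "when": when,
        "who": who, "scale": scale, "source": source,
    }
    missing = [name for name, value in parts.items() if not value.strip()]
    if missing:
        raise ValueError("title is missing: " + ", ".join(missing))
    return f"{where} {what} from {scale} {who} {source} ({when})"


title = build_title(
    what="Rivers",
    where="Greater Yellowstone",
    when="1961-1983",
    who="U.S. Forest Service",
    scale="1:126,700",
    source="Visitor Maps",
)
print(title)
```

Calling the function with only "Rivers" filled in raises an error that lists every missing component, which mirrors why the one-word title is the weaker choice.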

Page 32: Science Metadata

Tips for Writing Quality Metadata

• Be specific and quantify when you can! The goal of a metadata record is to give the user enough information to know if they can use the data without contacting the dataset owner.

Vague: We checked our work and it looks complete.

Specific: We checked our work using a random sample of 5 monitoring sites reviewed by 2 different people. We determined our work to be 95% complete based on these visual inspections.

CC image by PNASH on Flickr

Page 33: Science Metadata

Tips for Writing Quality Metadata

• Use descriptive and clear writing
• Fully qualify geographic locations
• Select keywords wisely - use thesauri for keywords whenever possible
  Example: USGS Biocomplexity Thesaurus (over 9,500 terms)

CC image by Marco Arment on Flickr

Page 34: Science Metadata

Tips for Writing Quality Metadata

• Remember: a computer will read your metadata
• Do not use symbols that could be misinterpreted:
  Examples: ! @ # % { } | / \ < > ~
• Do not use tabs, indents, or line feeds/carriage returns
• When copying and pasting from other sources, use a text editor (e.g., Notepad) to eliminate hidden characters

CC image by Ben on Google Images
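The checks in the list above can be automated before a record is submitted. The Python sketch below scans a text field for the symbols listed on the slide, for tabs and line breaks, and for invisible control or format characters. The function name `find_problems`, the category labels, and the use of Unicode categories "Cc" and "Cf" as a stand-in for "hidden characters" are assumptions made for this example.

```python
import unicodedata

# Symbols the slide lists as open to misinterpretation.
RISKY_SYMBOLS = set("!@#%{}|/\\<>~")


def find_problems(text):
    """Return (position, character, reason) for each suspect character."""
    problems = []
    for pos, ch in enumerate(text):
        if ch in RISKY_SYMBOLS:
            problems.append((pos, ch, "symbol"))
        elif ch in "\t\n\r":
            problems.append((pos, ch, "tab or line break"))
        elif unicodedata.category(ch) in ("Cc", "Cf"):
            # Control and format characters, e.g. a zero-width space
            # picked up when pasting from a word processor.
            problems.append((pos, ch, "hidden character"))
    return problems


sample = "Stream flow <cfs>\t2010"
for pos, ch, reason in find_problems(sample):
    print(pos, repr(ch), reason)
```

A check like this only reports positions; deciding whether to delete or reword is left to the author, since a slash or percent sign may be carrying real meaning that should be spelled out instead.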

Page 35: Science Metadata

Tips for Writing Quality Metadata

• Fully define entities, attributes, units of measure
• Resist the temptation to fill in only the mandatory fields in the standard -- sections labeled “mandatory if applicable” or “optional” are often critical portions of the standard
  – Example:

Seven Major Metadata Sections:
  Section 1 - Identification Information*
  Section 2 - Data Quality Information
  Section 3 - Spatial Data Organization Information
  Section 4 - Spatial Reference Information
  Section 5 - Entity and Attribute Information
  Section 6 - Distribution Information
  Section 7 - Metadata Information*

Three Supporting Sections:
  Section 8 - Citation Information*
  Section 9 - Time Period Information*
  Section 10 - Contact Information*

* Minimum required metadata
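A quick completeness check for the starred sections can be sketched in Python with the standard library's XML parser. The element names used here (`idinfo`, `metainfo`, `citeinfo`, `timeinfo`, `cntinfo`) are the short tag names commonly used in FGDC CSDGM XML records; treat them, the function name `missing_sections`, and the tiny sample record as assumptions for illustration, and check the standard or your tool's output before relying on them.

```python
import xml.etree.ElementTree as ET

# Assumed CSDGM short names for the sections starred on the slide.
REQUIRED = {
    "idinfo": "Identification Information",
    "metainfo": "Metadata Information",
    "citeinfo": "Citation Information",
    "timeinfo": "Time Period Information",
    "cntinfo": "Contact Information",
}


def missing_sections(xml_text):
    """List the required sections that do not appear anywhere in the record."""
    root = ET.fromstring(xml_text)
    present = {element.tag for element in root.iter()}
    return [name for tag, name in REQUIRED.items() if tag not in present]


# Hypothetical, deliberately incomplete record.
record = """
<metadata>
  <idinfo>
    <citation><citeinfo><title>Greater Yellowstone Rivers</title></citeinfo></citation>
  </idinfo>
</metadata>
"""

print(missing_sections(record))
```

This only confirms that the required sections are present. It says nothing about the "mandatory if applicable" and optional sections, which is where the slide argues much of the useful detail lives.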

Page 36: Science Metadata

Share Your Metadata: Distribution

• Share your metadata with other researchers
  Examples of metadata search portals:
  – Data.gov
    • Federal e-gov geospatial data portal
    • http://www.geo.data.gov
  – Metacat
    • Repository for data and metadata
    • http://knb.ecoinformatics.org/index.jsp
  – US Geological Survey
    • USGS Core Science Metadata Clearinghouse: http://mercury.ornl.gov/clearinghouse
  – ArcGIS Online
    • ESRI sponsored national geospatial data portal
    • http://www.geographynetwork.com

CC image by RGB12 on Flickr

Page 37: Science Metadata


DataONE Search


Page 38: Science Metadata

Summary

• Metadata is documentation of data
• A metadata record captures critical information about the content of a dataset
• Metadata allows data to be discovered, accessed, and re-used
• A metadata standard provides structure and consistency to data documentation
• Standards and tools vary – select according to defined criteria such as data type, organizational guidance, and available resources
• Metadata is of critical importance to data developers, data users, and organizations
• Writing quality metadata is important because records are expected to last with the data over decades
• Metadata completes a dataset.

Creating robust metadata is in your OWN best interest!

Page 39: Science Metadata

Additional Slides
