Data Standards & Best Practices. Kerstin Lehnert, Lamont-Doherty Earth Observatory. iedadata.org

Data Standards & Best Practices for the Stratigraphic Record

Data Standards & Best Practices
Kerstin Lehnert
Lamont-Doherty Earth Observatory

iedadata.org

Vouchering the Stratigraphic Record

A synthesis database?
• Aggregates data that are published in articles or in data repositories
• Requirements: integration, quality (trusted data!)
• Needs standardized metadata, semantics, and persistent unique identifiers

A trusted repository?
• Publishes and ensures persistent access to data
• Requirements: compliance with international data curation and repository standards
• Long-term preservation, data identification (DOI), editorial procedures, etc.

Data Standards

“documented agreements on representation, format, definition, structuring, tagging, transmission, manipulation, use, and management of data.”

• Discipline specific
• Data type specific
• Application specific

Data Standards: Why?

Re-usability of data

Reproducibility of science

Integration/interoperability of data

Reproducibility in the Field Sciences

Workshop in May 2015, organized by AAAS (M. McNutt), AGU, and ESA, funded by the Arnold Foundation
Report in preparation

Technical Requirements for Transparent, Reproducible Data
1. The data themselves must be publicly available in machine-readable, non-proprietary formats with accurate and precise descriptive metadata;
2. Data provenance (process(es) by which usable datasets were generated or derived from raw, often streaming or machine-readable-only data) must be accurately and precisely specified;
3. Computer code (“scripts”) and software with which datasets were analyzed must be available and adequately described to ensure their repeated use and be publicly available in non-proprietary formats; and
4. Version control should be used to ensure that the original data and code are maintained.

(from draft workshop report)

Coalition for Publishing Data in the Earth & Space Sciences (COPDESS)

Joint initiative of Earth Science publishers and Data Facilities to better help translate the aspirations of open, available, and useful data from policy into practice.
• Reaffirm and ensure adherence to existing journal and publishing policies and society position statements regarding open data sharing and archiving of data, tools, and models.
• Ensure that Earth science data will, to the greatest extent possible, be stored in community approved repositories that can provide additional data services.

Statement of Commitment signed by all major Earth & Space Science publishers

www.copdess.org

Repository Standards

Open access

Data quality assurance (editorial process)

Persistence (long-term preservation)

Persistent & unique identification of data (DOI registration)

Standards-based metadata (ISO) & APIs (OAI-PMH)
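
To make the API item concrete, here is a minimal sketch of how a harvester might build OAI-PMH request URLs. The endpoint `https://repository.example.org/oai` and the helper name `oai_pmh_url` are assumptions made for illustration; `ListRecords`, `Identify`, and the `oai_dc` metadata prefix come from the OAI-PMH protocol itself.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, used only for illustration; a real repository
# publishes its own OAI-PMH base URL.
BASE_URL = "https://repository.example.org/oai"


def oai_pmh_url(verb, base_url=BASE_URL, **arguments):
    """Build an OAI-PMH request URL for a protocol verb and its arguments."""
    query = {"verb": verb}
    query.update(arguments)
    return f"{base_url}?{urlencode(query)}"


# Ask the repository to describe itself.
identify_url = oai_pmh_url("Identify")

# Harvest records as unqualified Dublin Core (the prefix every
# OAI-PMH repository is required to support).
list_url = oai_pmh_url("ListRecords", metadataPrefix="oai_dc")

print(identify_url)
print(list_url)
```

The sketch only constructs URLs; fetching and parsing the XML responses is left out so that it runs without network access.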

[Figure: "Adding Value", from small data to BIG DATA, across Generic Repositories, Community Data Collections, and Domain Repositories. Labels: findable; accessible; re-usable; interoperable; identification, persistence; protection, protocols; context, provenance; harmonized, machine-readable.]

Distributed Data Curation

Alert: Stratigraphy is multi-disciplinary

There are many data types that already have homes
• Paleobio Database
• Macrostrat/Digital Crust
• Geochron (@IEDA)
• MagIC
• Open Core Data (@IEDA – under development)
• EarthChem (@IEDA)
• System for Earth Sample Registration (@IEDA)

Don’t reinvent, but leverage, link, & integrate!

EarthCube

EarthCube: A Process

Get all the info at: http://earthcube.org

[Figure labels: Computer Sciences; Software Engineers; Scientific Vision; Technical Architecture; Engagement; Funded Projects]

Back to Data Standards

• Metadata
• Content
• Structure (data model)
• Vocabularies & Taxonomies
• Identifiers

(API = Application Programming Interface)

Metadata Standards

Geospatial

Scientific Context

Object classifications

Methods (instrumentation, computation, etc.)

Actions (dates, actors)

Data provenance (references, authors, etc.)

Open Geospatial Consortium (OGC): Observations & Measurements

[Figure labels: Observation; Result; Feature of Interest; Sampling; Sampling Feature (e.g. Station, Transect, Section)]

“Observations commonly involve sampling of an ultimate feature of interest. This International Standard defines a common set of sampling feature types classified primarily by topological dimension, as well as samples for ex-situ observations.” (OGC O&M 2.0.0 / ISO 19156; editor: Simon Cox)
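
The relationship between an observation, its sampling feature, and the ultimate feature of interest can be sketched in a few lines. This is a simplified illustration, not the normative O&M schema: the class and attribute names loosely follow O&M terms, while the example values and the helper `ultimate_feature` are assumptions.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class FeatureOfInterest:
    """The real-world feature the science is ultimately about."""
    name: str


@dataclass(frozen=True)
class SamplingFeature:
    """A station, transect, section, or specimen that samples a feature."""
    name: str
    kind: str  # e.g. "Station", "Transect", "Section"
    sampled_feature: FeatureOfInterest


@dataclass(frozen=True)
class Observation:
    """An act that assigns a result to a property of a feature."""
    feature_of_interest: Union[FeatureOfInterest, SamplingFeature]
    observed_property: str
    procedure: str
    result: float
    unit: str


def ultimate_feature(obs: Observation) -> FeatureOfInterest:
    """Follow a sampling feature back to the feature it samples."""
    foi = obs.feature_of_interest
    return foi.sampled_feature if isinstance(foi, SamplingFeature) else foi


# Illustrative values only.
basin = FeatureOfInterest("example sedimentary basin")
section = SamplingFeature("Section A", "Section", basin)
obs = Observation(section, "bed thickness", "tape measurement", 0.42, "m")

print(ultimate_feature(obs).name)
```

Keeping the sampling feature separate from the ultimate feature of interest is what lets data from different sections or stations be integrated for the same basin.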

Observation Data Model v2

Kerstin Lehnert: "Making small data BIG: Insights from a Long-tail Geoscience Domain"

ODM2 Team: J S Horsburgh, A K Aufdenkampe, L Hsu, A Jones, K Lehnert, E Mayorga, L Song, D Tarboton, I Zaslavsky

Data Templates

LPSC 2015 Workshop: Restoration and Synthesis of Planetary Geochemical Data

Persistent Unique Identifiers

• Samples: IGSN
• Dataset: DOI
• Article publication: DOI
• Awards & grants: FundRef
• Researchers: ORCID
• Field Program: Cruise ID

Data DOI Metadata

Internet of Samples in the Earth Sciences

Physical samples need to be linked to the digital data generated by their study.
• Reproducibility! Access to the physical samples is required to verify & reproduce observations.
• Re-usability! Access to information about samples is required for proper evaluation & interpretation of sample-based data.

Physical samples need to be shared broadly for use & re-use.
• Samples are often expensive to collect (drilling, remote locations).
• Many samples are unique and irreplaceable.
• Re-analysis augments utility of existing data.
• Samples often serve in ways that the collectors and repositories could not have imagined.

3/26/2015

Unique Sample Identification

Imagine the possibilities …
• Easily find a specific sample and contact its owner
• Find all publications that mention a specific sample
• Find all data for that sample across the literature and distributed databases
• Find other samples with similar properties (geospatial, temporal, compositional)

Sample Identification Until Now

Samples have ambiguous and non-persistent names and cannot be properly cited.

The EarthChem Portal shows 75 publications with geochemical data referenced to a sample with the name M1 (or M-1). (www.earthchem.org)

[Figure: Names of dredge sample 3 of the Amphitrite cruise (PetDB database, www.petdb.org)]

Sample Identification From Now: IGSN (International Geo Sample Number)

• Persistent unique identifier for physical objects in the Earth Sciences
• Global uniqueness guaranteed via governance by the IGSN e.V.
• Persistent access and preservation of sample metadata
• Cataloguing services of IGSN e.V. members
• Allows a central search engine to be built
• Resolving service of the IGSN central registry
• Does not replace personal or institutional naming protocols
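
A toy registry illustrates why a unique identifier complements, rather than replaces, local sample names. This is a sketch under stated assumptions: the class `SampleRegistry` is hypothetical, and the identifier strings are made up and do not follow the real IGSN syntax or use any IGSN service.

```python
from collections import defaultdict


class SampleRegistry:
    """Toy in-memory registry: unique identifiers, non-unique local names."""

    def __init__(self):
        self._by_id = {}
        self._by_name = defaultdict(list)

    def register(self, identifier, local_name, owner):
        # Uniqueness is enforced on the identifier, never on the name.
        if identifier in self._by_id:
            raise ValueError(f"identifier already registered: {identifier}")
        self._by_id[identifier] = {"name": local_name, "owner": owner}
        self._by_name[local_name].append(identifier)

    def resolve(self, identifier):
        """Return the metadata for exactly one sample."""
        return self._by_id[identifier]

    def find_by_name(self, local_name):
        """Return every identifier that shares this local name."""
        return list(self._by_name.get(local_name, []))


registry = SampleRegistry()
# Two labs both call a sample "M1"; the made-up identifiers keep them apart.
registry.register("XXX000001", "M1", "Lab A")
registry.register("XXX000002", "M1", "Lab B")

print(registry.find_by_name("M1"))
print(registry.resolve("XXX000002")["owner"])
```

Each lab keeps its own name "M1", while a citation of the identifier points to exactly one physical sample.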

IGSN: Examples

[Figure labels: Oriented Core; Drill Hole (ODP); Soil Section; Rock Specimen]

IGSN Status

• International governance established in 2011
• 14 members (organizations) in the IGSN e.V. (www.igsn.org)
• ca. 4 million samples registered (registration tripled in 2014)
• >350 active users, including
  • increasing number of individual scientists
  • sample repositories & museums (Smithsonian, marine cores)
  • geological surveys (USGS, Geoscience Australia, BGR)
  • large-scale observatories and sampling campaigns (ICDP, IODP, CZO, DCO, GeoPRISMS, etc.)

IGSN Adoption

IGSN Adoption

COPDESS Statement of Commitment

IGSN in Action

IGSN in Action: Publications

Metadata

• Identification: sample name(s), registrant
• Description: material, classification, age, size, comments
• Geospatial information: geographical names, coordinates
• Collection: expedition/cruise, platform, date, collector, technique
• Archiving/access: physical location of sample (repository), contact
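
The metadata groups above could be captured in a simple record type. This is a sketch only: the class `SampleMetadata` and its field names are illustrative assumptions derived from the slide, not the official IGSN or SESAR metadata schema, and the example values are invented.

```python
from dataclasses import dataclass, asdict
from typing import Optional, Tuple


@dataclass
class SampleMetadata:
    # Identification
    sample_name: str
    registrant: str
    # Description
    material: Optional[str] = None
    classification: Optional[str] = None
    age: Optional[str] = None
    size: Optional[str] = None
    comments: Optional[str] = None
    # Geospatial information
    geographical_name: Optional[str] = None
    coordinates: Optional[Tuple[float, float]] = None  # (latitude, longitude)
    # Collection
    expedition: Optional[str] = None
    platform: Optional[str] = None
    collection_date: Optional[str] = None
    collector: Optional[str] = None
    technique: Optional[str] = None
    # Archiving/access
    repository: Optional[str] = None
    contact: Optional[str] = None


# Invented example record.
record = SampleMetadata(
    sample_name="M1",
    registrant="Example Registrant",
    material="Rock",
    coordinates=(45.0, -120.0),
)

# Keep only the fields that were actually filled in, e.g. for export.
filled = {key: value for key, value in asdict(record).items() if value is not None}
print(filled)
```

Only the identification fields are required here; everything else is optional so that legacy samples with sparse documentation can still be described.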

IGSN Sample “Genealogy”

Extended IGSN Metadata

• Images
• Documents (.pdf, .xls, .doc)
• References
• URLs for related data resources
• User defined metadata

Internet of Samples in the Earth Sciences

iSamples RCN

Advance use of innovative CI to connect physical samples across the Earth Sciences with digital data infrastructure

Goals:
• Improve discovery, access, and re-usability of physical samples
• Improve re-usability and reproducibility of the data generated by their study

[Figure labels: Registries & Catalogs; Metadata; Identifiers; Citation; Repositories; Software Tools; Taxonomies]

C4P: Collaboration & Cyberinfrastructure for Paleoscience
An EarthCube Research Coordination Network

Unravel the large-scale, long-term evolution of the Earth-Life System through the study of the geological record

Major challenges C4P addresses:
• Heterogeneous & dispersed data
• Modeling of age & time
• Legacy & ‘dark’ data
• Limited interoperability among resources
• Variable semantics & ontologies

A diverse community: paleobiology, paleoclimate, paleoceanography, geochemistry, dendrochronology, stratigraphy, geochronology, sample curation, data management, bioinformatics, semantics, software architecture, and more ...

C4P achievements:
• New resources
• Data & software catalogs
• Educational materials (webinars)
• New collaborations
• Convergence on best practices (samples, age, taxonomy)

Take Away Messages

develop leading practices for data

get community buy-in

align & coordinate with existing leading practices

leverage existing infrastructure

get started and don’t let the challenges stop you

The Cultural Challenges

“The Hitchhiker’s Guide to Geoinformatics” (Lee Allison, LISTMG Workshop 2004)

“Building an International Collaboration for Geoinformatics” (Walter Snyder, AGU 2005)

“Cyberinfrastructure for Solid Earth Geochemistry” (Kerstin Lehnert, GSA 2003)

Thank You!

“The wonderful thing about standards is that there are so many of them to choose from.” (Grace Hopper)