Cyndy Chandler 22 July 2011 Biological and Chemical Oceanography Data Management Office (BCO-DMO) SOST IWG-OP Biodiversity Ad Hoc Committee ~ July 2011 Quarterly Meeting ~ Washington, DC

Page 1

Cyndy Chandler 22 July 2011

Biological and Chemical Oceanography Data Management Office

(BCO-DMO)

SOST IWG-OP Biodiversity Ad Hoc Committee ~ July 2011 Quarterly Meeting ~ Washington, DC

Page 2

BCO-DMO
What is BCO-DMO?
Who is BCO-DMO?
Why is BCO-DMO different?
How do we accomplish our task?

Outline

Discussion: Data Management for Biodiversity Research

Page 3

BCO-DMO staff provide data management support for investigators and projects funded by the NSF Ocean Sciences Biological and Chemical Oceanography Sections or the NSF OPP ANT Organisms & Ecosystems Program

partner with individual investigators and those associated with collaborative research projects
data management support throughout the project
capture and record documentation (metadata) sufficient to support data reuse and re-purposing
load data and metadata into a relational database and ensure their availability online
ensure final archive in appropriate data center (e.g. NODC); contribute to special repositories (e.g. CDIAC, OBIS, GenBank)

‘proposal to preservation’

What is BCO-DMO?

Page 4

BCO-DMO Staff

Biology Department
Peter Wiebe (Lead Investigator)
Robert Groman (co-PI)
Dicky Allison (Data Specialist)
Tobias Work (Programmer)

Marine Chemistry and Geochemistry
David Glover (co-PI)
Cyndy Chandler (co-PI)
Stephen Gegg (Data Specialist)

additional data specialists, consultants and collaborators as needed

Who is BCO-DMO?

Page 5

BCO-DMO staff are funded to …
support NSF OCE and OPP funded researchers
ensure that data are …
  available to the research community in a timely manner
  sufficiently documented to facilitate reuse and re-purposing
work with investigators during all phases of research:
  data management planning and stewardship
  proposal writing
  cruise preparation
  cruise and data documentation
  effective organization of data in the BCO-DMO data system

permanent archive of data at NODC

Why is BCO-DMO different?

Page 6

How do we accomplish our task?

BCO-DMO staff work in partnership with PIs to create well-documented data sets from research programs involving a wide variety of sampling gear

Page 7

Data Discovery and Availability

our primary task is to ensure that data from NSF OCE funded awards are freely available online

the BCO-DMO data system and interfaces facilitate
  data discovery (text and map-based browse systems)
  data access to assess fitness-for-purpose
  data export and download
  data preservation in a permanent archive (the National Oceanographic Data Center (NODC))

How do we accomplish our task?

Page 8

Field Data to Database

in situ data from research cruises are documented and contributed to the online data system and discoverable through a variety of user interfaces

How do we accomplish our task?

Original data from Bongo net tows and CTD/Niskin Rosette

Page 9

MOCNESS data – paper to digital

“Data Management in the Wild” ~ MOCNESS Data

hauled in by people . . .
. . . the samples are processed by people, observations recorded by people, and digital data sets created by people

[Figure panels: MOCNESS Sampling; raw biology data; raw physical data; digital biology data; digital physical data; CTD sensor data]

Page 10

MapServer Starting Screen

http://bco-dmo.org/BCO-DMO

Geospatial MapServer interface showing all available data.

Page 11

MapServer with selections

access to data

Page 12

BCO-DMO staff work in partnership with PIs to create well-documented data sets to enable reuse and re-purposing of data to support US contributions to large coordinated research programs and global ocean research themes

How do we accomplish our task?

Page 13

BCO-DMO and Other Data Repositories

BCO-DMO is part of a network of distributed data repositories working to support the research community and to ensure that data are available in the public domain.

Carbon Dioxide Information Analysis Center

North American Carbon Program

Long Term Ecological Research Network

National Center for Biotechnology Information: GenBank

Rolling Deck to Repository (R2R)

How do we accomplish our task?

Page 14

“A scholar’s positive contribution is measured by the sum of the original data that he contributes. Hypotheses come and go but data remain.”

In: Advice to a Young Investigator (Santiago Ramón y Cajal, 1897)

Thank you.
Questions?

photo by Chris Linder (WHOI)

http://bco-dmo.org/

Page 15

What additional cyber-infrastructure is needed to support biodiversity research?

What else is needed to support biodiversity research?

The remaining slides are a supplement to the talk that may be useful during the data management discussion.

Page 16

NSF Dimensions of Biodiversity Program
data from 9 awards to be managed by BCO-DMO

NSF OCE #1046144
Dimensions: The Role of Viruses in Structuring Biodiversity in Methanotrophic Marine Ecosystems

NSF OCE #1046017 and #1046098
Dimensions: Significance of nitrification in shaping planktonic biodiversity in the ocean

NSF OCE #1045966, #1046001, #1046368 and #1046297
Dimensions: Biological controls on the ocean C:N:P ratios

NSF OCE #1046371 and #1046372
Dimensions: Uncovering the novel diversity of the copepod microbiome and its effect on habitat invasions by the copepod host

What else is needed to support biodiversity research?

Page 17

Marine Biodiversity Operation Network

Extended research network being considered

Page 18

Infrastructure Options

Challenge:
there are currently many sources with overlapping and/or incomplete information
researchers must locate resources, resolve conflicts/duplicates, review and ‘repair’ retrieved data

Strategies and Solutions:
data warehousing – extract, transfer, load data
data federation – network of distributed repositories; data remain at the source and are retrieved on demand
data aggregation – central catalog (e.g. EOL)

What else is needed to support biodiversity research?

Page 19

Advantages and Disadvantages

data warehousing – one central repository for all data
  one system ‘one stop shop’ is rarely appropriate for all data types
  data and information loss during transfer

data federation – network of distributed repositories
  data remain closer to the ‘source of origin’ and local expertise
  data and information loss is limited
  requires negotiated arrangements (standards) to support interoperability of distributed systems
  long-term preservation must be considered

data aggregation (e.g. EOL)

What else is needed to support biodiversity research?
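
The difference between warehousing and federation can be illustrated with a minimal Python sketch. Everything in it is hypothetical (the class names, the repository names `repo_a` and `repo_b`, and the sample records); it is not how BCO-DMO or any listed repository is implemented. It only shows that a warehouse answers from a copy made at load time, while a federation asks each source on demand.

```python
from typing import Dict, List

Record = Dict[str, str]


class SourceRepository:
    """Hypothetical stand-in for one distributed repository holding its own records."""

    def __init__(self, name: str, records: List[Record]) -> None:
        self.name = name
        self.records = records

    def query(self, taxon: str) -> List[Record]:
        # Tag each hit with its origin so provenance survives retrieval.
        return [dict(r, source=self.name) for r in self.records if r["taxon"] == taxon]


class Warehouse:
    """Warehousing: copy every record into one central store ahead of time."""

    def __init__(self, repos: List[SourceRepository]) -> None:
        self.store: List[Record] = [
            dict(r, source=repo.name) for repo in repos for r in repo.records
        ]

    def query(self, taxon: str) -> List[Record]:
        return [r for r in self.store if r["taxon"] == taxon]


class Federation:
    """Federation: data stay at the source and are retrieved on demand."""

    def __init__(self, repos: List[SourceRepository]) -> None:
        self.repos = repos

    def query(self, taxon: str) -> List[Record]:
        hits: List[Record] = []
        for repo in self.repos:
            hits.extend(repo.query(taxon))
        return hits


# Made-up sample records, for illustration only.
repo_a = SourceRepository("repo_a", [{"taxon": "Calanus finmarchicus", "count": "12"}])
repo_b = SourceRepository("repo_b", [{"taxon": "Calanus finmarchicus", "count": "7"}])

warehouse = Warehouse([repo_a, repo_b])
federation = Federation([repo_a, repo_b])

# A record added at the source after the warehouse load
# is visible to the federation but not to the warehouse copy.
repo_b.records.append({"taxon": "Calanus finmarchicus", "count": "3"})

warehouse_hits = warehouse.query("Calanus finmarchicus")
federation_hits = federation.query("Calanus finmarchicus")
```

In this toy setup the warehouse would need a reload to see the new record, which is one reason a federation depends on agreed standards between the sources rather than on a central copy.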

Page 20

Interoperability

the ability of different data repository systems to exchange and integrate data and information and present a unified view to the user

requires syntactic (format) compatibility
  e.g. access/security, file formats, transfer protocols
  to retrieve data and information

requires semantic (language) compatibility
  e.g. metadata standards, controlled vocabularies, ontologies
  to understand data and information

What else is needed to support biodiversity research?
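
A small Python sketch may help separate the two kinds of compatibility. It assumes two hypothetical repositories, one serving CSV and one serving JSON, each with its own field names; the names `repo_csv`, `repo_json`, the `VOCABULARY` mapping and the sample values are invented for illustration. Parsing the two formats is the syntactic step; renaming local fields to shared terms is the semantic step.

```python
import csv
import io
import json

# Hypothetical mapping from each repository's local field names to shared terms.
VOCABULARY = {
    "repo_csv": {"species": "scientific_name", "lat": "latitude", "lon": "longitude"},
    "repo_json": {"taxon": "scientific_name", "y": "latitude", "x": "longitude"},
}


def parse_csv(text):
    """Syntactic step for a CSV source: turn text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def parse_json(text):
    """Syntactic step for a JSON source: turn text into a list of dicts."""
    return json.loads(text)


def to_shared_terms(records, repo):
    """Semantic step: rename local field names to the shared vocabulary."""
    mapping = VOCABULARY[repo]
    return [{mapping[k]: v for k, v in rec.items() if k in mapping} for rec in records]


# Made-up payloads standing in for what each repository might return.
csv_text = "species,lat,lon\nCalanus finmarchicus,42.5,-67.0\n"
json_text = '[{"taxon": "Calanus finmarchicus", "y": "41.9", "x": "-66.2"}]'

# The unified view: records from both sources under the same field names.
unified = (
    to_shared_terms(parse_csv(csv_text), "repo_csv")
    + to_shared_terms(parse_json(json_text), "repo_json")
)
```

In practice the shared terms would come from an agreed metadata standard or controlled vocabulary rather than an ad hoc dictionary.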

Page 21

Trans-disciplinary, cross-agency collaboration and cooperation

a workshop of 100 invited participants held in Broomfield, Colorado in March 2011
NSF sponsored with support from USGS
primary objective: “to substantially advance discussions and directions of data life cycle, data integration and data citation, with strong emphasis on end-use, and to provide a state-of-the-field report to NSF and the USGS of the geoinformatics community’s capabilities and needs ...”
final report (in progress) http://tw.rpi.edu/web/Workshop/Community/GeoData2011

Geo-Data Informatics 2011 Workshop
Exploring the Life Cycle, Citation and Integration of Geo-Data

What else is needed to support biodiversity research?

Page 22

some thoughts . . .
integration of distributed, loosely federated data repositories
designed to foster biodiversity research and assessment

Microbes to Mammals
Habitat to Health
Taxonomy to Tipping Points

What else is needed to support biodiversity research?

Page 23

Data repositories for biodiversity research?

What else is needed to support biodiversity research?

BCO-DMO
LTER sites
NCBI GenBank
OBIS
MICROBIS: ICoMM Marine Microbes Database
EOL
Protein Data Bank (3D structures of DNA, RNA)
Cell Image Library (cellimagelibrary.org)
NOAA, NASA, EPA and USGS sites
Literature (some are proprietary)

Page 24

Coordinating groups for biodiversity research?

What else is needed to support biodiversity research?

NSF, NOAA, NASA, EPA and USGS agency program managers, representatives, committees
Interagency Working Groups and Advisory Committees
Scientific Steering Committees
Interagency Working Group on Ocean Observations (IWGOO) Support Office hosted at the Consortium for Ocean Leadership

Page 25

Other considerations:
What are the connection axes (geospatial, temporal, organism/taxon/species name)?
PI name (e.g. Web of Science researcher ID; or ORCID - Open Researcher and Contributor ID)
Data provenance is very important
Persistent identifiers (DOIs?)
References (reciprocal links) to published literature
Access to proprietary information

What else is needed to support biodiversity research?
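
One way to picture the connection axes is a rule that links two records from different repositories when they agree on taxon name and fall close together in space and time. The Python sketch below is only an illustration under stated assumptions: the field names (`taxon`, `lat`, `lon`, `date`), the thresholds, and the sample records are all invented, and real matching would need proper taxonomic name resolution and geodetic distance.

```python
from datetime import date


def linked(a, b, max_degrees=0.5, max_days=7):
    """Return True when two records agree on all three axes.

    Taxon names are compared case-insensitively; position is compared with a
    crude latitude/longitude box; time is compared as a day difference.
    """
    same_taxon = a["taxon"].strip().lower() == b["taxon"].strip().lower()
    near = (
        abs(a["lat"] - b["lat"]) <= max_degrees
        and abs(a["lon"] - b["lon"]) <= max_degrees
    )
    close_in_time = abs((a["date"] - b["date"]).days) <= max_days
    return same_taxon and near and close_in_time


# Made-up records: a net tow count and two sequence records.
net_tow = {"taxon": "Calanus finmarchicus", "lat": 42.5, "lon": -67.0, "date": date(2010, 5, 3)}
sequence_same_week = {"taxon": "calanus finmarchicus ", "lat": 42.3, "lon": -66.8, "date": date(2010, 5, 6)}
sequence_later = {"taxon": "Calanus finmarchicus", "lat": 42.3, "lon": -66.8, "date": date(2010, 8, 1)}

match_same_week = linked(net_tow, sequence_same_week)
match_later = linked(net_tow, sequence_later)
```

Here the first pair links and the second does not, because only the time axis differs beyond the chosen window.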

Page 26

Existing Repositories

Other considerations:
Long tail or ‘dark data’ (Heidorn 2008)

Page 27

Other considerations

What are the use cases?

Benedict, et al. 2007

Page 28

Final Slide

What additional cyber-infrastructure is needed to support biodiversity research?

What else is needed to support biodiversity research?

Additional repositories?
What about the *omics data?

Connections between repositories?
Standards (semantic and syntactic)

Advisory groups, workshops and governance systems

Page 29

Existing Repositories

Page 30

CDIAC
Carbon Dioxide Information Analysis Center - Ocean CO2

http://cdiac.ornl.gov/oceans/

TCO2 (DIC)
TALK
pH
pCO2
CFCs
SF6
CCl4
CaCO3
DOC, TOC
TDN
dC14

Page 31

OBIS - USA
Ocean Biodiversity Information System (OBIS) - USA

http://obisusa.nbii.gov (will redirect)

Page 32

GenBank http://www.ncbi.nlm.nih.gov/genbank/

SUBMIT DATA

SEARCH for DATA