Research Data Management

Philip Tarrant, Global Institute of Sustainability

New research data management world

Federal funding agencies now expect researchers to include data management plans in proposals

Article 26. Sharing of Findings, Data, and Other Research Products

a. NSF expects significant findings from research and education activities it supports to be promptly submitted for publication, with authorship that accurately reflects the contributions of those involved. It expects investigators to share with other researchers, at no more than incremental cost and within a reasonable time, the data, samples, physical collections and other supporting materials created or gathered in the course of the work. It also encourages grantees to share software and inventions or otherwise act to make the innovations they embody widely useful and usable.

b. Adjustments and, where essential, exceptions may be allowed to safeguard the rights of individuals and subjects, the validity of results, or the integrity of collections or to accommodate legitimate interests of investigators.

New research data management world

Promise (DMP)

Deliver (share data)

Measure compliance

New research data management world

Promise (Easy)

Deliver (Hard)

Measure (TBD)

Get this right

Easy!

Context: The scientist’s view

• Design experiments that will collect the data needed to answer the research question(s)

• Process those data in a way that will produce the results needed to draw sensible conclusions

• Publish those conclusions so that they can be shared with the wider scientific community and ultimately the public

Investigators often feel that their responsibility ends here.

Context: The data manager’s view

• Consolidate the data collected by investigators and produce datasets in formats that encourage re-use by other investigators

• Make those data accessible to the wider scientific community and potential citizen scientists

• Publish the metadata necessary to enable the data to be interpreted by third parties

The investigators’ responsibility actually ends HERE!

The “reusable” data challenge

How do we:

• Consolidate interdisciplinary data from disparate sources in ways that provide practical (and achievable) opportunities for powerful, complex, synthetic analysis?

• Encourage commonality in scientific measurements and data standards so that we can perform a meaningful comparison between “apples” and “apples”?

• Increase data re-use to extract the maximum possible value from precious research dollars?

• Share these data with collaborators in a way that supports answering the big questions?

Someone’s responsibility ends HERE!

Current data management process

[Cycle diagram; steps listed in the order they were extracted]

• Researcher completes project

• DM asks for data and metadata

• DM asks for data and metadata again

• Researcher plans next project

• DM pledges eternal friendship and reminds about metadata

• Researcher publishes research

• Researcher starts next project

• DM dies waiting for metadata

• Researcher sends data to DM

• DM rescinds friendship pledge while pleading for metadata

Ideally, where do we want to be?

I need information…

Realistically, where do we want to be?

• A single, consolidated repository (“virtual” notebook) where we can store information about projects, organization, methods and protocols, and datasets (metadata)

• The means to enter data wherever we may be

• A data catalog that helps find useful research data (both published and unpublished)

• Visualization tools to help assess the value of data

A Single Datasource

Data Archives

GIOS DB

Projects

People

Media/refs

Metadata

Methods

Enter information in different ways

GIOS DB

A catalog to help find and evaluate data

GIOS DB

1. Organization data

2. Public datasets and metadata

3. Internal datasets and metadata
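A minimal sketch of how such a catalog lookup might behave, assuming each dataset entry carries a title, a project, keywords, and a public/internal flag. The `DatasetRecord` class, the `search_catalog` function, and the sample entries are all hypothetical illustrations, not part of the GIOS system.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DatasetRecord:
    """Hypothetical catalog entry; field names are illustrative only."""
    title: str
    project: str
    keywords: List[str] = field(default_factory=list)
    public: bool = False  # False means an internal (unpublished) dataset


def search_catalog(records, keyword, include_internal=False):
    """Return records whose title or keywords match, hiding internal ones by default."""
    keyword = keyword.lower()
    return [
        r for r in records
        if (r.public or include_internal)
        and (keyword in r.title.lower()
             or keyword in [k.lower() for k in r.keywords])
    ]


# Made-up sample entries, for illustration only.
catalog = [
    DatasetRecord("Urban soil nitrogen", "Project A", ["soil", "nitrogen"], public=True),
    DatasetRecord("Stream temperature logs", "Project B", ["water", "temperature"]),
    DatasetRecord("Soil moisture survey", "Project B", ["soil"], public=False),
]

public_hits = search_catalog(catalog, "soil")
all_hits = search_catalog(catalog, "soil", include_internal=True)
print([r.title for r in public_hits])
print([r.title for r in all_hits])
```

An external visitor would see only the public match, while a team member searching with `include_internal=True` would also see the unpublished survey.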

Data management workflow

[Diagram: Data Management System workflow]

Phases: Project initiation, Data collection, Data analysis, Project completion, Research publication

Steps:

• Create project record

• Create metadata record(s)

• Describe field methods

• Describe data attributes

• Submit data and metadata

• Finalize datasets

• Describe lab methods

• Create datasets

• Publish data and metadata
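The workflow can be treated as a running checklist. The sketch below is one possible way to track it, assuming the step names from the diagram; the ordering of the steps and the helper names `remaining_steps` and `ready_to_publish` are assumptions made for illustration.

```python
# Step names come from the workflow diagram; this ordering is an assumption.
WORKFLOW_STEPS = [
    "Create project record",
    "Create metadata record(s)",
    "Describe field methods",
    "Describe lab methods",
    "Describe data attributes",
    "Create datasets",
    "Finalize datasets",
    "Submit data and metadata",
    "Publish data and metadata",
]


def remaining_steps(completed):
    """Return the workflow steps not yet done, in workflow order."""
    done = set(completed)
    return [step for step in WORKFLOW_STEPS if step not in done]


def ready_to_publish(completed):
    """True once every step before the final publication step is complete."""
    return set(WORKFLOW_STEPS[:-1]) <= set(completed)


# Example: a project that has only handled the early documentation steps.
done_so_far = [
    "Create project record",
    "Create metadata record(s)",
    "Describe field methods",
]
print(remaining_steps(done_so_far))
print(ready_to_publish(done_so_far))
```

Recording each step as it happens, rather than reconstructing everything at project completion, is the point of doing the work incrementally.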

What does this mean to investigators?

• Effort required to input project and dataset information

• “Pay As You Go” model reduces the back-end effort when you really want to be planning for the future

• A single resilient place to store project information – accessible by all team members

• Latest “version” always available

• Content available as input to manuscripts

• The next data management plan is…

Easy!

Project Timeline

• Design phase: December – February

• Development:
  – Organization/project module: Jan – April
  – Metadata module: March – July

• Assistance with testing: June – July

• Training: July – August

Questions?