Making It Happen: Sustainable Data Preservation and Use. March 19, 2013. Anita de Waard, VP Research Data Collaborations, Elsevier RDS [email protected]

Making Data Sharing Happen


DESCRIPTION

Flash talk for Beyond the PDF 2, Amsterdam, 2013


Page 1: Making Data Sharing Happen

Making It Happen

March 19, 2013

Anita de Waard

VP Research Data Collaborations, Elsevier [email protected]

Sustainable Data Preservation and Use

Making It Happen:

Page 2: Making Data Sharing Happen

“What aspects/tools/capabilities/frameworks are related to this idea?”

• There are many different research databases, both generic (Dryad, Dataverse, …) and specific (NIF, IEDA, PDB, …)

• There are many systems for creating/sharing workflows (Taverna, myExperiment, VisTrails, Workflow4Ever, etc.)

• There are many e-lab notebooks (LabGuru, LabArchives, LaBlog, etc.)

• There are scores of projects, committees, standards, bodies, grants, initiatives, and conferences for discussing and connecting all of this (KEfED, Pegasus, PROV, RDA, Science Gateways, CODATA, BRDI, EarthCube, etc.)

• You can make a living out of this ;-)! (and many of us do…)

Page 3: Making Data Sharing Happen

…but this is what scientists do:

Using antibodies and squishy bits, Grad Students experiment and enter details into their lab notebook. The PI then tries to make sense of this, and writes a paper. End of story.

Page 4: Making Data Sharing Happen

Why save research data?

A. Data Preservation:
– Preserve record of scientific process, provenance
– Enable reproducible research

B. Data Use:
– Use results obtained by others
– Do better science!
– Improve interdisciplinary work

C. Sustainable Models:
– Technology transfer; societal/industrial development
– Reward scientists for data creation (credit/attribution)
– Long-term archiving

Page 5: Making Data Sharing Happen

Where The Data Goes Now:

> 50 M papers; 2 M scientists; 2 M papers/year

• Majority of data (90%?) is stored on local hard drives

• Some data (8%?) stored in large, generic data repositories
– Dryad: 7,631 files
– Dataverse: 0.6 M
– Datacite: 1.5 M

• A small portion of data (1-2%?) stored in small, topic-focused data repositories
– MiRB: 25 k
– PetDB: 1.5 k
– TAIR: 72.1 k
– PDB: 88.3 k
– SedDB: 0.6 k

Page 6: Making Data Sharing Happen

Key Needs:

• INCREASE DATA PRESERVATION

• IMPROVE DATA USE

• DEVELOP SUSTAINABLE MODELS

> 50 M papers; 2 M scientists; 2 M papers/year

• Majority of data (90%?) is stored on local hard drives

• Some data (8%?) stored in large, generic data repositories
– Dryad: 7,631 files
– Dataverse: 0.6 M
– Datacite: 1.5 M

• A small portion of data (1-2%?) stored in small, topic-focused data repositories
– MiRB: 25 k
– PetDB: 1.5 k
– TAIR: 72.1 k
– PDB: 88.3 k
– SedDB: 0.6 k

Page 7: Making Data Sharing Happen

Objections (and rebuttals) to data sharing:

• Objection: “Our lab notebooks are all on paper – it’s how we do things”
– Rebuttal: Graft tools closely onto scientists’ daily practice

• Objection: “I need to see a direct benefit of any effort I put in.”
– Rebuttal: Create tools that allow better insight into one’s own and others’ results

• Objection: “I don’t really trust anyone else’s data – and don’t think they’ll trust mine”
– Rebuttal: Create a social networking context and allow the data owner to provide granular access control

• Objection: “I am afraid other people might scoop my discoveries”
– Rebuttal: Reward system moves from a competition to a ‘shared mission’

Page 8: Making Data Sharing Happen

From insular ‘CoSI-Factories’…

[Diagram: two separate copies of the sequence Prepare, Observe, Analyze, Ponder, Communicate]

Page 9: Making Data Sharing Happen

…to shared experimental repositories:

[Diagram: two Prepare, Analyze, Communicate sequences alongside three sets of Observations]

Across labs, experiments: track reagents and how they are used

Page 10: Making Data Sharing Happen

…to shared experimental repositories:

[Diagram: two Prepare, Analyze, Communicate sequences alongside three sets of Observations]

Compare outcome of interactions with these entities

Page 11: Making Data Sharing Happen

…to shared experimental repositories:

[Diagram: two Prepare, Analyze, Communicate sequences alongside three sets of Observations, plus a ‘Think’ step]

Build a ‘virtual reagent spectrogram’ by comparing how different entities interacted in different experiments
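As a rough illustration of the shared-repository idea on the preceding slides (tracking reagents across labs and comparing outcomes per entity), here is a minimal Python sketch. It is not a system described in the talk: the `Observation` fields, the `SharedRepository` class, and all lab, reagent, and outcome names are hypothetical, and the "repository" is just an in-memory dictionary.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One recorded observation; every field name here is an assumption."""
    lab: str          # hypothetical lab identifier
    experiment: str   # hypothetical experiment identifier
    reagent: str      # e.g. an antibody name
    target: str       # entity the reagent was applied to
    outcome: str      # coded or free-text result


class SharedRepository:
    """Toy in-memory stand-in for a shared experimental repository."""

    def __init__(self):
        self._by_reagent = defaultdict(list)

    def add(self, obs):
        # Index each observation by the reagent it used.
        self._by_reagent[obs.reagent].append(obs)

    def usage(self, reagent):
        """Every (lab, experiment) pair that used the reagent, sorted."""
        observations = self._by_reagent.get(reagent, [])
        return sorted({(o.lab, o.experiment) for o in observations})

    def outcome_profile(self, reagent):
        """Map each target to the sorted outcomes reported across labs."""
        profile = defaultdict(set)
        for o in self._by_reagent.get(reagent, []):
            profile[o.target].add(o.outcome)
        return {target: sorted(outs) for target, outs in profile.items()}


# Made-up example data from two hypothetical labs.
repo = SharedRepository()
repo.add(Observation("lab_a", "exp_1", "antibody_x", "tissue_1", "binds"))
repo.add(Observation("lab_b", "exp_7", "antibody_x", "tissue_1", "no signal"))
repo.add(Observation("lab_b", "exp_8", "antibody_x", "tissue_2", "binds"))

print(repo.usage("antibody_x"))
print(repo.outcome_profile("antibody_x"))
```

The per-target outcome profile is one simple reading of what a ‘virtual reagent spectrogram’ might look like; a real system would need agreed reagent identifiers and richer outcome encodings than strings.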

Page 12: Making Data Sharing Happen

Some examples:

• Grafting tools on workflow: create tailored metadata collection tools on mini-tablets in labs to replace the paper notebook

• Direct rewards: through a ‘PI-Dashboard’, allow immediate access to and analysis of shared data: new science!

• Data sharing rewards: Data Rescue Challenge: collect and reward stories/practices of data preservation/use in Earth/Lunar Science

• Improve data use: with NIF/Eagle-I, add antibodies as key ‘entities’ to papers, link to AB repository

Page 13: Making Data Sharing Happen

How do we make data use happen:

• We are creating repositories of shared experiments: you are part of a greater whole!

• Collect and share stories and practices re. data use and sustainable systems: “What gets to them?”

• Develop system of rewards for data sharing: enable demonstrably better science!

• Work with grant agencies, repositories (generic/specific, institutional, cross-national) to integrate and annotate existing datasets and enable cross-use

• Collectively pioneer long-term funding options; support/develop ‘shared mission’ funding challenges