18
Data! Philip E. Bourne Ph.D. Associate Director for Data Science National Institutes of Health

Data!

Embed Size (px)

DESCRIPTION

Presentation at the Conference of Open Access Scholarly Publishers Association, UNESCO Headquarters, Paris, Sept. 18, 2014

Citation preview

Page 1: Data!

Data!Philip E. Bourne Ph.D.

Associate Director for Data ScienceNational Institutes of Health

Page 2: Data!

Some Context: NIH Data Science History

6/12 2/14 3/14

• Findings:• Sharing data & software through catalogs• Support methods and applications development• Need more training• Need campus-wide IT strategy• Hire CSIO• Continued support throughout the lifecycle

Page 3: Data!

My Bias

Still a scientist

A funder who still thinks like a PI

Not yet attuned to the federal system

Big supporter of OA via PLOS and others

Page 4: Data!

Data – A Few Observations …

We talk about the promise of big data, but we don’t even know the value of little data (aka could “Big Data” be the new “AI”)

Good data is expensive in terms of time and money

Looking at data retroactively is really expensive

Good data begats trust; trust begats community; community is God

The way we support scientific data currently is not sustainable

There is no workable business model currently for scientific data

Page 5: Data!

Data – A Few NIH Observations …

1. We have little idea how much we spend on data – estimated over $1bn per year

2. We have even less idea how much we should be spending

Point 2 is part of a culture clash between the more observational history of biomedicine and the new analytical approach to discovery

Page 6: Data!

ADDS Mission Statement

To foster an ecosystem that enables biomedical research to be conducted as a digital enterprise that enhances

health, lengthens life and reduces illness and disability

Page 7: Data!

What Problems Are We Trying to Solve?

Possible Solutions

Sustainability – 50% business model

Efficiency – sharing best practices in longitudinal clinical studies

Collaboration - identification of collaborators at the point of data collection not publication

Reproducibility – data accessible with publication

Integration – phenotype homogenization

Accessibility – clinical trials registration

Quality – sharing CDEs across institutes

Training – keeping trainees in the ecosystem

Page 8: Data!

The Data Ecosystem

Community Policy

Infrastructure

• Sustainable business model

• Collaboration• Training

Page 9: Data!

Raw Materials to Seed the Ecosystem

NIH mandate & support

ADDS team of 8 people

Intramural participation of over 100 team members across ICs

Funding through BD2K:– ~$30M in FY14

– ~$80M in FY15

– ....

Page 10: Data!

Example Communities

– NIH

• 20/27 ICs

– Agencies

• NSF

• DOE

• DARPA

• NIST

– Government

• OSTP

• HHS HDI

• ONC

• CDC

• FDA

– Private sector

• Phrma

• Google

• Amazon

– Organizations

• PCORI

• RDA, ELIXIR

• CCC

• CATS

• FASEB, ISCB

• Biophysical Society

• Sloan Foundation

• Moore Foundation

Page 11: Data!

Example Policies

– Clinical data harmonization

– Data citation

– Machine readable data sharing plans on all grants

– New review models, audiences etc.

• Open review

• Micro funding

• Standing data committees to explore best practices

• Crowd sourcing

Page 12: Data!

Example Infrastructure: The Commons

Data

The Long Tail

Core Facilities/HS Centers

Clinical /Patient

The Why:Data Sharing Plans

TheCommons

Government

The How:

DataDiscoveryIndex

SustainableStorage

Quality

Scientific Discovery

Usability

Security/Privacy

The End Game:

KnowledgeNIHAwardees

PrivateSector

Metrics/Standards

Rest ofAcademia

Software StandardsIndex

BD2KCenters

Cloud, Research Objects,Business Models

Page 13: Data!

What Does the Commons Enable?

Dropbox like storage

The opportunity to apply quality metrics

Bring compute to the data

A place to collaborate

A place to discover

http://100plus.com/wp-content/uploads/Data-Commons-3-1024x825.png

Page 14: Data!

[Adapted from George Komatsoulis]

One Possible Commons Business Model

HPC, Institution …

Page 15: Data!

Pilots Around A Virtuous CycleExpect a Funding Call

Page 16: Data!

Training & Diversity Training & Diversity

Training & Diversity Goals:

– Develop a sufficient cadre of diverse researchers skilled in the science of Big Data

– Elevate general competencies in data usage and analysis across the biomedical research workforce

– Combat the Google bus

How:

– Traditional training grants

– Work with IC’s on a needs assessment

– Standards for course descriptions with EU

– Work with institutions on raising awareness

– Partner with minority institutions

– Virtual/physical training center(s)?

Page 17: Data!

What Can Open Access Publishers Do?

Work with NIH on supporting data citation

Experiment with the idea of micropublication

Other?

Page 18: Data!

NIHNIH……Turning Discovery Into HealthTurning Discovery Into Health

[email protected]