Upload
philip-bourne
View
314
Download
1
Tags:
Embed Size (px)
DESCRIPTION
Presentation at the Conference of Open Access Scholarly Publishers Association, UNESCO Headquarters, Paris, Sept. 18, 2014
Citation preview
Data!Philip E. Bourne Ph.D.
Associate Director for Data ScienceNational Institutes of Health
Some Context: NIH Data Science History
6/12 2/14 3/14
• Findings:• Sharing data & software through catalogs• Support methods and applications development• Need more training• Need campus-wide IT strategy• Hire CSIO• Continued support throughout the lifecycle
My Bias
Still a scientist
A funder who still thinks like a PI
Not yet attuned to the federal system
Big supporter of OA via PLOS and others
Data – A Few Observations …
We talk about the promise of big data, but we don’t even know the value of little data (aka could “Big Data” be the new “AI”)
Good data is expensive in terms of time and money
Looking at data retroactively is really expensive
Good data begats trust; trust begats community; community is God
The way we support scientific data currently is not sustainable
There is no workable business model currently for scientific data
Data – A Few NIH Observations …
1. We have little idea how much we spend on data – estimated over $1bn per year
2. We have even less idea how much we should be spending
Point 2 is part of a culture clash between the more observational history of biomedicine and the new analytical approach to discovery
ADDS Mission Statement
To foster an ecosystem that enables biomedical research to be conducted as a digital enterprise that enhances
health, lengthens life and reduces illness and disability
What Problems Are We Trying to Solve?
Possible Solutions
Sustainability – 50% business model
Efficiency – sharing best practices in longitudinal clinical studies
Collaboration - identification of collaborators at the point of data collection not publication
Reproducibility – data accessible with publication
Integration – phenotype homogenization
Accessibility – clinical trials registration
Quality – sharing CDEs across institutes
Training – keeping trainees in the ecosystem
The Data Ecosystem
Community Policy
Infrastructure
• Sustainable business model
• Collaboration• Training
Raw Materials to Seed the Ecosystem
NIH mandate & support
ADDS team of 8 people
Intramural participation of over 100 team members across ICs
Funding through BD2K:– ~$30M in FY14
– ~$80M in FY15
– ....
Example Communities
– NIH
• 20/27 ICs
– Agencies
• NSF
• DOE
• DARPA
• NIST
– Government
• OSTP
• HHS HDI
• ONC
• CDC
• FDA
– Private sector
• Phrma
• Amazon
– Organizations
• PCORI
• RDA, ELIXIR
• CCC
• CATS
• FASEB, ISCB
• Biophysical Society
• Sloan Foundation
• Moore Foundation
Example Policies
– Clinical data harmonization
– Data citation
– Machine readable data sharing plans on all grants
– New review models, audiences etc.
• Open review
• Micro funding
• Standing data committees to explore best practices
• Crowd sourcing
Example Infrastructure: The Commons
Data
The Long Tail
Core Facilities/HS Centers
Clinical /Patient
The Why:Data Sharing Plans
TheCommons
Government
The How:
DataDiscoveryIndex
SustainableStorage
Quality
Scientific Discovery
Usability
Security/Privacy
The End Game:
KnowledgeNIHAwardees
PrivateSector
Metrics/Standards
Rest ofAcademia
Software StandardsIndex
BD2KCenters
Cloud, Research Objects,Business Models
What Does the Commons Enable?
Dropbox like storage
The opportunity to apply quality metrics
Bring compute to the data
A place to collaborate
A place to discover
http://100plus.com/wp-content/uploads/Data-Commons-3-1024x825.png
[Adapted from George Komatsoulis]
One Possible Commons Business Model
HPC, Institution …
Pilots Around A Virtuous CycleExpect a Funding Call
Training & Diversity Training & Diversity
Training & Diversity Goals:
– Develop a sufficient cadre of diverse researchers skilled in the science of Big Data
– Elevate general competencies in data usage and analysis across the biomedical research workforce
– Combat the Google bus
How:
– Traditional training grants
– Work with IC’s on a needs assessment
– Standards for course descriptions with EU
– Work with institutions on raising awareness
– Partner with minority institutions
– Virtual/physical training center(s)?
What Can Open Access Publishers Do?
Work with NIH on supporting data citation
Experiment with the idea of micropublication
Other?
NIHNIH……Turning Discovery Into HealthTurning Discovery Into Health