View
513
Download
0
Category
Tags:
Preview:
DESCRIPTION
Presented at Yale Day of Data, Yale University, Sept 26 2014.
Citation preview
NIH’s Day, Months, Four Years of Data
Philip E. Bourne Ph.D.Associate Director for Data Science
National Institutes of Health
What is NIH’s Overall Approach to Data and What Does It Mean to You?
Some Context: NIH Data Science History
6/12 2/14 3/14
• Findings:• Sharing data & software through catalogs• Support methods and applications development• Need more training• Need campus-wide IT strategy• Hire CSIO• Continued support throughout the lifecycle
My Bias
Still a scientist
A funder who still thinks like a PI
Not yet attuned to the federal system
Big supporter of open science through prior work with the Public Library of Science, FORCE11 etc.
Data – A Few Observations …
We talk about the promise of big data, but we don’t even know the value of little data (aka could “Big Data” be the new “AI”)
Good data is expensive in terms of time and money
Looking at data retroactively is really expensive
Good data begats trust; trust begats community; community is God
The way we support scientific data currently is not sustainable
That is, is no business model currently for scientific data
Data – A Few NIH Observations …
1. We have little idea how much we spend on data – estimated over $1bn per year
2. We have even less idea how much we should be spending
Point 2 is part of a culture clash between the more observational history of biomedicine and the new analytical approach to discovery
Data – An Academic Medical Center Observation
A digital enterprise exists when data connections are made across different areas of the organization such that productivity and competitiveness are improved
For example, between education and research
I am not aware that any such academic institutions exist?
Many are starting to wake up to the idea of getting there
JAMIA 2014, 21(2), 194
ADDS Mission Statement
To foster an ecosystem that enables biomedical research to be conducted as a digital enterprise that enhances
health, lengthens life and reduces illness and disability
What Problems Are We Trying to Solve?
One Possible Solution
Sustainability – 50% business model
Efficiency – sharing best practices in longitudinal clinical studies; “trusted investigator”
Collaboration - identification of collaborators at the point of data collection not publication
Reproducibility – data accessible with publication
Integration – phenotype homogenization
Accessibility – clinical trials registration
Quality – sharing CDEs across institutes
Training – keeping trainees in the ecosystem
The Data Ecosystem
Community Policy
Infrastructure
• Sustainable business model
• Collaboration• Training
The Data Ecosystem
Community Policy
Infrastructure
• Sustainable business model
• Collaboration• Training
VirtuousResearch
Cycle
The Virtuous Cycle
http://goo.gl/fkWjhS
Raw Materials to Seed the Ecosystem
NIH mandate & support
ADDS team of 8 people
Intramural participation of over 100 team members across ICs
Funding through BD2K:– ~$30M in FY14
– ~$80M in FY15
– ....
Organization to Seed the Ecosystem…
Associate Director for Data Science
Commons BD2K Efficiency
Sustainability Education Innovation Process
• Cloud – Data & Compute
• Search• Security • Reproducibility
Standards• App Store
• Coordinate• Hands-on• Syllabus• MOOCs
• Community• Centers• Training Grants• Catalogs• Standards• Analysis
• Data Resource Support
• Metrics• Best
Practices• Evaluation• Portfolio
Analysis
The Biomedical Research Digital Enterprise
Partnerships
Collaboration
Programmatic Theme
Deliverable
Example Features • IC’s• Researchers• Federal
Agencies• International
Partners• Computer
Scientists
Scientific Data Council External Advisory Board
Training
Associate Director for Data Science
Commons BD2K Efficiency
Sustainability Education Innovation Process
• Cloud – Data & Compute
• Search• Security • Reproducibility
Standards• App Store
• Coordinate• Hands-on• Syllabus• MOOCs
• Community• Centers• Training Grants• Catalogs• Standards• Analysis
• Data Resource Support
• Metrics• Best
Practices• Evaluation• Portfolio
Analysis
The Biomedical Research Digital Enterprise
Partnerships
Collaboration
Programmatic Theme
Deliverable
Example Features • IC’s• Researchers• Federal
Agencies• International
Partners• Computer
Scientists
Scientific Data Council External Advisory Board
Training
Example Communities
– NIH
• 27 ICs
– Agencies
• NSF
• DOE
• DARPA
• NIST
– Government
• OSTP
• HHS HDI
• ONC
• CDC
• FDA
– Private sector
• Phrma
• Amazon
– Organizations
• PCORI, GA4GH
• RDA, ELIXIR
• CCC
• CATS
• FASEB, ISCB
• Biophysical Society
• Sloan Foundation
• Moore Foundation
Example Policies
– Clinical data harmonization
– DbGaP in the cloud
– Data citation
– Machine readable data sharing plans on all grants
– New review models, audiences etc.
• Open review
• Micro funding
• Standing data committees to explore best practices
• Crowd sourcing
Example Infrastructure: The Commons
Data
The Long Tail
Core Facilities/HS Centers
Clinical /Patient
The Why:Data Sharing Plans
TheCommons
Government
The How:
DataDiscoveryIndex
SustainableStorage
Quality
Scientific Discovery
Usability
Security/Privacy
The End Game:
KnowledgeNIHAwardees
PrivateSector
Metrics/Standards
Rest ofAcademia
Software StandardsIndex
BD2KCenters
Cloud, Research Objects,Business Models
What Does the Commons Enable?
Dropbox like storage
The opportunity to apply quality metrics
Bring compute to the data
A place to collaborate
A place to discover
http://100plus.com/wp-content/uploads/Data-Commons-3-1024x825.png
[Adapted from George Komatsoulis]
One Possible Commons Business Model
HPC, Institution …
Pilots Around A Virtuous CycleExpect a FY15 Funding Call to Work in
the Commons
Training & Diversity Training & Diversity
Training & Diversity Goals:
– Develop a sufficient cadre of diverse researchers skilled in the science of Big Data
– Elevate general competencies in data usage and analysis across the biomedical research workforce
– Combat the Google bus
How:
– Traditional training grants
– Work with IC’s on a needs assessment
– Standards for course descriptions with EU
– Work with institutions on raising awareness
– Partner with minority institutions
– Virtual/physical training center(s)?
Closing Question
Calls for increased NIH funding has so far gone unheeded, what can the
ADDS do (that you have not heard about) to improve data science
activities?
NIHNIH……Turning Discovery Into HealthTurning Discovery Into Health
philip.bourne@nih.gov
Recommended