The NCI Cancer Research Data Commons
Allen Dearry, Ph.D., Program Director
Center for Biomedical Informatics and Information Technology
Imaging Community Call
11.06.2017
National Cancer Data Ecosystem Recommendation
Overall goal: “Enable all participants across the cancer research and care continuum to contribute, access, combine and analyze diverse data that will enable new discoveries and lead to lowering the burden of cancer.”
• Fundamental framework and infrastructure to connect components and ensure interoperability
• Common APIs
• Data schemas
• Common data dictionaries
• Enhanced cloud computing platforms
• Components such as repositories, analytics services, and interactive portals
• The ability to link diverse data types and data sources is fundamental to interoperability of the Cancer Data Ecosystem.
NCI Cancer Research Data Commons (NCRDC) - Concept
NCI Scope: “Create a data science infrastructure necessary to connect repositories, analytical tools, and knowledge bases”
Data commons co-locate data, storage and computing infrastructure with commonly used services, tools & apps for analyzing and sharing data to create an interoperable resource for the research community.*
*Robert L. Grossman, Allison Heath, Mark Murphy, Maria Patterson and Walt Wells, A Case for Data Commons: Toward Data Science as a Service, Computing in Science & Engineering, 2016. Source of image: The CDIS, GDC, & OCC data commons infrastructure at the University of Chicago Kenwood Data Center.
[Figure label: Clinical Immuno-oncology]
Why Develop a Cancer Research Data Commons?
• A key component of a learning National Cancer Data Ecosystem
• Making research data available for discovery, validation, new therapies
• Maximizing the impact, reuse, and reproducibility of cancer research
• Facilitating innovation of methods and tools for research (e.g., ITCR)
• Promoting research collaborations
• Creating foundational infrastructure for new programs to include data
• Assisting DCCs in sharing and redistributing data
• Changing incentives for data sharing
Reduce the risk, improve early detection, outcomes, and survivorship in cancer
NCI Cancer Research Data Commons (NCRDC)
Genomic Data Commons Node: GDC
Imaging Data Commons Node: IDC
Proteomic Data Commons Node: PDC
Data Commons Framework – Modular, Flexible Core Services
• APIs
• Authentication and Authorization
• Metadata Validation Tools
• Data Models
• User Workspaces
• Container Environment
Data Commons Framework
What is it?
• Reusable, expandable framework for the Data Commons
• Defines the core principles and structure of a Data Commons
• Provides reusable, modular components that can be leveraged across the Data Commons
Modular Components
• Secure user authentication and authorization
• Metadata validation and tools
• Domain-specific, extensible data models
• API and container environment for tools and pipelines
• Access to computational workspaces for storing data, tools, and results
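As one illustration of the metadata validation component listed above, the sketch below checks a submitted record against a small data dictionary. The dictionary contents, the field names, and the `validate_record` function are all hypothetical and are not part of any NCRDC or Data Commons Framework API. The aim is only to show the kind of check such a component might perform.

```python
# Hypothetical data dictionary: field name -> (expected type, allowed values or None).
# Real data dictionaries are far richer; this one exists only for illustration.
DATA_DICTIONARY = {
    "case_id": (str, None),
    "modality": (str, {"CT", "MR", "PT", "SM"}),
    "age_at_diagnosis": (int, None),
}


def validate_record(record: dict) -> list:
    """Return a list of human-readable problems; an empty list means the record passes."""
    problems = []
    for field, (expected_type, allowed) in DATA_DICTIONARY.items():
        if field not in record:
            problems.append(f"missing field: {field}")
            continue
        value = record[field]
        if not isinstance(value, expected_type):
            problems.append(f"{field}: expected {expected_type.__name__}")
        elif allowed is not None and value not in allowed:
            problems.append(f"{field}: value {value!r} not in controlled vocabulary")
    # Fields absent from the dictionary are reported rather than silently accepted.
    for field in sorted(record.keys() - DATA_DICTIONARY.keys()):
        problems.append(f"unknown field: {field}")
    return problems
```

Returning a list of problems, rather than raising on the first one, lets a submitter see everything that needs fixing in a single pass.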
The NCI Cancer Research Data Commons: A virtual, expandable infrastructure
• Standardized data submission and QC
• Controlled vocabularies
• Harmonization by subject matter experts
• Secure data access through API or web UI
• Query across data domains
• Analytics, elastic compute, visualization
[Figure: the NCI Cancer Research Data Commons, including the GDC and Genomic, Proteomic, and Imaging Data nodes; data domains labeled Clinical, Functional, Cancer Models, Imaging, Population, and Proteomics; an Authentication & Authorization layer; and user groups: Biologists / Clinical Researchers, Clinicians and Patients, Tool / Algorithm Developers, Computational Scientists, and Data Contributors]
[Figure: layered architecture of the NCRDC]
• Community Presentation and Analytics, reached through APIs
• Cancer Data Aggregator (multi-modal data aggregation): aggregate by case, sample, study, disease, tissue, etc., exposed through an API
• Data Commons Repositories/Nodes, each with its own API: Genomics, Imaging, Proteomics, Clinical
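The Cancer Data Aggregator idea, aggregating records from several nodes by case, sample, study, and so on, can be sketched in a few lines. Everything here is an assumption made for illustration: the record shapes, the `case_id` key, the file names, and the `aggregate_by` helper are hypothetical and do not reflect an actual aggregator API.

```python
from collections import defaultdict

# Hypothetical records, standing in for what each node's API might return.
# Only the shared "case_id" key matters for the aggregation itself.
GENOMIC = [{"case_id": "C1", "file": "c1.vcf"}, {"case_id": "C2", "file": "c2.vcf"}]
IMAGING = [{"case_id": "C1", "file": "c1_ct.dcm"}]
PROTEOMIC = [{"case_id": "C2", "file": "c2.mzML"}]


def aggregate_by(key, **sources):
    """Group records from several named sources under a shared key value."""
    merged = defaultdict(lambda: defaultdict(list))
    for source_name, records in sources.items():
        for record in records:
            merged[record[key]][source_name].append(record)
    # Convert to plain dicts so that lookups of absent cases raise KeyError.
    return {value: dict(per_source) for value, per_source in merged.items()}


by_case = aggregate_by("case_id", genomic=GENOMIC, imaging=IMAGING, proteomic=PROTEOMIC)
```

In this toy data, `by_case["C1"]` holds genomic and imaging records, while `by_case["C2"]` holds genomic and proteomic records. Passing a different `key` (for example a study or tissue field, if the records carried one) would regroup the same data along that axis.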
Why do we need an Imaging Data Commons?
• A common, accessible platform is needed to store, access, and analyze imaging data types
• Imaging methodologies beyond X-ray and CT
• Digital Pathology
• 2D and 3D imaging, multiple labels
• Ideal pilot case for the NCRDC framework
• Strong partners in the NCI Divisions, Offices and Centers
• Understand the requirements of the IDC
• Build off their work with TCIA
Steve Jett
Imaging, image analysis, and query are the basis of the IDC
• Established image formats such as DICOM
• Addition of data from newer imaging methods
• Whole-slide images (WSI), 3D images
• Generation of new image access and query tools, plus relevant APIs
• Maintain compatibility with the NCRDC
• Allow users to run their own apps in the cloud
• Allow querying of private data seamlessly with IDC data
• Input from the research community is essential to the IDC’s success
Steve Jett
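Since DICOM is named above as an established format, a very small intake check gives a feel for what format-aware tooling involves. A DICOM Part 10 file begins with a 128-byte preamble followed by the four bytes `DICM`. The sketch below tests only for that marker; the `looks_like_dicom` name and the temporary demo files are hypothetical, and a real IDC tool would presumably parse the full header with a dedicated DICOM library.

```python
import os
import tempfile

DICOM_PREAMBLE_LENGTH = 128  # bytes reserved before the magic marker
DICOM_MAGIC = b"DICM"


def looks_like_dicom(path):
    """Return True if the file carries the 'DICM' marker right after the preamble."""
    with open(path, "rb") as handle:
        header = handle.read(DICOM_PREAMBLE_LENGTH + len(DICOM_MAGIC))
    # Files shorter than 132 bytes yield a short slice and therefore False.
    return header[DICOM_PREAMBLE_LENGTH:] == DICOM_MAGIC


# Demo with two throwaway files: one with the marker, one without.
with tempfile.TemporaryDirectory() as workdir:
    good = os.path.join(workdir, "good.dcm")
    bad = os.path.join(workdir, "bad.dcm")
    with open(good, "wb") as f:
        f.write(b"\x00" * DICOM_PREAMBLE_LENGTH + DICOM_MAGIC)
    with open(bad, "wb") as f:
        f.write(b"not an image")
    results = (looks_like_dicom(good), looks_like_dicom(bad))
```

A check like this only rules out obviously wrong files; it says nothing about whether the remaining DICOM metadata is complete or valid.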
IDC will host imaging data from NCI-supported projects
▪ The Cancer Imaging Archive (TCIA)
▪ The Cancer Genome Atlas (TCGA)
▪ Quantitative Imaging Network
▪ Reference Image Database to Evaluate Therapy Response
▪ Human Tumor Atlas
▪ CPTAC
▪ APOLLO
▪ Imaging data from ITCR grants
[Figure labels: HTA …, TCGA, Analysis Tools, IDC]
Current IDC Status: Landscape Analysis
• IDC is a work in progress, looking for how it can best serve the research community
• What will be the relationship between TCIA and the IDC?
• What are the emerging, important imaging fields and formats beyond digital pathology and the Human Tumor Atlas that should be considered and engaged for inclusion?
• Input from the research community will drive the IDC data composition, as well as the tool set
• Generation of an IDC prototype
Steve Jett
NCRDC Governance and Outreach
• Governance process to be established, including Scientific and Technical Review Board and Steering Committee
• Structured process for decisions, interactions, roles
• Outreach and collaboration
• Working with NIH and other ICs on related initiatives / Data Commons, as well as external groups such as Chan Zuckerberg
• Participating on NIH and interagency working groups and on PMI- and Moonshot-related projects
• Plans for workshops and RFIs to get community input, feedback, and participation
Acknowledgements

Cloud Resources Team Leads
• Gad Getz, Ph.D., Broad Institute
• Ilya Shmulevich, Ph.D., ISB
• Brandi Davis-Dusenbery, Ph.D., Seven Bridges

NCI CBIIT Team
• Durga Addepalli, Ph.D.
• Allen Dearry, Ph.D.
• Juli Klemm, Ph.D.
• Tanja Davidsen, Ph.D.
• Izumi Hinkson, Ph.D.
• Betsy Hsu, Ph.D.
• Stephen Jett, Ph.D.
• John Otridge, Ph.D.
• Sima Pandya
• Eve Shalley
• Steve Tsang, Ph.D.

Framework Team
• Robert Grossman, Ph.D., University of Chicago
• Phillis Tang
• Christina Yung

NCI Center for Cancer Genomics
• JC Zenklusen, Ph.D.
• Daniela Gerhard, Ph.D.
• Zhining Wang, Ph.D.

NCI Office of Cancer Clinical Proteomics Research
• Henry Rodriguez, Ph.D.
• Chris Kinsinger, Ph.D.

NCI Cancer Imaging Program
• Paula Jacobs
• John Freymann
• Justin Kirby

NCI Leadership
• Doug Lowy, M.D.
• Warren Kibbe, Ph.D.
• Lou Staudt, M.D., Ph.D.
• Stephen Chanock, M.D.