The NCI Cancer Research Data Commons Allen Dearry, Ph.D. Program Director Center for Biomedical Informatics and Information Technology Imaging Community Call 11.06.2017



National Cancer Data Ecosystem Recommendation

Overall goal: “Enable all participants across the cancer research and care continuum to contribute, access, combine and analyze diverse data that will enable new discoveries and lead to lowering the burden of cancer.”

• Fundamental framework and infrastructure to connect components and ensure interoperability

• Common APIs

• Data schemas

• Common data dictionaries

• Enhanced cloud computing platforms

• Components such as repositories, analytics services, and interactive portals

• The ability to link diverse data types and data sources is fundamental to interoperability of the Cancer Data Ecosystem.


*Robert L. Grossman, Allison Heath, Mark Murphy, Maria Patterson and Walt Wells, A Case for Data Commons: Toward Data Science as a Service, Computing in Science & Engineering, 2016. Source of image: The CDIS, GDC, & OCC data commons infrastructure at the University of Chicago Kenwood Data Center.

NCI Scope: “Create a data science infrastructure necessary to connect repositories, analytical tools, and knowledge bases”

Clinical Immuno-oncology

NCI Cancer Research Data Commons (NCRDC) - Concept

Data commons co-locate data, storage and computing infrastructure with commonly used services, tools & apps for analyzing and sharing data to create an interoperable resource for the research community.*


Why Develop a Cancer Research Data Commons?

• A key component of a learning National Cancer Data Ecosystem

• Making research data available for discovery, validation, new therapies

• Maximizing the impact, reuse, and reproducibility of cancer research

• Facilitating innovation of methods and tools for research (e.g., ITCR)

• Promoting research collaborations

• Creating foundational infrastructure for new programs to include data

• Assisting DCCs in sharing and redistributing data

• Changing incentives for data sharing

Reduce the risk, improve early detection, outcomes, and survivorship in cancer

NCI Cancer Research Data Commons (NCRDC)

Genomic Data Commons Node: GDC

Imaging Data Commons Node: IDC

Proteomic Data Commons Node: PDC

APIs

• Authentication and Authorization

• Metadata Validation Tools

• Data Models

• User Workspaces

• Container Environment

Data Commons Framework – Modular, Flexible Core Services

Data Commons Framework

What is it?

• Reusable, expandable framework for the Data Commons

• Defines the core principles and structure of a Data Commons

• Provides reusable, modular components that can be leveraged across the Data Commons

Modular Components

• Secure user authentication and authorization

• Metadata validation and tools

• Domain-specific, extensible data models

• API and container environment for tools and pipelines

• Access to computational workspaces for storing data, tools, and results
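
The metadata validation component listed above can be illustrated with a minimal sketch. It assumes a shared data dictionary that maps each field to its permitted values; the dictionary contents, field names, and the `validate_record` helper are hypothetical and are not taken from the Data Commons Framework itself.

```python
from dataclasses import dataclass, field

# Hypothetical data dictionary: field name -> allowed values.
# None means any value is accepted (free text).
DATA_DICTIONARY = {
    "case_id": None,
    "data_type": {"genomic", "imaging", "proteomic"},
    "primary_site": {"breast", "lung", "colon"},
}


@dataclass
class ValidationResult:
    errors: list = field(default_factory=list)

    @property
    def ok(self):
        # A record is valid when no errors were collected.
        return not self.errors


def validate_record(record, dictionary=DATA_DICTIONARY):
    """Check one submitted metadata record against the shared dictionary."""
    result = ValidationResult()
    for name, allowed in dictionary.items():
        if name not in record:
            result.errors.append(f"missing field: {name}")
        elif allowed is not None and record[name] not in allowed:
            result.errors.append(f"invalid value for {name}: {record[name]!r}")
    # Fields that the dictionary does not define are also reported.
    for name in record:
        if name not in dictionary:
            result.errors.append(f"unknown field: {name}")
    return result
```

A submission tool built this way could reject a record before upload and report every problem at once, rather than stopping at the first one.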


Denny et al. A data biosphere for biomedical research. Medium. Oct 16 2017.

“A Data Biosphere”

The NCI Cancer Research Data Commons

A virtual, expandable infrastructure

[Diagram labels]

• Data domains: Genomic, Proteomic, Imaging, Clinical, Functional, Cancer Models, Population

• Node shown: GDC

• Submission: standardized data submission and Q/C; controlled vocabularies; harmonization by subject matter experts

• Access: secure data access through API or web UI; query across data domains; analytics, elastic compute, visualization

• Authentication & Authorization

• Users: Biologists / Clinical Researchers; Clinicians and Patients; Tool / Algorithm Developers; Computational Scientists; Data Contributors


Cancer Data Aggregator

Aggregate by case, sample, study, disease, tissue, etc.

[Diagram layers, connected through APIs: Community Presentation; Analytics; Multi-modal data aggregation; Data Commons Repositories/Nodes (Genomics, Imaging, Proteomics, Clinical)]
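
The Cancer Data Aggregator idea, grouping records from several nodes under a shared key such as case, can be sketched as follows. The record layout, the sample file names, and the `aggregate_by` helper are assumptions made for illustration; the slides do not specify the node APIs.

```python
from collections import defaultdict

# Hypothetical records, shaped as each node's API might return them.
GENOMIC = [{"case_id": "C1", "file": "c1.bam"}, {"case_id": "C2", "file": "c2.bam"}]
IMAGING = [{"case_id": "C1", "file": "c1_ct.dcm"}]
PROTEOMIC = [{"case_id": "C2", "file": "c2.mzML"}]


def aggregate_by(key, **nodes):
    """Group file names from several nodes under a shared key such as case_id."""
    merged = defaultdict(lambda: defaultdict(list))
    for node_name, records in nodes.items():
        for record in records:
            merged[record[key]][node_name].append(record["file"])
    # Convert the nested defaultdicts to plain dicts for the caller.
    return {value: dict(per_node) for value, per_node in merged.items()}


cases = aggregate_by("case_id", genomic=GENOMIC, imaging=IMAGING, proteomic=PROTEOMIC)
```

Aggregating by sample, study, disease, or tissue would use the same function with a different `key`, provided every node exposes that field.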


Why do we need an Imaging Data Commons?

• A common, accessible platform is needed to store, access, and analyze imaging data types

• Imaging methodologies beyond X-ray and CT

• Digital Pathology

• 2D and 3D imaging, multiple labels

• Ideal pilot case for the NCRDC framework

• Strong partners in the NCI Divisions, Offices and Centers

• Understand the requirements of the IDC

• Build off their work with TCIA

Steve Jett


Imaging, image analysis, and query are the basis of the IDC

• Established image formats such as DICOM

• Addition of newer imaging method data

• WSI, 3D images

• Generation of new image access and query tools, plus relevant APIs

• Maintain compatibility with the NCRDC

• Allow users to run their own apps in the cloud

• Allow querying of private data seamlessly with IDC data

• Input from the research community is essential to the IDC’s success
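
Querying private data together with IDC data could look like the sketch below, which assumes that both sources expose a flat metadata index with the same field names. The index contents, identifiers, and the `query` helper are hypothetical; the modality codes (CT, SM) follow DICOM conventions.

```python
from itertools import chain

# Hypothetical metadata indexes with a shared schema.
IDC_INDEX = [
    {"series_id": "idc-001", "modality": "CT", "collection": "TCGA"},
    {"series_id": "idc-002", "modality": "SM", "collection": "CPTAC"},
]
PRIVATE_INDEX = [
    {"series_id": "lab-001", "modality": "CT", "collection": "my-study"},
]


def query(filters, *indexes):
    """Return entries from all indexes whose fields equal every filter value."""
    return [
        entry
        for entry in chain(*indexes)
        if all(entry.get(name) == value for name, value in filters.items())
    ]


# One query spans the public and the private index.
ct_series = query({"modality": "CT"}, IDC_INDEX, PRIVATE_INDEX)
```

The design choice here is that the private index never has to be uploaded: it only needs to match the shared schema so that one filter applies to both.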

Steve Jett


IDC will host imaging data from NCI-supported projects

▪ The Cancer Imaging Archive (TCIA)

▪ The Cancer Genome Atlas

▪ Quantitative Imaging Network

▪ Reference Image Database to Evaluate Therapy Response

▪ Human Tumor Atlas

▪ CPTAC

▪ APOLLO

▪ Imaging data from ITCR grants

[Diagram labels: HTA …, TCGA, Analysis Tools, IDC]


Current IDC Status: Landscape Analysis

• IDC is a work in progress, looking for how it can best serve the research community

• What will be the relationship between TCIA and the IDC?

• What are the emerging, important imaging fields and formats beyond digital pathology and the Human Tumor Atlas that should be considered and engaged for inclusion?

• Input from the research community will drive the IDC data composition, as well as the tool set

• Generation of an IDC prototype

Steve Jett

NCRDC Governance and Outreach

• Governance process to be established, including Scientific and Technical Review Board and Steering Committee

• Structured process for decisions, interactions, roles

• Outreach and collaboration

• Working with NIH and other ICs on related initiatives / Data Commons, as well as external groups such as Chan Zuckerberg

• Participating on NIH and interagency working groups and on PMI- and Moonshot-related projects

• Plans for workshops and RFIs to get community input, feedback, and participation

Acknowledgements

Cloud Resources Team Leads

• Gad Getz, Ph.D. - Broad Institute

• Ilya Shmulevich, Ph.D. - ISB

• Brandi Davis-Dusenbery, Ph.D. - Seven Bridges

NCI CBIIT Team

• Durga Addepalli, Ph.D.

• Allen Dearry, Ph.D.

• Juli Klemm, Ph.D.

• Tanja Davidsen, Ph.D.

• Izumi Hinkson, Ph.D.

• Betsy Hsu, Ph.D.

• Stephen Jett, Ph.D.

• John Otridge, Ph.D.

• Sima Pandya

• Eve Shalley

• Steve Tsang, Ph.D.

Framework Team

• Robert Grossman, Ph.D. - University of Chicago

• Phillis Tang

• Christina Yung

NCI Center for Cancer Genomics

• JC Zenklusen, Ph.D.

• Daniela Gerhard, Ph.D.

• Zhining Wang, Ph.D.

NCI Office of Cancer Clinical Proteomics Research

• Henry Rodriguez, Ph.D.

• Chris Kinsinger, Ph.D.

NCI Cancer Imaging Program

• Paula Jacobs

• John Freymann

• Justin Kirby

NCI Leadership

• Doug Lowy, M.D.

• Warren Kibbe, Ph.D.

• Lou Staudt, M.D., Ph.D.

• Stephen Chanock, M.D.


www.cancer.gov www.cancer.gov/espanol