
Biomedical Informatics; CG Chute

© Mayo Clinic College of Medicine 201 1

Integration of Knowledge and Data from Science to Clinical Practice

Christopher G. Chute, MD DrPH
Bloomberg Distinguished Professor of Health Informatics
Professor of Medicine, Public Health, and Nursing
Chief Health Research Information Officer
Deputy Director, Institute for Clinical and Translational Research
Johns Hopkins University, Baltimore, MD, USA

Advancing Research in the Digital Age
Baltimore, 27 February 2018

© 2007 Mayo

The Historical Center of the Health Data Universe

Clinical Data

Billable Diagnoses

Billable Diagnoses


Copernican Healthcare

Billable Diagnoses

Clinical Data (Niklas Koppernigk)

Clinical Guidelines

Scientific Literature

Medical Literature

Clinical Data

Origins of Big Science: Astronomy

Sloan Digital Sky Survey III – DR9

• Total area of imaging: 31,637 square degrees
• Image field size: 1361x2048 pixels
• Number of fields: 938,046 (excluding supernovae runs)
• Catalog objects: 1,231,051,050
• Number of unique, primary sources
  • Total: 469,053,874
  • Stars: 260,562,744
  • Galaxies: 208,478,448
  • Unknown: 12,682

• Images
• Spectra
• Object catalog
• Metadata

Rutherford “Table-top” Experiment


That Higgs Boson

• 600 institutions
• 10,000 scientists
• 800 trillion collisions
• 200 PB of data = 2×10^17 bytes of data
  Bordering on an astronomical number in its own right!
• $13.25B USD

Dimensionality of Higgs “Big Data”

• Mass/Energy
• Direction
• Charge
• Medicine is more complicated than that

Dimensionality of Big Data

• Broad
  • Small amounts of data; huge number of observations
  • National Claims data
• Deep
  • Large amounts of data; few observations
  • NGS Complete Genome
• Rich
  • Broad and Deep
  • Clinical Phenotyping data (EMRs)
    • Labs, Vitals, Exam, Waveform, Images, Omics, …
    • Social, environmental, diet,

Biocomputing: Unimaginable potential, next 60 years

• Moore’s Law has had a long reign
  • 10^13-fold increase in computing power / 60 yrs
  • World-class supercomputers of 1990
    • Game platforms and cell phones of 2015 (GPUs)
• Extensions to networks, memory, & storage
  • 110 baud to 100 Gb backbone (10^10 / 60 yrs)
  • Vacuum tubes to 1 TB RAM (10^12 / 60 yrs)
  • Paper tape to PB drives (10^15 / 60 yrs)
  • Cloudy computing (10^10 / 10 yrs)
• Synergistic Summary – 10^60 / 60 yrs
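The "synergistic" figure appears to be the product of the five growth factors listed above. That reading is an assumption, but the quoted exponents do add up to 60, as this small check shows. The dictionary labels are shorthand, not wording from the slide.

```python
# Powers of ten as quoted on the slide; the labels are shorthand (assumed).
GROWTH_EXPONENTS = {
    "computing power (60 yrs)": 13,
    "network bandwidth (60 yrs)": 10,
    "memory (60 yrs)": 12,
    "storage (60 yrs)": 15,
    "cloud computing (10 yrs)": 10,
}

# Multiplying powers of ten is the same as adding their exponents.
combined_exponent = sum(GROWTH_EXPONENTS.values())
combined_factor = 10 ** combined_exponent

print(f"combined growth factor: 10^{combined_exponent}")
```

Treating the factors as independent multipliers is generous, which is presumably why the slide calls the total "synergistic" rather than measured.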


Biomedical Science: An Information Intensive Enterprise

Biomedical discovery dependent upon:
• Web of data, information, and knowledge
• Public repositories of ‘omics data
  • Genes, proteins, small molecules, pathways
  • Associations, annotations, dynamic pathways
• Clinical data is becoming plentiful - EHRs
  • Privacy and confidentiality limit public access
  • De-identification remains asymptotic

… enabled by computational capacity
… enhanced with linkage and connections

Some Fashionable Biology Databases This Week


Some Fashionable Clinical Databases This Week

The Continuum of Biomedical Informatics: Bioinformatics meets Medical Informatics

[Figure: Estimated Activity (axis 0 to 10) across the continuum from Biology to Medicine]

The Chasm of Semantic Despair


From Practice-based Evidence to Evidence-based Practice

[Diagram labels: Clinical Databases; Registries et al.; Clinical Guidelines; Expert Systems; Data Inference; Knowledge Management; Decision support; Standards; Comparability and Consistency; Terminologies & Data Models; Patient Encounters; Medical Knowledge]

Foundations for Learning Health System

NCATS Translator Program

June, 2016

• Chris Chute: Clinical terminologies, frameworks & text mining
• Melissa Haendel: Developmental bio, model organisms & ontologies
• Chris Mungall: Semantic engineering & similarity algorithms
• Peter Robinson: Clinical ontologies & diagnostic algorithms, Medical genetics
• Hongfang Liu: Text mining, pathway modeling, data normalization
• Ben Good: Crowdsourcing, community curation, semantic web
• Andrew Su: Data integration, data services
• Shannon McWeeney: cancer imaging, omics, drugs, flow cytometry, machine learning
• Maureen Hoatlin: Fanconi Anemia, rare disease, biochemistry, basic research
• Julie McMurry: Project Management, User Experience, Public Health
• David Koeller: Clinician of inborn errors of metabolism, Undiagnosed diseases
• Casey Overby: Knowledge-based methods & evaluation of precision medicine applications
• Guoqian Jiang: Clinical data standards & Interoperability, semantic modeling & validation


The Monarch Initiative
Melissa Haendel, Chris Mungall

• Integrate, align, and re-distribute cross-species gene, genotype, variant, disease, and phenotype data
• Provide a portal for exploration of phenotype-based similarity
• Facilitate identification of animal models of human disease through phenotypic similarity
• Enable quantitative comparison of cross-species phenotypes
• Develop embeddable widgets for data exploration
• Influence genotype and phenotype reporting standards
• Improve ontologies to better curate genotype-phenotype data
• “make all the data count”

monarchinitiative.org

Image from Washington et al, 2009 PMC2774506

Computational Comparison of Animal Model Phenotypes with Human Disease

• Comparisons computationally established using ontological characterizations
• Robinson et al, 2014. PMC3912424

Team Experienced in Open Source Efforts and Public Data

Leadership in existing projects to be leveraged in the translator

[Figure: project logos grouped by category (Dissemination / deployment; Extraction of knowledge from diverse sources; Data & knowledge integration, algorithms & tools), marked as either "Leadership in other projects" or "Contribution in other projects". Projects shown include OBO Foundry.]

Extraction of knowledge from diverse sources

My*

• Tools to integrate vocabularies
• Tools to identify equivalencies:
  • Identifier and synonym alignment
  • Conceptual alignment based upon logic determinations and prior probabilities
• Text mining for clinical concepts and pathway fragments
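As a rough illustration of the "identifier and synonym alignment" step only, the sketch below proposes candidate equivalences between two toy vocabularies. The `Term` class, the `align` function, and every identifier in the sample data are made up for this example; it does not reproduce kBOOM or any Translator tool, and it leaves out the logic-based and probabilistic alignment mentioned in the list.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Term:
    """A vocabulary entry; all fields here are hypothetical."""
    term_id: str
    label: str
    synonyms: frozenset = field(default_factory=frozenset)
    xrefs: frozenset = field(default_factory=frozenset)


def normalize(text: str) -> str:
    """Lowercase, turn hyphens into spaces, and collapse whitespace."""
    return " ".join(text.lower().replace("-", " ").split())


def names(term: Term) -> set:
    """All normalized names (label plus synonyms) for a term."""
    return {normalize(term.label)} | {normalize(s) for s in term.synonyms}


def align(vocab_a, vocab_b):
    """Return (id_a, id_b, evidence) tuples for candidate equivalences."""
    matches = []
    for a in vocab_a:
        for b in vocab_b:
            shares_id = (
                a.term_id in b.xrefs
                or b.term_id in a.xrefs
                or bool(a.xrefs & b.xrefs)
            )
            if shares_id:
                matches.append((a.term_id, b.term_id, "identifier"))
            elif names(a) & names(b):
                matches.append((a.term_id, b.term_id, "synonym"))
    return matches


# Made-up sample vocabularies; the IDs do not refer to real resources.
VOCAB_A = [
    Term("A:1", "Fanconi anemia", frozenset({"Fanconi's anaemia"}), frozenset({"X:100"})),
    Term("A:2", "Asthma"),
]
VOCAB_B = [
    Term("B:7", "Fanconi pancytopenia", frozenset(), frozenset({"X:100"})),
    Term("B:8", "asthma", frozenset({"bronchial asthma"})),
    Term("B:9", "Mast-cell disease"),
]

for match in align(VOCAB_A, VOCAB_B):
    print(match)
```

Identifier evidence is checked before name evidence because a shared cross-reference is usually a stronger signal than a string match; a real pipeline would weigh both rather than pick one.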

TRANSMED KNOWLEDGE GRAPH

kBOOM


GRAPH INFERENCE
• OWL-based reasoning over large graphs
• BEL pathway ‘chains of causation’
• Probabilistic inference across disease-phenotype-gene
• Bayesian Ontology Query Algorithm (Boqa)

GRAPH QUERY
• Query related entities within/across species, sources
• Query similar sets of entities (OWLsim)
  • Sets of phenotypes
  • Expression patterns
  • Pathway modules
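To make "query similar sets of entities" concrete, here is a minimal sketch that ranks phenotype profiles by plain Jaccard overlap. This is a deliberately simple stand-in: OWLsim uses ontology-aware similarity measures, which this example does not attempt. The profile names and phenotype labels are invented for illustration.

```python
def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union (0.0 if both empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_similar(query: set, profiles: dict) -> list:
    """Return (profile_name, score) pairs, most similar first."""
    scored = [(name, jaccard(query, terms)) for name, terms in profiles.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical phenotype profiles; labels are free text, not ontology IDs.
QUERY = {"short stature", "anemia", "thumb anomaly"}
PROFILES = {
    "model_A": {"anemia", "thumb anomaly", "small body size"},
    "model_B": {"seizure"},
    "model_C": {"short stature", "anemia", "thumb anomaly"},
}

for name, score in rank_similar(QUERY, PROFILES):
    print(f"{name}: {score:.2f}")
```

Exact-match overlap misses related terms such as "short stature" and "small body size", which is the gap that ontology-based comparison is meant to close.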

TRANSMED KNOWLEDGE GRAPH

Data & knowledge integration, algorithms & tools

Example query

What pathways are uniquely targeted by stem cell therapy pre-conditioning drugs that are well- vs poorly-tolerated by FA patients?

Query Path with Semantic Types

Drug - [molecularly_controls] -> Protein - [encoded_by] -> Gene - [member_of] -> Pathway

drugbank:DB01073 - [molecularly_controls] -> Uniprot:P09884 - [encoded_by] -> HGNC:9173 - [member_of] -> WikiPath:WP2446

(Fludarabine) -> (POLA1 protein) -> (POLA1 gene) -> (Rb Pathway)

Data Types and Sources:

1. Drug-Protein Interactions
   a. from DrugBank via BioThings API
   b. from DGIdb, via DGIdb API
2. Protein-Gene Associations
   a. from Ensembl via Ensembl API or BioLink API
   b. from Uniprot via Wikidata API
3. Gene-Pathway Membership
   a. from Wikipathway via Wikidata API
   b. from Reactome via BioLink API

(Query implemented in OrangeQ2.4_Drug_Gene_Pathway Jupyter notebook)
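The query path can be sketched as a walk over typed edges. The example below is not the notebook's implementation: it uses a small in-memory edge list in place of the BioThings, DGIdb, Ensembl, BioLink, and Wikidata APIs, and the helper names `build_index` and `follow_path` are hypothetical. The three edges are the ones shown in the example path.

```python
from collections import defaultdict

# (subject, predicate, object) triples taken from the example path above.
EDGES = [
    ("drugbank:DB01073", "molecularly_controls", "Uniprot:P09884"),
    ("Uniprot:P09884", "encoded_by", "HGNC:9173"),
    ("HGNC:9173", "member_of", "WikiPath:WP2446"),
]

# Predicate sequence for Drug -> Protein -> Gene -> Pathway.
DRUG_TO_PATHWAY = ["molecularly_controls", "encoded_by", "member_of"]


def build_index(edges):
    """Index objects by (subject, predicate) for quick lookup."""
    index = defaultdict(list)
    for subject, predicate, obj in edges:
        index[(subject, predicate)].append(obj)
    return index


def follow_path(start, predicates, index):
    """Return every node path from `start` that follows `predicates` in order."""
    paths = [[start]]
    for predicate in predicates:
        paths = [
            path + [nxt]
            for path in paths
            for nxt in index.get((path[-1], predicate), [])
        ]
    return paths


index = build_index(EDGES)
for path in follow_path("drugbank:DB01073", DRUG_TO_PATHWAY, index):
    print(" -> ".join(path))
```

Answering the full example query would mean running this walk for each well-tolerated and each poorly-tolerated drug and comparing the resulting pathway sets; the drug lists themselves are not given here.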

[Figure: knowledge graph schema. Entity types with their identifiers and attributes:]

• Pharmaceutical Product: RxNorm CUI, EMEA ID
• Chemical Compound: RxNorm CUI, CAS, Pubchem, InChIKey, Drugbank, UNII, ChEMBL, ChEBI, NDF-RT
• Disease: OMIM, MeSH, ICD-9, DOID, ICD-10, UMLS, Orphanet
• Gene: Ensembl, Entrez, Refseq, HGNC, Ortholog, Strand, Chromosome, Genomic start, end, taxon
• Protein: Ensembl, uniprot, Refseq; function, cell component, biological process (GO term); Has part (IPR domain); Subclass of (IPR family); taxon
• Variant: Chromosome, genomic start/end, protein, CIViC variant ID
• Biological Pathway: WikiPathways ID, Has part

[Edge labels: active ingredient; has active ingredient; genetic association; Drug used for treatment; encodes; encoded by; Physically interacts with As (agonist, inhibitor, modulator..); negative therapeutic predictor; positive therapeutic predictor; negative prognostic predictor; positive prognostic predictor; positive diagnostic predictor; negative diagnostic predictor; medical condition treated; medicine marketing autho.]

RED indicates new type or predicate (since Sept). ~150k items added - Garbanzo API (knowledge beacon impl)

[Counts shown in the figure, legend "Total counts" and "New": 895,065; 110,306; 535,409; 46,356; 8694; 1,081; 31; 216; 1,165; 1,259; 250; 2506; 157,340]

adapted from Russell and Norvig, 2009
Credit: NCATS Translator Team Ultraviolet, Broad Institute

The Knowledge Map is a set of actions the agent can perform

Credit: NCATS Translator Team Ultraviolet, Broad Institute


[Figure: approaches arranged along an axis from "more details" to "fewer details": download everything; crawl & index; high-level abstraction; trial & error]

Credit: NCATS Translator Team Ultraviolet, Broad Institute

The agent plans with abstract entities

BLACKBOARD (specific instances): imatinib, c-Kit, KIT signaling, mast cell, asthma; Bcr-Abl, eosinophil

KNOWLEDGE MAP AND AGENT STATE (abstract entities): Drug, Target, Pathway, Cell, Disease

[Diagram labels between the two levels: binding, abstraction]
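One way to read this slide is that the agent plans over entity types and only later fills each step with concrete instances from the blackboard. The sketch below illustrates that reading; it is an interpretation, not the Translator team's code. The function names are hypothetical, and the types given to Bcr-Abl and eosinophil are assumptions, since the slide text does not state them.

```python
# The abstract plan is a sequence of entity types, as listed on the slide.
ABSTRACT_PLAN = ["Drug", "Target", "Pathway", "Cell", "Disease"]

# Blackboard: specific instances and the abstract type each belongs to.
BLACKBOARD = {
    "imatinib": "Drug",
    "c-Kit": "Target",
    "Bcr-Abl": "Target",       # assumed type, not stated on the slide
    "KIT signaling": "Pathway",
    "mast cell": "Cell",
    "eosinophil": "Cell",      # assumed type, not stated on the slide
    "asthma": "Disease",
}


def abstraction(instance: str) -> str:
    """Map a specific instance up to its abstract entity type."""
    return BLACKBOARD[instance]


def binding(entity_type: str) -> list:
    """Map an abstract entity type down to the instances currently known."""
    return [name for name, kind in BLACKBOARD.items() if kind == entity_type]


def bind_plan(plan: list) -> dict:
    """Attach candidate instances to each step of an abstract plan."""
    return {step: binding(step) for step in plan}


for step, instances in bind_plan(ABSTRACT_PLAN).items():
    print(f"{step}: {', '.join(instances)}")
```

Planning at the type level keeps the search space small: the plan has five steps regardless of how many drugs, targets, or cell types end up on the blackboard.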

Credit: NCATS Translator Team Ultraviolet, Broad Institute


Four steps of reasoning

Credit: NCATS Translator Team Ultraviolet, Broad Institute

Changing Face of Medicine: Continuous Learning

[Diagram labels: Biological principles; Basic Science Studies; Clinical Cohorts; Experience; Studies; Longitudinal Data; IoT; Wearables; Social, Behavioral, Biological, Environmental; Knowledge of Basic Science; Knowledge of Clinicians]

RESEARCH ENVIRONMENT

DISCOVERY

Prototype Platform
• Pulls datasets from disparate sources
• Normalizes across disparate sources to create useable data set
• Projects approved data sets into safe, secure environment for analysis
• Checks against IRB approvals to ensure authorization to access data

DATA COMMONS

Platform Management and Services

Platform Security and Data Provenance

Cloud and Big Data Technologies

Systems of Record

LEGEND
• JHM Deployment (Local)
• Azure Deployment (Cloud)
• Hybrid Deployment

DATA SOURCES

Reusable Pipelines
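The four platform functions (pull, normalize, project, check IRB approval) can be sketched as a small pipeline. Everything below is hypothetical: the function names, field names, sample records, and approval table are invented for illustration and say nothing about how the actual platform is built.

```python
def normalize(record: dict, field_map: dict) -> dict:
    """Rename source-specific fields to shared names; drop unmapped fields."""
    return {field_map[key]: value for key, value in record.items() if key in field_map}


def is_authorized(user: str, dataset: str, irb_approvals: dict) -> bool:
    """Check the (hypothetical) IRB approval table for this user and dataset."""
    return dataset in irb_approvals.get(user, set())


def project(user, dataset, sources, field_maps, irb_approvals):
    """Return normalized rows for an approved user, or raise PermissionError."""
    if not is_authorized(user, dataset, irb_approvals):
        raise PermissionError(f"{user} has no IRB approval for {dataset}")
    rows = []
    # Pull from each source of the dataset and normalize to shared field names.
    for source_name, records in sources[dataset].items():
        rows.extend(normalize(record, field_maps[source_name]) for record in records)
    return rows


# Made-up sample data: one dataset drawn from two differently shaped sources.
SOURCES = {
    "cohort_x": {
        "ehr": [{"pt_id": 1, "sbp": 120}],
        "registry": [{"patient": 2, "systolic": 135, "site": "A"}],
    }
}
FIELD_MAPS = {
    "ehr": {"pt_id": "patient_id", "sbp": "systolic_bp"},
    "registry": {"patient": "patient_id", "systolic": "systolic_bp"},
}
IRB_APPROVALS = {"alice": {"cohort_x"}}

print(project("alice", "cohort_x", SOURCES, FIELD_MAPS, IRB_APPROVALS))
```

The authorization check runs first in this sketch so that no data is read for an unapproved request; the slide lists it last, and the real ordering is not stated.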


Where is This Going?

• Biomedical practice and research are data, information, and knowledge intensive
• Comparable and consistent data representation is a prerequisite for efficient clinical analytics
• Clinical Data is Broad, Deep, and Complex
• Inferencing clinical “big data” is not Google
• Ontological frameworks are needed for large-scale data integration and big science