Biomedical Informatics; CG Chute
© Mayo Clinic College of Medicine 201 1
Integration of Knowledge and Data from Science to Clinical Practice
Christopher G. Chute, MD, DrPH
Bloomberg Distinguished Professor of Health Informatics
Professor of Medicine, Public Health, and Nursing
Chief Health Research Information Officer
Deputy Director, Institute for Clinical and Translational Research
Johns Hopkins University, Baltimore, MD, USA
Advancing Research in the Digital Age
Baltimore, 27 February 2018
© 2007 Mayo
The Historical Center of the Health Data Universe
[Diagram: Billable Diagnoses; Clinical Data]
Copernican Healthcare
[Diagram: Clinical Data; Billable Diagnoses; Clinical Guidelines; Scientific Literature; Medical Literature; caption "Niklas Koppernigk"]
Origins of Big Science: Astronomy
Sloan Digital Sky Survey III – DR9
Total area of imaging: 31,637 square degrees
Image field size: 1361 × 2048 pixels
Number of fields: 938,046 (excluding supernovae runs)
Catalog objects: 1,231,051,050
Number of unique, primary sources:
  Total: 469,053,874
  Stars: 260,562,744
  Galaxies: 208,478,448
  Unknown: 12,682
• Images
• Spectra
• Object catalog
• Metadata
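The breakdown of unique, primary sources is internally consistent: stars, galaxies, and unknown objects add up exactly to the reported total. A minimal Python check using only the DR9 figures listed above:

```python
# Unique, primary source counts from the SDSS DR9 summary above.
primary_sources = {
    "stars": 260_562_744,
    "galaxies": 208_478_448,
    "unknown": 12_682,
}
reported_total = 469_053_874

computed_total = sum(primary_sources.values())
print(f"computed total: {computed_total:,}")
print("matches reported total:", computed_total == reported_total)

# Share of each class among the unique primary sources.
for name, count in primary_sources.items():
    print(f"{name}: {count / computed_total:.2%}")
```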
Rutherford “Table-top” Experiment
That Higgs Boson
• 600 institutions
• 10,000 scientists
• 800 trillion collisions
• 200 PB of data = 2×10^17 bytes of data
  (Bordering on an astronomical number in its own right!)
• $13.25B USD
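The conversion from 200 PB to 2×10^17 bytes holds when PB uses the decimal (SI) prefix, 1 PB = 10^15 bytes. The sketch below also shows the binary reading (1 PiB = 2^50 bytes) for comparison; which convention the original figure used is an assumption here.

```python
# Decimal (SI) prefix: 1 PB = 10**15 bytes.
PETABYTE_SI = 10**15
# Binary prefix, for comparison only: 1 PiB = 2**50 bytes.
PEBIBYTE = 2**50

lhc_data_pb = 200
lhc_bytes_si = lhc_data_pb * PETABYTE_SI
lhc_bytes_binary = lhc_data_pb * PEBIBYTE

print(f"{lhc_bytes_si:.1e} bytes (SI petabytes)")  # 2.0e+17 bytes (SI petabytes)
print(f"{lhc_bytes_binary:.2e} bytes (if PB meant PiB)")
```

Either way the total sits on the order of 10^17 bytes.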
Dimensionality of Higgs “Big Data”
• Mass/Energy
• Direction
• Charge
• Medicine is more complicated than that
Dimensionality of Big Data
• Broad
  • Small amounts of data; huge number of observations
  • National claims data
• Deep
  • Large amounts of data; few observations
  • NGS complete genome
• Rich
  • Broad and deep
  • Clinical phenotyping data (EMRs)
  • Labs, vitals, exam, waveforms, images, omics, …
  • Social, environmental, diet, …
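The broad/deep/rich distinction can be read as a statement about the shape of a dataset: how many observations it has versus how much data each observation carries. The sketch below classifies datasets by shape alone. The thresholds and all example sizes are hypothetical, chosen only for illustration, and shape does not capture the heterogeneity (labs, images, omics, social data) that also makes clinical data "rich".

```python
from dataclasses import dataclass

# Hypothetical thresholds for illustration only; the slide gives no numbers.
MANY_OBSERVATIONS = 1_000_000
MANY_FEATURES = 10_000


@dataclass
class DatasetShape:
    name: str
    n_observations: int    # e.g., patients or claims records
    features_per_obs: int  # data points recorded per observation


def classify(shape: DatasetShape) -> str:
    """Label a dataset as broad, deep, rich, or small by its shape."""
    broad = shape.n_observations >= MANY_OBSERVATIONS
    deep = shape.features_per_obs >= MANY_FEATURES
    if broad and deep:
        return "rich"
    if broad:
        return "broad"
    if deep:
        return "deep"
    return "small"


# All sizes below are hypothetical placeholders, not measured values.
examples = [
    DatasetShape("national claims", 100_000_000, 50),
    DatasetShape("NGS complete genomes", 1_000, 3_000_000),
    DatasetShape("EMR phenotyping", 5_000_000, 20_000),
]
for ds in examples:
    print(f"{ds.name}: {classify(ds)}")
```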
Biocomputing: Unimaginable potential, next 60 years
• Moore's law has had a long reign
• 10^13-fold increase in computing power / 60 yrs
• World-class supercomputers of 1990
  • Game platforms and cell phones of 2015 (GPUs)
• Extensions to networks, memory, & storage
  • 110 baud to 100 Gb backbone (10^10 / 60 yrs)
  • Vacuum tubes to 1 Tb RAM (10^12 / 60 yrs)
  • Paper tape to PB drives (10^15 / 60 yrs)
  • Cloud computing (10^10 / 10 yrs)
• Synergistic summary: 10^60 / 60 yrs
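The synergistic summary follows from multiplying the individual fold increases, which adds their exponents: 13 + 10 + 12 + 15 + 10 = 60. As on the slide, this mixes the 10-year cloud figure with the 60-year figures. The same numbers also give the doubling time implied by a 10^13-fold increase in computing power over 60 years. A small Python sketch of both calculations:

```python
import math

# Orders of magnitude listed on the slide.
growth_exponents = {
    "computing power": 13,    # 10^13 / 60 yrs
    "network bandwidth": 10,  # 110 baud to 100 Gb backbone, 60 yrs
    "memory": 12,             # vacuum tubes to 1 Tb RAM, 60 yrs
    "storage": 15,            # paper tape to PB drives, 60 yrs
    "cloud computing": 10,    # 10^10 / 10 yrs
}

# Multiplying fold increases adds their exponents.
combined = sum(growth_exponents.values())
print(f"synergistic summary: 10^{combined}")  # synergistic summary: 10^60

# Doubling time implied by a 10^13-fold increase over 60 years.
doublings = 13 * math.log2(10)
years_per_doubling = 60 / doublings
print(f"{doublings:.1f} doublings, one every {years_per_doubling:.2f} years")
```

The implied doubling time, roughly 1.4 years, is in line with the usual statement of Moore's law.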
Biomedical Science: An Information-Intensive Enterprise
Biomedical discovery is dependent upon:
• Web of data, information, and knowledge
• Public repositories of ‘omics data
  • Genes, proteins, small molecules, pathways
  • Associations, annotations, dynamic pathways
• Clinical data is becoming plentiful: EHRs
  • Privacy and confidentiality limit public access
  • De-identification remains asymptotic
… enabled by computational capacity
… enhanced with linkage and connections
Some Fashionable Biology Databases This Week
Some Fashionable Clinical Databases This Week
[Chart: Estimated Activity (scale 0 to 10) across Biology and Medicine]
The Continuum of Biomedical Informatics: Bioinformatics Meets Medical Informatics
The Chasm of Semantic Despair
From Practice-based Evidence to Evidence-based Practice
Clinical Databases
Registries et al.
Clinical Guidelines
Expert Systems
Data Inference
Knowledge Management
Decision Support
Standards
Comparability and Consistency
Terminologies & Data Models
Foundations for Learning Health System
Patient Encounters
Medical Knowledge
NCATS Translator Program
June 2016
Chris Chute: Clinical terminologies, frameworks & text mining
Melissa Haendel: Developmental biology, model organisms & ontologies
Chris Mungall: Semantic engineering & similarity algorithms
Peter Robinson: Clinical ontologies & diagnostic algorithms, medical genetics
Hongfang Liu: Text mining, pathway modeling, data normalization
Ben Good: Crowdsourcing, community curation, semantic web
Andrew Su: Data integration, data services
Shannon McWeeney: Cancer imaging, omics, drugs, flow cytometry, machine learning
Maureen Hoatlin: Fanconi anemia, rare disease, biochemistry, basic research
Julie McMurry: Project management, user experience, public health
David Koeller: Clinician specializing in inborn errors of metabolism, undiagnosed diseases
Casey Overby: Knowledge-based methods & evaluation of precision medicine applications
Guoqian Jiang: Clinical data standards & interoperability, semantic modeling & validation
The Monarch Initiative: Melissa Haendel, Chris Mungall
• Integrate, align, and re-distribute cross-species gene, genotype, variant, disease, and phenotype data
• Provide a portal for exploration of phenotype-based similarity
• Facilitate identification of animal models of human disease through phenotypic similarity
• Enable quantitative comparison of cross-species phenotypes
• Develop embeddable widgets for data exploration
• Influence genotype and phenotype reporting standards
• Improve ontologies to better curate genotype-phenotype data
• “make all the data count” monarchinitiative.org
Image from Washington et al., 2009 (PMC2774506)
Computational Comparison of Animal Model Phenotypes with Human Disease
• Comparisons computationally established using ontological characterizations
• Robinson et al., 2014 (PMC3912424)
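One widely used way to compare phenotype profiles through an ontology is to expand each annotated term to all of its ancestors and take the Jaccard index of the two expanded sets, so that related but non-identical terms still share ancestors. The sketch below assumes a toy is_a hierarchy with hypothetical term names (not real HPO or MP IDs), and assumes mouse and human phenotypes have already been mapped into one shared ontology. It illustrates the general idea and is not claimed to be the exact method of the cited work or of OWLsim.

```python
# Toy is_a hierarchy: child -> parents (hypothetical terms, not real IDs).
PARENTS = {
    "absent_thumb": ["abnormal_thumb"],
    "abnormal_thumb": ["abnormal_digit"],
    "abnormal_digit": ["abnormal_limb"],
    "abnormal_limb": ["phenotypic_abnormality"],
    "anemia": ["abnormal_blood"],
    "abnormal_blood": ["phenotypic_abnormality"],
    "phenotypic_abnormality": [],
}


def ancestors(term: str) -> set[str]:
    """Return the term plus all of its ancestors."""
    seen: set[str] = set()
    stack = [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(PARENTS[t])
    return seen


def profile_closure(profile: set[str]) -> set[str]:
    """Union of ancestor sets over every term in a phenotype profile."""
    closure: set[str] = set()
    for term in profile:
        closure |= ancestors(term)
    return closure


def jaccard_similarity(a: set[str], b: set[str]) -> float:
    """Jaccard index of the ancestor closures of two phenotype profiles."""
    ca, cb = profile_closure(a), profile_closure(b)
    return len(ca & cb) / len(ca | cb)


human_disease = {"absent_thumb", "anemia"}
mouse_model = {"abnormal_thumb", "anemia"}
unrelated = {"anemia"}
print(jaccard_similarity(human_disease, mouse_model))
print(jaccard_similarity(human_disease, unrelated))
```

The mouse profile scores higher than the anemia-only profile because its thumb phenotype shares most ancestors with the human "absent thumb" term, even though the terms differ.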
Leadership in existing projects to be leveraged in the Translator
Team Experienced in Open Source Efforts and Public Data
Dissemination / deployment
Extraction of knowledge from diverse sources
Data & knowledge integration, algorithms & tools
Contribution in other projects
Leadership in other projects
OBO Foundry
Extraction of knowledge from diverse sources
My*
• Tools to integrate vocabularies
• Tools to identify equivalencies:
  • Identifier and synonym alignment
  • Conceptual alignment based upon logic determinations and prior probabilities
• Text mining for clinical concepts and pathway fragments
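Identifier and synonym alignment can be sketched as grouping identifiers from different vocabularies whenever their normalized labels or synonyms coincide, then merging groups transitively. In the sketch below the vocabularies, identifiers, and synonym lists are hypothetical placeholders, and normalization is just lowercasing and stripping punctuation.

```python
import re
from collections import defaultdict


def normalize(label: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    label = re.sub(r"[^a-z0-9 ]", " ", label.lower())
    return " ".join(label.split())


# Hypothetical entries: (vocabulary, identifier, [label and synonyms]).
entries = [
    ("VocabA", "A:001", ["Fanconi anemia", "Fanconi's anaemia"]),
    ("VocabB", "B:17", ["FANCONI ANEMIA", "Fanconi pancytopenia"]),
    ("VocabC", "C:9", ["Fanconi pancytopenia"]),
    ("VocabC", "C:10", ["Aplastic anemia"]),
]


def align_by_synonyms(entries):
    """Cluster identifiers that share any normalized label or synonym."""
    by_label = defaultdict(set)
    for _vocab, ident, names in entries:
        for name in names:
            by_label[normalize(name)].add(ident)

    # Union-find over identifiers.
    parent = {ident: ident for _vocab, ident, _names in entries}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for idents in by_label.values():
        first, *rest = sorted(idents)
        for other in rest:
            parent[find(other)] = find(first)

    clusters = defaultdict(set)
    for ident in parent:
        clusters[find(ident)].add(ident)
    return sorted(sorted(c) for c in clusters.values())


print(align_by_synonyms(entries))
```

Here A:001 and C:9 never share a synonym directly but end up together through B:17. Transitive merging of this kind can over-merge, which is one reason purely lexical matching is paired with the logic-based and probabilistic alignment listed on the slide.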
TRANSMED KNOWLEDGE GRAPH
kBOOM
GRAPH INFERENCE
• OWL-based reasoning over large graphs
• BEL pathway ‘chains of causation’
• Probabilistic inference across disease-phenotype-gene: Bayesian Ontology Query Algorithm (BOQA)
GRAPH QUERY
• Query related entities within/across species, sources
• Query similar sets of entities (OWLsim)
  • Sets of phenotypes
  • Expression patterns
  • Pathway modules
TRANSMED KNOWLEDGE GRAPH
Data & knowledge integration, algorithms & tools
Example query: What pathways are uniquely targeted by stem cell therapy pre-conditioning drugs that are well- vs. poorly tolerated by FA (Fanconi anemia) patients?
Query Path with Semantic Types
Drug -[molecularly_controls]-> Protein -[encoded_by]-> Gene -[member_of]-> Pathway
drugbank:DB01073 (Fludarabine) -[molecularly_controls]-> Uniprot:P09884 (POLA1 protein) -[encoded_by]-> HGNC:9173 (POLA1 gene) -[member_of]-> WikiPath:WP2446 (Rb Pathway)
Data Types and Sources:
1. Drug-Protein Interactions
   a. from DrugBank via BioThings API
   b. from DGIdb via DGIdb API
2. Protein-Gene Associations
   a. from Ensembl via Ensembl API or BioLink API
   b. from UniProt via Wikidata API
3. Gene-Pathway Membership
   a. from WikiPathways via Wikidata API
   b. from Reactome via BioLink API
(Query implemented in OrangeQ2.4_Drug_Gene_Pathway Jupyter notebook)
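The query path is a chain of typed predicates, so answering it amounts to following one predicate per hop from a starting node. The sketch below uses only the four identifiers and three edges from the example above, held in an in-memory list; the actual query pulls these edges from DrugBank, DGIdb, Ensembl, UniProt, WikiPathways, and Reactome through the APIs listed, which are not called here.

```python
from collections import defaultdict

# Edges from the example query path: (subject, predicate, object).
EDGES = [
    ("drugbank:DB01073", "molecularly_controls", "Uniprot:P09884"),
    ("Uniprot:P09884", "encoded_by", "HGNC:9173"),
    ("HGNC:9173", "member_of", "WikiPath:WP2446"),
]
LABELS = {
    "drugbank:DB01073": "Fludarabine",
    "Uniprot:P09884": "POLA1 protein",
    "HGNC:9173": "POLA1 gene",
    "WikiPath:WP2446": "Rb Pathway",
}


def follow_path(start, predicates, edges):
    """Return all node paths from start that follow the predicates in order."""
    index = defaultdict(list)
    for s, p, o in edges:
        index[(s, p)].append(o)
    paths = [[start]]
    for pred in predicates:
        # Extend every partial path by each object reachable via pred.
        paths = [path + [o] for path in paths for o in index[(path[-1], pred)]]
    return paths


query = ["molecularly_controls", "encoded_by", "member_of"]
for path in follow_path("drugbank:DB01073", query, EDGES):
    print(" -> ".join(f"{node} ({LABELS[node]})" for node in path))
```

Starting from a set of drug identifiers instead of one, and comparing the resulting pathway sets between well- and poorly tolerated drugs, would give the "uniquely targeted" pathways the example query asks for.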
[Figure: Knowledge graph data model]
Node types (identifiers and attributes):
• Pharmaceutical Product: RxNorm CUI, EMEA ID
• Chemical Compound: RxNorm CUI, CAS, PubChem, InChIKey, DrugBank, UNII, ChEMBL, ChEBI, NDF-RT
• Disease: OMIM, MeSH, ICD-9, DOID, ICD-10, UMLS, Orphanet
• Gene: Ensembl, Entrez, RefSeq, HGNC, ortholog; strand, chromosome, genomic start/end, taxon
• Protein: Ensembl, UniProt, RefSeq; function, cell component, biological process (GO term); has part (IPR domain); subclass of (IPR family); taxon
• Variant: chromosome, genomic start/end, protein, CIViC variant ID
• Biological Pathway: WikiPathways ID; has part
Predicates:
• active ingredient / has active ingredient
• genetic association
• drug used for treatment / medical condition treated
• medicine marketing authorization
• encodes / encoded by
• physically interacts with, as (agonist, inhibitor, modulator, …)
• positive / negative therapeutic predictor
• positive / negative prognostic predictor
• positive / negative diagnostic predictor
RED indicates new type or predicate (since Sept): ~150k items added. Garbanzo API (knowledge beacon implementation)
Counts shown in the figure (labeled "Total counts" and "New"): 895,065; 110,306; 535,409; 46,356; 8,694; 1,081; 31; 216; 1,165; 1,259; 250; 2,506; 157,340
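A typed data model like this lets each edge be checked against the node types it connects. The sketch below encodes a few (subject type, predicate, object type) triples as an allowed set and flags edges that violate it. The triple directions are an assumed reading of the figure (for example, Gene encodes Protein), the type assignments reuse identifiers from the example query, and the third test edge is deliberately wrong.

```python
# Allowed (subject_type, predicate, object_type) triples.
# Assumed reading of the data model figure, for illustration only.
SCHEMA = {
    ("PharmaceuticalProduct", "has_active_ingredient", "ChemicalCompound"),
    ("Gene", "encodes", "Protein"),
    ("Protein", "encoded_by", "Gene"),
    ("ChemicalCompound", "physically_interacts_with", "Protein"),
}

# Node types for identifiers taken from the example query path.
NODE_TYPES = {
    "Uniprot:P09884": "Protein",
    "HGNC:9173": "Gene",
    "drugbank:DB01073": "ChemicalCompound",
}


def invalid_edges(edges, node_types, schema):
    """Return edges whose typed (subject, predicate, object) is not allowed."""
    return [
        (s, p, o)
        for s, p, o in edges
        if (node_types.get(s), p, node_types.get(o)) not in schema
    ]


edges = [
    ("HGNC:9173", "encodes", "Uniprot:P09884"),
    ("drugbank:DB01073", "physically_interacts_with", "Uniprot:P09884"),
    ("Uniprot:P09884", "encodes", "HGNC:9173"),  # wrong direction on purpose
]
print(invalid_edges(edges, NODE_TYPES, SCHEMA))
```

Keeping the schema as plain data means a new type or predicate, like those marked in red in the figure, is added by extending the set rather than changing code.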
adapted from Russell and Norvig, 2009
Credit: NCATS Translator Team Ultraviolet, Broad Institute
The Knowledge Map is a set of actions the agent can perform
download everything
crawl & index
high-level abstraction
trial & error
more details fewer details
The agent plans with abstract entities
BLACKBOARD: specific instances
KNOWLEDGE MAP AND AGENT STATE: abstract entities
imatinib, c-Kit, KIT signaling, mast cell, asthma
Drug, Target, Pathway, Cell, Disease
binding abstraction
Bcr-Abl, eosinophil
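Planning with abstract entities can be sketched in two stages: first search the knowledge map for a chain of entity types (Drug to Target to Pathway to Cell to Disease), then fill that chain with specific instances recorded on the blackboard. The sketch below is a generic illustration, not the Translator team's agent. The knowledge map, the blackboard contents, and the links between the example entities are hypothetical, read from the left-to-right layout of the slide; eosinophil is left out because its links are not recoverable, and Bcr-Abl is kept as a target that leads nowhere further.

```python
from collections import deque

# Hypothetical knowledge map: abstract entity types, with an edge for each
# action that can move the agent from one type to the next.
KNOWLEDGE_MAP = {
    "Drug": ["Target"],
    "Target": ["Pathway"],
    "Pathway": ["Cell"],
    "Cell": ["Disease"],
    "Disease": [],
}

# Hypothetical blackboard: specific instances found at each abstract step.
BLACKBOARD = {
    ("Drug", "Target"): {"imatinib": ["c-Kit", "Bcr-Abl"]},
    ("Target", "Pathway"): {"c-Kit": ["KIT signaling"]},
    ("Pathway", "Cell"): {"KIT signaling": ["mast cell"]},
    ("Cell", "Disease"): {"mast cell": ["asthma"]},
}


def plan(start, goal, kmap):
    """Breadth-first search for the shortest chain of abstract types."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in kmap[path[-1]]:
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None


def instantiate(abstract_path, start_instance, blackboard):
    """Expand an abstract plan into chains of specific instances."""
    chains = [[start_instance]]
    for a, b in zip(abstract_path, abstract_path[1:]):
        step = blackboard.get((a, b), {})
        # Chains with no instance for the next step are dropped.
        chains = [c + [n] for c in chains for n in step.get(c[-1], [])]
    return chains


abstract = plan("Drug", "Disease", KNOWLEDGE_MAP)
print(abstract)
print(instantiate(abstract, "imatinib", BLACKBOARD))
```

Planning over a handful of types is cheap, and the expensive lookups for specific instances are only made along a chain already known to connect the start and goal types.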
Four steps of reasoning
Changing Face of Medicine: Continuous Learning
Biological principles
Basic Science Studies
Clinical Cohorts
Experience
Studies
Longitudinal Data
IoT
Wearables
Social, Behavioral, Biological, Environmental
Knowledge of Basic Science
Knowledge of Clinicians
RESEARCH ENVIRONMENT
DISCOVERY
Prototype Platform
• Pulls datasets from disparate sources
• Normalizes across disparate sources to create a usable data set
• Projects approved data sets into a safe, secure environment for analysis
• Checks against IRB approvals to ensure authorization to access data
DATA COMMONS
Platform Management and Services
Platform Security and Data Provenance
Cloud and Big Data Technologies
Systems of Record
JHM Deployment (Local)
Azure Deployment (Cloud)
Hybrid Deployment
LEGEND
DATA SOURCES
Reusable Pipelines
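The platform steps (pull, normalize, check IRB approval, project into a secure workspace) can be sketched as a small pipeline in which authorization gates what reaches the analysis environment. Everything below is hypothetical: the source extracts, field names, code mappings, and the IRB approval table are placeholders, "pulling" is just reading in-memory dictionaries, and nothing here reflects how the JHM or Azure deployments are actually built.

```python
# Hypothetical raw extracts whose field names and codes differ by source.
RAW_SOURCES = {
    "ehr": [{"MRN": "1", "Sex": "F"}],
    "registry": [{"patient_id": "2", "gender": "male"}],
    "claims": [{"member": "3", "sex_cd": "M"}],
}

# Hypothetical mappings onto one common data model.
FIELD_MAP = {
    "ehr": {"MRN": "person_id", "Sex": "sex"},
    "registry": {"patient_id": "person_id", "gender": "sex"},
    "claims": {"member": "person_id", "sex_cd": "sex"},
}
SEX_CODES = {"f": "F", "female": "F", "m": "M", "male": "M"}

# Hypothetical IRB approvals: protocol ID -> data sources it authorizes.
IRB_APPROVALS = {"IRB-0001": {"ehr", "registry"}}


def normalize(source: str, rows: list[dict]) -> list[dict]:
    """Rename fields and recode values into the common model."""
    out = []
    for row in rows:
        rec = {FIELD_MAP[source][key]: value for key, value in row.items()}
        rec["sex"] = SEX_CODES[rec["sex"].lower()]
        out.append(rec)
    return out


def run_pipeline(protocol: str, requested: list[str]):
    """Pull and normalize requested sources, then project only IRB-approved ones."""
    normalized = {s: normalize(s, RAW_SOURCES[s]) for s in requested}
    approved = IRB_APPROVALS.get(protocol, set())
    workspace = {s: rows for s, rows in normalized.items() if s in approved}
    denied = sorted(set(requested) - approved)
    return workspace, denied


workspace, denied = run_pipeline("IRB-0001", ["ehr", "registry", "claims"])
print(workspace)
print("denied:", denied)
```

Because the mappings are data rather than code, the same pipeline can be reused for a new source by adding one entry to each table, in the spirit of the "Reusable Pipelines" on the slide.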
Where Is This Going?
• Biomedical practice and research are data-, information-, and knowledge-intensive
• Comparable and consistent data representations are prerequisites for efficient clinical analytics
• Clinical data is broad, deep, and complex
• Inferencing over clinical “big data” is not Google
• Ontological frameworks are needed for large-scale data integration and big science