Next-generation Phenotyping Using Interoperable Big Data

George Hripcsak, Chunhua Weng

Columbia University Medical Center

In collaboration with Mount Sinai Medical Center

Biomedical Informatics: discovery and impact

Introducing OHDSI

Observational Health Data Sciences and Informatics
• International network of researchers and observational health databases with a central coordinating center housed at Columbia University
• Mission: Large-scale analysis of observational health databases for population-level estimation and patient-level predictions
• Vision: Patients and clinicians use OHDSI tools every day to access evidence based on 1 billion patients

http://ohdsi.org

Clinical researcher, provider, patient

Tools and algorithms

Data nodes

Infrastructure, models, ontologies

OHDSI’s global research community

• >120 collaborators from 11 different countries
• Experts in informatics, statistics, epidemiology, clinical sciences
• Active participation from academia, government, industry, providers

http://ohdsi.org/who-we-are/collaborators/

Global reach of ohdsi.org

• >4600 distinct users from 96 countries in 2015

Why large-scale analysis is needed in healthcare

[Figure: all drugs plotted against all health outcomes of interest]

What is large-scale?

• Millions of observations

• Millions of covariates

• Millions of questions

No analytics software in the world can fit a regression with >1m observations and >1m covariates on typical hardware… but CYCLOPS can!

Handling a relational structure with millions of patients and billions of clinical observations demands performance, with optimization focused on analytical use cases.

Systematic solutions with massive parallelization should be designed to run efficiently for one-at-a-time AND all-by-all
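
CYCLOPS takes its name from cyclic coordinate descent, which updates one coefficient at a time and so only touches the nonzero entries of a single covariate per step. The sketch below is a toy illustration of that idea, swapped down to ridge-penalized least squares on sparse columns. It is not CYCLOPS code: the function name, the dictionary-based column layout, and the penalty form are all assumptions made for this example.

```python
def coordinate_descent_ridge(columns, y, penalty=0.0, sweeps=100):
    """Minimize 0.5*||y - X b||^2 + 0.5*penalty*||b||^2 one coefficient at a time.

    columns[j] is a sparse column of X stored as {row_index: value}.
    """
    beta = [0.0] * len(columns)
    residual = list(y)  # equals y - X b, and b starts at zero
    for _ in range(sweeps):
        for j, column in enumerate(columns):
            if not column:
                continue
            # Closed-form optimum for coefficient j with the others held fixed.
            numerator = sum(v * (residual[i] + v * beta[j]) for i, v in column.items())
            denominator = sum(v * v for v in column.values()) + penalty
            updated = numerator / denominator
            change = updated - beta[j]
            # Only rows where this covariate is nonzero need a residual update.
            for i, v in column.items():
                residual[i] -= v * change
            beta[j] = updated
    return beta


# Toy data: 3 observations, 2 sparse covariates, y generated with beta = [1, 2].
columns = [{0: 1.0, 1: 1.0}, {1: 1.0, 2: 1.0}]
y = [1.0, 3.0, 2.0]
beta = coordinate_descent_ridge(columns, y)
```

Because each update reads and writes only the rows where one covariate is nonzero, the cost of a sweep scales with the number of nonzero entries rather than with observations times covariates.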

[Diagram: OMOP Common Data Model tables, grouped by category]

• Standardized clinical data: Person, Observation_period, Specimen, Death, Visit_occurrence, Procedure_occurrence, Drug_exposure, Device_exposure, Condition_occurrence, Measurement, Note, Observation, Fact_relationship
• Standardized health system data: Location, Care_site, Provider
• Standardized health economics: Payer_plan_period, Visit_cost, Procedure_cost, Drug_cost, Device_cost
• Standardized derived elements: Cohort, Cohort_attribute, Drug_era, Dose_era, Condition_era
• Standardized vocabularies: Concept, Vocabulary, Domain, Concept_class, Concept_relationship, Relationship, Concept_synonym, Concept_ancestor, Source_to_concept_map, Drug_strength, Cohort_definition, Attribute_definition
• Standardized meta-data: CDM_source
• Applications: Drug safety surveillance, Device safety surveillance, Vaccine safety surveillance, Comparative effectiveness, Health economics, Quality of care

Preparing your data for analysis

Patient-level data in source system/schema → ETL design → ETL implement → ETL test → Patient-level data in OMOP CDM

OHDSI tools built to help:
• WhiteRabbit: profile your source data
• RabbitInAHat: map your source structure to CDM tables and fields
• ATHENA: standardized vocabularies for all CDM domains
• Usagi: map your source codes to CDM vocabulary
• CDM: DDL, index, constraints for Oracle, SQL Server, PostgreSQL; vocabulary tables with loading scripts
• ACHILLES: profile your CDM data; review data quality assessment; explore population-level summaries
• OHDSI Forums: public discussions for OMOP CDM implementers/developers

http://github.com/OHDSI

Data Evidence sharing paradigms

Patient-level data in OMOP CDM → evidence

• Single study: Write protocol → Develop code → Execute analysis → Compile result
• Real-time query: Develop app → Design query → Submit job → Review result
• Large-scale analytics: Develop app → Execute script → Explore results

Legend: One-time, Repeated

Standardized large-scale analytics tools under development within OHDSI

Patient-level data in OMOP CDM

• ACHILLES: Database profiling
• CIRCE: Cohort definition
• HERACLES: Cohort characterization
• HERMES: Vocabulary exploration
• LAERTES: Drug-AE evidence base
• HOMER: Population-level causality assessment
• PLATO: Patient-level predictive modeling
• CALYPSO: Feasibility assessment
• OHDSI Methods Library: CYCLOPS, CohortMethod, SelfControlledCaseSeries, SelfControlledCohort, TemporalPatternDiscovery, Empirical Calibration

http://github.com/OHDSI

CIRCE for cohort definition

• CIRCE (Cohort Inclusion and Restriction Criteria Expression)

• User interface to define and review cohort definitions:

– COHORT is a set of persons satisfying one or more criteria for a duration of time

– Disease phenotype is a typical use case for cohort definition

• Interface translates a human-readable form into a standardized JSON representation for interoperable network-based analysis, and compiles the JSON into a platform-specific SQL dialect for direct execution against any OMOP CDM-compliant dataset

• Open-source, freely available source code: https://github.com/OHDSI/Circe
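
The JSON-to-SQL step can be pictured with a small sketch. The JSON layout and the `compile_to_sql` helper below are hypothetical and far simpler than the real CIRCE representation; the concept ids are placeholders, and an in-memory SQLite table stands in for a CDM database. Only the table and column names (`condition_occurrence`, `person_id`, `condition_concept_id`, `condition_start_date`) follow the OMOP CDM.

```python
import json
import sqlite3

# Hypothetical, much-simplified cohort expression; NOT the real CIRCE JSON schema.
expression_json = json.dumps({
    "table": "condition_occurrence",
    "concept_column": "condition_concept_id",
    "date_column": "condition_start_date",
    "concept_ids": [1001, 1002],   # placeholder concept ids
    "min_occurrences": 2,
})


def compile_to_sql(expression_json):
    """Turn the toy JSON expression into a SQL query returning cohort entries."""
    expr = json.loads(expression_json)
    concept_list = ", ".join(str(int(c)) for c in expr["concept_ids"])
    return (
        f"SELECT person_id, MIN({expr['date_column']}) AS cohort_start_date "
        f"FROM {expr['table']} "
        f"WHERE {expr['concept_column']} IN ({concept_list}) "
        f"GROUP BY person_id "
        f"HAVING COUNT(*) >= {int(expr['min_occurrences'])} "
        f"ORDER BY person_id"
    )


# Tiny in-memory stand-in for one CDM table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE condition_occurrence "
    "(person_id INTEGER, condition_concept_id INTEGER, condition_start_date TEXT)"
)
conn.executemany(
    "INSERT INTO condition_occurrence VALUES (?, ?, ?)",
    [
        (1, 1001, "2015-01-10"),
        (1, 1002, "2015-03-02"),
        (2, 1001, "2015-02-01"),   # only one qualifying record
        (3, 9999, "2015-02-05"),   # unrelated concept
    ],
)
cohort_rows = conn.execute(compile_to_sql(expression_json)).fetchall()
```

Because the definition travels as a JSON string, two sites holding the same string would generate the same query text against their own data.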

One interface allows definition of criteria across all tables and all fields of the OMOP Common Data Model. The user interface translates this human-readable form into JSON, which is compiled into SQL dialects for 5 platforms.

Each expression can be defined by one or more standard concept sets, using OHDSI’s standardized vocabularies

OHDSI standardized vocabularies allow consistent definitions to be applied across disparate source vocabularies:

Selecting descendants of the SNOMED concept ‘Attention deficit hyperactivity disorder’ maps all ICD9, ICD10, and READ codes, so the analysis can be executed across OHDSI’s international data network

HERMES for vocabulary exploration
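
A minimal sketch of how selecting descendants can gather codes from several source vocabularies. The `concept_ancestor` table and its two columns follow the OMOP CDM; the `source_code_map` table is a hypothetical flattened stand-in for the vocabulary mappings, and every id and code below is a placeholder, not a real concept or code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE concept_ancestor (ancestor_concept_id INTEGER, descendant_concept_id INTEGER);
CREATE TABLE source_code_map (source_vocabulary TEXT, source_code TEXT, standard_concept_id INTEGER);
-- Placeholder ids: 100 is the parent concept; 101 and 102 are its descendants.
-- The toy data lists each concept as its own ancestor.
INSERT INTO concept_ancestor VALUES (100, 100), (100, 101), (100, 102), (200, 200);
INSERT INTO source_code_map VALUES
    ('ICD9', 'A1', 100), ('ICD10', 'B2', 101), ('READ', 'C3', 102), ('ICD9', 'Z9', 200);
""")


def source_codes_for(conn, ancestor_id):
    """Return (vocabulary, code) pairs mapped to the concept or any descendant."""
    return conn.execute(
        "SELECT m.source_vocabulary, m.source_code "
        "FROM concept_ancestor a "
        "JOIN source_code_map m ON m.standard_concept_id = a.descendant_concept_id "
        "WHERE a.ancestor_concept_id = ? "
        "ORDER BY m.source_vocabulary, m.source_code",
        (ancestor_id,),
    ).fetchall()


codes = source_codes_for(conn, 100)
```

One standard concept id is enough to pull in the codes from all three source vocabularies, which is what lets a single definition run unchanged at sites that code their data differently.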

Concept sets can define one or more entities. Here, the PheKB list of ‘ADHD inclusionary medications’ has been represented by 21 RxNorm ingredient concepts; all brands, doses, and forms are subsumed.

The human-readable Expression form is translated into JSON in real-time. This JSON object can be shared across partners to materialize the definition consistently and reproducibly without any programming required

Each expression is compiled into SQL. OHDSI supports rendering SQL into platform-specific dialects for SQL Server, Oracle, Postgres, RedShift, MS APS.

This code can be copied and executed in your favorite SQL UI tool, or….

Patient-level observational databases that are converted to the OMOP Common Data Model and exposed to the OHDSI WebAPI (either a local install or any public network version) can have the cohort definition executed directly within the database to produce a COHORT. The COHORT is then available for all subsequent research within the OHDSI environment…
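
Once materialized, a cohort is just persons with time intervals, which is what makes it reusable by later analyses. The sketch below models that in plain Python; the field names mirror the OMOP cohort table columns, while the `CohortEntry` class and the `members_on` helper are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CohortEntry:
    subject_id: int
    cohort_start_date: date
    cohort_end_date: date


def members_on(cohort, day):
    """Return ids of persons who are in the cohort on the given day."""
    return sorted({entry.subject_id for entry in cohort
                   if entry.cohort_start_date <= day <= entry.cohort_end_date})


cohort = [
    CohortEntry(1, date(2015, 1, 10), date(2015, 6, 30)),
    CohortEntry(2, date(2015, 3, 1), date(2015, 3, 31)),
]
in_march = members_on(cohort, date(2015, 3, 15))
in_may = members_on(cohort, date(2015, 5, 1))
```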

Try it yourself

http://www.ohdsi.org/web/circe/#/146

Proof of concept

• Treatment pathways around the world

• Diabetes, hypertension, depression

• (Submitted to PNAS)

Cohort

Databases (255M) and definitions

Diabetes

Opportunities for collaboration

• Implement the PheKB library in CIRCE, so that all organizations with patient-level data (translated to OMOP common data model) can take the work from eMERGE and directly apply the logic to their own data and participate in eMERGE’s research

Phenotyping hard challenges

• Quality of the data
– Ambiguous or unknown meaning
– Accuracy: 50-100% accuracy [Hogan JAMIA 1997]
– Completeness: mostly missing
– Complexity: disease ontologies

• Bias

[Diagram: Truth (health status of the patient) → observe & interpret → Concept (clinician or patient’s conception) → author → Record (EHR/PHR) → read → Concept (2nd clinician’s conception of the patient, or self, lawyer, compliance, ...); Record → process → Model (computable representation). The diagram carries three “Error” labels and one “Implicit” label.]

Biased

Patient state

Electronic health record

Care team

Therapy

Objective tests

Environment

Inpatient mortality for community acquired pneumonia

[Figure: mortality (%) by Fine class (1 to 5) for three series: the 18715 cohort, the 1935 cohort, and Fine]

• 18715 cohort: +CXR +fdg -recent pneu -recent visit
• 1935 cohort: above plus +DSUM exist +ICD9 (pneu not sepsis)

Hripcsak ... Comput Biol Med 2007;37:296-304

EHR-derived phenotype

• Clinically relevant feature derived from EHR

– Patient has (a diagnosis of) type II diabetes

– Recent rash and fever

– Drug-induced liver injury

• Then use the phenotype in correlation studies, etc.

Raw data → Query → Phenotype → Experiment
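
The raw data → query → phenotype step can be sketched as a simple rule over coded records. The rule below (two or more diagnosis codes, or one code plus a relevant drug) is a hypothetical example made up for illustration, not a validated phenotype definition; the record layout and function name are likewise assumptions.

```python
from collections import Counter

# Hypothetical raw records: (person_id, record_type, code_or_name).
raw_data = [
    (1, "diagnosis", "250.00"),
    (1, "diagnosis", "250.00"),
    (2, "diagnosis", "250.00"),
    (2, "drug", "metformin"),
    (3, "diagnosis", "250.00"),
    (4, "drug", "metformin"),
]

DIAGNOSIS_CODES = {"250.00"}   # illustrative ICD-9 code for type II diabetes
DRUGS = {"metformin"}


def type2_diabetes_phenotype(records):
    """Return person ids with >= 2 diagnosis codes, or 1 code plus a listed drug."""
    diagnosis_counts = Counter()
    on_drug = set()
    for person_id, record_type, value in records:
        if record_type == "diagnosis" and value in DIAGNOSIS_CODES:
            diagnosis_counts[person_id] += 1
        elif record_type == "drug" and value in DRUGS:
            on_drug.add(person_id)
    return sorted(p for p in diagnosis_counts
                  if diagnosis_counts[p] >= 2 or p in on_drug)


phenotype_positive = type2_diabetes_phenotype(raw_data)
```

The resulting list of persons is what a downstream experiment, such as a correlation study, would take as input.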

“Physics” of the medical record

1. Study EHR as if it were a natural object

– Use EHR to learn about EHR

– Not studying patient, but recording of patient

2. Aggregate across units and model

3. Borrow methods from non-linear time series

Glucose by Δt and tau

[Figure: mutual information (MI, roughly -0.1 to 0.45) for glucose as a function of tau and delta-t (days)]

Albers ... Translational Bioinformatics 2009
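
As a rough illustration of the quantity on the MI axis, the sketch below estimates mutual information between a series and a delayed copy of itself using a histogram. It assumes regularly sampled values and a fixed integer `lag`, which real EHR measurements do not provide; the binning scheme and function names are choices made for this example and are not taken from the cited work.

```python
import math
from collections import Counter


def delayed_pairs(series, lag):
    """Pair each value with the value observed `lag` steps later."""
    return [(series[k], series[k + lag]) for k in range(len(series) - lag)]


def mutual_information(pairs, bins=2):
    """Histogram estimate of mutual information (in nats) between paired values."""
    def make_binner(values):
        low, high = min(values), max(values)
        width = (high - low) / bins or 1.0   # avoid zero width for constant data
        return lambda v: min(int((v - low) / width), bins - 1)

    bin_x = make_binner([x for x, _ in pairs])
    bin_y = make_binner([y for _, y in pairs])
    joint = Counter((bin_x(x), bin_y(y)) for x, y in pairs)
    count_x, count_y = Counter(), Counter()
    for (i, j), c in joint.items():
        count_x[i] += c
        count_y[j] += c
    n = len(pairs)
    return sum((c / n) * math.log(c * n / (count_x[i] * count_y[j]))
               for (i, j), c in joint.items())


alternating = [0.0, 1.0] * 20   # each value fully determines the next one
constant = [5.0] * 40           # carries no information at any lag
mi_alternating = mutual_information(delayed_pairs(alternating, 1))
mi_constant = mutual_information(delayed_pairs(constant, 1))
```

Repeating the estimate over a range of lags gives one slice of a surface like the one in the figure: high values where earlier measurements predict later ones, values near zero where they do not.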

Correlate lab tests and concepts

• 22 years of data on 3 million patients

• 21 laboratory tests

– sodium, potassium, bicarbonate, creatinine, urea nitrogen, glucose, and hemoglobin

• 60 concepts derived from signout notes

– written by residents caring for inpatients to facilitate the transfer of care for overnight coverage

– concepts likely to have an association + controls
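
One simple way to relate a lab series to concept mentions is a correlation computed at several time offsets. The sketch below uses a plain Pearson correlation on made-up daily values; the data, the daily alignment, and the function names are assumptions for illustration, and the slides do not state which correlation measure was used.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def lagged_correlation(lab, concept, lag):
    """Correlate lab[t] with concept[t + lag]; lag may be negative."""
    if lag >= 0:
        a, b = lab[:len(lab) - lag], concept[lag:]
    else:
        a, b = lab[-lag:], concept[:len(concept) + lag]
    return pearson(a, b)


# Made-up daily data: the concept is mentioned two days after each high lab value.
lab = [4.0, 4.1, 5.5, 4.2, 4.0, 5.8, 4.1, 3.9, 4.0, 5.6, 4.2, 4.1]
concept = [0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1]

by_lag = {lag: lagged_correlation(lab, concept, lag) for lag in range(-3, 4)}
best_lag = max(by_lag, key=by_lag.get)
```

The sign of the best lag indicates whether the note mention tends to follow or precede the abnormal lab value in this toy data.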

Intentional and physiologic associations

[Figure: potassium against the concepts aldactone, dialysis, hyperkalemia, hypokalemia, hypomagnesemia; horizontal axis -60 to 60, vertical axis -0.15 to 0.15]

Timing of cause in disease vs. treatment

[Figure: glucose against the concepts hyperglycemia, hypernatremia, hypoglycemia, insulin, metformin, pancreatitis; horizontal axis -60 to 40, vertical axis -0.04 to 0.1]

Specificity of the concept

[Figure: creatinine against the concepts aldactone, dialysis, diarrhea, diuretic, hctz, hyperglycemia, hypernatremia, vomiting; horizontal axis -60 to 60, vertical axis -0.04 to 0.14]

Health care process model

Hripcsak ... JAMIA 2013

Hripcsak ... JAMIA 2013

[Figure panels: inpatient admit, ambulatory surgery]

Interpreting time

Hripcsak JAMIA 2009

Deviation by stated unit

[Figure: number of occurrences (0 to 50) by proportional deviation (-1 to 1), with one series per stated unit: day, week, month, year; the axis is marked “Now” and “Stated time”]

Interpreting time

Variable | Definition | Coefficient | Significance
value | stated numeric value in the temporal assertion (1 to 30 in this sample) | 0.0414 | <0.001
round number | true if value is a multiple of 5 (any unit) or 6 (with months) | –0.0218 | 0.002
ln(duration) | logarithm of stated duration in days, which equals the product of unit and value | 0.150 | 0.023
gt 18 years | true if duration ≥ 18 years, so the event should not be in the database | 0.816 | <0.001
intercept | | 0.406 | 0.416

Patient variability and sampling

Parameterizing Time

Parameterizing Time (Non-stationarity)

[Figure: coefficient of variation of rate of change (0 to 2.5) for creatinine, glucose, sodium, and potassium under three time parameterizations: clock, warped, sequence]

Hripcsak JAMIA 2015
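
To make the three labels concrete, the sketch below computes a clock, a sequence, and a warped time axis for one irregularly sampled series. Clock time is elapsed time and sequence time is the measurement index. The slides do not define the warped parameterization, so the power transform of the gaps used here is purely an assumption, chosen because it interpolates between the other two.

```python
def clock_time(timestamps):
    """Elapsed time since the first measurement."""
    return [t - timestamps[0] for t in timestamps]


def sequence_time(timestamps):
    """Measurement index: every gap counts as one step."""
    return list(range(len(timestamps)))


def warped_time(timestamps, power=0.5):
    """Hypothetical intermediate: each gap is raised to `power` before summing.

    power=1 reproduces clock time and power=0 reproduces sequence time.
    """
    warped = [0.0]
    for earlier, later in zip(timestamps, timestamps[1:]):
        warped.append(warped[-1] + (later - earlier) ** power)
    return warped


measurement_days = [0.0, 1.0, 2.0, 6.0, 106.0]
clock = clock_time(measurement_days)
sequence = sequence_time(measurement_days)
warped = warped_time(measurement_days)
```

With a square-root warp the 100-day gap shrinks to 10 units while the 1-day gaps stay at 1, so long quiet periods count for less without being ignored.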

Parameterizing Time

Vector autoregression to decipher associations
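
A minimal sketch of a first-order vector autoregression for two variables, fitted by least squares on noise-free toy data. It assumes regularly spaced observations with no intercept; the function names and the simulated coefficients are made up for illustration and do not come from the slides.

```python
def fit_var1(series):
    """Least-squares fit of x_t = A x_{t-1} for a list of 2-element observations."""
    # Accumulate S10 = sum of x_t x_{t-1}^T and S00 = sum of x_{t-1} x_{t-1}^T.
    s10 = [[0.0, 0.0], [0.0, 0.0]]
    s00 = [[0.0, 0.0], [0.0, 0.0]]
    for prev, curr in zip(series, series[1:]):
        for r in range(2):
            for c in range(2):
                s10[r][c] += curr[r] * prev[c]
                s00[r][c] += prev[r] * prev[c]
    det = s00[0][0] * s00[1][1] - s00[0][1] * s00[1][0]
    inv = [[s00[1][1] / det, -s00[0][1] / det],
           [-s00[1][0] / det, s00[0][0] / det]]
    # A = S10 times the inverse of S00.
    return [[sum(s10[r][k] * inv[k][c] for k in range(2)) for c in range(2)]
            for r in range(2)]


def simulate(a, start, steps):
    """Generate a noise-free series from x_t = A x_{t-1}."""
    series = [list(start)]
    for _ in range(steps):
        x = series[-1]
        series.append([a[0][0] * x[0] + a[0][1] * x[1],
                       a[1][0] * x[0] + a[1][1] * x[1]])
    return series


# Toy system: variable 2 influences variable 1, but not the other way around.
true_a = [[0.5, 0.2], [0.0, 0.8]]
estimated_a = fit_var1(simulate(true_a, [1.0, 1.0], 20))
```

The off-diagonal entries are the ones that speak to associations: a nonzero `estimated_a[0][1]` says the second variable's past helps predict the first, while a value near zero in `estimated_a[1][0]` says the reverse does not hold in this toy system.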

Noisy training sets

with Nigam Shah; David Sontag

Summary

• OHDSI international collaboration could dovetail with eMERGE

• Next-generation phenotyping requires understanding the EHR