Phenotyping from Electronic Health Records. Jimeng Sun, College of Computing, Georgia Tech, jsun@cc.gatech.edu. More info at sunlab.org

Phenotyping from Electronic Health Records

Jimeng Sun

College of Computing

Georgia Tech

jsun@cc.gatech.edu

More info at sunlab.org

My research focuses on health analytics

Genomic data

Clinical data

Behavior data

Social data

Health Analytic Apps

Heart disease predictor for $5.99

Analytic cloud

Privacy engine

Visualization

User

Clinical Researchers

Training data

Research Challenges

Big data analytics on the cloud

Data mining and machine learning techniques

Privacy preserving data sharing

Visual analytic techniques

My focus

Outline

Phenotyping from EHR

Other work

– PARAMO: Large scale predictive modeling pipeline

– Patient Similarity

EHR

Phenotyping from Electronic Health Records

Demographic

Diagnosis

Medication

Lab Tests

Procedure

Medical Images

Medical Concepts (phenotypes)

Phenotyping

Motivation: Increasing Importance of Electronic Health Records

EHRs have become accepted data sources for clinical research

EHR data can enable much more research

Explosion in interest

HOW TO TURN EHR INTO PHENOTYPES?

This talk

Challenges in Phenotyping from EHR

Representation

– How to represent heterogeneous EHR data and phenotypes?

Speed

– How to construct diverse phenotypes in an unsupervised fashion?

Intuition

– How to validate and refine the phenotypes?

Adaptation

– How to adapt phenotypes from one site to another?

Constructing Feature Tensor

A tensor is a generalization of a matrix

– A matrix is a 2nd-order tensor

Tensors can better capture interactions among concepts

Data element types:

– Binary

– Count (integer)

– Continuous (numeric)

Mode
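
A minimal sketch of how a count tensor with patient, diagnosis, and medication modes might be assembled. The records, labels, and the `build_count_tensor` helper are hypothetical and only illustrate the idea; they are not taken from the talk.

```python
import numpy as np

# Hypothetical toy records: (patient, diagnosis, medication),
# one row per medication order linked to a diagnosis.
records = [
    ("p1", "Hypertension", "ACE Inhibitors"),
    ("p1", "Hypertension", "ACE Inhibitors"),
    ("p1", "Hypertension", "Thiazides"),
    ("p2", "Diabetes", "Biguanides"),
    ("p2", "Hypertension", "ACE Inhibitors"),
]

def build_count_tensor(records):
    """Return a 3rd-order count tensor and the sorted labels of each mode."""
    modes = [sorted({r[k] for r in records}) for k in range(3)]
    index = [{label: i for i, label in enumerate(m)} for m in modes]
    tensor = np.zeros([len(m) for m in modes], dtype=np.int64)
    for patient, diagnosis, medication in records:
        tensor[index[0][patient], index[1][diagnosis], index[2][medication]] += 1
    return tensor, modes

tensor, (patients, diagnoses, medications) = build_count_tensor(records)
print(tensor.shape)  # (2, 2, 3)
```

Each entry counts how often a patient received a medication together with a diagnosis, which is the "count" element type listed above.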

Multiple Tensors

Diagnosis-Medication

Diagnostic Sources

Medication Reconciliation

Lab Results

Symptoms

Vital

Phenotyping through Tensor Factorization

[Figure: the EHR tensor ≈ λ1 · (Phenotype 1) + … + λR · (Phenotype R). Each phenotype combines a patients factor, a diagnosis factor, and a medication factor. λr is the phenotype importance, and the elements of each factor sum to 1.]
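
The decomposition on this slide can be sketched numerically as follows. This assumes a 3-mode tensor and random toy factors; the names `A`, `B`, `C`, and `lam` are chosen here for illustration. Because every factor column sums to 1, the weights `lam` carry the total mass of the reconstructed tensor.

```python
import numpy as np

rng = np.random.default_rng(0)
n_patients, n_diagnoses, n_medications, R = 4, 3, 5, 2

def random_stochastic(rows, cols):
    """Nonnegative matrix whose columns each sum to 1."""
    m = rng.random((rows, cols))
    return m / m.sum(axis=0, keepdims=True)

A = random_stochastic(n_patients, R)      # patients factor
B = random_stochastic(n_diagnoses, R)     # diagnosis factor
C = random_stochastic(n_medications, R)   # medication factor
lam = np.array([30.0, 10.0])              # phenotype importance (toy values)

# X_hat[i, j, k] = sum_r lam[r] * A[i, r] * B[j, r] * C[k, r]
X_hat = np.einsum("r,ir,jr,kr->ijk", lam, A, B, C)
```

With this normalization, `X_hat.sum()` equals `lam.sum()`, so each weight can be read as the share of the data explained by its phenotype.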

Example Phenotype

Candidate Phenotype k (40% of patients), with importance λk

– Hypertension

– Beta Blockers Cardio-Selective

– Thiazides and Thiazide-Like Diuretics

– HMG CoA Reductase Inhibitors

Factors shown: diagnosis factor, medication factor, patients factor

Phenotyping Process using Tensor Factorization

[Figure: count data → tensor factorization → phenotype definitions (λ1 + … + λR). Count data for new patients → projection onto the phenotype definitions → phenotypes matrix.]
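
One way the projection step could work is sketched below. It is an assumption, not the talk's stated procedure: the diagnosis and medication factors are held fixed, and nonnegative memberships for new patients are fitted with the standard multiplicative update for the KL divergence. The function `project_patients` and all data are hypothetical.

```python
import numpy as np

def project_patients(X_new, B, C, n_iter=200, eps=1e-12):
    """Fit nonnegative phenotype memberships W for new patients with B, C fixed."""
    N, R = X_new.shape[0], B.shape[1]
    # Row r of H is the flattened outer product of the r-th diagnosis
    # and medication factor columns.
    H = np.einsum("jr,kr->rjk", B, C).reshape(R, -1)
    V = X_new.reshape(N, -1).astype(float)
    # Start every patient with equal weight on each phenotype.
    W = np.ones((N, R)) * V.sum(axis=1, keepdims=True) / R
    for _ in range(n_iter):
        ratio = V / (W @ H + eps)
        W *= (ratio @ H.T) / (H.sum(axis=1) + eps)  # KL multiplicative update
    return W

rng = np.random.default_rng(0)
B = rng.random((3, 2))
B /= B.sum(axis=0)                         # diagnosis factor, columns sum to 1
C = rng.random((4, 2))
C /= C.sum(axis=0)                         # medication factor, columns sum to 1
X_new = rng.poisson(5.0, size=(5, 3, 4))   # toy counts for 5 new patients
W = project_patients(X_new, B, C)          # the "phenotypes matrix", 5 x 2
```

Each row of `W` is a low-dimensional description of one patient that could serve as features for a downstream model.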

CP-APR Model

KL divergence for count data

Nonnegative combinations

Stochastic constraint (elements in factor sum to 1)

Element index

Chi, E.C. and Kolda, T.G. 2012. On tensors, sparsity, and nonnegative factorizations. SIAM Journal on Matrix Analysis and Applications. 33, 4 (2012), 1272–1299.
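
As a reading aid, the labels on this slide correspond to an optimization problem of the following form, written here for a 3-mode tensor. The symbols (x for the observed counts, m for the model, i for the element index, a, b, c for the factor columns) are notation chosen here and may differ from the slide.

```latex
\min_{\mathcal{M}} \; \sum_{i} \bigl( m_i - x_i \log m_i \bigr)
\quad \text{s.t.} \quad
\mathcal{M} = \sum_{r=1}^{R} \lambda_r \, a_r \circ b_r \circ c_r, \qquad
\lambda_r \ge 0, \qquad
a_r, b_r, c_r \ge 0, \qquad
\|a_r\|_1 = \|b_r\|_1 = \|c_r\|_1 = 1
```

The objective is the KL divergence for count data up to terms that do not depend on the model, the sum over r is the nonnegative combination, and the unit 1-norm conditions are the stochastic constraint.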

Constructing the Tensor

Medication orders from Geisinger dataset

Diagnosis codes aggregated into HCC codes

Medications are defined by pharmacy subclass

31,816 patients × 169 diagnoses × 471 medications

Evaluation of Phenotypes: Classification

Task: predict patients with heart failure

Model: logistic regression with ℓ1 regularization

10 random 50/50 train/test splits of the dataset

Features:

1. Baseline using source independence matrix

2. Principal Component Analysis (PCA)

3. Nonnegative Matrix Factorization (NMF)

4. Phenotype Tensor Factorization (PTF)
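
The evaluation protocol above can be sketched as follows. This is an illustration under stated assumptions: the data are synthetic, the solver is a simple proximal gradient method rather than whatever was used in the study, and accuracy is used as the score because the slide does not name a metric. The names `fit_l1_logistic` and `evaluate` are hypothetical.

```python
import numpy as np

def fit_l1_logistic(X, y, l1=0.01, lr=0.1, n_iter=500):
    """L1-regularized logistic regression via proximal gradient descent."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        w = w - lr * (X.T @ (p - y) / n)
        # Soft thresholding is the proximal step for the L1 penalty.
        w = np.sign(w) * np.maximum(np.abs(w) - lr * l1, 0.0)
        b = b - lr * np.mean(p - y)
    return w, b

def evaluate(X, y, n_splits=10, seed=0):
    """Mean test accuracy over random 50/50 train/test splits."""
    rng = np.random.default_rng(seed)
    half = len(y) // 2
    scores = []
    for _ in range(n_splits):
        perm = rng.permutation(len(y))
        train, test = perm[:half], perm[half:]
        w, b = fit_l1_logistic(X[train], y[train])
        pred = (X[test] @ w + b) > 0
        scores.append(np.mean(pred == y[test]))
    return float(np.mean(scores))

# Synthetic stand-in for a patients-by-features matrix and a binary outcome.
rng = np.random.default_rng(1)
X = rng.normal(size=(400, 5))
y = (X[:, 0] - X[:, 1] + 0.1 * rng.normal(size=400) > 0).astype(float)
score = evaluate(X, y)
```

To compare feature sets, the same `evaluate` call would be repeated with `X` replaced by each representation (baseline, PCA, NMF, PTF).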

Predictive Performance Effect

A small number of phenotypes outperforms the 640 baseline features (169 diagnoses + 471 medications)

[Figure: predictive performance vs. number of phenotypes]

Phenotype 1

Hypertension – Opioid Combinations

Disorders of the Vertebrae and Spinal Discs – Glucocorticosteroids

Disorders of the Vertebrae and Spinal Discs – Stimulant Laxatives

Disorders of the Vertebrae and Spinal Discs – Beta Blockers Cardio-Selective

Disorders of the Vertebrae and Spinal Discs – Sympathomimetics

Disorders of the Vertebrae and Spinal Discs – Anticonvulsants - Misc

Disorders of the Vertebrae and Spinal Discs – Central Muscle Relaxants

Disorders of the Vertebrae and Spinal Discs – HMG CoA Reductase Inhibitors

Disorders of the Vertebrae and Spinal Discs – Selective Serotonin Reuptake Inhibitors

Disorders of the Vertebrae and Spinal Discs – Surfactant Laxatives

Disorders of the Vertebrae and Spinal Discs – Proton Pump Inhibitors

Disorders of the Vertebrae and Spinal Discs – Cephalosporins – 1st Generation

Disorders of the Vertebrae and Spinal Discs – Analgesics Other

Disorders of the Vertebrae and Spinal Discs – Non-Barbiturate Hypnotics

Disorders of the Vertebrae and Spinal Discs – Electrolyte Mixtures

Minor Symptoms, Signs, Findings – Opioid Combinations

Post-Surgical States/Aftercare/Elective – Opioid Combinations

Post-Surgical States/Aftercare/Elective – Stimulant Laxatives

Post-Surgical States/Aftercare/Elective – Beta Blockers Cardio-Selective

Post-Surgical States/Aftercare/Elective – HMG CoA Reductase Inhibitors

Post-Surgical States/Aftercare/Elective – Proton Pump Inhibitors

Post-Surgical States/Aftercare/Elective – Opioid Agonists

Post-Surgical States/Aftercare/Elective – Cephalosporins – 1st Generation

Post-Surgical States/Aftercare/Elective – Analgesics Other

Post-Surgical States/Aftercare/Elective – Non-Barbiturate Hypnotics

Other Eye Disorders – Opioid Combinations

Other Eye Disorders – Stimulant Laxatives

Other Eye Disorders – Opioid Agonists

Other Eye Disorders – Cephalosporins – 1st Generation

Other Eye Disorders – Non-Barbiturate Hypnotics

Phenotype 2

Major Symptoms, Abnormalities – Stimulant Laxatives

Major Symptoms, Abnormalities – Beta Blockers Cardio-Selective

Major Symptoms, Abnormalities – Sympathomimetics

Major Symptoms, Abnormalities – Coumarin Anticoagulants

Major Symptoms, Abnormalities – Salicylates

Major Symptoms, Abnormalities – Surfactant Laxatives

Major Symptoms, Abnormalities – Insulin

Major Symptoms, Abnormalities – Proton Pump Inhibitors

Major Symptoms, Abnormalities – Anti-infective Agents - Misc

Major Symptoms, Abnormalities – Vasodilators

Hypertension – Opioid Combinations

Other Gastrointestinal Disorders – Surfactant Laxatives

Other Gastrointestinal Disorders – Insulin

Diabetes with No or Unspecified Complications – Insulin

Specified Heart Arrhythmias – Beta Blockers Cardio-Selective

Iron Deficiency and Other/Unspecified Anemias and Blood Disease – Hematopoietic Growth Factors

Urinary Tract Infection – Insulin

Other Endocrine/Metabolic/Nutritional Disorders – Insulin

Vascular Disease – Coumarin Anticoagulants

Vascular Disease – Insulin

History of Disease – Insulin

Unspecified Renal Failure – Coumarin Anticoagulants

Diabetes with Renal Manifestation – Insulin

NMF factors are not concise and are harder to interpret

Phenotype 3 (17.6% of patients): Uncomplicated Diabetes

– Diabetes with No or Unspecified Complications

– Sulfonylureas

– Biguanides

– Diagnostic Tests

– Insulin Sensitizing Agents

– Diabetic Supplies

– Meglitinide Analogues

– Antidiabetic Combinations

Phenotype 4 (31.1% of patients): Mild Hypertension

– Hypertension

– ACE Inhibitors

– Thiazides and Thiazide-Like Diuretics

Phenotype 5 (36.7% of patients): Chronic Respiratory Inflammation/Infection

– Other Ear, Nose, Throat, and Mouth Disorders

– Viral and Unspecified Pneumonia, Pleurisy

– Significant Ear, Nose, and Throat Disorders

– Cough/Cold/Allergy Combinations

– Azithromycin

– Fluoroquinolones

– Sympathomimetics

– Penicillin Combinations

– Antitussives

– Glucocorticosteroids

– Tetracyclines

– Anti-infective Misc. - Combinations

– Clarithromycin

– Cephalosporins - 2nd Generation

– Cephalosporins - 1st Generation

– Expectorants

PTF interpretation: Major disease phenotypes can be identified

Phenotype 4 (31.1% of patients)

– Hypertension

– ACE Inhibitors

– Thiazides and Thiazide-Like Diuretics

Mild Hypertension

Phenotype 6 (24.3% of patients)

– Hypertension

– Calcium Channel Blockers

– Antihypertensive Combinations

– Antiadrenergic Antihypertensives

– Potassium Sparing Diuretics

Severe Hypertension

Phenotype 2 (31.5% of patients)

– Hypertension

– Beta Blockers Cardio-Selective

– Angiotensin II Receptor Antagonists

– Loop Diuretics

– Potassium

– Nitrates

– Alpha-Beta Blockers

– Vasodilators

Moderate Hypertension

PTF interpretation: Disease subtypes can be automatically identified

Over 80% of phenotype factors are clinically meaningful

Summary: Phenotyping using Tensor Factorization

Nonnegative tensor factorization can be used to learn phenotypes without supervision

A small number of phenotypes outperforms a large number of features in a prediction task

[Figure: tensor ≈ λ1 · (Phenotype 1) + … + λR · (Phenotype R); each phenotype involves only a few diagnoses.]

PARAMO: PARALLEL PREDICTIVE MODELING PLATFORM

System

Predictive Modeling Pipeline

There are many different models that need to be built and evaluated

– Different patient cohorts

– Different targets

– Different features

– Different algorithms

– Multiple training and testing splits in cross-validation

~100K different pipelines
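
The pipeline count grows multiplicatively with each choice listed above. The sketch below enumerates the combinations; every option list is a hypothetical placeholder, not PARAMO's actual configuration.

```python
from itertools import product

# All option lists are hypothetical placeholders for illustration.
cohorts = ["small", "medium", "large"]
targets = ["hypertension_control", "heart_failure_onset"]
feature_sets = ["diagnoses", "medications", "labs", "all"]
algorithms = ["logistic_regression", "random_forest", "naive_bayes"]
folds = range(10)

# One pipeline per combination of cohort, target, features, algorithm, fold.
pipelines = list(product(cohorts, targets, feature_sets, algorithms, folds))
print(len(pipelines))  # 720 = 3 * 2 * 4 * 3 * 10
```

Even these small lists yield hundreds of pipelines, so realistic option counts quickly reach the scale quoted on the slide.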

Running Time vs. Parallelism level

Patient sets

– Small: 5,000 patients for hypertension control prediction

– Medium: 33K for predicting heart failure onset

– Large: 319K for hypertension diagnosis prediction

Dependency graph: 1808 nodes and 3610 edges

9 days → 3 hours: 72× speedup
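
The reported figures are consistent with each other: 9 days is 216 hours, and 216 / 3 = 72. The sketch below shows one way a dependency graph can expose parallelism, by grouping tasks into waves whose members have no unmet dependencies. The task names are hypothetical, and this is not PARAMO's actual scheduler.

```python
from graphlib import TopologicalSorter

# Hypothetical task graph: each key depends on the tasks in its set.
deps = {
    "features_dx": {"cohort"},
    "features_rx": {"cohort"},
    "train_fold1": {"features_dx", "features_rx"},
    "train_fold2": {"features_dx", "features_rx"},
    "evaluate": {"train_fold1", "train_fold2"},
}

def parallel_waves(deps):
    """Group tasks into waves; tasks in the same wave can run concurrently."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves

waves = parallel_waves(deps)
for i, wave in enumerate(waves):
    print(i, wave)
```

The number of waves bounds the best achievable running time from below, while the width of each wave indicates how many workers can be kept busy.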

PATIENT SIMILARITY

Algorithm

Patient Similarity Problem

Supervision

Patient Doctor

Similarity search

Patient Similarity Problem

Patient Doctor

Summary on Patient Similarity

Goal: learn a customized distance metric for a target [1]

Extension 1: Composite distance integration (Comdi) [2]

– How to combine multiple patient similarity measures?

Extension 2: Interactive metric update (iMet) [3]

– How to update an existing distance measure?
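
The slide does not spell out the form of the learned metric. A common choice in metric learning, assumed here purely for illustration, is a generalized Mahalanobis distance parameterized by a matrix `L`. The vectors and matrices below are toy values, and `mahalanobis` is a hypothetical helper.

```python
import numpy as np

def mahalanobis(x, y, L):
    """Distance under the metric M = L @ L.T, which is positive semidefinite."""
    diff = L.T @ (x - y)
    return float(np.sqrt(diff @ diff))

x = np.array([1.0, 0.0, 2.0])   # toy feature vector for one patient
y = np.array([0.0, 0.0, 0.0])   # toy feature vector for another patient

L_identity = np.eye(3)                   # reduces to Euclidean distance
L_weighted = np.diag([1.0, 1.0, 0.0])    # a metric that ignores the third feature

d_euclid = mahalanobis(x, y, L_identity)     # sqrt(5)
d_weighted = mahalanobis(x, y, L_weighted)   # 1.0
```

Learning a metric for a target then amounts to choosing `L` so that patients with the same outcome end up close together, and an update or a combination of metrics changes `L` rather than the patient features.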

1. Sun, J., Wang, F., Hu, J. and Ebadollahi, S. 2012. Supervised patient similarity measure of heterogeneous patient records. ACM SIGKDD Explorations Newsletter. 14, 16.

2. Wang, F., Sun, J. and Ebadollahi, S. 2011. Integrating distance metrics learned from multiple experts and its application in inter-patient similarity assessment. SDM 2011, 59–70.

3. Wang, F., Sun, J., Hu, J. and Ebadollahi, S. 2011. iMet: interactive metric learning in healthcare applications. SDM 2011, 944–955.

Phenotyping from Electronic Health Records

Jimeng Sun

College of Computing

Georgia Tech

jsun@cc.gatech.edu

More info at sunlab.org