
Evaluation Considerations for EHR-Based Phenotyping


2013 Summit on Translational Bioinformatics


Page 1: Evaluation Considerations for EHR-Based Phenotyping

Evaluation considerations for EHR-based phenotyping algorithms: A case study for Drug Induced Liver Injury

Casey Overby, Chunhua Weng, Krystl Haerian, Adler Perotte, Carol Friedman, George Hripcsak

Department of Biomedical Informatics

Columbia University

AMIA TBI paper presentation, March 20th, 2013

Page 2: Evaluation Considerations for EHR-Based Phenotyping

Published genome-wide associations through 09/2011: 1,596 published GWA at p ≤ 5×10⁻⁸ for 249 traits

NHGRI GWA Catalog www.genome.gov/GWAStudies

Success due in part to GWAS consortia obtaining the needed sample sizes

Background and Motivation

Page 3: Evaluation Considerations for EHR-Based Phenotyping

There are added challenges to studying pharmacological traits

• Drug response is complex
  • Risk factors in the pathogenesis of drug-induced liver injury (DILI)
• Sample sizes are small compared to typical association studies of common disease
  • Adverse drug events
  • Responder types

Source: Tujios & Fontana, Nat. Rev. Gastroenterol. Hepatol. 2011

Background and Motivation

Page 4: Evaluation Considerations for EHR-Based Phenotyping

Consortium recruitment approaches

• Recruit and phenotype participants prospectively
  • Protocol-driven recruitment
• Electronic health records (EHR) linked with DNA biorepositories
  • EHR phenotyping

Background and Motivation

Page 5: Evaluation Considerations for EHR-Based Phenotyping

Successes developing EHR algorithms within eMERGE

• Type II diabetes
• Peripheral arterial disease
• Atrial fibrillation
• Crohn disease
• Multiple sclerosis
• Rheumatoid arthritis

• High positive predictive value (PPV)

(Kho et al. JAMIA 2012; Kullo et al. JAMIA 2010; Ritchie et al. AJHG 2010; Denny et al. Circulation 2010; Peissig et al. JAMIA 2012)

Source: www.phekb.org

Background and Motivation

Page 6: Evaluation Considerations for EHR-Based Phenotyping

Unique characteristics of DILI

• Rare condition of low prevalence
• Complex condition
  • Drug is the causal agent of liver injury
  • Different drugs can cause DILI
  • Pattern of liver injury varies between drugs
• Pattern of liver injury is based on liver enzyme elevations
• No tests to confirm drug causality (some assessment tools exist)
• High PPV may be challenging

Background and Motivation

Page 7: Evaluation Considerations for EHR-Based Phenotyping

Why study DILI?
• DILI accounts for up to 15% of acute liver failure cases in the U.S., of which 75% require liver transplant or lead to death
• Most frequent reason for withdrawal of approved drugs from the market
• Lack of understanding of the underlying mechanisms of DILI
• Computerized approaches can reduce the burden of traditional approaches to screening for rare conditions (Jha AK et al. JAMIA 1998; Thadani SR et al. JAMIA 2009)

Background and Motivation

Page 8: Evaluation Considerations for EHR-Based Phenotyping

Overview of EHR phenotyping process

Case definition

Background and Motivation

Page 9: Evaluation Considerations for EHR-Based Phenotyping

Overview of EHR phenotyping process

Case definition (e.g., liver injury) → (Re-)Design EHR phenotyping algorithm (e.g., ICD-9 codes for acute liver injury, decreased liver function labs)

Background and Motivation

Page 10: Evaluation Considerations for EHR-Based Phenotyping

Overview of EHR phenotyping process

Case definition → (Re-)Design EHR phenotyping algorithm → Implement EHR phenotyping algorithm

Background and Motivation

Page 11: Evaluation Considerations for EHR-Based Phenotyping

Overview of EHR phenotyping process

Case definition → (Re-)Design EHR phenotyping algorithm → Implement EHR phenotyping algorithm → Evaluate EHR phenotyping algorithm

Background and Motivation

Page 12: Evaluation Considerations for EHR-Based Phenotyping

Overview of EHR phenotyping process

Case definition → (Re-)Design EHR phenotyping algorithm → Implement EHR phenotyping algorithm → Evaluate EHR phenotyping algorithm → if the algorithm needs improvement, return to (re-)design; if the algorithm is sufficient to be useful, Disseminate EHR phenotyping algorithm

Background and Motivation
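Since the rest of the talk walks through this cycle step by step, a schematic sketch may help. Everything below is a placeholder under stated assumptions, not the authors' implementation: the callables are supplied by the caller, and the loop simply re-designs until the evaluation says the algorithm is sufficient to be useful.

```python
from typing import Any, Callable, Tuple

# Schematic sketch of the iterative EHR phenotyping loop in the diagram above.
# All callables are caller-supplied placeholders; nothing here is the authors'
# actual implementation.
def develop_phenotyping_algorithm(
    case_definition: Any,
    design: Callable[[Any, Any], Any],             # (case_definition, feedback) -> algorithm
    implement: Callable[[Any], Any],               # algorithm -> candidate cohort
    evaluate: Callable[[Any], Tuple[float, Any]],  # cohort -> (score, feedback)
    is_sufficient: Callable[[float], bool],        # e.g., PPV above some target
    max_iterations: int = 10,
) -> Any:
    feedback = None
    algorithm = design(case_definition, feedback)
    for _ in range(max_iterations):
        cohort = implement(algorithm)              # run against the data warehouse
        score, feedback = evaluate(cohort)         # e.g., PPV from manual review
        if is_sufficient(score):
            break                                  # sufficient to be useful: disseminate
        algorithm = design(case_definition, feedback)  # needs improvement: re-design
    return algorithm
```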

Page 13: Evaluation Considerations for EHR-Based Phenotyping

Overview of methods to develop & evaluate initial algorithm

DILI case definition (iSAEC) → Design EHR phenotyping algorithm → Implement EHR phenotyping algorithm → Evaluate EHR phenotyping algorithm → Disseminate EHR phenotyping algorithm

Methods and Results

Page 14: Evaluation Considerations for EHR-Based Phenotyping

Overview of methods to develop & evaluate initial algorithm

DILI case definition (iSAEC) → Design EHR phenotyping algorithm → Implement EHR phenotyping algorithm → Evaluate EHR phenotyping algorithm (develop an evaluation framework; report lessons learned) → Disseminate EHR phenotyping algorithm

Methods and Results

Page 15: Evaluation Considerations for EHR-Based Phenotyping

DILI case definition (iSAEC) → Design EHR phenotyping algorithm → Implement EHR phenotyping algorithm → Evaluate EHR phenotyping algorithm (develop an evaluation framework; report lessons learned; lessons inform evaluator approach and algorithm design changes) → Disseminate EHR phenotyping algorithm

Lessons learned

Page 16: Evaluation Considerations for EHR-Based Phenotyping

DILI case definition
1. Liver injury diagnosis (A1)
   a. Acute liver injury (C1-C4)
   b. New liver injury (B)
2. Caused by a drug
   a. New drug (A2)
   b. Not by another disease (D)

[Flowchart: Initial DILI EHR phenotyping algorithm. Starting from the clinical data warehouse: A1. Diagnosed with liver injury? → acute liver injury lab criteria (C1. ALP >= 2x ULN; C2. ALT >= 5x ULN; or C3. ALT >= 3x ULN with C4. Bilirubin >= 2x ULN) → B. New liver injury? (consider chronicity) → A2. Exposure to drug? → D. Other diagnoses? Patients failing a criterion, or with another explanatory diagnosis, are excluded. Counts at successive steps: 18,423 → 13,972 → 2,375 → 1,264 → 560 patients meeting drug-induced liver injury criteria.]

Methods and Results

Ref: Aithal GP, et al. Case definition and phenotype standardization in drug-induced liver injury. Clin Pharmacol Ther. 2011 Jun;89(6):806-15.
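To make the screening logic above concrete, here is a minimal sketch in Python, assuming each patient's peak lab values are already expressed as multiples of the upper limit of normal (ULN). The record fields and helper names are hypothetical, not the implementation actually run against the clinical data warehouse.

```python
# Minimal sketch of the DILI screening criteria, under stated assumptions:
# lab values are peak results expressed as multiples of ULN, and the
# field names below are illustrative, not the algorithm's data elements.
from dataclasses import dataclass

@dataclass
class PatientLabs:
    alt_x_uln: float        # peak ALT as a multiple of ULN
    alp_x_uln: float        # peak ALP as a multiple of ULN
    bilirubin_x_uln: float  # peak total bilirubin as a multiple of ULN

def meets_acute_liver_injury_criteria(labs: PatientLabs) -> bool:
    """C1-C4: ALP >= 2x ULN, ALT >= 5x ULN, or ALT >= 3x ULN with bilirubin >= 2x ULN."""
    return (
        labs.alp_x_uln >= 2
        or labs.alt_x_uln >= 5
        or (labs.alt_x_uln >= 3 and labs.bilirubin_x_uln >= 2)
    )

def meets_dili_criteria(labs: PatientLabs, has_liver_injury_dx: bool,
                        new_liver_injury: bool, new_drug_exposure: bool,
                        other_diagnoses: bool) -> bool:
    """A1, C1-C4, B, A2, D: combine diagnosis, labs, chronicity, exposure, and exclusion checks."""
    return (
        has_liver_injury_dx
        and meets_acute_liver_injury_criteria(labs)
        and new_liver_injury
        and new_drug_exposure
        and not other_diagnoses
    )
```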

Page 17: Evaluation Considerations for EHR-Based Phenotyping

Initial algorithm results: 100 randomly selected for manual review from 560 patients

Estimated positive predictive value
• TP: 27
• FP: 42
• NA: 30
• PPV: TP/(TP+FP) = 27/(27+42) = 39%
• Preliminary kappa coefficient: 0.50 (moderate agreement)
• Interpretation of PPV is unclear given moderate agreement among reviewers

[Diagram: the 100 cases were divided into sets of 20 and distributed among Reviewers 1-4]

Methods and Results
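The PPV arithmetic on this slide can be reproduced in a few lines. The sketch below recomputes it and adds a generic two-rater Cohen's kappa helper for the inter-rater reliability figure; the slides do not give the rater-by-rater agreement table, so the kappa example inputs are hypothetical.

```python
def ppv(tp: int, fp: int) -> float:
    """Positive predictive value: TP / (TP + FP); 'NA' reviews are excluded."""
    return tp / (tp + fp)

def cohens_kappa(a: int, b: int, c: int, d: int) -> float:
    """Two-rater, two-category Cohen's kappa from a 2x2 agreement table:
    a = both positive, b = rater1 pos / rater2 neg,
    c = rater1 neg / rater2 pos, d = both negative."""
    n = a + b + c + d
    p_obs = (a + d) / n
    p_exp = ((a + b) * (a + c) + (c + d) * (b + d)) / (n * n)
    return (p_obs - p_exp) / (1 - p_exp)

print(f"PPV = {ppv(27, 42):.0%}")             # 39%, matching the slide
# Hypothetical agreement table, for illustration only:
print(f"kappa = {cohens_kappa(20, 8, 7, 15):.2f}")
```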

Page 18: Evaluation Considerations for EHR-Based Phenotyping

Included measurement and demonstration studies

• Measurement study goal: "to determine the extent and nature of the errors with which a measurement is made using a specific instrument."
  • Evaluator effectiveness
• Demonstration study goal: "establishes a relation – which may be associational or causal – between a set of measured variables."
  • Algorithm performance

Definitions from: "Evaluation Methods in Medical Informatics," Friedman & Wyatt, 2006

Methods and Results

Page 19: Evaluation Considerations for EHR-Based Phenotyping

Included quantitative and qualitative assessment

• Quantitative data
  • Inter-rater reliability assessment
  • PPV
• Qualitative data
  • Perceptions of evaluation approach effectiveness (e.g., review tool, artifacts reviewed)
  • Perceptions of benefit of results (e.g., correct for the case definition?)

Methods and Results

Page 20: Evaluation Considerations for EHR-Based Phenotyping

An evaluation framework and results

Quantitative results
• Measurement study (evaluator effectiveness): kappa coefficient: 0.50
• Demonstration study (algorithm performance): TP: 27, FP: 42, NA: 30, PPV: TP/(TP+FP) = 39%

Qualitative results
• Measurement study (evaluator effectiveness): perceptions of evaluation approach effectiveness
  • Differences between evaluation platforms
  • Visualizing lab values
  • Availability of notes (discharge summary vs. other notes)
• Demonstration study (algorithm performance): perceptions of benefit of results (themes in FPs)
  • Babies
  • Patients who died
  • Overdose patients
  • Patients who had a liver transplant

Methods and Results

Page 21: Evaluation Considerations for EHR-Based Phenotyping

Lesson learned: What’s correct for the algorithm may not be correct for the case definition

• Are we measuring what we mean to measure?
• Case definition: liver injury due to a medication, not another disease
• Many FPs were transplant patients
  • Patients were correct for the algorithm, but liver enzymes were elevated due to the procedure
• Revision: exclude transplant patients

Lessons learned

Page 22: Evaluation Considerations for EHR-Based Phenotyping

Improved algorithm design given themes in FPs

• Added exclusions (a sketch of these filters follows below):
  • Babies
  • Overdose patients
  • Patients who died
  • Transplant patients

Lessons learned ("A collaborative approach to develop an EHR phenotyping algorithm for DILI," in preparation)
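A minimal sketch of how these exclusion filters might be layered on top of the screening step. The field names and the infant age cutoff are illustrative assumptions, not the algorithm's actual data elements.

```python
# Hypothetical exclusion filters layered onto the earlier DILI screen.
# The record fields (age_in_days, overdose_dx, deceased, transplant_dx)
# are illustrative names, not the algorithm's actual data elements.
from dataclasses import dataclass

@dataclass
class PatientContext:
    age_in_days: int
    overdose_dx: bool
    deceased: bool
    transplant_dx: bool

def excluded(ctx: PatientContext, infant_age_days: int = 365) -> bool:
    """Apply the four FP-driven exclusions: babies, overdoses, deaths, transplants."""
    return (
        ctx.age_in_days < infant_age_days  # babies (cutoff is an assumption)
        or ctx.overdose_dx
        or ctx.deceased
        or ctx.transplant_dx
    )

def revised_dili_candidates(candidates, contexts):
    """Keep only screened-in patients who do not hit any exclusion."""
    return [pid for pid in candidates if not excluded(contexts[pid])]
```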

Page 23: Evaluation Considerations for EHR-Based Phenotyping

Lesson learned: Evaluator effectiveness influences the ability to draw appropriate inferences about algorithm performance

• How does our evaluation approach influence performance estimates?
• Moderate agreement among algorithm reviewers, so interpretation of the PPV is unclear
• Revision: improve the evaluator approach

Lessons learned

Page 24: Evaluation Considerations for EHR-Based Phenotyping

Improved evaluator approach

• Consensus among 4 reviewers
• Assign TP/FP status by:
  1. First-pass review of the temporal relationship (see the sketch after this slide)
     • Assign preliminary TP, FP, or unknown status
  2. Chart review
     • Confirm suspected TPs
     • Assign TP/FP if status was unknown in the first-pass review

[Figure: time-series plots of laboratory values (ALP, ALT, and bilirubin) for an example patient between March and July 2012, used to review the temporal relationship between drug exposure and liver enzyme elevation]

Lessons learned ("A collaborative approach to develop an EHR phenotyping algorithm for DILI," in preparation)
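A minimal sketch of the two-pass TP/FP assignment described on this slide, assuming each case carries a drug start date and a liver injury onset date. The simple "exposure precedes injury" check and the status labels are illustrative, not the reviewers' actual protocol.

```python
# Sketch of the two-pass TP/FP assignment, under stated assumptions:
# pass 1 screens the temporal relationship between drug exposure and
# liver injury; pass 2 (chart review) confirms suspected TPs and
# resolves cases left unknown by the first pass.
from datetime import date
from typing import Optional

def first_pass(drug_start: Optional[date], injury_onset: Optional[date]) -> str:
    """Preliminary status from timing alone: TP if exposure precedes injury,
    FP if it clearly does not, unknown if either date is missing."""
    if drug_start is None or injury_onset is None:
        return "unknown"
    return "TP" if drug_start <= injury_onset else "FP"

def second_pass(preliminary: str, chart_supports_dili: bool) -> str:
    """Chart review confirms suspected TPs and resolves unknowns."""
    if preliminary == "FP":
        return "FP"
    return "TP" if chart_supports_dili else "FP"

# Example: drug started before the enzyme elevation, chart review confirms.
status = second_pass(first_pass(date(2012, 3, 20), date(2012, 4, 5)), True)
print(status)  # TP
```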

Page 25: Evaluation Considerations for EHR-Based Phenotyping

Summary of findings

• Lessons learned from applying our evaluation framework
  • What's correct for the algorithm may not be correct for the case definition (are we measuring what we mean to measure?)
  • Evaluator effectiveness influences ability to draw appropriate inferences about algorithm performance
• Demonstrated that our evaluation framework is useful
  • Informed improvements in algorithm design
  • Informed improvements in evaluator approach
  • Likely more useful for rare conditions
• Demonstrated EHR phenotyping algorithm development is an iterative process
  • Complexity of the algorithm may influence

Page 26: Evaluation Considerations for EHR-Based Phenotyping

Acknowledgments

• Dr. Yufeng Shen: Serious Adverse Event Consortium collaborator
• eMERGE collaborators
  • Mount Sinai (Drs. Omri Gottesman, Erwin Bottinger, and Steve Ellis)
  • Mayo Clinic (Drs. Jyotishman Pathak, Sean Murphy, Kevin Bruce, Stephanie Johnson, Jay Talwalker, Christopher G. Chute, Iftikhar J. Kullo)
  • Northwestern (Dr. Abel Kho)
  • Vanderbilt (Dr. Josh Denny)
• DILIN collaborator: UNC-CH (Dr. Ashraf Farrag)
• Columbia Training in Biomedical Informatics (NIH NLM #T15 LM007079)
• The eMERGE network U01 HG006380-01 (Mount Sinai)

Page 27: Evaluation Considerations for EHR-Based Phenotyping

Questions?

Casey L. Overby [email protected]
