34
Joel Saltz, Andrew Post, Doris Gao, Sharath Cholleti, Mark Grand: Emory David Levine, Sam Hohmann: UHC Data Analytics for Readmission: Temporal features, predictive modeling

Presentation at UHC Annual Meeting

Embed Size (px)

DESCRIPTION

Talk presented at the UHC 2011 Annual Meeting

Citation preview

Page 1: Presentation at UHC  Annual Meeting

Joel Saltz, Andrew Post, Doris Gao, Sharath Cholleti, Mark Grand: Emory David Levine, Sam Hohmann: UHC

Data Analytics for Readmission: Temporal features, predictive modeling

Page 2: Presentation at UHC  Annual Meeting

Analytic Information Warehouse Project: Tools and Analytics to Answer Questions such as:

• What fraction of patients with a given category of principal diagnosis will be readmitted within 30 days?

• What fraction of patients with a given set of diseases will be readmitted within 30 days?

• How does severity and time course of co-morbidities affect readmissions?

• How can we best use history of prior hospitalizations to predict readmissions?

• What are the medical and socio-economic characteristics of frequently readmitted patients?

• Can we translate insight derived from our patient population into rules that can be used to manage patients?

Page 3: Presentation at UHC  Annual Meeting

Emory Clinical Data Warehouse

• EUH, EUHM and WW (inpatient encounters)• Excludes Psych and Rehab encounters

• Encounter location (entity, pavilion, unit)• Providers• Discharge disposition• Primary and secondary ICD9 codes• Procedure codes• DRGs• Medication orders• Labs• Vitals• Insurance status• Geographic information

Page 4: Presentation at UHC  Annual Meeting

Identifying Variables Associated with 30-day Readmits• Problem: “Raw” variables in the CDW are difficult to

use for prediction– Too many diagnosis codes, procedure codes– Continuous variables (e.g., labs) require interpretation– Temporal relationships between variables are implicit

• Solution: Transform the data into a much smaller set of variables using heuristic knowledge– Categorize diagnosis and procedure codes using code

hierarchies– Classify continuous variables using standard

interpretations (e.g., high, normal, low)– Identify temporal patterns (e.g., frequency, duration,

sequence)– Apply standard data mining techniques

Page 5: Presentation at UHC  Annual Meeting

Clinical Data Warehouse

Analytic InformationWarehouse

The CDW/AIW Relationship

• CDW as source of clinical and administrative data – cloned periodically (e.g., monthly)

• AIW as incubator of algorithms that generate derived information

Clinical Data Warehouse/Analytic Information Warehouse (AIW)

Cloned periodically

Derived information returned

Page 6: Presentation at UHC  Annual Meeting

AIW Workflow

Analytic InformationWarehouse

Data subset, mapped to a

standard model

Cloned periodically Periodic data

extraction

Calculation of derived variables (transform)

Augmented data set

Multiple Databases

Make analyses available in existing tools

Load into multiple output forms

Page 7: Presentation at UHC  Annual Meeting

Readmissions Analyses (Emory Healthcare)

Page 8: Presentation at UHC  Annual Meeting

Derived Variables• 30-day readmit• The 9 Emory Enhanced Risk Assessment Tool diagnosis categories• UHC product lines• “Disease indicators” (combinations of diagnosis codes, procedure codes, labs

and/or med orders that indicate a condition)– Obesity– Uncontrolled diabetes– End-stage renal disease (ESRD)– Pressure ulcer– Sickle cell disease

• Temporal variables derived over multiple encounters– Multiple MI– Multiple past 30-day readmissions– Sickle cell disease– Diabetes/uncontrolled diabetes– CKD/ESRD

Page 9: Presentation at UHC  Annual Meeting

Emory Enhanced Risk Assessment Tool (ERAT) Diagnoses• Diabetes• Heart Failure• Chronic Kidney Disease• Chronic Obstructive Pulmonary Disease• Acute Myocardial Infarction• Stroke• History of Transplant• Cancer• Pulmonary Hypertension

Page 10: Presentation at UHC  Annual Meeting

Identifying Variables Associated with 30-day Readmits• No variables in the CDW are broadly associated with

(or predictive of) readmits across the entire EHC population

• Need to drill-down into subpopulations to identify variables that are associated with readmits

• Ultimately, may be able to derive subpopulation-specific predictive models of readmissions

Page 11: Presentation at UHC  Annual Meeting

3-year+ subset (2008-3/2011)

Analytic Information Warehouse

Page 12: Presentation at UHC  Annual Meeting

Association of CKD with 30-day ReadmissionsOverall Emory Readmission Rate = 15%

ReadmissionRate = 21%

CKD?

Subsequent 30-day readmit? FALSE TRUE Grand Total

30 Day Readmission 19386 7017 26403

No 30 Day Readmission 110058 23460 133518

Grand Total 129444 30477 159921

ESRD?

Subsequent 30-day readmit? FALSE TRUE Grand Total

30 Day Readmission 23091 3312 26403

No 30 Day Readmission 124518 9000 133518

Grand Total 14760912312 159921

Readmission Rate =27%

Analytic Information Warehouse

Page 13: Presentation at UHC  Annual Meeting

Association of Multiple MI with 30-day Readmissions

Readmission Rate = 44%

Multiple MI?

Subsequent 30-day readmit? FALSE TRUE Grand Total

30 Day Readmission 685 167 852

No 30 Day Readmission 5772 209 5981

Grand Total 6457 376 6833

Page 14: Presentation at UHC  Annual Meeting

Uncontrolled Diabetes (total n=8696, readmit n=1844, Readmit Rate = 21%)

Has Pressure Ulcer

Has ESRD

ReadmissionRate = 33%

ReadmissionRate = 32%

Pressure ulcer?

Subsequent 30-day readmit? FALSE TRUE Grand Total

30 Day Readmission 387 128 515

No 30 Day Readmission 1053 260 1313

Grand Total 1440 388 1828

ESRD?

Subsequent 30-day readmit? FALSE TRUE Grand Total

30 Day Readmission 1200 327 1527

No 30 Day Readmission 3491 712 4203

Grand Total 4691 1039 5730

Page 15: Presentation at UHC  Annual Meeting

Sickle Cell Anemia and 30-day ReadmitsSickle Cell Anemia

Sickle Cell Crisis

ReadmissionRate = 34%

SS Crisis?

Subsequent 30-day readmit? FALSE TRUE Grand Total

30 Day Readmission 25972 431 26403

No 30 Day Readmission 132759 759 133518

Grand Total 158731 1190 159921

Sickle Cell Anemia?

Subsequent 30-day readmit? FALSE TRUE Grand Total

30 Day Readmission 25905 498 26403

No 30 Day Readmission 132550 968 133518

Grand Total 158455 1466 159921

ReadmissionRate = 36%

Page 16: Presentation at UHC  Annual Meeting

Association of MRSA with 30-day Readmissions

Overall

Stroke

MI

Readmission Rate = 27%

Readmission Rate=38%

Readmission Rate=29%

MRSA?Subsequent 30-day readmit? FALSE TRUE Grand Total30 Day Readmission 25982 421 26403No 30 Day Readmission 132362 1156 133518Grand Total 158344 1577 159921

MRSA?Subsequent 30-day readmit? FALSE TRUE Grand Total30 Day Readmission 1203 16 1219No 30 Day Readmission 3996 26 4022Grand Total 5199 42 5241

MRSA?Subsequent 30-day readmit? FALSE TRUE Grand Total30 Day Readmission 836 16 852No 30 Day Readmission 5942 39 5981Grand Total 6778 55 6833

Page 17: Presentation at UHC  Annual Meeting

Use of Temporal Variables in creating useful subsets of data (5 year dataset)

Patient Population

Number of Encounters

Number of Readmissions Readmission Rate

Overall Emory 232645 34270 15%

Single MI 17992 2804 16%

Multiple MI 1355 492 36%

CKD 45664 10818 24%

>=4 readmissions 17550 9459 54%Multiple MI and >= 4 readmissions 900 465 52%CKD and >=4 readmissions 6997 3606 52%

Page 18: Presentation at UHC  Annual Meeting

Predictive Modeling for Readmission

• Classify inpatient encounters into high, medium, low risk groups of 30-day readmission based on patients’ characteristics

• Data preprocessing and mapping of codes• Predictive modeling

– Random forests (ensemble of decision trees)– Ranking of the predictions into high to low risk

• Emory specific data sets

Page 19: Presentation at UHC  Annual Meeting

Random Forests

• Random forests: an ensemble of tree predictors• Each tree is created using a random subset of the

variables in the dataset• A large number of trees are generated• All of them vote to classify a test example• Reference: Leo Breiman, Random Forests, Machine

Learning, 45, 5-32, 2001

Page 20: Presentation at UHC  Annual Meeting

Random Forest (cont)

• Generalization error depends on the strength of individual trees and the correlation between them

• Its accuracy is as good as AdaBoost (another robust algorithm)

• It is relatively robust to noise and outliers• It gives useful internal estimates of error,

correlation, strength and variable importance

Page 21: Presentation at UHC  Annual Meeting

Variables used in Predictive Modeling

• Age, gender, race• Census tract data: population, population by race,

average household income, persons per household• Primary and secondary diagnosis codes grouped

using ontologies• Lab procedure codes grouped using ontologies• Vitals like heart rate, blood pressure, temperature,

respiratory rate, BMI• Medications• Derived variables (next slide)

Page 22: Presentation at UHC  Annual Meeting

Derived Variables

• Disease flags– CKD, MI, HF, COPD, Diabetes, etc.

• Medication flags– Diabetes medication count, ACE inhibitor, beta

blocker, diuretic, inotropic agent, etc.

• Treatment flags– Radiotherapy, chemotherapy

• Patient history– Encounter 90 days earlier, 180 day earlier

Page 23: Presentation at UHC  Annual Meeting

BMI Using WHO Simple Classification (1 year subset 4/2010-3/2011)

Percent BMI Category for CKD female patients with multiple readmits (n=197)

Analytic Information Warehouse

Percent BMI Category for CKD patients with multiple readmits (n=386)

“30 Day Readmission” represents encounters that were followed by a 30 day readmit“No 30 Day Readmission” represents other encounters that were not followed by a 30 day readmit

RR=1.2

Page 24: Presentation at UHC  Annual Meeting

Predictive Modeling Results with Temporal Variable Constrained Dataset: MI data

(Emory)

All MI data and Multiple MI data

Data

Predicted Risk # of

encounters# of

Readmissions

30-day Readmission

rate

All MI data High 968 360 37%

Multiple MI High 68 35 51%

All MI data (no predictive modeling) 9674 1648 17%

Multiple MI (no predictive modeling) 376 167 44%

Page 25: Presentation at UHC  Annual Meeting

Predictive Modeling Results with Temporal Variable Constrained Dataset: CKD data

(Emory)

All CKD data and End Stage Renal CKD

DataPredicted

Risk# of

encounters# of

ReadmissionsReadmission

rate

CKD High 2284 950 42%End Stage

Renal High 952 444 47%All CKD (no predictive

modeling) 45664 10818 24%End Stage Renal (no predictive modeling) 3312 12312 27%

Page 26: Presentation at UHC  Annual Meeting

UHC Data Analyses

• Much larger dataset• Much less detailed information about each patient• UHC only has coded data sent by institutions so co-

morbidity related ICD-9 codes may be missing• Analyses across patient encounters can pick up

chronic co-morbidities that might not be coded in a particular encounter

Page 27: Presentation at UHC  Annual Meeting

Missing Diagnosis Codes in UHC dataset 10/1/2006 - 4/30/2011

Disease Number of Patients with missing codes in future encounters

Total number of patients

Number of Encounters with missing codes

Total number of encounters

Diabetes 144806 (8.01%) 1807322 311403 (9.4%) 3300804

Heart Failure 197043 (20.1%) 976041 366926 (20.7%) 1765203

MI 171213 (21.8%) 784559 301673 (25.8%) 1168056

Sickle Cell 2870 (10.5%) 27210 11162 (9.9%) 112268

Page 28: Presentation at UHC  Annual Meeting

UHC Use of Temporal Variables in Sub setting Data

Patient Population

# Total Encounters

# Readmitted Patients

Proportion of Patients Readmitted

MI 310954 47210 15.2%

Multiple MI 73227 29017 39.6%

Non-ESRD 13023536 1735308 13.3%

ESRD 510702 142622 27.9%

CKD 1334617 316399 23.7%

Page 29: Presentation at UHC  Annual Meeting

UHCUse of Temporal Variables in Sub setting Data

Patient Population

# Total Patients

# Readmitted Patients

Proportion of Patients Readmitted

Diabetes 2465049 465526 18.8%Uncontrolled Diabetes 388417 78005 20.0%

ESRD 510702 142622 27.9%Uncontrolled Diabetes and ESRD 48583 14224 29.8%

Page 30: Presentation at UHC  Annual Meeting

Readmission Hot Spots

Page 31: Presentation at UHC  Annual Meeting

UHC “Readmission Hot Spots”

1 2 3 4 5 6 7 80

100000

200000

300000

400000

500000

600000

700000

800000

900000

1000000

EncountersPatients

Page 34: Presentation at UHC  Annual Meeting

Conclusion

• Integrative dataset analysis can leverage patient information gathered over many encounters

• Temporal analyses can generate derived variables that appear to correlate with readmissions

• Hot spots appear to be an important phenomenon and have the potential of leading to patient-level interventions

• Predictive modeling has promise of providing decision support

• Future analysis will look at temporal patterns of encounters and relationship between LOS and readmission