
Adding Community-level Social Determinants of Health Factors to Patient-level Data to Predict Stroke


Rema Padman, PhD; Min Chen, PhD

Carnegie Mellon University, Pittsburgh, PA;

Florida International University, Miami, FL

DISCLAIMER: The views, opinions and images expressed in this presentation are those of the authors and do not necessarily represent official policy or position of HIMSS.

BG5, March 9, 2020


Meet Our Speakers

Rema Padman Min Chen

Conflict of Interest

Rema Padman, PhD and Min Chen, PhD

Have no real or apparent conflicts of interest to report.


Collaborators


Xuan Tan, PhD Candidate

Department of Information Systems & Business Analytics, Florida International University

Manjiri Kshirsagar, Jana Macickova, Ashita Vadlamudi, Chi Zhang

Graduate students, Heinz College of Information Systems and Public Policy, Carnegie Mellon University

Agenda

• Introduction and Motivation

• Study questions

• Datasets and Analysis Cohort Identification

• Social Determinants of Health Factors

• Stroke Prediction Models

• Results and Discussion

• Conclusions


Learning Objectives

• Evaluate the extent to which data available only at admission can be used to provide a relatively reliable prediction of acute disease diagnosis, such as stroke, and which type of information is the most valuable

• Assess the value of adding SDoH to the prediction of disease incidence and outcomes

• Leverage SDoH data to achieve a more accurate risk assessment, and ultimately, better performance for healthcare providers and better outcomes for population health


Introduction

• Stroke, or brain attack, is one of the leading causes of disability and death worldwide [1]

• Estimated cost is $34 billion each year in the US [2-3]

• Undiagnosed or misdiagnosed stroke means delayed treatment or no treatment at all

https://getaheadofstroke.org/


https://www.stroke.net.nz/stroke-information

Introduction

• Stroke occurs when blood supply to a certain area in the brain suddenly gets interrupted and brain cells die quickly due to lack of oxygen [4-5]

• The harmful effects of stroke vary from person to person and depend on the affected area in the brain, size of stroke, age, and comorbidities [4-5]

• How quickly the patient receives medical treatment is most critical for recovery. The first hour after symptom onset is called “the golden hour” [6]

• Patients who receive medications or procedures to restore blood flow within the first three hours have a significantly higher probability of recovery [6]

Timely and Accurate Diagnosis of Stroke: A Serious Challenge

• Many medical conditions can initially look like stroke; these are known as stroke mimics

• Diagnosis relies on laboratory medicine resources and on time-consuming, expensive imaging, which are not always readily available when a patient is admitted to an emergency care facility [7]

https://www.kob.com/albuquerque-news/be-fast-neurologists-lay-out-guide-to-spotting-stroke-symptoms/5363071/

• Social Determinants of Health (SDoH) have been shown to have an association with risk of stroke and many other diseases [8-9]

Summary of Previous Studies on Stroke Diagnosis Prediction

• Several studies have attempted to detect potential biomarkers to distinguish acute ischemic stroke from stroke mimics [10-12]

• Some studies have reported predicting risk or mortality of stroke using claims data [13, 15]. Others, such as Teoh (2018), predicted diagnosis of stroke within one year using electronic health records data in Japan, reporting an area under the ROC curve (AUC) of 0.67 [14]

• Few studies have tried to incorporate SDoH information to predict stroke. Min et al. (2018) derived a model for stroke pre-diagnosis with potentially modifiable risk factors (including lifestyle factors), and were able to correctly discriminate between normal subjects and stroke patients in 65% of the cases [16]


What are Social Determinants of Health (SDoH)?

• Social Determinants of Health (SDoH) include various community and social factors, such as “conditions in which people are born, grow, work, live, and age, and the systems shaping the conditions of daily life” [17]

  • Socioeconomic status

  • Education

  • Occupation

  • Transportation

  • Health insurance

  • Urban/rural residence

  • Social support

  • Neighborhood factors

  • ...

https://hitconsultant.net/2019/03/18/social-determinants-of-health-sdoh-collection/#.XkweTihKg2w

Impact of Social Determinants of Health

Social determinants of health have a tremendous effect on an individual’s health regardless of age, race, or ethnicity.

• 20 percent of a person’s health and well-being is related to access to care and quality of services [19-21]

• The physical environment, social determinants and behavioral factors drive 80 percent of health outcomes [19-21]

• Your zip code could matter more than your genetic code

• Population health suggests addressing upstream SDoH factors such as access to healthy food and viable transportation options [18]

• There is significant correlation between certain SDoH factors (e.g., neighborhood socioeconomic status indicators) and clinical outcomes including hospitalizations due to stroke [8-9]

Acoihc.az.gov

• There are few systematic studies assessing the value of SDoH factors in the prediction of diverse clinical events

• There is a need to explicitly evaluate whether and how social determinants of health data can contribute to improving patient risk stratification and prediction

Study Questions:

• Motivated by the current challenges of acute disease prediction, especially the diagnosis of stroke when a patient presents for emergency care at the Emergency Department (ED), this study aims to answer the following questions:

  • Can we use data available only at admission to provide a reliable prediction of stroke diagnosis? Moreover, which type of information is the most valuable?

  • How can we identify an analysis cohort that best represents the potential stroke population?

  • How can we leverage SDoH data to achieve more accurate prediction and risk assessment?

Datasets

• State Inpatient Databases (SID) from the Agency for Healthcare Research and Quality (AHRQ) – HCUP Data

  • The universe of discharge records from patients admitted across all Florida community hospitals

  • 2012 to 2014

• American Community Survey (ACS) Data

  • Data from the US Census Bureau

  • Demographic, socioeconomic, and other neighborhood information about individuals and households at various geographic levels of aggregation


HCUP Data

• The Healthcare Cost and Utilization Project (HCUP) has been developed through a Federal-State-Industry partnership sponsored by the AHRQ

• HCUP maintains both the State Inpatient Databases (SID) and the State Emergency Department Databases (SEDD)

  • SID contains all hospital inpatient discharge records, including information on patients seen in the emergency room and subsequently admitted to the hospital

  • SEDD captures discharge information on all “treat and release” ED visits


ACS Data vs. Census Data

• Similarities

  • Both administered by the U.S. Census Bureau

  • Both provide neighborhood-level SDoH information

• Differences

  • Census is conducted once every 10 years

  • ACS provides more up-to-date information about the social and economic needs of the community


Creating the Analysis Dataset

Records with admitting diagnosis codes related to stroke and mimics: 89,925 (stroke sample) + 55,051 (mimics sample) = 144,976

Ended up as stroke: 66,704 (stroke sample) + 786 (mimics sample) = 67,490

Ended up as non-stroke: 23,221 (stroke sample) + 54,265 (mimics sample) = 77,486
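The cohort arithmetic above can be checked with a few lines of Python. This is only a sketch that re-adds the counts shown on the slide; the dictionary layout and variable names are illustrative and not part of the original analysis.

```python
# Counts copied from the slide: rows are the admitting-diagnosis samples,
# columns are the final (principal) diagnosis outcome.
cohort = {
    "stroke sample": {"stroke": 66_704, "non_stroke": 23_221},
    "mimics sample": {"stroke": 786, "non_stroke": 54_265},
}

# Total records per admitting-diagnosis sample.
sample_totals = {name: sum(counts.values()) for name, counts in cohort.items()}

# Total records per final outcome.
outcome_totals = {
    outcome: sum(counts[outcome] for counts in cohort.values())
    for outcome in ("stroke", "non_stroke")
}

grand_total = sum(sample_totals.values())

# Share of each admitting-diagnosis sample that ended up as stroke.
stroke_share = {
    name: counts["stroke"] / sample_totals[name] for name, counts in cohort.items()
}

print(sample_totals)
print(outcome_totals)
print(grand_total)
print({name: f"{share:.1%}" for name, share in stroke_share.items()})
```

The totals reproduce the figures on the slide. The shares show that roughly three quarters of the stroke sample, but only a little over one percent of the mimics sample, ended up with a stroke diagnosis.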

Identifying Stroke Mimics

• We compiled a list of conditions with similar initial symptoms as stroke by consulting physicians, Epocrates, and medical literature

• We identified stroke mimics in the patient-level data using admitting diagnosis codes for specific conditions (e.g., hypoglycemia, complicated migraine, seizure)

• Top 5 diagnoses from Stroke Mimics diagnosis codes

DX_CCS1 | Description                            | Percent of observations
--------|----------------------------------------|------------------------
83      | Epilepsy; convulsions                  | 56.8%
50      | Diabetes mellitus with complications   | 6.3%
660     | Alcohol-related disorders              | 5.1%
35      | Cancer of brain and nervous system     | 4.1%
51      | Other Endocrine disorders              | 1.9%

Stroke versus Stroke Mimics - Examples

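Patient | Admitting Symptoms: ICD-9 Code | Admitting Symptoms: Description          | Principal Diagnosis: Clinical Classification Code | Principal Diagnosis: Description
--------|--------------------------------|------------------------------------------|---------------------------------------------------|---------------------------------
A       | 78039                          | Convulsions                              | 83                                                | Epilepsy, convulsions
B       | 78039                          | Convulsions                              | 131                                               | Respiratory failure
C       | 78039                          | Convulsions                              | 129                                               | Aspiration pneumonitis; food/vomitus
D       | 43491                          | Cerebral artery occlusion with infarction | 109                                              | Acute cerebrovascular disease
E       | 431                            | Intracerebral hemorrhage                 | 109                                               | Acute cerebrovascular disease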

Available Patient-Level Variables in HCUP

Information available at IP admission:

• Age, gender, race, ZIP code, rural or urban residence, median income of patient’s ZIP code

• Admission time

• Patient point of origin (e.g., home, ER, nursing facility, etc.)

• Admitting diagnosis code

• Number of chronic conditions

• Primary expected payer

• Whether it was a weekend admission

• Whether the admission was during night shift

Information available at discharge:

• Discharge time

• Died during hospitalization?

• Physician ID

• Diagnosis Related Group

• Major Diagnostic Category

• Length of stay

• Procedures

• Total hospital charges

Patient ID is used to track a patient’s visits across hospitals over time

HCUP Data Used for Analysis

Information available at ED presentation (extracted from IP admission):

• Age, gender, race, ZIP code, rural or urban residence, median income of patient’s ZIP code

• Patient point of origin (ER)

• Number of chronic conditions

• Primary expected payer

Patient ID is used to track a patient’s visits across hospitals over time

Outcome Measure: Binary indicator variable of stroke versus non-stroke (using Primary Diagnosis Code from IP admission)
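A minimal sketch of how such a binary outcome could be derived is shown below. It assumes, purely for illustration, that a principal diagnosis in CCS category 109 ("Acute cerebrovascular disease", as in the example patients D and E) counts as stroke; the slides do not state the exact code list used, and the names `STROKE_CCS` and `stroke_label` are hypothetical.

```python
# Assumption (not stated on the slides): CCS category 109,
# "Acute cerebrovascular disease", marks a stroke principal diagnosis.
STROKE_CCS = {109}

# Patients A-E from the "Stroke versus Stroke Mimics - Examples" slide:
# (patient, admitting ICD-9 code, principal-diagnosis CCS category)
examples = [
    ("A", "78039", 83),
    ("B", "78039", 131),
    ("C", "78039", 129),
    ("D", "43491", 109),
    ("E", "431", 109),
]


def stroke_label(principal_ccs: int) -> int:
    """Return 1 if the principal diagnosis is in the assumed stroke set, else 0."""
    return 1 if principal_ccs in STROKE_CCS else 0


labels = {patient: stroke_label(ccs) for patient, _, ccs in examples}
print(labels)
```

Under this assumption, patients A to C (admitted with convulsions) are labeled non-stroke and patients D and E are labeled stroke. The label depends only on the principal diagnosis, not on the admitting code.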

SDoH Data for Analysis

431 Variables: household characteristics, relationship & marital status, fertility, educational attainment, veteran status, disability status, residence 1 year ago, place of birth, citizenship status, language spoken at home, ancestry, computer use, employment status, commuting to work, occupation & industry, income, health insurance coverage, poverty status, housing characteristics, vehicles available, house value & expense, Gini index

Subjective selection based on literature review

78 Variables: The 78 variables were chosen from a larger set of 4 ACS tables and several hundred variables because they represent social, economic, housing, and demographic characteristics referenced in the literature to have a relationship with health status

Data cleaning steps

(i) Removing non-numeric indicators of missing data and replacing them with column averages

(ii) Log10-transforming columns with large dollar values so that they are closer in scale to the other columns, which ranged from 0 to 100 (percentages)

(iii) Only a negligible portion of the smaller-population ZIP codes had missing column values, so replacing the missing values with the column average was deemed appropriate
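The cleaning steps can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the authors' code: the missing-data markers in `MISSING_MARKERS`, the function names, and the toy column values are all hypothetical.

```python
import math

# Hypothetical non-numeric markers for missing values; the slides do not
# list the actual markers found in the ACS extract.
MISSING_MARKERS = {"", "-", "NA", "(X)"}


def to_number(cell):
    """Return the cell as a float, or None if it is a missing-data marker."""
    text = str(cell).strip()
    if text in MISSING_MARKERS:
        return None
    return float(text)


def clean_column(cells, log_transform=False):
    """Mean-impute missing cells, then optionally log10-transform the column."""
    values = [to_number(c) for c in cells]
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    filled = [mean if v is None else v for v in values]
    if log_transform:
        # Assumes strictly positive dollar amounts.
        filled = [math.log10(v) for v in filled]
    return filled


# Toy ZIP-level columns (illustrative values, not from the study).
pct_below_poverty = clean_column(["12.5", "-", "20.0", "15.5"])
median_home_value = clean_column(
    ["100000", "1000000", "(X)", "10000"], log_transform=True
)

print(pct_below_poverty)
print(median_home_value)
```

Imputing before the log transform means the filled-in value is the log of the column mean, not the mean of the logs. The slides do not say which order was used, so this ordering is an assumption.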

Flow Chart of Analysis Data Creation

Patient Level Stroke and Mimics Data

• Original dataset (2012-2014): 144,976 rows

• Keep only index admission: De-duplicated dataset, 125,266 rows

• Keep only ED transferred to IP records: ED to IP dataset, 106,010 rows

• Remove NAs: No-NA dataset, 101,558 rows (70% of original)

Zip Code Level SDoH Data

• Original ACS dataset (2010-2014): 983 rows

Patient Level Data + SDoH Data

• Join: Diagnosis + ACS dataset, 101,558 rows

• Remove NAs: No-NA Diagnosis + ACS dataset (input for analytical models), 97,134 rows

• Train set (80% of 97,134): 77,707 rows

• Test set (20% of 97,134): 19,427 rows
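The join and split stages of the flow chart can be sketched as below. Several details are assumptions: the slides do not name the join key (ZIP code is assumed here), do not say whether the split was random, and show the join and the NA removal as separate stages, which this sketch collapses into one inner join. All function and field names are hypothetical.

```python
import random


def join_on_zip(patients, acs_by_zip):
    """Inner-join patient rows to ZIP-level ACS rows; unmatched ZIPs are dropped."""
    joined = []
    for row in patients:
        acs = acs_by_zip.get(row["zip"])
        if acs is not None:
            joined.append({**row, **acs})
    return joined


def train_test_split(rows, train_fraction=0.8, seed=0):
    """Shuffle rows reproducibly and cut them into train and test lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


# Toy records (illustrative values, not from the study).
patients = [
    {"patient_id": 1, "zip": "33101", "age": 71},
    {"patient_id": 2, "zip": "33199", "age": 55},
    {"patient_id": 3, "zip": "99999", "age": 64},  # no ACS match, dropped
]
acs_by_zip = {
    "33101": {"pct_below_poverty": 20.0},
    "33199": {"pct_below_poverty": 12.5},
}
joined = join_on_zip(patients, acs_by_zip)

# Split sizes for a dataset with the slide's final row count of 97,134.
train, test = train_test_split(range(97_134))
print(len(joined), len(train), len(test))  # prints: 2 77707 19427
```

With an 80/20 cut on 97,134 rows, the sizes match the 77,707 and 19,427 rows reported in the flow chart.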

Descriptive Summary

Variable                                        | Stroke Sample | Mimics Sample | All Sample
------------------------------------------------|---------------|---------------|-------------
Age***                                          | 71.11 (14.74) | 55.83 (22.74) | 63.72 (20.51)
Female***                                       | 0.50 (0.50)   | 0.51 (0.50)   | 0.51 (0.50)
Number of Chronic Conditions***                 | 7.14 (3.00)   | 5.30 (3.21)   | 6.25 (3.23)
Elixhauser Score                                | 7.59 (9.82)   | 7.61 (9.84)   | 7.60 (9.83)
Race/Ethnicity                                  |               |               |
White***                                        | 0.66 (0.47)   | 0.62 (0.49)   | 0.64 (0.48)
Black***                                        | 0.17 (0.38)   | 0.20 (0.40)   | 0.19 (0.39)
Hispanic***                                     | 0.14 (0.35)   | 0.15 (0.36)   | 0.15 (0.35)
Other Race                                      | 0.03 (0.16)   | 0.03 (0.16)   | 0.03 (0.16)
Medical Insurance                               |               |               |
Medicare***                                     | 0.70 (0.46)   | 0.49 (0.50)   | 0.60 (0.49)
Medicaid***                                     | 0.08 (0.27)   | 0.18 (0.38)   | 0.12 (0.33)
Private Insurance***                            | 0.13 (0.34)   | 0.17 (0.37)   | 0.15 (0.36)
Other Insurance***                              | 0.09 (0.29)   | 0.16 (0.37)   | 0.13 (0.33)
Urban Residence***                              | 0.96 (0.20)   | 0.97 (0.17)   | 0.96 (0.19)
Median household income for patient's ZIP Code  |               |               |
First Quartile***                               | 0.40 (0.49)   | 0.42 (0.49)   | 0.41 (0.49)
Second Quartile**                               | 0.33 (0.47)   | 0.32 (0.47)   | 0.33 (0.47)
Third Quartile                                  | 0.20 (0.40)   | 0.20 (0.40)   | 0.20 (0.40)
Fourth Quartile                                 | 0.07 (0.25)   | 0.06 (0.24)   | 0.06 (0.25)
Number of Observations                          | 50,159        | 46,975        | 97,134

Note: Standard deviations are in parentheses. ***significant at 1%; **significant at 5%; *significant at 10%.


Prediction Models

• Logistic Regression (LR): a popular baseline model to determine the relationship between a set of predictor variables and a binary outcome variable [22]

• Random Forest (RF): a supervised machine learning method for classification that fits multiple decision trees on different subsamples of the data to classify outcomes [23]

• Gradient Boosting Machine (GBM): a machine learning method for classification, which produces a prediction model in the form of an ensemble of weak prediction models [24]


Model Evaluation Scenarios

• HCUP variables only

• Add community-level SDoH variables from the ACS data

• Add Comorbidities to HCUP variables: Elixhauser Index

• Add community-level SDoH variables from the ACS data and Elixhauser Index

Key Performance Measures

• Accuracy: Percentage of predictions that the model got right

• AUC: Probability that the model will rank a randomly chosen positive example higher than a randomly chosen negative example

• Precision: Ratio of correct positive predictions to all positive predictions made by the model

• Recall: Ratio of correct positive predictions to all actual positive cases in the data

• F1 score: A score in [0, 1] computed as the harmonic mean of precision and recall
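The measures above follow directly from their definitions. The sketch below computes them in plain Python on a tiny made-up example; the labels, scores, the 0.5 threshold, and the function names are illustrative assumptions, not the study's data or code.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = stroke)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 from hard 0/1 predictions."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }


def auc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy labels and predicted probabilities (illustrative only).
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = classification_metrics(y_true, y_pred)
metrics["auc"] = auc(y_true, scores)
print(metrics)
```

Accuracy, precision, recall and F1 depend on the chosen threshold, whereas AUC is computed from the ranking of the scores alone. This is why a model can change recall noticeably while its AUC barely moves.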

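Performance of Stroke Diagnosis Models

Model                     | Predictors             | Accuracy | AUC  | Precision | Recall | F1 Score
--------------------------|------------------------|----------|------|-----------|--------|---------
Logistic Regression       | SID only               | 0.67     | 0.67 | 0.66      | 0.75   | 0.70
Logistic Regression       | SID + ACS              | 0.67     | 0.67 | 0.66      | 0.75   | 0.70
Logistic Regression       | SID + Elixhauser       | 0.67     | 0.67 | 0.66      | 0.75   | 0.70
Logistic Regression       | SID + ACS + Elixhauser | 0.68     | 0.68 | 0.67      | 0.75   | 0.71
Random Forest             | SID only               | 0.69     | 0.68 | 0.65      | 0.86   | 0.74
Random Forest             | SID + ACS              | 0.68     | 0.67 | 0.66      | 0.77   | 0.71
Random Forest             | SID + Elixhauser       | 0.69     | 0.68 | 0.65      | 0.86   | 0.74
Random Forest             | SID + ACS + Elixhauser | 0.68     | 0.67 | 0.66      | 0.77   | 0.71
Gradient Boosting Machine | SID only               | 0.69     | 0.68 | 0.65      | 0.84   | 0.73
Gradient Boosting Machine | SID + ACS              | 0.70     | 0.69 | 0.67      | 0.81   | 0.73
Gradient Boosting Machine | SID + Elixhauser       | 0.69     | 0.68 | 0.65      | 0.84   | 0.73
Gradient Boosting Machine | SID + ACS + Elixhauser | 0.70     | 0.69 | 0.67      | 0.82   | 0.74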
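As a quick consistency check, the F1 column of the performance table can be recomputed from the reported precision and recall. The sketch below copies those three columns from the table; because the inputs are already rounded to two decimals, agreement is only expected to within rounding. The short model labels and variable names are illustrative.

```python
# (model, predictors, precision, recall, reported F1), copied from the table.
rows = [
    ("LR", "SID only", 0.66, 0.75, 0.70),
    ("LR", "SID + ACS", 0.66, 0.75, 0.70),
    ("LR", "SID + Elixhauser", 0.66, 0.75, 0.70),
    ("LR", "SID + ACS + Elixhauser", 0.67, 0.75, 0.71),
    ("RF", "SID only", 0.65, 0.86, 0.74),
    ("RF", "SID + ACS", 0.66, 0.77, 0.71),
    ("RF", "SID + Elixhauser", 0.65, 0.86, 0.74),
    ("RF", "SID + ACS + Elixhauser", 0.66, 0.77, 0.71),
    ("GBM", "SID only", 0.65, 0.84, 0.73),
    ("GBM", "SID + ACS", 0.67, 0.81, 0.73),
    ("GBM", "SID + Elixhauser", 0.65, 0.84, 0.73),
    ("GBM", "SID + ACS + Elixhauser", 0.67, 0.82, 0.74),
]


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Rows whose recomputed F1 differs from the reported value by more than
# half a unit in the second decimal place.
mismatches = [
    (model, predictors)
    for model, predictors, p, r, reported in rows
    if abs(f1_score(p, r) - reported) > 0.005
]
print(mismatches)  # prints: []
```

Every row agrees with the reported F1 to two decimals, so the table is internally consistent on this measure.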

Variable Importance Analysis


Top 20 Features from GBM (SID + ACS + Elixhauser Score)

Discussion: Why Does SDoH Information Lack Strong Predictive Power?

• Insufficient variability in the ZIP code level SDoH measures

  • Welch-adjusted ANOVA shows a significant difference in means between stroke and non-stroke patients for most of the ACS variables

• Patient-level demographics present in claims data may have accounted for much of the variability in the ACS variables

  • Regressing the ACS SDoH variables on the patient-level variables gives an R-squared of 30-40%
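To make the R-squared statement concrete, the sketch below regresses one community-level variable on one patient-level variable and reports the share of variance explained. It is a deliberate simplification: the study regressed ACS variables on several patient-level variables, whereas this uses a single predictor, and the data, variable names, and function name are invented for illustration.

```python
def r_squared(x, y):
    """R-squared of the least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares around the fitted line.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return 1 - ss_res / syy


# Toy data (illustrative only): patient age and a hypothetical ZIP-level
# percentage attached to each patient's record.
patient_age = [40, 50, 60, 70, 80]
zip_pct_age_65_plus = [10, 14, 13, 18, 20]

r2 = r_squared(patient_age, zip_pct_age_65_plus)
print(f"R-squared: {r2:.2f}")
```

An R-squared of 30-40% in this setup would mean that the patient-level variables already account for roughly a third of the variation in the community-level measure, which leaves less new signal for the ACS variables to add.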


Key Takeaways

• Claims data can be used to predict stroke diagnosis

• Individual-level SDoH variables are important predictors

  • Age

  • Primary payer

• Adding community-level ACS variables to the patient-level data did not improve predictive power substantially


Contributions, Limitations and Implications

• One of the first large-scale studies that systematically assesses the added value of SDoH information using claims data

• Development and integration of individual-level SDoH screening tools is strongly indicated; incentivize the collection of SDoH data through financial or quality measures

• Link data from Emergency Department and pre-hospital care settings, including outpatient ambulatory care, to get a more complete patient trajectory

• Integrate clinical data with the claims data to include laboratory and imaging test results

• Investigate stroke prediction at milestone events after ED presentation, such as when a test is completed, to determine the added value of distinct interventions


References

1. “Heart Disease and Stroke Statistics-2018 Update: A Report From the American Heart Association”. American Heart Association. March 20, 2018.

2. Benjamin EJ, Virani SS, Callaway CW, Chamberlain AM, Chang AR, Cheng S, et al. Heart Disease and Stroke Statistics-2018 Update: A Report From the American Heart Association. Circulation. 2018;137(12):e67-e492.

3. Johnson CO, Nguyen M, Roth GA, Nichols E, Alam T, Abate D, et al. Global, Regional, and National Burden of Stroke, 1990–2016: A Systematic Analysis for the Global Burden of Disease Study 2016. The Lancet Neurology. 2019;18(5):439-58.

4. Kelly Adam G, Hellkamp Anne S, Olson D, Smith Eric E, Schwamm Lee H. Predictors of Rapid Brain Imaging in Acute Stroke. Stroke. 2012;43(5):1279-84.

5. Alberts MJ, Hademenos G, Latchaw RE, Jagoda A, Marler JR, Mayberg MR, et al. Recommendations for the Establishment of Primary Stroke Centers. JAMA. 2000;283(23):3102-9.

6. Mayo Clinic. Stroke-Diagnosis & Treatment 2019 [Available from: https://www.mayoclinic.org/diseases-conditions/stroke/diagnosis-treatment/drc-20350119].

7. Musuka TD, Wilton SB, Traboulsi M, Hill MD. Diagnosis and Management of Acute Ischemic Stroke: Speed is Critical. CMAJ. 2015;187(12):887-93.

8. Chan KS, Roberts E, McCleary R, Buttorff C, Gaskin DJ. Community Characteristics and Mortality: the Relative Strength of Association of Different Community Characteristics. American Journal of Public Health. 2014;104(9):1751-8.

9. Hill PL, Weston SJ, Jackson JJ. Connecting Social Environment Variables to the Onset of Major Specific Health Outcomes. Psychology & Health. 2014;29(7):753-67.

10. Glickman SW, Phillips S, Anstrom KJ, Laskowitz DT, Cairns CB. Discriminative capacity of biomarkers for acute stroke in the emergency department. The Journal of Emergency Medicine. 2010;41(3):333-339.

11. Reynolds MA, Kirchick HJ, Dahlen JR, Anderberg JM, McPherson PH, Nakamura KK, ... Buechler KF. Early biomarkers of stroke. Clinical Chemistry. 2003;49(10):1733-1739.

12. Saenger AK, Christenson RH. Stroke biomarkers: progress and challenges for diagnosis, prognosis, differentiation, and treatment. Clinical Chemistry. 2010;56(1):21-33.

13. Ong MEH, Chan YH, Lin WP, Chung WL. Validating the ABCD2 score for predicting stroke risk after transient ischemic attack in the ED. The American Journal of Emergency Medicine. 2010;28(1):44-48.

14. Teoh D. Towards stroke prediction using electronic health records. BMC Medical Informatics and Decision Making. 2018;18(1):1-11.

15. Cheon S, Kim J, Lim J. The Use of Deep Learning to Predict Stroke Patient Mortality. International Journal of Environmental Research and Public Health. 2019 Jan;16(11):1876.

16. Min SN, Park SJ, Kim DJ, Subramaniyam M, Lee KS. Development of an Algorithm for Stroke Prediction: A National Health Insurance Database Study in Korea. European Neurology. 2018;79(3-4):214-220.

17. World Health Organization. Social determinants of health. [Available from: http://www.who.int/social_determinants/sdh_definition/en/].

18. Freij M, Dullabh P, Hovey L, Leonard J, Card A, Dhopeshwarkar R. Incorporating Social Determinants of Health in Electronic Health Records: A Qualitative Study of Perspectives on Current Practices among Top Vendors. NORC at the University of Chicago. 2018.

19. ProMedica. Social Determinants of Health 2019 [Available from: https://www.promedica.org/socialdeterminants/pages/default.aspx].

20. County Health Rankings & Roadmaps. County Health Rankings Model 2016 [Available from: https://www.countyhealthrankings.org/county-health-rankings-model].

21. Jessica T. Claudio, HRET HIIN, the Association for Community Health Improvement. Reducing Root Causes of Harm: Social Determinants of Health. November 15, 2018. [Available from: http://www.hret-hiin.org/Resources/health_care_disparities/18/hret-hiin-virtual-event-reducing-root-causes-of-harm-social-determinants-of-health-slides.pdf].

22. Hosmer DW, Lemeshow S. “Interpretation of the Fitted Logistic Regression Model,” Chapter 3 in Applied Logistic Regression (2nd ed.). New York: Wiley; 2000. pp. 47-90.

23. Hastie T, Tibshirani R, Friedman J. The Elements of Statistical Learning (2nd ed.). Springer; 2008. ISBN 0-387-95284-5.

24. Hastie T, Tibshirani R, Friedman JH. “10. Boosting and Additive Trees”. The Elements of Statistical Learning (2nd ed.). New York: Springer; 2009.


Thank you!

We welcome and appreciate your feedback.

[email protected]; [email protected]

Questions?
