Big Data, Bias and Analytics – What Can Your EHR Really Tell You? ADAM WILCOX, PHD

Big Data

Source: Nature (Feb 13, 2013)

Hype Cycle for Emerging Technologies, Gartner (August 2014)

Outline

Background and Experience

Big Data Introduction

Big Data – Bias Issues

Advancing Big Data

Next Steps and Conclusion

Knowledge Representation vs. Knowledge Discovery

[Figure: bar chart, vertical axis from 0.5 to 1. Categories: ML text, Q text, Q UMLS, ML UMLS, ML NLP, Q NLP, rules, physicians. Legend: text, UMLS, NLP; Machine Learning (ML), Queries (Q).]

Costs/Clinic
  Salary + training + admin     $92,077

Benefits/Clinic
  Productivity (7 MDs)          $99,986
  Hospitalizations ↓ *          $0

Total (benefits – cost)         +$7,909

* Society would save, per clinic, $79,092 in reduced hospitalizations.
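The per-clinic total follows from simple subtraction. A minimal sketch of that arithmetic, using only the figures shown on the slide (the variable names are illustrative and not from the source):

```python
# Per-clinic figures from the slide, in US dollars.
cost_per_clinic = 92_077         # salary + training + admin
productivity_benefit = 99_986    # productivity gain across 7 MDs
hospitalization_benefit = 0      # these savings accrue to society, not the clinic

# Net benefit as seen by the clinic itself.
net_benefit = productivity_benefit + hospitalization_benefit - cost_per_clinic
print(f"Net benefit per clinic: {net_benefit:+,} USD")
```

The hospitalization line is entered as zero because, per the footnote, the reduction in hospitalizations is a saving to society rather than to the clinic.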

Dorr DA, Wilcox AB, et al. The effect of technology-supported, multidisease care management on the mortality and hospitalization of seniors. J Am Geriatr Soc. 2008 Dec;56(12):2195-202.

Effect of Care Management: Outcomes

Increase in CDR View Access

[Figure: time series from Jul 2007 to Feb 2009. Axes: Tab % (0% to 70%) and Eclipsys MRNs (0 to 6000).]

[Diagram: integration services, replicated databases, virtual data warehouse, and datamarts (DM A, DM B, DM C) feeding the data warehouse tools.]

Data Warehouse Tools

• Ad-hoc queries: questions; research; Define
• Recurring: automated queries; management reports; Measure
• OLAP: analytics; operational reports; Analyze
• Dashboards: point-of-care reporting; Improve
• Applications: decision support; Control

WICER

Improve Use of Information for Learning Health System

• Informed strategy for healthcare transformation

• Measures to support real-time process and quality improvement

• Data and analytics driving research and discovery

Measure                  Raw Clinical  Matched Clinical  Matched Survey  Survey   Matched vs. Matched  Clinical vs. Survey
Age                      47.55         52.33             51.12           50.12    0.072                p << .0001
Proportion Female        0.62          0.79              0.78            0.71     0.963                p << .0001
Proportion Hispanic      0.50          0.56              0.94            0.96     p << .0001           p << .0001
Weight (kg)              75.69         77.16             76.99           75.42    0.851                0.851
Height (cm)              160.34        158.23            161.31          161.25   p << .0001           p << .0001
BMI                      28.10         29.70             28.90           28.20    0.207                0.207
Prevalence of Smoking    0.09          0.08              0.08            0.06     0.944                p << .0001
Systolic                 127.23        128.48            127.50          127.68   0.204                0.164
Diastolic                73.07         74.34             79.24           80.95    p << .0001           p << .0001
Prevalence of Diabetes   0.04          0.09              0.22            0.16     p << .0001           p << .0001

Prevalence of Diabetes: Survey = self-report, Clinical = >1 Diabetes ICD-9 AND >1 abnormal test.
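The clinical definition of diabetes used in this comparison is a rule over coded data, and small choices in such a rule can shift the measured prevalence. A minimal sketch of the rule as stated (more than one diabetes ICD-9 code and more than one abnormal test) is given here. It assumes that diabetes codes are those in the ICD-9 250.xx family and that tests have already been flagged as abnormal upstream; the slide specifies neither, and the function name is hypothetical.

```python
def has_clinical_diabetes(icd9_codes, abnormal_test_count):
    """Apply the slide's rule: >1 diabetes ICD-9 code AND >1 abnormal test.

    Assumption: a diabetes code is any ICD-9 code starting with "250".
    Assumption: abnormal_test_count was computed elsewhere; the threshold
    for "abnormal" is not given in the source.
    """
    diabetes_codes = [code for code in icd9_codes if code.startswith("250")]
    return len(diabetes_codes) > 1 and abnormal_test_count > 1


# Hypothetical patients, for illustration only.
print(has_clinical_diabetes(["250.00", "250.02", "401.9"], abnormal_test_count=2))
print(has_clinical_diabetes(["250.00"], abnormal_test_count=3))
```

A patient with a single diabetes code, or with codes but only one abnormal test, is not counted under this rule, which is one way clinical and self-reported prevalence can diverge.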

Data Collection Methods

Data Quality and Assessment

Weiskopf NG, Weng C. Methods and dimensions of data quality assessment: enabling reuse for clinical research. JAMIA 2013

“New” Analytic Methods

• Bootstrapping

• Learning curves and over-fitting

• Hypothesis generation process

t-tests
+ Easy
+ Powerful
+ Widely implemented
- Not appropriate for all data types

Non-parametric tests (Chi-square)
+ Easy
+ Robust
+ Widely implemented
- Less powerful

Bootstrapping
+ Robust
+ Powerful
- Less common
- Requires special packages or programming
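Bootstrapping can be done with very little programming. The sketch below computes a percentile bootstrap confidence interval for a difference in means using only the Python standard library. The function name, the resample count, and the sample readings are illustrative assumptions, not material from the talk.

```python
import random
import statistics


def bootstrap_mean_diff_ci(a, b, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for mean(a) - mean(b)."""
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    diffs = []
    for _ in range(n_resamples):
        # Resample each group with replacement, keeping the original sizes.
        resample_a = rng.choices(a, k=len(a))
        resample_b = rng.choices(b, k=len(b))
        diffs.append(statistics.fmean(resample_a) - statistics.fmean(resample_b))
    diffs.sort()
    lower = diffs[int((alpha / 2) * n_resamples)]
    upper = diffs[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Made-up blood pressure readings for two groups, for illustration only.
group_a = [128, 131, 125, 140, 135, 129, 133, 138]
group_b = [120, 118, 125, 122, 119, 121, 124, 117]
low, high = bootstrap_mean_diff_ci(group_a, group_b)
print(f"95% CI for difference in means: ({low:.2f}, {high:.2f})")
```

If the interval excludes zero, the difference is unlikely to be due to resampling noise alone. No distributional assumption is needed, which is the robustness noted in the comparison above.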

Big Data Analytic Approaches

• Sub-population analysis

• Investigating surprises: often more revealing about data quality than real effects
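A sub-population analysis can be as simple as computing the same rate for each subgroup and looking for values that stand out. The sketch below does this for made-up records; the field names (`site`, `diabetes`) and the data are hypothetical.

```python
from collections import defaultdict


def rate_by_subgroup(records, group_key, outcome_key):
    """Return {subgroup: (outcome rate, subgroup size)}."""
    counts = defaultdict(lambda: [0, 0])  # subgroup -> [positives, total]
    for record in records:
        tally = counts[record[group_key]]
        if record[outcome_key]:
            tally[0] += 1
        tally[1] += 1
    return {group: (pos / total, total) for group, (pos, total) in counts.items()}


# Hypothetical records: site C reports no diabetes at all.
records = (
    [{"site": "A", "diabetes": d} for d in (True, False, False, False)]
    + [{"site": "B", "diabetes": d} for d in (True, True, False, False)]
    + [{"site": "C", "diabetes": False} for _ in range(4)]
)
rates = rate_by_subgroup(records, "site", "diabetes")
for site, (rate, size) in sorted(rates.items()):
    print(f"site {site}: rate={rate:.2f} (n={size})")
```

A subgroup with a rate of exactly zero, as for site C here, is the kind of surprise worth checking against how the data were recorded before reading it as a real effect.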

Big Data

• Know the data you need

• Use the data you have

• Get the data you want

• Adapt data to user needs

• Make value accessible

Next Steps to Make it Useful

Minimum Requirements to Provide Value

• Secure database

• Data sources

• Patient-level integration: Master Patient Index*

• Semantic integration: Vocabulary*

• Excellent analysts
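Patient-level integration means linking each source system's local record to one shared patient identifier. The sketch below shows this with a simple deterministic match on normalized name and date of birth. It is a simplification for illustration: the field names and data are hypothetical, and the talk does not describe how its Master Patient Index matches records.

```python
def match_key(record):
    """Deterministic match key: normalized name plus date of birth."""
    return (
        record["last_name"].strip().lower(),
        record["first_name"].strip().lower(),
        record["dob"],
    )


def build_master_patient_index(sources):
    """Link local record IDs from each source to a shared MPI identifier."""
    key_to_mpi_id = {}
    links = {}
    for source_name, records in sources.items():
        for record in records:
            key = match_key(record)
            if key not in key_to_mpi_id:
                key_to_mpi_id[key] = len(key_to_mpi_id) + 1  # next new MPI id
            mpi_id = key_to_mpi_id[key]
            links.setdefault(mpi_id, []).append((source_name, record["local_id"]))
    return links


# Hypothetical records from two sources, for illustration only.
sources = {
    "ehr": [
        {"local_id": "E-100", "last_name": "Rivera", "first_name": "Ana", "dob": "1960-04-12"},
        {"local_id": "E-101", "last_name": "Chen", "first_name": "Li", "dob": "1975-09-30"},
    ],
    "survey": [
        {"local_id": "S-7", "last_name": " rivera", "first_name": "ANA", "dob": "1960-04-12"},
    ],
}
mpi = build_master_patient_index(sources)
print(mpi)
```

Exact matching like this misses typos and name changes, so it should be read as the smallest possible illustration of the linking step rather than a production approach.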

Patient Data Integration

Vocabulary and Data Density

Natural Language Processing

Factors Influencing Health

• Socioeconomic

• Health behaviors

• Clinical care

• Physical environment

Collecting Patient-Reported Outcomes

• Transcribing

• Patient Portals

• Scanning

• Tablet entry

Patient Reported Information: Tablets vs. Scanned Documents

                          Scanning  Tablets
Institutional
  Equipment cost          =         =
  Infection risk          =         =
Security
  Theft                   +         -
  Data loss               -         +
  Patient mismatch        -         +
  Disaster recovery       +         -
Functionality
  Office workflow         -         =
  Education/training      =         =
  Data timeliness         =         +
  Branching logic         -         +
  Extensibility           -         +
Patient experience
  Preference              =         +
  Security perception     =         -

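Goal: Answer a specific question
  Task: Ad hoc query; Use: Research; User: Researcher; Tool: SQL
  QI life-cycle: Define; Cost/instance: +; Instances: +++++; Required: Defined request

Goal: Observe trends
  Task: Recurring query; Use: Management reports; User: Manager; Tool: Reporting application
  QI life-cycle: Measure; Cost/instance: ++; Instances: ++++; Required: Available owner

Goal: Identify dependencies
  Task: Sub-population analysis; Use: Operational analysis; User: Analyst; Tool: Analytic tools
  QI life-cycle: Analyze; Cost/instance: +++; Instances: +++; Required: Content expert/analyst

Goal: Assist decision making
  Task: Dashboard display; Use: Point of care improvement; User: Clinical team; Tool: Registries
  QI life-cycle: Improve; Cost/instance: ++++; Instances: ++; Required: Pilot site

Goal: Automate processes
  Task: Application; Use: Decision support; User: Clinician/Role; Tool: EMR application
  QI life-cycle: Control; Cost/instance: +++++; Instances: +; Required: Institutional sponsor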

Physical Activity