
James Brophy MD FRCP PhD McGill University Health Center,

McGill University, Montreal, Quebec

Réseau Québécois de Recherche sur les Médicaments

Session II: Big Data: a Quebec gold mine to exploit

June 1, 2015

Pharmacoepidemiology – Big data, Big problems, Big solutions


Conflicts of Interest

I have no known conflicts of interest associated with this presentation and, to the best of my knowledge, am equally disliked by all pharmaceutical and device companies

http://www.nofreelunch.org/

Outline - agenda

•  In the context of pharmacoepidemiology
•  What are big data?
•  What are the big problems with big data?
•  Are there innovative solutions to these problems?

What is the definition of big data?

•  Something that
– doesn’t fit into Excel (65,536-row limit in older versions)
– makes you say “wow”
– makes you uncomfortable working with it
– only applies to genomics

•  Wikipedia – Big data is high volume, high velocity, and/or high variety information to enable enhanced decision making, insight discovery and process optimization.

How big is big data?


Just because it’s big, is it right?

http://oig.ssa.gov/sites/default/files/audit/full/pdf/A-06-14-34030_0.pdf

Over 6 million Americans have reached the age of 112. Just 13 are claiming benefits, and 67,000 of them are WORKING.

More big data hubris

1.  2008 stock market crash – lots of economic data but incorrect models failed to predict and even facilitated the crash (Black Swan – N. Taleb)

2.  Google - “…we can accurately estimate the current level of weekly influenza activity in each region of the United States, with a reporting lag of about one day.” (Nature 2009)


More big data hubris

•  Google Flu Trends was wrong for 100 out of 108 weeks starting in August 2011

•  The error was a systematic overestimate (Science, Mar 14 2014)

So the big question…

•  It is not the volume, velocity or variety of the data that is the problem, but rather its VERACITY

•  Is this also a problem for pharmacoepidemiology?

Pharmacoepidemiology 2010


2010

•  Both studies used the UK GPRD database (1996–2006 and 1995–2005)

BMJ – RR 2

JAMA – RR 1.07

Me, too


2 RAMQ cohorts


NEJM RCT 2014


Problems with Big Data

•  Most big data is observational -> biases (selection, information) and confounding

•  Big data -> small random errors, tight CIs, small p values, but systematic errors not measured in these CIs -> false sense of precision

•  Big data often leads to ignoring other pertinent evidence that should be synthesized to reach the most reasonable conclusions
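The second point above can be illustrated with a small simulation. This is a minimal sketch under invented assumptions: a drug with no true effect, one unmeasured confounder (called frailty here) that raises both the chance of treatment and the chance of the outcome, and made-up probabilities. The function name naive_risk_difference is hypothetical.

```python
import math
import random


def naive_risk_difference(n, seed=0):
    """Crude (unadjusted) risk difference and 95% CI from simulated data.

    Assumed scenario: the drug has NO effect on the outcome, but frail
    patients are both more likely to be treated and more likely to have
    the event, and frailty is not recorded in the database.
    """
    rng = random.Random(seed)
    events = {True: 0, False: 0}   # outcome events by treatment status
    totals = {True: 0, False: 0}   # patients by treatment status
    for _ in range(n):
        frail = rng.random() < 0.5
        treated = rng.random() < (0.7 if frail else 0.3)
        had_event = rng.random() < (0.10 if frail else 0.05)  # ignores treatment
        totals[treated] += 1
        events[treated] += had_event
    p1 = events[True] / totals[True]
    p0 = events[False] / totals[False]
    rd = p1 - p0
    se = math.sqrt(p1 * (1 - p1) / totals[True] + p0 * (1 - p0) / totals[False])
    return rd, rd - 1.96 * se, rd + 1.96 * se


if __name__ == "__main__":
    for n in (2_000, 200_000):
        rd, low, high = naive_risk_difference(n)
        print(f"n={n:>7}: RD={rd:+.4f}, 95% CI=({low:+.4f}, {high:+.4f})")
```

Under these assumed probabilities the crude risk difference converges to about 0.02 rather than the true value of 0, so the larger sample only tightens the interval around the wrong answer.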


Principles for working with big data

•  Government
– Privacy / accessibility
– Integrity of the data

•  Researchers
– Privacy / security
– Processing the data (design, analysis, model selection)
– Interpreting the results: epistemologically important to distinguish information (data), knowledge (causal inferences) & wisdom (systematic incorporation of all knowledge)

Learning from Big Data

•  More than big data, we need better data, rich in important confounders
•  Need better research designs, especially experimental data
•  Need a better appreciation of the quantitative sciences (uncertainty, causal inference)
•  Need “domain knowledge”: specific clinical information
•  Must incorporate prior evidence.
– If there are good prior data, use informative priors
– If there are very few data, use agnostic/uniform prior beliefs
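The point about priors can be made concrete with a conjugate beta-binomial update. This is only a sketch: the study counts and the Beta(10, 190) prior below are hypothetical numbers chosen for illustration, not data from the talk, and beta_posterior is an invented helper name.

```python
def beta_posterior(prior_a, prior_b, events, n):
    """Conjugate update for an event rate.

    A Beta(prior_a, prior_b) prior combined with `events` out of `n`
    patients gives a Beta(prior_a + events, prior_b + n - events) posterior.
    Returns the posterior parameters and the posterior mean.
    """
    a = prior_a + events
    b = prior_b + (n - events)
    return a, b, a / (a + b)


if __name__ == "__main__":
    events, n = 3, 20   # hypothetical small new study

    # Agnostic/uniform prior: Beta(1, 1)
    print("uniform prior:    ", beta_posterior(1, 1, events, n))

    # Informative prior, roughly equivalent to having already seen
    # 10 events among 200 patients (an assumption for illustration)
    print("informative prior:", beta_posterior(10, 190, events, n))
```

With little new data the informative prior dominates the posterior mean, which is why the prior should come from prior evidence and not from prior belief.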

What is the purpose of pharmacoepidemiology?

•  Patterns of drug utilization
•  Generating new information on drug safety
•  Supplementing premarketing effectiveness studies – different populations, better precision

•  However, the overall purpose is to provide insights or causal inferences, not merely associations generated from large data sets.

Estimating causal effects

1.  Randomized Experiments
2.  Natural Experiments
3.  Instrumental Variables
4.  Regression Discontinuity
5.  Difference in Differences

An example

Results


Problems

•  Not sure of the benefit in the North American context
•  Changing everyone in Quebec to ticagrelor would cost $25 million
•  Doing a large conventional RCT could cost $10-50 million
•  What to do?

Using big data effectively

•  Most of the cost is for the follow-up
•  We have excellent administrative databases with reliable measures of death and cardiac outcomes, so we could minimize costs

•  Need to avoid selection bias, so we could randomize at the start and then simply observe

•  New design – randomized registry – can answer the question at a reasonable cost

Conclusion

•  Instead of focusing on a “big data revolution,” better is an “all data revolution” including replication

•  Recognize that the critical change has been innovative designs and analytics, which can be applied to both traditional and new data

•  Big data is an aid to thinking, not a substitute for thinking

•  The goal of this revolution is to provide a deeper, clearer understanding of our world.

Science March 14 2014

Thank you

Learning from big data

•  Must incorporate prior evidence.
– If there are good prior data, use informative priors
– If there are very few data, use agnostic or uniform prior beliefs
•  In all cases, must be able to specify where you are and why; if an agnostic approach is used, then a validation study is needed

•  Avoid confusing prior beliefs with prior evidence -> biases

How Much Data is There?

•  2.5 quintillion bytes (about 2.5 exabytes) of data were generated every day in 2012

•  As much data is now generated in just two days as was created from the dawn of civilization until 2003.

Harvard Business Review, Dec 2014

•  Where things go wrong is where tools of this kind are used not as an aid to thinking but as a substitute for thinking. When the information provided is used (this was one of David Ogilvy’s favourite quotations) “… as a drunk uses a lamppost: for support rather than illumination.”


What can big data find in healthcare?


Big data & inferences

Washington Post, March 21

What is the correct inference?

•  Americans spend too much on gambling and too much on the important stuff of politics

•  Americans spend too much on gambling and not enough on the important stuff of politics

•  Americans don’t spend too much on gambling but spending on politics is out of control


Looking in detail

•  Consider there are 316 million Americans
•  Basketball: 13% gambled, average bet $200
•  Elections: 80% adults, average $25
•  Elections: 1% of 1% of the population (31,600) spent 28%, or $2 B, average contribution $64,000

•  A very small fraction of Americans is controlling the election process
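The arithmetic behind the top-donor figures can be checked in a few lines. This sketch uses only the numbers quoted on the slide (316 million people, $2 B, 28%); the function name top_donor_summary is invented for illustration.

```python
POPULATION = 316_000_000          # from the slide
TOP_DONOR_SPEND = 2_000_000_000   # $2 B, from the slide
TOP_DONOR_SHARE = 0.28            # 28% of total spending, from the slide


def top_donor_summary():
    """Return (number of top donors, average contribution, implied total spending)."""
    donors = POPULATION // 10_000                      # 1% of 1% of the population
    average = TOP_DONOR_SPEND / donors                 # dollars per top donor
    implied_total = TOP_DONOR_SPEND / TOP_DONOR_SHARE  # total if $2 B is 28%
    return donors, average, implied_total


if __name__ == "__main__":
    donors, average, implied_total = top_donor_summary()
    print(f"top donors: {donors:,}")
    print(f"average contribution: ${average:,.0f}")
    print(f"implied total spending: ${implied_total / 1e9:.1f} B")
```

The donor count matches the slide's 31,600, and the average comes out a little over $63,000, in line with the slide's rounded figure of about $64,000.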


How unequal?


Do statins increase or decrease the risk of cancer?


NO

YES

Maybe neither

Maybe this is an isolated case and dates from 2007. Surely we are better today.

Do statins cause diabetes?


Statins & diabetes, Who do you believe?

•  Both studies published in May 2013
•  Both studies published in high impact journals
•  Both used validated administrative datasets
•  Both published by renowned investigators

Statins & diabetes, Who do you believe?

•  Even more confusing & troublesome
•  Both used THE SAME validated administrative datasets (Ontario)
•  Both used essentially THE SAME patients (>65, no diabetes, new statin users from 1997 (2004) - 2010)

•  Both sets of authors are from THE SAME academic institution (Sunnybrook, U of T)

Adaptive randomization & ethics


•  In the end, it seems doubtful that adaptive allocation generally improves risk/benefit for patients.

•  Require larger sample sizes -> more patients, more research procedures, more visits.

•  Since costs scale with sample sizes, it means more resources are consumed in answering a single research question than with a fixed 1:1 design.


Adaptive randomization & ethics

•  Does outcome-adaptive allocation better accommodate clinical equipoise and promote informed consent?

•  Does adaptive allocation offer a “partial remedy” for the therapeutic misconception associated with fixed randomization?

Arguing against

•  Hey and Kimmelman suggest that adaptive designs do not improve risk–benefit for subjects but increase total burden for both patients and research systems by demanding larger sample sizes.

•  They suggest that adaptive designs redistribute rather than dissolve tensions in informed consent

•  They suggest that adaptive designs may have validity problems

A source of bias?

•  Given that the odds of receiving the better treatment will improve over the course of the trial

•  It is in the best interests of patient-subjects (and the physicians advocating on their behalf) to wait and enroll as late as possible

•  So later patients may be healthier (less urgency to participate) -> a predictable time trend in the study population increases the risk of bias

Example # 3

We have reached a threshold such that time to reperfusion no longer matters, provided < 90 minutes, and we now need to look elsewhere for improvements.

Results

16-minute improvement

No improvement, really?

•  Adjusted mortality has declined from 5% to 4.7%, p=0.34, but what would the CI tell us?
•  Back-of-the-envelope calculation: a 0.3% improvement with 95% CI from -0.1% to +0.7%

•  In other words, this small improvement in time is consistent with an absolute survival benefit of up to 7/1000 (about 2800 annually), or a 14% relative decrease in mortality, and is entirely consistent with previous research
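One common way to do this kind of back-of-the-envelope calculation is to recover a standard error from the reported p-value under a normal approximation. The sketch below assumes a two-sided p-value and a symmetric interval; ci_from_p_value is a hypothetical helper, and this reconstruction need not reproduce the slide's interval, which may rest on other inputs.

```python
from statistics import NormalDist


def ci_from_p_value(estimate, p_two_sided, level=0.95):
    """Approximate CI for an estimate given only its two-sided p-value.

    Assumes the test statistic was estimate / SE and approximately normal.
    """
    std_normal = NormalDist()
    z_observed = std_normal.inv_cdf(1 - p_two_sided / 2)   # |z| implied by p
    se = abs(estimate) / z_observed                        # implied standard error
    z_crit = std_normal.inv_cdf(1 - (1 - level) / 2)       # about 1.96 for 95%
    return estimate - z_crit * se, estimate + z_crit * se


if __name__ == "__main__":
    # Mortality fell from 5.0% to 4.7%: a 0.3 percentage-point improvement
    low, high = ci_from_p_value(estimate=0.3, p_two_sided=0.34)
    print(f"95% CI: {low:+.2f} to {high:+.2f} percentage points")

    # Relative reduction at the slide's upper bound of 0.7 points on a 5% baseline
    print(f"relative reduction at upper bound: {0.7 / 5.0:.0%}")
```

Under these assumptions the interval runs from roughly -0.3 to +0.9 percentage points, somewhat wider than the slide's figures, and it still includes benefits large enough to matter clinically.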

Consistent with other results

J Am Coll Cardiol 2006;47:2180-6

22,900 PCI in AMI NRMI

Telling it like it isn’t


MY CONCLUSIONS

This study shows that improved treatment times, even below the 90 minute threshold, are likely associated with meaningful mortality benefits that are entirely consistent with previous work and may have a huge public health impact. Efforts should continue to reduce all treatment delays.

Fundamental identity of causal inference

Outcome for treated − Outcome for untreated
= [Outcome for treated − Outcome for treated if not treated] + [Outcome for treated if not treated − Outcome for untreated]
= Impact of treatment on treated + selection bias

If treatment is randomly assigned
•  Selection bias is zero.
•  Treated are a random selection from the population, so impact on treated = impact on population
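The identity can be verified numerically with simulated potential outcomes. This is a minimal sketch with invented numbers: outcomes depend on a made-up severity score, treatment lowers the outcome by exactly 2 units, and in the non-randomized case sicker patients are more likely to be treated. The function name decompose is hypothetical.

```python
import random


def decompose(n=10_000, seed=1, randomize=False):
    """Return (naive difference, impact on treated, selection bias)."""
    rng = random.Random(seed)
    y1_treated, y0_treated, y0_untreated = [], [], []
    for _ in range(n):
        severity = rng.random()
        y0 = 10 + 5 * severity + rng.gauss(0, 1)   # outcome if not treated
        y1 = y0 - 2                                # outcome if treated
        p_treat = 0.5 if randomize else severity   # sicker -> more likely treated
        if rng.random() < p_treat:
            y1_treated.append(y1)
            y0_treated.append(y0)   # counterfactual, observable only in simulation
        else:
            y0_untreated.append(y0)

    def mean(xs):
        return sum(xs) / len(xs)

    naive = mean(y1_treated) - mean(y0_untreated)
    impact_on_treated = mean(y1_treated) - mean(y0_treated)
    selection_bias = mean(y0_treated) - mean(y0_untreated)
    return naive, impact_on_treated, selection_bias


if __name__ == "__main__":
    for label, flag in (("observational", False), ("randomized", True)):
        naive, att, bias = decompose(randomize=flag)
        print(f"{label:>13}: naive={naive:+.2f} = impact {att:+.2f} + bias {bias:+.2f}")
```

In the observational arm the selection bias is large enough to mask the benefit in the naive comparison; with random assignment it shrinks toward zero and the naive difference approaches the true impact.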

Problems

•  Basic problems of observational research including selection bias, information bias and confounding
– How were patients selected?
– How was exposure measured?
– Were there time dependencies?
– What were the statistical models?
– What confounders, interactions, mediators were considered?
