Bias Correction in Pharmaceutical Risk-Benefit Assessment

Bob Obenchain, PhD, FASA
Risk-Benefit Statistics LLC

Yin = Dark = Evil = Risk

Yang = Light = Good = Benefit

Outline:

• Covariate Adjustment (Simplistic, Global Modeling) is Inadequate

• Local Control methods take BIG Steps in “Right Directions.”

• Emerging Credibility Crisis in Pharmaceutical Safety

Shah BR, Laupacis A, Hux JE, Austin PC. Propensity score methods gave similar results to traditional regression modeling in observational studies: a systematic review. J Clin Epidemiol 2005; 58: 550–559.

Stürmer T, Joshi M, Glynn RJ, Avorn J, Rothman KJ, Schneeweiss S. A review of the application of propensity score methods yielded increasing use, advantages in specific settings, but not substantially different estimates compared with conventional multivariable methods. [REVIEW ARTICLE] J Clin Epidemiol 2006; 59: 437–447.

With titles like these, do you really need to read the paper?

Heckman JJ. Sample selection bias as a specification error. Econometrica 1979; 47: 153–161.

Crown WE, Obenchain RL, Engelhart L, Lair TJ, Buesching DP, Croghan TW. The application of sample selection models in evaluating treatment effects: the case for examining the effects of antidepressant medication. Stat Med 1998; 17: 1943–1958.

Obenchain RL, Melfi CA. Propensity score and Heckman adjustments for treatment selection bias in database studies. 1997 Proceedings of the Biopharmaceutical Section. Alexandria, VA: American Statistical Association. 1998; 297–306.

Early CA Modeling Efforts

D’Agostino RB Jr. Propensity score methods for bias reduction in the comparison of a treatment to a non-randomized control group. [TEACHER’S CORNER] Stat Med 1998; 17: 2265–2281.

Highly Influential ???

Claimed that 3rd form of PS Adjustment (after matching and sub-grouping) was to simply use some function of PS estimates as an additional X in Covariate Adjustment.

History of Local Control Methods for Human Studies

• Epidemiology (case-control & cohort) studies

• Post-stratification and re-weighting in surveys

• Stratified, dynamic randomization to improve balance on predictors of outcome

• Matching and Sub-grouping using Propensity Scores

• Econometric Instrumental Variables (LATEs)

• Marginal Structural Models (IPW 1/PS)

• Unsupervised Propensity Scoring: Nested Treatment-within-Cluster ANOVA model …with LATE, LTD and Error sources of variation

“Local” Terminology:

• Subgroups of Patients

• Subclasses…

• Strata…

• Clusters… (natural or forced)

Notation for Variables

y = observed outcome variable(s)

x = observed baseline covariate(s)

t = observed treatment assignment (usually non-random)

z = unobserved explanatory variable(s)

Fundamental PS Theorem

Joint distribution of x and t given p:

Pr( x, t | p ) ≡ Pr( x | p ) Pr( t | x, p )
 = Pr( x | p ) Pr( t | x )
 = Pr( x | p ) times p or (1 − p)
 = Pr( x | p ) Pr( t | p )

…i.e. x and t are conditionally independent given the propensity for new treatment, p = Pr( t = 1 | x ).

Conditioning (patient matching) on Propensity Scores implies both…

Balance: local X-covariate distributions must be the same for both treatments

and

Imbalance: Unequal local treatment fractions

unless Pr( t | p ) = p = 1 − p = 0.5
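A minimal sketch of the balance / imbalance claim, using exact arithmetic on a made-up population. The four covariate patterns, their probabilities, and their propensities are illustrative assumptions, not data from any study. Within each propensity stratum the covariate distribution comes out identical for treated and untreated patients, while the treated fraction equals p rather than 0.5.

```python
from fractions import Fraction as F

# Hypothetical toy population (all numbers are illustrative assumptions):
# Pr(x) for four covariate patterns, and p(x) = Pr(t = 1 | x).
pr_x = {"a": F(1, 10), "b": F(3, 10), "c": F(2, 10), "d": F(4, 10)}
ps = {"a": F(1, 5), "b": F(1, 5), "c": F(7, 10), "d": F(7, 10)}


def x_given_t_and_p(t, p):
    """Pr(x | t, p): covariate distribution among patients in arm t with propensity p."""
    joint = {
        x: pr_x[x] * (ps[x] if t == 1 else 1 - ps[x])
        for x in pr_x
        if ps[x] == p
    }
    total = sum(joint.values())
    return {x: w / total for x, w in joint.items()}


for p in sorted(set(ps.values())):
    treated = x_given_t_and_p(1, p)
    untreated = x_given_t_and_p(0, p)
    # Balance: both arms share one covariate distribution within the stratum.
    # Imbalance: the treated fraction within the stratum is p, not 1/2.
    print("p =", p, "| balanced:", treated == untreated, "| treated fraction:", p)
```

Patterns "a" and "b" differ in x but share p = 1/5, so matching on p mixes them; that is the sense in which the propensity score is a coarse balancing score.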

Constant PS Estimate Calipers from Discrete Choice (Logit or Probit) Model

[Figure: axes x1, x2, x3; planes on which a linear functional of x is constant bound an “Infinite 3-D Slab”]

Pr( x, t | p ) = Pr( x | p ) Pr( t | p )

The unknown true propensity score is the “most coarse” possible balancing score.

The known x-vector itself is the “most detailed” balancing score…

Pr( x, t ) = Pr( x ) Pr( t | x )

What is LESS “coarse” than Pr( x, t | p ) = Pr( x | p ) Pr( t | p ) ?

But LESS “detailed” than Pr( x, t ) = Pr( x ) Pr( t | x ) ?

Conditioning upon Cluster Membership is intuitively somewhere between the two PS extremes in the limit as individual clusters become numerous, small and compact…

Pr( x, t | C ) = Pr( x | C ) Pr( t | x, C )
 = Pr( x | C ) Pr( t | x ) for x ∈ C
 ≈ constant × Pr( t | C )

Unsupervised: No PS Estimates Needed

[Figure: axes x1, x2, x3; 3-D Clusters (Informative or Uninformative)]

Nested ANOVA

Source                     Degrees-of-Freedom                      Interpretation
Clusters (Subgroups)       C = Number of Clusters                  Local Average Treatment Effects (LATEs) are Cluster Means
Treatment within Cluster   Number of “Informative” Clusters ≤ C    Local Treatment Differences (LTDs)
Error                      Number of Patients − 2C                 Uncertainty

Although a NESTED model can be (technically) WRONG, it is sufficiently versatile to almost always be USEFUL as the number of “clusters” increases.
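The degrees-of-freedom bookkeeping for the nested ANOVA can be sketched as below. The function name and the (treated, untreated) count layout are hypothetical. The sketch assumes a cluster is “informative” when it contains at least one treated and one untreated patient, and it computes the error row as patients minus fitted cell means, which reduces to Number of Patients − 2C when every cluster is informative.

```python
def nested_anova_df(counts):
    """Degrees of freedom for a treatment-within-cluster ANOVA.

    counts: list of (n_treated, n_untreated) pairs, one pair per cluster.
    """
    n_clusters = len(counts)
    # A cluster supports a local treatment difference only if both arms are present.
    informative = sum(1 for n1, n0 in counts if n1 > 0 and n0 > 0)
    n_patients = sum(n1 + n0 for n1, n0 in counts)
    return {
        "clusters": n_clusters,
        "treatment_within_cluster": informative,
        "error": n_patients - n_clusters - informative,
    }


# Three hypothetical clusters; the second has no untreated patients.
print(nested_anova_df([(3, 2), (4, 0), (1, 5)]))
```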


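Multiplicative “Shrinkage” Model

\[ E(\text{observed } Y_{1i}) = E(\delta_i Y_{1i}) = p_i \, E(Y_{1i}) \]

where $\delta_i = 1$ if $Y_{1i}$ ("treated") is observed; 0, otherwise,

and $\delta_i$ is statistically independent of $Y_{1i}$.

Propensity Score $= \Pr(\delta_i = 1) = p_i > 0$ and $< 1$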
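The shrinkage step can be spelled out as a short derivation. The symbols are assumptions chosen for illustration: $\delta_i$ is the indicator that the treated outcome $Y_{1i}$ is observed, and $p_i = \Pr(\delta_i = 1)$. Independence lets the expectation factor, and dividing by $p_i$ undoes the shrinkage, which is the usual motivation for inverse probability weighting.

```latex
\begin{align*}
E(\delta_i Y_{1i}) &= E(\delta_i)\, E(Y_{1i})
    && \text{(independence of } \delta_i \text{ and } Y_{1i}\text{)} \\
  &= p_i \, E(Y_{1i})
    && \text{(} E(\delta_i) = \Pr(\delta_i = 1) = p_i \text{)} \\
\Rightarrow \quad E\!\left( \frac{\delta_i Y_{1i}}{p_i} \right) &= E(Y_{1i})
    && \text{(requires } p_i > 0 \text{)}
\end{align*}
```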

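Nested ANOVA Treatment Difference within ith Cluster:

\[ \sum_{\text{treated}} \frac{y}{n_{1i}} \; - \sum_{\text{untreated}} \frac{y}{n_{0i}} \]

i.e. each outcome y in the ith cluster is weighted by $1/n_{1i}$ for a treated patient and by $-1/n_{0i}$ for an untreated patient, where

$n_{1i}$ = Number of treated patients in ith cluster > 0, with $n_{1i} = n_i \hat{p}_i$

$n_{0i}$ = Number of untreated patients in ith cluster > 0, with $n_{0i} = n_i (1 - \hat{p}_i)$

$\hat{p}_i = n_{1i} / (n_{1i} + n_{0i}) = n_{1i} / n_i$

Local Treatment Imbalance!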
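A minimal sketch of the within-cluster calculation, assuming cluster labels have already been assigned by some clustering step. The function name and the (cluster, t, y) record layout are hypothetical. For each informative cluster it reports the cluster size, the local treated fraction, and the local treatment difference (treated mean minus untreated mean).

```python
from statistics import mean


def local_treatment_differences(records):
    """records: iterable of (cluster_id, t, y) with t = 1 treated, t = 0 untreated."""
    by_cluster = {}
    for cluster, t, y in records:
        by_cluster.setdefault(cluster, {0: [], 1: []})[t].append(y)

    results = {}
    for cluster, arms in by_cluster.items():
        n1, n0 = len(arms[1]), len(arms[0])
        if n1 == 0 or n0 == 0:
            continue  # uninformative cluster: no local comparison is possible
        results[cluster] = {
            "n": n1 + n0,
            "p_hat": n1 / (n1 + n0),  # local treated fraction
            "ltd": mean(arms[1]) - mean(arms[0]),  # local treatment difference
        }
    return results


# Hypothetical data: cluster "C" has treated patients only, so it is skipped.
data = [
    ("A", 1, 5.0), ("A", 1, 7.0), ("A", 0, 4.0),
    ("B", 1, 3.0), ("B", 0, 1.0), ("B", 0, 2.0), ("B", 0, 3.0),
    ("C", 1, 9.0),
]
ltds = local_treatment_differences(data)
print(ltds)
```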

The “statistical methodology” engine ideal for making fair treatment comparisons is:

Cluster Analysis (Unsupervised Learning) plus Nested ANOVA

…i.e. not Generalized Linear Models and their Nonlinear extensions.

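Inverse Probability Weighting (IPW) for CA models:

\[ E(y_i \mid x_i) = \beta x_i \]

\[ V(y_i \mid x_i) = \sigma^2 p_i \]

\[ \hat{\beta} = \left( \sum_i \frac{y_i x_i}{p_i} \right) \Big/ \left( \sum_i \frac{x_i x_i}{p_i} \right) \]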
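A sketch of inverse-probability-weighted least squares for the simplest case: one covariate, no intercept, and weights 1/p. This model form is an assumption made for illustration, and the function name and inputs are hypothetical. Here p is whichever selection probability applies to each patient.

```python
def ipw_slope(x, y, p):
    """Weighted least-squares slope through the origin with weights 1 / p."""
    if not all(0 < pi < 1 for pi in p):
        raise ValueError("each propensity must lie strictly between 0 and 1")
    numerator = sum(xi * yi / pi for xi, yi, pi in zip(x, y, p))
    denominator = sum(xi * xi / pi for xi, pi in zip(x, p))
    return numerator / denominator


# Hypothetical data lying exactly on y = 2x, so the weights cannot move the slope.
slope = ipw_slope([1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [0.2, 0.5, 0.8])
print(slope)
```

When the outcomes do not lie on a single line, patients with small p receive large weights, so a few observations can dominate the estimate.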

The “Local Control” Philosophy:

• y = Outcome comparisons among patients with the most similar X characteristics are most relevant

• Robust, Nested Treatment-within-Cluster ANOVA

• Systematically form, compare, subdivide & recombine subgroups (clusters) …built-in sensitivity

• Non-parametric Distribution of Observed Local Treatment Differences (LTDs) …no prior distribution!

• Main Effect of Treatment is Mean of CDF formed by combining LTD estimates weighted by Cluster Size

• Only when Combined CDF suggests Differential Response: Which patient characteristics predict What?
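The combining step in the list above can be sketched as follows: given local treatment differences and cluster sizes, form the cluster-size-weighted empirical CDF and take its mean as the main effect. The function name and the input values are hypothetical.

```python
def ltd_distribution(ltds):
    """ltds: list of (ltd, cluster_size) pairs for informative clusters.

    Returns (main_effect, cdf), where cdf is a list of (ltd, cumulative weight).
    """
    total = sum(size for _, size in ltds)
    # Mean of the weighted distribution = cluster-size-weighted average LTD.
    main_effect = sum(ltd * size for ltd, size in ltds) / total

    cdf = []
    cumulative = 0
    for ltd, size in sorted(ltds):
        cumulative += size
        cdf.append((ltd, cumulative / total))
    return main_effect, cdf


# Hypothetical LTD estimates from three clusters of sizes 3, 4 and 3.
main_effect, cdf = ltd_distribution([(2.0, 3), (1.0, 4), (-0.5, 3)])
print(main_effect)
print(cdf)
```

A wide spread in the CDF, as opposed to a single tight step, is the signal that would prompt the final question in the list: which patient characteristics predict the differences.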

Credibility…

• Conflicts of Interest between Pharmaceutical Industry, Regulators and Data Custodians / Analysts

• Why should industry pay BIG $$$ for observational studies when poor / naïve analyses of biased data can create perceived needs for even more expensive RCTs?

[Figure: pieces labeled FDA; MC Research; CRO & Academic Research; Managed Care, CMS & VA; Pharma Industry; PUBLIC]

The pieces don’t fit together very well in the USA!

Aprotinin Case Study…

• Attack in early 2006 by a US MD who got some very sloppy analyses of international patient registry data published in NEJM

• Bayer (Germany) commissioned gigantic admin claims analysis by the research arm of their major US payer in mid 2006

• MC researcher emailed a flawed, highly unfavorable analysis to Germany 8 days before 2006 US advisory board meeting

Drug warnings fall flat

Bayer hides bad news; a researcher doesn't, and takes heat.

KRIS HUNDLEY, St. Petersburg Times, August 5, 2007

Dr. Thomas Kelly, a heart surgeon for 30 years, …routinely uses Trasylol on repeat open-heart patients or people on blood thinners.

"Bleeding is a tremendous problem," Kelly said. "In certain populations, there is much less need for transfusions with Trasylol. The alternatives are not nearly as effective."

"This drug is used on high-risk people; that's why there's a higher incidence of death," the surgeon said. "I think a terrible disservice has been done to a very helpful drug."

Though he thinks the recent studies "unfairly impugned" Trasylol, Kelly said he is using the drug more selectively and reading all the research available on the topic.

[Figure: pieces labeled FDA; MC Data; CRO & Academic Research; Managed Care, CMS & VA; Pharma Industry; PUBLIC; Unbiased Arbitration]

Why should Pharma TRUST the other Players?

What constitutes a BENEFIT ???

When a treatment is approved only for patients with high disease severity or clear vulnerability / frailty, there appear to be two possible “standards.”

Treated patients have better outcomes than untreated patients with same risk

or

Treated patients have better outcomes than untreated patients with high risk

References

Bang H, Robins JM. Doubly Robust Estimation in Missing Data and Causal Inference Models. Biometrics 2005; 61: 962-972.

Fraley C, Raftery AE. Model based clustering, discriminant analysis and density estimation. JASA 2002; 97: 611-631.

Imbens GW, Angrist JD. Identification and Estimation of Local Average Treatment Effects. Econometrica 1994; 62: 467-475.

McClellan M, McNeil BJ, Newhouse JP. Does More Intensive Treatment of Myocardial Infarction in the Elderly Reduce Mortality?: Analysis Using Instrumental Variables. JAMA 1994; 272: 859-866.

McEntegart D. The Pursuit of Balance Using Stratified and Dynamic Randomization Techniques: An Overview. Drug Information Journal 2003; 37: 293-308.

References …concluded

Obenchain RL. USPS package: Unsupervised and Supervised Propensity Scoring in R. Version 1.1-0. www.r-project.org August 2007.

Obenchain RL. Unsupervised Propensity Scoring: NN and IV Plots. 2004 Proceedings of the JSM.

Robins JM, Hernan MA, Brumback B. Marginal Structural Models and Causal Inference in Epidemiology. Epidemiology 2000; 11: 550-560.

Rosenbaum PR, Rubin DB. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 1983; 70: 41-55.

Rosenbaum PR. Observational Studies, Second Edition. 2002. New York: Springer-Verlag.
