Division of Pharmacoepidemiology and Pharmacoeconomics
Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School
Alternative approaches for confounding adjustment in observational studies using weighting based on the propensity score
Rishi J Desai, MS, PhD
Assistant Professor of Medicine
Division of Pharmacoepidemiology and Pharmacoeconomics
Brigham & Women’s Hospital/Harvard Medical School
[email protected] @Rishidesai11
Outline
• Confounding and propensity score (PS) basics
• PS weighting- some general principles
• PS weighting- selecting among alternatives
• Case example
• Summary
CONFOUNDING AND PROPENSITY SCORE BASICS
Sources of errors in observational studies
• Random error
• Systematic error
o Selection bias: related to the manner in which patients are recruited into the study or are retained during the course of the study
o Information bias: related to the manner in which information about important study variables (exposure or outcomes) is collected
o Confounding bias: the relationship between the exposure and outcome of interest is due, completely or in part, to another variable (the “confounder”)
Confounding
• Confounders: variables that simultaneously influence treatment selection and risk for the outcome of interest
[Diagram: potential confounders (e.g., age) point to both the exposure (Treatment A vs B) and the outcome of interest (e.g., mortality); label: conditioning on the propensity score]
Taxonomy of Confounding Control
[Figure: taxonomy of confounding control approaches, with a label marking those amenable to propensity score techniques]
Schneeweiss. Pharmacoepidemiol Drug Saf 2006; 15; 291-303
Propensity scores (PS)
• What
o It is the conditional probability of receiving a particular treatment given a
vector of observed covariates
P (Treatment) ~ age + gender + DM + HTN..
o Predicted probability- quantity between 0 and 1
o Depends on availability of measured patient characteristics
Rosenbaum & Rubin. Biometrika 1983; 70 (1): 41-55
Propensity scores (PS)
• Why
o PS offers a one-dimensional summary of multidimensional covariates, such
that when the propensity score is balanced across the treatment and
comparison groups, the distributions of all measured covariates are balanced in
expectation across the two groups
o Efficient confounding control
Rosenbaum & Rubin. Biometrika 1983; 70 (1): 41-55
Propensity scores (PS)
• How
o Build a model (logistic regression, most commonly) with treatment as the
dependent variable and measured patient characteristics as independent
variables with a goal of identifying patients’ likelihood of receiving a treatment
o Through modeling of available data on patient characteristics and actually
received treatments, researchers try to mimic prescriber’s decision process for
selecting a treatment for a particular patient
Rosenbaum & Rubin. Biometrika 1983; 70 (1): 41-55
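The modeling step above can be sketched in a few lines. This is a minimal illustration only, assuming a simulated toy cohort with two made-up covariates (standardized age and a diabetes indicator) and a plain gradient-ascent fit of the logistic model; the function names are hypothetical, and a real analysis would normally use an established statistical package.

```python
import math
import random

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def linear_predictor(beta, row):
    """Intercept plus the weighted sum of one patient's covariates."""
    return beta[0] + sum(b * x for b, x in zip(beta[1:], row))

def fit_ps_model(X, treated, n_iter=500, lr=1.0):
    """Fit a logistic regression of treatment on covariates by gradient ascent.

    Returns [intercept, coefficient_1, ..., coefficient_k].
    """
    n, k = len(X), len(X[0])
    beta = [0.0] * (k + 1)
    for _ in range(n_iter):
        grad = [0.0] * (k + 1)
        for row, t in zip(X, treated):
            resid = t - sigmoid(linear_predictor(beta, row))
            grad[0] += resid
            for j, x in enumerate(row):
                grad[j + 1] += resid * x
        # Step along the average gradient of the log-likelihood
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

def predict_ps(beta, X):
    """Predicted probability of treatment (the PS) for each patient."""
    return [sigmoid(linear_predictor(beta, row)) for row in X]

# Simulated toy cohort (assumption: not data from any real study)
random.seed(0)
X, treated = [], []
for _ in range(500):
    age = random.gauss(0, 1)                    # standardized age
    dm = 1.0 if random.random() < 0.3 else 0.0  # diabetes indicator
    p_true = sigmoid(-0.5 + 0.8 * age + 0.7 * dm)
    X.append([age, dm])
    treated.append(1 if random.random() < p_true else 0)

beta = fit_ps_model(X, treated)
ps = predict_ps(beta, X)
```

Each entry of `ps` is a predicted probability between 0 and 1, and it can only reflect the covariates that were measured and entered into the model.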
PS WEIGHTING- SOME GENERAL PRINCIPLES
Propensity scores (PS)
• Matching
o Find one (or more) reference patients for a treated patient based on proximity in the PS
• Stratification (subclassification)
o Classify patients into groups based on the PS distribution of the cohort (e.g., 5 groups based on PS quintiles)
o Treatment effect is computed within each subclass and an overall effect is derived as a weighted average of the subclass effects
• Regression adjustment
o Treatment effect is derived after adjusting for the PS in the outcome model as a continuous variable
• Weighting
o Use the PS to calculate weights, which create a weighted pseudopopulation in which treatment groups are balanced with respect to the distribution of all confounding variables
Properties of alternate PS adjustment approaches
Approaches compared (columns): Matching | Stratification (traditional) | Regression | Weighting
Properties compared (rows):
• Bias control*
• Maximize precision (keeps all observations in the analysis?)
• Transparent evaluation and reporting of balance
• Robust against phenomenon of increasing covariate imbalance after conditioning (“PSM paradox”)#
• Flexibility in targeting specific populations of interest
* True under non-exceptional circumstances when methods are tuned to the problem at hand and applied carefully
Stürmer et al. Am J Epidemiol 2005;161:891-8
Elze et al. J Am Coll Cardiol 2017;69(3):345-357
Vansteelandt & Daniel. Stat Med 2014;33:4053-4072
# King and Nielsen. Political Analysis 2019;27(4):435-454
Ripollone et al. Am J Epidemiol 2018;187(9):1951-1961
Target of inference (Estimand)
The patient population to which the estimated treatment effect applies
Key question to consider when deciding the target of inference for a specific study: would it be feasible
to treat all eligible patients included in the study with the treatment of interest?
• Yes: the target of inference might be defined as the average treatment effect (ATE)
o For instance, a study comparing the effectiveness of a newly approved treatment with an existing treatment for a certain condition when both treatments are indicated as exchangeable options, e.g., dabigatran vs warfarin for atrial fibrillation
• No: only patients with certain characteristics who actually received the treatment would be ideal candidates for treatment; then the target of inference might be defined as the average treatment effect among the treated population (ATT)
o For instance, a study evaluating the safety of antipsychotic drugs for pregnant women with schizophrenia/bipolar disorder. Not all pregnant patients might be considered for treatment due to an unclear teratogenicity profile; only women with greater severity of these conditions would be treated
• In some cases, the interest is in targeting the ATE only among a subset of patients with certain characteristics
leading to clinical equipoise, e.g., the average treatment effect in the overlap population (ATO)
• When there is no heterogeneity in treatment effect, ATE and ATT coincide
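The three estimands named above can be written compactly in potential-outcome notation. The notation is an assumption introduced here for illustration and does not appear on the slides: $Y(1)$ and $Y(0)$ are a patient's outcomes under treatment and under the reference, $T$ is the treatment indicator, and $e(X)$ is the propensity score.

```latex
\begin{align*}
\text{ATE} &= E\left[\,Y(1) - Y(0)\,\right] \\
\text{ATT} &= E\left[\,Y(1) - Y(0) \mid T = 1\,\right] \\
\text{ATO} &= \frac{E\left[\,e(X)\,\{1 - e(X)\}\,\{Y(1) - Y(0)\}\,\right]}
                   {E\left[\,e(X)\,\{1 - e(X)\}\,\right]}
\end{align*}
```

If the individual effect $Y(1) - Y(0)$ equals the same constant for every patient, each expression reduces to that constant, which is why the estimands coincide when there is no heterogeneity in treatment effect.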
PS WEIGHTING- SELECTING AMONG ALTERNATIVES
• Misspecification takes multiple forms- missing interaction between covariates, missing
an important covariate, inappropriate functional forms (for continuous covariates)
• A simple diagnostic step of checking covariate balance in weighted samples can alert
researchers to potential model misspecification
• In theory, alternative PS models (machine learning based, e.g., boosting, random forests,
neural networks; or balance targeting, e.g., covariate balancing PS) could be useful in
protecting against misspecification because they handle interactions and non-linearity
in continuous variables by default; in practice, evidence generally suggests that a
logistic regression model is adequate in most scenarios*
• The impact of model misspecification could vary across PS weighting approaches
* Wyss et al. Am J Epidemiol. 2014;180(6):645–655
Setoguchi et al. Pharmacoepidem Drug Saf 2008; 17: 546–555
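The balance diagnostic mentioned above is often summarized as a weighted standardized mean difference for each covariate. The sketch below is one possible formulation under stated assumptions: it uses the weighted variances of the two groups in the denominator (some analysts prefer unweighted variances), and the function names and toy values are hypothetical.

```python
import math

def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def weighted_var(values, weights):
    m = weighted_mean(values, weights)
    return sum(w * (v - m) ** 2 for v, w in zip(values, weights)) / sum(weights)

def standardized_mean_difference(x, treated, weights):
    """Weighted standardized difference in covariate x, treated minus reference."""
    xt = [v for v, t in zip(x, treated) if t == 1]
    wt = [w for w, t in zip(weights, treated) if t == 1]
    xr = [v for v, t in zip(x, treated) if t == 0]
    wr = [w for w, t in zip(weights, treated) if t == 0]
    pooled_sd = math.sqrt((weighted_var(xt, wt) + weighted_var(xr, wr)) / 2)
    return (weighted_mean(xt, wt) - weighted_mean(xr, wr)) / pooled_sd

# Toy example: age in three treated and three reference patients
age = [60, 70, 80, 50, 60, 70]
treated = [1, 1, 1, 0, 0, 0]

smd_unweighted = standardized_mean_difference(age, treated, [1] * 6)
# Illustrative weights that happen to equalize mean age across groups
smd_weighted = standardized_mean_difference(age, treated, [1, 1, 0, 0, 1, 1])
```

A standardized difference that stays large after weighting is a signal to revisit the PS model, for example by adding an interaction or a more flexible term for a continuous covariate.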
“Sufficient overlap”
• In most situations, investigators need to
make somewhat subjective decisions
• A simple recommendation is to
implement non-overlap trimming and
determine sufficiency of overlap based
on % of the sample excluded
[Figure: PS distributions by treatment group (x-axis: PS from 0 to 1; y-axis: %) in three scenarios. A: Extensive treatment selection. B: Moderate treatment selection. C: Little preference]
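The trimming recommendation above can be sketched as follows. This is a minimal version, assuming that "non-overlap" means PS values outside the range shared by both treatment groups; the function name and toy values are hypothetical.

```python
def trim_non_overlap(ps, treated):
    """Flag patients inside the PS range shared by both treatment groups.

    Returns (keep, pct_excluded): a list of booleans and the percentage
    of the sample that falls outside the region of overlap.
    """
    ps_treated = [p for p, t in zip(ps, treated) if t == 1]
    ps_reference = [p for p, t in zip(ps, treated) if t == 0]
    lower = max(min(ps_treated), min(ps_reference))
    upper = min(max(ps_treated), max(ps_reference))
    keep = [lower <= p <= upper for p in ps]
    pct_excluded = 100.0 * (len(ps) - sum(keep)) / len(ps)
    return keep, pct_excluded

# Toy example with poor overlap: half of the sample is excluded
ps = [0.05, 0.2, 0.4, 0.6, 0.3, 0.5, 0.7, 0.95]
treated = [0, 0, 0, 0, 1, 1, 1, 1]
keep, pct_excluded = trim_non_overlap(ps, treated)
```

A small excluded percentage supports proceeding with weighting; a large one suggests that the treatment groups are too different for a meaningful comparison in the full cohort.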
Inverse probability treatment weights (IPTW)
Weight calculation
Treated patients: $1 / \text{PS}$
Reference patients: $1 / (1 - \text{PS})$
Key features
• Clear target of inference (mimics target of inference in randomized trials)
• Easily extends to >2 treatment groups
• Easily extends to time-varying settings (marginal structural models)
Cautions
• The score is directly used to create weights, which commonly leads to unstable weights and a requirement for stabilization or truncation
• Generally has less robust performance in presence of PS model misspecification*
* Waernbaum Stat Med 2012;31:1572-81
Lee et al. PLoS One 2011;6:e18174
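A short sketch of the IPTW calculation, with the two remedies named under Cautions. The assumptions are labeled here and in the comments: stabilization uses the marginal probability of the received treatment as the numerator, and truncation caps weights at a nearest-rank upper percentile. Both are common choices rather than the only ones, and the function names are hypothetical.

```python
import math

def iptw(ps, treated, stabilized=False):
    """Inverse probability of treatment weights: 1/PS or 1/(1 - PS).

    With stabilized=True the numerator is the marginal probability of the
    treatment actually received (an assumed, commonly used stabilization).
    """
    p_treated = sum(treated) / len(treated)
    weights = []
    for p, t in zip(ps, treated):
        if t == 1:
            numerator = p_treated if stabilized else 1.0
            weights.append(numerator / p)
        else:
            numerator = (1.0 - p_treated) if stabilized else 1.0
            weights.append(numerator / (1.0 - p))
    return weights

def truncate_weights(weights, upper_pct=99):
    """Cap weights at the given upper percentile (nearest-rank definition)."""
    ranked = sorted(weights)
    k = max(0, math.ceil(upper_pct / 100 * len(ranked)) - 1)
    cap = ranked[k]
    return [min(w, cap) for w in weights]

ps = [0.9, 0.5, 0.1, 0.5, 0.2, 0.8]
treated = [1, 1, 1, 0, 0, 0]

w = iptw(ps, treated)                         # the treated patient with PS 0.1 gets weight 10
sw = iptw(ps, treated, stabilized=True)       # same patient, stabilized weight 5
w_capped = truncate_weights(w, upper_pct=80)  # largest weight is capped
```

The treated patient with a PS of 0.1 illustrates the instability: a single unexpected treatment decision counts as ten patients in the pseudopopulation.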
Fine stratification weights (ATE)
[Figure: original sample. Exposed and reference patients are grouped into strata 1 to 50, based on the PS distribution in the exposed]
Weight calculation
Treated patients: $\dfrac{N_{\text{total in PS stratum } i} / N_{\text{total}}}{N_{\text{exposed in PS stratum } i} / N_{\text{total exposed}}}$
Reference patients: $\dfrac{N_{\text{total in PS stratum } i} / N_{\text{total}}}{N_{\text{reference in PS stratum } i} / N_{\text{total reference}}}$
Desai et al. Epidemiology 2017;28:249-57.
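The two steps on this slide, creating strata from the PS distribution in the exposed and then computing ATE weights, can be sketched as below. This is an illustration under assumptions, not the authors' SAS implementation: cut points are taken at equally spaced ranks of the exposed patients' PS, the function names are hypothetical, and the toy example uses 2 strata rather than 50.

```python
from bisect import bisect_right
from collections import Counter

def assign_strata(ps, treated, n_strata=50):
    """Assign each patient to a stratum using cut points from the exposed PS distribution."""
    exposed_ps = sorted(p for p, t in zip(ps, treated) if t == 1)
    n = len(exposed_ps)
    # Interior cut points at equally spaced ranks among the exposed
    cuts = [exposed_ps[(i * n) // n_strata] for i in range(1, n_strata)]
    return [bisect_right(cuts, p) for p in ps]

def fine_stratification_ate_weights(strata, treated):
    """Stratum share of the whole cohort divided by stratum share of the patient's own group."""
    n_total = len(strata)
    n_exposed = sum(treated)
    n_reference = n_total - n_exposed
    total_in = Counter(strata)
    exposed_in = Counter(s for s, t in zip(strata, treated) if t == 1)
    reference_in = Counter(s for s, t in zip(strata, treated) if t == 0)
    weights = []
    for s, t in zip(strata, treated):
        share_total = total_in[s] / n_total
        if t == 1:
            weights.append(share_total / (exposed_in[s] / n_exposed))
        else:
            weights.append(share_total / (reference_in[s] / n_reference))
    return weights

# Toy cohort: 4 exposed and 6 reference patients, 2 strata for readability
ps = [0.2, 0.4, 0.6, 0.8, 0.1, 0.3, 0.3, 0.5, 0.7, 0.1]
treated = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
strata = assign_strata(ps, treated, n_strata=2)
weights = fine_stratification_ate_weights(strata, treated)
```

In this toy cohort the weights within each treatment group sum to that group's size, and after weighting both groups have 70% of their weight in the first stratum, matching the stratum's share of the whole cohort. A stratum containing only exposed or only reference patients cannot be balanced this way and would need separate handling.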
Fine stratification weights (ATE)
[Figure: weighted sample. The same strata 1 to 50, based on the PS distribution in the exposed, shown for exposed and reference patients after weighting]
Desai et al. Epidemiology 2017;28:249-57.
Fine stratification weights (ATE)
Key features
• Clear target of inference (mimics target of inference in randomized trials)
• “Semiparametric implementation of PS weighting”: PS is only used to stratify and not directly to weight, which makes this approach theoretically robust against PS model misspecification
• Extreme weights are less common
Cautions
• Does not readily extend to >2 treatment groups; the number of strata increases exponentially, leading to high variability
• Sparse strata could lead to unstable weights and a requirement for truncation
Fine stratification weights (ATT)
[Figure: original sample. Exposed and reference patients are grouped into strata 1 to 50, based on the PS distribution in the exposed]
Weight calculation
Treated patients: $1$
Reference patients: $\dfrac{N_{\text{exposed in PS stratum } i} / N_{\text{total exposed}}}{N_{\text{reference in PS stratum } i} / N_{\text{total reference}}}$
Desai et al. Epidemiology 2017;28:249-57.
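For the ATT version, only the reference patients are reweighted. The sketch below takes stratum labels as given (for example, created from the PS distribution in the exposed) and applies the weight formula on this slide; the function name and toy values are hypothetical.

```python
from collections import Counter

def fine_stratification_att_weights(strata, treated):
    """ATT weights: 1 for exposed patients; for reference patients, the stratum's
    share of the exposed divided by its share of the reference group."""
    n_exposed = sum(treated)
    n_reference = len(treated) - n_exposed
    exposed_in = Counter(s for s, t in zip(strata, treated) if t == 1)
    reference_in = Counter(s for s, t in zip(strata, treated) if t == 0)
    weights = []
    for s, t in zip(strata, treated):
        if t == 1:
            weights.append(1.0)
        else:
            weights.append((exposed_in[s] / n_exposed) / (reference_in[s] / n_reference))
    return weights

# Toy cohort: stratum labels for 4 exposed and 6 reference patients
strata = [0, 0, 1, 1, 0, 0, 0, 0, 1, 0]
treated = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
weights = fine_stratification_att_weights(strata, treated)
# Reference patients in stratum 0 get weight 0.6; the single reference patient in stratum 1 gets 3.0
```

The exposed group is left as observed, and the reference group is reshaped so that its distribution across strata matches that of the exposed. The single reference patient in stratum 1 stands in for three patients, which shows how sparse strata produce large weights.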
Fine stratification weights (ATT)
[Figure: weighted sample. The same strata 1 to 50, based on the PS distribution in the exposed, shown for exposed and reference patients after weighting]
Desai et al. Epidemiology 2017;28:249-57.
Fine stratification weights (ATT)
Key features
• Clear target of inference (mimics target of inference in randomized trials)
• “Semiparametric implementation of PS weighting”: PS is only used to stratify and not directly to weight, which makes this approach theoretically robust against PS model misspecification
• Extreme weights are less common
Cautions
• Does not readily extend to >2 treatment groups; the number of strata increases exponentially, leading to high variability
• Sparse strata could lead to unstable weights and a requirement for truncation
Standardized mortality ratio weighting (SMRW)
Weight calculation
Treated patients: $1$
Reference patients: $\text{PS} / (1 - \text{PS})$
Key features
• Clear target of inference
• Extends to >2 treatment groups
Cautions
• The score is directly used to create weights, which commonly leads to unstable weights and a requirement for stabilization or truncation
• Performance likely compromised in presence of PS model misspecification
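The SMRW formula translates directly into code. A minimal sketch with a hypothetical function name and toy PS values:

```python
def smr_weights(ps, treated):
    """SMR weights: 1 for treated patients, PS / (1 - PS) for reference patients."""
    return [1.0 if t == 1 else p / (1.0 - p) for p, t in zip(ps, treated)]

ps = [0.5, 0.8, 0.2, 0.5]
treated = [1, 1, 0, 0]
weights = smr_weights(ps, treated)
# The reference patient with PS 0.2 gets weight 0.25; the one with PS 0.5 gets weight 1.0
```

Reference patients who resemble the treated (high PS) receive large weights, which is where instability arises as the PS approaches 1.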
Matching weights
Weight calculation
Treated patients: $\min(\text{PS},\, 1 - \text{PS}) / \text{PS}$
Reference patients: $\min(\text{PS},\, 1 - \text{PS}) / (1 - \text{PS})$
Key features
• Weights bounded between 0 and 1; extreme weights impossible
• Extends to >2 treatment groups
Cautions
• Target of inference is variable: close to the ATE in the whole population when groups are equally sized and PS distributions have good overlap; close to the ATT in the smaller group when groups are unequally sized but PS distributions have good overlap.
• In circumstances of limited overlap in PS distribution, may lead to treatment effect estimation in a subpopulation that is not reflective of patients receiving the treatment of interest in routine care or of the whole study population.
Li & Greene. Int J Biostat 2013;9:215-34
Yoshida et al. Epidemiology 2017;28:387-95.
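A minimal sketch of the matching weight calculation, with a hypothetical function name and toy PS values, showing that no weight exceeds 1:

```python
def matching_weights(ps, treated):
    """Matching weights: min(PS, 1 - PS) divided by the probability of the received treatment."""
    weights = []
    for p, t in zip(ps, treated):
        numerator = min(p, 1.0 - p)
        denominator = p if t == 1 else 1.0 - p
        weights.append(numerator / denominator)
    return weights

ps = [0.9, 0.3, 0.9, 0.3]
treated = [1, 1, 0, 0]
weights = matching_weights(ps, treated)
# Treated patient with PS 0.9 is down-weighted to about 0.11;
# treated patient with PS 0.3 keeps a weight of 1
```

Patients who received the treatment that was less likely for them keep a weight of 1, while those who received the expected treatment are down-weighted, loosely mirroring which patients would find a partner in 1:1 matching.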
Overlap weights
Weight calculation
Treated patients: $1 - \text{PS}$
Reference patients: $\text{PS}$
Key features
• Weights bounded between 0 and 1; extreme weights impossible
• Extends to >2 treatment groups
Cautions
• Target of inference is variable: may lead to treatment effect estimation in a subpopulation that is not reflective of patients receiving the treatment of interest in routine care or of the whole study population
• For 2 treatment groups, OWs calculated from a logistic model yield exact covariate balance, which makes it difficult to use balance as a PS model diagnostic
Li et al. Am J Epidemiol 2019;188:250-7.
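The exact balance property follows from the score equations of logistic regression. The derivation below is a sketch in notation that is not on the slides: $T_i$ is the treatment indicator, $\hat{e}_i$ is the fitted PS, and $x_{ij}$ is covariate $j$ for patient $i$. It assumes a maximum likelihood fit with an intercept and with each covariate entered as a main effect.

```latex
\begin{align*}
&\text{Score equations at the fitted model, for every covariate } j
  \text{ (including the constant } x_{i0} = 1\text{):} \\
&\qquad \sum_i (T_i - \hat{e}_i)\, x_{ij} = 0 \\
&\text{Since } T_i - \hat{e}_i = T_i (1 - \hat{e}_i) - (1 - T_i)\, \hat{e}_i : \\
&\qquad \sum_i T_i (1 - \hat{e}_i)\, x_{ij} = \sum_i (1 - T_i)\, \hat{e}_i\, x_{ij} \\
&\text{Dividing by the } j = 0 \text{ case, } \sum_i T_i (1 - \hat{e}_i) = \sum_i (1 - T_i)\, \hat{e}_i : \\
&\qquad \frac{\sum_i T_i (1 - \hat{e}_i)\, x_{ij}}{\sum_i T_i (1 - \hat{e}_i)}
  = \frac{\sum_i (1 - T_i)\, \hat{e}_i\, x_{ij}}{\sum_i (1 - T_i)\, \hat{e}_i}
\end{align*}
```

The left side is the overlap-weighted mean of covariate $j$ among treated patients and the right side is the same quantity among reference patients, so the means agree exactly for every covariate in the model whether or not the model is correctly specified.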
CASE EXAMPLE
Study design
[Diagram: study timeline]
• Study period: October 2010 (dabigatran market entry) to October 2013
• Baseline: continuous enrollment for 6 months, Afib diagnosis, no warfarin or dabigatran use
• Index date: warfarin/dabigatran prescription
• Follow-up (as treated) for the outcome: stroke/systemic embolism
PS calculated with 72 covariates using a logistic regression model
Desai et al. Am J Epidemiol. 2018 Nov 1;187(11):2439-2448.
PS distributional overlap
• Trimming non-overlapping regions
of the PS distribution resulted in the
exclusion of only 10 patients, which
confirmed sufficient overlap in this cohort
• Bimodal distribution of warfarin
treated patients
• Warfarin patients in the first peak
down-weighted substantially
under all weighting approaches
except for the weights targeting
the ATE
Weight distribution
Truncated at 99th percentile
Table of select population characteristics
[Table: select population characteristics, with these annotations]
• Confounder distribution in warfarin group = distribution in dabigatran group
• Confounder distribution in warfarin group = distribution in dabigatran group = distribution in subset of the whole cohort
• Confounder distribution in warfarin group = distribution in dabigatran group = distribution in the whole cohort
Balance plot
Treatment effect estimates with respect to stroke
Summary
• Weighting based on the propensity score represents a valuable tool for
confounding adjustment in observational studies of treatment use and is
increasingly being used in epidemiological investigations
• Ideally, selection of the appropriate weighting approach should be driven by
the target of inference specific to each study question
• When applied carefully, all alternative approaches of confounding adjustment
are likely to work well under most circumstances
Resources
• SAS macros for PS fine stratification available on Harvard dataverse along with
simulated toy examples
https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/U8JLCW
• For R users- WeightIt by Noah Greifer supports most of these weighting
methods
https://cran.r-project.org/web/packages/WeightIt/WeightIt.pdf
Thank you
[email protected] @Rishidesai11