Propensity Score Analyses: A good looking cousin of an RCT
KCASUG Q1: March 4, 2010
Kevin Kennedy, MS, Saint Luke's Hospital, Kansas City, MO
John House, MS, Saint Luke's Hospital, Kansas City, MO
Phil Jones, MS, Saint Luke's Hospital, Kansas City, MO
Motivation
• Estimating treatment effects is important!
  – Is drug "A" better than placebo?
  – Do same-sex classes increase academic performance?
  – Do titanium golf clubs increase the distance of drives?
• Ways of answering these questions should be:
  – Ethical
  – Practical
  – Cost-effective
The Gold Standard
• Randomized controlled trials (RCTs)
  – Subjects are randomized to treatment groups (essentially a coin flip determines the group)
  – On average, all subject characteristics will be balanced between groups
                          Treatment   Control    P-value
                          (n=100)     (n=100)
Age                       57±3.2      57±3.1     .78
Male                      57%         58%        .65
History of diabetes       22%         22%        .99
History of heart failure  8%          9%         .75
Benefits of an RCT
• A clean link between treatment and outcome
  – Random allocation of subjects removes the possibility that a third factor is associated with both treatment and outcome
• Subjects and researchers can be blinded to treatment allocation
Potential Caveats with an RCT
• Ethical issues:
  – Withholding a treatment that is generally thought to improve outcomes is often considered unethical
• Practical issues:
  – Problems with recruiting subjects
    • Consenting to "alternatives", and substantial drop-out
  – Cost and time issues:
    • Enrolling subjects, training staff, designing the trial, treatment
  – May be "too" controlled:
    • Specific subject criteria and treatment use
    • The population may not represent the "real-world" experience
Spaar A, Frey M, Turk A, Karrer W, Puhan MA. Recruitment barriers in a randomized controlled trial from the physicians' perspective: a postal survey. BMC Med Res Methodol. 2009 Mar 2;9:14
So…what now?
• Observational data are popular
  – Treatment is not assigned by randomization, only observed
  – Unfortunately… subject characteristics will likely not be balanced
                          Treatment   Control    P-value
                          (n=100)     (n=100)
Age                       57±3.2      62±5       .031
Male                      57%         42%        .047
History of diabetes       22%         30%        <.001
History of heart failure  8%          15%        .035
So…what now?
• We need to account for the differences between treatment and control
  – It is common in modeling to "adjust" away differences between groups
• However, sample size constraints restrict the number of variables we can adjust for
• Solution: propensity scores
Propensity Score Outline
I. Introduction
II. How to use the score
  i. Matching
  ii. Stratifying
III. Assessing balance
  i. Standardized difference
IV. Propensity scores using SAS
V. Concluding remarks
  i. Other uses
  ii. Issues with publications
Introduction
• Definition:
  – Propensity score (PS): the conditional probability of being treated given the individual's covariates
  – Notation:
$$e(x_i) = \Pr(Z_i = 1 \mid X_i = x_i)$$
    where Z_i = 1 if subject i is treated and 0 if control, and x_i are the observed covariates
  – The PS can be estimated with an ordinary logistic regression model that predicts treatment from the covariates that need to be balanced
  – It will be used to balance characteristics between groups
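The definition above can be made concrete with a minimal sketch. It is written in Python with the standard library only; the deck's own code is SAS, where PROC LOGISTIC does this fit. The sketch fits the logistic model for e(x) by Newton-Raphson, then returns each subject's PS and its logit. The simulated data and the helper names (`solve`, `fit_propensity`, `propensity`, `logit`) are hypothetical and exist only for illustration.

```python
import math
import random

def solve(A, b):
    """Solve the small linear system A x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def fit_propensity(X, z, iters=25):
    """Logistic regression of treatment z (0/1) on covariates X by Newton-Raphson.
    Returns [intercept, b1, ..., bk]."""
    rows = [[1.0] + list(x) for x in X]  # prepend an intercept column
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(iters):
        p = [1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, r)))) for r in rows]
        # Score vector X'(z - p) and information matrix X'WX with W = p(1 - p)
        grad = [sum(r[j] * (zi - pi) for r, zi, pi in zip(rows, z, p)) for j in range(k)]
        info = [[sum(r[j] * r[m] * pi * (1 - pi) for r, pi in zip(rows, p))
                 for m in range(k)] for j in range(k)]
        step = solve(info, grad)  # Newton step
        beta = [b + s for b, s in zip(beta, step)]
    return beta

def propensity(beta, x):
    """e(x) = P(Z = 1 | X = x) under the fitted logistic model."""
    lin = beta[0] + sum(b * v for b, v in zip(beta[1:], x))
    return 1.0 / (1.0 + math.exp(-lin))

def logit(e):
    """Logit of a propensity score: log[e / (1 - e)]."""
    return math.log(e / (1.0 - e))

# Hypothetical simulated data: older patients and diabetics are less likely to be treated.
random.seed(1)
X, z = [], []
for _ in range(500):
    age = random.gauss(60, 5)
    diabetes = 1 if random.random() < 0.25 else 0
    true_p = 1.0 / (1.0 + math.exp(-(-0.1 * (age - 60) - 0.5 * diabetes)))
    X.append([age, diabetes])
    z.append(1 if random.random() < true_p else 0)

beta = fit_propensity(X, z)
ps = [propensity(beta, x) for x in X]  # one PS per subject
```

Because the model has an intercept, the fitted scores average exactly to the observed proportion treated at convergence. This makes a quick sanity check on any PS fit.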
Introduction
                          Treatment   Control    P-value
                          (n=100)     (n=100)
Age                       57±3.2      62±5       .031
Male                      57%         42%        .047
History of diabetes       22%         30%        <.001
History of heart failure  8%          15%        .035
Here we would develop a PS for being in the treatment group, conditioned on age, gender, diabetes history, and heart failure history.
Introduction: why is it important?
• Important: for a specific value of the PS, the difference between treatment and control is an unbiased estimate of the average treatment effect at that PS, provided treatment assignment is strongly ignorable (no unmeasured confounders) (Rosenbaum & Rubin, 1983; Theorem 4)
• A "quasi-randomized" experiment
  – Take 2 subjects (one treated, one control) with the same PS; you could "imagine" these 2 subjects were "randomly" assigned to each group, since they were equally likely to be treated
Rosenbaum PR, Rubin DB. The central role of the propensity score in observational studies for causal effects. Biometrika. 1983;70:41–55.
Introduction
• It's not just a side analysis anymore……
[Figure: # of publications with PS in a PubMed search, by year, 2000 to 2009; y-axis 0 to 500]
Ways to Use the PS
• Common strategies include:
  – Matching
    • Match treated and control subjects on the PS
  – Stratification
    • Keep all subjects but analyze within strata (usually quintiles of the PS)
  – Regression adjustment
Matching
• The most common use of PS analyses
  – Since the PS is a single scalar quantity, matching is comparatively easy (as opposed to matching on age, gender, history, etc.)
  – Matching 1 control to 1 treated subject makes for an easily understood analysis
  – It is common to match on the logit of the PS, since it is approximately normally distributed:
$$L(x) = \log\left[\frac{e(x)}{1 - e(x)}\right]$$
Matching
• Nearest-neighbor matching (without replacement)
  – Randomly order the treated and control subjects
  – Take the first treated subject and find the control with the closest propensity score; remove both from the list
  – Move to the second treated subject and find the control with the closest PS… continue until you run out of treated patients
• This creates a 1:1 match of treated and control patients
  – Note: methods also exist for 1:many matching
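The steps above can be sketched as a greedy 1:1 matcher in Python. This is a minimal sketch, not the gmatch macro's implementation. It assumes each subject is a dictionary entry mapping a hypothetical id to its score, for example the logit of the PS. It also accepts an optional caliper, a maximum allowed distance; when the nearest available control is farther than the caliper, the treated subject goes unmatched and that control stays available.

```python
import random

def greedy_match(treated, controls, caliper=None, seed=0):
    """Greedy 1:1 nearest-neighbor matching without replacement.

    treated, controls: dicts mapping subject id -> score (e.g. logit of the PS).
    caliper: optional maximum allowed |difference|; None means no caliper.
    Returns a list of (treated_id, control_id) pairs.
    """
    rng = random.Random(seed)
    t_ids = list(treated)
    rng.shuffle(t_ids)          # randomly order the treated subjects
    available = list(controls)
    rng.shuffle(available)      # random control order decides ties
    pairs = []
    for t in t_ids:
        if not available:
            break               # ran out of controls
        best = min(available, key=lambda c: abs(controls[c] - treated[t]))
        if caliper is not None and abs(controls[best] - treated[t]) > caliper:
            continue            # nearest control is too far: this treated subject is dropped
        pairs.append((t, best))
        available.remove(best)  # without replacement
    return pairs

# Usage with made-up scores:
print(sorted(greedy_match({"t1": 0.0, "t2": 1.0}, {"c1": 0.1, "c2": 0.9, "c3": 5.0})))
```

Greedy matching is order-dependent, which is why the treated and control subjects are shuffled first. An optimal matcher would minimize the total distance instead.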
Matching
• Problem: the "nearest" neighbor may not be that "near"
• We may want to enforce a caliper width for acceptable matches
  – E.g., if there is no control within the caliper of a case, then no match occurs and the case is removed
• It is common in the literature to use 0.2 × stddev[L(x)] as the caliper
• For a matching macro, see: mayoresearch.mayo.edu/biostat/upload/gmatch.sas
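The 0.2 × SD rule can be checked with a small Python sketch. It computes the caliper from the logits of all subjects pooled, which is how the later SAS example runs PROC MEANS on the full scored data set. It then tests whether a treated/control pair falls within that width. The function names and the example scores are hypothetical.

```python
import math
import statistics

def logit(e):
    """Logit of a propensity score: log[e / (1 - e)]."""
    return math.log(e / (1 - e))

def caliper_width(propensity_scores, fraction=0.2):
    """Caliper = fraction * SD of the logit of the PS (all subjects pooled)."""
    return fraction * statistics.stdev([logit(e) for e in propensity_scores])

def within_caliper(e_treated, e_control, caliper):
    """True if the two subjects' logit-PS values differ by no more than the caliper."""
    return abs(logit(e_treated) - logit(e_control)) <= caliper

c = caliper_width([0.1, 0.2, 0.3, 0.4, 0.5])
print(round(c, 3))  # about 0.172 for these made-up scores
```

Dividing the SD by 5, as the SAS code does with `logit/5`, is the same as multiplying it by 0.2.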
Matching: Ideal Scenario
• Before match:
                          Treatment   Control    P-value
                          (n=543)     (n=1598)
Age                       57±3.2      62±5       .031
Male                      57%         42%        .047
History of diabetes       22%         30%        <.001
History of heart failure  8%          15%        .035
• After match:
                          Treatment   Control    P-value
                          (n=500)     (n=500)
Age                       57±3.2      57.3±3     .45
Male                      57%         57%        .88
History of diabetes       22%         23%        .48
History of heart failure  8%          7%         .77
Stratification
• Matching will inevitably result in a smaller dataset
• Stratifying analyses on the PS keeps all the data
  – Create the PS
  – Cut the PS into equal-sized groups (quartiles, quintiles)
    • Rosenbaum & Rubin (1983) report that quintile strata remove about 90% of the bias
  – Conduct the analyses within these strata
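A minimal Python sketch of these steps follows. It cuts the PS into k equal-sized strata by rank and compares mean outcomes within each stratum, assuming a continuous outcome. The within-stratum differences are then averaged. A simple average is used because the strata are equal-sized; that choice and all names here are illustrative assumptions, not the deck's method.

```python
def ps_strata(ps, k=5):
    """Assign each subject to one of k equal-sized strata by rank of its PS (1 = lowest)."""
    order = sorted(range(len(ps)), key=lambda i: ps[i])
    strata = [0] * len(ps)
    for rank, i in enumerate(order):
        strata[i] = rank * k // len(ps) + 1
    return strata

def stratified_difference(ps, treated, outcome, k=5):
    """Mean outcome difference (treated - control) within each PS stratum, plus their average."""
    strata = ps_strata(ps, k)
    diffs = []
    for s in range(1, k + 1):
        t = [y for y, g, st in zip(outcome, treated, strata) if st == s and g == 1]
        c = [y for y, g, st in zip(outcome, treated, strata) if st == s and g == 0]
        if t and c:  # a stratum needs both groups to yield an estimate
            diffs.append(sum(t) / len(t) - sum(c) / len(c))
    return diffs, sum(diffs) / len(diffs)

# Usage with made-up data: treatment adds 2 to the outcome in every stratum.
ps = [i / 10 for i in range(10)]
treated = [0, 1] * 5
outcome = [2 * g for g in treated]
print(stratified_difference(ps, treated, outcome))
```

Strata that contain only treated or only control subjects are skipped. In real data such strata flag poor overlap between the groups.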
Example
• Comparison of angiography (vs. none) in elderly patients with chronic kidney disease (CKD)
  – Propensity score for receiving angiography
    • Based on demographics, history, and hospital characteristics
Chertow GM, Normand SL, McNeil BJ. "Renalism": inappropriately low rates of coronary angiography in elderly individuals with renal insufficiency. J Am Soc Nephrol. 2004 Sep;15(9):2462-8
Propensity quintile   Group      # of patients   1-year mortality   OR (95% CI)
(PS range)
1 (0-.06)             Angio      46              56.5%              1.02 (.56-1.84)
                      No angio   1307            56.2%
2 (.06-.16)           Angio      133             36.8%              .57 (.39-.82)
                      No angio   1221            50.7%
3 (.16-.30)           Angio      303             34.7%              .66 (.50-.86)
                      No angio   1051            44.7%
4 (.30-.54)           Angio      557             30.7%              .72 (.57-.90)
                      No angio   797             38.3%
5 (.54-1)             Angio      967             18.9%              .45 (.35-.59)
                      No angio   387             34.1%
Overall               Angio      2014            26.7%              .62 (.54-.70)
                      No angio   4780            47.4%
Covariate Adjustment
• This is the least recommended use
• Fit a model for the PS, then use that PS as an adjustment covariate in the model that evaluates the association between treatment and outcome
• Advantages over ordinary covariate adjustment:
  – A simpler final model
  – The PS model can include many more covariates
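Covariate adjustment can be sketched in a few lines of Python. The example assumes a continuous outcome and a linear outcome model; neither is specified on the slide. With a binary outcome, a logistic outcome model with treatment and PS as covariates would play the same role. The outcome is regressed on treatment and the PS, and the treatment coefficient is the PS-adjusted effect. The data are made up, and the names (`solve`, `ols`) are hypothetical helpers.

```python
def solve(A, b):
    """Solve the small linear system A x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def ols(X, y):
    """Least squares via the normal equations (X'X) b = X'y; X includes an intercept column."""
    k = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)

# Hypothetical data: outcome = 1 + 2 * treatment + 3 * PS (no noise, for checking)
ps = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
treat = [0, 1, 0, 0, 1, 1, 0, 1]
y = [1 + 2 * t + 3 * e for t, e in zip(treat, ps)]
design = [[1.0, t, e] for t, e in zip(treat, ps)]  # intercept, treatment, PS
b0, b_treat, b_ps = ols(design, y)
print(round(b_treat, 6))  # PS-adjusted treatment effect
```

The final model has only two covariates however many variables went into the PS. That is the "simpler final model" advantage listed above.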
Assessing Balance
• Remember: the main purpose of a PS is to balance characteristics between treated and control subjects… so how do we show success?
• P-values
  – A function of sample size
  – May be misleading for stratification or 1:many matching
• Standardized differences
  – Not a function of sample size
  – Can be used for stratification and 1:many matching
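Standardized Differences
• Formula: continuous variables
$$d = \frac{100\,(\bar{x}_{t} - \bar{x}_{c})}{\sqrt{\left(s_{t}^{2} + s_{c}^{2}\right)/2}}$$
• Formula: dichotomous variables
$$d = \frac{100\,(\hat{p}_{t} - \hat{p}_{c})}{\sqrt{\left[\hat{p}_{t}(1 - \hat{p}_{t}) + \hat{p}_{c}(1 - \hat{p}_{c})\right]/2}}$$
  where t = treatment and c = control
• For stratified analyses: compute d within each stratum and take the average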
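The two formulas and the stratified average translate directly into a minimal Python sketch. Raw values are passed in as lists: numeric values for continuous variables and 0/1 for dichotomous ones. The function names are hypothetical, and the stratified version takes a plain unweighted average, as the slide describes.

```python
import math
import statistics

def std_diff_continuous(x_t, x_c):
    """d = 100 * (mean_t - mean_c) / sqrt((s_t^2 + s_c^2) / 2)."""
    num = statistics.mean(x_t) - statistics.mean(x_c)
    return 100 * num / math.sqrt((statistics.variance(x_t) + statistics.variance(x_c)) / 2)

def std_diff_binary(x_t, x_c):
    """d = 100 * (p_t - p_c) / sqrt((p_t(1 - p_t) + p_c(1 - p_c)) / 2) for 0/1 data."""
    p_t, p_c = sum(x_t) / len(x_t), sum(x_c) / len(x_c)
    return 100 * (p_t - p_c) / math.sqrt((p_t * (1 - p_t) + p_c * (1 - p_c)) / 2)

def std_diff_stratified(strata, fn):
    """Average of within-stratum d values; strata is a list of (x_t, x_c) pairs."""
    ds = [fn(x_t, x_c) for x_t, x_c in strata]
    return sum(ds) / len(ds)

# Usage: 57% vs 42% male
male_t = [1] * 57 + [0] * 43
male_c = [1] * 42 + [0] * 58
print(round(std_diff_binary(male_t, male_c), 1))
```

Unlike a p-value, d does not change when each group's data are duplicated. That is the sense in which it is "not a function of sample size".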
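Standardized Differences
• Sample calculations for a 1:1 match (age):
• Before match (Treatment n=543: 57±3.2; Control n=1598: 62±5; p=.031):
$$d = \frac{100\,(57 - 62)}{\sqrt{(3.2^{2} + 5^{2})/2}} \approx -119 \quad (|d| \approx 119)$$
• After match (Treatment n=500: 57±3.2; Control n=500: 57.3±3; p=.45):
$$d = \frac{100\,(57 - 57.3)}{\sqrt{(3.2^{2} + 3^{2})/2}} \approx -9.7 \quad (|d| \approx 9.7)$$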
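The two sample calculations can be reproduced from the summary statistics alone (group means and SDs) with this short Python check. The helper name is hypothetical.

```python
import math

def std_diff_from_summary(mean_t, sd_t, mean_c, sd_c):
    """Standardized difference (in %) for a continuous variable from group means and SDs."""
    return 100 * (mean_t - mean_c) / math.sqrt((sd_t ** 2 + sd_c ** 2) / 2)

before = std_diff_from_summary(57, 3.2, 62, 5)   # age, before matching
after = std_diff_from_summary(57, 3.2, 57.3, 3)  # age, after matching
print(f"before: {before:.1f}, after: {after:.1f}")  # before: -119.1, after: -9.7
```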
Standardized Differences
• What value constitutes balance?
  – Peter Austin commonly states that absolute values less than 10 constitute balance between groups
  – The closer to 0, the better the balance
Propensity Analysis (Matching) Using SAS
• Simulated data
• Data specifics
  – N=5000 (~1000 Group 1, ~4000 Group 2)
                      Group1         Group2         P-value
                      (N=1011)       (N=3989)
Age                   59.4 ± 4.0     63.5 ± 4.0     < 0.001
Male_Gender           560 (55.4%)    2009 (50.4%)   0.004
History of Diabetes   689 (16.9%)    516 (21.4%)    < 0.001
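Example: Create PS
proc logistic data=dataset descending;
   /* list the other covariates to be balanced after diabetes */
   model group1 = age gender diabetes;
   /* pred  = predicted probability of being in Group 1 */
   /* logit = the same prediction on the logit scale    */
   output out=pred p=pred xbeta=logit;
run;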
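Example: Define Caliper
/* Creating a "caliper" of .2*stddev(logit) */
proc means data=pred stddev;
   var logit;
   output out=lstd;   /* default output: one row per statistic, identified by _STAT_ */
run;

data _null_;
   set lstd;
   if _stat_='STD' then do;
      call symputx('std', logit/5);   /* logit/5 = 0.2 * SD of the logit */
   end;
run;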
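Example: Perform Match
/* Match on the logit of the PS (mvars=logit, wts=1) within the caliper (dmaxk=&std), */
/* 1 control per case (ncontls=1); seedca/seedco seed the random ordering of cases   */
/* and controls                                                                      */
%gmatch(data=pred, group=group1, id=id, mvars=logit, wts=1, dmaxk=&std,
        ncontls=1, seedca=987896, seedco=425632, out=match);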
                      Group1         Group2         P-value
                      (N=858)        (N=858)
Age                   60.1 ± 3.6     60.17 ± 3.62   .678
Male_Gender           469 (54.66%)   478 (55.71%)   .662
History of Diabetes   261 (30.42%)   256 (29.84%)   .792
mayoresearch.mayo.edu/biostat/upload/gmatch.sas
Example: Assess Balance
(%std_diff is a user-written macro that is not shown; list further covariates in continuous= and binary= as needed)
• Original data
%std_diff(data=fulldata, group=group1, continuous=age, binary=male diabetes, out=before)
• Matched data
%std_diff(data=matched_data, group=group1, continuous=age, binary=male diabetes, out=after)
• Combine
data after;
   set after(rename=(stddiff=after_stddiff));
run;

proc sql;
   create table both as
   select a.*, b.after_stddiff
   from before as a join after as b
   on a.variable=b.variable;
quit;
Example: Assess Balance
Variable   Label      Std diff before   Std diff after
V1         Age        99.65             .3
V2         Gender     9.22              .45
V3         Diabetes   15.9              3.3
…          …          …                 …
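proc gplot data=both;
   title 'Standardized difference plot';
   /* axis1, axis2 and legend1 refer to AXIS/LEGEND statements not shown here; */
   /* href=10 draws a reference line at d = 10                                 */
   plot label*stddiff=1 label*after_stddiff=2 / overlay vaxis=axis1 haxis=axis2
        href=10 legend=legend1 autovref chref=black lhref=3;
run;
quit;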
[Figure: Standardized difference plot, Before Match vs After Match; y-axis: variable label; x-axis: standardized difference, 0 to 120]
Hmm…a bit ugly: the variables are plotted in alphabetical order rather than by size of imbalance
Format macro
proc sort data=both; by stddiff; run;   /* sort by the before-match std diff */

/* attach formats to variables */
%macro doformat(data=);
  %local i numvar;
  data &data; set &data; count+1; run;  /* counter variable */
  proc sql noprint;                     /* read label names into &label */
    select label into :label separated by '*' from &data;
  quit;
  %let numvar=&sqlobs;                  /* count # of variables */
  proc format;                          /* format counter i with label i */
    value fmt
    %do i=1 %to &numvar; &i="%qscan(&label,&i,*)" %end;
    ;
  run;
  data &data; set &data; format count fmt.; run;
%mend doformat;

%doformat(data=both);
Assessing Balance
proc gplot data=both;
   title 'Standardized difference plot';
   plot count*stddiff=1 count*after_stddiff=2 / overlay vaxis=axis1 haxis=axis2
        href=10 legend=legend1 autovref chref=black lhref=3;
run;
quit;
Variable   Label      Std diff before   Std diff after   Count (formatted)
V1         Age        99.65             .3               Age
V3         Diabetes   15.9              3.3              Diabetes
V2         Gender     9.22              .45              Gender
…          …          …                 …                …
[Figure: Standardized difference plot with variables sorted by before-match standardized difference, Before Match vs After Match; x-axis: standardized difference, 0 to 120]
[Figure: Standardized difference plot for a larger set of clinical covariates (e.g., age, diabetes, hypertension, prior_MI, stemi, cardiogenic_shock), Before Match vs After Match; x-axis: standardized difference, 0 to 70]
Now What?
• All standardized differences are <10, indicating balance
• Now we can see whether group membership has an impact on our outcome
  – Caution: these are matched data, so statistically we need to account for the matching
    • Paired t-tests, McNemar's test, conditional logistic regression, stratified proportional hazards regression
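McNemar's test is one of the matched analyses listed above, for a binary outcome. Only discordant pairs enter it: b is the number of pairs where only the treated member had the event, and c the number where only the control member did. This minimal Python sketch computes the chi-square statistic with 1 df and its p-value. The function name and the example counts are hypothetical, and the continuity correction is optional.

```python
import math

def mcnemar(b, c, continuity=False):
    """McNemar's test for matched pairs with a binary outcome.
    b: pairs where only the treated member had the event;
    c: pairs where only the control member had the event.
    Returns (chi-square statistic with 1 df, p-value)."""
    num = (abs(b - c) - 1) ** 2 if continuity else (b - c) ** 2
    chi2 = num / (b + c)
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square(1) = P(|Z| > sqrt(chi2))
    return chi2, p

chi2, p = mcnemar(15, 5)
print(chi2, round(p, 4))
```

Treating the 858 + 858 matched subjects as two independent samples would ignore the pairing. Austin (2008) flags that as a common error in the literature.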
Other Uses…
• A way to show just how different 2 groups are…
[Figure: Distribution of propensity scores, Group 1 vs Group 2 (labeled CEA vs CAS in one version); y-axis: probability of Group 2 (probability of CAS), 0.0 to 1.0]
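The same comparison can be summarized numerically with a small Python sketch. It counts each group's propensity scores in ten bins over 0.0 to 1.0, matching the plot's axis, and reports the range over which the two groups' scores overlap. That overlap range is often called the region of common support; computing it is an addition for illustration, not something the slide shows. Function names and example scores are hypothetical.

```python
def ps_histogram(ps, nbins=10):
    """Count propensity scores in nbins equal-width bins over [0, 1]."""
    counts = [0] * nbins
    for e in ps:
        counts[min(int(e * nbins), nbins - 1)] += 1  # e == 1.0 goes in the last bin
    return counts

def overlap_range(ps_1, ps_2):
    """Range where the two groups' PS values overlap (None if they do not overlap)."""
    lo, hi = max(min(ps_1), min(ps_2)), min(max(ps_1), max(ps_2))
    return (lo, hi) if lo <= hi else None

group1 = [0.05, 0.15, 0.15, 0.35]
group2 = [0.30, 0.60, 0.95, 1.0]
print(ps_histogram(group1), ps_histogram(group2), overlap_range(group1, group2))
```

Little overlap means few treated subjects have comparable controls. Matching will then drop many subjects, and stratified estimates will rest on sparse strata.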
Concluding Remarks
• If you want more information: search for Ralph D'Agostino Jr. (Wake Forest) and Peter Austin (Univ. of Toronto)
• Introductory read:
  – D'Agostino RB Jr. Tutorial in biostatistics: propensity score methods for bias reduction in the comparison of a treatment to a non-randomized control group. Stat Med. 1998;17:2265-2281.
• 1:many matching:
  – Austin PC. Assessing balance in measured baseline covariates when using many-to-one matching on the propensity score. Pharmacoepidemiol Drug Saf. 2008;17:1218-1225.
Concluding Remarks…things to avoid
• Austin (2008) performed a literature review and found that many propensity score matching papers were done incorrectly
  – 47 articles from the medical literature that used propensity score matching were reviewed
    • Only 2 studies used standardized differences to assess the match (most relied on p-values)
    • Only 13 used correct statistical methods for matched data
    • See the paper for the common errors
  – Only 2 studies both assessed balance correctly and used correct statistical methods
Austin PC. A critical appraisal of propensity-score matching in the medical literature between 1996 and 2003. Stat Med. 2008 May 30;27(12):2037-49.
Concluding Remarks…things to avoid
• Austin's recommendations:
  1. The strategy for creating pairs should be stated explicitly, with an appropriate statistical citation
  2. The distribution of baseline characteristics between treated and control subjects should be described
  3. Differences in distributions should be assessed with methods not influenced by sample size
  4. Use appropriate statistical methods to account for the matching
     i. McNemar's test for binary data
     ii. Use of the STRATA statement in PROC LOGISTIC or PROC PHREG
What have we learned…if anything
1. RCTs may be the gold standard, but propensity scores are their attractive cousin
2. Using the PS can remove a lot of bias when estimating a treatment effect
3. You can match on, stratify by, or adjust for the PS
4. Use the standardized difference to determine balance (it is unaffected by sample size)
Contact Information
Name: Kevin Kennedy
Company: Mid America Heart Institute, St. Luke's Hospital
Address: 4401 Wornall Rd, Kansas City, MO
Email: kfk3388@gmail.com or kfkennedy@saint-lukes.org
SAS and all other SAS Institute Inc. product or service names are registered trademarks or
trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies.