Upload
economic-research-forum
View
1.343
Download
4
Embed Size (px)
Citation preview
Propensity Score
Matching Methods
Day 1, Lecture 3
By Ragui Assaad
Training on Applied Micro-Econometrics and Public Policy Evaluation
July 25-27, 2016
Economic Research Forum
Type I Methods
• Assumption:
• Conditional exogeneity of placement
• i.e. placement into program does not depend on unobservables
affecting outcome of interest
• In that case Ti is not correlated with error term and is therefore not
endogenous
• Can simply use regression to get at ATE
• This is a fairly strong assumption that is unlikely to be met in most
cases.
Multiple Regression Approach• Assumes conditional exogeneity of placement (selection
on observables only)
• Run a multiple regression explaining the outcome (Y) as a
function of the Treatment (T) and other controls (X). The
coefficient of T is the average treatment effect.
• Can Differentiate effect of treatment across different
groups by interacting X and T
iiii XTY
iiiiii XTXTY
Problems with Regression Approach
• Besides the strong assumption of conditional exogeneity
of placement, linear regression makes other strong
parametric assumptions relating to how treatment and
outcome are related conditional on observables.
• We must assume a particular functional form, like linear,
log-linear, etc.
• We would prefer not to make such assumptions and use a
non-parametric approach.
• Regular regression also gives equal weight to all
comparison observations, irrespective of how likely they
are to participate in the program
Introduction to Propensity Score
Matching (PSM)• Method is used to find a comparison group of non-
participants with similar pre-intervention characteristics as
the treatment group
• Like regression, method assumes conditional exogeneity
of placement or “selection on observables only”
• Main Question: Which characteristics does one use and
what weight to put on each of them?
• Propensity score matching is a way of answering this
question
What are the advantages of PSM over
Regression?
• PSM does not assume a particular functional form for the
way the X’s affect Y and the way T affects Y
• It is referred to as a non-parametric method
• Regression uses the whole dataset to compare treatment
and controls, even if treatment and control observations
are very different in terms of the observables
• PSM only compares treatment and control observations
that are similar (along observable characteristics)
• We only keep control observations that "close" to the treatment
observations we want to match
Basic Theorem of PSM
• Rosenbaum and Rubin (1983) showed that instead of
having to match on a multitude of dimensions in a vector
of observable characteristics Z, it is only necessary to
match on a single dimension P(Z), which is the propensity
score.
• A treatment and control observation that are "close" in the
propensity score space, will also be "close" along the
various elements making up Z.
What is a propensity score?• A propensity score is given by:
• P(Z)=Pr(T=1 | Z) where Z is a vector of pre-exposure
characteristics
– Z can include the pre-treatment value of the outcome
– Treatment units are matched to comparison or control units with
similar values of P(Z)
• Impact estimates from Propensity Score Matching (PSM)
will depend on the variables that go into the equation
used to estimate propensity score and also on the
specification of that equation
• Can be estimated using probit or logit
Region of Common Support
Region of common support for propensity score between participants and non-participants must be large enough to find an adequate comparison group
Assumption Underlying PSM Methods
• Eliminating selection based on observables will reduce
overall bias
• This is the case if and only if the bias based on
unobservables goes in the same direction as bias based
on observables
• If they go in opposite direction, eliminating bias based on
observables will increase overall bias
• Possible in theory, but not likely in practice
What variables go in Z?
• Any variable that can affect program placement, including
the pre-intervention level of the outcome variable
• Can be obtained through interviews
• If one cannot find good observable variables to explain
who participates and who doesn’t, we will do a poor job
eliminating selection bias
PSM in practice
• Use predicted probabilities of participation from a probit or
logit model
• Matching is done based on the distance between each
treatment observation and comparison observations in its
neighborhood, using the propensity score as a metric of
distance.
• The outcome of each treatment is compared to that of its
matched comparison observations to detect whether the
treatment has a statistically significant effect
Matching Methods
• There are many ways to undertake the matching• Simplest method:
• One-to-one matching
• Match each treatment observation with “nearest” comparison observation
• Can impose a maximum distance, through a “caliper”
• Subject to potential outliers
• Addressing ties in propensity scores
• Nearest n-neighbors
• Uses the average of the nearest n-neighbors among comparison observations
• Can also impose a caliper
Matching Method• More complicated but probably sounder methods
• Kernel matching
• Kernel functions are symmetric functions around a particular point that
can be used to provide a weight that is inversely proportional to the
horizontal distance from that point
• The weight is the vertical height at each distance from the central point
• The weight typically goes to zero after a certain distance has been
reached (usually referred to as bandwidth)
Some Popular Kernel Function
0
0.2
0.4
0.6
0.8
1
1.2
-1 -0.5 0 0.5 1
Epanechnikov
0
0.2
0.4
0.6
0.8
1
1.2
-1 -0.5 0 0.5 1
Tricube
0
0.2
0.4
0.6
0.8
1
1.2
-1 -0.5 0 0.5 1
Uniform
0
0.2
0.4
0.6
0.8
1
-3 -2 -1 0 1 2 3
Exponential
Epanechnikov Tricube
Uniform Gaussian or Normal
PSM Estimator
• The simplest measure of the impact of program is
measured by:
Where YjC is the value for a single “nearest neighbor” from
the comparison group.
• If we are using multiple “nearest neighbors” we take a
weighted mean of all the “nearest neighbors” to form YjC
with the weights provided by the kernel function used
(generally declining with distance from observation j)
(1/ NT) (Y
j
T -Yj
C )j=1
NT
å
Notes on Estimating Propensity Score
• Although effect of treatment on outcome is non-
parametric, estimation of propensity score itself
depends on a parametric specification (usually
logit or probit)
• Specification of propensity score equation will
affect quality of matching and, therefore, final
result
• Must try out multiple specifications with lots of
interaction terms between the observable
characteristics to get a good fit
Testing how good the match is• Need to test the extent to which observable
characteristics are balanced in the matched sample, using
a t-test of of the difference in means of the covariates
across matched samples.
• None of the test should be statistically significant
• Should examine the effect of the matching method on the
area of common support
• The closer we try to make the match, the more likely that some
treatment observation won’t find matches, which reduce the
efficiency of the model
Standard Errors
• Because propensity scores are predicted, analytical
standard errors obtained from a t-test of the outcome for
matched treatment and comparison observations are not
correct
• We typically use bootstrapping to estimate the correct
standard errors.
Sensitivity Analysis
• There are many modeling decisions that can affect the
results, including
• Specification of propensity score equation
• What variables to include
• How many interactions to include and at what level
• Matching method and caliper or bandwidth to use
• Must test sensitivity of results to these decisions.
• If results are not robust to these changes, this should raise a
question mark about their reliability
PSM compared to Social Experiements
• In social experiments PS= constant.
– Everyone has same probability of participating.
• PSM tries to match treatment and control groups based
on them having an equal probability of participation (using
the probability predicted from observables)
• But probability of participation can also be affected by
unobservables raising concerns about remaining selection
bias in PSM methods
Motivation for propensity score
weighting• Propensity score methods are used to remove the effects
of observable confounders when estimating the effect of a
treatment on an outcome
• Have been discussing matching methods
• Stratification, nearest neighbor, etc.
• Propensity scores can also be used to weight
observations (like a sample weight)
• Specifically use inverse probability of treatment weighting (IPTW)
• This approach can be used in contexts where models are non-linear
• Example: Propensity score weights in survival analysis (Austin, 2011)
23
Creating propensity score weights
• Same creation of propensity scores as for matching
• P(Z)=Pr(Ti=1 | Zi) where Z is a vector of pre-exposure
characteristics
• Predict probabilities of participation using logit or probit:
• Use propensity scores to create weights
• Weights (wi) are the inverse of the probability of receiving the
treatment (or non-treatment) that the subject actually received
• High weights for those who were unlikely to receive treatment but
did and those who were likely to receive treatment but did not
24
P̂
wi =Ti
P̂i+
1-Ti
1- P̂i If treated wi =
1
P̂i If control wi =
1
1- P̂i
Estimates of treatment effects
• There are a number of ways to use IPTW directly
to estimate treatment effects
• Can estimate average treatment effect (ATE) as:
• Average treatment on the treated (ATT) and
average treatment on the untreated (controls,
ATC) use different weights:
25
ATE =1
n
TiYi
P̂i1
n
å -(1-Ti )Yi
(1- P̂i )1
n
å
wi,ATT =Ti +P̂i(1-Ti )
1- P̂i wi,ATC =
Ti (1- P̂i )
P̂i+ (1-Ti )
Incorporating weights into models
• Regression models can be weighted by the inverse
probability of treatment to estimate the causal effects of
treatment
• Use like a sample weight
• Need to account for weighted nature of synthetic sample
(use robust estimation) when using IPTW
26
Checking weights
• When weights are applied to treatment and control groups, they should then have statistically indistinguishable group means
• Weights may be very inaccurate/unstable for subjects with a very low probability of receive the treatment status they received• Very unlikely to get treatment and did
• Very likely to get treatment and did not
• Can stabilize weights by multiplying weights by mean probability of treatment received (T or C) propensity score
• Can also trim (use region of common support)
27
Example of propensity score weighting
• Southwest China Poverty Reduction Project ran from
1995 to 2005
• Surveys in 1996 and 2000 of 2,000 households in
targeted and non-targeted villages
• Follow-up in 2004/5
• Selection might drive treatment effect estimates if initial
differences in samples were large
• Propensity score weighting in regression models was
used as part of estimating program effects
• Impact evaluation of Chen, Mu, and Ravallion (2008)
found impacts on income, consumption, and education
diminished by 2004/5
28