Propensity Score Matching Methods

Day 1, Lecture 3

By Ragui Assaad

Training on Applied Micro-Econometrics and Public Policy Evaluation

July 25-27, 2016

Economic Research Forum

Type I Methods

• Assumption:

• Conditional exogeneity of placement

• i.e. placement into program does not depend on unobservables

affecting outcome of interest

• In that case Ti is not correlated with error term and is therefore not

endogenous

• Can simply use regression to get at ATE

• This is a fairly strong assumption that is unlikely to be met in most

cases.

Multiple Regression Approach

• Assumes conditional exogeneity of placement (selection on observables only)

• Run a multiple regression explaining the outcome (Y) as a function of the treatment (T) and other controls (X). The coefficient of T is the average treatment effect:

$$Y_i = \alpha + \beta T_i + \gamma X_i + \varepsilon_i$$

• Can differentiate the effect of treatment across different groups by interacting X and T:

$$Y_i = \alpha + \beta T_i + \gamma X_i + \delta T_i X_i + \varepsilon_i$$
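The two regressions on this slide can be sketched in a few lines. This is a minimal illustration on simulated data, not the lecture's own code: the data-generating process, the coefficient values, and the helper name `ols` are assumptions made for the example, and treatment is assigned independently of the error so that conditional exogeneity holds by construction.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000
x = rng.normal(size=n)                          # one observed control X (simulated)
t = (rng.uniform(size=n) < 0.5).astype(float)   # treatment unrelated to the error term
y = 1.0 + 2.0 * t + 0.5 * x + 1.0 * t * x + rng.normal(size=n)

def ols(design, outcome):
    """Least-squares coefficients for outcome = design @ beta + error."""
    beta, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return beta

ones = np.ones(n)
# Model 1: Y on a constant, T and X; the coefficient on T estimates the ATE.
beta_main = ols(np.column_stack([ones, t, x]), y)
# Model 2: adds the interaction T*X so the effect of T can vary with X.
beta_inter = ols(np.column_stack([ones, t, x, t * x]), y)

print("coefficient on T (model 1):", round(beta_main[1], 3))
print("coefficients on T and T*X (model 2):", np.round(beta_inter[[1, 3]], 3))
```

Because X is centred at zero in this simulation, the coefficient on T in the first model is close to the average effect even though the true effect varies with X; the second model recovers how the effect changes with X.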

Problems with Regression Approach

• Besides the strong assumption of conditional exogeneity

of placement, linear regression makes other strong

parametric assumptions relating to how treatment and

outcome are related conditional on observables.

• We must assume a particular functional form, like linear,

log-linear, etc.

• We would prefer not to make such assumptions and use a

non-parametric approach.

• Regular regression also gives equal weight to all

comparison observations, irrespective of how likely they

are to participate in the program

Introduction to Propensity Score Matching (PSM)

• Method is used to find a comparison group of non-participants with pre-intervention characteristics similar to those of the treatment group

• Like regression, method assumes conditional exogeneity

of placement or “selection on observables only”

• Main Question: Which characteristics does one use and

what weight to put on each of them?

• Propensity score matching is a way of answering this

question

What are the advantages of PSM over

Regression?

• PSM does not assume a particular functional form for the

way the X’s affect Y and the way T affects Y

• It is referred to as a non-parametric method

• Regression uses the whole dataset to compare treatment

and controls, even if treatment and control observations

are very different in terms of the observables

• PSM only compares treatment and control observations

that are similar (along observable characteristics)

• We only keep control observations that are "close" to the treatment observations we want to match

Basic Theorem of PSM

• Rosenbaum and Rubin (1983) showed that instead of

having to match on a multitude of dimensions in a vector

of observable characteristics Z, it is only necessary to

match on a single dimension P(Z), which is the propensity

score.

• Treatment and control observations that are "close" in the propensity score space will, on average, have similar distributions of the various elements making up Z, even if an individual matched pair differs on particular elements.

What is a propensity score?

• A propensity score is given by:

• P(Z) = Pr(T = 1 | Z), where Z is a vector of pre-exposure characteristics

– Z can include the pre-treatment value of the outcome

– Treatment units are matched to comparison or control units with

similar values of P(Z)

• Impact estimates from Propensity Score Matching (PSM)

will depend on the variables that go into the equation

used to estimate propensity score and also on the

specification of that equation

• Can be estimated using probit or logit
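As a rough sketch of the estimation step, the logit version of P(Z) = Pr(T = 1 | Z) can be fitted by Newton-Raphson using only NumPy. The function names `fit_logit` and `predict_pscore`, the simulated Z, and the coefficient values are assumptions for illustration; in practice a statistics package's logit or probit routine would normally be used.

```python
import numpy as np

def fit_logit(z, t, n_iter=25):
    """Fit Pr(T=1|Z) = 1 / (1 + exp(-[1, Z] @ b)) by Newton-Raphson; return b."""
    design = np.column_stack([np.ones(len(t)), z])
    b = np.zeros(design.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-design @ b))
        gradient = design.T @ (t - p)
        hessian = (design * (p * (1.0 - p))[:, None]).T @ design
        b = b + np.linalg.solve(hessian, gradient)
    return b

def predict_pscore(z, b):
    """Predicted probability of participation for each row of z."""
    design = np.column_stack([np.ones(len(z)), z])
    return 1.0 / (1.0 + np.exp(-design @ b))

# Simulated pre-exposure characteristics and participation (illustrative only).
rng = np.random.default_rng(1)
n = 4_000
z = rng.normal(size=(n, 2))
true_index = -0.5 + 1.0 * z[:, 0] - 0.5 * z[:, 1]
t = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-true_index))).astype(float)

b_hat = fit_logit(z, t)
pscore = predict_pscore(z, b_hat)
print("estimated coefficients:", np.round(b_hat, 2))
print("propensity score range:", round(pscore.min(), 3), round(pscore.max(), 3))
```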

Region of Common Support

Region of common support for propensity score between participants and non-participants must be large enough to find an adequate comparison group
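One simple way to delimit the region of common support is the minimum/maximum comparison: keep only the range of propensity scores observed in both groups. The sketch below assumes that rule (other definitions exist), and the function name `common_support` and the toy scores are hypothetical.

```python
import numpy as np

def common_support(pscore, treated):
    """Return (low, high) bounds of the overlap region and a mask of units inside it."""
    p_t, p_c = pscore[treated], pscore[~treated]
    low = max(p_t.min(), p_c.min())    # larger of the two group minima
    high = min(p_t.max(), p_c.max())   # smaller of the two group maxima
    inside = (pscore >= low) & (pscore <= high)
    return low, high, inside

# Toy scores: four non-participants followed by four participants.
pscore = np.array([0.05, 0.20, 0.35, 0.50, 0.30, 0.55, 0.70, 0.95])
treated = np.array([False, False, False, False, True, True, True, True])

low, high, inside = common_support(pscore, treated)
print("common support:", low, "to", high)
print("units kept:", int(inside.sum()), "of", len(pscore))
```

In this toy example only three of the eight units fall inside the overlap, which illustrates how a narrow region of common support shrinks the usable sample.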

Assumption Underlying PSM Methods

• Eliminating selection based on observables will reduce

overall bias

• This is the case if and only if the bias based on

unobservables goes in the same direction as bias based

on observables

• If they go in opposite direction, eliminating bias based on

observables will increase overall bias

• Possible in theory, but not likely in practice

What variables go in Z?

• Any variable that can affect program placement, including

the pre-intervention level of the outcome variable

• Can be obtained through interviews

• If one cannot find good observable variables to explain

who participates and who doesn’t, we will do a poor job

eliminating selection bias

PSM in practice

• Use predicted probabilities of participation from a probit or

logit model

• Matching is done based on the distance between each

treatment observation and comparison observations in its

neighborhood, using the propensity score as a metric of

distance.

• The outcome of each treatment is compared to that of its

matched comparison observations to detect whether the

treatment has a statistically significant effect

Matching Methods

• There are many ways to undertake the matching

• Simplest method:

• One-to-one matching

• Match each treatment observation with “nearest” comparison observation

• Can impose a maximum distance, through a “caliper”

• Subject to potential outliers

• Addressing ties in propensity scores

• Nearest n-neighbors

• Uses the average of the nearest n-neighbors among comparison observations

• Can also impose a caliper
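The nearest-neighbor rules above can be sketched as follows. This is an illustrative implementation under stated assumptions: matching is done with replacement, ties are broken by the order of the comparison observations, and the function name `nearest_neighbor_match` and the toy scores are made up for the example.

```python
import numpy as np

def nearest_neighbor_match(p_treated, p_control, n_neighbors=1, caliper=None):
    """For each treated unit, return indices of its nearest controls (with replacement).

    Controls farther away than `caliper` are dropped, so a treated unit can end up
    with fewer than n_neighbors matches, or with none at all.
    """
    matches = []
    for p in p_treated:
        dist = np.abs(p_control - p)
        nearest = np.argsort(dist, kind="stable")[:n_neighbors]
        if caliper is not None:
            nearest = nearest[dist[nearest] <= caliper]
        matches.append(nearest)
    return matches

p_treated = np.array([0.30, 0.62, 0.90])
p_control = np.array([0.10, 0.28, 0.35, 0.60, 0.65])

one_to_one = nearest_neighbor_match(p_treated, p_control, n_neighbors=1, caliper=0.05)
two_nearest = nearest_neighbor_match(p_treated, p_control, n_neighbors=2)
print([m.tolist() for m in one_to_one])
print([m.tolist() for m in two_nearest])
```

With the caliper of 0.05, the treated unit with score 0.90 finds no match, because its closest comparison observation is 0.25 away.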

Matching Methods

• More complicated but probably sounder methods

• Kernel matching

• Kernel functions are symmetric functions around a particular point that can be used to provide a weight that declines with the horizontal distance from that point

• The weight is the vertical height at each distance from the central point

• The weight typically goes to zero after a certain distance has been

reached (usually referred to as bandwidth)
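A short sketch of how kernel weights might be computed for one treated observation is given below. The kernel formulas are the standard density-normalised versions; the helper `kernel_weights` and the example scores are assumptions. The normalising constants do not matter for matching, since the weights are rescaled to sum to one.

```python
import numpy as np

def epanechnikov(u):
    return np.where(np.abs(u) <= 1, 0.75 * (1 - u**2), 0.0)

def tricube(u):
    return np.where(np.abs(u) <= 1, (70 / 81) * (1 - np.abs(u)**3)**3, 0.0)

def uniform(u):
    return np.where(np.abs(u) <= 1, 0.5, 0.0)

def gaussian(u):
    # The Gaussian kernel never reaches exactly zero, unlike the three above.
    return np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)

def kernel_weights(p_treated_unit, p_control, bandwidth, kernel=epanechnikov):
    """Weights on comparison observations for one treated unit, normalised to sum to 1."""
    raw = kernel((p_control - p_treated_unit) / bandwidth)
    total = raw.sum()
    return raw / total if total > 0 else raw

p_control = np.array([0.45, 0.50, 0.58, 0.90])
w = kernel_weights(0.50, p_control, bandwidth=0.1)
print(np.round(w, 3))
```

The comparison observation at 0.90 lies more than one bandwidth away from 0.50 and therefore receives zero weight under the Epanechnikov kernel.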

Some Popular Kernel Functions

[Figure: four panels plotting kernel weight against distance from the central point. Epanechnikov, Tricube, and Uniform are shown over the range -1 to 1; the fourth panel, captioned "Gaussian or Normal" (labelled "Exponential" in the plot), is shown over the range -3 to 3.]

PSM Estimator

• The simplest measure of the impact of the program is:

$$\frac{1}{N_T} \sum_{j=1}^{N_T} \left( Y_j^T - Y_j^C \right)$$

where $Y_j^T$ is the outcome of treated observation $j$, $N_T$ is the number of treated observations, and $Y_j^C$ is the value for a single “nearest neighbor” from the comparison group.

• If we are using multiple “nearest neighbors”, we take a weighted mean of all the “nearest neighbors” to form $Y_j^C$, with the weights provided by the kernel function used (generally declining with distance from observation j)
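The estimator can be sketched for the kernel-weighted case as follows. This is an illustration only: the data are simulated with a known effect of 2, the true propensity score is used in place of an estimated one, and the function name `att_kernel_matching`, the Epanechnikov kernel, and the bandwidth of 0.06 are assumptions.

```python
import numpy as np

def att_kernel_matching(y, treated, pscore, bandwidth=0.06):
    """Average over treated units of Y_j^T minus a kernel-weighted mean of control outcomes."""
    y_c, p_c = y[~treated], pscore[~treated]
    gaps = []
    for y_j, p_j in zip(y[treated], pscore[treated]):
        u = (p_c - p_j) / bandwidth
        w = np.where(np.abs(u) <= 1, 0.75 * (1 - u**2), 0.0)  # Epanechnikov weights
        if w.sum() > 0:  # treated units with no control inside the bandwidth are dropped
            gaps.append(y_j - np.average(y_c, weights=w))
    return np.mean(gaps), len(gaps)

rng = np.random.default_rng(2)
n = 4_000
z = rng.normal(size=n)
pscore = 1 / (1 + np.exp(-z))      # true score, known only because the data are simulated
treated = rng.uniform(size=n) < pscore
y = 1.0 + 2.0 * treated + 1.5 * z + rng.normal(size=n)   # true effect of treatment = 2

naive = y[treated].mean() - y[~treated].mean()
att, n_matched = att_kernel_matching(y, treated, pscore)
print("naive difference in means:", round(naive, 2))
print("matching estimate:", round(att, 2), "using", n_matched, "treated units")
```

The naive difference in means is inflated by selection on Z, while the matched estimate should sit much closer to the simulated effect.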

Notes on Estimating Propensity Score

• Although effect of treatment on outcome is non-

parametric, estimation of propensity score itself

depends on a parametric specification (usually

logit or probit)

• Specification of propensity score equation will

affect quality of matching and, therefore, final

result

• Must try out multiple specifications with lots of

interaction terms between the observable

characteristics to get a good fit

Testing how good the match is

• Need to test the extent to which observable characteristics are balanced in the matched sample, using a t-test of the difference in means of the covariates across matched samples.

• None of the tests should be statistically significant

• Should examine the effect of the matching method on the area of common support

• The closer we try to make the match, the more likely that some treatment observations won’t find matches, which reduces the efficiency of the model
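A balance check of this kind might look like the sketch below, which computes a Welch t-statistic for each covariate before and after matching. The simulated data and the function name `welch_t` are assumptions. With a single covariate, matching directly on that covariate stands in for matching on the propensity score.

```python
import numpy as np

def welch_t(x_a, x_b):
    """Welch t-statistic for the difference in means of each covariate column."""
    mean_gap = x_a.mean(axis=0) - x_b.mean(axis=0)
    se = np.sqrt(x_a.var(axis=0, ddof=1) / len(x_a) + x_b.var(axis=0, ddof=1) / len(x_b))
    return mean_gap / se

rng = np.random.default_rng(3)
x_t = rng.normal(0.5, 1.0, size=(300, 1))    # treated units have higher X on average
x_c = rng.normal(0.0, 1.0, size=(3000, 1))   # large pool of comparison units

# One-to-one nearest-neighbor matching with replacement on the single covariate.
nearest = np.abs(x_t[:, 0][:, None] - x_c[:, 0][None, :]).argmin(axis=1)

t_before = welch_t(x_t, x_c)
t_after = welch_t(x_t, x_c[nearest])
print("t-statistic before matching:", np.round(t_before, 2))
print("t-statistic after matching:", np.round(t_after, 2))
```

An absolute t-statistic above roughly 1.96 would indicate a difference in means that is significant at the 5% level, which is what the matched sample should avoid.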

Standard Errors

• Because propensity scores are predicted, analytical

standard errors obtained from a t-test of the outcome for

matched treatment and comparison observations are not

correct

• We typically use bootstrapping to estimate the correct

standard errors.
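A bootstrap of this kind can be sketched as below. The key point is that the propensity score is re-estimated and the matching redone inside every bootstrap draw, so the standard error reflects the fact that the scores are predicted. Everything here is an assumption for illustration: the simulated data, the kernel matching estimator, the bandwidth of 0.1, the 200 replications, and the function names.

```python
import numpy as np

def fit_pscore(z, t, n_iter=20):
    """Logit propensity scores fitted by Newton-Raphson."""
    design = np.column_stack([np.ones(len(t)), z])
    b = np.zeros(design.shape[1])
    for _ in range(n_iter):
        p = 1 / (1 + np.exp(-design @ b))
        hessian = (design * (p * (1 - p))[:, None]).T @ design
        b = b + np.linalg.solve(hessian, design.T @ (t - p))
    return 1 / (1 + np.exp(-design @ b))

def att_kernel(y, t, z, bandwidth=0.1):
    """Kernel-matching estimate; the score is re-estimated on whatever sample is passed in."""
    p = fit_pscore(z, t)
    treated = t == 1
    u = (p[treated][:, None] - p[~treated][None, :]) / bandwidth
    w = np.where(np.abs(u) <= 1, 0.75 * (1 - u**2), 0.0)
    matched = w.sum(axis=1) > 0
    y_c_hat = (w[matched] @ y[~treated]) / w[matched].sum(axis=1)
    return np.mean(y[treated][matched] - y_c_hat)

rng = np.random.default_rng(4)
n = 600
z = rng.normal(size=n)
t = (rng.uniform(size=n) < 1 / (1 + np.exp(-0.8 * z))).astype(float)
y = 1.0 + 2.0 * t + z + rng.normal(size=n)   # true effect = 2

point = att_kernel(y, t, z)
draws = []
for _ in range(200):
    idx = rng.integers(0, n, size=n)         # resample observations with replacement
    draws.append(att_kernel(y[idx], t[idx], z[idx]))
se = np.std(draws, ddof=1)
print("estimate:", round(point, 3), "bootstrap standard error:", round(se, 3))
```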

Sensitivity Analysis

• There are many modeling decisions that can affect the

results, including

• Specification of propensity score equation

• What variables to include

• How many interactions to include and at what level

• Matching method and caliper or bandwidth to use

• Must test sensitivity of results to these decisions.

• If results are not robust to these changes, this should raise a

question mark about their reliability

PSM compared to Social Experiments

• In social experiments PS= constant.

– Everyone has same probability of participating.

• PSM tries to match treatment and control groups based

on them having an equal probability of participation (using

the probability predicted from observables)

• But probability of participation can also be affected by

unobservables raising concerns about remaining selection

bias in PSM methods

Propensity Score Weighting

Motivation for propensity score weighting

• Propensity score methods are used to remove the effects of observable confounders when estimating the effect of a treatment on an outcome

• Have been discussing matching methods

• Stratification, nearest neighbor, etc.

• Propensity scores can also be used to weight

observations (like a sample weight)

• Specifically use inverse probability of treatment weighting (IPTW)

• This approach can be used in contexts where models are non-linear

• Example: Propensity score weights in survival analysis (Austin, 2011)

Creating propensity score weights

• Same creation of propensity scores as for matching

• P(Z)=Pr(Ti=1 | Zi) where Z is a vector of pre-exposure

characteristics

• Predict probabilities of participation, $\hat{P}_i$, using logit or probit

• Use propensity scores to create weights

• Weights ($w_i$) are the inverse of the probability of receiving the treatment (or non-treatment) that the subject actually received

• High weights for those who were unlikely to receive treatment but did and those who were likely to receive treatment but did not

$$w_i = \frac{T_i}{\hat{P}_i} + \frac{1 - T_i}{1 - \hat{P}_i}$$

If treated: $w_i = \frac{1}{\hat{P}_i}$. If control: $w_i = \frac{1}{1 - \hat{P}_i}$.
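The weight formula translates directly into code. The function name `iptw_weights` and the four toy observations are assumptions chosen to show the pattern described on the slide.

```python
import numpy as np

def iptw_weights(t, p_hat):
    """Inverse probability of treatment weights: 1/p if treated, 1/(1-p) if control."""
    return t / p_hat + (1 - t) / (1 - p_hat)

# Two treated and two control observations with high and low predicted probabilities.
t = np.array([1, 1, 0, 0])
p_hat = np.array([0.8, 0.1, 0.8, 0.1])

w = iptw_weights(t, p_hat)
print(np.round(w, 2))
```

The weights come out as 1.25, 10, 5, and about 1.11: the treated observation that was unlikely to be treated (score 0.1) and the control observation that was likely to be treated (score 0.8) receive the largest weights.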

Estimates of treatment effects

• There are a number of ways to use IPTW directly to estimate treatment effects

• Can estimate the average treatment effect (ATE) as:

$$ATE = \frac{1}{n} \sum_{i=1}^{n} \frac{T_i Y_i}{\hat{P}_i} - \frac{1}{n} \sum_{i=1}^{n} \frac{(1 - T_i) Y_i}{1 - \hat{P}_i}$$

• The average treatment effect on the treated (ATT) and the average treatment effect on the untreated (controls, ATC) use different weights:

$$w_{i,ATT} = T_i + \frac{\hat{P}_i (1 - T_i)}{1 - \hat{P}_i} \qquad w_{i,ATC} = \frac{T_i (1 - \hat{P}_i)}{\hat{P}_i} + (1 - T_i)$$
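The ATE formula and the ATT and ATC weights can be sketched as follows. The simulation is an assumption made for illustration: the treatment effect is 2 + Z, so that the ATE is 2 while the ATT is larger and the ATC smaller, and the true propensity score stands in for an estimated one. Taking a weighted difference in group means is one common way to use the ATT and ATC weights.

```python
import numpy as np

def ate_iptw(y, t, p):
    """ATE = mean(T*Y/p) - mean((1-T)*Y/(1-p))."""
    return np.mean(t * y / p) - np.mean((1 - t) * y / (1 - p))

def att_weights(t, p):
    return t + p * (1 - t) / (1 - p)

def atc_weights(t, p):
    return t * (1 - p) / p + (1 - t)

def weighted_difference(y, t, w):
    """Weighted mean outcome of the treated minus weighted mean outcome of the controls."""
    treated, control = t == 1, t == 0
    return np.average(y[treated], weights=w[treated]) - np.average(y[control], weights=w[control])

rng = np.random.default_rng(5)
n = 20_000
z = rng.normal(size=n)
p = 1 / (1 + np.exp(-z))                           # true score (simulation only)
t = (rng.uniform(size=n) < p).astype(float)
y = 1.0 + (2.0 + z) * t + z + rng.normal(size=n)   # effect 2 + Z varies across units

ate_hat = ate_iptw(y, t, p)
att_hat = weighted_difference(y, t, att_weights(t, p))
atc_hat = weighted_difference(y, t, atc_weights(t, p))
print("ATE:", round(ate_hat, 2), "ATT:", round(att_hat, 2), "ATC:", round(atc_hat, 2))
```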

Incorporating weights into models

• Regression models can be weighted by the inverse

probability of treatment to estimate the causal effects of

treatment

• Use like a sample weight

• Need to account for weighted nature of synthetic sample

(use robust estimation) when using IPTW
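A weighted regression with robust standard errors might be sketched as below, using NumPy only. The function name `wls_robust`, the simulated data, and the choice of the HC0 sandwich estimator are assumptions. The robust standard errors here treat the propensity score as known, so they do not by themselves account for the score having been estimated.

```python
import numpy as np

def wls_robust(design, y, w):
    """Weighted least squares with heteroskedasticity-robust (sandwich, HC0) standard errors."""
    xtwx_inv = np.linalg.inv(design.T @ (design * w[:, None]))
    beta = xtwx_inv @ (design.T @ (w * y))
    resid = y - design @ beta
    meat = design.T @ (design * ((w * resid) ** 2)[:, None])
    cov = xtwx_inv @ meat @ xtwx_inv
    return beta, np.sqrt(np.diag(cov))

rng = np.random.default_rng(6)
n = 10_000
z = rng.normal(size=n)
p = 1 / (1 + np.exp(-z))                      # true score (simulation only)
t = (rng.uniform(size=n) < p).astype(float)
y = 1.0 + 2.0 * t + z + rng.normal(size=n)    # true effect = 2; Z confounds the comparison

w = t / p + (1 - t) / (1 - p)                 # IPTW weights, used like a sample weight
design = np.column_stack([np.ones(n), t])     # outcome regressed on a constant and T only

beta, se = wls_robust(design, y, w)
beta_ols, _ = wls_robust(design, y, np.ones(n))   # unweighted regression for comparison
print("weighted coefficient on T:", round(beta[1], 3), "robust s.e.:", round(se[1], 3))
print("unweighted coefficient on T:", round(beta_ols[1], 3))
```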

Checking weights

• When weights are applied to treatment and control groups, the weighted groups should have statistically indistinguishable means of the observable characteristics

• Weights may be very inaccurate/unstable for subjects with a very low probability of receiving the treatment status they received

• Very unlikely to get treatment and did

• Very likely to get treatment and did not

• Can stabilize weights by multiplying each weight by the mean probability of the treatment status actually received (T or C)

• Can also trim (use region of common support)
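The stabilization and the balance check described above can be sketched as follows. The function names `stabilized_weights` and `weighted_mean_gap` and the simulated data are assumptions, and the true propensity score again stands in for an estimated one.

```python
import numpy as np

def stabilized_weights(t, p_hat):
    """Multiply each IPTW weight by the overall share of units with the same treatment status."""
    share_treated = t.mean()
    return t * share_treated / p_hat + (1 - t) * (1 - share_treated) / (1 - p_hat)

def weighted_mean_gap(x, t, w):
    """Weighted mean of a covariate among the treated minus that among the controls."""
    treated, control = t == 1, t == 0
    return np.average(x[treated], weights=w[treated]) - np.average(x[control], weights=w[control])

rng = np.random.default_rng(7)
n = 20_000
z = rng.normal(size=n)
p_hat = 1 / (1 + np.exp(-z))          # true score used as a stand-in (simulation only)
t = (rng.uniform(size=n) < p_hat).astype(float)

raw = t / p_hat + (1 - t) / (1 - p_hat)
sw = stabilized_weights(t, p_hat)

gap_unweighted = z[t == 1].mean() - z[t == 0].mean()
gap_weighted = weighted_mean_gap(z, t, sw)
print("largest raw weight:", round(raw.max(), 1), "largest stabilized weight:", round(sw.max(), 1))
print("gap in mean Z, unweighted:", round(gap_unweighted, 3), "weighted:", round(gap_weighted, 3))
```

Stabilized weights average about one, which keeps the weighted sample close to the original sample size. Within each treatment group they are a rescaling of the raw weights, so the weighted group means are unchanged.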

Example of propensity score weighting

• Southwest China Poverty Reduction Project ran from

1995 to 2005

• Surveys in 1996 and 2000 of 2,000 households in

targeted and non-targeted villages

• Follow-up in 2004/5

• Selection might drive treatment effect estimates if initial

differences in samples were large

• Propensity score weighting in regression models was

used as part of estimating program effects

• The impact evaluation by Chen, Mu, and Ravallion (2008) found that impacts on income, consumption, and education had diminished by 2004/5
