Michael Lechner, Causal Analysis: RDD (2014)
Lecture 7
The Regression Discontinuity Design
RDD – fuzzy and sharp
Regression Discontinuity Design (RDD): Introduction (1)
The RDD is a quasi-experimental design whose defining characteristic is that the probability of receiving treatment changes discontinuously at some point, while at the same time the potential outcomes remain stable (change only smoothly) in a neighbourhood around this point → observations in this neighbourhood with and without treatment may be good comparisons for each other
Example: If a training programme is only available to youths not older than 25, then, at least in the year in which the programme is introduced, similar non-participants aged 26 may be a good comparison group for participants aged 25 (the effect of the age difference itself can be deduced, e.g., from comparisons of adjacent ages such as 24 vs. 25 or 26 vs. 27; these effects may be used to adjust for the additional year of age that comes jointly with being in one or the other group)
Regression Discontinuity Design (RDD): Introduction (2)
Advantage: Experiment around a threshold
Common threats to internal validity:
– Since the location of the threshold is usually not truly experimental, units may manipulate their location with respect to the threshold
– The threshold may be relevant for other treatments as well
(sensitivity checks are available, see the last slides)
Common threat to external validity: units around the threshold may not be representative of the treated or of the population at large
Regression Discontinuity Design (RDD): Introduction (3)
First applications
– Thistlethwaite and Campbell (1960) study the effect of student scholarships on career aspirations
• awards are only made if a test score exceeds a threshold
– See Cook (2008) for the history of RDD in economics, psychology, and statistics
More recent applications in economics
– Class size and pupils' attainment (Angrist and Lavy, 1996: if the average class size is more than XX pupils, the class is split)
– Anti-discrimination laws and the share of minority employees (Hahn, Todd and Van der Klaauw, 1999: firms are covered by the law if they have more than 15 employees → the share of minorities increases!)
– ... many more (see the surveys by Cook, 2008, Lee and Lemieux, 2009, Van der Klaauw, 2008)
Method of the day (?): still pretty hot (and perhaps overvalued) (2008 special issue of the Journal of Econometrics)
Regression Discontinuity Design (RDD): Introduction (4)
Example: Angrist and Lavy (QJE, 1999)
– What is the effect of class size on pupils' performance?
– All sorts of endogeneity problems (richer schools have smaller classes, etc.)
– Israel has a rule that a class is split into two (three, four, etc.) if the number of students exceeds certain thresholds
– Whether the number of students is just below or just above a threshold is exogenous (not related to outcomes other than by leading to different class sizes)
– Compare outcomes of classes just above and just below the threshold to obtain the effect of class size
Regression Discontinuity Design (RDD): Introduction (5)
A note on estimation
Nonparametric estimation close to the cut-off point (kernel regression may perform poorly → local (linear) regression may be better)
Regression-type estimators if additional assumptions are used
Because identification is local, there cannot be √N-convergent estimators without further assumptions (no averaging of nonparametric estimates as in matching or IV)
Key: we need enough data close enough to the threshold
Two distinct types of RDD: Sharp design
Two distinct types of RDD: Fuzzy design
RDD: Sharp design (1)
X: (continuous) variable that defines the threshold (forcing variable)
c: threshold
Treatment: $D = 1(X \ge c)$
Quantity typically identified:
$\tau_S = E[Y(1) - Y(0) \mid X = c]$
Identification is typically achieved by showing that these expectations can be learned from the observed outcome just above and just below the threshold:
$E[Y(1) \mid X = c] = \lim_{x \downarrow c} E[Y \mid X = x]$
$E[Y(0) \mid X = c] = \lim_{x \uparrow c} E[Y \mid X = x]$
Note that the CIA, $Y(1), Y(0) \perp\!\!\!\perp D \mid X = c$, is fulfilled at the threshold in a trivial way
– since D is a deterministic function of X, there is no variation in D conditional on X = c, so the independence must hold
– but there is no common support in the sharp design → the CIA cannot be exploited directly → matching is not possible → extrapolation around the cut-off is needed
A smoothness condition for the potential outcomes (which allows some local extrapolation) is therefore required
RDD: Sharp design (2)
Conditional expectations of the potential outcomes have to be smooth
Here we assume that the conditional distributions of the potential outcomes are smooth in x, for all y:
$F_{Y(0) \mid X}(y, x)$ and $F_{Y(1) \mid X}(y, x)$ are continuous in $x$ for all $y$
– in practice, continuity in a neighbourhood around c is enough
From this assumption we obtain identification:
$E[Y(0) \mid X = c] = \lim_{x \uparrow c} E[Y(0) \mid X = x] = \lim_{x \uparrow c} E[Y(0) \mid D = 0, X = x] = \lim_{x \uparrow c} E[Y \mid X = x]$
$E[Y(1) \mid X = c] = \lim_{x \downarrow c} E[Y(1) \mid X = x] = \lim_{x \downarrow c} E[Y(1) \mid D = 1, X = x] = \lim_{x \downarrow c} E[Y \mid X = x]$
Estimate the regression curves and evaluate their difference at X = c
However, as c is a boundary point for both regressions, this is non-trivial
– since nonparametric estimation is local to the threshold, estimators will converge more slowly than √N [i.e. more slowly than standard IV or matching estimators]
– typical nonparametric regression estimators are not well behaved at the boundary
RDD: Fuzzy design (1)
Whereas in the sharp design the treatment probability conditional on X jumps from 0 to 1 (so everybody close to the threshold is a complier), in the fuzzy design there is less (local) compliance, as this probability jumps by a smaller amount:
$\lim_{x \uparrow c} P[D = 1 \mid X = x] \neq \lim_{x \downarrow c} P[D = 1 \mid X = x]$
Such a situation may typically arise when the incentives to participate in D jump at the threshold (but (non-)participation is not enforced)
In this case, the threshold acts like an instrument. Under the usual IV-type assumptions, the following expression identifies the local complier treatment effect:
$\tau_F = \dfrac{\lim_{x \downarrow c} E[Y \mid X = x] - \lim_{x \uparrow c} E[Y \mid X = x]}{\lim_{x \downarrow c} P[D = 1 \mid X = x] - \lim_{x \uparrow c} P[D = 1 \mid X = x]}$
RDD: Fuzzy design (2)
Assuming that the cut-off can be marginally manipulated, we need the following assumptions for a LATE-type interpretation of the identified effect
– D(x) is non-decreasing in x at x = c (monotonicity)
– A complier (at c) is defined by $\lim_{x \uparrow c} P[D = 1 \mid X = x] = 0$ and $\lim_{x \downarrow c} P[D = 1 \mid X = x] = 1$
– Never-taker (at c): $\lim_{x \uparrow c} P[D = 1 \mid X = x] = 0$ and $\lim_{x \downarrow c} P[D = 1 \mid X = x] = 0$
– Always-taker (at c): $\lim_{x \uparrow c} P[D = 1 \mid X = x] = 1$ and $\lim_{x \downarrow c} P[D = 1 \mid X = x] = 1$
– Continuity: $F_{Y(0) \mid X}(y, x)$ and $F_{Y(1) \mid X}(y, x)$ are continuous in $x$ for all $y$
Note that an explicit exclusion restriction is not necessary, because continuity of the potential outcomes in the neighbourhood of the cut-off, combined with a discrete jump of the treatment probability at the cut-off, implicitly acts like an exclusion restriction
Under these assumptions, the fuzzy RDD estimates the effect for those observations located just around the cut-off which would switch status when moving over the cut-off
RDD: External vs. internal validity
SRDD: valid only for observations at the cut-off
FRDD: valid only for compliers at the cut-off
ATE, ATET, etc. are only identified by extrapolating the effects
Thus, RDD explicitly trades off external validity for internal validity
However, sometimes these compliers may represent politically interesting subpopulations, for example, when in a particular situation they correspond to the marginal population that is affected if the treatment is (to some extent) expanded or contracted
– Regional variation of programme assignment probabilities: Frölich & Lechner (2010)
– Using unaffected firms (small firms are usually not subject to the same tough rules as larger firms) to evaluate firing restrictions and disability laws
RDD: Verifying the assumptions by some graphs
As RDD emphasises the distinction between a continuous change of the potential outcomes and a discrete jump of the treatment probability (leading to a discrete jump of the observed outcome if there is any effect), graphical analysis is a powerful tool to falsify these assumptions
Imbens and Lemieux (2008) suggest plotting the following
– Mean values of the outcome in fixed bins around the cut-off
• there should be a jump at c if there is any effect, and no other jumps that cannot be directly justified
– Mean values of the treatment variable in fixed bins around the cut-off
• there should be a jump at c, and no other jumps that cannot be directly justified
– Mean values of other covariates in fixed bins around the cut-off
• there should be no jump at c; otherwise the cut-off may also have a direct impact on Y via changes in other covariates
– A histogram of the forcing variable around the cut-off
• any clustering close to c might suggest strategic location above or below the cut-off, which would violate the implied assumption that location is random (at least close to c)
• to see whether there are enough observations for a local nonparametric analysis
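The binned-means plots listed above can be computed with a few lines of Python/NumPy. This is a minimal sketch, not IL08's own code: the helper `binned_means`, the equal-width bins with c as a bin edge (so no bin mixes both sides), and the simulated data are assumptions for illustration. Passing the outcome, the treatment indicator or a covariate as `v` gives the three mean plots; the returned counts form the histogram of the forcing variable.

```python
import numpy as np


def binned_means(x, v, c, bin_width, n_bins_each_side):
    """Hypothetical helper: mean of v in equal-width bins on both sides of c.

    The cut-off is always a bin edge, so no bin mixes observations from
    both sides. Returns bin midpoints, bin means (NaN for empty bins) and
    bin counts (the counts are a histogram of the forcing variable).
    """
    x = np.asarray(x, dtype=float)
    v = np.asarray(v, dtype=float)
    edges = c + bin_width * np.arange(-n_bins_each_side, n_bins_each_side + 1)
    n_bins = edges.size - 1
    # bin k collects observations with edges[k] <= x < edges[k + 1]
    idx = np.digitize(x, edges) - 1
    inside = (idx >= 0) & (idx < n_bins)
    counts = np.bincount(idx[inside], minlength=n_bins)
    sums = np.bincount(idx[inside], weights=v[inside], minlength=n_bins)
    with np.errstate(invalid="ignore", divide="ignore"):
        means = sums / counts
    mids = 0.5 * (edges[:-1] + edges[1:])
    return mids, means, counts


# Simulated sharp design (assumed data): jump of 2 in the outcome at c = 0
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 20_000)
d = (x >= 0).astype(float)
y = 1 + 0.5 * x + 2.0 * d + rng.normal(0, 0.1, x.size)
w = rng.normal(0, 1, x.size)  # covariate that should show no jump

for name, v in [("outcome", y), ("treatment", d), ("covariate", w)]:
    mids, means, counts = binned_means(x, v, c=0.0, bin_width=0.1, n_bins_each_side=5)
    print(name, np.round(means, 2))
print("histogram of X:", counts)
```

In practice the means would be plotted against `mids`; the bin width is a free choice, and it is worth checking that the visual impression does not hinge on it.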
RDD: Estimation (Imbens, Lemieux 2008)
Ideally, we would choose a subsample with observations clustered very, very closely around the cut-off point and then compute the respective sample means
Usually, this will not work, because the sample size becomes too small to be useful → a larger window is needed
Since X influences Y, a larger window usually means that the values of X above and below the cut-off become more dissimilar, and the exclusion restriction becomes critical
Therefore, control for changes in X by (nonparametric) regression above and below the threshold
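The trade-off between window size and bias can be illustrated with a small simulation. This sketch assumes simulated data (an outcome with slope 0.5 in X and a true jump of 2 at c = 0); `window_mean_difference` is a hypothetical helper that implements the 'ideal' estimator from the first point above, a raw comparison of sample means just above and just below the cut-off, without any control for X.

```python
import numpy as np


def window_mean_difference(x, y, c, h):
    """Hypothetical helper: mean(y) on [c, c+h) minus mean(y) on (c-h, c)."""
    right = (x >= c) & (x < c + h)
    left = (x > c - h) & (x < c)
    return y[right].mean() - y[left].mean(), int(right.sum()), int(left.sum())


# Assumed data-generating process: true jump of 2 at c = 0, slope 0.5 in X
rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 200_000)
d = (x >= 0).astype(float)
y = 1 + 0.5 * x + 2.0 * d + rng.normal(0, 0.1, x.size)

for h in [0.8, 0.2, 0.05, 0.005]:
    est, n_right, n_left = window_mean_difference(x, y, 0.0, h)
    # with a linear trend the bias of the raw comparison is about 0.5 * h
    print(f"h={h}: estimate={est:.3f}, n_left={n_left}, n_right={n_right}")
```

Wide windows give many observations but a bias of roughly 0.5·h coming from the slope in X; narrow windows remove this bias but leave few observations. Fitting a line in X on each side, as in the local linear estimator, removes the linear part of the bias.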
RDD: Nonparametric estimation in the sharp design (1)
We must estimate E(Y|X) above and below the cut-off and compare the predicted values of both nonparametric regressions at c
– Problem: prediction of a single point from two nonparametric regressions (no √N-convergent nonparametric estimator available), and this point is a boundary point
– Standard kernel-type regressions do not perform very well close to the boundary (slower rate of convergence than at interior points)
– See Imbens, Lemieux (2008), p. 624 for the analytical derivation of these problems in a special case
The literature suggests particular estimators that seem to be fairly robust in such specific situations
– Global nonparametric methods (series estimation, etc.)
• not very attractive, as only the region around the cut-off is of key importance (observations far away from the cut-off may influence the results)
– Local linear regression
• more robust to boundary problems than kernel regression, but still a local approach (observations far away from the cut-off have almost no impact on the results)
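The boundary problem can be seen by comparing a one-sided kernel (local constant, Nadaraya-Watson) regression with a one-sided local linear regression at the cut-off. This sketch assumes a triangular kernel and simulated data with a linear trend in X; both the kernel choice and the helper name `one_sided_fit` are illustrative assumptions.

```python
import numpy as np


def one_sided_fit(x, y, c, h, side, local_linear):
    """Hypothetical helper: estimate lim E[Y | X = x] as x approaches c from one side.

    Triangular kernel K(u) = max(0, 1 - |u|), u = (x - c) / h.
    local_linear=False: Nadaraya-Watson (local constant) estimate.
    local_linear=True: intercept of a kernel-weighted linear fit in (x - c).
    """
    mask = (x >= c) if side == "right" else (x < c)
    xs, ys = x[mask] - c, y[mask]
    w = np.clip(1 - np.abs(xs / h), 0, None)
    keep = w > 0
    xs, ys, w = xs[keep], ys[keep], w[keep]
    if not local_linear:
        return np.sum(w * ys) / np.sum(w)
    X = np.column_stack([np.ones(xs.size), xs])
    sw = np.sqrt(w)  # weighted least squares via row scaling
    coef, *_ = np.linalg.lstsq(X * sw[:, None], ys * sw, rcond=None)
    return coef[0]


# Assumed data: true jump of 2 at c = 0, slope 0.5 in X
rng = np.random.default_rng(2)
x = rng.uniform(-1, 1, 50_000)
d = (x >= 0).astype(float)
y = 1 + 0.5 * x + 2.0 * d + rng.normal(0, 0.1, x.size)

for ll in (False, True):
    jump = one_sided_fit(x, y, 0.0, 0.6, "right", ll) - one_sided_fit(x, y, 0.0, 0.6, "left", ll)
    print("local linear" if ll else "kernel (local constant)", round(float(jump), 3))
```

With a one-sided triangular kernel, the local constant fit averages points that lie on average h/3 away from c, so under this simulated slope its jump estimate is biased upwards by about h/3; the local linear intercept does not suffer from this first-order boundary bias.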
RDD: Local linear regression in the sharp design (1)
In a local region on both sides of the cut-off, estimate linear regressions:
$(\hat\alpha_0, \hat\beta_0) = \arg\min_{\alpha_0, \beta_0} \sum_{i:\, c-h < x_i < c} \big(y_i - \alpha_0 - \beta_0 (x_i - c)\big)^2$
$(\hat\alpha_1, \hat\beta_1) = \arg\min_{\alpha_1, \beta_1} \sum_{i:\, c \le x_i < c+h} \big(y_i - \alpha_1 - \beta_1 (x_i - c)\big)^2$
For the sharp design, these estimates are then evaluated at x = c and the difference is taken:
$\hat\tau_S = \hat\alpha_1 + \hat\beta_1 (c - c) - \hat\alpha_0 - \hat\beta_0 (c - c) = \hat\alpha_1 - \hat\alpha_0$
Standard regression inference may be valid
Non-linear regressions (logit, etc.) may be used (depending on the outcome variable)
We obtain numerically the same estimate if we use the following one-step estimator:
$(\hat\alpha, \hat\beta, \hat\gamma_S, \hat\delta) = \arg\min_{\alpha, \beta, \gamma, \delta} \sum_{i=1}^{N} 1(c-h < x_i < c+h) \big(y_i - \alpha - \beta (x_i - c) - \gamma d_i - \delta (x_i - c) d_i\big)^2, \qquad \hat\tau_S = \hat\gamma_S$
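The numerical equivalence of the two-step and the one-step estimator can be checked directly. A minimal sketch, assuming a rectangular kernel (every observation inside the window gets equal weight) and simulated data; the function names are hypothetical.

```python
import numpy as np


def ols(X, y):
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def sharp_rdd_two_step(x, y, c, h):
    """Hypothetical helper: alpha_1 - alpha_0 from separate linear fits on each side of c."""
    left = (x > c - h) & (x < c)
    right = (x >= c) & (x < c + h)
    a0 = ols(np.column_stack([np.ones(left.sum()), x[left] - c]), y[left])[0]
    a1 = ols(np.column_stack([np.ones(right.sum()), x[right] - c]), y[right])[0]
    return a1 - a0


def sharp_rdd_one_step(x, y, c, h):
    """Hypothetical helper: gamma_S from y = alpha + beta(x-c) + gamma d + delta(x-c)d in the window."""
    win = (x > c - h) & (x < c + h)
    xc = x[win] - c
    d = (x[win] >= c).astype(float)
    X = np.column_stack([np.ones(win.sum()), xc, d, xc * d])
    return ols(X, y[win])[2]


# Assumed data: nonlinear E[Y|X] and a true effect of 1.5 at c = 0
rng = np.random.default_rng(3)
x = rng.uniform(-1, 1, 20_000)
y = np.sin(x) + 1.5 * (x >= 0) + rng.normal(0, 0.2, x.size)
print(sharp_rdd_two_step(x, y, 0.0, 0.3), sharp_rdd_one_step(x, y, 0.0, 0.3))
```

The two numbers agree up to floating-point error because the fully interacted one-step regression lets intercept and slope differ on each side, which is exactly what the two separate regressions do.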
RDD: The role of additional covariates in the sharp design
Usually not necessary
But they may be used to
– reduce the variance of the estimator (although there may be a dimensionality problem)
– reduce the bias of the estimator if observations away from the cut-off have to be used (because of sample size considerations)
– check for discontinuities in those variables
– define the sample (if the discontinuity is only valid in a subsample)
RDD: Estimation in the fuzzy design (1)
Here we have to estimate a ratio
We estimate the effect of the discontinuity on the outcome in the same way as for the sharp design
In a similar fashion (or with a local logit or probit) we estimate the effect of the discontinuity on the treatment probability:
$(\hat\alpha_{D,0}, \hat\beta_{D,0}) = \arg\min_{\alpha_{D,0}, \beta_{D,0}} \sum_{i:\, c-h < x_i < c} \big(d_i - \alpha_{D,0} - \beta_{D,0} (x_i - c)\big)^2$
$(\hat\alpha_{D,1}, \hat\beta_{D,1}) = \arg\min_{\alpha_{D,1}, \beta_{D,1}} \sum_{i:\, c \le x_i < c+h} \big(d_i - \alpha_{D,1} - \beta_{D,1} (x_i - c)\big)^2$
The final estimator is then given by
$\hat\gamma_F = \dfrac{\hat\alpha_1 - \hat\alpha_0}{\hat\alpha_{D,1} - \hat\alpha_{D,0}}$
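A sketch of the fuzzy ratio estimator under assumed data: units are always-takers, never-takers or compliers (shares of 20/30/50%, an assumption of the simulation), crossing the cut-off switches only the compliers into treatment, and the effect is 2 for everybody. The function names are hypothetical; the rectangular kernel and the common bandwidth for numerator and denominator follow the implementation described above.

```python
import numpy as np


def boundary_intercepts(x, v, c, h):
    """Hypothetical helper: (alpha_0, alpha_1), intercepts at c of linear fits of v left / right of c."""
    out = []
    for mask in ((x > c - h) & (x < c), (x >= c) & (x < c + h)):
        X = np.column_stack([np.ones(mask.sum()), x[mask] - c])
        coef, *_ = np.linalg.lstsq(X, v[mask], rcond=None)
        out.append(coef[0])
    return out


def fuzzy_rdd_ratio(x, y, d, c, h):
    a0, a1 = boundary_intercepts(x, y, c, h)    # jump in the outcome
    ad0, ad1 = boundary_intercepts(x, d, c, h)  # jump in the treatment probability
    return (a1 - a0) / (ad1 - ad0)


# Assumed data: 20% always-takers, 50% compliers, 30% never-takers
rng = np.random.default_rng(4)
n = 100_000
x = rng.uniform(-1, 1, n)
u = rng.uniform(size=n)
always, complier = u < 0.2, (u >= 0.2) & (u < 0.7)
d = (always | (complier & (x >= 0))).astype(float)
y = 1 + 0.5 * x + 2.0 * d + rng.normal(0, 0.1, n)
print(fuzzy_rdd_ratio(x, y, d, 0.0, 0.3))  # true complier effect in this simulation: 2
```

Here the treatment probability jumps from 0.2 to 0.7 at c, so the denominator is about 0.5 and the outcome jump of about 1 is scaled up to the complier effect.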
RDD: Estimation in the fuzzy design (2)
Because of the particular implementation
– rectangular kernel
– same bandwidth for numerator and denominator
… this estimator is identical to the following (local) 2SLS estimator (see IL08)
– define $V_i = \big(1,\ 1(x_i < c)(x_i - c),\ 1(x_i \ge c)(x_i - c)\big)'$ and $\delta = (\alpha_0, \beta_0, \beta_1)'$
– regression function: $y_i = \delta' V_i + \gamma_F d_i + error_i$
– endogenous variable: D
– excluded instrument: $1(x_i \ge c)$
– observations used for estimation: $x_i \in \{x \mid c - h < x < c + h\}$
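The equivalence with local 2SLS can be verified numerically. This sketch assumes simulated fuzzy-design data and writes the just-identified IV estimator out directly as $(Z'X)^{-1}Z'y$ instead of calling an econometrics package; all function names are illustrative assumptions.

```python
import numpy as np


def fuzzy_rdd_2sls(x, y, d, c, h):
    """Hypothetical helper: local 2SLS with regressors (V_i, d_i) and instruments (V_i, 1(x_i >= c))."""
    win = (x > c - h) & (x < c + h)
    xc, yy, dd = x[win] - c, y[win], d[win]
    above = (x[win] >= c).astype(float)
    V = np.column_stack([np.ones(xc.size), (1 - above) * xc, above * xc])
    X = np.column_stack([V, dd])      # exogenous V plus endogenous d
    Z = np.column_stack([V, above])   # exogenous V plus excluded instrument
    coef = np.linalg.solve(Z.T @ X, Z.T @ yy)  # just-identified IV
    return coef[-1]                   # gamma_F


def fuzzy_rdd_ratio(x, y, d, c, h):
    """Hypothetical helper: ratio of the jumps in y and d from linear fits on each side."""
    def jump(v):
        a = []
        for m in ((x > c - h) & (x < c), (x >= c) & (x < c + h)):
            X = np.column_stack([np.ones(m.sum()), x[m] - c])
            a.append(np.linalg.lstsq(X, v[m], rcond=None)[0][0])
        return a[1] - a[0]
    return jump(y) / jump(d)


# Assumed data: always-takers (u < 0.2), compliers (0.2 <= u < 0.7), effect 2
rng = np.random.default_rng(5)
n = 40_000
x = rng.uniform(-1, 1, n)
u = rng.uniform(size=n)
d = ((u < 0.2) | ((u < 0.7) & (x >= 0))).astype(float)
y = 1 + 0.5 * x + 2.0 * d + rng.normal(0, 0.1, n)
print(fuzzy_rdd_2sls(x, y, d, 0.0, 0.3), fuzzy_rdd_ratio(x, y, d, 0.0, 0.3))
```

The match is exact because in a just-identified model the IV coefficient equals the reduced-form coefficient on the instrument divided by the first-stage coefficient, and with $V_i$ included both of these are the jumps estimated by the side-by-side linear fits.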
RDD: Nonparametric estimation: How to select the bandwidth?
How to determine the 'local neighbourhood' around the cut-off for the local linear regression?
– Imbens, Lemieux (2008), section 5
They suggest using a modified cross-validation procedure
– modified so as to take account of the fact that we are interested in minimizing the squared error of the estimated regression functions at the boundary point c, not over the whole support
RDD: How to select the bandwidth? Sharp design
Estimate the following CV criterion: $CV(h) = \frac{1}{N} \sum_{i=1}^{N} \big(y_i - \hat\mu(x_i)\big)^2$
– but take account of the one-sided nature of the estimation problem by estimating the regression only to the left (for x_i < c) or only to the right (for x_i ≥ c) of x_i → the prediction of y_i does not depend on y_i
– Choose the bandwidth that minimizes CV(h), as it corresponds to Q(..)
– Discard observations 'too far away' from the threshold when doing this (IL suggest discarding 50% of the observations on either side of the threshold, although this seems rather arbitrary) → the sensitivity of the bandwidth choice with respect to the share of observations discarded has to be analysed
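A sketch of the one-sided cross-validation criterion, assuming a rectangular kernel, a user-supplied bandwidth grid, and IL's rule of evaluating only the half of the observations on each side that is closest to c (`keep_share=0.5`). The function names and the simulated data are hypothetical. For the fuzzy design, the same criterion would also be computed with the treatment indicator in place of y.

```python
import numpy as np


def one_sided_ll_prediction(xs, ys, x0):
    """Hypothetical helper: intercept at x0 of a linear fit of ys on (xs - x0)."""
    X = np.column_stack([np.ones(xs.size), xs - x0])
    coef, *_ = np.linalg.lstsq(X, ys, rcond=None)
    return coef[0]


def cv_criterion(x, y, c, h, keep_share=0.5):
    """Hypothetical helper: one-sided cross-validation criterion CV(h) for a sharp RDD.

    Left of c, y_i is predicted only from observations in (x_i - h, x_i);
    right of c, only from observations in (x_i, x_i + h), so y_i never
    enters its own prediction. Only the keep_share of each side closest
    to c is evaluated.
    """
    left = x < c
    lo = np.quantile(x[left], 1 - keep_share)
    hi = np.quantile(x[~left], keep_share)
    sq_errors = []
    for xi, yi in zip(x, y):
        if xi < c:
            if xi < lo:
                continue
            nb = (x > xi - h) & (x < xi)
        else:
            if xi > hi:
                continue
            nb = (x > xi) & (x < xi + h)
        if nb.sum() < 2:  # a line needs at least two points
            continue
        sq_errors.append((yi - one_sided_ll_prediction(x[nb], y[nb], xi)) ** 2)
    return np.mean(sq_errors)


# Assumed data with curvature, so that very wide bandwidths are penalised
rng = np.random.default_rng(6)
x = rng.uniform(-1, 1, 2_000)
y = np.sin(3 * x) + 1.0 * (x >= 0) + rng.normal(0, 0.3, x.size)
grid = [0.05, 0.1, 0.2, 0.4, 0.8]
cv = [cv_criterion(x, y, 0.0, h) for h in grid]
h_opt = grid[int(np.argmin(cv))]
print(dict(zip(grid, np.round(cv, 4))), "chosen h:", h_opt)
```

Rerunning with other values of `keep_share` is a simple way to carry out the sensitivity analysis mentioned above.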
RDD: How to select the bandwidth? Fuzzy design
Main difference: the bandwidth for the treatment probability as a function of the running variable has to be estimated as well → 4 nonparametric regressions
RDD: Nonparametric (local linear) estimation: Inference
Simplifying assumptions
– local linear
– rectangular kernel
– same bandwidth for all nonparametric regressions
– undersmoothing → the bias plays no role in the asymptotic distribution
Simplest strategy
– use the one-step estimators for the fuzzy and sharp designs as given before
– use standard heteroscedasticity-robust variance estimators for 2SLS (fuzzy) and OLS (sharp)
IL08 also provide an alternative estimator
– it is not clear which estimator is better → use the simplest one
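A sketch of the simplest inference strategy for the sharp design: OLS on the one-step regression within the window, with a heteroscedasticity-robust sandwich variance written out by hand. The HC1 small-sample scaling, the function name and the simulated heteroscedastic data are assumptions; for the fuzzy design, the analogous robust variance of the local 2SLS regression would be used instead.

```python
import numpy as np


def sharp_rdd_robust(x, y, c, h):
    """Hypothetical helper: one-step local linear sharp RDD with an HC1 robust SE for gamma_S."""
    win = (x > c - h) & (x < c + h)
    xc, yy = x[win] - c, y[win]
    d = (x[win] >= c).astype(float)
    X = np.column_stack([np.ones(xc.size), xc, d, xc * d])
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ (X.T @ yy)
    e = yy - X @ beta
    meat = (X * (e ** 2)[:, None]).T @ X            # sum_i e_i^2 x_i x_i'
    vcov = n / (n - k) * XtX_inv @ meat @ XtX_inv   # HC1 sandwich
    return beta[2], np.sqrt(vcov[2, 2])


# Assumed data: true effect 1.0 at c = 0, error variance larger above the cut-off
rng = np.random.default_rng(8)
x = rng.uniform(-1, 1, 20_000)
noise_sd = 0.1 + 0.3 * (x >= 0)
y = 0.5 + 0.8 * x + 1.0 * (x >= 0) + noise_sd * rng.normal(size=x.size)
tau, se = sharp_rdd_robust(x, y, 0.0, 0.3)
print(f"tau_S = {tau:.3f}, robust SE = {se:.3f}, "
      f"95% CI = [{tau - 1.96 * se:.3f}, {tau + 1.96 * se:.3f}]")
```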
RDD: Testing potential manipulation of location around the cut-off (McCrary, 2008) (1)
Key concern of RD designs: RDD can be invalid if individuals can manipulate the “assignment / running variable”
Check by analysing the distribution of X for 'heaping' on one side of the threshold → heaping implies that the density of the running variable is discontinuous
– McCrary (2008) proposes a formal test for this feature
– The test can be expected to be powerful if manipulation is monotonic
The test has two steps:
– 1st step: obtain a finely gridded histogram
– 2nd step: smooth the histogram using local linear regression, separately on either side of the cut-off
– Perform a Wald-type test
– To efficiently convey the sensitivity of the discontinuity estimate to smoothing assumptions, one may augment a graphical presentation of the second-step smoother with the first-step histogram, analogous to presenting local averages along with an estimated conditional expectation
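The two steps can be sketched as follows. This is a simplified version in the spirit of McCrary (2008), not his reference implementation: bin width and bandwidth are supplied by the user (no automatic selection), the second step uses a triangular kernel, and the standard error uses the asymptotic approximation McCrary gives for that kernel. The function name and the simulated data, including the form of manipulation, are assumptions.

```python
import numpy as np


def mccrary_sketch(x, c, bin_width, h):
    """Hypothetical helper: theta = log f(c+) - log f(c-) and an approximate SE."""
    x = np.asarray(x, dtype=float)
    n = x.size
    # Step 1: finely gridded histogram; c is a bin edge so no bin straddles it
    k_lo = int(np.floor((x.min() - c) / bin_width)) - 1
    k_hi = int(np.ceil((x.max() - c) / bin_width)) + 1
    edges = c + bin_width * np.arange(k_lo, k_hi + 1)
    counts, _ = np.histogram(x, bins=edges)
    mids = 0.5 * (edges[:-1] + edges[1:])
    heights = counts / (n * bin_width)  # normalised so heights approximate the density

    # Step 2: triangular-kernel local linear smoothing of the heights, each side
    def density_at_cutoff(side):
        u = (mids[side] - c) / h
        w = np.clip(1 - np.abs(u), 0, None)
        X = np.column_stack([np.ones(u.size), mids[side] - c])
        sw = np.sqrt(w)
        coef, *_ = np.linalg.lstsq(X * sw[:, None], heights[side] * sw, rcond=None)
        return coef[0]

    f_left = density_at_cutoff(mids < c)
    f_right = density_at_cutoff(mids > c)
    theta = np.log(f_right) - np.log(f_left)
    # approximate SE for the triangular kernel, as given by McCrary (2008)
    se = np.sqrt((24 / 5) * (1 / f_right + 1 / f_left) / (n * h))
    return theta, se


rng = np.random.default_rng(7)
x = rng.uniform(0, 1, 20_000)
theta, se = mccrary_sketch(x, c=0.5, bin_width=0.01, h=0.2)
# assumed manipulation: half of the units just below c move just above it
x_m = x.copy()
move = (x >= 0.4) & (x < 0.5) & (rng.uniform(size=x.size) < 0.5)
x_m[move] = 1.0 - x_m[move]
theta_m, se_m = mccrary_sketch(x_m, c=0.5, bin_width=0.01, h=0.2)
print(f"no manipulation: theta={theta:.3f} (se {se:.3f}); "
      f"manipulated: theta={theta_m:.3f} (se {se_m:.3f})")
```

The Wald-type test compares theta / se with standard normal critical values; plotting `heights` against `mids` together with the two fitted lines gives the graphical presentation suggested above.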
RDD: Testing potential manipulation of location around the cut-off (McCrary, 2008) (2)
Different cases
RDD: Some specification tests suggested by IL08
Any effect of the discontinuity on other (exogenous) covariates?
– other jumps usually suggest problems (because then many things change at the cut-off and the effect may come from another treatment, i.e. the local exclusion restriction does not hold)
Testing the continuity of the density of the forcing variable …
– addresses the problem of strategic clustering of units around the cut-off (test by McCrary, 2007)
Testing for jumps in the outcome variable away from the discontinuity
– such jumps are ruled out by assumption
– the idea is similar to placebo treatments (the effect is estimated where it is known to be zero)
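The placebo and covariate checks can reuse the sharp-design local linear estimator. This sketch assumes placebo cut-offs at the median of X on each side of c and simulated data; each placebo estimate uses only observations on its own side of the true cut-off, so the real discontinuity cannot leak into it. Function names are hypothetical.

```python
import numpy as np


def local_linear_jump(x, y, c, h):
    """Hypothetical helper: one-step local linear jump estimate at c (rectangular kernel)."""
    win = (x > c - h) & (x < c + h)
    xc = x[win] - c
    d = (x[win] >= c).astype(float)
    X = np.column_stack([np.ones(xc.size), xc, d, xc * d])
    coef, *_ = np.linalg.lstsq(X, y[win], rcond=None)
    return coef[2]


def placebo_jumps(x, y, c, h):
    """Hypothetical helper: jump estimates at the median of X on each side of c."""
    results = {}
    for side in (x < c, x >= c):
        xs, ys = x[side], y[side]
        p = float(np.median(xs))
        results[p] = local_linear_jump(xs, ys, p, h)
    return results


# Assumed data: true jump of 2 at c = 0, covariate w unaffected by the cut-off
rng = np.random.default_rng(9)
x = rng.uniform(-1, 1, 50_000)
y = 1 + 0.5 * x + 2.0 * (x >= 0) + rng.normal(0, 0.1, x.size)
w = rng.normal(0, 1, x.size)
print("placebo jumps:", placebo_jumps(x, y, 0.0, 0.3))
print("jump in outcome at c:", local_linear_jump(x, y, 0.0, 0.3))
print("jump in covariate at c:", local_linear_jump(x, w, 0.0, 0.3))
```

Placebo jumps close to zero and no jump in the covariate are consistent with (but do not prove) the continuity assumptions; a clear jump in either would point to a problem.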
Regression discontinuity design (RDD): Conclusion
Practical considerations (IL)
This approach is becoming more and more popular
It appears to be promising in many cases when there is no (real) instrument and no CIA
Sharp: no common support at all! → in practice some extrapolation is necessary + treatment effect homogeneity close to the cut-off → marginal matching estimator
Need enough observations close to the cut-off
Rules based on small integers apparently are more difficult to justify, since they have basically no observations local to the cut-off → quality of the approximation?
Fuzzy rule (probabilities not strictly zero or one) → changes selection on observables to selection on unobservables → the approach gets more complicated → marginal (local) IV (LATE)
A very nice, recent, and not too difficult to understand survey of methods and applications is also provided by Wilbert van der Klaauw (2008, Labour)