Lecture 7

The Regression Discontinuity Design

RDD: fuzzy and sharp

Michael Lechner, Causal Analysis, 2014


Regression Discontinuity Design (RDD): Introduction (1)

The RDD is a quasi-experimental design with the defining characteristic that the probability of receiving treatment changes discontinuously at some point, while the potential outcomes remain stable in a neighbourhood around this point. Observations in this neighbourhood with and without treatment may therefore be good comparisons for each other.

Example: If a training programme is only available to youths not older than 25, then, at least in the year the programme is introduced, similar nonparticipants of age 26 may be a good comparison group for participants of age 25. (The effect of the age difference itself can be deduced, e.g., from comparing ages 24 and 25, 25 and 26, or 26 and 27; these effects may be used to adjust for the additional year that comes jointly with being in one or the other group.)


Advantage: Experiment around a threshold

Common threats to internal validity:
– Since the location of the threshold is usually not truly experimental, units may manipulate their location with respect to the threshold
– The threshold may be relevant for other treatments as well
(sensitivity checks are available, see the last slides)

Common threat to external validity: Units around the threshold may not be representative of the treated or of the population at large


First applications
– Thistlethwaite and Campbell (1960) study the effect of student scholarships on career aspirations
• awards are only made if a test score exceeds a threshold
– See Cook (2008) for the history of RDD in economics, psychology, and statistics

More recent applications in economics
– Class size on attainment by pupils (Angrist and Lavy, 1996: if the average class size is more than XX pupils, the class is split)
– Anti-discrimination laws on the share of minority employees (Hahn, Todd, van der Klaauw, 1999: firms are covered by the law if they have more than 15 employees; the share of minorities increases!)
– ... many more (see the surveys by Cook, 2008; Lee & Lemieux, 2009; Van der Klaauw, 2008)

Method of the day (?): Still pretty hot (and perhaps overvalued) (2008 special issue of the Journal of Econometrics)


The Regression Discontinuity Design (RDD): Introduction (4)

Example: Angrist, Lavy (QJE, 1999)
– What is the effect of class size on pupils' performance?
– All sorts of endogeneity problems (richer schools have smaller classes, etc.)
– Israel has a rule that a class is split into two (3, 4, etc.) if the number of students is above certain thresholds
– Whether the number of students is just below or above a threshold is exogenous (not related to outcomes other than by leading to different class sizes)
– Compare outcomes of classes just above and below the threshold to obtain the effect of class size


A note on estimation

Nonparametric estimation close to cut-off points (kernel regression may perform poorly; local linear regression may be better)

Regression type estimator if additional assumptions are used

Because identification is local, there cannot be sqrt(N)-convergent estimators without further assumptions (no averaging of nonparametric estimates as in matching or IV)

Key: Need enough data close enough to threshold


Two distinct types of RDD: Sharp design [figure not reproduced]

Two distinct types of RDD: Fuzzy design [figure not reproduced]


RDD: Sharp design (1)

X: (continuous) variable that defines the threshold (forcing variable)

c: threshold

Treatment is a deterministic function of X:

$$D = 1(X \ge c)$$

Quantity typically identified:

$$\tau^S = E[Y(1) - Y(0) \mid X = c]$$

Identification is typically achieved by showing that these expectations can be learned from the observed outcome just above and below the threshold:

$$E[Y(1) \mid X = c] = \lim_{x \downarrow c} E[Y \mid X = x]$$

$$E[Y(0) \mid X = c] = \lim_{x \uparrow c} E[Y \mid X = x]$$

Note that the CIA, $Y(1), Y(0) \perp\!\!\!\perp D \mid X = c$, is fulfilled at the threshold in a trivial way
– since D is a deterministic function of X, there is no variation of treatment left once X is given, so it must hold
– but there is no common support in the sharp design, so the CIA cannot be exploited directly: matching is not possible and extrapolation around the cut-off is needed

A smoothness condition for the potential outcomes (that allows some local extrapolation) is therefore required


RDD: Sharp design (2)

Conditional expectations of the potential outcomes have to be smooth

Here we assume that the conditional distributions of the potential outcomes are smooth in x:

$$F_{Y(0)|X}(y, x) \text{ and } F_{Y(1)|X}(y, x) \text{ are continuous in } x \text{ for all } y$$

– in practice, continuity in a neighbourhood around c is enough

From this assumption we obtain identification:

$$E[Y(0) \mid X = c] = \lim_{x \uparrow c} E[Y(0) \mid X = x] = \lim_{x \uparrow c} E[Y(0) \mid D = 0, X = x] = \lim_{x \uparrow c} E[Y \mid X = x]$$

$$E[Y(1) \mid X = c] = \lim_{x \downarrow c} E[Y(1) \mid X = x] = \lim_{x \downarrow c} E[Y(1) \mid D = 1, X = x] = \lim_{x \downarrow c} E[Y \mid X = x]$$

Estimate regression curves and evaluate their difference at X=c

However, as c is a boundary point for both regressions, this is non-trivial
– since non-parametric estimation is local to the threshold, estimators will converge slower than sqrt(N) [i.e. slower than standard IV or matching]
– typical non-parametric regression estimators are not well behaved at the boundary


RDD: Fuzzy design (1)

Whereas in the sharp design the treatment probability conditional on X jumps from 0 to 1 (so everybody close to the threshold is a complier), in the fuzzy design there is less (local) compliance, as this probability jumps by a smaller amount:

$$\lim_{x \uparrow c} P[D = 1 \mid X = x] \ne \lim_{x \downarrow c} P[D = 1 \mid X = x]$$

Such a situation may typically arise when the incentives to participate in D jump at the threshold (but (non-)participation is not enforced)

In this case the threshold acts like an instrument. Under the usual IV-type assumptions, the following estimand is obtained for the local complier treatment effect:

$$\tau^F = \frac{\lim_{x \downarrow c} E[Y \mid X = x] - \lim_{x \uparrow c} E[Y \mid X = x]}{\lim_{x \downarrow c} P[D = 1 \mid X = x] - \lim_{x \uparrow c} P[D = 1 \mid X = x]}$$


RDD: Fuzzy design (2)

Assuming that the cut-off can be marginally manipulated, we need the following assumptions for a LATE-type interpretation of the identified effect

– $F_{Y(0)|X}(y, x)$ and $F_{Y(1)|X}(y, x)$ are continuous in x for all y (continuity)
– D(x) is non-decreasing in x at x=c (monotonicity)
– A complier (at c) is defined as: $\lim_{x \uparrow c} P[D = 1 \mid X = x] = 0$ and $\lim_{x \downarrow c} P[D = 1 \mid X = x] = 1$
– Nevertaker (at c): $\lim_{x \uparrow c} P[D = 1 \mid X = x] = 0$ and $\lim_{x \downarrow c} P[D = 1 \mid X = x] = 0$
– Always taker (at c): $\lim_{x \uparrow c} P[D = 1 \mid X = x] = 1$ and $\lim_{x \downarrow c} P[D = 1 \mid X = x] = 1$

Note that an explicit exclusion restriction is not necessary, because continuity of the potential outcomes in the neighbourhood of the cut-off, combined with a discrete jump of the treatment probability at the cut-off, implicitly acts like an exclusion restriction

Under these assumptions the fuzzy RDD estimates the effect for those observations located just around the cut-off that would switch treatment status when moving across the cut-off


RDD: External vs. internal validity

SRDD: Valid only for observations at the cut-off

FRDD: Valid only for compliers at the cut-off

ATE, ATET, etc. are only identified by extrapolation of the effects

Thus, RDD explicitly trades off external validity for internal validity

However, sometimes these compliers may represent politically interesting subpopulations, for example when they correspond to the marginal population that is affected if the treatment is (to some extent) expanded or contracted

– Regional variation of programme assignment probabilities: Frölich & Lechner (2010)
– Using unaffected firms (usually small firms are not subject to the same tough rules as larger firms) to evaluate firing restrictions and disability laws


RDD: Verifying the assumptions by some graphs

As RDD emphasises the distinction between a continuous change of the potential outcomes and a discrete jump of the treatment probability (leading to a discrete jump of the observed outcome if there is any effect), graphical analysis is a powerful tool to falsify these assumptions

Imbens and Lemieux (2008) suggest plotting the following
– Mean values of the outcomes in fixed bins around the cut-off
• there should be some jump at c if there is any effect, and no other jumps that cannot be directly justified
– Mean values of the treatment variable in fixed bins around the cut-off
• there should be some jump at c, and no other jumps that cannot be directly justified
– Mean values of other covariates in fixed bins around the cut-off
• there should be no jump at c, because otherwise the cut-off may also have a direct impact on Y via changes in other covariates
– A histogram of the forcing variable around the cut-off
• any clustering close to c might suggest strategic location above or below the cut-off, which would violate the implied assumption that location is random (at least close to c)
• it also shows whether there are enough observations for a local nonparametric analysis
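The binned means behind these plots can be computed with a few lines. The following is a minimal sketch on simulated data; the function name `binned_means`, the data-generating process, and the bin settings are illustrative assumptions, not taken from Imbens and Lemieux (2008). The only design choice that matters is that bin edges are aligned with c, so that no bin mixes observations from both sides of the cut-off.

```python
import math
import random


def binned_means(xs, vs, c, width, n_bins):
    """Mean of v in n_bins bins of the given width on each side of c.

    Bin edges are aligned with c, so no bin straddles the cut-off.
    Returns (bin midpoint, mean, count) tuples, skipping empty bins.
    """
    sums, counts = {}, {}
    for x, v in zip(xs, vs):
        k = math.floor((x - c) / width)   # k = -1 is the bin just left of c
        if -n_bins <= k < n_bins:
            sums[k] = sums.get(k, 0.0) + v
            counts[k] = counts.get(k, 0) + 1
    return [(c + (k + 0.5) * width, sums[k] / counts[k], counts[k])
            for k in sorted(sums)]


def simulate(n, c=0.0, tau=2.0, seed=4):
    """Hypothetical sharp design: linear outcome with a jump of tau at c."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [1.0 + 0.8 * x + tau * (x >= c) + rng.gauss(0.0, 0.3) for x in xs]
    return xs, ys


xs, ys = simulate(5000)
for mid, mean, count in binned_means(xs, ys, c=0.0, width=0.1, n_bins=5):
    print(f"{mid:+.2f}  {mean:.3f}  (n={count})")
```

The same function can be applied to the treatment indicator or to any covariate in place of the outcome; the count column doubles as a coarse histogram of the forcing variable.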


RDD: Estimation (Imbens, Lemieux 2008)

Ideally, we would choose a subsample with observations clustered very, very closely around the cut-off point and then compute the respective sample means

Usually, this will not work because the sample size becomes too small to be useful, so a larger window is needed

Since X influences Y, a larger window usually means that values of X above and below the cut-off become more dissimilar, and the exclusion restriction becomes critical

Therefore, control for changes in X by (nonparametric) regression above and below the threshold
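A small worked illustration of why the window size matters, under assumptions that are not in the slides: suppose that locally E[Y(0)|X=x] = a + b(x − c), that the effect is a constant τ, and that X is uniformly distributed on (c − h, c + h). The difference of the raw window means is then

$$\bar{Y}_{+} - \bar{Y}_{-} \approx \left(a + \tau + \tfrac{b h}{2}\right) - \left(a - \tfrac{b h}{2}\right) = \tau + b h$$

so the simple comparison is off by roughly bh. This bias vanishes only as h shrinks, which is exactly when few observations remain. In this linear case, a regression on (x − c) on each side removes the bh term.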


RDD: Nonparametric estimation in the sharp design (1)

We must estimate E(Y|X) above and below the cut-off and compare the predicted values of both non-parametric regressions at c

– Problem: Prediction of a single point of two np-regressions (no sqrt(N)-convergent np-estimator available), and this point is a boundary point
– Standard kernel-type regressions do not perform well close to the boundary (slower rate of convergence than at interior points)
– See Imbens, Lemieux (2008), p. 624 for the analytical derivation of these problems in a special case

The literature suggests particular estimators that seem to be fairly robust in such specific situations

– Global nonparametric methods (series estimation etc.)
• not very attractive, as only the region around the cut-off is of key importance (observations far away from the cut-off may influence the results)
– Local linear regression
• more robust to boundary problems than kernel regression, but still a local approach (observations far away from the cut-off have almost no impact on the results)


RDD: Local linear regression in the sharp design (1)

In a local region on both sides of the cut-off, estimate linear regressions:

$$(\hat\alpha_0, \hat\beta_0) = \arg\min_{\alpha_0, \beta_0} \sum_{i:\, c-h < x_i < c} \left(y_i - \alpha_0 - \beta_0 (x_i - c)\right)^2$$

$$(\hat\alpha_1, \hat\beta_1) = \arg\min_{\alpha_1, \beta_1} \sum_{i:\, c < x_i < c+h} \left(y_i - \alpha_1 - \beta_1 (x_i - c)\right)^2$$

For the sharp design, these estimates are then evaluated at x=c and the difference is taken:

$$\hat\tau^S = \hat\alpha_1 + \hat\beta_1 (c - c) - \hat\alpha_0 - \hat\beta_0 (c - c) = \hat\alpha_1 - \hat\alpha_0$$

Standard regression inference may be valid

Non-linear regressions (logit etc.) may be used (depending on the outcome variable)

We obtain numerically the same estimate if we use the following one-step estimator:

$$(\hat\alpha, \hat\beta, \hat\gamma^S, \hat\delta) = \arg\min_{\alpha, \beta, \gamma, \delta} \sum_{i=1}^{N} 1(c-h < x_i < c+h) \left(y_i - \alpha - \beta (x_i - c) - \gamma d_i - \delta (x_i - c) d_i\right)^2$$
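The numerical equivalence of the two approaches can be checked on simulated data. The sketch below is only an illustration under assumed settings: the data-generating process, the helper `ols` (a small normal-equations solver written here to avoid external libraries), and all parameter values are hypothetical and not part of the slides.

```python
import random


def ols(rows, ys):
    """OLS coefficients from the normal equations (fine for tiny designs)."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * y for r, y in zip(rows, ys))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(k):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [u - f * v for u, v in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


def simulate_sharp(n, c=0.0, tau=2.0, seed=0):
    """Hypothetical data: piecewise linear outcome with a jump of tau at c."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        d = 1.0 if x >= c else 0.0
        y = 1.0 + 0.8 * (x - c) + tau * d + 0.5 * (x - c) * d + rng.gauss(0.0, 0.3)
        data.append((x, d, y))
    return data


def two_regressions(data, c, h):
    """Separate linear fits left and right of c; returns alpha1_hat - alpha0_hat."""
    left = [(x, y) for x, _, y in data if c - h < x < c]
    right = [(x, y) for x, _, y in data if c < x < c + h]
    a0 = ols([[1.0, x - c] for x, _ in left], [y for _, y in left])[0]
    a1 = ols([[1.0, x - c] for x, _ in right], [y for _, y in right])[0]
    return a1 - a0


def one_step(data, c, h):
    """Pooled regression on (1, x-c, d, (x-c)d); returns the coefficient on d."""
    win = [(x, d, y) for x, d, y in data if c - h < x < c + h]
    rows = [[1.0, x - c, d, (x - c) * d] for x, d, _ in win]
    return ols(rows, [y for _, _, y in win])[2]


data = simulate_sharp(2000)
print(two_regressions(data, 0.0, 0.5), one_step(data, 0.0, 0.5))
```

Both numbers agree up to rounding error because the fully interacted pooled regression spans the same fits as the two separate regressions.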


RDD: The role of additional covariates in the sharp design

Usually not necessary

But they can be used to
– Reduce the variance of the estimator (although there may be a dimensionality problem)
– Reduce the bias of the estimator if observations away from the cut-off have to be used (because of sample size considerations)
– Check for discontinuities in those variables
– Define the sample (if the discontinuity is only valid in a subsample)


RDD: Estimation in the fuzzy design (1)

Here we have to estimate a ratio

We estimate the effect of the discontinuity on the outcome in the same way as for the sharp design

In a similar fashion (or with a local logit or probit) we estimate the effect of the discontinuity on the treatment probability:

$$(\hat\alpha_{D,0}, \hat\beta_{D,0}) = \arg\min_{\alpha_{D,0}, \beta_{D,0}} \sum_{i:\, c-h < x_i < c} \left(d_i - \alpha_{D,0} - \beta_{D,0} (x_i - c)\right)^2$$

$$(\hat\alpha_{D,1}, \hat\beta_{D,1}) = \arg\min_{\alpha_{D,1}, \beta_{D,1}} \sum_{i:\, c < x_i < c+h} \left(d_i - \alpha_{D,1} - \beta_{D,1} (x_i - c)\right)^2$$

The final estimator is then given by

$$\hat\gamma^F = \frac{\hat\alpha_1 - \hat\alpha_0}{\hat\alpha_{D,1} - \hat\alpha_{D,0}}$$
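As an illustration, the ratio estimator can be written as two calls to the same "jump" routine, once for the outcome and once for the treatment indicator. The sketch below makes several assumptions that are not in the slides: a simulated design in which the take-up probability jumps from 0.2 to 0.7 at c, an unobservable `v` that drives both take-up and the outcome, and a constant effect of 2.

```python
import random


def boundary_intercept(pairs):
    """Intercept of a simple linear regression of v on u, where u = x - c."""
    n = len(pairs)
    mu = sum(u for u, _ in pairs) / n
    mv = sum(v for _, v in pairs) / n
    slope = (sum((u - mu) * (v - mv) for u, v in pairs)
             / sum((u - mu) ** 2 for u, _ in pairs))
    return mv - slope * mu


def jump_at_cutoff(xs, vs, c, h):
    """Difference of the two local linear fits of v on x, evaluated at c."""
    left = [(x - c, v) for x, v in zip(xs, vs) if c - h < x < c]
    right = [(x - c, v) for x, v in zip(xs, vs) if c < x < c + h]
    return boundary_intercept(right) - boundary_intercept(left)


def fuzzy_rdd(xs, ds, ys, c, h):
    """Ratio of the outcome jump to the treatment-probability jump."""
    return jump_at_cutoff(xs, ys, c, h) / jump_at_cutoff(xs, ds, c, h)


def simulate_fuzzy(n, c=0.0, tau=2.0, seed=1):
    """Hypothetical data: take-up probability jumps from 0.2 to 0.7 at c."""
    rng = random.Random(seed)
    xs, ds, ys = [], [], []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        v = rng.random()  # unobserved reluctance to participate
        d = 1.0 if v < (0.7 if x >= c else 0.2) else 0.0
        # v also shifts the outcome, so participants differ in unobservables.
        y = 1.0 + 0.8 * x + tau * d + (0.5 - v) + rng.gauss(0.0, 0.3)
        xs.append(x)
        ds.append(d)
        ys.append(y)
    return xs, ds, ys


xs, ds, ys = simulate_fuzzy(10000)
print("jump in P(D=1):", jump_at_cutoff(xs, ds, 0.0, 0.5))
print("fuzzy RDD estimate:", fuzzy_rdd(xs, ds, ys, 0.0, 0.5))
```

In this simulated design the units with v between 0.2 and 0.7 are the compliers at the cut-off, and the ratio recovers their effect even though participation is selected on v.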


RDD: Estimation in the fuzzy design (2)

Because of the particular implementation
– rectangular kernel
– same bandwidth for numerator and denominator
… this estimator is identical to the following (local) 2SLS estimator (see IL08)

– define
$$V_i = \begin{pmatrix} 1 \\ 1(x_i < c)(x_i - c) \\ 1(x_i \ge c)(x_i - c) \end{pmatrix}; \quad \delta = \begin{pmatrix} \alpha_0 \\ \beta_0 \\ \beta_1 \end{pmatrix}$$
– regression function: $y_i = \delta' V_i + \gamma^F d_i + error_i$
– endogenous variable: D
– excluded instrument: $1(x_i \ge c)$
– observations used for estimation: $\{x_i \mid c - h < x_i < c + h\}$
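The identity between the ratio estimator and local 2SLS can be verified numerically. The following sketch uses a hypothetical simulated fuzzy design and a small hand-written `ols` helper; 2SLS is implemented here as two explicit OLS stages, which gives the correct point estimate (but not correct standard errors).

```python
import random


def ols(rows, ys):
    """OLS coefficients from the normal equations (fine for tiny designs)."""
    k = len(rows[0])
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * y for r, y in zip(rows, ys))] for i in range(k)]
    for col in range(k):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(k):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [u - f * v for u, v in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


def simulate_fuzzy(n, c=0.0, tau=2.0, seed=1):
    """Hypothetical data: take-up probability jumps from 0.2 to 0.7 at c."""
    rng = random.Random(seed)
    xs, ds, ys = [], [], []
    for _ in range(n):
        x, v = rng.uniform(-1.0, 1.0), rng.random()
        d = 1.0 if v < (0.7 if x >= c else 0.2) else 0.0
        xs.append(x)
        ds.append(d)
        ys.append(1.0 + 0.8 * x + tau * d + (0.5 - v) + rng.gauss(0.0, 0.3))
    return xs, ds, ys


def local_2sls(xs, ds, ys, c, h):
    """Two explicit OLS stages on the window; returns the coefficient on d."""
    keep = [i for i, x in enumerate(xs) if c - h < x < c + h]
    # V_i: constant plus separate slopes on either side of c.
    v = [[1.0, (xs[i] - c) if xs[i] < c else 0.0,
          (xs[i] - c) if xs[i] >= c else 0.0] for i in keep]
    z = [1.0 if xs[i] >= c else 0.0 for i in keep]  # excluded instrument
    rows1 = [vi + [zi] for vi, zi in zip(v, z)]
    first = ols(rows1, [ds[i] for i in keep])
    d_hat = [sum(b * r for b, r in zip(first, row)) for row in rows1]
    second = ols([vi + [dh] for vi, dh in zip(v, d_hat)], [ys[i] for i in keep])
    return second[3]


def wald_ratio(xs, ds, ys, c, h):
    """Ratio of the outcome jump to the treatment jump from separate fits."""
    def intercept(vals, lo, hi):
        pts = [(x, val) for x, val in zip(xs, vals) if lo < x < hi]
        return ols([[1.0, x - c] for x, _ in pts], [val for _, val in pts])[0]

    def jump(vals):
        return intercept(vals, c, c + h) - intercept(vals, c - h, c)

    return jump(ys) / jump(ds)


xs, ds, ys = simulate_fuzzy(4000)
print(local_2sls(xs, ds, ys, 0.0, 0.5), wald_ratio(xs, ds, ys, 0.0, 0.5))
```

The equality relies on the rectangular kernel and the common bandwidth; with different bandwidths for numerator and denominator the two numbers would generally differ.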


RDD: Nonparametric estimation: How to select the bandwidth?

How to determine the 'local neighbourhood' around the cut-off for the local linear regression?

– Imbens, Lemieux (2008), section 5

They suggest using a modified cross-validation procedure
– Modified so as to take account of the fact that we are interested in minimizing a criterion Q(..) that measures the estimation error at the cut-off


RDD: How to select the bandwidth? Sharp design

Estimate the following CV criterion

– But take account of the one-sided nature of the estimation problem by estimating the regression to the left or to the right of x_i only, so that the prediction of y_i does not depend on y_i
– Choose the bandwidth that minimizes CV(h), as it corresponds to Q(..)
– Discard observations 'too far away' from the threshold when doing this (IL suggest discarding 50% of the obs. on either side of the threshold, although this seems rather arbitrary); the sensitivity of the bandwidth choice with respect to the share of obs. discarded has to be analysed
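The procedure can be sketched as follows. The sketch assumes that the criterion is the average squared one-sided prediction error, CV(h) = (1/N) Σ (y_i − ŷ_h(x_i))², which is how the idea is usually stated; the function names, the simulated curved outcome, and the bandwidth grid are hypothetical.

```python
import math
import random


def predict_at_boundary(pairs, x0):
    """Linear fit to (x, y) pairs evaluated at x0; None if no fit is possible."""
    n = len(pairs)
    if n < 3:
        return None
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    if sxx == 0.0:
        return None
    slope = sum((x - mx) * (y - my) for x, y in pairs) / sxx
    return my + slope * (x0 - mx)


def cv_criterion(data, c, h, keep_share=0.5):
    """Mean squared one-sided prediction error for bandwidth h."""
    left = sorted(p for p in data if p[0] < c)
    right = sorted(p for p in data if p[0] >= c)
    # Evaluate only at the observations closest to c on either side.
    targets = (left[int(len(left) * (1 - keep_share)):]
               + right[:int(len(right) * keep_share)])
    errors = []
    for x0, y0 in targets:
        if x0 < c:  # x0 acts as a right boundary: use points strictly to its left
            pairs = [(x, y) for x, y in left if x0 - h < x < x0]
        else:       # x0 acts as a left boundary: use points strictly to its right
            pairs = [(x, y) for x, y in right if x0 < x < x0 + h]
        pred = predict_at_boundary(pairs, x0)
        if pred is not None:
            errors.append((y0 - pred) ** 2)
    return sum(errors) / len(errors)


def choose_bandwidth(data, c, grid):
    """Bandwidth on the grid with the smallest CV criterion."""
    return min(grid, key=lambda h: cv_criterion(data, c, h))


def simulate_curved(n, c=0.0, seed=2):
    """Hypothetical data: curved regression function with a jump of 1 at c."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        y = math.sin(4.0 * x) + (1.0 if x >= c else 0.0) + rng.gauss(0.0, 0.2)
        data.append((x, y))
    return data


data = simulate_curved(600)
grid = [0.1, 0.2, 0.3, 0.5, 1.0]
for h in grid:
    print(h, round(cv_criterion(data, 0.0, h), 4))
print("chosen bandwidth:", choose_bandwidth(data, 0.0, grid))
```

Rerunning the last lines with a different `keep_share` is a simple way to carry out the sensitivity analysis mentioned in the last bullet.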


RDD: How to select the bandwidth? Fuzzy design

Main difference: The bandwidth for the treatment probability as a function of the running variable has to be estimated as well, giving 4 nonparametric regressions


RDD: Nonparametric (local linear) estimation: Inference

Simplifying assumptions
– local linear
– rectangular kernel
– same bandwidth for all nonparametric regressions
– undersmoothing, so that the bias plays no role in the asymptotic distribution

Simplest strategy
– use the one-step estimators for the fuzzy and sharp design as given before
– use standard heteroscedasticity-robust estimators for 2SLS (fuzzy) and OLS (sharp)

IL08 also provide an alternative estimator
– it is not clear which estimator is better, so use the simplest one
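For the sharp design, the robust standard error can be sketched without matrix algebra, because each boundary intercept is a weighted sum of the outcomes on its side of the cut-off. The code below assumes the HC0 (White) form without small-sample correction and uses a hypothetical heteroscedastic simulation; it is an illustration, not the variance estimator derived in IL08.

```python
import math
import random


def side_fit(pairs):
    """Boundary intercept of a linear fit in u = x - c, with its HC0 variance."""
    n = len(pairs)
    mu = sum(u for u, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    suu = sum((u - mu) ** 2 for u, _ in pairs)
    slope = sum((u - mu) * (y - my) for u, y in pairs) / suu
    alpha = my - slope * mu
    # alpha is linear in y: alpha = sum_i w_i * y_i with the weights w below,
    # so its HC0 variance is sum_i (w_i * residual_i)^2.
    var = 0.0
    for u, y in pairs:
        w = 1.0 / n - mu * (u - mu) / suu
        resid = y - alpha - slope * u
        var += (w * resid) ** 2
    return alpha, var


def sharp_rdd_with_se(xs, ys, c, h):
    """Sharp RDD estimate and robust standard error (the two sides are independent)."""
    a0, v0 = side_fit([(x - c, y) for x, y in zip(xs, ys) if c - h < x < c])
    a1, v1 = side_fit([(x - c, y) for x, y in zip(xs, ys) if c < x < c + h])
    return a1 - a0, math.sqrt(v0 + v1)


def simulate_hetero(n, c=0.0, tau=2.0, seed=5):
    """Hypothetical data: noise standard deviation grows with |x|."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [1.0 + 0.8 * x + tau * (x >= c) + rng.gauss(0.0, 0.2 + 0.4 * abs(x))
          for x in xs]
    return xs, ys


xs, ys = simulate_hetero(2000)
tau_hat, se = sharp_rdd_with_se(xs, ys, c=0.0, h=0.5)
print(f"estimate {tau_hat:.3f}, robust s.e. {se:.3f}")
```

Because the one-step regression is fully interacted, this sum of the two side-specific variances is the HC0 variance of its coefficient on d.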


RDD: Testing potential manipulation of location around cut-off (McCrary, 2008) (1)

Key concern of RD designs: RDD can be invalid if individuals can manipulate the assignment (running) variable.

Check by analysing the distribution of X for 'heaping' on one side of the threshold; heaping implies that the density of the running variable is discontinuous

– McCrary (2008) proposes a formal test for this feature
– The test can be expected to be powerful if manipulation is monotonic

The test has two steps:
– 1st step: obtain a finely gridded histogram.
– 2nd step: smooth the histogram using local linear regression, separately on either side of the cut-off.
– Perform a Wald-type test
– To efficiently convey the sensitivity of the discontinuity estimate to smoothing assumptions, one may augment a graphical presentation of the second-step smoother with the first-step histogram, analogous to presenting local averages along with an estimated conditional expectation
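The two steps can be sketched in simplified form. The code below is not McCrary's estimator: it uses an unweighted (rectangular) local linear fit to the histogram heights and omits the Wald step and its standard error. The function names, the bin width, the bandwidth, and the simulated "sorting" behaviour are all hypothetical. It returns the estimated log difference of the density just right and just left of c.

```python
import math
import random


def boundary_fit(points):
    """Intercept at 0 of a linear fit to (distance from cut-off, height) points."""
    n = len(points)
    mu = sum(u for u, _ in points) / n
    mf = sum(f for _, f in points) / n
    slope = (sum((u - mu) * (f - mf) for u, f in points)
             / sum((u - mu) ** 2 for u, _ in points))
    return mf - slope * mu


def log_density_jump(xs, c, bin_width, h):
    """ln f(c+) - ln f(c-) from a histogram smoothed separately on each side."""
    n_bins = round(h / bin_width)
    counts = {k: 0 for k in range(-n_bins, n_bins)}
    for x in xs:
        k = math.floor((x - c) / bin_width)
        if k in counts:
            counts[k] += 1
    # Step 1: histogram heights at bin midpoints (bin edges aligned with c).
    heights = {k: cnt / (len(xs) * bin_width) for k, cnt in counts.items()}
    # Step 2: separate linear smoothers on either side, evaluated at c.
    left = [((k + 0.5) * bin_width, f) for k, f in heights.items() if k < 0]
    right = [((k + 0.5) * bin_width, f) for k, f in heights.items() if k >= 0]
    return math.log(boundary_fit(right)) - math.log(boundary_fit(left))


def running_variable(n, manipulate, seed=3):
    """Hypothetical forcing variable, optionally with sorting across the cut-off."""
    rng = random.Random(seed)
    xs = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        # Sorting: half of the units just below 0 relocate to just above 0.
        if manipulate and -0.1 < x < 0.0 and rng.random() < 0.5:
            x = -x
        xs.append(x)
    return xs


print("no manipulation:", log_density_jump(running_variable(20000, False), 0.0, 0.02, 0.5))
print("manipulation:   ", log_density_jump(running_variable(20000, True), 0.0, 0.02, 0.5))
```

A value close to zero is consistent with a continuous density; a clearly positive value points to heaping just above the cut-off.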


RDD: Testing potential manipulation of location around cut-off (McCrary, 2008) (2)

Different cases [figures not reproduced]


RDD: Some specification tests suggested by IL08

Any effect of the discontinuity on other (exogenous) covariates?
– other jumps usually suggest problems (because then many things change at the cut-off and the effect may come from another treatment, i.e. the local exclusion restriction does not hold)

Testing the continuity of the density of the forcing variable …
– addresses the problem of strategic clustering of units around the cut-off (test by McCrary, 2008)

Testing for jumps in the outcome variable away from the discontinuity
– such jumps are ruled out by assumption
– idea similar to placebo treatments (the effect is estimated where it is known to be zero)
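The placebo idea in the last item can be sketched as follows. The sketch assumes that the placebo cut-offs are placed at the median of X on each side of the true cut-off and that only data from that side are used; the simulated data and the bandwidth are hypothetical.

```python
import random
import statistics


def boundary_intercept(pairs):
    """Intercept of a simple linear regression of y on u, where u = x - cut."""
    n = len(pairs)
    mu = sum(u for u, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    slope = (sum((u - mu) * (y - my) for u, y in pairs)
             / sum((u - mu) ** 2 for u, _ in pairs))
    return my - slope * mu


def jump(xs, ys, cut, h):
    """Local linear estimate of the jump in E[Y|X=x] at the point cut."""
    left = [(x - cut, y) for x, y in zip(xs, ys) if cut - h < x < cut]
    right = [(x - cut, y) for x, y in zip(xs, ys) if cut < x < cut + h]
    return boundary_intercept(right) - boundary_intercept(left)


def placebo_jumps(xs, ys, c, h):
    """Jumps at the median of X on each side of c, using that side's data only."""
    out = {}
    for name, on_side in (("left", lambda x: x < c), ("right", lambda x: x >= c)):
        sub = [(x, y) for x, y in zip(xs, ys) if on_side(x)]
        fake = statistics.median(x for x, _ in sub)
        out[name] = (fake, jump([x for x, _ in sub], [y for _, y in sub], fake, h))
    return out


def simulate(n, c=0.0, tau=2.0, seed=6):
    """Hypothetical sharp design with a single jump of tau at c."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [1.0 + 0.8 * x + tau * (x >= c) + rng.gauss(0.0, 0.3) for x in xs]
    return xs, ys


xs, ys = simulate(4000)
print("jump at true cut-off:", jump(xs, ys, 0.0, 0.3))
for side, (fake, est) in placebo_jumps(xs, ys, 0.0, 0.3).items():
    print(f"placebo cut-off on the {side} ({fake:+.2f}):", est)
```

Restricting each placebo regression to one side of c keeps the true discontinuity out of the estimation window, so a sizeable placebo jump would indicate a problem with the smoothness assumption rather than the treatment effect itself.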


Regression discontinuity design (RDD): Conclusion

Practical considerations (IL)

This approach is becoming more and more popular

Appears to be promising in many cases when there is no (real) instrument and no CIA

Sharp: No common support at all! In practice, some extrapolation is necessary, plus treatment effect homogeneity close to the cut-off; marginal matching estimator

Need enough observations close to the cut-off

Rules based on small integers are apparently more difficult to justify, since they have basically no observations local to the cut-off (quality of the approximation?)

Fuzzy rule (probabilities not strictly zero or one): changes selection on observables to selection on unobservables; the approach gets more complicated; marginal (local) IV (LATE)

A very nice, recent, and not too difficult to understand survey of methods and applications is also provided by Wilbert van der Klaauw (2008, Labour)
