Causality in Econometrics and Statistics
Guido Imbens (Stanford University)
University of Virginia
Charlottesville, February 9th, 2018
1. Causality
2. Randomized Experiments
3. Observational Studies: Matching Methods
4. Observational Studies: Instrumental Variables Methods
1
1. Causality
Three key notions underlie the Rubin Causal Model (RCM), or Potential Outcome, approach to causality (Holland, 1986).

First, potential outcomes, one corresponding to each of the various levels of a treatment or manipulation.

Second, the presence of multiple units, and the related stability assumption.

Third, the central role of the assignment mechanism, which is crucial for inferring causal effects and serves as the organizing principle.
2
1.1 Potential Outcomes
Given a unit and a set of actions, we associate each action/unit pair with a potential outcome: "potential" because only one of them, the one corresponding to the action actually taken at that time, will ultimately be realized and therefore possibly observed.

The causal effect of the action or treatment involves the comparison of these potential outcomes, e.g., Y(1) − Y(0).

Y(0) denotes the outcome given the control treatment and Y(1) the outcome given the active treatment. W ∈ {0,1} is the treatment indicator; we observe W and

Y^obs = Y(W) = W · Y(1) + (1 − W) · Y(0)
3
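As a minimal numerical illustration of this observed-outcome identity (hypothetical numbers, not data from the talk; numpy is assumed to be available):

```python
import numpy as np

# Hypothetical potential outcomes for six units (illustrative only).
y0 = np.array([66., 0., 0., 0., 500., 400.])   # Y_i(0)
y1 = y0 + 50.0                                  # Y_i(1), assuming a constant effect of 50
w = np.array([0, 0, 0, 1, 1, 1])                # treatment indicator W_i

# Only one potential outcome per unit is realized and hence observed:
y_obs = w * y1 + (1 - w) * y0
print(y_obs)   # [ 66.   0.   0.  50. 550. 450.]
```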
Is this useful?
• The potential outcome notion is consistent with the way economists think about demand and supply functions: quantities demanded and supplied at different prices.

• Some causal questions become trickier: for example, the causal effect of race on economic outcomes. One solution is to make the manipulation precise: change the names on CVs sent out for job applications (Bertrand and Mullainathan, 2004).

• What is the causal effect of physical appearance, height, or gender on earnings (e.g., Hamermesh and Biddle, 1994)? There are strong statistical correlations, but what do they mean? Many manipulations are possible, probably all with different causal effects.
4
1.2 Multiple Units
Because we cannot learn about causal effects from a single observed outcome, we must rely on multiple units exposed to different treatments to make causal inferences.

By itself, however, the presence of multiple units does not solve the problem of causal inference. Consider a drug (aspirin) example with two units (you and I) and two possible treatments for each unit (aspirin or no aspirin).

There are now a total of four treatment combinations: you take an aspirin and I do not, I take an aspirin and you do not, we both take an aspirin, or neither of us does.
5
In many situations it may be reasonable to assume that treatments applied to one unit do not affect the outcomes for other units (the Stable Unit Treatment Value Assumption, SUTVA; Imbens and Rubin, 2015).

• In agricultural fertilizer experiments, researchers have taken care to separate plots using "guard rows," unfertilized strips of land between fertilized areas.

• In large-scale job training programs the outcomes for one individual may well be affected by the number of people trained, when that number is sufficiently large to create increased competition for certain jobs (Crepon, Duflo, Gurgand, Rathelot, and Zamora, 2013).

• In the peer effects / social interactions literature these interaction effects are the main focus.
6
Six Observations from the GAIN Experiment in Los Angeles
Individual    Potential Outcomes        Actual        Observed
              Y_i(0)       Y_i(1)       Treatment     Outcome Y_i^obs

1             66           ?            0             66
2             0            ?            0             0
3             0            ?            0             0
4             ?            0            1             0
5             ?            607          1             607
6             ?            436          1             436

Note: (Y_i(0), Y_i(1)) are fixed for i = 1, ..., 6; (W_1, ..., W_6) is stochastic.
7
1.3 The Assignment Mechanism
The key piece of information is how each individual came to receive the treatment level actually received: in our language of causation, the assignment mechanism

Pr(W | Y(0), Y(1), X)

• Known, with no dependence on Y(0), Y(1): randomized experiment.

• Unknown, with no dependence on Y(0), Y(1): unconfounded assignment / selection on observables.

• Unknown, with possible dependence on Y(0), Y(1): e.g., instrumental variables; no general solution.

Compare this with the conventional focus on the distribution of outcomes given explanatory variables, e.g., Y_i^obs | W_i ∼ N(α + β·W_i, σ²). Here it is the other way around.
8
2. Randomized Experiments: Fisher Exact P-values
Given data from a randomized experiment, Fisher was interested in testing sharp null hypotheses, that is, null hypotheses under which all values of the potential outcomes for the units in the experiment are either observed or can be inferred.

Notice that this is distinct from the question of whether the average treatment effect across units is zero.

The null of a zero average effect is a much weaker hypothesis, because the average effect of the treatment may be zero even if the treatment has a positive effect for some units, as long as the effect is negative for others.
9
We will test the sharp null hypothesis that the program had absolutely no effect on earnings, that is:

H_0: Y_i(0) = Y_i(1) for all i = 1, ..., 6.

Under this null hypothesis, the unobserved potential outcomes are equal to the observed outcomes for each unit. Thus we can fill in all six of the missing entries using the observed data.

This is the first key point of the Fisher approach: under the sharp null hypothesis all the missing values can be inferred from the observed ones.
11
Six Observations from the GAIN Experiment in Los Angeles (missing potential outcomes imputed under the sharp null shown in parentheses)

Individual    Potential Outcomes        Actual        Observed
              Y_i(0)       Y_i(1)       Treatment     Outcome Y_i^obs

1             66           (66)         0             66
2             0            (0)          0             0
3             0            (0)          0             0
4             (0)          0            1             0
5             (607)        607          1             607
6             (436)        436          1             436
12
Now consider testing this null against the alternative hypothesis that Y_i(0) ≠ Y_i(1) for some units, based on the test statistic:

T_1 = T(W, Y^obs) = (1/3) ∑_{i=1}^{6} W_i·Y_i^obs − (1/3) ∑_{i=1}^{6} (1 − W_i)·Y_i^obs
                  = (1/3) ∑_{i=1}^{6} W_i·Y_i(1) − (1/3) ∑_{i=1}^{6} (1 − W_i)·Y_i(0).

For the observed data the value of the test statistic is

(Y_4^obs + Y_5^obs + Y_6^obs − Y_1^obs − Y_2^obs − Y_3^obs)/3 = 325.6.
Suppose, for example, that instead of the observed assignment vector W^obs = (0,0,0,1,1,1)′ the assignment vector had been W = (0,1,1,0,1,0)′. Under this assignment vector the test statistic would have been

(Y_2^obs + Y_3^obs + Y_5^obs − Y_1^obs − Y_4^obs − Y_6^obs)/3 = 35.
13
Randomization Distribution for six observations from GAIN data

W_1   W_2   W_3   W_4   W_5   W_6      levels     ranks

0     0     0     1     1     1         325.6      1.00
0     0     1     0     1     1         325.6      1.67
0     0     1     1     0     1         -79.0     -1.67
0     0     1     1     1     0          35.0     -1.00
0     1     0     0     1     1         325.6      2.33
0     1     0     1     0     1         -79.0     -1.00
0     1     0     1     1     0          35.0     -0.33
...   ...   ...   ...   ...   ...         ...       ...
1     1     1     0     0     0        -325.6     -1.00
14
Given the distribution of the test statistic, how unusual is the observed average difference (325.6), assuming the null hypothesis is true?

One way to formalize this question is to ask how likely it is (under the randomization distribution) to observe a value of the test statistic that is as large in absolute value as the one actually observed.

Simply counting, we see that there are eight assignment vectors with a difference between the treated and control groups of at least 325.6 in absolute value, out of a set of twenty possible assignment vectors. This implies a p-value of 8/20 = 0.40.
15
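A minimal sketch (not code from the talk; only the standard library is used) that reproduces this exact p-value by enumerating all 20 assignment vectors for the six GAIN observations under the sharp null:

```python
from itertools import combinations

y_obs = [66, 0, 0, 0, 607, 436]   # observed outcomes for units 1..6
n, m = 6, 3                       # six units, three treated

# Observed statistic: mean of treated (units 4,5,6) minus mean of controls (1,2,3).
t_obs = (0 + 607 + 436) / 3 - (66 + 0 + 0) / 3   # about 325.67

# Under the sharp null Y_i(0) = Y_i(1) = Y_i^obs, so the statistic can be
# recomputed for every possible assignment of 3 treated units out of 6.
extreme = 0
assignments = list(combinations(range(n), m))
for treated in assignments:
    t = (sum(y_obs[i] for i in treated) / m
         - sum(y_obs[i] for i in range(n) if i not in treated) / (n - m))
    if abs(t) >= abs(t_obs) - 1e-9:
        extreme += 1

print(extreme, "of", len(assignments), "->", extreme / len(assignments))  # 8 of 20 -> 0.4
```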
2.2 The Choice of Statistic
The most standard choice of statistic is the difference in average outcomes by treatment status:

T = [∑_i W_i·Y_i^obs] / [∑_i W_i] − [∑_i (1 − W_i)·Y_i^obs] / [∑_i (1 − W_i)].

An obvious alternative to the simple difference in average outcomes by treatment status is to transform the outcomes before comparing average differences between treatment levels, e.g., by taking logarithms, leading to the following test statistic:

T = [∑_i W_i·ln(Y_i^obs)] / [∑_i W_i] − [∑_i (1 − W_i)·ln(Y_i^obs)] / [∑_i (1 − W_i)].
16
An important class of statistics involves transforming the outcomes to ranks before considering differences by treatment status. This improves robustness.

We also often subtract (N + 1)/2 from each rank to obtain a normalized rank that has average zero in the population:

R_i(Y_1^obs, ..., Y_N^obs) = ∑_{j=1}^{N} 1{Y_j^obs ≤ Y_i^obs} − (N + 1)/2.

Given the ranks R_i, an attractive test statistic is the difference in average ranks for treated and control units:

T = [∑_i W_i·R_i] / [∑_i W_i] − [∑_i (1 − W_i)·R_i] / [∑_i (1 − W_i)].
17
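A small sketch of the rank transformation exactly as written above (the indicator-sum formula gives tied values the largest rank among the ties); numpy is assumed to be available:

```python
import numpy as np

def normalized_ranks(y):
    # R_i = sum_j 1{Y_j <= Y_i} - (N + 1) / 2
    y = np.asarray(y, dtype=float)
    n = len(y)
    return np.array([(y <= yi).sum() for yi in y], dtype=float) - (n + 1) / 2

def rank_statistic(w, y):
    # Difference in average normalized ranks, treated minus control.
    w = np.asarray(w)
    r = normalized_ranks(y)
    return r[w == 1].mean() - r[w == 0].mean()

# Six-unit GAIN example:
print(rank_statistic([0, 0, 0, 1, 1, 1], [66, 0, 0, 0, 607, 436]))
```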
2.3 Computation of p-values
The p-value calculations presented so far have been exact. With N and M sufficiently large, however, the number of distinct assignment vectors becomes so large that it is unwieldy to calculate the test statistic for every value of the assignment vector.

In that case we rely on numerical approximations to the p-value.

Formally, randomly draw an N-dimensional vector with N − M zeros and M ones from the set of assignment vectors. Calculate the statistic for this draw (denoted T_1). Repeat this process K − 1 times, in each instance drawing another vector of assignments and calculating the statistic T_k, for k = 2, ..., K. We then approximate the p-value for our test statistic by the fraction of these K statistics that are more extreme than T^obs.
18
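A sketch of this Monte Carlo approximation (not the talk's code; numpy assumed). Permuting the observed assignment vector keeps the number of treated units M fixed:

```python
import numpy as np

def approx_pvalue(w_obs, y_obs, statistic, k=10000, seed=0):
    # Approximate the randomization p-value from K random assignment vectors
    # with the same number of treated units (M) as the observed assignment.
    rng = np.random.default_rng(seed)
    w_obs = np.asarray(w_obs)
    y_obs = np.asarray(y_obs, dtype=float)
    t_obs = statistic(w_obs, y_obs)
    draws = np.array([statistic(rng.permutation(w_obs), y_obs) for _ in range(k)])
    return np.mean(np.abs(draws) >= abs(t_obs) - 1e-9)

def diff_in_means(w, y):
    return y[w == 1].mean() - y[w == 0].mean()

# With only six units this is very close to the exact answer, 8/20 = 0.40.
print(approx_pvalue([0, 0, 0, 1, 1, 1], [66, 0, 0, 0, 607, 436], diff_in_means))
```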
Comparison to the p-value based on the normal approximation to the distribution of the t-statistic:

t = (Ȳ_1 − Ȳ_0) / √( s_0²/(N − M) + s_1²/M ),

where

s_0² = 1/(N − M − 1) ∑_{i: W_i=0} (Y_i^obs − Ȳ_0)²,    s_1² = 1/(M − 1) ∑_{i: W_i=1} (Y_i^obs − Ȳ_1)²,

and

p = 2 × Φ(−|t|),    where Φ(a) = ∫_{−∞}^{a} (1/√(2π)) exp(−x²/2) dx.
19
P-values for Fisher Exact Tests: Ranks versus Levels

                   sample size                    p-values
Prog   Loc     controls   treated     t-test    FET (levels)   FET (ranks)

GAIN   AL         601       597       0.835        0.836          0.890
GAIN   LA        1400      2995       0.544        0.531          0.561
GAIN   RI        1040      4405       0.000        0.000          0.000
GAIN   SD        1154      6978       0.057        0.068          0.018

WIN    AR          37        34       0.750        0.753          0.805
WIN    BA         260       222       0.339        0.339          0.286
WIN    SD         257       264       0.136        0.137          0.024
WIN    VI         154       331       0.960        0.957          0.249
20
2.4 Neyman's Unbiased Estimator Ȳ_1^obs − Ȳ_0^obs

Neyman was interested in the population average treatment effect:

τ = (1/N) ∑_{i=1}^{N} (Y_i(1) − Y_i(0)) = Ȳ(1) − Ȳ(0).

Suppose that we observed data from a completely randomized experiment in which M units were assigned to treatment and N − M assigned to control.

Neyman's unbiased estimator is the difference in the average outcomes for those assigned to the treatment versus those assigned to the control:

τ̂ = (1/M) ∑_{i: W_i=1} Y_i^obs − 1/(N − M) ∑_{i: W_i=0} Y_i^obs = Ȳ_1^obs − Ȳ_0^obs.
21
Neyman (1923, 1990) was also interested in the variance of this unbiased estimator of the average treatment effect.

This involved two steps: first, deriving the variance of the estimator for the average treatment effect; and second, developing unbiased estimators of this variance.

In addition, Neyman sought to construct confidence intervals for the population average treatment effect, which also requires an appeal to the central limit theorem for large-sample normality.
22
The variance of Ȳ_1^obs − Ȳ_0^obs is equal to:

Var(Ȳ_1^obs − Ȳ_0^obs) = S_0²/(N − M) + S_1²/M − S_01²/N,    (1)

where S_w² is the variance of Y_i(w) in the population, defined as:

S_w² = 1/(N − 1) ∑_{i=1}^{N} (Y_i(w) − Ȳ(w))²,    for w = 0, 1,

and S_01² is the population variance of the unit-level treatment effect, defined as:

S_01² = 1/(N − 1) ∑_{i=1}^{N} (Y_i(1) − Y_i(0) − τ)².
23
The numerator of the first term, the population variance of the potential control outcome vector Y(0), is equal to

S_0² = 1/(N − 1) ∑_{i=1}^{N} (Y_i(0) − Ȳ(0))².

An unbiased estimator for S_0² is

s_0² = 1/(N − M − 1) ∑_{i: W_i=0} (Y_i^obs − Ȳ_0^obs)².
24
The third term, S_01² (the population variance of the unit-level treatment effect), is more difficult to estimate because we cannot observe both Y_i(1) and Y_i(0) for any unit.

If the treatment effect is additive (Y_i(1) − Y_i(0) = c for all i), then this variance is equal to zero and the third term vanishes.

Under this circumstance we can obtain an unbiased estimator for the variance as:

V̂( Ȳ_1^obs − Ȳ_0^obs ) = s_0²/(N − M) + s_1²/M.    (2)

GAIN Data, Riverside:    τ̂ = 0.284  (s.e. 0.033)
25
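A minimal sketch of the Neyman estimator together with the conservative variance estimate in equation (2), applied to simulated data (the GAIN files are not reproduced here); numpy assumed:

```python
import numpy as np

def neyman(w, y):
    # Difference in means and the conservative variance s0^2/(N-M) + s1^2/M.
    w = np.asarray(w)
    y = np.asarray(y, dtype=float)
    y_t, y_c = y[w == 1], y[w == 0]
    tau_hat = y_t.mean() - y_c.mean()
    var_hat = y_c.var(ddof=1) / len(y_c) + y_t.var(ddof=1) / len(y_t)
    return tau_hat, np.sqrt(var_hat)   # point estimate and standard error

# Simulated completely randomized experiment with a true effect of 0.3:
rng = np.random.default_rng(1)
w = rng.permutation([1] * 500 + [0] * 500)
y = 1.0 + 0.3 * w + rng.normal(size=1000)
tau_hat, se = neyman(w, y)
print(round(tau_hat, 3), round(se, 3))   # roughly 0.3 with s.e. around 0.06
```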
This estimator for the variance is widely used, even when the assumption of an additive treatment effect is inappropriate. There are two main reasons for this estimator's popularity.

First, confidence intervals generated using this estimator of the variance will be conservative, with actual coverage at least as large as, but not necessarily equal to, the nominal coverage.

The second reason for using this estimator for the variance is that it is always unbiased for the variance of τ̂ = Ȳ_1^obs − Ȳ_0^obs when this statistic is interpreted as the estimator of the average treatment effect in the super-population from which the N observed units are a random sample. (We return to this interpretation later.)
26
3. Observational Studies: Matching Methods
Treatment indicator: W_i

Potential outcomes: Y_i(0), Y_i(1)

Covariates: X_i

Observed outcome: Y_i^obs = W_i·Y_i(1) + (1 − W_i)·Y_i(0).

N_c = ∑_i (1 − W_i) is the number of controls, N_t = ∑_i W_i the number of treated units.

Ȳ_c^obs and Ȳ_t^obs are the average outcomes for controls and treated.
28
3.1 Application: Imbens-Rubin-Sacerdote Lottery Study
IRS were interested in estimating the effect of unearned income on economic behavior, including effects on labor supply, consumption, and savings.

These effects are important for understanding and improving the design of social security programs.

IRS surveyed individuals who had played and won large sums of money in the lottery (the "winners"). As a comparison group they collected data on a second set of individuals who had also played the lottery but had not won big prizes (the "losers").
29
N_t = 259 winners and N_c = 237 losers in the sample of N = 496 lottery players.

We know the year individuals played the lottery, the number of tickets they typically bought, age, sex, education, and their social security earnings for the six years preceding their winning. (What else should we have asked for?)

We present averages and standard deviations for the full sample, t-statistics for the null hypothesis of equal means in the two groups, and the normalized differences in the means.
30
Normalized Difference in Covariates

X̄_c = (1/N_c) ∑_{i: W_i=0} X_i,      S_c² = 1/(N_c − 1) ∑_{i: W_i=0} (X_i − X̄_c)²

X̄_t = (1/N_t) ∑_{i: W_i=1} X_i,      S_t² = 1/(N_t − 1) ∑_{i: W_i=1} (X_i − X̄_t)²

Normalized difference:

nd = (X̄_t − X̄_c) / √( (S_c² + S_t²)/2 )

This is more relevant for assessing balance than the t-statistic

t = (X̄_t − X̄_c) / √( S_c²/N_c + S_t²/N_t )
31
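A small sketch of both balance measures (numpy assumed). Unlike the t-statistic, the normalized difference does not mechanically grow with the sample size, which is why it is the more useful measure of covariate overlap:

```python
import numpy as np

def normalized_difference(x, w):
    # nd = (mean_t - mean_c) / sqrt((S_c^2 + S_t^2) / 2)
    x, w = np.asarray(x, dtype=float), np.asarray(w)
    x_t, x_c = x[w == 1], x[w == 0]
    return (x_t.mean() - x_c.mean()) / np.sqrt((x_c.var(ddof=1) + x_t.var(ddof=1)) / 2)

def t_stat(x, w):
    # t = (mean_t - mean_c) / sqrt(S_c^2 / N_c + S_t^2 / N_t)
    x, w = np.asarray(x, dtype=float), np.asarray(w)
    x_t, x_c = x[w == 1], x[w == 0]
    return (x_t.mean() - x_c.mean()) / np.sqrt(x_c.var(ddof=1) / len(x_c) +
                                               x_t.var(ddof=1) / len(x_t))
```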
Summary Statistics Lottery Sample

                     All (N=496)        Losers        Winners
Variable            mean    (s.d.)     (N_c=237)     (N_t=259)     [t-stat]    Norm. Dif.

Year Won            6.23    (1.18)       6.38          6.06          -3.0        -0.27
# Tickets           3.33    (2.86)       2.19          4.57           9.9         0.90
Age                 50.2    (13.7)       53.2          47.0          -5.2        -0.47
Male                0.63    (0.48)       0.67          0.58          -2.1        -0.19
Education          13.73    (2.20)      14.43         12.97          -7.8        -0.70
Working Then        0.78    (0.41)       0.77          0.80           0.9         0.08
Earn Y -6           13.8    (13.4)       15.6          12.0          -3.1        -0.27
...
Earn Y -1           16.3    (15.7)       18.0          14.5          -2.5        -0.23
Pos Earn Y -6       0.69    (0.46)       0.69          0.70           0.3         0.03
...
Pos Earn Y -1       0.71    (0.45)       0.69          0.74           1.2         0.10
32
There are substantial differences between the two groups, including in pre-winning earnings and in the number of lottery tickets bought. This suggests that simple regression methods will not necessarily be adequate for removing the biases associated with the differences in covariates.

Where do these differences come from?

• People buying more tickets are more likely to win.

• Nonresponse may differ by prize and individual characteristics (including labor income).
Is unconfoundedness plausible?
33
3.3 Assumptions
I. Unconfoundedness
Yi(0), Yi(1) ⊥ Wi | Xi.
This formulation is due to Rosenbaum and Rubin (1983). It is sometimes referred to as "selection on observables" or "exogeneity," but those terms are not well defined.

Suppose

Y_i(0) = α + β′X_i + ε_i,    and    Y_i(1) = Y_i(0) + τ;

then

Y_i^obs = α + τ·W_i + β′X_i + ε_i,

and unconfoundedness ⇐⇒ ε_i ⊥ W_i | X_i (exogeneity).
34
II. Overlap
0 < pr(Wi = 1|Xi) < 1.
For all X there are treated and control units.
This assumption is in principle testable: one can estimate the propensity score

e(x) = pr(W_i = 1 | X_i = x)

and assess whether it gets close to zero or one.
I and II combined: Strong Ignorability (Rosenbaum and Rubin, 1983)
This is what we will work with for this application.
35
Identification Given Unconfoundedness and Overlap
By definition,

τ(x) = E[Y_i(1) − Y_i(0) | X_i = x] = E[Y_i(1) | X_i = x] − E[Y_i(0) | X_i = x].

By unconfoundedness this is equal to

E[Y_i(1) | W_i = 1, X_i = x] − E[Y_i(0) | W_i = 0, X_i = x]
   = E[Y_i^obs | W_i = 1, X_i = x] − E[Y_i^obs | W_i = 0, X_i = x].

By the overlap assumption we can estimate both terms on the right-hand side.

Then

τ = E[τ(X_i)].
36
3.4 Concern with regression estimators in cases with
limited overlap in covariate distributions (distinct from
plausibility of unconfoundedness assumption)
Regression estimators are the most widely used methods for estimating treatment effects.

Sometimes simple linear regression:

Y_i^obs = β_0 + τ·W_i + β_1·X_i + ε_i

Sometimes allowing for interactions:

Y_i^obs = β_0 + τ·W_i + β_1·X_i + β_2·(X_i − X̄)·W_i + ε_i
What is the problem with these methods?
37
Suppose the regression model is:

Y_i(w) = β_{w0} + β_{w1}·X_i + ε_{wi}.

Suppose we are interested in τ_t^P, the average effect on the treated:

τ_t^P = E[Y_i(1) − Y_i(0) | W_i = 1] = E[Y_i | W_i = 1] − E[ E[Y_i | W_i = 0, X_i] | W_i = 1 ].

This is estimated as

Ȳ_t − Ȳ_c − β̂_{c1}·( X̄_t − X̄_c ).

The problem is that if the difference X̄_t − X̄_c is substantial, we do a lot of extrapolation using this estimator. Regression is OK with experimental data, but not if X̄_t − X̄_c is substantial.

This is why looking at normalized differences in covariates is useful as an indication of potential problems.
38
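A sketch of the plug-in estimator above for a single covariate (numpy assumed; names illustrative). Fitting the regression in the control group and predicting at the treated covariate values makes the extrapolation explicit:

```python
import numpy as np

def tau_treated_regression(y, w, x):
    # Estimate E[Y(1) - Y(0) | W = 1] by fitting Y on X in the control group and
    # predicting the counterfactual control outcome at the treated X values:
    # tau_t_hat = mean_t(Y) - mean_t(b_c0 + b_c1 * X)
    #           = Ybar_t - Ybar_c - b_c1 * (Xbar_t - Xbar_c).
    y, w, x = (np.asarray(a, dtype=float) for a in (y, w, x))
    b1, b0 = np.polyfit(x[w == 0], y[w == 0], deg=1)   # control-group slope, intercept
    # The prediction extrapolates whenever the treated X values lie far outside
    # the range of the control X values: this is the concern raised above.
    counterfactual = b0 + b1 * x[w == 1]
    return y[w == 1].mean() - counterfactual.mean()
```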
3.5 Estimation: Subclassification
The propensity score is the probability of assignment to treatment given the covariates:

e(x) ≡ pr(W_i = 1 | X_i = x) = E[W_i | X_i = x]

Unconfoundedness

W_i ⊥ Y_i(0), Y_i(1) | X_i

implies (Rosenbaum & Rubin, 1983)

W_i ⊥ Y_i(0), Y_i(1) | e(X_i).

So we only need to adjust for differences in the (scalar) propensity score.
39
Blocking or Subclassification
Pick c_0 < c_1 < ... < c_J, with c_0 = 0 and c_J = 1.

Define the block indicator B_ij = 1 if c_{j−1} < e(X_i) ≤ c_j.

τ̂_j = (1/N_{jt}) ∑_{i: B_ij=1} W_i·Y_i − (1/N_{jc}) ∑_{i: B_ij=1} (1 − W_i)·Y_i.

τ̂ = ∑_{j=1}^{J} [(N_{jc} + N_{jt})/N] · τ̂_j.
Cochran recommended using 5 blocks, as this eliminates about 90% of the bias in the normal case.
40
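A minimal sketch of the blocking estimator (numpy assumed). The propensity score is taken as already estimated (e.g., by logistic regression), and the cut points here are placed at quantiles of the estimated score rather than at the fixed values c_0 < ... < c_J above:

```python
import numpy as np

def subclassification_estimate(y, w, pscore, n_blocks=5):
    # Weighted average of within-block differences in means, with weights
    # (N_jc + N_jt) / N, following the formula above.
    y, w, pscore = (np.asarray(a, dtype=float) for a in (y, w, pscore))
    cuts = np.quantile(pscore, np.linspace(0, 1, n_blocks + 1))
    cuts[0], cuts[-1] = 0.0, 1.0
    n, tau_hat = len(y), 0.0
    for j in range(n_blocks):
        in_block = (pscore > cuts[j]) & (pscore <= cuts[j + 1])
        y_b, w_b = y[in_block], w[in_block]
        # Requires both treated and control units in every block (overlap).
        tau_j = y_b[w_b == 1].mean() - y_b[w_b == 0].mean()
        tau_hat += in_block.sum() / n * tau_j
    return tau_hat
```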
Lottery Data: Normalized Differences in Covariates after Subclassification

                           Full      Selected     2 Subclasses     5 Subclasses

Year Won                  -0.27       -0.06           0.03            -0.07
Tickets Bought             0.90        0.51          -0.18            -0.07
Age                       -0.47       -0.08           0.02            -0.04
Male                      -0.19       -0.11           0.10             0.14
Years of Schooling        -0.70       -0.47           0.19             0.10
Working Then               0.08        0.03          -0.03            -0.02
Earnings Year -6          -0.27       -0.19           0.10             0.03
...
Earnings Year -1          -0.23       -0.19           0.02            -0.00
Pos Earnings Year -6       0.03       -0.00           0.08             0.08
...
Pos Earnings Year -1       0.10       -0.01           0.04             0.07
41
Optimal Subclassification

Subclass    Min P-score    Max P-score    # Controls    # Treated

1              0.03           0.24            67            13
2              0.24           0.32            32             8
3              0.32           0.44            24            17
4              0.44           0.69            34            46
5              0.69           0.99            15            66
42
Average Treatment Effects, Lottery Data, With and Without Subclassification
(Within-block regression adjustment with no, few, or all covariates)

              Full Sample     Selected     Selected      Selected
Covariates      1 Block        1 Block     2 Blocks      5 Blocks

No               -6.16          -6.64        -6.05         -5.66
Few              -2.85          -3.99        -5.57         -5.07
All              -5.08          -5.34        -6.35         -5.74
43
3.6 Assessing Unconfoundedness

Main question:

• What can we say about the plausibility of unconfoundedness?

• Recall, unconfoundedness is not testable.
44
We consider the unconfoundedness assumption,

Y_i(0), Y_i(1) ⊥⊥ W_i | X_i.    (3)

We partition the vector of covariates X_i into two parts, a (scalar) pseudo outcome, denoted by X_i^p, and the remainder, denoted by X_i^r, so that X_i = (X_i^p, X_i^r′)′.

Now we assess whether the following conditional independence relation holds:

X_i^p ⊥⊥ W_i | X_i^r.

The two issues are (i) the interpretation of this relation and its link to the unconfoundedness assumption that is of primary interest, and (ii) the implementation of the test.
45
The main issue concerns the link between this conditional independence relation and unconfoundedness. This link is indirect, as unconfoundedness cannot be tested directly.

First consider a related condition:

Y_i(0), Y_i(1) ⊥⊥ W_i | X_i^r.

If this modified unconfoundedness condition were to hold, one could use the adjustment methods with only the subset of covariates X_i^r. In practice this is a stronger condition than the original unconfoundedness condition.

The modified unconfoundedness condition is not testable either. We therefore use the pseudo outcome X_i^p as a proxy for Y_i(0), and test

X_i^p ⊥⊥ W_i | X_i^r.
46
A leading example is the case where X_i contains multiple lagged measures of the outcome. In the lottery example we have observations on earnings for the six years prior to winning. Denote these lagged outcomes by Y_{i,−1}, ..., Y_{i,−6}, where Y_{i,−1} is the most recent and Y_{i,−6} is the most distant pre-winning earnings measure.

One could implement the above ideas using earnings for the most recent pre-winning year, Y_{i,−1}, as the pseudo outcome X_i^p, so that the vector of remaining pretreatment variables X_i^r would still include the five prior years of pre-winning earnings Y_{i,−2}, ..., Y_{i,−6} (ignoring additional pre-treatment variables).
47
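A simplified, regression-based version of this check (the estimates on the next slide use the subclassification estimator instead; numpy assumed, names illustrative). A coefficient on W close to zero is consistent with, but does not prove, unconfoundedness:

```python
import numpy as np

def pseudo_outcome_check(x_p, w, x_r):
    # Regress the pseudo outcome X^p (e.g., Y_{-1}) on the treatment indicator W
    # and the remaining covariates X^r (e.g., Y_{-2}, ..., Y_{-6}); report the
    # coefficient on W, which should be near zero if the check passes.
    x_p = np.asarray(x_p, dtype=float)
    w = np.asarray(w, dtype=float)
    x_r = np.asarray(x_r, dtype=float).reshape(len(w), -1)
    design = np.column_stack([np.ones_like(w), w, x_r])
    coef, *_ = np.linalg.lstsq(design, x_p, rcond=None)
    return coef[1]
```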
Assessing Unconfoundedness Using the Lottery Data

Pseudo Outcome                           est      (s.e.)

Y_{−1}                                  -0.53     (0.78)
(Y_{−1} + Y_{−2})/2                     -1.16     (0.83)
(Y_{−1} + Y_{−2} + Y_{−3})/3            -0.39     (0.95)
(Y_{−1} + ... + Y_{−4})/4               -0.56     (0.97)
(Y_{−1} + ... + Y_{−5})/5               -0.41     (0.92)
(Y_{−1} + ... + Y_{−6})/6               -1.30     (2.17)

Actual Outcome

Y^obs                                   -5.74     (1.40)
49
4. Observational Studies: Instrumental Variables

• Sometimes unconfoundedness is not very plausible.

• There are no general solutions, but there are some special cases.
50
The Fulton Fish Market Study
The Fulton fish market in New York is where restaurants and fishmongers bought fish from boats and trucks, between 2 and 6 in the morning.

Individuals would trade particular quantities at prices negotiated separately for each sale; there were no posted prices.

Katy Graddy, for her PhD thesis, collected data from a particular seller, for a particular fish (whiting).
51
We are interested in estimating a demand function:

Q_t^D(p)

There is one demand function for every market (every day t), as a function of price.

Unconfoundedness: assume prices are randomly assigned and compare market t, where the price was p, with market t′, where the price was p′ ≠ p.

Not very credible! Prices are not randomly assigned!
52
Instrumental Variables
Think of both the demand and the supply function:

Q_t^D(p),    Q_t^S(p)

What we observe is the equilibrium price p_t that solves

Q_t^D(p_t) = Q_t^S(p_t),

and the corresponding quantity

q_t = Q_t^D(p_t) = Q_t^S(p_t).

How do we get the slope of the demand function?
53
Katy Graddy collected data on weather (wind speed and wave height) at sea on the preceding days.

Call these Z_t.

They affect the supply function, but not the demand function:

Cov( Q_t^S(p), Z_t ) ≠ 0,    Cov( Q_t^D(p), Z_t ) = 0.
54
[Figure: two panels of fitted relations between log quantity and log price for the Fulton fish market data; left panel: OLS estimates; right panel: IV estimates.]
Assume linearity:

ln Q_t^D(p) = α + β·ln p + ε_t

Estimates:

OLS:  β̂ = −0.54  (0.18)
IV:   β̂ = −1.01  (0.38)
55
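A sketch of the just-identified IV estimator with a single instrument (e.g., an indicator for stormy weather), contrasted with OLS; this is not Graddy's data or the exact specification used for the estimates above, and numpy is assumed:

```python
import numpy as np

def demand_slopes(log_q, log_p, z):
    # OLS: Cov(log p, log q) / Var(log p); biased for the demand slope because
    #      equilibrium prices also respond to demand shocks.
    # IV:  Cov(Z, log q) / Cov(Z, log p); valid if Z shifts supply only and is
    #      unrelated to demand.
    log_q, log_p, z = (np.asarray(a, dtype=float) for a in (log_q, log_p, z))
    beta_ols = np.cov(log_p, log_q)[0, 1] / np.var(log_p, ddof=1)
    beta_iv = np.cov(z, log_q)[0, 1] / np.cov(z, log_p)[0, 1]
    return beta_ols, beta_iv
```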
References
Abadie, A., Diamond, A., & Hainmueller, J. (2010). Synthetic control methods for comparative case studies: Estimating the effect of California's tobacco control program. Journal of the American Statistical Association, 105(490), 493-505.

Angrist, J. D., Graddy, K., & Imbens, G. W. (2000). The interpretation of instrumental variables estimators in simultaneous equations models with an application to the demand for fish. The Review of Economic Studies, 67(3), 499-527.

Bertrand, M., Duflo, E., & Mullainathan, S. (2004). How much should we trust differences-in-differences estimates? The Quarterly Journal of Economics, 119(1), 249-275.

Bertrand, M., & Mullainathan, S. (2004). Are Emily and Greg more employable than Lakisha and Jamal? A field experiment on labor market discrimination. American Economic Review, 94(4), 991-1013.

Crepon, B., Duflo, E., Gurgand, M., Rathelot, R., & Zamora, P. (2013). Do labor market policies have displacement effects? Evidence from a clustered randomized experiment. The Quarterly Journal of Economics, 128(2), 531-580.

Fisher, R. A. (1937). The Design of Experiments. Oliver and Boyd, Edinburgh.

Hamermesh, D. S., & Biddle, J. E. (1994). Beauty and the labor market. American Economic Review, 84(5), 1174-1194.

Holland, P. W. (1986). Statistics and causal inference. Journal of the American Statistical Association, 81(396), 945-960.

Imbens, G. W., & Rubin, D. B. (2015). Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction. Cambridge University Press.

Imbens, G. W., Rubin, D. B., & Sacerdote, B. I. (2001). Estimating the effect of unearned income on labor earnings, savings, and consumption: Evidence from a survey of lottery players. American Economic Review, 91(4), 778-794.

Rosenbaum, P. R., & Rubin, D. B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika, 70(1), 41-55.

Splawa-Neyman, J., Dabrowska, D. M., & Speed, T. P. (1990). On the application of probability theory to agricultural experiments. Essay on principles. Section 9. Statistical Science, 465-472.