Causality in Econometrics and Statistics
Guido Imbens (Stanford University)
University of Virginia
Charlottesville, February 9th, 2018
1. Causality
2. Randomized Experiments
3. Observational Studies: Matching Methods
4. Observational Studies: Instrumental Variables Methods
1
1. Causality
Three key notions underlie the Rubin Causal Model (RCM), or Potential Outcome, approach to causality (Holland, 1986).

First, potential outcomes, one corresponding to each of the various levels of a treatment or manipulation.

Second, the presence of multiple units, and the related stability assumption.

Third, the central role of the assignment mechanism, which is crucial for inferring causal effects and serves as the organizing principle.
2
1.1 Potential Outcomes
Given a unit and a set of actions, we associate each action/unit pair with a potential outcome: "potential" because only one of them, the one corresponding to the action actually taken at that time, will ultimately be realized and therefore possibly observed.

The causal effect of the action or treatment involves the comparison of these potential outcomes, e.g., Y(1) − Y(0).

Y(0) denotes the outcome given the control treatment and Y(1) the outcome given the active treatment. W ∈ {0,1} is the treatment indicator; we observe W and

Y^obs = Y(W) = W · Y(1) + (1 − W) · Y(0)
3
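As a minimal numerical illustration of this observed-outcome identity (hypothetical numbers, not data from the talk; numpy is assumed to be available):

```python
import numpy as np

# Hypothetical potential outcomes for six units (illustrative only).
y0 = np.array([66., 0., 0., 0., 500., 400.])   # Y_i(0)
y1 = y0 + 50.0                                  # Y_i(1), assuming a constant effect of 50
w = np.array([0, 0, 0, 1, 1, 1])                # treatment indicator W_i

# Only one potential outcome per unit is realized and hence observed:
y_obs = w * y1 + (1 - w) * y0
print(y_obs)   # [ 66.   0.   0.  50. 550. 450.]
```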
Is this useful?
• The potential outcome notion is consistent with the way economists think about demand and supply functions: quantities demanded and supplied at different prices.

• Some causal questions become trickier: for example, the causal effect of race on economic outcomes. One solution is to make the manipulation precise: change the names on CVs sent out for job applications (Bertrand and Mullainathan, 2004).

• What is the causal effect of physical appearance, height, or gender on earnings (e.g., Hamermesh and Biddle, 1994)? There are strong statistical correlations, but what do they mean? Many manipulations are possible, probably all with different causal effects.
4
1.2 Multiple Units
Because we cannot learn about causal effects from a single observed outcome, we must rely on multiple units exposed to different treatments to make causal inferences.

By itself, however, the presence of multiple units does not solve the problem of causal inference. Consider a drug (aspirin) example with two units (you and I) and two possible treatments for each unit (aspirin or no aspirin).

There are now a total of four treatment combinations: you take an aspirin and I do not, I take an aspirin and you do not, we both take an aspirin, or neither of us does.
5
In many situations it may be reasonable to assume that treatments applied to one unit do not affect the outcomes for other units (the Stable Unit Treatment Value Assumption, SUTVA; Imbens and Rubin, 2015).

• In agricultural fertilizer experiments, researchers have taken care to separate plots using "guard rows," unfertilized strips of land between fertilized areas.

• In large-scale job training programs the outcomes for one individual may well be affected by the number of people trained, when that number is sufficiently large to create increased competition for certain jobs (Crepon, Duflo, Gurgand, Rathelot, and Zamora, 2013).

• In the peer effects / social interactions literature these interaction effects are the main focus.
6
Six Observations from the GAIN Experiment in Los Angeles
Individual    Potential Outcomes        Actual        Observed
              Y_i(0)       Y_i(1)       Treatment     Outcome Y_i^obs

1             66           ?            0             66
2             0            ?            0             0
3             0            ?            0             0
4             ?            0            1             0
5             ?            607          1             607
6             ?            436          1             436

Note: (Y_i(0), Y_i(1)) are fixed for i = 1, ..., 6; (W_1, ..., W_6) is stochastic.
7
1.3 The Assignment Mechanism
The key piece of information is how each individual came to receive the treatment level actually received: in our language of causation, the assignment mechanism

Pr(W | Y(0), Y(1), X)

• Known, with no dependence on Y(0), Y(1): randomized experiment.

• Unknown, with no dependence on Y(0), Y(1): unconfounded assignment / selection on observables.

• Unknown, with possible dependence on Y(0), Y(1): e.g., instrumental variables; no general solution.

Compare this with the conventional focus on the distribution of outcomes given explanatory variables, e.g., Y_i^obs | W_i ∼ N(α + β·W_i, σ²). Here it is the other way around.
8
2. Randomized Experiments: Fisher Exact P-values
Given data from a randomized experiment, Fisher was interested in testing sharp null hypotheses, that is, null hypotheses under which all values of the potential outcomes for the units in the experiment are either observed or can be inferred.

Notice that this is distinct from the question of whether the average treatment effect across units is zero.

The null of a zero average effect is a much weaker hypothesis, because the average effect of the treatment may be zero even if the treatment has a positive effect for some units, as long as the effect is negative for others.
9
We will test the sharp null hypothesis that the program had absolutely no effect on earnings, that is:

H_0: Y_i(0) = Y_i(1) for all i = 1, ..., 6.

Under this null hypothesis, the unobserved potential outcomes are equal to the observed outcomes for each unit. Thus we can fill in all six of the missing entries using the observed data.

This is the first key point of the Fisher approach: under the sharp null hypothesis all the missing values can be inferred from the observed ones.
11
Six Observations from the GAIN Experiment in Los Angeles (missing potential outcomes imputed under the sharp null shown in parentheses)

Individual    Potential Outcomes        Actual        Observed
              Y_i(0)       Y_i(1)       Treatment     Outcome Y_i^obs

1             66           (66)         0             66
2             0            (0)          0             0
3             0            (0)          0             0
4             (0)          0            1             0
5             (607)        607          1             607
6             (436)        436          1             436
12
Now consider testing this null against the alternative hypothesis that Y_i(0) ≠ Y_i(1) for some units, based on the test statistic:

T_1 = T(W, Y^obs) = (1/3) ∑_{i=1}^{6} W_i·Y_i^obs − (1/3) ∑_{i=1}^{6} (1 − W_i)·Y_i^obs
                  = (1/3) ∑_{i=1}^{6} W_i·Y_i(1) − (1/3) ∑_{i=1}^{6} (1 − W_i)·Y_i(0).

For the observed data the value of the test statistic is

(Y_4^obs + Y_5^obs + Y_6^obs − Y_1^obs − Y_2^obs − Y_3^obs)/3 = 325.6.
Suppose, for example, that instead of the observed assignment vector W^obs = (0,0,0,1,1,1)′ the assignment vector had been W = (0,1,1,0,1,0)′. Under this assignment vector the test statistic would have been

(Y_2^obs + Y_3^obs + Y_5^obs − Y_1^obs − Y_4^obs − Y_6^obs)/3 = 35.
13
Randomization Distribution for six observations from GAIN data

W_1   W_2   W_3   W_4   W_5   W_6      levels     ranks

0     0     0     1     1     1         325.6      1.00
0     0     1     0     1     1         325.6      1.67
0     0     1     1     0     1         -79.0     -1.67
0     0     1     1     1     0          35.0     -1.00
0     1     0     0     1     1         325.6      2.33
0     1     0     1     0     1         -79.0     -1.00
0     1     0     1     1     0          35.0     -0.33
...   ...   ...   ...   ...   ...         ...       ...
1     1     1     0     0     0        -325.6     -1.00
14
Given the distribution of the test statistic, how unusual is the observed average difference (325.6), assuming the null hypothesis is true?

One way to formalize this question is to ask how likely it is (under the randomization distribution) to observe a value of the test statistic that is as large in absolute value as the one actually observed.

Simply counting, we see that there are eight assignment vectors with a difference between the treated and control groups of at least 325.6 in absolute value, out of a set of twenty possible assignment vectors. This implies a p-value of 8/20 = 0.40.
15
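A minimal sketch (not code from the talk; only the standard library is used) that reproduces this exact p-value by enumerating all 20 assignment vectors for the six GAIN observations under the sharp null:

```python
from itertools import combinations

y_obs = [66, 0, 0, 0, 607, 436]   # observed outcomes for units 1..6
n, m = 6, 3                       # six units, three treated

# Observed statistic: mean of treated (units 4,5,6) minus mean of controls (1,2,3).
t_obs = (0 + 607 + 436) / 3 - (66 + 0 + 0) / 3   # about 325.67

# Under the sharp null Y_i(0) = Y_i(1) = Y_i^obs, so the statistic can be
# recomputed for every possible assignment of 3 treated units out of 6.
extreme = 0
assignments = list(combinations(range(n), m))
for treated in assignments:
    t = (sum(y_obs[i] for i in treated) / m
         - sum(y_obs[i] for i in range(n) if i not in treated) / (n - m))
    if abs(t) >= abs(t_obs) - 1e-9:
        extreme += 1

print(extreme, "of", len(assignments), "->", extreme / len(assignments))  # 8 of 20 -> 0.4
```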
2.2 The Choice of Statistic
The most standard choice of statistic is the difference in average outcomes by treatment status:

T = [∑_i W_i·Y_i^obs] / [∑_i W_i] − [∑_i (1 − W_i)·Y_i^obs] / [∑_i (1 − W_i)].

An obvious alternative to the simple difference in average outcomes by treatment status is to transform the outcomes before comparing average differences between treatment levels, e.g., by taking logarithms, leading to the following test statistic:

T = [∑_i W_i·ln(Y_i^obs)] / [∑_i W_i] − [∑_i (1 − W_i)·ln(Y_i^obs)] / [∑_i (1 − W_i)].
16
An important class of statistics involves transforming the outcomes to ranks before considering differences by treatment status. This improves robustness.

We also often subtract (N + 1)/2 from each rank to obtain a normalized rank that has average zero in the population:

R_i(Y_1^obs, ..., Y_N^obs) = ∑_{j=1}^{N} 1{Y_j^obs ≤ Y_i^obs} − (N + 1)/2.

Given the ranks R_i, an attractive test statistic is the difference in average ranks for treated and control units:

T = [∑_i W_i·R_i] / [∑_i W_i] − [∑_i (1 − W_i)·R_i] / [∑_i (1 − W_i)].
17
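A small sketch of the rank transformation exactly as written above (the indicator-sum formula gives tied values the largest rank among the ties); numpy is assumed to be available:

```python
import numpy as np

def normalized_ranks(y):
    # R_i = sum_j 1{Y_j <= Y_i} - (N + 1) / 2
    y = np.asarray(y, dtype=float)
    n = len(y)
    return np.array([(y <= yi).sum() for yi in y], dtype=float) - (n + 1) / 2

def rank_statistic(w, y):
    # Difference in average normalized ranks, treated minus control.
    w = np.asarray(w)
    r = normalized_ranks(y)
    return r[w == 1].mean() - r[w == 0].mean()

# Six-unit GAIN example:
print(rank_statistic([0, 0, 0, 1, 1, 1], [66, 0, 0, 0, 607, 436]))
```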
2.3 Computation of p-values
The p-value calculations presented so far have been exact. With N and M sufficiently large, however, the number of distinct assignment vectors becomes so large that it is unwieldy to calculate the test statistic for every value of the assignment vector.

In that case we rely on numerical approximations to the p-value.

Formally, randomly draw an N-dimensional vector with N − M zeros and M ones from the set of assignment vectors. Calculate the statistic for this draw (denoted T_1). Repeat this process K − 1 times, in each instance drawing another vector of assignments and calculating the statistic T_k, for k = 2, ..., K. We then approximate the p-value for our test statistic by the fraction of these K statistics that are more extreme than T^obs.
18
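A sketch of this Monte Carlo approximation (not the talk's code; numpy assumed). Permuting the observed assignment vector keeps the number of treated units M fixed:

```python
import numpy as np

def approx_pvalue(w_obs, y_obs, statistic, k=10000, seed=0):
    # Approximate the randomization p-value from K random assignment vectors
    # with the same number of treated units (M) as the observed assignment.
    rng = np.random.default_rng(seed)
    w_obs = np.asarray(w_obs)
    y_obs = np.asarray(y_obs, dtype=float)
    t_obs = statistic(w_obs, y_obs)
    draws = np.array([statistic(rng.permutation(w_obs), y_obs) for _ in range(k)])
    return np.mean(np.abs(draws) >= abs(t_obs) - 1e-9)

def diff_in_means(w, y):
    return y[w == 1].mean() - y[w == 0].mean()

# With only six units this is very close to the exact answer, 8/20 = 0.40.
print(approx_pvalue([0, 0, 0, 1, 1, 1], [66, 0, 0, 0, 607, 436], diff_in_means))
```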
Comparison to the p-value based on the normal approximation to the distribution of the t-statistic:

t = (Ȳ_1 − Ȳ_0) / √( s_0²/(N − M) + s_1²/M ),

where

s_0² = 1/(N − M − 1) ∑_{i: W_i=0} (Y_i^obs − Ȳ_0)²,    s_1² = 1/(M − 1) ∑_{i: W_i=1} (Y_i^obs − Ȳ_1)²,

and

p = 2 × Φ(−|t|),    where Φ(a) = ∫_{−∞}^{a} (1/√(2π)) exp(−x²/2) dx.
19
P-values for Fisher Exact Tests: Ranks versus Levels

                   sample size                    p-values
Prog   Loc     controls   treated     t-test    FET (levels)   FET (ranks)

GAIN   AL         601       597       0.835        0.836          0.890
GAIN   LA        1400      2995       0.544        0.531          0.561
GAIN   RI        1040      4405       0.000        0.000          0.000
GAIN   SD        1154      6978       0.057        0.068          0.018

WIN    AR          37        34       0.750        0.753          0.805
WIN    BA         260       222       0.339        0.339          0.286
WIN    SD         257       264       0.136        0.137          0.024
WIN    VI         154       331       0.960        0.957          0.249
20
2.4 Neyman's Unbiased Estimator Ȳ_1^obs − Ȳ_0^obs

Neyman was interested in the population average treatment effect:

τ = (1/N) ∑_{i=1}^{N} (Y_i(1) − Y_i(0)) = Ȳ(1) − Ȳ(0).

Suppose that we observed data from a completely randomized experiment in which M units were assigned to treatment and N − M assigned to control.

Neyman's unbiased estimator is the difference in the average outcomes for those assigned to the treatment versus those assigned to the control:

τ̂ = (1/M) ∑_{i: W_i=1} Y_i^obs − 1/(N − M) ∑_{i: W_i=0} Y_i^obs = Ȳ_1^obs − Ȳ_0^obs.
21
Neyman (1923, 1990) was also interested in the variance of this unbiased estimator of the average treatment effect.

This involved two steps: first, deriving the variance of the estimator for the average treatment effect; and second, developing unbiased estimators of this variance.

In addition, Neyman sought to construct confidence intervals for the population average treatment effect, which also requires an appeal to the central limit theorem for large-sample normality.
22
The variance of Ȳ_1^obs − Ȳ_0^obs is equal to:

Var(Ȳ_1^obs − Ȳ_0^obs) = S_0²/(N − M) + S_1²/M − S_01²/N,    (1)

where S_w² is the variance of Y_i(w) in the population, defined as:

S_w² = 1/(N − 1) ∑_{i=1}^{N} (Y_i(w) − Ȳ(w))²,    for w = 0, 1,

and S_01² is the population variance of the unit-level treatment effect, defined as:

S_01² = 1/(N − 1) ∑_{i=1}^{N} (Y_i(1) − Y_i(0) − τ)².
23
The numerator of the first term, the population variance of the potential control outcome vector Y(0), is equal to

S_0² = 1/(N − 1) ∑_{i=1}^{N} (Y_i(0) − Ȳ(0))².

An unbiased estimator for S_0² is

s_0² = 1/(N − M − 1) ∑_{i: W_i=0} (Y_i^obs − Ȳ_0^obs)².
24
The third term, S_01² (the population variance of the unit-level treatment effect), is more difficult to estimate because we cannot observe both Y_i(1) and Y_i(0) for any unit.

If the treatment effect is additive (Y_i(1) − Y_i(0) = c for all i), then this variance is equal to zero and the third term vanishes.

Under this circumstance we can obtain an unbiased estimator for the variance as:

V̂( Ȳ_1^obs − Ȳ_0^obs ) = s_0²/(N − M) + s_1²/M.    (2)

GAIN Data, Riverside:    τ̂ = 0.284  (s.e. 0.033)
25
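A minimal sketch of the Neyman estimator together with the conservative variance estimate in equation (2), applied to simulated data (the GAIN files are not reproduced here); numpy assumed:

```python
import numpy as np

def neyman(w, y):
    # Difference in means and the conservative variance s0^2/(N-M) + s1^2/M.
    w = np.asarray(w)
    y = np.asarray(y, dtype=float)
    y_t, y_c = y[w == 1], y[w == 0]
    tau_hat = y_t.mean() - y_c.mean()
    var_hat = y_c.var(ddof=1) / len(y_c) + y_t.var(ddof=1) / len(y_t)
    return tau_hat, np.sqrt(var_hat)   # point estimate and standard error

# Simulated completely randomized experiment with a true effect of 0.3:
rng = np.random.default_rng(1)
w = rng.permutation([1] * 500 + [0] * 500)
y = 1.0 + 0.3 * w + rng.normal(size=1000)
tau_hat, se = neyman(w, y)
print(round(tau_hat, 3), round(se, 3))   # roughly 0.3 with s.e. around 0.06
```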
This estimator for the variance is widely used, even when the assumption of an additive treatment effect is inappropriate. There are two main reasons for this estimator's popularity.

First, confidence intervals generated using this estimator of the variance will be conservative, with actual coverage at least as large as, but not necessarily equal to, the nominal coverage.

The second reason for using this estimator for the variance is that it is always unbiased for the variance of τ̂ = Ȳ_1^obs − Ȳ_0^obs when this statistic is interpreted as the estimator of the average treatment effect in the super-population from which the N observed units are a random sample. (We return to this interpretation later.)
26
3. Observational Studies: Matching Methods
Treatment indicator: W_i

Potential outcomes: Y_i(0), Y_i(1)

Covariates: X_i

Observed outcome: Y_i^obs = W_i·Y_i(1) + (1 − W_i)·Y_i(0).

N_c = ∑_i (1 − W_i) is the number of controls, N_t = ∑_i W_i the number of treated units.

Ȳ_c^obs and Ȳ_t^obs are the average outcomes for controls and treated.
28
3.1 Application: Imbens-Rubin-Sacerdote Lottery Study
IRS were interested in estimating the effect of unearned income on economic behavior, including effects on labor supply, consumption, and savings.

These effects are important for understanding and improving the design of social security programs.

IRS surveyed individuals who had played and won large sums of money in the lottery (the "winners"). As a comparison group they collected data on a second set of individuals who had also played the lottery but had not won big prizes (the "losers").
29
N_t = 259 winners and N_c = 237 losers in the sample of N = 496 lottery players.

We know the year individuals played the lottery, the number of tickets they typically bought, age, sex, education, and their social security earnings for the six years preceding their winning. (What else should we have asked for?)

We present averages and standard deviations for the full sample, t-statistics for the null hypothesis of equal means in the two groups, and the normalized differences in the means.
30
Normalized Difference in Covariates

X̄_c = (1/N_c) ∑_{i: W_i=0} X_i,      S_c² = 1/(N_c − 1) ∑_{i: W_i=0} (X_i − X̄_c)²

X̄_t = (1/N_t) ∑_{i: W_i=1} X_i,      S_t² = 1/(N_t − 1) ∑_{i: W_i=1} (X_i − X̄_t)²

Normalized difference:

nd = (X̄_t − X̄_c) / √( (S_c² + S_t²)/2 )

This is more relevant for assessing balance than the t-statistic

t = (X̄_t − X̄_c) / √( S_c²/N_c + S_t²/N_t )
31
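A small sketch of both balance measures (numpy assumed). Unlike the t-statistic, the normalized difference does not mechanically grow with the sample size, which is why it is the more useful measure of covariate overlap:

```python
import numpy as np

def normalized_difference(x, w):
    # nd = (mean_t - mean_c) / sqrt((S_c^2 + S_t^2) / 2)
    x, w = np.asarray(x, dtype=float), np.asarray(w)
    x_t, x_c = x[w == 1], x[w == 0]
    return (x_t.mean() - x_c.mean()) / np.sqrt((x_c.var(ddof=1) + x_t.var(ddof=1)) / 2)

def t_stat(x, w):
    # t = (mean_t - mean_c) / sqrt(S_c^2 / N_c + S_t^2 / N_t)
    x, w = np.asarray(x, dtype=float), np.asarray(w)
    x_t, x_c = x[w == 1], x[w == 0]
    return (x_t.mean() - x_c.mean()) / np.sqrt(x_c.var(ddof=1) / len(x_c) +
                                               x_t.var(ddof=1) / len(x_t))
```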
Summary Statistics Lottery Sample

                     All (N=496)        Losers        Winners
Variable            mean    (s.d.)     (N_c=237)     (N_t=259)     [t-stat]    Norm. Dif.

Year Won            6.23    (1.18)       6.38          6.06          -3.0        -0.27
# Tickets           3.33    (2.86)       2.19          4.57           9.9         0.90
Age                 50.2    (13.7)       53.2          47.0          -5.2        -0.47
Male                0.63    (0.48)       0.67          0.58          -2.1        -0.19
Education          13.73    (2.20)      14.43         12.97          -7.8        -0.70
Working Then        0.78    (0.41)       0.77          0.80           0.9         0.08
Earn Y -6           13.8    (13.4)       15.6          12.0          -3.1        -0.27
...
Earn Y -1           16.3    (15.7)       18.0          14.5          -2.5        -0.23
Pos Earn Y -6       0.69    (0.46)       0.69          0.70           0.3         0.03
...
Pos Earn Y -1       0.71    (0.45)       0.69          0.74           1.2         0.10
32
There are substantial differences between the two groups, including in pre-winning earnings and in the number of lottery tickets bought. This suggests that simple regression methods will not necessarily be adequate for removing the biases associated with the differences in covariates.

Where do these differences come from?

• People buying more tickets are more likely to win.

• Nonresponse may differ by prize and individual characteristics (including labor income).
Is unconfoundedness plausible?
33
3.3 Assumptions
I. Unconfoundedness
Yi(0), Yi(1) ⊥ Wi | Xi.
This formulation is due to Rosenbaum and Rubin (1983). It is sometimes referred to as "selection on observables" or "exogeneity," but those terms are not well defined.

Suppose

Y_i(0) = α + β′X_i + ε_i,    and    Y_i(1) = Y_i(0) + τ;

then

Y_i^obs = α + τ·W_i + β′X_i + ε_i,

and unconfoundedness ⇐⇒ ε_i ⊥ W_i | X_i (exogeneity).
34
II. Overlap
0 < pr(Wi = 1|Xi) < 1.
For all X there are treated and control units.
This assumption is in principle testable: one can estimate the propensity score

e(x) = pr(W_i = 1 | X_i = x)

and assess whether it gets close to zero or one.
I and II combined: Strong Ignorability (Rosenbaum and Rubin, 1983)
This is what we will work with for this application.
35
Identification Given Unconfoundedness and Overlap
By definition,

τ(x) = E[Y_i(1) − Y_i(0) | X_i = x] = E[Y_i(1) | X_i = x] − E[Y_i(0) | X_i = x].

By unconfoundedness this is equal to

E[Y_i(1) | W_i = 1, X_i = x] − E[Y_i(0) | W_i = 0, X_i = x]
   = E[Y_i^obs | W_i = 1, X_i = x] − E[Y_i^obs | W_i = 0, X_i = x].

By the overlap assumption we can estimate both terms on the right-hand side.

Then

τ = E[τ(X_i)].
36
3.4 Concern with regression estimators in cases with
limited overlap in covariate distributions (distinct from
plausibility of unconfoundedness assumption)
Regression estimators are the most widely used methods for estimating treatment effects.

Sometimes simple linear regression:

Y_i^obs = β_0 + τ·W_i + β_1·X_i + ε_i

Sometimes allowing for interactions:

Y_i^obs = β_0 + τ·W_i + β_1·X_i + β_2·(X_i − X̄)·W_i + ε_i
What is the problem with these methods?
37
Suppose the regression model is:

Y_i(w) = β_{w0} + β_{w1}·X_i + ε_{wi}.

Suppose we are interested in τ_t^P, the average effect on the treated:

τ_t^P = E[Y_i(1) − Y_i(0) | W_i = 1] = E[Y_i | W_i = 1] − E[ E[Y_i | W_i = 0, X_i] | W_i = 1 ].

This is estimated as

Ȳ_t − Ȳ_c − β̂_{c1}·( X̄_t − X̄_c ).

The problem is that if the difference X̄_t − X̄_c is substantial, we do a lot of extrapolation using this estimator. Regression is OK with experimental data, but not if X̄_t − X̄_c is substantial.

This is why looking at normalized differences in covariates is useful as an indication of potential problems.
38
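A sketch of the plug-in estimator above for a single covariate (numpy assumed; names illustrative). Fitting the regression in the control group and predicting at the treated covariate values makes the extrapolation explicit:

```python
import numpy as np

def tau_treated_regression(y, w, x):
    # Estimate E[Y(1) - Y(0) | W = 1] by fitting Y on X in the control group and
    # predicting the counterfactual control outcome at the treated X values:
    # tau_t_hat = mean_t(Y) - mean_t(b_c0 + b_c1 * X)
    #           = Ybar_t - Ybar_c - b_c1 * (Xbar_t - Xbar_c).
    y, w, x = (np.asarray(a, dtype=float) for a in (y, w, x))
    b1, b0 = np.polyfit(x[w == 0], y[w == 0], deg=1)   # control-group slope, intercept
    # The prediction extrapolates whenever the treated X values lie far outside
    # the range of the control X values: this is the concern raised above.
    counterfactual = b0 + b1 * x[w == 1]
    return y[w == 1].mean() - counterfactual.mean()
```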
3.5 Estimation: Subclassification
The propensity score is the probability of assignment to treatment given the covariates:

e(x) ≡ pr(W_i = 1 | X_i = x) = E[W_i | X_i = x]

Unconfoundedness

W_i ⊥ Y_i(0), Y_i(1) | X_i

implies (Rosenbaum & Rubin, 1983)

W_i ⊥ Y_i(0), Y_i(1) | e(X_i).

So we only need to adjust for differences in the (scalar) propensity score.
39
Blocking or Subclassification
Pick c_0 < c_1 < ... < c_J, with c_0 = 0 and c_J = 1.

Define the block indicator B_ij = 1 if c_{j−1} < e(X_i) ≤ c_j.

τ̂_j = (1/N_{jt}) ∑_{i: B_ij=1} W_i·Y_i − (1/N_{jc}) ∑_{i: B_ij=1} (1 − W_i)·Y_i.

τ̂ = ∑_{j=1}^{J} [(N_{jc} + N_{jt})/N] · τ̂_j.
Cochran recommended using 5 blocks, as this eliminates about 90% of the bias in the normal case.
40
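A minimal sketch of the blocking estimator (numpy assumed). The propensity score is taken as already estimated (e.g., by logistic regression), and the cut points here are placed at quantiles of the estimated score rather than at the fixed values c_0 < ... < c_J above:

```python
import numpy as np

def subclassification_estimate(y, w, pscore, n_blocks=5):
    # Weighted average of within-block differences in means, with weights
    # (N_jc + N_jt) / N, following the formula above.
    y, w, pscore = (np.asarray(a, dtype=float) for a in (y, w, pscore))
    cuts = np.quantile(pscore, np.linspace(0, 1, n_blocks + 1))
    cuts[0], cuts[-1] = 0.0, 1.0
    n, tau_hat = len(y), 0.0
    for j in range(n_blocks):
        in_block = (pscore > cuts[j]) & (pscore <= cuts[j + 1])
        y_b, w_b = y[in_block], w[in_block]
        # Requires both treated and control units in every block (overlap).
        tau_j = y_b[w_b == 1].mean() - y_b[w_b == 0].mean()
        tau_hat += in_block.sum() / n * tau_j
    return tau_hat
```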
Lottery Data: Normalized Differences in Covariates after Subclassification

                           Full      Selected     2 Subclasses     5 Subclasses

Year Won                  -0.27       -0.06           0.03            -0.07
Tickets Bought             0.90        0.51          -0.18            -0.07
Age                       -0.47       -0.08           0.02            -0.04
Male                      -0.19       -0.11           0.10             0.14
Years of Schooling        -0.70       -0.47           0.19             0.10
Working Then               0.08        0.03          -0.03            -0.02
Earnings Year -6          -0.27       -0.19           0.10             0.03
...
Earnings Year -1          -0.23       -0.19           0.02            -0.00
Pos Earnings Year -6       0.03       -0.00           0.08             0.08
...
Pos Earnings Year -1       0.10       -0.01           0.04             0.07
41
Optimal Subclassification

Subclass    Min P-score    Max P-score    # Controls    # Treated

1              0.03           0.24            67            13
2              0.24           0.32            32             8
3              0.32           0.44            24            17
4              0.44           0.69            34            46
5              0.69           0.99            15            66
42
Average Treatment Effects, Lottery Data, With and Without Subclassification
(Within-block regression adjustment with no, few, or all covariates)

              Full Sample     Selected     Selected      Selected
Covariates      1 Block        1 Block     2 Blocks      5 Blocks

No               -6.16          -6.64        -6.05         -5.66
Few              -2.85          -3.99        -5.57         -5.07
All              -5.08          -5.34        -6.35         -5.74
43
3.6 Assessing Unconfoundedness

Main question:

• What can we say about the plausibility of unconfoundedness?

• Recall, unconfoundedness is not testable.
44
We consider the unconfoundedness assumption,

Y_i(0), Y_i(1) ⊥⊥ W_i | X_i.    (3)

We partition the vector of covariates X_i into two parts, a (scalar) pseudo outcome, denoted by X_i^p, and the remainder, denoted by X_i^r, so that X_i = (X_i^p, X_i^r′)′.

Now we assess whether the following conditional independence relation holds:

X_i^p ⊥⊥ W_i | X_i^r.

The two issues are (i) the interpretation of this relation and its link to the unconfoundedness assumption that is of primary interest, and (ii) the implementation of the test.
45
The main issue concerns the link between this conditional independence relation and unconfoundedness. This link is indirect, as unconfoundedness cannot be tested directly.

First consider a related condition:

Y_i(0), Y_i(1) ⊥⊥ W_i | X_i^r.

If this modified unconfoundedness condition were to hold, one could use the adjustment methods with only the subset of covariates X_i^r. In practice this is a stronger condition than the original unconfoundedness condition.

The modified unconfoundedness condition is not testable either. We therefore use the pseudo outcome X_i^p as a proxy for Y_i(0), and test

X_i^p ⊥⊥ W_i | X_i^r.
46
A leading example is the case where X_i contains multiple lagged measures of the outcome. In the lottery example we have observations on earnings for the six years prior to winning. Denote these lagged outcomes by Y_{i,−1}, ..., Y_{i,−6}, where Y_{i,−1} is the most recent and Y_{i,−6} is the most distant pre-winning earnings measure.

One could implement the above ideas using earnings for the most recent pre-winning year, Y_{i,−1}, as the pseudo outcome X_i^p, so that the vector of remaining pretreatment variables X_i^r would still include the five prior years of pre-winning earnings Y_{i,−2}, ..., Y_{i,−6} (ignoring additional pre-treatment variables).
47
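A simplified, regression-based version of this check (the estimates on the next slide use the subclassification estimator instead; numpy assumed, names illustrative). A coefficient on W close to zero is consistent with, but does not prove, unconfoundedness:

```python
import numpy as np

def pseudo_outcome_check(x_p, w, x_r):
    # Regress the pseudo outcome X^p (e.g., Y_{-1}) on the treatment indicator W
    # and the remaining covariates X^r (e.g., Y_{-2}, ..., Y_{-6}); report the
    # coefficient on W, which should be near zero if the check passes.
    x_p = np.asarray(x_p, dtype=float)
    w = np.asarray(w, dtype=float)
    x_r = np.asarray(x_r, dtype=float).reshape(len(w), -1)
    design = np.column_stack([np.ones_like(w), w, x_r])
    coef, *_ = np.linalg.lstsq(design, x_p, rcond=None)
    return coef[1]
```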
Assessing Unconfoundedness Using the Lottery Data

Pseudo Outcome                           est      (s.e.)

Y_{−1}                                  -0.53     (0.78)
(Y_{−1} + Y_{−2})/2                     -1.16     (0.83)
(Y_{−1} + Y_{−2} + Y_{−3})/3            -0.39     (0.95)
(Y_{−1} + ... + Y_{−4})/4               -0.56     (0.97)
(Y_{−1} + ... + Y_{−5})/5               -0.41     (0.92)
(Y_{−1} + ... + Y_{−6})/6               -1.30     (2.17)

Actual Outcome

Y^obs                                   -5.74     (1.40)
49
4. Observational Studies: Instrumental Variables

• Sometimes unconfoundedness is not very plausible.

• There are no general solutions, but there are some special cases.
50
The Fulton Fish Market Study
The Fulton fish market in New York is where restaurants and fishmongers bought fish from boats and trucks, between 2 and 6 in the morning.

Individuals would trade particular quantities at prices negotiated separately for each sale; there were no posted prices.

Katy Graddy, for her PhD thesis, collected data from a particular seller, for a particular fish (whiting).
51
We are interested in estimating a demand function:

Q_t^D(p)

There is one demand function for every market (every day t), as a function of price.

Unconfoundedness: assume prices are randomly assigned and compare market t, where the price was p, with market t′, where the price was p′ ≠ p.

Not very credible! Prices are not randomly assigned!
52
Instrumental Variables
Think of both the demand and the supply function:

Q_t^D(p),    Q_t^S(p)

What we observe is the equilibrium price p_t that solves

Q_t^D(p_t) = Q_t^S(p_t),

and the corresponding quantity

q_t = Q_t^D(p_t) = Q_t^S(p_t).

How do we get the slope of the demand function?
53
Katy Graddy collected data on weather (wind speed and wave height) at sea on the preceding days.

Call these Z_t.

They affect the supply function, but not the demand function:

Cov( Q_t^S(p), Z_t ) ≠ 0,    Cov( Q_t^D(p), Z_t ) = 0.
54
[Figure: two panels of fitted relations between log quantity and log price for the Fulton fish market data; left panel: OLS estimates; right panel: IV estimates.]
Assume linearity:

ln Q_t^D(p) = α + β·ln p + ε_t

Estimates:

OLS:  β̂ = −0.54  (0.18)
IV:   β̂ = −1.01  (0.38)
55
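A sketch of the just-identified IV estimator with a single instrument (e.g., an indicator for stormy weather), contrasted with OLS; this is not Graddy's data or the exact specification used for the estimates above, and numpy is assumed:

```python
import numpy as np

def demand_slopes(log_q, log_p, z):
    # OLS: Cov(log p, log q) / Var(log p); biased for the demand slope because
    #      equilibrium prices also respond to demand shocks.
    # IV:  Cov(Z, log q) / Cov(Z, log p); valid if Z shifts supply only and is
    #      unrelated to demand.
    log_q, log_p, z = (np.asarray(a, dtype=float) for a in (log_q, log_p, z))
    beta_ols = np.cov(log_p, log_q)[0, 1] / np.var(log_p, ddof=1)
    beta_iv = np.cov(z, log_q)[0, 1] / np.cov(z, log_p)[0, 1]
    return beta_ols, beta_iv
```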
References
Abadie, A., Diamond, A., & Hainmueller, J. (2010). Synthetic control methods for comparative case studies: Estimating the effect of California's tobacco control program. Journal of the American Statistical Association, 105(490), 493-505.

Angrist, J. D., Graddy, K., & Imbens, G. W. (2000). The interpretation of instrumental variables estimators in simultaneous equations models with an application to the demand for fish. The Review of Economic Studies, 67(3), 499-527.

Bertrand, M., Duflo, E., & Mullainathan, S. (2004). How much should we trust differences-in-differences estimates? The Quarterly Journal of Economics, 119(1), 249-275.

Bertrand, M., & Mullainathan, S. (2004). Are Emily and Greg more employable than Lakisha and Jamal? A field experiment on labor market discrimination. American Economic Review, 94(4), 991-1013.

Crepon, B., Duflo, E., Gurgand, M., Rathelot, R., & Zamora, P. (2013). Do labor market policies have displacement effects? Evidence from a clustered randomized experiment. The Quarterly Journal of Economics, 128(2), 531-580.

Fisher, R. A. (1937). The Design of Experiments. Oliver and Boyd, Edinburgh.

Hamermesh, D. S., & Biddle, J. E. (1994). Beauty and the labor market. American Economic Review, 84(5), 1174-1194.

Holland, P. W. (1986). Statistics and causal inference. Journal of the American Statistical Association, 81(396), 945-960.

Imbens, G. W., & Rubin, D. B. (2015). Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction. Cambridge University Press.

Imbens, G. W., Rubin, D. B., & Sacerdote, B. I. (2001). Estimating the effect of unearned income on labor earnings, savings, and consumption: Evidence from a survey of lottery players. American Economic Review, 91(4), 778-794.

Rosenbaum, P. R., & Rubin, D. B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika, 70(1), 41-55.

Splawa-Neyman, J., Dabrowska, D. M., & Speed, T. P. (1990). On the application of probability theory to agricultural experiments. Essay on principles. Section 9. Statistical Science, 465-472.