EPSE 581C: Causal Inference for Applied Researchers

Ed Kroc

University of British Columbia

[email protected]

June 19, 2019

Ed Kroc (UBC) Causal Inference June 19, 2019 1 / 32

Last time

More instrumental variables


Today

More instrumental variables

Pseudo-causal inference:

Mediation models

Naive SEMs

Course recap


Instrumental variables (natural experiments)

Instrumental variables allow us to unconfound the causal effect of treatment (with drop-out, etc.) by exploiting a truly random intention-to-treat mechanism:

Intention to treat is highly correlated with actual treatment.

The assignment to treatment only affects the outcome via the actual treatment.

More generally, an instrumental variable Z unconfounds the causal effect of X on Y if:

Z is highly correlated with X.

Z is only related to the outcome Y via the effect of X.


Instrumental variables (natural experiments)

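We saw last time that if X is not randomly assigned to sample units, then X will be confounded with Y; i.e.

$$Y = \beta_0 + \beta_X X + \delta$$

where $\delta = f(X) + \delta'$.

Thus:

$$\beta_X = \frac{\mathrm{Cov}(Y, X) - \mathrm{Cov}(\delta, X)}{\mathrm{Var}(X)}$$

and because $\mathrm{Cov}(\delta, X) \neq 0$, the usual regression assumptions are violated. Thus, the ordinary regression estimate

$$\hat{\beta}_X = \frac{\widehat{\mathrm{Cov}}(Y, X)}{\widehat{\mathrm{Var}}(X)} = \frac{\sum_{i=1}^{n} (y_i - \bar{y})(x_i - \bar{x})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$

will be wrong!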
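A minimal simulation sketch of this bias, using only the Python standard library. The data-generating process (an unobserved confounder `u` feeding both X and δ, a true βX of −2, the seed and sample size) is an assumption made for illustration, not something taken from the lecture:

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)

random.seed(1)
n = 20000
beta_x = -2.0   # true causal effect (assumed for illustration)

# u is a hypothetical unobserved confounder driving both X and delta.
u = [random.gauss(0, 1) for _ in range(n)]
x = [ui + random.gauss(0, 1) for ui in u]
delta = [2.0 * ui + random.gauss(0, 1) for ui in u]   # so Cov(delta, X) != 0
y = [2.0 + beta_x * xi + di for xi, di in zip(x, delta)]

beta_ols = cov(y, x) / cov(x, x)
bias = cov(delta, x) / cov(x, x)   # the term that OLS absorbs into the slope

print(f"OLS slope:              {beta_ols:.3f}  (true beta_X = {beta_x})")
print(f"Cov(delta, X) / Var(X): {bias:.3f}")
```

In any sample, the OLS slope equals βX plus the printed ratio, which is the identity above rearranged.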


Instrumental variables (natural experiments)

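Instead, we can get an unconfounded estimate of the ACE(X) via

$$\beta_{IV} = \frac{\mathrm{Cov}(Z, Y)}{\mathrm{Cov}(Z, X)}$$

Why? A little covariance algebra:

$$\beta_{IV} = \frac{\mathrm{Cov}(Z, Y)}{\mathrm{Cov}(Z, X)} = \frac{\mathrm{Cov}(Z, \beta_0 + \beta_X X + \delta)}{\mathrm{Cov}(Z, X)} = \frac{\beta_X \mathrm{Cov}(Z, X) + \mathrm{Cov}(Z, \delta)}{\mathrm{Cov}(Z, X)} = \beta_X$$

Critically, this requires that $\mathrm{Cov}(Z, X) \neq 0$ (i.e. Z and X are correlated) and $\mathrm{Cov}(Z, \delta) = 0$ (i.e. Z is only related to Y via X).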
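The covariance ratio can be checked numerically. This sketch assumes a hypothetical setup in which Z is generated independently of the confounder `u`, so that Cov(Z, δ) = 0 holds by construction; all coefficients and the seed are illustrative choices:

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)

random.seed(2)
n = 20000
beta_x = -2.0   # true causal effect (assumed for illustration)

z = [random.gauss(0, 1) for _ in range(n)]            # instrument
u = [random.gauss(0, 1) for _ in range(n)]            # unobserved confounder
x = [zi + ui + random.gauss(0, 1) for zi, ui in zip(z, u)]
delta = [2.0 * ui + random.gauss(0, 1) for ui in u]   # independent of Z
y = [2.0 + beta_x * xi + di for xi, di in zip(x, delta)]

beta_ols = cov(y, x) / cov(x, x)   # confounded
beta_iv = cov(z, y) / cov(z, x)    # Cov(Z, Y) / Cov(Z, X)

print(f"OLS estimate: {beta_ols:.3f}")
print(f"IV estimate:  {beta_iv:.3f}  (true beta_X = {beta_x})")
```

Under these assumptions the OLS slope stays away from −2 while the IV ratio lands close to it.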


Instrumental variables (natural experiments)

In general, an instrumental variable Z for the causal effect of some X on Y must obey the following properties:

(1) Z must be conditionally independent of the response Y, given X; i.e. Z is only related to Y via X.

(2) Z must be related to the causal variable X, and in particular, $\mathrm{Cov}(Z, X) \neq 0$.

Assumption (1) is usually untestable; must be justified from theoretical considerations (like our intention to treat example).

Assumption (2) is directly testable; in practice, to get good IV-estimates, we should have a strong association/correlation between Z and X.

Note: this all generalizes to situations where you may want to control for fixed covariates or have more than one instrumental variable for one or more causal variables of interest.


Properties of the IV-estimator

Just like an ordinary (unadjusted) sample regression coefficient, the IV-estimator follows (approximately) a normal distribution for large enough sample sizes.

However, unlike typical OLS or ML regression coefficient estimates, the convergence of the sample statistic $\hat{\beta}_{IV}(Y \sim X)$ to a normal distribution can be very slow.

After some mathematical manipulation, we get that

$$\hat{\beta}_{IV}(Y \sim X) \longrightarrow N(\beta_X, \sigma^2_{IV}), \quad \text{as } n \to \infty,$$

where

$$\sigma^2_{IV} = \frac{\mathrm{Var}\big((Z - E(Z))\,\delta\big)}{n\,\big(\mathrm{Cov}(Z, X)\big)^2}$$

Notice: this variance (and hence the standard error) is huge if $\mathrm{Cov}(Z, X)$ is small (relative to $n$); i.e. if Z is a weak instrument for (X, Y).
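One way to fill in the "mathematical manipulation" is sketched below. It is a standard large-sample argument (central limit theorem plus Slutsky's theorem) and assumes i.i.d. sampling with finite second moments; the slides do not spell out which route was intended:

```latex
% Substitute y_i = \beta_0 + \beta_X x_i + \delta_i into the sample IV ratio:
\hat{\beta}_{IV} - \beta_X
  = \frac{\frac{1}{n}\sum_{i=1}^{n} (z_i - \bar{z})\,\delta_i}
         {\frac{1}{n}\sum_{i=1}^{n} (z_i - \bar{z})(x_i - \bar{x})}

% Numerator: mean zero because Cov(Z, \delta) = 0, so by the CLT
\sqrt{n}\,\frac{1}{n}\sum_{i=1}^{n} (z_i - \bar{z})\,\delta_i
  \;\xrightarrow{d}\; N\big(0,\ \mathrm{Var}((Z - E(Z))\,\delta)\big)

% Denominator: by the law of large numbers
\frac{1}{n}\sum_{i=1}^{n} (z_i - \bar{z})(x_i - \bar{x})
  \;\xrightarrow{p}\; \mathrm{Cov}(Z, X)

% Slutsky's theorem then gives the approximate sampling distribution
\hat{\beta}_{IV} \;\approx\; N\!\left(\beta_X,\
  \frac{\mathrm{Var}((Z - E(Z))\,\delta)}{n\,(\mathrm{Cov}(Z, X))^2}\right)
```

The denominator step is where a weak instrument hurts: when Cov(Z, X) is close to zero, the sample covariance can sit near zero or even change sign, and the ratio becomes unstable.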


Properties of the IV-estimator

For weak instruments, the sample IV-estimator need not look even close to normal (or t-distributed), even for large n.
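A small Monte Carlo sketch of this behaviour. The first-stage strengths (`gamma = 1.0` versus `gamma = 0.02`), the sample size, and the rest of the data-generating process are hypothetical choices, not the settings behind the original figure:

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)

def iv_estimate(gamma, n, rng):
    """One IV estimate of beta_X = -2 when X = gamma * Z + U + noise."""
    z = [rng.gauss(0, 1) for _ in range(n)]
    u = [rng.gauss(0, 1) for _ in range(n)]   # unobserved confounder
    x = [gamma * zi + ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
    y = [2.0 - 2.0 * xi + 2.0 * ui + rng.gauss(0, 1) for xi, ui in zip(x, u)]
    return cov(z, y) / cov(z, x)

def iqr(values):
    """Interquartile range: a spread measure that tolerates wild outliers."""
    s = sorted(values)
    return s[(3 * len(s)) // 4] - s[len(s) // 4]

rng = random.Random(4)
reps, n = 300, 400
strong = [iv_estimate(1.0, n, rng) for _ in range(reps)]
weak = [iv_estimate(0.02, n, rng) for _ in range(reps)]

print(f"strong instrument: IQR = {iqr(strong):.3f}, range = {max(strong) - min(strong):.1f}")
print(f"weak instrument:   IQR = {iqr(weak):.3f}, range = {max(weak) - min(weak):.1f}")
```

With the weak instrument, the sample Cov(Z, X) in the denominator is often close to zero, so individual estimates can be far from −2 and the spread is much wider than a normal approximation would suggest.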


Implementing the IV approach

Can take a so-called “two-stage least squares” approach to estimation:

First, estimate the model relating Z to X, e.g.:

$$X = \gamma_0 + \gamma_Z Z + \varepsilon$$

Then, estimate the model relating the predicted values of X, i.e. $\hat{X}$, to Y, e.g.:

$$Y = \beta_0 + \beta_X \hat{X} + \delta$$

Then this second-stage OLS estimate of $\beta_X$ will equal the IV-estimator: $\beta_{IV}$.

Unfortunately, this method will produce the wrong standard errors!

Many fixes for this problem, but in R, can use the ‘ivreg’ function (see HW).
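The two stages can be sketched by hand to confirm that the second-stage slope matches the covariance-ratio IV estimate. This is a plain-Python illustration on simulated data with an assumed data-generating process; it is not the `ivreg` implementation, and the helper names are hypothetical:

```python
import random

def mean(a):
    return sum(a) / len(a)

def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)

def ols(y, x):
    """Return (intercept, slope) of the simple regression of y on x."""
    slope = cov(y, x) / cov(x, x)
    return mean(y) - slope * mean(x), slope

random.seed(3)
n = 5000
z = [random.gauss(0, 1) for _ in range(n)]   # instrument
u = [random.gauss(0, 1) for _ in range(n)]   # unobserved confounder
x = [zi + ui + random.gauss(0, 1) for zi, ui in zip(z, u)]
y = [2.0 - 2.0 * xi + 2.0 * ui + random.gauss(0, 1) for xi, ui in zip(x, u)]

# Stage 1: regress X on Z and form the predicted values X-hat.
g0, gz = ols(x, z)
x_hat = [g0 + gz * zi for zi in z]

# Stage 2: regress Y on X-hat.
b0, beta_2sls = ols(y, x_hat)

beta_iv = cov(z, y) / cov(z, x)
print(f"two-stage slope: {beta_2sls:.6f}")
print(f"Cov ratio:       {beta_iv:.6f}")
```

The two numbers agree up to floating-point error because X-hat is a linear function of Z, so the second-stage slope reduces algebraically to Cov(Z, Y) / Cov(Z, X).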


Case study

Draca et al. (2009)


IV examples

Here, Z is a true (simulated) instrumental variable for (X, Y).

The response Y also depends on another random variable W unrelated to X or Z via:

$$Y = 2 - 2X - 2W - W^2 + \varepsilon,$$

where $\varepsilon$ is a random normal disturbance term.

IV-estimator of $\beta_X$, not controlling for W:


IV examples

If we calculate the IV-estimator using naive (unadjusted) two-stage least squares, then we get the correct estimate, but the standard error is wrong.

In general, fitting the two-stage least squares model naively will produce incorrect (often inflated) standard errors, since the second-stage residuals are computed from predicted values for X instead of their true observed values.
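A sketch of where the two standard errors part ways, for the single-instrument, homoskedastic case. The naive version takes residuals around the predicted X-hat; the corrected version keeps the same slope but takes residuals around the observed X. The simulated data and all coefficient values are assumptions for illustration:

```python
import math
import random

def mean(a):
    return sum(a) / len(a)

def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)

random.seed(5)
n = 5000
z = [random.gauss(0, 1) for _ in range(n)]   # instrument
u = [random.gauss(0, 1) for _ in range(n)]   # unobserved confounder
x = [zi + ui + random.gauss(0, 1) for zi, ui in zip(z, u)]
y = [2.0 - 2.0 * xi + ui + random.gauss(0, 1) for xi, ui in zip(x, u)]

# Stage 1: predicted values of X from Z.
gz = cov(x, z) / cov(z, z)
g0 = mean(x) - gz * mean(z)
x_hat = [g0 + gz * zi for zi in z]

# Stage 2: slope and intercept from regressing Y on X-hat.
b1 = cov(y, x_hat) / cov(x_hat, x_hat)
b0 = mean(y) - b1 * mean(x_hat)
sxx_hat = (n - 1) * cov(x_hat, x_hat)

# Naive residuals use X-hat; corrected residuals use the observed X.
rss_naive = sum((yi - b0 - b1 * xh) ** 2 for yi, xh in zip(y, x_hat))
rss_fixed = sum((yi - b0 - b1 * xi) ** 2 for yi, xi in zip(y, x))
se_naive = math.sqrt(rss_naive / (n - 2) / sxx_hat)
se_fixed = math.sqrt(rss_fixed / (n - 2) / sxx_hat)

print(f"estimate: {b1:.3f}")
print(f"naive SE: {se_naive:.4f}   corrected SE: {se_fixed:.4f}")
```

In this particular setup the naive standard error comes out larger than the corrected one. The size and direction of the gap depend on the data-generating process, which is one reason to rely on a dedicated routine rather than hand-rolled stages.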


IV examples

Still have:

$$Y = 2 - 2X - 2W - W^2 + \varepsilon,$$

where $\varepsilon$ is a random normal disturbance term.

Now find IV-estimator of $\beta_X$ controlling for W:

Slight improvement in estimate and standard error.
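A rough sketch of both estimators on simulated data. The outcome equation is the one above; how Z, X, and the confounding were generated in the lecture is not stated, so that part (including the confounder `u`) is assumed. "Controlling for W" is implemented here by partialling W linearly out of Y, X, and Z before taking the covariance ratio, which for a single instrument matches two-stage least squares with W included in both stages:

```python
import random

def mean(a):
    return sum(a) / len(a)

def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)

def residualize(a, w):
    """Residuals from the simple regression of a on w (with intercept)."""
    slope = cov(a, w) / cov(w, w)
    ma, mw = mean(a), mean(w)
    return [ai - ma - slope * (wi - mw) for ai, wi in zip(a, w)]

random.seed(6)
n = 20000
z = [random.gauss(0, 1) for _ in range(n)]   # instrument
u = [random.gauss(0, 1) for _ in range(n)]   # unobserved confounder (assumed)
w = [random.gauss(0, 1) for _ in range(n)]   # unrelated to X and Z
x = [zi + ui + random.gauss(0, 1) for zi, ui in zip(z, u)]
eps = [ui + random.gauss(0, 1) for ui in u]
y = [2 - 2 * xi - 2 * wi - wi ** 2 + ei for xi, wi, ei in zip(x, w, eps)]

# IV estimate ignoring W.
beta_iv = cov(z, y) / cov(z, x)

# IV estimate controlling (linearly) for W.
ry, rx, rz = residualize(y, w), residualize(x, w), residualize(z, w)
beta_iv_w = cov(rz, ry) / cov(rz, rx)

print(f"IV, not controlling for W: {beta_iv:.3f}")
print(f"IV, controlling for W:     {beta_iv_w:.3f}")
```

Both estimates target the same βX = −2, since W is unrelated to Z; controlling for W mainly removes outcome noise.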


Fake causal inference

There are a variety of methods out there that claim to be useful for causal inference.

Some methods are valid:

RD-designs (local causal inference)

PS methods (partial causal inference)

IV methods (full causal inference, theoretically possible)

Some methods are not valid on their own:

Mediation modelling

Naive (typical) structural equation modelling (SEM)

Network analysis

Imputation and other missing data techniques


Fake causal inference: Mediation modelling

“The triangle of mediation” (AKA the triangle of broken dreams):


Fake causal inference: Mediation modelling

Mediation analysis is not causal modelling.

Superficially, looks very similar to instrumental variables.

But instrumental validity is ignored.

An example of this pseudo-statistical methodology is as follows:

Check that the exposure is correlated with the outcome (regression).

Then fit a regression model including both the exposure and the mediator as predictors.

Then, if the direct effect of exposure on outcome ($c'$ in the previous picture) is statistically no different from zero, people say that the effect of exposure is fully mediated; i.e. some claim that exposure only affects the outcome via the mediator.

This should sound familiar: it amounts to claiming that this method shows that the exposure is an instrumental variable!


Fake causal inference: Mediation modelling

But this is false.

We know that zero correlation does not imply that random variables are independent (unless they are jointly normally distributed).

But IVs require independence of the IV from the outcome, conditional on the causal agent.

Mediation modelling is nothing more than correlational modelling; i.e. it is predictive or descriptive/explanatory, but not causal.
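A small simulated illustration of the gap between "zero partial correlation" and "no other pathway". The setup is hypothetical and deliberately simple: the exposure Z affects Y through the mediator X and also through W = Z² − 1, which depends entirely on Z yet is uncorrelated with it. The helper `ols2` is an illustrative two-predictor least-squares solver, not a library function:

```python
import random

def mean(a):
    return sum(a) / len(a)

def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)

def ols2(y, x1, x2):
    """Slopes (b1, b2) from regressing y on x1 and x2 with an intercept."""
    s11, s22, s12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
    s1y, s2y = cov(x1, y), cov(x2, y)
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

random.seed(7)
n = 20000
z = [random.gauss(0, 1) for _ in range(n)]       # exposure
x = [zi + random.gauss(0, 1) for zi in z]        # mediator
w = [zi ** 2 - 1 for zi in z]                    # second pathway, Cov(W, Z) = 0
y = [2 - 2 * xi - 2 * wi + random.gauss(0, 1) for xi, wi in zip(x, w)]

# Step 1: total effect of the exposure on the outcome.
c_total = cov(y, z) / cov(z, z)

# Step 2: direct effect c' of the exposure once the mediator is included.
b_x, c_direct = ols2(y, x, z)

# What is left over is still a function of Z, just not a linear one.
resid = [yi - b_x * xi - c_direct * zi for yi, xi, zi in zip(y, x, z)]
leftover = cov(resid, [zi ** 2 for zi in z])

print(f"total effect c:           {c_total:.3f}")
print(f"direct effect c':         {c_direct:.3f}")
print(f"Cov(residual, Z squared): {leftover:.3f}")
```

Here c' is close to zero, so the usual recipe would declare full mediation, even though Y still depends on Z through W. The linear check cannot see a dependence that is not linear.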


Fake causal inference: Mediation modelling

Example of the (many) problems with mediation analysis:

Try to study a response Y that depends on X and W, where both X and W depend on Z:

$$Y = 2 - 2X + 2X^2 - 2W - W^2 + 3X \cdot W + \varepsilon,$$

where $\varepsilon$ is a random normal disturbance term.

Suppose we only collect data on Z, X, and Y (perhaps we don’t know that W is an important confounder).

Mediation analysis: using some kind of a priori hypothesis/theory, we might think that Z only affects Y via X; i.e. the effect of Z on Y is fully mediated via X.


Fake causal inference: Mediation modelling

Following the standard “triangle of mediation”, we would first check that Z is correlated with Y:


Fake causal inference: Mediation modelling

Next, we would check that the effect of Z on Y “disappears” when we control for X:

True! But, in fact, Z is not an instrumental variable, nor does it affect Y only via X; recall, Y depends on W, which depends on Z as well.


Fake causal inference: Mediation modelling

Next, we would check that the effect of Z on Y “disappears” when we control for X:

Notice also that the estimated effect of X on Y is totally wrong! (True marginal effect is $-2$.)


Fake causal inference: SEMs

Structural equation models - SEMs (can take entire classes on this subject).

Very useful for modelling latent variables (constructs) with measurement error.

However, SEMs are not causal modelling on their own.

There is a long history of conflating SEMs with causal modelling.

The vast majority of SEM work is purely correlational, not causal.

But see the work of J. Pearl et al. for a more rigorous, causally-flavoured approach to SEMs (relying entirely on the RD/PS/IV frameworks to inject causality into the methodology).


3 kinds of inference; 2 kinds of modelling

Inferential frameworks:

Descriptive/explanatory inference

Predictive inference

Causal inference

Modelling frameworks:

Exploratory modelling

Confirmatory modelling

Generally, all of these use the same statistical tools. But how these tools are used and how we define what are good statistics differ.


3 causal frameworks in use

Fisher: Classical experimental design:

Randomization

Control

Manipulation

Neyman-Rubin: Counterfactual causality:

Counterfactual probability

Missing data problems

Pearl: “do”-calculus and a pseudo-return to Fisher:

Counterfactual probability augmented

Directed acyclic graphs (DAGs), structural equation models (SEMs)


2.5 broad domains of research design

Experimental research:

Fisher: randomize, control, manipulate

Classic laboratory sciences: (lab) biology, chemistry, physics

Observational research:

Purely associational; no causal inference

Quasi-experimental research:

Observational designs that have experimental elements: includes most health research

Most applied research that is considered “rigorous” in the social and ecological sciences

A complicated mixture of Fisher, Neyman-Rubin, and Pearl (discipline-dependent)

Causal inference here always requires the application of certain untestable assumptions


Control, randomize, manipulate

Experimental manipulation:

E.g. physically assigning seeds to 1 of 3 soil treatments

Experimental control:

Defined, fixed treatment levels

Sampling frame: e.g. all seeds from same industrial source

Experimental randomization/exchangeability:

Random assignment to experimental protocol

Lack of random sampling, but seeds engineered to be exchangeable


Exchangeability: generalizability and confounding

Random sampling produces exchangeability of samples over the (empirical) population; allows inferences to generalize from sample to sampling frame:

For a random variable Y, the probability distribution of Y is the same on the sample S (on average) as it is in the (empirical) population Ω:

Strong exchangeability: $E[\Pr(Y \mid S)] = \Pr(Y \mid \Omega)$

Eliminate confounding by ensuring sample units are exchangeable over treatment; allows for causal inferences in the sample:

For example, if treatment T is randomly assigned to subjects, and X is a covariate, then the probability distribution of X on T = 0 is the same (on average) as the probability distribution of X on T = 1:

$$E[\Pr(X \mid T = 0, S)] = E[\Pr(X \mid T = 1, S)]$$
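A quick simulated check of covariate balance under random assignment, contrasted with a self-selected treatment. The covariate distribution and the selection rule (`0.8` versus `0.2` uptake on either side of 50) are made-up values for illustration:

```python
import random

random.seed(8)
n = 10000

# Hypothetical covariate, e.g. a baseline score.
x = [random.gauss(50, 10) for _ in range(n)]

# Treatment assigned by a fair coin flip, ignoring X.
t_random = [random.random() < 0.5 for _ in range(n)]

# Treatment taken up mostly by units with high X (no randomization).
t_self = [random.random() < (0.8 if xi > 50 else 0.2) for xi in x]

def mean_gap(x, t):
    """Mean of X among treated units minus mean of X among controls."""
    treated = [xi for xi, ti in zip(x, t) if ti]
    control = [xi for xi, ti in zip(x, t) if not ti]
    return sum(treated) / len(treated) - sum(control) / len(control)

gap_random = mean_gap(x, t_random)
gap_self = mean_gap(x, t_self)

print(f"X gap, randomized treatment:    {gap_random:.2f}")
print(f"X gap, self-selected treatment: {gap_self:.2f}")
```

Only the means are compared here; the exchangeability statement above concerns the whole distribution of X, so a fuller check would compare quantiles or histograms as well.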


Isolation, association, directionality

Is it always possible to induce experimental control?

Can we always isolate a single cause from other causes?

Are effects always associated with causes?

Can we always isolate the direction of causality: e.g. A causes B, but B does not cause A?

These are difficult questions with, at best, unclear answers. For example:

We want to establish that X causes Y, and not the other way around. How do we detect this in practice?

Temporal priority does not help if X can cause Y and Y can cause X simultaneously.

But is that actually possible? E.g. high stock prices of a company can cause high profits and high profits can cause high stock prices.


Probability alone is not sufficient to capture causality

Classic example from Pearl (1995):

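Define the expression

$$\Pr(Y \mid do(x))$$

to be the probability of Y given that X has been experimentally fixed to the value X = x.

Then

$$\Pr(\text{mud} \mid \text{rain}) \geq \Pr(\text{mud}) \quad \text{and} \quad \Pr(\text{rain} \mid \text{mud}) \geq \Pr(\text{rain}),$$

but

$$\Pr(\text{mud} \mid do(\text{rain})) \geq \Pr(\text{mud}) \quad \text{and} \quad \Pr(\text{rain} \mid do(\text{mud})) = \Pr(\text{rain}).$$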
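The asymmetry can be mimicked with a toy structural model in which rain causes mud but not the reverse. The probabilities (`0.3` for rain, `0.9` and `0.1` for mud with and without rain) and the `simulate` helper are invented for illustration:

```python
import random

random.seed(9)
n = 100000

def simulate(do_rain=None, do_mud=None):
    """Draw one (rain, mud) pair; an argument overrides that variable's natural mechanism."""
    rain = (random.random() < 0.3) if do_rain is None else do_rain
    if do_mud is None:
        mud = random.random() < (0.9 if rain else 0.1)   # rain causes mud
    else:
        mud = do_mud                                     # mud fixed by intervention
    return rain, mud

# Observational world: conditioning on mud is informative about rain.
obs = [simulate() for _ in range(n)]
p_rain = sum(r for r, m in obs) / n
p_rain_given_mud = sum(r for r, m in obs if m) / sum(m for r, m in obs)

# Interventional world: making mud by hand does not change the weather.
forced = [simulate(do_mud=True) for _ in range(n)]
p_rain_do_mud = sum(r for r, m in forced) / n

print(f"Pr(rain)           = {p_rain:.3f}")
print(f"Pr(rain | mud)     = {p_rain_given_mud:.3f}")
print(f"Pr(rain | do(mud)) = {p_rain_do_mud:.3f}")
```

Conditioning filters the observed cases, while the intervention replaces the mechanism that generates mud, which is why only the first changes the probability of rain.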

But does it even make sense to attempt causal inference when treatments cannot be (at least hypothetically) manipulated?


Final words

There are many unanswered questions regarding causality and causal inference, especially in a non-experimental framework.

Ultimately, the only way toward pure causality is experimental control and manipulation.

But failing that, we have three powerful methods to derive local, partial, or “full” causal inferences in a non-experimental framework:

Regression discontinuity designs

Propensity scores (matching)

Instrumental variables

And, as always, when in doubt: consult a statistician!


Final words

Good luck in your future work!

Lagniappe: follow me on Instagram if you like wildlife photos (with lots of gulls) and occasional academic stuff: @edkroc
