An Introduction to Causal Inference in Neuroimaging

Moritz Grosse-Wentrup

Max Planck Institute for Intelligent Systems, Department of Empirical Inference

Tübingen, Germany

February 28, 2014

Why should we be interested in causal inference?

What should a scientific theory be able to do?

1. Explain already observed data
2. Correctly predict future observations
   - of systems that are passively observed.
   - of systems that are actively intervened upon.

→ The aim of causal inference is to predict how a system reacts to an intervention.

The potential outcomes framework

Notation:

A population U

Individual units u ∈ U

A treatment variable S(u) : U → {t, c}

An outcome variable Y(u, S(u)) : U × {t, c} → ℝ

Example:

U: A set of patients

u: An individual patient

S(u): Assignment of patient u to the treatment or control group

Y(u, S(u)): The survival time of patient u under treatment S(u)

(Holland PW, Statistics and Causal Inference. Journal of the American Statistical Association, 1986)

The fundamental problem of causal inference:

The effect of cause t on u is measured as Y_t(u) − Y_c(u).

It is impossible to observe Y_t(u) and Y_c(u) on the same unit.

Solutions:

1. Assume unit homogeneity: Y_t(u_1) = Y_t(u_2) and Y_c(u_1) = Y_c(u_2)

2. Assume causal transience: Y_t(u) ⊥⊥ Y_c(u)

3. Compute the average causal effect: T = E(Y_t) − E(Y_c)

But we can only observe E(Y_t | S = t) and E(Y_c | S = c).

When does it hold that E(Y_t | S = t) = E(Y_t) and E(Y_c | S = c) = E(Y_c)?

If Y ⊥⊥ S, i.e., if treatment assignments are made randomly.
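The role of random assignment can be illustrated with a small simulation. This is only a sketch under assumed numbers: the outcome distributions, the sample size, and all variable names are hypothetical, and both potential outcomes are available only because the data are simulated.

```python
import numpy as np

rng = np.random.default_rng(0)
n_units = 100_000

# Hypothetical potential outcomes for every unit (both known only in simulation).
y_c = rng.normal(loc=10.0, scale=1.0, size=n_units)        # outcome under control
y_t = y_c + rng.normal(loc=2.0, scale=0.5, size=n_units)   # outcome under treatment

# Average causal effect T = E(Y_t) - E(Y_c), computable here because both are known.
true_ate = float(np.mean(y_t - y_c))

# Random assignment: S is independent of (Y_t, Y_c).
treated = rng.random(n_units) < 0.5

# Only one potential outcome per unit is actually observed.
y_obs = np.where(treated, y_t, y_c)
estimated_ate = float(y_obs[treated].mean() - y_obs[~treated].mean())

print(f"true ATE: {true_ate:.3f}, estimate from observed data: {estimated_ate:.3f}")
```

With random assignment, the difference of the observed group means should land close to the true average effect, even though no unit reveals both of its potential outcomes.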

Example:

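[Figure: causal graph in which Sex (more generally, a hidden variable h) is a common cause of Treatment and Survival.]

Suppose the treatment has no effect on survival: within each sex, E(Survival | Treatment) = E(Survival | Control).

Assume E(Survival | Sex = female) > E(Survival | Sex = male)

Assume E(Treatment | Sex = female) > E(Treatment | Sex = male), i.e., women are more likely to be treated

Then, pooled over both sexes, E(Survival | Treatment) > E(Survival | Control)

Incorrect conclusion: Treatment → Survival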
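The example can be checked with exact arithmetic. The numbers below are hypothetical and chosen only for illustration; the sketch assumes that survival depends on sex alone, so the treatment has no effect at all.

```python
# Hypothetical numbers; the treatment has no effect on survival.
p_female = 0.5
p_treat_given_sex = {"female": 0.8, "male": 0.2}          # E(Treatment | Sex)
mean_survival_given_sex = {"female": 10.0, "male": 5.0}   # E(Survival | Sex), same in both groups

p_sex = {"female": p_female, "male": 1.0 - p_female}


def mean_survival(treated: bool) -> float:
    """E(Survival | group), via the law of total expectation over Sex."""
    # P(group | sex) for the requested group (treated or control)
    p_group_given_sex = {
        s: (p if treated else 1.0 - p) for s, p in p_treat_given_sex.items()
    }
    p_group = sum(p_group_given_sex[s] * p_sex[s] for s in p_sex)
    # Bayes' rule: P(sex | group)
    p_sex_given_group = {s: p_group_given_sex[s] * p_sex[s] / p_group for s in p_sex}
    return sum(mean_survival_given_sex[s] * p_sex_given_group[s] for s in p_sex)


survival_treated = mean_survival(treated=True)
survival_control = mean_survival(treated=False)
print(survival_treated, survival_control)
```

Under these assumed numbers the treated group shows a mean survival of about 9 versus about 6 for controls, purely because the treated group contains more women.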

Causal inference in neuroimaging

1. Can we do causal inference based on observational data only?
   - In general, no.
   - Under (usually untestable) assumptions, yes.

2. So what makes for a good causal inference algorithm?
   - It is provably correct under reasonable assumptions.
   - It makes testable predictions on the effect of interventions.
   - It is able to deal with hidden confounders.
   - It performs well on finite data.

Outline

1. Granger Causality

2. Causal Bayesian Networks

3. Dynamic Causal Modelling

4. Non-Linear Non-Gaussian Acyclic Models (Non-LiNGAM)

5. Summary

Granger causality

Definition A (not by Granger): One stochastic process is causal to a second if the autoregressive predictability of the second process at a given time point is improved by including measurements from the immediate past of the first.

Example: First-order autoregressive (AR) process

x[t] = a x[t − 1] + ε_x[t]

y[t] = b y[t − 1] + c x[t − 1] + ε_y[t]

[Figure: time-unrolled graph over x[t − 1], x[t], x[t + 1] and y[t − 1], y[t], y[t + 1].]

Inference procedure: x → y?

Predict y[t] from its past values ⇒ σ²(y[t] | y[t − 1]).

Predict y[t] from the past of y and x ⇒ σ²(y[t] | y[t − 1], x[t − 1]).

If σ²(y[t] | y[t − 1], x[t − 1]) < σ²(y[t] | y[t − 1]), conclude that x → y.

(Granger C.W.J., Investigating causal relations by econometric models and cross-spectral methods. Econometrica, 1969)
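The inference procedure can be sketched on simulated data from the first-order AR process above. On finite data the in-sample residual variance of the larger model is never greater than that of the smaller one, so this sketch compares the two fits with an F-statistic rather than a bare inequality. The coefficients, the sample size, and the helper names `residual_sum_of_squares` and `granger_f` are illustrative assumptions.

```python
import numpy as np


def residual_sum_of_squares(target, regressors):
    """Least-squares fit of target on the regressors plus an intercept."""
    design = np.column_stack([np.ones(len(target))] + list(regressors))
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return float(np.sum((target - design @ coef) ** 2))


def granger_f(target, own_past, other_past):
    """F-statistic for adding other_past to a model that already uses own_past."""
    rss_restricted = residual_sum_of_squares(target, [own_past])
    rss_full = residual_sum_of_squares(target, [own_past, other_past])
    dof = len(target) - 3  # intercept plus two slopes in the full model
    return (rss_restricted - rss_full) / (rss_full / dof)


# Simulate the AR process with assumed coefficients; x drives y, not vice versa.
rng = np.random.default_rng(0)
T, a, b, c = 5000, 0.5, 0.5, 0.8
x, y = np.zeros(T), np.zeros(T)
for t in range(1, T):
    x[t] = a * x[t - 1] + rng.normal()
    y[t] = b * y[t - 1] + c * x[t - 1] + rng.normal()

f_x_to_y = granger_f(y[1:], y[:-1], x[:-1])  # does the past of x help predict y?
f_y_to_x = granger_f(x[1:], x[:-1], y[:-1])  # does the past of y help predict x?
print(f"F(x -> y) = {f_x_to_y:.1f}, F(y -> x) = {f_y_to_x:.1f}")
```

A large F-statistic for x → y together with a small one for y → x would be read as evidence for x → y in the Granger sense; the decision threshold is left to the user's chosen significance level.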

Granger causality: Confounding

Definition B (by Granger): If σ²(Y | U) < σ²(Y | U − X), we say that X is causing Y, denoted by X_t ⇒ Y_t. Here U denotes all available information and U − X all of this information apart from X.

Confounding:

[Figure: time-unrolled graph over x, y, and a hidden process h at times t − 2, ..., t + 1, in which h influences both x and y.]

→ Control for h: σ²(y | y, x, h) < σ²(y | y, h)!

Impossible for latent variables.
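The effect of a hidden common driver can be sketched in simulation. The model below is an assumption made for illustration: h reaches x one step earlier than it reaches y, and there is no influence of x on y. The helper names `rss` and `f_stat` are hypothetical.

```python
import numpy as np


def rss(target, regressors):
    """Residual sum of squares of a least-squares fit with intercept."""
    design = np.column_stack([np.ones(len(target))] + list(regressors))
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return float(np.sum((target - design @ coef) ** 2))


def f_stat(target, base, extra):
    """F-statistic for adding one extra regressor to the base regressors."""
    rss_base = rss(target, base)
    rss_full = rss(target, base + [extra])
    dof = len(target) - (len(base) + 2)  # intercept + base regressors + extra
    return (rss_base - rss_full) / (rss_full / dof)


rng = np.random.default_rng(1)
T = 5000
h = rng.normal(size=T)  # hidden common driver
x, y = np.zeros(T), np.zeros(T)
for t in range(2, T):
    x[t] = 0.5 * x[t - 1] + h[t - 1] + rng.normal()  # h reaches x after one step
    y[t] = 0.5 * y[t - 1] + h[t - 2] + rng.normal()  # h reaches y after two steps

target = y[2:]                    # y[t]
y_past, x_past = y[1:-1], x[1:-1]  # y[t-1], x[t-1]
h_past = [h[1:-1], h[:-2]]        # h[t-1], h[t-2]

f_without_h = f_stat(target, [y_past], x_past)          # h treated as latent
f_with_h = f_stat(target, [y_past] + h_past, x_past)    # h observed and controlled for
print(f"F without h: {f_without_h:.1f}, F with h: {f_with_h:.1f}")
```

Without access to h, the past of x appears to improve the prediction of y; once the past of h is included, that improvement should vanish.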

Granger causality: Directed transfer function (DTF)

Observe T samples of x[t] ∈ ℝ^N, where N is the number of signals.

Pick an order p of the AR-process.

Learn parameters A[i] of the AR-process x[t] = ∑_{i=1}^{p} A[i] x[t − i] + ε[t].

Let A[0] = −I:

−∑_{i=0}^{p} A[i] x[t − i] = ε[t]

⇔ −A[t] ∗ x[t] = ε[t]

⇔ −A(f) x(f) = ε(f) (Fourier transform)

⇔ x(f) = H(f) ε(f) with H(f) = −A⁻¹(f).

h_ij(f) describes the frequency-specific effect of x_j[t] on x_i[t].

(Kaminski et al., Evaluating causal relations in neural systems. Biological Cybernetics, 2001)
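A minimal sketch of how H(f) could be computed once AR coefficient matrices are available. The function name `transfer_function`, the use of normalized frequency in cycles per sample, and the example coefficients are assumptions for illustration, not taken from Kaminski et al.

```python
import numpy as np


def transfer_function(ar_coeffs, f):
    """H(f) = -A(f)^{-1}, with A(f) = sum_{i=0}^{p} A[i] exp(-2*pi*j*f*i) and A[0] = -I.

    ar_coeffs: list [A[1], ..., A[p]] of N x N arrays.
    f: normalized frequency in cycles per sample.
    """
    n_signals = ar_coeffs[0].shape[0]
    a_f = -np.eye(n_signals, dtype=complex)  # the A[0] = -I term
    for lag, a_lag in enumerate(ar_coeffs, start=1):
        a_f += a_lag * np.exp(-2j * np.pi * f * lag)
    return -np.linalg.inv(a_f)


# Assumed first-order example in which x_1 drives x_2 but not vice versa.
A1 = np.array([[0.5, 0.0],
               [0.4, 0.5]])
H = transfer_function([A1], f=0.1)
print(np.abs(H))
```

In this example the entry H[0, 1] (effect of the second signal on the first) is zero, while H[1, 0] is not, which mirrors the direction of coupling in A1.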

Granger causality: Case study

(Bosman et al., Attentional stimulus selection through selective synchronization between monkey visual areas. Neuron, 2012)

Summary so far (columns: GC, CBN, DCM, Non-LiNGAM; ✓ = criterion met, ✗ = not met):

Provably correct under reasonable assumptions:

Testable interventions: ✓

Hidden confounders: ✗

Empirical performance:

Causal Bayesian Networks: Introductory example

[Figure: a causal structure (unknown) generates empirical data (observable) {x_1, y_1, z_1}, {x_2, y_2, z_2}, ..., {x_N, y_N, z_N} → p(x, y, z), from which a dependency structure (inferable) is obtained.]

(Pearl J., Causality: Models, reasoning, and inference, 2000)

Causal Bayesian Networks: Potential causation

Under the assumptions of faithfulness and causal sufficiency, the following conditions are sufficient for x to be a potential cause of z:

x ⊥̸⊥ z

y ⊥̸⊥ z

x ⊥⊥ y

[Figure: dependency structure and the causal structure it implies, the collider x → z ← y.]
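The dependency pattern of a collider can be reproduced in a short simulation. This sketch assumes a linear-Gaussian model and uses plain correlation as a crude stand-in for an independence test; the coefficients and sample size are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

x = rng.normal(size=n)
y = rng.normal(size=n)             # generated independently of x
z = x + y + rng.normal(size=n)     # common effect of x and y: x -> z <- y


def corr(a, b):
    """Pearson correlation between two samples."""
    return float(np.corrcoef(a, b)[0, 1])


r_xz, r_yz, r_xy = corr(x, z), corr(y, z), corr(x, y)
print(f"corr(x, z) = {r_xz:.2f}, corr(y, z) = {r_yz:.2f}, corr(x, y) = {r_xy:.2f}")
```

Both x and y are clearly associated with z, while x and y themselves show no association, which is the pattern listed in the three conditions.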

Causal Bayesian Networks: Spurious association

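Under the assumption of faithfulness, the following conditions are sufficient for x and y to be spuriously related:

x ⊥̸⊥ y

x ⊥̸⊥ z

y ⊥̸⊥ w

z ⊥⊥ w

x ⊥⊥ w

z ⊥⊥ y

[Figure: dependency structure over x, y, z, w ⇒ causal structure in which a hidden variable h is a common cause of x and y.]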

Causal Bayesian Networks: Genuine causation

Under the assumption of faithfulness, the following conditions are sufficient for x to be a genuine cause of y:

z is a potential cause of x

z ⊥̸⊥ y

z ⊥⊥ y | x

Causal Bayesian Networks: Predicting interventions

Given: Causal structure (DAG) & p(x, y, z, w)

Goal: Predict the effect of experimentally controlling x

1. Factorize p(x, y, z, w) according to its DAG:

p(x, y, z, w) = p(w | z) p(z | x, y) p(x) p(y)

2. Compute the interventional distribution:

p(x, y, z, w | do(x = x)) = p(w | z) p(z | x, y) p(y)

3. Compute the marginal distribution of the variable of interest:

p(w | do(x = x)) = ∫_z ∫_y p(w | z) p(z | x, y) p(y) dy dz
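For discrete variables the integrals become sums, which makes the three steps easy to trace. The sketch below assumes binary variables and hypothetical conditional probability tables; the names `bernoulli` and `p_w_do_x` are illustrative.

```python
# Hypothetical conditional probability tables for binary x, y, z, w.
p_y1 = 0.3                                                            # p(y = 1)
p_z1_given_xy = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.6, (1, 1): 0.9}  # p(z = 1 | x, y)
p_w1_given_z = {0: 0.2, 1: 0.8}                                       # p(w = 1 | z)


def bernoulli(p_one, value):
    """Probability of `value` for a binary variable with P(1) = p_one."""
    return p_one if value == 1 else 1.0 - p_one


def p_w_do_x(w, x0):
    """p(w | do(x = x0)) = sum_z sum_y p(w | z) p(z | x0, y) p(y)."""
    total = 0.0
    for z in (0, 1):
        for y in (0, 1):
            total += (
                bernoulli(p_w1_given_z[z], w)
                * bernoulli(p_z1_given_xy[(x0, y)], z)
                * bernoulli(p_y1, y)
            )
    return total


effect_x1 = p_w_do_x(w=1, x0=1)
effect_x0 = p_w_do_x(w=1, x0=0)
print(effect_x1, effect_x0)
```

Note that the factor p(x) is simply dropped from the factorization when x is set by intervention. In this particular DAG x has no parents, so the interventional distribution coincides with the conditional p(w | x); that would not hold in general.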

Causal Bayesian Networks: Faithfulness

Faithfulness: All observed (conditional) independences are structural.

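[Figure: causal structure with edges x → y, x → z, and z → y, next to the dependency structure between x and y.]

Functional model:

x = u_x

z = c x + u_z

y = a x + b z + u_y = (a + bc) x + b u_z + u_y = b u_z + u_y (if a = −bc)

If a = −bc and u_x, u_y, u_z are mutually independent, then x ⊥⊥ y in the dependency structure although x → y in the causal structure.

For a given causal structure (DAG), the unfaithful distributions have measure zero in the space of all distributions that can be generated by the DAG (Meek, UAI, 1995).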
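The cancellation can be reproduced numerically. This sketch assumes standard normal noise terms and the illustrative values b = c = 1, hence a = −1; correlation and partial correlation serve as simple dependence measures for this linear-Gaussian case.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000

b, c = 1.0, 1.0
a = -b * c  # the unfaithful parameter choice a = -bc

u_x, u_y, u_z = rng.normal(size=(3, n))
x = u_x
z = c * x + u_z
y = a * x + b * z + u_y


def residual(target, regressor):
    """Residual of a simple linear regression of target on regressor."""
    slope = np.cov(target, regressor)[0, 1] / np.var(regressor, ddof=1)
    return target - target.mean() - slope * (regressor - regressor.mean())


r_xy = float(np.corrcoef(x, y)[0, 1])
# Partial correlation of x and y given z
r_xy_given_z = float(np.corrcoef(residual(x, z), residual(y, z))[0, 1])
print(f"corr(x, y) = {r_xy:.3f}, partial corr(x, y | z) = {r_xy_given_z:.3f}")
```

The marginal correlation of x and y is close to zero because the direct and indirect paths cancel, while conditioning on z reveals a clear dependence.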

Causal Bayesian Networks: Conditional independence tests

Independence tests have to be carried out on finite data.

Uncorrelatedness does not imply independence.

Nonlinear independence tests are hard (Gretton et al., JMLR, 2012).

Conditional non-linear independence tests are very hard (Zhang et al., UAI, 2012).

Not finding a dependence is not evidence for independence.
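That uncorrelatedness does not imply independence can be seen in a two-line example. The choice y = x² with standard normal x is an assumption made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100_000)
y = x ** 2  # y is a deterministic function of x, hence fully dependent on it

r_linear = float(np.corrcoef(x, y)[0, 1])                 # linear correlation of x and y
r_after_transform = float(np.corrcoef(x ** 2, y)[0, 1])   # correlation after a non-linear transform
print(f"corr(x, y) = {r_linear:.3f}, corr(x^2, y) = {r_after_transform:.3f}")
```

A test based only on linear correlation would miss this dependence entirely, which is why non-linear independence tests are needed.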

Causal Bayesian Networks: Case study

What are the causes of inter-subject variations in performance when operating a sensorimotor brain-computer interface (BCI)?

[Figure: causal hypothesis relating Instruction and γ-power.]

Empirical independence structure (upper triangle: p-values; lower triangle: resulting (in)dependence):

⊥⊥ / p        Instruction   SMR    γ-power
Instruction   -             1e-4   0.44
SMR           ⊥̸⊥            -      2e-4
γ-power       ⊥⊥            ⊥̸⊥     -

[Figure: graph with nodes Instruction, neural substrate of motor imagery, γ-power, and neural substrate of attention.]

(Grosse-Wentrup et al. Causal influence of gamma-oscillations on the sensorimotor-rhythm. NeuroImage, 2011)

Summary so far (columns: GC, CBN; ✓ = criterion met, ✗ = not met):

Testable interventions: ✓ ✓

Hidden confounders: ✗ ✓

Empirical performance: ✗

Dynamic causal modelling (DCM)

Causality in DCM is used in a control theory sense and means that, under the model, activity in one brain area causes dynamics in another, and that these dynamics cause the observations. (Friston, 2009)

Inference procedure:

1. Observe an N-dimensional time-series x(t) ∈ ℝ^N for t = 1, ..., T (e.g., BOLD signals).

2. Define a set of M models M = {m_1, ..., m_M}, where each model consists of a set of differential equations with a different connectivity structure.

3. Fit each model to the data (which is a tough problem).

4. Take the connectivity of the model with the best data fit as the true causal structure.

(Friston K.J. et al., Dynamic causal modelling. NeuroImage, 2003)

Dynamic causal modelling: The hemodynamic model

Dynamic causal modelling: The bilinear model

Dynamic causal modelling: Model comparison

Dynamic causal modelling

If M does not contain the true model, is the best-fitting model in M similar to the true one in terms of its connectivity structure?

Dynamic causal modelling: Model fit & structure similarity

[Figure: model fit.]

→ Similarity in terms of model fit does not translate into similarity in terms of connectivity structure.

→ There is no reason to believe that DCM selects a causal structure that is structurally similar to the true one.

(Lohmann et al., Critical comments on dynamic causal modelling. NeuroImage, 2012)

Summary so far (columns: GC, CBN, DCM; ✓ = criterion met, ✗ = not met):

Testable interventions: ✓ ✓ ✗

Hidden confounders: ✗ ✓ ✗

Empirical performance: ✗ ✗

Non-Linear Non-Gaussian Acyclic Models (Non-LiNGAM)

Let

y = f(x) + n

for some arbitrary non-linear function f and p(x, n) = p(x) p(n) (x ⊥⊥ n).

Is it possible to invert this model to obtain

x = g(y) + ñ

with some arbitrary non-linear function g, a new noise term ñ, and p(y, ñ) = p(y) p(ñ) (y ⊥⊥ ñ)?

→ In general, no!

(Hoyer et al., Nonlinear causal discovery with additive noise models. NIPS, 2008)

1. Observe N samples of {x_i, y_i} with i = 1, ..., N.

2. Perform a non-linear regression from x to y and test whether the residuals e_y satisfy e_y ⊥⊥ x.

3. Perform a non-linear regression from y to x and test whether the residuals e_x satisfy e_x ⊥⊥ y.

4. If e_y ⊥⊥ x, decide that x → y.

5. If e_x ⊥⊥ y, decide that y → x.

6. Do not decide on a causal direction if neither e_y ⊥⊥ x nor e_x ⊥⊥ y (or if both hold).
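The steps above can be sketched as follows. This is a simplified illustration, not the procedure of Hoyer et al.: it assumes polynomial regression as the non-linear regressor, and instead of a formal independence test it compares a biased HSIC dependence score (Gaussian kernels, unit bandwidth on standardized data) between the two directions. The data-generating model, sample size, and function names are assumptions.

```python
import numpy as np


def standardize(v):
    return (v - v.mean()) / v.std()


def hsic(a, b):
    """Biased HSIC estimate with unit-bandwidth Gaussian kernels on standardized inputs."""
    a, b = standardize(a), standardize(b)
    n = len(a)
    K = np.exp(-0.5 * (a[:, None] - a[None, :]) ** 2)
    L = np.exp(-0.5 * (b[:, None] - b[None, :]) ** 2)
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return float(np.trace(K @ H @ L @ H)) / n ** 2


def residuals(cause, effect, degree=5):
    """Polynomial regression of effect on cause; returns the regression residuals."""
    cause, effect = standardize(cause), standardize(effect)
    coef = np.polyfit(cause, effect, degree)
    return effect - np.polyval(coef, cause)


# Simulated data with true direction x -> y and additive noise.
rng = np.random.default_rng(0)
N = 500
x = rng.uniform(-2.0, 2.0, size=N)
y = x ** 3 + rng.normal(size=N)

score_x_to_y = hsic(residuals(x, y), x)  # dependence between e_y and x
score_y_to_x = hsic(residuals(y, x), y)  # dependence between e_x and y
decision = "x -> y" if score_x_to_y < score_y_to_x else "y -> x"
print(score_x_to_y, score_y_to_x, decision)
```

Picking the direction with the smaller dependence score always returns a decision; a faithful implementation of step 6 would replace the comparison with significance tests in both directions and abstain when they do not single out one direction.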

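Summary so far (columns: GC, CBN, DCM, Non-LiNGAM; ✓ = criterion met, ✗ = not met):

Provably correct under reasonable assumptions: ✗ ✓ ✗ ✓

Testable interventions: ✓ ✓ ✗ ✓

Hidden confounders: ✗ ✓ ✗ ✓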

Empirical Performance

(Smith et al., Network modelling methods for fMRI. NeuroImage, 2011)

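Summary (✓ = criterion met, ✗ = not met):

Criterion                                       GC   CBN   DCM   Non-LiNGAM
Provably correct under reasonable assumptions   ✗    ✓     ✗     ✓
Testable interventions                          ✓    ✓     ✗     ✓
Hidden confounders                              ✗    ✓     ✗     ✓
Empirical performance                           ✗    ✗     ✗     ✗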

Conclusions

Every causal inference algorithm rests on untestable assumptions.

Several causal inference algorithms appear to perform above chance level.

Causal inference may be useful

- to guide the design of interventional studies
- when qualitative conclusions do not depend on individual results.

Causal inference is (at present) not useful when qualitative conclusions depend on one individual inference result.

4th Int. Workshop on Pattern Recognition in Neuroimaging (PRNI 2014)

June 4-6, 2014, Tübingen

http://prni.org
