Causal discovery, Bayesian networks, and structural equation models

Aapo Hyvärinen with

Patrik Hoyer and Shohei Shimizu

[ Presentation at LTL/BRU, Apr 2008 ]

Dept of Computer Science, University of Helsinki

The “causal discovery” problem

•Example: is smoking a cause of lung cancer?

•Distinguish between

- X causes Y

- Y causes X

- X and Y are both caused by Z

•Discovery: find interesting connections between many variables

[Figure: physiological quantity, smoking vs. non-smoking]

How to “best” infer causality?

•Randomized experiments!

- Condition 1 with smoking

- Condition 2 without smoking

•Unfortunately: in many cases, can be...

- costly

- impractical

- unethical

...what then?

Causality & statistical inference

•Emphasis in statistics courses:

“Correlation does not imply causality”

due to widespread misinterpretation of correlation as causality in the past

•This has led to (exaggerated) pessimism regarding all causal inference

•Correlations do not imply causality, but causality usually implies correlations
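A small simulation can make the first half of this point concrete. The sketch below assumes a hypothetical model in which an unobserved Z drives both X and Y, with no effect between X and Y; the coefficients and sample size are illustrative choices, not taken from the talk.

```python
import random

def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / (va * vb) ** 0.5

rng = random.Random(0)
n = 20000

# Hypothetical model: Z causes both X and Y; X and Y do not affect each other.
z = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + rng.gauss(0, 1) for zi in z]
y = [zi + rng.gauss(0, 1) for zi in z]

# Theoretical correlation is var(Z) / sqrt(var(X) * var(Y)) = 1 / 2.
r_xy = pearson(x, y)
print("corr(X, Y) =", r_xy)
```

X and Y come out clearly correlated although neither causes the other, which is exactly the case a naive reading of correlation gets wrong.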

Model-based causal discovery from “non-experimental” data

•Make a model with assumptions on the process which generated the data

•Deduce what different causal connections and directions would imply for the data

•We can choose the alternative that best fits the data, if the assumptions hold

•Thus, we can find the true causal connections (if the assumptions hold!)

(see, e.g. Spirtes et al, 1993; Pearl, 2000)

Data-driven vs. physical models

•Data-driven models (topic of this talk):

- Few assumptions

- General functional forms

•Physically detailed models (e.g. Friston's DCM)

- Stronger assumptions

- Specific functional forms

•Which one is better? Who knows...

Basic form of data-driven models

•Typically, each data variable is expressed as a function of other data variables

•Often a linear function

•If $x_i$ is a function of $x_j$, we interpret this as a causal effect of $x_j$ on $x_i$

•Different from factor/component models (PCA, ICA), where $x$ is a function of some other variables

$x_i = f(x_j,\ j \neq i)$, for all $i$

$x_i = \sum_{j \neq i} b_{ij}\, x_j$, for all $i$

Main approaches (1): Autoregressive models

•Present data is “caused” by the past (+ noise)

•Needs good time resolution in measurements (measurements faster than effects)

•Non-zero $b_{ij}(k)$ related to Granger causality

•Estimation “easy”: simple linear regression

•Problem: there can be many different time lags => many parameters to estimate and summarize

$x_i(t) = \sum_{k \geq 1} \sum_j b_{ij}(k)\, x_j(t-k) + e_i(t)$
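To illustrate why estimation is "easy", here is a minimal sketch of fitting a two-variable, single-lag autoregressive model by ordinary least squares. The coefficient values, the Gaussian noise, and the restriction to one lag are assumptions made for the example only.

```python
import random

rng = random.Random(1)
T = 5000

# Hypothetical lag-1 coefficients b_true[i][j]: effect of x_j(t-1) on x_i(t).
# Variable 0 influences variable 1, but not the other way round.
b_true = [[0.5, 0.0],
          [0.4, 0.3]]

# Simulate the process x(t) = b_true x(t-1) + e(t).
x = [[0.0, 0.0]]
for t in range(1, T):
    prev = x[-1]
    x.append([sum(b_true[i][j] * prev[j] for j in range(2)) + rng.gauss(0, 1)
              for i in range(2)])

# Normal equations S b_i = r_i, where
#   S   = sum_t x(t-1) x(t-1)^T   and   r_i = sum_t x_i(t) x(t-1).
S = [[0.0, 0.0], [0.0, 0.0]]
r = [[0.0, 0.0], [0.0, 0.0]]
for t in range(1, T):
    past, now = x[t - 1], x[t]
    for j in range(2):
        for k in range(2):
            S[j][k] += past[j] * past[k]
        for i in range(2):
            r[i][j] += now[i] * past[j]

# Solve each 2x2 system with Cramer's rule.
det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
b_hat = []
for i in range(2):
    b_i0 = (r[i][0] * S[1][1] - S[0][1] * r[i][1]) / det
    b_i1 = (S[0][0] * r[i][1] - S[1][0] * r[i][0]) / det
    b_hat.append([b_i0, b_i1])

print(b_hat)
```

In this sketch `b_hat[1][0]` should come out clearly non-zero while `b_hat[0][1]` stays near zero, which is the kind of asymmetry that Granger-type analyses read as directed influence. With more lags the same regression simply gets more regressors, which is where the parameter count grows.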

Main approaches (1): Autoregressive models

Red: Reference region

Green: Sources of influence TO reference region

Blue: Sources of influence FROM reference region

(Roebroeck, Formisano, Goebel, NeuroImage, 2005)

Main approaches (2): Structural equation models

•Also called (linear) Bayesian networks or simultaneous equation models

•All effects occur at the same time

•Estimation difficult: not simple regression

•If data is Gaussian, many different models are indistinguishable => despair?

$x_i = \sum_{j \neq i} b_{ij}\, x_j + e_i$

Linear Non-Gaussian Acyclic Model (LiNGAM)

•Non-Gaussianity allows estimation of the model, cf. ICA vs. factor analysis

• Important assumption of acyclicity:

- Equivalent to existence of an ordering of the variables so that there are only effects “forward”

- Otherwise, problems due to variables causing each other ad infinitum

(Shimizu, Hoyer, Hyvärinen, Kerminen, Journal of Machine Learning Research, 2006)
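A two-variable sketch of why non-Gaussianity helps. This is not the LiNGAM algorithm (which goes through ICA); it is a crude illustration that assumes uniform disturbances and uses a hypothetical dependence score, the correlation between squared residual and squared regressor. In the true direction the regression residual is independent of the regressor; in the wrong direction it is uncorrelated but still dependent.

```python
import random

def center(v):
    m = sum(v) / len(v)
    return [a - m for a in v]

def corr(a, b):
    a, b = center(a), center(b)
    num = sum(u * v for u, v in zip(a, b))
    return num / (sum(u * u for u in a) * sum(v * v for v in b)) ** 0.5

def dependence_after_regression(cause, effect):
    """Regress effect on cause and return |corr(residual^2, cause^2)|.

    A least-squares residual is always uncorrelated with the regressor,
    so a higher-order statistic is used to detect remaining dependence."""
    c, e = center(cause), center(effect)
    slope = sum(u * v for u, v in zip(c, e)) / sum(u * u for u in c)
    resid = [v - slope * u for u, v in zip(c, e)]
    return abs(corr([r * r for r in resid], [u * u for u in c]))

rng = random.Random(2)
n = 5000

# Hypothetical true model: x -> y with non-Gaussian (uniform) disturbances.
x = [rng.uniform(-1, 1) for _ in range(n)]
y = [xi + rng.uniform(-1, 1) for xi in x]

forward = dependence_after_regression(x, y)   # hypothesis x -> y
backward = dependence_after_regression(y, x)  # hypothesis y -> x
print("x -> y:", forward, " y -> x:", backward)
```

The score is near zero for the true direction and clearly larger for the reversed one. With Gaussian disturbances both scores would be near zero, which is the indistinguishability noted for the Gaussian case.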

Examples of acyclic graphs
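As one concrete instance, acyclicity of a coefficient matrix can be checked by searching for the forward-only ordering directly. The sketch below assumes the convention that `B[i][j]` is the effect of x_j on x_i; the function name and the example matrices are hypothetical.

```python
def causal_order(B, tol=1e-12):
    """Return an ordering of the variables in which every non-zero B[i][j]
    (effect of x_j on x_i) points forward, or None if B contains a cycle.

    The diagonal is ignored, since the model has no self-effects (j != i)."""
    n = len(B)
    remaining = set(range(n))
    order = []
    while remaining:
        # A variable can be placed next if none of its parents is still unplaced.
        roots = [i for i in sorted(remaining)
                 if all(abs(B[i][j]) <= tol for j in remaining if j != i)]
        if not roots:
            return None  # every remaining variable has a remaining parent
        order.append(roots[0])
        remaining.remove(roots[0])
    return order

# Hypothetical acyclic model: x2 -> x0, x0 -> x1, x2 -> x1
B_acyclic = [[0.0, 0.0, 0.8],
             [0.5, 0.0, -0.3],
             [0.0, 0.0, 0.0]]

# Hypothetical cyclic model: x0 -> x1 and x1 -> x0
B_cyclic = [[0.0, 0.7],
            [0.4, 0.0]]

print(causal_order(B_acyclic))  # [2, 0, 1]
print(causal_order(B_cyclic))   # None
```

Reordering the variables of `B_acyclic` as 2, 0, 1 makes the matrix strictly lower triangular, which is the "only effects forward" condition.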

Estimation of LiNGAM

•Transform it to ICA:

$x = Bx + e \;\Leftrightarrow\; (I - B)\,x = e$

•Estimate ICA: you get $I - B$ up to a permutation and normalization.

•Acyclicity allows determination of the right permutation. (Normalization is obvious: the diagonal of $I - B$ is all ones.)

•Optionally, set almost half the parameters to zero based on acyclicity.
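The permutation and normalization step can be sketched on exact numbers. The code below assumes a hypothetical three-variable model and mimics an ICA result by shuffling and rescaling the rows of I − B. The exhaustive search over row orders is an illustrative stand-in that is only feasible for a few variables; it is not a description of the published implementation, which has to cope with noisy estimates.

```python
from itertools import permutations

def recover_B(W_ica):
    """Undo row permutation and row scaling of an estimate of I - B.

    Picks the row order whose diagonal is farthest from zero, rescales
    each row to a unit diagonal, and returns B = I - W."""
    n = len(W_ica)
    best = max(permutations(range(n)),
               key=lambda p: min(abs(W_ica[p[i]][i]) for i in range(n)))
    W = [[W_ica[best[i]][j] / W_ica[best[i]][i] for j in range(n)]
         for i in range(n)]
    return [[(1.0 if i == j else 0.0) - W[i][j] for j in range(n)]
            for i in range(n)]

# Hypothetical true model in causal order x0 -> x1 -> x2.
B_true = [[0.0, 0.0, 0.0],
          [0.5, 0.0, 0.0],
          [0.0, -2.0, 0.0]]
I_minus_B = [[(1.0 if i == j else 0.0) - B_true[i][j] for j in range(3)]
             for i in range(3)]

# Mimic what ICA returns: rows in arbitrary order, with arbitrary scale and sign.
W_ica = [[2.0 * v for v in I_minus_B[2]],
         [-1.0 * v for v in I_minus_B[0]],
         [0.5 * v for v in I_minus_B[1]]]

B_hat = recover_B(W_ica)
print(B_hat)
```

For an acyclic B and exact values, only one row order leaves no zero on the diagonal, which is why the permutation is identifiable here. Dividing each row by its diagonal entry then fixes the scale and sign.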

Combination of autoregressive and structural equation models

•Easy to combine both in same equation:

$x_i(t) = \sum_{k \geq 0} \sum_{j \neq i} b_{ij}(k)\, x_j(t-k) + e_i(t)$

Note that $k$ starts from 0

•Must assume acyclicity for $k=0$

•Lagged $b_{ij}(k)$ change when $k=0$ is included

•Can be estimated by combining autoregressive estimation with LiNGAM (Hyvärinen, Shimizu, Hoyer, ICML 2008, in press)
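Why the lagged coefficients change can be seen by writing the combined model in matrix form with a single lag. The matrices $B_0$ and $B_1$ (collecting the $k=0$ and $k=1$ coefficients) and $M_1$ are notation assumed here for illustration:

```latex
\begin{align*}
x(t) &= B_0\,x(t) + B_1\,x(t-1) + e(t) \\
(I - B_0)\,x(t) &= B_1\,x(t-1) + e(t) \\
x(t) &= \underbrace{(I - B_0)^{-1} B_1}_{M_1}\,x(t-1) + (I - B_0)^{-1} e(t)
\end{align*}
```

A purely autoregressive fit therefore estimates $M_1$, which in general differs from $B_1$ unless $B_0 B_1 = 0$ (for example when $B_0 = 0$). The autoregressive residual $n(t) = (I - B_0)^{-1} e(t)$ satisfies $n = B_0 n + e$, the same form as the instantaneous model, which suggests why an instantaneous method such as LiNGAM can be applied to the residuals.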

Deep issues in modelling

•Hidden variables

- Perhaps x does not cause y and y does not cause x but both are caused by an unobserved variable z (Hoyer et al, in press)

•Lack of acyclicity

- x has an effect on y but then (afterwards?) y has an effect on x (Lacerda et al, submitted)

•Non-linearity, dependence of disturbances $e_i$, etc.

Philosophical basis: Causal vs. probabilistic models

•A formalization which has recently gained acceptance

•Find the data generating mechanism, not just the statistical regularities:

- A probabilistic model of the data allows you to predict one quantity from observation of the other

- A causal model would allow you to predict the effect on one variable if intervening on the other
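The difference between the two kinds of prediction can be sketched with a simulation. The model below (x causes y, with y = 2x + e) and the way the intervention is simulated are assumptions made for illustration only.

```python
import random

def slope(u, v):
    """Least-squares slope for predicting v from u."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    num = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    return num / sum((a - mu) ** 2 for a in u)

rng = random.Random(3)
n = 20000

# Hypothetical causal model: x -> y, with y = 2x + e.
x_obs = [rng.gauss(0, 1) for _ in range(n)]
y_obs = [2 * xi + rng.gauss(0, 1) for xi in x_obs]

# Probabilistic use: observing y is informative about x.
# Theoretical slope: cov(x, y) / var(y) = 2 / 5 = 0.4.
obs_slope = slope(y_obs, x_obs)

# Causal use: intervene on y, i.e. set it externally and override its equation.
# x is generated exactly as before, so it cannot respond to y.
x_int = [rng.gauss(0, 1) for _ in range(n)]
y_int = [rng.gauss(0, 5 ** 0.5) for _ in range(n)]
int_slope = slope(y_int, x_int)   # theoretical slope: 0

print("observational:", obs_slope, " interventional:", int_slope)
```

The observational slope answers "what do I expect x to be, given that I saw this y?", while the interventional slope answers "what happens to x if I change y?". Only a model of the direction x -> y predicts that the second answer is "nothing".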

Code

We distribute full Matlab/Octave code for LiNGAM. Please see:

http://www.cs.helsinki.fi/group/neuroinf/lingam/

Summary

•Causal discovery is possible by making general assumptions on causal structure

•Simplest approach is autoregressive models

- needs good time resolution of measurements

• In simultaneous (or structural) models, non-Gaussianity is needed, cf. ICA

- Our LiNGAM method

•Autoregressive and structural models can be combined

•An alternative to factor/component based exploratory analysis

•Matlab code available on-line
