Aapo Hyvärinen with Patrik Hoyer and Shohei Shimizu [ Presentation at LTL/BRU, Apr 2008 ] Dept of Computer Science University of Helsinki Causal discovery, Bayesian networks, and structural equation models

Page 1: Causal discovery, Bayesian networks, and structural equation models

Aapo Hyvärinen with Patrik Hoyer and Shohei Shimizu

[ Presentation at LTL/BRU, Apr 2008 ]

Dept of Computer Science, University of Helsinki

Causal discovery, Bayesian networks, and structural equation models

Page 2: Causal discovery, Bayesian networks, and structural equation models

The “causal discovery” problem

•Example: is smoking a cause of lung cancer?

•Distinguish between

- X causes Y

- Y causes X

- X and Y are both caused by Z

- Discovery: Find interesting connections between many variables

[Figure with labels: Smoking, Non-smoking, Physiological quantity]

Page 3: Causal discovery, Bayesian networks, and structural equation models

How to “best” infer causality?

•Randomized experiments!

- Condition 1 with smoking

- Condition 2 without smoking

•Unfortunately: in many cases, can be...

- costly

- impractical

- unethical

...what then?

Page 4: Causal discovery, Bayesian networks, and structural equation models

Causality & statistical inference

•Emphasis in statistics courses:

“Correlation does not imply causality”

due to widespread misinterpretation of correlation as causality in the past

•This has led to (exaggerated) pessimism regarding all causal inference

•Correlations do not imply causality, but causality usually implies correlations
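
A small simulation can make this point concrete for the "X and Y are both caused by Z" case from the problem slide. This is only an illustrative sketch: the coefficients, noise levels, sample size and variable names are assumptions, not taken from the talk.

```python
import random

def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / (va * vb) ** 0.5

rng = random.Random(0)
n = 5000

# Hypothetical common cause z drives both x and y; x and y do not affect each other.
z = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + 0.5 * rng.gauss(0, 1) for zi in z]
y = [zi + 0.5 * rng.gauss(0, 1) for zi in z]

# A variable generated with no link to x or z, for comparison.
y_indep = [rng.gauss(0, 1) for _ in range(n)]

r_confounded = pearson(x, y)         # clearly non-zero although x does not cause y
r_independent = pearson(x, y_indep)  # close to zero
print(r_confounded, r_independent)
```

The correlation between x and y is strong even though neither causes the other, which is why correlation alone cannot distinguish the three cases.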

Page 5: Causal discovery, Bayesian networks, and structural equation models

Model-based causal discovery from “non-experimental” data

•Make a model with assumptions on the process which generated the data

•Deduce what different causal connections and directions would imply for the data

•We can choose which alternative fits the data best if the assumptions hold

•Thus, we can find the true causal connections (if the assumptions hold!)

(see, e.g. Spirtes et al, 1993; Pearl, 2000)

Page 6: Causal discovery, Bayesian networks, and structural equation models

Data-driven vs. physical models

•Data-driven models (topic of this talk):

- Few assumptions

- General functional forms

•Physically detailed models (e.g. Friston's DCM)

- Stronger assumptions

- Specific functional forms

•Which one is better? Who knows...

Page 7: Causal discovery, Bayesian networks, and structural equation models

Basic form of data-driven models

•Typically, each data variable is expressed as a function of other data variables

•Often a linear function

• If xi is a function of xj , we think there is a causal effect

•Different from factor/component models (PCA, ICA) where x is a function of some other variables

$x_i = f(x_j,\ j \neq i)$, for all $i$

$x_i = \sum_{j \neq i} b_{ij} x_j$, for all $i$

Page 8: Causal discovery, Bayesian networks, and structural equation models

Main approaches (1): Autoregressive models

•Present data is “caused” by the past (+ noise)

•Needs good time resolution in measurements (measurements faster than effects)

•Non-zero bij(k) related to Granger causality

•Estimation “easy”: simple linear regression

•Problems will occur because there can be many different time lags => many parameters to estimate and summarize

$x_i(t) = \sum_{k \geq 1} \sum_j b_{ij}(k)\, x_j(t-k) + e_i(t)$
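
The "simple linear regression" estimate can be sketched for a two-variable model with a single lag. Everything specific here is an assumption made for illustration: the coefficient matrix `B_true`, the unit-variance Gaussian noise, and the restriction to one lag (k = 1). The data are zero-mean by construction, so no intercept is fitted.

```python
import random

rng = random.Random(1)
T = 20000

# Hypothetical first-order model: x1 drives x2 with a one-step delay, not vice versa.
B_true = [[0.5, 0.0],
          [0.4, 0.3]]

x = [[0.0, 0.0]]
for t in range(1, T):
    prev = x[-1]
    x.append([B_true[i][0] * prev[0] + B_true[i][1] * prev[1] + rng.gauss(0, 1)
              for i in range(2)])

def fit_row(i):
    """Least-squares regression of x_i(t) on (x_1(t-1), x_2(t-1)).

    Solves the 2x2 normal equations with Cramer's rule.
    """
    s11 = s12 = s22 = r1 = r2 = 0.0
    for t in range(1, T):
        p1, p2 = x[t - 1]
        s11 += p1 * p1
        s12 += p1 * p2
        s22 += p2 * p2
        r1 += p1 * x[t][i]
        r2 += p2 * x[t][i]
    det = s11 * s22 - s12 * s12
    return [(r1 * s22 - r2 * s12) / det, (r2 * s11 - r1 * s12) / det]

B_hat = [fit_row(0), fit_row(1)]
print(B_hat)
```

In this sketch the estimated coefficient from x2 to x1 comes out close to zero while the one from x1 to x2 does not, which is the pattern a Granger-style analysis would read as influence from x1 to x2.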

Page 9: Causal discovery, Bayesian networks, and structural equation models

Main approaches (1): Autoregressive models

Red: Reference region

Green: Sources of influence TO reference region

Blue: Sources of influence FROM reference region

(Roebroeck, Formisano, Goebel, NeuroImage, 2005)

Page 10: Causal discovery, Bayesian networks, and structural equation models

Main approaches (2): Structural equation models

•Also called (linear) Bayesian networks or simultaneous equation models

•All effects occur at the same time

•Estimation difficult: not simple regression

• If the data are Gaussian, many different models are indistinguishable => despair?

$x_i = \sum_{j \neq i} b_{ij} x_j + e_i$
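
The Gaussian indistinguishability can be seen in a two-variable worked example. The setup is an assumption chosen for simplicity: zero-mean variables standardized to unit variance, a single coefficient $b$ with $|b| < 1$, and independent Gaussian disturbances.

```latex
\begin{aligned}
\text{Model A } (x \to y):\quad & x = e_x, \qquad y = b\,x + e_y,
  \qquad \operatorname{var}(e_x) = 1,\ \operatorname{var}(e_y) = 1 - b^2 \\
\text{Model B } (y \to x):\quad & y = e'_y, \qquad x = b\,y + e'_x,
  \qquad \operatorname{var}(e'_y) = 1,\ \operatorname{var}(e'_x) = 1 - b^2 \\
\text{Both models give:}\quad & \operatorname{cov}\begin{pmatrix} x \\ y \end{pmatrix}
  = \begin{pmatrix} 1 & b \\ b & 1 \end{pmatrix}
\end{aligned}
```

A zero-mean Gaussian distribution is fully determined by its covariance matrix, so the two causal directions produce exactly the same distribution of $(x, y)$ and no amount of data can separate them.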

Page 11: Causal discovery, Bayesian networks, and structural equation models

Linear Non-Gaussian Acyclic Model (LiNGAM)

•Non-Gaussianity allows estimation of the model, cf. ICA vs. factor analysis

• Important assumption of acyclicity:

- Equivalent to existence of an ordering of the variables so that there are only effects “forward”

- Otherwise, problems due to variables causing each other ad infinitum

(Shimizu, Hoyer, Hyvärinen, Kerminen, Journal of Machine Learning Research, 2006)
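
The equivalence between acyclicity and the existence of a "forward-only" ordering can be checked mechanically with a topological sort. The sketch below uses Python's standard `graphlib` module (Python 3.9 or later); the function name `causal_order` and the example graphs are hypothetical.

```python
from graphlib import TopologicalSorter, CycleError

def causal_order(parents):
    """Return an ordering in which every cause precedes its effects.

    `parents` maps each variable to the set of variables that have a direct
    effect on it. Returns None if the graph contains a cycle.
    """
    try:
        return list(TopologicalSorter(parents).static_order())
    except CycleError:
        return None

# x1 -> x2 -> x3, plus a direct effect x1 -> x3: acyclic.
acyclic = {"x1": set(), "x2": {"x1"}, "x3": {"x1", "x2"}}

# x1 and x2 cause each other: no valid ordering exists.
cyclic = {"x1": {"x2"}, "x2": {"x1"}}

print(causal_order(acyclic))
print(causal_order(cyclic))
```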

Page 12: Causal discovery, Bayesian networks, and structural equation models

Examples of acyclic graphs

Page 13: Causal discovery, Bayesian networks, and structural equation models

Estimation of LiNGAM

•Transform it to ICA: $x = Bx + e \Leftrightarrow (I - B)x = e$

•Estimate ICA: you get $I - B$ up to a permutation and normalization.

•Acyclicity allows determination of right permutation. (Normalization obvious.)

•Optionally, set almost half the parameters to zero based on acyclicity.
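
The permutation and normalization steps can be sketched on a noise-free example. For an acyclic B, the matrix I − B can be brought to lower-triangular form with a unit diagonal, so only one row order leaves no zeros on the diagonal. The cost used below (sum of 1/|W_ii|) is one way to pick that order with noisy estimates and should be read as an assumption of this sketch, as should the function name `recover_B` and the example matrices. The exhaustive search over permutations is only feasible for a handful of variables.

```python
from itertools import permutations

def recover_B(W):
    """Undo the row permutation and scaling of an ICA estimate W of (I - B).

    Chooses the row order whose diagonal is farthest from zero, rescales each
    row so the diagonal equals one, and returns B = I - W.
    """
    n = len(W)

    def cost(perm):
        total = 0.0
        for i in range(n):
            d = W[perm[i]][i]
            total += 1.0 / abs(d) if d != 0 else float("inf")
        return total

    best = min(permutations(range(n)), key=cost)
    rows = [[w / W[r][i] for w in W[r]] for i, r in enumerate(best)]
    return [[(1.0 if i == j else 0.0) - rows[i][j] for j in range(n)]
            for i in range(n)]

# Hypothetical acyclic model: x1 -> x2 (weight 2), x2 -> x3 (weight -1.5).
B_true = [[0.0, 0.0, 0.0],
          [2.0, 0.0, 0.0],
          [0.0, -1.5, 0.0]]

# I - B_true with rows reordered to (3rd, 1st, 2nd) and scaled by (3, -2, 0.5),
# mimicking the indeterminacies of an ICA estimate.
W = [[0.0, 4.5, 3.0],
     [-2.0, 0.0, 0.0],
     [-1.0, 0.5, 0.0]]

B_hat = recover_B(W)
print(B_hat)
```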

Page 14: Causal discovery, Bayesian networks, and structural equation models

Combination of autoregressive and structural equation models

•Easy to combine both in same equation:

Note that k starts from 0

•Must assume acyclicity for k=0

•The lagged coefficients bij(k), k≥1, change when k=0 is included

•Can be estimated by combining autoregressive estimation with LiNGAM (Hyvärinen, Shimizu, Hoyer, ICML 2008, in press)

$x_i(t) = \sum_{k \geq 0} \sum_{j \neq i} b_{ij}(k)\, x_j(t-k) + e_i(t)$
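
A short derivation in matrix form suggests why the lagged coefficients change and how the two estimation steps fit together. The notation is an assumption of this sketch: $B_k$ collects the coefficients $b_{ij}(k)$, $M_k$ denotes the coefficients of an ordinary autoregressive fit, and $n(t)$ its residuals.

```latex
\begin{aligned}
x(t) &= \sum_{k \ge 0} B_k\, x(t-k) + e(t) \\
(I - B_0)\, x(t) &= \sum_{k \ge 1} B_k\, x(t-k) + e(t) \\
x(t) &= \sum_{k \ge 1} M_k\, x(t-k) + n(t),
  \qquad M_k = (I - B_0)^{-1} B_k, \quad n(t) = (I - B_0)^{-1} e(t) \\
n(t) &= B_0\, n(t) + e(t), \qquad B_k = (I - B_0)\, M_k \quad (k \ge 1)
\end{aligned}
```

Under these assumptions the autoregressive residuals $n(t)$ themselves follow an instantaneous model with matrix $B_0$, so LiNGAM can be applied to them, and the autoregressive coefficients $M_k$ equal the lagged $B_k$ only when $B_0 = 0$.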

Page 15: Causal discovery, Bayesian networks, and structural equation models

Deep issues in modelling

•Hidden variables

- Perhaps x does not cause y and y does not cause x but both are caused by an unobserved variable z (Hoyer et al, in press)

•Lack of acyclicity

- x has an effect on y but then (afterwards?) y has an effect on x (Lacerda et al, submitted)

•Non-linearity, dependence of disturbances ei, etc.

Page 16: Causal discovery, Bayesian networks, and structural equation models

Philosophical basis: Causal vs. probabilistic models

•A formalization which has recently gained acceptance

•Find the data generating mechanism, not just the statistical regularities:

- A probabilistic model of the data allows you to predict one quantity from observation of the other

- A causal model would allow you to predict the effect on one variable if intervening on the other
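
The difference between predicting and intervening can be illustrated with a simulated mechanism. This is a sketch under stated assumptions: the mechanism x → y with coefficient 2, the unit-variance Gaussian noise, and the helper `sample` are all hypothetical.

```python
import random

rng = random.Random(2)
n = 20000

def sample(do_x=None, do_y=None):
    """Draw one (x, y) pair from the mechanism x -> y, optionally forcing a variable."""
    x = rng.gauss(0, 1) if do_x is None else do_x
    y = (2.0 * x + rng.gauss(0, 1)) if do_y is None else do_y
    return x, y

# Observation: y is informative about x (regression slope of x on y).
obs = [sample() for _ in range(n)]
mx = sum(p[0] for p in obs) / n
my = sum(p[1] for p in obs) / n
slope_x_on_y = (sum((p[0] - mx) * (p[1] - my) for p in obs)
                / sum((p[1] - my) ** 2 for p in obs))

# Intervention on the effect: forcing y = 3 leaves x unchanged.
mean_x_do_y = sum(sample(do_y=3.0)[0] for _ in range(n)) / n

# Intervention on the cause: forcing x = 1 shifts y.
mean_y_do_x = sum(sample(do_x=1.0)[1] for _ in range(n)) / n

print(slope_x_on_y, mean_x_do_y, mean_y_do_x)
```

Observing y = 3 would lead a probabilistic model to predict an x well above zero, yet forcing y to 3 leaves the mean of x near zero; only the causal model gets the second question right.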

Page 17: Causal discovery, Bayesian networks, and structural equation models

Code

We distribute full Matlab/Octave code for LiNGAM. Please see:

http://www.cs.helsinki.fi/group/neuroinf/lingam/

Page 18: Causal discovery, Bayesian networks, and structural equation models

Summary

•Causal discovery is possible by making general assumptions on causal structure

•Simplest approach is autoregressive models

- needs good time resolution of measurements

• In simultaneous (or structural) models, non-Gaussianity is needed, cf. ICA

- Our LiNGAM method

•Autoregressive and structural models can be combined

•An alternative to factor/component based exploratory analysis

•Matlab code available on-line