A non-Gaussian model for causal discovery in the presence of hidden common causes


Shohei Shimizu

Shiga University / Osaka University

Japan


2016 Munich Workshop on

Causal Inference and Information Theory

Abstract

• Managing hidden common causes is essential in causal discovery
• Non-causally-related observed variables can be correlated due to hidden common causes
• Propose a linear non-Gaussian model for estimating causal direction in cases with hidden common causes

Motivation

Illustrative example

Strong correlation between chocolate consumption and number of Nobel laureates (Messerli12NEJM)

Figure (2002-2011): axes are chocolate consumption (kg/yr/capita) and number of Nobel laureates per 10 million population.

Corr. 0.791

P-value < 0.001

Does eating more chocolate increase the number of Nobel laureates?

• Interpretational drift (Maurage+13, J. Nutrition)

Figure: candidate explanations of the correlation (Corr. 0.791, P-value < 0.001) between Chocolate (Chclt) and Nobel, each with GDP as a hidden common cause: Chclt → Nobel, Chclt ← Nobel, or no direct effect.

Manage this gap!

Formulating the problem

Structural causal models (Pearl, 2000,2009; cf. Bollen, 1989)

• A framework for describing causal relations
• Generally speaking, if the value of 𝑥1 has been changed and then that of 𝑥2 changes, then 𝑥1 causes 𝑥2

$$x_1 = g_1(f, e_1)$$
$$x_2 = g_2(x_1, f, e_2)$$

Figure: graph over x1, x2, hidden common cause f, and errors e1, e2, annotated with Chclt, Nobel, and GDP.

Challenge in causal discovery

• Data matrix: n observations (obs.1, obs.2, …, obs.n) of x1 and x2, drawn i.i.d. from p(x1, x2)
• Assume that one of the following three models generated the data, each with a hidden common cause f and distributions p(e1), p(e2), p(f)

x1 → x2:
$$x_1 = g_1(f, e_1), \quad x_2 = g_2(x_1, f, e_2)$$

x1 ← x2:
$$x_1 = g_1(x_2, f, e_1), \quad x_2 = g_2(f, e_2)$$

No direct effect between x1 and x2:
$$x_1 = g_1(f, e_1), \quad x_2 = g_2(f, e_2)$$

• Estimate which of the three models generated the data

Under what conditions

can we manage the gap?

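• We have shown that it is possible under three assumptions: i) linearity; ii) acyclicity; iii) non-Gaussianity (Hoyer+08IJAR; Shimizu+14JMLR)
• The classical Bayesian network approach is incapable of this

Candidate models with a hidden common cause f1:

x1 → x2:
$$x_1 = \lambda_{11} f_1 + e_1, \quad x_2 = b_{21} x_1 + \lambda_{21} f_1 + e_2$$

x1 ← x2:
$$x_1 = b_{12} x_2 + \lambda_{11} f_1 + e_1, \quad x_2 = \lambda_{21} f_1 + e_2$$

No direct effect between x1 and x2:
$$x_1 = \lambda_{11} f_1 + e_1, \quad x_2 = \lambda_{21} f_1 + e_2$$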

Basic non-Gaussian model (No hidden common cause)

S. Shimizu, P. O. Hoyer, A. Hyvärinen and A. Kerminen
Journal of Machine Learning Research, 2006

Linear Non-Gaussian Acyclic Model (LiNGAM) (Shimizu et al., 2006)

$$x_i = \sum_{j \neq i} b_{ij} x_j + e_i$$

Assumptions: linearity; acyclicity; non-Gaussian errors $e_i$; independence of errors $e_i$ (no hidden common causes)

Figure: example graph over x1, x2, x3 with coefficients b21, b23, b13 and errors e1, e2, e3.

• Identifiable: causal directions and coefficients
• Various extensions including nonlinear (Hoyer+08NIPS, Zhang+09UAI) and cyclic (Lacerda+08UAI) models
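As a minimal sketch of the LiNGAM structure on this slide, the following Python snippet simulates a three-variable model. The coefficient values, the uniform error distribution, and the function name `simulate_lingam` are illustrative assumptions, not taken from the slides; only the edge set (b13, b23, b21) follows the example graph.

```python
import numpy as np

def simulate_lingam(B, n, rng):
    """Draw n samples from x = B x + e with independent uniform (non-Gaussian) errors."""
    p = B.shape[0]
    # Uniform errors on [-sqrt(3), sqrt(3)] have zero mean and unit variance.
    e = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(n, p))
    # Acyclicity makes (I - B) invertible, so x = (I - B)^{-1} e.
    A = np.linalg.inv(np.eye(p) - B)
    return e @ A.T, e

# Hypothetical coefficients: B[i, j] is the effect of x_{j+1} on x_{i+1}.
B = np.array([
    [0.0, 0.0, 0.5],   # x1 = b13 * x3 + e1
    [0.8, 0.0, -0.4],  # x2 = b21 * x1 + b23 * x3 + e2
    [0.0, 0.0, 0.0],   # x3 = e3
])
rng = np.random.default_rng(0)
x, e = simulate_lingam(B, n=1000, rng=rng)
```

Each row of `x` satisfies the structural equations exactly, which can be checked with `np.allclose(x, x @ B.T + e)`.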

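Different directions give different data distributions

Model 1:
$$x_1 = e_1, \quad x_2 = 0.8 x_1 + e_2$$

Model 2:
$$x_1 = 0.8 x_2 + e_1, \quad x_2 = e_2$$

$$E(e_1) = E(e_2) = 0, \quad \mathrm{var}(x_1) = \mathrm{var}(x_2) = 1$$

Figure: scatter plots of x1 and x2 under Model 1 and Model 2, for Gaussian errors and for non-Gaussian (ex. uniform) errors.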
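The asymmetry between the two directions can be illustrated numerically. The sketch below simulates Model 1 with uniform errors and scores each candidate direction by how strongly the regression residual depends on the regressor. The score used here (correlation of squared values) is a crude stand-in chosen for brevity, not the estimation method of the cited papers, and the helper name `residual_dependence` is hypothetical.

```python
import numpy as np

def residual_dependence(cause, effect):
    """Regress effect on cause; return corr(cause^2, residual^2) as a crude dependence score."""
    slope = np.cov(cause, effect)[0, 1] / np.var(cause, ddof=1)
    resid = effect - slope * cause
    return np.corrcoef(cause**2, resid**2)[0, 1]

rng = np.random.default_rng(0)
n = 50_000
# Model 1 with uniform errors, scaled so that var(x1) = var(x2) = 1.
e1 = rng.uniform(-np.sqrt(3), np.sqrt(3), n)        # var(e1) = 1
e2 = rng.uniform(-np.sqrt(3), np.sqrt(3), n) * 0.6  # var(e2) = 1 - 0.8**2
x1 = e1
x2 = 0.8 * x1 + e2

forward = residual_dependence(x1, x2)   # true direction: residual is e2, independent of x1
backward = residual_dependence(x2, x1)  # wrong direction: residual is uncorrelated with x2 but dependent on it
```

With non-Gaussian errors the forward score stays near zero while the backward score is clearly non-zero. With Gaussian errors both residuals would be independent of their regressors, so the two directions could not be told apart this way.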

Independent Component Analysis (ICA) (Jutten & Herault, 1991; Comon, 1994; Hyvärinen et al., 2001)

• Observed variables $x_i$ are modeled by
$$x_i = \sum_{j=1}^{p} a_{ij} s_j \quad \text{or} \quad x = As$$
where
– Hidden variables $s_j$ ($j = 1, \ldots, p$) are non-Gaussian and independent
• Then, mixing matrix A is identifiable up to permutation and scaling of the columns
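The phrase "up to permutation and scaling of the columns" can be made concrete with a short numerical check. In this sketch the mixing matrix, the Laplace sources, and the particular permutation and scaling are all arbitrary illustrative choices; the point is only that two different pairs (A, s) reproduce exactly the same observed data.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 3, 5
A = rng.normal(size=(p, p))         # hypothetical mixing matrix
s = rng.laplace(size=(p, n))        # independent non-Gaussian sources (columns are observations)
x = A @ s

P = np.eye(p)[:, [2, 0, 1]]         # permutation matrix: reorders the columns of A
D = np.diag([2.0, -0.5, 3.0])       # non-zero rescaling of each column

A_alt = A @ P @ D                   # permuted and rescaled mixing matrix
s_alt = np.linalg.inv(D) @ P.T @ s  # correspondingly permuted and rescaled sources

same_data = np.allclose(x, A_alt @ s_alt)
```

Because `s_alt` is still a vector of independent non-Gaussian variables, the data alone cannot distinguish `A` from `A_alt`.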

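Sketch of the identifiability proof

• Different directions give different zero/non-zero patterns of the mixing matrices
– No zeros on the diagonal in the causal model
– No permutation indeterminacy

Model 1:
$$x_1 = e_1, \quad x_2 = b_{21} x_1 + e_2 \quad \Longleftrightarrow \quad \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ b_{21} & 1 \end{bmatrix} \begin{bmatrix} e_1 \\ e_2 \end{bmatrix}$$

Model 2:
$$x_1 = b_{12} x_2 + e_1, \quad x_2 = e_2 \quad \Longleftrightarrow \quad \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 1 & b_{12} \\ 0 & 1 \end{bmatrix} \begin{bmatrix} e_1 \\ e_2 \end{bmatrix}$$

In both cases the right-hand side has the ICA form $x = As$, with the errors as the sources $s$.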
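The zero/non-zero argument can be checked numerically. This sketch computes the mixing matrix A = (I - B)^{-1} for each direction and extracts its pattern; the coefficient value 0.8 and the helper name `mixing_matrix` are assumptions made for illustration.

```python
import numpy as np

def mixing_matrix(B):
    """Return A = (I - B)^{-1}, so that x = B x + e becomes x = A e."""
    return np.linalg.inv(np.eye(B.shape[0]) - B)

b = 0.8  # hypothetical non-zero coefficient
A1 = mixing_matrix(np.array([[0.0, 0.0], [b, 0.0]]))  # Model 1: x1 -> x2
A2 = mixing_matrix(np.array([[0.0, b], [0.0, 0.0]]))  # Model 2: x2 -> x1

pattern1 = ~np.isclose(A1, 0.0)  # True where the entry is non-zero
pattern2 = ~np.isclose(A2, 0.0)
```

Model 1 gives a lower-triangular pattern and Model 2 an upper-triangular one, so the position of the single zero reveals the direction once the columns are ordered to put non-zeros on the diagonal.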

LiNGAM with hidden common causes

P. O. Hoyer, S. Shimizu, A. Kerminen, and M. Palviainen
Int. J. Approximate Reasoning, 2008

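LiNGAM with hidden common causes (Hoyer+08IJAR)

• Extension to incorporate non-Gaussian hidden common causes $f_q$

$$x_i = \sum_{q=1}^{Q} \lambda_{iq} f_q + \sum_{j \neq i} b_{ij} x_j + e_i$$

where $f_q$ ($q = 1, \ldots, Q$) are independent

Example:
$$x_1 = \sum_{q=1}^{Q} \lambda_{1q} f_q + e_1$$
$$x_2 = \sum_{q=1}^{Q} \lambda_{2q} f_q + b_{21} x_1 + e_2$$

Figure: graph over x1, x2 with hidden common causes f1, f2 and errors e1, e2.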

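Independent hidden common causes

• Without loss of generality, hidden common causes $f_q$ are assumed to be independent

$$\begin{bmatrix} f_1 \\ f_2 \end{bmatrix} = \begin{bmatrix} a_{11} & 0 \\ a_{21} & a_{22} \end{bmatrix} \begin{bmatrix} e_{f_1} \\ e_{f_2} \end{bmatrix}$$

Figure: a model with dependent hidden common causes f1, f2 is rewritten as a model with independent hidden common causes $e_{f_1}$, $e_{f_2}$; both graphs are over x1, x2 with errors e1, e2.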

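Different directions give different zero/non-zero patterns (Hoyer+08IJAR)

• Faithfulness on 𝑥𝑖, 𝑓𝑖 + Number of 𝑓𝑖 given

Figure: scatter plots of x1 and x2 for Gaussian and non-Gaussian e1, e2, f1.

Models and the zero/non-zero patterns of the mixing matrix A (rows x1, x2; columns f1, e1, e2):

1. x1 → x2, with hidden common cause f1:
$$A = \begin{bmatrix} * & * & 0 \\ * & * & * \end{bmatrix}$$

2. x1 ← x2, with hidden common cause f1:
$$A = \begin{bmatrix} * & * & * \\ * & 0 & * \end{bmatrix}$$

3. No direct effect, with hidden common cause f1:
$$A = \begin{bmatrix} * & * & 0 \\ * & 0 & * \end{bmatrix}$$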
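The three patterns can be reproduced numerically. The sketch below builds the mixing matrix for each candidate graph with one hidden common cause, assuming the column order (f1, e1, e2). All coefficient values and the helper names are hypothetical, and the values are chosen so that no entries cancel, in line with the faithfulness assumption.

```python
import numpy as np

def mixing_matrix_with_hidden(B, Lam):
    """x = B x + Lam f + e  implies  x = (I - B)^{-1} [Lam, I] [f; e]."""
    p = B.shape[0]
    return np.linalg.inv(np.eye(p) - B) @ np.hstack([Lam, np.eye(p)])

def pattern(A):
    """Render the zero/non-zero pattern row by row, '*' for non-zero and '0' for zero."""
    return ["".join("0" if np.isclose(v, 0.0) else "*" for v in row) for row in A]

Lam = np.array([[0.7], [0.5]])  # hypothetical effects of f1 on x1 and x2
b = 0.8                         # hypothetical direct effect
models = {
    "x1 -> x2": np.array([[0.0, 0.0], [b, 0.0]]),
    "x1 <- x2": np.array([[0.0, b], [0.0, 0.0]]),
    "no direct effect": np.zeros((2, 2)),
}
patterns = {name: pattern(mixing_matrix_with_hidden(B, Lam)) for name, B in models.items()}
```

Each graph yields a distinct pattern, which is what allows the direction to be read off an estimated mixing matrix when the number of hidden common causes is given.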

Previous estimation methods (Hoyer+08IJAR; Henao+11JMLR)

• Explicitly model hidden common causes
• Compare models based on the maximum likelihood principle or a Bayesian approach
• Need to specify the number and distributions of the hidden common causes, which is difficult in general

Figure: x1 → x2 or x1 ← x2, each with hidden common causes f1, …, fQ and errors e1, e2.

Our proposal: A Bayesian LiNGAM approach

S. Shimizu and K. Bollen.
Journal of Machine Learning Research, 2014
and something extra

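Key idea (1/2)

• Transform the model to a model with no hidden common causes

Figure: LiNGAM with hidden common causes (x1, x2, hidden common causes f1, …, fQ, errors e1, e2, coefficient b21) is transformed into LiNGAM with no hidden common causes but with possibly different intercepts over obs. For each observation (1), …, (m), … the transformed model has variables $x_1^{(m)}$, $x_2^{(m)}$, errors $e_1^{(m)}$, $e_2^{(m)}$, the same coefficient b21, and intercepts $\mu_1^{(m)}$, $\mu_2^{(m)}$.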

Key idea (2/2)

• Include the sums of hidden common causes as the model parameters, i.e., observation-specific intercepts:

m-th obs.:
$$x_2^{(m)} = \underbrace{\sum_{q=1}^{Q} \lambda_{2q} f_q^{(m)}}_{\mu_2^{(m)}} + b_{21} x_1^{(m)} + e_2^{(m)}$$

where $\mu_2^{(m)}$ is the obs.-specific intercept

• Hidden common causes are not modeled explicitly
– It is necessary neither to specify the number of hidden common causes Q nor to estimate the coefficients $\lambda_{2q}$
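The reparameterization can be illustrated with simulated data. In this sketch the sample size, the number of hidden common causes, the Laplace distributions, and all coefficient values are assumptions made for illustration; the code only shows that collapsing the hidden common causes into one intercept per observation reproduces the same data.

```python
import numpy as np

rng = np.random.default_rng(0)
n, Q = 200, 3                  # hypothetical sample size and number of hidden common causes
b21 = 0.8                      # hypothetical direct effect of x1 on x2
lam = rng.normal(size=(2, Q))  # hypothetical coefficients lambda_{iq}

f = rng.laplace(size=(n, Q))   # independent non-Gaussian hidden common causes
e = rng.laplace(size=(n, 2))   # independent non-Gaussian errors

# Model with explicit hidden common causes (x1 -> x2).
x1 = f @ lam[0] + e[:, 0]
x2 = f @ lam[1] + b21 * x1 + e[:, 1]

# Observation-specific intercepts: sums of hidden common causes, one value per observation.
mu1 = f @ lam[0]
mu2 = f @ lam[1]

# The same data written with intercepts and no explicit hidden common cause.
x1_alt = mu1 + e[:, 0]
x2_alt = mu2 + b21 * x1_alt + e[:, 1]
```

The intercept vectors `mu1` and `mu2` have length n regardless of Q, which is why the number of hidden common causes does not need to be specified in the transformed model.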

Bayesian model selection

• Compare the marginal likelihoods of the two models, with the data standardized

Model 3 (x1 → x2):
$$x_1^{(m)} = \mu_1^{(m)} + e_1^{(m)}$$
$$x_2^{(m)} = \mu_2^{(m)} + b_{21} x_1^{(m)} + e_2^{(m)}$$

Model 4 (x1 ← x2):
$$x_1^{(m)} = \mu_1^{(m)} + b_{12} x_2^{(m)} + e_1^{(m)}$$
$$x_2^{(m)} = \mu_2^{(m)} + e_2^{(m)}$$

• Once a direction has been estimated, compute the posterior of the connection strength b21 or b12
• Many obs.-specific intercepts $\mu_i^{(m)}$ ($i = 1, 2$; $m = 1, \ldots, n$)
– Similar to mixed models and multi-level models
– Informative prior

Prior for the observation-specific intercepts

$$\mu_1^{(m)} = \sum_{q=1}^{Q} \lambda_{1q} f_q^{(m)}, \quad \mu_2^{(m)} = \sum_{q=1}^{Q} \lambda_{2q} f_q^{(m)}$$

• Motivation: Central limit theorem
– Sums of independent variables $f_q^{(m)}$ tend to be more Gaussian
• Approximate the density by a bell-shaped curve dist.
– Dependent due to hidden common causes

$$\begin{bmatrix} \mu_1^{(m)} \\ \mu_2^{(m)} \end{bmatrix} \sim t\text{-distribution with sd } \tau_1, \tau_2, \text{ correlation } \gamma_{12}, \text{ and DOF } \nu \text{ (here, 8)}$$

• Select the hyper-parameter values that maximize the marginal likelihood
$$\tau_1, \tau_2 \in \{0.4, 0.6, 0.8\}$$
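For intuition about this prior, the following sketch draws intercept pairs from a zero-mean bivariate t-distribution. It assumes that "sd" on the slide means the standard deviation of the distribution itself (so the scale matrix is shrunk by (DOF - 2)/DOF). The function name `sample_bivariate_t`, the correlation value 0.5, and the particular sd pair are illustrative choices, not values confirmed by the slides.

```python
import numpy as np

def sample_bivariate_t(n, sd, corr, dof, rng):
    """Draw n samples from a zero-mean bivariate t with given sds, correlation, and DOF (> 2)."""
    sd = np.asarray(sd, dtype=float)
    cov = np.array([[sd[0] ** 2, corr * sd[0] * sd[1]],
                    [corr * sd[0] * sd[1], sd[1] ** 2]])
    # A t vector with scale matrix S has covariance S * dof / (dof - 2), so rescale.
    scale = cov * (dof - 2) / dof
    z = rng.multivariate_normal(np.zeros(2), scale, size=n)
    w = rng.chisquare(dof, size=n) / dof
    # Normal divided by sqrt(chi-square / dof) gives a multivariate t sample.
    return z / np.sqrt(w)[:, None]

rng = np.random.default_rng(0)
mu = sample_bivariate_t(n=200_000, sd=(0.6, 0.8), corr=0.5, dof=8, rng=rng)
```

Each row of `mu` plays the role of one pair of observation-specific intercepts. The correlation between the two columns reflects the dependence induced by shared hidden common causes.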

Error distributions and other priors used in the experiment

• Error distributions $p(e_1)$, $p(e_2)$
– Fixed to be the Laplace distribution
– Possible to be estimated assuming a family of generalized Gaussian distributions, for example
• Priors for the other parameters

$$b_{12} \sim N(0, 0.75^2), \quad b_{21} \sim N(0, 0.75^2), \quad \gamma_{12} \sim U(-1, 1)$$
$$\mathrm{std}(e_1) \sim U(0, 1), \quad \mathrm{std}(e_2) \sim U(0, 1)$$

Experiment on sociology data

Sociology data

• Source: General Social Survey (n=1380)
– Non-farm background, ages 35-44, white, male, in the labor force, no missing data for any of the covariates, 1972-2006
• 15 pairs with known temporal directions (Duncan+1972)

Status attainment model (Duncan et al., 1972)

Figure: known (temporal) orderings of 15 pairs; node labels include FE, Father’s Education, Son’s Education, Son’s Occupation, and Son’s Income (x2: Son’s Income), with hidden common cause f1.

Numbers of successes (n=1380)

Cf. LiNGAM-GU-UK (Chen+13NECO): 0.20; PNL (Zhang+09UAI): 0.60

Conclusion

• Estimation of causal direction in the presence of hidden common causes is a major challenge in causal discovery
• Proposed a linear non-Gaussian SEM approach
– Not necessary to model individual hidden common causes
• Future directions
– Cyclic cases: Using some prior for forcing the identifiability condition of Lacerda+08UAI?
– Non-stationarity: Combining with Kun’s method (Huang+15IJCAI)?