Distinguishing between Cause and Effect: Estimation of Causal Graphs with two Variables
Jonas Peters, ETH Zurich
Tutorial, NIPS 2013 Workshop on Causality, 9th December 2013
F. H. Messerli: Chocolate Consumption, Cognitive Function, and Nobel Laureates, N Engl J Med 2012
Jonas Peters (ETH Zurich) Distinguishing between Cause and Effect 9th December 2013
Problem: Given P(X, Y), can we infer whether
X → Y or Y → X?
Difficulty: So much symmetry:
P(X) · P(Y | X) = P(X, Y) = P(X | Y) · P(Y)
We need assumptions! (E.g., Markov and faithfulness do not suffice.)
Surprise (for some assumptions):
2 variables ⇒ p variables: identifiability in the bivariate case carries over to the multivariate case.
J. Peters, J. Mooij, D. Janzing and B. Schölkopf: Causal Discovery with Continuous Additive Noise Models, arXiv:1309.6779
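The symmetry can be checked numerically. A minimal sketch (my own illustration; any joint table works): both factorizations reconstruct the same joint distribution, so P(X, Y) alone cannot orient the edge.

```python
import numpy as np

# A joint distribution P(X, Y) over two binary variables, as a 2x2 table
# indexed by (x, y).
p_xy = np.array([[0.3, 0.1],
                 [0.2, 0.4]])

p_x = p_xy.sum(axis=1)               # P(X)
p_y = p_xy.sum(axis=0)               # P(Y)
p_y_given_x = p_xy / p_x[:, None]    # P(Y | X)
p_x_given_y = p_xy / p_y[None, :]    # P(X | Y)

forward = p_x[:, None] * p_y_given_x   # P(X) * P(Y | X)
backward = p_x_given_y * p_y[None, :]  # P(X | Y) * P(Y)

# Both directions reproduce the joint exactly.
print(np.allclose(forward, p_xy), np.allclose(backward, p_xy))  # → True True
```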
Idea No. 1: Linear Non-Gaussian Additive Models (LiNGAM)
Structural assumptions like additive non-Gaussian noise models break the symmetry:
Y = βX + N_Y,  N_Y ⊥⊥ X,
with N_Y non-Gaussian.
Asymmetry No. 1
Consider a distribution corresponding to
Y = βX + N_Y
• N_Y ⊥⊥ X
• N_Y non-Gaussian
[Graph: X → Y]
Then there is no
X = φY + N_X
• N_X ⊥⊥ Y
• N_X non-Gaussian
[Graph: Y → X]
S. Shimizu, P. O. Hoyer, A. Hyvärinen and A. J. Kerminen: A linear non-Gaussian acyclic model for causal discovery, JMLR 2006
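This asymmetry can be seen numerically. Below is a minimal sketch (my own illustration, not from the slides; the crude squared-correlation dependence score stands in for a proper independence test): fit ordinary least squares in both directions for a linear model with uniform (hence non-Gaussian) noise and check whether the residual is independent of the regressor.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Forward model: Y = beta*X + N_Y with non-Gaussian (uniform) noise.
beta = 1.0
x = rng.uniform(-1, 1, n)
y = beta * x + rng.uniform(-1, 1, n)

def ols_residual(target, regressor):
    """Residual of a least-squares fit target ~ regressor."""
    coef = np.cov(target, regressor)[0, 1] / np.var(regressor)
    return target - coef * regressor

def dependence(a, b):
    """Crude dependence score: |corr(a^2, b^2)|, near zero for
    independent variables. (OLS already forces corr(a, b) = 0.)"""
    return abs(np.corrcoef(a**2, b**2)[0, 1])

res_fwd = ols_residual(y, x)  # candidate N_Y for the model X -> Y
res_bwd = ols_residual(x, y)  # candidate N_X for the model Y -> X

# Forward residual is independent of X; backward residual is not.
print(dependence(res_fwd, x), dependence(res_bwd, y))
```

In the Gaussian case both scores would be near zero, which is exactly why non-Gaussianity is needed for this asymmetry.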
Idea No. 2: Additive noise models
Nonlinear functions are also fine!
Y = f(X) + N_Y,  N_Y ⊥⊥ X
P. Hoyer, D. Janzing, J. Mooij, J. Peters and B. Schölkopf: Nonlinear causal discovery with additive noise models, NIPS 2008
Asymmetry No. 2
Consider a distribution corresponding to
Y = f(X) + N_Y
with N_Y ⊥⊥ X
[Graph: X → Y]
Then for “most combinations” (f, P(X), P(N_Y)) there is no
X = g(Y) + M_X
with M_X ⊥⊥ Y
[Graph: Y → X]
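A minimal numerical sketch of the nonlinear case (my own illustration; a cubic polynomial fit stands in for a general nonparametric regression): generate data from a nonlinear additive noise model and compare residual dependence in both directions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000

# Forward model: Y = f(X) + N_Y with f(x) = x^3 and uniform noise.
x = rng.uniform(-1, 1, n)
y = x**3 + rng.uniform(-0.3, 0.3, n)

def residual(target, regressor, deg=3):
    """Residual of a polynomial least-squares fit target ~ regressor."""
    coefs = np.polyfit(regressor, target, deg)
    return target - np.polyval(coefs, regressor)

def dependence(a, b):
    """Crude dependence score: |corr(a^2, b^2)|, near zero if independent."""
    return abs(np.corrcoef(a**2, b**2)[0, 1])

res_fwd = residual(y, x)  # recovers N_Y, independent of X
res_bwd = residual(x, y)  # residual of the (non-existent) backward model

print(dependence(res_fwd, x), dependence(res_bwd, y))
```

The backward residual is strongly heteroscedastic in Y (the noise is squeezed where f is steep and stretched where it is flat), which is what the dependence score picks up.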
Y = f(X) + N_Y,  N_Y ⊥⊥ X  [scatter plot with the forward regression fit]
X = g(Y) + N_X,  N_X ⊥⊥ Y  [scatter plot with the backward regression fit]
Idea No. 3: Gaussian Process Inference (GPI)
We can always write∗
Y = f(X, N_Y),  N_Y ⊥⊥ X
and
X = g(Y, N_X),  N_X ⊥⊥ Y
Which model is more “complex”? Use Bayesian model comparison.
J. M. Mooij, O. Stegle, D. Janzing, K. Zhang, B. Schölkopf:
Probabilistic latent variable models for distinguishing between cause and effect, NIPS 2010
∗E.g., J. Peters: Restricted Structural Equation Models for Causal Inference, PhD Thesis
Asymmetry No. 3
1. Fix the noise distribution to be N(0, 1).
2. Put a prior p(θ_X) on the input distribution p(x | θ_X) (→ complexity of X).
3. Put a prior p(θ_f) on the functions p(f | θ_f) (→ complexity of f).
4. Approximate the marginal likelihood for X → Y:
p(x, y) = p(x) · p(y | x) = [∫ p(x | θ_X) p(θ_X) dθ_X] · [∫ δ(y − f(x, e)) p(e) p(f) de df]
[Graphical model with nodes θ_X → X, θ_f → f, E, and Y = f(X, E)]
5. Approximate the marginal likelihood for Y → X.
6. Compare.
J. M. Mooij, O. Stegle, D. Janzing, K. Zhang, B. Schölkopf:
Probabilistic latent variable models for distinguishing between cause and effect, NIPS 2010
Idea No. 4: Information Geometric Causal Inference (IGCI)
Assume a deterministic relationship
Y = f (X )
and that f and P(X ) are “independent”.
D. Janzing, J. M. Mooij, K. Zhang, J. Lemeire, J. Zscheischler, P. Daniušis, B. Steudel, B. Schölkopf:
Information-geometric approach to inferring causal directions, Artificial Intelligence 2012
[Figure: f(x) mapping the density p(x) on the x-axis to p(y) on the y-axis]
Asymmetry No. 4
Consider Y = f(X) with invertible f : [0, 1] → [0, 1], f ≠ id, and X = g(Y). If
“cov”(log f′, p_X) = ∫ log(f′(x)) p_X(x) dx − ∫ log f′(x) dx = 0,
then
“cov”(log g′, p_Y) = ∫ log(g′(y)) p_Y(y) dy − ∫ log g′(y) dy > 0.
D. Janzing, J. M. Mooij, K. Zhang, J. Lemeire, J. Zscheischler, P. Daniušis, B. Steudel, B. Schölkopf:
Information-geometric approach to inferring causal directions, Artificial Intelligence 2012
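A worked instance of this asymmetry (my own example, not from the slides): take f(x) = x² on [0, 1] with X uniform, so p_X ≡ 1 and the forward “covariance” vanishes by construction.

```latex
f(x) = x^2,\qquad f'(x) = 2x,\qquad p_X \equiv 1 \text{ on } [0,1]:
\qquad \text{“cov”}(\log f', p_X)
  = \int_0^1 \log(2x)\,p_X(x)\,dx - \int_0^1 \log(2x)\,dx = 0.

\text{Backward: } g(y) = \sqrt{y},\qquad g'(y) = \frac{1}{2\sqrt{y}},\qquad
p_Y(y) = \frac{1}{2\sqrt{y}}:
\qquad \int_0^1 \log g'(y)\, p_Y(y)\,dy = 1 - \log 2,\qquad
\int_0^1 \log g'(y)\,dy = \tfrac{1}{2} - \log 2,

\text{so } \text{“cov”}(\log g', p_Y)
  = (1 - \log 2) - \bigl(\tfrac{1}{2} - \log 2\bigr) = \tfrac{1}{2} > 0,
```

exactly the strict inequality the proposition predicts for the anticausal direction.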
Open Questions 1: Quantifying Identifiability
Open Questions 1: Quantifying Identifiability
Proposition
Assume P(X, Y) is generated by
Y = f(X) + N_Y
with independent X and N_Y. Then
inf_{Q ∈ {Q : Y → X}} KL(P ‖ Q) = ?
• first steps to understand the geometry
• gives us finite-sample guarantees
Open Questions 2: Robustness
What happens if assumptions are violated? E.g., in case of confounding?
[Graph: X → Y confounded by a hidden Z, with Z → X and Z → Y]
Can we still infer X → Y ? How useful is this?
Conclusions
In theory, we can break the symmetry between cause and effect.
• restricted structural equation models:
  – linear functions, additive non-Gaussian noise
  – nonlinear functions, additive noise
• complexity measures on functions and distributions
• “independence” between function and input distribution
• ... principles behind new methods from the challenge?
Causal inference problem of climate change is solved! Fight the cause!
Don’t fly (Zurich–SFO: 5.4 t CO2)! Compensate!
IGCI
It turns out that if X → Y, then
∫ log |f′(x)| p(x) dx < ∫ log |g′(y)| p(y) dy.
Estimator:
C_{X→Y} := (1/m) Σ_{j=1}^{m} log |(y_{j+1} − y_j) / (x_{j+1} − x_j)| ≈ ∫ log |f′(x)| p(x) dx
Infer X → Y if C_{X→Y} < C_{Y→X}.
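A minimal sketch of this estimator (my own implementation, on a synthetic example: uniform X and f(x) = x², for which uniform p(x) satisfies the IGCI independence condition):

```python
import numpy as np

rng = np.random.default_rng(0)
m = 10_000

# Deterministic, invertible relation on [0, 1].
x = rng.uniform(0, 1, m)
y = x**2

def igci_score(a, b):
    """Slope-based IGCI estimator C_{A->B}: mean of log|Δb/Δa|
    over consecutive points after sorting by a."""
    order = np.argsort(a)
    a, b = a[order], b[order]
    da, db = np.diff(a), np.diff(b)
    keep = (da != 0) & (db != 0)
    return np.mean(np.log(np.abs(db[keep] / da[keep])))

c_xy = igci_score(x, y)  # approximates ∫ log|f'(x)| p(x) dx = log 2 - 1 < 0
c_yx = igci_score(y, x)  # approximates ∫ log|g'(y)| p(y) dy = 1 - log 2 > 0
print("infer X -> Y" if c_xy < c_yx else "infer Y -> X")
```

The analytic values of both integrals for this example (±(1 − log 2) ≈ ±0.307) match the estimates, and the smaller score correctly points to the causal direction.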
Y = βX + N_Y,  N_Y ⊥⊥ X,  N_Y non-Gaussian
[Scatter plot of X (0 to 1) against Y (0 to 2)]
X = φY + N_X,  N_X ⊥⊥ Y,  N_X non-Gaussian
[Scatter plot of X against Y, backward direction]
Does X cause Y or vice versa? Real Data
Does X cause Y or vice versa?
No (not enough) data for chocolate
... but we have data for coffee!
Does X cause Y or vice versa?
[Scatter plot: coffee consumption per capita (kg) vs. # Nobel laureates per 10 mio]
Correlation: 0.698, p-value: < 2.2 · 10−16.
Nobel Prize → Coffee: dependent residuals (p-value of 0).
Coffee → Nobel Prize: dependent residuals (p-value of 0).
⇒ Model class too small? Causally insufficient?
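A residual-independence test of this kind can be sketched as follows (my own illustration on synthetic data; the statistic, kernels, regression method, and data used in the talk may differ): fit a regression in each direction and run a permutation test of residual-vs-cause independence with an HSIC statistic.

```python
import numpy as np

def gram(v):
    """Gaussian-kernel Gram matrix with median-heuristic bandwidth."""
    d2 = (v[:, None] - v[None, :]) ** 2
    return np.exp(-d2 / np.median(d2[d2 > 0]))

def anm_pvalue(cause, effect, rng, n_perm=200):
    """Fit a cubic regression effect ~ cause, then permutation-test the
    independence of cause and residual with a (biased) HSIC statistic.
    A small p-value means the additive noise model is rejected."""
    res = effect - np.polyval(np.polyfit(cause, effect, 3), cause)
    n = len(cause)
    H = np.eye(n) - 1.0 / n            # centring matrix I - J/n
    Kc = H @ gram(cause) @ H           # centred kernel on the cause
    L = gram(res)
    stat = np.sum(Kc * L)              # proportional to biased HSIC
    null = [np.sum(Kc * L[p][:, p])    # permuting res permutes its Gram matrix
            for p in (rng.permutation(n) for _ in range(n_perm))]
    return np.mean([s >= stat for s in null])

rng = np.random.default_rng(0)
n = 300
x = rng.uniform(-2, 2, n)
y = x**3 + rng.uniform(-1, 1, n)

p_fwd = anm_pvalue(x, y, rng)  # forward ANM: residuals look independent
p_bwd = anm_pvalue(y, x, rng)  # backward ANM: residuals clearly dependent
print(p_fwd, p_bwd)
```

On the real coffee/Nobel data the slide reports that both directions are rejected, i.e. both p-values behave like `p_bwd` here, pointing to a too-small model class or hidden confounding.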
The linear Gaussian case
Y = βX + N_Y
with independent
X ∼ N(0, σ_X²) and N_Y ∼ N(0, σ_{N_Y}²).
Then there is a linear SEM with
X = αY + M_X.
How can we find α and M_X?
[Graph: X → Y with Y = βX + N_Y]
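A sketch of the standard answer (my own numerical check, with assumed parameter values): take α as the population regression coefficient of X on Y; then M_X = X − αY is uncorrelated with Y, and since (M_X, Y) is jointly Gaussian, uncorrelated means independent, so the backward model is an equally valid linear SEM.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
beta, s_x, s_n = 2.0, 1.0, 0.5   # assumed example parameters

x = rng.normal(0, s_x, n)
y = beta * x + rng.normal(0, s_n, n)

# Backward coefficient from population covariances:
# alpha = cov(X, Y) / var(Y) = beta * s_x^2 / (beta^2 * s_x^2 + s_n^2)
alpha = beta * s_x**2 / (beta**2 * s_x**2 + s_n**2)
m_x = x - alpha * y

# M_X is uncorrelated with Y, and jointly Gaussian with it, hence
# independent -- the linear Gaussian case is not identifiable.
print(np.corrcoef(m_x, y)[0, 1])        # near zero
print(np.corrcoef(m_x**2, y**2)[0, 1])  # near zero: no higher-order signal
```

Contrast this with the non-Gaussian case above, where the analogous backward residual shows clear higher-order dependence on Y.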