
Page 1: Learning Causality

Some slides are from Judea Pearl's class lecture: http://bayes.cs.ucla.edu/BOOK-2K/viewgraphs.html

Page 2: A causal model (Example)

• The statement 'rain causes mud' implies an asymmetric relationship: the rain will create mud, but the mud will not create rain.
• We use '→' to denote such a causal relationship.
• The absence of an arrow between 'rain' and 'other causes of mud' means that there is no direct causal relationship between them.

[Figure: Rain → Mud ← Other causes of mud]
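A minimal sketch of this model in Python (the probabilities and the or-gate mechanism are illustrative assumptions, not from the slides). The asymmetry shows up as the direction of functional dependence: mud is computed from rain, never the reverse.

    import random

    def rain():                      # exogenous cause
        return random.random() < 0.3

    def other_causes():              # exogenous: sprinklers, floods, ...
        return random.random() < 0.1

    def mud(rain_val, other_val):    # mud is a function of its direct causes
        return rain_val or other_val

    # Sampling follows the arrows rain -> mud and other causes -> mud;
    # there is no mechanism mapping mud back to rain.
    print(mud(rain(), other_causes()))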

Page 3: Directed (causal) Graphs

• A and B are causally independent;
• C, D, E, and F are causally dependent on A and B;
• A and B are direct causes of C;
• A and B are indirect causes of D, E, and F;
• If C is prevented from changing with A and B, then A and B will no longer cause changes in D, E, and F.

[Figure: DAG over the nodes A, B, C, D, E, F]
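These claims can be checked mechanically. Below is a small sketch (the exact edges among C, D, E, and F are an assumption, since the slide carries them only in the picture; what matters is that every path from A or B to D, E, and F passes through C):

    # Children map for the DAG: A -> C <- B, C -> D, D -> E, D -> F.
    dag = {
        "A": ["C"], "B": ["C"],       # A and B are direct causes of C
        "C": ["D"], "D": ["E", "F"],  # D, E, F lie downstream of C
        "E": [], "F": [],
    }

    def descendants(g, node):
        """All variables causally dependent on `node`."""
        out, stack = set(), list(g[node])
        while stack:
            v = stack.pop()
            if v not in out:
                out.add(v)
                stack.extend(g[v])
        return out

    print(descendants(dag, "A"))      # C, D, E and F all depend on A

    # "Preventing C from changing with A and B" removes the edges from
    # A and B into C, so nothing downstream of C responds to A or B:
    dag_do_c = dict(dag, A=[], B=[])
    print(descendants(dag_do_c, "A")) # empty set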

Page 4: Conditional Independence

Page 5: Conditional Independence

Page 6: Conditional Independence (Notation)

Page 7: Causal Structure

Page 8: Causal Structure (Cont'd)

• A causal structure serves as a blueprint for forming a "causal model": a precise specification of how each variable is influenced by its parents in the DAG.
• We assume that Nature is at liberty to impose arbitrary functional relationships between each effect and its causes, and then to perturb these relationships by introducing arbitrary disturbances;
• These disturbances reflect "hidden" or unmeasurable conditions.
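Concretely, in the notation of Pearl's book (the slide states this only in words), each variable is determined by its direct causes together with a disturbance term:

    x_i = f_i(pa_i, u_i),   i = 1, ..., n,

where pa_i denotes the parents of x_i in the DAG and u_i is the arbitrary disturbance perturbing that relationship.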

Page 9: Causal Model

Page 10: Causal Model (Cont'd)

• Once a causal model M is formed, it defines a joint probability distribution P(M) over the variables in the system;
• This distribution reflects features of the causal structure: each variable must be independent of its grandparents, given the values of its parents;
• We may be allowed to inspect a select subset O ⊆ V of "observed" variables and to ask questions about P[o], the probability distribution over the observations;
• We may recover the topology D of the DAG from features of the probability distribution P[o].
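What "defines a joint probability distribution" means in practice is that P(M) factorizes into parent-conditional tables. A sketch for an assumed three-variable chain a → b → c (the numbers are invented for illustration):

    from itertools import product

    P_a = {True: 0.3, False: 0.7}                  # a has no parents
    P_b_given_a = {True: {True: 0.9, False: 0.1},  # P_b_given_a[a][b]
                   False: {True: 0.2, False: 0.8}}
    P_c_given_b = {True: {True: 0.5, False: 0.5},  # P_c_given_b[b][c]
                   False: {True: 0.1, False: 0.9}}

    def joint(a, b, c):
        """P(a, b, c) = P(a) * P(b | a) * P(c | b)."""
        return P_a[a] * P_b_given_a[a][b] * P_c_given_b[b][c]

    # The factorization is a proper distribution, and by construction c
    # (the "grandchild") depends on a only through its parent b.
    total = sum(joint(*v) for v in product([True, False], repeat=3))
    assert abs(total - 1) < 1e-9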

Page 11: Inferred Causation

Page 12: Latent Structure

Page 13: Structure Preference

Page 14: Structure Preference (Cont'd)

• The set of independencies entailed by a causal structure imposes limits on its power to mimic other structures;
• L1 cannot be preferred to L2 if there is even one observable dependency that is permitted by L1 and forbidden by L2;
• L1 is preferred to L2 if L2's set of independencies is a subset of L1's;
• Thus, tests for preference and equivalence can sometimes be reduced to tests of dependencies, which can be determined from the topology of the DAGs without regard to parameters.

Page 15: Minimality

Page 16: Consistency

Page 17: Inferred Causation

Page 18: Examples

Observations over {a, b, c, d} reveal two independencies:
1. a is independent of b;
2. d is independent of {a, b} given c.
Assume further that the data reveal no other independencies.

a = having a cold;
b = having hay fever;
c = having to sneeze;
d = having to wipe one's nose.
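These two independencies are exactly what d-separation reads off the structure a → c ← b with c → d. The sketch below is a generic d-separation test via the ancestral moral graph (not code from the lecture); it verifies both independencies and shows that conditioning on c couples a and b, which is what makes the v-structure detectable:

    def ancestors(parents, nodes):
        seen, stack = set(nodes), list(nodes)
        while stack:
            for p in parents[stack.pop()]:
                if p not in seen:
                    seen.add(p)
                    stack.append(p)
        return seen

    def d_separated(parents, xs, ys, zs=()):
        keep = ancestors(parents, set(xs) | set(ys) | set(zs))
        # Moralize the ancestral subgraph: undirect all edges and
        # "marry" every pair of parents sharing a child.
        adj = {v: set() for v in keep}
        for v in keep:
            ps = [p for p in parents[v] if p in keep]
            for p in ps:
                adj[v].add(p); adj[p].add(v)
            for p in ps:
                for q in ps:
                    if p != q:
                        adj[p].add(q)
        # xs and ys are d-separated by zs iff removing zs disconnects them.
        blocked, seen = set(zs), set()
        stack = [x for x in xs if x not in zs]
        while stack:
            v = stack.pop()
            if v in ys:
                return False
            if v not in seen and v not in blocked:
                seen.add(v)
                stack.extend(adj[v] - seen)
        return True

    parents = {"a": [], "b": [], "c": ["a", "b"], "d": ["c"]}
    assert d_separated(parents, {"a"}, {"b"})              # 1. a ind. of b
    assert d_separated(parents, {"d"}, {"a", "b"}, {"c"})  # 2. d ind. of {a,b} given c
    assert not d_separated(parents, {"a"}, {"b"}, {"c"})   # conditioning on c couples a, b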

Page 19: Example (Cont'd)

{a, b, c, d} reveal two independencies:
1. a is independent of b;
2. d is independent of {a, b} given c.

[Figure: four candidate structures, annotated as follows]
• minimal;
• arbitrary relations between a and b;
• not minimal: fails to impose conditional independence between d and {a, b};
• not consistent with the data: imposes marginal independence between d and {a, b}.

Page 20: Stability

The stability condition states that, as we vary the model's parameters, no independence in P can be destroyed.

In other words, if an independency exists, it will always exist.

Page 21: Stable distribution

• A probability distribution P is a faithful/stable distribution if there exists a directed acyclic graph (DAG) D such that the conditional independence relationships in P are exactly those displayed in D, and vice versa.

Page 22: IC algorithm (Inductive Causation)

• The IC algorithm (Pearl):
– is based on variable dependencies;
– finds all pairs of variables that are dependent on each other (applying standard statistical methods to the database);
– eliminates (as much as possible) indirect dependencies;
– determines the directions of dependencies.

Page 23: Comparing abduction, deduction and induction

• Deduction:
major premise: All balls in the box are black;
minor premise: These balls are from the box;
conclusion: These balls are black.

• Abduction:
rule: All balls in the box are black;
observation: These balls are black;
explanation: These balls are from the box.

• Induction:
case: These balls are from the box;
observation: These balls are black;
hypothesized rule: All balls in the box are black.

Schematically:
Deduction: A => B, A; therefore B.
Abduction: A => B, B; therefore possibly A.
Induction: whenever A then B (but not vice versa); therefore possibly A => B.

Induction goes from specific cases to general rules; abduction and deduction both go from part of a specific case to another part of the case, using general rules (in different ways).

Source: http://www.csee.umbc.edu/~ypeng/F02671/lecture-notes/Ch15.ppt

Page 24: IC Algorithm (Cont'd)

• Input: P, a stable distribution on a set V of variables;
• Output: a pattern H(P) compatible with P.

A pattern is a partially directed DAG:
• some edges are directed, and
• some edges are undirected.

Page 25: IC Algorithm: Step 1

• For each pair of variables a and b in V, search for a set Sab such that (a ╨ b | Sab) holds in P; in other words, a and b should be independent in P, conditioned on Sab.
• Construct an undirected graph G such that vertices a and b are connected with an edge if and only if no set Sab can be found.

[Figure: if some Sab gives a ╨ b | Sab, a and b stay unconnected; if no Sab exists, connect a — b]
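A compact sketch of Step 1 (the conditional-independence oracle `indep` is assumed; in practice it would be a statistical test, e.g. a chi-squared or partial-correlation test, run against the data):

    from itertools import combinations

    def skeleton(variables, indep):
        edges, sepset = set(), {}
        for a, b in combinations(variables, 2):
            others = [v for v in variables if v not in (a, b)]
            found = None
            for size in range(len(others) + 1):    # smallest sets first
                for s in combinations(others, size):
                    if indep(a, b, set(s)):        # (a ╨ b | Sab) holds in P
                        found = set(s)
                        break
                if found is not None:
                    break
            if found is None:
                edges.add(frozenset((a, b)))       # no Sab exists: connect a and b
            else:
                sepset[frozenset((a, b))] = found  # remember Sab for Step 2
        return edges, sepset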

Page 26: IC Algorithm: Step 2

• For each pair of nonadjacent variables a and b with a common neighbor c, check whether c ∈ Sab.
• If it is, then continue;
• else add arrowheads at c, i.e., a → c ← b.

[Figure: if c ∈ Sab, the chain a — c — b stays unoriented; if c ∉ Sab, it becomes a → c ← b]
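Continuing the sketch, Step 2 orients a v-structure at every common neighbor that was left out of the separating set (`edges` and `sepset` are the outputs of the Step 1 sketch above):

    def orient_v_structures(edges, sepset, variables):
        directed = set()                    # (x, y) stands for x -> y
        for c in variables:
            nbrs = [v for v in variables if frozenset((v, c)) in edges]
            for i, a in enumerate(nbrs):
                for b in nbrs[i + 1:]:
                    if frozenset((a, b)) in edges:
                        continue            # a and b must be nonadjacent
                    if c not in sepset[frozenset((a, b))]:
                        directed.add((a, c))   # a -> c <- b
                        directed.add((b, c))
        return directed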

Page 27: Example

[Figure: the skeleton Rain — Mud — Other causes of mud (Rain and Other causes of mud nonadjacent) is oriented as the v-structure Rain → Mud ← Other causes of mud]

Page 28: IC Algorithm: Step 3

• In the partially directed graph that results, orient as many of the undirected edges as possible, subject to two conditions:
– the orientation should not create a new v-structure;
– the orientation should not create a directed cycle.

Page 29: Rules required to obtain a maximally oriented pattern

• R1: Orient b — c into b → c whenever there is an arrow a → b such that a and c are nonadjacent;

[Figure: a → b — c becomes a → b → c]

Page 30: Rules required to obtain a maximally oriented pattern

• R2: Orient a — b into a → b whenever there is a chain a → c → b;

[Figure: a — b with chain a → c → b becomes a → b]

Page 31: Rules required to obtain a maximally oriented pattern

• R3: Orient a — b into a → b whenever there are two chains a — c → b and a — d → b such that c and d are nonadjacent;

[Figure: a — b with chains a — c → b and a — d → b becomes a → b]

Page 32: Rules required to obtain a maximally oriented pattern

• R4: Orient a — b into a → b whenever there are two chains a — c → d and c → d → b such that c and b are nonadjacent;

[Figure: a — b with chains a — c → d and c → d → b becomes a → b]
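The four rules can be applied mechanically until nothing changes. A sketch, continuing the earlier representation (`directed` holds (x, y) pairs for x → y; `undirected` holds frozenset edges); together the rules implement Step 3's "no new v-structure, no directed cycle" conditions:

    def adjacent(a, b, directed, undirected):
        return (frozenset((a, b)) in undirected
                or (a, b) in directed or (b, a) in directed)

    def rule_applies(x, y, directed, undirected, variables):
        """True if some rule R1-R4 licenses orienting x - y as x -> y."""
        others = [v for v in variables if v not in (x, y)]
        und = lambda p, q: frozenset((p, q)) in undirected
        # R1: an arrow c -> x exists with c and y nonadjacent.
        if any((c, x) in directed and not adjacent(c, y, directed, undirected)
               for c in others):
            return True
        # R2: a directed chain x -> c -> y already exists.
        if any((x, c) in directed and (c, y) in directed for c in others):
            return True
        for c in others:
            for d in others:
                if c == d:
                    continue
                # R3: chains x - c -> y and x - d -> y, c and d nonadjacent.
                if (und(x, c) and (c, y) in directed and und(x, d)
                        and (d, y) in directed
                        and not adjacent(c, d, directed, undirected)):
                    return True
                # R4: chains x - c -> d and c -> d -> y, c and y nonadjacent.
                if (und(x, c) and (c, d) in directed and (d, y) in directed
                        and not adjacent(c, y, directed, undirected)):
                    return True
        return False

    def maximally_orient(directed, undirected, variables):
        changed = True
        while changed:
            changed = False
            for e in list(undirected):
                a, b = tuple(e)
                for x, y in ((a, b), (b, a)):
                    if rule_applies(x, y, directed, undirected, variables):
                        undirected.discard(e)
                        directed.add((x, y))
                        changed = True
                        break
        return directed, undirected

    # R1 in action: a -> b plus b - c (a, c nonadjacent) yields b -> c.
    d, u = maximally_orient({("a", "b")}, {frozenset(("b", "c"))},
                            ["a", "b", "c"])
    print(d)   # {('a', 'b'), ('b', 'c')}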

Page 33: IC* Algorithm

• Input: P, a sampled distribution;
• Output: core(P), a marked pattern.

Page 34: Marked Pattern: Four types of edges

(In Pearl's presentation, the output pattern uses four types of edges: a marked arrow a →* b, signifying a directed path from a to b in the underlying model; an unmarked arrow a → b, signifying either a directed path from a to b or a latent common cause; a bidirected edge a ↔ b, signifying a latent common cause; and an undirected edge a — b, standing for any of the preceding three cases.)

Page 35: IC* Algorithm: Step 1

For each pair of variables a and b, search for a set Sab such that a and b are independent in P, conditioned on Sab. If there is no such Sab, place an undirected link between the two variables: a — b.

Page 36: IC* Algorithm: Step 2

• For each pair of nonadjacent variables a and b with a common neighbor c, check whether c ∈ Sab:
– if it is, then continue;
– if it is not, then add arrowheads pointing at c (i.e., a → c ← b).
• In the partially directed graph that results, add (recursively) as many arrowheads as possible, and mark as many edges as possible, according to the following two rules:

Page 37: IC* Algorithm: Rule 1

• R1: For each pair of nonadjacent nodes a and b with a common neighbor c: if the link between a and c has an arrowhead into c, and the link between c and b has no arrowhead into c, then add an arrowhead on the link between c and b pointing at b, and mark that link to obtain c →* b;

[Figure: a → c — b becomes a → c →* b]

Page 38: IC* Algorithm: Rule 2

• R2: If a and b are adjacent and there is a directed path (composed strictly of marked links) from a to b, then add an arrowhead pointing toward b on the link between a and b;