
Page 1: Learning Causality

Some slides are from Judea Pearl's class lecture: http://bayes.cs.ucla.edu/BOOK-2K/viewgraphs.html

Page 2: A causal model (Example)

• The statement 'rain causes mud' implies an asymmetric relationship: the rain will create mud, but the mud will not create rain.
• We use '→' to denote such a causal relationship.
• The absence of an arrow between 'rain' and 'other causes of mud' means that there is no direct causal relationship between them.

[Figure: Rain → Mud ← Other causes of mud]
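A minimal sketch of this model in Python (the probabilities and the or-gate mechanism are illustrative assumptions, not from the slides). The asymmetry shows up as the direction of functional dependence: mud is computed from rain, never the reverse.

    import random

    def rain():                      # exogenous cause
        return random.random() < 0.3

    def other_causes():              # exogenous: sprinklers, floods, ...
        return random.random() < 0.1

    def mud(rain_val, other_val):    # mud is a function of its direct causes
        return rain_val or other_val

    # Sampling follows the arrows rain -> mud and other causes -> mud;
    # there is no mechanism mapping mud back to rain.
    print(mud(rain(), other_causes()))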

Page 3: Directed (causal) Graphs

• A and B are causally independent;
• C, D, E, and F are causally dependent on A and B;
• A and B are direct causes of C;
• A and B are indirect causes of D, E, and F;
• If C is prevented from changing with A and B, then A and B will no longer cause changes in D, E, and F.

[Figure: DAG over the nodes A, B, C, D, E, F]
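These claims can be checked mechanically. Below is a small sketch (the exact edges among C, D, E, and F are an assumption, since the slide carries them only in the picture; what matters is that every path from A or B to D, E, and F passes through C):

    # Children map for the DAG: A -> C <- B, C -> D, D -> E, D -> F.
    dag = {
        "A": ["C"], "B": ["C"],       # A and B are direct causes of C
        "C": ["D"], "D": ["E", "F"],  # D, E, F lie downstream of C
        "E": [], "F": [],
    }

    def descendants(g, node):
        """All variables causally dependent on `node`."""
        out, stack = set(), list(g[node])
        while stack:
            v = stack.pop()
            if v not in out:
                out.add(v)
                stack.extend(g[v])
        return out

    print(descendants(dag, "A"))      # C, D, E and F all depend on A

    # "Preventing C from changing with A and B" removes the edges from
    # A and B into C, so nothing downstream of C responds to A or B:
    dag_do_c = dict(dag, A=[], B=[])
    print(descendants(dag_do_c, "A")) # empty set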

Page 4: Conditional Independence

Page 5: Conditional Independence

Page 6: Conditional Independence (Notation)

Page 7: Causal Structure

Page 8: Causal Structure (Cont'd)

• A causal structure serves as a blueprint for forming a "causal model": a precise specification of how each variable is influenced by its parents in the DAG.
• We assume that Nature is at liberty to impose arbitrary functional relationships between each effect and its causes, and then to perturb these relationships by introducing arbitrary disturbances;
• These disturbances reflect "hidden" or unmeasurable conditions.
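Concretely, in the notation of Pearl's book (the slide states this only in words), each variable is determined by its direct causes together with a disturbance term:

    x_i = f_i(pa_i, u_i),   i = 1, ..., n,

where pa_i denotes the parents of x_i in the DAG and u_i is the arbitrary disturbance perturbing that relationship.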

Page 9: Causal Model

Page 10: Causal Model (Cont'd)

• Once a causal model M is formed, it defines a joint probability distribution P(M) over the variables in the system;
• This distribution reflects features of the causal structure: each variable must be independent of its grandparents, given the values of its parents;
• We may be allowed to inspect a select subset O ⊆ V of "observed" variables and to ask questions about P[o], the probability distribution over the observations;
• We may recover the topology D of the DAG from features of the probability distribution P[o].
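What "defines a joint probability distribution" means in practice is that P(M) factorizes into parent-conditional tables. A sketch for an assumed three-variable chain a → b → c (the numbers are invented for illustration):

    from itertools import product

    P_a = {True: 0.3, False: 0.7}                  # a has no parents
    P_b_given_a = {True: {True: 0.9, False: 0.1},  # P_b_given_a[a][b]
                   False: {True: 0.2, False: 0.8}}
    P_c_given_b = {True: {True: 0.5, False: 0.5},  # P_c_given_b[b][c]
                   False: {True: 0.1, False: 0.9}}

    def joint(a, b, c):
        """P(a, b, c) = P(a) * P(b | a) * P(c | b)."""
        return P_a[a] * P_b_given_a[a][b] * P_c_given_b[b][c]

    # The factorization is a proper distribution, and by construction c
    # (the "grandchild") depends on a only through its parent b.
    total = sum(joint(*v) for v in product([True, False], repeat=3))
    assert abs(total - 1) < 1e-9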

Page 11: Inferred Causation

Page 12: Latent Structure

Page 13: Structure Preference

Page 14: Structure Preference (Cont'd)

• The set of independencies entailed by a causal structure imposes limits on its power to mimic other structures;
• L1 cannot be preferred to L2 if there is even one observable dependency that is permitted by L1 and forbidden by L2;
• L1 is preferred to L2 if L2's set of independencies is a subset of L1's;
• Thus, tests for preference and equivalence can sometimes be reduced to tests of dependencies, which can be determined from the topology of the DAGs without regard to parameters.

Page 15: Minimality

Page 16: Consistency

Page 17: Inferred Causation

Page 18: Examples

Observations over {a, b, c, d} reveal two independencies:
1. a is independent of b;
2. d is independent of {a, b} given c.
Assume further that the data reveal no other independencies.

a = having a cold;
b = having hay fever;
c = having to sneeze;
d = having to wipe one's nose.
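These two independencies are exactly what d-separation reads off the structure a → c ← b with c → d. The sketch below is a generic d-separation test via the ancestral moral graph (not code from the lecture); it verifies both independencies and shows that conditioning on c couples a and b, which is what makes the v-structure detectable:

    def ancestors(parents, nodes):
        seen, stack = set(nodes), list(nodes)
        while stack:
            for p in parents[stack.pop()]:
                if p not in seen:
                    seen.add(p)
                    stack.append(p)
        return seen

    def d_separated(parents, xs, ys, zs=()):
        keep = ancestors(parents, set(xs) | set(ys) | set(zs))
        # Moralize the ancestral subgraph: undirect all edges and
        # "marry" every pair of parents sharing a child.
        adj = {v: set() for v in keep}
        for v in keep:
            ps = [p for p in parents[v] if p in keep]
            for p in ps:
                adj[v].add(p); adj[p].add(v)
            for p in ps:
                for q in ps:
                    if p != q:
                        adj[p].add(q)
        # xs and ys are d-separated by zs iff removing zs disconnects them.
        blocked, seen = set(zs), set()
        stack = [x for x in xs if x not in zs]
        while stack:
            v = stack.pop()
            if v in ys:
                return False
            if v not in seen and v not in blocked:
                seen.add(v)
                stack.extend(adj[v] - seen)
        return True

    parents = {"a": [], "b": [], "c": ["a", "b"], "d": ["c"]}
    assert d_separated(parents, {"a"}, {"b"})              # 1. a ind. of b
    assert d_separated(parents, {"d"}, {"a", "b"}, {"c"})  # 2. d ind. of {a,b} given c
    assert not d_separated(parents, {"a"}, {"b"}, {"c"})   # conditioning on c couples a, b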

Page 19: Example (Cont'd)

{a, b, c, d} reveal two independencies:
1. a is independent of b;
2. d is independent of {a, b} given c.

[Figure: four candidate structures, annotated as follows]
• minimal;
• arbitrary relations between a and b;
• not minimal: fails to impose conditional independence between d and {a, b};
• not consistent with the data: imposes marginal independence between d and {a, b}.

Page 20: Stability

The stability condition states that, as we vary the model's parameters, no independence in P can be destroyed.

In other words, if an independency exists, it will always exist.

Page 21: Stable distribution

• A probability distribution P is a faithful/stable distribution if there exists a directed acyclic graph (DAG) D such that the conditional independence relationships in P are exactly those displayed in D, and vice versa.

Page 22: IC algorithm (Inductive Causation)

• The IC algorithm (Pearl):
– is based on variable dependencies;
– finds all pairs of variables that are dependent on each other (applying standard statistical methods to the database);
– eliminates (as much as possible) indirect dependencies;
– determines the directions of dependencies.

Page 23: Comparing abduction, deduction and induction

• Deduction:
major premise: All balls in the box are black;
minor premise: These balls are from the box;
conclusion: These balls are black.

• Abduction:
rule: All balls in the box are black;
observation: These balls are black;
explanation: These balls are from the box.

• Induction:
case: These balls are from the box;
observation: These balls are black;
hypothesized rule: All balls in the box are black.

Schematically:
Deduction: A => B, A; therefore B.
Abduction: A => B, B; therefore possibly A.
Induction: whenever A then B (but not vice versa); therefore possibly A => B.

Induction goes from specific cases to general rules; abduction and deduction both go from part of a specific case to another part of the case, using general rules (in different ways).

Source: http://www.csee.umbc.edu/~ypeng/F02671/lecture-notes/Ch15.ppt

Page 24: IC Algorithm (Cont'd)

• Input: P, a stable distribution on a set V of variables;
• Output: a pattern H(P) compatible with P.

A pattern is a partially directed DAG:
• some edges are directed, and
• some edges are undirected.

Page 25: IC Algorithm: Step 1

• For each pair of variables a and b in V, search for a set Sab such that (a ╨ b | Sab) holds in P; in other words, a and b should be independent in P, conditioned on Sab.
• Construct an undirected graph G such that vertices a and b are connected with an edge if and only if no set Sab can be found.

[Figure: if some Sab gives a ╨ b | Sab, a and b stay unconnected; if no Sab exists, connect a — b]
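A compact sketch of Step 1 (the conditional-independence oracle `indep` is assumed; in practice it would be a statistical test, e.g. a chi-squared or partial-correlation test, run against the data):

    from itertools import combinations

    def skeleton(variables, indep):
        edges, sepset = set(), {}
        for a, b in combinations(variables, 2):
            others = [v for v in variables if v not in (a, b)]
            found = None
            for size in range(len(others) + 1):    # smallest sets first
                for s in combinations(others, size):
                    if indep(a, b, set(s)):        # (a ╨ b | Sab) holds in P
                        found = set(s)
                        break
                if found is not None:
                    break
            if found is None:
                edges.add(frozenset((a, b)))       # no Sab exists: connect a and b
            else:
                sepset[frozenset((a, b))] = found  # remember Sab for Step 2
        return edges, sepset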

Page 26: IC Algorithm: Step 2

• For each pair of nonadjacent variables a and b with a common neighbor c, check whether c ∈ Sab.
• If it is, then continue;
• else add arrowheads at c, i.e., a → c ← b.

[Figure: if c ∈ Sab, the chain a — c — b stays unoriented; if c ∉ Sab, it becomes a → c ← b]
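Continuing the sketch, Step 2 orients a v-structure at every common neighbor that was left out of the separating set (`edges` and `sepset` are the outputs of the Step 1 sketch above):

    def orient_v_structures(edges, sepset, variables):
        directed = set()                    # (x, y) stands for x -> y
        for c in variables:
            nbrs = [v for v in variables if frozenset((v, c)) in edges]
            for i, a in enumerate(nbrs):
                for b in nbrs[i + 1:]:
                    if frozenset((a, b)) in edges:
                        continue            # a and b must be nonadjacent
                    if c not in sepset[frozenset((a, b))]:
                        directed.add((a, c))   # a -> c <- b
                        directed.add((b, c))
        return directed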

Page 27: Example

[Figure: the skeleton Rain — Mud — Other causes of mud (Rain and Other causes of mud nonadjacent) is oriented as the v-structure Rain → Mud ← Other causes of mud]

Page 28: IC Algorithm: Step 3

• In the partially directed graph that results, orient as many of the undirected edges as possible, subject to two conditions:
– the orientation should not create a new v-structure;
– the orientation should not create a directed cycle.

Page 29: Rules required to obtain a maximally oriented pattern

• R1: Orient b — c into b → c whenever there is an arrow a → b such that a and c are nonadjacent;

[Figure: a → b — c becomes a → b → c]

Page 30: Rules required to obtain a maximally oriented pattern

• R2: Orient a — b into a → b whenever there is a chain a → c → b;

[Figure: a — b with chain a → c → b becomes a → b]

Page 31: Rules required to obtain a maximally oriented pattern

• R3: Orient a — b into a → b whenever there are two chains a — c → b and a — d → b such that c and d are nonadjacent;

[Figure: a — b with chains a — c → b and a — d → b becomes a → b]

Page 32: Rules required to obtain a maximally oriented pattern

• R4: Orient a — b into a → b whenever there are two chains a — c → d and c → d → b such that c and b are nonadjacent;

[Figure: a — b with chains a — c → d and c → d → b becomes a → b]
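The four rules can be applied mechanically until nothing changes. A sketch, continuing the earlier representation (`directed` holds (x, y) pairs for x → y; `undirected` holds frozenset edges); together the rules implement Step 3's "no new v-structure, no directed cycle" conditions:

    def adjacent(a, b, directed, undirected):
        return (frozenset((a, b)) in undirected
                or (a, b) in directed or (b, a) in directed)

    def rule_applies(x, y, directed, undirected, variables):
        """True if some rule R1-R4 licenses orienting x - y as x -> y."""
        others = [v for v in variables if v not in (x, y)]
        und = lambda p, q: frozenset((p, q)) in undirected
        # R1: an arrow c -> x exists with c and y nonadjacent.
        if any((c, x) in directed and not adjacent(c, y, directed, undirected)
               for c in others):
            return True
        # R2: a directed chain x -> c -> y already exists.
        if any((x, c) in directed and (c, y) in directed for c in others):
            return True
        for c in others:
            for d in others:
                if c == d:
                    continue
                # R3: chains x - c -> y and x - d -> y, c and d nonadjacent.
                if (und(x, c) and (c, y) in directed and und(x, d)
                        and (d, y) in directed
                        and not adjacent(c, d, directed, undirected)):
                    return True
                # R4: chains x - c -> d and c -> d -> y, c and y nonadjacent.
                if (und(x, c) and (c, d) in directed and (d, y) in directed
                        and not adjacent(c, y, directed, undirected)):
                    return True
        return False

    def maximally_orient(directed, undirected, variables):
        changed = True
        while changed:
            changed = False
            for e in list(undirected):
                a, b = tuple(e)
                for x, y in ((a, b), (b, a)):
                    if rule_applies(x, y, directed, undirected, variables):
                        undirected.discard(e)
                        directed.add((x, y))
                        changed = True
                        break
        return directed, undirected

    # R1 in action: a -> b plus b - c (a, c nonadjacent) yields b -> c.
    d, u = maximally_orient({("a", "b")}, {frozenset(("b", "c"))},
                            ["a", "b", "c"])
    print(d)   # {('a', 'b'), ('b', 'c')}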

Page 33: IC* Algorithm

• Input: P, a sampled distribution;
• Output: core(P), a marked pattern.

Page 34: Marked Pattern: Four types of edges

(In Pearl's presentation, the output pattern uses four types of edges: a marked arrow a →* b, signifying a directed path from a to b in the underlying model; an unmarked arrow a → b, signifying either a directed path from a to b or a latent common cause; a bidirected edge a ↔ b, signifying a latent common cause; and an undirected edge a — b, standing for any of the preceding three cases.)

Page 35: IC* Algorithm: Step 1

For each pair of variables a and b, search for a set Sab such that a and b are independent in P, conditioned on Sab. If there is no such Sab, place an undirected link between the two variables: a — b.

Page 36: IC* Algorithm: Step 2

• For each pair of nonadjacent variables a and b with a common neighbor c, check whether c ∈ Sab:
– if it is, then continue;
– if it is not, then add arrowheads pointing at c (i.e., a → c ← b).
• In the partially directed graph that results, add (recursively) as many arrowheads as possible, and mark as many edges as possible, according to the following two rules:

Page 37: IC* Algorithm: Rule 1

• R1: For each pair of nonadjacent nodes a and b with a common neighbor c: if the link between a and c has an arrowhead into c, and the link between c and b has no arrowhead into c, then add an arrowhead on the link between c and b pointing at b, and mark that link to obtain c →* b;

[Figure: a → c — b becomes a → c →* b]

Page 38: IC* Algorithm: Rule 2

• R2: If a and b are adjacent and there is a directed path (composed strictly of marked links) from a to b, then add an arrowhead pointing toward b on the link between a and b;