
Use graphs and not pure logic

• Variables are represented by nodes and dependencies by edges.

• Common in our language: “threads of thoughts”, “lines of reasoning”, “connected ideas”, “far-fetched arguments”.

• Still, capturing the essence of dependence is not an easy task. When modeling causation, association, and relevance, it is hard to distinguish between direct and indirect neighbors.

• If we just connect “dependent variables” we will get cliques.


Undirected Graphs can represent Independence

Let G = (V, E) be an undirected graph.

For disjoint sets of nodes X, Y, and Z, define I_G(X, Z, Y) to hold if and only if every path between a node in X and a node in Y passes through a node in Z.

The textbook writes the same statement as <X | Z | Y>_G.
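The separation test can be sketched as a breadth-first search that starts in X and is never allowed to enter Z; if it reaches Y, some path avoids Z. This is a minimal illustration, not code from the course: the function name `is_separated` and the four-node cycle used as input are assumptions made for the example.

```python
from collections import deque

def is_separated(edges, xs, zs, ys):
    """Return True if every path from a node in xs to a node in ys passes through zs."""
    xs, zs, ys = set(xs), set(zs), set(ys)
    # Build adjacency sets for the undirected graph.
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    # Breadth-first search from X that never enters Z.
    frontier = deque(xs - zs)
    visited = set(frontier)
    while frontier:
        node = frontier.popleft()
        if node in ys:
            return False  # reached Y without crossing Z
        for neighbor in adjacency.get(node, ()):
            if neighbor not in zs and neighbor not in visited:
                visited.add(neighbor)
                frontier.append(neighbor)
    return True

# Hypothetical example graph: the four-node cycle A - B - C - D - A.
cycle = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]
print(is_separated(cycle, {"A"}, {"B", "D"}, {"C"}))  # True
print(is_separated(cycle, {"A"}, {"B"}, {"C"}))       # False
```

In the second call the path A - D - C avoids the separating set {B}, so the statement does not hold.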


M = { I_G(M1, {F1, F2}, M2), I_G(F1, {M1, M2}, F2), plus the statements obtained by symmetry }


Dependency Models – abstraction of Probability distributions


Recall that composition and contraction are implied.


(namely, Composition holds)


The set of all independence statements defined by (3.11) is called the pairwise basis of G.

These are the independence statements that define the graph.


Edge minimal and unique.


The set of all independence statements defined by (3.12) is called the neighboring basis of G.


Testing I-mapness

Proof: (2) ⇒ (1) ⇒ (3) ⇒ (2).

(2) ⇒ (1): Holds because G is an I-map of G0, which is an I-map of P.

(1) ⇒ (3): True due to I-mapness of G (by definition).

(3) ⇒ (2):


Insufficiency of Local tests for non strictly positive probability distributions

Consider the case X=Y=Z=W. What is a Markov network for it? Is it unique? The Intersection property is critical!
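A small numeric sketch of the X=Y=Z=W case, assuming the shared value is a fair coin (the slide does not specify the distribution). It runs the pairwise test "connect a and b only if they are dependent given all other variables" and then checks marginal independence of X and Y. The helper names `marginal` and `independent` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, product

NAMES = ("X", "Y", "Z", "W")
# Assumed distribution: a fair coin copied into all four variables.
P = {(v, v, v, v): Fraction(1, 2) for v in (0, 1)}

def marginal(assignment):
    """Probability that the variables in `assignment` (index -> value) take those values."""
    return sum(p for outcome, p in P.items()
               if all(outcome[i] == val for i, val in assignment.items()))

def independent(a, b, given):
    """Check I(a, given, b): P(a,b,z) P(z) == P(a,z) P(b,z) for all values."""
    for values in product((0, 1), repeat=2 + len(given)):
        va, vb, vz = values[0], values[1], values[2:]
        z = dict(zip(given, vz))
        lhs = marginal({a: va, b: vb, **z}) * marginal(z)
        rhs = marginal({a: va, **z}) * marginal({b: vb, **z})
        if lhs != rhs:
            return False
    return True

# Pairwise test: connect a and b only if they are dependent given all other variables.
edges = []
for a, b in combinations(range(4), 2):
    rest = [i for i in range(4) if i not in (a, b)]
    if not independent(a, b, rest):
        edges.append((NAMES[a], NAMES[b]))

print(edges)                  # [] : every pairwise test reports independence
print(independent(0, 1, []))  # False: yet X and Y are marginally dependent
```

Under this assumption the pairwise tests produce the empty graph, which claims X and Y are independent even though they are not. The distribution is not strictly positive, so Intersection is not guaranteed and the local tests alone do not yield an I-map.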


Markov Networks that represent probability distributions (rather than just independence)


The two males and females example


Theorem 6 does not guarantee that every dependency of P will be represented by G. However, one can show the following claim:

Theorem X: Every undirected graph G has a distribution P such that G is a perfect map of P.

(In light of previous notes, it must have the form of a product over cliques).


Proof Sketch

Given a graph G, it is sufficient to show that for an independence statement σ = I(α, Z, β) that does NOT hold in G, there exists a probability distribution that satisfies all independence statements that hold in the graph and does not satisfy σ = I(α, Z, β).

Well, simply pick a path in G between α and β that does not contain a node from Z. Define a probability distribution that is a perfect map of the chain and multiply it by any marginal probabilities on all other nodes, forming P_σ. Now “multiply” all the P_σ (Armstrong relation) to obtain P.

Interesting task (replacing HMW #4): Given an undirected graph over binary variables, construct a probability distribution for which the graph is a perfect map. (Note: most Markov random fields are perfect maps!)


The set of all independence statements defined by (3.12) was called the neighboring basis of G.

Interesting conclusion of Theorem X: all independence statements that follow from the neighboring basis for strictly positive probability distributions are derivable via symmetry, decomposition, intersection, and weak union. These axioms are sound and complete for neighboring bases!

Same conclusion with pairwise bases. In fact for saturated statements independence and separation have the same characterization. See paper P2 in the recitation class.

Recall:


Drawback: Interpreting the Links is not simple

Another drawback is the difficulty with extreme probabilities.

Both drawbacks disappear in the class of decomposable models, which are a special case of Bayesian networks


Decomposable Models

Example: Markov Chains and Markov Trees

Assume the following chain is an I-map of some P(x1,x2,x3,x4) and was constructed using the methods we just described.

The “compatibility functions” on all links can be easily interpreted in the case of chains.

Same also for trees. This idea actually works for all chordal graphs.
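For a chain x1 - x2 - x3 - x4, the standard decomposable form is P(x1,x2,x3,x4) = P(x1,x2) P(x2,x3) P(x3,x4) / (P(x2) P(x3)): link marginals divided by the marginals of the shared nodes. The slide's own formula did not survive extraction, so taking this as the intended reading is an assumption. The sketch below checks the identity numerically on a hypothetical binary Markov chain; the numbers and helper names are made up for illustration.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical binary Markov chain x1 -> x2 -> x3 -> x4 (numbers chosen for illustration).
initial = {0: F(1, 3), 1: F(2, 3)}
transition = {0: {0: F(3, 4), 1: F(1, 4)},   # transition[prev][next]
              1: {0: F(2, 5), 1: F(3, 5)}}

# Build the full joint distribution over (x1, x2, x3, x4).
joint = {}
for x in product((0, 1), repeat=4):
    p = initial[x[0]]
    for prev, nxt in zip(x, x[1:]):
        p *= transition[prev][nxt]
    joint[x] = p

def marginal(indices, values):
    """Marginal probability that the variables at `indices` equal `values`."""
    return sum(p for x, p in joint.items()
               if all(x[i] == v for i, v in zip(indices, values)))

def factorization_holds():
    """Check P(x1..x4) == P(x1,x2) P(x2,x3) P(x3,x4) / (P(x2) P(x3)) everywhere."""
    for x in joint:
        numerator = (marginal((0, 1), x[0:2]) * marginal((1, 2), x[1:3])
                     * marginal((2, 3), x[2:4]))
        denominator = marginal((1,), x[1:2]) * marginal((2,), x[2:3])
        if joint[x] != numerator / denominator:
            return False
    return True

print(factorization_holds())  # True
```

Exact fractions are used so the comparison is an equality test rather than a floating-point tolerance check.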


Chordal Graphs


Interpretation of the links

[Figure labels: Clique 1, Clique 2, Clique 3]

A probability distribution that can be written as a product of low order marginals divided by a product of low order marginals is said to be decomposable.


Importance of Decomposability

When assigning compatibility functions it suffices to use marginal probabilities on cliques and just make sure to be locally consistent. Marginals can be assessed from experts or estimated directly from data.


The Diamond Example – The smallest non-chordal graph

Adding one more link makes the graph chordal.

Turning a general undirected graph into a chordal graph in some optimal way is the key for all exact computations done on Markov and Bayesian networks.


Chordal Graphs


Example of the Theorem

1. Each cycle of length four or more has a chord.

2. There is a way to direct the edges legally, namely: A→B, A→C, B→C, B→D, C→D, C→E.

3. Legal removal order (e.g.): start with E, then D, then the rest.

4. The maximal cliques form a join (clique) tree.
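The removal-order condition can be checked mechanically: a removal order is legal when each node, at the moment it is removed, has remaining neighbors that are all linked to each other. The sketch below applies this to the example's edge list and uses a greedy version of the same test as a chordality check. The function names, the choice of A, B, C for "the rest", and the node labels of the diamond are assumptions made for illustration.

```python
def build_adjacency(edges):
    """Adjacency sets of an undirected graph given as a list of node pairs."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency

def forms_clique(nodes, adjacency):
    """True if every pair of distinct nodes in `nodes` is linked."""
    return all((nodes - {u}) <= adjacency[u] for u in nodes)

def is_legal_removal_order(edges, order):
    """True if each removed node's remaining neighbors form a clique."""
    adjacency = build_adjacency(edges)
    remaining = set(adjacency)
    for node in order:
        if not forms_clique(adjacency[node] & remaining, adjacency):
            return False
        remaining.discard(node)
    return True

def is_chordal(edges):
    """Greedy test: keep removing any node whose remaining neighbors form a clique."""
    adjacency = build_adjacency(edges)
    remaining = set(adjacency)
    while remaining:
        removable = next((n for n in sorted(remaining)
                          if forms_clique(adjacency[n] & remaining, adjacency)), None)
        if removable is None:
            return False  # no removable node left, so some long cycle has no chord
        remaining.remove(removable)
    return True

# Edge list from the example above.
example = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("C", "D"), ("C", "E")]
# Diamond with hypothetical labels: the four-cycle A - B - D - C - A.
diamond = [("A", "B"), ("B", "D"), ("D", "C"), ("C", "A")]

print(is_legal_removal_order(example, ["E", "D", "A", "B", "C"]))  # True
print(is_chordal(example))                                         # True
print(is_chordal(diamond))                                         # False
print(is_chordal(diamond + [("B", "C")]))                          # True
```

The last two calls mirror the diamond example: the plain four-cycle fails the test, and adding a single link across it makes the graph chordal.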