1 Use graphs and not pure logic Variables represented by nodes and dependencies by edges. Common in our language: “threads of thoughts”, “lines of reasoning”,

1

Use graphs and not pure logic

•Variables represented by nodes and dependencies by edges.

• Common in our language: “threads of thoughts”, “lines of reasoning”, “connected ideas”, “far-fetched arguments”.

•Still, capturing the essence of dependence is not an easy task. When modeling causation, association, and relevance, it is hard to distinguish between direct and indirect neighbors.

•If we just connect “dependent variables” we will get cliques.

2

Undirected Graphs can represent Independence

Let G be an undirected graph (V,E).

Define IG(X,Z,Y) for disjoint sets of nodes X,Y, and Z if and only if all paths between a node in X and a node in Y pass via a node in Z.

In the text book another notation used is <X|Z|Y>G.

3

M = { IG(M1,{F1,F2},M2), IG(F1,{M1,M2},F2) + symmetry }

4

Dependency Models – abstraction of Probability distributions

5

6

7Recall that composition and contraction are implied.

8

(namely, Composition holds)

9

10

11

The set of all independence statements defined by (3.11) is called the pairwise basis of G.

These are the independence statements that define the graph.

12

13

Edge minimal and unique.

14

The set of all independence statements defined by (3.12) is called the neighboring basis of G.

15

16

17

Testing I-mapness

Proof: (2) (1)(3) (2).(2) (1): Holds because G is an I-map of G0 is an I map of P.(1)(3): True due to I-mapness of G (by definition).(3) (2):

18

Insufficiency of Local tests for non strictly positive probability

distributionsConsider the case X=Y=Z=W. What is a Markov network for it ? Is it unique ? The Intersection property is critical !

19

Markov Networks that represents probability distributions (rather than just independence)

20

The two males and females example

21

22

Theorem 6 does not guarantee that every dependency of P will be represented by G. However one can show the following claim:

Theorem X: Every undirected graph G has a distribution P such that G is a perfect map of P.

(In light of previous notes, it must have the form of a product over cliques).

23

Proof Sketch

Given a graph G, it is sufficient to show that for an independence statement = I(,Z,) that does NOT hold in G, there exists a probability distribution that satisfies all independence statements that hold in the graph and does not satisfy = I(,Z,).

Well, simply pick a path in G between and that does not contain a node from Z. Define a probability distribution that is a perfect map of the chain and multiply it by any marginal probabilities on all other nodes forming P . Now “multiply” all P (Armstrong relation) to obtain P.

Interesting task (Replacing HMW #4): Given an undirected graph over binary variables construct a perfect map probability distribution. (Note: most Markov random fields are perfect maps !).

24

The set of all independence statements defined by (3.12) was called the neighboring basis of G.

Interesting conclusion of Theorem X. All independence statements that follow for strictly-positive probability from the neighborhood basis are derivable via symmetry, decomposition, intersection, and weak union. These axioms are sound and complete for neighborhood bases !

Same conclusion with pairwise bases. In fact for saturated statements independence and separation have the same characterization. See paper P2 in the recitation class.

Recall:

25

Drawback: Interpreting the Links is not simple

Another drawback is the difficulty with extreme probabilities.

Both drawbacks disappear in the class of decomposable models, which are a special case of Bayesian networks

26

Decomposable Models

Example: Markov Chains and Markov Trees

Assume the following chain is an I-map of some P(x1,x2,x3,x4) and was constructed using the methods we just described.

The “compatibility functions” on all links can be easily interpreted in the case of chains.

Same also for trees. This idea actually works for all chordal graphs.

27

Chordal Graphs

28

Interpretation of the links

Clique 1 Clique 2

Clique 3

A probability distribution that can be written as a productof low order marginals divided by a product of low order marginals is said to be decomposable.

29

Importance of Decomposability

When assigning compatibility functions it suffices to use marginal probabilities on cliques and just make sure to be locally consistent. Marginals can be assessed from experts or estimated directly from data.

30

The Diamond Example – The smallest non chordal graph

Adding one more link will turn the graph to become chordal.

Turning a general undirected graph into a chordal graph in some optimal way is the key for all exact computations done on Markov and Bayesian networks.

31

Chordal Graphs

32

Example of the Theorem

1. Each cycle has a chord.2. There is a way to direct edges legally, namely,

AB , AC , BC , BD , CD, CE3. Legal removal order (eg): start with E, than D,

than the rest. 4. The maximal cliques form a join (clique) tree.

Documents

1 Use graphs and not pure logic Variables represented by nodes and dependencies by edges. Common in our language: “threads of thoughts”, “lines of reasoning”,