
Page 1: Lecture 22: Inference in Graphical Models

Machine Learning

CUNY Graduate Center

Lecture 22: Inference in Graphical Models

Page 2: Lecture 22: Inference in Graphical Models

Today

• Graphical Models
– Representing conditional dependence graphically
– Inference
– Junction Tree Algorithm

Page 3: Lecture 22: Inference in Graphical Models

Undirected Graphical Models

• In an undirected graphical model, there is no trigger/response relationship.

• Represent slightly different conditional independence relationships

• Conditional independence determined by graph separability.

[Figure: an undirected graph over nodes A, B, C, D]
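Graph separation can be checked mechanically. Below is a minimal Python sketch, assuming a hypothetical star-shaped graph in which C separates A from B (the edges in the figure above are not recoverable, so this layout is an assumption):

from collections import deque

def separated(adj, xs, ys, zs):
    """True iff every path from xs to ys passes through zs (undirected graph)."""
    blocked = set(zs)
    frontier = deque(x for x in xs if x not in blocked)
    visited = set(frontier)
    while frontier:
        node = frontier.popleft()
        if node in ys:
            return False  # found a path that avoids the conditioning set
        for nbr in adj[node]:
            if nbr not in visited and nbr not in blocked:
                visited.add(nbr)
                frontier.append(nbr)
    return True

# Hypothetical edges: C is connected to A, B, and D.
adj = {"A": {"C"}, "B": {"C"}, "C": {"A", "B", "D"}, "D": {"C"}}
print(separated(adj, {"A"}, {"B"}, {"C"}))  # True: A independent of B given C
print(separated(adj, {"A"}, {"B"}, set()))  # False: the path A-C-B is open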

Page 4: Lecture 22: Inference in Graphical Models

Undirected Graphical Models

• Different relationships can be described with directed and undirected graphs.

• Cannot represent:


Page 5: Lecture 22: Inference in Graphical Models

Undirected Graphical Models

• Different relationships can be described with directed and undirected graphs.


Page 6: Lecture 22: Inference in Graphical Models

Probabilities in Undirected Graphs

• Clique: a set of nodes such that there is an edge between every pair of nodes that are members of the set

• We will define the joint probability as a relationship between functions defined over the cliques of the graphical model.

Page 7: Lecture 22: Inference in Graphical Models

Probabilities in Undirected Graphs

• Potential Functions: positive functions over groups of connected variables (represented by maximal cliques of graphical model nodes)
– Maximal clique: a clique A that is not a proper subset of any other clique B.

The joint probability is the normalized product of potential functions over the maximal cliques:

p(x) = (1/Z) ∏_C ψ_C(x_C)

The normalization constant Z = Σ_x ∏_C ψ_C(x_C) guarantees a sum of 1.
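A minimal numpy sketch of this factorization, assuming a hypothetical chain A – B – C with maximal cliques {A,B} and {B,C} and arbitrary positive potential tables:

import numpy as np

# Hypothetical positive potentials over binary variables, indexed [a, b] and [b, c].
psi_ab = np.array([[4.0, 1.0],
                   [1.2, 1.5]])
psi_bc = np.array([[2.0, 0.5],
                   [0.5, 2.0]])

# Unnormalized joint: the product of the clique potentials.
joint = psi_ab[:, :, None] * psi_bc[None, :, :]   # shape (A, B, C)

# Z guarantees the normalized table sums to 1.
Z = joint.sum()
p = joint / Z
print(p.sum())   # 1.0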

Page 8: Lecture 22: Inference in Graphical Models

Logical Inference

• In logical inference, nodes are binary, edges represent gates.
– AND, OR, XOR, NAND, NOR, NOT, etc.

• Inference: given observed variables, predict others

• Problems: uncertainty, conflicts, inconsistency

[Figure: a logic network built from AND, NOT, and XOR gates]

Page 9: Lecture 22: Inference in Graphical Models

Probabilistic Inference

• Rather than a logic network, use a Bayesian Network.
• Probabilistic Inference: given observed variables, calculate marginals over others.
• Logic networks are generalized by Bayesian Networks.

[Figure: the same logic network, with a deterministic NOT gate given as a CPT]

          B=TRUE   B=FALSE
A=TRUE      0         1
A=FALSE     1         0

Page 10: Lecture 22: Inference in Graphical Models

Probabilistic Inference

• Rather than a logic network, use a Bayesian Network.
• Probabilistic Inference: given observed variables, calculate marginals over others.
• Logic networks are generalized by Bayesian Networks.

[Figure: the same logic network, with a noisy "NOT-ish" gate given as a CPT]

          B=TRUE   B=FALSE
A=TRUE     0.1       0.9
A=FALSE    0.9       0.1

Page 11: Lecture 22: Inference in Graphical Models

Inference in Graphical Models

• General Problem: Given a graphical model, for any subsets of observed variables E and query variables F, find p(x_F | x_E).

• The direct approach can be quite inefficient if there are many irrelevant variables.

Page 12: Lecture 22: Inference in Graphical Models

Marginal Computation

• Graphical models provide efficient storage by decomposing p(x) into conditional probabilities, and they admit a simple MLE result.

• Now we look for efficient calculation of marginals, which will lead to efficient inference.

Page 13: Lecture 22: Inference in Graphical Models

Brute Force Marginal Calculation

• First approach: we have the CPTs and the graphical model, so we can compute arbitrary joints.
– Assume 6 variables; a marginal then requires summing the joint over all the others, e.g. p(x1) = Σ_{x2,...,x6} p(x1, ..., x6).
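A minimal numpy sketch of this brute-force approach, using a hypothetical random joint over 6 binary variables:

import numpy as np

# Hypothetical joint p(x1, ..., x6) stored as a full 2^6-entry table.
rng = np.random.default_rng(0)
joint = rng.random((2,) * 6)
joint /= joint.sum()                     # normalize into a distribution

# Marginal p(x1): sum out x2..x6.
p_x1 = joint.sum(axis=(1, 2, 3, 4, 5))
print(p_x1, p_x1.sum())                  # two entries summing to 1

# The cost is exponential in the number of variables, which is what
# motivates the message-passing approach on the following slides.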

Page 14: Lecture 22: Inference in Graphical Models

Computation of Marginals

• Pass messages (small tables) around the graph.

• The messages are small functions that propagate potentials around an undirected graphical model.

• The inference technique is the Junction Tree Algorithm.

Page 15: Lecture 22: Inference in Graphical Models

Junction Tree Algorithm

• Efficient Message Passing for Undirected Graphs.
– For Directed Graphs, first convert to undirected.

• Goal: Efficient Inference in Graphical Models

Page 16: Lecture 22: Inference in Graphical Models

Junction Tree Algorithm

• Moralization

• Introduce Evidence

• Triangulate

• Construct Junction Tree

• Propagate Probabilities


Page 17: Lecture 22: Inference in Graphical Models

Moralization

• Converts a directed graph to an undirected graph.
• Moralization “marries” the parents.
– Insert an undirected edge between every pair of nodes that have a child in common.
– Replace all directed edges with undirected edges.
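A minimal Python sketch of moralization, representing the DAG as a hypothetical {child: parents} dictionary:

def moralize(parents):
    """Moralize a DAG given as {node: set of parents}; returns an undirected adjacency dict."""
    nodes = set(parents) | {p for ps in parents.values() for p in ps}
    adj = {n: set() for n in nodes}
    for child, ps in parents.items():
        for p in ps:
            # Replace each directed edge parent -> child with an undirected edge.
            adj[p].add(child)
            adj[child].add(p)
        # "Marry" the parents: connect every pair that shares this child.
        for p in ps:
            for q in ps:
                if p != q:
                    adj[p].add(q)
                    adj[q].add(p)
    return adj

# The v-structure A -> C <- B: moralization adds the undirected edge A - B.
print(moralize({"C": {"A", "B"}}))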

Page 18: Lecture 22: Inference in Graphical Models

Moralization Examples

[Figure: moralization example]

Page 19: Lecture 22: Inference in Graphical Models

Moralization Examples

[Figure: moralization example]

Page 20: Lecture 22: Inference in Graphical Models

Junction Tree Algorithm

• Moralization

• Introduce Evidence

• Triangulate

• Construct Junction Tree

• Propagate Probabilities


Page 21: Lecture 22: Inference in Graphical Models

Introduce Evidence

• Given a moral graph, identify the observed variables.

• Reduce the probability functions, since we know some variables are fixed.

• Only keep probability functions over the remaining nodes.

Page 22: Lecture 22: Inference in Graphical Models

Slices

• Differentiate potential functions from slices.
• Potential Functions are related to joint probabilities over groups of nodes, but aren't necessarily correctly normalized, and can even be initialized to conditionals.
• A slice of a potential function is a row or column of the underlying table (in the discrete case) or an unnormalized marginal (in the continuous case).

Example potential table over two binary variables:

    .4    .1
    .12   .15
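A minimal numpy sketch of slicing, assuming the table above is a potential with rows indexed by A and columns indexed by B:

import numpy as np

psi_ab = np.array([[0.40, 0.10],
                   [0.12, 0.15]])

# Observing B = 0 reduces the table to one column: an unnormalized function of A.
slice_b0 = psi_ab[:, 0]
print(slice_b0)                        # [0.4  0.12]

# Normalizing the slice gives the corresponding conditional over A.
print(slice_b0 / slice_b0.sum())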

Page 23: Lecture 22: Inference in Graphical Models

Separation from Introducing Evidence

• Observing nodes separates conditionally independent sets of variables.

• Normalization calculation: don't bother until the end, when we want to determine an individual marginal.

Page 24: Lecture 22: Inference in Graphical Models

Junction Trees

• Construction of junction trees.
– Each node represents a clique of variables.
– Edges connect cliques.
– There is a unique path from node to root.
– Between each pair of clique nodes is a separator node.
– Separators contain intersections of variables.

Page 25: Lecture 22: Inference in Graphical Models

Triangulation

• Constructing a junction tree.
• Need to guarantee that a Junction Graph, made up of cliques and separators of an undirected graph, is a Tree.
– Eliminate any chordless cycles of four or more nodes.

[Figure: a graph over nodes A–E with a chordless four-node cycle, its triangulated version, and the corresponding junction graph (clique nodes include ABD, CE, BC, DE)]
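One standard way to eliminate chordless cycles is elimination-based triangulation: remove nodes one at a time and connect all of each eliminated node's remaining neighbors. A minimal Python sketch, using a hypothetical chordless four-cycle:

def triangulate(adj, order):
    """Eliminate nodes in the given order, adding fill-in edges among each
    eliminated node's remaining neighbors; returns the new graph and fill edges."""
    adj = {n: set(nbrs) for n, nbrs in adj.items()}   # work on a copy
    remaining = set(adj)
    fill = []
    for node in order:
        nbrs = adj[node] & remaining
        for p in nbrs:
            for q in nbrs:
                if p != q and q not in adj[p]:
                    adj[p].add(q)
                    adj[q].add(p)
                    fill.append((p, q))
        remaining.discard(node)
    return adj, fill

# Chordless cycle A - B - C - D - A: eliminating A first adds the chord B - D.
cycle = {"A": {"B", "D"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C", "A"}}
_, fill = triangulate(cycle, ["A", "B", "C", "D"])
print(fill)   # one chord between B and D

The choice of elimination order determines which chords are added, and hence the largest clique size; that is the optimization discussed on the next slide.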

Page 26: Lecture 22: Inference in Graphical Models

Junction Tree Algorithm

• Moralization

• Introduce Evidence

• Triangulate

• Construct Junction Tree

• Propagate Probabilities


Page 27: Lecture 22: Inference in Graphical Models

Triangulation

• When eliminating cycles there may be many choices about which edge to add.

• Want to keep the largest clique size small – small potential functions

• Triangulation that minimizes the largest clique size is NP-complete.

• Suboptimal triangulation is acceptable (poly-time) and doesn’t introduce many extra dimensions.



Page 29: Lecture 22: Inference in Graphical Models

Triangulation Examples

[Figure: triangulation examples on a graph over nodes A–F]

Page 30: Lecture 22: Inference in Graphical Models

Junction Tree Algorithm

• Moralization

• Introduce Evidence

• Triangulate

• Construct Junction Tree

• Propagate Probabilities


Page 31: Lecture 22: Inference in Graphical Models

Constructing Junction Trees

• Junction trees must satisfy the Running Intersection Property:
– Every clique node on the path between clique nodes V and W must include all nodes in V ∩ W.
• Junction trees will have maximal separator cardinality.

[Figure: a triangulated graph over A–E and two candidate junction trees over cliques ABD, BCD, CDE — one with separators BD, CD and one with separators D, CD]

Page 32: Lecture 22: Inference in Graphical Models

Forming a Junction Tree

• Given a set of cliques, connect the nodes s.t. the Running Intersection Property holds.
– Maximize the cardinality of the separators.

• Maximum Spanning Tree (Kruskal's algorithm)
– Initialize a tree with no edges.
– Calculate the size of separators between all pairs: O(N²).
– Connect two cliques with the largest separator cardinality without creating a loop.
– Repeat until all nodes are connected.
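A minimal Python sketch of this construction, using a union-find forest to avoid loops; the cliques are the ones from the figure above:

def build_junction_tree(cliques):
    """Kruskal-style maximum spanning tree over cliques, weighting each
    candidate edge by separator cardinality."""
    n = len(cliques)
    # All candidate edges, scored by separator size: O(N^2) pairs.
    edges = sorted(
        ((len(cliques[i] & cliques[j]), i, j)
         for i in range(n) for j in range(i + 1, n)),
        reverse=True)
    parent = list(range(n))          # union-find forest

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path compression
            x = parent[x]
        return x

    tree = []
    for _, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:                 # largest separator that doesn't create a loop
            parent[ri] = rj
            tree.append((cliques[i], cliques[i] & cliques[j], cliques[j]))
    return tree

cliques = [frozenset("ABD"), frozenset("BCD"), frozenset("CDE")]
for v, sep, w in build_junction_tree(cliques):
    print(sorted(v), "--", sorted(sep), "--", sorted(w))
# Picks separators BD and CD, satisfying the running intersection property.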

Page 33: Lecture 22: Inference in Graphical Models

Junction Tree Algorithm

• Moralization

• Introduce Evidence

• Triangulate

• Construct Junction Tree

• Propagate Probabilities


Page 34: Lecture 22: Inference in Graphical Models

Propagating Probabilities

• We have a valid junction tree.
– What can we do with it?

• Probabilities in Junction Trees:
– De-absorb smaller cliques from maximal cliques.
– Doesn't change anything, but is a less compact description.

Page 35: Lecture 22: Inference in Graphical Models

Conversion from Directed Graph

• Example conversion.

• Represent CPTs as potential and separator functions (with a normalizer)


[Figure: a directed chain X1 → X2 → X3 → X4, its moralized chain, and the junction tree with cliques (X1,X2), (X2,X3), (X3,X4) and separators X2, X3]

Page 36: Lecture 22: Inference in Graphical Models

Junction Tree Algorithm

• Goal: Make marginals consistent.
• The Junction Tree Algorithm sends messages between cliques and separators until this consistency is reached.

Page 37: Lecture 22: Inference in Graphical Models

Message passing

• Send a message from a clique to a separator.
• The message is what the clique thinks the marginal should be.
• Normalize the clique by each message from the separators s.t. agreement is reached.

[Figure: cliques (A,B) and (B,C) connected by separator B]

If they agree, finished. Otherwise, iterate.

Page 38: Lecture 22: Inference in Graphical Models

Message Passing

[Figure: cliques (A,B) and (B,C) connected by separator B]

For this pair, one round of updates is:

ψ*_B = Σ_A ψ_AB
ψ*_BC = (ψ*_B / ψ_B) ψ_BC
ψ**_B = Σ_C ψ*_BC
ψ*_AB = (ψ**_B / ψ*_B) ψ_AB
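A minimal numpy sketch of one forward and backward sweep of these updates, assuming hypothetical values for the two clique potentials:

import numpy as np

psi_ab = np.array([[0.40, 0.10],    # hypothetical potential, indexed [a, b]
                   [0.12, 0.15]])
psi_bc = np.array([[0.50, 0.50],    # hypothetical potential, indexed [b, c]
                   [0.70, 0.30]])
phi_b = np.ones(2)                   # separator potential, initialized to 1

# Forward: (A,B) tells the separator what the marginal of B should be.
phi_b_new = psi_ab.sum(axis=0)               # sum out A
psi_bc *= (phi_b_new / phi_b)[:, None]       # rescale (B,C) toward agreement
phi_b = phi_b_new

# Backward: (B,C) replies with its own view of the marginal of B.
phi_b_new = psi_bc.sum(axis=1)               # sum out C
psi_ab *= (phi_b_new / phi_b)[None, :]
phi_b = phi_b_new

# For a two-clique tree, one sweep reaches agreement on the marginal of B.
print(psi_ab.sum(axis=0), psi_bc.sum(axis=1), phi_b)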

Page 39: Lecture 22: Inference in Graphical Models

Junction Tree Algorithm

• When convergence is reached – clique potentials are marginals and separator potentials are submarginals.

• p(x) is consistent across all of the message passing.

• This implies that, so long as p(x) is correctly represented in the potential functions, the JTA can be used to make each potential function correspond to an appropriate marginal without impacting the overall probability function.


Page 40: Lecture 22: Inference in Graphical Models

Converting a DAG to a Junction Tree

• Initialize separators to 1 and clique tables to CPTs.
• Run the JTA to convert the potential functions (CPTs) to marginals.

[Figure: a DAG over X1–X7 and its junction tree with cliques (X1,X2), (X2,X3), (X3,X4), (X3,X5), (X5,X6), (X5,X7)]

Page 41: Lecture 22: Inference in Graphical Models

Evidence in a Junction Tree

• Initialize as usual.

• Update with a slice rather than the whole table (the slice plays the role of a conditional).

Page 42: Lecture 22: Inference in Graphical Models

Efficiency of the Junction Tree Algorithm

• Construct CPTs
– Polynomial in # of data points

• Moralization
– Polynomial in # of nodes

• Introduce Evidence
– Polynomial in # of nodes

• Triangulate
– Suboptimal = polynomial. Optimal = NP.

• Construct Junction Tree
– Polynomial in the number of cliques
– Identifying cliques = polynomial in the number of nodes

• Propagate Probabilities
– Polynomial in the number of cliques
– Exponential in the size of cliques

Page 43: Lecture 22: Inference in Graphical Models

Hidden Markov Models

• Powerful graphical model to describe sequential information.

[Figure: an HMM with hidden states Q1–Q4 and observations X1–X4]