
Loopy Belief Propagation
Generalized Belief Propagation
Unifying Variational and GBP
Learning Parameters of MNs

Graphical Models – 10708
Carlos Guestrin
Carnegie Mellon University

November 10th, 2006

Readings:
K&F: 11.3, 11.5
Yedidia et al. paper from the class website
Chapter 9 - Jordan

10-708 – ©Carlos Guestrin 2006 2

More details on Loopy BP

Numerical problem:

messages < 1 get multiplied together as we go around the loops
numbers can go to zero
normalize messages to one:

Z_i→j doesn't depend on X_j, so it doesn't change the answer

Computing node “beliefs” (estimates of probs.):

[Figure: student network with nodes Coherence, Difficulty, Intelligence, Grade, SAT, Letter, Job, Happy]
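Below is a minimal sketch (not from the slides) of loopy BP with normalized messages on a toy pairwise MN; the 3-cycle model, the potentials, and the fixed iteration count are illustrative assumptions.

    import numpy as np

    # Toy pairwise MN: a 3-cycle of binary variables (illustrative, not the
    # slides' student network). pots[(i, j)] is the potential table on edge (i, j).
    edges = [(0, 1), (1, 2), (0, 2)]
    pots = {e: np.array([[2.0, 1.0], [1.0, 2.0]]) for e in edges}

    nbrs = {i: set() for i in range(3)}
    for i, j in edges:
        nbrs[i].add(j)
        nbrs[j].add(i)

    # msgs[(i, j)] is the message from node i to node j, initialized uniform
    msgs = {}
    for i, j in edges:
        msgs[(i, j)] = np.ones(2) / 2
        msgs[(j, i)] = np.ones(2) / 2

    def potential(i, j):
        # return the edge potential indexed as [x_i, x_j]
        return pots[(i, j)] if (i, j) in pots else pots[(j, i)].T

    for _ in range(100):
        new = {}
        for (i, j) in msgs:
            # product of incoming messages from all neighbors of i except j
            incoming = np.ones(2)
            for k in nbrs[i] - {j}:
                incoming *= msgs[(k, i)]
            m = potential(i, j).T @ incoming   # sum out x_i
            new[(i, j)] = m / m.sum()          # normalize: Z_ij doesn't change beliefs
        msgs = new

    # node "beliefs" (estimates of the marginals P(X_i))
    for i in range(3):
        b = np.ones(2)
        for k in nbrs[i]:
            b *= msgs[(k, i)]
        print("belief", i, b / b.sum())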


10-708 – ©Carlos Guestrin 2006 3

An example of running loopy BP

10-708 – ©Carlos Guestrin 2006 4

(Non-)Convergence of Loopy BP

Loopy BP can oscillate!!!
oscillations can be small
oscillations can be really bad!

Typically, if factors are closer to uniform, loopy does well (converges)
if factors are closer to deterministic, loopy doesn't behave well

One approach to help: damping messages
new message is an average of the old message and the new one:

often better convergence
but, when damping is required to get convergence, the result is often bad

[Convergence plots from Murphy et al. '99]
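Continuing the sketch above, damping replaces the assignment msgs = new at the end of each iteration; the damping weight lam = 0.5 is an assumed value.

    lam = 0.5  # damping weight (an assumed value)
    # damped update: the new message is an average of the old message and the update
    msgs = {key: lam * msgs[key] + (1 - lam) * new[key] for key in msgs}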


10-708 – ©Carlos Guestrin 2006 5

Loopy BP in Factor graphs

What if we don't have pairwise Markov nets?

1. Transform to a pairwise MN
2. Use Loopy BP on a factor graph

Message example:
from node to factor:

from factor to node:

[Figure: factor graph over variables A, B, C, D, E with factors ABC, ABD, BDE, CDE]

10-708 – ©Carlos Guestrin 2006 6

Loopy BP in Factor graphs

From node i to factor j:

F(i): the factors whose scope includes X_i

From factor j to node i, where Scope[φ_j] = Y ∪ {X_i}:
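The message formulas themselves did not survive extraction; their standard factor-graph form is:

\[
\mu_{i \to j}(x_i) \;=\; \prod_{k \in F(i) \setminus \{j\}} \mu_{k \to i}(x_i)
\qquad\qquad
\mu_{j \to i}(x_i) \;=\; \sum_{\mathbf{y}} \phi_j(\mathbf{y}, x_i) \prod_{X_k \in \mathbf{Y}} \mu_{k \to j}(y_k)
\]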

[Figure: same factor graph as above]


10-708 – ©Carlos Guestrin 2006 7

What you need to know about loopy BP

Application of belief propagation in loopy graphs

Doesn't always converge
damping can help
good message schedules can help (see book)

If it converges, it is often to incorrect, but useful, results

Generalizes from pairwise Markov networks by using factor graphs

10-708 – ©Carlos Guestrin 2006 8

Announcements

Monday's special recitation:
Pradeep Ravikumar on exciting new approximate inference algorithms


10-708 – ©Carlos Guestrin 2006 9

Loopy BP v. Clique trees: Two ends of a spectrum

[Figure: student network alongside a clique tree with cliques CD, DIG, GSI, GJSL, HGJ]

10-708 – ©Carlos Guestrin 2006 10

Generalize cluster graph

Generalized cluster graph: For set of factors F

Undirected graph
Each node i associated with a cluster C_i

Family preserving: for each factor f_j ∈ F, ∃ node i such that scope[f_j] ⊆ C_i

Each edge i – j is associated with a set of variables S_ij ⊆ C_i ∩ C_j


10-708 – ©Carlos Guestrin 2006 11

Running intersection property

(Generalized) Running intersection property (RIP)

Cluster graph satisfies RIP if whenever X ∈ C_i and X ∈ C_j, then ∃ one and only one path from C_i to C_j where X ∈ S_uv for every edge (u,v) in the path
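A small sketch of how one might check generalized RIP programmatically; the data layout (dicts of clusters and sepsets) is an assumption for illustration, not from the slides.

    def satisfies_rip(clusters, sepsets):
        """clusters: {node: set of variables}; sepsets: {(i, j): set of variables}.

        Generalized RIP: for every variable X, the clusters containing X must be
        connected by exactly one path of edges whose sepsets contain X, i.e. the
        X-induced edge subgraph is a tree spanning all clusters that contain X.
        """
        all_vars = set().union(*clusters.values())
        for X in all_vars:
            nodes = {i for i, C in clusters.items() if X in C}
            edges = [(i, j) for (i, j), S in sepsets.items() if X in S]
            # a tree on |nodes| vertices has exactly |nodes| - 1 edges...
            if len(edges) != len(nodes) - 1:
                return False
            # ...and is connected (DFS from an arbitrary cluster containing X)
            adj = {i: [] for i in nodes}
            for i, j in edges:
                adj[i].append(j)
                adj[j].append(i)
            stack, seen = [next(iter(nodes))], set()
            while stack:
                u = stack.pop()
                if u not in seen:
                    seen.add(u)
                    stack.extend(adj[u])
            if seen != nodes:
                return False
        return True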

10-708 – ©Carlos Guestrin 2006 12

Examples of cluster graphs


10-708 – ©Carlos Guestrin 2006 13

Two cluster graphs satisfying RIP with different edge sets

10-708 – ©Carlos Guestrin 2006 14

Generalized BP on cluster graphs satisfying RIP

Initialization: assign each factor φ to a clique α(φ) with Scope[φ] ⊆ C_α(φ)

Initialize cliques:

Initialize messages:

While not converged, send messages:

Belief:
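The initialization, message, and belief formulas were lost in extraction; the standard cluster-graph BP forms (as in K&F) are:

\[
\pi_i^{(0)} \;\propto\; \prod_{\phi:\; \alpha(\phi) = i} \phi
\qquad\qquad
\delta_{i \to j} \;\leftarrow\; 1
\]
\[
\delta_{i \to j}(S_{ij}) \;\propto\; \sum_{C_i \setminus S_{ij}} \pi_i^{(0)} \prod_{k \in \mathrm{Nb}(i) \setminus \{j\}} \delta_{k \to i}
\qquad\qquad
\pi_i \;\propto\; \pi_i^{(0)} \prod_{k \in \mathrm{Nb}(i)} \delta_{k \to i}
\]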


10-708 – ©Carlos Guestrin 2006 15

Cluster graph for Loopy BP

[Figure: student network and the pairwise cluster graph used by loopy BP]

10-708 – ©Carlos Guestrin 2006 16

What if the cluster graph doesn't satisfy RIP?

Page 9: More details on Loopy BP - Carnegie Mellon School of ...guestrin/Class/10708-F06/Slides/gbp-mn-param-annotated.pdf4 10-708 – ©Carlos Guestrin 2006 7 What you need to know about

9

10-708 – ©Carlos Guestrin 2006 17

Region graphs to the rescue

Can address generalized cluster graphs that don’t satisfy RIP using region graphs:

Yedidia et al. from class website

Example in your homework! ☺

Hint – From Yedidia et al.:
Section 7 – defines region graphs
Section 9 – message passing on region graphs
Section 10 – an example that will help you a lot!!! ☺

10-708 – ©Carlos Guestrin 2006 18

Revisiting Mean-Fields

Choice of Q:
Optimization problem:
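Reconstructing the standard mean-field setup (the slide's formulas were lost in extraction):

\[
Q(\mathbf{X}) = \prod_i Q_i(X_i)
\qquad\qquad
\max_{Q}\; F[\tilde{P}, Q] = \sum_{\phi} E_Q[\log \phi] + H_Q(\mathbf{X})
\quad \text{s.t. } Q \text{ fully factorized}
\]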


10-708 – ©Carlos Guestrin 2006 19

Interpretation of energy functional

Energy functional:

Exact if P=Q:

View problem as an approximation of entropy term:
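Reconstructed from the standard development:

\[
F[\tilde{P}, Q] = E_Q[\log \tilde{P}(\mathbf{X})] + H_Q(\mathbf{X})
\qquad\qquad
\log Z = F[\tilde{P}, Q] + KL(Q \,\|\, P)
\]

so F is exact when Q = P (it equals log Z); the energy term E_Q[log P̃] decomposes over factors, and the hard part is approximating the entropy H_Q(X).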

10-708 – ©Carlos Guestrin 2006 20

Entropy of a tree distribution

Entropy term:
Joint distribution:

Decomposing entropy term:

More generally: d_i = number of neighbors of X_i
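Reconstructed: for a tree-structured pairwise distribution,

\[
P(\mathbf{X}) = \frac{\prod_{(i,j) \in T} \pi_{ij}(X_i, X_j)}{\prod_i \pi_i(X_i)^{d_i - 1}}
\quad\Longrightarrow\quad
H(\mathbf{X}) = \sum_{(i,j) \in T} H(X_i, X_j) \;-\; \sum_i (d_i - 1)\, H(X_i)
\]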

[Figure: student network example]


10-708 – ©Carlos Guestrin 2006 21

Loopy BP & Bethe approximation

Energy functional:

Bethe approximation of Free Energy:
use the tree entropy decomposition even on loopy graphs:
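Reconstructed: keep the same energy term, but substitute the tree entropy decomposition even though the graph has loops:

\[
H_{\text{Bethe}}(Q) = \sum_{(i,j) \in E} H(\pi_{ij}) - \sum_i (d_i - 1)\, H(\pi_i)
\qquad\qquad
F_{\text{Bethe}}[\tilde{P}, Q] = E_Q[\log \tilde{P}] + H_{\text{Bethe}}(Q)
\]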

Theorem: If Loopy BP converges, the resulting π_ij & π_i are a stationary point (usually a local maximum) of the Bethe Free energy!

[Figure: student network example]

10-708 – ©Carlos Guestrin 2006 22

GBP & Kikuchi approximation

Exact Free energy: Junction Tree

Bethe Free energy:

Kikuchi approximation: Generalized cluster graph

spectrum from Bethe to exact
entropy terms weighted by counting numbers
see Yedidia et al.
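For reference, per Yedidia et al., region entropies are weighted by counting numbers c_r chosen so that every variable and factor is counted exactly once:

\[
H_{\text{Kikuchi}}(Q) = \sum_{r} c_r\, H(\pi_r)
\qquad\qquad
c_r = 1 - \sum_{s \in \mathrm{Anc}(r)} c_s
\]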

Theorem: If GBP converges, the resulting π_Ci are a stationary point (usually a local maximum) of the Kikuchi Free energy!

[Figure: student network and a generalized cluster graph with clusters CD, DIG, GSI, GJSL, HGJ]


10-708 – ©Carlos Guestrin 2006 23

What you need to know about GBP

Spectrum between Loopy BP & Junction Trees:
more computation, but typically better answers

If the cluster graph satisfies RIP, the equations are very simple

General setting, slightly trickier equations, but not hard

Relates to variational methods: Corresponds to local optima of approximate version of energy functional

10-708 – ©Carlos Guestrin 2006 24

Learning Parameters of a BN

Log likelihood decomposes:

Learn each CPT independently
Use counts
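Reconstructed (the slide's formulas were not extracted): the log likelihood splits into one term per CPT, and each term is maximized by empirical counts:

\[
\log P(\mathcal{D} \mid \theta) = \sum_i \sum_m \log P\!\left(x_i^{(m)} \mid \mathbf{pa}_i^{(m)}\right)
\qquad\qquad
\hat{\theta}_{x \mid \mathbf{u}} = \frac{\mathrm{Count}(x, \mathbf{u})}{\mathrm{Count}(\mathbf{u})}
\]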

[Figure: student network example]


10-708 – ©Carlos Guestrin 2006 25

Log Likelihood for MN

Log likelihood of the data:
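Reconstructed: for clique potentials φ_i and M samples,

\[
\log P(\mathcal{D} \mid \boldsymbol{\phi})
= \sum_i \sum_m \log \phi_i\!\left(\mathbf{c}_i^{(m)}\right) \;-\; M \log Z
\]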

[Figure: student network example]

10-708 – ©Carlos Guestrin 2006 26

Log Likelihood doesn’t decompose for MNs

Log likelihood:

A convex problem
Can find global optimum!!

Term log Z doesn’t decompose!!

[Figure: student network example]


10-708 – ©Carlos Guestrin 2006 27

Derivative of Log Likelihood for MNs

[Figure: student network example]

10-708 – ©Carlos Guestrin 2006 28

Derivative of Log Likelihood for MNs

[Figure: student network example]

Derivative:
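Reconstructed in log-linear form (equivalent to table factors), with weights w_k and features f_k:

\[
\frac{\partial \log P(\mathcal{D} \mid \mathbf{w})}{\partial w_k}
= \sum_m f_k\!\left(\mathbf{x}^{(m)}\right) - M\, E_{\mathbf{w}}[f_k]
\]

Zeroing it matches empirical feature counts to expected counts under the model; computing E_w[f_k] requires inference.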

Setting derivative to zero
Can optimize using gradient ascent

Let’s look at a simpler solution


10-708 – ©Carlos Guestrin 2006 29

Iterative Proportional Fitting (IPF)

[Figure: student network example]

Setting derivative to zero:

Fixed point equation:
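Reconstructed fixed-point update for each clique potential:

\[
\phi_i^{(t+1)}(\mathbf{c}_i) = \phi_i^{(t)}(\mathbf{c}_i)\, \frac{\hat{P}(\mathbf{c}_i)}{P^{(t)}(\mathbf{c}_i)}
\]

where \hat{P} is the empirical clique marginal and P^{(t)} is the current model's marginal.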

Iterate and converge to optimal parameters
Each iteration, must compute the model clique marginals P(c_i), i.e., run inference (see the sketch below)
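A brute-force IPF sketch on a toy model; the three-variable MN, the data, and exact enumeration of Z are illustrative assumptions (real models need proper inference for the marginals).

    import numpy as np
    from itertools import product

    # Toy MN over binary variables X0, X1, X2 with cliques (X0,X1) and (X1,X2)
    cliques = [(0, 1), (1, 2)]
    phis = [np.ones((2, 2)) for _ in cliques]   # initialize potentials uniformly
    data = np.array([[0, 0, 1], [0, 1, 1], [1, 1, 1], [1, 1, 0]])

    def joint(phis):
        # exact inference by enumeration: normalized joint over all 2^3 states
        p = np.zeros((2, 2, 2))
        for x in product([0, 1], repeat=3):
            p[x] = np.prod([phi[tuple(x[v] for v in cl)]
                            for phi, cl in zip(phis, cliques)])
        return p / p.sum()                      # divide by Z

    def clique_marginal(p, cl):
        axes = tuple(i for i in range(3) if i not in cl)
        return p.sum(axis=axes)

    for _ in range(50):                         # IPF sweeps
        for k, cl in enumerate(cliques):
            emp = np.zeros((2, 2))              # empirical marginal P̂(c_i)
            for x in data:
                emp[tuple(x[v] for v in cl)] += 1.0
            emp /= len(data)
            mod = clique_marginal(joint(phis), cl)   # model marginal: the inference step
            # fixed-point update; guard 0/0 where both marginals vanish
            ratio = np.divide(emp, mod, out=np.ones_like(emp), where=mod > 0)
            phis[k] *= ratio

    print(clique_marginal(joint(phis), (0, 1)))  # now matches the empirical marginal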

10-708 – ©Carlos Guestrin 2006 30

What you need to know about learning MN parameters?

BN parameter learning is easy
MN parameter learning doesn't decompose!

Learning requires inference!

Apply gradient ascent or IPF iterations to obtain optimal parameters