Kristin Branson September 29, 2003 -...

Loopy Belief Propagation
Research Exam

Kristin Branson

September 29, 2003

Loopy Belief Propagation – p.1/73

Problem Formalization

Reasoning about any real-world problem requires assumptions about the structure of the problem: the relevant variables and the interrelationships of these variables. A graphical model is a formal representation of these assumptions.

Probabilistic Model

These assumptions are a simplification of the problem’s true structure. The world appears stochastic in terms of the model. Graphical models are interpreted as describing the probability distribution of random variables.

Probabilistic Inference

Reasoning about real-world problems can be modeled as probabilistic inference on a distribution described by a graph. Probabilistic inference involves computing desired properties of a distribution:

What is the most probable state of the variables, given some evidence?

What is the marginal distribution of a subset of the variables, given some evidence?

Inference Example

Estimate the intensity value of each pixel of an image given a corrupted version of the image.

Assume each observed pixel $y_i$ depends only on the corresponding uncorrupted pixel $x_i$, through $p(y_i \mid x_i)$.

Assume the relationship between uncorrupted pixel variables can be described by a local smoothness constraint.

Inference is Intractable

Assuming a sparse graphical model greatly simplifies the problem. Still, probabilistic inference is in general intractable. Exact inference algorithms are, in the worst case, exponential in the graph size. Pearl’s Belief Propagation (BP) algorithm performs approximate inference on an arbitrary graphical model.

Loopy BP

The assumptions made by BP only hold for acyclic graphs. For graphs containing cycles, loopy BP is not guaranteed to converge or be correct. However, it has been applied with experimental success.

Experimental Results

Impressive experimental results were first observed in graphical coding schemes.

The Turbo code error-correction scheme was described as “the most exciting and potentially important development in coding theory in many years” (McEliece et al., 1995).

Murphy et al. (1999) experimented with loopy BP on graphical models that appear in machine learning.

They concluded that loopy BP did converge to good approximations in many cases.

Since then, loopy BP has been applied successfully to many applications in machine learning and computer vision.

Theoretical Analysis

When considering applying loopy BP, one would like to know whether it will converge to good approximations.

In this exam, I present recent analyses of loopy BP.

Outline

Background:
- Undirected graphical models.
- Belief Propagation algorithm.

Three techniques for analyzing loopy BP:
- Algebraic analysis.
- Unwrapped tree.
- Reparameterization.

Future work.

Markov Random Fields

An undirected graphical model represents a distribution by an undirected graph.

[Figure: an example undirected graph on six nodes, numbered 1 2 3 in the top row and 4 5 6 in the bottom row.]

Nodes represent random variables.

Edges represent dependencies.

Each clique $C$ is associated with a potential function, $\psi_C(x_C)$.

Markov Properties

Paths in the graph represent dependencies. If two nodes are not connected by any path, the variables are independent.

[Figure: three sets of nodes, A, B and C.]

If the nodes in $C$ separate the nodes in $A$ from the nodes in $B$, then $x_A$ and $x_B$ are conditionally independent given $x_C$.

The Hammersley-Clifford theorem: for a strictly positive distribution, the conditional independence assumptions hold if and only if the distribution factorizes as the product of potential functions over cliques:

$p(x) = \frac{1}{Z} \prod_{C} \psi_C(x_C)$

where $Z$ is a normalization constant.

Pairwise MRFs

To simplify notation, we focus on pairwise MRFs. The largest clique size in a pairwise MRF is two. The distribution can therefore be represented as

$p(x) = \frac{1}{Z} \prod_{i \in V} \psi_i(x_i) \prod_{(i,j) \in E} \psi_{ij}(x_i, x_j)$

where $V$ and $E$ are the node and edge sets of the graph.

An MRF with larger cliques can be converted into a pairwise MRF, for example by introducing an auxiliary variable for each larger clique.

Probabilistic Inference

We discuss two inference problems:

Marginalization: For each node $i$, compute

$p(x_i \mid y) = \sum_{\{x_j : j \neq i\}} p(x \mid y)$

MAP assignment: Find the maximum probability assignment given the evidence:

$x^* = \arg\max_{x} p(x \mid y)$

Max-Marginals

To find the MAP assignment, we will compute the max-marginals:

$q_i(x_i) = \max_{\{x_j : j \neq i\}} p(x \mid y)$

If the MAP assignment is unique, $x^*_i$ maximizes $q_i(x_i)$ for every node $i$.
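To make the two inference problems concrete, here is a minimal brute-force sketch on a hypothetical three-node binary chain with made-up potentials (`psi`, `psi_e`, `score`, `marginal` and `max_marginal` are illustrative names, not from the source). It computes marginals, max-marginals and the MAP assignment by enumerating every assignment, which is only feasible for tiny graphs, and checks that reading off the argmax of each max-marginal recovers the MAP assignment when it is unique.

```python
from itertools import product
from math import prod

# Hypothetical pairwise MRF: binary chain 0 - 1 - 2 with made-up potentials.
K = 2                                          # states per variable
psi = [[1.0, 2.0], [1.0, 1.0], [3.0, 1.0]]     # psi[i][x_i]
psi_e = {(0, 1): [[2.0, 1.0], [1.0, 2.0]],     # psi_e[(i, j)][x_i][x_j]
         (1, 2): [[1.0, 3.0], [3.0, 1.0]]}


def score(x):
    """Unnormalized probability: product of single-node and pairwise potentials."""
    nodes_part = prod(psi[i][x[i]] for i in range(len(psi)))
    edges_part = prod(t[x[i]][x[j]] for (i, j), t in psi_e.items())
    return nodes_part * edges_part


assignments = list(product(range(K), repeat=len(psi)))  # K**n of them
Z = sum(score(x) for x in assignments)


def marginal(i):
    """p(x_i): sum the joint over all other variables."""
    return [sum(score(x) for x in assignments if x[i] == v) / Z for v in range(K)]


def max_marginal(i):
    """q_i(x_i): maximize the joint over all other variables."""
    return [max(score(x) for x in assignments if x[i] == v) / Z for v in range(K)]


x_map = max(assignments, key=score)
# With a unique MAP assignment, the argmax of each max-marginal recovers it.
x_from_q = tuple(max(range(K), key=lambda v, i=i: max_marginal(i)[v])
                 for i in range(len(psi)))
print(x_map, x_from_q, marginal(0))
```

Enumeration costs $K^n$ evaluations for $n$ variables with $K$ states each, which is exactly the cost that variable elimination and BP try to avoid.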

Notation

To simplify notation, we assume that the effect of the observed data $y$ is encapsulated in the single-node potentials $\psi_i$, so that

$p(x \mid y) = \frac{1}{Z} \prod_{i \in V} \psi_i(x_i) \prod_{(i,j) \in E} \psi_{ij}(x_i, x_j)$

and we drop $y$ from the notation, writing $p(x)$.

Variable Elimination

Exact inference can be performed by repeatedly eliminating variables. For example, on a chain with edges $(x_1, x_2)$ and $(x_2, x_3)$, eliminating $x_3$ and then $x_2$ gives

$p(x_1) = \frac{1}{Z}\, \psi_1(x_1) \sum_{x_2} \psi_2(x_2)\, \psi_{12}(x_1, x_2) \sum_{x_3} \psi_3(x_3)\, \psi_{23}(x_2, x_3)$

Replacing each sum with a max gives the max-marginal $q_1(x_1)$.
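A minimal sketch of variable elimination on a hypothetical three-node binary chain (made-up potentials; the function names are illustrative). Passing `op=sum` eliminates variables by summation and gives the marginal of the first variable; passing `op=max` gives its max-marginal. Both are compared against brute-force enumeration.

```python
from itertools import product

# Hypothetical binary chain x1 - x2 - x3 (indices 0, 1, 2) with made-up potentials.
psi = [[1.0, 2.0], [1.0, 1.0], [3.0, 1.0]]   # psi[i][x]
psi12 = [[2.0, 1.0], [1.0, 2.0]]              # psi12[x1][x2]
psi23 = [[1.0, 3.0], [3.0, 1.0]]              # psi23[x2][x3]
S = range(2)                                  # states of every variable


def normalize(v):
    z = sum(v)
    return [a / z for a in v]


def brute_force(op):
    """Apply op (sum or max) over x2, x3 for each x1 by enumerating all assignments."""
    return normalize([op(psi[0][x1] * psi[1][x2] * psi[2][x3]
                         * psi12[x1][x2] * psi23[x2][x3]
                         for x2, x3 in product(S, S)) for x1 in S])


def eliminate(op):
    """Variable elimination: push op inside the product, removing x3 and then x2."""
    f2 = [op(psi[2][x3] * psi23[x2][x3] for x3 in S) for x2 in S]           # new factor on x2
    f1 = [op(psi[1][x2] * psi12[x1][x2] * f2[x2] for x2 in S) for x1 in S]  # new factor on x1
    return normalize([psi[0][x1] * f1[x1] for x1 in S])


for op in (sum, max):
    print(op.__name__, brute_force(op), eliminate(op))
```

Each elimination only touches the factors that mention the eliminated variable, so on a chain the work grows linearly with its length instead of exponentially.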

BP for Trees

BP breaks the [max-]marginalization for a node $i$ into independent subproblems corresponding to the subtrees rooted at the neighbors of $i$. In each subproblem, BP eliminates all the variables except $x_i$. The result is a message $m_{ji}(x_i)$ from each neighbor $j$ to $i$.

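BP for Trees

$m_{ji}(x_i) = \kappa \sum_{x_j} \psi_j(x_j)\, \psi_{ij}(x_i, x_j) \prod_{k \in N(j) \setminus i} m_{kj}(x_j)$ (sum-product)

$m_{ji}(x_i) = \kappa \max_{x_j} \psi_j(x_j)\, \psi_{ij}(x_i, x_j) \prod_{k \in N(j) \setminus i} m_{kj}(x_j)$ (max-product)

where $N(j)$ is the set of neighbors of $j$ and $\kappa$ is a normalization constant.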

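BP is a dynamic programming form of variable elimination. The creation of a message is equivalent to repeatedly removing leaf nodes of the subtree: removing a leaf $k$ whose neighbor is $l$ multiplies $\psi_l(x_l)$ by

$\sum_{x_k} \psi_k(x_k)\, \psi_{lk}(x_l, x_k)$ (or $\max_{x_k}$ for max-product),

which is, up to normalization, the message $m_{kl}(x_l)$. Repeating this until only $x_j$ is left in the subtree, and then eliminating $x_j$, gives $m_{ji}(x_i)$.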

BP for Trees

In terms of these messages, the [max-]marginals are

$p(x_i) = \kappa\, \psi_i(x_i) \prod_{j \in N(i)} m_{ji}(x_i)$ (using sum-product messages)

$q_i(x_i) = \kappa\, \psi_i(x_i) \prod_{j \in N(i)} m_{ji}(x_i)$ (using max-product messages)
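The recursive message definition on a tree can be sketched directly: `message(j, i, op)` computes $m_{ji}(x_i)$ by first computing the messages into $j$ from its other neighbors, which is the dynamic-programming view of variable elimination. The tree, its potentials and the helper names are hypothetical; `op` is `sum` for sum-product and `max` for max-product, and normalization constants are dropped because the beliefs are normalized at the end.

```python
from itertools import product
from math import prod

# Hypothetical tree with four binary nodes and edges 0-1, 1-2, 1-3 (made-up potentials).
K = 2
nodes = [0, 1, 2, 3]
edges = [(0, 1), (1, 2), (1, 3)]
psi = {0: [1.0, 2.0], 1: [1.0, 1.0], 2: [3.0, 1.0], 3: [1.0, 4.0]}
psi_e = {(0, 1): [[2.0, 1.0], [1.0, 2.0]],
         (1, 2): [[1.0, 3.0], [3.0, 1.0]],
         (1, 3): [[2.0, 1.0], [1.0, 1.0]]}
nbrs = {i: [b if a == i else a for (a, b) in edges if i in (a, b)] for i in nodes}


def pair(i, j, xi, xj):
    """psi_ij(x_i, x_j); each undirected edge is stored only once."""
    return psi_e[(i, j)][xi][xj] if (i, j) in psi_e else psi_e[(j, i)][xj][xi]


def message(j, i, op):
    """m_ji(x_i): eliminate the whole subtree hanging off j (away from i), up to scale."""
    incoming = [message(k, j, op) for k in nbrs[j] if k != i]
    return [op(psi[j][xj] * pair(i, j, xi, xj) * prod(m[xj] for m in incoming)
               for xj in range(K)) for xi in range(K)]


def belief(i, op):
    """[Max-]marginal of node i: psi_i times every incoming message, normalized."""
    msgs = [message(j, i, op) for j in nbrs[i]]
    raw = [psi[i][xi] * prod(m[xi] for m in msgs) for xi in range(K)]
    z = sum(raw)
    return [r / z for r in raw]


def brute_force(i, op):
    """Reference answer obtained by enumerating all K**n assignments."""
    def joint(x):
        return prod(psi[v][x[v]] for v in nodes) * prod(pair(a, b, x[a], x[b]) for a, b in edges)
    raw = [op(joint(x) for x in product(range(K), repeat=len(nodes)) if x[i] == xi)
           for xi in range(K)]
    z = sum(raw)
    return [r / z for r in raw]


print(belief(0, sum), brute_force(0, sum))
```

This sketch recomputes messages on every call for readability; caching each $m_{ji}$ once is what gives BP its linear cost on trees.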

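Parallel Message Passing

$m^{(t+1)}_{ji}(x_i) = \kappa \sum_{x_j} \psi_j(x_j)\, \psi_{ij}(x_i, x_j) \prod_{k \in N(j) \setminus i} m^{(t)}_{kj}(x_j)$ (with $\max_{x_j}$ in place of $\sum_{x_j}$ for max-product)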

Instead of waiting for smaller problems to be solved before solving larger problems, we can iteratively pass messages in parallel.

Initialize the messages $m^{(0)}_{ji}(x_i)$ for all $(i, j) \in E$, in both directions.

For $t = 0, 1, 2, \dots$ until convergence,

Update $m^{(t+1)}_{ji}(x_i)$ using $m^{(t)}_{kj}(x_j)$ for all $(i, j) \in E$, in both directions.
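A minimal sketch of the parallel (synchronous) schedule, assuming discrete variables with `K` states, potentials stored in plain dictionaries, and uniform initial messages (the function names, the uniform initialization and the tolerance are assumptions, not taken from the source). Each iteration recomputes every directed message from the previous iteration's messages and normalizes it. On a tree the beliefs match brute-force [max-]marginals; on a cycle the same code runs unchanged but only produces approximate beliefs.

```python
from itertools import product
from math import prod


def loopy_bp(K, psi, psi_e, op=sum, max_iters=500, tol=1e-12):
    """Parallel BP: every message m_ji is recomputed from the previous iteration's messages.

    psi:   dict node -> list of K single-node potentials psi_i(x_i)
    psi_e: dict (i, j) -> K x K table psi_ij(x_i, x_j), each undirected edge stored once
    op:    sum for sum-product, max for max-product
    Returns (beliefs, converged).
    """
    nbrs = {i: [] for i in psi}
    for i, j in psi_e:
        nbrs[i].append(j)
        nbrs[j].append(i)

    def pair(i, j, xi, xj):
        return psi_e[(i, j)][xi][xj] if (i, j) in psi_e else psi_e[(j, i)][xj][xi]

    msgs = {(j, i): [1.0 / K] * K for i in psi for j in nbrs[i]}   # uniform start
    converged = False
    for _ in range(max_iters):
        new = {}
        for (j, i) in msgs:
            raw = [op(psi[j][xj] * pair(i, j, xi, xj)
                      * prod(msgs[(k, j)][xj] for k in nbrs[j] if k != i)
                      for xj in range(K)) for xi in range(K)]
            z = sum(raw)
            new[(j, i)] = [r / z for r in raw]    # normalization plays the role of kappa
        delta = max(abs(a - b) for key in msgs for a, b in zip(msgs[key], new[key]))
        msgs = new
        if delta < tol:
            converged = True
            break

    beliefs = {}
    for i in psi:
        raw = [psi[i][xi] * prod(msgs[(j, i)][xi] for j in nbrs[i]) for xi in range(K)]
        z = sum(raw)
        beliefs[i] = [r / z for r in raw]
    return beliefs, converged


def exact(K, psi, psi_e, op=sum):
    """Brute-force [max-]marginals for comparison; assumes nodes are labeled 0..n-1."""
    n = len(psi)

    def joint(x):
        return prod(psi[i][x[i]] for i in range(n)) * prod(
            t[x[i]][x[j]] for (i, j), t in psi_e.items())

    out = {}
    for i in range(n):
        raw = [op(joint(x) for x in product(range(K), repeat=n) if x[i] == xi)
               for xi in range(K)]
        z = sum(raw)
        out[i] = [r / z for r in raw]
    return out


# Hypothetical single cycle 0-1-2-3-0 with binary states and made-up potentials.
psi = {0: [1.0, 2.0], 1: [1.0, 1.0], 2: [3.0, 1.0], 3: [1.0, 1.5]}
attract = [[2.0, 1.0], [1.0, 2.0]]
psi_e = {(0, 1): attract, (1, 2): attract, (2, 3): attract, (3, 0): attract}
beliefs, converged = loopy_bp(2, psi, psi_e)
print(converged, beliefs[0], exact(2, psi, psi_e)[0])
```

On this positive single-cycle model sum-product converges, in line with the algebraic analysis of single-cycle graphs; on other loopy graphs the `converged` flag may stay `False`.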

Loopy BP

The parallel BP algorithm can be applied to arbitrary graphs. However, the assumptions made by BP do not hold for a loopy graph. Loopy BP is not guaranteed to converge. If it does converge, it is not guaranteed to converge to the correct [max-]marginals. We will call the approximate [max-]marginals beliefs, $b_i(x_i)$.

Theoretical Analysis

When will loopy BP converge?

How good an approximation are the beliefs to the [max-]marginals, and how good is the max-product assignment?

I present three techniques for analyzing BP. The first two analyze the message-passing dynamics, while the third analyzes the steady-state beliefs directly.

Algebraic Analysis Overview

We first discuss an algebraic analysis of the sum-product algorithm for a single-cycle graph.

We represent each message update of the sum-product algorithm as a matrix multiplication.

We use linear algebra results to show the relationship between the steady-state beliefs and the true marginals, as well as convergence properties.

The sum-product algorithm converges for a single-cycle graph with a positive distribution.

The convergence rate and the accuracy of the beliefs are related.

Matrix Representation

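We represent the message and belief functions as vectors $\mathbf{m}_{ji}$ and $\mathbf{b}_i$.

Similarly, we represent the single- and pair-node potentials as matrices $D_i$ and $M_{ij}$: $D_i$ is diagonal with $(D_i)_{aa} = \psi_i(x_i = a)$, and $(M_{ij})_{ab} = \psi_{ij}(x_i = a, x_j = b)$, so that $M_{ji} = M_{ij}^\top$.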

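Matrix Message Updates

For a graph consisting of a single cycle, each node $j$ has exactly two neighbors, $i$ and $k$, so the message from $j$ to $i$ depends on the single incoming message $m_{kj}$:

$m_{ji}(x_i) = \kappa \sum_{x_j} \psi_{ij}(x_i, x_j)\, \psi_j(x_j)\, m_{kj}(x_j)$

A message update is therefore a matrix multiplication:

$\mathbf{m}_{ji} = \kappa\, M_{ij} D_j\, \mathbf{m}_{kj}$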

Matrix Belief Updates

For a graph consisting of a single cycle, node $i$ has two neighbors, $j$ and $k$, and a belief update is

$b_i(x_i) = \kappa\, \psi_i(x_i)\, m_{ji}(x_i)\, m_{ki}(x_i)$, that is, $\mathbf{b}_i = \kappa\, \mathrm{diag}(\mathbf{m}_{ji})\, D_i\, \mathbf{m}_{ki}$

Matrix Message Updates

[Figure: a single cycle on four nodes, 1-2-3-4-1.]

A series of message updates is a series of matrix multiplications. Following the messages around the cycle back to node 1,

$\mathbf{m}_{41} \leftarrow \kappa\, M_{14} D_4\, M_{43} D_3\, M_{32} D_2\, M_{21} D_1\, \mathbf{m}_{41} = \kappa\, C\, \mathbf{m}_{41}$

so after $t$ trips around the cycle, $\mathbf{m}^{(t)}_{41} = \kappa\, C^t\, \mathbf{m}^{(0)}_{41}$.

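Power Method Lemma

$\mathbf{m}^{(t)} = \kappa\, C^t\, \mathbf{m}^{(0)}$ converges to the principal eigenvector $\mathbf{u}_1$ of $C$, where $C \mathbf{u}_1 = \lambda_1 \mathbf{u}_1$.

The convergence rate is governed by the ratio of the first two eigenvalues, $|\lambda_2| / |\lambda_1|$.

This applies if
the eigenvalues satisfy $|\lambda_1| > |\lambda_2| \ge \dots \ge |\lambda_n|$ (e.g. if the distribution is positive, so that $C$ is a positive matrix and the Perron-Frobenius theorem applies), and
the initial vector $\mathbf{m}^{(0)}$ has a nonzero component along $\mathbf{u}_1$.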
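A small numerical illustration of the power method lemma, using a made-up positive 2×2 matrix `A` in place of $C$ (the matrix and the names are assumptions for illustration only). Repeatedly multiplying and normalizing drives the vector to the principal eigenvector, and the error shrinks by roughly $|\lambda_2| / |\lambda_1|$ per step.

```python
from math import sqrt

# A made-up positive 2x2 matrix standing in for the cycle matrix C.
A = [[2.0, 1.0], [1.0, 3.0]]


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def normalize(v):
    s = sum(v)
    return [x / s for x in v]


# Closed-form eigenvalues of a 2x2 matrix: roots of lam^2 - tr*lam + det = 0.
tr = A[0][0] + A[1][1]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
lam1 = (tr + sqrt(tr * tr - 4 * det)) / 2
lam2 = (tr - sqrt(tr * tr - 4 * det)) / 2
# Principal eigenvector from the first row of (A - lam1 I) u = 0.
u1 = normalize([A[0][1], lam1 - A[0][0]])

v = normalize([1.0, 1.0])            # initial vector, not orthogonal to u1
errors = []
for t in range(12):
    errors.append(max(abs(a - b) for a, b in zip(v, u1)))
    v = normalize(matvec(A, v))      # one power-method step
ratios = [errors[t + 1] / errors[t] for t in range(len(errors) - 1)]
print(ratios[-1], lam2 / lam1)       # per-step error reduction vs. eigenvalue ratio
```

For this matrix $|\lambda_2| / |\lambda_1| \approx 0.38$, so each step removes most of the remaining error; a ratio close to 1 would mean slow convergence.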

True Marginals

The sums and multiplications performed when computing the marginals are a distributed version of the sums and multiplications performed when computing the diagonal elements of $C$:

$\mathbf{p}_1 = \kappa\, \mathrm{diag}(C)$, that is, $p(x_1 = a) = \kappa\, C_{aa}$.

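Beliefs

$D_1 \mathbf{m}_{21}$ is the left principal eigenvector of $C$. Messages passed around the cycle in the opposite direction satisfy $\mathbf{m}_{21} \leftarrow \kappa\, C' \mathbf{m}_{21}$ with $C' = M_{12} D_2\, M_{23} D_3\, M_{34} D_4\, M_{41} D_1$, and because $M_{ji} = M_{ij}^\top$, $C^\top = D_1 C' D_1^{-1}$ (so $C'$ has the same eigenvalues as $C$). Hence

$(D_1 \mathbf{m}_{21})^\top C = (D_1 C' \mathbf{m}_{21})^\top = \lambda_1 (D_1 \mathbf{m}_{21})^\top$

The steady-state beliefs $\mathbf{b}_1 = \kappa\, \mathrm{diag}(\mathbf{m}_{41})\, D_1 \mathbf{m}_{21}$ are therefore the diagonal elements of the outer product of the right and left principal eigenvectors of $C$.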

Beliefs vs True Marginals

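The diagonal elements of the outer product of the right and left principal eigenvectors are an approximation of the diagonal elements of $C$. If $C$ is diagonalizable, $C = \sum_k \lambda_k \mathbf{u}_k \mathbf{w}_k^\top$ with $\mathbf{w}_k^\top \mathbf{u}_l = \delta_{kl}$ and $\mathbf{b}_1 = \mathrm{diag}(\mathbf{u}_1 \mathbf{w}_1^\top)$, so

$\mathbf{p}_1 = \frac{\mathrm{diag}(C)}{\mathrm{tr}(C)} = \frac{\lambda_1}{\sum_k \lambda_k}\, \mathbf{b}_1 + \sum_{k \ge 2} \frac{\lambda_k}{\sum_l \lambda_l}\, \mathrm{diag}(\mathbf{u}_k \mathbf{w}_k^\top)$

The goodness of the approximation depends on the ratio $\lambda_1 / \sum_k \lambda_k$.

Recall that the convergence rate depends on a similar ratio, $|\lambda_2| / |\lambda_1|$.

The faster the convergence, the better the approximation.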
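The eigenvector argument can be checked numerically. This sketch uses a hypothetical four-node binary cycle with made-up positive potentials, indexed 0 to 3 instead of 1 to 4. It builds $C$ and the opposite-direction matrix $C'$ (`C2` in the code), obtains the steady-state messages into node 0 by the power method, and compares the resulting belief with (a) the normalized diagonal of $\mathbf{u}_1 \mathbf{w}_1^\top$ and (b) the exact marginal via $p(x_0 = a) = (\lambda_1 b_a + \lambda_2 (1 - b_a)) / (\lambda_1 + \lambda_2)$. The latter is the two-state case of the eigen-decomposition of $\mathrm{diag}(C)$, using $\sum_k \mathbf{u}_k \mathbf{w}_k^\top = I$. All helper names are illustrative.

```python
from functools import reduce

# Hypothetical binary cycle 0-1-2-3-0 with made-up positive potentials
# (node k here plays the role of node k + 1 on the slides).
psi = [[1.0, 2.0], [1.0, 1.0], [3.0, 1.0], [1.0, 1.5]]
psi_e = {(0, 1): [[2.0, 1.0], [1.0, 2.0]], (1, 2): [[2.0, 1.0], [1.0, 2.0]],
         (2, 3): [[1.0, 3.0], [3.0, 1.0]], (3, 0): [[2.0, 1.0], [1.0, 2.0]]}


def M(i, j):
    """Pairwise matrix with M(i, j)[a][b] = psi_ij(x_i = a, x_j = b)."""
    if (i, j) in psi_e:
        return psi_e[(i, j)]
    t = psi_e[(j, i)]
    return [[t[b][a] for b in range(2)] for a in range(2)]


def D(i):
    """Diagonal matrix holding the single-node potential psi_i."""
    return [[psi[i][0], 0.0], [0.0, psi[i][1]]]


def matmul(A, B):
    return [[sum(A[r][k] * B[k][c] for k in range(2)) for c in range(2)] for r in range(2)]


def matvec(A, v):
    return [sum(A[r][k] * v[k] for k in range(2)) for r in range(2)]


def transpose(A):
    return [[A[c][r] for c in range(2)] for r in range(2)]


def normalize(v):
    s = sum(v)
    return [x / s for x in v]


def principal(A, iters=500):
    """Power method: normalized principal right eigenvector of a positive matrix."""
    v = [0.5, 0.5]
    for _ in range(iters):
        v = normalize(matvec(A, v))
    return v


def chain(*ms):
    return reduce(matmul, ms)


# Messages travelling 0 -> 1 -> 2 -> 3 -> 0 arrive at node 0 as m_30 <- C m_30.
C = chain(M(0, 3), D(3), M(3, 2), D(2), M(2, 1), D(1), M(1, 0), D(0))
# Messages travelling the other way arrive at node 0 as m_10 <- C2 m_10.
C2 = chain(M(0, 1), D(1), M(1, 2), D(2), M(2, 3), D(3), M(3, 0), D(0))

m30, m10 = principal(C), principal(C2)          # steady-state messages into node 0
belief = normalize([psi[0][a] * m30[a] * m10[a] for a in range(2)])
p_true = normalize([C[0][0], C[1][1]])           # p(x_0 = a) is proportional to C[a][a]

w1 = principal(transpose(C))                     # left principal eigenvector of C
b_eig = normalize([m30[a] * w1[a] for a in range(2)])
lam1 = matvec(C, m30)[0] / m30[0]
lam2 = C[0][0] + C[1][1] - lam1                  # trace = lam1 + lam2 for a 2x2 matrix
p_pred = [(lam1 * b + lam2 * (1 - b)) / (lam1 + lam2) for b in belief]
print(belief, p_true, p_pred)
```

The gap between `belief` and `p_true` scales with $\lambda_2 / (\lambda_1 + \lambda_2)$, so couplings that make $C$ closer to rank one give both faster convergence and more accurate beliefs.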

Algebraic Analysis Recap

By representing the sum-product algorithm on a single-cycle graph as a series of matrix multiplications, we showed the following results:

The sum-product algorithm converges for positive distributions.

Both the convergence rate and the accuracy of the steady-state beliefs depend on the relative size of the first eigenvalue of the same matrix $C$.

Unwrapped Tree

To analyze the BP algorithm, we construct the unwrapped tree, $T$, an acyclic graph that is locally equivalent to the original graph, $G$.

[Figure: the four-node cycle 1-2-3-4-1 and its unwrapped tree.]

Unwrapped Tree Analysis

[Diagram: proof roadmap linking what we want to prove about the original graph $G$ (Want) to what we already know about the unwrapped tree $T$ (Know), through two intermediate steps to be shown (Show).]

Unwrapped Tree Overview

The unwrapped tree was used to prove:

The max-product assignment is exact for a graph containing a single cycle.

The max-product assignment has a higher probability than any other assignment in a large neighborhood.

Unwrapped Tree Construction

The unwrapped tree, $T$, is constructed from $G$ as follows:

Choose an arbitrary root node $r$. Initialize $T = \{r\}$. Repeat:

For each leaf $\ell$ of $T$, find the neighbors of the corresponding node in $G$, other than the parent of $\ell$ in $T$. Add these nodes to the tree as children of $\ell$. (The root has no parent, so all of its neighbors are added.)
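A minimal sketch of the construction, truncated after a fixed number of levels because the unwrapped tree of a loopy graph is infinite. The graph is a plain adjacency dictionary and `unwrap` is a hypothetical helper; each tree node records which node of $G$ it copies, the index of its parent in the tree, and its level.

```python
def unwrap(adj, root, depth):
    """Truncated unwrapped tree of the graph `adj`, rooted at `root`.

    Returns a list of (original_node, parent_index, level) tuples; entry 0 is the root.
    Each leaf gets one child per neighbor of its original node, except the node its
    parent was copied from (the root has no parent, so all of its neighbors are added).
    """
    tree = [(root, None, 0)]
    frontier = [0]
    for level in range(1, depth + 1):
        next_frontier = []
        for leaf in frontier:
            orig, parent, _ = tree[leaf]
            parent_orig = tree[parent][0] if parent is not None else None
            for nb in adj[orig]:
                if nb != parent_orig:
                    tree.append((nb, leaf, level))
                    next_frontier.append(len(tree) - 1)
        frontier = next_frontier
    return tree


# The four-node cycle 1-2-3-4-1.
cycle = {1: [2, 4], 2: [1, 3], 3: [2, 4], 4: [3, 1]}
T = unwrap(cycle, root=1, depth=4)
for level in range(5):
    print(level, [orig for orig, _, lvl in T if lvl == level])
# prints: 0 [1], 1 [2, 4], 2 [3, 3], 3 [4, 2], 4 [1, 1]
```

For the four-node cycle, every level after the root contains exactly two copies, so the unwrapped tree is a chain extending in both directions from the root.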

Unwrapped Tree Construction

Copy the potentials from the corresponding nodes in $G$. Modify the leaf single-node potentials to simulate the steady-state messages in $G$. Because the graphs are locally the same, the max-product beliefs at the interior nodes of $T$ will be replicas of the steady-state beliefs in $G$. Since BP is exact on the tree $T$, the optimal assignment of $T$ therefore consists of replicas of the max-product assignment $x^*$ of $G$.

Graphs Containing a Single Cycle

For a graph containing a single cycle, the unwrapped tree is an infinite chain. We can construct $T$ so that each node is replicated $n$ times in the interior.

Graphs Containing a Single Cycle

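Let $L_G(x)$ be the log-likelihood of assignment $x$ for $G$. Since the interior of $T$ consists of $n$ replicas of $G$, the log-likelihood for $T$ of the assignment that replicates $x$ is

$L_T(x) = n\, L_G(x) + L_\ell(x)$

where $L_\ell(x)$ is the log-likelihood for the two leaf nodes. As $L_\ell(x)$ does not depend on $n$, in the limit as $n \to \infty$,

$\frac{1}{n} L_T(x) \to L_G(x)$.

Because the optimal assignment of $T$ replicates the max-product assignment $x^*$, $n\, L_G(x^*) + L_\ell(x^*) \ge n\, L_G(x) + L_\ell(x)$ for every $x$ and every $n$; dividing by $n$ and letting $n \to \infty$ gives $L_G(x^*) \ge L_G(x)$.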

Optimality for Arbitrary Graphs

Let $S$ be a set of nodes whose induced subgraph contains at most one cycle per connected component. We can show that $x^*$ has a higher probability than any other assignment $x$ that differs from $x^*$ only on nodes in $S$.

Reparameterization Analysis

The previous two analysis techniques analyzed the message-passing dynamics of BP. The reparameterization technique analyzes the steady-state beliefs.

Reparameterization Overview

We show that the beliefs define another parameterization of the distribution $p(x)$.

In this parameterization,
we show that the steady-state beliefs are consistent w.r.t. every subtree, and
we show that the max-product assignment satisfies an optimality condition w.r.t. every subgraph with at most one cycle per connected component.

Steady-State Beliefs

At steady state, the max-product messages satisfy

$m_{ji}(x_i) = \kappa \max_{x_j} \psi_j(x_j)\, \psi_{ij}(x_i, x_j) \prod_{k \in N(j) \setminus i} m_{kj}(x_j)$

We analyze the steady-state single- and pair-node beliefs:

$b_i(x_i) = \kappa\, \psi_i(x_i) \prod_{k \in N(i)} m_{ki}(x_i)$

$b_{ij}(x_i, x_j) = \kappa\, \psi_{ij}(x_i, x_j)\, \psi_i(x_i)\, \psi_j(x_j) \prod_{k \in N(i) \setminus j} m_{ki}(x_i) \prod_{l \in N(j) \setminus i} m_{lj}(x_j)$

Belief Parameterization

The beliefs $b$ define another parameterization of the distribution:

[Figure: the six-node graph drawn twice, once labeled with the potentials $\psi_i$ and $\psi_{ij}$ and once with $b_i$ and $b_{ij} / (b_i b_j)$.]

$p(x) = \frac{1}{Z} \prod_{i \in V} \psi_i(x_i) \prod_{(i,j) \in E} \psi_{ij}(x_i, x_j) = \frac{1}{Z'} \prod_{i \in V} b_i(x_i) \prod_{(i,j) \in E} \frac{b_{ij}(x_i, x_j)}{b_i(x_i)\, b_j(x_j)}$

as can be shown by substituting in the definitions of $b_i$ and $b_{ij}$: each ratio reduces to $\kappa\, \psi_{ij}(x_i, x_j) / (m_{ji}(x_i)\, m_{ij}(x_j))$, and every message then cancels against the same message in a single-node belief.
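The substitution argument can be checked numerically. It never uses the fixed-point equations, so the identity holds for any positive messages; this sketch therefore uses random made-up potentials and messages on a hypothetical four-node cycle, defines unnormalized beliefs (taking $\kappa = 1$), and checks that $\prod_i b_i \prod_{(i,j)} b_{ij} / (b_i b_j)$ equals the original unnormalized product of potentials for every assignment. All names are illustrative.

```python
import random
from itertools import product
from math import prod

random.seed(0)
K = 2
nodes = [0, 1, 2, 3]
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]              # a single cycle
nbrs = {i: [b if a == i else a for (a, b) in edges if i in (a, b)] for i in nodes}
psi = {i: [random.uniform(0.5, 2.0) for _ in range(K)] for i in nodes}
psi_e = {e: [[random.uniform(0.5, 2.0) for _ in range(K)] for _ in range(K)] for e in edges}
# Arbitrary positive messages m_ji(x_i); they need not be a BP fixed point.
msg = {(j, i): [random.uniform(0.5, 2.0) for _ in range(K)] for i in nodes for j in nbrs[i]}


def b_node(i, xi):
    """Unnormalized single-node belief: psi_i times all incoming messages."""
    return psi[i][xi] * prod(msg[(k, i)][xi] for k in nbrs[i])


def b_edge(i, j, xi, xj):
    """Unnormalized pair belief: omits the two messages sent along edge (i, j) itself."""
    return (psi_e[(i, j)][xi][xj] * psi[i][xi] * psi[j][xj]
            * prod(msg[(k, i)][xi] for k in nbrs[i] if k != j)
            * prod(msg[(l, j)][xj] for l in nbrs[j] if l != i))


def p_original(x):
    return prod(psi[i][x[i]] for i in nodes) * prod(psi_e[(i, j)][x[i]][x[j]] for i, j in edges)


def p_beliefs(x):
    return prod(b_node(i, x[i]) for i in nodes) * prod(
        b_edge(i, j, x[i], x[j]) / (b_node(i, x[i]) * b_node(j, x[j])) for i, j in edges)


ratios = [p_original(x) / p_beliefs(x) for x in product(range(K), repeat=len(nodes))]
print(min(ratios), max(ratios))   # both 1 up to rounding
```

With normalized beliefs the ratio is a constant other than 1, which is the $Z'$ in the formula.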

Consistency

Definition: Let $G' = (V', E')$ be a subgraph with distribution

$p_{G'}(x) \propto \prod_{i \in V'} b_i(x_i) \prod_{(i,j) \in E'} \frac{b_{ij}(x_i, x_j)}{b_i(x_i)\, b_j(x_j)}$

The beliefs $b$ are consistent w.r.t. $G'$ if the corresponding beliefs $\{b_i : i \in V'\}$ are the true max-marginals of $p_{G'}$.

Edge Consistency

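The edge beliefs are consistent:

$\max_{x_j} b_{ij}(x_i, x_j) = \kappa\, b_i(x_i)$

as can be seen by substituting in the message definitions of $b_{ij}$ and $b_i$: maximizing $b_{ij}$ over $x_j$ reproduces the message $m_{ji}(x_i)$, which is the only factor of $b_i$ missing from $b_{ij}$.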
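A sketch of edge consistency for max-product on a hypothetical four-node binary chain with made-up potentials. A tree is used only so that the unnormalized max-product iteration is guaranteed to reach its fixed point; edge consistency itself holds at any fixed point. The check also confirms that $b_{ij}(x_i, x_j) \le b_j(x_j)$ for every pair of values, the bound that the Tree-Plus-Cycle argument relies on. Names are illustrative.

```python
from math import prod

# Hypothetical chain 0 - 1 - 2 - 3 with binary states and made-up potentials.
K = 2
nodes = [0, 1, 2, 3]
edges = [(0, 1), (1, 2), (2, 3)]
nbrs = {i: [b if a == i else a for (a, b) in edges if i in (a, b)] for i in nodes}
psi = {0: [1.0, 2.0], 1: [1.5, 1.0], 2: [1.0, 3.0], 3: [2.0, 1.0]}
psi_e = {e: [[3.0, 1.0], [1.0, 2.0]] for e in edges}


def pair(i, j, xi, xj):
    return psi_e[(i, j)][xi][xj] if (i, j) in psi_e else psi_e[(j, i)][xj][xi]


def max_product(iters=20):
    """Synchronous max-product without normalization; on a tree it reaches its fixed point."""
    msg = {(j, i): [1.0] * K for i in nodes for j in nbrs[i]}
    for _ in range(iters):
        msg = {(j, i): [max(psi[j][xj] * pair(i, j, xi, xj)
                            * prod(msg[(k, j)][xj] for k in nbrs[j] if k != i)
                            for xj in range(K)) for xi in range(K)]
               for (j, i) in msg}
    return msg


msg = max_product()


def b_node(i, xi):
    return psi[i][xi] * prod(msg[(k, i)][xi] for k in nbrs[i])


def b_edge(i, j, xi, xj):
    return (pair(i, j, xi, xj) * psi[i][xi] * psi[j][xj]
            * prod(msg[(k, i)][xi] for k in nbrs[i] if k != j)
            * prod(msg[(l, j)][xj] for l in nbrs[j] if l != i))


# Edge consistency: max over x_j of b_ij(x_i, x_j) reproduces b_i(x_i) (here kappa = 1).
gaps = [abs(max(b_edge(i, j, xi, xj) for xj in range(K)) - b_node(i, xi)) / b_node(i, xi)
        for a, b in edges for i, j in ((a, b), (b, a)) for xi in range(K)]
x_star = [max(range(K), key=lambda v, i=i: b_node(i, v)) for i in nodes]
print(max(gaps), x_star)
```

Because the messages are left unnormalized here, the beliefs equal the unnormalized max-marginals exactly, which makes the consistency relation hold with $\kappa = 1$.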

Tree Consistency

The steady-state beliefs $b$ are consistent w.r.t. every subtree of $G$. This is shown by exploiting the edge consistency to remove leaf nodes one at a time. In the end, we are left with only two nodes, a trivial base case.

Tree Plus Cycle Optimality

Let $G' = (V', E')$ be a subgraph of $G$ with at most one cycle per connected component and distribution

$p_{G'}(x) \propto \prod_{i \in V'} b_i(x_i) \prod_{(i,j) \in E'} \frac{b_{ij}(x_i, x_j)}{b_i(x_i)\, b_j(x_j)}$

The max-product assignment $x^*$ maximizes $p_{G'}$.

Tree Plus Cycle Optimality

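Using the edge consistency described above, with the beliefs scaled so that $\max_{x_i} b_{ij}(x_i, x_j) = b_j(x_j)$, we can show that

$\frac{b_{ij}(x^*_i, x^*_j)}{b_j(x^*_j)} \ge \frac{b_{ij}(x_i, x_j)}{b_j(x_j)}$

for any other assignment $x \ne x^*$ (the right-hand side is at most 1, and the left-hand side equals 1 when the beliefs have unique maxima).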

Tree Plus Cycle Optimality

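If $G'$ is a connected subgraph containing one cycle, then the edges of $G'$ can be directed so that each node has exactly one parent:

$p_{G'}(x) \propto \prod_{i \in V'} \frac{b_{i\pi(i)}(x_i, x_{\pi(i)})}{b_{\pi(i)}(x_{\pi(i)})} \le \prod_{i \in V'} \frac{b_{i\pi(i)}(x^*_i, x^*_{\pi(i)})}{b_{\pi(i)}(x^*_{\pi(i)})} \propto p_{G'}(x^*)$

where $\pi(i)$ is the parent of $i$. The first proportionality holds because each $b_v$ appears once in the numerator of the belief parameterization and $\deg(v)$ times in its denominators, leaving $\deg(v) - 1$ factors in the denominator, which is exactly the number of children of $v$.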
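The rewriting into parent form is pure bookkeeping, so it can be checked with arbitrary positive numbers standing in for the beliefs. This sketch uses a hypothetical four-node cycle oriented $0 \to 1 \to 2 \to 3 \to 0$ (so `parent[i]` plays the role of $\pi(i)$), random made-up values for $b_i$ and $b_{ij}$, and confirms that the two products agree for every assignment. Names are illustrative.

```python
import random
from itertools import product
from math import prod

random.seed(1)
K = 2
nodes = [0, 1, 2, 3]
parent = {0: 1, 1: 2, 2: 3, 3: 0}          # cycle directed 0 -> 1 -> 2 -> 3 -> 0
edges = [(i, parent[i]) for i in nodes]    # every node has exactly one parent
# Arbitrary positive numbers standing in for b_i(x_i) and b_ij(x_i, x_j) (hypothetical).
b = {i: [random.uniform(0.5, 2.0) for _ in range(K)] for i in nodes}
b2 = {e: [[random.uniform(0.5, 2.0) for _ in range(K)] for _ in range(K)] for e in edges}


def belief_form(x):
    """prod_i b_i(x_i) times, over edges, b_ij / (b_i b_j)."""
    return prod(b[i][x[i]] for i in nodes) * prod(
        b2[(i, j)][x[i]][x[j]] / (b[i][x[i]] * b[j][x[j]]) for i, j in edges)


def parent_form(x):
    """prod_i b_{i,pi(i)}(x_i, x_pi(i)) / b_pi(i)(x_pi(i))."""
    return prod(b2[(i, parent[i])][x[i]][x[parent[i]]] / b[parent[i]][x[parent[i]]]
                for i in nodes)


pairs = [(belief_form(x), parent_form(x)) for x in product(range(K), repeat=len(nodes))]
print(max(abs(r / q - 1) for r, q in pairs))   # zero up to rounding
```

Once in parent form, each factor is bounded by its value at $x^*$ using the edge-consistency bound, which is what makes the optimality argument go through.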

Corollaries of TPS Optimality

The two results proved using the unwrapped tree are corollaries of the Tree-Plus-Cycle optimality. The Tree-Plus-Cycle optimality can also be used to show an error bound on the max-product assignment for an arbitrary graph.

Future Work

I have presented three techniques for analyzing loopy BP. Experimental results are better than the results proved so far. Future work includes extending each technique to be more general and to prove stronger results:

Prove convergence properties of the max-product algorithm on a single-cycle graph using algebraic analysis.

Prove the optimality of the max-product algorithm for specific multiple-loop graphs using the unwrapped tree technique.

Show more powerful optimality results for arbitrary graph structures with specific potential properties.

References

Aji, S., Horn, G., and McEliece, R. (1998). On the convergence of iterative decoding on graphs with a single cycle. In IEEE International Symposium on Information Theory.

McEliece, R., Rodemich, E., and Cheng, J. (1995). The Turbo decision algorithm. In 33rd Allerton Conference on Communications, Control and Computing, Monticello, IL.

Murphy, K., Weiss, Y., and Jordan, M. (1999). Loopy belief propagation for approximate inference: An empirical study. In Uncertainty in Artificial Intelligence, pages 467–475.

Pearl, J. (1988). Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann Publishers, Inc., San Mateo, CA.

Wainwright, M. (January, 2002). Stochastic Processes on Graphs with Cycles: Geometric and Variational Approaches. PhD thesis, MIT, Cambridge, MA.

Wainwright, M., Jaakkola, T., and Willsky, A. (2003). Tree-based reparameterization framework for analysis of sum-product and related algorithms. IEEE Transactions on Information Theory, 49(5).

Wainwright, M., Jaakkola, T., and Willsky, A. (October 28, 2002). Tree consistency and bounds on the performance of the max-product algorithm and its generalizations. Technical Report P–2554, Laboratory for Information and Decision Systems, MIT.

Weiss, Y. (November, 1997). Belief propagation and revision in networks with loops. Technical Report AI Memo No. 1616, C.B.C.L. Paper No. 155, AI Lab, MIT.

Weiss, Y. and Freeman, W. (2001a). Correctness of belief propagation in Gaussian graphical models of arbitrary topology. Neural Computation, 13:2173–2200.

Weiss, Y. and Freeman, W. (2001b). On the optimality of solutions of the max-product belief propagation algorithm in arbitrary graphs. IEEE Transactions on Information Theory, 47(2):723–735.