Physics of Algorithms Talk


Orbit-Product Analysis of (Generalized) Gaussian Belief Propagation

Jason Johnson, Post-Doctoral Fellow, LANL
Joint work with Michael Chertkov and Vladimir Chernyak

Physics of Algorithms Workshop, Santa Fe, New Mexico

September 3, 2009

Overview

Introduction

- graphical models + belief propagation
- specialization to the Gaussian model

Analysis of Gaussian BP

- walk-sum analysis for means, variances, covariances [1]
- orbit-product analysis/corrections for the determinant [2]

Current Work on Generalized Belief Propagation (GBP) [Yedidia et al.]

- uses larger “regions” to capture more walks/orbits of the graph (better approximation)
- however, it can also lead to over-counting of walks/orbits (bad approximation/unstable algorithm)!

[1] Earlier joint work with Malioutov & Willsky (NIPS, JMLR ’06).
[2] Johnson, Chernyak & Chertkov (ICML ’09).

Graphical Models

A graphical model is a multivariate probability distribution that is expressed in terms of interactions among subsets of variables (e.g. pairwise interactions on the edges of a graph G).

P(x) = (1/Z) ∏_{i∈V} ψ_i(x_i) ∏_{{i,j}∈G} ψ_ij(x_i, x_j)

Markov property (S separates A and B in the graph):

P(x_A, x_B | x_S) = P(x_A | x_S) P(x_B | x_S)

Given the potential functions ψ, the goal of inference is to compute marginals

P(x_i) = ∑_{x_{V\i}} P(x)

or the normalization constant Z, which is generally difficult in large, complex graphical models.
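To make the inference problem concrete, here is a minimal brute-force sketch (not from the talk): it enumerates every state of a small, invented 3-variable chain to compute Z and a marginal, exactly the computation that becomes intractable on large graphs and motivates belief propagation.

```python
# Brute-force inference in a tiny pairwise model (illustrative only;
# the 3-node chain and potential tables below are invented).
import itertools
import numpy as np

V = [0, 1, 2]                                    # binary variables
E = [(0, 1), (1, 2)]                             # edges of the graph G
psi_i = {i: np.array([1.0, 2.0]) for i in V}     # node potentials psi_i(x_i)
psi_ij = {e: np.array([[2.0, 1.0],
                       [1.0, 2.0]]) for e in E}  # edge potentials psi_ij(x_i, x_j)

def unnormalized(x):
    """Product of all potentials at joint state x."""
    p = 1.0
    for i in V:
        p *= psi_i[i][x[i]]
    for (i, j) in E:
        p *= psi_ij[(i, j)][x[i], x[j]]
    return p

states = list(itertools.product([0, 1], repeat=len(V)))
Z = sum(unnormalized(x) for x in states)                      # normalization constant
P0 = [sum(unnormalized(x) for x in states if x[0] == k) / Z   # marginal P(x_0)
      for k in (0, 1)]
print("Z =", Z, " P(x_0) =", P0)
```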

Gaussian Graphical Model

Information form of Gaussian density.

P(x) ∝ exp{−(1/2) x^T J x + h^T x}

Gaussian graphical model: sparse J matrix

J_ij ≠ 0 if and only if {i, j} ∈ G

Potentials:

ψ_i(x_i) = exp{−(1/2) J_ii x_i^2 + h_i x_i},  ψ_ij(x_i, x_j) = exp{−J_ij x_i x_j}

Inference corresponds to calculation of the mean vector µ = J^{-1} h, the covariance matrix K = J^{-1}, or the determinant Z = det J^{-1}. Marginals P(x_i) are specified by the means µ_i and variances K_ii.
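As a minimal sketch (not from the talk), these target quantities can be computed by direct dense linear algebra; the 3×3 grid model below is an invented example, and the BP algorithms that follow approximate exactly these outputs without ever forming J^{-1}.

```python
# Exact Gaussian inference: means, covariances, and the determinant.
import numpy as np

n = 9                                   # invented 3x3 grid model, nodes 0..8
edges = []
for a in range(3):
    for b in range(3):
        i = 3 * a + b
        if b < 2: edges.append((i, i + 1))   # horizontal grid edge
        if a < 2: edges.append((i, i + 3))   # vertical grid edge

J = np.eye(n)                           # information matrix (sparse on the grid)
for (i, j) in edges:
    J[i, j] = J[j, i] = -0.2
h = np.ones(n)

mu = np.linalg.solve(J, h)              # means       mu = J^{-1} h
K = np.linalg.inv(J)                    # covariance  K  = J^{-1}
Z = 1.0 / np.linalg.det(J)              # determinant Z  = det(J^{-1})
print(mu[4], K[4, 4], Z)
```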

Belief Propagation

Belief Propagation iteratively updates a set of messages µ_{i→j}(x_j) defined on the directed edges of the graph G using the rule:

µ_{i→j}(x_j) ∝ ∑_{x_i} ψ_i(x_i) ∏_{k∈N(i)\j} µ_{k→i}(x_i) ψ(x_i, x_j)

Iterate the message updates until they converge to a fixed point.

Marginal Estimates: combine messages at a node

P(x_i) = (1/Z_i) ψ_i(x_i) ∏_{k∈N(i)} µ_{k→i}(x_i) ≜ (1/Z_i) ψ̃_i(x_i)

Belief Propagation II

Pairwise Estimates (on edges of graph):

P(x_i, x_j) = (1/Z_ij) ψ̃_i(x_i) ψ̃_j(x_j) ψ̃_ij(x_i, x_j), where ψ̃_ij(x_i, x_j) ≜ ψ(x_i, x_j) / (µ_{i→j}(x_j) µ_{j→i}(x_i))

Estimate of Normalization Constant:

Z_bp = ∏_{i∈V} Z_i ∏_{{i,j}∈G} Z_ij / (Z_i Z_j)

A BP fixed point is a saddle point of the right-hand side with respect to messages/reparameterizations.

In trees, BP converges in a finite number of steps and is exact (equivalent to variable elimination).

Gaussian Belief Propagation (GaBP)

Messages: µ_{i→j}(x_j) ∝ exp{(1/2) α_{i→j} x_j^2 + β_{i→j} x_j}.

BP fixed-point equations reduce to:

α_{i→j} = J_ij^2 (J_ii − α_{i\j})^{-1}

β_{i→j} = −J_ij (J_ii − α_{i\j})^{-1} (h_i + β_{i\j})

where α_{i\j} = ∑_{k∈N(i)\j} α_{k→i} and β_{i\j} = ∑_{k∈N(i)\j} β_{k→i}.

Marginals specified by:

K_i^bp = (J_ii − ∑_{k∈N(i)} α_{k→i})^{-1}

µ_i^bp = K_i^bp (h_i + ∑_{k∈N(i)} β_{k→i})
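A minimal sketch (not from the talk) of these updates on the same invented 3×3 grid model used earlier; alpha and beta hold one message parameter pair per directed edge, and the marginal formulas above are applied at the end.

```python
# Gaussian BP message iteration (a sketch of the fixed-point updates above).
import numpy as np

n = 9
edges = [(3*a + b, 3*a + b + 1) for a in range(3) for b in range(2)] + \
        [(3*a + b, 3*a + b + 3) for a in range(2) for b in range(3)]
J = np.eye(n)
for (i, j) in edges:
    J[i, j] = J[j, i] = -0.2            # invented 3x3 grid model
h = np.ones(n)
N = {i: [j for j in range(n) if j != i and J[i, j] != 0.0] for i in range(n)}

alpha = {(i, j): 0.0 for i in range(n) for j in N[i]}    # message i -> j
beta = {(i, j): 0.0 for i in range(n) for j in N[i]}
for _ in range(200):                                     # iterate to a fixed point
    for (i, j) in alpha:
        a = sum(alpha[(k, i)] for k in N[i] if k != j)   # alpha_{i\j}
        b = sum(beta[(k, i)] for k in N[i] if k != j)    # beta_{i\j}
        alpha[(i, j)] = J[i, j]**2 / (J[i, i] - a)
        beta[(i, j)] = -J[i, j] / (J[i, i] - a) * (h[i] + b)

K_bp = [1.0 / (J[i, i] - sum(alpha[(k, i)] for k in N[i])) for i in range(n)]
mu_bp = [K_bp[i] * (h[i] + sum(beta[(k, i)] for k in N[i])) for i in range(n)]

print(mu_bp[4], np.linalg.solve(J, h)[4])   # means agree (walk-summable model)
print(K_bp[4], np.linalg.inv(J)[4, 4])      # loopy variance is underestimated
```

As discussed later in the talk, the means come out exact for this walk-summable model while the loopy variances are underestimated.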

Gaussian BP Determinant Estimate

Estimates of pairwise covariance on edges:

K_(ij)^bp = [ J_ii − α_{i\j}    J_ij
              J_ij              J_jj − α_{j\i} ]^{-1}

Estimate of Z ≜ det K = det J^{-1}:

Z_bp = ∏_{i∈V} Z_i ∏_{{i,j}∈G} Z_ij / (Z_i Z_j)

where Z_i = K_i^bp and Z_ij = det K_(ij)^bp.

Exact in tree models (equivalent to Gaussian elimination), approximate in loopy models.
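A minimal sketch (not from the talk) of this determinant estimate on K4 with uniform edge weight r = 0.32, the same setting as the toy example near the end of the talk; only the α messages are needed.

```python
# GaBP determinant estimate Z_bp vs. the true Z = det(J)^{-1} on K4.
import numpy as np

n, r = 4, 0.32                       # walk-summable: rho(|R|) = 3r = 0.96 < 1
J = np.eye(n) - r * (np.ones((n, n)) - np.eye(n))
N = {i: [j for j in range(n) if j != i] for i in range(n)}

alpha = {(i, j): 0.0 for i in range(n) for j in N[i]}
for _ in range(1000):                # alpha fixed-point iteration
    for (i, j) in alpha:
        alpha[(i, j)] = J[i, j]**2 / (J[i, i] - sum(alpha[(k, i)] for k in N[i] if k != j))

Z_i = {i: 1.0 / (J[i, i] - sum(alpha[(k, i)] for k in N[i])) for i in range(n)}  # = K_i^bp
Z_bp = np.prod(list(Z_i.values()))
for i in range(n):
    for j in range(i + 1, n):
        a_ij = sum(alpha[(k, i)] for k in N[i] if k != j)
        a_ji = sum(alpha[(k, j)] for k in N[j] if k != i)
        Z_ij = 1.0 / ((J[i, i] - a_ij) * (J[j, j] - a_ji) - J[i, j]**2)  # = det K_(ij)^bp
        Z_bp *= Z_ij / (Z_i[i] * Z_i[j])

print(Z_bp, 1.0 / np.linalg.det(J))  # ~2.46 vs ~10.87 on this loopy example
```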

The BP Computation Tree

BP marginal estimates are equivalent to the exact marginal in a tree-structured model [Weiss & Freeman].

[Figure: a loopy graph on nodes 1–9 and its BP computation tree, with messages µ^(1)_{5→6}, µ^(2)_{6→3}, µ^(3)_{3→2}, µ^(4)_{2→1} labeled on the tree.]

The BP messages correspond to upwards variable elimination steps in this computation tree.

Walk-Summable Gaussian Models

Let J = I − R. If ρ(R) < 1 then (I − R)^{-1} = ∑_{L=0}^∞ R^L.

Walk-Sum interpretation of inference:

K_ij = ∑_{L=0}^∞ ∑_{w: i→j, |w|=L} R^w  ?=  ∑_{w: i→j} R^w

µ_i = ∑_j h_j ∑_{L=0}^∞ ∑_{w: j→i, |w|=L} R^w  ?=  ∑_{w: ∗→i} h_∗ R^w

Walk-summable if ∑_{w: i→j} |R^w| converges for all i, j. Absolute convergence implies convergence of walk-sums (to the same value) for arbitrary orderings and partitions of the set of walks. Equivalent to ρ(|R|) < 1.
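A minimal numerical sketch (not from the talk): check ρ(|R|) < 1 and accumulate the length-ordered walk-sum ∑_L R^L for the invented K4 model (uniform r = 0.32), whose (i, j) entry sums the weights of all walks from i to j.

```python
# Walk-summability check and the length-ordered walk-sum for K = (I - R)^{-1}.
import numpy as np

n, r = 4, 0.32
R = r * (np.ones((n, n)) - np.eye(n))                 # edge weights of J = I - R
rho = np.max(np.abs(np.linalg.eigvals(np.abs(R))))
print("rho(|R|) =", rho)                              # 0.96 < 1: walk-summable

K = np.linalg.inv(np.eye(n) - R)                      # exact covariance
walk_sum, term = np.zeros((n, n)), np.eye(n)
for L in range(1000):                                 # term = R^L collects walks of length L
    walk_sum += term
    term = term @ R
print(np.max(np.abs(walk_sum - K)))                   # ~1e-16: series converges to K
```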

Walk-Sum Interpretation of GaBP

Combine the interpretation of BP as exact inference on the computation tree with the walk-sum interpretation of Gaussian inference in trees:

- messages represent walk-sums in subtrees of the computation tree
- Gaussian BP converges in walk-summable models
- complete walk-sum for the means
- incomplete walk-sum for the variances

Complete Walk-Sum for Means

Every walk in G ending at a node i maps to a walk of the computation tree T_i (ending at the root node of T_i)...

[Figure: a walk in the loopy graph G and its image in the computation tree, ending at the root.]

Gaussian BP converges to the correct means in WS models.

Incomplete Walk-Sum for Variances

Only the totally backtracking walks of G can be embedded as closed walks in the computation tree...

[Figure: the loopy graph G and its computation tree; closed walks of G that are not totally backtracking have no closed image in the tree.]

Gaussian BP converges to incorrect variance estimates (an underestimate in non-negative models).

Zeta Function and Orbit-Product

What about the determinant?

Definition of Orbits:

- A walk is closed if it begins and ends at the same vertex.
- It is primitive if it does not repeat a shorter walk.
- Two primitive walks are equivalent if one is a cyclic shift of the other.
- Define the orbits ℓ ∈ L of G to be the equivalence classes of closed, primitive walks.

Theorem. Let Z ≜ det(I − R)^{-1}. If ρ(|R|) < 1 then

Z = ∏_ℓ (1 − R_ℓ)^{-1} ≜ ∏_ℓ Z_ℓ.

A kind of zeta function in graph theory.
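A minimal sketch (not from the talk) that checks the orbit-product numerically on an invented example (the triangle with uniform weight r = 0.2): it enumerates closed walks up to a cutoff length, keeps the primitive ones, groups them into orbits by cyclic shift, and multiplies the factors Z_ℓ.

```python
# Truncated orbit-product vs. Z = det(I - R)^{-1} on a weighted triangle.
import numpy as np

n, r, Lmax = 3, 0.2, 14
A = np.ones((n, n)) - np.eye(n)           # triangle adjacency
R = r * A

def closed_walks(length):
    """All closed walks with `length` steps, as tuples of directed steps."""
    walks = []
    def extend(path):
        if len(path) == length:
            if A[path[-1], path[0]]:
                walks.append(tuple(zip(path, path[1:] + (path[0],))))
            return
        for v in range(n):
            if A[path[-1], v]:
                extend(path + (v,))
    for v0 in range(n):
        extend((v0,))
    return walks

orbits = set()
for m in range(2, Lmax + 1):
    for steps in closed_walks(m):
        # discard non-primitive walks (repetitions of a shorter closed walk)
        if any(m % d == 0 and steps == steps[d:] + steps[:d] for d in range(1, m)):
            continue
        # an orbit is an equivalence class under cyclic shifts: keep a canonical rotation
        orbits.add(min(steps[k:] + steps[:k] for k in range(m)))

Z_trunc = 1.0
for steps in orbits:
    R_l = np.prod([R[i, j] for (i, j) in steps])    # orbit weight R_l
    Z_trunc *= 1.0 / (1.0 - R_l)                    # orbit factor Z_l

print(Z_trunc, 1.0 / np.linalg.det(np.eye(n) - R))  # agree to ~1e-6 at this cutoff
```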

Zbp as Totally-Backtracking Orbit-Product

Definition of Totally-Backtracking Orbits:

- An orbit is reducible if it contains backtracking steps ...(ij)(ji)..., else it is irreducible (or backtrackless).
- Every orbit ℓ has a unique irreducible core γ = Γ(ℓ), obtained by iteratively deleting pairs of backtracking steps until no more remain. Let L_γ denote the set of all orbits that reduce to γ.
- An orbit is totally backtracking (or trivial) if it reduces to the empty orbit, Γ(ℓ) = ∅; else it is non-trivial.

Theorem. If ρ(|R|) < 1 then Z_bp (defined earlier) is equal to the totally-backtracking orbit-product:

Z_bp = ∏_{ℓ∈L_∅} Z_ℓ

Orbit-Product Correction and Error Bound

Orbit-product correction to Zbp:

Z = Z_bp ∏_{ℓ∉L_∅} Z_ℓ

Error Bound: missing orbits must all involve cycles of the graph...

(1/n) |log(Z/Z_bp)| ≤ ρ^g / (g(1 − ρ))

where ρ ≜ ρ(|R|) < 1 and g is the girth of the graph (length of the shortest cycle).
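A minimal numeric check (not from the talk) of both sides of this bound on the K4 example (n = 4, girth g = 3, uniform r = 0.32); the value of Z_bp is the one produced by the GaBP determinant sketch earlier.

```python
# Evaluate the error bound (1/n)|log(Z/Z_bp)| <= rho^g / (g (1 - rho)) on K4.
import numpy as np

n, r, g = 4, 0.32, 3
R = r * (np.ones((n, n)) - np.eye(n))
rho = np.max(np.abs(np.linalg.eigvals(np.abs(R))))   # = 3r = 0.96
Z = 1.0 / np.linalg.det(np.eye(n) - R)               # ~10.87
Z_bp = 2.457                                         # from the earlier GaBP sketch

lhs = abs(np.log(Z / Z_bp)) / n
rhs = rho**g / (g * (1.0 - rho))
print(lhs, rhs)    # ~0.37 <= ~7.37: the bound holds, loosely, since rho is near 1
```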

Reduction to Backtrackless Orbit-Product Correction

We may reduce the orbit-product correction to one over just the backtrackless orbits γ:

Z = Z_bp ∏_{ℓ∉L_∅} Z_ℓ = Z_bp ∏_γ Z'_γ, where Z'_γ ≜ ∏_{ℓ∈L_γ} Z_ℓ

with modified orbit-factors Z'_γ based on GaBP:

Z'_γ = (1 − ∏_{(ij)∈γ} r'_ij)^{-1} where r'_ij ≜ (1 − α_{i\j})^{-1} r_ij

The factor (1 − α_{i\j})^{-1} serves to reconstruct the totally-backtracking walks at each point i along the backtrackless orbit γ.

Backtrackless Determinant Correction

Define the backtrackless graph G' of G as follows: the nodes of G' correspond to the directed edges of G, with edges (ij) → (jk) for k ≠ i.

[Figure: a 3×3 grid G on nodes 1–9 and its backtrackless graph G', whose nodes are the directed edges of G.]

Let R' be the adjacency matrix of G' with the modified edge weights r' based on GaBP. Then,

Z = Z_bp det(I − R')^{-1}
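A minimal sketch (not from the talk) of this correction on the K4 test case with uniform r = 0.32 (the setting of the toy example later in the talk), following the edge-weight definition r'_ij = (1 − α_{i\j})^{-1} r_ij from the previous slide; the directed-edge ordering and variable names here are my own.

```python
# Backtrackless determinant correction: Z = Z_bp * det(I - R')^{-1} on K4.
import numpy as np

n, r = 4, 0.32
N = {i: [j for j in range(n) if j != i] for i in range(n)}
J = np.eye(n) - r * (np.ones((n, n)) - np.eye(n))      # J = I - R, r_ij = r

# GaBP alpha fixed point and Z_bp (as in the earlier sketches)
alpha = {(i, j): 0.0 for i in range(n) for j in N[i]}
for _ in range(1000):
    for (i, j) in alpha:
        alpha[(i, j)] = r**2 / (1.0 - sum(alpha[(k, i)] for k in N[i] if k != j))
K = {i: 1.0 / (1.0 - sum(alpha[(k, i)] for k in N[i])) for i in range(n)}
Z_bp = np.prod([K[i] for i in range(n)])
for i in range(n):
    for j in range(i + 1, n):
        a_ij = sum(alpha[(k, i)] for k in N[i] if k != j)
        a_ji = sum(alpha[(k, j)] for k in N[j] if k != i)
        Z_bp *= (1.0 / ((1 - a_ij) * (1 - a_ji) - r**2)) / (K[i] * K[j])

# Backtrackless graph G': nodes are directed edges (i,j); transitions (ij) -> (jk), k != i
dir_edges = [(i, j) for i in range(n) for j in N[i]]
idx = {e: t for t, e in enumerate(dir_edges)}
Rp = np.zeros((len(dir_edges), len(dir_edges)))
for (i, j) in dir_edges:
    for k in N[j]:
        if k != i:
            a_jk = sum(alpha[(l, j)] for l in N[j] if l != k)
            Rp[idx[(i, j)], idx[(j, k)]] = r / (1.0 - a_jk)   # modified weight r'_{jk}

Z_corrected = Z_bp / np.linalg.det(np.eye(len(dir_edges)) - Rp)
print(Z_corrected, 1.0 / np.linalg.det(J))    # both ~10.87: the correction is exact
```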

Region-Based Estimates/Corrections

Select a set of regions R ⊂ 2^V that is closed under intersections and covers all vertices and edges of G.

Define region counts (n_A ∈ Z, A ∈ R) by the inclusion-exclusion rule:

n_A = 1 − ∑_{B∈R: A⊊B} n_B

To capture all orbits covered by any region (without over-counting) we calculate the estimate:

Z_R ≜ ∏_B Z_B^{n_B} ≜ ∏_B (det(I − R_B)^{-1})^{n_B}

Error Bounds. Select regions to cover all orbits up to length L. Then,

(1/n) |log(Z_R/Z)| ≤ ρ^L / (L(1 − ρ))
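A minimal sketch (not from the talk) of this bookkeeping on an invented 3×3 grid with uniform r = 0.18, using the four overlapping 2×2 blocks together with their intersections as the region set (the talk's grid experiments use much larger L×L blocks); the region choice and variable names are my own.

```python
# Region-based estimate Z_R = prod_B (det(I - R_B)^{-1})^{n_B} on a 3x3 grid.
import numpy as np

n, r = 9, 0.18
R = np.zeros((n, n))
for a in range(3):
    for b in range(3):
        i = 3 * a + b
        if b < 2: R[i, i + 1] = R[i + 1, i] = r      # horizontal grid edge
        if a < 2: R[i, i + 3] = R[i + 3, i] = r      # vertical grid edge

# Regions: the four 2x2 blocks plus all pairwise intersections (the edges
# between adjacent blocks and the shared center node) -- this particular
# collection is closed under intersection and covers all vertices and edges.
blocks = [frozenset({0, 1, 3, 4}), frozenset({1, 2, 4, 5}),
          frozenset({3, 4, 6, 7}), frozenset({4, 5, 7, 8})]
regions = set(blocks)
for A in blocks:
    for B in blocks:
        if A != B:
            regions.add(A & B)

# Inclusion-exclusion counts, largest first: n_A = 1 - sum over strict supersets
counts = {}
for A in sorted(regions, key=len, reverse=True):
    counts[A] = 1 - sum(counts[B] for B in regions if A < B)

Z_R = 1.0
for A, nA in counts.items():
    idx = sorted(A)
    Z_A = 1.0 / np.linalg.det(np.eye(len(idx)) - R[np.ix_(idx, idx)])
    Z_R *= Z_A ** nA

print(Z_R, 1.0 / np.linalg.det(np.eye(n) - R))   # region-based estimate vs. true Z
```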

Example: 2-D Grids

Choice of regions for grids: overlapping L×L, (L/2)×L, L×(L/2), and (L/2)×(L/2) blocks (shifted by L/2).

For example, in a 6×6 grid with block size L = 4:

[Figure: three panels showing the three region types with counts n = +1, n = −1, n = +1.]

256 × 256 periodic grid, uniform edge weights r ∈ [0, 0.25]. Test with L = 2, 4, 8, 16, 32.

[Plots: (1) spectral radii ρ(|R|) and ρ(|R'|) versus r; (2) n^{-1} log Z_true, n^{-1} log Z_bp, and n^{-1} log Z_B for L = 2, 4, 8, ... versus r; (3) the same comparison with n^{-1} log Z_bp Z'_B; (4) errors n^{-1} |log Z_true^{-1} Z_B| and n^{-1} |log Z_true^{-1} Z_bp Z'_B| versus block size L.]

Generalized Belief Propagation

Select a set of regions R ⊂ 2^V that is closed under intersections and covers all vertices and edges of G.

Define region counts (n_A ∈ Z, A ∈ R) by the inclusion-exclusion rule:

n_A = 1 − ∑_{B∈R: A⊊B} n_B

Then, GBP solves for a saddle point of

Z_R(ψ) ≜ ∏_{A∈R} Z(ψ_A)^{n_A}

over reparameterizations {ψ_A, A ∈ R} of the form

P(x) = (1/Z) ∏_{A∈R} ψ_A(x_A)^{n_A}

Denote the saddle point by Z_gbp = Z_R(ψ^gbp).

Example: 2-D Grid Revisited

[Plot: free-energy error versus block size for the block estimate and the GBP estimate.]

GBP Toy Example

Look at the graph G = K4 and consider different choices of regions...

[Figure: the complete graph K4 on nodes 1, 2, 3, 4.]

BP regions: the six edges (n = +1) and the four nodes (n = −2).

GBP “3∆” regions: the three triangles 123, 124, 134 (n = +1), their pairwise intersections 12, 13, 14 (n = −1), and node 1 (n = +1).

GBP “4∆” regions: all four triangles 123, 124, 134, 234 (n = +1), all six edges (n = −1), and all four nodes (n = +1).

Computational experiment with equal edge weights r = 0.32 (the model becomes singular/indefinite for r ≥ 1/3).

Z = 10.9

Zbp = 2.5

Zgbp(3∆) = 9.9

Zgbp(4∆) = 54.4!!!

GBP with the 3∆ regions is a big improvement over BP (GBP captures more orbits).

What went wrong with the 4∆ method?

Orbit-Product Interpretation of GBP

Answer: sometimes GBP can overcount orbits of the graph.

- Let T(R) be the set of hypertrees T one may construct from the regions R.
- An orbit ℓ spans T if we can embed ℓ in T but cannot embed it in any sub-hypertree of T.
- Let g_ℓ ≜ #{T ∈ T(R) | ℓ spans T}.

Orbit-Product Interpretation of GBP:

Z_gbp = ∏_ℓ Z_ℓ^{g_ℓ}

Remark. GBP may also include multiples of an orbit as independent orbits (these are not counted by Z).

We say GBP is consistent if g_ℓ ≤ 1 for all (primitive) orbits and g_ℓ = 0 for multiples of orbits (no over-counting).

Examples of Over-Counting

Orbit ℓ = [(12)(23)(34)(41)]:

[Figure: this 4-cycle orbit embedded in two different hypertrees built from the 4∆ regions (triangles 134 ∪ 123 and triangles 234 ∪ 124), so it is counted twice.]

Orbit ℓ = [(12)(23)(34)(42)(21)]:

[Figure: hypertree embeddings of this orbit built from the 4∆ regions, again illustrating over-counting.]

Conclusion and Future Work

Graphical view of inference in walk-summable Gaussian graphical models that is very intuitive for understanding iterative inference algorithms and approximation methods.

Future Work:

- many open questions on GBP.
- a multiscale method to approximate longer orbits from a coarse-grained model.
- beyond walk-summable?
