Genome Evolution. Amos Tanay 2010
Genome evolution:
Lecture 9: Variational inference and Belief propagation
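Expectation-Maximization

$$P(h,s|\theta) = P(h|s,\theta)\,P(s|\theta)$$

$$\log P(s|\theta) = \log \sum_h P(h,s|\theta)$$

$$\log P(s|\theta) = \log P(h,s|\theta) - \log P(h|s,\theta)$$

$$\log P(s|\theta) = \sum_h P(h|s,\theta')\log P(h,s|\theta) - \sum_h P(h|s,\theta')\log P(h|s,\theta)$$

$$Q(\theta|\theta^k) = \sum_h P(h|s,\theta^k)\log P(h,s|\theta)$$

$$\log P(s|\theta) - \log P(s|\theta^k) = Q(\theta|\theta^k) - Q(\theta^k|\theta^k) + \sum_h P(h|s,\theta^k)\log\frac{P(h|s,\theta^k)}{P(h|s,\theta)}$$

$$Q(\theta|\theta^k) = \sum_h P(h|s,\theta^k)\log P(h,s|\theta) = \sum_h \sum_i P(h|s,\theta^k)\log P_i(h,s|\theta)$$

where i runs over the terms into which $\log P(h,s|\theta)$ decomposes.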
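Log-likelihood to Free Energy

• We have so far worked on computing the likelihood:

$$\log P(s|\theta) = \log \sum_h P(h,s|\theta)$$

• Computing the likelihood is hard. We can reformulate the problem by adding parameters and transforming it into an optimization problem. Given a trial function q, define the free energy of the model as:

$$F(q) = \sum_h q(h)\log\frac{q(h)}{\Pr(h,s|\theta)}$$

• The free energy is exactly the negative log-likelihood when q is the posterior:

$$q(h) = \Pr(h|s,\theta) \;\Rightarrow\; F(q) = \sum_h \Pr(h|s,\theta)\log\frac{\Pr(h|s,\theta)}{\Pr(h,s|\theta)} = -\sum_h \Pr(h|s,\theta)\log\Pr(s|\theta) = -\log\Pr(s|\theta)$$

• Better: when q is a distribution, the free energy bounds the likelihood:

$$F(q) = \sum_h q(h)\log\frac{q(h)}{\Pr(h|s,\theta)} - \sum_h q(h)\log\Pr(s|\theta) = D(q\,\|\,p(h|s)) - \log\Pr(s|\theta)$$

The first term is the divergence D(q || p(h|s)), which is non-negative, and the second is the log-likelihood, so $F(q) \ge -\log\Pr(s|\theta)$.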
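The bound can be checked numerically on a tiny example. This is only a sketch under assumptions: the table `joint` holds made-up values of Pr(h, s | θ) for one fixed observation s and a three-state hidden variable, and `free_energy` is a hypothetical helper name, not something from the lecture.

```python
import math

# Made-up values of Pr(h, s | theta) for the observed s and h in {0, 1, 2}.
joint = [0.10, 0.25, 0.05]

def free_energy(q, joint):
    """F(q) = sum_h q(h) * log(q(h) / Pr(h, s)); terms with q(h) = 0 contribute 0."""
    return sum(qh * math.log(qh / ph) for qh, ph in zip(q, joint) if qh > 0)

likelihood = sum(joint)                            # Pr(s) = sum_h Pr(h, s)
posterior = [p / likelihood for p in joint]        # Pr(h | s)

f_post = free_energy(posterior, joint)             # equals -log Pr(s)
f_trial = free_energy([1 / 3, 1 / 3, 1 / 3], joint)  # a different q gives a larger F
```

With the posterior as trial function the free energy coincides with the negative log-likelihood; the uniform trial function pays an extra penalty equal to its divergence from the posterior.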
Energy?? What energy?

• In statistical mechanics, a system at temperature T with states x and an energy function E(x) is characterized by Boltzmann's law:

$$p(x) = \frac{1}{Z(T)}\,e^{-E(x)/T}$$

• Z is the partition function:

$$Z(T) = \int e^{-E(x)/T}\,dx$$

• Given a model p(h,s|θ) (a BN), we can define the energy using Boltzmann's law:

$$T = 1,\qquad E(h,s|\theta) = -\log p(h,s|\theta)$$

• If we think of P(h|s,θ):

$$E(h) = -\log p(h,s|\theta),\qquad Z = p(s)$$
Free Energy and Variational Free Energy

$$p(x) = \frac{1}{Z(T)}\,e^{-E(x)/T}$$

• The Helmholtz free energy is defined in physics as:

$$F_H = -\log Z$$

• This free energy is important in statistical mechanics, but it is difficult to compute, just like our probabilistic Z (= p(s)).

• The variational transformation introduces trial functions q(h), and sets the variational free energy (or Gibbs free energy) to:

$$F(q) = U(q) - H(q)$$

• The average energy is:

$$U(q) = \sum_h q(h)E(h)$$

• The variational entropy is:

$$H(q) = -\sum_h q(h)\log q(h)$$

• And as before:

$$F(q) = F_H + D(q\,\|\,p)$$
Solving the variational optimization problem

• So instead of computing p(s), we can search for the q that optimizes the free energy:

$$F(q) = U(q) - H(q),\qquad U(q) = -\sum_h q(h)\log p(h,s),\qquad H(q) = -\sum_h q(h)\log q(h)$$

• This is still as hard as before, but we can simplify the problem by restricting q (this is where the additional degrees of freedom become important).

• Lowering U? Focus q on the maximum-probability configurations.
• Maximizing H? Spread out the distribution.
Simplest variational approximation: Mean Field

• Let's assume complete independence among the r.v.'s posteriors:

$$q(h) = \prod_i q_i(h_i)$$

• Under this assumption we can try optimizing the q_i (looking for minimal energy!):

$$F^{MF} = \min_{q_1,\dots,q_n} F\Big(\prod_i q_i\Big) = \min_{q_1,\dots,q_n}\Big[-\sum_h \prod_i q_i(h_i)\,\log p(h,s) + \sum_i\sum_{h_i} q_i(h_i)\log q_i(h_i)\Big]$$

where F(q) = U(q) - H(q), with $U(q) = -\sum_h q(h)\log p(h,s)$ and $H(q) = -\sum_h q(h)\log q(h)$.
Mean Field Inference

$$q(h) = \prod_i q_i(h_i),\qquad F^{MF} = \min_{q_1,\dots,q_n}\Big[-\sum_h \prod_i q_i(h_i)\,\log p(h,s) + \sum_i\sum_{h_i} q_i(h_i)\log q_i(h_i)\Big]$$

• We optimize iteratively:
  • Select i (sequentially, or using any method)
  • Optimize q_i to minimize F^{MF}(q_1,..,q_i,…,q_n) while fixing all other q's
  • Terminate when F^{MF} cannot be improved further
• Remember: F^{MF} always bounds the likelihood
• The q_i optimization can usually be done efficiently
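The iterative scheme can be sketched in a few lines of Python. This is an illustration under assumptions, not the lecture's implementation: `mean_field` and `toy_log_joint` are hypothetical names, the chain model is made up, and all expectations are computed by brute-force enumeration, which is feasible only for a handful of variables.

```python
import itertools
import math

def mean_field(log_joint, n, k=2, sweeps=50):
    """Coordinate-wise mean field for n hidden variables with k states each.

    log_joint(h) returns log Pr(h, s) for a tuple h (s is fixed inside it).
    Returns the factors q_i and the free energy recorded after each sweep.
    """
    q = [[1.0 / k] * k for _ in range(n)]
    configs = list(itertools.product(range(k), repeat=n))

    def free_energy():
        # F = -sum_h q(h) log Pr(h, s) + sum_i sum_{h_i} q_i(h_i) log q_i(h_i)
        avg_energy = 0.0
        for h in configs:
            weight = math.prod(q[i][h[i]] for i in range(n))
            avg_energy -= weight * log_joint(h)
        neg_entropy = sum(p * math.log(p) for qi in q for p in qi if p > 0)
        return avg_energy + neg_entropy

    history = [free_energy()]
    for _ in range(sweeps):
        for i in range(n):
            # log q_i(v) = E_{q_-i}[log Pr(h, s) | h_i = v] + const
            scores = [0.0] * k
            for h in configs:
                weight = math.prod(q[j][h[j]] for j in range(n) if j != i)
                scores[h[i]] += weight * log_joint(h)
            top = max(scores)
            unnorm = [math.exp(sc - top) for sc in scores]
            total = sum(unnorm)
            q[i] = [v / total for v in unnorm]
        history.append(free_energy())
    return q, history

def toy_log_joint(h):
    # Made-up chain: neighbours prefer to agree, h[0] is pulled toward state 1.
    score = 1.0 * (h[0] == 1)
    score += sum(0.8 * (h[i] == h[i + 1]) for i in range(len(h) - 1))
    return score - 5.0  # arbitrary constant so the values act like log Pr(h, s)

q, history = mean_field(toy_log_joint, n=3)
exact = -math.log(sum(math.exp(toy_log_joint(h))
                      for h in itertools.product(range(2), repeat=3)))
```

Here `history` never increases from sweep to sweep, and its final value stays above `exact`, the negative log of the summed joint, which is the bound stated in the bullets.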
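Mean field for a simple-tree model

Just for illustration, since we know how to solve this one exactly:

$$\Pr(h|s,\theta) \approx q(h) = \prod_i q_i(h_i)$$

We select a node and optimize its q_i while making sure it is a distribution:

$$q_i(h_i) = \arg\max_{q_i} F^{MF}$$

Here F^{MF} denotes the negative free energy, a lower bound on the log-likelihood, so it is maximized:

$$F^{MF} = \sum_h \Big(\prod_j q_j(h_j)\Big)\log\Pr(h,s|\theta) - \sum_h \Big(\prod_j q_j(h_j)\Big)\log q(h)$$

$$= \sum_h \prod_j q_j(h_j) \sum_k \log\Pr(x_k|\mathrm{pa}\,x_k) - \sum_{h_i} q_i(h_i)\log q_i(h_i) + C$$

The energy decomposes, and only a few terms are affected:

$$= \sum_{h_i,h_{pa_i}} q_i(h_i)\,q_{pa_i}(h_{pa_i})\log\Pr(h_i|h_{pa_i}) + \sum_{h_i,h_r} q_i(h_i)\,q_r(h_r)\log\Pr(h_r|h_i) + \sum_{h_i,h_l} q_i(h_i)\,q_l(h_l)\log\Pr(h_l|h_i) - \sum_{h_i} q_i(h_i)\log q_i(h_i) + C$$

To ease notation, assume the left (l) and right (r) children are hidden:

$$= \sum_{h_i} q_i(h_i)\Big[\sum_{h_{pa_i}} q_{pa_i}(h_{pa_i})\log\Pr(h_i|h_{pa_i}) + \sum_{h_r} q_r(h_r)\log\Pr(h_r|h_i) + \sum_{h_l} q_l(h_l)\log\Pr(h_l|h_i) - \log q_i(h_i)\Big] + C$$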
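Mean field for a simple-tree model

Just for illustration, since we know how to solve this one exactly:

$$\Pr(h|s,\theta) \approx q(h) = \prod_i q_i(h_i),\qquad q_i(h_i) = \arg\max_{q_i} F^{MF}$$

We select a node and optimize its q_i while making sure it is a distribution:

$$F^{MF} = \sum_{h_i} q_i(h_i)\Big[\sum_{h_{pa_i}} q_{pa_i}(h_{pa_i})\log\Pr(h_i|h_{pa_i}) + \sum_{h_r} q_r(h_r)\log\Pr(h_r|h_i) + \sum_{h_l} q_l(h_l)\log\Pr(h_l|h_i) - \log q_i(h_i)\Big] + C$$

Making log q_i equal to the sum of the affected terms, and normalizing, gives:

$$q_i(h_i) \propto \exp\Big[\sum_{h_{pa_i}} q_{pa_i}(h_{pa_i})\log\Pr(h_i|h_{pa_i}) + \sum_{h_r} q_r(h_r)\log\Pr(h_r|h_i) + \sum_{h_l} q_l(h_l)\log\Pr(h_l|h_i)\Big]$$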
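Mean field for a phylo-hmm model

$$\Pr(h|s,\theta) \approx q(h) = \prod_{i,j} q_i^j(h_i^j)$$

Now we don't know how to solve this exactly, but MF is still simple:

$$F^{MF}(q_i^j) = \sum_h \Big(\prod_{k,m} q_k^m(h_k^m)\Big)\log\Pr(h,s|\theta) - \sum_h\Big(\prod_{k,m} q_k^m(h_k^m)\Big)\log q(h)$$

[Figure: the neighborhood of h_i^j in the phylo-HMM: h_i^{j-1} and h_i^{j+1}, the parent nodes h_{pa_i} at the same positions, and the children h_l and h_r at neighboring positions.]

Writing q(a,b,c,d) for the product of the four single-variable factors, the affected terms are:

$$= \sum_{h_i^j,h_{pa_i}^j,h_i^{j-1},h_{pa_i}^{j-1}} q(h_i^j,h_{pa_i}^j,h_i^{j-1},h_{pa_i}^{j-1})\log\Pr(h_i^j|h_{pa_i}^j,h_i^{j-1},h_{pa_i}^{j-1})$$

$$+ \sum_{h_i^{j+1},h_{pa_i}^{j+1},h_i^j,h_{pa_i}^j} q(h_i^{j+1},h_{pa_i}^{j+1},h_i^j,h_{pa_i}^j)\log\Pr(h_i^{j+1}|h_{pa_i}^{j+1},h_i^j,h_{pa_i}^j)$$

$$+ (..) + (..) + (..) + (..) - \sum_{h_i^j} q_i^j(h_i^j)\log q_i^j(h_i^j) + C$$

The four elided terms are the analogous terms for the children l and r at positions j and j+1.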
Mean field for a phylo-hmm model

$$\Pr(h|s,\theta) \approx q(h) = \prod_{i,j} q_i^j(h_i^j)$$

Now we don't know how to solve this exactly, but MF is still simple:

$$F^{MF}(q_i^j) = \sum_h \Big(\prod_{k,m} q_k^m(h_k^m)\Big)\log\Pr(h,s|\theta) - \sum_h\Big(\prod_{k,m} q_k^m(h_k^m)\Big)\log q(h)$$

[Figure: the neighborhood of h_i^j in the phylo-HMM: h_i^{j-1} and h_i^{j+1}, the parent nodes h_{pa_i} at the same positions, and the children h_l and h_r at neighboring positions.]

As before, the optimal solution is derived by making log q_i^j equal to the sum of the affected terms:

$$q_i^j(h_i^j) \propto \exp\Big[\sum_{h_{pa_i}^j,h_i^{j-1},h_{pa_i}^{j-1}} q(h_{pa_i}^j,h_i^{j-1},h_{pa_i}^{j-1})\log\Pr(h_i^j|h_{pa_i}^j,h_i^{j-1},h_{pa_i}^{j-1}) + (..) + (..) + (..) + (..) + (..)\Big]$$
Simple Mean Field is usually not a good idea

Why?

Because the MF trial function is very crude:

$$\Pr(h|s,\theta) \approx q(h) = \prod_i q_i(h_i)$$

For example, we said before that the joint posterior cannot be approximated by an independent product of the hidden variables' posteriors:

$$\Pr(h_i, h_{pa_i}|s,\theta) \ne \Pr(h_i|s,\theta)\,\Pr(h_{pa_i}|s,\theta)$$

[Figure: a tree with leaves A, C, A, C; the internal nodes and the root are each ambiguous (A/C).]
Exploiting additional structure

We can greatly improve accuracy by generalizing the mean field algorithm using larger building blocks.

$$\Pr(h|s,\theta) \approx q(h) = \prod_j q^j(h_1^j,\dots,h_n^j)$$

The approximation specifies an independent distribution for each locus, but maintains the tree dependencies within it.

We now optimize each tree q^j separately, given the current potentials of the other trees.

The key point is that optimizing for any given tree is efficient: we just use a modified up-down algorithm.
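Tree based variational inference

$$F^{MF}(q^j) = \sum_h \Big(\prod_m q^m(h^m)\Big)\log\Pr(h,s|\theta) - \sum_h \Big(\prod_m q^m(h^m)\Big)\log q(h)$$

Each tree is only affected by the tree before and the tree after:

$$= \sum_{h^{j-1},h^j} q^{j-1}(h^{j-1})\,q^j(h^j)\log\Pr(h^j|h^{j-1}) + \sum_{h^j,h^{j+1}} q^j(h^j)\,q^{j+1}(h^{j+1})\log\Pr(h^{j+1}|h^j) - \sum_{h^j} q^j(h^j)\log q^j(h^j) + C$$

$$= \sum_{h^j} q^j(h^j)\Big[\sum_{h^{j-1}} q^{j-1}(h^{j-1})\log\Pr(h^j|h^{j-1}) + \sum_{h^{j+1}} q^{j+1}(h^{j+1})\log\Pr(h^{j+1}|h^j)\Big] - \sum_{h^j} q^j(h^j)\log q^j(h^j) + C$$

$$= \sum_{h^j} q^j(h^j)\Big[\sum_{h^{j-1}} q^{j-1}(h^{j-1})\sum_i\log\Pr(h_i^j|h_{pa_i}^j,h_i^{j-1},h_{pa_i}^{j-1}) + \sum_{h^{j+1}} q^{j+1}(h^{j+1})\sum_i\log\Pr(h_i^{j+1}|h_{pa_i}^{j+1},h_i^j,h_{pa_i}^j)\Big] - \sum_{h^j} q^j(h^j)\log q^j(h^j) + C$$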
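Tree based variational inference

$$F^{MF}(q^j) = \sum_{h^j} q^j(h^j)\Big[\sum_{h^{j-1}} q^{j-1}(h^{j-1})\sum_i\log\Pr(h_i^j|h_{pa_i}^j,h_i^{j-1},h_{pa_i}^{j-1}) + \sum_{h^{j+1}} q^{j+1}(h^{j+1})\sum_i\log\Pr(h_i^{j+1}|h_{pa_i}^{j+1},h_i^j,h_{pa_i}^j)\Big] - \sum_{h^j} q^j(h^j)\log q^j(h^j) + C$$

$$= \sum_{h^j} q^j(h^j)\sum_i\Big[\sum_{h_i^{j-1},h_{pa_i}^{j-1}} q^{j-1}(h_i^{j-1},h_{pa_i}^{j-1})\log\Pr(h_i^j|h_{pa_i}^j,h_i^{j-1},h_{pa_i}^{j-1}) + \sum_{h_i^{j+1},h_{pa_i}^{j+1}} q^{j+1}(h_i^{j+1},h_{pa_i}^{j+1})\log\Pr(h_i^{j+1}|h_{pa_i}^{j+1},h_i^j,h_{pa_i}^j)\Big] - \sum_{h^j} q^j(h^j)\log q^j(h^j) + C$$

Denoting the bracketed term by $\log\psi_i^j(h_i^j,h_{pa_i}^j)$:

$$F^{MF}(q^j) = \sum_{h^j} q^j(h^j)\sum_i \log\psi_i^j(h_i^j,h_{pa_i}^j) - \sum_{h^j} q^j(h^j)\log q^j(h^j) + C$$

For the simple tree, the corresponding bound on log P(s|θ) was:

$$F_{\text{Simple Tree}} = \sum_h q(h)\sum_i\log\Pr(h_i|h_{pa_i}) - \sum_h q(h)\log q(h)$$

We got the same functional form as we had for the simple tree, so we can use the up-down algorithm to optimize q^j.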
Chain cluster variational inference

We can use any partition of a BN into trees and derive a similar MF algorithm.

For example, instead of trees we can use the Markov chains in each species.

What will work better for us?

It depends on the strength of the dependencies along each dimension: we should try to capture as much "dependency" as possible.
Simple Tree: Inference as message passing

[Figure: a tree whose leaves are the observed data s. Each subtree sends its neighbor the message "You are P(H | our data)"; a node that has collected all messages concludes "I am P(H | all data)".]
Factor graphs

Defining the joint probability for a set of random variables given:
1) Any set of node subsets (hypergraph): $(V, A)$, $A = \{a \mid a \subseteq V\}$
2) Functions on the node subsets (potentials): $\psi_a(x_a)$

Joint distribution:

$$\Pr(x) = \frac{1}{Z}\prod_a \psi_a(x_a)$$

Partition function:

$$Z = \sum_x \prod_a \psi_a(x_a)$$

If the potentials are conditional probabilities, what will Z be? Not necessarily 1! (Can you think of an example?)

Things are difficult when there are several modes.

[Figure: a factor graph, with legend entries "Factor" and "R.V."]
Converting directional models to factor graphs

$$\psi_a(x_a) = \Pr(x|\mathrm{pa}\,x)$$

[Figure: grid models over the nodes h_i^{j-1}, h_i^j, h_i^{j+1} and h_{pa_i}^{j-1}, h_{pa_i}^j, h_{pa_i}^{j+1}, shown as directed models and as factor graphs. Panel labels: DBN, PhyloHMM. Annotations: "Well defined", "(Loops!)", Z = 1 (twice), Z != 1 (once).]
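A brute-force computation makes the Z = 1 versus Z != 1 distinction concrete. This is a sketch under assumptions: the two-variable models and the table `cond` are made-up illustrations rather than the DBN or PhyloHMM of the figure, and `partition_function` is a hypothetical helper.

```python
import itertools

def partition_function(factors, n_vars):
    """Brute-force Z = sum_x prod_a psi_a(x_a) for binary variables."""
    z = 0.0
    for x in itertools.product((0, 1), repeat=n_vars):
        term = 1.0
        for scope, table in factors:
            term *= table[tuple(x[v] for v in scope)]
        z += term
    return z

# Conditional table Pr(child | parent), keyed as (child, parent); made-up numbers.
cond = {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.1, (1, 1): 0.9}
prior = {(0,): 0.5, (1,): 0.5}

# Acyclic model: Pr(x0) * Pr(x1 | x0), so the potentials multiply to a distribution.
z_dag = partition_function([((0,), prior), ((1, 0), cond)], n_vars=2)

# Directed loop: "Pr(x0 | x1)" * "Pr(x1 | x0)" used as potentials.
z_loop = partition_function([((0, 1), cond), ((1, 0), cond)], n_vars=2)
```

The acyclic product sums to 1, while the two conditional tables arranged in a loop give 0.81 + 0.01 + 0.01 + 0.81 = 1.64, so the product must be renormalized before it is a distribution.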
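More definitions

The model:

$$\log\Pr(x) = -\log Z + \sum_a \log\psi_a(x_a)$$

Potentials can be defined on discrete values, real values, etc. It is also common to define general log-linear models directly:

$$\Pr(x) = \frac{1}{Z}\exp\Big(\sum_a w_a\log\psi_a(x_a)\Big)$$

Inference (the sums run over the assignments x that agree with the data D, and in the second line also with x_i):

$$\Pr(D|\theta) = \frac{1}{Z}\sum_{x\,|\,D}\exp\Big(\sum_a w_a\log\psi_a(x_a)\Big)$$

$$\Pr(x_i|D,\theta) = \frac{1}{Z}\sum_{x\,|\,x_i,D}\exp\Big(\sum_a w_a\log\psi_a(x_a)\Big)\Big/\Pr(D|\theta)$$

Learning:

Find the factors' parameterization:

$$\arg\max_\theta \Pr(D|\theta)$$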
Inference in factor graphs: Algorithms

Directed models are sometimes more natural and easier to understand. Their popularity stems from their original role in expressing knowledge in AI. They are not very natural for modeling physical phenomena, except for time-dependent processes.

Undirected models are analogous to well-developed models in statistical physics (e.g., spin glass models).

We borrow computational ideas from physicists (these people are big on approximations).

The models are convex, which gives them important algorithmic properties.

Which algorithms carry over to factor graphs:
Dynamic programming: No (also not in BN!)
Forward sampling (likelihood weighting): No
Metropolis/Gibbs: Yes
Mean field: Yes
Structural variational inference: Yes
Belief propagation in a factor graph

$$P(x|\theta) = \frac{1}{Z}\prod_a \psi_a(x_a)$$

• Remember, a factor graph is defined given a set of random variables (use indices i, j, k, ...) and a set of factors on groups of variables (use indices a, b, ...)
• x_a refers to an assignment of values to the inputs of the factor a
• Z is the partition function (which is hard to compute)
• The BP algorithm is constructed by computing and updating messages:
  • Messages from factors to variables: $m_{a\to i}(x_i)$
  • Messages from variables to factors: $m_{i\to a}(x_i)$
  Each message maps any value attainable by x_i to a real value.
• Think of messages as transmitting beliefs:
  a->i : given my other input variables, and ignoring your message, you are x
  i->a : given my other input factors and my potential, and ignoring your message, you are x
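Messages update rules:

Messages from variables to factors:

$$m_{i\to a}(x_i) = \prod_{c\in N(i)\setminus a} m_{c\to i}(x_i)$$

Messages from factors to variables:

$$m_{a\to i}(x_i) = \sum_{x_a\setminus x_i}\psi_a(x_a)\prod_{j\in N(a)\setminus i} m_{j\to a}(x_j)$$

[Figure: a variable i sending to factor a the product over N(i) \ a, and a factor a sending to variable i the sum over N(a) \ i.]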
The algorithm proceeds by updating messages:

• Define the beliefs as approximating the single-variable posteriors p(h_i|s):

$$b_i(x_i) \propto \prod_{a\in N(i)} m_{a\to i}(x_i)$$

Algorithm:
Initialize all messages to uniform
Iterate until no message changes:
  Update factors-to-variables messages
  Update variables-to-factors messages

• Why is this different from the mean field algorithm, where $q(h) = \prod_i q_i(h_i)$?
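The algorithm above can be sketched in pure Python. This is an illustration under assumptions: `belief_propagation` is a hypothetical name, the three-variable chain and its potentials are made up, and the messages are renormalized after every update, which is an added convenience that does not change the resulting beliefs.

```python
import itertools
import math

def belief_propagation(factors, n_vars, k=2, iters=20):
    """Sum-product with a synchronous schedule.

    factors: list of (scope, table); scope is a tuple of variable indices and
    table maps an assignment tuple over the scope to a positive potential.
    Returns normalized single-variable beliefs b_i(x_i).
    """
    edges = [(a, i) for a, (scope, _) in enumerate(factors) for i in scope]
    f2v = {e: [1.0] * k for e in edges}   # m_{a->i}, initialized to uniform
    v2f = {e: [1.0] * k for e in edges}   # m_{i->a}, initialized to uniform

    for _ in range(iters):
        # factor -> variable: sum over x_a \ x_i of psi_a times incoming messages
        new_f2v = {}
        for a, i in edges:
            scope, table = factors[a]
            msg = [0.0] * k
            for xa in itertools.product(range(k), repeat=len(scope)):
                w = table[xa]
                for j, xj in zip(scope, xa):
                    if j != i:
                        w *= v2f[(a, j)][xj]
                msg[xa[scope.index(i)]] += w
            total = sum(msg)
            new_f2v[(a, i)] = [m / total for m in msg]
        f2v = new_f2v
        # variable -> factor: product of the messages from the other factors
        for a, i in edges:
            msg = [math.prod(f2v[(c, v)][x] for c, v in edges if v == i and c != a)
                   for x in range(k)]
            total = sum(msg)
            v2f[(a, i)] = [m / total for m in msg]

    beliefs = []
    for i in range(n_vars):
        b = [math.prod(f2v[(a, v)][x] for a, v in edges if v == i) for x in range(k)]
        total = sum(b)
        beliefs.append([p / total for p in b])
    return beliefs

# Made-up chain x0 - x1 - x2 with a unary factor on x0.
agree = {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 2.0}
factors = [((0,), {(0,): 3.0, (1,): 1.0}), ((0, 1), agree), ((1, 2), agree)]
beliefs = belief_propagation(factors, n_vars=3)
```

Because this factor graph is a tree, the beliefs coincide with the exact marginals (for example 0.75 for x0 = 0); on a graph with loops the same code runs, but the beliefs are only approximations and the iteration may not settle.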
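Beliefs on factor inputs

$$b_a(x_a) \propto \psi_a(x_a)\prod_{j\in N(a)} m_{j\to a}(x_j) = \psi_a(x_a)\prod_{j\in N(a)}\ \prod_{c\in N(j)\setminus a} m_{c\to j}(x_j)$$

This is far from mean field, since for example:

$$b_a(x_a) \ne \prod_{j\in N(a)} b_j(x_j)$$

The update rules can be viewed as derived from constraints on the beliefs:
1. Requirement on the variable beliefs (b_i): $b_i(x_i) \propto \prod_{a\in N(i)} m_{a\to i}(x_i)$
2. Requirement on the factor beliefs (b_a): $b_a(x_a) \propto \psi_a(x_a)\prod_{j\in N(a)}\prod_{c\in N(j)\setminus a} m_{c\to j}(x_j)$
3. Marginalization requirement: $b_i(x_i) = \sum_{x_a\setminus x_i} b_a(x_a)$

Combining the three:

$$\prod_{d\in N(i)} m_{d\to i}(x_i) = b_i(x_i) = \sum_{x_a\setminus x_i}\psi_a(x_a)\prod_{j\in N(a)}\ \prod_{c\in N(j)\setminus a} m_{c\to j}(x_j)$$

and dividing both sides by the messages into i from the factors other than a:

$$m_{a\to i}(x_i) = \sum_{x_a\setminus x_i}\psi_a(x_a)\prod_{j\in N(a)\setminus i}\ \prod_{c\in N(j)\setminus a} m_{c\to j}(x_j)$$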
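BP on Tree = Up-Down

[Figure: a tree with root h3 and internal nodes h1 (leaves s1, s2) and h2 (leaves s3, s4). Factors a and b connect h1 to its leaves, c connects h1 to h3, d connects h2 to h3, and e is attached to h3.]

Up messages:

$$up_{h_1}(h_1) = \Pr(s_1|h_1)\,\Pr(s_2|h_1)$$

$$m_{h_1\to c}(h_1) = m_{a\to h_1}(h_1)\,m_{b\to h_1}(h_1)$$

$$m_{a\to h_1}(h_1) = \sum_{x_a\setminus h_1}\psi_a(x_a)\,m_{s_1\to a}(s_1),\qquad m_{b\to h_1}(h_1) = \sum_{x_b\setminus h_1}\psi_b(x_b)\,m_{s_2\to b}(s_2)$$

Down messages:

$$down_{h_1}(h_1) = \sum_{h_2,h_3} up(h_2)\,down(h_3)\,\Pr(h_2|h_3)\,\Pr(h_1|h_3)$$

$$m_{c\to h_1}(h_1) = \sum_{x_c\setminus h_1}\psi_c(x_c)\,m_{h_3\to c}(h_3) = \sum_{h_3}\psi_c(h_1,h_3)\,m_{d\to h_3}(h_3)\,m_{e\to h_3}(h_3) = \sum_{h_3}\sum_{h_2}\psi_c(h_1,h_3)\,\psi_d(h_2,h_3)\,\psi_e(h_3)\,m_{h_2\to d}(h_2)$$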
Loopy BP is not guaranteed to converge

[Figure: two variables X and Y connected by two factors, each with the potential table (rows x, columns Y) [[0, 1], [1, 0]]; the message values 1, 1 and 0, 0 are marked on the edges.]

This is not a hypothetical scenario: it frequently happens when there is too much symmetry. For example, most mutational effects are double stranded and so symmetric, which can result in loops.
The Bethe Free Energy

[Photo: H. Bethe]

• LBP was introduced in several domains (BNs, coding), and is considered very practical in many cases.
• ...but unlike the variational approaches we studied before, it is not clear how it approximates the likelihood/partition function, even when it converges.
• In the early 2000s, Yedidia, Freeman and Weiss discovered a connection between the LBP algorithm and the Bethe free energy, developed by Hans Bethe in the 1930s to approximate the free energy in crystal field theory.

$$F_{Bethe} = U_{Bethe} - H_{Bethe}$$

$$U_{Bethe} = -\sum_a\sum_{x_a} b_a(x_a)\log\psi_a(x_a)$$

$$H_{Bethe} = -\sum_a\sum_{x_a} b_a(x_a)\log b_a(x_a) + \sum_i (d_i - 1)\sum_{x_i} b_i(x_i)\log b_i(x_i)$$

where d_i is the number of factors that contain variable i.

• Compare to the variational free energy:

$$F(q) = -\sum_h q(h)\log p(h,s|\theta) + \sum_h q(h)\log q(h)$$

Theorem: beliefs are LBP fixed points if and only if they are locally optimal for the Bethe free energy.
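On a tree-shaped factor graph the Bethe free energy evaluated at the exact marginals equals -log Z, which gives a quick sanity check of the definition. The sketch below assumes the standard form F_Bethe = U_Bethe - H_Bethe with variable degrees d_i; `bethe_free_energy` is a hypothetical name, the chain model is made up, and the exact marginals are obtained by enumeration.

```python
import itertools
import math

def bethe_free_energy(factors, factor_beliefs, var_beliefs):
    """F_Bethe = U_Bethe - H_Bethe for beliefs b_a (per factor) and b_i (per variable)."""
    degree = [0] * len(var_beliefs)
    energy, neg_entropy = 0.0, 0.0
    for (scope, table), b_a in zip(factors, factor_beliefs):
        for i in scope:
            degree[i] += 1
        for xa, b in b_a.items():
            if b > 0:
                energy -= b * math.log(table[xa])      # U_Bethe term
                neg_entropy += b * math.log(b)         # factor part of -H_Bethe
    for d_i, b_i in zip(degree, var_beliefs):
        # variable part of -H_Bethe, weighted by (d_i - 1)
        neg_entropy -= (d_i - 1) * sum(b * math.log(b) for b in b_i if b > 0)
    return energy + neg_entropy

# Made-up chain x0 - x1 - x2 with a unary factor on x0 (a tree).
agree = {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 2.0}
factors = [((0,), {(0,): 3.0, (1,): 1.0}), ((0, 1), agree), ((1, 2), agree)]

# Exact joint by enumeration (feasible only for tiny models).
joint = {}
for x in itertools.product((0, 1), repeat=3):
    joint[x] = math.prod(table[tuple(x[v] for v in scope)] for scope, table in factors)
z = sum(joint.values())

factor_beliefs = []
for scope, _ in factors:
    b_a = {}
    for x, w in joint.items():
        key = tuple(x[v] for v in scope)
        b_a[key] = b_a.get(key, 0.0) + w / z
    factor_beliefs.append(b_a)
var_beliefs = [[sum(w for x, w in joint.items() if x[i] == s) / z for s in (0, 1)]
               for i in range(3)]

f_bethe = bethe_free_energy(factors, factor_beliefs, var_beliefs)
```

For this chain Z = 36, and `f_bethe` matches -log 36 up to rounding. On graphs with loops the equality no longer holds in general, which is exactly where the Bethe free energy becomes an approximation.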
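Generalization: Regions-based free energy

• Start with a factor graph (X, A)
• Introduce regions (X_R, A_R) and multipliers c_R
• We require that:

$$a \in A_R \Rightarrow N(a) \subseteq X_R$$

• We will work with valid region graphs:

$$\forall a:\ \sum_{R:\,a\in A_R} c_R = 1,\qquad \forall i:\ \sum_{R:\,i\in X_R} c_R = 1$$

Region energy:

$$E_R(x_R) = -\sum_{a\in A_R}\log\psi_a(x_a)$$

Region average energy:

$$U_R(b_R) = \sum_{x_R} b_R(x_R)\,E_R(x_R)$$

Region entropy:

$$H_R(b_R) = -\sum_{x_R} b_R(x_R)\log b_R(x_R)$$

Region free energy:

$$F_R(b_R) = U_R(b_R) - H_R(b_R)$$

Region-based average energy:

$$U_{\mathcal R}(\{b_R\}) = \sum_R c_R\,U_R(b_R)$$

Region-based entropy:

$$H_{\mathcal R}(\{b_R\}) = \sum_R c_R\,H_R(b_R)$$

Region-based free energy:

$$F_{\mathcal R}(\{b_R\}) = U_{\mathcal R}(\{b_R\}) - H_{\mathcal R}(\{b_R\})$$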
Bethe regions are the factors' neighbor sets and the single-variable regions:

[Figure: a factor graph with factors a, b, c, each with multiplier 1.]

$$c_a = c_b = c_c = 1$$

We compensate for the multiple counting of variables using the multiplicity constant:

$$c_a = 1,\qquad c_i = 1 - d_i$$

We can add larger regions

[Figure: regions R_a, R_ac, R_bc.]

As long as we update the multipliers:

$$c_R = 1 - \sum_{R'\supset R} c_{R'}$$
Multipliers compensate on average, not on entropy

Claim: For valid regions, if the regions' beliefs are exact:

$$b_R(x_R) = p_R(x_R)$$

then the average region-based energy is exact:

$$U_{\mathcal R}(\{b_R\}) = \sum_x p(x)\,E(x)$$

Indeed, since the multipliers of the regions containing each factor sum to 1:

$$U_{\mathcal R}(\{b_R\}) = \sum_R c_R\,U_R(b_R) = -\sum_R c_R\sum_{x_R} b_R(x_R)\sum_{a\in A_R}\log\psi_a(x_a) = -\sum_a\sum_{x_a} b_a(x_a)\log\psi_a(x_a)$$

$$U = \sum_x p(x)\,E(x) = -\sum_x p(x)\sum_a\log\psi_a(x_a) = -\sum_a\sum_{x_a} p_a(x_a)\log\psi_a(x_a)$$

We cannot guarantee much on the region-based entropy:

$$H_R(b_R) = -\sum_{x_R} b_R(x_R)\log b_R(x_R)$$

Claim: the region-based entropy is exact when the model is a uniform distribution.
Proof: exercise. This means that the entropy counts the correct number of degrees of freedom, e.g. for binary variables, H = N log 2.

Definition: a region-based free energy approximation is said to be max-ent normal if its region-based entropy is maximized when the beliefs are uniform.

A non max-ent normal approximation can minimize the region free energy by selecting erroneously high entropy beliefs!
Bethe's regions are max-ent normal

Claim: The Bethe regions give a max-ent normal approximation (i.e. it maximizes the region-based entropy on the uniform distribution)

$$F_{Bethe} = U_{Bethe} - H_{Bethe}$$

$$H_{Bethe} = -\sum_a\sum_{x_a} b_a(x_a)\log b_a(x_a) + \sum_i (d_i - 1)\sum_{x_i} b_i(x_i)\log b_i(x_i)$$

$$H_{Bethe} = -\sum_i\sum_{x_i} b_i(x_i)\log b_i(x_i) - \sum_a\Big[\sum_{x_a} b_a(x_a)\ln b_a(x_a) - \sum_{i\in N(a)}\sum_{x_i} b_i(x_i)\log b_i(x_i)\Big] = \sum_i H(b_i) - \sum_a I(b_a)$$

Entropy H(b_i): maximal on uniform.
Information I(b_a): nonnegative, and 0 on uniform.
Example: A Non max-ent approximation

Start with a complete graph on six variables and binary factors

Add all variable triplets, pairs and singletons as regions

Generate multipliers (this guarantees consistency), using $c_R = 1 - \sum_{R'\supset R} c_{R'}$:
triplets = 1 (20 overall)
pairs = -3 (15 overall)
singletons = 6 (6 overall)

Look at the consistent beliefs:

$$b_i(0) = 0.5;\qquad b(x_i,x_j) = \begin{cases}1/2 & x_i = x_j\\ 0 & \text{otherwise}\end{cases};\qquad b(x_i,x_j,x_k) = \begin{cases}1/2 & x_i = x_j = x_k\\ 0 & \text{otherwise}\end{cases}$$

The region entropy (for any region) = ln 2. The total region entropy is:

$$H_{\mathcal R} = \sum_R c_R H_R = 20\ln 2 - 45\ln 2 + 36\ln 2 = 11\ln 2$$

We claimed before that the entropy of the uniform distribution will be exact: 6 ln 2. The beliefs above reach a higher region-based entropy, so this approximation is not max-ent normal.
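The multipliers and the total can be reproduced mechanically. A short sketch, assuming six variables (consistent with the counts 20, 15 and 6) and the recursion c_R = 1 minus the sum of the multipliers of all strict super-regions; the variable names are illustrative.

```python
import itertools
import math

variables = range(6)
# Regions ordered from largest to smallest: triplets, pairs, singletons.
regions = [frozenset(c) for size in (3, 2, 1)
           for c in itertools.combinations(variables, size)]

# c_R = 1 - sum of c_R' over strict super-regions R' (already computed, since larger come first).
multiplier = {}
for r in regions:
    multiplier[r] = 1 - sum(c for r2, c in multiplier.items() if r < r2)

by_size = {size: {multiplier[r] for r in regions if len(r) == size} for size in (3, 2, 1)}
total_entropy = sum(multiplier[r] for r in regions) * math.log(2)   # each H_R = ln 2
```

Each pair lies in 4 triplets (1 - 4 = -3) and each singleton lies in 10 triplets and 5 pairs (1 - 10 + 15 = 6), so the multipliers sum to 20 - 45 + 36 = 11 and the region-based entropy of the "all equal" beliefs is 11 ln 2, above the 6 ln 2 of the uniform distribution.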
Inference as minimization of region-based free energy

We want to solve a variational problem:

$$\min_{\{b_R\}} F_{\mathcal R}(\{b_R\})$$

While enforcing constraints on the regions' beliefs:

$$\sum_{x_R} b_R(x_R) = 1,\qquad \sum_{x_{R'}\setminus x_R} b_{R'}(x_{R'}) = b_R(x_R)\ \text{ for } R\subset R'$$

Unlike the structured variational approximation we discussed before, and although the beliefs are (regionally) compatible, we can have cases with optimal beliefs that do not represent a true global posterior distribution.

[Figure: three binary variables in a loop, connected by pairwise factors A, B and C; each factor is shown as a 2x2 table over its two variables.]

Optimal region beliefs are identical to the factors:

$$b_A = \begin{pmatrix}0.4 & 0.1\\ 0.1 & 0.4\end{pmatrix},\qquad b_B = \begin{pmatrix}0.1 & 0.4\\ 0.4 & 0.1\end{pmatrix},\qquad b_C = \begin{pmatrix}0.4 & 0.1\\ 0.1 & 0.4\end{pmatrix},\qquad b_i = \begin{pmatrix}0.5\\ 0.5\end{pmatrix}$$

It can be shown that this cannot be the result of any joint distribution on the three variables (note the negative feedback loop here).
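One way to see why no joint distribution can produce such beliefs is a short counting argument. It is a sketch under assumptions: the three variables are called x_1, x_2, x_3 here, the two factors that favor agreement are taken to sit on the pairs (x_1,x_2) and (x_1,x_3) with probability 0.4 + 0.4 = 0.8 of agreement each, and the factor that favors disagreement is taken to sit on (x_2,x_3) with probability 0.8 of disagreement. The assignment of factors to pairs is not recoverable from the figure, but the argument is symmetric in it.

```latex
% If x_2 and x_3 differ, then x_1 must differ from at least one of them:
\mathbf{1}[x_2 \ne x_3] \;\le\; \mathbf{1}[x_1 \ne x_2] + \mathbf{1}[x_1 \ne x_3]
% Taking expectations under any joint distribution p(x_1, x_2, x_3):
p(x_2 \ne x_3) \;\le\; p(x_1 \ne x_2) + p(x_1 \ne x_3)
% The pairwise beliefs would require
0.8 \;\le\; 0.2 + 0.2 = 0.4 ,
% which is false, so no joint distribution has these three pairwise marginals.
```

The beliefs are nevertheless locally consistent, since every pairwise table has uniform single-variable marginals (0.5, 0.5).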
Inference as minimization of region-based free energy

Claim: When it converges, LBP finds a minimum of the Bethe free energy.

Proof idea: we have an optimization problem (minimum energy) with constraints (the beliefs are consistent and add up to 1). We write down a Lagrangian that expresses both the minimization goal and the constraints, and show that it is minimized when the LBP update rules hold.

$$L = F_{Bethe} + \sum_a \gamma_a\Big[\sum_{x_a} b_a(x_a) - 1\Big] + \sum_i \gamma_i\Big[\sum_{x_i} b_i(x_i) - 1\Big] + \sum_i\sum_{a\in N(i)}\sum_{x_i}\lambda_{ai}(x_i)\Big[b_i(x_i) - \sum_{x_a\setminus x_i} b_a(x_a)\Big]$$

Important technical point: we shall assume that at the fixed point all beliefs are nonzero. This can be shown to hold if all factors are "soft" (do not contain zero values for any assignment).
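The Bethe Lagrangian

$$F_{Bethe} = -\sum_a\sum_{x_a} b_a(x_a)\log\psi_a(x_a) + \sum_a\sum_{x_a} b_a(x_a)\log b_a(x_a) - \sum_i (d_i - 1)\sum_{x_i} b_i(x_i)\log b_i(x_i)$$

$$L = F_{Bethe} + \sum_a \gamma_a\Big[\sum_{x_a} b_a(x_a) - 1\Big] + \sum_i \gamma_i\Big[\sum_{x_i} b_i(x_i) - 1\Big] + \sum_i\sum_{a\in N(i)}\sum_{x_i}\lambda_{ai}(x_i)\Big[b_i(x_i) - \sum_{x_a\setminus x_i} b_a(x_a)\Big]$$

The three constraint terms:

Large region beliefs are normalized: $\sum_a \gamma_a\big[\sum_{x_a} b_a(x_a) - 1\big]$

Variable region beliefs are normalized: $\sum_i \gamma_i\big[\sum_{x_i} b_i(x_i) - 1\big]$

Marginalization: $\sum_i\sum_{a\in N(i)}\sum_{x_i}\lambda_{ai}(x_i)\big[b_i(x_i) - \sum_{x_a\setminus x_i} b_a(x_a)\big]$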
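The Bethe Lagrangian

$$F_{Bethe} = -\sum_a\sum_{x_a} b_a(x_a)\log\psi_a(x_a) + \sum_a\sum_{x_a} b_a(x_a)\log b_a(x_a) - \sum_i (d_i - 1)\sum_{x_i} b_i(x_i)\log b_i(x_i)$$

$$L = F_{Bethe} + \sum_a \gamma_a\Big[\sum_{x_a} b_a(x_a) - 1\Big] + \sum_i \gamma_i\Big[\sum_{x_i} b_i(x_i) - 1\Big] + \sum_i\sum_{a\in N(i)}\sum_{x_i}\lambda_{ai}(x_i)\Big[b_i(x_i) - \sum_{x_a\setminus x_i} b_a(x_a)\Big]$$

Take the derivatives with respect to each b_a and b_i, and set them to zero:

$$\frac{\partial L}{\partial b_a(x_a)} = -\log\psi_a(x_a) + \log b_a(x_a) + 1 + \gamma_a - \sum_{i\in N(a)}\lambda_{ai}(x_i) = 0$$

$$b_a(x_a) = \psi_a(x_a)\exp\Big(-1 - \gamma_a + \sum_{i\in N(a)}\lambda_{ai}(x_i)\Big)$$

$$\frac{\partial L}{\partial b_i(x_i)} = -(d_i - 1)\big(1 + \log b_i(x_i)\big) + \gamma_i + \sum_{a\in N(i)}\lambda_{ai}(x_i) = 0$$

$$b_i(x_i) = \exp\Big(-1 + \frac{1}{d_i - 1}\Big(\gamma_i + \sum_{a\in N(i)}\lambda_{ai}(x_i)\Big)\Big)$$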
Bethe minima are LBP fixed points

So here are the conditions:

$$b_a(x_a) \propto \psi_a(x_a)\exp\Big(\sum_{i\in N(a)}\lambda_{ai}(x_i)\Big)$$

$$b_i(x_i) \propto \exp\Big(\frac{1}{d_i - 1}\sum_{a\in N(i)}\lambda_{ai}(x_i)\Big)$$

And we can solve them if:

$$\lambda_{ai}(x_i) = \log m_{i\to a}(x_i) = \log\prod_{c\in N(i)\setminus a} m_{c\to i}(x_i)$$

Giving us:

$$b_a(x_a) \propto \psi_a(x_a)\prod_{i\in N(a)}\ \prod_{c\in N(i)\setminus a} m_{c\to i}(x_i)$$

$$b_i(x_i) \propto \Big[\prod_{a\in N(i)}\ \prod_{c\in N(i)\setminus a} m_{c\to i}(x_i)\Big]^{\frac{1}{d_i - 1}} = \prod_{a\in N(i)} m_{a\to i}(x_i)$$

We saw before that these conditions, together with the marginalization constraint, generate the update rules! So "L minimum -> LBP fixed point" is proven. The other direction is quite direct (see Exercise).

LBP is in fact computing the Lagrange multipliers, which is a very powerful observation.
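Generalizing LBP for region graphs

A region graph is a graph on subsets of nodes in the factor graph, with valid multipliers (as defined above):

• Regions (X_R, A_R) and multipliers c_R
• We require that: $a \in A_R \Rightarrow N(a) \subseteq X_R$
• We will work with valid region graphs:

$$\forall a:\ \sum_{R:\,a\in A_R} c_R = 1,\qquad \forall i:\ \sum_{R:\,i\in X_R} c_R = 1,\qquad c_R = 1 - \sum_{R'\supset R} c_{R'}$$

P(R): parents of R
D(R): descendants of R

Parent-to-child beliefs:

$$b_R(x_R) \propto \prod_{a\in A_R}\psi_a(x_a)\prod_{P\in P(R)} m_{P\to R}(x_R)\prod_{D\in D(R)}\ \prod_{P'\in P(D)\setminus\{R,\,D(R)\}} m_{P'\to D}(x_D)$$

[Figure: a region R with its parents P(R), its descendants D(R), and the outside parents of its descendants, P(D(R)) \ D(R).]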
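Generalizing LBP for region graphs

$$b_R(x_R) \propto \prod_{a\in A_R}\psi_a(x_a)\prod_{P\in P(R)} m_{P\to R}(x_R)\prod_{D\in D(R)}\ \prod_{P'\in P(D)\setminus\{R,\,D(R)\}} m_{P'\to D}(x_D)$$

Parent-to-child algorithm:

$$m_{P\to R}(x_R) = \frac{\sum_{x_P\setminus x_R}\ \prod_{a\in F_{P\setminus R}}\psi_a(x_a)\prod_{(I,J)\in N(P,R)} m_{I\to J}(x_J)}{\prod_{(I,J)\in D(P,R)} m_{I\to J}(x_J)}$$

P(R): parents of R
D(R): descendants of R
N(P,R): pairs (I,J) with I not in D(P)+P, and J in D(P)+P but not in D(R)+R
D(P,R): pairs (I,J) with I in D(P)+P but not in D(R)+R, and J in D(R)+R

[Figure: two diagrams of a parent P above a child R, with the sets D(P)+P and D(R)+R outlined and an edge from I to J; in the first, I lies outside D(P)+P, in the second, I lies inside D(P)+P and J inside D(R)+R.]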
GLBP in practice

LBP is very attractive for users: really simple to implement, very fast.

LBP performance is limited by the size of the region assignments X_a, which can grow rapidly with the factors' degrees or the size of large regions.

GLBP will be powerful when large regions can capture significant dependencies that are not captured by individual factors: think of a small positive loop or other symmetric effects.

LBP messages can be computed synchronously (factors->variables->factors…); other scheduling options may boost performance considerably.

LBP is just one (quite indirect) way by which Bethe energies can be minimized. Other approaches are possible, some of which can be guaranteed to converge.

The Bethe/region energy minimization can be further constrained to force the beliefs to be realizable. This gives rise to the concept of the Wainwright-Jordan marginal polytope and convex algorithms on it.