Expectation Propagation for Graphical Models
Yuan (Alan) Qi
Joint work with Tom Minka
Motivation
• Graphical models are widely used in real-world applications, such as wireless communications and bioinformatics.
• Inference techniques on graphical models often sacrifice efficiency for accuracy or sacrifice accuracy for efficiency.
• Need a new method that better balances the trade-off between accuracy and efficiency.
Motivation
[Figure: accuracy vs. efficiency plot; current techniques trade one for the other, while we want both.]
Outline
• Background
• Expectation Propagation (EP) on dynamic systems
  – Poisson tracking
  – Signal detection for wireless communications
• Tree-structured EP on loopy graphs
• Conclusions and future work
Outline
• Background
• Expectation Propagation (EP) on dynamic systems
  – Poisson tracking
  – Signal detection for wireless communications
• Tree-structured EP on loopy graphs
• Conclusions
Graphical Models

                               Directed                         Undirected
Generative                     Bayesian networks                Boltzmann machines
Conditional (Discriminative)   Maximum entropy Markov models    Conditional random fields

[Figure: a small example graph on nodes x1, x2 with observations y1, y2 for each model class.]
Inference on Graphical Models
• Bayesian inference techniques:
  – Belief propagation (BP): Kalman filtering/smoothing, forward-backward algorithm
  – Monte Carlo: particle filters/smoothers, MCMC
• Loopy BP: typically efficient, but not accurate
• Monte Carlo: accurate, but often not efficient
Efficiency vs. Accuracy
[Figure: accuracy vs. efficiency plot; BP is efficient but inaccurate, MC is accurate but inefficient, and EP aims for both.]
Expectation Propagation in a Nutshell
• Approximate a probability distribution by simpler parametric terms:

$$p(\mathbf{x}) = \prod_a f_a(\mathbf{x}) \;\approx\; q(\mathbf{x}) = \prod_a \tilde f_a(\mathbf{x})$$

• Each approximation term $\tilde f_a(\mathbf{x})$ lives in an exponential family (e.g. Gaussian)
Update Term Approximation
• Iterate the fixed-point equation by moment matching:

$$\tilde f_a(\mathbf{x})^{\text{new}} = \arg\min_{\tilde f_a} D\!\left( f_a(\mathbf{x})\, q^{\backslash a}(\mathbf{x}) \,\Big\|\, \tilde f_a(\mathbf{x})\, q^{\backslash a}(\mathbf{x}) \right)$$

where the leave-one-out approximation is

$$q^{\backslash a}(\mathbf{x}) = \frac{q(\mathbf{x})}{\tilde f_a(\mathbf{x})} = \prod_{b \neq a} \tilde f_b(\mathbf{x})$$
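To make the update concrete, here is a minimal Python sketch of the EP fixed-point loop on Minka's classic clutter model, where each site likelihood is a two-component Gaussian mixture. The constants (w = 0.5, clutter variance 10, prior variance 100), the data, and the name ep_clutter are illustrative assumptions, not taken from the slides:

```python
import numpy as np

def npdf(x, m, v):
    """Gaussian density with mean m and variance v."""
    return np.exp(-(x - m) ** 2 / (2 * v)) / np.sqrt(2 * np.pi * v)

# Clutter model (assumed constants): prior N(theta; 0, 100), sites
# p(y_i | theta) = (1-w) N(y_i; theta, 1) + w N(y_i; 0, 10).
W, PRIOR_VAR = 0.5, 100.0

def ep_clutter(y, iters=20):
    n = len(y)
    r = np.zeros(n)  # site precisions
    b = np.zeros(n)  # site precision-times-mean
    for _ in range(iters):
        for i in range(n):
            # leave-one-out (cavity) approximation q^{\a}
            prec_c = 1.0 / PRIOR_VAR + r.sum() - r[i]
            h_c = b.sum() - b[i]
            mean_c, var_c = h_c / prec_c, 1.0 / prec_c
            # exact moments of f_i(theta) * cavity: a 2-component mixture
            z1 = (1 - W) * npdf(y[i], mean_c, var_c + 1.0)
            z2 = W * npdf(y[i], 0.0, 10.0)
            z = z1 + z2
            v1 = 1.0 / (prec_c + 1.0)            # cavity combined with N(y; theta, 1)
            m1 = v1 * (h_c + y[i])
            mean = (z1 * m1 + z2 * mean_c) / z   # moment matching
            e2 = (z1 * (v1 + m1 ** 2) + z2 * (var_c + mean_c ** 2)) / z
            var = e2 - mean ** 2
            # divide the matched posterior by the cavity to update the site
            r[i] = 1.0 / var - prec_c
            b[i] = mean / var - h_c
    prec = 1.0 / PRIOR_VAR + r.sum()
    return b.sum() / prec, 1.0 / prec            # posterior mean and variance

print(ep_clutter(np.array([1.2, 0.8, -6.0, 1.1])))
```

Note that the division step can produce negative site precisions; robust implementations damp the site updates.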
Outline
• Background
• Expectation Propagation (EP) on dynamic systems
  – Poisson tracking
  – Signal detection for wireless communications
• Tree-structured EP on loopy graphs
• Conclusions
EP on Dynamic Systems

                               Directed                         Undirected
Generative                     Bayesian networks                Boltzmann machines
Conditional (Discriminative)   Maximum entropy Markov models    Conditional random fields

[Figure: a small example graph on nodes x1, x2 with observations y1, y2 for each model class.]
Object Tracking
Guess the position of an object given noisy observations
[Figure: object moving through states x1-x4, observed as noisy measurements y1-y4.]
Bayesian Network
$$x_t = x_{t-1} + \nu_t \qquad \text{(e.g. a random walk)}$$
$$y_t = x_t + \text{noise}$$

We want the distribution of the x's given the y's.

[Figure: chain x1 → x2 → … → xT with observations y1, y2, …, yT.]
Approximation
$$p(\mathbf{x}, \mathbf{y}) = p(x_1)\, p(y_1 \mid x_1) \prod_{t>1} p(x_t \mid x_{t-1})\, p(y_t \mid x_t)$$

$$q(\mathbf{x}) \propto \tilde p_1(x_1)\,\tilde o_1(x_1) \prod_{t>1} \tilde p_t(x_{t-1})\,\tilde p_t(x_t)\,\tilde o_t(x_t)$$

where q is factorized and Gaussian in x: each transition term contributes a backward factor in x_{t-1} and a forward factor in x_t, and each observation term an observation factor in x_t.
Message Interpretation
$$q(x_t) \propto \tilde p_t(x_t)\,\tilde o_t(x_t)\,\tilde p_{t+1}(x_t)$$

= (forward msg)(observation msg)(backward msg)

[Figure: node x_t with its observation y_t; the forward, backward, and observation messages meet at x_t.]
EP on Dynamic Systems

• Filtering: t = 1, …, T
  – Incorporate the forward message
  – Initialize the observation message
• Smoothing: t = T, …, 1
  – Incorporate the backward message
  – Compute the leave-one-out approximation by dividing out the old observation message
  – Re-approximate the new observation message
• Re-filtering: t = 1, …, T
  – Incorporate the forward and observation messages
Extension of EP
• Instead of matching moments, use any method for approximate filtering.
  – Examples: extended Kalman filter, statistical linearization, unscented filter
• All methods can be interpreted as finding linear/Gaussian approximations to the original terms.
Example: Poisson Tracking
• $y_t$ is an integer-valued Poisson variate with mean $\exp(x_t)$
Poisson Tracking Model
$$p(x_t \mid x_{t-1}) = \mathcal{N}(x_t;\, x_{t-1},\, 0.01)$$
$$p(x_1) = \mathcal{N}(x_1;\, 0,\, 100)$$
$$p(y_t \mid x_t) = e^{x_t y_t}\, e^{-e^{x_t}} / \, y_t!$$
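To make the setup concrete, here is a short simulation of this model; the horizon T = 100 and the random seed are arbitrary choices:

```python
import numpy as np

# Simulate the Poisson tracking model: random-walk state, Poisson counts.
rng = np.random.default_rng(0)
T = 100
x = np.empty(T)
x[0] = rng.normal(0.0, np.sqrt(100.0))                # p(x_1) = N(0, 100)
for t in range(1, T):
    x[t] = x[t - 1] + rng.normal(0.0, np.sqrt(0.01))  # random-walk dynamics
# note: the diffuse prior can give very large rates for extreme draws
y = rng.poisson(np.exp(x))                            # y_t ~ Poisson(exp(x_t))
```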
Approximate Observation Message
• The observation term $p(y_t \mid x_t)$ is not Gaussian in $x_t$
• The moments of $x_t$ are not analytic
• Two approaches:
  – Gauss-Hermite quadrature for the moments
  – Statistical linearization instead of moment matching
• Both work well
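Here is a sketch that plugs a Gauss-Hermite moment matcher into the filtering/smoothing schedule from the earlier EP-on-dynamic-systems slide. The quadrature order, sweep count, and function names are our own choices, and the backward pass is simplified (uninformative messages are skipped):

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

Q, V1 = 0.01, 100.0                        # transition and prior variances
NODES, WEIGHTS = hermegauss(30)            # probabilists' Gauss-Hermite rule

def match_moments(y, m, v):
    """Mean/variance of p(x) ∝ N(x; m, v) exp(y*x - e^x) via quadrature."""
    x = m + np.sqrt(v) * NODES
    ll = y * x - np.exp(x)                 # Poisson log-likelihood, const dropped
    w = WEIGHTS * np.exp(ll - ll.max())
    z = w.sum()
    mean = (w * x).sum() / z
    return mean, (w * (x - mean) ** 2).sum() / z

def ep_poisson_smoother(y, n_sweeps=5):
    T = len(y)
    r, b = np.zeros(T), np.zeros(T)        # observation messages (natural params)
    br, bh = np.zeros(T), np.zeros(T)      # backward messages (natural params)
    for _ in range(n_sweeps):
        # filtering pass: re-approximate each observation message in turn
        fm, fv = np.empty(T), np.empty(T)  # forward (predictive) moments
        m, v = 0.0, V1
        for t in range(T):
            fm[t], fv[t] = m, v
            prec_c = 1.0 / v + br[t]       # leave-one-out: forward * backward
            h_c = m / v + bh[t]
            mn, vn = match_moments(y[t], h_c / prec_c, 1.0 / prec_c)
            r[t], b[t] = 1.0 / vn - prec_c, mn / vn - h_c
            prec, h = 1.0 / v + r[t], m / v + b[t]
            m, v = h / prec, 1.0 / prec + Q      # predict to t+1
        # smoothing pass: recompute backward messages for the next sweep
        br[:], bh[:] = 0.0, 0.0
        for t in range(T - 2, -1, -1):
            prec = br[t + 1] + r[t + 1]
            h = bh[t + 1] + b[t + 1]
            if prec > 0:                   # skip uninformative messages
                v_msg = 1.0 / prec + Q     # propagate through the random walk
                br[t], bh[t] = 1.0 / v_msg, h / (prec * v_msg)
    prec = 1.0 / fv + br + r               # combine all three message types
    return (fm / fv + bh + b) / prec, 1.0 / prec

means, variances = ep_poisson_smoother(np.array([3, 5, 4, 6, 2]))
print(means)
```

The first sweep reduces to assumed-density filtering; later sweeps refine the observation messages against smoothed cavities, matching the filter/smooth/re-filter schedule.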
EP Accuracy Improves Significantly in only a few Iterations
Approximate vs. Exact Posterior
EP vs. Monte Carlo: Accuracy

[Figure: two panels comparing the estimated posterior mean and variance.]
Accuracy/Efficiency Tradeoff
EP for Digital Wireless Communication
• Signal detection problem
• Transmitted signal $s_t = a\,\sin(\omega t + \phi)$
• $(a, \phi)$ vary to encode each symbol
• Complex representation: $a\,e^{i\phi}$

[Figure: the symbol $a e^{i\phi}$ plotted in the complex plane (Re/Im axes).]
Binary Symbols, Gaussian Noise
• Symbols are 1 and –1 (in the complex plane)
• Received signal $y_t = s_t + \text{noise}$
• Optimal detection is easy

[Figure: received signal $y_t$ in the complex plane near the symbols $s_0$ and $s_1$.]
Fading Channel
• The channel systematically changes the amplitude and phase:
$$y_t = x_t\, s_t + \text{noise}$$
• $x_t$ changes over time

[Figure: rotated and scaled symbols $x_t s_0$, $x_t s_1$ and $x_{t-1} s_0$, $x_{t-1} s_1$ in the complex plane, with received signals $y_t$, $y_{t-1}$.]
Benchmark: Differential Detection
• Classical technique
• Use previous observation to estimate state
• Binary symbols only
Bayesian network for Signal Detection
[Figure: chain x1 → x2 → … → xT with symbols s1, …, sT and observations y1, …, yT.]
On-line EP Joint Signal Detection and Channel Estimation

• Iterate over the last L observations (the smoothing window)
• Observations before the window act as the prior for the current estimation
Computational Complexity

• Expectation propagation: O(nLd²)
• Stochastic mixture of Kalman filters: O(LMd²)
• Rao-Blackwellised particle smoothers: O(LMNd²)

n: number of EP iterations (typically 4 or 5)
d: dimension of the parameter vector
L: smoothing window length
M: number of samples in filtering
N: number of samples in smoothing
Experimental Results
EP outperforms particle smoothers in efficiency with comparable accuracy (Chen, Wang & Liu, 2000).
Bayesian Networks for Adaptive Decoding
[Figure: chain x1 → x2 → … → xT with information bits e1, …, eT and observations y1, …, yT.]

The information bits e_t are coded by a convolutional error-correcting encoder.
EP Outperforms Viterbi Decoding
Outline
• Background
• Expectation Propagation (EP) on dynamic systems
  – Poisson tracking
  – Signal detection for wireless communications
• Tree-structured EP on loopy graphs
• Conclusions
EP on Boltzmann Machines

                               Directed                         Undirected
Generative                     Bayesian networks                Boltzmann machines
Conditional (Discriminative)   Maximum entropy Markov models    Conditional random fields

[Figure: a small example graph on nodes x1, x2 with observations y1, y2 for each model class.]
Inference on Grids
Problem: estimate the marginal distributions of the variables indexed by the nodes of a loopy graph, e.g., $p(x_i)$, $i = 1, \ldots, 16$.

[Figure: 4×4 grid of nodes X1–X16.]
Boltzmann Machines
The joint distribution is a product of pair potentials:

$$p(\mathbf{x}) \propto \prod_a f_a(\mathbf{x})$$

We want to approximate it by a simpler distribution.

[Figure: four-node loopy graph x1–x4.]
BP vs. EP
[Figure: the four-node loopy graph; BP approximates it with a fully factorized (disconnected) graph, while EP approximates it with a tree.]
Junction Tree Representation
[Figure: p(x) and its approximation q(x), represented as a junction tree of cliques and separators.]
Approximating an Edge by a Tree
Each potential f_a in p is projected onto the tree structure of q: an off-tree edge f_a(x_i, x_j) is replaced by a product of pair terms along the tree path between x_i and x_j, so q keeps a tree factorization over pairwise and single-node marginals:

$$q(\mathbf{x}) = \frac{\prod_{(i,j) \in \mathcal{T}} q(x_i, x_j)}{\prod_i q(x_i)^{\deg(i) - 1}}$$

Correlations are not lost, but projected onto the tree.
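For instance, if the spanning tree of q for the four-node example is taken to be the star rooted at x_1 (an illustrative choice; the slides do not fix the tree), the factorization reads

$$q(\mathbf{x}) = \frac{q(x_1, x_2)\,q(x_1, x_3)\,q(x_1, x_4)}{q(x_1)^{2}}$$

since x_1 has degree 3, so its singleton marginal is divided out twice.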
Moment Matching
• Match the single and pairwise marginals of the original loopy graph and of the tree approximation
• Reduces to exact inference on single loops
  – Use cutset conditioning

[Figure: the four-node loopy graph and its tree approximation.]
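To see what the matching constraints are on a small case, the marginals of the four-node loop can be brute-forced by enumeration. The random potentials below are assumptions for illustration, and a real implementation would use cutset conditioning rather than enumeration:

```python
import numpy as np
from itertools import product

# Brute-force the single and pairwise marginals of a binary 4-node loop
# p(x) ∝ product of pair potentials on the cycle x1-x2-x3-x4-x1.
rng = np.random.default_rng(1)
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
pot = {e: rng.uniform(0.5, 2.0, size=(2, 2)) for e in edges}

states = list(product([0, 1], repeat=4))
p = np.array([np.prod([pot[e][s[e[0]], s[e[1]]] for e in edges])
              for s in states])
p /= p.sum()

# marginals that moment matching constrains, e.g. p(x1) and p(x1, x3)
p1 = np.array([sum(pv for s, pv in zip(states, p) if s[0] == k)
               for k in (0, 1)])
p13 = np.array([[sum(pv for s, pv in zip(states, p)
                     if s[0] == a and s[2] == b)
                 for b in (0, 1)] for a in (0, 1)])
print(p1, p13, sep="\n")
```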
Local Propagation
• Original EP: globally propagate evidence to the whole tree
  – Problem: computationally expensive
• Exploit the junction tree representation: only locally propagate evidence within the minimal subtree that is directly connected to the off-tree edge
  – Reduce computational complexity
  – Save memory

[Figure: junction tree with cliques (x1,x2), (x1,x3), (x1,x4), (x3,x5), (x3,x6), (x5,x7); global propagation updates the whole tree, while local propagation touches only the minimal subtree connected to the off-tree edge (x3,x4).]
4-node Graph
TreeEP = the proposed method, BP = loopy belief propagation, GBP = generalized belief propagation on triangles, MF = mean-field, TreeVB = variational tree.
Fully-connected graphs
Results are averaged over 10 graphs with randomly generated potentials
• TreeEP performs as well as or better than all other methods in both accuracy and efficiency!
8x8 grids, 10 trials
Method FLOPS Error
Exact 30,000 0
TreeEP 300,000 0.149
BP/double-loop 15,500,000 0.358
GBP 17,500,000 0.003
TreeEP versus BP and GBP
• TreeEP is always more accurate than BP and is often faster
• TreeEP is much more efficient than GBP and more accurate on some problems
• TreeEP converges more often than BP and GBP
Outline
• Background
• Expectation Propagation (EP) on dynamic systems
  – Poisson tracking
  – Signal detection for wireless communications
• Tree-structured EP on loopy graphs
• Conclusions
Conclusions
• EP algorithms outperform state-of-the-art inference methods on graphical models in the trade-off between accuracy and efficiency

[Figure: accuracy vs. efficiency plot with EP achieving both.]
Future Work
• EP is applicable to a wide range of applications
• EP is sensitive to the choice of approximation
  – How to choose an approximation family (e.g. a tree structure)?
  – More flexible approximations: mixtures of EP?
  – Error bounds?
Future Work

                               Directed                         Undirected
Generative                     Bayesian networks                Boltzmann machines
Conditional (Discriminative)   Maximum entropy Markov models    Conditional random fields

[Figure: a small example graph on nodes x1, x2 with observations y1, y2 for each model class.]
End
EP versus BP
• EP approximation is in a restricted family, e.g. Gaussian
• EP approximation does not have to be factorized
• EP applies to many more problems
  – e.g. mixtures of discrete and continuous variables
EP versus Monte Carlo
• Monte Carlo is general but expensive
• EP exploits underlying simplicity of the problem if it exists
• Monte Carlo is still needed for complex problems (e.g. large isolated peaks)
• The trick is to know what kind of problem you have
(Loopy) Belief propagation
• Specialize to factorized approximations ("messages"):

$$\tilde f_a(\mathbf{x}) = \prod_i \tilde f_a(x_i)$$

• Minimize KL-divergence = match the marginals of $f_a(\mathbf{x})\,q^{\backslash a}(\mathbf{x})$ (partially factorized) and $\tilde f_a(\mathbf{x})\,q^{\backslash a}(\mathbf{x})$ (fully factorized)
  – "send messages"
Limitation of BP
• If the dynamics or measurements are not linear and Gaussian, the complexity of the posterior increases with the number of measurements
• I.e., the BP equations are not "closed"
  – Beliefs need not stay within a given family*

*or any other exponential family
Approximate filtering
• Compute a Gaussian belief which approximates the true posterior:

$$q(x_t) \approx p(x_t \mid y_{1:t})$$

• E.g. extended Kalman filter, statistical linearization, unscented filter, assumed-density filter
EP perspective
• Approximate filtering is equivalent to replacing the true measurement/dynamics equations with linear/Gaussian equations. The exact update

$$p(x_t \mid y_{1:t}) \propto p(y_t \mid x_t)\, p(x_t \mid y_{1:t-1})$$

implies an effective observation term

$$\tilde p(y_t \mid x_t) \equiv \frac{q(x_t \mid y_{1:t})}{q(x_t \mid y_{1:t-1})}$$

which is Gaussian, because both q's are Gaussian.
EP perspective
• EKF, UKF, ADF are all algorithms for:

$$p(y_t \mid x_t) \to \tilde p(y_t \mid x_t), \qquad p(x_t \mid x_{t-1}) \to \tilde p(x_t \mid x_{t-1})$$

(nonlinear, non-Gaussian → linear, Gaussian)
Terminology
• Filtering: $p(x_t \mid y_{1:t})$
• Smoothing: $p(x_t \mid y_{1:t+L})$ where $L > 0$
• On-line: old data is discarded (fixed memory)
• Off-line: old data is re-used (unbounded memory)
Kalman filtering / Belief propagation
• Prediction:

$$p(x_t \mid y_{1:t-1}) = \int p(x_t \mid x_{t-1})\, p(x_{t-1} \mid y_{1:t-1})\, dx_{t-1}$$

• Measurement:

$$p(x_t \mid y_{1:t}) \propto p(y_t \mid x_t)\, p(x_t \mid y_{1:t-1})$$

• Smoothing:

$$p(x_t \mid y_{1:T}) = p(x_t \mid y_{1:t}) \int p(x_{t+1} \mid x_t)\, \frac{p(x_{t+1} \mid y_{1:T})}{p(x_{t+1} \mid y_{1:t})}\, dx_{t+1}$$
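As a concrete instance of the prediction and measurement updates, here is a minimal scalar Kalman filter for the random-walk model from the tracking slides; the noise variances and the function name are assumed for illustration:

```python
import numpy as np

def kalman_filter(y, q=0.01, r=1.0, m0=0.0, v0=100.0):
    """Scalar Kalman filter for x_t = x_{t-1} + nu_t, y_t = x_t + noise."""
    means, variances = [], []
    m, v = m0, v0                       # prior on x_1
    for obs in y:
        # measurement: p(x_t | y_{1:t}) ∝ p(y_t | x_t) p(x_t | y_{1:t-1})
        k = v / (v + r)                 # Kalman gain
        m = m + k * (obs - m)
        v = (1.0 - k) * v
        means.append(m); variances.append(v)
        # prediction: p(x_{t+1} | y_{1:t}) = ∫ p(x_{t+1}|x_t) p(x_t|y_{1:t}) dx_t
        v = v + q
    return np.array(means), np.array(variances)

means, variances = kalman_filter([0.9, 1.1, 1.4, 1.2])
print(means)
```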