Introduction to Graphical Models
Brookes Vision Lab Reading Group
Graphical Models
• To build a complex system using simpler parts.
• System should be consistent
• Parts are combined using probability
• Undirected – Markov random fields
• Directed – Bayesian networks
Overview
• Representation
• Inference
• Linear Gaussian Models
• Approximate inference
• Learning
Causality : Sprinkler “causes” wet grass
Representation
Conditional Independence
• Each node is independent of its ancestors given its parents
• P(C,S,R,W) = P(C) P(S|C) P(R|C,S) P(W|C,S,R)
  = P(C) P(S|C) P(R|C) P(W|S,R)
• Space required for n binary nodes
– O(2^n) without factorization
– O(n 2^k) with factorization, k = maximum fan-in
Inference
• Pr(S=1|W=1) = Pr(S=1,W=1)/Pr(W=1) = 0.2781/0.6471 = 0.430
• Pr(R=1|W=1) = Pr(R=1,W=1)/Pr(W=1) = 0.4581/0.6471 = 0.708
Explaining Away
• S and R “compete” to explain W=1
• S and R are conditionally dependent
• Pr(S=1|R=1,W=1) = 0.1945
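The posteriors quoted on these slides can be reproduced by brute-force enumeration of the factorized joint. The sketch below is illustrative only: the conditional probability tables do not appear in the slide text, so the values used here are an assumption (the usual textbook values for the cloudy/sprinkler/rain/wet-grass network), chosen because they reproduce the quoted numbers.

```python
import itertools

# ASSUMPTION: these CPT values are not given in the slides; they are the
# commonly used values for the sprinkler network and match the quoted results.
P_C1 = 0.5                                   # P(C=1)
P_S1_GIVEN_C = {0: 0.5, 1: 0.1}              # P(S=1 | C)
P_R1_GIVEN_C = {0: 0.2, 1: 0.8}              # P(R=1 | C)
P_W1_GIVEN_SR = {(0, 0): 0.0, (1, 0): 0.9,   # P(W=1 | S, R)
                 (0, 1): 0.9, (1, 1): 0.99}


def bern(p_one, value):
    """Probability that a binary variable with P(1) = p_one takes `value`."""
    return p_one if value == 1 else 1.0 - p_one


def joint(c, s, r, w):
    """P(C,S,R,W) = P(C) P(S|C) P(R|C) P(W|S,R)."""
    return (bern(P_C1, c) * bern(P_S1_GIVEN_C[c], s)
            * bern(P_R1_GIVEN_C[c], r) * bern(P_W1_GIVEN_SR[(s, r)], w))


def prob(**evidence):
    """Sum the joint over all assignments consistent with the evidence."""
    total = 0.0
    for c, s, r, w in itertools.product((0, 1), repeat=4):
        assignment = {"C": c, "S": s, "R": r, "W": w}
        if all(assignment[name] == val for name, val in evidence.items()):
            total += joint(c, s, r, w)
    return total


p_w = prob(W=1)
p_s_given_w = prob(S=1, W=1) / p_w
p_r_given_w = prob(R=1, W=1) / p_w
# Explaining away: once rain is known, the sprinkler becomes less likely.
p_s_given_rw = prob(S=1, R=1, W=1) / prob(R=1, W=1)
print(p_w, p_s_given_w, p_r_given_w, p_s_given_rw)
```

Enumeration costs O(2^n) and is only practical for tiny networks like this one.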
Inference
• Variable elimination
• Choosing optimal ordering – NP hard
• Greedy methods work well
• Computing several marginals
• Dynamic programming avoids redundant computation
• Sound familiar ??
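Variable elimination pushes each sum inside the product so that only small intermediate tables are ever built. A minimal sketch for the marginal P(W) in the sprinkler network follows. The CPT values, the elimination order (R, then S, then C) and the table names `t1` and `t2` are assumptions made for illustration, not taken from the slides.

```python
import numpy as np

# ASSUMPTION: CPT values are the usual textbook ones, not given in the slides.
p_c = np.array([0.5, 0.5])                     # p_c[c]      = P(C=c)
p_s_c = np.array([[0.5, 0.5], [0.9, 0.1]])     # p_s_c[c, s] = P(S=s | C=c)
p_r_c = np.array([[0.8, 0.2], [0.2, 0.8]])     # p_r_c[c, r] = P(R=r | C=c)
p_w_sr = np.zeros((2, 2, 2))                   # p_w_sr[s, r, w] = P(W=w | S=s, R=r)
p_w_sr[:, :, 1] = [[0.0, 0.9], [0.9, 0.99]]
p_w_sr[:, :, 0] = 1.0 - p_w_sr[:, :, 1]

# P(w) = sum_c P(c) sum_s P(s|c) sum_r P(r|c) P(w|s,r)
t1 = np.einsum("cr,srw->csw", p_r_c, p_w_sr)   # eliminate R
t2 = np.einsum("cs,csw->cw", p_s_c, t1)        # eliminate S
p_w = np.einsum("c,cw->w", p_c, t2)            # eliminate C

print(p_w)  # marginal distribution over W
```

The intermediate tables can be reused when several marginals are needed, which is the dynamic-programming point made above.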
Bayes Balls for Conditional Independence
A Unifying (Re)View
Linear Gaussian Model (LGM)
[Diagram: the basic model specializes into continuous-state LGMs (FA, SPCA, PCA, LDS) and discrete-state LGMs (Mixture of Gaussians, VQ, HMM)]
Basic Model
● State of a system is a k-vector x (unobserved)
● Output of a system is a p-vector y (observed)
● Often k << p
● Basic model
● x_{t+1} = A x_t + w
● y_t = C x_t + v
● A is the k x k transition matrix
● C is the p x k observation matrix
● w ~ N(0, Q)
● v ~ N(0, R)
● Noise processes are essential
● Noise is zero mean w.l.o.g.
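To make the generative process concrete, here is a minimal simulation sketch of the basic model. The function name `simulate_lgm`, the dimensions and the particular A, C, Q, R are hypothetical choices for illustration; the initial state is drawn from N(µ1, Q1) as on the degeneracy slide.

```python
import numpy as np


def simulate_lgm(A, C, Q, R, mu1, Q1, T, rng):
    """Sample x_1..x_T and y_1..y_T from
    x_{t+1} = A x_t + w,  y_t = C x_t + v,  w ~ N(0,Q),  v ~ N(0,R)."""
    k, p = A.shape[0], C.shape[0]
    x = np.zeros((T, k))
    y = np.zeros((T, p))
    x[0] = rng.multivariate_normal(mu1, Q1)          # x_1 ~ N(mu1, Q1)
    for t in range(T):
        y[t] = C @ x[t] + rng.multivariate_normal(np.zeros(p), R)
        if t + 1 < T:
            x[t + 1] = A @ x[t] + rng.multivariate_normal(np.zeros(k), Q)
    return x, y


# Hypothetical example: a slowly decaying 2-D rotation observed in 5-D.
rng = np.random.default_rng(0)
k, p, T = 2, 5, 100
angle = 0.3
A = 0.9 * np.array([[np.cos(angle), -np.sin(angle)],
                    [np.sin(angle), np.cos(angle)]])
C = rng.standard_normal((p, k))
x, y = simulate_lgm(A, C, np.eye(k), 0.1 * np.eye(p),
                    np.zeros(k), np.eye(k), T, rng)
print(x.shape, y.shape)
```

Setting Q to the identity here mirrors the point that structure in Q can be absorbed into A and C.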
Degeneracy in Basic Model
• Structure in Q can be moved to A and C
• W.l.o.g. Q = I
• R cannot be restricted as y_t are observed
• Components of x can be reordered arbitrarily.
• Ordering is based on norms of columns of C.
• x_1 ~ N(µ1, Q1)
• A and C are assumed to have rank k.
• Q, R, Q1 are assumed to be full rank.
Probability Computation
• P( x_{t+1} | x_t ) = N(A x_t, Q; x_{t+1})
• P( y_t | x_t ) = N(C x_t, R; y_t)
• P({x_1,..,x_T}, {y_1,..,y_T}) = P(x_1) ∏_{t=1}^{T-1} P(x_{t+1} | x_t) ∏_{t=1}^{T} P(y_t | x_t)
• Negative log probability
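The formula for the negative log probability is not present in the slide text. Taking logs of the factorized joint with Gaussian factors gives the following expression; it is a derivation sketch from the definitions on this slide, not a transcription of the original slide.

```latex
-2 \log P(\{x_t\}, \{y_t\})
  = \sum_{t=1}^{T} (y_t - C x_t)^\top R^{-1} (y_t - C x_t) + T \log |R|
  + \sum_{t=1}^{T-1} (x_{t+1} - A x_t)^\top Q^{-1} (x_{t+1} - A x_t) + (T-1) \log |Q|
  + (x_1 - \mu_1)^\top Q_1^{-1} (x_1 - \mu_1) + \log |Q_1|
  + T (p + k) \log 2\pi
```

The expression is a sum of quadratic forms in the states and observations, which is why inference in this model reduces to linear algebra.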
Inference
● Given model parameters {A, C, Q, R, µ1, Q1}
● Given observations y
● What can be inferred about hidden states x ?
● Total likelihood
● Filtering: P (x(t) | {y(1), ... , y(t)})
● Smoothing: P (x(t) | {y(1), ... , y(T)})
● Partial smoothing: P (x(t) | {y(1), ... , y(t+t')})
● Partial prediction: P (x(t) | {y(1), ... , y(t-t')})
● Intermediate values of recursive methods for computing total likelihood.
Learning
• Unknown parameters {A, C, Q, R, µ1, Q1}
• Given observations y
• Log-likelihood
F(Q,θ) – free energy
EM algorithm
• Alternate between maximizing F(Q,θ) w.r.t. Q and θ.
• F = L at the beginning of M-step
• E-step does not change θ
• Therefore, likelihood does not decrease.
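The definition of the free energy is not shown in the slide text. The standard form, written here as a sketch with Q denoting a distribution over the hidden states (not the noise covariance), makes the three bullet points above explicit:

```latex
F(Q, \theta) = \int Q(x) \log P(x, y \mid \theta)\, dx - \int Q(x) \log Q(x)\, dx
             = L(\theta) - \mathrm{KL}\big( Q(x) \,\|\, P(x \mid y, \theta) \big) \le L(\theta)

\text{E-step: } Q_{i+1} = \arg\max_{Q} F(Q, \theta_i) = P(x \mid y, \theta_i)
\qquad
\text{M-step: } \theta_{i+1} = \arg\max_{\theta} F(Q_{i+1}, \theta)
```

After the E-step the KL term is zero, so F = L; the M-step can only raise F, and L is at least F, so the likelihood cannot decrease.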
Continuous-State LGM
Static Data Modeling
● No temporal dependence
● Factor analysis
● SPCA
● PCA
Time-series Modeling
● Time ordering of data crucial
● LDS (Kalman filter models)
Static Data Modelling
• A = 0
• x = w
• y = C x + v
• x_1 ~ N(0, Q)
• y ~ N(0, C Q C' + R)
• Degeneracy in model
• Learning: EM
– R restricted
• Inference
Factor Analysis
• Restrict R to be diagonal.
• Q = I
• x – factors
• C – factor loading matrix
• R – uniqueness
• Learning – EM, quasi-Newton optimization
• Inference
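Inference in factor analysis is a single Gaussian conditioning step: with Q = I and y ~ N(0, C C' + R), the posterior over factors is x | y ~ N(βy, I − βC) with β = C'(C C' + R)^{-1}. The sketch below is a minimal illustration; the function name `fa_posterior` and the random test matrices are hypothetical.

```python
import numpy as np


def fa_posterior(y, C, R):
    """Posterior mean and covariance of the factors x given one observation y,
    for the model x ~ N(0, I), y = C x + v, v ~ N(0, R)."""
    k = C.shape[1]
    beta = C.T @ np.linalg.inv(C @ C.T + R)   # k x p gain matrix
    mean = beta @ y
    cov = np.eye(k) - beta @ C
    return mean, cov


# Hypothetical example: 5 observed dimensions, 2 factors, diagonal R.
rng = np.random.default_rng(1)
C = rng.standard_normal((5, 2))
R = np.diag(rng.uniform(0.5, 1.5, size=5))
y = rng.standard_normal(5)
mean, cov = fa_posterior(y, C, R)
print(mean, cov)
```

The posterior covariance does not depend on y, so it can be computed once and reused for every data point.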
SPCA
• R = εI
• ε – global noise level
• Columns of C span the principal subspace.
• Learning – EM algorithm
• Inference
PCA
• R = lim_{ε→0} εI
• Learning
– Diagonalize sample covariance of data
– Leading k eigenvalues and eigenvectors define C
– EM determines leading eigenvectors without diagonalization
• Inference
– Noise becomes infinitesimal
– Posterior collapses to a single point
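Both learning routes can be compared numerically. The sketch below diagonalizes the sample covariance and, separately, runs the zero-noise EM iteration (E-step: X = (C'C)^{-1} C'Y, M-step: C = Y X'(X X')^{-1}); it then checks that the two span the same subspace. Function names, the synthetic data and the iteration count are assumptions for illustration. Note that EM recovers the principal subspace, not the individual ordered eigenvectors.

```python
import numpy as np


def pca_eig(Y, k):
    """Leading k eigenvectors of the sample covariance. Y is p x N, zero mean."""
    cov = Y @ Y.T / Y.shape[1]
    evals, evecs = np.linalg.eigh(cov)          # eigenvalues in ascending order
    order = np.argsort(evals)[::-1][:k]
    return evecs[:, order], evals[order]


def pca_em(Y, k, n_iter=100, seed=0):
    """EM for PCA in the zero-noise limit; returns a p x k basis C."""
    rng = np.random.default_rng(seed)
    C = rng.standard_normal((Y.shape[0], k))
    for _ in range(n_iter):
        X = np.linalg.solve(C.T @ C, C.T @ Y)   # E-step: project data onto C
        C = Y @ X.T @ np.linalg.inv(X @ X.T)    # M-step: refit C to the states
    return C


def projector(B):
    """Orthogonal projector onto the column space of B."""
    Qb, _ = np.linalg.qr(B)
    return Qb @ Qb.T


# Hypothetical data: a 2-D signal embedded in 6-D plus small isotropic noise.
rng = np.random.default_rng(0)
p, k, N = 6, 2, 500
basis, _ = np.linalg.qr(rng.standard_normal((p, k)))
latent = np.array([[5.0], [3.0]]) * rng.standard_normal((k, N))
Y = basis @ latent + 0.1 * rng.standard_normal((p, N))
Y -= Y.mean(axis=1, keepdims=True)

U, top_evals = pca_eig(Y, k)
C_em = pca_em(Y, k)
print(np.abs(projector(U) - projector(C_em)).max())
```

Each EM iteration costs O(pNk), which is the appeal when p is large and only a few components are needed.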
Linear Dynamical Systems
• Inference – Kalman filter
• Smoothing – RTS recursions
• Learning – EM algorithm
– C known – Shumway and Stoffer, 1982
– All unknown – Ghahramani and Hinton, 1995
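A minimal sketch of the Kalman filter for the basic model follows. It returns the filtered means and covariances P(x_t | y_1..y_t) and accumulates the total log-likelihood from the innovations, which are the intermediate values of the recursion. The function name and the scalar demo are hypothetical, and the code uses an explicit matrix inverse for readability rather than numerical robustness.

```python
import numpy as np


def kalman_filter(y, A, C, Q, R, mu1, Q1):
    """Filtered state estimates and total log-likelihood for
    x_{t+1} = A x_t + w,  y_t = C x_t + v.  y has shape (T, p)."""
    T, p = y.shape
    k = A.shape[0]
    means = np.zeros((T, k))
    covs = np.zeros((T, k, k))
    loglik = 0.0
    m_pred, V_pred = mu1, Q1                     # prior on x_1
    for t in range(T):
        innov = y[t] - C @ m_pred                # innovation
        S = C @ V_pred @ C.T + R                 # innovation covariance
        K = V_pred @ C.T @ np.linalg.inv(S)      # Kalman gain
        means[t] = m_pred + K @ innov            # measurement update
        covs[t] = V_pred - K @ C @ V_pred
        _, logdet = np.linalg.slogdet(S)
        loglik += -0.5 * (p * np.log(2 * np.pi) + logdet
                          + innov @ np.linalg.solve(S, innov))
        m_pred = A @ means[t]                    # time update
        V_pred = A @ covs[t] @ A.T + Q
    return means, covs, loglik


# Hypothetical demo: scalar random walk observed in unit-variance noise.
rng = np.random.default_rng(0)
states = np.cumsum(rng.standard_normal(50))
obs = (states + rng.standard_normal(50)).reshape(-1, 1)
means, covs, loglik = kalman_filter(obs, np.eye(1), np.eye(1), np.eye(1),
                                    np.eye(1), np.zeros(1), np.eye(1))
print(means[-1], covs[-1], loglik)
```

The RTS smoother would add a backward pass over these filtered quantities; it is omitted here.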
Discrete-State LGM
• x_{t+1} = WTA[A x_t + w]
• y_t = C x_t + v
• x_1 = WTA[N(µ1, Q1)]
• WTA – winner-take-all: sets the largest component to 1 and all others to 0
Discrete-State LGM
Static Data Modeling
● Mixture of Gaussians
● VQ
Time-series Modeling
● HMM
Static Data Modelling
• A = 0
• x = WTA[w]
• w ~ N(µ, Q)
• y = C x + v
• π_j = P(x = e_j)
• Nonzero µ for nonuniform π_j
• y ~ N(C_j, R) given x = e_j
• C_j – jth column of C
Mixture of Gaussians
• Mixing coefficient of cluster j – π_j
• Mean – columns C_j
• Variance – R
• Learning: EM (corresponds to ML competitive learning)
• Inference
Vector Quantization
• Observation noise becomes infinitesimal
• Inference problem solved by 1NN rule
• Euclidean distance for diagonal R
• Mahalanobis distance for unscaled R
• Posterior collapses to closest cluster
• Learning with EM = batch version of k-means
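The batch k-means loop that EM reduces to in this limit can be sketched as follows: the E-step becomes a hard nearest-neighbour assignment and the M-step recomputes each centre as the mean of its points. The function name, the Euclidean distance, the fixed iteration count and the synthetic data are assumptions for illustration.

```python
import numpy as np


def kmeans_batch(Y, centers, n_iter=20):
    """Batch k-means. Y is N x p; centers is K x p (initial guesses)."""
    centers = centers.copy()
    for _ in range(n_iter):
        # "E-step": 1NN rule, posterior collapses to the closest cluster.
        d2 = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # "M-step": each centre (column C_j) becomes the mean of its points.
        for j in range(centers.shape[0]):
            if np.any(labels == j):
                centers[j] = Y[labels == j].mean(axis=0)
    return centers, labels


# Hypothetical data: two well-separated 2-D clusters of 50 points each.
rng = np.random.default_rng(0)
Y = np.vstack([rng.normal(0.0, 0.5, size=(50, 2)),
               rng.normal(10.0, 0.5, size=(50, 2))])
centers, labels = kmeans_batch(Y, Y[[0, 50]])
print(centers)
```

Empty clusters are simply left where they are in this sketch; a practical implementation would need a policy for re-seeding them.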
Time-series modelling
HMM
• Transition matrix T
• T_{i,j} = P(x_{t+1} = e_j | x_t = e_i)
• For every T, there exist A and Q
• Filtering: forward recursions
• Smoothing: forward-backward algorithm
• Learning: EM (called Baum-Welch reestimation)
• MAP state sequences – Viterbi
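The forward recursion for filtering can be sketched in a few lines. The version below works on a precomputed table of observation likelihoods P(y_t | x_t = e_j), normalizes at each step so that each row is the filtered posterior, and accumulates the log-likelihood. The function names and the small numeric example are hypothetical; the brute-force sum is included only as a sanity check.

```python
import itertools

import numpy as np


def hmm_forward(obs_lik, T_mat, pi1):
    """obs_lik[t, j] = P(y_t | x_t = e_j); T_mat[i, j] = P(x_{t+1}=e_j | x_t=e_i);
    pi1[j] = P(x_1 = e_j). Returns filtered posteriors and log P(y_1..y_T)."""
    n_steps, n_states = obs_lik.shape
    alpha = np.zeros((n_steps, n_states))
    loglik = 0.0
    a = pi1 * obs_lik[0]
    for t in range(n_steps):
        if t > 0:
            a = (alpha[t - 1] @ T_mat) * obs_lik[t]   # predict, then weight
        norm = a.sum()                                 # P(y_t | y_1..y_{t-1})
        alpha[t] = a / norm                            # P(x_t | y_1..y_t)
        loglik += np.log(norm)
    return alpha, loglik


def brute_force_likelihood(obs_lik, T_mat, pi1):
    """Sum the joint over every state sequence (exponential cost)."""
    n_steps, n_states = obs_lik.shape
    total = 0.0
    for seq in itertools.product(range(n_states), repeat=n_steps):
        prob = pi1[seq[0]] * obs_lik[0, seq[0]]
        for t in range(1, n_steps):
            prob *= T_mat[seq[t - 1], seq[t]] * obs_lik[t, seq[t]]
        total += prob
    return total


# Hypothetical two-state example with three observations.
T_mat = np.array([[0.9, 0.1], [0.2, 0.8]])
pi1 = np.array([0.5, 0.5])
obs_lik = np.array([[0.7, 0.1], [0.4, 0.5], [0.2, 0.9]])
alpha, loglik = hmm_forward(obs_lik, T_mat, pi1)
print(np.exp(loglik), brute_force_likelihood(obs_lik, T_mat, pi1))
```

The recursion costs O(T K^2) for K states, against O(K^T) for the brute-force sum.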