Introduction to Graphical Models


Brookes Vision Lab Reading Group

Graphical Models

• To build a complex system using simpler parts.

• System should be consistent
• Parts are combined using probability
• Undirected – Markov random fields
• Directed – Bayesian networks

Overview

• Representation
• Inference
• Linear Gaussian Models
• Approximate inference
• Learning

Causality: Sprinkler “causes” wet grass

Representation

Conditional Independence

• Independent of ancestors given parents
• P(C,S,R,W) = P(C) P(S|C) P(R|C,S) P(W|C,S,R)
             = P(C) P(S|C) P(R|C) P(W|S,R)

• Space required for n binary nodes
  – O(2^n) without factorization
  – O(n 2^k) with factorization, k = maximum fan-in
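For example, with n = 20 binary nodes and maximum fan-in k = 2, the full joint table has 2^20 ≈ 10^6 entries, while the factored representation needs only on the order of 20 × 2^2 = 80 conditional-probability entries.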

Inference

• Pr(S=1|W=1) = Pr(S=1,W=1)/Pr(W=1) = 0.2781/0.6471 = 0.430
• Pr(R=1|W=1) = Pr(R=1,W=1)/Pr(W=1) = 0.4581/0.6471 = 0.708

Explaining Away

• S and R “compete” to explain W=1

• S and R are conditionally dependent

• Pr(S=1|R=1,W=1) = 0.1945
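The numbers on the last two slides can be checked by brute-force enumeration over the joint. The sketch below assumes the standard sprinkler-network CPTs from Kevin Murphy's Bayes-net tutorial; the slides do not list the tables, but these values reproduce the quoted probabilities.

from itertools import product

# Hypothetical CPTs (standard sprinkler-network values, consistent with the
# probabilities quoted on the slides).
P_C = {1: 0.5, 0: 0.5}                               # P(C)
P_S = {0: {1: 0.5, 0: 0.5}, 1: {1: 0.1, 0: 0.9}}     # P(S=s | C=c) as P_S[c][s]
P_R = {0: {1: 0.2, 0: 0.8}, 1: {1: 0.8, 0: 0.2}}     # P(R=r | C=c) as P_R[c][r]
P_W1 = {(0, 0): 0.0, (1, 0): 0.9, (0, 1): 0.9, (1, 1): 0.99}  # P(W=1 | S=s, R=r)

def joint(c, s, r, w):
    """P(C=c, S=s, R=r, W=w) from the factored form P(C)P(S|C)P(R|C)P(W|S,R)."""
    pw1 = P_W1[(s, r)]
    return P_C[c] * P_S[c][s] * P_R[c][r] * (pw1 if w == 1 else 1.0 - pw1)

def prob(**evidence):
    """Probability of the evidence: sum the joint over all consistent assignments."""
    total = 0.0
    for c, s, r, w in product((0, 1), repeat=4):
        values = {"C": c, "S": s, "R": r, "W": w}
        if all(values[k] == v for k, v in evidence.items()):
            total += joint(c, s, r, w)
    return total

print(prob(S=1, W=1) / prob(W=1))             # 0.2781 / 0.6471 ≈ 0.430
print(prob(R=1, W=1) / prob(W=1))             # 0.4581 / 0.6471 ≈ 0.708
print(prob(S=1, R=1, W=1) / prob(R=1, W=1))   # ≈ 0.1945 (explaining away)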


Inference

• Variable elimination
• Choosing the optimal elimination ordering is NP-hard
• Greedy methods work well in practice
• Computing several marginals
• Dynamic programming avoids redundant computation (see the sketch below)
• Sound familiar?
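As a rough illustration of variable elimination, the sketch below reuses the hypothetical sprinkler CPTs from the enumeration sketch above. Factors are maps from assignments to values; each variable is multiplied into the factors that mention it and summed out once, so intermediate results are shared rather than recomputed. The elimination order C, R, S is an arbitrary choice.

from itertools import product

def multiply(f, g):
    """Pointwise product of two factors; a factor is (variables, {assignment: value})."""
    fv, ft = f
    gv, gt = g
    out_vars = fv + tuple(v for v in gv if v not in fv)
    out = {}
    for assign in product((0, 1), repeat=len(out_vars)):
        a = dict(zip(out_vars, assign))
        out[assign] = ft[tuple(a[v] for v in fv)] * gt[tuple(a[v] for v in gv)]
    return out_vars, out

def sum_out(var, f):
    """Marginalize a variable out of a factor."""
    fv, ft = f
    keep = tuple(v for v in fv if v != var)
    out = {}
    for assign, val in ft.items():
        key = tuple(x for v, x in zip(fv, assign) if v != var)
        out[key] = out.get(key, 0.0) + val
    return keep, out

def eliminate(factors, order):
    """Variable elimination: combine the factors touching each variable, then sum it out."""
    for var in order:
        touching = [f for f in factors if var in f[0]]
        rest = [f for f in factors if var not in f[0]]
        prod = touching[0]
        for f in touching[1:]:
            prod = multiply(prod, f)
        factors = rest + [sum_out(var, prod)]
    result = factors[0]
    for f in factors[1:]:
        result = multiply(result, f)
    return result

# The sprinkler CPTs as factors (same hypothetical values as above).
pw1 = {(0, 0): 0.0, (1, 0): 0.9, (0, 1): 0.9, (1, 1): 0.99}
fC = (("C",), {(0,): 0.5, (1,): 0.5})
fS = (("C", "S"), {(0, 0): 0.5, (0, 1): 0.5, (1, 0): 0.9, (1, 1): 0.1})
fR = (("C", "R"), {(0, 0): 0.8, (0, 1): 0.2, (1, 0): 0.2, (1, 1): 0.8})
fW = (("S", "R", "W"), {(s, r, w): pw1[(s, r)] if w else 1 - pw1[(s, r)]
                        for s, r, w in product((0, 1), repeat=3)})

marginal_W = eliminate([fC, fS, fR, fW], order=("C", "R", "S"))
print(marginal_W[1][(1,)])   # P(W=1) ≈ 0.6471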

Bayes Balls for Conditional Independence

A Unifying (Re)View

Linear Gaussian Model (LGM)

[Taxonomy figure: continuous-state LGMs – FA, SPCA, PCA, LDS; discrete-state LGMs – Mixture of Gaussians, VQ, HMM]

Continuous-State LGM

Basic Model


● State of the system is a k-vector x (unobserved)
● Output of the system is a p-vector y (observed)
● Often k << p

● Basic model
  ● x_{t+1} = A x_t + w
  ● y_t = C x_t + v

● A is the k x k transition matrix
● C is the p x k observation matrix
● w ~ N(0, Q)
● v ~ N(0, R)

● Noise processes are essential
● Zero mean w.l.o.g.
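A minimal simulation of this basic model; the dimensions and parameter values below are illustrative, not taken from the slides.

import numpy as np

rng = np.random.default_rng(0)
k, p, T = 2, 5, 100                        # state dim, output dim, length (k << p)

A = np.array([[0.99, 0.1], [-0.1, 0.99]])  # k x k transition matrix
C = rng.normal(size=(p, k))                # p x k observation matrix
Q = np.eye(k) * 0.01                       # state-noise covariance (w ~ N(0, Q))
R = np.eye(p) * 0.1                        # observation-noise covariance (v ~ N(0, R))
mu1, Q1 = np.zeros(k), np.eye(k)           # initial-state distribution

x = np.zeros((T, k))
y = np.zeros((T, p))
x[0] = rng.multivariate_normal(mu1, Q1)
for t in range(T):
    y[t] = C @ x[t] + rng.multivariate_normal(np.zeros(p), R)
    if t + 1 < T:
        x[t + 1] = A @ x[t] + rng.multivariate_normal(np.zeros(k), Q)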

Degeneracy in Basic Model

• Structure in Q can be moved to A and C
• W.l.o.g. Q = I
• R cannot be restricted, as the y_t are observed
• Components of x can be reordered arbitrarily
• Ordering is based on the norms of the columns of C
• x_1 ~ N(µ_1, Q_1)
• A and C are assumed to have rank k
• Q, R, Q_1 are assumed to be full rank

Probability Computation

• P(x_{t+1} | x_t) = N(A x_t, Q; x_{t+1})

• P(y_t | x_t) = N(C x_t, R; y_t)

• P({x_1,…,x_T}, {y_1,…,y_T}) = P(x_1) ∏_{t=1}^{T-1} P(x_{t+1}|x_t) ∏_{t=1}^{T} P(y_t|x_t)

• Negative log probability (expanded below)
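Writing out the Gaussian densities above, the negative log probability becomes a sum of quadratic (Mahalanobis) penalties plus log-determinant terms, roughly:

-log P({x}, {y}) = Σ_{t=1}^{T-1} ½ (x_{t+1} - A x_t)' Q^{-1} (x_{t+1} - A x_t) + ((T-1)/2) log|Q|
                 + Σ_{t=1}^{T} ½ (y_t - C x_t)' R^{-1} (y_t - C x_t) + (T/2) log|R|
                 + ½ (x_1 - µ_1)' Q_1^{-1} (x_1 - µ_1) + ½ log|Q_1| + const.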

Inference

● Given model parameters {A, C, Q, R, µ_1, Q_1}
● Given observations y
● What can be inferred about the hidden states x?
● Total likelihood

● Filtering: P(x_t | {y_1, …, y_t})
● Smoothing: P(x_t | {y_1, …, y_T})
● Partial smoothing: P(x_t | {y_1, …, y_{t+t'}})
● Partial prediction: P(x_t | {y_1, …, y_{t-t'}})
● Intermediate values of the recursive methods for computing the total likelihood

Learning

• Unknown parameters {A, C, Q, R, µ_1, Q_1}
• Given observations y
• Log-likelihood

F(Q, θ) – free energy

EM algorithm

• Alternate between maximizing F(Q, θ) w.r.t. Q and θ (decomposition shown below)
• F = L at the beginning of the M-step
• The E-step does not change θ
• Therefore, the likelihood does not decrease
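One way to see why this works is the standard free-energy decomposition, not spelled out on the slide:

F(Q, θ) = E_Q[log P(x, y | θ)] - E_Q[log Q(x)]
        = L(θ) - KL( Q(x) || P(x | y, θ) )

The E-step sets Q(x) = P(x | y, θ), driving the KL term to zero so that F = L; the M-step then increases F (and hence L) with Q held fixed.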

Continuous-State LGM

Static Data Modeling
● No temporal dependence
● Factor analysis
● SPCA
● PCA

Time-series Modeling
● Time ordering of the data is crucial
● LDS (Kalman filter models)

Static Data Modelling

• A = 0
• x = w
• y = C x + v
• x_1 ~ N(0, Q)
• y ~ N(0, C Q C' + R)
• Degeneracy in model
• Learning: EM
  – R restricted
• Inference

Factor Analysis

• Restrict R to be diagonal
• Q = I
• x – factors
• C – factor loading matrix
• R – uniqueness
• Learning – EM, quasi-Newton optimization
• Inference

SPCA

• R = εI
• ε – global noise level
• Columns of C span the principal subspace
• Learning – EM algorithm
• Inference

PCA

• R = lim_{ε→0} εI
• Learning
  – Diagonalize the sample covariance of the data (sketched below)
  – Leading k eigenvalues and eigenvectors define C
  – EM determines the leading eigenvectors without diagonalization
• Inference
  – Noise becomes infinitesimal
  – Posterior collapses to a single point
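A minimal sketch of the learning step just described; the data matrix Y and the choice of k below are illustrative assumptions.

import numpy as np

def pca_columns(Y, k):
    """Y: (N, p) data matrix. Returns C: (p, k), the leading eigenvectors."""
    Yc = Y - Y.mean(axis=0)                  # remove the mean
    S = Yc.T @ Yc / len(Y)                   # sample covariance (p x p)
    eigvals, eigvecs = np.linalg.eigh(S)     # eigenvalues in ascending order
    return eigvecs[:, ::-1][:, :k]           # leading k eigenvectors define C

rng = np.random.default_rng(0)
Y = rng.normal(size=(500, 5)) @ rng.normal(size=(5, 5))
C = pca_columns(Y, k=2)
print(C.shape)                               # (5, 2)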

Linear Dynamical Systems

• Inference – Kalman filter
• Smoothing – RTS recursions
• Learning – EM algorithm
  – C known – Shumway and Stoffer, 1982
  – All parameters unknown – Ghahramani and Hinton, 1995
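A minimal sketch of the Kalman-filter recursion for this model, in the same A, C, Q, R, µ_1, Q_1 notation as above; this is a generic textbook form, not code from the slides.

import numpy as np

def kalman_filter(ys, A, C, Q, R, mu1, Q1):
    """Filtered means/covariances of P(x_t | y_1..y_t) for observations ys."""
    m, V = mu1, Q1                        # prior on x_1
    means, covs = [], []
    for y in ys:
        # Measurement update: condition on y_t.
        S = C @ V @ C.T + R               # innovation covariance
        K = V @ C.T @ np.linalg.inv(S)    # Kalman gain
        m = m + K @ (y - C @ m)
        V = V - K @ C @ V
        means.append(m)
        covs.append(V)
        # Time update: predict x_{t+1}.
        m = A @ m
        V = A @ V @ A.T + Q
    return means, covs

The ys could be, for example, the observations generated by the simulation sketch earlier.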

Discrete-State LGM

• x_{t+1} = WTA[A x_t + w]
• y_t = C x_t + v
• x_1 = WTA[N(µ_1, Q_1)]
• WTA[·] – winner-take-all nonlinearity

Discrete-State LGM

Static Data Modeling
● Mixture of Gaussians
● VQ

Time-series Modeling
● HMM

Static Data Modelling

• A = 0
• x = WTA[w]
• w ~ N(µ, Q)
• y = C x + v
• π_j = P(x = e_j)

• Nonzero µ for nonuniform π_j

• y ~ N(C_j, R) given x = e_j

• C_j – j-th column of C

Mixture of Gaussians

• Mixing coefficient of cluster j – π_j
• Means – the columns C_j
• Variance – R
• Learning: EM (corresponds to ML competitive learning)
• Inference (see the sketch below)
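As referenced above, a minimal sketch of the inference (E-step) computation, i.e. the responsibilities P(x = e_j | y) ∝ π_j N(y; C_j, R); array shapes and names are illustrative.

import numpy as np

def responsibilities(Y, C, R, pi):
    """Y: (N, p) data, C: (p, k) cluster means as columns, R: (p, p), pi: (k,)."""
    Rinv = np.linalg.inv(R)
    log_post = np.zeros((len(Y), len(pi)))
    for j in range(len(pi)):
        d = Y - C[:, j]                              # (N, p) residuals
        maha = np.einsum('np,pq,nq->n', d, Rinv, d)  # Mahalanobis distances
        log_post[:, j] = np.log(pi[j]) - 0.5 * maha  # shared 0.5*log|2πR| term dropped
    log_post -= log_post.max(axis=1, keepdims=True)  # stabilize before exponentiating
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)    # P(x = e_j | y_i)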

Vector Quantization

• Observation noise becomes infinitesimal
• Inference problem solved by the 1-NN rule
• Euclidean distance for diagonal R
• Mahalanobis distance for unscaled R
• Posterior collapses onto the closest cluster
• Learning with EM = batch version of k-means (sketched below)
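A minimal batch k-means sketch matching the zero-noise picture above (Euclidean distances, hard nearest-cluster assignments); names and the initialization scheme are illustrative.

import numpy as np

def kmeans(Y, k, iters=50, seed=0):
    """Y: (N, p) float data. Returns cluster centers and hard assignments."""
    rng = np.random.default_rng(seed)
    centers = Y[rng.choice(len(Y), size=k, replace=False)]   # initial means
    for _ in range(iters):
        # "E-step": assign each point to its nearest center (1-NN rule).
        d2 = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # "M-step": recompute each center as the mean of its assigned points.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = Y[labels == j].mean(axis=0)
    return centers, labels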

Time-series modelling

HMM

• Transition matrix T
• T_{i,j} = P(x_{t+1} = e_j | x_t = e_i)
• For every T, there exist A and Q
• Filtering: forward recursions (sketched below)
• Smoothing: forward-backward algorithm
• Learning: EM (called Baum-Welch re-estimation)
• MAP state sequences – Viterbi
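A minimal sketch of the forward (filtering) recursion with rescaling. The discrete emission matrix B and initial distribution pi are hypothetical additions for illustration, since the slides only mention the transition matrix T.

import numpy as np

def forward(ys, T, B, pi):
    """ys: observation indices; T: (k, k) with T[i, j] = P(x_{t+1}=e_j | x_t=e_i);
    B: (k, num_symbols) emission probabilities; pi: (k,) initial distribution."""
    alpha = pi * B[:, ys[0]]               # alpha_1(j) ∝ pi_j P(y_1 | x_1 = e_j)
    log_lik = np.log(alpha.sum())
    alpha = alpha / alpha.sum()
    filtered = [alpha]
    for y in ys[1:]:
        alpha = (alpha @ T) * B[:, y]      # sum_i alpha(i) T_{i,j}, weighted by emission
        log_lik += np.log(alpha.sum())
        alpha = alpha / alpha.sum()
        filtered.append(alpha)             # P(x_t = e_j | y_1..y_t)
    return filtered, log_lik               # filtered marginals and log P(y_1..y_T)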
