Introduction to Graphical models. Guillaume Obozinski, Ecole des Ponts ParisTech. INIT/AERFAI Summer School on Machine Learning, Benicàssim, June 26th 2017.

Source: imagine.enpc.fr/~obozinsg/benicassim/benicassim1.pdf (retrieved 2017-07-04)


Introduction to Graphical models

Guillaume Obozinski

Ecole des Ponts - ParisTech

INIT/AERFAI Summer school on Machine Learning

Benicassim, June 26th 2017

G. Obozinski Introduction to Graphical models 1/47

Structured problems in HD

Sequence modelling

How to model the distribution of DNA sequences of length n?

Naive model → 4^n − 1 parameters

Independent model → 3n parameters

(diagram: four unconnected nodes x1, x2, x3, x4)

First-order Markov chain:

(diagram: chain x1 → x2 → x3 → x4)

Second-order Markov chain:

(diagram: x1, x2, x3, x4, each node depending on its two predecessors)

Number of parameters O(n) for chains of length n.
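The parameter counts above can be verified, and a first-order chain sampled, with a short sketch. This is illustrative only; the uniform initial law and transition table are made-up values, and the Markov-chain count assumes position-dependent (non-homogeneous) transitions, which is what gives O(n):

```python
import random

ALPHABET = "ACGT"

def param_counts(n, K=4):
    """Free parameters for length-n sequences over a K-letter alphabet."""
    naive = K**n - 1                            # full joint table
    independent = n * (K - 1)                   # one marginal per position
    markov1 = (K - 1) + (n - 1) * K * (K - 1)   # initial law + transition tables
    return naive, independent, markov1

def sample_chain(n, init, trans, seed=0):
    """Sample a length-n sequence from a homogeneous first-order Markov chain.

    init: weights over ALPHABET for the first letter
    trans: dict mapping a letter to weights over ALPHABET for the next letter
    """
    rng = random.Random(seed)
    seq = [rng.choices(ALPHABET, weights=init, k=1)[0]]
    for _ in range(n - 1):
        seq.append(rng.choices(ALPHABET, weights=trans[seq[-1]], k=1)[0])
    return "".join(seq)
```

For n = 4 this gives 255 naive parameters versus 12 for the independent model and 39 for the (non-homogeneous) first-order chain.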

Models for speech processing

Speech is modelled by a sequence of unobserved phonemes.

For each phoneme, a random sound is produced following a distribution which characterizes the phoneme.

Hidden Markov Model (HMM):

(diagram: hidden chain z1 → z2 → · · · → zn−1 → zn → zn+1, each zi emitting an observation xi)

→ Latent variable models
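The generative process of an HMM (hidden chain, one emission per hidden state) can be sketched with ancestral sampling. The two-state parameters below are hypothetical, chosen to be deterministic so the behaviour is easy to follow:

```python
import random

def sample_hmm(T, pi, A, B, seed=0):
    """Ancestral sampling of (z_1..z_T, x_1..x_T) from a discrete HMM.

    pi: initial distribution over hidden states
    A[i][j]: transition probability i -> j
    B[i][k]: emission probability of symbol k in state i
    """
    rng = random.Random(seed)
    states, obs = [], []
    z = rng.choices(range(len(pi)), weights=pi, k=1)[0]
    for _ in range(T):
        states.append(z)
        obs.append(rng.choices(range(len(B[z])), weights=B[z], k=1)[0])
        z = rng.choices(range(len(A[z])), weights=A[z], k=1)[0]
    return states, obs
```

With a deterministic alternating chain and identity emissions, the sampler produces 0, 1, 0, 1, ... for both hidden states and observations.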

Modelling image structures

Markov Random Field

Segmentation

(figures: grid-structured MRF over pixels; image segmentation example)

→ directed graphical models vs undirected ones


Anaesthesia alarm (Beinlich et al., 1989)

“The ALARM Monitoring system”

http://www.bnlearn.com/documentation/networks/

CVP: central venous pressure
PCWP: pulmonary capillary wedge pressure
HIST: history
TPR: total peripheral resistance
BP: blood pressure
CO: cardiac output
HRBP: heart rate / blood pressure
HREK: heart rate measured by an EKG monitor
HRSA: heart rate / oxygen saturation
PAP: pulmonary artery pressure
SAO2: arterial oxygen saturation
FIO2: fraction of inspired oxygen
PRSS: breathing pressure
ECO2: expelled CO2
MINV: minimum volume
MVS: minimum volume set
HYP: hypovolemia
LVF: left ventricular failure
APL: anaphylaxis
ANES: insufficient anesthesia/analgesia
PMB: pulmonary embolus
INT: intubation
KINK: kinked tube
DISC: disconnection
LVV: left ventricular end-diastolic volume
STKV: stroke volume
CCHL: catecholamine
ERLO: error low output
HR: heart rate
ERCA: electrocautery
SHNT: shunt
PVS: pulmonary venous oxygen saturation
ACO2: arterial CO2
VALV: pulmonary alveoli ventilation
VLNG: lung ventilation
VTUB: ventilation tube
VMCH: ventilation machine


Probabilistic model

(figure: graph on nodes 1, . . . , 9 with factors f12, f23, f34, f45, f56, f37, f678, f9 attached to the corresponding variables)

p(x1, x2, . . . , x9) = f12(x1, x2) f23(x2, x3) f34(x3, x4) f45(x4, x5) f56(x5, x6) f37(x3, x7) f678(x6, x7, x8) f9(x9)
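A factorized distribution like the one above can be evaluated and normalized by brute force when the variables are binary. The factor values below are hypothetical (a pairwise factor that favours agreement, and a constant f9, so x9 is uniform and independent of the rest); only the factorization structure comes from the slide:

```python
from itertools import product

def f(a, b):
    """Illustrative pairwise factor: larger when its two arguments agree."""
    return 1.0 + (a == b)

def unnormalized(x):
    """Product of factors following the factorization on the slide."""
    x1, x2, x3, x4, x5, x6, x7, x8, x9 = x
    return (f(x1, x2) * f(x2, x3) * f(x3, x4) * f(x4, x5)
            * f(x5, x6) * f(x3, x7)
            * (1.0 + (x6 == x7 == x8))   # illustrative triple factor f678
            * 1.0)                        # f9 constant: x9 uniform

# Normalization constant by exhaustive summation over {0,1}^9 (512 terms).
Z = sum(unnormalized(x) for x in product([0, 1], repeat=9))

def p(x):
    return unnormalized(x) / Z
```

Brute-force normalization is only feasible for tiny models; the sum-product algorithm listed later exploits the factorization to avoid it.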

Abstract models vs concrete ones

Abstract models

Linear regression
Logistic regression
Mixture model
Principal Component Analysis
Canonical Correlation Analysis
Independent Component Analysis
LDA (multinomial PCA)
Naive Bayes classifier
Mixture of experts

Concrete models

Markov chains
HMM
Tree-structured models
Double HMMs
Directed acyclic models
Markov Random Fields
Star models
Constellation Model

Poll...

... about some relevant concepts.

Markov chain
Density of the multivariate Gaussian distribution
Maximum likelihood estimator
Logistic regression
Entropy
Exponential families of distributions (≠ the exponential distribution)
Bayesian inference
Kullback-Leibler divergence
Expectation-maximisation algorithm
Sum-product algorithm
MCMC sampling


Outline

1 Preliminary concepts

2 Directed graphical models

3 Markov random field

4 Operations on graphical models


Probability distributions

Joint probability distribution of r.v. (X1, . . . , Xn): p(x1, x2, . . . , xn). We assume either that

Xi takes values in {1, . . . , K} and p(x1, . . . , xn) = P(X1 = x1, . . . , Xn = xn),

or that (X1, . . . , Xn) admits a density in R^n.

Marginalization

p(x1) = Σ_{x2} p(x1, x2)

Factorization

p(x1, . . . , xn) = p(x1) p(x2|x1) p(x3|x1, x2) . . . p(xn|x1, . . . , xn−1)
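Marginalization and the chain-rule factorization can be checked numerically on a small joint table. The table entries below are made-up numbers, normalized to sum to one:

```python
# Joint table over two ternary variables; the weights (a + 2b + 1) are arbitrary.
joint = {(a, b): float(a + 2 * b + 1) for a in range(3) for b in range(3)}
Z = sum(joint.values())
joint = {k: v / Z for k, v in joint.items()}

def marginal_x1(x1):
    """p(x1) = sum_{x2} p(x1, x2)."""
    return sum(joint[(x1, x2)] for x2 in range(3))

def cond_x2_given_x1(x2, x1):
    """p(x2 | x1) = p(x1, x2) / p(x1)."""
    return joint[(x1, x2)] / marginal_x1(x1)

# Chain rule: p(x1, x2) = p(x1) p(x2 | x1), for every cell of the table.
```

The same two operations, summing out variables and conditioning, underlie all the inference algorithms discussed later.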

Entropy and Kullback-Leibler divergence

Entropy

H(p) = −Σ_x p(x) log p(x) = E[−log p(X)]

→ Expectation of the negative log-likelihood

Kullback-Leibler divergence

KL(p‖q) = Σ_x p(x) log (p(x) / q(x)) = E_p[log (p(X) / q(X))]

→ Expectation of the log-likelihood ratio

→ Property: KL(p‖q) ≥ 0
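Both definitions translate directly into code for finite distributions given as probability vectors (a minimal sketch; the distributions in the check are arbitrary examples):

```python
from math import log

def entropy(p):
    """H(p) = -sum_x p(x) log p(x), with the convention 0 log 0 = 0."""
    return -sum(px * log(px) for px in p if px > 0)

def kl(p, q):
    """KL(p || q) = sum_x p(x) log(p(x) / q(x)); assumes q(x) > 0 where p(x) > 0."""
    return sum(px * log(px / qx) for px, qx in zip(p, q) if px > 0)
```

For the uniform distribution on 4 symbols, H(p) = log 4, KL(p‖p) = 0, and KL against any other distribution is strictly positive, matching the property on the slide.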

Independence concepts

Independence: X ⊥⊥ Y

We say that X and Y are independent, and write X ⊥⊥ Y, iff:

∀x, y, P(X = x, Y = y) = P(X = x) P(Y = y)

Conditional independence: X ⊥⊥ Y | Z

We say that X and Y are independent conditionally on Z, and write X ⊥⊥ Y | Z, iff:

∀x, y, z, P(X = x, Y = y | Z = z) = P(X = x | Z = z) P(Y = y | Z = z)
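A toy joint distribution makes the distinction concrete: below, X and Y are each noisy copies of a hidden fair coin Z, so they are conditionally independent given Z by construction, yet marginally dependent. The 0.9/0.1 noise levels are arbitrary:

```python
def e(v, z):
    """P(V = v | Z = z) for a binary noisy copy: agrees with z w.p. 0.9."""
    return 0.9 if v == z else 0.1

def p(x, y, z):
    """Joint: Z fair coin; X and Y drawn independently given Z."""
    return 0.5 * e(x, z) * e(y, z)

def p_xy(x, y):
    return sum(p(x, y, z) for z in (0, 1))

def p_x(x):
    return sum(p_xy(x, y) for y in (0, 1))
```

Here P(X=1, Y=1) = 0.41 while P(X=1) P(Y=1) = 0.25, so X and Y are not independent, even though they factorize exactly once Z is fixed.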

Conditional independence example

Example of "X-linked recessive inheritance": transmission of the gene responsible for hemophilia.

Risk for sons of an unaffected father:

dependence between the situations of the two brothers;

conditionally independent given whether or not the mother is a carrier of the gene.

Indicator variable coding for multinomial variables

Let C be a r.v. taking values in {1, . . . , K}, with

P(C = k) = π_k.

We code C with a r.v. Y = (Y1, . . . , YK)⊤ with

Y_k = 1_{C=k}

For example, if K = 5 and c = 4 then y = (0, 0, 0, 1, 0)⊤.

So y ∈ {0, 1}^K with Σ_{k=1}^K y_k = 1.

P(C = k) = P(Y_k = 1) and P(Y = y) = Π_{k=1}^K π_k^{y_k}.
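The indicator (one-hot) coding and the product form of P(Y = y) are easy to mirror in code (a minimal sketch; the probability vector in the check is a made-up example):

```python
def one_hot(c, K):
    """Indicator coding of c in {1, ..., K} (1-based, as on the slide)."""
    return [1 if k == c else 0 for k in range(1, K + 1)]

def pmf(y, pis):
    """P(Y = y) = prod_k pi_k^{y_k}; with one-hot y this picks out pi_c."""
    out = 1.0
    for yk, pk in zip(y, pis):
        out *= pk ** yk
    return out
```

The product over k looks redundant, but this exponent trick is what makes the multinomial fit neatly into the exponential-family form mentioned in the poll.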

Bernoulli, Binomial, Multinomial

Y ∼ Ber(π): p(y) = π^y (1 − π)^{1−y}

(Y1, . . . , YK) ∼ M(1, π1, . . . , πK): p(y) = π1^{y1} · · · πK^{yK}

N1 ∼ Bin(n, π): p(n1) = C(n, n1) π^{n1} (1 − π)^{n−n1}

(N1, . . . , NK) ∼ M(n, π1, . . . , πK): p(n) = C(n; n1, . . . , nK) π1^{n1} · · · πK^{nK}

with C(n, i) = n! / ((n − i)! i!) and C(n; n1, . . . , nK) = n! / (n1! · · · nK!)
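These pmfs translate directly into code; as a sanity check, the multinomial with K = 2 reduces to the binomial, and the binomial pmf sums to 1 over its support:

```python
from math import comb, factorial

def binom_pmf(n1, n, pi):
    """P(N1 = n1) for N1 ~ Bin(n, pi)."""
    return comb(n, n1) * pi**n1 * (1 - pi) ** (n - n1)

def multinomial_pmf(counts, pis):
    """P(N = counts) for (N1, ..., NK) ~ M(n, pi_1, ..., pi_K)."""
    n = sum(counts)
    coeff = factorial(n)
    for c in counts:
        coeff //= factorial(c)          # multinomial coefficient n! / (n1! ... nK!)
    out = float(coeff)
    for c, p in zip(counts, pis):
        out *= p**c
    return out
```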

Gaussian model

Univariate Gaussian: X ∼ N(µ, σ²)

X is a real-valued r.v., and θ = (µ, σ²) ∈ Θ = R × R∗₊.

p_{µ,σ²}(x) = (1 / √(2πσ²)) exp(−(x − µ)² / (2σ²))

Multivariate Gaussian: X ∼ N(µ, Σ)

X takes values in R^d. If K_d is the set of d × d positive definite matrices, then θ = (µ, Σ) ∈ Θ = R^d × K_d.

p_{µ,Σ}(x) = (1 / √((2π)^d det Σ)) exp(−(1/2) (x − µ)⊤ Σ⁻¹ (x − µ))
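The univariate density can be checked numerically; the multivariate density with diagonal Σ factorizes into a product of univariate densities, which is sketched below as a verifiable special case (the general formula needs det Σ and Σ⁻¹):

```python
from math import exp, pi, sqrt

def gauss1d(x, mu, sigma2):
    """Univariate Gaussian density N(mu, sigma2) evaluated at x."""
    return exp(-0.5 * (x - mu) ** 2 / sigma2) / sqrt(2 * pi * sigma2)

def gauss2d_diag(x, y, mu, s2):
    """2-D Gaussian with diagonal covariance diag(s2[0], s2[1]):
    the density is the product of the two marginals."""
    return gauss1d(x, mu[0], s2[0]) * gauss1d(y, mu[1], s2[1])
```

A crude Riemann sum over [-10, 10] recovers total mass ≈ 1, and the peak of the standard Gaussian is 1/√(2π) ≈ 0.3989.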

Gaussian densities

(figures: plots of Gaussian densities)

Maximum likelihood principle

Consider a model P_Θ = {p(x; θ) | θ ∈ Θ} and an observation x.

Likelihood:

L : Θ → R₊, θ ↦ p(x; θ)

Maximum likelihood estimator:

θ̂_ML = argmax_{θ∈Θ} p(x; θ)

(photo: Sir Ronald Fisher, 1890-1962)

Case of i.i.d. data

For (x_i)_{1≤i≤n} an i.i.d. sample of size n:

θ̂_ML = argmax_{θ∈Θ} Π_{i=1}^n p(x_i; θ) = argmax_{θ∈Θ} Σ_{i=1}^n log p(x_i; θ)
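For a Bernoulli model the MLE has a closed form, the empirical frequency, and a grid search over the log-likelihood recovers it. The coin-flip data below are made up:

```python
from math import log

data = [1, 0, 1, 1, 0, 1, 1, 1]   # made-up coin flips

def loglik(theta, xs):
    """Bernoulli log-likelihood: sum_i x_i log(theta) + (1 - x_i) log(1 - theta)."""
    return sum(x * log(theta) + (1 - x) * log(1 - theta) for x in xs)

# Closed form: the MLE is the empirical frequency of ones.
theta_ml = sum(data) / len(data)

# Numeric check: maximize the log-likelihood over a fine grid of theta values.
grid = [k / 1000 for k in range(1, 1000)]
theta_grid = max(grid, key=lambda t: loglik(t, data))
```

Maximizing the log-likelihood rather than the likelihood itself is exactly the equivalence stated on the slide, since log is increasing.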

Page 59: Introduction to Graphical models - IMAGINEimagine.enpc.fr/~obozinsg/benicassim/benicassim1.pdf · 2017-07-04 · Introduction to Graphical models Guillaume Obozinski Ecole des Ponts

Bayesian estimation

Parameters θ are modelled as a random variable.

A priori

We have an a priori p (θ) on the model parameters.

A posteriori

The data contribute through the likelihood: p(x|θ). The a posteriori probability of the parameters is then

p(θ|x) = p(x|θ) p(θ) / p(x) ∝ p(x|θ) p(θ).

→ The Bayesian estimator is thus a probability distribution on the parameters.

One talks about Bayesian inference.

G. Obozinski Introduction to Graphical models 20/47
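A minimal sketch of Bayesian inference in the conjugate Beta-Bernoulli case (an illustrative example, not from the slides): the posterior p(θ|x) ∝ p(x|θ) p(θ) stays in the Beta family, which makes "the estimator is a distribution on the parameters" concrete.

```python
# Conjugate Beta-Bernoulli update: prior Beta(a, b) on theta, Bernoulli
# observations. The posterior p(theta | x) ∝ p(x | theta) p(theta)
# is Beta(a + #ones, b + #zeros).
def beta_bernoulli_posterior(a, b, xs):
    ones = sum(xs)
    return a + ones, b + (len(xs) - ones)

a_post, b_post = beta_bernoulli_posterior(1.0, 1.0, [1, 0, 1, 1])
posterior_mean = a_post / (a_post + b_post)
print(a_post, b_post, posterior_mean)  # 4.0 2.0 and mean 2/3
```

The uniform Beta(1, 1) prior and the four observations are hypothetical; any Beta prior would give the same update rule.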

Page 61:

Outline

1 Preliminary concepts

2 Directed graphical models

3 Markov random field

4 Operations on graphical models

G. Obozinski Introduction to Graphical models 21/47

Page 62:

Notations for graphical models

Graphs

G = (V ,E ) is a graph with vertex set V and edge set E .

The graph will be

either a directed acyclic graph (DAG): then (i, j) ∈ E ⊂ V × V means i → j,

or an undirected graph: then {i, j} ∈ E means i and j are adjacent.

Variables of the graphical model

To each node i ∈ V, we associate a random variable Xi.

Observations/values of Xi are denoted xi .

If A ⊂ V is a set of nodes, we will write X_A = (X_i)_{i∈A} and x_A = (x_i)_{i∈A}.

G. Obozinski Introduction to Graphical models 22/47

Page 68:

Directed graphical model or Bayesian network

p(a, b, c) = p(a) p(b|a) p(c|b, a)   [graph: a → b, a → c, b → c]

p(x1, x2) = p(x1) p(x2)   [graph: x1 and x2, no edge]

p(x1, x2, x3) = p(x1) p(x2|x1) p(x3|x2)   [graph: x1 → x2 → x3]

a ⊥⊥ b | c   [graph: a ← c → b]

G. Obozinski Introduction to Graphical models 23/47

Page 72:

Directed graphical model or Bayesian network

Factorization according to a directed graph

Definition: a distribution factorizes according to a directed graph G if

p(x) = ∏_{j=1}^{p} p(x_j | x_{Π_j}),

where Π_j denotes the set of parents of node j.

[Figure: a DAG on x1, …, x7]

Example — a Markov chain x1 → x2 → … → xM factorizes as

p(x1) ∏_{j=2}^{M} p(x_j | x_{j−1})

G. Obozinski Introduction to Graphical models 24/47
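The chain factorization above can be sketched numerically; the CPT values below are hypothetical, chosen only to check that the product of conditionals defines a valid joint distribution.

```python
from itertools import product

# Hypothetical CPTs for a binary chain x1 -> x2 -> x3.
p_x1 = {0: 0.6, 1: 0.4}
p_x2_given_x1 = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}
p_x3_given_x2 = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.5, 1: 0.5}}

def joint(x1, x2, x3):
    # p(x) = p(x1) p(x2|x1) p(x3|x2)
    return p_x1[x1] * p_x2_given_x1[x1][x2] * p_x3_given_x2[x2][x3]

total = sum(joint(*x) for x in product((0, 1), repeat=3))
print(total)  # 1.0 up to rounding: the factorization is a valid joint
```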

Page 74:

How to parameterize a directed graphical model?

[Figure: DAG with edges x1 → x2, x1 → x3, x2 → x3, x2 → x4, x3 → x4, x3 → x5]

Conditional probability tables

x1 ∈ {0, 1}, x2 ∈ {0, 1, 2}, x3 ∈ {0, 1, 2}

p(x3 = k | x1, x2):

x1  x2 | k=0  k=1  k=2
 0   0 |  1    0    0
 0   1 |  1    0    0
 0   2 | 0.1   0   0.9
 1   0 |  1    0    0
 1   1 | 0.5  0.5   0
 1   2 | 0.2  0.3  0.5

p(x;θ) = p(x1; θ1) p(x2|x1; θ2) p(x3|x2, x1; θ3) p(x4|x3, x2; θ4) p(x5|x3; θ5)

G. Obozinski Introduction to Graphical models 25/47
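The conditional probability table p(x3 | x1, x2) can be stored as a mapping from parent configurations to distributions over x3; a sketch using the slide's numbers:

```python
# The slide's conditional probability table p(x3 | x1, x2),
# keyed by the parent configuration (x1, x2).
cpt_x3 = {
    (0, 0): [1.0, 0.0, 0.0],
    (0, 1): [1.0, 0.0, 0.0],
    (0, 2): [0.1, 0.0, 0.9],
    (1, 0): [1.0, 0.0, 0.0],
    (1, 1): [0.5, 0.5, 0.0],
    (1, 2): [0.2, 0.3, 0.5],
}

# Every row of a CPT must be a distribution over x3.
assert all(abs(sum(row) - 1.0) < 1e-9 for row in cpt_x3.values())

print(cpt_x3[(1, 2)][2])  # p(x3 = 2 | x1 = 1, x2 = 2) = 0.5
```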

Page 77:

The Sprinkler

[Figure: DAG with R → G ← S]

R = 1: it has rained

S = 1: the sprinkler worked

G = 1: the grass is wet

P(S = 1) = 0.5

P(R = 1) = 0.2

P(G = 1 | S, R):

      R=0   R=1
S=0  0.01   0.8
S=1   0.8  0.95

Given that we observe that the grass is wet, are R and S independent?

G. Obozinski Introduction to Graphical models 26/47
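The question can be answered by brute-force enumeration of the joint p(s, r, g) = p(s) p(r) p(g|s, r) with the slide's numbers: observing G = 1 makes R and S dependent (the "explaining away" effect).

```python
p_s = {0: 0.5, 1: 0.5}
p_r = {0: 0.8, 1: 0.2}
p_g1 = {(0, 0): 0.01, (0, 1): 0.8, (1, 0): 0.8, (1, 1): 0.95}  # P(G=1 | S, R)

def joint(s, r, g):
    pg = p_g1[(s, r)] if g == 1 else 1.0 - p_g1[(s, r)]
    return p_s[s] * p_r[r] * pg

# P(R=1 | G=1): marginalize S out of the joint, then normalize.
p_r1_g1 = sum(joint(s, 1, 1) for s in (0, 1)) / \
          sum(joint(s, r, 1) for s in (0, 1) for r in (0, 1))

# P(R=1 | G=1, S=1): additionally condition on the sprinkler.
p_r1_g1_s1 = joint(1, 1, 1) / sum(joint(1, r, 1) for r in (0, 1))

print(round(p_r1_g1, 3), round(p_r1_g1_s1, 3))  # 0.351 0.229
```

Learning that the sprinkler worked lowers the probability that it rained, so R and S are not independent given G.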

Page 80:

The Sprinkler II

[Figure: DAG with R → G ← S and G → P]

R = 1: it has rained

S = 1: the sprinkler worked

G = 1: the grass is wet

P = 1: the paws of the dog are wet

P(S = 1) = 0.5 P(R = 1) = 0.2

P(G = 1 | S, R):

      R=0   R=1
S=0  0.01   0.8
S=1   0.8  0.95

P(P = 1 | G):

G=0  G=1
0.2  0.7

G. Obozinski Introduction to Graphical models 27/47

Page 83:

Factorization and Independence

A factorization imposes independence statements

∀x, p(x) = ∏_{j=1}^{p} p(x_j | x_{Π_j})  ⇔  ∀j, X_j ⊥⊥ X_{{1,…,j−1}\Π_j} | X_{Π_j}

Is it possible to read from the graph the (conditional) independence statements that hold given the factorization?

Example: does X5 ⊥⊥ X2 | X4 hold?

[Figure: the DAG on x1, …, x7 from the previous slide]

G. Obozinski Introduction to Graphical models 28/47

Page 85:

Blocking nodes

Three canonical three-node configurations:

diverging (a ← c → b):     a, b dependent;   a ⊥⊥ b | c
head-to-tail (a → c → b):  a, b dependent;   a ⊥⊥ b | c
converging (a → c ← b):    a ⊥⊥ b;           a, b dependent given c

The configuration with converging edges is called a v-structure.

G. Obozinski Introduction to Graphical models 29/47

Page 86:

d-separation

[Figure: example DAG on nodes a, b, c, e, f]

Theorem

If A, B and C are three disjoint sets of nodes, the statement X_A ⊥⊥ X_B | X_C holds if all paths joining A to B go through at least one blocking node. A node j blocks a path

if the edges of the path at j are diverging or head-to-tail and j ∈ C;

if the edges of the path at j are converging (i.e. form a v-structure) and neither j nor any of its descendants is in C.

G. Obozinski Introduction to Graphical models 30/47
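A numerical check of the v-structure case (the deterministic CPT below is purely illustrative, not from the slides): a and b are marginally independent, but become dependent once the collider c is observed.

```python
from itertools import product

states = list(product((0, 1), repeat=3))  # (a, b, c)

def p(a, b, c):
    # v-structure a -> c <- b: a, b independent fair coins, c = a AND b.
    return 0.25 if c == (a & b) else 0.0

def prob(pred):
    return sum(p(*s) for s in states if pred(*s))

# Marginally a ⊥⊥ b: the unobserved v-structure node blocks the path.
lhs = prob(lambda a, b, c: a == 1 and b == 1)
rhs = prob(lambda a, b, c: a == 1) * prob(lambda a, b, c: b == 1)
assert abs(lhs - rhs) < 1e-12

# Conditioning on c opens the path: given c = 0, a and b are dependent.
pc0 = prob(lambda a, b, c: c == 0)
p_a1b1 = prob(lambda a, b, c: c == 0 and a == 1 and b == 1) / pc0
p_a1 = prob(lambda a, b, c: c == 0 and a == 1) / pc0
p_b1 = prob(lambda a, b, c: c == 0 and b == 1) / pc0
print(p_a1b1, p_a1 * p_b1)  # 0.0 vs roughly 1/9: not independent given c
```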

Page 88:

Factorization and Independence II

Several graphs can induce the same set of conditional independences.

[Graphs: a ← c → b and a → c → b]

p(c) p(a|c) p(b|c) = p(a) p(c|a) p(b|c)

Some combinations of conditional independences cannot be faithfully represented by a graphical model:

Ex1: X ∼ Ber(1/2), Y ∼ Ber(1/2), Z = X ⊕ Y.

Ex2: X ⊥⊥ Y | Z = 1 but X, Y dependent given Z = 0.

G. Obozinski Introduction to Graphical models 31/47
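Ex1 can be verified by enumeration: with Z = X ⊕ Y, every pair of variables is independent, yet each variable is a deterministic function of the other two, a pattern no single graph encodes faithfully.

```python
from itertools import product

states = list(product((0, 1), repeat=3))  # (x, y, z)

def p(x, y, z):
    # Ex1 from the slide: X, Y ~ Ber(1/2) independent, Z = X XOR Y.
    return 0.25 if z == (x ^ y) else 0.0

def prob(pred):
    return sum(p(*s) for s in states if pred(*s))

# Every pair of variables is independent...
for i, j in [(0, 1), (0, 2), (1, 2)]:
    both = prob(lambda *s: s[i] == 1 and s[j] == 1)
    single = prob(lambda *s: s[i] == 1) * prob(lambda *s: s[j] == 1)
    assert abs(both - single) < 1e-12

# ...yet X is fully determined by (Y, Z):
p_x1_given_y1z0 = prob(lambda x, y, z: x == 1 and y == 1 and z == 0) / \
                  prob(lambda x, y, z: y == 1 and z == 0)
print(p_x1_given_y1z0)  # 1.0
```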

Page 95:

Outline

1 Preliminary concepts

2 Directed graphical models

3 Markov random field

4 Operations on graphical models

G. Obozinski Introduction to Graphical models 32/47

Page 96:

Markov random field (MRF) or undirected graphical model

Is it possible to associate to each graph a family of distributions so that conditional independence coincides exactly with the notion of separation in the graph?

Global Markov Property

X_A ⊥⊥ X_B | X_C  ⇔  C separates A and B

[Figure: sets A and B separated by C in an undirected graph]

G. Obozinski Introduction to Graphical models 33/47

Page 97:

Gibbs distribution

Clique Set of nodes that are all connected to one another.

Potential function A potential ψ_C(x_C) ≥ 0 is associated with each clique C.

Gibbs distribution

p(x) = (1/Z) ∏_C ψ_C(x_C)

Partition function

Z = ∑_x ∏_C ψ_C(x_C)

[Figure: an undirected graph on x1, x2, x3, x4]

Writing the potentials in exponential form ψ_C(x_C) = exp{−E(x_C)}, where E(x_C) is an energy, this is a Boltzmann distribution.

G. Obozinski Introduction to Graphical models 34/47
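A minimal Gibbs distribution sketch (hypothetical pairwise potentials on a 4-cycle, so the maximal cliques are the edges): the partition function Z is computed by summing the unnormalized product of clique potentials over all configurations.

```python
from itertools import product

# A tiny pairwise MRF on the cycle x1 - x2 - x3 - x4 - x1.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]

def psi(xi, xj):
    # Attractive potential: neighbouring variables prefer to agree.
    return 2.0 if xi == xj else 1.0

def unnormalized(x):
    prod = 1.0
    for i, j in edges:
        prod *= psi(x[i], x[j])
    return prod

Z = sum(unnormalized(x) for x in product((0, 1), repeat=4))

def p(x):
    return unnormalized(x) / Z

print(Z, p((0, 0, 0, 0)))  # Z = 82.0; the all-equal states are most probable
```

This brute-force sum is exponential in the number of nodes; computing Z efficiently is the central difficulty with undirected models.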

Page 101:

Example 1: Ising model

X = (X_1, …, X_d) is a collection of binary variables, whose joint probability distribution is

p(x_1, …, x_d) = (1/Z(η)) exp( ∑_{i∈V} η_i x_i + ∑_{{i,j}∈E} η_{ij} x_i x_j )
              = (1/Z(η)) ∏_{i∈V} e^{η_i x_i} ∏_{{i,j}∈E} e^{η_{ij} x_i x_j}
              = (1/Z(η)) ∏_{i∈V} ψ_i(x_i) ∏_{{i,j}∈E} ψ_{ij}(x_i, x_j)

with ψ_i(x_i) = e^{η_i x_i} and ψ_{ij}(x_i, x_j) = e^{η_{ij} x_i x_j}.

G. Obozinski Introduction to Graphical models 35/47
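A sketch of the Ising model on a 3-node chain with hypothetical parameters η, checking that the exponential form and the product-of-potentials form agree:

```python
from itertools import product
import math

# A 3-node Ising chain with hypothetical parameters eta.
V = [0, 1, 2]
E = [(0, 1), (1, 2)]
eta_node = {0: 0.5, 1: -0.3, 2: 0.1}
eta_edge = {(0, 1): 1.0, (1, 2): -0.5}

def exponent(x):
    # sum_i eta_i x_i + sum_{ij} eta_ij x_i x_j
    return (sum(eta_node[i] * x[i] for i in V)
            + sum(eta_edge[(i, j)] * x[i] * x[j] for i, j in E))

Z = sum(math.exp(exponent(x)) for x in product((0, 1), repeat=3))

def p(x):
    return math.exp(exponent(x)) / Z

def p_factorized(x):
    # The same distribution written as a product of clique potentials.
    val = 1.0
    for i in V:
        val *= math.exp(eta_node[i] * x[i])              # psi_i(x_i)
    for i, j in E:
        val *= math.exp(eta_edge[(i, j)] * x[i] * x[j])  # psi_ij(x_i, x_j)
    return val / Z

x = (1, 0, 1)
print(abs(p(x) - p_factorized(x)) < 1e-12)  # True: the two forms agree
```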

Page 102:

Example 2: Directed graphical model

Consider a distribution p that factorizes according to a directed graph G = (V, E); then

p(x_1, …, x_d) = ∏_{i=1}^{d} p(x_i | x_{π_i}) = ∏_{i=1}^{d} ψ_{C_i}(x_{C_i})   with C_i = {i} ∪ π_i.

Consequence: a distribution that factorizes according to a directed model is a Gibbs distribution for the cliques C_i = {i} ∪ π_i. As a consequence, it factorizes according to an undirected graph in which the C_i are cliques.

G. Obozinski Introduction to Graphical models 36/47

Page 103:

Theorem of Hammersley and Clifford (1971)

A distribution p such that p(x) > 0 for all x satisfies the global Markov property for graph G if and only if it is a Gibbs distribution associated with G.

Gibbs distribution (P_G): p(x) = (1/Z) ∏_{C∈C_G} ψ_C(x_C)

Global Markov property (P_M): X_A ⊥⊥ X_B | X_C if C separates A and B in G

Theorem

We have P_G ⇒ P_M, and (HC): if ∀x, p(x) > 0, then P_M ⇒ P_G.

G. Obozinski Introduction to Graphical models 37/47

Page 104:

Markov Blanket in an undirected graph

Definition

The Markov blanket of a node i is the smallest set of nodes B such that

X_i ⊥⊥ X_R | X_B, with R = V \ (B ∪ {i}),

or equivalently such that

p(X_i | X_{−i}) = p(X_i | X_B).

In an undirected graph, the Markov blanket of i is its set of neighbours.

G. Obozinski Introduction to Graphical models 38/47

Page 107:

Markov Blanket for a directed graph?

What is the Markov blanket in a directed graph? By definition, it is the smallest set C of nodes such that, conditionally on X_C, X_i is independent of all the other variables in the graph: it consists of the parents of i, the children of i, and the other parents of the children of i.

[Figure: node x_i with its Markov blanket]

G. Obozinski Introduction to Graphical models 39/47

Page 109:

Moralization

For a given directed graphical model:

is there an undirected graphical model which is equivalent?

is there a smallest undirected graphical model which contains the directed graphical model?

p(x) = (1/Z) ∏_C ψ_C(x_C)   vs   ∏_{j=1}^{M} p(x_j | x_{Π_j})

G. Obozinski Introduction to Graphical models 40/47

Page 110:

Moralization

Given a directed graph G , its moralized graph GM is obtained by

1 For any node i , add undirected edges between all its parents

2 Remove the orientation of all the oriented edges

[Figure: a DAG on x1, …, x4 and its moralized graph]

Proposition

If a probability distribution factorizes according to a directed graph Gthen it factorizes according to the undirected graph GM .

G. Obozinski Introduction to Graphical models 41/47
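The two-step moralization procedure above can be sketched directly; the input DAG (a dict mapping each node to its list of parents) is hypothetical.

```python
# Moralization sketch: the DAG is given as a dict node -> list of parents.
def moralize(parents):
    edges = set()
    for child, pars in parents.items():
        # Step 1: "marry" the parents — connect every pair of parents.
        for k, p1 in enumerate(pars):
            for p2 in pars[k + 1:]:
                edges.add(frozenset((p1, p2)))
        # Step 2: drop orientations — parent-child edges become undirected.
        for p1 in pars:
            edges.add(frozenset((p1, child)))
    return edges

# Hypothetical DAG: x1 -> x2, and the v-structure x2 -> x4 <- x3.
moral = moralize({"x2": ["x1"], "x4": ["x2", "x3"]})
print(frozenset(("x2", "x3")) in moral)  # True: the parents of x4 are married
```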

Page 113:

Directed vs undirected trees

Definition: directed tree

A directed tree is a DAG such that each node has at most one parent

Remark: By definition a directed tree has no v-structure.

Moralizing trees

What is the moralized graph for a directed tree?

The corresponding undirected tree!

Proposition (Equivalence between directed and undirected tree)

A distribution factorizes according to a directed tree if and only if it factorizes according to its undirected version.

Corollary

All orientations of the edges of a tree that do not create v-structures are equivalent.

G. Obozinski Introduction to Graphical models 42/47

Page 117:

Outline

1 Preliminary concepts

2 Directed graphical models

3 Markov random fields

4 Operations on graphical models

G. Obozinski Introduction to Graphical models 43/47


Operations on graphical models

Probabilistic inference

Compute a marginal distribution p(x_i) or a conditional marginal p(x_i | x_1 = 3, x_7 = 0)

Decoding (aka MAP Inference)

Find the most probable configuration of the set of random variables:

argmax_z p(z | x)

[Figure: hidden Markov chain with latent variables z_1, z_2, …, z_{n−1}, z_n, z_{n+1} and observations x_1, x_2, …, x_{n−1}, x_n, x_{n+1}]

G. Obozinski Introduction to Graphical models 44/47
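Both operations can be made concrete by brute-force enumeration on a toy binary chain model p(z) ∝ ∏_t ψ(z_t, z_{t+1}); the potentials below are arbitrary illustrative values, and real models replace enumeration with message passing.

```python
import itertools

# Toy pairwise potential for a binary chain of length 4.
psi = {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 3.0}
n = 4

def unnorm(z):
    """Unnormalized probability: product of edge potentials."""
    p = 1.0
    for t in range(n - 1):
        p *= psi[(z[t], z[t + 1])]
    return p

configs = list(itertools.product([0, 1], repeat=n))
Z = sum(unnorm(z) for z in configs)                  # partition function

# Probabilistic inference: marginal p(z_1 = 1).
marg = sum(unnorm(z) for z in configs if z[0] == 1) / Z

# Decoding / MAP inference: most probable configuration.
z_map = max(configs, key=unnorm)
print(marg, z_map)
```

Since ψ(1, 1) is the largest potential, the MAP configuration here is the all-ones chain; the marginal is just a ratio of sums of unnormalized scores.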


Learning / estimation in graphical models

Frequentist learning

The main frequentist learning principle for graphical models is the maximum likelihood principle of R. Fisher. Let p(x; θ) = (1/Z(θ)) ∏_C ψ(x_C, θ_C); we would like to find

argmax_θ ∏_{i=1}^n p(x^(i); θ) = argmax_θ (1/Z(θ)^n) ∏_{i=1}^n ∏_C ψ(x_C^(i), θ_C)

Bayesian learning

Graphical models can also be learned using Bayesian inference.

G. Obozinski Introduction to Graphical models 45/47
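As a minimal illustration of the role played by Z(θ), here is the log-likelihood of a hypothetical one-variable model with a single potential ψ(x; θ) = exp(θx), so p(x; θ) = exp(θx)/Z(θ) with Z(θ) = 1 + exp(θ); the data and the closed-form maximizer are ours, purely for illustration.

```python
import math

data = [1, 1, 0, 1]  # n = 4 toy observations

def log_lik(theta):
    # log-likelihood = θ Σ_i x^(i) − n log Z(θ): the partition
    # function Z(θ) couples the parameter to every observation.
    Z = 1.0 + math.exp(theta)
    return sum(theta * x - math.log(Z) for x in data)

# For this model the maximizer has a closed form: sigmoid(θ*) equals
# the empirical mean, i.e. θ* = log(p̄ / (1 − p̄)) with p̄ = 3/4 here.
theta_star = math.log(0.75 / 0.25)
print(log_lik(theta_star))
```

Nudging θ away from θ* in either direction decreases the log-likelihood, which is what the argmax in the slide above computes in general.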


The “Naive Bayes” model for classification

Data

Class label: C ∈ {1, …, K}
Class indicator vector Z ∈ {0, 1}^K

Features X_j, j = 1, …, D (e.g. word presence)

Model

p(z) = ∏_k π_k^{z_k}

Model

Which model for p(x_1, …, x_D | z_k = 1)?

[Figure: naive Bayes graphical model, z → x_1, …, x_D]

“Naive” hypothesis

p(x_1, …, x_D | z_k = 1) = ∏_{j=1}^D p(x_j | z_k = 1; b_jk) = ∏_{j=1}^D b_jk^{x_j} (1 − b_jk)^{1 − x_j}

with b_jk = P(x_j = 1 | z_k = 1).

G. Obozinski Introduction to Graphical models 46/47


Naive Bayes (continued)

Learning (estimation) with the maximum likelihood principle

π̂ = argmax_{π : π^T 1 = 1} ∏_{k,i} π_k^{z_k^(i)}

b̂_jk = argmax_{b_jk} ∑_{i=1}^n log p(x_j^(i) | z^(i) = k; b_jk)

Prediction:

ẑ = argmax_z ( ∏_{j=1}^D p(x_j | z) p(z) ) / ( ∑_{z'} ∏_{j=1}^D p(x_j | z') p(z') )

Properties

Ignores correlations between features

Prediction only requires applying Bayes' rule

The model can be learned in parallel

Complexity: O(nD)

G. Obozinski Introduction to Graphical models 47/47
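Both steps can be sketched on toy data (the `fit`/`predict` names, the Laplace smoothing, and the data are ours, not from the slides): learning reduces to counting in a single O(nD) pass, and prediction just applies Bayes' rule.

```python
import math

def fit(X, y, K, alpha=1.0):
    """Closed-form ML estimates, with Laplace smoothing alpha:
    pi_k = count(k)/n and b_jk = P(x_j = 1 | class k).
    One pass over the data: O(nD), as stated on the slide."""
    n, D = len(X), len(X[0])
    counts = [0] * K
    ones = [[0] * D for _ in range(K)]
    for x, k in zip(X, y):
        counts[k] += 1
        for j, xj in enumerate(x):
            ones[k][j] += xj
    pi = [c / n for c in counts]
    b = [[(ones[k][j] + alpha) / (counts[k] + 2 * alpha)
          for j in range(D)] for k in range(K)]
    return pi, b

def predict(x, pi, b):
    """Bayes rule: argmax_k log pi_k + sum_j log p(x_j | k)."""
    K = len(pi)
    scores = [math.log(pi[k])
              + sum(math.log(b[k][j] if xj else 1 - b[k][j])
                    for j, xj in enumerate(x))
              for k in range(K)]
    return max(range(K), key=scores.__getitem__)

X = [[1, 1, 0], [1, 0, 0], [0, 1, 1], [0, 0, 1]]
y = [0, 0, 1, 1]
pi, b = fit(X, y, K=2)
print(predict([1, 1, 0], pi, b))  # → 0
```

Since each b_jk depends only on the counts for feature j in class k, the per-feature estimates are independent, which is what makes the learning step trivially parallelizable.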
