
Page 1: Mixture Models for Image Analysis

Mixture Models for Image Analysis

Aristidis Likas & Christophoros Nikou

IPAN Research Group, Department of Computer Science

University of Ioannina

Page 2: Mixture Models for Image Analysis

Collaborators: Nikolaos Galatsanos, Professor

Konstantinos Blekas, Assistant Professor

Dr. Costas Constantinopoulos, Researcher

George Sfikas, Ph.D. Candidate

Demetrios Gerogiannis, Ph.D. Candidate

Page 3: Mixture Models for Image Analysis

Outline

• Mixture Models and EM (GMM, SMM)
• Bayesian GMMs
• Image segmentation using mixture models
  – Incremental Bayesian GMMs
  – Spatially varying GMMs (SVMMs) with MRF priors
  – SVMMs and line processes

• Image registration using mixture models

Page 4: Mixture Models for Image Analysis

Mixture Models

• Probability density estimation: estimate the density model f(x) that generated a given dataset X = {x1, …, xN}
• Mixture Models
  – M pdf components φj(x)
  – mixing weights: π1, π2, …, πM (priors)
• Gaussian Mixture Model (GMM): φj = N(μj, Σj)

f(x) = \sum_{j=1}^{M} \pi_j \varphi_j(x; \theta_j), \qquad \pi_j \ge 0, \quad \sum_{j=1}^{M} \pi_j = 1

Page 5: Mixture Models for Image Analysis

GMM (graphical model)

[Figure: hidden label variable with prior πj generating the observation x]

Page 6: Mixture Models for Image Analysis

GMM examples

GMMs can be used for density estimation (like histograms) or clustering.

P(j \mid x^n) = \langle z_j^n \rangle = \frac{\pi_j \varphi_j(x^n; \theta_j)}{f(x^n)}

Cluster membership probability
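As a quick illustration of both uses, here is a minimal sketch (assuming scikit-learn; data and variable names are made up, not from the slides) that fits a GMM and reads off the density f(x) and the membership probabilities P(j|x):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (200, 2)),      # synthetic two-cluster data
               rng.normal(4.0, 0.5, (200, 2))])

gmm = GaussianMixture(n_components=2, covariance_type='full').fit(X)

log_density = gmm.score_samples(X)     # log f(x) per point (density estimation)
resp = gmm.predict_proba(X)            # P(j | x), the cluster-membership probabilities
labels = resp.argmax(axis=1)           # hard clustering by maximum responsibility
```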

Page 7: Mixture Models for Image Analysis

Mixture Model training

• Given a dataset X = {x1, …, xN} and a GMM f(x; Θ)
• Likelihood:

p(X; \Theta) = p(x_1, \dots, x_N; \Theta) = \prod_{i=1}^{N} f(x_i; \Theta)

• GMM training: log-likelihood maximization

\Theta^{*} = \arg\max_{\Theta} \sum_{i=1}^{N} \ln p(x_i; \Theta)

• Expectation-maximization (EM) algorithm
  – Applicable when the posterior P(Z|X) can be computed

Page 8: Mixture Models for Image Analysis

EM for Mixture Models

• E-step: compute the expectation of the hidden variables given the observations:

P(j \mid x^n) = \langle z_j^n \rangle = \frac{\pi_j \, \varphi(x^n \mid \theta_j)}{\sum_{p=1}^{K} \pi_p \, \varphi(x^n \mid \theta_p)}

• M-step: maximize expected complete likelihood

\Theta^{(t+1)} = \arg\max_{\Theta} Q(\Theta) = \mathbb{E}_{P(Z|X)}\left[ \log p(X, Z; \Theta) \right]

Q(\Theta) = \sum_{n=1}^{N} \sum_{j=1}^{K} \langle z_j^n \rangle \left( \log \pi_j + \log \varphi(x^n \mid \theta_j) \right)

Page 9: Mixture Models for Image Analysis

EM for GMM (M-step)

Mean:

\mu_j^{(t+1)} = \frac{\sum_{n=1}^{N} \langle z_j^n \rangle \, x^n}{\sum_{n=1}^{N} \langle z_j^n \rangle}

Covariance:

\Sigma_j^{(t+1)} = \frac{\sum_{n=1}^{N} \langle z_j^n \rangle \, (x^n - \mu_j^{(t+1)})(x^n - \mu_j^{(t+1)})^T}{\sum_{n=1}^{N} \langle z_j^n \rangle}

Mixing weights:

\pi_j^{(t+1)} = \frac{1}{N} \sum_{n=1}^{N} \langle z_j^n \rangle
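The following is a minimal sketch of one EM iteration implementing these E- and M-step formulas with NumPy/SciPy; function and variable names are illustrative, not from the slides:

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_step(X, pi, mu, Sigma):
    """One EM iteration for a GMM. X: (N, d); pi: (K,); mu: (K, d); Sigma: (K, d, d)."""
    N, K = X.shape[0], len(pi)
    # E-step: responsibilities <z_j^n> = pi_j N(x^n; mu_j, Sigma_j) / f(x^n)
    resp = np.column_stack([pi[j] * multivariate_normal.pdf(X, mu[j], Sigma[j])
                            for j in range(K)])
    resp /= resp.sum(axis=1, keepdims=True)
    # M-step: re-estimate means, covariances and mixing weights
    Nk = resp.sum(axis=0)
    mu_new = (resp.T @ X) / Nk[:, None]
    Sigma_new = []
    for j in range(K):
        D = X - mu_new[j]
        Sigma_new.append((resp[:, j, None] * D).T @ D / Nk[j])
    pi_new = Nk / N
    return pi_new, mu_new, np.array(Sigma_new), resp
```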

Page 10: Mixture Models for Image Analysis

Student's t-distribution

St(x; \mu, \Sigma, v) = \frac{\Gamma\!\left(\frac{v+d}{2}\right) \, |\Sigma|^{-1/2}}{(\pi v)^{d/2} \, \Gamma\!\left(\frac{v}{2}\right) \left[ 1 + (x-\mu)^T \Sigma^{-1} (x-\mu)/v \right]^{(v+d)/2}}

Mean μ, covariance matrix Σ, degrees of freedom v.

Bell-shaped + heavy-tailed (depending on v); tends to a Gaussian for large v.
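A small sketch (assuming SciPy ≥ 1.6, which provides multivariate_t) showing the heavy tails for small v and the Gaussian limit for large v:

```python
import numpy as np
from scipy.stats import multivariate_t, multivariate_normal

mu, Sigma = np.zeros(2), np.eye(2)
x = np.array([3.0, 3.0])                       # a point far in the tail

for v in (1, 3, 10, 100):
    p_t = multivariate_t(loc=mu, shape=Sigma, df=v).pdf(x)
    print(f"v={v:>3}: St(x) = {p_t:.3e}")      # heavier tails for small v

print("Gaussian:", multivariate_normal(mu, Sigma).pdf(x))   # limit as v grows
```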

Page 11: Mixture Models for Image Analysis

The Student's t-distribution

Page 12: Mixture Models for Image Analysis

The Student's t-distribution

x \mid \mu, \Sigma, u \sim N(\mu, \, \Sigma / u), \qquad u; v \sim \mathrm{Gamma}(v/2, \, v/2)

Hierarchical distribution x follows a Gaussian distribution whose covariance is scaled

by a factor following a Gamma distribution. ML parameter estimation using the EM algorithm

(u is considered as hidden variable).

Page 13: Mixture Models for Image Analysis

The Student's t-distribution

Page 14: Mixture Models for Image Analysis

SMM: Student's t Mixture Models

Each component j follows St(μj, Σj, vj) (robust mixture).

Parameter estimation using EM; hidden variables: uj and zj.

E-step:

\langle z_j^n \rangle = \frac{\pi_j \, St(x^n; \mu_j, \Sigma_j, v_j)}{\sum_{p=1}^{K} \pi_p \, St(x^n; \mu_p, \Sigma_p, v_p)}

\langle u_j^n \rangle = \frac{v_j^{(t)} + d}{v_j^{(t)} + (x^n - \mu_j^{(t)})^T \left( \Sigma_j^{(t)} \right)^{-1} (x^n - \mu_j^{(t)})}

Page 15: Mixture Models for Image Analysis

SMM training

• M-step

Mean:

\mu_j^{(t+1)} = \frac{\sum_{n=1}^{N} \langle u_j^n \rangle \langle z_j^n \rangle \, x^n}{\sum_{n=1}^{N} \langle u_j^n \rangle \langle z_j^n \rangle}

Covariance:

\Sigma_j^{(t+1)} = \frac{\sum_{n=1}^{N} \langle u_j^n \rangle \langle z_j^n \rangle \, (x^n - \mu_j^{(t+1)})(x^n - \mu_j^{(t+1)})^T}{\sum_{n=1}^{N} \langle u_j^n \rangle \langle z_j^n \rangle}

Mixing proportion:

\pi_j^{(t+1)} = \frac{1}{N} \sum_{n=1}^{N} \langle z_j^n \rangle

Page 16: Mixture Models for Image Analysis

EM for SMM

• M-step (degrees of freedom): v_j^{(t+1)} is the root of

\log\frac{v_j}{2} - \psi\!\left(\frac{v_j}{2}\right) + 1
+ \frac{\sum_{n=1}^{N} \langle z_j^n \rangle \left( \log \langle u_j^n \rangle - \langle u_j^n \rangle \right)}{\sum_{n=1}^{N} \langle z_j^n \rangle}
+ \psi\!\left(\frac{v_j^{(t)} + d}{2}\right) - \log\frac{v_j^{(t)} + d}{2} = 0

Degrees of freedom: no closed form update
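Since there is no closed form, the v_j update is typically obtained with a one-dimensional root finder applied to the equation above. A minimal sketch under that assumption (names are illustrative, not from the slides):

```python
import numpy as np
from scipy.special import digamma
from scipy.optimize import brentq

def update_dof(z_j, u_j, v_old, d):
    """z_j: responsibilities <z_j^n>, u_j: scale weights <u_j^n>, d: data dimension."""
    c = (z_j * (np.log(u_j) - u_j)).sum() / z_j.sum()
    const = 1.0 + c + digamma((v_old + d) / 2.0) - np.log((v_old + d) / 2.0)

    def f(v):                                   # left-hand side of the update equation
        return np.log(v / 2.0) - digamma(v / 2.0) + const

    # bracket assumed wide enough to contain the sign change for typical data
    return brentq(f, 1e-3, 1e3)
```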

Page 17: Mixture Models for Image Analysis

Mixture model training issues

• EM local maxima (dependence on initialization)
• Covariance singularities
• How to select the number of components
• SMM vs GMM
  – Better results for data with outliers (robustness)
  – Higher dependence on initialization (how to initialize vj?)

Page 18: Mixture Models for Image Analysis

EM Local Maxima

Page 19: Mixture Models for Image Analysis

Bayesian GMM

Typical approach: priors on all GMM parameters

T_j \sim \mathrm{Wishart}(v, V), \qquad p(T) = \prod_{j=1}^{M} p(T_j)

\boldsymbol{\pi} = (\pi_1, \dots, \pi_M) \sim \mathrm{Dirichlet}(a_1, \dots, a_M)

\mu_j \sim N(m, S), \qquad p(\mu) = \prod_{j=1}^{M} p(\mu_j)

f(x) = \sum_{j=1}^{M} \pi_j \, \varphi(x; \mu_j, T_j^{-1}), \qquad \sum_{j=1}^{M} \pi_j = 1

Page 20: Mixture Models for Image Analysis

Bayesian GMM training

• Parameters Θ become (hidden) RVs: H = {Z, Θ}

• Objective: Compute Posteriors P(Z|X), P(Θ|X) (intractable)

• Approximations

• Sampling (RJMCMC)

• MAP approach

• Variational approach

• MAP approximation

• mode of the posterior P(Θ|Χ) (MAP-EM)

• compute P(Z|X,ΘMAP)

\Theta_{MAP} = \arg\max_{\Theta} \left\{ \log P(X \mid \Theta) + \log P(\Theta) \right\}

Page 21: Mixture Models for Image Analysis

Variational Inference (no parameters)

• Computes an approximation q(H) of the true posterior P(H|X)
• For any pdf q(H):

\ln p(X) = F(q) + KL\left( q(H) \,\|\, P(H \mid X) \right)

• Variational bound (F) maximization:

q^{*} = \arg\max_{q} F(q) = \arg\max_{q} \int q(H) \ln \frac{p(X, H)}{q(H)} \, dH

• Mean field approximation:

q(H) = \prod_{k} q_k(H_k)

• System of equations:

q_k(H_k) = \frac{\exp\left( \langle \ln p(X, H) \rangle_{q(H \setminus H_k)} \right)}{\int \exp\left( \langle \ln p(X, H) \rangle_{q(H \setminus H_k)} \right) dH_k}

D. Tzikas, A. Likas, N. Galatsanos, IEEE Signal Processing Magazine, 2008

Page 22: Mixture Models for Image Analysis

Variational Inference (with parameters)

• X data, H hidden RVs, Θ parameters
• For any pdf q(H; Θ):

\ln p(X; \Theta) = F(q, \Theta) + KL\left( q(H; \Theta) \,\|\, p(H \mid X; \Theta) \right)

• Maximization of the variational bound F:

F(q, \Theta) = \int q(H; \Theta) \ln \frac{p(X, H; \Theta)}{q(H; \Theta)} \, dH \le \ln p(X; \Theta)

• Variational EM
  • VE-step: q^{(t+1)} = \arg\max_{q} F(q, \Theta^{(t)})
  • VM-step: \Theta^{(t+1)} = \arg\max_{\Theta} F(q^{(t+1)}, \Theta)

Page 23: Mixture Models for Image Analysis

Bayesian GMM training

• Bayesian GMMs (no parameters)

• mean field variational approximation

• tackles the covariance singularity problem

• requires to specify the parameters of the priors

• Estimating the number of components:

• Start with a large number of components

• Let the training process prune redundant components (πj=0)

• Dirichlet prior on πj prevents component pruning
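For comparison, scikit-learn's variational Bayesian GMM behaves in this spirit: started with deliberately many components and a small weight-concentration prior, redundant components end up with weights close to zero. This is not the C-B/C-L algorithm from these slides, just a minimal related sketch with made-up data:

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-3, 0.7, (300, 2)),
               rng.normal(+3, 0.7, (300, 2))])   # two true clusters

bgmm = BayesianGaussianMixture(
    n_components=10,                       # deliberately too many components
    weight_concentration_prior=1e-3,       # small concentration favors few active components
    max_iter=500,
).fit(X)

print(np.round(bgmm.weights_, 3))          # most weights collapse near zero
```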

Page 24: Mixture Models for Image Analysis

Bayesian GMM without prior on π

• Mixing weights πj are parameters (remove Dirichlet prior)

• Training using Variational EM

Method (C-B)
• Start with a large number of components
• Perform variational maximization of the marginal likelihood
• Pruning of redundant components (πj = 0)

• Only components that fit well to the data are finally retained

CBdemo (CBdemo.wmv)

Page 25: Mixture Models for Image Analysis

Bayesian GMM (C-B)

• C-B method: results depend on
  • the number of initial components
  • the initialization of the components
  • the specification of the scale matrix V of the Wishart prior p(T)

Page 26: Mixture Models for Image Analysis

Incremental Bayesian GMM

• Modification of the Bayesian GMM is needed

• Divide the components into 'fixed' and 'free'

• Prior on the weights of 'fixed' components (retained)

• No prior on the weights of 'free' components (may be eliminated)

• Pruning restricted among 'free' components

• Solution: incremental training using component splitting

• Local scale matrix V: based on the variance of the component to be split

C. Constantinopoulos & A. Likas, IEEE Trans. on Neural Networks, 2007

Page 27: Mixture Models for Image Analysis

Incremental Bayesian GMM

Page 28: Mixture Models for Image Analysis

Incremental Bayesian GMM

• Start with k = 1 component.
• At each step:
  • select a component j
  • split component j in two subcomponents
  • set the scale matrix V analogous to Σj
  • apply variational EM considering the two subcomponents as free and the rest of the components as fixed
  • either the two components are retained and adjusted,
  • or one of them is eliminated and the other one recovers the original component (before the split)
• until all components have been tested for splitting unsuccessfully

C-L

Page 29: Mixture Models for Image Analysis

Mixture Models for Image Modeling

• Select a feature representation
• Compute a feature vector per pixel to form the training set
• Build a mixture model for the image using the training set

• Applications
  • Image retrieval + relevance feedback
  • Image segmentation
  • Image registration

Page 30: Mixture Models for Image Analysis

Mixture Models for Image Segmentation

• One cluster per mixture component.
• Assign pixels to clusters based on P(j|x).
• Take into account spatial smoothness: neighbouring pixels are expected to have the same label.
• Simple way: add pixel coordinates to the feature vector.
• Bayesian way: impose MRF priors (SVMM).
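A minimal sketch of the "simple way" above: per-pixel colour features, optionally augmented with normalized pixel coordinates, clustered with a GMM. It assumes a float RGB image `img` of shape (H, W, 3); all names are illustrative:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def segment(img, K=5, use_xy=True):
    H, W, C = img.shape
    feats = img.reshape(-1, C)
    if use_xy:                                   # append normalized coordinates
        yy, xx = np.mgrid[0:H, 0:W]
        coords = np.stack([yy.ravel() / H, xx.ravel() / W], axis=1)
        feats = np.hstack([feats, coords])
    gmm = GaussianMixture(n_components=K, covariance_type='full').fit(feats)
    return gmm.predict(feats).reshape(H, W)      # one cluster label per pixel
```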

Page 31: Mixture Models for Image Analysis

Incremental Bayesian GMM Image segmentation

Number of segments determined automatically

Page 32: Mixture Models for Image Analysis

Incremental Bayesian GMM Image segmentation

Number of segments determined automatically

Page 33: Mixture Models for Image Analysis

Spatially Varying mixtures (1)

f(x^n \mid \Pi, \Theta) = \sum_{j=1}^{K} \pi_j^n \, \varphi(x^n \mid \theta_j), \qquad n = 1, 2, \dots, N

x^n: image feature (e.g. pixel intensity)
\pi_j^n: contextual mixing proportions
\varphi(x^n \mid \theta_j): Gaussian parameterized by \theta_j = \{\mu_j, \Sigma_j\}
z_j^n: data label, hidden variable

Page 34: Mixture Models for Image Analysis

Spatially Varying mixtures (2)

Smoothness is enforced in the image by imposing a prior p(Π) on the probability of the pixel labels (contextual mixing proportions).

L(\Pi, \Theta \mid X) = \sum_{n=1}^{N} \log f(x^n \mid \Pi, \Theta) + \log p(\Pi)

Insight into the contextual mixing proportions:

\pi_j^n = p(z_j^n = 1 \mid x^n)

Page 35: Mixture Models for Image Analysis

SV-GMM with Gibbs prior (1)

• A typical constraint is the Gibbs prior:

p(\Pi) = \frac{1}{Z} e^{-U(\Pi)}, \qquad U(\Pi) = \beta \sum_{n=1}^{N} V_{N_n}(\Pi)

V_{N_n}(\Pi) = \sum_{m \in N_n} \sum_{j=1}^{K} \left( \pi_j^n - \pi_j^m \right)^2

β: smoothness weight

[K. Blekas, A. Likas, N. Galatsanos and I. Lagaris. IEEE Trans. Neur. Net., 2005]

Page 36: Mixture Models for Image Analysis

SV-GMM with Gibbs prior (2)

Page 37: Mixture Models for Image Analysis

SV-GMM with Gibbs prior (3)

• E-step: equivalent with the GMM case.
• M-step: the contextual mixing proportions are the solutions to a quadratic equation.

• Note that:
  1) Parameter β of the Gibbs prior must be determined beforehand.
  2) The contextual mixing proportions are not constrained to be probability vectors:

0 \le \pi_j^n \le 1, \qquad \sum_{j=1}^{K} \pi_j^n = 1, \qquad n = 1, 2, \dots, N

Page 38: Mixture Models for Image Analysis

SV-GMM with Gibbs prior (4)

To address these issues:
  1) Class-adaptive Gauss-Markov random field prior.
  2) Projection of the probabilities onto the hyperplane (another solution will be presented later on):

\sum_{j=1}^{K} \pi_j^n = 1, \qquad n = 1, 2, \dots, N
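One common way to implement such a projection step is the Euclidean projection of each pixel's contextual mixing proportions onto the probability simplex, which enforces both the sum-to-one hyperplane and non-negativity. A minimal sketch (not necessarily the exact projection used in the cited papers):

```python
import numpy as np

def project_to_simplex(p):
    """Project a vector p onto {x : x >= 0, sum(x) = 1}."""
    u = np.sort(p)[::-1]                                   # sort descending
    css = np.cumsum(u)
    rho = np.nonzero(u + (1.0 - css) / (np.arange(len(p)) + 1) > 0)[0][-1]
    theta = (1.0 - css[rho]) / (rho + 1)                   # shift that enforces the constraints
    return np.maximum(p + theta, 0.0)
```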

Page 39: Mixture Models for Image Analysis

SV-GMM with Gauss-Markov prior (2)

• One variance per cluster j = 1, 2, …, K per direction d = 0, 45, 90, 135 degrees:

p(\Pi) \propto \prod_{d=1}^{D} \prod_{j=1}^{K} \frac{1}{\beta_{j,d}^{N}} \exp\left( - \sum_{n=1}^{N} \sum_{m \in N_n^d} \frac{\left( \pi_j^n - \pi_j^m \right)^2}{2 \beta_{j,d}^2} \right)

[C. Nikou, N. Galatsanos and A. Likas. IEEE Trans. Im. Proc., 2007]

Page 40: Mixture Models for Image Analysis

SV-GMM with Gauss-Markov prior (3)

Page 41: Mixture Models for Image Analysis

MAP estimation

The contextual mixing proportions are the non-negative solutions of the second-degree equation (a non-negative solution always exists):

\left( \sum_{d=1}^{D} \frac{|N_n^d|}{\beta_{j,d}^2} \right) \left( \pi_j^n \right)^2
- \left( \sum_{d=1}^{D} \frac{1}{\beta_{j,d}^2} \sum_{m \in N_n^d} \pi_j^m \right) \pi_j^n
- \langle z_j^n \rangle = 0

followed by projection onto the hyperplane:

\sum_{j=1}^{K} \pi_j^n = 1, \qquad n = 1, 2, \dots, N

Page 42: Mixture Models for Image Analysis

RGB image segmentation (1)

Original image R-SNR = 2 dB

G-SNR = 4 dB

B-SNR = 3 dB

Page 43: Mixture Models for Image Analysis

RGB image segmentation (2)

SVFMM CA-SVFMM

Noise-free image segmentation

Page 44: Mixture Models for Image Analysis

RGB image segmentation (3)

SVFMM (β determined by trial and error)

CA-SVFMM

Degraded image segmentation

Page 45: Mixture Models for Image Analysis

RGB image segmentation (4)

βj (×10⁻³): cupola 128, sky 33, wall 119

The shading effect on the cupola and wall is modeled by the SVFMM with a GMRF prior.

Page 46: Mixture Models for Image Analysis

SV-GMM with DCM prior (1)

For pixel n, the class label is a random variable, multinomially distributed:

p(z^n \mid \xi^n) = \frac{M!}{\prod_{j=1}^{K} z_j^n!} \prod_{j=1}^{K} \left( \xi_j^n \right)^{z_j^n}, \qquad \xi_j^n \ge 0, \quad \sum_{j=1}^{K} \xi_j^n = 1, \quad n = 1, \dots, N

parameterized by the probability vector \xi^n = \left( \xi_1^n, \xi_2^n, \dots, \xi_K^n \right)^T.

The whole image is parameterized by \Xi = \left( \xi^1, \xi^2, \dots, \xi^N \right).

Page 47: Mixture Models for Image Analysis

SV-GMM with DCM prior (2)

Generative model for the image:
• Multinomial distribution: K possible outcomes.
• Class label j (j = 1…K) appears with probability ξj.
• M realizations of the process.
• The distribution of the counts of a certain class is binomial.

p(z^n \mid \xi^n) = \frac{M!}{\prod_{j=1}^{K} z_j^n!} \prod_{j=1}^{K} \left( \xi_j^n \right)^{z_j^n}, \qquad \xi_j^n \ge 0, \quad \sum_{j=1}^{K} \xi_j^n = 1, \quad n = 1, \dots, N

Page 48: Mixture Models for Image Analysis

SV-GMM with DCM prior (3)

• The Dirichlet distribution forms the conjugate prior for the multinomial distribution.
  – The posterior has the same functional form as the prior:

p(\xi \mid x) = \frac{p(x \mid \xi) \, p(\xi)}{\int p(x \mid \xi) \, p(\xi) \, d\xi}

[C. Nikou, A. Likas and N. Galatsanos. IEEE Trans. Im. Proc., 2010]

Page 49: Mixture Models for Image Analysis

SV-GMM with DCM prior (4)

• It is natural to impose a Dirichlet prior on the parameters of the multinomial pdf:

p(\xi^n \mid a^n) = \frac{\Gamma\!\left( \sum_{j=1}^{K} a_j^n \right)}{\prod_{j=1}^{K} \Gamma(a_j^n)} \prod_{j=1}^{K} \left( \xi_j^n \right)^{a_j^n - 1}, \qquad a_j^n > 0, \quad n = 1, \dots, N, \quad j = 1, \dots, K

parameterized by the vector a^n = \left( a_1^n, a_2^n, \dots, a_K^n \right)^T.

Page 50: Mixture Models for Image Analysis

SV-GMM with DCM prior (5)

Marginalizing the parameters of the multinomial,

p(z^n \mid a^n) = \int_{0}^{1} p(z^n \mid \xi^n) \, p(\xi^n \mid a^n) \, d\xi^n, \qquad n = 1, 2, \dots, N

yields the Dirichlet compound multinomial distribution for the class labels:

p(z^n \mid a^n) = \frac{M! \, \Gamma\!\left( \sum_{j=1}^{K} a_j^n \right)}{\Gamma\!\left( M + \sum_{j=1}^{K} a_j^n \right)} \prod_{j=1}^{K} \frac{\Gamma\!\left( z_j^n + a_j^n \right)}{z_j^n! \, \Gamma(a_j^n)}

Page 51: Mixture Models for Image Analysis

SV-GMM with DCM prior (6)

Image model: for a given pixel, its class j is determined by M=1 realization of the process.

p(z_j^n = 1 \mid x^n) = 1, \qquad p(z_m^n = 1 \mid x^n) = 0, \quad m \ne j, \quad m = 1, 2, \dots, K

The DCM prior for the class label becomes:

p(z_j^n = 1 \mid a^n) = \frac{a_j^n}{\sum_{m=1}^{K} a_m^n}, \qquad j = 1, \dots, K

Page 52: Mixture Models for Image Analysis

SV-GMM with DCM prior (7)

The model becomes spatially varying by imposing a GMRF prior on the parameters of the Dirichlet pdf.

p(A) \propto \prod_{j=1}^{K} \frac{1}{\beta_j^{N}} \exp\left( - \sum_{n=1}^{N} \sum_{m \in N_n} \frac{\left( a_j^n - a_j^m \right)^2}{2 \beta_j^2} \right)

[C. Nikou, A. Likas and N. Galatsanos. IEEE Trans. Im. Proc., 2010]

Page 53: Mixture Models for Image Analysis

SV-GMM with DCM prior (8)

Page 54: Mixture Models for Image Analysis

MAP estimation

Posterior probabilities are the non-negative solutions of a third-degree equation in a_j^n, whose coefficients involve the posteriors \langle z_j^n \rangle, the neighboring values a_j^m (m \in N_n), the neighborhood size |N_n| and the GMRF variances.

There is always a non-negative solution. No need for projection!

\pi_j^n = \frac{a_j^n}{\sum_{m=1}^{K} a_m^n}, \qquad n = 1, 2, \dots, N

Page 55: Mixture Models for Image Analysis

Natural image segmentation (1)

Berkeley image database (300 images). Ground truth: human segmentations.

Features: MRF features
  o 7x7 windows x 3 components
  o 147-dimensional vector
  o PCA on a single image
  o 8 principal components kept

Page 56: Mixture Models for Image Analysis

Natural image segmentation (2)

Page 57: Mixture Models for Image Analysis

Natural image segmentation (3)

MRF features

Page 58: Mixture Models for Image Analysis

Natural image segmentation (4)

MRF features

Page 59: Mixture Models for Image Analysis

Natural image segmentation (6)

Page 60: Mixture Models for Image Analysis

Natural image segmentation (7)

Page 61: Mixture Models for Image Analysis

Natural image segmentation (8)

Page 62: Mixture Models for Image Analysis

Natural image segmentation (9)

Page 63: Mixture Models for Image Analysis

Results (K=5)

Page 64: Mixture Models for Image Analysis

Segmentation and recovery (1)

Berkeley image database. Additive white Gaussian noise, SNR between -4 dB and 12 dB. MRF features.

Evaluation indices: PR (probabilistic Rand index), VI (variation of information), GCE (global consistency error), BDE (boundary displacement error).

Page 65: Mixture Models for Image Analysis

Segmentation and recovery (2): PR index (K=5)

Page 66: Mixture Models for Image Analysis

Line processes (1)

Image recovery: estimate a smooth function from noisy observations.

• Observations: d
• Function to be estimated: u

\min_{u} \sum_{i} \left( d_i - u_i \right)^2 + \lambda \sum_{i} \left( u_{i+1} - u_i \right)^2

(data fidelity term + smoothness term)

• Calculus of variations (Euler-Lagrange equations).

Page 67: Mixture Models for Image Analysis

Line processes (2)

In the presence of many edges (piecewise smoothness) the standard solution is not satisfactory. A line process l is integrated:

\min_{u, l} \sum_{i} \left( d_i - u_i \right)^2 + \lambda \sum_{i} \left( u_{i+1} - u_i \right)^2 (1 - l_i) + \alpha \sum_{i} l_i

(the last sum is the penalty term)

l_i = 0: non-edge, include the smoothness term.
l_i = 1: edge, add the penalty.

• Many local minima (due to the simultaneous estimation of u and l); calculus of variations cannot be applied.

Page 68: Mixture Models for Image Analysis

Line processes (3)

Milestones: [D. Geman and S. Geman 1984], [A. Blake and A. Zisserman 1988], [M. Black 1996].

• Integration of a line process into an SV-GMM.
• Continuous line process model on the contextual mixing proportions.
• Gamma-distributed line process variables.
• Line process parameters are automatically estimated from the data (EM and variational EM).

Page 69: Mixture Models for Image Analysis

GMM with line process (2)

Line Process

Page 70: Mixture Models for Image Analysis

GMM with continuous line process (1)

Student's-t prior on the local differences of the contextual mixing proportions:

\pi_j^n - \pi_j^k \sim St\!\left( 0, \, \beta_{jd}^2, \, v_{jd} \right), \qquad \forall \, d, n, j, \quad k \in N_d(n)

Distinct priors per image class and per neighborhood direction (horizontal, vertical).

Page 71: Mixture Models for Image Analysis

GMM with continuous line process (2)

u_j^{nk} \sim \mathrm{Gamma}\!\left( v_{jd}/2, \, v_{jd}/2 \right), \qquad \forall \, d, n, j, \quad k \in N_d(n)

Equivalently, at each pixel n:

\pi_j^n - \pi_j^k \sim N\!\left( 0, \, \beta_{jd}^2 / u_j^{nk} \right)

Joint distribution:

p(\Pi; \beta, v) = \prod_{j=1}^{K} \prod_{n=1}^{N} \prod_{d=1}^{D} \prod_{k \in N_d(n)} St\!\left( \pi_j^n - \pi_j^k; \, 0, \, \beta_{jd}^2, \, v_{jd} \right)

Page 72: Mixture Models for Image Analysis

GMM with continuous line process (3)

[Figure: Gamma-distributed line process]

Page 73: Mixture Models for Image Analysis

GMM with continuous line process (4)

Description of edge structure: continuous generalization of a binary line process.

u_j^{nk} large: weak class variances (smoothness).
u_j^{nk} → 0: uninformative prior (no smoothness); separation of class j from the remaining classes.

[G. Sfikas, C. Nikou and N. Galatsanos. IEEE CVPR, 2008]

Page 74: Mixture Models for Image Analysis

Edges between segments (1)

Page 75: Mixture Models for Image Analysis

Edges between segments (2)

Horizontal differences Vertical differences

Sky

Cupola

Building

Page 76: Mixture Models for Image Analysis

Numerical results (1): Berkeley images, Rand index (RI)

Page 77: Mixture Models for Image Analysis

Image registration

• Estimate the transformation TΘ mapping the coordinates of an image I1 to a target image I2:

I_2(x, y, z) = I_1\!\left( T_\Theta(x, y, z) \right)

T_Θ is described by a set of parameters Θ.

Page 78: Mixture Models for Image Analysis

Image similarity measure

• Single-modal images
  – Quadratic error, correlation, Fourier transform, sign changes.
• Multimodal images
  – Inter-image uniformity, mutual information (MI), normalized MI.

E(Θ): similarity measure between I_1(x, y, z) and I_2(T_Θ(x, y, z))

Page 79: Mixture Models for Image Analysis

Fundamental hypothesis

• Correspondence between uniform regions in the two images.
• Partitioning of the image to be registered.
  – Not necessarily into joint regions.
• Projection of the partition onto the reference image.
• Evaluation of a distance between the overlapping regions.
  – Minimum at correct alignment.
  – Minimize the distance.

Page 80: Mixture Models for Image Analysis

Distance between GMMs (1)

A straightforward approach would be:

G_1(x \mid \Pi_1, \Theta_1) = \sum_{m=1}^{M} \pi_m^1 \, \varphi_m^1(x), \qquad
G_2(x \mid \Pi_2, \Theta_2) = \sum_{n=1}^{N} \pi_n^2 \, \varphi_n^2(x)

E(G_1, G_2) = \sum_{m=1}^{M} \sum_{n=1}^{N} \pi_m^1 \pi_n^2 \, B\!\left( \varphi_m^1, \varphi_n^2 \right)

where B is the Bhattacharyya distance between components.
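For two Gaussian components the Bhattacharyya distance B has a closed form; a minimal sketch of that pairwise term (function and variable names are illustrative):

```python
import numpy as np

def bhattacharyya_gaussian(mu1, S1, mu2, S2):
    """Bhattacharyya distance between N(mu1, S1) and N(mu2, S2)."""
    S = 0.5 * (S1 + S2)
    diff = mu1 - mu2
    term1 = 0.125 * diff @ np.linalg.solve(S, diff)            # mean separation term
    term2 = 0.5 * np.log(np.linalg.det(S) /
                         np.sqrt(np.linalg.det(S1) * np.linalg.det(S2)))  # covariance term
    return term1 + term2
```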

Page 81: Mixture Models for Image Analysis

Distance between GMMs (2)

Knowing the correspondences allows the definition of:

1 21 2

1

( , ) ,K

k kk

TE G G B

Component of the reference image.

Pixels of the transformed floating image overlapping with the kth component of the reference image.

Page 82: Mixture Models for Image Analysis

Energy function (1)

• For a set of transformation parameters Θ:
  – Segment the image to be registered into K segments by a GMM (or SMM).
  – For each segment:
    • Project the pixels onto the reference image.
    • Compute the mean and covariance of the reference image pixels under the projection mask.
  – Evaluate the distance between the distributions.

Page 83: Mixture Models for Image Analysis

Energy function (2)

• Find the transformation parameters Θ:

\Theta^{*} = \arg\min_{\Theta} \sum_{k=1}^{K} B\!\left( \varphi_k^1, \, \varphi_k^2(T_\Theta) \right)

• Optimization by simplex, Powell method or ICM.

[D. Gerogiannis, C. Nikou and A. Likas. Image and Vision Computing, 2009]
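A toy sketch of this minimization: a 2-D translation is recovered by Powell's derivative-free search over the Bhattacharyya distance between single Gaussians fitted to the reference and back-transformed floating point sets. This is only a simplified stand-in for the full per-segment energy of the slides; all names and data are made up:

```python
import numpy as np
from scipy.optimize import minimize

def bhat(mu1, S1, mu2, S2):
    # Bhattacharyya distance between two Gaussians (same formula as the earlier sketch)
    S = 0.5 * (S1 + S2)
    d = mu1 - mu2
    return (0.125 * d @ np.linalg.solve(S, d)
            + 0.5 * np.log(np.linalg.det(S) / np.sqrt(np.linalg.det(S1) * np.linalg.det(S2))))

rng = np.random.default_rng(0)
ref = rng.normal(0.0, 1.0, (500, 2))             # reference point set
flo = ref + np.array([2.0, -1.5])                # floating set, shifted by the "true" translation

def energy(theta):
    # E(Theta): distance between reference component and back-translated floating component
    mu1, S1 = ref.mean(axis=0), np.cov(ref.T)
    mu2, S2 = (flo - theta).mean(axis=0), np.cov((flo - theta).T)
    return bhat(mu1, S1, mu2, S2)

res = minimize(energy, x0=np.zeros(2), method='Powell')   # derivative-free search
print(res.x)                                     # close to [2.0, -1.5]
```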

Page 84: Mixture Models for Image Analysis

Convexity

Bhattacharyya distance

Page 85: Mixture Models for Image Analysis


Registration error (Gaussian noise)

Page 86: Mixture Models for Image Analysis

Registration error

Page 87: Mixture Models for Image Analysis

Registration of point clouds

• Correspondence is unknown
  – Greedy distance between mixtures
  – Determine the correspondence (Hungarian algorithm)
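A minimal sketch of the correspondence step: build a pairwise cost matrix between the components of the two mixtures and solve the assignment with SciPy's Hungarian-algorithm implementation (linear_sum_assignment); the helper names and the choice of pairwise distance are illustrative:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_components(means1, covs1, means2, covs2, dist):
    """dist(mu1, S1, mu2, S2) -> scalar, e.g. the Bhattacharyya distance."""
    K, M = len(means1), len(means2)
    cost = np.array([[dist(means1[i], covs1[i], means2[j], covs2[j])
                      for j in range(M)] for i in range(K)])
    rows, cols = linear_sum_assignment(cost)     # optimal one-to-one matching
    return list(zip(rows, cols)), cost[rows, cols].sum()
```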

Page 88: Mixture Models for Image Analysis

Experimental results

Initial point set: 2% outliers + uniform noise

Page 89: Mixture Models for Image Analysis

Greedy distance

Page 90: Mixture Models for Image Analysis

Hungarian algorithm

Page 91: Mixture Models for Image Analysis

Conclusions

• Application of mixture models to
  – Image segmentation
  – Image registration

• Other applications
  – Image retrieval
  – Visual tracking
  – …