
Page 1: Mixture Models for Image Analysis

Mixture Models for Image Analysis

Aristidis Likas & Christophoros Nikou

IPAN Research Group, Department of Computer Science

University of Ioannina

Page 2: Mixture Models for Image Analysis

Collaborators: Nikolaos Galatsanos, Professor

Konstantinos Blekas, Assistant Professor

Dr. Costas Constantinopoulos, Researcher

George Sfikas, Ph.D. Candidate

Demetrios Gerogiannis, Ph.D. Candidate

Page 3: Mixture Models for Image Analysis

Outline

• Mixture Models and EM (GMM, SMM)
• Bayesian GMMs
• Image segmentation using mixture models
  – Incremental Bayesian GMMs
  – Spatially varying GMMs (SVMMs) with MRF priors
  – SVMMs and line processes

• Image registration using mixture models

Page 4: Mixture Models for Image Analysis

Mixture Models

• Probability density estimation: estimate the density model f(x) that generated a given dataset X = {x1, …, xN}
• Mixture Models
  – M pdf components φj(x)
  – mixing weights: π1, π2, …, πM (priors)
• Gaussian Mixture Model (GMM): φj = N(μj, Σj)

f(x) = \sum_{j=1}^{M} \pi_j \varphi_j(x; \theta_j), \qquad \pi_j \ge 0, \quad \sum_{j=1}^{M} \pi_j = 1

Page 5: Mixture Models for Image Analysis

GMM (graphical model)

[Figure: hidden label variable with prior πj generating the observation x]

Page 6: Mixture Models for Image Analysis

GMM examples

GMMs can be used for density estimation (like histograms) or clustering.

P(j \mid x^n) = \langle z_j^n \rangle = \frac{\pi_j \varphi_j(x^n; \theta_j)}{f(x^n)}

Cluster membership probability
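As a quick illustration of both uses, here is a minimal sketch (assuming scikit-learn; data and variable names are made up, not from the slides) that fits a GMM and reads off the density f(x) and the membership probabilities P(j|x):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (200, 2)),      # synthetic two-cluster data
               rng.normal(4.0, 0.5, (200, 2))])

gmm = GaussianMixture(n_components=2, covariance_type='full').fit(X)

log_density = gmm.score_samples(X)     # log f(x) per point (density estimation)
resp = gmm.predict_proba(X)            # P(j | x), the cluster-membership probabilities
labels = resp.argmax(axis=1)           # hard clustering by maximum responsibility
```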

Page 7: Mixture Models for Image Analysis

Mixture Model training

• Given a dataset X = {x1, …, xN} and a GMM f(x; Θ)
• Likelihood:

p(X; \Theta) = p(x_1, \dots, x_N; \Theta) = \prod_{i=1}^{N} f(x_i; \Theta)

• GMM training: log-likelihood maximization

\Theta^{*} = \arg\max_{\Theta} \sum_{i=1}^{N} \ln p(x_i; \Theta)

• Expectation-maximization (EM) algorithm
  – Applicable when the posterior P(Z|X) can be computed

Page 8: Mixture Models for Image Analysis

EM for Mixture Models

• E-step: compute the expectation of the hidden variables given the observations:

P(j \mid x^n) = \langle z_j^n \rangle = \frac{\pi_j \, \varphi(x^n \mid \theta_j)}{\sum_{p=1}^{K} \pi_p \, \varphi(x^n \mid \theta_p)}

• M-step: maximize expected complete likelihood

\Theta^{(t+1)} = \arg\max_{\Theta} Q(\Theta) = \mathbb{E}_{P(Z|X)}\left[ \log p(X, Z; \Theta) \right]

Q(\Theta) = \sum_{n=1}^{N} \sum_{j=1}^{K} \langle z_j^n \rangle \left( \log \pi_j + \log \varphi(x^n \mid \theta_j) \right)

Page 9: Mixture Models for Image Analysis

EM for GMM (M-step)

Mean:

\mu_j^{(t+1)} = \frac{\sum_{n=1}^{N} \langle z_j^n \rangle \, x^n}{\sum_{n=1}^{N} \langle z_j^n \rangle}

Covariance:

\Sigma_j^{(t+1)} = \frac{\sum_{n=1}^{N} \langle z_j^n \rangle \, (x^n - \mu_j^{(t+1)})(x^n - \mu_j^{(t+1)})^T}{\sum_{n=1}^{N} \langle z_j^n \rangle}

Mixing weights:

\pi_j^{(t+1)} = \frac{1}{N} \sum_{n=1}^{N} \langle z_j^n \rangle
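The following is a minimal sketch of one EM iteration implementing these E- and M-step formulas with NumPy/SciPy; function and variable names are illustrative, not from the slides:

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_step(X, pi, mu, Sigma):
    """One EM iteration for a GMM. X: (N, d); pi: (K,); mu: (K, d); Sigma: (K, d, d)."""
    N, K = X.shape[0], len(pi)
    # E-step: responsibilities <z_j^n> = pi_j N(x^n; mu_j, Sigma_j) / f(x^n)
    resp = np.column_stack([pi[j] * multivariate_normal.pdf(X, mu[j], Sigma[j])
                            for j in range(K)])
    resp /= resp.sum(axis=1, keepdims=True)
    # M-step: re-estimate means, covariances and mixing weights
    Nk = resp.sum(axis=0)
    mu_new = (resp.T @ X) / Nk[:, None]
    Sigma_new = []
    for j in range(K):
        D = X - mu_new[j]
        Sigma_new.append((resp[:, j, None] * D).T @ D / Nk[j])
    pi_new = Nk / N
    return pi_new, mu_new, np.array(Sigma_new), resp
```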

Page 10: Mixture Models for Image Analysis

Student's t-distribution

St(x; \mu, \Sigma, v) = \frac{\Gamma\!\left(\frac{v+d}{2}\right) \, |\Sigma|^{-1/2}}{(\pi v)^{d/2} \, \Gamma\!\left(\frac{v}{2}\right) \left[ 1 + (x-\mu)^T \Sigma^{-1} (x-\mu)/v \right]^{(v+d)/2}}

Mean μ, covariance matrix Σ, degrees of freedom v.

Bell-shaped + heavy-tailed (depending on v); tends to a Gaussian for large v.
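A small sketch (assuming SciPy ≥ 1.6, which provides multivariate_t) showing the heavy tails for small v and the Gaussian limit for large v:

```python
import numpy as np
from scipy.stats import multivariate_t, multivariate_normal

mu, Sigma = np.zeros(2), np.eye(2)
x = np.array([3.0, 3.0])                       # a point far in the tail

for v in (1, 3, 10, 100):
    p_t = multivariate_t(loc=mu, shape=Sigma, df=v).pdf(x)
    print(f"v={v:>3}: St(x) = {p_t:.3e}")      # heavier tails for small v

print("Gaussian:", multivariate_normal(mu, Sigma).pdf(x))   # limit as v grows
```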

Page 11: Mixture Models for Image Analysis

The Student's t-distribution

Page 12: Mixture Models for Image Analysis

The Student's t-distribution

x \mid \mu, \Sigma, u \sim N(\mu, \, \Sigma / u), \qquad u; v \sim \mathrm{Gamma}(v/2, \, v/2)

Hierarchical distribution x follows a Gaussian distribution whose covariance is scaled

by a factor following a Gamma distribution. ML parameter estimation using the EM algorithm

(u is considered as hidden variable).

Page 13: Mixture Models for Image Analysis

The Student's t-distribution

Page 14: Mixture Models for Image Analysis

SMM: Student's t Mixture Models

Each component j follows St(μj, Σj, vj) (robust mixture).

Parameter estimation using EM; hidden variables: uj and zj.

E-step:

\langle z_j^n \rangle = \frac{\pi_j \, St(x^n; \mu_j, \Sigma_j, v_j)}{\sum_{p=1}^{K} \pi_p \, St(x^n; \mu_p, \Sigma_p, v_p)}

\langle u_j^n \rangle = \frac{v_j^{(t)} + d}{v_j^{(t)} + (x^n - \mu_j^{(t)})^T \left( \Sigma_j^{(t)} \right)^{-1} (x^n - \mu_j^{(t)})}

Page 15: Mixture Models for Image Analysis

SMM training

• M-step

Mean:

\mu_j^{(t+1)} = \frac{\sum_{n=1}^{N} \langle u_j^n \rangle \langle z_j^n \rangle \, x^n}{\sum_{n=1}^{N} \langle u_j^n \rangle \langle z_j^n \rangle}

Covariance:

\Sigma_j^{(t+1)} = \frac{\sum_{n=1}^{N} \langle u_j^n \rangle \langle z_j^n \rangle \, (x^n - \mu_j^{(t+1)})(x^n - \mu_j^{(t+1)})^T}{\sum_{n=1}^{N} \langle u_j^n \rangle \langle z_j^n \rangle}

Mixing proportion:

\pi_j^{(t+1)} = \frac{1}{N} \sum_{n=1}^{N} \langle z_j^n \rangle

Page 16: Mixture Models for Image Analysis

EM for SMM

• M-step (degrees of freedom): v_j^{(t+1)} is the root of

\log\frac{v_j}{2} - \psi\!\left(\frac{v_j}{2}\right) + 1
+ \frac{\sum_{n=1}^{N} \langle z_j^n \rangle \left( \log \langle u_j^n \rangle - \langle u_j^n \rangle \right)}{\sum_{n=1}^{N} \langle z_j^n \rangle}
+ \psi\!\left(\frac{v_j^{(t)} + d}{2}\right) - \log\frac{v_j^{(t)} + d}{2} = 0

Degrees of freedom: no closed form update
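Since there is no closed form, the v_j update is typically obtained with a one-dimensional root finder applied to the equation above. A minimal sketch under that assumption (names are illustrative, not from the slides):

```python
import numpy as np
from scipy.special import digamma
from scipy.optimize import brentq

def update_dof(z_j, u_j, v_old, d):
    """z_j: responsibilities <z_j^n>, u_j: scale weights <u_j^n>, d: data dimension."""
    c = (z_j * (np.log(u_j) - u_j)).sum() / z_j.sum()
    const = 1.0 + c + digamma((v_old + d) / 2.0) - np.log((v_old + d) / 2.0)

    def f(v):                                   # left-hand side of the update equation
        return np.log(v / 2.0) - digamma(v / 2.0) + const

    # bracket assumed wide enough to contain the sign change for typical data
    return brentq(f, 1e-3, 1e3)
```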

Page 17: Mixture Models for Image Analysis

Mixture model training issues

• EM local maxima (dependence on initialization)
• Covariance singularities
• How to select the number of components
• SMM vs GMM
  – Better results for data with outliers (robustness)
  – Higher dependence on initialization (how to initialize vj?)

Page 18: Mixture Models for Image Analysis

EM Local Maxima

Page 19: Mixture Models for Image Analysis

Bayesian GMM

Typical approach: priors on all GMM parameters

T_j \sim \mathrm{Wishart}(v, V), \qquad p(T) = \prod_{j=1}^{M} p(T_j)

\boldsymbol{\pi} = (\pi_1, \dots, \pi_M) \sim \mathrm{Dirichlet}(a_1, \dots, a_M)

\mu_j \sim N(m, S), \qquad p(\mu) = \prod_{j=1}^{M} p(\mu_j)

f(x) = \sum_{j=1}^{M} \pi_j \, \varphi(x; \mu_j, T_j^{-1}), \qquad \sum_{j=1}^{M} \pi_j = 1

Page 20: Mixture Models for Image Analysis

Bayesian GMM training

• Parameters Θ become (hidden) RVs: H = {Z, Θ}

• Objective: Compute Posteriors P(Z|X), P(Θ|X) (intractable)

• Approximations

• Sampling (RJMCMC)

• MAP approach

• Variational approach

• MAP approximation

• mode of the posterior P(Θ|Χ) (MAP-EM)

• compute P(Z|X,ΘMAP)

\Theta_{MAP} = \arg\max_{\Theta} \left\{ \log P(X \mid \Theta) + \log P(\Theta) \right\}

Page 21: Mixture Models for Image Analysis

Variational Inference (no parameters)

• Computes an approximation q(H) of the true posterior P(H|X)
• For any pdf q(H):

\ln p(X) = F(q) + KL\left( q(H) \,\|\, P(H \mid X) \right)

• Variational bound (F) maximization:

q^{*} = \arg\max_{q} F(q) = \arg\max_{q} \int q(H) \ln \frac{p(X, H)}{q(H)} \, dH

• Mean field approximation:

q(H) = \prod_{k} q_k(H_k)

• System of equations:

q_k(H_k) = \frac{\exp\left( \langle \ln p(X, H) \rangle_{q(H \setminus H_k)} \right)}{\int \exp\left( \langle \ln p(X, H) \rangle_{q(H \setminus H_k)} \right) dH_k}

D. Tzikas, A. Likas, N. Galatsanos, IEEE Signal Processing Magazine, 2008

Page 22: Mixture Models for Image Analysis

Variational Inference (with parameters)

• X data, H hidden RVs, Θ parameters
• For any pdf q(H; Θ):

\ln p(X; \Theta) = F(q, \Theta) + KL\left( q(H; \Theta) \,\|\, p(H \mid X; \Theta) \right)

• Maximization of the variational bound F:

F(q, \Theta) = \int q(H; \Theta) \ln \frac{p(X, H; \Theta)}{q(H; \Theta)} \, dH \le \ln p(X; \Theta)

• Variational EM
  • VE-step: q^{(t+1)} = \arg\max_{q} F(q, \Theta^{(t)})
  • VM-step: \Theta^{(t+1)} = \arg\max_{\Theta} F(q^{(t+1)}, \Theta)

Page 23: Mixture Models for Image Analysis

Bayesian GMM training

• Bayesian GMMs (no parameters)

• mean field variational approximation

• tackles the covariance singularity problem

• requires to specify the parameters of the priors

• Estimating the number of components:

• Start with a large number of components

• Let the training process prune redundant components (πj=0)

• Dirichlet prior on πj prevents component pruning
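For comparison, scikit-learn's variational Bayesian GMM behaves in this spirit: started with deliberately many components and a small weight-concentration prior, redundant components end up with weights close to zero. This is not the C-B/C-L algorithm from these slides, just a minimal related sketch with made-up data:

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-3, 0.7, (300, 2)),
               rng.normal(+3, 0.7, (300, 2))])   # two true clusters

bgmm = BayesianGaussianMixture(
    n_components=10,                       # deliberately too many components
    weight_concentration_prior=1e-3,       # small concentration favors few active components
    max_iter=500,
).fit(X)

print(np.round(bgmm.weights_, 3))          # most weights collapse near zero
```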

Page 24: Mixture Models for Image Analysis

Bayesian GMM without prior on π

• Mixing weights πj are parameters (remove Dirichlet prior)

• Training using Variational EM

Method (C-B)
• Start with a large number of components
• Perform variational maximization of the marginal likelihood
• Pruning of redundant components (πj = 0)

• Only components that fit well to the data are finally retained

CBdemo (CBdemo.wmv)

Page 25: Mixture Models for Image Analysis

Bayesian GMM (C-B)

• C-B method: results depend on
  • the number of initial components
  • the initialization of the components
  • the specification of the scale matrix V of the Wishart prior p(T)

Page 26: Mixture Models for Image Analysis

Incremental Bayesian GMM

• Modification of the Bayesian GMM is needed

• Divide the components into 'fixed' and 'free'

• Prior on the weights of 'fixed' components (retained)

• No prior on the weights of 'free' components (may be eliminated)

• Pruning restricted among 'free' components

• Solution: incremental training using component splitting

• Local scale matrix V: based on the variance of the component to be split

C. Constantinopoulos & A. Likas, IEEE Trans. on Neural Networks, 2007

Page 27: Mixture Models for Image Analysis

Incremental Bayesian GMM

Page 28: Mixture Models for Image Analysis

Incremental Bayesian GMM

• Start with k = 1 component.
• At each step:
  • select a component j
  • split component j in two subcomponents
  • set the scale matrix V analogous to Σj
  • apply variational EM considering the two subcomponents as free and the rest of the components as fixed
  • either the two components are retained and adjusted,
  • or one of them is eliminated and the other one recovers the original component (before the split)
• until all components have been tested for splitting unsuccessfully

C-L

Page 29: Mixture Models for Image Analysis

Mixture Models for Image Modeling

• Select a feature representation
• Compute a feature vector per pixel to form the training set
• Build a mixture model for the image using the training set

• Applications
  • Image retrieval + relevance feedback
  • Image segmentation
  • Image registration

Page 30: Mixture Models for Image Analysis

Mixture Models for Image Segmentation

• One cluster per mixture component.
• Assign pixels to clusters based on P(j|x).
• Take into account spatial smoothness: neighbouring pixels are expected to have the same label.
• Simple way: add pixel coordinates to the feature vector.
• Bayesian way: impose MRF priors (SVMM).
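A minimal sketch of the "simple way" above: per-pixel colour features, optionally augmented with normalized pixel coordinates, clustered with a GMM. It assumes a float RGB image `img` of shape (H, W, 3); all names are illustrative:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def segment(img, K=5, use_xy=True):
    H, W, C = img.shape
    feats = img.reshape(-1, C)
    if use_xy:                                   # append normalized coordinates
        yy, xx = np.mgrid[0:H, 0:W]
        coords = np.stack([yy.ravel() / H, xx.ravel() / W], axis=1)
        feats = np.hstack([feats, coords])
    gmm = GaussianMixture(n_components=K, covariance_type='full').fit(feats)
    return gmm.predict(feats).reshape(H, W)      # one cluster label per pixel
```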

Page 31: Mixture Models for Image Analysis

Incremental Bayesian GMM Image segmentation

Number of segments determined automatically

Page 32: Mixture Models for Image Analysis

Incremental Bayesian GMM Image segmentation

Number of segments determined automatically

Page 33: Mixture Models for Image Analysis

Spatially Varying mixtures (1)

f(x^n \mid \Pi, \Theta) = \sum_{j=1}^{K} \pi_j^n \, \varphi(x^n \mid \theta_j), \qquad n = 1, 2, \dots, N

x^n: image feature (e.g. pixel intensity)
\pi_j^n: contextual mixing proportions
\varphi(x^n \mid \theta_j): Gaussian parameterized by \theta_j = \{\mu_j, \Sigma_j\}
z_j^n: data label, hidden variable

Page 34: Mixture Models for Image Analysis

Spatially Varying mixtures (2)

Smoothness is enforced in the image by imposing a prior p(Π) on the probability of the pixel labels (contextual mixing proportions).

L(\Pi, \Theta \mid X) = \sum_{n=1}^{N} \log f(x^n \mid \Pi, \Theta) + \log p(\Pi)

Insight into the contextual mixing proportions:

\pi_j^n = p(z_j^n = 1 \mid x^n)

Page 35: Mixture Models for Image Analysis

SV-GMM with Gibbs prior (1)

• A typical constraint is the Gibbs prior:

p(\Pi) = \frac{1}{Z} e^{-U(\Pi)}, \qquad U(\Pi) = \beta \sum_{n=1}^{N} V_{N_n}(\Pi)

V_{N_n}(\Pi) = \sum_{m \in N_n} \sum_{j=1}^{K} \left( \pi_j^n - \pi_j^m \right)^2

β: smoothness weight

[K. Blekas, A. Likas, N. Galatsanos and I. Lagaris. IEEE Trans. Neur. Net., 2005]

Page 36: Mixture Models for Image Analysis

SV-GMM with Gibbs prior (2)

Page 37: Mixture Models for Image Analysis

SV-GMM with Gibbs prior (3)

• E-step: equivalent with the GMM case.
• M-step: the contextual mixing proportions are the solutions to a quadratic equation.

• Note that:
  1) Parameter β of the Gibbs prior must be determined beforehand.
  2) The contextual mixing proportions are not constrained to be probability vectors:

0 \le \pi_j^n \le 1, \qquad \sum_{j=1}^{K} \pi_j^n = 1, \qquad n = 1, 2, \dots, N

Page 38: Mixture Models for Image Analysis

SV-GMM with Gibbs prior (4)

To address these issues:
  1) Class-adaptive Gauss-Markov random field prior.
  2) Projection of the probabilities onto the hyperplane (another solution will be presented later on):

\sum_{j=1}^{K} \pi_j^n = 1, \qquad n = 1, 2, \dots, N
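One common way to implement such a projection step is the Euclidean projection of each pixel's contextual mixing proportions onto the probability simplex, which enforces both the sum-to-one hyperplane and non-negativity. A minimal sketch (not necessarily the exact projection used in the cited papers):

```python
import numpy as np

def project_to_simplex(p):
    """Project a vector p onto {x : x >= 0, sum(x) = 1}."""
    u = np.sort(p)[::-1]                                   # sort descending
    css = np.cumsum(u)
    rho = np.nonzero(u + (1.0 - css) / (np.arange(len(p)) + 1) > 0)[0][-1]
    theta = (1.0 - css[rho]) / (rho + 1)                   # shift that enforces the constraints
    return np.maximum(p + theta, 0.0)
```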

Page 39: Mixture Models for Image Analysis

SV-GMM with Gauss-Markov prior (2)

• One variance per cluster j = 1, 2, …, K per direction d = 0, 45, 90, 135 degrees:

p(\Pi) \propto \prod_{d=1}^{D} \prod_{j=1}^{K} \frac{1}{\beta_{j,d}^{N}} \exp\left( - \sum_{n=1}^{N} \sum_{m \in N_n^d} \frac{\left( \pi_j^n - \pi_j^m \right)^2}{2 \beta_{j,d}^2} \right)

[C. Nikou, N. Galatsanos and A. Likas. IEEE Trans. Im. Proc., 2007]

Page 40: Mixture Models for Image Analysis

SV-GMM with Gauss-Markov prior (3)

Page 41: Mixture Models for Image Analysis

MAP estimation

The contextual mixing proportions are the non-negative solutions of the second-degree equation (a non-negative solution always exists):

\left( \sum_{d=1}^{D} \frac{|N_n^d|}{\beta_{j,d}^2} \right) \left( \pi_j^n \right)^2
- \left( \sum_{d=1}^{D} \frac{1}{\beta_{j,d}^2} \sum_{m \in N_n^d} \pi_j^m \right) \pi_j^n
- \langle z_j^n \rangle = 0

followed by projection onto the hyperplane:

\sum_{j=1}^{K} \pi_j^n = 1, \qquad n = 1, 2, \dots, N

Page 42: Mixture Models for Image Analysis

RGB image segmentation (1)

Original image R-SNR = 2 dB

G-SNR = 4 dB

B-SNR = 3 dB

Page 43: Mixture Models for Image Analysis

RGB image segmentation (2)

SVFMM CA-SVFMM

Noise-free image segmentation

Page 44: Mixture Models for Image Analysis

RGB image segmentation (3)

SVFMM (β determined by trial and error)

CA-SVFMM

Degraded image segmentation

Page 45: Mixture Models for Image Analysis

RGB image segmentation (4)

βj (×10⁻³): cupola 128, sky 33, wall 119

The shading effect on the cupola and wall is modeled by the SVFMM with a GMRF prior.

Page 46: Mixture Models for Image Analysis

SV-GMM with DCM prior (1)

For pixel n, the class label is a random variable, multinomially distributed:

p(z^n \mid \xi^n) = \frac{M!}{\prod_{j=1}^{K} z_j^n!} \prod_{j=1}^{K} \left( \xi_j^n \right)^{z_j^n}, \qquad \xi_j^n \ge 0, \quad \sum_{j=1}^{K} \xi_j^n = 1, \quad n = 1, \dots, N

parameterized by the probability vector \xi^n = \left( \xi_1^n, \xi_2^n, \dots, \xi_K^n \right)^T.

The whole image is parameterized by \Xi = \left( \xi^1, \xi^2, \dots, \xi^N \right).

Page 47: Mixture Models for Image Analysis

SV-GMM with DCM prior (2)

Generative model for the image:
• Multinomial distribution: K possible outcomes.
• Class label j (j = 1…K) appears with probability ξj.
• M realizations of the process.
• The distribution of the counts of a certain class is binomial.

p(z^n \mid \xi^n) = \frac{M!}{\prod_{j=1}^{K} z_j^n!} \prod_{j=1}^{K} \left( \xi_j^n \right)^{z_j^n}, \qquad \xi_j^n \ge 0, \quad \sum_{j=1}^{K} \xi_j^n = 1, \quad n = 1, \dots, N

Page 48: Mixture Models for Image Analysis

SV-GMM with DCM prior (3)

• The Dirichlet distribution forms the conjugate prior for the multinomial distribution.
  – The posterior has the same functional form as the prior:

p(\xi \mid x) = \frac{p(x \mid \xi) \, p(\xi)}{\int p(x \mid \xi) \, p(\xi) \, d\xi}

[C. Nikou, A. Likas and N. Galatsanos. IEEE Trans. Im. Proc., 2010]

Page 49: Mixture Models for Image Analysis

SV-GMM with DCM prior (4)

• It is natural to impose a Dirichlet prior on the parameters of the multinomial pdf:

p(\xi^n \mid a^n) = \frac{\Gamma\!\left( \sum_{j=1}^{K} a_j^n \right)}{\prod_{j=1}^{K} \Gamma(a_j^n)} \prod_{j=1}^{K} \left( \xi_j^n \right)^{a_j^n - 1}, \qquad a_j^n > 0, \quad n = 1, \dots, N, \quad j = 1, \dots, K

parameterized by the vector a^n = \left( a_1^n, a_2^n, \dots, a_K^n \right)^T.

Page 50: Mixture Models for Image Analysis

SV-GMM with DCM prior (5)

Marginalizing the parameters of the multinomial,

p(z^n \mid a^n) = \int_{0}^{1} p(z^n \mid \xi^n) \, p(\xi^n \mid a^n) \, d\xi^n, \qquad n = 1, 2, \dots, N

yields the Dirichlet compound multinomial distribution for the class labels:

p(z^n \mid a^n) = \frac{M! \, \Gamma\!\left( \sum_{j=1}^{K} a_j^n \right)}{\Gamma\!\left( M + \sum_{j=1}^{K} a_j^n \right)} \prod_{j=1}^{K} \frac{\Gamma\!\left( z_j^n + a_j^n \right)}{z_j^n! \, \Gamma(a_j^n)}

Page 51: Mixture Models for Image Analysis

SV-GMM with DCM prior (6)

Image model: for a given pixel, its class j is determined by M=1 realization of the process.

p(z_j^n = 1 \mid x^n) = 1, \qquad p(z_m^n = 1 \mid x^n) = 0, \quad m \ne j, \quad m = 1, 2, \dots, K

The DCM prior for the class label becomes:

p(z_j^n = 1 \mid a^n) = \frac{a_j^n}{\sum_{m=1}^{K} a_m^n}, \qquad j = 1, \dots, K

Page 52: Mixture Models for Image Analysis

SV-GMM with DCM prior (7)

The model becomes spatially varying by imposing a GMRF prior on the parameters of the Dirichlet pdf.

p(A) \propto \prod_{j=1}^{K} \frac{1}{\beta_j^{N}} \exp\left( - \sum_{n=1}^{N} \sum_{m \in N_n} \frac{\left( a_j^n - a_j^m \right)^2}{2 \beta_j^2} \right)

[C. Nikou, A. Likas and N. Galatsanos. IEEE Trans. Im. Proc., 2010]

Page 53: Mixture Models for Image Analysis

SV-GMM with DCM prior (8)

Page 54: Mixture Models for Image Analysis

MAP estimation

Posterior probabilities are the non-negative solutions of a third-degree equation in a_j^n, whose coefficients involve the posteriors \langle z_j^n \rangle, the neighboring values a_j^m (m \in N_n), the neighborhood size |N_n| and the GMRF variances.

There is always a non-negative solution. No need for projection!

\pi_j^n = \frac{a_j^n}{\sum_{m=1}^{K} a_m^n}, \qquad n = 1, 2, \dots, N

Page 55: Mixture Models for Image Analysis

Natural image segmentation (1)

Berkeley image database (300 images). Ground truth: human segmentations.

Features: MRF features
  o 7x7 windows x 3 components
  o 147-dimensional vector
  o PCA on a single image
  o 8 principal components kept

Page 56: Mixture Models for Image Analysis

Natural image segmentation (2)

Page 57: Mixture Models for Image Analysis

Natural image segmentation (3)

MRF features

Page 58: Mixture Models for Image Analysis

Natural image segmentation (4)

MRF features

Page 59: Mixture Models for Image Analysis

Natural image segmentation (6)

Page 60: Mixture Models for Image Analysis

Natural image segmentation (7)

Page 61: Mixture Models for Image Analysis

Natural image segmentation (8)

Page 62: Mixture Models for Image Analysis

Natural image segmentation (9)

Page 63: Mixture Models for Image Analysis

Results (K=5)

Page 64: Mixture Models for Image Analysis

Segmentation and recovery (1)

Berkeley image database. Additive white Gaussian noise, SNR between -4 dB and 12 dB. MRF features.

Evaluation indices: PR (probabilistic Rand index), VI (variation of information), GCE (global consistency error), BDE (boundary displacement error).

Page 65: Mixture Models for Image Analysis

Segmentation and recovery (2): PR index (K=5)

Page 66: Mixture Models for Image Analysis

Line processes (1)

Image recovery: estimate a smooth function from noisy observations.

• Observations: d
• Function to be estimated: u

\min_{u} \sum_{i} \left( d_i - u_i \right)^2 + \lambda \sum_{i} \left( u_{i+1} - u_i \right)^2

(data fidelity term + smoothness term)

• Calculus of variations (Euler-Lagrange equations).

Page 67: Mixture Models for Image Analysis

Line processes (2)

In the presence of many edges (piecewise smoothness) the standard solution is not satisfactory. A line process l is integrated:

\min_{u, l} \sum_{i} \left( d_i - u_i \right)^2 + \lambda \sum_{i} \left( u_{i+1} - u_i \right)^2 (1 - l_i) + \alpha \sum_{i} l_i

(the last sum is the penalty term)

l_i = 0: non-edge, include the smoothness term.
l_i = 1: edge, add the penalty.

• Many local minima (due to the simultaneous estimation of u and l); calculus of variations cannot be applied.

Page 68: Mixture Models for Image Analysis

Line processes (3)

Milestones: [D. Geman and S. Geman 1984], [A. Blake and A. Zisserman 1988], [M. Black 1996].

• Integration of a line process into an SV-GMM.
• Continuous line process model on the contextual mixing proportions.
• Gamma-distributed line process variables.
• Line process parameters are automatically estimated from the data (EM and variational EM).

Page 69: Mixture Models for Image Analysis

GMM with line process (2)

Line Process

Page 70: Mixture Models for Image Analysis

GMM with continuous line process (1)

Student's-t prior on the local differences of the contextual mixing proportions:

\pi_j^n - \pi_j^k \sim St\!\left( 0, \, \beta_{jd}^2, \, v_{jd} \right), \qquad \forall \, d, n, j, \quad k \in N_d(n)

Distinct priors per image class and per neighborhood direction (horizontal, vertical).

Page 71: Mixture Models for Image Analysis

GMM with continuous line process (2)

u_j^{nk} \sim \mathrm{Gamma}\!\left( v_{jd}/2, \, v_{jd}/2 \right), \qquad \forall \, d, n, j, \quad k \in N_d(n)

Equivalently, at each pixel n:

\pi_j^n - \pi_j^k \sim N\!\left( 0, \, \beta_{jd}^2 / u_j^{nk} \right)

Joint distribution:

p(\Pi; \beta, v) = \prod_{j=1}^{K} \prod_{n=1}^{N} \prod_{d=1}^{D} \prod_{k \in N_d(n)} St\!\left( \pi_j^n - \pi_j^k; \, 0, \, \beta_{jd}^2, \, v_{jd} \right)

Page 72: Mixture Models for Image Analysis

GMM with continuous line process (3)

[Figure: Gamma-distributed line process]

Page 73: Mixture Models for Image Analysis

GMM with continuous line process (4)

Description of edge structure: continuous generalization of a binary line process.

u_j^{nk} large: weak class variances (smoothness).
u_j^{nk} → 0: uninformative prior (no smoothness); separation of class j from the remaining classes.

[G. Sfikas, C. Nikou and N. Galatsanos. IEEE CVPR, 2008]

Page 74: Mixture Models for Image Analysis

Edges between segments (1)

Page 75: Mixture Models for Image Analysis

Edges between segments (2)

Horizontal differences Vertical differences

Sky

Cupola

Building

Page 76: Mixture Models for Image Analysis

Numerical results (1): Berkeley images, Rand index (RI)

Page 77: Mixture Models for Image Analysis

Image registration

• Estimate the transformation TΘ mapping the coordinates of an image I1 to a target image I2:

I_2(x, y, z) = I_1\!\left( T_\Theta(x, y, z) \right)

T_Θ is described by a set of parameters Θ.

Page 78: Mixture Models for Image Analysis

Image similarity measure

• Single-modal images
  – Quadratic error, correlation, Fourier transform, sign changes.
• Multimodal images
  – Inter-image uniformity, mutual information (MI), normalized MI.

E(Θ): similarity measure between I_1(x, y, z) and I_2(T_Θ(x, y, z))

Page 79: Mixture Models for Image Analysis

Fundamental hypothesis

• Correspondence between uniform regions in the two images.
• Partitioning of the image to be registered.
  – Not necessarily into joint regions.
• Projection of the partition onto the reference image.
• Evaluation of a distance between the overlapping regions.
  – Minimum at correct alignment.
  – Minimize the distance.

Page 80: Mixture Models for Image Analysis

Distance between GMMs (1)

A straightforward approach would be:

G_1(x \mid \Pi_1, \Theta_1) = \sum_{m=1}^{M} \pi_m^1 \, \varphi_m^1(x), \qquad
G_2(x \mid \Pi_2, \Theta_2) = \sum_{n=1}^{N} \pi_n^2 \, \varphi_n^2(x)

E(G_1, G_2) = \sum_{m=1}^{M} \sum_{n=1}^{N} \pi_m^1 \pi_n^2 \, B\!\left( \varphi_m^1, \varphi_n^2 \right)

where B is the Bhattacharyya distance between components.
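For two Gaussian components the Bhattacharyya distance B has a closed form; a minimal sketch of that pairwise term (function and variable names are illustrative):

```python
import numpy as np

def bhattacharyya_gaussian(mu1, S1, mu2, S2):
    """Bhattacharyya distance between N(mu1, S1) and N(mu2, S2)."""
    S = 0.5 * (S1 + S2)
    diff = mu1 - mu2
    term1 = 0.125 * diff @ np.linalg.solve(S, diff)            # mean separation term
    term2 = 0.5 * np.log(np.linalg.det(S) /
                         np.sqrt(np.linalg.det(S1) * np.linalg.det(S2)))  # covariance term
    return term1 + term2
```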

Page 81: Mixture Models for Image Analysis

Distance between GMMs (2)

Knowing the correspondences allows the definition of:

1 21 2

1

( , ) ,K

k kk

TE G G B

Component of the reference image.

Pixels of the transformed floating image overlapping with the kth component of the reference image.

Page 82: Mixture Models for Image Analysis

Energy function (1)

• For a set of transformation parameters Θ:
  – Segment the image to be registered into K segments by a GMM (or SMM).
  – For each segment:
    • Project the pixels onto the reference image.
    • Compute the mean and covariance of the reference image pixels under the projection mask.
  – Evaluate the distance between the distributions.

Page 83: Mixture Models for Image Analysis

Energy function (2)

• Find the transformation parameters Θ:

\Theta^{*} = \arg\min_{\Theta} \sum_{k=1}^{K} B\!\left( \varphi_k^1, \, \varphi_k^2(T_\Theta) \right)

• Optimization by simplex, Powell method or ICM.

[D. Gerogiannis, C. Nikou and A. Likas. Image and Vision Computing, 2009]
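A toy sketch of this minimization: a 2-D translation is recovered by Powell's derivative-free search over the Bhattacharyya distance between single Gaussians fitted to the reference and back-transformed floating point sets. This is only a simplified stand-in for the full per-segment energy of the slides; all names and data are made up:

```python
import numpy as np
from scipy.optimize import minimize

def bhat(mu1, S1, mu2, S2):
    # Bhattacharyya distance between two Gaussians (same formula as the earlier sketch)
    S = 0.5 * (S1 + S2)
    d = mu1 - mu2
    return (0.125 * d @ np.linalg.solve(S, d)
            + 0.5 * np.log(np.linalg.det(S) / np.sqrt(np.linalg.det(S1) * np.linalg.det(S2))))

rng = np.random.default_rng(0)
ref = rng.normal(0.0, 1.0, (500, 2))             # reference point set
flo = ref + np.array([2.0, -1.5])                # floating set, shifted by the "true" translation

def energy(theta):
    # E(Theta): distance between reference component and back-translated floating component
    mu1, S1 = ref.mean(axis=0), np.cov(ref.T)
    mu2, S2 = (flo - theta).mean(axis=0), np.cov((flo - theta).T)
    return bhat(mu1, S1, mu2, S2)

res = minimize(energy, x0=np.zeros(2), method='Powell')   # derivative-free search
print(res.x)                                     # close to [2.0, -1.5]
```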

Page 84: Mixture Models for Image Analysis

Convexity

Bhattacharyya distance

Page 85: Mixture Models for Image Analysis


Registration error (Gaussian noise)

Page 86: Mixture Models for Image Analysis

Registration error

Page 87: Mixture Models for Image Analysis

Registration of point clouds

• Correspondence is unknown
  – Greedy distance between mixtures
  – Determine the correspondence (Hungarian algorithm)
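A minimal sketch of the correspondence step: build a pairwise cost matrix between the components of the two mixtures and solve the assignment with SciPy's Hungarian-algorithm implementation (linear_sum_assignment); the helper names and the choice of pairwise distance are illustrative:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_components(means1, covs1, means2, covs2, dist):
    """dist(mu1, S1, mu2, S2) -> scalar, e.g. the Bhattacharyya distance."""
    K, M = len(means1), len(means2)
    cost = np.array([[dist(means1[i], covs1[i], means2[j], covs2[j])
                      for j in range(M)] for i in range(K)])
    rows, cols = linear_sum_assignment(cost)     # optimal one-to-one matching
    return list(zip(rows, cols)), cost[rows, cols].sum()
```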

Page 88: Mixture Models for Image Analysis

Experimental results

Initial point set: 2% outliers + uniform noise

Page 89: Mixture Models for Image Analysis

Greedy distance

Page 90: Mixture Models for Image Analysis

Hungarian algorithm

Page 91: Mixture Models for Image Analysis

Conclusions

• Application of mixture models to
  – Image segmentation
  – Image registration

• Other applications
  – Image retrieval
  – Visual tracking
  – …