Generative Models for Image Analysis Stuart Geman (with E. Borenstein, L.-B. Chang, W. Zhang)


Page 1: Generative Models for Image Analysis

Generative Models for Image Analysis

Stuart Geman (with E. Borenstein, L.-B. Chang, W. Zhang)

Page 2: Generative Models for Image Analysis

I. Bayesian (generative) image models
II. Feature distributions and data distributions
III. Conditional modeling
IV. Sampling and the choice of null distribution
V. Other applications of conditional modeling

Page 3: Generative Models for Image Analysis

I. Bayesian (generative) image models

Prior

$I$ = set of possible "interpretations" or "parses"
$x \in I$ = a particular interpretation
$P(x)$ = probability model on $I$

* very structured and constrained
* organizing principles: hierarchy and reusability (Amit, Buhmann, Poggio, Yuille, Zhu, etc.)
* non-Markovian (context/content sensitive)

Conditional likelihood

$y$ = image
$P(y \mid x)$ = conditional probability model

Posterior

$$P(x \mid y) \propto P(y \mid x)\, P(x)$$

Focus here on $P(y \mid x)$.

Page 4: Generative Models for Image Analysis

II. Feature distributions and data distributions

Image patch: $y = \{y_s\}_{s \in S}$, where $y_s$ = pixel intensity at site $s \in S$.

"Feature" $f(y)$, e.g.:

* variance of patch
* histogram of gradients, SIFT features, etc.
* template correlation

Category $g$ (edge, corner, eye, face, ...).

Model the patch through a feature model: a probability model $P(f)$ under category $g$.

Page 5: Generative Models for Image Analysis

e.g. detection and recognition of eyes

Image patch: $y = \{y_s\}_{s \in S}$, $y_s$ = pixel intensity at $s \in S$.

Consider the feature

$$f(y) = c_T(y) = \mathrm{corr}(T, y) = \frac{\sum_{s \in S}(T_s - \bar{T})(y_s - \bar{y})}{\sqrt{\sum_{s \in S}(T_s - \bar{T})^2}\,\sqrt{\sum_{s \in S}(y_s - \bar{y})^2}}$$

and the model

$$P(C_T = c) = \lambda e^{-\lambda(1 - c)}, \quad c \le 1.$$

Problem: given samples $y_1, \ldots, y_N$ of eye patches, learn $\lambda$ and $T$.

[Figure: density of the correlation feature on $[-1, 1]$, concentrated near 1.]
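The correlation feature and the exponential feature model above can be sketched directly (a minimal illustration; patches are taken to be flat NumPy arrays over the site set $S$, which is an assumption about encoding, not part of the talk):

```python
import numpy as np

def corr_feature(T, y):
    """Normalized correlation c_T(y) = corr(T, y) between a template T
    and a patch y, both flattened over the same site set S."""
    Tc = T - T.mean()
    yc = y - y.mean()
    return float(Tc @ yc / (np.linalg.norm(Tc) * np.linalg.norm(yc)))

def feature_density(c, lam):
    """Model density lam * exp(-lam * (1 - c)) for c <= 1, peaked at c = 1."""
    return lam * np.exp(-lam * (1.0 - c))
```

Note that the correlation is invariant to affine rescalings of the patch, so the feature ignores overall brightness and contrast.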

Page 6: Generative Models for Image Analysis

Tempting to PRETEND that the data is $c_T(y_1), \ldots, c_T(y_N)$:

$$L(c_T(y_1), \ldots, c_T(y_N)) = \prod_{k=1}^{N} P(C_T = c_T(y_k)) = \prod_{k=1}^{N} \lambda e^{-\lambda(1 - c_T(y_k))}$$

Caution: the feature model

$$P(C_T = c_T(y)) = \lambda e^{-\lambda(1 - c_T(y))}$$

is different from the patch model

$$P(Y = y) = \frac{1}{Z} e^{-\lambda(1 - c_T(y))}.$$

BUT the data is $y_1, \ldots, y_N$, and

$$P(Y = y) = P(C_T = c_T(y))\, P(Y = y \mid C_T = c_T(y)),$$

so

$$L(y_1, \ldots, y_N) = \prod_{k=1}^{N} \lambda e^{-\lambda(1 - c_T(y_k))}\, P(y_k \mid C_T = c_T(y_k)).$$

The first is fine for estimating $\lambda$ but not fine for estimating $T$.

Use maximum likelihood… but what is the likelihood?
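For fixed $T$, the first likelihood is a standard exponential model in $u_k = 1 - c_T(y_k)$, so the estimate of $\lambda$ has a closed form; a small sketch (the correlation values below are made up for illustration):

```python
import numpy as np

def lambda_mle(corrs):
    """MLE of lambda under P(c) = lam * exp(-lam * (1 - c)), c <= 1:
    with u_k = 1 - c_k >= 0 iid Exponential(lam), lam_hat = 1 / mean(u)."""
    u = 1.0 - np.asarray(corrs, dtype=float)
    return 1.0 / u.mean()
```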

Page 7: Generative Models for Image Analysis

III. Conditional modeling

For any category $g$ (e.g. "eye") and feature $F = f(Y)$:

$$P_g(y) = P_g(F = f(y))\, P_g(y \mid F = f(y)).$$

Easy to model $P_g(f(y))$; hard to model $P_g(y \mid F = f)$.

Principle: start with a "null" or "background" distribution $P_0(y)$ and choose $P_g(y)$:

1. consistent with $P_g(F)$, and
2. otherwise "as close as possible" to $P_0(y)$.

Page 8: Generative Models for Image Analysis

Specifically, given $g$, $F$, and a null distribution $P_0(y)$, choose

$$P_g(y) = \arg\min_{P:\, P(F) = P_g(F)} D(P \| P_0)$$

Then

$$P_g(y) = P_g(F = f(y))\, P_0(y \mid F = f(y))$$

(where $D(P \| P_0) = \int P(y) \log \frac{P(y)}{P_0(y)}\, dy$ is the K-L divergence).

Conditional modeling: a perturbation of the null distribution
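The projection formula can be checked on a small discrete example: tilting the null by the feature-marginal ratio reproduces the target feature distribution while leaving the conditional given the feature untouched (the state space, the parity feature, and the numbers are invented for illustration):

```python
import numpy as np

# Toy space: y in {0,...,7}; feature f(y) = parity of y.
ys = np.arange(8)
f = ys % 2
P0 = np.array([0.05, 0.10, 0.15, 0.20, 0.05, 0.10, 0.15, 0.20])  # null
Pg_f = np.array([0.9, 0.1])            # desired feature marginal P_g(F)

# Perturbation of the null:
#   P_g(y) = P_g(f(y)) * P0(y | F = f(y)) = (P_g(f(y)) / P0(f(y))) * P0(y)
P0_f = np.array([P0[f == v].sum() for v in (0, 1)])
Pg = Pg_f[f] / P0_f[f] * P0
```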

Page 9: Generative Models for Image Analysis

Estimation

Given $y_1, \ldots, y_N$, and $P_g(f)$:

$$L(y_1, \ldots, y_N) = \prod_{k=1}^{N} P_g(f(y_k))\, P_0(y_k \mid F = f(y_k)) = \prod_{k=1}^{N} \frac{P_g(f(y_k))}{P_0(f(y_k))}\, P_0(y_k) \;\propto\; \prod_{k=1}^{N} \frac{P_g(f(y_k))}{P_0(f(y_k))}$$

(the null terms $P_0(y_k)$ do not involve the parameters of $P_g$).

Much Easier!

Page 10: Generative Models for Image Analysis

Example: learning eye templates

Image patch: $y = \{y_s\}_{s \in S}$, $y_s$ = pixel intensity at $s \in S$.

Take $f_m(y) = c_{T_m}(y) = \mathrm{corr}(T_m, y)$, $m = 1, \ldots, M$, and model the patch as a MIXTURE:

$$P(y) = \sum_{m=1}^{M} \varepsilon_m\, \lambda_m e^{-\lambda_m(1 - c_{T_m}(y))}\, P_0(y \mid C_{T_m} = c_{T_m}(y))$$

Page 11: Generative Models for Image Analysis

Example: learning eye templates

$$L(y_1, \ldots, y_N \mid \varepsilon_1, \ldots, \varepsilon_M, \lambda_1, \ldots, \lambda_M, T_1, \ldots, T_M) = \prod_{k=1}^{N} \sum_{m=1}^{M} \varepsilon_m\, \lambda_m e^{-\lambda_m(1 - c_{T_m}(y_k))}\, P_0(y_k \mid C_{T_m} = c_{T_m}(y_k))$$

$$= \prod_{k=1}^{N} \sum_{m=1}^{M} \varepsilon_m\, \lambda_m e^{-\lambda_m(1 - c_{T_m}(y_k))}\, \frac{P_0(y_k)}{P_0(C_{T_m} = c_{T_m}(y_k))} \;\propto\; \prod_{k=1}^{N} \sum_{m=1}^{M} \varepsilon_m\, \frac{\lambda_m e^{-\lambda_m(1 - c_{T_m}(y_k))}}{P_0(C_{T_m} = c_{T_m}(y_k))}$$

Page 12: Generative Models for Image Analysis

Example: learning eye templates

Take (for now)

$$P_0(y) = \left(\frac{1}{256}\right)^{|S|} \quad \text{(iid uniform)}$$

Then, by a Central Limit Theorem:

$$P_0(C_T = c_T(y)) \approx \sqrt{\frac{|S|}{2\pi}}\; e^{-|S|\, c_T(y)^2 / 2}$$
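The Gaussian null approximation can be sanity-checked by Monte Carlo: under iid uniform intensities, the standardized correlation $c_T(y)$ has mean near 0 and standard deviation near $1/\sqrt{|S|}$ (the patch size, sample count, and seed below are arbitrary choices for the check):

```python
import numpy as np

rng = np.random.default_rng(0)
S = 100                                   # number of sites |S|
T = rng.random(S)
Tc = T - T.mean()
Tc /= np.linalg.norm(Tc)                  # standardized template

# Correlations of the template with 20000 iid-uniform null patches
y = rng.integers(0, 256, size=(20000, S)).astype(float)
yc = y - y.mean(axis=1, keepdims=True)
yc /= np.linalg.norm(yc, axis=1, keepdims=True)
c = yc @ Tc
```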

Page 13: Generative Models for Image Analysis

Example: learning eye templates

$$L(y_1, \ldots, y_N \mid \varepsilon_1, \ldots, \varepsilon_M, \lambda_1, \ldots, \lambda_M, T_1, \ldots, T_M) \;\propto\; \prod_{k=1}^{N} \sum_{m=1}^{M} \varepsilon_m\, \lambda_m e^{-\lambda_m(1 - c_{T_m}(y_k))}\, e^{|S|\, c_{T_m}(y_k)^2 / 2}$$

Maximize the data likelihood for the mixing probabilities, the feature parameters, and the templates themselves…
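The maximization over mixing probabilities and feature parameters can be sketched as an EM-style update, holding the templates fixed (the talk also optimizes the templates themselves, which this sketch omits; the data here is random and only the update structure is illustrated):

```python
import numpy as np

rng = np.random.default_rng(1)

def corrs(Ts, Y):
    """N x M matrix of correlations c_{T_m}(y_k)."""
    Tc = Ts - Ts.mean(axis=1, keepdims=True)
    Tc /= np.linalg.norm(Tc, axis=1, keepdims=True)
    Yc = Y - Y.mean(axis=1, keepdims=True)
    Yc /= np.linalg.norm(Yc, axis=1, keepdims=True)
    return Yc @ Tc.T

def em_step(C, eps, lam, S):
    """One EM step for the mixture likelihood
    prod_k sum_m eps_m lam_m exp(-lam_m (1 - c_km)) exp(S c_km^2 / 2)."""
    logw = np.log(eps) + np.log(lam) - lam * (1.0 - C) + S * C**2 / 2.0
    logw -= logw.max(axis=1, keepdims=True)             # numerical stabilization
    R = np.exp(logw)
    R /= R.sum(axis=1, keepdims=True)                   # responsibilities
    eps = R.mean(axis=0)
    lam = R.sum(axis=0) / (R * (1.0 - C)).sum(axis=0)   # weighted exponential MLE
    return eps, lam, R

# Random toy data: 40 patches of 25 pixels, M = 2 fixed templates
Y = rng.random((40, 25))
Ts = rng.random((2, 25))
C = corrs(Ts, Y)
eps, lam = np.array([0.5, 0.5]), np.array([2.0, 2.0])
for _ in range(5):
    eps, lam, R = em_step(C, eps, lam, 25)
```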

Page 14: Generative Models for Image Analysis

Example: learning (right) eye templates

Page 15: Generative Models for Image Analysis

Example: learning (right) eye templates

What if we forget all this nonsense and just maximize

$$\prod_{k=1}^{N} \sum_{m=1}^{M} \varepsilon_m\, \lambda_m e^{-\lambda_m(1 - c_{T_m}(y_k))}$$

instead of

$$\prod_{k=1}^{N} \sum_{m=1}^{M} \varepsilon_m\, \lambda_m e^{-\lambda_m(1 - c_{T_m}(y_k))}\, e^{|S|\, c_{T_m}(y_k)^2 / 2} \;?$$

Page 16: Generative Models for Image Analysis

How good are the templates? A classification experiment…

In general

$$P_g(y) = P_g(f(y))\, P_0(y \mid F = f(y))$$

or a mixture of these models:

$$P_g(y) = \sum_{m=1}^{M} \varepsilon_m\, P_g(f_m(y))\, P_0(y \mid F_m = f_m(y))$$

where $f_m(y)$ is any function (feature), such as a correlation with a SUBIMAGE. Thus $m$ can index:

* alternative models (e.g. 8 eye templates)
* transformations of scale, rotation, ... (e.g. as in work of Amit and Trouvé)

Page 17: Generative Models for Image Analysis

How good are the templates? A classification experiment…

Classify East Asian and South Asian faces
* mixing over 4 scales and 8 templates

East Asian: (L) examples of training images (M) progression of EM (R) trained templates

South Asian: (L) examples of training images (M) progression of EM (R) trained templates

Classification Rate: 97%

Page 18: Generative Models for Image Analysis

Other examples: noses
* 16 templates
* multiple scales, shifts, and rotations

(L) samples from training set (R) learned templates

Page 19: Generative Models for Image Analysis

Other examples: mixture of noses and mouths

samples from training set (1/2 noses, 1/2 mouths)

32 learned templates

Page 20: Generative Models for Image Analysis

Other examples: train on 58 faces …half with glasses…half without

32 learned templates

samples from training set

8 learned templates

Page 21: Generative Models for Image Analysis

Other examples: train on 58 faces …half with glasses…half without

8 learned templates

random eight of the 58 faces

rows 2 to 4, top to bottom: templates ordered by posterior likelihood

Page 22: Generative Models for Image Analysis

Other examples: train on random patches ("sparse representation")

500 random 15x15 training patches from random internet images

24 10x10 templates

Page 23: Generative Models for Image Analysis

Other examples: coarse representation

Use $f(y) = \mathrm{Corr}(T, D(y))$, where $D$ = downconvert.

(Go the other way for super-resolution: $f(y) = \mathrm{Corr}(D(T), y)$?)

training of 8 low-res (10x10) templates

sample from training set (down-converted images)
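One plausible reading of the downconvert operator $D$ is block averaging; a minimal sketch (the factor and averaging scheme are assumptions, not taken from the talk):

```python
import numpy as np

def downconvert(y, k):
    """Block-average an (h, w) image by a factor k (h and w divisible by k)."""
    h, w = y.shape
    return y.reshape(h // k, k, w // k, k).mean(axis=(1, 3))
```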

Page 24: Generative Models for Image Analysis

IV. Sampling and the choice of null distribution

Take a closer look at $P_0$ by (approximately) sampling from

$$P_g(y) = \sum_{m=1}^{M} \varepsilon_m\, \lambda_m e^{-\lambda_m(1 - c_{T_m}(y))}\, P_0(y \mid C_{T_m} = c_{T_m}(y))$$

* Standardize templates and patches:

$$T \to \frac{T - \bar{T}}{\|T - \bar{T}\|}, \qquad y \to \frac{y - \bar{y}}{\|y - \bar{y}\|}$$

* View $P_g$ & $P_0$ as distributions on the unit sphere in $\mathbb{R}^{|S|-1}$
* Then sample $y$ from $P_g$ by:
  1. choosing a mixing component $m \in \{1, \ldots, M\}$ according to $\varepsilon_1, \ldots, \varepsilon_M$
  2. choosing a correlation $c$ according to $\lambda_m e^{-\lambda_m(1 - c)}$
  3. choosing a sample according to $P_0$ and computing from it a patch $y$ at correlation $c$ to $T_m$
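Step 3 can be made concrete: a standardized patch at exactly correlation $c$ to the template is $y = cT + \sqrt{1 - c^2}\,u$, with $u$ a random centered unit vector orthogonal to $T$ (a sketch; the exact construction used in the talk may differ):

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_correlation(lam):
    """Draw c from the density lam * exp(-lam * (1 - c)), c <= 1 (inverse CDF)."""
    return 1.0 + np.log(rng.random()) / lam

def sample_patch(T, c):
    """Standardized patch on the unit sphere with corr(T, y) = c.
    Assumes T is mean-zero with unit norm."""
    u = rng.standard_normal(len(T))
    u -= u.mean()                    # stay in the centered subspace
    u -= (u @ T) * T                 # make orthogonal to T
    u /= np.linalg.norm(u)
    return c * T + np.sqrt(1.0 - c**2) * u
```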

Page 25: Generative Models for Image Analysis

(approximate) sampling…

[Figure: the unit sphere in $\mathbb{R}^{|S|-1}$, with template $T_m$, a correlation $c$, and sampled patches $y$.]

Page 26: Generative Models for Image Analysis

(approximate) sampling…

32 samples from the mixture model, with $P_0$ = white noise

Page 27: Generative Models for Image Analysis

(approximate) sampling…

32 samples from the mixture model, with $P_0$ = Caltech 101

Page 28: Generative Models for Image Analysis

(approximate) sampling…

32 samples from the mixture model, with $P_0$ from outdoor scenes

Page 29: Generative Models for Image Analysis

(approximate) sampling…

32 samples from the mixture model, with $P_0$ = random patches $y$ satisfying a bound on $\max_{s \in S} |y_s - \bar{y}| \,/\, \|y - \bar{y}\|$

Page 30: Generative Models for Image Analysis

V. Other applications of conditional modeling

1. $P_g(y)$ when two templates overlap

Page 31: Generative Models for Image Analysis

2. Gibbs sampling: the problem is to draw a sample $y = \{y_s\}_{s \in S}$ from some distribution $P_0(y)$.

* Given a sample at iteration $t$ from some probability $P^t$, visit a site $s \in S$
* Replace $y_s$ by a sample from $P_0(y_s \mid y_{S \setminus s})$

Then

$$P^{t+1}(y) = P^t(y_{S \setminus s})\, P_0(y_s \mid y_{S \setminus s}) = \arg\min_{P:\, P(y_{S \setminus s}) = P^t(y_{S \setminus s})} D(P \| P_0)$$
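The single-site update can be illustrated on a toy 1-D Ising chain, $P_0(y) \propto \exp(\beta \sum_s y_s y_{s+1})$ (the model and $\beta$ are illustrative choices, not from the talk); each site is replaced by a draw from its conditional given the rest:

```python
import numpy as np

rng = np.random.default_rng(3)

def gibbs_sweep(y, beta):
    """One full sweep: resample each site from the single-site conditional
    P0(y_s | y_rest) of the chain P0(y) proportional to exp(beta * sum_s y_s y_{s+1})."""
    n = len(y)
    for s in range(n):
        field = beta * ((y[s - 1] if s > 0 else 0) + (y[s + 1] if s < n - 1 else 0))
        p_plus = 1.0 / (1.0 + np.exp(-2.0 * field))   # P0(y_s = +1 | neighbors)
        y[s] = 1 if rng.random() < p_plus else -1
    return y

y = rng.choice([-1, 1], size=50)
for _ in range(20):
    y = gibbs_sweep(y, beta=0.8)
```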

Page 32: Generative Models for Image Analysis

3. Hierarchical models and the Markov Dilemma

$x_p \in \{0,1\}$: $x_p = 1 \Leftrightarrow$ 'pair of eyes'
$x_l \in \{0,1\}$: $x_l = 1 \Leftrightarrow$ 'left eye'
$x_r \in \{0,1\}$: $x_r = 1 \Leftrightarrow$ 'right eye'

Markov model

Markov property…
* Estimation
* Computation
* Representation

Given $x_p = 1$, there are probabilistic constraints on the poses and appearances of the left and right eyes.

Page 33: Generative Models for Image Analysis

Hierarchical models and the Markov Dilemma

More generally:

$P_0(x)$ = Markov distribution
$1_B(x)$ = indicator of $B$, e.g. $B$ = 'pair of eyes'
$a(x)$ = attribute (e.g. relative poses of the two eyes)
$P_1(a(x) \mid 1_B(x) = 1)$ = desired conditional distribution

Choose

$$P_1(x) = \arg\min_{P:\, P(a(x) \mid 1_B(x) = 1) = P_1(a(x) \mid 1_B(x) = 1)} D(P \| P_0),$$

then

$$P_1(x) = P_0(x)\, \frac{P_1(a(x) \mid 1_B(x) = 1)}{P_0(a(x) \mid 1_B(x) = 1)} \quad \text{on } \{x : 1_B(x) = 1\}.$$
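The perturbation can be verified on a small discrete example: reweighting $P_0$ on the event $\{1_B(x) = 1\}$ by the attribute ratio changes the conditional attribute distribution to the target while preserving the probability of the event itself (state space, event, attribute, and numbers are invented for illustration):

```python
import numpy as np

xs = np.arange(8)
B = xs >= 4                      # event 1_B(x) = 1, e.g. "pair of eyes present"
a = xs % 2                       # attribute a(x), e.g. a pose relation
P0 = np.full(8, 1.0 / 8.0)       # null distribution (uniform for the toy)
target = np.array([0.75, 0.25])  # desired P1(a(x) | 1_B(x) = 1)

# Null conditional attribute distribution on B
P0_aB = np.array([P0[B & (a == v)].sum() for v in (0, 1)]) / P0[B].sum()

# Perturbed distribution: reweight only on B by the target / null ratio
P1 = P0.copy()
P1[B] *= (target / P0_aB)[a[B]]
```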

Page 34: Generative Models for Image Analysis

[Figure: compositional hierarchy for license plates]
* license plates
* license numbers (3 digits + 3 letters, 4 digits + 2 letters)
* plate boundaries, strings (2 letters, 3 digits, 3 letters, 4 digits)
* characters, plate sides
* generic letter, generic number, L-junctions of sides
* parts of characters, parts of plate sides

Hierarchical models and the Markov dilemma

Page 35: Generative Models for Image Analysis

Original image Zoomed license region

Top object: Markov distribution

Top object: perturbed (“content-sensitive”) distribution

Hierarchical models and the Markov dilemma

Page 36: Generative Models for Image Analysis

PATTERN SYNTHESIS

= PATTERN ANALYSIS

Ulf Grenander