Generative Models for Image Understanding Nebojsa Jojic and Thomas Huang Beckman Institute and ECE Dept. University of Illinois


Page 1: Generative Models for Image Understanding

Generative Models for Image Understanding

Nebojsa Jojic and Thomas Huang

Beckman Institute and ECE Dept.

University of Illinois

Page 2: Generative Models for Image Understanding

Problem: Summarization of High Dimensional Data

• Pattern Analysis:
  – For several classes c = 1, ..., C of the data, define probability distribution functions p(x|c)

• Compression:
  – Define a probabilistic model p(x) and devise an optimal coding approach

• Video Summary:
  – Drop most of the frames in a video sequence and keep interesting information that summarizes it.

Page 3: Generative Models for Image Understanding

Generative density modeling

• Find a probability model that
  – reflects desired structure
  – randomly generates plausible images
  – represents the data by parameters

• ML estimation

• p(image|class) used for recognition, detection, ...

Page 4: Generative Models for Image Understanding

Problems we attacked

• Transformation as a discrete variable in generative models of intensity images

• Tracking articulated objects in dense stereo maps

• Unsupervised learning for video summary

• Idea: the structure of the generative model reveals the interesting objects we want to extract.

Page 5: Generative Models for Image Understanding

Mixture of Gaussians

[Graphical model: c → z]

$P(c) = \pi_c$

The probability of pixel intensities z given that the image is from cluster c is $p(z \mid c) = \mathcal{N}(z; \mu_c, \Phi_c)$

Page 6: Generative Models for Image Understanding

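Mixture of Gaussians

[Graphical model: c → z, with $P(c) = \pi_c$ and $p(z \mid c) = \mathcal{N}(z; \mu_c, \Phi_c)$]

• Parameters $\pi_c$, $\mu_c$ and $\Phi_c$ represent the data

• For input z, the cluster responsibilities are

$P(c \mid z) = \dfrac{p(z \mid c)\, P(c)}{\sum_{c'} p(z \mid c')\, P(c')}$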
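The responsibility formula above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' MATLAB code: it assumes diagonal covariances, and the names `pi`, `mu`, `phi` (mixing proportions, cluster means, per-pixel variances) and the toy numbers are choices made here.

```python
import numpy as np

def log_gaussian_diag(z, mu, phi):
    """Log N(z; mu, diag(phi)) for a flattened image z of N pixels."""
    return -0.5 * np.sum(np.log(2.0 * np.pi * phi) + (z - mu) ** 2 / phi)

def responsibilities(z, pi, mu, phi):
    """Return P(c|z) for every cluster c via Bayes' rule.

    pi: (C,) mixing proportions; mu, phi: (C, N) means and diagonal variances.
    """
    log_joint = np.array([
        np.log(pi[c]) + log_gaussian_diag(z, mu[c], phi[c])
        for c in range(len(pi))
    ])
    log_joint -= log_joint.max()   # subtract the max before exponentiating
    joint = np.exp(log_joint)
    return joint / joint.sum()     # denominator: sum over c of p(z|c) P(c)

# Toy usage: two clusters over 4-pixel "images" (made-up numbers).
pi = np.array([0.6, 0.4])
mu = np.array([[0.0, 0.0, 1.0, 1.0], [1.0, 1.0, 0.0, 0.0]])
phi = np.full((2, 4), 0.25)
z = np.array([0.1, 0.0, 0.9, 1.1])
print(responsibilities(z, pi, mu, phi))
```

Working in the log domain avoids underflow, which matters for real images where N is in the thousands.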

Page 7: Generative Models for Image Understanding

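Example: Simulation

[Figure: c = 1 is drawn from $P(c) = \pi_c$ with $\pi_1 = 0.6$, $\pi_2 = 0.4$; an image z is then drawn from $p(z \mid c) = \mathcal{N}(z; \mu_c, \Phi_c)$]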

Page 8: Generative Models for Image Understanding

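Example: Simulation

[Figure: c = 2 is drawn from $P(c) = \pi_c$ with $\pi_1 = 0.6$, $\pi_2 = 0.4$; an image z is then drawn from $p(z \mid c) = \mathcal{N}(z; \mu_c, \Phi_c)$]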
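The two simulation slides describe ancestral sampling: pick a cluster, then draw an image from that cluster's Gaussian. A minimal sketch follows, assuming diagonal covariances; the 4-pixel means and variances are hypothetical, only the mixing proportions 0.6 and 0.4 come from the slides, and clusters are indexed from 0 in the code.

```python
import numpy as np

def sample_mixture(pi, mu, phi, rng):
    """Draw (c, z): first c ~ P(c), then z ~ N(mu_c, diag(phi_c))."""
    c = rng.choice(len(pi), p=pi)
    z = mu[c] + np.sqrt(phi[c]) * rng.standard_normal(mu.shape[1])
    return c, z

rng = np.random.default_rng(0)
pi = np.array([0.6, 0.4])             # mixing proportions from the slides
mu = np.array([[0.0, 0.0, 1.0, 1.0],  # hypothetical 4-pixel cluster means
               [1.0, 1.0, 0.0, 0.0]])
phi = np.full((2, 4), 0.01)           # hypothetical per-pixel variances
c, z = sample_mixture(pi, mu, phi, rng)
```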

Page 9: Generative Models for Image Understanding

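Example: Learning - E step

[Figure: an image z from the data set and the two clusters c = 1 and c = 2; the responsibilities P(c|z) are shown as 0.52 and 0.48, with $\pi_1 = 0.5$, $\pi_2 = 0.5$]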

Page 10: Generative Models for Image Understanding

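Example: Learning - E step

[Figure: another image z from the data set and the two clusters c = 1 and c = 2; the responsibilities P(c|z) are shown as 0.48 and 0.52, with $\pi_1 = 0.5$, $\pi_2 = 0.5$]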

Page 11: Generative Models for Image Understanding

Example: Learning - M step

[Graphical model: c → z, with $\pi_1 = 0.5$, $\pi_2 = 0.5$]

Set $\mu_1$ to the average of $z\,P(c=1 \mid z)$

Set $\mu_2$ to the average of $z\,P(c=2 \mid z)$

Page 12: Generative Models for Image Understanding

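Example: Learning - M step

[Graphical model: c → z, with $\pi_1 = 0.5$, $\pi_2 = 0.5$]

Set $\Phi_1$ to the average of $\mathrm{diag}((z-\mu_1)^T (z-\mu_1))\,P(c=1 \mid z)$

Set $\Phi_2$ to the average of $\mathrm{diag}((z-\mu_2)^T (z-\mu_2))\,P(c=2 \mid z)$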
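The M-step updates on the last two slides can be sketched as follows. The sketch reads "average" as a responsibility-weighted average, that is, normalized by the summed responsibilities of each cluster; this is the standard EM update and an interpretation rather than something the slides spell out. The mixing-proportion update is included for completeness, and the names `Z`, `R`, `pi`, `mu`, `phi` are chosen here.

```python
import numpy as np

def m_step(Z, R):
    """Re-estimate mixture parameters from data Z (T, N) and responsibilities R (T, C).

    R[t, c] holds P(c | z_t) from the E step.
    """
    weight = R.sum(axis=0)                       # (C,) effective number of images per cluster
    pi = weight / Z.shape[0]                     # mixing proportions: average of P(c|z)
    mu = (R.T @ Z) / weight[:, None]             # means: weighted average of z
    sq = (Z[:, None, :] - mu[None, :, :]) ** 2   # (T, C, N) squared deviations from each mean
    phi = np.einsum("tc,tcn->cn", R, sq) / weight[:, None]  # diagonal variances
    return pi, mu, phi
```

Alternating this function with an E step that recomputes `R` gives the full EM loop for the mixture.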

Page 13: Generative Models for Image Understanding

Transformation as a Discrete Latent Variable

with Brendan J. Frey

Computer Science, University of Waterloo, Canada

Beckman Institute & ECE, Univ of Illinois at Urbana

Page 14: Generative Models for Image Understanding

Kind of data we’re interested in

Even after tracking, the features still have unknown positions, rotations, scales, levels of shearing, ...

Page 15: Generative Models for Image Understanding

One approach

[Diagram labels: Images, Normalization, Normalized images, Pattern Analysis, Labor]

Page 16: Generative Models for Image Understanding

Our approach

[Diagram labels: Images, Joint Normalization and Pattern Analysis]

Page 17: Generative Models for Image Understanding

What transforming an image does in the vector space of pixel intensities

• A continuous transformation moves an image along a continuous curve

• Our subspace model should assign images near this nonlinear manifold to the same point in the subspace

Page 18: Generative Models for Image Understanding

Tractable approaches to modeling the transformation manifold

• Linear approximation - good locally

• Discrete approximation - good globally

Page 19: Generative Models for Image Understanding

Adding “transformation” as a discrete latent variable

• Say there are N pixels

• We assume we are given a set of sparse N x N transformation generating matrices $G_1, \ldots, G_l, \ldots, G_L$

• These generate points $G_l z$ from point $z$
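One way to picture such matrices is as pixel permutations for integer shifts. The sketch below builds them densely for readability; the wrap-around at the image border and the helper name `shift_matrix` are assumptions of this example, not something the slides specify.

```python
import numpy as np

def shift_matrix(height, width, dy, dx):
    """N x N permutation matrix G with G @ z.ravel() == np.roll(z, (dy, dx), axis=(0, 1)).ravel().

    Shifts wrap around at the image border (an assumption of this sketch).
    """
    n = height * width
    index = np.arange(n).reshape(height, width)
    source = np.roll(index, (dy, dx), axis=(0, 1)).ravel()  # source pixel for each output pixel
    G = np.zeros((n, n))
    G[np.arange(n), source] = 1.0                           # exactly one nonzero per row
    return G

# L = 3 transformations of a 3 x 3 image: shift left, identity, shift right.
G = [shift_matrix(3, 3, 0, dx) for dx in (-1, 0, 1)]
```

Each matrix has a single nonzero entry per row, which is why a sparse representation keeps the cost of applying it linear in N.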

Page 20: Generative Models for Image Understanding

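Transformed Mixture of Gaussians

[Graphical model: c → z → x ← l]

$P(c) = \pi_c$

$p(z \mid c) = \mathcal{N}(z; \mu_c, \Phi_c)$

$P(l) = \rho_l$

$p(x \mid z, l) = \mathcal{N}(x; G_l z, \Psi)$

• $\rho_l$, $\pi_c$, $\mu_c$ and $\Phi_c$ represent the data

• The cluster/transformation responsibilities, P(c,l|x), are quite easy to compute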
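To illustrate why the joint responsibilities are easy to compute: integrating out z leaves a Gaussian in x for every pair (c, l), so P(c,l|x) is a normalized table of C × L Gaussian likelihoods. The sketch below assumes diagonal cluster and noise variances and dense transformation matrices; the names `pi`, `rho`, `mu`, `phi`, `psi`, `G` are chosen here, and a practical implementation would exploit sparsity instead of forming full covariances.

```python
import numpy as np

def log_gaussian_full(x, mean, cov):
    """Log N(x; mean, cov) for a full covariance matrix."""
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    quad = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (len(x) * np.log(2.0 * np.pi) + logdet + quad)

def joint_responsibilities(x, pi, rho, mu, phi, psi, G):
    """Return P(c, l | x) as a (C, L) array.

    With z integrated out, p(x|c,l) = N(x; G_l mu_c, G_l diag(phi_c) G_l^T + diag(psi)).
    """
    C, L = len(pi), len(rho)
    log_joint = np.empty((C, L))
    for c in range(C):
        for l in range(L):
            mean = G[l] @ mu[c]
            cov = G[l] @ np.diag(phi[c]) @ G[l].T + np.diag(psi)
            log_joint[c, l] = np.log(pi[c]) + np.log(rho[l]) + log_gaussian_full(x, mean, cov)
    joint = np.exp(log_joint - log_joint.max())
    return joint / joint.sum()

# Toy usage: 4-pixel signals with three cyclic shifts (-1, 0, +1).
G = [np.roll(np.eye(4), s, axis=0) for s in (-1, 0, 1)]
mu = np.array([[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]])
phi = np.full((2, 4), 0.01)
psi = np.full(4, 0.01)
x = np.array([0.0, 1.0, 0.0, 0.0])   # first cluster mean shifted by +1
P = joint_responsibilities(x, np.array([0.5, 0.5]), np.full(3, 1.0 / 3.0), mu, phi, psi, G)
```

Summing `P` over its columns gives P(c|x), and summing over its rows gives P(l|x).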

Page 21: Generative Models for Image Understanding

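Example: Simulation

$G_1$ = shift left and up, $G_2 = I$, $G_3$ = shift right and up

[Figure: with c = 1 and l = 1, a latent image z is drawn and then transformed to give the observed image x]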

Page 22: Generative Models for Image Understanding

ML estimation of a Transformed Mixture of Gaussians using EM

[Graphical model: c → z → x ← l]

• E step: Compute P(l|x), P(c|x) and p(z|c,x) for each x in data

• M step: Set
  – $\pi_c$ = avg of P(c|x)
  – $\rho_l$ = avg of P(l|x)
  – $\mu_c$ = avg mean of p(z|c,x)
  – $\Phi_c$ = avg variance of p(z|c,x)
  – $\Psi$ = avg var of $p(x - G_l z \mid x)$

Page 23: Generative Models for Image Understanding

Face Clustering

Examples of 400 outdoor images of 2 people (44 x 28 pixels)

Page 24: Generative Models for Image Understanding

Mixture of Gaussians

15 iterations of EM (MATLAB takes 1 minute)

[Figure: cluster means for c = 1, c = 2, c = 3, c = 4]

Page 25: Generative Models for Image Understanding

Transformed mixture of Gaussians

30 iterations of EM

[Figure: cluster means for c = 1, c = 2, c = 3, c = 4]

Page 26: Generative Models for Image Understanding

Video Analysis Using Generative Models

with Brendan Frey, Nemanja Petrovic and Thomas Huang

Page 27: Generative Models for Image Understanding

Idea

• Use generative models of video sequences to do unsupervised learning

• Use the resulting model for video summarization, filtering, stabilization, recognition of objects, retrieval, etc.

Page 28: Generative Models for Image Understanding

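Transformed Hidden Markov Model

[Graphical model: two time slices, t-1 and t, each with c → z → x ← l; the pair (c, l) at time t depends on the past through P(c,l|past)]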

Page 29: Generative Models for Image Understanding

THMM Transition Models

• Independent probability distributions for class and transformations; relative motion:

  $P(c_t, l_t \mid \text{past}) = P(c_t \mid c_{t-1})\, P(d(l_t, l_{t-1}))$

• Relative motion dependent on the class:

  $P(c_t, l_t \mid \text{past}) = P(c_t \mid c_{t-1})\, P(d(l_t, l_{t-1}) \mid c_t)$

• Autoregressive model for transformation distribution
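The first, factorized transition model can be written as a single matrix over joint states (c, l). The sketch below assumes, purely for illustration, that transformations are L cyclic shifts and that the relative motion d(l_t, l_{t-1}) is the difference of shift indices modulo L; the slides do not define d, and the names `A`, `p_d`, `thmm_transition` are chosen here.

```python
import numpy as np

def thmm_transition(A, p_d):
    """Transition matrix over joint states s = (c, l), flattened as s = c * L + l.

    A:   (C, C) class transitions, A[i, j] = P(c_t = j | c_{t-1} = i).
    p_d: (L,)   distribution over relative motion d = (l_t - l_{t-1}) mod L.
    Implements P(c_t, l_t | past) = P(c_t | c_{t-1}) P(d(l_t, l_{t-1})).
    """
    L = len(p_d)
    prev, cur = np.meshgrid(np.arange(L), np.arange(L), indexing="ij")
    D = p_d[(cur - prev) % L]   # D[l_prev, l_cur] = P(d(l_cur, l_prev))
    return np.kron(A, D)        # entry [(i, l_prev), (j, l_cur)] = A[i, j] * D[l_prev, l_cur]

A = np.array([[0.9, 0.1], [0.2, 0.8]])   # hypothetical class transition probabilities
p_d = np.array([0.7, 0.2, 0.1])          # hypothetical relative-motion distribution
T = thmm_transition(A, p_d)
```

The class-dependent variant on the slide would use one motion table per class instead of a single `p_d`, so the Kronecker structure no longer applies directly.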

Page 30: Generative Models for Image Understanding

Inference in THMM

• Tasks:
  – Find the most likely state at time t given the whole observed sequence {x_t} and the model parameters (class means and variances, transition probabilities, etc.)
  – Find the distribution over states for each time t
  – Find the most likely state sequence
  – Learn the parameters that maximize the likelihood of the observed data
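For the "most likely state sequence" task, the standard tool for any HMM is the Viterbi algorithm. The sketch below is generic HMM machinery, not the authors' implementation: it treats each joint (class, transformation) pair as one state and assumes the per-frame log-likelihoods have already been computed.

```python
import numpy as np

def viterbi(log_init, log_trans, log_obs):
    """Most likely state sequence of an HMM.

    log_init:  (S,)   log P(s_1)
    log_trans: (S, S) log P(s_t = j | s_{t-1} = i)
    log_obs:   (T, S) log p(x_t | s_t), one row per frame
    """
    T, S = log_obs.shape
    score = log_init + log_obs[0]
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + log_trans   # cand[i, j]: best path ending in i, then i -> j
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + log_obs[t]
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):           # follow back-pointers from the last frame
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```

Replacing the max with a sum (the forward-backward recursion) gives the distribution over states at each time t, the second task in the list.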

Page 31: Generative Models for Image Understanding

Video Summary and Filtering

[Graphical model: c → z → x ← l]

$p(z \mid c) = \mathcal{N}(z; \mu_c, \Phi_c)$

$p(x \mid z, l) = \mathcal{N}(x; G_l z, \Psi)$

Applications:
• Video summary
• Image segmentation
• Removal of sensor noise
• Image Stabilization

Page 32: Generative Models for Image Understanding

Example: Learning

DATA
• Hand-held camera
• Moving subject
• Cluttered background

[Figure: learned models with 1 class and 121 translations (11 vertical and 11 horizontal shifts), and with 5 classes]

Page 33: Generative Models for Image Understanding

Examples

• Normalized sequence

• Simulated sequence

• De-noising

• Seeing through distractions

Page 34: Generative Models for Image Understanding

Future work

• Fast approximate learning and inference

• Multiple layers

• Learning transformations from images

Nebojsa Jojic: www.ifp.uiuc.edu/~jojic

Page 35: Generative Models for Image Understanding

Subspace models of images

Example: Image $\in \mathbb{R}^{1200}$ $= f(y \in \mathbb{R}^{2})$

[Figure: the two subspace directions are labeled Frown and Shut eyes]

Page 36: Generative Models for Image Understanding

Factor analysis (generative PCA)

[Graphical model: y → z]

$p(y) = \mathcal{N}(y; 0, I)$

The density of pixel intensities z given subspace point y is $p(z \mid y) = \mathcal{N}(z; \mu + \Lambda y, \Phi)$

Manifold: $f(y) = \mu + \Lambda y$, linear

Page 37: Generative Models for Image Understanding

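Factor analysis (generative PCA)

[Graphical model: y → z, with $p(y) = \mathcal{N}(y; 0, I)$ and $p(z \mid y) = \mathcal{N}(z; \mu + \Lambda y, \Phi)$]

• Parameters $\mu$, $\Lambda$ represent the manifold

• Observing z induces a Gaussian p(y|z):

$\mathrm{COV}[y \mid z] = (\Lambda^T \Phi^{-1} \Lambda + I)^{-1}$

$E[y \mid z] = \mathrm{COV}[y \mid z]\, \Lambda^T \Phi^{-1} (z - \mu)$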
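The Gaussian posterior over the subspace point follows from standard Gaussian conditioning. A small sketch, assuming a diagonal noise covariance stored as a vector and writing the mean, loading matrix and noise variances as `mu`, `Lam` and `phi` (names chosen here):

```python
import numpy as np

def fa_posterior(z, mu, Lam, phi):
    """Posterior p(y|z) for factor analysis with diagonal noise variances phi.

    Returns (mean, cov) with
      cov  = (Lam^T Phi^-1 Lam + I)^-1
      mean = cov Lam^T Phi^-1 (z - mu)
    """
    K = Lam.shape[1]
    weighted = Lam.T / phi                       # Lam^T Phi^-1, shape (K, N)
    cov = np.linalg.inv(weighted @ Lam + np.eye(K))
    mean = cov @ weighted @ (z - mu)
    return mean, cov
```

Only a K × K matrix is inverted, so the cost stays small when the subspace dimension K is much lower than the number of pixels N.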

Page 38: Generative Models for Image Understanding

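Example: Simulation

[Figure: a subspace point y, with coordinates labeled Frn (frown) and SE (shut eyes), is drawn from $p(y) = \mathcal{N}(y; 0, I)$; an image z is then drawn from $p(z \mid y) = \mathcal{N}(z; \mu + \Lambda y, \Phi)$]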

Page 39: Generative Models for Image Understanding

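Example: Simulation

[Figure: a subspace point y, with coordinates labeled Frn (frown) and SE (shut eyes), is drawn from $p(y) = \mathcal{N}(y; 0, I)$; an image z is then drawn from $p(z \mid y) = \mathcal{N}(z; \mu + \Lambda y, \Phi)$]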

Page 40: Generative Models for Image Understanding

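Example: Simulation

[Figure: a subspace point y, with coordinates labeled Frn (frown) and SE (shut eyes), is drawn from $p(y) = \mathcal{N}(y; 0, I)$; an image z is then drawn from $p(z \mid y) = \mathcal{N}(z; \mu + \Lambda y, \Phi)$]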

Page 41: Generative Models for Image Understanding

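Transformed Component Analysis

[Graphical model: y → z → x ← l]

$p(y) = \mathcal{N}(y; 0, I)$

$p(z \mid y) = \mathcal{N}(z; \mu + \Lambda y, \Phi)$

$P(l) = \rho_l$

The probability of observed image x is $p(x \mid z, l) = \mathcal{N}(x; G_l z, \Psi)$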
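Read as a recipe, the TCA model generates an image in four draws: a subspace point, a latent image, a transformation index, and the observed image. A minimal sampling sketch, assuming diagonal noise variances and dense transformation matrices; the argument names `mu`, `Lam`, `phi`, `rho`, `psi`, `G` are choices made for this example.

```python
import numpy as np

def sample_tca(mu, Lam, phi, rho, psi, G, rng):
    """Ancestral sampling from TCA: y -> z, then l and z -> x."""
    n, k = Lam.shape
    y = rng.standard_normal(k)                                  # y ~ N(0, I)
    z = mu + Lam @ y + np.sqrt(phi) * rng.standard_normal(n)    # z ~ N(mu + Lam y, diag(phi))
    l = rng.choice(len(rho), p=rho)                             # l ~ P(l)
    x = G[l] @ z + np.sqrt(psi) * rng.standard_normal(n)        # x ~ N(G_l z, diag(psi))
    return y, z, l, x
```

Inference runs this recipe backwards: given x, it scores every transformation l and infers the matching z and y.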

Page 42: Generative Models for Image Understanding

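Example: Simulation

$G_1$ = shift left & up, $G_2 = I$, $G_3$ = shift right & up

[Figure: a subspace point y (coordinates Frn, SE) generates a latent image z, which is transformed with l = 3 to give the observed image x]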

Page 43: Generative Models for Image Understanding

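Example: Inference

$G_1$ = shift left & up, $G_2 = I$, $G_3$ = shift right & up

[Figure: for the same observed image x, the inferred z and y (coordinates Frn, SE) under each hypothesis l = 3, l = 2 and l = 1, along with the posteriors P(l=1|x), P(l=2|x) and P(l=3|x); two of the three hypotheses are labeled "Garbage"]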

Page 44: Generative Models for Image Understanding

EM algorithm for TCA

• Initialize $\mu$, $\Lambda$, $\Phi$, $\rho$, $\Psi$ to random values

• E Step
  – For each training case $x^{(t)}$, infer $q^{(t)}(l,z,y) = p(l,z,y \mid x^{(t)})$

• M Step
  – Compute $\mu^{new}$, $\Lambda^{new}$, $\Phi^{new}$, $\rho^{new}$, $\Psi^{new}$ to maximize

    $\sum_t E[\log p(y)\, p(z \mid y)\, P(l)\, p(x^{(t)} \mid z, l)]$,

    where E[] is wrt $q^{(t)}(l,z,y)$

• Each iteration increases log p(Data)

Page 45: Generative Models for Image Understanding

A tough toy problem

• 144, 9 x 9 images
• 1 shape (pyramid)
• 3-D lighting
• cluttered background
• 25 possible locations

Page 46: Generative Models for Image Understanding

1st 8 principal components: [figure]

TCA:

• 3 components
• 81 transformations
  – 9 horiz shifts
  – 9 vert shifts
• 10 iters of EM
• Model generates realistic examples

[Figure: learned TCA components $\Lambda_{:1}$, $\Lambda_{:2}$, $\Lambda_{:3}$]

Page 47: Generative Models for Image Understanding

Expression modeling

• 100 16 x 24 training images

• variation in expression

• imperfect alignment

Page 48: Generative Models for Image Understanding

PCA: Mean + 1st 10 principal components

Factor Analysis: Mean + 10 factors after 70 iterations of EM

TCA: Mean + 10 factors after 70 iterations of EM

Page 49: Generative Models for Image Understanding

[Figure panels: Fantasies from FA model; Fantasies from TCA model]

Page 50: Generative Models for Image Understanding

Modeling handwritten digits

• 200 8 x 8 images of each digit

• preprocessing normalizes vert/horiz translation and scale

• different writing angles (shearing) - see “7”

Page 51: Generative Models for Image Understanding

TCA:
  – 29 shearing + translation combinations
  – 10 components per digit
  – 30 iterations of EM per digit

[Figure panels: Mean of each digit; Transformed means]

Page 52: Generative Models for Image Understanding

FA: Mean + 10 components per digit

TCA: Mean + 10 components per digit

Page 53: Generative Models for Image Understanding

Classification Performance

• Training: 200 cases/digit, 20 components, 50 EM iters

• Testing: 1000 cases, p(x|class) used for classification

• Results:

  Method                               Error rate
  k-nearest neighbors (optimized k)    7.6%
  Factor analysis                      3.2%
  Transformed component analysis       2.7%

• Bonus: P(l|x) infers the writing angle!
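Classifying with p(x|class) means fitting one density model per digit and picking the class whose model gives the test image the highest likelihood. The sketch below shows only that decision rule; it assumes equal class priors and uses a diagonal Gaussian as a stand-in for the FA and TCA densities, with the helper names `classify` and `make_diag_gaussian` invented for this example.

```python
import numpy as np

def classify(x, class_log_likelihoods):
    """Return the index of the class with the highest log p(x|class).

    class_log_likelihoods: list of functions, one per class, each mapping x to log p(x|class).
    Equal class priors are assumed, so the likelihood argmax is also the MAP class.
    """
    scores = np.array([f(x) for f in class_log_likelihoods])
    return int(scores.argmax())

def make_diag_gaussian(mean, var):
    """Stand-in class model: log N(x; mean, diag(var))."""
    return lambda x: -0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)

# Two hypothetical 2-pixel "digit" models.
models = [make_diag_gaussian(np.zeros(2), np.ones(2)),
          make_diag_gaussian(np.full(2, 3.0), np.ones(2))]
label = classify(np.array([2.5, 2.9]), models)
```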

Page 54: Generative Models for Image Understanding

Wrap-up

• Papers, MATLAB scripts:
  www.ifp.uiuc.edu/~jojic
  www.cs.uwaterloo.ca/~frey

• Other domains: audio, bioinformatics, …

• Other latent image models, p(z)
  – mixtures of factor analyzers (NIPS99)
  – layers, multiple objects, occlusions
  – time series (in preparation)

Page 55: Generative Models for Image Understanding

Wrap-up

• Discrete+Linear Combination: Set some components equal to derivatives of $\mu$ wrt transformations

• Multiresolution approach

• Fast variational methods, belief propagation,...

Page 56: Generative Models for Image Understanding

Other generative models

• Modeling human appearance in stereo images: articulated, self-occluding Gaussians