Learning deformable models

Yali Amit, University of Chicago

Alain Trouvé, CMLA Cachan

Why modeling?

Generative models for object appearance allow us to move from learned objects to online decisions on object configurations.

Probability models can be composed.

Parameters can be estimated online.

Generative models allow us to learn sequentially and still be able to discriminate between objects.

Sequential learning of new objects.

Sequential learning of sub-classes.

Proper modeling and accounting of invariances allows us to learn with small samples.

Large background samples not necessary.

Modeling object appearance

Object classes are recognized in data modulo strong variations – geometric and photometric.

Variations are modeled as group action on data.

Data is noisy and sampled discretely.

Model object appearance through group actions on a template which then undergoes some degradation to become observed data.

As vectors, the observed images are very far apart. Modulo translation, rotation and contrast they are identical except for the noise.

Lower dimensional parameterization.
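A minimal sketch of the point above, restricted to translation only (rotation and contrast are left out for brevity). The function name `shift_invariant_distance`, the toy square template and the noise level are illustrative assumptions, not taken from the slides. It compares the plain Euclidean distance with the distance minimized over all cyclic shifts.

```python
import numpy as np

def shift_invariant_distance(a, b):
    """Smallest Euclidean distance between a and any cyclic translation of b."""
    best = np.inf
    for dy in range(b.shape[0]):
        for dx in range(b.shape[1]):
            shifted = np.roll(b, (dy, dx), axis=(0, 1))
            best = min(best, float(np.linalg.norm(a - shifted)))
    return best

rng = np.random.default_rng(0)

# Toy template (assumption): a small bright square on a dark background.
template = np.zeros((16, 16))
template[4:8, 4:8] = 1.0

# Observation: the template translated, then degraded by additive noise.
observed = np.roll(template, (5, 6), axis=(0, 1)) + 0.05 * rng.standard_normal((16, 16))

direct = float(np.linalg.norm(template - observed))    # distance as raw vectors
modulo = shift_invariant_distance(template, observed)  # distance modulo translation

print(f"direct distance:           {direct:.2f}")
print(f"distance modulo translation: {modulo:.2f}")
```

The direct distance is dominated by the translation, while the distance modulo translation is at the level of the noise.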

This structure could not be discovered through direct measurements on the data. (Dictionary world or manifold world)

Mathematical formulation

Template estimation

Unobserved deformations
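One possible way to estimate a template when the deformation of each sample is unobserved is an EM iteration that treats the deformation as a hidden variable. The sketch below is an illustration under strong assumptions that are not from the slides: 1-D signals, deformations limited to cyclic shifts with a uniform prior, Gaussian noise with a known standard deviation `sigma`, and initialization from the first sample. The name `em_template` is hypothetical.

```python
import numpy as np

def em_template(samples, sigma, n_iter=20):
    """EM for x_i = shift(mu, s_i) + noise, with the shift s_i unobserved."""
    n, d = samples.shape
    mu = samples[0].copy()  # assumption: initialize from one sample
    for _ in range(n_iter):
        # E-step: posterior over every cyclic shift for every sample.
        shifted_mu = np.stack([np.roll(mu, s) for s in range(d)])            # (d, d)
        sq = ((samples[:, None, :] - shifted_mu[None, :, :]) ** 2).sum(axis=2)  # (n, d)
        log_w = -sq / (2.0 * sigma ** 2)
        log_w -= log_w.max(axis=1, keepdims=True)  # stabilize before exponentiating
        w = np.exp(log_w)
        w /= w.sum(axis=1, keepdims=True)
        # M-step: average the samples after undoing each shift, weighted by posterior.
        mu = np.zeros(d)
        for s in range(d):
            mu += w[:, s] @ np.roll(samples, -s, axis=1)
        mu /= n
    return mu

rng = np.random.default_rng(1)
d, n, sigma = 32, 40, 0.1
truth = np.exp(-0.5 * ((np.arange(d) - 10) / 2.0) ** 2)  # toy template: a smooth bump
shifts = rng.integers(0, d, size=n)
samples = np.stack([np.roll(truth, s) for s in shifts]) + sigma * rng.standard_normal((n, d))

estimate = em_template(samples, sigma)
naive_mean = samples.mean(axis=0)  # ignores the deformation, so the bump is smeared out
```

The estimate can only be recovered up to a global shift, since the model cannot distinguish a shifted template from shifted deformations. The naive mean, by contrast, averages the bump over all positions.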

Example: handwritten digits

No modeling of contrast → contrast sensitive.

One way to avoid modeling a certain variability is to `mod' it out - Binary oriented edges. (Can't add binary images... )
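A simplified sketch of how binary oriented edges remove contrast from the data. This is a stand-in, not the edge definition used in the talk: it quantizes the gradient into four orientations and keeps only pixels whose gradient magnitude exceeds a quantile of the image's own magnitudes. The function name `oriented_edges` and the `quantile` parameter are assumptions made for illustration.

```python
import numpy as np

def oriented_edges(img, quantile=0.8):
    """Return a (4, H, W) boolean array: one binary edge map per orientation."""
    img = np.asarray(img, dtype=float)
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]   # central difference along columns
    gy[1:-1, :] = img[2:, :] - img[:-2, :]   # central difference along rows
    mag = np.abs(gx) + np.abs(gy)
    # Relative threshold: depends only on the ranking of magnitudes in this image.
    strong = mag > np.quantile(mag, quantile)
    horizontal = np.abs(gx) >= np.abs(gy)
    return np.stack([
        strong & horizontal & (gx > 0),    # intensity increasing left to right
        strong & horizontal & (gx < 0),    # intensity decreasing left to right
        strong & ~horizontal & (gy > 0),   # intensity increasing top to bottom
        strong & ~horizontal & (gy < 0),   # intensity decreasing top to bottom
    ])

# Toy image (assumption): a smooth bright blob with integer gray levels.
yy, xx = np.mgrid[0:16, 0:16]
img = np.round(100.0 * np.exp(-((yy - 8) ** 2 + (xx - 8) ** 2) / 20.0))

edges = oriented_edges(img)
edges_rescaled = oriented_edges(2.0 * img + 5.0)  # same scene, different contrast

print("gray levels equal:", np.array_equal(img, 2.0 * img + 5.0))
print("edge maps equal:  ", np.array_equal(edges, edges_rescaled))
```

The gray-level images differ under the contrast change, but the binary edge maps coincide, so contrast never needs to be modeled. The price is the one noted above: binary maps cannot be averaged the way gray-level images can.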

[Figure panels: oriented edge data; original image]

Transforming to oriented edges

Deforming the data

Simplest background model

Mixtures

Mixture models for the `micro-world'

Modulo deformations

Structured library of parts

A mixture of models for local image windows – parts - is used to recode the image data at much lower spatial resolution with little loss of information.

A mixture of deformable models (rotations) imposes a geometric structure on this code – tells us which parts are similar.

Part based representation

Because parts are structured, not much information is lost with lower resolution. Much invariance is gained.

Now estimate Bernoulli mixture models for object class with coarse part based representation. Or estimate hierarchy of mixture models.
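A Bernoulli mixture over binary features can be fitted with a standard EM iteration. The sketch below is generic and not specific to the part-based code described above: the function name `bernoulli_mixture_em`, the random initialization, the clipping constant `eps` and the synthetic two-class data are all assumptions made for illustration.

```python
import numpy as np

def bernoulli_mixture_em(X, k, n_iter=50, seed=0, eps=1e-3):
    """Fit a k-component Bernoulli mixture to binary data X of shape (n, d)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    weights = np.full(k, 1.0 / k)
    probs = rng.uniform(0.25, 0.75, size=(k, d))  # random start breaks symmetry
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each sample.
        log_lik = X @ np.log(probs).T + (1.0 - X) @ np.log(1.0 - probs).T  # (n, k)
        log_post = log_lik + np.log(weights)
        log_post -= log_post.max(axis=1, keepdims=True)
        resp = np.exp(log_post)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate mixing weights and per-feature probabilities.
        nk = resp.sum(axis=0) + 1e-12           # guard against an empty component
        weights = nk / nk.sum()
        probs = np.clip((resp.T @ X) / nk[:, None], eps, 1.0 - eps)
    return weights, probs, resp

# Synthetic data (assumption): two classes with complementary feature probabilities.
rng = np.random.default_rng(2)
proto = np.array([[0.9] * 10 + [0.1] * 10,
                  [0.1] * 10 + [0.9] * 10])
labels = np.repeat([0, 1], 100)
X = (rng.uniform(size=(200, 20)) < proto[labels]).astype(float)

weights, probs, resp = bernoulli_mixture_em(X, k=2)
pred = resp.argmax(axis=1)
# Component labels are arbitrary, so score agreement up to a swap.
agreement = max(np.mean(pred == labels), np.mean(pred != labels))
```

Clipping the probabilities away from 0 and 1 keeps the log-likelihood finite when a feature is always on or always off within a component.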

Simple non-linear deformations

Patchwork model: gray levels

Training a POP model

Simple approximation: train each window separately with full E-step in the EM algorithm.

Assume homogeneous background model outside window.

Works for binary features, not so well for gray level models.

For gray level data: use current estimates for all other windows, at optimal instantiation for each training sample as a background – iterative optimization of the full likelihood.

Training a POP model continued

Mixture models based on parts on coarse grid → POP models for each component based on oriented edge data → For each component of the POP model, compute the mean image modulo shift → Produce gray level POP model from image means.

Conclusion

Importance of modeling variability as a hidden random variable.

Estimation of templates and mixtures through EM type algorithms.

Local world – parts, dictionaries with symmetries.

Global objects – non-linear deformations.

Instead of modeling variability – max over simple subsets of deformations applied to object parts. (Needs formalization.)
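One way to read "max over simple subsets of deformations applied to object parts" is sketched below. It is an illustration under assumptions, not the formalization the slide asks for: the template is cut into square windows, each window may move independently by a few pixels, and the score is the sum over windows of the best negative squared error. The name `patchwork_score` and the parameters `window` and `max_shift` are hypothetical, and image and template are assumed to have the same shape.

```python
import numpy as np

def patchwork_score(image, template, window=4, max_shift=1):
    """Sum over template windows of the best match within +/- max_shift pixels."""
    h, w = template.shape
    padded = np.pad(image, max_shift, mode="edge")  # lets border windows shift too
    total = 0.0
    for top in range(0, h, window):
        for left in range(0, w, window):
            part = template[top:top + window, left:left + window]
            best = -np.inf
            for dy in range(-max_shift, max_shift + 1):
                for dx in range(-max_shift, max_shift + 1):
                    r = top + dy + max_shift
                    c = left + dx + max_shift
                    patch = padded[r:r + part.shape[0], c:c + part.shape[1]]
                    best = max(best, -float(((patch - part) ** 2).sum()))
            total += best  # each part picks its own best local shift
    return total

rng = np.random.default_rng(3)
template = rng.uniform(size=(16, 16))
image = np.roll(template, 1, axis=1)  # toy deformation: shift by one pixel

rigid = patchwork_score(image, template, max_shift=0)     # parts may not move
flexible = patchwork_score(image, template, max_shift=1)  # parts may move by one pixel

print(f"rigid score:    {rigid:.2f}")
print(f"flexible score: {flexible:.2f}")
```

Because the zero shift is always among the candidates, the flexible score can never be lower than the rigid one. Taking a max per part replaces an explicit probability model over deformations.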

For object recognition there is rich structure in the subject matter beyond linear operations in function spaces. Distances should not be measured directly in observation space. The `manifold' is defined through the group action.

A wide range of open questions, both theoretical and applied, are waiting to be studied.
