LOCUS (Learning Object Classes with Unsupervised Segmentation)

A variational approach to learning model-based segmentation.

John Winn Microsoft Research Cambridge

with Nebojsa Jojic, MSR Redmond

7th July 2006

Overview

Learning object models

The LOCUS model

Experiments & results

Extensions to LOCUS

Goal

Long Term Goal

Recognise ~10,000 object classes.

Learning from ‘buckets’ of images

Horse model

Learning algorithm

•Object Segmentation

•Object Recognition

•Object Detection

Object segmentation

+ Horse model

LOCUS

Related work

Constellation models

Weakly supervised

Probabilistic framework

Sparse

No segmentation

Object class recognition by unsupervised scale-invariant learning. R. Fergus, P. Perona, and A. Zisserman. CVPR 2003

A Bayesian approach to unsupervised One-Shot learning of Object categories. L. Fei-Fei, R. Fergus, and P. Perona. ICCV 2003

Fragment-based

Learning to segment. E. Borenstein and S. Ullman. ECCV 2004

Combining top-down and bottom-up segmentation. E. Borenstein, E. Sharon, and S. Ullman. CVPR 2004

Dense model

Supervised

Non-probabilistic

No global shape model

Codebook-based

Combined object categorization and segmentation with an implicit shape model. B. Leibe, A. Leonardis, and B. Schiele. ECCV 2004

Probabilistic

Dense model

Supervised

Ad-hoc inference

OBJ CUT

Probabilistic

Dense model

Supervised

Requires video

LOCUS overview

Weakly supervised learning: buckets of images, no annotation required.

Probabilistic generative model of both object and background.

Dense model: all pixels are modelled, not just those at interest points.

Combines global and local cues: models global shape and local appearance + edges.

Iterative inference process: simultaneous localisation, segmentation and pose estimation.

The LOCUS model

LOCUS model

Deformation field D

Position & size T

Class shape π

Class edge sprite μo, σo

Edge image e

Image

Object appearance λ1

Background appearance λ0

Mask m

Shared between images

Different for each image

LOCUS model: appearance

background

object

Mask m

Background mixture coefficients

λ0

Object mixture coefficients

λ1

Image z

Shared mixture components:
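
One way to read this slide: the mixture components are shared by object and background, and only the per-image mixture coefficients λ0 and λ1 differ. The sketch below shows how that could turn a pixel's component index into evidence for the mask. The function name, the use of a single scalar shape prior per pixel, and the example numbers are all assumptions made for illustration, not taken from the talk.

```python
def object_posterior(component, lam_obj, lam_bg, shape_prior):
    """Posterior probability that a pixel belongs to the object.

    component   -- index of the shared mixture component the pixel falls in
    lam_obj     -- object mixture coefficients (lambda_1), one per component
    lam_bg      -- background mixture coefficients (lambda_0), one per component
    shape_prior -- prior probability of 'object' at this pixel (hypothetical
                   stand-in for the transformed class shape)
    """
    p_obj = shape_prior * lam_obj[component]
    p_bg = (1.0 - shape_prior) * lam_bg[component]
    return p_obj / (p_obj + p_bg)


# Illustrative coefficients: component 0 is common on the object,
# component 1 is common in the background.
lam_obj = [0.8, 0.2]
lam_bg = [0.1, 0.9]
p_first = object_posterior(0, lam_obj, lam_bg, shape_prior=0.5)
p_second = object_posterior(1, lam_obj, lam_bg, shape_prior=0.5)
```

With an uninformative prior of 0.5, a pixel in component 0 leans strongly towards the object and a pixel in component 1 towards the background.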

LOCUS model: mask

background

object

8-neighbour Markov Random Field (as used in GrabCut)

favours segmentation along contrast edges
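
The effect of such a prior can be sketched with a contrast-sensitive pairwise energy of the kind used in GrabCut-style models: neighbouring pixels with different labels pay a penalty that shrinks as the intensity difference between them grows. The exact potential used in LOCUS is not given on the slide, so the exponential form, the `gamma` and `beta` parameters and the greyscale image are assumptions.

```python
import math

# Half of the 8-neighbourhood, so each unordered pixel pair is visited once.
OFFSETS = [(0, 1), (1, 0), (1, 1), (1, -1)]


def mask_energy(mask, image, gamma=1.0, beta=1.0):
    """Sum of pairwise penalties over 8-connected neighbours with different labels."""
    rows, cols = len(mask), len(mask[0])
    energy = 0.0
    for r in range(rows):
        for c in range(cols):
            for dr, dc in OFFSETS:
                r2, c2 = r + dr, c + dc
                if not (0 <= r2 < rows and 0 <= c2 < cols):
                    continue
                if mask[r][c] == mask[r2][c2]:
                    continue  # no penalty when the labels agree
                contrast = (image[r][c] - image[r2][c2]) ** 2
                distance = math.hypot(dr, dc)  # diagonal pairs count for less
                energy += gamma * math.exp(-beta * contrast) / distance
    return energy


# A 3x4 image with a vertical intensity edge between columns 1 and 2.
image = [[0.0, 0.0, 1.0, 1.0] for _ in range(3)]
on_edge = [[0, 0, 1, 1] for _ in range(3)]   # boundary follows the edge
off_edge = [[0, 1, 1, 1] for _ in range(3)]  # boundary cuts a flat region
e_on = mask_energy(on_edge, image, beta=5.0)
e_off = mask_energy(off_edge, image, beta=5.0)
```

The boundary placed on the intensity edge gets a much lower energy than the one placed inside a flat region, which is the behaviour the slide describes.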

LOCUS model: shape/position

T1  T2  T3  T4  …  TN

Transformation

Class shape π
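
As a rough sketch of what a per-image transformation does, the code below places a class shape (a grid of object probabilities) into an image frame using a scale and an offset, with nearest-neighbour sampling. The parameterisation of T as one scale plus a row/column offset, the sampling scheme and the function name are assumptions for illustration; the slide only states that T covers position and size.

```python
import math


def transform_shape(shape, out_rows, out_cols, scale, offset_r, offset_c, outside=0.0):
    """Resample a class-shape grid into an image frame of out_rows x out_cols."""
    shape_rows, shape_cols = len(shape), len(shape[0])
    result = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            # Inverse mapping: image pixel -> class-shape coordinate.
            sr = math.floor((r - offset_r) / scale)
            sc = math.floor((c - offset_c) / scale)
            if 0 <= sr < shape_rows and 0 <= sc < shape_cols:
                row.append(shape[sr][sc])
            else:
                row.append(outside)  # pixels not covered by the shape
        result.append(row)
    return result


# A one-cell shape scaled by 2 and shifted by (1, 1) covers a 2x2 patch.
placed = transform_shape([[0.9]], 4, 4, scale=2.0, offset_r=1, offset_c=1)
```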

Iterative inference

T1  T2  T3  T4  …  TN

Class shape π

Iterations shown: #1, #2, #3, #5, #8, #12

Non-rigid objects

Class shape π

Translation and scale are not enough.

LOCUS model: pose

Class shape π

T

Deformation field D

5x5 blocks

Prior ensures smoothness
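
A smoothness prior on a block-wise deformation field is often expressed as a penalty on differences between neighbouring block displacements. The sketch below uses a squared-difference penalty over horizontally and vertically adjacent blocks; that particular form, and the representation of the field as a grid of (dy, dx) tuples, are assumptions, since the slide does not specify the prior.

```python
def smoothness_penalty(field):
    """Sum of squared displacement differences between adjacent blocks.

    field[i][j] is the (dy, dx) displacement assigned to block (i, j).
    """
    rows, cols = len(field), len(field[0])
    total = 0.0
    for i in range(rows):
        for j in range(cols):
            dy, dx = field[i][j]
            # Look only down and right so each neighbouring pair is counted once.
            for ni, nj in ((i + 1, j), (i, j + 1)):
                if ni < rows and nj < cols:
                    ndy, ndx = field[ni][nj]
                    total += (dy - ndy) ** 2 + (dx - ndx) ** 2
    return total


flat = [[(1, 2) for _ in range(3)] for _ in range(3)]
bumpy = [row[:] for row in flat]
bumpy[1][1] = (2, 2)  # one block moves differently from its four neighbours
flat_cost = smoothness_penalty(flat)
bumpy_cost = smoothness_penalty(bumpy)
```

A field in which every block moves the same way costs nothing, while a single block that disagrees with its neighbours is penalised.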

LOCUS model: pose

Class shape π

TD1 TD2 TD3 TDN

LOCUS model: edge

TD1 TD2 TD3 TDN

Edge images e …

Original images

Class edge sprite μo,σo

LOCUS model: overview

Deformation field D

Position & size T

Class shape π

Class edge sprite μo, σo

Edge image e

Image

Object appearance λ1

Background appearance λ0

Mask m

Shared between images

Different for each image

Inference

Aim to infer all latent variables.

For each image: background appearance λ0, object appearance λ1, deformation D, transformation T, mask m.

Class variables: shape π, edge sprite μo, σo.

Bayesian inference is carried out using variational message passing with a fully factorised variational distribution.

Optimisation of grid-structured variational free energy terms (relating to the deformation field D and the mask m) is achieved using graph cuts.
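
For reference, the generic mean-field setup behind "fully factorised" can be written as below. Here H stands for the latent variables listed above and V for the observed image and edge data. The slide does not say how the factors are grouped in LOCUS, so the per-variable factorisation is an assumption; the variational free energy mentioned above corresponds to the negative of the bound L(q).

```latex
q(H) = \prod_{j} q_j(H_j), \qquad
\ln P(V) \;\ge\; \mathcal{L}(q) = \sum_{H} q(H) \ln \frac{P(H, V)}{q(H)}, \qquad
\ln q_j^{*}(H_j) = \mathbb{E}_{i \ne j}\!\left[ \ln P(H, V) \right] + \text{const}.
```

Each factor is updated in turn with the others held fixed, which is what makes the inference iterative.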

Experiments & results

Experiments

LOCUS was applied to 8 sets of 20 images, each set containing objects of a single class.

•Horses

•Faces

•Cars (rear)

•Cars (side)

•Motorbikes

•Aeroplanes

•Cows

•Trees

For each class, we ran separate experiments for color and texture appearance models.

Results: horses

Results: cars

Results: remaining classes

Cars (rear), Faces, Motorbikes, Planes, Cows, Trees

Segmentation accuracy

                                                     Horses   Cars (side)
LOCUS (color), unannotated training images           93.1%    91.4%
LOCUS (texture), unannotated training images         93.0%    94.0%
Borenstein et al., hand-segmented training images    93.6%    -
Each image segmented separately                      88.6%    82.1%

To evaluate segmentation quantitatively, we used hand segmentations for horses and cars (side).
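
A comparison against a hand segmentation can be sketched as a per-pixel agreement score. The code below assumes accuracy means the percentage of pixels whose object/background label matches the hand-labelled mask; the slide does not define the measure, and the function name and toy masks are illustrative.

```python
def pixel_accuracy(predicted, truth):
    """Percentage of pixels whose predicted label matches the hand segmentation."""
    total = 0
    correct = 0
    for predicted_row, truth_row in zip(predicted, truth):
        for p, t in zip(predicted_row, truth_row):
            total += 1
            correct += int(p == t)
    return 100.0 * correct / total


predicted_mask = [[1, 1], [0, 0]]
hand_mask = [[1, 0], [0, 0]]
score = pixel_accuracy(predicted_mask, hand_mask)
```

In this toy case three of the four pixels agree, so the score is 75%.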

Object registration

Transformation + deformation field registers object outlines (and some internal edges).

Extensions to LOCUS

Recognition + segmentation

Object recognition using only global shape:

Overall: 88% accuracy.

Probabilistic Index Maps

2 indices

9 indices

Each image has a ‘palette’ of appearance models – palette invariance.

Learning objects from video

Object shape

Object edge sprite

Locumotion

Add flow and track constraints to achieve motion segmentation:

Tracking/flow estimation by Larry Zitnick

Conclusions

LOCUS gives unsupervised segmentations of accuracy equivalent to state-of-the-art supervised methods.

General-purpose model allows:

•Object localisation

•Pose estimation

•Object segmentation

•Motion segmentation/object tracking

•Object recognition/detection (in combination with discriminative model)

Questions?
