Object Recognition with Deformable Models

Pedro F. Felzenszwalb
Department of Computer Science
University of Chicago

Joint work with: Dan Huttenlocher, Joshua Schwartz, David McAllester, Deva Ramanan.



Example Problems

Detecting non-rigid objects

PASCAL challenge

Segmenting cells

Medical image analysis

Detecting rigid objects

Deformable Models

• Significant challenge:

- Handling variation in appearance within object classes

- Non-rigid objects, generic categories, etc.

• Deformable models approach:

- Consider each object as a deformed version of a template

- Compact representation

- Leads to interesting modeling and algorithmic problems

Overview

• Part I: Pictorial Structures

- Deformable part models

- Highly efficient matching algorithms

• Part II: Deformable Shapes

- Triangulated polygons

- Hierarchical models

• Part III: The PASCAL Challenge

- Recognizing 20 object categories in realistic scenes

- Discriminatively trained, multiscale, deformable part models

Part I: Pictorial Structures

• Introduced by Fischler and Elschlager in 1973

• Part-based models:

- Each part represents local visual properties

- “Springs” capture spatial relationships

Matching a model to an image involves joint optimization of part locations (“stretch and fit”)

Local Evidence + Global Decision

• Parts have a match quality at each image location

• Local evidence is noisy

- Parts are detected in the context of the whole model

(Figure panels: part; test image; match quality)

Matching Problem

• Model is represented by a graph G = (V, E)

- $V = \{v_1, \ldots, v_n\}$ are the parts

- $(v_i, v_j) \in E$ indicates a connection between parts

• $m_i(l_i)$ is the cost of placing part i at location $l_i$

• $d_{ij}(l_i, l_j)$ is a deformation cost

• Optimal configuration for the object is $L = (l_1, \ldots, l_n)$ minimizing

$$E(L) = \sum_{i=1}^{n} m_i(l_i) + \sum_{(v_i, v_j) \in E} d_{ij}(l_i, l_j)$$
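To make the objective concrete, here is a small Python sketch, assuming parts live on a 1-D grid of k locations and costs are given as plain lists and functions. The names energy and brute_force_match and the toy numbers are illustrative, not from the talk. Exhaustive search enumerates all k^n configurations, which is exactly what the dynamic programming approach avoids.

```python
from itertools import product

def energy(L, m, d, edges):
    """E(L) = sum_i m_i(l_i) + sum over edges (i, j) of d_ij(l_i, l_j)."""
    unary = sum(m[i][li] for i, li in enumerate(L))
    pairwise = sum(d[(i, j)](L[i], L[j]) for i, j in edges)
    return unary + pairwise

def brute_force_match(m, d, edges):
    """Minimize E(L) by enumerating all k**n configurations (tiny models only)."""
    n, k = len(m), len(m[0])
    return min(product(range(k), repeat=n), key=lambda L: energy(L, m, d, edges))

# Hypothetical toy model: 3 parts on a chain, k = 4 locations on a 1-D grid.
m = [[3, 0, 2, 5],
     [4, 4, 0, 1],
     [0, 3, 3, 2]]
edges = [(0, 1), (1, 2)]
# Spring cost: part j prefers to sit one location to the right of part i.
spring = lambda li, lj: (lj - li - 1) ** 2
d = {e: spring for e in edges}

best = brute_force_match(m, d, edges)
print(best, energy(best, m, d, edges))  # (1, 2, 3) 2
```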

Matching Problem

• Assume n parts, k possible locations for each part

- There are $k^n$ configurations L

• If graph is a tree we can use dynamic programming

- $O(nk^2)$ algorithm

• If $d_{ij}(l_i, l_j) = g(l_i - l_j)$ we can use min-convolutions

- $O(nk)$ algorithm

- As fast as matching each part separately!


Dynamic Programming on Trees

• For each $l_1$ find the best $l_2$:

- $\mathrm{Best}_2(l_1) = \min_{l_2} [m_2(l_2) + d_{12}(l_1, l_2)]$

• “Delete” v2 by adding $\mathrm{Best}_2(l_1)$ to $m_1(l_1)$, and solve the problem with the smaller model

• Keep removing leaves until there is a single part left

(Figure: parts v1 and v2 of a tree-structured model)
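A minimal sketch of the leaf-elimination procedure, assuming the tree is given as (parent, child) edges and costs use the same list/function representation as a toy example; tree_dp_match is a hypothetical name. Each non-root part is folded into its parent by computing Best for every parent location, which costs O(k^2) per edge and O(nk^2) overall; the argmins are kept so the optimal configuration can be recovered by backtracking from the root.

```python
def tree_dp_match(m, d, edges, root=0):
    """Exactly minimize E(L) for a tree-structured model in O(n k^2) time.

    m[i][l]   : cost of placing part i at location l
    d[(p, c)] : deformation cost d_pc(l_p, l_c), with p the parent of c
    edges     : (parent, child) pairs forming a tree rooted at `root`
    """
    n, k = len(m), len(m[0])
    parent = {c: p for p, c in edges}
    children = {i: [c for p, c in edges if p == i] for i in range(n)}
    order = [root]                      # BFS order: parents before children
    for i in order:
        order.extend(children[i])

    cost = [list(row) for row in m]     # m_i plus messages from i's subtree
    argbest = {}
    for c in reversed(order[1:]):       # leaves first; the root is never removed
        p = parent[c]
        argbest[c] = []
        for lp in range(k):
            vals = [cost[c][lc] + d[(p, c)](lp, lc) for lc in range(k)]
            lc_best = min(range(k), key=vals.__getitem__)
            argbest[c].append(lc_best)
            cost[p][lp] += vals[lc_best]   # "delete" c: fold Best_c into its parent
    L = [0] * n
    L[root] = min(range(k), key=cost[root].__getitem__)
    for c in order[1:]:                 # backtrack from the root outwards
        L[c] = argbest[c][L[parent[c]]]
    return tuple(L), cost[root][L[root]]

# Hypothetical 3-part chain with a spring preferring an offset of +1.
m = [[3, 0, 2, 5],
     [4, 4, 0, 1],
     [0, 3, 3, 2]]
edges = [(0, 1), (1, 2)]
spring = lambda lp, lc: (lc - lp - 1) ** 2
d = {e: spring for e in edges}
print(tree_dp_match(m, d, edges))  # ((1, 2, 3), 2)
```

On this chain the result matches exhaustive search, but the work grows linearly in the number of parts rather than exponentially.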

Min-Convolution Speedup

• Brute force: $O(k^2)$, where k is the number of locations

• Suppose $d_{12}(l_1, l_2) = g(l_1 - l_2)$:

- $\mathrm{Best}_2(l_1) = \min_{l_2} [m_2(l_2) + g(l_1 - l_2)]$

• Min-convolution: $O(k)$ if g is convex

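The slide states the speedup for convex g. As a sketch, here is only the special case g(x) = a·x² with a > 0, implemented with the lower-envelope-of-parabolas technique; other convex choices of g need a different routine. Here f plays the role of m2, q of l1 and p of l2. The function names are hypothetical, and the O(k^2) version is included only as a reference for checking.

```python
INF = float("inf")

def min_convolution_quadratic(f, a=1.0):
    """Compute D(q) = min_p f[p] + a * (q - p)**2 for q = 0..k-1 in O(k).

    Builds the lower envelope of the parabolas rooted at (p, f[p]).
    Requires a > 0 and a non-empty list of finite costs f.
    Returns the values D and the minimizing p for each q.
    """
    k = len(f)
    v = [0] * k            # roots of the parabolas on the lower envelope
    z = [0.0] * (k + 1)    # parabola v[j] is lowest on [z[j], z[j+1]]
    j = 0
    z[0], z[1] = -INF, INF

    def meet(p, r):        # x-coordinate where the parabolas at p and r intersect
        return ((f[r] + a * r * r) - (f[p] + a * p * p)) / (2 * a * (r - p))

    for q in range(1, k):
        s = meet(v[j], q)
        while s <= z[j]:   # parabola v[j] is never lowest: drop it
            j -= 1
            s = meet(v[j], q)
        j += 1
        v[j], z[j], z[j + 1] = q, s, INF

    values, args = [], []
    j = 0
    for q in range(k):
        while z[j + 1] < q:
            j += 1
        values.append(f[v[j]] + a * (q - v[j]) ** 2)
        args.append(v[j])
    return values, args

def min_convolution_brute(f, a=1.0):
    """O(k^2) reference implementation of the same minimization."""
    k = len(f)
    return [min(f[p] + a * (q - p) ** 2 for p in range(k)) for q in range(k)]

print(min_convolution_quadratic([3, 0, 2, 5, 1]))
# ([1.0, 0.0, 1.0, 2.0, 1.0], [1, 1, 1, 4, 4])
```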

Finding Motorbikes

Model with 6 parts: 2 wheels, 2 headlights, front and back of seat

Human Pose Estimation

Human Tracking

Ramanan, Forsyth, Zisserman. Tracking People by Learning their Appearance. IEEE Pattern Analysis and Machine Intelligence (PAMI), Jan 2007.

Part II: Deformable Shapes

• Shape is a fundamental cue for recognizing objects

• Many objects have no well defined parts

- We can capture their outlines using deformable models

Triangulated Polygons

• Polygonal templates

• Delaunay triangulation gives a natural decomposition of an object

• Consider deforming each triangle “independently”

The rabbit's ear can be bent by changing the shape of a single triangle

Structure of Triangulated Polygons

There are 2 graphs associated with a triangulated polygon. If the polygon is simple (no holes):

- The dual graph is a tree

- The graphical structure of the triangulation is a 2-tree

Deformable Matching

Matching to MRI data

Model

Consider piecewise affine maps from model to image (taking triangles to triangles)

Find the globally optimal deformation using dynamic programming over the 2-tree
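Each piece of such a map is determined by the three vertex correspondences of its triangle. A minimal NumPy sketch, assuming vertices are given as 3×2 arrays (triangle_affine is a hypothetical name), solves for the 2×2 matrix A and offset t:

```python
import numpy as np

def triangle_affine(src, dst):
    """Return (A, t) such that A @ p + t = q for each vertex pair (p, q).

    src, dst: 3x2 arrays of triangle vertices (model and image).
    The map is unique when the source triangle is non-degenerate.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    M = np.hstack([src, np.ones((3, 1))])   # rows [x, y, 1]
    params = np.linalg.solve(M, dst)        # rows 0-1 hold A transposed, row 2 holds t
    return params[:2].T, params[2]

# Hypothetical example: unit triangle stretched by (2, 3) and shifted by (2, 1).
A, t = triangle_affine([(0, 0), (1, 0), (0, 1)], [(2, 1), (4, 1), (2, 4)])
```

Because adjacent triangles share two vertices, their affine maps agree along the shared edge, so the resulting piecewise affine map is continuous.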

Hierarchical Shape Model

• Shape-tree of curve from a to b:

- Select midpoint c, store relative location c | a,b.

- Left child is a shape-tree of sub-curve from a to c.

- Right child is a shape-tree of sub-curve from c to b.

Example (curve points a through i):

c | a,b

- e | a,c (children: f | a,e and g | e,c)

- d | c,b (children: h | c,d and i | d,b)

Deformations

• Independently perturb relative locations stored in a shape-tree

- Local and global properties are preserved

- Reconstructed curve is perceptually similar to original
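The construction and the deformations can be sketched in Python. This assumes an open curve given as complex numbers with distinct sub-curve endpoints, and takes the relative location c | a,b to be the coordinates of c in the frame that sends a to 0 and b to 1 (an assumption about the exact parameterization). build_shape_tree, reconstruct and perturb are hypothetical names; perturb adds independent Gaussian noise to every stored location.

```python
import random

def build_shape_tree(points, i=0, j=None):
    """Shape-tree of the sub-curve points[i..j]; points are complex numbers.

    Each node is (rel, left, right), where rel is the location of the
    midpoint c in the frame that sends a to 0 and b to 1.
    """
    if j is None:
        j = len(points) - 1
    if j - i < 2:                      # no interior point left to store
        return None
    mid = (i + j) // 2
    a, b, c = points[i], points[j], points[mid]
    rel = (c - a) / (b - a)            # "c | a,b" in relative coordinates
    return (rel, build_shape_tree(points, i, mid), build_shape_tree(points, mid, j))

def reconstruct(tree, a, b):
    """Interior points of the curve encoded by `tree`, given endpoints a and b."""
    if tree is None:
        return []
    rel, left, right = tree
    c = a + rel * (b - a)              # undo the relative-coordinate mapping
    return reconstruct(left, a, c) + [c] + reconstruct(right, c, b)

def perturb(tree, sigma, rng):
    """Independently add Gaussian noise to every stored relative location."""
    if tree is None:
        return None
    rel, left, right = tree
    noise = complex(rng.gauss(0, sigma), rng.gauss(0, sigma))
    return (rel + noise, perturb(left, sigma, rng), perturb(right, sigma, rng))

# Nine sample points on a hypothetical model curve.
pts = [complex(x, 0.1 * x * x) for x in range(9)]
tree = build_shape_tree(pts)
a, b = pts[0], pts[-1]
deformed = [a] + reconstruct(perturb(tree, 0.05, random.Random(0)), a, b) + [b]
```

Because each node is stored relative to its own endpoints, moving one midpoint carries its whole sub-curve along with it, and rotating, scaling or translating the endpoints transforms the reconstructed curve in the same way.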


Matching

Match(v, [p,q]) = w1

Match(u, [q,r]) = w2

Match(w, [p,r]) = w1 + w2 + dif((e|a,c), (q|p,r))

(Figure: model curve with its shape-tree; node w stores e | a,c and has children v and u)

similar to parsing with the CKY algorithm
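A simplified version of this recursion can be written as memoized dynamic programming over (node, start, end) triples, which is where the CKY analogy comes from. The sketch assumes the image curve is an ordered list of distinct sample points (complex numbers), that the model endpoints map to its first and last points, and that dif is the squared distance between relative locations. These are choices made for illustration, and all function names are hypothetical.

```python
from functools import lru_cache

def build_shape_tree(points, i=0, j=None):
    """Midpoint shape-tree; nodes are (rel, left, right) with rel = (c-a)/(b-a)."""
    if j is None:
        j = len(points) - 1
    if j - i < 2:
        return None
    mid = (i + j) // 2
    rel = (points[mid] - points[i]) / (points[j] - points[i])
    return (rel, build_shape_tree(points, i, mid), build_shape_tree(points, mid, j))

def match_shape_tree(tree, img):
    """Minimum cost of matching a shape-tree to the ordered image points img.

    Assumed for this sketch: model endpoints go to img[0] and img[-1], each
    stored midpoint goes to an image point strictly between the points its
    endpoints map to, and dif() is the squared distance between relative
    locations. Runs in O(nodes * m^3) time for m image points, like CKY.
    """
    @lru_cache(maxsize=None)
    def match(node, p, r):
        if node is None:              # model segment with no interior point
            return 0.0
        rel, left, right = node
        best = float("inf")
        for q in range(p + 1, r):     # choose the image point for the midpoint
            img_rel = (img[q] - img[p]) / (img[r] - img[p])
            best = min(best, match(left, p, q) + match(right, q, r)
                       + abs(rel - img_rel) ** 2)
        return best

    return match(tree, 0, len(img) - 1)

# Hypothetical model curve and a rotated, scaled, shifted copy of it.
model = [complex(x, 0.1 * x * x) for x in range(5)]
tree = build_shape_tree(model)
moved = [(1 + 2j) * p + 3 for p in model]
print(match_shape_tree(tree, moved))   # close to zero: relative locations are invariant
```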

Recognizing Leaves

15 species

75 examples per species

(25 training, 50 test)

Nearest neighbor classification accuracy (%):

- Shape-tree: 96.28

- Inner distance: 94.13

- Shape context: 88.12

Part III: PASCAL Challenge

• ~10,000 images, with ~25,000 target objects

- Objects from 20 categories (person, car, bicycle, cow, table...)

- Objects are annotated with labeled bounding boxes

Model Overview

Model has a root filter plus deformable parts

(Figure panels: root filter; part filters; deformation models; detection)

Histogram of Oriented Gradients (HOG) Features

• Image is partitioned into 8x8 pixel blocks

• In each block we compute a histogram of gradient orientations

- Invariant to changes in lighting, small deformations, etc.

• We compute features at different resolutions (pyramid)
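A simplified NumPy sketch of these features, assuming a grayscale image array, unsigned orientations in 9 bins and factor-of-2 pyramid levels (all assumptions). It uses hard binning and omits the contrast normalization that standard HOG implementations apply, which supplies much of the lighting invariance; hog_cells and hog_pyramid are hypothetical names.

```python
import numpy as np

def hog_cells(image, cell=8, bins=9):
    """Per-cell histograms of unsigned gradient orientation, weighted by magnitude."""
    img = np.asarray(image, dtype=float)
    gy, gx = np.gradient(img)                   # derivatives along rows, then columns
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)     # unsigned orientation in [0, pi)
    b = np.minimum((ang / np.pi * bins).astype(int), bins - 1)
    h_cells, w_cells = img.shape[0] // cell, img.shape[1] // cell
    hist = np.zeros((h_cells, w_cells, bins))
    for i in range(h_cells):
        for j in range(w_cells):
            sl = (slice(i * cell, (i + 1) * cell), slice(j * cell, (j + 1) * cell))
            hist[i, j] = np.bincount(b[sl].ravel(), weights=mag[sl].ravel(),
                                     minlength=bins)
    return hist

def hog_pyramid(image, levels=3, cell=8):
    """HOG features at successively halved resolutions (2x2 average pooling)."""
    img = np.asarray(image, dtype=float)
    pyramid = []
    for _ in range(levels):
        pyramid.append(hog_cells(img, cell))
        h, w = (img.shape[0] // 2) * 2, (img.shape[1] // 2) * 2
        img = img[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
    return pyramid

# Hypothetical 32x32 image with a vertical step edge at column 16.
img = np.zeros((32, 32))
img[:, 16:] = 1.0
hist = hog_cells(img)        # shape (4, 4, 9); all energy in orientation bin 0
```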

Filters

• Filters are rectangular templates defining weights for features

• Score is dot product of filter and subwindow of HOG pyramid

(Figure: image pyramid and HOG feature pyramid, with a filter placed over a subwindow of the HOG pyramid; the score of H at this location is H ⋅ W)
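The dot-product score can be sketched as a sliding correlation over one pyramid level. filter_scores is a hypothetical name, and features and filter are assumed to be (rows, cols, d) arrays of per-cell feature vectors.

```python
import numpy as np

def filter_scores(features, filt):
    """Dot product of a filter with every subwindow of one feature-map level.

    features: (rows, cols, d) array of per-cell feature vectors
    filt:     (fh, fw, d) array of filter weights
    Entry (y, x) of the result scores the filter with its top-left cell at (y, x).
    """
    rows, cols, _ = features.shape
    fh, fw, _ = filt.shape
    out = np.empty((rows - fh + 1, cols - fw + 1))
    for y in range(rows - fh + 1):
        for x in range(cols - fw + 1):
            out[y, x] = np.sum(features[y:y + fh, x:x + fw] * filt)
    return out

rng = np.random.default_rng(0)
feats = rng.standard_normal((6, 7, 9))   # hypothetical 6x7-cell map, 9 features per cell
filt = rng.standard_normal((2, 3, 9))    # hypothetical 2x3-cell filter
scores = filter_scores(feats, filt)      # one score per placement: shape (5, 5)
```

A practical implementation would vectorize the loops or use a correlation routine; the explicit loops keep the correspondence with the slide obvious.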

Object Hypothesis

Image pyramid HOG feature pyramid

Multiscale model captures features at two resolutions

Score is sum of filter scores plus deformation scores
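The score of one hypothesis can be sketched as follows, given precomputed filter-score maps. This assumes part coordinates are expressed at the finer level, so the root position is doubled before adding each part's anchor offset, and that the deformation cost is quadratic in the displacement with per-axis weights. Both are illustrative assumptions, and hypothesis_score is a hypothetical name.

```python
def hypothesis_score(root_map, part_maps, root_pos, part_pos, anchors, def_weights):
    """Root filter score plus, for each part, its filter score minus a deformation cost.

    root_map:    2-D array of root-filter scores at the coarse level
    part_maps:   2-D arrays of part-filter scores at twice the resolution
    root_pos:    (y, x) of the root at the coarse level
    part_pos:    (y, x) placement of each part at the fine level
    anchors:     (dy, dx) ideal offset of each part from 2 * root_pos
    def_weights: (wy, wx) quadratic deformation weights (assumed form)
    """
    y0, x0 = root_pos
    score = root_map[y0][x0]
    for smap, (y, x), (ay, ax), (wy, wx) in zip(part_maps, part_pos, anchors, def_weights):
        dy, dx = y - (2 * y0 + ay), x - (2 * x0 + ax)   # displacement from the anchor
        score += smap[y][x] - (wy * dy * dy + wx * dx * dx)
    return score

# Hypothetical toy maps: one part whose filter responds strongly at (2, 3).
root_map = [[0, 1], [2, 3]]
part_map = [[0] * 4 for _ in range(4)]
part_map[2][3] = 5
print(hypothesis_score(root_map, [part_map], (1, 1), [(2, 3)], [(0, 1)], [(0.5, 0.5)]))  # 8.0
```

Choosing each part's best placement for every root position maximizes filter score minus deformation cost, which has the same min-convolution structure as in Part I with the signs flipped.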

Training

• Training data consists of images with labeled bounding boxes

• Need to learn the model structure, filters and deformation costs


Connection With Linear Classifiers

w is a model, x is a detection window, z are filter placements

- w is a concatenation of filters and deformation parameters

- The features of x at placement z are a concatenation of features and part displacements

• Score of model is sum of filter scores plus deformation scores

- Bounding box in training data specifies that score should be high for some placement in a range
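In this view the score is f_w(x) = max over z of w · Φ(x, z), where Φ(x, z) is the concatenation of features and part displacements for placement z (the symbol Φ is notation introduced here). A minimal sketch with placements enumerated explicitly; model_score and the toy vectors are hypothetical:

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def model_score(w, candidate_features):
    """f_w(x) = max over placements z of w . Phi(x, z).

    candidate_features lists Phi(x, z) for every allowed placement z.
    Returns the best score and the index of the maximizing placement.
    """
    scores = [dot(w, phi) for phi in candidate_features]
    best = max(range(len(scores)), key=scores.__getitem__)
    return scores[best], best

w = [1.0, -0.5, 2.0]                          # hypothetical model vector
placements = [[1, 0, 0], [0, 2, 1], [1, 1, 1]]  # hypothetical Phi(x, z) for three z
print(model_score(w, placements))  # (2.5, 2)
```

For a fixed placement z the score is a plain dot product, which is why the model behaves like a linear classifier once the latent placement is known.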

Latent SVMs

Linear in w if z is fixed

Objective: regularization + hinge loss
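A standard form of the latent SVM objective, consistent with these labels, is ½‖w‖² + C Σᵢ max(0, 1 − yᵢ f_w(xᵢ)), where f_w(x) is the maximum of w · Φ(x, z) over placements z and yᵢ is +1 or −1. The sketch below evaluates it for explicitly enumerated placements; the function name, C and the toy data are assumptions.

```python
def latent_svm_objective(w, examples, C=1.0):
    """0.5 * ||w||^2 + C * sum_i max(0, 1 - y_i * f_w(x_i)).

    examples: list of (candidate_features, y) with y in {+1, -1}, where
    candidate_features lists Phi(x, z) for every allowed placement z.
    """
    reg = 0.5 * sum(wi * wi for wi in w)              # regularization term
    hinge = 0.0
    for candidates, y in examples:
        f = max(sum(a * b for a, b in zip(w, phi)) for phi in candidates)
        hinge += max(0.0, 1.0 - y * f)                 # hinge loss on the latent score
    return reg + C * hinge

w = [1.0, -0.5, 2.0]
examples = [([[1, 0, 0], [1, 1, 1]], +1),   # hypothetical positive window
            ([[0, 1, 0]], -1)]              # hypothetical negative window
print(latent_svm_objective(w, examples))  # 3.125
```

For a negative example the loss is convex in w, since f_w is a maximum of linear functions; for a positive example it becomes convex once its placement z is fixed, which is where “linear in w if z is fixed” matters.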

Learned Models

Bottle

Car

Bicycle

Sofa

Example Results

More Results

Overall Results

• 9 systems competed in the 2007 challenge

• Out of 20 classes we get:

- First place in 10 classes

- Second place in 6 classes

• Some statistics:

- It takes ~2 seconds to evaluate a model in one image

- It takes ~3 hours to train a model

- MUCH faster than most systems

Component Analysis

(Figure: precision-recall curves for PASCAL 2006 Person. Legend: Root (0.18), Root+Latent (0.24), Parts+Latent (0.29), Root+Parts+Latent (0.34))

Summary

• Deformable models provide an elegant framework for object detection and recognition

- Efficient algorithms for matching models to images

- Applications: pose estimation, medical image analysis, object recognition, etc.

• We can learn models from partially labeled data

- Generalizes standard ideas from machine learning

- Leads to state-of-the-art results in PASCAL challenge

• Future work: hierarchical models, grammars, 3D objects