Part1

Lecture 6: Introduction to Object Recognition

So what does object recognition involve?

Classification: does this contain people?

Detection: where are there people (if any)?

Identification: is that Potala Palace?

Object categorization

mountain

building

tree

banner

vendor

people

street lamp

Scene and context categorization

• outdoor

• city

• …

Applications: Photography

Application: Assisted driving

[Figure: detections labeled Ped, Ped, and Car, with axes in meters]

Lane detection

Pedestrian and car detection

• Collision warning systems with adaptive cruise control

• Lane departure warning systems

• Rear object detection systems

Object recognition: Is it really so hard?

This is a chair

Find the chair in this image

Output of normalized correlation

Slide: A. Torralba

Object recognition: Is it really so hard?

Find the chair in this image

Pretty much garbage

Simple template matching is not going to make it

A “popular method is that of template matching, by point to point correlation of a model pattern with the image pattern. These techniques are inadequate for three-dimensional scene analysis for many reasons, such as occlusion, changes in viewing angle, and articulation of parts.” Nevatia & Binford, 1977.

Slide: A. Torralba
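The normalized correlation output mentioned above can be sketched as follows. This is a minimal illustration, assuming grayscale images stored as 2-D NumPy arrays; the function names `normalized_correlation` and `best_match` are made up for this example and do not come from the slides.

```python
import numpy as np

def normalized_correlation(image, template):
    """Map of normalized correlation scores in [-1, 1].

    Entry (r, c) scores the image patch whose top-left corner is (r, c).
    """
    image = np.asarray(image, dtype=float)
    template = np.asarray(template, dtype=float)
    th, tw = template.shape
    ih, iw = image.shape
    t = template - template.mean()
    t_norm = np.sqrt((t ** 2).sum())
    scores = np.zeros((ih - th + 1, iw - tw + 1))
    for r in range(scores.shape[0]):
        for c in range(scores.shape[1]):
            patch = image[r:r + th, c:c + tw]
            p = patch - patch.mean()
            denom = np.sqrt((p ** 2).sum()) * t_norm
            # Zero-variance patches (or a flat template) get score 0.
            scores[r, c] = (p * t).sum() / denom if denom > 0 else 0.0
    return scores

def best_match(image, template):
    """Top-left corner (row, col) of the highest-scoring window."""
    scores = normalized_correlation(image, template)
    return np.unravel_index(np.argmax(scores), scores.shape)
```

The score is unchanged when the image brightness is scaled or offset, but it compares pixels point to point, so a change in viewpoint, occlusion, or articulation breaks the match, which is the failure the quote describes.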

Challenges 1: view point variation

Michelangelo 1475-1564

Challenges 2: illumination

slide credit: S. Ullman

Challenges 3: occlusion

Magritte, 1957

Challenges 4: scale

Challenges 5: deformation

Xu, Beihong 1943

Variability:

• Camera position

• Illumination

• Internal parameters

Within-class variations

Modeling variability

Within-class variations

Timeline of recognition

• 1965-late 1980s: alignment, geometric primitives

Variability:

• Camera position

• Illumination

• Internal parameters

Alignment

Roberts (1965); Lowe (1987); Faugeras & Hebert (1986); Grimson & Lozano-Perez (1986); Huttenlocher & Ullman (1987)

Shape: assumed known

Recall: Alignment

• Alignment: fitting a model to a transformation between pairs of features (matches) in two images

Find transformation T that minimizes

$$\sum_i \mathrm{residual}\big(T(x_i),\, x_i'\big)$$
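As one concrete instance of this minimization, the sketch below fits an affine transformation by linear least squares, taking the residual to be the squared Euclidean distance. Both choices (the affine model and the squared-distance residual) are assumptions made for illustration; the slides leave T and the residual general. The names `fit_affine` and `total_residual` are hypothetical.

```python
import numpy as np

def fit_affine(x, x_prime):
    """Least-squares affine transform T(x) = A x + t mapping x to x_prime.

    x, x_prime: arrays of shape (n, 2) of matched points, n >= 3.
    Returns (A, t) with A of shape (2, 2) and t of shape (2,).
    """
    x = np.asarray(x, dtype=float)
    x_prime = np.asarray(x_prime, dtype=float)
    # Homogeneous coordinates: each row is [x, y, 1].
    X = np.hstack([x, np.ones((len(x), 1))])
    # Solve X @ M ~= x_prime for the 3x2 parameter matrix M.
    M, *_ = np.linalg.lstsq(X, x_prime, rcond=None)
    A = M[:2].T
    t = M[2]
    return A, t

def total_residual(A, t, x, x_prime):
    """Sum over i of the squared distance between T(x_i) and x_i'."""
    pred = np.asarray(x, dtype=float) @ A.T + t
    return float(((pred - np.asarray(x_prime, dtype=float)) ** 2).sum())
```

With noise-free matches the fitted transform reproduces the true one and the total residual is numerically zero. With outlier matches a robust scheme would be needed, which this sketch does not attempt.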


• 1965-late 1980s: alignment, geometric primitives

• Early 1990s: invariants, appearance-based methods

Empirical models of image variability

Appearance-based techniques

Turk & Pentland (1991); Murase & Nayar (1995); etc.

Color Histograms

Swain and Ballard, Color Indexing, IJCV 1991.

http://www.inf.ed.ac.uk/teaching/courses/av/LECTURE_NOTES/swainballard91.pdf
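A minimal sketch of matching by color histogram, using the histogram intersection score from Swain and Ballard (the sum of bin-wise minima, normalized by the model histogram). The RGB binning and the bin count of 8 per channel are simplifying assumptions for this example, and the function names are hypothetical.

```python
import numpy as np

def color_histogram(image, bins=8):
    """Joint RGB histogram of an (H, W, 3) image with values in 0..255."""
    pixels = np.asarray(image, dtype=float).reshape(-1, 3)
    hist, _ = np.histogramdd(
        pixels, bins=(bins, bins, bins), range=((0, 256),) * 3
    )
    return hist.ravel()

def histogram_intersection(image_hist, model_hist):
    """Match score: sum_j min(I_j, M_j) / sum_j M_j, in [0, 1]."""
    return float(np.minimum(image_hist, model_hist).sum() / model_hist.sum())
```

Because the histogram discards pixel positions, the score tolerates rotation and moderate viewpoint change, but any background pixels in the window are counted along with the object.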

Limitations of global appearance models

• Can work on relatively simple patterns

• Not robust to clutter, occlusion, lighting changes



• Mid-late 1990s: sliding window approaches

– Classify each window separately

– Scale / orientation range to search over

Sliding window approaches
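The two steps above (enumerate windows over a range of scales, classify each window separately) can be sketched as below. The base window size, scale set, stride, and the `score_window` callback are all placeholders chosen for illustration; orientation search is omitted.

```python
import numpy as np

def sliding_windows(image_h, image_w, base=(64, 64),
                    scales=(1.0, 1.5, 2.0), stride_frac=0.5):
    """Yield (top, left, height, width) for every position and scale."""
    for s in scales:
        h, w = int(round(base[0] * s)), int(round(base[1] * s))
        if h > image_h or w > image_w:
            continue  # window at this scale does not fit in the image
        step_y = max(1, int(h * stride_frac))
        step_x = max(1, int(w * stride_frac))
        for top in range(0, image_h - h + 1, step_y):
            for left in range(0, image_w - w + 1, step_x):
                yield top, left, h, w

def detect(image, score_window, threshold, **kwargs):
    """Score each window independently; keep those above threshold."""
    hits = []
    for top, left, h, w in sliding_windows(image.shape[0], image.shape[1], **kwargs):
        score = score_window(image[top:top + h, left:left + w])
        if score > threshold:
            hits.append((score, top, left, h, w))
    return hits
```

In practice `score_window` would be a trained classifier, and overlapping detections would be merged afterwards; neither is shown here.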

Scene-level context for image parsing

J. Tighe and S. Lazebnik, ECCV 2010 submission

D. Hoiem, A. Efros, and M. Hebert. Putting Objects in Perspective. CVPR 2006.

Geometric context

http://www.cs.uiuc.edu/homes/dhoiem/projects/pop/


• 1965-late 1980s: alignment, geometric primitives

• Early 1990s: invariants, appearance-based methods

• Mid-late 1990s: sliding window approaches

• Late 1990s: feature-based methods

Lowe’02

Mahamud & Hebert’03

Local features

Combining local appearance, spatial constraints, invariants, and classification techniques from machine learning.

Schmid & Mohr’97

Local features for recognition of object instances

Specific Object Recognition



• Mid-late 1990s: sliding window approaches

• Late 1990s: feature-based methods

• Early 2000s – present: parts-and-shape models

Parts and Structure approaches

With a different perspective, these models focused more on the geometry than on defining the constituent elements:

• Fischler & Elschlager 1973

• Yuille ‘91

• Brunelli & Poggio ‘93

• Lades, v.d. Malsburg et al. ‘93

• Cootes, Lanitis, Taylor et al. ‘95

• Amit & Geman ‘95, ‘99

• Perona et al. ‘95, ‘96, ’98, ’00, ’03, ‘04, ‘05

• Felzenszwalb & Huttenlocher ’00, ’04

• Crandall & Huttenlocher ’05, ’06

• Leibe & Schiele ’03, ’04

• Many papers since 2000

Figure from [Fischler & Elschlager 73]

Representing categories: Parts and Structure

Weber, Welling & Perona (2000), Fergus, Perona & Zisserman (2003)

Representation

• Object as set of parts

– Generative representation

• Model:

– Relative locations between parts

– Appearance of part

• Issues:

– How to model location

– How to represent appearance

– Sparse or dense (pixels or regions)

– How to handle occlusion/clutter



• Mid-late 1990s: sliding window approaches

• Late 1990s: feature-based methods

• Early 2000s – present: parts-and-shape models

• 2003 – present: bags of features

Object

Bag of ‘words’

Bag-of-features models

Objects as texture

• All of these are treated as being the same

• No distinction between foreground and background: scene recognition?
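A minimal sketch of the bag-of-features representation: each local descriptor is assigned to its nearest visual word and the image becomes a histogram of word counts. The codebook is assumed to be given (for example, cluster centers learned elsewhere); the function name `bag_of_words` is hypothetical.

```python
import numpy as np

def bag_of_words(descriptors, codebook):
    """Normalized histogram of nearest-codeword assignments.

    descriptors: (n, d) local feature descriptors from one image.
    codebook:    (k, d) visual words (assumed given).
    """
    descriptors = np.asarray(descriptors, dtype=float)
    codebook = np.asarray(codebook, dtype=float)
    # Squared Euclidean distance from every descriptor to every codeword.
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    total = hist.sum()
    # An image with no descriptors yields an all-zero histogram.
    return hist / total if total > 0 else hist
```

Reordering the descriptors, or moving the features around the image, leaves the histogram unchanged. That is why differently arranged images are treated as being the same, and why foreground and background features are not distinguished.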



• Mid-late 1990s: sliding window approaches

• Late 1990s: feature-based methods

• Early 2000s – present: parts-and-shape models

• 2003 – present: bags of features

• Present trends: combination of local and global methods, modeling context, integrating recognition and segmentation

Global models?

• The “gist” of a scene: Oliva & Torralba (2001)

J. Hays and A. Efros, Scene Completion using Millions of Photographs, SIGGRAPH 2007

http://graphics.cs.cmu.edu/projects/scene-completion/

NIPS 2007




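Object categorization: the statistical viewpoint

$p(\text{zebra} \mid \text{image})$ vs. $p(\text{no zebra} \mid \text{image})$

• Bayes rule:

$$\underbrace{\frac{p(\text{zebra} \mid \text{image})}{p(\text{no zebra} \mid \text{image})}}_{\text{posterior ratio}} = \underbrace{\frac{p(\text{image} \mid \text{zebra})}{p(\text{image} \mid \text{no zebra})}}_{\text{likelihood ratio}} \cdot \underbrace{\frac{p(\text{zebra})}{p(\text{no zebra})}}_{\text{prior ratio}}$$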


• Discriminative methods model posterior

• Generative methods model likelihood and prior
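A small numeric sketch of the Bayes-rule decomposition, posterior ratio = likelihood ratio × prior ratio. The function names and the example probabilities are invented for illustration and are not taken from the lecture.

```python
def posterior_ratio(lik_zebra, lik_no_zebra, prior_zebra):
    """p(zebra | image) / p(no zebra | image) via Bayes rule."""
    likelihood_ratio = lik_zebra / lik_no_zebra
    prior_ratio = prior_zebra / (1.0 - prior_zebra)
    return likelihood_ratio * prior_ratio

def posterior_zebra(lik_zebra, lik_no_zebra, prior_zebra):
    """Convert the ratio r back to a probability: p = r / (1 + r)."""
    r = posterior_ratio(lik_zebra, lik_no_zebra, prior_zebra)
    return r / (1.0 + r)
```

For instance, with made-up values p(image | zebra) = 0.8, p(image | no zebra) = 0.1, and p(zebra) = 0.01, the likelihood ratio of 8 is outweighed by the prior ratio of 1/99, so the posterior ratio stays below 1 and "no zebra" remains the more probable answer.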

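Discriminative

• Direct modeling of $\frac{p(\text{zebra} \mid \text{image})}{p(\text{no zebra} \mid \text{image})}$

[Figure: decision boundary separating zebra from non-zebra examples]

Generative

• Model $p(\text{image} \mid \text{zebra})$ and $p(\text{image} \mid \text{no zebra})$

p(image | zebra)     p(image | no zebra)

Low                  Middle

High                 Middle → Low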

Three main issues

• Representation

– How to represent an object category

• Learning

– How to form the classifier, given training data

• Recognition

– How the classifier is to be used on novel data

Representation

– Generative / discriminative / hybrid

Representation


– Appearance only or location and appearance

Representation



– Invariances

• View point

• Illumination

• Occlusion

• Scale

• Deformation

• Clutter

• etc.

Representation



– Invariances

– Part-based or global w/ sub-window

Representation



– Invariances

– Parts or global w/ sub-window

– Use set of features or each pixel in image

– Unclear how to model categories, so we learn what distinguishes them rather than manually specify the difference -- hence current interest in machine learning

Learning

– Unclear how to model categories, so we learn what distinguishes them rather than manually specify the difference -- hence current interest in machine learning

– Methods of training: generative vs. discriminative

Learning


– What are you maximizing? Likelihood (generative) or performance on the train/validation set (discriminative)

– Level of supervision

• Manual segmentation; bounding box; image labels; noisy labels
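The difference in training objective can be sketched on a toy one-dimensional feature. The generative fit maximizes likelihood by estimating a Gaussian per class plus a class prior. The discriminative fit picks the threshold with the lowest training error. Both models, and the assumption that class 1 has the larger feature values, are simplifications chosen for this example rather than methods from the lecture.

```python
import numpy as np

def fit_generative(x, y):
    """Maximum likelihood: one Gaussian per class plus a class prior."""
    params = {}
    for label in (0, 1):
        xs = x[y == label]
        # Small constant keeps sigma positive for degenerate classes.
        params[label] = (xs.mean(), xs.std() + 1e-9, len(xs) / len(x))
    return params

def predict_generative(params, x):
    """Pick the class with larger log p(x | class) + log p(class)."""
    def log_joint(label):
        mu, sigma, prior = params[label]
        # The shared constant -0.5 * log(2 * pi) is dropped from both classes.
        return -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma) + np.log(prior)
    return (log_joint(1) > log_joint(0)).astype(int)

def fit_discriminative(x, y):
    """Minimize training error directly over candidate thresholds.

    Assumes class 1 tends to have larger x than class 0.
    """
    candidates = np.sort(x)
    errors = [np.mean((x > t).astype(int) != y) for t in candidates]
    return candidates[int(np.argmin(errors))]

def predict_discriminative(threshold, x):
    """Label 1 for values above the learned threshold."""
    return (x > threshold).astype(int)
```

On well-separated data the two approaches agree. They differ in what is stored: the generative model keeps a description of each class, while the discriminative one keeps only the decision boundary.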

Learning

Contains a motorbike





– Batch/incremental (on category and image level; user feedback)

Learning






– Training images:

• Issue of overfitting

• Negative images for discriminative methods

Priors

Learning






– Training images:

• Issue of overfitting

• Negative images for discriminative methods

– Priors

Learning

[Figure: category hierarchy]

OBJECTS: ANIMALS, PLANTS, INANIMATE

ANIMALS: VERTEBRATE, …..

VERTEBRATE: MAMMALS, BIRDS

MAMMALS: TAPIR, BOAR

BIRDS: GROUSE

INANIMATE: NATURAL, MAN-MADE

MAN-MADE: CAMERA

Technology
