Lecture 6: Introduction to Object Recognition
So what does object recognition involve?
Classification: does this contain people?
Detection: where are there people (if any)?
Identification: is that Potala Palace?
Object categorization
mountain
building
tree
banner
vendor
people
street lamp
Scene and context categorization
• outdoor
• city
• …
Applications: Photography
Application: Assisted driving
[Figure: detections labelled Ped, Ped, and Car, with axes in meters]
Lane detection
Pedestrian and car detection
• Collision warning systems with adaptive cruise control
• Lane departure warning systems
• Rear object detection systems
Object recognition: Is it really so hard?
This is a chair
Find the chair in this image
Output of normalized correlation
Slide: A. Torralba
Object recognition: Is it really so hard?
Find the chair in this image
Pretty much garbage
Simple template matching is not going to make it
“A popular method is that of template matching, by point to point correlation of a model pattern with the image pattern. These techniques are inadequate for three-dimensional scene analysis for many reasons, such as occlusion, changes in viewing angle, and articulation of parts.” Nevatia & Binford, 1977.
Slide: A. Torralba
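The normalized correlation mentioned on the slide can be sketched as follows. This is a minimal NumPy illustration, not the code behind the slide: the function name, the random test image, and the choice to score flat patches as 0 are all assumptions. It recovers the template only because the template was cut from the same image, which is the easy case the slide warns does not generalize.

```python
import numpy as np

def normalized_cross_correlation(image, template):
    """Return the normalized correlation score of `template` at every valid position."""
    ih, iw = image.shape
    th, tw = template.shape
    t = template - template.mean()
    t_norm = np.sqrt((t ** 2).sum())
    scores = np.zeros((ih - th + 1, iw - tw + 1))
    for y in range(scores.shape[0]):
        for x in range(scores.shape[1]):
            patch = image[y:y + th, x:x + tw]
            p = patch - patch.mean()
            denom = np.sqrt((p ** 2).sum()) * t_norm
            # A flat patch or template has no defined correlation; score it 0 (assumption).
            scores[y, x] = (p * t).sum() / denom if denom > 0 else 0.0
    return scores

# Hypothetical data: a random image and a template cropped from it.
rng = np.random.default_rng(0)
image = rng.random((40, 40))
template = image[10:18, 22:30].copy()

scores = normalized_cross_correlation(image, template)
best = tuple(int(i) for i in np.unravel_index(np.argmax(scores), scores.shape))
print(best)  # top-left corner of the best-scoring window
```

A change in viewpoint, scale, or lighting pattern alters the pixels under the window, so the peak of `scores` is no longer guaranteed to sit on the object.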
Challenges 1: viewpoint variation
Michelangelo 1475-1564
Challenges 2: illumination
slide credit: S. Ullman
Challenges 3: occlusion
Magritte, 1957
Challenges 4: scale
Challenges 5: deformation
Xu, Beihong 1943
Variability: camera position, illumination, internal parameters
Within-class variations
Modeling variability
Within-class variations
Timeline of recognition
• 1965-late 1980s: alignment, geometric primitives
Variability: camera position, illumination, internal parameters
Alignment
Roberts (1965); Lowe (1987); Faugeras & Hebert (1986); Grimson & Lozano-Perez (1986); Huttenlocher & Ullman (1987)
Shape: assumed known
Recall: Alignment
• Alignment: fitting a model to a transformation between pairs of features (matches) in two images
Find the transformation $T$ that minimizes
$$\sum_i \mathrm{residual}\big(T(x_i),\, x_i'\big)$$
where $x_i$ and $x_i'$ are matched features in the two images.
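As one concrete instance of this minimization, the sketch below fits an affine transformation by linear least squares. Two choices here are assumptions rather than anything stated on the slide: the residual is taken to be the squared Euclidean distance, and T is restricted to the affine family. The function name `fit_affine` and the synthetic matches are illustrative.

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares affine map (A, t) minimizing sum_i ||A @ src[i] + t - dst[i]||^2."""
    n = src.shape[0]
    X = np.hstack([src, np.ones((n, 1))])             # (n, 3) homogeneous coordinates
    params, *_ = np.linalg.lstsq(X, dst, rcond=None)  # (3, 2) stacked parameters
    A = params[:2].T                                  # 2x2 linear part
    t = params[2]                                     # translation
    return A, t

# Hypothetical matches: points x_i and their transformed counterparts x_i'.
rng = np.random.default_rng(1)
angle = np.deg2rad(30.0)
A_true = 1.2 * np.array([[np.cos(angle), -np.sin(angle)],
                         [np.sin(angle),  np.cos(angle)]])
t_true = np.array([3.0, -1.0])
src = rng.random((20, 2)) * 100.0
dst = src @ A_true.T + t_true

A_est, t_est = fit_affine(src, dst)
total_residual = ((src @ A_est.T + t_est - dst) ** 2).sum()
```

With noise-free matches the fit recovers the true parameters; with outlier matches a robust scheme would be needed on top of this.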
Timeline of recognition
• 1965-late 1980s: alignment, geometric primitives
• Early 1990s: invariants, appearance-based methods
Empirical models of image variability
Appearance-based techniques
Turk & Pentland (1991); Murase & Nayar (1995); etc.
Color Histograms
Swain and Ballard, Color Indexing, IJCV 1991.
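A color histogram discards spatial layout entirely, which is both its strength and its weakness as a global appearance model. The sketch below compares histograms with histogram intersection, the match measure used in Color Indexing; the bin count, the RGB color space, and the synthetic images are assumptions made for illustration.

```python
import numpy as np

def color_histogram(image, bins=8):
    """Joint RGB histogram of an (H, W, 3) uint8 image, normalized to sum to 1."""
    pixels = image.reshape(-1, 3).astype(float)
    hist, _ = np.histogramdd(pixels, bins=(bins, bins, bins), range=[(0, 256)] * 3)
    return hist / hist.sum()

def histogram_intersection(h_image, h_model):
    """Sum of bin-wise minima, normalized by the model histogram's mass."""
    return np.minimum(h_image, h_model).sum() / h_model.sum()

# Hypothetical images: a random "model", a pixel-scrambled copy, and a solid red image.
rng = np.random.default_rng(0)
model = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
scrambled = rng.permutation(model.reshape(-1, 3)).reshape(32, 32, 3)
red = np.zeros((32, 32, 3), dtype=np.uint8)
red[..., 0] = 255

same = histogram_intersection(color_histogram(scrambled), color_histogram(model))
different = histogram_intersection(color_histogram(red), color_histogram(model))
```

The scrambled image matches the model perfectly even though it looks nothing like it, while the red image barely matches at all.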
Limitations of global appearance models
• Can work on relatively simple patterns
• Not robust to clutter, occlusion, lighting changes
Timeline of recognition
• 1965-late 1980s: alignment, geometric primitives
• Early 1990s: invariants, appearance-based methods
• Mid-late 1990s: sliding window approaches
– Classify each window separately
– Scale / orientation range to search over
Sliding window approaches
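The idea of classifying each window separately over a range of scales can be sketched as below. Everything specific is an assumption made for illustration: the window size, stride, and scales are arbitrary, and the mean-intensity "classifier" stands in for whatever learned scoring function a real detector would use. Orientation search is omitted.

```python
import numpy as np

def sliding_windows(image_shape, window=(32, 32), stride=8, scales=(1.0, 1.5, 2.0)):
    """Yield (y, x, h, w) boxes covering the image at several window scales."""
    height, width = image_shape
    for scale in scales:
        h = int(round(window[0] * scale))
        w = int(round(window[1] * scale))
        step = max(1, int(round(stride * scale)))
        for y in range(0, height - h + 1, step):
            for x in range(0, width - w + 1, step):
                yield y, x, h, w

def detect(image, score_fn, threshold):
    """Score every window independently and keep those at or above the threshold."""
    detections = []
    for y, x, h, w in sliding_windows(image.shape):
        score = score_fn(image[y:y + h, x:x + w])
        if score >= threshold:
            detections.append((y, x, h, w))
    return detections

# Hypothetical image: a bright 32x32 square on a dark background.
image = np.zeros((64, 64))
image[16:48, 24:56] = 1.0

detections = detect(image, score_fn=np.mean, threshold=0.99)
print(detections)  # [(16, 24, 32, 32)]
```

The number of windows grows with every extra scale or orientation searched, which is why the search range appears as a design question on the timeline.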
Scene-level context for image parsing
J. Tighe and S. Lazebnik, ECCV 2010 submission
D. Hoiem, A. Efros, and M. Hebert. Putting Objects in Perspective. CVPR 2006.
Geometric context
Timeline of recognition
• 1965-late 1980s: alignment, geometric primitives
• Early 1990s: invariants, appearance-based methods
• Mid-late 1990s: sliding window approaches
• Late 1990s: feature-based methods
Lowe’02
Mahamud & Hebert’03
Local features: combining local appearance, spatial constraints, invariants, and classification techniques from machine learning.
Schmid & Mohr’97
Local features for recognition of object instances
Specific Object Recognition
Timeline of recognition
• 1965-late 1980s: alignment, geometric primitives
• Early 1990s: invariants, appearance-based methods
• Mid-late 1990s: sliding window approaches
• Late 1990s: feature-based methods
• Early 2000s – present: parts-and-shape models
Parts and Structure approaches
With a different perspective, these models focused more on the geometry than on defining the constituent elements:
• Fischler & Elschlager 1973
• Yuille ’91
• Brunelli & Poggio ’93
• Lades, v.d. Malsburg et al. ’93
• Cootes, Lanitis, Taylor et al. ’95
• Amit & Geman ’95, ’99
• Perona et al. ’95, ’96, ’98, ’00, ’03, ’04, ’05
• Felzenszwalb & Huttenlocher ’00, ’04
• Crandall & Huttenlocher ’05, ’06
• Leibe & Schiele ’03, ’04
• Many papers since 2000
Figure from [Fischler & Elschlager 73]
Representing categories: Parts and Structure
Weber, Welling & Perona (2000), Fergus, Perona & Zisserman (2003)
Representation
• Object as set of parts
– Generative representation
• Model:
– Relative locations between parts
– Appearance of part
• Issues:
– How to model location
– How to represent appearance
– Sparse or dense (pixels or regions)
– How to handle occlusion/clutter
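To make the two model ingredients concrete, the sketch below scores a candidate placement of parts as the sum of per-part appearance scores and a penalty on relative locations. It is a deliberately simplified illustration, not the model of any cited paper: the star-shaped layout around a reference part, the isotropic Gaussian on offsets, and every name and number are assumptions.

```python
import numpy as np

def configuration_score(part_locations, appearance_scores, mean_offsets, offset_std):
    """Score a placement of parts: appearance terms plus a relative-location term.

    part_locations:    (P, 2) candidate (x, y) of each part; part 0 is the reference.
    appearance_scores: (P,) log-score of each part's appearance at its location.
    mean_offsets:      (P, 2) expected offset of each part from the reference part.
    offset_std:        spread of the isotropic Gaussian assumed on each offset.
    """
    offsets = part_locations - part_locations[0]
    shape_term = -0.5 * ((offsets - mean_offsets) ** 2).sum() / offset_std ** 2
    return appearance_scores.sum() + shape_term

# Hypothetical three-part model and two candidate placements.
mean_offsets = np.array([[0.0, 0.0], [10.0, 0.0], [5.0, 15.0]])
appearance = np.array([-1.0, -1.0, -1.0])
good = np.array([[10.0, 10.0], [20.0, 10.0], [15.0, 25.0]])
bad = np.array([[10.0, 10.0], [20.0, 10.0], [15.0, 5.0]])

score_good = configuration_score(good, appearance, mean_offsets, offset_std=5.0)
score_bad = configuration_score(bad, appearance, mean_offsets, offset_std=5.0)
score_shifted = configuration_score(good + 100.0, appearance, mean_offsets, offset_std=5.0)
```

Because only offsets relative to the reference part enter the score, translating the whole configuration leaves it unchanged, while moving a single part away from its expected position lowers it.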
Timeline of recognition
• 1965-late 1980s: alignment, geometric primitives
• Early 1990s: invariants, appearance-based methods
• Mid-late 1990s: sliding window approaches
• Late 1990s: feature-based methods
• Early 2000s – present: parts-and-shape models
• 2003 – present: bags of features
Object
Bag of ‘words’
Bag-of-features models
Objects as texture
• All of these are treated as being the same
• No distinction between foreground and background: scene recognition?
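The sense in which differently arranged images are "treated as being the same" can be shown with a small sketch: local descriptors are quantized against a codebook and only the counts are kept. The random codebook and descriptors are placeholders (in practice the codebook would typically be learned by clustering descriptors from training images), and the function name is illustrative.

```python
import numpy as np

def bag_of_words(descriptors, codebook):
    """Normalized histogram of nearest-codeword assignments for local descriptors."""
    # Squared Euclidean distance from every descriptor to every codeword.
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

# Hypothetical data: 5 codewords and 50 local descriptors of dimension 16.
rng = np.random.default_rng(0)
codebook = rng.random((5, 16))
descriptors = rng.random((50, 16))

original = bag_of_words(descriptors, codebook)
# Reordering descriptors mimics moving features around the image:
# the representation keeps no record of where each feature was found.
rearranged = bag_of_words(descriptors[rng.permutation(50)], codebook)
```

Both histograms are identical, so any classifier built on them cannot tell the two arrangements apart.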
Timeline of recognition
• 1965-late 1980s: alignment, geometric primitives
• Early 1990s: invariants, appearance-based methods
• Mid-late 1990s: sliding window approaches
• Late 1990s: feature-based methods
• Early 2000s – present: parts-and-shape models
• 2003 – present: bags of features
• Present trends: combination of local and global methods, modeling context, integrating recognition and segmentation
Global models?
• The “gist” of a scene: Oliva & Torralba (2001)
J. Hays and A. Efros, Scene Completion using Millions of Photographs, SIGGRAPH 2007
NIPS 2007
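Object categorization: the statistical viewpoint
$p(\text{zebra} \mid \text{image})$ vs. $p(\text{no zebra} \mid \text{image})$
• Bayes rule:
$$\underbrace{\frac{p(\text{zebra} \mid \text{image})}{p(\text{no zebra} \mid \text{image})}}_{\text{posterior ratio}} = \underbrace{\frac{p(\text{image} \mid \text{zebra})}{p(\text{image} \mid \text{no zebra})}}_{\text{likelihood ratio}} \cdot \underbrace{\frac{p(\text{zebra})}{p(\text{no zebra})}}_{\text{prior ratio}}$$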
• Discriminative methods model posterior
• Generative methods model likelihood and prior
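A small numeric example may help fix the roles of the three ratios. The probabilities below are invented for illustration only; the function name is likewise hypothetical.

```python
def posterior_ratio(lik_zebra, lik_no_zebra, prior_zebra):
    """Posterior ratio = likelihood ratio * prior ratio (Bayes rule in ratio form)."""
    likelihood_ratio = lik_zebra / lik_no_zebra
    prior_ratio = prior_zebra / (1.0 - prior_zebra)
    return likelihood_ratio * prior_ratio

# Made-up numbers: the image is 20 times more likely under "zebra",
# but zebras appear in only 1% of images.
ratio = posterior_ratio(lik_zebra=0.02, lik_no_zebra=0.001, prior_zebra=0.01)

# Convert the ratio back to a probability: p = r / (1 + r).
posterior_zebra = ratio / (1.0 + ratio)
print(round(posterior_zebra, 3))  # 0.168
```

Even with a likelihood ratio of 20, the low prior keeps the posterior ratio below 1, so "no zebra" remains the more probable answer.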
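Discriminative
• Direct modeling of $\frac{p(\text{zebra} \mid \text{image})}{p(\text{no zebra} \mid \text{image})}$
[Figure: decision boundary separating zebra from non-zebra examples]
Generative
• Model $p(\text{image} \mid \text{zebra})$ and $p(\text{image} \mid \text{no zebra})$
[Figure: example images whose likelihood under each model is rated Low, Middle, or High]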
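The contrast between the two families can be sketched on toy data. Here a one-dimensional number stands in for an image; the generative route fits one Gaussian per class plus a prior and classifies with the resulting ratio, while the discriminative route fits logistic regression to the posterior directly. The data, the Gaussian and logistic model choices, and the learning rate are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical 1-D "image feature": zebra samples near +2, non-zebra samples near -2.
x_pos = rng.normal(2.0, 1.0, 200)
x_neg = rng.normal(-2.0, 1.0, 200)
x = np.concatenate([x_pos, x_neg])
y = np.concatenate([np.ones(200), np.zeros(200)])

def gaussian_logpdf(values, mean, std):
    return -0.5 * ((values - mean) / std) ** 2 - np.log(std) - 0.5 * np.log(2 * np.pi)

# Generative: model p(x | class) per class and the prior, then form the log posterior ratio.
prior_pos = y.mean()
log_ratio = (gaussian_logpdf(x, x_pos.mean(), x_pos.std())
             - gaussian_logpdf(x, x_neg.mean(), x_neg.std())
             + np.log(prior_pos) - np.log(1.0 - prior_pos))
gen_pred = log_ratio > 0

# Discriminative: model p(class | x) directly with logistic regression,
# fitted by gradient ascent on the log-likelihood of the labels.
w, b = 0.0, 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(w * x + b)))
    w += 0.1 * np.mean((y - p) * x)
    b += 0.1 * np.mean(y - p)
disc_pred = (w * x + b) > 0

gen_acc = np.mean(gen_pred == y)
disc_acc = np.mean(disc_pred == y)
```

On data this simple both routes place the boundary near zero; the difference is in what is modeled, since only the generative version yields class-conditional densities of the input.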
Three main issues
• Representation
– How to represent an object category
• Learning
– How to form the classifier, given training data
• Recognition
– How the classifier is to be used on novel data
Representation
– Generative / discriminative / hybrid
– Appearance only or location and appearance
– Invariances
• Viewpoint
• Illumination
• Occlusion
• Scale
• Deformation
• Clutter
• etc.
Representation
– Generative / discriminative / hybrid
– Appearance only or location and appearance
– Invariances
– Parts or global with sub-window
– Use a set of features or each pixel in the image
Learning
– Unclear how to model categories, so we learn what distinguishes them rather than manually specifying the difference (hence the current interest in machine learning)
– Methods of training: generative vs. discriminative
– What are you maximizing? Likelihood (generative) or performance on a train/validation set (discriminative)
– Level of supervision
• Manual segmentation; bounding box; image labels; noisy labels
[Figure: image labelled “Contains a motorbike”]
– Batch/incremental (on category and image level; user feedback)
– Training images:
• Issue of overfitting
• Negative images for discriminative methods
– Priors
[Figure: category hierarchy. OBJECTS: ANIMALS, PLANTS, INANIMATE. ANIMALS: VERTEBRATE, ….. INANIMATE: NATURAL, MAN-MADE. VERTEBRATE: MAMMALS, BIRDS. Leaf examples: TAPIR, BOAR, GROUSE, CAMERA.]