Perceptual and Sensory Augmented Computing
Visual Object Recognition Tutorial
Visual Object Recognition
Bastian Leibe, Computer Vision Laboratory, ETH Zurich
Kristen Grauman, Department of Computer Sciences, University of Texas at Austin
Chicago, 14.07.2008
Identification vs. Categorization
Object Categorization
• How to recognize ANY car
• How to recognize ANY cow
What could be done with recognition algorithms?
There is a wide range of applications, including…
• Medical image analysis
• Navigation, driver safety
• Autonomous robots
• Situated search
• Content-based retrieval and analysis for images and videos
Object Categorization
• Task Description
  ◦ “Given a small number of training images of a category, recognize a-priori unknown instances of that category and assign the correct category label.”
• Which categories are feasible visually?
  ◦ Extensively studied in Cognitive Psychology, e.g. [Brown’58]
[Figure labels: “Fido”, German shepherd, dog, animal, living being]
Visual Object Categories
• Basic Level Categories in human categorization [Rosch 76, Lakoff 87]
  ◦ The highest level at which category members have similar perceived shape
  ◦ The highest level at which a single mental image reflects the entire category
  ◦ The level at which human subjects are usually fastest at identifying category members
  ◦ The first level named and understood by children
  ◦ The highest level at which a person uses similar motor actions for interaction with category members
Visual Object Categories
• Basic-level categories in humans seem to be defined predominantly visually.
• There is evidence that humans (usually) start with basic-level categorization before doing identification.
⇒ Basic-level categorization is easier and faster for humans than object identification!
⇒ Most promising starting point for visual classification
[Figure: category hierarchy, from abstract levels (animal, quadruped, …) through the basic level (dog, cat, cow, …) and breeds (German shepherd, Doberman, …) down to the individual level (“Fido”)]
Other Types of Categories
• Functional Categories
  ◦ e.g. chairs = “something you can sit on”
Other Types of Categories
• Ad-hoc categories
  ◦ e.g. “something you can find in an office environment”
Levels of Object Categorization
“cow”
“motorbike”
“car”
• Different levels of recognition
  ◦ Which object class is in the image? ⇒ Object/image classification
  ◦ Where is it in the image? ⇒ Detection/localization
  ◦ Where exactly, which pixels? ⇒ Figure/ground segmentation
Challenges: robustness
[Figure panels: illumination, object pose, clutter, viewpoint, intra-class appearance, occlusions]
Challenges: robustness
• Detection in Crowded Scenes
  ◦ Learn object variability
    – Changes in appearance, scale, and articulation
  ◦ Compensate for clutter, overlap, and occlusion
Challenges: context and human experience
[Figure panels: context cues, dynamics]
Image credit: D. Hoiem. Video credit: J. Davis
Challenges: scale, efficiency
• Thousands to millions of pixels in an image
• Estimated 30 Gigapixels of image/video content generated per second
• About half of the cerebral cortex in primates is devoted to processing visual information [Felleman and van Essen 1991]
• 3,000-30,000 human recognizable object categories
• 30+ degrees of freedom in the pose of articulated objects (humans)
• Billions of images indexed by Google Image Search
• 18 billion+ prints produced from digital camera images in 2004
• 295.5 million camera phones sold in 2005
Challenges: learning with minimal supervision
[Figure axis labels: Less, More]
Rough evolution of focus in recognition research
1980s → 1990s to early 2000s → Currently
This tutorial
• Intended for broad AAAI audience
  ◦ Assuming basic familiarity with machine learning, linear algebra, probability
  ◦ Not assuming significant vision background
• Our goals
  ◦ Describe main approaches to recognition
  ◦ Highlight past successes and future challenges
  ◦ Provide the pointers (to literature and tools) that would allow you to take advantage of existing techniques in your research
• Questions welcome
Outline
1. Detection with Global Appearance & Sliding Windows
2. Local Invariant Features: Detection & Description
3. Specific Object Recognition with Local Features
― Coffee Break ―
4. Visual Words: Indexing, Bags of Words Categorization
5. Matching Local Features
6. Part-Based Models for Categorization
7. Current Challenges and Research Directions
1. Detection with Global Appearance & Sliding Windows
Detection via classification: Main idea
Basic component: a binary classifier
[Figure: a car/non-car classifier answers “Yes, car.” or “No, not a car.” for each input image]
If the object may be in a cluttered scene, slide a window around the image looking for it.
Fleshing out this pipeline a bit more, we need to:
1. Obtain training data
2. Define features
3. Define classifier
[Figure: training examples → feature extraction → car/non-car classifier]
• Consider all subwindows in an image
  ◦ Sample at multiple scales and positions
• Make a decision per window:
  ◦ “Does this contain object category X or not?”
• In this section, we’ll focus specifically on methods using a global representation (i.e., not part-based, not local features).
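The window enumeration described above can be sketched in a few lines. This is a minimal illustration using only the Python standard library; the function names, the 24-pixel base window, the scale set, and the stride are all assumptions chosen for the example, not values from the tutorial.

```python
def sliding_windows(img_w, img_h, base=24, scales=(1.0, 1.5, 2.0), stride=8):
    """Yield (x, y, size) for square windows at several scales and positions.

    base, scales and stride are illustrative defaults (assumptions).
    """
    for s in scales:
        size = int(round(base * s))
        step = max(1, int(round(stride * s)))  # coarser steps for larger windows
        for y in range(0, img_h - size + 1, step):
            for x in range(0, img_w - size + 1, step):
                yield x, y, size


def detect(image, classify, **kw):
    """Apply a binary classifier to every window; image is a list of pixel rows."""
    h, w = len(image), len(image[0])
    hits = []
    for x, y, size in sliding_windows(w, h, **kw):
        window = [row[x:x + size] for row in image[y:y + size]]
        if classify(window):  # "Does this contain object category X or not?"
            hits.append((x, y, size))
    return hits
```

Any callable that maps a window to True/False can be passed as `classify`; the classifiers discussed later in this section would take that role.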
Feature extraction: global appearance
Simple holistic descriptions of image content:
  ◦ grayscale / color histogram
  ◦ vector of pixel intensities
Eigenfaces: global appearance description
An early appearance-based approach to face recognition [Turk & Pentland, 1991]
Generate low-dimensional representation of appearance with a linear subspace.
  ◦ From the training images, compute the mean and the eigenvectors of the covariance matrix.
  ◦ Project new images to “face space”.
  ◦ Recognition via nearest neighbors in face space.
[Figure: a face image ≈ mean + a weighted sum of the leading eigenvectors]
Feature extraction: global appearance
• Pixel-based representations are sensitive to small shifts
• Color- or grayscale-based appearance descriptions can be sensitive to illumination and intra-class appearance variation
Cartoon example: an albino koala
Gradient-based representations
• Consider edges, contours, and (oriented) intensity gradients
Gradient-based representations: Matching edge templates
• Example: Chamfer matching
At each window position, compute the average minimum distance between points on the template (T) and the input (I) [Gavrila & Philomin, ICCV 1999].
[Figure panels: input image, edges detected, distance transform, template shape, best match]
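The chamfer score can be read directly off a distance transform of the edge map. Below is a minimal sketch under simplifying assumptions: edge points are given as (x, y) lists, the distance transform is computed by brute force rather than with a fast two-pass algorithm, and only translations of the template are searched. All function names are illustrative.

```python
import math


def distance_transform(edge_points, width, height):
    """Brute-force distance transform: distance from each pixel to the nearest edge point."""
    return [[min(math.hypot(x - ex, y - ey) for ex, ey in edge_points)
             for x in range(width)] for y in range(height)]


def chamfer_score(dt, template_points, ox, oy):
    """Average distance-transform value under the template shifted by (ox, oy)."""
    return sum(dt[y + oy][x + ox] for x, y in template_points) / len(template_points)


def best_match(dt, template_points, width, height):
    """Try every translation that keeps the template inside the image; lowest score wins."""
    tw = max(x for x, _ in template_points) + 1
    th = max(y for _, y in template_points) + 1
    candidates = ((chamfer_score(dt, template_points, ox, oy), ox, oy)
                  for oy in range(height - th + 1)
                  for ox in range(width - tw + 1))
    return min(candidates)  # (score, ox, oy)
```

Because the distance transform is computed once per image, each window evaluation costs only one lookup per template point.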
• Chamfer matching with a hierarchy of templates [Gavrila & Philomin, ICCV 1999]
• Summarize local distribution of gradients with histogram
  ◦ Locally orderless: offers invariance to small shifts and rotations
  ◦ Contrast-normalization: try to correct for variable illumination
Gradient-based representations: Histograms of oriented gradients (HoG)
Dalal & Triggs, CVPR 2005
Map each grid cell in the input window to a histogram counting the gradients per orientation.
Code available: http://pascal.inrialpes.fr/soft/olt/
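To make the per-cell histogram concrete, here is a small sketch for a single grid cell. It is not the Dalal & Triggs implementation: the 9 unsigned orientation bins, the magnitude-weighted voting, the central-difference gradients, and the simple L2 normalization are assumptions chosen to keep the example short.

```python
import math


def cell_histogram(cell, bins=9):
    """Orientation histogram for one grid cell (a list of rows of gray values).

    Each interior pixel votes its gradient magnitude into an
    unsigned-orientation bin covering [0, 180) degrees (an assumed design).
    """
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    for y in range(1, len(cell) - 1):
        for x in range(1, len(cell[0]) - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]
            gy = cell[y + 1][x] - cell[y - 1][x]
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % bins] += magnitude
    return hist


def l2_normalize(hist, eps=1e-6):
    """Contrast normalization: scale the histogram to (approximately) unit length."""
    norm = math.sqrt(sum(v * v for v in hist) + eps)
    return [v / norm for v in hist]
```

A full descriptor would concatenate the normalized histograms of all cells in the detection window.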
Gradient-based representations: SIFT descriptor
Lowe, ICCV 1999
Local patch descriptor (more on this later)
Code: http://vision.ucla.edu/~vedaldi/code/sift/sift.html
Binary: http://www.cs.ubc.ca/~lowe/keypoints/
Gradient-based representations: Biologically inspired features
Serre, Wolf, Poggio, CVPR 2005; Mutch & Lowe, CVPR 2006
  ◦ Convolve with Gabor filters at multiple orientations
  ◦ Pool nearby units (max)
  ◦ Intermediate layers compare input to prototype patches
Gradient-based representations: Rectangular features
Viola & Jones, CVPR 2001
  ◦ Compute differences between sums of pixels in rectangles
  ◦ Captures contrast in adjacent spatial regions
  ◦ Similar to Haar wavelets, efficient to compute
Gradient-based representations: Shape context descriptor
Belongie, Malik & Puzicha, ICCV 2001
Local descriptor (more on this later)
Count the number of points inside each bin, e.g.: Count = 4, Count = 10, ...
Log-polar binning: more precision for nearby points, more flexibility for farther points.
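The log-polar counting can be sketched as follows. This is an illustration only: the bin counts, the radial range `r_min`/`r_max`, and the function name are assumptions, and the published descriptor additionally normalizes distances by the mean point spacing, which is omitted here.

```python
import math


def shape_context(points, index, r_bins=4, theta_bins=8, r_min=1.0, r_max=16.0):
    """Log-polar histogram of all other points relative to points[index].

    Returns hist[radial_bin][angular_bin] with point counts.
    Bin layout and ranges are illustrative assumptions.
    """
    px, py = points[index]
    hist = [[0] * theta_bins for _ in range(r_bins)]
    log_span = math.log(r_max / r_min)
    for i, (x, y) in enumerate(points):
        if i == index:
            continue
        r = math.hypot(x - px, y - py)
        if r >= r_max:
            continue  # outside the descriptor's support
        # Log-spaced radial bins: narrow rings near the center, wide rings far away.
        rb = 0 if r <= r_min else int(r_bins * math.log(r / r_min) / log_span)
        theta = math.atan2(y - py, x - px) % (2 * math.pi)
        tb = int(theta / (2 * math.pi) * theta_bins) % theta_bins
        hist[min(rb, r_bins - 1)][tb] += 1
    return hist
```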
Classifier construction
• How to compute a decision for each subwindow?
Discriminative vs. generative models
Generative: separately model class-conditional and prior densities: Pr(image, car) and Pr(image, ¬car)
Discriminative: directly model posterior: Pr(car | image) and Pr(¬car | image)
[Figure: both sets of curves plotted against an image feature (x = data). Plots from Antonio Torralba 2007]
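The two views are linked by Bayes' rule. As a worked identity (standard probability, not taken from the slide), a generative model's class-conditional and prior densities determine the posterior that a discriminative model estimates directly:

```latex
\Pr(\text{car} \mid \text{image})
  = \frac{\Pr(\text{image}, \text{car})}
         {\Pr(\text{image}, \text{car}) + \Pr(\text{image}, \neg\text{car})},
\qquad
\Pr(\text{image}, \text{car}) = \Pr(\text{image} \mid \text{car}) \, \Pr(\text{car})
```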
• Generative:
  + possibly interpretable
  + can draw samples
  - models variability unimportant to classification task
  - often hard to build good model with few parameters
• Discriminative:
  + appealing when infeasible to model data itself
  + excel in practice
  - often can’t provide uncertainty in predictions
  - non-interpretable
Discriminative methods
• Nearest neighbor (10^6 examples): Shakhnarovich, Viola, Darrell 2003; Berg, Berg, Malik 2005; ...
• Neural networks: LeCun, Bottou, Bengio, Haffner 1998; Rowley, Baluja, Kanade 1998; …
• Support Vector Machines: Guyon, Vapnik; Heisele, Serre, Poggio 2001; …
• Boosting: Viola, Jones 2001; Torralba et al. 2004; Opelt et al. 2006; …
• Conditional Random Fields: McCallum, Freitag, Pereira 2000; Kumar, Hebert 2003; …
Slide adapted from Antonio Torralba
Boosting
• Build a strong classifier by combining a number of “weak classifiers”, which need only be better than chance
• Sequential learning process: at each iteration, add a weak classifier
• Flexible to choice of weak learner
  ◦ including fast simple classifiers that alone may be inaccurate
• We’ll look at Freund & Schapire’s AdaBoost algorithm
  ◦ Easy to implement
  ◦ Base learning algorithm for Viola-Jones face detector
AdaBoost: Intuition
Figure adapted from Freund and Schapire
Consider a 2-d feature space with positive and negative examples.
Each weak classifier splits the training examples with at least 50% accuracy.
Examples misclassified by a previous weak learner are given more emphasis at future rounds.
Final classifier is combination of the weak classifiers
AdaBoost Algorithm [Freund & Schapire 1995]
• Start with uniform weights on training examples {x1, …, xn}
• In each round, evaluate weighted error for each feature, pick best.
• Re-weight the examples:
  ◦ Incorrectly classified -> more weight
  ◦ Correctly classified -> less weight
• Final classifier is combination of the weak ones, weighted according to error they had.
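The loop above can be sketched for the simplest possible setting. This is a minimal illustration, assuming 1-D inputs, labels in {+1, -1}, and threshold "decision stumps" as the weak learners; the function names and the `0.5 * log((1 - err) / err)` vote are the textbook discrete-AdaBoost form, not code from the tutorial.

```python
import math


def best_stump(xs, ys, w):
    """Pick the threshold/polarity stump with the lowest weighted error."""
    best = None
    for theta in sorted(set(xs)):
        for polarity in (+1, -1):
            preds = [polarity if x >= theta else -polarity for x in xs]
            err = sum(wi for wi, p, y in zip(w, preds, ys) if p != y)
            if best is None or err < best[0]:
                best = (err, theta, polarity)
    return best


def adaboost(xs, ys, rounds=3):
    n = len(xs)
    w = [1.0 / n] * n  # start with uniform weights
    ensemble = []
    for _ in range(rounds):
        err, theta, polarity = best_stump(xs, ys, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)  # lower error -> larger vote
        ensemble.append((alpha, theta, polarity))
        # Misclassified examples gain weight, correctly classified ones lose weight.
        w = [wi * math.exp(-alpha * y * (polarity if x >= theta else -polarity))
             for wi, x, y in zip(w, xs, ys)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, x):
    """Weighted vote of the weak classifiers."""
    score = sum(a * (p if x >= t else -p) for a, t, p in ensemble)
    return 1 if score >= 0 else -1
```

On data that no single threshold can separate, a few rounds of reweighting are often enough for the combined vote to fit the training set, which is the behavior the intuition slides illustrate.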
Cascading classifiers for detection
For efficiency, apply less accurate but faster classifiers first to immediately discard windows that clearly appear to be negative; e.g.,
  ◦ Filter for promising regions with an initial inexpensive classifier
  ◦ Build a chain of classifiers, choosing cheap ones with low false negative rates early in the chain
Fleuret & Geman, IJCV 2001; Rowley et al., PAMI 1998; Viola & Jones, CVPR 2001
Figure from Viola & Jones CVPR 2001
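The early-rejection logic of a cascade is short enough to write out. In this sketch the two stage functions are hypothetical stand-ins (simple brightness and contrast checks) for trained classifiers; only the control flow is the point.

```python
def cascade_classify(window, stages):
    """Run stages in order; reject as soon as any stage says 'negative'.

    Returns (accepted, number_of_stages_evaluated).
    """
    evaluated = 0
    for stage in stages:
        evaluated += 1
        if not stage(window):
            return False, evaluated  # early rejection: later stages never run
    return True, evaluated


# Hypothetical stages, ordered from cheapest to most expensive.
def mean_brightness_ok(window):
    flat = [p for row in window for p in row]
    return sum(flat) / len(flat) > 0.2


def has_contrast(window):
    flat = [p for row in window for p in row]
    return max(flat) - min(flat) > 0.5


stages = [mean_brightness_ok, has_contrast]
```

Since most windows in an image are background, the average cost per window is dominated by the first, cheapest stages.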
Example: Face detection
• Frontal faces are a good example of a class where global appearance models + a sliding window detection approach fit well:
  ◦ Regular 2D structure
  ◦ Center of face almost shaped like a “patch”/window
• Now we’ll take AdaBoost and see how the Viola-Jones face detector works
Feature extraction
“Rectangular” filters [Viola & Jones, CVPR 2001]
  ◦ Feature output is difference between adjacent regions
  ◦ Efficiently computable with integral image: any sum can be computed in constant time
  ◦ Avoid scaling images: scale features directly for same cost
Integral image
  ◦ Value at (x,y) is sum of pixels above and to the left of (x,y)
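A short sketch shows why any rectangle sum costs four lookups. The function names are illustrative, and the table here is padded with one extra row and column of zeros (an implementation convenience, so `ii[y][x]` covers pixels strictly above and to the left), which is an assumption rather than the exact convention of the original paper.

```python
def integral_image(img):
    """ii[y][x] = sum of img over rows < y and columns < x (zero-padded by one)."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the w-by-h rectangle with top-left corner (x, y): four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii, x, y, w, h):
    """Left half minus right half of a (2w)-by-h region: one 'rectangular' filter."""
    return rect_sum(ii, x, y, w, h) - rect_sum(ii, x + w, y, w, h)
```

Scaling a feature only changes `w` and `h`, so a larger filter costs the same as a small one and the image itself never needs to be resized.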
Large library of filters
Considering all possible filter parameters: position, scale, and type:
180,000+ possible features associated with each 24 x 24 window
Use AdaBoost both to select the informative features and to form the classifier
Viola & Jones, CVPR 2001
AdaBoost for feature+classifier selection
• Want to select the single rectangle feature and threshold that best separates positive (faces) and negative (non-faces) training examples, in terms of weighted error.
[Figure: outputs of a possible rectangle feature on faces and non-faces]
Resulting weak classifier:
For next round, reweight the examples according to errors, choose another filter/threshold combo.
Viola & Jones, CVPR 2001
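The formula for the weak classifier did not survive extraction. One common way to write a thresholded single-feature classifier is shown below as an assumed form, where f_t is the selected rectangle feature and θ_t its threshold (both symbols are introduced here for illustration); the published detector additionally allows the inequality to be flipped by a polarity term.

```latex
h_t(x) =
\begin{cases}
  +1 & \text{if } f_t(x) > \theta_t \\
  -1 & \text{otherwise}
\end{cases}
```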
Viola-Jones Face Detector: Summary
• Train with 5K positives, 350M negatives
• Real-time detector using 38 layer cascade
• 6061 features in final layer
• [Implementation available in OpenCV: http://www.intel.com/technology/computing/opencv/]
[Figure: faces and non-faces are used to train a cascade of classifiers with AdaBoost, yielding selected features, thresholds, and weights; these are applied to each subwindow of a new image]
Viola-Jones Face Detector: Results
First two features selected
Profile Features
Detecting profile faces requires training a separate detector with profile examples.
Paul Viola, ICCV tutorial
Example application
Frontal faces detected and then tracked, character names inferred with alignment of script and subtitles.
Everingham, M., Sivic, J. and Zisserman, A. “Hello! My name is... Buffy” - Automatic naming of characters in TV video, BMVC 2006. http://www.robots.ox.ac.uk/~vgg/research/nface/index.html
Pedestrian detection
• Detecting upright, walking humans is also possible using the sliding window’s appearance/texture; e.g.,
  ◦ SVM with Haar wavelets [Papageorgiou & Poggio, IJCV 2000]
  ◦ Space-time rectangle features [Viola, Jones & Snow, ICCV 2003]
  ◦ SVM with HoGs [Dalal & Triggs, CVPR 2005]
Highlights
• Sliding window detection and global appearance descriptors:
  ◦ Simple detection protocol to implement
  ◦ Good feature choices critical
  ◦ Past successes for certain classes
Limitations
• High computational complexity
  ◦ For example: 250,000 locations x 30 orientations x 4 scales = 30,000,000 evaluations!
  ◦ If binary detectors are trained independently, cost increases linearly with the number of classes
• With so many windows, false positive rate better be low
Limitations (continued)
• Not all objects are “box” shaped
Limitations (continued)
• Non-rigid, deformable objects not captured well with representations assuming a fixed 2d structure; or must assume fixed viewpoint
• Objects with less-regular textures not captured well with holistic appearance-based descriptions
Limitations (continued)
• If considering windows in isolation, context is lost
[Figure panels: sliding window; detector’s view]
Figure credit: Derek Hoiem
Limitations (continued)
• In practice, often entails large, cropped training set (expensive)
• Requiring good match to a global appearance description can lead to sensitivity to partial occlusions
Image credit: Adam, Rivlin, & Shimshoni
2. Local Invariant Features: Detection & Description
Motivation
• Global representations have major limitations
• Instead, describe and match only local regions
• Increased robustness to
  ◦ Occlusions
  ◦ Articulation
  ◦ Intra-category variations
Approach
1. Find a set of distinctive keypoints
2. Define a region around each keypoint
3. Extract and normalize the region content (e.g. to N × N pixels)
4. Compute a local descriptor from the normalized region (e.g. color): f_A, f_B
5. Match local descriptors with a similarity measure: d(f_A, f_B) < T
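Step 5 can be illustrated with a tiny matcher. This sketch assumes descriptors are plain lists of floats, uses Euclidean distance for d, and keeps a match only if the nearest neighbor is closer than the threshold T; the function names and the nearest-neighbor strategy are illustrative choices, not the tutorial's prescribed method.

```python
import math


def euclidean(fa, fb):
    """Distance d(f_A, f_B) between two descriptors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(fa, fb)))


def match_descriptors(desc_a, desc_b, threshold):
    """For each descriptor in A, keep its nearest neighbor in B if d(f_A, f_B) < T.

    Returns a list of (index_in_a, index_in_b, distance).
    """
    matches = []
    for i, fa in enumerate(desc_a):
        dist, j = min((euclidean(fa, fb), j) for j, fb in enumerate(desc_b))
        if dist < threshold:
            matches.append((i, j, dist))
    return matches
```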
Requirements
• Region extraction needs to be repeatable and precise
  ◦ Translation, rotation, scale changes
  ◦ (Limited out-of-plane (≈ affine) transformations)
  ◦ Lighting variations
• We need a sufficient number of regions to cover the object
• The regions should contain “interesting” structure
Many Existing Detectors Available
• Hessian & Harris [Beaudet ‘78], [Harris ‘88]
• Laplacian, DoG [Lindeberg ‘98], [Lowe 1999]
• Harris-/Hessian-Laplace [Mikolajczyk & Schmid ‘01]
• Harris-/Hessian-Affine [Mikolajczyk & Schmid ‘04]
• EBR and IBR [Tuytelaars & Van Gool ‘04]
• MSER [Matas ‘02]
• Salient Regions [Kadir & Brady ‘01]
• Others…
Keypoint Localization
• Goals:
  ◦ Repeatable detection
  ◦ Precise localization
  ◦ Interesting content
⇒ Look for two-dimensional signal changes
Hessian Detector [Beaudet78]
• Hessian determinant
$$ \mathrm{Hessian}(I) = \begin{bmatrix} I_{xx} & I_{xy} \\ I_{xy} & I_{yy} \end{bmatrix} $$
Intuition: Search for strong derivatives in two orthogonal directions
[Figure: image I and its second derivatives Ixx, Ixy, Iyy]
$$ \det(\mathrm{Hessian}(I)) = I_{xx} I_{yy} - I_{xy}^2 $$
In Matlab: Ixx.*Iyy - Ixy.^2
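For readers without Matlab, the same determinant can be sketched in plain Python. This is a simplified illustration: second derivatives are taken with central finite differences on the raw image, whereas a practical detector would first smooth with a Gaussian; the function name is an assumption.

```python
def hessian_determinant(img):
    """det(Hessian) = Ixx * Iyy - Ixy**2 at interior pixels (borders stay 0).

    img is a list of rows of gray values; derivatives are finite differences.
    """
    h, w = len(img), len(img[0])
    det = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ixx = img[y][x + 1] - 2 * img[y][x] + img[y][x - 1]
            iyy = img[y + 1][x] - 2 * img[y][x] + img[y - 1][x]
            ixy = (img[y + 1][x + 1] - img[y + 1][x - 1]
                   - img[y - 1][x + 1] + img[y - 1][x - 1]) / 4.0
            det[y][x] = ixx * iyy - ixy ** 2
    return det
```

A blob-like spot curves in both directions and gives a large positive response, while a straight edge has curvature in only one direction and gives a response near zero.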
Hessian Detector – Responses [Beaudet78]
Effect: Responses mainly on corners and strongly textured areas.
Harris Detector [Harris88]
• Second moment matrix (autocorrelation matrix)
$$ \mu(\sigma_I, \sigma_D) = g(\sigma_I) * \begin{bmatrix} I_x^2(\sigma_D) & I_x I_y(\sigma_D) \\ I_x I_y(\sigma_D) & I_y^2(\sigma_D) \end{bmatrix} $$
Intuition: Search for local neighborhoods where the image content has two main directions (eigenvectors).
Perceptual and Sensory Augmented Computing
Vis
ua
l O
bje
ct
Re
co
gn
itio
n T
uto
ria
l
Harris Detector [Harris88]
• Second moment matrix(autocorrelation matrix)
I I
∗=
)()(
)()()(),(
2
2
DyDyx
DyxDx
IDIIII
IIIg
σσ
σσσσσµ
Perceptual and Sensory Augmented Computing
Vis
ua
l O
bje
ct
Re
co
gn
itio
n T
uto
ria
l
13K. Grauman, B. Leibe
1. Image derivatives
gx(σD), gy(σD),
IxIy
Perceptual and Sensory Augmented Computing
Vis
ua
l O
bje
ct
Re
co
gn
itio
n T
uto
ria
l
Harris Detector [Harris88]
• Second moment matrix(autocorrelation matrix)
∗=
)()(
)()()(),(
2
2
DyDyx
DyxDx
IDIIII
IIIg
σσ
σσσσσµ
I I
Perceptual and Sensory Augmented Computing
Vis
ua
l O
bje
ct
Re
co
gn
itio
n T
uto
ria
l
14K. Grauman, B. Leibe
1. Image derivatives
gx(σD), gy(σD),
IxIy
14
2. Square of
derivatives
Ix2 Iy
2 IxIy
Perceptual and Sensory Augmented Computing
Vis
ua
l O
bje
ct
Re
co
gn
itio
n T
uto
ria
l
Harris Detector [Harris88]
• Second moment matrix(autocorrelation matrix)
I
∗=
)()(
)()()(),(
2
2
DyDyx
DyxDx
IDIIII
IIIg
σσ
σσσσσµ
1. Image
derivatives
Ix Iy
Perceptual and Sensory Augmented Computing
Vis
ua
l O
bje
ct
Re
co
gn
itio
n T
uto
ria
l
1. Image derivatives
gx(σD), gy(σD),
2. Square of
derivatives
Iy
2. Square of
derivatives
3. Gaussian
filter g(σI)
Ix2 Iy
2 IxIy
g(Ix2) g(Iy
2) g(IxIy)
15
Perceptual and Sensory Augmented Computing
Vis
ua
l O
bje
ct
Re
co
gn
itio
n T
uto
ria
l
Harris Detector [Harris88]
• Second moment matrix(autocorrelation matrix)
I
∗=
)()(
)()()(),(
2
2
DyDyx
DyxDx
IDIIII
IIIg
σσ
σσσσσµ
1. Image
derivatives
2. Square of
derivatives
Ix Iy
Ix2 Iy
2 IxIy
Perceptual and Sensory Augmented Computing
Vis
ua
l O
bje
ct
Re
co
gn
itio
n T
uto
ria
l
Iy
g(IxIy)
16
derivatives
3. Gaussian
filter g(σI)g(Ix
2) g(Iy2) g(IxIy)
222222)]()([)]([)()( yxyxyx IgIgIIgIgIg +−− α
=−= ))],([trace()],(det[ DIDIhar σσµασσµ
4. Cornerness function – both eigenvalues are strong
har5. Non-maxima suppression
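The Harris steps 1 to 4 can be sketched in pure Python as follows. This is only an illustration under stated assumptions: central differences stand in for the Gaussian derivatives g_x(σ_D), g_y(σ_D), the value alpha = 0.06 is a commonly used choice rather than one given on the slide, and non-maxima suppression (step 5) is left out. The name `harris_response` is hypothetical.

```python
import math


def harris_response(img, sigma_i=1.0, alpha=0.06):
    h, w = len(img), len(img[0])
    # 1. Image derivatives (central differences, an assumed simplification).
    ix = [[0.0] * w for _ in range(h)]
    iy = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ix[y][x] = (img[y][x + 1] - img[y][x - 1]) / 2.0
            iy[y][x] = (img[y + 1][x] - img[y - 1][x]) / 2.0
    # 3. Sampled Gaussian window g(sigma_I), normalised by its sum below.
    r = max(1, int(round(2 * sigma_i)))
    g = {(dy, dx): math.exp(-(dx * dx + dy * dy) / (2.0 * sigma_i ** 2))
         for dy in range(-r, r + 1) for dx in range(-r, r + 1)}
    norm = sum(g.values())
    har = [[0.0] * w for _ in range(h)]
    for y in range(r, h - r):
        for x in range(r, w - r):
            a = b = c = 0.0
            for (dy, dx), wgt in g.items():
                gx, gy = ix[y + dy][x + dx], iy[y + dy][x + dx]
                # 2. Products of derivatives, accumulated under the window.
                a += wgt * gx * gx
                b += wgt * gy * gy
                c += wgt * gx * gy
            a, b, c = a / norm, b / norm, c / norm
            # 4. Cornerness: det(mu) - alpha * trace(mu)^2.
            har[y][x] = a * b - c * c - alpha * (a + b) ** 2
    return har


# Toy image: a bright square in the lower-right quadrant of a 12x12 image.
img = [[1.0 if (y >= 6 and x >= 6) else 0.0 for x in range(12)]
       for y in range(12)]
har = harris_response(img)
```

On this toy image the response is positive at the corner of the square, negative along its straight edges, and zero in flat regions.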
Harris Detector – Responses [Harris88]
Effect: A very precise corner detector.
Automatic Scale Selection
\[ f(I_{i_1 \dots i_m}(x, \sigma)) = f(I'_{i_1 \dots i_m}(x', \sigma')) \]
Same operator responses if the patch contains the same image up to scale factor
How to find corresponding patch sizes?
• Function responses for increasing scale (scale signature)
[Figure: scale signatures \(f(I_{i_1 \dots i_m}(x, \sigma))\) and \(f(I'_{i_1 \dots i_m}(x', \sigma'))\) plotted over scale for corresponding patches in two images]
What Is A Useful Signature Function?
• Laplacian-of-Gaussian = “blob” detector
Laplacian-of-Gaussian (LoG)
• Local maxima in scale space of Laplacian-of-Gaussian
\[ L_{xx}(\sigma) + L_{yy}(\sigma) \]
[Figure: LoG responses at increasing scales σ, σ², σ³, σ⁴, σ⁵]
⇒ List of (x, y, s)
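A small sketch of scale selection with the Laplacian-of-Gaussian, under two assumptions that are not on the slide: the response is scale-normalised by σ² (standard practice for comparing responses across scales), and the filter is evaluated at a single pixel rather than over the whole image. The function names and the synthetic blob are hypothetical.

```python
import math


def log_kernel_value(x, y, sigma):
    """Scale-normalised LoG kernel: sigma^2 * (G_xx + G_yy) at offset (x, y)."""
    r2 = x * x + y * y
    gauss = math.exp(-r2 / (2.0 * sigma ** 2)) / (2.0 * math.pi * sigma ** 2)
    return sigma ** 2 * (r2 - 2.0 * sigma ** 2) / sigma ** 4 * gauss


def log_response(img, cx, cy, sigma):
    """Filter response at pixel (cx, cy) for one scale."""
    r = int(math.ceil(4 * sigma))
    total = 0.0
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            y, x = cy + dy, cx + dx
            if 0 <= y < len(img) and 0 <= x < len(img[0]):
                total += log_kernel_value(dx, dy, sigma) * img[y][x]
    return total


def characteristic_scale(img, cx, cy, sigmas):
    """Scale with the strongest absolute response (peak of the signature)."""
    return max(sigmas, key=lambda s: abs(log_response(img, cx, cy, s)))


# Synthetic Gaussian blob with standard deviation 4, centred in an 81x81 image.
size, blob_sigma = 81, 4.0
c = size // 2
img = [[math.exp(-((x - c) ** 2 + (y - c) ** 2) / (2.0 * blob_sigma ** 2))
        for x in range(size)] for y in range(size)]
best = characteristic_scale(img, c, c, [2.0, 3.0, 4.0, 5.0, 6.0, 8.0])
```

For this blob the signature peaks at the candidate scale equal to the blob's own standard deviation, so the selected scale follows the size of the structure.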
Results: Laplacian-of-Gaussian
Difference-of-Gaussian (DoG)
• Difference of Gaussians as approximation of the Laplacian-of-Gaussian
[Figure: the difference of two Gaussians gives the DoG kernel]
DoG – Efficient Computation
• Computation in Gaussian scale pyramid
[Figure: Gaussian scale pyramid built on the original image, with scale factor \(\sigma = 2^{1/4}\) between levels and sampling with step \(\sigma^4 = 2\)]
Results: Lowe’s DoG
Harris-Laplace [Mikolajczyk ‘01]
1. Initialization: Multiscale Harris corner detection
[Figure: Harris function computed at scales σ, σ², σ³, σ⁴, followed by detection of local maxima]
2. Scale selection based on Laplacian (same procedure with Hessian ⇒ Hessian-Laplace)
[Figure: Harris points and Harris-Laplace points]
Maximally Stable Extremal Regions [Matas ‘02]
• Based on Watershed segmentation algorithm
• Select regions that stay stable over a large parameter range
Example Results: MSER
You Can Try It At Home…
• For most local feature detectors, executables are available online:
• http://robots.ox.ac.uk/~vgg/research/affine
• http://www.cs.ubc.ca/~lowe/keypoints/
• http://www.vision.ee.ethz.ch/~surf
Orientation Normalization
• Compute orientation histogram
• Select dominant orientation
• Normalize: rotate to fixed orientation
[Lowe, SIFT, 1999]
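The three steps above can be sketched as follows. The sketch assumes a 36-bin histogram weighted by gradient magnitude and works on a plain list of gradient vectors; the function names are hypothetical and the peak interpolation used in practice is omitted.

```python
import math


def dominant_orientation(gradients, num_bins=36):
    """Return the centre angle (radians) of the strongest histogram bin.

    `gradients` is a list of (gx, gy) pairs from the patch; each votes for
    its direction with a weight equal to its magnitude.
    """
    hist = [0.0] * num_bins
    for gx, gy in gradients:
        angle = math.atan2(gy, gx) % (2.0 * math.pi)   # map to [0, 2*pi)
        b = int(angle / (2.0 * math.pi) * num_bins) % num_bins
        hist[b] += math.hypot(gx, gy)
    best = max(range(num_bins), key=hist.__getitem__)
    return (best + 0.5) * 2.0 * math.pi / num_bins


def normalize(gradients, theta):
    """Rotate every gradient by -theta so the dominant direction maps to 0."""
    c, s = math.cos(-theta), math.sin(-theta)
    return [(c * gx - s * gy, s * gx + c * gy) for gx, gy in gradients]


# Five gradients pointing at 95 degrees plus one distractor along the x axis.
main = (math.cos(math.radians(95)), math.sin(math.radians(95)))
grads = [main] * 5 + [(1.0, 0.0)]
theta = dominant_orientation(grads)
rotated = normalize(grads, theta)
```

After normalization the dominant gradient direction lies along the positive x axis, so a descriptor computed on the rotated patch no longer depends on the original image rotation.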
[Figure: orientation histogram over the range 0 to 2π]
T. Tuytelaars, B. Leibe
Local Descriptors
• The ideal descriptor should be
– Repeatable
– Distinctive
– Compact
– Efficient
• Most available descriptors focus on edge/gradient information
– Capture texture information
– Color still relatively seldom used (more suitable for homogeneous regions)
Local Descriptors: SIFT Descriptor
[Lowe, ICCV 1999]
Histogram of oriented gradients
• Captures important texture information
• Robust to small translations / affine deformations
Local Descriptors: SURF
• Fast approximation of SIFT idea
– Efficient computation by 2D box filters & integral images ⇒ 6 times faster than SIFT
– Equivalent quality for object identification
• GPU implementation available
– Feature extraction @ 100 Hz (detector + descriptor, 640×480 img)
– http://www.vision.ee.ethz.ch/~surf
[Bay, ECCV’06], [Cornelis, CVGPU’08]
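The speed-up from integral images comes from the fact that any box sum needs only four table lookups, whatever the box size. A minimal sketch of that idea (not the SURF implementation itself; function names are hypothetical):

```python
def integral_image(img):
    """ii[y][x] = sum of img over rows < y and columns < x.

    The table has one extra row and column of zeros so that box sums need
    no special cases at the border.
    """
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def box_sum(ii, top, left, bottom, right):
    """Sum of rows top..bottom-1 and columns left..right-1 in four lookups."""
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
ii = integral_image(img)
whole = box_sum(ii, 0, 0, 3, 3)
lower_right = box_sum(ii, 1, 1, 3, 3)
```

A box filter response is then a weighted combination of a few such sums, so its cost does not grow with the filter size.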
Local Descriptors: Shape Context
Count the number of points inside each bin, e.g.:
Count = 4
...
Count = 10
Log-polar binning: more precision for nearby points, more flexibility for farther points.
Belongie & Malik, ICCV 2001
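A minimal sketch of the log-polar counting described above. The bin layout (5 radial by 12 angular bins, radii between 0.125 and 2 after dividing by the mean distance) is an assumption chosen for illustration, and `shape_context` is a hypothetical name.

```python
import math


def shape_context(points, index, n_radial=5, n_angular=12,
                  r_min=0.125, r_max=2.0):
    """Log-polar histogram of all other points relative to points[index].

    Distances are divided by their mean so the histogram does not depend on
    scale; radial bin edges are log-spaced between r_min and r_max.
    """
    px, py = points[index]
    others = [(x - px, y - py) for i, (x, y) in enumerate(points) if i != index]
    mean_dist = sum(math.hypot(dx, dy) for dx, dy in others) / len(others)
    log_lo, log_hi = math.log(r_min), math.log(r_max)
    hist = [[0] * n_angular for _ in range(n_radial)]
    for dx, dy in others:
        r = math.hypot(dx, dy) / mean_dist
        if r == 0.0:
            continue  # coincident point: no direction to bin
        # Clamp so very near / very far points fall into the end bins.
        rb = int((math.log(r) - log_lo) / (log_hi - log_lo) * n_radial)
        rb = min(max(rb, 0), n_radial - 1)
        angle = math.atan2(dy, dx) % (2.0 * math.pi)
        ab = int(angle / (2.0 * math.pi) * n_angular) % n_angular
        hist[rb][ab] += 1
    return hist


# Reference point at the origin and four points on the unit circle.
ring = [(0.0, 0.0)] + [(math.cos(math.radians(a)), math.sin(math.radians(a)))
                       for a in (15, 105, 195, 285)]
hist = shape_context(ring, 0)
hist_scaled = shape_context([(10 * x, 10 * y) for x, y in ring], 0)
```

Because the radial edges are log-spaced, nearby points are binned finely and distant points coarsely, which is the trade-off the slide describes.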
Local Descriptors: Geometric Blur
Compute edges at four orientations
Extract a patch in each channel
Apply spatially varying blur and sub-sample
[Figure: example descriptor (idealized signal)]
Berg & Malik, CVPR 2001
So, What Local Features Should I Use?
• There have been extensive evaluations/comparisons
– [Mikolajczyk et al., IJCV’05, PAMI’05]
– All detectors/descriptors shown here work well
• Best choice often application dependent
– MSER works well for buildings and printed things
– Harris-/Hessian-Laplace/DoG work well for many natural categories
• More features are better
– Combining several detectors often helps
Outline
1. Detection with Global Appearance & Sliding Windows
2. Local Invariant Features: Detection & Description
3. Specific Object Recognition with Local Features
― Coffee Break ―
4. Visual Words: Indexing, Bags of Words Categorization
5. Matching Local Features
6. Part-Based Models for Categorization
7. Current Challenges and Research Directions
Visual Object Recognition
Bastian Leibe, Computer Vision Laboratory, ETH Zurich
Kristen Grauman, Department of Computer Sciences, University of Texas at Austin
Chicago, 14.07.2008
Recognition with Local Features
• Image content is transformed into local features that are invariant to translation, rotation, and scale
• Goal: Verify if they belong to a consistent configuration
Local Features, e.g. SIFT
Slide credit: David Lowe
Finding Consistent Configurations
• Global spatial models
– Generalized Hough Transform [Lowe99]
– RANSAC [Obdrzalek02, Chum05, Nister06]
– Basic assumption: object is planar
• Assumption is often justified in practice
– Valid for many structures on buildings
– Sufficient for small viewpoint variations on 3D objects
Hough Transform
• Origin: Detection of straight lines in clutter
– Basic idea: each candidate point votes for all lines that it is consistent with.
– Votes are accumulated in quantized array
– Local maxima correspond to candidate lines
• Representation of a line
– Usual form y = a x + b has a singularity around 90º.
– Better parameterization: x cos(θ) + y sin(θ) = ρ
[Figure: line parameterization by ρ and θ in the (x, y) plane]
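The voting scheme with the (θ, ρ) parameterization can be sketched as below. The accumulator resolution (180 angle steps, ρ cells of width 1 covering ±100) and the function names are assumptions for illustration only.

```python
import math


def hough_lines(points, n_theta=180, rho_res=1.0, rho_max=100.0):
    """Accumulate votes in a quantised (theta, rho) array.

    Each point (x, y) votes for every line x*cos(theta) + y*sin(theta) = rho
    passing through it, one vote per quantised theta.
    """
    n_rho = int(2 * rho_max / rho_res) + 1
    acc = [[0] * n_rho for _ in range(n_theta)]
    for x, y in points:
        for t in range(n_theta):
            theta = t * math.pi / n_theta
            rho = x * math.cos(theta) + y * math.sin(theta)
            r = int(round((rho + rho_max) / rho_res))
            if 0 <= r < n_rho:
                acc[t][r] += 1
    return acc


def strongest_line(acc, n_theta=180, rho_res=1.0, rho_max=100.0):
    """Return (theta, rho, votes) of the accumulator cell with most votes."""
    votes, t, r = max((v, t, r)
                      for t, row in enumerate(acc)
                      for r, v in enumerate(row))
    return t * math.pi / n_theta, r * rho_res - rho_max, votes


# Thirty points on the vertical line x = 20, plus three clutter points.
points = [(20.0, float(y)) for y in range(30)]
points += [(5.0, 70.0), (60.0, 10.0), (80.0, 80.0)]
theta, rho, votes = strongest_line(hough_lines(points))
```

The example uses a vertical line on purpose: it has no finite slope in the form y = a x + b, but it is an ordinary cell (θ = 0, ρ = 20) in this parameterization.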
Hough Transform: Noisy Line
[Figure: tokens and their votes in (θ, ρ) space]
• Problem: Finding the true maximum
Slide credit: David Lowe
Hough Transform: Noisy Input
[Figure: tokens and their votes in (θ, ρ) space]
• Problem: Lots of spurious maxima
Slide credit: David Lowe
Generalized Hough Transform [Ballard81]
• Generalization for an arbitrary contour or shape
– Choose reference point for the contour (e.g. center)
– For each point on the contour remember where it is located w.r.t. the reference point
– Remember radius r and angle φ relative to the contour tangent
– Recognition: whenever you find a contour point, calculate the tangent angle and ‘vote’ for all possible reference points
– Instead of reference point, can also vote for transformation
⇒ The same idea can be used with local features!
Slide credit: Bernt Schiele
Gen. Hough Transform with Local Features
• For every feature, store possible “occurrences”
– Object identity
– Pose
– Relative position
• For new image, let the matched features vote for possible object positions
3D Object Recognition
• Gen. HT for Recognition
– Typically only 3 feature matches needed for recognition
– Extra matches provide robustness
– Affine model can be used for planar objects
[Lowe99]
Slide credit: David Lowe
View Interpolation
• Training
– Training views from similar viewpoints are clustered based on feature matches.
– Matching features between adjacent views are linked.
• Recognition
– Feature matches may be spread over several training viewpoints.
⇒ Use the known links to “transfer votes” to other viewpoints.
Slide credit: David Lowe
[Lowe01]
Recognition Using View Interpolation
Slide credit: David Lowe
[Lowe01]
Location Recognition
Training
Slide credit: David Lowe
[Lowe04]
Applications
• Sony Aibo (Evolution Robotics)
• SIFT usage
– Recognize docking station
– Communicate with visual cards
• Other uses
– Place recognition
– Loop closure in SLAM
Slide credit: David Lowe
RANSAC (RANdom SAmple Consensus) [Fischler81]
• Randomly choose a minimal subset of data points necessary to fit a model (a sample)
• Points within some distance threshold t of model are a consensus set. Size of consensus set is model’s support.
• Repeat for N samples; model with biggest support is most robust fit
– Points within distance t of best model are inliers
– Fit final model to all inliers
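The loop above, for the special case of fitting a 2D line, can be sketched as follows. The threshold t = 1, the 100 samples, the fixed random seed and the total-least-squares refit are assumptions made for this illustration; the function names are hypothetical.

```python
import math
import random


def fit_line(p, q):
    """Line through two points as (a, b, c): a*x + b*y + c = 0, a^2 + b^2 = 1."""
    (x1, y1), (x2, y2) = p, q
    a, b = y2 - y1, x1 - x2
    norm = math.hypot(a, b)
    return a / norm, b / norm, -(a * x1 + b * y1) / norm


def ransac_line(points, t=1.0, n_samples=100, seed=0):
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(n_samples):
        p, q = rng.sample(points, 2)          # minimal subset for a line
        if p == q:
            continue                          # duplicate points define no line
        a, b, c = fit_line(p, q)
        consensus = [(x, y) for x, y in points if abs(a * x + b * y + c) <= t]
        if len(consensus) > len(best_inliers):    # biggest support so far
            best_model, best_inliers = (a, b, c), consensus
    return best_model, best_inliers


def refit(inliers):
    """Total-least-squares line through all inliers (final model)."""
    n = len(inliers)
    mx = sum(x for x, _ in inliers) / n
    my = sum(y for _, y in inliers) / n
    sxx = sum((x - mx) ** 2 for x, _ in inliers)
    syy = sum((y - my) ** 2 for _, y in inliers)
    sxy = sum((x - mx) * (y - my) for x, y in inliers)
    ang = 0.5 * math.atan2(2.0 * sxy, sxx - syy)   # direction of the line
    a, b = -math.sin(ang), math.cos(ang)
    return a, b, -(a * mx + b * my)


# Twenty points on y = 2x + 1 and eight gross outliers.
line_pts = [(float(x), 2.0 * x + 1.0) for x in range(20)]
outliers = [(0.0, 30.0), (3.0, -20.0), (7.0, 40.0), (10.0, -5.0),
            (12.0, 50.0), (15.0, 0.0), (18.0, 60.0), (5.0, 35.0)]
model, inliers = ransac_line(line_pts + outliers)
a, b, c = refit(inliers)
```

The minimal-sample model is only used to separate inliers from outliers; the reported line comes from the refit over the whole consensus set.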
Slide credit: David Lowe
RANSAC: How many samples?
• How many samples are needed?
– Suppose w is fraction of inliers (points from line).
– n points needed to define hypothesis (2 for lines)
– k samples chosen.
• Prob. that a single sample of n points is correct: \( w^n \)
• Prob. that all samples fail is: \( (1 - w^n)^k \)
⇒ Choose k high enough to keep this below desired failure rate.
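Solving (1 − wⁿ)ᵏ ≤ p_fail for k gives k ≥ log(p_fail) / log(1 − wⁿ). A small sketch of that calculation; the function name and the example values of w, n and the failure rate are illustrative assumptions.

```python
import math


def required_samples(w, n, p_fail):
    """Smallest k with (1 - w**n)**k <= p_fail.

    w: fraction of inliers, n: points per hypothesis, p_fail: accepted
    probability that no sample consists of inliers only.
    """
    p_good = w ** n                      # one sample is all inliers
    if p_good <= 0.0:
        raise ValueError("no all-inlier sample is possible when w == 0")
    if p_good >= 1.0:
        return 1
    return math.ceil(math.log(p_fail) / math.log(1.0 - p_good))


k_line = required_samples(0.5, 2, 0.01)    # lines: n = 2
k_four = required_samples(0.5, 4, 0.01)    # a model needing n = 4 points
```

With half of the points being inliers and a 1% accepted failure rate, a line needs 17 samples while a four-point model already needs 72, which shows how quickly k grows with n.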
Slide credit: David Lowe
After RANSAC
• RANSAC divides data into inliers and outliers and yields estimate computed from minimal set of inliers
• Improve this initial estimate with estimation over all inliers (e.g. with standard least-squares minimization)
• But this may change inliers, so alternate fitting with re-classification as inlier/outlier
Slide credit: David Lowe
Example: Finding Feature Matches
• Find best stereo match within a square search window (here 300 pixels²)
• Global transformation model: epipolar geometry
from Hartley & Zisserman
Slide credit: David Lowe
[Figure: feature matches before RANSAC and after RANSAC]
Comparison
Gen. Hough Transform
• Advantages
– Very effective for recognizing arbitrary shapes or objects
– Can handle high percentage of outliers (>95%)
– Extracts groupings from clutter in linear time
• Disadvantages
– Quantization issues
– Only practical for small number of dimensions (up to 4)
• Improvements available
– Probabilistic Extensions
– Continuous Voting Space
[Leibe08]
RANSAC
• Advantages
– General method suited to large range of problems
– Easy to implement
– Independent of number of dimensions
• Disadvantages
– Only handles moderate number of outliers (<50%)
• Many variants available, e.g.
– PROSAC: Progressive RANSAC [Chum05]
– Preemptive RANSAC [Nister05]
Example Applications
Mobile tourist guide
• Self-localization
• Object/building recognition
• Photo/video augmentation
[Quack, Leibe, Van Gool, CIVR’08]
Web Demo: Movie Poster Recognition
http://www.kooaba.com/en/products_engine.html#
50’000 movie posters indexed
Query-by-image from mobile phone available in Switzerland
Application: Large-Scale Retrieval
[Philbin CVPR’07]
Query Results from 5k Flickr images (demo available for 100k set)
Application: Image Auto-Annotation
[Figure: example image pairs labeled Moulin Rouge, Tour Montparnasse, Colosseum, Old Town Square (Prague), Viktualienmarkt, Maypole]
Left: Wikipedia image
Right: closest match from Flickr
[Quack CIVR’08]
Visual Object Recognition
Bastian Leibe, Computer Vision Laboratory, ETH Zurich
Kristen Grauman, Department of Computer Sciences, University of Texas at Austin
Chicago, 14.07.2008
Outline
1. Detection with Global Appearance & Sliding Windows
2. Local Invariant Features: Detection & Description
3. Specific Object Recognition with Local Features
― Coffee Break ―
4. Visual Words: Indexing, Bags of Words Categorization
5. Matching Local Feature Sets
6. Part-Based Models for Categorization
7. Current Challenges and Research Directions
Global representations: limitations
• Success may rely on alignment -> sensitive to viewpoint
• All parts of the image or window impact the description -> sensitive to occlusion, clutter
Local representations
• Describe component regions or patches separately.
• Many options for detection & description…
Superpixels [Ren et al.]
Shape context [Belongie 02]
Maximally Stable Extremal Regions [Matas 02]
Geometric Blur [Berg 05]
SIFT [Lowe 99]
Salient regions [Kadir 01]
Harris-Affine [Mikolajczyk 04]
Spin images [Johnson 99]
Recall: Invariant local features
Subset of local feature types designed to be invariant to
– Scale
– Translation
– Rotation
– Affine transformations
– Illumination
1) Detect interest points
2) Extract descriptors
[Figure: descriptor vectors (x1, x2, …, xd) and (y1, y2, …, yd) extracted from two images]
[Mikolajczyk01, Matas02, Tuytelaars04, Lowe99, Kadir01,… ]
Recognition with local feature sets
• Previously, we saw how to use local invariant features + a global spatial model to recognize specific objects, using a planar object assumption.
• Now, we’ll use local features for
– Indexing-based recognition
– Bags of words representations
– Correspondence / matching kernels
Basic flow
Detect or sample features: list of positions, scales, orientations
Describe features: associated list of d-dimensional descriptors
Index each one into pool of descriptors from previously seen images
Indexing local features
• Each patch / region has a descriptor, which is a point in some high-dimensional feature space (e.g., SIFT)
• When we see close points in feature space, we have similar descriptors, which indicates similar local content.
Figure credit: A. Zisserman
• We saw in the previous section how to use voting and pose clustering to identify objects using local features
Figure credit: David Lowe
• With potentially thousands of features per image, and hundreds to millions of images to search, how to efficiently find those that are relevant to a new image?
– Low-dimensional descriptors: can use standard efficient data structures for nearest neighbor search
– High-dimensional descriptors: approximate nearest neighbor search methods more practical
– Inverted file indexing schemes
Indexing local features: approximate nearest neighbor search
Best-Bin First (BBF), a variant of k-d trees that uses priority queue to examine most promising branches first [Beis & Lowe, CVPR 1997]
Locality-Sensitive Hashing (LSH), a randomized hashing technique using hash functions that map similar points to the same bin, with high probability [Indyk & Motwani, 1998]
Indexing local features: inverted file index
• For text documents, an efficient way to find all pages on which a word occurs is to use an index…
• We want to find all images in which a feature occurs.
• To use this idea, we’ll need to map our features to “visual words”.
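A minimal sketch of an inverted file for images, assuming each image has already been reduced to a list of integer visual-word ids. The function names, the toy database and the simple shared-word count used for ranking are illustrative assumptions.

```python
from collections import defaultdict


def build_inverted_index(image_words):
    """Map each visual-word id to the set of images in which it occurs.

    `image_words` is a dict: image id -> list of visual-word ids.
    """
    index = defaultdict(set)
    for image_id, words in image_words.items():
        for w in words:
            index[w].add(image_id)
    return index


def query(index, words):
    """Rank database images by the number of distinct query words they share."""
    votes = defaultdict(int)
    for w in set(words):
        for image_id in index.get(w, ()):
            votes[image_id] += 1
    return sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))


database = {"a": [1, 5, 7], "b": [2, 5], "c": [7, 7, 9]}
index = build_inverted_index(database)
ranking = query(index, [5, 7, 7])
```

Only the images listed under the query's words are touched, so the cost depends on how many images share those words rather than on the size of the whole database.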
Visual words: main idea
• Extract some local features from a number of images …
e.g., SIFT descriptor space: each point is 128-dimensional
Slide credit: D. Nister
Visual words: main idea
Map high-dimensional descriptors to tokens/words by quantizing the feature space
• Quantize via clustering, let cluster centers be the prototype “words”
[Figure: descriptor space]
• Determine which word to assign to each new image region by finding the closest cluster center.
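The two steps above (cluster to obtain prototype words, then assign each new descriptor to the closest center) can be sketched as follows. This is an illustrative plain k-means in pure Python, assuming Euclidean distance; `kmeans` and `nearest` are hypothetical names, and a real system would use an optimized library.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(centers, x):
    """Index of the closest cluster center: the visual word for descriptor x."""
    return min(range(len(centers)), key=lambda k: sq_dist(centers[k], x))

def kmeans(points, k, iters=20, seed=0):
    """Lloyd's algorithm; the returned centers are the prototype words."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(centers, p)].append(p)
        for j, g in enumerate(groups):
            if g:  # keep the old center if a cluster goes empty
                centers[j] = [sum(c) / len(g) for c in zip(*g)]
    return centers
```

Usage: `centers = kmeans(descriptors, k)` builds the vocabulary once; `nearest(centers, d)` then quantizes any new descriptor `d` to a word index.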
Visual words
• Example: each group of patches belongs to the same visual word
Figure from Sivic & Zisserman, ICCV 2003
Visual words
• First explored for texture and material representations
• Texton = cluster center of filter responses over collection of images
• Describe textures and materials based on distribution of prototypical texture elements.
Leung & Malik 1999; Varma & Zisserman 2002; Lazebnik, Schmid & Ponce 2003
Visual words
• More recently used for describing scenes and objects for the sake of indexing or classification.
Sivic & Zisserman 2003; Csurka, Bray, Dance, & Fan 2004; many others.
Inverted file index for images composed of visual words
[Figure: inverted file index; each word number maps to a list of image numbers]
Image credit: A. Zisserman
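A minimal sketch of an inverted file over visual words, assuming each image has already been reduced to a list of integer word ids. The function names and the simple shared-word vote are illustrative assumptions, not the scoring used by any particular system.

```python
from collections import Counter, defaultdict

def build_inverted_index(image_words):
    """image_words: dict image_id -> iterable of visual word ids."""
    index = defaultdict(set)
    for image_id, words in image_words.items():
        for w in words:
            index[w].add(image_id)  # word number -> list of image numbers
    return index

def candidates(index, query_words):
    """Count, per database image, how many distinct query words it shares."""
    votes = Counter()
    for w in set(query_words):
        for image_id in index.get(w, ()):
            votes[image_id] += 1
    return votes.most_common()
```

Only images that share at least one word with the query are ever touched, which is what makes the lookup cheap for large databases.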
Bags of visual words
• Summarize entire image based on its distribution (histogram) of word occurrences.
• Analogous to bag of words representation commonly used for documents.
Image credit: Fei-Fei Li
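As an illustration of the summary step, the sketch below turns a list of word ids into a fixed-length histogram. The function name and the optional L1 normalization are assumptions made for the example.

```python
def bow_histogram(word_ids, vocab_size, normalize=True):
    """Histogram of visual word occurrences for one image."""
    hist = [0.0] * vocab_size
    for w in word_ids:
        hist[w] += 1.0
    total = sum(hist)
    if normalize and total > 0:
        # L1 normalization so images with different feature counts are comparable
        hist = [h / total for h in hist]
    return hist
```

The resulting vector has the same length for every image, regardless of how many local features were detected.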
Video Google System
1. Collect all words within query region
2. Inverted file index to find relevant frames
3. Compare word counts
4. Spatial verification
[Figure: query region and retrieved frames]
Sivic & Zisserman, ICCV 2003
• Demo online at: http://www.robots.ox.ac.uk/~vgg/research/vgoogle/index.html
Basic flow
1. Detect or sample features → list of positions, scales, orientations
2. Describe features → associated list of d-dimensional descriptors
3. Index each one into pool of descriptors from previously seen images, or quantize to form bag of words vector for the image
Visual vocabulary formation
Issues:
• Sampling strategy
• Clustering / quantization algorithm
• Unsupervised vs. supervised
• What corpus provides features (universal vocabulary?)
• Vocabulary size, number of words
Sampling strategies
[Figure panels: dense, uniformly; sparse, at interest points; randomly; multiple interest operators]
Image credits: F-F. Li, E. Nowak, J. Sivic
• To find specific, textured objects, sparse sampling from interest points often more reliable.
• Multiple complementary interest operators offer more image coverage.
• For object categorization, dense sampling offers better coverage.
[See Nowak, Jurie & Triggs, ECCV 2006]
Clustering / quantization methods
• k-means (typical choice), agglomerative clustering, mean-shift,…
• Hierarchical clustering: allows faster insertion / word assignment while still allowing large vocabularies
  - Vocabulary tree [Nister & Stewenius, CVPR 2006]
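A rough sketch of hierarchical clustering for word assignment, assuming recursive k-means with a fixed branching factor and depth. This illustrates the general idea only and is not the implementation from the cited paper; `build_tree` and `lookup` are hypothetical names.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=10, rng=None):
    """Plain k-means; returns the centers and the last point grouping."""
    rng = rng or random.Random(0)
    centers = [list(p) for p in rng.sample(points, k)]
    groups = [[] for _ in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: sq_dist(centers[c], p))
            groups[j].append(p)
        centers = [[sum(v) / len(g) for v in zip(*g)] if g else centers[j]
                   for j, g in enumerate(groups)]
    return centers, groups

def build_tree(points, branch, depth, rng=None):
    """Recursively split the descriptors; None marks a leaf."""
    rng = rng or random.Random(0)
    if depth == 0 or len(points) < branch:
        return None
    centers, groups = kmeans(points, branch, rng=rng)
    children = [build_tree(g, branch, depth - 1, rng) for g in groups]
    return {"centers": centers, "children": children}

def lookup(tree, x):
    """Descend the tree; the path of branch choices identifies the visual word."""
    path, node = [], tree
    while node is not None:
        centers = node["centers"]
        j = min(range(len(centers)), key=lambda c: sq_dist(centers[c], x))
        path.append(j)
        node = node["children"][j]
    return tuple(path)
```

Each lookup compares against only `branch` centers per level, so assignment cost grows with depth rather than with the total number of leaves, which is why large vocabularies stay affordable.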
Example: Recognition with Vocabulary Tree
• Tree construction:
Slide credit: David Nister
[Nister & Stewenius, CVPR’06]
Vocabulary Tree
• Training: Filling the tree
Slide credit: David Nister
[Nister & Stewenius, CVPR’06]
Vocabulary Tree
• Recognition
[Figure: recognition with RANSAC verification]
Slide credit: David Nister
[Nister & Stewenius, CVPR’06]
Vocabulary Tree: Performance
• Evaluated on large databases
  - Indexing with up to 1M images
• Online recognition for database of 50,000 CD covers
  - Retrieval in ~1s
• Find experimentally that large vocabularies can be beneficial for recognition
[Nister & Stewenius, CVPR’06]
Vocabulary formation
• Ensembles of trees provide additional robustness
Figure credit: F. Jurie
Moosmann, Jurie, & Triggs 2006; Yeh, Lee, & Darrell 2007; Bosch, Zisserman, & Munoz 2007; …
Supervised vocabulary formation
• Recent work considers how to leverage labeled images when constructing the vocabulary
Perronnin, Dance, Csurka, & Bressan, Adapted Vocabularies for Generic Visual Categorization, ECCV 2006.
Supervised vocabulary formation
• Merge words that don’t aid in discriminability
Winn, Criminisi, & Minka, Object Categorization by Learned Universal Visual Dictionary, ICCV 2005
Supervised vocabulary formation
• Consider vocabulary and classifier construction jointly.
Yang, Jin, Sukthankar, & Jurie, Discriminative Visual Codebook Generation with Classifier Training for Object Category Recognition, CVPR 2008.
Learning and recognition with bag of words histograms
• Bag of words representation makes it possible to describe the unordered point set with a single vector (of fixed dimension across image examples)
• Provides easy way to use distribution of feature types with various learning algorithms requiring vector input.
• …including unsupervised topic models designed for documents.
• Hierarchical Bayesian text models (pLSA and LDA)
  – Hofmann 2001, Blei, Ng & Jordan, 2004
  – For object and scene categorization: Sivic et al. 2005, Sudderth et al. 2005, Quelhas et al. 2005, Fei-Fei et al. 2005
Figure credit: Fei-Fei Li
[Figure: Probabilistic Latent Semantic Analysis (pLSA) graphical model with nodes d, z, w and plates N, D; example topic “face”]
Sivic et al. ICCV 2005
[pLSA code available at: http://www.robots.ox.ac.uk/~vgg/software/]
Figure credit: Fei-Fei Li
Bags of words: pros and cons
+ flexible to geometry / deformations / viewpoint
+ compact summary of image content
+ provides vector representation for sets
+ has yielded good recognition results in practice
- basic model ignores geometry – must verify afterwards, or encode via features
- background and foreground mixed when bag covers whole image
- interest points or sampling: no guarantee to capture object-level parts
- optimal vocabulary formation remains unclear
Outline
1. Detection with Global Appearance & Sliding Windows
2. Local Invariant Features: Detection & Description
3. Specific Object Recognition with Local Features
― Coffee Break ―
4. Visual Words: Indexing, Bags of Words Categorization
5. Matching Local Feature Sets
6. Part-Based Models for Categorization
7. Current Challenges and Research Directions
Basic flow
1. Detect or sample features → list of positions, scales, orientations
2. Describe features → associated list of d-dimensional descriptors
3. Index each one into pool of descriptors from previously seen images, or compute match with another image, or quantize to form bag of words vector for the image
Local feature correspondences
• The matching between sets of local features helps to establish overall similarity between objects or shapes.
• Assigned matches also useful for localization
Shape context [Belongie & Malik 2001]
Low-distortion matching [Berg & Malik 2005]
Match kernel [Wallraven, Caputo & Graf 2003]
Local feature correspondences
• Least cost match: minimize total cost between matched points
• Least cost partial match: match all of smaller set to some portion of larger set.
$$\min_{\pi : X \rightarrow Y} \sum_{x_i \in X} \lVert x_i - \pi(x_i) \rVert$$
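To make the least cost partial match concrete, here is a brute-force sketch that tries every one-to-one assignment of the smaller set into the larger one. It assumes Euclidean cost and is only practical for tiny sets; `partial_match_cost` is a hypothetical name, and a real system would use an assignment solver instead.

```python
import math
from itertools import permutations

def partial_match_cost(X, Y):
    """Min over one-to-one maps pi: X -> Y of sum ||x_i - pi(x_i)||, with |X| <= |Y|."""
    assert len(X) <= len(Y)
    best = math.inf
    # Every ordered choice of len(X) distinct points of Y is one candidate matching.
    for chosen in permutations(range(len(Y)), len(X)):
        cost = sum(math.dist(x, Y[j]) for x, j in zip(X, chosen))
        best = min(best, cost)
    return best
```

The number of candidate matchings grows factorially, which motivates both polynomial-time assignment solvers and approximate alternatives.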
Pyramid match kernel (PMK)
• Optimal matching expensive relative to number of features per image (m).
• PMK is approximate partial match for efficient discriminative learning from sets of local features.
Optimal match: O(m^3)
Greedy match: O(m^2 log m)
Pyramid match: O(m)
[Grauman & Darrell, ICCV 2005]
Pyramid match kernel: pyramid extraction
Histogram pyramid: level i has bins of size $2^i$
Pyramid match kernel: counting matches
Histogram intersection: $\mathcal{I}(H(X), H(Y)) = \sum_j \min(H(X)_j, H(Y)_j)$
Pyramid match kernel: counting new matches
$$N_i = \underbrace{\mathcal{I}(H_i(X), H_i(Y))}_{\text{matches at this level}} - \underbrace{\mathcal{I}(H_{i-1}(X), H_{i-1}(Y))}_{\text{matches at previous level}}$$
where $\mathcal{I}$ is the histogram intersection. Difference in histogram intersections across levels counts number of new pairs matched.
Pyramid match kernel
$$K_\Delta(\Psi(X), \Psi(Y)) = \sum_{i=0}^{L} w_i N_i$$
where $\Psi(X)$, $\Psi(Y)$ are the histogram pyramids, $w_i$ is a measure of difficulty of a match at level i, and $N_i$ is the number of newly matched pairs at level i.
• For similarity, weights inversely proportional to bin size (or may be learned discriminatively)
• Normalize kernel values to avoid favoring large sets
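The pieces above (histogram pyramid, histogram intersection, new matches per level, weighted sum) can be put together in a short sketch. It assumes uniform grid bins of side 2^i, weights 1/2^i, and normalization by the self-similarities; function names are illustrative and refinements such as randomly shifted grids are omitted.

```python
import math
from collections import Counter

def histogram(points, side):
    """Count points per grid cell with the given side length."""
    return Counter(tuple(int(c // side) for c in p) for p in points)

def intersection(h1, h2):
    """Histogram intersection: sum of bin-wise minima."""
    return sum(min(n, h2[b]) for b, n in h1.items())

def pyramid_match(X, Y, levels):
    """Sum over levels of w_i * N_i with w_i = 1 / 2^i."""
    score, prev = 0.0, 0
    for i in range(levels + 1):
        side = 2 ** i
        inter = intersection(histogram(X, side), histogram(Y, side))
        score += (inter - prev) / side  # new matches at this level, weighted
        prev = inter
    return score

def normalized_pyramid_match(X, Y, levels):
    """Divide by the self-similarities to avoid favoring large sets."""
    k = pyramid_match(X, Y, levels)
    return k / math.sqrt(pyramid_match(X, X, levels) * pyramid_match(Y, Y, levels))
```

For example, with `X = [(0,), (5,)]` and `Y = [(0,), (6,)]`, one pair matches at the finest level and the second pair only once bins reach side 4, so it contributes with weight 1/4.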
Example pyramid match
[Figure: pyramid match compared with optimal match]
Pyramid match kernel
• Forms a Mercer kernel -> allows classification with SVMs, use of other kernel methods
• Bounded error relative to optimal partial match
• Linear time -> efficient learning with large feature sets
[Figure: accuracy and time (s) versus mean number of features on the ETH-80 data set; pyramid match, O(m), compared with match kernel [Wallraven et al.], O(m^2)]
• Use data-dependent pyramid partitions for high-d feature spaces
[Figure panels: uniform pyramid bins; vocabulary-guided pyramid bins]
Code for PMK: http://people.csail.mit.edu/jjl/libpmk/
Matching smoothness & local geometry
• Solving for linear assignment means (non-overlapping) features can be matched independently, ignoring relative geometry.
• One alternative: simply expand feature vectors to include spatial information before matching.
[ f1, …, f128, xa, ya ]
Spatial pyramid match kernel
• First quantize descriptors into words, then do one pyramid match per word in image coordinate space.
Lazebnik, Schmid & Ponce, CVPR 2006
Matching smoothness & local geometry
• Use correspondence to estimate parameterized transformation, regularize to enforce smoothness
Shape context matching [Belongie, Malik, & Puzicha 2001]
Code: http://www.eecs.berkeley.edu/Research/Projects/CS/vision/shape/sc_digits.html
Matching smoothness & local geometry
• Let matching cost include term to penalize distortion between pairs of matched features.
[Figure: template features i, j related by R_ij, matched to query features i', j' related by S_i'j']
Approximate for efficient solutions: Berg & Malik, CVPR 2005; Leordeanu & Hebert, ICCV 2005
Figure credit: Alex Berg
Matching smoothness & local geometry
• Compare “semi-local” features: consider configurations or neighborhoods and co-occurrence relationships
Hyperfeatures [Agarwal & Triggs, ECCV 2006]
Correlograms of visual words [Savarese, Winn, & Criminisi, CVPR 2006]
Proximity distribution kernel [Ling & Soatto, ICCV 2007]
Feature neighborhoods [Sivic & Zisserman, CVPR 2004]
Tiled neighborhood [Quack, Ferrari, Leibe, van Gool ICCV 2007]
Matching smoothness & local geometry
• Learn or provide explicit object-specific shape model [Next in the tutorial : part-based models]
[Figure: part-based model with parts x1 … x6]
Summary
• Local features are a useful, flexible representation
  - Invariance properties, typically built into the descriptor
  - Distinctive, especially helpful for identifying specific textured objects
  - Breaking image into regions/parts gives tolerance to occlusions and clutter
  - Mapping to visual words forms discrete tokens from image regions
• Efficient methods available for
  - Indexing patches or regions
  - Comparing distributions of visual words
  - Matching features
Outline
1. Detection with Global Appearance & Sliding Windows
2. Local Invariant Features: Detection & Description
3. Specific Object Recognition with Local Features
― Coffee Break ―
4. Visual Words: Indexing, Bags of Words Categorization
5. Matching Local Feature Sets
6. Part-Based Models for Categorization
7. Current Challenges and Research Directions
Recognition of Object Categories
• We no longer have exact correspondences…
• On a local level, we can still detect similar parts.
• Represent objects by their parts
  ⇒ Bag-of-features
• How can we improve on this?
  - Encode structure
Slide credit: Rob Fergus
Part-Based Models
• Fischler & Elschlager 1973
• Model has two components
  - parts (2D image fragments)
  - structure (configuration of parts)
Different Connectivity Structures
[Figure: connectivity structures of part-based models, with recognition complexities O(N^6), O(N^2), O(N^3), O(N^2). Cited models: Fergus et al. ’03, ’05; Fei-Fei et al. ’03; Leibe et al. ’04, ’08; Crandall et al. ’05; Felzenszwalb & Huttenlocher ’05; Bouchard & Triggs ’05; Carneiro & Lowe ’06; Csurka ’04; Vasconcelos ’00]
from [Carneiro & Lowe, ECCV’06]
Spatial Models Considered Here
Fully connected shape model
  - e.g. Constellation Model
  - Parts fully connected
  - Recognition complexity: O(N^P)
  - Method: Exhaustive search
“Star” shape model
  - e.g. ISM
  - Parts mutually independent
  - Recognition complexity: O(NP)
  - Method: Gen. Hough Transform
Slide credit: Rob Fergus
Constellation Model
• Joint model for appearance and shape
[Figure: object model components: Gaussian shape pdf; Gaussian part appearance pdf; Gaussian relative scale pdf over log(scale); probability of detection per part (0.8, 0.75, 0.9)]
[Figure: clutter model components: uniform shape pdf; Gaussian appearance pdf; uniform relative scale pdf over log(scale); Poisson pdf on # detections]
Constellation Model: Learning Procedure
• Goal: Find regions & their location, scale & appearance
• Initialize model parameters
• Use EM and iterate to convergence
  - E-step: Compute assignments for which regions are foreground/background
  - M-step: Update model parameters
• Trying to maximize likelihood – consistency in shape & appearance
Example: Motorbikes
Example: Motorbikes (2)
Example: Spotted Cats
Discussion: Constellation Model
• Advantages
  - Works well for many different object categories
  - Can adapt well to categories where
    – Shape is more important
    – Appearance is more important
  - Everything is learned from training data
  - Weakly-supervised training possible
• Disadvantages
  - Model contains many parameters that need to be estimated
  - Cost increases exponentially with increasing number of parameters
  ⇒ Fully connected model restricted to small number of parts.
Implicit Shape Model (ISM)
• Basic ideas
  - Learn an appearance codebook
  - Learn a star-topology structural model
    – Features are considered independent given obj. center
[Figure: star-topology model with parts x1 … x6]
• Algorithm: probabilistic Gen. Hough Transform
  - Exact correspondences → Prob. match to object part
  - NN matching → Soft matching
  - Feature location on obj. → Part location distribution
  - Uniform votes → Probabilistic vote weighting
  - Quantized Hough array → Continuous Hough space
Codebook Representation
• Extraction of local object features
  - Interest Points (e.g. Harris detector)
  - Sparse representation of the object appearance
• Collect features from whole training set
• Example: [Figure]
Gen. Hough Transform with Local Features
• For every feature, store possible “occurrences”
  – Object identity
  – Pose
  – Relative position
• For new image, let the matched features vote for possible object positions
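A simplified sketch of the voting scheme described above, assuming features are already quantized to codebook entries and that only the relative position to the object center is stored (object identity, pose and scale are left out). It uses a quantized vote array for brevity; all names and the `bin_size` parameter are illustrative assumptions.

```python
from collections import Counter, defaultdict

def train_occurrences(training_features):
    """training_features: iterable of (word, (fx, fy), (cx, cy)) with the object center."""
    occ = defaultdict(list)
    for word, (fx, fy), (cx, cy) in training_features:
        occ[word].append((cx - fx, cy - fy))  # offset from feature to object center
    return occ

def vote(occ, test_features, bin_size=10):
    """test_features: iterable of (word, (fx, fy)); returns votes per center cell."""
    votes = Counter()
    for word, (fx, fy) in test_features:
        offsets = occ.get(word, [])
        for dx, dy in offsets:
            cell = ((fx + dx) // bin_size, (fy + dy) // bin_size)
            votes[cell] += 1.0 / len(offsets)  # spread one vote over stored occurrences
    return votes
```

Cells that collect votes from many consistent features are candidate object positions; isolated clutter features scatter their votes and rarely form a peak.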
Implicit Shape Model - Representation
• Learn appearance codebook
  – Extract local features at interest points
  – Agglomerative clustering ⇒ codebook
• Learn spatial distributions
  – Match codebook to training images
  – Record matching positions on object
[Figure: training images (+ reference segmentation) → appearance codebook → spatial occurrence distributions over (x, y, s) + local figure-ground labels]
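The two training steps can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the implementation from [Leibe04, Leibe08]: descriptors are plain tuples, clustering is a naive centroid-linkage merge that stops at a distance threshold, the scale dimension s and the figure-ground labels are left out, and the names `agglomerative_codebook`, `record_occurrences`, `max_dist`, and `match_dist` are hypothetical.

```python
import math


def centroid(vectors):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(vectors)
    return tuple(sum(v[k] for v in vectors) / n for k in range(len(vectors[0])))


def agglomerative_codebook(descriptors, max_dist):
    """Merge the two closest clusters until the closest pair is farther
    apart than max_dist; return one centroid per remaining cluster."""
    clusters = [[d] for d in descriptors]
    while len(clusters) > 1:
        best = None  # (distance, i, j) of the closest pair found so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = math.dist(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        if best[0] > max_dist:
            break
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [centroid(c) for c in clusters]


def record_occurrences(codebook, features, match_dist):
    """features: list of (descriptor, (fx, fy), (cx, cy)) with (cx, cy) the
    annotated object center. For every codebook entry that matches a
    feature, store the offset from the feature to the object center."""
    occurrences = {i: [] for i in range(len(codebook))}
    for desc, (fx, fy), (cx, cy) in features:
        for i, entry in enumerate(codebook):
            if math.dist(desc, entry) <= match_dist:
                occurrences[i].append((cx - fx, cy - fy))
    return occurrences
```

Note that one training feature may activate several codebook entries, so its offset is stored with each of them.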
Implicit Shape Model - Recognition
[Figure: interest points → matched codebook entries → probabilistic voting in a continuous 3D voting space (x, y, s)]
Image feature f at location ℓ; interpretation (codebook match) C_i with probability p(C_i | f); object position (o_n, x) with vote probability p(o_n, x | C_i, ℓ).
$$p(o_n, x \mid f, \ell) = \sum_i p(o_n, x \mid C_i, \ell)\, p(C_i \mid f)$$
[Leibe04, Leibe08]
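A minimal Python sketch of this voting step follows. It is an illustration rather than the authors' code and rests on stated assumptions: p(C_i | f) is taken to be uniform over all codebook entries within a matching radius, p(o_n, x | C_i, ℓ) is uniform over the stored occurrences of entry C_i, and only a single object category and 2D positions are handled. The names `soft_match`, `cast_votes`, and `match_dist` are hypothetical.

```python
import math


def soft_match(descriptor, codebook, match_dist):
    """Return {i: p(C_i | f)}: uniform weight over all codebook entries
    within match_dist of the descriptor (empty if nothing matches)."""
    hits = [i for i, entry in enumerate(codebook)
            if math.dist(descriptor, entry) <= match_dist]
    return {i: 1.0 / len(hits) for i in hits}


def cast_votes(features, codebook, occurrences, match_dist):
    """features: list of (descriptor, (fx, fy)).
    occurrences: {entry index: [(dx, dy), ...]} offsets to the object center.
    Returns a list of ((x, y), weight) votes in continuous image space."""
    votes = []
    for desc, (fx, fy) in features:
        for i, p_match in soft_match(desc, codebook, match_dist).items():
            occ = occurrences.get(i, [])
            for dx, dy in occ:
                # weight = p(C_i | f) * p(o_n, x | C_i, l)
                votes.append(((fx + dx, fy + dy), p_match / len(occ)))
    return votes
```

Under these assumptions the weights cast by one feature sum to 1 whenever every matched entry has at least one stored occurrence.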
Implicit Shape Model - Recognition
[Figure: interest points → matched codebook entries → probabilistic voting → 3D voting space (continuous; x, y, s) → backprojection of maxima → backprojected hypotheses]
[Leibe04, Leibe08]
Example: Results on Cows
[Figure sequence: original image → interest points → matched patches → probabilistic votes → 1st hypothesis → 2nd hypothesis → 3rd hypothesis]
Scale Invariant Voting
• Scale-invariant feature selection
  – Scale-invariant interest points
  – Rescale extracted patches
  – Match to constant-size codebook
• Generate scale votes
  – Scale as 3rd dimension in voting space
  – Search for maxima in 3D voting space
[Figure: search window in the (x, y, s) voting space]
Scale Voting: Efficient Computation
• Mean-Shift formulation for refinement
  – Scale-adaptive balloon density estimator
[Figure: scale votes → binned accumulator array → candidate maxima → refinement (MSME)]
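The refinement step can be illustrated with a small mean-shift routine in Python. This is a sketch under assumptions, not the estimator used in the original work: it uses a flat (uniform) ellipsoidal kernel, and it makes the kernel scale-adaptive simply by letting the bandwidth grow linearly with the scale of the current evaluation point. The function name and the parameters `spatial_bw` and `scale_bw` are hypothetical.

```python
def mean_shift_mode(votes, start, spatial_bw, scale_bw, max_iter=100, tol=1e-6):
    """votes: list of ((x, y, s), weight) in the continuous voting space.
    start: (x, y, s) candidate maximum, e.g. taken from a binned array.
    Repeatedly moves the window to the weighted mean of the votes inside
    it; the window size is proportional to the current scale s (s > 0)."""
    x, y, s = start
    for _ in range(max_iter):
        bx = spatial_bw * s   # spatial radius grows with object scale
        bs = scale_bw * s     # radius along the scale axis
        wsum = mx = my = ms = 0.0
        for (vx, vy, vs), w in votes:
            inside = (((vx - x) / bx) ** 2 + ((vy - y) / bx) ** 2
                      + ((vs - s) / bs) ** 2) <= 1.0
            if inside:
                wsum += w
                mx += w * vx
                my += w * vy
                ms += w * vs
        if wsum == 0.0:
            break  # empty window: keep the current position
        nx, ny, ns = mx / wsum, my / wsum, ms / wsum
        shift = abs(nx - x) + abs(ny - y) + abs(ns - s)
        x, y, s = nx, ny, ns
        if shift < tol:
            break
    return (x, y, s)
```

Starting the search only from the strongest bins of a coarse accumulator keeps the number of mean-shift runs small, which is the point of the two-stage scheme in the figure.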
Detection Results
• Qualitative Performance
  – Recognizes different kinds of objects
  – Robust to clutter, occlusion, noise, low contrast
Figure-Ground Segregation
• Problem extensively studied in Psychophysics
• Experiments with ambiguous figure-ground stimuli
• Results:
  – Evidence that object recognition can and does operate before figure-ground organization
  – Interpreted as Gestalt cue familiarity.
M.A. Peterson, “Object Recognition Processes Can and Do Operate Before Figure-Ground Organization”, Cur. Dir. in Psych. Sc., 3:105-111, 1994.
ISM – Top-Down Segmentation
[Figure: interest points → matched codebook entries → probabilistic voting → 3D voting space (continuous; x, y, s) → backprojection of maxima → backprojected hypotheses → segmentation with p(figure) probabilities]
[Leibe04, Leibe08]
Segmentation: Probabilistic Formulation
• Influence of patch on object hypothesis (vote weight)
$$p(f, \ell \mid o_n, x) = \frac{\sum_i p(o_n, x \mid C_i, \ell)\, p(C_i \mid f)\, p(f, \ell)}{p(o_n, x)}$$
• Backprojection to features f and pixels p:
$$p(\mathbf{p} = \text{figure} \mid o_n, x) = \sum_{\mathbf{p} \in (f, \ell)} \underbrace{p(\mathbf{p} = \text{figure} \mid f, \ell, o_n, x)}_{\text{segmentation information}}\; \underbrace{p(f, \ell \mid o_n, x)}_{\text{influence on object hypothesis}}$$
[Leibe04, Leibe08]
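The pixel-level sum can be sketched as follows. This is an illustrative sketch, not the original implementation: each contributing patch is assumed to arrive with its influence weight p(f, ℓ | o_n, x) already computed and with a small mask of per-pixel figure probabilities; images are nested Python lists; and the final decision p(figure) > p(ground) is one simple choice of threshold. The names `backproject` and `segment` are hypothetical.

```python
def backproject(shape, contributions):
    """shape: (height, width) of the image.
    contributions: list of (weight, (top, left), mask), where weight is the
    patch influence p(f, l | o_n, x) and mask[r][c] is the stored figure
    probability of the patch pixel. Returns (p_figure, p_ground) maps."""
    h, w = shape
    p_fig = [[0.0] * w for _ in range(h)]
    p_gnd = [[0.0] * w for _ in range(h)]
    for weight, (top, left), mask in contributions:
        for r, row in enumerate(mask):
            for c, m in enumerate(row):
                y, x = top + r, left + c
                if 0 <= y < h and 0 <= x < w:
                    p_fig[y][x] += weight * m
                    p_gnd[y][x] += weight * (1.0 - m)
    return p_fig, p_gnd


def segment(p_fig, p_gnd):
    """Label a pixel as figure where p(figure) exceeds p(ground)."""
    return [[f > g for f, g in zip(frow, grow)]
            for frow, grow in zip(p_fig, p_gnd)]
```

Pixels that no patch covers keep zero in both maps and are therefore labeled as ground by this decision rule.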
Segmentation
• Interpretation of p(figure) map
  – Per-pixel confidence in object hypothesis
  – Use for hypothesis verification
[Figure: original image, p(figure) map, p(ground) map, and resulting segmentation]
[Leibe04, Leibe08]
Example Results: Motorbikes
Example Results: Cows
• Training
  – 112 hand-segmented images
• Results on novel sequences:
Single-frame recognition - No temporal continuity used!
[Leibe04, Leibe08]
Example Results: Chairs
Dining room chairs
Office chairs
Inferring Other Information: Part Labels
[Figure panels: Training, Test, Output]
[Thomas07]
Inferring Other Information: Part Labels (2)
[Thomas07]
Inferring Other Information: Depth Maps
“Depth from a single image”
[Thomas07]
Application for Pedestrian Detection
• Estimating Articulation
• Rotation-Invariant Detection
[Leibe, Seemann, Schiele, CVPR’05]
[Mikolajczyk, Leibe, Schiele, CVPR’06]
[Figure: geometry diagram with labels θ, φ, d]
Outline
1. Detection with Global Appearance & Sliding Windows
2. Local Invariant Features: Detection & Description
3. Specific Object Recognition with Local Features
― Coffee Break ―
4. Visual Words: Indexing, Bags of Words Categorization
5. Matching Local Features
6. Part-Based Models for Categorization
7. Current Challenges and Research Directions
Visual Object Recognition
Bastian Leibe
Computer Vision Laboratory, ETH Zurich
&
Kristen Grauman
Department of Computer Sciences, University of Texas at Austin
Chicago, 14.07.2008
Outline
1. Detection with Global Appearance & Sliding Windows
2. Local Invariant Features: Detection & Description
3. Specific Object Recognition with Local Features
― Coffee Break ―
4. Visual Words: Indexing, Bags of Words Categorization
5. Matching Local Features
6. Part-Based Models for Categorization
7. Current Challenges and Research Directions
Highlight of some research topics not covered in the main tutorial
Benchmark Data
• What degree of difficulty do current datasets have?
Example: Caltech-101
A dataset that has largely been mastered…
Images from the Caltech-101:
101-way multi-class classification problem
Example: Caltech-256
Images from the Caltech-256:
256-way multi-class recognition problem
Example: Pascal Visual Object Classes Challenge
Pascal VOC 2007:
Binary detection problems
http://pascallin.ecs.soton.ac.uk/challenges/VOC/
Example: LabelMe
http://labelme.csail.mit.edu/
Current challenges & ongoing research
• Multi-cue integration
• Finer level categorization
• View invariant recognition
• Unsupervised category discovery
• Learning from noisily labeled images
• Integration of segmentation and recognition
• Learning with text and images/video
• Use of video
• Context and scene layout
Multi-cue integration
• Single cues often not sufficient.
• Integrate multiple local and global cues.
Multi-Category Discrimination
• Distinguish similar categories.
• Need to look at specific details!
Multi-Aspect Recognition
• Detectors for different viewpoints ⇒ How can this be improved?
Multi-Aspect Recognition
[Thomas et al., CVPR’06]
[Hoiem, Rother, Winn, CVPR’07]
Multi-Aspect Recognition
[Rothganger et al., CVPR’03]
[Savarese & Fei-Fei, ICCV’07]
Unsupervised, semi-supervised category discovery
Topic models for images
• Probabilistic Latent Semantic Analysis (pLSA)
• Latent Dirichlet Allocation (LDA)
[Figure: graphical models for pLSA and LDA, with example topics “face” and “beach”]
Sivic et al. ICCV 2005, Fei-Fei et al. ICCV 2005
Figure credit: Fei-Fei Li
Unsupervised, semi-supervised category discovery
Clustering cluttered images
Learning from noisy keyword-based image search results
Grauman & Darrell, CVPR 2006
Fergus et al. ECCV 2004, ICCV 2005
Li & Fei-Fei, CVPR 2007
Learning with text and images/video
Berg, Berg, Edwards, & Forsyth, NIPS 2006
Barnard et al. JMLR 2003
Gupta et al. ECML 2008
Integrating segmentation + recognition
Kumar et al. CVPR 2005
Borenstein & Ullman, ECCV 2002
Kannan, Winn, & Rother, NIPS 2006
Tu, Chen, Yuille, Zhu, ICCV 2003
Role of context, understanding scene layout
Antonio Torralba, IJCV 2003
Role of context, understanding scene layout
Image World
Hoiem, Efros, & Hebert, CVPR 2006
Integration with Scene Geometry
• Goal: Find the ground plane
  – Restrict object location
  – Assume Gaussian size prior
  ⇒ Significantly reduced search space
[Figure labels: dense stereo, structure-from-motion; search corridor in the (x, y, s) Hough volume]
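To make the effect of the ground plane and the size prior concrete, here is a small Python sketch. It is a simplified illustration, not the method of the cited papers: it assumes a flat ground plane, a level pinhole camera at a known height, and a known horizon row, so that the real-world height implied by a detection follows from similar triangles. The function names and the prior parameters (mean 1.7 m, standard deviation 0.085 m) are hypothetical values chosen for a pedestrian example.

```python
import math


def implied_height(y_foot, pixel_height, horizon_row, camera_height):
    """Real-world height (same unit as camera_height) of an object that
    stands on the ground plane. Returns None if the foot point lies on or
    above the horizon, i.e. the hypothesis cannot touch the ground."""
    dy = y_foot - horizon_row
    if dy <= 0:
        return None
    return pixel_height * camera_height / dy


def size_prior(height, mean=1.7, sigma=0.085):
    """Gaussian density over real-world object height."""
    z = (height - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def rescore(hypotheses, horizon_row, camera_height):
    """hypotheses: list of (score, y_foot, pixel_height). Multiplies each
    detection score by the size prior; impossible hypotheses get 0."""
    result = []
    for score, y_foot, pixel_height in hypotheses:
        h = implied_height(y_foot, pixel_height, horizon_row, camera_height)
        result.append(0.0 if h is None else score * size_prior(h))
    return result
```

Hypotheses whose implied size is implausible receive a weight close to zero, which corresponds to restricting the search to a corridor of the voting space.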
Extensions
• Combination with 3D Geometry
• Mobile Pedestrian Detection
[Leibe, Cornelis, Cornelis, Van Gool, CVPR’07]
[Ess, Leibe, Van Gool, ICCV’07]
Detections Using Ground Plane Constraints
left camera
1175 frames
[Leibe et al. CVPR’07]
Extensions: Tracking-by-Detection
• Spacetime trajectory analysis
  – Link up detections to form physically plausible spacetime (ST) trajectories
  – Select the set of ST trajectories that best explains the data
[Leibe et al. CVPR’07]
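The first of these two steps, linking detections into physically plausible trajectories, can be sketched with a greedy nearest-neighbor linker. This is a deliberately simple stand-in and not the formulation of [Leibe et al. CVPR’07]: plausibility is reduced to a maximum displacement per frame (`max_step`, a hypothetical parameter), and the second step, selecting the set of trajectories that best explains the data, is not covered.

```python
import math


def link_detections(frames, max_step):
    """frames: list over time of lists of detections (x, y, score).
    Extends each active trajectory with the nearest unused detection that
    lies within max_step; unmatched detections start new trajectories.
    Returns a list of trajectories, each a list of (t, x, y, score)."""
    tracks = []
    active = []
    for t, detections in enumerate(frames):
        unused = list(detections)
        still_active = []
        for track in active:
            _, px, py, _ = track[-1]
            best = None  # (distance, detection) of the closest candidate
            for det in unused:
                d = math.hypot(det[0] - px, det[1] - py)
                if d <= max_step and (best is None or d < best[0]):
                    best = (d, det)
            if best is not None:
                unused.remove(best[1])
                track.append((t, *best[1]))
                still_active.append(track)
        for det in unused:
            track = [(t, *det)]
            tracks.append(track)
            still_active.append(track)
        active = still_active
    return tracks
```

A trajectory that finds no detection in a frame is closed in this sketch; a fuller system would bridge short gaps and resolve competing trajectories jointly.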
Dynamic Scene Analysis Results
[Leibe et al. CVPR’07]
Extensions (2)
• Combination with 3D Reconstruction
[Cornelis, Leibe, Cornelis, Van Gool, 3DPVT’06]
Textured 3D Model
• Run-times
  – SfM + Bundle adjustment: 27-30 fps on CPU
  – Dense reconstruction: 36 fps on GPU
Original 3D Reconstruction
[Cornelis, Cornelis, Van Gool, CVPR’06]
Improved 3D City Model
Enhancing your driving experience…
Original 3D Reconstruction
[Cornelis, Leibe, Cornelis, Van Gool, 3DPVT’06]
Putting It All Together…
[Figure: graphical model of the combined system]
Mobile Pedestrian Tracking
[Ess, Leibe, Schindler, Van Gool, CVPR’08]
Mobile Tracking Through Crowds
[Ess, Leibe, Schindler, Van Gool, CVPR’08]
Extension: Recovering Articulations
• Idea: Only perform articulated tracking where it’s easy!
• Multi-person tracking
  – Solves hard data association problem
• Articulated tracking
  – Only on individual “tracklets” between occlusions
[Gammeter, Ess, Jaeggli, Schindler, Leibe, Van Gool, ECCV’08]
Articulated Multi-Person Tracking
• Multi-person tracking
  – Recovers trajectories and solves data association
  – Estimates 3D walking direction and speed
  – Detects occlusion events
[Gammeter, Ess, Jaeggli, Schindler, Leibe, Van Gool, ECCV’08]
Articulated Tracking under Egomotion
[Gammeter, Ess, Jaeggli, Schindler, Leibe, Van Gool, ECCV’08]
Summary
• Visual recognition is a challenging and very active research area.
• We’ve covered some basic models and representations that have been shown to be effective, and highlighted some ongoing issues.
• See tutorial website for slides, links, references:
  http://www.vision.ee.ethz.ch/~bleibe/teaching/tutorial-aaai08/
Thank you!
K. Grauman, B. Leibe