Object Recognition
Vision Class 2006-7
Object Classes
Individual Recognition
Brief History: Recognition
Mental Rotation
Three-point alignment
Huttenlocher, D. & Ullman, S. Recognizing solid objects by alignment with an image. Int. J. Computer Vision 5(3), 195–212, 1990.
Object Alignment
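Given three model points P1, P2, P3 and three image points p1, p2, p3, there is a unique transformation (rotation R, translation d, scale s) that aligns the model with the image:
(sR + d) Pi = pi, i = 1, 2, 3
That is, each model point Pi, after scaling, rotation, and translation, lands on its image point pi.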
Alignment -- comments
• The projection is orthographic (combined with scaling).
• The 3 points are required to be non-collinear.
• The transformation is determined up to a reflection of the points about the image plane and translation in depth.
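One way to see why three non-collinear correspondences are enough: if the three model points are given 2D coordinates in their own plane, the correspondences fix a 2D affine map exactly (six equations, six unknowns). The sketch below solves only for that planar affine map with Cramer's rule. It does not recover the 3D rotation and scale, and the function name `affine_from_three_points` is hypothetical.

```python
def affine_from_three_points(model_pts, image_pts):
    """Solve u = a*x + b*y + c and v = a'*x + b'*y + c' from three 2D correspondences.

    model_pts, image_pts: three (x, y) pairs each.
    Returns [(a, b, c), (a', b', c')].
    """
    (x1, y1), (x2, y2), (x3, y3) = model_pts
    # Determinant of [[x1, y1, 1], [x2, y2, 1], [x3, y3, 1]]; zero iff collinear
    det = x1 * (y2 - y3) - y1 * (x2 - x3) + (x2 * y3 - x3 * y2)
    if abs(det) < 1e-12:
        raise ValueError("model points are collinear")
    rows = []
    for k in range(2):  # k = 0: image x coordinate, k = 1: image y coordinate
        u1, u2, u3 = (p[k] for p in image_pts)
        # Cramer's rule: replace one column of the system matrix by (u1, u2, u3)
        a = (u1 * (y2 - y3) - y1 * (u2 - u3) + (u2 * y3 - u3 * y2)) / det
        b = (x1 * (u2 - u3) - u1 * (x2 - x3) + (x2 * u3 - x3 * u2)) / det
        c = (x1 * (y2 * u3 - y3 * u2) - y1 * (x2 * u3 - x3 * u2)
             + u1 * (x2 * y3 - x3 * y2)) / det
        rows.append((a, b, c))
    return rows


# Model triangle mapped by a 90-degree rotation, scale 2, and translation (3, 4)
model = [(0, 0), (1, 0), (0, 1)]
image = [(3, 4), (3, 6), (1, 4)]
rows = affine_from_three_points(model, image)
```

Collinear model points make the determinant vanish, which matches the non-collinearity requirement in the comments above.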
Car Recognition
Car Models
Alignment: Cars
Alignment: Mismatch
Brief History: Classification
RBC
Structural Description
[Figure: structural description graph with parts G1, G2, G3, G4 linked by the relations Above, Right-of, Left-of, and Touch]
Classification: Current Approaches
Visual Class: Similar Arrangement of Shared Components
Optimal Class Components?
• Large features are too rare
• Small features are found everywhere
Find features that carry the highest amount of information
Entropy
Entropy: H = -Σp(xi) log2 p(xi)
      x = 0    x = 1    H
p =   0.5      0.5      ?
      0.1      0.9      0.47
      0.01     0.99     0.08
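The entropy values in the table can be checked with a few lines of Python. This is a minimal sketch using base-2 logarithms as in the formula; the function name `entropy` is just illustrative.

```python
import math


def entropy(probs):
    """Shannon entropy in bits; terms with p == 0 contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


for probs in [(0.5, 0.5), (0.1, 0.9), (0.01, 0.99)]:
    print(probs, round(entropy(probs), 2))
```

The uniform case (0.5, 0.5) gives the maximum for a binary variable, 1 bit, and the value drops as the distribution becomes more lopsided.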
Mutual information
I(C;F) = H(C) – H(C|F)
[Figure: H(C) compared with H(C) when F=1 and H(C) when F=0]
$$H(x) = -\sum_x P(x) \log P(x)$$
Mutual Information I
X alone: p(x) = 0.5, 0.5; H = 1.0
X given Y:
  Y = 0: p(x) = 0.8, 0.2; H = 0.72
  Y = 1: p(x) = 0.1, 0.9; H = 0.47
H(X|Y) = 0.5*0.72 + 0.5*0.47 = 0.595
H(X) – H(X|Y) = 1 – 0.595 = 0.405
I(X,Y) = 0.405
Mutual Information II
$$I(X, Y) = \sum_{x, y} p(x, y) \log \frac{p(x, y)}{p(x)\, p(y)}$$
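The joint-distribution form of mutual information translates directly into code. The sketch below assumes the joint distribution is given as a nested list `joint[x][y]` that sums to 1 and uses base-2 logarithms so the result is in bits. The function name is illustrative.

```python
import math


def mutual_information(joint):
    """I(X, Y) in bits from a joint probability table joint[x][y]."""
    px = [sum(row) for row in joint]          # marginal p(x)
    py = [sum(col) for col in zip(*joint)]    # marginal p(y)
    mi = 0.0
    for x, row in enumerate(joint):
        for y, pxy in enumerate(row):
            if pxy > 0:  # zero-probability cells contribute nothing
                mi += pxy * math.log2(pxy / (px[x] * py[y]))
    return mi


independent = [[0.25, 0.25], [0.25, 0.25]]   # X tells nothing about Y
identical = [[0.5, 0.0], [0.0, 0.5]]         # X determines Y
mi_independent = mutual_information(independent)
mi_identical = mutual_information(identical)
```

The two test tables mark the extremes for binary variables: 0 bits when X and Y are independent and 1 bit when one determines the other.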
Computing MI from Examples
• Mutual information can be measured from examples:
            100 Faces    100 Non-faces
Feature:    44 times     6 times
H(C) = 1, H(C|F) = 0.8475
Mutual information: 0.1525
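The computation from example counts can be sketched as follows, using I(C;F) = H(C) – H(C|F) with probabilities estimated as simple frequencies. The function and argument names are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def mi_from_counts(n_class, n_nonclass, hits_class, hits_nonclass):
    """Return (H(C), H(C|F), I(C;F)) estimated from detection counts."""
    total = n_class + n_nonclass
    h_c = entropy((n_class / total, n_nonclass / total))
    h_c_given_f = 0.0
    # First pair: images where the feature fired (F=1); second: where it did not (F=0)
    for in_class, in_nonclass in (
        (hits_class, hits_nonclass),
        (n_class - hits_class, n_nonclass - hits_nonclass),
    ):
        n = in_class + in_nonclass
        if n > 0:
            h_c_given_f += (n / total) * entropy((in_class / n, in_nonclass / n))
    return h_c, h_c_given_f, h_c - h_c_given_f


# Counts from the slide: 100 faces, 100 non-faces, feature found 44 and 6 times
h_c, h_c_given_f, mi = mi_from_counts(100, 100, 44, 6)
```

With the slide's counts this gives H(C) = 1 and values within about 0.001 of the slide's H(C|F) = 0.8475 and mutual information 0.1525.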
Mutual Info vs. Threshold
[Figure: mutual information versus detection threshold (0 to 40) for the fragments forehead, hairline, mouth, eye, nose, nosebridge, long_hairline, chin, twoeyes]
Fragments Selection
• For a set of training images:
• Generate candidate fragments
  – Measure p(F|C), p(F|NC)
• Compute mutual information
• Select optimal fragment
• After k fragments: select the fragment that maximizes the minimal addition in mutual information with respect to each of the first k fragments
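The selection loop above can be sketched as a greedy max-min procedure. The sketch assumes each candidate fragment has already been reduced to a binary detection (1 or 0) per training image, and measures the "addition" of a candidate relative to a selected fragment Fj as I(C; F, Fj) – I(C; Fj). The data layout and function names are assumptions for illustration.

```python
import math
from collections import Counter


def mutual_info(labels, features):
    """I(C; F) in bits; features[i] is any hashable value observed on image i."""
    n = len(labels)
    joint = Counter(zip(labels, features))
    pc = Counter(labels)
    pf = Counter(features)
    return sum((k / n) * math.log2(k * n / (pc[c] * pf[f]))
               for (c, f), k in joint.items())


def select_fragments(labels, detections, k):
    """Greedy max-min selection of k fragments (k >= 1).

    labels: list of 0/1 class labels, one per training image.
    detections: dict mapping fragment name -> list of 0/1 detections.
    """
    single = {name: mutual_info(labels, d) for name, d in detections.items()}
    selected = [max(single, key=single.get)]  # start with the most informative

    def min_gain(name):
        # Smallest information that `name` adds on top of any selected fragment
        return min(
            mutual_info(labels, list(zip(detections[name], detections[s]))) - single[s]
            for s in selected
        )

    while len(selected) < min(k, len(detections)):
        candidates = [n for n in detections if n not in selected]
        selected.append(max(candidates, key=min_gain))
    return selected


labels = [1, 1, 1, 1, 0, 0, 0, 0]
detections = {
    "a":      [1, 1, 1, 0, 0, 0, 0, 0],
    "a_copy": [1, 1, 1, 0, 0, 0, 0, 0],  # redundant with "a"
    "b":      [0, 0, 0, 1, 0, 0, 0, 0],  # weaker alone, but complements "a"
}
chosen = select_fragments(labels, detections, 2)
```

In the toy data, `a_copy` has high mutual information on its own but adds nothing once `a` is selected, so the max-min criterion prefers the complementary fragment `b`.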
Highly Informative Face Fragments
Horse-class features
Car-class features
Fragment ‘Weight’
Likelihood ratio:
$$R(F) = \frac{P(F \mid C)}{P(F \mid NC)}$$
Weight of F:
$$w(F) = \log R(F)$$
Decision:
∑wi Fi > θ
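The weight and decision rule can be sketched in a few lines. The slide does not state the base of the logarithm, so the natural log is assumed here; the function names and the threshold value are illustrative.

```python
import math


def fragment_weight(p_f_given_c, p_f_given_nc):
    """w(F) = log R(F), with R(F) = P(F|C) / P(F|NC). Natural log is an assumption.

    Both probabilities must be nonzero; estimates from counts may need smoothing.
    """
    return math.log(p_f_given_c / p_f_given_nc)


def classify(detected, weights, theta):
    """Return True (class present) if the summed weights of detected fragments exceed theta.

    detected[i] is 1 if fragment i was found in the image, else 0.
    """
    return sum(w * f for w, f in zip(weights, detected)) > theta


# A fragment found in 44 of 100 class images and 6 of 100 non-class images
w = fragment_weight(0.44, 0.06)

weights = [2.0, 1.0, 0.5]
accept = classify([1, 0, 1], weights, theta=2.0)   # 2.5 > 2.0
reject = classify([0, 1, 0], weights, theta=2.0)   # 1.0 > 2.0 fails
```

A fragment that is equally likely in class and non-class images gets R(F) = 1 and weight 0, so it has no influence on the decision.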
Combining fragments
$$\sum W_k F_k$$
[Figure: detectors D1, D2, …, Dk with weights w1, w2, …, wk feeding the weighted sum]
Feature detection:
Within a region
S(F,I) > Threshold
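The detection step can be sketched as a search for the best match within a region. The slide does not say which similarity measure S is used; normalized cross-correlation is assumed here purely for illustration, and images are plain nested lists of gray values. All names are hypothetical.

```python
import math


def ncc(a, b):
    """Normalized cross-correlation of two equal-length flat patches, in [-1, 1]."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [x - mean_b for x in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(x * x for x in db))
    return sum(x * y for x, y in zip(da, db)) / denom if denom > 0 else 0.0


def detect(fragment, image, region, threshold):
    """Return (F, best_score); F = 1 if the best similarity in the region exceeds threshold.

    region = (row0, row1, col0, col1): half-open ranges of top-left patch corners.
    """
    fh, fw = len(fragment), len(fragment[0])
    flat_fragment = [v for row in fragment for v in row]
    row0, row1, col0, col1 = region
    best = -1.0
    for r in range(row0, min(row1, len(image) - fh + 1)):
        for c in range(col0, min(col1, len(image[0]) - fw + 1)):
            patch = [image[r + i][c + j] for i in range(fh) for j in range(fw)]
            best = max(best, ncc(flat_fragment, patch))
    return (1 if best > threshold else 0), best


fragment = [[1, 2],
            [3, 4]]
image = [[0, 0, 0, 0],
         [0, 1, 2, 0],
         [0, 3, 4, 0],
         [0, 0, 0, 0]]
found, score = detect(fragment, image, region=(0, 3, 0, 3), threshold=0.9)
missed, low_score = detect(fragment, image, region=(0, 1, 0, 1), threshold=0.9)
```

Restricting the search to a region ties each fragment to a rough location, which is what lets the same threshold test act as a binary feature F in the weighted sum.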
Fragment-based Classification
Leibe, Schiele 2003
Fergus, Perona, Zisserman 2003
Agarwal, Roth 2002
Recognition: ROC Curves
Training & Test Images
• Frontal faces without distinctive features (K: 496, W: 385)
• Minimize background by cropping
• Training images for extraction: 32 for each class
• Training images for evaluation: 100 for each class
• Test images: 253 for Western and 364 for Korean
Training – Fragment Extraction
Western fragments
Score   0.92 0.82 0.77 0.76 0.75 0.74 0.72 0.68 0.67 0.65
Weight  3.42 2.40 1.99 2.23 1.90 2.11 6.58 4.14 4.12 6.47
Korean fragments
Score   0.92 0.82 0.77 0.76 0.75 0.74 0.72 0.68 0.67 0.65
Weight  3.42 2.40 1.99 2.23 1.90 2.11 6.58 4.14 4.12 6.47
Extracted Fragments
Classifying novel images
[Figure: classification pipeline with the labels Westerner, Korean, and Unknown. Detect fragments (F(k), F(w)) → compare summed weights W(F(k)) and W(F(w)) → decision]
Effect of Number of Fragments
[Figure: Correct - Error (%), 50% to 100%, versus number of fragments (1 to 100), for the Eastern and Western test sets]
• 7 fragments: 95%, 80 fragments: 100%
• Inherent redundancy of the features
• Slight violation of independence assumption
Harris Corner Detection
$$\sum \begin{pmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{pmatrix}$$
Harris Corner Operator
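$$H = \begin{pmatrix} \langle I_x^2 \rangle & \langle I_x I_y \rangle \\ \langle I_x I_y \rangle & \langle I_y^2 \rangle \end{pmatrix}$$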
Averages within a neighborhood.
Corner: The two eigenvalues λ1, λ2 are large
Indirectly:
‘Corner’ = det(H) – k trace²(H)
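The corner measure can be sketched in pure Python on a small image. The sketch assumes central-difference gradients, a plain box average for the neighborhood (practical implementations typically use a Gaussian window), and k = 0.04, a commonly used value that the slide does not specify.

```python
def harris_response(image, k=0.04, radius=1):
    """Return a map of det(H) - k * trace(H)**2 for a 2D list of gray values."""
    h, w = len(image), len(image[0])
    # Central-difference gradients; border pixels are left at zero
    ix = [[0.0] * w for _ in range(h)]
    iy = [[0.0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            ix[r][c] = (image[r][c + 1] - image[r][c - 1]) / 2.0
            iy[r][c] = (image[r + 1][c] - image[r - 1][c]) / 2.0
    response = [[0.0] * w for _ in range(h)]
    for r in range(radius, h - radius):
        for c in range(radius, w - radius):
            sxx = sxy = syy = 0.0
            count = 0
            # Average the gradient products over the neighborhood
            for dr in range(-radius, radius + 1):
                for dc in range(-radius, radius + 1):
                    gx, gy = ix[r + dr][c + dc], iy[r + dr][c + dc]
                    sxx += gx * gx
                    sxy += gx * gy
                    syy += gy * gy
                    count += 1
            sxx, sxy, syy = sxx / count, sxy / count, syy / count
            det = sxx * syy - sxy * sxy
            trace = sxx + syy
            response[r][c] = det - k * trace * trace
    return response


# 10x10 test image: a bright square occupying rows >= 5 and columns >= 5
image = [[1.0 if r >= 5 and c >= 5 else 0.0 for c in range(10)] for r in range(10)]
response = harris_response(image)
```

On the test image the response is positive at the square's corner, negative along its straight edge (one large eigenvalue only), and zero in the flat background.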
Harris Corner Examples
SIFT descriptor
David G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, 60, 2 (2004), pp. 91-110
Example:
4*4 sub-regions
Histogram of 8 orientations in each
V = 128 values:
g1,1, …, g1,8, …, g16,1, …, g16,8
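The 4*4 * 8 = 128 layout can be illustrated with a simplified descriptor for a 16x16 gray patch. This sketch keeps only the histogram structure: it omits the Gaussian weighting, interpolation between bins, orientation normalization, and value clipping of the full SIFT descriptor, and the function name is hypothetical.

```python
import math


def sift_like_descriptor(patch):
    """128 values from a 16x16 patch: 4x4 sub-regions, 8 orientation bins each."""
    assert len(patch) == 16 and all(len(row) == 16 for row in patch)
    hist = [[0.0] * 8 for _ in range(16)]
    for r in range(16):
        for c in range(16):
            # Gradient by differences, clamping indices at the patch border
            gx = patch[r][min(c + 1, 15)] - patch[r][max(c - 1, 0)]
            gy = patch[min(r + 1, 15)][c] - patch[max(r - 1, 0)][c]
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue
            angle = math.atan2(gy, gx) % (2 * math.pi)
            orientation_bin = int(angle / (2 * math.pi) * 8) % 8
            cell = (r // 4) * 4 + (c // 4)  # index of the 4x4-pixel sub-region
            hist[cell][orientation_bin] += magnitude
    vector = [value for cell in hist for value in cell]
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm > 0 else vector


# Horizontal intensity ramp: every gradient points along +x (bin 0)
ramp = [[c for c in range(16)] for _ in range(16)]
descriptor = sift_like_descriptor(ramp)
```

The output is ordered as in the slide: eight orientation values for sub-region 1, then eight for sub-region 2, and so on up to sub-region 16.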
Constellation of Patches Using interest points
Fergus, Perona, Zisserman 2003
A CAPTCHA™ is a program that can generate and grade tests that most humans can pass, but current computer programs can't pass.
Classification: Class Examples