Object Recognition
Recognition -- topics
• Features
• Classifiers
• Example ‘winning’ system
Object Classes
Individual Recognition
Object parts: automatic, or query-driven
[Figure: a car annotated with its parts: window, mirror, door knob, headlight, back wheel, bumper, front wheel]
[Figure: class vs. non-class examples]
Variability of Airplanes Detected
[Figure: class vs. non-class airplane examples]
Features and Classifiers
Same features with different classifiers; same classifier with different features
Generic Features: the same for all classes
From simple (wavelets) to complex (Geons)
Class-specific Features: Common Building Blocks
Optimal Class Components?
• Large features are too rare
• Small features are found everywhere
Find features that carry the highest amount of information
Entropy
Entropy:
p(x=0)  p(x=1)   H
0.5     0.5      1
0.1     0.9      0.47
0.01    0.99     0.08

H = −∑i p(xi) log2 p(xi)
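As a quick check of the table above, a minimal Python sketch (NumPy assumed) computing the binary entropies:

```python
import numpy as np

def entropy(p):
    """H = -sum_i p(x_i) log2 p(x_i); zero-probability terms contribute 0."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

print(entropy([0.5, 0.5]))    # 1.0
print(entropy([0.1, 0.9]))    # ~0.47
print(entropy([0.01, 0.99]))  # ~0.08
```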
Mutual information
I(C;F) = H(C) − H(C|F)
H(C) = −∑c P(c) log P(c)
[Figure: observing the feature splits the class entropy H(C) into H(C) when F=1 and H(C) when F=0]
Mutual Information I(C;F)
Class:   1 1 0 1 0 1 0 0
Feature: 1 0 0 1 1 1 0 0
I(C;F) = H(C) − H(C|F)
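A minimal sketch (NumPy assumed) computing I(C;F) for the eight examples above:

```python
import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def mutual_information(c, f):
    """I(C;F) = H(C) - H(C|F) for two binary label vectors."""
    h_c = entropy(np.bincount(c) / len(c))
    h_c_given_f = 0.0
    for v in (0, 1):
        mask = f == v
        if mask.any():
            h_c_given_f += mask.mean() * entropy(
                np.bincount(c[mask]) / mask.sum())
    return h_c - h_c_given_f

C = np.array([1, 1, 0, 1, 0, 1, 0, 0])   # class labels from the slide
F = np.array([1, 0, 0, 1, 1, 1, 0, 0])   # feature values from the slide
print(mutual_information(C, F))          # ~0.19 bits
```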
Optimal classification features
• Theoretically: maximizing delivered information minimizes classification error
• In practice: informative object components can be identified in training images
Mutual Info vs. Threshold
[Plot: mutual information vs. detection threshold for face fragments: forehead, hairline, mouth, eye, nose, nose bridge, long hairline, chin, two eyes]
Selecting Fragments
[Figures: horse-class and car-class fragment features]
Pictorial features learned from examples
Star model
Detected fragments ‘vote’ for the center location
Find location with maximal vote
In its variations, a popular state-of-the-art scheme
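A minimal sketch of the voting step; the detection and offset formats are illustrative assumptions:

```python
import numpy as np

def vote_for_center(detections, offsets, image_shape):
    """Star model voting: each detected fragment votes for the object center.

    detections: list of (fragment_id, x, y) fragment detections (assumed)
    offsets:    fragment_id -> (dx, dy) learned offset from the center
    Returns the (x, y) location with the maximal vote.
    """
    votes = np.zeros(image_shape, dtype=int)
    for frag_id, x, y in detections:
        dx, dy = offsets[frag_id]
        cx, cy = x - dx, y - dy               # predicted center location
        if 0 <= cx < image_shape[0] and 0 <= cy < image_shape[1]:
            votes[cx, cy] += 1
    return np.unravel_index(np.argmax(votes), votes.shape)
```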
Bag of words
Object → Bag of ‘words’
Bag of visual words: a large collection of image patches
1. Feature detection and representation
• Regular grid: Vogel & Schiele, 2003; Fei-Fei & Perona, 2005
Generate a dictionary using K-means clustering
Recognition by Bag of Words (BoW): each class has its own word histogram
Limited or no geometry. Simple and popular, but no longer state-of-the-art.
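A minimal sketch of the dictionary and histogram steps, assuming scikit-learn's KMeans; the patch descriptors and the dictionary size n_words are illustrative:

```python
import numpy as np
from sklearn.cluster import KMeans

def build_dictionary(patch_descriptors, n_words=200):
    """Cluster a large collection of patch descriptors into visual words."""
    return KMeans(n_clusters=n_words, n_init=10).fit(patch_descriptors)

def bow_histogram(dictionary, image_descriptors):
    """Represent one image by its normalized histogram of visual words."""
    words = dictionary.predict(image_descriptors)  # nearest word per patch
    hist = np.bincount(words, minlength=dictionary.n_clusters).astype(float)
    return hist / hist.sum()
```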
HoG Descriptor: Dalal, N. & Triggs, B., Histograms of Oriented Gradients for Human Detection
Shape context
Recognition, Class II:
Example Classifiers: SVM
SVM – linear separation in feature space
Separating line: w ∙ x + b = 0
Far line: w ∙ x + b = +1
Their distance: w ∙ ∆x = +1
Separation: |∆x| = 1/|w|
Margin: 2/|w|
[Figure: the margin, with separating lines at −1, 0, +1]
Max Margin Classification
Maximize the margin 2/|w|. Equivalently (usually used): minimize ½|w|² subject to yi (w ∙ xi + b) ≥ 1.
How to solve such a constrained optimization?
The examples are vectors xi
The labels yi are +1 for class, -1 for non-class
Solving the SVM problem
• Duality
• Final form
• Efficient solution
• Extensions
Using Lagrange multipliers, minimize
LP = ½|w|² − ∑i αi [yi (w ∙ xi + b) − 1]
with αi ≥ 0 the Lagrange multipliers.
Minimizing the Lagrangian
Minimize LP by setting all derivatives to 0:
∂LP/∂w = 0 gives w = ∑i αi yi xi
∂LP/∂b = 0 gives ∑i αi yi = 0
The derivative w.r.t. αi recovers the constraint yi (w ∙ xi + b) = 1 at the support vectors.
Dual formulation: maximize the Lagrangian w.r.t. the αi, subject to the two conditions above.
Solved in the ‘dual’ formulation.
Maximize w.r.t. the αi:
LD = ∑i αi − ½ ∑i,j αi αj yi yj (xi ∙ xj)
with the conditions: αi ≥ 0 and ∑i αi yi = 0
Dual formulation
Mathematically equivalent formulation: maximize the Lagrangian with respect to the αi.
After manipulations, a concise matrix form: maximize ∑i αi − ½ αᵀQα, where Qij = yi yj (xi ∙ xj), subject to αi ≥ 0 and ∑i αi yi = 0.
Summary points
• Linear separation with the largest margin, f(x) = w∙x + b
• Dual formulation
• Natural extension to non-separable classes
• Extension through kernels, f(x) = ∑ αi yi K(xi, x) + b
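As a concrete illustration of the summary above, a minimal sketch using scikit-learn's SVC (an off-the-shelf solver, not the derivation itself; the toy data and C value are arbitrary):

```python
import numpy as np
from sklearn.svm import SVC

# Toy 2-D data: y = +1 for class, -1 for non-class (arbitrary values)
X = np.array([[2.0, 2.0], [2.5, 3.0], [3.0, 2.5],
              [0.0, 0.0], [0.5, 1.0], [1.0, 0.2]])
y = np.array([+1, +1, +1, -1, -1, -1])

clf = SVC(kernel="linear", C=10.0).fit(X, y)
w, b = clf.coef_[0], clf.intercept_[0]      # f(x) = w.x + b
print("margin:", 2.0 / np.linalg.norm(w))   # 2/|w|
print("support vectors:\n", clf.support_vectors_)
```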
Felzenszwalb
• Felzenszwalb, McAllester, Ramanan CVPR 2008. A Discriminatively Trained, Multiscale, Deformable Part Model
• There are many implementation details; we describe only the main points.
Using patches with HoG descriptors and classification by SVM
Person model: HoG orientations with w > 0
Object model using HoG
A bicycle and its ‘root filter’. The root filter is a patch of HoG descriptors. The image is partitioned into 8×8-pixel cells, and in each cell we compute a histogram of gradient orientations.
The filter is searched over a pyramid of HoG descriptors, to deal with unknown scale.
Dealing with scale: multi-scale analysis
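A simplified sketch of the per-cell orientation histograms; the block normalization and interpolation steps of Dalal & Triggs are omitted here:

```python
import numpy as np

def hog_cell_histograms(gray, cell=8, n_bins=9):
    """Magnitude-weighted histogram of gradient orientations per 8x8 cell
    (simplified; block normalization from Dalal & Triggs is omitted)."""
    gy, gx = np.gradient(gray.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180      # unsigned orientations
    h, w = gray.shape
    hist = np.zeros((h // cell, w // cell, n_bins))
    for i in range(h // cell):
        for j in range(w // cell):
            block = np.s_[i*cell:(i+1)*cell, j*cell:(j+1)*cell]
            bins = (ang[block].ravel() / (180 / n_bins)).astype(int) % n_bins
            np.add.at(hist[i, j], bins, mag[block].ravel())
    return hist
```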
A part Pi = (Fi, vi, si, ai, bi).
Fi is the filter for the i-th part, vi is the center of a box of possible positions for part i relative to the root position, and si is the size of this box.
ai and bi are two-dimensional vectors specifying coefficients of a quadratic function measuring a score for each possible placement of the i-th part. That is, ai and bi are two numbers each, and the penalty for a deviation ∆x, ∆y from the expected location is a1∆x + a2∆y + b1∆x² + b2∆y²
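Written out as code, the placement penalty above is just (a minimal sketch, coefficient names following the slide):

```python
def deformation_penalty(dx, dy, a, b):
    """Penalty for a part placed (dx, dy) away from its expected location:
    a1*dx + a2*dy + b1*dx**2 + b2*dy**2, with a = (a1, a2), b = (b1, b2)."""
    return a[0] * dx + a[1] * dy + b[0] * dx**2 + b[1] * dy**2
```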
Adding Parts
Bicycle model: root, parts, spatial map
Person model
The full score of a potential match is: ∑i Fi ∙ Hi + ∑i (ai1 xi + ai2 yi + bi1 xi² + bi2 yi²)
Fi ∙ Hi is the appearance term.
xi, yi is the deviation of part pi from its expected location in the model; this is the spatial term.
Match Score
The score of a match can be expressed as the dot product of a vector β of coefficients with a vector ψ computed from the image:
Score = β∙ψ
Using the vectors ψ to train an SVM classifier:
β∙ψ > +1 for class examples
β∙ψ < −1 for non-class examples
However, ψ depends on the placement z, that is, the values of ∆xi, ∆yi
We need to take the best ψ over all placements; in their notation this is f. Classification then uses β∙f > 1.
Search with gradient descent over the placement; this also includes the levels in the pyramid. Start with the root filter and find locations with a high score for it. For these high-scoring locations, search for the optimal placement of the parts at the level with twice the resolution of the root filter, using gradient descent.
Final decision: β∙ψ > θ implies class.
Recognition
Essentially, maximize ∑i Fi ∙ Hi + ∑i (ai1 xi + ai2 yi + bi1 xi² + bi2 yi²) over placements (xi, yi).
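A minimal sketch of this per-part maximization, scanning a small window of candidate placements around the expected location; the responses input (filter responses Fi ∙ Hi) is illustrative:

```python
import numpy as np

def best_part_placement(responses, a, b, radius=4):
    """Maximize Fi.Hi + a1*x + a2*y + b1*x^2 + b2*y^2 over placements.

    responses: (2*radius+1, 2*radius+1) array of filter responses Fi.Hi,
               centered at the part's expected location (illustrative input).
    """
    best_score, best_xy = -np.inf, (0, 0)
    for x in range(-radius, radius + 1):
        for y in range(-radius, radius + 1):
            score = (responses[x + radius, y + radius]
                     + a[0] * x + a[1] * y + b[0] * x**2 + b[1] * y**2)
            if score > best_score:
                best_score, best_xy = score, (x, y)
    return best_score, best_xy
```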
• Training: positive examples with bounding boxes around the objects, and negative examples.
• Learn the root filter using SVM.
• Define a fixed number of parts, at locations of high energy in the root filter's HoG.
• Use these to start the iterative learning.
Hard Negatives
The set M of hard negatives for a known β and data set D: these are support vectors (y ∙ f = 1) or misses (y ∙ f < 1).
Optimal SVM training does not need all the examples; hard examples are sufficient. For a given β, use the positive examples plus C hard examples. Use this data to compute β by standard SVM. Iterate (with a new set of C hard examples).
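A sketch of this iterative loop; train_svm and score are hypothetical placeholders for a standard SVM trainer and the scoring function β∙f:

```python
def train_with_hard_negatives(positives, negative_pool, C, n_iters,
                              train_svm, score):
    """Iterative hard-negative mining (sketch).

    train_svm(pos, neg) -> beta   : standard SVM training (placeholder)
    score(beta, example) -> float : the value beta . f (placeholder)
    """
    hard = negative_pool[:C]                  # start from any C negatives
    beta = None
    for _ in range(n_iters):
        beta = train_svm(positives, hard)     # standard SVM on pos + hard
        # re-mine: the hardest negatives are those with the highest scores
        # (support vectors, y*f = 1, or misses, y*f < 1)
        ranked = sorted(negative_pool, key=lambda ex: -score(beta, ex))
        hard = ranked[:C]                     # new set of C hard examples
    return beta
```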
[Figure: detection results; all images contain at least one bike]
Future challenges :
• Dealing with a very large number of classes – ImageNet: 15,000 categories, 12 million images
• To consider: human-level performance for at least one class