Object Recognition


Recognition -- topics

• Features

• Classifiers

• Example ‘winning’ system

Object Classes

Individual Recognition

Object parts: automatic, or query-driven

[Figure: a car with labeled parts: headlight, window, door knob, back wheel, mirror, front wheel, bumper]

[Figure: class vs. non-class example images]

Variability of Airplanes Detected

[Figure: class vs. non-class airplane examples, illustrating within-class variability]

Features and Classifiers

Same features with different classifiers; same classifier with different features.

Generic Features: the same for all classes

Simple (wavelets) vs. complex (Geons)

Class-specific Features: Common Building Blocks

Optimal Class Components?

• Large features are too rare

• Small features are found everywhere

Find features that carry the highest amount of information

Entropy

H = −∑i p(xi) log2 p(xi)

For a binary variable x ∈ {0, 1}:

p = (0.5, 0.5)   → H = 1.0
p = (0.1, 0.9)   → H = 0.47
p = (0.01, 0.99) → H = 0.08
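A quick numeric check of the table above (a minimal sketch; the entropy helper is our own, not from the slides):

```python
import numpy as np

def entropy(probs):
    """Shannon entropy in bits: H = -sum_i p(x_i) * log2(p(x_i))."""
    p = np.asarray(probs, dtype=float)
    p = p[p > 0]  # treat 0 * log(0) as 0
    return float(-np.sum(p * np.log2(p)))

# Reproduces the table: 1.0, 0.47, 0.08
for dist in [(0.5, 0.5), (0.1, 0.9), (0.01, 0.99)]:
    print(dist, round(entropy(dist), 2))
```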

Mutual information

I(C;F) = H(C) − H(C|F)

where H(C) = −∑ P(c) log P(c), and H(C|F) is the average of H(C) when F = 1 and H(C) when F = 0, weighted by P(F = 1) and P(F = 0).

Mutual Information I(C;F)

Class:   1 1 0 1 0 1 0 0
Feature: 1 0 0 1 1 1 0 0

I(C;F) = H(C) − H(C|F)
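For the toy class/feature strings above, the mutual information can be computed directly (a sketch; the mutual_information helper is ours):

```python
import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def mutual_information(c, f):
    """I(C;F) = H(C) - H(C|F) for binary arrays c, f."""
    c, f = np.asarray(c), np.asarray(f)
    h_c = entropy(np.bincount(c) / len(c))
    h_c_given_f = 0.0
    for v in (0, 1):
        mask = f == v
        if mask.any():  # weight H(C | F=v) by P(F=v)
            h_c_given_f += mask.mean() * entropy(np.bincount(c[mask]) / mask.sum())
    return h_c - h_c_given_f

C = [1, 1, 0, 1, 0, 1, 0, 0]
F = [1, 0, 0, 1, 1, 1, 0, 0]
print(mutual_information(C, F))  # ~0.19 bits
```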

Optimal classification features

• Theoretically: maximizing delivered information minimizes classification error

• In practice: informative object components can be identified in training images

Mutual Info vs. Threshold

[Plot: mutual information as a function of the detection threshold, one curve per face fragment: forehead, hairline, mouth, eye, nose, nose bridge, long hairline, chin, two eyes]

Selecting Fragments

Horse-class features

Car-class features

Pictorial features, learned from examples

Star model

Detected fragments ‘vote’ for the center location

Find location with maximal vote

In its many variations, a popular state-of-the-art scheme.
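A minimal sketch of the voting step, with hypothetical fragment detections (each detected fragment carries the offset to the object center learned from training examples):

```python
import numpy as np

# Hypothetical detections: the (x, y) position of each detected fragment,
# and the (dx, dy) offset from that fragment to the object center.
detections = [((40, 30), (+12, +5)),
              ((60, 32), (-8, +3)),
              ((52, 50), (0, -15))]

votes = np.zeros((100, 100))           # accumulator over image locations
for (x, y), (dx, dy) in detections:
    votes[y + dy, x + dx] += 1         # each fragment votes for a center

cy, cx = np.unravel_index(votes.argmax(), votes.shape)
print("predicted center:", (cx, cy))   # the location with maximal vote
```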

Bag of words

Object → Bag of 'words'

Bag of visual words: a large collection of image patches

1. Feature detection and representation

• Regular grid – Vogel & Schiele, 2003; Fei-Fei & Perona, 2005

Generate a dictionary using K-means clustering

Recognition by Bag of Words (BoW): each class has its own word histogram.

Limited or no geometry. Simple and popular, but no longer state-of-the-art.
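A compact sketch of the pipeline (dictionary by K-means, then per-image word histograms) using scikit-learn; patch descriptors are stubbed out with random vectors:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
patches = rng.normal(size=(1000, 128))    # stand-ins for patch descriptors

# 1. Generate a dictionary of K visual words with K-means clustering
K = 50
kmeans = KMeans(n_clusters=K, n_init=10, random_state=0).fit(patches)

# 2. Represent an image as a normalized histogram over the K words
def bow_histogram(image_descriptors):
    words = kmeans.predict(image_descriptors)
    hist = np.bincount(words, minlength=K).astype(float)
    return hist / hist.sum()

print(bow_histogram(rng.normal(size=(200, 128))).shape)  # (50,)
```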

HoG Descriptor. Dalal, N. & Triggs, B., Histograms of Oriented Gradients for Human Detection.
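For reference, scikit-image ships a HoG implementation with the standard cell/block parameters (a usage sketch):

```python
from skimage import data
from skimage.feature import hog

image = data.astronaut()[:, :, 0]          # any grayscale image
descriptor = hog(image,
                 orientations=9,           # orientation bins per histogram
                 pixels_per_cell=(8, 8),   # one histogram per 8x8-pixel cell
                 cells_per_block=(2, 2))   # blocks used for normalization
print(descriptor.shape)
```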

Shape context

Recognition Class II: Example Classifiers (SVM)

SVM – linear separation in feature space

Separating line: w∙x + b = 0
Far line: w∙x + b = +1
Their distance: w∙∆x = +1
Separation: |∆x| = 1/|w|
Margin: 2/|w|

[Figure: the margin, bounded by the lines at −1, 0, +1]

Max Margin Classification

Maximize the margin 2/|w|; equivalently (the form usually used), minimize ½|w|², subject to yi (w∙xi + b) ≥ 1 for all examples.

How do we solve such a constrained optimization?

The examples are vectors xi; the labels yi are +1 for class, −1 for non-class.

Solving the SVM problem

• Duality

• Final form

• Efficient solution

• Extensions

Using Lagrange multipliers, minimize

LP = ½|w|² − ∑i αi [yi (w∙xi + b) − 1]

with αi ≥ 0 the Lagrange multipliers.

Minimizing the Lagrangian

Minimize LP: set all the derivatives to 0:

∂LP/∂w = 0 ⇒ w = ∑i αi yi xi
∂LP/∂b = 0 ⇒ ∑i αi yi = 0

(Similarly for the derivatives w.r.t. the αi.)

Dual formulation: maximize the Lagrangian w.r.t. the αi, subject to the two conditions above.

Solved in 'dual' formulation

Maximize w.r.t. the αi:

LD = ∑i αi − ½ ∑i,j αi αj yi yj (xi∙xj)

with the conditions: αi ≥ 0 and ∑i αi yi = 0.

Dual formulation

A mathematically equivalent formulation: maximize the Lagrangian with respect to the αi.

After manipulations, a concise matrix form: maximize 1ᵀα − ½ αᵀHα, where Hij = yi yj (xi∙xj).
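The matrix form is straightforward to write down in code (a toy sketch; the data and variable names are ours):

```python
import numpy as np

# Toy data: the x_i are rows of X, the labels y_i are +1 / -1
X = np.array([[0., 0.], [1., 0.], [3., 1.], [4., 1.]])
y = np.array([-1., -1., 1., 1.])

# H[i, j] = y_i * y_j * (x_i . x_j)
H = np.outer(y, y) * (X @ X.T)

def dual_objective(alpha):
    """L_D(alpha) = 1^T alpha - 0.5 * alpha^T H alpha,
    to be maximized s.t. alpha >= 0 and sum(alpha * y) = 0."""
    return alpha.sum() - 0.5 * alpha @ H @ alpha
```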

Summary points

• Linear separation with the largest margin, f(x) = w∙x + b

• Dual formulation

• Natural extension to non-separable classes

• Extension through kernels, f(x) = ∑ αi yi K(xi, x) + b
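A minimal usage sketch with scikit-learn, showing the linear case and the kernel extension on synthetic data:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(4, 1, (50, 2))])
y = np.array([-1] * 50 + [1] * 50)

linear = SVC(kernel="linear").fit(X, y)    # f(x) = w.x + b
w = linear.coef_[0]
print("margin 2/|w| =", 2 / np.linalg.norm(w))

rbf = SVC(kernel="rbf").fit(X, y)          # f(x) = sum_i alpha_i y_i K(x_i, x) + b
print("support vectors per class:", rbf.n_support_)
```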

Felzenszwalb

• Felzenszwalb, McAllester, Ramanan CVPR 2008. A Discriminatively Trained, Multiscale, Deformable Part Model

• There are many implementation details; we will describe the main points.

Using patches with HoG descriptors and classification by SVM

Person model: HoG orientations with w > 0

Object model using HoG

A bicycle and its 'root filter'. The root filter is a patch of HoG descriptors: the image is partitioned into 8×8-pixel cells, and in each cell we compute a histogram of gradient orientations.

The filter is searched on a pyramid of HoG descriptors, to deal with unknown scale

Dealing with scale: multi-scale analysis

A part Pi = (Fi, vi, si, ai, bi).

Fi is the filter for the i-th part, vi is the center of a box of possible positions for part i relative to the root position, and si is the size of this box.

ai and bi are two-dimensional vectors specifying the coefficients of a quadratic function that scores each possible placement of the i-th part. That is, ai and bi are two numbers each, and the penalty for a deviation ∆x, ∆y from the expected location is a1∆x + a2∆y + b1∆x² + b2∆y².

Adding Parts

Bicycle model: root, parts, spatial map

Person model

The full score of a potential match is:

∑i Fi∙Hi + ∑i (ai1 xi + ai2 yi + bi1 xi² + bi2 yi²)

Fi ∙ Hi is the appearance part

xi, yi is the deviation of part pi from its expected location in the model; this is the spatial part.
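A sketch of scoring a single placement under this model (all arrays are hypothetical stand-ins; note that, as on the slide, the spatial term is added, so the bi coefficients would be negative to act as a penalty):

```python
import numpy as np

def placement_score(filters, hog_windows, deviations, coeffs):
    """Score = sum_i F_i . H_i + sum_i (a_i1*x + a_i2*y + b_i1*x^2 + b_i2*y^2).

    filters, hog_windows: per-part arrays F_i and H_i of equal shape;
    deviations: per-part (x_i, y_i) deviation from the expected location;
    coeffs: per-part (a_i1, a_i2, b_i1, b_i2)."""
    score = 0.0
    for F, H, (x, y), (a1, a2, b1, b2) in zip(filters, hog_windows,
                                              deviations, coeffs):
        score += np.sum(F * H)                            # appearance part
        score += a1 * x + a2 * y + b1 * x**2 + b2 * y**2  # spatial part
    return score
```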

Match Score

The score of a match can be expressed as the dot-product of a vector β of coefficients, with the image:

Score = β∙ψ

Using the vectors ψ to train an SVM classifier:

β∙ψ > +1 for class examples
β∙ψ < −1 for non-class examples

However, ψ depends on the placement z, that is, the values of ∆xi, ∆yi.

We need to take the best ψ over all placements z; in their notation, the score is fβ(x) = max_z β∙Φ(x, z). Classification then uses β∙f > 1.

Search over placements with gradient descent; this also includes the levels of the pyramid. Start with the root filter and find locations where it scores highly; for each of these high-scoring locations, search for the optimal placement of the parts at the level with twice the resolution of the root filter, using gradient descent.

Final decision: β∙ψ > θ implies class.

Recognition

Essentially, maximize

∑i Fi∙Hi + ∑i (ai1 xi + ai2 yi + bi1 xi² + bi2 yi²)

over placements (xi, yi).

• Training -- positive examples with bounding boxes around the objects, and negative examples.

• Learn root filter using SVM

• Define a fixed number of parts, at locations of high energy in the root filter's HoG

• Use these to start the iterative learning

Hard Negatives

The set M of hard negatives for a known β and data set D: these are support vectors (y∙f = 1) or misses (y∙f < 1).

Optimal SVM training does not need all the examples; hard examples are sufficient. For a given β, use the positive examples + C hard examples. Use this data to compute β by standard SVM. Iterate (with a new set of C hard examples), as sketched below.
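A sketch of that loop (train_with_hard_negatives is our own name; C_hard is the cache size C from the slide, distinct from the SVM regularization constant):

```python
import numpy as np
from sklearn.svm import LinearSVC

def train_with_hard_negatives(pos, neg, C_hard=500, n_iters=5):
    """Train on positives + a cache of C_hard hard negatives, re-mine, repeat.

    pos, neg: arrays whose rows are feature vectors (the psi vectors).
    For a negative (y = -1), 'hard' means y*f <= 1, i.e. f >= -1,
    so the negatives with the largest decision values are the hardest."""
    hard = neg[:C_hard]                       # initial negative cache
    clf = None
    for _ in range(n_iters):
        X = np.vstack([pos, hard])
        y = np.r_[np.ones(len(pos)), -np.ones(len(hard))]
        clf = LinearSVC().fit(X, y)           # the standard SVM step
        f = clf.decision_function(neg)
        hard = neg[np.argsort(f)[-C_hard:]]   # keep the C_hard hardest negatives
    return clf
```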

[Figure: detection results; all images contain at least one bike]

Future challenges :

• Dealing with a very large number of classes – ImageNet: 15,000 categories, 12 million images

• To consider: human-level performance for at least one class
