Shared features and Joint Boosting
Yuandong Tian
Sharing visual features for multiclass and multiview object detection. A. Torralba, K. P. Murphy, and W. T. Freeman. PAMI, vol. 29, no. 5, pp. 854-869, May 2007.
Outline
- Motivation to choose this paper
- Motivation of this paper
- Basic ideas in boosting
- Joint Boost
- Features used in this paper
- My results on face recognition
Motivation to choose this paper
- Axiom: computer vision is hard.
- Assumption (smart-stationarity): equally smart people are equally distributed over time.
- Conjecture: if computer vision has not been solved in 30 years, it will never be solved!
Wrong! Because we are standing on the shoulders of giants.
Where are the Giants?
- More computing resources?
- Lots of data?
- Advances in algorithms?
- Machine learning?
What I believe
Cruel Reality
Why does ML seem not to help much in CV (at least for now)?
My answer: CV and ML are weakly coupled.
A typical question in CV
Q: Why do we use feature A instead of feature B?
A1: Feature A gives better performance.
A2: Feature A has some fancy properties.
A3: The following step requires the feature to have a certain property that only A has. (A strongly-coupled answer.)
Typical CV pipeline
Preprocessing steps ("computer vision") -> feature/similarity computation -> ML black box.
The preprocessing steps carry domain-specific structure; the ML black box is designed for generic structure.
Contribution of this paper
- Tunes the ML algorithm in a CV context.
- A good attempt to break open the black box and integrate CV and ML together.
Outline
- Motivation to choose this paper
- Motivation of this paper
- Basic ideas in boosting
- Joint Boost
- Features used in this paper
- My results on face recognition
This paper
The object recognition problem: many object categories, few images per category.
Solution: feature sharing. Find common features that distinguish a subset of classes from all the rest.
Concept of feature sharing
- Template-like features: 100% accuracy for a single object, but too specific.
- Wavelet-like features: weaker discriminative power, but shared across many classes.
Typical behavior of feature sharing
Result of feature sharing
Why feature sharing?
- ML view: regularization to avoid over-fitting; effectively more positive samples, since the data are reused across classes.
- CV view: exploit the intrinsic structure of the object categories; use this domain-specific prior to bias the machine learning algorithm.
Outline
- Motivation to choose this paper
- Motivation of this paper
- Basic ideas in boosting
- Joint Boost
- Features used in this paper
- My results on face recognition
Basic idea in Boosting
Setting: binary classification, with samples x_i and labels y_i in {+1, -1}.
Goal: find a function (classifier) H that maps positive samples to positive values.
Optimization: minimize the exponential loss w.r.t. the classifier H: J(H) = sum_i exp(-y_i H(x_i)).
Basic idea in boosting(2)
Boosting: assume H is additive, H(x) = sum_m h_m(x).
Each h_m is a "weak" learner (classifier): almost random, but uniformly better than random.
Example: a single-feature classifier (a decision stump) that makes its decision based on only one dimension.
Key point: the sum of weak classifiers gives a strong classifier! What matters is what the weak learner looks like.
Basic idea in boosting(3)
How to minimize? A greedy approach: fix H and add one weak learner h in each iteration.
Sample weighting: after each iteration, wrongly classified samples (the difficult ones) get higher weights.
Technical part
The greedy step is approximated by a second-order Taylor expansion of the loss in each iteration, which turns picking the next weak learner h_m into a weighted least-squares problem:
J_wse = sum_i w_i (z_i - h_m(x_i))^2, with weights w_i = exp(-z_i H(x_i)) and labels z_i in {+1, -1}.
The weak learner to be optimized in this iteration is thus solved by weighted least squares.
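To make the weighted-least-squares round concrete, here is a minimal GentleBoost-style sketch in Python with decision-stump weak learners. This is my illustration of the procedure described above, not the paper's code; all names are mine.

import numpy as np

def fit_stump(X, z, w):
    # Fit h(x) = a * [x_f > theta] + b by weighted least squares,
    # scanning every feature f and candidate threshold theta.
    best = None
    for f in range(X.shape[1]):
        for theta in np.unique(X[:, f]):
            idx = X[:, f] > theta
            w_pos, w_neg = w[idx].sum(), w[~idx].sum()
            if w_pos == 0 or w_neg == 0:
                continue
            b = (w[~idx] * z[~idx]).sum() / w_neg    # weighted mean, left side
            a = (w[idx] * z[idx]).sum() / w_pos - b  # offset on the right side
            err = (w * (z - (a * idx + b)) ** 2).sum()
            if best is None or err < best[0]:
                best = (err, f, theta, a, b)
    return best[1:]

def gentle_boost(X, y, n_rounds=50):
    # y in {-1, +1}. Returns the list of stumps whose sum is H.
    H = np.zeros(len(y))
    stumps = []
    for _ in range(n_rounds):
        w = np.exp(-y * H)      # difficult samples get higher weight
        w /= w.sum()
        f, theta, a, b = fit_stump(X, y, w)
        H += a * (X[:, f] > theta) + b
        stumps.append((f, theta, a, b))
    return stumps

The strong classifier is the sign of the summed stump outputs; each round only needs the closed-form weighted means above, which is what "solved by least squares" refers to.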
Outline
- Motivation to choose this paper
- Motivation of this paper
- Basic ideas in boosting
- Joint Boost
- Features used in this paper
- My results on face recognition
Joint Boost: multiclass
We can minimize a similar loss using a one-vs-all strategy: J = sum_c sum_i exp(-z_i^c H(v_i, c)), where z_i^c = +1 if sample i belongs to class c and -1 otherwise.
By itself this does not work very well, since the objective is separable in c: each class is still trained independently.
Put constraints across classes -> shared features!
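Concretely, each round's shared weak learner in the paper is a regression stump on one shared feature: classes inside the chosen subset S(m) share the stump, while classes outside it receive only a class-specific constant. Reconstructed in LaTeX (symbols approximate the paper's notation):

h_m(v, c) =
\begin{cases}
  a_m \, \delta\left(v^{f_m} > \theta_m\right) + b_m, & c \in S(m) \\
  k_m^c, & c \notin S(m)
\end{cases}

The constant k_m^c keeps the classes outside the subset from being perturbed by a stump they do not share.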
Joint Boost (2)
In each iteration, choose one common feature and a subset of classes that will use this feature, so that the objective decreases the most.
[Sharing diagram: a class-by-iteration matrix showing which feature is selected at each boosting iteration and which subset of classes shares it.]
Key insight
Each class may have its own favorite feature. A common feature may not be the favorite of any single class, yet it simultaneously decreases the errors of many classes.
Joint Boost – Illustration
Computational issue
Choosing the best subset of classes is prohibitive: there are O(2^C) candidate subsets. Use a greedy approach instead (see the sketch below):
- Choose the single class and feature for which the objective decreases the most.
- Iteratively add more classes until the objective increases again; note that the best common feature may change as the subset grows.
This reduces the search from O(2^C) to O(C^2) (greedy).
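A sketch of that greedy search in Python. Here `stump_error` is a hypothetical callback: given a candidate subset of classes, it fits the best shared stump (feature, threshold, regression values) and returns the resulting value of the objective.

def greedy_subset(classes, stump_error):
    # Greedy O(C^2) replacement for the O(2^C) exhaustive subset search.
    remaining = set(classes)
    subset, best_err = [], float("inf")
    while remaining:
        # Try adding each remaining class; keep the one that helps most.
        cand = min(remaining, key=lambda c: stump_error(subset + [c]))
        err = stump_error(subset + [cand])
        if err >= best_err:       # the objective stopped decreasing: stop
            break
        subset.append(cand)       # the best shared feature may change as
        remaining.remove(cand)    # the subset grows: stump_error refits it
        best_err = err
    return subset, best_err

Each round makes at most C passes over at most C candidates, which is the O(C^2) cost quoted above.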
With sharing, the number of features needed grows as O(log #classes) rather than linearly.
[Figure: number of features required to reach an ROC area of 0.95 vs. number of classes; 29 objects, averaged over 20 training sets.]
Outline
- Motivation to choose this paper
- Motivation of this paper
- Basic ideas in boosting
- Joint Boost
- Features used in this paper
- My results on face recognition
Features used in the paper
- Dictionary: 2000 randomly sampled patches of sizes from 4x4 to 14x14, with no clustering.
- Each patch is associated with a spatial mask.
The candidate features
[Figure: the dictionary of 2000 candidate patches (templates) and position masks, randomly sampled from the training images.]
Features
Building feature vectors:
- Take the normalized correlation of the image with each patch to get a response map.
- Raise the response to some power: large values get even larger and dominate, approximating a max operation.
- Use the spatial mask to align the responses to the object center (voting).
- Extract the response vector at the object center.
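A rough sketch of this feature computation in Python (my reading of the steps above; the normalization is simplified relative to true per-window normalized correlation, and the exponent p is an assumed value):

import numpy as np
from scipy.signal import correlate2d

def patch_response(image, patch, mask, p=10):
    # Correlate the image with the (roughly normalized) patch template.
    kernel = (patch - patch.mean()) / (patch.std() + 1e-8)
    r = correlate2d(image, kernel, mode="same")
    r = np.abs(r) ** p                        # large responses dominate (soft max)
    return correlate2d(r, mask, mode="same")  # spatial mask votes for the center

def feature_vector(image, dictionary, center):
    # One entry per (patch, mask) pair, read off at the object center.
    cy, cx = center
    return np.array([patch_response(image, g, w)[cy, cx]
                     for g, w in dictionary])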
Results
- Multiclass object recognition. Dataset: LabelMe; 21 objects, 50 samples per object; 500 boosting rounds.
- Multiview car recognition. Train on LabelMe, test on PASCAL; 12 views, 50 samples per view; 300 rounds.
[Figure: multiclass detections after 70 rounds, 20 training samples per class, 21 objects.]
[Figure: multiview car results; 12 views, 50 samples per class, 300 features.]
Outline
- Motivation to choose this paper
- Motivation of this paper
- Basic ideas in boosting
- Joint Boost
- Features used in this paper
- My results on face recognition
Simple Experiment
Main point of this paper: the authors claim that shared features help when there are many categories but only a few samples per category.
Test it! Task: face recognition, on the "Faces in the Wild" dataset (many famous figures).
Experiment configuration
- Use a Gist-like feature, but with only the Gabor responses and a finer grid for gathering the histograms (see the sketch below).
- Faces are already aligned in the dataset.
- Feature statistics: 8 orientations, 2 scales, an 8x8 grid; 1024 dimensions in total.
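A sketch of this descriptor using scikit-image's Gabor filters. The filter frequencies and average pooling are my assumptions, chosen only to reproduce the 8 orientations x 2 scales x 8x8 grid = 1024 dimensions above.

import numpy as np
from skimage.filters import gabor

def gist_like(image, n_orient=8, freqs=(0.1, 0.25), grid=8):
    # image: 2-D grayscale array. Returns a 1024-D descriptor for the
    # default settings (2 scales * 8 orientations * 8x8 grid cells).
    h, w = image.shape
    feats = []
    for f in freqs:
        for k in range(n_orient):
            real, imag = gabor(image, frequency=f, theta=np.pi * k / n_orient)
            energy = np.hypot(real, imag)   # Gabor magnitude response
            for i in range(grid):           # average-pool over each grid cell
                for j in range(grid):
                    cell = energy[i * h // grid:(i + 1) * h // grid,
                                  j * w // grid:(j + 1) * w // grid]
                    feats.append(cell.mean())
    return np.array(feats)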
Experiment
Training and testing: take the 50 identities with the most images; for each identity, randomly select 3 images for training and use the rest for testing.
Baseline: nearest neighbor (50 classes, 3 per class); a minimal version is sketched below. Chance rate = 0.02.
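A minimal version of that nearest-neighbor baseline under the three distances reported in the table below (the chi-square smoothing constant is my choice):

import numpy as np

def chi2(a, b, eps=1e-10):
    return 0.5 * np.sum((a - b) ** 2 / (a + b + eps))

DISTS = {"l1": lambda a, b: np.abs(a - b).sum(),
         "l2": lambda a, b: np.sqrt(((a - b) ** 2).sum()),
         "chi2": chi2}

def nn_classify(x, train_X, train_y, dist="l1"):
    # Return the label of the single nearest training sample.
    d = DISTS[dist]
    return train_y[np.argmin([d(x, t) for t in train_X])]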
Orientations  Scales  Blocks  K  Blur  L1      L2      Chi-sq
8             2       8       3  No    0.1338  0.1033  0.13
8             2       8       1  No    0.1868  0.1350  0.1681
8             2       6       1  No    0.1651  0.1285  0.1544
8             2       8       1  1.0   0.1822  0.1407  0.1754
8             2       8       1  2.0   0.1677  0.1365  0.1616
Joint Boost is about 80% better than NN.
Results with more images
50 people, 7 images each; chance rate = 2%.
Nearest neighbor: L1 = 0.2856 (vs. 0.1868 in the 50/3 setting), L2 = 0.2022, chi-square = 0.2596.
Joint Boost doubles the accuracy of NN.
Going from single to pairwise sharing, and from pairwise to joint sharing, each gains roughly 7%.
More features are shared.
Results with more identities
100 people, 3 images each; chance rate = 1%.
Nearest neighbor: L1 = 0.1656 (vs. 0.1868 in the 50/3 setting), L2 = 0.1235, chi-square = 0.1623.
The performance of single (unshared) boosting is the same as NN.
Joint Boost is still better than NN, but the improvement (~60%) is smaller than in the previous cases.
Conclusion
Joint Boosting indeed works, especially when the number of images per class is not too small (otherwise plain NN is competitive).
Advantages:
- Better performance when there are many classes and each class has only a few samples.
- Introduces a regularization that reduces overfitting.
Disadvantage:
- Training is slow: O(C^2) per round.
Thanks!
Any questions?