Principled Asymmetric Boosting Approaches to Rapid Training and Classification in Face Detection

presented by

Pham Minh Tri
Ph.D. Candidate and Research Associate
Nanyang Technological University, Singapore
Outline

• Motivation
• Contributions
  – Fast Weak Classifier Learning
  – Automatic Selection of Asymmetric Goal
  – Online Asymmetric Boosting
  – Generalization Bounds on the Asymmetric Error
• Future Work
• Summary
Problem
Applications

• Face recognition
• 3D face reconstruction
• Camera auto-focusing
• Windows face logon (Lenovo Veriface Technology)
Appearance-based Approach

• Scan the image with a probe window patch (x, y, s), as sketched in the code below
  – at different positions and scales
  – binary-classify each patch into
    • face, or
    • non-face
• Desired output: the states (x, y, s) that contain a face
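To make the scan concrete, here is a minimal sketch (hypothetical Python, not from the slides); the window size and step parameters are illustrative assumptions:

```python
def scan_image(image_w, image_h, classify, min_size=24, scale_step=1.25, shift=1):
    """Enumerate window patches (x, y, s) over positions and scales and
    binary-classify each one as 'face' or 'non-face'.

    `classify` is the patch classifier; `min_size`, `scale_step`, and
    `shift` are illustrative parameters, not values from the slides.
    """
    detections = []
    s = min_size
    while s <= min(image_w, image_h):
        step = max(1, int(shift * s / min_size))   # shift grows with scale
        for y in range(0, image_h - s + 1, step):
            for x in range(0, image_w - s + 1, step):
                if classify(x, y, s) == "face":
                    detections.append((x, y, s))
        s = int(s * scale_step)
    return detections
```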
Most popular approach

• Viola-Jones '01–'04, Li et al. '02, Wu et al. '04, Brubaker et al. '04, Liu et al. '04, Xiao et al. '04,
• Bourdev-Brandt '05, Mita et al. '05, Huang et al. '05–'07, Wu et al. '05, Grabner et al. '05–'07,
• and many more
Appearance-based Approach

• Statistics:
  – 6,950,440 patches in a 320×240 image
  – P(face) < 10⁻⁵
• Key requirement:
  – a very fast classifier
A very fast classifier

• Cascade of non-face rejectors (see the code sketch below):

  input → F1 –pass→ F2 –pass→ … –pass→ FN –pass→ face
  (a reject at any stage → non-face)

• F1, F2, …, FN : asymmetric classifiers
  – FRR(Fk) ≈ 0
  – FAR(Fk) as small as possible (e.g. 0.5 – 0.8)
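As a sketch of why the cascade is fast on average (hypothetical Python, assuming each rejector returns pass/reject on a patch):

```python
def cascade_classify(patch, rejectors):
    """Evaluate a cascade of non-face rejectors F1..FN on one patch.

    Each rejector returns True (pass) or False (reject). Because most
    non-face patches are rejected by the first few stages, the average
    cost per patch is far below evaluating all N stages.
    """
    for F in rejectors:
        if not F(patch):
            return "non-face"   # rejected at this stage
    return "face"               # passed all N stages
```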
Non-face Rejector

• A strong combination of weak classifiers (see the code sketch below):

  F1: pass if f1,1(x) + f1,2(x) + … + f1,K(x) > θ, else reject

  – f1,1, f1,2, …, f1,K : weak classifiers
  – θ : threshold
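A matching sketch of one rejector as a thresholded sum of weak-classifier scores (hypothetical Python; `theta` plays the role of the threshold θ above):

```python
def make_rejector(weak_classifiers, theta):
    """Build a rejector F from weak classifiers f_1..f_K and threshold theta.

    The returned function sums the weak-classifier scores and passes the
    patch on only if the total exceeds theta, as in the diagram above.
    """
    def F(patch):
        score = sum(f(patch) for f in weak_classifiers)
        return score > theta    # True: pass, False: reject
    return F
```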
Boosting

• Stage 1: Weak Classifier Learner 1 trains on the weighted examples.
• Stage 2: Weak Classifier Learner 2 trains on re-weighted examples: those wrongly classified in stage 1 gain weight, those correctly classified lose weight (sketched below).
  (Diagram: positive and negative examples, re-weighted between stage 1 and stage 2.)
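The re-weighting step can be sketched as follows (hypothetical Python with an AdaBoost-style update; the slides do not commit to a specific variant):

```python
import numpy as np

def boosting_reweight(w, y, pred, eps=1e-12):
    """Re-weight examples between boosting stages.

    w    : current example weights
    y    : true labels in {+1, -1}
    pred : current weak classifier's outputs in {+1, -1}

    Wrongly classified examples gain weight, correctly classified ones
    lose weight, so the next learner focuses on the mistakes.
    """
    err = w[pred != y].sum() / w.sum()                    # weighted error
    alpha = 0.5 * np.log((1.0 - err + eps) / (err + eps))
    w = w * np.exp(-alpha * y * pred)                     # up-weight mistakes
    return w / w.sum(), alpha
```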
Asymmetric Boosting

• Same stage-by-stage picture as boosting, except that positive examples are weighted λ times more than negative examples (sketched below).
  (Diagram: positive and negative examples, re-weighted between stage 1 and stage 2.)
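A minimal sketch of the asymmetric initialization (hypothetical Python; `lam` stands for the asymmetry factor λ):

```python
import numpy as np

def init_asymmetric_weights(y, lam):
    """Weight each positive (face) example `lam` times more than each
    negative example before boosting starts."""
    w = np.where(y == +1, lam, 1.0)
    return w / w.sum()    # normalized weight distribution
```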
Weak classifier

• Classifies a Haar-like feature value:

  input patch → feature value v → classify v → score

Main issues

• Learning is time-consuming:
  – ≈ 10 minutes to learn a single weak classifier
A very fast classifier

• Cascade of non-face rejectors: to learn a face detector (≈ 4,000 weak classifiers):
  4,000 × 10 minutes ≈ 1 month
Main issues

• Learning is time-consuming
• Learning requires too much intervention from experts
A very fast classifier

• Cascade of non-face rejectors F1, F2, …, FN : asymmetric classifiers
  – FRR(Fk) ≈ 0
  – FAR(Fk) as small as possible (e.g. 0.5 – 0.8)
• How to choose the bounds for FRR(Fk) and FAR(Fk)?
Asymmetric Boosting

• Weight positives λ times more than negatives
• How to choose λ?
Non-face Rejector

• A strong combination of weak classifiers f1,1, f1,2, …, f1,K with threshold θ:
  F1: pass if f1,1(x) + f1,2(x) + … + f1,K(x) > θ, else reject
• How to choose θ?
Main issues

• Requires too much intervention from experts
• Very long learning time
Outline

• Motivation
• Contributions
  – Fast Weak Classifier Learning
  – Automatic Selection of Asymmetric Goal
  – Online Asymmetric Boosting
  – Generalization Bounds on the Asymmetric Error
• Future Work
• Summary
Motivation

• Face detectors today:
  – real-time detection speed
  …but…
  – weeks of training time
Why is Training so Slow?

Factor | Description                          | Common value
N      | number of examples                   | 10,000
M      | number of weak classifiers in total  | 4,000 – 6,000
T      | number of Haar-like features         | 40,000

• Time complexity: O(MNT log N)
  – 15 ms to train a feature classifier
  – 10 min to train a weak classifier
  – 27 days to train a face detector
A view of a face detector training algorithm

for weak classifier m from 1 to M:
    …
    update weights                        – O(N)
    for feature t from 1 to T:
        compute N feature values          – O(N)
        sort N feature values             – O(N log N)
        train feature classifier          – O(N)
    select best feature classifier        – O(T)
    …
• Bottleneck: at least O(NT) to train a weak classifier
• Can we avoid O(NT)?
Our Proposal

• Fast StatBoost: train feature classifiers using statistics rather than the input data
  – Con:
    • less accurate … but not critical for a feature classifier
  – Pro:
    • much faster training time: constant time instead of linear time
Fast StatBoost

• Training feature classifiers using statistics:
  – Assumption: the feature value v(t) is normally distributed given the face class c
  – Closed-form solution for the optimal threshold (see the sketch below)
    (Figure: face and non-face Gaussians over the feature value, with the optimal threshold between them.)
• Fast linear projection of the statistics of a window's integral image into the 1D statistics of a feature value:

  μ(t) = g(t)ᵀ m_J,    σ(t)² = g(t)ᵀ Σ_J g(t)

  – J : random vector representing a window's integral image
  – m_J, Σ_J : mean vector and covariance matrix of J
  – g(t) : Haar-like feature, a sparse vector with fewer than 20 non-zero elements
  – μ(t), σ(t)² : mean and variance of the feature value v(t)
  ⇒ constant time to train a feature classifier
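A sketch of the closed-form threshold under the Gaussian assumption (hypothetical Python; it equates the two class-conditional densities and, for simplicity, ignores class priors/weights, which would only add a constant to `c`):

```python
import numpy as np

def gaussian_optimal_threshold(mu_pos, var_pos, mu_neg, var_neg):
    """Optimal decision threshold between two 1D Gaussians.

    Solves N(v; mu_pos, var_pos) = N(v; mu_neg, var_neg), i.e. the
    quadratic a*v^2 + b*v + c = 0 obtained by equating log-densities.
    """
    a = 1.0 / var_neg - 1.0 / var_pos
    b = 2.0 * (mu_pos / var_pos - mu_neg / var_neg)
    c = (mu_neg ** 2 / var_neg - mu_pos ** 2 / var_pos
         + np.log(var_neg / var_pos))
    if abs(a) < 1e-12:                      # equal variances: midpoint
        return 0.5 * (mu_pos + mu_neg)
    roots = np.roots([a, b, c])
    lo, hi = sorted((mu_pos, mu_neg))
    between = [r.real for r in roots if lo <= r.real <= hi]
    return between[0] if between else roots[0].real
```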
Fast StatBoost

• The integral image's statistics are obtained directly from the weighted input data (see the sketch below)
  – Input: N training integral images and their current weights w(m):
    (J₁, c₁, w₁(m)), (J₂, c₂, w₂(m)), …, (J_N, c_N, w_N(m))
  – We compute, for each class c:
    • Sample total weight:       z_c = Σ_{n : c_n = c} w_n(m)
    • Sample mean vector:        m̂_c = z_c⁻¹ Σ_{n : c_n = c} w_n(m) J_n
    • Sample covariance matrix:  Σ̂_c = z_c⁻¹ Σ_{n : c_n = c} w_n(m) J_n J_nᵀ − m̂_c m̂_cᵀ
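These statistics translate directly into code; a minimal sketch (hypothetical Python, with integral images flattened into rows of a matrix):

```python
import numpy as np

def weighted_class_stats(J, c, w):
    """Weighted class-conditional statistics of the integral images.

    J : (N, d) array, one flattened integral image per row
    c : (N,)  class labels (e.g. +1 face, -1 non-face)
    w : (N,)  current boosting weights w^(m)

    Returns {class: (z_c, mean, cov)} implementing
      z_c = sum w_n,  m_c = z_c^-1 sum w_n J_n,
      Sigma_c = z_c^-1 sum w_n J_n J_n^T - m_c m_c^T.
    """
    stats = {}
    for cls in np.unique(c):
        Jc, wc = J[c == cls], w[c == cls]
        z = wc.sum()
        m = (wc[:, None] * Jc).sum(axis=0) / z
        S = (wc[:, None] * Jc).T @ Jc / z - np.outer(m, m)
        stats[cls] = (z, m, S)
    return stats
```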
Fast StatBoost

Factor | Description                          | Common value
N      | number of examples                   | 10,000
M      | number of weak classifiers in total  | 4,000 – 6,000
T      | number of Haar-like features         | 40,000
d      | number of pixels of a window         | 300 – 500

• To train a weak classifier:
  – Extract the class-conditional integral image statistics
    • Time complexity: O(Nd²); the factor d² is negligible because fast algorithms exist, hence in practice O(N)
  – Train T feature classifiers by projecting the statistics into 1D
    • Time complexity: O(T)
  – Select the best feature classifier
    • Time complexity: O(T)
• Overall: O(N + T) per weak classifier
A view of our face detector training algorithm

for weak classifier m from 1 to M:
    …
    update weights                          – O(N)
    extract statistics of integral image    – O(Nd²)
    for feature t from 1 to T:
        project statistics into 1D          – O(1)
        train feature classifier            – O(1)
    select best feature classifier          – O(T)
    …
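Putting the loop above together, a sketch of one weak-classifier round (hypothetical Python; `threshold_fn` could be the `gaussian_optimal_threshold` sketch above, and the weighted error is estimated from Gaussian tail probabilities):

```python
import numpy as np
from math import erf, sqrt

def gauss_cdf(x, mu, var):
    return 0.5 * (1.0 + erf((x - mu) / sqrt(2.0 * var)))

def train_weak_classifier(stats, features, threshold_fn):
    """One Fast StatBoost round: O(1) per feature, O(T) overall.

    stats    : {+1: (z, m, S), -1: (z, m, S)} from weighted_class_stats()
    features : list of Haar-like feature vectors g^(t) (sparse in
               practice, so each projection below is effectively O(1))
    """
    z_p, m_p, S_p = stats[+1]
    z_n, m_n, S_n = stats[-1]
    best = None
    for t, g in enumerate(features):
        mu_p, var_p = g @ m_p, g @ S_p @ g      # 1D stats of v = g^T J
        mu_n, var_n = g @ m_n, g @ S_n @ g
        thr = threshold_fn(mu_p, var_p, mu_n, var_n)
        if mu_p >= mu_n:   # predict "face" above the threshold
            err = z_p * gauss_cdf(thr, mu_p, var_p) \
                + z_n * (1.0 - gauss_cdf(thr, mu_n, var_n))
        else:              # predict "face" below the threshold
            err = z_p * (1.0 - gauss_cdf(thr, mu_p, var_p)) \
                + z_n * gauss_cdf(thr, mu_n, var_n)
        if best is None or err < best[0]:
            best = (err, t, thr)
    return best    # (weighted error, feature index, threshold)
```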
Experimental Results

• Setup:
  – Intel Pentium IV 2.8 GHz
  – 19 feature types, 295,920 Haar-like features
    (Figure: the nineteen feature types used in our experiments – edge, corner, diagonal-line, line, and center-surround features.)
• Time for extracting the statistics:
  – main factor: the covariance matrices
  – GotoBLAS: 0.49 seconds per matrix
• Time for training T features: 2.1 seconds
• Total training time: 3.1 seconds per weak classifier with 300K features
  – existing methods: up to 10 minutes with 40K features or fewer
Experimental Results

• Comparison with Fast AdaBoost (J. Wu et al. '07), the fastest known implementation of Viola-Jones' framework:
  (Plot: training time of a weak classifier, in seconds, versus the number of features T from 0 to 300,000, for Fast AdaBoost and Fast StatBoost.)
Experimental Results

• Performance of a cascade:
  (Figure: ROC curves of the final cascades for face detection.)

Method                   | Total training time | Memory requirement
Fast AdaBoost (T=40K)    | 13h 20m             | 800 MB
Fast StatBoost (T=40K)   | 02h 13m             | 30 MB
Fast StatBoost (T=300K)  | 03h 02m             | 30 MB
Conclusions

• Fast StatBoost:
  – uses statistics instead of input data to train feature classifiers
• Learning time:
  – a month → 3 hours
• Better detection accuracy:
  – due to many more Haar-like features being explored
Outline

• Motivation
• Contributions
  – Fast Weak Classifier Learning
  – Automatic Selection of Asymmetric Goal
  – Online Asymmetric Boosting
  – Generalization Bounds on the Asymmetric Error
• Future Work
• Summary
Problem overview

• Common appearance-based approach: a cascade of boosted rejectors
  (Diagram: F1 –pass→ F2 –pass→ … –pass→ FN –pass→ object; a reject at any stage → non-object. Each stage sums its weak-classifier scores and compares the total against a threshold.)
  – F1, F2, …, FN : boosted classifiers
  – f1,1, f1,2, …, f1,K : weak classifiers
  – θ : threshold
Objective

• Find f1,1, f1,2, …, f1,K, and θ such that:
  – FAR(F1) ≤ α₀
  – FRR(F1) ≤ β₀
  – K is minimized (K is proportional to F1's evaluation time)

  where F1(x) = sign( Σ_{i=1..K} f1,i(x) − θ )
Existing trends (1)

Idea: Boosting + Thresholding
• For k from 1 until convergence:
  – Learn a new weak classifier f1,k(x):
      f̂1,k = argmin FAR(F1) + FRR(F1)
  – Let F1(x) = sign( Σ_{i=1..k} f1,i(x) − θ )
  – Adjust θ to see if we can achieve FAR(F1) ≤ α₀ and FRR(F1) ≤ β₀
    • Break the loop if such a θ exists

Issues
• Weak classifiers are sub-optimal w.r.t. the training goal.
• Too many weak classifiers are required in practice.
Existing trends (2)

Idea: Asymmetric Boosting
• For k from 1 until convergence:
  – Learn a new weak classifier f1,k(x):
      f̂1,k = argmin FAR(F1) + λ·FRR(F1)
  – Let F1(x) = sign( Σ_{i=1..k} f1,i(x) )
  – Break the loop if FAR(F1) ≤ α₀ and FRR(F1) ≤ β₀

Pros
• Reduces FRR at the cost of increasing FAR – acceptable for cascades
• Fewer weak classifiers

Cons
• How to choose λ? Existing solution: trial and error – choose the λ for which K is minimized.
• Much longer training time
Our solution

Learn every weak classifier f1,k(x) using the same asymmetric goal:

  f̂1,k = argmin FAR(F1) + λ·FRR(F1),   where λ = α₀/β₀

Why? Because…
• Consider two desired bounds (or targets) for learning a boosted classifier F_M(x):
  – Exact bound:        FAR(F_M) ≤ α₀  and  FRR(F_M) ≤ β₀    (1)
  – Conservative bound: FAR(F_M)/α₀ + FRR(F_M)/β₀ ≤ 1        (2)
• (2) is more conservative than (1) because (2) ⇒ (1).
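In code, the fixed goal amounts to a single criterion used by every weak-learner search; a minimal sketch (hypothetical Python; for brevity it scores each candidate directly, whereas the slides minimize the error of the updated strong classifier F1):

```python
import numpy as np

def select_weak_classifier(candidates, X, y, alpha0, beta0):
    """Pick the candidate minimizing FAR + lambda*FRR, lambda = alpha0/beta0.

    candidates : iterable of classifiers f(x) -> {+1, -1}
    X, y       : training patches and labels (+1 face, -1 non-face);
                 both classes are assumed present
    """
    lam = alpha0 / beta0
    best, best_err = None, np.inf
    for f in candidates:
        pred = np.array([f(x) for x in X])
        far = np.mean(pred[y == -1] == +1)   # false acceptance rate
        frr = np.mean(pred[y == +1] == -1)   # false rejection rate
        err = far + lam * frr
        if err < best_err:
            best, best_err = f, err
    return best, best_err
```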
(Figure: two ROC plots in the FAR–FRR plane, each showing the trajectory of the operating point H1, H2, … relative to the exact bound and the conservative bound. With λ = 1 the trajectory needs on the order of 200 weak classifiers to reach the conservative bound; with λ = α₀/β₀ it needs on the order of 40.)

• At λ = α₀/β₀, for every new weak classifier learned, the ROC operating point moves the fastest toward the conservative bound.
Implication

• When the ROC operating point reaches the conservative bound:
  – FAR(F1) ≤ α₀
  – FRR(F1) ≤ β₀
  – Conditions met, so the threshold θ = 0 already suffices for F1(x) = sign( Σ_{i=1..K} f1,i(x) − θ ).
Goal (λ) vs. Number of weak classifiers (K)

• Toy problem: learn a (single-exit) boosted classifier F for classifying face/non-face patches such that FAR(F) < 0.8 and FRR(F) < 0.01
  – Empirically best goal: λ ∈ [10, 100]
  – Our method chooses: λ = 0.8/0.01 = 80
• Similar results were obtained in tests with other desired error rates.
Multi-exit Asymmetric Boosting

A method to train a single boosted classifier with multiple exit nodes:
  (Diagram: f1 + f2 + … + f8 → face, with exit nodes along the chain; at each exit node the accumulated score either passes on or the patch is rejected as non-face.)
  – fi : a weak classifier; an exit node is a weak classifier followed by a decision to continue or reject

• Features:
  – Weak classifiers are trained with the same goal λ = α₀/β₀
  – Every pass/reject decision is guaranteed with FAR ≤ α₀ and FRR ≤ β₀
  – The classifier is a cascade.
  – The score is propagated from one node to the next (see the sketch below).
• Main advantages:
  – Weak classifiers are learned (approximately) optimally.
  – No training of multiple boosted classifiers.
  – Far fewer weak classifiers are needed than in traditional cascades.
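A sketch of how a multi-exit classifier evaluates a patch, with the score propagated across exit nodes (hypothetical Python; the per-exit thresholds are assumptions):

```python
def multi_exit_classify(x, weak_classifiers, exits):
    """Evaluate a multi-exit boosted classifier on one patch.

    weak_classifiers : list of functions f_i(x) -> real-valued score
    exits            : {index: threshold}; after weak classifier i,
                       reject if the accumulated score <= threshold

    Unlike a traditional cascade, the score is not reset at each
    stage: it accumulates from one exit node to the next.
    """
    score = 0.0
    for i, f in enumerate(weak_classifiers):
        score += f(x)                        # score propagation
        if i in exits and score <= exits[i]:
            return "non-face"                # early rejection at an exit
    return "face"                            # survived all exit nodes
```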
Results

• Fast StatBoost is used as the base method for fast learning of each weak classifier.

Method                          | # weak classifiers | # exit nodes | Total training time
Viola-Jones [3]                 | 4,297              | 32           | 6h 20m
Viola-Jones [4]                 | 3,502              | 29           | 4h 30m
Boosting chain [7]              | 959                | 22           | 2h 10m
Nested cascade [5]              | 894                | 20           | 2h
Soft cascade [1]                | 4,871              | 4,871        | 6h 40m
Dynamic cascade [6]             | 1,172              | 1,172        | 2h 50m
Multi-exit Asymmetric Boosting  | 575                | 24           | 1h 20m
Results

• MIT+CMU Frontal Face Test set:
  (Figure: detection results on the MIT+CMU Frontal Face Test set.)
Conclusions

• Automatic Selection of Asymmetric Goal:
  – rejectors are trained with a goal that requires approximately the fewest weak classifiers
• Eliminates human intervention in selecting λ and θ
• Faster detection speed:
  – due to fewer weak classifiers
• Better detection accuracy:
  – due to principled score propagation
Outline

• Motivation
• Contributions
  – Fast Weak Classifier Learning
  – Automatic Selection of Asymmetric Goal
  – Online Asymmetric Boosting
  – Generalization Bounds on the Asymmetric Error
• Future Work
• Summary
Other Contributions

• Online Asymmetric Boosting
  – To learn an asymmetric boosted classifier online
  – Integration of two lines of research:
    • Online Boosting
    • Asymmetric Boosting
• Generalization Bounds on the Asymmetric Error
  – To explain how well Asymmetric Boosting works
  – For all t > 0, with probability at least 1 − 4·exp(−2t²), the asymmetric error FAR(F) + λ·FRR(F) is bounded by the infimum, over margins in [0,1), of the empirical class-conditional margin errors P̂( F(x) ≥ · | y = −1 ) and P̂( F(x) ≤ · | y = +1 ), plus complexity terms involving the VC dimension and logarithmic factors in n, and a term proportional to t.
Outline

• Motivation
• Contributions
  – Fast Weak Classifier Learning
  – Automatic Selection of Asymmetric Goal
  – Online Asymmetric Boosting
  – Generalization Bounds on the Asymmetric Error
• Future Work
• Summary
Some immediate directions…

• Extending Haar-like features
  – from axis-aligned shapes to polygonal shapes?
• Fast searching for Haar-like features
• Consistency and convergence rate of asymmetric boosting
• Sharper asymmetric bounds
Analysis of sequential decisions

• Cascade of non-face rejectors F1, F2, …, FN
• What is the best strategy to design the sequence F1, F2, …, FN?
Tackling harder object classes

• Current popular object classes:
  – Upright frontal face (mean intensity):
    • medium color variance, small shape variance
    • works best with Haar-like features
  – Pedestrian (mean gradient):
    • large color variance, small shape variance
    • works best with HOG
• More challenging object classes:
  – Multi-view multi-pose human: large color variance, large shape variance
  – Multi-view face: medium color variance, large shape variance
  – Multi-pose hand: small color variance, large shape variance
Outline

• Motivation
• Contributions
  – Fast Weak Classifier Learning
  – Automatic Selection of Asymmetric Goal
  – Online Asymmetric Boosting
  – Generalization Bounds on the Asymmetric Error
• Future Work
• Summary
Summary

• Fast Weak Classifier Learning
  – Reduction of face detector learning time: a month → 3 hours
• Automatic Selection of Asymmetric Goal
  – Principled learning goal for training the rejectors
• Online Asymmetric Boosting
  – Online learning of an asymmetric boosted classifier
• Generalization Bounds on the Asymmetric Error
  – Theory to explain how well asymmetric boosting works
Publications

• M.T. Pham, V.D.D. Hoang, and T.J. Cham. Detection with Multi-exit Asymmetric Boosting. In Proc. CVPR, Anchorage, Alaska, Jun 2008.
  – Acceptance rate 27.9%
• M.T. Pham and T.J. Cham. Fast Training and Selection of Haar Features using Statistics in Boosting-based Face Detection. In Proc. ICCV, Rio de Janeiro, Brazil, Oct 2007.
  – Oral paper – acceptance rate 3.9%
• M.T. Pham and T.J. Cham. Online Learning Asymmetric Boosted Classifiers for Object Detection. In Proc. CVPR, Minneapolis, Minnesota, USA, Jun 2007.
  – Oral paper – acceptance rate 4.1%
• M.T. Pham and T.J. Cham. Detection Caching for Faster Object Detection. In Proc. IEEE International Workshop on Modeling People and Human Interaction (PHI'05), Beijing, China, Jun 2005. Held in conjunction with ICCV.
Awards
• One of only two first authors in the world with CVPR and ICCV oral papers in 2007.
• Travel Grant, ICCV, Rio de Janeiro, Brazil, 2007.
• Second Prize, Pattern Recognition and Machine Intelligence Association (PREMIA)’s Best Student Paper in 2007 Award, Singapore, Feb 2008.
• Travel Grant, CVPR, Anchorage, Alaska, Jun 2008.
Thank You