Face Detection Using Large Margin Classifiers
Ming-Hsuan Yang, Dan Roth, Narendra Ahuja
Presented by Kiang "Sean" Zhou
Beckman Institute, University of Illinois at Urbana-Champaign, Urbana, IL 61801
Overview
- Large margin classifiers have demonstrated success in visual learning: the Support Vector Machine (SVM) and the Sparse Network of Winnows (SNoW)
- Aim: present a theoretical account of their success and their suitability for visual recognition
- Theoretical and empirical analysis of these two classifiers in the context of face detection:
  - Generalization error: the expected error on test data
  - Efficiency: the computational ability to represent (nonlinear) features
Face Detection
- Goal: identify and locate human faces in an image (usually gray scale), regardless of their position, scale, in-plane rotation, orientation, pose, and illumination
- The first step in any automatic face recognition system
- A very difficult problem! We first aim to detect upright frontal faces, with some ability to detect faces under varying pose, scale, and illumination
- See "Detecting Faces in Images: A Survey" by M.-H. Yang, D. Kriegman, and N. Ahuja, to appear in IEEE Transactions on Pattern Analysis and Machine Intelligence, 2002.
  http://vision.ai.uiuc.edu/mhyang/face-detection-survey.html
Where are the faces, if any?
Large Margin Classifiers
- Based on a linear decision surface (hyperplane): $f(x) = w^T x + b = 0$; compute $w$ and $b$ from samples
- SNoW: based on Winnow, which uses a multiplicative update rule
- SVM: based on the Perceptron, which uses an additive update rule (the two styles are contrasted in the sketch below)
- Though SVMs can be developed independently of their relation to the Perceptron, we view both as large margin classifiers for the sake of the theoretical analysis
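To make the contrast concrete, here is a minimal sketch of the two update styles on a single step; the function names, learning rate, and binary-input assumption are ours, not the paper's.

```python
# A schematic contrast of additive vs. multiplicative updates for a
# linear classifier f(x) = sign(w.x + b); illustrative only.
import numpy as np

def perceptron_step(w, x, y, eta=1.0):
    # Additive update (Perceptron/SVM family): shift w toward y * x.
    return w + eta * y * x              # y in {-1, +1}

def winnow_step(w, x, y, alpha=2.0):
    # Multiplicative update (Winnow/SNoW family): rescale the weights of
    # active features by alpha (promotion) or 1/alpha (demotion).
    return w * alpha ** (y * x)         # x binary in {0, 1}
```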
Sparse Network of Winnows (SNoW)
[Network diagram: input feature vector connected to target nodes]
- An online, mistake-driven algorithm based on Winnow
- Attribute (feature) efficient: the allocation of nodes and links is data driven, and update time depends only on the number of active features (see the sketch below)
- Has mechanisms for discarding irrelevant features
- Allows tasks to be combined hierarchically
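A minimal sketch, assuming a dictionary-of-weights representation of our own devising (not the authors' SNoW code), of the data-driven allocation and active-feature evaluation described above:

```python
# Each example is a set of active feature ids; a target node allocates a
# weight only when a feature first appears, so storage and evaluation cost
# scale with the number of active features, not the dimension n.

class TargetNode:
    def __init__(self, theta=1.0, default_w=1.0):
        self.w = {}                  # links allocated lazily: feature id -> weight
        self.theta = theta           # prediction threshold
        self.default_w = default_w   # initial weight of a new link

    def link(self, active_features):
        # Data-driven allocation: create links for unseen active features.
        for f in active_features:
            self.w.setdefault(f, self.default_w)

    def score(self, active_features):
        # Only active features contribute: O(#active), not O(n).
        return sum(self.w.get(f, 0.0) for f in active_features)

    def predict(self, active_features):
        return self.score(active_features) >= self.theta
```

For instance, a `TargetNode()` for the "face" target would link and score only the features that actually fire on a given window.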
Winnow Update Rule
- A multiplicative weight update algorithm:
- The number of mistakes made in training is O(k log n), where k is the number of features relevant to the concept and n is the total number of features
- Tolerates a large number of features: the mistake bound is logarithmic in the number of features, which is advantageous when the target function is sparse
- Robust in the presence of noisy features
Prediction is 1 iff $w^T x \ge \theta$
- If Class = 1 but $w^T x < \theta$: $w_i \leftarrow \alpha\,w_i$ (if $x_i = 1$) (promotion)
- If Class = 0 but $w^T x \ge \theta$: $w_i \leftarrow \beta\,w_i$ (if $x_i = 1$) (demotion)
- Usually, $\alpha = 2$, $\beta = 0.5$
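The rule transcribes directly into code; a sketch that reuses the sparse active-feature encoding from the earlier `TargetNode` snippet (our representation, not the authors' implementation):

```python
ALPHA, BETA = 2.0, 0.5   # the usual promotion/demotion factors

def winnow_update(w, active_features, label, theta=1.0):
    """One mistake-driven Winnow step; w maps feature id -> weight."""
    dot = sum(w.get(f, 1.0) for f in active_features)
    prediction = dot >= theta
    if prediction == label:
        return                            # no mistake, no update
    factor = ALPHA if label else BETA     # promote on a missed positive,
    for f in active_features:             # demote on a false positive
        w[f] = w.get(f, 1.0) * factor
```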
Support Vector Machine (SVM)
- Can be viewed as a Perceptron with maximum margin
- Based on statistical learning theory
- Extends to nonlinear SVMs via the kernel trick:
  - Computationally efficient
  - Expressive representation with nonlinear features
- Has demonstrated excellent empirical results in visual recognition tasks
- Training can be time consuming, though fast algorithms have been developed
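As a concrete illustration, a minimal sketch of training an SVM with a 2nd-degree polynomial kernel, as used in the experiments later; scikit-learn and the random data are our stand-ins, not the authors' setup:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.random((200, 400))           # 200 windows of 20x20 = 400 intensities
y = rng.integers(0, 2, size=200)     # 1 = face, 0 = nonface (dummy labels)

clf = SVC(kernel="poly", degree=2)   # nonlinear features via the kernel trick
clf.fit(X, y)
print(clf.predict(X[:5]))
```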
Generalization Error Bounds: SVM

Theorem 1: If the data is L2-norm bounded as $\|x\|_2 \le b$, and we consider the family of hyperplanes $w$ such that $\|w\|_2 \le a$, then for any margin $\gamma > 0$, with probability $1-\delta$ over $n$ random samples, the misclassification error satisfies

$$\mathrm{err}(w) \le \frac{k}{n} + \sqrt{\frac{C}{n}\left(\frac{a^2 b^2}{\gamma^2}\ln\!\left(\frac{abn}{\gamma}+2\right)\ln n + \ln\frac{1}{\delta}\right)}$$

where $k = |\{i : w^T x_i\, y_i < \gamma\}|$ is the number of samples with margin less than $\gamma$ and $C$ is a constant.
Generalization Error Bounds: SNoW

Theorem 2: If the data is L∞-norm bounded as $\|x\|_\infty \le b$, and we consider the family of hyperplanes $w$ such that $\|w\|_1 \le a$ and $\sum_j \frac{w_j}{\|w\|_1}\ln\frac{w_j}{\|w\|_1} \ge -c$, then for any margin $\gamma > 0$, with probability $1-\delta$ over $n$ random samples, the misclassification error satisfies

$$\mathrm{err}(w) \le \frac{k}{n} + \sqrt{\frac{C}{n}\left(\frac{a^2 b^2\,(c+\ln n)}{\gamma^2}\ln\!\left(\frac{abn}{\gamma}+2\right) + \ln\frac{1}{\delta}\right)}$$

where $k = |\{i : w^T x_i\, y_i < \gamma\}|$ is the number of samples with margin less than $\gamma$ and $C$ is a constant.
Generalization Error Bounds

In summary:
- SVM (additive family): $E_a \propto \|w\|_2^2 \,\max_i \|x_i\|_2^2$
- SNoW (multiplicative family): $E_m \propto 2\ln(2n)\,\|w\|_1^2 \,\max_i \|x_i\|_\infty^2$
- SNoW has a lower generalization error if the data is L∞-norm bounded and there is a separating hyperplane with small L1 norm
- SVM has a lower generalization error if the data is L2-norm bounded and there is a separating hyperplane with small L2 norm
- SNoW performs better than SVM if the data has a small L∞ norm but a large L2 norm (illustrated in the sketch below)
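A small numeric illustration (our example, not from the paper) of why the sparse binary encodings SNoW uses fall into the regime where the multiplicative bound wins:

```python
import numpy as np

x = np.zeros(400)                  # feature vector of a 20x20 window
x[np.arange(0, 400, 8)] = 1.0      # 50 active binary features

print(np.linalg.norm(x, 2))        # L2 norm: sqrt(50) ~ 7.07
print(np.linalg.norm(x, np.inf))   # L-infinity norm: 1.0

# A sparse target hyperplane likewise has a modest L1 norm:
w = np.zeros(400)
w[:10] = 0.1                       # only 10 relevant features
print(np.abs(w).sum(), np.linalg.norm(w, 2))  # L1 = 1.0, L2 ~ 0.32
```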
Efficiency
- Features in nonlinear SVMs are more expressive than linear features (and efficient to compute, thanks to the kernel trick)
- SNoW can use conjunctive features as nonlinear features: represent the conjunction (co-occurrence) of the intensity values of m pixels within a window as a new feature (see the sketch below)
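A minimal sketch of one way to realize such conjunctive features (the feature-id encoding and quantization level are our assumptions, not the paper's):

```python
from itertools import combinations

def conjunctive_features(window, m=2, levels=16):
    """window: flat list of pixel intensities in [0, 256)."""
    quantized = [p * levels // 256 for p in window]
    active = set()
    for positions in combinations(range(len(window)), m):
        values = tuple(quantized[p] for p in positions)
        active.add((positions, values))   # one feature id per conjunction
    return active

# A tiny 2x2 "window": its 6 pixel pairs yield 6 active features.
print(len(conjunctive_features([10, 200, 30, 90], m=2)))  # -> 6
```

For a full 20x20 window the number of pairs explodes combinatorially, so in practice one would restrict conjunctions to nearby pixels.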
Experiments
- Training set: 6,977 20x20 upright, frontal images (2,429 faces and 4,548 nonfaces)
- Appearance-based approach: each image is histogram equalized and converted to a vector of intensity values (a preprocessing sketch follows below)
- Test set: 24,045 images (472 faces and 23,573 nonfaces)
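A minimal sketch of the preprocessing described above, assuming 8-bit grayscale 20x20 windows and a plain NumPy histogram equalization:

```python
import numpy as np

def preprocess(window):
    """window: (20, 20) uint8 array -> flat, histogram-equalized vector."""
    hist = np.bincount(window.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf = (cdf - cdf.min()) * 255 // max(cdf.max() - cdf.min(), 1)
    equalized = cdf[window.ravel()]        # map each pixel through the CDF
    return equalized.astype(np.float64)    # 400-dimensional intensity vector

window = np.random.default_rng(0).integers(0, 256, (20, 20), dtype=np.uint8)
print(preprocess(window).shape)  # (400,)
```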
Empirical Results
- SNoW with local features performs better than SVM with linear features
- SVM with a 2nd-order polynomial kernel performs better than SNoW with conjunctive features
[ROC curves: SNoW with local features, SVM with linear features, SVM with 2nd-order polynomial kernel, SNoW with conjunctive features]
Discussion
- Studies have shown that the target hyperplane in visual pattern recognition is usually sparse, i.e., the L2 and L1 norms of w are usually small
- In that regime, the Perceptron has no theoretical advantage over Winnow (and hence over SNoW)
- In the experiments, the L2 norm of the data is on average 10.2 times larger than the L∞ norm
- The empirical results conform to the theoretical analysis
Conclusion
- Theoretical and empirical arguments suggest that the SNoW-based learning framework has important advantages for visual learning tasks
- SVMs have nice computational properties for representing nonlinear features, thanks to the kernel trick
- Future work will focus on efficient methods (analogous to the kernel trick) for representing nonlinear features in the SNoW-based learning framework