Classification for Computer Vision


  • Slide 1/51

    Autonomous Systems Lab, Zürich

    From eigenfaces to AdaBoost

    Cedric Pradalier

  • Slide 2/51

    An introduction to the processing of high-dimensional datasets

  • Slide 3/51

    Input: a database of normalised face photos, plus a normalised face photo to identify

    Output: face identification: whose photo is that?

    Needed: a face representation in minimal dimension, and a way to compare faces

  • Slide 4/51

    [Figure: an image to identify is compared against the database to produce an identification]

  • Slide 5/51

    Why does naive pixel comparison fail? Different centering

    Different mouth shape

    Different eye opening

    Solution: Extract the most important features

    Discard the details

    PCA is one solution

  • Slide 6/51

    Each image is an n x m matrix of pixels. Convert it into an nm vector by stacking the columns.

    A small image is 100x100 -> a 10000-element vector, i.e. a point in a 10000-dimensional space.
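
    As a minimal sketch of this conversion (NumPy; a random array stands in for a face photo):

        import numpy as np

        image = np.random.rand(100, 100)      # a 100x100 grayscale face image
        vector = image.flatten(order="F")     # stack the columns -> 10000-element vector
        assert vector.shape == (10000,)       # a point in a 10000-dimensional space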

  • Slide 7/51

    M = the average vector of the database

    Subtract M from each vector -> a zero-centered distribution.

  • Slide 8/51

    C = the covariance matrix of the centered vectors (10000 x 10000)

    Compute the 10000 eigenvalues and eigenvectors of C

    Express every point in the eigenvector frame:

    $p \mapsto (p^T v_1, \ldots, p^T v_{10000})$
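
    A compact sketch of these steps (illustrative names; for 10000-dimensional vectors one would normally take the SVD route mentioned later rather than form the full covariance matrix):

        X = np.random.rand(165, 10000)        # one flattened face per row (illustrative)
        M = X.mean(axis=0)                    # average face vector
        Xc = X - M                            # zero-centered distribution
        C = Xc.T @ Xc / len(X)                # covariance matrix (10000 x 10000)
        eigvals, eigvecs = np.linalg.eigh(C)  # eigendecomposition of the symmetric C
        eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]  # strongest eigenvalues first
        coords = Xc @ eigvecs                 # all points in the eigenvector frame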

  • Slide 9/51

    Select just enough dimensions according to the strength of their eigenvalues. A typical value of 30-100 dimensions seems enough for faces.

    Discard all the remaining dimensions

  • Slide 10/51

    Prepare the image: start with a face to identify.

    Convert the image to a vector

    Subtract M

    Change to the eigenvector frame

    Keep only the required dimensions

    Find the closest database point in the remaining dimensions.
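
    Continuing the sketch above (M, eigvecs and coords come from the previous snippet; k and the query image are illustrative), the identification step could look like:

        k = 50                                    # number of eigenfaces kept
        query = np.random.rand(100, 100).flatten(order="F")
        q = (query - M) @ eigvecs[:, :k]          # project onto the first k eigenfaces
        dists = np.linalg.norm(coords[:, :k] - q, axis=1)
        best_match = int(np.argmin(dists))        # index of the closest database face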

  • Slide 11/51

    Database from Yale: http://cvc.yale.edu/projects/yalefaces/yalefaces.html

    165 faces: 15 persons with 11 images each, with varying lighting, expression, and glasses

    Results and algorithms from: http://www.cs.princeton.edu/~cdecoro/eigenfaces/

  • Slide 12/51

  • Slide 13/51

    30% of the faces are used for testing, 70% for learning.

  • Slide 14/51

    Variance: normalised cumulative sum of the eigenvalues.

    About 55 eigenfaces are required to represent 80% of the information
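
    In code, choosing k from this curve is a one-liner (eigvals from the earlier sketch, sorted strongest first):

        variance = np.cumsum(eigvals) / np.sum(eigvals)   # normalised cumulative sum
        k = int(np.searchsorted(variance, 0.80)) + 1      # smallest k covering 80%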

  • Slide 15/51

    Adding eigenfaces one at a time

    Adding eigenfaces eight at a time

    Reconstructing the image perfectly requires a lot of eigenfaces, but far fewer than the number of pixels

  • Slide 16/51

    All faces with glasses have been ignored: not a huge difference.

  • Slide 17/51

    Most recognitions are correct, even with a wide range of expression variation: PCA has relatively low sensitivity to local changes

  • Slide 18/51

  • Slide 19/51

    9/23 recognitions are wrong. PCA is sensitive to global changes

  • Slide 20/51

    Normalisation: normalise the range and center of each dimension

    Computational tools: eigenvalue/eigenvector decomposition

    SVD decomposition
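
    For image-sized data, the SVD route is the practical one; a minimal sketch (Xc is the centered data matrix from the earlier snippet):

        U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
        eigvecs_svd = Vt.T                    # principal directions, strongest first
        eigvals_svd = s ** 2 / len(Xc)        # matching covariance eigenvalues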

  • Slide 21/51

    Principal Component Analysis is a good tool to identify the main characteristics of a dataset.

    It is computationally efficient for recognition and dimensionality reduction.

    The construction of the eigenvectors can be very expensive (esp. for images).

    Online PCA techniques have been researched.

    For image recognition, images must be pre-cut very accurately, with consistent lighting, for the technique to work.

  • Slide 22/51

    $\{x_n\}$: a set of points in a D-dimensional space X

    $\{u_i\}$: an orthonormal basis of X

    Then:

    $x_n = \sum_{i=1}^{D} \alpha_{n,i} u_i = \sum_{i=1}^{D} (x_n^T u_i)\, u_i$

    Approximating on a sub-basis:

    $\tilde{x}_n = \sum_{i=1}^{M} z_{n,i} u_i + \sum_{i=M+1}^{D} b_i u_i$   (the $b_i$ are independent of n)

    Approximation error:

    $J = \frac{1}{N} \sum_{n=1}^{N} \lVert x_n - \tilde{x}_n \rVert^2$

  • Slide 23/51

    Minimising J:

    W.r.t. z: $\frac{\partial J}{\partial z_{n,j}} = 0 \Rightarrow z_{n,j} = x_n^T u_j$

    W.r.t. b: $\frac{\partial J}{\partial b_j} = 0 \Rightarrow b_j = \bar{x}^T u_j$, where $\bar{x} = \frac{1}{N} \sum_{n=1}^{N} x_n$

    Substituting: $x_n - \tilde{x}_n = \sum_{i=M+1}^{D} \left\{ (x_n - \bar{x})^T u_i \right\} u_i$

    Leads to: $J = \frac{1}{N} \sum_{n=1}^{N} \sum_{i=M+1}^{D} \left( x_n^T u_i - \bar{x}^T u_i \right)^2 = \sum_{i=M+1}^{D} u_i^T S u_i$

    Data covariance matrix: $S = \frac{1}{N} \sum_{n=1}^{N} (x_n - \bar{x})(x_n - \bar{x})^T$

  • Slide 24/51

    Minimising $J = \sum_{i=M+1}^{D} u_i^T S u_i$:

    Finding the optimal $u_i$ requires a minimisation under the constraints $\lVert u_i \rVert = 1$: introduce Lagrange multipliers $\lambda_i$

    The optimum is reached when each $u_i$ is an eigenvector of S: $S u_i = \lambda_i u_i$, which gives $J = \sum_{i=M+1}^{D} \lambda_i$

    Eigenvalues are positive, so J is minimal if the discarded $u_i$ are the eigenvectors with the smallest eigenvalues
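
    The identity $J = \sum_{i=M+1}^{D} \lambda_i$ is easy to check numerically; a small sketch on synthetic data (all names illustrative):

        import numpy as np

        rng = np.random.default_rng(0)
        X = rng.normal(size=(500, 5)) @ rng.normal(size=(5, 5))  # correlated data
        xbar = X.mean(axis=0)
        S = (X - xbar).T @ (X - xbar) / len(X)     # data covariance matrix
        lam, U = np.linalg.eigh(S)                 # eigenvalues in ascending order
        keep = U[:, -3:]                           # keep the M = 3 strongest directions
        Xhat = xbar + (X - xbar) @ keep @ keep.T   # projection onto the sub-basis
        J = np.mean(np.sum((X - Xhat) ** 2, axis=1))
        assert np.isclose(J, lam[:-3].sum())       # J = sum of discarded eigenvalues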

  • Slide 25/51

    PCA is the orthogonal projection of the data onto a lower-dimensional subspace such that the variance of the projected data is maximised. Informally: more variance means more information.

    Probabilistic formulation: latent variable z is the projection on the subspace

    $p(z) = \mathcal{N}(z \mid 0, I), \qquad p(x \mid z) = \mathcal{N}(x \mid W z + \mu, \sigma^2 I)$

    EM algorithm: maximise the log-likelihood of p(x)

    Find the optimal $W$, $\mu$ and $\sigma^2$: $\mu$ corresponds to the data mean and $W$ to the principal components of the data.

    -> Can deal with missing data (among other advantages)

  • Slide 26/51

    ICA: Independent Component Analysis. Similar to the probabilistic formulation, except the latent variables have a non-linear, non-Gaussian distribution:

    $p(z) = \prod_{j=1}^{M} p(z_j)$

    Used in signal processing. A typical example is blind source separation in audio signal analysis.

    CCA: Canonical Correlation Analysis. Creates a model that maximally correlates two sets of variables.

    Used in data analysis/statistics to find what is common between two sets of observations.

  • Slide 27/51

    A good way to build a classifier

  • Slide 28/51

    What is classification (in layman's terms)?

  • Slide 29/51

    Computational learning theory distinguishes between:

    A strong learning algorithm: finds, with high probability, an arbitrarily accurate classifier

    A weak learning algorithm: only finds a classifier with bounded accuracy

    For example: Support Vector Machines with a linear kernel only achieve bounded accuracy.

    But: they are at least better than random guessing! (i.e. the classification error is lower than 0.5)

  • Slide 30/51

    SVM: support vector machines for joint multi-variable optimization [Spinello08]

    Slide from Prof. Buhmann: Machine Learning

  • Slide 31/51

    Decision stumps are a class of very simple weak classifiers. Goal: find an axis-aligned hyperplane that minimizes the classification error.

    This can be done for each feature (i.e. for each dimension in feature space).

    It can be shown that the resulting classification error is always better than 0.5 (random guessing).

    Idea: apply many weak classifiers, where each is trained on the misclassified examples of the previous one.

  • Slide 32/51

    Weak classifiers (in AdaBoost) are binary classifiers

    $c(x \mid j, \theta, m) = \begin{cases} +m & \text{if } x_j > \theta \\ -m & \text{otherwise} \end{cases}, \qquad m \in \{-1, +1\}$

    Stump: the simplest non-trivial type of decision tree (equivalent to a linear classifier defined by an affine hyperplane)

    The hyperplane is orthogonal to the $x_j$ axis, which it intersects at $x_j = \theta$ (it ignores all entries of x except $x_j$)
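
    A hedged sketch of fitting such a stump to weighted data, by brute-force search over features, thresholds and polarities (function and variable names are illustrative):

        import numpy as np

        def train_stump(X, y, w):
            """Return (j, theta, m, err) minimizing the weighted classification error."""
            best = (0, 0.0, 1, np.inf)
            for j in range(X.shape[1]):                     # each feature axis
                for theta in np.unique(X[:, j]):            # candidate thresholds
                    for m in (-1, 1):                       # polarity
                        pred = np.where(X[:, j] > theta, m, -m)
                        err = w[pred != y].sum() / w.sum()  # weighted error rate
                        if err < best[3]:
                            best = (j, theta, m, err)
            return best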

  • Slide 33/51

    Boosting is a technique to build a strong learning algorithm from a given weak learning algorithm.

    The most popular boosting algorithm is AdaBoost (adaptive boosting). It assigns a weight to each training data point.

    In the beginning, all weights are equal.

    In each round, AdaBoost finds a weak classifier and re-weights the misclassified points.

    Correctly classified points are weighted less, misclassified points are weighted higher.

  • Slide 34/51

    Algorithm TrainAdaBoost:

    1. for $n = 1, \ldots, N$ do: $w_n^{(1)} = 1/N$

    2. for $m = 1, \ldots, M$ do:

    3. Find a classifier $c_m$ that minimizes the weighted error $\varepsilon_m = \sum_{n} w_n^{(m)} I(c_m(x_n) \neq y_n) \big/ \sum_{n} w_n^{(m)}$

    4. compute $\alpha_m = \ln \frac{1 - \varepsilon_m}{\varepsilon_m}$ and re-weight: $w_n^{(m+1)} = w_n^{(m)} \exp\big(\alpha_m I(c_m(x_n) \neq y_n)\big)$

    5. return $\{(c_m, \alpha_m)\}_{m=1}^{M}$
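
    Put together with the stump sketch above, a minimal trainer could look like this (labels y in {-1, +1}; a sketch of the standard formulation, not the slides' exact code):

        def train_adaboost(X, y, M):
            N = len(y)
            w = np.full(N, 1.0 / N)                      # 1. equal initial weights
            ensemble = []
            for _ in range(M):                           # 2. boosting rounds
                j, theta, m, err = train_stump(X, y, w)  # 3. best weak classifier
                alpha = np.log((1 - err) / err)          # 4. classifier weight
                pred = np.where(X[:, j] > theta, m, -m)
                w = w * np.exp(alpha * (pred != y))      #    boost misclassified points
                ensemble.append((j, theta, m, alpha))
            return ensemble                              # 5. weighted ensemble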

  • Slide 35/51

    Algorithm ClassifyAdaBoost:

    1. return $\hat{c}(x) = \operatorname{sign}\left( \sum_{m=1}^{M} \alpha_m c_m(x) \right)$

    Major features:

    The accuracy of the classifier increases with the number M of weak classifiers, i.e. the algorithm is arbitrarily accurate

    Classification can be done very fast (in contrast to training)
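
    The matching classification step, as a sketch (reusing the ensemble format from the training snippet):

        def classify_adaboost(x, ensemble):
            vote = sum(alpha * (m if x[j] > theta else -m)
                       for j, theta, m, alpha in ensemble)
            return 1 if vote > 0 else -1                 # sign of the weighted vote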

  • Slide 36/51

    Slide from Prof. Buhmann: Machine Learning

  • Slide 37/51

    Slide from Prof. Buhmann: Machine Learning

  • Slide 38/51

    Slide from Prof. Buhmann: Machine Learning

  • Slide 39/51

    The state of the art:

    Robust Real-time Object Detection,

    Paul Viola and Michael Jones, IWSCTV, 2001

  • Slide 40/51

    Features for face detection

    Quick evaluation through the integral image approach

    Classifier selection: how to select a minimal set of features/weak classifiers to detect a face

    Classifier cascade: how to efficiently assemble classifiers

  • Slide 41/51

    Defined as a difference of rectangular integral areas: the sum of the pixels which lie within the white rectangles is subtracted from the sum of the pixels in the grey rectangles:

    $f = \iint_{Grey} I(x, y)\, dx\, dy - \iint_{White} I(x, y)\, dx\, dy$

    One feature is defined by: Feature type: A, B, C or D

    Feature position and size

  • Slide 42/51

    Defined as:

    $I_{int}(X, Y) = \sum_{x \leq X} \sum_{y \leq Y} I(x, y)$

    The integral over a rectangle D can be computed with 4 accesses to $I_{int}$ (1-4 being the corners of D):

    $\iint_D I(x, y)\, dx\, dy = I_{int}(4) + I_{int}(1) - I_{int}(2) - I_{int}(3)$

    A very efficient way to compute features
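
    A sketch of both ideas in NumPy (0-based indexing; helper names are illustrative):

        def integral_image(img):
            return img.cumsum(axis=0).cumsum(axis=1)

        def rect_sum(iimg, x0, y0, x1, y1):
            """Sum of img[x0:x1+1, y0:y1+1] using 4 accesses to the integral image."""
            total = iimg[x1, y1]
            if x0 > 0: total -= iimg[x0 - 1, y1]
            if y0 > 0: total -= iimg[x1, y0 - 1]
            if x0 > 0 and y0 > 0: total += iimg[x0 - 1, y0 - 1]
            return total

    A two-rectangle feature is then just rect_sum over the grey area minus rect_sum over the white area.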

  • Slide 43/51

  • Slide 44/51

  • Slide 45/51

    A classifier with only these two features can be trained to recognise 100% of the faces, with 40% false positives

  • Slide 46/51

    window = 24x24 (the base detection window)

    Do {

        For each position in the image {

            Try classifying the sub-window at this position,

            with the current window size, using the classifier

            selected by AdaBoost

        }

        Window = Window x 1.5

    } until maximum scale

  • Slide 47/51

    Basic idea:

    It is easy to detect that something is not a face.

    Tune (boost) each classifier to be very reliable at saying NO (i.e. a very low false negative rate).

    Stop evaluating the cascade of classifiers as soon as one classifier says NO.
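
    As a sketch, evaluating the cascade on one sub-window is an early-exit loop (the stage functions are illustrative stand-ins for boosted classifiers):

        def cascade_says_face(window, stages):
            for stage in stages:
                if not stage(window):    # any NO rejects the window immediately
                    return False
            return True                  # the window survived every stage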

  • Slide 48/51

  • Slide 49/51

  • Slide 50/51

  • Slide 51/51

    Face detection is solved

    Algorithms such as Viola-Jones AdaBoost are very efficient and easily implemented in hardware

    They now ship in digital cameras and camcorders

    The approach used in the Viola-Jones algorithm is generic enough to be used for other detection tasks

    PCA can still be useful, but only in very controlled settings