Classification for Computer Vision


  • Slide 1/51

    Autonomous Systems Lab, Zürich

    From eigenfaces to AdaBoost

    Cedric Pradalier

  • Slide 2/51

    An introduction to the processing of high-dimensional datasets

  • Slide 3/51

    Input: a database of normalised face photos, plus a normalised face photo to identify

    Output: face identification: whose photo is that?

    Needed: a face representation in minimal dimension, and a way to compare faces

  • Slide 4/51

    [Figure: an image to identify is compared against the database to produce an identification]

  • Slide 5/51

    Why does naive pixel comparison fail? Different centering

    Different mouth shape

    Different eye opening

    Solution: Extract the most important features

    Discard the details

    PCA is one solution

  • Slide 6/51

    Each image is an n x m matrix of pixels. Convert it into an nm vector by stacking the columns.

    A small image is 100x100 -> a 10000-element vector, i.e. a point in a 10000-dimensional space.
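
    As a minimal sketch of this conversion (NumPy; a random array stands in for a face photo):

        import numpy as np

        image = np.random.rand(100, 100)      # a 100x100 grayscale face image
        vector = image.flatten(order="F")     # stack the columns -> 10000-element vector
        assert vector.shape == (10000,)       # a point in a 10000-dimensional space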

  • Slide 7/51

    M = the average vector of the database

    Subtract M from each vector -> a zero-centered distribution.

  • Slide 8/51

    C = the covariance matrix of the centered vectors (10000 x 10000)

    Compute the 10000 eigenvalues and eigenvectors of C

    Express every point in the eigenvector frame:

    $p \mapsto (p^T v_1, \ldots, p^T v_{10000})$
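
    A compact sketch of these steps (illustrative names; for 10000-dimensional vectors one would normally take the SVD route mentioned later rather than form the full covariance matrix):

        X = np.random.rand(165, 10000)        # one flattened face per row (illustrative)
        M = X.mean(axis=0)                    # average face vector
        Xc = X - M                            # zero-centered distribution
        C = Xc.T @ Xc / len(X)                # covariance matrix (10000 x 10000)
        eigvals, eigvecs = np.linalg.eigh(C)  # eigendecomposition of the symmetric C
        eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]  # strongest eigenvalues first
        coords = Xc @ eigvecs                 # all points in the eigenvector frame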

  • Slide 9/51

    Select just enough dimensions according to the strength of their eigenvalues. A typical value of 30-100 dimensions seems enough for faces.

    Discard all the remaining dimensions

  • Slide 10/51

    Prepare the image: start with a face to identify.

    Convert the image to a vector

    Subtract M

    Change to the eigenvector frame

    Keep only the required dimensions

    Find the closest database point in the remaining dimensions.
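
    Continuing the sketch above (M, eigvecs and coords come from the previous snippet; k and the query image are illustrative), the identification step could look like:

        k = 50                                    # number of eigenfaces kept
        query = np.random.rand(100, 100).flatten(order="F")
        q = (query - M) @ eigvecs[:, :k]          # project onto the first k eigenfaces
        dists = np.linalg.norm(coords[:, :k] - q, axis=1)
        best_match = int(np.argmin(dists))        # index of the closest database face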

  • Slide 11/51

    Database from Yale: http://cvc.yale.edu/projects/yalefaces/yalefaces.html

    165 faces: 15 persons with 11 images each, with varying lighting, expression, and glasses

    Results and algorithms from: http://www.cs.princeton.edu/~cdecoro/eigenfaces/

  • Slide 12/51

  • Slide 13/51

    30% of the faces are used for testing, 70% for learning.

  • Slide 14/51

    Variance: normalised cumulative sum of the eigenvalues.

    About 55 eigenfaces are required to represent 80% of the information
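
    In code, choosing k from this curve is a one-liner (eigvals from the earlier sketch, sorted strongest first):

        variance = np.cumsum(eigvals) / np.sum(eigvals)   # normalised cumulative sum
        k = int(np.searchsorted(variance, 0.80)) + 1      # smallest k covering 80%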

  • Slide 15/51

    Adding eigenfaces one at a time

    Adding eigenfaces eight at a time

    Reconstructing the image perfectly requires a lot of eigenfaces, but far fewer than the number of pixels

  • Slide 16/51

    All faces with glasses have been ignored: not a huge difference.

  • Slide 17/51

    Most recognitions are correct, even with a wide range of expression variation: PCA has relatively low sensitivity to local changes

  • Slide 18/51

  • Slide 19/51

    9/23 recognitions are wrong. PCA is sensitive to global changes

  • Slide 20/51

    Normalisation: normalise the range and center of each dimension

    Computational tools: eigenvalue/eigenvector decomposition

    SVD decomposition
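
    For image-sized data, the SVD route is the practical one; a minimal sketch (Xc is the centered data matrix from the earlier snippet):

        U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
        eigvecs_svd = Vt.T                    # principal directions, strongest first
        eigvals_svd = s ** 2 / len(Xc)        # matching covariance eigenvalues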

  • Slide 21/51

    Principal Component Analysis is a good tool to identify the main characteristics of a dataset.

    It is computationally efficient for recognition and dimensionality reduction.

    The construction of the eigenvectors can be very expensive (esp. for images).

    Online PCA techniques have been researched.

    For image recognition, images must be pre-cut very accurately, with consistent lighting, for the technique to work.

  • Slide 22/51

    $\{x_n\}$: a set of points in a D-dimensional space X

    $\{u_i\}$: an orthonormal basis of X

    Then:

    $x_n = \sum_{i=1}^{D} \alpha_{n,i} u_i = \sum_{i=1}^{D} (x_n^T u_i)\, u_i$

    Approximating on a sub-basis:

    $\tilde{x}_n = \sum_{i=1}^{M} z_{n,i} u_i + \sum_{i=M+1}^{D} b_i u_i$   (the $b_i$ are independent of n)

    Approximation error:

    $J = \frac{1}{N} \sum_{n=1}^{N} \lVert x_n - \tilde{x}_n \rVert^2$

  • Slide 23/51

    Minimising J:

    W.r.t. z: $\frac{\partial J}{\partial z_{n,j}} = 0 \Rightarrow z_{n,j} = x_n^T u_j$

    W.r.t. b: $\frac{\partial J}{\partial b_j} = 0 \Rightarrow b_j = \bar{x}^T u_j$, where $\bar{x} = \frac{1}{N} \sum_{n=1}^{N} x_n$

    Substituting: $x_n - \tilde{x}_n = \sum_{i=M+1}^{D} \left\{ (x_n - \bar{x})^T u_i \right\} u_i$

    Leads to: $J = \frac{1}{N} \sum_{n=1}^{N} \sum_{i=M+1}^{D} \left( x_n^T u_i - \bar{x}^T u_i \right)^2 = \sum_{i=M+1}^{D} u_i^T S u_i$

    Data covariance matrix: $S = \frac{1}{N} \sum_{n=1}^{N} (x_n - \bar{x})(x_n - \bar{x})^T$

  • Slide 24/51

    Minimising $J = \sum_{i=M+1}^{D} u_i^T S u_i$:

    Finding the optimal $u_i$ requires a minimisation under the constraints $\lVert u_i \rVert = 1$: introduce Lagrange multipliers $\lambda_i$

    The optimum is reached when each $u_i$ is an eigenvector of S: $S u_i = \lambda_i u_i$, which gives $J = \sum_{i=M+1}^{D} \lambda_i$

    Eigenvalues are positive, so J is minimal if the discarded $u_i$ are the eigenvectors with the smallest eigenvalues
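
    The identity $J = \sum_{i=M+1}^{D} \lambda_i$ is easy to check numerically; a small sketch on synthetic data (all names illustrative):

        import numpy as np

        rng = np.random.default_rng(0)
        X = rng.normal(size=(500, 5)) @ rng.normal(size=(5, 5))  # correlated data
        xbar = X.mean(axis=0)
        S = (X - xbar).T @ (X - xbar) / len(X)     # data covariance matrix
        lam, U = np.linalg.eigh(S)                 # eigenvalues in ascending order
        keep = U[:, -3:]                           # keep the M = 3 strongest directions
        Xhat = xbar + (X - xbar) @ keep @ keep.T   # projection onto the sub-basis
        J = np.mean(np.sum((X - Xhat) ** 2, axis=1))
        assert np.isclose(J, lam[:-3].sum())       # J = sum of discarded eigenvalues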

  • Slide 25/51

    PCA is the orthogonal projection of the data onto a lower-dimensional subspace such that the variance of the projected data is maximised. Informally: more variance means more information.

    Probabilistic formulation: latent variable z is the projection on the subspace

    $p(z) = \mathcal{N}(z \mid 0, I), \qquad p(x \mid z) = \mathcal{N}(x \mid W z + \mu, \sigma^2 I)$

    EM algorithm: maximise the log-likelihood of p(x)

    Find the optimal $W$, $\mu$ and $\sigma^2$: $\mu$ corresponds to the data mean and $W$ to the principal components of the data.

    -> Can deal with missing data (among other advantages)

  • Slide 26/51

    ICA: Independent Component Analysis. Similar to the probabilistic formulation, except the latent variables have a non-linear, non-Gaussian distribution:

    $p(z) = \prod_{j=1}^{M} p(z_j)$

    Used in signal processing. A typical example is blind source separation in audio signal analysis.

    CCA: Canonical Correlation Analysis. Creates a model that maximally correlates two sets of variables.

    Used in data analysis/statistics to find what is common between two sets of observations.

  • Slide 27/51

    A good way to build a classifier

  • Slide 28/51

    What is classification (in layman's terms)?

  • Slide 29/51

    Computational learning theory distinguishes between:

    A strong learning algorithm: finds, with high probability, an arbitrarily accurate classifier

    A weak learning algorithm: only finds a classifier with bounded accuracy

    For example: Support Vector Machines with a linear kernel only achieve bounded accuracy.

    But: they are at least better than random guessing! (i.e. the classification error is lower than 0.5)

  • Slide 30/51

    SVM: support vector machines for joint multi-variable optimization [Spinello08]

    Slide from Prof. Buhmann: Machine Learning

  • Slide 31/51

    Decision stumps are a class of very simple weak classifiers. Goal: find an axis-aligned hyperplane that minimizes the classification error.

    This can be done for each feature (i.e. for each dimension in feature space).

    It can be shown that the resulting classification error is always better than 0.5 (random guessing).

    Idea: apply many weak classifiers, where each is trained on the misclassified examples of the previous one.

  • Slide 32/51

    Weak classifiers (in AdaBoost) are binary classifiers

    $c(x \mid j, \theta, m) = \begin{cases} +m & \text{if } x_j > \theta \\ -m & \text{otherwise} \end{cases}, \qquad m \in \{-1, +1\}$

    Stump: the simplest non-trivial type of decision tree (equivalent to a linear classifier defined by an affine hyperplane)

    The hyperplane is orthogonal to the $x_j$ axis, which it intersects at $x_j = \theta$ (it ignores all entries of x except $x_j$)
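
    A hedged sketch of fitting such a stump to weighted data, by brute-force search over features, thresholds and polarities (function and variable names are illustrative):

        import numpy as np

        def train_stump(X, y, w):
            """Return (j, theta, m, err) minimizing the weighted classification error."""
            best = (0, 0.0, 1, np.inf)
            for j in range(X.shape[1]):                     # each feature axis
                for theta in np.unique(X[:, j]):            # candidate thresholds
                    for m in (-1, 1):                       # polarity
                        pred = np.where(X[:, j] > theta, m, -m)
                        err = w[pred != y].sum() / w.sum()  # weighted error rate
                        if err < best[3]:
                            best = (j, theta, m, err)
            return best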

  • Slide 33/51

    Boosting is a technique to build a strong learning algorithm from a given weak learning algorithm.

    The most popular boosting algorithm is AdaBoost (adaptive boosting). It assigns a weight to each training data point.

    In the beginning, all weights are equal.

    In each round, AdaBoost finds a weak classifier and re-weights the misclassified points.

    Correctly classified points are weighted less, misclassified points are weighted higher.

  • Slide 34/51

    Algorithm TrainAdaBoost:

    1. for $n = 1, \ldots, N$ do: $w_n^{(1)} = 1/N$

    2. for $m = 1, \ldots, M$ do:

    3. Find a classifier $c_m$ that minimizes the weighted error $\varepsilon_m = \sum_{n} w_n^{(m)} I(c_m(x_n) \neq y_n) \big/ \sum_{n} w_n^{(m)}$

    4. compute $\alpha_m = \ln \frac{1 - \varepsilon_m}{\varepsilon_m}$ and re-weight: $w_n^{(m+1)} = w_n^{(m)} \exp\big(\alpha_m I(c_m(x_n) \neq y_n)\big)$

    5. return $\{(c_m, \alpha_m)\}_{m=1}^{M}$
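
    Put together with the stump sketch above, a minimal trainer could look like this (labels y in {-1, +1}; a sketch of the standard formulation, not the slides' exact code):

        def train_adaboost(X, y, M):
            N = len(y)
            w = np.full(N, 1.0 / N)                      # 1. equal initial weights
            ensemble = []
            for _ in range(M):                           # 2. boosting rounds
                j, theta, m, err = train_stump(X, y, w)  # 3. best weak classifier
                alpha = np.log((1 - err) / err)          # 4. classifier weight
                pred = np.where(X[:, j] > theta, m, -m)
                w = w * np.exp(alpha * (pred != y))      #    boost misclassified points
                ensemble.append((j, theta, m, alpha))
            return ensemble                              # 5. weighted ensemble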

  • Slide 35/51

    Algorithm ClassifyAdaBoost:

    1. return $\hat{c}(x) = \operatorname{sign}\left( \sum_{m=1}^{M} \alpha_m c_m(x) \right)$

    Major features:

    The accuracy of the classifier increases with the number M of weak classifiers, i.e. the algorithm is arbitrarily accurate

    Classification can be done very fast (in contrast to training)
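
    The matching classification step, as a sketch (reusing the ensemble format from the training snippet):

        def classify_adaboost(x, ensemble):
            vote = sum(alpha * (m if x[j] > theta else -m)
                       for j, theta, m, alpha in ensemble)
            return 1 if vote > 0 else -1                 # sign of the weighted vote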

  • Slide 36/51

    Slide from Prof. Buhmann: Machine Learning

  • Slide 37/51

    Slide from Prof. Buhmann: Machine Learning

  • Slide 38/51

    Slide from Prof. Buhmann: Machine Learning

  • Slide 39/51

    The state of the art:

    Robust Real-time Object Detection,

    Paul Viola and Michael Jones, IWSCTV, 2001

  • Slide 40/51

    Features for face detection

    Quick evaluation through the integral image approach

    Classifier selection: how to select a minimal set of features/weak classifiers to detect a face

    Classifier cascade: how to efficiently assemble classifiers

  • Slide 41/51

    Defined as a difference of rectangular integral areas: the sum of the pixels which lie within the white rectangles is subtracted from the sum of the pixels in the grey rectangles:

    $f = \iint_{Grey} I(x, y)\, dx\, dy - \iint_{White} I(x, y)\, dx\, dy$

    One feature is defined by: Feature type: A, B, C or D

    Feature position and size

  • Slide 42/51

    Defined as:

    $I_{int}(X, Y) = \sum_{x \leq X} \sum_{y \leq Y} I(x, y)$

    The integral over a rectangle D can be computed with 4 accesses to $I_{int}$ (1-4 being the corners of D):

    $\iint_D I(x, y)\, dx\, dy = I_{int}(4) + I_{int}(1) - I_{int}(2) - I_{int}(3)$

    A very efficient way to compute features
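
    A sketch of both ideas in NumPy (0-based indexing; helper names are illustrative):

        def integral_image(img):
            return img.cumsum(axis=0).cumsum(axis=1)

        def rect_sum(iimg, x0, y0, x1, y1):
            """Sum of img[x0:x1+1, y0:y1+1] using 4 accesses to the integral image."""
            total = iimg[x1, y1]
            if x0 > 0: total -= iimg[x0 - 1, y1]
            if y0 > 0: total -= iimg[x1, y0 - 1]
            if x0 > 0 and y0 > 0: total += iimg[x0 - 1, y0 - 1]
            return total

    A two-rectangle feature is then just rect_sum over the grey area minus rect_sum over the white area.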

  • Slide 43/51

  • Slide 44/51

  • Slide 45/51

    A classifier with only these two features can be trained to recognise 100% of the faces, with 40% false positives

  • Slide 46/51

    window = 24x24 (the base detection window)

    Do {

        For each position in the image {

            Try classifying the sub-window at this position,

            with the current window size, using the classifier

            selected by AdaBoost

        }

        Window = Window x 1.5

    } until maximum scale

  • Slide 47/51

    Basic idea:

    It is easy to detect that something is not a face.

    Tune (boost) each classifier to be very reliable at saying NO (i.e. a very low false negative rate).

    Stop evaluating the cascade of classifiers as soon as one classifier says NO.
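
    As a sketch, evaluating the cascade on one sub-window is an early-exit loop (the stage functions are illustrative stand-ins for boosted classifiers):

        def cascade_says_face(window, stages):
            for stage in stages:
                if not stage(window):    # any NO rejects the window immediately
                    return False
            return True                  # the window survived every stage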

  • Slide 48/51

  • Slide 49/51

  • Slide 50/51

  • Slide 51/51

    Face detection is solved

    Algorithms such as Viola-Jones AdaBoost are very efficient and easily implemented in hardware

    They now ship in digital cameras and camcorders

    The approach used in the Viola-Jones algorithm is generic enough to be used for other detection tasks

    PCA can still be useful, but only in very controlled settings