Color Image Based Face Recognition · Tejaswini Ganapathi · Master of Applied Science · Graduate Department of Electrical and Computer Engineering · University of Toronto · 2008

Color Image Based Face Recognition

by

Tejaswini Ganapathi

A thesis submitted in conformity with the requirements for the degree of Master of Applied Science

Graduate Department of Electrical and Computer Engineering

University of Toronto

Copyright © 2008 by Tejaswini Ganapathi

Abstract


Traditional appearance based face recognition (FR) systems use gray scale images;

however, attention has recently been drawn to the use of color images. Color inputs have

a larger dimensionality, which increases the computational cost, and makes the small

sample size (SSS) problem in supervised FR systems more challenging. It is therefore

important to determine the scenarios in which usage of color information helps the FR

system.

In this thesis, the inclusion of chromatic information in FR systems

is shown to be particularly advantageous in poor illumination conditions. In supervised

systems, a color input of optimal dimensionality would improve the FR performance

under SSS conditions. A fusion of decisions from individual spectral planes also helps in

the SSS scenario. Finally, chromatic information is integrated into a supervised ensemble

learner to address pose and illumination variations. This framework significantly boosts

FR performance under a range of learning scenarios.


Acknowledgements

I would like to sincerely thank my research adviser, Prof. Kostas Plataniotis for

his guidance and insightful inputs, which helped me a lot during my thesis work. His

encouragement and thoughts were very helpful during my graduate studies.

I would also like to thank my thesis proposal and committee members for taking

time out of their busy schedules and reviewing my work, offering valuable comments and

suggestions. I also acknowledge the financial support from the Ontario Graduate Scholarship,

the Department of Electrical and Computer Engineering at the University of Toronto and

Prof. Kostas Plataniotis during the period of my graduate studies.

Finally, I would like to thank close friends (you know who you are!) and my lab mates

for their company, encouragement, and most of all for being there, without which this

journey would have been a very difficult one.

Last but not the least, I would like to thank my family members for their constant

encouragement, support and care.


Contents

1 Introduction

1.1 Color based Face Recognition

1.2 Face Recognition: Modes of Operation, and Target Applications

1.3 Challenges

1.4 Thesis Contributions and Organization

2 Prior Work and Background

2.1 Face Recognition System

2.2 Color Face Recognition: A Survey

2.3 Motivation of this Thesis

3 Color face recognition in different learning scenarios

3.1 Introduction

3.2 Representation of Color Information

3.3 Background

3.3.1 Principal Component Analysis

3.3.2 Linear Discriminant Analysis

3.4 Color and the Small Sample Size Problem

3.4.1 Small Sample Size Problem

3.4.2 Implication of Color Inputs

3.5 Methodology and Experimental Setup

3.6 Results

3.6.1 Choice of Gray scale baseline and Similarity Metric

3.6.2 Unsupervised Learning

3.6.3 Supervised Learning

3.7 Conclusions

3.8 Chapter Summary

4 Decision Level Fusion of Spectral Planes

4.1 Introduction and Objective

4.2 Combination Strategies

4.4 Results

4.4.1 Choice of Aggregation Rule

4.4.2 FR Performance: Poor Illumination conditions

4.4.3 FR Performance: Good Illumination conditions

4.5 Conclusion

4.6 Chapter Summary

5 Color Face Recognition in Ada-Boost framework

5.1 Introduction

5.2 Motivation: Ada-Boost Learning

5.3 Background

5.3.1 Regularized Direct LDA

5.3.2 Ada-Boost framework

5.4 Possible Implication of color in the Ada-Boost framework

5.6 Results

5.6.1 Implication of Color

5.6.2 Implication of ensemble learning

5.7 Conclusions

5.8 Chapter Summary

6 Conclusion and Future Research

6.1 Research Summary

6.2 Future Work

A Color CMU PIE database

A.1 Pose and Illumination variation

A.2 Pose and Expression variation

B Preprocessing Method

C YCbCr Color Space

Bibliography


List of Tables

3.1 Learning Scenarios encountered in face recognition problems

3.2 Best Color/Gray scale transformations in Extreme Small Sample Size scenario

4.1 Decision Fusion: Rank 1 CRR in % (YCbCr 4:4:4, Database DB1)

4.2 Decision Fusion: Rank 1 CRR in % (YCbCr 4:2:2, Database DB1)

4.9 Decision Fusion: Rank 1 CRR in % (YCbCr 4:2:0, Database DB2)

5.1 Results obtained with ada-boost.M2 & R-LDA using color & gray scale transformations in different learning scenarios

5.2 Best Performances obtained by using the color space counterpart over the corresponding gray scale over different learning tasks

5.3 Best Performances obtained by boosting the R-LDA learner for different inputs and learning tasks

A.1 Details of CMU PIE database

List of Figures

1.1 Various approaches to Face Recognition

2.1 General FR system

2.2 General multiple classifier FR system

2.3 Past Works in color FR

3.1 Color FR System Description

3.2 Comparison of gray scale transformations and similarity metrics

3.3 Rank 1 performance of YCbCr transformations with PCA feature extractor (Unsupervised Learning), database DB1

3.7 Rank 1 performance of YCbCr transformations with LDA feature extractor (Supervised Learning), database DB1

4.1 Color FR: Multiple Classifier System Diagram

4.2 Comparison of Aggregation Rules

5.1 Training the Ada-Boost Ensemble: Generic Diagram

5.2 Pseudocode: Ada-Boost framework

5.3 Color FR: Ada-Boost System Description

A.1 CMU PIE: Images with Pose and Illumination Variations: No Room Lights

A.2 CMU PIE: Images with Pose and Illumination Variations: Room Lights On

A.3 CMU PIE: Images with Pose and Expression Variations: Room Lights On

B.1 Image preprocessing method

C.1 Illustration of Chromatic sub sampling in YCbCr

List of Abbreviations

FR Face Recognition

PCA Principal Component Analysis

LDA Linear Discriminant Analysis

R-LDA Regularized Linear Discriminant Analysis

LDD Learning Difficulty Degree

Ada-boost Adaptive Boosting

FRR False Rejection Rate

FAR False Acceptance Rate

CRR Correct Recognition Rate

MPEG Moving Picture Experts Group

JPEG Joint Photographic Experts Group


Chapter 1

Introduction

Face Recognition (FR) is the process of recognizing an individual using facial features.

FR is a technology with applications ranging from security related tasks, such as monitoring

and surveillance and identity authentication, to human computer interaction and face based

video indexing. Recently, an increase in security concerns worldwide has focused the

attention of researchers and the public on the accuracy of computerized FR systems. It

has been reported that the accuracy of FR algorithms is comparable to or better than

that of recognition by humans when the given face images are subjected to difficult

conditions like varying illumination, pose and resolution. However, since none of the existing

FR methods is totally robust to these conditions [1, 2, 3], automatic FR remains a promising

research area.

Various 2-d and 3-d methods have been proposed in past literature for FR, and are

reported in a recent survey [4]. The 2-d methods are more popular than 3-d methods

as research on the latter is relatively new and poses challenges such as a difficult data

acquisition process, alignment of 3-d meshes and faces with occlusions (e.g. spectacles)

that cannot be properly dealt with. The 2-d methods in literature can be classified into

appearance based methods and feature based methods. The feature based approach is

based on the localization of face features like eyes, eyebrows, nose and mouth. Information

about their geometric characteristics, relative positions and other statistics is

used to describe faces. Examples of feature based approaches are those based on Hidden

Markov Models [5, 6], Elastic Bunch Graph Matching [7] and Gabor wavelets [8]. Although

feature based methods might lead to a good FR performance, they have a major

disadvantage: their performance relies heavily on the accurate localization of face feature

areas. Various lines of approach to automatic FR systems are presented in Figure 1.1.

Figure 1.1: Various approaches to Face Recognition (tree diagram with nodes: FR Algorithms; 2D, 3D; Appearance Based, Feature Based; Gray Scale Image, Colour Image)

Appearance based methods treat the face as a holistic 2-d pattern and focus on cre-

ating a low dimensional statistical representation of the face. In this class of methods,

the face is represented by a vector/matrix of pixel intensity values and the FR algorithm

focuses on projecting these vectors/matrices onto a lower dimensional discriminative face

space in which recognition is performed. FR is therefore viewed as a multivariate statistical

problem. This class of methods avoids challenges relating to localization of features

in face images and 3-d modeling, and is reported to be very successful in past literature

[9, 10, 11, 12] when applied to large and complex databases. Although humans

can recognize persons based on certain face features, the geometric interdependency between

different face features contributes more to the recognition process than a particular

feature alone. In other words, humans tend to treat a face as a holistic pattern while

performing the process of recognition. This argument, along with the results demonstrated

by appearance based methods in past literature, is the motivation for using the

appearance based approach to FR in this thesis.

1.1 Color based Face Recognition

Gray scale or intensity images have been traditionally used in appearance based FR

systems and have been reported to lead to good performance under favorable imaging

conditions of uniform illumination and minimal pose variations [9, 10, 13, 14]. However,

gray scale images are severely affected by harsh illumination conditions and poor

resolution. Because they contain only intensity information, the shape cues present in

the gray scale image are badly degraded under these conditions, which makes recognition

difficult. Variations due to bad imaging conditions, pose and expression are

sometimes larger than variations between images of different persons, and hence are

crucial to address.

Recently, attention has been drawn to using the information in color spaces to improve

the performance of FR systems. Previous works [8, 15, 16, 17] have shown that chromatic

information in conjunction with intensity information leads to better FR performance in

comparison with the usage of gray scale information alone. Color features make object

recognition more robust against image variations such as illumination [17, 18].

According to a recent study on human face perception [2], faces can differ from each

other in two ways - their shape cues and their pigmentation/ color cues. The color

cues give information about the texture and surface reflectance of the face, as well as

particular hue of their hair or skin which might aid the human identification process.

Also, when shape cues are degraded (this happens to the intensity image in conditions of


bad illumination and poor resolution), the human brain uses color to pinpoint identity.

A recent study [18] asserts that although the observed colors can change significantly

under different illumination conditions, the human visual system uses color cues for

segmentation of features within a face, especially when the shape cues are degraded. Both

human face perception studies as well as recent works on computerized FR algorithms [2,

15, 17] support the hypothesis that chromatic information could supplement intensity

information in automatic FR systems, which is the motivation for using color images as

inputs to appearance based FR methods in this thesis.

1.2 Face Recognition: Modes of Operation, and Target Applications

FR systems can operate in 3 modes: identification, authentication and watch list [19, 20].

In the identification mode, the FR system compares the identity of an unknown person

to all the enrolled persons in the face database, and thus reveals the identity of the

unknown person. The FR system solves a 1 : N problem in this mode, where N is the

number of subjects enrolled in the face database. This mode has applications in the

area of surveillance and the system performance is measured by the fraction of unknown

images correctly identified.

In the authentication or verification mode, the FR system verifies the identity claim of

the unknown person. The FR system compares the unknown face (input) to the claimed

identity in the database and makes a decision to accept or reject the claim. In this mode,

the FR system solves a 1:1 problem. This mode has applications in access control. The

system performance is measured by the correct accept rate versus the false accept/reject

rate, depending on the sensitivity of the application.

In both of the above modes, the assumption is that the unknown person has been

enrolled into the face database. In the watch list mode, the FR system first checks for


the presence of the unknown person (input) in the face recognition database, and if true,

identifies the person. When the FR system operates in this mode, the size of the database

is usually very small compared to the number of query images. The system performance is measured

by correct detection rate, correct recognition rate and false accept rate and this mode

could have applications in crime investigation and related domains.

In this thesis, our main aim is to solve complex FR problems using information in

color spaces, and we operate in the identification mode.

1.3 Challenges

Appearance based approaches are statistical methods which process the face as a holistic

pattern. In practical FR scenarios like insufficient faces available for training, complex

imaging conditions or other facial distortions, these methods face statistical

challenges. The key technical barriers are summarized in this section.

1. High dimensionality of training inputs & insufficient learning samples: Face images

are typically represented as vectors of pixel values. For example, a 150 × 130

resolution face image is represented as a vector of dimension 19500. In contrast,

the number of samples available per subject for training the FR algorithm is usu-

ally less than 10. This leads to statistical problems like matrix singularities and

biased estimation of parameters. This scenario is referred to as the small sample

size (SSS) problem and could corrupt the design of the FR system, especially if the

FR algorithm uses the identity information in training. When the faces used are

color images, the dimensionality of the faces increases as a function of the number

of spectral planes in the face images and the sampling structure of the color space

involved, thus leading to a more severe small sample size problem. Furthermore,

the high dimensionality of face inputs also poses many computational challenges.

2. Complexities in face patterns: In practical FR systems, faces are subjected to pose


and expression variations, bad imaging conditions like severe illumination varia-

tions and poor resolution. These distortions and conditions are the complexities in

face patterns. All appearance based approaches are traditionally linear methods

and cannot learn complexities in face patterns and imaging conditions effectively.

Therefore, creating robust FR systems which can obtain discriminative information

from faces under these conditions is a major challenge.
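The growth in dimensionality described in item 1 can be illustrated with a small sketch. It assumes a YCbCr input with the standard chroma decimation factors for the 4:4:4, 4:2:2 and 4:2:0 sampling structures; the function name `vectorized_dim` and the choice of 150 as the image height are illustrative assumptions, not taken from the thesis.

```python
def vectorized_dim(height, width, subsampling="gray"):
    """Length of the vectorized face for a gray scale or YCbCr input.

    Assumption: chroma planes are decimated by the usual factors
    (horizontal, vertical) for each sampling structure.
    """
    luma = height * width
    if subsampling == "gray":
        return luma
    factors = {"4:4:4": (1, 1), "4:2:2": (2, 1), "4:2:0": (2, 2)}
    f_h, f_v = factors[subsampling]
    chroma = (height // f_v) * (width // f_h)
    # One luminance plane (Y) plus two chrominance planes (Cb, Cr).
    return luma + 2 * chroma


if __name__ == "__main__":
    for mode in ("gray", "4:4:4", "4:2:2", "4:2:0"):
        print(mode, vectorized_dim(150, 130, mode))
```

With fewer than 10 training samples per subject, every one of these vector lengths is far larger than the number of samples, which is why the choice of sampling structure matters for the small sample size problem.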

1.4 Thesis Contributions and Organization

In this thesis, an in depth analysis on the usage of color information for face recognition

is provided, along with analysis of the behavior of color inputs and FR algorithms in

different learning scenarios, imaging conditions, and facial distortions.

In Chapter 2, a review of past literature where chromatic inputs have been used in FR

systems is provided. This includes works where chromatic information has been used

as input to the FR system and works on decision level fusion of classifiers trained on different

chromatic inputs. The subsequent chapters present the methods developed in this thesis:

1. In Chapter 3, the learning scenarios and imaging conditions under which the use

of color images significantly betters the FR performance were examined, in both

supervised and unsupervised learning modes for the YCbCr color space. The behavior

of chromatic inputs with different sub sampling ratios in the small sample

size scenario, which is a special case of the learning scenarios examined, was analyzed

for the supervised learning mode. Experiments were conducted on two evaluation

databases which had moderate and severe illumination conditions. This work was

partially published in [21].

2. In Chapter 4, a decision level combination of classifiers trained on different spectral

planes of the YCbCr color space transformations was examined using rule based

fusion methods. The motivation behind this was to produce diverse classifiers


using the YCbCr color space, and to reduce the small sample size problem by using

chromatic information in a decision fusion framework. An analysis of the behavior

of this framework in the small sample size scenarios and the effect of chromatic

sub sampling was performed under different imaging and learning conditions. The

evaluation databases used were common to those used in Chapter 3.

3. In Chapter 5, complexities in face patterns (expression and pose variations) and imaging

conditions (severe illumination conditions and poor resolution) were addressed

by combining the advantages of chromatic inputs (in addressing bad imaging con-

ditions) and supervised learning with ensemble learning (in learning complex face

patterns). An adaptive boosting (ada-boost) framework was used with a learner

consisting of a direct LDA feature extractor and a nearest center classifier. The

behavior of this framework was examined in various small sample size scenarios to

analyze the contribution of chromatic information and boosting in a range of learn-

ing scenarios. Experiments were performed on a large evaluation database having

severe illumination and pose variations. This work was published in [22].

To the author's knowledge, this thesis is the first work to examine the effect of the

small sample size problem created by the increased dimensionality of vectorized color

inputs on supervised systems. This thesis concludes in Chapter 6 with a summary of the

work along with future research directions.

Chapter 2

Prior Work and Background

Chromatic information has been used for object detection in a large number of works;

however, it was not traditionally applied in the recognition domain. In recent works, the

use of color images for FR purposes has been shown to improve the system recognition

performance. This chapter reviews those works and concludes with an insight into the motivations for

this thesis.

2.1 Face Recognition System

The face images contain irrelevant information along with the face, which includes the

background, hair, etc. In the preprocessing stage, the face is isolated from the rest of

the image. The face is then represented as a column vector for further processing. An

FR system consists of a training stage and a testing stage. The training stage focuses on

the creation of a low dimensional feature space, onto which the face data are projected and in which

face patterns are well clustered and separated, as the original face inputs are usually of

a very high dimensionality (≈ 10^4). This takes place in the feature extraction step. In

the testing stage, the face inputs are projected onto this low dimensional space.
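The training and testing stages described above can be sketched as follows. This is a minimal illustration that assumes PCA as the feature extractor (one possible choice) and uses random data in place of vectorized face images; the function names `train_pca` and `project` are hypothetical.

```python
import numpy as np


def train_pca(train, n_components):
    """Training stage: learn a low dimensional orthonormal feature basis.

    train: (n_samples, n_pixels) matrix, one vectorized face per row.
    """
    mean = train.mean(axis=0)
    centered = train - mean
    # A thin SVD avoids forming the n_pixels x n_pixels covariance matrix,
    # which is impractical when n_pixels is of the order of 10^4.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:n_components].T  # (n_pixels, n_components)
    return mean, basis


def project(faces, mean, basis):
    """Testing stage: project vectorized faces onto the learned basis."""
    return (faces - mean) @ basis


rng = np.random.default_rng(0)
train = rng.normal(size=(20, 500))   # stand-in for 20 vectorized faces
mean, basis = train_pca(train, n_components=5)
features = project(train, mean, basis)
```

The same `project` call would be applied to gallery and probe images in the testing stage, so that matching takes place in the low dimensional space.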

The outputs and inputs of the FR system in the testing stage depend on the mode

of operation of the FR system. In past works, FR systems have been operated in two


modes: identification and verification or authentication. The difference in the two modes

lies in the state of knowledge of the system regarding the identity of a subject. A general

framework depicting an FR system in identification/ verification mode is illustrated in

Figure 2.1.

Identification mode: Given a database, or a gallery, consisting of images of known

identity, the aim of the FR system is to identify the input image or the probe whose

identity is unknown. In the testing stage, both the gallery and probe data are projected

onto the lower dimensional feature space created in the training stage. Classification is

performed in this lower dimensional space. The output of the FR system in this mode is

the identity of the probe image. During the identification process, the system has no prior

knowledge about the identity of the unknown subject. The performance of identification

systems is measured by the Correct Recognition Rate (CRR). Correct recognition rate at

rank k refers to the ratio of the number of correct searches in the top k candidates to

the total number of probe images taken as a percentage. When k=1, this becomes the

fraction of probe images correctly identified.
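The rank k correct recognition rate can be computed from a probe-to-gallery distance matrix, as in the sketch below. It assumes one gallery image per subject, so that the k nearest gallery images are k distinct candidates; the function name `crr_at_rank` and the toy distances are illustrative assumptions.

```python
import numpy as np


def crr_at_rank(dist, gallery_ids, probe_ids, k=1):
    """Correct recognition rate at rank k, in percent.

    dist[i, j] is the distance between probe i and gallery image j.
    """
    gallery_ids = np.asarray(gallery_ids)
    probe_ids = np.asarray(probe_ids)
    # Indices of the k closest gallery images for every probe.
    order = np.argsort(dist, axis=1)[:, :k]
    top_ids = gallery_ids[order]
    # A probe counts as correct if its true identity is among the top k.
    hits = (top_ids == probe_ids[:, None]).any(axis=1)
    return 100.0 * hits.mean()


dist = np.array([[0.1, 0.9, 0.5],
                 [0.8, 0.7, 0.2],
                 [0.3, 0.6, 0.9]])
gallery_ids = [0, 1, 2]
probe_ids = [0, 1, 2]
rank1 = crr_at_rank(dist, gallery_ids, probe_ids, k=1)
rank2 = crr_at_rank(dist, gallery_ids, probe_ids, k=2)
rank3 = crr_at_rank(dist, gallery_ids, probe_ids, k=3)
```

In this toy example only the first probe is matched at rank 1, the second is recovered at rank 2 and the third at rank 3, so the rate grows with k.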

Authentication mode: The aim of the FR system is to verify the claimed identity of

the probe, by comparing it with the corresponding image in the gallery. This is a one

to one problem in contrast to identification which is a one to many problem. In the

testing stage, both the claimed identity from the gallery and the unknown probe image

are projected onto the lower dimensional subspace created in the training stage, and

matching is performed. The output of the system is an accept/reject decision on the claim of

the unknown person (probe) based on the distance between the projected probe and the

claimed identity. The performance of the system in the verification mode is measured

by the false acceptance rate (FAR), false rejection rate (FRR) and the total error rate

(which is the sum of the two) [23]. The FAR and FRR are computed as,

$$\mathrm{FAR} = \frac{\text{Number of subjects wrongly authenticated}}{\text{Total number of intruders}}$$

$$\mathrm{FRR} = \frac{\text{Number of subjects wrongly rejected}}{\text{Total number of subjects}}$$

Figure 2.1: Block diagram of an appearance based FR system. This FR system architecture is used in [15, 17]. (Training: training data, preprocessing and vectorization of input, feature extractor, orthogonal feature basis. Testing: probe image and gallery set (identification) or claimed identity (verification), preprocessing and vectorization of input, projection onto feature basis, classification. Depending on the mode of operation of the FR system, the output of the testing stage is the probe ID (identification) or accept/reject of the claim (verification).)

The Equal Error Rate (EER) is the error rate at which the FAR equals the FRR. A smaller

EER indicates a better FR system. A trade off is involved in achieving a low FAR and

FRR as it is hard to achieve them simultaneously. In face verification systems with

sensitive applications, the focus is to maximize the total recognition rate for a minimum

FAR.
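The trade off between FAR and FRR can be illustrated with a small threshold sweep. The sketch assumes that a claim is accepted when the probe-to-claimed-identity distance is at or below a threshold, and it reads "total number of subjects" in the FRR as the number of genuine claims; the function names and the toy distances are hypothetical.

```python
import numpy as np


def far_frr(genuine, impostor, threshold):
    """FAR and FRR when a claim is accepted if distance <= threshold."""
    genuine = np.asarray(genuine, dtype=float)
    impostor = np.asarray(impostor, dtype=float)
    far = np.mean(impostor <= threshold)  # intruders wrongly authenticated
    frr = np.mean(genuine > threshold)    # genuine claims wrongly rejected
    return far, frr


def approx_eer(genuine, impostor):
    """Approximate EER: sweep observed distances, pick FAR closest to FRR."""
    thresholds = np.unique(np.concatenate([genuine, impostor]))
    rates = [far_frr(genuine, impostor, t) for t in thresholds]
    far, frr = min(rates, key=lambda r: abs(r[0] - r[1]))
    return (far + frr) / 2


genuine = [0.1, 0.2, 0.3, 0.6]    # distances for true identity claims
impostor = [0.4, 0.5, 0.7, 0.8]   # distances for intruder claims
far, frr = far_frr(genuine, impostor, threshold=0.35)
eer = approx_eer(genuine, impostor)
```

Raising the threshold lowers the FRR but raises the FAR, which is why the two cannot be minimized at the same time.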

Figure 2.2: Generic block diagram of FR system architecture in [16, 24, 25, 26, 27]. (Training: the training set provides Input 1 to Input K; each input passes through its own feature extractor to give Feature Basis 1 to Feature Basis K. Testing: Input 1 to Input K of the probe image and of the gallery set (identification) or claimed identity (verification) are projected onto the corresponding feature bases, similarity computations S1 to SK are performed, and decision fusion gives the output. Depending on the mode of operation of the FR system, the output of the testing stage is the probe ID (identification) or accept/reject of the claim (verification).)


The FR system in Figure 2.1 is extended to a multiple classifier FR system in Figure

2.2. In this figure, the system is trained on different inputs to create a set of low dimensional

subspaces. In the testing stage, the classifiers trained on the different inputs are fused at

the decision level using an aggregation method.
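As an illustration of decision level fusion, the sketch below combines the distances produced by K classifiers for a single probe. The sum rule with min-max normalization is used here only as one common aggregation method and is an assumption, as are the function names and the example distances for three spectral planes.

```python
import numpy as np


def min_max_normalize(scores):
    """Rescale one classifier's distances to [0, 1] so they are comparable."""
    lo, hi = scores.min(), scores.max()
    if hi > lo:
        return (scores - lo) / (hi - lo)
    return np.zeros_like(scores)


def fuse_sum_rule(distance_list):
    """Sum rule fusion of K distance vectors, each of length n_gallery."""
    normalized = [min_max_normalize(np.asarray(d, dtype=float))
                  for d in distance_list]
    return np.sum(normalized, axis=0)


# Hypothetical distances from one probe to three gallery subjects,
# one vector per classifier (for example one per spectral plane).
d_1 = [0.2, 0.9, 0.5]
d_2 = [0.4, 0.3, 0.8]
d_3 = [0.1, 0.7, 0.6]
fused = fuse_sum_rule([d_1, d_2, d_3])
best = int(np.argmin(fused))  # index of the identified gallery subject
```

Here the second classifier on its own would pick gallery subject 1, but the fused decision follows the other two classifiers and selects subject 0.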

2.2 Color Face Recognition: A Survey

Notable past works in the domain of color FR can be classified into 2 parts, those in

which the input to the feature extractor is the raw information contained in color spaces,

and those where the feature extractor operates on the information in the frequency

domain, i.e., information in color spaces is operated upon by a filter.

Figure 2.3: Past Works in color FR (hierarchy diagram). Branch labels: feature based, frequency domain; color space domain; raw data level fusion (recognition depends on color space); decision level fusion (color spaces contain complementary information, use of decorrelated spectral planes). Works shown: L. Torres et al, 1999, The importance of Color Information in FR; C. Jones III et al, 2006, Color FR by Hypercomplex Gabor Analysis; P. Shih et al, 2005, Comparative assessment of Content Based Face Image Retrieval in different color spaces; P. Shih et al, 2006, Improving the FRGC baseline performance using color configurations across color spaces; J. Kittler et al, 2004, Physics based decorrelation of Image Data for decision level fusion in Face Verification; M.T. Sadeghi et al, 2007, Confidence based gating of Color Features for Face Authentication, and SVM based selection of color space experts for face authentication.

A work in the latter category was performed in [8] by C. Jones III et al, using Gabor

analysis on color images and Elastic Bunch Graph Matching for recognition. However,

this method is feature based, and also leads to a face vector of very high dimension after

Gabor analysis, thus aggravating the small sample size problem when used with a supervised


learner and making the FR system more computationally complex. Therefore, this line

of approach is not adopted. In this thesis, the feature extractor operates directly on

the raw information present in color spaces and treats the face as a holistic pattern. A

diagram showing the directions and hierarchy of past works in color FR is presented in

Figure 2.3.

One of the first works which proposed the idea of the usage of multi spectral or color

faces for FR was by L. Torres et al in [16]. RGB, YUV and HSV color spaces, each

comprised of three color planes, were examined for recognition purposes. Experiments

were performed on 120 images from test video sequences. Training was performed on

the gallery set, Z, and then images of the probe set, Q, were matched against those of

the gallery. The images in the probe set were of a different viewpoint from those in the

gallery. The Principal Component Analysis (PCA) [13] feature extractor was used in the

training module. The PCA feature extractor was trained separately on single spectral

gallery images, to produce 3 projection feature bases. The individual spectral planes

of each probe image were projected onto the corresponding feature bases created in the

training step. They were matched with the projected single spectral gallery images (of

the corresponding spectral plane) to form 3 similarity scores for each probe-gallery pair.

A decision level fusion was performed to get a single score, which was used to determine

the unknown identity of the probe. The Mahalanobis distance, given by Equation (2.1),

was used for finding the similarity measure, and classification was performed using the

nearest center classifier.

$$d(x_i, \mu_i) = (x_i - \mu_i)^T \Sigma_i^{-1} (x_i - \mu_i) \qquad (2.1)$$

where $x_i$ is a face vector of the $i$th class, and $\mu_i$ and $\Sigma_i$ are the mean and covariance matrix

of the $i$th class respectively.
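A minimal sketch of nearest center classification with the Mahalanobis distance follows. It uses the standard form of the distance, which weights the difference vector by the inverse of the class covariance matrix; the function names and the two-dimensional toy classes are illustrative assumptions, not the setup used in [16].

```python
import numpy as np


def mahalanobis_sq(x, mean, cov):
    """Squared Mahalanobis distance between x and a class center."""
    diff = x - mean
    # Solving cov @ z = diff is more stable than forming the explicit inverse.
    return float(diff @ np.linalg.solve(cov, diff))


def nearest_center(x, means, covs):
    """Assign x to the class whose center is closest in Mahalanobis distance."""
    dists = [mahalanobis_sq(x, m, c) for m, c in zip(means, covs)]
    return int(np.argmin(dists)), dists


means = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
covs = [np.diag([1.0, 4.0]), np.eye(2)]
label, dists = nearest_center(np.array([1.0, 2.0]), means, covs)
```

In a face recognition setting the vectors would be projected features rather than raw pixels, since a class covariance estimated from a few samples in the pixel space is singular.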

The FR system framework used in this work can be fit into the block diagram in Figure

2.2. In this case, K=3, and the inputs are the images corresponding to the different


spectral planes of the YUV, HSV and RGB images. This work reported a recognition rate

of 88.14% when YUV inputs were used in the FR system, providing a 3.39% improvement

over using the Y input alone.

The important conclusions of this work are:

1. The correct recognition rate of the FR system is affected by the color space used.

2. Color spaces where the luminance and the chrominance components are isolated

lead to better FR systems.

3. Recognized faces are not the same for different color space inputs even though the

recognition rate is the same.

The conclusions in [16] have motivated further work on the usage of color spaces to

perform FR, as illustrated in Figure 2.3. Conclusion 1 provides the motivation to explore

different color spaces for recognition purposes, which was performed in [17] by P. Shih

et al. Conclusion 2 implies that for the FR system to perform well, the luminance and

chrominance spectral planes should be isolated. A broader view of this conclusion would

be that, in order to produce a more diverse set of classifiers and hence a better FR

system, the spectral planes of the inputs should carry different, uncorrelated information

(which was in fact the case with the YUV color space). This was the idea behind [26]

by J. Kittler et al. Conclusion 3 of [16] led to the idea that color spaces contain

complementary information, which provided the motivation for fusing classifiers trained

on different color spaces. This was performed in [25] by M. T. Sadeghi et al.

P. Shih et al in [17] used the idea that the information contained in different color

spaces can be applied for different visual tasks, and therefore explored the usage of various

color spaces for content based image retrieval, specifically the computer retrieval of face

images given a particular subject query. This task is similar to FR in the identification

mode and the system performance was measured by the correct retrieval rate. In this

work, 12 color spaces were examined (as inputs) to the FR system. This FR system


architecture can be fit into the block diagram in Figure 2.1. The system was trained

using the PCA feature extractor. The Mahalanobis metric given by equation (2.1) was

used to measure similarity in conjunction with the nearest neighbour classifier. The

color inputs are represented as an augmented vector by concatenating individual column

vectors formed by a row wise ordering of spectral planes (raw data level fusion). The

color spaces examined included RGB, HSV, I1I2I3, video transmission spaces (YIQ,

YUV, YCbCr) and intensity normalized RGB. Seven subspaces were analyzed for each

color space. Experiments were performed on 600 FERET [28] images corresponding to 200 subjects and 456 FRGC images [29] corresponding to 152 subjects. The FERET images have uniform illumination with pose and expression variations, while the FRGC images were captured in both controlled and uncontrolled settings (with illumination and expression variations). Results show that the YI (from YIQ), YCr (from YCbCr) and

YV (from the YUV) subspaces lead to the best retrieval rate. Incidentally, YCbCr, YUV

and YIQ are decorrelated color spaces. Also, inputs of I1I2I3 space (decorrelated RGB)

lead to a better FR system than RGB inputs. The interpretation for these trends is that

when color spaces are decorrelated, each color spectral plane provides distinct information

about a different aspect of the image. When these spectral planes are concatenated,

they form a vector with low redundancy, in contrast with the column vector formed by

RGB inputs (where the spectral planes are highly correlated). Also the blue chromatic

plane does not provide as much discriminative information as the red, and chromatic

information needs to be used in conjunction with intensity information for a good FR

performance.

This work was extended in [15] where experiments were conducted on 1126 FRGC

images. The FR system was tested with both PCA and Linear Discriminant Analysis

(LDA) [14] feature extractors with a nearest neighbour classifier based on the normalized

inner product similarity metric. A combination of spectral planes from the YIQ and YCbCr color spaces, YQCr, was concluded to improve the rank 1 recognition performance of the


FR system.

The work in [25, 26, 30] was motivated by the fact that in order to construct an efficient

multiple classifier FR system, the component classifiers should provide complementary

information to the FR process.

In [26] the R, G, and B spectral planes have been decorrelated and mapped to new

orthogonal spaces which separate the effects of object shape and albedo, and create

complementary data channels that lead to classifiers containing different information

having a high level of diversity. This is done by analyzing the physics of image data, and

creating an intensity channel, a green channel, g and an opponent chromaticity channel

rg. The FR system is a face verification system and its description can be fit into the block

diagram in Figure 2.2 (K = 3). Training is performed on the training data (XM2VTS

database [31]) using the LDA technique, and the BANCA database [32] was used for

testing purposes. The BANCA database contains images in controlled, uncontrolled and

adverse imaging conditions. The inputs to the feature extractor are the decorrelated

intensity, chromatic and opponent chromatic channels created. The similarity measure

was the gradient direction metric, defined in [33]. Two fusion methods, i.e., score averaging (a linear method) and the max rule (a nonlinear method), were used to fuse

the outputs of the individual classifiers. The total error rate, false acceptance rate and

false rejection rate were used to evaluate performance of the FR system at a global

threshold. The fusion of the classifiers created from the decorrelated data (intensity, g

and rg) significantly improved the performance of the FR system over usage of the RGB

color space alone. The interpretation of this data would be that, fusing diverse classifiers

created from decorrelated / independent spectral planes boosts the performance of a

multiple classifier FR system.
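The two fusion rules mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the system of [26]: the function names, the score layout (one row per classifier) and the toy scores are hypothetical, and scores are assumed to be similarities where larger means a better match.

```python
import numpy as np

def fuse_scores(scores, rule="mean"):
    """Fuse per-classifier similarity scores.

    scores: array of shape (K, n_claims), one row per classifier.
    """
    scores = np.asarray(scores, dtype=float)
    if rule == "mean":      # score averaging (linear rule)
        return scores.mean(axis=0)
    if rule == "max":       # max rule (nonlinear rule)
        return scores.max(axis=0)
    raise ValueError(f"unknown rule: {rule}")

def verify(fused, threshold):
    """Accept a claim when the fused score reaches the global threshold."""
    return fused >= threshold

# Three classifiers (e.g. intensity, g and rg channels), two identity claims.
scores = np.array([[0.9, 0.2],
                   [0.6, 0.4],
                   [0.3, 0.9]])
mean_fused = fuse_scores(scores, "mean")
max_fused = fuse_scores(scores, "max")
accepted = verify(mean_fused, threshold=0.55)
```

The second claim shows how the rules can disagree: one confident classifier is enough for the max rule, while averaging lets the other two classifiers outvote it.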

Different color spaces contain complementary information which could be useful to the

FR system. In different imaging conditions, different color spaces provide discriminatory

information to the FR system. In [25], classifiers trained on different color spaces are


fused using an aggregation scheme at the decision level, and the classifiers to be fused

are chosen based on a confidence based gating scheme. The idea is to use inputs from

all useful color spaces for the face verification process. The FR system in this work can

be fit into Figure 2.2, where K is the number of color spaces used in the training and

classification process. However, in the decision fusion stage, only those classifiers which

are experts according to the confidence based gating rule(s) are aggregated. Training

is performed on the training data (XM2VTS database [31]) using the LDA technique.

The XM2VTS database consists of controlled images of 295 subjects. The Gradient

Direction Metric [33] is used as the similarity measure and the false acceptance rate and

false rejection rate were used to evaluate performance of the FR system. The method of

aggregation of the classifier experts chosen through the confidence based gating process is

the majority vote. Using this method of confidence based choosing of classifier experts,

the optimum subset of expert classifiers is dynamically chosen for each probe image

and aggregated to produce a more accurate expert. The performance of the verification

system was considerably improved by this aggregation over both the gray scale baseline

and the individual experts themselves.

This work has been extended in [27], where the classifiers are trained on the spectral planes of the different color spaces, thus tripling the number of competing classifiers. In this work, aggregation was performed by a more sophisticated method

based on support vector machines.

The previous works reinforced that chromatic information boosts the performance of

the FR system. They focused on the problem of creation of complementary classifiers for

efficient combination in a decision fusion framework. This was performed in two ways:

• using a confidence based gating method to choose classifier experts trained on different color spaces / spectral planes and combining them in a decision aggregation framework,

• decorrelating the data in the spectral planes of color spaces to produce diverse classifiers for use in a multiple classifier FR system.

They also focused on the evaluation of the usage of different color spaces for FR application in their raw form by performing a raw data level fusion. An important conclusion is that fusion of decorrelated data performs better, both in multiple classifier systems and at the raw data level, and that color information in conjunction with intensity information helps the FR system.

Motivated by the conclusions in the previous works, the YCbCr color space transformations (YCbCr 4:4:4, YCbCr 4:2:2 and YCbCr 4:2:0), along with their subspaces, are chosen for our FR experiments and analysis, as YCbCr is a decorrelated color space and has demonstrated a good performance for FR tasks [17]. Since the intensity

and the chromatic planes of the YCbCr space are decorrelated, it is a good color space

for examining the contribution of chromatic information. Also, since it is used in digital

video and image compression standards, it could have benefits from an application point

of view, such as the integration of FR and video systems. Although sub sampling of the chromatic planes in the YCbCr 4:2:2 and YCbCr 4:2:0 transformations does not produce any notable visual difference, since humans do not perceive color at high spatial resolution, its effect on the FR system is not evident, and will be discussed in the next

chapter.

2.3 Motivation of this Thesis

The databases used in previous literature however, consisted of images which were not

always captured under controlled settings and therefore, did not always have good shape

cues. In this work, the imaging and learning conditions in which color information

significantly helps the FR system are studied. For example, if the images in the face

database have a high resolution and are photographed in a controlled environment with

no degradation of the shape cues, will color cues still improve performance? Experiments


were performed to examine the performance of chromatic information in a wide range

of learning scenarios with various difficulties and different imaging conditions, as these

trends would help in the design of the FR system.

An important direction of research is the effect of the increased dimensionality of

color inputs on the small sample size problem in supervised learning. The small sample

size problem could corrupt the design of the FR system and is an important factor to

consider in FR systems. The effect of a decision level fusion of the decorrelated spectral

planes of the YCbCr space on the small sample size problem is also examined.

We have explored a third method of creating complementary classifiers with color

data using a boosting framework consisting of supervised learners. This framework was tested on a range of small sample size learning scenarios on a large

database having severe illumination and pose variations to examine the effect of ensemble

learning and chromatic information on the performance of the FR system under imaging

conditions and facial distortions encountered in real life applications.

The work in this thesis is complementary to the conclusions and ideas proposed in

previous literature on color FR, and therefore is not directly comparable to past works.

Chapter 3

Color face recognition in different

learning scenarios

In this chapter, the usefulness and contribution of chromatic information is examined in a

range of learning scenarios and different imaging conditions in order to conclude the exact

scenarios where usage of color information would help the FR system in both supervised

and unsupervised learning modes. The implication of the extra dimensionality added by

chromatic spectral planes on the small sample size problem encountered in supervised

learning systems is another aspect evaluated in this chapter.

3.1 Introduction

Recently attention has been drawn to the usage of color information for FR purposes.

Previous works have confirmed the usefulness of color in automatic FR systems [16, 15,

17, 8]. According to the studies on human perception and vision, color cues are supposed

to improve the performance of the FR system when shape cues are degraded. However,

the usage of color images poses two main challenges to an automatic FR system,

1. Computational and storage requirements: Face images are represented as vectors in


an FR system. Usage of color information leads to a vector of a larger dimension

which substantially increases the computational cost of the FR system. Also, a

face database comprising color images would require a larger storage space.

2. Larger dimensionality of color inputs vs. fewer training samples: Face inputs have a very large dimensionality, approximately of the order of $10^4$, while the number of training samples available is very low (around 2-10 samples per subject),

as mentioned in Chapter 1. This leads to a small sample size problem. When color

inputs are used, the dimensionality of the face inputs becomes larger, thus leading

to a more challenging small sample size problem.

It is therefore important to determine the imaging conditions, learning scenarios and

situations in which color information would be useful to the FR system, in other words,

whether chromatic information would have the same degree of contribution to the performance of the FR system when there is no degradation of the shape cues (distortion of

the intensity image) and when the learning scenarios are optimal.

The usefulness of chromatic information is examined in this chapter for both supervised and unsupervised FR systems. The learning scenarios examined are a function

of the number of subjects available and the samples per subject available for training

the FR algorithm. The latter parameter is an important factor in supervised learning

scenarios. Table 3.1 summarizes the various learning scenarios examined. The effect of

chromatic information is also examined for different imaging conditions. Two databases

have been chosen for evaluation, one with severe illumination variations and the other

with relatively moderate illumination variations, in order to examine the contribution of

color space information in different imaging conditions.

A special case of the learning scenarios in Table 3.1 is the small sample size scenario,

which exists when the number of samples per subject available for training is very small (approximately two to three images per subject), a condition that affects supervised FR systems.


Table 3.1: Learning scenarios encountered in face recognition problems

No. of Subjects    No. of Samples per Subject
Low                Low
Low                High
High               Low
High               High

The effect of the increased dimensionality of color inputs on the small sample size problem

is an important issue to be taken into consideration in the design of an effective FR

system.

The YCbCr color space transformations (along with the various sub sampling ratios)

are used for analysis in this thesis. However, the effect of spatial sampling of the chromatic

spectral planes on the FR system is not evident. The implication of chromatic sub

sampling on the FR system and the small sample size problem is another aspect studied

in this chapter.

3.2 Representation of Color Information

Let Si be the ith 2-d image in a set of images, with spatial dimensions, J = IW × IH and

K spectral planes. Each spectral plane has a spectral depth of 8 bits (which corresponds

to values between 0 and 255); therefore Si has a spectral depth of K×8 bits. The number

of spectral planes is dependent on the color space, for example, an RGB image will have

3 spectral planes: R, G and B. Every image Si is represented as a column vector, xi for

future analysis. In order to convert Si to xi, the following steps are performed:

1. Each spectral plane is converted to a column vector.

2. The column vectors from each spectral plane are concatenated.

In order to form a column vector for the $m$th spectral plane, where $1 \le m \le K$, the 8 bit values of that spectral plane are ordered lexicographically (row-wise) into a column vector $s_{im}$, the column vector of the $m$th spectral plane of the $i$th image. Also, $s_{im} \in \mathbb{R}^{D_m \times 1}$, where $D_m$ is the dimensionality of $s_{im}$. $D_m$ depends on two parameters: the sampling nature of $s_{im}$ and the spatial dimension of $S_i$. For example, if $S_i$ is converted to the YCbCr 4:2:0 color space used in MPEG standards, then $K = 3$, and every alternate row and column is eliminated from the Cb and Cr (chromatic) spectral planes (subsampling) while forming their column vectors. Therefore, the ratio of the dimensions of the 3 spectral planes will be $D_Y : D_{Cb} : D_{Cr} = 4 : 1 : 1$, and their values will be of the form $D_m = J \times \mu$, where $\mu$ is the scaling factor of that particular spectral plane; in this case $\mu = 0.25$ for the Cb and Cr spectral planes and $\mu = 1$ for the Y plane.

After forming the column vectors $s_{i1}, s_{i2}, \ldots, s_{iK}$, $x_i$ is formed by

$$x_i = [s_{i1}^T \; s_{i2}^T \; \cdots \; s_{iK}^T]^T.$$

The dimension of $x_i$ is $d = \sum_{m=1}^{K} D_m$. This is performed for all $S_i$ in the set of images to form a set of column vectors $\{x_i\}_{i=1}^{N}$, where $N$ is the number of images in the set. Since

K = 1 for gray scale images, and K = 3 for most color images, the dimensionality of

the column vector for a color image is thrice that of the corresponding gray scale image,

when no sampling is performed.
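The vectorization procedure can be sketched in a few lines. This is an illustrative sketch only: the function name is hypothetical, the planes are filled with random values rather than face data, and the 4:2:2 branch assumes the usual convention of dropping alternate columns only, which is not spelled out in this section.

```python
import numpy as np

def vectorize_ycbcr(y, cb, cr, sampling="4:2:0"):
    """Build x_i from three spectral planes, each of shape (I_H, I_W)."""
    if sampling == "4:2:0":
        cb, cr = cb[::2, ::2], cr[::2, ::2]   # drop alternate rows and columns
    elif sampling == "4:2:2":
        cb, cr = cb[:, ::2], cr[:, ::2]       # assumed: drop alternate columns only
    elif sampling != "4:4:4":
        raise ValueError(f"unknown sampling: {sampling}")
    # ravel() on these arrays yields the row-wise (lexicographic) ordering.
    planes = [p.ravel() for p in (y, cb, cr)]
    return np.concatenate(planes)

h, w = 150, 130                               # face size used later in this chapter
rng = np.random.default_rng(0)
y, cb, cr = (rng.integers(0, 256, size=(h, w)) for _ in range(3))
dims = {s: vectorize_ycbcr(y, cb, cr, s).size
        for s in ("4:4:4", "4:2:2", "4:2:0")}
```

Here `dims` holds the value of d for each sampling format, which makes the 4 : 1 : 1 ratio of the 4:2:0 planes easy to check against the total.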

Column vectors of individual spectral planes of YCbCr transformed images are normalized to zero mean and unit variance prior to concatenation or further processing.

This operation is possible because YCbCr is a decorrelated transform, and hence the

individual spectral planes can be operated upon separately.
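A minimal sketch of this normalization is given below. It assumes that each spectral plane vector is normalized on its own, using its own mean and standard deviation; the function name and the handling of a constant plane are choices made for this example.

```python
import numpy as np

def normalize_plane(s):
    """Scale one spectral-plane vector to zero mean and unit variance."""
    s = np.asarray(s, dtype=float)
    std = s.std()
    if std == 0:                  # constant plane: it can only be centred
        return s - s.mean()
    return (s - s.mean()) / std

s_norm = normalize_plane([10, 20, 30, 40])
flat = normalize_plane([7, 7, 7])
```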

3.3 Background

All appearance based methods can be classified on the basis of the knowledge used by

the FR system in the feature extraction step of the training stage. They are classified

into unsupervised methods, supervised methods, and those methods based on inter and

intra personal variations (based on Bayes learning [34]). In this section, the underlying

concepts of supervised and unsupervised learning are presented along with a description

of the two most basic unsupervised and supervised learning methods used in FR systems.


• In unsupervised learning, the learner (or feature extractor) uses solely the input

patterns, i.e., the preprocessed faces in the training database to form the feature

basis of the FR system. The learner is not provided with any class information,

which includes identities of the subject, class means or variances. Principal

Component Analysis (PCA) is the most fundamental unsupervised learning method

used in FR.

• In supervised learning, the feature extractor is furnished with the preprocessed

input patterns, along with their class information, class means, inter and intra class

variations. All of this information is used to create a feature basis for projection

in the testing stage. Linear Discriminant Analysis (LDA) is the most fundamental

supervised learning method used in FR systems.

Most appearance based FR methods, including those based on kernels [35, 36] are

based on PCA and LDA.

3.3.1 Principal Component Analysis

The PCA is one of the first and most popular tools for data reduction and feature

extraction. It was first used for FR in [13]. The PCA focuses on finding a set of orthogonal

basis vectors which maximize the total scatter or variance in the training samples.

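Given a training set $Z$ containing images $\{z_{ij}\}$, where $z_{ij}$ is the $j$th image of the $i$th class in $Z$, $C_i$ is the number of images in the $i$th class, $C$ is the number of classes and $N = \sum_{i=1}^{C} C_i$ is the total number of training images, the covariance matrix is given by

$$S_{cov} = \frac{1}{N} \sum_{i=1}^{C} \sum_{j=1}^{C_i} (z_{ij} - \bar{z})(z_{ij} - \bar{z})^T \qquad (3.1)$$

where $\bar{z} = \frac{1}{N} \sum_{i=1}^{C} \sum_{j=1}^{C_i} z_{ij}$ is the average of the training samples. The covariance given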

by Equation 3.1 is the sum of the intra and inter class variances of all the images of


the training set [37]. The orthogonal basis is formed by the eigen decomposition of the

covariance matrix. Finding an orthogonal feature basis to maximize the total scatter

given in equation 3.1 corresponds to solving the following eigen value problem,

$$S_{cov} \Phi_k = \lambda_k \Phi_k$$

where $k = 1, 2, \ldots, M$. The PCA feature space is thus spanned by the $M$ ($M < d$) most significant eigenvectors $\Phi_k$, corresponding to the $M$ largest eigenvalues $\lambda_k$, where $d$ is the dimensionality of the face input $z_{ij}$. Every face $z_{ij}$ is projected onto this low dimensional feature space by the linear mapping $y_{ij} = W_{PCA}^T (z_{ij} - \bar{z})$, where $\bar{z}$ is the average of the training samples and $W_{PCA} = [\Phi_1, \Phi_2, \ldots, \Phi_M]$ is the transformation matrix consisting of the $M$ most significant eigenvectors. The

vector of reduced dimension yij is a vector formed by the projections of zij on each of

the M orthonormal basis vectors. The classification of faces takes place in this reduced

feature space using any classifier.
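The PCA training and projection steps can be sketched as follows. This is one possible implementation and an assumption of this example: it obtains the eigenvectors of the covariance matrix from a singular value decomposition of the centred data, which avoids forming the d × d matrix explicitly. The function names and the random toy data are hypothetical.

```python
import numpy as np

def pca_fit(Z, M):
    """Z: (N, d) matrix with one training vector per row. Returns mean, W_PCA, eigenvalues."""
    mean = Z.mean(axis=0)
    A = Z - mean
    # The right singular vectors of A are the eigenvectors of A^T A / N,
    # and s**2 / N are the eigenvalues, sorted in descending order.
    _, s, Vt = np.linalg.svd(A, full_matrices=False)
    W = Vt[:M].T                          # d x M, orthonormal columns
    eigvals = s[:M] ** 2 / Z.shape[0]
    return mean, W, eigvals

def pca_project(z, mean, W):
    """Project one face vector onto the M-dimensional PCA feature space."""
    return W.T @ (z - mean)

rng = np.random.default_rng(0)
Z = rng.normal(size=(20, 50))             # 20 toy samples of dimension 50
mean, W, eigvals = pca_fit(Z, M=5)
y = pca_project(Z[0], mean, W)
```

The decomposition route matters for face data, where d is far larger than N and a d × d eigen decomposition would be costly.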

PCA achieves object reconstruction in the least square sense and maximizes both the

inter and intra class variances. Since the intra class variances could have a negative impact

on the performance of FR systems, it is generally believed that PCA does not perform

as well as supervised learning techniques based on the Linear Discriminant Analysis.

3.3.2 Linear Discriminant Analysis

The Linear Discriminant Analysis (LDA) method is a supervised learning method used

in FR systems, and is the basis for all the supervised learning methods in FR literature.

LDA uses class specific projections and produces a set of orthogonal vectors to form a

low dimensional discriminative feature space.

Given a training set $Z = \{Z_i\}_{i=1}^{C}$ containing $C$ classes, with each class $Z_i = \{z_{ij}\}_{j=1}^{C_i}$ consisting of images $z_{ij}$ (where $z_{ij}$ is the column vector of the $j$th image of the $i$th class), a total of $N = \sum_{i=1}^{C} C_i$ images are present in the training set. The dimensionality of the column


vectors of the images in Z is d. LDA finds a set of M feature vectors, M ≤ d, based on

the following optimality criterion,

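$$\Psi = \arg\max_{\Psi} \frac{\left| \Psi^T S_B \Psi \right|}{\left| \Psi^T S_W \Psi \right|} \qquad (3.2)$$

where $\Psi = [\psi_1 \, \psi_2 \ldots \psi_M]$, $\psi_k \in \mathbb{R}^d$, and $S_B$ and $S_W$ are the between class and within class scatter matrices respectively, defined as

$$S_B = \frac{1}{N} \sum_{i=1}^{C} C_i (\bar{z}_i - \bar{z})(\bar{z}_i - \bar{z})^T = \sum_{i=1}^{C} \Phi_{B,i} \Phi_{B,i}^T = \Phi_B \Phi_B^T \qquad (3.3)$$

$$S_W = \frac{1}{N} \sum_{i=1}^{C} \sum_{j=1}^{C_i} (z_{ij} - \bar{z}_i)(z_{ij} - \bar{z}_i)^T \qquad (3.4)$$

where $\Phi_{B,i} = \left(\frac{C_i}{N}\right)^{1/2} (\bar{z}_i - \bar{z})$, $\Phi_B = [\Phi_{B,1} \, \Phi_{B,2} \ldots \Phi_{B,C}]$, $\bar{z}$ is the mean of all training samples and $\bar{z}_i = \frac{1}{C_i} \sum_{j=1}^{C_i} z_{ij}$ is the mean of the class $Z_i$.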

The optimization problem in equation 3.2 is equivalent to solving the following eigen

value problem,

$$S_B \psi_k = \lambda_k S_W \psi_k, \quad k = 1, \ldots, M \qquad (3.5)$$

The basis vectors $\psi_k$ are the eigenvectors corresponding to the $M$ largest eigenvalues of $S_W^{-1} S_B$, provided $S_W$ is not singular.
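The scatter matrices and the eigenvalue problem can be sketched as follows for the well posed case, where the within class scatter matrix is invertible. The function names and the toy data (3 classes, 10 samples each, 4 dimensions) are assumptions of this example, and solving the problem through a general eigen decomposition is one simple choice among several.

```python
import numpy as np

def scatter_matrices(Z, labels):
    """Between class (S_B) and within class (S_W) scatter of the rows of Z."""
    N, d = Z.shape
    z_bar = Z.mean(axis=0)
    S_B = np.zeros((d, d))
    S_W = np.zeros((d, d))
    for c in np.unique(labels):
        Zc = Z[labels == c]
        zc_bar = Zc.mean(axis=0)
        S_B += len(Zc) * np.outer(zc_bar - z_bar, zc_bar - z_bar)
        S_W += (Zc - zc_bar).T @ (Zc - zc_bar)
    return S_B / N, S_W / N

def lda_fit(Z, labels, M):
    """Return the M leading eigenvectors (as columns) and eigenvalues of S_W^{-1} S_B."""
    S_B, S_W = scatter_matrices(Z, labels)
    # This step is unreliable when S_W is singular (the small sample size case).
    evals, evecs = np.linalg.eig(np.linalg.solve(S_W, S_B))
    order = np.argsort(evals.real)[::-1][:M]
    return evecs[:, order].real, evals[order].real

rng = np.random.default_rng(0)
labels = np.repeat(np.arange(3), 10)
class_means = rng.normal(scale=3.0, size=(3, 4))
Z = class_means[labels] + rng.normal(size=(30, 4))
Psi, lam = lda_fit(Z, labels, M=2)
```

With C classes at most C − 1 eigenvalues are non-zero, which is why M = 2 is used for the three toy classes.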

Although LDA is expected to perform better than the PCA since it utilizes class

information to create a low dimensional feature basis, it is more susceptible to the small sample

size problem, which will be discussed in the next section.


3.4 Color and the Small Sample Size Problem

3.4.1 Small Sample Size Problem

According to statistical learning theory, as the dimensionality of the face input, d, increases, the estimation of the scatter matrices becomes increasingly difficult. This is because in practical FR systems, the amount of data available for training is usually very small compared to the dimensionality of the training inputs. A problem is poorly posed if the number of parameters to be estimated is comparable to the number of training samples, L, and is ill posed if it is far greater than L [38]. This makes the estimation of scatter matrices an ill posed problem, which is referred to as the small sample size problem.

The small sample size (SSS) problem is most severe in supervised learning scenarios

based on LDA. The SW matrix is essentially proportional to the sum of the covariance

matrices of the individual classes. The number of samples available for training in each

class (≤ 10) is typically very small compared to the dimensionality of the column vectors

of the samples in Z (of the order of $10^4$). This makes the estimation of the SW matrix

a highly ill posed problem as it has a very low rank, and in the case of the classical LDA,

might lead to highly degenerate scatter matrices. Therefore the direct optimization of

the ratio in equation 3.2 becomes difficult as SW is singular, and leads to highly biased

eigen values.
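The rank deficiency can be checked numerically. The sketch below uses toy sizes chosen for this example (5 subjects, 2 samples each, 100-dimensional random vectors) rather than face data. Each class contributes its samples minus one estimated mean, so the rank of the within class scatter matrix cannot exceed N − C.

```python
import numpy as np

def within_class_scatter(Z, labels):
    """Within class scatter matrix for training vectors stored as rows of Z."""
    d = Z.shape[1]
    S_W = np.zeros((d, d))
    for c in np.unique(labels):
        A = Z[labels == c] - Z[labels == c].mean(axis=0)
        S_W += A.T @ A
    return S_W / len(Z)

C, L, d = 5, 2, 100            # subjects, samples per subject, input dimension
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(C), L)
Z = rng.normal(size=(C * L, d))
S_W = within_class_scatter(Z, labels)
rank = np.linalg.matrix_rank(S_W)
```

Since the rank is far below d = 100, the matrix is singular and cannot be inverted directly.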

3.4.2 Implication of Color Inputs

For inputs with multiple spectral planes, the dimensionality of the column vector is

increased by the number of spectral planes, thus making the estimation of SW more ill

posed. For example, if the zij are gray scale and have a dimensionality of d = 150 × 130 = 19500 and Ci = 2 for all i, the small sample size problem becomes more prominent if the zij are color, as d would be tripled to 58500 (without sub sampling), while the number of training samples would remain the same. Lowering of the number of samples per class


leads to biased estimates of eigen values, i.e., the largest ones are biased high while the

small ones are biased very low.

Sub sampling of chromatic planes using the standard ratios is not perceptually visible

to the human eye since humans see color with much less spatial resolution than intensity

[39], however its impact on the FR system is not totally evident. Chromatic sub sampling

has 2 major implications,

• Removal of bytes from the chromatic planes would mean removal of input information to the FR system. It is not known whether loss of chromatic information

would degrade the FR system, i.e., whether YCbCr 4:4:4 would lead to a better

FR performance than YCbCr 4:2:2.

• Sub sampling would also lead to a reduced dimension of xi as opposed to no sampling. This would have an implication in supervised learning systems trained with

LDA when the number of samples per subject is very low (≈ 2 − 3) as a reduced

input dimension might lead to a less ill posed within class scatter matrix, reducing

the small sample size problem. For example, the dimension of the column vector

for a YCbCr 4:2:0 is approximately half that of a YCbCr 4:4:4 input.
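As a worked check of the dimensions involved, the input sizes for a 150 × 130 face are computed below. The scaling factor of 0.5 for the chromatic planes of YCbCr 4:2:2 follows the usual convention of halving only the horizontal chroma resolution; it is an assumption here, since only the 4:2:0 case is spelled out in Section 3.2.

```latex
\begin{align*}
D_Y       &= 150 \times 130 = 19500 \\
d_{4:4:4} &= 19500 + 2\,(1 \times 19500)    = 58500 \\
d_{4:2:2} &= 19500 + 2\,(0.5 \times 19500)  = 39000 \\
d_{4:2:0} &= 19500 + 2\,(0.25 \times 19500) = 29250
\end{align*}
```

The 4:2:0 input is therefore half the size of the 4:4:4 input, while still being 1.5 times the size of the gray scale input.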

It is therefore interesting to examine the effect of the extra chromatic spectral planes

in supervised FR systems with a small number of samples per subject for training (L ≈

2−3). Intuitively, the YCbCr transformation applied to the input faces should be an optimal

trade off between the amount of chromatic information used and the dimension of the

input vector. For example, a YCbCr 4:2:0 is expected to lead to a better FR system than

a YCbCr 4:4:4 transformation in an extreme small sample size scenario. In this chapter,

the effect of chromatic inputs and spatial sampling of the chromatic planes on the FR

system are studied, along with their implications for the small sample size problem.


3.5 Methodology and Experimental Setup

For the experiments in this chapter, the FR system has been trained on the gallery set,

Z. The FR system operates in the identification mode, and the images of the probe

set, Q are to be matched against those of the gallery. A pictorial description of the FR

system is given in Figure 3.1.

[Figure 3.1: block diagram with Training and Testing stages. Block labels: Gallery Data, Probe Image, Conversion to YCbCr, Preprocessing and Construction of Column Vector, Feature Extractor, Orthogonal Feature Basis, Projection onto Feature Basis, Classification, Probe ID.]

Figure 3.1: System Description (The color space transformation is the same in both training and testing stages)

The images of the gallery and probe sets, Z and Q not only contain the face but

also contain irrelevant portions comprising the background, hair, shoulders, etc. These

images are therefore passed through a preprocessing stage where the face is isolated from

the rest of the image, and the preprocessed face is converted to a column vector for

further processing. The method used for preprocessing is explained in Appendix B. The

resolution of the images after preprocessing is fixed to 150×130, as this resolution is

commonly used in surveillance applications. The preprocessed faces are then vectorized

following the procedure detailed in Section 3.2.

The image vectors are in the RGB format and are then transformed to the YCbCr

color space. The YCbCr transformations used are the YCbCr 4:4:4, YCbCr 4:2:2 and

YCbCr 4:2:0. The YCr subspace is also tested, as the red spectral plane has been reported in past literature [17] to be more discriminative for FR purposes than the blue spectral plane.

The corresponding YCr transformations are therefore referred to as YCr 4:4:4, YCr 4:2:2

and YCr 4:2:0. The same color space/ gray scale transformation is performed in both

the training and testing stages. The gray scale transformation is used as a baseline for

comparison.

In this chapter, the effect of chromatic information on the FR system is evaluated in

good and poor imaging conditions, for both supervised and unsupervised FR systems.

The PCA feature extractor is used when the feature extractor is an unsupervised learner.

In the supervised learning case, the LDA feature extractor is utilized, however, a PCA

step is applied prior to the LDA in order to avoid the inversion of a singular SW .

Two subsets of the CMU PIE [40, 41] database are chosen for evaluation, DB1 and DB2. Database DB1 consists of images having severe illumination variations caused by varying positions of the camera flash in a room with zero background illumination. Database DB2 contains images with varying camera flash positions with uniform background illumination, which neutralizes the effect of the flash to an extent; DB2 therefore has lighter illumination variations than DB1. Faces in both databases have neutral expression and frontal pose. DB1 and DB2 contain 1496 images and 1425 images respectively

of 68 subjects. A description of the CMU PIE database along with sample images from

DB1 and DB2 is provided in Appendix A.

For evaluation, C subjects are chosen from the evaluation database (DB1/DB2) along with all corresponding images to form a database Y. Y is randomly partitioned into the training/gallery set Z and the probe set Q, such that Y = Z ∪ Q and Z ∩ Q = ∅. The partition is chosen such that Z is composed of C × L images (L images for each of the C subjects). The remaining |Y| − C × L images comprise the probe set Q, where |Y| is the cardinality of Y. Any face recognition method evaluated is first trained on Z and then evaluated on Q to produce a

Rank k Correct Recognition Rate (CRR). The performance of the system is measured by

the Rank 1 CRR. Results for the Rank 5 CRR are also provided, in order to evaluate the


contribution of chromatic information when the performance measure criterion is more

relaxed. The reported results are averaged over more than 5 runs to avoid bias.
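The evaluation protocol can be sketched as follows. The function names are hypothetical, and the Rank k CRR is computed under a common definition that is assumed here: a probe counts as correct when its true subject is among the k distinct gallery subjects with the highest similarity.

```python
import numpy as np

def split_gallery_probe(labels, L, rng):
    """Pick L random images per subject for the gallery Z; the rest form the probe set Q."""
    gallery = []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        gallery.extend(rng.choice(idx, size=L, replace=False))
    gallery = np.sort(np.array(gallery))
    probe = np.setdiff1d(np.arange(len(labels)), gallery)
    return gallery, probe

def rank_k_crr(sim, gallery_labels, probe_labels, k):
    """sim[p, g] is the similarity of probe p to gallery image g (larger is more similar)."""
    order = np.argsort(-sim, axis=1)
    hits = 0
    for p, row in enumerate(order):
        ranked = gallery_labels[row]              # subjects by decreasing similarity
        _, first = np.unique(ranked, return_index=True)
        top_subjects = ranked[np.sort(first)][:k]  # first k distinct subjects
        hits += int(probe_labels[p] in top_subjects)
    return 100.0 * hits / len(probe_labels)

# Toy example: 2 gallery subjects with 2 images each, 2 probes.
gallery_labels = np.array([0, 0, 1, 1])
probe_labels = np.array([0, 1])
sim = np.array([[0.1, 0.2, 0.9, 0.3],
                [0.2, 0.1, 0.8, 0.7]])
rank1 = rank_k_crr(sim, gallery_labels, probe_labels, k=1)
rank2 = rank_k_crr(sim, gallery_labels, probe_labels, k=2)
```

Repeating the split with different random seeds and averaging the resulting rates corresponds to averaging over runs.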

3.6 Results

In this section, the contribution of chromatic information in FR systems is examined for

1. Easy to hard learning scenarios: varying number of subjects, C, and samples per subject, L. C is fixed to 35 or 65, and L varies from 2 to 9 samples per subject.

2. Poor and good illumination conditions, for both supervised and unsupervised learn-

ing systems.

3.6.1 Choice of Gray scale baseline and Similarity Metric

In order to evaluate the contribution of the chromatic spectral planes, all performances must be compared to a gray scale baseline. Three gray scale transformations whose inputs have the same dimensionality were evaluated on a subset of database DB1, for both supervised and unsupervised learning systems: Y from YCbCr, R from RGB, and the RGB linear combination 0.2B + 0.7G + 0.1R. The results are shown in Figure 3.2. In addition, two similarity metrics were evaluated, one based on the Euclidean distance and the other based on the inner product.

The cosine similarity metric based on the inner product is given by

$$d = \frac{u \cdot v}{|u|\,|v|} \tag{3.6}$$

where d is the distance and u and v are the pattern vectors. The normalized inner product produces the cosine metric.


[Plots omitted. Panels: "PCA feature extraction (Database: DB1, C=65)" and "LDA feature extraction (Database: DB1, C=65)". Axes: Samples/Subject versus Rank 1 CRR%. Curves: Y, R and 0.2B+0.7G+0.1R, each with the inner product metric and with the Euclidean metric.]

Figure 3.2: Comparison of gray scale transformations and similarity metrics

The Euclidean similarity metric is given by

$$d = -\sqrt{(u - v)'\,(u - v)} \tag{3.7}$$

where d is the distance and u and v are pattern vectors.
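As an illustration of Equations (3.6) and (3.7), the two metrics can be written with NumPy as below. The function names are chosen for this sketch only. With the negative sign in Equation (3.7), a larger value of d indicates a closer match under both metrics.

```python
import numpy as np


def cosine_similarity(u, v):
    """Normalized inner product of two pattern vectors, Equation (3.6)."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def euclidean_similarity(u, v):
    """Negative Euclidean distance between two pattern vectors, Equation (3.7)."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(-np.linalg.norm(u - v))
```

Note that the cosine metric is undefined for a zero vector; the sketch does not guard against that case.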

Figure 3.2 shows that all gray scale transformations lead to almost the same level of performance. Since the YCbCr chromatic transformations are used in the experiments that follow, the Y transformation is chosen over the other gray scale transformations. Also, the inner product based similarity metric performs better than the Euclidean distance based metric. The Y transformation is therefore chosen as the baseline, and the inner product based similarity metric is used for the remainder of the experiments.

3.6.2 Unsupervised Learning

In this section, the effect of various color space transformations is evaluated for both evaluation databases, DB1 and DB2, in the unsupervised FR system (trained with a PCA feature extractor).

An obvious trend in Figures 3.3, 3.4, 3.5 and 3.6 is that the overall FR performance is higher by approximately 10% for database DB2, which gives an insight into the difficulty of the FR problem when database DB1 is used. From Figures 3.3 and 3.4, a broad conclusion is that, for database DB1, the color space transformations outperform the gray scale Y transformation over all learning scenarios examined, whereas for database DB2, the gray scale Y transformation leads to a better recognition rate than the color transformations for L ≥ 3 for all values of C examined. The images in database DB1 have poor illumination conditions, so the shape cues of these images are degraded, and the chromatic planes are necessary to boost the recognition performance. On the other hand, the images in database DB2 have good illumination conditions, so the intensity plane of the images is not degraded. The chromatic planes hence do not contribute to the performance of the FR system when it is operated on database DB2.


[Plots omitted. Recovered panel title: "Performance of YCbCr transformations with PCA (Database: DB1, C=35)". Axes: Samples/Subject versus Rank 1 CRR%. Curves: Y, YCbCr 4:4:4, YCr (from YCbCr 4:4:4), YCbCr 4:2:2, YCr (from YCbCr 4:2:2), YCbCr 4:2:0, YCr (from YCbCr 4:2:0).]

Figure 3.3: Rank 1 performance of YCbCr transformations with PCA feature extractor (Unsupervised Learning), database DB1

Database DB1

The YCbCr 4:4:4 transformation leads to the best FR performance for all values of C

and L examined. Spatial sub sampling of chromatic planes leads to loss of important



information, thus leading to a deterioration in FR performance. Since PCA focuses on

object reconstruction by maximizing the total scatter, it can be concluded that, when

the shape cues are unclear, chromatic information is very important to achieve this



reconstruction as the intensity image is degraded.

Another trend noticed is that the YCbCr 4:2:2 and the YCr 4:2:2 perform better

than the YCbCr 4:2:0 and YCr 4:2:0 transformations. This trend leads to the conclusion



that the Cr plane has more discriminative information for the FR application, and loss

of information from the Cr plane leads to a larger deterioration in performance, when

compared to the Cb plane.


The contribution of chromatic information to the FR system remains approximately the same across all learning scenarios for all color transformations examined. The trends and conclusions observed are alike for both rank 1 and rank 5 performance measures: relaxing the performance criterion does not reduce the contribution of color information.

Database DB2

The best color transformations are the YCr 4:2:0 and YCr 4:2:2 across all learning scenarios examined, but the color transformations are in general outperformed by the Y transformation. Only in the hardest learning case of C=65 and L=2 do the YCr 4:2:0 and YCr 4:2:2 color transformations perform as well as the Y transformation alone.

The trends observed in Figure 3.4 suggest that the addition of color information reduces the performance of the PCA based FR system under most learning conditions. PCA is a statistical algorithm and requires training data for the creation of a discriminative low dimensional face space. The results suggest that the extra information in the Cb and Cr planes is not as useful as the information in the Y plane, and that it therefore reduces the performance of the FR system when passed as input to the PCA feature extractor. In fact, the chromatic input with the most chromatic information (YCbCr 4:4:4) leads to the worst performance.

The YCr transformations on the whole lead to better rank 1 and rank 5 performances than the YCbCr transformations. The YCr 4:4:4 transformation performs better than the YCbCr 4:2:2 transformation over all values of C and L, despite the fact that both transformations possess the same amount of chromatic information. Thus, the Cr spectral plane has better discriminative information for the FR application, which reinforces the conclusions made in past literature [17] and the trends observed with database DB1. The trends observed remain the same for both C=35 and C=65, as well as for both performance measures, rank 1 and rank 5.

In conclusion, chromatic information aids the performance of the FR system significantly in difficult illumination conditions, when the shape cues are unclear, leading to degraded intensity images. When the illumination conditions are good, the addition of chromatic bytes to the face input leads to a reduction in FR performance. The choice of the bytes of information which constitute a chromatic input is therefore very important in the design of a color FR system. The trends in the performances of the various transformations examined remain constant over a range of learning scenarios.

3.6.3 Supervised Learning

In this section, the effect of the various chromatic transformations is examined on both evaluation databases, DB1 and DB2, for an LDA based supervised FR system. Since the LDA algorithm is susceptible to the small sample size problem, an evaluation of the behavior of the FR system under the small sample size problem, and of the effect of color information on this problem, is also presented in this section.

From Figures 3.7, 3.8, 3.9 and 3.10, it is clear that, for small L, the FR system performs approximately 10% better when operated on database DB2 than when operated on database DB1. This reconfirms the difficulty of the FR problem on database DB1, as mentioned in Section 3.6.2. A general conclusion can be made on the variation of the FR system performance with respect to the number of samples per subject, L. As L increases, the performances of all the color space and gray scale transformations converge to a constant high value. This convergence occurs at a lower value of L when the FR system operates on database DB2. This can be attributed to the fact that recognition of images from database DB2 is not as hard a problem as recognition of images from database DB1, so the FR system does not require specialized inputs for the creation of a discriminative feature space when the learning scenarios are not hard. The detailed trends and conclusions for each of the evaluation databases are provided in the Database DB1 and Database DB2 subsections below.


[Plots omitted. Panels: "Performance of YCbCr transformations with LDA (Database: DB1, C=35)" and "Performance of YCbCr transformations with LDA (Database: DB1, C=65)". Axes: Samples/Subject versus Rank 1 CRR%.]

Figure 3.7: Rank 1 performance of YCbCr transformations with LDA feature extractor (Supervised Learning), database DB1

Database DB1

As with the case of the unsupervised learning scenario, chromatic information is especially

important in conditions of poor illumination. The contribution of color information is




significant for low values of L, and as L increases to 9, the FR performance of all color

space and gray scale transformations converge to a constant high value.

An important observation is that, for small L ≈ 2 − 3, the YCbCr 4:4:4 transfor-



mation leads to the worst performance and is marginally better than the gray scale Y

transformation. This trend can be attributed to the extremely large dimensionality of

a YCbCr 4:4:4 input (thrice the corresponding Y input). The increased dimensionality



causes the within class scatter matrix of the LDA learner to be ill posed, leading to a more challenging small sample size problem. In the most extreme small sample size scenario examined, L = 2, an interesting trend is noticed. When C = 35, the YCr 4:4:4 transformation leads to the best FR performance, and when C = 65 (the hardest learning scenario), the YCr 4:4:4 and the YCbCr 4:2:2 transformations lead to the best FR performance. These transformations are followed by YCr 4:2:2 and YCbCr 4:2:0. These four transformations have dimensionalities of approximately 1/2 to 2/3 that of YCbCr 4:4:4. The YCr 4:2:0 and YCbCr 4:4:4 transformations are among those which lead to the worst performance in this learning scenario. These trends suggest that chromatic inputs help the FR system in the small sample size scenario, provided that a color transformation with optimal dimensionality with respect to the FR performance is chosen. That is, although the increased dimensionality of color inputs can aggravate the small sample size problem, color inputs with optimal dimension enhance the FR system significantly even in extreme small sample size learning scenarios. The trends are similar for both rank 1 and rank 5 performances.
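The dimensionality ratios quoted above can be checked with a short calculation. The sketch below assumes the 150×130 face resolution stated for the experiments and assumes that 4:2:2 and 4:2:0 sub sampling keep one half and one quarter of the samples of each chromatic plane respectively; the function and constant names are illustrative only.

```python
from fractions import Fraction

ROWS, COLS = 150, 130  # preprocessed face resolution quoted in the thesis

# Fraction of the luma sample count kept in each chromatic plane (assumed)
CHROMA_FRACTION = {
    "4:4:4": Fraction(1),
    "4:2:2": Fraction(1, 2),
    "4:2:0": Fraction(1, 4),
}


def input_dimension(planes, subsampling="4:4:4"):
    """Length of the vectorized input for 'Y', 'YCr' or 'YCbCr'."""
    luma = ROWS * COLS
    n_chroma = {"Y": 0, "YCr": 1, "YCbCr": 2}[planes]
    return luma + int(n_chroma * luma * CHROMA_FRACTION[subsampling])


if __name__ == "__main__":
    full = input_dimension("YCbCr", "4:4:4")
    for planes, sub in [("YCr", "4:4:4"), ("YCbCr", "4:2:2"),
                        ("YCr", "4:2:2"), ("YCbCr", "4:2:0"), ("YCr", "4:2:0")]:
        dim = input_dimension(planes, sub)
        print(planes, sub, dim, Fraction(dim, full))
```

Under these assumptions, YCr 4:4:4 and YCbCr 4:2:2 both have 2/3 of the YCbCr 4:4:4 dimensionality, while YCr 4:2:2 and YCbCr 4:2:0 both have 1/2 of it.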

Database DB2

As with the unsupervised learning case, the contribution of chromatic information is not significant when the illumination conditions are good, as the shape information present in the intensity image is not degraded.

The contribution of chromatic information when the FR system operates on database DB2 is most pronounced in the hardest learning scenario examined, corresponding to C = 65 and L = 2. In this learning scenario, the YCr 4:2:2 and YCr 4:2:0 transformations offer a marginal improvement in both rank 1 and rank 5 CRRs over the gray scale Y. Incidentally, these are the color transformations with the lowest input dimensionality. This trend suggests that color helps the FR system in hard learning scenarios, even when the illumination conditions are not poor. In all the other learning scenarios, color transformations do not offer any significant improvement in performance to the FR system. As L increases, for both values of C examined, all transformations converge to a constant high value. The effect of the dimensionality of the other, higher dimensional color transformations (from YCr 4:4:4 to YCbCr 4:4:4) on the small sample size problem is very clearly seen in Figures 3.8 and 3.10. The trends observed when the FR system operates on database DB2 reinforce the theory that a color input with the best trade off between dimensionality and amount of chromatic information should be chosen, depending on the difficulty of the imaging conditions and learning scenarios. Another observation is that the YCr transformations lead to a better FR performance than the YCbCr transformations of the same dimensionality (YCr 4:4:4 versus YCbCr 4:2:2, and YCr 4:2:2 versus YCbCr 4:2:0). This leads to the conclusion that the Cr plane contains more of the discriminative information required to create the LDA feature basis than the Cb plane. This conclusion is identical to that obtained when a PCA feature extractor was applied on database DB2 in Section 3.6.2.

In conclusion, the contribution of color is most significant in severe imaging conditions and hard learning scenarios. The extra dimensionality of color inputs does have an implication on the small sample size problem encountered with an LDA learner, although if color inputs with a good dimensionality trade off are chosen, the performance of the FR system can be boosted. Another factor to be taken into consideration in the design of LDA based FR systems is the inclusion of bytes from the best spectral planes for FR purposes in the construction of the vectorized face input. Both of the above parameters are important for supervised LDA based FR system design, and depend on the severity of the illumination conditions of the face images. As the learning scenarios are relaxed, all face inputs lead to a high performance.

3.7 Conclusions

In this section, the conclusions drawn from the trends observed in the experiments are summarized, along with recommendations on the usage of chromatic information for both supervised and unsupervised FR systems for the scenarios listed in Table 3.1.

An obvious trend is that the performance of the FR system is better in the supervised learning mode for all learning scenarios and illumination conditions examined, as expected. The training was performed on the gallery set, therefore class specific projections (LDA feature basis) could provide a more discriminative feature space.

Another trend is that the overall FR performance was much higher when the FR system was operated on database DB2 than on database DB1, because database DB1 was captured under difficult imaging conditions. Chromatic information boosts the performance of FR systems under severe imaging conditions, and does not provide discriminative information to the FR system under good imaging conditions. A trend which holds true over all illumination conditions examined, for all FR systems, is that the red chrominance plane (Cr) carries more discriminative information than the blue chrominance plane (Cb) for FR purposes. Therefore, choosing the correct bytes of chromatic information to form the face input is very important.

Sub sampling of chromatic spectral planes leads to a loss of chromatic information when unsupervised feature extractors based on PCA are operated on databases with difficult imaging conditions, and thus leads to a deterioration of performance. Larger dimensional chromatic inputs help in better reconstruction and in the creation of a more discriminative feature space. On the other hand, when the illumination conditions are good, sub sampled chromatic planes lead to a better performance than chromatic planes without spatial sub sampling. This trend continues over all learning scenarios. The PCA algorithm does not suffer from the small sample size problem.

In supervised systems based on the LDA, the contribution of chromatic spectral planes is most pronounced in poor illumination conditions and hard learning scenarios. When the illumination conditions are good and the number of samples per subject is not extremely small, color does not significantly help the FR system, and when the learning scenarios become less hard, both gray scale and color inputs lead to a very high performance. This trend holds true for both low and high numbers of subjects. The extra dimension of color does have an implication on the small sample size problem to which the LDA algorithm is susceptible. The improvement offered by color transformations is notable in the small sample size scenario, when the number of samples per subject is around 2 or 3; however, a trade off must be found between using more chromatic information and keeping the dimensionality of the chromatic input reasonably low with respect to FR performance. The chosen dimensionality would depend on the imaging conditions under which the faces were captured and the hardness of the learning scenario (number of subjects under consideration). Table 3.2 summarizes the optimal color space/gray scale transformations for both imaging conditions and learning scenarios examined in the most extreme small sample size scenario examined (number of samples per subject = 2).

                                              No. of subjects
                                              35           65
Moderate Imaging Conditions (Database DB2)    Y            YCr 4:2:0, YCr 4:2:2
Severe Imaging Conditions (Database DB1)      YCr 4:4:4    YCr 4:4:4, YCbCr 4:2:2

Table 3.2: Best Color/Gray scale transformations in Extreme Small Sample Size scenario

3.8 Chapter Summary

In this chapter, the contribution of color inputs to the FR system was examined under a range of learning scenarios covering those listed in Table 3.1, under good and poor illumination conditions, for both supervised and unsupervised FR systems. It was found that color inputs significantly help the FR system under difficult learning scenarios and hard imaging conditions for both supervised and unsupervised systems.

The implication of chromatic sub sampling was examined for unsupervised FR systems under different imaging conditions. It was found that under severe imaging conditions, chromatic sub sampling could lead to a loss of important color information, and thus to a less discriminative feature basis.

Experiments were carried out to identify the implication of the extra dimension of color inputs and of the spatial sub sampling of chromatic spectral planes on the small sample size problem encountered in supervised learning systems, and the conclusions were presented.

Chapter 4

Decision Level Fusion of Spectral Planes

In this chapter, the decisions obtained by classifiers trained on decorrelated individual

spectral planes of YCbCr transformed inputs are fused to produce a final decision. The

impact of this decision fusion framework on the FR system, specifically on the small sam-

ple size problem encountered in supervised learning systems is discussed in this chapter.

The results also provide an insight into the discriminatory capabilities of the different

spectral planes of YCbCr transformed inputs.

4.1 Introduction and Objective

The integration of chromatic data into the FR system improves the performance of the FR

system, and this holds true especially in conditions of poor illumination as experimentally

concluded in Chapter 3. In appearance based FR methods, chromatic information is

integrated into the FR system by the fusion of information from individual spectral

planes. This fusion can take place at three levels,

• Raw data/ signal level fusion: This usually involves concatenation of vectorized


data from individual spectral planes forming a long vectorized chromatic input.

• Feature level fusion: This involves fusing of the feature vectors constructed from

the individual spectral planes.

• Decision level fusion: This level of fusion involves fusing of the decisions obtained

by classifiers trained on individual spectral planes.

Color spaces in which each of the spectral planes provides different and complementary information to the FR system, i.e., in which the information contained in the spectral planes is decorrelated, lead to a more efficient use of chromatic information and to better FR systems. In the case of signal level fusion, concatenation of decorrelated information from individual spectral planes, where each spectral plane offers unique discriminative information to the FR system, leads to a vectorized input with low redundancy. Similarly, in the case of decision level fusion, classifiers trained on decorrelated information form a diverse set of classifiers, which theoretically results in a better multiple classifier FR system. This trend of decorrelated color spaces leading to promising FR systems is supported by the results of past works [26, 17] and holds for all levels of fusion.

Fusion at the raw data level leads to a vectorized chromatic input of a dimension µ times that of the corresponding gray scale image, where µ is a function of the number of spectral planes, K, in the color space and of the spatial sub sampling structure of the chromatic planes. This vectorized chromatic input of increased dimension could typically aggravate the small sample size problem in supervised FR systems, as discussed in Chapter 3. This issue motivates the usage of a different level of fusion of the information in the spectral planes. In [24], M.T. Sadeghi et al. examined different levels of fusion of the information in spectral planes on inputs of the RGB color space. It was found that the most effective types of fusion were fusion at the feature level, performed by the concatenation of the low dimensional feature vectors formed from the individual spectral planes, and decision level fusion, performed by averaging the similarity scores of the classifiers trained on the individual spectral planes. Of these, fusion at the decision level is computationally the simplest method of fusion, and can avoid the passing of large dimensional inputs through the feature extraction process.

In Chapter 3, YCbCr transformed face inputs were fused at the signal level and were shown to improve the performance of FR systems, especially in poor illumination conditions and difficult learning scenarios, specifically the small sample size learning scenario. It was also concluded that the sub sampling ratio could be used as a parameter to control the trade off between the dimensionality of the face input and the amount of chromatic information used, for effective use of color inputs in small sample size scenarios under different imaging conditions. In this chapter, the idea of creating a multiple classifier system for FR purposes by fusing the decisions of classifiers trained on individual spectral planes of YCbCr transformed inputs is explored, and the impact of this framework on the small sample size learning scenario is specifically examined. Since the YCbCr color space is decorrelated, it is expected to lead to a good multiple classifier system with diverse classifiers.

In summary, the objectives/issues examined in this chapter are,

• Address the small sample size problem caused by the increased dimensionality of vectorized chromatic inputs by exploring fusion of information from spectral planes at the decision level. The effect of spatial sub sampling of chromatic spectral planes on this framework is also examined.

• Use the results of the above two investigations to determine the spectral plane which performs best under the different imaging conditions examined, as each spectral plane offers distinct information to the FR system.


In supervised learning FR systems, all color space/gray scale transformations converge to a constant high value of performance as the number of samples per subject available for training increases, as experimentally concluded in Chapter 3. However, in the extreme small sample size scenario, the performance of the system is severely affected by high dimensionality inputs and by inputs with low discriminative information. Therefore, the small sample size scenarios are of particular interest in supervised learning FR systems.

4.2 Combination Strategies

In this section, the various methods by which decisions of classifiers trained on different spectral planes are combined are explained. In this chapter, rule based fusion methods (whose fusion rules do not depend on the training data) are used for the combination of decisions. Although data dependent fusion methods are expected to lead to a better performance [42], they are not examined in this chapter, as the focus is to examine the effect of the small sample size problem on the multiple classifier fusion framework.

Let z be a probe or an unknown face input to be identified which consists of K spectral planes, such that $z = \{s_m\}_{m=1}^{K}$, where $s_m$ is the m-th spectral plane of image z. The aim of the multiple classifier system is to determine the identity of z, $\omega_j$, from among the C classes, where $\omega_j \in \{1, 2, 3, \ldots, C\}$. In the case of the experiments performed in this thesis, K = 3 for YCbCr transformed inputs.

A multiple classifier FR system can be formulated on the framework of Bayesian estimation theory [30]. According to Bayesian estimation theory,

assign z to $\omega_j$ if

$$P(\omega_j \mid z) = \max_{k=1}^{C} P(\omega_k \mid z) \tag{4.1}$$

Since z contains 3 spectral planes $s_1$, $s_2$ and $s_3$,

$$P(\omega_j \mid s_1, s_2, s_3) = \max_{k=1}^{C} P(\omega_k \mid s_1, s_2, s_3) \tag{4.2}$$

Equations (4.1) and (4.2) suggest that z is assigned to the class which has the maximum a posteriori probability, given the spectral planes $\{s_m\}_{m=1}^{3}$ of z. The estimation of this a posteriori probability, $P(\omega_j \mid z)$, would depend on the fusion rule adopted.

Sum Rule

In the sum rule, the a posteriori probability is given by

$$P(\omega_k \mid z) = P(\omega_k \mid s_1, s_2, s_3) \approx \sum_{m=1}^{3} P(\omega_k \mid s_m) \tag{4.3}$$

The probability value $P(\omega_k \mid s_m)$ lies in the interval [0, 1] and is typically the value of the similarity score between $s_m$ and the m-th spectral plane of an image of class $\omega_k$ in the gallery.

By substituting Equation (4.3) in Equation (4.2), the sum rule is given by Equation (4.4):

assign $z = \{s_m\}_{m=1}^{3}$ to class $\omega_j$ if

$$\sum_{m=1}^{3} P(\omega_j \mid s_m) = \max_{k=1}^{C} \sum_{m=1}^{3} P(\omega_k \mid s_m) \tag{4.4}$$

Max Rule

Similarly, for the max rule, the a posteriori probability is given by

$$P(\omega_k \mid z) = P(\omega_k \mid s_1, s_2, s_3) \approx \max_{m} P(\omega_k \mid s_m) \tag{4.5}$$

Substituting Equation (4.5) in Equation (4.2), the max rule of fusion is given by

$$\max_{m=1}^{3} P(\omega_j \mid s_m) = \max_{k=1}^{C} \max_{m=1}^{3} P(\omega_k \mid s_m) \tag{4.6}$$

Min Rule

Similarly, the min rule is given by Equation (4.7),

$$\min_{m=1}^{3} P(\omega_j \mid s_m) = \max_{k=1}^{C} \min_{m=1}^{3} P(\omega_k \mid s_m) \tag{4.7}$$
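The three fusion rules of Equations (4.4), (4.6) and (4.7) differ only in how the per-plane scores of each class are reduced before the maximum over classes is taken. The sketch below illustrates this with NumPy; the function name, the layout of the score array and the 0-based class indices are assumptions of the example.

```python
import numpy as np


def fuse_and_classify(scores, rule="sum"):
    """Return the index of the class selected by a rule based fusion.

    scores is a (K, C) array in which scores[m, k] plays the role of
    P(omega_k | s_m): the score of class k from the classifier trained
    on spectral plane m. Class indices are 0-based in this sketch.
    """
    scores = np.asarray(scores, dtype=float)
    reducers = {"sum": np.sum, "max": np.max, "min": np.min}
    fused = reducers[rule](scores, axis=0)  # one fused score per class
    return int(np.argmax(fused))            # maximum over the C classes


if __name__ == "__main__":
    example = [[0.60, 0.30, 0.10],   # Y classifier
               [0.20, 0.50, 0.30],   # Cb classifier
               [0.10, 0.45, 0.45]]   # Cr classifier
    for rule in ("sum", "max", "min"):
        print(rule, fuse_and_classify(example, rule))
```

In this made-up example the sum and min rules select the second class, while the max rule selects the first class because a single confident classifier dominates the decision.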


For the experiments in this chapter, the FR system is trained on the gallery set, Z. The images of the probe set, Q, are matched against those of the gallery, Z. The FR system is operated in the identification mode. Figure 4.1 presents a pictorial representation of the system used.

The images used for the experiments contain irrelevant information along with the face, e.g., hair, background, shoulders, etc. The face is isolated from these images in the preprocessing step, which follows the method explained in Appendix B. The resolution of the images after preprocessing is fixed to 150×130, as in the experiments performed in Chapter 3, since this resolution is commonly used in surveillance applications. Each of the preprocessed faces is then vectorized following the procedure detailed in Section 3.2.

The images of the faces are stored in RGB format, and are converted to the required color space in the color space transformation block. The YCbCr set of color transformations is used in the experiments, namely YCbCr 4:4:4, YCbCr 4:2:2 and YCbCr 4:2:0. Following the color space transformation step, the Y, Cb and Cr planes are isolated and passed through the remaining procedures as detailed in Figure 4.1.
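A minimal sketch of the color space transformation and plane isolation step is given below. It assumes the full-range ITU-R BT.601 conversion used by JPEG/JFIF and sub sampling by simple decimation; the exact conversion constants, the sub sampling filter and the vectorization order used in the thesis are not stated in this section, so these choices and all function names are assumptions.

```python
import numpy as np


def rgb_to_ycbcr(rgb):
    """Split an (H, W, 3) RGB image (values 0..255) into Y, Cb, Cr planes."""
    rgb = np.asarray(rgb, dtype=float)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def subsample(plane, scheme):
    """Decimate a chromatic plane (assumed: no low pass filtering)."""
    if scheme == "4:4:4":
        return plane
    if scheme == "4:2:2":          # halve the horizontal resolution
        return plane[:, ::2]
    if scheme == "4:2:0":          # halve both resolutions
        return plane[::2, ::2]
    raise ValueError(f"unknown scheme: {scheme}")


def spectral_plane_vectors(rgb, scheme="4:4:4"):
    """Return the vectorized Y, Cb and Cr inputs, one per classifier."""
    y, cb, cr = rgb_to_ycbcr(rgb)
    return y.ravel(), subsample(cb, scheme).ravel(), subsample(cr, scheme).ravel()
```

Each returned vector would then be passed to its own feature extractor, so no single classifier sees the concatenated high dimensional input.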

Since the main objective of this chapter is to examine the effect of using a multiple classifier FR system on the small sample size problem encountered in supervised FR systems, the LDA based feature extractor is used. To ease the inversion of the within class scatter matrix, SW, a PCA step is performed prior to the LDA, and the number of features retained by the LDA feature extractor is C − 1, where C is the number of subjects. The evaluation databases chosen for the experiments in this chapter are the same as those in Chapter 3, DB1 and DB2. The number of subjects, C, is however fixed to 65, and the number of samples per subject for training, L, is varied between 2 and 9. The method of creating the gallery, Z, and probe, Q, sets is also the same as the procedure described for the experiments in Chapter 3.
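The PCA step followed by LDA can be sketched as below for a single spectral plane. The number of principal components kept before the LDA is not stated in this section; the sketch assumes the common choice of N − C components (N training images), which is an assumption and may differ from the thesis. The function name and array layout are also illustrative.

```python
import numpy as np


def pca_lda_basis(X, labels):
    """Return (mean, W) where W has shape (d, C - 1).

    X is an (N, d) array of vectorized training planes and labels holds
    the N subject ids. Features are computed as (x - mean) @ W.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    n, c = X.shape[0], classes.size

    mean = X.mean(axis=0)
    Xc = X - mean
    # PCA through a thin SVD; keep at most N - C components (assumed)
    # so that the within class scatter is generally invertible.
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    w_pca = vt[: n - c].T
    P = Xc @ w_pca

    m = P.shape[1]
    s_w = np.zeros((m, m))
    s_b = np.zeros((m, m))
    for k in classes:
        Pk = P[labels == k]
        mu_k = Pk.mean(axis=0)
        s_w += (Pk - mu_k).T @ (Pk - mu_k)
        s_b += Pk.shape[0] * np.outer(mu_k, mu_k)  # global mean of P is zero

    # LDA directions: leading eigenvectors of inv(S_W) S_B
    evals, evecs = np.linalg.eig(np.linalg.solve(s_w, s_b))
    order = np.argsort(-evals.real)[: c - 1]
    return mean, w_pca @ evecs[:, order].real
```

In the decision level fusion system, one such basis would be learned per spectral plane (WY, WCb and WCr in Figure 4.1).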

The decisions of the classifiers trained on the Y, Cb and Cr spectral planes are fused in an aggregation step, as depicted in Figure 4.1. The aggregation methods are the sum rule, the max rule and the min rule. In the next section, the three aggregation rules are compared and the best performing one(s) are chosen for the remainder of the experiments. The normalized inner product is used as the similarity metric, as the experiments in Section 3.6 of Chapter 3 showed that it leads to a better performance than the Euclidean distance based metric. The FR system is first trained on Z and evaluated on Q to produce a Rank k Correct Recognition Rate (CRR). For our experiments, k = 1, 5. The results are averaged over more than 5 runs to avoid bias. Each run is performed on a random gallery-probe partition.
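One way to compute a Rank k CRR from a matrix of similarity scores is sketched below. It assumes that each subject is scored by its best matching gallery image and that a probe counts as correct when its true subject is among the k highest scoring subjects; this is a common definition, but the exact ranking procedure is not spelled out here, so it is an assumption, as is the function name.

```python
import numpy as np


def rank_k_crr(similarity, gallery_labels, probe_labels, k=1):
    """Rank k Correct Recognition Rate in percent.

    similarity[i, j] is the score between probe i and gallery image j,
    with larger values meaning a closer match.
    """
    similarity = np.asarray(similarity, dtype=float)
    gallery_labels = np.asarray(gallery_labels)
    probe_labels = np.asarray(probe_labels)
    subjects = np.unique(gallery_labels)

    # Best score of every subject for every probe: (n_probe, n_subjects)
    per_subject = np.stack(
        [similarity[:, gallery_labels == s].max(axis=1) for s in subjects],
        axis=1,
    )
    top_k = subjects[np.argsort(-per_subject, axis=1)[:, :k]]
    hits = (top_k == probe_labels[:, None]).any(axis=1)
    return 100.0 * float(hits.mean())
```

Averaging the returned value over several random gallery-probe partitions gives the figures reported per learning scenario.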

[Diagram omitted. Training: the gallery set passes through the YCbCr color space transformation; the Y, Cb and Cr inputs are each passed through Linear Discriminant Analysis to give the bases WY, WCb and WCr. Testing: the probe image passes through the YCbCr color space transformation; its Y, Cb and Cr planes are projected onto WY, WCb and WCr respectively, followed by similarity computation against the gallery set, decision fusion/aggregation, and classification to give the probe ID.]

Figure 4.1: System Diagram: Decision Level Fusion


4.4 Results

In this section, the effect of decision level fusion is specifically examined in a range of

small sample size learning scenarios. Experiments are performed on both images with

severe illumination conditions, database DB1 and moderate imaging conditions, Database

DB2. The number of subjects is fixed to C = 65 as mentioned and the samples per

subject available for training, L ∈ {2, 3, 4, 6, 9}. The discriminatory information present

in individual spectral planes of the YCbCr transform is also studied for both evaluation

databases.

In order to examine the improvement achieved by decision level fusion, a new performance measure, β∗, has been introduced. This signifies the best improvement obtained by a decision level fusion over the raw data level fusion. A negative value of β∗ signifies an improvement. The rank 1 and rank 5 results for evaluations on both database DB1 and database DB2 are presented in Tables 4.1-4.12. β∗444, β∗422 and β∗420 signify the best improvements obtained by a decision level fusion for the YCbCr 4:4:4, YCbCr 4:2:2 and YCbCr 4:2:0 transformations respectively. The first column in all tables consists of the performances with the gray scale baseline, Y.
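As an illustration of the measure, the sketch below computes β∗ as the signal level CRR minus the best decision level CRR. This formula is not stated explicitly in the text; it is an assumption inferred from the tabulated values and the sign convention above, and the function name is hypothetical. Since the tabulated CRRs are averages over several runs, small rounding differences against the printed β∗ values are possible.

```python
def beta_star(signal_level_crr, decision_level_crrs):
    """Signal level CRR minus the best decision level CRR (negative: decision fusion helps)."""
    return signal_level_crr - max(decision_level_crrs)

# Row L = 3 of Table 4.1: signal level 88.82, max rule 89.91, sum rule 90.96.
print(round(beta_star(88.82, [89.91, 90.96]), 2))  # -2.14
```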

4.4.1 Choice of Aggregation Rule

In order to evaluate the contribution of fusion or aggregation at a decision level, a good aggregation method should be chosen. Three aggregation methods, the sum rule, the max rule and the min rule, are evaluated on images from both evaluation databases. The color space transformation used was YCbCr 4:4:4, and the aggregation methods were evaluated using the Rank 1 CRR performance measure. All methods of decision level fusion were compared against the raw data level fusion of the Y, Cb, Cr spectral planes. The graphs in Figure 4.2 show that the sum rule leads to the best performance when database DB1 is used, while the max rule leads to the best performance when database


[Figure 4.2: Comparison of Aggregation Rules on Databases: DB1 and DB2. Two panels, "Comparison of Aggregation Methods (Database: DB1)" and "Comparison of Aggregation Methods (Database: DB2)", plot Rank 1 CRR % against Samples/Subject for YCbCr (min rule), YCbCr (max rule), YCbCr (sum rule) and YCbCr (raw data fusion).]

DB2 is used. The min rule of fusion does not lead to as good an FR performance as the raw data level fusion or the other methods of decision aggregation evaluated. This leads to the conclusion that more optimistic decision rules lead to better FR performances.

In the case of the min rule, the decision combiner reports a classification error if even one of the component classifiers (trained on one of the spectral planes) reports a misclassification or a low a posteriori probability for the correct class. The most optimistic rules, the sum and the max rule, therefore lead to the best FR performance, as supported by past literature on classifier combination [30]. The sum rule and max rule decision aggregators are therefore used for the remainder of the experiments.
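The behaviour of the three rules can be illustrated with a small sketch. The function name and the score values are hypothetical; the sketch assumes each plane's classifier outputs one support value per class on a comparable scale (for example a normalized similarity).

```python
import numpy as np

def fuse_decisions(scores, rule="sum"):
    """Combine per-classifier class scores (n_classifiers x n_classes) and pick a class."""
    rules = {"sum": np.sum, "max": np.max, "min": np.min}
    fused = rules[rule](scores, axis=0)   # aggregate across classifiers, per class
    return int(np.argmax(fused)), fused

# Hypothetical scores for three classes from the Y, Cb and Cr classifiers.
scores = np.array([
    [0.7, 0.2, 0.1],   # Y  classifier favours class 0
    [0.1, 0.5, 0.4],   # Cb classifier gives class 0 a low score
    [0.5, 0.4, 0.1],   # Cr classifier favours class 0
])
for rule in ("sum", "max", "min"):
    print(rule, fuse_decisions(scores, rule)[0])
```

With these values the sum and max rules select class 0, while the min rule selects class 1, because a single low score from the Cb classifier is enough to veto class 0.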

Another trend observed is that the best improvement offered by decision fusion is in

the extreme small sample size scenario examined, L = 2. As the value of L increases, this

contribution also reduces, and all fusion methods converge to a constant high performance

for large L. This aspect will be examined in the subsequent sub sections.

4.4.2 FR Performance: Poor Illumination conditions

Similar to the experimental results obtained in Chapter 3, the experimental results in Tables 4.1-4.3 suggest that under all learning scenarios examined, images with chromatic information lead to a better FR performance than pure gray scale images, irrespective of the method and level of fusion used. The corresponding rank 5 results are presented in Tables 4.4-4.6. In poor illumination conditions, the shape cues are unclear in the intensity image; chromatic information therefore significantly helps in boosting the performance of the FR system.

Effect of decision level fusion in small sample size scenarios

From Tables 4.1-4.3, the values of β∗444, β∗422 and β∗420 are negative for all values of L, which indicates that fusion of classifiers on a decision level boosts the performance of the FR system. As expected, the values of |β∗| for all YCbCr transformations are highest in the most extreme case of small sample size learning examined (L = 2), and reduce almost monotonically as L increases. This is because aggregating information from the different spectral planes on a decision level avoids passing a higher dimensional input vector consisting of color information through the feature extraction process. This reduces the number of parameters to be estimated in the within class scatter matrix, SW, and leads to a less severe small sample size problem.

Table 4.1: Rank 1 CRR in % (YCbCr 4:4:4, Database DB1)

L    Y       Cr      Cb      YCbCr (signal level)   YCbCr (max)   YCbCr (sum)   β∗444
2    75.53   80.56   71.09   78.43                  80.84         83.25         -4.83
3    84.91   89.68   82.01   88.82                  89.91         90.96         -2.14
4    91.02   93.62   86.83   92.99                  93.57         94.66         -1.67
6    97.03   97.51   94.28   96.85                  97.92         98.67         -1.82
9    99.39   99.26   97.63   99.56                  99.58         99.74         -0.19

Table 4.2: Rank 1 CRR in % (YCbCr 4:2:2, Database DB1)

L    Y       Cr      Cb      YCbCr (signal level)   YCbCr (max)   YCbCr (sum)   β∗422
2    75.53   81.24   70.73   79.97                  81.49         83.52         -3.55
3    84.92   89.24   82.20   88.44                  89.81         90.94         -2.5
4    91.02   93.92   88.08   93.57                  93.91         95.09         -1.52
6    97.02   98.08   93.36   97.13                  98            98.59         -1.46
9    99.39   99.46   97.98   99.49                  99.58         99.81         -0.32

Table 4.3: Rank 1 CRR in % (YCbCr 4:2:0, Database DB1)

L    Y       Cr      Cb      YCbCr (signal level)   YCbCr (max)   YCbCr (sum)   β∗420
2    75.53   81.63   69.18   78.69                  81            83.50         -4.80
3    84.91   88.68   81      87.37                  88.87         90.56         -3.18
4    91.02   93.37   88.05   92.94                  94.07         95.43         -2.49
6    97.03   97.03   93.88   97.41                  97.54         98.31         -0.90
9    99.39   99.36   97.34   99.29                  99.68         99.71         -0.42

The values of |β∗422| are slightly lower than those of |β∗444| and |β∗420| in the extreme small sample size learning scenarios. This suggests that the improvement offered by fusing on a decision level is slightly lower for YCbCr 4:2:2 transformed inputs. Experimental results in Chapter 3 suggest that in conditions of severe illumination, a transformation with the right trade off between the amount of chromatic information contained and the input dimensionality should be used. Among the YCbCr transformations, this optimal trade off was obtained by the YCbCr 4:2:2. When the spectral planes are fused at the signal level, the YCbCr 4:4:4 transformation has a very high dimensionality, while the YCbCr 4:2:0 transformation carries less chromatic information. The improvement offered by fusing on a decision level is therefore lower for YCbCr 4:2:2 than for YCbCr 4:4:4 and YCbCr 4:2:0 transformed inputs, as its raw data fusion performs better.

Table 4.4: Rank 5 CRR in % (YCbCr 4:4:4, Database DB1)

L    Y       Cr      Cb      YCbCr (signal level)   YCbCr (max)   YCbCr (sum)   β∗444
2    80.64   86.09   77.41   83.88                  85.89         87.61         -3.73
3    87.05   91.84   84.81   90.77                  91.99         92.35         -1.58
4    92.71   94.98   89.66   95.05                  95.14         95.93         -0.88
6    97.03   97.51   94.28   96.85                  97.92         98.67         -1.82
9    99.39   99.26   97.63   99.55                  99.58         99.74         -0.19

Table 4.5: Rank 5 CRR in % (YCbCr 4:2:2, Database DB1)

L    Y       Cr      Cb      YCbCr (signal level)   YCbCr (max)   YCbCr (sum)   β∗422
2    80.64   86.7    77.6    85.32                  86.68         88.06         -2.74
3    87.05   91.65   85.19   90.32                  91.6          92.61         -2.29
4    92.71   95.38   90.7    94.8                   95.52         96.31         -1.52
6    97.03   98.08   93.36   97.13                  98            98.59         -1.46
9    99.39   99.46   97.98   99.49                  99.58         99.81         -0.32

Table 4.6: Rank 5 CRR in % (YCbCr 4:2:0, Database DB1)

L    Y       Cr      Cb      YCbCr (signal level)   YCbCr (max)   YCbCr (sum)   β∗420
2    80.64   86.8    75.83   83.68                  86.03         87.63         -3.95
3    87.05   90.88   84.19   89.47                  90.64         92.31         -2.84
4    92.71   95.02   90.41   94.32                  95.5          96.45         -2.13
6    97.03   97.03   93.87   97.41                  97.54         98.31         -0.9
9    99.39   99.36   97.34   99.29                  99.68         99.71         -0.42

As mentioned earlier, the sum fusion rule leads to a better FR performance for images in this evaluation database. However, no trend is observed in the performance of decision level fusion with respect to the sampling structure of the chromatic input.
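The dimensionality side of the trade off discussed above can be made concrete with a short calculation. The sketch assumes that signal level fusion concatenates the Y, Cb and Cr planes into one vector, and it uses a hypothetical 64 x 64 image size; neither detail is taken from the text. The chroma fractions are the standard ones for the three sampling structures.

```python
def input_dimensionality(height, width, sampling):
    """Length of the concatenated Y, Cb, Cr vector for a height x width image."""
    luma = height * width
    # Fraction of chroma samples kept per chroma plane for each sampling structure.
    chroma_fraction = {"4:4:4": 1.0, "4:2:2": 0.5, "4:2:0": 0.25}[sampling]
    return int(luma + 2 * chroma_fraction * luma)

def scatter_parameters(d):
    """Number of free entries in a symmetric d x d scatter matrix."""
    return d * (d + 1) // 2

for sampling in ("4:4:4", "4:2:2", "4:2:0"):
    d = input_dimensionality(64, 64, sampling)   # hypothetical image size
    print(sampling, d, scatter_parameters(d))
```

A single spectral plane of the same image has 4096 entries, so a classifier trained on one plane estimates a far smaller scatter matrix than one trained on the concatenated input.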


Discriminative capacity of Individual Spectral Planes

From Tables 4.1-4.3, a trend observed is that the red chrominance plane, Cr, contains the most discriminative information, while the blue chrominance plane, Cb, contains the least. This is in agreement with previous works and experiments on color FR. The information contained in the intensity plane, Y, is not sufficient for good FR performance.

As the evaluation method is relaxed, i.e., when the rank 5 performance measure is

used (Tables 4.4-4.6), the FR system performances are higher as expected. However, the

improvement in FR performance offered by a decision level aggregation, i.e., the values

for |β∗| for all YCbCr transformations are lower than the corresponding rank 1 results in

the small sample size learning scenarios.

4.4.3 FR Performance: Good Illumination conditions

In this section, the discriminatory ability of the various spectral planes and the effect of decision level aggregation are examined for images captured in moderate illumination conditions. The experimental rank 1 results are provided in Tables 4.7-4.9. A general conclusion is that in conditions of moderate / light illumination variations, when the shape cues of the image are clear, color information does not improve the performance of the FR system. This is similar to the conclusions of the experiments in Chapter 3.

Effect of decision level fusion in small sample size scenarios

From Tables 4.8-4.9, it is observed that fusion of spectral planes on any level does not lead to a boost in performance compared to using gray scale information alone. This can be attributed to the low discriminative ability of the Cb plane and to the clear intensity images. The conclusions of Chapter 3 suggest that in the extreme small sample size learning scenario, which corresponds to L = 2, the YCr 4:2:0 transformation (when fused on the signal level) leads to a better performance than gray scale images alone. Therefore, a fusion of the Y and Cr planes on a decision level could boost the FR performance over the grayscale Y transformation. However, the contribution of fusion of spectral planes on a decision level over the signal level fusion can still be discussed from the trends observed in Tables 4.8-4.9.

Table 4.7: Rank 1 CRR in % (YCbCr 4:4:4, Database DB2)

L    Y       Cr      Cb      YCbCr (signal level)   YCbCr (max)   YCbCr (sum)   β∗444
2    95.23   81.87   62.81   89.1                   94.17         91.63         -5.08
3    98.56   91.32   76.23   96.6                   98.08         96.52         -1.48
4    99.74   95.41   84.08   98.53                  99.29         98.63         -0.77
6    99.98   99.11   90.55   99.62                  99.98         99.54         -0.36
9    100     99.47   96.27   99.94                  100           99.94         -0.06

Table 4.8: Rank 1 CRR in % (YCbCr 4:2:2, Database DB2)

L    Y       Cr      Cb      YCbCr (signal level)   YCbCr (max)   YCbCr (sum)   β∗422
2    95.23   80.6    61.69   92.42                  93.56         90.81         -1.13
3    98.56   90.69   76.52   97.43                  98.06         96.05         -0.63
4    99.74   95.77   83.1    99.29                  99.57         98.7          -0.28
6    99.98   98.27   91.25   99.81                  99.95         99.78         -0.14
9    100     99.79   95.86   100                    100           100           0

Table 4.9: Rank 1 CRR in % (YCbCr 4:2:0, Database DB2)

L    Y       Cr      Cb      YCbCr (signal level)   YCbCr (max)   YCbCr (sum)   β∗420
2    95.23   80.94   60.31   94.37                  93.69         90.5          0.67
3    98.56   90.89   76.42   98.32                  98.48         96.48         -0.16
4    99.74   94.59   82.29   99.53                  99.32         98.61         0.21
6    99.98   98.1    90.19   99.93                  99.93         99.74         0
9    100     99.38   95.53   99.97                  100           99.97         -0.03

The values of |β∗444|, |β∗422| and |β∗420| are highest for the small sample size scenario and decrease almost monotonically as L increases for all three YCbCr transformations examined, which suggests that the contribution of decision level aggregation is greatest in the extreme small sample size scenarios. This is similar to the trends observed in the case where evaluation database DB1 was used. When a raw data level fusion is performed, the Rank 1 CRR for L = 2 is lowest for the YCbCr 4:4:4 transformation and increases with YCbCr 4:2:2 and YCbCr 4:2:0. The contribution of fusion, |β∗|, is highest for YCbCr 4:4:4 (≈ 5.07) and reduces as more chromatic subsampling is performed. This can be attributed to the fact that the YCbCr 4:4:4 has the highest dimensionality and is most severely affected by the small sample size problem when a signal level fusion is performed. The value of |β∗420| suggests that the performance of the FR system obtained by decision level aggregation is almost the same as that obtained by a raw data level fusion.

Table 4.10: Rank 5 CRR in % (YCbCr 4:4:4, Database DB2)

L    Y       Cr      Cb      YCbCr (signal level)   YCbCr (max)   YCbCr (sum)   β∗444
2    97.35   88.19   71.31   93.96                  96.79         95.38         -2.83
3    99.13   93.54   80.28   97.75                  98.83         97.55         -1.07
4    99.89   96.94   86.52   98.93                  99.57         99.27         -0.64
6    99.98   99.11   90.55   99.62                  99.98         99.54         -0.36
9    100     99.47   96.27   99.94                  100           99.94         -0.06

Table 4.11: Rank 5 CRR in % (YCbCr 4:2:2, Database DB2)

L    Y       Cr      Cb      YCbCr (signal level)   YCbCr (max)   YCbCr (sum)   β∗422
2    97.35   86.56   69.69   95.46                  96.62         94.58         -1.15
3    99.13   92.91   80.36   98.26                  98.83         97.35         -0.57
4    99.89   97.03   86.2    99.64                  99.76         99.34         -0.13
6    99.98   98.27   91.25   99.81                  99.95         99.78         -0.14
9    100     99.79   95.86   100                    100           100           0

Table 4.12: Rank 5 CRR in % (YCbCr 4:2:0, Database DB2)

L    Y       Cr      Cb      YCbCr (signal level)   YCbCr (max)   YCbCr (sum)   β∗420
2    97.35   87.67   68.23   97.37                  96.96         94.4          0.4
3    99.13   93.26   80.14   98.97                  99.07         97.41         -0.1
4    99.89   96.15   85.68   99.7                   99.64         99.29         0.06
6    99.98   98.1    90.19   99.93                  99.93         99.74         0
9    100     99.38   95.53   99.97                  100           99.97         -0.03

The max rule of decision aggregation leads to the best aggregation performance

under most cases. Similar to the previous results on database DB1, no trend is observed

in the performance of decision level fusion with respect to the sampling structure of the

chromatic input.


Discriminative capacity of Individual Spectral Planes

As mentioned earlier, when the imaging conditions are optimal, the gray scale image (Y

transformation) contains sufficient discriminative information for a well performing FR

system. However, the Cb plane leads to a poor performance and should be avoided when

fusing chromatic information.

When the performance measure is relaxed, i.e., when the rank 5 performance measure

is used (Tables 4.10-4.12), the CRRs obtained are higher. Also, the values of |β∗| are lower

than the corresponding rank 1 results.

4.5 Conclusion

In this section, the conclusions of the trends observed in this chapter are summarized.

Chromatic information in general improves the performance of the FR system in condi-

tions of severe illumination. When the imaging conditions are optimal, the shape cues

present in the intensity image provide enough discriminatory information for the FR

system, and the system's performance is not improved by the integration of chromatic

spectral planes. This is in agreement with the conclusions in Chapter 3.

The Cb plane contains poor discriminative information when used with supervised

learning systems, especially in moderate imaging conditions. A good aggregation of in-

formation from the Y and Cr planes is therefore expected to boost the FR performance

over using Y alone when the imaging conditions are moderate. When the imaging con-

ditions are severe, the Cr plane offers the most significant discriminative information,

while the Y plane leads to the best individual performance when the imaging conditions

are optimal.

An important conclusion is that fusion of spectral planes on a decision level leads to

a better use of chromatic information in conditions of small sample size learning. This

is because a decision level aggregation helps avoid the passing of a larger dimension


input through the feature extraction process. This holds true for all imaging conditions

examined. As the small sample size condition is relaxed, i.e., more samples per subject

are available for training, the contribution of decision level aggregation over a signal level

fusion is reduced.

4.6 Chapter Summary

In this chapter, the effect of a decision level fusion of information from individual spectral

planes of YCbCr inputs was examined over a range of small sample size learning scenarios.

Small sample size scenarios are of particular interest in supervised FR systems. As the

small sample size condition is relaxed, all gray scale and color space transformations

fused over all levels converge to a high FR performance; however, there is still scope for

an improvement of FR performance in the small sample size scenarios, where there is a

lack of training data available.

Experimental results suggest that a decision level aggregation of classifiers trained on

individual spectral planes boosts the performance of the FR system over a signal level

fusion of information, and this improvement in performance is most significant in the small

sample size learning scenarios. The discriminative capability of the individual spectral

planes of the YCbCr transformation was also examined, and the results were presented

in this chapter. It was concluded that the Cb spectral plane does not significantly help

the FR performance, compared to the intensity Y and Cr spectral planes.

Chapter 5

Color Face Recognition in

Ada-Boost framework

In this chapter, intensity and chromatic information are used as inputs to the FR system

to create complementary classifiers to be combined in a decision fusion framework, in or-

der to address complexities in face patterns and severe imaging conditions. Complexities

in face patterns manifest in the form of expression and pose variations and severe imag-

ing conditions take the form of severe illumination / lighting conditions, poor resolution,

etc. These conditions cannot be easily learned by linear feature extractors. Complemen-

tary classifiers are created by ensemble learning using the adaptive boosting (ada-boost)

framework.

5.1 Introduction

Features based on color information lead to a better recognition performance in FR sys-

tems as confirmed by the experiments in the previous chapters and past works [15, 16,

17, 24, 25, 26]. The results in Chapters 3 and 4 suggest that color makes object

recognition more robust to imaging conditions such as illumination, and a face space created

with a supervised learning method based on the LDA criterion, trained on intensity and


chromatic information leads to a good FR performance in poor illumination conditions.

This enhances the performance of the FR system by combining the advantages of both

chromatic features and supervised learning.

However, linear feature extractors based on the LDA criterion cannot effectively learn complexities in face patterns which occur when face patterns are subject to pose and expression variations, and therefore lead to a deterioration in FR performance under these conditions [11, 36]. Variations due to factors like illumination, pose and expression could cause larger intra subject variations in faces than variations due to a change in identity, and hence are crucial to address. In order to take complexities in face patterns into account while training the system, linear methods of learning like the LDA should be replaced either by globally nonlinear models, like those based on kernel discriminant analysis [36], or by a linear combination of locally linear models (ensemble based models). Ensemble based models built from a linear combination of linear and complementary classifiers are advantageous over kernel based analysis in dealing with complexities, as they are less likely to over fit and have fewer parameters to optimize than their kernel counterparts [11, 19].

Previous works [25] have created multiple classifier FR systems based on LDA learners

trained on chromatic information, as discussed in Chapter 2. In [25], the concept that

different color spaces offer different information about the faces to the FR system is

utilized and the classifier experts trained on the different color spaces are combined in

a decision fusion framework. The experts are dynamically chosen using a confidence

based gating scheme, and depend on the probe image to be identified. This approach to classifier combination combines the information contained in the relevant color spaces, thus addressing various imaging conditions. However, it cannot address the simultaneous variation of pose, expression and illumination in face patterns, which is a very realistic situation, especially in surveillance applications, where pictures of subjects may not be captured in controlled conditions.

In this chapter, a multiple classifier FR system trained on chromatic information is


built using an ensemble learning framework. The learning framework used overcomes

the limitations of the classical LDA learner and previous multiple classifier FR systems

trained on chromatic information and addresses both complexities in face patterns and

illumination conditions, by creating complementary classifiers using the ada-boost tech-

nique.

5.2 Motivation: Ada-Boost Learning

This section presents the motivation behind the choice of the ada-boost framework, along with its details. The learning framework used in this chapter aims to address the complexities in face patterns and illumination conditions by combining two sets of advantages: those of chromatic features coupled with LDA based learning, and those of ensemble learning in addressing complexities in face patterns. Ensemble learning methods such as boosting and bagging are reported to lead to better performances in pattern recognition systems than individual learners, as they learn the various patterns in the training data and can generalize across different kinds of images in the testing set [11, 42, 43].

LDA based methods are susceptible to the small sample size problem frequently encountered in high dimensional pattern recognition tasks such as FR. When the faces are multi spectral or color, the extra dimensionality of color inputs poses a more challenging small sample size problem, which was explained in Chapter 3. A direct effect of the small sample size problem is the singularity of SW, which makes its inversion difficult. A variant of the LDA called the direct LDA was proposed by H. Yu et al. in [44], which eases the inversion of the within class scatter matrix, SW, thus making it suitable for application to high dimensional data. The direct LDA, however, does not totally solve the small sample size problem, as the estimation of SW still remains ill posed. J. Lu et al. have proposed a method, Ada-boost.M2, based on ada-boost [11] to linearly combine a set of linear models into an ensemble model. Each linear model consisted of a feature extractor trained using a direct LDA learner [10] and a linear classifier. This method was tested on a subset of gray scale images from the FERET database [28] having pose (up to 22.5 degrees) and expression variations, and proved to be effective in addressing the complexities caused by these variations. Ada-Boost.M2 hence combines the advantages of the adaptive boosting framework in addressing complexities and of the direct LDA in addressing the problem of degenerate scatter matrices. It however involves a trade off between the weakness of individual learners and the low generalization error achieved on the training set, in order to create the most effective complementary classifiers.

In this chapter, the conclusions of the previous chapters are extended by using chro-

matic information as an input to the ada-boost.M2, so that the ensemble of LDA based

learners can effectively learn the difficult illumination conditions and complexities in face

patterns which take the shape of variations in pose and viewpoint. Images of different

color spaces (RGB and YCbCr spaces) are used as inputs to the ada-boost.M2, and this

combination is tested on faces subjected to both horizontal and vertical pose variations

up to a maximum of 45 degrees and severe illumination conditions. It is found that

in certain cases this combination utilizes both the advantages of chromatic information

in dealing with images with illumination variations and the ada-boost.M2 in addressing

complexities caused by pose and viewpoints. However, a challenge is that color inputs

have three spectral planes and hence the direct LDA learner is posed with a more severe

small sample size problem.

This learning framework is examined in various small sample size scenarios and the

impact of both chromatic spectral planes and boosting on the LDA learner is discussed

in detail in Section 5.6.


5.3 Background

In this section, an explanation of the concept of ada-boost is presented along with a

description of the ada-boost.M2 framework introduced in [11] which are used for the

experiments in this chapter.

The basic aim of boosting is to improve the performance of a learning algorithm [37].

It involves creating a weak learner whose error on the training set is only slightly better than

random guessing, and combining an ensemble of these learners in a decision fusion framework to

produce a strong ensemble learner whose combined decision rule outperforms each of the

individual learners and has a relatively low classification error on the training set. A

classifier/ learner is said to be weak or unstable if small changes in the training data

lead to significantly different classifiers/ learners and large changes in the accuracy. The

individual learners should be diverse and have a low mutual dependence, and are trained

on subsets of the training data in such a way that they offer complementary information

to the FR system.

The most popular variation of boosting is ada-boost. Ada-boost involves the addition of a new weak learner to the ensemble in every iteration until the combined ensemble learner achieves a low error on the training set. In order to design a good ada-boost system,

• There should be an interaction between the booster and the individual learners. The

learner in each subsequent iteration is trained on those training samples which were

hardest to classify in the present iteration. This is usually performed by assigning

a weight to each training sample at the end of every iteration, which determines its

probability of being selected for the subsequent iteration. The ada-boost therefore

focuses on the difficult patterns. This ensures the complementarity and low mutual

dependence of the individual classifiers.

• The Boosting procedure should create weak learners/ classifiers which have a low


mutual dependence and a low generalization error on the training set. This in

theory involves a trade off, as it is hard to achieve both conditions simultaneously.

Past works [11] however indicate that boosting is generally robust to over fitting of the

training data, and can learn a wide range of patterns.
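The interaction between the booster and the learners described above can be illustrated with the sample re-weighting step of the classical binary ada-boost algorithm. This is a simplified sketch and not the ada-boost.M2 variant used in this chapter; the function name is hypothetical, and it assumes labels in {-1, +1} and a weighted error strictly between 0 and 0.5.

```python
import numpy as np

def adaboost_reweight(weights, y_true, y_pred):
    """One round of classical binary ada-boost sample re-weighting."""
    miss = y_true != y_pred
    error = float(np.sum(weights[miss]))            # weighted training error
    alpha = 0.5 * np.log((1.0 - error) / error)     # vote of this weak learner
    # Misclassified samples are scaled up, correctly classified ones scaled down.
    new_weights = weights * np.exp(np.where(miss, alpha, -alpha))
    return new_weights / new_weights.sum(), alpha

weights = np.full(5, 0.2)
y_true = np.array([1, 1, -1, -1, 1])
y_pred = np.array([1, 1, -1, -1, -1])               # only the last sample is wrong
new_weights, alpha = adaboost_reweight(weights, y_true, y_pred)
print(new_weights)
```

After the update the single misclassified sample carries half of the total weight, so the learner in the next iteration is pushed towards the pattern that was hardest to classify.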

Ada-boost.M2 has been chosen for the experiments in this chapter owing to its demon-

strated capability in addressing a large database containing complexities in face patterns

in gray scale images, as mentioned earlier. Each individual learner consists of a direct

LDA based feature extractor [10] and a linear classifier (nearest center). The direct LDA

based feature extractor and the ada-boost.M2 framework are explained in the remainder

of this section.

5.3.1 Regularized Direct LDA

The linear discriminant analysis or LDA feature extractor finds the set of orthogonal

vectors which maximize the ratio of the inter/ between class scatter matrix to the within/

intra class scatter matrix, as explained in Section 3.3. As explained earlier, the number

of samples per subject available for training is usually very small when compared to the

dimensionality of the face input used for training. The large dimensionality of face inputs

makes the estimation of the within class scatter matrix, SW an ill posed problem, and

this is referred to as the small sample size problem. This leads to a singular SW, as the matrix has a very low rank. It is hence impossible to invert SW, which makes it difficult to obtain the LDA feature basis.

The issue of inverting SW has been solved in different ways. In Chapter 3, a PCA step

was performed prior to the LDA, thus effectively reducing the dimensionality of the LDA

input from d (dimensionality of column vector of the face) to N−C, where N = C×L (N

is the number of images in the training set, C is the number of subjects being considered

and L is the number of samples per subject) [14]. However, this solution does not lead

to an optimal solution for the LDA feature matrix as part of the important within/ intra


scatter information is lost in the PCA preprocessing step.

H. Yu et al. [44] proposed a different method of finding the LDA feature basis, W, which does not involve the PCA preprocessing step. If A is the null space of the between class scatter matrix, SB, and B is the null space of SW, then according to the LDA optimality criterion, the direct LDA finds the M most significant eigen vectors in $A^C \cap B$, where $A^C$ denotes the complement of A, which maximize the ratio in Equation (3.2). This is performed by first diagonalizing SB using eigen decomposition and retaining only the most significant C − 1 eigen vectors (rank(SB) = min(N, C − 1)) to form $A^C$. SW is then projected onto this low, C − 1 dimensional space, $A^C$. $A^C \cap B$ can then be solved for by performing an eigen decomposition on the projected SW and retaining the M eigen vectors which correspond to the smallest eigen values. $A^C \cap B$ is usually a low dimensional subspace.
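The two diagonalization steps described above can be sketched in NumPy as follows. This is a minimal illustration for small inputs, not the implementation used in the experiments: the function names are hypothetical, the scatter matrices are formed explicitly (impractical for face images, where d is very large), and the whitening of SB by the inverse square root of its eigen values is an assumption taken from the usual description of the direct LDA.

```python
import numpy as np

def scatter_matrices(X, labels):
    """Between and within class scatter of the rows of X (N x d)."""
    mean_all = X.mean(axis=0)
    d = X.shape[1]
    S_b, S_w = np.zeros((d, d)), np.zeros((d, d))
    for c in np.unique(labels):
        Xc = X[labels == c]
        mean_c = Xc.mean(axis=0)
        diff = (mean_c - mean_all)[:, None]
        S_b += len(Xc) * (diff @ diff.T)
        S_w += (Xc - mean_c).T @ (Xc - mean_c)
    return S_b / len(X), S_w / len(X)

def direct_lda(X, labels, n_features, tol=1e-10):
    """Sketch of the direct LDA: diagonalize S_B first, then the projected S_W."""
    S_b, S_w = scatter_matrices(X, labels)
    # Step 1: keep the directions outside the null space of S_B (at most C - 1).
    evals_b, evecs_b = np.linalg.eigh(S_b)
    keep = evals_b > tol * evals_b.max()
    Z = evecs_b[:, keep] / np.sqrt(evals_b[keep])    # Z^T S_B Z = I
    # Step 2: project S_W and keep the directions with the smallest eigen values.
    evals_w, evecs_w = np.linalg.eigh(Z.T @ S_w @ Z)  # eigen values in ascending order
    return Z @ evecs_w[:, :n_features]
```

The returned matrix W has one column per retained feature; a sample x would be projected as W.T @ x before classification.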

In the experiments performed, the regularized direct LDA (R-LDA) [9] is used as the feature extractor in the individual learning block of the boosting framework and is based on the direct LDA. The R-LDA is a variant of the direct LDA, and uses a modified Fisher's criterion,

\[
\Psi = \arg\max_{\Psi} \frac{\left|\Psi^T S_B \Psi\right|}{\left|\eta\left(\Psi^T S_B \Psi\right) + \left(\Psi^T S_W \Psi\right)\right|} \tag{5.1}
\]

where $\Psi = [\psi_1 \psi_2 \ldots \psi_M]^T$ and η is the regularization parameter. This modified criterion

has the effect of decreasing larger eigen values and increasing smaller eigen values, thereby

counteracting the high bias involved in the estimation of eigen values. It also has the

effect of adding a minimum value to the zero eigen values, thus making SW easier to

invert. The criterion in Equation (5.1) is equivalent to the conventional Fisher’s criterion

in Equation (3.2), according to the following theorem [11]:

Theorem 1: In an n-dimensional vector space, ℜn, ∀x ∈ ℜn, let $h_1(x) = \frac{f(x)}{g(x)}$ and $h_2(x) = \frac{f(x)}{g(x) + \eta f(x)}$, where f(x) ≥ 0, g(x) > 0, 0 ≤ η ≤ 1 and f(x) + g(x) > 0. If h1(x) has a maximum (including +∞) at x0 ∈ ℜn, then h2(x) has a maximum at the same point.
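A short way to see why the theorem holds, sketched here for the case g(x) > 0 and finite h1 (this derivation is added for illustration and is not taken from [11]): dividing the numerator and denominator of h2 by g(x) expresses h2 as an increasing function of h1, so both are maximized at the same point.

```latex
\[
h_2(x) = \frac{f(x)}{g(x) + \eta f(x)}
       = \frac{f(x)/g(x)}{1 + \eta\, f(x)/g(x)}
       = \frac{h_1(x)}{1 + \eta\, h_1(x)},
\qquad
\frac{d}{dh_1}\left(\frac{h_1}{1 + \eta h_1}\right) = \frac{1}{(1 + \eta h_1)^2} > 0 .
\]
```

Since h1(x) ≥ 0 and η ≥ 0, the derivative is strictly positive, hence h2 increases whenever h1 does.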

This modified criterion reduces the bias and variance in estimating the eigenvalues and, at the same time, avoids the issue of inverting a singular SW. However, the estimation of the SW matrix is still an ill posed problem, especially when the number of samples per class is approximately 2 to 3 and the dimensionality of the samples is of the order of 10^4. In this chapter, the effect of the more severe small sample size problem created by the increased dimensionality of a multi spectral image on the boosting framework is examined in a range of small sample size learning scenarios. The regularization parameter η is therefore fixed to a particular value, η = 1, in the experiments.

5.3.2 Ada-Boost framework

The individual learner in the ada-boost.M2 consists of a R-LDA feature extractor and a

linear classifier: Nearest Center Classifier. A new learner is formed in each subsequent

iteration based on the outputs or results from the learner in the previous iteration in

the form of the updated parameters, which depend on the error in the hard to classify

samples and hard to classify subjects of the previous iteration. The classifier built at

iteration t is a Nearest Center linear classifier and is denoted by ht. The final classifier

hf is a weighted sum of all ht’s. A learner consisting of a R-LDA feature extractor and a

Nearest Center classifier is henceforth referred to as a g-Classifier. A general ada-boost

learning framework is presented in Figure 5.1.

Given a training set $Z = \{Z_i\}_{i=1}^{C}$, containing C classes with each class $Z_i = \{z_{ij}\}_{j=1}^{C_i}$ consisting of images $z_{ij}$ (where $z_{ij}$ is the column vector of the jth image of the ith class), a total of $N = \sum_{i=1}^{C} C_i$ images are present in the training set. The dimensionality of the column vectors of the images in Z is d. $C_i$ is fixed to L, ∀i.

For optimal performance of the boosting method, the individual learners of the ada-

boost.M2 should have a low mutual dependence with each other and a low generalization

error on the training set. The boosting method does not perform better over iterations

if either the individual g-Classifiers are too strong, i.e., have a high mutual dependence,

or they are too weak so as to produce a very high generalization error, as explained

earlier. The g-Classifiers will have a strong mutual dependence if the samples used to


Figure 5.1: Training the Ada-Boost Ensemble- Generic Diagram

train each of the g-Classifiers overlap. The number of samples per subject available for training each g-Classifier is therefore used as a parameter for adjusting the weakness of the g-Classifiers. The weakness of the g-Classifier is described using a quantity called the Learning Difficulty Degree (LDD), ρ, which is given by Equation (5.2),

\[
\rho = \frac{r}{C} \tag{5.2}
\]

where r is the number of samples per subject used to train each individual g-Classifier and C is the number of subjects in the entire training set, Z. The boosting method therefore involves a trade off between weak g-Classifiers and low generalization error, and this is achieved by choosing an r∗ such that the best performance is achieved with ada-boost.M2. The optimal r∗ will differ for each learning scenario and can take values in [2, L], where L is the number of samples per subject in Z.

In order to facilitate interaction between the booster and the learner, two quantities are introduced: the pairwise class discriminant distribution, At, which introduces a weighting factor in the between class scatter matrix, SB,t, and the sample distribution, Dt, which introduces a weighting factor in the within class scatter matrix, SW,t, where t is the boosting iteration. Higher values of At(p, q) and Dt(zij) indicate harder separability between two classes (p and q) and a harder to classify sample (zij), respectively. The values of At(p, q) and Dt(zij) are calculated from the mislabel distribution, Γt(zij, y), which is in turn a function of the pseudoloss at iteration t − 1, εt−1. The pseudoloss, ε, represents the training or generalization error. Equations (5.3)-(5.7) present the formulae used to calculate these values.

Let $B$ be the set of all mislabels, defined as

$$B = \{(z_{ij}, y) : z_{ij} \in Z,\ z_{ij} \in \mathbb{R}^d,\ y \in Y,\ y \neq y_{ij}\}$$

$$\epsilon_t = \frac{1}{2} \sum_{(z_{ij}, y) \in B} \Gamma_t(z_{ij}, y)\,\bigl(1 - h_t(z_{ij}, y_{ij}) + h_t(z_{ij}, y)\bigr) \tag{5.3}$$

where $\Gamma_t(z_{ij}, y)$ is the mislabel distribution defined over all elements of $B$. A higher value of $\Gamma_t$ signifies a higher probability of the misclassification $(z_{ij}, y)$, where $y \neq y_{ij}$.
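The pseudoloss of Equation (5.3) can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the thesis: the data layout (a dictionary `gamma` keyed by (sample index, wrong label) and a nested list `h` of scores) and the helper `uniform_mislabel_distribution` are assumptions made for the example, with the uniform weights 1/(N(C − 1)) following the usual AdaBoost.M2 initialization.

```python
def uniform_mislabel_distribution(labels, classes):
    """Uniform weights over the mislabel set B, i.e. 1 / (N * (C - 1))."""
    n, c = len(labels), len(classes)
    weight = 1.0 / (n * (c - 1))
    return {(s, y): weight
            for s, true_y in enumerate(labels)
            for y in classes if y != true_y}


def pseudoloss(gamma, h, labels, classes):
    """Pseudoloss of Equation (5.3).

    gamma[(s, y)] : mislabel weight of sample s for the wrong label y
    h[s][y]       : hypothesis score in [0, 1] of sample s for label y
    labels[s]     : true label of sample s
    """
    total = 0.0
    for s, true_y in enumerate(labels):
        for y in classes:
            if y == true_y:
                continue  # B contains only mislabels
            total += gamma[(s, y)] * (1.0 - h[s][true_y] + h[s][y])
    return 0.5 * total


if __name__ == "__main__":
    labels, classes = [0, 1], [0, 1]
    gamma = uniform_mislabel_distribution(labels, classes)
    print(pseudoloss(gamma, [[1.0, 0.0], [0.0, 1.0]], labels, classes))  # perfect scores
    print(pseudoloss(gamma, [[0.5, 0.5], [0.5, 0.5]], labels, classes))  # uninformative scores
```

With a normalized mislabel distribution the pseudoloss lies in [0, 1]; an uninformative hypothesis sits at 0.5.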

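The between and within class scatter matrices of the R-LDA in iteration $t$ are given by,

$$S_{B,t} = \frac{r}{N} \sum_{p=1}^{C} A_t(p, q)\,(\bar{z}_p - \bar{z}_q)(\bar{z}_p - \bar{z}_q)^T \tag{5.4}$$

$$S_{W,t} = N \sum_{i=1}^{C} \sum_{j=1}^{r} D_t(z_{ij})\,(z_{ij} - \bar{z}_i)(z_{ij} - \bar{z}_i)^T \tag{5.5}$$

where $\bar{z}_i$ denotes the mean of class $i$. $A_t$ and $D_t$, which are used in Equations 5.4 and 5.5, are described below,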


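$$A_t(p, q) = \begin{cases} \dfrac{1}{2}\left( \sum_{j:\, g_t(z_{pj}) = q} D_t(z_{pj}) + \sum_{j:\, g_t(z_{qj}) = p} D_t(z_{qj}) \right) & \text{if } p \neq q \\ 0 & \text{if } p = q \end{cases} \tag{5.6}$$

$$g_t(z) = \arg\max_{y \in Y} h_t(z, y)$$

$$D_t(z_{ij}) = \sum_{y \neq y_{ij}} \Gamma_t(z_{ij}, y) \tag{5.7}$$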

where ht ∈ [0, 1] is based on the nearest center classifier. A value of 1 indicates perfect

similarity and a value close to 0 indicates low similarity of sample zij to class y. At is

calculated using only those samples from each class which were not represented well in

the previous t − 1 g-classifiers. Higher values of At and Dt indicate harder to classify

classes and harder to classify samples, respectively.
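Equations (5.6) and (5.7) can be read as two small aggregation steps over the mislabel distribution. The sketch below assumes a hypothetical data layout (a dictionary `gamma` keyed by (sample index, wrong label), a nested list `h` of scores, and integer class labels); it is meant only to make the indexing in the two equations concrete.

```python
def sample_distribution(gamma, labels, classes):
    """D_t of Equation (5.7): total mislabel weight of each sample."""
    return [sum(gamma[(s, y)] for y in classes if y != true_y)
            for s, true_y in enumerate(labels)]


def predict(h_row, classes):
    """g_t(z) = argmax over y of h_t(z, y)."""
    return max(classes, key=lambda y: h_row[y])


def pairwise_class_distribution(d, h, labels, classes):
    """A_t(p, q) of Equation (5.6), symmetric with a zero diagonal."""
    a = {(p, q): 0.0 for p in classes for q in classes}
    for s, true_y in enumerate(labels):
        guess = predict(h[s], classes)
        if guess != true_y:
            # a sample of class p mistaken for class q counts towards
            # both A_t(p, q) and A_t(q, p), each with the factor 1/2
            a[(true_y, guess)] += 0.5 * d[s]
            a[(guess, true_y)] += 0.5 * d[s]
    return a


if __name__ == "__main__":
    labels, classes = [0, 0, 1], [0, 1]
    gamma = {(0, 1): 0.2, (1, 1): 0.5, (2, 0): 0.3}
    h = [[0.9, 0.1], [0.2, 0.8], [0.3, 0.7]]
    d = sample_distribution(gamma, labels, classes)
    print(d, pairwise_class_distribution(d, h, labels, classes))
```

Only misclassified samples contribute to $A_t$, so a pair of classes that is never confused keeps a weight of zero.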

With respect to the pictorial representation of the ada-boost training in Figure 5.1, the updated parameters refer to Γt, At and Dt. The hard to classify subset extracted in every successive iteration t corresponds to the r samples with the highest values of Dt in every class. A pseudo code of the ada-boost.M2 procedure is presented in Figure 5.2.
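The per-iteration bookkeeping of the procedure can be sketched as follows. This is a hedged illustration that follows the standard AdaBoost.M2 update (β = ε/(1 − ε), weights multiplied by β raised to (1 + h(z, y_true) − h(z, y))/2, final vote weighted by log(1/β)); the function names and the data layout are assumptions made for the example, and the R-LDA training step itself is not shown.

```python
import math


def update_mislabel_distribution(gamma, h, labels, epsilon):
    """One AdaBoost.M2 reweighting step; returns (new gamma, beta)."""
    if not 0.0 < epsilon < 1.0:
        raise ValueError("epsilon must lie strictly between 0 and 1")
    beta = epsilon / (1.0 - epsilon)
    updated = {}
    for (s, y), weight in gamma.items():
        exponent = 0.5 * (1.0 + h[s][labels[s]] - h[s][y])
        updated[(s, y)] = weight * beta ** exponent
    norm = sum(updated.values())  # normalization factor
    return {key: value / norm for key, value in updated.items()}, beta


def select_hardest(d, labels, r):
    """Indices of the r samples with the largest D_t in every class."""
    by_class = {}
    for s, y in enumerate(labels):
        by_class.setdefault(y, []).append(s)
    return {y: sorted(idx, key=lambda s: d[s], reverse=True)[:r]
            for y, idx in by_class.items()}


def ensemble_predict(h_scores, betas, classes):
    """argmax over y of sum_t log(1 / beta_t) * h_t(z, y) for one probe."""
    return max(classes,
               key=lambda y: sum(math.log(1.0 / b) * h[y]
                                 for h, b in zip(h_scores, betas)))


if __name__ == "__main__":
    gamma = {(0, 1): 0.5, (1, 0): 0.5}
    new_gamma, beta = update_mislabel_distribution(
        gamma, [[0.9, 0.1], [0.4, 0.6]], [0, 1], epsilon=0.25)
    print(new_gamma, beta)
```

In this toy run the second sample, whose correct label scores only slightly above the wrong one, receives the larger weight after the update.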

5.4 Possible Implication of color in the Ada-Boost framework

In this section, the possible implications of the extra dimensionality of chromatic information on the ada-boost framework are discussed. Chromatic information could affect two aspects of the boosting framework. They are,

• Color inputs could worsen the small sample size problem when used with the R-LDA feature extractor, which is a part of the g-classifier.

• They could also have an effect on the optimal weakness, i.e., the r∗ parameter of the ada-boost.M2 framework.

Input: Training images and their corresponding labels, $\{(z_{ij}, y_{ij})\}$ for $i = 1, \dots, C$ and $j = 1, \dots, L$, with $y_{ij} \in Y$, where $Y = \{1, 2, \dots, C\}$; number of iterations $T$.

Initialization: $\Gamma_1(z_{ij}, y) = \frac{1}{N(C-1)}$. Calculate $A_1$ and $D_1$ using Equations 5.6 and 5.7.

Procedure: Do the following from $t = 1$ to $T$:

1. If $t = 1$, choose $r$ samples per class randomly to form $R_t$; else choose the $r$ hardest samples per class, based on the highest $\hat{D}_t$ values, to form $R_t$.

2. Train the JD-LDA feature extractor on $R_t$ using Equations 5.1, 5.4 and 5.5 and obtain the feature basis $W_t$ and the projected class means, $W_t^T \bar{z}_i$.

3. Build the classifier $h_t$ using $W_t$ and $W_t^T \bar{z}_i$ created in the previous step and obtain the hypothesis $h_t : \mathbb{R}^d \times Y \to [0, 1]$.

4. Calculate $\hat{\epsilon}_t$ using Equation 5.3 and set $\hat{\beta}_t = \hat{\epsilon}_t / (1 - \hat{\epsilon}_t)$.

5. Updation: $\Gamma_{t+1}(z_{ij}, y) = \frac{\Gamma_t(z_{ij}, y)}{Z_t}\, \hat{\beta}_t^{\,(1 + h_t(z_{ij}, y_{ij}) - h_t(z_{ij}, y))/2}$, where $Z_t$ is a normalization factor to convert it to a distribution. $\hat{A}_{t+1}$ and $\hat{D}_{t+1}$ are updated using Equations 5.6 and 5.7.

Output: The final ensemble g-classifier, $h_f(z) = \arg\max_{y \in Y} \sum_{t=1}^{T} \left(\log \frac{1}{\hat{\beta}_t}\right) h_t(z, y)$, where $z$ is an unknown probe.

Figure 5.2: Pseudocode: Ada-Boost framework

Implication on the small sample size problem

When the images in Z are multi spectral (or color), the dimensionality of the training samples, $z_{ij}$, is increased by a factor equal to the number of spectral planes, K, as discussed earlier. This could worsen the small sample size problem when used with an LDA learner, as discussed in Chapter 3. Even though the R-LDA is robust to the issue of inverting a singular within class scatter matrix, $S_W$, because it searches for an optimal basis in the low dimensional space $A^{C} \cap B$ as discussed earlier, the estimation of $S_W$ remains an ill posed problem due to the high bias and variance involved in the estimation of its parameters. The estimation of $S_W$ becomes even more ill posed when multi spectral color inputs are fed to the R-LDA feature extractor, thus worsening the small sample size problem.
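A quick back-of-the-envelope check makes the scale of the problem concrete. The sketch below uses the figures from the experimental setup of this chapter (150×130 images, 68 subjects) and the standard fact that the within class scatter matrix of N samples from C classes has rank at most N − C; the function names are hypothetical.

```python
def input_dimensionality(height, width, planes):
    """Length d of the vectorized input with K = planes spectral planes."""
    return height * width * planes


def max_rank_within_scatter(num_subjects, samples_per_subject):
    """Upper bound N - C on the rank of the within class scatter matrix."""
    n = num_subjects * samples_per_subject
    return n - num_subjects


if __name__ == "__main__":
    for planes, name in ((1, "gray scale"), (3, "color")):
        d = input_dimensionality(150, 130, planes)
        for samples in (3, 16):
            bound = max_rank_within_scatter(68, samples)
            print(f"{name}, L = {samples}: d = {d}, rank(S_W) <= {bound}")
```

Even at L = 16 the rank bound (1020) is far below d = 19500 for gray scale inputs, and the gap triples for three full-resolution spectral planes.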

Implication on the optimal weakness of the individual g-classifiers

The design of the ada-boost.M2 involves an optimal choice of the parameter r∗ which

denotes the optimal number of samples per subject used for training each individual

g-classifier. This r∗ should achieve the best trade off between creating individual g-

classifiers with low mutual dependence, i.e., weak learners and achieving a low general-

ization error on the training set. The trade off is described using a loss function defined

in [11],

$$R(r) = \left( \frac{1}{T} \sum_{t=1}^{T} \sum_{i,j} \Pr[h_{t,r}(z_{ij}) \neq y_{ij}] \right) + \lambda \cdot \sqrt{\frac{\rho_l(r)}{\rho_l(L)}} \tag{5.8}$$

where $T$ is the number of boosting iterations, $\Pr[h_{t,r}(z_{ij}) \neq y_{ij}]$ is the empirical classification error rate (CER) obtained by applying the g-Classifier $h_t$ to the training set $Z$, and $\lambda$ is a constant whose value is determined experimentally. The first term in the above equation represents the generalization error, while the second term represents the weakness of the individual learners. Determining the optimal r∗ is equivalent to minimizing $R(r)$ with respect to $r$. The optimality is defined with respect to the lowest generalization error on the training set.
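The selection of r∗ by minimizing Equation (5.8) can be sketched as below. The sketch assumes that $\rho_l$ is the LDD of Equation (5.2), so that the ratio $\rho_l(r)/\rho_l(L)$ reduces to r/L; the error values and the value of λ in the usage example are made up purely for illustration and are not results from this thesis.

```python
import math


def boosting_loss(mean_training_error, r, samples_per_subject, lam):
    """R(r) of Equation (5.8), with rho(r) / rho(L) simplified to r / L."""
    return mean_training_error + lam * math.sqrt(r / samples_per_subject)


def optimal_r(errors_by_r, samples_per_subject, lam):
    """Value of r that minimizes R(r) over the candidates provided."""
    return min(errors_by_r,
               key=lambda r: boosting_loss(errors_by_r[r], r,
                                           samples_per_subject, lam))


if __name__ == "__main__":
    # hypothetical mean training errors for r = 2..5 with L = 5
    errors = {2: 0.40, 3: 0.20, 4: 0.15, 5: 0.14}
    print(optimal_r(errors, samples_per_subject=5, lam=0.3))
```

The first term falls as r grows (stronger learners), while the second term rises, so the minimum lies at an intermediate r for these illustrative numbers.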

The increased dimensionality of the face inputs $z_{ij}$ aggravates the small sample size problem in training the individual g-Classifiers, which could change the generalization error on the training set (the first term in Equation (5.8)). The weakness of the g-Classifier would therefore have to be adjusted accordingly, i.e., r∗ would have to be changed in order to minimize the function in Equation (5.8). The booster would, however, fail if r were either too high or too low. Although color inputs lead to a better performance than gray scale inputs for all examined scenarios, the optimal r∗ could differ between color and gray scale inputs. This is another aspect examined in the experiments in this chapter.

The effect on the booster of the increased dimensionality and the induced small sample size problem created by color images, along with the effect of chromatic information on the design of the ada-boost.M2 parameters, is examined in the experiments performed, so that chromatic information can be used in the most effective way to address variations in illumination, in addition to pose and expression variations.


For the experiments in this chapter, the ada-boost.M2 has been trained on the gallery

set, Z. The FR system operates in the identification mode and the images of the probe

set, Q are matched against those of the gallery. A flowchart depicting a broad outline of

the FR system used is presented in Figure 5.3.

As in the experiments performed in the previous chapters, the images of the gallery and probe sets, Z and Q respectively, contain irrelevant portions comprising the background, hair, shoulders, etc., along with the face. The preprocessing stage isolates the face from the rest of the image, and represents the face as a column vector for further processing. The preprocessing steps are explained in Appendix B. The resolution of the images is fixed to 150×130 for all experiments performed. This resolution is chosen as it is commonly used in surveillance applications.

Since the images are in the RGB format, a color space transformation block is required to transform the preprocessed images to the required color space for analysis. Since we want to evaluate the effect that color has on the small sample size scenario and the ada-boosting framework in this chapter, we have considered two color space transformations, RGB and YCbCr 4:4:4, along with their corresponding gray scale counterparts, R and Y. The same color space/ gray scale transformation is used in both the training and testing stages, i.e., it is assumed that the FR system user knows the color space/ gray scale transformation used for training the system.
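For illustration, an RGB to YCbCr 4:4:4 transformation block could look like the sketch below. The coefficients are those of the full-range BT.601 conversion used by JPEG/JFIF; which variant of the transform was used in the experiments is not stated here, so this choice, the function names, and the plane ordering of the output vector are all assumptions.

```python
def rgb_to_ycbcr(r, g, b):
    """Full-range BT.601 (JFIF) conversion of one 8-bit RGB pixel."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def to_color_vector(rgb_pixels):
    """Stack the Y, Cb and Cr planes (no sub sampling) into one vector."""
    ys, cbs, crs = [], [], []
    for r, g, b in rgb_pixels:
        y, cb, cr = rgb_to_ycbcr(r, g, b)
        ys.append(y)
        cbs.append(cb)
        crs.append(cr)
    return ys + cbs + crs  # assumed ordering: Y plane, then Cb, then Cr


if __name__ == "__main__":
    print(rgb_to_ycbcr(255, 0, 0))
    print(len(to_color_vector([(10, 20, 30), (40, 50, 60)])))
```

The gray scale counterpart Y is simply the first third of the stacked vector, which is why the color input has three times the dimensionality in the 4:4:4 case.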

As mentioned before, each individual g-classifier consists of an R-LDA feature extractor and a linear classifier (nearest center), and is trained based on updated parameters which depend on the generalization error on the training set, Z. In order to fit the ada-boost framework, the nearest center classifier is based on the Euclidean distance and is given by,

$$\mathrm{dist}(z, i, \Psi_t, \bar{z}_{i,t}) = \frac{dist_{max} - dist_{z,i}}{dist_{max} - dist_{min}} \tag{5.9}$$

where $dist_{z,i} = \left\| \Psi^T (z - \bar{z}_i) \right\|$, with $\bar{z}_i$ the center of class $i$, $dist_{max} = \max(\{dist_{z,i}\}_{i=1}^{C})$ and $dist_{min} = \min(\{dist_{z,i}\}_{i=1}^{C})$.

The classification score obtained by Equation (5.9) has values in [0,1]. It should be noted that for normalized inputs of unit norm, the cosine similarity measure is equivalent to the Euclidean similarity metric. The ada-boost.M2 is first trained on Z and then evaluated on Q to produce a Classification Error Rate (CER). The CER is the ratio of the number of wrong identifications to the total number of probe images, expressed as a percentage. The CER is equal to 100 − CRR.
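The normalized nearest center score of Equation (5.9) and the CER can be sketched as follows. The sketch operates on vectors that are assumed to have already been projected by the feature basis; the function names are hypothetical, and returning a score of 1 for every class when all distances coincide is a choice made for this example only.

```python
import math


def similarity_scores(projected_probe, projected_centers):
    """Scores of Equation (5.9): 1 for the nearest center, 0 for the farthest."""
    dists = [math.dist(projected_probe, c) for c in projected_centers]
    d_max, d_min = max(dists), min(dists)
    if d_max == d_min:
        return [1.0] * len(dists)  # degenerate case, handled by assumption
    return [(d_max - d) / (d_max - d_min) for d in dists]


def classification_error_rate(predicted, actual):
    """CER: wrong identifications over all probes, as a percentage."""
    wrong = sum(p != a for p, a in zip(predicted, actual))
    return 100.0 * wrong / len(actual)


if __name__ == "__main__":
    scores = similarity_scores((0.0, 0.0), [(3.0, 4.0), (0.0, 1.0), (6.0, 8.0)])
    print(scores)
    print(classification_error_rate([1, 2, 3, 1], [1, 2, 2, 2]))
```

In the usage example the distances are 5, 1 and 10, so the second center receives the score 1 and the third the score 0.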

A difficult subset of the CMU PIE database having severe pose and illumination vari-

ations is chosen as an evaluation database for the experiments in this chapter. 7 different

poses and 10 different illumination conditions are included to depict hard conditions in

the FR problem. The illumination conditions are caused by varying positions of the

camera flash in a room with zero background illumination, hence the variations caused

are severe. Following the PIE’s naming rule, pose group [05, 07, 09, 11, 27, 29, 37] and

alternate flash numbers [2, 4, 6, 10, 12, 13, 14, 16, 18, 19] are chosen for experimentation


purposes. Poses chosen are restricted to a maximum variation of 45 degrees and 10 out of

21 illumination conditions are used in this evaluation database. Details of the evaluation

subset used for experimentation, D are listed as follows:

• No. of subjects: 68

• No. of samples per subject: 70 (each subject has 10 images in each pose, where each of those 10 images belongs to a different illumination condition, thus covering 7 poses and 10 severe illumination conditions).

• Total number of images in the evaluation database: 68×70=4760

The number of subjects is fixed to 68, and the number of samples per subject, L, is varied. Following standard FR practices, the evaluation database is divided into two sets: the gallery set, Z, on which training is performed, and the probe set, Q, which contains the images of unknown identity, such that D = Z ∪ Q and Z ∩ Q = ∅. L images per subject from D comprise the training set Z, while the remaining 70 − L images per subject constitute the probe set Q; hence the cardinality of Z is |Z| = 68 × L and the cardinality of Q is |Q| = |D| − 68 × L. L takes values in {3, 4, 5, 6, 7, 10, 13, 16} in order to examine the small sample size problem in terms of the number of samples per subject for both color and gray scale images. The images of each subject chosen to comprise Z are ensured to be of different illuminations for all learning scenarios, and of different poses if L ≤ 7, so that all 7 poses and 10 illuminations are represented by the 68 subjects in the training set. The results reported are averaged over more than 7 runs to avoid bias; each run is executed on a gallery and probe partition.
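The gallery and probe partition can be sketched as below. This simplified version draws the L gallery images of each subject uniformly at random and therefore does not enforce the constraint on distinct poses and illuminations described above; the function name and the (subject, image index) representation are assumptions made for the example.

```python
import random


def split_gallery_probe(num_subjects, images_per_subject,
                        gallery_per_subject, seed=0):
    """Randomly assign L images per subject to Z and the rest to Q."""
    rng = random.Random(seed)
    gallery, probe = [], []
    for subject in range(num_subjects):
        indices = list(range(images_per_subject))
        rng.shuffle(indices)
        gallery += [(subject, k) for k in indices[:gallery_per_subject]]
        probe += [(subject, k) for k in indices[gallery_per_subject:]]
    return gallery, probe


if __name__ == "__main__":
    for samples in (3, 4, 16):
        z, q = split_gallery_probe(68, 70, samples)
        print(f"L = {samples}: |Z| = {len(z)}, |Q| = {len(q)}")
```

For L = 4 this gives |Z| = 272 and |Q| = 4488, which together account for all 4760 images of the evaluation database.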


Figure 5.3: System Description


5.6 Results

In this section, the following aspects are examined for a range of small sample size learning

scenarios on the evaluation database described,

• The effect of chromatic information and the extra dimensionality of chromatic in-

puts on the R-LDA feature extractor

• The effect of boosting the R-LDA learner for different color space/ gray scale trans-

formations

The results are presented in Table 5.1 and the values recorded are the Classification

Error Rate (CER) as a percentage. The ada-boost.M2 was performed over 40 iterations

and the best FR performance (at iteration T*) over 40 iterations is presented. In order

to reduce the number of parameters varied, the number of features used for each R-LDA

feature extractor in the creation of each g-classifier is fixed to 30 for all experiments. In

order to compare the best performances of boosting with that of the R-LDA, the best

CER (that obtained using the optimal number of features, M*) is recorded in Table 5.1.

In order to examine the aspects above, three performance measures are introduced,

• ξ∗J : The best improvement obtained by color over its gray scale counterpart for

R-LDA

• ξ∗B: The best improvement obtained by color over its gray scale counterpart for

ada-boost.M2

• δ∗: The best improvement obtained by boosting, i.e., the improvement of ada-

boost.M2 over R-LDA

Negative values of ξ∗J, ξ∗B and δ∗ signify improvements, while positive values signify deterioration in performance. Table 5.3 presents values for δ∗ and Table 5.2 presents values for ξ∗J and ξ∗B over the range of learning scenarios examined.
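As a worked example, the three measures can be computed as simple differences of CER values. The definition used below (CER of the candidate minus CER of the reference, so that negative means better) is inferred from the sign convention stated above, and the numbers are the L = 4 entries for the Y and YCbCr transformations in Table 5.1; the function name is hypothetical.

```python
def improvement(cer_candidate, cer_reference):
    """Difference in CER (percentage points); negative means improvement."""
    return cer_candidate - cer_reference


if __name__ == "__main__":
    # L = 4: best CERs (%) for the Y and YCbCr transformations
    rlda_y, rlda_ycbcr = 46.884, 36.7647
    boost_y, boost_ycbcr = 38.62, 29.71

    xi_j = improvement(rlda_ycbcr, rlda_y)        # color vs gray, R-LDA
    xi_b = improvement(boost_ycbcr, boost_y)      # color vs gray, ada-boost.M2
    delta = improvement(boost_ycbcr, rlda_ycbcr)  # boosting vs R-LDA
    print(f"xi_J = {xi_j:.4f}, xi_B = {xi_b:.4f}, delta = {delta:.4f}")
```

These differences agree with the corresponding L = 4 entries of Tables 5.2 and 5.3 after rounding.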


Table 5.1: Results obtained with ada-boost.M2 & R-LDA using color & gray scale transformations in different learning scenarios

| Samples/Subject | LDD of individual learner (× 1/68) | R: B-JD-LDA (T*) | R: JD-LDA (M*) | Y: B-JD-LDA (T*) | Y: JD-LDA (M*) | RGB: B-JD-LDA (T*) | RGB: JD-LDA (M*) | YCbCr: B-JD-LDA (T*) | YCbCr: JD-LDA (M*) |
|---|---|---|---|---|---|---|---|---|---|
| 3 | 2 | 65.86(6) | 56.3191(46) | 64.45(4) | 56.3586(46) | 60.14(11) | 52.7283(46) | 59.27(1) | 46.385(46) |
| | 3 | 54.97(5) | | 54.17(7) | | 49.87(21) | | 45.89(3) | |
| 4 | 2 | 67.9(5) | 46.8326(35) | 64.12(5) | 46.884(35) | 60.72(5) | 41.9546(35) | 61.33(2) | 36.7647(35) |
| | 3 | 39.95(30) | | 38.62(23) | | 34.16(30) | | 29.71(24) | |
| 5 | 2 | 67.94(1) | 39.5989(33) | 65.37(4) | 39.2534(33) | 65.15(3) | 35.0309(33) | 62.09(2) | 29.2464(33) |
| | 3 | 35.36(40) | | 32.38(40) | | 29.71(38) | | 25.15(36) | |
| | 4 | 31.20(37) | | 29.12(34) | | 26.06(34) | | 20.57(22) | |
| 6 | 3 | 33.86(40) | 33.3203(30) | 28.54(40) | 33.596(30) | 26.79(40) | 28.8166(30) | 29.64(7) | 23.8074(30) |
| | 4 | 25.17(39) | | 23.84(27) | | 20.78(40) | | 16.76(36) | |
| | 5 | 23.67(37) | | 22.47(23) | | 19.61(33) | | 16.23(17) | |
| 7 | 3 | 32.64(36) | 27.1761(30) | 23.92(40) | 27.0003(30) | 21.87(37) | 22.1548(30) | 31.18(4) | 17.411(30) |
| | 4 | 19.78(39) | | 17.32(37) | | 14.35(39) | | 11.07(37) | |
| | 5 | 16.71(40) | | 15.46(31) | | 12.33(29) | | 9.37(30) | |
| | 6 | 16.04(40) | | 15.13(21) | | 12.16(38) | | 9.76(13) | |
| 10 | 5 | 13.73(38) | 20.9657(30) | 12.56(40) | 21.5931(30) | 10.62(39) | 17.3676(30) | 7.21(40) | 13.4559(30) |
| | 6 | 12.50(39) | | 11.57(30) | | 9.70(34) | | 6.28(40) | |
| | 7 | 11.87(39) | | 11.37(35) | | 9.42(39) | | 6.04(28) | |
| | 9 | 13.79(22) | | 13.80(8) | | 11.39(15) | | 8.04(6) | |
| 13 | 5 | 16.67(36) | 21.9401(30) | 14.95(39) | 22.549(30) | 12(40) | 19.1744(30) | 9.13(39) | 14.6646(30) |
| | 7 | 13.28(40) | | 12.45(39) | | 9.36(40) | | 7.57(38) | |
| | 9 | 12.63(31) | | 12.24(36) | | 9.66(40) | | 8.50(11) | |
| | 11 | 15.07(14) | | 14.14(10) | | 12.16(40) | | 9.73(7) | |
| 16 | 5 | 15.1(40) | 17.963(30) | 13.27(40) | 18.6819(30) | 9.91(40) | 15.2614(30) | 8.42(40) | 11.6285(30) |
| | 7 | 10.86(40) | | 10.43(39) | | 7.19(40) | | 6.06(35) | |
| | 9 | 9.59(35) | | 9.8(40) | | 6.57(37) | | 5.56(26) | |
| | 11 | 9.71(39) | | 10.09(29) | | 7.43(36) | | 6.69(11) | |
| | 13 | 11.49(24) | | 11.43(15) | | 9.16(24) | | 7.41(10) | |
| | 15 | 13.09(17) | | 12.86(20) | | 10.59(12) | | 8.02(7) | |

R and Y are the gray scale transformations; RGB and YCbCr are the color space transformations. The JD-LDA (M*) value is reported once per value of samples/subject.

Values in this table are Classification Error Rate (CER) expressed as a percentage.

The CERs reported for the B-JD-LDA are the minimum over 40 ada-boost iterations; T* denotes the iteration number at which this minimum was achieved. The number of JD-LDA features used is 30 for all boosting experiments.

The CERs reported for JD-LDA are for the best found feature number, where M* is the optimal number of features.


5.6.1 Implication of Color

From Table 5.2 it is evident that color inputs lead to a better performance of the FR system in every case examined. The improvement obtained by using color inputs ranges from approximately 2% to 10% over the learning scenarios examined. From Table 5.2, |ξ∗B| and |ξ∗J| are significantly greater for the YCbCr & Y pair of transformations than for the RGB & R pair, which implies that YCbCr is a better color space for FR. This is in agreement with the conclusions of past works [17], and can be attributed to the decorrelated information in each of its spectral planes. The issues discussed in Section 5.4 are examined in this sub section.

Small Sample Size Scenario

An overview of Table 5.2 suggests that for both the RGB & R and YCbCr & Y pairs of transformations, |ξ∗B| and |ξ∗J| are higher in the small sample size scenarios. The values of |ξ∗B| and |ξ∗J| are largest around L = 4 and tend to decrease as L increases. However, in the most extreme small sample size scenario examined, corresponding to the L = 3 case, |ξ∗B| and |ξ∗J| are marginally lower. This observation can be attributed to the effect of the increased dimensionality of color on the small sample size problem, and is similar to the trend observed in Chapter 3. Nevertheless, the values of |ξ∗B| and |ξ∗J| are still high enough to favor chromatic information over gray scale in the small sample size scenarios. As the small sample size restriction is relaxed, i.e., as the value of L increases, the improvement of color over gray scale (ξ∗B and ξ∗J) becomes less significant for both pairs of color transformations examined.

Implication on weakness of g-classifier

In agreement with earlier literature [11], Table 5.3 shows that the optimal r∗ is neither too high nor too low. However, no consistent trend in the shift of r∗ was observed when color transformations were used instead of their gray scale counterparts, although in some cases (L = 6, 7) the value of r∗ is shifted to a lower value for the YCbCr set of transformations. Color inputs always produce a lower generalization error on the training set when compared to gray scale inputs, and would therefore require a lower r∗ to achieve the optimal trade off. However, the effect of using chromatic inputs with different dimensionality on r∗ is not examined.

| Samples/Subject | Measure | JD-LDA: RGB | JD-LDA: YCbCr | ada-boost.M2: RGB | ada-boost.M2: YCbCr |
|---|---|---|---|---|---|
| 3 | CER | 52.7283 | 46.385 | 49.87 | 45.89 |
| | r* | - | - | 3 | 3 |
| | ξ∗J (JD-LDA) / ξ∗B (ada-boost.M2) | -3.59 | -9.974 | -5.1 | -8.28 |
| 4 | CER | 41.9546 | 36.7647 | 34.16 | 29.71 |
| | r* | - | - | 3 | 3 |
| | ξ∗J / ξ∗B | -4.88 | -10.12 | -5.79 | -8.91 |
| 5 | CER | 35.0309 | 29.2464 | 26.06 | 20.57 |
| | r* | - | - | 4 | 4 |
| | ξ∗J / ξ∗B | -4.57 | -10.01 | -5.14 | -8.55 |
| 6 | CER | 28.8166 | 23.8074 | 19.61 | 16.23 |
| | r* | - | - | 5 | 5 |
| | ξ∗J / ξ∗B | -4.5 | -9.789 | -4.06 | -6.24 |
| 7 | CER | 22.1548 | 17.411 | 12.16 | 9.37 |
| | r* | - | - | 6 | 6 |
| | ξ∗J / ξ∗B | -5.02 | -9.589 | -3.88 | -5.76 |
| 10 | CER | 17.3676 | 13.4559 | 9.42 | 6.04 |
| | r* | - | - | 7 | 7 |
| | ξ∗J / ξ∗B | -3.6 | -8.137 | -2.45 | -5.33 |
| 13 | CER | 19.1744 | 14.6646 | 9.36 | 7.57 |
| | r* | - | - | 8 | 8 |
| | ξ∗J / ξ∗B | -2.77 | -7.884 | -3.27 | -4.67 |
| 16 | CER | 15.2164 | 11.6285 | 6.57 | 5.56 |
| | r* | - | - | 9 | 9 |
| | ξ∗J / ξ∗B | -2.7 | -7.053 | -3.02 | -4.24 |

Table 5.2: Best Performances obtained by using the color space counterpart over the corresponding gray scale over different learning tasks

5.6.2 Implication of ensemble learning

From the negative values of δ∗ in Table 5.3, a broad conclusion is that the ada-boost.M2 performs better than the R-LDA method of FR in all examined cases.

The improvement obtained by boosting the R-LDA, i.e., |δ∗|, is larger when the training database is large, i.e., L ≥ 4. The value of |δ∗| is not significant when L = 3, but is over 6% in all cases when L ≥ 4. This is because, when the training database is large, the probability of the ada-boost.M2 choosing a different set of training samples, and hence creating a diverse and complementary set of classifiers, is higher. This trend agrees with previous works [11, 45] which examine the performance of ensemble learners.

Another trend observed in Table 5.3 is that the improvement obtained by boosting the R-LDA, |δ∗|, does not depend on the color space/ gray scale transformation used, but only on the size of the training database.

The above trends suggest that boosting the learner does not significantly help the FR system in the extreme small sample size learning scenarios, L ≤ 3, but improves the FR system performance when the training set is reasonably large, irrespective of the color space/ gray scale transformation used.


| Samples/Subject | Measure | Gray scale: R | Gray scale: Y (from YCbCr) | Color: RGB | Color: YCbCr |
|---|---|---|---|---|---|
| 3 | CER | 54.97 | 54.17 | 49.87 | 45.89 |
| | r* | 3 | 3 | 3 | 3 |
| | δ∗ | -31.3491 | -2.1886 | -2.8583 | -0.495 |
| 4 | CER | 39.95 | 38.62 | 34.16 | 29.71 |
| | r* | 3 | 3 | 3 | 3 |
| | δ∗ | -6.8826 | -8.264 | -7.7946 | -7.0547 |
| 5 | CER | 31.2 | 29.12 | 26.06 | 20.57 |
| | r* | 4 | 4 | 4 | 4 |
| | δ∗ | -8.3989 | -10.1334 | -8.9709 | -8.6764 |
| 6 | CER | 23.67 | 22.47 | 19.61 | 16.23 |
| | r* | 5 | 5 | 5 | 4 |
| | δ∗ | -9.6503 | -11.126 | -9.2066 | -7.5774 |
| 7 | CER | 16.04 | 15.13 | 12.16 | 9.37 |
| | r* | 6 | 6 | 6 | 5 |
| | δ∗ | -11.1361 | -11.8703 | -9.9948 | -8.041 |
| 10 | CER | 11.87 | 11.37 | 9.42 | 6.04 |
| | r* | 7 | 7 | 7 | 7 |
| | δ∗ | -9.0957 | -7.4069 | -7.9476 | -7.4159 |
| 13 | CER | 12.63 | 12.24 | 9.36 | 7.57 |
| | r* | 9 | 9 | 8 | 8 |
| | δ∗ | -9.3101 | -9.451 | -9.8144 | -7.0946 |
| 16 | CER | 9.59 | 9.8 | 6.57 | 5.56 |
| | r* | 9 | 9 | 9 | 9 |
| | δ∗ | -8.373 | -8.8819 | -8.6914 | -6.0685 |

Table 5.3: Best Performances obtained by boosting the R-LDA learner for different inputs and learning tasks


5.7 Conclusions

Color transformations boost the performance of the FR system in every scenario examined; however, the improvement offered by chromatic inputs tends to reduce as the learning scenario becomes easier. This is in agreement with the trends observed in Chapter 3. Even though the added dimensionality of the color inputs examined leads to a dip in the improvement obtained with color in the most extreme case of small sample size learning examined, the improvement is still high enough to favor chromatic inputs over gray scale inputs.

Boosting the learner, on the other hand, leads to an improvement in the FR system performance when the training database is large, as the ada-boost.M2 can build a more diverse and complementary set of classifiers. In the extreme cases of small sample size learning, the classifiers generated by the boosting framework are trained on virtually the same samples at every iteration and differ only in the updated parameters, leading to a strong mutual dependence between the individual learners. Therefore the improvement achieved by boosting is not significant in these learning scenarios. The weakness of the individual R-LDA learners should be designed according to the learning scenario concerned, and does not appear to depend on the color space/ gray scale transformation used.

The trends observed in Section 5.6 suggest that the design of the FR system (color space/ gray scale transformation and boosting parameters) would have to be chosen depending on the learning scenario under consideration. In small sample size scenarios, the FR system performance is boosted significantly by the usage of chromatic inputs and not by boosting the learner. As the number of samples per subject, L, is increased, the improvement provided by color information reduces. Boosting the learner improves the performance of the individual learner significantly in all cases where the size of the training database is reasonably large, i.e., L ≥ 4 and |Z| ≥ 272 images. The experimental results show that integrating color into the boosting framework could significantly improve the performance of the FR system when L is approximately between 4 and 10 for medium sized databases. Also, the YCbCr set of color transformations leads to a higher FR performance than the RGB set, for the set of images used.

5.8 Chapter Summary

In this chapter, chromatic information is integrated with an ada-boost learner to address complexities in face patterns and illumination variations in training databases for face recognition (FR). An LDA based learner is boosted and the integrated framework is tested on a large database of images having severe pose and illumination variations. The effects of both the extra dimensionality of color inputs and ensemble learning on the LDA learner were examined in a range of small sample size learning scenarios. The results of the experiments performed were presented in this chapter.

Experimental results show that integrating color into the boosting framework helps in addressing complexities in face patterns and severe illumination variations, and produces a high performing FR system for a range of learning scenarios. In learning scenarios where the training database is small, e.g., the small sample size scenarios, the contribution of chromatic information is very significant, whereas when the training database is reasonably large, ensemble learning boosts the performance of the FR system.

Chapter 6

Conclusion and Future Research

In this chapter, the broad conclusions of the aspects studied in this thesis are presented along with a summary of the work. Proposed directions for the extension of this work are also discussed.

6.1 Research Summary

Usage of color information has gathered recent attention in FR research. In this thesis, color information in multi spectral images is used along with intensity or gray scale information as an input to the FR system. The small sample size learning scenarios are of major importance in this work. The effect of chromatic information is examined in a range of learning scenarios, facial distortions/ viewpoints and both poor and good illumination conditions, and an analysis is presented on the usefulness of chromatic information. The experiments performed suggest that chromatic inputs do provide discriminatory information to the FR system in certain conditions. The results presented in this thesis are specific to the databases evaluated upon; however, the conclusions and ideas can be extended to any color FR system.

Experiments were performed to determine the learning scenarios and imaging condi-

tions in which chromatic information boosts the performance of the FR system. This is


an important concern as the storage requirements and computational cost involved in the usage of multi spectral chromatic inputs are around 1.5-3 times those of the corresponding gray scale images. This issue was examined for both supervised and unsupervised learning modes and it was concluded that color cues provide important discriminatory information to the FR system in conditions of poor illumination, when the shape cues are degraded owing to an unclear intensity or gray scale image. A point of interest was the effect of the increased dimensionality of color inputs and spatial sampling of chromatic planes on the small sample size problem in supervised FR systems. Interestingly, the experiments suggest that color inputs help the FR system performance in small sample size learning scenarios. Spatial sub sampling of chromatic planes can be used as a parameter to control the trade off required between the dimensionality of the input and the amount of chromatic information fed to the system, thus aiding the design of FR systems in small sample size scenarios. The chosen dimensionality would depend on the imaging conditions under which the faces were captured. Spatial sub sampling leads to a loss of important information when used in unsupervised FR systems with images subjected to severe imaging variations, although it does not lead to a loss of information when the shape cues are optimal. Another important factor in the design of a color FR system is the chromatic bytes used to form the color inputs. In our experiments, we found that the Cb spectral plane (of YCbCr) does not contain discriminative information useful for FR purposes. This would depend on the nature of the images present in the training and testing sets.
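The trade off between input dimensionality and chromatic information can be made concrete with a short calculation. The sketch below counts the entries of a vectorized YCbCr input for a 150×130 image under three chroma sub sampling factors; associating these factors with the 4:4:4, 4:2:2 and 4:2:0 formats is an assumption made for illustration, and the function name is hypothetical.

```python
def color_input_dimensionality(height, width, chroma_h_factor, chroma_v_factor):
    """Entries in a YCbCr input with sub sampled Cb and Cr planes."""
    luma = height * width
    chroma = (height // chroma_v_factor) * (width // chroma_h_factor)
    return luma + 2 * chroma


if __name__ == "__main__":
    gray = 150 * 130
    for name, (fh, fv) in (("4:4:4", (1, 1)), ("4:2:2", (2, 1)), ("4:2:0", (2, 2))):
        d = color_input_dimensionality(150, 130, fh, fv)
        print(f"{name}: d = {d} ({d / gray:.1f} x gray scale)")
```

The resulting factors of 3, 2 and 1.5 relative to the gray scale input span the 1.5-3 range quoted above.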

The effect of using the YCbCr transform in a decision fusion framework was evaluated

in this thesis. The YCbCr being a decorrelated transform is expected to offer different

and complementary information through each of its spectral planes to the FR system.

Since the individual learners are trained on individual spectral planes, this framework is

expected to be more robust to the small sample size problem in comparison to FR systems

where a raw data level fusion of spectral planes is performed. The framework was tested


in the supervised learning mode under both poor and good conditions of illumination

and it was found that fusion of chromatic and intensity information on a decision level

is an efficient way to use information contained in face inputs especially in small sample

size learning scenarios.

Complexities in face patterns which take the shape of variations in pose, viewpoint

and expression which occur with simultaneous variation in illumination conditions were

addressed by combining chromatic information, supervised learning and adaptive boost-

ing into a single learning framework. The implication of color information on different

aspects of the LDA based ensemble learner were discussed and examined. The individ-

ual effects of chromatic information and adaptive boosting were examined on the LDA

based supervised learner and it was concluded that this combined framework proposed

finds applications in medium sized face databases which have simultaneous illumination,

pose and variations in viewpoint. When the size of the training database is very small,

chromatic information helps the FR system, while when the size of the training database

is very large, the ensemble learner boosts the performance of the FR system.

In summary, an important conclusion from this thesis is that color especially helps

the FR system when the images are captured in uncontrolled conditions and severe

illumination conditions. Chromatic information improves the FR performance in the

extreme small sample size learning scenarios, if used effectively. Even though there

might be a slight drop in the contribution of chromatic information under some learning

and imaging conditions, color cues still provide valuable discriminative information to

the FR system under these difficult learning scenarios.

6.2 Future Work

A set of research topics which would extend the work presented in this thesis is discussed in this section.


More efficient aggregation on a decision level

In this thesis, information from the different spectral planes is fused in a multiple classifier system framework using rule based aggregation methods: the max rule, the min rule and the sum rule. These aggregation methods do not depend on the training data. Using data dependent aggregation methods would lead to an improvement in aggregation performance [42, 11]. Previous works [25, 27] have provided a framework for a confidence based choice of color spaces, and of spectral planes from different color spaces, for decision level aggregation. Efficient data dependent aggregation of information from individual spectral planes would therefore lead to a more efficient use of chromatic information, especially in small sample size learning scenarios (in supervised FR systems).
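For reference, the three rule based aggregation methods named above can be sketched as follows. The sketch assumes that each spectral plane's classifier outputs one similarity score per class; the function name and the score layout are hypothetical.

```python
def fuse_scores(plane_scores, rule="sum"):
    """Fuse per-class scores from several spectral planes; return the winning class.

    plane_scores : one list of per-class scores for each spectral plane
    rule         : "sum", "max" or "min"
    """
    combine = {"sum": sum, "max": max, "min": min}[rule]
    num_classes = len(plane_scores[0])
    fused = [combine(scores[c] for scores in plane_scores)
             for c in range(num_classes)]
    return max(range(num_classes), key=lambda c: fused[c])


if __name__ == "__main__":
    scores = [
        [0.6, 0.5, 0.1],  # Y plane
        [0.2, 0.5, 0.9],  # Cb plane
        [0.3, 0.6, 0.1],  # Cr plane
    ]
    for rule in ("sum", "max", "min"):
        print(rule, fuse_scores(scores, rule))
```

None of the three rules uses the training data, which is the limitation that data dependent aggregation is meant to address.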

Face Resolution Implications on FR

The resolution was fixed to 150×130 for all experiments in this thesis as this resolution

is commonly used for surveillance applications. However, it is not certain that this is the

optimal resolution for FR purposes, and determining that optimum would be a useful

next step. Another possible issue arises when the testing images

do not have the same resolution as those which were used for training, as this would

lead to the issue of projection of images of a particular dimensionality on to a subspace

of different dimensionality. Efficient methods to solve this resolution mismatch would

significantly help in the practical usage of FR systems. It could be done by estimation

of the feature space (of the testing image dimension) from the existing feature space of

the training image dimension, or by resizing the testing inputs to suit the dimension of

the training inputs; however it is not evident which of these approaches would lead to

a better FR system. A change in resolution would also change the implication of the

small sample size problem in supervised FR systems, as it would mean a change in the

dimension of the vectorized input.
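The link between resolution and input dimensionality, and the simpler of the two mismatch remedies (resizing the testing input), can be sketched as follows. This is only an illustration: 150×130 is read here as 150 rows by 130 columns, nearest-neighbour sampling is an assumed resizing method, and the helper names are hypothetical.

```python
def vector_dimension(rows, cols, planes=3):
    """Length of the vectorized input for an image of the given size."""
    return rows * cols * planes


def resize_nearest(plane, new_rows, new_cols):
    """Resize one spectral plane (list of rows) by nearest-neighbour sampling."""
    rows, cols = len(plane), len(plane[0])
    return [
        [plane[r * rows // new_rows][c * cols // new_cols]
         for c in range(new_cols)]
        for r in range(new_rows)
    ]


print(vector_dimension(150, 130))     # three-plane color input
print(vector_dimension(150, 130, 1))  # gray scale input

# A hypothetical 4x4 testing plane resized to a 2x2 training resolution.
test_plane = [[r * 4 + c for c in range(4)] for r in range(4)]
print(resize_nearest(test_plane, 2, 2))
```

Halving both the number of rows and the number of columns reduces the length of the vectorized input by a factor of four, which is why a change in resolution directly changes the severity of the small sample size problem.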


Transformation Mismatch in training and testing stages

The assumption in this thesis is that the same color space or gray scale transformation

is used for both training the FR system and in the testing stage. This assumption

was made as the principal aim of this thesis was to investigate the efficient use of color

information in FR systems. However, this is not a necessary condition in practical FR

systems where the FR system user is not guaranteed to know the color spaces or gray

scale transformation on which the system was trained. It is therefore important from an

application point of view to determine the effect of employing different transformations in

the training and testing stages, that is, of a transformation mismatch. Knowledge

of the transformations which are more robust to mismatches would help in the design of

FR systems as the training could be performed on these robust transformations.

Tensorial Analysis of Color inputs

In this thesis, color inputs are processed by the formation of a column vector in the

preprocessing step. The column vector was formed by a row wise ordering of bytes from

each spectral plane followed by a concatenation of spectral planes. By this operation,

though the chromatic information is used in the FR system, the structural and correlation

data between successive pixels and spectral planes is lost as the image is processed as a

long vector. Tensorial analysis has been explored for high dimensional data such as faces and

gait in [46], where it was shown that preserving the structure and correlation in the data can

lead to a better recognition performance of pattern recognition systems. Analysis of color

inputs as 3-mode tensorial data would therefore preserve the structure and correlation in

data as well as the relation of the spectral planes with each other, especially in the case

of inputs of correlated color spaces.
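The difference between the two representations can be made concrete with a small sketch. It uses a hypothetical 2×3 image with three spectral planes; the function names are illustrative, and the tensor is only described by its shape rather than analyzed with the method of [46].

```python
def vectorize(planes):
    """Row-wise ordering of each spectral plane, then concatenation of planes."""
    return [pixel for plane in planes for row in plane for pixel in row]


def tensor_shape(planes):
    """Shape of the same data kept as a 3-mode tensor (rows, cols, planes)."""
    return (len(planes[0]), len(planes[0][0]), len(planes))


# Hypothetical 2x3 image with three spectral planes.
planes = [
    [[1, 2, 3], [4, 5, 6]],        # plane 1
    [[11, 12, 13], [14, 15, 16]],  # plane 2
    [[21, 22, 23], [24, 25, 26]],  # plane 3
]
vec = vectorize(planes)
print(vec)
print(tensor_shape(planes))

# Vertically adjacent pixels (1 and 4) are `cols` entries apart in the vector,
# and the same pixel location in two planes is rows * cols entries apart.
print(vec.index(4) - vec.index(1), vec.index(11) - vec.index(1))
```

In the vector, neighbouring pixels and corresponding pixels of different spectral planes are separated by offsets that depend on the image size, whereas in the 3-mode tensor they remain adjacent along one mode.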

Appendix A

Color CMU PIE database

In this appendix, a description of the images and their imaging conditions is provided,

so as to provide a sample of the faces used for experiments in this thesis. The images are

from the color CMU PIE database [40, 41]. This database is chosen for experimentation

as it consists of color faces in a wide range of imaging conditions, facial distortions

and variations in pose and viewpoint. The color CMU PIE database is licensed to the

University of Toronto and is permitted to be used solely for research purposes. The

sample images included in this appendix are of those subjects which are permitted for

use in published papers/ results.

The CMU PIE database [40, 41] consists of 41,368 images of 70 subjects. The images

in the color CMU PIE database are stored in RGB format and have a spatial dimension

of 480 x 640. Each subject was photographed under 13 different poses, 42 different

illumination conditions, and 4 different expressions. The database consists of two major

partitions, the first with pose and expression variation only, the second with pose and

illumination variation. The different categories of images available in the CMU

PIE database are explained in this appendix and summarized in Table A.1. Images and

other data are available in [40].

Table A.1: Details of CMU PIE database

Condition               No. of subjects   Other details
Pose and illumination   70                21 flash conditions, 13 poses
Pose and lighting       70                2 background conditions, 3 poses, 21 flash conditions
Pose and expression     70                3 expressions and talking, 13 poses (neutral illumination)

A.1 Pose and Illumination variation

This subset contains images of all 70 subjects having pose and illumination variations.

13 pose variations are captured and the poses consist of both horizontal and vertical

variations ranging from 0 to 90 degrees. The illumination conditions captured can be

classified into those where the room lights were off, and those where they were on. The

former is denoted as illumination and the latter as lighting. The images are captured by

varying positions of camera flash, thus leading to a total of 21 illumination and lighting

conditions each.

Illumination Images : The images are captured by varying positions of camera flash, in

a room with zero background light, thus leading to images with severe imaging conditions.

Each of the 21 illumination conditions is captured in all 13 poses, leading to a total of

273 samples per subject. Samples of these images are provided in Figure A.1.

Lighting Images : These images are captured by varying positions of camera flash

in neutral background light. The images have good illumination conditions, and are

depicted in Figure A.2. This set of images is typical of an office environment. Each of

the 21 illumination conditions is captured in 3 poses.


[Figure A.1 panels: rows Pose 07 and Pose 37; columns Flash 02, Flash 06, Flash 10]

Figure A.1: CMU PIE: Images with Pose and Illumination Variations : No Room Lights

[Figure A.2 panels: Flash 02, Flash 06, Flash 10; all images are of the Frontal Pose - 27]

Figure A.2: CMU PIE: Images with Pose and Illumination Variations : Room Lights On

A.2 Pose and Expression variation

This subset of the CMU PIE database consists of images of 70 subjects, having pose

and expression variations. Subjects are captured in all of the 13 poses in this subset. 3

different expressions are considered - smiling, neutral and blinking along with an image

of the subject talking. A sample of images from this category is provided in Figure A.3.

These images are captured in neutral illumination. If the subject wears spectacles,

images both with and without spectacles are included in this partition.

[Figure A.3 panels: Neutral Expression, Frontal Pose 27; Smiling Expression, Pose 05; Blinking Expression, Pose 37]

Figure A.3: CMU PIE: Images with Pose and Expression Variations : Room Lights On

Appendix B

Preprocessing Method

In this appendix, a description of the method used for preprocessing face images is

provided.

The images in the databases are in RGB format. They contain not only the face but

also irrelevant information such as the hair, neck, shoulder, background, etc. To avoid

incorrect evaluations, it is required to isolate the face from the remaining image. This

separation of the face takes place in the preprocessing stage. The preprocessed images

are then passed through the rest of the blocks of the FR system.

The sequence of preprocessing steps performed is as follows:

1. Each spectral plane of the initial color image (or the entire gray-scale image) is

translated, rotated and scaled to size 150×130, so that the centers of the eyes are

placed on definite pixels, and the distance between the eye centers is 70 pixels. Also,

the eye centers are placed on the 45th row. The distance of 70 pixels between eye

centers and the particular row on which to place the eyes are chosen such that the photometric

proportion of the face is maintained.

2. A standard mask is applied to this image of reduced dimension to remove the

non-face portions.

3. The image is converted to the respective color space or subspace / gray-scale format.

4. Each plane of the color image is normalized to zero mean and unit variance (if the

color space used is decorrelated). Since YCbCr is a decorrelated color space, each

color plane can be individually normalized. For gray scale images, this operation

is performed on the gray-scale image after a histogram equalization.

The preprocessing steps for a single spectral plane of a color image (or a gray scale

image) are illustrated in Figure B.1 on an image from the PIE database. The preprocessed

images are then represented as a column vector for further processing. The procedure

for conversion of a preprocessed image to a column vector is presented in Chapter 3.
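Two of the preprocessing steps listed above lend themselves to a short sketch: the geometric alignment from the eye coordinates (step 1) and the normalization of a plane to zero mean and unit variance (step 4). The code below only computes the rotation angle and scale factor; it does not resample the image, and the function names and the example eye coordinates are hypothetical. The exact procedure used in this thesis may differ.

```python
import math


def eye_alignment(left_eye, right_eye, eye_distance=70.0):
    """Tilt angle (radians) of the eye line and the scale that sets the eye distance.

    Eyes are given as (row, col) pixel coordinates in the input image.
    Rotating the image by the negative of the angle levels the eyes.
    """
    d_row = right_eye[0] - left_eye[0]
    d_col = right_eye[1] - left_eye[1]
    angle = math.atan2(d_row, d_col)
    scale = eye_distance / math.hypot(d_row, d_col)
    return angle, scale


def normalize_plane(pixels):
    """Normalize a flat list of pixel values to zero mean and unit variance."""
    n = len(pixels)
    mean = sum(pixels) / n
    std = math.sqrt(sum((p - mean) ** 2 for p in pixels) / n)
    if std == 0.0:
        # A constant plane carries no contrast; return zeros instead of dividing by 0.
        return [0.0] * n
    return [(p - mean) / std for p in pixels]


# Hypothetical eye centers, 50 pixels apart on a tilted line.
angle, scale = eye_alignment((100, 100), (130, 140))
print(math.degrees(angle), scale)

normalized = normalize_plane([10, 20, 30, 40])
print(normalized)
```

After rotation and scaling, a translation would place the eye centers on the 45th row; the columns on which the eyes are placed are not stated in this appendix and are therefore left out of the sketch.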

[Figure B.1 panels: Input Image; Eye Coordinates; Rotation; Resolution Scaling (150x130); Mask; Mask Application (Removal of unnecessary portions (hair, etc)); Preprocessed face]

Figure B.1: Steps to preprocessing a single spectral plane (or a gray scale) face image

Appendix C

YCbCr Color Space

In this appendix, the details of the YCbCr color transformations used in this thesis are

provided. The YCbCr color space was developed as part of the ITU-R Recommendation

BT.601 for digital video standards and television transmissions. This color space is

used in MPEG video compression standards and JPEG images [39].

The YCbCr is a decorrelated color transform and contains one intensity channel, Y,

and two chromatic channels, red (Cr) and blue (Cb). The Y spectral plane has 220 levels

ranging from 16 to 235, while the Cb and Cr spectral planes have 225 levels ranging

from 16 to 240. Values below 16 and above 235 are denoted as footroom and headroom,

respectively, and are reserved for other processing. Given an RGB image, we can derive the YCbCr

transformations using the following equation [39],

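\[
\begin{bmatrix} Y \\ Cb \\ Cr \end{bmatrix}
=
\begin{bmatrix} 16 \\ 128 \\ 128 \end{bmatrix}
+
\begin{bmatrix}
65.4810 & 128.5530 & 24.9960 \\
-37.7745 & -74.1592 & 111.9337 \\
111.9581 & -93.7509 & -18.2072
\end{bmatrix}
\begin{bmatrix} R \\ G \\ B \end{bmatrix}
\tag{C.1}
\]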

where the 8 bit values in the R, G, and B spectral planes are scaled to the closed interval

[0, 1].
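Equation (C.1) can be applied to a single pixel as in the following minimal sketch. The coefficients are copied from (C.1); the function name `rgb_to_ycbcr` and the division by 255 used to scale the 8 bit inputs to [0, 1] are assumptions made for illustration.

```python
OFFSET = (16.0, 128.0, 128.0)
MATRIX = (
    (65.4810, 128.5530, 24.9960),
    (-37.7745, -74.1592, 111.9337),
    (111.9581, -93.7509, -18.2072),
)


def rgb_to_ycbcr(r, g, b):
    """Apply Equation (C.1); r, g, b are 8 bit values in 0..255."""
    rgb = (r / 255.0, g / 255.0, b / 255.0)  # scale to [0, 1]
    return tuple(
        off + sum(m * v for m, v in zip(row, rgb))
        for off, row in zip(OFFSET, MATRIX)
    )


print(rgb_to_ycbcr(0, 0, 0))        # black
print(rgb_to_ycbcr(255, 255, 255))  # white
```

Black maps to the lower end of the Y range with both chromatic planes at their mid value of 128, and white maps to approximately the upper end of the Y range, again with Cb and Cr at approximately 128, since each chromatic row of the matrix sums to zero.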

In digital video transmission applications, the chromatic spectral planes of the YCbCr

color space are decimated or spatially sampled. The rationale behind this is that humans

see color with much less spatial resolution than black and white. Three sampling schemes

are used in MPEG and JPEG standards - 4:4:4, 4:2:2 and 4:2:0. Although this type

of compression is lossy, the resulting images used in video frames/ pictures have no

perceivable loss of clarity.

In YCbCr 4:4:4, no chromatic sampling is performed. The 8 bit values of each spectral

plane are used directly. The 4:2:2 scheme indicates horizontal subsampling by a factor

of 2, i.e., every alternate column of the Cb and Cr spectral planes is eliminated. The 4:2:0

scheme indicates both horizontal and vertical subsampling of the chromatic planes by a

factor of 2. In this sampling scheme, every alternate row and column of the Cb and Cr

spectral planes is eliminated. In all these sampling schemes, the Y intensity plane is not

spatially sampled. Figure C.1 presents a pictorial illustration of the various sampling

schemes.
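The three sampling schemes can be sketched on small chromatic planes as follows. In this sketch, 4:2:2 is taken to mean horizontal subsampling (the number of columns is halved) and 4:2:0 to mean both horizontal and vertical subsampling. Samples are simply dropped, as in the description above, whereas practical encoders typically filter or average before decimating; the function names and the 4×4 planes are hypothetical.

```python
def subsample(plane, row_step=1, col_step=1):
    """Keep every row_step-th row and every col_step-th column of a plane."""
    return [row[::col_step] for row in plane[::row_step]]


def chroma_subsample(cb, cr, scheme):
    """Subsample the Cb and Cr planes; the Y plane is never subsampled."""
    steps = {"4:4:4": (1, 1), "4:2:2": (1, 2), "4:2:0": (2, 2)}
    row_step, col_step = steps[scheme]
    return (subsample(cb, row_step, col_step),
            subsample(cr, row_step, col_step))


# Hypothetical 4x4 chromatic planes.
cb = [[r * 4 + c for c in range(4)] for r in range(4)]
cr = [[100 + r * 4 + c for c in range(4)] for r in range(4)]

for scheme in ("4:4:4", "4:2:2", "4:2:0"):
    cb_s, cr_s = chroma_subsample(cb, cr, scheme)
    print(scheme, len(cb_s), "x", len(cb_s[0]))
```

For a three plane input, 4:2:2 therefore keeps two thirds of the original samples and 4:2:0 keeps one half, which is the source of the compression.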

Figure C.1: Illustration of Chromatic sub sampling - Each sub figure is a YCbCr image

Bibliography

[1] A. O’Toole, P. Phillips, F. Jiang, J. Ayyad, N. Penard, and H. Abdi, “Face recognition

algorithms surpass humans matching faces over changes in illumination,” IEEE

Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 9, pp. 1642–1646,

Sept. 2007.

[2] P. Sinha, B. Balas, Y. Ostrovsky, and R. Russell, “Face recognition by humans:

Nineteen results all computer vision researchers should know about,” Proceedings of

the IEEE, vol. 94, no. 11, pp. 1948–1962, 2006.

[3] L. Torres, “Is there any hope for face recognition?” International workshop on Image

Analysis for Multimedia Interactive Services, April 2004.

[4] A. F. Abate, M. Nappi, D. Riccio, and G. Sabatino, “2d and 3d face recognition: A

survey,” Pattern Recognition Letters, vol. 28, no. 14, pp. 1885–1906, 2007.

[5] F. Samaria, “Face recognition using hidden markov models,” in PhD thesis, 1994.

[6] F. Samaria and S. Young, “HMM based architecture for face identification,” Image

and Visual Computing, vol. 12, pp. 537–583, 1994.

[7] L. Wiskott, J. M. Fellous, N. Kruger, and C. v. d. Malsburg, “Face recognition by elastic

bunch graph matching,” in CRC Press, 1999.

[8] C. Jones and A. I. Abbott, “Color face recognition by hypercomplex gabor analysis,”

7th International Conference on Automatic Face and Gesture Recognition, April

2006.

[9] J. Lu, K. N. Plataniotis, and A. N. Venetsanopoulos, “Regularization studies of linear

discriminant analysis in small sample size scenarios with application to face recognition,”

Pattern Recognition Letters, pp. 181–191, 2005.

[10] ——, “Face recognition using LDA-based algorithms,” IEEE Trans. on Neural Networks,

vol. 14, no. 1, pp. 195–200, Jan 2003.

[11] J. Lu, K. N. Plataniotis, A. N. Venetsanopoulos, and S. Li, “Ensemble-based discriminant

learning with boosting for face recognition,” IEEE Transactions on Neural

Networks, vol. 17, no. 1, pp. 166–178, Jan. 2006.

[12] J. Wang, K. Plataniotis, J. Lu, and A. Venetsanopoulos, “On solving the one face

recognition problem with one training sample per subject,” Pattern Recognition,

vol. 39, pp. 1746–1762, 2006.

[13] M. Turk and A. Pentland, “Face recognition using eigenfaces,” IEEE Computer

Society Conference on Computer Vision & Pattern Recognition, pp. 586–591, Jun

1991.

[14] P. Belhumeur, J. Hespanha, and D. Kriegman, “Eigenfaces vs. fisherfaces: recognition

using class specific linear projection,” Fourth European Conference on Computer

Vision, pp. 45–58, Apr 1996.

[15] S. Peichung and L. Chengjun, “Improving the face recognition grand challenge baseline

performance using color configurations across color spaces,” IEEE International

Conference on Image Processing, pp. 1001–1004, 8-11 Oct. 2006.

[16] L. Torres, J. Reutter, and L. Lorente, “The importance of color information in face

recognition,” IEEE International Conference on Image Processing, pp. 627–631,

1999.

[17] S. Peichung and L. Chengjun, “Comparative assessment of content based face image

retrieval in different colour spaces,” International Journal of Pattern Recognition,

vol. 19, no. 7, pp. 873–893, 2005.

[18] A. Yip and P. Sinha, “Role of color in face recognition,” Technical Report, Artificial

Intelligence Laboratory, MIT, December 2001.

[19] J. Wang, “Appearance based face recognition under small sample size scenario,” in

PhD thesis, 2007, vol. University of Toronto.

[20] P. Phillips, H. Moon, S. Rizvi, and P. Rauss, “The FERET evaluation methodology for

face recognition algorithms,” IEEE Transactions on Pattern Analysis and Machine

Intelligence, vol. 22, no. 10, pp. 1090–1104, Oct. 2000.

[21] T. Ganapathi and K. Plataniotis, “Color face recognition under various learning

scenarios,” IEEE Canadian Conference on Electrical and Computer Engineering,

2008.

[22] T. Ganapathi, K. Plataniotis, and Y. Ro, “Boosting chromatic information for face

recognition,” IEEE Canadian Conference on Electrical and Computer Engineering,

2008.

[23] A. Jain, A. Ross, and S. Prabhakar, “An introduction to biometric recognition,”

IEEE Transactions on Circuits and Systems for Video Technology, vol. 14, no. 1,

pp. 4–20, Jan. 2004.

[24] M. Sadeghi and J. Kittler, “A comparative study of data fusion strategies in face

verification,” 12th European Signal Processing Conference, pp. 1229–1232, 2004.

[25] M. Sadeghi, S. Khoushrou, and J. Kittler, “Confidence based gating of colour features

for face authentication,” in Multiple Classifier Systems. Springer, 2007, vol.

4472, pp. 121–130.

[26] J. Kittler and M. Sadeghi, “Physics based decorrelation of image data for decision

level fusion in face verification,” Multiple Classifier Systems, pp. 354–363, 2004.

[27] M. Sadeghi, S. Khoushrou, and J. Kittler, “SVM based selection of color space

experts for face authentication,” in International conference on Bioinformatics.

Springer, 2007, vol. 4642, pp. 907–916.

[28] P. Phillips, H. Wechsler, J. Huang, and P. Rauss, “The FERET database and evaluation

procedure for face recognition algorithms,” Image and Vision Computing Journal,

vol. 16, no. 5, pp. 295–306, 1998.

[29] P. Phillips, P. Flynn, T. Scruggs, K. Bowyer, and W. Worek, “Preliminary face

recognition grand challenge results,” in Seventh International Conference on Automatic

Face and Gesture Recognition, UK, 2006.

[30] J. Kittler, M. Hatef, R. Duin, and J. Matas, “On combining classifiers,” IEEE

transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 3, pp. 226–

238, Mar. 1998.

[31] K. Messer, J. Matas, J. Kittler, J. Luettin, and G. Maitre, “XM2VTSDB: The

extended M2VTS database,” in International conference on Audio- and Video-Based

Biometric Person Authentication, 1999.

[32] E. B.Bailliere, S. Bengio, F. Bimbot, M. Hamouz, J. Kittler, J. Mariethoz, J. Matas,

K. Messer, V. Popovici, F. Poree, B. Ruiz, and J. Thiran, “The BANCA database

and evaluation protocol,” in International conference on Audio and Video-Based

Biometric Person Authentication, 2003, pp. 625–638.

[33] M. Sadeghi and J. Kittler, “Decision making in the LDA space: Generalised gradient

direction metric,” in International Conference on Automatic Face and Gesture

Recognition, 2004, pp. 248–253.

[34] B. Moghaddam, T. Jebara, and A. Pentland, “Bayesian face recognition,” Pattern

Recognition, vol. 33, no. 11, pp. 1771–1782, November 2000.

[35] B. Scholkopf, A. Smola, and K. Muller, “Non linear component analysis as a kernel

eigen value problem,” Neural Computation, vol. 10, pp. 1299–1319, 1999.

[36] J. Lu, K. N. Plataniotis, and A. N. Venetsanopoulos, “Face recognition using kernel

direct discriminant analysis algorithms,” IEEE Transactions on Neural Networks,

vol. 14, no. 1, pp. 117–126, Jan 2003.

[37] R. Duda, P. Hart, and D. Stork, Pattern Classification. John Wiley, 2000.

[38] S. J. Raudys and A. Jain, “Small sample size effects in statistical pattern recognition:

recommendations for practitioners,” IEEE Transactions on Pattern Analysis and

Machine Intelligence, vol. 13, no. 3, pp. 252–264, Mar 1991.

[39] Z. Li and M. Drew, Fundamentals of Multimedia. Prentice Hall, 2004.

[40] T. Sim, S. Baker, and M. Bsat, “The CMU pose, illumination and expression

database,” in Fifth International Conference on Automatic Face and Gesture Recognition,

Washington, D.C., 2002.

[41] ——, “The CMU pose, illumination, and expression (PIE) database of human faces,”

The Robotics Institute, Carnegie Mellon University, Tech. Rep. CMU-RI-TR-01-02,

January 2001.

[42] M. Kamel and N. Wanas, “Data dependence in combining classifiers,” in Multiple

Classifier Systems. Springer, 2003, vol. 2709, pp. 1–14.

[43] N. V. Chawla and K. Bowyer, “Designing multiple classifier systems for face recognition,”

in Multiple Classifier Systems. Springer, 2005, vol. 3541, pp. 407–416.

[44] H. Yu and J. Yang, “A direct LDA algorithm for high-dimensional data with application

to face recognition,” Pattern Recognition, vol. 34, pp. 2067–2070, 2001.

[45] M. Skurichina, L. Kuncheva, and R. Duin, “Bagging and boosting for the nearest

mean classifier: Effects of small sample size on diversity and accuracy,” in Multiple

Classifier Systems. Springer, 2002, vol. 2364, pp. 62–71.

[46] H. Lu, K. N. Plataniotis, and A. N. Venetsanopoulos, “MPCA: Multilinear principal

component analysis of tensor objects,” IEEE Transactions on Neural Networks,

vol. 19, pp. 18–39, January 2008.
