Color Image Based Face Recognition
by
Tejaswini Ganapathi
A thesis submitted in conformity with the requirements for the degree of Master of Applied Science
Graduate Department of Electrical and Computer Engineering
University of Toronto
Copyright © 2008 by Tejaswini Ganapathi
Abstract
Color Image Based Face Recognition
Tejaswini Ganapathi
Master of Applied Science
Graduate Department of Electrical and Computer Engineering
University of Toronto
2008
Traditional appearance based face recognition (FR) systems use gray scale images;
however, attention has recently been drawn to the use of color images. Color inputs have
a larger dimensionality, which increases the computational cost, and makes the small
sample size (SSS) problem in supervised FR systems more challenging. It is therefore
important to determine the scenarios in which usage of color information helps the FR
system.
In this thesis, the inclusion of chromatic information in FR systems
is shown to be particularly advantageous in poor illumination conditions. In supervised
systems, a color input of optimal dimensionality would improve the FR performance
under SSS conditions. A fusion of decisions from individual spectral planes also helps in
the SSS scenario. Finally, chromatic information is integrated into a supervised ensemble
learner to address pose and illumination variations. This framework significantly boosts
FR performance under a range of learning scenarios.
Acknowledgements
I would like to sincerely thank my research adviser, Prof. Kostas Plataniotis for
his guidance and insightful inputs, which helped me a lot during my thesis work. His
encouragement and thoughts were very helpful during my graduate studies.
I would also like to thank my thesis proposal and committee members for taking
time out of their busy schedules and reviewing my work, offering valuable comments and
suggestions. I also acknowledge the financial support from the Ontario Graduate Scholarship,
Department of Electrical and Computer Engineering at University of Toronto and
Prof. Kostas Plataniotis during the period of my graduate studies.
Finally, I would like to thank close friends (you know who you are!) and my lab mates
for their company, encouragement, and most of all for being there, without which this
journey would have been a very difficult one.
Last but not the least, I would like to thank my family members for their constant
encouragement, support and care.
Contents
1 Introduction 1
1.1 Color based Face Recognition . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 Face Recognition: Modes of Operation, and Target Applications . . . . . 4
1.3 Challenges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.4 Thesis Contributions and Organization . . . . . . . . . . . . . . . . . . . 6
2 Prior Work and Background 8
2.1 Face Recognition System . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.2 Color Face Recognition: A Survey . . . . . . . . . . . . . . . . . . . . . . 12
2.3 Motivation of this Thesis . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3 Color face recognition in different learning scenarios 20
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.2 Representation of Color Information . . . . . . . . . . . . . . . . . . . . . 22
3.3 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
3.3.1 Principal Component Analysis . . . . . . . . . . . . . . . . . . . . 24
3.3.2 Linear Discriminant Analysis . . . . . . . . . . . . . . . . . . . . 25
3.4 Color and the Small Sample Size Problem . . . . . . . . . . . . . . . . . 27
3.4.1 Small Sample Size Problem . . . . . . . . . . . . . . . . . . . . . 27
3.4.2 Implication of Color Inputs . . . . . . . . . . . . . . . . . . . . . 27
3.5 Methodology and Experimental Setup . . . . . . . . . . . . . . . . . . . . 29
3.6 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.6.1 Choice of Gray scale baseline and Similarity Metric . . . . . . . . 31
3.6.2 Unsupervised Learning . . . . . . . . . . . . . . . . . . . . . . . . 33
3.6.3 Supervised Learning . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.7 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
3.8 Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
4 Decision Level Fusion of Spectral Planes 49
4.1 Introduction and Objective . . . . . . . . . . . . . . . . . . . . . . . . . . 49
4.2 Combination Strategies . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
4.3 Methodology and Experimental Setup . . . . . . . . . . . . . . . . . . . . 54
4.4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
4.4.1 Choice of Aggregation Rule . . . . . . . . . . . . . . . . . . . . . 57
4.4.2 FR Performance: Poor Illumination conditions . . . . . . . . . . . 59
4.4.3 FR Performance: Good Illumination conditions . . . . . . . . . . 62
4.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
4.6 Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
5 Color Face Recognition in Ada-Boost framework 67
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
5.2 Motivation: Ada-Boost Learning . . . . . . . . . . . . . . . . . . . . . . 69
5.3 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
5.3.1 Regularized Direct LDA . . . . . . . . . . . . . . . . . . . . . . . 72
5.3.2 Ada-Boost framework . . . . . . . . . . . . . . . . . . . . . . . . 74
5.4 Possible Implication of color in the Ada-Boost framework . . . . . . . . . 77
5.5 Methodology and Experimental Setup . . . . . . . . . . . . . . . . . . . . 80
5.6 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
5.6.1 Implication of Color . . . . . . . . . . . . . . . . . . . . . . . . . 86
5.6.2 Implication of ensemble learning . . . . . . . . . . . . . . . . . . . 88
5.7 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
5.8 Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
6 Conclusion and Future Research 92
6.1 Research Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
6.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
A Color CMU PIE database 97
A.1 Pose and Illumination variation . . . . . . . . . . . . . . . . . . . . . . . 98
A.2 Pose and Expression variation . . . . . . . . . . . . . . . . . . . . . . . . 99
B Preprocessing Method 101
C YCbCr Color Space 104
Bibliography 107
List of Tables
3.1 Learning Scenarios encountered in face recognition problems . . . . . . . 22
3.2 Best Color/ Gray scale transformations in Extreme Small Sample Size
scenario . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
4.1 Decision Fusion: Rank 1 CRR in % (YCbCr 4:4:4, Database DB1 ) . . . . 60
4.2 Decision Fusion: Rank 1 CRR in % - (YCbCr 4:2:2, Database DB1 ) . . . 60
4.3 Decision Fusion: Rank 1 CRR in % - (YCbCr 4:2:0, Database DB1 ) . . . 60
4.4 Decision Fusion: Rank 5 CRR in % (YCbCr 4:4:4, Database DB1 ) . . . . 61
4.5 Decision Fusion: Rank 5 CRR in % - (YCbCr 4:2:2, Database DB1 ) . . . 61
4.6 Decision Fusion: Rank 5 CRR in % - (YCbCr 4:2:0, Database DB1 ) . . . 61
4.7 Decision Fusion: Rank 1 CRR in % (YCbCr 4:4:4, Database DB2 ) . . . . 63
4.8 Decision Fusion: Rank 1 CRR in % - (YCbCr 4:2:2, Database DB2 ) . . . 63
4.9 Decision Fusion:Rank 1 CRR in % - (YCbCr 4:2:0, Database DB2 ) . . . 63
4.10 Decision Fusion: Rank 5 CRR in % (YCbCr 4:4:4, Database DB2 ) . . . . 64
4.11 Decision Fusion: Rank 5 CRR in % - (YCbCr 4:2:2, Database DB2 ) . . . 64
4.12 Decision Fusion: Rank 5 CRR in % - (YCbCr 4:2:0, Database DB2 ) . . . 64
5.1 Results obtained with ada-boost.M2 & R-LDA using color & gray scale
transformations in different learning scenarios . . . . . . . . . . . . . . . 85
5.2 Best Performances obtained by using the color space counterpart over the
corresponding gray scale over different learning tasks . . . . . . . . . . . 87
5.3 Best Performances obtained by boosting the R-LDA learner for different
inputs and learning tasks . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
A.1 Details of CMU PIE database . . . . . . . . . . . . . . . . . . . . . . . . 98
List of Figures
1.1 Various approaches to Face Recognition . . . . . . . . . . . . . . . . . . . 2
2.1 General FR system . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.2 General multiple classifier FR system . . . . . . . . . . . . . . . . . . . . 11
2.3 Past Works in color FR . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
3.1 Color FR System Description . . . . . . . . . . . . . . . . . . . . . . . . 29
3.2 Comparison of gray scale transformations and similarity metrics . . . . . 32
3.3 Rank 1 performance of YCbCr transformations with PCA feature extrac-
tor (Unsupervised Learning), database DB1 . . . . . . . . . . . . . . . . 34
3.4 Rank 1 performance of YCbCr transformations with PCA feature extrac-
tor (Unsupervised Learning), database DB2 . . . . . . . . . . . . . . . . 35
3.5 Rank 5 performance of YCbCr transformations with PCA feature extrac-
tor (Unsupervised Learning), database DB1 . . . . . . . . . . . . . . . . 36
3.6 Rank 5 performance of YCbCr transformations with PCA feature extrac-
tor (Unsupervised Learning), database DB2 . . . . . . . . . . . . . . . . 37
3.7 Rank 1 performance of YCbCr transformations with LDA feature extractor
(Supervised Learning), database DB1 . . . . . . . . . . . . . . . . . . . . 40
3.8 Rank 1 performance of YCbCr transformations with LDA feature extractor
(Supervised Learning), database DB2 . . . . . . . . . . . . . . . . . . . . 41
3.9 Rank 5 performance of YCbCr transformations with LDA feature extractor
(Supervised Learning), database DB1 . . . . . . . . . . . . . . . . . . . . 42
3.10 Rank 5 performance of YCbCr transformations with LDA feature extractor
(Supervised Learning), database DB2 . . . . . . . . . . . . . . . . . . . . 43
4.1 Color FR: Multiple Classifier System Diagram . . . . . . . . . . . . . . . 56
4.2 Comparison of Aggregation Rules . . . . . . . . . . . . . . . . . . . . . . 58
5.1 Training the Ada-Boost Ensemble- Generic Diagram . . . . . . . . . . . . 75
5.2 Pseudocode: Ada-Boost framework . . . . . . . . . . . . . . . . . . . . . 78
5.3 Color FR: Ada-Boost System Description . . . . . . . . . . . . . . . . . . 83
A.1 CMU PIE: Images with Pose and Illumination Variations : No Room Lights 99
A.2 CMU PIE: Images with Pose and Illumination Variations : Room Lights On 99
A.3 CMU PIE: Images with Pose and Expression Variations : Room Lights On 100
B.1 Image preprocessing method . . . . . . . . . . . . . . . . . . . . . . . . . 103
C.1 Illustration of Chromatic sub sampling in YCbCr . . . . . . . . . . . . . 106
List of Abbreviations
FR Face Recognition
PCA Principal Component Analysis
LDA Linear Discriminant Analysis
R-LDA Regularized Linear Discriminant Analysis
LDD Learning Difficulty Degree
Ada-boost Adaptive Boosting
FRR False Rejection Rate
FAR False Acceptance Rate
CRR Correct Recognition Rate
MPEG Moving Picture Experts Group
JPEG Joint Photographic Experts Group
Chapter 1
Introduction
Face Recognition (FR) is the process of recognizing an individual using facial features.
FR is a technology with applications ranging from security related tasks, such as monitoring,
surveillance and identity authentication, to human computer interaction and face based
video indexing. Recently, an increase in security concerns worldwide has focused the
attention of researchers and the public on the accuracy of computerized FR systems. It
has been reported that the accuracy of FR algorithms is comparable to or better than
that of recognition by humans when the given face images are subjected to difficult
conditions like varying illumination, pose and resolution. However, none of the existing
FR methods is totally robust to these conditions [1, 2, 3], so automatic FR remains a promising
research area.
Various 2-d and 3-d methods have been proposed in past literature for FR, and are
reported in a recent survey [4]. The 2-d methods are more popular than 3-d methods,
as research on the latter is relatively new and poses challenges such as a difficult data
acquisition process, alignment of 3-d meshes, and faces with occlusions (e.g. spectacles)
that cannot be properly dealt with. The 2-d methods in literature can be classified into
appearance based methods and feature based methods. The feature based approach is
based on the localization of face features like eyes, eyebrows, nose and mouth. Information
about their geometric characteristics, relative positions and other statistics is
used to describe faces. Examples of feature based approaches are those based on Hidden
Markov Models [5, 6], Elastic Bunch Graph Matching [7] and Gabor wavelets [8]. Although
feature based methods might lead to good FR performance, they have a major
disadvantage: their performance relies heavily on the accurate localization of face feature
areas. Various lines of approach to automatic FR systems are presented in Figure 1.1.
Figure 1.1: Various approaches to Face Recognition (diagram nodes: FR Algorithms; 2D, 3D; Appearance Based, Feature Based; Gray Scale Image, Colour Image)
Appearance based methods treat the face as a holistic 2-d pattern and focus on creating
a low dimensional statistical representation of the face. In this class of methods,
the face is represented by a vector/matrix of pixel intensity values and the FR algorithm
focuses on projecting these vectors/matrices onto a lower dimensional discriminative face
space in which recognition is performed. FR is therefore viewed as a multivariate statistical
problem. This class of methods avoids challenges relating to localization of features
in face images and 3-d modeling, and is reported to be very successful in past literature
[9, 10, 11, 12] when applied to large and complex databases. Although humans
can recognize persons based on certain face features, the geometric interdependency between
different face features contributes more to the recognition process than a particular
feature alone. In other words, humans tend to treat a face as a holistic pattern while
performing the process of recognition. This argument, along with the results demonstrated
by appearance based methods in past literature, is the motivation for using the
appearance based approach to FR in this thesis.
1.1 Color based Face Recognition
Gray scale or intensity images have been traditionally used in appearance based FR
systems and have been reported to lead to good performance under favorable imaging
conditions of uniform illumination and minimal pose variations [9, 10, 13, 14]. However,
gray scale images are severely affected by harsh illumination conditions and poor
resolution. Because these images contain only intensity information, the shape cues
present in them are largely destroyed under these conditions, thus making recognition
difficult. Variations due to bad imaging conditions, pose and expression are
sometimes larger than variations between images of different persons, and hence are
crucial to address.
Recently, attention has been drawn to using the information in color spaces to improve
the performance of FR systems. Previous works [8, 15, 16, 17] have shown that chromatic
information in conjunction with intensity information leads to better FR performance in
comparison with the usage of gray scale information alone. Color features make object
recognition more robust against image variations such as illumination [17, 18].
According to a recent study on human face perception [2], faces can differ from each
other in two ways: their shape cues and their pigmentation/color cues. The color
cues give information about the texture and surface reflectance of the face, as well as
the particular hue of the hair or skin, which might aid the human identification process.
Also, when shape cues are degraded (this happens to the intensity image in conditions of
bad illumination and poor resolution), the human brain uses color to pinpoint identity.
A recent study [18] asserts that although the observed colors can change significantly
under different illumination conditions, the human visual system uses color cues for
segmentation of features within a face, especially when the shape cues are degraded. Both
human face perception studies and recent works on computerized FR algorithms [2,
15, 17] support the hypothesis that chromatic information could supplement intensity
information in automatic FR systems, which is the motivation for using color images as
inputs to appearance based FR methods in this thesis.
1.2 Face Recognition: Modes of Operation, and Target Applications
FR systems can operate in 3 modes: identification, authentication and watch list [19, 20].
In the identification mode, the FR system compares the identity of an unknown person
to all the enrolled persons in the face database, and thus reveals the identity of the
unknown person. The FR system solves a 1 : N problem in this mode, where N is the
number of subjects enrolled in the face database. This mode has applications in the
area of surveillance and the system performance is measured by the fraction of unknown
images correctly identified.
In the authentication or verification mode, the FR system verifies the identity claim of
the unknown person. The FR system compares the unknown face (input) to the claimed
identity in the database and makes a decision to accept or reject the claim. In this mode,
the FR system solves a 1:1 problem. This mode has applications in access control. The
system performance is measured by the correct accept rate versus the false accept/reject
rate, depending on the sensitivity of the application.
In both of the above modes, the assumption is that the unknown person has been
enrolled into the face database. In the watch list mode, the FR system first checks for
the presence of the unknown person (input) in the face recognition database, and if the
person is present, identifies the person. When the FR system operates in this mode, the size of the database
is usually very small compared to the number of query images. The system performance is measured
by the correct detection rate, correct recognition rate and false accept rate, and this mode
could have applications in crime investigation and related domains.
In this thesis, our main aim is to solve complex FR problems using information in
color spaces, and we operate in the identification mode.
1.3 Challenges
Appearance based approaches are statistical methods which process the face as a holistic
pattern. In practical FR scenarios, such as insufficient faces available for training, complex
imaging conditions or other facial distortions, these methods face statistical
challenges. The key technical barriers are summarized in this section.
1. High dimensionality of training inputs & insufficient learning samples: Face images
are typically represented as vectors of pixel values. For example, a 150 × 130
resolution face image is represented as a vector of dimension 19500. In contrast,
the number of samples available per subject for training the FR algorithm is usually
less than 10. This leads to statistical problems like matrix singularities and
biased estimation of parameters. This scenario is referred to as the small sample
size (SSS) problem and could corrupt the design of the FR system, especially if the
FR algorithm uses the identity information in training. When the faces used are
color images, the dimensionality of the faces increases as a function of the number
of spectral planes in the face images and the sampling structure of the color space
involved, thus leading to a more severe small sample size problem. Furthermore,
the high dimensionality of face inputs also poses many computational challenges.
2. Complexities in face patterns: In practical FR systems, faces are subjected to pose
and expression variations and to bad imaging conditions like severe illumination variations
and poor resolution. These distortions and conditions are the complexities in
face patterns. Appearance based approaches are traditionally linear methods
and cannot learn complexities in face patterns and imaging conditions effectively.
Therefore, creating robust FR systems which can obtain discriminative information
from faces under these conditions is a major challenge.
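To make the dimensionality argument in item 1 concrete, the following minimal sketch counts the length of a vectorized YCbCr face under the common chroma sub sampling schemes. The function name and the assumption that the three planes are simply stacked into one vector are illustrative and not taken from the thesis; the fractions are the standard ones for 4:4:4, 4:2:2 and 4:2:0 sampling.

```python
def ycbcr_vector_dim(height, width, subsampling="4:4:4"):
    """Length of the vector obtained by stacking the Y, Cb and Cr planes.

    Hypothetical helper: assumes each chroma plane keeps a fixed fraction
    of the luma samples and that image sides divide evenly.
    """
    luma = height * width
    # Fraction of the luma resolution retained by EACH chroma plane.
    chroma_fraction = {"4:4:4": 1.0, "4:2:2": 0.5, "4:2:0": 0.25}[subsampling]
    return int(luma + 2 * luma * chroma_fraction)


if __name__ == "__main__":
    for scheme in ("4:4:4", "4:2:2", "4:2:0"):
        print(scheme, ycbcr_vector_dim(150, 130, scheme))
```

For the 150 × 130 example above, this gives 58500, 39000 and 29250 dimensions respectively, against 19500 for the gray scale image, while the number of training samples per subject stays the same.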
1.4 Thesis Contributions and Organization
In this thesis, an in-depth analysis of the usage of color information for face recognition
is provided, along with an analysis of the behavior of color inputs and FR algorithms in
different learning scenarios, imaging conditions, and facial distortions.
In Chapter 2, a review of past literature where chromatic inputs have been used in FR
systems is provided. This includes works where chromatic information has been used
as input to the FR system and works on decision level fusion of classifiers trained on different
chromatic inputs. The subsequent chapters present the methods developed in this thesis:
1. In Chapter 3, the learning scenarios and imaging conditions under which the use
of color images significantly improves the FR performance were examined, in both
supervised and unsupervised learning modes, for the YCbCr color space. The behavior
of chromatic inputs with different sub sampling ratios in the small sample
size scenario, which is a special case of the learning scenarios examined, was analyzed
for the supervised learning mode. Experiments were conducted on two evaluation
databases which had moderate and severe illumination conditions. This work was
partially published in [21].
2. In Chapter 4, a decision level combination of classifiers trained on different spectral
planes of the YCbCr color space transformations was examined using rule based
fusion methods. The motivation behind this was to produce diverse classifiers
using the YCbCr color space, and to reduce the small sample size problem by using
chromatic information in a decision fusion framework. An analysis of the behavior
of this framework in small sample size scenarios and of the effect of chromatic
sub sampling was performed under different imaging and learning conditions. The
evaluation databases were the same as those used in Chapter 3.
3. In Chapter 5, complexities in face patterns (expression and pose variations) and imaging
conditions (severe illumination conditions and poor resolution) were addressed
by combining the advantages of chromatic inputs (in addressing bad imaging conditions)
and supervised learning with ensemble learning (in learning complex face
patterns). An adaptive boosting (ada-boost) framework was used with a learner
consisting of a direct LDA feature extractor and a nearest center classifier. The
behavior of this framework was examined in various small sample size scenarios to
analyze the contribution of chromatic information and boosting in a range of learning
scenarios. Experiments were performed on a large evaluation database having
severe illumination and pose variations. This work was published in [22].
To the author's knowledge, this thesis is the first work to examine the effect of the
small sample size problem created by the increased dimensionality of vectorized color
inputs on supervised systems. This thesis concludes in Chapter 6 with a summary of the
work along with future research directions.
Chapter 2
Prior Work and Background
Chromatic information has been used for object detection in a large number of works;
however, it was not traditionally applied in the recognition domain. In recent works, the
use of color images for FR purposes has been shown to improve the system recognition
performance. This chapter concludes by providing an insight into the motivations for
this thesis.
2.1 Face Recognition System
The face images contain irrelevant information along with the face, which includes the
background, hair, etc. In the preprocessing stage, the face is isolated from the rest of
the image. The face is then represented as a column vector for further processing. An
FR system consists of a training stage and a testing stage. The training stage focuses on
the creation of a low dimensional feature space, onto which the face data are projected and in which
face patterns are well clustered and separated, as the original face inputs are usually of
a very high dimensionality ($\approx 10^4$). This takes place in the feature extraction step. In
the testing stage, the face inputs are projected onto this low dimensional space.
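The training and testing stages described above can be sketched as follows, using PCA as one possible feature extractor. This is a minimal illustration under stated assumptions, not the thesis implementation: the function names are hypothetical, faces are assumed to be already preprocessed and vectorized into the rows of a NumPy array, and the basis is obtained from an SVD of the centered training data.

```python
import numpy as np


def fit_pca(train, n_components):
    """Training stage: learn a low dimensional orthonormal feature basis.

    train: array of shape (n_samples, n_pixels), one vectorized face per row.
    Returns the mean face and a basis of shape (n_pixels, n_components).
    """
    mean = train.mean(axis=0)
    # SVD of the centered data avoids forming the n_pixels x n_pixels covariance.
    _, _, vt = np.linalg.svd(train - mean, full_matrices=False)
    return mean, vt[:n_components].T


def project(faces, mean, basis):
    """Testing stage: project gallery or probe vectors onto the learned basis."""
    return (faces - mean) @ basis
```

With far fewer training faces than pixels, at most (number of samples − 1) components carry any variance, which is one way the small sample size problem shows up in practice.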
The outputs and inputs of the FR system in the testing stage depend on the mode
of operation of the FR system. In past works, FR systems have been operated in two
modes: identification and verification (or authentication). The difference between the two modes
lies in the state of knowledge of the system regarding the identity of a subject. A general
framework depicting an FR system in identification/verification mode is illustrated in
Figure 2.1.
Identification mode: Given a database, or a gallery, consisting of images of known
identity, the aim of the FR system is to identify the input image or the probe whose
identity is unknown. In the testing stage, both the gallery and probe data are projected
onto the lower dimensional feature space created in the training stage. Classification is
performed in this lower dimensional space. The output of the FR system in this mode is
the identity of the probe image. During the identification process, the system has no prior
knowledge about the identity of the unknown subject. The performance of identification
systems is measured by the Correct Recognition Rate (CRR). The correct recognition rate at
rank k is the ratio of the number of probes whose correct identity appears among the top k candidates to
the total number of probe images, expressed as a percentage. When k=1, this becomes the
fraction of probe images correctly identified.
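The rank k correct recognition rate defined above can be computed from a probe-to-gallery distance matrix. The sketch below is illustrative only: the function name and the distance-matrix layout are assumptions, and smaller distances are taken to mean greater similarity.

```python
import numpy as np


def crr_at_rank(distances, gallery_ids, probe_ids, k=1):
    """Percentage of probes whose true identity is among the k nearest identities.

    distances: (n_probes, n_gallery) array, smaller means more similar.
    gallery_ids, probe_ids: identity labels of the gallery and probe images.
    """
    distances = np.asarray(distances, dtype=float)
    gallery_ids = np.asarray(gallery_ids)
    hits = 0
    for row, true_id in zip(distances, probe_ids):
        ranked_ids = gallery_ids[np.argsort(row)]  # nearest gallery entries first
        # Keep only the first occurrence of each identity, in ranked order,
        # so several gallery images of one person count as one candidate.
        _, first = np.unique(ranked_ids, return_index=True)
        candidates = ranked_ids[np.sort(first)][:k]
        hits += int(true_id in candidates)
    return 100.0 * hits / len(probe_ids)
```

Calling the function with k=1 and k=5 gives the rank 1 and rank 5 CRR values of the kind reported in the result tables of later chapters.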
Authentication mode: The aim of the FR system is to verify the claimed identity of
the probe, by comparing it with the corresponding image in the gallery. This is a one
to one problem in contrast to identification which is a one to many problem. In the
testing stage, both the claimed identity from the gallery and the unknown probe image
are projected onto the lower dimensional subspace created in the training stage, and
matching is performed. The output of the system is an accept/reject decision on the claim of
the unknown person (probe), based on the distance between the projected probe and the
claimed identity. The performance of the system in the verification mode is measured
by the false acceptance rate (FAR), false rejection rate (FRR) and the total error rate
(which is the sum of the two) [23]. The FAR and FRR are computed as

$$\mathrm{FAR} = \frac{\text{Number of subjects wrongly authenticated}}{\text{Total number of intruders}}$$

$$\mathrm{FRR} = \frac{\text{Number of subjects wrongly rejected}}{\text{Total number of subjects}}$$

The Equal Error Rate (EER) is the rate at which the FAR equals the FRR. A smaller
EER indicates a better FR system. A trade off is involved in achieving a low FAR and a low
FRR, as it is hard to achieve both simultaneously. In face verification systems with
sensitive applications, the focus is to maximize the total recognition rate for a minimum
FAR.

Figure 2.1: Block diagram of an appearance based FR system. This FR system architecture is used in [15, 17]. (Diagram blocks: Training Data; Preprocessing and Vectorization of input; Feature Extractor; Orthogonal Feature Basis; Probe Image; Gallery Set (identification)/Claimed Identity (verification); Projection onto Feature Basis; Classification; Probe ID. Diagram note: depending on the mode of operation of the FR system, the output of the testing stage is the Probe ID (identification) or Accept/Reject of Claim (verification).)
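The FAR, FRR and EER definitions given in this section can be illustrated with a short sketch for a distance-threshold verifier. The function names are hypothetical, a claim is assumed to be accepted when the probe-to-claimed-identity distance does not exceed the threshold, and the FRR here is normalized by the number of genuine claims, which is an assumption about how the denominator is counted. The EER is approximated by scanning the observed distances as thresholds.

```python
import numpy as np


def far_frr(genuine_dist, impostor_dist, threshold):
    """FAR and FRR of a verifier that accepts when distance <= threshold."""
    genuine_dist = np.asarray(genuine_dist, dtype=float)
    impostor_dist = np.asarray(impostor_dist, dtype=float)
    far = float(np.mean(impostor_dist <= threshold))  # intruders wrongly accepted
    frr = float(np.mean(genuine_dist > threshold))    # genuine claims wrongly rejected
    return far, frr


def approx_eer(genuine_dist, impostor_dist):
    """Approximate EER: threshold where FAR and FRR are closest to each other."""
    thresholds = np.unique(np.concatenate([genuine_dist, impostor_dist]))
    rates = [far_frr(genuine_dist, impostor_dist, t) for t in thresholds]
    far, frr = min(rates, key=lambda r: abs(r[0] - r[1]))
    return (far + frr) / 2.0
```

Raising the threshold lowers the FRR but raises the FAR, which is the trade off mentioned above.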
Figure 2.2: Generic block diagram of FR system architecture in [16, 24, 25, 26, 27]. (Diagram blocks: Training Set, Probe Image and Gallery Set (identification)/Claimed Identity (verification), each split into Input 1, Input 2, ..., Input K; Preprocessing and Vectorization of input; Feature Extractor; Feature Basis 1 to K; Projection onto Feature Basis 1 to K; Similarity Computation S1 to SK; Decision Fusion; Probe ID/Accept-Reject. Diagram note: depending on the mode of operation of the FR system, the output of the testing stage is the Probe ID (identification) or Accept/Reject of claim (verification).)
The FR system in Figure 2.1 is extended to a multiple classifier FR system in Figure
2.2. In this figure, the system is trained on different inputs to create a set of low dimensional
subspaces. In the testing stage, the classifiers trained on the different inputs are fused at
the decision level using an aggregation method.
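As an illustration of decision level fusion, the sketch below combines K probe-to-gallery distance matrices (for example, one per spectral plane) with a sum rule after min-max normalization. The sum rule and the normalization are assumptions chosen for illustration; they are one common aggregation method and are not necessarily the rule used in the works cited here. Function names are hypothetical.

```python
import numpy as np


def fuse_sum_rule(score_matrices):
    """Sum-rule fusion of K distance matrices of shape (n_probes, n_gallery).

    Each matrix is min-max normalized to [0, 1] so that classifiers with
    different score ranges contribute comparably. Smaller means more similar.
    """
    normalized = []
    for s in score_matrices:
        s = np.asarray(s, dtype=float)
        span = s.max() - s.min()
        normalized.append((s - s.min()) / span if span > 0 else np.zeros_like(s))
    return np.sum(normalized, axis=0)


def identify(fused, gallery_ids):
    """Identification: assign each probe the identity with the smallest fused distance."""
    return np.asarray(gallery_ids)[np.argmin(fused, axis=1)]
```

In the test case used to check this sketch, the first classifier alone prefers the second gallery identity, while the fused score picks the first, showing how the additional inputs can overturn a single classifier's decision.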
2.2 Color Face Recognition: A Survey
Notable past works in the domain of color FR can be classified into 2 groups: those in
which the input to the feature extractor is the raw information contained in color spaces,
and those where the feature extractor operates on the information in the frequency
domain, i.e., information in color spaces is operated upon by a filter.
Figure 2.3: Past Works in color FR. (Diagram content, root node "Color FR":
Feature based, frequency domain; Color Space Domain.
Raw data level fusion: recognition depends on color space.
Decision level fusion: color spaces contain complementary information, use of decorrelated spectral planes.
Works shown: L. Torres et al, 1999, The importance of Color Information in FR;
C. Jones III et al, 2006, Color FR by Hypercomplex Gabor Analysis;
P. Shih et al, 2005, Comparative assessment of Content Based Face Image Retrieval in different color spaces;
P. Shih et al, 2006, Improving the FRGC baseline performance using color configurations across color spaces;
J. Kittler et al, 2004, Physics based decorrelation of Image Data for decision level fusion in Face Verification;
M.T. Sadeghi et al, 2007, Confidence based gating of Color Features for Face Authentication, and SVM based selection of color space experts for face authentication.)
A work in the latter category was performed in [8] by C. Jones III et al., using Gabor
analysis on color images and Elastic Bunch Graph Matching for recognition. However,
this method is feature based, and also leads to a face vector of very high dimension after
Gabor analysis, thus aggravating the small sample size problem when used with a supervised
learner and making the FR system more computationally complex. Therefore, this line
of approach is not adopted. In this thesis, the feature extractor operates directly on
the raw information present in color spaces and treats the face as a holistic pattern. A
diagram showing the directions and hierarchy of past works in color FR is presented in
Figure 2.3.
One of the first works which proposed the idea of the usage of multi spectral or color
faces for FR was by L. Torres et al in [16]. RGB, YUV and HSV color spaces, each
comprised of three color planes, were examined for recognition purposes. Experiments
were performed on 120 images from test video sequences. Training was performed on
the gallery set Z, and the images of the probe set Q were then matched against those of
the gallery. The images in the probe set were of a different viewpoint from those in the
gallery. The Principal Component Analysis (PCA) [13] feature extractor was used in the
training module. The PCA feature extractor was trained separately on single spectral
gallery images, to produce 3 projection feature bases. The individual spectral planes
of each probe image were projected onto the corresponding feature bases created in the
training step. They were matched with the projected single spectral gallery images (of
the corresponding spectral plane) to form 3 similarity scores for each probe-gallery pair.
A decision level fusion was performed to get a single score, which was used to determine
the unknown identity of the probe. The Mahalanobis distance, given by Equation (2.1),
was used for finding the similarity measure, and classification was performed using the
nearest center classifier.
$$ d(x_i, \mu_i) = (x_i - \mu_i)^T \Sigma_i^{-1} (x_i - \mu_i) \qquad (2.1) $$
where $x_i$ is a face vector of the $i$th class, and $\mu_i$ and $\Sigma_i$ are the mean and covariance matrix
of the $i$th class respectively.
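As a minimal sketch of the Mahalanobis distance in its standard form (with the inverse covariance) combined with a nearest center rule, the following NumPy code assigns a probe feature vector to the closest class center. It is not the implementation used in [16]; the function names, the toy values, and the use of a pseudo-inverse are illustrative assumptions.

```python
import numpy as np

def mahalanobis_sq(x, mu, cov):
    """Squared Mahalanobis distance (x - mu)^T cov^{-1} (x - mu)."""
    diff = x - mu
    # Assumption: a pseudo-inverse is used so a rank-deficient covariance does not raise.
    return float(diff @ np.linalg.pinv(cov) @ diff)

def nearest_center(x, means, covs):
    """Return the index of the class whose center is closest to x."""
    dists = [mahalanobis_sq(x, mu, cov) for mu, cov in zip(means, covs)]
    return int(np.argmin(dists))

# Toy example: two classes in a 2-d feature space (hypothetical values).
means = [np.array([0.0, 0.0]), np.array([4.0, 0.0])]
covs = [np.eye(2), np.diag([4.0, 1.0])]
probe = np.array([1.5, 0.0])
label = nearest_center(probe, means, covs)
```

In this toy case the probe is closer to the first center in the Euclidean sense, but the larger variance of the second class along the first axis makes the second class the nearer one under the Mahalanobis distance.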
The FR system framework used in this work can be fit into the block diagram in Figure
2.2. In this case, K = 3, and the inputs are the images corresponding to the different
spectral planes of the YUV, HSV and RGB images. This work reported a recognition rate
of 88.14% when YUV inputs were used in the FR system, providing a 3.39% improvement
over using the Y input alone.
The important conclusions of this work are:
1. The correct recognition rate of the FR system is affected by the color space used.
2. Color spaces where the luminance and the chrominance components are isolated,
lead to better FR systems.
3. Recognized faces are not the same for different color space inputs even though the
recognition rate is the same.
The conclusions in [16] have motivated further work on the usage of color spaces to
perform FR, as illustrated in Figure 2.3. Conclusion 1 provides the motivation to explore
different color spaces for recognition purposes, which was performed in [17] by P. Shih
et al. Conclusion 2 implies that for the FR system to perform well, the luminance and
chrominance spectral planes should be isolated. A broader view of this conclusion would
be that, in order to produce a more diverse set of classifiers and hence a better FR
system, the spectral planes of the inputs should carry different, uncorrelated information
(which was in fact the case with the YUV color space). This was the idea behind [26]
by J. Kittler et al. Conclusion 3 of [16] led to the idea that color spaces contain
complementary information, which provided the motivation for fusing classifiers trained
on different color spaces. This was performed in [25] by M. T. Sadeghi et al.
P. Shih et al in [17] used the idea that the information contained in different color
spaces can be applied for different visual tasks, and therefore explored the usage of various
color spaces for content based image retrieval, specifically the computer retrieval of face
images given a particular subject query. This task is similar to FR in the identification
mode and the system performance was measured by the correct retrieval rate. In this
work, 12 color spaces were examined as inputs to the FR system. This FR system
architecture can be fit into the block diagram in Figure 2.1. The system was trained
using the PCA feature extractor. The Mahalanobis metric given by equation (2.1) was
used to measure similarity in conjunction with the nearest neighbour classifier. The
color inputs are represented as an augmented vector by concatenating individual column
vectors formed by a row wise ordering of spectral planes (raw data level fusion). The
color spaces examined included RGB, HSV, I1I2I3, video transmission spaces (YIQ,
YUV, YCbCr) and intensity normalized RGB. Seven subspaces were analyzed for each
color space. Experiments were performed on 600 FERET [28] images corresponding to
200 subjects and 456 FRGC images [29] corresponding to 152 subjects. The images in
FERET have uniform illumination, pose and expression variations, while the images
in FRGC were captured in both controlled and uncontrolled settings (illumination and
expression variations). Results show that the YI (from YIQ), YCr (from YCbCr) and
YV (from the YUV) subspaces lead to the best retrieval rate. Incidentally, YCbCr, YUV
and YIQ are decorrelated color spaces. Also, inputs of I1I2I3 space (decorrelated RGB)
lead to a better FR system than RGB inputs. The interpretation for these trends is that
when color spaces are decorrelated, each color spectral plane provides distinct information
about a different aspect of the image. When these spectral planes are concatenated,
they form a vector with low redundancy, in contrast with the column vector formed by
RGB inputs (where the spectral planes are highly correlated). Also the blue chromatic
plane does not provide as much discriminative information as the red, and chromatic
information needs to be used in conjunction with intensity information for a good FR
performance.
This work was extended in [15] where experiments were conducted on 1126 FRGC
images. The FR system was tested with both PCA and Linear Discriminant Analysis
(LDA) [14] feature extractors with a nearest neighbour classifier based on the normalized
inner product similarity metric. A combination of spectral planes from the YIQ and
YCbCr spaces, YQCr, was found to improve the rank 1 recognition performance of the
FR system.
The work in [25, 26, 30] was motivated by the fact that in order to construct an efficient
multiple classifier FR system, the component classifiers should provide complementary
information to the FR process.
In [26] the R, G, and B spectral planes have been decorrelated and mapped to new
orthogonal spaces which separate the effects of object shape and albedo, and create
complementary data channels that lead to classifiers containing different information
having a high level of diversity. This is done by analyzing the physics of image data, and
creating an intensity channel, a green channel, g and an opponent chromaticity channel
rg. The FR system is a face verification system and its description can be fit into the block
diagram in Figure 2.2 (K = 3). Training is performed on the training data (XM2VTS
database [31]) using the LDA technique, and the BANCA database [32] was used for
testing purposes. The BANCA database contains images in controlled, uncontrolled and
adverse imaging conditions. The inputs to the feature extractor are the decorrelated
intensity, chromatic and opponent chromatic channels created. The similarity measure
was the gradient direction metric, defined in [33]. Two fusion methods, i.e, the score
averaging (a linear method) and the max rule (non linear method) were used to fuse
the outputs of the individual classifiers. The total error rate, false acceptance rate and
false rejection rate were used to evaluate performance of the FR system at a global
threshold. The fusion of the classifiers created from the decorrelated data (intensity, g
and rg) significantly improved the performance of the FR system over usage of the RGB
color space alone. The interpretation of this data would be that, fusing diverse classifiers
created from decorrelated / independent spectral planes boosts the performance of a
multiple classifier FR system.
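The two fusion rules named above, score averaging and the max rule, can be sketched as follows. This is only an illustration under stated assumptions: the scores are taken to be similarities (higher means a better match), and the function names, score values and threshold are hypothetical rather than taken from [26].

```python
import numpy as np

def fuse_scores(scores, rule="average"):
    """Fuse per-channel similarity scores (higher means more similar)."""
    scores = np.asarray(scores, dtype=float)
    if rule == "average":      # linear fusion
        return float(scores.mean())
    if rule == "max":          # non linear fusion
        return float(scores.max())
    raise ValueError(f"unknown fusion rule: {rule}")

def verify(scores, threshold, rule="average"):
    """Accept the identity claim if the fused score reaches a global threshold."""
    return fuse_scores(scores, rule) >= threshold

# Hypothetical scores from the intensity, g and rg channel classifiers.
channel_scores = [0.62, 0.48, 0.85]
avg = fuse_scores(channel_scores, "average")
mx = fuse_scores(channel_scores, "max")
```

With a threshold of 0.7, the same claim would be rejected under averaging and accepted under the max rule, which shows how the choice of rule shifts the balance between false acceptance and false rejection.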
Different color spaces contain complementary information which could be useful to the
FR system. In different imaging conditions, different color faces provide discriminatory
information to the FR system. In [25], classifiers trained on different color spaces are
fused using an aggregation scheme at the decision level, and the classifiers to be fused
are chosen based on a confidence based gating scheme. The idea is to use inputs from
all useful color spaces for the face verification process. The FR system in this work can
be fit into Figure 2.2, where K is the number of color spaces used in the training and
classification process. However, in the decision fusion stage, only those classifiers which
are experts according to the confidence based gating rule(s) are aggregated. Training
is performed on the training data (XM2VTS database [31]) using the LDA technique.
The XM2VTS database consists of controlled images of 295 subjects. The Gradient
Direction Metric [33] is used as the similarity measure and the false acceptance rate and
false rejection rate were used to evaluate performance of the FR system. The method of
aggregation of the classifier experts chosen through the confidence based gating process is
the majority vote. Using this method of confidence based choosing of classifier experts,
the optimum subset of expert classifiers is dynamically chosen for each probe image
and aggregated to produce a more accurate expert. The performance of the verification
system was considerably improved by this aggregation over both the gray scale baseline
and the individual experts themselves.
This work has been extended in [27], where the classifiers are trained on the
spectral planes of the different color spaces, thus increasing the competing classifiers by
three times. In this work, aggregation was performed by a more sophisticated method
based on support vector machines.
The previous works reinforced that chromatic information boosts the performance of
the FR system. They focused on the problem of creation of complementary classifiers for
efficient combination in a decision fusion framework. This was performed in two ways:
• using a confidence based gating method to choose classifier experts trained on
different color spaces / spectral planes and combine them in a decision aggregation
framework,
• decorrelating the data in the spectral planes of color spaces to produce diverse
classifiers for use in a multiple classifier FR system.
They also focused on the evaluation of the usage of different color spaces for FR applica-
tion in their raw form by performing a raw data level fusion. An important conclusion is
that fusion of decorrelated data performs better, both in multiple classifier systems and
raw data level, and color information in conjunction with intensity information helps the
FR system.
Motivated by the conclusions in the previous works, the YCbCr color space trans-
formations - YCbCr 4:4:4, YCbCr 4:2:2 and YCbCr 4:2:0, along with their subspaces
are chosen for our FR experiments and analysis as the YCbCr is a decorrelated color
space and has demonstrated a good performance for FR tasks [17]. Since the intensity
and the chromatic planes of the YCbCr space are decorrelated, it is a good color space
for examining the contribution of chromatic information. Also, since it is used in digital
video and image compression standards, it could have benefits from an application point
of view like the integration of FR and video systems. Although sub sampling of the
chromatic planes in the YCbCr 4:2:2 and YCbCr 4:2:0 transformations does not have
any notable visual difference considering that humans do not perceive high color spatial
resolutions, its effect on the FR system is not evident, and will be discussed in the next
chapter.
2.3 Motivation of this Thesis
The databases used in previous literature however, consisted of images which were not
always captured under controlled settings and therefore, did not always have good shape
cues. In this work, the imaging and learning conditions in which color information
significantly helps the FR system are studied. For example, if the images in the face
database have a high resolution and are photographed in a controlled environment with
no degradation of the shape cues, will color cues still improve performance? Experiments
were performed to examine the performance of chromatic information in a wide range
of learning scenarios with various difficulties and different imaging conditions, as these
trends would help in the design of the FR system.
An important direction of research is the effect of the increased dimensionality of
color inputs on the small sample size problem in supervised learning. The small sample
size problem could corrupt the design of the FR system and is an important factor to
consider in FR systems. The effect of a decision level fusion of the decorrelated spectral
planes of the YCbCr space on the small sample size problem is also examined.
We have explored a third method of creating complementary classifiers with color
data using a boosting framework, consisting of supervised learners. This framework
was tested on a range of small sample size learning scenarios on a large
database having severe illumination and pose variations to examine the effect of ensemble
learning and chromatic information on the performance of the FR system under imaging
conditions and facial distortions encountered in real life applications.
The work in this thesis is complementary to the conclusions and ideas proposed in
previous literature on color FR, and therefore is not directly comparable to past works.
Chapter 3
Color face recognition in different
learning scenarios
In this chapter, the usefulness and contribution of chromatic information is examined in a
range of learning scenarios and different imaging conditions in order to conclude the exact
scenarios where usage of color information would help the FR system in both supervised
and unsupervised learning modes. The implication of the extra dimensionality added by
chromatic spectral planes on the small sample size problem encountered in supervised
learning systems is another aspect evaluated in this chapter.
3.1 Introduction
Recently attention has been drawn to the usage of color information for FR purposes.
Previous works have confirmed the usefulness of color in automatic FR systems [16, 15,
17, 8]. According to the studies on human perception and vision, color cues are supposed
to improve the performance of the FR system when shape cues are degraded. However,
the usage of color images poses two main challenges to an automatic FR system:
1. Computational and storage requirements: Face images are represented as vectors in
an FR system. Usage of color information leads to a vector of a larger dimension
which substantially increases the computational cost of the FR system. Also, a
face database comprising color images would require a larger storage space.
2. Larger dimensionality of color inputs vs. small number of training samples: Face
inputs have a very large dimensionality, approximately of the order of $10^4$, while the
number of training samples available is very low (around 2-10 samples per subject),
as mentioned in Chapter 1. This leads to a small sample size problem. When color
inputs are used, the dimensionality of the face inputs becomes larger, thus leading
to a more challenging small sample size problem.
It is therefore important to determine the imaging conditions, learning scenarios and
situations in which color information would be useful to the FR system, in other words,
whether chromatic information would have the same degree of contribution to the per-
formance of the FR system when there is no degradation of the shape cues (distortion of
the intensity image) and when the learning scenarios are optimal.
The usefulness of chromatic information is examined in this chapter for both super-
vised and unsupervised FR systems. The learning scenarios examined are a function
of the number of subjects available and the samples per subject available for training
the FR algorithm. The latter parameter is an important factor in supervised learning
scenarios. Table 3.1 summarizes the various learning scenarios examined. The effect of
chromatic information is also examined for different imaging conditions. Two databases
have been chosen for evaluation, one with severe illumination variations and the other
with relatively moderate illumination variations, in order to examine the contribution of
color space information in different imaging conditions.
A special case of the learning scenarios in Table 3.1 is the small sample size scenario,
which exists when the number of samples per subject available for training is very small,
(approximately two to three images per subject) as this affects supervised FR systems.
Table 3.1: Learning Scenarios encountered in face recognition problems
No. of Subjects    No. of Samples per subject
Low                Low
Low                High
High               Low
High               High
The effect of the increased dimensionality of color inputs on the small sample size problem
is an important issue to be taken into consideration in the design of an effective FR
system.
The YCbCr color space transformations (along with the various sub sampling ratios)
are used for analysis in this thesis. However, the effect of spatial sampling of the chromatic
spectral planes on the FR system is not evident. The implication of chromatic sub
sampling on the FR system and the small sample size problem is another aspect studied
in this chapter.
3.2 Representation of Color Information
Let $S_i$ be the $i$th 2-d image in a set of images, with spatial dimensions $J = I_W \times I_H$ and
$K$ spectral planes. Each spectral plane has a spectral depth of 8 bits (which corresponds
to values between 0 and 255); therefore $S_i$ has a spectral depth of $K \times 8$ bits. The number
of spectral planes is dependent on the color space, for example, an RGB image will have
3 spectral planes: R, G and B. Every image $S_i$ is represented as a column vector $x_i$ for
future analysis. In order to convert $S_i$ to $x_i$, the following steps are performed:
1. Each spectral plane is converted to a column vector.
2. The column vectors from each spectral plane are concatenated.
In order to form a column vector for the $m$th spectral plane, where $1 \le m \le K$,
the 8 bit values of that spectral plane are ordered lexicographically (row-wise) into a
column vector $s_{im}$, the column vector of the $m$th spectral plane of the $i$th
image. Also, $s_{im} \in \mathbb{R}^{D_m \times 1}$, where $D_m$ is the dimensionality of $s_{im}$. $D_m$ is dependent on
2 parameters: the sampling nature of $s_{im}$ and the spatial dimension of $S_i$. For example,
if $S_i$ is converted to the YCbCr 4:2:0 color space used in MPEG standards, $K = 3$, and
every alternate row and column is eliminated from the Cb and Cr (chromatic) spectral
planes (subsampling) while forming their column vectors. Therefore, the ratio of the
dimensions of the 3 spectral planes will be $D_Y : D_{Cb} : D_{Cr} = 4 : 1 : 1$, and their values
will be of the form $D_m = J \times \mu$, where $\mu$ is the scaling factor of that particular spectral
plane; $\mu = 0.25$ for the Cb and Cr spectral planes and $\mu = 1$ for the Y plane, in this case.
After forming the column vectors $s_{i1}, s_{i2}, \ldots, s_{iK}$, $x_i$ is formed by
$$ x_i = [s_{i1}^T \; s_{i2}^T \; \ldots \; s_{iK}^T]^T. $$
The dimension of $x_i$ is $d = \sum_{m=1}^{K} D_m$. This is performed for all $S_i$ in the set of images to
form a set of column vectors $\{x_i\}_{i=1}^{N}$, where $N$ is the number of images in the set. Since
$K = 1$ for gray scale images, and $K = 3$ for most color images, the dimensionality of
the column vector for a color image is thrice that of the corresponding gray scale image,
when no sampling is performed.
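The construction of the face vector for the YCbCr 4:2:0 case can be sketched as below. This is an illustrative sketch rather than the code used in the thesis: the function names are hypothetical, the planes are random stand-ins for real image data, and subsampling is assumed to be plain decimation (keeping every second row and column).

```python
import numpy as np

def to_column(plane, step=1):
    """Keep every `step`-th row and column, then order the values row-wise."""
    return plane[::step, ::step].reshape(-1)

def face_vector_420(y, cb, cr):
    """Build x_i = [s_i1^T s_i2^T s_i3^T]^T for a YCbCr 4:2:0 image."""
    planes = [to_column(y), to_column(cb, 2), to_column(cr, 2)]
    return np.concatenate(planes), [p.size for p in planes]

# Random 150 x 130 planes standing in for the Y, Cb and Cr planes of one face.
rng = np.random.default_rng(0)
y, cb, cr = (rng.integers(0, 256, size=(150, 130)) for _ in range(3))
x, dims = face_vector_420(y, cb, cr)
```

For a 150 × 130 face, `dims` holds the plane dimensions in the ratio 4 : 1 : 1, and the length of `x` is their sum.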
Column vectors of individual spectral planes of YCbCr transformed images are nor-
malized to zero mean and unit variance prior to concatenation or further processing.
This operation is possible because YCbCr is a decorrelated transform, and hence the
individual spectral planes can be operated upon separately.
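A minimal sketch of this normalization step follows. It assumes that each spectral-plane vector is normalized on its own (per image, per plane), which the text does not state explicitly; the function names and the small constant guarding against division by zero are also assumptions.

```python
import numpy as np

def normalize_plane(vec, eps=1e-12):
    """Scale one spectral-plane vector to zero mean and unit variance."""
    vec = np.asarray(vec, dtype=float)
    return (vec - vec.mean()) / (vec.std() + eps)

def normalized_face_vector(plane_vectors):
    """Normalize each plane vector separately, then concatenate them."""
    return np.concatenate([normalize_plane(v) for v in plane_vectors])

# Tiny hypothetical plane vectors (a 4-value Y plane and a 2-value Cb plane).
s_y = np.array([10.0, 20.0, 30.0, 40.0])
s_cb = np.array([128.0, 130.0])
x = normalized_face_vector([s_y, s_cb])
```

Normalizing before concatenation keeps a plane with a large numeric range from dominating the variance of the combined vector.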
3.3 Background
All appearance based methods can be classified on the basis of the knowledge used by
the FR system in the feature extraction step of the training stage. They are classified
into unsupervised methods, supervised methods, and those methods based on inter and
intra personal variations (based on Bayes learning [34]). In this section, the underlying
concepts of supervised and unsupervised learning are presented along with a description
of the two most basic unsupervised and supervised learning methods used in FR systems.
• In unsupervised learning, the learner (or feature extractor) uses solely the input
patterns, i.e., the preprocessed faces in the training database to form the feature
basis of the FR system. The learner is not provided with any class information,
which includes identities of the subject, class means or variances. The Principal
Component Analysis (PCA) is the most fundamental unsupervised learning method
used in FR.
• In supervised learning, the feature extractor is furnished with the preprocessed
input patterns, along with their class information, class means, inter and intra class
variations. All of this information is used to create a feature basis for projection
in the testing stage. Linear Discriminant Analysis (LDA) is the most fundamental
supervised learning method used in FR systems.
Most appearance based FR methods, including those based on kernels [35, 36] are
based on PCA and LDA.
3.3.1 Principal Component Analysis
The PCA is one of the first and most popular tools for data reduction and feature
extraction. It was first used for FR in [13]. The PCA focuses on finding a set of orthogonal
basis vectors which maximize the total scatter or variance in the training samples.
Given a training set $Z$ containing images $\{z_{ij}\}$, where $z_{ij}$ is the $j$th image of the
$i$th class in $Z$, $C_i$ is the number of images in the $i$th class, $C$ is the number of
classes, and $N = \sum_{i=1}^{C} C_i$ is the total number of training images, the covariance matrix is given by
$$ S_{cov} = \frac{1}{N} \sum_{i=1}^{C} \sum_{j=1}^{C_i} (z_{ij} - \bar{z})(z_{ij} - \bar{z})^T \qquad (3.1) $$
where $\bar{z} = \frac{1}{N} \sum_{i=1}^{C} \sum_{j=1}^{C_i} z_{ij}$ is the average of the training samples. The covariance given
by Equation 3.1 is the sum of the intra and inter class variances of all the images of
the training set [37]. The orthogonal basis is formed by the eigen decomposition of the
covariance matrix. Finding an orthogonal feature basis to maximize the total scatter
given in Equation 3.1 corresponds to solving the following eigen value problem,
$$ S_{cov} \Phi_k = \lambda_k \Phi_k $$
where $k = 1, 2, \ldots, M$. The PCA feature space is thus spanned by the $M$ ($M < d$) most
significant eigen vectors $\Phi_k$, corresponding to the $M$ largest eigen values, where $d$ is the
dimensionality of the face input $z_{ij}$. Every face $z_{ij}$ is projected onto this low dimensional
feature space by the linear mapping $y_{ij} = W_{PCA}^T (z_{ij} - \bar{z})$, where $W_{PCA} = [\Phi_1, \Phi_2, \ldots, \Phi_M]$
is the transformation matrix consisting of the first $M$ most significant eigenvectors. The
vector of reduced dimension $y_{ij}$ is a vector formed by the projections of $z_{ij}$ on each of
the $M$ orthonormal basis vectors. The classification of faces takes place in this reduced
feature space using any classifier.
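The PCA training and projection steps can be sketched as follows. This is a generic illustration, not the thesis implementation: it obtains the eigenvectors of the covariance matrix through a singular value decomposition of the centered data (a standard equivalent route that avoids forming the d × d matrix), and the function names and toy sizes are assumptions.

```python
import numpy as np

def pca_fit(Z, M):
    """Z: (N, d) array, one face vector per row. Returns (mean, W), W of shape (d, M)."""
    mean = Z.mean(axis=0)
    A = Z - mean
    # The right singular vectors of the centered data are the eigenvectors of
    # S_cov = A^T A / N, already ordered by decreasing eigenvalue.
    _, _, vt = np.linalg.svd(A, full_matrices=False)
    return mean, vt[:M].T

def pca_project(z, mean, W):
    """Project a face vector onto the M-dimensional PCA feature space."""
    return W.T @ (z - mean)

rng = np.random.default_rng(0)
Z = rng.normal(size=(20, 50))      # toy sizes: N = 20 samples, d = 50
mean, W = pca_fit(Z, M=5)
y = pca_project(Z[0], mean, W)
```

Because N is much smaller than d in face recognition, at most N − 1 eigenvalues of the covariance matrix are non zero, so M cannot usefully exceed N − 1.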
PCA achieves object reconstruction in the least square sense and maximizes both the
inter and intra class variances. Since the intra class variances could have a negative impact
on the performance of FR systems, it is generally believed that PCA does not perform
as well as supervised learning techniques based on the Linear Discriminant Analysis.
3.3.2 Linear Discriminant Analysis
The Linear Discriminant Analysis (LDA) method is a supervised learning method used
in FR systems, and is the basis for all the supervised learning methods in FR literature.
LDA uses class specific projections and produces a set of orthogonal vectors to form a
low dimensional discriminative feature space.
Given a training set $Z = \{Z_i\}_{i=1}^{C}$ containing $C$ classes, with each class $Z_i = \{z_{ij}\}_{j=1}^{C_i}$
consisting of images $z_{ij}$ (where $z_{ij}$ is the column vector of the $j$th image of the $i$th class),
a total of $N = \sum_{i=1}^{C} C_i$ images are present in the training set. The dimensionality of the column
vectors of the images in $Z$ is $d$. LDA finds a set of $M$ feature vectors, $M \le d$, based on
the following optimality criterion,
$$ \Psi = \arg\max_{\Psi} \frac{|\Psi^T S_B \Psi|}{|\Psi^T S_W \Psi|} \qquad (3.2) $$
where $\Psi = [\psi_1 \; \psi_2 \; \ldots \; \psi_M]$, $\psi_k \in \mathbb{R}^d$, and $S_B$ and $S_W$ are the between class and within class
scatter matrices respectively, defined as per the following equations,
$$ S_B = \frac{1}{N} \sum_{i=1}^{C} C_i (\bar{z}_i - \bar{z})(\bar{z}_i - \bar{z})^T = \sum_{i=1}^{C} \Phi_{B,i} \Phi_{B,i}^T = \Phi_B \Phi_B^T \qquad (3.3) $$
$$ S_W = \frac{1}{N} \sum_{i=1}^{C} \sum_{j=1}^{C_i} (z_{ij} - \bar{z}_i)(z_{ij} - \bar{z}_i)^T \qquad (3.4) $$
where $\Phi_{B,i} = (C_i / N)^{1/2} (\bar{z}_i - \bar{z})$, $\Phi_B = [\Phi_{B,1} \; \Phi_{B,2} \; \ldots \; \Phi_{B,C}]$, $\bar{z}_i = \frac{1}{C_i} \sum_{j=1}^{C_i} z_{ij}$ is the mean of the
class $Z_i$, and $\bar{z}$ is the mean of all training samples.
The optimization problem in Equation 3.2 is equivalent to solving the following eigen
value problem,
$$ S_B \psi_k = \lambda_k S_W \psi_k, \quad k = 1, \ldots, M \qquad (3.5) $$
The basis vectors $\psi_k$ are the eigen vectors corresponding to the $M$ largest eigen values of
$S_W^{-1} S_B$, provided $S_W$ is not singular.
Although LDA is expected to perform better than the PCA since it utilizes class
information to create a low dimensional feature basis, it is more susceptible to the small sample
size problem, which will be discussed in the next section.
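A minimal sketch of classical LDA on toy data is given below, assuming the within class scatter matrix is non-singular (many samples, low dimension). The function names and data are hypothetical, and the eigen value problem is solved through the matrix obtained by applying the inverse of the within class scatter to the between class scatter, as described above.

```python
import numpy as np

def scatter_matrices(Z, labels):
    """Between class (S_B) and within class (S_W) scatter, both scaled by 1/N."""
    Z = np.asarray(Z, dtype=float)
    labels = np.asarray(labels)
    N, d = Z.shape
    mean = Z.mean(axis=0)
    S_B = np.zeros((d, d))
    S_W = np.zeros((d, d))
    for c in np.unique(labels):
        Zc = Z[labels == c]
        mc = Zc.mean(axis=0)
        S_B += len(Zc) * np.outer(mc - mean, mc - mean) / N
        S_W += (Zc - mc).T @ (Zc - mc) / N
    return S_B, S_W

def lda_fit(Z, labels, M):
    """Return Psi of shape (d, M): the leading eigenvectors of S_W^{-1} S_B."""
    S_B, S_W = scatter_matrices(Z, labels)
    # np.linalg.solve(S_W, S_B) computes S_W^{-1} S_B; it fails if S_W is singular.
    evals, evecs = np.linalg.eig(np.linalg.solve(S_W, S_B))
    order = np.argsort(-evals.real)[:M]
    return evecs[:, order].real

rng = np.random.default_rng(1)
# Three toy classes in d = 4 dimensions with 30 samples each.
centers = np.array([[0, 0, 0, 0], [3, 0, 0, 0], [0, 3, 0, 0]], dtype=float)
Z = np.vstack([c + rng.normal(size=(30, 4)) for c in centers])
labels = np.repeat(np.arange(3), 30)
Psi = lda_fit(Z, labels, M=2)
```

With C classes the between class scatter has rank at most C − 1, so at most C − 1 discriminant directions carry information; here that is 2.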
3.4 Color and the Small Sample Size Problem
3.4.1 Small Sample Size Problem
According to statistical learning theory, as the dimensionality $d$ of the face input
increases, the estimation of the scatter matrices becomes increasingly difficult. This is
because in practical FR systems, the data available for training is usually very small
compared to the dimensionality of the training inputs. A problem is poorly
posed if the number of parameters to be estimated is comparable to the number of
training samples, L, and is ill posed if it is far greater than L [38]. This makes the estimation of
scatter matrices an ill posed problem, and is referred to as the small sample size problem.
The small sample size (SSS) problem is most severe in supervised learning scenarios
based on LDA. The $S_W$ matrix is essentially proportional to the sum of the covariance
matrices of the individual classes. The number of samples available for training in each
class ($\le 10$) is typically very small compared to the dimensionality of the column vectors
of the samples in $Z$ (of the order of $10^4$). This makes the estimation of the $S_W$ matrix
a highly ill posed problem as it has a very low rank, and in the case of the classical LDA,
might lead to highly degenerate scatter matrices. Therefore the direct optimization of
the ratio in Equation 3.2 becomes difficult as $S_W$ is singular, and leads to highly biased
eigen values.
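The rank deficiency described above can be checked numerically. The sketch below uses toy sizes and random data (both assumptions, as is the function name) and relies on the standard fact that the within class scatter matrix of N samples in C classes has rank at most N − C, far below d when only a few samples per subject are available.

```python
import numpy as np

def within_class_scatter(Z, labels):
    """S_W = (1/N) * sum over classes of the centered class scatter."""
    Z = np.asarray(Z, dtype=float)
    labels = np.asarray(labels)
    d = Z.shape[1]
    S_W = np.zeros((d, d))
    for c in np.unique(labels):
        Zc = Z[labels == c]
        centered = Zc - Zc.mean(axis=0)
        S_W += centered.T @ centered
    return S_W / len(Z)

rng = np.random.default_rng(0)
C, L, d = 5, 2, 100            # toy sizes: 5 subjects, 2 samples each, d = 100
Z = rng.normal(size=(C * L, d))
labels = np.repeat(np.arange(C), L)
S_W = within_class_scatter(Z, labels)
rank = np.linalg.matrix_rank(S_W)
```

Here the 100 × 100 matrix has rank 5, so it cannot be inverted directly.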
3.4.2 Implication of Color Inputs
For inputs with multiple spectral planes, the dimensionality of the column vector is
multiplied by the number of spectral planes, thus making the estimation of $S_W$ more ill
posed. For example, if the $z_{ij}$ are gray scale and have a dimensionality of $d = 150 \times 130 =
19500$ and $C_i = 2, \forall i$, the small sample size problem becomes more prominent if the $z_{ij}$ are color, as
$d$ would be increased three times to 58500 (without sub sampling), while the number of
training samples would remain the same. Lowering the number of samples per class
leads to biased estimates of eigen values, i.e., the largest ones are biased high while the
small ones are biased very low.
Sub sampling of chromatic planes using the standard ratios is not perceptually visible
to the human eye since humans see color with much less spatial resolution than intensity
[39], however its impact on the FR system is not totally evident. Chromatic sub sampling
has 2 major implications,
• Removal of bytes from the chromatic planes would mean removal of input infor-
mation to the FR system. It is not known whether loss of chromatic information
would degrade the FR system, i.e., whether YCbCr 4:4:4 would lead to a better
FR performance than YCbCr 4:2:2.
• Sub sampling would also lead to a reduced dimension of xi as opposed to no sam-
pling. This would have an implication in supervised learning systems trained with
LDA when the number of samples per subject is very low (≈ 2 − 3) as a reduced
input dimension might lead to a less ill posed within class scatter matrix, reducing
the small sample size problem. For example, the dimension of the column vector
for a YCbCr 4:2:0 is approximately half that of a YCbCr 4:4:4 input.
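As a worked instance of these dimensions, assuming the 150 × 130 face size used in this chapter and the conventional meaning of the sampling ratios (4:2:2 halves the chromatic planes horizontally, 4:2:0 halves them both horizontally and vertically), the input dimensionality $d$ for each transformation would be:

```latex
\begin{align*}
J &= 150 \times 130 = 19500,\\
d_{4:4:4} &= J + J + J = 58500,\\
d_{4:2:2} &= J + \tfrac{J}{2} + \tfrac{J}{2} = 39000,\\
d_{4:2:0} &= J + \tfrac{J}{4} + \tfrac{J}{4} = 29250 = \tfrac{1}{2}\, d_{4:4:4}.
\end{align*}
```

The number of training samples is unchanged across the three cases, so only the ratio of dimensionality to sample count varies.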
It is therefore interesting to examine the effect of the extra chromatic spectral planes
in supervised FR systems with a small number of samples per subject for training (L ≈
2−3). Intuitively, YCbCr transformation applied to the input faces should be an optimal
trade off between the amount of chromatic information used and the dimension of the
input vector. For example, a YCbCr 4:2:0 is expected to lead to a better FR system than
a YCbCr 4:4:4 transformation in an extreme small sample size scenario. In this chapter,
the effect of chromatic inputs and spatial sampling of the chromatic planes on the FR
system are studied, along with their implications on the small sample size problems.
3.5 Methodology and Experimental Setup
For the experiments in this chapter, the FR system has been trained on the gallery set,
Z. The FR system operates in the identification mode, and the images of the probe
set, Q are to be matched against those of the gallery. A pictorial description of the FR
system is given in Figure 3.1.
[Figure 3.1: System Description (the color space transformation is the same in both training and testing stages). Training blocks: Gallery Data, Conversion to YCbCr, Preprocessing and Construction of Column Vector, Feature Extractor, Orthogonal Feature Basis. Testing blocks: Probe Image, Gallery Data, Conversion to YCbCr, Preprocessing and Construction of Column Vector, Projection onto Feature Basis, Classification, Probe ID.]
The images of the gallery and probe sets, Z and Q not only contain the face but
also contain irrelevant portions comprising the background, hair, shoulder, etc. These
images are therefore passed through a preprocessing stage where the face is isolated from
the rest of the image, and the preprocessed face is converted to a column vector for
further processing. The method used for preprocessing is explained in Appendix B. The
resolution of the images after preprocessing is fixed to 150×130, as this resolution is
commonly used in surveillance applications. The preprocessed faces are then vectorized
following the procedure detailed in Section 3.2.
The image vectors are in the RGB format and are then transformed to the YCbCr
color space. The YCbCr transformations used are the YCbCr 4:4:4, YCbCr 4:2:2 and
YCbCr 4:2:0. The YCr subspace is also tested, as the red spectral plane has been reported to be
more discriminative for FR purposes than the blue spectral plane in past literature [17].
The corresponding YCr transformations are therefore referred to as YCr 4:4:4, YCr 4:2:2
and YCr 4:2:0. The same color space/ gray scale transformation is performed in both
the training and testing stages. The gray scale transformation is used as a baseline for
comparison.
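The RGB to YCbCr conversion can be sketched as below. The text does not state which YCbCr variant is used, so this sketch assumes the full-range ITU-R BT.601 form used in JPEG/JFIF; the function name and the test image are hypothetical.

```python
import numpy as np

def rgb_to_ycbcr(rgb):
    """rgb: (H, W, 3) array with values in [0, 255]. Returns float Y, Cb, Cr planes."""
    rgb = np.asarray(rgb, dtype=float)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    # Assumed coefficients: full-range BT.601 as used in JFIF.
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr

# A uniform gray patch: all chromatic content should vanish (Cb = Cr = 128).
gray = np.full((2, 2, 3), 100.0)
y, cb, cr = rgb_to_ycbcr(gray)
```

A YCr input would then be formed from the `y` and `cr` planes only, with `cb` discarded.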
In this chapter, the effect of chromatic information on the FR system is evaluated in
good and poor imaging conditions, for both supervised and unsupervised FR systems.
The PCA feature extractor is used when the feature extractor is an unsupervised learner.
In the supervised learning case, the LDA feature extractor is utilized, however, a PCA
step is applied prior to the LDA in order to avoid the inversion of a singular SW .
Two subsets of the CMU PIE [40, 41] database are chosen for evaluation, DB1 and
DB2. Database DB1 consists of images having severe illumination conditions caused by
varying positions of camera flash in a room with zero background illumination. Database
DB2 contains images with varying camera flash positions with uniform background illu-
mination, therefore neutralizing to an extent the effect of the flash; Database DB2 has
lighter illumination variations than DB1. Faces in both databases have neutral expres-
sion and frontal pose. DB1 and DB2 contain 1496 images and 1425 images respectively
of 68 subjects. A description of the CMU PIE database along with sample images from
DB1 and DB2 is provided in Appendix A.
For evaluation, C subjects are chosen from the evaluation database (DB1/DB2) along with all corresponding images to form a database Y. Y is randomly partitioned into the training/gallery set Z and the probe set Q, such that Y = Z ∪ Q and Z ∩ Q = ∅. The partition is such that Z is composed of C×L images, i.e., L images for each of the C subjects. The remaining |Y| − C×L images comprise the probe set Q, where |Y| is the cardinality of Y. Any face recognition method evaluated is first trained on Z and then evaluated on Q to produce a Rank k Correct Recognition Rate (CRR). The performance of the system is measured by the Rank 1 CRR. Results for the Rank 5 CRR are also provided, in order to evaluate the contribution of chromatic information when the performance measure criterion is more relaxed. The reported results are averaged over more than 5 runs to avoid bias.
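The evaluation protocol above can be sketched as follows. The function names are hypothetical, and the rank k rule used here (a probe is counted as correct if any of its k most similar gallery images belongs to the probe's subject) is one common reading of the Rank k CRR, stated as an assumption.

```python
import numpy as np

def split_gallery_probe(labels, n_per_subject, rng):
    """Randomly pick L = n_per_subject images per subject for the gallery Z.

    Returns (gallery_idx, probe_idx); the probe set Q is everything not in Z.
    """
    labels = np.asarray(labels)
    gallery = []
    for subject in np.unique(labels):
        idx = np.flatnonzero(labels == subject)
        gallery.extend(rng.permutation(idx)[:n_per_subject])
    gallery = np.array(sorted(gallery))
    probe = np.setdiff1d(np.arange(labels.size), gallery)
    return gallery, probe

def rank_k_crr(similarity, gallery_labels, probe_labels, k=1):
    """Rank k Correct Recognition Rate in percent.

    similarity[i, j] is the score between probe i and gallery image j,
    where a larger score means a closer match.
    """
    similarity = np.asarray(similarity, dtype=float)
    gallery_labels = np.asarray(gallery_labels)
    probe_labels = np.asarray(probe_labels)
    top = np.argsort(-similarity, axis=1)[:, :k]   # k best gallery matches per probe
    hits = (gallery_labels[top] == probe_labels[:, None]).any(axis=1)
    return 100.0 * hits.mean()
```

Averaging `rank_k_crr` over several calls to `split_gallery_probe` with different random states corresponds to the repeated runs mentioned in the text.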
3.6 Results
In this section, the contribution of chromatic information in FR systems is examined for
1. Easy to hard learning scenarios: varying number of subjects, C, and samples per subject, L. C is set to 35 and 65, and L varies from 2 to 9 samples/subject.
2. Poor and good illumination conditions, for both supervised and unsupervised learning systems.
3.6.1 Choice of Gray scale baseline and Similarity Metric
In order to evaluate the contribution of the chromatic spectral planes, all performances must be compared to a gray scale baseline. Three gray scale transformations whose inputs have the same dimensionality were evaluated on a subset of database DB1, for both supervised and unsupervised learning systems (Figure 3.2): Y from YCbCr, R from RGB and an RGB linear combination of 0.2B + 0.7G + 0.1R. In addition, two similarity metrics were evaluated, one based on the Euclidean distance and the other based on the inner product.
The cosine similarity metric based on the inner product is given by,
\[ d = \frac{u \cdot v}{|u|\,|v|} \tag{3.6} \]
where d is the distance, u and v are the pattern vectors. The normalized inner product produces the cosine metric.
[Figure 3.2 (plots not reproduced): two panels, "PCA feature extraction (Database: DB1, C=65)" and "LDA feature extraction (Database: DB1, C=65)"; x-axis Samples/Subject, y-axis Rank 1 CRR %; curves Y, R and .2B+.7G+.1R, each with the inner product and Euclidean metrics.]
Figure 3.2: Comparison of gray scale transformations and similarity metrics
The Euclidean similarity metric is given by,
\[ d = -\sqrt{(u - v)' \cdot (u - v)} \tag{3.7} \]
where d is the distance and u and v are pattern vectors.
From Figure 3.2, all gray scale transformations lead to almost the same level of performance. Since the YCbCr chromatic transformations are being used, the Y transformation is chosen over the other gray scale transformations. Also, the inner product based similarity metric performs better than the Euclidean distance based metric. The Y transformation is therefore chosen as the baseline, and the inner product based similarity metric is used for the remainder of the experiments.
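Equations (3.6) and (3.7) can be written as the short sketch below. The function names are illustrative. Both functions return a score for which a larger value means a closer match, which is why the Euclidean distance carries a negative sign in Equation (3.7).

```python
import numpy as np

def cosine_similarity(u, v):
    """Normalized inner product of two pattern vectors, Equation (3.6)."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def euclidean_similarity(u, v):
    """Negated Euclidean distance between two pattern vectors, Equation (3.7)."""
    diff = np.asarray(u, dtype=float) - np.asarray(v, dtype=float)
    return float(-np.sqrt(diff @ diff))
```

With either function, the gallery image with the highest score is the rank 1 match for a probe.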
3.6.2 Unsupervised Learning
In this section, the effect of various color space transformations will be evaluated for
both evaluation databases, DB1 and DB2 in the unsupervised FR system (trained with
a PCA feature extractor).
An obvious trend noticed from Figures 3.3, 3.4, 3.5 and 3.6 is that the overall FR performance is higher by approximately 10% in the case of database DB2, which gives an insight into the difficulty of the FR problem when database DB1 is used. From Figures 3.3 and 3.4, a broad conclusion is that, for database DB1, color space transformations outperform the gray scale Y transformation over all learning scenarios examined, whereas for database DB2, the gray scale Y transformation leads to a better recognition rate than the color transformations for L ≥ 3 for all values of C examined. The images in database DB1 have poor illumination conditions, and the shape cues of these images are therefore degraded. The chromatic planes are thus necessary to boost the recognition performance in these imaging conditions. On the other hand, for database DB2, the images have good illumination conditions and hence the intensity plane of the images is not degraded. Chromatic planes hence do not contribute to the performance of the FR system when operated on database DB2.
[Figure 3.3 (plots not reproduced): two panels, "Performance of YCbCr transformations with PCA (Database: DB1, C=35)" and "(Database: DB1, C=65)"; x-axis Samples/Subject, y-axis Rank 1 CRR %; curves Y, YCbCr 4:4:4, YCr (from YCbCr 4:4:4), YCbCr 4:2:2, YCr (from YCbCr 4:2:2), YCbCr 4:2:0, YCr (from YCbCr 4:2:0).]
Figure 3.3: Rank 1 performance of YCbCr transformations with PCA feature extractor (Unsupervised Learning), database DB1
Database DB1
The YCbCr 4:4:4 transformation leads to the best FR performance for all values of C and L examined. Spatial sub sampling of chromatic planes leads to loss of important information, thus leading to a deterioration in FR performance. Since PCA focuses on object reconstruction by maximizing the total scatter, it can be concluded that, when the shape cues are unclear, chromatic information is very important to achieve this reconstruction, as the intensity image is degraded.
Another trend noticed is that the YCbCr 4:2:2 and the YCr 4:2:2 perform better than the YCbCr 4:2:0 and YCr 4:2:0 transformations. This trend leads to the conclusion that the Cr plane has more discriminative information for the FR application, and loss of information from the Cr plane leads to a larger deterioration in performance, when compared to the Cb plane.
The contribution of chromatic information to the FR system remains approximately the same across all learning scenarios for all color transformations examined. The trends and conclusions observed are alike for both rank 1 and rank 5 performance measures; relaxation of the performance measure criterion does not reduce the contribution of color information.
[Figure 3.4 (plots not reproduced): two panels, "Performance of YCbCr transformations with PCA (Database: DB2, C=35)" and "(Database: DB2, C=65)"; x-axis Samples/Subject, y-axis Rank 1 CRR %; curves Y, YCbCr 4:4:4, YCr (from YCbCr 4:4:4), YCbCr 4:2:2, YCr (from YCbCr 4:2:2), YCbCr 4:2:0, YCr (from YCbCr 4:2:0).]
Figure 3.4: Rank 1 performance of YCbCr transformations with PCA feature extractor (Unsupervised Learning), database DB2
[Figure 3.5 (plots not reproduced): two panels, "Performance of YCbCr transformations with PCA (Database: DB1, C=35)" and "(Database: DB1, C=65)"; x-axis Samples/Subject, y-axis Rank 5 CRR %; same curves as Figure 3.4.]
Figure 3.5: Rank 5 performance of YCbCr transformations with PCA feature extractor (Unsupervised Learning), database DB1
[Figure 3.6 (plots not reproduced): two panels, "Performance of YCbCr transformations with PCA (Database: DB2, C=35)" and "(Database: DB2, C=65)"; x-axis Samples/Subject, y-axis Rank 5 CRR %; same curves as Figure 3.4.]
Figure 3.6: Rank 5 performance of YCbCr transformations with PCA feature extractor (Unsupervised Learning), database DB2
Database DB2
The best color transformations are the YCr 4:2:0 and YCr 4:2:2 across all learning sce-
narios examined. The color transformations are outperformed by the Y transformation.
However, for the hardest learning case of C=65 and L=2, the YCr 4:2:0 and YCr 4:2:2
color transformations perform as well as the Y transformation alone.
The trends observed in Figure 3.4 suggest that the addition of color information reduces the performance of the PCA based FR system under most learning conditions. The PCA is a statistical algorithm and requires training data for the creation of a discriminative low dimensional face space. The results suggest that the extra information in the Cb and Cr planes is not as useful as the information in the Y plane, and therefore reduces the performance of the FR system when passed as input to the PCA feature extractor. In fact, the chromatic inputs with the most chromatic information (YCbCr 4:4:4) lead to the worst performance.
The YCr transformations on the whole lead to better rank 1 and rank 5 performances
than the YCbCr transformations. The YCr 4:4:4 performs better than the YCbCr 4:2:2
over all values of C and L, despite the fact that both transformations possess the same
amount of chromatic information. Thus, the Cr spectral plane has better discriminative
information for the FR application, which reinforces the conclusions made in past liter-
ature [17] and the trends observed with database DB1. The trends observed remain the
same for both C=35 and 65 as well as for performance measures, rank 1 and 5.
In conclusion, chromatic information aids the performance of the FR system significantly in difficult illumination conditions, when the shape cues are unclear, leading to degraded intensity images. When the illumination conditions are good, the addition of chromatic bytes to the face input leads to a reduction in FR performance. The bytes of information which constitute a chromatic input are therefore very important in the design of a color FR system. The trends in performances of the various transformations examined remain constant over a range of learning scenarios.
3.6.3 Supervised Learning
In this section, the effect of the various chromatic transformations will be examined on
both the evaluation databases DB1 and DB2 for an LDA based supervised FR system.
Since the LDA algorithm is susceptible to the small sample size problem, an evaluation
of the behavior of the FR system in the small sample size problem and the effect of color
information on this problem is also presented in this section.
From Figures 3.7, 3.8, 3.9 and 3.10, it is evident that, for small L, the FR system performs approximately 10% better when operated on database DB2 than on database DB1. This reconfirms the difficulty of the FR problem when operated on database DB1, as mentioned in Section 3.6.2. A general conclusion can be made on the variation of the FR system performance with respect to the number of samples per subject, L. As L increases, the performance of all the color space and gray scale transformations converges to a constant high value. This convergence occurs for a lower value of L when the FR system operates on database DB2, and can be attributed to the fact that recognition of images from database DB2 is not as hard a problem as for database DB1; the FR system therefore does not require specialized inputs for the creation of a discriminative feature space when the learning scenarios are not hard. The detailed trends and conclusions on each of the evaluation databases are provided in the Database DB1 and Database DB2 subsections that follow.
[Figure 3.7 (plots not reproduced): two panels, "Performance of YCbCr transformations with LDA (Database: DB1, C=35)" and "(Database: DB1, C=65)"; x-axis Samples/Subject, y-axis Rank 1 CRR %; curves Y, YCbCr 4:4:4, YCr (from YCbCr 4:4:4), YCbCr 4:2:2, YCr (from YCbCr 4:2:2), YCbCr 4:2:0, YCr (from YCbCr 4:2:0).]
Figure 3.7: Rank 1 performance of YCbCr transformations with LDA feature extractor (Supervised Learning), database DB1
Database DB1
As with the case of the unsupervised learning scenario, chromatic information is especially important in conditions of poor illumination. The contribution of color information is significant for low values of L, and as L increases to 9, the FR performance of all color space and gray scale transformations converges to a constant high value.
An important observation is that, for small L ≈ 2−3, the YCbCr 4:4:4 transformation leads to the worst performance among the color transformations and is only marginally better than the gray scale Y transformation. This trend can be attributed to the extremely large dimensionality of a YCbCr 4:4:4 input (thrice the corresponding Y input). The increased dimensionality causes the within class scatter matrix of the LDA learner to be ill posed, leading to a more challenging small sample size problem. In the most extreme small sample size scenario examined, L = 2, an interesting trend is noticed. When C = 35, the YCr 4:4:4 leads to the best FR performance, and when C = 65 (the hardest learning scenario), the YCr 4:4:4 and the YCbCr 4:2:2 lead to the best FR performance. These transformations are followed by YCr 4:2:2 and YCbCr 4:2:0. These transformations have dimensionalities approximately 1/2 to 2/3 that of YCbCr 4:4:4. The YCr 4:2:0 and YCbCr 4:4:4 transformations are among those which lead to the worst performance in this learning scenario. These trends lead to the idea that chromatic inputs help the FR system in the small sample size scenario; however, a color transformation with optimal dimensionality with respect to the FR performance should be chosen. This observation suggests that although increased dimensionality of color inputs could lead to a small sample size problem, color inputs with optimal dimension enhance the FR system significantly even in extreme small sample size learning scenarios. The trends are similar for both rank 1 and rank 5 performances.
[Figure 3.8 (plots not reproduced): two panels, "Performance of YCbCr transformations with LDA (Database: DB2, C=35)" and "(Database: DB2, C=65)"; x-axis Samples/Subject, y-axis Rank 1 CRR %; curves Y, YCbCr 4:4:4, YCr (from YCbCr 4:4:4), YCbCr 4:2:2, YCr (from YCbCr 4:2:2), YCbCr 4:2:0, YCr (from YCbCr 4:2:0).]
Figure 3.8: Rank 1 performance of YCbCr transformations with LDA feature extractor (Supervised Learning), database DB2
[Figure 3.9 (plots not reproduced): two panels, "Performance of YCbCr transformations with LDA (Database: DB1, C=35)" and "(Database: DB1, C=65)"; x-axis Samples/Subject, y-axis Rank 5 CRR %; same curves as Figure 3.8.]
Figure 3.9: Rank 5 performance of YCbCr transformations with LDA feature extractor (Supervised Learning), database DB1
[Figure 3.10 (plots not reproduced): two panels, "Performance of YCbCr transformations with LDA (Database: DB2, C=35)" and "(Database: DB2, C=65)"; x-axis Samples/Subject, y-axis Rank 5 CRR %; same curves as Figure 3.8.]
Figure 3.10: Rank 5 performance of YCbCr transformations with LDA feature extractor (Supervised Learning), database DB2
Database DB2
As with the unsupervised learning case, the contribution of chromatic information is not
significant when the illumination conditions are good, as the shape information present
in the intensity image is not degraded.
The contribution of chromatic information when the FR system operates on database DB2 is most pronounced in the hardest learning scenario examined, corresponding to C = 65 and L = 2. In this learning scenario, the YCr 4:2:2 and YCr 4:2:0 transformations offer a marginal improvement in both rank 1 and rank 5 CRRs over the gray scale Y. Incidentally, these are the transformations which lead to the lowest input dimensionality. This trend suggests that color helps the FR system in hard learning scenarios, even when the illumination conditions are not poor. In all the other learning scenarios, color transformations do not offer any significant improvement in performance to the FR system. As L increases, for both values of C examined, all transformations converge to a constant high value. The effect of the dimensionality of the other, higher dimensional color transformations (from YCr 4:4:4 to YCbCr 4:4:4) on the small sample size problem is very clearly seen in Figures 3.8 and 3.10. The trends observed when the FR system operates on database DB2 reinforce the theory that a color input with the best trade off between dimensionality and amount of chromatic information should be chosen, depending on the difficulty of the imaging conditions and learning scenarios. Another observation is that the YCr transformations lead to a better FR performance than the YCbCr transformations of the same dimensionality (YCr 4:4:4 versus YCbCr 4:2:2, and YCr 4:2:2 versus YCbCr 4:2:0). This leads to the conclusion that the Cr plane contains more of the discriminative information required to create the LDA feature basis than the Cb plane. This conclusion is identical to that obtained when a PCA feature extractor was applied on database DB2 in Section 3.6.2.
In conclusion, the contribution of color is most significant in severe imaging conditions and hard learning scenarios. The extra dimensionality of color inputs does have an implication on the small sample size problem encountered with an LDA learner, although if color inputs with a good dimensionality trade off are chosen, the performance of the FR system can be boosted. Another factor to be taken into consideration in the design of LDA based FR systems is the inclusion of bytes from the best spectral planes for FR purposes in the construction of the vectorized face input. Both of the above parameters are important for supervised LDA based FR system design, and depend on the severity of the illumination conditions of the face images. As the learning scenarios are relaxed, all face inputs lead to a high performance.
3.7 Conclusions
In this section, the conclusions drawn from the trends observed in the experiments are summarized, along with recommendations on the usage of chromatic information for both supervised and unsupervised FR systems for the scenarios listed in Table 3.1.
An obvious trend is that the performance of the FR system is better in the supervised learning mode for all learning scenarios and illumination conditions examined, as expected. The training was performed on the gallery set, therefore class specific projections (LDA feature basis) could provide a more discriminative feature space.
Another trend is that the overall FR performance was much higher when the FR system was operated on database DB2 than on database DB1, because database DB1 was captured under difficult imaging conditions. Chromatic information boosts the performance of FR systems under severe imaging conditions, and does not provide discriminative information to the FR system under good imaging conditions. A trend which holds true over all illumination conditions examined, for all FR systems, is that the red chromatic plane has more discriminative information than the blue plane for FR purposes. Choosing the correct bytes of chromatic information to form the face input is therefore very important.
Sub sampling of chromatic spectral planes leads to a loss of chromatic information when unsupervised feature extractors based on PCA are operated on databases with difficult imaging conditions, and thus leads to a deterioration of performance. Larger dimensional chromatic inputs help in better reconstruction and creation of a more discriminative feature space. On the other hand, when the illumination conditions are good, sub sampled chromatic planes lead to a better performance than chromatic planes without spatial sub sampling. This trend continues over all learning scenarios. The PCA algorithm does not suffer from the small sample size problem.
In supervised systems based on the LDA, the contribution of chromatic spectral planes is most pronounced in poor illumination conditions and hard learning scenarios. When the illumination conditions are good, and the number of samples per subject is not extremely small, color does not significantly help the FR system, and when the learning scenarios become less hard, both gray scale and color inputs lead to a very high performance. This trend holds true for both low and high numbers of subjects. The extra dimension of color does have an implication on the small sample size problem to which the LDA algorithm is susceptible. The improvement offered by color transformations is notable in the small sample size scenario, when the number of samples per subject is around 2 or 3; however, the best trade off between using more chromatic information and a chromatic input with reasonably low dimensionality with respect to FR performance is necessary. This chosen dimensionality would depend on the imaging conditions under which the faces were captured and the hardness of the learning scenario (number of subjects under consideration). Table 3.2 summarizes the optimal color space/gray scale transformations for both imaging conditions and learning scenarios examined in the most extreme small sample size scenario examined (number of samples per subject = 2).
                                              No. of subjects
                                              35           65
Moderate Imaging Conditions (Database DB2)    Y            YCr 4:2:0, YCr 4:2:2
Severe Imaging Conditions (Database DB1)      YCr 4:4:4    YCr 4:4:4, YCbCr 4:2:2
Table 3.2: Best Color/ Gray scale transformations in Extreme Small Sample Size scenario
3.8 Chapter Summary
In this chapter, the contribution of color inputs to the FR system was examined under
a range of learning scenarios covering those listed in Table 3.1 under good and poor
illumination conditions, for both supervised and unsupervised FR systems. It was found
that color inputs significantly help the FR system under difficult learning scenarios and
hard imaging conditions for both supervised and unsupervised systems.
The implication of chromatic sub sampling was examined for unsupervised FR sys-
tems, under different imaging conditions. It was found that under severe imaging condi-
tions, chromatic sub sampling could lead to a loss of important color information, thus
leading to a less discriminative feature basis.
Experiments were carried out to identify the implication of the extra dimension of
color inputs and the spatial sub sampling of chromatic spectral planes on the small
sample size problem encountered in supervised learning systems, and the conclusions
were presented.
Chapter 4
Decision Level Fusion of Spectral Planes
In this chapter, the decisions obtained by classifiers trained on decorrelated individual
spectral planes of YCbCr transformed inputs are fused to produce a final decision. The
impact of this decision fusion framework on the FR system, specifically on the small sam-
ple size problem encountered in supervised learning systems is discussed in this chapter.
The results also provide an insight into the discriminatory capabilities of the different
spectral planes of YCbCr transformed inputs.
4.1 Introduction and Objective
The integration of chromatic data into the FR system improves the performance of the FR
system, and this holds true especially in conditions of poor illumination as experimentally
concluded in Chapter 3. In appearance based FR methods, chromatic information is
integrated into the FR system by the fusion of information from individual spectral
planes. This fusion can take place at three levels,
• Raw data/signal level fusion: This usually involves concatenation of vectorized data from individual spectral planes forming a long vectorized chromatic input.
• Feature level fusion: This involves fusing of the feature vectors constructed from
the individual spectral planes.
• Decision level fusion: This level of fusion involves fusing of the decisions obtained
by classifiers trained on individual spectral planes.
Face inputs from color spaces in which each of the spectral planes provides different and complementary information to the FR system, i.e., where the information contained in each of the spectral planes is decorrelated, lead to a more efficient use of chromatic information and better FR systems. In the case of signal level fusion, concatenation of decorrelated information from individual spectral planes, where each spectral plane offers unique discriminative information to the FR system, would lead to a vectorized input with low redundancy. Similarly, in the case of decision level fusion, classifiers trained on decorrelated information would lead to a diverse set of classifiers, which theoretically would result in a better multiple classifier FR system. This trend of decorrelated color spaces leading to promising FR systems is supported by the results of past works [26, 17] and holds for all levels of fusion.
Fusion at the raw data level leads to a vectorized chromatic input of a dimension µ times that of the corresponding gray scale image, where µ is a function of the number of spectral planes, K, in the color space and the spatial sub sampling structure of the chromatic planes. This vectorized chromatic input of increased dimension could typically aggravate the small sample size problem in supervised FR systems, as discussed in Chapter 3. This issue therefore motivates the usage of a different level of fusion of information in spectral planes. In [24], M.T. Sadeghi et al. examined different levels of fusion of information of spectral planes on inputs of the RGB color space. It was found that the most effective types of fusion were fusion at the feature level, performed by the concatenation of the low dimensional feature vectors formed from the individual spectral planes, and decision level fusion, performed by a similarity score average at the classifier level of the decisions obtained by the classifiers trained on the individual spectral planes. However, fusion at the decision level is computationally the simplest method of fusion, and can avoid the passing of large dimensional inputs through the feature extraction process.
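As an illustration of the factor µ, consider the inputs used in Chapter 3. The symbol r below is introduced only for this example and is not part of the original notation: it denotes the fraction of pixels that each chromatic plane retains after spatial sub sampling, assuming that the luminance plane is kept at full resolution and that all K − 1 chromatic planes are sub sampled in the same way.

\[ \mu = 1 + (K - 1)\, r, \qquad r = 1 \ \text{(4:4:4)}, \quad r = \tfrac{1}{2} \ \text{(4:2:2)}, \quad r = \tfrac{1}{4} \ \text{(4:2:0)} \]

Under these assumptions, the YCbCr inputs (K = 3) give µ = 3, 2 and 3/2 for the 4:4:4, 4:2:2 and 4:2:0 structures respectively, and the YCr inputs (K = 2) give µ = 2, 3/2 and 5/4. This agrees with the observation in Chapter 3 that YCr 4:4:4 and YCbCr 4:2:2 have the same dimensionality.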
In Chapter 3, YCbCr transformed face inputs were fused at the signal level and
were shown to improve the performance of FR systems especially in poor illumination
conditions and difficult learning scenarios, specifically the small sample size learning
scenario. It was also concluded that sub sampling ratio could be used as a parameter
to control the trade off between dimensionality of the face input and the amount of
chromatic information used, for effective use of color inputs in small sample size scenarios
for different imaging conditions. In this chapter, the idea of creating a multiple classifier
system for FR purposes by fusing the decisions of classifiers trained on individual spectral
planes of YCbCr transformed inputs is explored, and the impact of this framework on the
small sample size learning scenario is specifically examined. Since the YCbCr color space
is decorrelated, it is expected to lead to a good multiple classifier system with diverse
classifiers.
In summary, the objectives/ issues examined in this chapter are,
• Address the small sample size problem caused by increased dimensionality of vec-
torized chromatic inputs by exploring fusion of information from spectral planes on
a decision level. The effect of spatial sub sampling of chromatic spectral planes on
this framework is also examined.
• The results of the above two issues can also be used to determine the spectral
plane which performs best under the different imaging conditions examined, as
each spectral plane offers distinct information to the FR system.
In supervised learning FR systems, all color space/gray scale transformations converge to a constant high value of performance as the number of samples per subject available for training increases, as experimentally concluded in Chapter 3. However, in the extreme small sample size scenario, the performance of the system is severely affected by high dimensionality inputs and inputs with low discriminative information. Therefore the small sample size scenarios are of particular interest in supervised learning FR systems.
4.2 Combination Strategies
In this section, the various methods by which decisions of classifiers trained on different spectral planes are combined are explained. In this chapter, rule based fusion methods (whose fusion rules do not depend on the training data) are used for the combination of decisions. Although data dependent fusion methods are expected to lead to a better performance [42], they are not examined in this chapter as the focus is to examine the effect of the small sample size problem on the multiple classifier fusion framework.
Let z be a probe or an unknown face input to be identified which consists of K spectral planes, such that $z = \{s_m\}_{m=1}^{K}$, where $s_m$ is the mth spectral plane of image z. The aim of the multiple classifier system is to determine the identity of z, $\omega_j$, from among the C classes, where $\omega_j \in \{1, 2, 3, \ldots, C\}$. In the case of the experiments performed in this thesis, K = 3 for YCbCr transformed inputs.
A multiple classifier FR system can be formulated on the framework of Bayesian estimation theory [30]. According to Bayesian estimation theory,
assign z to $\omega_j$ if
\[ P(\omega_j / z) = \max_{k=1}^{C} P(\omega_k / z) \tag{4.1} \]
Since z contains 3 spectral planes $s_1$, $s_2$ and $s_3$,
\[ P(\omega_j / s_1, s_2, s_3) = \max_{k=1}^{C} P(\omega_k / s_1, s_2, s_3) \tag{4.2} \]
Equations (4.1) and (4.2) suggest that z is assigned to the class which has the maximum
a posteriori probability, given the spectral planes $\{s_m\}_{m=1}^{3}$ of z. The estimation
of this a posteriori probability, P (ωj/z), would depend on the fusion rule adopted.
Sum Rule
In the sum rule, the a posteriori probability is given by,

\[ P(\omega_k / z) = P(\omega_k / s_1, s_2, s_3) \approx \sum_{m=1}^{3} P(\omega_k / s_m) \qquad (4.3) \]
The probability value P (ωk/sm) lies in the interval [0,1] and is typically the value of
the similarity score between sm and the mth spectral plane of an image of class ωk in the
gallery.
By substituting Equation (4.3) in Equation (4.2), the sum rule is given by Equation
(4.4)

assign $z = \{s_m\}_{m=1}^{3}$ to class ωj if,

\[ \sum_{m=1}^{3} P(\omega_j / s_m) = \max_{k=1}^{C} \sum_{m=1}^{3} P(\omega_k / s_m) \qquad (4.4) \]
Max Rule
Similarly, for the max rule, the a posteriori probability is given by,

\[ P(\omega_k / z) = P(\omega_k / s_1, s_2, s_3) \approx \max_{m=1}^{3} P(\omega_k / s_m) \qquad (4.5) \]

Substituting Equation (4.5) in Equation (4.2), the max rule of fusion is given by,

assign $z = \{s_m\}_{m=1}^{3}$ to class ωj if,

\[ \max_{m=1}^{3} P(\omega_j / s_m) = \max_{k=1}^{C} \max_{m=1}^{3} P(\omega_k / s_m) \qquad (4.6) \]
Min Rule
Similarly, the min rule is given by Equation (4.7),

assign $z = \{s_m\}_{m=1}^{3}$ to class ωj if,

\[ \min_{m=1}^{3} P(\omega_j / s_m) = \max_{k=1}^{C} \min_{m=1}^{3} P(\omega_k / s_m) \qquad (4.7) \]
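
The three rules in Equations (4.4), (4.6) and (4.7) differ only in how the per-plane scores are combined before the maximum over classes is taken. The following is a minimal Python sketch of that idea, not the implementation used in the thesis. The function name `fuse_and_classify` and the score values are hypothetical; `scores[m][k]` stands for P (ωk/sm), and class indices start at 0 here rather than 1.

```python
def fuse_and_classify(scores, rule="sum"):
    """Return the index of the winning class under a rule based fusion.

    scores[m][k] is the similarity based estimate of P(class k / plane m),
    assumed to lie in [0, 1]. rule is one of "sum", "max" or "min".
    """
    combine = {"sum": sum, "max": max, "min": min}[rule]
    num_classes = len(scores[0])
    # Combine the per-plane scores of each class (inner operator of Eq. 4.4/4.6/4.7).
    fused = [combine(plane[k] for plane in scores) for k in range(num_classes)]
    # Assign to the class with the largest fused score (outer max over k).
    return max(range(num_classes), key=fused.__getitem__)


# Hypothetical scores for K = 3 planes and C = 3 classes (illustration only).
scores = [
    [0.90, 0.60, 0.10],  # Y plane
    [0.20, 0.50, 0.30],  # Cb plane
    [0.30, 0.60, 0.20],  # Cr plane
]
decisions = {rule: fuse_and_classify(scores, rule) for rule in ("sum", "max", "min")}
```

In this toy case the max rule follows the single confident Y score, while the sum and min rules prefer the class that all three planes support moderately, which illustrates how the rules can disagree on the same probe.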
4.3 Methodology and Experimental Setup
For the experiments in this chapter, the FR system is trained on the gallery set, Z. The
images of probe set, Q are matched against those of the gallery, Z. The FR system is
operated in the identification mode. Figure 4.1 presents a pictorial representation of the
system used.
The images used for the experiments contain irrelevant information along with
the face, e.g., hair, background, shoulders, etc. The face is isolated from these images
in the preprocessing step, which follows the method explained in Appendix B. The
resolution of the images after preprocessing is fixed to 150×130, as in the experiments
performed in Chapter 3, since this resolution is commonly used in surveillance
applications. Each of the preprocessed faces is then vectorized following the procedure
detailed in Section 3.2.
The images of the faces are stored in RGB format, and are converted to the required
color space in the color space transformation block. The YCbCr set of color
transformations is used in the experiments, namely YCbCr 4:4:4, YCbCr 4:2:2 and
YCbCr 4:2:0. Following the color space transformation step, the Y, Cb, and Cr planes
are isolated, and passed through the remaining procedures as detailed in Figure 4.1.
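
As an illustration of how the three sampling structures change the size of each spectral plane, the sketch below subsamples a chroma plane by plain decimation and reports the resulting vector lengths. The decimation scheme (keeping every second sample, with no filtering) and the helper names `subsample_plane` and `plane_lengths` are assumptions made for this example; the thesis does not state how the subsampling was implemented.

```python
def subsample_plane(plane, scheme):
    """Subsample one chroma plane given as a list of rows.

    Assumption: plain decimation. 4:2:2 halves the horizontal resolution,
    4:2:0 halves both the horizontal and the vertical resolution.
    """
    steps = {"4:4:4": (1, 1), "4:2:2": (1, 2), "4:2:0": (2, 2)}
    if scheme not in steps:
        raise ValueError("unknown sampling structure: " + scheme)
    row_step, col_step = steps[scheme]
    return [row[::col_step] for row in plane[::row_step]]


def plane_lengths(height, width, scheme):
    """Vector lengths of the (Y, Cb, Cr) planes; only chroma is subsampled."""
    dummy = [[0] * width for _ in range(height)]
    chroma = subsample_plane(dummy, scheme)
    n_chroma = len(chroma) * len(chroma[0])
    return height * width, n_chroma, n_chroma


# 150 x 130 faces as in the experiments; which side is the height is an assumption.
lengths = {s: plane_lengths(150, 130, s) for s in ("4:4:4", "4:2:2", "4:2:0")}
```

For a 150×130 face the Y plane always has 19500 elements, while each chroma plane has 19500, 9750 or 4875 elements for the 4:4:4, 4:2:2 and 4:2:0 structures respectively.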
Since the main objective of this chapter is to examine the effect of using a multiple
classifier FR system on the small sample size problem encountered in supervised FR
systems, the LDA based feature extractor is used. To ease the inversion of the within
class scatter matrix, SW , a PCA step is performed prior to the LDA, and the number
of features retained by the LDA feature extractor is C − 1, where C is the number of
subjects. The evaluation databases chosen for experiments in this chapter are the same
as those in Chapter 3, DB1 and DB2. The number of subjects, C is however fixed to 65,
and the samples per subject for training, L are varied between 2 and 9. The method of
creating the gallery, Z and probe, Q sets is also the same as the procedure described for
the experiments in Chapter 3.
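
The PCA followed by LDA feature extraction described above can be sketched as follows. This is an illustrative NumPy version under stated assumptions, not the code used for the experiments: the function name `fit_pca_lda`, the choice of N − C retained PCA components (the value quoted from Chapter 3 later in this thesis) and the toy data sizes are all assumptions.

```python
import numpy as np


def fit_pca_lda(X, y):
    """Learn a PCA + LDA basis. X: (N, d) vectorized faces, y: (N,) labels.

    Returns the training mean and a (d, C - 1) projection matrix.
    """
    classes = np.unique(y)
    N, C = X.shape[0], classes.size
    mean = X.mean(axis=0)
    Xc = X - mean
    # PCA via thin SVD; keep N - C components so the projected S_W is invertible.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W_pca = Vt[: N - C].T                      # (d, N - C)
    Z = Xc @ W_pca                             # PCA features, overall mean is zero
    m = N - C
    Sw = np.zeros((m, m))
    Sb = np.zeros((m, m))
    for c in classes:
        Zc = Z[y == c]
        mu = Zc.mean(axis=0)
        Sw += (Zc - mu).T @ (Zc - mu)          # within class scatter
        Sb += Zc.shape[0] * np.outer(mu, mu)   # between class scatter
    # Leading eigenvectors of inv(S_W) S_B; at most C - 1 are informative.
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(-evals.real)[: C - 1]
    return mean, W_pca @ evecs[:, order].real


# Toy data (sizes are illustrative, not those of DB1/DB2).
rng = np.random.default_rng(0)
C, L, d = 4, 3, 40
X = rng.normal(size=(C * L, d)) + np.repeat(rng.normal(size=(C, d)) * 3, L, axis=0)
y = np.repeat(np.arange(C), L)
mean, W = fit_pca_lda(X, y)
features = (X - mean) @ W                      # (N, C - 1) LDA features
```

In the decision level fusion framework, one such basis would be learned per spectral plane (W_Y, W_Cb and W_Cr in Figure 4.1), each on an input of lower dimensionality than the concatenated color vector.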
The decisions of the classifiers trained on the Y, Cb and Cr spectral planes
are fused in an aggregation step as depicted in Figure 4.1. The aggregation methods are
the sum rule, the max rule and the min rule. In the next section, the three aggregation
rules are compared and the best performing one(s) are chosen for the remainder
of the experiments. The normalized inner product is used as the similarity metric, as
the experiments in Section 3.6 of Chapter 3 showed that it leads to a better performance
than the Euclidean distance based metric. The FR system is first trained
on Z and evaluated on Q to produce a Rank k Correct Recognition Rate (CRR). For
our experiments, k = 1, 5. To avoid bias, the reported results are averaged over more
than 5 runs, each performed on a random gallery-probe partition.
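
The evaluation protocol can be made concrete with a short sketch of the normalized inner product and of a rank k CRR computation. The helper names, the choice of scoring each gallery identity by its best matching gallery image, and the toy feature vectors are assumptions for illustration; the thesis does not spell out these details at this point.

```python
import math


def normalized_inner_product(a, b):
    """Cosine of the angle between two feature vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_k_crr(probe_feats, probe_ids, gallery_feats, gallery_ids, k):
    """Percentage of probes whose true identity is among the k best matches."""
    hits = 0
    for feat, true_id in zip(probe_feats, probe_ids):
        best = {}
        # Assumption: an identity is scored by its most similar gallery image.
        for g_feat, g_id in zip(gallery_feats, gallery_ids):
            s = normalized_inner_product(feat, g_feat)
            best[g_id] = max(s, best.get(g_id, float("-inf")))
        ranked = sorted(best, key=best.get, reverse=True)
        hits += true_id in ranked[:k]
    return 100.0 * hits / len(probe_ids)


# Toy 2-D features (illustration only).
gallery_feats = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
gallery_ids = [0, 1, 2]
probe_feats = [[0.9, 0.1], [0.6, 0.8]]
probe_ids = [0, 0]
crr_rank1 = rank_k_crr(probe_feats, probe_ids, gallery_feats, gallery_ids, k=1)
crr_rank2 = rank_k_crr(probe_feats, probe_ids, gallery_feats, gallery_ids, k=2)
```

The second probe is closest to the wrong identity but has the right one as its second best match, so it counts as an error at rank 1 and as a correct recognition at rank 2.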
[Figure 4.1: System Diagram: Decision Level Fusion. Block diagram with a training
path and a testing path. Training: the gallery set passes through preprocessing and
construction of the column vector and the YCbCr color space transformation; the Y, Cb
and Cr inputs are each fed to a linear discriminant analysis block, giving WY, WCb and
WCr. Testing: the probe image passes through the same preprocessing and color space
transformation; its Y, Cb and Cr planes are projected onto WY, WCb and WCr, followed
by a similarity computation per plane, decision fusion / aggregation, and classification,
which outputs the probe ID.]
4.4 Results
In this section, the effect of decision level fusion is examined in a range of
small sample size learning scenarios. Experiments are performed both on images with
severe illumination conditions (database DB1) and on images with moderate imaging
conditions (database DB2). The number of subjects is fixed to C = 65 as mentioned, and
the number of samples per subject available for training is L ∈ {2, 3, 4, 6, 9}. The
discriminatory information present in the individual spectral planes of the YCbCr
transform is also studied for both evaluation databases.
In order to examine the improvement achieved by decision level fusion, a new performance
measure, $\beta^*$, has been introduced. This signifies the best improvement obtained
by a decision level fusion over the raw data level fusion. A negative value of $\beta^*$ signifies
an improvement. The rank 1 and rank 5 results for evaluations on both database DB1
and database DB2 are presented in Tables 4.1-4.12. $\beta^*_{444}$, $\beta^*_{422}$ and $\beta^*_{420}$ signify the best
improvements obtained by a decision level fusion for the YCbCr 4:4:4, YCbCr 4:2:2 and
YCbCr 4:2:0 transformations respectively. The Y column in all tables gives the
performance of the gray scale baseline.
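
The tabulated values of β∗ are consistent with the signal level CRR minus the better of the two decision level CRRs (max rule and sum rule). The sketch below uses that reading, which is inferred from the tables rather than stated as a formula in the text; the function name `beta_star` is hypothetical.

```python
def beta_star(signal_level_crr, decision_level_crrs):
    """Signal level CRR minus the best decision level CRR (negative = improvement).

    Assumption: this definition is inferred from the values in Tables 4.1-4.12.
    """
    return round(signal_level_crr - max(decision_level_crrs), 2)


# Table 4.1, L = 2: signal level 78.43, max rule 80.84, sum rule 83.25.
b_db1 = beta_star(78.43, [80.84, 83.25])
# Table 4.9, L = 2: signal level 94.37, max rule 93.69, sum rule 90.5.
b_db2 = beta_star(94.37, [93.69, 90.5])
```

The first value is negative (decision level fusion helps) and the second is positive (it does not). The results differ from the tabulated -4.83 and 0.67 by 0.01, presumably because the tables were computed from unrounded averages.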
4.4.1 Choice of Aggregation Rule
In order to evaluate the contribution of fusion or aggregation at a decision level, a good
aggregation method should be chosen. Three aggregation methods (the sum rule, the
max rule and the min rule) are evaluated on images from both evaluation databases. The
color space transformation used was YCbCr 4:4:4, and the aggregation methods were
evaluated using the Rank 1 CRR performance measure. All methods of decision level
fusion were compared against the raw data level fusion of the Y, Cb, Cr spectral planes.
The graphs in Figure 4.2 show that the sum rule leads to the best performance when
database DB1 is used, while the max rule leads to the best performance when database
DB2 is used. The min rule of fusion does not lead to as good an FR performance as the
raw data level fusion or the other methods of decision aggregation evaluated. This leads
to the conclusion that more optimistic decision rules lead to better FR performance.
In the case of the min rule, the decision combiner reports a classification error if even
one of the component classifiers (trained on one of the spectral planes) reports a
misclassification or a low a posteriori probability for the correct class. The most optimistic
rules, the sum rule and the max rule, therefore lead to the best FR performance, as supported
by past literature on classifier combination [30]. The sum rule and max rule decision
aggregators are therefore used for the remainder of the experiments.

[Figure 4.2: Comparison of Aggregation Rules on Databases: DB1 and DB2. Two panels,
"Comparison of Aggregation Methods (Database: DB1)" and "Comparison of Aggregation
Methods (Database: DB2)", each plotting Rank 1 CRR % (75 to 100) against
Samples/Subject (2 to 9) for YCbCr (min rule), YCbCr (max rule), YCbCr (sum rule)
and YCbCr (raw data fusion).]
Another trend observed is that the best improvement offered by decision fusion is in
the extreme small sample size scenario examined, L = 2. As the value of L increases, this
contribution also reduces, and all fusion methods converge to a constant high performance
for large L. This aspect will be examined in the subsequent sub sections.
4.4.2 FR Performance: Poor Illumination conditions
Similar to the experimental results obtained in Chapter 3, the experimental results in
Tables 4.1-4.3 suggest that under all learning scenarios examined, images with chromatic
information lead to a better FR performance than pure gray scale images, irrespective
of the method and level of fusion used. The corresponding rank 5 results are presented in
Tables 4.4-4.6. In poor illumination conditions, the shape cues are unclear in the intensity
image, therefore chromatic information significantly helps in boosting the performance
of the FR system.
Effect of decision level fusion in small sample size scenarios
From Tables 4.1-4.3, the values of $\beta^*_{444}$, $\beta^*_{422}$, and $\beta^*_{420}$ are negative for all values of
L, which indicates that fusion of classifiers on a decision level boosts the performance
of the FR system. As expected, the values of $|\beta^*|$ for all YCbCr transformations are
highest in the most extreme case of small sample size learning examined (L = 2), and
generally decrease as L increases (the only exception being YCbCr 4:4:4 between L = 4
and L = 6). This is because aggregation of information from different spectral planes on
a decision level avoids passing a higher dimensional input vector consisting of color
information through the feature extraction process, thus reducing the number of
parameters to be estimated in the within class scatter matrix, SW, and leading to a less
severe small sample size problem.

Table 4.1: Rank 1 CRR in % (YCbCr 4:4:4, Database DB1)

                                 YCbCr
L    Y       Cr      Cb      signal level   max     sum     $\beta^*_{444}$
2    75.53   80.56   71.09   78.43          80.84   83.25   -4.83
3    84.91   89.68   82.01   88.82          89.91   90.96   -2.14
4    91.02   93.62   86.83   92.99          93.57   94.66   -1.67
6    97.03   97.51   94.28   96.85          97.92   98.67   -1.82
9    99.39   99.26   97.63   99.56          99.58   99.74   -0.19

Table 4.2: Rank 1 CRR in % (YCbCr 4:2:2, Database DB1)

                                 YCbCr
L    Y       Cr      Cb      signal level   max     sum     $\beta^*_{422}$
2    75.53   81.24   70.73   79.97          81.49   83.52   -3.55
3    84.92   89.24   82.20   88.44          89.81   90.94   -2.5
4    91.02   93.92   88.08   93.57          93.91   95.09   -1.52
6    97.02   98.08   93.36   97.13          98      98.59   -1.46
9    99.39   99.46   97.98   99.49          99.58   99.81   -0.32

Table 4.3: Rank 1 CRR in % (YCbCr 4:2:0, Database DB1)

                                 YCbCr
L    Y       Cr      Cb      signal level   max     sum     $\beta^*_{420}$
2    75.53   81.63   69.18   78.69          81      83.50   -4.80
3    84.91   88.68   81      87.37          88.87   90.56   -3.18
4    91.02   93.37   88.05   92.94          94.07   95.43   -2.49
6    97.03   97.03   93.88   97.41          97.54   98.31   -0.90
9    99.39   99.36   97.34   99.29          99.68   99.71   -0.42
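
To make the dimensionality argument of this subsection concrete, the sketch below compares a signal level YCbCr 4:4:4 input with a single spectral plane for C = 65 and L = 2. It counts quantities in the raw input space; since a PCA step precedes the LDA in these experiments, the numbers only indicate the relative severity of the problem. The function name `small_sample_summary` is hypothetical, and the rank bound N − C is the standard bound for a within class scatter matrix built from N samples of C classes.

```python
def small_sample_summary(d, num_subjects, samples_per_subject):
    """Summarize how under-determined S_W is for a d-dimensional input."""
    n_train = num_subjects * samples_per_subject
    return {
        "d": d,                                           # input dimensionality
        "N": n_train,                                     # training images
        "rank_bound": min(d, n_train - num_subjects),     # rank(S_W) <= N - C
        "free_params": d * (d + 1) // 2,                  # entries of symmetric S_W
    }


PLANE = 150 * 130                                         # one full resolution plane
signal_444 = small_sample_summary(3 * PLANE, 65, 2)       # Y, Cb, Cr concatenated
single_plane = small_sample_summary(PLANE, 65, 2)         # one classifier per plane
ratio = signal_444["free_params"] / single_plane["free_params"]
```

With the same 130 training images, the concatenated 4:4:4 vector requires roughly nine times as many scatter matrix entries to be estimated as a single plane, which is the effect that decision level fusion avoids.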
The values of $|\beta^*_{422}|$ are slightly lower than those of $|\beta^*_{444}|$ and $|\beta^*_{420}|$ in the extreme
small sample size learning scenarios. This suggests that the improvement offered by
fusing on a decision level is slightly lower for YCbCr 4:2:2 transformed inputs. Experimental
results in Chapter 3 suggest that in conditions of severe illumination, a transformation
with the right trade off between the amount of chromatic information contained and the
input dimensionality should be used. Among the YCbCr transformations, this optimal
trade off was obtained by the YCbCr 4:2:2. When the spectral planes are fused at the
signal level, the YCbCr 4:4:4 and YCbCr 4:2:0 transformations have a very high
dimensionality and less chromatic information respectively. The improvement offered by
fusing on a decision level is therefore lower for YCbCr 4:2:2 than for YCbCr 4:4:4 and
YCbCr 4:2:0 transformed inputs, as its raw data fusion performs better.

Table 4.4: Rank 5 CRR in % (YCbCr 4:4:4, Database DB1)

                                 YCbCr
L    Y       Cr      Cb      signal level   max     sum     $\beta^*_{444}$
2    80.64   86.09   77.41   83.88          85.89   87.61   -3.73
3    87.05   91.84   84.81   90.77          91.99   92.35   -1.58
4    92.71   94.98   89.66   95.05          95.14   95.93   -0.88
6    97.03   97.51   94.28   96.85          97.92   98.67   -1.82
9    99.39   99.26   97.63   99.55          99.58   99.74   -0.19

Table 4.5: Rank 5 CRR in % (YCbCr 4:2:2, Database DB1)

                                 YCbCr
L    Y       Cr      Cb      signal level   max     sum     $\beta^*_{422}$
2    80.64   86.7    77.6    85.32          86.68   88.06   -2.74
3    87.05   91.65   85.19   90.32          91.6    92.61   -2.29
4    92.71   95.38   90.7    94.8           95.52   96.31   -1.52
6    97.03   98.08   93.36   97.13          98      98.59   -1.46
9    99.39   99.46   97.98   99.49          99.58   99.81   -0.32

Table 4.6: Rank 5 CRR in % (YCbCr 4:2:0, Database DB1)

                                 YCbCr
L    Y       Cr      Cb      signal level   max     sum     $\beta^*_{420}$
2    80.64   86.8    75.83   83.68          86.03   87.63   -3.95
3    87.05   90.88   84.19   89.47          90.64   92.31   -2.84
4    92.71   95.02   90.41   94.32          95.5    96.45   -2.13
6    97.03   97.03   93.87   97.41          97.54   98.31   -0.9
9    99.39   99.36   97.34   99.29          99.68   99.71   -0.42
As mentioned earlier, the sum fusion rule leads to a better FR performance for images
in this evaluation database. However no trend is observed in the performance of decision
level fusion with respect to the sampling structure of the chromatic input.
Discriminative capacity of Individual Spectral Planes
From Tables 4.1-4.3, a trend observed is that the red chrominance plane, Cr, carries
the most discriminative information, while the blue chrominance plane, Cb, carries the
least. This is in agreement with previous works and experiments on color FR. The
information contained in the intensity plane, Y, is not sufficient for good FR performance.
As the evaluation method is relaxed, i.e., when the rank 5 performance measure is
used (Tables 4.4-4.6), the FR system performances are higher as expected. However, the
improvement in FR performance offered by a decision level aggregation, i.e., the values
for |β∗| for all YCbCr transformations are lower than the corresponding rank 1 results in
the small sample size learning scenarios.
4.4.3 FR Performance: Good Illumination conditions
In this section, the discriminatory ability of the various spectral planes and the effect
of a decision level aggregation are examined for images captured
in moderate illumination conditions. The experimental rank 1 results are provided in
Tables 4.7-4.9. A general conclusion is that in conditions of moderate/light illumination
variations, when the shape cues of the image are clear, color information does not improve
the performance of the FR system. This is similar to the conclusions of the experiments
in Chapter 3.
Effect of decision level fusion in small sample size scenarios
From Tables 4.7-4.9, it is observed that fusion of the spectral planes on any
level does not lead to a boost in performance compared to using gray scale information
alone. This can be attributed to the low discriminative ability of the Cb plane and to the
clear intensity images. The conclusions of Chapter 3 suggest that in the extreme small
sample size learning scenario, which corresponds to L = 2, the YCr 4:2:0 transformation
(when fused on the signal level) leads to a better performance than gray scale images
alone. Therefore a fusion of the Y and Cr planes on a decision level could boost the FR
performance over the gray scale Y transformation. However, the contribution of fusion of
spectral planes on a decision level over the signal level fusion can still be discussed from
the trends observed in Tables 4.7-4.9.

Table 4.7: Rank 1 CRR in % (YCbCr 4:4:4, Database DB2)

                                 YCbCr
L    Y       Cr      Cb      signal level   max     sum     $\beta^*_{444}$
2    95.23   81.87   62.81   89.1           94.17   91.63   -5.08
3    98.56   91.32   76.23   96.6           98.08   96.52   -1.48
4    99.74   95.41   84.08   98.53          99.29   98.63   -0.77
6    99.98   99.11   90.55   99.62          99.98   99.54   -0.36
9    100     99.47   96.27   99.94          100     99.94   -0.06

Table 4.8: Rank 1 CRR in % (YCbCr 4:2:2, Database DB2)

                                 YCbCr
L    Y       Cr      Cb      signal level   max     sum     $\beta^*_{422}$
2    95.23   80.6    61.69   92.42          93.56   90.81   -1.13
3    98.56   90.69   76.52   97.43          98.06   96.05   -0.63
4    99.74   95.77   83.1    99.29          99.57   98.7    -0.28
6    99.98   98.27   91.25   99.81          99.95   99.78   -0.14
9    100     99.79   95.86   100            100     100     0

Table 4.9: Rank 1 CRR in % (YCbCr 4:2:0, Database DB2)

                                 YCbCr
L    Y       Cr      Cb      signal level   max     sum     $\beta^*_{420}$
2    95.23   80.94   60.31   94.37          93.69   90.5    0.67
3    98.56   90.89   76.42   98.32          98.48   96.48   -0.16
4    99.74   94.59   82.29   99.53          99.32   98.61   0.21
6    99.98   98.1    90.19   99.93          99.93   99.74   0
9    100     99.38   95.53   99.97          100     99.97   -0.03
The values of $|\beta^*_{444}|$, $|\beta^*_{422}|$ and $|\beta^*_{420}|$ are highest for the extreme small sample size
scenario and generally decrease with the increase in L for all three YCbCr transformations
examined, which suggests that the contribution of decision level aggregation is greatest in
the extreme small sample size scenarios. This is similar to the trends observed in the case
where evaluation database DB1 was used. When a raw data level fusion is performed, the
Rank 1 CRR for L = 2 is lowest for the YCbCr 4:4:4 transformation and increases with
YCbCr 4:2:2 and YCbCr 4:2:0. The contribution of fusion, $|\beta^*|$, is highest for YCbCr 4:4:4
(≈ 5.07) and reduces as the chromatic planes are subsampled more heavily. This can be
attributed to the fact that the YCbCr 4:4:4 has the highest dimensionality and is most
severely affected by the small sample size problem when a signal level fusion is performed.
The value of $|\beta^*_{420}|$ suggests that the performance of the FR system obtained by decision
level aggregation is almost the same as that obtained by a raw data level fusion.

Table 4.10: Rank 5 CRR in % (YCbCr 4:4:4, Database DB2)

                                 YCbCr
L    Y       Cr      Cb      signal level   max     sum     $\beta^*_{444}$
2    97.35   88.19   71.31   93.96          96.79   95.38   -2.83
3    99.13   93.54   80.28   97.75          98.83   97.55   -1.07
4    99.89   96.94   86.52   98.93          99.57   99.27   -0.64
6    99.98   99.11   90.55   99.62          99.98   99.54   -0.36
9    100     99.47   96.27   99.94          100     99.94   -0.06

Table 4.11: Rank 5 CRR in % (YCbCr 4:2:2, Database DB2)

                                 YCbCr
L    Y       Cr      Cb      signal level   max     sum     $\beta^*_{422}$
2    97.35   86.56   69.69   95.46          96.62   94.58   -1.15
3    99.13   92.91   80.36   98.26          98.83   97.35   -0.57
4    99.89   97.03   86.2    99.64          99.76   99.34   -0.13
6    99.98   98.27   91.25   99.81          99.95   99.78   -0.14
9    100     99.79   95.86   100            100     100     0

Table 4.12: Rank 5 CRR in % (YCbCr 4:2:0, Database DB2)

                                 YCbCr
L    Y       Cr      Cb      signal level   max     sum     $\beta^*_{420}$
2    97.35   87.67   68.23   97.37          96.96   94.4    0.4
3    99.13   93.26   80.14   98.97          99.07   97.41   -0.1
4    99.89   96.15   85.68   99.7           99.64   99.29   0.06
6    99.98   98.1    90.19   99.93          99.93   99.74   0
9    100     99.38   95.53   99.97          100     99.97   -0.03
The max rule of decision aggregation leads to the best aggregation performance
in most cases. Similar to the previous results on database DB1, no trend is observed
in the performance of decision level fusion with respect to the sampling structure of the
chromatic input.
Discriminative capacity of Individual Spectral Planes
As mentioned earlier, when the imaging conditions are optimal, the gray scale image (Y
transformation) contains sufficient discriminative information for a well performing FR
system. However the Cb plane leads to a poor performance, and should be avoided when
fusing chromatic information.
When the performance measure is relaxed, i.e., when the rank 5 performance measure
is used (Tables 4.10-4.12), the CRRs obtained are higher. Also, the values of |β∗| are
generally lower than the corresponding rank 1 results.
4.5 Conclusion
In this section, the conclusions of the trends observed in this chapter are summarized.
Chromatic information in general improves the performance of the FR system in condi-
tions of severe illumination. When the imaging conditions are optimal, the shape cues
present in the intensity image provide enough discriminatory information for the FR
system, and the system's performance is not improved by the integration of chromatic
spectral planes. This is in agreement with the conclusions in Chapter 3.
The Cb plane contains poor discriminative information when used with supervised
learning systems, especially in moderate imaging conditions. A good aggregation of in-
formation from the Y and Cr planes is therefore expected to boost the FR performance
over using Y alone when the imaging conditions are moderate. When the imaging con-
ditions are severe, the Cr plane offers the most significant discriminative information,
while the Y plane leads to the best individual performance when the imaging conditions
are optimal.
An important conclusion is that fusion of spectral planes on a decision level leads to
a better use of chromatic information in conditions of small sample size learning. This
is because a decision level aggregation helps avoid the passing of a larger dimension
input through the feature extraction process. This holds true for all imaging conditions
examined. As the small sample size condition is relaxed, i.e., more samples per subject
are available for training, the contribution of decision level aggregation over a signal level
fusion is reduced.
4.6 Chapter Summary
In this chapter, the effect of a decision level fusion of information from individual spectral
planes of YCbCr inputs was examined over a range of small sample size learning scenarios.
Small sample size scenarios are of particular interest in supervised FR systems. As the
small sample size condition is relaxed, all gray scale and color space transformations,
fused at any level, converge to a high FR performance. However, there is still scope for
improving FR performance in the small sample size scenarios, where little training data
is available.
Experimental results suggest that a decision level aggregation of classifiers trained on
individual spectral planes boosts the performance of the FR system over a signal level
fusion of information, and that this improvement is most significant in the small
sample size learning scenarios. The discriminative capability of the individual spectral
planes of the YCbCr transformation was also examined, and the results were presented
in this chapter. It was concluded that the Cb spectral plane does not significantly help
the FR performance, compared to the intensity Y and Cr spectral planes.
Chapter 5
Color Face Recognition in
Ada-Boost framework
In this chapter, intensity and chromatic information is used as an input to the FR system
to create complementary classifiers to be combined in a decision fusion framework, in or-
der to address complexities in face patterns and severe imaging conditions. Complexities
in face patterns manifest in the form of expression and pose variations and severe imag-
ing conditions take the form of severe illumination / lighting conditions, poor resolution,
etc. These conditions cannot be easily learned by linear feature extractors. Complemen-
tary classifiers are created by ensemble learning using the adaptive boosting (ada-boost)
framework.
5.1 Introduction
Features based on color information lead to a better recognition performance in FR sys-
tems as confirmed by the experiments in the previous chapters and past works [15, 16,
17, 24, 25, 26]. The results in Chapters 3 and 4 suggest that color makes object recognition
more robust to imaging conditions such as illumination, and a face space created
with a supervised learning method based on the LDA criterion, trained on intensity and
chromatic information leads to a good FR performance in poor illumination conditions.
This enhances the performance of the FR system by combining the advantages of both
chromatic features and supervised learning.
However, linear feature extractors based on the LDA criterion cannot effectively learn
complexities in face patterns which occur when face patterns are subject to pose and
expression variations, and therefore lead to a deterioration in FR performance under these
conditions [11, 36]. Variations due to factors like illumination, pose and expression could
cause larger intra subject variations in faces than variations due to change in identity and
hence are crucial to address. In order to take complexities in face patterns into account
while training the system, linear methods of learning like the LDA should be replaced
by either globally nonlinear models, like those based on kernel discriminant analysis [36],
or by a linear combination of locally linear models (ensemble based models). Ensemble
based models built from a linear combination of linear, complementary classifiers have an
advantage over kernel based analysis in dealing with such complexities, as they are less
likely to overfit and have fewer parameters to optimize than their kernel counterparts [11, 19].
Previous works [25] have created multiple classifier FR systems based on LDA learners
trained on chromatic information, as discussed in Chapter 2. In [25], the concept that
different color spaces offer different information about the faces to the FR system is
utilized and the classifier experts trained on the different color spaces are combined in
a decision fusion framework. The experts are dynamically chosen using a confidence
based gating scheme, and depend on the probe image to be identified. This approach
to classifier combination combines the information contained in relevant color spaces,
thus addressing various imaging conditions. However, it cannot address the simultaneous
variation of pose, expression and illumination in face patterns, which is a very realistic
situation, especially in surveillance applications, where pictures of subjects may not be
captured in controlled conditions.
In this chapter, a multiple classifier FR system trained on chromatic information is
built using an ensemble learning framework. The learning framework used overcomes
the limitations of the classical LDA learner and previous multiple classifier FR systems
trained on chromatic information and addresses both complexities in face patterns and
illumination conditions, by creating complementary classifiers using the ada-boost tech-
nique.
5.2 Motivation: Ada-Boost Learning
This section presents the motivation behind the choice of the ada-boost framework and
its details. The learning framework used in this chapter aims to address complexities
in face patterns and illumination conditions by combining two advantages: that of
chromatic features with LDA based learning in addressing poor illumination, and that of
ensemble learning in addressing complexities in face patterns. Ensemble
learning methods such as boosting and bagging are reported to lead to better performance
in pattern recognition systems than individual learners, as they
learn the various patterns in the training data and can generalize across different kinds
of images in the testing set [11, 43, 42].
LDA based methods are susceptible to the small sample size problem frequently en-
countered in high dimensional pattern recognition tasks such as FR. When the faces are
multi spectral or color, the extra dimensionality of color inputs poses a more challeng-
ing small sample size problem which was explained in chapter 3. A direct effect of the
small sample size problem is the singularity of SW , which makes its inversion difficult.
A variant of the LDA called the direct LDA was proposed by H. Yu et al. in [44], which
eases the inversion of the within class scatter matrix, SW, thus making it suitable for
high dimensional data. The direct LDA however does not totally solve
the small sample size problem, as the estimation of SW still remains ill posed. J. Lu et
al. have proposed a method, Ada-boost.M2, based on ada-boost [11], to linearly combine
a set of linear models into an ensemble model. Each linear model consists of a feature
extractor trained using a direct LDA learner [10] and a linear classifier. This method
was tested on a subset of gray scale images from the FERET database [28] having pose
(up to 22.5 degrees) and expression variations, and proved to be effective in addressing
the complexities caused by these variations. Ada-Boost.M2 hence combines
the advantages of the adaptive boosting framework in addressing complexities and
of the direct LDA in addressing the problem of degenerate scatter matrices. It however
involves a trade off between the weakness of individual learners and the low generalization
error achieved on the training set, in order to create the most effective complementary
classifiers.
In this chapter, the conclusions of the previous chapters are extended by using chro-
matic information as an input to the ada-boost.M2, so that the ensemble of LDA based
learners can effectively learn the difficult illumination conditions and complexities in face
patterns which take the shape of variations in pose and viewpoint. Images of different
color spaces (RGB and YCbCr spaces) are used as inputs to the ada-boost.M2, and this
combination is tested on faces subjected to both horizontal and vertical pose variations
up to a maximum of 45 degrees and severe illumination conditions. It is found that
in certain cases this combination utilizes both the advantages of chromatic information
in dealing with images with illumination variations and the ada-boost.M2 in addressing
complexities caused by pose and viewpoints. However a challenge is that color inputs
have three spectral planes and hence the direct LDA learner is posed with a more severe
small sample size problem.
This learning framework is examined in various small sample size scenarios and the
impact of both chromatic spectral planes and boosting on the LDA learner is discussed
in detail in Section 5.6.
5.3 Background
In this section, an explanation of the concept of ada-boost is presented along with a
description of the ada-boost.M2 framework introduced in [11] which are used for the
experiments in this chapter.
The basic aim of boosting is to improve the performance of a learning algorithm [37].
It involves creating a weak learner whose error on the training set is only slightly better
than random guessing, and combining an ensemble of these learners in a decision fusion
framework to produce a strong ensemble learner whose combined decision rule outperforms
each of the individual learners and has a relatively low classification error on the training
set. A classifier/learner is said to be weak or unstable if small changes in the training data
lead to significantly different classifiers/learners and large changes in the accuracy. The
individual learners should be diverse and have a low mutual dependence, and are trained
on subsets of the training data in such a way that they offer complementary information
to the FR system.
The most popular variant of boosting is ada-boost. Ada-boost adds a new weak learner
to the ensemble in every iteration until the combined ensemble learner achieves a low
error on the training set. In order to design a good ada-boost
system,
• There should be an interaction between the booster and the individual learners. The
learner in each subsequent iteration is trained on those training samples which were
hardest to classify in the present iteration. This is usually performed by assigning
a weight to each training sample at the end of every iteration, which determines its
probability of being selected for the subsequent iteration. The ada-boost therefore
focuses on the difficult patterns. This ensures the complementarity and low mutual
dependence of the individual classifiers.
• The Boosting procedure should create weak learners/ classifiers which have a low
mutual dependence and a low generalization error on the training set. This in
theory involves a trade off, as it is hard to achieve both conditions simultaneously.
Past works [11], however, indicate that boosting is generally robust to overfitting of the
training data, and can learn a wide range of patterns.
Ada-boost.M2 has been chosen for the experiments in this chapter owing to its demon-
strated capability in addressing a large database containing complexities in face patterns
in gray scale images, as mentioned earlier. Each individual learner consists of a direct
LDA based feature extractor [10] and a linear classifier (nearest center). The direct LDA
based feature extractor and the ada-boost.M2 framework are explained in the remainder
of this section.
5.3.1 Regularized Direct LDA
The linear discriminant analysis or LDA feature extractor finds the set of orthogonal
vectors which maximize the ratio of the inter/ between class scatter matrix to the within/
intra class scatter matrix, as explained in Section 3.3. As explained earlier, the number
of samples per subject available for training is usually very small when compared to the
dimensionality of the face input used for training. The large dimensionality of face inputs
makes the estimation of the within class scatter matrix, S_W, an ill posed problem, and
this is referred to as the small sample size problem. Because the matrix has a very low
rank, S_W is singular and cannot be inverted, which makes it difficult to obtain the LDA
feature basis.
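The rank deficiency described above can be illustrated with a minimal sketch. The sizes used here (5 subjects, 3 samples each, 50 dimensions) and the random data are hypothetical and chosen only to keep the example small; they are not the settings used in the experiments.

```python
import numpy as np

def within_class_scatter(X, y):
    """Within-class scatter S_W for samples stored in the rows of X with labels y."""
    d = X.shape[1]
    S_W = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c] - X[y == c].mean(axis=0)  # center each class on its own mean
        S_W += Xc.T @ Xc
    return S_W

rng = np.random.default_rng(0)
C, L, d = 5, 3, 50                     # hypothetical: 5 subjects, 3 samples each, 50 dimensions
X = rng.normal(size=(C * L, d))
y = np.repeat(np.arange(C), L)

S_W = within_class_scatter(X, y)
rank = np.linalg.matrix_rank(S_W)
print(rank, "out of", d)               # the rank cannot exceed N - C, so S_W is singular
```

With N = C × L samples, the rank of S_W is bounded by N − C, which is far below d whenever only a few samples per subject are available.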
The issue of inverting SW has been solved in different ways. In Chapter 3, a PCA step
was performed prior to the LDA, thus effectively reducing the dimensionality of the LDA
input from d (dimensionality of column vector of the face) to N−C, where N = C×L (N
is the number of images in the training set, C is the number of subjects being considered
and L is the number of samples per subject) [14]. However, this solution does not lead
to an optimal solution for the LDA feature matrix as part of the important within/ intra
scatter information is lost in the PCA preprocessing step.
H. Yu et al. in [44] proposed a different method of finding the LDA feature basis, W,
which does not involve the PCA preprocessing step. If A is the null space of the between
class scatter matrix, S_B, with complement A^C, and B is the null space of S_W, then
according to the LDA optimality criterion, the direct LDA finds the M most significant
eigenvectors in A^C ∩ B which maximize the ratio in Equation (3.2). This is performed by
first diagonalizing S_B using eigen decomposition, and retaining only the most significant
C − 1 eigenvectors (since rank(S_B) ≤ min(N, C − 1)) to form A^C. S_W is then projected
onto this low (C − 1)-dimensional space, A^C. A^C ∩ B can be solved for by performing an
eigen decomposition on the projected S_W and retaining the M eigenvectors which correspond
to the smallest eigenvalues. A^C ∩ B is usually a low dimensional subspace.
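The two diagonalization steps described above can be sketched as follows. This is an illustrative NumPy version under simplifying assumptions (full d × d eigen decompositions, unweighted scatter matrices, synthetic data); the function names are hypothetical and it is not the implementation used in the experiments.

```python
import numpy as np

def scatter_matrices(X, y):
    """Between-class (S_B) and within-class (S_W) scatter of the rows of X."""
    mean = X.mean(axis=0)
    d = X.shape[1]
    S_B, S_W = np.zeros((d, d)), np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        S_B += len(Xc) * np.outer(mc - mean, mc - mean)
        S_W += (Xc - mc).T @ (Xc - mc)
    return S_B / len(X), S_W / len(X)

def direct_lda(X, y, M):
    S_B, S_W = scatter_matrices(X, y)
    C = len(np.unique(y))
    # Step 1: diagonalize S_B and keep its C-1 leading eigenvectors (complement of its null space).
    evals, evecs = np.linalg.eigh(S_B)            # eigenvalues in ascending order
    keep = np.argsort(evals)[::-1][:C - 1]
    Z = evecs[:, keep] / np.sqrt(evals[keep])     # scaled so that Z.T @ S_B @ Z = I
    # Step 2: project S_W into that subspace and diagonalize it.
    _, U = np.linalg.eigh(Z.T @ S_W @ Z)          # eigenvalues in ascending order
    # Step 3: keep the M directions with the smallest within-class scatter.
    return Z @ U[:, :M]

rng = np.random.default_rng(0)
C, L, d, M = 4, 3, 40, 2                          # hypothetical small sample size setting
X = rng.normal(size=(C * L, d))
y = np.repeat(np.arange(C), L)
W = direct_lda(X, y, M)
print(W.shape)
```

Note that S_W is never inverted: only the small (C − 1) × (C − 1) projected matrix is decomposed, which is why the method remains usable when d is much larger than N.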
In the experiments performed, the regularized direct LDA (R-LDA) [9] is used as
the feature extractor in the individual learning block of the boosting framework and is
based on the direct LDA. The R-LDA is a variant of the direct LDA, and uses a modified
Fisher's criterion,

    \Psi = \arg\max_{\Psi} \frac{|\Psi^T S_B \Psi|}{\eta\,|\Psi^T S_B \Psi| + |\Psi^T S_W \Psi|}    (5.1)

where Ψ = [ψ_1 ψ_2 ... ψ_M]^T and η is the regularization parameter. This modified criterion
has the effect of decreasing larger eigen values and increasing smaller eigen values, thereby
counteracting the high bias involved in the estimation of eigen values. It also has the
effect of adding a minimum value to the zero eigen values, thus making SW easier to
invert. The criterion in Equation (5.1) is equivalent to the conventional Fisher’s criterion
in Equation (3.2), according to the following theorem [11]:
Theorem 1: In an n-dimensional vector space, ℜ^n, ∀x ∈ ℜ^n, let h_1(x) = f(x)/g(x) and
h_2(x) = f(x)/(g(x) + ηf(x)), where f(x) ≥ 0, g(x) > 0, 0 ≤ η ≤ 1 and f(x) + g(x) > 0. If
h_1(x) has a maximum (including +∞) at x_0 ∈ ℜ^n, then h_2(x) has a maximum at the same
point.
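A short sketch of why Theorem 1 holds may be helpful here. It is not reproduced from [11]; it only rewrites h_2 in terms of h_1 at the points where g(x) > 0.

```latex
h_2(x) = \frac{f(x)}{g(x) + \eta f(x)}
       = \frac{f(x)/g(x)}{1 + \eta\, f(x)/g(x)}
       = \frac{h_1(x)}{1 + \eta\, h_1(x)},
\qquad
\frac{d}{du}\left(\frac{u}{1 + \eta u}\right) = \frac{1}{(1 + \eta u)^2} > 0 \quad (u \ge 0).
```

Since u/(1 + ηu) is strictly increasing for u ≥ 0, h_2 is a monotonic function of h_1, so a point that maximizes h_1 also maximizes h_2.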
This modified criterion reduces the bias and variance in estimating the eigenvalues and,
at the same time, avoids the issue of inverting a singular S_W. However, the estimation
of the S_W matrix is still an ill posed problem, especially when the number of samples
per class is approximately 2 to 3, and the dimensionality of the samples is usually of
the order of 10^4. In this chapter, the effect of the more severe small sample size problem
created by the increased dimensionality of a multi spectral image is examined on the
boosting framework in a range of small sample size learning scenarios. The regularization
parameter η is therefore fixed to a particular value, η = 1, in the experiments.
5.3.2 Ada-Boost framework
The individual learner in the ada-boost.M2 consists of an R-LDA feature extractor and a
linear classifier: the Nearest Center Classifier. A new learner is formed in each subsequent
iteration based on the results from the learner in the previous iteration, in
the form of the updated parameters, which depend on the error in the hard to classify
samples and hard to classify subjects of the previous iteration. The classifier built at
iteration t is a Nearest Center linear classifier and is denoted by h_t. The final classifier
h_f is a weighted sum of all h_t's. A learner consisting of an R-LDA feature extractor and a
Nearest Center classifier is henceforth referred to as a g-Classifier. A general ada-boost
learning framework is presented in Figure 5.1.
Given a training set Z = {Z_i}_{i=1}^{C} containing C classes, with each class
Z_i = {z_ij}_{j=1}^{C_i} consisting of images z_ij (where z_ij is the column vector of the
jth image of the ith class), a total of N = \sum_{i=1}^{C} C_i images are present in the
training set. The dimensionality of the column vectors of the images in Z is d. C_i is
fixed to L, ∀i.
Figure 5.1: Training the Ada-Boost Ensemble - Generic Diagram

For optimal performance of the boosting method, the individual learners of the
ada-boost.M2 should have a low mutual dependence with each other and a low generalization
error on the training set. The boosting method does not perform better over iterations
if either the individual g-Classifiers are too strong, i.e., have a high mutual dependence,
or they are too weak so as to produce a very high generalization error, as explained
earlier. The g-Classifiers will have a strong mutual dependence if the samples used to
train each of the g-Classifiers are overlapping. The number of samples per subject available
for training each g-Classifier is therefore used as a parameter for adjusting the weakness
of the g-Classifiers. The weakness of the g-Classifier is described using a quantity called
the Learning Difficulty Degree (LDD) which is given by Equation (5.2),
    Learning Difficulty Degree, \rho = \frac{r}{C}    (5.2)
where r is the number of samples per subject used to train each individual g-Classifier
and C is the number of subjects in the entire training set, Z. The boosting method therefore
involves a trade off between weak g-Classifiers and low generalization error, and this is
achieved by choosing an r∗ such that the best performance is achieved with
ada-boost.M2. The optimal r∗ will differ for each learning scenario and can take values
in [2, L], where L is the number of samples per subject in Z.
In order to facilitate interaction between the booster and the learner, two quantities
are introduced: the pairwise class discriminant distribution, A_t, which introduces a
weighting factor in the between class scatter matrix S_{B,t}, and the sample distribution,
D_t, which introduces a weighting factor in the within class scatter matrix, S_{W,t}, where t
is the boosting iteration. Higher values of A_t(p, q) and D_t(z_ij) indicate harder
separability between two classes (p and q) and a harder to classify sample (z_ij),
respectively. The values of A_t(p, q) and D_t(z_ij) are calculated from the mislabel
distribution, Γ_t(z_ij, y), which is in fact a function of the pseudoloss at iteration
t − 1, ε_{t−1}. The pseudoloss, ε, represents the training or generalization error.
Equations (5.3)-(5.7) present the formulae used to calculate these values.
Let B be the set of all mislabels, defined as

    B = \{ (z_{ij}, y) : z_{ij} \in Z,\ z_{ij} \in \Re^d,\ y \in Y,\ y \neq y_{ij} \}

    \epsilon_t = \frac{1}{2} \sum_{(z_{ij}, y) \in B} \Gamma_t(z_{ij}, y) \left( 1 - h_t(z_{ij}, y_{ij}) + h_t(z_{ij}, y) \right)    (5.3)

where Γ_t(z_ij, y) is the mislabel distribution defined over all elements of B. A higher value
of Γ_t signifies a higher probability of the misclassification (z_ij, y), where y ≠ y_ij.
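The pseudoloss of Equation (5.3) can be evaluated directly once the mislabel distribution and the classifier scores are stored as arrays. The sketch below assumes a dense (N, C) layout in which the true-label entries of the mislabel distribution are zero; the array names and the toy values are illustrative only.

```python
import numpy as np

def pseudo_loss(gamma, h, labels):
    """Pseudoloss of Equation (5.3).

    gamma  : (N, C) mislabel distribution, zero on the true-label entries
    h      : (N, C) classifier scores h_t(z, y) in [0, 1]
    labels : (N,)   true class index of each sample
    """
    n = np.arange(len(labels))
    h_true = h[n, labels][:, None]        # h_t(z_ij, y_ij), broadcast over all y
    terms = gamma * (1.0 - h_true + h)    # true-label entries vanish because gamma is 0 there
    return 0.5 * terms.sum()

# Toy example: 4 samples, 3 classes, uniform weight over the N(C-1) mislabels.
N, C = 4, 3
labels = np.array([0, 1, 2, 0])
gamma = np.full((N, C), 1.0 / (N * (C - 1)))
gamma[np.arange(N), labels] = 0.0

perfect = np.eye(C)[labels]               # score 1 for the true class, 0 otherwise
uninformative = np.full((N, C), 0.5)      # same score for every class
print(pseudo_loss(gamma, perfect, labels), pseudo_loss(gamma, uninformative, labels))
```

A classifier that scores every class equally obtains a pseudoloss of one half, which is the reference level a weak learner is expected to beat.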
The equations for the between and within class scatter of the R-LDA in iteration t
are given by,

    S_{B,t} = \frac{r}{N} \sum_{p=1}^{C} A_t(p, q) (\bar{z}_p - \bar{z}_q)(\bar{z}_p - \bar{z}_q)^T    (5.4)

    S_{W,t} = N \sum_{i=1}^{C} \sum_{j=1}^{r} D_t(z_{ij}) (z_{ij} - \bar{z}_i)(z_{ij} - \bar{z}_i)^T    (5.5)

A_t and D_t, which are used in Equations (5.4) and (5.5), are described below,

    A_t(p, q) = \begin{cases} \frac{1}{2} \left( \sum_{j : g_t(z_{pj}) = q} D_t(z_{pj}) + \sum_{j : g_t(z_{qj}) = p} D_t(z_{qj}) \right) & \text{if } p \neq q \\ 0 & \text{if } p = q \end{cases}    (5.6)

    g_t(z) = \arg\max_{y \in Y} h_t(z, y)

    D_t(z_{ij}) = \sum_{y \neq y_{ij}} \Gamma_t(z_{ij}, y)    (5.7)

where h_t ∈ [0, 1] is based on the nearest center classifier. A value of 1 indicates perfect
similarity and a value close to 0 indicates low similarity of sample z_ij to class y. A_t is
calculated using only those samples from each class which were not represented well in
the previous t − 1 g-classifiers. Higher values of A_t and D_t indicate harder to classify
classes and harder to classify samples, respectively.
With respect to the pictorial representation of the ada-boost training in Figure 5.1,
the updated parameters refer to Γ_t, A_t and D_t. The hard to classify subset extracted in
every successive iteration t corresponds to those r samples with the highest values of
D_t in every class. Pseudocode of the ada-boost.M2 procedure is presented in Figure
5.2.
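The quantities in Equations (5.6) and (5.7), and the selection of the r hardest samples per class, can be sketched as below. The dense array layout and the function names are assumptions made for illustration; the toy values are not experimental data.

```python
import numpy as np

def sample_distribution(gamma):
    """D_t(z_ij) of Equation (5.7): total mislabel weight carried by each sample."""
    return gamma.sum(axis=1)              # gamma is zero at the true label

def pairwise_class_distribution(D, h, labels, C):
    """A_t(p, q) of Equation (5.6), accumulated over misclassified samples."""
    predicted = h.argmax(axis=1)          # g_t(z)
    A = np.zeros((C, C))
    for p, q, w in zip(labels, predicted, D):
        if p != q:                        # a class-p sample assigned to class q
            A[p, q] += 0.5 * w
            A[q, p] += 0.5 * w            # the definition is symmetric in p and q
    return A

def hardest_samples(D, labels, r):
    """Indices of the r samples with the highest D_t value in every class."""
    idx = []
    for c in np.unique(labels):
        members = np.flatnonzero(labels == c)
        idx.extend(members[np.argsort(D[members])[::-1][:r]])
    return np.array(idx)

# Toy example: 2 classes, 2 samples each; sample 1 (class 0) is assigned to class 1.
labels = np.array([0, 0, 1, 1])
h = np.array([[0.9, 0.1], [0.2, 0.8], [0.3, 0.7], [0.1, 0.9]])
gamma = np.array([[0.0, 0.1], [0.0, 0.5], [0.3, 0.0], [0.1, 0.0]])

D = sample_distribution(gamma)
A = pairwise_class_distribution(D, h, labels, C=2)
hard = hardest_samples(D, labels, r=1)
print(D, A, hard)
```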
5.4 Possible Implication of color in the Ada-Boost framework
In this section, the possible implication of the extra dimensionality of chromatic infor-
mation on the ada-boost framework is discussed. Chromatic information could have
implications on two aspects of the boosting framework, which are discussed in this sec-
tion. They are,
• Color inputs could worsen the small sample size problem when used with the R-LDA
feature extractor, which is a part of the g-classifier.
• They could also have an effect on the optimal weakness, i.e., the r∗ parameter of
the ada-boost.M2 framework.

Input: Training images and their corresponding labels, {{(z_ij, y_ij)}_{j=1}^{L}}_{i=1}^{C},
with y_ij ∈ Y where Y = {1, 2, ..., C}; number of iterations T.
Initialization: Γ_1(z_ij, y) = 1 / (N(C − 1)). Calculate A_1 and D_1 using Equations 5.6
and 5.7.
Procedure: Do the following from t = 1 to T
1. If t = 1, choose r samples per class randomly to form R_t; else choose the r hardest
samples per class, based on the highest D_t values, to form R_t.
2. Train the JD-LDA feature extractor on R_t using Equations 5.1, 5.4 and 5.5 and
obtain the feature basis W_t and the projected class means, W_t^T \bar{z}_i.
3. Build the classifier h_t using W_t and the W_t^T \bar{z}_i created in the previous step and
obtain the hypothesis h_t : ℜ^d × Y → [0, 1].
4. Calculate ε_t using Equation 5.3. Set β_t = ε_t / (1 − ε_t).
5. Update: Γ_{t+1}(z_ij, y) = Γ_t(z_ij, y) β_t^{(1 + h_t(z_ij, y_ij) − h_t(z_ij, y))/2} / Z_t,
where Z_t is a normalization factor to convert it to a distribution. A_{t+1} and D_{t+1}
are updated using Equations 5.6 and 5.7.
Output: The final ensemble g-classifier,
h_f(z) = arg max_{y ∈ Y} Σ_{t=1}^{T} (log 1/β_t) h_t(z, y), where z is an unknown probe.
Figure 5.2: Pseudocode: Ada-Boost framework
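The weight update and the final decision rule of the ada-boost.M2 procedure can be sketched as follows. The sketch follows the standard AdaBoost.M2 update (weights multiplied by β raised to (1 + h(z, y_true) − h(z, y))/2, followed by normalization); the array layout, the function names and the toy values are assumptions for illustration.

```python
import numpy as np

def update_mislabel_distribution(gamma, h, labels, eps):
    """One AdaBoost.M2 update of the mislabel distribution from pseudoloss eps."""
    beta = eps / (1.0 - eps)
    n = np.arange(len(labels))
    h_true = h[n, labels][:, None]
    exponent = 0.5 * (1.0 + h_true - h)   # 1 for well handled mislabels, 0 for badly handled ones
    new_gamma = gamma * beta ** exponent
    new_gamma[n, labels] = 0.0            # true-label entries are not mislabels
    return new_gamma / new_gamma.sum(), beta

def ensemble_predict(scores, betas):
    """scores: list of (num_probes, C) arrays h_t(z, y); betas: list of beta_t values."""
    weights = np.log(1.0 / np.asarray(betas))
    combined = np.tensordot(weights, np.stack(scores), axes=1)   # weighted sum over t
    return combined.argmax(axis=1)

# Toy example: sample 0 is classified correctly, sample 1 is assigned to the wrong class 0.
labels = np.array([0, 1])
gamma = np.array([[0.0, 0.25, 0.25], [0.25, 0.0, 0.25]])
h = np.array([[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
new_gamma, beta = update_mislabel_distribution(gamma, h, labels, eps=0.375)

votes = ensemble_predict(
    [np.array([[0.9, 0.1], [0.2, 0.8]]), np.array([[0.4, 0.6], [0.3, 0.7]])],
    betas=[0.1, 0.5],
)
print(new_gamma, votes)
```

After the update, the mislabel (sample 1, class 0) carries the largest weight, so the next learner concentrates on the confusion the current learner handled worst.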
Implication on the small sample size problem
When the images in Z are multi spectral (or color), the dimensionality of the training
samples, z_ij, is increased by a factor of the number of spectral planes, K, as discussed
earlier. This could worsen the small sample size problem when used with an LDA learner
as discussed in Chapter 3. Even though the R-LDA avoids the inversion of a singular within
class scatter matrix, S_W, by searching for an optimal basis in the low dimensional space
A^C ∩ B as discussed earlier, the estimation of S_W remains an ill posed problem due to the
high bias and variance involved in the estimation of parameters. The estimation of S_W
becomes even more ill posed when multi spectral color inputs are used as an input to the
R-LDA feature extractor, thus worsening the small sample size problem.
Implication on the optimal weakness of the individual g-classifiers
The design of the ada-boost.M2 involves an optimal choice of the parameter r∗ which
denotes the optimal number of samples per subject used for training each individual
g-classifier. This r∗ should achieve the best trade off between creating individual g-
classifiers with low mutual dependence, i.e., weak learners and achieving a low general-
ization error on the training set. The trade off is described using a loss function defined
in [11],
    R(r) = \left( \frac{1}{T} \sum_{t=1}^{T} \sum_{i,j} \Pr[h_{t,r}(z_{ij}) \neq y_{ij}] \right) + \lambda \cdot \sqrt{\frac{\rho_l(r)}{\rho_l(L)}}    (5.8)

where T is the number of iterations, Pr[h_{t,r}(z_ij) ≠ y_ij] is the empirical classification
error rate (CER) obtained by applying the g-Classifier h_t to the training set Z, and λ is a
constant whose value is determined experimentally. The first term in the above equation
represents the generalization error while the second term represents the weakness of the
individual learners. Determining the optimal r∗ is equivalent to minimizing R(r) with
respect to r. The optimality is defined with respect to the lowest generalization error on
the training set.
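The selection of r∗ by minimizing R(r) can be sketched as below. Two assumptions are made and should be read as such: the ratio ρ_l(r)/ρ_l(L) is taken to reduce to r/L (which follows if ρ_l is the LDD of Equation (5.2)), and the error rates and the value of λ are invented purely to show the mechanics; they are not results from this thesis.

```python
import math

def boosting_loss(mean_error, r, L, lam):
    """R(r): mean training error over the iterations plus the weakness penalty.

    Assumes rho_l(r) / rho_l(L) = r / L, i.e. rho_l is the LDD of Equation (5.2).
    """
    return mean_error + lam * math.sqrt(r / L)

def optimal_r(mean_errors, L, lam):
    """mean_errors maps each candidate r to its training error averaged over T iterations."""
    return min(mean_errors, key=lambda r: boosting_loss(mean_errors[r], r, L, lam))

# Hypothetical error rates for L = 5 samples per subject (illustrative values only).
mean_errors = {2: 0.60, 3: 0.30, 4: 0.28, 5: 0.27}
r_star = optimal_r(mean_errors, L=5, lam=0.2)
print(r_star)
```

In this toy setting the smallest r is rejected because its error is too high, and the largest r is rejected because the small reduction in error does not offset the penalty on learner strength.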
Increased dimensionality of the face inputs z_ij induces a small sample size problem
in training the individual g-Classifiers, which could change the generalization
error on the training set (the first term in Equation (5.8)). Therefore the weakness of the
g-Classifier would have to be adjusted accordingly, i.e., r∗ would have to be changed
in order to minimize the function in Equation (5.8). The booster would however fail if
r was either too high or too low. Although color inputs lead to a better performance than
gray scale inputs for all examined scenarios, the optimal r∗ could differ between
color and gray scale inputs. This is another aspect examined in the experiments in this
chapter.
The effect of the increased dimensionality and the induced small sample size problem created
by color images on the booster, along with the effect of chromatic information on the
design of the ada-boost.M2 parameters, are examined in the performed experiments so
that chromatic information can be used in the most effective way to
address variations in illumination, in addition to pose and expression variations.
5.5 Methodology and Experimental Setup
For the experiments in this chapter, the ada-boost.M2 has been trained on the gallery
set, Z. The FR system operates in the identification mode and the images of the probe
set, Q are matched against those of the gallery. A flowchart depicting a broad outline of
the FR system used is presented in Figure 5.3.
As in the experiments performed in the previous chapters, the images of the gallery
and probe sets, Z and Q respectively, contain irrelevant portions comprising the
background, hair, shoulders, etc. along with the face. The preprocessing stage isolates the
face from the rest of the image, and represents the face as a column vector for further
processing. The preprocessing steps are explained in Appendix B. The resolution of
the images is fixed to 150×130 for all experiments performed. This resolution is chosen
as it is commonly used in surveillance applications.
Since the images are in the RGB format, a color space transformation block is required
to transform the preprocessed images to the required color space for analysis. Since we
want to evaluate the effect that color has on the small sample size scenario and the
ada-boosting framework in this chapter, we have considered two color spaces:
RGB and YCbCr 4:4:4, along with their corresponding gray scale counterparts: R and
Y. The same color space/gray scale transformation is used in both training and testing
stages, i.e., it is assumed that the FR system user knows the color space/gray scale
transformation used for training the system.
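A color space transformation block of this kind can be sketched as follows. The coefficients are the full-range BT.601 (JFIF) ones, which is an assumption: the text does not state which YCbCr variant was used. Stacking the K spectral planes into a single vector is likewise only one plausible way of forming the column vector.

```python
import numpy as np

# Full-range BT.601 (JFIF) coefficients; assumed, since the variant is not stated in the text.
_M = np.array([[ 0.299,     0.587,     0.114   ],
               [-0.168736, -0.331264,  0.5     ],
               [ 0.5,      -0.418688, -0.081312]])
_OFFSET = np.array([0.0, 128.0, 128.0])

def rgb_to_ycbcr(image):
    """image: (H, W, 3) array of RGB values in [0, 255]; returns (H, W, 3) YCbCr 4:4:4."""
    return image.astype(np.float64) @ _M.T + _OFFSET

def to_column_vector(image):
    """Stack the K spectral planes one after another into a vector of length H*W*K."""
    return image.transpose(2, 0, 1).reshape(-1)

rgb = np.full((150, 130, 3), 255, dtype=np.uint8)     # a white 150x130 test image
ycbcr = rgb_to_ycbcr(rgb)
z_color = to_column_vector(ycbcr)                     # color input, K = 3 planes
z_gray = to_column_vector(ycbcr[:, :, :1])            # gray scale counterpart, Y plane only
print(z_color.shape, z_gray.shape)
```

For a 150×130 image the gray scale vector has 19500 entries and the color vector has three times as many, which is the increase in dimensionality discussed in Section 5.4.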
As mentioned before, each individual g-classifier consists of a R-LDA feature extractor
and a linear classifier: nearest center, and is trained based on updated parameters which
depend on the generalization error on the training set, Z. In order to fit the ada-boost
framework, the nearest center classifier is based on the Euclidean distance and given by,
    dist(z, i, \Psi_t, \bar{z}_{i,t}) = \frac{dist_{max} - dist_{z,i}}{dist_{max} - dist_{min}}    (5.9)

where dist_{z,i} = ||Ψ^T (z − \bar{z}_i)||, dist_max = max({dist_{z,i}}_{i=1}^{C}) and
dist_min = min({dist_{z,i}}_{i=1}^{C}).
The classification score obtained by Equation (5.9) has values in [0, 1]. It should be noted
that for normalized inputs of unit norm, the cosine similarity measure is equivalent to the
Euclidean similarity metric. The ada-boost.M2 is first trained on Z and then evaluated
on Q to produce a Classification Error Rate (CER). The CER is the ratio of the number of
wrong identifications to the total number of probe images, expressed as a percentage. The
CER is equal to 100 − CRR.
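The normalized nearest center score of Equation (5.9) and the CER can be sketched as below. The function names and the toy inputs are illustrative; the sketch also assumes that the class distances are not all equal, since the normalization is undefined in that case.

```python
import numpy as np

def similarity_scores(z, Psi, class_means):
    """Equation (5.9): scores in [0, 1], with 1 for the nearest class center.

    z           : (d,)   probe vector
    Psi         : (d, M) feature basis
    class_means : (C, d) class centers
    """
    dist = np.linalg.norm((z - class_means) @ Psi, axis=1)   # ||Psi^T (z - mean_i)||
    return (dist.max() - dist) / (dist.max() - dist.min())   # assumes max != min

def classification_error_rate(predicted, truth):
    """CER: percentage of probes assigned to the wrong identity."""
    return 100.0 * np.mean(np.asarray(predicted) != np.asarray(truth))

scores = similarity_scores(
    np.array([0.0, 0.0]),
    np.eye(2),
    np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]),
)
cer = classification_error_rate([0, 1, 2, 2], [0, 1, 1, 2])
print(scores, cer)
```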
A difficult subset of the CMU PIE database having severe pose and illumination vari-
ations is chosen as an evaluation database for the experiments in this chapter. 7 different
poses and 10 different illumination conditions are included to depict hard conditions in
the FR problem. The illumination conditions are caused by varying positions of the
camera flash in a room with zero background illumination, hence the variations caused
are severe. Following the PIE’s naming rule, pose group [05, 07, 09, 11, 27, 29, 37] and
alternate flash numbers [2, 4, 6, 10, 12, 13, 14, 16, 18, 19] are chosen for experimentation
purposes. Poses chosen are restricted to a maximum variation of 45 degrees and 10 out of
21 illumination conditions are used in this evaluation database. Details of the evaluation
subset used for experimentation, D are listed as follows:
• No. of subjects: 68
• No. of samples per subject: 70 (Each subject has 10 images belonging to each pose,
where each of those 10 images belongs to a different illumination condition, thus
covering 7 poses and 10 severe illumination conditions).
• Total number of images in the evaluation database: 68×70=4760
The number of subjects is fixed to 68, and the number of samples per subject, L, is varied.
Following standard FR practices, the evaluation database is divided into two sets: the
gallery set, Z, on which training is performed, and the probe set, Q, which contains the
images of unknown identity, such that D = Z ∪ Q and Z ∩ Q = ∅. L images per subject
from D comprise the training set Z, while the remaining 70 − L images per subject
constitute the probe set Q; hence the cardinality of Z is |Z| = 68 × L and the cardinality
of Q is |Q| = |D| − 68 × L. L takes values in {3, 4, 5, 6, 7, 10, 13, 16} in order to examine
the small sample size problem in terms of the number of samples per subject for both color
and gray scale images. The images of each subject chosen to comprise Z are ensured to be of
different illuminations for all learning scenarios, and of different poses if L ≤ 7, so that
all 7 poses and 10 illuminations are represented by the 68 subjects in the training set. The
results reported are averaged over more than 7 runs to avoid bias; each run is executed
on a gallery and probe partition.
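The gallery and probe partition can be sketched as below. This is a simplified random split that only reproduces the set sizes; it does not enforce the pose and illumination constraints described above, and the function name and index representation are hypothetical.

```python
import numpy as np

def split_gallery_probe(num_subjects, images_per_subject, L, rng):
    """Randomly pick L images per subject for the gallery Z; the rest form the probe set Q."""
    gallery, probe = [], []
    for s in range(num_subjects):
        order = rng.permutation(images_per_subject)
        gallery += [(s, int(k)) for k in order[:L]]   # (subject index, image index)
        probe += [(s, int(k)) for k in order[L:]]
    return gallery, probe

gallery, probe = split_gallery_probe(68, 70, L=3, rng=np.random.default_rng(0))
print(len(gallery), len(probe))
```

For L = 3 this yields |Z| = 68 × 3 = 204 gallery images and |Q| = 4760 − 204 = 4556 probe images; repeating the call with different seeds gives the independent partitions over which results are averaged.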
Figure 5.3: System Description
5.6 Results
In this section, the following aspects are examined for a range of small sample size learning
scenarios on the evaluation database described,
• The effect of chromatic information and the extra dimensionality of chromatic in-
puts on the R-LDA feature extractor
• The effect of boosting the R-LDA learner for different color space/ gray scale trans-
formations
The results are presented in Table 5.1 and the values recorded are the Classification
Error Rate (CER) as a percentage. The ada-boost.M2 was performed over 40 iterations
and the best FR performance (at iteration T*) over 40 iterations is presented. In order
to reduce the number of parameters varied, the number of features used for each R-LDA
feature extractor in the creation of each g-classifier is fixed to 30 for all experiments. In
order to compare the best performances of boosting with that of the R-LDA, the best
CER (that obtained using the optimal number of features, M*) is recorded in Table 5.1.
In order to examine the aspects above, three performance measures are introduced,
• ξ∗J : The best improvement obtained by color over its gray scale counterpart for
R-LDA
• ξ∗B: The best improvement obtained by color over its gray scale counterpart for
ada-boost.M2
• δ∗: The best improvement obtained by boosting, i.e., the improvement of ada-
boost.M2 over R-LDA
Negative values of ξ∗J, ξ∗B and δ∗ signify improvements, while positive values signify
deterioration in performance. Table 5.3 presents values for δ∗ and Table 5.2 presents
values for ξ∗J and ξ∗B over the range of learning scenarios examined.
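The three measures are plain differences between CERs, which can be made explicit with a short sketch. The numbers are the L = 3 entries for the RGB & R pair reported in the tables of this section; the helper name is illustrative.

```python
def improvement(cer_new, cer_reference):
    """Difference in CER (percentage points); negative values mean cer_new is better."""
    return cer_new - cer_reference

# L = 3 samples per subject, RGB versus its gray scale counterpart R.
cer_rlda_gray, cer_rlda_color = 56.3191, 52.7283      # R-LDA
cer_boost_gray, cer_boost_color = 54.97, 49.87        # ada-boost.M2

xi_J = improvement(cer_rlda_color, cer_rlda_gray)     # color over gray scale, R-LDA
xi_B = improvement(cer_boost_color, cer_boost_gray)   # color over gray scale, ada-boost.M2
delta = improvement(cer_boost_color, cer_rlda_color)  # boosting over R-LDA, RGB input
print(round(xi_J, 2), round(xi_B, 2), round(delta, 4))
```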
Table 5.1: Results obtained with ada-boost.M2 & R-LDA using color & gray scale transformations in different learning scenarios

(a) B-JD-LDA (T*)

Samples/   LDD of individual   Gray scale transformations    Color space transformations
Subject    learner (x 1/68)    R            Y                RGB          YCbCr
3          2                   65.86(6)     64.45(4)         60.14(11)    59.27(1)
           3                   54.97(5)     54.17(7)         49.87(21)    45.89(3)
4          2                   67.9(5)      64.12(5)         60.72(5)     61.33(2)
           3                   39.95(30)    38.62(23)        34.16(30)    29.71(24)
5          2                   67.94(1)     65.37(4)         65.15(3)     62.09(2)
           3                   35.36(40)    32.38(40)        29.71(38)    25.15(36)
           4                   31.20(37)    29.12(34)        26.06(34)    20.57(22)
6          3                   33.86(40)    28.54(40)        26.79(40)    29.64(7)
           4                   25.17(39)    23.84(27)        20.78(40)    16.76(36)
           5                   23.67(37)    22.47(23)        19.61(33)    16.23(17)
7          3                   32.64(36)    23.92(40)        21.87(37)    31.18(4)
           4                   19.78(39)    17.32(37)        14.35(39)    11.07(37)
           5                   16.71(40)    15.46(31)        12.33(29)    9.37(30)
           6                   16.04(40)    15.13(21)        12.16(38)    9.76(13)
10         5                   13.73(38)    12.56(40)        10.62(39)    7.21(40)
           6                   12.50(39)    11.57(30)        9.70(34)     6.28(40)
           7                   11.87(39)    11.37(35)        9.42(39)     6.04(28)
           9                   13.79(22)    13.80(8)         11.39(15)    8.04(6)
13         5                   16.67(36)    14.95(39)        12(40)       9.13(39)
           7                   13.28(40)    12.45(39)        9.36(40)     7.57(38)
           9                   12.63(31)    12.24(36)        9.66(40)     8.50(11)
           11                  15.07(14)    14.14(10)        12.16(40)    9.73(7)
16         5                   15.1(40)     13.27(40)        9.91(40)     8.42(40)
           7                   10.86(40)    10.43(39)        7.19(40)     6.06(35)
           9                   9.59(35)     9.8(40)          6.57(37)     5.56(26)
           11                  9.71(39)     10.09(29)        7.43(36)     6.69(11)
           13                  11.49(24)    11.43(15)        9.16(24)     7.41(10)
           15                  13.09(17)    12.86(20)        10.59(12)    8.02(7)

(b) JD-LDA (M*)

Samples/   Gray scale transformations     Color space transformations
Subject    R              Y               RGB            YCbCr
3          56.3191(46)    56.3586(46)     52.7283(46)    46.385(46)
4          46.8326(35)    46.884(35)      41.9546(35)    36.7647(35)
5          39.5989(33)    39.2534(33)     35.0309(33)    29.2464(33)
6          33.3203(30)    33.596(30)      28.8166(30)    23.8074(30)
7          27.1761(30)    27.0003(30)     22.1548(30)    17.411(30)
10         20.9657(30)    21.5931(30)     17.3676(30)    13.4559(30)
13         21.9401(30)    22.549(30)      19.1744(30)    14.6646(30)
16         17.963(30)     18.6819(30)     15.2614(30)    11.6285(30)

Values in this table are Classification Error Rate (CER) expressed as a percentage.
The CERs reported for the B-JD-LDA are the minimum over 40 ada-boost iterations; T* denotes
the iteration number at which this minimum was achieved. The number of JD-LDA features used
is 30 for all boosting experiments.
The CERs reported for JD-LDA are for the best found feature number, where M* is the optimal
number of features.
5.6.1 Implication of Color
From Table 5.2 it is evident that color inputs lead to a better performance for the FR
system in every case examined. The improvement caused by using color inputs
ranges from approximately 2% to 8% for all the learning scenarios examined. From Table
5.2, |ξ∗B| and |ξ∗J| are significantly greater for the YCbCr & Y pair of transformations
compared to the RGB & R pair, which implies YCbCr is a better color space for FR.
This is in agreement with the conclusions in past works [17], and can be attributed to the
decorrelated information in each of its spectral planes. The issues discussed in Section
5.4 are examined in this sub section.
Small Sample Size Scenario
An overview of Table 5.2 suggests that for both the RGB & R and YCbCr & Y pairs
of transformations, |ξ∗B| and |ξ∗J| are higher in the small sample size scenarios. The values
of |ξ∗B| and |ξ∗J| are highest when L = 4 and generally decrease as L increases.
However in the most extreme small sample size scenario examined, corresponding to the
L = 3 case, |ξ∗B| and |ξ∗J| are marginally lower. This observation can be attributed to the
effect of the increased dimensionality of color on the small sample size problem, and is
similar to the trend observed in Chapter 3. However, the values of |ξ∗B| and |ξ∗J| are still
high enough to justify using chromatic information over gray scale in the small sample size
scenarios. As the small sample size restriction is relaxed, i.e., the value of L increases,
the improvement of color over gray scale (ξ∗B and ξ∗J) is not as significant for both pairs
of color transformations examined.
Implication on weakness of g-classifier
Table 5.3 shows that, in agreement with earlier literature [11], r∗ should be neither too
high nor too low. However no trend in the shift of r∗ was observed when color transformations
were used instead of their gray scale counterparts, although in some cases (L = 6, 7) the
value of r∗ is shifted to a lower value for the YCbCr set of transformations. Color inputs
always produce a lower generalization error on the training set when compared to
gray scale inputs, and would therefore require a lower r∗ to achieve the optimal trade off.
However, the effect of using chromatic inputs with different dimensionality on r∗ is not
examined.

Samples/               R-LDA                   ada-boost.M2
Subject                RGB        YCbCr        RGB       YCbCr
3          CER         52.7283    46.385       49.87     45.89
           r*          -          -            3         3
           ξ∗B/ξ∗J     -3.59      -9.974       -5.1      -8.28
4          CER         41.9546    36.7647      34.16     29.71
           r*          -          -            3         3
           ξ∗B/ξ∗J     -4.88      -10.12       -5.79     -8.91
5          CER         35.0309    29.2464      26.06     20.57
           r*          -          -            4         4
           ξ∗B/ξ∗J     -4.57      -10.01       -5.14     -8.55
6          CER         28.8166    23.8074      19.61     16.23
           r*          -          -            5         5
           ξ∗B/ξ∗J     -4.5       -9.789       -4.06     -6.24
7          CER         22.1548    17.411       12.16     9.37
           r*          -          -            6         6
           ξ∗B/ξ∗J     -5.02      -9.589       -3.88     -5.76
10         CER         17.3676    13.4559      9.42      6.04
           r*          -          -            7         7
           ξ∗B/ξ∗J     -3.6       -8.137       -2.45     -5.33
13         CER         19.1744    14.6646      9.36      7.57
           r*          -          -            8         8
           ξ∗B/ξ∗J     -2.77      -7.884       -3.27     -4.67
16         CER         15.2164    11.6285      6.57      5.56
           r*          -          -            9         9
           ξ∗B/ξ∗J     -2.7       -7.053       -3.02     -4.24

Table 5.2: Best Performances obtained by using the color space counterpart over the
corresponding gray scale over different learning tasks
5.6.2 Implication of ensemble learning
From the negative values of δ∗ in Table 5.3, a broad conclusion would be that the
ada-boost.M2 has a better performance than the R-LDA method of FR for all examined
cases.
The improvement caused by boosting the R-LDA, i.e., |δ∗|, is larger when the size of
the training database is large, i.e., L>4. The value of |δ∗| is not significant for the case
when L = 3, but is over 6% for all cases when L>4. This is due to the fact that when the
training database is large, the probability of the ada-boost.M2 choosing a different set
of training samples, and hence creating a diverse and complementary set of classifiers, is
higher. This trend agrees with previous works [11, 45] which examine the performance
of ensemble learners.
Another trend observed in Table 5.3 is that the improvement obtained by boosting
the R-LDA, |δ∗|, does not depend on the color space/gray scale transformation used, but
only on the size of the training database.
The above trends suggest that boosting the learner does not significantly help the FR
system in the extreme small sample size learning scenarios, L ≤ 3, but improves
the FR system performance when the training set is reasonably large, irrespective of the
color space/gray scale transformation used.
Samples/           Gray Scale Transformation     Color Space Transformation
Subject            R           Y from YCbCr      RGB         YCbCr
3          CER     54.97       54.17             49.87       45.89
           r*      3           3                 3           3
           δ∗      -1.3491     -2.1886           -2.8583     -0.495
4          CER     39.95       38.62             34.16       29.71
           r*      3           3                 3           3
           δ∗      -6.8826     -8.264            -7.7946     -7.0547
5          CER     31.2        29.12             26.06       20.57
           r*      4           4                 4           4
           δ∗      -8.3989     -10.1334          -8.9709     -8.6764
6          CER     23.67       22.47             19.61       16.23
           r*      5           5                 5           4
           δ∗      -9.6503     -11.126           -9.2066     -7.5774
7          CER     16.04       15.13             12.16       9.37
           r*      6           6                 6           5
           δ∗      -11.1361    -11.8703          -9.9948     -8.041
10         CER     11.87       11.37             9.42        6.04
           r*      7           7                 7           7
           δ∗      -9.0957     -7.4069           -7.9476     -7.4159
13         CER     12.63       12.24             9.36        7.57
           r*      9           9                 8           8
           δ∗      -9.3101     -9.451            -9.8144     -7.0946
16         CER     9.59        9.8               6.57        5.56
           r*      9           9                 9           9
           δ∗      -8.373      -8.8819           -8.6914     -6.0685

Table 5.3: Best Performances obtained by boosting the R-LDA learner for different inputs
and learning tasks
5.7 Conclusions
Color transformations boost the performance of the FR system under
any given scenario; however, the improvement offered by chromatic inputs generally reduces
as the learning scenario becomes easier. This is in agreement with the
trends observed in Chapter 3. Even though the added dimensionality of the color inputs
examined leads to a dip in the improvement caused by color in the most extreme case of
small sample size learning examined, the improvement is still high enough to favor chromatic
inputs over gray scale inputs.
Boosting the learner, on the other hand, leads to an improvement in the FR system
performance when the training database is large, as the ada-boost.M2 can build a more
diverse and complementary set of classifiers. In the extreme cases of small sample size
learning, the classifiers generated by the boosting framework are trained on virtually
the same samples at every iteration and differ only in the updated parameters, leading to
a strong mutual dependence between the individual learners. Therefore the improvement
achieved by boosting is not significant in these learning scenarios. The weakness of the
individual R-LDA learners should be appropriately designed depending on the learning
scenario concerned, and would not depend on the color space/gray scale transformation
used.
The trends observed in Section 5.6 suggest that the design of the FR system (color
space/gray scale transformation and boosting parameters) would have to be chosen
depending on the learning scenario under consideration. In small sample size scenarios, the
FR system performance is boosted significantly by the usage of chromatic inputs and
not by boosting the learner. As the number of samples per subject, L, is increased,
the improvement provided by color information reduces. Boosting the learner improves
the performance of the individual learner significantly in all cases where the size of the
training database is reasonably large, i.e., L ≥ 4 and |D| > 272 images. The experimental
results show that integrating color into the boosting framework could significantly
improve the performance of the FR system when L ≈ 4 to 10 for medium sized databases.
Also, the YCbCr set of color transformations leads to a higher FR performance than the
RGB, for the set of images used.
5.8 Chapter Summary
In this chapter, chromatic information is integrated with an ada-boost learner to address
complexities in face patterns and illumination variations in training databases for face
recognition (FR). An LDA based learner is boosted and the integrated framework is
tested on a large database of images having severe pose and illumination variations.
The effects of both the extra dimensionality of color inputs and ensemble learning were
examined on the LDA learner in a range of small sample size learning scenarios. The
results of the experiments performed were presented in this chapter.
Experimental results show that integrating color into the boosting framework helps in
addressing complexities in face patterns and severe illumination variations and produces
a high performing FR system for a range of learning scenarios. However, the two
components contribute in different regimes: in learning scenarios where the training
database is small, e.g., the small sample size scenarios, the contribution of chromatic
information is very significant, whereas when the size of the training database is
reasonably large, it is ensemble learning that boosts the performance of the FR system.
Chapter 6
Conclusion and Future Research
In this chapter, the broad conclusions of the aspects studied in this thesis are presented
along with a summary of the work. Proposed directions for the extension of this work
are also discussed.
6.1 Research Summary
The use of color information has recently gathered attention in FR research. In this thesis,
color information in multi spectral images is used along with intensity or gray scale
information as an input to the FR system. Small sample size learning
scenarios are of major importance in this work. The effect of chromatic information
is examined in a range of learning scenarios, facial distortions/ viewpoints and both
poor and good illumination conditions, and an analysis is presented on the usefulness
of chromatic information. The experiments performed suggest that
chromatic inputs do provide discriminatory information to the FR system in certain
conditions. The results presented in this thesis are specific to the databases evaluated
upon; however, the conclusions and ideas can be extended to any color FR system.
Experiments were performed to determine the learning scenarios and imaging condi-
tions in which chromatic information boosts the performance of the FR system. This is
an important concern as the storage requirements and computational cost involved in the
usage of multi spectral chromatic inputs are around 1.5 to 3 times those of the
corresponding gray scale images. This issue was examined for both supervised and unsupervised
learning modes and it was concluded that color cues provide important discriminatory
information to the FR system in conditions of poor illumination, when the shape cues are
degraded owing to an unclear intensity or gray scale image. A point of interest was the
effect of the increased dimensionality of color inputs and spatial sampling of chromatic
planes on the small sample size problem in supervised FR systems. Interestingly, the ex-
periments suggest that color inputs help the FR system performance in small sample size
learning scenarios. Spatial sub sampling of chromatic planes can be used as a parameter
to control the trade off required between the dimensionality of the input and the amount
of chromatic information fed to the system thus aiding the design of the FR systems in
small sample size scenarios. This chosen dimensionality would depend on the imaging
conditions under which the faces were captured. Spatial sub sampling leads to a loss
of important information when used in unsupervised FR systems with images subjected
to severe imaging variations, although it does not lead to a loss of information when the
shape cues are optimal. Another important factor in the design of a color FR system is
the chromatic bytes used to form the color inputs. In our experiments, we found that
the Cb spectral plane (of YCbCr) does not contain discriminative information useful for
FR purposes. This would depend on the nature of images present in the training and
testing sets.
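The storage figure of around 1.5 to 3 times quoted in this paragraph can be checked with a short calculation. The following is a minimal sketch, assuming a 150×130 face image (the resolution used in this thesis) and chroma planes that are halved exactly along each sub sampled direction; the function name `color_input_dimension` is hypothetical and is not taken from the thesis.

```python
def color_input_dimension(rows, cols, scheme):
    """Length of the vectorized YCbCr input for a given chroma sampling scheme."""
    luma = rows * cols  # the Y plane is never sub sampled
    if scheme == "4:4:4":
        chroma_rows, chroma_cols = rows, cols
    elif scheme == "4:2:2":
        # Chroma resolution halved along one direction only.
        chroma_rows, chroma_cols = rows, (cols + 1) // 2
    elif scheme == "4:2:0":
        # Chroma resolution halved along both directions.
        chroma_rows, chroma_cols = (rows + 1) // 2, (cols + 1) // 2
    else:
        raise ValueError("unknown sampling scheme: " + scheme)
    return luma + 2 * chroma_rows * chroma_cols


ROWS, COLS = 150, 130
gray_dimension = ROWS * COLS  # 19500 values for a gray scale input
ratios = {
    scheme: color_input_dimension(ROWS, COLS, scheme) / gray_dimension
    for scheme in ("4:4:4", "4:2:2", "4:2:0")
}
```

Under these assumptions the color input is 3, 2 and 1.5 times the size of the gray scale input for the 4:4:4, 4:2:2 and 4:2:0 schemes respectively, which brackets the range stated in the text.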
The effect of using the YCbCr transform in a decision fusion framework was evaluated
in this thesis. YCbCr, being a decorrelated transform, is expected to offer different
and complementary information through each of its spectral planes to the FR system.
Since the individual learners are trained on individual spectral planes, this framework is
expected to be more robust to the small sample size problem in comparison to FR systems
where a raw data level fusion of spectral planes is performed. The framework was tested
in the supervised learning mode under both poor and good conditions of illumination
and it was found that fusion of chromatic and intensity information on a decision level
is an efficient way to use information contained in face inputs especially in small sample
size learning scenarios.
Complexities in face patterns, which take the form of variations in pose, viewpoint
and expression occurring together with variations in illumination conditions, were
addressed by combining chromatic information, supervised learning and adaptive boosting
into a single learning framework. The implications of color information for different
aspects of the LDA based ensemble learner were discussed and examined. The individual
effects of chromatic information and adaptive boosting were examined on the LDA
based supervised learner, and it was concluded that the proposed combined framework
finds applications in medium sized face databases which have simultaneous variations in
illumination, pose and viewpoint. When the size of the training database is very small,
chromatic information helps the FR system, whereas when the size of the training database
is very large, the ensemble learner boosts the performance of the FR system.
In summary, an important conclusion from this thesis is that color especially helps
the FR system when the images are captured in uncontrolled conditions and severe
illumination conditions. Chromatic information improves the FR performance in the
extreme small sample size learning scenarios, if used effectively. Even though there
might be a slight drop in the contribution of chromatic information under some learning
and imaging conditions, color cues still provide valuable discriminative information to
the FR system under these difficult learning scenarios.
6.2 Future Work
This section discusses a set of research topics which would extend the work presented
in this thesis.
More efficient aggregation on a decision level
In this thesis, information from the different spectral planes is fused in a multiple
classifier system framework using rule based aggregation methods: the max rule, min rule
and sum rule. These aggregation methods do not depend on the training data. Using
data dependent aggregation methods could lead to an improvement in aggregation
performance [42, 11]. Previous works [25, 27] have provided a framework for confidence
based selection of color spaces and spectral planes from different color spaces for decision
level aggregation. Efficient data dependent aggregation of information from individual
spectral planes could therefore lead to a more efficient use of chromatic information,
especially in small sample size learning scenarios (in supervised FR systems).
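The rule based aggregation mentioned above can be illustrated with a small example. This is a minimal sketch, assuming that the learner trained on each spectral plane outputs one score per enrolled subject and that a higher score means a better match; the function name `fuse_decisions` and the score values are hypothetical and are not taken from the experiments in this thesis.

```python
def fuse_decisions(plane_scores, rule="sum"):
    """Combine per-plane class scores; return (winning class index, fused scores).

    plane_scores[p][c] is the score that the learner trained on spectral
    plane p assigns to class c; higher is assumed to mean a better match.
    """
    combiners = {"sum": sum, "max": max, "min": min}
    if rule not in combiners:
        raise ValueError("rule must be 'sum', 'max' or 'min'")
    combine = combiners[rule]
    # zip(*...) regroups the scores so that each tuple holds the scores of
    # one class across all spectral planes.
    fused = [combine(per_class) for per_class in zip(*plane_scores)]
    winner = max(range(len(fused)), key=fused.__getitem__)
    return winner, fused


# Hypothetical scores for three enrolled subjects.
scores = [
    [0.6, 0.3, 0.1],  # learner trained on the Y plane
    [0.2, 0.5, 0.3],  # learner trained on the Cb plane
    [0.5, 0.4, 0.1],  # learner trained on the Cr plane
]
decisions = {rule: fuse_decisions(scores, rule)[0] for rule in ("sum", "max", "min")}
```

None of the three rules uses any parameter learned from the training data, which is the property the paragraph above refers to. In this example the sum and max rules select the first subject while the min rule selects the second, so the choice of rule can change the final decision.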
Face Resolution Implications on FR
The resolution was fixed to 150×130 for all experiments in this thesis as this resolution
is commonly used for surveillance applications. However, it is not certain that this is the
optimal resolution for FR purposes. It would therefore be a good idea to determine the
optimal resolution for FR purposes. Another possible issue is when the testing images
do not have the same resolution as those which were used for training, as this would
lead to the issue of projection of images of a particular dimensionality on to a subspace
of different dimensionality. Efficient methods to solve this resolution mismatch would
significantly help in the practical usage of FR systems. This could be done by estimating
the feature space (of the testing image dimension) from the existing feature space of
the training image dimension, or by resizing the testing inputs to suit the dimension of
the training inputs; however, it is not evident which of these approaches would lead to
a better FR system. A change in resolution would also change the implication of the
small sample size problem in supervised FR systems, as it would mean a change in the
dimension of the vectorized input.
Transformation Mismatch in training and testing stages
The assumption in this thesis is that the same color space or gray scale transformation
is used for both training the FR system and in the testing stage. This assumption
was made as the principal aim of this thesis was to investigate the efficient use of color
information in FR systems. However, this is not a necessary condition in practical FR
systems where the FR system user is not guaranteed to know the color spaces or gray
scale transformation on which the system was trained. It is therefore important from an
application point of view to determine the effect of employing different transformations in
the training and testing stages, i.e., the scenario of a transformation mismatch. Knowledge
of the transformations which are more robust to mismatches would help in the design of
FR systems as the training could be performed on these robust transformations.
Tensorial Analysis of Color inputs
In this thesis, color inputs are processed by the formation of a column vector in the
preprocessing step. The column vector was formed by a row wise ordering of bytes from
each spectral plane followed by a concatenation of spectral planes. By this operation,
though the chromatic information is used in the FR system, the structural and correlation
data between successive pixels and spectral planes is lost as the image is processed as a
long vector. Tensorial analysis has been explored for high dimensional data like faces and
gait in [46], where it is shown that preserving the structure and correlation in the data could
lead to a better recognition performance of pattern recognition systems. Analysis of color
inputs as 3-mode tensorial data would therefore preserve the structure and correlation in
data as well as the relation of the spectral planes with each other, especially in the case
of inputs of correlated color spaces.
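The difference between the two representations discussed above can be seen on a toy image. This is a minimal sketch, assuming three spectral planes stored as nested lists of equal size; the function names `vectorize` and `as_tensor` are hypothetical, and the sketch does not implement the multilinear analysis of [46].

```python
def vectorize(planes):
    """Row wise ordering of each plane followed by concatenation of the planes."""
    return [value for plane in planes for row in plane for value in row]


def as_tensor(planes):
    """3-mode arrangement indexed as tensor[row][column][plane]."""
    rows, cols = len(planes[0]), len(planes[0][0])
    return [[[plane[r][c] for plane in planes] for c in range(cols)]
            for r in range(rows)]


# Toy 2x2 image with three spectral planes.
y_plane = [[1, 2], [3, 4]]
cb_plane = [[5, 6], [7, 8]]
cr_plane = [[9, 10], [11, 12]]
planes = [y_plane, cb_plane, cr_plane]

vector = vectorize(planes)   # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
tensor = as_tensor(planes)   # tensor[0][0] == [1, 5, 9]
```

In the vector, the three values belonging to the top left pixel sit at positions 0, 4 and 8, and vertically adjacent pixels are separated by one row length, so neither spatial nor spectral adjacency is explicit. In the 3-mode arrangement both remain directly addressable.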
Appendix A
Color CMU PIE database
In this appendix, a description of the images and their imaging conditions is provided,
so as to provide a sample of the faces used for experiments in this thesis. The images are
from the color CMU PIE database [40, 41]. This database is chosen for experimentation
as it consists of color faces in a wide range of imaging conditions, facial distortions
and variations in pose and viewpoint. The color CMU PIE database is licensed to the
University of Toronto and is permitted to be used solely for research purposes. The
sample images included in this appendix are of those subjects which are permitted for
use in published papers/ results.
The CMU PIE database [40, 41] consists of 41,368 images of 70 subjects. The images
in the color CMU PIE database are stored in RGB format and have a spatial dimension
of 480 x 640. Each subject was photographed under 13 different poses, 42 different
illumination conditions, and 4 different expressions. The database consists of two major
partitions, the first with pose and expression variation only, the second with pose and
illumination variation. The different categories of images available in the CMU
PIE database are explained in this appendix and summarized in Table A.1. Images and
other data are available in [40].
Table A.1: Details of CMU PIE database

Condition               No. of subjects   Other details
Pose and illumination   70                21 flash conditions; 13 poses
Pose and lighting       70                2 background conditions; 3 poses; 21 flash conditions
Pose and expression     70                3 expressions and talking; 13 poses (neutral illumination)
A.1 Pose and Illumination variation
This subset contains images of all 70 subjects having pose and illumination variations.
13 pose variations are captured and the poses consist of both horizontal and vertical
variations ranging from 0 to 90 degrees. The illumination conditions captured can be
classified into those where the room lights were off, and those where they were on. The
former is denoted as illumination and the latter as lighting. The images are captured by
varying positions of camera flash, thus leading to a total of 21 illumination and lighting
conditions each.
Illumination Images: The images are captured by varying the position of the camera flash in
a room with zero background light, thus leading to images with severe imaging conditions.
Each of the 21 illumination conditions is captured in all 13 poses, leading to a total of
273 samples per subject. Samples of these images are provided in Figure A.1.
Lighting Images: These images are captured by varying the position of the camera flash
in neutral background light. The images have good illumination conditions, and are
depicted in Figure A.2. This set of images is typical of an office environment. Each of
the 21 illumination conditions is captured in 3 poses.
[Figure A.1 panels: rows Pose 07 and Pose 37; columns Flash 02, Flash 06, Flash 10]
Figure A.1: CMU PIE: Images with Pose and Illumination Variations: No Room Lights
[Figure A.2 panels: Flash 02, Flash 06, Flash 10; all images are of the frontal pose, Pose 27]
Figure A.2: CMU PIE: Images with Pose and Illumination Variations: Room Lights On
A.2 Pose and Expression variation
This subset of the CMU PIE database consists of images of 70 subjects, having pose
and expression variations. Subjects are captured in all of the 13 poses in this subset. Three
different expressions are considered (smiling, neutral and blinking), along with an image
of the subject talking. A sample of images from this category is provided in Figure A.3.
These images are captured in neutral illumination. If the subject wears spectacles, both
images with and without spectacles are included in this partition.
[Figure A.3 panels: Neutral Expression, Frontal Pose 27; Smiling Expression, Pose 05; Blinking Expression, Pose 37]
Figure A.3: CMU PIE: Images with Pose and Expression Variations: Room Lights On
Appendix B
Preprocessing Method
In this appendix, a description of the method used for preprocessing face images is
provided.
The images in the databases are in RGB format. They contain not only the face but
also irrelevant information such as the hair, neck, shoulder, background, etc. To avoid
incorrect evaluations, it is required to isolate the face from the remaining image. This
separation of the face takes place in the preprocessing stage. The preprocessed images
are then passed through the rest of the blocks of the FR system.
The sequence of preprocessing steps performed is as follows:
1. Each spectral plane of the initial color image (or the entire gray-scale image) is
translated, rotated and scaled to size 150×130, so that the centers of the eyes are
placed on fixed pixels and the distance between the eye centers is 70 pixels. The
eye centers are placed on the 45th row. The distance of 70 pixels between the eye
centers and the row on which the eyes are placed are chosen such that the photometric
proportion of the face is maintained.
2. A standard mask is applied to this image of reduced dimension to remove the
non-face portions.
3. The image is converted to the respective color space or subspace / gray-scale format.
4. Each plane of the color image is normalized to zero mean and unit variance (if the
color space used is decorrelated). Since YCbCr is a decorrelated color space, each
color plane can be individually normalized. For gray scale images, this operation
is performed on the gray-scale image after a histogram equalization.
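The geometric part of step 1 amounts to a similarity transform determined by the two eye centers. The following is a minimal sketch, assuming that the eye pair is centered horizontally in the 130 pixel wide output (the text fixes only the eye distance and the eye row, so this placement is an assumption) and that points are given as (column, row); the function names are hypothetical, and resampling of the pixel values is not shown.

```python
import math


def eye_alignment_transform(left_eye, right_eye,
                            eye_distance=70.0, eye_row=45.0, width=130.0):
    """Similarity transform (a, b, tx, ty) mapping the eye centers to fixed pixels.

    A source point (x, y) maps to (a*x - b*y + tx, b*x + a*y + ty),
    where x is the column coordinate and y is the row coordinate.
    """
    (x1, y1), (x2, y2) = left_eye, right_eye
    dx, dy = x2 - x1, y2 - y1
    source_distance = math.hypot(dx, dy)
    if source_distance == 0:
        raise ValueError("eye centers must be distinct")
    scale = eye_distance / source_distance
    angle = -math.atan2(dy, dx)  # rotation that makes the eye line horizontal
    a, b = scale * math.cos(angle), scale * math.sin(angle)
    # Assumption: the eye pair is centered horizontally in the output image.
    target_x1 = (width - eye_distance) / 2.0
    tx = target_x1 - (a * x1 - b * y1)
    ty = eye_row - (b * x1 + a * y1)
    return a, b, tx, ty


def apply_transform(transform, point):
    """Map a (column, row) point through the similarity transform."""
    a, b, tx, ty = transform
    x, y = point
    return a * x - b * y + tx, b * x + a * y + ty


# Hypothetical eye coordinates in a source image, given as (column, row).
left, right = (100.0, 100.0), (160.0, 180.0)
transform = eye_alignment_transform(left, right)
aligned_left = apply_transform(transform, left)
aligned_right = apply_transform(transform, right)
```

With these inputs both eyes land on row 45 and are 70 pixels apart, up to floating point error. The same transform would then be applied to every spectral plane so that the planes stay registered with each other.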
The preprocessing steps for a single spectral plane of a color image (or a gray scale
image) are illustrated in Figure B.1 on an image from the PIE database. The preprocessed
images are then represented as a column vector for further processing. The procedure
for conversion of a preprocessed image to a column vector is presented in Chapter 3.
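The normalization in step 4 of the sequence above can be sketched as follows. This is a minimal illustration, assuming that the population variance (division by the number of pixels) is used and that every pixel of the plane takes part in the statistics; the text does not state whether masked pixels are excluded, and the function name `normalize_plane` is hypothetical.

```python
import math


def normalize_plane(plane):
    """Scale one spectral plane to zero mean and unit variance."""
    values = [value for row in plane for value in row]
    mean = sum(values) / len(values)
    # Assumption: population variance, i.e. division by the number of pixels.
    variance = sum((value - mean) ** 2 for value in values) / len(values)
    if variance == 0:
        raise ValueError("a constant plane cannot be scaled to unit variance")
    std = math.sqrt(variance)
    return [[(value - mean) / std for value in row] for row in plane]


# Toy 2x2 plane; for a YCbCr input the function is applied to each plane separately.
normalized = normalize_plane([[1.0, 2.0], [3.0, 4.0]])
```

Normalizing each plane separately gives the intensity and chromatic planes comparable numeric ranges before they are concatenated into a single input vector.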
[Figure B.1 block labels: Input Image; Eye Coordinates; Rotation; Resolution Scaling (150×130); Mask; Mask Application (removal of unnecessary portions (hair, etc.)); Preprocessed face]
Figure B.1: Steps to preprocessing a single spectral plane (or a gray scale) face image
Appendix C
YCbCr Color Space
In this appendix, the details of the YCbCr color transformations used in this thesis are
provided. The YCbCr color space was developed as part of the ITU-R Recommendation
BT.601 for digital video standards and television transmissions. This color space is
used in MPEG video compression standards and JPEG images [39].
The YCbCr is a decorrelated color transform and contains one intensity channel, Y,
and two chromatic channels, red (Cr) and blue (Cb). The Y spectral plane has 220 levels
ranging from 16 to 235, while the Cb and Cr spectral planes have 225 levels ranging
from 16 to 240. Values below 16 and above 235 are denoted as footroom and headroom,
respectively, and are reserved for other processing. Given an RGB image, we can derive
the YCbCr transformations using the following equation [39],
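\[
\begin{bmatrix} Y \\ Cb \\ Cr \end{bmatrix}
=
\begin{bmatrix} 16 \\ 128 \\ 128 \end{bmatrix}
+
\begin{bmatrix}
65.4810 & 128.5530 & 24.9960 \\
-37.7745 & -74.1592 & 111.9337 \\
111.9581 & -93.7509 & -18.2072
\end{bmatrix}
\begin{bmatrix} R \\ G \\ B \end{bmatrix}
\tag{C.1}
\]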
where the 8 bit values in R, G, and B spectral planes are scaled in the closed interval
[0,1].
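Equation (C.1) can be applied pixel by pixel as in the sketch below. It is a minimal illustration that uses the offsets and coefficients exactly as printed in (C.1), assuming 8 bit inputs that are scaled to [0,1] by division by 255 and no rounding or clipping of the result; the function name `rgb_to_ycbcr` is hypothetical.

```python
# Offsets and coefficients copied from Equation (C.1).
OFFSET = (16.0, 128.0, 128.0)
MATRIX = (
    (65.4810, 128.5530, 24.9960),
    (-37.7745, -74.1592, 111.9337),
    (111.9581, -93.7509, -18.2072),
)


def rgb_to_ycbcr(r, g, b):
    """Convert one 8 bit RGB pixel to (Y, Cb, Cr) following Equation (C.1)."""
    rgb = (r / 255.0, g / 255.0, b / 255.0)  # scale to the closed interval [0, 1]
    return tuple(
        offset + sum(coeff * value for coeff, value in zip(row, rgb))
        for offset, row in zip(OFFSET, MATRIX)
    )


black = rgb_to_ycbcr(0, 0, 0)        # (16.0, 128.0, 128.0)
white = rgb_to_ycbcr(255, 255, 255)  # Y close to 235, Cb and Cr close to 128
```

Neutral pixels (R = G = B) map to Cb = Cr = 128 up to floating point error, because the second and third rows of the matrix each sum to zero. With the printed coefficients, white maps to a Y value of approximately 235.03.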
In digital video transmission applications, the chromatic spectral planes of the YCbCr
color space are decimated or spatially sampled. The rationale behind this is that humans
see color with much less spatial resolution than black and white. Three sampling schemes
are used in MPEG and JPEG standards - 4:4:4, 4:2:2 and 4:2:0. Although this type
of compression is lossy, the resulting images used in video frames/ pictures have no
perceivable loss of clarity.
In YCbCr 4:4:4, no chromatic sampling is performed. The 8 bit values of each spectral
plane are used directly. The 4:2:2 scheme indicates horizontal sub sampling by a factor
of 2, i.e., every alternate column of the Cb and Cr spectral planes is eliminated. The 4:2:0
scheme indicates both horizontal and vertical subsampling of the chromatic planes by a
factor of 2. In this sampling scheme, every alternate row and column of the Cb and Cr
spectral planes is eliminated. In all these sampling schemes, the Y intensity plane is not
spatially sampled. Figure C.1 presents a pictorial illustration of the various sampling
schemes.
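The sampling schemes described above can be sketched as plain decimation of a chromatic plane. This is a minimal illustration, assuming that horizontal sub sampling means keeping every other sample along each row, that the even indexed samples are the ones kept, and that no low pass filtering or averaging is applied before decimation; the function name `subsample_chroma` is hypothetical.

```python
def subsample_chroma(plane, scheme):
    """Decimate one chromatic plane (Cb or Cr) according to the sampling scheme."""
    if scheme == "4:4:4":
        return [list(row) for row in plane]        # no chromatic sampling
    if scheme == "4:2:2":
        return [row[::2] for row in plane]         # halve the horizontal resolution
    if scheme == "4:2:0":
        return [row[::2] for row in plane[::2]]    # halve both resolutions
    raise ValueError("unknown sampling scheme: " + scheme)


# Toy 4x4 chromatic plane; the Y plane is left untouched in every scheme.
cb = [
    [10, 11, 12, 13],
    [20, 21, 22, 23],
    [30, 31, 32, 33],
    [40, 41, 42, 43],
]
cb_422 = subsample_chroma(cb, "4:2:2")  # 4 rows x 2 columns
cb_420 = subsample_chroma(cb, "4:2:0")  # 2 rows x 2 columns
```

For the toy plane, 4:2:2 keeps 8 of the 16 chromatic samples and 4:2:0 keeps 4, which is the mechanism that lets the sampling scheme act as a control on the dimensionality of the color input.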
Figure C.1: Illustration of Chromatic sub sampling - Each sub figure is a YCbCr image
Bibliography
[1] A. O’Toole, P. Phillips, F. Jiang, J. Ayyad, N. Penard, and H. Abdi, “Face recogni-
tion algorithms surpass humans matching faces over changes in illumination,” IEEE
Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 9, pp. 1642–
1646, Sept. 2007.
[2] P. Sinha, B. Balas, Y. Ostrovsky, and R. Russell, “Face recognition by humans:
Nineteen results all computer vision researchers should know about,” Proceedings of
the IEEE, vol. 94, no. 11, pp. 1948–1962, 2006.
[3] L. Torres, “Is there any hope for face recognition?” International workshop on Image
Analysis for Multimedia Interactive Services, April 2004.
[4] A. F. Abate, M. Nappi, D. Riccio, and G. Sabatino, “2d and 3d face recognition: A
survey,” Pattern Recognition Letters, vol. 28, no. 14, pp. 1885–1906, 2007.
[5] F. Samaria, “Face recognition using hidden markov models,” in PhD thesis, 1994.
[6] F. Samaria and S. Young, “HMM based architecture for face identification,” Image
and Vision Computing, vol. 12, pp. 537–583, 1994.
[7] L. Wiskott, J. M. Fellous, N. Kruger, and C. V. D. Malsburg, “Face recognition by
elastic bunch graph matching,” in CRC Press, 1999.
[8] C. Jones and A. I. Abbott, “Color face recognition by hypercomplex gabor analysis,”
7th International Conference on Automatic Face and Gesture Recognition, April
2006.
[9] J.Lu, K.N.Plataniotis, and A.N.Venetsanopoulos, “Regularization studies of linear
discriminant analysis in small sample size scenarios with application to face recog-
nition,” Pattern Recognition Letters, pp. 181–191, 2005.
[10] ——, “Face recognition using LDA-based algorithms,” IEEE Trans. on Neural Net-
works, vol. 14, no. 1, pp. 195–200, Jan 2003.
[11] J.Lu, K.N.Plataniotis, A.N.Venetsanopoulos, and S. Li, “Ensemble-based discrim-
inant learning with boosting for face recognition,” IEEE Transactions on Neural
Networks, vol. 17, no. 1, pp. 166–178, Jan. 2006.
[12] J.Wang, K. Plataniotis, J. Lu, and A. Venetsanopoulos, “On solving the one face
recognition problem with one training sample per subject,” Pattern Recognition,
vol. 39, pp. 1746–1762, 2006.
[13] M. Turk and A. Pentland, “Face recognition using eigenfaces,” IEEE Computer
Society Conference on Computer Vision & Pattern Recognition, pp. 586–591, Jun
1991.
[14] P. Belhumeur, J. Hespanha, and D. Kriegman, “Eigenfaces vs. fisherfaces: recognition
using class specific linear projection,” Fourth European Conference on Computer
Vision, pp. 45–58, Apr 1996.
[15] S. Peichung and L. Chengjun, “Improving the face recognition grand challenge base-
line performance using color configurations across color spaces,” IEEE International
Conference on Image Processing, pp. 1001–1004, 8-11 Oct. 2006.
[16] L.Torres, J. Reutter, and L. Lorente, “The importance of color information in face
recognition,” IEEE International Conference on Image Processing, pp. 627–631,
1999.
[17] S. Peichung and L. Chengjun, “Comparative assessment of content based face image
retrieval in different colour spaces,” International Journal of Pattern Recognition,
vol. 19, no. 7, pp. 873–893, 2005.
[18] A. Yip and P. Sinha, “Role of color in face recognition,” Technical Report, Artificial
Intelligence Laboratory, MIT, December 2001.
[19] J. Wang, “Appearance based face recognition under small sample size scenario,”
PhD thesis, University of Toronto, 2007.
[20] P. Phillips, H. Moon, S. Rizvi, and P. Rauss, “The FERET evaluation methodology for
face recognition algorithms,” IEEE Transactions on Pattern Analysis and Machine
Intelligence, vol. 22, no. 10, pp. 1090–1104, Oct. 2000.
[21] T. Ganapathi and K. Plataniotis, “Color face recognition under various learning
scenarios,” IEEE Canadian Conference on Electrical and Computer Engineering,
2008.
[22] T. Ganapathi, K. Plataniotis, and Y. Ro, “Boosting chromatic information for face
recognition,” IEEE Canadian Conference on Electrical and Computer Engineering,
2008.
[23] A. Jain, A. Ross, and S. Prabhakar, “An introduction to biometric recognition,”
IEEE Transactions on Circuits and Systems for Video Technology, vol. 14, no. 1,
pp. 4–20, Jan. 2004.
[24] M. Sadeghi and J. Kittler, “A comparative study of data fusion strategies in face
verification,” 12th European Signal Processing Conference, pp. 1229–1232, 2004.
[25] M. Sadeghi, S. Khoushrou, and J. Kittler, “Confidence based gating of colour fea-
tures for face authentication,” in Multiple Classifier Systems. Springer, 2007, vol.
4472, pp. 121–130.
[26] J. Kittler and M. Sadeghi, “Physics based decorrelation of image data for decision
level fusion in face verification,” Multiple Classifier Systems, pp. 354–363, 2004.
[27] M. Sadeghi, S. Khoushrou, and J. Kittler, “SVM based selection of color space
experts for face authentication,” in International conference on Bioinformatics.
Springer, 2007, vol. 4642, pp. 907–916.
[28] P. Phillips, H. Wechsler, J. Huang, and P. Rauss, “The FERET database and
evaluation procedure for face recognition algorithms,” Image and Vision Computing,
vol. 16, no. 5, pp. 295–306, 1998.
[29] P. Phillips, P. Flynn, T. Scruggs, K. Bowyer, and W. Worek, “Preliminary face
recognition grand challenge results,” in Seventh International Conference on Auto-
matic Face and Gesture Recognition, UK, 2006.
[30] J. Kittler, M. Hatef, R. Duin, and J. Matas, “On combining classifiers,” IEEE
transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 3, pp. 226–
238, Mar. 1998.
[31] K. Messer, J. Matas, J. Kittler, J. Luettin, and G. Maitre, “XM2VTSDB: The
extended M2VTS database,” in International conference on Audio- and Video-Based
Biometric Person Authentication, 1999.
[32] E. B.Bailliere, S. Bengio, F. Bimbot, M. Hamouz, J. Kittler, J. Mariethoz, J. Matas,
K. Messer, V. Popovici, F. Poree, B. Ruiz, and J. Thiran, “The BANCA database
and evaluation protocol,” in International conference on Audio and Video-Based
Biometric Person Authentication, 2003, pp. 625–638.
[33] M. Sadeghi and J. Kittler, “Decision making in the LDA space: Generalised gradi-
ent direction metric,” in International Conference on Automatic Face and Gesture
Recognition, 2004, pp. 248–253.
[34] B. Moghaddam, T. Jebara, and A. Pentland, “Bayesian face recognition,” Pattern
Recognition, vol. 33, no. 11, pp. 1771–1782, November 2000.
[35] B. Scholkopf, A. Smola, and K. Muller, “Nonlinear component analysis as a kernel
eigenvalue problem,” Neural Computation, vol. 10, pp. 1299–1319, 1999.
[36] J.Lu, K.N.Plataniotis, and A.N.Venetsanopoulos, “Face recognition using kernel
direct discriminant analysis algorithms,” IEEE transactions on Neural Networks,
vol. 14, no. 1, pp. 117–126, Jan 2003.
[37] R. Duda, P. Hart, and D. Stork, Pattern Classification. John Wiley, 2000.
[38] S.J.Raudys and A. Jain, “Small sample size effects in statistical pattern recognition:
recommendations for practitioners,” IEEE Transactions on Pattern Analysis and
Machine Intelligence, vol. 13, no. 3, pp. 252–264, Mar 1991.
[39] Z. Li and M. Drew, Fundamentals of Multimedia. Prentice Hall, 2004.
[40] T. Sim, S. Baker, and M. Bsat, “The CMU pose, illumination and expression
database,” in Fifth International Conference on Automatic Face and Gesture Recog-
nition, Washington, D.C., 2002.
[41] ——, “The CMU pose, illumination, and expression (PIE) database of human faces,”
The Robotics Institute, Carnegie Mellon University, Tech. Rep. CMU-RI-TR-01-02,
January 2001.
[42] M. Kamel and N. Wanas, “Data dependence in combining classifiers,” in Multiple
Classifier Systems. Springer, 2003, vol. 2709, pp. 1–14.
[43] N.V.Chawla and K. Bowyer, “Designing multiple classifier systems for face recogni-
tion,” in Multiple Classifier Systems. Springer, 2005, vol. 3541, pp. 407–416.
[44] H. Yu and J. Yang, “A direct LDA algorithm for high-dimensional data with
application to face recognition,” Pattern Recognition, vol. 34, pp. 2067–2070, 2001.
[45] M. Skurichina, L. Kuncheva, and R. Duin, “Bagging and boosting for the nearest
mean classifier: Effects of small sample size on diversity and accuracy,” in Multiple
Classifier Systems. Springer, 2002, vol. 2364, pp. 62–71.
[46] H.Lu, K.N.Plataniotis, and A.N.Venetsanopoulos, “MPCA: Multilinear principal
component analysis of tensor objects,” IEEE Transactions on Neural Networks,
vol. 19, pp. 18–39, January 2008.