
[IEEE 2006 Conference on Computer Vision and Pattern Recognition Workshop (CVPRW'06) - New York, NY, USA (17-22 June 2006)] 2006 Conference on Computer Vision and Pattern Recognition


Abstract

In this paper we investigate how to perform face recognition on the hardest experiment (Exp4) in Face Recognition Grand Challenge (FRGC) phase-II data, which deals with subjects captured under uncontrolled conditions such as harsh overhead illumination, some pose variation and facial expressions, in both indoor and outdoor environments. Other variations include the presence and absence of eye-glasses. The database consists of a generic dataset of 12,776 images for training a generic face subspace, a target set of 16,028 images, and a query set of 8,014 images given for matching. We propose a novel face recognition algorithm using Kernel Correlation Feature Analysis (KCFA) for dimensionality reduction (222 features), coupled with discriminative Support Vector Machine training on the target set's KCFA features to provide a similarity distance measure between each probe and each target subject. We show that this configuration yields the best verification rate at 0.1% FAR (87.5%) compared to PCA+SVM, GSLDA+SVM, SVM+SVM and KDA+SVM. We then explore which facial regions provide the best discrimination ability under our proposed algorithm, performing partial face recognition using the eye region, nose region and mouth region. We empirically find that the eye region is the most discriminative part of the faces in the FRGC data, yielding a verification rate closest to that of holistic face recognition (83.5% vs. 87.5% @ 0.1% FAR). Finally, we use Support Vector Machines to fuse the eye-region and holistic features, boosting performance to ~90% @ 0.1% FAR on the FRGC dataset, the first face database of this scale.

1. Introduction

Machine recognition of human faces from still images and video frames is an active research area due to the increasing demand for authentication in commercial and law enforcement applications. Despite the research advancements over the years, face recognition remains a highly challenging task in practice due to large nonlinear distortions caused by natural facial appearance variations

such as expressions, pose and ambient illumination changes. Two well-known popular algorithms for face recognition are Eigenfaces [1] and Fisherfaces [2]. The Eigenfaces method generates features that capture the holistic nature of faces through Principal Component Analysis (PCA), which determines a lower-dimensional subspace that offers the minimum mean-squared-error approximation to the original high-dimensional face data. Instead of seeking a subspace that is efficient for representation, the Linear Discriminant Analysis (LDA) method seeks projection directions that are more optimal for discrimination. In practice, Fisherfaces first performs PCA to overcome the singularities in the within-class scatter matrix (Sw) by reducing the dimensionality of the data, and then applies traditional LDA in this lower-dimensional subspace. Recently [3], it has been suggested that the null space of the Sw matrix is important for discrimination. The claim is that applying PCA in Fisherfaces may discard discriminative information, since the null space of Sw

contains the most discriminative information. Fueled by this insight, Direct LDA (DLDA) and Gram-Schmidt LDA (GSLDA) [4] methods have been proposed, utilizing part of the null space of Sw. However, these linear subspace methods [9] may not discriminate faces well due to large nonlinear distortions in the faces. In such cases, the kernel correlation filter approach proposed in this paper may be attractive because of its ability to tolerate some level of distortion [5][6]. One of the recent advances in advanced correlation filters is class-dependence feature analysis (CFA) [7], a novel feature extraction method that uses advanced correlation filters to provide an efficient, compact feature set. We will show that with a proper choice of nonlinear kernel parameters for our proposed kernel correlation filter, performance can be significantly improved over previous work. Our experimental evaluation includes results from CFA, GSLDA, Fisherfaces, Kernel LDA (KDA), Eigenfaces and Kernel PCA on a large-scale database from the Face Recognition Grand Challenge (FRGC) dataset collected by the University of Notre Dame [8]. We examine our proposed methodology, its theory and derivation, and how it performs on holistic facial images and parts-based recognition, together with fusion results.

Partial & Holistic Face Recognition on FRGC-II data using Support Vector Machine Kernel Correlation Feature Analysis

M. Savvides, R. Abiantun, J. Heo, S. Park, C. Xie and B.V.K. Vijayakumar, Electrical & Computer Eng. Dept., Carnegie Mellon University, Pittsburgh, PA 15213

[email protected], {raa, jheo, sungwonp, chunyanx}@andrew.cmu.edu, [email protected]

Proceedings of the 2006 Conference on Computer Vision and Pattern Recognition Workshop (CVPRW’06) 0-7695-2646-2/06 $20.00 © 2006 IEEE

Figure 1. Example Query images from FRGC-Phase II dataset showing harsh uncontrolled illumination.

1.1. Face Recognition Grand Challenge (FRGC)

The Face Recognition Grand Challenge (FRGC) aims to provide several face recognition experiment challenges to the face recognition community. In this paper we focus on the hardest 2D face recognition problem, which is Experiment 4 in the FRGC. This experiment tests face recognition algorithms on subjects captured in uncontrolled indoor and outdoor conditions, with considerably harsh lighting during image capture, as shown in Figure 1. The FRGC database consists of three dataset components: a “generic dataset”, typically used to build a global face model (e.g. a global PCA subspace); a “target set”, consisting of the known people that we want to find; and a “query” (or “probe”) set, the unknown captured images that we need to identify against the gallery set. The generic training set consists of 222 people with up to a maximum of 64 images per person, yielding a total of 12,776 facial images for generic training. The gallery set consists of 466 people yielding a total of 16,028 target images, and the probe set covers the same 466 people with a total of 8,014 facial images. The subjects in the dataset were captured in many different sessions spanning a period of one year. Experiment 4 is difficult because the target images are captured under controlled lighting conditions, whereas the probe images are captured in uncontrolled indoor and outdoor settings, posing a serious challenge. The NIST PCA baseline algorithm yields a verification rate of 12% at 0.1% false acceptance rate (FAR). It is the goal of the FRGC to increase

this verification rate at a specified false acceptance rate of 0.1%; thus this paper will mostly focus on the verification rate at this operating point of the ROC curve.

1.2. Background

One of the most common algorithms used is PCA, or eigenfaces, which is the reported baseline algorithm in FRGC. PCA finds the minimum mean-squared-error linear transformation that maps from the original N-dimensional data space into an M-dimensional feature subspace (where M << N) to achieve dimensionality reduction, using the eigenvectors corresponding to the largest eigenvalues of the data covariance matrix. The resulting basis vectors are the optimal projection vectors that describe the total variance of the data, and the leading eigenvectors with the largest eigenvalues capture most of the signal energy in the observed data. However, the subspace spanned by the FRGC eigenfaces of the generic dataset evidently cannot represent the target/query images well, otherwise the verification result would be much higher than 12% @ 0.1% FAR; thus we explore and propose a different approach using correlation filters for feature extraction.
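As a rough illustration of this eigen-decomposition step (our own toy sketch, with random vectors standing in for vectorized face images; not the NIST baseline implementation), the PCA subspace and projection can be written as:

```python
import numpy as np

def pca_subspace(X, M):
    """Toy PCA: rows of X are samples; returns the mean and the M leading
    eigenvectors of the data covariance matrix (largest eigenvalues first)."""
    mu = X.mean(axis=0)
    Xc = X - mu
    cov = Xc.T @ Xc / (len(X) - 1)
    w, V = np.linalg.eigh(cov)       # eigenvalues come back in ascending order
    return mu, V[:, ::-1][:, :M]     # keep the M largest-eigenvalue vectors

def project(x, mu, basis):
    """Map a sample into the M-dimensional feature subspace."""
    return (x - mu) @ basis

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 32))       # stand-in for vectorized face images
mu, basis = pca_subspace(X, M=5)
print(project(X[0], mu, basis).shape)   # (5,)
```

The same projection applied to every gallery and probe image yields the reduced feature vectors on which distances are computed.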

Correlation filter approaches represent the data in the frequency domain, using closed-form solutions derived for specific optimization criteria. One of the most popular correlation filters is the minimum average correlation energy (MACE) [6][8] filter. It is designed to minimize the average correlation-plane energy resulting from the training images, while constraining the correlation peak value at the origin to pre-specified values. Correlation outputs from MACE filters typically exhibit sharp peaks, making peak detection and localization relatively easy and robust. The closed-form expression for the vectorized MACE filter h is

h = D^-1 X (X^+ D^-1 X)^-1 u   (1)

where X is a d^2 x N complex-valued matrix (N is the number of training images and d^2 is the number of pixels in each image) whose i-th column contains the lexicographically re-ordered 2-D Fourier transform of the i-th training image, D is a d^2 x d^2 diagonal matrix containing the average power spectrum of the training images along its diagonal, ^+ denotes conjugate transpose, and u is a column vector containing the N pre-specified correlation values at the origin. Optimally trading off between noise tolerance and peak sharpness produces the optimal trade-off filters (OTF). The OTF filter vector is given by

h = T^-1 X (X^+ T^-1 X)^-1 u   (2)

where T = αD + √(1−α²)C with 0 ≤ α ≤ 1, and C is a d^2 x d^2 diagonal matrix whose diagonal elements C(k,k) represent the noise power spectral density at frequency k.

Correlation filters are well suited for biometric verification applications and have been shown to exhibit robustness to illumination variations and other distortions [5][6]. However, there is still one problem: by the pure nature of these correlation filter designs, they are wired to make use of the gallery set and, up to now, have not used the generic set to build a generic set of filter bases. In this paper, the proposed class-dependence feature analysis (CFA) overcomes this limitation and makes use of the generic dataset. Furthermore, we show that we can achieve better dimensionality reduction with the proposed approach than with traditional PCA methods: we reduce the dimensionality to 222 features even though we have 12,776 generic-set images (note that we make full use of all 12,776 images in this dimensionality-reduction step), and we can achieve verification rates close to 90% @ 0.1% FAR using these CFA features when fusing partial and holistic face regions. We show that the size of the CFA feature set is a function of the number of classes and not of the total number of images, which is beneficial for dimensionality reduction in face recognition applications. The novel CFA feature extraction method is explained in the next section.
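The closed-form MACE design of Eq. (1) can be sketched in NumPy as follows (a toy illustration of ours on random images, not the paper's implementation; D is kept as a vector since it is diagonal):

```python
import numpy as np

def mace_filter(images, u):
    """Illustrative MACE design (Eq. 1): h = D^-1 X (X^+ D^-1 X)^-1 u.
    images: (N, d, d) training images; u: (N,) desired correlation peaks
    at the origin. Returns the frequency-domain filter h as a vector."""
    # Columns of X are the vectorized 2-D FFTs of the training images.
    X = np.stack([np.fft.fft2(im).ravel() for im in images], axis=1)
    # D is diagonal (average power spectrum), so store only its diagonal.
    Dinv = 1.0 / (np.abs(X) ** 2).mean(axis=1)
    XdX = X.conj().T @ (Dinv[:, None] * X)          # X^+ D^-1 X  (N x N)
    return (Dinv[:, None] * X) @ np.linalg.solve(XdX, u.astype(complex))

rng = np.random.default_rng(1)
imgs = rng.normal(size=(3, 8, 8))
h = mace_filter(imgs, np.ones(3))
# The constraints X^+ h = u hold by construction: each training image
# correlates with h to give exactly its pre-specified origin peak of 1.
peaks = [np.vdot(np.fft.fft2(im).ravel(), h) for im in imgs]
print(np.allclose(peaks, 1.0))   # True
```

The OTF of Eq. (2) has the same structure, with D replaced by the trade-off matrix T.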

2. Class-dependence Feature Analysis (CFA)

In the proposed CFA approach, one filter (in our case a MACE filter) is designed for each of the 222 classes in the generic training set. Each filter takes as input all 12,776 generic face images in a one-against-all configuration and is designed in closed form to produce a correlation output of 1 at the origin for the authentic class (the class it is being designed for) and a correlation output of 0 for all images from the remaining 221 people. Once the design process is complete, feature extraction is the next step: given a test image y, we represent it by the correlation of that test image with the N MACE filters, i.e.

c = H^T y,   H = [h_mace-1  h_mace-2  ...  h_mace-N]   (3)

where h_mace-i is a filter trained to give a small correlation output (close to 0) for all classes except class i, as shown in Figure 2. In this example, the number of filters generated from the FRGC generic training set is 222, since it contains 222 classes (or subjects). Each input image y is then projected onto these basis vectors to yield a 222-dimensional correlation feature vector c. The similarity between a probe image and a gallery image is computed in this 222-dimensional feature space.
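The feature-extraction step of Eq. (3) is simply a stack of correlations, one per class; a minimal sketch (our own toy data, using the conjugate inner product since the frequency-domain vectors are complex):

```python
import numpy as np

def cfa_features(filters, y):
    """Eq. (3): stack one frequency-domain filter per generic-set class into H
    and represent a test image by its vector of correlation outputs c = H^+ y."""
    H = np.stack(filters)        # (n_classes, d2)
    return H.conj() @ y          # c: one correlation value per class

rng = np.random.default_rng(2)
n_classes, d2 = 222, 64          # 222 classes, as in the FRGC generic set
filters = rng.normal(size=(n_classes, d2)) + 1j * rng.normal(size=(n_classes, d2))
y = rng.normal(size=d2) + 1j * rng.normal(size=d2)
c = cfa_features(filters, y)
print(c.shape)   # (222,)
```

Note that c has one entry per class, so its length depends only on the number of generic classes, not on the 12,776 training images.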

Figure 2: The CFA algorithm; the filter response of y1 to h_mace-2 is distinct from that of y2 to h_mace-2.

Due to the nonlinear distortions in human facial appearance, linear subspace methods have not performed well in real face recognition applications, especially when representative training data is lacking. As a result, typical algorithms such as PCA and LDA have been extended to represent nonlinear features efficiently by mapping the data into a higher-dimensional feature space. Since such nonlinear mappings increase the dimensionality rapidly, which increases the computation and memory storage requirements, kernel-trick methods are used for computational efficiency: they enable us to obtain the necessary inner products in the higher-dimensional feature space without actually computing the higher-dimensional feature mappings. Examples are Kernel Eigenfaces and Kernel Fisherfaces [14]. The mapping function can be denoted as

Φ: x ↦ Φ(x)   (4)

Kernel functions defined by K(x, y) = Φ(x) · Φ(y) can be used without having to form the mapping explicitly, as long as the kernels form an inner product and satisfy Mercer’s theorem, ensuring that we are still working in a Hilbert inner-product space. Examples of kernel functions are:

Polynomial kernel: K(a, b) = (a · b + 1)^p   (5)
Radial Basis Function kernel: K(a, b) = exp(−||a − b||² / 2σ²)   (6)
Neural Net Sigmoidal kernel: K(a, b) = tanh(κ(a · b) + θ)   (7)
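Concrete forms of the three kernels above (the parameter names p, sigma, kappa and theta are our own notation, not fixed by the text):

```python
import numpy as np

def poly_kernel(a, b, p=2):
    """Polynomial kernel, Eq. (5)."""
    return (np.dot(a, b) + 1.0) ** p

def rbf_kernel(a, b, sigma=1.0):
    """Radial Basis Function kernel, Eq. (6)."""
    return np.exp(-np.sum((a - b) ** 2) / (2 * sigma ** 2))

def sigmoid_kernel(a, b, kappa=1.0, theta=0.0):
    """Neural-net sigmoidal kernel, Eq. (7)."""
    return np.tanh(kappa * np.dot(a, b) + theta)

a = np.array([1.0, 0.0])
b = np.array([0.0, 1.0])
print(poly_kernel(a, b), rbf_kernel(a, b), sigmoid_kernel(a, b))
```

Each function returns the inner product Φ(a) · Φ(b) in the implicit feature space without ever forming Φ.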

We show that we need to perform something similar in the next section by deriving the formulation of kernel correlation filters. This will allow us to exploit higher-order correlations in these kernel-induced feature spaces.


3. Kernel Correlation Filters

Kernel correlation filters can be derived from the linear advanced correlation filters using the kernel trick; however, we must first factor the filter expression, showing that a pre-whitening step is needed before a closed-form solution to the kernel correlation filter can be derived [6]. The correlation output of a filter h and an input image y (lexicographically re-ordered into a vector) can be expressed as

h^+ y = [D^-1 X (X^+ D^-1 X)^-1 u]^+ y
      = [(D^-0.5)(D^-0.5 X)((D^-0.5 X)^+ (D^-0.5 X))^-1 u]^+ y
      = [X'(X'^+ X')^-1 u]^+ y'   (8)

where X' = D^-0.5 X and y' = D^-0.5 y denote the pre-whitened versions of X and y. From now on, we assume that X' is already pre-whitened. Now we can apply the kernel trick to yield the kernel correlation filter as follows:

Φ(y)^+ Φ(h) = (Φ(y)^+ Φ(X'))(Φ(X')^+ Φ(X'))^-1 u = K(y, x'_i) [K(x'_i, x'_j)]^-1 u   (9)

We can use these kernel filters in the same CFA framework to obtain the same 222-dimensional kernel feature vector representation, which we refer to as the KCFA method. Figure 3 shows comparative experimental results using Eigenfaces (PCA), GSLDA, CFA, and KCFA on Experiment 4 of the FRGC. The PCA performance is the benchmark provided by NIST [13].
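Eq. (9) replaces every inner product in the filter response with a kernel evaluation; a minimal sketch of ours (toy pre-whitened vectors, RBF kernel, unit peak constraints):

```python
import numpy as np

def rbf_gram(A, B, sigma=1.0):
    """RBF kernel matrix: K[i, j] = exp(-||A_i - B_j||^2 / (2 sigma^2))."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def kcf_output(Xp, y, u, sigma=1.0):
    """Eq. (9): kernel correlation filter response for a test vector y, given
    pre-whitened training vectors Xp (as rows) and peak constraints u."""
    K = rbf_gram(Xp, Xp, sigma)             # K(x'_i, x'_j)
    k = rbf_gram(y[None, :], Xp, sigma)     # K(y, x'_i)
    return (k @ np.linalg.solve(K, u)).item()

rng = np.random.default_rng(3)
Xp = rng.normal(size=(5, 16))               # toy pre-whitened training vectors
u = np.ones(5)
# Feeding a training vector back in reproduces its constrained peak value.
print(round(kcf_output(Xp, Xp[0], u), 6))   # 1.0
```

Stacking one such kernel filter per generic-set class yields the 222-dimensional KCFA feature vector, exactly as in the linear CFA case.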


Figure 3: Performance comparison on FRGC Experiment 4 at 0.1% FAR. The kernel CFA shows the best results (nearly a 3-fold improvement) over all linear methods.

The similarity or distance measure between the target image and the probe image is important. Commonly used distance measures are the L1-norm, the L2-norm [7], the Mahalanobis distance, and the normalized cosine distance (given below), which exhibits the best results for the algorithms above, including both CFA and KCFA.

d(x, y) = −(x · y) / (||x|| ||y||)   (10)

where d denotes the similarity (or distance) between x and y. KCFA uses the distance measure in Eq. 10, and the resulting performance is shown above in Figure 3.
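Eq. (10) is a one-liner in practice; a small self-contained sketch:

```python
import numpy as np

def cosine_distance(x, y):
    """Eq. (10): normalized cosine distance; more negative means more similar."""
    return -np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

x = np.array([1.0, 0.0])
y = np.array([2.0, 0.0])
print(cosine_distance(x, y))   # -1.0 for perfectly aligned vectors
```

The normalization makes the measure invariant to the overall magnitude of the feature vectors, which is why it pairs well with correlation-derived features.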

4. Distance Measure in SVM Space

A direct use of support vector machines in the raw pixel space as a decision classifier may perform poorly under these distortions, since only a small number of training images is available to build each SVM. Another important factor is the large dimensionality of the data: we want to apply SVMs [10][11][12] in a reduced-dimensional feature space, otherwise training can take a significant amount of time and will require a large amount of physical computing memory. Therefore, instead of using the SVM as a classifier directly on the images, we use the KCFA projection coefficients for training the SVM classifier. Figure 4 shows a pictorial example of the virtual decision boundary and the distance measure in the SVM space.
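A hedged sketch of this step (toy data, and scikit-learn standing in for whatever SVM implementation the authors used): one linear SVM per gallery subject is trained one-against-all on 222-dimensional KCFA-style feature vectors, and the signed decision-function value serves as the similarity score.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(4)
n_subjects, per_subject = 4, 10           # toy gallery; the paper uses 466 subjects
feats = np.concatenate([rng.normal(loc=i, size=(per_subject, 222))
                        for i in range(n_subjects)])
labels = np.repeat(np.arange(n_subjects), per_subject)

# One one-against-all linear SVM per gallery subject.
svms = [LinearSVC(C=1.0).fit(feats, (labels == s).astype(int))
        for s in range(n_subjects)]

probe = feats[0]                          # a vector belonging to subject 0
scores = [clf.decision_function(probe[None, :])[0] for clf in svms]
print(int(np.argmax(scores)))             # 0: the correct subject scores highest
```

Because the score is a signed distance to the margin rather than a raw L2 distance, the verification threshold can be swept to trade off FAR against FRR, as described below.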

Figure 4: The decision boundary of a class and the distance measure in the SVM space. A direct use of the SVM may falsely indicate that image C5 is not the same person as the images inside the decision boundary.

The L2-norm distance without the SVM decision boundary w may be large between samples of the same class, causing poor performance. In this case, the L2-norm distance between C2 and C5 is greater than that between C6 and C5, causing a misclassification. However, if we project C5 onto w, the projection coefficients among samples of the same class will be small, and we can change the threshold distance depending on the desired FAR and FRR. Thus this approach allows flexible thresholds and better classification performance. We design 466 SVMs (in a one-against-all framework) using the gallery set of the FRGC data. The probe images are then projected onto the class-specific SVMs, which provide a projection similarity score. We show that this approach can boost the performance of all algorithms; however, it enhances our KCFA method the most, yielding nearly 85%. Note that this SVM approach


deviates from the pure 1-1 FRGCv2 protocol, because it makes use of multiple images per person instead of just computing independent pair-wise image comparisons between the probe and query sets. We justify our method by noting that in practical face recognition scenarios one has more than one image of the suspect being sought and would use those images for training; thus we show how to boost the performance of our KCFA approach with SVMs. As shown in Figure 5, the new distance measure in the SVM space produces better results than the normalized cosine distance. We also compared different kernel approaches such as KPCA and KDA with different distance measures, showing that the SVM+KCFA based method is superior to the other kernel approaches (Figure 5).


Figure 5: Performance comparison on FRGC Experiment 4 at 0.1% FAR. Utilizing SVM training and using it as a distance measure consistently boosts the performance of all methods, maximizing the performance of our KCFA approach.

5. Partial vs. Holistic Face Recognition

We have shown consistently that our KCFA approach yields the best performance over other methods, achieving 87.5% @ 0.1% FAR for holistic face recognition. It is interesting to then observe which parts of the face provide the most discriminative information. We therefore divided the face into three regions: the eye region, nose region and mouth region, as shown in Figure 6. We observe that the eye/eyebrow region consistently out-performs all other facial regions by a large margin of separation. At 0.1% FAR, the eye region yields a verification rate of 83.1%, compared to 53% for the mouth region and 50% for the nose region. This result is substantial and can be explained by the fact that many of the available target images are not representative of the query-image variations, and in general the uncontrolled images contain significant amounts of facial expression. Thus regions such as the mouth and nose are greatly affected by facial expressions such as smiling, which also change the appearance of the nose and produce wrinkles around it as the expression changes.

Figure 6: (A) Eye-region images, (B) nose-region images, and (C) mouth-region images of a subject in FRGC (controlled illumination).

Figure 7: VR vs FAR ROC III curves for FRGC experiment 4 using KCFA for each of the three facial regions.

We thus split the generic, target and query images into these three facial-region crops, ran our KCFA algorithm on each region separately to obtain a 222-dimensional feature vector per facial region, applied our SVM distance metric, and recorded the ROC-III curves for each facial region in Figure 7. The eye region is less affected by such variations in comparison and is only affected by extremely severe expression variations. This is one explanation of the empirical findings in this paper. Another possible contributing factor is registration: all the face images have been aligned based on eye coordinates, so if a person is at a particular off-angle pose, there will be a mismatch between the other facial regions. We tried fusing the eye-region features with the holistic features by feeding the corresponding 222-dimensional feature vectors to the same SVM classifier; this produced a further boost in performance to 90% @ 0.1% FAR.
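One plausible reading of this fusion step (our own sketch; the text does not spell out the exact mechanics) is simple feature concatenation before SVM training, so each sample becomes a single 444-dimensional input:

```python
import numpy as np

rng = np.random.default_rng(5)
eye_feats = rng.normal(size=(10, 222))       # toy stand-in: eye-region KCFA features
holistic_feats = rng.normal(size=(10, 222))  # toy stand-in: holistic KCFA features

# Concatenate the two 222-dim vectors per sample into one fused vector
# that a single per-subject SVM could then be trained on.
fused = np.concatenate([eye_feats, holistic_feats], axis=1)
print(fused.shape)   # (10, 444)
```

Score-level fusion (combining the two SVM similarity scores instead of the features) would be an equally plausible alternative reading.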

6. CONCLUSIONS

Due to nonlinear face distortions and the poor quality of face images acquired under uncontrolled conditions, linear approaches such as PCA, LDA, and CFA may not be able to represent or discriminate facial features efficiently. We show that our proposed Kernel Correlation Feature Analysis outperforms the above linear methods as well as their nonlinear (kernel) versions (KDA, KPCA). The proposed approach is a very efficient dimensionality reduction method, reducing the representation of the 12,776 generic facial images to a KCFA feature vector of size 222, since KCFA depends on the number of generic classes and not on the total number of generic training images.

Figure 7: VR vs FAR for FRGC experiment 4 for different methods using normalized cosine distances and SVM space.

This is shown to generalize well on FRGC-II data in pure one-to-one matching (68-72% @ 0.1% FAR); furthermore, when we use a Support Vector Machine (SVM) trained on the target set (in the 222-dimensional KCFA feature space), we are able to boost performance significantly. We show that this boost in performance also holds when SVMs are used in conjunction with other methods such as SVM (on raw images), KDA and KPCA; however, the maximum performance is achieved using our proposed KCFA+SVM method. We also analyze how partial face recognition performs using the eye, nose and mouth regions, and find that the eye region performs remarkably well (83.1% @ 0.1% FAR). This is intuitive, since facial expressions greatly affect the regions around the mouth but leave the eye region relatively unaffected. When using an SVM to fuse feature information from the eye region and the holistic face, we achieve the greatest boost in performance, to ~90% verification @ 0.1% FAR.

7. ACKNOWLEDGEMENTS We would like to thank the United States Technical Support Working Group (TSWG) for supporting and sponsoring this research and Carnegie Mellon University CyLab.

References

[1] M. Turk and A. Pentland, “Eigenfaces for Recognition,” Journal of Cognitive Neuroscience, Vol. 3, pp. 72-86, 1991.

[2] P. Belhumeur, J. Hespanha, and D. Kriegman, “Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection,” IEEE Trans. PAMI, Vol. 19, No. 7, pp. 711-720, 1997.

[3] L.F. Chen, H.Y.M. Liao, M.T. Ko, J.C. Lin, and G.J. Yu, “A new LDA-based face recognition system which can solve the small sample size problem,” Pattern Recognition, Vol. 33, pp. 1713-1726, 2000.

[4] W. Zheng, C. Zou, and L. Zhao, “Real-Time Face Recognition Using Gram-Schmidt Orthogonalization for LDA,” IEEE Conf. Pattern Recognition (ICPR), pp. 403-406, 2004.

[5] M. Savvides, B.V.K. Vijaya Kumar and P.K. Khosla, “Face verification using correlation filters,” Proc. of Third IEEE Automatic Identification Advanced Technologies, Tarrytown, NY, pp. 56-61, 2002.

[6] M. Savvides, B.V.K. Vijaya Kumar and P.K. Khosla, “Corefaces – Robust Shift Invariant PCA based Correlation Filter for Illumination Tolerant Face Recognition,” IEEE Computer Vision and Pattern Recognition (CVPR), Vol. 2, pp. II-834-841, July 2004.

[7] C. Xie, M. Savvides, and B.V.K. Vijaya Kumar, “Redundant Class-Dependence Feature Analysis Based on Correlation Filters Using FRGC2.0 Data,” IEEE Conf. Computer Vision and Pattern Recognition (CVPR), 2005.

[8] A. Mahalanobis, B.V.K. Vijaya Kumar, and D. Casasent, “Minimum average correlation energy filters,” Appl. Opt. 26, pp. 3633-3640, 1987.

[9] K. Fukunaga, Introduction to Statistical Pattern Recognition (2nd Edition), New York: Academic Press, 1990.

[10] V. N. Vapnik, The Nature of Statistical Learning Theory, New York: Springer-Verlag, 1995.

[11] B. Scholkopf, Support Vector Learning, Munich, Germany: Oldenbourg-Verlag, 1997.

[12] P. J. Phillips, “Support vector machines applied to face recognition,” Advances in Neural Information Processing Systems 11, M. S. Kearns, S. A. Solla, and D. A. Cohn, eds., 1998.

[13] P. J. Phillips, P. J. Flynn, T. Scruggs, K. W. Bowyer, J. Chang, K. Hoffman, J. Marques, J. Min, and W. Worek, “Overview of the Face Recognition Grand Challenge,” IEEE Conf. Computer Vision and Pattern Recognition (CVPR), 2005.

[14] M.H. Yang, “Kernel Eigenfaces vs. Kernel Fisherfaces: Face Recognition using Kernel Methods,” IEEE Conf. Automatic Face and Gesture Recognition, pp. 215-220, 2002.

[15] R. Gross and V. Brajovic, “An Image Preprocessing Algorithm for Illumination Invariant Face Recognition,” 4th International Conf. on Audio- and Video-Based Biometric Person Authentication (AVBPA), 2003.
