
4780 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 24, NO. 12, DECEMBER 2015

Spartans: Single-Sample Periocular-Based Alignment-Robust Recognition Technique Applied to Non-Frontal Scenarios

Felix Juefei-Xu, Student Member, IEEE, Khoa Luu, Member, IEEE, and Marios Savvides, Member, IEEE

Abstract— In this paper, we investigate a single-sample periocular-based alignment-robust face recognition technique that is pose-tolerant under unconstrained face matching scenarios. Our Spartans framework starts by utilizing one single sample per subject class and generates new face images under a wide range of 3D rotations using the 3D generic elastic model, which is both accurate and computationally economical. Then, we focus on the periocular region, where the most stable and discriminant features on human faces are retained, and marginalize out the regions beyond the periocular region since they are more susceptible to expression variations and occlusions. A novel facial descriptor, high-dimensional Walsh local binary patterns, is uniformly sampled on facial images with robustness toward alignment. During the learning stage, subject-dependent advanced correlation filters are learned for pose-tolerant non-linear subspace modeling in kernel feature space, followed by a coupled max-pooling mechanism which further improves the performance. Given any unconstrained unseen face image, Spartans can produce a highly discriminative matching score, thus achieving a high verification rate. We have evaluated our method on the challenging Labeled Faces in the Wild database and solidly outperformed the state-of-the-art algorithms under four evaluation protocols with a high accuracy of 89.69%, a top score among image-restricted and unsupervised protocols. The advancement of Spartans is also proven on the Face Recognition Grand Challenge and Multi-PIE databases. In addition, our learning method based on advanced correlation filters is much more effective, in terms of learning subject-dependent pose-tolerant subspaces, compared with many well-established subspace methods in both linear and non-linear cases.

Index Terms— Periocular-based recognition, pose tolerance, alignment robustness, unconstrained face recognition, single-sample 3D face synthesis, advanced correlation filters.

I. INTRODUCTION

The algorithms of face recognition under controlled conditions have now become mature. However, the problem of unconstrained face recognition remains unsolved: for example, in the Boston Marathon bombing case [1], law enforcement failed to find a match of the unconstrained face of the bombing suspect against the government database.

Manuscript received May 7, 2015; revised July 27, 2015 and August 10, 2015; accepted August 10, 2015. Date of publication August 13, 2015; date of current version September 18, 2015. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Christine Guillemot.

The authors are with the CyLab Biometrics Center, Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA 15213 USA (e-mail: [email protected]; [email protected]; [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TIP.2015.2468173

Fig. 1. Examples of facial appearance changes with different poses, illumination, expressions, occlusion, and their corresponding less affected periocular regions in the LFW database.

The face was captured from a cell phone and had off-angle pose, illumination variation, low resolution, and partial occlusion, as shown in Figure 2. Many complex factors can affect the appearance of a face image in real-world scenarios, and providing tolerance to these factors is the main challenge for researchers. Among these factors, pose is often the most important one to deal with. It is known that as facial pose deviates from a frontal view, most face recognition systems have difficulty performing recognition robustly. In order to handle a wide range of pose changes, it becomes necessary to utilize the 3D structural information of faces. However, many of the existing 3D face modeling schemes [2]–[4] have drawbacks, such as computation time and complexity, that make them difficult to apply in real-world, large-scale unconstrained face recognition scenarios. Recently, the 3D generic elastic model (3D-GEM) approach was proposed as an efficient and reliable 3D modeling method from a single 2D image. Heo [5] and Prabhu et al. [6] claim that the depth information of a face is not extremely discriminative when factoring out the 2D spatial location of facial features. In our method, we follow this idea and observe that fairly accurate 3D models can be generated by using only one single frontal image at a relatively low computational expense compared to the previous approaches.

Besides the challenges from pose, unconstrained faces also contain numerous challenging facial expressions, various illuminations, and occlusions. Therefore, full-face-based approaches are not good candidates to solve the problem. Meanwhile, the use of the periocular region as a biometric trait has recently gained much attention among researchers due to the fact that the most important features in human faces are located around the eye regions [7]–[15]. More importantly, periocular regions are less affected by facial expressions [16], illumination variations [17], [18], and occlusions [19], [20], as shown in Figure 1. Recognition using the periocular region actually represents a better alternative to unconstrained full-face recognition.

1057-7149 © 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.


Fig. 2. (a) Low quality image of the Boston Marathon bombing suspect released by the FBI on April 18, 2013. (b) High quality passport photo of the suspect. Law enforcement failed to match the two when the first is captured under such an unconstrained condition.

Fig. 3. Overall framework of the proposed Spartans method.

A. Contributions of This Work

In this paper, we focus on the periocular region from 3D pose-tolerant synthesized images to deal with unconstrained face matching problems. By utilizing only one single image per subject in the training corpus, we synthesize a set of new face images under a wide range of 3D rotations based on 3D-GEM and segment out only the periocular regions. Then, the features of the periocular regions are extracted using high-dimensional Walsh local binary patterns (WLBP), a novel frequency-domain LBP encoding scheme that greatly outperforms pixel-domain LBP. Following this, an optimal feature subset is selected using Laplacian regularized A-optimality. Finally, coupled max-pooling kernel class-dependence feature analysis (CoMax-KCFA) is harnessed to learn discriminant non-linear subspaces for classification. Given unseen unconstrained face images, our proposed Spartans method can produce highly discriminative matching scores. Figure 3 shows the overall framework of our proposed approach.

With our understanding of periocular biometrics and 3D-GEM from previous publications [5], [21], we are confident in setting up a general framework where we model the periocular region using 3D-GEM to tackle real-world unconstrained face matching problems. Specifically, we carry out our evaluation on the challenging LFW database [22], and Spartans solidly outperforms all the state-of-the-art algorithms under four evaluation protocols. The advancement of Spartans is also shown using the FRGC [23] and MPIE [24] databases. Additionally, the 3D-GEM reconstruction accuracy is justified using the MPIE database.

Our Spartans approach does not require precise alignment of the facial images. Both the high-dimensional feature extraction stage and the pose-invariant face recognition stage are alignment-robust. It can also be agnostic about the pose of the query images because the pose invariance is built in during the design of the advanced correlation filters, unlike other pose-invariant face recognition techniques that require either precise alignment or pose estimation of the query images.

Moreover, in order to show the effectiveness of our proposed approach in terms of learning subject-dependent pose-tolerant subspaces, we introduce a new evaluation protocol on the LFW to show that our proposed discriminant subspace learning scheme can achieve the highest recognition accuracy, in both linear and non-linear cases, compared to other well-established subspace learning methods, including principal component analysis (PCA) [25], linear discriminant analysis (LDA) [26], locality preserving projections (LPP) [27], and unsupervised discriminant projection (UDP) [28].

Our paper is organized as follows. In Section II, we review prior work on periocular and unconstrained face recognition. In Section III, we describe our proposed Spartans framework. Section IV details the experimental setup and the comparison of our approach against existing methods on three databases, with emphasis on the LFW database evaluation. Finally, we present our conclusions in Section V.

II. BACKGROUND

One of the earliest studies on periocular biometric identification can be traced back to 2009, when Park et al. [21] studied the feasibility of using periocular images of an individual as a biometric trait. They showed a 77% rank-1 identification rate on a rather small database (958 images from 30 subjects) for people identification. Pioneered by this, more work was carried out on using the periocular region for various biometric tasks, such as gender, ethnicity, and aging-invariant people identification. Juefei-Xu et al. [7] were the first to evaluate the performance of periocular biometrics on FRGC ver2.0. They proposed various local feature sets for extracting local features in periocular regions. Merkow et al. [29] explored gender identification using only the periocular region; they showed that at least an 85% classification rate is still obtainable using only the periocular region with a database of 936 low resolution images. Similarly, Lyle et al. [30] also studied gender and ethnicity classification using periocular region features. They used a subset of FRGC and obtained classification accuracies of 93% and 91% for gender and ethnicity respectively. Miller et al. [31] studied the effect of input periocular image quality on recognition performance, and the texture information present in different color channels. Woodard et al. [32] utilized periocular region appearance cues for biometric identification on images captured in both the visible and NIR spectrum, while Park et al. [33] studied periocular biometrics in the visible spectrum. More recently, Juefei-Xu et al. [8] looked into the periocular region for age-invariant face recognition, where they determined that the periocular region is more stable across ages than the full face.

There are also numerous studies on unconstrained face images, particularly using the LFW database. Pinto et al. [34] introduced V1-like and V1-like+ features to represent input faces. The V1-simple-cell-like unit is computed from the normalized image and the responses of 96 spatially local Gabor wavelets. These extracted features are classified using multiple kernel learning (MKL) with support vector machines (SVM), achieving a high accuracy of 79.35% on the LFW. Guillaumin et al. [35] presented two approaches for learning robust distance measures,


namely a logistic discriminant approach to learn the metric from a set of labeled image pairs, and a nearest neighbor approach to compute the probability that two images belong to the same class. They evaluated on the LFW with 79.3% accuracy for the restricted setting. Yin et al. [36] presented the "associate-predict" model, built on a given generic identity dataset, to measure the similarity between human faces, where every input face is first associated with similar identities from the generic identity dataset. Then, based on these associated representations, they predict whether the appearance of a new input face belongs to the same subject. They achieved 90.57% accuracy on the LFW. However, their method requires an additional identity dataset to train the system. Seo and Milanfar [37] presented the locally adaptive regression kernel (LARK) descriptors to measure self-similarity based on a "signal-induced distance" between a center pixel and locally surrounding neighbor pixels. When training data are available, one-shot similarity (OSS) is conducted based on LDA. They achieved good results on the image-restricted training setting with 78.90% accuracy.

More recently, many works using dense facial descriptors achieve state-of-the-art performance. Cui et al. [38] represent each partitioned block by sum-pooling the nonnegative sparse codes of position-free patches sampled within the block in order to obtain a spatial face region descriptor. Together with a pairwise-constrained multiple metric learning scheme, they have reported good performance for both image and video face verification tasks. Simonyan et al. [39] fit a Gaussian mixture model on densely sampled SIFT features to obtain high-dimensional Fisher vectors. Compact descriptors can be obtained through large-margin dimensionality reduction and discriminative metric learning. Chen et al. [40] densely sample LBP or SIFT features around accurate facial fiducial points in a multi-scale manner, forming very high-dimensional features (100K-dim). The high dimensionality is then mapped down using rotated sparse regression. They point out that as dimensionality increases, many commonly used features can all achieve better performance. The pose tolerance is achieved by computing descriptors around the key points, and the performance relies on how well these key points are localized.

Li et al. [41] proposed a probabilistic elastic matching framework for pose-variant face verification, where they first take a part-based representation by extracting local features like LBP and SIFT from densely sampled multi-scale image patches. Their feature incorporates both the descriptor and its location. With that, they train a Gaussian mixture model, later adapted to a universal background model, to capture the spatial-appearance distribution of all faces in the training set. The key step in their approach is to formulate a difference vector for a pair of face images, which is committed to each Gaussian component. A kernel SVM is then trained on the difference vector to achieve invariant matching.

Yi et al. [42] developed a pose adaptive filter approach where they transform the filters according to the pose and shape of the face image and use the pose-adapted filters for feature extraction. They start by building a 3D deformable model using 3DMM, with 34 facial landmarks. Then, given an off-angle 2D image, the pose and shape are estimated by 3D model fitting, and Gabor filters are used for feature extraction. There are several reasons for it to be pose robust: holistic rigid transformation, non-rigid shape deformation, local Gabor filtering, and "half-face" mirroring by facial symmetry to deal with self-occlusion.

III. OUR PROPOSED Spartans APPROACH

In this section, we detail each of the major building blocks of our proposed Spartans approach. First, the modified active shape model (MASM) [43] is employed for facial landmarking in the training stage, and then a 3D model from one single image is generated using 3D-GEM. Novel 2D pose-synthesized images can be easily obtained by utilizing the 3D-GEM framework. Second, we extract robust space-frequency features from each periocular region using high-dimensional Walsh LBP. This way, more discriminative feature representations of the periocular region can be achieved. Third, an optimal feature subset is selected using Laplacian regularized A-optimality. Finally, we design a class-specific representation based on advanced correlation filters in kernel feature space to achieve a compact representation while further enhancing the discriminative power of the periocular representations. With the coupled max-pooling technique, which strategically combines information from both the MACE filters (designed in the frequency domain) and the ECPSDF filters (designed in the space domain), the performance is further improved.

A. 3D Generic Elastic Model

The key assumption in 3D-GEM is that the depth information z of a face is not significantly discriminative among people. A fairly accurate 3D face model can be generated by utilizing a set of generic depth information. The overview of the 3D-GEM framework is as follows. 3D-GEM can be utilized for modeling a realistic 3D face from a single image automatically. Formally, this problem can be stated as follows: given a face image I, automatically extract the input face landmarks S_{2×n} and assign the depth information using a generic depth model. To achieve this, the x and y information needs to be estimated from the input image. The depth information can be recovered by deforming the 3D-GEM depth-map based on the input observations. The overall procedure is shown in Figure 4. Based on the detected shape, each face I is partitioned into a mesh of triangular polygons P. Similarly, the generic depth-map D is partitioned into a mesh M from predefined landmark points. After registering points between the input image and the generic depth-map, we increase the point density simultaneously using a subdivision method. The subdivision method used here can be considered an intermediate step for establishing dense correspondence between the input mesh and the depth-map. A piecewise affine transform W [44] is used for warping the depth-map D, sampled at the spatial locations of M, onto the input triangle mesh P in order to estimate the depth information. Each point in the input image has an exact corresponding point in the depth-map, and the intensity of the depth-map can be used for the estimation of the depth in the input image. Finally, the intensity of the input image I(P(x, y)), sampled at the spatial locations of P(x, y), is mapped onto the 3D shape.


Fig. 4. Framework of the 3D-GEM approach for novel 2D pose synthesis from one single image using the 3D modeling algorithm.

Therefore, we represent the reconstructed 3D face as follows:

S_r = (x, y, z = D(M(x, y)))   (1)

T_r = I(P(x, y, z)) = (R_{x,y,z}, G_{x,y,z}, B_{x,y,z})   (2)

where x and y in M are the registered points x and y in image P. This representation with shape S_r and texture T_r can also be achieved using 3DMMs. By using the 3D-GEM approach, we can easily obtain novel 2D pose images in less than a few seconds in a fully automated manner. Based on these 2D images, we then extract features on each periocular region, as discussed in the following Section III-B.
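To make Equations (1) and (2) concrete, the following is a minimal sketch in Python/NumPy of the final step only, under the assumption that the correspondence pipeline (landmarking, subdivision, piecewise affine warping) has already produced a generic depth-map registered pixel-wise to the input image. The function names and the orthographic z-buffer renderer are illustrative choices, not the authors' implementation.

import numpy as np

def lift_to_3d(image, depth):
    """Attach generic depth to every pixel: S_r = (x, y, z = D(M(x, y)))
    and T_r = I(P(x, y, z)). `depth` is assumed pre-registered to `image`."""
    h, w = depth.shape
    ys, xs = np.mgrid[0:h, 0:w]
    pts = np.stack([xs.ravel(), ys.ravel(), depth.ravel()], axis=1).astype(float)
    tex = image.reshape(-1, image.shape[-1])   # one (R, G, B) triple per 3D point
    return pts, tex

def synthesize_yaw(pts, tex, yaw_deg, out_shape):
    """Rotate the textured point cloud about the vertical axis and re-render
    by orthographic projection with a z-buffer (larger z assumed closer)."""
    h, w = out_shape
    c = pts.mean(axis=0)
    t = np.deg2rad(yaw_deg)
    R = np.array([[np.cos(t), 0.0, np.sin(t)],
                  [0.0,       1.0, 0.0      ],
                  [-np.sin(t), 0.0, np.cos(t)]])
    rot = (pts - c) @ R.T + c
    out = np.zeros((h, w, tex.shape[1]))
    zbuf = np.full((h, w), -np.inf)
    us = np.clip(np.round(rot[:, 0]).astype(int), 0, w - 1)
    vs = np.clip(np.round(rot[:, 1]).astype(int), 0, h - 1)
    for u, v, z, rgb in zip(us, vs, rot[:, 2], tex):
        if z > zbuf[v, u]:                     # keep the nearest point per pixel
            zbuf[v, u] = z
            out[v, u] = rgb
    return out

# Toy usage with random stand-ins for a registered image/depth-map pair.
img, dm = np.random.rand(64, 64, 3), np.random.rand(64, 64)
pts, tex = lift_to_3d(img, dm)
view_15deg = synthesize_yaw(pts, tex, 15.0, (64, 64))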

B. Periocular Feature Extraction

We detail our proposed feature extraction technique in this section. Each periocular image is coarsely aligned¹ and rotated based on eye coordinates, and down-sampled or up-sampled to a set resolution of 50×128. Then Walsh local binary patterns [8], [45] are applied to extract periocular features. After applying the Walsh transformation, we obtain regional high and low frequencies, followed by LBP for dimensionality reduction. This way, locally discriminative information is captured and can be used for recognition tasks.

Frequency-domain LBP yields higher tolerance to local distortion caused by pose variations, lighting changes, expression, and so forth. Consider a very simple case where a local region is slightly distorted: the space-domain LBP would return a totally different value characterizing this local region. In the frequency domain, the coefficients will also change. However, since we only care about the ordinal information among those coefficients in order to build the frequency-domain LBP, this is where the robustness comes in. In other words, even when the absolute values of the coefficients change, their relative ordering should remain consistent under small distortions.

We first formulate convolution filters such as Walsh masks [46] to capture local image characteristics. Such orthonormal bases are obtained from the Walsh function:

ω_{2j+q}(t) = (−1)^{⌊j/2⌋+q} [ω_j(2t) + (−1)^{j+q} ω_j(2t − 1)],   (3)

where ⌊j/2⌋ denotes the integer part of j/2 and q ∈ {0, 1}. We then sample each function at the integer points only, producing the following vectors ω_i for size 5:

Ω_ω = [ω_0; ω_1; ω_2; ω_3; ω_4] =
    [ +1 +1 +1 +1 +1 ]
    [ −1 −1 −1 +1 +1 ]
    [ −1 −1 +1 +1 −1 ]
    [ +1 +1 −1 +1 −1 ]
    [ +1 −1 +1 +1 −1 ]

All possible combinations of Walsh vectors can be used to produce 2D basis images. As shown in Figure 5(a), there are 9 basis images using a 3 × 3 kernel and 25 using a 5 × 5 kernel.

¹Note that we do not need very precise alignment due to the nice shift-invariant property of correlation filters.

For any given image patch Υ, we can represent it using the Walsh transformation matrix Ω_ω we just obtained:

G = Ω_ω Υ Ω_ω^⊤,   (4)

where G is the output Walsh coefficient matrix for image patch Υ. This equation holds for even-sized kernels, where the transformation matrix Ω_ω constructed from the Walsh function is orthogonal and thus its inverse is its transpose.

When the kernel is of odd size, i.e., 3 × 3, 5 × 5 and 7 × 7, the odd-sized Walsh vectors ω_i yield odd-sized Walsh transformation matrices Ω̃_ω which are no longer orthogonal. In order to invert Equation 4, we make use of the inverse of Ω̃_ω. Then we have [46]:

Ω̃_ω^{−1} G̃ (Ω̃_ω^⊤)^{−1} = Υ̃.   (5)

So, we are able to use the inverse of the transformation matrix Ω̃_ω to process the image:

G̃ = (Ω̃_ω^{−1})^⊤ Υ̃ Ω̃_ω^{−1},   (6)

where G̃, Υ̃ and Ω̃_ω are the odd-sized counterparts of G, Υ and Ω_ω respectively.

In our evaluation, odd-sized kernels are used to capture the local features. Once we have the Walsh coefficients, LBP is applied for dimensionality reduction. When using a 7×7 kernel, the feature dimensionality is 49 times the size of the image after the Walsh transformation. After LBP encoding, the feature dimensionality can be reduced to be the same as the size of the input periocular image. Figure 5 (b-d) shows examples of LBP encoding for odd-sized kernels, where each gray-scale block represents one Walsh coefficient. In the LBP method, neighboring pixel intensities are compared with a "pivot" (usually the center pixel, in the orange box) and thresholded against it. Our experiments find that using a non-centered pivot yields better performance, as shown in the green box of Figure 5.
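As an illustration of the encoding just described, the sketch below (Python with NumPy/SciPy) computes per-pixel Walsh coefficient maps by convolving the image with each 2D basis image and then LBP-encodes every map against a configurable pivot offset. This is a simplified reading, not the authors' code; in particular, the 3-point Walsh vectors w3 are placeholders for illustration (the actual 3 × 3 basis is the one shown in Figure 5(a)).

import numpy as np
from scipy.signal import convolve2d

def walsh_basis_2d(vecs):
    # All pairwise outer products of the Walsh vectors give the 2D basis images.
    return [np.outer(a, b) for a in vecs for b in vecs]

def wlbp(image, walsh_vecs, pivot=(0, 0), base=2):
    """Sketch of WLBP: filter the image with each 2D Walsh basis, then
    LBP-encode each coefficient map against a chosen pivot offset."""
    codes = []
    offsets = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)
               if (dy, dx) != pivot]
    for basis in walsh_basis_2d(walsh_vecs):
        c = convolve2d(image, basis, mode='same')   # per-pixel Walsh coefficient
        pivot_map = np.roll(c, (-pivot[0], -pivot[1]), axis=(0, 1))
        code = np.zeros_like(c)
        for bit, (dy, dx) in enumerate(offsets):
            neighbor = np.roll(c, (-dy, -dx), axis=(0, 1))
            # Ordinal comparison against the pivot coefficient (base 2 here;
            # the paper also explores other bases).
            code += (neighbor >= pivot_map) * (base ** bit)
        codes.append(code)
    return np.stack(codes)

# Placeholder 3-point Walsh vectors for a 3x3 kernel (illustrative only).
w3 = [np.array([1, 1, 1]), np.array([1, -1, -1]), np.array([-1, 1, -1])]
features = wlbp(np.random.rand(50, 128), w3, pivot=(0, 1))  # non-centered pivot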

Chen et al. [40] showed that high-dimensional LBP yields much higher face verification performance. We also find it true that building a higher-dimensional LBP (a dense description of the face image) gives a better representation of the face. The way we build the high-dimensional LBP is very different from Chen et al. for the following two reasons.

First, Chen et al. extract the LBP features around each landmark of a given face image. This requires very precise facial alignment, while our approach samples uniformly on the facial image, which is alignment-robust in the sense that we do not need very precise alignment.


Fig. 5. Illustration of our proposed WLBP feature extraction method: (a) expand each 3 × 3 image patch using 9 Walsh basis images (left) or 5 × 5 image patch using 25 Walsh basis images (right); (b-d) LBP encoding on Walsh coefficients for odd-sized kernels (orange box shows the centered pivot whereas green box shows a non-centered pivot); (e) WLBP features of different kernel sizes on a periocular image.

Second, Chen et al. build high-dimensional LBP features by a multi-scale sampling technique, but again at each scale they use the same LBP descriptor. In our approach, the high-dimensional LBP comes from considering multiple kernel sizes, different choices of pivot points, and different bases for converting to a decimal number (base 2 is usually used, but we can use any arbitrary base). By doing this, we can achieve a very rich descriptor of the face image. The resulting high-dimensional WLBP features are used as the inputs to optimal feature selection, as discussed in the following Section III-C.

C. Optimal Feature Selection

After the WLBP features are obtained, we conduct extensive experiments to test the effectiveness of the feature as well as its robustness for matching pairs of unconstrained facial images. In order to achieve optimal learning performance, we resort to a feature selection technique under Laplacian regularization [47], and we find that using only a subset of the features yields better results. This is consistent with the conceptual understanding of human faces, where some parts are supposed to be more discriminative than others.

We follow the work of He et al. [47] on Laplacian regularized A-optimal feature selection (LapAOFS) for obtaining the optimal subset of our features. The goal is to find a feature subset that (1) is geometrically faithful to the underlying manifold structure, and (2) maximally improves the learning performance when the selected features are used for tasks such as linear regression.

Let X^F = (x_1, ..., x_m) be a data ensemble, where x_i ∈ R^n is the WLBP feature for the i-th periocular image and m is the total number of images in this ensemble. We define the feature set F = {f_1, ..., f_n}, where f_i ∈ R^m corresponds to one feature point across all images. We now want to find a subset S = {s_1, ..., s_k} ⊂ F that gives the optimal classification output. Let X^S be a masked version of X^F where only the features in S are kept, and let x_i^S be the i-th column of X^S.

Now consider a linear regression model [47] using only the selected feature subset S: y = w^⊤x^S + ε, where ε is an unknown error with zero mean. Here y_i is the label of the i-th image, and the maximum likelihood estimate (MLE) of the weight vector, w_mle, can be obtained by minimizing the sum squared error:

J_sse(w) = Σ_{i=1}^{m} (w^⊤x_i^S − y_i)².

Further considering faithfulness towards the intrinsic manifold structure, the Laplacian regularizers are fused into the sum squared error, and the new objective function becomes:

J(w) = (λ_1/2) Σ_{i,j=1}^{m} (w^⊤x_i^S − w^⊤x_j^S)² A_{ij} + λ_2 ‖w‖²   [Laplacian regularizer]
       + Σ_{i=1}^{m} (w^⊤x_i^S − y_i)²   [sum squared error],   (7)

where A is a binary adjacency matrix based on the KNN of neighboring data. The solution to this objective function is [48]:

w = w_mle = H^{−1} X^S y,

where L = D − A is the graph Laplacian, which is positive semidefinite (PSD), and D is a diagonal matrix with D_ii = Σ_j A_ij. Next, we want to compute the covariance matrix of w and the expected squared prediction error. Define

H = X^S(X^S)^⊤ + λ_1 X^S L (X^S)^⊤ + λ_2 I   (8)

Λ = Λ(λ_1, λ_2) = λ_1 X^S L (X^S)^⊤ + λ_2 I   (9)

Thus, w = H^{−1}X^S y, where y = (X^S)^⊤w + ε. Then, with the knowledge that Cov(y) = σ²I, we have the covariance matrix of w [47]: Cov(w) ≈ σ²H^{−1} (for small λ_1, λ_2).

Also, let ŷ = w^⊤x_i^S be the predicted observation of x_i^S; the expected squared prediction error is [47]:

E(ŷ − y)² ≈ σ² + σ²(x_i^S)^⊤H^{−1}x_i^S   (for small λ_1, λ_2).

By minimizing the size of H^{−1}, both the covariance of the data and the expected squared prediction error can be minimized [47]. Thus, the optimal feature subset can be found by solving a Laplacian regularized A-optimality problem, which minimizes the trace of the parameter covariance matrix:

min_{S⊂F} Tr(H^{−1}) = min_{S⊂F} Tr((L̃ + Σ_{i=1}^{k} s_i s_i^⊤)^{−1} L̃)   (10)

                     = min_{S⊂F} Tr((L̃ + Σ_{i=1}^{n} α_i f_i f_i^⊤)^{−1} L̃)   (11)


Fig. 6. Framework of the training and testing stages of our Spartans method. In our implementation, the inputs to KCFA are the optimal WLBP features.

where L̃ = λ_2(I + λ_1 L)^{−1} and k is the total number of selected features. So the optimization in Equation 10 becomes:

min_α Tr{ [L̃ + (X^F)^⊤ diag(α_1, α_2, ..., α_n) X^F]^{−1} L̃ }

subject to Σ_{i=1}^{n} α_i = k, α_i ∈ {0, 1}.

After applying sequential optimization [47], [48], the (k+1)-th optimal feature is given by:

s_{k+1} = argmax_s [ s^⊤ L̃_k^{−1} L̃ L̃_k^{−1} s ] / [ 1 + s^⊤ L̃_k^{−1} s ]   (12)

where L̃_k = L̃ + Σ_{i=1}^{k} s_i s_i^⊤. Thus, we can obtain the k optimal features sequentially. In our approach, we keep k = 6400. A more detailed derivation can be found in [47]. After the periocular optimal features are acquired, we feed them into the CoMax-KCFA subspace modeling described in the following Section III-D.
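A minimal sketch of the sequential selection in Equation (12), in Python/NumPy; this is an illustrative reading of [47], not the authors' code. XF is assumed to hold one feature f_i ∈ R^m per row, L_tilde is the m × m matrix L̃ defined above, and a Sherman-Morrison rank-one update keeps L̃_k^{−1} cheap to maintain across iterations.

import numpy as np

def lap_aofs(XF, L_tilde, k):
    """Greedy Laplacian regularized A-optimal feature selection (Eq. 12):
    pick the feature s maximizing
    s^T Lk^{-1} L~ Lk^{-1} s / (1 + s^T Lk^{-1} s)."""
    n_features = XF.shape[0]
    Lk_inv = np.linalg.inv(L_tilde)            # inverse of L~_0 = L~
    selected, remaining = [], list(range(n_features))
    for _ in range(k):
        best, best_score = None, -np.inf
        for i in remaining:
            s = XF[i]                          # feature f_i across all images
            Ls = Lk_inv @ s
            score = (Ls @ L_tilde @ Ls) / (1.0 + s @ Ls)
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
        remaining.remove(best)
        # Rank-one update via Sherman-Morrison:
        # (Lk + s s^T)^{-1} = Lk^{-1} - (Lk^{-1} s)(Lk^{-1} s)^T / (1 + s^T Lk^{-1} s)
        s = XF[best]
        Ls = Lk_inv @ s
        Lk_inv -= np.outer(Ls, Ls) / (1.0 + s @ Ls)
    return selected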

D. Kernel Class-Dependence Feature Analysis via MACE and SDF Filters

In this section, the class-dependence feature analysis (CFA) method using minimum average correlation energy (MACE) filters (designed in the frequency domain) and equal correlation peak synthetic discriminant function (ECPSDF) filters (designed in the space domain) will first be described, followed by their kernelization, justification, and further discussion. The overall training and testing framework of the proposed Spartans method is shown in Figure 6. The components involved are explained in this section and the section to follow.

1) CFA via MACE Filters: The CFA method has shown its ability to perform robust face recognition on well-represented training data. It uses a set of MACE filters [49], [50] to extract features from the individuals in the training set. The MACE filter minimizes the average energy of the correlation outputs while maintaining the correlation value at the origin at a pre-specified level. Sidelobes are greatly reduced and discrimination performance is improved compared to other well-known correlation filters such as the ECPSDF filter. MACE filters are generated for every subject in the generic training set; therefore the number of classes matches the dimensionality of the feature space. The closed vector form of the resulting MACE filter is:

h = D^{−1}X(X^{+}D^{−1}X)^{−1}u,   (13)

where X is a d×N matrix containing the N training feature vectors. Here d is the dimension of the input feature vector. D is a d × d diagonal matrix containing the average power spectrum of the training vectors. The column vector u, containing the pre-specified correlation values at the origin, is set to 1 for the "authentic" class to whom the MACE filter belongs, and 0 for all other images belonging to the remaining "imposter" classes. These preset correlation peaks ensure that the filter finds no correlation between subjects who belong to different classes. The correlation of a test image y with the N MACE filters can be expressed as:

c = H^⊤y = [h_{MACE-1} h_{MACE-2} ··· h_{MACE-N}]^⊤y,   (14)

where h_{MACE-i} is a MACE filter with a preset correlation peak value that gives a correlation output close to zero for all classes except class i. Each input image is then projected onto the correlation feature vector c, where N is the number of training classes. The similarity between the probe image and the gallery image is obtained in this multi-dimensional feature space.

In our experiments, one MACE filter is designed for each of the N subjects in the training set. The design of these N filters utilizes all or part of the 3D-GEM-generated images in the training set in a one-class-against-the-rest fashion (i.e., the correlation output at the origin is 1 for all images from the same subject and 0 for all other images), as shown in Figure 6. Once the N filters are designed, they are used as projection bases, as shown in Figure 6, to produce the output feature vector c for any test image y via Equation 14.


2) CFA via SDF Filters: MACE filters produce sharp correlation peaks at the origin, which leads to high discriminability. However, MACE filters can be sensitive to noise and may not provide high correlation peak values in response to non-training images from the desired class. When images are subject to distortions such as 3D geometric distortions, occlusions, pose variations, illumination changes, expression variations, etc., synthetic discriminant function (SDF) filters provide an attractive approach for designing correlation filters that are tolerant to such distortions.

The SDF filters are based on training images that contain examples of the expected distortions. The first SDF filters (known as equal correlation peak synthetic discriminant function (ECPSDF) filters, or projection SDF filters) assume that the correlation filter is a weighted sum of the training images, where the weights are found so that the correlation output at the origin takes on pre-specified values in response to the training images. More specifically, the closed form solution of the ECPSDF filter is:

h = X(X^⊤X)^{−1}u   (15)

where the columns of X are the training images in the space domain, and u contains the pre-specified correlation peak values at the origin, set to 1 for the "authentic" class to whom the ECPSDF filter belongs and 0 for all other images belonging to the remaining "imposter" classes, just as in CFA via MACE filters. The correlation of a test image y with the N ECPSDF filters can be expressed as:

c = H^⊤y = [h_{ECPSDF-1} h_{ECPSDF-2} ··· h_{ECPSDF-N}]^⊤y.   (16)

It is worth noting that ECPSDF filters can be designed in both the space domain and the frequency domain, while MACE filters can only be designed in the frequency domain. One extension of ECPSDF filters is the minimum variance synthetic discriminant function (MVSDF) filter, which minimizes the output noise variance (ONV) while still satisfying the hard correlation peak constraints, and thus provides maximum robustness to noise. MVSDF filters are usually designed in the frequency domain, and if the input noise is white, MVSDF filters are the same as ECPSDF filters. In this work, we assume additive white Gaussian noise; therefore, we simply use ECPSDF filters.
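The closed forms in Equations (13) and (15) translate directly into code. The sketch below (Python/NumPy) designs one filter of each type for an "authentic" class; as simplifying assumptions, images are vectorized and a 1D FFT stands in for the per-image 2D FFT a real implementation would use.

import numpy as np

def ecpsdf_filter(X, u):
    """Eq. (15): space-domain ECPSDF filter h = X (X^T X)^{-1} u.
    X: (d, N) matrix whose columns are vectorized training images."""
    return X @ np.linalg.solve(X.T @ X, u)

def mace_filter(X_spatial, u):
    """Eq. (13): frequency-domain MACE filter h = D^{-1} X (X^+ D^{-1} X)^{-1} u,
    where the diagonal of D holds the average power spectrum of the training set."""
    X = np.fft.fft(X_spatial, axis=0)            # (d, N), complex
    d_avg = np.mean(np.abs(X) ** 2, axis=1)      # diagonal of D
    Dinv_X = X / d_avg[:, None]                  # D^{-1} X without forming D
    inner = X.conj().T @ Dinv_X                  # X^+ D^{-1} X, an (N, N) matrix
    return Dinv_X @ np.linalg.solve(inner, u.astype(complex))

# Toy usage: N = 4 training images of dimension d = 1024,
# designing the filters for the "authentic" first class.
d, N = 1024, 4
X = np.random.rand(d, N)
u = np.array([1.0, 0.0, 0.0, 0.0])   # 1 for authentic, 0 for imposters
h_sdf = ecpsdf_filter(X, u)          # space-domain filter
h_mace = mace_filter(X, u)           # frequency-domain filter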

3) Kernel CFA via MACE Filters: Linear approaches such as PCA, LDA, and CFA tend to yield low performance on face images taken under uncontrolled acquisition conditions. KCFA [49] is designed to overcome the low performance of the linear subspace approaches caused by the nonlinear distortions in human face appearance variations. The algorithm represents nonlinear features by mapping them onto a higher-dimensional feature space, which allows higher-order correlations in kernel space. The features in the higher-dimensional space are obtained using inner products in the linear space, without actually forming the higher-dimensional feature mappings. This kernel trick improves efficiency and keeps the computation tractable even with the high dimensionality.

A pre-filtering process is performed before deriving the closed form solution of the kernel MACE filter. The correlation output of a filter h and a test image y can be expressed as:

y^{+}h = y^{+}[D^{−1}X(X^{+}D^{−1}X)^{−1}u]
       = (D^{−1/2}y)^{+}(D^{−1/2}X)((D^{−1/2}X)^{+}(D^{−1/2}X))^{−1}u
       = (ỹ^{+}X̃)(X̃^{+}X̃)^{−1}u,

where X̃ = D^{−1/2}X and ỹ = D^{−1/2}y denote the pre-whitened versions of X and y. Now we can apply the kernel trick to yield the kernel correlation filter as follows:

Φ(ỹ) · Φ(h) = (Φ(ỹ) · Φ(X̃))(Φ(X̃) · Φ(X̃))^{−1}u
            = K(ỹ, x̃_i)K(x̃_i, x̃_j)^{−1}u,

where Φ is the mapping function Φ: R^N → F, and the RBF kernel function is defined by:

K(x, y) = ⟨Φ(x), Φ(y)⟩ = exp(−‖x − y‖²/2σ²)   (17)

where σ = 8 in this work, as determined by cross-validation. The KCFA via MACE filters is completed after extracting N-dimensional kernel feature vectors using the kernel MACE filters in the same CFA framework.

4) Kernel CFA via SDF Filters: Similar to kernel CFA via MACE filters, the correlation output of an ECPSDF filter h and a test image y can be expressed, in the space domain, as:

y^{+}h = y^{+}[X(X^{+}X)^{−1}u] = (y^{+}X)(X^{+}X)^{−1}u.

Now we can again apply the kernel trick to yield the kernel ECPSDF filter as follows:

Φ(y) · Φ(h) = (Φ(y) · Φ(X))(Φ(X) · Φ(X))^{−1}u = K(y, x_i)K(x_i, x_j)^{−1}u.

The choice of kernel function is the same as in kernel CFA via MACE filters. The KCFA via SDF filters is completed after extracting N-dimensional kernel feature vectors using the kernel ECPSDF filters.
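Both kernelized projections reduce to the same kernel algebra, c = K(y, x_i) K(x_i, x_j)^{−1} u. A minimal sketch in Python/NumPy (the helper names are assumptions): inputs should be pre-whitened by D^{−1/2} for the kernel MACE variant and passed as-is for the kernel ECPSDF variant.

import numpy as np

def rbf_kernel(A, B, sigma=8.0):
    """Eq. (17): K(x, y) = exp(-||x - y||^2 / (2 sigma^2)) for all pairs
    of columns of A (d, p) and B (d, q)."""
    sq = (np.sum(A**2, axis=0)[:, None] + np.sum(B**2, axis=0)[None, :]
          - 2.0 * A.T @ B)
    return np.exp(-sq / (2.0 * sigma**2))

def kcfa_features(X, Y, U, sigma=8.0):
    """Kernelized filter responses c = K(y, x_i) K(x_i, x_j)^{-1} U.
    X: (d, N) training images, Y: (d, M) test images, U: (N, N) peak
    constraints with one column per class filter (identity for the
    one-class-against-the-rest design)."""
    Kxx = rbf_kernel(X, X, sigma)          # K(x_i, x_j), an (N, N) matrix
    Kyx = rbf_kernel(Y, X, sigma)          # K(y, x_i), an (M, N) matrix
    return Kyx @ np.linalg.solve(Kxx, U)   # one N-dim feature vector per probe

# Toy usage: 5 training classes, 3 probes of dimension 1024.
X = np.random.rand(1024, 5)
Y = np.random.rand(1024, 3)
c = kcfa_features(X, Y, np.eye(5))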

5) Justification: The reason why correlation filters can handle pose-variant training images is two-fold: shift-invariance and graceful degradation. Correlation is a natural metric for characterizing the similarity between a reference pattern r(x, y) and a test pattern t(x, y). The two patterns being compared may exhibit relative shifts, and it makes sense to compute the cross-correlation c(τ_x, τ_y) between the two patterns for various possible shifts τ_x and τ_y as:

c(τ_x, τ_y) = ∫∫ t(x, y) r(x − τ_x, y − τ_y) dx dy.

Then, it makes sense to select its maximum as a similarity metric between the two patterns, and the location of the correlation peak as the estimated shift of one pattern with respect to the other. This correlation operation can be equivalently expressed as:

c(τ_x, τ_y) = ∫∫ T(f_x, f_y)R*(f_x, f_y) e^{j2π(f_xτ_x + f_yτ_y)} df_x df_y
            = FT^{−1}{T(f_x, f_y)R*(f_x, f_y)}   (18)

where T(f_x, f_y) and R(f_x, f_y) are the 2D Fourier transforms (FT) of t(x, y) and r(x, y) respectively, with f_x and f_y denoting the spatial frequencies.


One of the benefits of correlation filters is the resulting automatic shift-invariance. Since correlation filters are linear shift-invariant filters, any translation in the test image results in the correlation output being shifted by exactly the same amount. Since the first thing we do in processing correlation outputs is to locate the correlation peak, and since the filters are designed to yield centered correlation peaks for centered training images, we do not need to explicitly center the test image. Instead, we implicitly center the test image by locating the correlation peak and computing the PSR centered at this peak location. In many feature-based FR methods, centering a test image prior to computing the feature values is critical, since such features are often sensitive to image centering.

Another important benefit of correlation filters is the graceful degradation they offer because of the integrative nature of the matching operation, as seen in Equation 18. If some of the pixels are occluded, they simply do not contribute to the correlation peak, thus decreasing the overall peak. However, no single pixel in the image domain is critical, so recognition can still be carried out successfully. This gives correlation filters the tolerance to handle misaligned images. When images are misaligned, the correlation peak is degraded but still high enough to verify the subject's identity. This tolerance to image misalignment and occlusion is in contrast to many space-domain methods that first locate the two eyes and then center and normalize the input image prior to computing features.

Table I shows a set of correlation planes from the full face and the periocular region when matching gallery vs. gallery and gallery vs. probe images, along with the peak-to-sidelobe ratio (PSR). We can see that when a frontal gallery image is matched against itself, perfect peaks are obtained. Also, the periocular region achieves an even higher PSR than the full face because the periocular region is more discriminative. When an off-angle probe image is matched against a frontal gallery image, the PSR drops, but there is still a discernible peak. It is worth noting that the periocular region still yields a higher PSR than the full face when an off-angle face is matched against a frontal one, because in this case the off-angle probe image has occlusion, and using the periocular region can circumvent this issue. High discriminative power and robustness to occlusion are the two key reasons we employ the periocular region in Spartans.
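The sketch below (Python/NumPy) computes a correlation plane via Equation (18) and a peak-to-sidelobe ratio of the kind reported in Table I. The sidelobe annulus sizes are illustrative choices, not values from the paper.

import numpy as np

def correlation_plane(test, reference):
    """Eq. (18): cross-correlation via the frequency domain,
    c = FT^{-1}{ T(f) R*(f) }."""
    T = np.fft.fft2(test)
    R = np.fft.fft2(reference)
    return np.real(np.fft.ifft2(T * np.conj(R)))

def psr(plane, exclude=5, sidelobe=20):
    """Peak-to-sidelobe ratio: (peak - mean(sidelobe)) / std(sidelobe),
    with the sidelobe taken from an annular region around the peak."""
    py, px = np.unravel_index(np.argmax(plane), plane.shape)
    peak = plane[py, px]
    ys, xs = np.ogrid[:plane.shape[0], :plane.shape[1]]
    dist = np.maximum(np.abs(ys - py), np.abs(xs - px))
    region = plane[(dist > exclude) & (dist <= sidelobe)]
    return (peak - region.mean()) / region.std()

# Toy usage: a shifted copy of the reference still peaks, at the shift,
# illustrating the shift-invariance discussed above.
ref = np.random.rand(50, 128)
shifted = np.roll(ref, (3, 7), axis=(0, 1))
print(psr(correlation_plane(shifted, ref)))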

E. CoMax-KCFA: Coupled Max-Pooling Kernel Class-Dependence Feature Analysis

As MVSDF filters (equivalently, ECPSDF when assuming white noise) typically emphasize low spatial frequencies and MACE filters emphasize high spatial frequencies, they provide conflicting attributes. In the previous formulation, MACE filters, which are designed to minimize the average correlation energy (ACE), yield sharp peaks but are likely to have less optimal noise properties. On the other hand, MVSDF filters, which are designed to minimize the output noise variance (ONV), are robust to noise but do not yield sharp peaks. An optimal tradeoff synthetic discriminant function (OTSDF) filter that makes an acceptable compromise between peak sharpness and noise tolerance might be preferable to either the MACE or the MVSDF filter.

TABLE I

CORRELATION PLANES FROM THE FULL FACE AND PERIOCULAR REGION WHEN MATCHING GALLERY VS. GALLERY AND GALLERY VS. PROBE. THE PEAK-TO-SIDELOBE RATIO (PSR) IS ALSO COMPUTED

Such a filter can be obtained by optimizing a weighted sum of the ACE and ONV metrics, and the resulting filter has the following form:

h = (αD + βC)^{−1}X[X^{+}(αD + βC)^{−1}X]^{−1}u   (19)

where the non-negative constants α and β control the tradeoff.² However, the question arises as to the optimal mix of the terms, which can be determined by trial and error using the validation set.

Here, we propose a better and computationally inexpensive way to achieve the tradeoff between discriminability and noise tolerance. Recall that in the KCFA formulation using both MACE and ECPSDF filters, for each testing image y, an N-dimensional feature vector can be obtained, where each element is the kernelized projection of y onto each filter h_i: K(y, h_i).

If y is from one of the training classes, then in the case of MACE filters this feature vector f_m is highly sparse and should peak only at the genuine class. For the ECPSDF filters, however, the feature vector f_e will still take positive values for imposter classes.

It is more important to discuss the case when y is not from one of the training classes. In that case, all the values of f_m will be close to zero, which is by design. Especially after kernelization, if the test image is outside 3σ of any class, f_m will be a near-zero vector. In this case, the feature vector becomes non-informative. This is one limitation of KCFA via MACE filters: it does not generalize well to unseen classes. To tackle this, we find that ECPSDF has a smooth response surface and can assign positive values even to test images outside 3σ of any class. The resulting feature vector f_e is non-sparse and informative.

²The definitions of C and D are discussed in the appendix.


Fig. 7. Toy example showing the correlation response planes of MACE and ECPSDF filters. The filters are trained using the 10 images shown, and the responses are for testing images 1 through 3 from the same class. MACE filters result in sharp peaks while ECPSDF filters return smooth surfaces.

We propose to apply pooling techniques to these two feature vectors f_m and f_e, instead of determining the mixing constants in the OTSDF during the filter design stage, which is far more expensive computationally.

Intuitively, we want to retain the sharp peak (high discriminability) from the MACE filters and the smooth surface (tolerance to noise) from the ECPSDF filters. Coupled max-pooling is a good choice because, for the feature dimensions corresponding to genuine classes, both MACE and ECPSDF should give high values, so after max-pooling, high discriminability still holds. For all other dimensions, corresponding to imposter classes, ECPSDF gives higher values than MACE, so after max-pooling the resulting feature vector is still informative because for these dimensions the values are no longer near zero. This is depicted in Figure 6. We show in Figure 7 the response planes for both MACE and ECPSDF filters for some sample images. MACE filters result in sharp peaks while ECPSDF filters return smooth surfaces.

The final coupled max-pooling KCFA (CoMax-KCFA) feature vector is expressed as:

f_CoMax(i) = max(f_m(i), f_e(i))   (20)
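Once the two KCFA feature matrices are available, Equation (20), along with the sum- and min-pooling alternatives compared next, is a one-line operation. A sketch with placeholder arrays standing in for real kernel projections:

import numpy as np

# f_m: KCFA features from kernel MACE filters (pre-whitened inputs),
# f_e: KCFA features from kernel ECPSDF filters (space domain).
# Both are (M, N) arrays, one N-dimensional vector per probe image.
f_m = np.random.rand(3, 5)   # placeholders for real kernel projections
f_e = np.random.rand(3, 5)

f_comax = np.maximum(f_m, f_e)   # Eq. (20): element-wise max-pooling

# Sum- and min-pooling variants explored in the paper, for comparison:
f_sum = f_m + f_e
f_min = np.minimum(f_m, f_e)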

In addition, we also explore sum-pooling and min-pooling. They are not as good as max-pooling in our evaluation using the "Experiment 4 protocol" of the FRGC ver2.0 database on raw pixel intensities, as shown by the receiver operating characteristic (ROC) curves in Figure 8. We also show the verification rate (VR) at 0.001 false accept rate (FAR) for the comparative evaluation in Table II. Moreover, CoMax-KCFA is a cross-domain approach, where the max-pooling is across the space and frequency domains. This allows more tolerance to distortions.

IV. EVALUATIONS

We first evaluate the quality of the 3D synthesized faces by comparing them against their corresponding actual images in the CMU Multi-PIE (MPIE) database [24]. Then, we test how using only a single image per training class can be harnessed to achieve comparable results in a large-scale 1-to-1 face verification scenario using the Face Recognition Grand Challenge (FRGC) ver2.0 database [23].

Fig. 8. ROC curves for the strict FRGC Experiment 4 protocol. CoMax outperforms the rest.

TABLE II

VERIFICATION RATES (VR) AT 0.1% (0.001) FAR AND EQUAL ERROR RATES (EER) ON THE FRGC EXPERIMENT 4 PROTOCOL

Thirdly, we evaluate the off-angle face matching capability of our proposed Spartans using the MPIE database in terms of pose-invariant face recognition tasks. Following this, we carry out our major experiments on the Labeled Faces in the Wild (LFW) database [22] to test how well Spartans can deal with unconstrained non-frontal face verification tasks.

A. 3D-GEM Quality on the Periocular Region

We show the accuracy of our 3D synthesized faces by comparing the periocular regions of the synthesized images against those of the actual database images. Figure 9 shows the comparison between the actual off-angle faces and their periocular regions (in blue boxes) and the synthesized off-angle faces and periocular regions produced by our proposed method (in red boxes). In order to show the dissimilarity of the two, we compute the root mean square error (RMSE) between the periocular region of the database images and that of the corresponding synthesized faces. Table III shows the RMSE results on 249 faces, each with 5 poses at 0, ±15 and ±30 degrees of yaw, where μ_RMSE is the mean of the RMSE across these 249 face images and SE_RMSE is the standard error of μ_RMSE. Please note that at 0 degrees the RMSE is 0, because we synthesize the off-angle images based on the frontal image. As shown in Table III, our synthesized images have very low dissimilarity (small RMSE) with the database images.

B. Evaluation on the FRGC ver2.0 Database

The FRGC ver2.0 database [23] has three components. First, the generic training set is typically used in the training phase to extract features. It contains both controlled and uncontrolled images of 222 subjects, and a total of 12,776 images. Second, the target set represents the people that we want to find. It has 466 different subjects, and a total of 16,028 images. Last, the probe set represents the unknown images that we need to match against the target set.


Fig. 9. Comparison of database periocular images and our synthesized periocular images at 0, ±15 and ±30 degrees of yaw.

TABLE III

RMSE BETWEEN SYNTHESIZED PERIOCULAR IMAGES AND DATABASE PERIOCULAR IMAGES ON A SUBSET OF THE CMU MULTI-PIE

The probe set contains the same 466 subjects as the target set, with half as many images per person as in the target set, bringing the total number of probe images to 8,014. All the probe subjects that we are trying to identify are present in the target set [51]–[53]. The harsh illumination variation mentioned earlier heavily affects the performance of all recognition algorithms and cannot be overlooked. FRGC Experiment 4 is the hardest experiment in the FRGC protocol: face images captured in a controlled indoor setting are matched against face images captured in uncontrolled outdoor conditions, where harsh lighting considerably alters the appearance of the face images.

We make both the FRGC Experiment 1 and Experiment 4 protocols even harder by allowing ourselves to see only 1 image per training class. So, instead of using 12,776 images from 222 classes for training, we are shown only 222 images from 222 distinct subjects (randomly choosing 1 per training subject) in total.

Our proposed Spartans approach is a good fit for this single-sample case, where it can generate many face images with various poses for each individual to enrich the training corpus by bringing in more variations.³ Since the FRGC is a frontal or near-frontal database, we confine the range of the 3D pose synthesis to ±15° in yaw and ±10° in pitch.

Figure 10 (L) shows the ROC curves for the FRGC Experiment 1 and Experiment 4 evaluations. Table IV shows the VR at 0.1% FAR and the EER of our method and the protocol training method. For fair comparison, we restrict Spartans to take in raw pixel features. As can be seen from the results, Spartans achieves comparable results with the protocol, which utilizes all of the 12,776 training images. This shows the effectiveness of our proposed method under single-sample training scenarios.

3In this case, only pose variations will be enriched. The same framework can be easily extended to bringing in other variations like illumination and expression. This is beyond the scope of this paper.

Fig. 10. (L) ROC curves for the FRGC Exp1 and Exp4 evaluation. (R) ROC curves for the MPIE evaluation.

TABLE IV

Spartans PERFORMANCE COMPARED WITH PROTOCOL ON FRGC

C. Evaluation on the MPIE Database

We use the CMU Multi-PIE (MPIE) database [24] to study pose-invariant face recognition using the proposed Spartans approach. The MPIE database comprises images taken in a controlled studio setting. It is a very good experiment to see how Spartans can deal with controlled pose variations. This will be a pilot for the proposed method before it starts to deal with unconstrained real-world data in the section to follow.

There is no universal evaluation protocol that we can follow for the MPIE database. Therefore, we decided to follow the pose-invariant face recognition protocol from Prabhu et al. [6]. They used a subset of the MPIE database from the first session with neutral expression and frontal lighting, covering nine poses (left 60 degrees to right 60 degrees of yaw in steps of 15 degrees) of 249 subjects. In their work, pitch variation was not explored.

In their evaluation, off-angle query images are matched to frontal gallery images. They first estimate the pose of the query image, and then rotate each gallery image to the estimated pose of the query image plus or minus some search range (5 degrees). This means that for any query image x_t with an unknown yaw pose φ, 11 poses in [φ − 5°, φ + 5°] of each of the 249 gallery images have to be generated, resulting in 11 × 249 = 2,739 images to be compared to a single query image x_t.

Theirs is actually quite a computationally expensive method, in the sense that pose estimation has to be carried out for each query image and 2,739 pose images have to be generated from the gallery set to be matched to each query image. Such a method will be less feasible for large-scale off-angle face matching problems. In addition, the accuracy of the pose estimation will also be an issue, especially when the images are less ideal. One of the reasons they do not carry out experiments on pitch variation could be that pitch estimation is much harder than yaw estimation.

On the other hand, our proposed Spartans approach can be agnostic about the pose of the query image and does not need any pose estimation steps.



TABLE V
TABLE SHOWING RANK-1 IDENTIFICATION RATE (PERCENT) AND VERIFICATION RATES AT 1 PERCENT FALSE ACCEPT RATE (PERCENT, Bold Italics) FOR THREE BENCHMARK EXPERIMENTS AND THE PROPOSED METHOD EXPERIMENTS ON MPIE

The pose invariance is built in during the design of the advanced correlation filters. For fair comparison, we use raw pixel intensity in Spartans, and the performance results are shown in Table V along with the best algorithm from Prabhu et al. [6], PCA (subspace trained on frontal images), and raw pixel intensities (direct matching of frontal and non-frontal images). Since the 3D-GEM does not model eyeglasses, we also carry out experiments on a subset of the MPIE data set containing images without eyeglasses. The results are shown in the second portion of Table V.

As can be seen, our proposed Spartans consistently outperforms the competing best algorithm [6]. Figure 10 (R) shows the receiver operating characteristic (ROC) curves from our proposed Spartans method on the MPIE data set, with eyeglasses included. We do observe some performance loss at more extreme pose angles, but we attribute this to illumination variations which appear when the face is observed from different angles. Even under the frontal lighting conditions, the faces exhibit a strong illumination gradient visible across the cheeks in the 60-degree view which is absent in the frontal view. When we allow Spartans to use optimal WLBP features, which are more discriminative and robust to illumination changes as proposed, we have seen a performance improvement of around 20 percent in Rank-1 ID rate at 60 degrees over the raw pixel features.

D. Evaluation on the LFW Database

In this subsection, we detail our major experiments on the LFW database [22], which contains a total of 13,233 face images of 5,749 different subjects. The whole database is split into 2 views: View 1 is used for model selection and algorithm development, and has 2,200 pairs for training and 1,000 pairs for testing; View 2 consists of a 10-fold cross-validation set of 6,000 pairs, on which we report the performance only. In our implementation, we tune the model parameters strictly on View 1 only and evaluate our performance on each of the 10 folds in View 2 following several configuration protocols to be discussed below.

The images from the LFW have rich contextual information in the background which would definitely aid the classification performance. In our implementation, we restrict the region of interest solely to the face region; more specifically, we consider only the periocular information.
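A minimal sketch of how such a periocular region of interest might be cropped is given below, assuming eye-center coordinates from a landmarker are available; the padding factor is our illustrative choice, as the paper does not specify the exact crop geometry.

    import numpy as np

    def crop_periocular(face: np.ndarray, left_eye, right_eye, pad: float = 0.6):
        """Crop a box around both eye centers. `pad` is the fraction of the
        inter-eye distance added on every side (our assumption)."""
        (x1, y1), (x2, y2) = left_eye, right_eye
        d = float(np.hypot(x2 - x1, y2 - y1))        # inter-eye distance
        left = int(min(x1, x2) - pad * d)
        right = int(max(x1, x2) + pad * d)
        top = int(min(y1, y2) - pad * d)
        bottom = int(max(y1, y2) + pad * d)
        h, w = face.shape[:2]
        return face[max(top, 0):min(bottom, h), max(left, 0):min(right, w)]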

We conduct four different experiments on the LFW, following its latest updated protocols [22], [54]. (1) We follow the Image-Restricted, No Outside Data protocol, to show that our proposed method is able to achieve higher accuracy compared to other existing approaches. Since no outside information can be used, not even for alignment and feature extraction, we confine the training of the facial landmarker (MASM) and pose synthesizer (3D-GEM) to only the images available to us in View 1, which is available for model selection and training under this most strict protocol. Of course, the model would not be as robust as if trained on more ideal and accurately labeled faces. (2) We follow the Image-Restricted, Label-Free Outside Data protocol; this time, outside training data can be used, but only for alignment and feature extraction. In this case, our landmarker and synthesizer can be improved in terms of precision. (3) Since the whole matching process does not require any label information to be considered, our approach is actually an unsupervised one. Therefore, we also benchmark the Spartans against other unsupervised approaches following the Unsupervised protocol. (4) We follow our self-defined protocol to show the effectiveness of our proposed CoMax-KCFA subspace by comparing against other subspace methods (both linear and non-linear) on the pair-matching task in the LFW, which will be detailed below.

The advantages of using Spartans on the LFW are as follows: (1) it can be completely unsupervised, (2) it does not require a lot of training data, and (3) it trains fast. This is directly opposite to those methods that rely on large amounts of labeled data for their deep architectures, which are computationally demanding. To that end, we compete in the categories that either do not utilize outside training data or are completely unsupervised.

In the first three experiments, we follow the leave-one-out cross-validation protocol required by the LFW evaluation, i.e., in each fold, one subset is used for testing and the remaining 9 subsets for training. In particular, for any training-testing selection, the subjects in each subset are mutually exclusive. The final pair-matching performance is reported as the mean recognition rate μ and standard error SE about the mean over the 10-fold cross-validation.
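The reported statistics can be reproduced from the per-fold accuracies as follows; this is a minimal sketch of the standard estimators, not code from the paper.

    import numpy as np

    def mean_and_standard_error(fold_accuracies):
        """Mean recognition rate and standard error over the 10 folds."""
        acc = np.asarray(fold_accuracies, dtype=np.float64)
        mu = acc.mean()
        se = acc.std(ddof=1) / np.sqrt(len(acc))    # standard error of the mean
        return mu, se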

For the first experiment, under the Image-Restricted, No Outside Data protocol (without any outside training data whatsoever), our proposed Spartans approach achieves 87.55% accuracy, a high score that is comparable to the state-of-the-art [39], [55], as shown in Table VI.



Fig. 11. (a): ROC curves for experiments on the LFW database under the Image-Restricted, No Outside Data protocol. (b): ROC curves for experiments on the LFW database under the Image-Restricted, Label-Free Outside Data protocol (outside training data used only for alignment or feature extraction). (c): ROC curves for experiments on the LFW database under the Unsupervised protocol.

TABLE VI
VERIFICATION RESULTS COMPARISON ON LFW UNDER “IMAGE-RESTRICTED, NO OUTSIDE DATA” PROTOCOL [59]

TABLE VII
VERIFICATION RESULTS COMPARISON ON LFW UNDER “IMAGE-RESTRICTED, LABEL-FREE OUTSIDE DATA” PROTOCOL [59]

The receiver operating characteristic (ROC) curves for all the methods listed in Table VI are shown in Figure 11(a). It is worth noting that our approach actually achieves much higher verification rates in the low FAR range, a desirable property for access control scenarios. For the second experiment, under the Image-Restricted, Label-Free Outside Data protocol, where outside training data can only be used for alignment and feature extraction, our proposed Spartans with the better trained landmarker and pose synthesizer

can yield higher accuracy than in the first experiment. It beats all the state-of-the-art results by achieving an accuracy of 89.69%. Table VII shows the accuracies of our method and 8 other methods following the same protocol. The ROC curves for all the algorithms are shown in Figure 11(b). Since our approach is also an unsupervised one, because in the training stage we do not take any label information into account, we also benchmark our approach against other unsupervised approaches as our third experiment, following the Unsupervised protocol. The accuracies are measured by the area under the ROC curve (AUC), listed in Table VIII, and the ROC curves are shown in Figure 11(c). It turns out that the Spartans again beats all the state-of-the-art algorithms by reaching an AUC of 0.9428. In the last experiment, we introduce a new protocol, also using the LFW View 2. In each fold, we select the most frontal face image of each subject in that fold for training and keep the rest of the images for testing. This process is repeated 10 times across all 10 folds. Under this scheme, we compare our proposed method with eight other subspace learning methods, including both linear and non-linear versions of PCA, LDA, LPP, and UDP. The experimental results are shown in Table IX. Our Spartans with the CoMax-KCFA subspace reaches 85.65% accuracy, a big jump compared to the other subspace approaches under this self-defined scheme.

When two feature vectors are obtained from a pair of images, a learned metric returns the decision of “match” or “mismatch”. We follow the work in [39] to learn a low-rank large-margin Mahalanobis metric. The authors have shown improvements by projection onto a subspace with a discriminative Euclidean distance. In their method, a linear projection $W \in \mathbb{R}^{p \times d}$ is learned such that the squared Euclidean distance (a Mahalanobis metric in the original $d$-dimensional space) $d_W^2(x_i, x_j) = \|Wx_i - Wx_j\|_2^2 = (x_i - x_j)^\top W^\top W (x_i - x_j)$ between image features $x_i$ and $x_j$ is smaller than a learned threshold $b$ if $i$ and $j$ are the same person, and larger otherwise. By imposing a large-margin (at least 1) criterion, the constraints become $y_{ij}(b - d_W^2(x_i, x_j)) > 1$, where $y_{ij} = 1$ iff $i$ and $j$ are the same person, and $y_{ij} = -1$ otherwise.



TABLE VIII
VERIFICATION RESULTS COMPARISON ON LFW UNDER UNSUPERVISED PROTOCOL [59]

TABLE IX
COMPARISON OF LINEAR AND NON-LINEAR SUBSPACE PROJECTION METHODS UNDER SELF-DEFINED PROTOCOL ON THE LFW

The learning of $W$ is straightforward using the following objective function with a hinge-loss formulation:

$$\arg\min_{W, b} \sum_{i,j} \max\left[1 - y_{ij}\left(b - (x_i - x_j)^\top W^\top W (x_i - x_j)\right),\ 0\right]$$

which can be solved using a stochastic sub-gradient method. At each iteration $t$, the algorithm uniformly samples a single pair of face images $(i, j)$ and updates the projection matrix as follows:

$$W_{t+1} = \begin{cases} W_t & \text{if } y_{ij}\left(b - d_W^2(x_i, x_j)\right) > 1 \\ W_t - \tau\, y_{ij}\, W_t \Psi_{ij} & \text{otherwise} \end{cases} \qquad (21)$$

where $\Psi_{ij} = (x_i - x_j)(x_i - x_j)^\top$ is the outer product of the difference vector and $\tau$ is a constant learning rate determined by the validation set. The algorithm stops after 5,000 iterations.
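A compact sketch of the update rule in (21) follows, keeping the notation above. The initialization, the fixed threshold b, and the learning rate value are our simplifications; [39] should be consulted for the reference formulation.

    import numpy as np

    def learn_projection(pairs, labels, p, tau=1e-4, b=1.0, iters=5000, seed=0):
        """Stochastic sub-gradient learning of W (p x d) as in (21).
        pairs[k] = (x_i, x_j) feature pair; labels[k] = +1 (match) / -1 (mismatch).
        tau, b, and the random initialization are illustrative choices."""
        rng = np.random.default_rng(seed)
        d = pairs[0][0].shape[0]
        W = 0.01 * rng.standard_normal((p, d))
        for _ in range(iters):
            k = int(rng.integers(len(pairs)))
            xi, xj = pairs[k]
            y = labels[k]
            delta = xi - xj
            dist2 = float(np.sum((W @ delta) ** 2))   # d_W^2(x_i, x_j)
            if y * (b - dist2) <= 1.0:                # margin violated -> update
                W -= tau * y * (W @ np.outer(delta, delta))
        return W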

Apparently, this metric learning is supervised, as it requires the label information from the pairs of training images; the learner needs to know which of the image pairs are “match” or “mismatch”. We, however, want to make the metric learning process unsupervised. The implementation is simple: it only changes the part where the “match” and “mismatch” pairs are determined, and all the remaining steps stay the same. Given all the available training images $X = [x_1, x_2, \ldots, x_k]$, we can compute a similarity matrix $S \in \mathbb{R}^{k \times k}$ according to some similarity measure. In this case, we adopt the normalized cosine distance (NCD) [56]–[58] between two vectors as the similarity measure: $\mathrm{NCD}(a, b) = 1 - \frac{a \cdot b}{\|a\| \|b\|}$. Let $S_L$ be the strict lower triangle of $S$; therefore, $S_L$ has $\frac{k^2 - k}{2}$ elements. We simply rank these $\frac{k^2 - k}{2}$ numbers in a vector $R$; then we can pick the top $t$ $(t < \frac{k^2 - k}{2} - \eta)$ terms from $R$ as the similarities of the “match” pairs, and similarly, pick the last $t$ terms to be the similarities of the “mismatch” pairs. Following this, we assign 1's as the similarities for all “match” pairs, and 0's for all “mismatch” pairs. The remaining steps are the same as in the original implementation. By doing this, we are able to learn the metric in an unsupervised manner. It turns out that this unsupervised modification leads to more robustness than enforcing hard labels in the supervised way.
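The pair-mining step can be sketched as follows; the column-per-feature layout and the tie handling are our assumptions.

    import numpy as np

    def mine_pairs(X: np.ndarray, t: int):
        """Select pseudo 'match'/'mismatch' pairs from unlabeled features.
        X holds one feature vector per column. The (k^2 - k)/2 strict
        lower-triangular NCD values are ranked; the t most similar pairs
        become 'match' and the t least similar become 'mismatch'."""
        Xn = X / np.linalg.norm(X, axis=0, keepdims=True)
        S = 1.0 - Xn.T @ Xn                      # NCD(a, b) = 1 - cos(a, b)
        ii, jj = np.tril_indices(S.shape[0], k=-1)
        order = np.argsort(S[ii, jj])            # ascending distance = most similar first
        match = list(zip(ii[order[:t]], jj[order[:t]]))
        mismatch = list(zip(ii[order[-t:]], jj[order[-t:]]))
        return match, mismatch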

V. CONCLUSIONS

In this work, we have developed a single-sample periocular-based alignment-robust face recognition technique that is pose-tolerant under unconstrained face matching scenarios. The proposed Spartans approach utilizes only one sample per subject class and generates new face images under a wide range of 3D rotations using the 3D generic elastic model, which is both accurate and computationally inexpensive. Then, we focus on the periocular region where the most stable and discriminant features are retained. A novel facial descriptor, high-dimensional Walsh local binary patterns, is densely and uniformly sampled on facial images with robustness towards alignment. During the learning stage, subject-dependent advanced correlation filters are learned for pose-tolerant subspace modeling in kernel feature space, followed by the coupled max-pooling mechanism which further improves the performance. Given any unconstrained unseen face image, the Spartans can produce a highly discriminative matching score, thus achieving a high verification rate. We have evaluated our method on the challenging Labeled Faces in the Wild (LFW) database and solidly outperformed the state-of-the-art algorithms under four evaluation protocols with a high accuracy of 89.69%, a top score among the image-restricted and unsupervised protocols. The advancement of Spartans is also proven on the FRGC and MPIE databases. In addition, our learning method based on advanced correlation filters is much more effective, in terms of learning subject-dependent pose-tolerant subspaces, compared to many well-established subspace methods in both linear and non-linear cases. Future work may include extending this framework to involve more variations such as illumination variation and expression changes, and combining it with novel biometric traits such as gait [81] and other behavioral ones [82]. This work can be seen as another step towards a fully robust unconstrained face matching system.

APPENDIX A
DERIVATION OF MACE FILTERS

The minimum average correlation energy (MACE) filters [50] minimize the average correlation energy (ACE) subject to pre-specified hard constraints. One desirable way of suppressing the sidelobes in the correlation plane is to minimize the energy in the correlation plane. This ensures high discriminability of MACE filters. The average correlation energy for the $N$ training images is defined as:

$$\mathrm{ACE} = \frac{1}{N} \sum_{i=1}^{N} \sum_{m}^{d_1} \sum_{n}^{d_2} |g_i(m, n)|^2 \qquad (22)$$

where $g_i(m, n) = x_i(m, n) * h(m, n)$ denotes the correlation function of the $i$th image $x_i(m, n)$ with the filter $h(m, n)$.

A direct realization of Parseval's theorem gives the ACE expressed in the frequency domain as:

$$\mathrm{ACE} = \frac{1}{d \cdot N} \sum_{i=1}^{N} \sum_{k}^{d_1} \sum_{l}^{d_2} |G_i(k, l)|^2 \qquad (23)$$



where $G_i(k, l)$, $X_i(k, l)$, and $H(k, l)$ are the 2D Fourier transforms of $g_i(m, n)$, $x_i(m, n)$, and $h(m, n)$, respectively.

Since $G_i(k, l) = H(k, l) X_i^*(k, l)$, the frequency-domain expression of the ACE becomes:

$$\mathrm{ACE} = \frac{1}{d \cdot N} \sum_{i=1}^{N} \sum_{k}^{d_1} \sum_{l}^{d_2} |H(k, l)|^2 |X_i(k, l)|^2 \qquad (24)$$

Using the vector form of the image sequence, we can rewrite the ACE expression as follows:

$$\mathrm{ACE} = \frac{1}{d \cdot N} \sum_{i=1}^{N} (h^+ X_i)(X_i^* h) \qquad (25)$$

$$= h^+ \left[ \frac{1}{d \cdot N} \sum_{i=1}^{N} X_i X_i^* \right] h = h^+ D h \qquad (26)$$

where $D = \frac{1}{d \cdot N} \sum_{i=1}^{N} X_i X_i^*$ is a $d \times d$ diagonal matrix, and the $X_i$'s are also diagonal matrices whose elements on the main diagonal are $X_i(k, l)$.

The constraints on the correlation peak must also be expressed in the frequency domain. Since inner products in the space domain are directly proportional to inner products in the frequency domain, the constraints become:

$$X^+ h = d \cdot u \qquad (27)$$

where $X$ is now a matrix whose columns $x_i$ are vector representations of the Fourier transforms of the training images, and $u = [u_1, u_2, \ldots, u_N]^\top$ is a vector containing the desired peak values for the training images.

To solve this constrained quadratic optimization problem, where the quadratic function $h^+ D h$ is minimized subject to the linear constraints $X^+ h = d \cdot u$, Lagrange multipliers can be introduced: $L(h, \lambda) = h^+ D h - 2 \sum_{i=1}^{N} \lambda_i (h^+ x_i - u_i)$. Setting the gradient of $L$ with respect to $h$ to zero, we have $\frac{\partial L}{\partial h} = 0 \Rightarrow Dh = \lambda_1 x_1 + \cdots + \lambda_N x_N$. Since the matrix $D$ is invertible, $h$ can be expressed as:

$$h = D^{-1} \left[ \sum_{i=1}^{N} \lambda_i x_i \right] = D^{-1} X \lambda \qquad (28)$$

Substituting (28) into (27), the constraint becomes $X^+ D^{-1} X \lambda = u$. Therefore, $\lambda$ can be solved as:

$$\lambda = \left( X^+ D^{-1} X \right)^{-1} u \qquad (29)$$

With (29) and (28), we have: $h = D^{-1} X \left( X^+ D^{-1} X \right)^{-1} u$.
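The closed-form solution above maps directly to code. The following is our own minimal NumPy sketch of the MACE synthesis; since D is diagonal, it is kept as a vector rather than a full matrix.

    import numpy as np

    def mace_filter(images, u):
        """MACE filter h = D^{-1} X (X^+ D^{-1} X)^{-1} u.
        images: list of N equally sized training images; u: length-N vector
        of desired correlation peaks (typically all ones)."""
        X = np.stack([np.fft.fft2(im).ravel() for im in images], axis=1)  # d x N
        d, N = X.shape
        D = (np.abs(X) ** 2).sum(axis=1) / (d * N)     # diagonal of D as a vector
        Dinv_X = X / D[:, None]                         # D^{-1} X
        A = X.conj().T @ Dinv_X                         # X^+ D^{-1} X  (N x N)
        h = Dinv_X @ np.linalg.solve(A, np.asarray(u, dtype=complex))
        return h.reshape(images[0].shape)               # frequency-domain filter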

APPENDIX B
DERIVATION OF ECPSDF FILTERS

One of the earliest composite correlation filters is the synthetic discriminant function (SDF) filter, which is designed to achieve a specific value at the origin of the correlation plane in response to each training image. It is noted that SDF filters do not, by design, suppress the sidelobes as MACE filters do.

Let $u_i$ be the value at the origin of the correlation plane $g_i(m, n)$ produced by the filter $h(m, n)$ in response to a training image $x_i(m, n)$. Therefore, $u_i = g_i(0, 0) = \sum_{m=1}^{d_1} \sum_{n=1}^{d_2} x_i(m, n) h(m, n)$. Written in vector form, the above $N$ linear equations become:

$$X^\top h = u \qquad (30)$$

Since the number of training images $N$ is usually much smaller than the number of frequencies in the filter, the system of equations is under-determined and many filters exist that satisfy the constraints. To find a unique solution, $h$ is assumed to be a linear combination of the training images4:

$$h = Xa \qquad (31)$$

where $a$ is the weight vector for linearly combining the columns of the data matrix $X$. Substituting for $h$ in (30) yields $X^\top X a = u \Rightarrow a = (X^\top X)^{-1} u$. Using (31), the ECPSDF solution is obtained: $h = X (X^\top X)^{-1} u$.

4Another reason for the linear combination requirement is that the resulting filters could be obtained in an optics lab by exposing a holographic plate to different exposure levels/times of Fourier transforms of different training images.
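In code, the ECPSDF synthesis is only a few lines; the sketch below is our illustration, with training images vectorized as columns of X.

    import numpy as np

    def ecpsdf_filter(images, u):
        """ECPSDF solution h = X (X^T X)^{-1} u."""
        X = np.stack([im.ravel() for im in images], axis=1)   # d x N data matrix
        a = np.linalg.solve(X.T @ X, np.asarray(u, dtype=np.float64))
        return (X @ a).reshape(images[0].shape)               # h = Xa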

Some other SDF filters can be designed by using several performance criteria, such as output noise variance (ONV) and average similarity measure (ASM), as measures of quality that relate to different properties of the filter.

The filter's output in response to a training vector $x_i$ corrupted by the additive noise vector $v$ is given by:

$$(x_i + v)^\top h = x_i^\top h + v^\top h = u_i + \delta \qquad (32)$$

The output noise variance, assuming that the noise is a zero-mean process, is defined as:

$$\mathrm{ONV} = E\{\delta^2\} = E\{(v^\top h)^2\} \qquad (33)$$

$$= E\{h^\top v v^\top h\} = h^\top E\{v v^\top\} h = h^\top C h \qquad (34)$$

where $C = E\{v v^\top\}$ is the covariance matrix of the input noise. This will lead to the minimum variance synthetic discriminant function (MVSDF) filter solution:

$$h = C^{-1} X \left( X^+ C^{-1} X \right)^{-1} u \qquad (35)$$

The MVSDF filter is identical to the ECPSDF filter when the input noise is white ($C = I$). Thus, the ECPSDF is the best one in the sense of minimizing the output variance for white input noise.
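For completeness, the MVSDF solution differs from the ECPSDF sketch above only by the noise whitening. Forming the full d x d covariance is impractical for large images, so a diagonal C is assumed in this illustration.

    import numpy as np

    def mvsdf_filter(images, u, c_diag):
        """MVSDF solution h = C^{-1} X (X^+ C^{-1} X)^{-1} u for a diagonal
        noise covariance C given by the vector c_diag; c_diag = ones
        recovers the ECPSDF filter."""
        X = np.stack([im.ravel() for im in images], axis=1)   # d x N
        Cinv_X = X / np.asarray(c_diag, dtype=np.float64)[:, None]
        A = X.T @ Cinv_X                                       # X^+ C^{-1} X
        h = Cinv_X @ np.linalg.solve(A, np.asarray(u, dtype=np.float64))
        return h.reshape(images[0].shape)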

REFERENCES

[1] Boston Marathon Bombings—Wikipedia. [Online]. Available: http://en.wikipedia.org/wiki/Boston_Marathon_bombings, accessed Mar. 31, 2015.

[2] J. Atick, P. Griffin, and A. Redlich, “Statistical approach to shape from shading: Reconstruction of 3D face surfaces from single 2D images,” Neural Comput., vol. 8, pp. 1321–1340, 1997.

[3] V. Blanz, S. Romdhani, and T. Vetter, “Face identification across different poses and illuminations with a 3D morphable model,” in Proc. FG, 2002, pp. 202–207.

[4] S.-F. Wang and S.-H. Lai, “Efficient 3D face reconstruction from a single 2D image by combining statistical and geometrical information,” in Proc. 7th ACCV, 2006, pp. 427–436.

[5] J. Heo, “Generic elastic models for 2D pose synthesis and face recognition,” Ph.D. dissertation, Dept. Elect. Comput. Eng., Carnegie Mellon Univ., Pittsburgh, PA, USA, 2009.

[6] U. Prabhu, J. Heo, and M. Savvides, “Unconstrained pose-invariant face recognition using 3D generic elastic models,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 33, no. 10, pp. 1952–1961, Oct. 2011.




[7] F. Juefei-Xu, M. Cha, J. L. Heyman, S. Venugopalan, R. Abiantun, and M. Savvides, “Robust local binary pattern feature sets for periocular biometric identification,” in Proc. 4th IEEE Int. Conf. Biometrics, Theory, Appl., Syst. (BTAS), Sep. 2010, pp. 1–8.

[8] F. Juefei-Xu, K. Luu, M. Savvides, T. D. Bui, and C. Y. Suen, “Investigating age invariant face recognition based on periocular biometrics,” in Proc. Int. Joint Conf. Biometrics (IJCB), Oct. 2011, pp. 1–7.

[9] F. Juefei-Xu, D. K. Pal, and M. Savvides, “Hallucinating the full face from the periocular region via dimensionally weighted K-SVD,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2014, pp. 1–8.

[10] M. Savvides and F. Juefei-Xu, “Image matching using subspace-based discrete transform encoded local binary patterns,” U.S. Patent 2014 0 212 044 A1, Jul. 31, 2013.

[11] F. Juefei-Xu and M. Savvides, “Subspace-based discrete transform encoded local binary patterns representations for robust periocular matching on NIST’s face recognition grand challenge,” IEEE Trans. Image Process., vol. 23, no. 8, pp. 3490–3505, Aug. 2014.

[12] F. Juefei-Xu, M. Cha, M. Savvides, S. Bedros, and J. Trojanova, “Robust periocular biometric recognition using multi-level fusion of various local feature extraction techniques,” in Proc. IEEE 17th Int. Conf. Digit. Signal Process. (DSP), Jul. 2011, pp. 1–6.

[13] F. Juefei-Xu and M. Savvides, “Unconstrained periocular biometric acquisition and recognition using COTS PTZ camera for uncooperative and non-cooperative subjects,” in Proc. IEEE Workshop Appl. Comput. Vis. (WACV), Jan. 2012, pp. 201–208.

[14] F. Juefei-Xu and M. Savvides, “Can your eyebrows tell me who you are?” in Proc. 5th Int. Conf. Signal Process. Commun. Syst. (ICSPCS), Dec. 2011, pp. 1–8.

[15] F. Juefei-Xu and M. Savvides, “Multi-class Fukunaga Koontz discriminant analysis for enhanced face recognition,” Pattern Recognit., to be published.

[16] F. Juefei-Xu and M. Savvides, “An augmented linear discriminant analysis approach for identifying identical twins with the aid of facial asymmetry features,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. Workshops (CVPR), Jun. 2013, pp. 56–63.

[17] F. Juefei-Xu and M. Savvides, “Pokerface: Partial order keeping and energy repressing method for extreme face illumination normalization,” in Proc. IEEE 7th Int. Conf. Biometrics, Theory, Appl., Syst. (BTAS), Sep. 2015, pp. 1–8.

[18] F. Juefei-Xu and M. Savvides, “Encoding and decoding local binary patterns for harsh face illumination normalization,” in Proc. IEEE Int. Conf. Image Process. (ICIP), Sep. 2015, pp. 1–5.

[19] F. Juefei-Xu, D. K. Pal, K. Singh, and M. Savvides, “A preliminary investigation on the sensitivity of COTS face recognition systems to forensic analyst-style face processing for occlusions,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR) Workshops, Jun. 2015, pp. 25–33.

[20] K. Seshadri, F. Juefei-Xu, D. K. Pal, and M. Savvides, “Driver cell phone usage detection on strategic highway research program (SHRP2) face view videos,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR) Workshops, Jun. 2015, pp. 1–9.

[21] U. Park, A. Ross, and A. K. Jain, “Periocular biometrics in the visible spectrum: A feasibility study,” in Proc. IEEE 3rd Int. Conf. BTAS, Sep. 2009, pp. 1–6.

[22] G. B. Huang, M. Ramesh, T. Berg, and E. Learned-Miller, “Labeled faces in the wild: A database for studying face recognition in unconstrained environments,” Univ. Massachusetts, Amherst, MA, USA, Tech. Rep. 07-49, Oct. 2007.

[23] P. J. Phillips et al., “Overview of the face recognition grand challenge,” in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. (CVPR), vol. 1, Jun. 2005, pp. 947–954.

[24] R. Gross, I. Matthews, J. Cohn, T. Kanade, and S. Baker, “Multi-PIE,” in Proc. 8th IEEE Int. Conf. FG, Sep. 2008, pp. 1–8.

[25] M. A. Turk and A. P. Pentland, “Face recognition using eigenfaces,” in Proc. IEEE Comput. Soc. Conf. CVPR, Jun. 1991, pp. 586–591.

[26] K. Etemad and R. Chellappa, “Discriminant analysis for recognition of human face images,” J. Opt. Soc. Amer. A, vol. 14, no. 8, pp. 1724–1733, 1997.

[27] X. He and P. Niyogi, “Locality preserving projections,” in Proc. Adv. NIPS, 2003, pp. 153–160.

[28] J. Yang, D. Zhang, J.-Y. Yang, and B. Niu, “Globally maximizing, locally minimizing: Unsupervised discriminant projection with applications to face and palm biometrics,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 29, no. 4, pp. 650–664, Apr. 2007.

[29] J. Merkow, B. Jou, and M. Savvides, “An exploration of gender identification using only the periocular region,” in Proc. 4th IEEE Int. Conf. BTAS, Sep. 2010, pp. 1–5.

[30] J. R. Lyle, P. E. Miller, S. J. Pundlik, and D. L. Woodard, “Soft biometric classification using periocular region features,” in Proc. 4th IEEE Int. Conf. BTAS, Sep. 2010, pp. 1–7.

[31] P. E. Miller, J. R. Lyle, S. J. Pundlik, and D. L. Woodard, “Performance evaluation of local appearance based periocular recognition,” in Proc. 4th IEEE Int. Conf. BTAS, Sep. 2010, pp. 1–6.

[32] D. L. Woodard, S. J. Pundlik, J. R. Lyle, and P. E. Miller, “Periocular region appearance cues for biometric identification,” in Proc. IEEE Comput. Soc. Conf. CVPRW, Jun. 2010, pp. 162–169.

[33] U. Park, R. Jillela, A. Ross, and A. K. Jain, “Periocular biometrics in the visible spectrum,” IEEE Trans. Inf. Forensics Security, vol. 6, no. 1, pp. 96–106, Mar. 2011.

[34] N. Pinto, J. J. DiCarlo, and D. D. Cox, “How far can you get with a modern face recognition test set using only simple features?” in Proc. IEEE Conf. CVPR, Jun. 2009, pp. 2591–2598.

[35] M. Guillaumin, J. Verbeek, and C. Schmid, “Is that you? Metric learning approaches for face identification,” in Proc. IEEE 12th ICCV, Sep./Oct. 2009, pp. 498–505.

[36] Q. Yin, X. Tang, and J. Sun, “An associate-predict model for face recognition,” in Proc. IEEE Conf. CVPR, Jun. 2011, pp. 497–504.

[37] H. J. Seo and P. Milanfar, “Face verification using the LARK representation,” IEEE Trans. Inf. Forensics Security, vol. 6, no. 4, pp. 1275–1286, Dec. 2011.

[38] Z. Cui, W. Li, D. Xu, S. Shan, and X. Chen, “Fusing robust face region descriptors via multiple metric learning for face recognition in the wild,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2013, pp. 3554–3561.

[39] K. Simonyan, O. Parkhi, A. Vedaldi, and A. Zisserman, “Fisher vector faces in the wild,” in Proc. Brit. Mach. Vis. Conf. (BMVC), 2013, pp. 8.1–8.12.

[40] D. Chen, X. Cao, F. Wen, and J. Sun, “Blessing of dimensionality: High-dimensional feature and its efficient compression for face verification,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2013, pp. 3025–3032.

[41] H. Li, G. Hua, Z. Lin, J. Brandt, and J. Yang, “Probabilistic elastic matching for pose variant face verification,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2013, pp. 3499–3506.

[42] D. Yi, Z. Lei, and S. Z. Li, “Towards pose robust face recognition,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2013, pp. 3539–3545.

[43] K. Seshadri and M. Savvides, “Robust modified active shape model for automatic facial landmark annotation of frontal faces,” in Proc. IEEE 3rd BTAS, Sep. 2009, pp. 1–8.

[44] I. Matthews and S. Baker, “Active appearance models revisited,” Int. J. Comput. Vis., vol. 60, no. 2, pp. 135–164, 2004.

[45] F. Juefei-Xu and M. Savvides, “Weight-optimal local binary patterns,” in Computer Vision (Lecture Notes in Computer Science), vol. 8926. New York, NY, USA: Springer-Verlag, 2015, pp. 148–159.

[46] M. Petrou and P. G. Sevilla, Dealing With Texture. New York, NY, USA: Wiley, 2006.

[47] X. He, M. Ji, C. Zhang, and H. Bao, “A variance minimization criterion to feature selection using Laplacian regularization,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 33, no. 10, pp. 2013–2025, Oct. 2011.

[48] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge, U.K.: Cambridge Univ. Press, 2004.

[49] B. V. K. V. Kumar, M. Savvides, and C. Xie, “Correlation pattern recognition for face recognition,” Proc. IEEE, vol. 94, no. 11, pp. 1963–1976, Nov. 2006.

[50] A. Mahalanobis, B. V. K. V. Kumar, and D. Casasent, “Minimum average correlation energy filters,” Appl. Opt., vol. 26, no. 17, pp. 3633–3640, Sep. 1987.

[51] F. Juefei-Xu and M. Savvides, “Single face image super-resolution via solo dictionary learning,” in Proc. IEEE Int. Conf. Image Process. (ICIP), Sep. 2015, pp. 1–5.

[52] N. Zehngut, F. Juefei-Xu, R. Bardia, D. K. Pal, C. Bhagavatula, and M. Savvides, “Investigating the feasibility of image-based nose biometrics,” in Proc. IEEE Int. Conf. Image Process. (ICIP), Sep. 2015, pp. 1–5.

[53] F. Juefei-Xu and M. Savvides, “An image statistics approach towards efficient and robust refinement for landmarks on facial boundary,” in Proc. IEEE 6th Int. Conf. Biometrics, Theory, Appl., Syst. (BTAS), Sep./Oct. 2013, pp. 1–8.

[54] G. B. Huang and E. Learned-Miller, “Labeled faces in the wild: Updates and new reporting procedures,” Dept. Comput. Sci., Univ. Massachusetts Amherst, Amherst, MA, USA, Tech. Rep. UM-CS-2014-003, 2014.



[55] H. Li, G. Hua, X. Shen, Z. Lin, and J. Brandt, “Eigen-PEP for video face recognition,” in Proc. 12th Asian Conf. Comput. Vis. (ACCV), 2014, pp. 17–33.

[56] F. Juefei-Xu and M. Savvides, “Pareto-optimal discriminant analysis,” in Proc. IEEE Int. Conf. Image Process. (ICIP), Sep. 2015, pp. 1–5.

[57] F. Juefei-Xu, D. K. Pal, and M. Savvides, “NIR-VIS heterogeneous face recognition via cross-spectral joint dictionary learning and reconstruction,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR) Workshops, Jun. 2015, pp. 1–10.

[58] F. Juefei-Xu and M. Savvides, “Facial ethnic appearance synthesis,” in Computer Vision (Lecture Notes in Computer Science), vol. 8926. New York, NY, USA: Springer-Verlag, 2015, pp. 825–840.

[59] G. Huang. (May 2015). Results on the Labeled Faces in the Wild. [Online]. Available: http://vis-www.cs.umass.edu/lfw/results.html

[60] E. Nowak and F. Jurie, “Learning visual similarity measures for comparing never seen objects,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2007, pp. 1–8.

[61] G. B. Huang, V. Jain, and E. Learned-Miller, “Unsupervised joint alignment of complex images,” in Proc. IEEE 11th Int. Conf. Comput. Vis. (ICCV), Oct. 2007, pp. 1–8.

[62] L. Wolf, T. Hassner, and Y. Taigman, “Descriptor based methods in the wild,” in Proc. Faces Real-Life Images Workshop Eur. Conf. Comput. Vis. (ECCV), 2008, pp. 1–14.

[63] L. Wolf, T. Hassner, and Y. Taigman, “Similarity scores based on background samples,” in Proc. 9th Asian Conf. Comput. Vis. (ACCV), 2009, pp. 88–97.

[64] H. V. Nguyen and L. Bai, “Cosine similarity metric learning for face verification,” in Proc. 10th Asian Conf. Comput. Vis. (ACCV), 2010, pp. 709–720.

[65] D. Cox and N. Pinto, “Beyond simple features: A large-scale feature search approach to unconstrained face recognition,” in Proc. IEEE Int. Conf. Autom. Face Gesture Recognit. (FG), Mar. 2011, pp. 8–15.

[66] Y. Ying and P. Li, “Distance metric learning with eigenvalue optimization,” J. Mach. Learn. Res., vol. 13, pp. 1–26, Jan. 2012.

[67] G. B. Huang, H. Lee, and E. Learned-Miller, “Learning hierarchical representations for face verification with convolutional deep belief networks,” in Proc. IEEE Conf. CVPR, Jun. 2012, pp. 2518–2525.

[68] Q. Cao, Y. Ying, and P. Li, “Similarity metric learning for face recognition,” in Proc. IEEE ICCV, Dec. 2013, pp. 2408–2415.

[69] O. Barkan, J. Weill, L. Wolf, and H. Aronowitz, “Fast high dimensional vector multiplication face recognition,” in Proc. IEEE ICCV, Dec. 2013, pp. 1960–1967.

[70] J. Hu, J. Lu, and Y.-P. Tan, “Discriminative deep metric learning for face verification in the wild,” in Proc. IEEE Conf. CVPR, Jun. 2014, pp. 1875–1882.

[71] J. Hu, J. Lu, J. Yuan, and Y.-P. Tan, “Large margin multi-metric learning for face and kinship verification in the wild,” in Proc. 12th Asian Conf. Comput. Vis. (ACCV), 2014, pp. 252–267.

[72] T. Hassner, S. Harel, E. Paz, and R. Enbar, “Effective face frontalization in unconstrained images,” in Proc. IEEE CVPR, Jun. 2015, pp. 4295–4304.

[73] J. Ruiz-del-Solar, R. Verschae, and M. Correa, “Recognition of faces in unconstrained environments: A comparative study,” EURASIP J. Adv. Signal Process., vol. 2009, p. 184617, Jan. 2009.

[74] G. Sharma, S. ul Hussain, and F. Jurie, “Local higher-order statistics (LHS) for texture categorization and facial analysis,” in Proc. 12th Eur. Conf. Comput. Vis. (ECCV), 2012, pp. 1–12.

[75] S. R. Arashloo and J. Kittler, “Efficient processing of MRFs for unconstrained-pose face recognition,” in Proc. IEEE 6th Int. Conf. BTAS, Sep./Oct. 2013, pp. 1–8.

[76] P. N. Belhumeur, J. P. Hespanha, and D. J. Kriegman, “Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 19, no. 7, pp. 711–720, Jul. 1997.

[77] J. Yang, D. Zhang, Z. Jin, and J.-Y. Yang, “Unsupervised discriminant projection analysis for feature extraction,” in Proc. 18th ICPR, vol. 1, 2006, pp. 904–907.

[78] B. Schölkopf, A. Smola, and K.-R. Müller, “Nonlinear component analysis as a kernel eigenvalue problem,” Neural Comput., vol. 10, no. 5, pp. 1299–1319, 1998.

[79] S. Mika, G. Rätsch, J. Weston, B. Schölkopf, and K.-R. Müller, “Fisher discriminant analysis with kernels,” in Proc. IEEE Signal Process. Soc. Workshop Neural Netw. Signal Process. IX, Aug. 1999, pp. 41–48.

[80] X. He, S. Yan, Y. Hu, P. Niyogi, and H.-J. Zhang, “Face recognition using Laplacianfaces,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 27, no. 3, pp. 328–340, Mar. 2005.

[81] F. Juefei-Xu, C. Bhagavatula, A. Jaech, U. Prasad, and M. Savvides, “Gait-ID on the move: Pace independent human identification using cell phone accelerometer dynamics,” in Proc. 5th Int. Conf. Biometrics, Theory, Appl., Syst. (BTAS), Sep. 2012, pp. 8–15.

[82] S. Venugopalan, F. Juefei-Xu, B. Cowley, and M. Savvides, “Electromyograph and keystroke dynamics for spoof-resistant biometric authentication,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR) Workshops, Jun. 2015, pp. 1–10.

Felix Juefei-Xu (S’10) received the B.S. degree in electronic engineering from Shanghai Jiao Tong University, Shanghai, China, and the M.S. degree in electrical and computer engineering from Carnegie Mellon University, Pittsburgh, PA, USA, where he is currently pursuing the Ph.D. degree in electrical and computer engineering. He is currently with the Carnegie Mellon CyLab Biometrics Center, in a research group specializing in pattern recognition, machine learning, computer vision, and image processing, especially as applied to the field of biometrics. His research is focused on unconstrained periocular face recognition, with a broader interest in pattern recognition, statistics, computer vision, machine learning, and image processing.

Khoa Luu received the B.S. degree from the University of Natural Science in Vietnam, in 2005, the M.S. degree from Concordia University, Canada, in 2009, and the Ph.D. degree in computer science from Concordia University, in 2014. He is currently pursuing the Ph.D. degree in electrical and computer engineering at Carnegie Mellon University (CMU), Pittsburgh, PA, USA, in order to strengthen his knowledge in signal processing. He has been a Research Scientist and a Post-Doctoral Fellow with the CyLab Biometrics Center, CMU, since 2011. He leads biometric projects and collaborates with graduate students and research scholars to build stand-alone biometric systems. He has authored over 30 papers in top-tier conferences and journals, and received two best paper awards at the IEEE International Conference on Biometrics: Theory, Applications and Systems, Washington, DC, in 2011 and 2012. He has one filed patent and two others pending. His research interests focus on biometrics, image processing, machine learning, and computer vision. His research achievements have appeared in several news media outlets in Canada and the US. He is currently a Reviewer for several high-quality journals, conferences, and transactions, such as the IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, the IEEE TRANSACTIONS ON IMAGE PROCESSING, the Journal of Image and Vision Computing, the Journal of Signal Processing, and the Journal of Pattern Recognition Letters. He was a Vice Chair of the Montreal Chapter of the IEEE SMCS in Canada from 2009 to 2011.

Marios Savvides (S’03–M’04) is the Founder and Director of the Biometrics Center and currently a Research Professor with the Electrical and Computer Engineering Department at Carnegie Mellon University (CMU). He is also one of the tapped researchers to form the Office of the Director of National Intelligence's 1st Center of Academic Excellence in Science and Technology. His research is mainly focused on developing algorithms for robust face and iris biometrics and pattern recognition, machine vision, and computer image understanding for enhancing biometric systems performance. He has authored or co-authored over 170 journal and conference publications, including several book chapters in the area of biometrics, and served as the Area Editor of Springer's Encyclopedia of Biometrics. His achievements include leading the research and development in CMU's past participation in NIST's Face Recognition Grand Challenge 2005 (CMU ranked No. 1 in academia and industry at the hardest Experiment 4 open challenge) and also in NIST's Iris Challenge Evaluation (CMU ranked No. 1 in academia and No. 2 against iris vendors). He has filed over 20 patent applications in the area of biometrics and was a recipient of CMU's 2009 Carnegie Institute of Technology Outstanding Faculty Research Award. He is on the Program Committee of several biometric conferences, such as the IEEE BTAS, ICPR, SPIE Biometric Identification, the IEEE AutoID, and others, and has been the Organizer and Co-Chair of the Robust Biometrics Understanding the Science and Technology Conference.