

IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART C: APPLICATIONS AND REVIEWS, VOL. 35, NO. 3, AUGUST 2005 411

Correspondence

Face Authentication From Cell Phone Camera Images With Illumination and Temporal Variations

Krithika Venkataramani, Saim Qidwai, and B. V. K. Vijayakumar

Abstract—We investigated the performance of three face verification algorithms (correlation filters, Individual PCA, and FisherFaces) on an image database collected with a cell phone camera. Cell phone camera images tend to be of poorer quality and, because of the phone's portability, exhibit scale and dynamic illumination changes. While Individual PCA and FisherFaces work in the image domain, correlation filters work in the frequency domain and offer advantages such as shift invariance, the ability to accommodate in-class image variability, and closed-form expressions. Verification results suggest that, on this database, correlation filters offer better performance than Individual PCA and performance comparable to FisherFaces while using fewer filters.

Index Terms—Biometrics, correlation filters, face recognition, FisherFaces, principal component analysis.

I. INTRODUCTION

Many face recognition algorithms perform well on databases that were collected with high-resolution cameras in highly controlled situations. However, they may not retain good performance in real-life situations where there is a lot of variation in illumination, scale, pose, etc. In applications such as face authentication using cameras in cell phones and PDAs, the cameras may introduce image distortions (e.g., because of fish-eye lenses), and face images may exhibit a wide range of illumination conditions, as well as scale and pose variations. An important question is which of the face authentication algorithms will work well with face images produced by cell phone cameras? To address this issue, we collected a face database at Carnegie Mellon University using a cell phone camera. In this correspondence, we evaluate and compare the performance of correlation filters for face authentication with the Individual PCA method [1] and the FisherFaces method [2] under various lighting conditions. Correlation filters are attractive for a variety of reasons, such as shift invariance, the ability to accommodate in-class image variability, the ability to trade off between discrimination and distortion tolerance, and the provision of closed-form expressions [3]–[7].

The rest of the paper is organized as follows. Section II provides some background on correlation filters, Individual PCA, and FisherFaces. Section III gives details on the database collection process using a cell phone camera and the preprocessing done on these images. Section IV provides an evaluation of correlation filters using this database, along with a comparison with the Individual PCA and FisherFaces methods. Finally, conclusions are provided in Section V.

Manuscript received December 12, 2003; revised May 20, 2004. This work was supported in part by the Technology Support Working Group (TSWG) and by the CyberSecurity Laboratory at Carnegie Mellon University. This paper was recommended by Guest Editor D. Zhang.

The authors are with the Electrical and Computer Engineering Department, Carnegie Mellon University, Pittsburgh, PA 15213 USA (e-mail: [email protected]; [email protected]; [email protected]).

Digital Object Identifier 10.1109/TSMCC.2005.848183

II. BACKGROUND

A. Correlation Filters

The major difference between correlation filters and other methods is that correlation filters operate in the spatial frequency domain, whereas other methods such as Individual PCA and FisherFaces work primarily in the spatial domain. A high-level schematic of the correlation filter method can be seen in Fig. 1. The filters are designed in the frequency domain using the Fourier transforms (FTs) of the training images. Multiplying the FT of the test image by the filter and taking the inverse FT of the product gives the correlation output. Typically, for a well-designed filter, a sharp correlation output peak implies an authentic, whereas the lack of such a distinct peak implies an impostor (Figs. 2 and 3).

Correlation filters have the advantage of built-in shift invariance, in that if the test image is shifted with respect to the training images, then the correlation peak will shift by the same amount from the origin. The peak-to-sidelobe ratio (PSR) is used as a measure of the sharpness of the peak of the correlation output and is typically high for authentics and low for impostors. The PSR is defined as

PSR = (peak − mean) / standard deviation   (1)

where the peak is the largest value of the correlation output. The mean and standard deviation are computed in the sidelobe region (an annular window surrounding the peak, as shown in Fig. 4).
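As an illustration, the correlation step and the PSR of (1) can be sketched in NumPy as follows. The annular sidelobe window sizes (`mask_radius`, `sidelobe_radius`) are illustrative choices, not values fixed by the text:

```python
import numpy as np

def correlate_and_psr(test_image, filt_freq, mask_radius=5, sidelobe_radius=20):
    """Correlate a test image with a frequency-domain filter and return the
    peak-to-sidelobe ratio (PSR) of (1). Window radii are illustrative."""
    # Correlation output: FT of the test image times the conjugate filter,
    # followed by an inverse FT.
    corr = np.fft.ifft2(np.fft.fft2(test_image) * np.conj(filt_freq))
    corr = np.abs(np.fft.fftshift(corr))

    # Locate the correlation peak.
    peak = corr.max()
    py, px = np.unravel_index(corr.argmax(), corr.shape)

    # Sidelobe region: an annular window around the peak (Fig. 4),
    # excluding a small central mask.
    yy, xx = np.indices(corr.shape)
    dist = np.hypot(yy - py, xx - px)
    sidelobe = corr[(dist > mask_radius) & (dist <= sidelobe_radius)]

    return (peak - sidelobe.mean()) / sidelobe.std()
```

With a matched filter (the FT of the training image itself), the shift invariance is visible: circularly shifting the test image moves the peak but leaves the PSR essentially unchanged.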

Previously, correlation filters were mostly used for automatic target recognition (ATR) applications [3] and have been applied to biometrics [8], [9] only recently. While there are many correlation filter designs [3]–[7], the Unconstrained Optimal Tradeoff Synthetic Discriminant Function (UOTSDF) filter [6], [7] is used in this effort because of its ability to provide high discrimination while providing noise tolerance, its simplicity of computation, and especially its ability to be easily updated incrementally [8].

We assume that there are N training images and that each image is of size d_1 × d_2, containing d = d_1 d_2 pixels. Matrices in the frequency domain are denoted by uppercase bold characters, and vectors in the frequency domain by lowercase bold characters. Vectors in the image domain are denoted by lowercase letters, and matrices in the image domain by uppercase letters. Scalar elements are denoted by lowercase italicized letters. The superscript "+" refers to the conjugate transpose. The two-dimensional (2-D) FT of the ith training image is lexicographically scanned to form a column vector x_i containing d elements. The 2-D filter in the frequency domain is similarly scanned and represented by the column vector h.

Ideally, correlation filters should suppress false-class images, be tolerant to noise, and produce a correlation peak that can easily be detected. Noise tolerance can be provided by reducing the output noise variance (ONV) [4]. The ONV is expressed by

E_1 = h^+ C h   (2)

where C is a d × d diagonal matrix containing the elements of the input noise power spectral density along its diagonal.

By reducing the average correlation energy (ACE), defined below, sidelobes can be suppressed in order to provide a sharp correlation peak [5]:

E_2 = h^+ D h   (3)

1094-6977/$20.00 © 2005 IEEE


Fig. 1. Schematic of the correlation filter approach to verification/identification.

Fig. 2. Correlation output of an authentic image for a well-designed correlation filter.

where D is a d × d diagonal matrix containing the average power spectrum of the training images placed along its diagonal.

Since the images we deal with (such as face images) have predominantly low-frequency content, and the noise is assumed to have a flat spectrum, providing discrimination by minimizing the ACE and providing distortion tolerance by minimizing the ONV are conflicting goals. Refregier [6] first proposed a method of finding an optimal tradeoff between the two criteria and introduced the Optimal Tradeoff SDF (OTSDF) filter. The UOTSDF filter [7] maximizes the square of the average correlation height (ACH), defined below, instead of constraining the peak values of all training images to a specified value, and the resulting filter also happens to be simpler to compute:

ACH = |(1/N) Σ_{i=1}^{N} h^+ x_i| = |h^+ m|   (4)

where m is the average of the N vectors x_1, x_2, …, x_N.


Fig. 3. Correlation output of an impostor image for a well-designed correlation filter.

Fig. 4. Calculation of peak-to-sidelobe ratio (PSR) using an annular window.

Holding one criterion (say, the ACE) constant while minimizing the other (in this case, the ONV), and maximizing (ACH)^2, leads to the UOTSDF filter given below [7]:

h = (αC + βD)^(-1) m   (5)

where α and β (0 ≤ α, β ≤ 1) are relative weights for noise tolerance and peak sharpness, respectively. Varying the values of α and β allows us to trade off between discrimination and distortion tolerance.

1) Incremental Updating of the UOTSDF Filter: It is useful to update the filter to accommodate changes in the face images, and the UOTSDF filter has the advantage of being easily updated in an incremental manner [8]. Incremental updating of the UOTSDF filter removes the need to store all training images, requiring only the storage of the sum FT s_n and the sum spectral density D'_n of the current n training images, given by

s_n = Σ_{i=1}^{n} x_i   (6)

D'_n = Σ_{i=1}^{n} X_i^* X_i   (7)


where X_i is a d × d diagonal matrix containing the elements of x_i along its diagonal. When the (n+1)th training image comes in, the sum FT and sum spectral density are updated as follows:

s_{n+1} = s_n + x_{n+1}   (8)

D'_{n+1} = D'_n + X^*_{n+1} X_{n+1}   (9)

and the updated UOTSDF filter is computed by

h_{n+1} = (α(n+1)C + βD'_{n+1})^(-1) s_{n+1}.   (10)

2) Selective Incremental Updating: We can also selectively choose the training set such that the filter is updated only with images that are not sufficiently well represented in the filter [10]. If a new training image is already well represented in the current filter, the PSR of its correlation output with the current filter will be high. Hence, if its PSR is below a threshold τ_h (say, 100), it is used to update the current filter. This method is also useful in the registration of training images. The shift-invariance property of correlation filters can be used to register the new training image before updating the filter: by translating the image by the shift between the correlation peak location and the origin of the correlation output, the new training image is effectively registered.

When the training images are diverse, one filter is not sufficient to represent all images. The training images can be divided into sets of similar images, and a different filter can be designed for each set. Multiple filters can be built from the training images in the following way [10].

Compute the FT of a new training image. Correlate it with the current filter(s) and calculate the resulting PSRs. Consider the maximum PSR value psr_max and the filter f corresponding to this PSR.

— If psr_max > τ_h, do not use this image for updating this filter, since the new image is well represented in the filter f.

— If psr_max < τ_l, this image is sufficiently different from all of the filters. Hence, use this image to build a new filter.

— If τ_l < psr_max < τ_h, this image is similar to the images represented by filter f but not sufficiently well represented in the filter. Hence, register this training image with respect to the filter f. Compute the FT and spectral density of the registered image, and update the sum FT and the sum spectral density of the images used in filter f. Update the filter f using these.
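A compact sketch of this multi-filter scheme is given below. The dictionary-based filter state, the simplified global-statistics PSR (in place of the annular window of Fig. 4), the omission of the registration step, and the lower threshold τ_l = 30 are illustrative assumptions; only τ_h = 100 comes from the text.

```python
import numpy as np

def max_psr(image, filters):
    # Return (best PSR, index) over the current filter bank. This uses a
    # simplified global-statistics PSR, not the annular window of Fig. 4.
    best, idx = -np.inf, -1
    for i, f in enumerate(filters):
        corr = np.abs(np.fft.ifft2(np.fft.fft2(image) * np.conj(f["h"])))
        psr = (corr.max() - corr.mean()) / corr.std()
        if psr > best:
            best, idx = psr, i
    return best, idx

def selective_update(image, filters, tau_l=30.0, tau_h=100.0, alpha=1e-5):
    # One step of selective incremental updating over a list of filter
    # states, each a dict holding the sum FT "s", sum spectral density "D",
    # image count "n", and the frequency-domain filter "h".
    x = np.fft.fft2(image)
    psr, i = max_psr(image, filters) if filters else (-np.inf, -1)
    if psr > tau_h:
        return filters                       # image already well represented
    if psr < tau_l or not filters:
        filters.append({"s": x.copy(), "D": np.abs(x) ** 2, "n": 1})
        i = len(filters) - 1                 # sufficiently different: new filter
    else:
        f = filters[i]                       # similar but not well represented:
        f["s"] += x                          # update this filter (registration
        f["D"] += np.abs(x) ** 2             # via the peak shift would go here)
        f["n"] += 1
    f = filters[i]
    f["h"] = f["s"] / (alpha * f["n"] + f["D"])   # UOTSDF with white-noise C
    return filters
```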

B. Individual PCA

The eigenface approach to face recognition, also known as Universal PCA, was introduced by Turk and Pentland [1]. Training images from all the classes are used to construct an eigenspace. Within this space, the face images of each of the classes form a cluster. It was found in [11] that face authentication performance can be improved by constructing an eigenspace from the face images of one person, since it is tuned to that person. This method is referred to as Individual PCA.

We assume that there are N training images for each person, each of size d_1 × d_2 pixels. The ith image for a person is lexicographically scanned to form a column vector x_i of length d = d_1 d_2. The mean vector of the training images of that person is given by

m = (1/N) Σ_{i=1}^{N} x_i.   (11)

PCA is done on a matrix A consisting of the mean-removed vectors, given by

A = [(x_1 − m) (x_2 − m) ··· (x_N − m)]   (12)

to find a set of Q orthonormal eigenvectors u_q corresponding to the larger eigenvalues of the outer-product matrix AA^T of size d × d:

AA^T u_q = λ_q u_q,   q = 1, 2, …, Q.   (13)

The number of pixels d is usually very large, and finding the eigenvectors of a d × d matrix is computationally expensive. However, we can obtain these eigenvectors from the eigenvectors u'_q of the inner-product matrix A^T A of size N × N, which is usually a much smaller matrix:

A^T A u'_q = λ_q u'_q,   q = 1, 2, …, Q   (14)

where u'_q is related to u_q by the following equation:

u_q = A u'_q λ_q^(-1/2).   (15)

These eigenvectors u_q form the orthonormal basis of a new feature space called the eigenspace. A test image x is projected from the image space into the eigenspace, giving a vector of length Q with components

w_q = u_q^T (x − m),   q = 1, 2, …, Q.   (16)

The mean-removed test face image s = x − m can be reconstructed from its projection onto the eigenspace as

ŝ = Σ_{q=1}^{Q} w_q u_q.   (17)

Now, the residue of the test image, i.e., the Euclidean distance between the mean-removed test image and its reconstruction from the eigenspace projection, is small for authentic images and large for impostor images. The residue r is given by

r = √(‖x − m‖² − Σ_{q=1}^{Q} w_q²).   (18)
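A sketch of (11)-(18) for one person follows, using the inner-product trick of (14)-(15); the 98% energy criterion matches the experiments in Section IV, while the function names and shapes are illustrative:

```python
import numpy as np

def individual_pca(train_images, energy=0.98):
    """Build a person-specific eigenspace per (11)-(15), diagonalizing the
    small N x N inner-product matrix A^T A instead of the d x d AA^T."""
    X = np.array([im.ravel() for im in train_images], dtype=float)  # N x d
    m = X.mean(axis=0)                                              # (11)
    A = (X - m).T                                                   # (12): d x N
    lam, V = np.linalg.eigh(A.T @ A)                                # (14)
    order = np.argsort(lam)[::-1]                                   # descending
    lam, V = lam[order], V[:, order]
    # Keep the Q leading eigenvectors carrying `energy` of the total variance.
    Q = int(np.searchsorted(np.cumsum(lam) / lam.sum(), energy)) + 1
    U = A @ V[:, :Q] / np.sqrt(lam[:Q])                             # (15)
    return m, U

def residue(x, m, U):
    """Reconstruction residue of (18): small for authentics, large for
    impostors. The max() guards against tiny negative round-off."""
    s = x.ravel() - m
    w = U.T @ s                                                     # (16)
    return np.sqrt(max(float(s @ s - w @ w), 0.0))                  # (18)
```

A training image reconstructs almost exactly (near-zero residue), while an unseen impostor image leaves a large residue.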

C. FisherFaces

The Fisher Linear Discriminant Analysis (LDA) [12] tries to find an optimal projection direction w giving the best discrimination between classes in a reduced-dimensional space. In the case of face authentication, the Fisher LDA projects the face images onto a one-dimensional (1-D) subspace such that the between-class distance in the projected space is increased while the within-class scatter is reduced. The Fisher LDA employs the linear function w^T x, for which the criterion function

J(w) = (m̃_1 − m̃_2)² / (s̃_1² + s̃_2²)   (19)

is maximized. Here, m̃_k is the mean and s̃_k² is the scatter of the kth class, i.e., either authentic or impostor. We assume there are N training images each for authentics and for impostors. J can also be expressed using the between-class scatter matrix S_B and within-class scatter matrix S_W:

S_B = (m_1 − m_2)(m_1 − m_2)^T   (20)

S_W = Σ_{k=1}^{2} Σ_{n=1}^{N} (x_{kn} − m_k)(x_{kn} − m_k)^T   (21)

J(w) = (w^T S_B w) / (w^T S_W w).   (22)

The optimal projection direction w that optimizes J(w) is given by

w = S_W^(-1) (m_1 − m_2).   (23)

Usually, the within-class scatter matrix S_W is singular, since it is of size d × d, its rank is at most (2N − 2), and the number of pixels d is much larger than the number of training images per class N. In order to be able to invert the within-class scatter matrix for face authentication, the training images are first projected onto a lower dimensional space [typically of dimension (2N − 2)] using Universal PCA, and then the discriminant analysis is done in the lower dimensional space [2].
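The whole pipeline (Universal PCA down to a (2N − 2)-dimensional space, then (20)-(23)) can be sketched as follows; the function name and the synthetic usage are illustrative assumptions:

```python
import numpy as np

def fisherfaces_score(auth_images, imp_images):
    """Train a FisherFaces projection per (20)-(23). Universal PCA reduces
    the data so that S_W becomes invertible; w = S_W^{-1}(m1 - m2) is then
    found in the reduced space. Returns a scoring function for test images."""
    X = np.array([im.ravel() for im in auth_images + imp_images], dtype=float)
    mu = X.mean(axis=0)
    # Universal PCA via the inner-product trick, keeping 2N - 2 dimensions.
    A = (X - mu).T
    lam, V = np.linalg.eigh(A.T @ A)
    keep = np.argsort(lam)[::-1][: X.shape[0] - 2]
    U = A @ V[:, keep] / np.sqrt(lam[keep])
    Y = (X - mu) @ U                               # projected training data
    n1 = len(auth_images)
    m1, m2 = Y[:n1].mean(axis=0), Y[n1:].mean(axis=0)
    Sw = ((Y[:n1] - m1).T @ (Y[:n1] - m1)
          + (Y[n1:] - m2).T @ (Y[n1:] - m2))       # (21)
    w = np.linalg.solve(Sw, m1 - m2)               # (23)
    return lambda x: float((x.ravel() - mu) @ U @ w)
```

On well-separated synthetic classes, the returned projection score cleanly separates authentics (high scores) from impostors (low scores).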


TABLE I
SAMPLE IMAGES OF THE SAME PERSON UNDER DIFFERENT ILLUMINATIONS IN ROUNDS 1 AND 2

TABLE II
PROCESSED SAMPLE IMAGES OF TABLE I UNDER DIFFERENT ILLUMINATIONS IN ROUNDS 1 AND 2

III. CELL PHONE CAMERA FACE DATABASE

A. Data Collection

The image database comprises 20 subjects whose images were taken with a cell phone camera. The camera used was NTT's mova D25li, part of the SH251i product line. It is equipped with a 170 000-pixel charge-coupled device (CCD) camera. The images are stored in a "Memory Stick Duo." All the images were collected in the camera's "Burst Mode," in which 20 frames of size 120 × 120 are captured over a period of time. The vertical and horizontal resolutions are 96 dots/in, and the bit depth of the images is 8.

Two rounds of images were collected, approximately one month apart, to evaluate the performance of the methods over time as people's appearances change. To evaluate the performance of the various algorithms under different illumination conditions, images were collected at six illumination settings. The first setting was the normal ambient indoor lighting. Additional illumination from a lamp to the left, the right, and both sides of the subject gave the next three settings. The fifth setting mimicked total darkness, with all artificial lights (including the ambient lights) switched off and only the camera's built-in compact light used. The last setting was outdoors, where the illumination is very dynamic and varies greatly depending on the time of day and the location. The outdoor images were taken at different locations between the two rounds, and at different times of the day where possible, to incorporate the fluctuations in outdoor lighting. Each round contained two sets, Set 1 and Set 2. Set 1 images were used primarily for training the algorithms, whereas Set 2 images were used primarily for testing. There are 20 images in each set, thus totaling 80 × 6 = 480 images per person over all illuminations.

Table I shows one image from each illumination variation and each round. It is important to note that no strict constraints were put on the orientation and scale of the images, in order to mimic real-life situations. The fish-eye lens of the camera produced distorted images at different distances and orientations from the camera, thus providing scale, tilt, and pose variations in the images.

B. Processing

One of the major hurdles was preprocessing the images in the presence of scale, tilt, and pose variations in order to make them suitable for testing. The distance between the eyes was used for cropping the face regions. The eye locations were found semiautomatically, owing to the absence of accurate automatic eye-location algorithms: the eye location in the first image of a set was found manually, while the rest were found by correlating the images with a UOTSDF filter built from the first image. Finally, contrast stretching was done on all the images for some illumination normalization, and then all the images were normalized to have unit energy. Table II shows the processed images of Table I.
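The normalization step can be sketched as follows; the exact contrast-stretching formula is not specified here, so a simple min-max stretch is assumed (and the eye-based cropping is taken to have been done already):

```python
import numpy as np

def preprocess(img):
    """Contrast-stretch an image to [0, 1], then normalize to unit energy.
    The min-max stretch is an assumed form of the contrast stretching."""
    img = img.astype(float)
    # Contrast stretching: map the min/max gray levels to the full range.
    img = (img - img.min()) / (img.max() - img.min() + 1e-12)
    # Unit-energy normalization: sum of squared pixel values equals 1.
    return img / np.linalg.norm(img)
```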

IV. EVALUATION OF CORRELATION FILTERS, INDIVIDUAL PCA, AND FISHERFACES

We compared the verification results of the three methods—correlation filters, Individual PCA, and FisherFaces—on the processed images. UOTSDF filters with a small noise tolerance coefficient (α = 10^-5) were built incrementally from the training images. The testing was divided into three processes to study the effects of time and illumination, and how much the performance improves by adding samples of these variations to the training set.

For each of the three methods (i.e., correlation filters, Individual PCA, and FisherFaces), the error rates were found based on a threshold between authentic and impostor scores. Thresholds were placed on the PSRs for correlation filters, the residues for Individual PCA, and the projections for FisherFaces. The false acceptance rate (FAR) and false rejection rate (FRR) were calculated as follows for a particular threshold:

FAR = (Number of incorrectly accepted impostor images) / (Total number of impostor images)   (24)


TABLE III
AVERAGE EER (%) OF CORRELATION FILTERS, INDIVIDUAL PCA, AND FISHERFACES FOR TESTING PROCESSES I, II, AND III AT DIFFERENT ILLUMINATIONS

FRR = (Number of incorrectly rejected authentic images) / (Total number of authentic images).   (25)

The Equal Error Rate (EER) is the error rate when the threshold is suchthat FAR and FRR are equal.
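Given lists of authentic and impostor scores, the EER can be found by sweeping the threshold over the observed scores. The helper below is a sketch under the assumption that higher scores indicate authentics, as with PSRs or FisherFaces projections; for Individual PCA residues, where lower is better, negate the scores first:

```python
import numpy as np

def equal_error_rate(authentic_scores, impostor_scores):
    """Sweep a decision threshold over all observed scores and return the
    operating point where FAR (24) and FRR (25) coincide (reported as the
    average of the two at the closest threshold)."""
    a = np.asarray(authentic_scores, dtype=float)
    i = np.asarray(impostor_scores, dtype=float)
    best_gap, eer = np.inf, 1.0
    for t in np.concatenate([a, i]):
        far = float(np.mean(i >= t))   # incorrectly accepted impostors
        frr = float(np.mean(a < t))    # incorrectly rejected authentics
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer
```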

A. Training and Testing Processes

1) Training With Normal Ambient Indoor Lighting of Round 1: The first process put the algorithms through the toughest test, probing the effects of both illumination and time. Twenty images from Set 1 of Round 1 under normal ambient indoor lighting were used for training. Testing was done on all images of all illuminations, except the training set.

Four UOTSDF filters, each using five consecutive training images, were built for each person. For Individual PCA, the eigenvectors corresponding to 98% of the energy (approximately 15) were stored for each person. For FisherFaces, projection was done by Universal PCA using 61 eigenvectors, corresponding to 98% energy, followed by the Fisher LDA.

Table III shows the error rates for correlation filters, Individual PCA, and FisherFaces. All methods perform poorly on images captured a month later. Outdoor images have large error rates, which is expected since outdoor and indoor illumination are significantly different. Among the indoor illumination images, all three methods performed worst on the images with no lights, implying that these images are tougher to verify and indicating that images with and without background lighting are substantially different.

Since the correlation filters we used for evaluation (UOTSDF filters with a small noise coefficient and a large peak sharpness coefficient) do not emphasize the low-frequency components, and illumination variation corresponds to low-frequency components, correlation filters are expected to perform reasonably well even under different illumination conditions when built with indoor images under normal ambient light. From the results of Process 1, the average EER of correlation filters (11.4%) is lower than that of Individual PCA (15.9%) under various illumination conditions. FisherFaces performs best in this training process, with an average EER of 7.1%. FisherFaces also uses impostor distributions, unlike the other two methods, which helps provide better performance. It should also be noted that the number of eigenvectors used in Universal PCA to lower the dimension of the training images before applying the Fisher LDA is much larger than the corresponding number of eigenvectors or filters used in the other two methods.

2) Training With Samples From All Illuminations of Round 1: Process 2 helps us understand the effects of images taken a certain time period apart. Samples of different illuminations from Set 1 of Round 1 were added to the normal ambient illumination images from Set 1 of Round 1. Since the outdoor images are very different, training and testing were separated between outdoor and indoor illuminations. Every fourth image was used, resulting in 25 training images per person for indoor illuminations (five from ambient light and five from each of the other indoor light conditions) and ten training images per person for outdoor illuminations (five from ambient light and five from outdoor light conditions).

For the indoor illumination test, approximately 17 eigenvectors were retained in Individual PCA to preserve 98% energy, while 130 eigenvectors from Universal PCA were used for projecting the images before the Fisher LDA to preserve 98% energy. To update the correlation filters with images of different illuminations, the selective incremental updating approach described in Section II-A was used. Multiple UOTSDF filters were built depending on the variability of the training images: five to 12 filters per person for the indoor illumination test. For the outdoor illumination test, 89 eigenvectors were used in Universal PCA, while two to five correlation filters were built.

From Table III, we find that the error rates decrease in Round 1 when samples of different illuminations are included in the training set. The error rates for Round 2 are still quite large, indicating that face images change significantly over time. However, there may also be distortions introduced by holding the camera at a different distance and orientation from the face.

Since the first few eigenvectors are expected to reflect the changes in illumination, results are also shown in Table III for Individual PCA without the first three eigenvectors. However, due to the pose, scale, and fish-eye lens distortion present in the database, the first few eigenvectors also represent these distortions. Hence, the performance is poorer when the first three eigenvectors are removed.

Correlation filters have a good generalization capability and per-form well when samples of the expected distortions are included in thetraining set. The average EER of 3.9% for correlation filters is lowerthan that of Individual PCA and comparable to that of FisherFaces over



the different illumination sets, despite using only a small set of filters per person, as compared to the large number of eigenvectors required for the other methods.

3) Training With Samples From All Illuminations as Well as From Different Rounds: Process 3 was the simplest test that we put the methods through, since images from different time scales were also included in the training set. This represents the real-life scenario of continually updating the methods over time. Every fourth image from Set 1 of Round 2 of the different illuminations was added to the training set of Process 2. The indoor and outdoor illuminations were trained and tested separately, as in Process 2. Thus, 50 training images per person are used in the indoor illumination test, and 20 training images per person are used in the outdoor illumination test.

Approximately 35 eigenvectors per person were required in Individual PCA in order to retain 98% of the energy in the indoor illumination training. FisherFaces used 221 eigenvectors for the indoor illumination test and 104 eigenvectors for the outdoor illumination test in order to retain 98% energy in Universal PCA. Selective incremental updating leads to nine to 22 UOTSDF filters per person in the indoor illumination test and four to 14 filters per person in the outdoor illumination test.
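The incremental updating of a UOTSDF filter can be sketched as follows. This is a simplified illustration under the common unconstrained OTSDF form h = (alpha*D + beta*I)^-1 m, where m is the mean training spectrum and D is the average power spectrum; the class name, the parameters alpha and beta, and the white-noise simplification are our assumptions, not the paper's exact formulation.

```python
import numpy as np

class UOTSDFFilter:
    """Sketch of a UOTSDF-style filter with incremental updating:
    running sums of the training spectra and their power spectra are
    kept, so a new image is folded in without rebuilding from scratch.
    alpha/beta trade-off values here are purely illustrative."""

    def __init__(self, shape, alpha=0.99, beta=0.01):
        self.alpha, self.beta = alpha, beta
        self.n = 0
        self.sum_spectrum = np.zeros(shape, dtype=complex)  # sum of X_i
        self.sum_power = np.zeros(shape)                    # sum of |X_i|^2

    def add_image(self, img):
        """Fold one new training image into the running sums."""
        X = np.fft.fft2(img)
        self.n += 1
        self.sum_spectrum += X
        self.sum_power += np.abs(X) ** 2

    def filter(self):
        """h = (alpha*D + beta*I)^-1 m, elementwise since D is diagonal
        in the frequency domain."""
        m = self.sum_spectrum / self.n   # mean training spectrum
        D = self.sum_power / self.n      # average power spectrum
        return m / (self.alpha * D + self.beta)

rng = np.random.default_rng(1)
f = UOTSDFFilter((16, 16))
for _ in range(5):
    f.add_image(rng.standard_normal((16, 16)))
h = f.filter()
```

Because only the two running sums are stored, an authentic image that produces a low correlation peak can be added later with a single `add_image` call, which is the "selective" update described in the text.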

As expected, the error rates for all methods reduce from those in Process 2. From Table III, we find that correlation filters have lower error rates than Individual PCA and comparable error rates to FisherFaces while using only a small number of filters, as compared to the large number of eigenvectors used in FisherFaces. The average EER of correlation filters is 1.1% over different indoor illuminations and 1.4% over indoor and outdoor illuminations, whereas the average EER of FisherFaces is about 1% for both cases. This demonstrates the efficient representation and good generalization capability of correlation filters.
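The EER figures quoted here correspond to the operating point where the false accept rate equals the false reject rate. A minimal sketch of how such a value is computed from genuine and impostor score sets (illustrative Python with toy scores, not the paper's evaluation code):

```python
import numpy as np

def equal_error_rate(genuine, impostor):
    """Approximate the EER: sweep candidate thresholds and take the
    point where false accept and false reject rates are closest."""
    thresholds = np.sort(np.concatenate([genuine, impostor]))
    far = np.array([np.mean(impostor >= t) for t in thresholds])  # impostors accepted
    frr = np.array([np.mean(genuine < t) for t in thresholds])    # authentics rejected
    i = np.argmin(np.abs(far - frr))
    return (far[i] + frr[i]) / 2.0

# toy scores: perfect separation gives an EER of zero
genuine = np.array([0.80, 0.85, 0.90, 0.95])
impostor = np.array([0.10, 0.15, 0.20, 0.30])
eer = equal_error_rate(genuine, impostor)
```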

B. Comparison and Conclusions

From Table III, as expected, with the addition of samples of the distortions into the training set from Processes 1 to 3, the error rates reduce from Processes 1 to 3 for all three methods. The error rates for images captured a month later are significantly larger in Processes 1 and 2, implying that time scale plays a large role in face images. It should also be noted that there may be changes in scale and orientation, and hence distortion through the fish-eye lens, because of holding the cell phone camera at a different distance from the face. Between indoor and outdoor images, all three methods performed poorly on the outdoor images, indicating that outdoor images have much more variability in illumination.

It should be noted that FisherFaces uses impostor distributions that are not used by the other two methods. The number of eigenvectors used in Universal PCA is also very large, contributing to some unfairness in the comparison. Further, in practical applications, impostor distributions may not always be available.

Among the three methods, correlation filters have the least storage requirement, requiring only a few filters for good performance. This indicates that correlation filters use an efficient representation of the data for verification purposes. This also aids in reducing the computational time.
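The run-time efficiency comes from performing the correlation in the frequency domain with FFTs and scoring the output plane, commonly with a peak-to-sidelobe ratio (PSR). The following sketch is illustrative only; the matched filter and the 5x5 peak exclusion region are our assumptions, not the paper's exact settings.

```python
import numpy as np

def correlation_output(img, h):
    """Correlate a test image with a frequency-domain filter h via the
    FFT; a spatial shift of the input only moves the correlation peak,
    which is the shift-invariance property noted in the text."""
    return np.real(np.fft.ifft2(np.fft.fft2(img) * np.conj(h)))

def peak_to_sidelobe_ratio(c, exclude=2):
    """PSR = (peak - sidelobe mean) / sidelobe std, a common match
    score for correlation planes (region sizes vary by system)."""
    py, px = np.unravel_index(np.argmax(c), c.shape)
    mask = np.ones(c.shape, dtype=bool)     # True = sidelobe region
    mask[max(0, py - exclude):py + exclude + 1,
         max(0, px - exclude):px + exclude + 1] = False
    sidelobe = c[mask]
    return (c[py, px] - sidelobe.mean()) / (sidelobe.std() + 1e-12)

rng = np.random.default_rng(0)
template = rng.standard_normal((16, 16))
h = np.fft.fft2(template)   # simple matched filter, for illustration
match_psr = peak_to_sidelobe_ratio(correlation_output(template, h))
nonmatch_psr = peak_to_sidelobe_ratio(
    correlation_output(rng.standard_normal((16, 16)), h))
```

A stored filter is just one complex array per class, which is the small storage footprint the text refers to.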

By adding training images having distortions, correlation filters incorporate distortion tolerance and can generalize better. Over time, to accommodate changes or distortions in the face images, the UOTSDF filters can easily be updated incrementally by training on authentic images producing low correlation peaks. This is an additional advantage over the other methods. The performance of correlation filters is better than that of Individual PCA and comparable to that of FisherFaces, even though FisherFaces uses a large number of eigenvectors. They have an average EER of about 1.5% on the cell phone database when images from

different illuminations and different rounds are used to incrementally update the filters. They also have advantages such as shift-invariance, closed-form expressions, and graceful degradation, and because of these advantages, correlation filters are preferable to Individual PCA and FisherFaces.

V. CONCLUSION

The cell phone camera database has been used to study the performance of some face verification algorithms in real-life situations. The database has scale and pose distortions in addition to illumination and time-scale variations. The fish-eye lens causes further distortion in the images with changes in orientation.

Despite the challenging database, correlation filters performed better on average than the Individual PCA method. The error rates using a small number of incrementally updated UOTSDF filters are comparable to those of FisherFaces using a large number of eigenvectors. Time scale plays a significant role in cell phone face images, and error rates are larger for images captured a month later. If the correlation filters are incrementally updated with images over time, error rates reduce. The UOTSDF filter provides an easy way of incremental updating using images captured over a period of time, without the need to rebuild the filter from scratch. By incrementally updating the correlation filters over time with images of different illuminations, the average EER is about 1.5% for images captured a month later.

There is a need for improvement if these filters are to be implemented in the real world. Scale changes caused by capturing face images at different distances from the cell phone camera are a major reason for lower performance. It is very difficult to resize the images to a consistent scale with respect to one another; the up-sampling or down-sampling applied during preprocessing scales the frequency-domain bands with respect to those of the training images. It was noticed in preliminary testing that the performance is sensitive to changes in eye locations, so human error in manually locating the eyes would result in a larger EER than otherwise. Accurate scale normalization algorithms would reduce the error rates. The cell phone camera also produced lower quality images and more distortion than a normal digital camera. For example, the cell phone's fish-eye lens distorted images when there was a slight deviation in the orientation of the subject with respect to the camera.

Correlation filters incorporate distortion tolerance when examples of images having the expected distortion are included in the training set. Correlation filters are efficient in terms of both memory usage and run time, which makes them ideal for implementation in low-storage, low-computation devices such as cell phones or PDAs. In addition, correlation filters are shift-invariant and provide graceful degradation and closed-form solutions, making their use attractive.

REFERENCES

[1] M. Turk and A. Pentland, “Eigenfaces for recognition,” J. Cognitive Neurosci., vol. 3, no. 1, pp. 71–86, 1991.

[2] P. N. Belhumeur, J. P. Hespanha, and D. J. Kriegman, “Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection,” IEEE Trans. Pattern Anal. Machine Intell., vol. 19, no. 7, pp. 711–720, Jul. 1997.

[3] B. V. K. V. Kumar, “Tutorial survey of composite filter designs for optical correlators,” Applied Opt., vol. 31, pp. 4773–4801, 1992.

[4] B. V. K. Vijaya Kumar, “Minimum variance synthetic discriminant functions,” J. Opt. Soc. Am. A, vol. 3, pp. 1579–1584, 1986.

[5] A. Mahalanobis, B. V. K. Vijaya Kumar, and D. Casasent, “Minimum average correlation energy filters,” Applied Opt., vol. 26, pp. 3633–3640, 1987.

[6] P. Réfrégier, “Optimal trade-off filters for noise robustness, sharpness of the correlation peak, and Horner efficiency,” Opt. Lett., vol. 16, pp. 829–831, 1991.


[7] B. V. K. V. Kumar, D. W. Carlson, and A. Mahalanobis, “Optimal trade-off synthetic discriminant function filters for arbitrary devices,” Opt. Lett., vol. 19, no. 19, pp. 1556–1558, 1994.

[8] M. Savvides, K. Venkataramani, and B. V. K. Vijaya Kumar, “Incremental updating of advanced correlation filters for biometric authentication systems,” in Proc. IEEE Int. Conf. Multimedia Expo, vol. 3, Jul. 2003, pp. 229–232.

[9] B. V. K. V. Kumar, M. Savvides, C. Xie, K. Venkataramani, and J. Thornton, “Using composite correlation filters for biometric verification,” in Proc. SPIE, vol. 5106, Apr. 2003, pp. 13–21.

[10] K. Venkataramani and B. V. K. Vijaya Kumar, “Performance of composite correlation filters in fingerprint verification,” Opt. Eng., vol. 43, no. 8, pp. 1820–1827, Aug. 2004.

[11] X. Liu, T. Chen, and B. V. K. Vijaya Kumar, “Face authentication for multiple subjects using eigenflow,” Pattern Recognit., vol. 36, no. 2, pp. 313–328, Feb. 2003.

[12] R. Duda, P. Hart, and D. Stork, Pattern Classification, 2nd ed. New York: Wiley, 2001.

Target Dependent Score Normalization Techniques and Their Application to Signature Verification

Julian Fierrez-Aguilar, Javier Ortega-Garcia, and Joaquin Gonzalez-Rodriguez

Abstract—Score normalization methods in biometric verification, which encompass the more traditional user-dependent decision thresholding techniques, are reviewed from a test hypotheses point of view. These are classified into test dependent and target dependent methods. The focus of the paper is on target dependent score normalization techniques, which are further classified into impostor-centric, target-centric, and target-impostor methods. These are applied to an on-line signature verification system on signature data from the First International Signature Verification Competition (SVC 2004). In particular, a target-centric technique based on the cross-validation procedure provides the best relative performance improvement testing both with skilled (19%) and random forgeries (53%) as compared to the raw verification performance without score normalization (7.14% and 1.06% Equal Error Rate for skilled and random forgeries, respectively).

Index Terms—Biometrics, decision threshold, score normalization, signature verification.

I. INTRODUCTION

Automatic extraction of identity cues from personal traits (e.g., fingerprints, speech, or face images) has given rise to a particular area of pattern recognition (biometrics) where the goal is to infer the identity of people from personal data [1], [2]. The increasing interest in biometrics is related to the number of important applications where a correct assessment of identity is crucial. Biometrics provides a way to establish an identity based on “who you are,” rather than by “what you possess” or “what you know.” This concept not only ensures enhanced security but also avoids the need to remember and maintain multiple passwords.

Manuscript received December 15, 2003; revised May 19, 2004. This work was supported by the Spanish Ministry of Science and Technology under projects TIC2003-09068-C02-01 and TIC2003-08382-C05-01. This paper was recommended by Guest Editor D. Zhang.

The authors are with the Biometrics Research Lab.-ATVS, EUIT Telecomunicacion, Universidad Politecnica de Madrid, 28031 Madrid, Spain (e-mail: [email protected]; [email protected]; [email protected]).

Digital Object Identifier 10.1109/TSMCC.2005.848181

Previous studies have shown that the performance of a number of biometric verification systems, especially those based on behavioral traits such as the written signature [3]–[5] and voice [6], [7], can be improved with user-dependent decision thresholds. Even greater verification performance improvement can be expected through the use of score normalization techniques [8], [9]. These methods (which include user-dependent decision thresholding as a particular case) account not only for user specificities but also for intersession and environment changes [10].

The objectives of this work are: 1) to provide a framework for score normalization collecting previous work in related areas; 2) to provide some guidelines for the application of these techniques in real-world scenarios; and 3) to provide an example of a successful application of the proposed normalization methods in the First International Signature Verification Competition (SVC 2004) [11], where the system proposed by the authors [12] was ranked first and second for random and skilled forgeries, respectively.

The paper is structured as follows: The Introduction includes some definitions, the system model of biometric verification with score normalization, and the description of a preliminary experiment which corroborates the motivation of this work.1 In Section III, the subset of score normalization methods we focus on is detailed. Some experiments on the development corpus of the SVC 2004 extended task are reported in Section IV. Conclusions are given in Section V.

A. Definitions and System Model

In authentication (also known as verification) applications, the clients or targets are known to the system (through an enrollment or training process), whereas the impostors can potentially be the world population. In such applications, the users provide a biometric sample X (e.g., a written signature) and their claimed identities T, and a one-to-one matching is performed. The result of the comparison s (similarity score) can be further normalized to sn before comparing it to a decision threshold. If the score is higher than the decision threshold, then the claim is accepted; otherwise, the claim is rejected. The system model of biometric authentication with score normalization is provided in Fig. 1 for an on-line signature verification application.
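As a minimal sketch of this decision rule, combined with an impostor-centric z-norm of the kind the paper classifies later (the function names, threshold value, and score data are illustrative assumptions, not the authors' system):

```python
import numpy as np

def znorm(score, impostor_scores):
    """Impostor-centric z-norm: sn = (s - mu_I) / sigma_I, using
    target-dependent statistics of impostor scores."""
    mu = np.mean(impostor_scores)
    sigma = np.std(impostor_scores)
    return (score - mu) / sigma

def accept_claim(score, impostor_scores, threshold=3.0):
    """Accept the identity claim iff the normalized score clears a
    single global threshold (the value 3.0 is only illustrative)."""
    return znorm(score, impostor_scores) > threshold

# toy target-dependent impostor scores for one claimed identity
impostors = np.array([0.10, 0.20, 0.15, 0.25])
genuine_decision = accept_claim(0.90, impostors)   # far above impostor range
impostor_decision = accept_claim(0.20, impostors)  # typical impostor score
```

Because the normalization maps each target's impostor distribution to zero mean and unit variance, a single global threshold can behave like a user-dependent one, which is the central idea of the techniques reviewed here.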

Depending on the biometric verification system at hand, impostors may know information about the client that lowers verification performance when it is exploited (e.g., signature shape in signature verification). As a result, two kinds of impostors are usually considered, namely: 1) casual impostors producing random forgeries, when no information about the target user is known, and 2) real impostors producing skilled forgeries, when some information regarding the biometric trait being forged is used.

B. Experimental Motivation

As pointed out above, it has been observed in a number of biometric verification systems that using user-dependent thresholds improves verification performance [3]–[7]. This occurs because the client and impostor score distributions are not aligned for the different targets involved (mainly due to target specificities). The following preliminary experiment, using the on-line signature verification system described in [12] on the development corpus of the SVC 2004 extended task [11], corroborates this fact.

Target-dependent client and impostor score distributions (Gaussian fit) are plotted in Fig. 2 and show testing either with skilled (left) or

1In Section II, the framework for score normalization and some background on error estimation methods is described.
