
Pattern Recognition 36 (2003) 649–655
www.elsevier.com/locate/patcog

Do singular values contain adequate information for face recognition?

Yuan Tian, Tieniu Tan, Yunhong Wang ∗, Yuchun Fang
National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences, PO Box 2728, Beijing, China

Received 12 January 2001; accepted 18 April 2002

Abstract

Singular values (SVs) have been used for face recognition by many researchers. In this paper, we show that the SVs contain little useful information for face recognition and that the most important information is encoded in the two orthogonal matrices of the SVD. Experimental results are given to support this observation. To overcome this problem, a new method for face recognition based on the above finding is proposed. The face image is projected onto the orthogonal bases of the SVD, and the vectors of coefficients are used as the face image features. Using the probability density of this image feature, obtained by a simplified EM algorithm, a Bayesian classifier is adopted to recognize unknown faces. The proposed algorithm obtains acceptable experimental results on the ORL face database.
© 2002 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved.

Keywords: Face recognition; Orthogonal decomposition; SVD; Bayesian decision

1. Introduction

Biometrics-based personal identification is perceived as a key enabling technology in our increasingly networked society. Among all biometric identification methods, face identification has attracted much attention in recent years since it is non-intrusive and user-friendly [1].

Great progress has been made in face recognition in the past 20 years. For almost all previously proposed techniques, the success of face recognition depends on the solution of two problems: representation and matching [2]. The representation of a pattern can be considered as feature extraction in pattern recognition. Many methods have been used for face recognition [3–8]. In Ref. [9], image features are divided into four groups: visual features [10,11], statistical pixel features, transform coefficient features [3], and algebraic features [9]. The algebraic features represent intrinsic properties of an image and have better stability. Hong [9] suggested that algebraic features are valid features for object recognition tasks such as face recognition. He proposed a singular value decomposition (SVD) [12] based recognition method which uses the singular values as the feature vectors. The effectiveness of SVD has been tested in Refs. [9,13]. In Ref. [9], an error rate of 42.47% was recorded, which was attributed to the statistical limitations of the small number of samples. Cheng [13] proposed a human face recognition method based on a statistical model for small sample sizes that also used the singular values as the face features. In his paper, an optimal discriminant transformation is constructed to transform the original space of singular value (SV) vectors into a new space whose dimension is significantly lower, in order to minimize the small-sample-size effect. That approach was tested on 64 facial images of eight people, and good discrimination ability was obtained, with an accuracy rate of 100% [13]. It should be noted that, to make the method independent of translation, rotation and scaling, the images were represented by Goshtasby's shape matrices, which are invariant to translation, rotation, and scaling of the facial images and are obtained by polar quantization of the shape [14]. The above two methods have never been tested on large face databases, and their effectiveness on large databases remains unknown (especially when there are variations in lighting and viewpoint). Furthermore, both methods use only SVs as face features. In Ref. [15], a new face recognition method based on SVD was proposed, using reconstruction errors based on the singular values of different face images together with a data fusion method; it is reported that the recognition accuracy of this method reaches 96% on the ORL face database. These SVD-based face recognition methods are also cited by many papers, such as Refs. [16,17].

☆ This work is funded by research grants from the NSFC, the 863 Program and the Chinese Academy of Sciences.
∗ Corresponding author. Tel.: +86 10 6254 2944; fax: +86 10 6255 1993. E-mail address: [email protected] (Y. Wang).

0031-3203/02/$22.00 © 2002 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved. PII: S0031-3203(02)00105-X

All the methods described above extract SVs as the face image feature vector. An important question one should address is whether the singular values carry enough information to identify a human face when they are used as the feature vector. In this paper, experiments show that the SVs of an image contain only partial useful information about a face, and that the most important information is carried by the orthogonal vectors. It is thus expected that results may be improved if these orthogonal vectors are used as face features.

In this paper, a new method is proposed for face recognition. The method consists of three parts. First, we select the proper orthogonal bases and obtain the corresponding projection coefficients. Then, some known samples and an improved EM algorithm [18,19] are combined to estimate the probability density of these samples in the feature space. Finally, a Bayesian classifier is used to identify the human face. Experiments show that the proposed method is an effective algorithm for face recognition.

2. Singular values do not contain adequate information for face recognition

Let us first introduce the relevant properties of the SVD.

Theorem 1 (SVD). If A ∈ R^{m×n}, then there exist orthogonal matrices

U = [u_1, ..., u_m] ∈ R^{m×m} and V = [v_1, ..., v_n] ∈ R^{n×n}

such that

A = U Σ V^T,

where Σ = diag(σ_1, σ_2, ..., σ_p), σ_1 ≥ σ_2 ≥ ··· ≥ σ_p ≥ 0 and p = min(m, n).
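As a concrete check of Theorem 1, the decomposition can be computed numerically. The following NumPy sketch (ours, not from the paper) verifies the reconstruction and the ordering of the singular values on a random matrix:

```python
import numpy as np

# Numerical illustration of Theorem 1 (a sketch; not part of the paper).
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 6))            # A in R^{m x n} with m = 4, n = 6

# full_matrices=True returns square orthogonal U (m x m) and V (n x n).
U, s, Vt = np.linalg.svd(A, full_matrices=True)

p = min(A.shape)                           # p = min(m, n)
Sigma = np.zeros(A.shape)
Sigma[:p, :p] = np.diag(s)                 # Sigma = diag(sigma_1, ..., sigma_p)

# A = U Sigma V^T holds to machine precision, and the SVs are
# non-negative and sorted in descending order.
assert np.allclose(A, U @ Sigma @ Vt)
assert np.all(s >= 0) and np.all(np.diff(s) <= 0)
```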

In recent years, SVs have been used as the feature vector for face recognition and other applications [9,15,20]. Can the singular values express the crucial features of a face? The experiments given below show that SVs include only partial information about the image; as far as face recognition is concerned, the most crucial information is contained in the two orthogonal matrices U and V.

In Figs. 1 and 2, there are four different faces, denoted A1, A2, A3 and A4, respectively. According to Theorem 1, every face image can be expressed as A_i = U_i S_i V_i^T, i = 1, ..., 4, where S_i is a diagonal matrix consisting of the singular values. To show how changes in SVs affect the appearance of face images (and hence the SVs' usefulness for face recognition), we swap the SV matrices between A1 and A2, and between A3 and A4, keeping their orthogonal matrices unchanged. The reconstructed images are shown in Figs. 1(b) and (d) and Figs. 2(b) and (d). It can be seen that the SVs have little influence on the appearance of the face images as far as face recognition is concerned. The crucial features must therefore be carried by the orthogonal matrices U and V of the face images. In Figs. 1 and 2, (a) and (c) have the same SVs as (d) and (b), respectively, yet they express different faces. Fig. 3 shows a set of faces derived from the ORL database. The main difference between Fig. 3 and the original ORL images is that all the face images in Fig. 3 have the same SVs as the top-left face image. Similar experiments have been done on all the images of the ORL database, and we find that most of the reconstructed images maintain the same identity, although the gray level distributions of some of them (less than 25%) cover a wider range than the original [0, 255]. The number of reconstructed images whose gray levels go beyond the range [0, 255] is reduced remarkably when the SVs obtained from the average face image are used. Experiments that reconstruct the images with the SVs of a non-face image have also been done, as in Fig. 4. The main difference between Figs. 3 and 4 is that the gray level distributions of most reconstructed images in Fig. 4 go beyond the range [0, 255]. Apparently, one cannot differentiate these face images if only singular values are used as the feature vectors for face recognition. Similar experiments have been done on all the images of the ORL database with other non-face images, and similar results were obtained. This observation shows that the singular values of an image contain only a little useful information, and the most important features are encoded in the orthogonal matrices U and V of the image.
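The swapping experiment itself is easy to reproduce on any pair of equally sized images. The sketch below uses random matrices as stand-ins for the 112 × 92 ORL images (the database itself is not bundled here):

```python
import numpy as np

# Sketch of the SV-swapping experiment on synthetic stand-ins for two
# 112 x 92 face images (the actual ORL images are not reproduced here).
rng = np.random.default_rng(1)
A1 = rng.uniform(0, 255, (112, 92))
A2 = rng.uniform(0, 255, (112, 92))

U1, s1, V1t = np.linalg.svd(A1, full_matrices=False)
U2, s2, V2t = np.linalg.svd(A2, full_matrices=False)

# Swap the SV matrices while keeping each image's own U and V.
A1_swapped = U1 @ np.diag(s2) @ V1t
A2_swapped = U2 @ np.diag(s1) @ V2t

# Swapping back with the original SVs recovers the images exactly,
# confirming that U, S and V together determine the image.
assert np.allclose(U1 @ np.diag(s1) @ V1t, A1)
assert np.allclose(U2 @ np.diag(s2) @ V2t, A2)
```

For real face images, A1_swapped keeps A1's identity because the spatial structure lives in U1 and V1; only the per-component energies change.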

According to the experiments discussed above, the relationship between the original image A_o and the normalized reconstructed image A_r can be represented approximately by the linear transform

A_o = α A_r + C,

where α is a coefficient and C is a constant matrix with the same dimensions as the original image. The influence of the SVs can therefore be represented by the parameters α and C. These experiments show that swapping SVs has little influence within the same type of images, such as face images, and a larger influence across different types of images. In Fig. 3, all the images have the same singular values, which means that α = 1 and C = [0]_{m×n}, where m × n is the dimension of the original image. In the case of Fig. 4,


Fig. 1. Effects of swapping the singular values of two images.

Fig. 2. Effects of swapping the singular values of two images.

Fig. 3. ORL faces reconstructed with the same singular values as the top-left face image.


Fig. 4. Normalized ORL faces reconstructed with the same singular values as the top-left non-face image.

the parameters α and C usually change more than in the case of Fig. 3. In either case, it is obvious that the most important information for recognition is encoded in the two orthogonal matrices and not in the SVs.

These experiments show that the SVs do not contain adequate information for face recognition. This observation is also confirmed by the experiments in Section 6.

Why did previous SVD papers obtain good results using SV feature vectors for face recognition? Some possible reasons are as follows. First, almost all of these papers performed face recognition on a small face database. Such samples are too few to reveal the inherent connection between face images and the SV feature vectors; the distribution of the SV feature vectors may simply happen to be easily separable in the given test database. Another important reason may be that feature-mapping classifiers were used in these papers, such as RBF neural networks [15]. These classifiers have a strong ability to separate the given samples and can usually achieve good classification results even if the samples have a complicated distribution. It is therefore possible to obtain good recognition results under the conditions discussed above. This viewpoint is partly confirmed later in this paper.

To overcome the deficiency of singular values, a new method is proposed for face recognition in the following sections. A new feature vector, obtained by projecting the image onto the set of orthogonal bases of the template image, is used as the feature for face recognition. First, the functions of probability density distribution (FPDDs) of every component of the feature vector are obtained using the known training samples and the method in Section 4. Then we project the unknown face image onto these orthogonal bases to obtain the projection coefficients, and the recognition task is completed based on the feature vector and the Bayesian classifier. Details of the algorithm are given in Section 5.

3. Face features based on orthogonal projection coefficients of SVD

We now present a new feature based on orthogonal projection coefficients. A matrix A representing a face image can be written as

A = Σ_{i=1}^{m} σ_i u_i v_i^T,   (1)

where the σ_i are the singular values and the u_i v_i^T are the orthogonal bases of the SVD. The orthogonal bases u_i v_i^T are saved in the model; the coefficients λ_i = v_i^T B^T u_i, obtained by projecting an image B onto u_i v_i^T, can be considered as the features of the face image B with respect to the template face A. The next theorem shows that these features are robust.

Theorem 2. Suppose A, B ∈ R^{m×n} and m < n. Let U = [u_1, u_2, ..., u_m] and V = [v_1, v_2, ..., v_n] be the two orthogonal matrices derived from a template face. If α_i = v_i^T A^T u_i and β_i = v_i^T B^T u_i, i = 1, ..., m, then

Σ_{i=1}^{m} (α_i − β_i)^2 ≤ ‖A − B‖_F^2,

where ‖·‖_F is the Frobenius norm of a matrix.


Proof.

‖A − B‖_F^2 = ‖U^T (A − B) V‖_F^2 = ‖U^T A V − U^T B V‖_F^2
            ≥ ‖diag(α_1, ..., α_m) − diag(β_1, ..., β_m)‖_F^2
            = Σ_{i=1}^{m} (α_i − β_i)^2,

since α_i and β_i are the diagonal entries of U^T A V and U^T B V, respectively, and the squared Frobenius norm of a matrix is at least the sum of the squares of its diagonal entries. □
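The bound in Theorem 2 can also be checked numerically. The sketch below (ours, with random matrices standing in for images) computes both sides of the inequality:

```python
import numpy as np

# Numerical check of Theorem 2 on random matrices (a sketch, not the
# paper's experiment). m < n, as required by the theorem.
rng = np.random.default_rng(2)
m, n = 5, 8
A = rng.standard_normal((m, n))
B = rng.standard_normal((m, n))

# Orthogonal bases derived from a "template" (here the template is A).
U, _, Vt = np.linalg.svd(A, full_matrices=False)   # U: m x m, Vt: m x n

# alpha_i = v_i^T A^T u_i, beta_i = v_i^T B^T u_i
alpha = np.array([Vt[i] @ A.T @ U[:, i] for i in range(m)])
beta = np.array([Vt[i] @ B.T @ U[:, i] for i in range(m)])

lhs = np.sum((alpha - beta) ** 2)                  # sum (alpha_i - beta_i)^2
rhs = np.linalg.norm(A - B, 'fro') ** 2            # ||A - B||_F^2
assert lhs <= rhs + 1e-12
```

The diagonal of U^T (A − B) V accounts for only part of the Frobenius norm, which is exactly why the bound holds.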

The theorem shows that the proposed feature vector is robust, since the change in the feature vector is bounded by the change in the original matrix. Projecting the image onto a set of orthogonal bases saved in the model database yields a vector X = (x_1, x_2, ..., x_N) of projection coefficients. These coefficients are treated as statistically independent of each other. The probability density of X can thus be expressed as

p(X|ω_i) = p(x_1, x_2, ..., x_N | ω_i)
         = p_1(x_1|ω_i) p_2(x_2|ω_i) ··· p_N(x_N|ω_i),   (2)

where ω_i denotes class i. According to Bayes' theorem, when the a priori probabilities P(ω_i) are equal for all classes ω_i, i = 1, ..., N, the probability of X belonging to class ω_i can be written as

p(ω_i|X) = p(X|ω_i) / Σ_{j=1}^{N} p(X|ω_j).   (3)
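Eqs. (2) and (3) amount to a naive-Bayes decision rule. The following sketch (with placeholder Gaussian per-component densities, not the FPDDs estimated in Section 4) shows the product of per-component densities and the normalization over classes:

```python
import numpy as np

# Naive-Bayes decision of Eqs. (2)-(3): independent components, equal
# priors. The Gaussian per-component densities are placeholders for the
# FPDDs estimated in Section 4.
def gaussian(x, mu, sigma):
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)

def posterior(X, class_params):
    """X: feature vector; class_params[i]: list of (mu, sigma) per component."""
    likelihoods = np.array([
        np.prod([gaussian(x, mu, sg) for x, (mu, sg) in zip(X, params)])
        for params in class_params
    ])                                      # Eq. (2): product over components
    return likelihoods / likelihoods.sum()  # Eq. (3): normalize over classes

# Two classes with well-separated component means.
params = [[(0.0, 1.0), (0.0, 1.0)], [(3.0, 1.0), (3.0, 1.0)]]
post = posterior(np.array([0.1, -0.2]), params)
assert np.argmax(post) == 0                 # sample near class 0's means
assert abs(post.sum() - 1.0) < 1e-12        # posteriors sum to one
```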

4. Estimation of probability density functions

A well-known method for estimating probability densities, the mixture-of-Gaussians model fitted with the EM algorithm, is proposed in Refs. [18,21]. It can be used to obtain complex multi-dimensional functions of probability density distribution (FPDDs). The algorithm is an optimization procedure for finding a maximum. In the special case of this paper (the overall probability density of the feature vector can be represented as the product of the probability densities of each component), the method is simplified to reduce computation.

Let T be a set of given training samples T: (x_1, x_2, ..., x_N), whose values lie in the region [x_min, x_max]. All the samples can therefore be mapped into the region [0, L] by the linear transform

x'_i = (x_i − x_min) / (x_max − x_min) · L,   i = 1, ..., N,   (4)

where L + 1 Gaussian functions are centered at the integer points k = 0, 1, ..., L. The unknown FPDD may then be expressed as

p(x'|Θ) = Σ_{k=0}^{L} |α_k| G(x' − k),   (5)

where G(x − k) = (1 / (√(2π) σ)) e^{−(x − k)^2 / (2σ^2)} is the Gaussian function, Σ_{k=0}^{L} |α_k| = 1 and Θ = {α_k}_{k=0}^{L}. Given the set of training samples, Θ may be estimated by maximum likelihood:

Θ̂ = arg max_Θ Π_{t=1}^{N_T} p(x'_t|Θ),   (6)

where N_T is the number of given training samples. The joint FPDD can then be obtained from Eq. (2).
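One reading of this simplification is that the Gaussian components are pinned to the grid points k = 0, ..., L with a common width, so the EM iteration only re-estimates the mixture weights. The sketch below implements such a weight-only EM under these assumptions (the function name and the fixed sigma are ours, not the paper's):

```python
import numpy as np

# Weight-only EM for Eq. (5): components fixed at integer centers
# k = 0..L with a common sigma (an assumption); only the mixture
# weights alpha_k are updated.
def fit_weights(samples, L, sigma=0.5, iters=100):
    centers = np.arange(L + 1)
    # G[t, k]: density of sample t under the k-th fixed Gaussian.
    G = np.exp(-(samples[:, None] - centers[None, :]) ** 2 / (2 * sigma ** 2))
    G /= np.sqrt(2 * np.pi) * sigma
    w = np.full(L + 1, 1.0 / (L + 1))            # uniform initial weights
    for _ in range(iters):
        resp = w * G
        resp /= resp.sum(axis=1, keepdims=True)  # E-step: responsibilities
        w = resp.mean(axis=0)                    # M-step: re-estimate weights
    return w

rng = np.random.default_rng(3)
# Samples concentrated near centers 1 and 3 of a grid with L = 4.
samples = np.concatenate([rng.normal(1, 0.3, 200), rng.normal(3, 0.3, 200)])
w = fit_weights(samples, L=4)
assert abs(w.sum() - 1.0) < 1e-9                 # weights stay normalized
assert w[1] > w[0] and w[3] > w[4]               # mass lands at centers 1 and 3
```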

5. Outline of the proposed algorithm

The main processing steps of the algorithm may be summarized as follows.

Step 1: Suppose there are N persons in the database and each person has K training face images (F_j, j = 1, ..., K). For person i, average the training images to obtain the template face image M_i = (1/K) Σ_{j=1}^{K} F_j (here we assume that all images have been pre-processed and normalized). Decompose M_i by SVD; the resulting U_{M_i}, V_{M_i} and S_{M_i} are saved as the model parameters for person i.

Step 2: Project every training face image onto the orthogonal matrices U_{M_i}, V_{M_i} to obtain projection coefficient vectors. These coefficients are then used to estimate the FPDDs.

Step 3: To recognize an unknown face image X, project X onto each known person's orthogonal matrices to obtain the vectors of projection coefficients. Using Eqs. (2) and (3), compute the probability of X belonging to each known person. The person with the highest probability determines the identity of X. If the highest probability for image X is smaller than a given threshold, image X is rejected.
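The three steps can be sketched end to end. For brevity, the sketch below substitutes the Euclidean-distance matcher (the paper's distance-matching variant) for the Bayesian classifier, and uses small synthetic "images" in place of pre-processed faces:

```python
import numpy as np

# End-to-end sketch of Steps 1-3, with a Euclidean-distance matcher in
# place of the Bayesian classifier and synthetic stand-in images.
def build_model(train_images):
    """Step 1: average the K training images, then decompose the template."""
    M = np.mean(train_images, axis=0)
    U, _, Vt = np.linalg.svd(M, full_matrices=False)
    return U, Vt

def project(image, U, Vt):
    """Projection coefficients of an image on the template's bases."""
    return np.diag(U.T @ image @ Vt.T)           # lambda_i = u_i^T B v_i

def identify(image, models, mean_feats):
    """Step 3: nearest person by distance between coefficient vectors."""
    dists = [np.linalg.norm(project(image, U, Vt) - f)
             for (U, Vt), f in zip(models, mean_feats)]
    return int(np.argmin(dists))

rng = np.random.default_rng(4)
persons = [rng.uniform(0, 255, (20, 16)) for _ in range(3)]
train = [[p + rng.normal(0, 5, p.shape) for _ in range(4)] for p in persons]
models = [build_model(imgs) for imgs in train]
# Step 2: per-person mean projection coefficients of the training images.
mean_feats = [np.mean([project(im, U, Vt) for im in imgs], axis=0)
              for (U, Vt), imgs in zip(models, train)]
probe = persons[1] + rng.normal(0, 5, persons[1].shape)
assert identify(probe, models, mean_feats) == 1  # noisy probe matches person 1
```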

6. Experimental results

We use the ORL database, which contains a set of faces taken between April 1992 and April 1994 at the Olivetti Research Laboratory in Cambridge. There are 10 different images for each of the 40 distinct subjects, with variations in facial expression (open/closed eyes, smiling/non-smiling) and facial details (glasses/no glasses). All images were taken against a dark homogeneous background with the subjects in an upright, frontal position, with tolerance for some tilting and rotation of up to about 20°. The images are grayscale with a resolution of 92 × 112.

Experiments were performed with 6 training images and 4 test images per person. The average correct verification rate is about 87.5% when the coefficients of the orthogonal projection are used with distance matching (i.e., the Euclidean distance classifier); the rejection rate is 2.5%. Better results were achieved when the coefficients of the orthogonal projection were combined with the Bayesian classifier: the recognition rate is 92.5% and the rejection rate is 3.3%. The improvement is mostly due to the Bayesian classifier's wider decision regions and its learning ability.


Table 1
Comparison of different methods

                   Method 1   Method 2   Method 3   Method 4   Method 5
Recognition rate   15.60%     19.40%     87.50%     92.50%     89.38%
Run time (s)       0.16       1.93       1.78       4.51       0.29

Table 2
Comparison of different methods on an ORL-like face database with identical SVs

                   Method 1   Method 2   Method 3   Method 4   Method 5
Recognition rate   2.50%      2.50%      87.50%     92.50%     84.38%
Run time (s)       0.16       1.93       1.78       4.51       0.29

Table 3
Recognition rates on face databases of different sizes, using SV features and a BP neural network classifier

Number of samples   40        80        120       160
Recognition rate    80.00%    72.50%    61.88%    50.63%

Some simple comparative experimental results are shown in Table 1. Method 1 uses the SVs as features with the Euclidean distance classifier. Method 2 uses the features proposed in Ref. [15] and the Euclidean distance classifier. Methods 3 and 4 use the feature vector proposed in this paper with the Euclidean distance classifier and the Bayesian classifier, respectively. The run time is the average time for recognizing one face image. Method 5 is the eigenfaces method [3], which projects the face image onto the orthogonal basis of PCA and then uses the vectors of coefficients as the face image features; to compare the PCA feature vector with the feature vector proposed in this paper, the Euclidean distance classifier is used for recognition. Tables 1 and 2 show that Methods 3 and 4 obtain similar experimental results in both settings.

Table 2 shows the comparative experimental results based on a new ORL-like database in which all faces are pre-processed to have identical SVs (cf. Fig. 3). The recognition accuracy of Methods 1 and 2 should in theory be zero; the small value of 2.5% for Methods 1 and 2 in Table 2 is due to the gray level normalization of the reconstructed images.

Another simple experiment was done to explain why some previous papers that recognized faces using SV features obtained good experimental results. Four face databases with different numbers of faces were created to test the recognition rates; these face images all come from the ORL face database. Table 3 shows the classification results of BP neural networks using the singular values of the face images as feature vectors. Table 3 shows that the recognition rate may still look good despite the poor distribution of the feature vectors in feature space. The major reason for the good experimental results may be the small, unrepresentative face databases and the strong fitting ability of the classifiers.

7. Conclusions and future work

In this paper, we find that the singular values of face images, which have been used by some researchers as feature vectors for face recognition, contain only a very limited amount of information useful for face recognition. Most of the useful information is contained in the two orthogonal matrices of the SVD. To overcome this problem, we have proposed a new method for face recognition. After projecting the image onto a group of orthogonal bases, the task of face recognition is completed using the projection coefficients and a Bayesian classifier. The new method obtains better results. The main shortcoming of the proposed methods is that they require more computation. Note that the Bayesian classifier is only one of the classifiers we could select, and the number of training samples in the ORL database is small. We could therefore further improve recognition accuracy if more training images were used or more appropriate classifiers were selected, such as neural networks [22]. Although the SVs of an image contain little useful information for face recognition, the influence of the SVs on image quality requires further study.

References

[1] A. Pentland, T. Choudhury, Face recognition for smartenvironments, IEEE Comput. 33 (2) (2000) 50–55.

[2] Jun Zhang, Yong Yan, M. Lades, Face recognition: eigenface,elastic matching, and neural nets, Proceedings of the IEEE,85 (9) (1997) 1423–1435.

[3] M.A. Turk, A.P. Pentland, Face recognition using eigenfaces,in: Proceedings of the International Conference on PatternRecognition, 1991, pp. 586–591.

[4] B.S. Manjunath, R. Chellappa, et al., A feature basedapproach to face recognition, in: Proceedings of the IEEEComputer Society Conference on Computer Vision andPattern Recognition, 1992, pp. 373–378.

[5] C. Wu, J. Huang, Human face profile recognition by computer, Pattern Recognition 23 (1990) 255–259.

[6] J.C. Campos, A.D. Linney, J.P. Moss, The analysis of facial profiles using scale space techniques, Pattern Recognition 26 (6) (1993) 819–824.

[7] S. Akamatsu, T. Sasaki, H. Fukamachi, Y. Suenaga, A robust face identification scheme—KL expansion of an invariant feature space, in: SPIE Proceedings of Intell. Robots and Computer Vision X: Algorithms and Technology, Vol. 1607, 1991, pp. 71–84.

[8] M. Kirby, L. Sirovich, Application of Karhunen–Loeveprocedure for the characterization of human face, IEEE Trans.Pattern Anal. Mach. Intell. 12 (1990) 103–108.

[9] Z. Hong, Algebraic feature extraction of image for recognition,Pattern Recognition 24 (1991) 211–219.

[10] I. Craw, H. Ellis, J. Lishman, Finding face features, in:Proceedings of the Second European Conference on ComputerVision, 1992, pp. 92–96.


[11] R. Brunelli, T. Poggio, Face recognition: features versus templates, IEEE Trans. Pattern Anal. Mach. Intell. 15 (10) (1993) 1041–1052.

[12] R.A. Horn, C.R. Johnson, Matrix Analysis, CambridgeUniversity Press, Cambridge, 1990, pp. 411–455.

[13] Yong-Qing Cheng, et al., Human face recognition methodbased on the statistical model of small sample size, SPIEProceedings of the Intell. Robots and Comput. Vision, Vol.1607, 1991, pp. 85–95.

[14] Y. Cheng, K. Liu, J. Yang, H. Wang, A robust algebraicmethod for human face recognition, in: Proceedings of the11th International Conference on Pattern Recognition, 1992,pp. 221–224.

[15] Wang Yun-Hong, Tan Tie-Niu, Zhu Yong, Face identification based on singular value decomposition and data fusion, Chinese J. Comput. 23 (6) (2000) (in Chinese).

[16] A.V. Nefian, M.H. Hayes III, Hidden Markov models for face recognition, in: IEEE International Conference on Acoustics, Speech and Signal Processing, Seattle, WA, 1998.

[17] L. Hong, A.K. Jain, Integrating faces and fingerprints for personal identification, IEEE Trans. Pattern Anal. Mach. Intell. 20 (12) (1997) 1295–1307.

[18] A. Dempster, N. Laird, D. Rubin, Maximum likelihood from incomplete data via the EM algorithm, J. R. Stat. Soc. B 39 (1977) 1–38.

[19] R. Redner, H. Walker, Mixture densities, maximum likelihoodand the EM algorithm, SIAM Rev. 26 (2) (1984)195–239.

[20] V.C. Klema, The singular value decomposition: itscomputation and some applications, IEEE Trans. Automat.Control 25 (1980) 164–176.

[21] B. Moghaddam, A. Pentland, Probabilistic visual learning forobject representation, IEEE Trans. Pattern Anal. Mach. Intell.19 (7) (1997) 696–710.

[22] S. Lawrence, L. Giles, Ah Chung Tsoi, A.D. Back, Facerecognition: a convolutional neural network approach, IEEETrans. Neural Networks (Special Issue on Neural Networksand Pattern Recognition) (1998) 98–113.

About the Author—YUAN TIAN was born on 7 January 1965, in Beijing, China. He received his B.S., M.S. and Ph.D. degrees from Xi'an Jiaotong University in 1985, 1994 and 1998, respectively. He joined the National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences as a postdoctoral fellow in 1999. His current research interests include digital image and video processing, visual surveillance and the analysis of biomedical imagery.

About the Author—TIENIU TAN received his B.Sc. (1984) in Electronic Engineering from Xi'an Jiaotong University, China, and his M.Sc. (1986), DIC (1986) and Ph.D. (1989) in Electronic Engineering from Imperial College of Science, Technology and Medicine, London, England. In October 1989, he joined the Computational Vision Group at the Department of Computer Science, The University of Reading, England, where he worked as Research Fellow, Senior Research Fellow and Lecturer. In January 1998, he returned to China to join the National Laboratory of Pattern Recognition, the Institute of Automation of the Chinese Academy of Sciences, Beijing, China. He is currently Professor and Director of the National Laboratory of Pattern Recognition and Assistant Director of the Institute of Automation. He has published widely on image processing, computer vision and pattern recognition. He is a Senior Member of the IEEE and was an elected member of the Executive Committee of the British Machine Vision Association and Society for Pattern Recognition (1996–1997). He serves as referee for many major national and international journals and conferences. He is an Associate Editor of the International Journal of Pattern Recognition, the Asia Editor of the International Journal of Image and Vision Computing and a founding co-chair of the IEEE International Workshop on Visual Surveillance. His current research interests include speech and image processing, machine and computer vision, pattern recognition, multimedia, and robotics.

About the Author—YUNHONG WANG received her B.Sc. from Northwestern Polytechnical University in 1989, and obtained her Ph.D. degree from Nanjing University of Science and Technology in 1998. From April 1998 to May 2000, she was a postdoctoral fellow in NLPR, Institute of Automation, Chinese Academy of Sciences, China. Since 2000, she has been an associate professor in NLPR. Her current research interests include biometrics, statistical pattern recognition and digital image processing. She has published more than 30 papers in major international and national journals and conferences, and has applied for 5 patents on biometrics.

About the Author—YUCHUN FANG received her B.Sc. (1996) from the Central University of Nationalities and her M.Sc. (1999) from Beijing Polytechnic University. She is currently a Ph.D. candidate in the National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China. Her research interests include pattern recognition, image processing and computer vision. Her current work focuses on face tracking, detection and recognition.