Topological Principal Component Analysis for
face encoding and recognition
Albert Pujol 1, Jordi Vitrià, Felipe Lumbreras,
Juan J. Villanueva
Computer Vision Center and Departament d'Informàtica, Edifici O, Universitat
Autònoma de Barcelona, 08193 Cerdanyola, Spain
Abstract
PCA-like methods make use of an estimation of the covariances between sample
variables. This estimation does not take into account their topological relationships.
This paper proposes a way to use these relationships to estimate the covariances
more robustly. The new method, Topological Principal Component Analysis (TPCA),
is tested in both face encoding and recognition experiments, showing how the
generalization capabilities of PCA are improved.
Key words: Generalization; Principal component analysis; Face recognition;
Topological covariance matrix; Covariance estimation
1 Corresponding author.
E-mail address: [email protected] (A. Pujol)
Preprint submitted to Elsevier Preprint 16 February 2001
1 Introduction
Applying high-dimensional data to pattern recognition methods gives rise to
the well-known problem of "the curse of dimensionality" (discussed by Friedman
(1994)). In order both to avoid this problem and to increase efficiency,
data are usually mapped into a space of lower dimensionality. Principal
Component Analysis, or the Karhunen-Loève Transform (see Oja (1989)), is probably
the most widely applied technique in dimensionality reduction. This technique
is based on the estimation of the linear subspace that spans the samples. The
coordinates of the samples inside this subspace are then used to encode the
data instead of those of the original space.
Usual data sources (images, time series, . . . ) maintain topological relations
between their variables. These topological relations give us a prior knowledge
about the subspace spanned by the data. This paper proposes a new dimensionality
reduction technique, called Topological Principal Component Analysis (TPCA),
that shows how to use this prior knowledge to improve the fitness of the
estimated subspace for unknown samples (i.e. generalization).
Face recognition has been a successful field of research, mostly during the past
two decades. The growth of research in the field is mainly due to three
factors: (i) the growing number of face recognition applications, reflected in
the increasing number of face recognition companies, (ii) the knowledge that
face recognition models provide to the cognitive science field, and (iii) the
fact that face recognition has become a paradigm or benchmark of recognition
methodologies.
In fact, face recognition, as a paradigm of a recognition system, has become
a benchmark to the solutions of some of the main computer vision problems
(invariance to viewpoint, illumination change, occlusion, deformation due to
changes of expression, age, make-up and hair style), as well as to some of the
main topics in statistical pattern recognition (feature selection, generalization,
discriminability, etc.). This is evident when the continuous publication
of reviews and surveys is considered, from the earliest of Samal and Iyengar
(1992), to the latest of Grudin (2000), passing through the works of Valentin
et al. (1994), Chellappa et al. (1995) and Fromherz (1998).
Most of the statistical approaches to face recognition and detection are based
on Gaussian or mixture-of-Gaussian models (Moghaddam and Pentland (1997)).
These methods are mainly concerned with an estimation of the face manifold.
They are used to obtain statistically uncorrelated features through linear
or piecewise linear projections. Encoding images inside these manifolds provides
a compact face representation, and the distance between images and the
manifolds provides a way of distinguishing between face and non-face images.
The major drawback of these approaches is that there is no guarantee that
the information relevant to discrimination between faces remains when images
are encoded. Approaches such as "Fisherfaces" (see Belhumeur et al. (1997)) or
the "dual eigenspaces" technique (Moghaddam (1999)) try to avoid these problems
through intrapersonal (differences between images of the same subject) and
extrapersonal (differences between images of different subjects) Gaussian
models. All these methods depend on the accurate estimation of the parameters
of the Gaussian models and on their generalization capabilities. This is the
problem addressed in this paper. Beymer and Poggio (1995) broached
the problem of generalization using prior knowledge of faces to generate new
synthetic image samples. Instead of generating new samples, the approach
presented in this paper introduces the prior knowledge inside the model.
TPCA uses knowledge of the "a priori" correlation between variables due
to their topology. Besides the correlation, other prior measures (e.g. mutual
information) can be defined in terms of the variables' topology, giving rise to
possible generalization improvements in other projection methods (Independent
Component Analysis (Comon (1994)), Projection Pursuit (Friedman and
Tukey (1974)), etc.). A review of these and other linear projection methods
can be found in Ripley (1996).
Even though the proposed method has been designed to improve the reconstruction
generalization capabilities of PCA methods, it has also been tested
in recognition experiments. This paper presents the results obtained using
a large facial image data set. A comparison of the results obtained by our
method and standard PCA is reported.
2 Principal Components method
Given a set of sample vectors $S = \{s_1, s_2, \ldots, s_p\}$ of dimension $n$, where
$s_i = (s_{i1}, s_{i2}, \ldots, s_{in})^T$, the goal of the PCA method is to find an
orthonormal set of basis vectors (linear subspace) $U = \{u_1, u_2, \ldots, u_m\}$,
where $m < \min(n, p)$, such that the elements of $S$ can be recovered optimally
in a least-squares error sense (Eq. (2)) from their projection into the space
defined by $U$. We will denote the reconstruction of the $i$-th sample vector as
$\tilde{s}_i$:

$$\tilde{s}_i = \left( \sum_{j=1}^{m} u_j \left( u_j^T (s_i - \bar{s}) \right) \right) + \bar{s} \qquad (1)$$
where $\bar{s}$ is the average vector of $S$. The squared reconstruction error of
the sample set is:

$$\varepsilon = \frac{1}{p} \sum_{i=1}^{p} \| s_i - \tilde{s}_i \|^2 \qquad (2)$$
It can be shown (Bishop (1996)) that the bases of this subspace can be computed
as the $m$ eigenvectors with the highest associated eigenvalues of the sample
covariance matrix $\Sigma$:

$$\Sigma = E[(s - \bar{s})(s - \bar{s})^T] \qquad (3)$$

where $E[\cdot]$ denotes the expected value, so that each element $\Sigma_{ij}$ of the
covariance matrix $\Sigma$ is the expected value of the product of the deviations
of the random variables $i$ and $j$:

$$\Sigma_{ij} = E[(s_i - \bar{s}_i)(s_j - \bar{s}_j)] \qquad (4)$$
This set of selected bases is called the Principal Components of the sample
set. In order to encode the sample data with the new basis, their projections
onto these principal components are used.
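The procedure above can be sketched as follows (a minimal NumPy illustration, not the authors' implementation; the row-per-sample matrix layout and function names are assumptions):

```python
import numpy as np

def pca_basis(S, m):
    """Mean and m leading eigenvectors of the sample covariance (Eqs. (3)-(4))."""
    s_bar = S.mean(axis=0)                      # average vector of the sample set
    Sigma = np.cov(S, rowvar=False, bias=True)  # n x n sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(Sigma)    # eigenvalues in ascending order
    U = eigvecs[:, ::-1][:, :m]                 # keep the m leading eigenvectors
    return s_bar, U

def encode(s, s_bar, U):
    """Project a sample onto the principal components."""
    return U.T @ (s - s_bar)

def reconstruct(a, s_bar, U):
    """Eq. (1): back-project the coordinates and add the mean."""
    return U @ a + s_bar
```

With $m = n$ the reconstruction is exact; for $m < \min(n, p)$ this basis minimizes the average error of Eq. (2) over all $m$-dimensional linear subspaces.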
2.1 Topological relations and prior covariance matrices
The problem of generalization arises when only a small subset $C \subset S$ of
samples (the training set) is available. Our aim is to use prior knowledge of
the relations between the variables of the data to make a more accurate
estimation of the Principal Components of the full set $S$.
As we have seen, the full subspace estimation process depends on the estimation
of the covariance matrix $\Sigma$ of $S$, computed using only a subset of samples
$C \subset S$. This matrix encodes the linear correlation between pairs of variables
observed in the sample set. The Principal Components construction process is
invariant to the ordering of the variables. When pattern variables
present topological relations (e.g. time series, or images), it is worth taking
them into account. Due to these topological (temporal or spatial) relationships,
two close variables are more likely to be correlated than two distant ones. In
order to make these relationships explicit, we propose to compute a prior
covariance matrix $\Sigma^P$. This matrix will be combined with the sample covariance
estimate (called $\Sigma^C$ from now on) in order to obtain a more robust
matrix $\Sigma^S$, from which the subspace of the set $S$ will be computed.
The topological relations determine a metric space between variables. The
prior covariance matrix $\Sigma^P$ is then constructed by making this metric explicit.
Thus, the a priori covariance between two variables is defined as a function
of the distance between them:

$$\Sigma^P_{ij} = \phi(d(i, j)) \qquad (5)$$
In this way, the distance function $d(i, j)$ makes explicit the topological
relation between variables, and $\phi$ transforms distances into covariances.
Different functions, both for the distance (univariate or multivariate) and
for the covariance function $\phi$, can be considered.
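As a sketch, the construction of Eq. (5) for any chosen pair of functions can be written as follows (hypothetical helper; `dist` and `phi` stand for $d$ and $\phi$ and are supplied by the caller):

```python
import numpy as np

def prior_covariance(n, dist, phi):
    """Build Sigma_P with Sigma_P[i, j] = phi(dist(i, j)), as in Eq. (5)."""
    Sigma_P = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            Sigma_P[i, j] = phi(dist(i, j))
    return Sigma_P
```

Since any sensible distance satisfies $d(i, j) = d(j, i)$, the resulting matrix is symmetric, as a covariance matrix must be.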
We have considered face images in order to show the application of the proposed
method. In this case, the $i$-th and $j$-th variables are the pixel intensity
values placed at the $(f_x(i), f_y(i))$ and $(f_x(j), f_y(j))$ positions of the
image, where $f_x(i)$ and $f_y(i)$ are the horizontal and vertical positions of
the $i$-th pixel when the image is read in raster order.
Face images normalized in position present an almost left-right symmetry. This
permits us to define, among others, two possible univariate distance measures:
the first, $d_E$, is simply the Euclidean distance between pixel locations, and
the second, $d_S$, takes into account the left-right symmetry of faces:
$$d_E(i, j) = \sqrt{(f_x(i) - f_x(j))^2 + (f_y(i) - f_y(j))^2} \qquad (6)$$
$$d_S(i, j) = \sqrt{\left( \left| f_x(i) - \tfrac{c}{2} \right| - \left| f_x(j) - \tfrac{c}{2} \right| \right)^2 + (f_y(i) - f_y(j))^2} \qquad (7)$$
where c is the horizontal image size.
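For illustration, the two distances of Eqs. (6) and (7) can be computed from raster pixel indices as follows (a sketch; the raster convention and function names are assumptions):

```python
import numpy as np

def pixel_coords(i, width):
    """Raster index -> (f_x(i), f_y(i)) for an image of the given width."""
    return i % width, i // width

def d_euclidean(i, j, width):
    """Eq. (6): Euclidean distance between pixel locations."""
    xi, yi = pixel_coords(i, width)
    xj, yj = pixel_coords(j, width)
    return np.hypot(xi - xj, yi - yj)

def d_symmetric(i, j, width):
    """Eq. (7): horizontal axis folded about the image midline c/2."""
    xi, yi = pixel_coords(i, width)
    xj, yj = pixel_coords(j, width)
    return np.hypot(abs(xi - width / 2) - abs(xj - width / 2), yi - yj)
```

Under $d_S$, a pixel at column $x$ and its mirror at column $c - x$ (same row) are at distance zero from each other, which is what encodes the left-right symmetry prior.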
In order to transform distances into covariances, a non-parametric estimate of
the function $\phi(d)$ has been used. For each possible distance value $d$, the
function $\phi(d)$ is computed as the expected covariance between two pixels,
given that the distance between them ($d_E$ or $d_S$) is equal to $d$:
$$\phi(d) = E[(s_i - \bar{s}_i)(s_j - \bar{s}_j) \mid d(i, j) = d] \qquad (8)$$

which is approximated using the values of the matrix $\Sigma^C$, so that:

$$\phi(d) = \frac{1}{K_d} \sum_{(i, j) \mid d(i, j) = d} \Sigma^C_{ij} \qquad (9)$$

where $K_d$ is the number of pairs of variables $(i, j)$ that satisfy $d(i, j) = d$.
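A sketch of this non-parametric estimate, assuming $\Sigma^C$ is available as an $n \times n$ array (the binning of distances by rounding is an implementation choice, not from the paper):

```python
import numpy as np
from collections import defaultdict

def estimate_phi(Sigma_C, dist, n):
    """Eq. (9): average Sigma_C[i, j] over all variable pairs at each distance."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for i in range(n):
        for j in range(n):
            d = round(dist(i, j), 3)  # bin pair distances into discrete values
            sums[d] += Sigma_C[i, j]
            counts[d] += 1            # K_d, the number of pairs at distance d
    return {d: sums[d] / counts[d] for d in sums}
```

The returned mapping plays the role of $\phi$; together with the chosen distance function it fills the prior covariance matrix of Eq. (5).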
FIGURE 1 GOES AROUND HERE
Fig. 1 shows the expected covariance value, $\phi(d)$, estimated when the Euclidean
distance, $\phi(d_E)$ (upper row), and the symmetric topology, $\phi(d_S)$ (bottom row),
are considered. This figure shows the functions obtained when different training
set sizes (10, 50 and 100 images) are used. It should be noted that the
estimated function remains almost the same regardless of the training set size
used to compute it, so a small number of images is enough to estimate it
robustly. As expected, covariance decreases with the distance between
pixel positions. It should also be noted that when the Euclidean topology is
used (upper row), an unexpected peak appears at a distance of 32 pixels. This
peak is due to the contribution of the covariance between background pixels at both
sides of the neck (see the first Principal Component in Fig. 2).
2.2 Prior and estimated covariance combination
We have two models of the space spanned by $S$. The first, $\Sigma^C$, is fitted to
a subset $C$ of the real space spanned by $S$, using too many free parameters
(or degrees of freedom) with respect to the number of samples (i.e. it is
over-fitted). The second, $\Sigma^P$, is fitted to a subspace more general than the
real space spanned by $S$, using a reduced number of degrees of freedom (i.e. it
is over-relaxed). Combining $\Sigma^P$ and $\Sigma^C$ allows us to find a compromise
between both models, in order to increase the generalization of $\Sigma^C$ and, at
the same time, adjust $\Sigma^P$ to the real face space.
In order to show how these two models are combined, Eq. (3) is rewritten as:

$$\Sigma^S = \frac{1}{|S|} \sum_{s_i \in S} (s_i - \bar{s})(s_i - \bar{s})^T = \frac{1}{|S|} \sum_{s_i \in S} \Sigma_i \qquad (10)$$

where $|S|$ is the cardinality of the set $S$, and $\Sigma_i$ is the covariance
contribution of the sample $s_i$. Now suppose that the set $S$ is split into two
subsets, $C$ and $P$, such that $S = C \cup P$ and $|S| = |C| + |P|$. Then Eq. (10)
can be rewritten as:
$$\Sigma^S = \frac{1}{|S|} \sum_{s_i \in C} \Sigma_i + \frac{1}{|S|} \sum_{s_i \in P} \Sigma_i = \frac{|C|}{|C| + |P|} \Sigma^C + \frac{|P|}{|C| + |P|} \Sigma^P \qquad (11)$$

so that the covariance matrix of the full set $S$ is a linear combination of
the covariance matrices of the subsets $C$ and $P$. Assuming that the energies
of the normalized images, $\|s_i - \bar{s}\|^2$ (i.e. the trace $\tau$ of $\Sigma_i$), are
approximately the same for all the images $s_i$ in $S$, then:

$$|C| \approx \frac{\tau^C}{\tau}, \qquad |P| \approx \frac{\tau^P}{\tau} \qquad (12)$$
and Eq. (11) becomes:

$$\Sigma^S = \alpha \Sigma^C + (1 - \alpha) \Sigma^P \qquad (13)$$

where

$$\alpha = \frac{\tau^C}{\tau^P + \tau^C} \qquad (14)$$

$\tau^C$ and $\tau^P$ being the traces of the covariance matrices $\Sigma^C$ and $\Sigma^P$,
respectively.
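The combination of Eqs. (13)-(14) is then a one-liner (a sketch; the function name is ours):

```python
import numpy as np

def combine_covariances(Sigma_C, Sigma_P):
    """Eqs. (13)-(14): trace-weighted blend of sample and prior covariances."""
    tau_C = np.trace(Sigma_C)
    tau_P = np.trace(Sigma_P)
    alpha = tau_C / (tau_P + tau_C)
    return alpha * Sigma_C + (1 - alpha) * Sigma_P
```

Note that $\alpha$ is not a free parameter: it is fixed by the relative traces (total variances) of the two matrices.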
The resultant matrix $\Sigma^S$, of size $n \times n$, where $n$ is the dimension of the
sample vectors, is then diagonalized to obtain its eigenvectors and eigenvalues.
Usually the dimension of the sample vectors is large, making the diagonalization
a computationally expensive process. The complexity of this step is usually
reduced by computing the eigenvectors of the implicit covariance matrix:
$$\tilde{\Sigma} = C C^T \qquad (15)$$

$\tilde{\Sigma}$ being a $p \times p$ matrix, where $p$ ($p \ll n$) is the number of sample
vectors, and the rows of the sample matrix $C$ are the mean-normalized sample
vectors $(s_i - \bar{s})^T$. The relationship between the eigenvectors and
eigenvalues of a covariance matrix and its implicit form is detailed in
Fukunaga (1990); this technique is also referred to by some authors as SVD
(Murase and Nayar (1995)). The main drawback of the TPCA technique is that,
unfortunately, the SVD method cannot be applied to either $\Sigma^P$ or $\Sigma^S$,
making TPCA a more computationally expensive process than PCA.
3 Experimental Results
The proposed method has been applied to the following face encoding and
recognition experiments. A set of 212 images from the AR-Face database 2
has been used to test the system. These images correspond to 106 subjects
(two images per subject, taken at different sessions and under similar
illumination conditions).
Eye location has been used in order to normalize the images in size and
position. Face images are then cropped and scaled to 40×35 pixels. The
generalization capabilities of the proposed method have been tested with
training sets of 10, 20, 50 and 100 images, with the remaining 202, 192, 162
and 112 images forming the test sets. The training images, C, for each
experiment have been randomly selected. The training set is then used to
compute an estimation of the principal components of the full face space set
S. In order to test the different proposed methods, two indicators are used:
reconstruction percentage and average recognition hit ratio, both evaluated
on the test set.
The average recognition hit ratio is computed as follows: each image $s_i$ in the
test set is projected into the TPCA subspace:

$$a_i = U^T (s_i - \bar{s}) \qquad (16)$$

For each $a_i$, the nearest (in Euclidean distance) test set element $a_j$ is chosen:

$$\arg\min_j \| a_i - a_j \|, \quad j \neq i \qquad (17)$$

A recognition hit is counted if both $a_i$ and $a_j$ correspond to the same subject.
The average hit ratio is then calculated as the percentage of hits with respect
2 Publicly available from http://www.cvc.uab.es/shared/arees/FaceDB.html
to the total number of test faces.
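This leave-one-out nearest-neighbour rule can be sketched as follows (assuming the projected test set is stacked as rows of `A`, with one subject label per row; names are illustrative):

```python
import numpy as np

def hit_ratio(A, labels):
    """Percentage of encodings whose nearest other encoding shares its subject."""
    hits = 0
    for i in range(len(A)):
        d = np.linalg.norm(A - A[i], axis=1)  # Euclidean distances to all a_j
        d[i] = np.inf                         # enforce j != i, as in Eq. (17)
        hits += labels[np.argmin(d)] == labels[i]
    return 100.0 * hits / len(A)
```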
The reconstruction percentage is computed by averaging, over the test set
images $s_i$, the percentage $\varepsilon_i$ of the reconstruction energy with respect
to the original image energy, where:

$$\varepsilon_i = 100 \left( 1 - \frac{\| s_i - \tilde{s}_i \|^2}{\| s_i \|^2} \right) \qquad (18)$$
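Per image, Eq. (18) is simply (a sketch; the function name is ours):

```python
import numpy as np

def reconstruction_percentage(s, s_tilde):
    """Eq. (18): reconstruction energy as a percentage of the original energy."""
    return 100.0 * (1.0 - np.sum((s - s_tilde) ** 2) / np.sum(s ** 2))
```

A perfect reconstruction scores 100; a reconstruction of all zeros scores 0.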
This procedure has been cross-validated by repeating the experiment 10 times
for each of the considered training set sizes, and then averaging the obtained
results.
FIGURE 2 GOES AROUND HERE
Fig. 2 depicts the eigenvectors of the sample covariance matrix $\Sigma^C$ (PCA),
as well as the eigenvectors of the prior $\Sigma^P$ and the combined $\Sigma^S$ (TPCA)
covariance matrices. For clarity, only the 1st, 5th, 10th, 20th and 50th
eigenvectors are shown. These eigenvectors have been computed using a training
set of 100 face images.
The average hit ratio and the reconstruction percentage, obtained using the
$p - 1$ principal components of $\Sigma^C$ (PCA) (where $p$ is the training set size)
and of $\Sigma^S$ (TPCA), are shown in Table 1. This table shows the results obtained
using both the Euclidean topology (TPCA $d_E$) and the symmetric one (TPCA $d_S$).
It can be seen that TPCA outperforms PCA, both for recognition and for
reconstruction measured on the test set, regardless of the training set size.
Both measures are further improved when symmetry ($d_S$) is considered instead of
the Euclidean topology ($d_E$).
FIGURE 3 GOES AROUND HERE
Table 1
Hit ratios (left) and reconstruction percentages (right), averaged over 10
experiments, obtained using training sets of different sizes p.

                      Hit ratio                      Reconstruction
Method      p=10    p=20    p=50    p=100    p=10    p=20    p=50    p=100
PCA         24.15%  31.31%  42.00%  48.45%   58.75%  69.61%  81.92%  89.60%
TPCA dE     26.00%  35.89%  45.37%  50.27%   64.48%  73.49%  84.26%  91.28%
TPCA dS     28.10%  37.63%  46.62%  51.45%   65.68%  73.85%  84.77%  91.35%
Fig. 3 shows the evolution of the reconstruction (left column) and recognition
(right column) capabilities of TPCA (thick solid line) against PCA (thin dotted
line) as a function of the number of principal components used to encode
the faces. These figures were constructed using the symmetric topology with 20
(bottom figures) and 50 (top figures) elements in the training set. It can be
seen that both measures are improved using TPCA when at least 2 components are
used to encode the faces.
4 Conclusions and further work
This paper presents a new technique (TPCA) whose aim is to increase the
generalization capabilities of PCA. This is accomplished by taking into
account the topological relationships between data variables. Experimental
results show that the generalization capabilities of TPCA outperform those of
PCA, both in recognition and in reconstruction.

The main drawback of this technique is its computational cost, due to the
fact that a covariance matrix of size $n \times n$ must be diagonalized ($n$ being the
dimension of the sample vectors). Further work needs to be done in order to
overcome this problem by reformulating the process so that SVD methods
can be applied to increase the computational efficiency of the TPCA method.
The work covered in this paper suggests a number of areas that may be worth
further investigation: i) the definition of a parametric model of the $\phi$
function, which could help to find analytical solutions for the eigenvector
construction process; ii) in addition to the topological relations considered
here, more specific topologies in data space could be defined by taking into
account special data features; iii) the extension of this procedure to other
projection methods, such as Independent Component Analysis, where, instead of
the prior covariance between variables, the prior mutual information due to
topological relations could be explored.
Acknowledgements
This work has been partially funded by project 2FD97-0618. We also thank
Marco Bressan and Jeff Berens for their collaboration.
References
Belhumeur, P., Hespanha, J., and Kriegman, D. (1997). Eigenfaces vs. fisherfaces:
Recognition using class-specific linear projection. IEEE Transactions on
Pattern Analysis and Machine Intelligence, 19(7):711-720.
Beymer, D. and Poggio, T. (1995). Face recognition from one example view.
In ICCV95, pages 500-507.
Bishop, C. M. (1996). Neural Networks for Pattern Recognition. Oxford
University Press, Oxford.
Chellappa, R., Wilson, C., and Sirohey, S. (1995). Human and machine
recognition of faces: A survey. Proceedings of the IEEE, 83(5):705-740.
Comon, P. (1994). Independent component analysis, a new concept? Signal
Processing, 36(3):287-314.
Friedman, J. H. (1994). An overview of predictive learning and function
approximation. In Cherkassky, V., Friedman, J. H., and Wechsler, H., editors,
From Statistics to Neural Networks, Theory and Pattern Recognition
Applications, volume 136 of NATO ASI Series F, pages 1-61. Springer.
Friedman, J. H. and Tukey, J. W. (1974). A projection pursuit algorithm for
exploratory data analysis. IEEE Transactions on Computers, C-23:881-889.
Fromherz, T. (1998). Face recognition: a summary of 1995-1997. Technical
Report TR-98-027, International Computer Science Institute, Berkeley, CA.
Fukunaga, K. (1990). Introduction to Statistical Pattern Recognition.
Academic Press, New York and London.
Grudin, M. (2000). On internal representations in face recognition systems.
Pattern Recognition, 33(7):1161-1177.
Moghaddam, B. (1999). Principal manifolds and Bayesian subspaces for visual
recognition. In ICCV99, pages 1131-1136.
Moghaddam, B. and Pentland, A. (1997). Probabilistic visual learning for
object representation. IEEE Transactions on Pattern Analysis and Machine
Intelligence, 19(7):696-710.
Murase, H. and Nayar, S. (1995). Visual learning and recognition of 3-D
objects from appearance. International Journal of Computer Vision, 14(1):5-24.
Oja, E. (1989). Neural networks, principal components, and subspaces.
International Journal of Neural Systems, 1:61-68.
Ripley, B. D. (1996). Pattern Recognition and Neural Networks. Cambridge
University Press.
Samal, A. and Iyengar, P. (1992). Automatic recognition and analysis of
human faces and facial expressions: A survey. Pattern Recognition, 25(1):65-77.
Valentin, D., Abdi, H., O'Toole, A., and Cottrell, G. (1994). Connectionist
models of face processing: A survey. Pattern Recognition, 27(9):1209-1230.
List of Figures

1  Covariance as a function of the distance between pixels when the Euclidean
(upper row) or symmetric (bottom row) topology is considered, using training
sets of 10, 50 and 100 image samples.

2  1st, 5th, 10th, 20th and 50th eigenvectors, computed using PCA (first row)
and TPCA with the Euclidean (3rd row) and left-right symmetric topology (5th
row). The 2nd and 4th rows show the eigenvectors of the prior covariance
matrix for both topologies.

3  Evolution of reconstruction (left) and recognition (right) generalization,
for training sets of 20 (bottom) and 50 (top) images when the symmetric
topology ($d_S$) is considered.
Fig. 1. Covariance as a function of the distance between pixels when the
Euclidean (upper row) or symmetric (bottom row) topology is considered, using
training sets of 10, 50 and 100 image samples.
Fig. 2. 1st, 5th, 10th, 20th and 50th eigenvectors, computed using PCA (first
row) and TPCA with the Euclidean (3rd row) and left-right symmetric topology
(5th row). The 2nd and 4th rows show the eigenvectors of the prior covariance
matrix for both topologies.

Fig. 3. Evolution of reconstruction (left) and recognition (right)
generalization, for training sets of 20 (bottom) and 50 (top) images when the
symmetric topology ($d_S$) is considered.