Topological Principal Component Analysis for
face encoding and recognition
Albert Pujol 1, Jordi Vitrià, Felipe Lumbreras,
Juan J. Villanueva
Computer Vision Center and Departament d'Informàtica, Edifici O, Universitat
Autònoma de Barcelona, 08193 Cerdanyola, Spain
Abstract
PCA-like methods make use of an estimation of the covariances between sample
variables. This estimation does not take into account their topological relationships.
This paper proposes a way to use these relationships to estimate the covariances
more robustly. The new method, Topological Principal Component Analysis (TPCA),
is tested in both face encoding and recognition experiments, showing how the
generalization capabilities of PCA are improved.
Key words: Generalization; Principal component analysis; Face recognition;
Topological covariance matrix; Covariance estimation
1 Corresponding author.
E-mail address: [email protected] (A. Pujol)
Preprint submitted to Elsevier Preprint 16 February 2001
1 Introduction
Applying high-dimensional data to pattern recognition methods gives rise to
the well-known problem of "the curse of dimensionality" (discussed by Friedman
(1994)). In order both to avoid this problem and to increase efficiency,
data are usually mapped into a space of lower dimensionality. Principal
Component Analysis, or the Karhunen-Loève Transform (see Oja (1989)), is probably
the most widely applied technique in dimensionality reduction. This technique
is based on the estimation of the linear subspace that spans the samples. The
coordinates of the samples inside this subspace are then used to encode the
data instead of those of the original space.
Usual data sources (images, time series, . . . ) maintain topological relations
between their variables. These topological relations give us a prior knowledge
about the subspace spanned by the data. This paper proposes a new dimensionality
reduction technique, called Topological Principal Component Analysis (TPCA),
that shows how to use this prior knowledge to improve the fitness of the
estimated subspace for unknown samples (i.e. generalization).
Face recognition has been a successful field of research, mostly during the past
two decades. The growth of research in the field is mainly due to three
factors: (i) the growing number of face recognition applications, reflected in
the increasing number of face recognition companies, (ii) the knowledge that
face recognition models provide to the cognitive science field, and (iii) the
fact that face recognition has become a paradigm or benchmark of recognition
methodologies.
In fact, face recognition, as a paradigm of a recognition system, has become
a benchmark to the solutions of some of the main computer vision problems
(invariance to viewpoint, illumination change, occlusion, deformation due to
changes of expression, age, make-up and hair style), as well as to some of the
main topics in statistical pattern recognition (feature selection, generalization,
discriminability, etc.). This is evident when the continuous publication
of reviews and surveys is considered, from the earliest of Samal and Iyengar
(1992), to the latest of Grudin (2000), passing through the works of Valentin
et al. (1994), Chellappa et al. (1995) and Fromherz (1998).
Most of the statistical approaches to face recognition and detection are based
on Gaussian or mixture-of-Gaussian models (Moghaddam and Pentland (1997)).
These methods are mainly concerned with an estimation of the face manifold.
They are used to obtain statistically uncorrelated features through linear
or piecewise linear projections. Encoding images inside these manifolds provides
a compact face representation, and the distance between images and the
manifolds provides a way of distinguishing between face and non-face images.
The major drawback of these approaches is that there is no guarantee that
the information relevant to discrimination between faces remains when images
are encoded. Approaches such as "Fisherfaces" (see Belhumeur et al. (1997)) or
the "dual eigenspaces" technique (Moghaddam (1999)) try to avoid these problems
through intrapersonal (differences between images of the same subject) and
extrapersonal (differences between images of different subjects) Gaussian
models. All these methods depend on the accurate estimation of the parameters
of the Gaussian models and on their generalization capabilities. This is the
problem addressed in this paper. Beymer and Poggio (1995) broached
the problem of generalization using prior knowledge of faces to generate new
synthetic image samples. Instead of generating new samples, the approach
presented in this paper introduces the prior knowledge inside the model.
TPCA uses knowledge of the "a priori" correlation between variables due
to their topology. Besides the correlation, other prior measures (e.g. mutual
information) can be defined in terms of the variables' topology, giving rise to
possible generalization improvements in other projection methods (Independent
Component Analysis (Comon (1994)), Projection Pursuit (Friedman and
Tukey (1974)), etc.). A review of these and other linear projection methods
can be found in Ripley (1996).
Even though the proposed method has been designed to improve the reconstruction
generalization capabilities of PCA methods, it has also been tested
in recognition experiments. This paper presents the results obtained using
a large facial image data set. A comparison of the results obtained by our
method and standard PCA is reported.
2 Principal Components method
Given a set of sample vectors $S = \{s_1, s_2, \ldots, s_p\}$ of dimension $n$, where
$s_i = (s_{i1}, s_{i2}, \ldots, s_{in})^T$, the goal of the PCA method is to find an
orthonormal set of basis vectors (linear subspace) $U = \{u_1, u_2, \ldots, u_m\}$,
where $m < \min(n, p)$, such that the elements of $S$ can be recovered optimally
in a least-squares error sense (Eq. (2)) from their projection into the space
defined by $U$. We will denote the reconstruction of the $i$-th sample vector as
$\tilde{s}_i$:

$$\tilde{s}_i = \left( \sum_{j=1}^{m} u_j \left( u_j^T (s_i - \bar{s}) \right) \right) + \bar{s} \qquad (1)$$
where $\bar{s}$ is the average vector of $S$. The squared reconstruction error of
the sample set is:

$$\varepsilon = \frac{1}{p} \sum_{i=1}^{p} \| s_i - \tilde{s}_i \|^2 \qquad (2)$$
It can be shown (Bishop (1996)) that the bases of this subspace can be computed
as the $m$ eigenvectors with the highest associated eigenvalues of the sample
covariance matrix $\Sigma$:

$$\Sigma = E[(s - \bar{s})(s - \bar{s})^T] \qquad (3)$$

where $E[\cdot]$ denotes the expected value, so that each element $\Sigma_{ij}$ of the
covariance matrix $\Sigma$ is the expected value of the product of the deviations
of the random variables $i$ and $j$:

$$\Sigma_{ij} = E[(s_i - \bar{s}_i)(s_j - \bar{s}_j)] \qquad (4)$$
This set of selected bases is called the Principal Components of the sample
set. In order to encode the sample data with the new basis, their projections
onto these principal components are used.
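The procedure above can be sketched as follows (a minimal NumPy illustration, not the authors' implementation; the row-per-sample matrix layout and function names are assumptions):

```python
import numpy as np

def pca_basis(S, m):
    """Mean and m leading eigenvectors of the sample covariance (Eqs. (3)-(4))."""
    s_bar = S.mean(axis=0)                      # average vector of the sample set
    Sigma = np.cov(S, rowvar=False, bias=True)  # n x n sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(Sigma)    # eigenvalues in ascending order
    U = eigvecs[:, ::-1][:, :m]                 # keep the m leading eigenvectors
    return s_bar, U

def encode(s, s_bar, U):
    """Project a sample onto the principal components."""
    return U.T @ (s - s_bar)

def reconstruct(a, s_bar, U):
    """Eq. (1): back-project the coordinates and add the mean."""
    return U @ a + s_bar
```

With $m = n$ the reconstruction is exact; for $m < \min(n, p)$ this basis minimizes the average error of Eq. (2) over all $m$-dimensional linear subspaces.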
2.1 Topological relations and prior covariance matrices
The problem of generalization arises when only a small subset $C \subset S$ of
samples (the training set) is available. Our aim is to use prior knowledge of
the relations between the variables of the data to make a more accurate
estimation of the Principal Components of the full set $S$.
As we have seen, the full subspace estimation process depends on the estimation
of the covariance matrix $\Sigma$ of $S$, computed using only a subset of samples
$C \subset S$. This matrix encodes the linear correlation between pairs of variables
observed in the sample set. The Principal Components construction process is
invariant to the ordering of the variables. When pattern variables
present topological relations (e.g. time series, or images), it is worth taking
them into account. Due to these topological (temporal or spatial) relationships,
two close variables are more likely to be correlated than two distant ones. In
order to make these relationships explicit, we propose to compute a prior
covariance matrix $\Sigma^P$. This matrix will be combined with the sample covariance
estimate (called $\Sigma^C$ from now on) in order to obtain a more robust
matrix $\Sigma^S$, from which the subspace of the set $S$ will be computed.
The topological relations determine a metric space between variables. The
prior covariance matrix $\Sigma^P$ is then constructed by making this metric explicit.
Thus, the a priori covariance between two variables is defined as a function
of the distance between them:

$$\Sigma^P_{ij} = \phi(d(i, j)) \qquad (5)$$
In this way, the distance function $d(i, j)$ makes explicit the topological
relation between variables, and $\phi$ transforms distances into covariances.
Different functions, both for the distance (univariate or multivariate) and
for the covariance function $\phi$, can be considered.
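As a sketch, the construction of Eq. (5) for any chosen pair of functions can be written as follows (hypothetical helper; `dist` and `phi` stand for $d$ and $\phi$ and are supplied by the caller):

```python
import numpy as np

def prior_covariance(n, dist, phi):
    """Build Sigma_P with Sigma_P[i, j] = phi(dist(i, j)), as in Eq. (5)."""
    Sigma_P = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            Sigma_P[i, j] = phi(dist(i, j))
    return Sigma_P
```

Since any sensible distance satisfies $d(i, j) = d(j, i)$, the resulting matrix is symmetric, as a covariance matrix must be.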
We have considered face images in order to show the application of the proposed
method. In this case, the $i$-th and $j$-th variables are the pixel intensity
values placed at the $(f_x(i), f_y(i))$ and $(f_x(j), f_y(j))$ positions of the
image, where $f_x(i)$ and $f_y(i)$ are the horizontal and vertical positions of
the $i$-th pixel when the image is read in raster order.
Face images normalized in position present an almost left-right symmetry. This
permits us to define, among others, two possible univariate distance measures:
the first, $d_E$, is simply the Euclidean distance between pixel locations, and
the second, $d_S$, takes into account the left-right symmetry of faces:
$$d_E(i, j) = \sqrt{(f_x(i) - f_x(j))^2 + (f_y(i) - f_y(j))^2} \qquad (6)$$
$$d_S(i, j) = \sqrt{\left( \left| f_x(i) - \tfrac{c}{2} \right| - \left| f_x(j) - \tfrac{c}{2} \right| \right)^2 + (f_y(i) - f_y(j))^2} \qquad (7)$$
where c is the horizontal image size.
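For illustration, the two distances of Eqs. (6) and (7) can be computed from raster pixel indices as follows (a sketch; the raster convention and function names are assumptions):

```python
import numpy as np

def pixel_coords(i, width):
    """Raster index -> (f_x(i), f_y(i)) for an image of the given width."""
    return i % width, i // width

def d_euclidean(i, j, width):
    """Eq. (6): Euclidean distance between pixel locations."""
    xi, yi = pixel_coords(i, width)
    xj, yj = pixel_coords(j, width)
    return np.hypot(xi - xj, yi - yj)

def d_symmetric(i, j, width):
    """Eq. (7): horizontal axis folded about the image midline c/2."""
    xi, yi = pixel_coords(i, width)
    xj, yj = pixel_coords(j, width)
    return np.hypot(abs(xi - width / 2) - abs(xj - width / 2), yi - yj)
```

Under $d_S$, a pixel at column $x$ and its mirror at column $c - x$ (same row) are at distance zero from each other, which is what encodes the left-right symmetry prior.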
In order to transform distances into covariances, a non-parametric estimate of
the function $\phi(d)$ has been used. For each possible distance value $d$, the
function $\phi(d)$ is computed as the expected covariance between two pixels,
given that the distance between them ($d_E$ or $d_S$) is equal to $d$:
$$\phi(d) = E[(s_i - \bar{s}_i)(s_j - \bar{s}_j) \mid d(i, j) = d] \qquad (8)$$

which is approximated using the values of the matrix $\Sigma^C$, so that:

$$\phi(d) = \frac{1}{K_d} \sum_{(i, j) \mid d(i, j) = d} \Sigma^C_{ij} \qquad (9)$$

where $K_d$ is the number of pairs of variables $(i, j)$ that satisfy $d(i, j) = d$.
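A sketch of this non-parametric estimate, assuming $\Sigma^C$ is available as an $n \times n$ array (the binning of distances by rounding is an implementation choice, not from the paper):

```python
import numpy as np
from collections import defaultdict

def estimate_phi(Sigma_C, dist, n):
    """Eq. (9): average Sigma_C[i, j] over all variable pairs at each distance."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for i in range(n):
        for j in range(n):
            d = round(dist(i, j), 3)  # bin pair distances into discrete values
            sums[d] += Sigma_C[i, j]
            counts[d] += 1            # K_d, the number of pairs at distance d
    return {d: sums[d] / counts[d] for d in sums}
```

The returned mapping plays the role of $\phi$; together with the chosen distance function it fills the prior covariance matrix of Eq. (5).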
FIGURE 1 GOES AROUND HERE
Fig. 1 shows the expected covariance value, $\phi(d)$, estimated when the Euclidean
distance, $\phi(d_E)$ (upper row), and the symmetric topology, $\phi(d_S)$ (bottom row),
are considered. This figure shows the functions obtained when different training
set sizes (10, 50 and 100 images) are used. It should be noted that the
estimated function remains almost the same regardless of the training set size
used to compute it, so a small number of images is enough to estimate it
robustly. As expected, covariance decreases with the distance between
pixel positions. It should also be noted that when the Euclidean topology is
used (upper row), an unexpected peak appears at a distance of 32 pixels. This
peak is due to the contribution of the covariance between background pixels at both
sides of the neck (see the first Principal Component in Fig. 2).
2.2 Prior and estimated covariance combination
We have two models of the space spanned by $S$. The first, $\Sigma^C$, is fitted to
a subset $C$ of the real space spanned by $S$, using too many free parameters
(or degrees of freedom) with respect to the number of samples (i.e. it is
over-fitted). The second, $\Sigma^P$, is fitted to a subspace more general than the
real space spanned by $S$, using a reduced number of degrees of freedom (i.e. it
is over-relaxed). Combining $\Sigma^P$ and $\Sigma^C$ allows us to find a compromise
between both models, in order to increase the generalization of $\Sigma^C$ and, at
the same time, adjust $\Sigma^P$ to the real face space.
In order to show how these two models are combined, Eq. (3) is rewritten as:

$$\Sigma^S = \frac{1}{|S|} \sum_{s_i \in S} (s_i - \bar{s})(s_i - \bar{s})^T = \frac{1}{|S|} \sum_{s_i \in S} \Sigma_i \qquad (10)$$

where $|S|$ is the cardinality of the set $S$, and $\Sigma_i$ is the covariance
contribution of the sample $s_i$. Now suppose that the set $S$ is split into two
subsets, $C$ and $P$, such that $S = C \cup P$ and $|S| = |C| + |P|$. Then Eq. (10)
can be rewritten as:
$$\Sigma^S = \frac{1}{|S|} \sum_{s_i \in C} \Sigma_i + \frac{1}{|S|} \sum_{s_i \in P} \Sigma_i = \frac{|C|}{|C| + |P|} \Sigma^C + \frac{|P|}{|C| + |P|} \Sigma^P \qquad (11)$$

so that the covariance matrix of the full set $S$ is a linear combination of
the covariance matrices of the subsets $C$ and $P$. Assuming that the energies
of the normalized images, $\|s_i - \bar{s}\|^2$ (i.e. the trace $\tau$ of $\Sigma_i$), are
approximately the same for all the images $s_i$ in $S$, then:

$$|C| \approx \frac{\tau^C}{\tau}, \qquad |P| \approx \frac{\tau^P}{\tau} \qquad (12)$$
and Eq. (11) becomes:

$$\Sigma^S = \alpha \Sigma^C + (1 - \alpha) \Sigma^P \qquad (13)$$

where

$$\alpha = \frac{\tau^C}{\tau^P + \tau^C} \qquad (14)$$

$\tau^C$ and $\tau^P$ being the traces of the covariance matrices $\Sigma^C$ and $\Sigma^P$,
respectively.
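The combination of Eqs. (13)-(14) is then a one-liner (a sketch; the function name is ours):

```python
import numpy as np

def combine_covariances(Sigma_C, Sigma_P):
    """Eqs. (13)-(14): trace-weighted blend of sample and prior covariances."""
    tau_C = np.trace(Sigma_C)
    tau_P = np.trace(Sigma_P)
    alpha = tau_C / (tau_P + tau_C)
    return alpha * Sigma_C + (1 - alpha) * Sigma_P
```

Note that $\alpha$ is not a free parameter: it is fixed by the relative traces (total variances) of the two matrices.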
The resultant matrix $\Sigma^S$, of size $n \times n$, where $n$ is the dimension of the
sample vectors, is then diagonalized to obtain its eigenvectors and eigenvalues.
Usually the dimension of the sample vectors is large, making the diagonalization
a computationally expensive process. The complexity of this step is usually
reduced by computing the eigenvectors of the implicit covariance matrix:
$$\tilde{\Sigma} = C C^T \qquad (15)$$

$\tilde{\Sigma}$ being a $p \times p$ matrix, where $p$ ($p \ll n$) is the number of sample
vectors, and the rows of the sample matrix $C$ are the mean-normalized sample
vectors $(s_i - \bar{s})^T$. The relationship between the eigenvectors and
eigenvalues of a covariance matrix and its implicit form is detailed in
Fukunaga (1990); this technique is also referred to by some authors as SVD
(Murase and Nayar (1995)). The main drawback of the TPCA technique is that,
unfortunately, the SVD method cannot be applied to either $\Sigma^P$ or $\Sigma^S$,
making TPCA a more computationally expensive process than PCA.
3 Experimental Results
The proposed method has been applied to the following face encoding and
recognition experiments. A set of 212 images from the AR-Face database 2
has been used to test the system. These images correspond to 106 subjects
(two images per subject, taken at different sessions and under similar
illumination conditions).
Eye location has been used in order to normalize the images in size and
position. Face images are then cropped and scaled to 40×35 pixels. The
generalization capabilities of the proposed method have been tested with
training sets of 10, 20, 50 and 100 images, with the remaining 202, 192, 162
and 112 images forming the test sets. The training images, C, for each
experiment have been randomly selected. The training set is then used to
compute an estimation of the principal components of the full face space set
S. In order to test the different proposed methods, two indicators are used:
reconstruction percentage and average recognition hit ratio, both evaluated
on the test set.
The average recognition hit ratio is computed as follows: each image $s_i$ in the
test set is projected into the TPCA subspace:

$$a_i = U^T (s_i - \bar{s}) \qquad (16)$$

For each $a_i$, the nearest (in Euclidean distance) test set element $a_j$ is chosen:

$$\arg\min_j \| a_i - a_j \|, \quad j \neq i \qquad (17)$$

A recognition hit is counted if both $a_i$ and $a_j$ correspond to the same subject.
The average hit ratio is then calculated as the percentage of hits with respect
2 Publicly available from http://www.cvc.uab.es/shared/arees/FaceDB.html
to the total number of test faces.
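This leave-one-out nearest-neighbour rule can be sketched as follows (assuming the projected test set is stacked as rows of `A`, with one subject label per row; names are illustrative):

```python
import numpy as np

def hit_ratio(A, labels):
    """Percentage of encodings whose nearest other encoding shares its subject."""
    hits = 0
    for i in range(len(A)):
        d = np.linalg.norm(A - A[i], axis=1)  # Euclidean distances to all a_j
        d[i] = np.inf                         # enforce j != i, as in Eq. (17)
        hits += labels[np.argmin(d)] == labels[i]
    return 100.0 * hits / len(A)
```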
The reconstruction percentage is computed by averaging, over the test set
images $s_i$, the percentage $\varepsilon_i$ of the reconstruction energy with respect
to the original image energy, where:

$$\varepsilon_i = 100 \left( 1 - \frac{\| s_i - \tilde{s}_i \|^2}{\| s_i \|^2} \right) \qquad (18)$$
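Per image, Eq. (18) is simply (a sketch; the function name is ours):

```python
import numpy as np

def reconstruction_percentage(s, s_tilde):
    """Eq. (18): reconstruction energy as a percentage of the original energy."""
    return 100.0 * (1.0 - np.sum((s - s_tilde) ** 2) / np.sum(s ** 2))
```

A perfect reconstruction scores 100; a reconstruction of all zeros scores 0.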
This procedure has been cross-validated by repeating the experiment 10 times
for each of the considered training set sizes, and then averaging the obtained
results.
FIGURE 2 GOES AROUND HERE
Fig. 2 depicts the eigenvectors of the sample covariance matrix $\Sigma^C$ (PCA),
as well as the eigenvectors of the prior $\Sigma^P$ and the combined $\Sigma^S$ (TPCA)
covariance matrices. For clarity, only the 1st, 5th, 10th, 20th and 50th
eigenvectors are shown. These eigenvectors have been computed using a training
set of 100 face images.
The average hit ratio and the reconstruction percentage, obtained using the
$p - 1$ principal components of $\Sigma^C$ (PCA) (where $p$ is the training set size)
and of $\Sigma^S$ (TPCA), are shown in Table 1. This table shows the results obtained
using both the Euclidean topology (TPCA $d_E$) and the symmetric one (TPCA $d_S$).
It can be seen that TPCA outperforms PCA, both for recognition and for
reconstruction measured on the test set, regardless of the training set size.
Both measures are further improved when symmetry ($d_S$) is considered instead of
the Euclidean topology ($d_E$).
FIGURE 3 GOES AROUND HERE
Table 1
Hit ratios (left) and reconstruction percentages (right), averaged over 10
experiments, obtained using training sets of different sizes p.

                      Hit ratio                      Reconstruction
Method      p=10    p=20    p=50    p=100    p=10    p=20    p=50    p=100
PCA         24.15%  31.31%  42.00%  48.45%   58.75%  69.61%  81.92%  89.60%
TPCA dE     26.00%  35.89%  45.37%  50.27%   64.48%  73.49%  84.26%  91.28%
TPCA dS     28.10%  37.63%  46.62%  51.45%   65.68%  73.85%  84.77%  91.35%
Fig. 3 shows the evolution of the reconstruction (left column) and recognition
(right column) capabilities of TPCA (thick solid line) against PCA (thin dotted
line) as a function of the number of principal components used to encode
the faces. These figures were constructed using the symmetric topology with 20
(bottom figures) and 50 (top figures) elements in the training set. It can be
seen that both measures are improved using TPCA when at least 2 components are
used to encode the faces.
4 Conclusions and further work
This paper presents a new technique (TPCA) whose aim is to increase the
generalization capabilities of PCA. This is accomplished by taking into
account the topological relationships between data variables. Experimental
results show that the generalization capabilities of TPCA outperform those of
PCA, both in recognition and in reconstruction.

The main drawback of this technique is its computational cost, due to the
fact that a covariance matrix of size $n \times n$ must be diagonalized ($n$ being the
dimension of the sample vectors). Further work needs to be done in order to
overcome this problem by reformulating the process so that SVD methods
can be applied to increase the computational efficiency of the TPCA method.
The work covered in this paper suggests a number of areas that may be worth
further investigation: i) the definition of a parametric model of the $\phi$
function, which could help to find analytical solutions for the eigenvector
construction process; ii) in addition to the topological relations considered
here, more specific topologies in data space could be defined by taking into
account special data features; iii) the extension of this procedure to other
projection methods, such as Independent Component Analysis, where, instead of
the prior covariance between variables, the prior mutual information due to
topological relations could be explored.
Acknowledgements
This work has been partially funded by project 2FD97-0618. We also thank
Marco Bressan and Jeff Berens for their collaboration.
References
Belhumeur, P., Hespanha, J., and Kriegman, D. (1997). Eigenfaces vs. fisherfaces:
Recognition using class-specific linear projection. IEEE Transactions on
Pattern Analysis and Machine Intelligence, 19(7):711-720.
Beymer, D. and Poggio, T. (1995). Face recognition from one example view.
In ICCV95, pages 500-507.
Bishop, C. M. (1996). Neural Networks for Pattern Recognition. Oxford
University Press, Oxford.
Chellappa, R., Wilson, C., and Sirohey, S. (1995). Human and machine
recognition of faces: A survey. Proceedings of the IEEE, 83(5):705-740.
Comon, P. (1994). Independent component analysis, a new concept? Signal
Processing, 36(3):287-314.
Friedman, J. H. (1994). An overview of predictive learning and function
approximation. In Cherkassky, V., Friedman, J. H., and Wechsler, H., editors,
From Statistics to Neural Networks, Theory and Pattern Recognition
Applications, volume 136 of NATO ASI Series F, pages 1-61. Springer.
Friedman, J. H. and Tukey, J. W. (1974). A projection pursuit algorithm for
exploratory data analysis. IEEE Transactions on Computers, C-23:881-889.
Fromherz, T. (1998). Face recognition: a summary of 1995-1997. Technical
Report TR-98-027, International Computer Science Institute, Berkeley, CA.
Fukunaga, K. (1990). Introduction to Statistical Pattern Recognition.
Academic Press, New York and London.
Grudin, M. (2000). On internal representations in face recognition systems.
Pattern Recognition, 33(7):1161-1177.
Moghaddam, B. (1999). Principal manifolds and Bayesian subspaces for visual
recognition. In ICCV99, pages 1131-1136.
Moghaddam, B. and Pentland, A. (1997). Probabilistic visual learning for
object representation. IEEE Transactions on Pattern Analysis and Machine
Intelligence, 19(7):696-710.
Murase, H. and Nayar, S. (1995). Visual learning and recognition of 3-D
objects from appearance. International Journal of Computer Vision, 14(1):5-24.
Oja, E. (1989). Neural networks, principal components, and subspaces.
International Journal of Neural Systems, 1:61-68.
Ripley, B. D. (1996). Pattern Recognition and Neural Networks. Cambridge
University Press.
Samal, A. and Iyengar, P. (1992). Automatic recognition and analysis of
human faces and facial expressions: A survey. Pattern Recognition, 25(1):65-77.
Valentin, D., Abdi, H., O'Toole, A., and Cottrell, G. (1994). Connectionist
models of face processing: A survey. Pattern Recognition, 27(9):1209-1230.
List of Figures

1  Covariance as a function of the distance between pixels when the Euclidean
(upper row) or symmetric (bottom row) topology is considered, using training
sets of 10, 50 and 100 image samples.

2  1st, 5th, 10th, 20th and 50th eigenvectors, computed using PCA (first row)
and TPCA with the Euclidean (3rd row) and left-right symmetric topology (5th
row). The 2nd and 4th rows show the eigenvectors of the prior covariance
matrix for both topologies.

3  Evolution of reconstruction (left) and recognition (right) generalization,
for training sets of 20 (bottom) and 50 (top) images when the symmetric
topology ($d_S$) is considered.
Fig. 1. Covariance as a function of the distance between pixels when the
Euclidean (upper row) or symmetric (bottom row) topology is considered, using
training sets of 10, 50 and 100 image samples.
Fig. 2. 1st, 5th, 10th, 20th and 50th eigenvectors, computed using PCA (first
row) and TPCA with the Euclidean (3rd row) and left-right symmetric topology
(5th row). The 2nd and 4th rows show the eigenvectors of the prior covariance
matrix for both topologies.

Fig. 3. Evolution of reconstruction (left) and recognition (right)
generalization, for training sets of 20 (bottom) and 50 (top) images when the
symmetric topology ($d_S$) is considered.