ORIGINAL ARTICLE
Distance approximation for two-phase test sample representation in face recognition
Xiang Wu • Ning Wu
Received: 26 August 2012 / Accepted: 29 January 2013
© Springer-Verlag London 2013
Abstract The two-phase test sample representation
(TPTSR) scheme was proposed as a useful method for face
recognition; however, the sample selection based on sparse
representation in its first phase is not necessary. This is
because the first phase only plays the role of a coarse search
in TPTSR, whereas the sparse representation method is suited
to fine classification. This paper proves that alternative
nearest-neighbor selection criteria with higher efficiency can
be used in the first phase of TPTSR without compromising
the classification accuracy. Theoretical analysis and
experimental results show that the original distance metric
based on sparse representation in the first phase of the
TPTSR can be approximated with a more straightforward
metric while maintaining a classification performance
comparable to the original TPTSR. Therefore, the
computational load of the TPTSR can be greatly reduced.
Keywords Computer vision · Face recognition · Pattern
recognition · Sparse representation · Transform methods
1 Introduction
Face recognition has drawn many researchers' attention in
recent years, and many approaches have been developed for this
application. One type of method reduces the
dimensionality of the samples by extracting feature vectors
with linear transformation methods, such as principal
component analysis (PCA) [1–3] and linear discriminant
analysis (LDA) [4, 5]. In the PCA method, the training
samples and the testing samples are transformed from the
original sample space into a space that maximizes the
variance of all the samples, while the LDA method converts
the samples to a feature space in which the distances between
the centers of different classes are maximized. In these two
transformation methods, both the training samples and the
testing samples have corresponding representations in
the new feature space, and classification is carried out
based on the distance between the representations of
the training set and the testing set.
Another type of transformation-based method was proposed
to focus on local information in the training samples.
Instead of using the whole training set, this type of method
uses only part of the samples, since the performance of a
classifier is usually limited to certain local
distributions. By concentrating on the local distribution of the
training data, the design and testing of the classifier can be
much more efficient than with global methods [6]. Typical
examples of local LDA methods include the method for
multimodal data projection [7] and the approach that uses the
local dependencies of samples for classification [8]. It has
also been found that local PCA is more efficient than
global PCA in feature extraction [9] and sample clustering
[10].
Recently, the sparse representation (SR) theory has been
introduced into face recognition applications and has drawn a
lot of interest [11–18]. In sparse representation methods,
the testing sample is written as a linear
combination of all the training samples, and the representation
error of each class is used for estimating the
X. Wu
School of Mechanical and Electrical Engineering,
Harbin Institute of Technology, 92 West Dazhi Street,
Nan Gang District, Harbin 150001, China
N. Wu (corresponding author)
Shenzhen Key Lab of Wind Power and Smart Grid,
Harbin Institute of Technology Shenzhen Graduate School,
Shenzhen 518055, China
e-mail: [email protected]
123
Neural Comput & Applic
DOI 10.1007/s00521-013-1352-8
target class. In previous studies, the l1-norm was usually used
for approximating the representation error; however, it has been
found that it is the collaborative representation in the
SR method that helps to differentiate the target class, and
the l2-norm in the approximation can achieve similar efficiency
[19].
It has been found that linear combinations of the training
set can help to increase the representational capacity of the
training samples. The synthetic discriminant function
(SDF) was proposed for this purpose [20], and the nearest
linear combination (NLC) method utilizes linear combinations
of the training set in a different way for eigenface-based
face recognition [21, 22]. However, in the NLC
method, the distance between the testing sample and the
linear combination of a training class is very sensitive to
the mean values of the different classes, and normalization
of all samples is required before the NLC processing.
In the SR method, a set of overdetermined equations must
be solved in order to obtain the linear representation
coefficients. However, if the number of samples
is massive, the computational load, as well as the memory
required to solve the equations, would be unacceptably large.
In a recent effort, a two-phase test sample representation
(TPTSR) scheme was proposed as an efficient SR approach
for face recognition [23]. In this method, the number of
training samples for the SR processing is reduced by
selecting the M-nearest neighbors to the testing image
beforehand. In the second phase of the TPTSR, the selected
subset of M-nearest neighbors is used to linearly represent
the testing sample as in the original SR method. The two-phase
structure of the TPTSR scheme enhances the performance
of the SR method by reducing the complexity of the
data based on a local distribution of the sample space.
Although the TPTSR method has been proven to be
powerful in face recognition, there is still great potential
for reducing its computational load. Since the first-phase
processing in the TPTSR is designed to provide a
coarse classification of the training samples, a fine
classification method like SR is in fact not necessary at this
stage. In this paper, we show that the SR method for
selecting nearest neighbors in the TPTSR is not necessary,
and we study alternative nearest-neighbor selection criteria
for the first phase of the TPTSR method by approximating
the distance between a testing sample and a training
sample with a more straightforward metric. In the theoretical
analysis, we prove that selecting the M-nearest neighbors
using distance approximation introduces only negligible
uncertainty into the classification accuracy, while the
computational load can be greatly reduced.
The computational performance of a pattern recognition
method is usually measured by time or complexity. In the
previous literature, computation time is usually taken
as the metric for comparing the performance of different
methods [4, 24]. However, the computation time of a
method depends greatly on the quality of the programming
and can differ considerably from case to case. The
comparison of computational complexity has rarely been
considered in face recognition, even though it provides a
theoretical analysis of the computational performance of a
method. In this paper, the computational complexity of the
proposed methods is compared with that of the original TPTSR,
so that the ranking of the computational efficiency of all the
methods can be clearly identified.
In the initial testing, we compare the computational
efficiency as well as the classification performance of the 10
most popular metrics in the fields of pattern recognition and
digital image processing. It is found that seven of the 10
distance metrics result in classification performance similar
to that of the original linear representation criterion.
However, the theoretical analysis of their computational
complexity shows that the TPTSR method
implemented with the City-block distance or the Euclidean
distance offers the most efficient performance in face
recognition. By replacing the linear representation criterion
in the first phase of TPTSR with the City-block or
Euclidean distance metric, the computation time for
the face recognition task can be significantly reduced while
maintaining almost the same classification performance.
The comparison between the TPTSR and other state-of-the-art
face recognition methods such as LDA, PCA, or
global representation methods has been demonstrated
elsewhere [23]; in this study, we only compare the
performance of the TPTSR method with different distance
metrics.
In the next section of this paper, we introduce the
theory of the TPTSR with different nearest-neighbor
selection criteria. Section 3 presents our experimental
results with different face image databases, and finally a
conclusion is drawn in Sect. 4.
2 Two-phase test sample representation (TPTSR)
with M-nearest-neighbor selection criteria
In this section, we examine the optimization solution
of the TPTSR scheme and show that the efficiency of the
first-phase processing can be improved.
2.1 First phase of the TPTSR
The first phase of the TPTSR reduces the global sample
space to a local area around the target class by selecting the
M-nearest neighbors from all the training samples for
further processing in the second phase [23]. The M-nearest
neighbors are selected by calculating the weighted distances
of the testing sample from each of the training
samples. First, let us assume that there are L classes and n
training images, x_1, x_2, ..., x_n, and if some of these images
are from the jth class (j = 1, 2, ..., L), then j is their class
label. It is also assumed that a test image y can be written as
a linear combination of all the training samples, such that

y = a_1 x_1 + a_2 x_2 + ... + a_n x_n,   (1)

where a_i (i = 1, 2, ..., n) is the coefficient for each training
image x_i. Equation (1) can also be written in vector form,
such that

y = XA,   (2)

where A = [a_1 ... a_n]^T and X = [x_1 ... x_n]; x_1 ... x_n and y are
all column vectors. If X is a nonsingular square matrix, Eq. (2)
can be solved by using A = X^{-1} y; otherwise, it can be
solved by using A = (X^T X + μI)^{-1} X^T y, where μ is a small
positive constant and I is the identity matrix. In our experiments
with the TPTSR method, μ in the solution is set to 0.01.
By solving Eq. (2), we can represent the testing image by
the linear combination of the training set shown in
Eq. (1), which means that the testing image is essentially
approximated by a weighted summation of all the
training images, with each weighted image a_i x_i a part of the
approximation. In order to measure the distance between
the training image x_i and the testing image y, a distance
metric is defined as follows:

e_i = ||y − a_i x_i||_2,   (3)

where e_i is called the distance function, and it gives the
deviation of the testing sample y from the training sample
x_i. Clearly, a smaller value of e_i means that the ith training
sample is closer to the testing sample, and it is more
likely to be a member of the target class. The
M-nearest neighbors are chosen to be processed further in the
second phase of the TPTSR, where the final decision is
made within a much smaller sample space. We assume that
the selected M-nearest neighbors are denoted x_1 ... x_M,
and the corresponding class labels are C = {c_1 ... c_M},
where c_i ∈ {1, 2, ..., L}. In the second phase of the
TPTSR, if a sample x_p's class label does not belong to C,
then its class will not be considered a target class, and
only a class from C will be regarded as a potential target
class.
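As a concrete illustration, the two steps of the first phase (the regularized solve for A and the per-sample distance of Eq. (3)) can be sketched in NumPy as below. The function name, array layout, and sizes are illustrative assumptions, not from the paper.

```python
import numpy as np

def first_phase(X, y, M, mu=0.01):
    """First phase of TPTSR: select the M training samples nearest to y.

    X:  (N, n) matrix whose columns are the n normalized training images.
    y:  (N,) normalized test image.
    Returns the indices of the M-nearest neighbors.
    """
    n = X.shape[1]
    # Regularized least-squares solution A = (X^T X + mu*I)^{-1} X^T y
    A = np.linalg.solve(X.T @ X + mu * np.eye(n), X.T @ y)
    # Distance of y from each weighted training sample: e_i = ||y - a_i x_i||, Eq. (3)
    e = np.linalg.norm(y[:, None] - A[None, :] * X, axis=0)
    return np.argsort(e)[:M]
```

The selected columns of X (and their class labels) are then passed on to the second phase.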
The linear representation method in the first phase is not
the only metric for selecting the M-nearest neighbors.
There are several popular distance metrics in digital image
processing that measure the difference between two images
and are suitable for the nearest-neighbor selection, such as the
City-block distance, the Euclidean distance, the Minkowski
distance, the cosine, the correlation, and the Spearman distance
[25–28]. Without loss of generality, if the jth elements
of the testing image y and a training image x_i are y(j) and
x_i(j), respectively, where j ∈ {1, 2, ..., N} and N is the
total number of elements in each vector, the Minkowski
distance between these two image vectors is defined as

e_i = { Σ_{j=1}^{N} |y(j) − x_i(j)|^p }^{1/p} = ||y − x_i||_p,   (4)

where p ∈ [1, ∞] and ||·||_p denotes the l_p norm. It is noted
that the City-block distance and the Euclidean distance are
two special cases of the Minkowski distance, with
p = 1 and p = 2, respectively; in particular, the Euclidean
distance is also a special case of the linear representation
method in which all the coefficients are set to unity.
Intuitively, the linear representation method is an optimal
solution for finding the distance between two images. However,
if the computational load is taken into account as well as the
classification rate, the efficiency of the TPTSR can be
improved by selecting the M-nearest neighbors with an
approximation of this distance, such as the City-block distance
or the Euclidean distance. The theory of distance
approximation is explained in Sect. 2.3, and the
computational complexity of all the popular distance
metrics is compared in Sect. 2.4. The comparison of
the performance of the selection criteria is shown
in the experimental section.
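Equation (4) can be sketched as a small NumPy function; for p = 1 and p = 2 it reduces to the City-block and Euclidean distances, as noted above. The function name and example vectors are illustrative assumptions.

```python
import numpy as np

def minkowski(y, x, p):
    """Minkowski distance of Eq. (4): ||y - x||_p for p >= 1."""
    return np.sum(np.abs(y - x) ** p) ** (1.0 / p)

y = np.array([0.2, 0.5, 0.1])
x = np.array([0.1, 0.4, 0.3])
d1 = minkowski(y, x, 1)  # City-block distance (p = 1)
d2 = minkowski(y, x, 2)  # Euclidean distance (p = 2)
print(d1, d2)
```

Either special case can serve directly as the first-phase selection criterion in place of Eq. (3).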
2.2 Second phase of the TPTSR
In the second phase of the TPTSR method, the M-nearest
neighbors selected in the first phase are further processed
to generate the final decision for the recognition task.
As defined in the first-phase processing, the
selected M-nearest neighbors are denoted x_1 ... x_M, and
again their linear combination is assumed to approximate the
testing image y, such that

y = b_1 x_1 + ... + b_M x_M,   (5)

where b_i (i = 1, 2, ..., M) are the coefficients. In vector
form, Eq. (5) can be written as

y = X̃B,   (6)

where B = [b_1 ... b_M]^T and X̃ = [x_1 ... x_M]. In the same
way as above, if X̃ is a nonsingular square matrix,
Eq. (6) can be solved by

B = X̃^{-1} y,   (7)

or otherwise, B can be solved by

B = (X̃^T X̃ + μI)^{-1} X̃^T y,   (8)

where μ is a small positive constant and I is the
identity matrix.
With the coefficients b_i obtained for each of the nearest
neighbors, the next step is to examine the contribution of
each class to the testing image in the second-phase
linear representation. Suppose that the nearest neighbors
x_s ... x_t are from the rth class (r ∈ C); the linear
contribution of this class to the approximation of the testing
sample is defined as

g_r = b_s x_s + ... + b_t x_t.   (9)

The approximation of the testing sample by the rth-class
samples among the M-nearest neighbors is evaluated by
calculating the deviation of g_r from y, such that

D_r = ||y − g_r||_2,  r ∈ C.   (10)

Clearly, a smaller value of D_r means a better approximation
of the testing sample by the training samples of the rth class,
and thus the rth class has a higher probability than the other
classes of being the target class. Therefore, the
testing sample y is classified to the class with the smallest
deviation D_r.
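The second phase (the regularized solve of Eq. (8) followed by the per-class deviations of Eqs. (9) and (10)) can be sketched along the same lines; the function name and array layout are again illustrative assumptions.

```python
import numpy as np

def second_phase(Xm, labels, y, mu=0.01):
    """Second phase of TPTSR: classify y from its M selected neighbors.

    Xm:     (N, M) matrix whose columns are the M-nearest training samples.
    labels: length-M array giving the class label of each column of Xm.
    """
    M = Xm.shape[1]
    # B = (X~^T X~ + mu*I)^{-1} X~^T y, the regularized solution of Eq. (8)
    B = np.linalg.solve(Xm.T @ Xm + mu * np.eye(M), Xm.T @ y)
    best_class, best_dev = None, np.inf
    for r in np.unique(labels):
        g_r = Xm[:, labels == r] @ B[labels == r]  # class contribution, Eq. (9)
        D_r = np.linalg.norm(y - g_r)              # deviation from y, Eq. (10)
        if D_r < best_dev:
            best_class, best_dev = r, D_r
    return best_class
```

The test image is assigned to the class whose contribution g_r deviates least from y.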
In the second phase of the TPTSR, the solution in Eq.
(7) or (8) offers an efficient means of finding the coefficients
for identifying the similarity between the testing sample and
the training samples among the M-nearest neighbors. It
can be seen that, if the training samples of one class
have great similarity with the testing sample, more training
samples from this class will be selected into the
group of M-nearest neighbors in the first phase of the
TPTSR. In the second phase of this process, the coefficients
obtained help to weight these training samples
in the linear representation, so that the training samples
from this class contribute more than those of any other class to
the approximation of the testing sample. As a result, the
testing sample is assigned to this class with the maximum
probability.
2.3 Distance approximation in the first phase of TPTSR
It has been pointed out that the linear representation in
the SR method is essentially the perpendicular projection
of y onto the space spanned by X [19]. In the original
TPTSR method, the solution A = (X^T X + μI)^{-1} X^T y is
essentially a least-squares solution, which means that
the projection of y onto the sample vector space is XA.
Figure 1 shows the geometric demonstration of a special
case of the least-squares solution for the projection of y,
where a plane π is defined by two sample vectors x_1 and
x_2, and the projection of y onto π is XA, with X = [x_1, x_2]
and A = [a_1, a_2]^T.
Since the testing sample and all the training samples are
normalized before the recognition process, the matching of
vectors depends only on their orientations. If the angle
between y and x_1 in Fig. 1 is denoted θ, and the projection of
y onto the training sample x_1 is denoted OP, we have

||y − a_1 x_1|| ≥ ||y − OP|| = ||y|| sin θ = sin θ.   (11)

It is noted that y is a normalized vector, and therefore
||y|| = 1. Since (y − OP) ⊥ x_1, we have

||y − OP|| = min{||y − a_1 x_1||}.   (12)

If the testing sample y is similar to the training sample x_1
and far from x_2, the distance ||y − a_1 x_1|| will be close
to ||y − OP||, and vice versa. Consequently, in the first
phase of the TPTSR method, the M-nearest neighbors can be
selected based on the distance ||y − OP|| for every training
sample. In other words, if the minimum value of the
distance ||y − a_i x_i|| (i = 1, ..., n) for a training sample
is larger than that of the others, this sample will not be
considered a nearest neighbor. Therefore, if we measure and
sort the distance ||y − OP|| instead of ||y − a_i x_i||
(i = 1, ..., n) for every training sample in the first phase of
the TPTSR, the farthest neighbors in the training set are
identified, and the remaining M samples are regarded as the
nearest neighbors.
When θ is a small angle (θ ≤ 15°), sin θ can be
approximated by θ in radians, and the arc length corresponding to
Fig. 1 The geometric demonstration of a special case of the least
squares solution for the projection of y (the plane defined by x_1 and
x_2, with XA the projection of y and r_min = y − XA)
the angle θ can also be approximated by ||y − x_1||, such
that

sin θ ≈ θ ≈ ||y − x_1||.   (13)

It is noted that ||y − x_1|| is the City-block distance between
y and x_1. It can be seen that the distance ||y − a_1 x_1|| is much
more sensitive to changes in the angle θ than
the distance ||y − OP|| or ||y − x_1||. This also explains
why the SR method is better than the nearest-neighbor
method. Therefore, if the angle θ is small enough, the
distance metric for selecting the M-nearest neighbors (or
farthest neighbors) in the original TPTSR method can
be approximated by the City-block distance or the Euclidean
distance.
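The small-angle relation in Eq. (13) can be checked numerically with two normalized 2-D vectors; the sketch below uses the Euclidean chord length between the unit vectors as the distance, and the chosen angle is an illustrative assumption.

```python
import numpy as np

theta = np.deg2rad(10.0)                      # a small angle (<= 15 degrees)
x1 = np.array([1.0, 0.0])                     # normalized training sample
y = np.array([np.cos(theta), np.sin(theta)])  # normalized test sample at angle theta

chord = np.linalg.norm(y - x1)  # straight-line distance between the unit vectors
print(np.sin(theta), theta, chord)  # the three quantities agree to within about 1 %
```

At 10°, sin θ, θ, and the chord length all differ by less than 1 %, consistent with the approximation used for the first-phase selection.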
In practice, the approximation in Eq. (13) is verified by
calculating the orientation distributions of the image vectors
from the online face image databases, namely the ORL
[29], AR [30], and Feret [31, 32] databases. Table 1 shows
the average orientation deviation (AOD) of the image
vectors for the three databases, respectively, and the
errors resulting from the approximation in Eq. (13). The
orientation deviation of the image vectors is calculated based on
the difference of each image vector from all other
images within the database. The second row of Table 1
gives the relative error introduced by using the distance
approximation in Eq. (13). The third row of Table 1 shows
the average classification error rate of the TPTSR according
to Ref. [23]. With reference to measurement theory,
the uncertainty contributed to the classification error of the
original TPTSR by introducing the distance approximation
into the first phase can be calculated from the average
classification error of the TPTSR (denoted E_av) and the
relative error (denoted γ) resulting from the distance
approximation in the first phase, such that [33]

E_un = ±γ E_av.   (14)

It can be seen from Table 1 that the average orientation
deviation for the three databases can be up to about 25°,
and the relative error resulting from the approximation in
Eq. (13) is about 3 %. In practice, however, the uncertainty
contributed by this distance approximation to the final
classification accuracy of the TPTSR method can be ignored.
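The last row of Table 1 follows directly from Eq. (14) and the second and third rows; the sketch below simply reproduces that arithmetic with the figures taken from Table 1.

```python
# Reproducing Eq. (14), E_un = +/- gamma * E_av, with the figures of Table 1
gamma = {"ORL": 0.01035, "AR": 0.03143, "Feret": 0.01079}  # relative error gamma
E_av  = {"ORL": 5.4,     "AR": 27.6,    "Feret": 35.8}     # average TPTSR error rate, %
E_un  = {db: round(gamma[db] * E_av[db], 3) for db in gamma}
print(E_un)  # {'ORL': 0.056, 'AR': 0.867, 'Feret': 0.386}, matching Table 1
```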
2.4 Computational complexity analysis
for nearest-neighbor selection criteria
The computational load of a face recognition task has rarely
been considered seriously in the past, since almost all
real-world applications were assumed to run on a PC.
However, with the recent development of portable systems
such as mobile phones and other consumer embedded systems,
reducing the computational load of a recognition
task is significant for cutting system cost. In this
section, we present the theoretical analysis of the time complexity
of the nearest-neighbor selection criteria mentioned in
Sect. 2.1. As denoted above, n is the number of
training samples and N is the number of pixels in each
of the training/testing images. Table 2 shows the approximate
computational complexity of 10 different
distance metrics that can be used for nearest-neighbor
selection in the first phase of TPTSR.
It is noted that Table 2 shows the approximate
computational complexity only of the calculations required
for the nearest-neighbor selection in the first phase of the
TPTSR processing, not including the calculations of the
second phase. It can be seen that the City-block distance and
the Euclidean distance are the most efficient metrics for the
nearest-neighbor selection. Table 3 compares the computational
complexity of the SR and TPTSR methods with different
distance metrics. It can be seen that the computational
load of the original TPTSR method exceeds that of the
SR method by at least O(m(N^3 + 2N^2)). However, by using
the City-block distance in the TPTSR, the computational
complexity is reduced by O((n − m)(N^3 + 2N^2) − nN) relative to the
SR method and by O(n(N^3 + 2N^2 − N)) relative to the original
TPTSR. Since normally n ≫ m and N^3 ≫ N,
the computational load relieved by using the TPTSR with
Table 1 Average orientation deviation of the image vectors from the
three databases, respectively

                 ORL database   AR database   Feret database
AOD              14.3°          24.9°         14.6°
γ                1.035 %        3.143 %       1.079 %
E_av of TPTSR    5.4 %          27.6 %        35.8 %
E_un             ±0.056 %       ±0.867 %      ±0.386 %
Table 2 Approximation of the computational complexity of 10 different
distance metrics for nearest-neighbor selection in the first phase
of TPTSR

Distance metric                    Computational complexity
Linear representation (SR)         O(nN^3 + 2nN^2) for nonsingular X;
                                   O(3nN^3 + 2nN^2) for singular X
City-block distance                O(nN)
Euclidean distance                 O(3nN)
Standardized Euclidean distance    O(nN^2 + 2nN + N)
Third-order Minkowski distance     O(4nN)
Fourth-order Minkowski distance    O(5nN)
Cosine                             O(4nN)
Correlation                        O(3nN^2 + nN)
Chebyshev distance                 O(nN^2)
Jaccard distance                   O(5nN)
City-block distance is significant. In the experimental
section, we will compare their classification performance in
face recognition; however, because of the massive computational
load, only the most efficient metrics, the City-block distance
and the Euclidean distance, are tested in more detail.
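To get a feel for the scale of these savings, one can plug representative sizes into the Table 3 expressions. The sizes below (borrowed from the ORL experiment in Sect. 3) and the raw operation counts are illustrative only, not a timing benchmark.

```python
# Plugging representative sizes into the Table 3 complexity expressions
# (raw operation counts, for illustration only; not a timing benchmark)
n, m, N = 400, 40, 46 * 56  # ORL: 400 samples, M = 40 neighbors, 46x56-pixel images

sr         = n * N**3 + 2 * n * N**2        # SR, nonsingular X
tptsr_orig = (n + m) * (N**3 + 2 * N**2)    # original TPTSR
tptsr_city = n * N + m * (N**3 + 2 * N**2)  # TPTSR with City-block first phase
print(tptsr_orig / tptsr_city)  # roughly (n + m) / m, i.e. about 11x fewer operations
```

Because the nN term is negligible next to the m(N^3 + 2N^2) term, the ratio is essentially (n + m)/m, which is why the savings grow with the size of the training set.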
3 Experimental results
The training sets and testing sets prepared for the experiments
are from the online ORL [29], AR [30], and Feret
[31, 32] face image databases. These databases provide
images taken of different faces with different facial
expressions and facial details, at different times, and under
different lighting conditions. There are in total 400 face images
in the ORL database from 40 different people (or
classes), and all of them are used in our experiments. The AR
face database contains 3120 images from 120 individuals,
each of whom provides 26 images with different facial details.
In the Feret face image database, only 1400 face images from 200
classes are used for our training and testing, and each
class contains seven face images.
The experiments in this study are applied to the ORL,
AR, and Feret databases, respectively. In each recognition
task with the ORL database, the training samples are
prepared by selecting some of the images from the database,
and the remaining images are taken as the testing set. If
there are n samples in one class, and s samples are selected
as the training samples, then the remaining t = n − s
samples are regarded as the testing set of this class.
According to combination theory, the number of
possible selections of s samples is

C_n^s = n(n − 1)...(n − s + 1) / (s(s − 1)...1).   (15)

In this way, there are C_n^s possible training sets, with
C_n^s corresponding testing sets, and there are therefore C_n^s
training and testing tasks to carry out for this database.
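The count for the ORL split used below can be checked directly against Eq. (15), e.g. with Python's standard-library binomial function:

```python
from math import comb

# s = 5 training images chosen from the n = 10 images per class (ORL experiment)
print(comb(10, 5))  # 252, matching the 252 recognition tasks of Sect. 3
```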
The first database tested is the ORL database. We select
s = 5 training images out of the n = 10 images within a class,
and the remaining t = 5 images are used for testing. Therefore,
the TPTSR methods are applied to C_10^5 = 252 face
recognition tasks. Figure 2 shows part of the images from the
ORL database used in the experiment; all the images have
been resized to 46 × 56 pixels by using a down-sampling
algorithm [34].
In the testing with the AR database, for each class, 13
face images are randomly selected from the 26 images as the
training set, and the remaining 13 images are taken as the
testing set. Only one face recognition test is therefore
carried out, because of the excessive computational load.
Figure 3 shows some of the images from the AR database;
the images for training and testing have been downsized
to 40 × 50 pixels [34].
For the Feret database, four images out of the seven within
each class are selected randomly as the training images,
and the remaining three images are the testing samples.
Figure 4 shows some sample images from the Feret database;
the images used are also resized to 80 × 80 pixels.
In the TPTSR method with any distance metric, the
solution of Eq. (7) or (8) is required in the second phase of
recognition, following the selection of the M-nearest samples.
In our experiments, μ in Eq. (8) is set to 0.01 for all
the M-nearest-neighbor selections.
In the second phase of the TPTSR method, the testing
image is represented by the linear combination of the
M-nearest samples as expressed in Eq. (5). If the linear
representation g_r of the M-nearest samples from one class
has the minimum deviation from the testing image, the
image is classified to this class. Consequently, the
reconstruction image g_r presents a face image similar in
appearance to the testing image (or the most similar shape
among all the classes).
In order to fully test the performance of the three
nearest-neighbor selection criteria in the TPTSR, each
testing image in the testing sets of the ORL database is tested
with different numbers of nearest neighbors. In this experiment,
252 different combination sets of the training and testing
images of the 40 individuals from the ORL database are
tested. In each of the 252 face recognition tasks, there are 200
training images and 200 testing images selected from the
40 classes, each of which provides five training images and
Table 3 Approximation of the computational complexity of the SR
method and the TPTSR method with different distance metrics

Face recognition method           Computational complexity
SR                                O(nN^3 + 2nN^2) for nonsingular X;
                                  O(3nN^3 + 2nN^2) for singular X
Original TPTSR                    O((n + m)(N^3 + 2N^2)) for nonsingular X;
                                  O((n + m)(3N^3 + 2N^2)) for singular X
TPTSR with City-block distance    O(nN + m(N^3 + 2N^2)) for nonsingular X;
                                  O(nN + m(3N^3 + 2N^2)) for singular X
TPTSR with Euclidean distance     O(3nN + m(N^3 + 2N^2)) for nonsingular X;
                                  O(3nN + m(3N^3 + 2N^2)) for singular X
five testing images, respectively. For every testing image,
we test the three distance criteria with the number of
nearest neighbors ranging from 10 to 200 in intervals of 10
(M = 10, 20, ..., 200). In every task, if the criterion used in
the TPTSR for nearest-neighbor selection results in the
misclassification of a test image, the error is recorded.
Therefore, for each value of M there is a classification
error rate, defined as the number of testing
images that have been misclassified divided by the total
number of testing images. In this experiment, the total
number of testing images in each task is 200.
For the initial testing and comparison, only 10 combination
sets of training and testing samples, randomly
selected from the 252 combinations of the ORL database,
are tested under the TPTSR method with the 10 different
neighbor-selection criteria listed in Table 2. Figure 5
shows the mean error rate averaged over these 10 tests for
different values of M and different distance metrics. It is
obvious that the classification performance of the TPTSR
method with the Jaccard distance metric and the standardized
Euclidean distance metric is relatively poor, but the
remaining eight distance metrics have comparable classification
Fig. 2 Part of the face images from the ORL database for testing
Fig. 3 Part of the face images from the AR database for testing
rates, and some of them are very similar to each other. From
Fig. 5, we can see that these eight working distance metrics
can all be used as the nearest-neighbor selection criterion in
TPTSR methods. Although these eight distance metrics
result in similar classification rates, from Table 2 we can
see that their computational efficiencies vary considerably.
Because of the huge computational load, in the following
experiments we select only the two most efficient distance
metrics from Table 2, the Euclidean distance and the City-block
distance, for further testing in comparison with the
linear representation metric of the original TPTSR.
In the testing of the original TPTSR and the TPTSR
with the City-block distance and Euclidean distance, the
computational efficiency is compared as well as the
classification performance. Figure 6 shows the mean error
rates averaged over the 252 tests for different values of M.
It can be seen that the three nearest-neighbor selection
criteria have very similar mean error rates for values of M
over 100. However, on closer inspection, we can
see that the Euclidean distance and City-block distance
criteria achieve better performance than the linear
representation method in the nearest-neighbor selection. It can
be seen from Fig. 6 that the mean error rates for the City-block
distance and Euclidean distance reach their minimum
at 40 nearest neighbors, where the best mean error
rate in this experiment is achieved by the City-block
distance criterion. However, the mean error rates in Fig. 6
start to increase when more than 40 nearest neighbors are
used. This is because the more neighbors are selected,
the closer their performance is to the global representation
Fig. 4 Part of the face images from the Feret database for testing
Fig. 5 Mean error rate averaged from these 10 tests for different
M numbers and different distance metrics
method, and when all the training samples are selected, all
methods are equivalent to the global representation
method.
Figure 7 shows the reconstruction results of the TPTSR for the ORL database with the different M-nearest sample selection criteria. The images in the first column are the testing images, and the images in each row are the five nearest neighbors of the testing image as measured by the linear representation, Euclidean distance, and City-block distance, respectively. It can be seen that all the distance criteria for the M-nearest sample selection result in the correct classification, with only small deviations between one another. Figure 8 shows the testing
samples of a subject from the ORL database and their
corresponding linear representations by different selected
intra-class training images, respectively. The first row lists
the original testing image. The images in the second, third,
and fourth rows are the corresponding linear combinations
in the second phase of TPTSR by the training images from
the same class selected by the criteria of linear representation, Euclidean distance, and City-block distance, respectively. Since the ORL database is tested with TPTSR
in 252 tasks, each of which uses a different combination set
of training and testing images within a class, the testing
samples in Fig. 8 are selected from one of the 252
combinations.
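The second-phase linear combinations shown in Fig. 8 can be illustrated with a minimal least-squares sketch: the test sample y is expressed over the M selected samples by solving the normal equations, and the class whose samples' contribution deviates least from y wins. This assumes an unregularized solution and illustrative helper names; it is not the paper's implementation.

```python
def solve(mat, vec):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(vec)
    a = [row[:] + [vec[i]] for i, row in enumerate(mat)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= f * a[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (a[i][n] - sum(a[i][j] * x[j] for j in range(i + 1, n))) / a[i][i]
    return x

def second_phase(y, selected):
    """selected: list of (class_label, sample). Returns the winning class."""
    labels = [lbl for lbl, _ in selected]
    cols = [s for _, s in selected]
    m, d = len(cols), len(y)
    # Normal equations (A^T A) c = A^T y, where column j of A is cols[j].
    ata = [[sum(cols[i][k] * cols[j][k] for k in range(d)) for j in range(m)]
           for i in range(m)]
    aty = [sum(cols[i][k] * y[k] for k in range(d)) for i in range(m)]
    c = solve(ata, aty)
    best, best_dev = None, float("inf")
    for cls in set(labels):
        # Contribution of class `cls`: sum of c_j * x_j over its samples only.
        recon = [sum(c[j] * cols[j][k] for j in range(m) if labels[j] == cls)
                 for k in range(d)]
        dev = sum((y[k] - recon[k]) ** 2 for k in range(d))
        if dev < best_dev:
            best, best_dev = cls, dev
    return best

selected = [("A", [1.0, 0.0, 0.1]), ("A", [0.9, 0.1, 0.0]), ("B", [0.0, 1.0, 0.0])]
print(second_phase([1.0, 0.05, 0.05], selected))  # A
```

Only this second phase needs the expensive joint solve; the point of the paper is that the first phase does not.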
Figure 9 shows the computation time required by the three selection criteria to process the testing images for different numbers of classes involved. This computation time only counts the first-phase calculation with the different selection criteria in one task, and the computation was carried out on a PC with an Intel T5200 1.6 GHz CPU
Fig. 6 The mean error rates averaged from the 252 tests for different M numbers (ORL database)
Fig. 7 The reconstruction results of the TPTSR for the ORL database with different M-nearest sample selection criteria
and 1.5 GB of RAM. It is clear that the computational load of the linear representation method increases much faster than that of the Euclidean distance and City-block distance as the number of classes in the image recognition task increases.
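A back-of-the-envelope operation count makes this scaling gap concrete. The tallies below are rough assumptions (one pass over d pixels per training sample for a distance scan, versus all pairwise column inner products just to form the normal equations for the linear representation), not figures from the paper.

```python
def distance_scan_ops(n, d):
    # One O(d) pass per training sample: roughly n * d operations.
    return n * d

def normal_equations_ops(n, d):
    # Forming A^T A alone needs all n^2 inner products of length d.
    return n * n * d

n, d = 400, 1024  # e.g. 400 training images of 32 x 32 pixels (assumed sizes)
print(distance_scan_ops(n, d))        # 409600
print(normal_equations_ops(n, d))     # 163840000, i.e. n times more work
```

The ratio grows linearly with the number of training samples, which is consistent with the diverging curves in Fig. 9.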
In the testing tasks with the AR database and Feret
database, only one combination of training set and testing
set is tested because of the overwhelming computational load. In the testing with the AR database, 13 face images from each of the 120 classes are randomly selected as the training samples, and the remaining 13 images within each class are taken as the testing samples. For every testing image, we test the three distance criteria with the number of nearest neighbors ranging from 50 to 1,500 in intervals of 50 (M = 50, 100, …, 1,500). Figure 10 shows the error
rates for each of the M numbers of nearest neighbors
selected by the three criteria, and Fig. 11 illustrates their
computation time required to calculate the first phase in the
task.
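The per-class random split described above might be realized as in the following sketch; the seed, helper name, and placeholder image IDs are assumptions for illustration only.

```python
import random

def split_class(images, n_train, rng):
    """Randomly assign n_train images to training and the rest to testing."""
    idx = list(range(len(images)))
    rng.shuffle(idx)
    train = [images[i] for i in idx[:n_train]]
    test = [images[i] for i in idx[n_train:]]
    return train, test

rng = random.Random(0)  # fixed seed so one combination is reproducible
images = ["class7_img%d" % i for i in range(26)]  # 26 images per AR class
train_set, test_set = split_class(images, 13, rng)
print(len(train_set), len(test_set))  # 13 13
```

Repeating this per class, with fresh draws, yields one training/testing combination of the kind tested here.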
In the testing with the Feret database, four images are randomly selected out of the seven as the training samples for each of the 200 classes, and the remaining three images are the testing samples. Figure 12 shows the error rates for each of the M numbers of nearest neighbors from 7 to 800
Fig. 8 The testing samples of a subject from the ORL database and their corresponding linear representations by different selected intra-class training images
Fig. 9 The computation time of the three selection criteria required to calculate the ORL testing images for different numbers of classes
Fig. 10 The error rates for randomly selected training samples for different M numbers (AR database)
Fig. 11 The computation time of the three selection criteria required to calculate the AR testing images for different numbers of classes
Fig. 12 The error rates for randomly selected training samples for different M numbers (Feret database)
in intervals of 28 (M = 7, 35, …, 800). Figure 13 shows the computation time for the three criteria in the first-phase calculations.
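The sweeps behind Figs. 10 and 12 amount to a small evaluation loop over M. In the sketch below, `classify` is a toy stand-in for a full TPTSR run, so the numbers are purely illustrative.

```python
def error_rate_sweep(eval_set, m_values, classify):
    """Return the fraction of misclassified samples for each M."""
    rates = {}
    for m in m_values:
        errors = sum(1 for sample, label in eval_set if classify(sample, m) != label)
        rates[m] = errors / len(eval_set)
    return rates

# Toy stand-in classifier that is wrong on one of the four samples.
samples = [([0], "A"), ([1], "A"), ([2], "B"), ([3], "B")]
classify = lambda x, m: "A" if x[0] < 2 or x[0] == 3 else "B"
rates = error_rate_sweep(samples, [7, 35], classify)
print(rates)  # {7: 0.25, 35: 0.25}
```

Plotting `rates` against `m_values` reproduces the shape of the error-rate curves reported for each database.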
It can be seen from Figs. 10 and 12 that the classifica-
tion rates of the TPTSR with the Euclidean distance and
City-block distance are also close to that of the linear
representation. However, from Figs. 11 and 13, we can see
that the linear representation criterion is much more
demanding in computation time than the Euclidean dis-
tance and the City-block distance. The computation load
for the linear representation criterion increases dramati-
cally with the size of the database and the number of
classes needed, since the matrix operations involved cannot
be simplified or optimized.
4 Conclusion
The TPTSR method increases the classification rate by
selecting M-nearest neighbors from the training space and
focusing on the local distribution. However, the linear
representation criterion for selecting the M-nearest neighbors in the first phase is computationally demanding, especially when the training set as well as the dimension of the samples is large. Since the first-phase processing in the TPTSR is a coarse search, sparse representation is not necessary at this stage. Therefore, we consider alternative
distance approximation metrics for the neighbor selection,
and the 10 most popular distance metrics are tested and
compared. The theoretical analysis shows that using the City-block distance approximation for the M-nearest-neighbor selection introduces negligible uncertainty into the classification rate of the TPTSR, while the experimental analysis shows that the computational efficiency can be significantly improved. In the testing, we can see that among the distance metrics
tested, the Euclidean distance and the City-block distance
are the most efficient in terms of both classification
performance and computational complexity. The experi-
mental results show that the TPTSR method with the
Euclidean distance and the City-block distance criteria can achieve almost the same classification performance as the original TPTSR, while the computational complexity can be greatly reduced. This study can also serve as a reference for face recognition on portable systems, since minimizing the computational load is significant, or even of utmost importance, for cutting the cost of embedded systems.
References
1. Kirby M, Sirovich L (1990) Application of the Karhunen-Loève procedure for the characterization of human faces. IEEE Trans Pattern Anal Mach Intell 12:103–108
2. Xu Y, Zhang D, Yang J, Yang J-Y (2008) An approach for
directly extracting features from matrix data and its application in
face recognition. Neurocomputing 71:1857–1865
3. Yang J, Zhang D, Frangi AF, Yang J-Y (2004) Two-dimensional
PCA: a new approach to appearance-based face representation
and recognition. IEEE Trans Pattern Anal Mach Intell 26:
131–137
4. Xu Y, Zhang D (2010) Represent and fuse bimodal biometric
images at the feature level: complex-matrix-based fusion scheme.
Opt Eng 49(3):037002
5. Park SW, Savvides M (2010) A multifactor extension of linear
discriminant analysis for face recognition under varying pose and
illumination. EURASIP J Adv Signal Process 2010:11
6. Fan Z, Xu Y, Zhang D (2011) Local linear discriminant analysis
framework using sample neighbors. IEEE Trans Neural Netw
22:1119–1132
Fig. 13 The computation time of the three selection criteria required to calculate the Feret testing images for different numbers of classes
7. Sugiyama M (2007) Dimensionality reduction of multimodal
labeled data by local Fisher discriminant analysis. J Mach Learn
Res 8:1027–1061
8. Vural V, Fung G, Krishnapuram B, Dy JG, Rao B (2009) Using
local dependencies within batches to improve large margin
classifiers. J Mach Learn Res 10:183–206
9. Liu ZY, Chiu KC, Xu L (2003) Improved system for object
detection and star/galaxy classification via local subspace anal-
ysis. Neural Netw 16(3–4):437–451
10. Yang Y, Xu D, Nie F, Yan S, Zhuang Y (2010) Image clustering
using local discriminant models and global integration. IEEE
Trans Image Process 19(10):2761–2773
11. Lai Z, Jin Z, Yang J, Wong WK (2010) Sparse local discriminant
projections for feature extraction. In: ICPR, pp 926–929
12. Wright J, Ma Y, Mairal J, Sapiro G, Huang TS, Yan S (2010) Sparse representation for computer vision and pattern recognition. Proc IEEE 98(6):1031–1044
13. Wright J, Yang AY, Ganesh A, Sastry SS, Ma Y (2009) Robust
face recognition via sparse representation. IEEE Trans Pattern
Anal Mach Intell 31:210–227
14. Mairal J, Bach F, Ponce J, Sapiro G, Zisserman A (2009)
Supervised dictionary learning. In: Advances in neural informa-
tion processing systems (NIPS)
15. Shi Y, Dai DQ, Liu CC, Yan H (2009) Sparse discriminant
analysis for breast cancer biomarker identification and classifi-
cation. Prog Nat Sci 19:1635–1641
16. Dikmen M, Huang T (2008) Robust estimation of foreground in
surveillance videos by sparse error estimation. In: International
conference on pattern recognition
17. Elhamifar E, Vidal R (2009) Sparse subspace clustering. In: IEEE
international conference on computer vision and pattern recog-
nition, pp. 2790–2797
18. Rao S, Tron R, Vidal R, Ma Y (2008) Motion segmentation via
robust subspace separation in the presence of outlying, incom-
plete, and corrupted trajectories. In: IEEE international confer-
ence on computer vision and pattern recognition, pp 1–8
19. Zhang L, Yang M, Feng X (2011) Sparse representation or
collaborative representation: which helps face recognition?
In: ICCV
20. Casasent D (1984) Unified synthetic discriminant function com-
putational formulation. Appl Opt 23:1620–1627
21. Li SZ (1998) Face recognition based on nearest linear combi-
nations. In: Proceedings of IEEE international conference on
computer vision and pattern recognition, pp 839–844
22. Li SZ, Lu J (1999) Face recognition using nearest feature line
method. IEEE Trans Neural Netw 10:439–443
23. Xu Y, Zhang D, Yang J, Yang J-Y (2011) A two-phase test sample sparse representation method for use with face recognition. IEEE Trans Circuits Syst Video Technol 21
24. Zhang L (2011) Sparse representation or collaborative represen-
tation: which helps face recognition? In: ICCV
25. Breu H, Gil J, Kirkpatrick D, Werman M (1995) Linear time
Euclidean distance transform algorithms. IEEE Trans Pattern
Anal Mach Intell 17:529–533
26. Krause EF (1987) Taxicab geometry. Dover, NY
27. Spearman C (1904) The proof and measurement of association
between two things. Am J Psychol 15:72–101
28. Minkowski H (1953) Geometrie der Zahlen. Chelsea, New York
29. Available: http://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html
30. Available: http://cobweb.ecn.purdue.edu/~aleix/aleix-face-DB.html
31. Phillips PJ, Moon H, Rizvi SA, Rauss PJ (2000) The FERET
evaluation methodology for face-recognition algorithms. IEEE
Trans Pattern Anal Mach Intell 22:1090–1104
32. Phillips PJ The facial recognition technology (FERET) database.
Available: http://www.itl.nist.gov/iad/humanid/feret/feret-master.
html
33. Tumanski S (2006) Principles of electrical measurement. Taylor
& Francis Group, New York
34. Xu Y, Jin Z (2008) Down-sampling face images and low-reso-
lution face recognition. In: 3rd International conference on
innovative computing, information and control, pp 392–395