ORIGINAL ARTICLE
Distance approximation for two-phase test sample representation in face recognition
Xiang Wu • Ning Wu
Received: 26 August 2012 / Accepted: 29 January 2013
© Springer-Verlag London 2013
Abstract The two-phase test sample representation
(TPTSR) scheme was proposed as a useful method for face
recognition; however, the sample selection based on sparse
representation in its first phase is not necessary. This is
because the first phase only plays the role of a coarse search
in TPTSR, whereas the sparse representation method is suited
to fine classification. This paper proves that alternative
nearest-neighbor selection criteria with higher efficiency can
be used in the first phase of TPTSR without compromising
the classification accuracy. Theoretical analysis and
experimental results show that the original distance metric
based on sparse representation in the first phase of the
TPTSR can be approximated with a more straightforward
metric while maintaining a classification performance
comparable to the original TPTSR. Therefore, the
computational load of the TPTSR can be greatly reduced.
Keywords Computer vision · Face recognition · Pattern
recognition · Sparse representation · Transform methods
1 Introduction
Face recognition has drawn many researchers' attention in
recent years, and many approaches have been developed for this
application. One type of method reduces the
dimensionality of the samples by extracting feature vectors
with linear transformation methods, such as principal
component analysis (PCA) [1–3] and linear discriminant
analysis (LDA) [4, 5]. In the PCA method, the training
samples and the testing samples are transformed from the
original sample space into a space that maximizes the
variance of all the samples, while the LDA method converts
the samples to a feature space in which the distances between
the centers of different classes are maximized. In these two
transformation methods, both the training samples and the
testing samples have corresponding representations in
the new feature space, and classification is carried out
based on the distance between the representations of
the training set and the testing set.
Another type of transformation-based method was proposed
to focus on local information in the training samples.
Instead of using the whole training set, this type of method
uses only part of the samples, since the performance of a
classifier is usually limited to certain local
distributions. By concentrating on the local distribution of the
training data, the design and testing of the classifier can be
much more efficient than with global methods [6]. Typical
examples of local LDA methods include the method for
multimodal data projection [7] and the approach that uses the
local dependencies of samples for classification [8]. It has
also been found that local PCA is more efficient than
global PCA in feature extraction [9] and sample clustering
[10].
Recently, the sparse representation (SR) theory has been
introduced into face recognition applications and has drawn a
lot of interest [11–18]. In sparse representation methods,
the testing sample is written as a linear
combination of all the training samples, and the representation
error of each class is used for estimating the
X. Wu
School of Mechanical and Electrical Engineering,
Harbin Institute of Technology, 92 West Dazhi Street,
Nan Gang District, Harbin 150001, China
N. Wu (corresponding author)
Shenzhen Key Lab of Wind Power and Smart Grid,
Harbin Institute of Technology Shenzhen Graduate School,
Shenzhen 518055, China
e-mail: [email protected]
123
Neural Comput & Applic
DOI 10.1007/s00521-013-1352-8
target class. In previous studies, the l1-norm was usually used
for approximating the representation error; however, it has been
found that it is the collaborative representation in the
SR method that helps to differentiate the target class, and
the l2-norm in the approximation can achieve similar efficiency
[19].
It has been found that linear combinations of the training
set can help to increase the representational capacity of the
training samples. The synthetic discriminant function
(SDF) was proposed for this purpose [20], and the nearest
linear combination (NLC) method utilizes linear combinations
of the training set in a different way for eigenface-based
face recognition [21, 22]. However, in the NLC
method, the distance between the testing sample and the
linear combination of a training class is very sensitive to
the mean values of the different classes, and normalization
of all samples is required before the NLC processing.
In the SR method, a set of overdetermined equations must
be solved in order to obtain the linear representation
coefficients. However, if the number of samples
is massive, the computational load, as well as the memory
required to solve the equations, would be unacceptably large.
In a recent effort, a two-phase test sample representation
(TPTSR) scheme was proposed as an efficient SR approach
for face recognition [23]. In this method, the number of
training samples for the SR processing is reduced by
selecting the M-nearest neighbors to the testing image
beforehand. In the second phase of the TPTSR, the selected
subset of M-nearest neighbors is used to linearly represent
the testing sample as in the original SR method. The two-phase
structure of the TPTSR scheme enhances the performance
of the SR method by reducing the complexity of the
data based on a local distribution of the sample space.
Although the TPTSR method has been proven to be
powerful in face recognition, there is still great potential
for reducing its computational load. Since the first-phase
processing in the TPTSR is designed to provide a
coarse classification of the training samples, a fine
classification method like SR is in fact not necessary at this
stage. In this paper, we show that the SR method for
selecting nearest neighbors in the TPTSR is not necessary,
and we study alternative nearest-neighbor selection criteria
for the first phase of the TPTSR method by approximating
the distance between a testing sample and a training
sample with a more straightforward metric. In the theoretical
analysis, we prove that selecting the M-nearest neighbors
using distance approximation introduces only negligible
uncertainty into the classification accuracy, while the
computational load can be greatly reduced.
The computational performance of a pattern recognition
method is usually measured by time or complexity. In the
previous literature, computation time is usually taken
as the metric for comparing the performance of different
methods [4, 24]. However, the computation time of a
method depends greatly on the quality of the programming
and can differ considerably from case to case. The
comparison of computational complexity has rarely been
considered in face recognition, even though it provides a
theoretical analysis of the computational performance of a
method. In this paper, the computational complexity of the
proposed methods is compared with that of the original TPTSR,
so that the ranking of the computational efficiency of all the
methods can be clearly identified.
In the initial testing, we compare the computational
efficiency as well as the classification performance of the 10
most popular metrics in the fields of pattern recognition and
digital image processing. It is found that seven of the 10
distance metrics result in classification performance similar
to that of the original linear representation criterion.
However, the theoretical analysis of their computational
complexity shows that the TPTSR method
implemented with the City-block distance or the Euclidean
distance offers the most efficient performance in face
recognition. By replacing the linear representation criterion
in the first phase of TPTSR with the City-block or
Euclidean distance metric, the computation time for
the face recognition task can be significantly reduced while
maintaining almost the same classification performance.
The comparison between the TPTSR and other state-of-the-art
face recognition methods such as LDA, PCA, or
global representation methods has been demonstrated
elsewhere [23]; in this study, we only compare the
performance of the TPTSR method with different distance
metrics.
In the next section of this paper, we introduce the
theory of the TPTSR with different nearest-neighbor
selection criteria. Section 3 presents our experimental
results with different face image databases, and finally a
conclusion is drawn in Sect. 4.
2 Two-phase test sample representation (TPTSR)
with M-nearest-neighbor selection criteria
In this section, we examine the optimization solution
of the TPTSR scheme and show that the efficiency of the
first-phase processing can be improved.
2.1 First phase of the TPTSR
The first phase of the TPTSR reduces the global sample
space to a local area around the target class by selecting the
M-nearest neighbors from all the training samples for
further processing in the second phase [23]. The M-nearest
neighbors are selected by calculating the weighted distances
of the testing sample from each of the training
samples. First, let us assume that there are L classes and n
training images, x_1, x_2, ..., x_n, and if some of these images
are from the jth class (j = 1, 2, ..., L), then j is their class
label. It is also assumed that a test image y can be written as
a linear combination of all the training samples, such that

y = a_1 x_1 + a_2 x_2 + ... + a_n x_n,   (1)

where a_i (i = 1, 2, ..., n) is the coefficient for each training
image x_i. Equation (1) can also be written in vector form,
such that

y = XA,   (2)

where A = [a_1 ... a_n]^T and X = [x_1 ... x_n]; x_1 ... x_n and y are
all column vectors. If X is a nonsingular square matrix, Eq. (2)
can be solved by using A = X^{-1} y; otherwise, it can be
solved by using A = (X^T X + μI)^{-1} X^T y, where μ is a small
positive constant and I is the identity matrix. In our experiments
with the TPTSR method, μ in the solution is set to 0.01.
By solving Eq. (2), we can represent the testing image by
the linear combination of the training set shown in
Eq. (1), which means that the testing image is essentially
approximated by a weighted summation of all the
training images, with each weighted image a_i x_i a part of the
approximation. In order to measure the distance between
the training image x_i and the testing image y, a distance
metric is defined as follows:

e_i = ||y − a_i x_i||_2,   (3)

where e_i is called the distance function, and it gives the
deviation of the testing sample y from the training sample
x_i. Clearly, a smaller value of e_i means that the ith training
sample is closer to the testing sample, and it is more
likely to be a member of the target class. The
M-nearest neighbors are chosen to be processed further in the
second phase of the TPTSR, where the final decision is
made within a much smaller sample space. We assume that
the selected M-nearest neighbors are denoted x_1 ... x_M,
and the corresponding class labels are C = {c_1 ... c_M},
where c_i ∈ {1, 2, ..., L}. In the second phase of the
TPTSR, if a sample x_p's class label does not belong to C,
then its class will not be considered a target class, and
only a class from C will be regarded as a potential target
class.
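As a concrete illustration, the two steps of the first phase (the regularized solve for A and the per-sample distance of Eq. (3)) can be sketched in NumPy as below. The function name, array layout, and sizes are illustrative assumptions, not from the paper.

```python
import numpy as np

def first_phase(X, y, M, mu=0.01):
    """First phase of TPTSR: select the M training samples nearest to y.

    X:  (N, n) matrix whose columns are the n normalized training images.
    y:  (N,) normalized test image.
    Returns the indices of the M-nearest neighbors.
    """
    n = X.shape[1]
    # Regularized least-squares solution A = (X^T X + mu*I)^{-1} X^T y
    A = np.linalg.solve(X.T @ X + mu * np.eye(n), X.T @ y)
    # Distance of y from each weighted training sample: e_i = ||y - a_i x_i||, Eq. (3)
    e = np.linalg.norm(y[:, None] - A[None, :] * X, axis=0)
    return np.argsort(e)[:M]
```

The selected columns of X (and their class labels) are then passed on to the second phase.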
The linear representation method in the first phase is not
the only metric for selecting the M-nearest neighbors.
There are several popular distance metrics in digital image
processing that measure the difference between two images
and are suitable for the nearest-neighbor selection, such as the
City-block distance, the Euclidean distance, the Minkowski
distance, the cosine, the correlation, and the Spearman distance
[25–28]. Without loss of generality, if the jth elements
of the testing image y and a training image x_i are y(j) and
x_i(j), respectively, where j ∈ {1, 2, ..., N} and N is the
total number of elements in each vector, the Minkowski
distance between these two image vectors is defined as

e_i = { Σ_{j=1}^{N} |y(j) − x_i(j)|^p }^{1/p} = ||y − x_i||_p,   (4)

where p ∈ [1, ∞] and ||·||_p denotes the l_p norm. It is noted
that the City-block distance and the Euclidean distance are
two special cases of the Minkowski distance, with
p = 1 and p = 2, respectively; in particular, the Euclidean
distance is also a special case of the linear representation
method in which all the coefficients are set to unity.
Intuitively, the linear representation method is an optimal
solution for finding the distance between two images. However,
if the computational load is taken into account as well as the
classification rate, the efficiency of the TPTSR can be
improved by selecting the M-nearest neighbors with an
approximation of this distance, such as the City-block distance
or the Euclidean distance. The theory of distance
approximation is explained in Sect. 2.3, and the
computational complexity of all the popular distance
metrics is compared in Sect. 2.4. The comparison of
the performance of the selection criteria is shown
in the experimental section.
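Equation (4) can be sketched as a small NumPy function; for p = 1 and p = 2 it reduces to the City-block and Euclidean distances, as noted above. The function name and example vectors are illustrative assumptions.

```python
import numpy as np

def minkowski(y, x, p):
    """Minkowski distance of Eq. (4): ||y - x||_p for p >= 1."""
    return np.sum(np.abs(y - x) ** p) ** (1.0 / p)

y = np.array([0.2, 0.5, 0.1])
x = np.array([0.1, 0.4, 0.3])
d1 = minkowski(y, x, 1)  # City-block distance (p = 1)
d2 = minkowski(y, x, 2)  # Euclidean distance (p = 2)
print(d1, d2)
```

Either special case can serve directly as the first-phase selection criterion in place of Eq. (3).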
2.2 Second phase of the TPTSR
In the second phase of the TPTSR method, the M-nearest
neighbors selected in the first phase are further processed
to generate the final decision for the recognition task.
As defined in the first-phase processing, the
selected M-nearest neighbors are denoted x_1 ... x_M, and
again their linear combination is assumed to approximate the
testing image y, such that

y = b_1 x_1 + ... + b_M x_M,   (5)

where b_i (i = 1, 2, ..., M) are the coefficients. In vector
form, Eq. (5) can be written as

y = X̃B,   (6)

where B = [b_1 ... b_M]^T and X̃ = [x_1 ... x_M]. In the same
way as above, if X̃ is a nonsingular square matrix,
Eq. (6) can be solved by

B = X̃^{-1} y,   (7)

or otherwise, B can be solved by

B = (X̃^T X̃ + μI)^{-1} X̃^T y,   (8)

where μ is a small positive constant and I is the
identity matrix.
With the coefficients b_i obtained for each of the nearest
neighbors, the next step is to examine the contribution of
each class to the testing image in the second-phase
linear representation. Suppose that the nearest neighbors
x_s ... x_t are from the rth class (r ∈ C); the linear
contribution of this class to the approximation of the testing
sample is defined as

g_r = b_s x_s + ... + b_t x_t.   (9)

The approximation of the testing sample by the rth-class
samples among the M-nearest neighbors is evaluated by
calculating the deviation of g_r from y, such that

D_r = ||y − g_r||_2,  r ∈ C.   (10)

Clearly, a smaller value of D_r means a better approximation
of the testing sample by the training samples of the rth class,
and thus the rth class has a higher probability than the other
classes of being the target class. Therefore, the
testing sample y is classified to the class with the smallest
deviation D_r.
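The second phase (the regularized solve of Eq. (8) followed by the per-class deviations of Eqs. (9) and (10)) can be sketched along the same lines; the function name and array layout are again illustrative assumptions.

```python
import numpy as np

def second_phase(Xm, labels, y, mu=0.01):
    """Second phase of TPTSR: classify y from its M selected neighbors.

    Xm:     (N, M) matrix whose columns are the M-nearest training samples.
    labels: length-M array giving the class label of each column of Xm.
    """
    M = Xm.shape[1]
    # B = (X~^T X~ + mu*I)^{-1} X~^T y, the regularized solution of Eq. (8)
    B = np.linalg.solve(Xm.T @ Xm + mu * np.eye(M), Xm.T @ y)
    best_class, best_dev = None, np.inf
    for r in np.unique(labels):
        g_r = Xm[:, labels == r] @ B[labels == r]  # class contribution, Eq. (9)
        D_r = np.linalg.norm(y - g_r)              # deviation from y, Eq. (10)
        if D_r < best_dev:
            best_class, best_dev = r, D_r
    return best_class
```

The test image is assigned to the class whose contribution g_r deviates least from y.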
In the second phase of the TPTSR, the solution in Eq.
(7) or (8) offers an efficient means of finding the coefficients
for identifying the similarity between the testing sample and
the training samples among the M-nearest neighbors. It
can be seen that, if the training samples of one class
have great similarity with the testing sample, more training
samples from this class will be selected into the
group of M-nearest neighbors in the first phase of the
TPTSR. In the second phase of this process, the coefficients
obtained help to weight these training samples
in the linear representation, so that the training samples
from this class contribute more than those of any other class to
the approximation of the testing sample. As a result, the
testing sample is assigned to this class with the maximum
probability.
2.3 Distance approximation in the first phase of TPTSR
It has been pointed out that the linear representation in
the SR method is essentially the perpendicular projection
of y onto the space spanned by X [19]. In the original
TPTSR method, the solution A = (X^T X + μI)^{-1} X^T y is
essentially a least-squares solution, which means that
the projection of y onto the sample vector space is XA.
Figure 1 shows the geometric demonstration of a special
case of the least-squares solution for the projection of y,
where a plane π is defined by two sample vectors x_1 and
x_2, and the projection of y onto π is XA, with X = [x_1, x_2]
and A = [a_1, a_2]^T.
Since the testing sample and all the training samples are
normalized before the recognition process, the matching of
vectors depends only on their orientations. If the angle
between y and x_1 in Fig. 1 is denoted θ, and the projection of
y onto the training sample x_1 is denoted OP, we have

||y − a_1 x_1|| ≥ ||y − OP|| = ||y|| sin θ = sin θ.   (11)

It is noted that y is a normalized vector, and therefore
||y|| = 1. Since (y − OP) ⊥ x_1, we have

||y − OP|| = min{||y − a_1 x_1||}.   (12)

If the testing sample y is similar to the training sample x_1
and far from x_2, the distance ||y − a_1 x_1|| will be close
to ||y − OP||, and vice versa. Consequently, in the first
phase of the TPTSR method, the M-nearest neighbors can be
selected based on the distance ||y − OP|| for every training
sample. In other words, if the minimum value of the
distance ||y − a_i x_i|| (i = 1, ..., n) for a training sample
is larger than that of the others, this sample will not be
considered a nearest neighbor. Therefore, if we measure and
sort the distance ||y − OP|| instead of ||y − a_i x_i||
(i = 1, ..., n) for every training sample in the first phase of
the TPTSR, the farthest neighbors in the training set are
identified, and the remaining M samples are regarded as the
nearest neighbors.
When θ is a small angle (θ ≤ 15°), sin θ can be
approximated by θ in radians, and the arc length corresponding to
Fig. 1 The geometric demonstration of a special case of the least
squares solution for the projection of y (the plane defined by x_1 and
x_2, with XA the projection of y and r_min = y − XA)
the angle θ can also be approximated by ||y − x_1||, such
that

sin θ ≈ θ ≈ ||y − x_1||.   (13)

It is noted that ||y − x_1|| is the City-block distance between
y and x_1. It can be seen that the distance ||y − a_1 x_1|| is much
more sensitive to changes in the angle θ than
the distance ||y − OP|| or ||y − x_1||. This also explains
why the SR method is better than the nearest-neighbor
method. Therefore, if the angle θ is small enough, the
distance metric for selecting the M-nearest neighbors (or
farthest neighbors) in the original TPTSR method can
be approximated by the City-block distance or the Euclidean
distance.
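The small-angle relation in Eq. (13) can be checked numerically with two normalized 2-D vectors; the sketch below uses the Euclidean chord length between the unit vectors as the distance, and the chosen angle is an illustrative assumption.

```python
import numpy as np

theta = np.deg2rad(10.0)                      # a small angle (<= 15 degrees)
x1 = np.array([1.0, 0.0])                     # normalized training sample
y = np.array([np.cos(theta), np.sin(theta)])  # normalized test sample at angle theta

chord = np.linalg.norm(y - x1)  # straight-line distance between the unit vectors
print(np.sin(theta), theta, chord)  # the three quantities agree to within about 1 %
```

At 10°, sin θ, θ, and the chord length all differ by less than 1 %, consistent with the approximation used for the first-phase selection.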
In practice, the approximation in Eq. (13) is verified by
calculating the orientation distributions of the image vectors
from the online face image databases, namely the ORL
[29], AR [30], and Feret [31, 32] databases. Table 1 shows
the average orientation deviation (AOD) of the image
vectors for the three databases, respectively, and the
errors resulting from the approximation in Eq. (13). The
orientation deviation of the image vectors is calculated based on
the difference of each image vector from all other
images within the database. The second row of Table 1
gives the relative error introduced by using the distance
approximation in Eq. (13). The third row of Table 1 shows
the average classification error rate of the TPTSR according
to Ref. [23]. With reference to measurement theory,
the uncertainty contributed to the classification error of the
original TPTSR by introducing the distance approximation
into the first phase can be calculated from the average
classification error of the TPTSR (denoted E_av) and the
relative error (denoted γ) resulting from the distance
approximation in the first phase, such that [33]

E_un = ±γ E_av.   (14)

It can be seen from Table 1 that the average orientation
deviation for the three databases can be up to about 25°,
and the relative error resulting from the approximation in
Eq. (13) is about 3 %. In practice, however, the uncertainty
contributed by this distance approximation to the final
classification accuracy of the TPTSR method can be ignored.
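The last row of Table 1 follows directly from Eq. (14) and the second and third rows; the sketch below simply reproduces that arithmetic with the figures taken from Table 1.

```python
# Reproducing Eq. (14), E_un = +/- gamma * E_av, with the figures of Table 1
gamma = {"ORL": 0.01035, "AR": 0.03143, "Feret": 0.01079}  # relative error gamma
E_av  = {"ORL": 5.4,     "AR": 27.6,    "Feret": 35.8}     # average TPTSR error rate, %
E_un  = {db: round(gamma[db] * E_av[db], 3) for db in gamma}
print(E_un)  # {'ORL': 0.056, 'AR': 0.867, 'Feret': 0.386}, matching Table 1
```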
2.4 Computational complexity analysis
for nearest-neighbor selection criteria
The computational load of a face recognition task has rarely
been considered seriously in the past, since almost all
real-world applications were assumed to run on a PC.
However, with the recent development of portable systems
such as mobile phones and other consumer embedded systems,
reducing the computational load of a recognition
task is significant for cutting system cost. In this
section, we present the theoretical analysis of the time complexity
of the nearest-neighbor selection criteria mentioned in
Sect. 2.1. As denoted above, n is the number of
training samples and N is the number of pixels in each
of the training/testing images. Table 2 shows the approximate
computational complexity of 10 different
distance metrics that can be used for nearest-neighbor
selection in the first phase of TPTSR.
It is noted that Table 2 shows the approximate
computational complexity only of the calculations required
for the nearest-neighbor selection in the first phase of the
TPTSR processing, not including the calculations of the
second phase. It can be seen that the City-block distance and
the Euclidean distance are the most efficient metrics for the
nearest-neighbor selection. Table 3 compares the computational
complexity of the SR and TPTSR methods with different
distance metrics. It can be seen that the computational
load of the original TPTSR method exceeds that of the
SR method by at least O(m(N^3 + 2N^2)). However, by using
the City-block distance in the TPTSR, the computational
complexity is reduced by O((n − m)(N^3 + 2N^2) − nN) relative to the
SR method and by O(n(N^3 + 2N^2 − N)) relative to the original
TPTSR. Since normally n ≫ m and N^3 ≫ N,
the computational load relieved by using the TPTSR with
Table 1 Average orientation deviation of the image vectors from the
three databases, respectively

                 ORL database   AR database   Feret database
AOD              14.3°          24.9°         14.6°
γ                1.035 %        3.143 %       1.079 %
E_av of TPTSR    5.4 %          27.6 %        35.8 %
E_un             ±0.056 %       ±0.867 %      ±0.386 %
Table 2 Approximation of the computational complexity of 10 different
distance metrics for nearest-neighbor selection in the first phase
of TPTSR

Distance metric                    Computational complexity
Linear representation (SR)         O(nN^3 + 2nN^2) for nonsingular X;
                                   O(3nN^3 + 2nN^2) for singular X
City-block distance                O(nN)
Euclidean distance                 O(3nN)
Standardized Euclidean distance    O(nN^2 + 2nN + N)
Third-order Minkowski distance     O(4nN)
Fourth-order Minkowski distance    O(5nN)
Cosine                             O(4nN)
Correlation                        O(3nN^2 + nN)
Chebyshev distance                 O(nN^2)
Jaccard distance                   O(5nN)
City-block distance is significant. In the experimental
section, we will compare their classification performance in
face recognition; however, because of the massive computational
load, only the most efficient metrics, the City-block distance
and the Euclidean distance, are tested in more detail.
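To get a feel for the scale of these savings, one can plug representative sizes into the Table 3 expressions. The sizes below (borrowed from the ORL experiment in Sect. 3) and the raw operation counts are illustrative only, not a timing benchmark.

```python
# Plugging representative sizes into the Table 3 complexity expressions
# (raw operation counts, for illustration only; not a timing benchmark)
n, m, N = 400, 40, 46 * 56  # ORL: 400 samples, M = 40 neighbors, 46x56-pixel images

sr         = n * N**3 + 2 * n * N**2        # SR, nonsingular X
tptsr_orig = (n + m) * (N**3 + 2 * N**2)    # original TPTSR
tptsr_city = n * N + m * (N**3 + 2 * N**2)  # TPTSR with City-block first phase
print(tptsr_orig / tptsr_city)  # roughly (n + m) / m, i.e. about 11x fewer operations
```

Because the nN term is negligible next to the m(N^3 + 2N^2) term, the ratio is essentially (n + m)/m, which is why the savings grow with the size of the training set.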
3 Experimental results
The training sets and testing sets prepared for the experiments
are from the online ORL [29], AR [30], and Feret
[31, 32] face image databases. These databases provide
images taken of different faces with different facial
expressions and facial details, at different times, and under
different lighting conditions. There are in total 400 face images
in the ORL database from 40 different people (or
classes), and all of them are used in our experiments. The AR
face database contains 3120 images from 120 individuals,
each of whom provides 26 images with different facial details.
In the Feret face image database, only 1400 face images from 200
classes are used for our training and testing, and each
class contains seven face images.
The experiments in this study are applied to the ORL,
AR, and Feret databases, respectively. In each recognition
task with the ORL database, the training samples are
prepared by selecting some of the images from the database,
and the remaining images are taken as the testing set. If
there are n samples in one class, and s samples are selected
as the training samples, then the remaining t = n − s
samples are regarded as the testing set of this class.
According to combination theory, the number of
possible selections of s samples is

C_n^s = n(n − 1)...(n − s + 1) / (s(s − 1)...1).   (15)

In this way, there are C_n^s possible training sets, with
C_n^s corresponding testing sets, and there are therefore C_n^s
training and testing tasks to carry out for this database.
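The count for the ORL split used below can be checked directly against Eq. (15), e.g. with Python's standard-library binomial function:

```python
from math import comb

# s = 5 training images chosen from the n = 10 images per class (ORL experiment)
print(comb(10, 5))  # 252, matching the 252 recognition tasks of Sect. 3
```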
The first database tested is the ORL database. We select
s = 5 training images out of the n = 10 images within a class,
and the remaining t = 5 images are used for testing. Therefore,
the TPTSR methods are applied to C_10^5 = 252 face
recognition tasks. Figure 2 shows part of the images from the
ORL database used in the experiment; all the images have
been resized to 46 × 56 pixels by using a down-sampling
algorithm [34].
In the testing with the AR database, for each class, 13
face images are randomly selected from the 26 images as the
training set, and the remaining 13 images are taken as the
testing set. Only one face recognition test is therefore
carried out, because of the excessive computational load.
Figure 3 shows some of the images from the AR database;
the images for training and testing have been downsized
to 40 × 50 pixels [34].
For the Feret database, four images out of the seven within
each class are selected randomly as the training images,
and the remaining three images are the testing samples.
Figure 4 shows some sample images from the Feret database;
the images used are also resized to 80 × 80 pixels.
In the TPTSR method with any distance metric, the
solution of Eq. (7) or (8) is required in the second phase of
recognition, following the selection of the M-nearest samples.
In our experiments, μ in Eq. (8) is set to 0.01 for all
the M-nearest-neighbor selections.
In the second phase of the TPTSR method, the testing
image is represented by the linear combination of the
M-nearest samples as expressed in Eq. (5). If the linear
representation g_r of the M-nearest samples from one class
has the minimum deviation from the testing image, the
image is classified to this class. Consequently, the
reconstruction image g_r presents a face image similar in
appearance to the testing image (or the most similar shape
among all the classes).
In order to fully test the performance of the three
nearest-neighbor selection criteria in the TPTSR, each
testing image in the testing sets of the ORL database is tested
with different numbers of nearest neighbors. In this experiment,
252 different combination sets of the training and testing
images of the 40 individuals from the ORL database are
tested. In each of the 252 face recognition tasks, there are 200
training images and 200 testing images selected from the
40 classes, each of which provides five training images and
Table 3 Approximation of the computational complexity of the SR
method and the TPTSR method with different distance metrics

Face recognition method           Computational complexity
SR                                O(nN^3 + 2nN^2) for nonsingular X;
                                  O(3nN^3 + 2nN^2) for singular X
Original TPTSR                    O((n + m)(N^3 + 2N^2)) for nonsingular X;
                                  O((n + m)(3N^3 + 2N^2)) for singular X
TPTSR with City-block distance    O(nN + m(N^3 + 2N^2)) for nonsingular X;
                                  O(nN + m(3N^3 + 2N^2)) for singular X
TPTSR with Euclidean distance     O(3nN + m(N^3 + 2N^2)) for nonsingular X;
                                  O(3nN + m(3N^3 + 2N^2)) for singular X
five testing images, respectively. For every testing image,
we test the three distance criteria with the number of
nearest neighbors ranging from 10 to 200 in intervals of 10
(M = 10, 20, ..., 200). In every task, if the criterion used in
the TPTSR for nearest-neighbor selection results in the
misclassification of a test image, the error is recorded.
Therefore, for each value of M there is a classification
error rate, defined as the number of testing
images that have been misclassified divided by the total
number of testing images. In this experiment, the total
number of testing images in each task is 200.
For the initial testing and comparison, only 10 combination
sets of training and testing samples, randomly
selected from the 252 combinations of the ORL database,
are tested under the TPTSR method with the 10 different
neighbor-selection criteria listed in Table 2. Figure 5
shows the mean error rate averaged over these 10 tests for
different values of M and different distance metrics. It is
obvious that the classification performance of the TPTSR
method with the Jaccard distance metric and the standardized
Euclidean distance metric is relatively poor, but the
remaining eight distance metrics have comparable classification
Fig. 2 Part of the face images from the ORL database for testing
Fig. 3 Part of the face images from the AR database for testing
rates, and some of them are very similar to each other. From
Fig. 5, we can see that these eight working distance metrics
can all be used as the nearest-neighbor selection criterion in
TPTSR methods. Although these eight distance metrics
result in similar classification rates, from Table 2 we can
see that their computational efficiencies vary considerably.
Because of the huge computational load, in the following
experiments we select only the two most efficient distance
metrics from Table 2, the Euclidean distance and the City-block
distance, for further testing in comparison with the
linear representation metric of the original TPTSR.
In the testing of the original TPTSR and the TPTSR
with the City-block distance and Euclidean distance, the
computational efficiency is compared as well as the
classification performance. Figure 6 shows the mean error
rates averaged over the 252 tests for different values of M.
It can be seen that the three nearest-neighbor selection
criteria have very similar mean error rates for values of M
over 100. However, on closer inspection, we can
see that the Euclidean distance and City-block distance
criteria achieve better performance than the linear
representation method in the nearest-neighbor selection. It can
be seen from Fig. 6 that the mean error rates for the City-block
distance and Euclidean distance reach their minimum
at 40 nearest neighbors, where the best mean error
rate in this experiment is achieved by the City-block
distance criterion. However, the mean error rates in Fig. 6
start to increase when more than 40 nearest neighbors are
used. This is because the more neighbors are selected,
the closer their performance is to the global representation
Fig. 4 Part of the face images from the Feret database for testing
Fig. 5 Mean error rate averaged from these 10 tests for different
M numbers and different distance metrics
method, and when all the training samples are selected, all
methods are equivalent to the global representation
method.
Figure 7 shows the reconstruction results of the TPTSR for the ORL database with the different M-nearest sample selection criteria. The images in the first column are the testing images, and the images in each row are the five nearest neighbors of the testing image as measured by the linear representation, Euclidean distance, and City-block distance, respectively. It can be seen that all the distance criteria for the M-nearest sample selection result in the correct classification, with only small deviations between one another. Figure 8 shows the testing
samples of a subject from the ORL database and their
corresponding linear representations by different selected
intra-class training images, respectively. The first row lists
the original testing image. The images in the second, third,
and fourth rows are the corresponding linear combinations
in the second phase of TPTSR by the training images from
the same class selected by the criteria of linear representation, Euclidean distance, and City-block distance, respectively. Since the ORL database is tested with TPTSR
in 252 tasks, each of which uses a different combination set
of training and testing images within a class, the testing
samples in Fig. 8 are selected from one of the 252
combinations.
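The second-phase linear combinations shown in Fig. 8 can be illustrated with a minimal least-squares sketch: the test sample y is expressed over the M selected samples by solving the normal equations, and the class whose samples' contribution deviates least from y wins. This assumes an unregularized solution and illustrative helper names; it is not the paper's implementation.

```python
def solve(mat, vec):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(vec)
    a = [row[:] + [vec[i]] for i, row in enumerate(mat)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= f * a[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (a[i][n] - sum(a[i][j] * x[j] for j in range(i + 1, n))) / a[i][i]
    return x

def second_phase(y, selected):
    """selected: list of (class_label, sample). Returns the winning class."""
    labels = [lbl for lbl, _ in selected]
    cols = [s for _, s in selected]
    m, d = len(cols), len(y)
    # Normal equations (A^T A) c = A^T y, where column j of A is cols[j].
    ata = [[sum(cols[i][k] * cols[j][k] for k in range(d)) for j in range(m)]
           for i in range(m)]
    aty = [sum(cols[i][k] * y[k] for k in range(d)) for i in range(m)]
    c = solve(ata, aty)
    best, best_dev = None, float("inf")
    for cls in set(labels):
        # Contribution of class `cls`: sum of c_j * x_j over its samples only.
        recon = [sum(c[j] * cols[j][k] for j in range(m) if labels[j] == cls)
                 for k in range(d)]
        dev = sum((y[k] - recon[k]) ** 2 for k in range(d))
        if dev < best_dev:
            best, best_dev = cls, dev
    return best

selected = [("A", [1.0, 0.0, 0.1]), ("A", [0.9, 0.1, 0.0]), ("B", [0.0, 1.0, 0.0])]
print(second_phase([1.0, 0.05, 0.05], selected))  # A
```

Only this second phase needs the expensive joint solve; the point of the paper is that the first phase does not.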
Figure 9 shows the computation time required by the three selection criteria to process the testing images for different numbers of classes involved. This computation time only counts the first-phase calculation with the different selection criteria in one task, and the computation was carried out on a PC with an Intel T5200 1.6 GHz CPU
Fig. 6 The mean error rates averaged from the 252 tests for different M numbers (ORL database)
Fig. 7 The reconstruction results of the TPTSR for the ORL database with different M-nearest sample selection criteria
and 1.5 GB of RAM. It is clear that the computational load of the linear representation method increases much faster than that of the Euclidean distance and City-block distance as the number of classes in the image recognition task increases.
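A back-of-the-envelope operation count makes this scaling gap concrete. The tallies below are rough assumptions (one pass over d pixels per training sample for a distance scan, versus all pairwise column inner products just to form the normal equations for the linear representation), not figures from the paper.

```python
def distance_scan_ops(n, d):
    # One O(d) pass per training sample: roughly n * d operations.
    return n * d

def normal_equations_ops(n, d):
    # Forming A^T A alone needs all n^2 inner products of length d.
    return n * n * d

n, d = 400, 1024  # e.g. 400 training images of 32 x 32 pixels (assumed sizes)
print(distance_scan_ops(n, d))        # 409600
print(normal_equations_ops(n, d))     # 163840000, i.e. n times more work
```

The ratio grows linearly with the number of training samples, which is consistent with the diverging curves in Fig. 9.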
In the testing tasks with the AR database and Feret
database, only one combination of training set and testing
set is tested because of the overwhelming computational load. In the testing with the AR database, 13 face images from each of the 120 classes are randomly selected as the training samples, and the remaining 13 images within each class are taken as the testing samples. For every testing image, we test the three distance criteria with the number of nearest neighbors ranging from 50 to 1,500 in intervals of 50 (M = 50, 100, …, 1,500). Figure 10 shows the error
rates for each of the M numbers of nearest neighbors
selected by the three criteria, and Fig. 11 illustrates their
computation time required to calculate the first phase in the
task.
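The per-class random split described above might be realized as in the following sketch; the seed, helper name, and placeholder image IDs are assumptions for illustration only.

```python
import random

def split_class(images, n_train, rng):
    """Randomly assign n_train images to training and the rest to testing."""
    idx = list(range(len(images)))
    rng.shuffle(idx)
    train = [images[i] for i in idx[:n_train]]
    test = [images[i] for i in idx[n_train:]]
    return train, test

rng = random.Random(0)  # fixed seed so one combination is reproducible
images = ["class7_img%d" % i for i in range(26)]  # 26 images per AR class
train_set, test_set = split_class(images, 13, rng)
print(len(train_set), len(test_set))  # 13 13
```

Repeating this per class, with fresh draws, yields one training/testing combination of the kind tested here.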
In the testing with the Feret database, four images are randomly selected out of the seven as the training samples for each of the 200 classes, and the remaining three images are the testing samples. Figure 12 shows the error rates for each of the M numbers of nearest neighbors from 7 to 800
Fig. 8 The testing samples of a subject from the ORL database and their corresponding linear representations by different selected intra-class training images
Fig. 9 The computation time of the three selection criteria required to calculate the ORL testing images for different numbers of classes
Fig. 10 The error rates for randomly selected training samples for different M numbers (AR database)
Fig. 11 The computation time of the three selection criteria required to calculate the AR testing images for different numbers of classes
Fig. 12 The error rates for randomly selected training samples for different M numbers (Feret database)
in intervals of 28 (M = 7, 35, …, 800). Figure 13 shows the computation time for the three criteria in the first-phase calculations.
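The sweeps behind Figs. 10 and 12 amount to a small evaluation loop over M. In the sketch below, `classify` is a toy stand-in for a full TPTSR run, so the numbers are purely illustrative.

```python
def error_rate_sweep(eval_set, m_values, classify):
    """Return the fraction of misclassified samples for each M."""
    rates = {}
    for m in m_values:
        errors = sum(1 for sample, label in eval_set if classify(sample, m) != label)
        rates[m] = errors / len(eval_set)
    return rates

# Toy stand-in classifier that is wrong on one of the four samples.
samples = [([0], "A"), ([1], "A"), ([2], "B"), ([3], "B")]
classify = lambda x, m: "A" if x[0] < 2 or x[0] == 3 else "B"
rates = error_rate_sweep(samples, [7, 35], classify)
print(rates)  # {7: 0.25, 35: 0.25}
```

Plotting `rates` against `m_values` reproduces the shape of the error-rate curves reported for each database.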
It can be seen from Figs. 10 and 12 that the classifica-
tion rates of the TPTSR with the Euclidean distance and
City-block distance are also close to that of the linear
representation. However, from Figs. 11 and 13, we can see
that the linear representation criterion is much more
demanding in computation time than the Euclidean dis-
tance and the City-block distance. The computation load
for the linear representation criterion increases dramati-
cally with the size of the database and the number of
classes needed, since the matrix operations involved cannot
be simplified or optimized.
4 Conclusion
The TPTSR method increases the classification rate by
selecting M-nearest neighbors from the training space and
focusing on the local distribution. However, the linear
representation criterion for selecting the M-nearest neighbors in the first phase is computationally demanding, especially when the training set as well as the dimension of the samples is large. Since the first-phase processing in the TPTSR is a coarse search, sparse representation is not necessary at this stage. Therefore, we consider alternative
distance approximation metrics for the neighbor selection,
and the 10 most popular distance metrics are tested and
compared. The theoretical analysis shows that using the City-block distance approximation for the M-nearest-neighbor selection introduces negligible uncertainty into the classification rate of the TPTSR, while the experimental analysis shows that the computational efficiency can be significantly improved. In the testing, we can see that among the distance metrics
tested, the Euclidean distance and the City-block distance
are the most efficient in terms of both classification
performance and computational complexity. The experi-
mental results show that the TPTSR method with the
Euclidean distance and the City-block distance criteria can achieve almost the same classification performance as the original TPTSR, while the computational complexity can be greatly reduced. This study can also serve as a reference for face recognition on portable systems, since minimizing the computational load is significant, or even of utmost importance, for cutting the cost of embedded systems.
References
1. Kirby M, Sirovich L (1990) Application of the Karhunen-Loève procedure for the characterization of human faces. IEEE Trans Pattern Anal Mach Intell 12:103–108
2. Xu Y, Zhang D, Yang J, Yang J-Y (2008) An approach for
directly extracting features from matrix data and its application in
face recognition. Neurocomputing 71:1857–1865
3. Yang J, Zhang D, Frangi AF, Yang J-Y (2004) Two-dimensional
PCA: a new approach to appearance-based face representation
and recognition. IEEE Trans Pattern Anal Mach Intell 26:
131–137
4. Xu Y, Zhang D (2010) Represent and fuse bimodal biometric
images at the feature level: complex-matrix-based fusion scheme.
Opt Eng 49(3):037002
5. Park SW, Savvides M (2010) A multifactor extension of linear
discriminant analysis for face recognition under varying pose and
illumination. EURASIP J Adv Signal Process 2010:11
6. Fan Z, Xu Y, Zhang D (2011) Local linear discriminant analysis
framework using sample neighbors. IEEE Trans Neural Netw
22:1119–1132
Fig. 13 The computation time of the three selection criteria required to calculate the Feret testing images for different numbers of classes
7. Sugiyama M (2007) Dimensionality reduction of multimodal
labeled data by local Fisher discriminant analysis. J Mach Learn
Res 8:1027–1061
8. Vural V, Fung G, Krishnapuram B, Dy JG, Rao B (2009) Using
local dependencies within batches to improve large margin
classifiers. J Mach Learn Res 10:183–206
9. Liu ZY, Chiu KC, Xu L (2003) Improved system for object
detection and star/galaxy classification via local subspace anal-
ysis. Neural Netw 16(3–4):437–451
10. Yang Y, Xu D, Nie F, Yan S, Zhuang Y (2010) Image clustering
using local discriminant models and global integration. IEEE
Trans Image Process 19(10):2761–2773
11. Lai Z, Jin Z, Yang J, Wong WK (2010) Sparse local discriminant
projections for feature extraction. In: ICPR, pp 926–929
12. Wright J, Ma Y, Mairal J, Sapiro G, Huang TS, Yan S (2010) Sparse representation for computer vision and pattern recognition. Proc IEEE 98(6):1031–1044
13. Wright J, Yang AY, Ganesh A, Sastry SS, Ma Y (2009) Robust
face recognition via sparse representation. IEEE Trans Pattern
Anal Mach Intell 31:210–227
14. Mairal J, Bach F, Ponce J, Sapiro G, Zisserman A (2009)
Supervised dictionary learning. In: Advances in neural informa-
tion processing systems (NIPS)
15. Shi Y, Dai DQ, Liu CC, Yan H (2009) Sparse discriminant
analysis for breast cancer biomarker identification and classifi-
cation. Prog Nat Sci 19:1635–1641
16. Dikmen M, Huang T (2008) Robust estimation of foreground in
surveillance videos by sparse error estimation. In: International
conference on pattern recognition
17. Elhamifar E, Vidal R (2009) Sparse subspace clustering. In: IEEE
international conference on computer vision and pattern recog-
nition, pp. 2790–2797
18. Rao S, Tron R, Vidal R, Ma Y (2008) Motion segmentation via
robust subspace separation in the presence of outlying, incom-
plete, and corrupted trajectories. In: IEEE international confer-
ence on computer vision and pattern recognition, pp 1–8
19. Zhang L, Yang M, Feng X (2011) Sparse representation or
collaborative representation: which helps face recognition?
In: ICCV
20. Casasent D (1984) Unified synthetic discriminant function com-
putational formulation. Appl Opt 23:1620–1627
21. Li SZ (1998) Face recognition based on nearest linear combi-
nations. In: Proceedings of IEEE international conference on
computer vision and pattern recognition, pp 839–844
22. Li SZ, Lu J (1999) Face recognition using nearest feature line
method. IEEE Trans Neural Netw 10:439–443
23. Xu Y, Zhang D, Yang J, Yang J-Y (2011) A two-phase test sample sparse representation method for use with face recognition. IEEE Trans Circuits Syst Video Technol 21
24. Zhang L (2011) Sparse representation or collaborative represen-
tation: which helps face recognition? In: ICCV
25. Breu H, Gil J, Kirkpatrick D, Werman M (1995) Linear time
Euclidean distance transform algorithms. IEEE Trans Pattern
Anal Mach Intell 17:529–533
26. Krause EF (1987) Taxicab geometry. Dover, NY
27. Spearman C (1904) The proof and measurement of association
between two things. Am J Psychol 15:72–101
28. Minkowski H (1953) Geometrie der Zahlen. Chelsea, New York
29. Available: http://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html
30. Available: http://cobweb.ecn.purdue.edu/~aleix/aleix-face-DB.html
31. Phillips PJ, Moon H, Rizvi SA, Rauss PJ (2000) The FERET
evaluation methodology for face-recognition algorithms. IEEE
Trans Pattern Anal Mach Intell 22:1090–1104
32. Phillips PJ The facial recognition technology (FERET) database.
Available: http://www.itl.nist.gov/iad/humanid/feret/feret-master.
html
33. Tumanski S (2006) Principles of electrical measurement. Taylor
& Francis Group, New York
34. Xu Y, Jin Z (2008) Down-sampling face images and low-reso-
lution face recognition. In: 3rd International conference on
innovative computing, information and control, pp 392–395