Face Recognition Using Total Margin-Based Adaptive Fuzzy Support Vector Machines


IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 18, NO. 1, JANUARY 2007


    Yi-Hung Liu, Member, IEEE, and Yen-Ting Chen

Abstract: This paper presents a new classifier called total margin-based adaptive fuzzy support vector machines (TAF-SVM) that deals with several problems that may occur in support vector machines (SVMs) when applied to face recognition. The proposed TAF-SVM not only solves the overfitting problem resulting from outliers through fuzzification of the penalty, but also corrects the skew of the optimal separating hyperplane caused by very imbalanced data sets by using the different cost algorithm. In addition, by introducing the total margin algorithm to replace the conventional soft margin algorithm, a lower generalization error bound can be obtained. These three functions are embodied in the traditional SVM, and the resulting TAF-SVM is reformulated in both linear and nonlinear cases.

By using two databases, the Chung Yuan Christian University (CYCU) multiview and the facial recognition technology (FERET) face databases, and using the kernel Fisher's discriminant analysis (KFDA) algorithm to extract discriminating face features, experimental results show that the proposed TAF-SVM is superior to SVM in terms of face-recognition accuracy. The results also indicate that the proposed TAF-SVM achieves smaller error variances than SVM over a number of tests, so that better recognition stability is obtained.

Index Terms: Face recognition, kernel Fisher's discriminant analysis (KFDA), support vector machines (SVMs).

    I. INTRODUCTION

MANY computer vision-based systems have become more and more important and attractive in recent years, such as surveillance, automatic access control, and human-robot interaction. Face recognition plays a critical role in those applications. Due to the complicated pattern distribution arising from large variations in facial expressions, facial details, illumination conditions, and viewpoints, the face-recognition task has been considered one of the most difficult pattern-recognition research fields. Recently, various approaches have been proposed, e.g., [3], [5], [12], [15], [16], [22], [23], and [25]-[32]. From these systems, we can conclude that how to extract discriminating features from raw face images and how to accurately classify different people based on these input features are the two keys to the development of reliable and high-accuracy face-recognition systems. This paper aims to propose a new classifier called total margin-based adaptive fuzzy support vector machines (TAF-SVM), which can enhance the performance of support vector machines (SVM) for face recognition. In addition to classifier design, selecting a good feature extractor is also necessary.

Manuscript received July 1, 2005; revised March 1, 2006. This work was supported by the National Science Council of Taiwan, R.O.C., under Grant 93-2212-E-033-011.

The authors are with the Department of Mechanical Engineering, Chung Yuan Christian University, Chung-Li 32023, Taiwan, R.O.C. (e-mail: [email protected]).

Digital Object Identifier 10.1109/TNN.2006.883013


    A. Feature Selection

Principal component analysis (PCA) [12] and Fisher's linear discriminant analysis (FLDA) are widely used linear subspace analysis methods in facial feature extraction. Compared with PCA, FLDA is better able to extract discriminating features since its objective is to maximize the between-class scatter and minimize the within-class scatter. FLDA has been successfully applied to face recognition in [32] and shown to be superior to PCA. Due to their linear nature, however, the capabilities of linear subspace analysis methods are still limited. Motivated by the success of the kernel trick in SVMs [8], [13], Schölkopf et al. [24] proposed kernel PCA (KPCA) by combining PCA with the kernel trick. Since the kernel trick is capable of representing nonlinear relations of input data, KPCA is better than PCA in terms of representation and reconstruction. This has also been evidenced by Kim's work [25], in which KPCA combined with a linear SVM classifier was applied to face recognition.

Another nonlinear subspace analysis method, called generalized discriminant analysis (GDA) or kernel Fisher's discriminant analysis (KFDA), was proposed by Baudat et al. [9]. KFDA first nonlinearly maps input data into a higher dimensional feature space in which FLDA is performed. Recently, several works have shown that KFDA is much more effective than KPCA in face recognition [3], [22], [23]. This is due to the fact that KFDA keeps the nature of FLDA, which is based on the separability maximization criterion, while the unsupervised learning-based KPCA is designed only for pattern representation/reconstruction. Therefore, this paper adopts KFDA as the feature extractor so that the goals of extracting discriminating features and reducing the input dimensionality can both be reached.

    B. Classifier Design

Although KFDA has proven superior for extracting discriminating features, its performance can drop when it meets new inputs that were never considered in the training process, for example, a test face whose viewpoint does not face the camera while the training faces are frontal. Features extracted with KFDA are not invariant to these large changes because KFDA is essentially an appearance-based method. In [3], the authors suggested that a classifier more sophisticated than the nearest neighbor (NN) classifier was still needed, even with the KFDA algorithm employed for multiview face recognition, because the face-pattern distribution can remain nonseparable in the KFDA-based subspace. In other words, a classifier with good generalization ability and minimal empirical risk is necessary to make up for the drawback of the appearance-based feature extractor. On this basis, an SVM can serve as a good classifier candidate.



SVM was proposed by Vapnik et al. [13] and has been successfully applied to various applications such as the unsupervised segmentation of switching dynamics [46], face membership authentication [47], and image fusion [48]. Recently, several works related to face recognition have used SVMs as classifiers and yielded satisfactory results [25]-[31]. In those systems, the SVMs used are regular SVMs. However, some studies not directly related to face recognition have indicated that SVM suffers from several critical problems when applied to certain data types. The first problem is that SVM is very sensitive to outliers, since the penalty weight for every data point is the same [5]-[7]. Second, the class-boundary-skew problem is met when SVM is applied to learning from imbalanced data sets in which the negative data heavily outnumber the positive data [1], [11], [17], [33]. The class boundary, i.e., the optimal separating hyperplane (OSH) learned by SVM, can be skewed towards the positive class. In consequence, the false-negative rate can be very high, making SVM ineffective in identifying the targets that belong to the positive class; this is the class-boundary-skew problem. These two problems limit the performance of SVM. Unfortunately, they also occur in SVM-based face recognition.

In face recognition, for example, a face image with an exaggerated expression may produce an outlier. If the outlier possesses a nonzero slack variable, the soft margin algorithm used in the regular SVM will try to find a hyperplane that corrects this error, and the overfitting problem may follow. The other problem is that SVM was originally designed for binary classification, while face recognition is practically a multiclass classification problem. To extend the binary SVM to multiclass face recognition, most existing systems [25]-[31] used the one-against-all (OAA) method. As far as the computational effort is concerned, OAA may be more efficient than the one-against-one (OAO) strategy. The advantage of OAA over OAO is that we only have to construct one hyperplane for each of the $c$ classes instead of $c(c-1)/2$ pairwise decision functions. This decreases the computational effort roughly by a factor of $(c-1)/2$; in some cases it can be brought down even further [35]. This may be the reason that the authors of [25]-[31] used OAA in their systems, though it has been reported that OAO is better than OAA in terms of classification accuracy [2], [36], [45].

When the OAA method is used, one of the classes is the target (positive) class and the rest of the classes form the negative class for the learning of each OSH. The class-boundary-skew problem therefore occurs. Also, the larger the number of classes becomes, the more imbalanced the training set is when the OAA method is applied.
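To make these counts concrete, here is a minimal sketch (the function name and the toy numbers are illustrative, not from the paper) that computes the number of hyperplanes each strategy needs and the per-OSH training-set imbalance that OAA induces:

```python
def multiclass_svm_stats(c, faces_per_class):
    """Hyperplane counts for OAA/OAO and the per-OSH training-set
    imbalance that OAA induces, assuming c equally sized classes."""
    return {
        "OAA_hyperplanes": c,                       # one OSH per class
        "OAO_hyperplanes": c * (c - 1) // 2,        # one OSH per class pair
        "OAA_positives": faces_per_class,           # the target subject
        "OAA_negatives": (c - 1) * faces_per_class, # all remaining subjects
    }

# A CYCU-like setting: 30 subjects, 21 training faces each -> 29:1 imbalance.
print(multiclass_svm_stats(30, 21))
```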

To remedy these problems when SVM is applied to face recognition, this paper proposes a new classifier called TAF-SVM. TAF-SVM solves the overfitting problem by fuzzifying the training set, which is equivalent to fuzzifying the penalty term [7], [44]. In this manner, training data are no longer treated equally but are treated differently according to their relative importance. Besides, TAF-SVM also embodies the different cost algorithm [11], [33], by which TAF-SVM adapts itself to the imbalanced training set so that the false-negative rate is reduced and the recognition accuracy is enhanced.

Another contribution of this paper is that we replace the soft margin algorithm with the total margin algorithm [4] in TAF-SVM. The total margin algorithm not only considers the errors but also involves the information of correctly classified data points in the construction of the OSH. Compared with the conventional soft margin algorithm used in the regular SVM, a lower generalization error bound can be reached. This facilitates face recognition, since generalization ability plays a very important role in predicting unseen face images. We combine these approaches in TAF-SVM and show that the face-recognition accuracy is significantly improved compared with applying any single approach alone, including the regular SVM.

This paper is organized as follows. Section II presents the KFDA-based feature extraction method. A brief review of SVM is given in Section III; then the problems of applying SVM to face recognition are pointed out in detail, together with the solutions embodied in the TAF-SVM. In Section IV, we reformulate the TAF-SVM in both linear and nonlinear cases. Experimental results are presented and discussed in Section V. Conclusions are drawn in Section VI.

II. FEATURE EXTRACTION VIA KFDA

A face image is first scanned row by row to form a vector $\mathbf{x} \in \mathbb{R}^n$. The training set contains $N$ images out of $c$ subjects, namely $X = \{X_1, X_2, \ldots, X_c\}$ and $N = \sum_{i=1}^{c} n_i$, where $X_i$ is the set of class $i$ and $n_i$ is the cardinality of $X_i$. For KFDA, the within-class scatter $S_w^{\Phi}$ and between-class scatter $S_b^{\Phi}$ in the feature space $F$ are given by

$$S_w^{\Phi} = \sum_{i=1}^{c} \sum_{j=1}^{n_i} \bigl(\Phi(\mathbf{x}_j^i) - \mathbf{m}_i^{\Phi}\bigr)\bigl(\Phi(\mathbf{x}_j^i) - \mathbf{m}_i^{\Phi}\bigr)^{T} \quad (1)$$

$$S_b^{\Phi} = \sum_{i=1}^{c} n_i \bigl(\mathbf{m}_i^{\Phi} - \mathbf{m}^{\Phi}\bigr)\bigl(\mathbf{m}_i^{\Phi} - \mathbf{m}^{\Phi}\bigr)^{T} \quad (2)$$

where $\Phi$ is a nonlinear mapping function that maps the data from the input space to a higher dimensional feature space, $\Phi: \mathbb{R}^n \rightarrow F$, $\mathbf{x}_j^i$ denotes the $j$th face image in the $i$th class, and $\mathbf{m}_i^{\Phi}$ and $\mathbf{m}^{\Phi}$ are the class mean and the total mean of the mapped data, respectively. The mapped data are centered in $F$ [9], [24].

KFDA seeks to find a set of discriminating orthonormal eigenvectors $\mathbf{w}$ for the projection of an input face image by performing FLDA in $F$, in which the between-class scatter is maximized and the within-class scatter is minimized. This is equivalent to solving the following maximization problem:

$$\max_{\mathbf{w}} \; \frac{\mathbf{w}^{T} S_b^{\Phi} \mathbf{w}}{\mathbf{w}^{T} S_w^{\Phi} \mathbf{w}}. \quad (3)$$

Solutions $\mathbf{w}_k$ associated with the largest nonzero eigenvalues must lie in the span of all mapped data; so, for $\mathbf{w}_k$, there exists a normalized expansion coefficient vector $\boldsymbol{\alpha}^k = [\alpha_1^k, \ldots, \alpha_N^k]^T$ such that

$$\mathbf{w}_k = \sum_{p=1}^{N} \alpha_p^k \, \Phi(\mathbf{x}_p). \quad (4)$$


Thus, for a testing face image $\mathbf{x}$, its projection on the $k$th eigenvector is computed by

$$y_k = \mathbf{w}_k \cdot \Phi(\mathbf{x}). \quad (5)$$

We do not need to know the nonlinear mapping $\Phi$ exactly. By using the kernel trick, the projection $y_k$ can be easily obtained by

$$y_k = \sum_{p=1}^{N} \alpha_p^k \, k(\mathbf{x}_p, \mathbf{x}) \quad (6)$$

where the kernel function is defined as the dot product of mapped vectors

$$k(\mathbf{x}, \mathbf{y}) = \Phi(\mathbf{x}) \cdot \Phi(\mathbf{y}). \quad (7)$$

The radial basis function (RBF) kernel is used in this paper and is expressed as

$$k(\mathbf{x}, \mathbf{y}) = \exp\!\left(-\frac{\|\mathbf{x} - \mathbf{y}\|^2}{2\sigma^2}\right) \quad (8)$$

where the width $\sigma$ is specified a priori by the user.

To project a face image into the new coordinates, the eigenvectors associated with the first $q$ largest nonzero eigenvalues are selected to construct the transformation matrix $W = [\mathbf{w}_1, \ldots, \mathbf{w}_q]$, such that the dimensionality of a face image is reduced from $n$ to $q$. To simplify the notation in the following, we let the number of projection vectors $q$ be equal to $c - 1$.
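A minimal sketch of the kernel-trick projection in (6)-(8): it assumes the expansion coefficients (here called alpha, one column per eigenvector) have already been obtained by solving the KFDA eigenproblem, which is not shown; all names are illustrative.

```python
import numpy as np

def rbf_kernel(x, y, sigma):
    """RBF kernel of (8): exp(-||x - y||^2 / (2 * sigma^2))."""
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2))

def kfda_project(x, train_faces, alpha, sigma):
    """Projection of (6): y_k = sum_p alpha[p, k] * k(x_p, x).

    train_faces: (N, n) raw training face vectors
    alpha:       (N, q) expansion coefficients, one column per eigenvector
    """
    k_vec = np.array([rbf_kernel(xp, x, sigma) for xp in train_faces])
    return alpha.T @ k_vec   # q-dimensional feature vector

# Toy usage: 4 "faces" of dimension 6 projected onto q = 2 eigenvectors.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 6))
A = rng.normal(size=(4, 2))   # placeholder coefficients, not real KFDA output
print(kfda_project(X[0], X, A, sigma=1.0))
```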

III. BASIC IDEAS OF TAF-SVM

    A. Basic Review of SVM

In SVM, the training set is given as $\{(\mathbf{x}_i, y_i)\}_{i=1}^{l}$, where $\mathbf{x}_i$ is the training data and $y_i$ is its class label, being either $+1$ or $-1$. Let $\mathbf{w}$ and $b$ be the weight vector and the bias of the separating hyperplane. The objective of SVM is to find the OSH by maximizing the margin of separation and minimizing the training errors:

$$\text{Minimize} \quad \frac{1}{2}\|\mathbf{w}\|^2 + C \sum_{i=1}^{l} \xi_i \quad (9)$$

$$\text{Subject to} \quad y_i\bigl(\mathbf{w} \cdot \Phi(\mathbf{x}_i) + b\bigr) \geq 1 - \xi_i, \quad i = 1, \ldots, l \quad (10a)$$

$$\xi_i \geq 0, \quad i = 1, \ldots, l \quad (10b)$$

where $\Phi$ is the nonlinear mapping function which maps the data from the input space into a higher dimensional feature space. The $\xi_i$ are slack variables representing the error measures of the data points. The penalty weight $C$ is a free parameter; it measures the size of the penalties assigned to the errors. Minimizing the first term in (9) is equivalent to maximizing the margin of separation, which is related to minimizing the Vapnik-Chervonenkis (VC) dimension. The formulation of the objective function in (9) is in perfect accord with the structural risk minimization (SRM) principle, by which good generalization ability can be achieved [8].

By introducing the Lagrangian, the primal constrained optimization problem can be solved through its dual form. The predicted class of an unseen data point $\mathbf{x}$ is the output of the decision function

$$f(\mathbf{x}) = \operatorname{sign}\left(\sum_{i=1}^{N_s} \alpha_i y_i \, k(\mathbf{s}_i, \mathbf{x}) + b\right) \quad (11)$$

where the $\alpha_i$ are the nonnegative Lagrange multipliers for the inequality constraints (10a) of the primal problem, the $\mathbf{s}_i$ are the support vectors, for which $\alpha_i > 0$, and $N_s$ is the number of support vectors. The optimal value of $b$ is calculated with the Kuhn-Tucker (KT) complementary conditions.
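For orientation, the standard soft-margin SVM of (9)-(11) with the RBF kernel can be reproduced with an off-the-shelf solver. This minimal sketch uses scikit-learn on synthetic stand-in features (not face data) and includes none of the TAF-SVM extensions introduced below.

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic stand-in for KFDA feature vectors with labels in {-1, +1}.
X, y = make_classification(n_samples=200, n_features=29, random_state=0)
y = 2 * y - 1

# C is the penalty weight of (9); gamma plays the role of 1/(2*sigma^2) in (8).
clf = SVC(kernel="rbf", C=10.0, gamma=0.05).fit(X, y)
print("support vectors per class:", clf.n_support_)
print("training accuracy:", clf.score(X, y))
```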

    B. Basic Ideas of TAF-SVM

1) Dealing With the Overfitting Problem via Fuzzification of the Training Set: One issue in using SVM for face recognition is how to tackle the overfitting problem, since the large variations resulting from facial expressions and viewpoints may produce outliers in the pattern distribution. As shown in previous research [5], [6], SVM is very sensitive to outliers or noise, since the penalty term of SVM treats every data point equally in the training process. This may result in overfitting if one or a few data points have relatively very large values of $\xi_i$. Wang et al. and Huang et al. proposed the fuzzy SVM (FSVM) to deal with the overfitting problem [7], [44], based on the idea that a membership value is assigned to each data point according to its relative importance in its class, so that a less important data point is punished less. To achieve this, the penalty term is redefined in FSVM as the fuzzy penalty $C\sum_i s_i \xi_i$, where $s_i$ is the membership value denoting the relative importance of point $\mathbf{x}_i$ to its own class.

We incorporate the concept of FSVM into the proposed TAF-SVM. The training set is first divided into two sets, the fuzzy positive training set $S^+$ and the fuzzy negative training set $S^-$, denoted by

$$S^+ = \bigl\{(\mathbf{x}_i^+, +1, s_i^+)\bigr\}_{i=1}^{l^+} \quad (12a)$$

$$S^- = \bigl\{(\mathbf{x}_j^-, -1, s_j^-)\bigr\}_{j=1}^{l^-} \quad (12b)$$

where the membership values $\sigma \leq s_i^+ \leq 1$ and $\sigma \leq s_j^- \leq 1$ stand for the relative importance of the points $\mathbf{x}_i^+$ and $\mathbf{x}_j^-$ to the positive class and negative class, respectively. The variable $\sigma$ is a small positive real number. $l^+$ and $l^-$ are the cardinalities of the fuzzy positive training set and the fuzzy negative training set, respectively, and $l = l^+ + l^-$.

2) Adaptation to Imbalanced Face Training Sets via the Different Cost Algorithm: Face recognition is practically a multiclass classification task, while SVM was designed for binary classification. The OAO and OAA methods are two popular ways to realize SVM-based multiclass classification [2]. Based on the pairwise learning framework, the OAO method needs to construct $c(c-1)/2$ OSHs and uses the voting strategy to make final decisions if there are $c$ subjects to be recognized. Compared with the OAO method, the OAA method, by which only $c$ OSHs need to be learned, is more effective in terms of computational effort. Therefore, most existing SVM-based face-recognition systems chose the OAA method to accomplish the multiclass classification task [25]-[31]. However, a critical problem follows, the class-boundary-skew phenomenon, which had never been pointed out in these SVM and OAA method-based face-recognition systems.


When the OAA method is used to learn each OSH for multiclass face recognition, one of the subjects forms the positive class and the rest form the negative class. In this manner, the training faces of the negative class significantly outnumber the training faces of the positive class: the ratio of the size of the negative class to the size of the positive class is $(c-1):1$. A very imbalanced face training set is produced, and the larger the number of subjects, the heavier the imbalance of the face training set.

It has recently been reported that the success of SVM is limited when applied to imbalanced data sets [1], [11], [17], [33], because the OSH is skewed towards the positive class, resulting in the class-boundary-skew phenomenon. To solve this critical problem, several remedies have been proposed, including the oversampling and undersampling techniques [18], combining oversampling with undersampling [19], the synthetic minority oversampling technique (SMOTE) [20], different error cost algorithms [1], [33], the class-boundary-alignment algorithm [17], and SMOTE with the different cost algorithm (SDC) [11].

These methods can be divided into three categories. The methods proposed in [18]-[20] process the data before feeding them into the classifier: the oversampling technique duplicates the positive data by interpolation, while the undersampling technique removes redundant negative data to reduce the imbalance ratio. They are classifier-independent approaches. The second category is the algorithm-based approach [1], [17], [33]. For example, Veropoulos et al. [1] and Lin et al. [33] proposed different cost algorithms suggesting that, by assigning a heavier penalty to the smaller class, the skew of the OSH can be corrected. The third category is the SDC method, which combines SMOTE and the different error cost algorithm [11].

For face recognition, since each training data point carries particular face information, we choose not to use any presampling techniques. Instead, the proposed TAF-SVM adopts the different cost algorithm to adapt to the imbalanced face training sets arising in OAA-based multiclass classification. Another reason for using this algorithm is that it was originally designed to solve the skew problem of SVM. By combining the fuzzy penalty and the different cost algorithm, the proposed fuzzified biased penalties are expressed as

$$C^+ \sum_{i=1}^{l^+} s_i^+ \xi_i^+ + C^- \sum_{j=1}^{l^-} s_j^- \xi_j^- \quad (13)$$

where $C^+$ and $C^-$ are the penalty weights for the errors of the positive class and negative class, respectively, and the slack variables $\xi_i^+$ and $\xi_j^-$ measure the errors of the data belonging to the positive class and the negative class, respectively. By setting $C^+ > C^-$, to meet the central concept of the different cost algorithm, the OSH is pushed farther away from the smaller class.
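An analogous effect can be obtained with off-the-shelf tools: scikit-learn's SVC multiplies C per sample by a class weight and an optional per-sample weight, which mirrors the spirit of the fuzzified biased penalty in (13) (effective penalty C * class_weight[y_i] * s_i), though it is not the TAF-SVM formulation itself. A hedged sketch, with illustrative numbers:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Imbalanced toy set: roughly 90% of samples in the negative class.
X, y = make_classification(n_samples=300, weights=[0.9], random_state=1)
y = 2 * y - 1   # class -1 is the large class here

# Heavier penalty for the smaller positive class (different cost idea);
# per-sample weights in [0.4, 1] play the role of the fuzzy values s_i.
class_weight = {+1: 29.0, -1: 1.0}   # e.g., a 29:1 OAA imbalance
memberships = np.random.default_rng(1).uniform(0.4, 1.0, size=len(y))

clf = SVC(kernel="rbf", C=10.0, gamma=0.05, class_weight=class_weight)
clf.fit(X, y, sample_weight=memberships)
print("training accuracy:", clf.score(X, y))
```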

3) Improvement of the Generalization Error Bound via the Total Margin Algorithm: Since it is impossible to take all face information into consideration, i.e., the available face training samples are always finite and not numerous, the generalization ability of a classifier dominates the prediction accuracy for unseen faces. The soft margin algorithm used in SVM relaxes the measure of the margin by introducing slack variables for the errors. An OSH is found with the maximal margin of separation by maximizing the minimum distance between a few extreme values (the support vectors) and the separating hyperplane. However, using only a few extreme training data points causes a loss of information, because most of the information is contained in the nonextreme data, which form the majority of the training set. Feng et al. proposed the scaled SVM [21], which employs not only the support vectors but also the means of the classes to reduce the generalization error of SVM. However, the face-pattern distribution is generally non-Gaussian and highly nonconvex [3], [22]; that is, the mean of a class may not be very representative. Another approach for improving the generalization error bound, called the total margin algorithm, has been proposed by Yoon et al. [4].

The total margin algorithm extends the soft margin algorithm by introducing extra surplus variables $\rho_i$ for the correctly classified data points. The surplus variable measures the distance between a correctly classified data point and the hyperplane $\mathbf{w} \cdot \mathbf{x} + b = +1$ or $\mathbf{w} \cdot \mathbf{x} + b = -1$, depending on whether the data point belongs to the positive or the negative class. In addition to minimizing the sum of the slack variables (the misclassified data points) while maximizing the margin of separation, as the soft margin algorithm does, the total margin algorithm suggests that the sum of the surplus variables (for the correctly classified data points) should also be maximized simultaneously. Maximizing the sum of the surplus variables is equivalent to maximizing $\sum_i \rho_i$, which in turn is equivalent to minimizing $-\sum_i \rho_i$. Therefore, the total margin-based SVM is formulated as the constrained optimization problem

$$\text{Minimize} \quad \frac{1}{2}\|\mathbf{w}\|^2 + C \sum_{i=1}^{l} \xi_i - D \sum_{i=1}^{l} \rho_i$$

$$\text{Subject to} \quad y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \geq 1 - \xi_i + \rho_i, \quad \xi_i \geq 0, \quad \rho_i \geq 0, \quad i = 1, \ldots, l \quad (14)$$

where $C$ is the weight for the misclassified data points (the slack variables $\xi_i$) and $D$ is the weight for the correctly classified data points, i.e., the surplus variables $\rho_i$.
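Because (14) is still a convex quadratic program, the linear total margin machine can be prototyped directly with a generic QP solver. The sketch below is one possible reading of (14), not the authors' implementation: it stacks the variables as z = [w, b, xi, rho], assumes D < C so the objective stays bounded, and uses cvxopt.

```python
import numpy as np
from cvxopt import matrix, solvers

def total_margin_svm(X, y, C=10.0, D=0.1):
    """Linear total-margin SVM of (14):
    min 0.5*||w||^2 + C*sum(xi) - D*sum(rho)
    s.t. y_i(w.x_i + b) >= 1 - xi_i + rho_i, xi_i >= 0, rho_i >= 0."""
    l, d = X.shape
    n = d + 1 + 2 * l                        # z = [w, b, xi, rho]
    P = np.zeros((n, n)); P[:d, :d] = np.eye(d)
    q = np.hstack([np.zeros(d + 1), C * np.ones(l), -D * np.ones(l)])
    # margin constraints: -y_i(w.x_i + b) - xi_i + rho_i <= -1
    G1 = np.hstack([-y[:, None] * X, -y[:, None], -np.eye(l), np.eye(l)])
    G2 = np.hstack([np.zeros((l, d + 1)), -np.eye(l), np.zeros((l, l))])  # -xi <= 0
    G3 = np.hstack([np.zeros((l, d + 1)), np.zeros((l, l)), -np.eye(l)])  # -rho <= 0
    G, h = np.vstack([G1, G2, G3]), np.hstack([-np.ones(l), np.zeros(2 * l)])
    solvers.options["show_progress"] = False
    z = np.array(solvers.qp(matrix(P), matrix(q), matrix(G), matrix(h))["x"]).ravel()
    return z[:d], z[d]                       # w, b

# Toy usage: two Gaussian clouds.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (20, 2)), rng.normal(2, 1, (20, 2))])
y = np.hstack([-np.ones(20), np.ones(20)])
w, b = total_margin_svm(X, y)
print("training accuracy:", np.mean(np.sign(X @ w + b) == y))
```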

From (14), we can see that the construction of the OSH is no longer controlled only by a few extreme data points, most of which may be misclassified, but also by the correctly classified data points, which form the majority of the training set. The advantages are clear. First, the essence of the soft margin-based SVM is to rely only on the set of data points that take extreme values, the so-called support vectors. From the statistics of extreme values, we know that the disadvantage of such an approach is that the information contained in most samples (the nonextreme values) is lost, so such an approach is bound to be less efficient than one that takes the lost information into account [21], i.e., the correctly classified data points. The total margin algorithm can therefore be more efficient and gain better generalization ability than the soft margin algorithm, since the information of all samples is considered in the construction of the OSH. Second, from the objective in (14), minimizing $-\sum_i \rho_i$ implies that the obtained OSH gains more correctly classified data points, because minimizing $-\sum_i \rho_i$ is equivalent to maximizing the sum of the surplus variables. Therefore, in this paper, we adopt the total margin algorithm as one of the bases in the development of TAF-SVM for face recognition.


    Fig. 1. Geometric interpretation of slack variables and surplus variables used in TAF-SVM.


In order to facilitate the reformulation of TAF-SVM in Section IV, the use of surplus variables in combination with the imbalanced penalties is illustrated here. Since TAF-SVM considers both the different cost algorithm and the total margin algorithm, the geometric relationship between the positive/negative slack variables and the positive/negative surplus variables is illustrated in Fig. 1.

In Fig. 1, the white circles and the black circles denote the data points belonging to the positive class and the negative class, respectively. The slack variable $\xi_i^+$ measures the distance between the hyperplane $\mathbf{w} \cdot \mathbf{x} + b = +1$ and a misclassified data point that is supposed to be classified as positive. Conversely, $\xi_j^-$ is the distance from a misclassified negative data point to the hyperplane $\mathbf{w} \cdot \mathbf{x} + b = -1$. The surplus variable $\rho_i^+$ measures the distance between a correctly classified positive data point and the hyperplane $\mathbf{w} \cdot \mathbf{x} + b = +1$, and $\rho_j^-$ measures the distance between a correctly classified negative data point and the hyperplane $\mathbf{w} \cdot \mathbf{x} + b = -1$. All these variables are nonnegative, and at least one of $\xi$ and $\rho$ is zero for any data point. Furthermore, consider the positive training data points, any of which can have one of two classification outcomes: misclassified or correctly classified. Table I summarizes the relationship between the slack variable and the surplus variable according to the classification outcome of a positive point. Notice that the point considered in Table I can be any of the positive training data points, while the point shown in Fig. 1 is just one misclassified positive training data point.

IV. REFORMULATION OF TAF-SVM

In this section, we reformulate the proposed TAF-SVM for the linearly nonseparable and nonlinearly nonseparable cases based on the aforementioned ideas.

TABLE I: INTERPRETATION OF THE RELATIONSHIPS BETWEEN SLACK VARIABLES, SURPLUS VARIABLES, AND THE POINT LOCATIONS, TAKING THE POSITIVE TRAINING DATA POINTS AS EXAMPLE

    A. Linearly Nonseparable Case

The primal problem for the linearly nonseparable case is reformulated as follows:

$$\text{Minimize} \quad \frac{1}{2}\|\mathbf{w}\|^2 + C^+ \sum_{i=1}^{l^+} s_i^+ \xi_i^+ + C^- \sum_{j=1}^{l^-} s_j^- \xi_j^- - D^+ \sum_{i=1}^{l^+} s_i^+ \rho_i^+ - D^- \sum_{j=1}^{l^-} s_j^- \rho_j^- \quad (15)$$

Subject to

$$\mathbf{w} \cdot \mathbf{x}_i^+ + b \geq 1 - \xi_i^+ + \rho_i^+, \quad i = 1, \ldots, l^+ \quad (16a)$$

$$-(\mathbf{w} \cdot \mathbf{x}_j^- + b) \geq 1 - \xi_j^- + \rho_j^-, \quad j = 1, \ldots, l^- \quad (16b)$$

$$\xi_i^+ \geq 0 \quad (16c)$$

$$\xi_j^- \geq 0 \quad (16d)$$

$$\rho_i^+ \geq 0 \quad (16e)$$

$$\rho_j^- \geq 0 \quad (16f)$$

where $C^+$ and $C^-$ are the weights for the positive and negative slack variables, respectively, and $D^+$ and $D^-$ are the weights for the positive and negative surplus variables, respectively. It is difficult to solve this constrained optimization problem directly. Similar to SVM, the primal optimization problem of TAF-SVM is


transformed to the dual form by introducing a set of nonnegative Lagrange multipliers $\alpha_i$, $\tilde{\alpha}_j$, $\beta_i$, $\tilde{\beta}_j$, $\gamma_i$, and $\tilde{\gamma}_j$ for the constraints (16a)-(16f), yielding the Lagrangian

$$\begin{aligned} L = {} & \frac{1}{2}\|\mathbf{w}\|^2 + C^+ \sum_i s_i^+ \xi_i^+ + C^- \sum_j s_j^- \xi_j^- - D^+ \sum_i s_i^+ \rho_i^+ - D^- \sum_j s_j^- \rho_j^- \\ & - \sum_i \alpha_i \bigl[(\mathbf{w} \cdot \mathbf{x}_i^+ + b) - 1 + \xi_i^+ - \rho_i^+\bigr] - \sum_j \tilde{\alpha}_j \bigl[-(\mathbf{w} \cdot \mathbf{x}_j^- + b) - 1 + \xi_j^- - \rho_j^-\bigr] \\ & - \sum_i \beta_i \xi_i^+ - \sum_j \tilde{\beta}_j \xi_j^- - \sum_i \gamma_i \rho_i^+ - \sum_j \tilde{\gamma}_j \rho_j^- \end{aligned} \quad (17)$$

Differentiation with respect to $\mathbf{w}$, $b$, $\xi_i^+$, $\xi_j^-$, $\rho_i^+$, and $\rho_j^-$ yields

$$\mathbf{w} = \sum_i \alpha_i \mathbf{x}_i^+ - \sum_j \tilde{\alpha}_j \mathbf{x}_j^- \quad (18a)$$

$$\sum_i \alpha_i - \sum_j \tilde{\alpha}_j = 0 \quad (18b)$$

$$C^+ s_i^+ - \alpha_i - \beta_i = 0 \quad (18c)$$

$$C^- s_j^- - \tilde{\alpha}_j - \tilde{\beta}_j = 0 \quad (18d)$$

$$\alpha_i - D^+ s_i^+ - \gamma_i = 0 \quad (18e)$$

$$\tilde{\alpha}_j - D^- s_j^- - \tilde{\gamma}_j = 0 \quad (18f)$$

By resubstituting these equations into the primal problem, the dual problem is obtained:

$$\text{Maximize} \quad \sum_i \alpha_i + \sum_j \tilde{\alpha}_j - \frac{1}{2}\Bigl\| \sum_i \alpha_i \mathbf{x}_i^+ - \sum_j \tilde{\alpha}_j \mathbf{x}_j^- \Bigr\|^2 \quad (19)$$

$$\text{Subject to} \quad \sum_i \alpha_i - \sum_j \tilde{\alpha}_j = 0 \quad (20a)$$

$$D^+ s_i^+ \leq \alpha_i \leq C^+ s_i^+, \quad i = 1, \ldots, l^+ \quad (20b)$$

$$D^- s_j^- \leq \tilde{\alpha}_j \leq C^- s_j^-, \quad j = 1, \ldots, l^- \quad (20c)$$

    B. Nonlinearly Nonseparable Case

The dual form for the nonlinearly nonseparable case can be obtained by using the kernel function $k(\mathbf{x}, \mathbf{y}) = \Phi(\mathbf{x}) \cdot \Phi(\mathbf{y})$, where $\Phi$ is a nonlinear map. The objective is as follows:

$$\text{Maximize} \quad \sum_i \alpha_i + \sum_j \tilde{\alpha}_j - \frac{1}{2}\Bigl[\sum_{i,i'} \alpha_i \alpha_{i'} k(\mathbf{x}_i^+, \mathbf{x}_{i'}^+) - 2\sum_{i,j} \alpha_i \tilde{\alpha}_j k(\mathbf{x}_i^+, \mathbf{x}_j^-) + \sum_{j,j'} \tilde{\alpha}_j \tilde{\alpha}_{j'} k(\mathbf{x}_j^-, \mathbf{x}_{j'}^-)\Bigr] \quad (21)$$

The constraints for this maximization problem are the same as those in the dual form of the linear case, (20a)-(20c). The KT complementary conditions play a key role in the optimality. The KT complementary conditions for the nonlinear TAF-SVM are given by

$$\alpha_i \bigl[(\mathbf{w} \cdot \Phi(\mathbf{x}_i^+) + b) - 1 + \xi_i^+ - \rho_i^+\bigr] = 0 \quad (22a)$$

$$\tilde{\alpha}_j \bigl[-(\mathbf{w} \cdot \Phi(\mathbf{x}_j^-) + b) - 1 + \xi_j^- - \rho_j^-\bigr] = 0 \quad (22b)$$

$$(C^+ s_i^+ - \alpha_i)\,\xi_i^+ = 0 \quad (22c)$$

$$(C^- s_j^- - \tilde{\alpha}_j)\,\xi_j^- = 0 \quad (22d)$$

$$(\alpha_i - D^+ s_i^+)\,\rho_i^+ = 0 \quad (22e)$$

$$(\tilde{\alpha}_j - D^- s_j^-)\,\rho_j^- = 0 \quad (22f)$$

The optimal value of $b$ can be calculated with any data point in the training set satisfying the KT complementary conditions. However, from the numerical perspective, it is better to take the mean value of $b$ resulting from all such data points [14]. Therefore, the optimal value of $b$ is computed by

$$b = \frac{1}{|\tilde{S}^+| + |\tilde{S}^-|} \left[\sum_{\mathbf{x} \in \tilde{S}^+} \bigl(1 - \mathbf{w} \cdot \Phi(\mathbf{x})\bigr) + \sum_{\mathbf{x} \in \tilde{S}^-} \bigl(-1 - \mathbf{w} \cdot \Phi(\mathbf{x})\bigr)\right] \quad (23)$$

where $\mathbf{w} \cdot \Phi(\mathbf{x})$ is evaluated through the kernel expansion of (18a), and $\tilde{S}^+$ and $\tilde{S}^-$ are the subsets of $S^+$ and $S^-$, respectively,

$$\tilde{S}^+ = \bigl\{\mathbf{x}_i^+ : D^+ s_i^+ < \alpha_i < C^+ s_i^+\bigr\} \quad (24a)$$

$$\tilde{S}^- = \bigl\{\mathbf{x}_j^- : D^- s_j^- < \tilde{\alpha}_j < C^- s_j^-\bigr\} \quad (24b)$$

For an unseen data point $\mathbf{x}$, its predicted class is the output of the decision function

$$f(\mathbf{x}) = \operatorname{sign}\left(\sum_i \alpha_i k(\mathbf{x}_i^+, \mathbf{x}) - \sum_j \tilde{\alpha}_j k(\mathbf{x}_j^-, \mathbf{x}) + b\right) \quad (25)$$
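Given dual multipliers from any QP solver, prediction reduces to kernel evaluations against the training data. A minimal sketch of (23) and (25) under the sign conventions above; the function and argument names (alpha_p, alpha_n, and so on) are illustrative.

```python
import numpy as np

def w_phi(x, Xp, Xn, alpha_p, alpha_n, kernel):
    """Kernel expansion of w.phi(x) from (18a)."""
    return (sum(a * kernel(xp, x) for a, xp in zip(alpha_p, Xp))
            - sum(a * kernel(xn, x) for a, xn in zip(alpha_n, Xn)))

def bias(margin_pos, margin_neg, Xp, Xn, alpha_p, alpha_n, kernel):
    """Bias of (23): average over margin points of both classes."""
    vals = [1.0 - w_phi(x, Xp, Xn, alpha_p, alpha_n, kernel) for x in margin_pos]
    vals += [-1.0 - w_phi(x, Xp, Xn, alpha_p, alpha_n, kernel) for x in margin_neg]
    return float(np.mean(vals))

def predict(x, Xp, Xn, alpha_p, alpha_n, b, kernel):
    """Decision function of (25)."""
    return int(np.sign(w_phi(x, Xp, Xn, alpha_p, alpha_n, kernel) + b))

# Toy usage with an RBF kernel and made-up multipliers.
rbf = lambda u, v: np.exp(-np.sum((u - v) ** 2) / 2.0)
Xp, Xn = np.array([[1.0, 1.0]]), np.array([[-1.0, -1.0]])
ap, an = np.array([0.5]), np.array([0.5])
b = bias(Xp, Xn, Xp, Xn, ap, an, rbf)
print(predict(np.array([0.8, 1.2]), Xp, Xn, ap, an, b, rbf))   # -> 1
```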

According to the formulation of the TAF-SVM, three main properties are discussed and summarized as follows.

1) From an inspection of the constraints in the dual form, we can see that the Lagrange multipliers ($\alpha_i$ and $\tilde{\alpha}_j$) are bounded above (by $C^+ s_i^+$ and $C^- s_j^-$) and below (by $D^+ s_i^+$ and $D^- s_j^-$). Therefore, according to (20b) and (20c), all training data are support vectors for TAF-SVM, since every data point has a nonzero multiplier; this reflects the role of the total margin algorithm. On the contrary, in the soft margin-based SVM, the OSH is constructed only from the few data points whose multipliers satisfy $\alpha_i > 0$.

2) In SVM, the $\alpha_i$ are bounded by the range $0 \leq \alpha_i \leq C$. For all data points, the feasible regions are fixed once $C$ is chosen. In TAF-SVM, the feasible region is dynamic, since the upper and lower bounds differ for every data point: the bounds of the feasible region are functions of the assigned membership values.


This means that a less important data point has a narrower feasible region.

Another question is how to fuzzify the training set efficiently. Basically, the rule for assigning proper membership values can depend on the relative importance of the data points to their own classes. Therefore, for a positive data point, its membership value can be calculated with the membership function

$$s_i^+ = \begin{cases} 1 - \dfrac{d(\mathbf{x}_i^+, \mathbf{m}^+)}{\max_p d(\mathbf{x}_p^+, \mathbf{m}^+) + \delta}, & \text{if } 1 - \dfrac{d(\mathbf{x}_i^+, \mathbf{m}^+)}{\max_p d(\mathbf{x}_p^+, \mathbf{m}^+) + \delta} \geq \sigma \\ \sigma, & \text{otherwise} \end{cases} \quad (26)$$

where $d(\cdot,\cdot)$ denotes the Euclidean distance, $\delta$ is a small positive constant that keeps the denominator positive, the lower bound $\sigma$ is a nonnegative small real number and is user-defined, and $\mathbf{m}^+$ is the mean of all data points in $S^+$. The membership values of all the fuzzified positive training data are thus bounded in $[\sigma, 1]$. The same procedure is used for the fuzzification of the negative data, with the mean calculated from all the negative data. (A small code sketch of this rule follows the list.)

3) In SVM, only one free parameter $C$ has to be adjusted. A more complex procedure may be expected for TAF-SVM, since there are four free parameters to adjust: $C^+$, $C^-$, $D^+$, and $D^-$. However, the adjustment process can be simplified using some relationships. First, the inequality constraints in (20b) and (20c) require that the two inequalities $C^+ \geq D^+$ and $C^- \geq D^-$ hold. Second, based on the concept of adaptation to imbalanced data sets, the relationships $C^+ > C^-$ and $D^+ > D^-$ are required if the size of the positive class is smaller than that of the negative class. Two ratios are defined to simplify the adjustment of these parameters:

$$r_1 = \frac{C^+}{C^-} = \frac{D^+}{D^-} \quad (27)$$

$$r_2 = \frac{C^+}{D^+} = \frac{C^-}{D^-} \quad (28)$$

By fixing any one of the four parameters $C^+$, $C^-$, $D^+$, and $D^-$, and setting the values of $r_1$ and $r_2$, the other three parameters can be obtained directly (see the sketch after this list). In the case of $r_1 = 1$, no adaptation effort is made for the imbalanced case. Furthermore, TAF-SVM becomes the standard SVM if $r_1$ equals 1 and $r_2$ goes to infinity ($C^+ = C^-$, $D^+ = D^- = 0$, with $C^+$ finite) and the membership values are all set to 1. Note that a very small positive number is added to the denominator to avoid division by zero when $D^+ = 0$. Also, when $r_1 = 1$, $r_2 = \infty$, and $\sigma \leq s_i < 1$, the proposed TAF-SVM becomes the FSVM. Therefore, SVM and FSVM can be viewed as two particular cases of the proposed TAF-SVM.
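Two small sketches tied to the list above. The first implements the distance-to-mean membership rule in the spirit of (26) (the exact normalization is an assumption); the second derives the four penalty weights from one fixed weight and the ratios (27) and (28), reproducing the parameter tuples reported in Section V.

```python
import numpy as np

def memberships(X_class, sigma=0.4, delta=1e-6):
    """Membership values in [sigma, 1] from distance to the class mean,
    in the spirit of (26); the normalization by the largest distance
    is an assumed form, not verbatim from the paper."""
    m = X_class.mean(axis=0)
    d = np.linalg.norm(X_class - m, axis=1)
    return np.maximum(1.0 - d / (d.max() + delta), sigma)

def taf_svm_weights(C_minus=10.0, r1=20.0, r2=4.0):
    """Derive (C+, C-, D+, D-) from the fixed C- and the ratios
    r1 = C+/C- = D+/D- (27) and r2 = C+/D+ = C-/D- (28)."""
    C_plus = r1 * C_minus
    return C_plus, C_minus, C_plus / r2, C_minus / r2

X = np.vstack([np.random.default_rng(0).normal(0, 1, (20, 2)), [[8.0, 8.0]]])
print(memberships(X).round(2))           # the far outlier gets the floor value
print(taf_svm_weights(10.0, 20.0, 4.0))  # -> (200.0, 10.0, 50.0, 2.5)
print(taf_svm_weights(10.0, 30.0, 6.0))  # -> (300.0, 10.0, 50.0, 1.666...)
```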

    V. EXPERIMENTAL RESULTS

    A. Experiment on CYCU Face Database

Here, we present a set of experiments carried out on the Chung Yuan Christian University (CYCU) multiview face database [10]. The CYCU multiview face database contains 3150 face images of 30 subjects and involves variations in facial expression, viewpoint, and sex. Each image is a 112 × 92 24-bit pixel matrix. The viewpoint is governed by two parameters, the rotation angle and the tilt angle, with seven rotation angles and three tilt angles (rotation in {0°, ±15°, ±30°, ±45°} and tilt in {0°, ±15°}). For each viewpoint of each subject we prepared five face images with different facial expressions; therefore, each subject has 105 face images. Fig. 2 shows the 30 subjects in this database. All images contain face contours and hair. The color of the background is nearly white and the lighting condition is controlled to be uniform. Fig. 3 shows the collected 21 images, covering the 21 different viewpoints, of one subject.

Fig. 2. Thirty subjects of the CYCU multiview face database.

Fig. 3. Collected 21 face images of one of the 30 subjects in the CYCU multiview face database.

1) Analysis of Face-Pattern Distributions in the KFDA-Based Subspace: Two cases are analyzed in this subsection. Before the experiments, all color face images are converted to gray-level images, and the contrast of each gray image is enhanced by histogram equalization. All gray-level images of 112 × 92 pixels are resized to 28 × 23 pixel matrices before the feature extraction. In addition, in both Case 1 and Case 2 the features extracted by KFDA are the first two most discriminating features.

Case 1: Fig. 4 depicts the distribution of the face patterns of five subjects randomly chosen from the database in the KFDA-based subspaces. Each subject contributes 21 patterns covering the whole viewpoint range, i.e., each of the 21 viewpoints provides one image per person. Two observations follow. First, in Fig. 4(a) we observe that there exists an outlier for one of the classes. This outlier is very far from the main body of its class and falls into another class. The SVM will suffer from the overfitting problem when it is applied to the binary classification problem between these two classes. Second, according to the distribution shown in Fig. 4(b), there exists an overlap between three of the classes.


Fig. 4. Distribution of the 105 face patterns of the five subjects in KFDA-based subspaces with the RBF kernel, for two kernel-width settings (a) and (b).

To identify one of the overlapped classes by using the OAA method, the imbalance ratio of the positive class to the negative class is 1:4. The OSH learned by the traditional SVM will be skewed toward the positive class. Consequently, the number of false negatives will increase and the recognition accuracy will decrease.

Case 2: Most face-recognition studies evaluate their systems by changing face conditions such as expression, viewpoint, and illumination. Accordingly, several well-known databases are widely used, such as the Olivetti Research Laboratory (ORL) face database, the University of Manchester Institute of Science and Technology (UMIST) multiview face database, and the Yale face database. The three face databases consider different conditions. For example, the ORL database contains 400 face images in which all frontal face images have different facial expressions and details (glasses or no glasses). The UMIST database consists of 575 face images covering a wide range of poses, from one-sided profile to frontal views, as well as expressions. The Yale database contains 165 frontal face images with different expressions, illumination conditions, and small occlusions (glasses). These databases cover most of the conditions crucial for evaluating face-recognition systems. However, all the faces in these databases are bounded well; that is, they do not take variations of face contour and hair into consideration.

The face-recognition task follows the face-detection task. For example, the SVM-based face-detection system [34] searches for faces in an image by using size-varying windows to scan the image and perform the face/nonface classification. Once faces are detected, they are framed by rectangular bounding boxes of different sizes and then sent to the face-recognition system. The framed face images detected by different face-detection systems (or even the same one) may contain the whole hair and face contour, just partial hair and contour, or neither. Most existing face-recognition systems do not evaluate this factor, since all the images in the three databases are full faces containing both hair and contours.

Er et al. conducted an interesting experiment in their work [16]. They evaluated their system (discrete cosine transform (DCT) + FLDA + RBF neural networks) on two groups of data, one being full faces from the Yale database and the other closely cropped faces from the same database, and achieved error rates of 1.8% and 4.8%, respectively, which were lower than those of other approaches such as eigenfaces and Fisherfaces. However, neither group considers both full faces and cropped faces at the same time. Nevertheless, comparing the two results shows that the information of face contour and hair style is important for face recognition. This study evaluates the proposed TAF-SVM by letting this information be a variable.

In this paper, we assume that an input face can be a full face or a partially cropped face, in order to fulfill the requirement that, in addition to variations resulting from different expressions and viewpoints, a robust and reliable face-recognition system should also withstand the variation due to size-varying face-bounding boxes. This case investigates the influence of changes in the size of the face-bounding box on the face-pattern distribution in the KFDA-based subspace. To achieve this goal, each face image is cropped to a new face image with two integer cropping sizes, one for the rows and one for the columns. This procedure is called face cutting and is illustrated in Fig. 5, where the operator round(.) forces the derived cropping size to become an integer. The dotted white rectangle is the face-bounding box. After the cutting, the cropped image is resized to a new 112 × 92 image. In this manner, an input face may contain the whole face contour or just part of it.
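A minimal sketch of one plausible reading of the face-cutting step (the exact border rule and rounding detail are not fully recoverable from this excerpt, so this is an assumed form): strip a border of delta rows plus a proportional, rounded number of columns, then resize back to 112 x 92.

```python
import numpy as np

def face_cut(img, delta, out_shape=(112, 92)):
    """Crop a border of delta rows and round(delta * w / h) columns,
    then resize back to out_shape (nearest-neighbour, pure numpy)."""
    h, w = img.shape
    dc = int(round(delta * w / h))   # column cropping size, kept an integer
    cropped = img[delta:h - delta, dc:w - dc]
    rows = np.linspace(0, cropped.shape[0] - 1, out_shape[0]).astype(int)
    cols = np.linspace(0, cropped.shape[1] - 1, out_shape[1]).astype(int)
    return cropped[np.ix_(rows, cols)]

img = np.random.default_rng(0).integers(0, 256, (112, 92))
print(face_cut(img, delta=7).shape)   # -> (112, 92)
```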

The face-cutting procedure is applied to all 105 face images used in Case 1, with the cropping size randomly chosen within the range [0, 7]. Fig. 6(a) and (b) shows the distribution of these 105 randomly cropped face patterns in the KFDA-based subspaces. Compared with the distribution depicted in Fig. 4, face images with different cropping sizes significantly increase the interclass ambiguity and decrease the intraclass compactness. Although the KFDA-based feature extraction method tries to maximize the between-class separability and the within-class compactness, it cannot absorb the large variations caused by viewpoint and the size-varying face-bounding box.


Fig. 5. Face-cutting procedure and the cropped face images with different cropping sizes. The dotted white rectangle is the face-bounding box.

Fig. 6. Distribution of the randomly cropped 105 face patterns of five subjects in the KFDA-based subspaces, for two kernel-width settings (a) and (b).

Therefore, a robust classifier is still needed even though the robust feature extraction method KFDA has been used; here, a robust classifier means one better than the NN classifier. Besides, because outliers appear in the distribution and imbalanced training sets arise when the OAA method is employed, a classifier more robust than SVM is also needed. This, too, motivates this paper.

2) Sensitivity Test of TAF-SVM: The goal of this experiment is to test the sensitivity of TAF-SVM to its intrinsic parameters, including the penalty weights $C^+$ and $C^-$, the weights for the surplus variables $D^+$ and $D^-$, and the lower bound $\sigma$ used in the fuzzy membership function. To make the following experiments more constructive, three conditions containing different criteria for the collection of the training set and the test set are defined as follows.

Condition 1: For the training set, each subject offers 21 face images picked from all 21 angle combinations; each angle combination randomly offers one image per subject. Therefore, the training set contains 630 face images of the 30 subjects. The test set is collected by the same procedure, and the two sets have no overlap.

Condition 2: Each set is provided with 21 face images from every subject, so each set has 630 face images in total. The face images, unlike in Condition 1, are picked randomly from confined angle combinations. For the training set, only rotations of ±30° and 0° and tilts of ±15° and 0° are considered. For the test set, only combinations of rotations of ±45° and ±15° with tilts of ±15° and 0° are picked. Chosen face images are not picked again.

Condition 3: Face images are randomly chosen from all 3150 face images in the CYCU face database for the training set and the test set. Each of the 30 subjects provides 21 face images for each set, and there is no overlap between the two sets. Chosen face images are not picked again.

As far as the viewpoint of the face is concerned, the degrees of uncertainty of the three data-collection criteria are clearly different: Condition 2 has the highest degree of uncertainty among the three, while Condition 1 has the lowest. Also, the face-cutting procedure is applied to all face images before the feature extraction, with a random cropping size in the range [0, 7].

Before extracting features via the KFDA method, the optimal RBF kernel parameter, which results in the minimum error rate, is found by searching the variation range of $2\sigma^2$ from 1 to 10. The error rate is the average error rate over ten runs; whenever the next run is performed, the training set and the test set are reprepared based on Condition 3. Following the method used in [3] and [15], the average error rate is computed by

$$\bar{E} = \frac{1}{R \cdot n_t} \sum_{i=1}^{R} e_i \quad (29)$$

where $R$ is the number of runs, $e_i$ is the number of errors in the $i$th run, and $n_t$ is the total number of testing face images per run. Note that the total testing face images means the training set during the parameter-selection process and classifier training, while in the comparison of different classification systems (online testing) it means the test set. After trial and error, the optimal KFDA parameter was found to be $2\sigma^2 = 5.6 \times 10$, which resulted in the lowest average error rate measured from the ten training sets and also in a low average error rate of 11.8% measured from the ten test sets using the NN classifier.


With the optimal kernel value, a total of 29 discriminating feature components is extracted from each face image by the KFDA method.
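The averaging in (29) is simple enough to state as code; a one-function sketch with made-up run counts:

```python
def average_error_rate(errors_per_run, n_test):
    """Average error rate of (29): total errors over R runs divided by
    R times the number of test images per run."""
    R = len(errors_per_run)
    return sum(errors_per_run) / (R * n_test)

# e.g., three runs with 630 test images each (illustrative numbers)
print(average_error_rate([20, 25, 18], n_test=630))   # -> 0.0333...
```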

a) Sensitivity test on $C^+$, $C^-$, $D^+$, and $D^-$: The first experiment tests the sensitivity of TAF-SVM to the four parameters $C^+$, $C^-$, $D^+$, and $D^-$, which have been condensed into the ratios $r_1$ and $r_2$ defined by (27) and (28). The values of $r_1$ and $r_2$ for this experiment are {1, 10, 20, 30, 40} and {2, 4, 6, 8, $\infty$}, respectively. The value of $C^-$ is set to 10 and kept fixed. The lower bound $\sigma$ of the fuzzy membership function is fixed to 0.4. The RBF kernel is also used for TAF-SVM, with its kernel parameter set to 0.05. The experimental results for the three conditions are shown in Fig. 7.

The lowest error rates for the three conditions are 2.10%, 7.10%, and 4.80%, obtained when the pairs $(r_1, r_2)$ equal (20, 4), (30, 4), and (30, 6), respectively. The corresponding values of $(C^+, C^-, D^+, D^-)$ are (200, 10, 50, 2.5), (300, 10, 75, 2.5), and (300, 10, 50, 1.66), respectively. In addition, the results indicate that the error rates can be reduced by changing the ratios $r_1$ and $r_2$. In the following, we take the results of Condition 3, shown in Fig. 7(c), as examples to show how the performance of TAF-SVM is affected under different $r_1$ and $r_2$, in three steps.

Step 1) In Fig. 7(c), the largest average error rate, 7.76%, occurs at the position $(r_1, r_2) = (1, \infty)$, which means that $(C^+, C^-, D^+, D^-) = (10, 10, 0, 0)$. At this position, the different cost algorithm is disabled because the penalties for the positive and the negative classes are the same ($C^+ = C^- = 10$). The total margin algorithm is also disabled because $D^+ = D^- = 0$. Therefore, only the fuzzy penalty is used in TAF-SVM.

Step 2) As the position moves from $(1, \infty)$ to $(30, \infty)$, the average error rate decreases from 7.76% to 5.94%. At the position $(30, \infty)$, the different cost algorithm is enabled because the penalties for the positive and the negative classes become different: $C^+ = 300$ and $C^- = 10$. With $C^+ = 300$ and $C^- = 10$, the penalty for the positive class is much larger than that for the negative class. This meets the rule of the different cost algorithm: assign a heavier penalty to the smaller class. In the experiments of Fig. 7(c), the number of negative training data is 29 times the number of positive training data in the learning of each OSH by OAA TAF-SVM, because there are 30 subjects in the CYCU database to be classified. On the other hand, the total margin algorithm is still disabled at this position because $D^+ = D^- = 0$. Comparing the analysis in Step 1) with that in Step 2), we see that the error rate is reduced by the application of the different cost algorithm.

Step 3) As the position moves from $(30, \infty)$ to $(30, 6)$, the average error rate decreases from 5.94% to 4.8%. At the position (30, 6), not only is the different cost algorithm enabled ($C^+ = 300$ and $C^- = 10$), but the total margin algorithm is enabled as well ($D^+ = 50$ and $D^- = 1.66$). Comparing the analyses in Steps 1) and 2) with that in Step 3), we see that the error rate can be further reduced by involving the total margin algorithm after the use of the different cost algorithm.

Fig. 7. Comparisons of average error rates among different pairs of $(r_1, r_2)$ used in TAF-SVM under different data collection conditions.


TABLE III: PARAMETER SETTING IN KFDA, SVM, AND TAF-SVM

TABLE IV: COMPARISON OF THE AVERAGE ERROR RATE AND STANDARD DEVIATION (SD) OVER TEN RUNS BETWEEN TAF-SVM AND OTHER SYSTEMS

TABLE V: COMPARISON OF COMPUTATION TIME AMONG DIFFERENT SYSTEMS

based SVM, the improvement is very limited (from 11.41% to 9.02%), while it is significant using OAO-based SVM (from 11.41% to 7.55%). It is not surprising that the difference in recognition accuracy between OAO- and OAA-based SVM is so apparent, since the OAO method does not produce imbalanced data sets while the OAA method does. As a matter of fact, Lin et al. [2] have indicated that the OAO method is more suitable for practical use than the OAA method in terms of classification accuracy, according to their experiments on various popular data sets. In this paper, we also suggest that the OAO method is better than the OAA method for SVM-based face recognition. However, this suggestion holds only for face-recognition accuracy, because the OAO method takes more recognition time than the OAA method (see Table V).

    recognition time than the OAA method (seeTable V).We conduct this experiment mainly based on the reason that

    a robust face classifier should be able to maintain good sta-

    bility while expecting that it can achieve the best recognition

    accuracy under the training with different training sets. The re-

    sults of Table IV indicate that TAF-SVM outperforms OAO-

    and OAA-based SVM. This is due to the fact that TAF-SVM

    not only can adapt to the imbalanced face data sets but also

    can avoid the overfitting problem and improve the generaliza-

    tion error bound. In addition, the system KFDA TAF-SVM

    achieves the lowest standard deviation (0.57) compared with the

    system KFDA SVM. It indicates that the TAF-SVM is more

    stable than SVM.

4) Computational Complexity: Our experiments were run on an Intel Xeon 3.0-GHz workstation (1 MB L2 cache, DDR2 2.0 GB SDRAM, 800-MHz front-side bus, and 10 000-rpm SCSI hard disk). The training program was implemented in Matlab, since Matlab easily solves the eigenvalue problem for KFDA and the constrained optimization problems for both SVM and TAF-SVM. After training, we saved the expansion coefficients for KFDA and the information indispensable for recognition, including the support vectors, the Lagrange multipliers, and the optimal bias for each OSH. The test program was implemented in C++, since the recognition process executes only simple calculations such as dot products of vectors, their linear combinations, and decision making. We recorded the computation times of the first run of the last experiment and list them in Table V.

Most of the training time was spent solving the constrained optimization problem. The larger the number of training data, the more time the training process needed; moreover, the training time grew faster than proportionally with the size of the training set. The total training time of OAA-based SVM (2429.7 s for 30 OSHs) is much larger than that of OAO-based SVM (234.8 s for 435 OSHs), as shown in Table V. Moreover, we found that the training time of the proposed TAF-SVM was smaller than that of OAA-based SVM. This may be because, for TAF-SVM, the feasible regions are functions of membership values less than one; that is, most data have comparatively smaller feasible regions in the search for the Lagrange multipliers than in SVM. Although the training is time-consuming, the biggest concern for face recognition is the online recognition speed.

In the training of an OSH for the OAA-based SVM, we noticed that the percentage of obtained support vectors is around 20%-25%. The proposed TAF-SVM, for which the percentage of support vectors is 100%, takes 4.75 (76.4/16.1) times the recognition time of OAA-based SVM, as listed in Table V. The recognition time for TAF-SVM is around 0.1231 s per subject, a speed acceptable for security and visual-surveillance tasks.

    B. Experiment on FERET Database

The facial recognition technology (FERET) face database, obtained from the FERET program [37], [38], [43], is commonly used for testing state-of-the-art face-recognition algorithms. In the following, the proposed TAF-SVM is tested on a subset of this database.

This subset contains 1400 images of 200 subjects, consisting of the images whose names are marked with the two-character strings ba, bj, bk, be, bf, bd, and bg. Each subject has seven images involving variations in illumination, pose, and facial expression. In our experiment, each original image is cropped so that it contains only the face and hair, then resized to 80 × 80 pixels and preprocessed by histogram equalization. Some images of one of the 200 subjects are shown in Fig. 9. Six of the seven images of each subject are randomly chosen for training, and the remaining one is used for testing. The training set size is 1200 and the test set size is 200. We run this process 20 times and obtain 20 different training and test sets; in each run there is no overlap between the training set and the test set.


Fig. 9. First row: images of one of the 200 subjects in the FERET database. Second row: cropped images of those in the first row after histogram equalization.

TABLE VI: COMPARISON OF AVERAGE ERROR RATE AND SD AMONG DIFFERENT SYSTEMS WITH KFDA FEATURE EXTRACTION ON THE FERET DATABASE

1) Performance Test After the KFDA Feature Extraction: In this experiment, the face images go through the KFDA feature extraction before classification. Therefore, we first find the optimal parameters of KFDA.

a) Optimal parameter selection: The first step is to find the optimal parameters of KFDA for the experiment on the subset of the FERET database. Only two parameters need to be determined, namely the RBF kernel parameter and the number of chosen eigenvectors $q$. The optimal parameter pair is the one, over wide ranges of both parameters, resulting in the lowest average error rate, where each average error rate is computed from the 20 error rates under a specific parameter pair with the errors measured by an NN classifier. In the sequel, the optimal parameters $2\sigma^2 = 6.1 \times 10$ and $q = 199$ are found for KFDA. The training sets are then projected onto the 199 eigenvectors, giving 20 projected training sets.

The projected training sets are then used to find the optimal parameters for SVM and TAF-SVM, respectively; the RBF kernel is again used in the classifiers. Similar to the search process for KFDA's optimal parameters, the optimal parameters of each classifier are those that result in the lowest average error rate over wide search ranges of the classifier's parameters.
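The selection loop is the same for the feature extractor and the classifiers, so one sketch suffices. In the code below, `kfda_fit_project` is a hypothetical stand-in for the paper's KFDA implementation (it fits on one training split and returns projected train/test features for a given parameter pair); everything else is standard NumPy:

```python
import numpy as np
from itertools import product

def nn_error(train_f, train_y, test_f, test_y):
    """Error rate of a nearest-neighbour classifier on projected features."""
    d2 = ((test_f[:, None, :] - train_f[None, :, :]) ** 2).sum(axis=-1)
    pred = train_y[d2.argmin(axis=1)]
    return float((pred != test_y).mean())

def select_kfda_params(splits, sigmas, eig_counts, kfda_fit_project):
    """Return the (sigma, m) pair with the lowest NN error averaged
    over the 20 runs.  `kfda_fit_project(tr, te, sigma, m)` must
    return (train_f, train_y, test_f, test_y) in the m-dimensional
    KFDA subspace; it is a placeholder, not part of the paper."""
    best_pair, best_err = None, np.inf
    for sigma, m in product(sigmas, eig_counts):
        errs = [nn_error(*kfda_fit_project(tr, te, sigma, m))
                for tr, te in splits]
        if np.mean(errs) < best_err:
            best_pair, best_err = (sigma, m), float(np.mean(errs))
    return best_pair, best_err
```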

b) Comparisons among different systems: After selecting the optimal parameters for each classifier, we test and compare the classification accuracies by feeding the 20 test sets into the different systems. The experimental results are listed in Table VI.

First, by comparing the results in Tables VI and IV, we can observe that the error rates obtained from the FERET database are much larger than those obtained from the CYCU database. This may be due to the following facts: 1) the number of available training data from the FERET database (six per subject) is much smaller than that from the CYCU database (21 per subject); 2) larger variations exist in the FERET database; and 3) the number of subjects in the subset of the FERET database (200 subjects) is much larger than that of the CYCU database (30 subjects). Nevertheless, the experimental results on the FERET database show that TAF-SVM still performs better than SVM. Based on the results in Table VI, TAF-SVM outperforms SVM (OAO) and SVM (OAA) in average error rate by 3.21% and 5.10%, respectively. Additionally, TAF-SVM achieves the lowest variance among these systems, which indicates that TAF-SVM keeps better stability than SVM when facing different unseen patterns.

It is worth noting that although KFDA extracts discriminating features from the original image raw data by maximizing between-class separability, this does not mean that the class distribution in the KFDA-based subspace will be separable. This is evidenced by the result in Table VI: the error rate of KFDA + NN is 22.18%. This result implies that, even with the optimal parameters of KFDA, numerous errors still exist between classes; that is, the class distribution in the KFDA-based subspace is still nonseparable.

This may result from the following two reasons. First, the face patterns involve too large a variation for KFDA to separate the classes well. Second, in this paper, the KFDA used for the subset of the FERET database and for the CYCU database actually suffers from the so-called small sample size (SSS) problem [3], because in our experiment the number of training patterns is smaller than the dimensionality of the input training patterns. For example, in our training sets, each pattern (an 80 × 80 pixel image) is a 6400-dimensional vector, while the number of available training patterns is only 1200 (200 subjects, six per subject). The SSS problem also occurs in the KFDA used for the experiment on the CYCU database, where each training pattern (a 28 × 23 pixel image) is a 644-dimensional vector, while the number of training patterns in each training set is 630 (21 per subject). Since the KFDA used for the two databases suffers from the SSS problem, the within-class scatter matrix in (1) is degenerate because it contains a nontrivial null space.

To solve the SSS problem in numerical computation, this paper adds a matrix μI, where I is the identity matrix and μ is a small positive number, to the inner product kernel matrix when finding the expansion coefficient vectors for the data projection. This method is very simple and was suggested by Mika et al. [39], [40]. However, it discards the discriminant information contained in the null space of the within-class scatter matrix, and the discarded information may contain the most significant discriminant information [3], [41], [42]. This means that even if the most discriminant eigenvectors have been used for the data projection in our experiments, these eigenvectors are actually not the most discriminant. Hence, although KFDA is employed in our work, the face-pattern distribution is still nonseparable, as in the face-pattern distributions shown in Figs. 4(b) and 6.
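A minimal sketch of this regularization follows (our notation, not the paper's: M and N denote the kernelized between- and within-class matrices in the sense of Mika et al. [39], and μ is the small constant mentioned above):

```python
import numpy as np
from scipy.linalg import eigh

def kfda_coefficients(M, N, mu=1e-3, m=199):
    """Expansion coefficients for KFDA under the SSS problem.

    N (the kernelized within-class matrix) is singular when the
    training set is small, so a scaled identity mu*I is added to
    make it positive definite before solving the generalized
    eigenproblem  M a = lam (N + mu I) a."""
    N_reg = N + mu * np.eye(N.shape[0])
    lam, alpha = eigh(M, N_reg)            # ascending eigenvalues
    order = np.argsort(lam)[::-1]          # most discriminant first
    return alpha[:, order[:m]]             # m leading eigenvectors
```

The trade-off named in the text is visible here: adding μI makes N_reg invertible, but the directions in the null space of N, which may carry the strongest discriminant information, are no longer privileged by the eigenproblem.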

Several more efficient solutions to this SSS problem have recently been proposed, such as kernel direct discriminant analysis (KDDA) [3], a generalization of direct LDA [41], and complete kernel Fisher discriminant analysis (CKFD), which combines kernel PCA and LDA [42].


TABLE VII
COMPARISONS OF AVERAGE ERROR RATE AND SD AMONG DIFFERENT SYSTEMS WITHOUT KFDA FEATURE EXTRACTION ON FERET DATABASE

We expect that the classification accuracy of each system in Tables VI and IV would be improved if, instead of KFDA, KDDA or CKFD were used for the face-feature extraction in this paper.

Moreover, though KFDA tries to minimize the within-class scatter to obtain larger intraclass compactness, this cannot guarantee that no outliers will appear in the KFDA-based subspace. For example, in Fig. 4(a), an outlier still exists in the KFDA-based subspace. Under such a situation, SVM (OAO) and SVM (OAA) may suffer from the overfitting problem, and the classification performance drops. Furthermore, for SVM (OAA), although KFDA has been used for the face-feature extraction, imbalanced training data sets are still unavoidable in the KFDA-based subspace: in the training of an OSH via SVM (OAA), the imbalance ratio of negative training data to positive data is 199:1. Such a large imbalance ratio results in the class-boundary-skew problem for SVM (OAA). This may be why SVM (OAO) always performs better than SVM (OAA), because the ratio of negative training data to positive data is always 1:1 for SVM (OAO) if the class sizes are equal. To sum up, Table VI shows that the proposed TAF-SVM improves the classification performance of SVM (OAO) and SVM (OAA), and this significant improvement should be attributed not only to the use of the fuzzy penalty and the different cost algorithm, but also to the total margin algorithm embedded in TAF-SVM. The different-cost idea is sketched below.
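The different-cost remedy is not specific to TAF-SVM: any soft-margin SVM that accepts per-class penalties can counteract the 199:1 skew. As a hedged illustration only (this is scikit-learn's SVC with an arbitrary gamma, not the paper's TAF-SVM formulation):

```python
from sklearn.svm import SVC

# One OAA subproblem on this subset: 6 positive images vs. 1194
# negatives (199:1).  Raising the minority-class penalty so that
# C+ = 199 * C- discourages the separating hyperplane from
# drifting toward the positive class.
clf = SVC(kernel="rbf", gamma=1e-3, C=1.0,
          class_weight={+1: 199.0, -1: 1.0})
# clf.fit(X_train, y_train)   # y_train takes values in {+1, -1}
```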

2) Performance Test Without KFDA's Feature Extraction: In this experiment, the raw image vectors are sent directly into each classifier without using KFDA as the feature extractor. Since the KFDA feature extractor is no longer used, the optimal parameters of each classifier need to be reselected. Note that the inputs of each classifier are normalized to zero mean and unit variance, as sketched below. After feeding the 20 different test sets into these systems directly, without the KFDA feature extractor, the average error rates are obtained and listed in Table VII.
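The paper does not say whether the normalization is per pixel or per image; the sketch below assumes per-pixel statistics estimated on the training set and reused on the test set:

```python
import numpy as np

def standardize(X_train, X_test, eps=1e-8):
    """Scale each input dimension to zero mean and unit variance,
    using training-set statistics only (eps guards against
    division by zero for constant pixels)."""
    mu = X_train.mean(axis=0)
    sd = X_train.std(axis=0) + eps
    return (X_train - mu) / sd, (X_test - mu) / sd
```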

Comparing the results reported in Tables VII and VI, we find that the average error rate of each system in Table VI is lower than that listed in Table VII. For example, the average error rate of KFDA + TAF-SVM is 14.15%, while that of TAF-SVM alone is 20.40%. Therefore, we can conclude that using KFDA as the feature extractor significantly enhances the classification accuracy of each classifier. From Table VII, it can be seen that, in terms of the average classification rate, TAF-SVM outperforms SVM (OAA) and SVM (OAO) by 7.23% and 3.73%, respectively. In addition, TAF-SVM still achieves the lowest variance.

VI. CONCLUSION AND FUTURE WORK

A new classifier called TAF-SVM is proposed in this paper. TAF-SVM is mainly designed to remedy two drawbacks of the traditional SVM when applied to face recognition, the class-boundary-skew problem and the overfitting problem, by introducing the different cost algorithm and the fuzzification of the training set. Another contribution is the enhancement of the generalization ability of SVM through the total margin algorithm. Experimental results show that the proposed TAF-SVM is superior to OAO- and OAA-based SVM in terms of both face-classification rate and stability, validating TAF-SVM as an improvement over SVM in classification accuracy for face recognition.

Based on the work presented, several topics remain worth studying in the future. First, the circle-like membership model used here for the training-set fuzzification is not a very efficient model, since the face-pattern distribution is in general non-Gaussian and nonconvex; a study of better membership models is needed. Second, experimental results have shown that using KFDA as the feature extractor enhances the classification accuracy. However, for face recognition, KFDA suffers from the SSS problem in our work. We believe that if this problem is solved, e.g., by using variants of KFDA such as KDDA [3] or CKFD [42], the face-recognition accuracy can be further enhanced with the TAF-SVM classifier.

    ACKNOWLEDGMENT

    The authors would like to thank the reviewers for their useful

    comments and suggestions, and Prof. H.-P. Huang, Prof. S.-G.

    Miaou, Prof. P. C. P. Chao, and H.-Y. Lin for their help in

    preparing this paper.

    REFERENCES

[1] K. Veropoulos, C. Campbell, and N. Cristianini, "Controlling the sensitivity of support vector machines," in Proc. Int. Joint Conf. Artif. Intell. (IJCAI'99), Stockholm, Sweden, 1999, pp. 55–60.
[2] C. W. Hsu and C. J. Lin, "A comparison of methods for multiclass support vector machines," IEEE Trans. Neural Netw., vol. 13, no. 2, pp. 415–425, Mar. 2002.
[3] J. Lu, K. N. Plataniotis, and A. N. Venetsanopoulos, "Face recognition using kernel direct discriminant analysis algorithms," IEEE Trans. Neural Netw., vol. 14, no. 1, pp. 117–126, Jan. 2003.
[4] M. Yoon, Y. Yun, and H. Nakayama, "A role of total margin in support vector machines," in Proc. Int. Joint Conf. Neural Netw., 2003, vol. 3, pp. 2049–2053.
[5] I. Guyon, N. Matic, and V. Vapnik, "Discovering informative patterns and data cleaning," in Advances in Knowledge Discovery and Data Mining, U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, Eds. Menlo Park, CA: AAAI Press, 1996, pp. 181–203.
[6] X. Zhang, "Using class-center vectors to build support vector machines," in Proc. IEEE Workshop Neural Netw. Signal Process. (NNSP'99), Madison, WI, 1999, pp. 3–11.
[7] C. F. Lin and S. D. Wang, "Fuzzy support vector machines," IEEE Trans. Neural Netw., vol. 13, no. 2, pp. 464–471, Mar. 2002.
[8] V. Vapnik, Statistical Learning Theory. New York: Springer-Verlag, 1998.
[9] G. Baudat and F. Anouar, "Generalized discriminant analysis using a kernel approach," Neural Comput., vol. 12, pp. 2385–2404, 2000.
[10] Chung Yuan Christian Univ. (CYCU), Multiview Face Database, Chungli, Taiwan [Online]. Available: http://vsclab.me.cycu.edu.tw/~face/face_index.html
[11] R. Akbani, S. Kwek, and N. Japkowicz, "Applying support vector machines to imbalanced datasets," in Proc. 15th Eur. Conf. Mach. Learn. (ECML), Pisa, Italy, 2004, pp. 39–50.
[12] M. Turk and A. Pentland, "Eigenfaces for recognition," J. Cogn. Neurosci., vol. 3, no. 1, pp. 71–86, 1991.
[13] C. Cortes and V. Vapnik, "Support-vector networks," Mach. Learn., vol. 20, pp. 273–297, 1995.
[14] C. J. C. Burges, "A tutorial on support vector machines for pattern recognition," Data Mining Knowl. Disc., vol. 2, pp. 121–167, 1998.
[15] M. J. Er, S. Wu, J. Liu, and H. L. Toh, "Face recognition with radial basis function (RBF) neural networks," IEEE Trans. Neural Netw., vol. 13, no. 3, pp. 697–710, May 2002.
[16] M. J. Er, W. L. Chen, and S. Q. Wu, "High-speed face recognition based on discrete cosine transform and RBF neural networks," IEEE Trans. Neural Netw., vol. 16, no. 3, pp. 679–691, May 2005.
[17] G. Wu and E. Cheng, "Class-boundary alignment for imbalanced dataset learning," in Proc. ICML 2003 Workshop Learn. Imbalanced Data Sets II, Washington, DC, 2003, pp. 49–56.
[18] N. Japkowicz, "The class imbalance problem: Significance and strategies," in Proc. 2000 Int. Conf. Artif. Intell.: Special Track on Inductive Learning, Las Vegas, NV, 2000, pp. 111–117.
[19] C. Ling and C. Li, "Data mining for direct marketing problems and solutions," in Proc. 4th Int. Conf. Knowl. Disc. Data Mining, New York, 1998, pp. 73–79.
[20] N. Chawla, K. Bowyer, and W. Kegelmeyer, "SMOTE: Synthetic minority over-sampling technique," J. Artif. Intell. Res., vol. 16, pp. 321–357, 2002.
[21] J. F. Feng and P. Williams, "The generalization error of the symmetric and scaled support vector machines," IEEE Trans. Neural Netw., vol. 12, no. 5, pp. 1255–1260, Sep. 2001.
[22] M. H. Yang, "Kernel Eigenfaces vs. kernel Fisherfaces: Face recognition using kernel methods," in Proc. 5th IEEE Int. Conf. Autom. Face Gesture Recognit., Washington, DC, 2002, pp. 215–220.
[23] Q. S. Liu, H. Q. Lu, and S. D. Ma, "Improving kernel Fisher discriminant analysis for face recognition," IEEE Trans. Circuits Syst. Video Technol., vol. 14, no. 1, pp. 42–49, Jan. 2004.
[24] B. Schölkopf, A. Smola, and K. R. Müller, "Nonlinear component analysis as a kernel eigenvalue problem," Neural Comput., vol. 10, no. 5, pp. 1299–1319, 1998.
[25] K. I. Kim, K. Jung, and H. J. Kim, "Face recognition using kernel principal component analysis," IEEE Signal Process. Lett., vol. 9, no. 2, pp. 40–42, Feb. 2002.
[26] G. Cui and W. Gao, "SVMs for few examples-based face recognition," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., Hong Kong, 2003, vol. 2, pp. 381–384.
[27] W. Chi, G. Dai, and L. Zhang, "Face recognition based on independent Gabor features and support vector machine," in Proc. 5th World Congr. Intell. Control Autom., Hangzhou, China, 2004, vol. 5, pp. 4030–4033.
[28] C. Y. Li, F. Liu, and Y. X. Xie, "Face recognition using self-organizing feature maps and support vector machines," in Proc. 5th Int. Conf. Comput. Intell. Multimedia Appl., Xi'an, China, 2003, pp. 37–42.
[29] G. Dai and C. Zhou, "Face recognition using support vector machines with the robust feature," in Proc. 12th IEEE Int. Workshop Robot Human Interactive Commun., 2003, pp. 49–53.
[30] S. Y. Zhang and H. Qiao, "Face recognition with support vector machine," in Proc. IEEE Int. Conf. Robot., Intell. Syst. Signal Process., Changsha, China, 2003, vol. 2, pp. 726–730.
[31] K. I. Kim, J. Kim, and K. Jung, "Recognition of facial images using support vector machines," in Proc. 11th Workshop Stat. Signal Process., Singapore, 2001, pp. 468–471.
[32] P. N. Belhumeur, J. P. Hespanha, and D. J. Kriegman, "Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection," IEEE Trans. Pattern Anal. Mach. Intell., vol. 19, no. 7, pp. 711–720, Jul. 1997.
[33] Y. Lin, Y. Lee, and G. Wahba, "Support vector machines for classification in nonstandard situations," Mach. Learn., vol. 46, pp. 191–202, 2002.
[34] E. Osuna, R. Freund, and F. Girosi, "Training support vector machines: An application to face detection," in Proc. Comp. Vis. Pattern Recognit. (CVPR), Puerto Rico, 1997, pp. 130–136.
[35] U. H.-G. Kressel, "Pairwise classification and support vector machines," in Advances in Kernel Methods: Support Vector Learning, B. Schölkopf, C. J. C. Burges, and A. J. Smola, Eds. Cambridge, MA: MIT Press, 1999.
[36] T. Van Gestel, J. Suykens, B. Baesens, S. Viaene, J. Vanthienen, G. Dedene, B. De Moor, and J. Vandewalle, "Benchmarking least squares support vector machine classifiers," Mach. Learn., vol. 54, no. 1, pp. 5–32, 2004.
[37] P. J. Phillips, H. Moon, S. A. Rizvi, and P. J. Rauss, "The FERET evaluation methodology for face-recognition algorithms," IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, no. 10, pp. 1090–1104, Oct. 2000.
[38] P. J. Phillips, The Facial Recognition Technology (FERET) Database (2004) [Online]. Available: http://www.itl.nist.gov/iad/humanid/feret/feret_master.html
[39] S. Mika, G. Rätsch, J. Weston, B. Schölkopf, and K.-R. Müller, "Fisher discriminant analysis with kernels," in Proc. IEEE Int. Workshop Neural Netw. Signal Process. IX, Aug. 1999, pp. 41–48.
[40] S. Mika, G. Rätsch, J. Weston, B. Schölkopf, and K.-R. Müller, "Constructing descriptive and discriminant nonlinear features: Rayleigh coefficients in kernel feature spaces," IEEE Trans. Pattern Anal. Mach. Intell., vol. 25, no. 5, pp. 623–628, May 2003.
[41] H. Yu and J. Yang, "A direct LDA algorithm for high-dimensional data with application to face recognition," Pattern Recognit., vol. 34, pp. 2067–2070, 2001.
[42] J. Yang, A. F. Frangi, J. Y. Yang, D. Zhang, and Z. Jin, "KPCA plus LDA: A complete kernel Fisher discriminant framework for feature extraction and recognition," IEEE Trans. Pattern Anal. Mach. Intell., vol. 27, no. 2, pp. 230–244, Feb. 2005.
[43] J. Yang, J. Y. Yang, and A. F. Frangi, "Combined Fisherfaces framework," Image Vis. Comput., vol. 21, no. 12, pp. 1037–1044, 2003.
[44] H. P. Huang and Y. H. Liu, "Fuzzy support vector machines for pattern recognition and data mining," Int. J. Fuzzy Syst., vol. 4, no. 3, pp. 826–835, 2002.
[45] J. Suykens and J. Vandewalle, "Least squares support vector machine classifiers," Neural Process. Lett., vol. 9, pp. 293–300, 1999.
[46] M. W. Chang, C. J. Lin, and R. C. H. Weng, "Analysis of switching dynamics with competing support vector machines," IEEE Trans. Neural Netw., vol. 15, no. 3, pp. 720–727, May 2004.
[47] S. N. Pang, D. Kim, and S. Y. Bang, "Face membership authentication using SVM classification tree generated by membership-based LLE data partition," IEEE Trans. Neural Netw., vol. 16, no. 2, pp. 436–446, Mar. 2005.
[48] S. Li, J. T. Y. Kwok, I. W. H. Tsang, and Y. Wang, "Fusing images with different focuses using support vector machines," IEEE Trans. Neural Netw., vol. 15, no. 6, pp. 1555–1561, Nov. 2004.

Yi-Hung Liu (M'04) received the B.S. degree in naval architecture and marine engineering from National Cheng Kung University, Tainan, Taiwan, R.O.C., in 1994, and the M.S. degree in engineering science and ocean engineering in 1996 and the Ph.D. degree in mechanical engineering in 2003, both from National Taiwan University, Taipei, Taiwan, R.O.C.
He is currently an Assistant Professor with the Department of Mechanical Engineering, Chung Yuan Christian University, Chungli, Taiwan, R.O.C. His research interests include computer vision, machine learning, pattern recognition, data mining, automatic control, and their associated applications.

Yen-Ting Chen was born in Kaohsiung, Taiwan, R.O.C. He received the B.S. and M.S. degrees in mechanical engineering from Chung Yuan Christian University, Chungli, Taiwan, R.O.C., in 2004 and 2006, respectively.
He is currently with the Industrial Technology Research Institute (ITRI), Hsinchu, Taiwan, R.O.C., where he works on intelligent robots. His research interests include machine vision and neural networks.