
1820 IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 20, NO. 11, NOVEMBER 2009

Uncorrelated Multilinear Principal Component Analysis for Unsupervised Multilinear Subspace Learning

Haiping Lu, Member, IEEE, Konstantinos N. (Kostas) Plataniotis, Senior Member, IEEE, and Anastasios N. Venetsanopoulos, Fellow, IEEE

Abstract—This paper proposes an uncorrelated multilinear principal component analysis (UMPCA) algorithm for unsupervised subspace learning of tensorial data. It should be viewed as a multilinear extension of the classical principal component analysis (PCA) framework. Through successive variance maximization, UMPCA seeks a tensor-to-vector projection (TVP) that captures most of the variation in the original tensorial input while producing uncorrelated features. The solution consists of sequential iterative steps based on the alternating projection method. In addition to deriving the UMPCA framework, this work offers a way to systematically determine the maximum number of uncorrelated multilinear features that can be extracted by the method. UMPCA is compared against the baseline PCA solution and its five state-of-the-art multilinear extensions, namely two-dimensional PCA (2DPCA), concurrent subspaces analysis (CSA), tensor rank-one decomposition (TROD), generalized PCA (GPCA), and multilinear PCA (MPCA), on the tasks of unsupervised face and gait recognition. Experimental results included in this paper suggest that UMPCA is particularly effective in determining the low-dimensional projection space needed in such recognition tasks.

Index Terms—Dimensionality reduction, face recognition, feature extraction, gait recognition, multilinear principal component analysis (MPCA), tensor objects, uncorrelated features.

I. INTRODUCTION

INPUT data sets in many practical pattern recognition problems are multidimensional in nature and they can be formally represented using tensors. Several indices are needed to address the elements of a tensor and the number of indices used defines the "order" of a tensor, with each index defining one "mode" [1]. There are many real-world tensor objects [2]–[5], such as gray-level images as second-order tensors consisting of the column and row modes [6]–[8], and gray-scale video sequences as third-order tensors consisting of the column, row, and time modes [9], [10].

Manuscript received October 31, 2008; revised July 15, 2009; accepted July 30, 2009. First published September 29, 2009; current version published November 04, 2009. This work was supported in part by the Ontario Centres of Excellence under the Communications and Information Technology Ontario Partnership Program and the Bell University Labs at the University of Toronto.

H. Lu was with the Edward S. Rogers Sr. Department of Electrical and Computer Engineering, University of Toronto, Toronto, ON M5S 3G4 Canada. He is now with the Institute for Infocomm Research, Agency for Science, Technology and Research (A*STAR), Singapore 138632, Singapore (e-mail: [email protected]).

K. N. Plataniotis is with the Edward S. Rogers Sr. Department of Electrical and Computer Engineering, University of Toronto, Toronto, ON M5S 3G4 Canada (e-mail: [email protected]).

A. N. Venetsanopoulos is with the Edward S. Rogers Sr. Department of Electrical and Computer Engineering, University of Toronto, Toronto, ON M5S 3G4 Canada, and also with Ryerson University, Toronto, ON M5B 2K3 Canada (e-mail: [email protected]).

Digital Object Identifier 10.1109/TNN.2009.2031144

TABLE I
LIST OF ACRONYMS

In addition, many streaming and mining data are frequently organized as third-order tensors [11]–[14]. For instance, data in social network analysis are usually organized in three modes, namely time, author, and keywords [11]. A typical real-world tensor object is often specified in a high-dimensional tensor space, and recognition methods operating directly on this space suffer from the so-called curse of dimensionality [15]. Nonetheless, a class of tensor objects in most applications are highly constrained to a subspace, a manifold of intrinsically low dimension [15]. Subspace learning, a popular dimensionality reduction method, is frequently employed to transform a high-dimensional data set into a low-dimensional space of equivalent representation while retaining most of the underlying structure [16]. The focus of this paper is on unsupervised subspace learning of tensorial data. For convenience of discussion, Table I lists the acronyms used in this paper.

For unsupervised subspace learning of tensorial data, an obvious first choice is to utilize existing linear solutions such as the celebrated principal component analysis (PCA). PCA is a classical linear method for unsupervised subspace learning that transforms a data set consisting of a large number of interrelated variables into a new set of uncorrelated variables, while retaining most of the input data variations [20]. However, PCA on tensor objects requires the reshaping (vectorization) of tensors into vectors in a very high-dimensional space.


This not only increases the computational complexity and memory demands but, most importantly, destroys the structural correlation of the original data [7]–[9], [23]–[25]. It is commonly believed that potentially more compact or useful low-dimensional representations can be obtained directly from the original tensorial representation. Recently, there have been several proposed PCA extensions operating directly on tensor objects rather than their vectorized versions [8], [9], [26].

The tensor rank-one decomposition (TROD) algorithm introduced in [22] extracts features for a set of images based on variance maximization, and the solution involves (greedy) successive residue calculation. The two-dimensional PCA (2DPCA) proposed in [17] constructs an image covariance matrix using image matrices as inputs. However, in 2DPCA, a linear transformation is applied only to the right-hand side of image matrices, so the image data are projected in one mode only, resulting in poor dimensionality reduction [7]. In comparison, the generalized low rank approximation of matrices (GLRAM) algorithm in [7] applies linear transformations to the input image matrices from both the left- and right-hand sides, and it is shown to outperform 2DPCA. While GLRAM targets approximation and reconstruction in its formulation, the generalized PCA (GPCA) proposed in [8] aims to maximize the captured variation, as a two-dimensional extension of PCA. In addition, the two-dimensional singular value decomposition (2DSVD) [27] provides a near-optimal solution for GLRAM and GPCA. For higher order extensions, the concurrent subspaces analysis (CSA) formulated in [26] targets optimal reconstruction of general tensor objects and can be considered a further generalization of GLRAM, while the multilinear PCA (MPCA) introduced in [9] targets variation maximization for general tensor objects and can be considered a further generalization of GPCA.

Nevertheless, none of the existing multilinear extensions of PCA mentioned above takes an important property of PCA into account, i.e., the fact that PCA derives uncorrelated features. Instead, these multilinear extensions of PCA produce orthogonal bases in each mode. Although uncorrelated features imply orthogonal projection bases in PCA [20], this is not necessarily the case for its multilinear extensions. Uncorrelated features are highly desirable in many recognition tasks since they contain minimum redundancy and ensure linear independence among features [28]. In practical recognition tasks, uncorrelated features can greatly simplify the subsequent classification task. Thus, this paper investigates a multilinear extension of PCA which can produce uncorrelated features. A novel uncorrelated multilinear PCA (UMPCA) is proposed for unsupervised tensor object subspace learning (dimensionality reduction). UMPCA utilizes the tensor-to-vector projection (TVP) principle introduced during the development of the uncorrelated multilinear discriminant analysis (UMLDA) framework presented in [6], and it parallels the successive variance maximization approach seen in the classical PCA derivation [20]. In UMPCA, a number of elementary multilinear projections (EMPs) are solved to maximize the captured variance, subject to the zero-correlation constraint. The solution is iterative in nature, as are many other multilinear algorithms [8], [22], [26].

This paper makes two main contributions.

1) The introduction of a novel algorithm (UMPCA) for extracting uncorrelated features directly from tensors. The derived solution captures the maximum variation of the input tensors. As a multilinear extension of PCA, UMPCA not only obtains features that maximize the variance captured, but also enforces a zero-correlation constraint, thus extracting uncorrelated features. UMPCA is the only multilinear extension of PCA, to the best of the authors' knowledge, that can produce uncorrelated features in a fashion similar to that of the classical PCA, in contrast to other multilinear PCA extensions, such as 2DPCA [17], CSA [26], TROD [22], GPCA [8], and MPCA [9]. It should be noted that unlike the works reported in [9] and [26], TVP rather than tensor-to-tensor projection (TTP) is used here, and that unlike the heuristic approach of [22], this work takes a systematic approach in deriving the solution under the TVP paradigm. Interested readers should refer to [6], [9], and [19] for a detailed discussion on the topic of tensor projections, including TTP and TVP.
It should be noted that although the proposed UMPCA algorithm takes an approach similar to that used in the UMLDA algorithm [6] to derive uncorrelated features through TVP, the two methods utilize different objective functions. UMPCA is an unsupervised learning algorithm which does not require labeled training data, while UMLDA is a supervised learning algorithm which requires access to labeled training samples. It is therefore obvious that UMPCA can be used in many applications where UMLDA cannot be applied. Examples include typical unsupervised learning tasks such as clustering, and the one-training-sample (OTS) scenario recently studied in the face recognition literature, where many supervised algorithms cannot be applied properly due to the availability of only one training sample per class [29], [30].

2) A systematic method to determine the maximum number of uncorrelated multilinear features that can be extracted under the UMPCA framework. The pertinent corollary provides insight into the possible uses of UMPCA. It helps designers and practitioners to understand the possible limitations of UMPCA and provides guidance on where and how UMPCA should be used. In the linear case, the derived constraint on the maximum number of uncorrelated features reduces to a well-known constraint on the rank of the data matrix in PCA.

The rest of this paper is organized as follows. Section II reviews the basic multilinear notation and operations, including TVP. In Section III, the problem is stated and the UMPCA framework is formulated, with an algorithm derived as a sequential iterative process. This section also includes a systematic way to determine the maximum number of uncorrelated features that can be extracted under the UMPCA framework. In addition, issues such as initialization, projection order, termination, convergence, and computational complexity are discussed in detail. Section IV studies the properties of the proposed UMPCA algorithm using three synthetic data sets and evaluates the effectiveness of UMPCA in face and gait recognition tasks by comparing its performance against that of PCA, 2DPCA, CSA, TROD, GPCA, and MPCA. Finally, Section V draws the conclusions. It should be noted that, in order to take a systematic approach to the problem of interest, this work shares some similarity in presentation with [6].


TABLE II
LIST OF NOTATION

II. MULTILINEAR FUNDAMENTALS

This section introduces the fundamental multilinear notation, operations, and projections necessary for the presentation of UMPCA. For a more detailed understanding of the materials presented here, readers should refer to previously published works, such as those in [1], [6], [9], and [19]. The notation conventions used in this paper are listed in Table II.

A. Notation and Basic Multilinear Operations

Following the notational convention of [1], vectors are denoted by lowercase boldface letters, e.g., $\mathbf{x}$; matrices are denoted by uppercase boldface letters, e.g., $\mathbf{U}$; and tensors are denoted by calligraphic letters, e.g., $\mathcal{A}$. Their elements are denoted with indices in parentheses. Indices are denoted by lowercase letters and span the range from 1 to the uppercase letter of the index, e.g., $n = 1, 2, \ldots, N$.

An $N$th-order tensor $\mathcal{A} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ is addressed by $N$ indices $i_n$, $n = 1, \ldots, N$, and each $i_n$ addresses the $n$-mode of $\mathcal{A}$. The $n$-mode product of a tensor $\mathcal{A}$ by a matrix $\mathbf{U} \in \mathbb{R}^{J_n \times I_n}$, denoted by $\mathcal{A} \times_n \mathbf{U}$, is a tensor with entries

$$(\mathcal{A} \times_n \mathbf{U})(i_1, \ldots, i_{n-1}, j_n, i_{n+1}, \ldots, i_N) = \sum_{i_n} \mathcal{A}(i_1, \ldots, i_N) \cdot \mathbf{U}(j_n, i_n). \qquad (1)$$

The scalar product of two tensors $\mathcal{A}, \mathcal{B} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ is defined as

$$\langle \mathcal{A}, \mathcal{B} \rangle = \sum_{i_1} \sum_{i_2} \cdots \sum_{i_N} \mathcal{A}(i_1, i_2, \ldots, i_N) \cdot \mathcal{B}(i_1, i_2, \ldots, i_N). \qquad (2)$$

A rank-one tensor $\mathcal{A}$ equals the outer product of $N$ vectors: $\mathcal{A} = \mathbf{u}^{(1)} \circ \mathbf{u}^{(2)} \circ \cdots \circ \mathbf{u}^{(N)}$, which means that $\mathcal{A}(i_1, i_2, \ldots, i_N) = \mathbf{u}^{(1)}(i_1) \cdot \mathbf{u}^{(2)}(i_2) \cdots \mathbf{u}^{(N)}(i_N)$ for all values of indices.
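To make the operations above concrete, the following is a minimal NumPy sketch (illustrative code, not the authors' implementation) of the $n$-mode product in (1), the scalar product in (2), and a rank-one tensor built from an outer product; all function names are hypothetical.

```python
import numpy as np

def mode_n_product(A, U, n):
    """n-mode product A x_n U of tensor A with matrix U (J_n x I_n), as in (1)."""
    A_moved = np.moveaxis(A, n, 0)               # bring mode n to the front
    out = np.tensordot(U, A_moved, axes=(1, 0))  # contract over the I_n index
    return np.moveaxis(out, 0, n)                # restore the mode order

def scalar_product(A, B):
    """Scalar product <A, B> of two same-sized tensors, as in (2)."""
    return float(np.sum(A * B))

# A rank-one third-order tensor: the outer product of three vectors.
u1, u2, u3 = np.random.randn(3), np.random.randn(4), np.random.randn(5)
A = np.einsum('i,j,k->ijk', u1, u2, u3)
```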

B. Tensor-to-Vector Projection

In order to extract uncorrelated features from tensorial data directly, i.e., without vectorization, this work employs the TVP introduced in [6]. A brief review of TVP is given here; a detailed introduction is available in [6].

TVP is a generalized version of the projection framework first introduced in [22]. It consists of multiple EMPs. An EMP is a multilinear projection $\{\mathbf{u}^{(1)T}, \mathbf{u}^{(2)T}, \ldots, \mathbf{u}^{(N)T}\}$ composed of one unit projection vector per mode, i.e., $\|\mathbf{u}^{(n)}\| = 1$ for $n = 1, \ldots, N$, where $\|\cdot\|$ is used to indicate the Euclidean norm for vectors. An EMP projects a tensor $\mathcal{X} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ to a scalar $y$ through the $N$ unit projection vectors as

$$y = \mathcal{X} \times_1 \mathbf{u}^{(1)T} \times_2 \mathbf{u}^{(2)T} \cdots \times_N \mathbf{u}^{(N)T}.$$

The TVP of a tensor object $\mathcal{X}$ to a vector $\mathbf{y} \in \mathbb{R}^P$ consists of $P$ EMPs $\{\mathbf{u}_p^{(1)T}, \ldots, \mathbf{u}_p^{(N)T}\}$, $p = 1, \ldots, P$, which can be written concisely as

$$\mathbf{y} = \mathcal{X} \times_{n=1}^{N} \left\{\mathbf{u}_p^{(n)T}, n = 1, \ldots, N\right\}_{p=1}^{P} \qquad (3)$$

where the $p$th component of $\mathbf{y}$ is obtained from the $p$th EMP as $\mathbf{y}(p) = \mathcal{X} \times_1 \mathbf{u}_p^{(1)T} \times_2 \mathbf{u}_p^{(2)T} \cdots \times_N \mathbf{u}_p^{(N)T}$. The TROD [22] in fact seeks a TVP to maximize the captured variance. However, it takes a heuristic approach. Section III proposes a systematic, more principled formulation by taking into consideration the correlations among features.
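The sketch below (again illustrative, with hypothetical names) shows how an EMP maps a tensor to a scalar by contracting one mode at a time, and how $P$ EMPs form a TVP that produces the $P$-dimensional feature vector in (3).

```python
import numpy as np

def emp_project(X, us):
    """Project tensor X to a scalar via one unit vector per mode.
    us[0] has length I_1, us[1] length I_2, and so on."""
    y = X
    for u in us:                             # contracting axis 0 repeatedly
        y = np.tensordot(y, u, axes=(0, 0))  # walks through the modes in order
    return float(y)

def tvp_project(X, emps):
    """Apply P EMPs to obtain the P-dimensional feature vector y of X."""
    return np.array([emp_project(X, us) for us in emps])

# Example: a 3 x 4 x 5 tensor projected by P = 2 EMPs.
X = np.random.randn(3, 4, 5)
emps = [[np.random.randn(3), np.random.randn(4), np.random.randn(5)]
        for _ in range(2)]
y = tvp_project(X, emps)   # y has shape (2,)
```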

III. UNCORRELATED MULTILINEAR PCA

This section introduces the UMPCA framework for unsupervised subspace learning of tensor objects. The UMPCA objective function is first formulated. Then, the successive variance maximization approach and alternating projection method are adopted to derive uncorrelated features through TVP. A methodology to systematically determine the maximum number of uncorrelated features that can be extracted is introduced. Practical issues regarding initialization, projection order, termination, and convergence are addressed, and the computational aspects of UMPCA are discussed in detail.

In the presentation, for the convenience of discussion and without loss of generality, training samples are assumed to be zero-mean so that the constraint of uncorrelated features is the same as orthogonal features [20], [31].¹ When the training sample mean is not zero, it can be subtracted to make the training samples zero-mean. It should be noted that the orthogonality and zero correlation discussed here refer to the concepts in linear algebra rather than statistics [32].
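As a small sketch of this preprocessing step (illustrative, assuming the training tensors are stacked along the first axis of a NumPy array):

```python
import numpy as np

def center_samples(X):
    """X: array of M training tensors stacked on axis 0.
    Returns a zero-mean copy, so uncorrelated features coincide with
    orthogonal coordinate vectors (footnote 1)."""
    return X - X.mean(axis=0, keepdims=True)
```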

A. Formulation of the UMPCA Problem

Following the standard derivation of PCA provided in [20], the variance of the principal components is considered one at a time, starting from the first principal component that targets to capture the most variance.

¹Let $\mathbf{x}$ and $\mathbf{y}$ be vector observations of the variables $x$ and $y$. Then, $\mathbf{x}$ and $\mathbf{y}$ are orthogonal iff $\mathbf{x}^T\mathbf{y} = 0$, and $\mathbf{x}$ and $\mathbf{y}$ are uncorrelated iff $(\mathbf{x} - \bar{x}\mathbf{1})^T(\mathbf{y} - \bar{y}\mathbf{1}) = 0$, where $\bar{x}$ and $\bar{y}$ are the means of $\mathbf{x}$ and $\mathbf{y}$, respectively [32]. Thus, two zero-mean ($\bar{x} = \bar{y} = 0$) vectors are uncorrelated when they are orthogonal [31].


In the setting of TVP, the $p$th principal components are $\{y_p(m), m = 1, \ldots, M\}$, where $M$ is the number of training samples and $y_p(m)$ is the projection of the $m$th sample $\mathcal{X}_m$ by the $p$th EMP $\{\mathbf{u}_p^{(1)T}, \ldots, \mathbf{u}_p^{(N)T}\}$: $y_p(m) = \mathcal{X}_m \times_1 \mathbf{u}_p^{(1)T} \times_2 \mathbf{u}_p^{(2)T} \cdots \times_N \mathbf{u}_p^{(N)T}$. Accordingly, the variance is measured by their total scatter $S_{T_p}^{y}$, which is defined as

$$S_{T_p}^{y} = \sum_{m=1}^{M} \left(y_p(m) - \bar{y}_p\right)^2 \qquad (4)$$

where $\bar{y}_p = \frac{1}{M}\sum_{m=1}^{M} y_p(m)$. In addition, let $\mathbf{g}_p$ denote the $p$th coordinate vector, which is the representation of the $M$ training samples in the $p$th EMP space. Its $m$th component is $\mathbf{g}_p(m) = y_p(m)$. With these definitions, the formal definition of the unsupervised multilinear subspace learning problem in UMPCA is as follows.

A set of $M$ tensor object samples $\{\mathcal{X}_1, \mathcal{X}_2, \ldots, \mathcal{X}_M\}$ are available for training. Each tensor object $\mathcal{X}_m \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ assumes values in the tensor space $\mathbb{R}^{I_1} \otimes \mathbb{R}^{I_2} \cdots \otimes \mathbb{R}^{I_N}$, where $I_n$ is the $n$-mode dimension of the tensor and $\otimes$ denotes the Kronecker product [33]. The objective of UMPCA is to determine a TVP, which consists of $P$ EMPs $\{\mathbf{u}_p^{(n)} \in \mathbb{R}^{I_n}, n = 1, \ldots, N\}_{p=1}^{P}$, so that the original tensor space $\mathbb{R}^{I_1} \otimes \mathbb{R}^{I_2} \cdots \otimes \mathbb{R}^{I_N}$ can be mapped into a vector subspace $\mathbb{R}^P$ (with $P < \prod_{n=1}^{N} I_n$)

$$\mathbf{y}_m = \mathcal{X}_m \times_{n=1}^{N} \left\{\mathbf{u}_p^{(n)T}, n = 1, \ldots, N\right\}_{p=1}^{P}, \quad m = 1, \ldots, M \qquad (5)$$

while the variance of the projected samples, measured by $S_{T_p}^{y}$, is maximized in each EMP, subject to the constraint that the coordinate vectors $\{\mathbf{g}_p \in \mathbb{R}^M, p = 1, \ldots, P\}$ are uncorrelated.

In other words, the UMPCA objective is to determine a set of $P$ EMPs $\{\mathbf{u}_p^{(n)T}, n = 1, \ldots, N\}_{p=1}^{P}$ that maximize the variance captured while producing uncorrelated features. The objective function for determining the $p$th EMP can be expressed as

$$\left\{\mathbf{u}_p^{(n)}, n = 1, \ldots, N\right\} = \arg\max_{\mathbf{u}_p^{(1)}, \ldots, \mathbf{u}_p^{(N)}} S_{T_p}^{y},$$
$$\text{subject to } \mathbf{u}_p^{(n)T}\mathbf{u}_p^{(n)} = 1 \text{ and } \frac{\mathbf{g}_p^T\mathbf{g}_q}{\|\mathbf{g}_p\|\,\|\mathbf{g}_q\|} = \delta_{pq}, \quad p, q = 1, \ldots, P \qquad (6)$$

where $\delta_{pq}$ is the Kronecker delta defined as

$$\delta_{pq} = \begin{cases} 1, & \text{if } p = q \\ 0, & \text{otherwise.} \end{cases} \qquad (7)$$

The constraint $\mathbf{u}_p^{(n)T}\mathbf{u}_p^{(n)} = 1$ is imposed since, as in the linear case [20], the maximum variance cannot otherwise be achieved for finite $\mathbf{u}_p^{(n)}$.

Remark 1: It should be noted that due to the nature of TVP, the UMPCA algorithm is a feature extraction algorithm which produces feature vectors in a manner similar to those of linear solutions. Thus, for the tensor sample $\mathcal{X}_m$, the corresponding UMPCA feature vector $\mathbf{y}_m$ is given as

$$\mathbf{y}_m(p) = \mathcal{X}_m \times_1 \mathbf{u}_p^{(1)T} \times_2 \mathbf{u}_p^{(2)T} \cdots \times_N \mathbf{u}_p^{(N)T}, \quad p = 1, \ldots, P. \qquad (8)$$

B. Derivation of the UMPCA Solution

To solve this UMPCA problem in (6), the successive variance maximization approach, first utilized in the derivation of PCA [20], is taken. The $P$ EMPs $\{\mathbf{u}_p^{(n)T}, n = 1, \ldots, N\}_{p=1}^{P}$ are determined sequentially in $P$ steps. This stepwise process proceeds as follows.

Step 1: Determine the first EMP $\{\mathbf{u}_1^{(n)T}, n = 1, \ldots, N\}$ by maximizing $S_{T_1}^{y}$.
Step 2: Determine the second EMP $\{\mathbf{u}_2^{(n)T}, n = 1, \ldots, N\}$ by maximizing $S_{T_2}^{y}$ subject to the constraint that $\mathbf{g}_2^T\mathbf{g}_1 = 0$.
Step 3: Determine the third EMP $\{\mathbf{u}_3^{(n)T}, n = 1, \ldots, N\}$ by maximizing $S_{T_3}^{y}$ subject to the constraints that $\mathbf{g}_3^T\mathbf{g}_1 = 0$ and $\mathbf{g}_3^T\mathbf{g}_2 = 0$.
Step $p$ ($p = 4, \ldots, P$): Determine the $p$th EMP $\{\mathbf{u}_p^{(n)T}, n = 1, \ldots, N\}$ by maximizing $S_{T_p}^{y}$ subject to the constraint that $\mathbf{g}_p^T\mathbf{g}_q = 0$ for $q = 1, \ldots, p-1$.

Fig. 1 lists the pseudocode of the proposed UMPCA algorithm, where $\mathbf{\Psi}_p^{(n)}$ is a matrix to be defined in the formulated eigenvalue problem below, and $\tilde{\mathbf{S}}_{T_p}^{(n)}$ is a total scatter matrix to be defined below. In the figure, the stepwise process described above corresponds to the loop indexed by $p$. In the following, how to compute these EMPs is presented in detail.

In order to determine the $p$th EMP $\{\mathbf{u}_p^{(n)T}, n = 1, \ldots, N\}$, there are $N$ sets of parameters corresponding to the $N$ projection vectors to be determined, $\mathbf{u}_p^{(1)}, \ldots, \mathbf{u}_p^{(N)}$, one in each mode. It would be desirable to determine these $N$ sets of parameters ($N$ projection vectors) in all modes simultaneously so that $S_{T_p}^{y}$ is globally maximized, subject to the zero-correlation constraint. Unfortunately, as in other multilinear subspace learning algorithms [6], [9], [18], there is no closed-form solution for this problem, except when $N = 1$, which is the classical PCA where only one projection vector is to be solved. Therefore, following the heuristic approaches in [6], [9], and [18], the alternating projection method is used to solve the multilinear problem, as described below.

To determine the $p$th EMP, the $N$ sets of parameters for the $N$ projection vectors are estimated one mode (i.e., one set) at a time. For mode $n$, a linear subproblem in terms of $\mathbf{u}_p^{(n)}$ is solved by fixing $\{\mathbf{u}_p^{(j)}, j \neq n\}$, the projection vectors for the other modes. Thus, there are $N$ such conditional subproblems, corresponding to the loop indexed by $n$ in Fig. 1. This process iterates until a stopping criterion is met, which corresponds to the loop indexed by $k$ in Fig. 1.

To solve for $\mathbf{u}_p^{(n)}$, the conditional subproblem in the $n$-mode, the tensor samples are projected in the other $(N-1)$ modes first to obtain the vectors

$$\tilde{\mathbf{y}}_m^{(n)} = \mathcal{X}_m \times_1 \mathbf{u}_p^{(1)T} \cdots \times_{n-1} \mathbf{u}_p^{(n-1)T} \times_{n+1} \mathbf{u}_p^{(n+1)T} \cdots \times_N \mathbf{u}_p^{(N)T} \qquad (9)$$

where $\tilde{\mathbf{y}}_m^{(n)} \in \mathbb{R}^{I_n}$, assuming that $\{\mathbf{u}_p^{(j)}, j \neq n\}$ is given. This conditional subproblem then becomes to determine the $\mathbf{u}_p^{(n)}$ that projects the vector samples $\{\tilde{\mathbf{y}}_m^{(n)}, m = 1, \ldots, M\}$ onto a line so that the variance is maximized, subject to the zero-correlation constraint, which is a PCA problem with the input samples $\{\tilde{\mathbf{y}}_m^{(n)}, m = 1, \ldots, M\}$.
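A sketch of the partial projection in (9) is given below (illustrative code with hypothetical names): the sample is projected in every mode except mode $n$, leaving a vector of length $I_n$ for the mode-$n$ subproblem.

```python
import numpy as np

def partial_projection(X, us, n):
    """Project tensor X by the mode vectors us[j] for every mode j != n,
    yielding the length-I_n vector of (9)."""
    y = X
    for mode in reversed(range(X.ndim)):  # high modes first keeps axis indices valid
        if mode != n:
            y = np.tensordot(y, us[mode], axes=(mode, 0))
    return y
```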


Fig. 1. Pseudocode implementation of the UMPCA algorithm for unsupervised subspace learning of tensor objects.

The total scatter matrix $\tilde{\mathbf{S}}_{T_p}^{(n)}$ corresponding to $\{\tilde{\mathbf{y}}_m^{(n)}, m = 1, \ldots, M\}$ is then defined as

$$\tilde{\mathbf{S}}_{T_p}^{(n)} = \sum_{m=1}^{M} \left(\tilde{\mathbf{y}}_m^{(n)} - \bar{\tilde{\mathbf{y}}}^{(n)}\right)\left(\tilde{\mathbf{y}}_m^{(n)} - \bar{\tilde{\mathbf{y}}}^{(n)}\right)^T \qquad (10)$$

where $\bar{\tilde{\mathbf{y}}}^{(n)} = \frac{1}{M}\sum_{m=1}^{M}\tilde{\mathbf{y}}_m^{(n)}$. With (10), it is now ready to determine the projection vectors. For $p = 1$, the $\mathbf{u}_1^{(n)}$ that maximizes the total scatter in the projected space is obtained as the unit eigenvector of $\tilde{\mathbf{S}}_{T_1}^{(n)}$ associated with the largest eigenvalue, for $n = 1, \ldots, N$. For $p > 1$, given the first $(p-1)$ EMPs, the $p$th EMP aims to maximize the total scatter $S_{T_p}^{y}$, subject to the constraint that features projected by the $p$th EMP are uncorrelated with those projected by the first $(p-1)$ EMPs. Let $\tilde{\mathbf{Y}}_p^{(n)} \in \mathbb{R}^{I_n \times M}$ be a matrix with $\tilde{\mathbf{y}}_m^{(n)}$ as its $m$th column, i.e.,

$$\tilde{\mathbf{Y}}_p^{(n)} = \left[\tilde{\mathbf{y}}_1^{(n)}, \tilde{\mathbf{y}}_2^{(n)}, \ldots, \tilde{\mathbf{y}}_M^{(n)}\right] \qquad (11)$$

then the $p$th coordinate vector is $\mathbf{g}_p = \tilde{\mathbf{Y}}_p^{(n)T}\mathbf{u}_p^{(n)}$. The constraint that $\mathbf{g}_p$ is uncorrelated with $\{\mathbf{g}_q, q = 1, \ldots, p-1\}$ can be written as

$$\mathbf{g}_p^T\mathbf{g}_q = \mathbf{u}_p^{(n)T}\tilde{\mathbf{Y}}_p^{(n)}\mathbf{g}_q = 0, \quad q = 1, \ldots, p-1. \qquad (12)$$

Thus, $\mathbf{u}_p^{(n)}$ can be determined by solving the following constrained optimization problem:

$$\mathbf{u}_p^{(n)} = \arg\max_{\mathbf{u}} \mathbf{u}^T\tilde{\mathbf{S}}_{T_p}^{(n)}\mathbf{u},$$
$$\text{subject to } \mathbf{u}_p^{(n)T}\mathbf{u}_p^{(n)} = 1 \text{ and } \mathbf{u}_p^{(n)T}\tilde{\mathbf{Y}}_p^{(n)}\mathbf{g}_q = 0, \quad q = 1, \ldots, p-1. \qquad (13)$$
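The quantities in (10)-(13) are straightforward to form once the partial projections are available; the sketch below (illustrative, not the authors' code) computes the mode-$n$ total scatter matrix and notes the coordinate vector.

```python
import numpy as np

def mode_scatter(Y):
    """Y: I_n x M matrix whose columns are the partial projections of (9).
    Returns the total scatter matrix of (10)."""
    Yc = Y - Y.mean(axis=1, keepdims=True)   # center over the M samples
    return Yc @ Yc.T

# Coordinate vector of the p-th EMP in the current mode, as below (11):
# g_p = Y.T @ u_p
```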

The solution is given by the following theorem.

Theorem 1: The solution to the problem in (13) is the (unit-length) eigenvector corresponding to the largest eigenvalue of the following eigenvalue problem:

$$\mathbf{\Psi}_p^{(n)}\tilde{\mathbf{S}}_{T_p}^{(n)}\mathbf{u} = \lambda\mathbf{u} \qquad (14)$$

where

$$\mathbf{\Psi}_p^{(n)} = \mathbf{I}_{I_n} - \tilde{\mathbf{Y}}_p^{(n)}\mathbf{G}_{p-1}\mathbf{\Phi}_p^{-1}\mathbf{G}_{p-1}^T\tilde{\mathbf{Y}}_p^{(n)T} \qquad (15)$$

$$\mathbf{\Phi}_p = \mathbf{G}_{p-1}^T\tilde{\mathbf{Y}}_p^{(n)T}\tilde{\mathbf{Y}}_p^{(n)}\mathbf{G}_{p-1} \qquad (16)$$

$$\mathbf{G}_{p-1} = \left[\mathbf{g}_1, \mathbf{g}_2, \ldots, \mathbf{g}_{p-1}\right] \in \mathbb{R}^{M \times (p-1)} \qquad (17)$$

and $\mathbf{I}_{I_n}$ is an identity matrix of size $I_n \times I_n$.
Proof: The proof of Theorem 1 is given in Appendix I.

By setting $\mathbf{\Psi}_1^{(n)} = \mathbf{I}_{I_n}$ and from Theorem 1, a unified solution for UMPCA is obtained: for $p = 1, \ldots, P$, $\mathbf{u}_p^{(n)}$ is obtained as the unit eigenvector of $\mathbf{\Psi}_p^{(n)}\tilde{\mathbf{S}}_{T_p}^{(n)}$ associated with the largest eigenvalue. As pointed out earlier, this solution is an approximate, suboptimal solution to the original formulation in (6), due to the heuristics employed to tackle the problem.
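Putting Theorem 1 together with the alternating projection scheme, a single mode-$n$ update can be sketched as follows (a minimal sketch under the reconstructed equations (14)-(17), not the authors' released implementation; for $p = 1$ the constraint matrix is empty and $\mathbf{\Psi}_1^{(n)}$ reduces to the identity).

```python
import numpy as np

def mode_n_update(S, Y, G):
    """One mode-n update from Theorem 1.
    S: mode-n total scatter (10); Y: I_n x M matrix (11);
    G: M x (p-1) matrix of previous coordinate vectors (17), or None for p = 1.
    Returns the unit eigenvector of Psi_p S with the largest eigenvalue."""
    I_n = S.shape[0]
    if G is None or G.shape[1] == 0:
        Psi = np.eye(I_n)                                     # p = 1: plain PCA step
    else:
        YG = Y @ G                                            # I_n x (p-1)
        Phi = YG.T @ YG                                       # (16)
        Psi = np.eye(I_n) - YG @ np.linalg.solve(Phi, YG.T)   # (15)
    evals, evecs = np.linalg.eig(Psi @ S)                     # Psi @ S is not symmetric
    u = np.real(evecs[:, np.argmax(np.real(evals))])          # leading eigenvector
    return u / np.linalg.norm(u)
```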

C. Determination of the Maximum Number of Extracted Uncorrelated Features

It should be pointed out that compared to PCA and its existing multilinear extensions, UMPCA has a possible limitation in the number of (uncorrelated) features that can be extracted. This is because the projection to be solved in UMPCA is highly constrained, in both its correlation property and the simplicity of the projection. Compared with PCA, the projection in UMPCA is a TVP rather than a linear projection from vector to vector. Thus, the number of parameters to be estimated in UMPCA is usually much smaller than that in PCA, as discussed in [19]. Compared to 2DPCA, CSA, TROD, and MPCA, the correlations among features extracted by UMPCA have to be zero. The maximum number of features that can be extracted by UMPCA is given by the following corollary.


Corollary 1: The number of uncorrelated features $P$ that can be extracted by UMPCA is upper-bounded by $\min\{\min_n I_n, M\}$, i.e., $P \leq \min\{\min_n I_n, M\}$, provided that the elements of $\mathbf{g}_p$ are not all zero, where $I_n$ is the $n$-mode dimensionality and $M$ is the number of training samples.
Proof: The proof of Corollary 1 is given in Appendix II.

Corollary 1 implies that UMPCA may be more appropriate for high-resolution tensor objects, where the dimensionality in each mode is high enough to result in a larger $\min_n I_n$ and to enable the extraction of a sufficient number of (uncorrelated) features. It also indicates that UMPCA is more suitable for applications that need only a small number of features, such as clustering of a small number of classes.

In addition, it is interesting to examine Corollary 1 for the linear case, i.e., for $N = 1$. Since UMPCA follows the approach of successive variance maximization in the classical derivation of PCA [20], when $N = 1$, the samples are vectors $\{\mathbf{x}_m \in \mathbb{R}^{I_1}\}$ and UMPCA reduces to PCA. Accordingly, in each step $p$, there is only one projection vector to be solved to maximize the captured variance, subject to the zero-correlation constraint. Corollary 1 indicates that the maximum number of extracted features does not exceed $\min\{I_1, M\}$ when $N = 1$. This is exactly the case for PCA, where the number of PCA features cannot exceed the rank of the data matrix, which is upper-bounded by the minimum of the dimension of the samples $I_1$ and the number of samples $M$.
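As a quick worked check of the bound (the numbers below are the data-set sizes used later in Section IV):

```python
def umpca_max_features(dims, M):
    """Corollary 1 upper bound: min(min_n I_n, M)."""
    return min(min(dims), M)

# 80 x 80 face images, 70 training faces (one per subject):
assert umpca_max_features((80, 80), 70) == 70   # 69 after mean removal
# 32 x 22 x 10 gait samples, 731 gallery samples:
assert umpca_max_features((32, 22, 10), 731) == 10
```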

D. Initialization, Projection Order, Termination, and Convergence

This section discusses UMPCA design issues, including the initialization procedure, the projection order, and termination conditions, as well as issues related to the convergence of the solution.

Due to the multilinear nature of UMPCA, the determination of each EMP $\{\mathbf{u}_p^{(n)}, n = 1, \ldots, N\}$ is iterative in nature. Since solving the projection vector in one mode requires the projection vectors in all the other modes, initial estimations for the $N$ projection vectors are necessary. However, as in [6], [24], and [34]–[36], determining the optimal initialization in UMPCA is still an open problem. This work empirically studies two commonly used initialization methods [6], [19], [22], [35], [36]: uniform initialization and random initialization. The uniform initialization initializes each $n$-mode projection vector to the all ones vector $\mathbf{1}$, with proper normalization to have unit length. The random initialization draws each element of the $n$-mode projection vectors randomly from a zero-mean uniform distribution, with normalization to have unit length as well. The experimental results reported in Section IV indicate that the results of UMPCA are affected by initialization, and the uniform initialization gives more stable results.

The mode ordering (the loop indexed by $n$ in Fig. 1) in computing the projection vectors affects performance. The optimal way to determine the projection order is an open research problem. Simulation studies on the effects of the projection order indicate that different projection orders result in different amounts of captured variance. However, there is no guidance, either from the problem, the data, or the algorithm, on the best projection order. Hence, there is no preference for a particular projection order, and the projection vectors are solved sequentially (from 1-mode to $N$-mode), in a fashion similar to the one used in [6], [9], and [18].

As can be seen in Fig. 1, the iterative procedure terminates when $\left|S_{T_p}^{y}[k] - S_{T_p}^{y}[k-1]\right| < \eta$, where $S_{T_p}^{y}[k]$ is the total scatter captured by the $p$th EMP obtained in the $k$th iteration of UMPCA and $\eta$ is a small number threshold. Alternatively, the convergence of the projection vectors can also be tested: $dist\left(\mathbf{u}_p^{(n)}[k], \mathbf{u}_p^{(n)}[k-1]\right) < \epsilon$, where

$$dist\left(\mathbf{u}_p^{(n)}[k], \mathbf{u}_p^{(n)}[k-1]\right) = \min\left\{\left\|\mathbf{u}_p^{(n)}[k] - \mathbf{u}_p^{(n)}[k-1]\right\|, \left\|\mathbf{u}_p^{(n)}[k] + \mathbf{u}_p^{(n)}[k-1]\right\|\right\} \qquad (18)$$

and $\epsilon$ is a user-defined small number threshold. Section IV indicates that the variance captured by a particular EMP usually increases rapidly for the first few iterations and slowly afterwards. Therefore, the iterative procedure in UMPCA can be terminated by simply setting a maximum number of iterations in practice for convenience, especially when the computational cost is a concern.
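The sign-invariant distance in (18) can be coded directly; a minimal sketch (hypothetical names) follows.

```python
import numpy as np

def emp_distance(u_new, u_old):
    """Distance of (18): eigenvectors are determined only up to sign,
    so compare against both u_old and -u_old."""
    return min(np.linalg.norm(u_new - u_old), np.linalg.norm(u_new + u_old))
```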

Regarding convergence, the derivation of Theorem 1 (the end of Appendix I) implies that per iteration, the scatter $S_{T_p}^{y}$ is a nondecreasing function, since each update of the projection vector $\mathbf{u}_p^{(n)}$ in a given mode $n$ maximizes $S_{T_p}^{y}$. On the other hand, $S_{T_p}^{y}$ is upper-bounded by the variation in the original samples, following a similar argument to that in [9]. Therefore, UMPCA is expected to converge over iterations, following [23, Th. 1] and [24]. Empirical results presented in Section IV indicate that UMPCA converges within ten iterations for typical tensor objects in practice. In addition, when the largest eigenvalues in each mode are with multiplicity 1, the projection vectors $\mathbf{u}_p^{(n)}$, which maximize the objective function $S_{T_p}^{y}$, are expected to converge as well, where the convergence is up to sign. Simulation studies show that the projection vectors do converge over a number of iterations.

E. Computational Aspects of UMPCA

Finally, the computational aspects of UMPCA are considered here. Specifically, the computational complexity and memory requirements are analyzed, following the framework used in [9] for MPCA, and in [6] for UMLDA. It is assumed again that $I_1 = I_2 = \cdots = I_N = \left(\prod_{n=1}^{N} I_n\right)^{1/N} = I$ for simplicity.

The most computationally demanding steps in UMPCA are the calculations of the projections $\{\tilde{\mathbf{y}}_m^{(n)}\}$, the computation of $\tilde{\mathbf{S}}_{T_p}^{(n)}$ and $\mathbf{\Psi}_p^{(n)}$, and the calculation of the leading eigenvector of $\mathbf{\Psi}_p^{(n)}\tilde{\mathbf{S}}_{T_p}^{(n)}$. The complexity of calculating $\{\tilde{\mathbf{y}}_m^{(n)}, m = 1, \ldots, M\}$ and $\tilde{\mathbf{S}}_{T_p}^{(n)}$ are in the order of $O\!\left(M \cdot \sum_{k=2}^{N} I^k\right)$ and $O(M \cdot I^2)$, respectively. Following (15)-(17), the computation of $\mathbf{\Psi}_p^{(n)}$ is in the order of

$$O\!\left(p \cdot \left(M \cdot I + I^2\right) + p^2 \cdot I + p^3\right). \qquad (19)$$


Fig. 2. Illustration of the effects of initialization on the scatter captured by UMPCA. Comparison of the captured $S_{T_p}^{y}$ with uniform and random initialization (average of 20 repetitions) over 30 iterations for several values of $p$ on synthetic data set (a) db1, (b) db2, and (c) db3. (d) Illustration of the captured $S_{T_p}^{y}$ of ten random initializations on db2.

The computation of $\mathbf{\Psi}_p^{(n)}\tilde{\mathbf{S}}_{T_p}^{(n)}$ (a product of two $I \times I$ matrices) and its eigendecomposition² are both of order $O(I^3)$. Therefore, the computational complexity per mode for one iteration of step $p$ is

$$O\!\left(M \cdot \left(\sum_{k=2}^{N} I^k + I^2\right) + p \cdot \left(M \cdot I + I^2\right) + p^2 \cdot I + p^3 + 2 \cdot I^3\right). \qquad (20)$$

Regarding the memory requirement, the respective computation can be done incrementally by reading the samples $\mathcal{X}_m$ sequentially. Thus, except for $\tilde{\mathbf{Y}}_p^{(n)}$, the memory needed for the UMPCA algorithm can be as low as the size of a single tensor sample, although sequential reading will lead to higher input/output (I/O) cost.
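The incremental computation mentioned above follows from rewriting the centered scatter in (10) as $\sum_m \tilde{\mathbf{y}}_m\tilde{\mathbf{y}}_m^T - M\,\bar{\tilde{\mathbf{y}}}\bar{\tilde{\mathbf{y}}}^T$; the sketch below (illustrative) accumulates it in a single pass over sequentially read samples.

```python
import numpy as np

def incremental_scatter(sample_iter):
    """Accumulate the total scatter matrix from a stream of vectors,
    holding only O(I_n^2) state in memory."""
    S, mean_sum, M = None, None, 0
    for y in sample_iter:
        if S is None:
            S = np.zeros((y.size, y.size))
            mean_sum = np.zeros(y.size)
        S += np.outer(y, y)
        mean_sum += y
        M += 1
    mean = mean_sum / M
    return S - M * np.outer(mean, mean)
```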

From Fig. 1 and the discussion above, as a sequential iterative solution, UMPCA may have a high computational and I/O cost. Nevertheless, solving the UMPCA projection occurs only in the training phase of the targeted pattern recognition tasks, so it can be done offline, and the additional computational and I/O costs due to iterations and sequential processing are not considered a disadvantage. During testing, feature extraction from a test sample is an efficient linear operation, as in conventional linear subspace learning algorithms.

²Since only the largest eigenvalue and the corresponding eigenvector are needed in UMPCA, more efficient computational methods may be applied in practice.

IV. EXPERIMENTAL STUDIES

This section presents a number of experiments carried out in support of the following objectives.

1) Investigate the various properties of the UMPCA algorithm.
2) Demonstrate the utility of the UMPCA algorithm in typical learning applications by comparing the UMPCA recognition performance against that of the baseline PCA solution and its state-of-the-art multilinear extensions on two recognition problems involving tensorial data, namely, face and gait recognition.


A. Study of UMPCA Properties on Synthetic Data

The following properties of UMPCA are studied here: 1) the effects of the initialization procedure; 2) the effects of the projection order; and 3) the convergence of the algorithm. Similar to MPCA [9], UMPCA is an unsupervised algorithm derived under the variance maximization principle, and eigendecomposition in each mode (Theorem 1) is an important step in computing the UMPCA projection. Thus, its properties are affected by the eigenvalue distribution of the input data. Three third-order tensorial data sets, namely, db1, db2, and db3, have been synthetically generated in [9], with eigenvalues in each mode spanning different magnitude ranges. In this work, experimental studies of the UMPCA properties are performed on these three synthetic data sets, with a brief description of the generation process included below.

For each set, $M$ samples $\{\mathcal{A}_m \in \mathbb{R}^{I_1 \times I_2 \times I_3}\}$ are generated according to $\mathcal{A}_m = \mathcal{B}_m \times_1 \mathbf{C}^{(1)} \times_2 \mathbf{C}^{(2)} \times_3 \mathbf{C}^{(3)} + \mathcal{D}_m$, where $\mathcal{B}_m \in \mathbb{R}^{I_1 \times I_2 \times I_3}$, $\mathbf{C}^{(n)} \in \mathbb{R}^{I_n \times I_n}$, and $\mathcal{D}_m \in \mathbb{R}^{I_1 \times I_2 \times I_3}$. All entries in $\mathcal{B}_m$ are drawn from a zero-mean unit-variance Gaussian distribution and multiplied by $(i_1 i_2 i_3)^{-f}$, where $f$ controls the eigenvalue distributions. The matrices $\mathbf{C}^{(n)}$ are orthogonal matrices obtained by applying singular value decomposition (SVD) to random matrices with entries drawn from a zero-mean, unit-variance Gaussian distribution. All entries of $\mathcal{D}_m$ are drawn from a zero-mean Gaussian distribution with variance 0.01. Three synthetic data sets, db1, db2, and db3, are generated with three decreasing values of $f$, respectively, where a smaller $f$ results in a narrower range of eigenvalue spread and vice versa. Practical data such as face and gait data share similar characteristics with the data in db1 [9].

1) The Effects of Initialization: The effects of initialization are studied first. The uniform initialization and random initialization discussed in Section III-D are tested up to 30 iterations, with the projection order fixed. Fig. 2 shows the simulation results on the three synthetic data sets. The results shown for random initialization in Fig. 2(a)–(c) are the average of 20 repeated trials. From Fig. 2(a) and (b), it can be seen that for $p = 1$, both the uniform and random initializations result in the same $S_{T_p}^{y}$. For $p > 1$, the two ways of initialization lead to different $S_{T_p}^{y}$, with the uniform initialization performing better (i.e., resulting in larger $S_{T_p}^{y}$) on db2. In addition, it should be noted that for db1 and db2, $S_{T_p}^{y}$ decreases as $p$ increases, which is expected since maximum variation should be captured in each EMP subject to the zero-correlation constraint. From the figures, the algorithm converges in around 5 and 15 iterations for db1 and db2, respectively. For db3 in Fig. 2(c), the uniform and random initializations do not result in the same $S_{T_p}^{y}$ even for $p = 1$, and $S_{T_p}^{y}$ does not always decrease as $p$ increases, which indicates that some EMPs fail to capture the maximum variation. This may be partly explained by observing from Fig. 2(c) that the algorithm converges slowly and 30 iterations may not be sufficient to reach convergence (i.e., to maximize the captured variation).

Fig. 2(d) further shows some typical results of the evolution of $S_{T_p}^{y}$ for ten random initializations on db2. As seen from the figure, the results obtained from random initialization have high variance. Although, when computational cost is not a concern, a number of random initializations can be tested to choose the one resulting in the best performance, i.e., the largest $S_{T_p}^{y}$, the uniform initialization is a safe choice when testing several initializations is not desirable. Thus, the uniform initialization is used in all the following experiments for UMPCA.

2) The Effects of Projection Order: Next, the effects of the projection order are tested, with representative results shown in Fig. 3 for two values of $p$ on each of the three synthetic data sets. As shown in the figure, the projection order affects UMPCA as well, except for the smaller $p$ on db1 and db2. Nonetheless, no one particular projection order consistently outperforms all the others. Thus, in the following experiments, the projection order is fixed to be sequential from 1-mode to $N$-mode. As in initialization, if computational cost is not a concern, all possible projection orders could be tested and the one resulting in the largest $S_{T_p}^{y}$ should be considered for each $p$.

3) Convergence Studies: Last, experimental studies are performed on the convergence of the total scatter captured in each EMP and the convergence of the corresponding projection vectors in each mode. Fig. 4 depicts the evolution of the captured total scatter and the two-mode projection vector difference over 50 iterations. From Fig. 4(a), (c), and (e), it can be observed that the algorithm converges (in terms of the total scatter) on db1 and db2 in about 10 and 30 iterations, respectively, while on db3, the convergence speed is considerably lower, indicating again the difficulty of db3. Fig. 4(b), (d), and (f) demonstrates that the derived projection vectors converge too. It is also observed that the convergence speed of the projection vectors on db3 is much lower than the convergence speed on the other two data sets. This is likely due to the narrow range of eigenvalue spread in db3.

B. Evaluation on Face and Gait Recognition

The proposed UMPCA is evaluated on two unsupervised recognition tasks, namely, face recognition [37] and gait recognition [38], which can be considered as second-order and third-order tensor biometric classification problems, respectively. These two recognition tasks have practical importance in security-related applications such as biometric authentication and surveillance [39]. The evaluation is through performance comparison against the baseline PCA solution and its existing multilinear extensions.

1) The Face and Gait Data: The facial recognition technology (FERET) database [40], a standard testing database for face recognition performance evaluation, includes 14 126 images from 1199 individuals covering a wide range of variations in viewpoint, illumination, facial expression, race, and age. The FERET subset selected in this experimental evaluation consists of those subjects that have at least eight images in the database with at most 15° of pose variation. Thus, 721 face images from 70 FERET subjects are considered. Since the focus here is on recognition rather than detection, all face images are manually cropped, aligned (with manually annotated coordinate information of the eyes), and normalized to 80 × 80 pixels, with 256 gray levels per pixel. Fig. 5(a) depicts sample face images from a subject in this FERET subset.


Fig. 3. Illustration of the effects of projection order on the scatter captured by UMPCA: on db1 [(a) and (b)], on db2 [(c) and (d)], and on db3 [(e) and (f)], each for two values of $p$.

The University of South Florida (USF) gait challenge data set version 1.7 consists of 452 sequences from 74 subjects walking in elliptical paths in front of the camera, with two viewpoints (left or right), two shoe types (A or B), and two surface types (grass or concrete). There are seven probes in this data set and probe A is chosen for the evaluation, with difference in viewpoint only. Thus, the gallery set is used as the training set; it is captured on grass surface, with shoe type A and from the right view. Probe A is used as the test set; it is captured on grass surface, with shoe type A and from the left view. Each set has only one sequence per subject. Subjects are unique in the gallery and probe sets and there are no common sequences


Fig. 4. Illustration of the convergence of UMPCA: the evolution of the total scatter captured on (a) db1, (c) db2, and (e) db3; and the evolution of the projection vector difference $dist(\mathbf{u}_p^{(2)}[k], \mathbf{u}_p^{(2)}[k-1])$ on (b) db1, (d) db2, and (f) db3.

between the gallery set and the probe set. These gait data sets are employed to demonstrate the performance on third-order tensors since gait silhouette sequences are naturally 3-D data [9]. The procedures in [9] are followed to get gait samples from gait silhouette sequences, and each gait sample is resized to a third-order tensor of 32 × 22 × 10. There are 731 and 727 gait samples in the gallery set and probe A, respectively. Fig. 5(b) shows three gait samples from the USF gait database.

2) Algorithms and Their Settings in Performance Comparison: In the face and gait recognition experiments, the performance of the proposed UMPCA is compared against that of PCA [20], [21] and five existing multilinear PCA extensions, 2DPCA [17], CSA [18], TROD [22], GPCA [8], and MPCA [9].


Fig. 5. Examples of the biometric data used in performance evaluation: (a) eight face samples from a subject in the FERET subset used, and (b) three gait samplesfrom the USF gait database V.1.7, shown by concatenating frames in rows.

Fig. 6. Detailed face recognition results by PCA algorithms on the FERET database for $L = 1$: (a) performance curves for the low-dimensional projection case, (b) performance curves for the high-dimensional projection case, (c) the variation captured by individual features, and (d) the correlation among features.

It should be noted that 2DPCA and GPCA are only applied to the face recognition problem since they cannot handle the third-order tensors in gait recognition. In addition, 2DSVD [27] is used for initialization in GPCA, so the GPCA algorithm tested is equivalent to MPCA on second-order tensors.

The recognition performance is evaluated by the identification rate calculated through similarity measurement between feature vectors.


Fig. 7. Detailed face recognition results by PCA algorithms on the FERET database for the maximum $L$: (a) performance curves for the low-dimensional projection case, (b) performance curves for the high-dimensional projection case, (c) the variation captured by individual features, and (d) the correlation among features.

The simple nearest neighbor classifier with the Euclidean distance measure is used for classification of extracted features, since the focus of this paper is on feature extraction. Among the algorithms considered here, 2DPCA, CSA, GPCA, and MPCA produce tensorial features, which need to be vectorized for classification. Hence, for these four algorithms, each entry in the projected tensorial features is viewed as an individual feature and the corresponding total scatter as defined in (4) is calculated. The tensorial features produced by these methods are then arranged into a vector in descending total scatter. All the iterative algorithms (CSA, TROD, GPCA, MPCA, and UMPCA) are terminated by setting a maximum number of iterations, for fair comparison and computational concerns. Since CSA, GPCA, and MPCA have very good convergence performance, this maximum is set to 1 for them. For TROD and UMPCA, it is set to 10, and the uniform initialization is used. Due to Corollary 1, up to 80 features are tested in the face recognition experiments and up to ten features are tested in the gait recognition experiments.
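For reference, the classification step described above amounts to the following sketch (illustrative, assuming feature vectors are stacked row-wise):

```python
import numpy as np

def nn_classify(train_feats, train_labels, test_feat):
    """Nearest neighbor under the Euclidean distance.
    train_feats: M x P array; train_labels: length-M array; test_feat: length P."""
    d = np.linalg.norm(train_feats - test_feat, axis=1)
    return train_labels[np.argmin(d)]
```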

3) Face Recognition Results: Gray-level face images are naturally second-order tensors (matrices), i.e., $N = 2$. Therefore, they are input directly as 80 × 80 tensors to the multilinear algorithms (2DPCA, CSA, TROD, GPCA, MPCA, and UMPCA), while for PCA, they are vectorized to 6400 × 1 vectors as input. For each subject in a face recognition experiment, $L$ samples are randomly selected for unsupervised training and the rest are used for testing. The results averaged over 20 such random splits (repetitions) are reported in terms of the correct recognition rate (CRR), i.e., the rank 1 identification rate. Figs. 6 and 7 show the detailed results for the smallest and the largest numbers of training samples per subject, respectively: $L = 1$ is the one-training-sample (per class) scenario [30], and the largest $L$ is the maximum number of training samples that can be used in this set of experiments. It should be noted that for PCA and UMPCA, there are at most 69 (uncorrelated) features when $L = 1$, since there are only 70 faces for training and the mean is zero. Figs. 6(a) and 7(a) plot the CRRs against $P$, the dimensionality of the subspace, for small $P$, and Figs. 6(b) and 7(b) plot those for $P$ ranging from 15 to 80. From the figures, UMPCA outperforms the other five methods in both cases and across all $P$s, indicating that the uncorrelated features extracted directly from the tensorial face data are effective in classification.


TABLE III
FACE RECOGNITION RESULTS BY PCA ALGORITHMS ON THE FERET DATABASE: THE CRRS (MEAN ± STD, %) FOR VARIOUS $L$s AND $P$s

The figures also show that for UMPCA, the recognition rate saturates at large $P$, which can be explained by observing the variation captured by individual features as shown in Figs. 6(c) and 7(c) (in log scale). These figures show that the variation captured by UMPCA is considerably lower than those captured by the other methods, due to its constraints of zero-correlation and the TVP. Despite capturing lower variation, UMPCA has superior performance in the recognition task performed. Nonetheless, when the variation captured is too low, the corresponding features are no longer descriptive enough to contribute to classification, leading to the saturation.

In addition, the average correlations of individual features with all the other features are plotted in Figs. 6(d) and 7(d). As supported by the theoretical derivation, features extracted by PCA and UMPCA are uncorrelated. In contrast, the features extracted by all the other methods are correlated, with those extracted by 2DPCA and TROD having much higher correlation on average, which could partly explain their poorer performance.

The recognition results for the other values of $L$ are listed in Table III for several representative values of $P$, with both the mean and the standard deviation (STD) over 20 repetitions indicated. The top two results for each $L$ and $P$ combination are highlighted in bold face for easy reading. From the table, UMPCA achieves the best recognition results in all cases reported. In particular, for smaller $P$, UMPCA outperforms the other methods significantly, demonstrating its superior capability in classifying faces in low-dimensional projection spaces.

4) Gait Recognition Results: In order to evaluate the recognition performance of UMPCA on third-order tensors, recognition experiments are also carried out on the 3-D gait data described in Section IV-B1. In these experiments, gait samples are input directly as third-order tensors to the multilinear algorithms, while for PCA, they are vectorized to 7040 × 1 vectors as input. The gallery set is used for training and probe set A is used for testing. For the classification of individual gait samples, the CRRs are reported in Fig. 8(a). For the classification of gait sequences, both rank 1 and rank 5 recognition rates are reported in Fig. 8(b) and (c). The calculation of matching scores between two gait sequences follows that in [9]. From Fig. 8(a), starting from four features, UMPCA outperforms the other four methods consistently in terms of CRRs for individual gait samples. Fig. 8(b) and (c) demonstrates that starting from three features, UMPCA gives better recognition performance than all the other four algorithms. These results again show the superiority of UMPCA in low-dimensional projection spaces in unsupervised classification of tensorial samples.

On the other hand, in this experiment, the number of features that can be extracted by the UMPCA algorithm is limited to ten, the lowest mode dimension of the gait sample (from Corollary 1). Such a small number prevents a higher recognition rate from being achieved. This limitation of UMPCA may be partly overcome through combination with other features to enhance the recognition results. For example, MPCA may be combined with UMPCA through the relaxation of the zero-correlation constraint, or PCA may be combined with UMPCA through the relaxation of the projection. Beyond the maximum number of features from UMPCA, the heuristic approach in TROD may also be adopted to generate more features. Furthermore, the aggregation scheme introduced in [6] could also be a possible future research direction.

C. Illustration of the UMPCA Projections

Finally, in order to provide some insights into the UMPCA algorithm, Fig. 9(a) and (b) depicts, as gray-level images, the first three EMPs obtained by UMPCA using the FERET database and the USF gait gallery set, respectively. From the EMPs for the face data, it can be seen that there is a strong presence of structured information due to the multilinear nature of UMPCA, which is different from the information conveyed by the ghost-face-like bases produced by linear algorithms such as eigenface [21] or fisherface [41]. The maximum variation (the first EMP) mainly captures the difference between the forehead area and the rest of the facial image [with reference to Fig. 5(a)]. The second and third EMPs indicate that there are also significant variations around the other facial feature areas, such as the eyes, nose, and mouth. In the EMPs for the gait data, which are third-order tensors displayed as their one-mode unfolded matrices in Fig. 9(b), structure is again observed across the three modes (column, row, and time).


Fig. 8. Detailed gait recognition results by the PCA algorithms on the USF gait database (probe A): (a) CRR for individual gait samples, (b) rank 1 recognition rate for gait sequences, and (c) rank 5 recognition rate for gait sequences.

Fig. 9. Illustration of the first three EMPs (in top-to-bottom order) obtained by UMPCA from (a) the FERET database and (b) the USF gait gallery sequences (one-mode unfolding is shown).

The first gait EMP indicates that the highlighted symmetric areas in the frames encode the most variation, roughly corresponding to the boundary of the human silhouette [with reference to Fig. 5(b)]. The second gait EMP demonstrates that significant variation is encoded in the difference between the upper-body and lower-body movement. The third gait EMP shows that there is also considerable variation in the foot area, as expected. These observations provide insights into the nature of the features encoded by UMPCA and offer a better understanding of this algorithm's performance when applied to certain data sets.
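As a small illustration of the display used in Fig. 9(b), the sketch below unfolds a third-order tensor along its first mode so it can be shown as a gray-level image. This is a minimal sketch under our own assumptions (NumPy, C-order column arrangement; the 32 × 22 × 10 shape is an example consistent with the 7040-element gait samples, whose lowest mode dimension is ten), not the authors' code:

import numpy as np

def unfold_mode1(tensor):
    # One-mode unfolding: rearrange an I1 x I2 x I3 tensor into an
    # I1 x (I2 * I3) matrix whose columns are the mode-1 fibers.
    # (Column ordering here follows C order; other unfolding conventions
    # differ only in a permutation of the columns.)
    return tensor.reshape(tensor.shape[0], -1)

emp = np.random.rand(32, 22, 10)   # stand-in for a gait EMP (7040 entries)
image = unfold_mode1(emp)          # 32 x 220 matrix, displayable as an image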

V. CONCLUSION

This paper has introduced a novel unsupervised learning solution called UMPCA, which is capable of extracting uncorrelated features directly from the tensorial representation through a TVP. This feature extraction problem is solved by successive maximization of the variance captured by each elementary projection while enforcing the zero-correlation constraint. The solution is an iterative procedure based on the alternating projection method. A proof is provided regarding the restrictions on the maximum number of uncorrelated features that can be extracted by the UMPCA procedure. Experiments on face and gait recognition data sets demonstrate that, compared with other unsupervised subspace learning algorithms including PCA, 2DPCA, CSA, TROD, GPCA, and MPCA, the proposed UMPCA solution achieves the best overall results with the same number of features. Moreover, the low-dimensional projection space produced by UMPCA is particularly effective in these recognition tasks.


APPENDIX I
PROOF OF THEOREM 1

Proof: First, Lagrange multipliers can be used to transform the problem in (13) to the following to include all the constraints:

$F(\mathbf{u}_p^{(n)}) = \mathbf{u}_p^{(n)T} \tilde{\mathbf{S}}_T^{(n)} \mathbf{u}_p^{(n)} - \nu \left( \mathbf{u}_p^{(n)T} \mathbf{u}_p^{(n)} - 1 \right) - \sum_{q=1}^{p-1} \mu_q \, \mathbf{u}_p^{(n)T} \tilde{\mathbf{Y}}_p^{(n)} \mathbf{g}_q$   (21)

where $\nu$ and $\{\mu_q, \ q = 1, \ldots, p-1\}$ are Lagrange multipliers. The optimization is performed by setting the partial derivative of $F(\mathbf{u}_p^{(n)})$ with respect to $\mathbf{u}_p^{(n)}$ to zero:

$\frac{\partial F(\mathbf{u}_p^{(n)})}{\partial \mathbf{u}_p^{(n)}} = 2 \tilde{\mathbf{S}}_T^{(n)} \mathbf{u}_p^{(n)} - 2 \nu \, \mathbf{u}_p^{(n)} - \sum_{q=1}^{p-1} \mu_q \, \tilde{\mathbf{Y}}_p^{(n)} \mathbf{g}_q = \mathbf{0}.$   (22)

Multiplying (22) by $\mathbf{u}_p^{(n)T}$ results in

$2 \, \mathbf{u}_p^{(n)T} \tilde{\mathbf{S}}_T^{(n)} \mathbf{u}_p^{(n)} - 2 \nu \, \mathbf{u}_p^{(n)T} \mathbf{u}_p^{(n)} = 0 \;\Rightarrow\; \nu = \mathbf{u}_p^{(n)T} \tilde{\mathbf{S}}_T^{(n)} \mathbf{u}_p^{(n)}$   (23)

which indicates that $\nu$ is exactly the criterion to be maximized, with the constraint on the norm of the projection vector incorporated.

Next, a set of $(p-1)$ equations is obtained by multiplying (22) by $\mathbf{g}_q^T \tilde{\mathbf{Y}}_p^{(n)T}$, $q = 1, \ldots, p-1$, respectively:

$2 \, \mathbf{g}_q^T \tilde{\mathbf{Y}}_p^{(n)T} \tilde{\mathbf{S}}_T^{(n)} \mathbf{u}_p^{(n)} - \sum_{q'=1}^{p-1} \mu_{q'} \, \mathbf{g}_q^T \tilde{\mathbf{Y}}_p^{(n)T} \tilde{\mathbf{Y}}_p^{(n)} \mathbf{g}_{q'} = 0.$   (24)

Let

$\boldsymbol{\mu}_{p-1} = [\mu_1 \ \mu_2 \ \cdots \ \mu_{p-1}]^T$   (25)

and use (16) and (17); then the $(p-1)$ equations of (24) can be represented in a single matrix equation as the following:

$2 \, \mathbf{G}_{p-1}^T \tilde{\mathbf{Y}}_p^{(n)T} \tilde{\mathbf{S}}_T^{(n)} \mathbf{u}_p^{(n)} - \mathbf{G}_{p-1}^T \tilde{\mathbf{Y}}_p^{(n)T} \tilde{\mathbf{Y}}_p^{(n)} \mathbf{G}_{p-1} \, \boldsymbol{\mu}_{p-1} = \mathbf{0}.$   (26)

Thus

$\boldsymbol{\mu}_{p-1} = 2 \left( \mathbf{G}_{p-1}^T \tilde{\mathbf{Y}}_p^{(n)T} \tilde{\mathbf{Y}}_p^{(n)} \mathbf{G}_{p-1} \right)^{-1} \mathbf{G}_{p-1}^T \tilde{\mathbf{Y}}_p^{(n)T} \tilde{\mathbf{S}}_T^{(n)} \mathbf{u}_p^{(n)}.$   (27)

Since, from (17) and (25),

$\sum_{q=1}^{p-1} \mu_q \, \tilde{\mathbf{Y}}_p^{(n)} \mathbf{g}_q = \tilde{\mathbf{Y}}_p^{(n)} \mathbf{G}_{p-1} \, \boldsymbol{\mu}_{p-1}$   (28)

equation (22) can be written as

$\left[ \mathbf{I} - \tilde{\mathbf{Y}}_p^{(n)} \mathbf{G}_{p-1} \left( \mathbf{G}_{p-1}^T \tilde{\mathbf{Y}}_p^{(n)T} \tilde{\mathbf{Y}}_p^{(n)} \mathbf{G}_{p-1} \right)^{-1} \mathbf{G}_{p-1}^T \tilde{\mathbf{Y}}_p^{(n)T} \right] \tilde{\mathbf{S}}_T^{(n)} \mathbf{u}_p^{(n)} = \nu \, \mathbf{u}_p^{(n)}.$

Using the definition in (15), an eigenvalue problem is obtained as $\boldsymbol{\Psi}_p^{(n)} \tilde{\mathbf{S}}_T^{(n)} \mathbf{u}_p^{(n)} = \nu \, \mathbf{u}_p^{(n)}$. Since $\nu$ is the criterion to be maximized, the maximization is achieved by setting $\mathbf{u}_p^{(n)}$ to be the (unit) eigenvector corresponding to the largest eigenvalue of (14).
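As a numerical illustration of this result, the following sketch (under the notation reconstructed above, not the authors' implementation: S stands for the mode-n total scatter matrix in (14), Y for the matrix denoted $\tilde{\mathbf{Y}}_p^{(n)}$, and G for $[\mathbf{g}_1, \ldots, \mathbf{g}_{p-1}]$) obtains the mode-n projection vector as the leading eigenvector:

import numpy as np

def mode_n_vector(S, Y, G):
    # Form Psi = I - YG (YG^T YG)^{-1} YG^T when correlation constraints
    # exist (p > 1); Psi reduces to the identity for p = 1 (empty G).
    I = np.eye(S.shape[0])
    if G.size == 0:
        Psi = I
    else:
        YG = Y @ G
        Psi = I - YG @ np.linalg.solve(YG.T @ YG, YG.T)
    # Psi @ S is in general not symmetric, so use a general eigensolver
    # and keep the eigenvector of the largest (real) eigenvalue.
    evals, evecs = np.linalg.eig(Psi @ S)
    u = np.real(evecs[:, np.argmax(np.real(evals))])
    return u / np.linalg.norm(u)   # unit-norm projection vector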

APPENDIX II
PROOF OF COROLLARY 1

Proof: To prove the corollary, it is only needed to show that for any mode $n$, the number of bases that can satisfy the zero-correlation constraint is upper-bounded by $\min(I_n, M)$.

Considering only one mode $n$, the zero-correlation constraint for mode $n$ in (13) becomes

$\mathbf{u}_p^{(n)T} \tilde{\mathbf{Y}}_p^{(n)} \mathbf{g}_q = 0, \quad q = 1, \ldots, p-1.$   (29)

First, let $\boldsymbol{\phi}_q = \tilde{\mathbf{Y}}_p^{(n)} \mathbf{g}_q$ and the constraint becomes

$\mathbf{u}_p^{(n)T} \boldsymbol{\phi}_q = 0, \quad q = 1, \ldots, p-1.$   (30)

Since $\boldsymbol{\phi}_q \in \mathbb{R}^{I_n}$, when $p - 1 = I_n$, the set $\{\boldsymbol{\phi}_q, \ q = 1, \ldots, I_n\}$ forms a basis for the $I_n$-dimensional space and there is no solution for (30). Thus, $p \leq I_n$.

Second, let $\mathbf{y}_p^{(n)} = \tilde{\mathbf{Y}}_p^{(n)T} \mathbf{u}_p^{(n)}$ and the constraint becomes

$\mathbf{y}_p^{(n)T} \mathbf{g}_q = 0, \quad q = 1, \ldots, p-1.$   (31)

Since $\mathbf{g}_q$, $q = 1, \ldots, p-1$, are orthogonal, $\mathbf{g}_q$, $q = 1, \ldots, p-1$, are linearly independent if the elements of $\mathbf{g}_q$ are not all zero. Since $\mathbf{g}_q \in \mathbb{R}^M$, when $p - 1 = M$, the set $\{\mathbf{g}_q, \ q = 1, \ldots, M\}$ forms a basis for the $M$-dimensional space and there is no solution for (31). Thus, $p \leq M$.

From the above, $p \leq \min(I_n, M)$ if the elements of $\mathbf{g}_q$ are not all zero, which is often the case as long as the projection basis is not initialized to zero and the elements of the training tensors are not all zero.
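The bound just proved is easy to check in code. The following sketch (the function name is our own, and the 32 × 22 × 10 shape and sample count are illustrative) returns the maximum number of uncorrelated features for given mode dimensions and M training samples, reproducing the limit of ten features noted in Section IV for the gait data:

def max_umpca_features(mode_dims, num_samples):
    # Corollary 1: at most min(min_n I_n, M) uncorrelated features.
    return min(min(mode_dims), num_samples)

# Gait samples have lowest mode dimension ten (e.g., 32 x 22 x 10),
# so for any training set of at least ten samples the limit is ten.
print(max_umpca_features((32, 22, 10), 100))   # -> 10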

ACKNOWLEDGMENT

The authors would like to thank the anonymous reviewers for their insightful comments.

REFERENCES

[1] L. D. Lathauwer, B. D. Moor, and J. Vandewalle, "On the best rank-1 and rank-$(R_1, R_2, \ldots, R_N)$ approximation of higher-order tensors," SIAM J. Matrix Anal. Appl., vol. 21, no. 4, pp. 1324–1342, 2000.

[2] C. M. Cyr and B. B. Kimia, "3D object recognition using shape similarity-based aspect graph," in Proc. IEEE Conf. Comput. Vis., Jul. 2001, vol. 1, pp. 254–261.

[3] H. S. Sahambi and K. Khorasani, "A neural-network appearance-based 3-D object recognition using independent component analysis," IEEE Trans. Neural Netw., vol. 14, no. 1, pp. 138–149, Jan. 2003.

[4] K. W. Bowyer, K. Chang, and P. Flynn, "A survey of approaches and challenges in 3D and multi-modal 3D + 2D face recognition," Comput. Vis. Image Understand., vol. 101, no. 1, pp. 1–15, Jan. 2006.

[5] S. Z. Li, C. Zhao, M. Ao, and Z. Lei, "Learning to fuse 3D + 2D based face recognition at both feature and decision levels," in Proc. IEEE Int. Workshop Anal. Model. Faces Gestures, Oct. 2005, pp. 43–53.

[6] H. Lu, K. N. Plataniotis, and A. N. Venetsanopoulos, "Uncorrelated multilinear discriminant analysis with regularization and aggregation for tensor object recognition," IEEE Trans. Neural Netw., vol. 20, no. 1, pp. 103–123, Jan. 2009.


[7] J. Ye, "Generalized low rank approximations of matrices," Mach. Learn., vol. 61, no. 1–3, pp. 167–191, 2005.

[8] J. Ye, R. Janardan, and Q. Li, "GPCA: An efficient dimension reduction scheme for image compression and retrieval," in Proc. 10th ACM SIGKDD Int. Conf. Knowl. Disc. Data Mining, 2004, pp. 354–363.

[9] H. Lu, K. N. Plataniotis, and A. N. Venetsanopoulos, "MPCA: Multilinear principal component analysis of tensor objects," IEEE Trans. Neural Netw., vol. 19, no. 1, pp. 18–39, Jan. 2008.

[10] C. Nolker and H. Ritter, "Visual recognition of continuous hand postures," IEEE Trans. Neural Netw., vol. 13, no. 4, pp. 983–994, Jul. 2002.

[11] C. Faloutsos, T. G. Kolda, and J. Sun, "Mining large time-evolving data using matrix and tensor tools," in Int. Conf. Mach. Learn. Tutorial, 2007 [Online]. Available: http://www.cs.cmu.edu/~christos/TALKS/ICML-07-tutorial/ICMLtutorial.pdf

[12] J. Sun, D. Tao, and C. Faloutsos, "Beyond streams and graphs: Dynamic tensor analysis," in Proc. 12th ACM SIGKDD Int. Conf. Knowl. Disc. Data Mining, Aug. 2006, pp. 374–383.

[13] J. Sun, Y. Xie, H. Zhang, and C. Faloutsos, "Less is more: Sparse graph mining with compact matrix decomposition," Statist. Anal. Data Mining, vol. 1, no. 1, pp. 6–22, Feb. 2008.

[14] J. Sun, D. Tao, S. Papadimitriou, P. S. Yu, and C. Faloutsos, "Incremental tensor analysis: Theory and applications," ACM Trans. Knowl. Disc. Data, vol. 2, no. 3, pp. 11:1–11:37, Oct. 2008.

[15] G. Shakhnarovich and B. Moghaddam, "Face recognition in subspaces," in Handbook of Face Recognition, S. Z. Li and A. K. Jain, Eds. New York: Springer-Verlag, 2004, pp. 141–168.

[16] M. H. C. Law and A. K. Jain, "Incremental nonlinear dimensionality reduction by manifold learning," IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 3, pp. 377–391, Mar. 2006.

[17] J. Yang, D. Zhang, A. F. Frangi, and J. Yang, "Two-dimensional PCA: A new approach to appearance-based face representation and recognition," IEEE Trans. Pattern Anal. Mach. Intell., vol. 26, no. 1, pp. 131–137, Jan. 2004.

[18] D. Xu, S. Yan, L. Zhang, S. Lin, H.-J. Zhang, and T. S. Huang, "Reconstruction and recognition of tensor-based objects with concurrent subspaces analysis," IEEE Trans. Circuits Syst. Video Technol., vol. 18, no. 1, pp. 36–47, Jan. 2008.

[19] H. Lu, K. N. Plataniotis, and A. N. Venetsanopoulos, "A taxonomy of emerging multilinear discriminant analysis solutions for biometric signal recognition," in Biometrics: Theory, Methods, and Applications, N. Boulgouris, K. N. Plataniotis, and E. Micheli-Tzanakou, Eds. New York: Wiley, 2009, ISBN: 978-0-470-24782-2.

[20] I. T. Jolliffe, Principal Component Analysis, 2nd ed., ser. Springer Series in Statistics. New York: Springer-Verlag, 2002.

[21] M. Turk and A. Pentland, "Eigenfaces for recognition," J. Cogn. Neurosci., vol. 3, no. 1, pp. 71–86, 1991.

[22] A. Shashua and A. Levin, "Linear image coding for regression and classification using the tensor-rank principle," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2001, vol. I, pp. 42–49.

[23] D. Tao, X. Li, X. Wu, W. Hu, and S. J. Maybank, "Supervised tensor learning," Knowl. Inf. Syst., vol. 13, no. 1, pp. 1–42, Jan. 2007.

[24] D. Tao, X. Li, X. Wu, and S. J. Maybank, "General tensor discriminant analysis and Gabor features for gait recognition," IEEE Trans. Pattern Anal. Mach. Intell., vol. 29, no. 10, pp. 1700–1715, Oct. 2007.

[25] D. Tao, X. Li, X. Wu, and S. J. Maybank, "Tensor rank one discriminant analysis—A convergent method for discriminative multilinear subspace selection," Neurocomputing, vol. 71, no. 10–12, pp. 1866–1882, Jun. 2008.

[26] D. Xu, S. Yan, L. Zhang, H.-J. Zhang, Z. Liu, and H.-Y. Shum, "Concurrent subspaces analysis," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2005, vol. II, pp. 203–208.

[27] C. Ding and J. Ye, "2-dimensional singular value decomposition for 2D maps and images," in Proc. SIAM Int. Conf. Data Mining, 2005, pp. 24–34.

[28] J. Ye, R. Janardan, Q. Li, and H. Park, "Feature reduction via generalized uncorrelated linear discriminant analysis," IEEE Trans. Knowl. Data Eng., vol. 18, no. 10, pp. 1312–1322, Oct. 2006.

[29] J. Wang, K. N. Plataniotis, and A. N. Venetsanopoulos, "Selecting discriminant eigenfaces for face recognition," Pattern Recognit. Lett., vol. 26, no. 10, pp. 1470–1482, 2005.

[30] J. Wang, K. N. Plataniotis, J. Lu, and A. N. Venetsanopoulos, "On solving the face recognition problem with one training sample per subject," Pattern Recognit., vol. 39, no. 9, pp. 1746–1762, 2006.

[31] Y. Koren and L. Carmel, "Robust linear dimensionality reduction," IEEE Trans. Vis. Comput. Graphics, vol. 10, no. 4, pp. 459–470, Jul.–Aug. 2004.

[32] J. L. Rodgers, W. A. Nicewander, and L. Toothaker, "Linearly independent, orthogonal and uncorrelated variables," Amer. Statistician, vol. 38, no. 2, pp. 133–134, May 1984.

[33] T. K. Moon and W. C. Stirling, Mathematical Methods and Algorithms for Signal Processing. Englewood Cliffs, NJ: Prentice-Hall, 2000.

[34] S. Yan, D. Xu, Q. Yang, L. Zhang, X. Tang, and H. Zhang, "Multilinear discriminant analysis for face recognition," IEEE Trans. Image Process., vol. 16, no. 1, pp. 212–220, Jan. 2007.

[35] Y. Wang and S. Gong, "Tensor discriminant analysis for view-based object recognition," in Proc. Int. Conf. Pattern Recognit., Aug. 2006, vol. 3, pp. 33–36.

[36] D. Tao, X. Li, X. Wu, and S. J. Maybank, "Elapsed time in human gait recognition: A new approach," in Proc. IEEE Int. Conf. Acoust. Speech Signal Process., Apr. 2006, vol. 2, pp. 177–180.

[37] S. Z. Li and A. K. Jain, "Introduction," in Handbook of Face Recognition, S. Z. Li and A. K. Jain, Eds. New York: Springer-Verlag, 2004, pp. 1–11.

[38] M. S. Nixon and J. N. Carter, "Automatic recognition by gait," Proc. IEEE, vol. 94, no. 11, pp. 2013–2024, Nov. 2006.

[39] R. Chellappa, A. Roy-Chowdhury, and S. Zhou, Recognition of Humans and Their Activities Using Video. San Rafael, CA: Morgan & Claypool, 2005.

[40] P. J. Phillips, H. Moon, S. A. Rizvi, and P. Rauss, "The FERET evaluation method for face recognition algorithms," IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, no. 10, pp. 1090–1104, Oct. 2000.

[41] P. N. Belhumeur, J. P. Hespanha, and D. J. Kriegman, "Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection," IEEE Trans. Pattern Anal. Mach. Intell., vol. 19, no. 7, pp. 711–720, Jul. 1997.

Haiping Lu (S'02–M'09) received the B.Eng. and M.Eng. degrees in electrical and electronic engineering from Nanyang Technological University, Singapore, in 2001 and 2004, respectively, and the Ph.D. degree in electrical and computer engineering from the University of Toronto, Toronto, ON, Canada, in 2008.

Currently, he is a Research Fellow at the Institute for Infocomm Research, Agency for Science, Technology and Research (A*STAR), Singapore. Before joining A*STAR, he was a Postdoctoral Fellow at the Edward S. Rogers Sr. Department of Electrical and Computer Engineering, University of Toronto. His current research interests include statistical pattern recognition, machine learning, multilinear algebra, tensor object processing, biometric encryption, and data mining.

Konstantinos N. (Kostas) Plataniotis (S'90–M'92–SM'03) received the B.Eng. degree in computer engineering from the University of Patras, Patras, Greece, in 1988 and the M.S. and Ph.D. degrees in electrical engineering from the Florida Institute of Technology (Florida Tech), Melbourne, in 1992 and 1994, respectively.

He is a Professor with The Edward S. Rogers Sr. Department of Electrical and Computer Engineering, University of Toronto, Toronto, ON, Canada, an Adjunct Professor with the School of Computer Science at Ryerson University, Toronto, ON, Canada, a member of The University of Toronto's Knowledge Media Design Institute, and the Director of Research for the Identity, Privacy and Security Initiative at the University of Toronto. His research interests include biometrics, communications systems, multimedia systems, and signal and image processing.

Dr. Plataniotis is the Editor-in-Chief (2009–2011) of the IEEE SIGNAL PROCESSING LETTERS, a registered professional engineer in the province of Ontario, and a member of the Technical Chamber of Greece. He is the 2005 recipient of IEEE Canada's Outstanding Engineering Educator Award "for contributions to engineering education and inspirational guidance of graduate students" and the corecipient of the 2006 IEEE TRANSACTIONS ON NEURAL NETWORKS Outstanding Paper Award for the paper published in 2003 entitled "Face Recognition Using Kernel Direct Discriminant Analysis Algorithms."


Anastasios N. Venetsanopoulos (S'66–M'69–SM'79–F'88) received the B.S. degree in electrical and mechanical engineering from the National Technical University of Athens (NTU), Athens, Greece, in 1965 and the M.S., M.Phil., and Ph.D. degrees in electrical engineering from Yale University, New Haven, CT, in 1966, 1968, and 1969, respectively.

He joined the Department of Electrical and Computer Engineering, University of Toronto, Toronto, ON, Canada, in September 1968 as a Lecturer and was promoted to Assistant Professor in 1970, Associate Professor in 1973, and Professor in 1981. He has served as Chair of the Communications Group and Associate Chair of the Department of Electrical Engineering. Between July 1997 and June 2001, he was Associate Chair (graduate studies) of the Department of Electrical and Computer Engineering and was Acting Chair during the spring term of 1998–1999. He served as the Inaugural Bell Canada Chair in Multimedia between 1999 and 2005. Between 2001 and 2006, he served as the 12th Dean of the Faculty of Applied Science and Engineering of the University of Toronto, and on October 1, 2006, he accepted the position of founding Vice-President, Research and Innovation, at Ryerson University, Toronto, ON, Canada. He was on research leave at the Imperial College of Science and Technology, the National Technical University of Athens, the Swiss Federal Institute of Technology, the University of Florence, and the Federal University of Rio de Janeiro, and has also served as Adjunct Professor at Concordia University. He has served as a lecturer in 138 short courses to industry and continuing education programs and as a consultant to numerous organizations; he is a contributor to 35 books and has published over 800 papers in refereed journals and conference proceedings on digital signal and image processing and digital communications.

Prof. Venetsanopoulos was elected a Fellow of the IEEE "for contributions to digital signal and image processing"; he is also a Fellow of the Engineering Institute of Canada (EIC) and was awarded an honorary doctorate by the National Technical University of Athens in October 1994. In October 1996, he was awarded the "Excellence in Innovation Award" of the Information Technology Research Centre of Ontario and Royal Bank of Canada "for innovative work in color image processing and its industrial applications." In November 2003, he was the recipient of the "Millennium Medal of IEEE." In April 2001, he became a Fellow of the Canadian Academy of Engineering. In 2003, he received the highest award of the Canadian IEEE, the MacNaughton Award, and in 2003, he served as the Chair of the Council of Deans of Engineering of the Province of Ontario (CODE). In 2006, he was selected as the joint recipient of the 2003 IEEE TRANSACTIONS ON NEURAL NETWORKS Outstanding Paper Award.
