

Pattern Recognition 34 (2001) 361–373

A linear constrained distance-based discriminant analysis for hyperspectral image classification

Qian Du^a, Chein-I Chang^b,*

^a Assistant Professor of Electrical Engineering, Texas A&M University-Kingsville, Kingsville, TX 78363, USA
^b Remote Sensing Signal and Image Processing Laboratory, Department of Computer Science and Electrical Engineering,

University of Maryland Baltimore County, 1000 Hilltop Circle, Baltimore, MD 21250, USA

Received 20 April 1999; received in revised form 15 October 1999; accepted 15 October 1999

Abstract

Fisher's linear discriminant analysis (LDA) is a widely used technique for pattern classification problems. It employs Fisher's ratio, the ratio of the between-class scatter matrix to the within-class scatter matrix, to derive a set of feature vectors by which high-dimensional data can be projected onto a low-dimensional feature space in the sense of maximizing class separability. This paper presents a linear constrained distance-based discriminant analysis (LCDA) that uses a criterion for optimality derived from Fisher's ratio criterion. It not only maximizes the ratio of the inter-distance between classes to the intra-distance within classes but also imposes a constraint that all class centers must be aligned along predetermined directions. When these desired directions are orthogonal, the resulting classifier turns out to have the same operational form as the classifier derived by the orthogonal subspace projection (OSP) approach recently developed for hyperspectral image classification. Because of that, LCDA can be viewed as a constrained version of OSP. In order to demonstrate its performance in hyperspectral image classification, Airborne Visible/InfraRed Imaging Spectrometer (AVIRIS) and HYperspectral Digital Imagery Collection Experiment (HYDICE) data are used for experiments. © 2000 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved.

Keywords: Constrained energy minimization (CEM); Linear constrained distance-based discriminant analysis (LCDA); Linear discriminant analysis (LDA); Orthogonal subspace projection (OSP); Unsupervised LCDA (ULCDA); Unsupervised CEM (UCEM); Unsupervised LDA (ULDA); Unsupervised OSP (UOSP)

1. Introduction

Remotely sensed images generally consist of a set of co-registered images taken at the same time by different spectral channels during data acquisition. Consequently, a remote sensing image is indeed an image cube, with each image pixel represented by a column vector. In addition, due to the large area covered by an instantaneous field of view of remote sensing instruments, an image pixel vector generally contains more than one endmember (substance). This results in a mixture of endmembers resident within the pixel vector rather than the pure pixel considered in classical image processing. So, standard image processing techniques are not directly applicable.

* Corresponding author. Tel.: +1-410-455-3502; fax: +1-410-455-3969.
E-mail address: cchang@umbc.edu (C.-I. Chang).

Many mixed pixel classification methods have been proposed, such as linear unmixing [1–8]. One of the principal differences between pure and mixed pixel classification is that the former is a class membership assignment process, whereas the latter is actually endmember signature abundance estimation. An experiment-based comparative study [9] showed that mixed pixel classification techniques generally performed better than pure pixel classification methods. Additionally, it has also been shown that if mixed pixel classification was converted to pure pixel classification, Fisher's linear discriminant analysis (LDA) [10] was among the best. It becomes obvious that directly applying pure pixel-based LDA to mixed pixel classification problems may not be


an effective way to best utilize LDA. In magnetic resonance imaging (MRI) applications [11], Soltanian-Zadeh et al. recently developed a constrained criterion to characterize brain tissues for 3-D feature representation. Soltanian-Zadeh et al.'s criterion is the ratio of the inter-set distance (IED) to the intra-set distance (IAD) [12], subject to a constraint that each class center must be aligned along some predetermined direction. In order to arrive at an analytic solution, Soltanian-Zadeh et al. further made an assumption of white Gaussian noise, based on which the IED can be reduced to a constant. As a result, maximizing the ratio of IED to IAD is reduced to minimizing IAD. However, the white Gaussian noise assumption may not be valid in hyperspectral images, since it has been demonstrated [8,13] that unknown interference such as background signatures and clutter in hyperspectral imagery is more severe and dominant than noise, which may result in non-Gaussianity and nonstationarity. As a matter of fact, we will show in this paper that such an assumption is not necessary and can be removed by a whitening process. By modifying Soltanian-Zadeh et al.'s approach, a linear constrained distance-based discriminant analysis (LCDA) is developed in this paper for hyperspectral image classification and target detection. Two important aspects resulting from LCDA are worth mentioning. One is to show that minimizing IAD is equivalent to minimizing the trace of the data sample covariance matrix $\Sigma$. In this case, a whitening process can be developed to decorrelate $\Sigma$ without making a Gaussian assumption. The other is to show that, after such whitening is accomplished, LCDA can be simply carried out by orthogonal subspace projection (OSP) [1,7]. In the light of classification, LCDA can be viewed as a constrained version of OSP. As will be shown in the experiments, LCDA performs significantly better than OSP.

One advantage of LCDA is the use of a constraint to steer the class centers of interest along desired, generally orthogonal, directions. Such a constraint allows us to separate different classes as far apart as possible. The idea of using direction constraints is not new and has been found in various applications: the minimum variance distortionless response (MVDR) beamformer in array processing [14,15], chemical remote sensing [16] and constrained energy minimization (CEM) in hyperspectral image classification [17–19]. Of particular interest is a comparative study between LCDA and CEM, since CEM has shown success in some practical applications. The advantage of CEM over LCDA is that CEM is designed for detection of a particular target and requires only knowledge of the desired target to be detected and classified; as a result, however, it is very sensitive to noise. On the other hand, LCDA needs complete knowledge of the target signatures of interest, but by using direction constraints it can perform much better classification than CEM when two targets have very similar signatures, in which case CEM can generally detect one of them, but not both. Furthermore, the advantage of using CEM in unknown image scenes diminishes when LCDA and CEM are extended to their unsupervised versions, where exact target knowledge is not available.

This paper is organized as follows. Section 2 briefly reviews Fisher's LDA and CEM. Section 3 describes LCDA in detail. Section 4 presents a whitening process and an algorithm to implement LCDA. Section 5 extends LCDA, LDA, CEM and OSP to their unsupervised versions. Section 6 conducts experiments using Airborne Visible/InfraRed Imaging Spectrometer (AVIRIS) and HYperspectral Digital Imagery Collection Experiment (HYDICE) data to evaluate the performance of LCDA and ULCDA in comparison with LDA, CEM and OSP. Finally, Section 7 draws some conclusions.

2. Fisher's linear discriminant analysis and constrained energy minimization

In this section, we briefly review the Fisher's linear discriminant analysis (LDA) and constrained energy minimization (CEM) approaches, which will be used in comparison with LCDA.

2.1. Fisher's linear discriminant analysis (LDA)

Let $R^d$ and $R^p$ be d- and p-dimensional vector spaces, respectively, with $p \le d$. Let $\{C_k\}_{k=1}^{c}$ denote the c classes of interest, where $C_k = \{x_1^k, x_2^k, \ldots, x_{N_k}^k\} = \{x_j^k\}_{j=1}^{N_k}$ is the kth class containing $N_k$ patterns, and the jth pattern in class $C_k$, denoted by $x_j^k = (x_{1j}^k\ x_{2j}^k \cdots x_{dj}^k)^T$, is a d-dimensional vector in the space $R^d$. Let $N = N_1 + \cdots + N_c$ be the total number of training patterns. From Fisher's discriminant analysis [10], we can form the total, between- and within-class scatter matrices as follows.

Let $\mu = (1/N)\sum_{k=1}^{c}\sum_{j=1}^{N_k} x_j^k$ be the global mean and $\mu_k = (1/N_k)\sum_{j=1}^{N_k} x_j^k$ the mean of class $C_k$. Then

$$S_T = \sum_{k=1}^{c}\sum_{j=1}^{N_k}(x_j^k - \mu)(x_j^k - \mu)^T, \qquad (1)$$

$$S_W = \sum_{k=1}^{c}\sum_{j=1}^{N_k}(x_j^k - \mu_k)(x_j^k - \mu_k)^T, \qquad (2)$$

$$S_B = \sum_{k=1}^{c} N_k(\mu_k - \mu)(\mu_k - \mu)^T. \qquad (3)$$

From Eqs. (1)–(3),

$$S_T = S_W + S_B. \qquad (4)$$
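For concreteness, the scatter matrices of Eqs. (1)–(3) can be computed directly from labeled samples. The following is a minimal NumPy sketch (ours, not part of the paper; the array layout and the random test data are illustrative assumptions), which also checks the identity of Eq. (4):

```python
import numpy as np

def scatter_matrices(X, labels):
    """Total, within- and between-class scatter matrices, Eqs. (1)-(3).

    X      : (N, d) array, one d-dimensional sample per row.
    labels : (N,) integer array of class indices.
    """
    mu = X.mean(axis=0)                     # global mean
    Xc = X - mu
    S_T = Xc.T @ Xc                         # Eq. (1)
    S_W = np.zeros_like(S_T)
    S_B = np.zeros_like(S_T)
    for k in np.unique(labels):
        Xk = X[labels == k]
        mu_k = Xk.mean(axis=0)              # class mean
        Dk = Xk - mu_k
        S_W += Dk.T @ Dk                    # Eq. (2)
        diff = (mu_k - mu)[:, None]
        S_B += len(Xk) * (diff @ diff.T)    # Eq. (3)
    return S_T, S_W, S_B

# Sanity check of Eq. (4): S_T = S_W + S_B.
X = np.random.randn(60, 5)
labels = np.repeat([0, 1, 2], 20)
S_T, S_W, S_B = scatter_matrices(X, labels)
assert np.allclose(S_T, S_W + S_B)
```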

Assume that $\Omega = \{x_1, x_2, \ldots, x_N\} = \{x_j^k\}_{j=1,k=1}^{N_k,\,c}$ is the set of all training data samples. Fisher's discriminant analysis is to find a $d\times(c-1)$ weight matrix $W = [w_1\ w_2 \cdots w_{c-1}]$ that projects every data sample $x \in \Omega$ in the $R^d$ space to a $y$ in a low-dimensional feature space $R^p$ in such a manner that the projected data samples y yield the best possible class separability, via

$$y = W^T x \qquad (5)$$

with

$$y_k = w_k^T x \quad \text{for } 1 \le k \le c-1, \qquad (6)$$

where $w_k$ is the kth column vector of W, with dimensionality $d\times 1$, and $y = (y_1\ y_2 \cdots y_N)^T$.

Using Eqs. (2) and (3) we can define similar within- and between-class scatter matrices for the projected samples y given by Eq. (5) as follows:

$$\tilde{S}_W = \sum_{k=1}^{c}\sum_{j=1}^{N_k}(y_j^k - \tilde{\mu}_k)(y_j^k - \tilde{\mu}_k)^T, \qquad (7)$$

$$\tilde{S}_B = \sum_{k=1}^{c} N_k(\tilde{\mu}_k - \tilde{\mu})(\tilde{\mu}_k - \tilde{\mu})^T, \qquad (8)$$

where $\tilde{\mu} = (1/N)\sum_{k=1}^{c}\sum_{j=1}^{N_k} y_j^k$ and $\tilde{\mu}_k = (1/N_k)\sum_{j=1}^{N_k} y_j^k$. Substituting Eqs. (2), (3), (5) and (6) into Eqs. (7) and (8) results in

$$\tilde{S}_W = W^T S_W W, \qquad (9)$$

$$\tilde{S}_B = W^T S_B W. \qquad (10)$$

In order to find an optimal linear transformation matrix W in the sense of class separability, we use Fisher's discriminant function ratio, called the Rayleigh quotient, which is the ratio of the between-class scatter to the within-class scatter:

$$J(W) = \frac{|\tilde{S}_B|}{|\tilde{S}_W|} = \frac{|W^T S_B W|}{|W^T S_W W|}, \qquad (11)$$

where $|\cdot|$ is the determinant of a matrix. The optimal solution to Eq. (11), denoted by

$$W^*_{d\times(c-1)} = [w_1^*\ w_2^* \cdots w_{c-1}^*]_{d\times(c-1)}, \qquad (12)$$

can be found by solving the following generalized eigenvalue problem:

$$S_B w_k^* = \lambda_k S_W w_k^* \qquad (13)$$

with $w_k^*$ corresponding to the eigenvalue $\lambda_k$. These $c-1$ eigenvectors $\{w_k^*\}_{k=1}^{c-1}$ form a set of Fisher's linear discriminant functions which can be used in Eq. (6) as

$$y_k = (w_k^*)^T x \quad \text{for } 1 \le k \le c-1. \qquad (14)$$

It should be noted that since there are c classes, only $c-1$ eigenvalues, denoted by $\{\lambda_i\}_{i=1}^{c-1}$, are nonzero. Each eigenvalue $\lambda_i$ generates its own eigenvector $w_i^*$. By means of these eigenvectors $\{w_i^*\}_{i=1}^{c-1}$ we can define a Fisher's discriminant analysis-based optimal linear transformation $T^*$ via Eqs. (5), (12) and (14) by

$$y = T^*(x) = (W^*_{d\times(c-1)})^T x. \qquad (15)$$

For more details, we refer to Ref. [10].
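Eq. (13) is a standard generalized eigenvalue problem and can be solved with off-the-shelf routines. A minimal sketch (ours; it reuses the scatter matrices above and assumes $S_W$ is nonsingular, which in practice may require regularization):

```python
import numpy as np
from scipy.linalg import eigh

def lda_transform(S_B, S_W, c):
    """Solve S_B w = lambda S_W w (Eq. (13)); return W* of Eq. (12)."""
    # eigh handles the generalized symmetric-definite problem and
    # returns eigenvalues in ascending order, so take the last c-1.
    eigvals, eigvecs = eigh(S_B, S_W)
    return eigvecs[:, ::-1][:, :c - 1]

# Projection of Eq. (15): each row of Y is y = (W*)^T x.
# Y = X @ lda_transform(S_B, S_W, c)
```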

2.2. Constrained energy minimization (CEM) [17]

An approach similar to direction constraints, called constrained energy minimization (CEM) [17–19], was previously developed for detection and classification of a desired target. It uses a finite impulse response (FIR) filter to constrain the desired signature with a specific gain while minimizing the filter output power. The idea of CEM arises from the minimum variance distortionless response (MVDR) beamformer in array processing [14,15], with the desired signature interpreted as the signal arriving from a desired direction. It can be derived as follows.

Assume that we are given a finite set of observations $S = \{r_1, r_2, \ldots, r_N\}$, where $r_i = (r_{i1}\ r_{i2} \cdots r_{iL})^T$ for $1 \le i \le N$ is a sample pixel vector. Suppose that the desired signature d is also known a priori. The objective of CEM is to design an FIR linear filter with L filter coefficients $\{w_1, w_2, \ldots, w_L\}$, denoted by an L-dimensional vector $w = (w_1\ w_2 \cdots w_L)^T$, that minimizes the filter output power subject to the following constraint:

$$d^T w = \sum_{l=1}^{L} d_l w_l = 1. \qquad (16)$$

Let $y_i$ denote the output of the designed FIR filter resulting from the input $r_i$. Then $y_i$ can be written as

$$y_i = \sum_{l=1}^{L} w_l r_{il} = w^T r_i = r_i^T w. \qquad (17)$$

So, the average output power produced by the observation set S and the FIR filter with coefficient vector $w = (w_1\ w_2 \cdots w_L)^T$ specified by Eq. (17) is given by

$$\frac{1}{N}\left[\sum_{i=1}^{N} y_i^2\right] = \frac{1}{N}\left[\sum_{i=1}^{N}(r_i^T w)^T r_i^T w\right] = w^T\left(\frac{1}{N}\left[\sum_{i=1}^{N} r_i r_i^T\right]\right)w = w^T R_{L\times L}\, w, \qquad (18)$$

where $R_{L\times L} = (1/N)\sum_{i=1}^{N} r_i r_i^T$ turns out to be the $L\times L$ sample autocorrelation matrix of S.

Minimizing Eq. (18) with the filter response constraint $d^T w = \sum_{l=1}^{L} d_l w_l = 1$ yields

$$\min_w\left\{\frac{1}{N}\left[\sum_{i=1}^{N} y_i^2\right]\right\} = \min_w\{w^T R_{L\times L}\, w\} \quad \text{subject to } d^T w = 1. \qquad (19)$$

The solution to Eq. (19) was shown in [17] to be the constrained energy minimization (CEM) classifier, with the weight vector $w^*$ given by

$$w^* = \frac{R_{L\times L}^{-1} d}{d^T R_{L\times L}^{-1} d}. \qquad (20)$$
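Eq. (20) is a closed form and is straightforward to implement. A minimal NumPy sketch (ours; the diagonal loading is a practical safeguard we add for a possibly ill-conditioned $R_{L\times L}$, not part of the original derivation):

```python
import numpy as np

def cem_weights(R, d, loading=1e-6):
    """CEM weight vector, Eq. (20): w* = R^{-1} d / (d^T R^{-1} d).

    R : (L, L) sample autocorrelation matrix, (1/N) sum_i r_i r_i^T.
    d : (L,) desired target signature.
    """
    R = R + loading * np.eye(R.shape[0])  # guard against singular R
    Rinv_d = np.linalg.solve(R, d)        # R^{-1} d without an explicit inverse
    return Rinv_d / (d @ Rinv_d)

# Detector output of Eq. (17) for all pixels (one pixel vector per row):
# R = pixels.T @ pixels / len(pixels)
# scores = pixels @ cem_weights(R, d)
```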

3. Linear constrained distance-based discriminant analysis

In Fisher's discriminant analysis, the discriminant functions are generated from Fisher's ratio with no constraints on their directions. In many practical applications, a priori knowledge can be used to constrain desired features along certain directions so as to minimize the effects of undesired features. As discussed in Section 2.2, CEM constrains a desired signature with a specific gain so that its output energy is minimized. In MRI [11], Soltanian-Zadeh et al. constrained normal tissues along prespecified target positions so as to achieve better clustering for visualization. In this section, we follow a similar idea [11] to derive a constrained discriminant analysis for hyperspectral image classification.

First of all, assume that a linear transformation T is used to project a high-dimensional data space into a low-dimensional space, using the ratio of the average between-class distance to the average within-class distance as the criterion for optimality [11], given by

$$J(T) = \frac{\dfrac{2}{c(c-1)}\sum_{i=1}^{c}\sum_{j=i+1}^{c}\|T(\mu_i) - T(\mu_j)\|^2}{\dfrac{1}{cN}\sum_{k=1}^{c}\left[\sum_{j=1}^{N_k}\|T(x_j^k) - T(\mu_k)\|^2\right]}, \qquad (21)$$

where the global mean $\mu$ and the local mean $\mu_k$ of the kth class were defined in the previous section. Now suppose that there are p classes of particular interest, with $p \le c$, denoted by $\{C_k\}_{k=1}^{p}$ without loss of generality. We can maximize Eq. (21) subject to the constraint that the desired class means $\{\mu_k\}_{k=1}^{p}$ lie along the prespecified target directions $\{t_k\}_{k=1}^{p}$. What we seek is an optimal linear transformation $T^*$ which maximizes

$$J(T) \quad \text{subject to } t_k = T(\mu_k) \text{ for all } k \text{ with } 1 \le k \le p \le c. \qquad (22)$$

Eqs. (21) and (22) outline a general constrained optimization problem which can be solved numerically. Like Eq. (15), we can view the linear transformation T in Eq. (22) as a matrix transform specified by a weight matrix $W_{d\times p}$:

$$y = T(x) = W_{d\times p}^T x, \qquad (23)$$

where $W_{d\times p} = [w_1\ w_2 \cdots w_p]_{d\times p}$ is given similarly to Eq. (12).

Now if we further assume that $p = c$, that $t_k$ in Eq. (22) is the unit vector along the kth coordinate of the space $R^p$, i.e., $t_k = (0 \cdots 0\ 1_k\ 0 \cdots 0)^T$ is a $p\times 1$ unit column vector with a one in the kth component and zeros elsewhere, and that the $\{w_i\}_{i=1}^{p}$ in $W_{d\times p}$ are linearly independent, then Eq. (22) becomes: maximize

$$J(T) \quad \text{subject to } w_i^T \mu_k = \delta_{ik} \text{ for } 1 \le i, k \le p, \qquad (24)$$

where $\delta_{ik}$ is the Kronecker delta:

$$\delta_{ik} = \begin{cases} 1 & \text{if } i = k, \\ 0 & \text{if } i \ne k. \end{cases}$$

From Ref. [20], the numerator and denominator of J(T) in Eq. (21) can be further reduced to

$$\frac{2}{p(p-1)}\sum_{i=1}^{p}\sum_{j=i+1}^{p}\|T(\mu_i) - T(\mu_j)\|^2 = \frac{2}{p(p-1)}\sum_{i=1}^{p-1}\sum_{j=i+1}^{p}[2] = 2 \qquad (25)$$

and

$$\frac{1}{pN}\sum_{k=1}^{p}\left[\sum_{j=1}^{N_k}\|T(x_j^k) - T(\mu_k)\|^2\right] = \frac{1}{pN}\,\mathrm{trace}\!\left(W^T\left\{\sum_{k=1}^{p}\left[\sum_{j=1}^{N_k}(x_j^k - \mu_k)(x_j^k - \mu_k)^T\right]\right\}W\right). \qquad (26)$$

Since the numerator given by Eq. (25) is a constant, maximizing Eq. (24) reduces to minimizing Eq. (26), namely

$$\min_W \mathrm{trace}\!\left(W^T\left\{\sum_{k=1}^{p}\left[\sum_{j=1}^{N_k}(x_j^k - \mu_k)(x_j^k - \mu_k)^T\right]\right\}W\right) \quad \text{subject to } w_i^T \mu_k = \delta_{ik}. \qquad (27)$$

It should be noted that the term in braces in Eq. (27) turns out to be $S_W$. From Eq. (4), minimizing $S_W$ is equivalent to minimizing $S_T$. So, if we let $\Sigma$ denote the sample covariance matrix of all training samples, then

$$S_T = N \cdot \Sigma. \qquad (28)$$

Since N is a constant, substituting Eq. (28) into Eq. (27) yields the following equivalent problem:

$$\min_W \mathrm{trace}(W^T \Sigma W) \quad \text{subject to } w_i^T \mu_k = \delta_{ik}. \qquad (29)$$

Now the problem reduces to finding a matrix A which decorrelates and whitens the covariance matrix $\Sigma$ into an identity matrix, so that the Gram–Schmidt orthogonalization procedure can be employed to further orthogonalize all the $A^T\mu_k$'s. Assume that such a matrix A exists. Eq. (29) can then be reduced to a very simple optimization problem given by

$$\min_W \mathrm{trace}(W^T \Sigma W) = \min_W\left\{\sum_{i=1}^{p} w_i^T w_i\right\} = \min_W\left\{\sum_{i=1}^{p}\|w_i\|^2\right\} \quad \text{subject to } w_i^T \hat{\mu}_k = \delta_{ik}, \qquad (30)$$

where $\{\hat{\mu}_i\}_{i=1}^{p}$ are the orthogonal vectors resulting from applying the Gram–Schmidt orthogonalization procedure to the $A^T\mu_k$'s. Since $\|w_i\|^2$ is nonnegative for each i with $1 \le i \le p$, Eq. (30) is also equivalent to

$$\max_{w_i}\,(w_i^T w_i)^{-1/2} \quad \text{subject to } w_i^T \hat{\mu}_k = \delta_{ik} \text{ for each } i \text{ with } 1 \le i \le p, \qquad (31)$$

which can be solved analytically [11] for each $w_i^*$, given by

$$w_i^* = \hat{\mu}_i^T P_U^{\perp}, \qquad (32)$$

where

$$P_U^{\perp} = I - U(U^T U)^{-1} U^T \qquad (33)$$

and U is the matrix whose columns are $\{\hat{\mu}_j\}_{j=1,\,j\ne i}^{p}$, i.e., all cluster centers except $\hat{\mu}_i$. Surprisingly, the solution specified by Eq. (32) turns out to be the orthogonal subspace projection classifier [7]. So, the classifier $w_i^* = \hat{\mu}_i^T P_U^{\perp}$ in LCDA can be viewed as a constrained version of the OSP classifier.

A comment on LCDA is noteworthy. Despite the fact that the criteria used in LDA and LCDA look similar, both being ratios of a between-class measure to a within-class measure, there are differences between LDA and LCDA. LDA uses the scatter matrices derived from between-class and within-class variation, while LCDA makes use of the inter-distance between classes and the intra-distance within classes to arrive at the criterion specified by Eq. (21). Another difference, also noted in [11], is that the number of discriminant functions resulting from LDA is one less than the total number of classes of interest, p, whereas the number of projection vectors used in LCDA equals the number of classes of interest, p. Most significantly, LCDA can be viewed as a variation of OSP, as shown above, whereas LDA cannot.

4. Implementation of LCDA using a whitening process

As mentioned, in order to reduce the optimization problem given by Eq. (29) to the one specified by Eq. (30), we need to find the matrix A. In Ref. [11], Soltanian-Zadeh et al. made the white Gaussian noise assumption for MRI to arrive at Eq. (30). As a matter of fact, we can find the matrix A without making such an assumption.

Assume that $\{\lambda_i\}_{i=1}^{d}$ are the eigenvalues of the sample covariance matrix $\Sigma$ (or the total scatter matrix $S_T$) and $\{v_i\}_{i=1}^{d}$ are their corresponding eigenvectors. Since $S_T$ (or $\Sigma$) is nonnegative definite, all eigenvalues are nonnegative and there exists a unitary matrix Q such that $\Sigma$ can be decomposed into

$$Q^T \Sigma Q = \Lambda, \qquad (34)$$

where $Q = [v_1\ v_2 \cdots v_d]$ is the matrix made up of the eigenvectors $\{v_i\}_{i=1}^{d}$ and $\Lambda = \mathrm{diag}\{\lambda_i\}_{i=1}^{d}$ is the diagonal matrix with the $\{\lambda_i\}_{i=1}^{d}$ on its diagonal.

If we let $\Lambda^{-1/2} = \mathrm{diag}\{1/\sqrt{\lambda_i}\}_{i=1}^{d}$, multiplying both sides of Eq. (34) by $\Lambda^{-1/2}$ results in

$$\Lambda^{-1/2} Q^T \Sigma Q \Lambda^{-1/2} = I. \qquad (35)$$

From Eq. (35), we obtain the desired matrix A for Eq. (29), given by

$$A = Q\Lambda^{-1/2} \qquad (36)$$

so that $A^T \Sigma A = I$. Using Eqs. (36) and (32), we can solve the linear constrained Euclidean distance-based discriminant analysis optimization problem specified by Eq. (30) as follows:

Algorithm to Implement LCDA

1. Find the cluster centers $\{\mu_k\}_{k=1}^{p}$ of the p classes and the total scatter matrix $S_T$ or covariance matrix $\Sigma$.
2. Find the eigenvalues $\{\lambda_i\}_{i=1}^{d}$ and their corresponding eigenvectors $\{v_i\}_{i=1}^{d}$ of $S_T$ or $\Sigma$ to form the unitary matrix $Q = [v_1\ v_2 \cdots v_d]$ and the diagonal matrix $\Lambda = \mathrm{diag}\{\lambda_i\}_{i=1}^{d}$.
3. Form the desired matrix $A = Q\Lambda^{-1/2}$ using Eq. (36).
4. Find the A-transformed cluster centers $A^T\mu_i$ for each i with $1 \le i \le p$.
5. Apply the Gram–Schmidt orthogonalization procedure to orthogonalize $\{A^T\mu_i\}_{i=1}^{p}$, producing the corresponding orthogonal vectors $\{\hat{\mu}_i\}_{i=1}^{p}$.
6. Solve the linear constrained Euclidean distance-based discriminant analysis optimization problem specified by Eq. (30) by $w_i^* = \hat{\mu}_i^T P_U^{\perp}$ with $P_U^{\perp}$ given by Eq. (33). This is the classification step, where $w_i^*$ is used to classify data samples into the ith class.
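For concreteness, the six steps above amount to a few lines of linear algebra. A minimal NumPy sketch (ours, not the paper's; the sample layout, the eigenvalue floor and the QR-based Gram–Schmidt with sign fixing are implementation assumptions):

```python
import numpy as np

def lcda_weights(X, class_means, floor=1e-12):
    """Steps 1-6 of the LCDA algorithm.

    X           : (N, d) training samples.
    class_means : (p, d) cluster centers mu_1..mu_p (step 1).
    Returns (W, A): row i of W is w_i* of Eq. (32), to be applied
    to whitened pixels A^T x.
    """
    # Steps 1-3: eigendecompose Sigma, form A = Q Lambda^{-1/2} (Eq. (36)).
    Sigma = np.cov(X, rowvar=False)
    lam, Q = np.linalg.eigh(Sigma)               # Sigma = Q diag(lam) Q^T
    A = Q / np.sqrt(np.maximum(lam, floor))      # scales each column of Q
    # Step 4: A-transformed cluster centers A^T mu_i, one per column.
    M = A.T @ class_means.T                      # (d, p)
    # Step 5: Gram-Schmidt via QR; fix signs to match the classical procedure.
    mu_hat, R = np.linalg.qr(M)
    mu_hat = mu_hat * np.sign(np.diag(R))
    # Step 6: w_i* = mu_hat_i^T P_U^perp (Eqs. (32) and (33)).
    d, p = M.shape
    W = np.zeros((p, d))
    for i in range(p):
        U = np.delete(mu_hat, i, axis=1)
        P = np.eye(d) - U @ np.linalg.solve(U.T @ U, U.T)
        W[i] = mu_hat[:, i] @ P
    return W, A

# Abundance-like score of a pixel x for the ith class: (W @ (A.T @ x))[i]
```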

5. Unsupervised LCDA

LCDA described in Section 3 requires training samples to generate the data sample covariance matrix $\Sigma$. In order to extend LCDA to unsupervised LCDA, many unsupervised clustering methods can be used for this purpose, for instance ISODATA or K-means clustering [10,12]. In hyperspectral image classification, we can design an unsupervised LCDA by taking advantage of an algorithm called the target generation process (TGP), used in the automatic target detection and classification algorithm [21–23].

Fig. 1. An AVIRIS image scene.

The idea of TGP can be briefly described as follows. TGP is first initialized by selecting the pixel vector with the maximum length as an initial target, denoted by $T_0$. We then employ an orthogonal subspace projector $P^{\perp}_{T_0}$ via Eq. (33), with $U = T_0$, to project all image pixel vectors into the orthogonal complement of $\langle T_0\rangle$, denoted by $\langle T_0\rangle^{\perp}$. The pixel vector with the maximum length in $\langle T_0\rangle^{\perp}$ is selected as the first target, denoted by $T_1$. The reason for this selection is the following: since $T_1$ has the maximum projection in $\langle T_0\rangle^{\perp}$, $T_1$ should have the most distinct features from $T_0$. Then we create the first set $U_1$ by letting $U_1 = T_1$ and calculate the Orthogonal Projection Correlation Index (OPCI), defined by $\eta_1 = T_0^T P^{\perp}_{(T_0\,U_1)} T_0$, to see whether it is less than a prescribed threshold. If it is, TGP stops generating targets. Otherwise, TGP continues to search for a second target by applying $P^{\perp}_{(T_0 T_1)}$ again to the image. The pixel vector with the maximum length in $\langle T_0, T_1\rangle^{\perp}$ is selected as the second target, denoted by $T_2$. Then, once again, we calculate $\eta_2 = \mathrm{OPCI}(T_0, U_2) = T_0^T P^{\perp}_{(T_0\,U_2)} T_0$ with $U_2 = (T_1\ T_2)$ to determine whether TGP should be terminated. If not, the same procedure is repeated to find a third target, a fourth target, and so on, until at the ith step $\eta_i = \mathrm{OPCI}(T_0, U_i) = T_0^T P^{\perp}_{(T_0\,U_i)} T_0$ is small enough, i.e., less than the prescribed threshold.

The objective of TGP is to generate a set of potential targets in an unknown image without any prior knowledge. Because of that, each TGP-generated target may not be a real target and could be one of its neighboring pixel vectors, due to interfering effects. In order to produce a robust signature matrix M, several metrics can be used for this purpose, such as the Euclidean distance (ED) [10], the spectral angle (SA) [1] and the spectral information divergence (SID) [24,25]. Assume that $\{T_k\}_{k=1}^{c}$ is the set of targets generated by TGP and $m(x, y)$ is a metric measuring the closeness of two samples x and y. Then, for each $1 \le k \le c$, the training class of $T_k$, denoted by $C_k$, comprises all pixel vectors whose distance from $T_k$ measured by $m(\cdot,\cdot)$ is less than a prescribed threshold $\varepsilon$. That is,

$$C_k = \{x^k \mid m(x^k, T_k) < \varepsilon\}, \qquad (37)$$

where $\varepsilon$ is a prescribed distance threshold. It should be noted that $T_k$ is included in its own class $C_k$. By virtue of these training classes $\{C_k\}_{k=1}^{c}$, ULCDA can be implemented by LCDA.
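The TGP loop and the class formation of Eq. (37) can be sketched as follows. This is our simplified reading of the procedure above (not the authors' code); the OPCI threshold, the cap on the number of targets and the choice of metric m are left as user-supplied assumptions (the paper suggests ED, SA or SID for m):

```python
import numpy as np

def perp_projector(U):
    """P_U^perp = I - U (U^T U)^{-1} U^T, Eq. (33)."""
    return np.eye(U.shape[0]) - U @ np.linalg.solve(U.T @ U, U.T)

def tgp(pixels, opci_threshold, max_targets=20):
    """Target generation process.  pixels: (N, L), one pixel vector per row."""
    t0 = pixels[np.argmax(np.linalg.norm(pixels, axis=1))]  # initial target T_0
    targets = [t0]
    while len(targets) < max_targets:
        U = np.stack(targets, axis=1)        # (L, k): targets found so far
        residual = pixels @ perp_projector(U)
        t_next = pixels[np.argmax(np.linalg.norm(residual, axis=1))]
        # OPCI: how much of T_0 survives after projecting out (U, t_next).
        opci = t0 @ perp_projector(np.column_stack([U, t_next])) @ t0
        if opci < opci_threshold:            # T_0 nearly explained: stop
            break
        targets.append(t_next)
    return targets

def form_classes(pixels, targets, metric, eps):
    """Training classes of Eq. (37): C_k = {x : m(x, T_k) < eps}."""
    return [pixels[np.array([metric(x, t) < eps for x in pixels])]
            for t in targets]
```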

6. Experimental results

In this section, hyperspectral data will be used to evaluate the performance of LCDA and ULCDA in comparison with the results produced by OSP [7] and CEM [17–19].

Example 1 (AVIRIS experiments). The AVIRIS data used in the experiments were the same data considered in Ref. [7]. The scene is a 200×200-pixel subscene extracted from the upper left corner of the Lunar Crater Volcanic Field in Northern Nye County, Nevada, shown in Fig. 1, where the five signatures of interest are "red oxidized basaltic cinders", "rhyolite", "playa (dry lakebed)", "shade" and "vegetation". In this case, p = 5 and d = 224 is the number of bands. LCDA used the constrained unit vectors $\{t_k\}_{k=1}^{5}$ in Eq. (22) to steer the five desired signatures along five orthogonal directions. Fig. 2 shows the results of LCDA, LDA, CEM and OSP, where the images in the first, second, third and fourth columns were produced by LCDA, LDA, CEM and OSP, respectively. Images labeled (a), (b), (c) and (d) show the targets cinders, rhyolite, playa and vegetation, respectively, and images labeled (e) are the results for shade. The images are arranged in such a fashion that their counterparts can be compared in parallel. As we can see from Fig. 2, the results produced by LCDA and CEM are comparable, and both performed better than LDA and OSP in detection and classification of all five target signatures, specifically in classifying cinders, vegetation and shade. Here, LDA was carried out by performing LDA on the image data, followed by a minimum-distance classifier for class-membership assignment. Since it worked as a pure pixel classifier rather than a mixed pixel classifier estimating target signature abundance as the other three methods do, the images produced by LDA are binary, as opposed to the gray-scale images produced by LCDA, CEM and OSP. Therefore, it is not surprising that LDA produced the worst results.

Fig. 2. Results of LCDA, LDA, CEM and OSP. First column: (a) cinders classified by LCDA; (b) rhyolite classified by LCDA; (c) playa classified by LCDA; (d) vegetation classified by LCDA; (e) shade classified by LCDA. Second column: (a) cinders classified by LDA; (b) rhyolite classified by LDA; (c) playa classified by LDA; (d) vegetation classified by LDA; (e) shade classified by LDA. Third column: (a) cinders classified by CEM; (b) rhyolite classified by CEM; (c) playa classified by CEM; (d) vegetation classified by CEM; (e) shade classified by CEM. Fourth column: (a) cinders classified by OSP; (b) rhyolite classified by OSP; (c) playa classified by OSP; (d) vegetation classified by OSP; (e) shade classified by OSP.

In order to see the performance of ULCDA, we adopted SID [24,25] as a spectral metric to measure the closeness or similarity of two pixel vectors in the scene. The unsupervised clustering used in ULCDA was the nearest neighbor rule (NNR) [10,12]. We also used TGP, SID and NNR to extend LDA, OSP and CEM to their unsupervised counterparts, referred to as ULDA, UCEM and UOSP. The purpose of using SID is to relax the sensitivity of the knowledge used in classification. Since there is no prior knowledge about the signatures, the target pixel vectors generated by TGP may not be true target vectors, due to interference and noise. Using SID in Eq. (37) allows us to group pixel vectors whose signatures are similar and close to the desired target vectors, and to use their spectral signature averages as the target class centers. The results are shown in Fig. 3. All the images are arranged in the same manner as in Fig. 2. It is interesting to note that in Fig. 3, except for ULDA, which did much worse than LDA, the classification results obtained by ULCDA, UCEM and UOSP are nearly the same as those obtained by their supervised counterparts in Fig. 2, even though no a priori signature knowledge is available. The reason for this is that the a priori target signature information required for mixed pixel classification can be compensated by its estimated abundance, while ULDA does not have such an advantage.

Fig. 3. Results of ULCDA, ULDA, UCEM and UOSP with SID. First column: (a) cinders classified by ULCDA; (b) rhyolite classified by ULCDA; (c) playa classified by ULCDA; (d) vegetation classified by ULCDA; (e) shade classified by ULCDA. Second column: (a) cinders classified by ULDA; (b) rhyolite classified by ULDA; (c) playa classified by ULDA; (d) vegetation classified by ULDA; (e) shade classified by ULDA. Third column: (a) cinders classified by UCEM; (b) rhyolite classified by UCEM; (c) playa classified by UCEM; (d) vegetation classified by UCEM; (e) shade classified by UCEM. Fourth column: (a) cinders classified by UOSP; (b) rhyolite classified by UOSP; (c) playa classified by UOSP; (d) vegetation classified by UOSP; (e) shade classified by UOSP.

Fig. 4. A HYDICE scene.

Example 2 (HYDICE experiments). In this example, a HYDICE scene was used to evaluate the performance of LCDA, LDA, CEM and OSP and their unsupervised counterparts. The scene considered is exactly the same one used in [13], reproduced in Fig. 4, with 1.5-m spatial resolution. Four vehicles of two different types are parked along the tree line, labeled from top to bottom by $V_1$, $V_2$, $V_3$, $V_4$, and one man-made object, denoted by Obj, is near the center of the scene. While the top three, $V_1$, $V_2$, $V_3$, belong to one type of vehicle denoted by V1, the bottom vehicle $V_4$ belongs to another type denoted by V2. Of particular interest in this scene is that the ground truth provides precise pixel locations of all four vehicles as well as the object, so that we can verify the detection and classification results for each target for the different methods. Fig. 5 shows the results of LCDA, LDA, CEM and OSP, with the images arranged in the same fashion as those in Fig. 2. Images labeled (a), (b) and (c) show the target detection and classification of Obj, V1 and V2, respectively. From Fig. 5, LCDA performed better than LDA, CEM and OSP in overall performance. An interesting finding is that LDA actually performed better than LCDA in detection of $V_4$ but did not work as well as LCDA in detection of V1 and Obj, where many false alarms occurred in the LDA detection. The reason for this is that LDA is a pure pixel classification technique, while LCDA is a mixed pixel classification method (as are CEM and OSP). As a result, the gray-level values of mixed pixels in the images generated by LCDA reflect the abundance fractions of a particular detected target. Because the spectral signature of $V_3$ is very similar to that of $V_4$, LCDA also detected a very small fraction of $V_3$ while detecting $V_4$ in Fig. 5(c).

The same phenomenon was much worse in the CEM-generated and OSP-generated images shown in Fig. 5(b) and (c), where both methods had difficulty differentiating the two vehicles $V_3$ and $V_4$. Unlike CEM and OSP, LCDA managed to mitigate this problem by making use of constraints to steer $V_3$ and $V_4$ in such a way that both were forced to be separated along orthogonal directions. Consequently, only a barely visible amount of $V_3$ abundance was detected and classified in the LCDA-generated image in Fig. 5(c). Furthermore, comparing the images in the first and third columns of Fig. 5, we can see that CEM extracted more abundance of $V_3$ than did LCDA in detection of $V_4$. Nevertheless, CEM and LCDA performed significantly better than OSP. In detection and classification of V1, LCDA also performed better than LDA, CEM and OSP, as shown in the images of Fig. 5. Although OSP also correctly classified V1, it extracted some natural background signatures, tree and road, as well. On the contrary, CEM nulled out all background signatures but also inevitably extracted some fraction of $V_4$. Since the spectral signature of Obj is very distinct from those of the four vehicles, OSP, CEM and LCDA all performed well in this case. These HYDICE experiments further demonstrate that in discriminating targets with similar spectral signatures, LCDA is not only superior to OSP but also performs slightly better than CEM.

Fig. 5. Results of LCDA, LDA, CEM and OSP. First column: (a) Obj classified by LCDA; (b) V1 classified by LCDA; (c) V2 classified by LCDA. Second column: (a) Obj classified by LDA; (b) V1 classified by LDA; (c) V2 classified by LDA. Third column: (a) Obj classified by CEM; (b) V1 classified by CEM; (c) V2 classified by CEM. Fourth column: (a) Obj classified by OSP; (b) V1 classified by OSP; (c) V2 classified by OSP.

As noted above, the spectral signatures of $V_3$ and $V_4$ are very similar. So, when ULCDA, ULDA, UCEM and UOSP were applied to the scene in Fig. 4, the results were interesting. Since there is no a priori knowledge about the targets, $V_3$ and $V_4$ were treated as different targets. As a result, four categories of targets, Obj, V1($V_1$, $V_2$), V1($V_3$) and V2($V_4$), were detected in Fig. 6, where the images are arranged in the same way as those in Fig. 5 but V1 has been split into two different classes. As shown in Fig. 6, ULCDA did not work as well as LCDA did in Fig. 5 but still performed reasonably well in general. ULDA did well in pulling out Obj and V1($V_3$) but did poorly in detecting V1($V_1$, $V_2$) and V2($V_4$), with many false alarms. Of particular interest is UCEM, which did as well as ULCDA and whose performance was actually improved when $V_3$ was considered a separate type of vehicle. UOSP performed well in the sense of target detection, but its performance was slightly offset by the extraction of small abundance fractions of some background signatures.

7. Conclusion

Linear discriminant analysis has been well accepted as a major technique in pattern classification. It can also be applied to hyperspectral image classification [9]. This paper presents a similar but different approach to LDA, called LCDA, which replaces Fisher's ratio with the ratio of inter-distance to intra-distance as the criterion for optimality. The advantage of LCDA over LDA is that it constrains the class centers along desired orthogonal directions. Consequently, all the classes of interest are forced apart, each orthogonal to the others. By means of this direction constraint, LCDA can detect and classify similar targets. It is particularly useful for very high spatial resolution hyperspectral imagery such as HYDICE [8], where the size of targets ranges from 1 to 4 meters. Additionally, LCDA and CEM can be extended to an unsupervised mode for unknown image scenes when no a priori signature knowledge is available. The experimental results are very impressive and almost as good as those of their supervised counterparts.

Fig. 6. Results of ULCDA, ULDA, UCEM and UOSP. First column: (a) Obj classified by ULCDA; (b) V1($V_1$, $V_2$) classified by ULCDA; (c) V1($V_3$) classified by ULCDA; (d) V2($V_4$) classified by ULCDA. Second column: (a) Obj classified by ULDA; (b) V1($V_1$, $V_2$) classified by ULDA; (c) V1($V_3$) classified by ULDA; (d) V2($V_4$) classified by ULDA. Third column: (a) Obj classified by UCEM; (b) V1($V_1$, $V_2$) classified by UCEM; (c) V1($V_3$) classified by UCEM; (d) V2($V_4$) classified by UCEM. Fourth column: (a) Obj classified by UOSP; (b) V1($V_1$, $V_2$) classified by UOSP; (c) V1($V_3$) classified by UOSP; (d) V2($V_4$) classified by UOSP.


Acknowledgements

The authors would like to thank Dr. Harsanyi of Applied Signal and Image Technology Inc. for providing the AVIRIS data for the experiments conducted in this paper.

References

[1] R.A. Schowengerdt, Remote Sensing: Models and Methods for Image Processing, 2nd Edition, Academic Press, New York, 1997.
[2] J.W. Boardman, Inversion of imaging spectrometry data using singular value decomposition, Proceedings of IEEE Symposium on Geoscience and Remote Sensing, 1989, pp. 2069–2072.
[3] J.J. Settle, On the relationship between spectral unmixing and subspace projection, IEEE Trans. Geosci. Remote Sensing 34 (1996) 1045–1046.
[4] Y.E. Shimabukuro, J.A. Smith, The least-squares mixing models to generate fraction images derived from remote sensing multispectral data, IEEE Trans. Geosci. Remote Sensing 29 (1) (1991) 16–20.
[5] T.M. Tu, C.-H. Chen, C.-I. Chang, A least squares orthogonal subspace projection approach to desired signature extraction and detection, IEEE Trans. Geosci. Remote Sensing 35 (1) (1997) 127–139.
[6] C.-I. Chang, X. Zhao, M.L.G. Althouse, J.-J. Pan, Least squares subspace projection approach to mixed pixel classification in hyperspectral images, IEEE Trans. Geosci. Remote Sensing 36 (3) (1998) 898–912.
[7] J.C. Harsanyi, C.-I. Chang, Hyperspectral image classification and dimensionality reduction: an orthogonal subspace projection approach, IEEE Trans. Geosci. Remote Sensing 32 (4) (1994) 779–785.
[8] C.-I. Chang, T.-L.E. Sun, M.L.G. Althouse, An unsupervised interference rejection approach to target detection and classification for hyperspectral imagery, Opt. Engng 37 (3) (1998) 735–743.
[9] C.-I. Chang, H. Ren, An experiment-based quantitative and comparative analysis of hyperspectral target detection and image classification algorithms, IEEE Trans. Geosci. Remote Sensing (to appear).
[10] R.O. Duda, P.E. Hart, Pattern Classification and Scene Analysis, Wiley, New York, 1973.
[11] H. Soltanian-Zadeh, P. Windham, D.J. Peck, Optimal linear transformation for MRI feature extraction, IEEE Trans. Med. Imaging 15 (6) (1996) 749–767.
[12] J.T. Tou, R.C. Gonzalez, Pattern Recognition Principles, Addison-Wesley, Reading, MA, 1974.
[13] C.-I. Chang, Q. Du, Interference and noise adjusted principal components analysis, IEEE Trans. Geosci. Remote Sensing 37 (5) (1999) 2387–2396.
[14] B.D. Van Veen, K.M. Buckley, Beamforming: a versatile approach to spatial filtering, IEEE ASSP Mag. (1988) 4–24.
[15] S. Haykin, Adaptive Filter Theory, 3rd Edition, Prentice-Hall, Englewood Cliffs, NJ, 1996.
[16] M.L.G. Althouse, C.-I. Chang, Chemical vapor detection with a multispectral thermal imager, Opt. Engng 30 (11) (1991) 1725–1733.
[17] J.C. Harsanyi, Detection and classification of subpixel spectral signatures in hyperspectral image sequences, Ph.D. Dissertation, Department of Electrical Engineering, University of Maryland Baltimore County, Baltimore, MD, 1993.
[18] J.C. Harsanyi, W. Farrand, C.-I. Chang, Detection of subpixel spectral signatures in hyperspectral image sequences, Annual Meeting, Proceedings of American Society of Photogrammetry & Remote Sensing, Reno, 1994, pp. 236–247.
[19] W. Farrand, J.C. Harsanyi, Mapping the distribution of mine tailings in the Coeur d'Alene River valley, Idaho, through the use of a constrained energy minimization technique, Remote Sensing Environ. 59 (1997) 64–76.
[20] Q. Du, C.-I. Chang, A linear constrained Euclidean distance-based discriminant analysis for hyperspectral image classification, 1999 Conference on Information Sciences and Systems, Johns Hopkins University, Baltimore, MD, 1999.
[21] H. Ren, C.-I. Chang, An unsupervised orthogonal subspace projection approach to target detection and classification in an unknown environment, Spectroradiometric Symposium, San Diego, November 2–7, 1997.
[22] H. Ren, C.-I. Chang, A computer-aided detection and classification method for concealed targets in hyperspectral imagery, International Symposium on Geoscience and Remote Sensing '98, Seattle, WA, July 5–10, 1998, pp. 1016–1018.
[23] H. Ren, C.-I. Chang, A generalized orthogonal subspace projection approach to unsupervised multispectral image classification, Proceedings of SPIE Conference on Image and Signal Processing for Remote Sensing IV, Vol. 3500, Spain, September 21–25, 1998, pp. 42–53.
[24] C.-I. Chang, C. Brumbley, Linear unmixing Kalman filtering approach to signature abundance detection, signature estimation and subpixel classification for remotely sensed images, IEEE Trans. Aerospace Electron. Systems 37 (1) (1999) 319–330.
[25] C.-I. Chang, Spectral information divergence for hyperspectral image analysis, International Symposium on Geoscience and Remote Sensing '99, Hamburg, Germany, June 28–July 2, 1999, pp. 509–511.

About the Author: QIAN DU received the B.S. and M.S. degrees in electrical engineering from Beijing Institute of Technology in 1992 and 1995, respectively. She is currently a Ph.D. candidate in the Department of Computer Science and Electrical Engineering, University of Maryland Baltimore County. Her research interests include signal and image processing, pattern recognition and neural networks. Ms. Du is a member of IEEE, SPIE and Phi Kappa Phi.

About the Author: CHEIN-I CHANG received his B.S., M.S. and M.A. degrees from Soochow University, Taipei, Taiwan, in 1973, the Institute of Mathematics at National Tsing Hua University, Hsinchu, Taiwan, in 1975, and the State University of New York at Stony Brook in 1977, respectively, all in mathematics, M.S. and M.S.E.E. degrees from the University of Illinois at Urbana-Champaign in 1982, and a Ph.D. in electrical engineering from the University of Maryland, College Park in 1987. He was a Visiting Assistant Professor from January 1987 to August 1987, an Assistant Professor from 1987 to 1993, and is currently an Associate Professor in the Department of Computer Science and Electrical Engineering at the University of Maryland Baltimore County. He was a visiting specialist in the Institute of Information Engineering at the National Cheng Kung University, Tainan, Taiwan, from 1994 to 1995. Dr. Chang is an editor for the Journal of High Speed Networks and the guest editor of a special issue on Telemedicine and Applications. His research interests include information theory and coding, signal detection and estimation, remote sensing image processing, neural networks and pattern recognition. Dr. Chang is a senior member of IEEE and a member of SPIE, INNS, Phi Kappa Phi and Eta Kappa Nu.
