CS434a/541a: Pattern Recognition
Prof. Olga Veksler
Lecture 8
Today
- Continue with dimensionality reduction
  - Last lecture: PCA
  - This lecture: Fisher Linear Discriminant
- PCA finds the most accurate data representation in a lower dimensional space
  - it projects data in the directions of maximum variance
- Fisher Linear Discriminant (FLD) projects to a line which preserves the directions useful for data classification
Data Representation vs. Data Classification
- However, the directions of maximum variance may be useless for classification
[Figure: applying PCA to each class; in one case the projected classes remain separable, in the other they do not]
Fisher Linear Discriminant
- Main idea: find a projection to a line such that samples from different classes are well separated
[Figure: example in 2D; a bad line to project to leaves the classes mixed up, while a good line to project to keeps the classes well separated]
Fisher Linear Discriminant
- Suppose we have 2 classes and d-dimensional samples x_1, …, x_n, where
  - n_1 samples come from the first class
  - n_2 samples come from the second class
- Consider a projection onto a line, and let the line direction be given by the unit vector v
- The scalar v^t x_i is the distance of the projection of x_i from the origin
- Thus v^t x_i is the projection of x_i into a one-dimensional subspace
Fisher Linear Discriminant
- Thus the projection of sample x_i onto a line in direction v is given by v^t x_i
- How do we measure separation between the projections of different classes?
- Let µ_1 and µ_2 be the means of classes 1 and 2
- Let µ̃_1 and µ̃_2 be the means of the projections of classes 1 and 2:

  \tilde{\mu}_1 = \frac{1}{n_1} \sum_{x_i \in C_1} v^t x_i = v^t \left( \frac{1}{n_1} \sum_{x_i \in C_1} x_i \right) = v^t \mu_1

  and similarly \tilde{\mu}_2 = v^t \mu_2
- |µ̃_1 - µ̃_2| seems like a good measure of separation
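As a quick sanity check of the identity just derived, here is a minimal numpy sketch (the data and direction are made up purely for illustration) confirming that the mean of the projections equals the projection of the mean:

```python
import numpy as np

# Numerical check of mu~ = v^t mu on made-up data: the mean of the
# projected samples equals the projection of the sample mean.
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 2))          # ten 2-dimensional samples
v = np.array([3.0, 4.0]) / 5.0        # a unit direction vector

assert np.isclose((X @ v).mean(),     # mean of the scalars v^t x_i
                  v @ X.mean(axis=0)) # v^t mu
```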
Fisher Linear Discriminant
- How good is |µ̃_1 - µ̃_2| as a measure of separation? The larger |µ̃_1 - µ̃_2|, the better the expected separation
- In the figure, the vertical axis is a better line than the horizontal axis to project to for class separability; however, |µ̂_1 - µ̂_2| > |µ̃_1 - µ̃_2|

[Figure: two classes with means µ_1, µ_2; projecting onto the horizontal axis gives means µ̂_1, µ̂_2, projecting onto the vertical axis gives means µ̃_1, µ̃_2]
Fisher Linear Discriminant
- The problem with |µ̃_1 - µ̃_2| is that it does not consider the variances of the classes

[Figure: the direction with the larger separation of projected means has large variance, while the better direction has small variance]
Fisher Linear Discriminant
- We need to normalize |µ̃_1 - µ̃_2| by a factor which is proportional to the variance
- Given samples z_1, …, z_n, the sample mean is

  \mu_z = \frac{1}{n} \sum_{i=1}^{n} z_i

- Define their scatter as

  s = \sum_{i=1}^{n} (z_i - \mu_z)^2

- Thus scatter is just the sample variance multiplied by n
  - scatter measures the same thing as variance, the spread of data around the mean
  - scatter is just on a different scale than variance

[Figure: two data sets, one with larger scatter, one with smaller scatter]
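A one-line numpy check of this relationship, with arbitrary illustrative data (note that np.var computes the biased variance by default, i.e. it divides by n):

```python
import numpy as np

# Scatter equals n times the (biased) sample variance; np.var divides
# by n by default. The data values here are arbitrary.
z = np.array([1.0, 2.0, 2.0, 4.0, 6.0])
scatter = ((z - z.mean()) ** 2).sum()
assert np.isclose(scatter, len(z) * z.var())
```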
Fisher Linear Discriminant
- Fisher's solution: normalize |µ̃_1 - µ̃_2| by the scatter
- Let y_i = v^t x_i, i.e. the y_i's are the projected samples
- The scatter for the projected samples of class 1 is

  \tilde{s}_1^2 = \sum_{y_i \in \text{Class 1}} (y_i - \tilde{\mu}_1)^2

- The scatter for the projected samples of class 2 is

  \tilde{s}_2^2 = \sum_{y_i \in \text{Class 2}} (y_i - \tilde{\mu}_2)^2
Fisher Linear Discriminant
- We need to normalize by both the scatter of class 1 and the scatter of class 2
- Thus the Fisher linear discriminant is the projection onto the line in the direction v which maximizes

  J(v) = \frac{(\tilde{\mu}_1 - \tilde{\mu}_2)^2}{\tilde{s}_1^2 + \tilde{s}_2^2}

- A large numerator means the projected means are far from each other
- A small denominator means the scatter in class 1 is as small as possible (samples of class 1 cluster around the projected mean µ̃_1) and the scatter in class 2 is as small as possible (samples of class 2 cluster around the projected mean µ̃_2)
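To make the criterion concrete, here is a small sketch of a possible helper (the name fisher_J and its interface are ours, not from the lecture) that evaluates J(v) directly from the two sample sets:

```python
import numpy as np

def fisher_J(v, X1, X2):
    """Fisher criterion J(v) for samples X1, X2 (one sample per row)
    projected onto the direction v."""
    y1, y2 = X1 @ v, X2 @ v          # projected samples y_i = v^t x_i
    m1, m2 = y1.mean(), y2.mean()    # projected means
    s1 = ((y1 - m1) ** 2).sum()      # projected scatter of class 1
    s2 = ((y2 - m2) ** 2).sum()      # projected scatter of class 2
    return (m1 - m2) ** 2 / (s1 + s2)
```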
Fisher Linear Discriminant
  J(v) = \frac{(\tilde{\mu}_1 - \tilde{\mu}_2)^2}{\tilde{s}_1^2 + \tilde{s}_2^2}

- If we find a v which makes J(v) large, we are guaranteed that the classes are well separated:
  - a small s̃_1 implies that the projected samples of class 1 are clustered around the projected mean µ̃_1
  - a small s̃_2 implies that the projected samples of class 2 are clustered around the projected mean µ̃_2
  - the projected means are far from each other
Fisher Linear Discriminant Derivation
- All we need to do now is express J explicitly as a function of v and maximize it
  - straightforward, but requires linear algebra and calculus

  J(v) = \frac{(\tilde{\mu}_1 - \tilde{\mu}_2)^2}{\tilde{s}_1^2 + \tilde{s}_2^2}

- Define the separate class scatter matrices S_1 and S_2 for classes 1 and 2. These measure the scatter of the original samples x_i (before projection):

  S_1 = \sum_{x_i \in \text{Class 1}} (x_i - \mu_1)(x_i - \mu_1)^t

  S_2 = \sum_{x_i \in \text{Class 2}} (x_i - \mu_2)(x_i - \mu_2)^t
Fisher Linear Discriminant Derivation
- Now define the within-class scatter matrix

  S_W = S_1 + S_2

- Recall that

  \tilde{s}_1^2 = \sum_{y_i \in \text{Class 1}} (y_i - \tilde{\mu}_1)^2

- Using y_i = v^t x_i and \tilde{\mu}_1 = v^t \mu_1:

  \tilde{s}_1^2 = \sum_{y_i \in \text{Class 1}} \left( v^t x_i - v^t \mu_1 \right)^2 = \sum_{y_i \in \text{Class 1}} v^t (x_i - \mu_1)(x_i - \mu_1)^t v = v^t S_1 v
Fisher Linear Discriminant Derivation
- Similarly, \tilde{s}_2^2 = v^t S_2 v
- Therefore

  \tilde{s}_1^2 + \tilde{s}_2^2 = v^t S_1 v + v^t S_2 v = v^t S_W v

- Let's rewrite the separation of the projected means:

  (\tilde{\mu}_1 - \tilde{\mu}_2)^2 = (v^t \mu_1 - v^t \mu_2)^2 = v^t (\mu_1 - \mu_2)(\mu_1 - \mu_2)^t v = v^t S_B v

- where we define the between-class scatter matrix

  S_B = (\mu_1 - \mu_2)(\mu_1 - \mu_2)^t

- S_B measures the separation between the means of the two classes (before projection)
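These identities are easy to verify numerically; the following sketch, with made-up data and an arbitrary direction v, checks both s̃_1² = v^t S_1 v and (µ̃_1 - µ̃_2)² = v^t S_B v:

```python
import numpy as np

def scatter_matrix(X):
    """S = sum_i (x_i - mu)(x_i - mu)^t for the rows of X."""
    D = X - X.mean(axis=0)
    return D.T @ D

# Made-up data and direction, used only to check the identities.
rng = np.random.default_rng(1)
X1 = rng.normal(size=(5, 2))
X2 = rng.normal(size=(6, 2)) + 2.0
v = np.array([1.0, -0.5])

y1 = X1 @ v
assert np.isclose(((y1 - y1.mean()) ** 2).sum(), v @ scatter_matrix(X1) @ v)

mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
SB = np.outer(mu1 - mu2, mu1 - mu2)
assert np.isclose((v @ mu1 - v @ mu2) ** 2, v @ SB @ v)
```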
Fisher Linear Discriminant Derivation
- Thus our objective function can be written as

  J(v) = \frac{(\tilde{\mu}_1 - \tilde{\mu}_2)^2}{\tilde{s}_1^2 + \tilde{s}_2^2} = \frac{v^t S_B v}{v^t S_W v}

- Maximize J(v) by taking the derivative with respect to v and setting it to 0:

  \frac{d}{dv} J(v) = \frac{\left( \frac{d}{dv} v^t S_B v \right) v^t S_W v - v^t S_B v \left( \frac{d}{dv} v^t S_W v \right)}{(v^t S_W v)^2} = \frac{2 S_B v \, (v^t S_W v) - (v^t S_B v) \, 2 S_W v}{(v^t S_W v)^2} = 0
Fisher Linear Discriminant Derivation
- Need to solve

  (v^t S_W v) S_B v - (v^t S_B v) S_W v = 0

- Dividing by v^t S_W v:

  S_B v - \frac{v^t S_B v}{v^t S_W v} S_W v = 0 \;\Rightarrow\; S_B v = \lambda S_W v, \quad \lambda = \frac{v^t S_B v}{v^t S_W v}

- This is a generalized eigenvalue problem
Fisher Linear Discriminant Derivation
- If S_W has full rank (the inverse exists), we can convert this to a standard eigenvalue problem:

  S_B v = \lambda S_W v \;\Rightarrow\; S_W^{-1} S_B v = \lambda v

- But S_B x, for any vector x, points in the same direction as µ_1 - µ_2:

  S_B x = (\mu_1 - \mu_2)(\mu_1 - \mu_2)^t x = (\mu_1 - \mu_2)\left( (\mu_1 - \mu_2)^t x \right) = \alpha (\mu_1 - \mu_2)

- Thus we can solve the eigenvalue problem immediately:

  v = S_W^{-1} (\mu_1 - \mu_2)

  since

  S_W^{-1} S_B \left[ S_W^{-1} (\mu_1 - \mu_2) \right] = S_W^{-1} \alpha (\mu_1 - \mu_2) = \alpha \left[ S_W^{-1} (\mu_1 - \mu_2) \right] = \lambda v
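Putting the derivation together, the whole method reduces to a few lines of numpy. The helper below (fisher_direction is our illustrative name) is a sketch of the closed-form solution; np.linalg.solve is used instead of forming S_W^{-1} explicitly, which is numerically preferable but otherwise equivalent:

```python
import numpy as np

def fisher_direction(X1, X2):
    """Closed-form FLD direction v = SW^{-1} (mu1 - mu2) for two classes,
    with the samples of each class stored as rows of X1 and X2."""
    mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
    D1, D2 = X1 - mu1, X2 - mu2
    SW = D1.T @ D1 + D2.T @ D2        # within-class scatter SW = S1 + S2
    # Solving SW v = mu1 - mu2 avoids forming the inverse explicitly.
    return np.linalg.solve(SW, mu1 - mu2)
```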
Fisher Linear Discriminant Example
- Data:
  - Class 1 has 5 samples: c1 = [(1,2), (2,3), (3,3), (4,5), (5,5)]
  - Class 2 has 6 samples: c2 = [(1,0), (2,1), (3,1), (3,2), (5,3), (6,5)]
- Arrange the data in 2 separate matrices:

  c_1 = \begin{bmatrix} 1 & 2 \\ \vdots & \vdots \\ 5 & 5 \end{bmatrix} \qquad c_2 = \begin{bmatrix} 1 & 0 \\ \vdots & \vdots \\ 6 & 5 \end{bmatrix}

- Notice that PCA performs very poorly on this data because the direction of largest variance is not helpful for classification
Fisher Linear Discriminant Example
- First compute the mean for each class:

  \mu_1 = \text{mean}(c_1) = [3,\; 3.6] \qquad \mu_2 = \text{mean}(c_2) = [3.3,\; 2]

- Compute the scatter matrices S_1 and S_2 for each class:

  S_1 = 4 \cdot \text{cov}(c_1) = \begin{bmatrix} 10 & 8.0 \\ 8.0 & 7.2 \end{bmatrix} \qquad S_2 = 5 \cdot \text{cov}(c_2) = \begin{bmatrix} 17.3 & 16 \\ 16 & 16 \end{bmatrix}

- The within-class scatter matrix is

  S_W = S_1 + S_2 = \begin{bmatrix} 27.3 & 24 \\ 24 & 23.2 \end{bmatrix}

- It has full rank, so we don't have to solve for eigenvalues
- The inverse of S_W is

  S_W^{-1} = \text{inv}(S_W) = \begin{bmatrix} 0.39 & -0.41 \\ -0.41 & 0.47 \end{bmatrix}

- Finally, the optimal line direction is

  v = S_W^{-1} (\mu_1 - \mu_2) = \begin{bmatrix} -0.79 \\ 0.89 \end{bmatrix}
Fisher Linear Discriminant Example
- Notice that as long as the line has the right direction, its exact position does not matter
- The last step is to compute the actual 1D projections Y; let's do it separately for each class, using the direction rescaled to (approximately) unit length, v ≈ [-0.65, 0.73]^t:

  Y_1 = v^t c_1^t = [0.81,\; 0.89,\; 0.24,\; 1.05,\; 0.40]

  Y_2 = v^t c_2^t = [-0.65,\; -0.57,\; -1.22,\; -0.49,\; -1.06,\; -0.25]

- All class 1 projections are positive and all class 2 projections are negative, so the classes are well separated on this line
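The worked example can be reproduced with the fisher_direction sketch from the derivation section; the printed values should match the numbers above up to rounding:

```python
import numpy as np

# Reproduces the worked example, reusing the fisher_direction sketch
# from the derivation section. The slides rescale v to unit length
# before projecting; that changes the scale of Y1, Y2 but not the signs.
c1 = np.array([(1, 2), (2, 3), (3, 3), (4, 5), (5, 5)], dtype=float)
c2 = np.array([(1, 0), (2, 1), (3, 1), (3, 2), (5, 3), (6, 5)], dtype=float)

v = fisher_direction(c1, c2)
print(v)        # approx [-0.79, 0.89]
print(c1 @ v)   # class 1 projections: all positive
print(c2 @ v)   # class 2 projections: all negative
```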
Multiple Discriminant Analysis (MDA)
- Can generalize FLD to multiple classes
- In the case of c classes, can reduce dimensionality to 1, 2, 3, …, or c-1 dimensions
- Project sample x_i to a linear subspace: y_i = V^t x_i
  - V is called the projection matrix
Multiple Discriminant Analysis (MDA)
- Let
  - n_i be the number of samples of class i
  - µ_i be the sample mean of class i
  - µ be the total mean of all samples

  \mu_i = \frac{1}{n_i} \sum_{x \in \text{class } i} x \qquad \mu = \frac{1}{n} \sum_{x_i} x_i

- Objective function:

  J(V) = \frac{\det(V^t S_B V)}{\det(V^t S_W V)}

- The within-class scatter matrix S_W is

  S_W = \sum_{i=1}^{c} S_i = \sum_{i=1}^{c} \sum_{x_k \in \text{class } i} (x_k - \mu_i)(x_k - \mu_i)^t

- The between-class scatter matrix S_B is

  S_B = \sum_{i=1}^{c} n_i (\mu_i - \mu)(\mu_i - \mu)^t

  - its maximum rank is c-1
Multiple Discriminant Analysis (MDA)
  J(V) = \frac{\det(V^t S_B V)}{\det(V^t S_W V)}

- First solve the generalized eigenvalue problem:

  S_B v = \lambda S_W v

- There are at most c-1 distinct solution eigenvalues
- Let v_1, v_2, …, v_{c-1} be the corresponding eigenvectors
- The optimal projection matrix V to a subspace of dimension k is given by the eigenvectors corresponding to the largest k eigenvalues
- Thus we can project to a subspace of dimension at most c-1
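A compact sketch of this procedure (mda_projection is our illustrative name; it assumes S_W is positive definite so that scipy.linalg.eigh can solve the generalized problem S_B v = λ S_W v):

```python
import numpy as np
from scipy.linalg import eigh

def mda_projection(X, labels, k):
    """Return a d x k projection matrix V from the generalized
    eigenproblem SB v = lambda SW v (requires k <= c - 1)."""
    d = X.shape[1]
    mu = X.mean(axis=0)                       # total mean of all samples
    SW = np.zeros((d, d))
    SB = np.zeros((d, d))
    for c in np.unique(labels):
        Xc = X[labels == c]                   # samples of class c
        mu_c = Xc.mean(axis=0)
        Dc = Xc - mu_c
        SW += Dc.T @ Dc                                 # within-class scatter
        SB += len(Xc) * np.outer(mu_c - mu, mu_c - mu)  # between-class scatter
    # eigh solves SB v = lambda SW v for symmetric SB and positive
    # definite SW; eigenvalues come back in ascending order, so the
    # eigenvectors for the k largest eigenvalues are the last k columns.
    _, V = eigh(SB, SW)
    return V[:, -k:]
```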
FLD and MDA Drawbacks
- Reduces dimension only to k = c-1 (unlike PCA)
- For complex data, projection to even the best line may result in inseparable projected samples
- FLD will fail when:
  1. J(v) is always 0: happens if µ_1 = µ_2 (PCA performs reasonably well here)
  2. J(v) is small for every direction v: the classes have large overlap when projected to any line (PCA will also fail here)