2012 mdsp pr09 pca lda

Page 1: 2012 mdsp pr09 pca lda

Course Calendar

Class  Date           Contents
1      Sep. 26        Course information & course overview
2      Oct. 4         Bayes Estimation
3      Oct. 11        Classical Bayes Estimation - Kalman Filter -
4      Oct. 18        Simulation-based Bayesian Methods
5      Oct. 25        Modern Bayesian Estimation: Particle Filter
6      Nov. 1         HMM (Hidden Markov Model)
-      Nov. 8         No class
7      Nov. 15        Bayesian Decision
8      Nov. 29        Nonparametric Approaches
9      Dec. 6         PCA (Principal Component Analysis)
10     Dec. 13        ICA (Independent Component Analysis)
11     Dec. 20        Applications of PCA and ICA
12     Dec. 27        Clustering, k-means, et al.
13     Jan. 17        Other Topics 1: Kernel Machines
14     Jan. 22 (Tue)  Other Topics 2

Page 2: 2012 mdsp pr09 pca lda

Lecture Plan

Principal Component Analysis

1. Introduction: Dimensionality Reduction

2. Principal Component Analysis (PCA)

3. Linear Discriminant Analysis (LDA)

Page 3: 2012 mdsp pr09 pca lda

1. Dimensionality of Feature Space

For given sample data there exists a maximum dimension of the features above which the performance of the Bayes classifier degrades rather than improves.

Two approaches:

Feature Selection: choose a subset of the overall features.

Feature Extraction: create a set of new features by combining the original features.

Given sample feature vectors

$$
\mathbf{x}_i = (x_{i1}, x_{i2}, \ldots, x_{in})^T, \qquad i = 1, 2, \ldots, N,
$$

feature extraction forms a lower-dimensional vector of new features,

$$
\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{pmatrix}
= f\!\left( \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} \right),
\qquad \mathbf{y} = f(\mathbf{x}), \quad m \le n.
$$

Page 4: 2012 mdsp pr09 pca lda

Example: Representation of images

Vectorization: by lexicographic ordering of an m × m image into a vector, the array data I are converted to an m²-dimensional vector x,

$$
I(i, j) \;\longrightarrow\; \mathbf{x} \in \mathbb{R}^{m^2}, \qquad 1 \le i, j \le m,
$$

where I(i, j) is the image intensity at pixel (i, j).

ex. m = 256: m² = 65,536 dimensions.
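As a concrete illustration, here is a minimal NumPy sketch of this vectorization (the image is a random placeholder; variable names are only illustrative):

```python
import numpy as np

m = 256
# A hypothetical m x m grayscale image (random values stand in for real pixel intensities I(i, j)).
I = np.random.rand(m, m)

# Lexicographic (row-major) ordering turns the 2-D array into an m^2-dimensional vector.
x = I.reshape(-1)
print(x.shape)               # (65536,)

# A set of N images becomes an N x m^2 data matrix, one vectorized image per row.
N = 10
images = np.random.rand(N, m, m)
X = images.reshape(N, -1)    # shape: (10, 65536)
```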

Page 5: 2012 mdsp pr09 pca lda

Restricting the transforms to linear(*) functions, we have the form

$$
\mathbf{y} = W\mathbf{x},
$$

$$
\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{pmatrix}
=
\begin{pmatrix}
w_{11} & \cdots & w_{1n} \\
w_{21} & \cdots & w_{2n} \\
\vdots &        & \vdots \\
w_{m1} & \cdots & w_{mn}
\end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}.
$$

(*) Several nonlinear transformations, such as the multi-layer perceptron, manifold learning, and kernel methods, are known.
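A minimal sketch of this linear feature extraction (the matrix W below is an arbitrary illustrative choice, not yet one selected by PCA or LDA):

```python
import numpy as np

n, m = 8, 3                      # original and reduced dimensions, m <= n
rng = np.random.default_rng(0)

W = rng.standard_normal((m, n))  # an arbitrary m x n weight matrix (placeholder)
x = rng.standard_normal(n)       # one n-dimensional feature vector

y = W @ x                        # linear feature extraction: y = Wx
print(y.shape)                   # (3,)
```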

Page 6: 2012 mdsp pr09 pca lda


Signal Representation (PCA) vs. Classification (LDA)

When the aim of feature extraction is to represent a signal, Principal Component Analysis (PCA) is applied, whereas for a data-classification problem under supervised learning, Linear Discriminant Analysis (LDA) is applied.

(a) PCA for representation (b) LDA for classification

Page 7: 2012 mdsp pr09 pca lda

2. PCA by variance maximization

- Random vector x with n elements
- Samples x_i, i = 1, ..., N
- The first- and second-order statistics are given or, in practice, are calculated from the samples.
- Subtract the mean of the vector x, that is,

$$
\mathbf{x}_{new} = \mathbf{x} - E\{\mathbf{x}\}
$$

(where E means the expectation over x, and henceforth x in the PCA part of this lecture is assumed to be a zero-mean vector).

Consider a linear combination of the elements of x,

$$
y_1 = \sum_{k=1}^{n} w_{k1} x_k = \mathbf{w}_1^T \mathbf{x},
$$

where the w_{k1}'s are weights and w_1 is the vector form of the weights.

y_1 is called the first principal component of x if the variance of y_1 is maximum.
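A minimal NumPy sketch of this preprocessing step, using synthetic samples (the sample mean is removed and the covariance is estimated from the data):

```python
import numpy as np

rng = np.random.default_rng(1)
N, n = 500, 4
X = rng.standard_normal((N, n)) @ rng.standard_normal((n, n))  # synthetic samples, one per row

# Subtract the sample mean (the PCA derivation assumes a zero-mean vector x).
X = X - X.mean(axis=0)

# Estimate the n x n covariance matrix C_x = E{x x^T} from the centered samples.
C_x = (X.T @ X) / N
print(C_x.shape)   # (4, 4)
```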

Page 8: 2012 mdsp pr09 pca lda

PCA Criterion:

$$
J_1^{PCA}(\mathbf{w}_1) = E\{y_1^2\} = E\{(\mathbf{w}_1^T \mathbf{x})^2\}
= \mathbf{w}_1^T E\{\mathbf{x}\mathbf{x}^T\}\mathbf{w}_1 = \mathbf{w}_1^T C_x \mathbf{w}_1,
\qquad \|\mathbf{w}_1\| = \Big(\sum_{k=1}^{n} w_{k1}^2\Big)^{1/2} = 1.
$$

The matrix C_x is the n × n covariance matrix of x.

The solution of the PCA problem

- Results from matrix theory:
- Define e_1, ..., e_n as the unit-norm eigenvectors of the matrix C_x.
- The corresponding eigenvalues satisfy λ_1 ≥ λ_2 ≥ ... ≥ λ_n ≥ 0.

The solution of maximizing J_1^{PCA},

$$
\max_{\mathbf{w}_1} J_1^{PCA}(\mathbf{w}_1),
$$

is given by

$$
\mathbf{w}_1 = \mathbf{e}_1.
$$

This indicates that the first principal component of x is

$$
y_1 = \mathbf{e}_1^T \mathbf{x}.
$$
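A minimal sketch of this solution under the same synthetic-data assumptions as above (the first principal direction is the top eigenvector of the estimated covariance matrix):

```python
import numpy as np

rng = np.random.default_rng(2)
N, n = 1000, 4
X = rng.standard_normal((N, n)) @ rng.standard_normal((n, n))
X = X - X.mean(axis=0)                 # zero-mean samples
C_x = (X.T @ X) / N                    # n x n covariance matrix

# Unit-norm eigenvectors and eigenvalues of the symmetric matrix C_x.
eigvals, eigvecs = np.linalg.eigh(C_x)
order = np.argsort(eigvals)[::-1]      # sort so that lambda_1 >= ... >= lambda_n >= 0
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

w1 = eigvecs[:, 0]                     # e_1 maximizes w^T C_x w subject to ||w|| = 1
y1 = X @ w1                            # first principal component y_1 = e_1^T x for every sample
print(np.isclose(y1.var(), eigvals[0]))  # variance of y_1 equals lambda_1 (up to rounding)
```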

Page 9: 2012 mdsp pr09 pca lda

- The second principal component y_2 = w_2^T x of x, with weight vector w_2:

The variance of y_2 is maximized under the condition that y_2 is uncorrelated with y_1,

$$
E\{y_2 y_1\} = 0.
$$

This condition derives

$$
E\{y_2 y_1\} = E\{(\mathbf{w}_2^T\mathbf{x})(\mathbf{w}_1^T\mathbf{x})\}
= \mathbf{w}_2^T E\{\mathbf{x}\mathbf{x}^T\}\mathbf{w}_1
= \mathbf{w}_2^T C_x \mathbf{w}_1 = \lambda_1 \mathbf{w}_2^T \mathbf{e}_1 = 0.
$$

The right-most equation means that w_2 is orthogonal to e_1.

- The problem is therefore to seek the maximum variance of

$$
E\{y_2^2\} = E\{(\mathbf{w}_2^T\mathbf{x})^2\}
$$

in the subspace orthogonal to the first eigenvector e_1 (namely, the subspace spanned by e_2, e_3, ..., e_n).

Page 10: 2012 mdsp pr09 pca lda

- The solution is hence given by

$$
\mathbf{w}_2 = \mathbf{e}_2.
$$

Likewise, we may derive all principal components, thus

$$
\mathbf{w}_k = \mathbf{e}_k, \qquad y_k = \mathbf{e}_k^T \mathbf{x}, \qquad k = 1, \ldots, n.
$$
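A minimal sketch of the full set of principal components under the same assumptions as before; the columns of E are the sorted unit-norm eigenvectors e_1, ..., e_n:

```python
import numpy as np

rng = np.random.default_rng(3)
N, n = 1000, 4
X = rng.standard_normal((N, n)) @ rng.standard_normal((n, n))
X = X - X.mean(axis=0)
C_x = (X.T @ X) / N

eigvals, E = np.linalg.eigh(C_x)
order = np.argsort(eigvals)[::-1]
eigvals, E = eigvals[order], E[:, order]   # columns are e_1, ..., e_n

Y = X @ E                                  # y_k = e_k^T x for every sample and every k
C_y = (Y.T @ Y) / N                        # covariance of the principal components
# C_y is (up to rounding) diagonal: the components are mutually uncorrelated,
# and the k-th diagonal entry equals lambda_k.
print(np.allclose(C_y, np.diag(eigvals)))
```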

Page 11: 2012 mdsp pr09 pca lda

3. LDA - Fisher's Ratio Maximization -

We are given data x_1, x_2, ..., x_N, which are divided into two subsets D_1 and D_2 (of N_1 and N_2 data points) corresponding to the two classes, respectively.

The problem is to find a projection onto a line,

$$
y = \mathbf{w}^T \mathbf{x},
$$

where we want to separate the sample data into two subsets Y_1 and Y_2, corresponding to D_1 and D_2, as well as possible.

The LDA tries to find directions that are efficient for the classification or discrimination of samples in supervised learning problems.

An evaluation function of the best separability can be defined (Fisher ratio):

$$
J(\mathbf{w}) = \frac{(m_1 - m_2)^2}{s_1^2 + s_2^2}.
$$

Page 12: 2012 mdsp pr09 pca lda

where

$$
m_i = \frac{1}{N_i}\sum_{y \in Y_i} y
$$

is the sample mean of the projected samples for each class i, and

$$
s_i^2 = \sum_{y \in Y_i} (y - m_i)^2
$$

is the scatter value for the projected samples with label i.

To rewrite the Fisher ratio as a function of w, we define the scatter matrices

$$
S_i = \sum_{\mathbf{x} \in D_i} (\mathbf{x} - \mathbf{m}_i)(\mathbf{x} - \mathbf{m}_i)^T,
\qquad \text{where } \mathbf{m}_i = \frac{1}{N_i}\sum_{\mathbf{x} \in D_i} \mathbf{x},
$$

$$
S_W = S_1 + S_2, \quad \text{the within-class scatter matrix,}
$$

and

$$
S_B = (\mathbf{m}_1 - \mathbf{m}_2)(\mathbf{m}_1 - \mathbf{m}_2)^T, \quad \text{the between-class scatter matrix.}
$$
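A minimal sketch of these scatter matrices for two synthetic classes (NumPy; the data are illustrative Gaussian clouds):

```python
import numpy as np

rng = np.random.default_rng(4)
# Two synthetic 2-D classes D1 and D2 with different means.
D1 = rng.standard_normal((100, 2))
D2 = rng.standard_normal((120, 2)) + np.array([3.0, 1.0])

m1, m2 = D1.mean(axis=0), D2.mean(axis=0)   # class mean vectors m_1, m_2

def scatter(D, m):
    # S_i = sum over x in D_i of (x - m_i)(x - m_i)^T
    Z = D - m
    return Z.T @ Z

S_W = scatter(D1, m1) + scatter(D2, m2)     # within-class scatter matrix
S_B = np.outer(m1 - m2, m1 - m2)            # between-class scatter matrix
print(S_W.shape, S_B.shape)                 # (2, 2) (2, 2)
```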

Page 13: 2012 mdsp pr09 pca lda

Then the Fisher ratio becomes

$$
J(\mathbf{w}) = \frac{\mathbf{w}^T S_B \mathbf{w}}{\mathbf{w}^T S_W \mathbf{w}},
$$

and the optimal weight vector w for maximizing J(w) can be obtained as the solution of the generalized eigenvalue problem

$$
S_B \mathbf{w} = \lambda S_W \mathbf{w}.
$$

Since the matrix S_W is symmetric and positive semidefinite(*), and it is usually positive definite(**), hence non-singular, the solution is given by

$$
S_W^{-1} S_B \mathbf{w} = \lambda \mathbf{w}.
$$

Furthermore, if we ignore the magnitude of w, we can rewrite the solution as follows:

$$
\mathbf{w} = S_W^{-1}(\mathbf{m}_1 - \mathbf{m}_2).
$$

(*) For any w, w^T S_W w ≥ 0.  (**) For any w ≠ 0, w^T S_W w > 0.
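Continuing the synthetic two-class example above, a minimal sketch of this closed-form Fisher direction (an illustration only, not the lecture's own code):

```python
import numpy as np

rng = np.random.default_rng(4)
D1 = rng.standard_normal((100, 2))
D2 = rng.standard_normal((120, 2)) + np.array([3.0, 1.0])
m1, m2 = D1.mean(axis=0), D2.mean(axis=0)

S_W = (D1 - m1).T @ (D1 - m1) + (D2 - m2).T @ (D2 - m2)

# Fisher's optimal direction (up to scale): w = S_W^{-1} (m_1 - m_2)
w = np.linalg.solve(S_W, m1 - m2)

# Project both classes onto the line y = w^T x; the projected class means are well separated.
y1, y2 = D1 @ w, D2 @ w
print(y1.mean(), y2.mean())
```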

Page 14: 2012 mdsp pr09 pca lda

Fig. Comparison of PCA (first principal direction, magenta line) and the LDA discriminant line (green line) [3]

Page 15: 2012 mdsp pr09 pca lda

References:
[1] R. O. Duda, P. E. Hart, and D. G. Stork, "Pattern Classification", John Wiley & Sons, 2nd edition, 2004.
[2] C. M. Bishop, "Pattern Recognition and Machine Learning", Springer, 2006.
[3] All data files of Bishop's book are available at http://research.microsoft.com/~cmbishop/PRML
[4] A. Hyvärinen et al., "Independent Component Analysis", Wiley-Interscience, 2001.