Feature Extraction
PCA/LDA/ICA/KPCA and other acronyms ending in A
Feature Extraction
In general, the optimal mapping y = f(x) will be a non-linear function.
• However, there is no systematic way to generate non-linear transforms.
• The selection of a particular subset of transforms is problem dependent.
• For this reason, feature extraction is commonly limited to linear transforms: y = Wx.
x = Uy, \quad \text{such that } U^T U = I

x = [\vec u_1\ \vec u_2\ \cdots\ \vec u_n]\,y = \sum_{i=1}^{n} y_i \vec u_i

x = \big[\,[\vec u_1 \cdots \vec u_m]\ [\vec u_{m+1} \cdots \vec u_n]\,\big]\,y = \sum_{i=1}^{m} y_i \vec u_i + \sum_{i=m+1}^{n} y_i \vec u_i

\hat x(m) = \sum_{i=1}^{m} y_i \vec u_i
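As a quick numerical check of the expansion above, a NumPy sketch (with a hypothetical random orthonormal basis built via QR) shows that x is reconstructed exactly from its coefficients y = Uᵀx:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
x = rng.normal(size=n)

# Build an orthonormal basis U from the QR decomposition of a random matrix.
U, _ = np.linalg.qr(rng.normal(size=(n, n)))

y = U.T @ x        # coefficients of x in the new basis
x_rec = U @ y      # x = U y reconstructs x exactly since U^T U = I
assert np.allclose(x_rec, x)
```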
PCA Derivation: Minimizing Reconstruction Error
Any point in R^n can be perfectly reconstructed in a new orthonormal basis of size n.
Goal: Find an orthonormal basis of m vectors, m < n, that minimizes the reconstruction error.
\hat x = [\vec u_1 \cdots \vec u_m]\begin{bmatrix} y_1 \\ \vdots \\ y_m \end{bmatrix} + [\vec u_{m+1} \cdots \vec u_n]\begin{bmatrix} y_{m+1} \\ \vdots \\ y_n \end{bmatrix}

\hat x = U_m y_m + U_d b = \hat x(m) + \hat x_{\text{discard}}

Define a reconstruction based on the 'best' m vectors:

\hat x(m) = [\vec u_1\ \vec u_2\ \cdots\ \vec u_m]\begin{bmatrix} y_1 \\ \vdots \\ y_m \end{bmatrix}

\text{Err}_{\text{recon}}^2 = \sum_{k=1}^{N_{\text{samples}}} (x_k - \hat x_k)^T (x_k - \hat x_k)
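The summed reconstruction error can be computed directly; this NumPy sketch uses hypothetical random data and an arbitrary orthonormal basis, so the numbers are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(1)
N, n, m = 100, 3, 2
X = rng.normal(size=(N, n))                    # N samples as rows

U, _ = np.linalg.qr(rng.normal(size=(n, n)))   # some orthonormal basis
Y = X @ U[:, :m]                               # keep only the first m coordinates
X_hat = Y @ U[:, :m].T                         # per-sample reconstruction x-hat(m)

# Err_recon^2 = sum_k (x_k - x_hat_k)^T (x_k - x_hat_k)
err = np.sum((X - X_hat) ** 2)
```

With m = n the error vanishes; truncating to m < n leaves a positive residual.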
Visualizing Reconstruction Error
[Figure: data scatter shown as 2D vectors x; a candidate direction u (and a better direction u_good), with each point split into its projection x_p along u and a perpendicular residual.]
The solution involves finding the directions u that minimize the perpendicular distances, and removing them.
\Delta x(m) = x - \hat x(m) = \sum_{i=1}^{n} y_i \vec u_i - \left( \sum_{i=1}^{m} y_i \vec u_i + \sum_{i=m+1}^{n} b_i \vec u_i \right) = \sum_{i=m+1}^{n} (y_i - b_i)\,\vec u_i

\text{Err}_{\text{recon}}^2 = E\big[\|\Delta x(m)\|^2\big] = E\left[ \sum_{j=m+1}^{n} \sum_{i=m+1}^{n} (y_j - b_j)\,\vec u_j^T\,(y_i - b_i)\,\vec u_i \right]

= E\left[ \sum_{j=m+1}^{n} \sum_{i=m+1}^{n} (y_i - b_i)(y_j - b_j)\,\vec u_i^T \vec u_j \right]

= E\left[ \sum_{i=m+1}^{n} (y_i - b_i)^2 \right] = \sum_{i=m+1}^{n} E\big[(y_i - b_i)^2\big]

(the double sum collapses because \vec u_i^T \vec u_j = \delta_{ij} for an orthonormal basis)
Goal: Find the basis vectors u_i and constants b_i that minimize the reconstruction error.
Solving for b:

\frac{\partial \text{Err}}{\partial b_i} = \frac{\partial}{\partial b_i} E\big[(y_i - b_i)^2\big] = -2\big(E[y_i] - b_i\big) = 0 \;\Rightarrow\; b_i = E[y_i]

Therefore, replace the discarded dimensions y_i by their expected values.
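The result b_i = E[y_i] is easy to verify numerically. This sketch (hypothetical Gaussian samples) scans candidate constants b and finds that the mean squared error is minimized at the sample mean:

```python
import numpy as np

rng = np.random.default_rng(2)
y = rng.normal(loc=3.0, scale=1.0, size=100_000)

# Scan candidate constants b and measure the mean squared error E[(y - b)^2].
bs = np.linspace(0.0, 6.0, 601)
errs = np.array([np.mean((y - b) ** 2) for b in bs])
best_b = bs[np.argmin(errs)]

# The minimizer lands (up to grid resolution) on the sample mean E[y].
assert abs(best_b - y.mean()) < 0.01
```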
Now rewrite the error, replacing each b_i by E[y_i]:

\sum_{i=m+1}^{n} E\big[(y_i - E[y_i])^2\big] = \sum_{i=m+1}^{n} E\big[(x^T \vec u_i - E[x^T \vec u_i])^2\big]

= \sum_{i=m+1}^{n} E\big[(x^T \vec u_i - E[x^T \vec u_i])^T (x^T \vec u_i - E[x^T \vec u_i])\big]

= \sum_{i=m+1}^{n} E\big[\vec u_i^T (x - E[x])(x - E[x])^T \vec u_i\big]

= \sum_{i=m+1}^{n} \vec u_i^T\, E\big[(x - E[x])(x - E[x])^T\big]\, \vec u_i = \sum_{i=m+1}^{n} \vec u_i^T C\, \vec u_i

where C is the covariance matrix of x.
Thus, finding the best basis u_i involves minimizing the quadratic form

\text{Err} = \sum_{i=m+1}^{n} \vec u_i^T C\, \vec u_i

subject to the constraint \|\vec u_i\| = 1.
Using Lagrange multipliers, we form the constrained error function:

\text{Err} = \sum_{i=m+1}^{n} \vec u_i^T C \vec u_i + \lambda_i \big(1 - \vec u_i^T \vec u_i\big)

\frac{\partial \text{Err}}{\partial \vec u_i} = \frac{\partial}{\partial \vec u_i}\Big( \vec u_i^T C \vec u_i + \lambda_i\big(1 - \vec u_i^T \vec u_i\big) \Big) = 2C\vec u_i - 2\lambda_i \vec u_i = 0
which results in the eigenvector problem

C\,\vec u_i = \lambda_i\,\vec u_i
Plugging back into the error:

\text{Err} = \sum_{i=m+1}^{n} \vec u_i^T C \vec u_i + \lambda_i\big(1 - \vec u_i^T \vec u_i\big) = \sum_{i=m+1}^{n} \vec u_i^T (\lambda_i \vec u_i) + 0 = \sum_{i=m+1}^{n} \lambda_i

Thus the solution is to discard the n − m eigenvectors with the smallest eigenvalues.
PCA summary:
1) Compute the data covariance matrix
2) Perform an eigenanalysis of the covariance matrix
3) Throw out the smallest-eigenvalue eigenvectors
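The three-step summary can be sketched in a few lines of NumPy (the data and variable names here are hypothetical):

```python
import numpy as np

def pca(X, m):
    """PCA per the summary: covariance, eigenanalysis, keep the top-m eigenvectors."""
    Xc = X - X.mean(axis=0)                  # center the data
    C = np.cov(Xc, rowvar=False)             # 1) data covariance
    evals, evecs = np.linalg.eigh(C)         # 2) eigenanalysis (C is symmetric)
    order = np.argsort(evals)[::-1]          # sort eigenvalues descending
    evals, evecs = evals[order], evecs[:, order]
    return evals, evecs[:, :m]               # 3) keep m largest; discard the rest

rng = np.random.default_rng(3)
# Hypothetical data: a 2D signal embedded in 5D with small noise.
X = rng.normal(size=(500, 2)) @ rng.normal(size=(2, 5)) + 0.01 * rng.normal(size=(500, 5))
evals, U_m = pca(X, m=2)
Y = (X - X.mean(axis=0)) @ U_m               # projected coordinates
```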
Problem: How many eigenvectors should we keep? There are many criteria, e.g. the fraction of total data variance discarded:

\max(m)\ \text{such that}\ \frac{\sum_{i=m+1}^{n} \lambda_i}{\sum_{i=1}^{n} \lambda_i} < 0.05
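The variance criterion can be sketched as follows (a NumPy helper; the `choose_m` name and the 0.95/0.05 threshold split are illustrative assumptions):

```python
import numpy as np

def choose_m(evals, keep=0.95):
    """Smallest m whose discarded eigenvalue mass falls below (1 - keep)."""
    evals = np.sort(evals)[::-1]
    frac_discarded = 1.0 - np.cumsum(evals) / evals.sum()
    # argmax on a boolean array returns the first True index.
    return int(np.argmax(frac_discarded < (1.0 - keep)) + 1)

evals = np.array([5.0, 3.0, 1.0, 0.5, 0.3, 0.2])
m = choose_m(evals, keep=0.95)   # keep enough eigenvectors to retain 95% of variance
```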
Extensions: ICA
• Find the 'best' linear basis, minimizing the statistical dependence between the projected components.
Problem: Find the c hidden independent sources x_i.
Observation model: the sensed signals are an unknown linear mixture of the sources.
ICA problem statement: Recover the source signals from the sensed signals. More specifically, we seek a real matrix W such that z(t) = Wy(t) is an estimate of x(t). Depending on the density assumptions, ICA can have easy or hard solutions. Solve via:
• Gradient approach
• Kurtotic ICA: two lines of matlab code.
  – http://www.cs.toronto.edu/~roweis/kica.html
  – yy are the mixed measurements (one per column); W is the unmixing matrix.

% W = kica(yy);
xx = sqrtm(inv(cov(yy')))*(yy-repmat(mean(yy,2),1,size(yy,2)));
[W,ss,vv] = svd((repmat(sum(xx.*xx,1),size(xx,1),1).*xx)*xx');
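For readers without matlab, here is a rough NumPy port of the two lines above (an assumed translation; the recovered sources are, as in any ICA, only defined up to sign and permutation):

```python
import numpy as np

def kica(yy):
    """Rough NumPy port of the two-line kurtotic ICA above.
    yy holds the mixed measurements, one per column."""
    # Whiten: remove the mean, then apply the inverse matrix square root of
    # the covariance so that the whitened data has identity covariance.
    C = np.cov(yy)
    evals, evecs = np.linalg.eigh(C)
    whiten = evecs @ np.diag(evals ** -0.5) @ evecs.T
    xx = whiten @ (yy - yy.mean(axis=1, keepdims=True))
    # SVD of the kurtosis-weighted scatter matrix gives the unmixing rotation.
    W, _, _ = np.linalg.svd((np.sum(xx * xx, axis=0) * xx) @ xx.T)
    return W, xx

# Hypothetical demo: two uniform (sub-Gaussian) sources mixed by a random matrix.
rng = np.random.default_rng(4)
S = rng.uniform(-1, 1, size=(2, 2000))     # independent sources
A = rng.normal(size=(2, 2))                # unknown mixing matrix
yy = A @ S
W, xx = kica(yy)
z = W.T @ xx                               # estimated sources (up to sign/permutation)
```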
Kernel PCA
• PCA after non-linear transformation
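A minimal sketch of the idea, assuming an RBF kernel (the `kernel_pca` helper and its `gamma` parameter are illustrative, not from the slides):

```python
import numpy as np

def kernel_pca(X, m, gamma=1.0):
    """Kernel-PCA sketch: ordinary PCA carried out implicitly after a
    non-linear (here RBF) feature map."""
    # RBF kernel matrix K_ij = exp(-gamma * ||x_i - x_j||^2).
    sq = np.sum(X ** 2, axis=1)
    K = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2.0 * X @ X.T))
    # Center the kernel matrix (the implicit features must be zero-mean for PCA).
    N = X.shape[0]
    one = np.full((N, N), 1.0 / N)
    Kc = K - one @ K - K @ one + one @ K @ one
    # Eigenanalysis of the centered kernel; keep the m largest components.
    evals, evecs = np.linalg.eigh(Kc)
    top = np.argsort(evals)[::-1][:m]
    alphas = evecs[:, top] / np.sqrt(np.maximum(evals[top], 1e-12))
    return Kc @ alphas          # projections of the training points

rng = np.random.default_rng(5)
X = rng.normal(size=(50, 2))
Z = kernel_pca(X, m=2, gamma=0.5)
```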
PCA on aligned face images:
http://www-white.media.mit.edu/vismod/demos/facerec/basic.html