Feature Extraction
PCA/LDA/ICA/KPCA and other acronyms ending in A
Feature Extraction
In general, the optimal mapping y = f(x) will be a non-linear function.
• However, there is no systematic way to generate non-linear transforms.
• The selection of a particular subset of transforms is problem dependent.
• For this reason, feature extraction is commonly limited to linear transforms: y = Wx.
x = Uy, \quad \text{such that } U^T U = I

x = [\vec u_1\ \vec u_2\ \cdots\ \vec u_n]\,y = \sum_{i=1}^{n} y_i \vec u_i

x = \big[\,[\vec u_1 \cdots \vec u_m]\ [\vec u_{m+1} \cdots \vec u_n]\,\big]\,y = \sum_{i=1}^{m} y_i \vec u_i + \sum_{i=m+1}^{n} y_i \vec u_i

\hat x(m) = \sum_{i=1}^{m} y_i \vec u_i
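As a quick numerical check of the expansion above, a NumPy sketch (with a hypothetical random orthonormal basis built via QR) shows that x is reconstructed exactly from its coefficients y = Uᵀx:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
x = rng.normal(size=n)

# Build an orthonormal basis U from the QR decomposition of a random matrix.
U, _ = np.linalg.qr(rng.normal(size=(n, n)))

y = U.T @ x        # coefficients of x in the new basis
x_rec = U @ y      # x = U y reconstructs x exactly since U^T U = I
assert np.allclose(x_rec, x)
```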
PCA Derivation: Minimizing Reconstruction Error
Any point in R^n can be perfectly reconstructed in a new orthonormal basis of size n.
Goal: Find an orthonormal basis of m vectors, m < n, that minimizes the reconstruction error.
\hat x = [\vec u_1 \cdots \vec u_m]\begin{bmatrix} y_1 \\ \vdots \\ y_m \end{bmatrix} + [\vec u_{m+1} \cdots \vec u_n]\begin{bmatrix} y_{m+1} \\ \vdots \\ y_n \end{bmatrix}

\hat x = U_m y_m + U_d b = \hat x(m) + \hat x_{\text{discard}}

Define a reconstruction based on the 'best' m vectors:

\hat x(m) = [\vec u_1\ \vec u_2\ \cdots\ \vec u_m]\begin{bmatrix} y_1 \\ \vdots \\ y_m \end{bmatrix}

\text{Err}_{\text{recon}}^2 = \sum_{k=1}^{N_{\text{samples}}} (x_k - \hat x_k)^T (x_k - \hat x_k)
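The summed reconstruction error can be computed directly; this NumPy sketch uses hypothetical random data and an arbitrary orthonormal basis, so the numbers are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(1)
N, n, m = 100, 3, 2
X = rng.normal(size=(N, n))                    # N samples as rows

U, _ = np.linalg.qr(rng.normal(size=(n, n)))   # some orthonormal basis
Y = X @ U[:, :m]                               # keep only the first m coordinates
X_hat = Y @ U[:, :m].T                         # per-sample reconstruction x-hat(m)

# Err_recon^2 = sum_k (x_k - x_hat_k)^T (x_k - x_hat_k)
err = np.sum((X - X_hat) ** 2)
```

With m = n the error vanishes; truncating to m < n leaves a positive residual.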
Visualizing Reconstruction Error
[Figure: data scatter shown as 2D vectors x; a candidate direction u (and a better direction u_good), with each point split into its projection x_p along u and a perpendicular residual.]
The solution involves finding the directions u that minimize the perpendicular distances, and removing them.
\Delta x(m) = x - \hat x(m) = \sum_{i=1}^{n} y_i \vec u_i - \left( \sum_{i=1}^{m} y_i \vec u_i + \sum_{i=m+1}^{n} b_i \vec u_i \right) = \sum_{i=m+1}^{n} (y_i - b_i)\,\vec u_i

\text{Err}_{\text{recon}}^2 = E\big[\|\Delta x(m)\|^2\big] = E\left[ \sum_{j=m+1}^{n} \sum_{i=m+1}^{n} (y_j - b_j)\,\vec u_j^T\,(y_i - b_i)\,\vec u_i \right]

= E\left[ \sum_{j=m+1}^{n} \sum_{i=m+1}^{n} (y_i - b_i)(y_j - b_j)\,\vec u_i^T \vec u_j \right]

= E\left[ \sum_{i=m+1}^{n} (y_i - b_i)^2 \right] = \sum_{i=m+1}^{n} E\big[(y_i - b_i)^2\big]

(the double sum collapses because \vec u_i^T \vec u_j = \delta_{ij} for an orthonormal basis)
Goal: Find the basis vectors u_i and constants b_i that minimize the reconstruction error.
Solving for b:

\frac{\partial \text{Err}}{\partial b_i} = \frac{\partial}{\partial b_i} E\big[(y_i - b_i)^2\big] = -2\big(E[y_i] - b_i\big) = 0 \;\Rightarrow\; b_i = E[y_i]

Therefore, replace the discarded dimensions y_i by their expected values.
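The result b_i = E[y_i] is easy to verify numerically. This sketch (hypothetical Gaussian samples) scans candidate constants b and finds that the mean squared error is minimized at the sample mean:

```python
import numpy as np

rng = np.random.default_rng(2)
y = rng.normal(loc=3.0, scale=1.0, size=100_000)

# Scan candidate constants b and measure the mean squared error E[(y - b)^2].
bs = np.linspace(0.0, 6.0, 601)
errs = np.array([np.mean((y - b) ** 2) for b in bs])
best_b = bs[np.argmin(errs)]

# The minimizer lands (up to grid resolution) on the sample mean E[y].
assert abs(best_b - y.mean()) < 0.01
```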
Now rewrite the error, replacing each b_i by E[y_i]:

\sum_{i=m+1}^{n} E\big[(y_i - E[y_i])^2\big] = \sum_{i=m+1}^{n} E\big[(x^T \vec u_i - E[x^T \vec u_i])^2\big]

= \sum_{i=m+1}^{n} E\big[(x^T \vec u_i - E[x^T \vec u_i])^T (x^T \vec u_i - E[x^T \vec u_i])\big]

= \sum_{i=m+1}^{n} E\big[\vec u_i^T (x - E[x])(x - E[x])^T \vec u_i\big]

= \sum_{i=m+1}^{n} \vec u_i^T\, E\big[(x - E[x])(x - E[x])^T\big]\, \vec u_i = \sum_{i=m+1}^{n} \vec u_i^T C\, \vec u_i

where C is the covariance matrix of x.
Thus, finding the best basis u_i involves minimizing the quadratic form

\text{Err} = \sum_{i=m+1}^{n} \vec u_i^T C\, \vec u_i

subject to the constraint \|\vec u_i\| = 1.
Using Lagrange multipliers, we form the constrained error function:

\text{Err} = \sum_{i=m+1}^{n} \vec u_i^T C \vec u_i + \lambda_i \big(1 - \vec u_i^T \vec u_i\big)

\frac{\partial \text{Err}}{\partial \vec u_i} = \frac{\partial}{\partial \vec u_i}\Big( \vec u_i^T C \vec u_i + \lambda_i\big(1 - \vec u_i^T \vec u_i\big) \Big) = 2C\vec u_i - 2\lambda_i \vec u_i = 0
which results in the eigenvector problem

C\,\vec u_i = \lambda_i\,\vec u_i
Plugging back into the error:

\text{Err} = \sum_{i=m+1}^{n} \vec u_i^T C \vec u_i + \lambda_i\big(1 - \vec u_i^T \vec u_i\big) = \sum_{i=m+1}^{n} \vec u_i^T (\lambda_i \vec u_i) + 0 = \sum_{i=m+1}^{n} \lambda_i

Thus the solution is to discard the n − m eigenvectors with the smallest eigenvalues.
PCA summary:
1) Compute the data covariance matrix
2) Perform an eigenanalysis of the covariance matrix
3) Throw out the smallest-eigenvalue eigenvectors
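The three-step summary can be sketched in a few lines of NumPy (the data and variable names here are hypothetical):

```python
import numpy as np

def pca(X, m):
    """PCA per the summary: covariance, eigenanalysis, keep the top-m eigenvectors."""
    Xc = X - X.mean(axis=0)                  # center the data
    C = np.cov(Xc, rowvar=False)             # 1) data covariance
    evals, evecs = np.linalg.eigh(C)         # 2) eigenanalysis (C is symmetric)
    order = np.argsort(evals)[::-1]          # sort eigenvalues descending
    evals, evecs = evals[order], evecs[:, order]
    return evals, evecs[:, :m]               # 3) keep m largest; discard the rest

rng = np.random.default_rng(3)
# Hypothetical data: a 2D signal embedded in 5D with small noise.
X = rng.normal(size=(500, 2)) @ rng.normal(size=(2, 5)) + 0.01 * rng.normal(size=(500, 5))
evals, U_m = pca(X, m=2)
Y = (X - X.mean(axis=0)) @ U_m               # projected coordinates
```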
Problem: How many eigenvectors should we keep? There are many criteria, e.g. the fraction of total data variance discarded:

\max(m)\ \text{such that}\ \frac{\sum_{i=m+1}^{n} \lambda_i}{\sum_{i=1}^{n} \lambda_i} < 0.05
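The variance criterion can be sketched as follows (a NumPy helper; the `choose_m` name and the 0.95/0.05 threshold split are illustrative assumptions):

```python
import numpy as np

def choose_m(evals, keep=0.95):
    """Smallest m whose discarded eigenvalue mass falls below (1 - keep)."""
    evals = np.sort(evals)[::-1]
    frac_discarded = 1.0 - np.cumsum(evals) / evals.sum()
    # argmax on a boolean array returns the first True index.
    return int(np.argmax(frac_discarded < (1.0 - keep)) + 1)

evals = np.array([5.0, 3.0, 1.0, 0.5, 0.3, 0.2])
m = choose_m(evals, keep=0.95)   # keep enough eigenvectors to retain 95% of variance
```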
Extensions: ICA
• Find the 'best' linear basis, minimizing the statistical dependence between the projected components.
Problem: Find the c hidden independent sources x_i.
Observation model: the sensed signals are an unknown linear mixture of the sources.
ICA problem statement: Recover the source signals from the sensed signals. More specifically, we seek a real matrix W such that z(t) = Wy(t) is an estimate of x(t). Depending on the density assumptions, ICA can have easy or hard solutions. Solve via:
• Gradient approach
• Kurtotic ICA: two lines of matlab code.
  – http://www.cs.toronto.edu/~roweis/kica.html
  – yy are the mixed measurements (one per column); W is the unmixing matrix.

% W = kica(yy);
xx = sqrtm(inv(cov(yy')))*(yy-repmat(mean(yy,2),1,size(yy,2)));
[W,ss,vv] = svd((repmat(sum(xx.*xx,1),size(xx,1),1).*xx)*xx');
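For readers without matlab, here is a rough NumPy port of the two lines above (an assumed translation; the recovered sources are, as in any ICA, only defined up to sign and permutation):

```python
import numpy as np

def kica(yy):
    """Rough NumPy port of the two-line kurtotic ICA above.
    yy holds the mixed measurements, one per column."""
    # Whiten: remove the mean, then apply the inverse matrix square root of
    # the covariance so that the whitened data has identity covariance.
    C = np.cov(yy)
    evals, evecs = np.linalg.eigh(C)
    whiten = evecs @ np.diag(evals ** -0.5) @ evecs.T
    xx = whiten @ (yy - yy.mean(axis=1, keepdims=True))
    # SVD of the kurtosis-weighted scatter matrix gives the unmixing rotation.
    W, _, _ = np.linalg.svd((np.sum(xx * xx, axis=0) * xx) @ xx.T)
    return W, xx

# Hypothetical demo: two uniform (sub-Gaussian) sources mixed by a random matrix.
rng = np.random.default_rng(4)
S = rng.uniform(-1, 1, size=(2, 2000))     # independent sources
A = rng.normal(size=(2, 2))                # unknown mixing matrix
yy = A @ S
W, xx = kica(yy)
z = W.T @ xx                               # estimated sources (up to sign/permutation)
```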
Kernel PCA
• PCA after non-linear transformation
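A minimal sketch of the idea, assuming an RBF kernel (the `kernel_pca` helper and its `gamma` parameter are illustrative, not from the slides):

```python
import numpy as np

def kernel_pca(X, m, gamma=1.0):
    """Kernel-PCA sketch: ordinary PCA carried out implicitly after a
    non-linear (here RBF) feature map."""
    # RBF kernel matrix K_ij = exp(-gamma * ||x_i - x_j||^2).
    sq = np.sum(X ** 2, axis=1)
    K = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2.0 * X @ X.T))
    # Center the kernel matrix (the implicit features must be zero-mean for PCA).
    N = X.shape[0]
    one = np.full((N, N), 1.0 / N)
    Kc = K - one @ K - K @ one + one @ K @ one
    # Eigenanalysis of the centered kernel; keep the m largest components.
    evals, evecs = np.linalg.eigh(Kc)
    top = np.argsort(evals)[::-1][:m]
    alphas = evecs[:, top] / np.sqrt(np.maximum(evals[top], 1e-12))
    return Kc @ alphas          # projections of the training points

rng = np.random.default_rng(5)
X = rng.normal(size=(50, 2))
Z = kernel_pca(X, m=2, gamma=0.5)
```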
PCA on aligned face images:
http://www-white.media.mit.edu/vismod/demos/facerec/basic.html