Pattern Recognition for Vision, Fall 2004

Principal Component Analysis & Independent Component Analysis

Overview

Principal Component Analysis
• Purpose
• Derivation
• Examples

Independent Component Analysis
• Generative Model
• Purpose
• Restrictions
• Computation
• Example

Literature
Notation & Basics

Vector: $\mathbf{u} \in \mathbb{R}^N$;  matrix: $\mathbf{U}$;  scalar: $a$.

Dot product written as matrix product: $\mathbf{u}^T\mathbf{v} \in \mathbb{R}$.

Product of a column vector with a row vector written as matrix product: $\mathbf{u}\mathbf{u}^T$ is an $N \times N$ matrix, and not the squared norm $\|\mathbf{u}\|_2^2 = \mathbf{u}^T\mathbf{u}$.

Rules for matrix multiplication:

$$\mathbf{u}\mathbf{u}^T\mathbf{u} = \mathbf{u}(\mathbf{u}^T\mathbf{u}) = a\,\mathbf{u}$$
$$(\mathbf{U}\mathbf{V})\mathbf{W} = \mathbf{U}(\mathbf{V}\mathbf{W})$$
$$(\mathbf{U} + \mathbf{V})\mathbf{W} = \mathbf{U}\mathbf{W} + \mathbf{V}\mathbf{W}$$
$$(\mathbf{U}\mathbf{V})^T = \mathbf{V}^T\mathbf{U}^T$$
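These rules are easy to sanity-check numerically. A minimal sketch (NumPy, random matrices as stand-in data; not part of the original slides):

```python
# Numerical sanity check of the matrix rules above (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
u = rng.standard_normal((3, 1))                  # column vector u
U, V, W = (rng.standard_normal((3, 3)) for _ in range(3))

a = (u.T @ u).item()                             # scalar a = u^T u
assert np.allclose(u @ u.T @ u, a * u)           # u u^T u = u (u^T u) = a u
assert np.allclose((U @ V) @ W, U @ (V @ W))     # associativity
assert np.allclose((U + V) @ W, U @ W + V @ W)   # distributivity
assert np.allclose((U @ V).T, V.T @ U.T)         # transpose of a product
```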
Principal Component Analysis (PCA)

Purpose

For a set of samples of a random vector $\mathbf{x} \in \mathbb{R}^N$, discover or reduce the dimensionality and identify meaningful variables:

$$\mathbf{y} = \mathbf{U}\mathbf{x}, \qquad \mathbf{U}: p \times N, \quad p < N$$

[Figure: 2-D data cloud with principal directions $\mathbf{u}_1$ and $\mathbf{u}_2$.]
Principal Component Analysis (PCA)

PCA by Variance Maximization

[Figure: data projected onto two directions $\mathbf{u}_A$ and $\mathbf{u}_B$ with $\sigma^2_A > \sigma^2_B$.]

Find the vector $\mathbf{u}_1$ such that the variance of the data along this direction is maximized:

$$\sigma_1^2 = E\{(\mathbf{u}_1^T\mathbf{x})^2\}, \qquad E\{\mathbf{x}\} = \mathbf{0}, \qquad \|\mathbf{u}_1\| = 1$$
$$\sigma_1^2 = E\{(\mathbf{u}_1^T\mathbf{x})(\mathbf{x}^T\mathbf{u}_1)\} = \mathbf{u}_1^T\,E\{\mathbf{x}\mathbf{x}^T\}\,\mathbf{u}_1$$
$$\sigma_1^2 = \mathbf{u}_1^T\mathbf{C}\,\mathbf{u}_1, \qquad \mathbf{C} = E\{\mathbf{x}\mathbf{x}^T\}$$

The solution is the eigenvector $\mathbf{e}_1$ of $\mathbf{C}$ with the largest eigenvalue $\lambda_1$:

$$\mathbf{C}\mathbf{e}_1 = \lambda_1\mathbf{e}_1 \;\Leftrightarrow\; \sigma_1^2 = \mathbf{e}_1^T\mathbf{C}\,\mathbf{e}_1 = \lambda_1$$
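A minimal sketch of this result on synthetic zero-mean data (the mixing used to generate the samples below is made up): the variance along the top eigenvector of the sample covariance equals the largest eigenvalue.

```python
# First principal component by eigendecomposition of C = E{x x^T} (sketch).
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((1000, 2)) @ np.array([[3.0, 0.0], [1.0, 0.5]])
X -= X.mean(axis=0)                      # enforce E{x} = 0 as in the derivation

C = X.T @ X / len(X)                     # sample covariance matrix C
lam, E = np.linalg.eigh(C)               # eigh: C is symmetric
u1 = E[:, -1]                            # eigenvector with largest eigenvalue
print(lam[-1], np.var(X @ u1))           # sigma_1^2 = lambda_1 = var along u1
```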
Principal Component Analysis (PCA)

PCA by Variance Maximization

For a given $p < N$, find orthonormal basis vectors $\{\mathbf{u}_i\}$ such that the variance of the data along these vectors is maximally large, under the constraint of decorrelation:

$$E\{(\mathbf{u}_i^T\mathbf{x})(\mathbf{u}_n^T\mathbf{x})\} = 0, \qquad \mathbf{u}_i^T\mathbf{u}_n = 0, \qquad n \neq i$$

The solutions are the eigenvectors of $\mathbf{C}$ ordered according to decreasing eigenvalues:

$$\mathbf{u}_1 = \mathbf{e}_1,\; \mathbf{u}_2 = \mathbf{e}_2,\; \ldots,\; \mathbf{u}_p = \mathbf{e}_p, \qquad \lambda_1 > \lambda_2 > \ldots > \lambda_p$$

Proof of decorrelation for eigenvectors ($i \neq n$):

$$E\{(\mathbf{e}_i^T\mathbf{x})(\mathbf{e}_n^T\mathbf{x})\} = \mathbf{e}_i^T\,E\{\mathbf{x}\mathbf{x}^T\}\,\mathbf{e}_n = \mathbf{e}_i^T\mathbf{C}\,\mathbf{e}_n = \lambda_n\,\mathbf{e}_i^T\mathbf{e}_n = 0,$$

since eigenvectors of a symmetric matrix are orthogonal.
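The decorrelation property can be checked directly on data; a quick sketch (synthetic samples, arbitrary mixing):

```python
# Projections onto distinct eigenvectors of C are uncorrelated (sketch).
import numpy as np

rng = np.random.default_rng(6)
X = rng.standard_normal((4000, 3)) @ rng.standard_normal((3, 3))
X -= X.mean(axis=0)

lam, E = np.linalg.eigh(X.T @ X / len(X))
Y = X @ E                                # columns: y_i = e_i^T x per sample
print(np.round(Y.T @ Y / len(Y), 8))     # diagonal = eigenvalues, rest = 0
```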
Principal Component Analysis (PCA)

PCA by Mean Square Error Compression

[Figure: $\mathbf{x}$, its projection $\hat{\mathbf{x}}$, and a basis vector $\mathbf{u}_k$.]

For a given $p < N$, find orthonormal basis vectors $\{\mathbf{u}_k\}$ such that the mse between $\mathbf{x}$ and its projection $\hat{\mathbf{x}}$ into the subspace spanned by the orthonormal basis vectors is minimum:

$$mse = E\{\|\mathbf{x} - \hat{\mathbf{x}}\|^2\}, \qquad \hat{\mathbf{x}} = \sum_{k=1}^{p}(\mathbf{u}_k^T\mathbf{x})\,\mathbf{u}_k, \qquad \mathbf{u}_k^T\mathbf{u}_m = \delta_{km}$$
Principal Component Analysis (PCA)

$$mse = E\{\|\mathbf{x} - \hat{\mathbf{x}}\|^2\} = E\Big\{\big\|\mathbf{x} - \sum_{k=1}^{p}\underbrace{(\mathbf{u}_k^T\mathbf{x})}_{\text{scalar}}\mathbf{u}_k\big\|^2\Big\}$$
$$= E\{\|\mathbf{x}\|^2\} - 2\sum_{k=1}^{p} E\{(\mathbf{u}_k^T\mathbf{x})(\mathbf{x}^T\mathbf{u}_k)\} + \sum_{k=1}^{p}\sum_{n=1}^{p} E\{(\mathbf{u}_k^T\mathbf{x})(\mathbf{u}_n^T\mathbf{x})\,\underbrace{\mathbf{u}_k^T\mathbf{u}_n}_{\delta_{kn}}\}$$
$$= E\{\|\mathbf{x}\|^2\} - 2\sum_{k=1}^{p} E\{(\mathbf{u}_k^T\mathbf{x})^2\} + \sum_{k=1}^{p} E\{(\mathbf{u}_k^T\mathbf{x})^2\}$$
$$= E\{\|\mathbf{x}\|^2\} - \sum_{k=1}^{p} E\{(\mathbf{u}_k^T\mathbf{x})^2\}$$
$$= \mathrm{trace}(\mathbf{C}) - \sum_{k=1}^{p}\mathbf{u}_k^T\mathbf{C}\,\mathbf{u}_k, \qquad \mathbf{C} = E\{\mathbf{x}\mathbf{x}^T\}$$
Principal Component Analysis (PCA)

$$mse = \mathrm{trace}(\mathbf{C}) - \underbrace{\sum_{k=1}^{p}\mathbf{u}_k^T\mathbf{C}\,\mathbf{u}_k}_{\text{maximize}}, \qquad \mathbf{C} = E\{\mathbf{x}\mathbf{x}^T\}$$

The solution to minimizing the mse is any (orthonormal) basis of the subspace spanned by the first $p$ eigenvectors $\mathbf{e}_1, \ldots, \mathbf{e}_p$ of $\mathbf{C}$:

$$mse = \mathrm{trace}(\mathbf{C}) - \sum_{k=1}^{p}\lambda_k = \sum_{k=p+1}^{N}\lambda_k$$

The mse is the sum of the eigenvalues corresponding to the discarded eigenvectors $\mathbf{e}_{p+1}, \ldots, \mathbf{e}_N$ of $\mathbf{C}$.
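This identity is easy to verify numerically; a small sketch (illustrative data with a known spread per axis):

```python
# mse of a rank-p reconstruction equals the sum of discarded eigenvalues.
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((5000, 6)) * np.array([5, 4, 3, 2, 1, 0.5])
X -= X.mean(axis=0)

C = X.T @ X / len(X)
lam, E = np.linalg.eigh(C)               # ascending order
lam, E = lam[::-1], E[:, ::-1]           # reorder to decreasing eigenvalues

p = 3
U = E[:, :p]                             # first p eigenvectors
X_hat = (X @ U) @ U.T                    # projection onto the subspace
mse = np.mean(np.sum((X - X_hat) ** 2, axis=1))
print(mse, lam[p:].sum())                # the two numbers agree
```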
Principal Component Analysis (PCA)

How to determine the number of principal components $p$?

Linear signal model with unknown number of signals:

$$\mathbf{x} = \mathbf{A}\mathbf{s} + \mathbf{n}, \qquad \mathbf{A} = \begin{pmatrix} a_{1,1} & \cdots & a_{1,p} \\ \vdots & & \vdots \\ a_{N,1} & \cdots & a_{N,p} \end{pmatrix}: N \times p, \quad p < N$$

The signals $s_i$ have zero mean and are uncorrelated, $\mathbf{n}$ is white noise:

$$E\{\mathbf{s}\mathbf{s}^T\} = \mathbf{I}, \qquad E\{\mathbf{n}\mathbf{n}^T\} = \sigma_n^2\,\mathbf{I}$$

$$\mathbf{C} = E\{\mathbf{x}\mathbf{x}^T\} = E\{(\mathbf{A}\mathbf{s})(\mathbf{A}\mathbf{s})^T\} + E\{\mathbf{n}\mathbf{n}^T\} + 2\,E\{\mathbf{A}\mathbf{s}\,\mathbf{n}^T\}$$
$$= \mathbf{A}\,E\{\mathbf{s}\mathbf{s}^T\}\,\mathbf{A}^T + \sigma_n^2\,\mathbf{I} = \mathbf{A}\mathbf{A}^T + \sigma_n^2\,\mathbf{I},$$

where the cross term vanishes because signals and noise are uncorrelated. The eigenvalues $d_i$ of $\mathbf{C}$ therefore satisfy

$$d_1 > d_2 > \ldots > d_p > d_{p+1} = \ldots = d_N = \sigma_n^2$$

→ cut off where the eigenvalues become constant.
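A sketch of this cutoff rule on simulated data (all model quantities below are made up): beyond the true $p$, the eigenvalues flatten at $\sigma_n^2$.

```python
# Eigenvalue spectrum of C = A A^T + sigma_n^2 I for the model x = A s + n.
import numpy as np

rng = np.random.default_rng(3)
N, p, M, sigma_n = 10, 3, 20000, 0.3
A = rng.standard_normal((N, p))
S = rng.standard_normal((p, M))          # uncorrelated unit-variance signals
X = A @ S + sigma_n * rng.standard_normal((N, M))

C = X @ X.T / M
d = np.sort(np.linalg.eigvalsh(C))[::-1]
print(np.round(d, 2))                    # 3 large values, then plateau at 0.09
```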
Principal Component Analysis (PCA)

Computing the PCA

Given a set of samples $\{\mathbf{x}_1, \ldots, \mathbf{x}_M\}$ of a random vector $\mathbf{x}$, calculate mean and covariance:

$$\boldsymbol{\mu} = \frac{1}{M}\sum_{i=1}^{M}\mathbf{x}_i, \qquad \mathbf{x}_i \rightarrow \mathbf{x}_i - \boldsymbol{\mu}$$
$$\mathbf{C} = \frac{1}{M}\sum_{i=1}^{M}(\mathbf{x}_i - \boldsymbol{\mu})(\mathbf{x}_i - \boldsymbol{\mu})^T$$

Compute the eigenvectors of $\mathbf{C}$, e.g. with the QR algorithm.
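A direct transcription of this recipe (sketch; NumPy's symmetric eigensolver stands in for the QR algorithm named above):

```python
# PCA from samples: mean, centering, covariance, eigendecomposition (sketch).
import numpy as np

def pca(X):
    """X: M x N array of samples. Returns mean, eigenvalues in decreasing
    order, and the corresponding eigenvectors as columns."""
    mu = X.mean(axis=0)                  # mu = (1/M) sum x_i
    Xc = X - mu                          # x_i -> x_i - mu
    C = Xc.T @ Xc / len(X)               # C = (1/M) sum (x-mu)(x-mu)^T
    lam, E = np.linalg.eigh(C)
    order = np.argsort(lam)[::-1]
    return mu, lam[order], E[:, order]
```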
Principal Component Analysis (PCA)

Computing the PCA

If the number of samples $M$ is smaller than the dimensionality $N$ of $\mathbf{x}$:

$$\mathbf{B} = \begin{pmatrix} x_{1,1} & \cdots & x_{1,M} \\ \vdots & & \vdots \\ x_{N,1} & \cdots & x_{N,M} \end{pmatrix}, \qquad \mathbf{C} = \mathbf{B}\mathbf{B}^T, \qquad \mathbf{B}: N \times M, \quad \mathbf{B}^T\mathbf{B}: M \times M$$

Compute the eigenvectors $\mathbf{e}'$ of the smaller matrix $\mathbf{B}^T\mathbf{B}$:

$$\mathbf{B}^T\mathbf{B}\,\mathbf{e}' = \lambda'\,\mathbf{e}'$$
$$\mathbf{B}\mathbf{B}^T(\mathbf{B}\mathbf{e}') = \lambda'\,(\mathbf{B}\mathbf{e}')$$
$$\Rightarrow \quad \mathbf{e} = \mathbf{B}\mathbf{e}', \qquad \lambda = \lambda'$$

→ reducing complexity from $O(N^2)$ to $O(M^2)$.
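A sketch of this trick (hypothetical helper, assuming centered samples as columns):

```python
# Snapshot method: eigenvectors of the small M x M matrix B^T B lift to
# eigenvectors of C = B B^T via e = B e' (sketch).
import numpy as np

def snapshot_pca(B):
    """B: N x M matrix of centered samples as columns, M < N."""
    lam, Ep = np.linalg.eigh(B.T @ B)    # M x M problem instead of N x N
    E = B @ Ep                           # e = B e' are eigenvectors of B B^T
    E /= np.linalg.norm(E, axis=0)       # ||B e'|| = sqrt(lambda'), renormalize
    order = np.argsort(lam)[::-1]        # in practice, drop near-zero pairs
    return lam[order], E[:, order]
```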
Principal Component Analysis (PCA)

Examples

Eigenfaces for face recognition (Turk & Pentland):

Training:
- Calculate the eigenspace for all faces in the training database
- Project each face into the eigenspace → feature reduction

Classification:
- Project new face into eigenspace
- Nearest neighbor in the eigenspace
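A hedged sketch of this pipeline (function and variable names are assumptions, not Turk & Pentland's code):

```python
# Eigenfaces: train an eigenspace, classify by nearest neighbor (sketch).
import numpy as np

def train(faces, p):
    """faces: M x N array, one vectorized face image per row."""
    mu = faces.mean(axis=0)
    Xc = faces - mu
    lam, E = np.linalg.eigh(Xc.T @ Xc / len(faces))
    U = E[:, np.argsort(lam)[::-1][:p]]  # first p eigenfaces
    return mu, U, Xc @ U                 # training faces projected to eigenspace

def classify(face, mu, U, features, labels):
    y = (face - mu) @ U                  # project new face into eigenspace
    return labels[np.argmin(np.linalg.norm(features - y, axis=1))]
```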
Principal Component Analysis (PCA)

Examples cont.

Feature reduction/extraction

[Figure: original image and reconstruction with 20 principal components; source: http://www.nist.gov/]
Independent Component Analysis (ICA)

Generative model

Noise free, linear signal model:

$$\mathbf{x} = \mathbf{A}\mathbf{s} = \sum_{i=1}^{N}\mathbf{a}_i s_i$$

$\mathbf{x} = (x_1, \ldots, x_N)^T$: observed variables

$\mathbf{s} = (s_1, \ldots, s_N)^T$: latent signals, independent components

$\mathbf{A} = (\mathbf{a}_1, \ldots, \mathbf{a}_N) = \begin{pmatrix} a_{1,1} & \cdots & a_{1,N} \\ \vdots & & \vdots \\ a_{N,1} & \cdots & a_{N,N} \end{pmatrix}$: unknown $N \times N$ mixing matrix
Independent Component Analysis (ICA)

Task

For the linear, noise free signal model, compute $\mathbf{A}$ and $\mathbf{s}$ given the measurements $\mathbf{x}$.

Blind source separation: separate the three original signals $s_1$, $s_2$, and $s_3$ from their mixtures $x_1$, $x_2$, and $x_3$.

[Figure by MIT OCW: waveforms of the sources $s_1$, $s_2$, $s_3$ and of the mixtures $x_1$, $x_2$, $x_3$.]
Independent Component Analysis (ICA)

Restrictions

1.) Statistical independence

The signals $s_i$ must be statistically independent:

$$p(s_1, s_2, \ldots, s_N) = p(s_1)\,p(s_2)\cdots p(s_N)$$

Independent variables satisfy

$$E\{g_1(s_1)\,g_2(s_2)\cdots g_N(s_N)\} = E\{g_1(s_1)\}\,E\{g_2(s_2)\}\cdots E\{g_N(s_N)\}$$

for any $g_i(s_i) \in L^1$. For two variables:

$$E\{g_1(s_1)\,g_2(s_2)\} = \int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty} g_1(s_1)\,g_2(s_2)\,p(s_1, s_2)\,ds_1\,ds_2$$
$$= \int_{-\infty}^{\infty} g_1(s_1)\,p(s_1)\,ds_1 \int_{-\infty}^{\infty} g_2(s_2)\,p(s_2)\,ds_2 = E\{g_1(s_1)\}\,E\{g_2(s_2)\}$$
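The factorization property can be illustrated with Monte-Carlo samples (the distributions and test functions below are arbitrary choices, not from the slides):

```python
# E{g1(s1) g2(s2)} factorizes for independent s1, s2, but not in general.
import numpy as np

rng = np.random.default_rng(4)
s1 = rng.uniform(-1, 1, 100_000)
s2 = rng.uniform(-1, 1, 100_000)         # drawn independently of s1
g1, g2 = np.abs, np.square               # two arbitrary test functions

print(np.mean(g1(s1) * g2(s2)),          # ~ E{g1}E{g2}: factorizes
      np.mean(g1(s1)) * np.mean(g2(s2)))
print(np.mean(g1(s1) * g2(s1)),          # dependent case (s2 = s1): 0.25 ...
      np.mean(g1(s1)) * np.mean(g2(s1))) # ... vs. 0.167, no factorization
```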
Independent Component Analysis (ICA)

Restrictions

Statistical independence cont.

$$E\{g_1(s_1)\,g_2(s_2)\cdots g_N(s_N)\} = E\{g_1(s_1)\}\,E\{g_2(s_2)\}\cdots E\{g_N(s_N)\}$$

Independence includes uncorrelatedness:

$$E\{(s_i - \mu_i)(s_n - \mu_n)\} = E\{s_i - \mu_i\}\,E\{s_n - \mu_n\} = 0$$
Independent Component Analysis (ICA)

Restrictions

2.) Nongaussian components

The components must have a nongaussian distribution, otherwise there is no unique solution. Example: given $\mathbf{A}$ and two gaussian signals

$$p(s_1, s_2) = \frac{1}{2\pi\sigma_1\sigma_2}\exp\Big(-\frac{s_1^2}{2\sigma_1^2} - \frac{s_2^2}{2\sigma_2^2}\Big),$$

generate new signals with the scaling matrix $\mathbf{S}$:

$$\mathbf{s}' = \mathbf{S}\mathbf{s} = \begin{pmatrix} 1/\sigma_1 & 0 \\ 0 & 1/\sigma_2 \end{pmatrix}\mathbf{s} \;\Rightarrow\; p(s_1', s_2') = \frac{1}{2\pi}\exp\Big(-\frac{s_1'^2}{2}\Big)\exp\Big(-\frac{s_2'^2}{2}\Big)$$

[Figure: density contours before scaling (widths $\sigma_1$, $\sigma_2$) and after scaling (unit variance).]
Independent Component Analysis (ICA)

Restrictions

Nongaussian components cont.

Under rotation the components remain independent:

$$\mathbf{s}'' = \mathbf{R}\mathbf{s}', \qquad \mathbf{R} = \begin{pmatrix} \cos\varphi & \sin\varphi \\ -\sin\varphi & \cos\varphi \end{pmatrix}, \qquad p(s_1'', s_2'') = \frac{1}{2\pi}\exp\Big(-\frac{s_1''^2}{2}\Big)\exp\Big(-\frac{s_2''^2}{2}\Big)$$

Combine whitening and rotation, $\mathbf{B} = \mathbf{R}\mathbf{S}$:

$$\mathbf{x} = \mathbf{A}\mathbf{s} = (\mathbf{A}\mathbf{B}^{-1})\,\mathbf{s}''$$

$\mathbf{A}\mathbf{B}^{-1}$ is also a solution to the ICA problem.
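This ambiguity shows up directly in simulation (a sketch with an arbitrary angle): after whitening, any rotation of a gaussian pair yields identically distributed, uncorrelated components, so the rotation cannot be identified from the data.

```python
# Rotated whitened gaussians are indistinguishable from the originals.
import numpy as np

rng = np.random.default_rng(5)
s = rng.standard_normal((2, 100_000))    # whitened gaussian components
phi = 0.7                                # arbitrary rotation angle
R = np.array([[np.cos(phi), np.sin(phi)],
              [-np.sin(phi), np.cos(phi)]])

print(np.round(np.cov(s), 2))            # ~ identity
print(np.round(np.cov(R @ s), 2))        # also ~ identity: phi unrecoverable
```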
Independent Component Analysis (ICA)

Restrictions

3.) Mixing matrix must be invertible

The number of independent components is equal to the number of observed variables, which means that there are no redundant mixtures. In case the mixing matrix is not invertible, apply PCA to the measurements first to remove the redundancy.
Independent Component Analysis (ICA)

Ambiguities

1.) Scale

$$\mathbf{x} = \sum_{i=1}^{N}\mathbf{a}_i s_i = \sum_{i=1}^{N}\Big(\frac{1}{\alpha_i}\mathbf{a}_i\Big)(\alpha_i s_i)$$

Reduce the ambiguity by enforcing $E\{s_i^2\} = 1$.

2.) Order

We cannot determine an order of the independent components.
Independent Component Analysis (ICA)

Computing ICA

a) Minimizing mutual information:

$$\hat{\mathbf{s}} = \hat{\mathbf{A}}^{-1}\mathbf{x}$$

Mutual information: $I(\hat{\mathbf{s}}) = \sum_{i=1}^{N} H(\hat{s}_i) - H(\hat{\mathbf{s}})$

$H$ is the differential entropy: $H(\hat{\mathbf{s}}) = -\int p(\hat{\mathbf{s}})\log p(\hat{\mathbf{s}})\,d\hat{\mathbf{s}}$

$I$ is always nonnegative and 0 only if the $\hat{s}_i$ are independent.

Iteratively modify $\hat{\mathbf{A}}^{-1}$ such that $I(\hat{\mathbf{s}})$ is minimized.
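One rough way to see this criterion in action is a binned estimate of $I$ for two components; the sketch below uses discrete histogram entropies as a stand-in for the differential quantities (a crude estimator chosen for brevity, not the method of the slides):

```python
# Histogram estimate of I(s1, s2) = H(s1) + H(s2) - H(s1, s2) (sketch).
import numpy as np

def mutual_information(s, bins=50):
    """s: 2 x M samples. Binned estimate; near 0 for independent components."""
    p12, _, _ = np.histogram2d(s[0], s[1], bins=bins)
    p12 /= p12.sum()
    p1, p2 = p12.sum(axis=1), p12.sum(axis=0)
    h = lambda p: -np.sum(p[p > 0] * np.log(p[p > 0]))
    return h(p1) + h(p2) - h(p12)

rng = np.random.default_rng(7)
ind = rng.standard_normal((2, 100_000))           # independent pair
dep = np.vstack([ind[0], ind[0] + 0.3 * ind[1]])  # dependent pair
print(mutual_information(ind), mutual_information(dep))  # small vs. large
```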
Independent Component Analysis (ICA)

Computing ICA cont.

b) Maximizing nongaussianity:

$$\hat{\mathbf{s}} = \hat{\mathbf{A}}^{-1}\mathbf{x}$$

Introduce $y = \mathbf{b}^T\mathbf{x} = \mathbf{b}^T\mathbf{A}\mathbf{s} = \mathbf{q}^T\mathbf{s}$.

From the central limit theorem: $y$ is more gaussian than any of the $s_i$ and becomes least gaussian if $y = s_i$.

Iteratively modify $\mathbf{b}$ such that the "gaussianity" of $y = \mathbf{b}^T\mathbf{x}$ is minimized. When a local minimum is reached, $\mathbf{b}^T$ is a row vector of $\mathbf{A}^{-1}$.
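A minimal sketch of approach b), using kurtosis as the nongaussianity measure on whitened data; the fixed-point update below is one standard choice (as in FastICA), not necessarily the one intended here:

```python
# Find one row of A^-1 by driving y = b^T x away from gaussianity (sketch).
import numpy as np

def one_component(X, iters=200, seed=0):
    """X: N x M whitened, zero-mean measurements. Returns one unit vector b
    whose projection y = b^T x is maximally nongaussian."""
    rng = np.random.default_rng(seed)
    b = rng.standard_normal(X.shape[0])
    b /= np.linalg.norm(b)
    for _ in range(iters):
        y = b @ X                                 # y = b^T x for all M samples
        b_new = (X * y**3).mean(axis=1) - 3 * b   # extremum of E{y^4} - 3
        b = b_new / np.linalg.norm(b_new)
    return b
```

Further components would be found the same way after deflating (orthogonalizing against) the rows already recovered.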
PCA Applied to Faces
[Figure: face images as points in a feature space with axes $x_1, x_2, x_3$ and principal directions $\mathbf{u}_1$, $\mathbf{u}_2$; a face with pixels $x_{1,1}, \ldots, x_{N,M}$ maps to one vector $\mathbf{x}$.]

Each pixel is a feature, each face image a point in the feature space. The dimension of the feature vector is given by the size of the image.

The $\mathbf{u}_i$ are the eigenvectors, which can be represented as pixel images in the original coordinate system $x_1 \ldots x_N$.
ICA Applied to Faces
[Figure: training images stacked as rows $x_{1,1} \ldots x_{1,M}$ through $x_{N,1} \ldots x_{N,M}$ of the data matrix; each row is one observed variable $x_i$.]

Now each image corresponds to a particular observed variable measured over time ($M$ samples); $N$ is the number of images:

$$\mathbf{x} = \mathbf{A}\mathbf{s} = \sum_{i=1}^{N}\mathbf{a}_i s_i$$

$\mathbf{x} = (x_1, \ldots, x_N)^T$: observed variables

$\mathbf{s} = (s_1, \ldots, s_N)^T$: latent signals, independent components
PCA and ICA for Faces
Features for face recognition

[Image removed due to copyright considerations. See Figure 1 in: Baek, Kyungim, et al. "PCA vs. ICA: A Comparison on the FERET Data Set." International Conference on Computer Vision, Pattern Recognition, and Image Processing, in conjunction with the 6th JCIS, Durham, NC, March 8-14, 2002.]