Object Oriented Data Analysis, Last Time
• Finished NCI 60 Data
• Started detailed look at PCA
• Reviewed linear algebra
Today:
• More linear algebra
• Multivariate Probability Distribution
• PCA as an optimization problem
Detailed Look at PCA
Three important (and interesting) viewpoints:
1. Mathematics
2. Numerics
3. Statistics
1st: Review linear alg. and multivar. prob.
Review of Linear Algebra (Cont.)
Singular Value Decomposition (SVD):
For a matrix $X_{d \times n}$:
Find a diagonal matrix $S_{d \times n}$,
with entries $s_1, \ldots, s_{\min(n,d)}$
called singular values
And unitary (rotation) matrices $U_{d \times d}$, $V_{n \times n}$
(recall $U U^t = I_{d \times d}$, $V V^t = I_{n \times n}$)
so that $X = U S V^t$
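A minimal numpy sketch of this decomposition (the random data matrix and its dimensions are arbitrary, chosen only for illustration):

```python
import numpy as np

# Toy data matrix X (d x n); dimensions are arbitrary here
d, n = 5, 8
rng = np.random.default_rng(0)
X = rng.standard_normal((d, n))

# Full SVD: U is d x d, V^t is n x n, s holds the min(d, n) singular values
U, s, Vt = np.linalg.svd(X, full_matrices=True)

# Rebuild the d x n diagonal matrix S and check X = U S V^t
S = np.zeros((d, n))
S[:min(d, n), :min(d, n)] = np.diag(s)
assert np.allclose(X, U @ S @ Vt)

# U and V are rotations: U U^t = I_d, V V^t = I_n
assert np.allclose(U @ U.T, np.eye(d))
assert np.allclose(Vt.T @ Vt, np.eye(n))
```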
Review of Linear Algebra (Cont.)
Intuition behind Singular Value Decomposition:
For a “linear transf’n” (via matrix multi’n), $X v = U S V^t v = U (S (V^t v))$:
• First rotate (by $V^t$)
• Second rescale coordinate axes (by the $s_i$)
• Third rotate again (by $U$)
• i.e. have diagonalized the transformation
Review of Linear Algebra (Cont.)
SVD Compact Representation:
Useful Labeling:
Singular Values in Decreasing Order: $s_1 \ge \cdots \ge s_{\min(n,d)}$
Note: singular values $= 0$ can be omitted
Let $r$ = # of positive singular values
Then: $X = U_{d \times r} S_{r \times r} V^t_{r \times n}$
Where $U_{d \times r}$, $S_{r \times r}$, $V_{n \times r}$ are truncations of $U$, $S$, $V$
Review of Linear Algebra (Cont.)
SVD Full Representation:
$X_{d \times n} = U_{d \times d} \, S_{d \times n} \, V^t_{n \times n}$
Review of Linear Algebra (Cont.)
SVD Reduced Representation:
$X_{d \times n} = U_{d \times d} \begin{pmatrix} S_{n \times n} \\ 0_{(d-n) \times n} \end{pmatrix} V^t_{n \times n}$
Assumes $d \ge n$
Review of Linear Algebra (Cont.)
SVD Reduced Representation:
$X_{d \times n} = U_{d \times n} \, S_{n \times n} \, V^t_{n \times n}$
Assumes $d \ge n$
Review of Linear Algebra (Cont.)
SVD Compact Representation:
$X_{d \times n} = U_{d \times r} \, S_{r \times r} \, V^t_{r \times n}$
(zero singular values, and the corresponding columns of $U$ and $V$, omitted)
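A short numpy sketch of the three representations (the `full_matrices` flag toggles between the full and reduced forms; the tolerance used to count positive singular values for the compact form is an arbitrary illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 8, 5                        # assumes d >= n, as on the slides
X = rng.standard_normal((d, 3)) @ rng.standard_normal((3, n))  # rank 3

# Full: U is d x d; Reduced: U is d x n
U_full, s_full, Vt_full = np.linalg.svd(X, full_matrices=True)
U_red, s_red, Vt_red = np.linalg.svd(X, full_matrices=False)
print(U_full.shape, U_red.shape)   # (8, 8) (8, 5)

# Compact: keep only the r positive singular values
r = np.sum(s_red > 1e-10)          # cutoff is an arbitrary choice
X_compact = U_red[:, :r] @ np.diag(s_red[:r]) @ Vt_red[:r, :]
assert np.allclose(X, X_compact)   # the rank-r part reproduces X exactly
```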
Review of Linear Algebra (Cont.)
Eigenvalue Decomposition:
For a (symmetric) square matrix $X_{d \times d}$:
Find a diagonal matrix $D = \begin{pmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_d \end{pmatrix}$
And an orthonormal matrix $B_{d \times d}$
(i.e. $B B^t = B^t B = I_{d \times d}$)
So that: $X B = B D$, i.e. $X = B D B^t$
Review of Linear Algebra (Cont.)
Eigenvalue Decomposition (cont.):
• Relation to Singular Value Decomposition (looks similar?)
• Eigenvalue decomposition “harder”
• Since needs $U = V$
• Price is eigenvalue decomp’n is generally complex valued
• Except for $X$ square and symmetric
• Then eigenvalue decomp. is real valued
• Thus is the sing’r value decomp. with: $U = V = B$
Review of Linear Algebra (Cont.)
Better View of Relationship:
Singular Value Dec. $\Rightarrow$ Eigenvalue Dec.
• Start with data matrix: $X_{d \times n}$
• With SVD: $X = U S V^t$
• Create square, symmetric matrix: $X X^t$
• Note that: $X X^t = U S V^t V S U^t = U S^2 U^t$
• Gives Eigenanalysis, $B = U$ & $D = S^2$
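A quick numpy check of this relationship (a sketch on random data; `np.linalg.eigh` returns eigenvalues in increasing order, so they are flipped here to match the decreasing singular values):

```python
import numpy as np

rng = np.random.default_rng(2)
d, n = 4, 10
X = rng.standard_normal((d, n))

U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Eigen-decompose the square, symmetric matrix X X^t
evals, B = np.linalg.eigh(X @ X.T)
evals, B = evals[::-1], B[:, ::-1]   # reorder to decreasing

# Eigenvalues of X X^t are the squared singular values of X
assert np.allclose(evals, s**2)
# Eigenvectors match the left singular vectors, up to sign
assert np.allclose(np.abs(B), np.abs(U))
```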
Review of Linear Algebra (Cont.)
Computation of Singular Value and
Eigenvalue Decompositions:
• Details too complex to spend time here
• A “primitive” of good software packages
• Eigenvalues $\lambda_1, \ldots, \lambda_d$ are unique
• Columns $v_1, \ldots, v_d$ of $B$ are called “eigenvectors”
• Eigenvectors are “$\lambda$-stretched” by $X$: $X v_i = \lambda_i v_i$
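The “$\lambda$-stretched” property is easy to check numerically; a sketch on an arbitrary random symmetric matrix:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 5))
X = (A + A.T) / 2                 # make a symmetric matrix

lam, B = np.linalg.eigh(X)        # X = B D B^t

# Each eigenvector is stretched by its eigenvalue: X v_i = lambda_i v_i
for i in range(5):
    v = B[:, i]
    assert np.allclose(X @ v, lam[i] * v)
```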
Review of Linear Algebra (Cont.)
Eigenvalue Decomp. solves matrix problems:
• Inversion: $X^{-1} = B \begin{pmatrix} \lambda_1^{-1} & & 0 \\ & \ddots & \\ 0 & & \lambda_d^{-1} \end{pmatrix} B^t$
• Square Root: $X^{1/2} = B \begin{pmatrix} \lambda_1^{1/2} & & 0 \\ & \ddots & \\ 0 & & \lambda_d^{1/2} \end{pmatrix} B^t$
• $X$ is positive (nonn’ve, i.e. semi) definite $\iff$ all $\lambda_i > 0$ ($\ge 0$)
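A minimal sketch of both operations via the eigendecomposition, using a random positive definite matrix built as $A A^t + I$ (an arbitrary construction, used only to guarantee positive eigenvalues):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((4, 4))
X = A @ A.T + np.eye(4)           # positive definite, so all lambda_i > 0

lam, B = np.linalg.eigh(X)

# Inversion: replace each lambda_i by 1 / lambda_i
X_inv = B @ np.diag(1.0 / lam) @ B.T
assert np.allclose(X_inv, np.linalg.inv(X))

# Square root: replace each lambda_i by sqrt(lambda_i)
X_half = B @ np.diag(np.sqrt(lam)) @ B.T
assert np.allclose(X_half @ X_half, X)
```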
Recall Linear Algebra (Cont.)
Moore-Penrose Generalized Inverse:
For $X = B \, \mathrm{diag}(\lambda_1, \ldots, \lambda_r, 0, \ldots, 0) \, B^t$,
with $\lambda_1 \ge \cdots \ge \lambda_r > 0$ and $\lambda_{r+1} = \cdots = \lambda_d = 0$,
define $X^{-} = B \, \mathrm{diag}(\lambda_1^{-1}, \ldots, \lambda_r^{-1}, 0, \ldots, 0) \, B^t$
Recall Linear Algebra (Cont.)
Easy to see this satisfies the definition of
Generalized (Pseudo) Inverse:
• $X X^{-} X = X$
• $X^{-} X X^{-} = X^{-}$
• $X X^{-}$ symmetric
• $X^{-} X$ symmetric
Recall Linear Algebra (Cont.)
Moore-Penrose Generalized Inverse:
Idea: matrix inverse on non-null space of the corresponding linear transformation
Reduces to ordinary inverse in the full rank case, i.e. for $r = d$, so could just always use this
Tricky aspect: “$> 0$ vs. $= 0$” & floating point arithmetic
Recall Linear Algebra (Cont.)
Moore-Penrose Generalized Inverse:
Folklore: most multivariate formulas involving matrix inversion “still work” when Generalized Inverse is used instead
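A sketch of the construction next to numpy’s built-in `pinv`, on a rank-deficient matrix; the tolerance below is an arbitrary choice, included to illustrate the “$> 0$ vs. $= 0$” floating point issue:

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((5, 2))
X = A @ A.T                        # symmetric, rank 2 (so r < d = 5)

lam, B = np.linalg.eigh(X)

# Floating point gives tiny nonzero values where lambda_i should be 0,
# so "positive" must mean "above a tolerance"
tol = 1e-10                        # arbitrary cutoff for this sketch
inv_lam = np.array([1.0 / l if l > tol else 0.0 for l in lam])
X_pinv = B @ np.diag(inv_lam) @ B.T

assert np.allclose(X_pinv, np.linalg.pinv(X))      # matches numpy
assert np.allclose(X @ X_pinv @ X, X)              # X X^- X = X
assert np.allclose(X_pinv @ X @ X_pinv, X_pinv)    # X^- X X^- = X^-
```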
Review of Multivariate Probability
Given a “random vector” $X = \begin{pmatrix} X_1 \\ \vdots \\ X_d \end{pmatrix} \in \mathbb{R}^d$,
A “center” of the distribution is the mean vector:
$\mu = E X = \begin{pmatrix} E X_1 \\ \vdots \\ E X_d \end{pmatrix}$
Review of Multivariate Probability
Given a “random vector” $X = \begin{pmatrix} X_1 \\ \vdots \\ X_d \end{pmatrix} \in \mathbb{R}^d$,
A “measure of spread” is the covariance matrix:
$\Sigma = \mathrm{cov}(X) = \begin{pmatrix} \mathrm{var}(X_1) & \cdots & \mathrm{cov}(X_1, X_d) \\ \vdots & \ddots & \vdots \\ \mathrm{cov}(X_d, X_1) & \cdots & \mathrm{var}(X_d) \end{pmatrix}$
Review of Multivar. Prob. (Cont.)
Covariance matrix:
• Nonneg’ve Definite (since all variances are $\ge 0$)
• Provides “elliptical summary of distribution”
• Calculated via “outer product”:
$\Sigma = \mathrm{cov}(X) = E\left[ (X - \mu)(X - \mu)^t \right]$
Review of Multivar. Prob. (Cont.)
Empirical versions:
Given a random sample $X_1, \ldots, X_n$,
Estimate the theoretical mean $\mu$,
with the sample mean:
$\hat{\mu} = \bar{X} = \begin{pmatrix} \bar{X}_1 \\ \vdots \\ \bar{X}_d \end{pmatrix} = \frac{1}{n} \sum_{i=1}^n X_i$
Review of Multivar. Prob. (Cont.)
Empirical versions (cont.)
And estimate the “theoretical cov.” $\Sigma$,
with the “sample cov.”:
$\hat{\Sigma} = \frac{1}{n-1} \sum_{i=1}^n (X_i - \bar{X})(X_i - \bar{X})^t$,
with entries $\hat{\Sigma}_{jk} = \frac{1}{n-1} \sum_{i=1}^n (X_{ji} - \bar{X}_j)(X_{ki} - \bar{X}_k)$
Normalizations:
$\frac{1}{n-1}$ gives unbiasedness
$\frac{1}{n}$ gives MLE in Gaussian case
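Both normalizations are available through numpy’s `cov` via its `ddof` parameter; a brief sketch on arbitrary random data:

```python
import numpy as np

rng = np.random.default_rng(6)
d, n = 3, 50
X = rng.standard_normal((d, n))   # columns are the sample X_1, ..., X_n
Xc = X - X.mean(axis=1, keepdims=True)

# 1/(n-1): unbiased (numpy's default, ddof=1)
assert np.allclose(np.cov(X), (Xc @ Xc.T) / (n - 1))

# 1/n: MLE in the Gaussian case (ddof=0)
assert np.allclose(np.cov(X, ddof=0), (Xc @ Xc.T) / n)
```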
Review of Multivar. Prob. (Cont.)
Outer product representation:
$\hat{\Sigma} = \frac{1}{n-1} \sum_{i=1}^n (X_i - \bar{X})(X_i - \bar{X})^t = \frac{1}{n-1} \tilde{X} \tilde{X}^t$,
where:
$\tilde{X}_{d \times n} = \begin{pmatrix} X_1 - \bar{X} & \cdots & X_n - \bar{X} \end{pmatrix}$ is the recentered data matrix
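A sketch verifying that the term-by-term sum of outer products equals the single matrix product with the recentered data matrix (random data, arbitrary dimensions):

```python
import numpy as np

rng = np.random.default_rng(7)
d, n = 3, 20
X = rng.standard_normal((d, n))

xbar = X.mean(axis=1, keepdims=True)
Xtilde = X - xbar                  # d x n recentered data matrix

# Sum of outer products, one term per observation
Sigma_sum = sum(np.outer(X[:, i] - xbar[:, 0], X[:, i] - xbar[:, 0])
                for i in range(n)) / (n - 1)

# Same thing as a single matrix product
Sigma_mat = (Xtilde @ Xtilde.T) / (n - 1)
assert np.allclose(Sigma_sum, Sigma_mat)
```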
PCA as an Optimization Problem
Find “direction of greatest variability”:
PCA as Optimization (Cont.)
Find “direction of greatest variability”:
Given a “direction vector” $u$ (i.e. $\|u\| = 1$),
Projection of $X_i - \bar{X}$ in the direction $u$:
$P_u(X_i - \bar{X}) = \langle X_i - \bar{X}, u \rangle u$
Variability in the direction $u$:
$\sum_{i=1}^n \| P_u(X_i - \bar{X}) \|^2 = \sum_{i=1}^n \| \langle X_i - \bar{X}, u \rangle u \|^2 = \sum_{i=1}^n \langle X_i - \bar{X}, u \rangle^2 = \sum_{i=1}^n u^t (X_i - \bar{X})(X_i - \bar{X})^t u$
PCA as Optimization (Cont.)
Variability in the direction $u$:
$\sum_{i=1}^n \| P_u(X_i - \bar{X}) \|^2 = u^t \left( \sum_{i=1}^n (X_i - \bar{X})(X_i - \bar{X})^t \right) u = (n-1) \, u^t \hat{\Sigma} u$
i.e. (proportional to) a quadratic form in the covariance matrix $\hat{\Sigma}$
Simple solution comes from the eigenvalue representation of $\hat{\Sigma}$:
$\hat{\Sigma} = B D B^t$, where $B = (v_1, \ldots, v_d)$ is orthonormal, & $D = \begin{pmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_d \end{pmatrix}$
PCA as Optimization (Cont.)
Variability in the direction $u$:
$\sum_{i=1}^n \| P_u(X_i - \bar{X}) \|^2 = (n-1) \, u^t B D B^t u = (n-1) \, (B^t u)^t D (B^t u)$
But $B^t u = \begin{pmatrix} \langle v_1, u \rangle \\ \vdots \\ \langle v_d, u \rangle \end{pmatrix}$ = “$B^t$ transform of $u$” = “$u$ rotated into $v_1, \ldots, v_d$ coordinates”,
and the diagonalized quadratic form becomes
$\sum_{i=1}^n \| P_u(X_i - \bar{X}) \|^2 = (n-1) \sum_{j=1}^d \lambda_j \langle v_j, u \rangle^2$
PCA as Optimization (Cont.)
Now since $B$ is an orthonormal basis matrix,
$u = \sum_{j=1}^d \langle v_j, u \rangle v_j$ and $\| u \|^2 = \sum_{j=1}^d \langle v_j, u \rangle^2 = 1$
So the rotation $B^t u$ gives a distribution
of the (unit) energy of $u$ over the eigen-directions $v_1, \ldots, v_d$
And $\sum_{i=1}^n \| P_u(X_i - \bar{X}) \|^2 = (n-1) \sum_{j=1}^d \lambda_j \langle v_j, u \rangle^2$ is max’d (over $\|u\| = 1$),
by putting all energy in the “largest direction”, i.e. $u = v_1$,
where “eigenvalues are ordered”, $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_d$
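A numerical sanity check of this maximization (a sketch: compare the projected sum of squares at the top eigenvector against many random unit directions; the data scaling is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(8)
d, n = 4, 100
X = rng.standard_normal((d, n)) * np.array([[3.0], [2.0], [1.0], [0.5]])
Xc = X - X.mean(axis=1, keepdims=True)
Sigma_hat = (Xc @ Xc.T) / (n - 1)

def projected_ss(u):
    """Sum over i of ||P_u (X_i - Xbar)||^2 = (n-1) u^t Sigma_hat u."""
    return np.sum((u @ Xc) ** 2)

lam, B = np.linalg.eigh(Sigma_hat)
v1 = B[:, -1]                      # eigenvector with the largest eigenvalue

# No random unit direction beats v1
for _ in range(1000):
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)
    assert projected_ss(u) <= projected_ss(v1) + 1e-12

print(projected_ss(v1), (n - 1) * lam[-1])   # the two agree
```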
PCA as Optimization (Cont.)
Notes:
• Solution is unique when $\lambda_1 > \lambda_2$
• Else have sol’ns in subsp. gen’d by 1st $\lambda$’s
• Projecting onto subspace orthogonal to $v_1$,
gives $v_2$ as next direction
• Continue through $v_3$, …, $v_d$
• Replace $\hat{\Sigma}$ by $\Sigma$ to get theoretical PCA
• Estimated by the empirical version
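The iteration can be sketched numerically: project the centered data onto the orthogonal complement of $v_1$, re-maximize, and recover $v_2$ (matching the second eigenvector computed directly; the data are arbitrary random draws):

```python
import numpy as np

rng = np.random.default_rng(9)
d, n = 4, 200
X = rng.standard_normal((d, n)) * np.array([[3.0], [2.0], [1.0], [0.5]])
Xc = X - X.mean(axis=1, keepdims=True)
Sigma_hat = (Xc @ Xc.T) / (n - 1)

lam, B = np.linalg.eigh(Sigma_hat)
v1, v2 = B[:, -1], B[:, -2]        # eigh orders eigenvalues increasingly

# Project data onto the subspace orthogonal to v1, then find the
# top eigenvector of the projected data's covariance
P = np.eye(d) - np.outer(v1, v1)
Xp = P @ Xc
lam_p, B_p = np.linalg.eigh((Xp @ Xp.T) / (n - 1))
v2_again = B_p[:, -1]

assert np.allclose(np.abs(v2_again), np.abs(v2))   # same up to sign
```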
Iterated PCA Visualization
Connect Math to Graphics
2-d Toy Example
[Figure: Feature Space | Object Space]
Data Points (Curves) are columns of data matrix, X
Connect Math to Graphics (Cont.)
2-d Toy Example
[Figure: Feature Space | Object Space]
Sample Mean, $\bar{X}$
Connect Math to Graphics (Cont.)
2-d Toy Example
[Figure: Feature Space | Object Space]
Residuals from Mean = Data − Mean
Connect Math to Graphics (Cont.)
2-d Toy Example
[Figure: Feature Space | Object Space]
Recentered Data = Mean Residuals, shifted to 0
= (rescaling of) $\tilde{X}$
Connect Math to Graphics (Cont.)
2-d Toy Example
[Figure: Feature Space | Object Space]
PC1 Direction = $\eta$ = Eigenvector (w/ biggest $\lambda$)
Connect Math to Graphics (Cont.)
2-d Toy Example
[Figure: Feature Space | Object Space]
Centered Data = PC1 Projection + Residual
Connect Math to Graphics (Cont.)
2-d Toy Example
[Figure: Feature Space | Object Space]
PC2 Direction = $\eta$ = Eigenvector (w/ 2nd biggest $\lambda$)
Connect Math to Graphics (Cont.)
2-d Toy Example
[Figure: Feature Space | Object Space]
Centered Data = PC2 Projection + Residual
Connect Math to Graphics (Cont.)
Note for this 2-d Example:
PC1 Residuals = PC2 Projections
PC2 Residuals = PC1 Projections
(i.e. colors common across these pics)
PCA Redistribution of Energy
Convenient summary of amount of structure:
Total Sum of Squares: $\sum_{i=1}^n \| X_i \|^2$
Physical Interpretation: Total Energy in Data
Insight comes from decomposition
Statistical Terminology: ANalysis Of VAriance (ANOVA)
PCA Redist’n of Energy (Cont.)
ANOVA mean decomposition:
Total Variation = Mean Variation + Mean Residual Variation:
$\sum_{i=1}^n \| X_i \|^2 = \sum_{i=1}^n \| \bar{X} \|^2 + \sum_{i=1}^n \| X_i - \bar{X} \|^2$
Mathematics: Pythagorean Theorem
Intuition Quantified via Sums of Squares
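A quick numerical check of the Pythagorean decomposition (a sketch on random data, shifted so the mean term is nontrivial):

```python
import numpy as np

rng = np.random.default_rng(10)
d, n = 3, 40
X = rng.standard_normal((d, n)) + 2.0     # shifted so the mean matters
xbar = X.mean(axis=1, keepdims=True)

total_ss = np.sum(X ** 2)
mean_ss = n * np.sum(xbar ** 2)           # sum over i of ||Xbar||^2
resid_ss = np.sum((X - xbar) ** 2)

# Total Variation = Mean Variation + Mean Residual Variation
assert np.allclose(total_ss, mean_ss + resid_ss)
```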
Connect Math to Graphics (Cont.)
2-d Toy Example
[Figure: Feature Space | Object Space]
Residuals from Mean = Data − Mean
Most of Variation = 92% is Mean Variation SS
Remaining Variation = 8% is Resid. Var. SS
PCA Redist’n of Energy (Cont.)
Now decompose SS about the mean:
$\sum_{i=1}^n \| X_i - \bar{X} \|^2 = \sum_{i=1}^n (X_i - \bar{X})^t (X_i - \bar{X}) = \mathrm{tr}(\tilde{X} \tilde{X}^t)$
where: $\tilde{X}_{d \times n} = \begin{pmatrix} X_1 - \bar{X} & \cdots & X_n - \bar{X} \end{pmatrix}$
so that: $\sum_{i=1}^n \| X_i - \bar{X} \|^2 = \mathrm{tr}\left( (n-1) \hat{\Sigma} \right) = (n-1) \, \mathrm{tr}(\hat{\Sigma})$
Energy is expressed in trace of covar’ce matrix
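The trace identity is easy to verify (a sketch; `np.trace` and `np.cov` are the only ingredients, on arbitrary random data):

```python
import numpy as np

rng = np.random.default_rng(11)
d, n = 3, 25
X = rng.standard_normal((d, n))
Xtilde = X - X.mean(axis=1, keepdims=True)

centered_ss = np.sum(Xtilde ** 2)          # sum over i of ||X_i - Xbar||^2

# Same energy via the trace of the (scaled) covariance matrix
assert np.allclose(centered_ss, np.trace(Xtilde @ Xtilde.T))
assert np.allclose(centered_ss, (n - 1) * np.trace(np.cov(X)))
```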
PCA Redist’n of Energy (Cont.)
$\sum_{i=1}^n \| X_i - \bar{X} \|^2 = (n-1) \, \mathrm{tr}(\hat{\Sigma}) = (n-1) \, \mathrm{tr}(B D B^t) = (n-1) \, \mathrm{tr}(D B^t B) = (n-1) \, \mathrm{tr}(D) = (n-1) \sum_{j=1}^d \lambda_j$
Eigenvalues $\lambda_j$ provide atoms of SS decomposi’n
Useful Plots are:
• “Power Spectrum”: $\lambda_j$ vs. $j$
• “log Power Spectrum”: $\log \lambda_j$ vs. $j$
• “Cumulative Power Spectrum”: $\sum_{j'=1}^{j} \lambda_{j'}$ vs. $j$
Note PCA gives SS’s for free (as eigenvalues),
but watch factors of $n-1$
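A sketch of the three plots with matplotlib (the data are arbitrary random draws with decaying scales; only the plotting pattern matters):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(12)
d, n = 6, 80
X = rng.standard_normal((d, n)) * np.linspace(3, 0.3, d)[:, None]

lam = np.linalg.eigvalsh(np.cov(X))[::-1]  # eigenvalues, decreasing
j = np.arange(1, d + 1)

fig, axes = plt.subplots(1, 3, figsize=(10, 3))
axes[0].plot(j, lam, "o-");            axes[0].set_title("Power Spectrum")
axes[1].plot(j, np.log(lam), "o-");    axes[1].set_title("log Power Spectrum")
axes[2].plot(j, np.cumsum(lam), "o-"); axes[2].set_title("Cumulative Power Spectrum")
for ax in axes:
    ax.set_xlabel("j")
plt.tight_layout()
plt.show()
```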
PCA Redist’n of Energy (Cont.)
Note, have already considered some of these Useful Plots:
• Power Spectrum
• Cumulative Power Spectrum
Connect Math to Graphics (Cont.)
2-d Toy Example
[Figure: Feature Space | Object Space]
Revisit SS Decomposition for PC1:
PC1 has “most of var’n” = 93%
Reflected by good approximation in Object Space
Connect Math to Graphics (Cont.)
2-d Toy Example
[Figure: Feature Space | Object Space]
Revisit SS Decomposition for PC1:
PC2 has “only a little var’n” = 7%
Reflected by poor approximation in Object Space
Different Views of PCA
Solves several optimization problems:
1. Direction to maximize SS of 1-d proj’d data
2. Direction to minimize SS of residuals
(same, by Pythagorean Theorem)
3. “Best fit line” to data in “orthogonal sense”
(vs. regression of Y on X = vertical sense
& regression of X on Y = horizontal sense)
Use one that makes sense…
Different Views of PCA
2-d Toy Example
[Figure: Feature Space | Object Space]
1. Max SS of Projected Data
2. Min SS of Residuals
3. Best Fit Line