Object Oriented Data Analysis, Last Time
• Finished NCI 60 Data
• Started detailed look at PCA
• Reviewed linear algebra
Today:
• More linear algebra
• Multivariate Probability Distribution
• PCA as an optimization problem
Detailed Look at PCA
Three important (and interesting) viewpoints:
1. Mathematics
2. Numerics
3. Statistics
1st: Review linear alg. and multivar. prob.
Review of Linear Algebra (Cont.)
Singular Value Decomposition (SVD):
For a matrix $X_{d \times n}$:
Find a diagonal matrix $S_{d \times n}$,
with entries $s_1, \ldots, s_{\min(n,d)}$
called singular values
And unitary (rotation) matrices $U_{d \times d}$, $V_{n \times n}$
(recall $U U^t = I_{d \times d}$, $V V^t = I_{n \times n}$)
so that $X = U S V^t$
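A minimal numpy sketch of this decomposition (the random data matrix and its dimensions are arbitrary, chosen only for illustration):

```python
import numpy as np

# Toy data matrix X (d x n); dimensions are arbitrary here
d, n = 5, 8
rng = np.random.default_rng(0)
X = rng.standard_normal((d, n))

# Full SVD: U is d x d, V^t is n x n, s holds the min(d, n) singular values
U, s, Vt = np.linalg.svd(X, full_matrices=True)

# Rebuild the d x n diagonal matrix S and check X = U S V^t
S = np.zeros((d, n))
S[:min(d, n), :min(d, n)] = np.diag(s)
assert np.allclose(X, U @ S @ Vt)

# U and V are rotations: U U^t = I_d, V V^t = I_n
assert np.allclose(U @ U.T, np.eye(d))
assert np.allclose(Vt.T @ Vt, np.eye(n))
```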
Review of Linear Algebra (Cont.)
Intuition behind Singular Value Decomposition:
For a “linear transf’n” (via matrix multi’n), $X v = U S V^t v = U (S (V^t v))$:
• First rotate (by $V^t$)
• Second rescale coordinate axes (by the $s_i$)
• Third rotate again (by $U$)
• i.e. have diagonalized the transformation
Review of Linear Algebra (Cont.)
SVD Compact Representation:
Useful Labeling:
Singular Values in Decreasing Order: $s_1 \ge \cdots \ge s_{\min(n,d)}$
Note: singular values $= 0$ can be omitted
Let $r$ = # of positive singular values
Then: $X = U_{d \times r} S_{r \times r} V^t_{r \times n}$
Where $U_{d \times r}$, $S_{r \times r}$, $V_{n \times r}$ are truncations of $U$, $S$, $V$
Review of Linear Algebra (Cont.)
SVD Full Representation:
$X_{d \times n} = U_{d \times d} \, S_{d \times n} \, V^t_{n \times n}$
Review of Linear Algebra (Cont.)
SVD Reduced Representation:
$X_{d \times n} = U_{d \times d} \begin{pmatrix} S_{n \times n} \\ 0_{(d-n) \times n} \end{pmatrix} V^t_{n \times n}$
Assumes $d \ge n$
Review of Linear Algebra (Cont.)
SVD Reduced Representation:
$X_{d \times n} = U_{d \times n} \, S_{n \times n} \, V^t_{n \times n}$
Assumes $d \ge n$
Review of Linear Algebra (Cont.)
SVD Compact Representation:
$X_{d \times n} = U_{d \times r} \, S_{r \times r} \, V^t_{r \times n}$
(zero singular values, and the corresponding columns of $U$ and $V$, omitted)
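A short numpy sketch of the three representations (the `full_matrices` flag toggles between the full and reduced forms; the tolerance used to count positive singular values for the compact form is an arbitrary illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 8, 5                        # assumes d >= n, as on the slides
X = rng.standard_normal((d, 3)) @ rng.standard_normal((3, n))  # rank 3

# Full: U is d x d; Reduced: U is d x n
U_full, s_full, Vt_full = np.linalg.svd(X, full_matrices=True)
U_red, s_red, Vt_red = np.linalg.svd(X, full_matrices=False)
print(U_full.shape, U_red.shape)   # (8, 8) (8, 5)

# Compact: keep only the r positive singular values
r = np.sum(s_red > 1e-10)          # cutoff is an arbitrary choice
X_compact = U_red[:, :r] @ np.diag(s_red[:r]) @ Vt_red[:r, :]
assert np.allclose(X, X_compact)   # the rank-r part reproduces X exactly
```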
Review of Linear Algebra (Cont.)
Eigenvalue Decomposition:
For a (symmetric) square matrix $X_{d \times d}$:
Find a diagonal matrix $D = \begin{pmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_d \end{pmatrix}$
And an orthonormal matrix $B_{d \times d}$
(i.e. $B B^t = B^t B = I_{d \times d}$)
So that: $X B = B D$, i.e. $X = B D B^t$
Review of Linear Algebra (Cont.)
Eigenvalue Decomposition (cont.):
• Relation to Singular Value Decomposition (looks similar?)
• Eigenvalue decomposition “harder”
• Since needs $U = V$
• Price is eigenvalue decomp’n is generally complex valued
• Except for $X$ square and symmetric
• Then eigenvalue decomp. is real valued
• Thus is the sing’r value decomp. with: $U = V = B$
Review of Linear Algebra (Cont.)
Better View of Relationship:
Singular Value Dec. $\Rightarrow$ Eigenvalue Dec.
• Start with data matrix: $X_{d \times n}$
• With SVD: $X = U S V^t$
• Create square, symmetric matrix: $X X^t$
• Note that: $X X^t = U S V^t V S U^t = U S^2 U^t$
• Gives Eigenanalysis, $B = U$ & $D = S^2$
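A quick numpy check of this relationship (a sketch on random data; `np.linalg.eigh` returns eigenvalues in increasing order, so they are flipped here to match the decreasing singular values):

```python
import numpy as np

rng = np.random.default_rng(2)
d, n = 4, 10
X = rng.standard_normal((d, n))

U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Eigen-decompose the square, symmetric matrix X X^t
evals, B = np.linalg.eigh(X @ X.T)
evals, B = evals[::-1], B[:, ::-1]   # reorder to decreasing

# Eigenvalues of X X^t are the squared singular values of X
assert np.allclose(evals, s**2)
# Eigenvectors match the left singular vectors, up to sign
assert np.allclose(np.abs(B), np.abs(U))
```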
Review of Linear Algebra (Cont.)
Computation of Singular Value and
Eigenvalue Decompositions:
• Details too complex to spend time here
• A “primitive” of good software packages
• Eigenvalues $\lambda_1, \ldots, \lambda_d$ are unique
• Columns $v_1, \ldots, v_d$ of $B$ are called “eigenvectors”
• Eigenvectors are “$\lambda$-stretched” by $X$: $X v_i = \lambda_i v_i$
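The “$\lambda$-stretched” property is easy to check numerically; a sketch on an arbitrary random symmetric matrix:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 5))
X = (A + A.T) / 2                 # make a symmetric matrix

lam, B = np.linalg.eigh(X)        # X = B D B^t

# Each eigenvector is stretched by its eigenvalue: X v_i = lambda_i v_i
for i in range(5):
    v = B[:, i]
    assert np.allclose(X @ v, lam[i] * v)
```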
Review of Linear Algebra (Cont.)
Eigenvalue Decomp. solves matrix problems:
• Inversion: $X^{-1} = B \begin{pmatrix} \lambda_1^{-1} & & 0 \\ & \ddots & \\ 0 & & \lambda_d^{-1} \end{pmatrix} B^t$
• Square Root: $X^{1/2} = B \begin{pmatrix} \lambda_1^{1/2} & & 0 \\ & \ddots & \\ 0 & & \lambda_d^{1/2} \end{pmatrix} B^t$
• $X$ is positive (nonn’ve, i.e. semi) definite $\iff$ all $\lambda_i > 0$ ($\ge 0$)
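A minimal sketch of both operations via the eigendecomposition, using a random positive definite matrix built as $A A^t + I$ (an arbitrary construction, used only to guarantee positive eigenvalues):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((4, 4))
X = A @ A.T + np.eye(4)           # positive definite, so all lambda_i > 0

lam, B = np.linalg.eigh(X)

# Inversion: replace each lambda_i by 1 / lambda_i
X_inv = B @ np.diag(1.0 / lam) @ B.T
assert np.allclose(X_inv, np.linalg.inv(X))

# Square root: replace each lambda_i by sqrt(lambda_i)
X_half = B @ np.diag(np.sqrt(lam)) @ B.T
assert np.allclose(X_half @ X_half, X)
```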
Recall Linear Algebra (Cont.)
Moore-Penrose Generalized Inverse:
For $X = B \, \mathrm{diag}(\lambda_1, \ldots, \lambda_r, 0, \ldots, 0) \, B^t$,
with $\lambda_1 \ge \cdots \ge \lambda_r > 0$ and $\lambda_{r+1} = \cdots = \lambda_d = 0$,
define $X^{-} = B \, \mathrm{diag}(\lambda_1^{-1}, \ldots, \lambda_r^{-1}, 0, \ldots, 0) \, B^t$
Recall Linear Algebra (Cont.)
Easy to see this satisfies the definition of
Generalized (Pseudo) Inverse:
• $X X^{-} X = X$
• $X^{-} X X^{-} = X^{-}$
• $X X^{-}$ symmetric
• $X^{-} X$ symmetric
Recall Linear Algebra (Cont.)
Moore-Penrose Generalized Inverse:
Idea: matrix inverse on non-null space of the corresponding linear transformation
Reduces to ordinary inverse in the full rank case, i.e. for $r = d$, so could just always use this
Tricky aspect: “$> 0$ vs. $= 0$” & floating point arithmetic
Recall Linear Algebra (Cont.)
Moore-Penrose Generalized Inverse:
Folklore: most multivariate formulas involving matrix inversion “still work” when Generalized Inverse is used instead
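A sketch of the construction next to numpy’s built-in `pinv`, on a rank-deficient matrix; the tolerance below is an arbitrary choice, included to illustrate the “$> 0$ vs. $= 0$” floating point issue:

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((5, 2))
X = A @ A.T                        # symmetric, rank 2 (so r < d = 5)

lam, B = np.linalg.eigh(X)

# Floating point gives tiny nonzero values where lambda_i should be 0,
# so "positive" must mean "above a tolerance"
tol = 1e-10                        # arbitrary cutoff for this sketch
inv_lam = np.array([1.0 / l if l > tol else 0.0 for l in lam])
X_pinv = B @ np.diag(inv_lam) @ B.T

assert np.allclose(X_pinv, np.linalg.pinv(X))      # matches numpy
assert np.allclose(X @ X_pinv @ X, X)              # X X^- X = X
assert np.allclose(X_pinv @ X @ X_pinv, X_pinv)    # X^- X X^- = X^-
```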
Review of Multivariate Probability
Given a “random vector” $X = \begin{pmatrix} X_1 \\ \vdots \\ X_d \end{pmatrix} \in \mathbb{R}^d$,
A “center” of the distribution is the mean vector:
$\mu = E X = \begin{pmatrix} E X_1 \\ \vdots \\ E X_d \end{pmatrix}$
Review of Multivariate Probability
Given a “random vector” $X = \begin{pmatrix} X_1 \\ \vdots \\ X_d \end{pmatrix} \in \mathbb{R}^d$,
A “measure of spread” is the covariance matrix:
$\Sigma = \mathrm{cov}(X) = \begin{pmatrix} \mathrm{var}(X_1) & \cdots & \mathrm{cov}(X_1, X_d) \\ \vdots & \ddots & \vdots \\ \mathrm{cov}(X_d, X_1) & \cdots & \mathrm{var}(X_d) \end{pmatrix}$
Review of Multivar. Prob. (Cont.)
Covariance matrix:
• Nonneg’ve Definite (since all variances are $\ge 0$)
• Provides “elliptical summary of distribution”
• Calculated via “outer product”:
$\Sigma = \mathrm{cov}(X) = E\left[ (X - \mu)(X - \mu)^t \right]$
Review of Multivar. Prob. (Cont.)
Empirical versions:
Given a random sample $X_1, \ldots, X_n$,
Estimate the theoretical mean $\mu$,
with the sample mean:
$\hat{\mu} = \bar{X} = \begin{pmatrix} \bar{X}_1 \\ \vdots \\ \bar{X}_d \end{pmatrix} = \frac{1}{n} \sum_{i=1}^n X_i$
Review of Multivar. Prob. (Cont.)
Empirical versions (cont.)
And estimate the “theoretical cov.” $\Sigma$,
with the “sample cov.”:
$\hat{\Sigma} = \frac{1}{n-1} \sum_{i=1}^n (X_i - \bar{X})(X_i - \bar{X})^t$,
with entries $\hat{\Sigma}_{jk} = \frac{1}{n-1} \sum_{i=1}^n (X_{ji} - \bar{X}_j)(X_{ki} - \bar{X}_k)$
Normalizations:
$\frac{1}{n-1}$ gives unbiasedness
$\frac{1}{n}$ gives MLE in Gaussian case
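Both normalizations are available through numpy’s `cov` via its `ddof` parameter; a brief sketch on arbitrary random data:

```python
import numpy as np

rng = np.random.default_rng(6)
d, n = 3, 50
X = rng.standard_normal((d, n))   # columns are the sample X_1, ..., X_n
Xc = X - X.mean(axis=1, keepdims=True)

# 1/(n-1): unbiased (numpy's default, ddof=1)
assert np.allclose(np.cov(X), (Xc @ Xc.T) / (n - 1))

# 1/n: MLE in the Gaussian case (ddof=0)
assert np.allclose(np.cov(X, ddof=0), (Xc @ Xc.T) / n)
```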
Review of Multivar. Prob. (Cont.)
Outer product representation:
$\hat{\Sigma} = \frac{1}{n-1} \sum_{i=1}^n (X_i - \bar{X})(X_i - \bar{X})^t = \frac{1}{n-1} \tilde{X} \tilde{X}^t$,
where:
$\tilde{X}_{d \times n} = \begin{pmatrix} X_1 - \bar{X} & \cdots & X_n - \bar{X} \end{pmatrix}$ is the recentered data matrix
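A sketch verifying that the term-by-term sum of outer products equals the single matrix product with the recentered data matrix (random data, arbitrary dimensions):

```python
import numpy as np

rng = np.random.default_rng(7)
d, n = 3, 20
X = rng.standard_normal((d, n))

xbar = X.mean(axis=1, keepdims=True)
Xtilde = X - xbar                  # d x n recentered data matrix

# Sum of outer products, one term per observation
Sigma_sum = sum(np.outer(X[:, i] - xbar[:, 0], X[:, i] - xbar[:, 0])
                for i in range(n)) / (n - 1)

# Same thing as a single matrix product
Sigma_mat = (Xtilde @ Xtilde.T) / (n - 1)
assert np.allclose(Sigma_sum, Sigma_mat)
```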
PCA as an Optimization Problem
Find “direction of greatest variability”:
PCA as Optimization (Cont.)
Find “direction of greatest variability”:
Given a “direction vector” $u$ (i.e. $\|u\| = 1$),
Projection of $X_i - \bar{X}$ in the direction $u$:
$P_u(X_i - \bar{X}) = \langle X_i - \bar{X}, u \rangle u$
Variability in the direction $u$:
$\sum_{i=1}^n \| P_u(X_i - \bar{X}) \|^2 = \sum_{i=1}^n \| \langle X_i - \bar{X}, u \rangle u \|^2 = \sum_{i=1}^n \langle X_i - \bar{X}, u \rangle^2 = \sum_{i=1}^n u^t (X_i - \bar{X})(X_i - \bar{X})^t u$
PCA as Optimization (Cont.)
Variability in the direction $u$:
$\sum_{i=1}^n \| P_u(X_i - \bar{X}) \|^2 = u^t \left( \sum_{i=1}^n (X_i - \bar{X})(X_i - \bar{X})^t \right) u = (n-1) \, u^t \hat{\Sigma} u$
i.e. (proportional to) a quadratic form in the covariance matrix $\hat{\Sigma}$
Simple solution comes from the eigenvalue representation of $\hat{\Sigma}$:
$\hat{\Sigma} = B D B^t$, where $B = (v_1, \ldots, v_d)$ is orthonormal, & $D = \begin{pmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_d \end{pmatrix}$
PCA as Optimization (Cont.)
Variability in the direction $u$:
$\sum_{i=1}^n \| P_u(X_i - \bar{X}) \|^2 = (n-1) \, u^t B D B^t u = (n-1) \, (B^t u)^t D (B^t u)$
But $B^t u = \begin{pmatrix} \langle v_1, u \rangle \\ \vdots \\ \langle v_d, u \rangle \end{pmatrix}$ = “$B^t$ transform of $u$” = “$u$ rotated into $v_1, \ldots, v_d$ coordinates”,
and the diagonalized quadratic form becomes
$\sum_{i=1}^n \| P_u(X_i - \bar{X}) \|^2 = (n-1) \sum_{j=1}^d \lambda_j \langle v_j, u \rangle^2$
PCA as Optimization (Cont.)
Now since $B$ is an orthonormal basis matrix,
$u = \sum_{j=1}^d \langle v_j, u \rangle v_j$ and $\| u \|^2 = \sum_{j=1}^d \langle v_j, u \rangle^2 = 1$
So the rotation $B^t u$ gives a distribution
of the (unit) energy of $u$ over the eigen-directions $v_1, \ldots, v_d$
And $\sum_{i=1}^n \| P_u(X_i - \bar{X}) \|^2 = (n-1) \sum_{j=1}^d \lambda_j \langle v_j, u \rangle^2$ is max’d (over $\|u\| = 1$),
by putting all energy in the “largest direction”, i.e. $u = v_1$,
where “eigenvalues are ordered”, $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_d$
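A numerical sanity check of this maximization (a sketch: compare the projected sum of squares at the top eigenvector against many random unit directions; the data scaling is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(8)
d, n = 4, 100
X = rng.standard_normal((d, n)) * np.array([[3.0], [2.0], [1.0], [0.5]])
Xc = X - X.mean(axis=1, keepdims=True)
Sigma_hat = (Xc @ Xc.T) / (n - 1)

def projected_ss(u):
    """Sum over i of ||P_u (X_i - Xbar)||^2 = (n-1) u^t Sigma_hat u."""
    return np.sum((u @ Xc) ** 2)

lam, B = np.linalg.eigh(Sigma_hat)
v1 = B[:, -1]                      # eigenvector with the largest eigenvalue

# No random unit direction beats v1
for _ in range(1000):
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)
    assert projected_ss(u) <= projected_ss(v1) + 1e-12

print(projected_ss(v1), (n - 1) * lam[-1])   # the two agree
```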
PCA as Optimization (Cont.)
Notes:
• Solution is unique when $\lambda_1 > \lambda_2$
• Else have sol’ns in subsp. gen’d by 1st $\lambda$’s
• Projecting onto subspace orthogonal to $v_1$,
gives $v_2$ as next direction
• Continue through $v_3$, …, $v_d$
• Replace $\hat{\Sigma}$ by $\Sigma$ to get theoretical PCA
• Estimated by the empirical version
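The iteration can be sketched numerically: project the centered data onto the orthogonal complement of $v_1$, re-maximize, and recover $v_2$ (matching the second eigenvector computed directly; the data are arbitrary random draws):

```python
import numpy as np

rng = np.random.default_rng(9)
d, n = 4, 200
X = rng.standard_normal((d, n)) * np.array([[3.0], [2.0], [1.0], [0.5]])
Xc = X - X.mean(axis=1, keepdims=True)
Sigma_hat = (Xc @ Xc.T) / (n - 1)

lam, B = np.linalg.eigh(Sigma_hat)
v1, v2 = B[:, -1], B[:, -2]        # eigh orders eigenvalues increasingly

# Project data onto the subspace orthogonal to v1, then find the
# top eigenvector of the projected data's covariance
P = np.eye(d) - np.outer(v1, v1)
Xp = P @ Xc
lam_p, B_p = np.linalg.eigh((Xp @ Xp.T) / (n - 1))
v2_again = B_p[:, -1]

assert np.allclose(np.abs(v2_again), np.abs(v2))   # same up to sign
```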
Iterated PCA Visualization
Connect Math to Graphics
2-d Toy Example
[Figure: Feature Space | Object Space]
Data Points (Curves) are columns of data matrix, X
Connect Math to Graphics (Cont.)
2-d Toy Example
[Figure: Feature Space | Object Space]
Sample Mean, $\bar{X}$
Connect Math to Graphics (Cont.)
2-d Toy Example
[Figure: Feature Space | Object Space]
Residuals from Mean = Data − Mean
Connect Math to Graphics (Cont.)
2-d Toy Example
[Figure: Feature Space | Object Space]
Recentered Data = Mean Residuals, shifted to 0
= (rescaling of) $\tilde{X}$
Connect Math to Graphics (Cont.)
2-d Toy Example
[Figure: Feature Space | Object Space]
PC1 Direction = $\eta$ = Eigenvector (w/ biggest $\lambda$)
Connect Math to Graphics (Cont.)
2-d Toy Example
[Figure: Feature Space | Object Space]
Centered Data = PC1 Projection + Residual
Connect Math to Graphics (Cont.)
2-d Toy Example
[Figure: Feature Space | Object Space]
PC2 Direction = $\eta$ = Eigenvector (w/ 2nd biggest $\lambda$)
Connect Math to Graphics (Cont.)
2-d Toy Example
[Figure: Feature Space | Object Space]
Centered Data = PC2 Projection + Residual
Connect Math to Graphics (Cont.)
Note for this 2-d Example:
PC1 Residuals = PC2 Projections
PC2 Residuals = PC1 Projections
(i.e. colors common across these pics)
PCA Redistribution of Energy
Convenient summary of amount of structure:
Total Sum of Squares: $\sum_{i=1}^n \| X_i \|^2$
Physical Interpretation: Total Energy in Data
Insight comes from decomposition
Statistical Terminology: ANalysis Of VAriance (ANOVA)
PCA Redist’n of Energy (Cont.)
ANOVA mean decomposition:
Total Variation = Mean Variation + Mean Residual Variation:
$\sum_{i=1}^n \| X_i \|^2 = \sum_{i=1}^n \| \bar{X} \|^2 + \sum_{i=1}^n \| X_i - \bar{X} \|^2$
Mathematics: Pythagorean Theorem
Intuition Quantified via Sums of Squares
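A quick numerical check of the Pythagorean decomposition (a sketch on random data, shifted so the mean term is nontrivial):

```python
import numpy as np

rng = np.random.default_rng(10)
d, n = 3, 40
X = rng.standard_normal((d, n)) + 2.0     # shifted so the mean matters
xbar = X.mean(axis=1, keepdims=True)

total_ss = np.sum(X ** 2)
mean_ss = n * np.sum(xbar ** 2)           # sum over i of ||Xbar||^2
resid_ss = np.sum((X - xbar) ** 2)

# Total Variation = Mean Variation + Mean Residual Variation
assert np.allclose(total_ss, mean_ss + resid_ss)
```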
Connect Math to Graphics (Cont.)
2-d Toy Example
[Figure: Feature Space | Object Space]
Residuals from Mean = Data − Mean
Most of Variation = 92% is Mean Variation SS
Remaining Variation = 8% is Resid. Var. SS
PCA Redist’n of Energy (Cont.)
Now decompose SS about the mean:
$\sum_{i=1}^n \| X_i - \bar{X} \|^2 = \sum_{i=1}^n (X_i - \bar{X})^t (X_i - \bar{X}) = \mathrm{tr}(\tilde{X} \tilde{X}^t)$
where: $\tilde{X}_{d \times n} = \begin{pmatrix} X_1 - \bar{X} & \cdots & X_n - \bar{X} \end{pmatrix}$
so that: $\sum_{i=1}^n \| X_i - \bar{X} \|^2 = \mathrm{tr}\left( (n-1) \hat{\Sigma} \right) = (n-1) \, \mathrm{tr}(\hat{\Sigma})$
Energy is expressed in trace of covar’ce matrix
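The trace identity is easy to verify (a sketch; `np.trace` and `np.cov` are the only ingredients, on arbitrary random data):

```python
import numpy as np

rng = np.random.default_rng(11)
d, n = 3, 25
X = rng.standard_normal((d, n))
Xtilde = X - X.mean(axis=1, keepdims=True)

centered_ss = np.sum(Xtilde ** 2)          # sum over i of ||X_i - Xbar||^2

# Same energy via the trace of the (scaled) covariance matrix
assert np.allclose(centered_ss, np.trace(Xtilde @ Xtilde.T))
assert np.allclose(centered_ss, (n - 1) * np.trace(np.cov(X)))
```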
PCA Redist’n of Energy (Cont.)
$\sum_{i=1}^n \| X_i - \bar{X} \|^2 = (n-1) \, \mathrm{tr}(\hat{\Sigma}) = (n-1) \, \mathrm{tr}(B D B^t) = (n-1) \, \mathrm{tr}(D B^t B) = (n-1) \, \mathrm{tr}(D) = (n-1) \sum_{j=1}^d \lambda_j$
Eigenvalues $\lambda_j$ provide atoms of SS decomposi’n
Useful Plots are:
• “Power Spectrum”: $\lambda_j$ vs. $j$
• “log Power Spectrum”: $\log \lambda_j$ vs. $j$
• “Cumulative Power Spectrum”: $\sum_{j'=1}^{j} \lambda_{j'}$ vs. $j$
Note PCA gives SS’s for free (as eigenvalues),
but watch factors of $n-1$
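A sketch of the three plots with matplotlib (the data are arbitrary random draws with decaying scales; only the plotting pattern matters):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(12)
d, n = 6, 80
X = rng.standard_normal((d, n)) * np.linspace(3, 0.3, d)[:, None]

lam = np.linalg.eigvalsh(np.cov(X))[::-1]  # eigenvalues, decreasing
j = np.arange(1, d + 1)

fig, axes = plt.subplots(1, 3, figsize=(10, 3))
axes[0].plot(j, lam, "o-");            axes[0].set_title("Power Spectrum")
axes[1].plot(j, np.log(lam), "o-");    axes[1].set_title("log Power Spectrum")
axes[2].plot(j, np.cumsum(lam), "o-"); axes[2].set_title("Cumulative Power Spectrum")
for ax in axes:
    ax.set_xlabel("j")
plt.tight_layout()
plt.show()
```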
PCA Redist’n of Energy (Cont.)
Note, have already considered some of these Useful Plots:
• Power Spectrum
• Cumulative Power Spectrum
Connect Math to Graphics (Cont.)
2-d Toy Example
[Figure: Feature Space | Object Space]
Revisit SS Decomposition for PC1:
PC1 has “most of var’n” = 93%
Reflected by good approximation in Object Space
Connect Math to Graphics (Cont.)
2-d Toy Example
[Figure: Feature Space | Object Space]
Revisit SS Decomposition for PC1:
PC2 has “only a little var’n” = 7%
Reflected by poor approximation in Object Space
Different Views of PCA
Solves several optimization problems:
1. Direction to maximize SS of 1-d proj’d data
2. Direction to minimize SS of residuals
(same, by Pythagorean Theorem)
3. “Best fit line” to data in “orthogonal sense”
(vs. regression of Y on X = vertical sense
& regression of X on Y = horizontal sense)
Use one that makes sense…
Different Views of PCA
2-d Toy Example
[Figure: Feature Space | Object Space]
1. Max SS of Projected Data
2. Min SS of Residuals
3. Best Fit Line