Principal Components: A Mathematical Introduction
Simon Mason
International Research Institute for Climate Prediction
The Earth Institute of Columbia University
Linking Science to Society
Linking Science to Sight-Seeing!
What is the most beautiful city setting?
The setting could be measured on a variety of metrics, such as the height of the surrounding mountains or the length of the coastline.
But if more than one metric is used, then some combined measure will need to be devised.
The city scores can be represented by a matrix, X. For simplicity, the scores are considered on only two metrics, and for only three cities.
The metrics are sea and mountains, and the cities are San Francisco, Hong Kong, and Cape Town:
X:
                 sea   mtns
San Francisco      4      4
Hong Kong          8      5
Cape Town          6      6

The means and variances are:

                 sea   mountains
mean               6       5
variance           4       1
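The means and variances above can be checked numerically. The following NumPy sketch is not part of the original presentation; note that `ddof=1` requests the sample variance (divisor n − 1), which is the divisor used throughout these notes.

```python
import numpy as np

# City scores: rows = San Francisco, Hong Kong, Cape Town; columns = sea, mountains
X = np.array([[4.0, 4.0],
              [8.0, 5.0],
              [6.0, 6.0]])

means = X.mean(axis=0)             # column means
variances = X.var(axis=0, ddof=1)  # sample variances (divisor n - 1)

print(means)      # [6. 5.]
print(variances)  # [4. 1.]
```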
The variance is used to distinguish the cities' attractiveness; the total variance is 4 + 1 = 5.
X can be expressed as an anomaly matrix or a standardized anomaly matrix:
anomalies:

$$\mathbf{X} = \begin{pmatrix} -2 & -1 \\ 2 & 0 \\ 0 & 1 \end{pmatrix}$$

standardized anomalies:

$$\mathbf{X} = \begin{pmatrix} -1 & -1 \\ 1 & 0 \\ 0 & 1 \end{pmatrix}$$
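The two anomaly matrices can likewise be produced in a few lines of NumPy (an illustrative sketch, not part of the presentation):

```python
import numpy as np

X = np.array([[4.0, 4.0],
              [8.0, 5.0],
              [6.0, 6.0]])  # rows: San Francisco, Hong Kong, Cape Town

anomalies = X - X.mean(axis=0)                    # subtract the column means
standardized = anomalies / X.std(axis=0, ddof=1)  # divide by the sample standard deviations

print(anomalies)     # rows: [-2, -1], [2, 0], [0, 1]
print(standardized)  # rows: [-1, -1], [1, 0], [0, 1]
```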
In general, if

$$\mathbf{X} = \begin{pmatrix} a & d \\ b & e \\ c & f \end{pmatrix}$$

and $a + b + c = 0$ and $d + e + f = 0$, then

$$\text{variance} = \frac{a^2 + b^2 + c^2}{2} \qquad \text{covariance} = \frac{ad + be + cf}{2}$$
In general, matrix multiplication gives:

$$\mathbf{X}^T\mathbf{X} = \begin{pmatrix} a & b & c \\ d & e & f \end{pmatrix} \begin{pmatrix} a & d \\ b & e \\ c & f \end{pmatrix} = \begin{pmatrix} a^2 + b^2 + c^2 & ad + be + cf \\ ad + be + cf & d^2 + e^2 + f^2 \end{pmatrix}$$

So, if $a + b + c = 0$ and $d + e + f = 0$, then:

$$\frac{1}{n-1}\mathbf{X}^T\mathbf{X} = \text{variance-covariance matrix}$$
If $\mathbf{X}$ contains data expressed as anomalies:

$$\frac{1}{n-1}\mathbf{X}^T\mathbf{X} = \text{variance-covariance matrix}$$

If $\mathbf{X}$ contains data expressed as standardized anomalies:

$$\frac{1}{n-1}\mathbf{X}^T\mathbf{X} = \text{correlation matrix}$$
Using the city data expressed in standardized anomalies:
$$\mathbf{X} = \begin{pmatrix} -1 & -1 \\ 1 & 0 \\ 0 & 1 \end{pmatrix}$$

$$\frac{1}{n-1}\mathbf{X}^T\mathbf{X} = \frac{1}{2}\begin{pmatrix} -1 & 1 & 0 \\ -1 & 0 & 1 \end{pmatrix}\begin{pmatrix} -1 & -1 \\ 1 & 0 \\ 0 & 1 \end{pmatrix} = \frac{1}{2}\begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix} = \begin{pmatrix} 1 & 0.5 \\ 0.5 & 1 \end{pmatrix}$$
The variance-covariance matrix for the city data is:
$$\frac{1}{n-1}\mathbf{X}^T\mathbf{X} = \begin{pmatrix} 1 & 0.5 \\ 0.5 & 1 \end{pmatrix}$$
Note that the covariances are greater than zero, implying that both metrics represent a common aspect of city attractiveness.
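As a numerical check (a NumPy sketch added for illustration), the correlation matrix can be formed directly from the standardized anomalies, and compared with NumPy's built-in `np.corrcoef`:

```python
import numpy as np

Xs = np.array([[-1.0, -1.0],
               [ 1.0,  0.0],
               [ 0.0,  1.0]])  # standardized anomalies
n = Xs.shape[0]

C = Xs.T @ Xs / (n - 1)  # correlation matrix, since Xs is standardized
# the same matrix, computed by NumPy from the raw city scores
corr = np.corrcoef(np.array([[4.0, 4.0], [8.0, 5.0], [6.0, 6.0]]), rowvar=False)

print(C)     # [[1.  0.5], [0.5 1. ]]
print(corr)  # identical
```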
Because of the covariance (or correlation) between the two metrics, we could combine these two metrics into a single new metric that represents the variance that is common to both metrics.
Specifically, we want to define sets of weights so that the new variables are uncorrelated and have maximized variance.
Let the weights for the first principal component be $\mathbf{a}$ (with elements $a_{\text{sea}}$ and $a_{\text{mtns}}$), and those for the second principal component be $\mathbf{b}$ (with elements $b_{\text{sea}}$ and $b_{\text{mtns}}$).
In matrix notation the data are post-multiplied by the weights, represented as $\mathbf{U}$:

$$\mathbf{X}\mathbf{U} = \begin{pmatrix} -1 & -1 \\ 1 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} a_{\text{sea}} & b_{\text{sea}} \\ a_{\text{mtns}} & b_{\text{mtns}} \end{pmatrix}$$

This gives the principal components $\mathbf{Z}$. The scores on the principal components are:

$$\mathbf{Z} = \begin{array}{lcc} & \text{PC1} & \text{PC2} \\ \text{San Francisco} & -a_{\text{sea}} - a_{\text{mtns}} & -b_{\text{sea}} - b_{\text{mtns}} \\ \text{Hong Kong} & a_{\text{sea}} & b_{\text{sea}} \\ \text{Cape Town} & a_{\text{mtns}} & b_{\text{mtns}} \end{array}$$
The principal components are defined as:

$$\mathbf{Z} = \mathbf{X}\mathbf{U}$$

which simply states that they are calculated as weighted sums of the original metrics.
Note that the sum of the squared weights for each component equals 1 (each column of $\mathbf{U}$ is a unit vector). Also, if the principal components are to be uncorrelated, the weight vectors must be orthogonal to each other.
These two properties of the weights are useful. The diagonal elements of $\mathbf{U}^T\mathbf{U}$ are the sums of squares of each column of $\mathbf{U}$. A column of $\mathbf{U}$ contains the weights for one of the principal components, so the diagonals of $\mathbf{U}^T\mathbf{U}$ are 1. Because the weight vectors are orthogonal, the off-diagonals are 0:

$$\mathbf{U}^T\mathbf{U} = \begin{pmatrix} a_{\text{sea}} & a_{\text{mtns}} \\ b_{\text{sea}} & b_{\text{mtns}} \end{pmatrix} \begin{pmatrix} a_{\text{sea}} & b_{\text{sea}} \\ a_{\text{mtns}} & b_{\text{mtns}} \end{pmatrix} = \begin{pmatrix} a_{\text{sea}}^2 + a_{\text{mtns}}^2 & a_{\text{sea}}b_{\text{sea}} + a_{\text{mtns}}b_{\text{mtns}} \\ b_{\text{sea}}a_{\text{sea}} + b_{\text{mtns}}a_{\text{mtns}} & b_{\text{sea}}^2 + b_{\text{mtns}}^2 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$$

so that $\mathbf{U}^T\mathbf{U} = \mathbf{U}\mathbf{U}^T = \mathbf{I}$.
So if we post-multiply $\mathbf{Z} = \mathbf{X}\mathbf{U}$ by $\mathbf{U}^T$, we get:

$$\mathbf{Z}\mathbf{U}^T = \mathbf{X}\mathbf{U}\mathbf{U}^T = \mathbf{X}\mathbf{I} = \mathbf{X}$$
This allows us to express $\mathbf{X}$ in terms of the principal component scores and loadings. Remember that $\frac{1}{n-1}\mathbf{X}^T\mathbf{X}$ is either the variance-covariance matrix or the correlation matrix (depending on whether $\mathbf{X}$ contains anomalies or standardized anomalies). We can replace $\mathbf{X}$ by $\mathbf{Z}\mathbf{U}^T$, which gives:

$$\frac{1}{n-1}\mathbf{X}^T\mathbf{X} = \frac{1}{n-1}\left(\mathbf{Z}\mathbf{U}^T\right)^T\left(\mathbf{Z}\mathbf{U}^T\right) = \frac{1}{n-1}\mathbf{U}\mathbf{Z}^T\mathbf{Z}\mathbf{U}^T = \mathbf{U}\left(\tfrac{1}{n-1}\mathbf{Z}^T\mathbf{Z}\right)\mathbf{U}^T$$
Compare $\frac{1}{n-1}\mathbf{X}^T\mathbf{X}$ with $\frac{1}{n-1}\mathbf{Z}^T\mathbf{Z}$, which appears in the previous equation. Just as $\frac{1}{n-1}\mathbf{X}^T\mathbf{X}$ is the covariance (correlation) matrix for $\mathbf{X}$, so $\frac{1}{n-1}\mathbf{Z}^T\mathbf{Z}$ is the covariance matrix for $\mathbf{Z}$. We know that the covariance matrix for $\mathbf{Z}$ is a diagonal matrix of eigenvalues (the variances of the principal components). Setting $\mathbf{C} = \frac{1}{n-1}\mathbf{X}^T\mathbf{X}$ and $\boldsymbol{\Lambda} = \frac{1}{n-1}\mathbf{Z}^T\mathbf{Z}$:

$$\frac{1}{n-1}\mathbf{X}^T\mathbf{X} = \mathbf{U}\left(\tfrac{1}{n-1}\mathbf{Z}^T\mathbf{Z}\right)\mathbf{U}^T$$
$$\mathbf{C} = \mathbf{U}\boldsymbol{\Lambda}\mathbf{U}^T$$

which states that the covariance (or correlation) matrix of the original metrics is related to the weights and variances of the principal components.
$\mathbf{C} = \mathbf{U}\boldsymbol{\Lambda}\mathbf{U}^T$ can be rearranged:

$$\mathbf{U}^T\mathbf{C}\mathbf{U} = \mathbf{U}^T\mathbf{U}\boldsymbol{\Lambda}\mathbf{U}^T\mathbf{U}$$
$$\mathbf{U}^T\mathbf{C}\mathbf{U} = \mathbf{I}\boldsymbol{\Lambda}\mathbf{I}$$
$$\mathbf{U}^T\mathbf{C}\mathbf{U} = \boldsymbol{\Lambda}$$
$$\mathbf{U}^T\mathbf{C}\mathbf{U} - \boldsymbol{\Lambda} = \mathbf{0}$$

If we take only one principal component at a time, $\mathbf{u}^T\mathbf{C}\mathbf{u} - \lambda = 0$, where $\mathbf{u}$ are the weights for this component and $\lambda$ is its variance.
$$\mathbf{u}^T\mathbf{C}\mathbf{u} - \lambda = 0$$
$$\mathbf{u}\mathbf{u}^T\mathbf{C}\mathbf{u} - \lambda\mathbf{u} = \mathbf{0}$$
$$\mathbf{C}\mathbf{u} - \lambda\mathbf{u} = \mathbf{0}$$
$$\left(\mathbf{C} - \lambda\mathbf{I}\right)\mathbf{u} = \mathbf{0}$$

If $\mathbf{C} - \lambda\mathbf{I}$ is invertible we could premultiply by $\left(\mathbf{C} - \lambda\mathbf{I}\right)^{-1}$, and would be left with $\mathbf{u} = \mathbf{0}$, which provides no useful solution. Therefore we want $\mathbf{C} - \lambda\mathbf{I}$ to be non-invertible, which we can ensure by setting the determinant to zero:

$$\left|\mathbf{C} - \lambda\mathbf{I}\right| = 0$$
Using the city data:

$$\mathbf{C} - \lambda\mathbf{I} = \begin{pmatrix} 1 & 0.5 \\ 0.5 & 1 \end{pmatrix} - \lambda\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 - \lambda & 0.5 \\ 0.5 & 1 - \lambda \end{pmatrix}$$

Setting the determinant to zero:

$$\left|\begin{matrix} 1 - \lambda & 0.5 \\ 0.5 & 1 - \lambda \end{matrix}\right| = 0$$
$$\left(1 - \lambda\right)^2 - 0.5^2 = 0$$
$$\lambda^2 - 2\lambda + 0.75 = 0$$
$$\lambda = 1.5 \text{ or } 0.5$$

which gives us the variances for both principal components. (Note that the total variance is 2, the same as for the standardized data.) The eigenvectors can be obtained by solving $\left(\mathbf{C} - \lambda\mathbf{I}\right)\mathbf{u} = \mathbf{0}$.
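The eigenvalues and eigenvectors of $\mathbf{C}$ can also be computed numerically. A NumPy sketch (not part of the presentation), using `np.linalg.eigh`, which is intended for symmetric matrices:

```python
import numpy as np

C = np.array([[1.0, 0.5],
              [0.5, 1.0]])  # correlation matrix of the city metrics

# eigh is for symmetric matrices; eigenvalues come back in ascending order
eigvals, eigvecs = np.linalg.eigh(C)

print(eigvals)                     # approximately [0.5, 1.5]
print(round(eigvals.sum(), 10))    # 2.0 -- the total variance is preserved
print(eigvecs)                     # columns are the weight vectors u (up to sign)
```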
The principal components can also be derived using SVD. Remember that $\mathbf{C} = \mathbf{U}\boldsymbol{\Lambda}\mathbf{U}^T$, and that the variance-covariance matrix for the principal components is a diagonal matrix containing the eigenvalues:

$$\frac{1}{n-1}\mathbf{Z}^T\mathbf{Z} = \boldsymbol{\Lambda} = \begin{pmatrix} 1.5 & 0.0 \\ 0.0 & 0.5 \end{pmatrix}$$

Let the standardized principal components be represented by $\mathbf{W}$: $\mathbf{W} = \mathbf{Z}\mathbf{S}^{-1}$, where $\mathbf{S}$ represents the standard deviations, $\mathbf{S} = \boldsymbol{\Lambda}^{1/2}$.
If we standardize the principal components so that they have unit variance, the variance-covariance matrix of the standardized components is:

$$\frac{1}{n-1}\mathbf{W}^T\mathbf{W} = \begin{pmatrix} 1.00 & 0.00 \\ 0.00 & 1.00 \end{pmatrix}$$

An alternative standardization, $\boldsymbol{\Sigma}$, also eliminates the constant $\frac{1}{n-1}$. Let these standardized principal components be denoted $\mathbf{V}$, which are rescaled principal components:

$$\mathbf{V} = \mathbf{Z}\boldsymbol{\Sigma}^{-1} \quad\text{so that}\quad \mathbf{V}^T\mathbf{V} = \begin{pmatrix} 1.00 & 0.00 \\ 0.00 & 1.00 \end{pmatrix}$$
From $\mathbf{V} = \mathbf{Z}\boldsymbol{\Sigma}^{-1}$ and the original definition of the principal components, $\mathbf{Z} = \mathbf{X}\mathbf{U}$, we get $\mathbf{V} = \mathbf{X}\mathbf{U}\boldsymbol{\Sigma}^{-1}$. To rearrange this equation in terms of $\mathbf{X}$ …
$$\mathbf{V} = \mathbf{X}\mathbf{U}\boldsymbol{\Sigma}^{-1}$$
$$\mathbf{V}\boldsymbol{\Sigma} = \mathbf{X}\mathbf{U}\boldsymbol{\Sigma}^{-1}\boldsymbol{\Sigma}$$
$$\mathbf{V}\boldsymbol{\Sigma} = \mathbf{X}\mathbf{U}$$
$$\mathbf{V}\boldsymbol{\Sigma}\mathbf{U}^T = \mathbf{X}\mathbf{U}\mathbf{U}^T$$
$$\mathbf{V}\boldsymbol{\Sigma}\mathbf{U}^T = \mathbf{X}$$

Now $\mathbf{X}$ is expressed in terms of two orthogonal matrices:

$$\mathbf{U}^T\mathbf{U} = \begin{pmatrix} 1.00 & 0.00 \\ 0.00 & 1.00 \end{pmatrix} = \mathbf{V}^T\mathbf{V}$$

$\mathbf{X} = \mathbf{V}\boldsymbol{\Sigma}\mathbf{U}^T$ defines the SVD of $\mathbf{X}$. An SVD expresses a matrix in terms of a diagonal matrix of singular values, $\boldsymbol{\Sigma}$, and two orthogonal matrices. One of the orthogonal matrices is the principal component weights $\mathbf{U}$; the other is the standardized (rescaled) principal component scores $\mathbf{V}$.
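This decomposition can be verified numerically. Note that `np.linalg.svd` names its factors `u`, `s`, `vh`, which correspond to $\mathbf{V}$, $\boldsymbol{\Sigma}$, and $\mathbf{U}^T$ in the notation used here (an illustrative sketch, not from the presentation):

```python
import numpy as np

Xs = np.array([[-1.0, -1.0],
               [ 1.0,  0.0],
               [ 0.0,  1.0]])  # standardized anomalies

# thin SVD; NumPy's u, s, vh match V, Sigma, U^T in the notes' notation
V, sigma, Ut = np.linalg.svd(Xs, full_matrices=False)

print(np.allclose(Xs, V @ np.diag(sigma) @ Ut))  # True: X = V Sigma U^T
print(np.allclose(V.T @ V, np.eye(2)))           # True: V^T V = I
print(np.allclose(Ut @ Ut.T, np.eye(2)))         # True: U is orthogonal
```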
The eigenvalues can be obtained from the singular values $\boldsymbol{\Sigma}$ as follows. From $\boldsymbol{\Lambda} = \frac{1}{n-1}\mathbf{Z}^T\mathbf{Z}$ and

$$\mathbf{V} = \mathbf{Z}\boldsymbol{\Sigma}^{-1}$$
$$\mathbf{V}\boldsymbol{\Sigma} = \mathbf{Z}\boldsymbol{\Sigma}^{-1}\boldsymbol{\Sigma}$$
$$\mathbf{V}\boldsymbol{\Sigma} = \mathbf{Z}$$

then …
$$\boldsymbol{\Lambda} = \frac{1}{n-1}\left(\mathbf{V}\boldsymbol{\Sigma}\right)^T\left(\mathbf{V}\boldsymbol{\Sigma}\right)$$
$$\boldsymbol{\Lambda} = \frac{1}{n-1}\boldsymbol{\Sigma}^T\mathbf{V}^T\mathbf{V}\boldsymbol{\Sigma}$$
$$\boldsymbol{\Lambda} = \frac{1}{n-1}\boldsymbol{\Sigma}^T\mathbf{I}\boldsymbol{\Sigma}$$
$$\boldsymbol{\Lambda} = \frac{1}{n-1}\boldsymbol{\Sigma}^T\boldsymbol{\Sigma}$$
$$\boldsymbol{\Lambda} = \frac{1}{n-1}\boldsymbol{\Sigma}\boldsymbol{\Sigma}$$
$$\boldsymbol{\Lambda} = \frac{1}{n-1}\boldsymbol{\Sigma}^2$$

Therefore the eigenvalues are simply rescaled squares of the singular values.
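A quick numerical confirmation (illustrative, not from the presentation): squaring the singular values of the standardized city data and dividing by n − 1 reproduces the eigenvalues of the correlation matrix:

```python
import numpy as np

Xs = np.array([[-1.0, -1.0],
               [ 1.0,  0.0],
               [ 0.0,  1.0]])  # standardized anomalies
n = Xs.shape[0]

sigma = np.linalg.svd(Xs, compute_uv=False)  # singular values, in descending order
lam = sigma**2 / (n - 1)                     # Lambda = Sigma^2 / (n - 1)

print(lam)  # approximately [1.5, 0.5]

# the same values as the eigenvalues of the correlation matrix (ascending order)
print(np.linalg.eigvalsh(Xs.T @ Xs / (n - 1)))  # approximately [0.5, 1.5]
```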
Finally, the SVD is useful for demonstrating the equivalence between S- and T-mode analyses.
Mode   Rows of X   Columns of X
S      Time        Space
T      Space       Time
R      Time        Parameter
Q      Parameter   Time
O      Space       Parameter
P      Parameter   Space
$$\mathbf{X} = \mathbf{V}\boldsymbol{\Sigma}\mathbf{U}^T$$
$$\mathbf{X}^T = \left(\mathbf{V}\boldsymbol{\Sigma}\mathbf{U}^T\right)^T = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^T$$

Therefore a T-mode principal components analysis will generate the same results as an S-mode analysis, except that the loadings and the scores are swapped, and the eigenvalues will be scaled by a different value of $n$ (because the number of rows of $\mathbf{X}$ differs between the two modes).
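This equivalence can be demonstrated numerically: taking the SVD of $\mathbf{X}$ and of $\mathbf{X}^T$ yields identical singular values, with the two orthogonal factors exchanging roles (up to the arbitrary signs of the singular vectors). An illustrative NumPy sketch:

```python
import numpy as np

Xs = np.array([[-1.0, -1.0],
               [ 1.0,  0.0],
               [ 0.0,  1.0]])  # S-mode: rows = time, columns = space

V, s1, Ut = np.linalg.svd(Xs, full_matrices=False)      # S-mode: X = V Sigma U^T
U2, s2, Vt2 = np.linalg.svd(Xs.T, full_matrices=False)  # T-mode: X^T = U Sigma V^T

print(np.allclose(s1, s2))                    # True: identical singular values
print(np.allclose(np.abs(Ut.T), np.abs(U2)))  # True: the factors swap roles (up to sign)
```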