Principal Components: A Mathematical Introduction
Simon Mason
International Research Institute for Climate Prediction
The Earth Institute of Columbia University
Linking Science to Society
Linking Science to Sight-Seeing!
What is the most beautiful city setting?
The setting could be measured on a variety of metrics, such as the height of the surrounding mountains or the length of the coastline.
But if more than one metric is used, then some combined measure will need to be devised.
The city scores can be represented by a matrix, X. For simplicity, the scores are considered on only two metrics, and for only three cities.
The metrics are sea and mountains, and the cities are San Francisco, Hong Kong, and Cape Town:
X:
                 sea   mtns
San Francisco      4      4
Hong Kong          8      5
Cape Town          6      6

The means and variances are:

                 sea   mountains
mean               6       5
variance           4       1
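The means and variances above can be checked numerically. The following NumPy sketch is not part of the original presentation; note that `ddof=1` requests the sample variance (divisor n − 1), which is the divisor used throughout these notes.

```python
import numpy as np

# City scores: rows = San Francisco, Hong Kong, Cape Town; columns = sea, mountains
X = np.array([[4.0, 4.0],
              [8.0, 5.0],
              [6.0, 6.0]])

means = X.mean(axis=0)             # column means
variances = X.var(axis=0, ddof=1)  # sample variances (divisor n - 1)

print(means)      # [6. 5.]
print(variances)  # [4. 1.]
```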
The variance is used to distinguish the cities' attractiveness; the total variance is 4 + 1 = 5.
X can be expressed as an anomaly matrix or a standardized anomaly matrix:
anomalies:

$$\mathbf{X} = \begin{pmatrix} -2 & -1 \\ 2 & 0 \\ 0 & 1 \end{pmatrix}$$

standardized anomalies:

$$\mathbf{X} = \begin{pmatrix} -1 & -1 \\ 1 & 0 \\ 0 & 1 \end{pmatrix}$$
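The two anomaly matrices can likewise be produced in a few lines of NumPy (an illustrative sketch, not part of the presentation):

```python
import numpy as np

X = np.array([[4.0, 4.0],
              [8.0, 5.0],
              [6.0, 6.0]])  # rows: San Francisco, Hong Kong, Cape Town

anomalies = X - X.mean(axis=0)                    # subtract the column means
standardized = anomalies / X.std(axis=0, ddof=1)  # divide by the sample standard deviations

print(anomalies)     # rows: [-2, -1], [2, 0], [0, 1]
print(standardized)  # rows: [-1, -1], [1, 0], [0, 1]
```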
In general, if

$$\mathbf{X} = \begin{pmatrix} a & d \\ b & e \\ c & f \end{pmatrix}$$

and $a + b + c = 0$ and $d + e + f = 0$, then

$$\text{variance} = \frac{a^2 + b^2 + c^2}{2} \qquad \text{covariance} = \frac{ad + be + cf}{2}$$
In general, matrix multiplication gives:

$$\mathbf{X}^T\mathbf{X} = \begin{pmatrix} a & b & c \\ d & e & f \end{pmatrix} \begin{pmatrix} a & d \\ b & e \\ c & f \end{pmatrix} = \begin{pmatrix} a^2 + b^2 + c^2 & ad + be + cf \\ ad + be + cf & d^2 + e^2 + f^2 \end{pmatrix}$$

So, if $a + b + c = 0$ and $d + e + f = 0$, then:

$$\frac{1}{n-1}\mathbf{X}^T\mathbf{X} = \text{variance-covariance matrix}$$
If $\mathbf{X}$ contains data expressed as anomalies:

$$\frac{1}{n-1}\mathbf{X}^T\mathbf{X} = \text{variance-covariance matrix}$$

If $\mathbf{X}$ contains data expressed as standardized anomalies:

$$\frac{1}{n-1}\mathbf{X}^T\mathbf{X} = \text{correlation matrix}$$
Using the city data expressed in standardized anomalies:
$$\mathbf{X} = \begin{pmatrix} -1 & -1 \\ 1 & 0 \\ 0 & 1 \end{pmatrix}$$

$$\frac{1}{n-1}\mathbf{X}^T\mathbf{X} = \frac{1}{2}\begin{pmatrix} -1 & 1 & 0 \\ -1 & 0 & 1 \end{pmatrix}\begin{pmatrix} -1 & -1 \\ 1 & 0 \\ 0 & 1 \end{pmatrix} = \frac{1}{2}\begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix} = \begin{pmatrix} 1 & 0.5 \\ 0.5 & 1 \end{pmatrix}$$
The variance-covariance matrix for the city data is:
$$\frac{1}{n-1}\mathbf{X}^T\mathbf{X} = \begin{pmatrix} 1 & 0.5 \\ 0.5 & 1 \end{pmatrix}$$
Note that the covariances are greater than zero, implying that both metrics represent a common aspect of city attractiveness.
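As a numerical check (a NumPy sketch added for illustration), the correlation matrix can be formed directly from the standardized anomalies, and compared with NumPy's built-in `np.corrcoef`:

```python
import numpy as np

Xs = np.array([[-1.0, -1.0],
               [ 1.0,  0.0],
               [ 0.0,  1.0]])  # standardized anomalies
n = Xs.shape[0]

C = Xs.T @ Xs / (n - 1)  # correlation matrix, since Xs is standardized
# the same matrix, computed by NumPy from the raw city scores
corr = np.corrcoef(np.array([[4.0, 4.0], [8.0, 5.0], [6.0, 6.0]]), rowvar=False)

print(C)     # [[1.  0.5], [0.5 1. ]]
print(corr)  # identical
```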
Because of the covariance (or correlation) between the two metrics, we could combine these two metrics into a single new metric that represents the variance that is common to both metrics.
Specifically, we want to define sets of weights so that the new variables are uncorrelated and have maximized variance.
Let the weights for the first principal component be $\mathbf{a}$ (with elements $a_{\text{sea}}$ and $a_{\text{mtns}}$), and those for the second principal component be $\mathbf{b}$ (with elements $b_{\text{sea}}$ and $b_{\text{mtns}}$).
In matrix notation the data are post-multiplied by the weights, represented as $\mathbf{U}$:

$$\mathbf{X}\mathbf{U} = \begin{pmatrix} -1 & -1 \\ 1 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} a_{\text{sea}} & b_{\text{sea}} \\ a_{\text{mtns}} & b_{\text{mtns}} \end{pmatrix}$$

This gives the principal components $\mathbf{Z}$. The scores on the principal components are:

$$\mathbf{Z} = \begin{array}{lcc} & \text{PC1} & \text{PC2} \\ \text{San Francisco} & -a_{\text{sea}} - a_{\text{mtns}} & -b_{\text{sea}} - b_{\text{mtns}} \\ \text{Hong Kong} & a_{\text{sea}} & b_{\text{sea}} \\ \text{Cape Town} & a_{\text{mtns}} & b_{\text{mtns}} \end{array}$$
The principal components are defined as:

$$\mathbf{Z} = \mathbf{X}\mathbf{U}$$

which simply states that they are calculated as weighted sums of the original metrics.
Note that the sum of the squared weights for each component equals 1 (each column of $\mathbf{U}$ is a unit vector). Also, if the principal components are to be uncorrelated, the weight vectors must be orthogonal to each other.
These two properties of the weights are useful. The diagonal elements of $\mathbf{U}^T\mathbf{U}$ are the sums of squares of each column of $\mathbf{U}$. A column of $\mathbf{U}$ contains the weights for one of the principal components, so the diagonals of $\mathbf{U}^T\mathbf{U}$ are 1. Because the weight vectors are orthogonal, the off-diagonals are 0:

$$\mathbf{U}^T\mathbf{U} = \begin{pmatrix} a_{\text{sea}} & a_{\text{mtns}} \\ b_{\text{sea}} & b_{\text{mtns}} \end{pmatrix} \begin{pmatrix} a_{\text{sea}} & b_{\text{sea}} \\ a_{\text{mtns}} & b_{\text{mtns}} \end{pmatrix} = \begin{pmatrix} a_{\text{sea}}^2 + a_{\text{mtns}}^2 & a_{\text{sea}}b_{\text{sea}} + a_{\text{mtns}}b_{\text{mtns}} \\ b_{\text{sea}}a_{\text{sea}} + b_{\text{mtns}}a_{\text{mtns}} & b_{\text{sea}}^2 + b_{\text{mtns}}^2 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$$

so that $\mathbf{U}^T\mathbf{U} = \mathbf{U}\mathbf{U}^T = \mathbf{I}$.
So if we post-multiply $\mathbf{Z} = \mathbf{X}\mathbf{U}$ by $\mathbf{U}^T$, we get:

$$\mathbf{Z}\mathbf{U}^T = \mathbf{X}\mathbf{U}\mathbf{U}^T = \mathbf{X}\mathbf{I} = \mathbf{X}$$
This allows us to express $\mathbf{X}$ in terms of the principal component scores and loadings. Remember that $\frac{1}{n-1}\mathbf{X}^T\mathbf{X}$ is either the variance-covariance matrix or the correlation matrix (depending on whether $\mathbf{X}$ contains anomalies or standardized anomalies). We can replace $\mathbf{X}$ by $\mathbf{Z}\mathbf{U}^T$, which gives:

$$\frac{1}{n-1}\mathbf{X}^T\mathbf{X} = \frac{1}{n-1}\left(\mathbf{Z}\mathbf{U}^T\right)^T\left(\mathbf{Z}\mathbf{U}^T\right) = \frac{1}{n-1}\mathbf{U}\mathbf{Z}^T\mathbf{Z}\mathbf{U}^T = \mathbf{U}\left(\tfrac{1}{n-1}\mathbf{Z}^T\mathbf{Z}\right)\mathbf{U}^T$$
Compare $\frac{1}{n-1}\mathbf{X}^T\mathbf{X}$ with $\frac{1}{n-1}\mathbf{Z}^T\mathbf{Z}$, which appears in the previous equation. Just as $\frac{1}{n-1}\mathbf{X}^T\mathbf{X}$ is the covariance (correlation) matrix for $\mathbf{X}$, so $\frac{1}{n-1}\mathbf{Z}^T\mathbf{Z}$ is the covariance matrix for $\mathbf{Z}$. We know that the covariance matrix for $\mathbf{Z}$ is a diagonal matrix of eigenvalues (the variances of the principal components). Setting $\mathbf{C} = \frac{1}{n-1}\mathbf{X}^T\mathbf{X}$ and $\boldsymbol{\Lambda} = \frac{1}{n-1}\mathbf{Z}^T\mathbf{Z}$:

$$\frac{1}{n-1}\mathbf{X}^T\mathbf{X} = \mathbf{U}\left(\tfrac{1}{n-1}\mathbf{Z}^T\mathbf{Z}\right)\mathbf{U}^T$$
$$\mathbf{C} = \mathbf{U}\boldsymbol{\Lambda}\mathbf{U}^T$$

which states that the covariance (or correlation) matrix of the original metrics is related to the weights and variances of the principal components.
$\mathbf{C} = \mathbf{U}\boldsymbol{\Lambda}\mathbf{U}^T$ can be rearranged:

$$\mathbf{U}^T\mathbf{C}\mathbf{U} = \mathbf{U}^T\mathbf{U}\boldsymbol{\Lambda}\mathbf{U}^T\mathbf{U}$$
$$\mathbf{U}^T\mathbf{C}\mathbf{U} = \mathbf{I}\boldsymbol{\Lambda}\mathbf{I}$$
$$\mathbf{U}^T\mathbf{C}\mathbf{U} = \boldsymbol{\Lambda}$$
$$\mathbf{U}^T\mathbf{C}\mathbf{U} - \boldsymbol{\Lambda} = \mathbf{0}$$

If we take only one principal component at a time, $\mathbf{u}^T\mathbf{C}\mathbf{u} - \lambda = 0$, where $\mathbf{u}$ are the weights for this component and $\lambda$ is its variance.
$$\mathbf{u}^T\mathbf{C}\mathbf{u} - \lambda = 0$$
$$\mathbf{u}\mathbf{u}^T\mathbf{C}\mathbf{u} - \lambda\mathbf{u} = \mathbf{0}$$
$$\mathbf{C}\mathbf{u} - \lambda\mathbf{u} = \mathbf{0}$$
$$\left(\mathbf{C} - \lambda\mathbf{I}\right)\mathbf{u} = \mathbf{0}$$

If $\mathbf{C} - \lambda\mathbf{I}$ is invertible we could premultiply by $\left(\mathbf{C} - \lambda\mathbf{I}\right)^{-1}$, and would be left with $\mathbf{u} = \mathbf{0}$, which provides no useful solution. Therefore we want $\mathbf{C} - \lambda\mathbf{I}$ to be non-invertible, which we can ensure by setting the determinant to zero:

$$\left|\mathbf{C} - \lambda\mathbf{I}\right| = 0$$
Using the city data:

$$\mathbf{C} - \lambda\mathbf{I} = \begin{pmatrix} 1 & 0.5 \\ 0.5 & 1 \end{pmatrix} - \lambda\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 - \lambda & 0.5 \\ 0.5 & 1 - \lambda \end{pmatrix}$$

Setting the determinant to zero:

$$\left|\begin{matrix} 1 - \lambda & 0.5 \\ 0.5 & 1 - \lambda \end{matrix}\right| = 0$$
$$\left(1 - \lambda\right)^2 - 0.5^2 = 0$$
$$\lambda^2 - 2\lambda + 0.75 = 0$$
$$\lambda = 1.5 \text{ or } 0.5$$

which gives us the variances for both principal components. (Note that the total variance is 2, the same as for the standardized data.) The eigenvectors can be obtained by solving $\left(\mathbf{C} - \lambda\mathbf{I}\right)\mathbf{u} = \mathbf{0}$.
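The eigenvalues and eigenvectors of $\mathbf{C}$ can also be computed numerically. A NumPy sketch (not part of the presentation), using `np.linalg.eigh`, which is intended for symmetric matrices:

```python
import numpy as np

C = np.array([[1.0, 0.5],
              [0.5, 1.0]])  # correlation matrix of the city metrics

# eigh is for symmetric matrices; eigenvalues come back in ascending order
eigvals, eigvecs = np.linalg.eigh(C)

print(eigvals)                     # approximately [0.5, 1.5]
print(round(eigvals.sum(), 10))    # 2.0 -- the total variance is preserved
print(eigvecs)                     # columns are the weight vectors u (up to sign)
```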
The principal components can also be derived using SVD. Remember that $\mathbf{C} = \mathbf{U}\boldsymbol{\Lambda}\mathbf{U}^T$, and that the variance-covariance matrix for the principal components is a diagonal matrix containing the eigenvalues:

$$\frac{1}{n-1}\mathbf{Z}^T\mathbf{Z} = \boldsymbol{\Lambda} = \begin{pmatrix} 1.5 & 0.0 \\ 0.0 & 0.5 \end{pmatrix}$$

Let the standardized principal components be represented by $\mathbf{W}$: $\mathbf{W} = \mathbf{Z}\mathbf{S}^{-1}$, where $\mathbf{S}$ represents the standard deviations, $\mathbf{S} = \boldsymbol{\Lambda}^{1/2}$.
If we standardize the principal components so that they have unit variance, the variance-covariance matrix of the standardized components is:

$$\frac{1}{n-1}\mathbf{W}^T\mathbf{W} = \begin{pmatrix} 1.00 & 0.00 \\ 0.00 & 1.00 \end{pmatrix}$$

An alternative standardization, $\boldsymbol{\Sigma}$, also eliminates the constant $\frac{1}{n-1}$. Let these standardized principal components be denoted $\mathbf{V}$, which are rescaled principal components:

$$\mathbf{V} = \mathbf{Z}\boldsymbol{\Sigma}^{-1} \quad\text{so that}\quad \mathbf{V}^T\mathbf{V} = \begin{pmatrix} 1.00 & 0.00 \\ 0.00 & 1.00 \end{pmatrix}$$
From $\mathbf{V} = \mathbf{Z}\boldsymbol{\Sigma}^{-1}$ and the original definition of the principal components, $\mathbf{Z} = \mathbf{X}\mathbf{U}$, we get $\mathbf{V} = \mathbf{X}\mathbf{U}\boldsymbol{\Sigma}^{-1}$. To rearrange this equation in terms of $\mathbf{X}$ …
$$\mathbf{V} = \mathbf{X}\mathbf{U}\boldsymbol{\Sigma}^{-1}$$
$$\mathbf{V}\boldsymbol{\Sigma} = \mathbf{X}\mathbf{U}\boldsymbol{\Sigma}^{-1}\boldsymbol{\Sigma}$$
$$\mathbf{V}\boldsymbol{\Sigma} = \mathbf{X}\mathbf{U}$$
$$\mathbf{V}\boldsymbol{\Sigma}\mathbf{U}^T = \mathbf{X}\mathbf{U}\mathbf{U}^T$$
$$\mathbf{V}\boldsymbol{\Sigma}\mathbf{U}^T = \mathbf{X}$$

Now $\mathbf{X}$ is expressed in terms of two orthogonal matrices:

$$\mathbf{U}^T\mathbf{U} = \begin{pmatrix} 1.00 & 0.00 \\ 0.00 & 1.00 \end{pmatrix} = \mathbf{V}^T\mathbf{V}$$

$\mathbf{X} = \mathbf{V}\boldsymbol{\Sigma}\mathbf{U}^T$ defines the SVD of $\mathbf{X}$. An SVD expresses a matrix in terms of a diagonal matrix of singular values, $\boldsymbol{\Sigma}$, and two orthogonal matrices. One of the orthogonal matrices is the principal component weights $\mathbf{U}$; the other is the standardized (rescaled) principal component scores $\mathbf{V}$.
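This decomposition can be verified numerically. Note that `np.linalg.svd` names its factors `u`, `s`, `vh`, which correspond to $\mathbf{V}$, $\boldsymbol{\Sigma}$, and $\mathbf{U}^T$ in the notation used here (an illustrative sketch, not from the presentation):

```python
import numpy as np

Xs = np.array([[-1.0, -1.0],
               [ 1.0,  0.0],
               [ 0.0,  1.0]])  # standardized anomalies

# thin SVD; NumPy's u, s, vh match V, Sigma, U^T in the notes' notation
V, sigma, Ut = np.linalg.svd(Xs, full_matrices=False)

print(np.allclose(Xs, V @ np.diag(sigma) @ Ut))  # True: X = V Sigma U^T
print(np.allclose(V.T @ V, np.eye(2)))           # True: V^T V = I
print(np.allclose(Ut @ Ut.T, np.eye(2)))         # True: U is orthogonal
```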
The eigenvalues can be obtained from the singular values $\boldsymbol{\Sigma}$ as follows. From $\boldsymbol{\Lambda} = \frac{1}{n-1}\mathbf{Z}^T\mathbf{Z}$ and

$$\mathbf{V} = \mathbf{Z}\boldsymbol{\Sigma}^{-1}$$
$$\mathbf{V}\boldsymbol{\Sigma} = \mathbf{Z}\boldsymbol{\Sigma}^{-1}\boldsymbol{\Sigma}$$
$$\mathbf{V}\boldsymbol{\Sigma} = \mathbf{Z}$$

then …
$$\boldsymbol{\Lambda} = \frac{1}{n-1}\left(\mathbf{V}\boldsymbol{\Sigma}\right)^T\left(\mathbf{V}\boldsymbol{\Sigma}\right)$$
$$\boldsymbol{\Lambda} = \frac{1}{n-1}\boldsymbol{\Sigma}^T\mathbf{V}^T\mathbf{V}\boldsymbol{\Sigma}$$
$$\boldsymbol{\Lambda} = \frac{1}{n-1}\boldsymbol{\Sigma}^T\mathbf{I}\boldsymbol{\Sigma}$$
$$\boldsymbol{\Lambda} = \frac{1}{n-1}\boldsymbol{\Sigma}^T\boldsymbol{\Sigma}$$
$$\boldsymbol{\Lambda} = \frac{1}{n-1}\boldsymbol{\Sigma}\boldsymbol{\Sigma}$$
$$\boldsymbol{\Lambda} = \frac{1}{n-1}\boldsymbol{\Sigma}^2$$

Therefore the eigenvalues are simply rescaled squares of the singular values.
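A quick numerical confirmation (illustrative, not from the presentation): squaring the singular values of the standardized city data and dividing by n − 1 reproduces the eigenvalues of the correlation matrix:

```python
import numpy as np

Xs = np.array([[-1.0, -1.0],
               [ 1.0,  0.0],
               [ 0.0,  1.0]])  # standardized anomalies
n = Xs.shape[0]

sigma = np.linalg.svd(Xs, compute_uv=False)  # singular values, in descending order
lam = sigma**2 / (n - 1)                     # Lambda = Sigma^2 / (n - 1)

print(lam)  # approximately [1.5, 0.5]

# the same values as the eigenvalues of the correlation matrix (ascending order)
print(np.linalg.eigvalsh(Xs.T @ Xs / (n - 1)))  # approximately [0.5, 1.5]
```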
Finally, the SVD is useful for demonstrating the equivalence between S- and T-mode analyses.
Mode   Rows of X   Columns of X
S      Time        Space
T      Space       Time
R      Time        Parameter
Q      Parameter   Time
O      Space       Parameter
P      Parameter   Space
$$\mathbf{X} = \mathbf{V}\boldsymbol{\Sigma}\mathbf{U}^T$$
$$\mathbf{X}^T = \left(\mathbf{V}\boldsymbol{\Sigma}\mathbf{U}^T\right)^T = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^T$$

Therefore a T-mode principal components analysis will generate the same results as an S-mode analysis, except that the loadings and the scores are swapped, and the eigenvalues will be scaled by a different value of $n$ (because the number of rows of $\mathbf{X}$ differs between the two modes).
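This equivalence can be demonstrated numerically: taking the SVD of $\mathbf{X}$ and of $\mathbf{X}^T$ yields identical singular values, with the two orthogonal factors exchanging roles (up to the arbitrary signs of the singular vectors). An illustrative NumPy sketch:

```python
import numpy as np

Xs = np.array([[-1.0, -1.0],
               [ 1.0,  0.0],
               [ 0.0,  1.0]])  # S-mode: rows = time, columns = space

V, s1, Ut = np.linalg.svd(Xs, full_matrices=False)      # S-mode: X = V Sigma U^T
U2, s2, Vt2 = np.linalg.svd(Xs.T, full_matrices=False)  # T-mode: X^T = U Sigma V^T

print(np.allclose(s1, s2))                    # True: identical singular values
print(np.allclose(np.abs(Ut.T), np.abs(U2)))  # True: the factors swap roles (up to sign)
```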