PRINCIPAL COMPONENT ANALYSIS (PCA): EOFs and Principal Components; Selection Rules

LECTURE 8

Supplementary Readings:

Wilks, Chapter 9

WE’LL START OUT WITH AN EXAMPLE: 20th CENTURY GLOBAL SURFACE TEMPERATURE RECORD

Climatic Research Unit (‘CRU’), University of East Anglia

Surface Temperature Changes

EOFs for the five leading eigenvectors of the global temperature data from 1902-1980.

The gridpoint areal weighting factor used in the PCA procedure has been removed from the EOFs so that relative temperature anomalies can be inferred from the patterns.

EOF #1: 12% (88%)

EOF #2: 6% (3%)

EOF #3: 5% (1%)

EOF #4: 4% (1%)

EOF #5: 3% (0.5%)

SURFACE TEMPERATURE RECORD FILTERED BY RETAINING THE PROJECTION ONTO THE FIRST FIVE EIGENVECTORS

FILTERING THROUGH PCA

GLOBAL TEMPERATURE TREND

EOF #1

PC #1

EL NINO/SOUTHERN OSCILLATION (ENSO)

EOF #2

PC #2

Multivariate ENSO Index (“MEI”)

NORTH ATLANTIC OSCILLATION

EOF #3

PC #3

TROPICAL ATLANTIC “DIPOLE”

EOF #3

PC #3

ATLANTIC MULTIDECADAL OSCILLATION

EOF #5

PC #5

PCA as an SVD on the Data Matrix X

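Recall from our earlier lecture the variance-covariance matrix A in the multivariate regression problem:

$$A\,b = c$$

$$A = \begin{pmatrix}
\sum_i \left(\hat{x}_1^{(i)}\right)^2 & \sum_i \hat{x}_1^{(i)}\hat{x}_2^{(i)} & \sum_i \hat{x}_1^{(i)}\hat{x}_3^{(i)} & \cdots & \sum_i \hat{x}_1^{(i)}\hat{x}_M^{(i)} \\
\sum_i \hat{x}_2^{(i)}\hat{x}_1^{(i)} & \sum_i \left(\hat{x}_2^{(i)}\right)^2 & \sum_i \hat{x}_2^{(i)}\hat{x}_3^{(i)} & \cdots & \sum_i \hat{x}_2^{(i)}\hat{x}_M^{(i)} \\
\sum_i \hat{x}_3^{(i)}\hat{x}_1^{(i)} & \sum_i \hat{x}_3^{(i)}\hat{x}_2^{(i)} & \sum_i \left(\hat{x}_3^{(i)}\right)^2 & \cdots & \sum_i \hat{x}_3^{(i)}\hat{x}_M^{(i)} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\sum_i \hat{x}_M^{(i)}\hat{x}_1^{(i)} & \sum_i \hat{x}_M^{(i)}\hat{x}_2^{(i)} & \sum_i \hat{x}_M^{(i)}\hat{x}_3^{(i)} & \cdots & \sum_i \left(\hat{x}_M^{(i)}\right)^2
\end{pmatrix}$$

Here $\hat{x}_j^{(i)}$ denotes the $i$-th sample of the $j$-th predictor, and each sum runs over the samples $i$.

The eigenvectors of A comprise an orthogonal predictor set (Principal Components Regression).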
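The orthogonality claim can be checked numerically. The following is a minimal sketch, not the lecture's own code: it assumes NumPy, synthetic correlated predictors, and a layout in which rows of `X` are predictors and columns are samples, so that `A = X @ X.T`. The names `mix`, `base`, `E`, and `Z` are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(2)
M, N = 3, 500                        # M predictors (rows), N samples (columns)

# Build correlated, zero-mean predictors by mixing independent noise series.
base = rng.standard_normal((M, N))
mix = np.array([[1.0, 0.8, 0.5],
                [0.0, 1.0, 0.7],
                [0.0, 0.0, 1.0]])
X = mix @ base
X -= X.mean(axis=1, keepdims=True)   # remove the sample mean of every predictor

A = X @ X.T                          # M x M matrix of sums of products
eigvals, E = np.linalg.eigh(A)       # columns of E are orthonormal eigenvectors of A

Z = E.T @ X                          # new predictors: projections onto the eigenvectors
cross = Z @ Z.T                      # sums of products between the new predictors

print(np.round(A, 1))                # off-diagonal terms are large: predictors are correlated
print(np.round(cross, 6))            # diagonal: the new predictors are mutually orthogonal
```

The diagonal of `cross` reproduces the eigenvalues of A, which is why regression onto the new predictors decouples into independent one-variable problems.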

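Singular Value Decomposition (SVD)

Let us return to the data matrix X (assume it has zero mean), in which row $j$ holds variable $j$ and column $i$ holds sample $i$:

$$X = \begin{pmatrix}
\hat{x}_1^{(1)} & \hat{x}_1^{(2)} & \hat{x}_1^{(3)} & \cdots & \hat{x}_1^{(N)} \\
\hat{x}_2^{(1)} & \hat{x}_2^{(2)} & \hat{x}_2^{(3)} & \cdots & \hat{x}_2^{(N)} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\hat{x}_M^{(1)} & \hat{x}_M^{(2)} & \hat{x}_M^{(3)} & \cdots & \hat{x}_M^{(N)}
\end{pmatrix}$$

Assume M>N (overdetermined; greater number of “equations” than “unknowns”). We can write

$$X = U S V^T = \sum_{k=1}^{N} s_k u_k v_k^T$$

where U, V are unitary matrices (orthogonal matrices if X is real-valued), U is M×N, S is diagonal N×N, and V is N×N. Here $s_k$ is the $k$-th diagonal element of S, and $u_k$, $v_k$ are the $k$-th columns of U and V.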

Typically, we are interested in the case N>M. A revised overdetermined problem can be obtained by redefining the problem in terms of $X^T$. We can then write

$$X^T = U S V^T = \sum_{k=1}^{M} s_k u_k v_k^T$$

where U, V are unitary matrices (orthogonal matrices if X is real-valued), U is N×M, S is diagonal M×M, and V is M×M; $s_k$ is the $k$-th diagonal element of S, and $u_k$, $v_k$ are the $k$-th columns of U and V. Equivalently,

$$X = (U S V^T)^T = V S^T U^T = V S U^T = \sum_{k=1}^{M} s_k v_k u_k^T$$

With $X^T = U S V^T$ (so that $X = V S U^T$), the matrix A can be written in terms of the data matrix:

$$A = X X^T, \qquad A_{jk} = \sum_{i=1}^{N} \hat{x}_j^{(i)} \hat{x}_k^{(i)}$$

$$X X^T = V S U^T\, U S V^T = V S\, I_{(M \times M)}\, S V^T = V S^2 V^T = V S^2 V^{-1}$$

V is a unitary matrix which diagonalizes $XX^T$!

There is a mathematical equivalence between taking the Singular Value Decomposition (SVD) of X and finding the eigenvectors of $A = XX^T$.

Thus, $S^2$ contains the eigenvalues of $XX^T$.
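The equivalence can be verified on synthetic data. This is a minimal sketch assuming NumPy and a random zero-mean data matrix with M rows (variables) and N columns (samples); it is an illustration, not the lecture's MATLAB example.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 4, 50                          # M variables (rows), N samples (columns), N > M
X = rng.standard_normal((M, N))
X -= X.mean(axis=1, keepdims=True)    # zero mean along the sample dimension

# SVD of the transposed data matrix: X^T = U S V^T
U, s, Vt = np.linalg.svd(X.T, full_matrices=False)
V = Vt.T                              # M x M, columns are the eigenvectors of X X^T

# Direct eigen-decomposition of A = X X^T (symmetric, so eigh applies)
A = X @ X.T
eigvals = np.linalg.eigh(A)[0][::-1]  # eigh returns ascending order; reverse it

eigs_match = np.allclose(s**2, eigvals)                  # S^2 holds the eigenvalues
diagonalized = np.allclose(V.T @ A @ V, np.diag(s**2))   # V diagonalizes X X^T
print(eigs_match, diagonalized)
```

Working from the SVD avoids forming $XX^T$ explicitly, which is usually the better-conditioned route when the data matrix is large.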

U contains as its columns the temporal patterns or Principal Components (“PCs”) corresponding to the M eigenvalues, which are the “right eigenvectors” of the SVD:

$$X^T = U S V^T = \sum_{k=1}^{M} s_k u_k v_k^T, \qquad X^T V = U S$$

V contains as its columns the spatial patterns or Empirical Orthogonal Functions (“EOFs”) corresponding to the M eigenvalues, which are the “left eigenvectors” of the SVD:

$$X = V S U^T = \sum_{k=1}^{M} s_k v_k u_k^T, \qquad X U = V S$$

Here $s_k$ is the $k$-th diagonal element of S, and $u_k$, $v_k$ are the $k$-th columns of U and V.

FILTERING WITH EIGENVECTORS

$$X^T = U S V^T = \sum_{k=1}^{M} s_k u_k v_k^T$$

We can filter the original data with a subset of M* eigenvectors:

$$\left(X^T\right)^{(M^*)} = \sum_{k=1}^{M^*} s_k u_k v_k^T$$
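A truncated reconstruction of this kind can be sketched in a few lines. The example below assumes NumPy and a synthetic data set (one oscillating spatial pattern plus weak noise); the helper name `pca_filter` and the argument `n_modes` (playing the role of M*) are illustrative, not part of the lecture.

```python
import numpy as np

def pca_filter(X, n_modes):
    """Rebuild X (rows = variables, columns = samples) from its leading n_modes modes."""
    U, s, Vt = np.linalg.svd(X.T, full_matrices=False)   # X^T = U S V^T
    # Keep only the first n_modes terms of the sum over s_k u_k v_k^T.
    Xt_filtered = (U[:, :n_modes] * s[:n_modes]) @ Vt[:n_modes, :]
    return Xt_filtered.T

rng = np.random.default_rng(1)
M, N = 6, 200
t = np.arange(N)
pattern = rng.standard_normal(M)                   # hypothetical spatial pattern
signal = np.outer(pattern, np.sin(2 * np.pi * t / 50))
X = signal + 0.1 * rng.standard_normal((M, N))     # signal plus weak noise
X -= X.mean(axis=1, keepdims=True)

X1 = pca_filter(X, 1)       # filtered with M* = 1
X_all = pca_filter(X, M)    # retaining every mode returns the original data

print(np.linalg.norm(X - X1) / np.linalg.norm(X))  # relative size of the discarded part
```

Retaining all M modes reproduces the data exactly, so the filter only removes information when M* < M.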

Some Additional Considerations:

• Standardization & Areal Weighting
• Gappy Data
• Frequency domain
• “Rotation”
• Selection Rules
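For the first item, a minimal sketch of standardization and areal weighting is given below. It assumes NumPy, a regular latitude-longitude grid, and the commonly used weight sqrt(cos(latitude)); the lecture does not state which weighting it uses, so that choice, the latitude values, and the helper names are all assumptions. The last line shows how the weighting factor can be divided back out of the EOFs so that relative anomalies can be read from the patterns.

```python
import numpy as np

def standardize(X):
    """Remove the mean and divide by the standard deviation of each row (variable)."""
    anomalies = X - X.mean(axis=1, keepdims=True)
    return anomalies / anomalies.std(axis=1, keepdims=True)

def area_weights(lat_deg):
    """sqrt(cos(latitude)) weights: an assumed, commonly used choice for a lat-lon grid."""
    return np.sqrt(np.cos(np.deg2rad(lat_deg)))

rng = np.random.default_rng(5)
lat = np.array([-60.0, -30.0, 0.0, 30.0, 60.0])    # hypothetical gridpoint latitudes
M, N = lat.size, 120
X = rng.standard_normal((M, N)) * np.arange(1, M + 1)[:, None]  # rows with unequal variance

w = area_weights(lat)
Xw = standardize(X) * w[:, None]                   # weighted, standardized data matrix

U, s, Vt = np.linalg.svd(Xw.T, full_matrices=False)
eofs_weighted = Vt.T                               # columns are EOFs of the weighted data
eofs_unweighted = eofs_weighted / w[:, None]       # weighting factor removed from the EOFs
```

Because the eigen-decomposition acts on sums of squares, the square root of the area factor is applied to the data so that the variance carries the full area weight.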

SELECTION RULES

How many eigenvectors do we consider significant?

• Eigenvalue > 1/M

• Break in slope in the eigenvalue spectrum (“Scree” test) or the log eigenvalue (“LEV”) spectrum

• Eigenvalue lies outside the expected distribution for M uncorrelated Gaussian time series of length N (Preisendorfer Rule N). This is an example of a Monte Carlo method.

• Rule N’ (takes serial correlation into account)

There is no uniquely defensible criterion...
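The first two rules can be sketched as follows. The example assumes NumPy, synthetic data, and that "eigenvalue" in the 1/M rule means the normalized eigenvalue (the fraction of total variance carried by each mode); that reading is an assumption, as are the names `normalized_eigenvalues`, `keep`, and `drops`.

```python
import numpy as np

def normalized_eigenvalues(X):
    """Eigenvalues of X X^T (squared singular values), scaled to sum to one."""
    lam = np.linalg.svd(X, compute_uv=False) ** 2
    return lam / lam.sum()

rng = np.random.default_rng(3)
M, N = 8, 300
t = np.arange(N)
X = rng.standard_normal((M, N))                                    # background noise
X += 3.0 * np.outer(rng.standard_normal(M), np.sin(2 * np.pi * t / 60))  # one strong mode
X -= X.mean(axis=1, keepdims=True)

frac = normalized_eigenvalues(X)      # fraction of variance per mode, in descending order
keep = frac > 1.0 / M                 # rule 1: retain modes above the average share

log_spectrum = np.log(frac)           # LEV spectrum
drops = -np.diff(log_spectrum)        # step sizes; a large step marks a break in slope

print(np.round(frac, 3))
print(keep)
print(np.round(drops, 3))
```

Both rules are heuristics: the 1/M threshold compares each mode with an equal share of the variance, while the scree and LEV tests rely on visual or numerical detection of a break.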

SELECTION RULES

Preisendorfer Rule N
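A Monte Carlo version of Rule N can be sketched as below. This is an illustration under stated assumptions rather than the lecture's implementation: it assumes NumPy, normalized eigenvalues, a 95th-percentile threshold, and 200 surrogate trials, none of which are specified in the slides. The function names and parameters are hypothetical.

```python
import numpy as np

def normalized_eigenvalues(X):
    """Eigenvalues of X X^T (squared singular values), scaled to sum to one."""
    lam = np.linalg.svd(X, compute_uv=False) ** 2
    return lam / lam.sum()

def rule_n_threshold(M, N, n_trials=200, percentile=95.0, seed=0):
    """Percentile of each ranked eigenvalue for M uncorrelated Gaussian series of length N."""
    rng = np.random.default_rng(seed)
    spectra = np.empty((n_trials, min(M, N)))
    for i in range(n_trials):
        noise = rng.standard_normal((M, N))
        noise -= noise.mean(axis=1, keepdims=True)
        spectra[i] = normalized_eigenvalues(noise)
    return np.percentile(spectra, percentile, axis=0)

rng = np.random.default_rng(4)
M, N = 10, 200
t = np.arange(N)
X = rng.standard_normal((M, N))                                    # noise background
X += 2.0 * np.outer(rng.standard_normal(M), np.sin(2 * np.pi * t / 40))  # one real mode
X -= X.mean(axis=1, keepdims=True)

frac = normalized_eigenvalues(X)
threshold = rule_n_threshold(M, N)
significant = frac > threshold        # modes lying outside the noise distribution

print(np.round(frac, 3))
print(np.round(threshold, 3))
print(significant)
```

The surrogate series here are serially uncorrelated; accounting for serial correlation, as Rule N’ does, would require surrogates with matching autocorrelation.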

SELECTION RULES

Asymptotic results of Preisendorfer Rule N for large sample size (N, M > 100 or so)

=N/M

MATLAB EXAMPLE:

NORTH ATLANTIC SEA LEVEL PRESSURE DATA

1899-1999
