PRINCIPAL COMPONENT ANALYSIS (PCA)
EOFs and Principal Components; Selection Rules
LECTURE 8
Supplementary Readings:
Wilks, chapter 9
WE’LL START OUT WITH AN EXAMPLE: 20th CENTURY GLOBAL SURFACE TEMPERATURE RECORD
Climatic Research Unit (‘CRU’), University of East Anglia
Surface Temperature Changes
EOFs for the five leading eigenvectors of the global temperature data from 1902-1980.
The gridpoint areal weighting factor used in the PCA procedure has been removed from the EOFs so that relative temperature anomalies can be inferred from the patterns.
EOF #1: 12% (88%)
EOF #2: 6% (3%)
EOF #3: 5% (1%)
EOF #4: 4% (1%)
EOF #5: 3% (0.5%)
SURFACE TEMPERATURE RECORD FILTERED BY RETAINING THE PROJECTION ONTO THE FIRST FIVE EIGENVECTORS
FILTERING THROUGH PCA
GLOBAL TEMPERATURE TREND
EOF #1, PC #1
EL NINO/SOUTHERN OSCILLATION (ENSO)
EOF #2, PC #2, Multivariate ENSO Index (“MEI”)
NORTH ATLANTIC OSCILLATION
EOF #3, PC #3
TROPICAL ATLANTIC “DIPOLE”
EOF #3, PC #3
ATLANTIC MULTIDECADAL OSCILLATION
EOF #5, PC #5
PCA as an SVD on the Data Matrix X
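Recall from our earlier lecture the variance-covariance matrix A in the multivariate regression problem:
$$A\,\mathbf{b} = \mathbf{c}$$
$$A = \begin{pmatrix}
\sum_i (\hat{x}_1^{(i)})^2 & \sum_i \hat{x}_1^{(i)} \hat{x}_2^{(i)} & \sum_i \hat{x}_1^{(i)} \hat{x}_3^{(i)} & \cdots & \sum_i \hat{x}_1^{(i)} \hat{x}_M^{(i)} \\
\sum_i \hat{x}_2^{(i)} \hat{x}_1^{(i)} & \sum_i (\hat{x}_2^{(i)})^2 & \sum_i \hat{x}_2^{(i)} \hat{x}_3^{(i)} & \cdots & \sum_i \hat{x}_2^{(i)} \hat{x}_M^{(i)} \\
\sum_i \hat{x}_3^{(i)} \hat{x}_1^{(i)} & \sum_i \hat{x}_3^{(i)} \hat{x}_2^{(i)} & \sum_i (\hat{x}_3^{(i)})^2 & \cdots & \sum_i \hat{x}_3^{(i)} \hat{x}_M^{(i)} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\sum_i \hat{x}_M^{(i)} \hat{x}_1^{(i)} & \sum_i \hat{x}_M^{(i)} \hat{x}_2^{(i)} & \sum_i \hat{x}_M^{(i)} \hat{x}_3^{(i)} & \cdots & \sum_i (\hat{x}_M^{(i)})^2
\end{pmatrix}$$
Here $\hat{x}_j^{(i)}$ denotes sample $i$ of variable $j$.
The eigenvectors of A comprise an orthogonal predictor set (Principal Components Regression)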
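Let us return to the data matrix (assume it has zero mean):
$$X = \begin{pmatrix}
\hat{x}_1^{(1)} & \hat{x}_1^{(2)} & \cdots & \hat{x}_1^{(N)} \\
\hat{x}_2^{(1)} & \hat{x}_2^{(2)} & \cdots & \hat{x}_2^{(N)} \\
\hat{x}_3^{(1)} & \hat{x}_3^{(2)} & \cdots & \hat{x}_3^{(N)} \\
\vdots & \vdots & \ddots & \vdots \\
\hat{x}_M^{(1)} & \hat{x}_M^{(2)} & \cdots & \hat{x}_M^{(N)}
\end{pmatrix}$$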
Singular Value Decomposition (SVD)
Assume M>N (overdetermined; greater number of “equations” than “unknowns”)
We can write
$$X = U S V^T = \sum_{k=1}^{N} s_k \mathbf{u}_k \mathbf{v}_k^T$$
where U, V are unitary matrices (orthogonal matrices if X is real-valued), U is MxN, S is diagonal NxN with diagonal elements $s_k$, and V is NxN
Typically, we are interested in the case N>M.
A revised overdetermined problem can be obtained by redefining the problem:
$$X^T = U S V^T = \sum_{k=1}^{M} s_k \mathbf{u}_k \mathbf{v}_k^T$$
where U, V are unitary matrices (orthogonal matrices if X is real-valued), U is NxM, S is diagonal MxM with diagonal elements $s_k$, and V is MxM
We can then write
$$X = (U S V^T)^T = V S^T U^T = V S U^T = \sum_{k=1}^{M} s_k \mathbf{v}_k \mathbf{u}_k^T$$
The matrix A is the product of the data matrix X with its transpose, $A = X X^T$, with elements
$$A_{jk} = (X X^T)_{jk} = \sum_{i=1}^{N} \hat{x}_j^{(i)} \hat{x}_k^{(i)}$$
Substituting $X = V S U^T$:
$$X X^T = V S U^T U S V^T = V S\, I_{(M \times M)}\, S V^T = V S^2 V^T = V S^2 V^{-1}$$
V is a unitary matrix which diagonalizes $X X^T$!
There is a mathematical equivalence between taking the Singular Value Decomposition (SVD) of X and finding the eigenvectors of $A = X X^T$
Thus, $S^2$ contains the eigenvalues of $X X^T$
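The equivalence can be checked numerically. The following is a minimal NumPy sketch, not part of the original lecture: the random test matrix, the sizes M and N, and all variable names are illustrative assumptions. It takes the SVD of the transposed data matrix (the N>M case) and compares the squared singular values with the eigenvalues of X Xᵀ.

```python
import numpy as np

# Illustrative data only: M variables (rows) and N samples (columns), N > M.
rng = np.random.default_rng(0)
M, N = 4, 50
X = rng.standard_normal((M, N))
X -= X.mean(axis=1, keepdims=True)   # give each row zero mean

# Thin SVD of X^T (N x M): X^T = U S V^T, so U is N x M and V is M x M.
U, s, Vt = np.linalg.svd(X.T, full_matrices=False)
V = Vt.T

# Eigen-decomposition of the symmetric M x M matrix A = X X^T.
A = X @ X.T
eigvals = np.linalg.eigh(A)[0][::-1]  # eigh returns ascending order; reverse it

# S^2 should hold the eigenvalues of X X^T, and V should diagonalize it.
squares_match = np.allclose(s**2, eigvals)
diagonalized = np.allclose(V.T @ A @ V, np.diag(s**2))
print(squares_match, diagonalized)
```

Working from the SVD avoids forming X Xᵀ explicitly, which is one common reason to prefer it when M is large.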
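U contains as its columns the temporal patterns or Principal Components (“PC”s) corresponding to the M eigenvalues, which are the “right eigenvectors” of the SVD:
$$X = V S U^T = \sum_{k=1}^{M} s_k \mathbf{v}_k \mathbf{u}_k^T$$
$$X U = V S, \quad \text{i.e.} \quad X \mathbf{u}_k = s_k \mathbf{v}_k, \quad k = 1, \ldots, M$$
where $s_k$ is the k-th diagonal element of S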
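V contains as its columns the spatial patterns or Empirical Orthogonal Functions (“EOF”s) corresponding to the M eigenvalues, which are the “left eigenvectors” of the SVD:
$$V^T X = S U^T, \quad \text{i.e.} \quad \mathbf{v}_k^T X = s_k \mathbf{u}_k^T, \quad k = 1, \ldots, M$$
where $s_k$ is the k-th diagonal element of S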
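To make the roles of U and V concrete, here is a small NumPy sketch under stated assumptions: the data are random placeholders, and the names `eofs`, `pcs` and `pcs_from_projection` are hypothetical labels chosen for this example. It extracts EOFs and PCs from the SVD of the transposed data matrix and checks that projecting the data onto the EOFs returns the PCs scaled by the singular values.

```python
import numpy as np

# Illustrative data only: M grid points (rows), N time steps (columns).
rng = np.random.default_rng(1)
M, N = 3, 40
X = rng.standard_normal((M, N))
X -= X.mean(axis=1, keepdims=True)   # zero-mean rows

# X^T = U S V^T, hence X = V S U^T.
U, s, Vt = np.linalg.svd(X.T, full_matrices=False)
V = Vt.T

eofs = V   # column k is the k-th spatial pattern (length M)
pcs = U    # column k is the k-th time series (length N)

# Check X U = V S (V * s scales column k of V by s[k]).
ok_right = np.allclose(X @ pcs, eofs * s)

# Check V^T X = S U^T: projecting the data onto the EOFs gives scaled PCs.
projection = eofs.T @ X                     # M x N
ok_left = np.allclose(projection, s[:, None] * pcs.T)

# Dividing each projection by its singular value recovers the unit-norm PCs.
pcs_from_projection = (projection / s[:, None]).T
print(ok_right, ok_left, np.allclose(pcs_from_projection, pcs))
```

The sign of each EOF/PC pair is arbitrary: flipping both leaves the product unchanged, so patterns from different software may differ by a sign.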
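FILTERING WITH EIGENVECTORS
$$X = V S U^T = \sum_{k=1}^{M} s_k \mathbf{v}_k \mathbf{u}_k^T$$
We can filter the original data with a subset of M* eigenvectors:
$$X^{(M^*)} = \sum_{k=1}^{M^*} s_k \mathbf{v}_k \mathbf{u}_k^T$$
where $s_k$ is the k-th diagonal element of S, $\mathbf{v}_k$ is the k-th EOF and $\mathbf{u}_k$ is the k-th PC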
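The truncated sum can be written in a few lines of NumPy. This is a sketch under assumptions rather than the lecture's own code: the function name `pca_filter`, the argument `n_keep` (standing in for M*), and the random test data are all hypothetical.

```python
import numpy as np

def pca_filter(X, n_keep):
    """Reconstruct X (M x N, zero-mean rows) from its n_keep leading modes."""
    # X^T = U S V^T, so X = V S U^T = sum_k s_k v_k u_k^T.
    U, s, Vt = np.linalg.svd(X.T, full_matrices=False)
    V = Vt.T
    # Keep only the first n_keep terms of the sum.
    return (V[:, :n_keep] * s[:n_keep]) @ U[:, :n_keep].T

# Illustrative data only.
rng = np.random.default_rng(2)
M, N = 5, 60
X = rng.standard_normal((M, N))
X -= X.mean(axis=1, keepdims=True)

X_full = pca_filter(X, M)   # all M modes: recovers X up to round-off
X_2 = pca_filter(X, 2)      # two leading modes: a rank-2 filtered field

# Fraction of the total variance retained by the filtered field.
retained = np.sum(X_2**2) / np.sum(X**2)
print(np.allclose(X_full, X), np.linalg.matrix_rank(X_2), retained)
```

The retained fraction equals the sum of the kept squared singular values divided by the sum of all of them, which is how percentages of variance per mode are usually quoted.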
Some Additional Considerations:
•Standardization & Areal Weighting
•Gappy Data
•Frequency domain
•“Rotation”
•Selection Rules
How many eigenvectors do we consider significant?
•Eigenvalue > 1/M (with eigenvalues expressed as fractions of the total variance, 1/M is the average eigenvalue)
•Break in slope in eigenvalue spectrum (“Scree” test) or log eigenvalue (“LEV”) spectrum
•Eigenvalue lies outside expected distribution for M uncorrelated Gaussian time series of length N (Preisendorfer Rule N). This is an example of a Monte Carlo method
•Rule N’ (take into account serial correlation)
There is no uniquely defensible criterion...
SELECTION RULES
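The Monte Carlo idea behind Rule N can be sketched as follows. This is one possible reading, not a definitive implementation: the 95th-percentile threshold, the number of trials, the normalization of eigenvalues to fractions of total variance, the stop-at-first-failure convention, and the names `rule_n` and `normalized_eigenvalues` are all assumptions made for illustration. Serial correlation (Rule N’) is not handled.

```python
import numpy as np

def normalized_eigenvalues(X):
    """Eigenvalues of X X^T for row-centered X, as fractions of total variance."""
    X = X - X.mean(axis=1, keepdims=True)
    s = np.linalg.svd(X, compute_uv=False)   # singular values, descending
    lam = s**2
    return lam / lam.sum()

def rule_n(X, n_trials=200, percentile=95.0, seed=0):
    """Count leading eigenvalues that exceed a Gaussian white-noise null."""
    rng = np.random.default_rng(seed)
    M, N = X.shape
    null = np.empty((n_trials, min(M, N)))
    for t in range(n_trials):
        # M uncorrelated Gaussian time series of length N.
        null[t] = normalized_eigenvalues(rng.standard_normal((M, N)))
    threshold = np.percentile(null, percentile, axis=0)  # one value per rank
    lam = normalized_eigenvalues(X)
    passes = lam > threshold
    # Number of leading eigenvalues above threshold, stopping at the first failure.
    n_sig = len(passes) if passes.all() else int(np.argmin(passes))
    return n_sig, lam, threshold

# Illustrative data only: one strong spatial pattern buried in white noise.
rng = np.random.default_rng(3)
M, N = 10, 200
pattern = np.linspace(-1.0, 1.0, M)
X = np.outer(pattern, 3.0 * rng.standard_normal(N)) + rng.standard_normal((M, N))

n_sig, lam, threshold = rule_n(X)
print(n_sig, lam[:3], threshold[:3])
```

In this synthetic case the leading eigenvalue should stand well clear of the noise threshold while the rest fall below it; with real, serially correlated data the effective sample size is smaller than N and the thresholds would need to be adjusted.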
Preisendorfer Rule N
SELECTION RULES
Asymptotic results of Preisendorfer Rule N for large sample size
(N,M>100 or so)
=N/M
SELECTION RULES
MATLAB EXAMPLE:
NORTH ATLANTIC SEA LEVEL PRESSURE DATA
1899-1999