Basic Forecast Methods
Statistical regression
[Figure: plot of Apr-Jul streamflow (% avg) against May 1 snowpack (% avg), S Fork Rio Grande, Colo]
Simulation modeling
[Figure: schematic with Snow, Rainfall, Heat, Snowpack, Soil water, Runoff]
Credit: Tom Pagano
The General Linear Regression Model
$$Y = b_0 + \sum_{i=1}^{n} b_i X_i$$
where:
Y = dependent variable
Xi = independent variables
bi = regression coefficients
n = number of independent variables
Credit: Dave Garen
The Problem
If X’s are intercorrelated, they contain redundant information, and the b’s cannot be meaningfully estimated.
However, we don’t want to have to throw out most of the X’s but prefer to retain them for robustness.
$$Y = b_0 + \sum_{i=1}^{n} b_i X_i$$
Credit: Dave Garen
Example
Streamflow = b0 + b1 * (Snotel A) + b2 * (Snotel B)
-> Snotel sites are very well correlated
-> An optimal b1 and b2 will be difficult to determine since the correlation is so strong
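The instability of b1 and b2 can be illustrated with synthetic data. The sketch below is not from the slides: the two "Snotel" series, the noise levels, and the bootstrap resampling are all assumptions chosen only to show the effect. It refits the two-predictor regression on resampled years and compares how much b1 varies with how much the sum b1 + b2 varies.

```python
import numpy as np

rng = np.random.default_rng(0)
n_years = 30

# Hypothetical data: two snow sites that share one underlying signal.
signal = rng.normal(size=n_years)
snotel_a = signal + 0.1 * rng.normal(size=n_years)
snotel_b = signal + 0.1 * rng.normal(size=n_years)
streamflow = 2.0 * signal + 0.5 * rng.normal(size=n_years)

def fit_ols(x1, x2, y):
    """Least-squares fit of y = b0 + b1*x1 + b2*x2; returns [b0, b1, b2]."""
    design = np.column_stack([np.ones_like(x1), x1, x2])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

# Refit on bootstrap resamples of the years to see how stable the b's are.
coefs = []
for _ in range(500):
    idx = rng.integers(0, n_years, size=n_years)
    coefs.append(fit_ols(snotel_a[idx], snotel_b[idx], streamflow[idx]))
coefs = np.array(coefs)

r_ab = np.corrcoef(snotel_a, snotel_b)[0, 1]
spread_b1 = coefs[:, 1].std()                    # b1 alone: poorly determined
spread_sum = (coefs[:, 1] + coefs[:, 2]).std()   # b1 + b2: well determined

print(f"correlation between sites: {r_ab:.3f}")
print(f"spread of b1:      {spread_b1:.3f}")
print(f"spread of b1 + b2: {spread_sum:.3f}")
```

In this setup the combined effect of the two sites is well constrained, but how it is split between b1 and b2 is not, which is the redundancy problem described above.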
The Solution
Possibilities:
1) Pre-combine X’s into composite index(es), e.g., Z-score method
2) Principal components regression
These are similar in concept but differ in the mathematics.
$$Y = b_0 + \sum_{i=1}^{n} b_i X_i$$
Credit: Dave Garen
Principal Components Analysis
Principal components regression is just like standard regression except the independent variables are principal components rather than the original X variables.
Principal components are linear combinations of the X’s.
Credit: Dave Garen
Principal Components Analysis
Each principal component is a weighted sum of all the X’s:
$$PC_1 = \sum_{j=1}^{n} e_{1j} X_j$$
$$PC_2 = \sum_{j=1}^{n} e_{2j} X_j$$
$$\vdots$$
$$PC_n = \sum_{j=1}^{n} e_{nj} X_j$$
Credit: Dave Garen
Principal Components Analysis
The e’s are called eigenvectors, derived from a matrix equation whose input is the correlation matrix of all the X’s with each other.
Principal components are new variables that are not correlated with each other.
The principal components transformation is equivalent to a rotation of axes.
Credit: Dave Garen
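A minimal sketch of these properties, using NumPy on made-up data (the two predictor series and their scales are assumptions, not values from the slides). Because the eigenvectors come from the correlation matrix, the sketch applies them to standardized X's. It checks that the resulting components are uncorrelated and that the eigenvector matrix is orthogonal, which is what makes the transformation a rotation of axes.

```python
import numpy as np

rng = np.random.default_rng(0)
n_years = 40

# Hypothetical pair of intercorrelated predictors.
common = rng.normal(size=n_years)
x1 = 30 + 10 * (common + 0.5 * rng.normal(size=n_years))
x2 = 10 + 3 * (common + 0.5 * rng.normal(size=n_years))
X = np.column_stack([x1, x2])

# Standardize, then eigen-decompose the correlation matrix of the X's.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
corr = np.corrcoef(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(corr)      # eigh returns ascending eigenvalues
order = np.argsort(eigvals)[::-1]            # reorder so PC1 explains the most
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Column k of eigvecs holds the weights (e_k1, e_k2, ...) of component k+1.
pcs = Z @ eigvecs

pc_corr = np.corrcoef(pcs, rowvar=False)     # off-diagonal should be ~0
pct_var = 100 * eigvals / eigvals.sum()      # percent of variance per component
is_rotation = np.allclose(eigvecs.T @ eigvecs, np.eye(2))

print("correlation of X1 and X2:  ", round(corr[0, 1], 3))
print("correlation of PC1 and PC2:", round(pc_corr[0, 1], 3))
print("percent variance:", np.round(pct_var, 1))
```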
Principal Components Analysis
[Figure: scatterplot of X1 and X2; R^2 = 0.698, R = 0.836]
PC1 = e11 X1 + e12 X2
PC2 = e21 X1 + e22 X2
Credit: Dave Garen
Principal Components Analysis
The eigenvectors (weights) are based solely on the intercorrelations among the X’s and have no knowledge of Y (in contrast to Z-score, for which the opposite is true).
Principal components can be used for purely descriptive purposes, but we want to use them as independent variables in a regression.
Credit: Dave Garen
Principal Components Analysis -- Example
Independent Variables:
X1 – X5 Snow water equivalent at 5 stations
X6 – X10 Water year to date precipitation at 5 stations
X11 Antecedent streamflow
X12 Climate teleconnection index
Credit: Dave Garen
Correlation Matrix
     X1   X2   X3   X4   X5   X6   X7   X8   X9   X10  X11  X12  Y
X1   1.0  .72  .67  .76  .81  .54  .31  .54  .38  .50  .18  .64  .65
X2        1.0  .67  .45  .80  .62  .45  .47  .31  .49  .14  .39  .60
X3             1.0  .49  .72  .84  .76  .86  .68  .85  .48  .56  .80
X4                  1.0  .62  .42  .26  .36  .56  .38  .28  .59  .68
X5                       1.0  .62  .49  .51  .44  .62  .32  .59  .73
X6                            1.0  .93  .87  .83  .90  .63  .43  .85
X7                                 1.0  .82  .85  .90  .67  .32  .76
X8                                      1.0  .74  .84  .64  .39  .70
X9                                           1.0  .80  .70  .49  .84
X10                                               1.0  .64  .46  .79
X11                                                    1.0  .36  .51
X12                                                         1.0  .64
Credit: Dave Garen
First Five Eigenvectors
           PC1     PC2     PC3     PC4     PC5
X1       0.265   0.444   0.004   0.074  -0.104
X2       0.249   0.325  -0.483  -0.030   0.315
X3       0.335   0.016  -0.178   0.149  -0.314
X4       0.229   0.353   0.456  -0.595  -0.009
X5       0.287   0.332  -0.148   0.120   0.412
X6       0.339  -0.168  -0.162  -0.106  -0.040
X7       0.308  -0.329  -0.150  -0.058  -0.015
X8       0.317  -0.197  -0.114   0.027  -0.261
X9       0.304  -0.240   0.299  -0.313  -0.103
X10      0.330  -0.197  -0.197   0.072  -0.129
X11      0.235  -0.349   0.351   0.168   0.692
X12      0.232   0.262   0.473   0.675  -0.212
% var.    62.7    15.8     7.8     3.8     3.2
Credit: Dave Garen
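The "% var." row can be read cumulatively to see how much of the predictor variance the leading components capture. The short sketch below only does arithmetic on the five percentages in the table. The conversion to eigenvalues assumes, as stated earlier, that the decomposition was done on the 12-variable correlation matrix, whose eigenvalues sum to 12.

```python
from itertools import accumulate

pct_var = [62.7, 15.8, 7.8, 3.8, 3.2]   # "% var." row for PC1..PC5
n_variables = 12                        # X1..X12

# Running total of variance explained as components are added in order.
cumulative = [round(c, 1) for c in accumulate(pct_var)]

# Implied eigenvalues, assuming a correlation-matrix PCA (eigenvalues sum to 12).
eigenvalues = [round(p / 100 * n_variables, 2) for p in pct_var]

for k, (c, ev) in enumerate(zip(cumulative, eigenvalues), start=1):
    print(f"PC1..PC{k}: {c}% of variance (eigenvalue of PC{k} ~ {ev})")
```

The first five components together account for 93.3% of the variance in the 12 predictors.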
Principal Components Regression Procedure
• Try the PC’s in order
• Test for regression coefficient significance (t-test)
• Stop at first insignificant component
• Transform regression coefficients to be in terms of original variables
• Sign test – coefficient signs must be same as correlation with Y
Credit: Dave Garen
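The procedure above can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the code used in any operational tool: the function name `pcr_fit`, the fixed threshold `t_crit = 2.0` (a rough stand-in for a 5% two-sided t-test), and the synthetic data are all hypothetical.

```python
import numpy as np

def pcr_fit(X, y, t_crit=2.0):
    """Principal components regression with an ordered t-test and a sign test.

    Returns (b0, b, kept, signs_ok), with b0 and b in terms of the original X's.
    """
    n_obs, n_var = X.shape
    mean, std = X.mean(axis=0), X.std(axis=0, ddof=1)
    Z = (X - mean) / std

    # Eigenvectors of the correlation matrix, ordered by variance explained.
    eigvals, eigvecs = np.linalg.eigh(np.corrcoef(X, rowvar=False))
    eigvecs = eigvecs[:, np.argsort(eigvals)[::-1]]
    pcs = Z @ eigvecs

    kept, gamma, intercept = 0, np.zeros(0), y.mean()
    max_k = min(n_var, n_obs - 2)            # keep residual degrees of freedom > 0
    for k in range(1, max_k + 1):            # try the PC's in order
        A = np.column_stack([np.ones(n_obs), pcs[:, :k]])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        sigma2 = resid @ resid / (n_obs - (k + 1))
        cov = sigma2 * np.linalg.inv(A.T @ A)
        t_newest = coef[k] / np.sqrt(cov[k, k])
        if abs(t_newest) < t_crit:           # stop at first insignificant component
            break
        kept, gamma, intercept = k, coef[1:], coef[0]

    # Transform PC coefficients back to coefficients on the original variables.
    b = (eigvecs[:, :kept] @ gamma) / std
    b0 = intercept - b @ mean

    # Sign test: each coefficient must have the sign of that X's correlation with Y.
    r_xy = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(n_var)])
    signs_ok = bool(np.all(np.sign(b) == np.sign(r_xy)))
    return b0, b, kept, signs_ok

# Hypothetical data: four intercorrelated predictors driven by one signal.
rng = np.random.default_rng(1)
signal = rng.normal(size=40)
X = np.column_stack([signal + 0.3 * rng.normal(size=40) for _ in range(4)])
y = 3.0 * signal + 0.5 * rng.normal(size=40)

b0, b, kept, signs_ok = pcr_fit(X, y)
pred = b0 + X @ b
print("components kept:", kept, "| sign test passed:", signs_ok)
print("coefficients on original X's:", np.round(b, 3))
```

Because the components are uncorrelated, adding a later component does not change the coefficients of the earlier ones, which is why they can be tested one at a time in order.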
Summary
• Principal components analysis is a standard multivariate statistical procedure
• Can be used for descriptive purposes to reduce the dimensionality of correlated variables
• Can be taken a step further to provide new, non-correlated independent variables for regression
• PC’s taken in order, subject to t-test and sign test
• Final model is expressed in terms of original X variables
Credit: Dave Garen
Soil Moisture at the interannual timescale
• Another example demonstrating importance of land surface processes in the climate system: Werner, 1999:
– GCM run with and without active land surface model in South America to explore the importance of land surface processes in the climate system variability in the Nordeste region.
– Both simulations include full atmospheric model, slab ocean model (no ocean dynamics), and dynamic land surface model everywhere except tropical South America in the Data Land simulation.
• Modeled variability
– Full dynamic land surface model simulation contains variability resembling observed variability with connection between NH and SH SSTs.
– Fixed land surface model shows no connected variability between NH and SH SSTs
Resources
• Dave Garen VIPER slides
• Dennis Hartmann lecture notes (http://www.atmos.washington.edu/~dennis/)
What does z-score regression do?
1. Combines predictors into weighted indices, emphasizing good stations, minimizing bad ones.
2. Compensates for missing data with remaining data.
3. Regresses index against target predictand.
Credit: Tom Pagano
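The three steps can be sketched as below. The slides do not give the weighting formula, so this sketch assumes each station is weighted by its squared correlation with the predictand; that choice, the function name `zscore_index`, and the synthetic station data are illustrative assumptions only.

```python
import numpy as np

def zscore_index(X, y):
    """Weighted composite of station z-scores; NaN marks missing station data."""
    mean = np.nanmean(X, axis=0)
    std = np.nanstd(X, axis=0, ddof=1)
    Z = (X - mean) / std

    # Assumed weighting: squared correlation of each station with the predictand,
    # computed over the years in which that station reported.
    weights = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        ok = ~np.isnan(X[:, j])
        weights[j] = np.corrcoef(X[ok, j], y[ok])[0, 1] ** 2

    # Weighted average over whichever stations are available in each year.
    avail = ~np.isnan(Z)
    numer = np.where(avail, Z * weights, 0.0).sum(axis=1)
    denom = (avail * weights).sum(axis=1)
    return numer / denom, weights

# Hypothetical data: one informative station, one noisy station, some gaps.
rng = np.random.default_rng(2)
signal = rng.normal(size=30)
good = 50 + 10 * (signal + 0.2 * rng.normal(size=30))
bad = 20 + 5 * (signal + 2.0 * rng.normal(size=30))
X = np.column_stack([good, bad])
X[3, 0] = np.nan      # good station missing in one year
X[10, 1] = np.nan     # bad station missing in another
y = 100 + 20 * signal + 5 * rng.normal(size=30)

index, weights = zscore_index(X, y)
slope, intercept = np.polyfit(index, y, 1)   # regress predictand on the index
forecast = intercept + slope * index
print("station weights:", np.round(weights, 2))
```

In the year where the better station is missing, the index falls back on the remaining station alone, so a forecast is still produced.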
What is a z-score?
A z-score is a “normalized anomaly”:
Z = (value - average) / standard deviation
[Figure: example with avg and stdev labels; values shown: 60, 135, 30, 15]
Z = (90 – 60)/15 = +2
Credit: Tom Pagano
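The same calculation in a few lines of Python. The first call reproduces the numbers from the slide. The five-year snowpack record is a made-up series, included only to show that z-scores computed from a record have zero mean and unit standard deviation.

```python
from statistics import mean, stdev

def z_score(value, average, standard_deviation):
    """Normalized anomaly: distance from the average in standard deviations."""
    return (value - average) / standard_deviation

# Numbers from the slide: value 90, average 60, standard deviation 15.
z_slide = z_score(90, 60, 15)
print(z_slide)  # 2.0

# Hypothetical five-year record, standardized against its own statistics.
snowpack = [45, 60, 75, 50, 70]
avg, sd = mean(snowpack), stdev(snowpack)
z_values = [z_score(v, avg, sd) for v in snowpack]
print([round(z, 2) for z in z_values])
```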
How good are the results?
Under conditions of serially complete data and relatively “normal” conditions, PCA and Z-Score are effectively indistinguishable*
Skill and behavior are similar to the official published outlooks**
However… Any tool is a weapon if you hold it right. (aka “A fool with a tool is still a tool”)
*Viper technical note - 1 basin
**Pagano dissertation – 29 basins
Credit: Tom Pagano
The Viper Main Interface
Layout and interpretation
[Figure: annotated screenshot of the Viper main interface, with these callouts:]
• Selecting predictors and predictands
• Predictors: quality, availability
• Probability bounds
• Forecast vs observed time series
• Station availability, weights
• Fcst vs obs scatterplot
• Helper variable
• Scatterplot/Forecast progression
• Settings
• Global month changes
• Historical statistics
Credit: Tom Pagano
The Viper Main Interface
Layout and interpretation
[Figure: the same annotated screenshot, scrolled right]
There’s more if you scroll right: relate any variable to another
Credit: Tom Pagano