Jery R. Stedinger, Cornell University
Research with G. Tasker, E. Martins, D. Reis, A. Gruber, V. Griffis, D.I. Jeong and Y.O. Kim
SAMSI Workshop 23 January 2008
Regionalization of Statistics Describing the Distribution of Hydrologic Extremes
Extreme Value Theory & Hydrology
The annual maximum flood may be the maximum daily flow or the instantaneous maximum flow.
Annual maximum 24-hour rainfall may be the maximum daily value or the maximum of 1440-minute values.
Annual maximums are not the maximum of an i.i.d. series:
• Years have definite “wet” and “dry” seasons.
• Daily values are correlated.
• Because of El Niño and atmospheric patterns, some years are extreme-event prone and others are not.
Peaks over threshold (partial duration series, PDS) are another alternative.
Outline
• Summarizing Data: Moments and L-moments
• Parameter estimation for GEV: use of a prior on κ; PDS versus AMS with GMLEs
• Bayesian GLS Regression for regionalization
• Concluding observations
Definitions: Product-Moments
Mean, measure of location: $\mu_x = E[X]$
Variance, measure of spread: $\sigma_x^2 = E[(X - \mu_x)^2]$
Coefficient of skewness, measure of asymmetry: $\gamma_x = E[(X - \mu_x)^3] / \sigma_x^3$
Conventional Moment Ratios
Conventional descriptions of shape are:
Coefficient of variation: $CV_x = \sigma_x / \mu_x$
Coefficient of skewness: $\gamma_x = E[(X - \mu_x)^3] / \sigma_x^3$
Coefficient of kurtosis: $E[(X - \mu_x)^4] / \sigma_x^4$
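The small-sample bias of these product-moment ratios can be reproduced with a short Monte Carlo sketch. It assumes simple divide-by-n moment estimators (the estimator behind the following figure is not stated) and draws samples of n = 10 from a standard Gumbel distribution, whose true skew is about 1.14 and true kurtosis is 5.4. Function names are illustrative.

```python
import math
import random

def product_moment_ratios(x):
    """Sample CV, skew and kurtosis from simple (divide-by-n) moment estimators."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return math.sqrt(m2) / mean, m3 / m2 ** 1.5, m4 / m2 ** 2

def gumbel_sample(n, rng):
    # If E ~ Exponential(1), then -ln(E) has the Gumbel CDF exp(-exp(-x))
    return [-math.log(rng.expovariate(1.0)) for _ in range(n)]

rng = random.Random(2008)
reps, n = 5000, 10
skews, kurts = [], []
for _ in range(reps):
    _, g, k = product_moment_ratios(gumbel_sample(n, rng))
    skews.append(g)
    kurts.append(k)

avg_skew = sum(skews) / reps
avg_kurt = sum(kurts) / reps
print(f"Gumbel skew 1.14, average estimate {avg_skew:.2f}")
print(f"Gumbel kurtosis 5.4, average estimate {avg_kurt:.2f}")
```

With n = 10 both averages should fall noticeably below the true values.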
Product-Moment Skew-Kurtosis Estimators, n = 10
[Figure: sample kurtosis versus sample skew, showing the true values and the averages of the estimates.]
Samples drawn from a Gumbel distribution.
L-Moments
An alternative to product moments, now widely used in hydrology.
L-Moments: an alternative
• L-moments summarize data as conventional moments do, but use linear combinations of the ordered observations.
• Because L-moments avoid squaring and cubing the data, their ratios do not suffer from the severe bias problems encountered with product moments.
• L-moments are estimated using order statistics.
Let X(i|n) be the ith smallest observation in a sample of size n, so that X(n|n) is the largest.
Measure of scale: one-half the expected difference between the largest and smallest observations in a sample of 2:
$\lambda_2 = \tfrac{1}{2} E[X_{(2|2)} - X_{(1|2)}]$
Measure of asymmetry:
$\lambda_3 = \tfrac{1}{3} E[X_{(3|3)} - 2X_{(2|3)} + X_{(1|3)}]$
where $\lambda_3 > 0$ for positively skewed distributions.
Measure of kurtosis:
$\lambda_4 = \tfrac{1}{4} E[X_{(4|4)} - 3X_{(3|4)} + 3X_{(2|4)} - X_{(1|4)}]$
For highly kurtotic distributions, λ4 is large. For the uniform distribution, λ4 = 0.
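Dimensionless L-moment ratios
L-moment coefficient of variation (L-CV): $\tau_2 = \lambda_2 / \mu$
L-moment coefficient of skewness (L-skewness): $\tau_3 = \lambda_3 / \lambda_2$
L-moment coefficient of kurtosis (L-kurtosis): $\tau_4 = \lambda_4 / \lambda_2$
(Note: Hosking denotes L-CV by τ instead of τ2.)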
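In practice sample L-moments are computed from unbiased probability-weighted moments b_r rather than by enumerating sub-samples, using λ1 = b0, λ2 = 2b1 − b0, λ3 = 6b2 − 6b1 + b0 and λ4 = 20b3 − 30b2 + 12b1 − b0 (Hosking, 1990). A minimal pure-Python sketch; the function names are illustrative:

```python
from math import comb

def sample_lmoments(data):
    """Unbiased sample L-moments (l1, l2, l3, l4) computed from probability-weighted moments b_r."""
    x = sorted(float(v) for v in data)   # ascending: x[0] is X(1|n), the smallest value
    n = len(x)
    if n < 4:
        raise ValueError("need at least 4 observations")
    # b_r = (1/n) * sum over ranks i of [C(i-1, r) / C(n-1, r)] * X(i|n); enumerate gives i-1 directly
    b = [sum(comb(j, r) / comb(n - 1, r) * xj for j, xj in enumerate(x)) / n for r in range(4)]
    l1 = b[0]
    l2 = 2 * b[1] - b[0]
    l3 = 6 * b[2] - 6 * b[1] + b[0]
    l4 = 20 * b[3] - 30 * b[2] + 12 * b[1] - b[0]
    return l1, l2, l3, l4

def lmoment_ratios(data):
    """L-CV (tau2), L-skewness (tau3) and L-kurtosis (tau4)."""
    l1, l2, l3, l4 = sample_lmoments(data)
    return l2 / l1, l3 / l2, l4 / l2
```

For the evenly spaced sample [4, 1, 3, 2], this gives l1 = 2.5 and l2 = 5/6 (half the mean absolute difference over all pairs), with l3 and l4 equal to zero up to rounding.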
L-Moment Skew-Kurtosis Estimators, n = 10
[Figure: sample L-kurtosis versus sample L-skewness, showing the true values and the averages of the estimates.]
Samples drawn from a Gumbel distribution.
Generalized Extreme Value (GEV) distribution
Combines Gumbel's Type I, II and III extreme value distributions:
$F(x) = \exp\{-[1 - (\kappa/\alpha)(x - \xi)]^{1/\kappa}\}$ for $\kappa \neq 0$
κ = shape; α = scale; ξ = location.
Mostly −0.3 < κ ≤ 0.
[Others use a shape parameter of the opposite sign, −κ.]
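A direct transcription of this CDF, plus the density obtained by differentiating it, as a Python sketch. Returning 0 or 1 outside the support and treating κ = 0 as the Gumbel limit are conventions added here, not taken from the slides. For κ < 0 the support is bounded below at ξ + α/κ and the upper tail is unbounded.

```python
import math

def gev_cdf(x, xi, alpha, kappa):
    """F(x) = exp{-[1 - (kappa/alpha)(x - xi)]^(1/kappa)}; kappa == 0 gives the Gumbel limit."""
    if kappa == 0.0:
        return math.exp(-math.exp(-(x - xi) / alpha))
    y = 1.0 - kappa * (x - xi) / alpha
    if y <= 0.0:
        # Outside the support: above the upper bound if kappa > 0, below the lower bound if kappa < 0
        return 1.0 if kappa > 0 else 0.0
    return math.exp(-y ** (1.0 / kappa))

def gev_pdf(x, xi, alpha, kappa):
    """f(x) = dF/dx = (1/alpha) * y^(1/kappa - 1) * F(x), with y = 1 - (kappa/alpha)(x - xi)."""
    if kappa == 0.0:
        z = (x - xi) / alpha
        return math.exp(-z - math.exp(-z)) / alpha
    y = 1.0 - kappa * (x - xi) / alpha
    if y <= 0.0:
        return 0.0
    return y ** (1.0 / kappa - 1.0) * math.exp(-y ** (1.0 / kappa)) / alpha
```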
GEV Probability Density Function
[Figure: f(x) versus x for 0 ≤ x ≤ 30.]
GEV Probability Density Function, large x
[Figure: upper tail of f(x) for 12 ≤ x ≤ 30.]
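Simple GEV L-Moment Estimators
Using L-moments (Hosking, Wallis & Wood, 1985):
$c = \dfrac{2}{3 + \tau_3} - \dfrac{\ln 2}{\ln 3}$, where $\tau_3 = \lambda_3 / \lambda_2$
then
$\kappa = 7.8590\,c + 2.9554\,c^2$ (for $|\tau_3| \le 0.5$)
$\alpha = \dfrac{\kappa\,\lambda_2}{\Gamma(1+\kappa)\,(1 - 2^{-\kappa})}$
$\xi = \lambda_1 + \alpha\,[\Gamma(1+\kappa) - 1] / \kappa$
Quantiles:
$x_p = \xi + (\alpha/\kappa)\{1 - [-\ln p]^{\kappa}\}$
The method of L-moments is simple and attractive.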
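A Python sketch of these estimators and the quantile formula. The Gumbel branch used when κ̂ is numerically zero is an added guard (not on the slide) that avoids dividing by κ; it uses λ2 = α ln 2 and λ1 = ξ + γα, with γ Euler's constant.

```python
import math

EULER_GAMMA = 0.5772156649015329

def gev_fit_lmoments(l1, l2, t3):
    """GEV (xi, alpha, kappa) from lambda1, lambda2 and tau3 via the Hosking, Wallis & Wood approximation."""
    if not -0.5 <= t3 <= 0.5:
        raise ValueError("the kappa approximation is intended for |tau3| <= 0.5")
    c = 2.0 / (3.0 + t3) - math.log(2.0) / math.log(3.0)
    kappa = 7.8590 * c + 2.9554 * c ** 2
    if abs(kappa) < 1e-9:                       # Gumbel limit (added guard)
        alpha = l2 / math.log(2.0)
        return l1 - EULER_GAMMA * alpha, alpha, 0.0
    g = math.gamma(1.0 + kappa)
    alpha = kappa * l2 / (g * (1.0 - 2.0 ** (-kappa)))
    xi = l1 + alpha * (g - 1.0) / kappa
    return xi, alpha, kappa

def gev_quantile(p, xi, alpha, kappa):
    """x_p = xi + (alpha/kappa) * {1 - [-ln p]^kappa}; Gumbel form when kappa == 0."""
    if kappa == 0.0:
        return xi - alpha * math.log(-math.log(p))
    return xi + (alpha / kappa) * (1.0 - (-math.log(p)) ** kappa)
```

Feeding in the exact L-moments of a GEV(ξ = 0, α = 1, κ = −0.1) returns κ̂ of about −0.1005, which indicates how accurate the polynomial approximation is in this range.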
Index Flood Methodology
Research has demonstrated potential advantages of index flood procedures for combining regional and at-site data to improve the estimators at individual sites.
Hosking and Wallis (1997): development of L-moments for regional flood frequency analysis, based on research done in the 1980–1995 period.
J.R.M. Hosking and J.R. Wallis, Regional Frequency Analysis: An Approach Based on L-moments, Cambridge University Press, 1997.
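Index Flood Method for Regionalization
[Figure: systematic records at sites 1, 2, 3, 4, …, K feed a dimensionless regional model.]
Each record is scaled by its mean: $y_t = x_t / \bar{x}$
Compute the regional average L-CV and L-CS, which yield the regional dimensionless quantile $y_p$.
At-site quantile: $x_p = \bar{x}\, y_p$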
Index Flood Methodology
• Use data from hydrologically "similar" basins to estimate a dimensionless flood distribution, which is then scaled by the at-site sample mean.
• "Substitutes space for time" by using regional information to compensate for relatively short records at each site.
• Most of these studies have used the GEV distribution with L-moments or equivalent estimators.
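A minimal sketch of the index flood steps for a GEV regional model. Assumptions: the regional L-CV and L-CS are record-length-weighted averages of the at-site values (a weighting in the spirit of Hosking and Wallis, not taken from the slides), the growth curve is fitted with the L-moment approximation above with λ1 = 1, and the site data are hypothetical.

```python
import math

def regional_average(values, record_lengths):
    """Record-length-weighted average of at-site L-moment ratios (an assumed weighting)."""
    return sum(v * n for v, n in zip(values, record_lengths)) / sum(record_lengths)

def growth_factor(p, tau2, tau3):
    """Dimensionless GEV quantile y_p for a distribution with mean 1, L-CV tau2 and L-CS tau3."""
    c = 2.0 / (3.0 + tau3) - math.log(2.0) / math.log(3.0)
    kappa = 7.8590 * c + 2.9554 * c ** 2
    if abs(kappa) < 1e-6:                      # Gumbel limit
        alpha = tau2 / math.log(2.0)           # lambda2 = tau2 because lambda1 = 1
        return 1.0 - 0.5772156649 * alpha - alpha * math.log(-math.log(p))
    g = math.gamma(1.0 + kappa)
    alpha = kappa * tau2 / (g * (1.0 - 2.0 ** (-kappa)))
    xi = 1.0 + alpha * (g - 1.0) / kappa
    return xi + (alpha / kappa) * (1.0 - (-math.log(p)) ** kappa)

def index_flood_quantile(site_mean, p, tau2_sites, tau3_sites, record_lengths):
    """x_p = (at-site mean) * y_p, with y_p from the regional-average L-CV and L-CS."""
    tau2_r = regional_average(tau2_sites, record_lengths)
    tau3_r = regional_average(tau3_sites, record_lengths)
    return site_mean * growth_factor(p, tau2_r, tau3_r)

# Hypothetical region of three gauged sites
tau2_sites = [0.28, 0.33, 0.30]
tau3_sites = [0.25, 0.35, 0.30]
lengths = [20, 30, 25]
print(index_flood_quantile(850.0, 0.99, tau2_sites, tau3_sites, lengths))
```

Because every site shares the regional growth factor, the estimated quantiles are proportional to the at-site means.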
Trouble with MLEs for GEV
CASE: N = 15, X ~ GEV(ξ = 0, α = 1, κ = −0.20)
MLE solution: ξ = −0.2, α = 0.53, κ = −2.48
$x_{0.999}$ = 14.9 (true) versus 6,000,000 (estimated)
[Figure: the sample and the fitted density f(x), −2 ≤ x ≤ 12.]
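The two values of x0.999 can be checked by substituting both parameter sets into the GEV quantile function x_p = ξ + (α/κ){1 − [−ln p]^κ}:

```python
import math

def gev_quantile(p, xi, alpha, kappa):
    # x_p = xi + (alpha/kappa) * {1 - [-ln p]^kappa}, valid for kappa != 0
    return xi + (alpha / kappa) * (1.0 - (-math.log(p)) ** kappa)

true_q = gev_quantile(0.999, xi=0.0, alpha=1.0, kappa=-0.20)
mle_q = gev_quantile(0.999, xi=-0.2, alpha=0.53, kappa=-2.48)
print(f"true x0.999 = {true_q:.1f}")   # 14.9
print(f"MLE  x0.999 = {mle_q:,.0f}")
```

The MLE value comes out near 5.9 million, which the slide rounds to 6,000,000: a shape estimate far outside the plausible range makes the extreme quantile explode.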
Parameter Estimators for 3-parameter GEV distribution
1. Maximum Likelihood (ML)
2. Method of Moments (MOM)
3. Method of L-moments (LM)
4. Generalized Maximum Likelihood (GML): introduces a prior distribution for κ that keeps the estimator within (−0.5, +0.5) and encourages values within (−0.3, +0.1).
Martins, E.S., and J.R. Stedinger, Generalized Maximum Likelihood GEV quantile estimators for hydrologic data, Water Resour. Res., 36(3), 737–744, 2000.
Alternatively, one can use a penalty to enforce the constraint κ > −1:
Coles, S.G., and M.J. Dixon, Likelihood-Based Inference for Extreme Value Models, Extremes, 2(1), 5–23, 1999.
Prior distribution on GEV κ
[Figure: prior density f(κ) for −0.5 ≤ κ ≤ 0.5.]
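The sketch below writes down the objective a GML estimator maximizes: the GEV log-likelihood plus the log of a prior on κ. The Beta-type prior on (−0.5, 0.5) with exponents p = 6 and q = 9 (mean −0.10, standard deviation about 0.12) is assumed here to follow Martins and Stedinger (2000) and should be checked against that paper. The maximization itself (a numerical optimizer over ξ, α and κ) is omitted.

```python
import math

P, Q = 6.0, 9.0                                  # assumed prior exponents (see lead-in)
LOG_BETA_PQ = math.lgamma(P) + math.lgamma(Q) - math.lgamma(P + Q)

def log_prior_kappa(kappa):
    """log of (0.5 + kappa)^(P-1) (0.5 - kappa)^(Q-1) / B(P, Q) on (-0.5, 0.5)."""
    if not -0.5 < kappa < 0.5:
        return -math.inf
    return (P - 1) * math.log(0.5 + kappa) + (Q - 1) * math.log(0.5 - kappa) - LOG_BETA_PQ

def gev_loglik(data, xi, alpha, kappa):
    """GEV log-likelihood in the (xi, alpha, kappa) parameterization."""
    if alpha <= 0.0:
        return -math.inf
    total = 0.0
    for x in data:
        if kappa == 0.0:                          # Gumbel limit
            z = (x - xi) / alpha
            total += -math.log(alpha) - z - math.exp(-z)
            continue
        y = 1.0 - kappa * (x - xi) / alpha
        if y <= 0.0:
            return -math.inf                      # observation outside the support
        total += -math.log(alpha) + (1.0 / kappa - 1.0) * math.log(y) - y ** (1.0 / kappa)
    return total

def generalized_loglik(data, xi, alpha, kappa):
    """GML objective: log-likelihood plus log prior on kappa (to be maximized over xi, alpha, kappa)."""
    lp = log_prior_kappa(kappa)
    if lp == -math.inf:
        return lp
    return gev_loglik(data, xi, alpha, kappa) + lp
```

With this prior the MLE solution of the earlier example (κ = −2.48) has zero prior density, so the generalized likelihood rules it out entirely.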
Performance of Alternative Estimators of x0.99 for the GEV Distribution
[Figure, n = 25: RMSE of the ML, LM, MOM and GML estimators versus κ, −0.3 ≤ κ ≤ 0.3.]
[Figure, n = 100: RMSE of the ML, LM, MOM and GML estimators versus κ, −0.3 ≤ κ ≤ 0.3.]
GEV Estimators
• In 1985, when Hosking, Wallis and Wood introduced L-moment (PWM) estimators for the GEV, they were much better than MLEs as quantile estimators.
• In 1998, Madsen and Rosbjerg demonstrated that MOM estimators were not so bad, perhaps better than L-moment estimators.
• Finally, in 2000, Martins & Stedinger demonstrated that adding realistic control of the GEV shape parameter yielded estimators that dominated the competition. The prior on κ acts as a modest-accuracy regional description of the shape parameter.
Partial Duration or Annual Maximum Series?
By seeing more little floods, do we know more about big floods?
Partial Duration Series (PDS), or Peaks Over Threshold (POT)
[Figure: flood flows over the years of record, with a threshold.]
Poisson/Pareto model for PDS
λ = arrival rate for floods > x0, which follow a Poisson process.
$G(x) = \Pr[X \le x]$ for peaks over threshold, x > x0, is a Generalized Pareto distribution:
$G(x) = 1 - \{1 - \kappa (x - x_0)/\alpha\}^{1/\kappa}$
Then annual maximums have a Generalized Extreme Value distribution:
$F(x) = \exp\{-(1 - \kappa (x - \xi)/\alpha')^{1/\kappa}\}$
with $\xi = x_0 + \alpha (1 - \lambda^{-\kappa})/\kappa$, $\alpha' = \alpha \lambda^{-\kappa}$, and κ the same.
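The parameter mapping can be checked numerically. The annual maximum is below x exactly when no peak in the year exceeds x, so F(x) = exp{−λ[1 − G(x)]}. The sketch compares that expression with the GEV CDF evaluated at the converted parameters; it assumes κ ≠ 0 and x ≥ x0, and the parameter values are hypothetical.

```python
import math

def gp_poisson_to_gev(x0, alpha, kappa, lam):
    """GEV (xi, alpha', kappa) of annual maxima implied by Poisson(lam) arrivals of GP(x0, alpha, kappa) peaks."""
    xi = x0 + alpha * (1.0 - lam ** (-kappa)) / kappa
    alpha_prime = alpha * lam ** (-kappa)
    return xi, alpha_prime, kappa

def annual_max_cdf_from_pds(x, x0, alpha, kappa, lam):
    """Pr[annual max <= x] = Pr[no peak exceeds x] = exp(-lam * [1 - G(x)]), for x >= x0."""
    one_minus_g = (1.0 - kappa * (x - x0) / alpha) ** (1.0 / kappa)
    return math.exp(-lam * one_minus_g)

def gev_cdf(x, xi, alpha, kappa):
    return math.exp(-(1.0 - kappa * (x - xi) / alpha) ** (1.0 / kappa))

xi, a1, k = gp_poisson_to_gev(x0=100.0, alpha=30.0, kappa=-0.1, lam=3.0)   # hypothetical values
for x in (150.0, 200.0, 300.0):
    print(x, annual_max_cdf_from_pds(x, 100.0, 30.0, -0.1, 3.0), gev_cdf(x, xi, a1, k))
```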
Which is more precise: AMS or PDS?
$R = \dfrac{\mathrm{RMSE}\{Q_{0.99} \mid \mathrm{AMS}\}}{\mathrm{RMSE}\{Q_{0.99} \mid \mathrm{PDS}\}}$
Consider the case where only 2 parameters are estimated. Fix κ = 0, corresponding to Poisson arrivals with exponential exceedances: the Shane & Lynn (1964) model for flood risk.
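With κ = 0 the exceedances are exponential with mean α, and the annual-maximum distribution is F(x) = exp{−λ exp[−(x − x0)/α]} for x ≥ x0, so x_p = x0 + α ln[λ / (−ln p)]. A minimal PDS estimator sketch using the maximum likelihood estimates λ̂ = (number of peaks)/(years of record) and α̂ = mean exceedance; the function name and data are hypothetical.

```python
import math

def pds_exponential_quantile(peaks, years, x0, p):
    """Annual-maximum quantile x_p from PDS data, Poisson arrivals with exponential exceedances."""
    lam = len(peaks) / years                               # MLE of arrival rate (events/year)
    alpha = sum(q - x0 for q in peaks) / len(peaks)       # MLE of mean exceedance above x0
    if lam < -math.log(p):
        raise ValueError("x_p would fall below the threshold x0, where the model does not apply")
    # F(x) = exp(-lam * exp(-(x - x0)/alpha)) for x >= x0; solve F(x_p) = p
    return x0 + alpha * math.log(lam / -math.log(p))

# Three peaks above a threshold of 100 in 3 years: lam = 1 event/year, alpha = 20
print(pds_exponential_quantile([110.0, 120.0, 130.0], 3.0, 100.0, 0.99))
```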
Poisson Arrivals with Exponential Exceedances (κ = 0)
[Figure: ratio R of the RMSE of the AMS estimator to that of the PDS estimator versus the arrival rate λ (0 to 6 events/year), for the 10-year and 100-year floods.]
Which is more precise: AMS-GEV or PDS-GP?
$\text{RMSE-ratio} = \dfrac{\mathrm{RMSE}\{Q_{0.99} \mid \text{PDS XXX}\}}{\mathrm{RMSE}\{Q_{0.99} \mid \text{AMS GML}\}}$
Now estimate 3 parameters using PDS data with the Generalized Pareto distribution, employing XXX = MOM, L-moments (LM) and GML, and compare the RMSE of PDS-XXX to the RMSE of the AMS-GML GEV estimator.
RMSE of 3 PDS estimators vs. AMS-GML, λ = 5 events/year
[Figure (b): RMSE[PDS]/RMSE[AMS-GML] for the GML, MOM and LM estimators versus the shape parameter κ, −0.3 to +0.3.]
RMSE of 3 PDS estimators vs. AMS-GML, κ = −0.30
[Figure (a): RMSE ratio PDS/AMS-GML for the GML, MOM and LM estimators, with MOM-AMS for comparison, versus events per year (0 to 5).]
Conclusions: PDS versus AMS
For κ < 0, with PDS data, GML quantile estimators are again generally better than MOM, LM and ML.
The precision of GML quantile estimators is insensitive to λ.
A year of PDS data is generally worth a year of AMS data for estimating the 100-year flood when employing the GML estimators of the GP and GEV parameters: more little floods do not tell us about the distribution of large floods.
GLS Regression for Regional Analyses
GOAL: obtain efficient estimators of the mean, standard deviation, T-year flood, or GEV parameters as a function of physiographic basin characteristics, and provide the precision of those estimators.
MODEL: log[statistic of interest] = β0 + β1 log(Area) + β2 log(Slope) + . . . + Error
GLS Analysis: Complications
With available records, we only obtain sample estimates of the statistic of interest, denoted $\hat{y}_i$.
The total error $\varepsilon_i$ is a combination of:
(i) time-sampling error $\eta_i$ in the sample estimators $\hat{y}_i$, which are often cross-correlated, and
(ii) underlying model error $\delta_i$ (true lack of fit).
The variance of those errors about the prediction $X\beta$ depends on the statistics of interest at each site.
$\hat{y} = y + \eta = X\beta + \delta + \eta = X\beta + \varepsilon$
(prediction $X\beta$; model error $\delta$; sampling error $\eta$; total error $\varepsilon = \delta + \eta$)
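GLS for Regionalization
Use the available record lengths $n_i$, concurrent record lengths $m_{ij}$, regional estimates of the standard deviations $\sigma_i$ (or of $\tau_{2i}$ and $\tau_{3i}$), and cross-correlations $\rho_{ij}$ of floods to estimate the variances and cross-correlations describing the sampling errors $\eta_i$ in $\hat{y}_i$.
With the true model error variance $\sigma_\delta^2$, determine the covariance matrix $\Lambda(\sigma_\delta^2)$ of the residual errors:
$\Lambda(\sigma_\delta^2) = \sigma_\delta^2 I + \Sigma(\hat{y})$
where $\Sigma(\hat{y})$ is the covariance matrix of the sampling errors in the estimators $\hat{y}$.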
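GLS Analysis: Solution
GLS regression model (Stedinger & Tasker, 1985, 1989):
$\hat{y} = X\beta + \varepsilon$
with parameter estimator b given by $\{X^T \Lambda(\sigma_\delta^2)^{-1} X\}\, b = X^T \Lambda(\sigma_\delta^2)^{-1} \hat{y}$
The model error variance can be estimated by the method of moments:
$(\hat{y} - Xb)^T \Lambda(\sigma_\delta^2)^{-1} (\hat{y} - Xb) = n - k$
$\Lambda(\sigma_\delta^2) = \sigma_\delta^2 I + \Sigma(\hat{y})$
n = dimension of $\hat{y}$; k = dimension of b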
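A NumPy sketch of the GLS estimator and the method-of-moments estimate of σδ². It assumes Σ(ŷ) is already available (in practice it is built from record lengths, concurrent record lengths and cross-correlations), and it solves the moment equation by bisection, relying on the fact that the weighted residual sum of squares is nonincreasing in σδ². When the equation has no positive root the estimate is truncated at zero. Function names and data are hypothetical.

```python
import numpy as np

def gls_fit(X, y, sigma2, Sigma):
    """Solve {X' L^-1 X} b = X' L^-1 y with L = sigma2*I + Sigma; also return the weighted SSR."""
    Li = np.linalg.inv(sigma2 * np.eye(len(y)) + Sigma)
    b = np.linalg.solve(X.T @ Li @ X, X.T @ Li @ y)
    r = y - X @ b
    return b, float(r @ Li @ r)

def gls_mm(X, y, Sigma, hi=10.0, tol=1e-10):
    """Method-of-moments sigma2: solve weighted SSR(sigma2) = n - k, truncated at zero."""
    n, k = X.shape
    b, q = gls_fit(X, y, 0.0, Sigma)
    if q <= n - k:
        return 0.0, b                            # no positive root: the estimate is zero
    while gls_fit(X, y, hi, Sigma)[1] > n - k:   # widen the bracket if necessary
        hi *= 2.0
    lo = 0.0
    while hi - lo > tol:                         # weighted SSR is nonincreasing in sigma2
        mid = 0.5 * (lo + hi)
        if gls_fit(X, y, mid, Sigma)[1] > n - k:
            lo = mid
        else:
            hi = mid
    s2 = 0.5 * (lo + hi)
    return s2, gls_fit(X, y, s2, Sigma)[0]

# Hypothetical example: 20 sites, one explanatory variable, correlated sampling errors
x = np.linspace(0.0, 1.0, 20)
X = np.column_stack([np.ones_like(x), x])
Sigma = 0.01 * (0.7 * np.eye(20) + 0.3 * np.ones((20, 20)))
y = 1.0 + 2.0 * x + 0.3 * np.sin(12.0 * x)
s2, b = gls_mm(X, y, Sigma)
print(s2, b)
```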
Likelihood Function for Model Error (Tibagi River, Brazil, n = 17)
[Figure: likelihood versus model error variance, 0 to 0.40.]
The maximum of the likelihood may be at zero, but larger values are very probable.
Zero is clearly not in the middle of the likely range of values.
The method of moments has the same problem: a zero estimate.
Advantages of Bayesian Analysis
Provides the posterior distribution of the parameters β and of the model error variance σδ², and the predictive distribution for the dependent variable.
The Bayesian approach is a natural solution to the problem.
Bayesian GLS Model
Prior distribution on (β, σδ²):
- Parameters β are multivariate normal
- Model error variance σδ²
Exponential dist. (); E[ ] = = 24
Likelihood function:
Assume the data $\hat{y}$ are multivariate normal, N[Xβ, Λ(σδ²)]
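Quasi-Analytic Bayesian GLS
Joint posterior distribution:
$p(\beta, \sigma_\delta^2 \mid D) = C\, \ell(D \mid \beta, \sigma_\delta^2)\, \xi(\beta, \sigma_\delta^2)$
Marginal posterior of $\sigma_\delta^2$:
$p(\sigma_\delta^2 \mid D) = \int p(\beta, \sigma_\delta^2 \mid D)\, d\beta = C\, f(\sigma_\delta^2 \mid D)$
where the normal likelihood and prior are integrated analytically over β to obtain f in closed form.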
Example of a posterior of σδ² (Model 1, Tibagi, Brazil, n = 17)
[Figure: marginal posterior, prior and likelihood versus model error variance, 0 to 0.40.]
MM-GLS estimate of σδ²: 0.000
MLE-GLS estimate of σδ²: 0.000
Bayesian GLS estimate of σδ²: 0.046
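Quasi-Analytic Result
From the joint posterior distribution
$p(\beta, \sigma_\delta^2 \mid D) = p(\beta \mid D, \sigma_\delta^2)\, p(\sigma_\delta^2 \mid D)$
we can compute the marginal posterior of β,
$p(\beta \mid D) = \int p(\beta \mid D, \sigma_\delta^2)\, p(\sigma_\delta^2 \mid D)\, d\sigma_\delta^2$,
and its moments by 1-dimensional numerical integrations.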
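A minimal grid version of this quasi-analytic computation: integrate β out analytically, then perform the one-dimensional integration over σδ² numerically on a uniform grid. Two simplifications are assumptions of this sketch: a flat prior on β (the limit of a very diffuse multivariate normal prior, rather than the slide's normal prior) and an exponential prior on σδ² with a user-chosen mean. Names and data are hypothetical.

```python
import numpy as np

def bayesian_gls(X, y, Sigma, prior_mean_s2, s2_max=1.0, m=2000):
    """Grid posterior of sigma2 with beta integrated out analytically (flat prior on beta)."""
    n, k = X.shape
    grid = np.linspace(s2_max / m, s2_max, m)        # uniform grid of sigma2 values > 0
    logpost = np.empty(m)
    cond_means = np.empty((m, k))
    for j, s2 in enumerate(grid):
        Lam = s2 * np.eye(n) + Sigma
        Li = np.linalg.inv(Lam)
        A = X.T @ Li @ X
        b = np.linalg.solve(A, X.T @ Li @ y)         # E[beta | D, sigma2] under a flat prior
        r = y - X @ b
        # log f(D | sigma2) up to a constant: -0.5*(log|Lam| + log|X'Lam^-1 X| + r'Lam^-1 r)
        loglik = -0.5 * (np.linalg.slogdet(Lam)[1] + np.linalg.slogdet(A)[1] + r @ Li @ r)
        logpost[j] = loglik - s2 / prior_mean_s2     # plus log of exponential prior on sigma2
        cond_means[j] = b
    w = np.exp(logpost - logpost.max())
    w /= w.sum()                                     # posterior probabilities on the grid
    return grid, w, float(w @ grid), w @ cond_means  # E[sigma2 | D] and E[beta | D]

# Hypothetical data lying exactly on a line: the moment estimate of sigma2 would be zero
x = np.linspace(0.0, 1.0, 17)
X = np.column_stack([np.ones_like(x), x])
Sigma = 0.01 * np.eye(17)
y = 1.0 + 2.0 * x
grid, w, s2_mean, beta_mean = bayesian_gls(X, y, Sigma, prior_mean_s2=0.05)
print(s2_mean, beta_mean)
```

Even when the data sit exactly on the regression line, the posterior mean of σδ² stays positive, which is the behavior the Tibagi example illustrates.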
Bayesian GLS for Regionalization of Flood Characteristics in Korea
Dae Il Jeong, Post-doctoral Researcher, Cornell University
Jery R. Stedinger, Professor, Cornell University
Young-Oh Kim, Associate Professor, Seoul National University
Jang Hyun Sung, Graduate Student, Seoul National University
Korean River basins
• Land area: 120,000 km²
• Major river basins: Han, Nakdong, Geum
• Total annual precipitation (TAP) = 1283 mm; two-thirds of TAP occurs during the 3-month flood season (Jul–Sep)
• Available sites: 31; average record length: 22 years
[Map: Han, Nakdong and Geum River basins.]
Korean Application
Regional estimators of L-CV τ2 and L-CS τ3 for flood frequency analysis using the GEV distribution
6 explanatory variables:
• 2 indicators (Han-Nakdong-Geum basins)
• log of drainage area
• log of channel slope
• mean precipitation
• SD of annual maximum precipitation
Cross-correlation of concurrent maxima
[Figure: correlation coefficient of concurrent annual maxima versus distance between stations (0 to 300 km), with a fitted curve that decreases exponentially with the distance d_ij.]
Monte Carlo results for the cross-correlation of L-CS estimators
GEV with κ = −0.3 and τ2 = 0.3
[Figure: cross-correlation of the L-CS estimators, ρ(τ3x, τ3y), versus the cross-correlation of the annual maxima, ρxy (−0.8 to 1.0), for three series, with a fitted relation giving the estimated ρ(τ3i, τ3j) as a function of |ρij|.]
Regression Results: L-CV τ2
Model | Const. | Ln(Area) | Mean Ppt | Model error var. | Avg. sampling var. | AVP_GLS | Pseudo R² (%) | ERL (years)
B-GLS0 | 0.4178 (0.0306) | | | 0.0077 (0.0033) | 0.0009 | 0.0087 | 0 | 14
B-GLS2 | 0.4220 (0.0285) | −0.0416 (0.0116) [0.1 %] | −0.1307 (0.0522) [1.3 %] | 0.0043 (0.0021) | 0.0015 | 0.0057 | 45 | 21
Standard error in parentheses ( ); p-value in brackets [ ].
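Performance Measures
Average Variance of Prediction (AVP): how well the model estimates the true value of the quantity of interest, on average across sites:
$AVP_{GLS} = \sigma_\delta^2 + \dfrac{1}{N} \sum_{i=1}^{N} x_i (X^T \Lambda^{-1} X)^{-1} x_i^T$
Pseudo R²: improvement of GLS(k) versus GLS(0):
$R^2_{GLS} = \dfrac{n[\sigma_\delta^2(0) - \sigma_\delta^2(k)]}{n\,\sigma_\delta^2(0)} = 1 - \dfrac{\sigma_\delta^2(k)}{\sigma_\delta^2(0)}$
Effective Record Length (ERL): relative uncertainty of the regional estimate compared with an at-site estimator.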
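A sketch of the two formulas above, assuming the prediction sites are the N regression sites (the rows x_i of X) and that Λ = σδ² I + Σ(ŷ). Function names are illustrative.

```python
import numpy as np

def avp_gls(X, sigma2, Sigma):
    """AVP = sigma2 + (1/N) * sum_i x_i (X' L^-1 X)^-1 x_i', with L = sigma2*I + Sigma."""
    Lam = sigma2 * np.eye(X.shape[0]) + Sigma
    V = np.linalg.inv(X.T @ np.linalg.solve(Lam, X))     # sampling covariance of b
    return sigma2 + float(np.mean(np.einsum("ij,jk,ik->i", X, V, X)))

def pseudo_r2(s2_model0, s2_modelk):
    """Pseudo R2 = 1 - sigma2(k) / sigma2(0)."""
    return 1.0 - s2_modelk / s2_model0

print(pseudo_r2(0.0077, 0.0043))    # rounded B-GLS0 and B-GLS2 values for L-CV
```

With the rounded L-CV entries, 1 − 0.0043/0.0077 is about 0.44, close to the reported 45 %; the gap presumably reflects rounding of the tabulated variances.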
Regression Results: L-CS τ3
Model | Const. | Ln(Area) | Model error var. | Avg. sampling var. | AVP | Pseudo R² (%) | ERL (years)
B-GLS0 | 0.3402 (0.0538) | | 0.0094 (0.0056) | 0.0029 | 0.0123 | 0 | 39
B-GLS1 | 0.3405 (0.0489) | −0.0535 (0.0183) [0.6 %] | 0.0060 (0.0044) | 0.0035 | 0.0094 | 37 | 51
Standard error in parentheses ( ); p-value in brackets [ ].
Model Diagnostic Measures
Pseudo ANOVA table
- Variation explained by regional model
- Residual variation due to model errors
- Residual variation due to sampling errors
- Represents partition of TOTAL variation
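Pseudo ANOVA Table for L-CV and L-CS
Source | Degrees of freedom | Sum of squares equation | L-CV | L-CS
Model | k = 1 or 2 | $n[\sigma_\delta^2(0) - \sigma_\delta^2(k)]$ | 0.108 | 0.106
Model error δ | n − k − 1 | $n\,\sigma_\delta^2(k)$ | 0.132 | 0.185
Sampling error η | n | $\sum_{i=1}^{n} \mathrm{Var}(\hat{y}_i)$ | 0.156 | 0.624
Total | 2n − 1 | $n\,\sigma_\delta^2(0) + \sum_{i=1}^{n} \mathrm{Var}(\hat{y}_i)$ | 0.396 | 0.916
EVR | | | 1.18 | 3.38
MBV | | | 2.89 | 3.76
Pseudo R² | | | 45 % | 37 %
ERL (years) | | | 21 | 51
$EVR = \dfrac{\frac{1}{n}\sum_{i=1}^{n} \mathrm{Var}(\hat{y}_i)}{\sigma_\delta^2(k)}$
$MBV = \dfrac{w^T \Lambda(\sigma_\delta^2)\, w}{\sum_{i=1}^{n} w_i^2 \Lambda_{ii}}$, where $w_i = 1/\Lambda_{ii}$
We need GLS regression analysis.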
Conclusion: Value in Korea
• The regional estimator of the L-coefficient of variation should be combined with its at-site estimator: ERL(τ2) = 21 years ≈ average record length (22 years).
• The regional estimator of L-skewness was more precise than the at-site estimators: ERL(τ3) = 51 years > average record length (22 years).
It is clearly advantageous to use BOTH regional and at-site information in the analysis of annual maxima.
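One simple way to combine the two estimators, offered as an assumption rather than the authors' procedure: weight them by inverse variance, taking the at-site variance proportional to 1/n and the regional variance proportional to 1/ERL, so that the regional weight is ERL/(ERL + n). This ignores the correlation between a site's own estimate and a regional estimate built partly from it. The function name is hypothetical.

```python
def combine_estimates(at_site, n_years, regional, erl_years):
    """Inverse-variance combination assuming Var(at-site) ~ c/n and Var(regional) ~ c/ERL."""
    w_regional = erl_years / (erl_years + n_years)
    return w_regional * regional + (1.0 - w_regional) * at_site, w_regional

# Weights for a site with the average 22-year record (ERL values from the Korean study)
for name, erl in (("L-CV", 21.0), ("L-CS", 51.0)):
    _, w = combine_estimates(0.0, 22.0, 0.0, erl)
    print(name, round(w, 2))
```

Under these assumptions the regional estimate receives about half the weight for L-CV (21/43) and about 70 % for L-CS (51/73).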
Diagnostic Statistics
Statistics for evaluating data concerns, precision of predicted values, sources of variation, and model adequacy:
• Leverage and influence
• Measures of prediction precision
• Pseudo R² and ANOVA
• Modeling diagnostics: EVR and MBV
• Bayesian plausibility level
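Bayesian Hierarchical Model: Solve the whole problem at once?
For each site i, i = 1, …, K, assume
$X_{it} \sim \mathrm{GEV}(\xi_i, \alpha_i, \kappa_i)$, t = 1, …, n_i
where for the parameters we have
$\xi_i \sim N(\mu_\xi, \sigma_\xi^2)$
$\alpha_i \sim N(\mu_\alpha, \sigma_\alpha^2)$, where perhaps one models $\alpha_i/\xi_i$, a coefficient of variation, instead
$\kappa_i \sim N(\mu_\kappa, \sigma_\kappa^2)$
with priors on $\mu_\xi$, $\mu_\alpha$ and $\mu_\kappa$, whose values for each site i may depend on the physiographic characteristics of that site.
This ignores cross-correlations: do we need a multivariate model for K variates? Beware of special cases and lack of fit.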
Concluding Remarks
• GEV distribution used by many water agencies and countries to describe the distribution of extremes.
• L-moments provide simple estimators, but they are not efficient.
• Generalized Maximum Likelihood Estimators [GMLEs] (with a modest prior on κ) solve the problems with MLEs and were the most precise.
• PDS (GPD-Poisson) is no better than AMS (GEV) when estimating three parameters with GMLE.
Final Comments
• Regional regression procedures should account for the precision of at-site estimators and their cross-correlations, as can be done with Generalized Least Squares regression. Otherwise, estimates of model accuracy and of the precision of parameter estimates will be in error.
• When the model error variance is small relative to the errors in estimated hydrologic statistics, the Bayesian model error variance estimator is particularly attractive.
Hosking and Wallis (1997)
We can do better than simple index flood procedures that everywhere use regional average L-CV τ2 and L-CS τ3 values.
Conclusion: Applicability of GLS
• Developed a Bayesian Generalized Least Squares modeling framework to analyze regional information addressing distribution parameters, recognizing
– sampling error in at-site estimators as a function of record length, cross-correlation of concurrent events, and concurrent record lengths, and
– regional model error (true precision of the regional model)
• Developed regression models for L-CV and L-CS for Korean annual maximum floods using B-GLS analysis
Background Reading
Stedinger, J.R., Flood Frequency Analysis and Statistical Estimation of Flood Risk, Chapter 12 in Inland Flood Hazards: Human, Riparian and Aquatic Communities, E.E. Wohl (ed.), Cambridge University Press, Cambridge, United Kingdom, 2000.
Flood Frequency References
Hosking, J.R.M., L-Moments: Analysis and Estimation of Distributions Using Linear Combinations of Order Statistics, J. of the Royal Statistical Society, Series B, 52(2), 105–124, 1990.
Hosking, J.R.M., and J.R. Wallis, Regional Frequency Analysis: An Approach Based on L-moments, Cambridge University Press, 1997.
Martins, E.S., and J.R. Stedinger, Generalized Maximum Likelihood GEV quantile estimators for hydrologic data, Water Resources Research, 36(3), 737–744, 2000.
Martins, E.S., and J.R. Stedinger, Generalized Maximum Likelihood Pareto-Poisson Flood Risk Analysis for Partial Duration Series, Water Resources Research, 37(10), 2559–2567, 2001.
Stedinger, J.R., and L. Lu, Appraisal of Regional and Index Flood Quantile Estimators, Stochastic Hydrology and Hydraulics, 9(1), 49–75, 1995.
GLS References
Griffis, V.W., and J.R. Stedinger, The Use of GLS Regression in Regional Hydrologic Analyses, J. of Hydrology, 344(1-2), 82–95, 2007 [doi:10.1016/j.jhydrol.2007.06.023].
Gruber, Andrea M., Dirceu S. Reis Jr., and Jery R. Stedinger, Models of Regional Skew Based on Bayesian GLS Regression, Paper 40927-3285, World Environ. & Water Resour. Conf. - Restoring our Natural Habitat, K.C. Kabbes editor, Tampa, FL, May 15-18, 2007.
Jeong, Dae Il, Jery R. Stedinger, Young-Oh Kim, and Jang Hyun Sung, Bayesian GLS for Regionalization of Flood Characteristics in Korea, Paper 40927-2736, World Environ. & Water Resour. Conf. - Restoring our Natural Habitat, Tampa, FL, May 15-18, 2007.
Martins, E.S., and J.R. Stedinger, Cross-correlation among estimators of shape, Water Resources Research, 38(11), doi: 10.1029/2002WR001589, 26 November 2002.
Reis, D. S., Jr., J. R. Stedinger, and E. S. Martins, Bayesian generalized least squares regression with application to log Pearson type 3 regional skew estimation, Water Resour. Res., 41, W10419, doi:10.1029/2004WR003445, 2005.
Stedinger, J.R., and G.D. Tasker, Regional Hydrologic Analysis, 1. Ordinary, Weighted and Generalized Least Squares Compared, Water Resour. Res., 21(9), 1421-1432, 1985.
Tasker, G.D., and J.R. Stedinger, Estimating Generalized Skew With Weighted Least Squares Regression, J. of Water Resources Planning and Management, 112(2), 225-237, 1986.
Tasker, G.D., and J.R. Stedinger, An Operational GLS Model for Hydrologic Regression, J. of Hydrology, 111(1-4), 361-375, 1989.
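Pseudo R² for GLS
Traditional adjusted R²:
$R^2 = 1 - \dfrac{\text{unexplained error variance}}{\text{total variance}}$, e.g. $R^2_{OLS} = 1 - s_\varepsilon^2 / s_Y^2$
For GLS we are not interested in the total error, which includes sampling error that the model cannot explain.
Consider the GLS model: $\hat{y} = X\beta + \delta + \eta$
How much of the critical model error can we explain, where $\mathrm{Var}[\delta] = \sigma_\delta^2(k)$ for a model with k parameters?
$R^2_{GLS} = 1 - \dfrac{\sigma_\delta^2(k)}{\sigma_\delta^2(0)}$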
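Pseudo ANOVA Table
Source | Degrees of freedom | Estimator
Model | k | $n[\sigma_\delta^2(0) - \sigma_\delta^2(k)]$
Model error | n − k − 1 | $n\,\sigma_\delta^2(k)$
Sampling error | n | $\sum_{i=1}^{n} \mathrm{Var}(\hat{y}_i)$
Total | 2n − 1 | $n\,\sigma_\delta^2(0) + \sum_{i=1}^{n} \mathrm{Var}(\hat{y}_i)$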
Modeling Diagnostics
To evaluate whether OLS might be sufficient, consider the Error Variance Ratio (EVR):
$EVR = \dfrac{\mathrm{Average}\{\mathrm{Var}[\eta_i]\}}{\mathrm{Var}[\delta]} = \dfrac{\sum_{i=1}^{n} \mathrm{Var}(\hat{y}_i)}{n\,\hat{\sigma}_\delta^2(k)}$
If EVR > 20%, then sampling error in the estimators of y is potentially an important fraction of the observed total error ε = δ + η.
Do we need WLS or GLS to correctly analyze these data?
Modeling Diagnostics
EVR > 20% suggests a need for WLS or GLS. But when is the cross-correlation so large that a GLS analysis is needed?
The Misrepresentation of Beta Variance (MBV) describes the error made by WLS in its evaluation of the precision of the estimator b0 of the constant term:
$MBV = \dfrac{\mathrm{Var}[b_0^{WLS} \mid \text{GLS analysis}]}{\mathrm{Var}[b_0^{WLS} \mid \text{WLS analysis}]} = \dfrac{w^T \Lambda w}{\sum_{i=1}^{n} w_i^2 \Lambda_{ii}}$, where $w_i = 1/\Lambda_{ii}$
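Both diagnostics can be computed directly from the sampling variances and Λ. In this sketch w_i = 1/Λ_ii are the WLS weights, and the MBV expression is the variance ratio above written out for a constant-only WLS estimator. Function names are illustrative.

```python
import numpy as np

def evr(sampling_vars, sigma2_k):
    """Error variance ratio: average sampling variance / model error variance sigma2(k)."""
    return float(np.mean(sampling_vars)) / sigma2_k

def mbv(Lam):
    """Var[b0_WLS | true Lam] / Var[b0_WLS | diagonal Lam], with WLS weights w_i = 1/Lam_ii."""
    d = np.diag(Lam)
    w = 1.0 / d
    return float(w @ Lam @ w) / float(np.sum(w ** 2 * d))
```

Because the sampling-error and model-error entries of the pseudo ANOVA table are both scaled by n, EVR also equals their ratio: for L-CV, 0.156/0.132 ≈ 1.18. For n sites with equal variances and a common correlation ρ, MBV = 1 + (n − 1)ρ, so even moderate cross-correlation among many sites inflates it.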
OLS, WLS and GLS for L-CS
Model | Const. | Ln(Area) | Model error var. | Average sampling var. | AVP_new | Pseudo R² (%) | ERL (years)
OLS1 | 0.3679 (0.0267) | −0.0472 (0.0181) | 0.0221 | 0.0014 | 0.0235 | 16 | 21
B-WLS1 | 0.3792 (0.0261) | −0.0492 (0.0206) | 0.0059 (0.0047) | 0.0016 | 0.0074 | 31 | 65
B-GLS1 | 0.3405 (0.0489) | −0.0535 (0.0188) | 0.0060 (0.0044) | 0.0035 | 0.0094 | 37 | 51
Standard error in parentheses ( ).