Jery R. Stedinger Cornell University Research with G. Tasker, E. Martins, D. Reis, A. Gruber, V. Griffis, D.I. Jeong and Y.O. Kim SAMSI Workshop 23 January 2008 Regionalization of Statistics Describing the Distribution of Hydrologic Extremes


Jery R. Stedinger, Cornell University

Research with G. Tasker, E. Martins, D. Reis, A. Gruber, V. Griffis, D.I. Jeong and Y.O. Kim

SAMSI Workshop, 23 January 2008

Regionalization of Statistics Describing the Distribution of Hydrologic Extremes


Extreme Value Theory & Hydrology

The annual maximum flood may be the maximum daily flow or the instantaneous maximum.

Annual maximum 24-hour rainfall may be the maximum daily value or the maximum 1440-minute value.

Annual maximums are not maxima of an i.i.d. series:

• Years have definite “wet” and “dry” seasons.

• Daily values are correlated.

• Because of El Niño and other atmospheric patterns, some years are extreme-event prone and others are not.

Peaks-over-threshold series, also called partial duration series (PDS), are another alternative.


Outline

• Summarizing Data: Moments and L-moments

• Parameter estimation for GEV
  – Use of a prior on κ
  – PDS versus AMS with GMLEs

• Bayesian GLS Regression for regionalization

• Concluding observations



Definitions: Product-Moments

Mean, measure of location

$$\mu_x = E[X]$$

Variance, measure of spread

$$\sigma_x^2 = E[(X - \mu_x)^2]$$

Coefficient of skewness, measure of asymmetry

$$\gamma_x = E[(X - \mu_x)^3] / \sigma_x^3$$


Conventional Moment Ratios

Conventional descriptions of shape are the

Coefficient of variation: $CV = \sigma_x / \mu_x$

Coefficient of skewness: $\gamma_x = E[(X - \mu_x)^3] / \sigma_x^3$

Coefficient of kurtosis: $E[(X - \mu_x)^4] / \sigma_x^4$


Product-Moment Skew-Kurtosis Estimators: n = 10

[Figure: true (skew, kurtosis) point and the average of sample product-moment estimates; skew axis −1.5 to 3.0, kurtosis axis 0 to 6.]

Samples drawn from a Gumbel distribution.
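The bias shown in the figure can be reproduced with a small Monte Carlo experiment. This is a sketch, not the simulation behind the slide: it assumes standard Gumbel samples, the simple 1/n moment estimators (no small-sample corrections), 20,000 replicates and an arbitrary seed; the helper names are hypothetical.

```python
import math
import random

def product_moment_ratios(x):
    """Simple (1/n) estimators of the coefficient of variation, skewness and kurtosis."""
    n = len(x)
    m = sum(x) / n
    m2 = sum((v - m) ** 2 for v in x) / n
    m3 = sum((v - m) ** 3 for v in x) / n
    m4 = sum((v - m) ** 4 for v in x) / n
    s = math.sqrt(m2)
    # CV is only meaningful for positive variables; it is returned for completeness.
    return s / m, m3 / s ** 3, m4 / m2 ** 2

def gumbel_sample(n, rng):
    """Standard Gumbel variates by inversion of F(x) = exp(-exp(-x))."""
    return [-math.log(-math.log(rng.random())) for _ in range(n)]

# Population skewness and kurtosis of the Gumbel distribution
TRUE_SKEW = 1.1395
TRUE_KURT = 5.4

rng = random.Random(2008)
reps, n = 20000, 10
skews, kurts = [], []
for _ in range(reps):
    _, g, k = product_moment_ratios(gumbel_sample(n, rng))
    skews.append(g)
    kurts.append(k)

avg_skew = sum(skews) / reps
avg_kurt = sum(kurts) / reps
print(f"average sample skew     {avg_skew:.2f}  (true {TRUE_SKEW})")
print(f"average sample kurtosis {avg_kurt:.2f}  (true {TRUE_KURT})")
```

Both averages fall well below the true values: with n = 10 the cubed and fourth-power deviations are dominated by the few largest observations, which small samples rarely contain.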


L-Moments

An alternative to product moments, now widely used in hydrology.


L-Moments: an alternative

• L-moments summarize data as conventional moments do, but use linear combinations of the ordered observations.

• Because L-moments avoid squaring and cubing the data, their ratios do not suffer from the severe bias problems encountered with product moments.

• They are estimated using order statistics.


L-Moments: an alternative

Let X(i|n) be the ith smallest observation in a sample of size n, so X(n|n) is the largest.

Measure of Scale

Half the expected difference between the largest and smallest observations in a sample of 2:

$$\lambda_2 = \tfrac{1}{2} E[X_{(2|2)} - X_{(1|2)}]$$

Measure of Asymmetry

$$\lambda_3 = \tfrac{1}{3} E[X_{(3|3)} - 2X_{(2|3)} + X_{(1|3)}]$$

where λ3 > 0 for positively skewed distributions.


L-Moments: an alternative

Measure of Kurtosis

$$\lambda_4 = \tfrac{1}{4} E[X_{(4|4)} - 3X_{(3|4)} + 3X_{(2|4)} - X_{(1|4)}]$$

For highly kurtotic distributions, λ4 is large.

For the uniform distribution, λ4 = 0.


Dimensionless L-moment ratios

L-moment coefficient of variation (L-CV): τ2 = λ2/µ

L-moment coefficient of skewness (L-skewness): τ3 = λ3/λ2

L-moment coefficient of kurtosis (L-kurtosis): τ4 = λ4/λ2

(Note: Hosking denotes L-CV by τ rather than τ2.)
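The definitions above translate into unbiased sample estimators through probability-weighted moments b_r of the ordered sample (Hosking, 1990). A minimal sketch in Python, assuming at least four observations; the function name is hypothetical.

```python
def sample_l_moments(x):
    """Sample L-moments l1..l4 and ratios t2 (L-CV), t3 (L-skew), t4 (L-kurtosis).

    Uses unbiased probability-weighted moments b_r of the ascending order
    statistics x(1) <= ... <= x(n); requires n >= 4.
    """
    xs = sorted(x)
    n = len(xs)
    if n < 4:
        raise ValueError("need at least 4 observations")
    b0 = b1 = b2 = b3 = 0.0
    for i, v in enumerate(xs):  # i = rank - 1
        b0 += v
        b1 += v * i / (n - 1)
        b2 += v * i * (i - 1) / ((n - 1) * (n - 2))
        b3 += v * i * (i - 1) * (i - 2) / ((n - 1) * (n - 2) * (n - 3))
    b0, b1, b2, b3 = b0 / n, b1 / n, b2 / n, b3 / n
    l1 = b0
    l2 = 2 * b1 - b0
    l3 = 6 * b2 - 6 * b1 + b0
    l4 = 20 * b3 - 30 * b2 + 12 * b1 - b0
    return l1, l2, l3, l4, l2 / l1, l3 / l2, l4 / l2

l1, l2, l3, l4, t2, t3, t4 = sample_l_moments([1.0, 2.0, 3.0, 4.0])
print(l1, l2, l3, l4)
```

For the evenly spaced sample [1, 2, 3, 4], l2 = 5/6, which is half the average absolute difference of the six pairs (10/6), and l3 and l4 vanish, as expected for a symmetric, uniform-like sample.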


L-Moment Skew-Kurtosis Estimators: n = 10

[Figure: true (L-skewness, L-kurtosis) point and the average of sample L-moment estimates; L-skew axis −0.3 to 0.7, L-kurtosis axis −0.2 to 0.6.]

Samples drawn from a Gumbel distribution.
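A Monte Carlo experiment of this kind shows why the L-moment average lands close to the true point. A sketch under stated assumptions: standard Gumbel samples of n = 10, 20,000 replicates and an arbitrary seed; the function name is hypothetical. The Gumbel population values τ3 = 2 log 3 / log 2 − 3 ≈ 0.1699 and τ4 = 16 − 10 log 3 / log 2 ≈ 0.1504 serve as the reference.

```python
import math
import random

def l_moment_ratios(x):
    """Sample L-skewness t3 and L-kurtosis t4 from unbiased PWMs (n >= 4)."""
    xs = sorted(x)
    n = len(xs)
    b = [0.0, 0.0, 0.0, 0.0]
    for i, v in enumerate(xs):  # i = rank - 1
        b[0] += v
        b[1] += v * i / (n - 1)
        b[2] += v * i * (i - 1) / ((n - 1) * (n - 2))
        b[3] += v * i * (i - 1) * (i - 2) / ((n - 1) * (n - 2) * (n - 3))
    b0, b1, b2, b3 = (bk / n for bk in b)
    l2 = 2 * b1 - b0
    l3 = 6 * b2 - 6 * b1 + b0
    l4 = 20 * b3 - 30 * b2 + 12 * b1 - b0
    return l3 / l2, l4 / l2

TRUE_T3 = 2 * math.log(3) / math.log(2) - 3    # Gumbel L-skewness
TRUE_T4 = 16 - 10 * math.log(3) / math.log(2)  # Gumbel L-kurtosis

rng = random.Random(2008)
reps, n = 20000, 10
t3s, t4s = [], []
for _ in range(reps):
    sample = [-math.log(-math.log(rng.random())) for _ in range(n)]
    t3, t4 = l_moment_ratios(sample)
    t3s.append(t3)
    t4s.append(t4)

avg_t3 = sum(t3s) / reps
avg_t4 = sum(t4s) / reps
print(f"average t3 {avg_t3:.3f}  (true {TRUE_T3:.3f})")
print(f"average t4 {avg_t4:.3f}  (true {TRUE_T4:.3f})")
```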


Generalized Extreme Value (GEV) distribution

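Combines Gumbel's Type I, II and III extreme value distributions:

$$F(x) = \exp\left\{ -\left[ 1 - \frac{\kappa}{\alpha}(x - \xi) \right]^{1/\kappa} \right\} \quad \text{for } \kappa \neq 0$$

κ = shape, α = scale, ξ = location; as κ → 0 the distribution becomes the Gumbel (Type I), F(x) = exp{−exp[−(x − ξ)/α]}.

Mostly −0.3 < κ ≤ 0.

[Others describe the shape with −κ, often denoted ξ.]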
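In code, the distribution function and its inverse (the quantile function used later on these slides) take only a few lines. A minimal Python sketch in the κ sign convention above; the function names are hypothetical, and κ = 0 is handled as the Gumbel limit.

```python
import math

def gev_cdf(x, xi, alpha, kappa):
    """F(x) = exp{-[1 - (kappa/alpha)(x - xi)]^(1/kappa)}; Gumbel limit when kappa == 0."""
    if kappa == 0.0:
        return math.exp(-math.exp(-(x - xi) / alpha))
    y = 1.0 - kappa * (x - xi) / alpha
    if y <= 0.0:
        # Outside the support: above the upper bound (kappa > 0) or below the lower bound (kappa < 0)
        return 1.0 if kappa > 0 else 0.0
    return math.exp(-y ** (1.0 / kappa))

def gev_quantile(p, xi, alpha, kappa):
    """Inverse cdf: x_p = xi + (alpha/kappa) * {1 - [-ln(p)]^kappa}."""
    if kappa == 0.0:
        return xi - alpha * math.log(-math.log(p))
    return xi + alpha / kappa * (1.0 - (-math.log(p)) ** kappa)

# Round trip for a typical flood-like shape, kappa = -0.2
for p in (0.1, 0.5, 0.99):
    x = gev_quantile(p, 0.0, 1.0, -0.2)
    print(p, round(x, 3), round(gev_cdf(x, 0.0, 1.0, -0.2), 6))
```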


GEV Prob. Density Function

[Figure: GEV probability density f(x) for 0 ≤ x ≤ 30.]


GEV Prob. Density Function, Large x

[Figure: upper tail of the GEV density f(x) for 12 ≤ x ≤ 30.]


Simple GEV L-Moment Estimators

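Using L-moments (Hosking, Wallis and Wood, 1985):

$$c = \frac{2}{3 + \tau_3} - \frac{\ln 2}{\ln 3}, \qquad \tau_3 = \lambda_3 / \lambda_2$$

then

$$\kappa = 7.8590\,c + 2.9554\,c^2, \qquad |\tau_3| \le 0.5$$

$$\alpha = \frac{\kappa \lambda_2}{\Gamma(1+\kappa)\,(1 - 2^{-\kappa})}$$

$$\xi = \lambda_1 + \frac{\alpha\,[\Gamma(1+\kappa) - 1]}{\kappa}$$

Quantiles:

$$x_p = \xi + \frac{\alpha}{\kappa}\left\{ 1 - [-\ln(p)]^{\kappa} \right\}$$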

Method of L-moments simple and attractive.
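The Hosking, Wallis and Wood (1985) formulas can be applied directly to sample L-moments. A minimal Python sketch, not production code: it uses unbiased PWM estimators of λ1 to λ3, assumes |t3| ≤ 0.5 so that the polynomial approximation for κ applies (and that κ is nonzero), and checks itself on a large synthetic GEV sample whose parameters and seed are arbitrary; the function names are hypothetical.

```python
import math
import random

def sample_l_moments3(x):
    """Unbiased sample L-moments l1, l2, l3 from probability-weighted moments."""
    xs = sorted(x)
    n = len(xs)
    b0 = sum(xs) / n
    b1 = sum(v * i / (n - 1) for i, v in enumerate(xs)) / n
    b2 = sum(v * i * (i - 1) / ((n - 1) * (n - 2)) for i, v in enumerate(xs)) / n
    return b0, 2 * b1 - b0, 6 * b2 - 6 * b1 + b0

def gev_lmom_fit(x):
    """GEV (xi, alpha, kappa) by the method of L-moments; assumes kappa != 0."""
    l1, l2, l3 = sample_l_moments3(x)
    t3 = l3 / l2
    c = 2.0 / (3.0 + t3) - math.log(2.0) / math.log(3.0)
    kappa = 7.8590 * c + 2.9554 * c ** 2   # approximation intended for |t3| <= 0.5
    g = math.gamma(1.0 + kappa)
    alpha = kappa * l2 / (g * (1.0 - 2.0 ** (-kappa)))
    xi = l1 + alpha * (g - 1.0) / kappa
    return xi, alpha, kappa

def gev_quantile(p, xi, alpha, kappa):
    """x_p = xi + (alpha/kappa) * {1 - [-ln(p)]^kappa}."""
    return xi + alpha / kappa * (1.0 - (-math.log(p)) ** kappa)

# Large synthetic sample drawn by inversion from GEV(xi=10, alpha=2, kappa=-0.1)
rng = random.Random(1)
data = [gev_quantile(rng.random(), 10.0, 2.0, -0.1) for _ in range(20000)]
xi_hat, alpha_hat, kappa_hat = gev_lmom_fit(data)
print(round(xi_hat, 3), round(alpha_hat, 3), round(kappa_hat, 3))
print("x_0.99 estimate:", round(gev_quantile(0.99, xi_hat, alpha_hat, kappa_hat), 2))
```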


Index Flood Methodology

Research has demonstrated potential advantages of index flood procedures for combining regional and at-site data to improve the estimators at individual sites.


Hosking and Wallis (1997)

Development of L-moments for regional flood frequency analysis. Research done in the 1980–1995 period.

J.R.M. Hosking and J.R. Wallis, Regional Frequency Analysis: An Approach Based on L-moments, Cambridge University Press, 1997.


Index Flood Method for Regionalization

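[Figure: sites 1, 2, 3, 4, …, K, each with its systematic record mean x̄, feeding a dimensionless regional model.]

Each record is scaled by its at-site mean, and the at-site quantile is the regional dimensionless quantile rescaled:

$$y_t = \frac{x_t}{\bar{x}}, \qquad x_p = \bar{x}\, y_p$$

Compute the region-average L-CV and L-CS, which yield the regional dimensionless quantile y_p.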
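The index flood recipe can be sketched as follows. This is an illustration under stated assumptions, not the procedure of any particular study: the at-site L-CV and L-CS values are averaged with record-length weights (a common choice), the regional growth curve is a GEV with mean 1 fitted with the Hosking, Wallis and Wood (1985) L-moment formulas, and the function names, site names and flows are hypothetical.

```python
import math

def lcv_lcs(x):
    """At-site L-CV (t2) and L-CS (t3); both are unchanged by rescaling x."""
    xs = sorted(x)
    n = len(xs)
    b0 = sum(xs) / n
    b1 = sum(v * i / (n - 1) for i, v in enumerate(xs)) / n
    b2 = sum(v * i * (i - 1) / ((n - 1) * (n - 2)) for i, v in enumerate(xs)) / n
    l2 = 2 * b1 - b0
    return l2 / b0, (6 * b2 - 6 * b1 + b0) / l2

def regional_growth_quantile(records, p):
    """Dimensionless regional quantile y_p from record-length-weighted t2 and t3."""
    weights = {s: len(x) for s, x in records.items()}
    total = sum(weights.values())
    t2 = sum(weights[s] * lcv_lcs(x)[0] for s, x in records.items()) / total
    t3 = sum(weights[s] * lcv_lcs(x)[1] for s, x in records.items()) / total
    c = 2.0 / (3.0 + t3) - math.log(2.0) / math.log(3.0)
    kappa = 7.8590 * c + 2.9554 * c ** 2
    g = math.gamma(1.0 + kappa)
    alpha = kappa * t2 / (g * (1.0 - 2.0 ** (-kappa)))   # lambda1 = 1, so lambda2 = t2
    xi = 1.0 + alpha * (g - 1.0) / kappa
    return xi + alpha / kappa * (1.0 - (-math.log(p)) ** kappa)

def index_flood_quantile(records, site, p):
    """x_p = (at-site mean) * y_p."""
    x = records[site]
    return (sum(x) / len(x)) * regional_growth_quantile(records, p)

# Hypothetical annual maxima (m3/s) at three sites
records = {
    "site_A": [120, 95, 210, 150, 88, 300, 175, 140, 99, 260],
    "site_B": [40, 55, 31, 72, 48, 90, 37, 61, 45, 52, 66],
    "site_C": [800, 650, 1200, 720, 980, 1500, 870, 700],
}
for s in records:
    print(s, round(index_flood_quantile(records, s, 0.99), 1))
```

All sites share one growth curve y_p; only the index (the at-site mean) differs, which is exactly the assumption that regional regression of L-CV and L-CS, discussed later, relaxes.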


Index Flood Methodology

• Use data from hydrologically "similar" basins to estimate a dimensionless flood distribution which is scaled by at-site sample mean.

• "Substitutes Space for Time" by using regional information to compensate for relatively short records at each site.

• Most of these studies have used the GEV distribution with L-moments or equivalent estimators.


Trouble with MLEs for GEV

CASE: N = 15, X ~ GEV(ξ = 0, α = 1, κ = −0.20)

MLE solution: ξ = −0.2, α = 0.53, κ = −2.48

X0.999 = 14.9 (true) versus 6,000,000 (estimated)

[Figure: density f(x) for −2 ≤ x ≤ 12, with the sample of 15 observations.]
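The two quantiles quoted on this slide follow from the GEV quantile formula. A short Python check (function name hypothetical) reproduces the true value of 14.9 and shows that the MLE solution puts X0.999 near six million.

```python
import math

def gev_quantile(p, xi, alpha, kappa):
    """x_p = xi + (alpha/kappa) * {1 - [-ln(p)]^kappa}, kappa != 0."""
    return xi + alpha / kappa * (1.0 - (-math.log(p)) ** kappa)

true_q = gev_quantile(0.999, xi=0.0, alpha=1.0, kappa=-0.20)
mle_q = gev_quantile(0.999, xi=-0.2, alpha=0.53, kappa=-2.48)
print(f"true X0.999 = {true_q:.1f}")
print(f"MLE  X0.999 = {mle_q:,.0f}")
```

With κ = −2.48 the term [−ln(0.999)]^κ ≈ 0.001^−2.48 is of order 10^7, which is why a 15-year sample can produce an absurd 1000-year flood.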


Parameter Estimators for 3-parameter GEV distribution

1. Maximum Likelihood (ML)

2. Method of Moments (MOM)

3. Method of L-moments (LM)

4. Generalized Maximum Likelihood (GML): introduces a prior distribution for κ that ensures the estimator lies within (−0.5, +0.5) and encourages values within (−0.3, +0.1). Martins, E.S., and J.R. Stedinger, Generalized Maximum Likelihood GEV quantile estimators for hydrologic data, Water Resources Research, 36(3), 737-744, 2000.

Alternatively, one can use a penalty to enforce the constraint κ > −1: Coles, S.G., and M.J. Dixon, Likelihood-Based Inference for Extreme Value Models, Extremes, 2(1), 5-23, 1999.
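A generalized maximum likelihood fit can be sketched by adding the log of the prior on κ to the GEV log-likelihood and maximizing the sum. Assumptions, stated plainly: the prior is a beta density on (−0.5, +0.5) with exponents p = 6 and q = 9, the values usually quoted for Martins and Stedinger (2000); the optimizer is a small hand-written Nelder-Mead search (any derivative-free optimizer would do); the starting values are Gumbel moment estimates; and all names are hypothetical. This is not the authors' code.

```python
import math
import random

def gev_loglik(x, xi, alpha, kappa):
    """GEV log-likelihood in the kappa sign convention used on these slides."""
    ll = 0.0
    try:
        for v in x:
            z = (v - xi) / alpha
            if abs(kappa) < 1e-8:                      # Gumbel limit
                ll += -math.log(alpha) - z - math.exp(-z)
                continue
            y = 1.0 - kappa * z
            if y <= 0.0:                               # observation outside the support
                return -math.inf
            ll += -math.log(alpha) + (1.0 / kappa - 1.0) * math.log(y) - y ** (1.0 / kappa)
    except OverflowError:
        return -math.inf
    return ll

def nelder_mead(f, x0, step=0.1, iters=3000, tol=1e-10):
    """Minimal Nelder-Mead minimizer (reflection, expansion, contraction, shrink)."""
    d = len(x0)
    pts = [list(x0)] + [[v + (step if j == i else 0.0) for j, v in enumerate(x0)]
                        for i in range(d)]
    vals = [f(p) for p in pts]
    for _ in range(iters):
        order = sorted(range(d + 1), key=vals.__getitem__)
        pts, vals = [pts[i] for i in order], [vals[i] for i in order]
        if vals[-1] - vals[0] < tol:
            break
        cen = [sum(c) / d for c in zip(*pts[:-1])]

        def move(t):
            # Point on the line from the worst vertex through the centroid
            return [c + t * (c - w) for c, w in zip(cen, pts[-1])]

        r = move(1.0)
        fr = f(r)
        if fr < vals[0]:
            e = move(2.0)
            fe = f(e)
            pts[-1], vals[-1] = (e, fe) if fe < fr else (r, fr)
        elif fr < vals[-2]:
            pts[-1], vals[-1] = r, fr
        else:
            k = move(-0.5)
            fk = f(k)
            if fk < vals[-1]:
                pts[-1], vals[-1] = k, fk
            else:                                      # shrink toward the best vertex
                pts = [pts[0]] + [[b + 0.5 * (q - b) for b, q in zip(pts[0], p)]
                                  for p in pts[1:]]
                vals = [vals[0]] + [f(p) for p in pts[1:]]
    best = min(range(d + 1), key=vals.__getitem__)
    return pts[best]

def gml_fit(x, p=6.0, q=9.0):
    """GEV generalized MLE: log-likelihood plus log beta prior on kappa over (-0.5, 0.5)."""
    n = len(x)
    m = sum(x) / n
    s = math.sqrt(sum((v - m) ** 2 for v in x) / (n - 1))
    a0 = s * math.sqrt(6.0) / math.pi                  # Gumbel moment estimates as a start
    start = [m - 0.5772 * a0, math.log(a0), -0.1]

    def neg_log_posterior(theta):
        xi, log_alpha, kappa = theta
        if not -0.5 < kappa < 0.5:
            return math.inf
        log_prior = (p - 1.0) * math.log(0.5 + kappa) + (q - 1.0) * math.log(0.5 - kappa)
        return -(gev_loglik(x, xi, math.exp(log_alpha), kappa) + log_prior)

    xi, log_alpha, kappa = nelder_mead(neg_log_posterior, start)
    return xi, math.exp(log_alpha), kappa

# Same setting as the "Trouble with MLEs" case: n = 15 from GEV(0, 1, -0.2)
rng = random.Random(15)
sample = [(1.0 - (-math.log(rng.random())) ** -0.2) / -0.2 for _ in range(15)]
xi_g, alpha_g, kappa_g = gml_fit(sample)
x999 = xi_g + alpha_g / kappa_g * (1.0 - (-math.log(0.999)) ** kappa_g)
print(round(xi_g, 3), round(alpha_g, 3), round(kappa_g, 3), round(x999, 1))
```

Because the prior confines κ to (−0.5, +0.5) and pulls it toward typical hydrologic values, the fitted X0.999 cannot run away to the millions the way the unconstrained MLE did in the N = 15 case.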


Prior distribution on GEV κ

[Figure: prior density f(κ) for −0.5 ≤ κ ≤ 0.5.]


Performance of Alternative Estimators of x0.99 for the GEV Distribution, n = 25

[Figure: RMSE of the ML, LM, MOM and GML estimators versus shape κ from −0.3 to 0.3.]


Performance of Alternative Estimators of x0.99 for the GEV Distribution, n = 100

[Figure: RMSE of the ML, LM, MOM and GML estimators versus shape κ from −0.3 to 0.3.]


GEV Estimators

• In 1985, when Hosking, Wallis and Wood introduced L-moment (PWM) estimators for the GEV, they were much better than MLEs and quantile estimators.

• In 1998, Madsen and Rosbjerg demonstrated that MOM estimators were not so bad, perhaps better than L-moments.

• Finally, in 2000, Martins and Stedinger demonstrated that adding realistic control of the GEV shape parameter yielded estimators that dominated the competition. The prior amounts to a modest-accuracy regional description of the shape parameter.


Partial Duration or Annual Maximum Series?

By seeing more little floods, do we know more about big floods?


Partial Duration Series (PDS) / Peaks Over Threshold (POT)

[Figure: flood flows over the years of record, with the threshold above which peaks are retained.]


Poisson/Pareto model for PDS

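λ = arrival rate for floods > x0, which follow a Poisson process.

G(x) = Pr[X ≤ x] for peaks over threshold, x > x0, is a Generalized Pareto distribution:

$$G(x) = 1 - \left[ 1 - \frac{\kappa (x - x_0)}{\alpha} \right]^{1/\kappa}$$

Then annual maximums have a Generalized Extreme Value distribution:

$$F(x) = \exp\left\{ -\left[ 1 - \frac{\kappa (x - \xi)}{\alpha'} \right]^{1/\kappa} \right\}$$

with

$$\xi = x_0 + \frac{\alpha (1 - \lambda^{-\kappa})}{\kappa}, \qquad \alpha' = \alpha \lambda^{-\kappa}, \qquad \kappa \text{ the same}$$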
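The parameter mapping can be checked numerically: the GEV with the converted parameters should reproduce F(x) = exp{−λ[1 − G(x)]}, the probability that no exceedance of x arrives in a year. A minimal Python sketch for κ ≠ 0 and x ≥ x0; the parameter values and function names are hypothetical.

```python
import math

def gp_poisson_to_gev(x0, alpha, kappa, lam):
    """GEV (xi, alpha', kappa) for annual maxima of Poisson(lam) GP exceedances of x0."""
    xi = x0 + alpha * (1.0 - lam ** (-kappa)) / kappa
    return xi, alpha * lam ** (-kappa), kappa

def annual_max_cdf_pds(x, x0, alpha, kappa, lam):
    """exp{-lam [1 - G(x)]} with G the Generalized Pareto cdf; valid for x >= x0."""
    return math.exp(-lam * (1.0 - kappa * (x - x0) / alpha) ** (1.0 / kappa))

def gev_cdf(x, xi, alpha, kappa):
    """GEV cdf inside its support."""
    return math.exp(-(1.0 - kappa * (x - xi) / alpha) ** (1.0 / kappa))

x0, alpha, kappa, lam = 100.0, 30.0, -0.1, 2.5   # hypothetical PDS parameters
xi, alpha_p, _ = gp_poisson_to_gev(x0, alpha, kappa, lam)
for x in (120.0, 200.0, 400.0):
    print(x, annual_max_cdf_pds(x, x0, alpha, kappa, lam), gev_cdf(x, xi, alpha_p, kappa))
```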


Which is more precise: AMS or PDS?

$$R = \frac{\text{RMSE}\{Q_{0.99} \mid \text{AMS}\}}{\text{RMSE}\{Q_{0.99} \mid \text{PDS}\}}$$

Consider the case where only 2 parameters are estimated. Fix κ = 0, corresponding to Poisson arrivals with exponential exceedances: the Shane & Lynn (1964) model for flood risk.
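For κ = 0 the PDS estimators are explicit: λ is estimated by the number of exceedances per year and the exponential mean exceedance α by the average exceedance, and the annual-maximum distribution F(x) = exp{−λ exp[−(x − x0)/α]} inverts in closed form. A minimal Python sketch using these maximum likelihood estimators; the data and function name are hypothetical.

```python
import math

def pds_exponential_quantile(peaks, threshold, years, p):
    """Annual-maximum quantile x_p from a Poisson-exponential PDS model (kappa = 0).

    Valid for p > exp(-lam), i.e. quantiles above the threshold.
    """
    exceed = [q - threshold for q in peaks if q > threshold]
    lam = len(exceed) / years            # MLE of the Poisson arrival rate
    alpha = sum(exceed) / len(exceed)    # MLE of the mean exceedance
    # Invert F(x) = exp{-lam * exp[-(x - x0)/alpha]}
    return threshold + alpha * math.log(lam / -math.log(p))

# Hypothetical peaks above x0 = 10 observed over 3 years
print(round(pds_exponential_quantile([11.0, 12.0, 13.0], 10.0, 3.0, 0.99), 2))
```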


Poisson Arrivals with Exponential Exceedances (κ = 0)

[Figure: ratio R of the RMSE of the AMS estimator to that of the PDS estimator, for arrival rates λ from 0 to 6 per year; curves for the 10-year and 100-year floods.]


Which is more precise: AMS = GEV or PDS = GP?

$$\text{RMSE-ratio} = \frac{\text{RMSE}\{Q_{0.99} \mid \text{PDS XXX}\}}{\text{RMSE}\{Q_{0.99} \mid \text{AMS GML}\}}$$

Now estimate 3 parameters of the Generalized Pareto distribution from PDS data, employing XXX = MOM, L-moments (LM) and GML, and compare the RMSE of PDS-XXX to the RMSE of the AMS-GML GEV estimator.


RMSE of 3 PDS Estimators vs AMS-GML, λ = 5 events/year

[Figure (b): RMSE-ratio PDS/AMS-GMLE (0.0 to 2.0) versus shape parameter κ from −0.3 to +0.3, for the GML, MOM and LM PDS estimators.]


RMSE of 3 PDS Estimators vs AMS-GML, κ = −0.30

[Figure (a): RMSE-ratio PDS/AMS-GMLE (0.5 to 2.0) versus events per year (0 to 5), for the GML, MOM and LM PDS estimators, with MOM-AMS for comparison.]


Conclusions: PDS versus AMS

For κ < 0, with PDS data, GML quantile estimators are again generally better than MOM, LM and ML.

The precision of GML quantile estimators is insensitive to λ.

A year of PDS data is generally worth a year of AMS data for estimating the 100-year flood when employing GML estimators of the GP and GEV parameters: more little floods do not tell us about the distribution of large floods.


GLS Regression for Regional Analyses

GOAL: Obtain efficient estimators of the mean, standard deviation, T-year flood, or GEV parameters as a function of physiographic basin characteristics, and provide the precision of that estimator.

MODEL:

$$\log[\text{Statistic-of-interest}] = \beta_0 + \beta_1 \log(\text{Area}) + \beta_2 \log(\text{Slope}) + \dots + \text{Error}$$


GLS Analysis: Complications

With available records, we only obtain sample estimates ŷi of the statistic-of-interest yi.

Total error εi is a combination of:

(i) time-sampling error ηi in the sample estimators ŷi, which are often cross-correlated, and

(ii) underlying model error δi (true lack of fit).

The variance of those errors about the prediction Xβ depends on the statistics-of-interest at each site.

$$\hat{y} = y + \eta = X\beta + \delta + \eta$$

[Figure: the prediction Xβ, the model error δ, the sampling error η, and the total error ε = δ + η.]


GLS for Regionalization

Use available record lengths ni, concurrent record lengths mij, regional estimates of standard deviations σi (or τ2i and τ3i), and cross-correlations ρij of floods to estimate the variances and cross-correlations of the sampling errors η in ŷi.

With the true model error variance σδ², determine the covariance matrix Λ(σδ²) of the residual errors:

$$\Lambda(\sigma_\delta^2) = \sigma_\delta^2 I + \Sigma(\hat{y})$$

where Σ(ŷ) is the covariance matrix of the estimator ŷ.


GLS Analysis: Solution

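GLS regression model (Stedinger & Tasker, 1985, 1989):

$$\hat{y} = X\beta + \varepsilon$$

with parameter estimator b given by

$$\left[ X^T \Lambda(\sigma_\delta^2)^{-1} X \right] b = X^T \Lambda(\sigma_\delta^2)^{-1} \hat{y}$$

The model error variance can be estimated by the method of moments, solving

$$(\hat{y} - Xb)^T \Lambda(\sigma_\delta^2)^{-1} (\hat{y} - Xb) = n - k$$

where

$$\Lambda(\sigma_\delta^2) = \sigma_\delta^2 I + \Sigma(\hat{y})$$

n = dimension of ŷ; k = dimension of b.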
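The moment equation can be solved by bisection, because the generalized residual sum of squares can only decrease as σδ² increases; when it is already below n − k at σδ² = 0, the estimate is zero. A minimal pure-Python sketch, assuming a small problem (dense Gaussian elimination, no numerical safeguards); the function names and data are hypothetical.

```python
def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, row)) + [float(r)] for row, r in zip(A, rhs)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x

def gls_fit(X, y, Sigma, s2):
    """GLS coefficients b and (y - Xb)' Lambda^-1 (y - Xb), with Lambda = s2*I + Sigma."""
    n, k = len(X), len(X[0])
    Lam = [[Sigma[i][j] + (s2 if i == j else 0.0) for j in range(n)] for i in range(n)]
    LX = [solve(Lam, [X[i][c] for i in range(n)]) for c in range(k)]  # columns of Lambda^-1 X
    Ly = solve(Lam, y)
    XtLX = [[sum(X[i][a] * LX[c][i] for i in range(n)) for c in range(k)] for a in range(k)]
    XtLy = [sum(X[i][a] * Ly[i] for i in range(n)) for a in range(k)]
    b = solve(XtLX, XtLy)
    r = [y[i] - sum(X[i][c] * b[c] for c in range(k)) for i in range(n)]
    return b, sum(ri * li for ri, li in zip(r, solve(Lam, r)))

def mm_gls(X, y, Sigma, tol=1e-12):
    """Method-of-moments model error variance and the corresponding GLS coefficients."""
    target = len(X) - len(X[0])                  # n - k
    b, q = gls_fit(X, y, Sigma, 0.0)
    if q <= target:
        return 0.0, b                            # no positive root: estimate is zero
    lo, hi = 0.0, 1.0
    while gls_fit(X, y, Sigma, hi)[1] > target:
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if gls_fit(X, y, Sigma, mid)[1] > target:
            lo = mid
        else:
            hi = mid
    s2 = 0.5 * (lo + hi)
    return s2, gls_fit(X, y, Sigma, s2)[0]

# Hypothetical data: intercept plus one explanatory variable, independent sampling errors
X = [[1.0, float(t)] for t in range(6)]
Sigma = [[0.01 if i == j else 0.0 for j in range(6)] for i in range(6)]
y = [1.0, 3.5, 4.2, 7.9, 8.1, 11.6]
s2, b = mm_gls(X, y, Sigma)
print(round(s2, 4), [round(v, 4) for v in b])
```

Cross-correlated sampling errors enter only through the off-diagonal entries of Sigma; the rest of the calculation is unchanged.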


Likelihood Function for the Model Error Variance (Tibagi River, Brazil, n = 17)

[Figure: likelihood versus model error variance σδ² from 0 to 0.40.]

The maximum of the likelihood may be at zero, but larger values are very probable.

Zero is clearly not in the middle of the likely range of values.

The method of moments has the same problem: a zero estimate.


Advantages of Bayesian Analysis

Provides the posterior distribution of the parameters β and the model error variance σδ², and the predictive distribution for the dependent variable.

A Bayesian approach is a natural solution to the problem.


Bayesian GLS Model

Prior distribution: ξ(β, σδ²)

- Parameters β are multivariate normal, ξ(β).

- Model error variance σδ²: exponential distribution, ξ(σδ²); E[σδ²] = … = 24

Likelihood function:

Assume the data ŷ are multivariate N[Xβ, Λ(σδ²)].


Quasi-Analytic Bayesian GLS

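Joint posterior distribution:

$$p(\beta, \sigma_\delta^2 \mid D) = C\, \ell(D \mid \beta, \sigma_\delta^2)\, \xi(\beta, \sigma_\delta^2)$$

Marginal posterior of σδ²:

$$p(\sigma_\delta^2 \mid D) = \int p(\beta, \sigma_\delta^2 \mid D)\, d\beta = C\, f(\sigma_\delta^2 \mid D)$$

where the normal likelihood and normal prior are integrated analytically to determine f in closed form.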


Example of a Posterior of the Model Error Variance (Model 1, Tibagi, Brazil, n = 17)

[Figure: marginal posterior, prior and likelihood of the model error variance σδ² over 0 to 0.40.]

Estimates of σδ²: MM-GLS 0.000; MLE-GLS 0.000; Bayesian GLS 0.046.


Quasi-Analytic Result

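From the joint posterior distribution

$$p(\beta, \sigma_\delta^2 \mid D) = p(\beta \mid D, \sigma_\delta^2)\, p(\sigma_\delta^2 \mid D)$$

one can compute the marginal posterior of β,

$$p(\beta \mid D) = \int p(\beta \mid D, \sigma_\delta^2)\, p(\sigma_\delta^2 \mid D)\, d\sigma_\delta^2,$$

and its moments by 1-dimensional numerical integrations.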
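For the simplest case, a constant-only model with independent sampling errors, the 1-dimensional integration can be sketched directly. Assumptions: Σ(ŷ) is diagonal; β has a flat (diffuse) prior in place of the multivariate normal; the exponential prior on σδ² has a hypothetical mean prior_mean; and the integral uses the trapezoid rule on a grid up to 20 prior means. Names and data are hypothetical. With a flat prior on β, integrating β out leaves p(σδ²|D) proportional to ξ(σδ²) |Λ|^(−1/2) |XᵀΛ⁻¹X|^(−1/2) exp{−½(ŷ − Xb)ᵀΛ⁻¹(ŷ − Xb)}, with b the GLS estimate.

```python
import math

def log_post_s2(s2, y, v, prior_mean):
    """Log marginal posterior of the model error variance (up to a constant).

    Constant-only model, diagonal sampling variances v, flat prior on beta,
    exponential prior on s2 with mean prior_mean.
    """
    w = [1.0 / (s2 + vi) for vi in v]                # diagonal of Lambda^-1
    sw = sum(w)
    b = sum(wi * yi for wi, yi in zip(w, y)) / sw     # GLS estimate of the constant
    quad = sum(wi * (yi - b) ** 2 for wi, yi in zip(w, y))
    return (-s2 / prior_mean                          # exponential prior
            + 0.5 * sum(math.log(wi) for wi in w)     # |Lambda|^(-1/2)
            - 0.5 * math.log(sw)                      # |X' Lambda^-1 X|^(-1/2)
            - 0.5 * quad)

def posterior_mean_s2(y, v, prior_mean, m=4000):
    """Posterior mean of s2 by the trapezoid rule on [0, 20 * prior_mean]."""
    upper = 20.0 * prior_mean
    h = upper / m
    grid = [j * h for j in range(m + 1)]
    logp = [log_post_s2(s, y, v, prior_mean) for s in grid]
    top = max(logp)
    dens = [math.exp(lp - top) for lp in logp]        # rescaled to avoid underflow
    norm = h * (sum(dens) - 0.5 * (dens[0] + dens[-1]))
    first = h * (sum(s * d for s, d in zip(grid, dens)) - 0.5 * grid[-1] * dens[-1])
    return first / norm

# Hypothetical at-site estimates whose spread is smaller than their sampling error
y = [0.40, 0.42, 0.41, 0.43]
v = [0.002] * 4
pm = posterior_mean_s2(y, v, prior_mean=0.01)
print(round(pm, 5))
```

Here the method-of-moments equation has no positive root (the weighted residual sum of squares at σδ² = 0 is 0.25, below n − k = 3), so MM-GLS would report zero, while the posterior mean is positive; it also falls below the prior mean, because these data favor small values of σδ².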


Bayesian GLS for Regionalization of Flood Characteristics in Korea

Dae Il Jeong, Post-doctoral Researcher, Cornell University

Jery R. Stedinger, Professor, Cornell University

Young-Oh Kim, Associate Professor, Seoul National University

Jang Hyun Sung, Graduate Student, Seoul National University


Korean River basins

• Land area: 120,000 km²

Major river basins: Han, Nakdong, Geum

• Total annual precipitation (TAP) = 1283 mm

Two-thirds of TAP occurs during the 3-month flood season (Jul–Sep)

• Available sites: 31; average record length: 22 years

[Map: Han, Nakdong and Geum River basins.]


Korean Application

Regional estimators of L-CV τ2 and L-CS τ3 for flood frequency analysis using the GEV distribution

6 explanatory variables:

• 2 indicators (Han, Nakdong, Geum basins)

• log of drainage area

• log of channel slope

• mean precipitation

• SD of annual maximum precipitation


Cross-Correlation of Concurrent Maxima

[Figure: correlation coefficient ρij of concurrent annual maxima (−0.4 to 1.0) versus distance dij between stations (0 to 300 km), with a fitted curve that decays exponentially with dij using the constants 0.45 and 80 km.]


Monte Carlo Results for Cross-Correlation of L-CS Estimators

GEV+ when κ = −0.3 and τ2 = 0.3

[Figure: cross-correlation ρ(τ̂3x, τ̂3y) of L-CS estimators (−1.0 to 1.0) versus cross-correlation ρxy of annual maxima (−0.8 to 1.0); Series 1, 2 and 3.]

$$\hat{\rho}(\hat{\tau}_{k,i}, \hat{\tau}_{k,j}) = cf_{ij}\; a\, |\hat{\rho}_{ij}|^{b}$$


Regression Results: L-CV

| Model | Const. | Ln(Area) | Mean Ppt | Model error var. | Avg. sampling var. | AVP_GLS | Pseudo R² (%) | ERL (years) |
|---|---|---|---|---|---|---|---|---|
| B-GLS0 | 0.4178 (0.0306) | | | 0.0077 (0.0033) | 0.0009 | 0.0087 | 0 | 14 |
| B-GLS2 | 0.4220 (0.0285) | -0.0416 (0.0116) [0.1 %] | -0.1307 (0.0522) [1.3 %] | 0.0043 (0.0021) | 0.0015 | 0.0057 | 45 | 21 |

Standard error in parentheses ( - ); p-value in brackets [ - ].


Performance Measures

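Average Variance of Prediction (AVP): how well the model estimates the true value of the quantity of interest, on average across sites

$$AVP_{GLS} = \hat{\sigma}_\delta^2 + \frac{1}{N} \sum_{i=1}^{N} x_i \left( X^T \Lambda^{-1} X \right)^{-1} x_i^T$$

Pseudo R²: improvement of GLS(k) versus GLS(0)

$$R_{GLS}^2 = \frac{n[\hat{\sigma}_\delta^2(0) - \hat{\sigma}_\delta^2(k)]}{n\,\hat{\sigma}_\delta^2(0)} = 1 - \frac{\hat{\sigma}_\delta^2(k)}{\hat{\sigma}_\delta^2(0)}$$

Effective Record Length (ERL): relative uncertainty of the regional estimate compared to an at-site estimator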
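The pseudo R² can be checked against the regression tables on these slides, and AVP has a closed form for a constant-only model with independent sampling errors. A minimal Python sketch; the function names are hypothetical, the diagonal-Λ simplification is an assumption, and small differences from the tabulated percentages reflect rounding of the tabulated variances.

```python
def pseudo_r2(s2_k, s2_0):
    """R^2_GLS = 1 - sigma_delta^2(k) / sigma_delta^2(0)."""
    return 1.0 - s2_k / s2_0

def avp_constant_model(s2, v):
    """AVP = s2 + (1/N) sum_i x_i (X' Lambda^-1 X)^-1 x_i' with X a column of ones.

    With diagonal Lambda, every x_i (X' Lambda^-1 X)^-1 x_i' equals 1 / sum_i 1/(s2 + v_i).
    """
    return s2 + 1.0 / sum(1.0 / (s2 + vi) for vi in v)

# Model error variances reported for the Korean B-GLS models
print(f"L-CV pseudo R2: {pseudo_r2(0.0043, 0.0077):.0%}")
print(f"L-CS pseudo R2: {pseudo_r2(0.0060, 0.0094):.0%}")
```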


Regression Results: L-CS τ3

| Model | Const. | Ln(Area) | Model error var. | Avg. sampling var. | AVP | Pseudo R² (%) | ERL (years) |
|---|---|---|---|---|---|---|---|
| B-GLS0 | 0.3402 (0.0538) | | 0.0094 (0.0056) | 0.0029 | 0.0123 | 0 | 39 |
| B-GLS1 | 0.3405 (0.0489) | -0.0535 (0.0183) [0.6 %] | 0.0060 (0.0044) | 0.0035 | 0.0094 | 37 | 51 |

Standard error in parentheses ( - ); p-value in brackets [ - ].


Model Diagnostic Measures

Pseudo ANOVA table

- Variation explained by regional model

- Residual variation due to model errors

- Residual variation due to sampling errors

- Represents partition of TOTAL variation


Pseudo ANOVA Table for L-CV and L-CS

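| Source | Degrees of freedom | Sum of squares (equation) | L-CV | L-CS |
|---|---|---|---|---|
| Model | k = 1 or 2 | n[σ̂δ²(0) − σ̂δ²(k)] | 0.108 | 0.106 |
| Model error δ | n − k − 1 | n σ̂δ²(k) | 0.132 | 0.185 |
| Sampling error η | n | Σ Var(ŷi) | 0.156 | 0.624 |
| Total | 2n − 1 | n σ̂δ²(0) + Σ Var(ŷi) | 0.396 | 0.916 |
| EVR | | | 1.18 | 3.38 |
| MBV | | | 2.89 | 3.76 |
| Pseudo R² | | | 45 % | 37 % |
| ERL (years) | | | 21 | 51 |

$$EVR = \frac{\frac{1}{n}\sum_{i=1}^{n} \operatorname{Var}(\hat{y}_i)}{\hat{\sigma}_\delta^2(k)}$$

$$MBV = \frac{w^T \Lambda(\hat{\sigma}_\delta^2)\, w}{\sum_{i=1}^{n} w_i}, \quad \text{where } w_i = 1/\Lambda_{ii}$$

We need GLS regression analysis.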


Conclusion: Value in Korea

• The regional estimator for the L-coefficient of variation should be combined with its at-site estimator: ERL(τ2) = 21 years ≈ average record length (22 yrs).

• The regional estimator for L-skewness was more precise than at-site estimators: ERL(τ3) = 51 years > average record length (22 yrs).

It is clearly advantageous to use BOTH regional and at-site information in the analysis of annual maxima.


Diagnostic Statistics

Statistics for evaluating data concerns, precision of predicted values, sources of variation, and model adequacy:

• Leverage and influence

• Measures of prediction precision

• Pseudo R² and ANOVA

• Modeling diagnostics: EVR & MBV

• Bayesian plausibility level


Bayesian Hierarchical Model: Solve the Whole Problem at Once?

Assume for each site i, i = 1, …, K:

X_it ~ GEV(ξi, αi, κi), t = 1, …, ni

where for the parameters we have

κi ~ N(µκ, σκ²), ξi ~ N(µξ, σξ²), and αi ~ N(µα, σα²), where perhaps αi/ξi or the coefficient of variation is modeled instead,

with priors on µκ, µξ and µα, whose values for each site i may depend on the physiographic characteristics of that site.

This ignores cross-correlations: do we need a multivariate model for the K variates? Beware of special cases and lack of fit.


Concluding Remarks

• The GEV distribution is used by many water agencies and countries to describe the distribution of extremes.

• L-moments provide simple estimators, but they are not efficient.

• Generalized Maximum Likelihood Estimators [GMLEs] (modest prior on κ) solve the problems with MLEs and were the most precise.

• PDS (GPD-Poisson) is no better than AMS (GEV) when estimating three parameters with GMLE.


Final Comments

• Regional regression procedures should account for the precision of at-site estimators and their cross-correlations, as can be done with Generalized Least Squares regression. Otherwise, estimates of model accuracy and of the precision of parameter estimates will be in error.

• When the model error variance is small relative to errors in the estimated hydrologic statistics, the Bayesian model error variance estimator is particularly attractive.


Hosking and Wallis (1997)

We can do better than simple index flood procedures that everywhere use regional average L-CV τ2 and L-CS τ3 values.


Conclusion: Applicability of GLS

• Developed a Bayesian Generalized Least Squares modeling framework to analyze regional information addressing distribution parameters, recognizing
  – sampling error in at-site estimators as a function of record length, cross-correlation of concurrent events, and concurrent record lengths, and
  – regional model error (true precision of the regional model).

• Developed regression models for L-CV and L-CS of Korean annual maximum floods using B-GLS analysis.


Background Reading

Stedinger, J.R., Flood Frequency Analysis and Statistical Estimation of Flood Risk, Chapter 12 in Inland Flood Hazards: Human, Riparian and Aquatic Communities, E.E. Wohl (ed.), Cambridge University Press, Cambridge, United Kingdom, 2000.

Flood Frequency References

Hosking, J.R.M., L-Moments: Analysis and Estimation of Distributions Using Linear Combinations of Order Statistics, J. of the Royal Statistical Society, Series B, 52(2), 105-124, 1990.

Hosking, J.R.M., and J.R. Wallis, Regional Frequency Analysis: An Approach Based on L-moments, Cambridge University Press, 1997.

Martins, E.S., and J.R. Stedinger, Generalized Maximum Likelihood GEV quantile estimators for hydrologic data, Water Resources Research, 36(3), 737-744, 2000.

Martins, E.S., and J.R. Stedinger, Generalized Maximum Likelihood Pareto-Poisson Flood Risk Analysis for Partial Duration Series, Water Resources Research, 37(10), 2559-2567, 2001.

Stedinger, J.R., and L. Lu, Appraisal of Regional and Index Flood Quantile Estimators, Stochastic Hydrology and Hydraulics, 9(1), 49-75, 1995.


GLS References

Griffis, V.W., and J.R. Stedinger, The Use of GLS Regression in Regional Hydrologic Analyses, J. of Hydrology, 344(1-2), 82-95, 2007 [doi:10.1016/j.jhydrol.2007.06.023].

Gruber, A.M., D.S. Reis Jr., and J.R. Stedinger, Models of Regional Skew Based on Bayesian GLS Regression, Paper 40927-3285, World Environ. & Water Resour. Conf. - Restoring our Natural Habitat, K.C. Kabbes (ed.), Tampa, FL, May 15-18, 2007.

Jeong, D.I., J.R. Stedinger, Y.O. Kim, and J.H. Sung, Bayesian GLS for Regionalization of Flood Characteristics in Korea, Paper 40927-2736, World Environ. & Water Resour. Conf. - Restoring our Natural Habitat, Tampa, FL, May 15-18, 2007.

Martins, E.S., and J.R. Stedinger, Cross-correlation among estimators of shape, Water Resources Research, 38(11), doi:10.1029/2002WR001589, 26 November 2002.

Reis, D.S., Jr., J.R. Stedinger, and E.S. Martins, Bayesian generalized least squares regression with application to log Pearson type 3 regional skew estimation, Water Resources Research, 41, W10419, doi:10.1029/2004WR003445, 2005.

Stedinger, J.R., and G.D. Tasker, Regional Hydrologic Analysis, 1. Ordinary, Weighted and Generalized Least Squares Compared, Water Resources Research, 21(9), 1421-1432, 1985.

Tasker, G.D., and J.R. Stedinger, Estimating Generalized Skew With Weighted Least Squares Regression, J. of Water Resources Planning and Management, 112(2), 225-237, 1986.

Tasker, G.D., and J.R. Stedinger, An Operational GLS Model for Hydrologic Regression, J. of Hydrology, 111(1-4), 361-375, 1989.


Pseudo R2 for GLS

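We are not interested in the total error, which includes sampling error that the model cannot explain. Traditional adjusted R²:

$$R^2 = 1 - \frac{\text{unexplained error variance}}{\text{total variance}}$$

How much of the critical model error can we explain, where Var[δ] = σδ²(k) for a model with k parameters? Consider the GLS model:

$$\hat{y} = y + \eta = X\beta + \delta + \eta$$

$$R_{GLS}^2 = 1 - \frac{\hat{\sigma}_\delta^2(k)}{\hat{\sigma}_\delta^2(0)}, \qquad R_{OLS}^2 = 1 - \frac{s^2}{s_Y^2}$$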


Pseudo ANOVA Table

| Source | Degrees of freedom | Estimator |
|---|---|---|
| Model | k | n[σ̂δ²(0) − σ̂δ²(k)] |
| Model error | n − k − 1 | n σ̂δ²(k) |
| Sampling error | n | Σ Var(ŷi) |
| Total | 2n − 1 | n σ̂δ²(0) + Σ Var(ŷi) |


Modeling Diagnostics

To evaluate whether OLS might be sufficient, consider the Error Variance Ratio (EVR).

If EVR > 20%, then sampling error in the estimators of y is potentially an important fraction of the observed total error ε = δ + η.

Do we need WLS or GLS to correctly analyze these data?

$$EVR = \frac{\text{Average}\{\operatorname{Var}[\eta_i]\}}{\operatorname{Var}[\delta]} = \frac{\sum_{i=1}^{n} \operatorname{Var}(\hat{y}_i)}{n\,\hat{\sigma}_\delta^2(k)}$$


Modeling Diagnostics

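EVR > 20% suggests a need for WLS or GLS. But when is the cross-correlation so large that a GLS analysis is needed?

Misrepresentation of Beta Variance (MBV)

Describes the error made by WLS in its evaluation of the precision of the estimator b0 of the constant term.

$$MBV = \frac{\operatorname{Var}[b_0^{WLS} \mid \text{GLS analysis}]}{\operatorname{Var}[b_0^{WLS} \mid \text{WLS analysis}]} = \frac{w^T \Lambda w}{\sum_{i=1}^{n} w_i}, \quad \text{where } w_i = \frac{1}{\Lambda_{ii}}$$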
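Both diagnostics are simple functions of the sampling variances and the covariance matrix Λ. A minimal Python sketch; the function names are hypothetical, and the first check reuses the sums of squares from the pseudo ANOVA table for L-CV (n = 31 sites), for which EVR = 0.156/0.132.

```python
def evr(sampling_vars, s2_model):
    """Error Variance Ratio: average sampling variance over model error variance."""
    return (sum(sampling_vars) / len(sampling_vars)) / s2_model

def mbv(Lam):
    """Misrepresentation of Beta Variance for the WLS constant, with w_i = 1/Lambda_ii."""
    n = len(Lam)
    w = [1.0 / Lam[i][i] for i in range(n)]
    num = sum(w[i] * Lam[i][j] * w[j] for i in range(n) for j in range(n))
    return num / sum(w)

# L-CV pseudo ANOVA: sampling SS 0.156 and model error SS 0.132 over n = 31 sites
n = 31
print(round(evr([0.156 / n] * n, 0.132 / n), 2))

# Three sites with unit variances: independent versus cross-correlated (rho = 0.5)
indep = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
corr = [[1.0 if i == j else 0.5 for j in range(3)] for i in range(3)]
print(mbv(indep), mbv(corr))
```

With independent errors WLS describes its own precision correctly (MBV = 1); with ρ = 0.5 among three equally precise sites, the true variance of the WLS constant is twice what the WLS analysis reports.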


OLS, WLS and GLS for L-CS

| Model | Const. | Ln(Area) | Model error var. | Average sampling var. | AVP_new | Pseudo R² (%) | ERL (years) |
|---|---|---|---|---|---|---|---|
| OLS1 | 0.3679 (0.0267) | -0.0472 (0.0181) | 0.0221 | 0.0014 | 0.0235 | 16 | 21 |
| B-WLS1 | 0.3792 (0.0261) | -0.0492 (0.0206) | 0.0059 (0.0047) | 0.0016 | 0.0074 | 31 | 65 |
| B-GLS1 | 0.3405 (0.0489) | -0.0535 (0.0188) | 0.0060 (0.0044) | 0.0035 | 0.0094 | 37 | 51 |

Standard error in parentheses ( - ).