Marginal modelling of spatially-dependent non …ucakpjn/TALKS/RSS_2Dec2010.pdfMarginal modelling of...

Preview:

Citation preview

Marginal modelling of spatially-dependentnon-stationary extremes using threshold modelling

Paul NorthropUniversity College London

paul@stats.ucl.ac.uk

Environmental ExtremesRoyal Statistical Society

2nd December 2010

1/29

Outline

Wave height data

Spatial non-stationarity and dependence

Thresholds for non-stationary extremes

Model parameterisation

Theoretical and simulation studies

Wave height data

2/29

Wave height hindcasts from the Gulf of Mexico

Data supplied by Philip Jonathan at Shell Research UK.

Hindcasts of Y storm peak significant wave height (in metres)in the Gulf of Mexico.

wave height: trough to the crest of the wave.significant wave height: the average of the largest 1/3 waveheights. A measure of sea surface roughness.storm peak: largest value from each storm (cf. declustering).

a 6 × 12 grid of 72 sites (≈ 14 km apart).

Sep 1900 to Sep 2005 : 315 storms in total.

average of 3 observations (storms) per year, at each site.

Aim: quantify the extremal behaviour of Y at each site, makingappropriate adjustment for spatial dependence.

3/29

Spatial dependence

●● ●

●●

●●

●●

●●

● ●

●●

●●

● ●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

● ●

●●

0 2 4 6 8 10 12

0

2

4

6

8

10

12

14

2 distant sites

Hssp / m, at site 1

Hssp

/ m

, at s

ite 7

2

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●

●●

●●

●●●

●●●

●●

●●

●●

●●

●●●

●●

●●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

0 5 10 15

0

5

10

15

2 nearby sites

Hssp / m, at site 30

Hssp

/ m

, at s

ite 3

1●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

4/29

Spatial non-stationarity

latitude

12

34

5

6

longitude

2

4

6

810

12

storm peak significant w

ave height / m

13

14

15

16

5/29

Modelling issues

Spatial non-stationarity.

Simple approach: model spatial effects on EV parameters asLegendre polynomials in longitude and latitude.More flexible approaches are possible.

Use a threshold that varies over space?

Spatial dependence.

Estimate parameters assuming conditional independence ofresponses given covariate values.Adjust standard errors etc. for spatial dependence.

6/29

Extreme value regression model

Conditional on covariates xij exceedances over a high thresholdu(xij) follow a 2-dimensional non-homogeneous Poisson process.

If responses Yij , i = 1, . . . , 72 (space), j = 1, . . . , 315 (storm) areconditionally independent:

L(θ) =315∏j=1

72∏i=1

exp

{− 1

λ

[1 + ξ(xij)

(u(xij)− µ(xij)

σ(xij)

)]−1/ξ(xij )+

}

×315∏j=1

∏i :yij>u(xij )

1

σ(xij)

[1 + ξ(xij)

(yij − µ(xij)

σ(xij)

)]−1/ξ(xij )−1+

.

λ : mean number of observations per year;µ(xij), σ(xij), ξ(xij) : GEV parameters of annual maxima at xij ;θ : vector of all model parameters:

7/29

Covariate-dependent thresholds

Arguments for:

Asymptotic justification for EV regression model : thethreshold u(xij) needs to be high for each xij .

Design : spread exceedances across a wide range of covariatevalues.

Set u(xij) so that P(Y > u(xij)), is approx. constant for all xij .

Set u(xij) by trial-and-error or by discretising xij , e.g. differentthreshold for different locations, months etc.

Quantile regression (QR) : model quantiles of a response Yas a function of covariates.

8/29

Constant threshold

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●●

●●

●●●●●

●●

●●

●●●

●●

●●●

●●●●●

●●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

estimate of 90% quantile

x

Y

9/29

Quantile regression

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●●

●●

●●●●●

●●

●●

●●●

●●

●●●

●●●●●

●●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

estimate of 90% quantile

x

Y

10/29

Model parameterisation

Let p(xij) = P(Yij > u(xij)). Then, if ξ(xij) = ξ is constant,

p(xij) ≈1

λ

[1 + ξ

(u(xij)− µ(xij)

σ(xij)

)]−1/ξ.

If p(xij) = p is constant then

u(xij) = µ(xij) + c σ(xij).

The form of u(xij) is determined by the extreme value model:

if µ(xij) and/or σ(xij) are linear in xij : linear QR;

if log(µ(xij) and/or log(σ(xij) is linear in xij : non-linear QR.

11/29

Theoretical study (with Nicolas Attalides)

Data-generating process: for covariate values x1, . . . , xn

Yi | X = xiindep∼ GEV (µ0 + µ1 xi , σ, ξ).

Set thresholdu(x) = u0 + u1 x .

Vary u1, set u0 so that the expected proportion of exceedances iskept constant at p.

Calculate Fisher expected information for (µ0, µ1, σ, ξ).

Invert to find asymptotic V-C of MLEs µ0, µ1, σ, ξ and hencevar(µ1).

Find the value of u1 that minimises var(µ1).

12/29

Preliminary findings

Let u1 be the value of u1 that minimises var(µ1).

If covariate values x1, . . . , xn are symmetrically distributedthen u1 = µ1 (quantile regression).

13/29

µ1 = 1 : symmetric x

0.0 0.5 1.0 1.5 2.0

0.16

0.18

0.20

0.22

0.24

u1

var(µ

1)

14/29

Preliminary findings

Let u1 be the value of u1 that minimises var(µ1).

If covariate values x1, . . . , xn are symmetrically distributedthen u1 = µ1 (quantile regression).

If x1, . . . , xn are positive (negative) skew then u1 < µ1(u1 > µ1).

15/29

µ1 = 1 : positive skew x (coeff. of skewness = 1)

0.0 0.5 1.0 1.5 2.0

0.15

0.20

0.25

0.30

u1

var(µ

1)

16/29

Preliminary findings

Let u1 be the value of u1 that minimises var(µ1).

If covariate values x1, . . . , xn are symmetrically distributedthen u1 = µ1 (quantile regression).

If x1, . . . , xn are positive (negative) skew then u1 < µ1(u1 > µ1).

. . . but the loss in efficiency from using u1 = µ1 is small.

Extensions:

More general models.

Effect of model mis-specification due to low threshold;

17/29

Adjustment for spatial dependence (for wave height data)

Independence log-likelihood:

lIND(θ) =k∑

j=1

72∑i=1

log fij(yij ; θ) =k∑

j=1

lj(θ).

(storms) (space)

In regular problems, as k →∞,

θ → N(θ0, I−1 V I−1),

I = Fisher expected information: −E(∂2

∂θ2lIND(θ0)

);

V = var(∂∂θ lIND(θ)

)=∑j

var (Uj(θ0)) =∑j

E(U2j (θ0)

),

where

Uj(θ) =∂lj(θ)

∂θ.

18/29

Adjustment of lIND(θ)

Estimate

I by Fisher observed information, evaluated at θ;

V byk∑

j=1

U2j

(θ)

.

Let HA =(−I−1 V I−1

)−1and HI = −I .

Chandler and Bate (2007):

lADJ(θ) = lIND(θ) +(θ − θ)′HA(θ − θ)

(θ − θ)′HI (θ − θ)

(lIND(θ)− lIND(θ)

),

Adjust lIND(θ) so that its Hessian is HA at θ rather than HI .

This adjustment preserves the usual asymptotic distribution ofthe likelihood ratio statistic.

19/29

Simulation study

30 years of daily data on a spatial grid.

Spatial dependence : mimics that of wave height data.

Temporal dependence : moving maxima : extremal index 1/2.

Spatial variation: location µ linear in longitude and latitude.

ξ: −0.2, 0.1, 0.4, 0.7.

Thresholds: 90th, 95th, 99th percentiles.

SE adjustment: data from distinct years are independent.

Simulations with no covariate effects and/or no spatialdependence for comparison.

20/29

Findings of simulation study

Slight underestimation of standard errors : uncertainty inthreshold ignored.

Uncertainties in covariate effects of threshold are negligiblecompared to the uncertainty in the level of the threshold.

Estimates of regression effects from QR and EV models arevery close : both estimate extreme quantiles from the samedata.

To a large extent fitting the EV model accounts foruncertainty in the covariate effects at the level of thethreshold.

21/29

Summary of modelling of wave height data

Threshold selection:

Choice of p: look for stability in parameter estimates.

Based on µ (and u) quadratic in longtiude and latitude, σ andξ constant . . .

22/29

Threshold selection : µ intercept

● ● ● ● ● ● ● ● ● ● ●●

● ● ●●

●●

probability of exceedance

0.5 0.4 0.3 0.2 0.1

2.5

3.0

3.5

4.0

4.5

5.0

µ0

23/29

Threshold selection : µ coefficient of latitude

● ●

●●

● ●●

●●

●●

●●

probability of exceedance

0.5 0.4 0.3 0.2 0.1

−0.

35−

0.30

−0.

25−

0.20

−0.

15−

0.10

−0.

05

µ2

24/29

Threshold selection : ξ

●●

●● ● ● ● ●

●●

●●

probability of exceedance

0.5 0.4 0.3 0.2 0.1

−0.

10.

00.

10.

20.

30.

4

ξ

25/29

Summary of modelling of wave height data

Choice of p: look for stability in parameter estimates.Use p = 0.4.

Model diagnostics : slight underestimation at very high levels,but consistent with estimated sampling variability.

QR model and EV model agree closely.

ξ = 0.066, with 95% confidence interval (−0.052, 0.223).

Estimated 200 year return level at (long=7, lat=1) is 15.78mwith 95% confidence interval (12.90, 22.28)m.

26/29

Conditional 200 year return levels

latitude

12

34

5

6

longitude

2

4

6

810

12

200 year return level / m 15.2

15.4

15.6

27/29

Conclusions

Quantile regression:

a simple and effective strategy to set thresholds fornon-stationary EV models;

supported by simulation study;

theoretical work is on-going;

Kysely, J., et al. (2010) use quantile regression to set atime-dependent threshold for peaks-over-threshold GP modelling ofdata simulated from a climate model.

Spatial dependence

adjust inferences for dependence; or

model dependence explicitly.

28/29

References

Chandler, R. E. and Bate, S. B. (2007) Inference for clustered datausing the independence loglikelihood. Biometrika 94 (1), 167–183.

Kysely, J., Picek, J. and Beranova, R. (2010) Estimating extremesin climate change simulations using the peaks-over-thresholdmethod with a non-stationary threshold Global and PlanetaryChange, 72, 55-68.

Northop, P. J. and Jonathan, P. Threshold modelling ofspatially-dependent non-stationary extremes with application tohurricane-induced wave heights. Revisions completed forEnvironmetrics.

Thank you for your attention.

29/29

Recommended