Extremal cluster characteristics of a regime switching model, with
hydrological applications
Péter Elek, Krisztina Vasas and András Zempléni
Eötvös Loránd University, [email protected]
4th Conference on Extreme Value Analysis, Gothenburg, 2005
Contents
• Outline of EVT for stationary series
– extremal index
– limiting cluster size distribution (e.g. distribution of flood length)
– distribution of aggregate excesses (e.g. distribution of flood volume)
• Two models:
– a light-tailed conditionally heteroscedastic model
– a regime switching autoregressive model
• Extremal behaviour of the regime switching model
• Application to the study of flood dynamics
Extremal index
• Conditions $D(u_n)$ or $\Delta(u_n)$ are always assumed.
• A stationary series has extremal index $\theta$ if there exists a real sequence $u_n$ for which
– $n(1-F(u_n)) \to \tau$
– $P(M_{1,n} \le u_n) \to \exp(-\theta\tau)$
– where $M_{1,n} = \max(X_1, X_2, \ldots, X_n)$
• Under $D(u_n)$ the extremal index can be estimated as:
$\theta = \lim_{n\to\infty} P(M_{1,p(n)} \le u_n \mid X_0 > u_n)$
• where $p(n)$ is an appropriately increasing sequence
• $p(n)$ is regarded as the cluster size
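A minimal Python sketch of the empirical counterpart of this conditional probability: among the time points where the series exceeds u, it counts the fraction for which the next p observations all stay at or below u. The function name `theta_hat` and the toy series are illustrative assumptions, not part of the talk.

```python
def theta_hat(x, u, p):
    """Empirical version of P(max(X_1..X_p) <= u | X_0 > u)."""
    hits = 0    # exceedances followed by p values at or below u
    total = 0   # exceedances that have a full window of p later values
    for t in range(len(x) - p):
        if x[t] > u:
            total += 1
            if max(x[t + 1:t + 1 + p]) <= u:
                hits += 1
    return hits / total if total else float("nan")


# Toy series (hypothetical data): three clusters of sizes 2, 1 and 3.
series = [0, 5, 6, 0, 0, 7, 0, 0, 8, 9, 9, 0, 0]
print(theta_hat(series, u=4, p=2))  # 0.5
```

Here 3 of the 6 exceedances are the last one of their cluster, so the estimate is 0.5, the reciprocal of the mean cluster size 2.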
Cluster size distribution and point process convergence
• Distribution of the number of exceedances in $[1, p(n)]$:
$\pi_n(j) = P(\mathbf{1}\{X_1>u_n\} + \ldots + \mathbf{1}\{X_{p(n)}>u_n\} = j \mid M_{1,p(n)} > u_n)$
• The point process of exceedances:
$N_n(\cdot) = \sum_{i=1}^{n} \delta_{i/n}(\cdot)\,\mathbf{1}\{X_i>u_n\}$
• Under appropriate conditions:
– $\pi_n$ converges to some limiting distribution $\pi$
– $N_n(\cdot)$ converges weakly to a compound Poisson process whose underlying Poisson process has intensity $\theta\tau$ and whose i.i.d. clusters are distributed as $\pi$
• High-level exceedances occur in clusters, with cluster size distribution $\pi$. Moreover, $E(\pi) = 1/\theta$.
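The block-based definition of the cluster size distribution can be sketched in Python as follows. The series is cut into consecutive blocks of length p, exceedances are counted per block, and only blocks with at least one exceedance are kept. The function name and the toy data are assumptions made for illustration.

```python
from collections import Counter


def cluster_size_distribution(x, u, p):
    """Empirical distribution of the number of exceedances of u per block
    of length p, conditional on the block containing an exceedance."""
    counts = []
    for start in range(0, len(x) - p + 1, p):
        k = sum(1 for v in x[start:start + p] if v > u)
        if k > 0:
            counts.append(k)
    if not counts:
        return {}
    freq = Counter(counts)
    return {j: c / len(counts) for j, c in sorted(freq.items())}


series = [0, 5, 6, 0, 0, 7, 0, 0, 8, 9, 9, 0, 0]   # hypothetical data
dist = cluster_size_distribution(series, u=4, p=4)
mean_size = sum(j * w for j, w in dist.items())
print(dist, mean_size)
```

The reciprocal of the mean cluster size then serves as a rough estimate of the extremal index, in line with the relation between the mean of the cluster size distribution and the extremal index stated above.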
Distribution of aggregate excess
• Aggregate excess above u in time interval [k,l]:
$W_{k,l}(u) = (X_k-u)_+ + (X_{k+1}-u)_+ + \ldots + (X_l-u)_+$
• This value (called flood volume in hydrology) is a good
indicator of the severity of extreme events.
• Under appropriate conditions (Smith et al., 1997):
$W_{1,n}(u_n) \to_d W_1 + W_2 + \ldots + W_K$
where $K \sim \mathrm{Poisson}(\theta\tau)$ and the variables $W_i$ are i.i.d.,
independent of $K$.
• The distribution of Wi can be regarded as the limiting
aggregate excess distribution during an extremal event.
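As a small illustration, the aggregate excess over a time window is just the sum of the positive parts of the observations minus the threshold. The sketch below uses 0-based, inclusive indices and made-up numbers; the function name is an assumption.

```python
def aggregate_excess(x, u, k, l):
    """Sum of (X_i - u)_+ for i = k..l (0-based, both ends inclusive)."""
    return sum(max(v - u, 0.0) for v in x[k:l + 1])


x = [3.0, 5.0, 7.0, 4.0, 6.0]            # hypothetical observations
print(aggregate_excess(x, u=4.0, k=0, l=4))  # 6.0
```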
Problems
• Estimation of the limiting quantities ($\theta$, $\pi$, $W$) is difficult.
• Often the subasymptotic behaviour is of interest, too, since the convergence to the limit is very slow.
• To overcome these problems, one can restrict attention to certain families of models.
• A large class of Markov chains behaves like a random walk at extreme levels,
• which can be used to simulate extremal clusters in a Markov chain, see e.g. Smith et al. (1997).
Water discharge series are non-Markovian – even above high thresholds
• If the series were Markovian,
$(X_t - X_{t-1} \mid X_{t-1},\ X_{t-1}-X_{t-2}>0) \sim (X_t - X_{t-1} \mid X_{t-1},\ X_{t-1}-X_{t-2}<0)$ would hold
• The following plots show $X_t - X_{t-1}$ as a function of $X_{t-1}$ (if $X_{t-1}$ is above the 98% quantile), conditionally on the sign of $X_{t-1}-X_{t-2}$
• The two plots are not similar!
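The data preparation behind such a pair of plots can be sketched as follows: for every time point whose predecessor lies above the threshold, the pair (previous level, current increment) is stored in one of two groups according to the sign of the previous increment. Function name and toy data are illustrative assumptions.

```python
def split_increments(x, u):
    """Return two lists of (X_{t-1}, X_t - X_{t-1}) pairs with X_{t-1} > u,
    split by whether the previous increment X_{t-1} - X_{t-2} was
    positive or negative."""
    after_rise, after_fall = [], []
    for t in range(2, len(x)):
        if x[t - 1] > u:
            prev = x[t - 1] - x[t - 2]
            pair = (x[t - 1], x[t] - x[t - 1])
            if prev > 0:
                after_rise.append(pair)
            elif prev < 0:
                after_fall.append(pair)
    return after_rise, after_fall


rise, fall = split_increments([1, 5, 8, 6, 3, 7], u=4)  # hypothetical data
print(rise)  # [(5, 3), (8, -2)]
print(fall)  # [(6, -3)]
```

For a Markovian series the two groups should look alike when plotted against the previous level; a visible difference points to dependence on the earlier past.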
A light-tailed conditionally heteroscedastic model
$X_t - c_t = \sum_i a_i (X_{t-i} - c_{t-i}) + \varepsilon_t + \sum_j b_j \varepsilon_{t-j}$
$\varepsilon_t = \sigma_t Z_t$
$\sigma_t = [d_0 + d_1 (X_{t-1} - m)_+]^{1/2}$
• $Z_t$ is an i.i.d. sequence with zero mean and unit variance
• $c_t$ describes the deterministic seasonal behaviour in mean
• If all moments of $Z_t$ are finite, then all moments of $X_t$ are finite
• However, the exact tail behaviour is unknown (a special case of a similar model has Weibull-like tails, see Robert, 2000)
• The model approximates the extremal properties of water discharge series well (see Elek and Márkus, 2005)
A regime switching (RS) autoregressive model
$X_t = X_{t-1} + \varepsilon_{1t}$ if $I_t = 1$ (rising regime)
$X_t = aX_{t-1} + \varepsilon_{0t}$ if $I_t = 0$ (falling regime)
$\varepsilon_{1t}$ is an i.i.d. noise, distributed as Gamma($\alpha$, $\lambda$)
$\varepsilon_{0t}$ is an i.i.d. noise, distributed as Normal(0, $\sigma$)
• $0 < a < 1$
• Successive regime durations are independent and distributed as
– NegBinom($\beta_1$, $p_1$) in the rising regime
– NegBinom($\beta_0$, $p_0$) in the falling regime
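A simulation sketch of the regime switching model, using only the Python standard library. Several conventions are assumptions made here and are not fixed by the slides: the second Gamma parameter is treated as a rate, the second Normal parameter as a standard deviation, the shape parameters of the negative binomial durations are integers, and a duration is the number of trials up to the required number of successes (so shape 1 gives a geometric duration on 1, 2, ...). All function and argument names are hypothetical.

```python
import random


def neg_binom_duration(beta, p, rng):
    """Number of Bernoulli(p) trials needed for `beta` successes
    (assumed parametrisation; beta = 1 is geometric on {1, 2, ...})."""
    trials = successes = 0
    while successes < beta:
        trials += 1
        if rng.random() < p:
            successes += 1
    return trials


def simulate_rs(n, a, alpha, lam, sigma, beta1, p1, beta0, p0, x0=0.0, seed=0):
    """Simulate n steps; returns the series and the regime indicators."""
    rng = random.Random(seed)
    x, regimes = [], []
    prev, regime = x0, 1            # start in the rising regime (arbitrary)
    while len(x) < n:
        beta, p = (beta1, p1) if regime == 1 else (beta0, p0)
        for _ in range(neg_binom_duration(beta, p, rng)):
            if len(x) == n:
                break
            if regime == 1:         # rising: add a positive Gamma shock
                prev = prev + rng.gammavariate(alpha, 1.0 / lam)
            else:                   # falling: mean-reverting AR(1) step
                prev = a * prev + rng.gauss(0.0, sigma)
            x.append(prev)
            regimes.append(regime)
        regime = 1 - regime
    return x, regimes


# Illustrative parameter values only.
x, regimes = simulate_rs(1000, a=0.5, alpha=1.0, lam=0.5, sigma=1.0,
                         beta1=1, p1=0.5, beta0=1, p0=0.1, seed=1)
print(max(x))
```

The first part of the simulated path depends on the arbitrary starting value, so in practice an initial stretch would be discarded before studying stationary properties.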
Properties of the RS-model
• Heuristic explanation:
– $X_t$ gets independent positive shocks in the rising regime
– it develops as a mean-reverting autoregression in the falling regime
• If $\beta_1=\beta_0=1$, then $I_t$ is a Markov chain and $X_t$ is a Markov-switching autoregression
• Stationarity of the model follows from the result of Brandt (1986) on stochastic difference equations
• Regime switching models have deep roots in hydrology (see e.g. Bálint and Szilágyi, 2005)
The model reproduces the asymmetric shape of the hydrograph
Tail behaviour of the stationary distribution
• Theorem: The process has a Gamma-like upper tail:
– $P(X_t>u \mid I_t=1) \sim K_1 u^{\beta_1-1} \exp\{-\lambda u[1-(1-p_1)^{1/\alpha}]\}$
– $P(X_t>u \mid I_t=0) \sim K_0 u^{\beta_1-1} \exp\{-\lambda u[1-(1-p_1)^{1/\alpha}]/a\}$
– thus: $P(X_t>u) \sim K_1 u^{\beta_1-1} \exp\{-\lambda u[1-(1-p_1)^{1/\alpha}]\}$.
• The proof is based on the observations that
– the aggregate increment during a rising regime has a Gamma-like tail
– which becomes "negligible" during the falling regime.
• Corollary: Exceedances above high thresholds are asymptotically exponentially distributed:
– $\lim_{u\to\infty} P(X_t>x+u \mid X_t>u) = \exp\{-\lambda x[1-(1-p_1)^{1/\alpha}]\}$
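The exponential rate in the corollary is the Gamma rate parameter multiplied by one minus the (1/shape)-th power of (1 - p1). A short Python sketch evaluates it; the function and argument names are assumptions. With Gamma shape equal to 1 (exponential shocks) the rate reduces to the product of the Gamma rate and p1.

```python
import math


def tail_rate(lam, p1, alpha):
    """Rate of the limiting exponential law of exceedances:
    lam * (1 - (1 - p1) ** (1 / alpha))."""
    return lam * (1.0 - (1.0 - p1) ** (1.0 / alpha))


print(tail_rate(2.0, 0.25, 1.0))  # 0.5, i.e. lam * p1 when alpha = 1
print(tail_rate(2.0, 0.25, 2.0))  # smaller rate: heavier exponential tail
```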
Limiting cluster quantities in the model I.
• Even when the regime lengths are negative binomial,
• the extremal index is $p_1$,
• and the limiting cluster size distribution is geometric with parameter $p_1$.
Limiting cluster quantities in the model II.
• If $\alpha=1$, the limiting aggregate excess distribution is $W = E_1 + 2E_2 + \ldots + NE_N$
– where $N$ is geometric with parameter $p_1$
– the variables $E_i$ are exponential with parameter $\lambda$, independent of each other and of $N$
• The exponential moments are infinite, but all polynomial moments are finite.
• Anderson and Dancy (1992) suggested modelling the aggregate excesses of a hydrological data set with a Weibull distribution.
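The representation of the limiting aggregate excess as a weighted sum of exponentials is easy to sample from. The sketch below assumes that the geometric variable lives on 1, 2, ... (so that its mean is 1/p1) and that the exponential parameter is a rate. Under these assumptions a short calculation, not taken from the slides, gives the mean as 1/(rate × p1²), since the expected value of N(N+1)/2 is 1/p1². Function names are hypothetical.

```python
import random


def sample_w(lam, p1, rng):
    """One draw of W = E_1 + 2*E_2 + ... + N*E_N."""
    n = 1
    while rng.random() >= p1:       # geometric N on {1, 2, ...}
        n += 1
    return sum(i * rng.expovariate(lam) for i in range(1, n + 1))


def mean_w(lam, p1):
    """E[W] = E[N(N+1)/2] / lam = 1 / (lam * p1**2)."""
    return 1.0 / (lam * p1 ** 2)


rng = random.Random(0)
draws = [sample_w(1.0, 0.5, rng) for _ in range(20000)]
print(sum(draws) / len(draws), mean_w(1.0, 0.5))
```

The Monte Carlo average should be close to the theoretical value 4 for these illustrative parameters.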
Slow convergence to the limiting quantities
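• The plot gives $\theta(u,p) = P(M_{1,p} \le u \mid X_0 > u)$
• if … $= p_1 = 0.5$, $p_0 = 0.1$, $a = 0.5$ and $\alpha = \beta_0 = \beta_1 = 1$
– for $p = 100$ and $200$ and
– for $u$ ranging from the 99% to the 99.99% quantile
• $\theta = \lim_{p\to\infty} \lim_{u\to\infty} \theta(u,p)$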
Parameter estimation
• Estimation of the whole model with hidden regimes:
– (reversible jump) MCMC
– maximum likelihood if $\beta_1=\beta_0=1$ (i.e. in the Markov-switching case), but it is computationally infeasible
• However, if we focus only on extremal dynamics
– and assume that the regime durations (at least above a high level) are geometrically distributed,
– we can write down the likelihood based solely on data during floods (i.e. above a high threshold)
– $\alpha=1$ is also assumed (in accordance with the empirical data)
Exponential QQ-plot for the positive increments above the threshold 900 m3/s
Likelihood computations
• Likelihood can be determined recursively:
– $q_t = P(I_t=1 \mid X_t, X_{t-1}, \ldots)$
– $q_1^{cond} = P(I_t=1 \mid X_{t-1}, \ldots) = (1-p_1)q_{t-1} + p_0(1-q_{t-1})$
– $q_0^{cond} = P(I_t=0 \mid X_{t-1}, \ldots) = p_1 q_{t-1} + (1-p_0)(1-q_{t-1})$
– $f_1 = f(X_t, I_t=1 \mid X_{t-1}, \ldots) = q_1^{cond}\, f_{\mathrm{Exp}(\lambda)}(X_t - X_{t-1})$
– $f_0 = f(X_t, I_t=0 \mid X_{t-1}, \ldots) = q_0^{cond}\, f_{N(0,\sigma)}(X_t - aX_{t-1})$
– $f(X_t \mid X_{t-1}, \ldots) = f_0 + f_1$
– $q_t = f_1/(f_0 + f_1)$
• Some care is needed:
– at the beginning of the floods $q_t$ is determined from the tail behaviour of the model
– at the end of the floods the observation is censored
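The recursion above can be sketched in Python for a single flood. This is a simplified illustration under stated assumptions: the Normal parameter is read as a standard deviation, the starting probability `q_init` is a placeholder argument (the slides derive it from the tail behaviour of the model), and the censoring at the end of a flood is not handled. All names are hypothetical.

```python
import math


def exp_pdf(z, lam):
    """Density of Exp(lam) with rate lam; zero for negative arguments."""
    return lam * math.exp(-lam * z) if z >= 0 else 0.0


def normal_pdf(z, sigma):
    """Density of a centred normal with standard deviation sigma."""
    return math.exp(-0.5 * (z / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def log_likelihood(x, p1, p0, a, lam, sigma, q_init=0.5):
    """Filtered log-likelihood of x[1:], conditional on x[0].
    Returns the log-likelihood and the last filtered P(I_t = 1 | data)."""
    q = q_init
    loglik = 0.0
    for t in range(1, len(x)):
        q1cond = (1 - p1) * q + p0 * (1 - q)      # predicted P(rising)
        q0cond = p1 * q + (1 - p0) * (1 - q)      # predicted P(falling)
        f1 = q1cond * exp_pdf(x[t] - x[t - 1], lam)
        f0 = q0cond * normal_pdf(x[t] - a * x[t - 1], sigma)
        loglik += math.log(f0 + f1)
        q = f1 / (f0 + f1)                        # filtered P(rising)
    return loglik, q


flood = [1000.0, 1200.0, 1500.0, 1300.0, 1100.0]  # hypothetical discharges
print(log_likelihood(flood, p1=0.6, p0=0.03, a=0.8, lam=0.004, sigma=140.0))
```

Because a decrease has zero density under the rising regime, the filtered probability of the rising regime drops to zero as soon as the discharge falls, which matches the remark that the regimes separate well at high levels.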
Advantages of using only the data over a threshold
• Model dynamics may be different at lower levels
– For physical reasons, the rate of decay in the falling regime (characterised by $a$) varies over the decay
• Fast maximum likelihood estimation
– Smaller sample size
– Regimes separate very well at high levels
Application to flood analysis
• Data: 50 years of daily water discharge series at Tivadar (river Tisza) – about 18000 observations
• We assume $\alpha=\beta_0=\beta_1=1$
• Threshold: 900 m3/s (about 98% quantile)
• Parameter estimates and asymptotic standard errors:
– $p_1$ = 0.598 (0.037)
• on average 1.7 days of further increase, in accordance with the empirical value
– $p_0$ = 0.027 (0.011)
• has a negligible effect on the dynamics over the threshold
– $a$ = 0.823 (0.007)
• high persistence even in the falling regime
– $\lambda$ = 0.0044 (0.0003)
– $\sigma$ = 137.1 (8.0)
Empirical and simulated flood dynamics
• The shapes of the empirical and simulated floods are very similar.
• Subasymptotic behaviour is important:
– Simulated water discharge remains over the threshold for 1.4 days on average after the peak
Exceedances over a threshold
• Maximal exceedance over a threshold is approximately exponential with parameter $\lambda p_1 = 1/392$ in the model,
• in good accordance with the empirical distribution.
• The plot shows the exceedance over the threshold 1250 m3/s.
Aggregate excess (flood volume)
• Threshold = 1250 m3/s
• Operational definition: two floods are separated when the water discharge goes below a lower threshold (900 m3/s) between them
• There are only 48 such floods in 50 years
• Emp. mean: 72.1 million m3
• Sim. mean: 76.9 million m3
• The QQ-plot shows the fit of the distribution, too.
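The operational definition can be sketched as follows. The code assumes the series consists of daily mean discharges in m3/s, so that each daily excess is converted to a volume by multiplying by the number of seconds in a day. The function name, defaults and toy data are illustrative assumptions.

```python
SECONDS_PER_DAY = 86400


def flood_volumes(discharge, high=1250.0, low=900.0):
    """Aggregate excess (in m3) above `high` for each flood, where floods
    are separated whenever the discharge drops below `low`."""
    volumes = []
    current = 0.0
    for v in discharge:
        if v < low:                       # flood (if any) has ended
            if current > 0:
                volumes.append(current)
            current = 0.0
        else:
            current += max(v - high, 0.0) * SECONDS_PER_DAY
    if current > 0:                       # flood still running at the end
        volumes.append(current)
    return volumes


# Hypothetical daily discharges: the dip to 1000 does not split the first flood.
q = [800, 1000, 1300, 1400, 1000, 1260, 850, 950, 1350, 800]
print([v / 1e6 for v in flood_volumes(q)])  # volumes in million m3
```

With this convention, exceedances of the higher threshold that are separated only by a dip staying above the lower threshold are counted as one flood.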
Dependence of $p_1$ on the threshold
Conclusions
• The limiting cluster quantities can be determined in our physically motivated regime switching model
• Simulations are still needed since the subasymptotic behaviour is important at the relevant thresholds
• To determine return levels of, e.g., flood volume, the occurrence of extreme events should also be modelled, by a Poisson process.
• Further work: what parametric multivariate extreme value distribution does a reasonable multivariate regime switching model suggest?
References
• Anderson, C.W. and Dancy, G.P. (1992): The severity of extreme events, Research Report 409/92 University of Sheffield.
• Bálint, G. and Szilágyi, J. (2005): A hybrid, Markov-chain based model for daily streamflow generation, Journal of Hydrol. Engineering, in press.
• Brandt, A. (1986): The stochastic equation Yn+1=AnYn+Bn with stationary coefficients, Adv. in Appl. Prob., 18, 211-220.
• Elek, P. and Márkus, L. (2004): A long range dependent model with nonlinear innovations for simulating daily river flows, Natural Hazards and Earth Systems Sciences, 4, 277-283.
• Elek, P. and Márkus, L. (2005): A light-tailed conditionally heteroscedastic model with applications to river flows, in preparation.
• Robert, C. (2000): Extremes of alpha-ARCH models, in: Measuring Risk in Complex Stochastic Systems (ed. by Franke et al.), XploRe e-books.
• Segers, J. (2003): Functionals of clusters of extremes, Adv. in Appl. Prob., 35, 1028-1045.
• Smith, R.L., Tawn, J.A. and Coles, S.G. (1997): Markov chain models for threshold exceedances, Biometrika, 84, 249-268.
Thank you for your attention!