
Stochastic Processes and their Applications 120 (2010) 1492–1517. www.elsevier.com/locate/spa

    A discussion on mean excess plots

Souvik Ghosh a,∗, Sidney Resnick b

a Department of Statistics, Columbia University, New York, NY 10027, United States
b School of Operations Research and Information Engineering, Cornell University, Ithaca, NY 14853, United States

Received 29 July 2009; received in revised form 20 March 2010; accepted 6 April 2010. Available online 23 April 2010.

    Abstract

The mean excess plot is a tool widely used in the study of risk, insurance and extreme values. One use is in validating a generalized Pareto model for the excess distribution. This paper investigates some theoretical and practical aspects of the use of the mean excess plot.
© 2010 Elsevier B.V. All rights reserved.

    Keywords: Mean excess plots; Peaks over threshold; Extreme values; Generalized Pareto

    1. Introduction

The distribution of the excess over a threshold u for a random variable X with distribution function F is defined as

    Fu(x) = P [X − u ≤ x |X > u] . (1.1)

This excess distribution is the foundation for peaks over threshold (POT) modeling [13,4], which fits appropriate distributions to data on excesses. The use of peaks over threshold modeling is widespread and applications include:

• Hydrology: It is critical to model the level of water in a river or sea to avoid flooding. The level u could represent the height of a dam, levee or river bank. See [26,25].
• Actuarial science: Insurance companies set premium levels based on models for large losses. Excess of loss insurance pays for losses exceeding a contractually agreed amount. See [17,14].

∗ Corresponding author. E-mail addresses: [email protected] (S. Ghosh), [email protected] (S. Resnick).

0304-4149/$ – see front matter © 2010 Elsevier B.V. All rights reserved. doi:10.1016/j.spa.2010.04.002


• Survival analysis: The POT method is used for modeling lifetimes; see [15].
• Environmental science: Public health agencies set standards for pollution levels. Exceedances of these standards generate public alerts or corrective measures; see [24].

Peaks over threshold modeling is based on the generalized Pareto class of distributions being appropriate for describing statistical properties of excesses. A random variable X has a generalized Pareto distribution (GPD) if it has a cumulative distribution function of the form

G_{ξ,β}(x) = 1 − (1 + ξx/β)^{−1/ξ}  if ξ ≠ 0,
G_{ξ,β}(x) = 1 − exp(−x/β)          if ξ = 0,   (1.2)

where β > 0, and x ≥ 0 when ξ ≥ 0 and 0 ≤ x ≤ −β/ξ if ξ < 0. The parameters ξ and β are referred to as the shape and scale parameters respectively. For a Pareto distribution, the tail index α is just the reciprocal of ξ when ξ > 0. A special case is ξ = 0, in which the GPD is the same as the exponential distribution with mean β.

The Pickands–Balkema–de Haan Theorem [14, Theorem 7.20, page 277] provides the theoretical justification for the centrality of the GPD class of distributions for peaks over threshold modeling. This result shows that for a large class of distributions (those in a maximal domain of attraction of the extreme value laws), the excess distribution F_u is asymptotically equivalent to a GPD law G_{ξ,β(u)} as the threshold u approaches the right end point of the distribution F. Here the asymptotic shape parameter ξ is fixed but the scale β(u) may depend on u. More precise statements are given below in Theorems 3.1, 3.6 and 3.9. For this reason the GPD is a natural choice for modeling peaks over a threshold.

The choice of the extreme threshold u, where the GPD model provides a suitable approximation to the excess distribution F_u, is critical in applications. The mean excess (ME) function is a tool popularly used to aid this choice of u and also to determine the adequacy of the GPD model in practice. The ME function of a random variable X is defined as

M(u) := E[X − u | X > u], (1.3)

provided E[X_+] < ∞. Given a sample X_1, …, X_n, the empirical ME function is

M̂(u) = ∑_{i=1}^n (X_i − u) I_{[X_i > u]} / ∑_{i=1}^n I_{[X_i > u]}, u ≥ 0. (1.4)
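As a concrete illustration (not part of the paper), the empirical ME function (1.4) and the ME plot points can be computed in a few lines of R; the function names mean_excess and me_plot_points are our own, hypothetical choices.

## Minimal sketch: empirical mean excess function M.hat(u) of (1.4)
mean_excess <- function(x, u) {
  excess <- x[x > u] - u            # excesses over the threshold u
  if (length(excess) == 0) return(NA_real_)
  mean(excess)
}

## ME plot points {(X_(i), M.hat(X_(i))) : 1 < i <= n}, with X_(1) >= ... >= X_(n)
me_plot_points <- function(x) {
  xs <- sort(x, decreasing = TRUE)
  u  <- xs[-1]                      # every order statistic except the largest
  data.frame(threshold   = u,
             mean_excess = vapply(u, function(v) mean_excess(x, v), numeric(1)))
}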

Yang [27] suggested the use of the empirical ME function and established the uniform strong consistency of M̂(u) over compact u-sets; that is, for any b > 0,

P[ lim_{n→∞} sup_{0≤u≤b} |M̂(u) − M(u)| = 0 ] = 1. (1.5)

In the context of extremes, however, (1.5) is not especially informative since what is of interest is the behavior of M̂(u) in a neighborhood of the right end point of F, which could be ∞.


In this case the GPD plays a pivotal role. For a random variable X ∼ G_{ξ,β} we have E(X) < ∞ if and only if ξ < 1, and in that case the ME function is linear in the threshold:

M(u) = (β + ξu)/(1 − ξ), (1.6)

for u ≥ 0 when 0 ≤ ξ < 1 and for 0 ≤ u ≤ −β/ξ when ξ < 0. This linearity is the basis of a widely used graphical diagnostic based on the ME plot, which is the plot of the points {(X_(k), M̂(X_(k))) : 1 < k ≤ n}, where X_(1) ≥ X_(2) ≥ ⋯ ≥ X_(n) are the order statistics of the sample. If the plot is roughly linear beyond some threshold, this is taken as evidence that the GPD is a reasonable model for the excesses over that threshold, and the slope of a line fitted to the linear part estimates ξ/(1 − ξ). In Section 3 we show that for i.i.d. observations from a distribution in a maximal domain of attraction with ξ < 1, the suitably scaled and thresholded ME plot converges to the deterministic line suggested by (1.6). We also consider ξ > 1, the case where the ME function does not exist, and show that the ME plot converges to a random curve. This also holds in the more delicate case ξ = 1 after suitable rescaling. These results show that the ME plot is inconsistent when ξ ≥ 1 and emphasize that knowledge of a finite mean is required.
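As a small aside (ours, not the authors'), the GPD cdf (1.2) and its linear ME function (1.6) translate directly into R; pgpd and me_gpd are hypothetical helper names and the snippet is only a sketch.

## GPD cdf of (1.2); support is x >= 0 for xi >= 0 and 0 <= x <= -beta/xi for xi < 0
pgpd <- function(x, xi, beta) {
  if (xi == 0) 1 - exp(-x / beta) else 1 - (1 + xi * x / beta)^(-1 / xi)
}
## ME function (1.6); it exists only when xi < 1 and is linear with slope xi/(1 - xi)
me_gpd <- function(u, xi, beta) {
  stopifnot(xi < 1)
  (beta + xi * u) / (1 - xi)
}
me_gpd(c(0, 1, 2), xi = 0.5, beta = 1)   # slope 1 = 0.5/(1 - 0.5)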

It is tempting to argue that consistency of the ME plot M̂(u) should imply, by a continuity argument, the consistency of the estimator of ξ obtained from computing the slope of the line fit to the ME plot. However, this slope functional is not necessarily continuous, as discussed in [6]. So consistency of the slope function requires further work and is an ongoing investigation.

The paper is arranged as follows. In Section 2 we briefly discuss required background on convergence of random closed sets and then study the ME plot in Section 3. In Section 4 we discuss advantages and disadvantages of the mean excess plot and how this tool compares with other techniques of extreme value theory such as using the Hill estimator, the Pickands estimator and the QQ plot. We illustrate the behavior of the empirical mean excess plot for some simulated data sets in Section 5, and in Section 6 we analyze three real data sets obtained from different subject areas and also compare different tools.

    2. Background

    2.1. Topology on closed sets of R2

Before we start any discussion on whether a mean excess plot is a reasonable diagnostic tool we need to understand what it means to talk about convergence of plots. So we discuss the topology on a set containing the plots.

We denote the collection of closed subsets of R^2 by F. We consider a hit and miss topology on F called the Fell topology. The Fell topology is generated by the families {F^K, K compact} and {F_G, G open}, where for any set B

F^B = {F ∈ F : F ∩ B = ∅} and F_B = {F ∈ F : F ∩ B ≠ ∅}.

So F^B and F_B are the collections of closed sets which miss and hit the set B, respectively. This is why such topologies are called hit and miss topologies. In the Fell topology a sequence of closed sets {F_n} converges to F ∈ F if and only if the following two conditions hold:


• F hits an open set G implies that there exists N ≥ 1 such that for all n ≥ N, F_n hits G.
• F misses a compact set K implies that there exists N ≥ 1 such that for all n ≥ N, F_n misses K.

The Fell topology on the closed sets of R^2 is metrizable, and we indicate convergence in this topology of a sequence {F_n} of closed sets to a limit closed set F by F_n → F. Sometimes, rather than work with the topology, it is easier to deal with the following characterization of convergence.

Lemma 2.1. A sequence F_n ∈ F converges to F ∈ F in the Fell topology if and only if the following two conditions hold:
(1) For any t ∈ F there exists t_n ∈ F_n such that t_n → t.
(2) If for some subsequence (m_n), t_{m_n} ∈ F_{m_n} converges, then lim_{n→∞} t_{m_n} ∈ F.

See Theorem 1-2-2 in [19, p. 6] for a proof of this lemma. Since the topology is metrizable, the definition of convergence in probability is obvious. The following result is a well-known and helpful characterization of convergence in probability for random variables, and it holds for random sets as well; see Theorem 6.21 in [20, p. 92].

Lemma 2.2. A sequence of random sets (F_n) in F converges in probability to a random set F if and only if for every subsequence (n′) of Z_+ there exists a further subsequence (n″) of (n′) such that F_{n″} → F a.s.

We use the following notation: for a real number x and a set A ⊂ R^n, xA = {xy : y ∈ A}. Good references for the theory of random sets are [19,20].

    2.2. Miscellaneous details

Throughout this paper we take k := k_n to be a sequence increasing to infinity such that k_n/n → 0. For a distribution function F(x) we write F̄(x) = 1 − F(x) for the tail, and the quantile function is

F^←(1 − 1/u) = inf{s : F(s) ≥ 1 − 1/u} = (1/(1 − F))^←(u).

A function U : (0, ∞) → R_+ is regularly varying with index ρ ∈ R, written U ∈ RV_ρ, if

lim_{t→∞} U(tx)/U(t) = x^ρ, x > 0.

We denote by M_+(0, ∞] the space of non-negative Radon measures µ on (0, ∞] metrized by the vague metric. Point measures are expressed as a function of their points {x_i, i = 1, …, n} by ∑_{i=1}^n δ_{x_i}. See, for example, [21, Chapter 3].

We will use the following notation to denote different classes of functions: for 0 ≤ a < b ≤ ∞,

(i) D[a, b): right-continuous functions with finite left limits defined on [a, b);
(ii) D_l[a, b): left-continuous functions with finite right limits defined on [a, b).

We will assume that these spaces are equipped with the Skorokhod topology and the corresponding distance function. In some cases we will also consider product spaces of functions, and then the topology will be the product topology. For example, D_l^2[1, ∞) will denote the class of two-dimensional functions on [1, ∞). The classes of functions defined on the sets [a, b] or (a, b] will have the obvious notation.


    3. Mean excess plots

As discussed in the introduction, a random variable having distribution G_{ξ,β} with ξ < 1 has a linear ME function given by (1.6), whose slope ξ/(1 − ξ) is positive (0 < ξ < 1), negative (ξ < 0) or zero (ξ = 0). We consider these three cases separately.

    3.1. Positive slope

In this subsection we concentrate on the case where ξ > 0. A finite mean for F is guaranteed when ξ < 1, and we also investigate what happens when ξ ≥ 1.

    The following theorem is a combination of Theorems 3.3.7 and 3.4.13(b) in [13].

Theorem 3.1. Assume ξ > 0. The following are equivalent for a cumulative distribution function F:

(1) F̄ ∈ RV_{−1/ξ}, i.e., for every t > 0, lim_{x→∞} F̄(tx)/F̄(x) = t^{−1/ξ}.
(2) F is in the maximal domain of attraction of a Frechet distribution with parameter 1/ξ, i.e.,

lim_{n→∞} F^n(c_n x) = exp{−x^{−1/ξ}} for all x > 0,

where c_n = F^←(1 − n^{−1}).
(3) There exists a positive measurable function β(u) such that

lim_{u→∞} sup_{x≥u} |F_u(x) − G_{ξ,β(u)}(x)| = 0. (3.1)

Theorem 3.1(3) is one case of the Pickands–Balkema–de Haan theorem. It guarantees the existence of a measurable function β(u) for which (3.1) holds but does not construct this function. However, β(u) can be obtained from Karamata's representation of a regularly varying function [3]: if F̄ ∈ RV_{−1/ξ}, there exists 0 < z < ∞ such that

F̄(x) = c(x) exp{−∫_z^x (1/a(t)) dt}, x > z,

where c(x) → c > 0 and a(x)/x → ξ as x → ∞. For large u,

P[(X − u)/a(u) ≤ x | X > u] ≈ G_{ξ,1}(x),

and a(u) is a choice for the scale parameter β(u) in (3.1). Hence we get that β(u)/u → ξ as u → ∞ by the convergence to types theorem [21].

Consider the ME plot for i.i.d. random variables having common distribution F which satisfies F̄ ∈ RV_{−1/ξ} for some ξ > 0. Since the excess distribution is well approximated by the GPD for high thresholds, we expect that for ξ < 1 the ME function will look similar to that of the GPD for high thresholds, and therefore we seek evidence of linearity in the plot.


We first consider the ME plot when 0 < ξ < 1 and will discuss the case ξ ≥ 1 separately. Furthermore, we see that for each n ≥ 1 the mean excess plot, being a finite set of R^2-valued random variables, is measurable and a random closed set. This follows from the definition of random sets; see Definition 1.1 in [20, p. 2].

3.1.1. The heavy tail with a finite mean; 0 < ξ < 1
The scaled and thresholded ME plot converges to a deterministic line.

Theorem 3.2. If (X_n, n ≥ 1) are i.i.d. observations with distribution F satisfying F̄ ∈ RV_{−1/ξ} with 0 < ξ < 1, then in F

S_n := (1/X_(k)) {(X_(i), M̂(X_(i))) : i = 2, …, k} →^P S := {(t, (ξ/(1 − ξ)) t) : t ≥ 1}. (3.2)

Remark 3.3. Roughly, this result implies

X_(k) S_n = {(X_(i), M̂(X_(i))) : i = 2, …, k_n} ≈ X_(k) S.

The plot of the points S_n is a little different from the original ME plot. In practice, people plot the points {(X_(i), M̂(X_(i))) : 1 < i ≤ n}, but our result restricts attention to the higher order statistics corresponding to X_(1), …, X_(k). This restriction is natural and corresponds to looking at observations over high thresholds. One imagines zooming into the area of interest in the complete ME plot.

This result scales the points (X_(i), M̂(X_(i))) by X_(k)^{−1}. Since both coordinates of the points in the plot are scaled, we do not change the structure or appearance of the plot but only the scale of the axes. Hence we may still estimate the slope of the line if we want to estimate ξ by this method [7]. The scaling is important because the points {(X_(i), M̂(X_(i))) : 1 < i ≤ k} are moving to infinity and the Fell topology is not equipped to handle sets which are moving out to infinity. Furthermore, the regular variation assumption on the tail of F involves a ratio condition, and thus it is natural that the random set convergence uses scaling.
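To make the scaling in Theorem 3.2 and Remark 3.3 concrete, the following R sketch (our own illustration, with hypothetical names) builds the set S_n from a sample: it keeps only the top k order statistics and divides both coordinates by X_(k).

## Scaled ME plot points of Theorem 3.2: {(X_(i), M.hat(X_(i)))/X_(k) : i = 2,...,k}
scaled_me_set <- function(x, k) {
  xs <- sort(x, decreasing = TRUE)              # X_(1) >= ... >= X_(n)
  i  <- 2:k
  me <- cumsum(xs)[i - 1] / (i - 1) - xs[i]     # M.hat(X_(i)) = mean of top i-1 values minus X_(i)
  cbind(t = xs[i] / xs[k], m = me / xs[k])      # points should hug the line m = t * xi/(1 - xi)
}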

A central assumption in Theorem 3.2 is that the random variables {X_i} are i.i.d. The proof of the theorem below will explain that an important tool is the convergence of the tail empirical measure ν̂_n in (3.4). By Proposition 2.1 in [23], we know that the i.i.d. assumption is not a necessary condition for the convergence of the tail empirical measure. We believe that as long as the tail empirical measure converges, our result should hold.

Proof. We show that for every subsequence (m_n) of integers there exists a further subsequence (l_n) of (m_n) such that

S_{l_n} → S a.s. (3.3)

Define the tail empirical measure as a random element of M_+(0, ∞] by

ν̂_n := (1/k) ∑_{i=1}^n δ_{X_i/X_(k)}. (3.4)

Following (4.21) in [22, p. 83] we get that

ν̂_n ⇒ ν in M_+(0, ∞], (3.5)


where ν(x, ∞] = x^{−1/ξ}, x > 0. Now consider

S_n(u) = (X_(⌈ku⌉)/X_(k), M̂(X_(⌈ku⌉))/X_(k)), u ∈ (0, 1],

as random elements in D_l^2(0, 1]. We will show that S_n(·) →^P S(·) in D_l^2(0, 1], where

S(u) = (u^{−ξ}, (ξ/(1 − ξ)) u^{−ξ}) for all 0 < u ≤ 1.

We already know the result for the first component of S_n, i.e., S_n^{(1)}(t) := X_(⌈kt⌉)/X_(k) →^P t^{−ξ} in D_l(0, 1]; see [22, p. 82]. Since the limits are non-random, it suffices to prove the convergence of the second component of S_n. Observe that the empirical mean excess function can be obtained from the tail empirical measure:

S_n^{(2)}(u) := M̂(X_(⌈ku⌉))/X_(k) = (k/(⌈ku⌉ − 1)) ∫_{X_(⌈ku⌉)/X_(k)}^∞ ν̂_n(x, ∞] dx.

Consider the maps T and T_K from M_+(0, ∞] to D_l[1, ∞) defined by

T(µ)(t) = ∫_t^∞ µ(x, ∞] dx and T_K(µ)(t) = ∫_t^{K∨t} µ(x, ∞] dx.

We understand T(µ)(t) = ∞ if µ(x, ∞] is not integrable. We will show that T(ν̂_n) →^P T(ν). The function T_K is obviously continuous and therefore T_K(ν̂_n) →^P T_K(ν) in D_l[1, ∞). In order to prove that T(ν̂_n) →^P T(ν), it suffices to show that for any ε > 0

lim_{K→∞} limsup_{n→∞} P[ ‖T_K(ν̂_n) − T(ν̂_n)‖ > ε ] = 0,

where ‖·‖ is the sup-norm on D_l[1, ∞). To verify this claim, note that

lim_{K→∞} limsup_{n→∞} P[ ‖T_K(ν̂_n) − T(ν̂_n)‖ > ε ] ≤ lim_{K→∞} limsup_{n→∞} P[ ∫_K^∞ ν̂_n(x, ∞] dx > ε ],

and the rest is proved easily following the arguments used in step 3 of the proof of Theorem 4.2 in [22, p. 81].

Suppose D_{l,≥1}(0, 1] is the subspace of D_l(0, 1] consisting only of functions which are never less than 1. Consider the random element Y_n in the space D_{l,≥1}(0, 1] × D_l[1, ∞),

Y_n := (X_(⌈k·⌉)/X_(k), T(ν̂_n)).

From what we have obtained so far it is easy to check that Y_n →^P Y, where

Y(u, t) = (u^{−ξ}, (ξ/(1 − ξ)) t^{(ξ−1)/ξ}).

The map T̃ : D_{l,≥1}(0, 1] × D_l[1, ∞) → D_l(0, 1] defined by

T̃(f, g)(u) = g(f(u)) for all 0 < u ≤ 1


is continuous if g and f are continuous, and therefore

T̃(Y_n)(u) = ∫_{X_(⌈ku⌉)/X_(k)}^∞ ν̂_n(x, ∞] dx →^P (ξ/(1 − ξ)) u^{1−ξ} in D_l(0, 1].

This finally shows the convergence of the second component of S_n, and hence we get that S_n →^P S.

Next we have to convert this result to convergence of the random sets S_n. This argument is similar to the one used to prove Lemma 2.1.3 in [6]. Choose any subsequence (m_n) of integers. Since S_n(·) →^P S(·) we have S_{m_n}(·) →^P S(·) in D_l^2(0, 1]. So there exists a subsequence (l_n) of (m_n) such that S_{l_n}(·) → S(·) a.s. The final step is to use this to prove (3.3), and for that we use Lemma 2.1. Take any point in S of the form (t, (ξ/(1 − ξ)) t) for some t ≥ 1. Set u = t^{−1/ξ} and observe that S_{l_n}(u) → (t, (ξ/(1 − ξ)) t) and S_{l_n}(u) ∈ S_{l_n}. This proves condition (1) of Lemma 2.1, and we next prove condition (2). Suppose for some subsequence (j_n) of (l_n), S_{j_n}(u_n) converges to (x, y). Since S^{(1)}(u) is strictly monotone, there must be some 0 < u ≤ 1 such that u_n → u as n → ∞. Now, since S_{j_n} → S and S is continuous, we get that S_{j_n}(u_n) → S(u) ∈ S. That completes the proof. □

3.1.2. Case ξ ≥ 1; limit sets are random
The following theorem describes the asymptotic behavior of the ME plot when ξ ≥ 1. Reminder: ξ > 1 guarantees an infinite mean.

Theorem 3.4. Assume (X_n, n ≥ 1) are i.i.d. observations with distribution F satisfying F̄ ∈ RV_{−1/ξ}.

(i) If ξ > 1, then

S_n := {(X_(i)/b(n/k), M̂(X_(i))/(b(n)/k)) : i = 2, …, k} ⇒ S := {(t^ξ, t S_{1/ξ}) : t ≥ 1} (3.6)

in F, where b(n) := F^←(1 − 1/n) and S_{1/ξ} is the positive stable random variable with index 1/ξ which satisfies, for t ∈ R,

E[e^{it S_{1/ξ}}] = exp{ −Γ(1 − 1/ξ) cos(π/(2ξ)) |t|^{1/ξ} [1 − i sgn(t) tan(π/(2ξ))] }. (3.7)

(ii) If ξ = 1 and k satisfies k = k(n) → ∞, k/n → 0, and

k b(n/k)/b(n) → 1 (n → ∞), (3.8)

then

S_n := {(X_(i)/b(n/k), M̂(X_(i))/b(n/k) − k C_{n,k}/(i b(n))) : i = 2, …, k} ⇒ S := {t (1, S_1 − 1 − log t) : t ≥ 1} (3.9)

in F, where

C_{n,k} = n ( E[X_1 I_{[X_1 ≤ b(n)]}] − E[X_1 I_{[X_1 ≤ b(n/k)]}] )

and S_1 is a positively skewed stable random variable satisfying

E[e^{it S_1}] = exp{ it ∫_0^∞ (sin x/x^2 − 1/(x(1 + x))) dx − |t| [π/2 + i sgn(t) log |t|] }.


Remark 3.5. In Theorem 3.2 we considered the points of the mean excess plot normalized by X_(k). By scaling both coordinates by the same normalizing sequence, we did not change the structure of the plot. But in Theorem 3.4(i) we need different scalings in the two coordinates. This is simple to observe since b(n) = F^←(1 − n^{−1}) ∈ RV_ξ and ξ > 1 implies k b(n/k)/b(n) → 0 as n → ∞. This means that in order to get a finite limit we need to normalize the second coordinate by a sequence increasing at a much faster rate than the normalizing sequence for the first coordinate. This does change the structure of the plot, and even with this normalization the limiting set is random. The limit is a curve scaled in the second coordinate by the random quantity S_{1/ξ}. Note that the limit is independent of the choice of the sequence k_n as long as it satisfies k_n → ∞ and k_n/n → 0 as n → ∞. Another interesting outcome, as pointed out by a referee, is that on the log–log scale the limit set becomes

log S = {(u, u/ξ + log S_{1/ξ}) : u ≥ 0},

which is a straight line with slope 1/ξ and random intercept log S_{1/ξ}.

In Theorem 3.4(ii), along with ξ = 1, we make the extra assumption (3.8). Under these assumptions we get that the mean excess plot, with some centering in the second coordinate, converges to a random set. We could obtain the result without (3.8), but then the centering becomes random, more complicated and difficult to interpret. The significance of (3.8) is as follows. The centering C_{n,k} is of the form

C_{n,k} = n (π(n) − π(n/k)),

where π(t) = ∫_0^{b(t)} F̄(s) ds is in the de Haan class Π and has slowly varying auxiliary function g(t) := b(t)/t; see [22,3,9,10]. Condition (3.8) is the same as requiring k to satisfy g(n/k)/g(n) → 1.

Proof. (i) We will first prove that

Y_n(t) := (X_(⌈k/t⌉)/b(n/k), M̂(X_(⌈k/t⌉))/(b(n)/k)) ⇒ Y(t) := (t^ξ, t S_{1/ξ}) in D^2[1, ∞). (3.10)

The two important facts that we will need for the proof are the following:

(A) Csorgo and Mason [5] showed that for any k_n → ∞ satisfying k_n/n → 0,

(1/b(n)) ∑_{i=1}^{k_n} X_(i) ⇒ S_{1/ξ} in R.

(B) Under the same assumption on the sequence k_n [22, p. 82],

Y_n^{(1)}(t) = X_(⌈k/t⌉)/b(n/k) →^P Y^{(1)}(t) = t^ξ in D[1, ∞). (3.11)

Since Y^{(1)}(t) is non-random, in order to prove (3.10) it suffices to show that Y_n^{(2)}(t) ⇒ Y^{(2)}(t) in D[1, ∞) [22, Proposition 3.1, p. 57]. By Theorem 16.7 in [2, p. 174] we need to show that Y_n^{(2)}(t) ⇒ Y^{(2)}(t) in D[1, N] for every N > 1. So fix N > 1 arbitrarily. By an abuse of notation we will use Y and Y_n to denote their restrictions to [1, N] as elements of D[1, N].


Observe that b(n) ∈ RV_ξ, and since ξ > 1 we get k b(n/k)/b(n) → 0 as n → ∞. Combining this with (B) we get that for any t ≥ 1,

k X_(⌈k/t⌉)/b(n) →^P 0. (3.12)

Also observe that for any 1 ≤ t_1 < t_2 ≤ N,

(1/b(n)) ∑_{i=⌈k/t_2⌉+1}^{⌈k/t_1⌉} X_(i) ≤ k (1/t_1 − 1/t_2 + 1) X_(⌈k/t_2⌉)/b(n) →^P 0. (3.13)

Using (A), (3.12), (3.13) and Proposition 3.1 in [22, p. 57] we get that for any 1 ≤ t_1 < t_2 ≤ N,

(1/b(n)) ( ∑_{i=1}^{⌈k/t_2⌉−1} X_(i), ∑_{i=⌈k/t_2⌉}^{⌈k/t_1⌉−1} X_(i), k X_(⌈k/t_2⌉), k X_(⌈k/t_1⌉) ) ⇒ (S_{1/ξ}, 0, 0, 0). (3.14)

This allows us to obtain the weak limit of (Y_n^{(2)}(t_1), Y_n^{(2)}(t_2)):

(Y_n^{(2)}(t_1), Y_n^{(2)}(t_2)) = (k/b(n)) ( M̂(X_(⌈k/t_1⌉)), M̂(X_(⌈k/t_2⌉)) )
= (k/b(n)) ( (1/(⌈k/t_1⌉ − 1)) ∑_{i=1}^{⌈k/t_1⌉−1} X_(i) − X_(⌈k/t_1⌉), (1/(⌈k/t_2⌉ − 1)) ∑_{i=1}^{⌈k/t_2⌉−1} X_(i) − X_(⌈k/t_2⌉) )
= (1/b(n)) ( (k ∑_{i=1}^{⌈k/t_2⌉−1} X_(i))/(⌈k/t_1⌉ − 1) + (k ∑_{i=⌈k/t_2⌉}^{⌈k/t_1⌉−1} X_(i))/(⌈k/t_1⌉ − 1) − k X_(⌈k/t_1⌉), (k ∑_{i=1}^{⌈k/t_2⌉−1} X_(i))/(⌈k/t_2⌉ − 1) − k X_(⌈k/t_2⌉) )
⇒ (t_1, t_2) S_{1/ξ}.

By similar arguments we can also show that for any 1 ≤ t_1 < t_2 < ⋯ < t_m ≤ N,

(Y_n^{(2)}(t_1), …, Y_n^{(2)}(t_m)) ⇒ (t_1, …, t_m) S_{1/ξ}.

From [2, Theorem 13.3, p. 141], the proof of (3.10) will be complete if we show for any ε > 0,

lim_{δ→0} lim_{n→∞} P[ w_N(Y_n^{(2)}, δ) ≥ ε ] = 0,

where for any g ∈ D[1, N],

w_N(g, δ) = sup_{1≤t_1≤t≤t_2≤N, t_2−t_1≤δ} { |g(t) − g(t_1)| ∧ |g(t_2) − g(t)| }.

Fix any ε > 0 and choose n large enough such that X_(k) > 0 and k/N > 1. Then for any 1 ≤ t_1 ≤ t_2 ≤ N,

|Y_n^{(2)}(t_2) − Y_n^{(2)}(t_1)| = (1/b(n)) | (k/(⌈k/t_2⌉ − 1)) ∑_{i=1}^{⌈k/t_2⌉−1} X_(i) − k X_(⌈k/t_2⌉) − (k/(⌈k/t_1⌉ − 1)) ∑_{i=1}^{⌈k/t_1⌉−1} X_(i) + k X_(⌈k/t_1⌉) |


≤ (1/b(n)) ( k/(⌈k/t_2⌉ − 1) − k/(⌈k/t_1⌉ − 1) ) ∑_{i=1}^{⌈k/N⌉−1} X_(i)
+ (1/b(n)) ( (k/(⌈k/t_2⌉ − 1)) ∑_{i=⌈k/N⌉}^{⌈k/t_2⌉−1} X_(i) + k X_(⌈k/t_2⌉) + (k/(⌈k/t_1⌉ − 1)) ∑_{i=⌈k/N⌉}^{⌈k/t_1⌉−1} X_(i) + k X_(⌈k/t_1⌉) )
≤ (1/b(n)) ( k/(⌈k/t_2⌉ − 1) − k/(⌈k/t_1⌉ − 1) ) ∑_{i=1}^{⌈k/N⌉−1} X_(i) + (4kN/b(n)) X_(⌈k/N⌉)
=: U_{n,N}(t_1, t_2) ⇒ (t_2 − t_1) S_{1/ξ}.

Therefore, using the form of the function U_{n,N} we get

lim_{δ→0} lim_{n→∞} P[ w_N(Y_n^{(2)}, δ) ≥ ε ] ≤ lim_{δ→0} lim_{n→∞} P[ sup_{1≤t_1≤t_2≤N, t_2−t_1≤δ} |Y_n^{(2)}(t_2) − Y_n^{(2)}(t_1)| ≥ ε ]
≤ lim_{δ→0} lim_{n→∞} P[ sup_{1≤t_1≤t_2≤N, t_2−t_1≤δ} U_{n,N}(t_1, t_2) > ε ] = lim_{δ→0} P[ δ S_{1/ξ} ≥ ε ] = 0.

Hence we have proved (3.10).

Now we prove the statement of the theorem. By Proposition 6.10, page 87 in [20], it suffices to show that for any continuous function f : R^2 → R_+ with compact support,

lim_{n→∞} E[ sup_{x∈S_n} f(x) ] = E[ sup_{x∈S} f(x) ].

Suppose f : R^2 → R_+ is a continuous function with compact support. By the Skorokhod representation theorem (see Theorem 6.7 in [2, p. 70]) there exist a probability space (Ω, G, P) and random elements Y_n^*(t) and Y^*(t) in D[1, ∞) such that Y_n =_d Y_n^*, Y =_d Y^* and Y_n^*(t)(ω) → Y^*(t)(ω) in D[1, ∞) for every ω ∈ Ω. Now observe that

sup_{x∈S} f(x) =_d sup_{t≥1} f(Y^*(t)) and sup_{x∈S_n} f(x) =_d sup_{t≥1} f(Y_n^*(t)).

Since f is continuous we get

sup_{t≥1} f(Y_n^*(t)) → sup_{t≥1} f(Y^*(t)) P-a.s.,

and since f is bounded we apply the dominated convergence theorem to get

lim_{n→∞} E[ sup_{x∈S_n} f(x) ] = lim_{n→∞} E[ sup_{t≥1} f(Y_n^*(t)) ] = E[ sup_{t≥1} f(Y^*(t)) ] = E[ sup_{x∈S} f(x) ],

    and that completes the proof of the theorem when ξ > 1.


(ii) As in the proof of part (i), we will first prove that in D^2[1, ∞),

Y_n(t) := (X_(⌈k/t⌉)/b(n/k), M̂(X_(⌈k/t⌉))/b(n/k) − (k/⌈k/t⌉) C_{n,k}/b(n)) ⇒ Y(t) := t (1, S_1 − 1 − log t). (3.15)

We will use the following facts:

(A) Csorgo and Mason [5] showed that for any k_n → ∞ satisfying k_n/n → 0,

(1/b(n)) ( ∑_{i=1}^{k_n} X_(i) − C_{n,k} ) ⇒ S_1 in R.

(B) For k → ∞ with k/n → 0, (3.11) still holds with ξ = 1.

By the same arguments as were used in part (i), it suffices to prove that for any arbitrary N > 1,

Y_n^{(2)}(t) ⇒ Y^{(2)}(t) in D[1, N].

Observe that from (3.11) and the assumption that k b(n/k)/b(n) → 1 we get, for any t > 1,

(1/b(n)) ∑_{i=⌈k/t⌉}^{k} X_(i) = ((1 + o(1))/(k b(n/k))) ∑_{i=⌈k/t⌉}^{k} X_(i) →^P log t. (3.16)

The reason for this is that

(1/k) ∑_{i=⌈k/t⌉}^{k} X_(i)/b(n/k) = ∫_{X_(k)/b(n/k)}^{X_(⌈k/t⌉)/b(n/k)} x ν_n(dx) →^P ∫_1^t x · x^{−2} dx = log t,

where ν_n(dx) = (1/k) ∑_{i=1}^n δ_{X_(i)/b(n/k)}(dx) → x^{−2} dx; see (3.5) and (3.11). Now fix any 1 ≤ t ≤ N and note that

Y_n^{(2)}(t) = M̂(X_(⌈k/t⌉))/b(n/k) − k C_{n,k}/(⌈k/t⌉ b(n))
= (1/(k b(n/k))) ( (k/(⌈k/t⌉ − 1)) ∑_{i=1}^{⌈k/t⌉−1} X_(i) ) − X_(⌈k/t⌉)/b(n/k) − k C_{n,k}/(⌈k/t⌉ b(n))
= ((1 + o_p(1)) t/b(n)) ( ∑_{i=1}^{k} X_(i) − C_{n,k} ) − X_(⌈k/t⌉)/b(n/k) − ((1 + o(1)) t/b(n)) ∑_{i=⌈k/t⌉}^{k} X_(i)
⇒ t S_1 − t − t log t.

We complete the proof using the same arguments as were used in part (i). □

    3.2. Negative slope

The case when ξ < 0 is characterized by the following theorem, which is a combination of Theorems 3.3.12 and 3.4.13(b) in [13]:

Theorem 3.6. If ξ < 0 then the following are equivalent for a distribution function F:

(1) F has a finite right end point x_F and F̄(x_F − x^{−1}) ∈ RV_{1/ξ}.


(2) F is in the maximal domain of attraction of a Weibull distribution with parameter −1/ξ, i.e.,

F^n(x_F + c_n x) → exp{−(−x)^{−1/ξ}} for all x ≤ 0,

where c_n = x_F − F^←(1 − n^{−1}).
(3) There exists a measurable function β(u) such that

lim_{u→x_F} sup_{u≤x≤x_F} |F_u(x) − G_{ξ,β(u)}(x)| = 0.

Here we again get a characterization of this class of distributions in terms of the behavior of the maxima of i.i.d. random variables and the excess distribution. Using Theorem 3.6(1) and Karamata's Theorem [3, Theorem 1.5.11, p. 28] we get that M(u)/(x_F − u) ∼ ξ/(ξ − 1) as u → x_F. We show that this behavior is observed empirically. The Pickands–Balkema–de Haan Theorem, part (3) of Theorem 3.6, does not explicitly construct the scale parameter β(u), but as in Remark 3.3 one can show that β(u)/(x_F − u) → −ξ as u → x_F.

Theorem 3.7. Suppose (X_n, n ≥ 1) are i.i.d. random variables with distribution F which has a finite right end point x_F and satisfies 1 − F(x_F − x^{−1}) ∈ RV_{1/ξ} as x → ∞ for some ξ < 0. Then, in F,

S_n := (1/(X_(1) − X_(k))) {(X_(i) − X_(k), M̂(X_(i))) : 1 < i ≤ k} →^P S := {(t, (ξ/(1 − ξ))(t − 1)) : 0 ≤ t ≤ 1}.

Remark 3.8. As in Section 3.1 we look at a modified version of the mean excess plot. Here we scale and relocate the points of the plot near the right end point. We may interpret this result as

{(X_(i), M̂(X_(i))) : 1 < i ≤ k} ≈ (X_(k), 0) + (X_(1) − X_(k)) S,

where S = {(t, (ξ/(1 − ξ))(t − 1)) : 0 ≤ t ≤ 1}.

Proof. The proof is similar to that of Theorem 3.2. From Theorem 5.3(ii), p. 139 in [22], we get

ν_n := (1/k) ∑_{i=1}^n δ_{(x_F − X_i)/c_{⌈n/k⌉}} ⇒ ν in M_+[0, ∞),

where ν[0, x) = x^{−1/ξ} for all x ≥ 0 and c_n = x_F − F^←(1 − n^{−1}). Following the arguments used in the proof of Theorem 4.2 in [22, p. 81] we also get

ν̂_n := (1/k) ∑_{i=1}^n δ_{(x_F − X_i)/(x_F − X_(k))} ⇒ ν in M_+[0, ∞). (3.17)

Here we can represent M̂(X_(⌈ku⌉)) in terms of the empirical measure as

M̂(X_(⌈ku⌉)) = (k (x_F − X_(k))/(⌈ku⌉ − 1)) ∫_0^{(x_F − X_(⌈ku⌉))/(x_F − X_(k))} ν̂_n[0, x) dx


and taking the same route as in Theorem 3.2 we get

S_n(u) = ( (x_F − X_(⌈ku⌉))/(x_F − X_(k)), M̂(X_(⌈ku⌉))/(x_F − X_(k)) ) →^P S(u) = ( u^{−ξ}, (ξ/(ξ − 1)) u^{−ξ} )

in D_l^2(0, 1]. From this we get, in the Fell topology,

{ ( (x_F − X_(i))/(x_F − X_(k)), M̂(X_(i))/(x_F − X_(k)) ) : 1 ≤ i ≤ k } →^P { (t, (ξ/(ξ − 1)) t) : 0 ≤ t ≤ 1 }.

Finally, using the fact that

(X_(1) − X_(k))/(x_F − X_(k)) →^P 1,

and the identity

(1/(X_(1) − X_(k))) { (X_(i) − X_(k), M̂(X_(i))) : 1 < i ≤ k }
= ((x_F − X_(k))/(X_(1) − X_(k))) { ( (X_(i) − X_(k))/(x_F − X_(k)), M̂(X_(i))/(x_F − X_(k)) ) : 1 < i ≤ k }
= ((x_F − X_(k))/(X_(1) − X_(k))) { ( 1 − (x_F − X_(i))/(x_F − X_(k)), M̂(X_(i))/(x_F − X_(k)) ) : 1 < i ≤ k },

we get the final result. □

3.3. Zero slope

The next result follows from Theorems 3.3.26 and 3.4.13(b) in [13] and Proposition 1.4 in [21].

Theorem 3.9. The following conditions are equivalent for a distribution function F with right end point x_F ≤ ∞:

(1) There exists z < x_F such that F has a representation

F̄(x) = c(x) exp{ −∫_z^x (1/a(t)) dt } for all z < x < x_F, (3.18)

where c(x) is a measurable function satisfying c(x) → c > 0 as x → x_F, and a(x) is a positive, absolutely continuous function with density a′(x) → 0 as x → x_F.

(2) F is in the maximal domain of attraction of the Gumbel distribution, i.e.,

F^n(c_n x + d_n) → exp{−e^{−x}} for all x ∈ R,

where d_n = F^←(1 − n^{−1}) and c_n = a(d_n).
(3) There exists a measurable function β(u) such that

lim_{u→x_F} sup_{u≤x≤x_F} |F_u(x) − G_{0,β(u)}(x)| = 0.

Theorem 3.3.26 in [13] also says that a possible choice of the auxiliary function a(x) in (3.18) is

a(x) = ∫_x^{x_F} (F̄(t)/F̄(x)) dt for all x < x_F,


and for this choice the auxiliary function is the ME function, i.e., a(x) = M(x). Furthermore, we also know that a′(x) → 0 as x → x_F, and this implies that M(u)/u → 0 as u → x_F.

A prime example in this class is the exponential distribution, for which the ME function is a constant. The domain of attraction of the Gumbel distribution is a very big class, including distributions as diverse as the normal and the log-normal. It is indexed by auxiliary functions which only need to satisfy a′(x) → 0 as x → x_F. Since M(x) is a choice for the auxiliary function a(x), the class of ME functions corresponding to the domain of attraction of the Gumbel distribution is very large.

Theorem 3.10. Suppose (X_n, n ≥ 1) are i.i.d. random variables with distribution F which satisfies any one of the conditions in Theorem 3.9. Then, in F,

S_n := (1/(X_(⌈k/2⌉) − X_(k))) {(X_(i) − X_(k), M̂(X_(i))) : 1 < i ≤ k} →^P S := {(t, 1) : t ≥ 0}.

Proof. This is again similar to the proof of Theorem 3.2. Using Theorem 3.9(2) we get

n F̄(c_n x + d_n) → e^{−x} for all x ∈ R.

Since n/k_n → ∞ we also get

(n/k) F̄(c_{⌈n/k⌉} x + d_{⌈n/k⌉}) → e^{−x} for all x ∈ R,

and then Theorem 5.3(ii) in [22, p. 139] gives us

ν_n := (1/k) ∑_{i=1}^n δ_{(X_i − d_{⌈n/k⌉})/c_{⌈n/k⌉}} ⇒ ν in M_+(R),

where ν(x, ∞) = e^{−x} for all x ∈ R. Following the arguments in the proof of Theorem 4.2 in [22, p. 81] we get

(X_(k) − d_{⌈n/k⌉})/c_{⌈n/k⌉} →^P 0,

and then

ν̂_n := (1/k) ∑_{i=1}^n δ_{(X_i − X_(k))/c_{⌈n/k⌉}} ⇒ ν in M_+(R).

Now, one can easily establish the identity between the empirical mean excess function and the empirical measure:

M̂(X_(⌈ku⌉)) = (k c_{⌈n/k⌉}/(⌈ku⌉ − 1)) ∫_{(X_(⌈ku⌉) − X_(k))/c_{⌈n/k⌉}}^∞ ν̂_n(x, ∞) dx.

From this fact it follows that

S_n(u) = ( (X_(⌈ku⌉) − X_(k))/c_{⌈n/k⌉}, M̂(X_(⌈ku⌉))/c_{⌈n/k⌉} ) →^P S(u) = (−ln u, 1)

in D_l^2(0, 1], and that in turn implies

{ ( (X_(i) − X_(k))/c_{⌈n/k⌉}, M̂(X_(i))/c_{⌈n/k⌉} ) : 1 ≤ i ≤ k } →^P {(t, 1) : t ≥ 0}.


Finally, using the fact that

(X_(⌈k/2⌉) − X_(k))/c_{⌈n/k⌉} →^P ln 2,

we get the desired result. □

    4. Comparison with other methods of extreme value analysis

For i.i.d. random variables from a distribution in the maximal domain of attraction of the Frechet, Weibull or Gumbel distributions, Theorems 3.2, 3.7 and 3.10 describe the asymptotic behavior of the ME plot for high thresholds. Linearity of the ME plot for high order statistics indicates there is no evidence against the hypothesis that the GPD model is a good fit for the thresholded data.

Furthermore, we obtain a natural estimate ξ̂ of ξ by fitting a line to the linear part of the ME plot using least squares to get a slope estimate b̂ and then recovering ξ̂ = b̂/(1 + b̂). Although natural, convergence of the ME plot to a linear limit does not guarantee consistency of this estimate ξ̂, and this is still under consideration. Proposition 5.1.1 in [6] explains why the slope of the least squares line is not a continuous functional of finite random sets.

Davison and Smith [7] give another method for estimating ξ. They suggest a way of finding a threshold using the ME plot and then fit a GPD to the points above the threshold using maximum likelihood estimation. For both this and the LS method, the ME plot obviously plays a central role. We analyze several simulation and real data sets in Sections 5 and 6 using only the LS method.

With any method, an important step is the choice of threshold, guided by the ME plot, such that the plot is roughly linear above this threshold. Threshold choice can be challenging, and parameter estimates can be sensitive to the threshold choice, especially when real data are analyzed.

The ME plot is only one of a suite of widely used tools for extreme value model selection. Other techniques are using the Hill plot, the Pickands plot, the moment estimator plot and the QQ plot; cf. [22, Chapter 4] and [10]. Some comparisons from the point of view of asymptotic bias and variance appear in [11]. Here we review definitions and basic facts about several methods, assuming that X_1, …, X_n is an i.i.d. sample from a distribution in the maximal domain of attraction of an extreme value distribution; an illustrative code sketch of these estimators follows the list. The asymptotics require k = k_n, the number of upper order statistics used for estimation, to be a sequence increasing to ∞ such that k_n/n → 0.

(1) The Hill estimator based on m upper order statistics is

H_{m,n} = ( (1/m) ∑_{i=1}^m log(X_(i)/X_(m+1)) )^{−1}, 1 ≤ m ≤ n.

If ξ > 0 then H_{k_n,n} →^P α = 1/ξ. The Hill plot is the plot of the points {(k, H_{k,n}) : 1 ≤ k ≤ n}.
(2) The Pickands estimator does not impose any restriction on the range of ξ. The Pickands estimator,

ξ̂_{m,n} = (1/log 2) log( (X_(m) − X_(2m))/(X_(2m) − X_(4m)) ), 1 ≤ m ≤ [n/4],

is consistent for ξ ∈ R; i.e., ξ̂_{k_n,n} →^P ξ as n → ∞. The Pickands plot is the plot of the points {(m, ξ̂_{m,n}) : 1 ≤ m ≤ [n/4]}.


(3) The QQ plot treats the cases ξ > 0 and ξ < 0 separately. When ξ > 0, the QQ plot consists of the points Q_{m,n} := {(−log(i/m), log(X_(i)/X_(m))) : 1 ≤ i ≤ m}, where m < n is a suitably chosen integer. Das and Resnick [6] showed Q_{k_n,n} → {(t, ξt) : t ≥ 0} in F equipped with the Fell topology. So the limit is a line with slope ξ and the LS estimator is consistent [6,18].
When ξ < 0, the QQ plot can be defined as the plot of the points Q′_{m,n} := {(X_(i), G^←_{ξ̂,1}(i/(n + 1))) : 1 ≤ i ≤ n}, where ξ̂ is an estimate of ξ based on m upper order statistics.

(4) The moment estimator [12,10] is another method which works for all ξ ∈ R and is defined as

ξ̂^{(moment)}_{m,n} = H^{(1)}_{m,n} + 1 − (1/2) ( 1 − (H^{(1)}_{m,n})^2 / H^{(2)}_{m,n} )^{−1}, 1 ≤ m ≤ n,

where

H^{(r)}_{m,n} = (1/m) ∑_{i=1}^m ( log(X_(i)/X_(m+1)) )^r, r = 1, 2.

The moment estimator plot is the plot of the points {(k, ξ̂^{(moment)}_{k,n}) : 1 ≤ k ≤ n}. The moment estimator is consistent for ξ.

    (5) To complete this survey, recall that the ME plot converges to a non-random line when ξ < 1.
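For illustration only, here is a compact R sketch of the Hill, Pickands and moment estimators reviewed above; the function names are ours, x is the sample, and m is the number of upper order statistics used (with 4m ≤ n required for the Pickands estimator).

hill <- function(x, m) {                      # estimates alpha = 1/xi, for xi > 0
  xs <- sort(x, decreasing = TRUE)
  1 / mean(log(xs[1:m] / xs[m + 1]))
}
pickands <- function(x, m) {                  # estimates xi, any xi in R
  xs <- sort(x, decreasing = TRUE)
  log((xs[m] - xs[2 * m]) / (xs[2 * m] - xs[4 * m])) / log(2)
}
moment_est <- function(x, m) {                # moment (Dekkers-Einmahl-de Haan) estimator of xi
  xs <- sort(x, decreasing = TRUE)
  l  <- log(xs[1:m] / xs[m + 1])
  h1 <- mean(l); h2 <- mean(l^2)
  h1 + 1 - 0.5 / (1 - h1^2 / h2)
}

Plotting these estimates against m = 1, 2, … gives the Hill, Pickands and moment estimator plots described above.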

The Hill and QQ plots work best for ξ > 0, and the ME plot requires the knowledge that ξ < 1. Each plot requires the data to be properly thresholded. The ME plot requires thresholding but also that k be sufficiently large that proper averaging takes place.

    5. Simulation

We divide this section into three subsections. In Section 5.1 we show simulation results for the mean excess plot of some standard distributions with well-behaved tails. In Sections 5.2 and 5.3 we discuss simulation results for some distributions with either difficult tail behavior or infinite mean.

    5.1. Standard situations

5.1.1. Pareto distribution
The obvious first choice for a distribution function to simulate from is the GPD. For the GPD the ME plot should be roughly linear. We simulate 50 000 random variables from the Pareto(2) distribution. This means that the parameters of the GPD are ξ = 0.5 and β = 1. Fig. 1 shows the mean excess plot for this data set. Observe that in Fig. 1(a) the first part of the plot is quite linear but it is scattered for very high order statistics. The reason behind this phenomenon is that the empirical mean excess function for high thresholds is the average of the excesses of a small number of upper order statistics. When averaging over few values there is high variability, and therefore this part of the plot appears very non-linear and is uninformative. In Fig. 1(b) we zoom into the plot by leaving out the top 250 points. We calculate using all the data but plot only the points {(X_(i), M̂(X_(i))) : 250 ≤ i ≤ 50 000}. This restricted plot looks linear. We fit a least squares line to this plot and the estimate of the slope is 0.9701. Since the slope is ξ/(1 − ξ) we get the estimate of ξ to be 0.5076.
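For readers who wish to reproduce this type of experiment, the following R sketch (our own code, with illustrative variable names; the numerical output will differ from the figures reported above) simulates Pareto(2) data, computes the ME plot, drops the highest order statistics and recovers ξ̂ = b̂/(1 + b̂) from the least squares slope b̂.

set.seed(1)
n  <- 50000
x  <- sort((1 - runif(n))^(-1/2), decreasing = TRUE)   # Pareto(2): P[X > x] = x^{-2}, x >= 1
i  <- 2:n
me <- cumsum(x)[i - 1] / (i - 1) - x[i]                # M.hat(X_(i)) for i = 2,...,n
u  <- x[i]                                             # thresholds X_(i)
keep <- i >= 250                                       # leave out the largest ~250 order statistics
b  <- unname(coef(lm(me[keep] ~ u[keep]))[2])          # least squares slope, roughly xi/(1 - xi)
b / (1 + b)                                            # estimate of xi, expected to be near 0.5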


Fig. 1. ME plot {(X_(i), M̂(X_(i))), 1 ≤ i ≤ 50 000} of 50 000 random variables from the Pareto(2) distribution (ξ = 0.5). (a) Entire plot, (b) order statistics 250–50 000.

Fig. 2. ME plot of 50 000 random variables from the totally right-skewed stable(1.5) distribution (ξ = 2/3). (a) Entire plot, (b) order statistics 120–30 000, (c) 180–20 000, (d) 270–10 000.

    5.1.2. Right-skewed stable distribution

We next simulate 50 000 random samples from a totally right-skewed stable(1.5) distribution. So F̄ ∈ RV_{−1.5} and then ξ = 2/3. Fig. 2(a) is the ME plot obtained from this data set. This is not a sample from a GPD, but only from a distribution in the same maximal domain of attraction. The ME function is not exactly linear, and for estimating ξ we should concentrate on high thresholds. As we did for the last example, we drop points in the plot for very high order statistics since they are the averages of very few values. Fig. 2(b), (c) and (d) confine the plot to the order statistics 120–30 000, 180–20 000 and 270–10 000 respectively, i.e., plot the points (X_(i), M̂(X_(i))) for i in the specified range. As we restrict the plot more and more, it becomes increasingly linear. In Fig. 2(d) the least squares estimate of the slope of the line is 1.763 and hence the estimate of ξ is 0.638.


Fig. 3. ME plot of 50 000 random variables from the beta(2, 2) distribution (ξ = −0.5). (a) Entire plot, (b) order statistics 150–35 000, (c) 300–20 000, (d) 450–5000.

5.1.3. Beta distribution
Fig. 3 gives the ME plot for 50 000 random variables from the beta(2, 2) distribution, which is in the maximal domain of attraction of the Weibull distribution with parameter ξ = −0.5. Fig. 3(a) is the entire ME plot, and Fig. 3(b), (c) and (d) plot the empirical ME function for the order statistics 150–35 000, 300–20 000 and 450–5000 respectively. The last plot is quite linear and the estimate of ξ is −0.5361.

    5.2. Difficult cases

5.2.1. Log-normal distribution
The log-normal(0, 1) distribution is in the maximal domain of attraction of the Gumbel distribution and hence ξ = 0. The ME function of the log-normal has the form

M(u) = (u/ln u)(1 + o(1)) as u → ∞;

see Table 3.4.7 in [13, p. 161]. So M(u) is regularly varying of index 1 but still M′(u) → 0. Fig. 4(a) shows the ME plot obtained for a sample of size 10^5 from the log-normal(0, 1) distribution. Fig. 4(b), (c) and (d) show the empirical ME functions for the order statistics 150–70 000, 300–40 000 and 450–10 000 respectively. The slopes of the least squares lines in Fig. 4(b), (c) and (d) are 0.3351, 0.3112 and 0.267 respectively. The estimate of ξ also decreases steadily as we zoom in towards the higher order statistics, from 0.251 in 4(b) to 0.2107 in 4(d). Furthermore, a curve is evident in the plots and the slope of the curve is decreasing, albeit very slowly, as we look at higher and higher thresholds. At first glance the ME function might seem to resemble that of a distribution in the maximal domain of attraction of the Frechet distribution.


Fig. 4. ME plot of 10^5 random variables from the log-normal(0, 1) distribution (ξ = 0). (a) Entire plot, (b) order statistics 150–70 000, (c) 300–40 000, (d) 450–10 000.

The curve becomes evident only after a detailed analysis of the plot. That is possible because the data are simulated, but in practice such an analysis would be difficult. For this example, the ME plot is not a very effective diagnostic for discerning the model.

5.2.2. A non-standard distribution
We also try a non-standard distribution for which F̄^{−1}(x) = x^{−1/2}(1 − 10 ln x), 0 < x ≤ 1. This means that F̄ ∈ RV_{−2} and therefore ξ = 0.5. The exact form of F̄ is given by

F̄(x) = 400 W(x e^{1/20}/20)^2 x^{−2} for all x ≥ 1, (5.1)

where W is the Lambert W function satisfying W(x) e^{W(x)} = x for all x > 0. Observe that W(x) → ∞ as x → ∞ and W(x) ≤ log(x) for x > 1. Furthermore,

log(x)/W(x) = 1 + log W(x)/W(x) → 1 as x → ∞,

and hence W(x) is a slowly varying function. This is therefore an example where the slowly varying term contributes significantly to F̄. That was not the case in the Pareto or the stable examples. We simulated 10^5 random variables from this distribution. Fig. 5(a) gives the entire ME plot from this data set. Fig. 5(b) and (c) plot the ME function for the order statistics 150–70 000 and 400–20 000 respectively. In Fig. 5(c) the estimate of ξ is 0.6418, which is a somewhat disappointing estimate given that the sample size was 10^5. Fig. 5(d) is the Hill plot from this data set using the QRMlib package in R. It plots the estimate of α = 1/ξ obtained by choosing different values of k. It is evident that the Hill estimator does not perform well here. For none of the values of k is the Hill estimator even close to the true value of α, which is 2.


Fig. 5. ME plot of 10^5 random variables from the distribution in (5.1) (ξ = 0.5). (a) Entire plot, (b) order statistics 150–70 000, (c) 400–20 000, (d) Hill plot estimating α = 1/ξ.

Fig. 6. ME plot of 50 000 random variables from the Pareto(0.5) distribution (ξ = 2). (a) Entire plot, (b) order statistics 250–10 000.

We conclude, not surprisingly, that a slowly varying function increasing to infinity can fool both the ME plot and the Hill plot. See [8] for a discussion of the behavior of the ME plot for a sample simulated from the g-and-h distribution and [22] for Hill horror plots.

5.3. Infinite mean: Pareto distribution with ξ = 2

This simulation sheds light on the behavior of the ME plot when ξ > 1. In this case the ME function does not exist but the empirical ME plot does. Fig. 6 displays the ME plot for 50 000 random variables simulated from the Pareto(0.5) distribution. The plot is certainly far from linear even for high order statistics, and the least squares line has slope 7780.84, which gives an estimate of ξ as 0.9999. This certainly gives an indication that the ME plot is not a good diagnostic in this case.


Fig. 7. Internet response sizes. (a) Scatter plot, (b) Hill plot estimating α = 1/ξ, (c) ME plot, (d) ME plot for order statistics 300–12 500, (e) Pickands plot for ξ, (f) QQ plot with k = 15 000, (g) QQ plot with k = 5000.

    6. ME plots for real data

    6.1. The size of the Internet response

This data set consists of Internet response sizes corresponding to user requests. The sizes are thresholded to be at least 100 KB. The data set is a part of a bigger set collected in April 2000 at the University of North Carolina at Chapel Hill.

Fig. 7 contains various plots of the data. Fig. 7(b) and (e) are the Hill plot (estimating 1/ξ) and the Pickands plot respectively. It is difficult to infer anything from these plots, though superficially they appear stable. Fig. 7(c) and (d) are the entire ME plot and the ME plot restricted to order statistics 300–12 500. The second plot does seem to be very linear and gives an estimate of ξ as 0.5908. Fig. 7(f) and (g) are the QQ plots for the data for k = 15 000 and k = 2500 (as explained in Section 4). The estimates of ξ in these two plots are 0.8851 and 0.6362. The estimates of ξ obtained from the ME plot 7(d) and the QQ plot 7(g) are close and the plots are also linear. So we believe that this is a reasonable estimate of ξ.

    6.2. The volume of water in the Hudson river

We now analyze data on the average daily discharge of water (in cubic feet per second) in the Hudson river measured at the US Geological Survey site number 01318500 near Hadley, NY. The range of the data is from July 15, 1921 to December 31, 2008, for a total of 31 946 data points.


Fig. 8. Daily discharge of water in the Hudson river. (a) Time series plot, (b) homoscedasticized plot, (c) residual plot, (d) ACF of residuals, (e) Hill plot for α = 1/ξ, (f) ME plot, (g) ME plot for order statistics 300–1300, (h) Pickands plot, (i) QQ plot with k = 8000, (j) QQ plot with k = 600.

Fig. 8(a) is the time series plot of the original data and it shows the presence of periodicity in the data. The volume of water is typically much higher in April and May than the rest of the year, which possibly is due to snow melt. We 'homoscedasticize' the data in the following way. We compute the standard deviation of the average discharge of water for every day of the year and then divide each data point by the standard deviation corresponding to that day. If the original data set is say (X_{7/15/1921}, …, X_{12/31/2008}), then we transform it to (X_{7/15/1921}/S_{7/15}, …, X_{12/31/2008}/S_{12/31}), where S_{7/15} is the standard deviation of the data points obtained on July 15 in the different years in the range of the data and similarly S_{12/31} is the same for December 31. The plot of the transformed points is given in 8(b). We then fit an AR(33) model to this data set using the function ar in the stats package in R. The lag was chosen based on the AIC criterion. Fig. 8(c) and (d) show the residuals and their ACF plot respectively. This encourages us to assume that there is no linear dependence in the residuals.
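A minimal R sketch of this preprocessing follows; the data frame flow with columns date and discharge is hypothetical (not the paper's data), and ar's default AIC-based order selection stands in for the AR(33) fit described above.

## `flow` is a hypothetical data frame with a Date column `date` and numeric `discharge`
doy   <- format(flow$date, "%m-%d")          # calendar day, e.g. "07-15"
s_doy <- tapply(flow$discharge, doy, sd)     # per-calendar-day standard deviation
z     <- flow$discharge / s_doy[doy]         # 'homoscedasticized' series
fit   <- ar(z, order.max = 40)               # AR model, order chosen by AIC
res   <- as.numeric(na.omit(fit$resid))      # residuals used for the extreme value analysis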

We now apply the tools for extreme value analysis to the residuals. Fig. 8(e) is the Hill plot, and it is difficult to draw any inference from this plot in this case. Fig. 8(f) and (g) are the entire ME plot and the ME plot restricted to the order statistics 300–1300. From 8(g) we get an estimate of ξ as 0.261. The Pickands plot in 8(h) and the QQ plots in 8(i) and (j) suggest an estimate of ξ around 0.4. A definite curve is visible in the QQ plot even for k = 600. But the slope of the least squares line fitting the QQ plot supports the estimate suggested by the Pickands plot and the ME plot. We see that it is difficult to reach a conclusion about the range of ξ. Still, we infer that 0.4 is a reasonable estimate of ξ for this data set since that is suggested by two different methods.

    6.3. The ozone level in New York City

We also apply the methods to a data set obtained from http://www.epa.gov/ttn/airs/aqsdatamart. This is the data on daily maxima of the level of ozone (in parts per million) in New York City from measurements closest to the ground level, observed between January 1, 1980 and December 31, 2008.

Fig. 9(a) is the time series plot of the data. This data set also showed a seasonal component which accounted for high values during the summer months. We transform the data set to a homoscedastic series (Fig. 9(b)) using the same technique as was explained in Section 6.2. Fitting an AR(16) model we get the residuals, which are uncorrelated; see Fig. 9(c) and (d).

The Hill plot in Fig. 9(e) again fails to give a reasonable estimate of the tail index. The ME plots in Fig. 9(f) and (g) are also very rough. Fig. 9(g) is the plot of the points (X_(i), M̂(X_(i))) for 300 ≤ i ≤ 1300, and the least squares line fitting these points has slope 0.0472, which gives an estimate of ξ as 0.0451. This is consistent with the Pickands plot in 9(h). This suggests that the residuals may be in the domain of attraction of the Gumbel distribution.

    7. Conclusion

    The ME plot may be used as a diagnostic to aid in tail or quantile estimation for riskmanagement and other extreme value problems. However, some problems associated with itsuse certainly exist:

    • One needs to trim away {(X(i), M̂(X(i)))} for small values of i where too few terms areaveraged and also trim irrelevant terms for large values of i which are governed by either thecenter of the distribution or the left tail. So two discretionary cuts to the data need be madewhereas for other diagnostics only one threshold needs to be selected.• The analyst needs to be convinced that ξ < 1 since for ξ ≥ 1 random sets are the limits for the

    normalized ME plot. Such random limits could create misleading impressions. The Pickandsand moment estimators place no such restriction on the range of ξ . The QQ method worksmost easily when ξ > 0 but can be extended to all ξ ∈ R. The Hill method requires ξ > 0.• Distributions not particularly close to GPD can fool the ME diagnostic. However, fairness

    requires pointing out that this is true of all the procedures in the extreme value catalogue. Inparticular, with heavy tail distributions, if a slowly varying factor is attached to a Pareto tail,diagnostics typically perform poorly.

The standing assumption for the proofs in this paper is that {X_n} is i.i.d. We believe that most of the results on the ME plot hold under the assumption that the underlying sequence {X_n} is


Fig. 9. Ozone level in New York City. (a) Time series plot, (b) homoscedasticized plot, (c) residual plot, (d) ACF of residuals, (e) Hill plot for α = 1/ξ, (f) ME plot, (g) ME plot for order statistics 300–1300, (h) Pickands plot, (i) QQ plot with k = 4000, (j) QQ plot with k = 550.

stationary and the tail empirical measure is consistent for the limiting GPD distribution of the marginal distribution of X_1. We intend to look into this further. Other open issues engaging our attention include converses to the consistency of the ME plot and whether the slope of the least squares line through the ME plot is a consistent estimator.

    Acknowledgements

S. Ghosh was partially supported by the FRA program at Columbia University. S. Resnick was partially supported by ARO Contract W911NF-07-1-0078 at Cornell University. The authors are grateful to the referees and the editors for their helpful and extensive comments.


    References

[1] G. Benktander, C. Segerdahl, On the analytical representation of claim distributions with special reference to excess of loss reinsurance, in: XVIth International Congress of Actuaries, Brussels, 1960.
[2] P. Billingsley, Convergence of Probability Measures, 2nd ed., John Wiley and Sons, New York, 1999.
[3] N.H. Bingham, C.M. Goldie, J.L. Teugels, Regular Variation, Cambridge University Press, 1989.
[4] S. Coles, An Introduction to Statistical Modeling of Extreme Values, Springer-Verlag, London, 2001.
[5] S. Csorgo, D. Mason, The asymptotic distribution of sums of extreme values from a regularly varying distribution, The Annals of Probability 14 (3) (1986) 974–983.
[6] B. Das, S. Resnick, QQ plots, random sets and data from a heavy tailed distribution, Stochastic Models 24 (1) (2008) 103–132.
[7] A. Davison, R. Smith, Models for exceedances over high thresholds, Journal of the Royal Statistical Society, Series B 52 (3) (1990) 393–442.
[8] M. Degen, P. Embrechts, D.D. Lambrigger, The quantitative modeling of operational risk: between g-and-h and EVT, Astin Bulletin 37 (2) (2007) 265.
[9] L. de Haan, An Abel–Tauber theorem for Laplace transforms, Journal of the London Mathematical Society 13 (1976) 537–542.
[10] L. de Haan, A. Ferreira, Extreme Value Theory: An Introduction, Springer-Verlag, New York, 2006.
[11] L. de Haan, L. Peng, Comparison of tail index estimators, Statistica Neerlandica 52 (1) (1998) 60–70.
[12] A. Dekkers, J. Einmahl, L. de Haan, A moment estimator for the index of an extreme-value distribution, The Annals of Statistics 17 (1989) 1833–1855.
[13] P. Embrechts, C. Klüppelberg, T. Mikosch, Modelling Extremal Events, Applications of Mathematics, vol. 33, Springer-Verlag, New York, 1997.
[14] P. Embrechts, A.J. McNeil, R. Frey, Quantitative Risk Management: Concepts, Techniques, and Tools, Princeton University Press, 2005.
[15] F. Guess, F. Proschan, Mean Residual Life: Theory and Applications, Defense Technical Information Center, 1985.
[16] W. Hall, J. Wellner, Mean residual life, Statistics and Related Topics (1981) 169–184.
[17] R. Hogg, S. Klugman, Loss Distributions, Wiley, New York, 1984.
[18] M. Kratz, S. Resnick, The qq-estimator and heavy tails, Stochastic Models 12 (4) (1996) 699–724.
[19] G. Matheron, Random Sets and Integral Geometry, Wiley, New York, 1975.
[20] I.S. Molchanov, Theory of Random Sets, Springer, 2005.
[21] S.I. Resnick, Extreme Values, Regular Variation and Point Processes, Springer-Verlag, Berlin, New York, 1987.
[22] S. Resnick, Heavy-Tail Phenomena: Probabilistic and Statistical Modeling, Springer, 2007.
[23] S. Resnick, C. Starica, Tail index estimation for dependent data, The Annals of Applied Probability (1998) 1156–1183.
[24] R. Smith, Extreme value analysis of environmental time series: an application to trend detection in ground-level ozone, Statistical Science 4 (4) (1989) 367–377.
[25] P. Todorovic, J. Rousselle, Some problems of flood analysis, Water Resources Research 7 (5) (1971) 1144–1150.
[26] P. Todorovic, E. Zelenhasic, A stochastic model for flood analysis, Water Resources Research 6 (6) (1970) 1641–1648.
[27] G. Yang, Estimation of a biometric function, The Annals of Statistics 6 (1) (1978) 112–116.