Expected Shortfall Estimation and Gaussian Inference for ...jbhill.web.unc.edu/files/2018/10/JBHILL_expected_short.pdf · The α-quantile Value-at-Risk (VaR) is −qα > 0, hence

[14:46 15/12/2014 nbt020.tex] JFINEC: Journal of Financial Econometrics Page: 1 1–44

Journal of Financial Econometrics, 2015, Vol. 13, No. 1, 1--44

Expected Shortfall Estimation and GaussianInference for Infinite Variance Time Series†

JONATHAN B. HILL

Department of Economics, University of North Carolina

ABSTRACTWe develop methods of nonparametric estimation for the ExpectedShortfall of possibly heavy tailed asset returns that leads to asymptot-ically standard inference. We use a tail-trimming indicator to dampenextremes negligibly, ensuring standard Gaussian inference, and ahigher rate of convergence than without trimming when the variance isinfinite. Trimming, however, causes bias in small samples and possiblyasymptotically when the variance is infinite, so we exploit a rarelyused remedy to estimate and utilize the tail mean that is removedby trimming. Since estimating the tail mean involves estimation of tailparameters and therefore an added arbitrary choice of the numberof included extreme values, we present weak limit theory for an ESestimator that optimally selects the number of tail observations bymaking our estimator arbitrarily close to the untrimmed estimator, yetstill asymptotically normal. Finally, we apply the new estimators tofinancial returns data. ( JEL: C13, C20, C22)

KEYWORDS: Expected Shortfall, heavy tails, robust estimation, bias correction

Let {yt : −∞ < t < ∞} be a stationary stochastic process in R where yt has anabsolutely continuous distribution, and let qα < 0 be the α-quantile of yt for somesmall α > 0:

P(yt ≤qα

)=α∈ (0,1).

The α-quantile Value-at-Risk (VaR) is −qα > 0, hence the probability a loss −ytexceeds −qα is α. The Expected Shortfall (ES) of yt is the expected loss given the

We than thank two anonymous referees and Co-Editor Eric Renault for comments and suggestions thatlead to an improved manuscript. Address correspondence to Jonathan B. Hill, Department of EconomicsUniversity of North Carolina-Chapel Hill.†This paper was previous circulated under the title “Robust Expected Shortfall Estimation for InfiniteVariance Time Series.”

doi:10.1093/jjfinec/nbt020 Advanced Access publication September 13, 2013© The Author, 2013. Published by Oxford University Press. All rights reserved.For Permissions, please email: [email protected]

at University of N

orth Carolina at C

hapel Hill on January 16, 2015

http://jfec.oxfordjournals.org/D

ownloaded from

http://jfec.oxfordjournals.org/


2 Journal of Financial Econometrics

loss exceeds the α-level VaR:

ESα :=E[−yt|−yt ≥−qα

]=− 1α

E[ytI

(yt ≤qα

)]>0.

In order to justify the existence of the ES we assume integrability, thus E|yt| < ∞.ES is used in finance to provide valuable improvements over VaR as a

coherent measure of risk (Artzner et al., 1999; Acerbi and Tasche, 2002; Scaillet,2004; Chen, 2008; Danielsson, 2011), and falls into a broad category of spectralmeasures of risk that includes the VaR (see Acerbi, 2002, and Tasche, 2002). Seealso McNeil, Frey, and Embrechts (2005) and Christoffersen (2012) for textbooktreatments. Recent studies focus on systemic or market risk, including expectedshortfall for a portfolio, or the marginal impact of expected shortfall for asmall change in the portfolio allocation. As another example, the systematic riskindex of a particular institution is a function of the tail expectation of a firm’sloss conditional on a market event, which generalizes the ES self-conditionalform E[−yt|−yt ≥ −qα]. A related measure is the VaR conditional on the stateof an institution, and the spread between VaR’s conditional on normal anddistressed states of an institution. See Gourieroux, Laurent, and Scaillet (2000),Fermanian and Scaillet (2005), Brownlees and Engle (2010), Acharya et al. (2010),and Adrian and Brunnermeier (2011) amongst many others.

Let {yt}nt=1 be the observed sample of size n≥1. A simple nonparametric

estimator of the ES is

ESn =− 1αn

n∑t=1

ytI(yt ≤ qn,α

),

where qn,α estimates qα . In the literature a finite second moment E[y2t ] < ∞

is almost universally assumed in order to ensure a Gaussian limit for√

n(ESn− ESα). See Scaillet (2004) and Chen (2008) and their references. See alsoCai and Wang (2008) for use of nonparametric conditional distribution estimationfor the ES, and for nonparametric estimation of the ES under netting agreementssee Fermanian and Scaillet (2005).

Otherwise only Linton and Xiao (2013) tackle the heavy tail case E[y2t ] = ∞ by

considering those processes yt with a regularly varying distribution tail

P(∣∣yt

∣∣>c)=c−κL(c) with tail index κ ∈ (0,2), (1)

and L(c) is a slowly varying function.1 Recall a tail index value κ ≤2 implies yt has aninfinite variance (see Leadbetter, Lindgren, and Rootzén, 1983 and Resnick, 1987),hence Linton and Xiao (2013) only treat the infinite variance case. Linton and Xiao(2013) derive the non-Gaussian stable limit of n1−1/κ (ESn −ESα) for geometricallyα-mixing yt with κ ∈ (1,2). Since the limit law is non-Gaussian and depends on the

1Slow variation is defined by limc→∞L(λc)/L(c)=1 ∀λ>0. Classic examples are constants, and the naturallog and its powers, e.g., a(ln(c))b for any a>0 and b≥0 (Resnick, 1987).

at University of N

orth Carolina at C



ownloaded from



HILL | Expected Shortfall Estimation and Gaussian Inference 3

unknown (but estimable) tail index κ , the authors present a subsampling methodfor inference. Notice they do not treat the hairline infinite variance case κ =2 whichneglects, for instance, IGARCH with Gaussian errors (e.g., Mikosch, and Starica,2000) and the popularly used RiskMetric volatility model that intrinsically hasan IGARCH representation (cf. Christoffersen, 2012, p. 23). Evidence for heavytails in general and particularly conditional heteroscedasticity in financial andmacroeconomic returns is substantial (e.g., Embrechts, Klüppleberg, and Mikosch,1997; Rachev and Mittnik, 2000; Davis, 2010; Ibragimov, Ibragimov, and Kattuman,2010). Specifically, an infinite variance appears to exists for some exchange ratesand asset returns, for example assets traded on emerging stock markets, dependingon the time frame (Ling, 2005; Ibragimov, Ibragimov, and Kattuman, 2010), whileevidence for IGARCH effects is even more substantial (see Morana, 2002 andCaporale, Pittis, and Spagnolo, 2003 for examples).

In the case where tails are so heavy that E|yt| = ∞ then the ES doesnot exist and it is no longer a valid solution to the lack of coherency ofthe VaR. See especially Garcia et al. (2007) and Ibragimov, Jaffee, and Walden(2009). Ibragimov, Jaffee, and Walden (2009) and Ibragimov and Walden (2011)provide evidence of extremely heavy tailed economic losses following naturaldisasters. See also Nešlehová, Embrechts, and Demoulin (2006), Powers (2010),Ibragimov and Walden (2011), and Mainik and Embrechts (2012) for infinite meanmodels and the implications for diversification.

Evidently there does not exist a unified estimation methodology for the ES thatleads to classic inference even if E[y2

t ] = ∞. Since ES by construction conditions ona fixed upper tail quantile, in this article we tail-trim yt from below using negativeobservations y(−)

t := ytI(yt < 0), their sample order statistics y(−)(1) ≤ y(−)

(2) ··· ≤ y(−)(n)

≤ 0, and an intermediate order sequence {kn}, where kn → ∞ and kn/n → 0, todetermine the tail-trimming threshold. Our first tail-trimmed ES estimator is then

ES(∗)n =− 1

αn

n∑t=1

ytI(

y(−)(kn) ≤yt ≤ qn,α

).

Thus kn is the number of trimmed left tail extremes, representing an asymptoticallyvanishing and therefore negligible sample tail portion kn/n → 0. Under a geometricα-mixing condition on {yt} and few additional regularity conditions, negligibility

ensures a consistent and asymptotically normal estimator ES(∗)n . In terms of

inference we propose a HAC estimator for the variance of ES(∗)n that allows for

standard inference asymptotically.A fixed quantile lower bound qn,β ≤ yt ≤ qn,α for 0 < β < α implies

a nonnegligible amount of trimming. This does not permit identification a la

ES(∗)n

p→ ESα since nonnegligibility forces the probability limit of |ES(∗)n | to

be strictly less than |ESα|. Only negligible trimming kn/n → 0 ensures bothasymptotic normality and identification of the true ES. Tail-trimming appears inthe literature predominantly for mean estimation (cf. Csörgo, Horváth, and Mason,

at University of N

orth Carolina at C



ownloaded from




1986, Hahn, Kuelbs, and Weiner, 1990), including average treatment effects (e.g.,Chaudhuri and Hill, 2013), and has been used for heavy tail robust momentcondition tests in Hill (2012a) and Hill and Aguilar (2012), and for robust estimationof conditional mean and random volatility models in Hill (2012b, 2013) andHill and Renault (2012).

We cover the general tail index case κ > 1 and show we can always achieve arate of convergence higher than in the untrimmed case when variance is infinite κ

∈ (1,2). Further, we require fewer assumptions than Linton and Xiao (2013) sincethe limit theory here is guided by normal domain of attraction asymptotics fortail-trimmed mixing arrays derived from a general limit theory in Peligrad (1996).Moreover, our results apply to any tail index value κ > 1, covering the hairlineinfinite variance case κ = 2 and finite variance case κ > 2. Nevertheless, in order tosimplify characterizing trimmed moments we impose a stronger tail decay propertythan the general form (1) used in Linton and Xiao (2013).

One-sided tail-trimming, however, necessarily introduces small sample and

possibly asymptotic bias in the limit distribution when E[y2t ] = ∞ since ES

(∗)n

neglects a tail region (−∞,y(−)(kn)] that may vanish too slowly as n → ∞

(Csörgo, Horváth, and Mason, 1986; Peng, 2001). We therefore exploit ideas in Peng

(2001, 2004) that leads to new estimators that augment ES(∗)n based on two sample

versions R(1)n and R(2)

n of the tail mean E[ytI(yt < y(−)(kn))] that is removed by tail-

trimming. Peng (2001) estimates the tail mean with the same fractile kn in allcomponents. Our first tail mean estimator R(1)

n uses a flexible set of fractiles such that

ES(1)n = ES

(∗)n + R(1)

n resolves small sample bias and leads to the same easily estimated

variance as ES(∗)n . Our remaining estimator R(2)

n selects a fractile associated with R(1)n

that forces ES(2)n = ES

(∗)n + R(2)

n to be arbitrarily close to the untrimmed ESn as n → ∞without undermining asymptotic normality, and again having the same variance

as ES(∗)n . The estimator ES

(2)n has exceptional sampling properties with essentially

the same empirical bias as the untrimmed estimator ESn, and contrary to the latterour estimator is normally distributed asymptotically, and approximately normal insmall samples.2

Peng (2004) solves a similar problem also for iid data by exploitingQin and Wong’s (1996) semi-parametric empirical likelihood method (ELM) togenerate a chi-squared based asymptotic confidence region for the trimmed meanwith tail mean estimators. The semi-parametric ELM should offer yet anothermeans to gain standard inference for the ES since it side-steps estimation of thestandard error. Under dependence, however, block smoothing is necessary in orderto promote a classic EL limit theory (Owen, 1991; Kitamura, 1997), hence inference

2The use of an optimized bias corrected tail-trimmed variance estimator was developed simultaneouslyin Hill and Renault (2012) for Gaussian inference in heavy tailed GARCH models based on variancetargeting.

at University of N

orth Carolina at C



ownloaded from




still requires use of a bandwidth. Thus, whether ELM has any advantage over ourmethod is unknown and left for future research.

The rate of convergence of our ES estimators are all identical and a function ofthe unknown tail index κ , although the rate is n1/2 when κ > 2. Nevertheless, oncestandardized our estimator is asymptotically standard normal, and if κ ∈ (1,2] thenthe rate can always be assured to be n1/2/L(n) for some slowly varying L(n)→∞. Ina general context Ibragimov, and Müller (2010) treat a scalar parametric estimatorwith an asymptotically normal distribution and provide a t-test and confidenceband method robust to unknown correlation, heterogeneity and therefore rateof convergence properties. Although their goal is improved inference whenthe dependence structure is unknown, it is applicable here and indeed in anycontext of scalar inference with Gaussian asymptotics under tail-trimming in viewof the unknown rate of convergence (e.g., Hill 2012a,b, 2013; Hill and Aguilar,2012).

Finally, our estimation method carries over to a large variety of relatedfinancial risk measures. Examples include those discussed above like the marginalexcess shortfall (Brownlees and Engle, 2010), other members of the spectral classof risk measures (Acerbi, 2002), and expected shortfall under netting agreements(Fermanian and Scaillet, 2005) to name a few. Consider, for example, the spectralclass presented in Acerbi (2002). Define the Dirac Delta function δ by

∫ ba f (x)δ(x −

c)dx = f (c) ∀c ∈ [a,b] for all continuous differentiable functions f with compactsupport, and let δ′ satisfy

∫ ba f (x)δ′(x − c)dx = −f ′(c) ∀c ∈ [a,b]. It is easily seen

than the ES is identically ESα = ∫ 10 ξESξ dμ(ξ ) where dμ(ξ ) = α−1δ(α − ξ )dξ , and

the VaR is identically∫ 1

0 ξESξ dμ(ξ ) with dμ(ξ ) = δ′(α − ξ )dξ . This suggests a broadspectrum of related risk measures Mμ := ∫ 1

0 ξESξ dμ(ξ ) where dμ(ξ ) is any measureon [0,1] that satisfies

∫ 10 ξdμ(ξ ) = 1. Our estimators of ESα , which allow for heavy

tails and standard asymptotic inference, instantly apply to this general class of riskmeasures.

The estimators ES(·)n are developed in Sections 1 and 2. In Section 2 we

present a consistent estimator of the scale used for inference. Sections 3 and 4contain a simulation study and an empirical pplication. The appendices containproofs.

We use the following notation. ι > 0 is a tiny constant and K >0 is a constant,the values of which may change from line to line. L(n) → ∞ is a slowly varyingfunction, the value and rate of which may change from line to line. an ∼ bn impliesan/bn → 1. [z] is the integer part of variable z.

1 TAIL-TRIMMED ESTIMATOR

All random variables in this paper exist on the same probability measure space(,F,P), and all σ -fields are subsets of F . Define �t := σ (yτ : τ ≤ t) and �t

s := σ (yτ :s ≤ τ ≤ t), and define sample order statistics y(i) of yt: y(1) ≥ y(2) ≥ ··· ≥ y(n). Similarly,

at University of N

orth Carolina at C



ownloaded from




define the left-tail process {y(−)t } and its order statistics:

y(−)t :=ytI

(yt <0

)and y(−)

(1) ≤y(−)(2) ≤···≤y(−)

(n) ≤0.

In the ES construction, we estimate the VaR −qα with a central order statistic forbrevity (e.g., Chen, 2008),

qn,α =y([αn]),

although a smoothed estimator can be employed as in Scaillet (2004) andLinton and Xiao (2013).

Let {kn} be an intermediate order sequence, hence kn → ∞ and kn =o(n), and let {ln} be the sequence of positive numbers that satisfies (e.g.,Leadbetter, Lindgren, and Rootzén, 1983)

P(yt <−ln

)= kn

n. (2)

Distribution continuity ensures such thresholds {ln} exist for any choice of {kn},and −ln is the left-tail kn/n quantile of yt. By construction y(−)

(kn) estimates −ln. Theassociated ES estimator is an intermediate order left-tail and central order right-tailtrimmed mean

ES(∗)n =− 1

αn

n∑t=1

ytI(

y(−)(kn) ≤yt ≤y([αn])

).

Asymptotic theory requires a version of the trimmed variable ytI(y(−)(kn) ≤ yt ≤

y([αn])) based on the nonrandom thresholds {ln}, and its long-run variance, so define

y∗n,t :=ytI

(−ln ≤yt ≤qα

)and S2

n := 1n

E

( n∑t=1

{y∗

n,t −E[y∗

n,t]})2

.

Although ES(∗)n , y∗

n,t and S2n implicitly depend on the level of risk α, we suppress

such dependence for notational economy.In order to obtain Gaussian asymptotics we impose the following mixing,

identification and tail decay conditions.

Assumption D (dgp). yt is governed by a strictly stationary, nondegenerateand absolutely continuous distribution on (−∞,∞), and is geometrically α-mixing:supA∈�t−∞,B∈�∞

t+h|P(A∩B) − P(A)P(B)| = O(ρh) where ρ ∈ (0,1).

Assumption I (identification.) n1/2S−1n E[ytI(yt < −ln)] → 0.

We show in the Appendix that ES(∗)n obtains the expansion

n1/2

Sn/α

(ES

(∗)n −ESα

)=− 1

n1/2Sn

n∑t=1

{y∗

n,t −E[y∗

n,t]}− n1/2

Sn

(E[y∗

n,t]+α×ESα

)

at University of N

orth Carolina at C



ownloaded from




=− 1n1/2Sn

n∑t=1

{y∗

n,t −E[y∗

n,t]}− n1/2

Sn

(E[ytI

(yt ≤qα

)]−E[ytI

(−ln ≤yt ≤qα

)])=Zn − n1/2

SnE[ytI

(yt <−ln

)]=Zn +Rn,

say, where the equality holds asymptotically with probability approaching one.The first term Zn satisfies a Gaussian central limit theorem, thus Assumption Iensures asymptotic unbiasedness by Rn → 0. By arguments in Section 3 we haven1/2|E[ytI(yt < −ln)]| ∼ Kn1/2(kn/n)ln = Kn1/2(n/kn)1/κ−1 = Kk1−1/κ

n /n1/2−1/κ , andif the variance is finite κ > 2 then 0 < limn→∞S2

n < ∞ under geometric α-mixing(Ibragimov, 1962). In this case Assumption I implies we cannot trim too much, inparticular kn must not increase too fast

kn/n(κ/2−1)/(κ−1) →0.

This is assured by letting kn be slowly varying, for example kn = [ln(n)].Assumption I otherwise may rule out heavy tails κ ∈ (1,2) since |Rn| → ∞ is

possible. This follows by noting if κ ∈ (1,2) then S2n ∼Krn(n/kn)2/κ−1 by Lemma 2.1,

below, for some sequence of nonrandom positive numbers {rn} where rn = O(ln(n)),and rn = O(1) if yt is m-dependent. Now use |E[ytI(yt < −ln)]| ∼ K(n/kn)1/κ−1 whenκ ∈ (1,2) to deduce asymptotic bias |Rn| = n1/2S−1

n |E[ytI(yt < −ln)]| ∼ Kk1/2n /rn →

(0,∞] when yt is heavy tailed and m-dependent, or yt is heavy tailed and kn → ∞at least as fast as (ln(n))2. See Section 1.2 for an ES estimator based on ES

(∗)n with an

asymptotic bias correction.

Assumption T (tails). P(yt < −c) = dc−κ (1 + o(1)) as c → ∞ where d > 0 and κ > 1.

Remark: Assumption T aligns with the stable domain of attraction fora sample mean of the unbounded left tail variable ytI(yt ≤ qα). SeeLeadbetter, Lindgren, and Rootzén (1983) and Resnick (1987). Recall the tail indexκ in Assumption T is identically the left-tail moment supremum:

κ =argsup{α>0 :E∣∣ytI

(yt <0

)∣∣α <∞}hence E

∣∣ytI(yt <0

)∣∣p <∞ iff p∈(0,κ).

The assumption is in some sense more restrictive than the general regularly varyingform (1) used in Linton and Xiao (2013). We only restrict the left tail since ytI(yt ≤qα) is bounded from above, but the restriction is of the Paretian class dc−κ (1 + o(1)).We can allow a general left-tail P(yt < −c) = c−κL(c), but the Paretian class L(c) =d(1 + o(1)) greatly simplifies deducing the long-run variance rate S2

n → ∞ in theinfinite variance case in view of Karamata’s Theorem. See Lemma 2.1, below.

As a regularity condition we require that S2n be non-degenerate as n → ∞.

Assumption N (nondegeneracy). liminfn→∞{S2n/E[y∗2

n,t]} > 0.

at University of N

orth Carolina at C



ownloaded from




Remark 1: A nondegenerate distribution under Assumption D and the negligi-bility of trimming ensure liminfn→∞E[y∗2

n,t] > 0. Assumption N, however, rules outperverse cases for the long-run variance S2

n due to negative correlation wherein S2n

→ 0. In order to understand the nuance, define tail-trimmed covariances γ ∗n (h) :=

E[y∗n,ty

∗n,t−h]− (E[y∗

n,t])2 and correlations ρ∗n(h) := γ ∗

n (h)/γ ∗n (0), and note S2

n = γ ∗n (0)

× {1 + 2∑n−1

h=1 (1−h/n)ρ∗n(h)} ≥ 0. Now, by dominated convergence E[y∗

n,t] → E[yt]hence γ ∗

n (0)/E[y∗2n,t] ∼ 1 − (E[yt])2/E[y∗2

n,t]. In the infinite variance case γ ∗n (0)/E[y∗2

n,t]∼ 1 and otherwise by dominated convergence γ ∗

n (0)/E[y∗2n,t]∼1−(E[yt])2/E[y2

t ]> 0, hence in general γ ∗

n (0)/E[y∗2n,t] ∼ K > 0. Assumption N therefore implies

liminfn→∞{S2n/E[y∗2

n,t]} = K(1 + 2liminfn→∞{∑n−1h=1 ρ∗

n(h)}) > 0. The only way forliminfn→∞{S2

n/E[y∗2n,t]} = 0 to hold is for ρ∗

n(h) < 0 to be large for some lag(s) h ≥ 1.3

Conditions like Assumption N are standard in the central limit theory literature fordependent data (e.g., Dehling, Denker, and Phillip, 1986) and used here to ensurea central limit theorem for tail-trimmed arrays.

Remark 2: If yt is m-dependent then Assumption N automatically holds.

We first characterize trimmed variances for future reference.

Lemma 1.1. Under Assumptions D and T:

a. S2n = rnE[(y∗

n,t− E[y∗n,t])2] = o(n) where {rn} is a sequence of positive numbers that does

not depend on kn, liminfn→∞rn > 0 and rn = O(ln(n)), and rn = O(1) if yt is m-dependentor E[y2

t ] < ∞.

b. If κ = 2 then E[y∗2n,t] =K ln(n), and if κ ∈ (1,2) then E[y∗2

n,t] ∼ κ(2 − κ)−1d2/κ

(n/kn)2/κ−1.

Remark: The term rn = O(ln(n)) arises due to non-m-dependence and heavy tailsby exploiting a covariance bound in Rio (1993).

The tail-trimmed ES estimator is consistent and asymptotically normal.

Theorem 1.2. Under Assumptions D, N and T ES(∗)n

p→ESα and n1/2S−1n (ES

(∗)n −ESα +

α−1E[ytI(yt < −ln)]) d→ N(0,α−2). If additionally Assumption I holds then n1/2S−1n (ES

(∗)n

− ESα)d→ N(0,α−2). If κ ≤ 2 then the rate of convergence is n1/2S−1

n = o(n1/2).

3Notice having some or many large ρ∗n (h) is fundamentally different from the long memory property

that correlations are not absolutely summable. In the finite variance case long memory is ruled outby geometric α-mixing since by dominated convergence and Lemma 1.3 in Ibragimov (1962) it followslimn→∞

∑n−1h=1 |ρ∗

n (h)|= ∑∞h=1 |ρ(h)|< ∞ where ρ(h) is the untrimmed correlation. In the infinite variance

case∑n−1

h=1 |ρ∗n (h)| → ∞ is possible under non-m-dependence and geometric α-mixing since |γ ∗

n (h)| → ∞and therefore |ρ∗

n (h)| → 1 are possible for infinitely many h ∈ N.

at University of N

orth Carolina at C



ownloaded from




As discussed above, if yt has a finite variance and kn → ∞ at a slowly varyingrate then identification Assumption I holds. In this case and in view of geometricα-mixing it follows limn→∞S2

n exists and is positive (cf. Ibragimov, 1962).

Corollary 1.3. Let E[y2t ] < ∞ and kn → ∞ at a slowly varying rate (e.g., kn ∼

ln(n)). Under Assumptions D, N and T n1/2(ES(∗)n − ESα)

d→ N(0,α−2S2) where S2

:= limn→∞S2n satisfies 0 < S2 < ∞.

The iid case is instructive since if κ ∈ (1,2) then by Lemma 2.1 S2n ∼ E[y∗2

n,t] ∼κ(2 − κ)−1d2/κ (n/kn)2/κ−1 → ∞. In this case the asymptotic variance has a simpleform that depends on tail parameters.

Corollary 1.4. Let yt be iid. Under Assumptions N and T with index κ ∈ (1,2)

n1/2

(n/kn)1/κ−1/2 (ES(∗)n −ESα +E

[ytI

(yt <−ln

)])

d→N(

0,α−2d−2/κ

(2−κ

κ

)).

In general for geometrically α-mixing data and in the case of heavy tails, Lemma2.1 shows S2

n → ∞ hence the rate of convergence is n1/2/Sn = o(n1/2), similar to thecase of untrimmed ES estimation (Linton and Xiao, 2013). However, by constructiontrimming removes the most damaging extremes which can be exploited to elevatethe rate. By Lemma 2.1 if κ ∈ (1,2) then S2

n ∼ Krn(n/kn)2/κ−1, hence the rate is

n1/2

Sn= n1/2

r1/2n ×(n/kn)1/κ−1/2

=n1−1/κ × kn1/κ−1/2

r1/2n

where 0<rn ≤K ln(n). (3)

A rapid rate of trimming kn → ∞ therefore optimizes the rate of convergence sincethe presence of extremes diminishes estimator sharpness, a well known result fora sample tail-trimmed mean (e.g., Csörgo, Horváth, and Mason, 1986, Hill, 2012a,Hill and Aguilar, 2012). Linton and Xiao (2013, Theorem 2) show the rate for the

untrimmed estimator ESn is exactly n1−1/κ when κ ∈ (1,2). Thus ES(∗)n converges

faster than ESn sufficiently if the rate of trimming kn → ∞ is regularly varyingsince rn ≤ K ln(n), for example kn ∼ λnδ or kn ∼ λn/ln(n) for λ ∈ (0,1] and δ ∈(0,1). Notice a regularly varying rate does not necessarily equate to trimming manyobservations: if n ∈ {100,500,1000} then kn ∼ 0.25n2/3, for example, implies only{5.4%,3.1%,2.5%} of observations are trimmed. See our simulation study, below.Of course, the fastest rate of convergence n1/2 is achieved when kn = λn, a centralorder sequence, since then by the mixing property S2

n = O(1). This implies a fixedportion of the sample is trimmed, a case not considered here since nonnegligibleleft-tailed trimming induces bias both in small samples and asymptotically.

As an example of a trimming policy that optimizes the convergence rate whenκ ∈ (1,2), consider kn =[n/gn] for a sequence of positive numbers {gn} where gn → ∞as slowly as we choose and not faster than a slowly varying function, e.g., gn =K ln(n)or gn = K ln(ln(n)). Then kn is close to the central order λn for λ ∈ (0,1), and if κ ∈ (1,2)

at University of N

orth Carolina at C



ownloaded from




then in view of Lemma 2.1 the variance S2n = rng2/κ−1

n → ∞ no faster than a slowly

varying rate, and ES(∗)n achieves a rate of convergence n1/2/(r1/2

n g1/κ−1/2n ) which is

close to n1/2. In the hairline infinite variance case κ = 2 we have by Lemma 2.1 S2n

= Krn ln(n)=O((ln(n))2) for any intermediate order sequence {kn}, hence the rate ofconvergence is n1/2/(rn ln(n))1/2 which is again close to n1/2.

Corollary 1.5. Let Assumptions D, N, and T hold. If κ ∈ (1,2) and kn = [n/gn] for gn→ ∞ not faster than a slowly varying function, or κ = 2 and {kn} is any intermediate ordersequence, then S2

n → ∞ not faster than a slowly varying function. Further, for some 0 <

v2 < ∞ that may be different in different places,

κ ∈(1,2) : n1/2

r1/2n g1/κ−1/2

n(ES

(∗)n −ESα + 1

αE[ytI

(yt <−ln

)])

d→N(

0,v2)

κ =2 :(

nrn ln(n)

)1/2(ES

(∗)n −ESα + 1

αE[ytI

(yt <−ln

)])

d→N(

0,v2).

Remark: The asymptotic variance v2 is not unique since it depends on the level ofrisk α, the tail index κ , and the chosen fractile kn. This is irrelevant for inference sincewe estimate S2

n nonparametrically. See Section 2.3. See also Ibragimov, and Müller(2010) for a method of robust inference when the rate of convergence or asymptoticvariance are not known.

2 BIAS CORRECTED ES ESTIMATORS

By construction ESα =−α−1E[ytI(−ln ≤yt ≤ qα)]+α−1E[ytI(−ln < yt)]. As discussed

above, the statistic ES(∗)n is possibly biased asymptotically if κ ∈ (1,2) since the

tail mean −E[ytI(yt < −ln)] is removed and may vanish too slowly asymptotically.Define the quantile function

Q(u) := inf{x≥0 :P(yt ≤x

)≥u}

where 0≤u≤1,

and note under Assumption T if κ ∈ (1,2) then Q(u) = d1/κu−1/κ as u → 0. In orderto construct an estimator of the tail mean that is asymptotically normal, notice

E[ytI

(yt <−ln

)]∼∫ kn/n

0Q(u)du∼−d1/κ

(κ

κ−1

)(kn

n

)−1/κ+1=− κ

κ−1kn

nln. (4)

Peng (2001, 2004) exploits a similar result to improve on a left- and right-tailedtrimmed mean for iid data. We exploit the same trick for improved ES estimators formixing data under left-tail trimming. We then present consistent estimators of thescales for standard inference. Finally, we show that our bias estimator is negligibleasymptotically even if tails are so thin that they decay infinitely faster than theAssumption T power law rate (e.g., yt is Gaussian, or has bounded support).

at University of N

orth Carolina at C



ownloaded from




2.1 Tail Mean Estimator

By (4) we may estimate the tail mean by estimating −κ(κ − 1)−1(kn/n)ln. We alreadyhave y(−)

(kn) as an estimator of −ln, and in order to estimate κ we consider Hill (1975)’sseminal estimator4

κ(−)kn

:=⎛⎝ 1

kn

kn∑i=1

ln(

y(−)(i) /y(−)

(kn)

)⎞⎠−1

.

Hill (2010, 2011a) proves (−y(−)(kn))/l(−)

n − 1 and κ(−)kn

− κ are Op(k−1/2n ), and

asymptotically normal for a large class of dependent and heterogeneous time seriesthat covers Assumptions D and T. In view of formula (4) the enhanced ES estimatorbased on Peng (2001) idea is therefore

ES(pg)n := ES

(∗)n − 1

α

⎛⎝ κ(−)kn

κ(−)kn

−1

kn

ny(−)

(kn)

⎞⎠= ES(∗)n +Rn,

say.Peng (2001) uses the same fractile sequence {kn} for estimating all relevant

components in ES(∗)n and Rn. In terms of theory, we show in the appendices that

only ES(∗)n and (kn/n)y(−)

(kn) need to be based on the same kn to ensure asymptoticunbiasedness. If we also use kn for estimating κ then the limit theory is complicated

by the fact that each ES(∗)n , κ

(−)kn

, and y(−)(kn) impact ES

(pg)n asymptotically: even in the

iid case inference requires a complicated parametric asymptotic variance estimator(see Peng, 2001; Theorem 1).

As a practical matter trimming very few observations for each n sufficientlyerodes the heavy tailedness of the remaining sample for Gaussian asymptotics.Conversely, using many tail observations for an order statistic based estimator ofκ improves its rate of convergence, and allows it to not impact a bias-corrected ESestimator asymptotically as we show below.5 We therefore use κ

(−)mn with a larger

fractile mn > kn as n → ∞, specifically

kn

mn→0 where mn →∞ and mn =o(n), (5)

hence our first bias-corrected ES estimator is

ES(1)n := ES

(∗)n − 1

α

(κ

(−)mn

κ(−)mn −1

kn

ny(−)

(kn)

)= ES

(∗)n +R(1)

n .

4Many estimators of the tail index are available. See, e.g., de Haan and Peng (1998) and Hill (2010), amongstothers, for references and theory.

5An estimator of κ for GARCH processes is treated in Berkes, Horváth, and Kokoszka (2003). This estimatoris n1/2-convergent since all observations are used for estimation, but the GARCH error must have a finitefourth moment that limits it use for heavy tailed data.

at University of N

orth Carolina at C



ownloaded from




If yt is known to have the same left- and right-tail indices, or the left tail is knownto be heavier6, then we use a two-tailed tail index estimator since this can includemore data points and therefore be more efficient:

κmn :=(

1mn

mn∑i=1

ln(

y(a)(i) /y(a)

(mn)

))−1

where y(a)t := ∣∣yt

∣∣.Property (5) ensures κ

(−)mn

p→ κ fast enough to eradicate any impact on ES(1)n .

We specifically require κ(−)mn = κ + Op(1/m1/2

n ) which in general holds as longas yt satisfies a second-order power law condition and observations are takenfrom sufficiently far out in the tails, hence mn → ∞ cannot be too fast (Hill,2010, Theorem 2; cf. Hall, 1982, Theorem 2; Haeusler and Teugels, 1985, Section 5).We invoke the following second-order extension of Paretian decay AssumptionT, although other second-order properties are possible. See Hall (1982) andHaeusler and Teugels (1985).

Assumption T′ (tails). Let P(yt < −c) = dc−κ (1 + O(c−β )) as c → ∞ where d,β > 0and κ > 1. Further

mn →∞ and mn =o(

n2β/(2β+κ)). (6)

Remark 1: In the exact Pareto case P(yt < −c) = dc−κ for c > 0 the rate boundin (6) becomes simply mn = o(n), but as the tail deviates from a true Pareto thenobservations must be taken from farther out in the tails. See Haeusler and Teugels(1985).

Remark 2: An example of sequences {kn,mn} that satisfy (5) and (6) is

kn =[

n2β/(2β+κ)

(ln(n))2ι

]and mn =

[n2β/(2β+κ)

(ln(n))ι

]for tiny ι>0.

If β ≥ κ then we may use kn ∼ λkn2/3/(ln(n))2ι and mn ∼ λmn2/3/(ln(n))ι for scalefractions λk,λm > 0. Since this requires knowledge of k, another example is

kn =[λk (ln(n))1−ι

]and mn = [λm ln(n)] for λk,λm >0 and tiny ι>0.

Recall y∗n,t := ytI(−ln ≤ yt ≤ qα). Stack nontail and tail variables

Wn,t :=[

y∗n,t −E

[y∗

n,t

],

(nkn

)1/2{I(yt <−ln

)−P(yt <−ln

)} ]′,

6Suppose, for example, P(yt < −y) ∼ d1y−κ1 and P(yt > y) ∼ d2y−κ2 where d1,d2 > 0 and κ1 ≤ κ2. ThenP(|yt| > y) ∼ d1y−κ1 + d2y−κ2 = d1y−κ1 (1 + (d2/d1)yκ1−κ2 ) = d1y−κ1 (1 + O(1)): two-tailed and left-tailedindices are identical.

at University of N

orth Carolina at C



ownloaded from




and defineV2

n :=B′n�nBn (7)

where

Bn :=[

1, − 1κ−1

(kn

n

)1/2ln

]′and �n := 1

n

n∑s,t=1

E[Wn,sW ′

n,t]∈R

2×2.

Notice the upper left element of �n is identically S2n .

We must strengthen nondegeneracy Assumption N to cover �n. Let λ ∈ R2.

Assumption N′ (nondegeneracy). liminfn→∞ infλ′λ=1 ||λ′�nλ/(E(λ′Wn,t)2)|| > 0.

Identification Assumption I is now superfluous.

Theorem 2.1: Under Assumptions D, N′ , T′, and (5)-(6) we have n1/2V−1n (ES

(1)n −

ESα)d→ N(0,α−2). Moreover, V2

n = S2n(1 + o(1)) if κ > 2 and V2

n = S2n(1 + O(1)) if κ ∈

(1,2].

Remark 1: Since κ(−)mn does not affect the limit distribution of ES

(1)n , only y(−)

(kn) and

ES(∗)n matter asymptotically. The order statistic y(−)

(kn)/(−ln) can be written as a sum of

I(yt <−ln)−P(yt < −ln), cf. Hsing (1991, p. 1553), and ES(∗)n is asymptotically a sum

of y∗n,t − E[y∗

n,t] by Lemma A.2 in Appendix A. The scale V2n = B′

n�nBn therefore hasa classic sandwich form capturing the covariance between nontail and tail variables

y∗n,t and I(yt < −ln) which form the asymptotic basis of ES

(∗)n and R(1)

n .

Remark 2: The relative magnitude of V2n and S2

n is interesting. By construction andnondegeneracy Assumption N′ the long-run variance S2

n of y∗n,t diverges if E[y2

t ] =∞, but the long-run variance of (n/kn)1/2{I(yt < −ln) − P(yt < −ln)} is bounded (cf.Hill, 2009, 2010). Moreover, (kn/n)l2n/S2

n = O(1) since κ > 1, and if the variance isfinite κ > 2 then S2

n = O(1) and (kn/n)1/2ln =K(kn/n)1/2−1/κ → 0. By combining theseproperties it must be the case that V2

n = S2n(1 + o(1)) when κ > 2, and otherwise V2

n= S2

n(1 + O(1)). The general relationship V2n ∼ KS2

n means the rate of convergencedetails from Section 2.1 carry over here verbatim. See Lemma A.1 in Appendix Afor a compendium of tail and nontail variance properties.

2.2 Optimal Bias Correction

A limitation of the tail mean estimators Rn and R(1)n is they rely on point estimates

for their components. In practice a better approach is to choose R(1)n that best

approximates the tail mean E[ytI(yt < −ln)], in particular such that ES(∗)n + R(1)

n

at University of N

orth Carolina at C



ownloaded from




is closest to the untrimmed estimator ESn := −1/(αn)∑n

t=1ytI(yt ≤ qn). This will

eradicate asymptotic bias, and minimize incidental small sample bias that arisesfrom trimming. Further, by assumption E|yt| < ∞ hence κ > 1, which meansthat we only use a point tail index estimator if κ

(−)mn > 1. As it turns out, we can

minimize |ES(∗)n + R(1)

n − ESn| over the fractile mn used to compute κmn in the tailmean estimator R(1)

n and still have an ES estimator that is asymptotically normal. If,however, we minimize the criterion over mn and kn simultaneously then the resultingestimator has a nonstandard limit distribution since kn is used to compute the core

tail-trimmed ES estimator ES(∗)n .

Although we use the term “optimal” to describe the bias correction in thissection, certainly our solution is not the only one. In the iid case, for example, themean-squared-errors of κ

(−)mn(λn)

and y(−)(kn) can be computed as a function of second-

order tail parameters (see Hall, 1982; Huisman et al. 2001; Segers, 2002). In ourcontext evidently {mn,kn} can be chosen to minimize an asymptotic parametric

mean-squared-error of ES(∗)n + R(1)

n , but in the general mixing case little is knownabout tail estimator variances (see Hill, 2010, 2011b). Our optimization method,however, has an obvious advantage since it minimizes bias and eradicates itasymptotically, and still results in an asymptotically normal ES estimator.

Define mn(λ) = [λmn] where {mn} is an intermediate order sequence and λ istaken from a compact set [λ,λ], 0 < λ < λ. The new tail mean estimator

R(2)n =R(2)

n (λn) :=− 1α

×⎛⎝ κ

(−)mn(λn)

κ(−)mn(λn)

−1

kn

ny(−)

(kn)

⎞⎠is a function of any λn that solves the optimization problem

λn =argminλ∈[λ,λ]

∣∣∣ES(∗)n +R(2)

n (λ)−ESn

∣∣∣.Our second ES estimator with optimal bias correction is then

ES(2)n := ES

(∗)n +R(2)

n .

As long as we impose kn/mn → 0 as in (5) the estimator ES(2)n has the same limit

distribution as ES(∗)n , hence the correct scale is again V2

n in (7). Other fractile functionsmn(λ) can be considered, for example cadlag mn(λ) that is nondecreasing in λ (seeHill, 2009).

Notice in principle λn may not be unique for finite n or even asymptotically.This is irrelevant since we show in the proof of the next claim that the Hill (1975)estimator process {κmn(λ) : λ ∈ [λ,λ]} satisfies supλ∈[λ,λ] |κmn(λ) − κ| = op(1/k1/2

n ) as

at University of N

orth Carolina at C



ownloaded from




long we use more tail observations mn than we trim kn a la (5), hence κmn(λ∗n) = κ +

op(1/k1/2n ) for any sequence {λ∗

n} of possibly stochastic λ∗n on [λ,λ]. This is all that is

required for Theorem 2.2 to go through.

Theorem 2.2: Let Assumptions D, N,′ and T′ hold, and assume kn and mn satisfy (5).

Further, let {λn} be any sequence on [λ,λ] that solves (8) for each n. Then n1/2V−1n (ES

(2)n −

ESα)d→ N(0,α−2).

Uniform asymptotics remain valid on any subset of [λ,λ]. Consider the subsetsuch that the tail index estimator is above unity:

�∗n :=

{λ∈[λ,λ] : κ (−)

mn(λ) >1}.

Then the proposed optimal fractile parameter is

λ∗n =argmin

λ∈�∗n

∣∣∣ES(∗)n +R(2)

n (λ)−ESn

∣∣∣. (8)

It is easy to show by the argument used to prove Theorem 2.2 that n1/2V−1n (ES

(2)n −

ESα)d→ N(0,α−2) when R(2)

n = R(2)n (λ∗

n) is the computed bias estimator.

2.3 Bias Correction—Thin Tails

The tail mean formula (4) crucially exploits the Assumption T power law tail form. Ifthe left tail decays faster than any power law such that P(yt < −c)/c−ξ → 0 ∀ξ >0, forexample if yt has exponentially decaying tails or a bounded distribution support,then (4) does not have meaning and is in general incorrect. Thus, the importantquestions are what exactly do R(1)

n and R(2)n (λ) estimate? and what is their limiting

behavior when probability tails do not satisfy a power law? Consider R(1)n , while a

similar argument applies to R(2)n (λ).

Assume for simplicity that yt has the following exponential tail

P(∣∣yt

∣∣≥c)=ϑ exp

{−� cδ}

where ϑ,�,δ∈ (0,∞). (9)

The intermediate order statistic y(−)(kn) is well behaved for a larger variety of distri-

butions (cf. David and Nagaraja, 2003), while intuitively the moment supremum

under (9) is argsup{α > 0 : E|yt|α < ∞} = ∞ hence κ(−)mn

p→ ∞ may reasonably beexpected. Thus the limiting properties of R(1)

n should be characterizable. Indeed, inthe thin tail case the scale is bounded 0 < limn→∞Vn < ∞ hence the conclusions ofTheorems 2.1 and 2.2 remain valid as long as n1/2R(1)

np→ 0. The next result shows

this occurs as long as trimming is very light.

at University of N

orth Carolina at C



ownloaded from




Theorem 2.3: Let Assumptions D and N′ hold, assume (9), and let kn → ∞, kn =o(ln(n)), mn → ∞ and mn = o(n). Then n1/2V−1

n (ES(1)n − ESα)

d→ N(0,α−2).

Remark 1: It is imperative that the trimming fractile kn increases very slowly such

that kn = o(ln(n)). This ensures the tail-trimmed mean estimator ES(∗)n dominates

asymptotics by sufficiently elevating its rate of convergence relative to the sampletail exponent in the bias estimator R(1)

n . In view of Theorems 2.1 and 2.2 it is alwaysvalid to use kn = [λk(ln(n))1−ι] and mn = [λm ln(n)] for λk,λm > 0 and tiny ι > 0: inthe heavy tail case this satisfies Theorems 2.1 and 2.2 and in the light tail case itsatisfies Theorem 2.3.

Remark 2: Notice Theorem 2.3 does not require a restriction on the tail fractile mnused for tail exponent estimation since power law Assumption T or T′ no longerhold.

Remark 3: A similar proof can be constructed for the case where yt has boundedsupport.

2.4 Nonparametric Inference

Estimation of the scale V2n = B′

n�nBn is straightforward since each component isbased on a long-run variance and/or a tail quantile. Write compactly

y∗n,t =ytI

(y(−)

(kn) ≤yt ≤ qn

)and y∗

n := 1n

n∑t=1

y∗n,t and Bn :=

[1,− 1

κ(−)mn −1

(kn

n

)1/2y(−)

(kn)

]

Wn,t :=[

y∗n,t − y∗

n,

(nkn

)1/2{I(

yt <y(−)(kn)

)− kn

n

}]′

and define kernel estimators

S2n := 1

n

n∑s,t=1

K((s−t)/γn){y∗

n,s − y∗n}{

y∗n,t − y∗

n}

and �n := 1n

n∑s,t=1

K((s−t)/γn)Wn,sW ′n,t

where K(x) is a kernel function with K(0) = 1, and the bandwidth γn satisfiesγn →∞ and γn = o(n). Note S2

n estimates the scale S2n for the core estimator ES

(∗)n .

Now build a scale estimator for the bias-corrected ES estimators:

V2n := B′

n�nBn.

The following properties cover Tuckey-Hanning, Parzen, and Bartlett kernels, toname a few. See Hill (2010, 2012a,b) for references.

Assumption K (kernel and bandwidth). Assume K(·) is continuous at 0 and allbut a finite number of points, K : R → [−1,1], K(0) = 1, K(x) = K(−x) ∀x∈R,

at University of N

orth Carolina at C



ownloaded from



HILL | Expected Shortfall Estimation and Gaussian Inference 17∫∞−∞|K(x)|dx < ∞, and

∫∞−∞|� (ξ )|dξ < ∞}. Let

∑ns,t=1 |K((s − t)/γn)| = o(n2),

max1≤s≤n∑n

t=1K((s − t)/γn) = o(n) and bandwidth γn = o(n).

Theorem 2.4: Under Assumptions D, K, N,′ and T′ S2

n/S2n

p→ 1 and V2n/V2

np→ 1.

3 SIMULATION STUDY

We now perform a Monte Carlo experiment to investigate the merits of our

estimators. We study both the core estimator ES(∗)n = −α−1n−1

∑n

t=1ytI(y(−)

(kn) ≤ yt

≤ qn) and the optimal bias corrected estimator ES(2)n = ES

(∗)n + R(2)

n .Let N0,1 denote a standard normal distribution, and let Pκ denote a

symmetrized Pareto distribution: if εt ∼ Pκ then P(εt > c) = P(εt < −c) = 0.5(1+ c)−κ for κ > 0. The models are

AR(2) : yt =0.8yt−1 −0.3yt−2 +εt, where εtiid∼Pκε with κε ∈{1.5,2.5}, or εt

iid∼N0,1

GARCH(1,1) : yt =σtεt, where σ 2t =1+0.3y2

t−1 +0.6σ 2t−1, and εt

iid∼P2.5 or εtiid∼N0,1

IGARCH(1,1) : yt =σtεt, where σ 2t =1+0.4y2

t−1 +0.6σ 2t−1 and εt

iid∼N0,1

RiskMetric IGARCH : yt =σtεt, where σ 2t =1+0.06y2

t−1 +0.94σ 2t−1 and εt

iid∼N0,1.

In all GARCH cases we standardize Pareto errors such that E[ε2t ] = 1. The

RiskMetric volatility model follows from JP Morgan’s parametric convention thatreduces to an IGARCH with σ 2

t = ω + 0.06y2t−1 + 0.94σ 2

t−1 for some ω > 0. SeeChristoffersen (2012), §2). Each process {yt} is stationary, ergodic, geometrically α-mixing, and has a Paretian tail P(|yt| > c) = dc−κy (1 + o(1)), cf. Brockwell and Cline(1985), Pham and Tran (1985), and Mikosch, and Starica (2000). We simulate 10,000samples of sizes n∈{100, 500, 1000},7 and estimate the ES of yt evaluated at a levels ofrisk α ∈ {0.05,0.10}. All estimators work qualitatively similarly for other small levelsof risk. e.g. α ∈ {0.01,0.025}. Since all model parameters are known we compute thetail index κy of yt, and the corresponding true ESα which we use for simulation biascomputation.8,9

7We draw 2n observations of yt and retain the last n observations. Start values are y0 = y−1 = 0 for AR(2),and σ 2

1 =1 for GARCH(1,1).8The true ESα is computed by drawing N = 100,000 observations of yt and computing 1/N

∑Nt=1 ytI(yt <

y[αN]). We repeat this 10,000 times and use the median value for bias computation.9The tail index for a stationary AR is the same as the iid error (Brockwell and Cline, 1985). In the GARCHcase P(|yt| > y) = dy−κy (1 + o(1)) where E|0.3ε2

t + 0.6|κy/2 = 1 (Basrak, Davis, and Mikosch, 2002). Wedraw N = 100,000 iid εt from P2.5 or N(0,1) and compute κ = argminκ∈K |1/N

∑Nt=1 |0.3ε2

t + 0.6|κ/2 −1| where K = {0.001,0.002,...,10}. We repeat this 10,000 times and report the median value under κy inTables 1–3. In the Gaussian IGARCH case κy = 2. See Mikosch, and Starica (2000).

at University of N

orth Carolina at C



ownloaded from




Table 1(a) α=0.05 and n=100

Untrimmed ESn

Model εt κy Bias RMSEi KSii ADiii Nulliv Alt1 Alt2

AR P1.5 1.50 0.333 0.407 3.31 2.88 0.000,0.036,0.228 0.053,0.104,0.136 0.168,0.304,0.402AR P2.5 2.50 0.009 0.046 0.694 2.43 0.000,0.035,0.111 0.155,0.296,0.406 0.688,0.858,0.937AR N0,1 ∞ 0.004 0.015 0.850 2.25 0.015,0.050,0.098 0.951,0.985,0.991 1.00,1.00,1.00

GARCHv P2.5 1.50 0.017 0.065 1.27 4.89 0.024,0.064,0.102 0.266,0.512,0.659 0.918,0.970,0.982IGARCH N0,1 2.00 0.055 0.132 2.47 1.79 0.008,0.030,0.106 0.204,0.358,0.471 0.991,1.00,1.00RiskMetric N0,1 2.00 0.025 0.098 1.87 3.07 0.008,0.067,0.155 0.995,1.00,1.00 1.00,1.00,1.00GARCH N0,1 4.10 0.014 0.038 1.67 2.16 0.003,0.048,0.123 1.00,1.00,1.00 1.00,1.00,1.00

Trimmed ES(∗)n,α

Model εt κy Bias RMSE KS AD Null Alt1 Alt2

AR P1.5 1.50 0.560 0.250 2.95 2.85 0.480,0.703,0.779 0.017,0.054,0.083 0.206,0.368,0.481AR P2.5 2.50 0.051 0.030 0.666 2.61 0.216,0.409,0.526 0.045,0.157,0.213 0.764,0.924,0.960AR N0,1 ∞ 0.035 0.011 0.777 2.38 0.718,0.891,0.956 0.597,0.800,0.881 1.00,1.00,1.00

GARCH P2.5 1.50 0.081 0.055 1.32 5.85 0.151,0.294,0.392 0.046,0.205,0.330 0.848,0.937,0.965IGARCH N0,1 2.00 0.143 0.105 2.37 1.86 0.057,0.320,0.481 0.148,0.259,0.343 1.00,1.00,1.00RiskMetric N0,1 2.00 0.165 0.083 2.43 3.64 0.294,0.514,0.624 0.945,0.990,1.00 1.00,1.00,1.00GARCH N0,1 4.10 0.067 0.021 1.45 2.14 0.145,0.356,0.478 1.00,1.00,1.00 1.00,1.00,1.00

Optimally Trimmed ES(2)n,α = ES

(∗)n,α +R(2)

n,α


AR P1.5 1.50 0.088 0.380 1.69 2.45 0.021,0.057,0.084 0.076,0.126,0.160 0.158,0.268,0.364AR P2.5 2.50 −0.025 0.063 0.821 2.29 0.020,0.062,0.122 0.223,0.394,0.518 0.640,0.835,0.920AR N0,1 ∞ −0.003 0.017 0.892 2.21 0.014,0.054,0.103 0.994,0.100,1.00 1.00,1.00,1.00

GARCH P2.5 1.50 −0.021 0.082 0.852 3.88 0.021,0.065,0.112 0.539,0.769,0.832 0.904,0.960,0.978IGARCH N0,1 2.00 −0.080 0.211 1.75 1.83 0.021,0.064,0.126 0.334,0.592,0.772 0.926,1.00,1.00RiskMetric N0,1 2.00 0.031 0.110 1.48 3.11 0.017,0.061,0.121 1.00,1.00,1.00 1.00,1.00,1.00GARCH N0,1 4.10 −0.053 0.065 1.03 2.05 0.015,0.064,0.123 1.00,1.00,1.00 1.00,1.00,1.00

(i) Root mean squared error.(ii) Kolmogorov–Smirnov statistic for a test of standard normality, divided by the 5% critical value.(iii) Anderson–Darling statistic for a test of standard normality, divided by the 5% critical value.(iv) The hypotheses are Null: ES = ESα ; Alt1: ES = ESα/2; Alt2: ES = 0.(v) The GARCH model is yt =σtεt, σ 2

t = ω + αy2t−1 + βσ 2

t−1. GARCH σ,β =0.3,0.6; IGARCH α,β =0.4,0.6;RiskMetric α,β =0.06,0.94.

We compute the untrimmed ESn, the core trimmed ES(∗)n with fractile

kn = min{1,[.25n2/3/(ln(n))2ι]} and ι = 10−10, and the bias corrected ES(2)n

where the tail index estimator κmn(λ) is based on the fractile function mn(λ) =min{1,[λkn(ln(n))ι]} and λ ∈ [λ,λ]=[0.05,10]. The interval [0.05,10] is discretizedwith increments 1/n.

In a few cases from an entire sample draw {yt}nt=1 all tail index estimates κmn(λ)

∈ (0,1] which is too small for our bias estimator to be valid: we simply throw out

at University of N

orth Carolina at C



ownloaded from




Table 1(b) α=0.05 and n=500

Untrimmed ESn


AR P1.5 1.50 0.203 0.301 2.21 2.61 0.000,0.508,0.189 0.098,0.185,0.244 0.493,0.772,0.911AR P2.5 2.50 0.005 0.030 0.954 2.29 0.023,0.039,0.072 0.445,0.682,0.795 0.976,0.988,0.994AR N0,1 ∞ 0.002 0.007 0.831 1.63 0.010,0.051,0.092 1.00,1.00,1.00 1.00,1.00,1.00


Trimmed ES(∗)n,α


AR P1.5 1.50 0.436 0.172 1.85 2.51 0.562,0.753,0.805 0.043,0.108,0.150 0.827,0.991,0.999AR P2.5 2.50 0.034 0.023 1.188 2.59 0.128,0.295,0.403 0.232,0.481,0.630 0.977,0.990,0.994AR N0,1 ∞ 0.017 0.006 1.018 1.67 0.491,0.708,0.794 1.00,1.00,1.00 1.00,1.00,1.00



(∗)n,α +R(2)

n,α


AR P1.5 1.50 0.120 0.273 1.53 2.27 0.000,0.038,0.130 0.112,0.203,0.269 0.527,0.827,0.946AR P2.5 2.50 −0.002 0.031 0.708 2.18 0.016,0.039,0.088 0.488,0.738,0.812 0.975,0.988,0.994AR N0,1 ∞ −0.004 0.008 0.798 1.49 0.004,0.045,0.087 1.00,1.00,1.00 1.00,1.00,1.00



t = ω + αy2t−1 + βσ 2


such samples since κ ≤ 1 is ruled out. This is reasonable since in practice if there iscompelling evidence that κ ≤ 1 then the ES may not exist.

Let Ei,n denote any particular ES estimator for the i-th simulated sample outof N samples. We report the simulation bias 1/N

∑Ni=1 Ei,n − ESα , root mean-

squared-error sN := (1/N∑N

i=1(Ei,n − ESα)2)1/2, and perform Kolmogorov–Smirnovand Anderson–Darling tests of standard normality on the standardized variable(Ei,n −1/N

∑Ni=1 Ei,n)/sN where s2

N := (1/N∑N

i=1(Ei,n − 1/N∑N

i=1 Ei,n)2)1/2. Finally,

at University of N

orth Carolina at C



ownloaded from




Table 1(c) α=0.05 and n=1000

Untrimmed ESn


AR P1.5 1.50 0.178 0.252 2.05 2.59 0.000,0.082,0.191 0.147,0.152,0.347 0.842,0.991,1.00AR P2.5 2.50 0.005 0.024 1.17 2.43 0.024,0.052,0.089 0.677,0.841,0.891 0.995,0.999,0.999AR N0,1 ∞ 0.001 0.005 0.722 2.15 0.016,0.058,0.093 1.00,1.00,1.00 1.00,1.00,1.00


Trimmed ES(∗)n,α


AR P1.5 1.50 0.360 0.145 1.82 2.21 0.518,0.712,0.801 0.102,0.211,0.300 1.00,1.00,1.00AR P2.5 2.50 0.024 0.021 1.51 2.87 0.085,0.188,0.271 0.479,0.717,0.809 0.995,0.999,0.999AR N0,1 ∞ 0.009 0.005 0.768 2.22 0.310,0.515,0.635 1.00,1.00,1.00 1.00,1.00,1.00



(∗)n,α +R(2)

n,α


AR P1.5 1.50 1.04 0.251 1.33 1.65 0.000,0.060,0.134 0.166,0.283,0.398 0.889,0.997,1.00AR P2.5 2.50 −0.002 0.025 1.05 2.40 0.015,0.041,0.087 0.714,0.864,0.906 0.994,0.999,0.999AR N0,1 ∞ −0.003 0.005 0.678 1.98 0.012,0.063,0.122 1.00,1.00,1.00 1.00,1.00,1.00

GARCH P2.5 1.50 −0.022 0.061 1.63 2.78 0.009,0.065,0.130 0.614,0.790,0.851 0.968,0.988,0.992IGARCH N0,1 2.00 −0.027 0.182 1.73 2.03 0.020,0.063,0.120 0.350,0.618,0.758 1.00,1.00,1.00RiskMetric N0,1 2.00 −0.046 0.478 1.46 2.06 0.018,0.057,0.094 0.526,0.775,0.916 1.00,1.00,1.00GARCH N0,1 4.10 −0.001 0.026 0.784 1.28 0.016,0.058,0.119 1.00,1.00,1.00 1.00,1.00,1.00


t = ω + αy2t−1 + βσ 2


we use the simulation ratio 1/N1/2∑Ni=1(Ei,n −e0)/sN to perform asymptotic tests of

whether the true ES is e0 ∈ {ESα , ESα/2, 0}.The simulation results are presented in Tables 1 and 2 for α = 0.05 and 0.10,

respectively. The core estimator ES(∗)n exhibits substantial bias as predicted, with

essentially no change as n increases. In fact, there must be asymptotic bias for anyκ > 1 since kn is regularly varying: see the discussions in Sections 1 and 1. Bias

is essentially eradicated with ES(2)n since it exploits an optimally fitted tail mean

at University of N

orth Carolina at C



ownloaded from




Table 1(d) α=0.05 and n=10,000

Untrimmed ESn

Model εt κy Bias RMSEi KSii ADiii Nulld Alt1 Alt2

AR P1.5 1.50 0.093 0.139 1.95 2.29 0.001,0.0788,0.186 0.692,0.910,0.975 1.00,1.00,1.00AR P2.5 2.50 0.008 0.014 2.68 2.11 0.046,0.092,0.118 0.959,0.976,0.983 1.00,1.00,1.00AR N0,1 ∞ 0.000 0.002 0.985 1.64 0.004,0.025,0.093 1.00,1.00,1.00 1.00,1.00,1.00

GARCHv P2.5 1.50 0.024 0.032 2.26 8.21 0.059,0.099,0.144 0.815,0.910,0.932 1.00,1.00,1.00IGARCH N0,1 2.00 −0.014 0.089 1.21 1.65 0.019,0.067,0.098 1.00,1.00,1.00 1.00,1.00,1.00RiskMetric N0,1 2.00 0.096 0.121 1.93 2.38 0.003,0.038,0.089 1.00,1.00,1.00 1.00,1.00,1.00GARCH N0,1 4.10 −0.002 0.008 0.729 1.47 0.001,0.046,0.121 1.00,1.00,1.00 1.00,1.00,1.00

Trimmed ES(∗)n,α


AR P1.5 1.50 0.208 0.082 0.191 2.18 0.517,0.709,0.794 0.911,0.988,0.997 1.00,1.00,1.00AR P2.5 2.50 0.015 0.014 2.91 2.43 0.091,0.149,0.214 0.922,0.963,0.970 1.00,1.00,1.00AR N0,1 ∞ 0.001 0.002 1.01 1.63 0.054,0.160,0.243 1.00,1.00,1.00 1.00,1.00,1.00



(∗)n,α +R(2)

n,α


AR P1.5 1.50 0.081 0.141 0.853 1.81 0.000,0.064,0.136 0.703,0.918,0.981 1.00,1.00,1.00AR P2.5 2.50 0.006 0.014 1.43 2.18 0.014,0.058,0.119 0.960,0.976,0.983 1.00,1.00,1.00AR N0,1 ∞ −0.001 0.02 0.892 1.63 0.004,0.061,0.119 1.00,1.00,1.00 1.00,1.00,1.00

GARCH P2.5 1.50 0.011 0.034 1.45 2.88 0.020,0.065,0.120 0.857,0.922,0.944 1.00,1.00,1.00IGARCH N0,1 2.00 −0.023 0.103 1.11 1.77 0.012,0.058,0.109 1.00,1.00,1.00 1.00,1.00,1.00RiskMetric N0,1 2.00 −0.013 0.226 1.39 2.29 0.013,0.056,0.111 1.00,1.00,1.00 1.00,1.00,1.00GARCH N0,1 4.10 −0.009 0.009 0.645 1.46 0.012,0.055,0.112 1.00,1.00,1.00 1.00,1.00,1.00


t = ω + αy2t−1 + βσ 2


estimator that renders ES(2)n close to the untrimmed ESn. The untrimmed estimator

ESn exhibits the greatest deviations from normality and the largest empiricalsize distortions relative to the nominal size when the variance is infinite. Of allestimators ES

(2)n has the smallest RMSE, it is closest to normal by either normality

test, and when used for inference its t-ratio exhibits the sharpest empirical size

and exceptional power. Finally, our bias corrected ES(2)n compares well with the

untrimmed estimator ESn when the variance is finite. This suggests trimming doesnot impose a detectable penalty in terms of small sample mean-squared-error.

at University of N

orth Carolina at C



ownloaded from




Table 2(a) α=0.10 and n=100

Untrimmed ESn


AR P1.5 1.50 0.197 0.962 5.63 2.34 0.008,0.012,0.018 0.017,0.045,0.069 0.064,0.153,0.246AR P2.5 2.50 0.001 0.065 0.707 3.06 0.007,0.048,0.097 0.218,0.395,0.511 0.815,0.945,0.973AR N0,1 ∞ 0.001 0.027 0.678 1.94 0.007,0.047,0.097 0.953,0.995,1.00 1.00,1.00,1.00

GARCHv P2.5 1.50 −0.004 0.113 1.57 4.91 0.023,0.052,0.086 0.15,0.371,0.515 0.830,0.943,0.970IGARCH N0,1 2.00 0.067 0.188 3.12 1.62 0.023,0.036,0.075 0.254,0.443,0.587 1.00,1.00,1.00RiskMetric N0,1 2.00 0.122 0.126 1.49 1.61 0.059,0.163,0.245 1.00,1.00,1.00 1.00,1.00,1.00GARCH N0,1 4.10 −0.002 0.074 2.81 1.70 0.026,0.045,0.064 0.930,1.00,1.00 1.00,1.00,1.00

Trimmed ES(∗)n,α


AR P1.5 1.50 0.899 0.119 3.16 2.38 0.999,1.00,1.00 0.500,0.778,0.844 0.492,0.784,0.926AR P2.5 2.50 0.132 0.026 0.770 2.97 0.992,0.996,0.998 0.039,0.093,0.160 0.865,0.949,0.970AR N0,1 ∞ 0.110 0.015 0.771 2.16 1.00,1.00,1.00 0.013,0.066,0.110 1.00,100,1.00



(∗)n,α +R(2)

n,α


AR P1.5 1.50 0.174 0.477 2.13 2.28 0.006,0.043,0.099 0.089,0.159,0.223 0.330,0.540,0.708AR P2.5 2.50 −0.021 0.067 0.526 2.45 0.006,0.054,0.111 0.319,0.536,0.654 0.877,0.962,0.977AR N0,1 ∞ −0.012 0.049 0.676 1.87 0.014,0.061,0.110 0.767,0.931,0.973 1.00,1.00,1.00

GARCH P2.5 1.50 −0.009 0.163 0.937 4.20 0.019,0.045,0.090 0.407,0.664,0.784 0.805,0.921,0.955IGARCH N0,1 2.00 −0.043 0.202 1.96 1.60 0.019,0.068,0.112 0.215,0.387,0.504 0.707,0.968,1.00RiskMetric N0,1 2.00 −0.142 0.165 1.72 1.62 0.032,0.075,0.145 0.685,0.899,1.00 1.00,1.00,1.00GARCH N0,1 4.10 −0.012 0.112 1.91 1.71 0.021,0.071,0.115 0.560,0.878,0.977 1.00,1.00,1.00


t = ω + αy2t−1 + βσ 2


4 EMPIRICAL APPLICATION

Finally, we estimate ES for several financial returns series. We study the stockmarket Hang Seng Index (HSI) and the Russian Ruble—U.S. Dollar exchange rateconsidering evidence of infinite variance returns over different periods (Ling, 2005;Ibragimov, Ibragimov, and Kattuman, 2010).10 The period for the Ruble is January 1,

10We thank Rustam Ibragimov for providing us the Ruble-Dollar exchange rate data.

at University of N

orth Carolina at C



ownloaded from




Table 2(b) α=0.10 and n=500

Untrimmed ESn


AR P1.5 1.50 0.031 1.09 6.41 1.75 0.014,0.016,0.016 0.016,0.021,0.030 0.019,0.045,0.087AR P2.5 2.50 0.001 0.042 1.28 3.02 0.025,0.054,0.098 0.538,0.781,0.852 0.983,0.994,0.997AR N0,1 ∞ 0.015 0.042 0.858 2.03 0.028,0.062,0.106 0.444,0.698,0.783 0.985,0.994,0.998


Trimmed ES(∗)n,α


AR P1.5 1.50 0.836 0.108 3.75 1.82 1.00,1.00,1.00 0.159,0.502,0.678 0.990,0.998,0.999AR P2.5 2.50 0.109 0.021 2.14 2.87 1.00,1.00,1.00 0.016,0.055,0.095 0.986,0.995,0.997AR N0,1 ∞ 0.086 0.008 0.364 2.10 1.00,1.00,1.00 0.902,0.971,0.989 1.00,1.00,1.00



(∗)n,α +R(2)

n,α


AR P1.5 1.50 0.034 0.398 2.19 1.34 0.021,0.053,0.089 0.140,0.280,0.368 0.663,0.929,0.982AR P2.5 2.50 −0.007 0.049 1.10 2.53 0.013,0.061,0.112 0.597,0.797,0.868 0.976,0.991,0.996AR N0,1 ∞ −0.009 0.050 0.901 2.03 0.011,0.048,0.099 0.507,0.718,0.808 0.974,0.992,0.995



t = ω + αy2t−1 + βσ 2


1999–October 29, 2008 as in Ibragimov, Ibragimov, and Kattuman (2010), spanninga period between major financial crises in Russia and globally.11 The period for HSIis June 3, 1996–May 31, 1998 as in Ling (2005). We take each series {xt} and computethe daily log returns yt = ln(xt) − ln(xt−1), resulting in 2,449 and 489 returns for theRuble and HSI, respectively.

11In 1998 Russia experienced the so-called Ruble Crisis, with a major devaluation occurring in August 1998.In 2008 a global financial crisis emerged from U.S. banking and investment practices.

at University of N

orth Carolina at C



ownloaded from




Table 2(c) α=0.10 and n=1000

Untrimmed ESn


AR P1.5 1.50 0.002 1.11 7.24 1.92 0.006,0.090,0.012 0.009,0.015,0.024 0.014,0.039,0.084AR P2.5 2.50 0.001 0.035 1.28 2.76 0.015,0.042,0.079 0.737,0.878,0.922 0.994,0.995,0.998AR N0,1 ∞ 0.000 0.009 0.548 1.67 0.009,0.053,0.100 1.00,1.00,1.00 1.00,1.00,1.00


Trimmed ES(∗)n,α


AR P1.5 1.50 0.815 0.083 3.60 1.88 1.00,1.00,1.00 0.269,0.594,0.746 0.996,0.997,0.999AR P2.5 2.50 0.101 0.029 2.11 2.43 1.00,1.00,1.00 0.012,0.060,0.012 0.997,0.998,0.999AR N0,1 ∞ 0.075 0.006 1.23 1.67 1.00,1.00,1.00 1.00,1.00,1.00 1.00,1.00,1.00



(∗)n,α +R(2)

n,α


AR P1.5 1.50 0.006 0.377 1.75 1.48 0.009,0.058,0.096 0.159,0.315,0.447 0.835,0.983,0.994AR P2.5 2.50 −0.011 0.041 0.712 2.15 0.015,0.064,0.113 0.624,0.821,0.895 0.991,0.994,0.996AR N0,1 ∞ −0.003 0.009 0.402 1.58 0.008,0.054,0.102 1.00,1.00,1.00 1.00,1.00,1.00

GARCH P2.5 1.50 −0.009 0.087 1.38 3.03 0.020,;055,0.096 0.274,0.560,0.707 0.934,0.976,0.984IGARCH N0,1 2.00 −0.026 1.83 1.42 2.30 0.007,0.043,0.094 0.588,0.854,0.940 1.00,1.00,1.00RiskMetric N0,1 2.00 0.042 0.240 1.10 1.99 0.015,0.069,0.123 0.988,0.997,1.00 1.00,1.00,1.00GARCH N0,1 4.10 0.005 0.035 1.17 1.68 0.020,0.049,0.091 1.00,1.00,1.00 1.00,1.00,1.00


t = ω + αy2t−1 + βσ 2


In Figure 1, we plot the returns series {yt}nt=1 and Hill (1975) estimator 95%

confidence bands κmn ± 1.96κ2mn

σmn/m1/2n , where σ 2

mnis a kernel estimator of the

mean-squared-error of the inverted estimator E(m1/2n (κ−1

mn− κ−1))2. In particular

σ 2mn

= 1/mn∑n

s,t=1K((s − t)/bn){ln(y(a)s /y(a)

(mn+1))+ − (mn/n)κ−1mn

}× {ln(y(a)t /y(a)

(mn+1))+−(mn/n)κ−1

mn} with Bartlett kernel K(·) and bandwidth bn = n0.25. See Hill (2010),

Theorem 3) for a proof of consistency under mixing Assumption D and tail decay

at University of N

orth Carolina at C



ownloaded from




Table 2(d) α=0.10 and n=10,000

Untrimmed ESn


AR P1.5 1.50 0.040 0.116 2.01 1.33 0.007,0.062,0.152 0.942,0.999,1.00 1.00,1.00,1.00AR P2.5 2.50 0.012 0.022 3.07 2.37 0.045,0.083,0.144 0.940,0.971,0.978 1.00,1.00,1.00AR N0,1 ∞ −0.000 0.028 0.509 1.99 0.011,0.056,0.107 1.00,1.00,1.00 1.00,1.00,1.00

GARCHv P2.5 1.50 0.030 0.051 2.69 4.17 0.050,0.080,0.115 0.672,0.858,0.897 0.986,0.992,0.995IGARCH N0,1 2.00 −0.023 0.109 1.71 1.91 0.030,0.070,0.014 1.00,1.00,1.00 1.00,1.00,1.00RiskMetric N0,1 2.00 0.119 1.75 4.15 2.15 0.024,0.039,0.055 0.151,0.280,0.398 0.968,1.00,1.00GARCH N0,1 4.10 0.000 0.011 1.77 1.79 0.022,0.051,0.095 1.00,1.00,1.00 1.00,1.00,1.00

Trimmed ES(∗)n,α


AR P1.5 1.50 0.159 0.100 1.91 1.48 0.154,0.395,0.534 1.00,1.00,1.00 1.00,1.00,1.00AR P2.5 2.50 0.017 0.022 2.98 2.34 0.059,0.117,0.158 0.909,0.949,0.964 1.00,1.00,1.00AR N0,1 ∞ 0.001 0.003 0.510 1.98 0.014,0.073,0.133 1.00,1.00,1.00 1.00,1.00,1.00



(∗)n,α +R(2)

n,α


AR P1.5 1.50 0.028 0.157 1.32 4.85 0.010,0.050,0.093 0.949,0.999,1.00 1.00,1.00,1.00AR P2.5 2.50 0.010 0.022 1.86 2.18 0.022,0.062,0.108 0.939,0.970,0.978 1.00,1.00,1.00AR N0,1 ∞ −0.001 0.003 0.578 1.98 0.013,0.065,0.122 1.00,1.00,1.00 1.00,1.00,1.00

GARCH P2.5 1.50 0.017 0.054 1.93 2.88 0.028,0.064,0.094 0.713,0.875,0.907 0.985,0.991,0.994IGARCH N0,1 2.00 −0.001 0.122 1.65 1.72 0.016,0.052,0.082 1.00,1.00,1.00 1.00,1.00,1.00RiskMetric N0,1 2.00 0.022 1.82 2.03 2.15 0.019,0.043,0.086 0.154,0.295,0.398 1.00,1.00,1.00GARCH N0,1 4.10 −0.007 0.012 1.62 1.68 0.016,0.068,0.130 1.00,1.00,1.00 1.00,1.00,1.00


t = ω + αy2t−1 + βσ 2


Assumption T′. Each series displays evidence of clustered volatility and extremespikes. Activity in the Ruble corresponds to the recovery following the 1998 crisisand subsequent Ruble devaluation, and the mounting crisis in late 2008. Thetail index estimates for both series are predominantly near or below 2 and theconfidence bands contain 2 for all mn ∈ {5,...,200}.

We now turn to value-at-risk and expected shortfall computation for the risklevels α ∈ {.05,.10}. We follow Linton and Xiao (2013) and compute the VaR qn,α and

at University of N

orth Carolina at C



ownloaded from




-4.0%

-3.0%

-2.0%

-1.0%

0.0%

1.0%

2.0%

3.0%

4.0% Ruble Daily Returns: Jan. 1999-Oct. 2008

0

1

2

3

4

5

6

25 45 65 85 105 125 145 165 185number of tail observations

Ruble Hill-Plot: Jan. 1999-Oct. 2008

-20%-15%-10%

-5%0%5%

10%15%20%

HSI Daily Returns : 1996-1998

0.00.51.01.52.02.53.03.54.0

5 30 55 80 105 130 155 180number of tail observations

HSI Hill-Plot: 1996-1998

Figure 1 Daily returns and Hill-plots.

bias-corrected ES estimator ES(2)n over backward looking rolling windows of size 250

days or about one year. There are therefore 2,200 and 240 windows, respectivelyfor the Ruble and HSI. See Figures 2 and 3. We also include a rolling windowsurface plot of the Hill (1975) estimator over the number of tail observations mn ∈{5,...,200} and window. In this way we can see how risk dynamics are associatedwith heaviness of returns tails. See Figure 4.

Each estimator is computed as in Section 3. We include 95% confidence bands

of the VaR −qn,α and ES(2)n . In the case of ES

(2)n we use the kernel estimator V2

n from

Section 2.4 to compute ES(2)n ± 1.96α−1Vn/n1/2, with Bartlett kernel and bandwidth

γn = n.25

In the case of the intermediate order statistic qn,α = y([αn]) and subsequentVaR −qn,α > 0 the computed confidence band is −qn,α ±1.96Vn/n1/2 where V2

n isconstructed by first deriving the limit distribution of qn,α . Let f (y) be the pdf ofyt, assume f (y) is differentiable, and define It(ξ ) := I(yt > qαeξ ). By replicatingthe argument in Hsing (1991, p. 1553) for an intermediate order statistic, it iseasily verified that n1/2(y([αn]) − qα) ≤ ξ for any ξ ∈ R if and only if In(ξ/n1/2):= 1/n1/2∑n

t=1{It(ξ/n1/2) − E[It(ξ/n1/2)]}/f (qα) ≤ ξ + o(1). If we define V2n(ξ ) :=

E[I2n(ξ )] then by mixing Assumption D In(ξ/n1/2)/Vn(ξ/n1/2)

d→ N(0,1), hence by

dominated convergence n1/2(y([αn]) − qα)/Vn(0)d→ N(0,1). We therefore estimate

Vn(0) with the kernel estimator Vn = 1/n∑n

s,t=1K(s − t)/γn){I((ys ≤ qn,α) − α}{I((yt

≤ qn,α) − α}/fn(qn,α) where fn(y) is a Gaussian kernel estimator of the pdf f (y) withbandwidth 0.5, cf. Silverman (1986).

at University of N

orth Carolina at C



ownloaded from




0.000

0.001

0.002

0.003

0.004

0.005

0.006

0.007

1 501 1001 1501 2001window

Ruble : VaR with Kernel 95% Band

0.0000.0020.0040.0060.0080.0100.0120.0140.016

1 501 1001 1501 2001window

Ruble : ES with Kernel 95% Band

0.010

0.020

0.030

0.040

0.050

0.060

1 51 101 151 201window

HSI: VaR with Kernel 95% Bands

0.00

0.02

0.04

0.06

0.08

0.10

1 51 101 151 201window

HSI: ES with Kernel 95% Bands

Figure 2 Rolling window VaR and Expected Shortfall (risk α=5%).

0.000

0.001

0.002

0.003

0.004

0.005

0.006

1 501 1001 1501 2001window

Ruble : VaR with Kernel 95% Band

0.000

0.002

0.004

0.006

0.008

0.010

0.012

1 501 1001 1501 2001window

Ruble : ES with Kernel 95% Band

0.0050.0100.0150.0200.0250.0300.0350.040

1 51 101 151 201window

HSI: VaR with Kernel 95% Bands

0.010.020.030.040.050.060.07

1 51 101 151 201window

HSI: ES with Kernel 95% Bands

Figure 3 Rolling window VaR and Expected Shortfall (risk α=10%).Notes: The ES estimator is tail-trimmed with optimal bias correction. Each window covers 250trading days. The HSI spans June 3, 1996 to May 31, 1998 representing 240 windows. The Rublewindows spans January 1, 1999 to October 29, 2008 representing 2200 windows.

at University of N

orth Carolina at C



ownloaded from




Figure 4 Rolling window hill-plot. Notes: mn is the number of tail observations.

at University of N

orth Carolina at C



ownloaded from




Both VaR and ES estimates capture roughly the same patterns of increasingand decreasing risk. The Ruble displays decreasing risk following the 1998 crisisas market volatility decreases, and increasing risk as the 2008 crisis emerged. Thisis matched by smaller tail index values over most fractiles in the months near 1999and 2008, showing higher risk coincides with heavier tails. Similarly, the HSI tailindex estimates are smaller in windows with higher risk.

5 CONCLUSION

We develop Expected Shortfall estimators that can be used for standard inferenceeven if the data are heavy tailed. Although bootstrap methods in the heavy tailedcase exist (Linton and Xiao, 2013), our estimators are aimed at computationalsimplicity since they lead to classic inference irrespective of heavy tails. We combineclassic tail-trimming with an improved bias-correction method based on Peng’s(2001, 2004) work for power law distributions to deliver an estimator that isasymptotically unbiased and normal even if the probability tails do not satisfya power law. Further, we exploit an empirical process method to best approximatethe removed tail mean. This leads to a new estimator that can be made arbitrarilyclose to the untrimmed estimator as the sample grows, and yet be asymptoticallynormal. A simulation study shows the new bias-corrected estimator has excellentsmall sample properties.

APPENDIX A: Proofs of Main Results

We require several preliminary results proved in Appendix B. Define tail andnontail indicator variables

In,t := I(

yt ≥y(−)(kn)

)and In,t := I

(yt ≥−ln

)In,t :=

(nkn

)1/2{I(yt <−ln

)−P(yt <−ln

)}.

Stack nontail and tail variables and define their long-run covariance matrix

Wn,t :=[

y∗n,t −E

[y∗

n,t

], In,t

]′and �n := 1

n

n∑s,t=1

E[Wn,sW ′

n,t]. (A1)

Recall S2n := E(n−1/2

∑n

t=1{y∗

n,t− E[y∗n,t]})2.

Lemma A.1 (variance rate). Let Assumptions D and T hold.

a. S2n = rnE[y∗2

n,t] = o(n) where {rn} is a sequence of positive numbers liminfn→∞rn > 0,rn = O(ln(n)), rn does not depend on kn, and rn ∼ K if yt is m-dependent or E[y2

t ] < ∞.

at University of N

orth Carolina at C



ownloaded from




b. If κ = 2 then E[y∗2n,t] ∼ K ln(n), and if κ ∈ (1,2) then E[y∗2

n,t] ∼ κ(2−κ)−1d2/κ

(n/kn)2/κ−1.

c. E(n−1/2∑nt=1In,t)2 = rnE[I2

n,t] where liminfn→∞rn > 0 and rn = O(1).

Lemma A.2 (intermediate order statistic approximation). Let {an,t : 1 ≤ t ≤ n}n≥1be any stochastic triangular array where an,t is �t-measurable and max1≤t≤n{|an,t|} =Op(1). Under Assumptions D, N, and T n−1/2S−1

n∑n

t=1an,tyt{In,t − In,t} = op(1).

Lemma A.3 (central order statistic approximation). Under Assumptions D, N,and T

1n1/2Sn

n∑t=1

ytIn,t{I(yt ≤y([αn])

)−I(yt ≤qα

)}=op(1).

Lemma A.4 (intermediate order statistic expansion). Let Assumptions D and Thold. For any positive sequences {ln,kn} that satisfy kn → ∞, kn = o(n) and (n/kn)P(yt <

−ln) → 1 we have k1/2n (y(−)

(kn)/(−ln) − 1) = κ−1n−1/2∑nt=1In,t ×(1 + op(1)) = Op(1).

Lemma A.5 (central limit theorem). Under Assumptions D, N′ and T n−1/2

(λ′�nλ)−1/2 × ∑nt=1λ′Wn,t

d→ N(0,1) for any λ ∈ R2 such that λ′λ = 1.

Remark 1. If λ = [1,0] then λ′�nλ′ = S2n hence n−1/2S−1

n∑n

t=1{y∗n,t −E[y∗

n,t]}d→

N(0,1).

Remark 2. By the Cramér-Wold Theorem n−1/2�−1/2n

∑nt=1Wn,t

d→ N(0,I2).

We are now ready to prove the main results.

Proof of Lemma 1.1. Each claim follows from Lemma A.1.a,b. �

Proof of Theorem 1.2. Recall y∗n,t := ytI(−ln ≤ yt ≤ qα). By the construction of ES

(∗)n

use intermediate order statistic approximation Lemma A.2 with an,t = I(yt ≤ y([αn]))to deduce

n1/2

Sn/α

{ES

(∗)n −E

[y∗

n,t]} = − 1

n1/2Sn

n∑t=1

{y∗

n,t −E[y∗

n,t]}

− 1n1/2Sn

n∑t=1


)−I(yt ≤qα

)}+op(1)

= An +Bn +op(1),

say. By central order statistic approximation Lemma A.3 Bn = op(1). Now use Sn =o(n1/2) by Lemma A.1.a to obtain

ES(∗)n −ESα =− 1

αE[y∗

n,t]−ESα − 1

α× 1

n

n∑t=1

{y∗

n,t −E[y∗

n,t]}+op(1).

at University of N

orth Carolina at C



ownloaded from




The first claim ES(∗)n

p→ ESα now follows from −α−1E[y∗n,t] → ESα by dominated

convergence, and 1/n∑n

t=1{y∗

n,t − E[y∗n,t]}

p→ 0 by Chebyshev’s inequality in view

of S2n = o(n) by Lemma A.1.a.Next, apply Remark 1 of central limit theorem Lemma A.5 to An to conclude

n1/2

Sn/α

(ES

(∗)n + 1

αE[y∗

n,t])=An +op(1)

d→N(0,1). (A2)

Since α−1E[y∗n,t] = −ESα + α−1E[ytI(yt <−ln)] the second claim is proved. The third

claim follows from (A2) and identification Assumption I. �

Proof of Theorem 2.1. Define

Y∗n,t :=y∗

n,t −E[y∗

n,t]

and In,t :=(

nkn

)1/2{I(yt <−ln

)−P(yt <−ln

)},

and Wn,t = [Y∗n,t,In,t]′. Recall S2

n := n−1E(∑n

t=1Y∗n,t)

2, V2n := B′

n�nBn, �n :=1/n

∑ns,t=1E[Wn,sW ′

n,t] and Bn := [1, −(κ − 1)−1(kn/n)1/2ln]′, and define variances

σ 2n := 1

nE

( n∑t=1

In,t

)2

and Hn := 1n

E

⎛⎝ n∑s,t=1

Y∗n,sIn,t

⎞⎠.

We first relate V2n to S2

n and then derive the distribution limit of ES(1)n .

Step 1 (V2n): Note

V2n =B′

n�nBn =S2n −2

1κ−1

(kn

n

)1/2lnHn + 1

(κ−1)2kn

nl2nσ 2

n .

By Lemma A.1.a,c liminfn→∞S2n > 0 and σ 2

n = O(1), hence σ 2n = O(S2

n) and thereforeby the Cauchy–Schwartz inequality Hn = O(S2

n). Moreover, by power law tail decayAssumption D and the threshold construction (2) it follows ln/(n/kn)1/κ = K > 0hence the Bn term (kn/n)1/2ln ∼ K(kn/n)1/2−1/κ . Therefore (kn/n)1/2ln → 0 if κ > 2,(kn/n)1/2ln → K > 0 if κ = 2, and (kn/n)1/2ln → ∞ if κ > 2. Finally, if κ ∈ (1,2] then byLemma A.1.a,b S2

n ∼rn(n/kn)2/κ−1 where rn = O(ln(n)) and liminfn→∞rn > 0, hence(kn/n)l2n/S2

n = O(1).It now follows by case V2

n = S2n(1 + o(1)) if κ > 2 and V2

n = S2n(1 + O(1)) if κ ∈

(1,2].Step 2 (ES

(1)n ): By the proof of Theorem 1.2 ES

(∗)n obtains the expansion

n1/2

Sn/α

(ES

(∗)n −ESα

)=− 1

n1/2Sn

n∑t=1

Y∗n,t +

n1/2

Sn

(−E[y∗

n,t]−ESα

)+op(1). (A3)

at University of N

orth Carolina at C



ownloaded from




Under Assumptions D and T′ it follows κ(−)mn = κ + Op(1/m1/2

n ) by Theorem 3 in Hill(2010), hence

κ(−)mn

κ(−)mn −1

kn

ny(−)

(kn) = −lnκ

κ−1kn

n+ κ

κ−1k1/2

n

nln ×k1/2

n

(y(−)

(kn)/ln +1)

(A4)

+(

κ(−)mn

κ(−)mn −1

− κ

κ−1

)k1/2

n

nln ×k1/2

n

(y(−)

(kn)/ln +1)

−(

κ(−)mn

κ(−)mn −1

− κ

κ−1

)kn

nln

= − κ

κ−1kn

nln + κ

κ−1k1/2

n

nln ×k1/2

n

(y(−)

(kn)/ln +1)+Op

(m−1/2

nkn

nln

).

Further, by intermediate order statistic expansion Lemma A.4

k1/2n

(y(−)

(kn)/ln +1)=−κ−1 1

n1/2

n∑t=1

In,t ×(1+op(1)

). (A5)

Finally, by tail mean formula (4)

E[ytI

(yt <−ln

)]=− κ

κ−1kn

nln ×(1+o(1))

and by construction

E[y∗

n,t]− κ

κ−1kn

nln −ESα =− κ

κ−1kn

nln −E

[ytI

(yt <−ln

)]. (A6)

Combine (A3)–(A6) and simplify to obtain

n1/2

Vn/α

(ES

(1)n −ESα

)= n1/2

Vn/α

(ES

(∗)n + 1

α

(κ

(−)mn

κ(−)mn −1

kn

ny(−)

(kn)

)−ESα

)

={

1n1/2Vn

n∑t=1

Y∗n,t −

1κ−1

1Vn

(kn

n

)1/2ln

1n1/2

n∑t=1

In,t

}

−n1/2

Vn

(E[ytI

(yt <−ln

)]+ κ

κ−1kn

nln

)

+Op

(1Vn

(n

mn

)1/2 kn

nln

)+op(1)

= Zn +A1,n +A2,n +op(1),

say. We will show A1,n = o(1), A2,n = op(1) and Znd→ N(0,1).

at University of N

orth Carolina at C



ownloaded from




Consider A1,n. Peng’s (2001, pp. 259–264) arguments in conjunction withnondegeneracy Assumption N

′and tail decay Assumption T′ prove

⎛⎝ n

E[y∗2

n,t

]⎞⎠1/2(

E[ytI

(yt <−ln

)]+ κ

κ−1kn

nln

)→0.

See also Csörgo, Horváth, and Mason (1986) and Csörgo, Haeusler, and Mason(1988). Since by Step 1 and Lemma A.1 V2

n ∼KS2n ∼ rnE[y∗2

n,t] where liminfn→∞rn> 0 and rn = O(ln(n)), it follows A1,n = o(1).

Now consider A2,n. By power-law Assumption T and (2) it follows ln =K(n/kn)1/κ . Further, Lemma A.1.a and trimming negligibility imply liminfn→∞S2

n> 0 hence liminfn→∞V2

n > 0 by Step 1. Therefore

1Vn

(n

mn

)1/2 kn

nln ≤K

(n

mn

)1/2( nkn

)1/κ−1=K

k1−1/κn

m1/2n

1n1/2−1/κ

≤K(

kn

mn

)1−1/κ

.

Since by assumption κ > 1 and kn/mn = o(1) it follows A2,n = op(1).Finally, by construction and Remark 2 of Lemma A.5

Zn = 1n1/2Vn

n∑t=1

Y∗n,t −

1κ−1

(kn

n

)1/2ln

n∑t=1

In,t = 1n1/2Vn

B′n

n∑t=1

Wn,t

= 1(B′n�nBn

)1/2 B′n�

1/2n

1n1/2 �

−1/2n

n∑t=1

Wn,td→N(0,1).

�

Proof of Theorem 2.2. Recall 0 < λ < λ and mn(λ) = [λmn]. Since λn =arginfλ∈[λ,λ] |ES

(∗)n + R(2)

n (λ) − ESn| operates only on κ(−)mn(λ) in R(2)

n (λ), consider the

weak limit of {m1/2n (κ−1

mn(λ) − κ−1) : λ ∈ [λ,λ]}. Define σ 2n (λ) := E[(m1/2

n (λ)(κ−1mn(λ)

− κ−1))2]. Under the maintained assumptions and Theorem 5.1 in Hill (2009)there exists a zero mean Gaussian process {Z(λ) : λ ∈ [λ,λ]} on C[λ,λ] such that{m1/2

n (κ−1mn(λ) − κ−1) : λ ∈ [λ,λ]} �⇒ {Z(λ) : λ ∈ [λ,λ]} where �⇒ denotes weak

convergence on C[λ,λ] the space of continuous functions. Hence supλ∈[λ,λ] |κ−1mn(λ) −

κ|= Op(1/infλ∈[λ,λ]m1/2n (λ)) by the continuous mapping theorem, which is op(1/k1/2

n )given kn/mn → 0. The argument used to prove Theorem 2.1 now carries oververbatim. �

Proof of Theorem 2.3. None of the following arguments depend on the sign of yt,so assume yt > 0 a.s. for notational convenience. Therefore all components of thebias corrected estimator are positive, in particular {ln} satisfies P(yt > ln) = kn/n.

at University of N

orth Carolina at C



ownloaded from




Write κkn = κ(−)kn

and y(kn) = y(−)(kn), and observe for any tiny ι > 0,

n1/2R(1)n = 1

α

(κmn

κmn −1kn

n1/2−ι

lnnι

y(kn)

ln

)

It suffices to show ln/nι → 0, y(kn)/lnp→ 1, and 1/κmn

p→ 0. By supposition kn/n1/2−ι

→ 0 hence the proof is complete.

Step 1 (ln). By exponential tail decay P(yt ≥ c) = ϑ exp{−� cδ} in (9), and thedefinition of ln, it follows

ln =(

1�

lnϑ + 1�

lnnkn

)1/δ

∼ 1� 1/δ

(ln(n))1/δ =o(nι). (A7)

Step 2 (y(kn)). We will prove k1/2n ln(y(kn)/ln) = Op(1) hence y(kn)/ln = 1 + Op(1/k1/2

n ).Define, In,t(u) := (n/kn)I(yt > lneu) for any u ∈ R, and In(u) := 1/n

∑nt=1In,t(u). By

construction k1/2n ln(y(kn)/ln) ≤ u for u ∈ R if and only if In(u/k1/2

n ) ≤ 1, if and only if

In(u/k1/2n )−E

[In(u/k1/2

n )]≤1− n

knP(

yt > lneu/k1/2n)=1−

P(

yt > lneu/k1/2n)

P(yt > ln

) .

Expand P(yt > lneu/k1/2n ) around u = 0. Use ln ∼ �−1/δ(ln(n))1/δ and (∂/∂c)P(yt > c)

= −�ϑδcδ−1exp{−� cδ}, and the mean-value-theorem, to deduce k1/2n ln(y(kn)/ln) ≤

u if and only if for some |u∗| ≤ |u|

k1/2n

(In(u/k1/2

n )−E[In(u/k1/2

n )])

≤− 1P(∣∣yt

∣∣> ln) ∂

∂cP(yt >c

)|c=lneu∗/k1/2

n×lneu∗/k1/2

n ×u

=δϑ�nkn

exp{−�

(lneu∗/k1/2

n)δ}(

lneu∗/k1/2n)δ−1

×lneu∗/k1/2n u=δϑ

ln(n)

knu×(1+o(1)).

Therefore k1/2n ln(y(kn)/ln) and (k3/2

n /(δϑ ln(n))) × (In(u/k1/2n ) − E[In(u/k1/2

n )]) havethe same limit distribution. But in view of the Assumption D mixing property andthe zero mean L2-boundedness of (n/kn)1/2{I(yt > lneu) − E[I(yt > lneu)]} we haveby Theorem 1.6 and Lemma 2.1 in McLeish (1975)

k1/2n

(In(u/k1/2

n )−E[In(u/k1/2

n )])

= 1n1/2

n∑t=1

(nkn

)1/2{I(

yt > lneu/k1/2n)−E

[I(

yt > lneu/k1/2n)]}

=Op(1).

at University of N

orth Carolina at C



ownloaded from




By the supposition kn = O(ln(n)) it therefore follows

k3/2n

ln(n)

(In(u/k1/2

n )−E[In(u/k1/2

n )])

= kn

ln(n)k1/2

n

(In(u/k1/2

n )−E[In(u/k1/2

n )])

=Op(1).

Therefore k1/2n ln(y(kn)/ln)=Op(1).

Step 3 (κmn ). Let {cn} be the sequence of positive numbers that satisfies P(yt > cn)= mn/n. We have

κ−1mn

= 1mn

mn∑j=1

ln(y(i)/y(mn+1)

)(A8)

= 1mn

n∑i=1

ln(y(i)/y(mn+1)

)×I(yi >y(mn+1)

)

= 1mn

n∑i=1

ln(y(i)/cn

)×I(yi >y(mn+1)

)+op(1)

= 1mn

n∑i=1

ln(yi/cn

)×I(yi > cn

)+op(1)= 1n

n∑i=1

xn,i +op(1),

say. The third equality follows from y(mn+1)/cn = 1 + Op(1/m1/2n ) by an application

of Step 2 since

1mn

n∑i=1

ln(y(i)/y(mn+1)

)×I(yi >y(mn+1)

)− 1mn

n∑i=1

ln(y(i)/cn

)×I(yi >y(mn+1)

)

=(ln(y(mn+1)

)−ln(cn))× 1

mn

n∑i=1

I(yi >y(mn+1)

)

=Op

(1

m1/2n

1mn

n∑i=1

I(yi >y(mn+1)

))=Op

(1/m1/2

n

).

Notice this exploits the construction of y(mn+1) and distribution continuity:1/mn

∑ni=1 I(yi > y(mn+1)) = 1 a.s. The fourth equality in (A8) can be verified under

very general conditions in view of the stationary mixing property and the factthat y(mn+1) is the sample mn/n tail quantile. The method of proof is identical toLemmas A.2 and A.3 in Hill (2013).

In order to prove κ−1mn

p→ 0 it remains to verify 1/n∑n

i=1xn,ip→ 0. The variable

xn,i = (n/mn)ln(yi/cn)I(yi > cn) has a positive finite mean that decays to zero.

at University of N

orth Carolina at C



ownloaded from




This follows by invoking exponential tail bound (9) and P(yi > cn) = mn/n to obtain

E[xn,i

] = nmn −1

∫ ∞

0P(ln(yi/cn

)>u

)du= n

mn −1

∫ ∞

0P(yi > cneu)du

= nmn −1

P(yi > cn

)∫ ∞

0

P(yi > cneu)

P(yi > cn

) du

= mn

mn −1

∫ ∞

0exp

{−� cδn(eδu −1

)}du>0.

Seeing that � > 0, and (A7) implies cn → ∞, it follows by dominated convergenceE[xn,i] ↘ 0. Hence the Cesáro sum 1/n

∑ni=1E[xn,i] ↘ 0. Now use Markov’s

inequality and 1/n∑n

i=1E[xn,i] ↘ 0 to deduce 1/n∑n

i=1xn,ip→ 0. �

Proof of Theorem 2.4. We have −y(−)(kn)/ln = 1 + Op(1/k1/2

n ) by Lemma B.3, while Hill

(2012b, Lemma A.9) proves S2n/S2

np→ 1 under conditions that cover Assumptions

D, K, N, and T′. Further, the proofs of tail array HAC consistency Theorem 3 in Hill(2010) and tail-trimmed HAC consistency Lemma A.9 in Hill (2012b) can be easily

extended to prove ||�n�−1n − I2|| p→ 0. The claim V2

n/V2n

p→ 1 now follows from theSlutsky Theorems. �

APPENDIX B: Proofs of Lemmas A.1–A.5

Proof of Lemma A.1.

Claim (a): We simplify notation by proving E(1/n1/2∑nt=1yn,t − E[yn,t])2 = rnE[y2

n,t]for the two-tailed intermediate order trimmed yn,t := ytI(|yt| ≤ ln). Simply replaceytI(|yt| ≤ ln) with ytI(−ln ≤yt ≤qα) to prove the claim.

Note Sn ∼ E[y2n,t]+2

∑n−1i=1 (1 − i/n)E[yn,1yn,i+1]. If E[y2

t ] < ∞ then Sn ∼ K bythe Assumption D geometric α-mixing property (Ibragimov, 1962) hence the claimSn = rnE[y2

n,t] holds for some rn = O(1). Now assume E[y2t ] = ∞ such that the tail

index κ ∈ (1,2].Define quantile functions Qn(u) = inf{a ≥ 0 : P(|yn,t| > a) ≤ u} and Q(u) = inf{a

≥ 0 : P(|yt| > a) ≤ u} for u ∈ [0,1], and recall under geometric α-mixing the mixingcoefficients satisfy αh ≤ Kρh for ρ ∈ (0,1). By Theorem 1.1 of Rio (1993)

n−1∑i=1

∣∣E[yn,1yn,i+1]∣∣≤2

n−1∑i=1

∫ αh

0Q2

n(u)du≤2n−1∑i=1

∫ ρh

0Q2

n(u)du.

By tail-trimming and distribution continuity P(yn,t = 0) = kn/n. Hence by thethreshold construction (2) we have Qn(u) = 0 for u ∈[0,kn/n] and Qn(u) = Q(u) foru ∈ (kn/n,1], and by the Assumption T power law property Q(u) = Ku−1/κ (1 + o(1))

at University of N

orth Carolina at C



ownloaded from




as u → 0. Now exploit Q(u) ≤ Ku−1/κ and dominated convergence to deduce

n−1∑i=1

∣∣E[yn,1yn,i+1]∣∣ ≤ K

n−1∑i=1

∫ ρh

kn/nu−2/κdu=K

n−1∑i=1

max{

0,(n/kn

)2/κ−1 −ρ−i(2/κ−1)}

= KAln(n/kn)∑

i=1

{(n/kn

)(2/κ−1)−ρ−i(2/κ−1)},

where A := (2/κ − 1)/ln(ρ). The final equality follows from (n/kn)2/κ−1 < ρ−i(2/κ−1)

when i > Aln(n/kn). Further∑Aln(n/kn)

i=1 {(n/kn)2/κ−1 − ρ−i(2/κ−1)} = K ln(n/kn)×(n/kn)2/κ−1(1 + O(1)) and kn = o(n), hence

n−1∑i=1

∣∣E[yn,1yn,i+1]∣∣ ≤ K ln

(n/kn

)×(n/kn

)2/κ−1(1+O(1))

≤ K ln(n/kn

)×(n/kn

)2/κ−1 ≤K ln(n)×(n/kn

)2/κ−1.

Since E[y2n,t] ∼ K ln(n) if κ = 2 and E[y2

n,t] ∼ K(n/kn

)2/κ−1 if κ < 2 by Claim (b), theproof is complete.

Claim (b): Under Assumption T and (2) the thresholds are by construction ln= d1/κ (n/kn)1/κ . The claims follow from power law decay Assumption T and anapplication of Karamata’s Theorem (Resnick (1987): Theorem 0.6): if κ = 2 thenE[y∗2

n,t] ∼ K ln(n), and if κ ∈ (1,2) then E[y∗2n,t] ∼ κ(2 − κ)−1l2n(kn/n) = κ(2 −

κ)−1d2/κ (n/kn)2/κ−1.

Claim (c): Observe {I(yt < −ln)/k1/2n ,�t} forms a geometric L2mixingale array

with mixingale constants Kn−1/2 under geometric α-mixing Assumption D(Hill, 2010, Lemma 2). Therefore E(1/n1/2∑n

t=1In,t)2 ≤ K∑n

t=1(Kn−1/2)2 ≤ K byTheorem 1.6 in McLeish (1975). Since E[I2

n,t] = 1 + o(1) by construction we haveshown E(1/n1/2∑n

t=1In,t)2 ≤ KE[I2n,t]. �

Proof of Lemma A.2. The indicator function I(u) := I(u ≤ 0) can be approximated bya smooth regular sequence {In(u)}n≥1, cf. Lighthill (1958). Let {Nn} be a sequenceof finite positive numbers, Nn → ∞, the rate to be chosen below. Now define In(u):= ∫∞

−∞ I(� )S(Nn(� − u))Nne−� 2/N 2n d�, where S(ξ ) = e−1/(1−ξ2)/

∫ 1−1e−1/(1−w2)dw

if |ξ | < 1 and S(ξ ) = 0 if |ξ | ≥ 1. The function S(Nn(� − u)) blots out I(� ) when �

is outside the open interval (u − 1/Nn,u + 1/Nn). The function I(u) is differentiablewith derivative δ(u) the Dirac delta function. Note δ(u) has a regular sequenceapproximation Dn(u) := (Nn/π )1/2exp{−Nnu2}. See Lighthill (1958, p. 22).

Define Yt(a) := yt − a. We have

1n1/2Sn

n∑t=1


)−I(yt ≤qα

)}

at University of N

orth Carolina at C



ownloaded from




= 1n1/2Sn

n∑t=1

ytIn,t{In(Yt

(y([αn])

))−In(Yt

(qα

))}

+ 1n1/2Sn

n∑t=1

ytIn,t{I(Yt

(y([αn])

))−In(Yt

(y([αn])

))}

− 1n1/2Sn

n∑t=1

ytIn,t{I(Yt

(qα

))−In(Yt

(qα

))}.

The second and third terms can be forced to be as close to zero as we like for eachn by choosing Nn → ∞ sufficiently fast, hence they are op(1). Cf. Phillips (1995).

The first term on the right-hand-side can be expanded by the mean-value-theorem. Use y([αn]) = qα + Op(1/n1/2) as in Cižek (2008, proof of Lemma A.3), andnote for some q∗

n, |q∗n − qα| ≤ |y([αn]) + qα|,

An :=∣∣∣∣∣ 1n1/2Sn

n∑t=1

ytIn,t{In(Yt

(y([αn])

))−In(Yt

(qα

))}∣∣∣∣∣

=∣∣∣∣∣ 1n1/2Sn

n∑t=1

ytIn,tDn(Yt

(q∗

n))×(

y([αn]) +qα

)∣∣∣∣∣=Op

(E∣∣ytIn,t

∣∣Sn

1n

n∑t=1

∣∣ytIn,t∣∣

E∣∣ytIn,t

∣∣Dn(Yt

(q∗

n)))

.

By Lyapunov’s inequality and variance bound Lemma A.1.a E|ytIn,t|/Sn ≤(E[y2

t In,t])1/2/Sn = O(1); by stationarity and ergodicity of yt it follows

1/n∑n

t=1|ytIn,t|/E|ytIn,t| p→ 1 since |ytIn,t|/E|ytIn,t| is integrable; and since by

distribution continuity Yt(q∗n) = yt −q∗

n �= 0 a.s. it follows max1≤t≤n{Dn(Yt(q∗n))}

p→ 0 as fast as we choose by setting Nn → ∞ sufficiently fast. Therefore An = op(1),which completes the proof. �

Proof of Lemma A.4. Define In,t(u) := (n/kn)1/2{I(yt < −lneu) − P(yt < −lneu)} foru ∈ R, hence by the notation of Appendix A In,t = In,t(0). The dependence andtail conditions of Lemma 3 in Hill (2010) are satisfied under Assumptions D and

T, hence n−1/2∑nt=1In,t(u)

d→ N(0,v2(u)) where v2(u) > 0 for each u. This impliesn−1/2∑n

t=1In,t(0) = Op(1) as claimed. Further, it implies by dominated convergence

n−1/2∑nt=1In,t(u/k1/2

n )d→ N(0,v2(0)) for each u hence by arguments in Hsing (1991),

p. 1553) k1/2n (y(−)

(kn)/(−ln) − 1) = κ−1n−1/2∑nt=1In,t(u/k1/2

n )×(1 + op(1)) for each u.

Pick u = 0 to deduce as claimed k1/2n (y(−)

(kn)/(−ln) − 1) = κ−1n−1/2∑nt=1In,t(0)×(1 +

op(1)). �

Proof of Lemma A.5. We first require some notation. Recall In,t = (n/kn)1/2{I(yt <

−ln) − P(yt < −ln)} and Wn,t := [y∗n,t − E[y∗

n,t],In,t]′, and define Z∗n,t := λ′Wn,t and

at University of N

orth Carolina at C



ownloaded from




S2n := E(

∑nt=1Z∗

n,t)2 where we hide dependence on λ. Let S and T be families of

integers, and define the sigma-field Gn,T := σ (Z∗n,t/Sn : t ∈ T). Let L2(A) denote the

space of zero mean and finite variance A-measurable random variables. Defineρ∗

k := supn≥1supdist(T−S)≥k maxf ∈L2(Gn,T ),g∈L2(Gn,S) |corr(f ,g)| (see Bradley, 1993 forreferences on the interlaced maximal correlation coefficient).

We must show S−1n∑n

t=1Zn,td→ N(0,1). It suffices to verify the conditions

of Theorem 2.1 in Peligrad (1996). By �t-measurability, Assumptions D and N′and variance properties Lemma A.1, it follows Z∗

n,t/Sn is geometrically α-mixing,stationary over 1 ≤ t ≤ n, and uniformly L2-bounded supn≥1E(Z∗

n,t/Sn)2 ≤ K. Itremains to verify (i) limk→∞ ρ∗

k < 1, (ii) supn≥1S−2n∑n

t=1E[Z∗2n,t] < ∞, and (iii)

S−2n∑n

t=1E[Z∗2n,tI(|Z∗

n,t| > εSn)] → 0 ∀ε > 0.Observe ρ∗

k is defined over finite variance σ (∪n≥1 ∪T Gn,T)-measurable randomvariables, where σ (∪n≥1 ∪T Gn,T) ⊆ σ (∪t∈Zσ (yτ : τ ≤ t)) while yt is stationary andgeometrically α-mixing. Therefore ρ∗

k = O(ρk) for some ρ ∈ (0,1) can be verified asin Theorem 1 of Bradley (1993). Since limk→∞ ρ∗

k = 0 we have shown (i). AssumptionN′ and stationarity ensure nE[Z∗2

n,t]/S2n = O(1) hence (ii).

Finally, consider the Lindeberg condition (iii) and assume E[y∗n,t]= 0 to simplify

notation. In view of stationarity we need only prove the well-known sufficientLyapunov condition (e.g., Davidson, 1994, Theorem 23.11)

n

S2+δn

E∣∣Z∗

n,t∣∣2+δ →0 for some δ>0. (A9)

By variance properties Lemma A.1 and the fact that E[I2n,t] = 1 + o(1) by

construction, it is straightforward to verify S2n =E(

∑nt=1Z∗

n,t)2 = E(

∑nt=1y∗

n,t)2 ×

(1 + O(1)). See the proof of Theorem 2.1. Further, by the threshold constructionE[I(yt < −ln)] = kn/n and Minkowski’s inequality it follows

E∣∣Z∗

n,t∣∣2+δ ≤K

((E∣∣y∗

n,t∣∣2+δ

)1/(2+δ) +(

nkn

)1/2(kn

n

)1/(2+δ))2+δ

for some finite K > 0. Therefore since nE[y∗2n,t]/E(

∑nt=1y∗

n,t)2 = O(1) by Assumption

N′, we have

n

S2+δn

E∣∣Z∗

n,t∣∣2+δ ≤K

1

nδ/2(E[y∗2

n,t

])1+δ/2

((E∣∣y∗

n,t∣∣2+δ

)1/(2+δ) +(

nkn

)1/2(kn

n

)1/(2+δ))2+δ

=:An.

Write (z)+ := max(0,z). Under power law Assumption T and the thresholdconstruction P(yt < −ln) = (kn/n) it follows ln =K(n/kn)1/κ . If κ ∈ (0,2) then by an

at University of N

orth Carolina at C



ownloaded from




application of Karamata’s Theorem E|y∗n,t|2+r ∼ Kl2+r

n P(|yt| > ln) for any r ≥ 0, hence

An ≤ K1

nδ/2(l2n(kn/n

))1+δ/2

((l2+δn

(kn/n

))1/(2+δ) +(

nkn

)1/2(kn

n

)1/(2+δ))2+δ

= K1

l2+δn kδ/2

n

(ln +

(nkn

)1/2)2+δ

=K1

kδ/2n

(1+

(kn

n

)1/κ−1/2)2+δ

= O(k−δ/2n )=o(1)

since 1/κ > 1/2.If κ ∈ [2,2 + δ) then E|y∗

n,t|2+δ ∼ Kl2+δn (kn/n) and liminfn→∞E[y∗2

n,t] > 0, hence

An ≤ Kkn/nnδ/2

(ln +

(nkn

)1/2)2+δ

= K1

nδ/2

(nkn

)δ/2((

kn

n

)1/2−1/κ

+1

)2+δ

=O(k−δ/2n )=o(1),

because 1/2 ≥ 1/κ . If κ = 2 + δ then E|y∗n,t|2+δ = K ln(n) by an application of

Karamata’s Theorem hence

An ≤ K1

nδ/2

(ln(n)+

(nkn

)1/2(kn

n

)1/(2+δ))2+δ

= Kkn

n1

nδ/2

((nkn

)1/(2+δ)ln(n)+

(nkn

)1/2)2+δ

≤ Kkn

n1

nδ/2

((nkn

)1/2)2+δ

=K1

kδ/2n

=o(1).

Finally, if κ > 2 + δ then by the same argument An ∼ K/kδ/2n → 0. This verifies (A9)

which completes the proof. �

Received June 14, 2012; revised May 15, 2013; accepted June 15, 2013.

REFERENCES

Acerbi, C. 2002. Spectral Measures of Risk: A Coherent Representation of SubjectiveRisk Aversion. Journal of Banking and Finance 26:1505–1518.

Acerbi, C., and D. Tasche. 2002. On the Coherence of Expected Shortfall. Journal ofBanking and Finance 26:1487–1503.

at University of N

orth Carolina at C



ownloaded from




Acharya, V. V., L. H. Pedersen, T. Philippon, and M. Richardson. 2010. “MeasuringSystemic Risk.” Technical Report, Department of Finance, NYU.

Adrian, T., and M. Brunnermeier. 2011. “CoVar.” Working Paper, Department ofEconomics, Princeton University.

Artzner, P. F., J. Delbaenm, M. Eber, and D. Heath. 1999. Coherent Measures of Risk.Mathematical Finance 9:203–228.

Basrak, B., R. A. Davis, and T. Mikosch. 2002. Regular Variation of GARCHProcesses. Stochastic Processes and their Applications 99:95–110.

Berkes, I., L. Horváth, and P. Kokoszka. 2003. Estimation of the Maximal MomentExponent of a GARCH(1,1) Sequence. Econometric Theory 19:565–586.

Bradley. R. C. 1993. Equivalent Mixing Conditions for Random Fields. Annals ofProbability 21:1921–1926.

Brownlees, C. T., and R. Engle. 2010. “Volatility, Correlation and Tails for SystemicRisk Measurement.” Technical Report, Department of Finance, NYU.

Brockwell, P. J., and D. B. H. Cline. 1985. Linear Prediction of ARMA Processes withInfinite Variance. Stochastic Processes and their Applications 19:281–296.

Cai, Z., and X. Wang. 2008. Nonparametric Estimation of Conditional VaR andExpected Shortfall. Journal of Econometrics 147:120–130.

Caporale, G. M., N. Pittis, and N. Spagnolo. 2003. IGARCH Models and StructuralBreaks. Applied Economics Letters 10:765–768.

Chaudhuri, S., and J. B. Hill. 2013. “Robust Estimation for Average TreatmentEffects.” Working Paper, Department of Economics, University of NorthCarolina.

Chen, S. 2008. Nonparametric Estimation of Expected Shortfall. Journal of FinancialEconometrics 1:87–107.

Christoffersen, P. F. 2012. Elements of Financial Risk Management, 2nd edn. Waltham,Mass., US: Adademic Press.

Cižek, P. (2008). General Trimmed Estimation: Robust Approach to Nonlinear andLimited Dependent Variables Models. Econometric Theory 24:1500–1529.

Csörgo S., L. Horváth, and D. Mason (1986). What Portion of the Sample Makesa Partial Sum Asymptotically Stable or Normal? Probability Theory and RelatedFields 72:1–16.

Csörgö, S., Haeusler, E., Mason, D. M. (1988). The Asymptotic Distribution ofTrimmed Sums. Annals of Probability 16:672–699.

Danielsson, J. 2011. Financial Risk Forecasting. London: Wiley.David, H. A., and H. N. Nagaraja. 2003. Order Statistics. London: Wiley.Davidson, J. 1994. Stochastic Limit Theory. Oxford: Oxford University Press.Davis, R. A. 2010. Heavy Tails in Financial Time Series. In R. Cont, (ed.), Encyclopedia

Quantitative Finance. New York: Wiley.de Haan, L., and L. Peng. 1998. Comparison of Tail Index Estimators. Statistica

Neerlandica 52:60–70.Dehling, H., M. Denker, and W. Phillip. 1986. Central Limit Theorems for

Mixing Sequences of Random Variables under Minimal Conditions. Annalsof Probability 14:1359–1370.

at University of N

orth Carolina at C



ownloaded from




Embrechts, P., C. Klüppleberg, and T. Mikosch. 1997. Modelling Extremal Events forInsurance and Finance. Berlin: Springer-Verlag.

Fermanian, J. D., and O. Scaillet. 2005. Sensitivity Analysis of VaR and ExpectedShortfall for Portfolios under Netting Agreements. Journal of Banking andFinance 29:927–958.

Garcia, R., E. Renault, and G. Tsafack. 2007. Proper Conditioning for Coherent VaRin Portfolio Management. Management Science 53:483–494.

Gourieroux, C., J. P. Laurent, and O. Scaillet. 2000. Sensitivity Analysis of Values atRisk. Journal of Empirical Finance 7:225–245.

Hahn, M. G., J. Kuelbs, and D. C. Weiner. 1990. The Asymptotic Distributionof Magnitude-Winsorized Sums Via Self-Normalization. Journal of TheoreticalProbability 3:137–168.

Haeusler, E., and J. L. Teugels. 1985. On Asymptotic Normality of Hill’s Estimatorfor the Exponent of Regular Variation. Annals of Statistics 13:743–756.

Hall, P. 1982. On Some Simple Estimates of an Exponent of Regular Variation. Journalof the Royal Statistical Society Series B 44:37–42.

Hill, B. M. 1975. A Simple General Approach to Inference About the Tail of aDistribution. Annals of Statistics 3:1163–1174.

Hill, J. B. 2009. On Functional Central Limit Theorems for Dependent, Heteroge-neous Arrays with Applications to Tail Index and Tail Dependence Estimation.Journal of Statistical Planning and Inference 139:2091–2110.

Hill, J. B. 2010. On Tail Index Estimation for Dependent. Heterogeneous DataEconometric Theory 26:1398–1436.

Hill, J. B. 2011a. Tail and Non-Tail Memory with Applications to Extreme Value andRobust Statistics. Econometric Theory 27:844–884.

Hill, J. B. 2011b. Extremal Memory of Stochastic Volatility with an Applicationto Tail Shape Inference. Journal of Statistical Planning and Inference 141:663–676.

Hill, J. B. 2012a. Heavy-Tail and Plug-In Robust Consistent Conditional MomentTests of Functional Form: in X. Chen and N. Swanson (eds.), Recent Advancesand Future Directions in Causality, Prediction, and Specification Analysis: Essays inHonor of Halbert L. White Jr., pp. 241–274, New York: Springer.

Hill, J. B. 2012b. “Robust Estimation and Inference for Heavy Tailed GARCH.”Working Paper, Department of Economics, University of North Carolina.

Hill, J. B. 2013. Least Tail-Trimmed Squares for Infinite Variance Autoregressions.Journal of Time Series Analysis 34:168–186.

Hill, J. B., and M. Aguilar. 2012. Moment Condition Tests for Heavy-Tailed TimeSeries. Journal of Econometrics 172:255–274.

Hill, J. B., and E. Renault. 2012. “Variance Targeting for Heavy Tailed Time Series.”Working Paper, Department of Economics, University of North Carolina.

Hsing, T. 1991. On Tail Index Estimating Using Dependent Data. Annals of Statistics19:1547–1569.

Huisman, R., K. G. Koedijk, C. J. M. Kool and F. Palm. 2001. Tail-Index Estimates inSmall Samples. Journal of Business and Economic Statistics 19:208–216.

at University of N

orth Carolina at C



ownloaded from




Ibragimov, I. A. 1962. Some Limit Theorems for Stationary Sequences. Theory ofProbability and its Applications 7:349–382.

Ibragimov, M., R. Ibragimov, and P. Kattuman. 2010. “Emerging Markets and HeavyTails.” Mimeo, Department of Economics, Harvard University.

Ibragimov R. 2009. Portfolio Diversification and Value at Risk under Thick-Tailedness. Quantitative Finance 9:565–580.

Ibragimov, R., and U. Müller. 2010. t-Statistic Based Correlation and HeterogeneityRobust Inference. Journal of Business and Economic Statistics 28:453–468.

Ibragimov, R., D. Jaffee, and J. Walden. 2009. Non-Diversification Traps in Marketsfor Catastrophic Risk. Review of Financial Studies 22:959–993.

Ibragimov, R., and J. Walden. 2011. Value at Risk and Efficiency under Dependenceand Heavy-Tailedness: Models with Common Shocks. Annals of Finance 7:285–318.

Kitamura, Y. 1997. Empirical Likelihood Methods with Weakly DependentProcesses. Annals of Statistics 25:2084–2102.

Leadbetter, M. R., G. Lindgren, and H. Rootzén. 1983. Extremes and Related Propertiesof Random Sequences and Processes. New York: Springer-Verlag.

Lighthill, M. J. 1958. Introduction to Fourier Analysis and Generalised Functions.Cambridge: Cambridge University Press.

Ling, S. 2005. Self-Weighted LAD Estimation for Infinite Variance AutoregressiveModels. Journal of the Royal Statistical Society Series B 67:381–393.

Linton, O., and Z. Xiao. 2013. Estimation of and Inference about the ExpectedShortfall for Time Series with Infinite Variance. Econometric Theory 29:771–807.

Mainik, G., and P. Embrechts. 2012. “Diversification in Heavy-Tailed Portfolios:Properties and Pitfalls.” Technical Report, RiskLab, Department of Mathemat-ics, ETH Zürich.

McLeish, D. L. 1975. A Maximal Inequality and Dependent Strong Laws. Annals ofProbability 3:829–839.

McNeil, A. J., R. Frey, and P. Embrechts. 2005. Quantitative Risk Management:Concepts, Techniques and Tools. Princeton, NJ: Princeton Univeristy Press.

Mikosch, T., and C. Starica. 2000. Limit Theory for the Sample Autocorrelations andExtremes of a GARCH(1,1) Process. Annals of Statistics 28:1427–1451.

Morana, C. 2002. IGARCH Effects: An Interpretation. Applied Economics Letters9:745–748.

Owen, A. 1991. Empirical Likelihood. New York: Chapman & Hall.Nešlehová, J., P. Embrechts, and C. Demoulin. 2006. Infinite-Mean Models and the

LDA for Operational Risk. Journal of Operational Risk 1:3–25.Peligrad, M. 1996. On the Asymptotic Normality of Sequences of Weak Dependent

Random Variables. Journal of Theoretical Probability 9:703–715.Peng, L. 2001. Estimating the Mean of a Heavy Tailed Distribution. Statistics and

Probability Letters 52:255–264.Peng, L. 2004. Empirical-Likelihood-Based Confidence Interval for the Mean with

a Heavy-Tailed Distribution. Annals of Statistics 32:1192–1214.

at University of N

orth Carolina at C



ownloaded from




Pham, T. D., and L. T. Tran. 1985. Some Mixing Properties of Time Series Models.Stochastic Processes and their Applications 19:297–303.

Phillips, P. C. B. 1995. Robust Nonstationary Regression. Econometric Theory 11:912–951.

Powers, M. R. 2010. Infinite-Mean Losses: Insurance’s “Dread Disease”. Journal ofRisk Finance 11:125–128.

Qin, J., and A. Wong. 1996. Empirical Likelihood in a Semi-Parametric Model.Scandinavian Journal of Statistics 23:209–219.

Rachev, S., and S. Mittnik. 2000. Stable Paretian Models in Finance. New York: Wiley.Resnick, S. I. 1987. Extreme Values, Regular Variation and Point Processes. New York:

Springer-Verlag.Rio, E. 1993. Covariance Inequalities for Strongly Mixing Processes. Annals of the

Institute of Henri Poincaré 29:587–597.Scaillet, O. 2004. Nonparametric Estimation and Sensitivity Analysis of Expected

Shortfall. Mathematical Finance 14:115–129.Segers, J. 2002. Abelian and Tauberian Theorems on the Bias of the Hill Estimator.

Scandinavian Journal of Statistics 29:461–483.Silverman, B. W. 1986. Density Estimation for Statistics and Data Analysis, Monographs

on Statistics and Applied Probability. London, UK: Chapman and Hall.Tasche, D. 2002. Expected Shortfall and Beyond. Journal of Banking and Finance

26:1519–1533. at University of N

orth Carolina at C



ownloaded from


Documents

Expected Shortfall Estimation and Gaussian Inference for ...jbhill.web.unc.edu/files/2018/10/JBHILL_expected_short.pdf · The α-quantile Value-at-Risk (VaR) is −qα > 0, hence