Advanced Risk Management

Risk Measurement

Jean-Michel ZAKOIAN

CREST and Univ. Lille, France

April 2012

Contents

1 Introduction
   1.1 Financial Risk
   1.2 Regulation

2 Classical time series models and financial series
   2.1 Stationary processes
   2.2 ARMA and ARIMA models
   2.3 Financial Series
   2.4 Random variance models
   2.5 GARCH(p, q) Processes

3 Reserves and risk measures
   3.1 Risk factors and loss distributions
   3.2 VAR
      3.2.1 Definition and interpretations
      3.2.2 VAR and conditional moments
      3.2.3 VAR and tails of distributions
   3.3 Aggregation of risks: diversification and contagion
      3.3.1 Iid factor model
      3.3.2 Model with autocorrelated factor
   3.4 Alternative standard risk measures
      3.4.1 Volatility and moments
      3.4.2 Expected shortfall
      3.4.3 Distortion measures
   3.5 Sensitivity with respect to the composition of portfolio
   3.6 Coherent risk measures

4 Estimation of risk
   4.1 Properties of the empirical cdf
   4.2 Empirical quantile function
      4.2.1 Calculation of the empirical quantiles
      4.2.2 Asymptotic properties
   4.3 Methods for estimating risk measures
      4.3.1 Nonparametric estimation
      4.3.2 Dynamic models of conditional moments
      4.3.3 Quantile Regression
      4.3.4 Dynamic models of VaR

A Stationarity of GARCH(p, q) Processes
   A.1 Case of the GARCH(1,1) model
   A.2 The general case

B Quantile
   B.1 Quantile function
   B.2 Aggregation of quantile functions
   B.3 Derivatives of the quantile of a linear combination of random variables
   B.4 Non absolutely continuous variables
   B.5 Central limit theorem for triangular arrays

C References

Chapter 1

Introduction

In finance, investors, banks and other market participants have their capital exposed to risk. It is therefore useful to quantify the risk of a particular position to decide whether it is acceptable or not.

In addition, new regulations are currently being set for financial institutions (Basel II) and for insurance companies (Solvency II), in response to the undervaluation of reserves in recent years.

For these reasons, various classes of risk measures have been introduced. We first define the various notions of risk encountered in finance and insurance before giving a brief history of the new regulations.

1.1 Financial Risk

The main risks encountered in finance are the following; the border between them is not always clear.

• Market risk: risk of a change in the value of a financial position due to changes in its components (stocks, bonds, exchange rates, commodity prices ...)

• Credit risk: risk of not receiving expected payments because of the default of the borrower.

• Operational risk: risk of losses from failures of internal (people or systems) or external processes. Examples: fire, fraud, legal risk.

• Liquidity risk: risk that an investment cannot be traded quickly enough to avoid a loss.

• Model risk: risk related to the use of a misspecified model (e.g. Black-Scholes when returns are not Gaussian). In a way this risk is always present, but to different degrees.

Measuring risk is a statistical question: from historical data and a probabilistic model, the problem is to evaluate the risk of a position. Risk management is the business of a bank or an insurance company.

1.2 Regulation

Regulation in recent years has been led by the Basel Committee, established in 1974 by the G-10. The committee has no supranational authority and its conclusions have no legal force. It provides recommendations that national institutions are free to follow and adapt.

The main steps in the implementation of control procedures were as follows.

• Basel I: the first guidelines, relating principally to credit risk, date from 1988. The proposed risk measure (Cooke ratio) was too crude and poorly differentiated.

• Birth of VaR: 1993, in various reports. In 1996, an amendment to Basel I advocated a standard model for market risk but allowed big banks to choose an internal model based on VaR. The delicate problem of credit risk was not resolved, and banks complained that they did not have enough incentive to diversify this risk.

• Basel II: the consultative process for a second agreement was initiated in 2001. The final provisions of Basel II were approved by the central bank governors in June 2004. 2005: beginning of a one-year transitional period. End of 2006: introduction of Basel II in different countries.

An important difference compared to Basel I is the introduction in Basel II of the concept of three pillars:

• Pillar I: a minimum capital requirement. These funds apply to market risk (already in Basel I), credit risk (substantially revised compared to Basel I) and, for the first time, to operational risk.

• Pillar II: implementation and validation of internal procedures for controlling and monitoring risk.

• Pillar III: market discipline and transparency, including issues of dissemination and exchange of information.

Two types of approaches were initially planned:

• The advanced approach, based on relatively sophisticated risk calculation methods (using conditional distributions);

• The standard approach, based on simpler methods but requiring larger reserves.

Financial institutions have the opportunity to develop their own risk management model. The regulator validates the model and imposes capital levels based on the quality of the proposed model.

Chapter 2

Classical time series models and financial series

Standard time series analysis rests on important concepts such as stationarity, autocorrelation, white noise and innovation, and on a central family of models, the ARMA (AutoRegressive Moving Average) models. We start by recalling their main properties and the way they can be used. As we shall see, these notions are insufficient for the analysis of financial time series. In this chapter, we also introduce the crucial concept of volatility.

Consider a sequence of real random variables $(X_t)_{t\in\mathbb{Z}}$, defined on the same probability space. Such a sequence is called a time series, and is an example of a discrete-time stochastic process.

2.1 Stationary processes

Stationarity plays a central part in time series analysis, because it replaces in a natural way the hypothesis of iid (independent and identically distributed) observations in standard statistics.

Two notions of stationarity are generally introduced.

Definition 2.1 (Strict stationarity) The process $(X_t)$ is said to be strictly stationary if the vectors $(X_1,\dots,X_k)'$ and $(X_{1+h},\dots,X_{k+h})'$ have the same joint distribution, for any $k\in\mathbb{N}$ and any $h\in\mathbb{Z}$.

The following notion may seem less demanding, because it only constrains the first two moments of the variables $X_t$ but, contrary to strict stationarity, it requires the existence of such moments.

Definition 2.2 (Second-order stationarity) The process $(X_t)$ is said to be second-order stationary if

(i) $EX_t^2<\infty$, $\forall t\in\mathbb{Z}$,

(ii) $EX_t=m$, $\forall t\in\mathbb{Z}$,

(iii) $\mathrm{Cov}(X_t,X_{t+h})=\gamma_X(h)$, $\forall t,h\in\mathbb{Z}$.

The function $\gamma_X(\cdot)$ (resp. $\rho_X(\cdot):=\gamma_X(\cdot)/\gamma_X(0)$) is called the autocovariance function (resp. the autocorrelation function) of $(X_t)$.

The simplest example of a second-order stationary process is white noise. This process is particularly important because it can be used to construct more complex stationary processes.

Definition 2.3 (Weak white noise) The process $(\epsilon_t)$ is called a weak white noise if, for some positive constant $\sigma^2$:

(i) $E\epsilon_t=0$, $\forall t\in\mathbb{Z}$,

(ii) $E\epsilon_t^2=\sigma^2$, $\forall t\in\mathbb{Z}$,

(iii) $\mathrm{Cov}(\epsilon_t,\epsilon_{t+h})=0$, $\forall t,h\in\mathbb{Z}$, $h\neq0$.

Remark 2.1 (Strong white noise) It should be noted that no independence assumption is made in the definition of a weak white noise. The variables at different dates are only uncorrelated, and the distinction is particularly crucial for financial time series. It is sometimes necessary to replace hypothesis (iii) by the stronger hypothesis

(iii') the variables $\epsilon_t$ and $\epsilon_{t+h}$ are independent and identically distributed.

The process $(\epsilon_t)$ is then said to be a strong white noise.

Estimating the autocovariances

Classical time series analysis is centered on the second-order structure of the processes. Gaussian stationary processes are completely characterized by their mean and autocovariance function. For non-Gaussian processes, the mean and autocovariances give a first idea of the temporal dependence structure. In practice these moments are unknown and are estimated from a realization of size $n$ of the series, denoted $X_1,\dots,X_n$. This step is preliminary to any construction of an appropriate model. To estimate $\gamma(h)$, we generally use the sample autocovariance defined, for $0\le h<n$, by

$$\hat\gamma(h)=\frac{1}{n}\sum_{j=1}^{n-h}(X_j-\bar X)(X_{j+h}-\bar X):=\hat\gamma(-h),$$

where $\bar X=\frac{1}{n}\sum_{j=1}^{n}X_j$ denotes the sample mean. We similarly define the sample autocorrelation function by $\hat\rho(h)=\hat\gamma(h)/\hat\gamma(0)$ for $|h|<n$.

The previous estimators are biased, but asymptotically unbiased. Other similar estimators of the autocovariance function with the same asymptotic properties exist (for instance, obtained by replacing $1/n$ by $1/(n-h)$). The proposed estimator can however be preferred to others because the matrix $(\hat\gamma(i-j))$ is positive semi-definite (see Brockwell and Davis, 1991, p. 221).

It is of course not recommended to use the sample autocovariances for $h$ close to $n$, because too few pairs $(X_j,X_{j+h})$ are available when $h$ is large. Box, Jenkins and Reinsel (1994, p. 32) suggest that useful estimates of the autocorrelations can only be made if, approximately, $n>50$ and $h\le n/4$.
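The estimators above can be sketched in a few lines of Python. This is only an illustration using the standard library; the function names and the toy series are assumptions made for the example, not part of the original notes.

```python
def sample_autocovariance(x, h):
    """Sample autocovariance with the 1/n normalization used in the text."""
    n = len(x)
    h = abs(h)  # the estimator is symmetric in h by definition
    if h >= n:
        raise ValueError("the lag must satisfy |h| < n")
    mean = sum(x) / n
    return sum((x[j] - mean) * (x[j + h] - mean) for j in range(n - h)) / n


def sample_autocorrelation(x, h):
    """Sample autocorrelation: lag-h autocovariance divided by the lag-0 one."""
    return sample_autocovariance(x, h) / sample_autocovariance(x, 0)


# Toy series (hypothetical data, for illustration only).
toy = [1.0, 2.0, 3.0, 4.0, 5.0]
gamma0 = sample_autocovariance(toy, 0)
rho1 = sample_autocorrelation(toy, 1)
```

With only five observations the toy series is far below the n > 50 guideline quoted above; it is used solely so that the arithmetic can be checked by hand.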

One often wishes to know (for instance to select an appropriate model) whether some or all of the sample autocovariances are significantly different from 0. To do this, it is necessary to estimate the covariance structure of those sample autocovariances. We have the following result (see for instance Brockwell and Davis, 1991, pp. 222 and 226).

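Theorem 2.1 (Bartlett's formulas for a strong linear process) Let $(X_t)$ be a linear process satisfying

$$X_t=\sum_{j=-\infty}^{\infty}\phi_j\epsilon_{t-j},\qquad\sum_{j=-\infty}^{\infty}|\phi_j|<\infty,$$

where $(\epsilon_t)$ is a sequence of iid variables such that

$$E(\epsilon_t)=0,\quad E(\epsilon_t^2)=\sigma^2,\quad E(\epsilon_t^4)=\kappa_\epsilon\sigma^4<\infty.$$

Appropriately normalized, the sample autocovariances and autocorrelations are asymptotically normal, with asymptotic variances given by the Bartlett formulas:

$$\lim_{n\to\infty}n\,\mathrm{Cov}\{\hat\gamma(h),\hat\gamma(k)\}=\sum_{i=-\infty}^{\infty}\left[\gamma(i)\gamma(i+k-h)+\gamma(i+k)\gamma(i-h)\right]+(\kappa_\epsilon-3)\gamma(h)\gamma(k)\tag{2.1}$$

and

$$\lim_{n\to\infty}n\,\mathrm{Cov}\{\hat\rho(h),\hat\rho(k)\}=\sum_{i=-\infty}^{\infty}\rho(i)\left[2\rho(h)\rho(k)\rho(i)-2\rho(h)\rho(i+k)-2\rho(k)\rho(i+h)+\rho(i+k-h)+\rho(i-k-h)\right].\tag{2.2}$$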

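Formula (2.2) still holds under the assumptions

$$E\epsilon_t^2<\infty,\qquad\sum_{j=-\infty}^{\infty}|j|\phi_j^2<\infty.$$

In particular, if $X_t=\epsilon_t$ and $E\epsilon_t^2<\infty$, we have

$$\sqrt{n}\begin{pmatrix}\hat\rho(1)\\ \vdots\\ \hat\rho(h)\end{pmatrix}\xrightarrow{\mathcal{L}}\mathcal{N}(0,I_h).$$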

The assumptions of this theorem are demanding, because they require a strong white noise $(\epsilon_t)$. For many nonlinear processes, in particular the ARCH processes studied in this book, the asymptotic covariance of the sample autocovariances can be very different from (2.1). Using the standard Bartlett formula can then lead to specification errors.
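As an illustration of the special case $X_t=\epsilon_t$, the limit $\sqrt{n}\,\hat\rho(h)\to\mathcal{N}(0,1)$ suggests approximate 95% significance bands of half-width $1.96/\sqrt{n}$. The sketch below flags the lags whose sample autocorrelation falls outside these bands. The function names, the seed and the two test series are assumptions made for the example, and the bands are only justified under the strong white noise hypothesis discussed above.

```python
import math
import random


def sample_autocorrelation(x, h):
    """Sample autocorrelation with the 1/n normalization (h >= 0)."""
    n = len(x)
    mean = sum(x) / n
    gamma0 = sum((v - mean) ** 2 for v in x) / n
    gamma_h = sum((x[j] - mean) * (x[j + h] - mean) for j in range(n - h)) / n
    return gamma_h / gamma0


def significance_band(n, z=1.96):
    """Half-width z / sqrt(n) of the band valid for a strong white noise."""
    return z / math.sqrt(n)


def significant_lags(x, max_lag, z=1.96):
    """Lags 1..max_lag whose sample autocorrelation lies outside the band."""
    band = significance_band(len(x), z)
    return [h for h in range(1, max_lag + 1)
            if abs(sample_autocorrelation(x, h)) > band]


# Simulated Gaussian iid noise (hypothetical example): few lags, if any,
# should be flagged, since each lag exceeds the band with probability ~5%.
random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in range(1000)]
flagged_noise = significant_lags(noise, 20)

# A deterministic alternating series is strongly autocorrelated at every lag.
alternating = [(-1.0) ** t for t in range(100)]
flagged_alternating = significant_lags(alternating, 3)
```

For a weak white noise that is not strong, such as the GARCH processes introduced later, these bands can be misleading, which is exactly the caveat stated after the theorem.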

2.2 ARMA and ARIMA models

In classical time series analysis, one wishes to construct a model for the underlying stochastic process. This model is then used for analyzing the causal structure of the process or for obtaining optimal predictions.

The class of ARMA models is the most widely used for the prediction of second-order stationary processes. These models can be viewed as a natural consequence of a fundamental result due to Wold (1938), which can be stated as follows: any centered, second-order stationary, and “purely non deterministic” process (footnote 1) admits an infinite moving-average representation of the form

$$X_t=\epsilon_t+\sum_{i=1}^{\infty}c_i\epsilon_{t-i},\tag{2.3}$$

where $(\epsilon_t)$ is the linear innovation process of $(X_t)$, that is

$$\epsilon_t=X_t-E(X_t\mid H_X(t-1)),\tag{2.4}$$

where $H_X(t-1)$ denotes the Hilbert space generated by the random variables $X_{t-1},X_{t-2},\dots$ (footnote 2). The sequence of coefficients $(c_i)$ is such that $\sum_i c_i^2<\infty$. Note that $(\epsilon_t)$ is a weak white noise.

Footnote 1: A stationary process $(X_t)$ is said to be purely non deterministic if and only if $\bigcap_{n=-\infty}^{\infty}H_X(n)=\{0\}$, where $H_X(n)$ denotes, in the Hilbert space of real, centered, square-integrable variables, the subspace made up of the limits of linear combinations of the variables $X_{n-i}$, $i\ge0$. Thus, for a purely non deterministic (or regular) process, the linear past, sufficiently far away in the past, is not useful for predicting future values. See Brockwell and Davis (1991), pp. 187-189, or Azencott and Dacunha-Castelle (1984), for more details.

Truncating the infinite sum in (2.3), we obtain the process

$$X_t(q)=\epsilon_t+\sum_{i=1}^{q}c_i\epsilon_{t-i},$$

called a moving average of order $q$, or MA($q$). We have

$$\|X_t(q)-X_t\|_2^2=E\epsilon_t^2\sum_{i>q}c_i^2\to0\quad\text{as }q\to\infty.$$

It follows that the set of all finite-order moving averages is dense in the set of second-order stationary and purely non deterministic processes. For reasons of parsimony, because they generally use a smaller number of parameters, ARMA models are often preferred to MA models.

Definition 2.4 (ARMA(p, q) process) A second-order stationary process $(X_t)$ is called ARMA($p,q$), where $p$ and $q$ are integers, if there exist real coefficients $a_1,\dots,a_p,b_1,\dots,b_q$ such that

$$\forall t\in\mathbb{Z},\qquad X_t+\sum_{i=1}^{p}a_iX_{t-i}=\epsilon_t+\sum_{j=1}^{q}b_j\epsilon_{t-j},\tag{2.5}$$

where $(\epsilon_t)$ is the linear innovation process of $(X_t)$.

This definition entails constraints on the zeroes of the autoregressive and moving-average polynomials, $a(z)=1+\sum_{i=1}^{p}a_iz^i$ and $b(z)=1+\sum_{i=1}^{q}b_iz^i$ (Exercise ??).

The main interest of this model, and of the representations obtained by successively inverting the polynomials $a(\cdot)$ and $b(\cdot)$, is to provide a framework for deriving the optimal linear predictions of the process, in a much simpler way than by only assuming second-order stationarity.

Many economic series display trends, making the stationarity assumption unrealistic. Such trends often vanish when the series is differenced, once or several times. Let $\Delta X_t=X_t-X_{t-1}$ denote the first-difference series, and let $\Delta^dX_t=\Delta(\Delta^{d-1}X_t)$ (with $\Delta^0X_t=X_t$) denote the differences of order $d$.

Footnote 2: In this representation, the equivalence class $E(X_t\mid H_X(t-1))$ is identified with a random variable.

Definition 2.5 (ARIMA(p, d, q) process) Let $d$ be a positive integer. The process $(X_t)$ is said to be an ARIMA($p,d,q$) if, for $k=0,\dots,d-1$, the processes $(\Delta^kX_t)$ are not second-order stationary, and $(\Delta^dX_t)$ is an ARMA($p,q$) process.

The simplest ARIMA process, and the most important one, is the ARIMA(0, 1, 0), also called random walk, satisfying

$$X_t=\epsilon_t+\epsilon_{t-1}+\cdots+\epsilon_1+X_0,\qquad t\ge1,$$

where $(\epsilon_t)$ is a weak white noise.
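The link between the random walk and differencing can be illustrated with a short sketch: cumulating a noise sequence gives the random walk, and applying the first-difference operator recovers the noise. The helper names, the Gaussian noise and the seed are assumptions made for the example.

```python
import random


def random_walk(eps, x0=0.0):
    """Return [X_0, X_1, ..., X_n] with X_t = X_{t-1} + eps_t."""
    path = [x0]
    for e in eps:
        path.append(path[-1] + e)
    return path


def difference(x, d=1):
    """Apply the first-difference operator d times."""
    for _ in range(d):
        x = [x[t] - x[t - 1] for t in range(1, len(x))]
    return x


# Gaussian iid noise is used here for convenience (an assumption);
# the definition only requires a weak white noise.
random.seed(1)
eps = [random.gauss(0.0, 1.0) for _ in range(200)]
walk = random_walk(eps)
recovered = difference(walk)  # equals eps up to floating-point rounding
```

Differencing twice, as in `difference(x, d=2)`, corresponds to the operator of order 2 used in the ARIMA(p, 2, q) case.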

For statistical convenience, ARMA (and ARIMA) models are generally used under stronger assumptions on the noise than that of weak white noise. Strong ARMA refers to the ARMA model of Definition 2.4 when $(\epsilon_t)$ is assumed to be a strong white noise. This additional assumption makes it possible to use convenient statistical tools developed in this framework, but considerably reduces the generality of the ARMA class. Indeed, assuming a strong ARMA is tantamount to assuming that (i) the optimal predictions of the process are linear ($(\epsilon_t)$ being the strong innovation of $(X_t)$) and (ii) the amplitudes of the prediction intervals depend on the horizon but not on the observations. We shall see in the next section how restrictive this assumption can be, in particular for financial time series modeling.

The orders $(p,q)$ of an ARMA process are fully characterized through its autocorrelation function (see Brockwell and Davis (1991), pp. 89-90, for a proof).

Theorem 2.2 (Characterization of an ARMA) Let $(X_t)$ denote a centered second-order stationary process. We have

$$\rho(h)+\sum_{i=1}^{p}a_i\rho(h-i)=0\quad\text{for all }|h|>q$$

if and only if $(X_t)$ is an ARMA($p,q$) process.

2.3 Financial Series

Modeling financial time series is a complex problem. This complexity is not only due to the variety of the series in use (stocks, exchange rates, interest rates, etc.), to the importance of the frequency of observation (second, minute, hour, day, etc.) or to the availability of very large data sets. It is mainly due to the existence of statistical regularities (stylized facts) which are common to a large number of financial series and are difficult to reproduce artificially using stochastic models.

Most of these stylized facts were put forward in a paper by Mandelbrot (1963). Since then, they have been documented, and completed, by many empirical studies. They can be observed more or less clearly depending on the nature of the series and its frequency. The properties that we now present mainly concern daily stock prices.

Let $p_t$ denote the price of an asset at time $t$ and let $\epsilon_t=\log(p_t/p_{t-1})$ be the log-return (also called return). The series $(\epsilon_t)$ is often close to the series of relative price variations $r_t=(p_t-p_{t-1})/p_{t-1}$, since $\epsilon_t=\log(1+r_t)$. Contrary to prices, returns or relative price variations do not depend on monetary units, which facilitates comparisons between assets. The following properties have been amply commented on in the financial literature.
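The relation $\epsilon_t=\log(1+r_t)$ and the unit-free nature of returns can be checked numerically. The sketch below uses a short, made-up price series; the function names and the prices are assumptions made for the example.

```python
import math


def log_returns(prices):
    """Log-returns eps_t = log(p_t / p_{t-1})."""
    return [math.log(prices[t] / prices[t - 1]) for t in range(1, len(prices))]


def simple_returns(prices):
    """Relative price variations r_t = (p_t - p_{t-1}) / p_{t-1}."""
    return [(prices[t] - prices[t - 1]) / prices[t - 1]
            for t in range(1, len(prices))]


# Hypothetical prices, for illustration only.
prices = [100.0, 101.0, 99.5, 100.2]
eps = log_returns(prices)
r = simple_returns(prices)

# Expressing the same prices in another monetary unit (here, multiplied by 100)
# leaves the returns unchanged up to rounding.
rescaled = log_returns([100.0 * p for p in prices])
```

For daily moves of about one percent, the two series differ only at the order of $r_t^2/2$, which is why they are often used interchangeably.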

(i) Nonstationarity of price series. Sample paths of prices are generally close to a random walk without intercept (see the CAC index series (footnote 3) displayed in Figure 2.1). On the other hand, returns have sample paths that make second-order stationarity plausible. For instance, Figures 2.2 and 2.3 show that the series $\epsilon_t=\log(p_t/p_{t-1})$, where $p_t$ denotes the price of the CAC index, consists of oscillations around zero, with very different magnitudes from date to date, but whose average modulus, computed over sufficiently long sub-periods of the same size, is almost constant. The extreme volatility of prices over the recent period, induced by the financial crisis of 2008, is worth noting.

Footnote 3: The CAC 40 index (CAC for Cotations Assistées en Continu) is a linear combination of a selection of 40 shares on the Paris Stock Exchange.

Figure 2.1: CAC 40 index on the period 01/03/1990-15/10/2008 (4702 observations). [Plot not reproduced; vertical axis: price; marked dates: 19/Aug/91, 11/Sep/01, 21/Jan/08.]

(ii) Absence of autocorrelation for the price variations. The series of price variations generally displays small autocorrelations, making it close to a white noise. This is illustrated for the CAC in Figure 2.4. The classical significance bands (for strong white noise) are used here as an approximation. Note that for intra-day series, with very small time intervals between observations (measured in minutes or seconds), significant autocorrelations can be observed due to microstructure effects.

(iii) Autocorrelations of the squared price returns. The series of squares $(\epsilon_t^2)$ or absolute values $(|\epsilon_t|)$ are generally strongly autocorrelated (see Figure 2.4). This property is not incompatible with the white noise assumption for the returns, but shows that the white noise is not strong.

(iv) Volatility clustering. Large values of $|\epsilon_t|$, or large price variations, tend to be followed by large values, and small values by small values. This property is generally visible on the sample paths (as in Figure 2.3). Turbulent sub-periods (the market is said to be more volatile) are followed by quiet periods (low-volatility periods). These sub-periods are recurrent but do not appear in a periodic way (which might contradict the stationarity assumption). In other words, volatility clustering is not incompatible with homoscedasticity (constant variance) of the marginal distribution of returns.

Figure 2.2: CAC 40 returns (02/03/1990-15/10/2008). 11/09/2001: fall of the Twin Towers, 21/01/2008: effect of the subprime crisis, 6/10/2008: effect of the financial crisis. [Plot not reproduced; vertical axis: returns; marked dates: 19/Aug/91, 11/Sep/01, 21/Jan/08.]

(v) Fat-tailed distributions. When the empirical distribution of daily returns is drawn, one can generally observe that it does not resemble a Gaussian distribution. Classical tests typically lead to rejection of the normality assumption at any reasonable level. More precisely, the densities have fat tails (decreasing more slowly than $\exp(-x^2/2)$) and are sharply peaked at zero: they are called leptokurtic. A measure of leptokurticity is the kurtosis coefficient, defined as the ratio of the sample fourth-order moment to the squared sample variance. Asymptotically equal to 3 for Gaussian iid observations, this coefficient is much greater than 3 for returns series. When the time interval over which the returns are computed increases, leptokurticity tends to vanish and the empirical distribution gets closer to a Gaussian. Figure 2.5 compares a kernel estimator of the density of the CAC returns with a Gaussian density. The peak around zero appears clearly, but the thickness of the tails is more difficult to visualize.

(vi) Leverage effects. The so-called leverage effect was noted by Black (1976), and consists of an asymmetry in the impact of past positive and negative values on the current volatility. Negative values (corresponding to price decreases) tend to increase volatility by a larger amount than positive values (price increases) of the same magnitude. Empirically, a positive correlation is often detected between $\epsilon_t^+=\max(\epsilon_t,0)$ and $|\epsilon_{t+h}|$ (a price increase should entail future volatility increases), but, as shown in Table 2.1, this correlation is generally smaller than the one observed between $-\epsilon_t^-=\max(-\epsilon_t,0)$ and $|\epsilon_{t+h}|$.

Figure 2.3: Returns of the CAC 40 (02/01/2008-15/10/2008). [Plot not reproduced; vertical axis: returns; marked dates: 21/Jan/08, 06/Oct/08.]

(vii) Seasonalities. Following a period when markets are closed (weekends, holidays), volatility tends to increase, reflecting the information accumulated during the break. However, the increase is smaller than it would be if information accumulated at a constant rate. Seasonal effects are also very present in intra-day series.

Figure 2.4: Sample autocorrelations of returns and squared returns of the CAC 40 (02/01/2008-15/10/2008). [Plots not reproduced; two panels, "Sample autocorrelations of the returns" and "Sample autocorrelations of the squared returns"; horizontal axis: lag, from 0 to 35.]

Table 2.1: Sample autocorrelations of returns $\epsilon_t$ (CAC 40 index, 02/01/2008-15/10/2008), of absolute returns $|\epsilon_t|$, sample correlations between $\epsilon_{t-h}^+$ and $|\epsilon_t|$, and between $-\epsilon_{t-h}^-$ and $|\epsilon_t|$

h                                          1       2       3       4       5       6       7
$\hat\rho_\epsilon(h)$                  -0.012  -0.014  -0.047   0.025  -0.043  -0.023  -0.014
$\hat\rho_{|\epsilon|}(h)$               0.175   0.229   0.235   0.200   0.218   0.212   0.203
$\hat\rho(\epsilon_{t-h}^+,|\epsilon_t|)$   0.038   0.059   0.051   0.055   0.059   0.109   0.061
$\hat\rho(-\epsilon_{t-h}^-,|\epsilon_t|)$  0.160   0.200   0.215   0.173   0.190   0.136   0.173

We use here the notations $\epsilon_t^+=\max(\epsilon_t,0)$ and $\epsilon_t^-=\min(\epsilon_t,0)$.
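Two of the diagnostics used in the list above, the kurtosis coefficient of (v) and the leverage correlations of (vi) reported in Table 2.1, can be sketched as follows. The function names and the toy series are assumptions made for the example; the toy values are constructed by hand and are not CAC 40 data.

```python
import math


def kurtosis(x):
    """Sample fourth central moment divided by the squared sample variance."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return m4 / m2 ** 2


def correlation(x, y):
    """Sample (Pearson) correlation between two sequences of equal length."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def leverage_correlations(eps, h):
    """Correlations of |eps_t| with eps_{t-h}^+ and with -eps_{t-h}^- (h >= 1)."""
    past = eps[:-h]
    current = [abs(e) for e in eps[h:]]
    positive_part = [max(e, 0.0) for e in past]    # eps^+ = max(eps, 0)
    negative_part = [max(-e, 0.0) for e in past]   # -eps^- = max(-eps, 0)
    return correlation(positive_part, current), correlation(negative_part, current)


# Hand-made series in which negative values are followed by large moves.
toy = [1.0, 0.1, -1.0, 2.0, 1.0, 0.1, -1.0, 2.0]
pos_corr, neg_corr = leverage_correlations(toy, 1)

# A two-point symmetric sample has kurtosis 1; mass concentrated at zero with
# a few large values pushes the coefficient above the Gaussian value of 3.
flat = kurtosis([-1.0, 1.0, -1.0, 1.0])
peaked = kurtosis([-2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0])
```

On real return series one would expect both leverage correlations to be positive, with the second larger than the first, as in the last two rows of Table 2.1.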

2.4 Random variance models

The previous properties illustrate the difficulty of financial series modeling. Classical formulations (such as ARMA models) centered on the second-order structure are inappropriate. Indeed, the second-order structure of most financial time series is close to that of a white noise.

The fact that large values of returns tend to be followed by large values (whatever the sign of the price variations) is hardly compatible with the assumption of constant conditional variance. This phenomenon is called conditional heteroscedasticity:

$$\mathrm{Var}(\epsilon_t\mid\epsilon_{t-1},\epsilon_{t-2},\dots)\not\equiv\text{const}.$$

Conditional heteroscedasticity is perfectly compatible with stationarity (in the strict or second-order sense), just as the existence of a non-constant conditional mean is compatible with stationarity. The GARCH processes studied in this book will amply illustrate this point.

The models introduced in the econometric literature to account for the very specific nature of financial series (price variations or log-returns, interest rates, etc.) are generally written in the following multiplicative form

$$\epsilon_t=\sigma_t\eta_t\tag{2.6}$$

where $(\eta_t)$ and $(\sigma_t)$ are real processes such that:

i) $\sigma_t$ is measurable with respect to a σ-field, denoted $\mathcal{F}_{t-1}$, including the past of $\epsilon_t$;

ii) $(\eta_t)$ is an iid centered process with unit variance, $\eta_t$ being independent of $\mathcal{F}_{t-1}$;

iii) $\sigma_t>0$.

The random variable $\sigma_t$ is called the volatility (footnote 4) of $\epsilon_t$. This model implies that the sign of the current price variation (that is, the sign of $\epsilon_t$) is that of $\eta_t$, and is independent of past price variations.

It can be noted that (under existence assumptions)

$$E(\epsilon_t)=E(\sigma_t)E(\eta_t)=0$$

and

$$\mathrm{Cov}(\epsilon_t,\epsilon_{t-h})=E(\eta_t)E(\sigma_t\epsilon_{t-h})=0,\qquad\forall h>0,$$

which makes $(\epsilon_t)$ a weak white noise. The series of squares, on the other hand, generally has non-zero autocovariances: $(\epsilon_t)$ is thus not a strong white noise.

Footnote 4: There is no general agreement concerning the definition of this concept in the literature. Volatility sometimes refers to a conditional standard deviation, and sometimes to a conditional variance.

Figure 2.5: Kernel estimator of the CAC 40 returns density (full line) and density of a Gaussian with mean and variance equal to the sample mean and variance of the returns (dotted line). [Plot not reproduced; horizontal axis from -10 to 10; vertical axis: density, from 0.0 to 0.3.]

The kurtosis coefficient of $\epsilon_t$, when it exists, is related to that of $\eta_t$, denoted $\kappa_\eta$, by

$$\frac{E(\epsilon_t^4)}{\{E(\epsilon_t^2)\}^2}=\kappa_\eta\left[1+\frac{\mathrm{Var}(\sigma_t^2)}{\{E(\sigma_t^2)\}^2}\right].\tag{2.7}$$

This formula shows that the leptokurticity of financial time series can be taken into account in two different ways: either by using a leptokurtic distribution for the iid sequence $(\eta_t)$, or by specifying a process $(\sigma_t^2)$ with a great variability.
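For completeness, formula (2.7) can be recovered in a few lines from the multiplicative form (2.6). The derivation below is a sketch that only uses assumptions i) and ii), namely that $\sigma_t$ is $\mathcal{F}_{t-1}$-measurable, that $\eta_t$ is independent of $\mathcal{F}_{t-1}$ with $E\eta_t^2=1$ (so that $\kappa_\eta=E\eta_t^4$), and that the moments involved exist.

```latex
\begin{align*}
E(\epsilon_t^2) &= E(\sigma_t^2)\,E(\eta_t^2) = E(\sigma_t^2), \\
E(\epsilon_t^4) &= E(\sigma_t^4)\,E(\eta_t^4) = \kappa_\eta\,E(\sigma_t^4), \\
E(\sigma_t^4)   &= \mathrm{Var}(\sigma_t^2) + \{E(\sigma_t^2)\}^2, \\
\frac{E(\epsilon_t^4)}{\{E(\epsilon_t^2)\}^2}
  &= \kappa_\eta\,\frac{\mathrm{Var}(\sigma_t^2) + \{E(\sigma_t^2)\}^2}{\{E(\sigma_t^2)\}^2}
   = \kappa_\eta\left[1 + \frac{\mathrm{Var}(\sigma_t^2)}{\{E(\sigma_t^2)\}^2}\right].
\end{align*}
```

In particular, the kurtosis of $\epsilon_t$ equals $\kappa_\eta$ only when $\sigma_t^2$ is almost surely constant, and exceeds it as soon as the volatility varies.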

Different classes of models can be distinguished depending on the specification adopted for $\sigma_t$:

- Conditionally heteroscedastic (or GARCH-type) processes, for which $\mathcal{F}_{t-1}=\sigma(\epsilon_s;s<t)$ is the σ-field generated by the past of $\epsilon_t$. Volatility is here a deterministic function of the past of $\epsilon_t$. Processes of this class differ by the choice of a specification for this function. The standard GARCH models are characterized by a volatility specified as a linear function of the past values of $\epsilon_t^2$. They will be briefly presented in the next section.

- Stochastic volatility processes (footnote 5), for which $\mathcal{F}_{t-1}$ is the σ-field generated by $v_t,v_{t-1},\dots$, where $(v_t)$ is a strong white noise independent of $(\eta_t)$. In these models, volatility is a latent process. The most popular model in this class assumes that the process $\log\sigma_t$ follows an AR(1) of the form

$$\log\sigma_t=\omega+\phi\log\sigma_{t-1}+v_t,$$

where the noises $(v_t)$ and $(\eta_t)$ are independent.

- Switching-regime models, for which $\sigma_t=\sigma(\Delta_t,\mathcal{F}_{t-1})$, where $(\Delta_t)$ is a latent (unobservable) integer-valued process, independent of $(\eta_t)$. The state of the variable $\Delta_t$ is here interpreted as a regime and, conditionally on this state, the volatility of $\epsilon_t$ has a GARCH specification. The process $(\Delta_t)$ is generally supposed to be a finite-state Markov chain. The models are thus called Markov-switching models.
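As an illustration of the second class, the AR(1) stochastic volatility model can be simulated directly from its two equations. The sketch below makes several assumptions that are not in the text: both noises are Gaussian, `tau` denotes the standard deviation of $v_t$, the recursion starts at the stationary mean of $\log\sigma_t$, and the parameter values are arbitrary.

```python
import math
import random


def simulate_sv(n, omega, phi, tau, seed=0):
    """Simulate eps_t = sigma_t * eta_t with log sigma_t = omega + phi * log sigma_{t-1} + v_t."""
    rng = random.Random(seed)
    # Start at the stationary mean of the AR(1) (assumption; requires |phi| < 1).
    log_sigma = omega / (1.0 - phi)
    eps, sigma = [], []
    for _ in range(n):
        v = rng.gauss(0.0, tau)        # volatility shock v_t
        log_sigma = omega + phi * log_sigma + v
        s = math.exp(log_sigma)        # exponentiating guarantees sigma_t > 0
        eta = rng.gauss(0.0, 1.0)      # iid centered noise with unit variance
        sigma.append(s)
        eps.append(s * eta)
    return eps, sigma


# Hypothetical parameter values, for illustration only.
eps, sigma = simulate_sv(500, omega=-0.05, phi=0.95, tau=0.2)
```

Unlike in a GARCH-type model, the simulated `sigma` cannot be reconstructed from the past of `eps` alone, which is what makes the volatility latent.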

2.5 GARCH(p, q) Processes

ARCH (autoregressive conditionally heteroskedastic) models were introduced by Engle (1982), and their extension GARCH (generalized ARCH) is due to Bollerslev (1986). In these models, the key concept is the conditional variance, that is, the variance conditional on the past. In the classical GARCH models, the conditional variance is expressed as a linear function of the squared past values of the series. This particular specification allows for a complete study of the properties of the model solutions, while keeping a high level of generality. GARCH models are indeed able to capture the stylized facts characterizing most financial series, as described in Section 2.3.

Footnote 5: Note, however, that the volatility is also a random variable in GARCH-type processes.

We start with a definition of GARCH processes based on the first two conditional moments.

Definition 2.6 (GARCH(p, q) process) A process $(\epsilon_t)$ is called a GARCH($p,q$) process if its first two conditional moments exist and satisfy

(i) $E(\epsilon_t\mid\epsilon_u,u<t)=0$, $t\in\mathbb{Z}$;

(ii) there exist constants $\omega$, $\alpha_i$, $i=1,\dots,q$, and $\beta_j$, $j=1,\dots,p$, such that

$$\sigma_t^2=\mathrm{Var}(\epsilon_t\mid\epsilon_u,u<t)=\omega+\sum_{i=1}^{q}\alpha_i\epsilon_{t-i}^2+\sum_{j=1}^{p}\beta_j\sigma_{t-j}^2,\qquad t\in\mathbb{Z}.\tag{2.8}$$

Equation (2.8) can be written more compactly as

$$\sigma_t^2=\omega+\alpha(B)\epsilon_t^2+\beta(B)\sigma_t^2,\qquad t\in\mathbb{Z},\tag{2.9}$$

where $B$ is the standard backshift operator ($B^i\epsilon_t^2=\epsilon_{t-i}^2$ and $B^i\sigma_t^2=\sigma_{t-i}^2$ for any integer $i$), and $\alpha$ and $\beta$ are polynomials of degrees $q$ and $p$:

$$\alpha(B)=\sum_{i=1}^{q}\alpha_iB^i,\qquad\beta(B)=\sum_{j=1}^{p}\beta_jB^j.$$

If $\beta(z)=0$ we have

$$\sigma_t^2=\omega+\sum_{i=1}^{q}\alpha_i\epsilon_{t-i}^2\tag{2.10}$$

and the process is called an ARCH($q$) process (footnote 6).

Footnote 6: This specification rapidly turned out to be too restrictive when applied to financial series. Indeed, a large number of past variables has to be included in the conditional variance to obtain a good fit of the model. Choosing a large value for $q$ is not satisfactory from a statistical point of view because it requires estimating a large number of coefficients.


By definition, the innovation of the process $\epsilon_t^2$ is the variable $\nu_t=\epsilon_t^2-\sigma_t^2$. Replacing in equation (2.8) the variables $\sigma_{t-j}^2$ by $\epsilon_{t-j}^2-\nu_{t-j}$, we get the representation

$$\epsilon_t^2=\omega+\sum_{i=1}^{r}(\alpha_i+\beta_i)\epsilon_{t-i}^2+\nu_t-\sum_{j=1}^{p}\beta_j\nu_{t-j},\qquad t\in\mathbb{Z},\tag{2.11}$$

where $r=\max(p,q)$, with the convention $\alpha_i=0$ (resp. $\beta_j=0$) if $i>q$ (resp. $j>p$). This equation has the linear structure of an ARMA model, allowing for simple computation of the linear predictions. Under additional assumptions (implying the second-order stationarity of $\epsilon_t^2$), we can state that if $(\epsilon_t)$ is a GARCH($p,q$), then $(\epsilon_t^2)$ is an ARMA($r,p$) process. In particular, the square of an ARCH($q$) process admits, if it is stationary, an AR($q$) representation. These ARMA representations are useful for the prediction and estimation of GARCH processes.
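The substitution leading to (2.11) can be written out explicitly. The steps below are a sketch that uses only (2.8), the definition $\nu_t=\epsilon_t^2-\sigma_t^2$, and the convention on the coefficients beyond orders $q$ and $p$; the last line specializes the result to the GARCH(1,1) case.

```latex
\begin{align*}
\epsilon_t^2 - \nu_t
  &= \omega + \sum_{i=1}^{q}\alpha_i\epsilon_{t-i}^2
     + \sum_{j=1}^{p}\beta_j\left(\epsilon_{t-j}^2 - \nu_{t-j}\right)
     && \text{(using } \sigma_s^2 = \epsilon_s^2 - \nu_s \text{ in (2.8))} \\
\epsilon_t^2
  &= \omega + \sum_{i=1}^{r}(\alpha_i + \beta_i)\epsilon_{t-i}^2
     + \nu_t - \sum_{j=1}^{p}\beta_j\nu_{t-j}
     && \text{(with } r = \max(p,q)\text{)} \\
\epsilon_t^2
  &= \omega + (\alpha_1 + \beta_1)\epsilon_{t-1}^2 + \nu_t - \beta_1\nu_{t-1}
     && \text{(GARCH(1,1) case)}
\end{align*}
```

The autoregressive part has order $r$ and the moving-average part has order $p$, which is the ARMA($r,p$) structure stated above.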

Remark 2.2 (Correlation of the squares of a GARCH) We have seen in Section 2.3 that a characteristic feature of financial series is the autocorrelation of the squared returns, while the returns themselves are not autocorrelated. The representation (2.11) shows that GARCH processes are able to capture this empirical fact. If the fourth-order moment of $(\epsilon_t)$ is finite, the sequence of the $h$-order autocorrelations of $\epsilon_t^2$ is the solution of a recursive equation which is characteristic of ARMA models. For simplicity, consider the case of a GARCH(1,1). The squared process $(\epsilon_t^2)$ is an ARMA(1,1) and thus its autocorrelation decreases to zero proportionally to $(\alpha_1+\beta_1)^h$: for $h>1$,

$$\mathrm{Corr}(\epsilon_t^2,\epsilon_{t-h}^2)=K(\alpha_1+\beta_1)^h,$$

where $K$ is a constant independent of $h$. Moreover, the $\epsilon_t$ are uncorrelated in view of (i) in Definition 2.6.

Definition 2.6 does not directly provide a solution process satisfying those conditions. The next definition is more restrictive but allows solutions to be obtained explicitly.

Let η denote a probability distribution with expectation zero and unit variance.

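Definition 2.7 (Strong GARCH(p, q) process) Let $(\eta_t)$ be an iid sequence with distribution $\eta$. The process $(\epsilon_t)$ is called a strong GARCH(p, q) (with respect to the sequence $(\eta_t)$) if

$$\begin{cases} \epsilon_t = \sigma_t \eta_t \\ \sigma_t^2 = \omega + \sum_{i=1}^{q} \alpha_i \epsilon_{t-i}^2 + \sum_{j=1}^{p} \beta_j \sigma_{t-j}^2 \end{cases} \qquad (2.12)$$

where the $\alpha_i$, $\beta_j$ and $\omega$ are nonnegative constants.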


Figure 2.6: Simulation of size 500 of the ARCH(1) process with $\omega = 1$, $\alpha = 0.95$ and $\eta_t \sim \mathcal{N}(0, 1)$.

Properties of simulated paths

Contrary to standard time series models (ARMA), the GARCH structure allows the magnitude of the noise $\epsilon_t$ to be a function of its past values. Thus, periods with a high volatility level (large values, in modulus, of the $\epsilon_{t-i}$ and hence of $\sigma_t^2$) will be followed by periods where the fluctuations have a smaller amplitude. The simulations in Figures 2.6-2.9 highlight this property, called volatility clustering: large values, in modulus, are not uniformly distributed over the whole period, but tend to cluster. We will see in the sequel that all these trajectories correspond to strictly stationary processes which, except for the ARCH(1) model of Figure 2.7, are also second-order stationary. Even if the modulus can be extremely large, these processes are not explosive, as can be seen from these figures. Higher values of $\alpha$ (theoretically $\alpha > 3.56$ for the $\mathcal{N}(0, 1)$ distribution, as will be established later on) lead to explosive paths. Figures 2.8-2.9, corresponding to GARCH(1,1) models, have been obtained with the same simulated sequence $(\eta_t)$. As we will see, permuting $\alpha$ and $\beta$ does not modify the variance of the process but has an effect on the higher-order moments. For instance, the simulated process of Figure 2.9, with $\alpha = 0.7$ and $\beta = 0.2$, does not admit a fourth-order moment, contrary to the process of Figure 2.8, which implies the presence of values with a very high modulus. The two processes also differ in terms of shock persistence: the larger $\beta$, the more slowly a shock on the volatility vanishes. Conversely, a large value of $\alpha$ implies sudden volatility variations in response to shocks on $\epsilon_t$.
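The recursion (2.12) is easy to simulate. The following is a minimal sketch for the GARCH(1,1) case with Gaussian $\eta_t$; the function name `simulate_garch11`, the burn-in length and the seed are illustrative assumptions, not part of the original notes.

```python
import random

def simulate_garch11(omega, alpha, beta, n, seed=0, burn_in=200):
    """Simulate a strong GARCH(1,1) path: eps_t = sigma_t * eta_t with
    sigma_t^2 = omega + alpha * eps_{t-1}^2 + beta * sigma_{t-1}^2."""
    rng = random.Random(seed)
    eps_prev, sigma2_prev = 0.0, omega  # arbitrary start, washed out by the burn-in
    eps, sigma2 = [], []
    for t in range(n + burn_in):
        s2 = omega + alpha * eps_prev ** 2 + beta * sigma2_prev
        e = (s2 ** 0.5) * rng.gauss(0.0, 1.0)  # eta_t ~ N(0, 1)
        if t >= burn_in:
            eps.append(e)
            sigma2.append(s2)
        eps_prev, sigma2_prev = e, s2
    return eps, sigma2

# Parameters of Figures 2.8 and 2.9, driven by the same noise (same seed)
eps_a, sig_a = simulate_garch11(1.0, 0.2, 0.7, 500, seed=42)
eps_b, sig_b = simulate_garch11(1.0, 0.7, 0.2, 500, seed=42)
print(max(abs(e) for e in eps_a), max(abs(e) for e in eps_b))
```

Using the same seed for both parameter sets mimics the construction of Figures 2.8-2.9, where only the roles of $\alpha$ and $\beta$ are exchanged.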

Conditions ensuring the stationarity of GARCH processes are presented in the Appendix.


Figure 2.7: Simulation of size 500 of the ARCH(1) process with $\omega = 1$, $\alpha = 1.1$ and $\eta_t \sim \mathcal{N}(0, 1)$.

Figure 2.8: Simulation of size 500 of the GARCH(1,1) process with $\omega = 1$, $\alpha = 0.2$, $\beta = 0.7$ and $\eta_t \sim \mathcal{N}(0, 1)$.


Figure 2.9: Simulation of size 500 of the GARCH(1,1) process with $\omega = 1$, $\alpha = 0.7$, $\beta = 0.2$ and $\eta_t \sim \mathcal{N}(0, 1)$.

Chapter 3

Reserves and risk measures

Several aspects are important in setting the reserves. One has to

• precisely define the balance sheet line of interest (including the horizon of analysis). For instance, for a portfolio of assets, should dividends be included? Should the liquidity risk be taken into account?

• determine the uncertainty on this line, from a probabilistic model. The problem rapidly gets difficult because of the large number of assets, the lack of data or complex stylized facts.

• determine, from this model, a scalar summary measuring risk and then the amount of reserves based on this summary. One should also explain how these reserves appear on the balance sheet.

Before defining the VaR, the central concept in the introduction of regulations in financial institutions, we introduce the notion of loss distribution. Throughout this chapter, the variables will be considered as continuous, to avoid difficulties in defining quantiles (extensions to arbitrary variables are presented in the appendix).

3.1 Risk factors and loss distributions

Consider a portfolio whose value at time $t$ is denoted $V_t$. At horizon $h$, the loss is denoted

$$L_{t,t+h} = -(V_{t+h} - V_t).$$


The law of $L_{t,t+h}$ is called the loss distribution (conditional or not). This distribution allows the calculation of reserves, which in general will not be sufficient to cover all risks. In general, $V_t$ is modeled as a function of $d$ observable risk factors.

For example, consider a portfolio consisting of $d$ assets. The price of asset $i$ at time $t$ is denoted $S_{i,t}$, and let $R_{i,t,t+h} = \log S_{i,t+h} - \log S_{i,t}$ denote the log-return. Letting $a_i$ denote the number of units of asset $i$ in the portfolio, we have

$$V_t = \sum_{i=1}^{d} a_i S_{i,t}$$

and therefore, assuming a fixed portfolio composition between the dates $t$ and $t+h$,

$$L_{t,t+h} = -\sum_{i=1}^{d} a_i S_{i,t}\left(e^{R_{i,t,t+h}} - 1\right).$$

The distribution of $V_{t+h}$ conditional on the past up to time $t$ is called the "Profit and Loss (P&L) distribution".

The determination of the level of reserves depends on

• the portfolio,

• the time t (information) and the horizon h,

• the level of risk considered to be admissible (parameterized by the real α ∈]0, 1[).

Denote by $R_{t,h}(\alpha)$ this level of reserves. With these reserves, which are not remunerated, the line of the balance sheet at $t+h$ becomes $V_{t+h} + R_{t,h}(\alpha)$. One can fix the smallest reserves satisfying

$$P_t[V_{t+h} + R_{t,h}(\alpha) < 0] < \alpha, \qquad (3.1)$$

that is,

$$P_t[V_{t+h} < -R_{t,h}(\alpha)] < \alpha.$$

In general, $-R_{t,h}(\alpha)$ is thus the quantile of level $\alpha$ of the conditional distribution of $V_{t+h}$, i.e. of the P&L distribution.


3.2 VAR

3.2.1 Definition and interpretations

The required capital to cover the risk, or VaR, also includes the current value of the portfolio:

$$\mathrm{VaR}_{t,h}(\alpha) = V_t + R_{t,h}(\alpha).$$

VaR is thus interpreted as the exposed capital, or "Value at Risk", in case of failure. Another interpretation is as follows. We have

$$P_t[V_{t+h} - V_t < -\mathrm{VaR}_{t,h}(\alpha)] < \alpha$$

or

$$P_t[\mathrm{VaR}_{t,h}(\alpha) < L_{t,t+h}] < \alpha, \quad \text{i.e.} \quad P_t[L_{t,t+h} \le \mathrm{VaR}_{t,h}(\alpha)] \ge 1 - \alpha. \qquad (3.2)$$

We will adopt the following definition.

Definition 3.1 Call VaR at level $\alpha$ the quantile of order $1-\alpha$ of the conditional loss distribution:

$$\mathrm{VaR}_{t,h}(\alpha) := \inf\{x \in \mathbb{R} \mid P_t[L_{t,t+h} \le x] \ge 1 - \alpha\},$$

when this quantile is positive. By convention, $\mathrm{VaR}_{t,h}(\alpha) = 0$ otherwise.

In particular VARt,h(α) increases as α decreases.
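To make Definition 3.1 concrete, here is a small sketch that applies it to the empirical cdf of a sample of losses. The function name `empirical_var` and the sample are hypothetical; replacing the conditional law by an empirical cdf is an assumption made only for this illustration.

```python
import math

def empirical_var(losses, alpha):
    """Smallest x with F_n(x) >= 1 - alpha, set to zero when negative (Definition 3.1)."""
    xs = sorted(losses)
    n = len(xs)
    # k is the smallest integer with k/n >= 1 - alpha (small guard against rounding)
    k = max(1, math.ceil(n * (1.0 - alpha) - 1e-12))
    return max(xs[k - 1], 0.0)

sample = [-2, -1, 0, 1, 3, 5, 8, 10, 12, 20]  # hypothetical losses
for level in (0.20, 0.10, 0.05):
    print(level, empirical_var(sample, level))
```

The printed values increase as the level decreases, in line with the monotonicity noted above.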

Remark 3.1 For market risk management, one typically has $h = 1$ day or 10 days. For the regulator (credit risk or operational risk), $h = 1$ year, $\alpha = 5\%$ or $3\%$. In the so-called "standard" approach, the conditional distribution is replaced by the marginal distribution.

Remark 3.2 It is more convenient to work with the VaR, instead of the level of reserves, because it is a quantile of the law of a price change (in general supposed to be stationary).

Remark 3.3 Another interpretation of the VaR, as the optimal capital for some cost function, is as follows. Suppose that one seeks the capital $z_t$, determined at time $t$, minimizing the conditional expectation of a cost

$$C_{t+h} = (1-\alpha)(L_{t,t+h} - z_t)^+ + \alpha(z_t - L_{t,t+h})^+.$$

Thus, for $\alpha$ small, a low cost is associated with an over-rated reserve, but a significant cost is associated with losses exceeding the reserve. A value of $z_t$ minimizing the conditional expectation of this cost is precisely $\mathrm{VaR}_{t,h}(\alpha)$ (see footnote 1). VaR thus achieves an optimal balance between an excessive loss and an excessive reserve.
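The optimality property of Remark 3.3 can be checked numerically. The sketch below assumes, purely for illustration, a standard Gaussian loss, for which $E(L-z)^+ = \phi(z) - z\{1-\Phi(z)\}$ and $E(z-L)^+ = \phi(z) + z\Phi(z)$; the grid search and the name `expected_cost` are not from the original notes.

```python
from statistics import NormalDist

N01 = NormalDist()

def expected_cost(z, alpha):
    """E[(1-alpha)(L-z)^+ + alpha(z-L)^+] for L ~ N(0,1), in closed form."""
    pdf, cdf = N01.pdf(z), N01.cdf(z)
    shortfall = pdf - z * (1.0 - cdf)   # E(L - z)^+
    excess = pdf + z * cdf              # E(z - L)^+
    return (1.0 - alpha) * shortfall + alpha * excess

alpha = 0.05
grid = [i / 1000.0 for i in range(-4000, 4001)]
z_star = min(grid, key=lambda z: expected_cost(z, alpha))  # grid minimizer of the cost
var_alpha = N01.inv_cdf(1.0 - alpha)                       # quantile of order 1 - alpha
print(z_star, var_alpha)
```

Up to the grid step, the minimizer coincides with the quantile of order $1-\alpha$.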

3.2.2 VAR and conditional moments

Introduce the first two moments of $L_{t,t+h}$ conditional on the information available at time $t$:

$$m_{t,t+h} = E_t(L_{t,t+h}), \qquad \sigma^2_{t,t+h} = V_t(L_{t,t+h}).$$

Suppose that

$$L_{t,t+h} = m_{t,t+h} + \sigma_{t,t+h} L^*_h \qquad (3.3)$$

where $L^*_h$ is a random variable with cdf $F_h$. We then have, from (3.2),

$$1 - \alpha = P_t[\mathrm{VaR}_{t,h}(\alpha) \ge m_{t,t+h} + \sigma_{t,t+h} L^*_h] = F_h\left(\frac{\mathrm{VaR}_{t,h}(\alpha) - m_{t,t+h}}{\sigma_{t,t+h}}\right).$$

Therefore

$$\mathrm{VaR}_{t,h}(\alpha) = m_{t,t+h} + \sigma_{t,t+h} F_h^{\leftarrow}(1-\alpha). \qquad (3.4)$$

VaR thus breaks up into an "expected loss" $m_{t,t+h}$, the conditional mean of the losses, and an "unexpected loss" $\sigma_{t,t+h} F_h^{\leftarrow}(1-\alpha)$, also called economic capital.

The apparent simplicity of formula (3.4) masks difficulties (i) in the calculation of conditional moments for a given model and (ii) in the determination of the law $F_h$, assumed to be independent of $t$, of the standardized returns at horizon $h$.

Consider a portfolio with price $p_t = a'P_t$, where $a, P_t \in \mathbb{R}^d$. Introducing the price changes $\Delta P_t = P_t - P_{t-1}$, we have

$$L_{t,t+h} = -(p_{t+h} - p_t) = -a'(P_{t+h} - P_t) = -a'\sum_{i=1}^{h} \Delta P_{t+i}.$$

1 For $X$ a random variable with continuous cdf $F$ and such that $E|X| < \infty$, let

$$f(z) = (1-\alpha)E(X - z)^+ + \alpha E(z - X)^+ = (1-\alpha)\int_z^{\infty} (x - z)\,dF(x) + \alpha\int_{-\infty}^{z} (z - x)\,dF(x)$$

denote the function to be minimized. This function is convex (hence continuous) and $\lim_{|z|\to\infty} f(z) = +\infty$. The minimum is reached at any point $z^*$ such that

$$0 = f'(z^*) = -(1-\alpha)\int_{z^*}^{\infty} dF(x) + \alpha\int_{-\infty}^{z^*} dF(x) = \alpha - 1 + F(z^*).$$

The quantile of order $1-\alpha$ of $F$ is a solution. Specifically, the set of solutions is the compact $[q_{1-\alpha}, q^{1-\alpha}]$ of the points between the lower and upper quantiles of $X$.


Consider several examples of term structures of the VaR, giving its evolution as a function of the horizon.

Example 3.1 Gaussian iid price changes. If the $\Delta P_{t+i}$ are iid with law $\mathcal{N}(m, \Sigma)$, the law of $L_{t,t+h}$ is the Gaussian $\mathcal{N}(-a'mh, a'\Sigma a\,h)$. As a result, from (3.4),

$$\mathrm{VaR}_{t,h}(\alpha) = -a'mh + \sqrt{a'\Sigma a}\,\sqrt{h}\,\Phi^{-1}(1-\alpha). \qquad (3.5)$$

In particular, if $m = 0$, then $\mathrm{VaR}_{t,h}(\alpha) = \sqrt{h}\,\mathrm{VaR}_{t,1}(\alpha)$. This rule of multiplying the VaR at horizon 1 by $\sqrt{h}$ to get the VaR at horizon $h$ is often misused when price changes are not iid, Gaussian and centered. For example, if the sequence $(a'\Delta P_t)$ is iid with a double exponential law of parameter $\lambda$, i.e. with density $f(x) = 0.5\lambda \exp\{-\lambda|x|\}$, then $\mathrm{VaR}_{t,1}(\alpha) = -\log(2\alpha)/\lambda$. One checks easily that the law of $L_{t,t+2}$ has density $g(x) = 0.25\lambda \exp\{-\lambda|x|\}(1 + \lambda|x|)$. The VaR at horizon 2 is the solution $u$ of the equation $(2 + \lambda u)\exp\{-\lambda u\} = 4\alpha$.

For instance, for $\lambda = 0.1$ we obtain $\mathrm{VaR}_{t,2}(0.01) = 51.92$ while $\sqrt{2}\,\mathrm{VaR}_{t,1}(0.01) = 55.32$. VaR is therefore overvalued by applying the wrong rule, but for other values of $\alpha$ it can be undervalued: $\mathrm{VaR}_{t,2}(0.05) = 32.72$ while $\sqrt{2}\,\mathrm{VaR}_{t,1}(0.05) = 32.56$.
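The figures of the double exponential example can be reproduced by solving $(2 + \lambda u)\exp\{-\lambda u\} = 4\alpha$ numerically. The sketch below uses a plain bisection; the function names are illustrative and the choice of bisection is an assumption, since the notes do not say how the equation was solved.

```python
import math

def var_laplace_h1(alpha, lam):
    """VaR at horizon 1 for double exponential price changes."""
    return -math.log(2.0 * alpha) / lam

def var_laplace_h2(alpha, lam, tol=1e-10):
    """Solve (2 + lam*u) * exp(-lam*u) = 4*alpha for u > 0 by bisection."""
    g = lambda u: (2.0 + lam * u) * math.exp(-lam * u) - 4.0 * alpha
    lo, hi = 0.0, 1.0
    while g(hi) > 0.0:          # g decreases on u >= 0: widen until the sign changes
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for alpha in (0.01, 0.05):
    exact = var_laplace_h2(alpha, 0.1)
    rule = math.sqrt(2.0) * var_laplace_h1(alpha, 0.1)
    print(alpha, round(exact, 2), round(rule, 2))
```

The square-root rule overstates the two-period VaR at the 1% level and understates it at the 5% level, as stated in the example.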

Example 3.2 AR(1) price changes. Suppose now that

$$\Delta P_t - m = A(\Delta P_{t-1} - m) + U_t, \qquad (U_t) \text{ iid } \sim \mathcal{N}(0, \Sigma),$$

where $A$ is a matrix whose eigenvalues are of modulus strictly less than 1. The process $(\Delta P_t)$ is thus stationary with mean $m$. We have

$$\Delta P_{t+i} - m = A^i(\Delta P_t - m) + U_{t+i} + AU_{t+i-1} + \cdots + A^{i-1}U_{t+1}.$$

Hence, by setting $A_i = (I - A^i)(I - A)^{-1}$,

$$\begin{aligned}
L_{t,t+h} &= -a'\sum_{i=1}^{h}\Big(m + A^i(\Delta P_t - m) + \sum_{j=1}^{i} A^{i-j}U_{t+j}\Big) \\
&= -a'mh - a'AA_h(\Delta P_t - m) - a'\sum_{j=1}^{h}\Big(\sum_{i=j}^{h} A^{i-j}\Big)U_{t+j} \\
&= -a'mh - a'AA_h(\Delta P_t - m) - a'\sum_{j=1}^{h} A_{h-j+1}U_{t+j}.
\end{aligned}$$

The conditional distribution of $L_{t,t+h}$ is thus the law $\mathcal{N}(a'\mu_{t,h}, a'\Sigma_h a)$ where

$$\mu_{t,h} = -mh - AA_h(\Delta P_t - m), \qquad \Sigma_h = \sum_{j=1}^{h} A_{h-j+1}\,\Sigma\,A'_{h-j+1}.$$

Therefore

$$\mathrm{VaR}_{t,h}(\alpha) = a'\mu_{t,h} + \sqrt{a'\Sigma_h a}\,\Phi^{-1}(1-\alpha).$$

In the case where $A = 0$, this formula reduces to (3.5). Apart from this case, the factor of $\Phi^{-1}(1-\alpha)$ is not proportional to $\sqrt{h}$.
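The term structure of Example 3.2 can be evaluated directly in the scalar case $d = 1$. In the sketch below, `phi` plays the role of the matrix $A$ and `sigma` is the standard deviation of $U_t$; the function name and the numerical values are assumptions chosen for illustration.

```python
from statistics import NormalDist

def var_ar1(h, alpha, m, phi, sigma, dp_t, a=1.0):
    """VaR at horizon h for scalar AR(1) price changes, with |phi| < 1."""
    q = NormalDist().inv_cdf(1.0 - alpha)
    A = lambda i: (1.0 - phi ** i) / (1.0 - phi)        # A_i = 1 + phi + ... + phi^(i-1)
    mu = -m * h - phi * A(h) * (dp_t - m)               # mu_{t,h}
    var_h = sigma ** 2 * sum(A(h - j + 1) ** 2 for j in range(1, h + 1))  # Sigma_h
    return a * mu + abs(a) * var_h ** 0.5 * q

q99 = NormalDist().inv_cdf(0.99)
print(var_ar1(4, 0.01, 0.0, 0.0, 1.0, 0.0) / q99)   # iid case: sqrt(h) = 2
print(var_ar1(4, 0.01, 0.0, 0.5, 1.0, 0.0) / q99)   # positive autocorrelation: above sqrt(h)
```

With `phi = 0` the square-root rule of (3.5) is recovered; with positive autocorrelation the scale factor grows faster than $\sqrt{h}$.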

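Example 3.3 ARCH(1) price changes. Suppose $d = 1$, $a = 1$ for simplicity and

$$\Delta P_t = \sqrt{\omega + \alpha_1 \Delta P_{t-1}^2}\;U_t, \qquad \omega > 0, \quad \alpha_1 \ge 0, \quad (U_t) \text{ iid } \sim \mathcal{N}(0, 1).$$

The conditional distribution of $L_{t,t+1}$ is thus the law $\mathcal{N}(0, \omega + \alpha_1 \Delta P_t^2)$. Then

$$\mathrm{VaR}_{t,1}(\alpha) = \sqrt{\omega + \alpha_1 \Delta P_t^2}\;\Phi^{-1}(1-\alpha).$$

The VaR calculation at a horizon larger than 1 is an issue. Indeed, the conditional distribution of $L_{t,t+h}$ is not Gaussian. For example, at horizon 2, we have

$$\Delta P_{t+2} = \sqrt{\omega + \alpha_1 \Delta P_{t+1}^2}\;U_{t+2} = \sqrt{\omega + \alpha_1(\omega + \alpha_1 \Delta P_t^2)U_{t+1}^2}\;U_{t+2}.$$

The conditional distribution of $\Delta P_{t+2}$ is not Gaussian if $\alpha_1 > 0$, because it has a kurtosis coefficient equal to

$$\frac{E_t \Delta P_{t+2}^4}{(E_t \Delta P_{t+2}^2)^2} = 3\left(1 + \frac{2\theta_t^2}{(\omega + \theta_t)^2}\right) > 3, \qquad \theta_t = \alpha_1(\omega + \alpha_1 \Delta P_t^2).$$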

3.2.3 VAR and tails of distributions

From (3.2), VaR is defined as an upper quantile of the conditional loss distribution. Figure 3.1 compares the VaR of three laws, with the same variance but with distribution tails that are more or less thick. The Student law $S$ has the thickest tail, with a density proportional to $1/x^4$ for large $x$, the Gaussian $N$ has the thinnest tail, and the double exponential law $E$ has a tail of intermediate size, proportional to $e^{-\sqrt{2}|x|}$. For a very small level $\alpha$, the VaRs will be ranked according to the thickness of the tails: VaR($N$) < VaR($E$) < VaR($S$). The right graph in Figure 3.1 shows that this is not the case for the usual levels $\alpha = 1\%$ and $\alpha = 5\%$.
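The rankings discussed above can be reproduced with a few lines of code. The sketch below normalizes the three laws to unit variance and uses the closed-form cdf of the Student distribution with 3 degrees of freedom; the helper names and the bisection routine are illustrative assumptions.

```python
import math
from statistics import NormalDist

def t3_cdf(x):
    """cdf of the Student distribution with 3 degrees of freedom (closed form)."""
    y = x / math.sqrt(3.0)
    return 0.5 + (y / (1.0 + y * y) + math.atan(y)) / math.pi

def quantile(cdf, p, lo=-1000.0, hi=1000.0, tol=1e-10):
    """Invert a continuous increasing cdf by bisection."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def var_gauss(alpha):
    return NormalDist().inv_cdf(1.0 - alpha)

def var_laplace(alpha):
    return -math.log(2.0 * alpha) / math.sqrt(2.0)   # unit variance: lambda = sqrt(2)

def var_student(alpha):
    return quantile(t3_cdf, 1.0 - alpha) / math.sqrt(3.0)   # t3 has variance 3

for alpha in (0.05, 0.01, 0.0001):
    print(alpha, var_gauss(alpha), var_laplace(alpha), var_student(alpha))
```

The tail-based ranking only appears at very small levels (here $\alpha = 0.01\%$), while at 1% and 5% the order is the one reported in the caption of Figure 3.1.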

3.3 Aggregation of risks: diversification and contagion

It is usual to have to consider several sources of risk. These risks can provide diversification but also contagion. The VaR calculation in such a context raises many problems. We consider several nested models with an unobservable factor. For simplicity, we consider in this part risk at horizon 1.


Figure 3.1: VaR is the quantile of order $1-\alpha$ of the conditional distribution of losses (left graph). The right graph represents the VaR as a function of $\alpha \in [1\%, 5\%]$ for a Gaussian loss distribution $N$ (full line), a Student distribution with 3 degrees of freedom $S$ (dotted line) and a double exponential distribution $E$ (thin dotted line). The three distributions are normalized so that the variance is 1. For $\alpha = 1\%$ we have VaR($N$) < VaR($S$) < VaR($E$); for $\alpha = 5\%$ we have VaR($S$) < VaR($E$) < VaR($N$).

3.3.1 Iid factor model

Suppose there are $N$ sources of risk. Let $L^{(i)}_{t+1}$ denote the loss generated by the $i$th source of risk between times $t$ and $t+1$. To study the problems associated with the aggregation of various risks, we consider the following model:

$$L^{(i)}_{t+1} = a_i + b_i Z_{t+1} + u^{(i)}_{t+1}, \qquad i = 1, \dots, N, \qquad (3.6)$$

where $(Z_t)$ is an iid unobservable process of law $\mathcal{N}(0, 1)$, the $(u^{(i)}_t)$ are mutually independent white noises that are independent of $(Z_t)$ and respectively distributed as $\mathcal{N}(0, \sigma_i^2)$, and $a_i$ and $b_i$ are parameters. The process $(Z_t)$ is thus interpreted as a common risk factor, while the $(u^{(i)}_t)$ are idiosyncratic risks. In this model, the losses at different dates are independent, but the losses of a given period related to different risks are correlated.

The overall loss is

$$L_{t+1} = \sum_{i=1}^{N} L^{(i)}_{t+1} = a + bZ_{t+1} + u_{t+1},$$

where $a = \sum_{i=1}^{N} a_i$, $b = \sum_{i=1}^{N} b_i$, $u_{t+1} = \sum_{i=1}^{N} u^{(i)}_{t+1}$. Because of independence, conditional and unconditional VaR coincide for this model. Using (3.4), the VaR associated with risk $i$ is

$$\mathrm{VaR}^{(i)}(\alpha) = a_i + (b_i^2 + \sigma_i^2)^{1/2}\,\Phi^{-1}(1-\alpha). \qquad (3.7)$$

The risk associated with the overall loss is

$$\mathrm{VaR}(\alpha) = a + (b^2 + \sigma^2)^{1/2}\,\Phi^{-1}(1-\alpha), \qquad (3.8)$$

where $\sigma^2 = \sum_{i=1}^{N} \sigma_i^2$. The coefficient $b$ depends on the correlations between $L^{(i)}_{t+1}$ and $Z_{t+1}$. The larger $|b|$, the larger $\mathrm{VaR}(\alpha)$. The coefficient $b^2$ can thus be interpreted as a measure of contagion. In particular, if $b = \sum_{i=1}^{N} b_i = 0$ we can say that the field of risks is immunized against the common risk factor. We have

$$\mathrm{VaR}(\alpha) - \sum_{i=1}^{N} \mathrm{VaR}^{(i)}(\alpha) = \Big\{(b^2 + \sigma^2)^{1/2} - \sum_{i=1}^{N}(b_i^2 + \sigma_i^2)^{1/2}\Big\}\,\Phi^{-1}(1-\alpha).$$

This difference has the sign of

$$\sum_{i \ne j}\Big\{b_i b_j - (b_i^2 + \sigma_i^2)^{1/2}(b_j^2 + \sigma_j^2)^{1/2}\Big\} \le 0,$$

with equality only if the noises are degenerate ($\sigma_i^2 = 0$, $\forall i$) and if all the $b_i$'s have the same sign. Except in this case, there is partial diversification (the unanticipated part of the risk is reduced).

Had we wrongly assumed that the risks are independent, we would use the erroneous VaR

$$\mathrm{VaR}_e(\alpha) = a + \Big\{\sum_{i=1}^{N}(b_i^2 + \sigma_i^2)\Big\}^{1/2}\,\Phi^{-1}(1-\alpha).$$

This formula leads to an underestimation if and only if $b^2 = (\sum_{i=1}^{N} b_i)^2 > \sum_{i=1}^{N} b_i^2$.

Dependence between risks therefore leads to partial diversification or to contagion.

Homogeneous risks

If all sources of risk are homogeneous, that is to say if $a_i = a_0$, $b_i = b_0$, $\sigma_i = \sigma_0$ for all $i$, the average VaR is

$$\overline{\mathrm{VaR}}(\alpha) = \frac{1}{N}\mathrm{VaR}(\alpha) = a_0 + (b_0^2 + N^{-1}\sigma_0^2)^{1/2}\,\Phi^{-1}(1-\alpha) < a_0 + (b_0^2 + \sigma_0^2)^{1/2}\,\Phi^{-1}(1-\alpha) = \mathrm{VaR}^{(i)}(\alpha).$$

When $N$ tends to infinity, the part of the average VaR due to the unanticipated loss tends to $|b_0|\,\Phi^{-1}(1-\alpha)$. Thus, if $b_0 \ne 0$ there is partial diversification, even asymptotically, because the part due to the common factor cannot be diversified. If $b_0 = 0$, the risks are independent and there is total diversification asymptotically.
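The aggregation formulas (3.7)-(3.8), the VaR obtained under a wrong independence assumption, and the homogeneous limit can be compared numerically. The sketch below uses hypothetical parameter values; the function names are assumptions made for illustration.

```python
from statistics import NormalDist

def var_components(a, b, sigma, alpha):
    """Individual VaRs (3.7), aggregate VaR (3.8) and VaR under assumed independence."""
    q = NormalDist().inv_cdf(1.0 - alpha)
    individual = [ai + (bi ** 2 + si ** 2) ** 0.5 * q for ai, bi, si in zip(a, b, sigma)]
    a_tot, b_tot = sum(a), sum(b)
    s2_tot = sum(si ** 2 for si in sigma)
    total = a_tot + (b_tot ** 2 + s2_tot) ** 0.5 * q
    naive = a_tot + (sum(bi ** 2 for bi in b) + s2_tot) ** 0.5 * q
    return individual, total, naive

def average_var_homogeneous(a0, b0, sigma0, n, alpha):
    """Average VaR per source of risk when all N = n sources are identical."""
    q = NormalDist().inv_cdf(1.0 - alpha)
    return a0 + (b0 ** 2 + sigma0 ** 2 / n) ** 0.5 * q

ind, tot, naive = var_components([0, 0, 0], [1, 1, 1], [1, 1, 1], 0.01)
print(sum(ind), tot, naive)          # sum of VaRs >= aggregate VaR > naive VaR here
for n in (1, 10, 1000):
    print(n, average_var_homogeneous(0.0, 1.0, 1.0, n, 0.01))
```

With three identical positive loadings, the aggregate VaR lies below the sum of the individual VaRs (partial diversification) but above the value computed under independence (contagion).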


Estimation of the common factor

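The Kalman filter can be used to estimate $Z_t$ given the losses $L^{(i)}_j$, $i = 1, \dots, N$, $j = 1, \dots, t$, assuming known parameters. Introduce the vectors $\mathbf{L}_t = (L^{(1)}_t, \dots, L^{(N)}_t)'$, $\mathbf{a} = (a_1, \dots, a_N)'$, $\mathbf{b} = (b_1, \dots, b_N)'$, $\mathbf{u}_t = (u^{(1)}_t, \dots, u^{(N)}_t)'$. Model (3.6) can be written

$$\mathbf{L}_t = \mathbf{a} + \mathbf{b}Z_t + \mathbf{u}_t, \qquad \mathbf{u}_t \sim \mathcal{N}(0, \Sigma), \quad \Sigma := \mathrm{diag}(\sigma_1^2, \dots, \sigma_N^2).$$

Using the formula of the conditional expectation for a Gaussian vector (see footnote 2), we obtain

$$Z_{t|t} = E(Z_t \mid \mathbf{L}_t) = \mathbf{b}'(\mathbf{b}\mathbf{b}' + \Sigma)^{-1}(\mathbf{L}_t - \mathbf{a}).$$

Since

$$(\mathbf{b}\mathbf{b}' + \Sigma)^{-1} = \Sigma^{-1} - \frac{\Sigma^{-1}\mathbf{b}\mathbf{b}'\Sigma^{-1}}{1 + \mathbf{b}'\Sigma^{-1}\mathbf{b}},$$

we have

$$\mathbf{b}'(\mathbf{b}\mathbf{b}' + \Sigma)^{-1} = \mathbf{b}'\Sigma^{-1} - \frac{\mathbf{b}'\Sigma^{-1}\mathbf{b}\,\mathbf{b}'\Sigma^{-1}}{1 + \mathbf{b}'\Sigma^{-1}\mathbf{b}} = \frac{\mathbf{b}'\Sigma^{-1}}{1 + \mathbf{b}'\Sigma^{-1}\mathbf{b}},$$

and finally

$$Z_{t|t} = \frac{\sum_{i=1}^{N} b_i (L^{(i)}_t - a_i)/\sigma_i^2}{1 + \sum_{i=1}^{N} b_i^2/\sigma_i^2}.$$

The variance of the error is

$$V(Z_t - Z_{t|t}) = V(Z_t) - V(Z_{t|t}) = 1 - \mathbf{b}'(\mathbf{b}\mathbf{b}' + \Sigma)^{-1}\mathbf{b} = 1 - \frac{\mathbf{b}'\Sigma^{-1}\mathbf{b}}{1 + \mathbf{b}'\Sigma^{-1}\mathbf{b}} = \frac{1}{1 + \mathbf{b}'\Sigma^{-1}\mathbf{b}} = \frac{1}{1 + \sum_{i=1}^{N} b_i^2/\sigma_i^2}.$$

It is small when the "signal to noise" ratios $b_i^2/\sigma_i^2$ are large.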
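The closed-form estimate of the factor and its error variance only involve weighted sums, as the following sketch shows. The function name `factor_estimate` and the two-risk example are hypothetical.

```python
def factor_estimate(losses, a, b, sigma2):
    """Return Z_{t|t} and the error variance from one cross-section of losses."""
    s = sum(bi * bi / s2 for bi, s2 in zip(b, sigma2))      # b' Sigma^{-1} b
    num = sum(bi * (li - ai) / s2
              for li, ai, bi, s2 in zip(losses, a, b, sigma2))
    return num / (1.0 + s), 1.0 / (1.0 + s)

# Two sources of risk with loadings 1 and 2 and noise variances 1 and 4
z_hat, err_var = factor_estimate([1.0, 2.0], [0.0, 0.0], [1.0, 2.0], [1.0, 4.0])
print(z_hat, err_var)
```

In this example both signal-to-noise ratios equal 1, so the error variance is $1/(1+2) = 1/3$, which can also be checked by inverting the $2 \times 2$ matrix $\mathbf{b}\mathbf{b}' + \Sigma$ by hand.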

3.3.2 Model with autocorrelated factor

Now suppose that the factor $Z_t$ in Model (3.6) has an AR(1) dynamic given by

$$Z_t = \rho Z_{t-1} + \sqrt{1 - \rho^2}\,\epsilon_t, \qquad |\rho| < 1, \qquad (3.9)$$

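2 If the vector $(x, y)'$ is Gaussian, with positive definite covariance matrix, and letting $\mu_x = E(x)$, $\mu_y = E(y)$, $\Sigma_{xx} = \mathrm{Var}(x)$, $\Sigma_{yy} = \mathrm{Var}(y)$, $\Sigma_{xy} = \Sigma_{yx}' = \mathrm{Cov}(x, y)$, the law of $x$ conditional on $y$ is the Gaussian

$$\mathcal{N}\big(\mu_x + \Sigma_{xy}\Sigma_{yy}^{-1}(y - \mu_y),\ \Sigma_{xx} - \Sigma_{xy}\Sigma_{yy}^{-1}\Sigma_{yx}\big).$$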


where $(\epsilon_t)$ is a sequence of iid $\mathcal{N}(0, 1)$ variables. The stationary solution $(Z_t)$ of this model is also $\mathcal{N}(0, 1)$ distributed.

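At time $t$ the variables $(\mathbf{L}_t, \mathbf{L}_{t-1}, \dots, \mathbf{L}_1)$ are observed. Conditional on this information set, the law of $\mathbf{L}_{t+1}$ is the normal

$$\mathcal{N}\big(\mathbf{a} + \mathbf{b}Z_{t+1|t},\ \Sigma + \omega^2_{t+1|t}\,\mathbf{b}\mathbf{b}'\big)$$

where $Z_{t+1|t} = E_t(Z_{t+1})$ and $\omega^2_{t+1|t} = V_t(Z_{t+1})$ (where $E_t$ and $V_t$ denote the expectation and variance conditional on $\mathbf{L}_t, \mathbf{L}_{t-1}, \dots, \mathbf{L}_1$). These quantities are obtained recursively from the equations of the Kalman filter, involving the filtering $Z_{t|t} = E_t(Z_t)$ of $Z_t$ and the variance of the filtering error, $V_t(Z_t) = \omega^2_{t|t}$:

$$Z_{t+1|t} = \rho Z_{t|t}, \qquad \omega^2_{t|t} = \omega^2_{t|t-1}\big\{1 - \mathbf{b}'(\omega^{-2}_{t|t-1}\Sigma + \mathbf{b}\mathbf{b}')^{-1}\mathbf{b}\big\},$$

$$\omega^2_{t+1|t} = \rho^2\omega^2_{t|t} + 1 - \rho^2, \qquad Z_{1|0} = 0, \qquad \omega^2_{1|0} = 1.$$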

We deduce that the conditional distribution of the total loss $L_{t+1}$ is the normal

$$\mathcal{N}\big(a + bZ_{t+1|t},\ \sigma^2 + \omega^2_{t+1|t}\,b^2\big)$$

where $a = \sum_{i=1}^{N} a_i$, $b = \sum_{i=1}^{N} b_i$ and $\sigma^2 = \sum_{i=1}^{N} \sigma_i^2$. We deduce

$$\mathrm{VaR}_{t+1}(\alpha) = a + bZ_{t+1|t} + (\omega^2_{t+1|t}\,b^2 + \sigma^2)^{1/2}\,\Phi^{-1}(1-\alpha). \qquad (3.10)$$

We see that the VaR here depends on $t$, except when the contagion coefficient $b = 0$. In this case, we retrieve the VaR calculated in the absence of autocorrelation of $Z_t$. Note that it is possible to show that the conditional variance $\omega^2_{t+1|t}$ converges to a limit which does not exceed 1, the unconditional variance of $Z_t$. Even asymptotically, formula (3.10) remains different from (3.8) due to the update of the estimate of the factor $Z_t$.
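The filtering recursion and formula (3.10) can be sketched as follows. The update of $Z_{t|t}$ is not written out in the text; the code uses the standard Kalman update, simplified with the same matrix inversion identity as in the static case, which is an assumption of this sketch. The function name and the inputs are hypothetical.

```python
from statistics import NormalDist

def kalman_var(loss_history, a, b, sigma2, rho, alpha):
    """One-step-ahead VaR (3.10) of the total loss, with Z_{t+1|t} and omega^2_{t+1|t}."""
    s = sum(bi * bi / s2 for bi, s2 in zip(b, sigma2))      # b' Sigma^{-1} b
    z_pred, w_pred = 0.0, 1.0                               # Z_{1|0}, omega^2_{1|0}
    for losses in loss_history:
        # updating step (standard Kalman update, scalar state)
        innov = sum(bi * (li - ai - bi * z_pred) / s2
                    for li, ai, bi, s2 in zip(losses, a, b, sigma2))
        z_filt = z_pred + w_pred * innov / (1.0 + w_pred * s)
        w_filt = w_pred / (1.0 + w_pred * s)
        # prediction step
        z_pred = rho * z_filt
        w_pred = rho ** 2 * w_filt + 1.0 - rho ** 2
    q = NormalDist().inv_cdf(1.0 - alpha)
    a_tot, b_tot, s2_tot = sum(a), sum(b), sum(sigma2)
    var = a_tot + b_tot * z_pred + (w_pred * b_tot ** 2 + s2_tot) ** 0.5 * q
    return var, z_pred, w_pred

history = [[1.0, 2.0], [0.5, 1.5], [-0.2, 0.3]]   # hypothetical observed loss vectors
print(kalman_var(history, [0.0, 0.0], [1.0, 2.0], [1.0, 4.0], 0.5, 0.01))
```

With an empty history, or with $\rho = 0$, the function returns the unconditional VaR (3.8), since the prediction then carries no information about the factor.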

3.4 Alternative standard risk measures

Although VaR is the most commonly used risk measure, the choice of an appropriate measure of risk is an open issue.

3.4.1 Volatility and moments

In the portfolio theory of Markowitz (1952), the variance is used as a measure of risk. In a dynamic framework, it might therefore seem natural to take the volatility as a measure of risk. This concept has proven to be misunderstood by many practitioners. In addition, volatility ignores the sign of deviations from the mean. Finally, this measure does not satisfy a number of "coherence" properties we shall see later (translation invariance, subadditivity).


3.4.2 Expected shortfall

The expected shortfall (ES), or anticipated loss, is the standard risk measurement used in insurance under the influence of Solvency 2. This risk measure, closely linked to the VaR, avoids some of its conceptual difficulties (see subadditivity below). Moreover, VaR gives no information on the potential loss incurred when it is exceeded, whereas the ES does.

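Let $L_{t,t+h}$ be such that $E L^+_{t,t+h} < \infty$. For the moment, we assume that the conditional distribution of $L_{t,t+h}$ is absolutely continuous. We define the ES at level $\alpha$, or Tail-VaR, as the conditional expectation of the loss given that it exceeds the VaR:

$$\mathrm{ES}_{t,h}(\alpha) := E_t[L_{t,t+h} \mid L_{t,t+h} > \mathrm{VaR}_{t,h}(\alpha)]. \qquad (3.11)$$

We have, temporarily omitting the indices,

$$E[L\,1_{L > \mathrm{VaR}(\alpha)}] = E[L \mid L > \mathrm{VaR}(\alpha)]\;P[L > \mathrm{VaR}(\alpha)].$$

Now $P[L > \mathrm{VaR}(\alpha)] = 1 - P[L \le \mathrm{VaR}(\alpha)] = 1 - (1 - \alpha) = \alpha$, the second equality arising from the continuity of the cdf of $L$ at $\mathrm{VaR}(\alpha)$. Hence

$$\mathrm{ES}_{t,h}(\alpha) = \frac{1}{\alpha}E_t[L_{t,t+h}\,1_{L_{t,t+h} > \mathrm{VaR}_{t,h}(\alpha)}]. \qquad (3.12)$$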

The following definition summarizes the equalities obtained for the ES and gives an extension.

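Definition 3.2 Let $L_{t,t+h}$ be such that $E L^+_{t,t+h} < \infty$. If the conditional distribution of $L_{t,t+h}$ is absolutely continuous, we define the ES at the confidence level $\alpha$ as

$$\mathrm{ES}_{t,h}(\alpha) = E_t[L_{t,t+h} \mid L_{t,t+h} > \mathrm{VaR}_{t,h}(\alpha)] = \frac{1}{\alpha}E_t[L_{t,t+h}\,1_{L_{t,t+h} > \mathrm{VaR}_{t,h}(\alpha)}].$$

In the general case we define

$$\mathrm{ES}_{t,h}(\alpha) = \frac{1}{\alpha}\Big\{E_t[L_{t,t+h}\,1_{L_{t,t+h} > \mathrm{VaR}_{t,h}(\alpha)}] + \mathrm{VaR}_{t,h}(\alpha)\big(\alpha - P_t[L_{t,t+h} > \mathrm{VaR}_{t,h}(\alpha)]\big)\Big\}.$$

The conditional expectation $E_t[L_{t,t+h} \mid L_{t,t+h} > \mathrm{VaR}_{t,h}(\alpha)]$, which coincides with the ES in the continuous case, is called tail-VaR. The following property gives a useful characterization of the ES.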

Proposition 3.1 Let $L_{t,t+h}$ be such that $E L^+_{t,t+h} < \infty$. We have

$$\mathrm{ES}_{t,h}(\alpha) = \frac{1}{\alpha}\int_0^{\alpha} \mathrm{VaR}_{t,h}(u)\,du.$$


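Proof. For simplicity, we omit the indices in this proof. Note first that if $P[L > \mathrm{VaR}(\alpha)] = 0$, then $\mathrm{ES}(\alpha) = \mathrm{VaR}(\alpha)$ and the property is verified since $\mathrm{VaR}(u) = \mathrm{VaR}(\alpha)$ for all $u \le \alpha$. Now suppose $P[L > \mathrm{VaR}(\alpha)] > 0$. Using the fact that $L$ has the same law as $F^{\leftarrow}(U)$, where $U$ denotes a variable uniformly distributed on $[0, 1]$ and $F$ denotes the cdf of $L$ (see Property B.1), we have

$$E[L\,1_{L > \mathrm{VaR}(\alpha)}] = E[F^{\leftarrow}(U)\,1_{F^{\leftarrow}(U) > F^{\leftarrow}(1-\alpha)}].$$

Note that the event $[F^{\leftarrow}(U) > F^{\leftarrow}(1-\alpha)]$ can be written as $[U > (1-\alpha)^+]$, where

$$(1-\alpha)^+ = \inf\{x \in\, ]0, 1[\ \mid F^{\leftarrow}(x) > F^{\leftarrow}(1-\alpha)\} = P[L \le \mathrm{VaR}(\alpha)]. \qquad (3.13)$$

Therefore

$$\begin{aligned}
E[L\,1_{L > \mathrm{VaR}(\alpha)}] &= \int_{(1-\alpha)^+}^{1} F^{\leftarrow}(u)\,du \\
&= \int_0^{1-(1-\alpha)^+} F^{\leftarrow}(1-u)\,du \\
&= \int_0^{\alpha} \mathrm{VaR}(u)\,du - \int_{1-(1-\alpha)^+}^{\alpha} \mathrm{VaR}(u)\,du \\
&= \int_0^{\alpha} \mathrm{VaR}(u)\,du - \{\alpha - 1 + (1-\alpha)^+\}\,\mathrm{VaR}(\alpha).
\end{aligned}$$

Using (3.13) and Definition 3.2 we obtain the desired equality. □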

This risk measure can thus be interpreted, for a confidence level $\alpha$, as the average of the VaR over all levels $u \le \alpha$. We obviously have $\mathrm{ES}_{t,h}(\alpha) \ge \mathrm{VaR}_{t,h}(\alpha)$.

Moreover, the integral characterization makes $\mathrm{ES}_{t,h}(\alpha)$ a continuous function of $\alpha$, whatever the nature of the loss variables. VaR does not always satisfy this property (for variables with zero loss, with probability 1, on certain intervals).
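For a discrete loss, the general formula of Definition 3.2 and the integral of Proposition 3.1 can be compared directly. The sketch below uses a hypothetical three-point loss distribution and a midpoint rule for the integral; all names are illustrative.

```python
def var_discrete(values, probs, alpha):
    """inf{x : P[L <= x] >= 1 - alpha} for a discrete loss distribution."""
    cum = 0.0
    for x, p in sorted(zip(values, probs)):
        cum += p
        if cum >= 1.0 - alpha - 1e-12:      # small guard against rounding
            return x
    return max(values)

def es_general(values, probs, alpha):
    """General-case ES of Definition 3.2."""
    v = var_discrete(values, probs, alpha)
    tail_mean = sum(x * p for x, p in zip(values, probs) if x > v)
    tail_prob = sum(p for x, p in zip(values, probs) if x > v)
    return (tail_mean + v * (alpha - tail_prob)) / alpha

def es_integral(values, probs, alpha, n=20000):
    """(1/alpha) times the integral of VaR(u) over (0, alpha), midpoint rule."""
    step = alpha / n
    total = sum(var_discrete(values, probs, (k + 0.5) * step) for k in range(n))
    return total * step / alpha

values, probs = [0.0, 10.0, 100.0], [0.90, 0.06, 0.04]   # hypothetical loss law
print(var_discrete(values, probs, 0.05),
      es_general(values, probs, 0.05),
      es_integral(values, probs, 0.05))
```

Here $\mathrm{VaR}(0.05) = 10$ while the ES is 82; the tail-VaR $E[L \mid L > \mathrm{VaR}]$ equals 100 and therefore differs from the ES because the law is not continuous.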

Example 3.4 The Gaussian case. If the conditional loss distribution is the $\mathcal{N}(m_{t,t+h}, \sigma^2_{t,t+h})$ then, by (3.4), $\mathrm{VaR}_{t,h}(\alpha) = m_{t,t+h} + \sigma_{t,t+h}\Phi^{-1}(1-\alpha)$, where $\Phi$ is the cdf of the $\mathcal{N}(0, 1)$. Using (3.11), and denoting by $L^*$ a variable distributed as the $\mathcal{N}(0, 1)$ and by $\phi$ its density, we have

$$\begin{aligned}
\mathrm{ES}_{t,h}(\alpha) &= m_{t,t+h} + \sigma_{t,t+h}\,E[L^* \mid L^* \ge \Phi^{-1}(1-\alpha)] \\
&= m_{t,t+h} + \sigma_{t,t+h}\,\frac{1}{\alpha}E[L^*\,1_{L^* \ge \Phi^{-1}(1-\alpha)}] \\
&= m_{t,t+h} + \sigma_{t,t+h}\,\frac{1}{\alpha}\phi\{\Phi^{-1}(1-\alpha)\}.
\end{aligned}$$

For example, if $\alpha = 0.05$, the conditional standard deviation is multiplied by 1.65 in the formula for the VaR and by 2.06 in the formula for the expected shortfall.
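The two multipliers of the Gaussian case are easily recomputed, and Proposition 3.1 can be checked at the same time. This is a minimal sketch using the standard library; the midpoint integration and the function names are illustrative choices.

```python
from statistics import NormalDist

N01 = NormalDist()

def gaussian_var_es(alpha, m=0.0, sigma=1.0):
    """VaR and ES of a N(m, sigma^2) loss at level alpha."""
    q = N01.inv_cdf(1.0 - alpha)
    return m + sigma * q, m + sigma * N01.pdf(q) / alpha

def es_by_integration(alpha, n=20000):
    """(1/alpha) * integral over (0, alpha) of Phi^{-1}(1-u) du, midpoint rule."""
    step = alpha / n
    total = sum(N01.inv_cdf(1.0 - (k + 0.5) * step) for k in range(n))
    return total * step / alpha

var05, es05 = gaussian_var_es(0.05)
print(var05, es05, es_by_integration(0.05))
```

The closed-form ES and the averaged VaR agree up to the integration error.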

More generally, we have under the assumption (3.3), by Proposition 3.1 and (3.4),

$$\mathrm{ES}_{t,h}(\alpha) = m_{t,t+h} + \sigma_{t,t+h}\,\frac{1}{\alpha}\int_0^{\alpha} F_h^{\leftarrow}(1-u)\,du. \qquad (3.14)$$

The links between VaR and expected shortfall can be studied in more detail (see Gourieroux and Liu, 2006). The difference between formulas (3.4) and (3.14) can be seen by computing the ratio of the coefficients before the conditional standard deviation:

$$L(\alpha) := \frac{1}{\alpha F_h^{\leftarrow}(1-\alpha)}\int_0^{\alpha} F_h^{\leftarrow}(1-u)\,du. \qquad (3.15)$$

Table 3.1 shows the values of VaR and ES for four distributions and the value of $L(\alpha)$. This coefficient is always greater than 1 but may be very different according to the distribution. It tends to 1 as $\alpha$ tends to 0 in the case of the uniform law and the double exponential (Laplace) law. However, it is independent of $\alpha$ for the Pareto law, and one can show that this law is the only one, for positive variables, with a constant coefficient $L(\alpha)$ (see Gourieroux and Liu, Proposition 1, 2006).

Figure 3.2 shows two examples of quantile and density functions for the Pareto distribution.

Figure 3.2: Density functions (left) and quantile functions (right) for Pareto distributions with parameters $a = 3$, $b = 1$ (in red) and $a = 4$, $b = 1$ (in black).

3.4.3 Distortion measures

In this section, we assume for simplicity that the cdf $F_h$ of the loss distribution is continuous and strictly increasing. We suppress the indices $t$ and $h$ to simplify the notations. According to Proposition 3.1, the ES can be written

$$\mathrm{ES}(\alpha) = \int_0^1 F^{-1}(1-u)\,1_{[0,\alpha]}(u)\,\frac{1}{\alpha}\,du,$$

the term $1_{[0,\alpha]}(u)/\alpha$ being interpreted as the density of the uniform distribution on $[0, \alpha]$. More generally, we call Distortion Risk Measure (DRM) the number

$$r(F; G) = \int_0^1 F^{-1}(1-u)\,dG(u),$$

where $G$ is a cdf on $[0, 1]$, called distortion function, and $F$ is the cdf of the loss distribution. The introduction of a probability distribution on levels of confidence is sometimes interpreted in terms of optimism or pessimism with respect to risk. If $G$ admits a density $g$ and if $g$ is increasing on $[0, 1]$, that is to say if $G$ is convex, the quantiles $F^{-1}(1-u)$ are assigned a weight which is all the larger as $u$ is large: large exposures are not much taken into account. Conversely, if $g$ is decreasing, these extreme risks have the highest weights.

VaR at the confidence level $\alpha$ is a distortion measure, obtained by taking for $G$ the Dirac mass at $\alpha$. As we have seen, the expected shortfall corresponds to a constant density $g$ on $[0, \alpha]$: this is an average over all confidence levels smaller than $\alpha$.

We have, by integration by parts and then a change of variable ($u \mapsto 1-u$),

$$\begin{aligned}
r(F; G) &= \int_0^{1-F(0)} F^{-1}(1-u)\,dG(u) + \int_{1-F(0)}^{1} F^{-1}(1-u)\,d[G(u) - 1] \\
&= -\int_0^{1-F(0)} G(u)\,dF^{-1}(1-u) - \int_{1-F(0)}^{1} [G(u) - 1]\,dF^{-1}(1-u) \\
&= \int_{F(0)}^{1} G(1-u)\,dF^{-1}(u) + \int_0^{F(0)} [G(1-u) - 1]\,dF^{-1}(u). \qquad (3.16)
\end{aligned}$$


As a result, a new change of variable ($u = F(x)$) leads to the formula

$$r(F; G) = \int_0^{+\infty} G\{S(x)\}\,dx - \int_{-\infty}^{0} [1 - G\{S(x)\}]\,dx \qquad (3.17)$$

where $S(x) = 1 - F(x) = P[L > x]$ (called the survival function in the literature on duration models). Introducing the random variable $L^* = L^*(F; G)$ such that $S_{L^*}(x) = G\{S(x)\}$, it follows from a characterization of the expectation (see footnote 3) that

$$r(F; G) = E(L^*). \qquad (3.18)$$

The interpretation is as follows: the initial survival function $S$ is replaced by a survival function $G(S)$ (which gives more weight to large positive values when $G$ is concave). The risk measure is then calculated as the expectation for this new law.

In particular, for a positive variable ($F(0) = 0$) we have the formulas

$$r(F; G) = \int_0^1 G(1-u)\,dF^{-1}(u) = \int_0^{+\infty} G\{S(x)\}\,dx = E(L^*),$$

where $L^*$ is positive. The first equality shows that $F^{-1}$ and $G$ play symmetric roles.

We construct families of risk measures by parameterizing the distortion measure,

$$r_p(F; G) = \int_0^1 F^{-1}(1-u)\,dG_p(u),$$

where the parameter reflects the confidence level, that is to say, more or less optimism about risk.

Example 3.5 Proportional hazard DRM. Take $G_p(u) = u^p$, where $p \in\, ]0, +\infty[$. When $p < 1$, $G_p$ is concave and extreme losses are thus overweighted. From (3.16) and (3.17),

$$\begin{aligned}
r_p(F; G) &= \int_0^1 F^{-1}(1-u)\,p\,u^{p-1}\,du \\
&= \int_{F(0)}^{1} (1-u)^p\,dF^{-1}(u) + \int_0^{F(0)} [(1-u)^p - 1]\,dF^{-1}(u) \\
&= \int_0^{+\infty} \{1 - F(x)\}^p\,dx - \int_{-\infty}^{0} [1 - \{1 - F(x)\}^p]\,dx \\
&= E(L_p)
\end{aligned}$$

3 For an integrable variable $X$ we have $E(X) = \int_0^{\infty} P(X > x)\,dx - \int_{-\infty}^{0} P[X < x]\,dx$ (see Billingsley (1995), Probability and Measure, 3rd edition, John Wiley).


where $L_p$ is a variable with survival function

$$P[L_p > x] = \{1 - F(x)\}^p = P[L > x]^p.$$

We speak of "proportional hazard" because the hazard functions of the variables $L_p$ and $L$, defined as the opposite of the derivative of the logarithm of the survival function (see footnote 4), are proportional:

$$-\frac{\partial}{\partial x}\log P[L_p > x] = p\left(-\frac{\partial}{\partial x}\log P[L > x]\right) = \frac{p\,f(x)}{P[L > x]},$$

denoting by $f$ the density of $F$.
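For a positive loss, the representation $r(F; G) = \int_0^{\infty} G\{S(x)\}\,dx$ lends itself to a direct numerical evaluation. The sketch below assumes, for illustration only, exponential losses with rate 1, for which the proportional hazard measure is $1/p$ and the ES at level $\alpha$ is $1 - \log\alpha$; the function names, the truncation point and the trapezoidal rule are assumptions.

```python
import math

def distortion_measure(survival, G, upper=60.0, n=100000):
    """Integral over [0, upper] of G(S(x)) dx by the trapezoidal rule (positive loss)."""
    h = upper / n
    total = 0.5 * (G(survival(0.0)) + G(survival(upper)))
    total += sum(G(survival(k * h)) for k in range(1, n))
    return total * h

def survival_exp(x):
    """Survival function of an exponential loss with rate 1 (illustrative choice)."""
    return math.exp(-x)

def ph_distortion(p):
    """Proportional hazard distortion G_p(u) = u^p."""
    return lambda u: u ** p

def es_distortion(alpha):
    """cdf of the uniform law on [0, alpha], which yields ES(alpha)."""
    return lambda u: min(u / alpha, 1.0)

print(distortion_measure(survival_exp, ph_distortion(1.0)))   # plain expectation
print(distortion_measure(survival_exp, ph_distortion(0.5)))   # overweights large losses
print(distortion_measure(survival_exp, es_distortion(0.05)))  # expected shortfall at 5%
```

With $p = 1$ the measure reduces to the expected loss, and a concave distortion ($p = 0.5$) doubles it in this example.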

Example 3.6 (Exponential DRM.) Take $G_p(u) = \dfrac{1 - e^{-pu}}{1 - e^{-p}}$, where $p \in\, ]0, +\infty[$. We have

$$r_p(F, G) = \int_0^1 F^{-1}(1-u)\,\frac{p\,e^{-pu}}{1 - e^{-p}}\,du.$$

The density $g$ is decreasing regardless of $p$, which corresponds again to an overweighting of extreme losses.

Remark 3.4 It may be interesting to examine the sensitivity of the distortion measure with respect to the parameter $p$. When $r_p(F, G)$ is a differentiable function of $p$, it suffices to calculate the derivative $\partial r_p(F, G)/\partial p$.

For the expected shortfall, we have by Proposition 3.1

$$\frac{\partial}{\partial\alpha}\mathrm{ES}(\alpha) = \frac{1}{\alpha}\{\mathrm{VaR}(\alpha) - \mathrm{ES}(\alpha)\}.$$

This derivative is negative, which confirms that the ES increases as $\alpha$ decreases (obvious from (3.11)). Table 3.1 gives some examples of sensitivities with respect to $p = \alpha$, for the VaR and ES, and several laws $F$, classified by increasing distribution tails. Note that when the law of loss is not continuous, the VaR is not continuous, and a fortiori not differentiable, for all values of $\alpha$. However, the ES is continuous in $\alpha$. Thus, regardless of the underlying loss distribution, this measure of risk ensures that a small change in the confidence level will not significantly change the level of reserves. Sometimes it is necessary to introduce non-continuous laws of losses for portfolios containing derivatives (mixtures of discrete and continuous laws).

4 In duration models, the hazard function at the point $x$ is interpreted as the probability of extinction at date $x$, given that one has reached that date.


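Table 3.1: VaR and ES for the uniform, double exponential, Pareto and standard Gaussian laws and relative sensitivities to $\alpha$. The ratio $L(\alpha)$ is defined in (3.15).

| | $U_{[a,b]}$ | $\mathcal{N}(0, 1)$ | Laplace($\lambda$), $\lambda > 0$ | Pareto($a, b$), $a > 1$ |
|---|---|---|---|---|
| $F(x)$ | $\frac{x-a}{b-a}\,1_{[a,b]}(x)$ | $\Phi(x)$ | $1 - 0.5e^{-\lambda x}$ (for $x > 0$) | $\{1 - (x/b)^{-a}\}\,1_{x > b}$ |
| $\mathrm{VaR}(\alpha)$ | $a + (b-a)(1-\alpha)$ | $\Phi^{-1}(1-\alpha)$ | $-\frac{1}{\lambda}\log(2\alpha)$ | $b\,\alpha^{-1/a}$ |
| $\partial\mathrm{VaR}(\alpha)/\partial\alpha$ | $a - b$ | $\frac{-1}{\phi\{\Phi^{-1}(1-\alpha)\}}$ | $-\frac{1}{\lambda\alpha}$ | $-\frac{b}{a}\,\alpha^{-(a+1)/a}$ |
| $\mathrm{ES}(\alpha)$ | $a + (b-a)(1-\frac{\alpha}{2})$ | $\frac{1}{\alpha}\phi\{\Phi^{-1}(1-\alpha)\}$ | $\frac{1}{\lambda}\{1 - \log(2\alpha)\}$ | $\frac{ba}{a-1}\,\alpha^{-1/a}$ |
| $\partial\mathrm{ES}(\alpha)/\partial\alpha$ | $\frac{a-b}{2}$ | $\frac{1}{\alpha}\Phi^{-1}(1-\alpha) - \frac{1}{\alpha^2}\phi\{\Phi^{-1}(1-\alpha)\}$ | $-\frac{1}{\lambda\alpha}$ | $-\frac{b}{a-1}\,\alpha^{-(a+1)/a}$ |
| $L(\alpha)$ | $\frac{a+(b-a)(1-\alpha/2)}{a+(b-a)(1-\alpha)}$ | $\frac{\phi\{\Phi^{-1}(1-\alpha)\}}{\alpha\,\Phi^{-1}(1-\alpha)}$ | $1 - \frac{1}{\log(2\alpha)}$ | $\frac{a}{a-1}$ |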

3.5 Sensitivity with respect to the composition of portfolio

We examine in this section how the various assets of a portfolio affect the overall risk.

Consider a portfolio with price $p_t = a'P_t$, where $a, P_t \in \mathbb{R}^d$.

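Suppose, for the moment, that the price changes are iid Gaussian, $\Delta P_t \sim$ iid $\mathcal{N}(m, \Sigma)$. Recall that $L_{t,t+h} = -a'\sum_{i=1}^{h}\Delta P_{t+i}$. According to formula (3.5), the value at risk is given by

$$\mathrm{VaR}_{t,h}(\alpha) = -a'mh + \sqrt{a'\Sigma a}\,\sqrt{h}\,\Phi^{-1}(1-\alpha) := \mathrm{VaR}_{t,h}(a, \alpha). \qquad (3.19)$$

We therefore have

$$\begin{aligned}
\frac{\partial\mathrm{VaR}_{t,h}(a, \alpha)}{\partial a} &= -mh + \frac{\Sigma a}{\sqrt{a'\Sigma a}}\,\sqrt{h}\,\Phi^{-1}(1-\alpha) \qquad (3.20) \\
&= -mh + \frac{\Sigma a}{a'\Sigma a}\,\{\mathrm{VaR}_{t,h}(a, \alpha) + a'mh\} \\
&= -E[\Delta_h P_{t+h} \mid L_{t,t+h} = \mathrm{VaR}_{t,h}(a, \alpha)]
\end{aligned}$$

letting $\Delta_h P_{t+h} = P_{t+h} - P_t$ and using footnote 2. Note that formula (3.20) is patented by Garman and called delta-VaR. It can be seen that this derivative is a linear function, independent of $\alpha$, of the value at risk. It provides the marginal contribution to risk of each asset in the portfolio.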
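The delta-VaR formula (3.20) can be checked against a finite-difference derivative of (3.19). The sketch below uses a hypothetical two-asset portfolio; the function names and the numerical values are assumptions.

```python
from statistics import NormalDist

def _quad(a, Sigma):
    """Return Sigma a and a' Sigma a."""
    Sa = [sum(Sigma[i][j] * a[j] for j in range(len(a))) for i in range(len(a))]
    return Sa, sum(ai * si for ai, si in zip(a, Sa))

def gaussian_var(a, m, Sigma, h, alpha):
    """VaR (3.19) for iid Gaussian price changes."""
    q = NormalDist().inv_cdf(1.0 - alpha)
    _, aSa = _quad(a, Sigma)
    return -sum(ai * mi for ai, mi in zip(a, m)) * h + (aSa * h) ** 0.5 * q

def delta_var(a, m, Sigma, h, alpha):
    """Gradient (3.20) of the VaR with respect to the portfolio composition."""
    q = NormalDist().inv_cdf(1.0 - alpha)
    Sa, aSa = _quad(a, Sigma)
    return [-mi * h + si / aSa ** 0.5 * h ** 0.5 * q for mi, si in zip(m, Sa)]

a, m = [1.0, 2.0], [0.1, 0.2]
Sigma = [[1.0, 0.3], [0.3, 2.0]]
grad = delta_var(a, m, Sigma, 10, 0.01)
print(grad, sum(ai * gi for ai, gi in zip(a, grad)), gaussian_var(a, m, Sigma, 10, 0.01))
```

Since the Gaussian VaR is positively homogeneous of degree one in $a$, the contributions $a_i\,\partial\mathrm{VaR}/\partial a_i$ add up to the total VaR, which is what makes the delta-VaR a risk decomposition.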


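We also have

$$\begin{aligned}
\frac{\partial^2\mathrm{VaR}_{t,h}(a, \alpha)}{\partial a\,\partial a'} &= \frac{\sqrt{h}\,\Phi^{-1}(1-\alpha)}{\sqrt{a'\Sigma a}}\left(\Sigma - \frac{\Sigma a a'\Sigma}{a'\Sigma a}\right) \\
&= \frac{\Phi^{-1}(1-\alpha)}{\sqrt{h}\,\sqrt{a'\Sigma a}}\;V[\Delta_h P_{t+h} \mid L_{t,t+h} = \mathrm{VaR}_{t,h}(a, \alpha)]. \qquad (3.21)
\end{aligned}$$

It is found that the second derivative is positive semidefinite. Thus, the function $a \mapsto \mathrm{VaR}_{t,h}(a, \alpha)$ is convex in the Gaussian case. If $a$ and $a^*$ characterize two portfolios, we have

$$\mathrm{VaR}_{t,h}\Big\{\frac{1}{2}(a + a^*), \alpha\Big\} \le \frac{1}{2}\{\mathrm{VaR}_{t,h}(a, \alpha) + \mathrm{VaR}_{t,h}(a^*, \alpha)\},$$

which means that diversification is worthwhile.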

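Does this result remain valid in the general case? We will see that the answer is negative. The formulas obtained for the derivatives can be extended to the case of non-Gaussian distributions. We have $L_{t,t+h} = -a'\Delta_h P_{t+h}$. Suppose the conditional distribution of $\Delta_h P_{t+h}$ is absolutely continuous. The VaR at level $\alpha$ of this portfolio is characterized by the relationship

$$P_t[X + a_1 Y > \mathrm{VaR}_{t,h}(a, \alpha)] = \alpha, \qquad X = -\sum_{i=2}^{d} a_i\,\Delta_h P_{i,t+h}, \qquad Y = -\Delta_h P_{1,t+h}.$$

Lemma B.1 shows that

$$\frac{\partial\mathrm{VaR}_{t,h}(a, \alpha)}{\partial a_1} = -E_t[\Delta_h P_{1,t+h} \mid L_{t,t+h} = \mathrm{VaR}_{t,h}(a, \alpha)].$$

As a result, we obtain the same formula as in the Gaussian case:

$$\frac{\partial\mathrm{VaR}_{t,h}(a, \alpha)}{\partial a} = -E_t[\Delta_h P_{t+h} \mid L_{t,t+h} = \mathrm{VaR}_{t,h}(a, \alpha)].$$

Let $g_a$ be the conditional density of $L_{t,t+h} = -a'\Delta_h P_{t+h}$. It can be shown that the second derivatives are given by

$$\begin{aligned}
\frac{\partial^2\mathrm{VaR}_{t,h}(a, \alpha)}{\partial a\,\partial a'} = &-\left(\frac{\partial}{\partial z}\log g_a(z)\right)_{z = \mathrm{VaR}_{t,h}(a, \alpha)} V_t[\Delta_h P_{t+h} \mid L_{t,t+h} = \mathrm{VaR}_{t,h}(a, \alpha)] \\
&-\left(\frac{\partial}{\partial z}V_t[\Delta_h P_{t+h} \mid L_{t,t+h} = z]\right)_{z = \mathrm{VaR}_{t,h}(a, \alpha)}
\end{aligned}$$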

(see Gouriéroux, Laurent, Scaillet (2000)). The second derivative shows two effects. The first term is the effect of volatility, whose impact depends on the tail of the loss distribution. The second term is the effect of heteroscedasticity, which disappears in the Gaussian case (see footnote 5).

The form of the second derivative helps in discussing the convexity of VaR as a function of the portfolio composition. Convexity is a desirable property of a risk measure, because it favors diversification. In the formula of the second derivative, the first term is a positive semidefinite matrix, provided that the density $g_a$ is decreasing for large positive values. The second term can have a positive or negative sign. In the Gaussian case, it disappears and convexity holds as expected.

3.6 Coherent risk measures

VaR is often criticized because the convexity property with respect to the portfolio composition does not hold for every distribution of price changes. This means that the risk of a portfolio, as measured by VaR, may be larger than the sum of the risks of its components (even when these components are independent, except in the Gaussian case). Risk management with VaR does not necessarily encourage diversification. Moreover, as we have seen, the VaR does not measure the severity of the losses.

In response to these criticisms, several authors attempted to define concepts of coherent risk measures. Artzner, Delbaen, Eber and Heath (1999) propose the following definition.

Definition 3.3 Let $\mathcal{L}$ be a set of random loss variables defined on a measurable space $(\Omega, \mathcal{A})$. Assume that $\mathcal{L}$ contains the constants and is closed under addition and multiplication by a scalar. A map $\rho : \mathcal{L} \to \mathbb{R}$ is called a coherent risk measure if it is:

1. monotone: $\forall L_1, L_2 \in \mathcal{L}$, $L_1 \le L_2 \Rightarrow \rho(L_1) \le \rho(L_2)$.

2. subadditive: $\forall L_1, L_2 \in \mathcal{L}$, $\rho(L_1 + L_2) \le \rho(L_1) + \rho(L_2)$.

3. positively homogeneous: $\forall L \in \mathcal{L}$, $\forall \lambda \ge 0$, $\rho(\lambda L) = \lambda\rho(L)$.

4. translation invariant: $\forall L \in \mathcal{L}$, $\forall c \in \mathbb{R}$, $\rho(L + c) = \rho(L) + c$.

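5 We recover (3.21) in this case because

$$
\left(-\frac{\partial}{\partial z}\log g_a(z)\right)_{z=\mathrm{VaR}_{t,h}(a,\alpha)}
= \left(\frac{z + a'mh}{a'\Sigma a\,h}\right)_{z=\mathrm{VaR}_{t,h}(a,\alpha)}
= \frac{\mathrm{VaR}_{t,h}(a,\alpha) + a'mh}{a'\Sigma a\,h}
= \frac{\Phi^{-1}(1-\alpha)}{(a'\Sigma a\,h)^{1/2}}.
$$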


Remark 3.5 This axiomatic characterization was initially introduced on a finite probability space by Artzner, Delbaen, Eber and Heath (1999), and then extended by Delbaen (2002). In the latter paper it is shown that, for coherent risk measures to exist, the set $\mathcal{L}$ cannot be taken too large (for example, it cannot be the set of all absolutely continuous random variables).

Remark 3.6 This definition is sometimes stated for profit variables rather than losses, that is to say, for −L instead of L. With this convention one has to reverse the first inequality in the monotonicity property and to change c into −c in the right-hand side of the last equality.

Immediate consequences of the definition are:

1. ρ(0) = 0, by writing the homogeneity property with L = 0. More generally, ρ(c) = c for all constants c (if the loss is c for sure, c has to be provisioned).

2. If L ≥ 0, then ρ(L) ≥ 0. If the loss is certain, it must be provisioned.

3. ρ(L − ρ(L)) = 0, i.e. the deterministic amount ρ(L) cancels the risk of L.

These requirements fail for most risk measures used in finance. Thus the variance or, more generally, any measure based on the central moments of the loss distribution does not satisfy, for example, the monotonicity property. The expectation defines a coherent (but uninteresting) risk measure. VaR satisfies all the properties except subadditivity. For Gaussian variables (dependent or independent) the property holds, but the following example shows that subadditivity may fail for continuous and independent variables.

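Example 3.7 (Non-subadditivity of VaR) Let $L_1$ and $L_2$ be two independent variables following a Pareto distribution with density $f(x) = (2+x)^{-2}\mathbf{1}_{x>-1}$. The distribution function of this law is $F(x) = \{1-(2+x)^{-1}\}\mathbf{1}_{x>-1}$, and thus the VaR at level α is $\mathrm{VaR}(\alpha) = \alpha^{-1} - 2$. We check, for example with Mathematica, that

$$
P[L_1 + L_2 \le x] = 1 - \frac{2}{4+x} - \frac{2\log(3+x)}{(4+x)^2}, \qquad x > -2.
$$

Therefore

$$
P[L_1 + L_2 \le 2\mathrm{VaR}(\alpha)] = 1 - \alpha - \frac{\alpha^2}{2}\log\left(\frac{2-\alpha}{\alpha}\right) < 1 - \alpha.
$$

So

$$
\mathrm{VaR}_{L_1+L_2}(\alpha) > \mathrm{VaR}_{L_1}(\alpha) + \mathrm{VaR}_{L_2}(\alpha) = 2\mathrm{VaR}_{L_1}(\alpha), \qquad \forall \alpha \in ]0,1[.
$$

For example, if α = 0.01 one finds $\mathrm{VaR}_{L_1}(0.01) = 98$ and, solving the equation $P[L_1 + L_2 \le x] = 0.99$ numerically, $\mathrm{VaR}_{L_1+L_2}(0.01) \approx 201.2$.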
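The numerical step of Example 3.7 can be reproduced with a few lines of Python. This is only a sketch: it inverts the closed-form cdf of L1 + L2 given above by bisection (the function names and the bracketing interval are choices made for the illustration).

```python
import math

def cdf_sum(x):
    """P[L1 + L2 <= x] for x > -2, closed form of Example 3.7."""
    return 1.0 - 2.0 / (4.0 + x) - 2.0 * math.log(3.0 + x) / (4.0 + x) ** 2

def var_single(alpha):
    """VaR at level alpha of one Pareto loss: 1/alpha - 2."""
    return 1.0 / alpha - 2.0

def var_sum(alpha, lo=-2.0 + 1e-9, hi=1e9, iters=200):
    """VaR at level alpha of L1 + L2: solve cdf_sum(x) = 1 - alpha by bisection."""
    target = 1.0 - alpha
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if cdf_sum(mid) < target:
            lo = mid
        else:
            hi = mid
    return hi

alpha = 0.01
stand_alone = 2.0 * var_single(alpha)   # sum of the two stand-alone VaRs
aggregated = var_sum(alpha)             # VaR of the aggregated position
```

Comparing `aggregated` with `stand_alone` for several values of α illustrates that the inequality holds over the whole interval ]0, 1[, not only at α = 0.01.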

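The following property shows that the ES satisfies the subadditivity property, which explains its success for the measurement of risk.

Proposition 3.2 The ES is a coherent risk measure in the sense of Definition 3.3.

Proof. The properties of monotonicity, homogeneity and invariance result from (3.11) and from the same properties for the VaR. For subadditivity, we will only give the proof for absolutely continuous variables.

For $E L_i^+ < \infty$, $i = 1, 2, 3$, denote by $\mathrm{VaR}_i(\alpha)$ the value at risk at level α and let $\mathrm{ES}_i(\alpha) = \alpha^{-1}E[L_i \mathbf{1}_{L_i \ge \mathrm{VaR}_i(\alpha)}]$ be the ES. We have, for $L_3 = L_1 + L_2$,

$$
\alpha\{\mathrm{ES}_1(\alpha) + \mathrm{ES}_2(\alpha) - \mathrm{ES}_3(\alpha)\}
= E[L_1(\mathbf{1}_{L_1 \ge \mathrm{VaR}_1(\alpha)} - \mathbf{1}_{L_3 \ge \mathrm{VaR}_3(\alpha)})] + E[L_2(\mathbf{1}_{L_2 \ge \mathrm{VaR}_2(\alpha)} - \mathbf{1}_{L_3 \ge \mathrm{VaR}_3(\alpha)})].
$$

Note that, for $i = 1, 2$,

$$
(L_i - \mathrm{VaR}_i(\alpha))(\mathbf{1}_{L_i \ge \mathrm{VaR}_i(\alpha)} - \mathbf{1}_{L_3 \ge \mathrm{VaR}_3(\alpha)}) \ge 0
$$

because both factors have the same sign. Therefore, since $E[\mathbf{1}_{L_i \ge \mathrm{VaR}_i(\alpha)}] = E[\mathbf{1}_{L_3 \ge \mathrm{VaR}_3(\alpha)}] = \alpha$ for continuous variables,

$$
\begin{aligned}
\alpha\{\mathrm{ES}_1(\alpha) + \mathrm{ES}_2(\alpha) - \mathrm{ES}_3(\alpha)\}
\ge{}& \mathrm{VaR}_1(\alpha)E[\mathbf{1}_{L_1 \ge \mathrm{VaR}_1(\alpha)} - \mathbf{1}_{L_3 \ge \mathrm{VaR}_3(\alpha)}]\\
&+ \mathrm{VaR}_2(\alpha)E[\mathbf{1}_{L_2 \ge \mathrm{VaR}_2(\alpha)} - \mathbf{1}_{L_3 \ge \mathrm{VaR}_3(\alpha)}]\\
={}& 0.
\end{aligned}
$$

The property is proved. □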


Remark 3.7 One can show (see Kusuoka (2001), Acerbi and Tasche (2002)) that the ES is the smallest risk measure that is i) an upper bound of the VaR, ii) coherent and iii) a function of the loss distribution only.

Remark 3.8 One can show (see Wang and Dhaene (1998)) that distortion risk measures with G concave satisfy the subadditivity property.

The following axioms were initially introduced for the analysis of insurance risk (see Wang, Young and Panjer, 1997).

1. Law-dependency: ρ(L) only depends on the law of L.

2. Additivity for comonotone risks: $\forall L_1, L_2 \in \mathcal{L}$, $\rho(L_1 + L_2) = \rho(L_1) + \rho(L_2)$ whenever $L_1$ and $L_2$ are increasing functions of a variable Z.

Comonotonicity can be interpreted as follows: if $L_1$ and $L_2$ are comonotone losses, they move together. Specifically, one can show that they are comonotone if and only if

$$
[L_1(\omega) - L_1(\omega^*)][L_2(\omega) - L_2(\omega^*)] \ge 0 \quad \text{for almost every } (\omega, \omega^*) \in \Omega^2.
$$

There is therefore no possibility of diversification when aggregating the portfolios corresponding to such losses.

Proposition 3.3 The VaR and ES are (i) monotone, (ii) positively homogeneous, (iii) translation invariant, (iv) law-dependent and (v) additive for comonotone risks.

Proof. Properties (i)-(iv) are evident. We will simply verify (v) in the case where $L_i = f_i(Z)$ with $f_i$ strictly increasing, for $i = 1, 2$, and assuming that the cdf of Z is strictly increasing. The cdf of $L_i$ is then $F_{L_i} = F_Z \circ f_i^{-1}$. So the VaR of $L_i$ at level α is $\mathrm{VaR}_i(\alpha) = F_{L_i}^{-1}(1-\alpha) = f_i \circ F_Z^{-1}(1-\alpha)$. Therefore

$$
\mathrm{VaR}_1(\alpha) + \mathrm{VaR}_2(\alpha) = (f_1 + f_2) \circ F_Z^{-1}(1-\alpha),
$$

which is none other than the quantile of $L_1 + L_2 = (f_1 + f_2)(Z)$. Property (v) is shown, in this case, for the VaR, and it follows for the ES.

Note that the VaR depends on only a portion of the loss distribution: two very different laws may lead to the same VaR for a given level α (see Figure 3.1). For this reason, along with the absence of subadditivity, the VaR is a widely criticized risk measure (see e.g. Tasche, 2002).

Chapter 4

Estimation of risk

The statistical literature on risk measures (VaR, expected shortfall, ...) has long been confined to the estimation of unconditional measures. Many approaches exist, depending heavily on the assumptions and the models (dependence or independence of observations, parametric vs. nonparametric methods).

We begin by considering the estimation of the cdf in the iid case.

4.1 Properties of the empirical cdf

Let X1, . . . , Xn denote iid variables with cdf F .

To estimate $F(x)$ we can use the empirical cdf $F_n(x)$ defined by

$$
F_n(x) = \frac{1}{n}\sum_{i=1}^{n} \mathbf{1}_{X_i \le x}.
$$

It has the following properties.

1. For x fixed, $nF_n(x)$ follows the binomial distribution with expectation $nF(x)$ and variance $nF(x)(1-F(x))$.

2. By the strong law of large numbers,

$$
F_n(x) \to F(x) \quad \text{a.s. when } n \to \infty.
$$

The Glivenko-Cantelli theorem shows that the convergence is uniform:

$$
\sup_{x\in\mathbb{R}} |F_n(x) - F(x)| \to 0 \quad \text{a.s. when } n \to \infty.
$$


3. For x fixed, the Central Limit Theorem (CLT) provides

$$
\sqrt{n}\{F_n(x) - F(x)\} \xrightarrow{d} N(0, F(x)(1-F(x))). \qquad (4.1)
$$

4. For $x_1 < x_2$ fixed, we have $\mathrm{Cov}(\mathbf{1}_{X_i \le x_1}, \mathbf{1}_{X_i \le x_2}) = F(x_1)(1 - F(x_2))$. So, by the CLT,

$$
\sqrt{n}\begin{pmatrix} F_n(x_1) - F(x_1)\\ F_n(x_2) - F(x_2)\end{pmatrix}
\Rightarrow N\left(\begin{pmatrix}0\\0\end{pmatrix},
\begin{pmatrix} F(x_1)(1-F(x_1)) & F(x_1)(1-F(x_2))\\ F(x_1)(1-F(x_2)) & F(x_2)(1-F(x_2))\end{pmatrix}\right).
$$

Let $(G(x))_{x\in\mathbb{R}}$ be a centered Gaussian process with covariance function

$$
\mathrm{Cov}(G(x_1), G(x_2)) = F(x_1)(1 - F(x_2)), \qquad x_1 < x_2.
$$

An extension of the previous result shows that, in the sense of finite-dimensional distributions,

$$
\sqrt{n}(F_n(x) - F(x)) \Rightarrow G(x). \qquad (4.2)
$$

Let $W(t)$, $t \ge 0$, be the standard Brownian motion and let the process $B(t)$, $t \in [0,1]$, called Brownian bridge, be defined by

$$
B(t) = W(t) - tW(1), \qquad 0 \le t \le 1.
$$

Then B is a Gaussian process such that $B(0) = B(1) = 0$ and $E(B(t)) = 0$. In addition, since $\mathrm{Cov}(W(t_1), W(t_2)) = t_1 \wedge t_2$,

$$
\mathrm{Cov}(B(t_1), B(t_2)) = t_1(1 - t_2), \qquad 0 \le t_1 < t_2 \le 1.
$$

Consequently, if F is the cdf of the uniform law on [0, 1], the process G defined in (4.2) is the Brownian bridge. More generally, G can be expressed in terms of the Brownian bridge. Indeed, the processes $G(x)$ and $B(F(x))$ have the same covariance function. As a result, in the sense of the convergence of finite-dimensional distributions,

$$
\sqrt{n}(F_n(x) - F(x)) \Rightarrow B(F(x)). \qquad (4.3)
$$


One can show that this convergence is actually a weak convergence (Billingsley, 1968).

This result can be used to test that the law of the observations is F. We introduce the Kolmogorov-Smirnov statistic

$$
D_n = \sup_{x\in\mathbb{R}} |F_n(x) - F(x)|
$$

and, under the null hypothesis (law F) and if F is continuous, one can show that

$$
\sqrt{n}D_n \Rightarrow \sup_{x\in[0,1]} |B(x)|
$$

(see for instance Resnick, 1994; footnote 1). This result is very useful because the asymptotic distribution does not depend on the law of the observations.
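The empirical cdf and the Kolmogorov-Smirnov statistic can be computed as in the following Python sketch. The uniform sample, the seed and the function names are assumptions made for the illustration; the statistic is evaluated at the order statistics, where the supremum is attained when F is continuous.

```python
import bisect
import math
import random

def ecdf(sample):
    """Return the empirical cdf F_n as a function x -> (1/n) #{i : X_i <= x}."""
    xs = sorted(sample)
    n = len(xs)
    return lambda x: bisect.bisect_right(xs, x) / n

def ks_statistic(sample, F):
    """D_n = sup_x |F_n(x) - F(x)| for a continuous cdf F.

    F_n jumps from (i-1)/n to i/n at the i-th order statistic, so it is
    enough to compare F with both values at each order statistic.
    """
    xs = sorted(sample)
    n = len(xs)
    return max(max(i / n - F(x), F(x) - (i - 1) / n)
               for i, x in enumerate(xs, start=1))

rng = random.Random(0)                      # fixed seed, for reproducibility only
n = 2000
sample = [rng.random() for _ in range(n)]   # iid U[0, 1], so F(x) = x
D_n = ks_statistic(sample, lambda x: x)
scaled = math.sqrt(n) * D_n                 # to be compared with quantiles of sup |B|
```

Under the null hypothesis, `scaled` behaves for large n like a draw from the law of sup |B(x)|, whatever the continuous cdf F being tested.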

4.2 Empirical quantile function

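Having defined the empirical cdf $F_n(x)$, we call empirical quantile function the function $F_n^{\leftarrow}$ defined by

$$
F_n^{\leftarrow}(\alpha) = \inf\{x \in \mathbb{R} \mid F_n(x) \ge \alpha\}, \qquad 0 < \alpha < 1.
$$

To simplify notations we set, when there is no doubt about the law in question,

$$
\xi_\alpha = F^{\leftarrow}(\alpha) \quad \text{and} \quad \xi_{n,\alpha} = F_n^{\leftarrow}(\alpha).
$$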

4.2.1 Calculation of the empirical quantiles

There are several ways to obtain the empirical quantile function.

Calculation by sorting

We define the ordered sample

X1,n = min(X1, . . . , Xn) ≤ X2,n ≤ . . . ≤ Xn,n = max(X1, . . . , Xn).

1 Adventures in Stochastic Processes, Birkhäuser, Berlin.


When F is continuous, ties occur with zero probability and can be neglected. We can therefore assume in this case $X_{1,n} < \ldots < X_{n,n}$, and then

$$
\xi_{n,\alpha} = X_{k,n} \quad \text{for} \quad \frac{k-1}{n} < \alpha \le \frac{k}{n}. \qquad (4.4)
$$

Calculation by minimization

The following method seems less direct but provides an interesting generalization, quantile regression, which we will see later. It is known that the sample mean is the OLS estimator in the regression of $X_i$ on the constant 1. We thus have

$$
\bar{X} = \arg\min_{z\in\mathbb{R}} \frac{1}{n}\sum_{i=1}^{n}(X_i - z)^2.
$$

We similarly obtain the empirical quantile of level $\alpha \in ]0,1[$ as a solution of

$$
\xi_{n,\alpha} = \arg\min_{z\in\mathbb{R}} \frac{1}{n}\sum_{i=1}^{n}\rho_\alpha(X_i - z) := \arg\min_{z\in\mathbb{R}} f_{n,\alpha}(z) \qquad (4.5)
$$

where

$$
\rho_\alpha(u) = \alpha u^+ + (1-\alpha)(-u)^+ = u(\alpha - \mathbf{1}_{u<0}).
$$

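The function $f_{n,\alpha}$ is nonnegative, piecewise linear and convex (as a sum of convex functions), hence continuous, and it tends to +∞ when z tends to ±∞. There is generally a unique solution to (4.5), except when nα is an integer, in which case $f_{n,\alpha}$ is constant between two consecutive order statistics and every point of this interval is a solution: the empirical quantile corresponds to the smallest one. Noticing that

$$
\frac{\partial \rho_\alpha(u)}{\partial u^-} = \alpha\mathbf{1}_{u>0} + (\alpha-1)\mathbf{1}_{u\le 0} \quad \text{and} \quad \frac{\partial \rho_\alpha(u)}{\partial u^+} = \alpha\mathbf{1}_{u\ge 0} + (\alpha-1)\mathbf{1}_{u<0},
$$

we see that the function $f_{n,\alpha}$ admits at every point a right derivative and a left derivative (which coincide except at the $X_i$) given by:

$$
\frac{\partial f_{n,\alpha}}{\partial z^+}(z) = -\frac{1}{n}\sum_{i=1}^{n}\{\alpha\mathbf{1}_{X_i - z>0} + (\alpha-1)\mathbf{1}_{X_i - z\le 0}\} = \frac{1}{n}\sum_{i=1}^{n}\mathbf{1}_{X_i\le z} - \alpha, \qquad (4.6)
$$

$$
\frac{\partial f_{n,\alpha}}{\partial z^-}(z) = -\frac{1}{n}\sum_{i=1}^{n}\{\alpha\mathbf{1}_{X_i - z\ge 0} + (\alpha-1)\mathbf{1}_{X_i - z<0}\} = \frac{1}{n}\sum_{i=1}^{n}\mathbf{1}_{X_i< z} - \alpha. \qquad (4.7)
$$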

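The minimizer $\xi_{n,\alpha}$ is characterized by the conditions

$$
\frac{\partial f_{n,\alpha}}{\partial z^+}(\xi_{n,\alpha}) \ge 0, \qquad \frac{\partial f_{n,\alpha}}{\partial z^-}(\xi_{n,\alpha}) < 0. \qquad (4.8)
$$

It is easy to see that we retrieve the characterization (4.4).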
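The agreement between the two calculations can be checked numerically. The Python sketch below computes the empirical quantile by sorting, as in (4.4), and by minimizing the criterion of (4.5) over the sample points, where a convex piecewise linear function with kinks at the observations reaches its minimum. The toy sample and the function names are illustrative assumptions.

```python
import math

def quantile_by_sorting(sample, alpha):
    """Empirical quantile (4.4): X_{k,n} with (k-1)/n < alpha <= k/n."""
    xs = sorted(sample)
    n = len(xs)
    # k = ceil(n * alpha); the small offset guards against float rounding.
    k = max(1, math.ceil(n * alpha - 1e-12))
    return xs[k - 1]

def check_loss(u, alpha):
    """rho_alpha(u) = u * (alpha - 1_{u < 0})."""
    return u * (alpha - (1.0 if u < 0 else 0.0))

def quantile_by_minimization(sample, alpha):
    """Smallest minimizer of f_{n,alpha} in (4.5), searched among the X_i."""
    n = len(sample)
    f = {x: sum(check_loss(xi - x, alpha) for xi in sample) / n for x in sample}
    best = min(f.values())
    # Tolerance: when n*alpha is an integer, two sample points tie exactly
    # in theory but only up to rounding errors in floating point.
    return min(x for x, v in f.items() if v <= best + 1e-12)

# Deterministic toy sample: a permutation of 0.5, 1.5, ..., 19.5.
sample = [((7 * i) % 20) + 0.5 for i in range(20)]
levels = [0.05, 0.3, 0.5, 0.77, 0.95]
pairs = [(quantile_by_sorting(sample, a), quantile_by_minimization(sample, a))
         for a in levels]
```

The levels 0.05, 0.3, 0.5 and 0.95 make nα an integer for n = 20, which exercises the case where the minimizer is not unique and the smallest solution has to be selected.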

4.2.2 Asymptotic properties

The strong convergence of $F_n^{\leftarrow}(\alpha)$ to $F^{\leftarrow}(\alpha)$ is a direct result of the uniform convergence of the empirical cdf (Glivenko-Cantelli theorem).

The asymptotic distribution of the empirical quantiles in the iid case is given by the following result.

Theorem 4.1 Let $X_1, \ldots, X_n$ be a sample of iid variables with an absolutely continuous law of density f. Then, if $\alpha \in ]0,1[$ and $f(\xi_\alpha) > 0$, we have

$$
\sqrt{n}(\xi_{n,\alpha} - \xi_\alpha) \xrightarrow{d} N(0, \omega_\alpha^2), \qquad \omega_\alpha^2 = \frac{\alpha(1-\alpha)}{f^2(\xi_\alpha)}.
$$

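Proof. We use the characterization (4.5) of the empirical quantile as the minimizer of a convex function. The derivative of $f_{n,\alpha}$ (or its left and right derivatives when z coincides with one of the $X_i$) is strictly negative for $z < \xi_{n,\alpha}$ and nonnegative for $z > \xi_{n,\alpha}$. Since the law is continuous, $\xi_{n,\alpha}$ differs from $\xi_\alpha + \epsilon/\sqrt{n}$ with probability one, and we therefore have, by (4.6)-(4.7), for any real ε,

$$
\begin{aligned}
P[\sqrt{n}(\xi_{n,\alpha} - \xi_\alpha) > \epsilon] &= P[f'_{n,\alpha}(\xi_\alpha + \epsilon/\sqrt{n}) < 0]\\
&= P\left[\frac{1}{n}\sum_{i=1}^{n}\mathbf{1}_{X_i < \xi_\alpha + \epsilon/\sqrt{n}} < \alpha\right]. \qquad (4.9)
\end{aligned}
$$

Let $p_n = E(\mathbf{1}_{X_i < \xi_\alpha + \epsilon/\sqrt{n}}) = P[X_i < \xi_\alpha + \epsilon/\sqrt{n}]$. The sequence $(Y_{ni})$, where $Y_{ni} = \mathbf{1}_{X_i < \xi_\alpha + \epsilon/\sqrt{n}} - p_n$, constitutes a triangular array of variables that are centered and iid for fixed n. We have

$$
E(Y_{ni}) = 0, \qquad E(Y_{ni}^2) = p_n(1 - p_n).
$$

Moreover, letting $s_n^2 = np_n(1 - p_n)$ we have for all δ > 0, using the fact that $|Y_{ni}| < 1$,

$$
\sum_{i=1}^{n}\frac{1}{s_n^2}E(Y_{ni}^2\mathbf{1}_{|Y_{ni}| > \delta s_n}) \le \frac{1}{p_n(1-p_n)}P[|Y_{n1}| > \delta s_n] = 0
$$

for n sufficiently large, as $s_n \to \infty$.

As a result, according to a CLT for triangular arrays (see Appendix),

$$
Z_n := \frac{\sum_{i=1}^{n}Y_{ni}}{\sqrt{np_n(1-p_n)}} \xrightarrow{d} N(0,1). \qquad (4.10)
$$

Now, from (4.9), noting that $\epsilon/(\alpha - p_n) < 0$ for $\epsilon \ne 0$ and n large enough (in particular $\alpha < p_n$ when $\epsilon > 0$),

$$
P[\sqrt{n}(\xi_{n,\alpha} - \xi_\alpha) > \epsilon]
= P\left[Z_n < \frac{\sqrt{n}(\alpha - p_n)}{\sqrt{p_n(1-p_n)}}\right]
= P\left[Z_n\frac{\epsilon\sqrt{p_n(1-p_n)}}{\sqrt{n}(\alpha - p_n)} > \epsilon\right].
$$

Moreover $p_n \to \alpha$ and $\sqrt{n}(\alpha - p_n) \to -\epsilon f(\xi_\alpha)$ when $n \to \infty$. So, from (4.10),

$$
U_n = Z_n\frac{\epsilon\sqrt{p_n(1-p_n)}}{\sqrt{n}(\alpha - p_n)} \xrightarrow{d} N(0, \omega_\alpha^2),
$$

which proves the result. □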

Note that the asymptotic variance depends on α and on the density at $\xi_\alpha$. The term α(1 − α) is small for α corresponding to the tails of the distribution, but this effect is counterbalanced by that of the denominator, which makes the estimation of quantiles less accurate in regions of low density. Figure 4.1 shows how the asymptotic accuracy depends on α in the case of the N(0, 1) distribution. The effect of the denominator outweighs that of the numerator, the accuracy becoming very low in the low density areas. Figure 4.2 exhibits a similar behavior (but here, distribution tails are only present on the positive side) for Pareto laws.
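The behavior described for the N(0, 1) case can be tabulated directly from the formula of Theorem 4.1. The short Python sketch below relies on the standard library class `statistics.NormalDist`; the function name and the grid of levels are choices made for the illustration.

```python
import math
from statistics import NormalDist

def asymptotic_variance_normal(alpha):
    """omega_alpha^2 = alpha (1 - alpha) / f(xi_alpha)^2 for the N(0, 1) law."""
    nd = NormalDist()
    xi = nd.inv_cdf(alpha)              # theoretical quantile xi_alpha
    return alpha * (1.0 - alpha) / nd.pdf(xi) ** 2

levels = [0.5, 0.25, 0.1, 0.05, 0.01, 0.001]
table = {a: asymptotic_variance_normal(a) for a in levels}
```

At the median the variance equals π/2, and it grows as α moves into either tail, which is the pattern described above for Figure 4.1.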

An extension of the previous proof provides the asymptotic distribution of a vector of quantiles. Let $\xi_n = (\xi_{n,\alpha_1}, \ldots, \xi_{n,\alpha_p})'$ and $\xi = (\xi_{\alpha_1}, \ldots, \xi_{\alpha_p})'$ be vectors of p empirical and theoretical quantiles.

Proposition 4.1 Let $X_1, \ldots, X_n$ be a sample of absolutely continuous iid variables, with density f. Then, if $\alpha_i \in ]0,1[$ and $f(\xi_{\alpha_i}) > 0$ for $i = 1, \ldots, p$,

$$
\sqrt{n}(\xi_n - \xi) \xrightarrow{d} N(0, \Omega), \qquad \Omega = (\omega_{ij}), \qquad \omega_{ij} = \frac{\alpha_i \wedge \alpha_j - \alpha_i\alpha_j}{f(\xi_{\alpha_i})f(\xi_{\alpha_j})}.
$$


[Figure 4.1: Asymptotic variance $\omega_\alpha^2$ (vertical axis) for the empirical quantile of a N(0, 1), as a function of α (horizontal axis).]

[Figure 4.2: Asymptotic variance $\omega_\alpha^2$ (vertical axis) for the empirical quantiles of the Pareto distributions of Figure 3.2, as a function of α (horizontal axis); two panels.]


Note that the asymptotic covariance matrix Ω can be estimated by

$$
\hat{\Omega} = (\hat{\omega}_{ij}), \qquad \hat{\omega}_{ij} = \frac{\alpha_i \wedge \alpha_j - \alpha_i\alpha_j}{\hat{f}(\xi_{n,\alpha_i})\hat{f}(\xi_{n,\alpha_j})},
$$

where $\hat{f}$ is a nonparametric estimator (obtained for example by the kernel method) of the density f.

A more accurate result on the asymptotic behavior of empirical quantiles is known as the Bahadur representation. For a sample of iid variables, with density f such that $f(\xi_\alpha) > 0$, Bahadur (1966) showed that, for 0 < α < 1,

$$
\xi_{n,\alpha} = \xi_\alpha + \frac{\alpha - F_n(\xi_\alpha)}{f(\xi_\alpha)} + R_n \qquad (4.11)
$$

where $R_n$ is a random term of order $(\log\log n/n)^{3/4}$ when n tends to infinity. Together with the convergence (4.1), this representation allows us to retrieve the limiting distribution of the empirical quantile. This result was extended to more general processes. The following extension is for linear processes. It also specifies the uniform behavior of $\xi_{n,\alpha} - \xi_\alpha$ with respect to α.

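Proposition 4.2 (Wu, 2005) If $X_t = \sum_{i=0}^{\infty} a_i\epsilon_{t-i}$, where $(\epsilon_t)$ is a sequence of iid variables with finite variance and density $f_\epsilon$ and the sequence $(a_i)$ satisfies $\sum_{i=0}^{\infty}|a_i| < \infty$, and if

$$
\sup_x (f_\epsilon(x) + |f'_\epsilon(x)|) < \infty, \qquad f(\xi_\alpha) > 0,
$$

then we have the representation (4.11) with

$$
R_n = O_{a.s.}\left(\left(\frac{\log n}{n}\right)^{3/4}\log\log n\right)
$$

(see footnote 2). If moreover, for $0 < \alpha_0 < \alpha_1 < 1$,

$$
\sup_x |f''_\epsilon(x)| < \infty, \qquad \inf_{\alpha_0<\alpha<\alpha_1} f(\xi_\alpha) > 0,
$$

then

$$
\sup_{\alpha_0<\alpha<\alpha_1}\left|\xi_{n,\alpha} - \xi_\alpha - \frac{\alpha - F_n(\xi_\alpha)}{f(\xi_\alpha)}\right| = O_{a.s.}\left(n^{-3/4}\log^{5/4} n\,\log\log n\right).
$$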

The following extension concerns nonlinear processes.

2 The notation $Z_n = O_{a.s.}(r_n)$ means that $Z_n/r_n$ is bounded with probability 1.


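Proposition 4.3 (Wu, 2005) If $X_t = G(X_{t-1}, \epsilon_t)$, where $(\epsilon_t)$ is a sequence of iid variables and $L_\epsilon = \sup_{x\ne x'}|G(x,\epsilon) - G(x',\epsilon)|/|x - x'| \le \infty$ satisfies

$$
E(\log L_\epsilon) < 0, \qquad E[L_\epsilon^r + |x_0 - G(x_0,\epsilon)|^r] < \infty
$$

for some r > 0 and some $x_0$, and if, for $0 < \alpha_0 < \alpha_1 < 1$,

$$
\sup_x (f(x) + |f'(x)|) < \infty, \qquad \inf_{\alpha_0<\alpha<\alpha_1} f(\xi_\alpha) > 0,
$$

then

$$
\sup_{\alpha_0<\alpha<\alpha_1}\left|\xi_{n,\alpha} - \xi_\alpha - \frac{\alpha - F_n(\xi_\alpha)}{f(\xi_\alpha)}\right| = O_{a.s.}\left(n^{-3/4}\log^{3/2} n\right). \qquad (4.12)
$$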

This property can be applied, in particular, to the ARCH(1) model under appropriate assumptions on the coefficients. However, the asymptotic distribution of the empirical quantiles will not be obtained explicitly in this case because the marginal density of the observations is unknown (even when the conditional distribution is Gaussian).

The uniform bound (4.12) allows us, using (4.3), to obtain the following weak convergence

$$
\sqrt{n}f(\xi_\alpha)(\xi_{n,\alpha} - \xi_\alpha) \Rightarrow B(\alpha), \qquad \alpha \in ]\alpha_0, \alpha_1[ \qquad (4.13)
$$

where $B(t)$, $t \in [0,1]$, is a Brownian bridge.

4.3 Methods for estimating risk measures

The typical properties of financial series (presence and clustering of volatility, leptokurtic marginal distributions, asymmetries) complicate the estimation of VaR and, more generally, of risk measures. We have to distinguish, for example, the conditional and unconditional VaR. Methods for estimating quantiles developed in the iid Gaussian framework are inappropriate here.


4.3.1 Nonparametric estimation

The results of the previous section apply to the estimation of the unconditional VaR, defined as the quantile of level 1 − α of the loss distribution. The method of "historical simulation" simply estimates VaR by the empirical quantile of the unconditional distribution of $L_{t,t+h}$ (it does not involve simulations in the statistical sense). Denoting by $F_n$ the empirical counterpart of this distribution, assumed to be independent of t and h, we thus have the estimator

$$
\widehat{\mathrm{VaR}}_{n,h}(\alpha) = F_n^{\leftarrow}(1-\alpha).
$$

Instead of considering the entire sample, one can just consider the $n_0$ most recent observations. For example, if $n_0 = 1000$, the estimated VaR at level α = 5% is essentially the 50th largest observation (by (4.4), it is the order statistic $X_{950,1000}$).

This method based on the empirical quantile has the advantage of being simple and of avoiding any assumption about the distributions of the loss variables. However, it has significant drawbacks: (i) the VaR obtained is the unconditional VaR, (ii) the asymptotic distribution used to evaluate the accuracy of the estimator depends on precise assumptions (e.g. independence and identical distribution), (iii) the examples illustrating Theorem 4.1 show that quantiles corresponding to the tails of the distribution are estimated with very low asymptotic accuracy, yet it is precisely these quantiles that are of interest, (iv) the method does not take into account the influence of explanatory variables (present or past) in the risk evaluation.

Nonparametric estimators can also be constructed for the expected shortfall (see Scaillet, 2005). More generally, a nonparametric estimator of the distortion risk measure

$$
r(F;G) = \int_0^1 F^{-1}(1-u)\,dG(u),
$$

where G is a cdf on [0, 1] and F is the loss distribution, assumed to be strictly increasing and continuous, is

$$
r(F_n;G) = \int_0^1 F_n^{\leftarrow}(1-u)\,dG(u).
$$

The empirical quantile function being a step function, it is easy to express this integral as a function of the observations:

$$
r(F_n;G) = \sum_{i=0}^{n-1}\left\{G\left(\frac{i+1}{n}\right) - G\left(\frac{i}{n}\right)\right\}X_{n-i,n}.
$$

This estimator, called an L-estimator in the statistical literature, has the form of a linear combination of the ordered observations, with weights depending on the variations of G. We see in particular that if G is concave, large observations will be overweighted.

In the case of iid observations, the asymptotic distribution of the estimator $r(F_n;G)$ can be obtained from the weak convergence (4.13) of the empirical quantile process $\xi_{n,\alpha}$ to the theoretical quantile process. We have

$$
\sqrt{n}\{r(F_n;G) - r(F;G)\} \Rightarrow \int_0^1 \frac{B(1-u)}{f(\xi_{1-u})}\,dG(u), \qquad (4.14)
$$

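where $B(t)$, $t \in [0,1]$, is a Brownian bridge. The limit law is centered, with variance given by

$$
\mathrm{Var}_{as}[\sqrt{n}\{r(F_n;G) - r(F;G)\}] = \int_0^1\int_0^1 \frac{u_1 \wedge u_2 - u_1u_2}{f(\xi_{1-u_1})f(\xi_{1-u_2})}\,dG(u_1)\,dG(u_2).
$$

If G admits a density g, we obtain, noting that $1/f(\xi_\alpha)$ is the derivative of $F^{-1}(\alpha)$, replacing $u_i$ by $1 - u_i$ (which leaves $u_1 \wedge u_2 - u_1u_2$ unchanged) and then using the change of variable $u = F(x) = 1 - S(x)$,

$$
\begin{aligned}
\mathrm{Var}_{as}[\sqrt{n}\{r(F_n;G) - r(F;G)\}]
&= \int_0^1\int_0^1 (u_1 \wedge u_2 - u_1u_2)\,g(1-u_1)\,g(1-u_2)\,dF^{-1}(u_1)\,dF^{-1}(u_2)\\
&= \int\!\!\int \{F(x_1) \wedge F(x_2) - F(x_1)F(x_2)\}\,g[S(x_1)]\,g[S(x_2)]\,dx_1\,dx_2,
\end{aligned}
$$

where the last integral is taken over the support of the loss distribution.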

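Example 4.1 In the case of the ES, which corresponds to $g(u) = \alpha^{-1}\mathbf{1}_{[0,\alpha]}(u)$, the estimator is written

$$
\mathrm{ES}_n(\alpha) = \frac{1}{n\alpha}\sum_{i=0}^{[n\alpha]-1}X_{n-i,n} + \left(1 - \frac{[n\alpha]}{n\alpha}\right)X_{n-[n\alpha],n},
$$

where $[n\alpha]$ denotes the integer part of nα. So if nα = k is an integer, the estimator is simply the average of the k largest observations.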
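The historical-simulation VaR and the ES estimator of Example 4.1 can be sketched in Python as follows. The function names, the optional window argument `n0` and the toy data are assumptions made for the illustration; the sample is assumed to contain losses (positive values are losses).

```python
import math

def historical_var(losses, alpha, n0=None):
    """Empirical quantile of level 1 - alpha of the (last n0) observed losses."""
    window = losses if n0 is None else losses[-n0:]
    xs = sorted(window)
    n = len(xs)
    # Order statistic X_{k,n} with k = ceil(n (1 - alpha)), see (4.4);
    # the small offset guards against float rounding.
    k = max(1, math.ceil(n * (1.0 - alpha) - 1e-12))
    return xs[k - 1]

def expected_shortfall(losses, alpha):
    """L-estimator of the ES: weights 1/(n alpha) on the [n alpha] largest
    losses, and the remaining weight on the next order statistic."""
    xs = sorted(losses)
    n = len(xs)
    na = n * alpha
    k = math.floor(na + 1e-12)                    # integer part [n alpha]
    es = sum(xs[n - 1 - i] for i in range(k)) / na
    if k < n:
        es += (1.0 - k / na) * xs[n - k - 1]      # X_{n-[n alpha], n}
    return es

# Toy data: losses 1, 2, ..., 100 (purely illustrative).
losses = list(range(1, 101))
var_5 = historical_var(losses, 0.05)
es_5 = expected_shortfall(losses, 0.05)
```

With nα an integer, `expected_shortfall` reduces to the average of the nα largest losses; otherwise the last weight interpolates so that the weights still sum to one.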


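It can be shown (see Gourieroux and Liu, 2007) that the asymptotic variance of the estimator depends on the variance of the loss when it exceeds the VaR and on the gap between the ES and the VaR:

$$
\mathrm{Var}_{as}[\sqrt{n}\{\mathrm{ES}_n(\alpha) - \mathrm{ES}(\alpha)\}] = \frac{V[L \mid L > \mathrm{VaR}(\alpha)] + (1-\alpha)\{\mathrm{ES}(\alpha) - \mathrm{VaR}(\alpha)\}^2}{\alpha}.
$$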

4.3.2 Dynamic models of conditional moments

A parametric approach to VaR estimation is to specify the conditional moments of the loss variable. A classical model, denoting by $L_t$ the loss variable between dates t − 1 and t, is

$$
L_t = m_t(\theta) + \sigma_t(\theta)\epsilon_t
$$

where $m_t(\theta)$ and $\sigma_t^2(\theta)$ are, respectively, the mean and variance of $L_t$ conditional on the past, and θ is a vector of parameters. Assuming that the sequence $(\epsilon_t)$ is iid, with cdf F, the VaR (at horizon 1) takes the form

$$
\mathrm{VaR}_{t,1}(\alpha) = m_t(\theta) + \sigma_t(\theta)F^{\leftarrow}(1-\alpha). \qquad (4.15)
$$

If the law of the error terms is assumed to be known, an estimator $\hat{\theta}$ of θ gives a parametric VaR estimator defined by

$$
\widehat{\mathrm{VaR}}^{(1)}_{t,1}(\alpha) = m_t(\hat{\theta}) + \sigma_t(\hat{\theta})F^{\leftarrow}(1-\alpha). \qquad (4.16)
$$

In the more realistic case where the law of the error terms is not known, a semi-parametric estimator of VaR can be defined by

$$
\widehat{\mathrm{VaR}}^{(2)}_{t,1}(\alpha) = m_t(\hat{\theta}) + \sigma_t(\hat{\theta})\hat{F}_\epsilon^{\leftarrow}(1-\alpha) \qquad (4.17)
$$

where $\hat{F}_\epsilon^{\leftarrow}(1-\alpha)$ is an empirical quantile of the residuals $\hat{\epsilon}_t = (L_t - m_t(\hat{\theta}))/\sigma_t(\hat{\theta})$. An intermediate solution is to assume that the law of $\epsilon_t$ belongs to a parametric family $\{F_\beta\}$. From an estimator $\hat{\beta}$ of the parameter, define

$$
\widehat{\mathrm{VaR}}^{(3)}_{t,1}(\alpha) = m_t(\hat{\theta}) + \sigma_t(\hat{\theta})F_{\hat{\beta}}^{\leftarrow}(1-\alpha). \qquad (4.18)
$$

Another estimator is obtained using the Gaussian quasi-maximum likelihood (QML) estimator of θ:

$$
\widehat{\mathrm{VaR}}_{QML}(\alpha) = m_t(\hat{\theta}_{QML}) + \sigma_t(\hat{\theta}_{QML})\Phi^{-1}(1-\alpha). \qquad (4.19)
$$

Specifications commonly used for the conditional moments are the ARMA and GARCH models.

Note that these methods have the advantage of producing VaR estimators that are decreasing functions of α. This property will not be satisfied by other estimators considered below.

RiskMetrics Model

The RiskMetrics™ method was developed by JP Morgan to calculate VaR. It is based on the following model, written for a series of log-returns $r_t = \log(p_t/p_{t-1})$:

$$
\begin{cases}
r_{t+1} = \sigma_{t+1}\eta_{t+1}, \qquad (\eta_t) \text{ iid } N(0,1)\\
\sigma_{t+1}^2 = \lambda\sigma_t^2 + (1-\lambda)r_t^2
\end{cases}
\qquad (4.20)
$$

where $\lambda \in ]0,1[$ is a smoothing parameter (arbitrarily set at 0.94 for daily series). Thus $\sigma_{t+1}^2$ is simply the prediction of $r_{t+1}^2$ obtained by simple exponential smoothing.

This model can also be interpreted as an IGARCH(1,1) (see footnote 3) without constant term. It is however important to note that (4.20) is not really a model: for any initial value $r_0$, $r_t$ tends almost surely to 0 as t tends to infinity. It is therefore clear that (4.20) cannot be the DGP (Data Generating Process) of any financial series. This model can nevertheless be used as a tool for calculating VaRs.

The VaR at horizon 1 and level α of the loss variable $L_{t,t+1} = -r_{t+1}$ is, since the conditional distribution of $L_{t,t+1}$ is the law $N(0, \sigma_{t+1}^2)$,

$$
\mathrm{VaR}_{t+1,1}(\alpha) = \sigma_{t+1}\Phi^{-1}(1-\alpha). \qquad (4.21)
$$
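The recursion (4.20) and the one-step VaR (4.21) are straightforward to implement. In the Python sketch below, the initialization of the variance by the sample mean of the squared returns is an assumption (the text does not specify an initial value), and the function names are hypothetical.

```python
import math
from statistics import NormalDist

def riskmetrics_variance(returns, lam=0.94, sigma2_init=None):
    """Run sigma2_{t+1} = lam * sigma2_t + (1 - lam) * r_t^2 over the sample
    and return the last forecast sigma2_{n+1}."""
    if sigma2_init is None:
        # Assumed starting value: average squared return over the sample.
        sigma2_init = sum(r * r for r in returns) / len(returns)
    s2 = sigma2_init
    for r in returns:
        s2 = lam * s2 + (1.0 - lam) * r * r
    return s2

def riskmetrics_var(returns, alpha, lam=0.94):
    """One-step VaR (4.21): sigma_{n+1} * Phi^{-1}(1 - alpha)."""
    sigma = math.sqrt(riskmetrics_variance(returns, lam))
    return sigma * NormalDist().inv_cdf(1.0 - alpha)

# Toy series of log-returns (illustrative values only).
returns = [0.01, -0.02, 0.005, 0.015, -0.01, 0.03, -0.025]
var_1pct = riskmetrics_var(returns, 0.01)
```

Because the weights decay geometrically at rate λ, the influence of the starting value vanishes quickly on long samples, while a recent large return raises the forecast immediately.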

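3 Integrated GARCH, because the sum of the coefficients of $\sigma_t^2$ and $r_t^2$ is 1.

At higher horizons, the VaR calculation is problematic. The following formula is commonly used:

$$
\mathrm{VaR}_{t+1,h}(\alpha) = \sigma_{t+1}\sqrt{h}\,\Phi^{-1}(1-\alpha), \qquad (4.22)
$$

but we will see that it is incorrect. Indeed, the loss at horizon h for logarithms of prices is

$$
L_{t,t+h} = -(\log p_{t+h} - \log p_t) = -\sum_{i=1}^{h} r_{t+i}
$$

and its conditional variance is

$$
V_t(L_{t,t+h}) = \sum_{i=1}^{h}V_t(r_{t+i}) = \sum_{i=1}^{h}E_t(r_{t+i}^2) = \sum_{i=1}^{h}E_t(\sigma_{t+i}^2).
$$

For the first equality we used the relationship $E_t(r_{t+i}r_{t+j}) = 0$ for $i \ne j$. Now, setting $a(\eta_t) = \lambda + (1-\lambda)\eta_t^2$, we can write, for $i \ge 2$,

$$
\sigma_{t+i}^2 = a(\eta_{t+i-1})\sigma_{t+i-1}^2 = a(\eta_{t+i-1})\cdots a(\eta_{t+1})\sigma_{t+1}^2. \qquad (4.23)
$$

Therefore, since $Ea(\eta_t) = 1$ and the $\eta_t$ are independent,

$$
E_t\sigma_{t+i}^2 = Ea(\eta_{t+i-1})\cdots Ea(\eta_{t+1})\,\sigma_{t+1}^2 = \sigma_{t+1}^2.
$$

Finally,

$$
V_t(L_{t,t+h}) = h\sigma_{t+1}^2,
$$

which explains the form used for $\mathrm{VaR}_{t+1,h}(\alpha)$. However, formula (4.22) is incorrect because the conditional distribution of $L_{t,t+h}$ is not Gaussian (see footnote 4). The correct formula is

$$
\mathrm{VaR}_{t+1,h}(\alpha) = \sigma_{t+1}\sqrt{h}\,F_h^{-1}(1-\alpha), \qquad (4.24)
$$

where $F_h$ is the cdf of $L_{t,t+h}/(\sigma_{t+1}\sqrt{h})$. This distribution may be estimated nonparametrically from the values of $-\sum_{i=1}^{h} r_{t+i}/(\hat{\sigma}_{t+1}\sqrt{h})$, where $\hat{\sigma}_{t+1}^2 = \hat{\lambda}\hat{\sigma}_t^2 + (1-\hat{\lambda})r_t^2$ and $\hat{\lambda}$ denotes an estimator of λ.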
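The non-Gaussianity of the h-period loss can be illustrated by simulation. The Python sketch below draws from the law $F_h$ of the standardized loss under (4.20), using the recursion (4.23) with $\sigma_{t+1}$ normalized to 1 (the standardized law does not depend on this scale). It is a Monte Carlo illustration, not the nonparametric estimator mentioned above; the function names, seed and simulation size are assumptions.

```python
import math
import random

def simulate_standardized_loss(h, lam, n_sim, rng):
    """Draws of L_{t,t+h} / (sigma_{t+1} sqrt(h)) under the RiskMetrics model."""
    draws = []
    for _ in range(n_sim):
        s2 = 1.0                       # sigma_{t+1}^2, normalized to 1
        total = 0.0
        for _ in range(h):
            eta = rng.gauss(0.0, 1.0)
            total += math.sqrt(s2) * eta           # r_{t+i} = sigma_{t+i} eta_{t+i}
            s2 *= lam + (1.0 - lam) * eta * eta    # sigma_{t+i+1}^2 = a(eta_{t+i}) sigma_{t+i}^2
        draws.append(-total / math.sqrt(h))
    return draws

def kurtosis(xs):
    """Sample kurtosis m4 / m2^2."""
    n = len(xs)
    mu = sum(xs) / n
    m2 = sum((x - mu) ** 2 for x in xs) / n
    m4 = sum((x - mu) ** 4 for x in xs) / n
    return m4 / m2 ** 2

def empirical_quantile(xs, level):
    """Order statistic X_{k,n} with k = ceil(n * level)."""
    ys = sorted(xs)
    k = max(1, math.ceil(len(ys) * level - 1e-12))
    return ys[k - 1]

rng = random.Random(0)                 # fixed seed, for reproducibility only
draws = simulate_standardized_loss(h=10, lam=0.94, n_sim=50_000, rng=rng)
q_sim = empirical_quantile(draws, 0.99)    # Monte Carlo estimate of F_h^{-1}(0.99)
```

For h = 2 the sample kurtosis of such draws can be compared with the closed-form value given in footnote 4, and `q_sim` with the Gaussian quantile used in (4.22).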

Similar calculations can obviously be made for a more general GARCH(1,1), but we lose the form in $\sqrt{h}$ given by formula (4.24). Indeed, (4.23) is no longer valid, the $\sigma_{t+i}^2$ being of the form $\alpha(\eta_{t+i-1}, \ldots, \eta_{t+1}) + \beta(\eta_{t+i-1}, \ldots, \eta_{t+1})\sigma_{t+1}^2$.

4 For example, for h = 2 we have $L_{t,t+2} = -\sigma_{t+1}\{\eta_{t+1} + \sqrt{a(\eta_{t+1})}\,\eta_{t+2}\}$ and the kurtosis of the conditional distribution is

$$
3\left(\frac{(1-\lambda)(3-\lambda)}{2} + 1\right) \ne 3.
$$


4.3.3 Quantile Regression

Quantile regression (see Koenker's book, 2005) can be introduced by analogy with linear regression. Recall that linear regression specifies the expectation of a vector $Y = (Y_1, \ldots, Y_n)'$, conditionally on an $n \times k$ matrix X of exogenous variables, as

$$
E(Y_i \mid X) = X_i'\beta, \qquad i = 1, \ldots, n \qquad (4.25)
$$

where the $X_i'$ are the rows of X and $\beta \in \mathbb{R}^k$ is the unknown parameter. Equivalently, we can consider the model

$$
Y_i = X_i'\beta + \epsilon_i, \qquad E(\epsilon_i \mid X) = 0, \qquad i = 1, \ldots, n. \qquad (4.26)
$$

Other assumptions are obviously needed on the error terms $\epsilon_i$ and the matrix X to estimate β.

Let $\alpha \in ]0,1[$ and let $F_W^{\leftarrow}(\alpha \mid X)$ be the quantile function associated with the law of a variable W given X. Quantile regression replaces the previous equations by

$$
F_{Y_i}^{\leftarrow}(\alpha \mid X) = X_i'\beta(\alpha), \qquad i = 1, \ldots, n \qquad (4.27)
$$

and equivalently, given the additivity of the quantile function with respect to constants,

$$
Y_i = X_i'\beta(\alpha) + \epsilon_i, \qquad F_{\epsilon_i}^{\leftarrow}(\alpha \mid X) = 0, \qquad i = 1, \ldots, n \qquad (4.28)
$$

where $\beta(\alpha) \in \mathbb{R}^k$ is a parameter. When X is a column of ones, $\beta(\alpha) \in \mathbb{R}$ is simply the quantile of order α of the $Y_i$.

Remark 4.1 The analogy between Models (4.26) and (4.27) is misleading because the latter equation is actually a functional equality (in α). One is rarely interested in modeling the quantile corresponding to a particular α. More generally, we are interested in values of α belonging to an interval included in ]0, 1[. In this context, constraints are necessary to ensure that $X_i'\beta(\alpha)$ is a quantile function. The standard model (location-scale shift model) is written

$$
F_{Y_i}^{\leftarrow}(\alpha \mid X) = X_i'\theta + X_i'\gamma\,F_0^{\leftarrow}(\alpha), \qquad i = 1, \ldots, n \qquad (4.29)
$$

where $\theta \in \mathbb{R}^k$ and $\gamma \in \mathbb{R}^k$ are parameters and $F_0^{\leftarrow}(\cdot)$ is a given quantile function. To ensure that the function $F_{Y_i}^{\leftarrow}(\cdot \mid X)$ is increasing, we must constrain the exogenous variables and the parameter space to get

$$
X_i'\gamma \ge 0, \qquad i = 1, \ldots, n.
$$

Note that the specification (4.29) is equivalent to the model

$$
Y_i = X_i'\theta + (X_i'\gamma)\epsilon_i, \qquad i = 1, \ldots, n \qquad (4.30)
$$

where the "error terms" $\epsilon_i$ (see footnote 5) have distribution $F_0$.

Remark 4.2 Model (4.27), if considered valid for any $\alpha \in ]0,1[$, completely characterizes the law of $Y_i$ (conditionally on X). A simple way of simulating the $Y_i$ is to simulate variables $U_i$ of law U[0, 1] and to take

$$
Y_i = X_i'\beta(U_i), \qquad i = 1, \ldots, n. \qquad (4.31)
$$

Indeed, it is known that a variable of cdf F is obtained by setting $Y = F^{\leftarrow}(U)$ where $U \sim U[0,1]$. For instance, simulations of Model (4.29) are obtained from

$$
Y_i = X_i'\theta + X_i'\gamma\,F_0^{\leftarrow}(U_i), \qquad i = 1, \ldots, n. \qquad (4.32)
$$

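Reasoning with α fixed, an estimator of β(α) in Model (4.27) follows naturally from Formula (4.5) for the empirical quantiles. We set

$$
\hat{\beta}(\alpha) = \arg\min_{\beta\in\mathbb{R}^k}\frac{1}{n}\sum_{i=1}^{n}\rho_\alpha(Y_i - X_i'\beta) := \arg\min_{\beta\in\mathbb{R}^k} f_{n,\alpha}(\beta). \qquad (4.33)
$$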
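The simulation scheme (4.32) and the estimator (4.33) can be tried on a small example with an intercept and one regressor (k = 2). The Python sketch below makes several assumptions for the illustration: the parameter values, the choice $F_0 = N(0,1)$, the seed and the function names are hypothetical, and the minimization uses a brute-force search over lines passing through two observations, which relies on the fact that a convex piecewise linear criterion of this kind reaches its minimum at such a point when the regressor takes at least two distinct values. In practice a linear programming solver would be used instead.

```python
import random
from statistics import NormalDist

def check_loss(u, alpha):
    """rho_alpha(u) = u * (alpha - 1_{u < 0})."""
    return u * (alpha - (1.0 if u < 0 else 0.0))

def objective(b0, b1, xs, ys, alpha):
    """Criterion f_{n,alpha}(beta) of (4.33) for beta = (b0, b1)."""
    return sum(check_loss(y - b0 - b1 * x, alpha) for x, y in zip(xs, ys)) / len(xs)

def quantile_regression(xs, ys, alpha):
    """Brute-force minimizer: try every line through two observations."""
    best = None
    n = len(xs)
    for i in range(n):
        for j in range(i + 1, n):
            if xs[i] == xs[j]:
                continue
            b1 = (ys[j] - ys[i]) / (xs[j] - xs[i])
            b0 = ys[i] - b1 * xs[i]
            val = objective(b0, b1, xs, ys, alpha)
            if best is None or val < best[0]:
                best = (val, b0, b1)
    return best[1], best[2]

# Simulation of the location-scale model (4.32) with F_0 = N(0, 1).
rng = random.Random(0)                 # fixed seed, for reproducibility only
theta, gamma = (1.0, 2.0), (1.0, 0.5)  # hypothetical parameter values
n = 60
xs = [rng.uniform(0.0, 2.0) for _ in range(n)]
ys = [theta[0] + theta[1] * x
      + (gamma[0] + gamma[1] * x) * NormalDist().inv_cdf(rng.random())
      for x in xs]

alpha = 0.9
b0_hat, b1_hat = quantile_regression(xs, ys, alpha)
# Theoretical beta(alpha) = theta + gamma * F_0^{<-}(alpha) in this design.
q = NormalDist().inv_cdf(alpha)
b0_true, b1_true = theta[0] + gamma[0] * q, theta[1] + gamma[1] * q
```

Here the scale factor 1 + 0.5x is positive on [0, 2], so the constraint $X_i'\gamma \ge 0$ of Remark 4.1 holds by construction.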

The limiting distribution of the estimator can be obtained under various assumptions about the variables $Y_i$ and $X_i$. For simplicity we assume that the variables $Y_i$ are independent, with law $F_i = F_{Y_i}$ conditionally on X. Let $\xi_i(\alpha) = F_{Y_i}^{\leftarrow}(\alpha)$. We have the following result.

Proposition 4.4 Assume that the $F_i$ are absolutely continuous, with strictly positive density $f_i$ at $\xi_i(\alpha)$. Suppose further that there exist positive definite matrices $\Omega_0$ and $\Omega_1(\alpha)$ such that, almost surely,

1. $\frac{1}{n}\sum_{i=1}^{n}X_iX_i' \to \Omega_0$,

2. $\frac{1}{n}\sum_{i=1}^{n}f_i(\xi_i(\alpha))X_iX_i' \to \Omega_1(\alpha)$,

3. $\max_{i=1,\ldots,n}\|X_i\|/\sqrt{n} \to 0$.

Then

$$
\sqrt{n}\{\hat{\beta}(\alpha) - \beta(\alpha)\} \xrightarrow{d} N\left(0, \alpha(1-\alpha)\Omega_1^{-1}(\alpha)\Omega_0\Omega_1^{-1}(\alpha)\right).
$$

5 The usual interpretation does not always work because these terms are not necessarily centered.

Remark 4.3 The above results are valid for a given α but, when several levels α are considered, there is no guarantee of coherency (monotonicity) of the estimators obtained. In other words, in the plane, the regression quantiles estimated for different values of α can intersect. However, it is possible to show that monotonicity is respected at the center of the graph. Specifically, denoting by $\bar{X}$ the empirical mean of the $X_i$, we have

$$
\alpha_1 \ge \alpha_2 \Rightarrow \bar{X}'\hat{\beta}(\alpha_1) \ge \bar{X}'\hat{\beta}(\alpha_2)
$$

(see Koenker, 2005, Theorem 2.5).

With the notations used for loss variables, an estimator of VaR, conditional on the information represented by the vector $x_t$, is

$$
\widehat{\mathrm{VaR}}_t(\alpha) = x_t'\hat{\beta}(\alpha).
$$

Here it is natural to introduce some lagged variables in the vector $x_t$. The asymptotic results then have to be reconsidered. This approach can obviously be extended to the nonlinear framework, by specifying the VaR as a function of the form $g(x_t, \beta, \alpha)$. The estimator is consistent (it converges to $\beta_0$ such that $\mathrm{VaR}_t(\alpha) = g(x_t, \beta_0, \alpha)$) and asymptotically normal under various dependence conditions (see Portnoy (1991)), including in particular the ARCH case (see Koenker and Zhao (1996)). A generalization introduced by Engle and Manganelli (2004), to be studied below, specifies a dynamic model for VaR. Note that a drawback of this method, conducted separately for various values of α, is that nothing guarantees the monotonicity of $\widehat{\mathrm{VaR}}(\alpha)$ as a function of α.

4.3.4 Dynamic models of VaR

There are several ways to model conditional distributions. The most usual one, in econometrics, is to model the conditional densities. This is, for example, what is done when an ARCH model with iid Gaussian noise is specified. Another type of modeling is to specify the conditional Laplace transforms. This approach is commonly used in finance (see, e.g., affine models and CAR (Compound Autoregressive) models). A third method, which seems more natural for risk assessment, is to specify the conditional quantiles.

As we have noted, a difficulty is to specify a model for conditional quantiles that takes into account the monotonicity of the function $\alpha \mapsto \mathrm{VaR}_t(\alpha)$.

QAR (Quantile AuroRegressive) Models

Formula (4.31 ) provides a simulation method for the quantile regression model(4.28). It also interprets as a random coefficient model with exogenous variables.A natural generalization of the autoregressive framework, introduced by Koenkerand Xiao (2006), is written

Yt = a0(Ut) + a1(Ut)Yt−1 + · · ·+ ap(Ut)Yt−p, (4.34)

where (Ut) is a sequence of iid U(0, 1] variables, the ai(·) are functions [0, 1] 7→ R,to be estimated. This model can be called quantile autoregression of order p orQAR(p). Provided that the right hand side of equality (4.34 ) is an increasingfunction of Ut, the conditional quantile function to Yt reads 6

F←Yt(α | Yt−1, . . . , Yt−p) = a0(α) + a1(α)Yt−1 + · · · ap(α)Yt−p, ∀α ∈]0, 1[. (4.35)

This restriction matters, because the monotonicity has to hold whatever the values of the variables $Y_t$. However, it is easy to show that it holds when the $Y_i$ are positive random variables and the $a_i(\cdot)$ are quantile functions.

Constraints are also needed to ensure stationarity of the solutions $(Y_t)$ to model (4.34). In the case $p = 1$, iterating the recursion gives

$$Y_t = a_0(U_t) + \sum_{i=0}^{\infty} a_1(U_t) \cdots a_1(U_{t-i})\, a_0(U_{t-i-1}),$$

provided the infinite sum converges. By the Cauchy rule, we obtain a condition for the absolute convergence of this sum, and thus a strict stationarity condition:

$$E \log |a_1(U_t)| < 0, \qquad E \log |a_0(U_t)| < +\infty.$$

6. For any increasing function $g$ and any variable $U \sim U[0, 1]$ we have $F^{\leftarrow}_{g(U)}(\alpha) = g(F^{\leftarrow}_U(\alpha)) = g(\alpha)$.


Similarly, a condition for second-order stationarity (which also implies strict stationarity) is

$$E a_1^2(U_t) < 1, \qquad E a_0^2(U_t) < +\infty.$$

Note that these conditions do not require the function $a_1(\cdot)$ to be bounded above by 1. Conditions can be established in the general case from a vector representation of model (4.34).
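As an illustration, here is a minimal simulation sketch of a QAR(1) process. The coefficient functions are hypothetical choices made only for this example (they are not taken from the text): $a_0(u) = -\log(1-u)$, the Exp(1) quantile function, and $a_1(u) = 1.5u$, which exceeds 1 for $u > 2/3$. Both stationarity conditions hold, since $E \log a_1(U_t) = \log 1.5 - 1 < 0$ and $E a_1^2(U_t) = 0.75 < 1$.

```python
import math
import random


def a0(u):
    # Hypothetical intercept function: quantile function of the Exp(1) law.
    return -math.log(1.0 - u)


def a1(u):
    # Hypothetical slope function: quantile function of U(0, 1.5).
    # It is larger than 1 when u > 2/3, yet the model is stationary.
    return 1.5 * u


def simulate_qar1(n, burn_in=1000, seed=0):
    """Simulate Y_t = a0(U_t) + a1(U_t) * Y_{t-1} with U_t iid uniform."""
    rng = random.Random(seed)
    y = 0.0
    path = []
    for t in range(n + burn_in):
        u = rng.random()           # U_t in [0, 1)
        y = a0(u) + a1(u) * y      # QAR(1) recursion (4.34) with p = 1
        if t >= burn_in:
            path.append(y)
    return path


path = simulate_qar1(200_000)
sample_mean = sum(path) / len(path)

log_moment = math.log(1.5) - 1.0    # E log a1(U) in closed form
second_moment = 1.5 ** 2 / 3.0      # E a1(U)^2 in closed form
# Theoretical stationary mean: E a0 / (1 - E a1) = 1 / (1 - 0.75) = 4.
```

Because $Y_t$ stays nonnegative and both coefficient functions are increasing, the right-hand side is increasing in $U_t$, so (4.35) applies to this example.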

For fixed $\alpha$, we obtain an estimator of the parameter vector $a(\alpha) = (a_0(\alpha), \ldots, a_p(\alpha))'$ as a solution of

$$\hat a(\alpha) = \arg\min_{a \in \mathbb{R}^{p+1}} \frac{1}{n} \sum_{t=1}^{n} \rho_\alpha(Y_t - a_0 - a_1 Y_{t-1} - \cdots - a_p Y_{t-p}), \tag{4.36}$$

where $\rho_\alpha(u) = \alpha u^+ + (1 - \alpha)(-u)^+ = u(\alpha - \mathbf{1}_{u<0})$. Under regularity conditions on the functions $a_i$ (involving second-order stationarity), Koenker and Xiao (2006, Theorem 2) showed the weak convergence

$$\Sigma^{-1/2}(\alpha) \sqrt{n}\, \{\hat a(\alpha) - a(\alpha)\} \Rightarrow B_{p+1}(\alpha), \tag{4.37}$$

where $\{B_{p+1}(t), t \in [0, 1]\}$ is a Brownian bridge of dimension $p + 1$ and $\Sigma(\alpha)$ is a non-random matrix.

The remark made for quantile regression is still valid in the autoregressive framework: the main drawback of this approach is that it does not ensure the monotonicity of the estimated quantile function.
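The role of the check function $\rho_\alpha$ can be seen in the simplest special case of the criterion, with an intercept only and no lagged variable. The sketch below (function names are illustrative) verifies numerically that minimizing the empirical check loss over a constant returns the empirical $\alpha$-quantile. The sample size is chosen so that $n\alpha$ is not an integer, which makes the minimizer unique.

```python
import math
import random


def check_loss(u, alpha):
    # rho_alpha(u) = u * (alpha - 1{u < 0})
    return u * (alpha - (1.0 if u < 0 else 0.0))


def empirical_quantile(sample, alpha):
    # Generalized inverse of the empirical cdf: smallest x with F_n(x) >= alpha.
    xs = sorted(sample)
    k = math.ceil(alpha * len(xs))
    return xs[max(k, 1) - 1]


def argmin_check_loss(sample, alpha):
    # The criterion is convex and piecewise linear in c, so a minimizer
    # can be found among the data points themselves.
    return min(sample, key=lambda c: sum(check_loss(y - c, alpha) for y in sample))


rng = random.Random(0)
sample = [rng.gauss(0.0, 1.0) for _ in range(501)]
results = {a: (argmin_check_loss(sample, a), empirical_quantile(sample, a))
           for a in (0.05, 0.5, 0.95)}
```

The full QAR criterion is a linear program in $(a_0, \ldots, a_p)$; this brute-force search is only meant to illustrate the link between the check loss and quantiles.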

CAViaR (Conditional Autoregressive Value at Risk) Models

These models, introduced by Engle and Manganelli (2004), differ from the previous ones by the introduction of lagged quantiles in the specification of the quantile at the current date. A general model of this form is then, for all $\alpha \in\, ]0, 1[$,

$$q_t(\alpha) := F^{\leftarrow}_{Y_t}(\alpha \mid Y_{t-1}, Y_{t-2}, \ldots) = a_0(\alpha) + \sum_{i=1}^{p} a_i(\alpha)\, g(Y_{t-i}) + \sum_{j=1}^{r} b_j\, q_{t-j}(\alpha), \tag{4.38}$$

where $g : \mathbb{R} \to \mathbb{R}^+$. Examples of functions $g$ are $g(y) = y^2$ and $g(y) = |y|$, but also functions taking into account asymmetry effects in the conditional quantile, such as $g(y) = \beta_1 y^+ + \beta_2 (-y)^+$ (the function $g$ can therefore include parameters to be estimated). In this specification, positive and negative past observations of the same magnitude have different impacts on the conditional quantile (by analogy with the asymmetric GARCH models for the conditional variance).
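The recursion (4.38) is easy to sketch for $p = r = 1$ and a fixed level $\alpha$. In the code below the coefficient values, the initial quantile `q_init` and the asymmetry parameters are arbitrary placeholders chosen for illustration, not estimates.

```python
def g_asym(y, beta_pos=0.5, beta_neg=1.5):
    # Asymmetric news function g(y) = beta1 * y^+ + beta2 * (-y)^+.
    return beta_pos * max(y, 0.0) + beta_neg * max(-y, 0.0)


def caviar_path(ys, a0, a1, b1, q_init, g=g_asym):
    """Return q_0, ..., q_{n-1}, where q_0 = q_init and
    q_t = a0 + a1 * g(Y_{t-1}) + b1 * q_{t-1} for t >= 1."""
    qs = [q_init]
    for y_prev in ys[:-1]:
        qs.append(a0 + a1 * g(y_prev) + b1 * qs[-1])
    return qs


ys = [1.0, -1.0, 0.0]
qs = caviar_path(ys, a0=0.1, a1=0.2, b1=0.5, q_init=1.0)
```

With these placeholder values, an observation of $-1$ moves the next quantile three times as much as an observation of $+1$, because $\beta_2 = 3\beta_1$.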

If the polynomial $b(z) = 1 - \sum_{j=1}^{r} b_j z^j$ has all its roots outside the unit disk, it is invertible. Writing $a_\alpha(z) = \sum_{i=1}^{p} a_i(\alpha) z^i$ and $L$ for the lag operator, we have

$$q_t(\alpha) = b^{-1}(1)\, a_0(\alpha) + b^{-1}(L) \sum_{i=1}^{p} a_i(\alpha)\, g(Y_{t-i}) := a_0^*(\alpha) + \sum_{i=1}^{\infty} a_i^*(\alpha)\, g(Y_{t-i}). \tag{4.39}$$

Assuming that $g$ and the coefficients $b_j$ are positive and that the functions $a_i$ are increasing, $q_t(\cdot)$ is also increasing (because the coefficients of the series expansion of $b^{-1}(z)$ are then positive).

The existence of processes $(Y_t)$ admitting the conditional quantiles defined by (4.38) is not guaranteed in general. Under the above conditions, it suffices to look for processes $(Y_t)$ that are solutions of

$$Y_t = a_0^*(U_t) + \sum_{i=1}^{\infty} a_i^*(U_t)\, g(Y_{t-i}),$$

where $(U_t)$ is a sequence of iid variables with the $U(0, 1]$ distribution.

DAQ (Dynamic Additive Quantile) models

These models, introduced by Gouriéroux and Jasiak (2007), are written, with the previous notation,

$$q_t(\alpha) := F^{\leftarrow}_{Y_t}(\alpha \mid Y_{t-1}, Y_{t-2}, \ldots) = g_0(Y_{t-1}, Y_{t-2}, \ldots) + \sum_{i=1}^{p} a_i(\alpha)\, g_i(Y_{t-1}, Y_{t-2}, \ldots), \tag{4.40}$$

where the $a_i$ are quantile functions and the $g_i$ are nonnegative functions (e.g. with $g_0 > 0$) of the past variables. All these functions are parameterized.


Representation (4.39) shows that CAViaR models are special cases of the DAQ model. Other specifications of the functions $a_i$ and $g_i$ can produce ARCH effects. For example, using the quantile functions of the standard normal and standard Cauchy distributions, one can take

$$q_t(\alpha) = m_0 + m_1 |Y_{t-1} - \mu| + (\sigma_{0,0} + \sigma_{0,1} Y_{t-1}^2)^{1/2}\, \Phi^{-1}(\alpha) + (\sigma_{1,0} + \sigma_{1,1} Y_{t-1}^2)^{1/2} \tan[\pi(\alpha - 1/2)]. \tag{4.41}$$

Since the right-hand side of equation (4.40) is an increasing function of $\alpha$, it is possible to simulate the process $(Y_t)$ from the random coefficient model

$$Y_t = g_0(Y_{t-1}, Y_{t-2}, \ldots) + \sum_{i=1}^{p} a_i(U_t)\, g_i(Y_{t-1}, Y_{t-2}, \ldots),$$

where $(U_t)$ is iid $U[0, 1]$. For example, in the case of model (4.41) with $\sigma_{1,0} = \sigma_{1,1} = 0$, we obtain

$$Y_t = m_0 + m_1 |Y_{t-1} - \mu| + (\sigma_{0,0} + \sigma_{0,1} Y_{t-1}^2)^{1/2}\, \epsilon_t,$$

where $(\epsilon_t)$ is iid $\mathcal{N}(0, 1)$.
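The simulation scheme can be sketched as follows: the conditional quantile function (4.41) is evaluated at a uniform draw $U_t$. The parameter values are arbitrary placeholders chosen for illustration; by default the Cauchy component is switched off ($\sigma_{1,0} = \sigma_{1,1} = 0$), which corresponds to the Gaussian special case above.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def daq_quantile(alpha, y_prev, m0=0.0, m1=0.2, mu=0.0,
                 s00=0.5, s01=0.3, s10=0.0, s11=0.0):
    """Conditional quantile q_t(alpha) of model (4.41) given Y_{t-1} = y_prev."""
    gauss_scale = math.sqrt(s00 + s01 * y_prev ** 2)
    cauchy_scale = math.sqrt(s10 + s11 * y_prev ** 2)
    return (m0 + m1 * abs(y_prev - mu)
            + gauss_scale * STD_NORMAL.inv_cdf(alpha)
            + cauchy_scale * math.tan(math.pi * (alpha - 0.5)))


def simulate_daq(n, seed=0, **params):
    """Simulate Y_t = q_t(U_t) with U_t iid uniform on (0, 1)."""
    rng = random.Random(seed)
    y, path = 0.0, []
    for _ in range(n):
        u = rng.random()
        while u == 0.0:            # inv_cdf requires 0 < u < 1
            u = rng.random()
        y = daq_quantile(u, y, **params)
        path.append(y)
    return path


path = simulate_daq(1000)
levels = [k / 100 for k in range(1, 100)]
quantiles = [daq_quantile(a, 1.0, s10=0.1, s11=0.1) for a in levels]
```

Both $\Phi^{-1}$ and $\tan[\pi(\cdot - 1/2)]$ are increasing on $]0, 1[$ and are multiplied by nonnegative scales, so the simulated quantile curve is monotone by construction.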

The DAQ model can also be used to specify other dynamic risk measures. For example, model (4.41) yields the dynamic distortion risk measure (DRM)

$$r(F_t; G) = m_0 + m_1 |Y_{t-1} - \mu| + (\sigma_{0,0} + \sigma_{0,1} Y_{t-1}^2)^{1/2} \int_0^1 \Phi^{-1}(u)\, dG(u) + (\sigma_{1,0} + \sigma_{1,1} Y_{t-1}^2)^{1/2} \int_0^1 \tan[\pi(u - 1/2)]\, dG(u). \tag{4.42}$$

As for QAR models, DAQ models can be estimated by quantile regression. More efficient methods, based on the Kullback-Leibler information criterion, may also be used (see Gouriéroux and Jasiak, 2007).

Appendix A

Stationarity of GARCH(p, q) Processes

We will determine conditions under which stationary processes (in the strict and second-order senses) satisfying Definition 2.7 exist. We are mainly interested in nonanticipative solutions to Model (2.12), that is, solutions $(\epsilon_t)$ such that $\epsilon_t$ is a measurable function of the variables $\eta_{t-s}$, $s \ge 0$. For such processes, $\sigma_t$ is independent of the σ-field generated by $\{\eta_{t+h}, h \ge 0\}$ and $\epsilon_t$ is independent of the σ-field generated by $\{\eta_{t+h}, h > 0\}$. We first consider the GARCH(1,1) model, which can be dealt with in a more explicit way than the general case. Let, for $x > 0$, $\log^+ x = \max(\log x, 0)$.

A.1 Case of the GARCH(1,1) model

When p = q = 1, Model (2.12) has the form

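$$\begin{cases} \epsilon_t = \sigma_t \eta_t, \quad (\eta_t) \text{ iid } (0, 1), \\ \sigma_t^2 = \omega + \alpha \epsilon_{t-1}^2 + \beta \sigma_{t-1}^2, \end{cases} \tag{A.1}$$

with $\omega \ge 0$, $\alpha \ge 0$, $\beta \ge 0$. Let $a(z) = \alpha z^2 + \beta$.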

Theorem A.1 (Strict stationarity of the strong GARCH(1,1)) If

$$-\infty \le \gamma := E \log\{\alpha \eta_t^2 + \beta\} < 0, \tag{A.2}$$

the infinite sum

$$h_t = \left\{ 1 + \sum_{i=1}^{\infty} a(\eta_{t-1}) \cdots a(\eta_{t-i}) \right\} \omega \tag{A.3}$$

converges almost surely (a.s.) and the process $(\epsilon_t)$ defined by $\epsilon_t = \sqrt{h_t}\, \eta_t$ is the unique strictly stationary solution of Model (A.1). This solution is nonanticipative and ergodic.

If γ ≥ 0 and ω > 0, there exists no strictly stationary solution.

Remark A.1 (On the strict stationarity condition (A.2))

1. Note that condition (A.2) depends on the distribution of $\eta_t$ and that it is not symmetric in $\alpha$ and $\beta$.

2. In the ARCH(1) case ($\beta = 0$), the strict stationarity constraint is written

$$0 \le \alpha < \exp\{-E(\log \eta_t^2)\}. \tag{A.4}$$

For instance, when $\eta_t \sim \mathcal{N}(0, 1)$ the condition becomes $\alpha < 3.56$. For a distribution such that $E(\log \eta_t^2) = -\infty$, for instance one with a mass at 0, Condition (A.4) is always satisfied. For such distributions, a strictly stationary ARCH(1) solution exists whatever the value of $\alpha$.
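The numerical bound in (A.4) can be checked with a short sketch. For Gaussian noise, $\log \eta_t^2$ is the logarithm of a $\chi^2_1$ variable, whose mean is $-(\gamma_E + \log 2)$ with $\gamma_E$ the Euler constant, so the bound equals $2e^{\gamma_E} \approx 3.56$. The code compares this closed form with a Monte Carlo estimate, and also estimates $\gamma$ in (A.2) for a GARCH(1,1); the function names are illustrative.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def arch1_bound_gaussian():
    # exp(-E log eta^2) = 2 * exp(gamma_E) when eta ~ N(0, 1).
    return 2.0 * math.exp(EULER_GAMMA)


def arch1_bound_mc(n=200_000, seed=0):
    # Monte Carlo estimate of exp(-E log eta^2) for Gaussian noise.
    rng = random.Random(seed)
    mean_log = sum(math.log(rng.gauss(0.0, 1.0) ** 2) for _ in range(n)) / n
    return math.exp(-mean_log)


def garch11_gamma_mc(alpha, beta, n=200_000, seed=0):
    # Monte Carlo estimate of gamma = E log(alpha * eta^2 + beta) in (A.2).
    rng = random.Random(seed)
    return sum(math.log(alpha * rng.gauss(0.0, 1.0) ** 2 + beta)
               for _ in range(n)) / n


bound_exact = arch1_bound_gaussian()
bound_mc = arch1_bound_mc()
gamma_inside = garch11_gamma_mc(1.5, 0.0)    # alpha below the bound
gamma_outside = garch11_gamma_mc(4.0, 0.0)   # alpha above the bound
```

The case $\alpha = 1.5$, $\beta = 0$ gives a negative $\gamma$ even though $\alpha + \beta \ge 1$: the process is strictly stationary without being second-order stationary.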

Theorem A.2 (2nd order stationarity of the GARCH(1,1)) Let ω > 0.

If $\alpha + \beta \ge 1$, a nonanticipative and second-order stationary solution to the GARCH(1,1) model does not exist.

If $\alpha + \beta < 1$, we have $\gamma < 0$ and the strictly stationary solution of Theorem A.1 is second-order stationary. More precisely, $(\epsilon_t)$ is a white noise. Moreover, there exists no other second-order stationary and nonanticipative solution.

Figure A.1 shows the zones of strict and second-order stationarity for the strong GARCH(1,1) model when $\eta_t \sim \mathcal{N}(0, 1)$ (the distribution only matters for strict stationarity).

A.2 The general case

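In the general case of a strong GARCH($p$, $q$), the following vector representation will prove most useful. We have

$$z_t = b_t + A_t z_{t-1}, \tag{A.5}$$

where

$$b_t = b(\eta_t) = \begin{pmatrix} \omega \eta_t^2 \\ 0 \\ \vdots \\ \omega \\ 0 \\ \vdots \\ 0 \end{pmatrix} \in \mathbb{R}^{p+q}, \qquad z_t = \begin{pmatrix} \epsilon_t^2 \\ \vdots \\ \epsilon_{t-q+1}^2 \\ \sigma_t^2 \\ \vdots \\ \sigma_{t-p+1}^2 \end{pmatrix} \in \mathbb{R}^{p+q},$$

(the entry $\omega \eta_t^2$ is in position 1 and the entry $\omega$ in position $q + 1$) and

$$A_t = \begin{pmatrix}
\alpha_1 \eta_t^2 & \cdots & \cdots & \alpha_q \eta_t^2 & \beta_1 \eta_t^2 & \cdots & \cdots & \beta_p \eta_t^2 \\
1 & 0 & \cdots & 0 & 0 & \cdots & \cdots & 0 \\
0 & 1 & \cdots & 0 & 0 & \cdots & \cdots & 0 \\
\vdots & \ddots & \ddots & \vdots & \vdots & & & \vdots \\
0 & \cdots & 1 & 0 & 0 & \cdots & \cdots & 0 \\
\alpha_1 & \cdots & \cdots & \alpha_q & \beta_1 & \cdots & \cdots & \beta_p \\
0 & \cdots & \cdots & 0 & 1 & 0 & \cdots & 0 \\
0 & \cdots & \cdots & 0 & 0 & 1 & \cdots & 0 \\
\vdots & & & \vdots & \vdots & \ddots & \ddots & \vdots \\
0 & \cdots & \cdots & 0 & 0 & \cdots & 1 & 0
\end{pmatrix} \tag{A.6}$$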

is a $(p+q) \times (p+q)$ matrix. In the ARCH($q$) case, $z_t$ reduces to $\epsilon_t^2$ and its $q - 1$ first past values, and $A_t$ to the upper-left block of the above matrix. The distribution of $z_t$ conditional on its infinite past coincides with its distribution conditional on $z_{t-1}$ only, which means that $(z_t)$ is a Markov process. Model (A.5) is thus called the Markov representation of the GARCH($p$, $q$) model.

[Figure A.1: Stationarity regions for the GARCH(1,1) model when $\eta_t \sim \mathcal{N}(0, 1)$, in the $(\alpha_1, \beta_1)$ plane (horizontal axis $\alpha_1$, vertical axis $\beta_1$). 1: second-order stationarity; 1 and 2: strict stationarity; 3: non-stationarity.]
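The Markov representation can be checked numerically. The sketch below builds $A_t$ and $b_t$ as in (A.6) for given coefficients (assuming $p \ge 1$ and $q \ge 1$), simulates a GARCH path from the defining recursion with arbitrary pre-sample values, and measures the largest discrepancy between $z_t$ and $b_t + A_t z_{t-1}$. Function names and parameter values are illustrative.

```python
import random


def build_A_b(eta2, omega, alphas, betas):
    """Matrix A_t and vector b_t of (A.5)-(A.6), given eta_t^2 (p, q >= 1)."""
    q, p = len(alphas), len(betas)
    d = p + q
    A = [[0.0] * d for _ in range(d)]
    A[0] = [a * eta2 for a in alphas] + [b * eta2 for b in betas]
    for i in range(1, q):
        A[i][i - 1] = 1.0              # shifts the lagged eps^2 values
    A[q] = list(alphas) + list(betas)
    for j in range(1, p):
        A[q + j][q + j - 1] = 1.0      # shifts the lagged sigma^2 values
    b = [0.0] * d
    b[0] = omega * eta2
    b[q] = omega
    return A, b


def check_markov(omega, alphas, betas, n=50, seed=0):
    """Largest absolute gap between z_t and b_t + A_t z_{t-1} along a path."""
    rng = random.Random(seed)
    q, p = len(alphas), len(betas)
    eps2 = [1.0] * q                   # arbitrary pre-sample values, latest last
    sig2 = [1.0] * p
    max_err = 0.0
    for _ in range(n):
        z_prev = eps2[::-1][:q] + sig2[::-1][:p]
        s2 = (omega
              + sum(a * e for a, e in zip(alphas, eps2[::-1]))
              + sum(b * s for b, s in zip(betas, sig2[::-1])))
        eta2 = rng.gauss(0.0, 1.0) ** 2
        eps2.append(s2 * eta2)
        sig2.append(s2)
        z_t = eps2[::-1][:q] + sig2[::-1][:p]
        A, b = build_A_b(eta2, omega, alphas, betas)
        rhs = [b[i] + sum(A[i][j] * z_prev[j] for j in range(p + q))
               for i in range(p + q)]
        max_err = max(max_err, max(abs(u - v) for u, v in zip(z_t, rhs)))
    return max_err


err_23 = check_markov(0.1, [0.1, 0.05], [0.6, 0.1, 0.05])   # q = 2, p = 3
err_11 = check_markov(0.1, [0.1], [0.8])                    # GARCH(1,1)
```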

Iterating (A.5) yields

$$z_t = b_t + \sum_{k=1}^{\infty} A_t A_{t-1} \cdots A_{t-k+1}\, b_{t-k}, \tag{A.7}$$

provided that the series exists almost surely. Finding conditions ensuring the existence of this series is the object of what follows. Notice that the existence of the right-hand vector in (A.7) does not ensure that its components are positive. One sufficient condition for

$$b_t + \sum_{k=1}^{\infty} A_t A_{t-1} \cdots A_{t-k+1}\, b_{t-k} > 0, \quad \text{a.s.} \tag{A.8}$$

in the sense that all the components of this vector are strictly positive (but possibly infinite), is that

$$\omega > 0, \quad \alpha_i \ge 0 \ (i = 1, \ldots, q), \quad \beta_j \ge 0 \ (j = 1, \ldots, p). \tag{A.9}$$

Strict stationarity

The main tool for studying strict stationarity is the concept of top Lyapounov exponent. Let $A$ be a $(p+q) \times (p+q)$ matrix. The spectral radius of $A$, denoted by $\rho(A)$, is defined as the greatest modulus of its eigenvalues. Let $\| \cdot \|$ denote any norm on the space of $(p+q) \times (p+q)$ matrices. We have the following algebraic result:

$$\lim_{t \to \infty} \frac{1}{t} \log \|A^t\| = \log \rho(A). \tag{A.10}$$

This property has the following extension to random matrices.

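Theorem A.3 Let $\{A_t, t \in \mathbb{Z}\}$ be a strictly stationary and ergodic sequence of random matrices, such that $E \log^+ \|A_t\|$ is finite. We have

$$\lim_{t \to \infty} \frac{1}{t} E(\log \|A_t A_{t-1} \cdots A_1\|) = \gamma = \inf_{t \in \mathbb{N}^*} \frac{1}{t} E(\log \|A_t A_{t-1} \cdots A_1\|) \tag{A.11}$$

and $\gamma$ (resp. $\exp(\gamma)$) is called the top Lyapounov exponent (resp. spectral radius) of the sequence of matrices $\{A_t, t \in \mathbb{Z}\}$. Moreover,

$$\gamma = \lim_{t \to \infty} \frac{1}{t} \log \|A_t A_{t-1} \cdots A_1\| \quad \text{a.s.} \tag{A.12}$$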


Remark A.2 (On the top Lyapounov exponent γ)

1. It always holds that γ ≤ E(log ‖A1‖), with equality in dimension 1.

2. If At = A for all t ∈ Z, we have γ = log ρ(A) in view of (A.10).

3. All norms on a finite-dimensional space being equivalent, it readily follows that $\gamma$ is independent of the norm chosen.

As for ARMA models, we are mostly interested in the nonanticipative solutions $(\epsilon_t)$ to Model (2.12), that is, those for which $\epsilon_t$ belongs to the σ-field generated by $\eta_t, \eta_{t-1}, \ldots$.

Theorem A.4 (Strict stationarity of the GARCH(p, q) model) A necessary and sufficient condition for the existence of a strictly stationary GARCH($p$, $q$) solution process of Model (2.12) with $\omega > 0$ is that

$$\gamma < 0,$$

where $\gamma$ is the top Lyapounov exponent of the sequence $\{A_t, t \in \mathbb{Z}\}$ defined by (A.6).

When the strictly stationary solution exists, it is unique, nonanticipative and ergodic.

We now give two illustrations in which more explicit stationarity conditions than in the theorem can be obtained.

Example A.1 (GARCH(1,1)) In the GARCH(1,1) case, we retrieve the strict stationarity condition already obtained. The matrix $A_t$ is written, in this case,

$$A_t = (\eta_t^2, 1)' (\alpha_1, \beta_1).$$

We hence have

$$A_t A_{t-1} \cdots A_1 = \prod_{k=1}^{t-1} (\alpha_1 \eta_{t-k}^2 + \beta_1)\, A_t.$$

It follows that

$$\log \|A_t A_{t-1} \cdots A_1\| = \sum_{k=1}^{t-1} \log(\alpha_1 \eta_{t-k}^2 + \beta_1) + \log \|A_t\|$$

and, in view of (A.12) and by the strong law of large numbers, $\gamma = E \log(\alpha_1 \eta_t^2 + \beta_1)$. The necessary and sufficient condition for strict stationarity is then $E \log(\alpha_1 \eta_t^2 + \beta_1) < 0$, as obtained above.


[Figure A.2: Stationarity regions for the ARCH(2) model, in the $(\alpha_1, \alpha_2)$ plane (horizontal axis $\alpha_1$, vertical axis $\alpha_2$). 1: second-order stationarity; 1 and 2: strict stationarity; 3: non-stationarity.]

Example A.2 (ARCH(2)) For an ARCH(2) model, the matrix $A_t$ takes the form

$$A_t = \begin{pmatrix} \alpha_1 \eta_t^2 & \alpha_2 \eta_t^2 \\ 1 & 0 \end{pmatrix}$$

and the stationarity region can be evaluated by simulation. Figure A.2, constructed from simulations, gives a more precise idea of the strict stationarity region for an ARCH(2) process.
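A possible way to carry out such a simulation is sketched below. It estimates $\gamma$ through the almost sure limit (A.12), renormalizing the running matrix product at each step to avoid numerical overflow or underflow. Gaussian noise, the entrywise sum norm and the sample size are assumptions made for this illustration, and the function name is hypothetical. The coefficients $\alpha_1$ and $\alpha_2$ should not both be zero.

```python
import math
import random


def lyapunov_arch2(alpha1, alpha2, n=100_000, seed=0):
    """Monte Carlo estimate of the top Lyapounov exponent of the ARCH(2)
    matrices A_t = [[alpha1 * eta_t^2, alpha2 * eta_t^2], [1, 0]]."""
    rng = random.Random(seed)
    m = [[1.0, 0.0], [0.0, 1.0]]       # normalized running product
    log_norm = 0.0
    for _ in range(n):
        e2 = rng.gauss(0.0, 1.0) ** 2
        # Left-multiply the running product by A_t.
        top = [e2 * (alpha1 * m[0][j] + alpha2 * m[1][j]) for j in range(2)]
        bottom = m[0][:]
        m = [top, bottom]
        s = sum(abs(x) for row in m for x in row)   # entrywise sum norm
        log_norm += math.log(s)
        m = [[x / s for x in row] for row in m]     # renormalize
    return log_norm / n


gamma_weak = lyapunov_arch2(0.3, 0.3)     # alpha1 + alpha2 < 1
gamma_arch1 = lyapunov_arch2(1.0, 0.0)    # reduces to E log eta^2
gamma_large = lyapunov_arch2(4.0, 0.5)
```

Evaluating the sign of the estimate on a grid of $(\alpha_1, \alpha_2)$ values gives an approximation of the strict stationarity region. When $\alpha_2 = 0$ the estimate should agree with the ARCH(1) value $\log \alpha_1 + E \log \eta_t^2$.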

Second-order stationarity

Theorem A.5 (2nd-order stationarity) If there exists a GARCH($p$, $q$) process, in the sense of Definition 2.7, which is second-order stationary and nonanticipative, and if $\omega > 0$, then

$$\sum_{i=1}^{q} \alpha_i + \sum_{j=1}^{p} \beta_j < 1. \tag{A.13}$$

Conversely, if (A.13) holds, the unique strictly stationary solution of Model (2.12) is a weak white noise (and thus is second-order stationary). In addition, there exists no other nonanticipative and second-order stationary solution.


Remark A.3 (On the second-order stationarity of GARCH)

1. Under the conditions of Theorem A.5, the unique stationary solution of Model (2.12) is a white noise with variance

$$\mathrm{Var}(\epsilon_t) = \frac{\omega}{1 - \sum_{i=1}^{q} \alpha_i - \sum_{j=1}^{p} \beta_j}.$$

2. Since the conditions in Theorems A.4 and A.5 are necessary and sufficient, we necessarily have

$$\left[ \sum_{i=1}^{q} \alpha_i + \sum_{j=1}^{p} \beta_j < 1 \right] \Rightarrow \gamma < 0,$$

since the second-order stationary solution of Theorem A.5 is also strictly stationary. One can directly check this implication by noting that if (A.13) is true, the previous proof shows that the spectral radius $\rho(E A_t)$ is strictly less than 1. Moreover, using a result by Kesten and Spitzer (1984, (1.4)), we always have

$$\gamma \le \log \rho(E A_t). \tag{A.14}$$

Appendix B

Quantile

B.1 Quantile function

For any real random variable $X$, the cdf (cumulative distribution function) is defined by

$$F : x \mapsto F(x) = P[X \le x].$$

The cdf is increasing, right continuous with a left limit at every point. If $X$ admits a density, $F$ is continuous. If this density is positive on $\mathbb{R}$, $F$ is strictly increasing.

We define the generalized inverse of an increasing function $T$, from $\mathbb{R}$ to $\mathbb{R}$, by

$$T^{\leftarrow}(y) = \inf\{x \in \mathbb{R} \mid T(x) \ge y\},$$

with the convention that the infimum of an empty set is $+\infty$. Thus $T^{\leftarrow}$ is left continuous.

Definition B.1 The generalized inverse of a cdf $F$,

$$F^{\leftarrow}(\alpha) = \inf\{x \in \mathbb{R} \mid F(x) \ge \alpha\}, \quad 0 < \alpha < 1,$$

is called the quantile function. The number $x(\alpha) = F^{\leftarrow}(\alpha)$ is the $\alpha$-quantile of $F$.

If $F$ is continuous and strictly increasing, we simply have $F^{\leftarrow}(\alpha) = F^{-1}(\alpha)$. We have the following characterization:

$$x_0 = F^{\leftarrow}(\alpha) \iff F(x_0) \ge \alpha \text{ and } F(x) < \alpha, \ \forall x < x_0.$$

For a distribution with a positive density on $\mathbb{R}$, the quantile of order $\alpha$ is characterized by $P[X < F^{\leftarrow}(\alpha)] = \alpha$.

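Example B.1 If $X \sim \mathcal{N}(\mu, \sigma^2)$ then $x(\alpha) = \mu + \sigma \Phi^{-1}(\alpha)$, where $\Phi$ is the cdf of the $\mathcal{N}(0, 1)$ distribution. If $X$ is a loss, $\mu$ is interpreted as the anticipated loss and $\sigma \Phi^{-1}(\alpha)$ as the unanticipated loss.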

We have the following properties, which can be easily proved: for all 0 < α < 1

• if X = c where c is a constant, F←X (α) = c,

• if Y = X + c where c is a constant, F←Y (α) = F←X (α) + c,

• if Y = λX where λ ≥ 0 is a constant, F←Y (α) = λF←X (α).

Using the property: F (x) ≥ α ⇔ x ≥ F←(α), we prove the following result.

Proposition B.1 Let $U$ be a uniform variable on $[0, 1]$ and $X$ be a variable with cdf $F$. Then

(i) $P[F^{\leftarrow}(U) \le x] = F(x)$, that is, $F^{\leftarrow}(U)$ has the same distribution as $X$.

(ii) If $F$ is continuous, $F(X)$ has the same distribution as $U$.
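Part (i) is the basis of simulation by inversion. The sketch below implements the generalized inverse of a discrete cdf and checks that the empirical frequencies of $F^{\leftarrow}(U)$ match the target probabilities. The three-point distribution is an arbitrary example and the function names are illustrative.

```python
import bisect
import random
from itertools import accumulate


def make_quantile_function(values, probs):
    """Return F^{<-} for the discrete law P[X = values[i]] = probs[i],
    where values are sorted in increasing order."""
    cdf = list(accumulate(probs))

    def quantile(alpha):
        # First index i such that F(values[i]) >= alpha.
        i = bisect.bisect_left(cdf, alpha)
        return values[min(i, len(values) - 1)]

    return quantile


values, probs = [-1.0, 0.0, 2.0], [0.2, 0.5, 0.3]
quantile = make_quantile_function(values, probs)

rng = random.Random(0)
n = 100_000
draws = [quantile(rng.random()) for _ in range(n)]
freqs = [draws.count(v) / n for v in values]
```

Note that `quantile(0.2)` returns the smallest atom, since $F(-1) = 0.2 \ge 0.2$, in line with the characterization of $F^{\leftarrow}$ given after Definition B.1.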

B.2 Aggregation of quantile functions

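Quantile functions (and more generally, risk measures) of sums of variables are not related, in general, to the quantiles of the individual variables.

Consider the Gaussian case. Suppose

$$\begin{pmatrix} X \\ Y \end{pmatrix} \sim \mathcal{N}\left( \begin{pmatrix} \mu_X \\ \mu_Y \end{pmatrix}, \begin{pmatrix} \sigma_X^2 & \rho \sigma_X \sigma_Y \\ \rho \sigma_X \sigma_Y & \sigma_Y^2 \end{pmatrix} \right);$$

then

$$F^{\leftarrow}_X(\alpha) = \mu_X + \sigma_X \Phi^{-1}(\alpha),$$

$$F^{\leftarrow}_Y(\alpha) = \mu_Y + \sigma_Y \Phi^{-1}(\alpha),$$

$$F^{\leftarrow}_{X+Y}(\alpha) = \mu_X + \mu_Y + (\sigma_X^2 + \sigma_Y^2 + 2 \rho \sigma_X \sigma_Y)^{1/2}\, \Phi^{-1}(\alpha).$$

The quantile of the sum depends on $\rho$, so there is no analytic relationship between the quantiles of the sum and those of $X$ and $Y$ alone (unless $\rho = 1$, but then the centered variables $X - \mu_X$ and $Y - \mu_Y$ are proportional). Since $(\sigma_X^2 + \sigma_Y^2 + 2 \rho \sigma_X \sigma_Y)^{1/2} \le \sigma_X + \sigma_Y$, we have, for $\alpha \ge 1/2$ (so that $\Phi^{-1}(\alpha) \ge 0$),

$$F^{\leftarrow}_{X+Y}(\alpha) \le F^{\leftarrow}_X(\alpha) + F^{\leftarrow}_Y(\alpha).$$

For Gaussian variables, VaR thus satisfies the subadditivity property (see Definition 3.3). It is easy to construct an example of discrete variables for which this property fails.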

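Example B.2 Let $X$, $Y$ be two independent and identically distributed random variables such that

$$P[X = 0] = 1 - p, \quad P[X = 1] = p, \quad 0 < p < 1.$$

We have $P[X + Y = 0] = (1 - p)^2$. Hence, for $(1 - p)^2 < \alpha \le 1 - p$, $F^{\leftarrow}_X(\alpha) = F^{\leftarrow}_Y(\alpha) = 0$ and $F^{\leftarrow}_{X+Y}(\alpha) \ge 1$.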
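The example can be checked directly. The sketch below uses the illustrative values $p = 0.1$ and $\alpha = 0.85$, which satisfy $(1-p)^2 = 0.81 < \alpha \le 0.9 = 1 - p$, and computes the three quantiles from the exact distributions.

```python
def quantile(values, probs, alpha):
    """Generalized inverse inf{x : F(x) >= alpha} of a discrete cdf
    (values sorted in increasing order)."""
    cum = 0.0
    for x, pr in zip(values, probs):
        cum += pr
        if cum >= alpha:
            return x
    return values[-1]


p, alpha = 0.1, 0.85

# Distribution of X (and of Y), then of the sum X + Y of two iid copies.
q_x = quantile([0, 1], [1 - p, p], alpha)
q_sum = quantile([0, 1, 2], [(1 - p) ** 2, 2 * p * (1 - p), p ** 2], alpha)

subadditive = q_sum <= q_x + q_x
```

Here the quantile of the sum is 1 while the sum of the individual quantiles is 0, so subadditivity fails at this level.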

B.3 Derivatives of the quantile of a linear combination of random variables

Lemma B.1 (Gouriéroux, Laurent, Scaillet, 2000) Let $(X, Y) \in \mathbb{R}^2$ be a random vector with density $f$ and such that $E|Y| < \infty$. Let $Q(\epsilon, \alpha)$ be the quantile defined by

$$P[X + \epsilon Y > Q(\epsilon, \alpha)] = \alpha.$$

Then

$$\frac{\partial Q(\epsilon, \alpha)}{\partial \epsilon} = E[Y \mid X + \epsilon Y = Q(\epsilon, \alpha)].$$

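Proof: We have

$$\alpha = \int \left( \int_{Q(\epsilon, \alpha) - \epsilon y}^{+\infty} f(x, y)\, dx \right) dy.$$

Differentiating this equality with respect to $\epsilon$ we obtain

$$0 = \int f(Q(\epsilon, \alpha) - \epsilon y, y) \left( \frac{\partial Q(\epsilon, \alpha)}{\partial \epsilon} - y \right) dy.$$

Hence

$$\frac{\partial Q(\epsilon, \alpha)}{\partial \epsilon} = \frac{\int y\, f(Q(\epsilon, \alpha) - \epsilon y, y)\, dy}{\int f(Q(\epsilon, \alpha) - \epsilon y, y)\, dy} = E[Y \mid X + \epsilon Y = Q(\epsilon, \alpha)].$$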
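As a worked check, suppose in addition (an assumption not made in the lemma) that $(X, Y)$ is Gaussian with means $\mu_X, \mu_Y$, standard deviations $\sigma_X, \sigma_Y$ and correlation $\rho$, with $\sigma_\epsilon > 0$. The notation $\sigma_\epsilon$ is introduced here only for convenience. Both sides of the lemma can then be computed in closed form, using the linear regression formula for Gaussian conditional expectations:

$$
\begin{aligned}
\sigma_\epsilon^2 &:= \mathrm{Var}(X + \epsilon Y) = \sigma_X^2 + 2 \epsilon \rho \sigma_X \sigma_Y + \epsilon^2 \sigma_Y^2, \\
Q(\epsilon, \alpha) &= \mu_X + \epsilon \mu_Y + \sigma_\epsilon\, \Phi^{-1}(1 - \alpha), \\
\frac{\partial Q(\epsilon, \alpha)}{\partial \epsilon} &= \mu_Y + \frac{\rho \sigma_X \sigma_Y + \epsilon \sigma_Y^2}{\sigma_\epsilon}\, \Phi^{-1}(1 - \alpha), \\
E[Y \mid X + \epsilon Y = Q(\epsilon, \alpha)] &= \mu_Y + \frac{\mathrm{Cov}(Y, X + \epsilon Y)}{\sigma_\epsilon^2}\, \{Q(\epsilon, \alpha) - \mu_X - \epsilon \mu_Y\} \\
&= \mu_Y + \frac{\rho \sigma_X \sigma_Y + \epsilon \sigma_Y^2}{\sigma_\epsilon}\, \Phi^{-1}(1 - \alpha).
\end{aligned}
$$

The two expressions coincide, as the lemma states.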

B.4 Non absolutely continuous variables

In the case of loss variables whose distribution is not absolutely continuous with respect to the Lebesgue measure, or for continuous variables whose density is not everywhere strictly positive, it may be necessary to distinguish several quantiles.

Call inferior $\alpha$-quantile of $X$ the number

$$q_\alpha = F^{\leftarrow}(\alpha) = \inf\{x \in \mathbb{R} \mid F(x) \ge \alpha\}$$

and superior $\alpha$-quantile of $X$ the number

$$q^\alpha = \inf\{x \in \mathbb{R} \mid F(x) > \alpha\}.$$

We have of course $q_\alpha \le q^\alpha$, with equality if and only if $F(x) = \alpha$ for at most one $x$.

Similarly we define two notions of VaR for a loss $X$:

$$\mathrm{VaR}_\alpha = q_{1-\alpha}, \qquad \mathrm{VaR}^\alpha = q^{1-\alpha}.$$

We define the "tail conditional expectations" (TCE) of a variable $X$ such that $E(X^+) < \infty$ by

$$\mathrm{TCE}_\alpha = E[X \mid X \ge \mathrm{VaR}_\alpha], \qquad \mathrm{TCE}^\alpha = E[X \mid X \ge \mathrm{VaR}^\alpha].$$

We have of course $\mathrm{TCE}_\alpha \le \mathrm{TCE}^\alpha$. It is possible to construct simple examples showing that $\mathrm{TCE}_\alpha$ is not a sub-additive risk measure (see Acerbi and Tasche, 2002). Under certain regularity conditions, it can be shown that the smallest coherent risk measure that is larger than the VaR is the "worst conditional expectation" (WCE), defined for a loss $X$ by

$$\mathrm{WCE}_\alpha = \sup\{E[X \mid A] ;\ P(A) > \alpha\}$$

(see Delbaen, 2002).

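Finally, we define the expected shortfall, for $X$ such that $E(X^+) < +\infty$, by

$$\mathrm{ES}_\alpha = \frac{1}{\alpha} \left\{ E(X\, \mathbf{1}_{X \ge s}) + s\, (\alpha - P[X \ge s]) \right\}, \quad s \in [q_{1-\alpha}, q^{1-\alpha}], \tag{B.1}$$

noting that the definition is independent of the choice of $s$. We have the following property.

Proposition B.2 Let $X$ be such that $E(X^+) < +\infty$. Then

$$\mathrm{ES}_\alpha = \frac{1}{\alpha} \int_0^\alpha \mathrm{VaR}_u\, du. \tag{B.2}$$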

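Proof: By (B.1) with $s = \mathrm{VaR}_\alpha = q_{1-\alpha}$, if $U \sim U_{[0,1]}$,

$$
\begin{aligned}
\alpha\, \mathrm{ES}_\alpha &= E(X\, \mathbf{1}_{X \ge \mathrm{VaR}_\alpha}) + \mathrm{VaR}_\alpha\, (\alpha - P[X \ge \mathrm{VaR}_\alpha]) \\
&= E(X\, \mathbf{1}_{X \ge \mathrm{VaR}_\alpha}) + \mathrm{VaR}_\alpha\, E(\mathbf{1}_{U \le \alpha} - \mathbf{1}_{X \ge \mathrm{VaR}_\alpha}) \\
&= E(X\, \mathbf{1}_{X \ge \mathrm{VaR}_\alpha}) - \mathrm{VaR}_\alpha\, E(\mathbf{1}_{q_{1-U} \ge q_{1-\alpha}} - \mathbf{1}_{1-U \ge 1-\alpha}) \\
&= E(X\, \mathbf{1}_{X \ge \mathrm{VaR}_\alpha}) - q_{1-\alpha}\, E(\mathbf{1}_{(1-U < 1-\alpha) \cap (q_{1-U} \ge q_{1-\alpha})}) \\
&= E(q_{1-U}\, \mathbf{1}_{q_{1-U} \ge q_{1-\alpha}}) - E(q_{1-U}\, \mathbf{1}_{(1-U < 1-\alpha) \cap (q_{1-U} \ge q_{1-\alpha})}) \\
&= E(q_{1-U}\, \mathbf{1}_{U \le \alpha}),
\end{aligned}
$$

since $F^{\leftarrow}(1 - U) = q_{1-U}$ has the same distribution as $X$ (see Proposition B.1), and since $q_{1-U} = q_{1-\alpha}$ on the event $(1-U < 1-\alpha) \cap (q_{1-U} \ge q_{1-\alpha})$. The conclusion follows because $E(q_{1-U}\, \mathbf{1}_{U \le \alpha}) = \int_0^\alpha q_{1-u}\, du = \int_0^\alpha \mathrm{VaR}_u\, du$.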
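The equality between (B.1) and (B.2) can be verified numerically on a discrete loss distribution, for which the quantile function is a step function and the integral in (B.2) is an exact finite sum. The three-point distribution and the level $\alpha = 0.05$ below are arbitrary illustrative choices, and the function names are hypothetical.

```python
def lower_quantile(values, probs, level):
    """Inferior quantile inf{x : F(x) >= level} (values sorted increasingly)."""
    cum = 0.0
    for x, pr in zip(values, probs):
        cum += pr
        if cum >= level:
            return x
    return values[-1]


def es_via_b1(values, probs, alpha):
    # Definition (B.1) with s = q_{1-alpha} = VaR_alpha.
    s = lower_quantile(values, probs, 1.0 - alpha)
    tail_mean = sum(x * pr for x, pr in zip(values, probs) if x >= s)
    tail_prob = sum(pr for x, pr in zip(values, probs) if x >= s)
    return (tail_mean + s * (alpha - tail_prob)) / alpha


def es_via_b2(values, probs, alpha):
    # Formula (B.2): (1/alpha) * integral of q_v over v in (1 - alpha, 1],
    # where q_v equals values[i] for v in (F_{i-1}, F_i].
    total, cdf_prev = 0.0, 0.0
    for x, pr in zip(values, probs):
        cdf = cdf_prev + pr
        total += x * max(0.0, cdf - max(cdf_prev, 1.0 - alpha))
        cdf_prev = cdf
    return total / alpha


values, probs, alpha = [0.0, 1.0, 10.0], [0.9, 0.06, 0.04], 0.05
es1 = es_via_b1(values, probs, alpha)
es2 = es_via_b2(values, probs, alpha)
tce = (sum(x * pr for x, pr in zip(values, probs) if x >= 1.0)
       / sum(pr for x, pr in zip(values, probs) if x >= 1.0))
```

In this example $\mathrm{VaR}_\alpha = 1$ and $P[X \ge \mathrm{VaR}_\alpha] = 0.10 > \alpha$, so the correction term in (B.1) is active: the expected shortfall is 8.2 whereas $E[X \mid X \ge \mathrm{VaR}_\alpha]$ is only 4.6.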

B.5 Central limit theorem for triangular arrays

Let $X_{n1}, \ldots, X_{n r_n}$ be a sequence of centered variables, called a triangular array (because usually $r_n \le n$). We set, for all $n$ and $k = 1, \ldots, r_n$,

$$E(X_{nk}) = 0, \quad E(X_{nk}^2) = \sigma_{nk}^2, \quad s_n^2 = \sum_{k=1}^{r_n} \sigma_{nk}^2 \tag{B.3}$$

and assume that $s_n > 0$ for all $n$. Consider the following condition, known as Lindeberg's condition:

$$\forall \epsilon > 0, \quad \lim_{n \to \infty} \sum_{k=1}^{r_n} \frac{1}{s_n^2} E(X_{nk}^2\, \mathbf{1}_{|X_{nk}| > \epsilon s_n}) = 0. \tag{B.4}$$

The following theorem is proved in Billingsley (1995) (see footnote 1) and generalizes the classical CLT for iid variables.

Theorem B.1 Suppose that for all $n$ the variables $X_{n1}, \ldots, X_{n r_n}$ are independent and satisfy (B.3) and (B.4). Then

$$(X_{n1} + \cdots + X_{n r_n}) / s_n \xrightarrow{d} \mathcal{N}(0, 1).$$

1. Probability and Measure, 3rd edition, John Wiley.

Appendix C

References

Textbooks:

Brockwell, P. J. and Davis, R. A. (1991) Time Series: Theory and Methods, Springer-Verlag, New York, 2nd edition.

Francq, C. and J.M. Zakoïan (2010) GARCH Models: Structure, Statistical Inference and Financial Applications. John Wiley.

Gouriéroux, C. and J. Jasiak (2001) Financial Econometrics, Princeton University Press.

Koenker, R. (2005) Quantile Regression, Cambridge University Press.

McNeil, A.J., Frey, R. and P. Embrechts (2005) Quantitative Risk Management, Princeton University Press.

References for Chapter 2:

Bollerslev, T. (1986) Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics 31, 307-327.

Brockwell, P. J. and Davis, R. A. (1991) Time Series: Theory and Methods, Springer-Verlag, New York, 2nd edition.

Engle, R.F. (1982) Autoregressive Conditional Heteroskedasticity with Estimates of the Variance of U.K. Inflation, Econometrica 50, 987-1008.

Mandelbrot, B. (1963) The variation of certain speculative prices. Journal of Business 36, 394-419.


Acerbi, C. and D. Tasche (2002) On the coherence of expected shortfall. Journal of Banking and Finance 26, 1487-1503.

Artzner, P., Delbaen, F., Eber, J-M. and D. Heath (1999) Coherent measures of risk. Mathematical Finance 9, 203-228.

Delbaen, F. (2002) Coherent measures of risk on general probability spaces. In: Advances in Finance and Stochastics, Essays in Honor of D. Sondermann, Springer, 1-37.

Gouriéroux, C., Laurent, J-P. and O. Scaillet (2000) Sensitivity Analysis of VaR, Journal of Empirical Finance 7, 225-246.

Gouriéroux, C. and W. Liu (2006) Sensitivity analysis of distortion risk measures, unpublished document.

Kusuoka, S. (2001) On law invariant coherent risk measures. In: Advances in Mathematical Economics, vol. 3, 83-95, Springer, Tokyo.

Tasche, D. (2002) Expected shortfall and beyond. Journal of Banking and Finance 26, 1519-1533.

Wang, S. and J. Dhaene (1998) Comonotonicity, correlation order and premium principles. Insurance: Mathematics and Economics 22, 235-242.

Wang, S., Young, V. and H. Panjer (1997) Axiomatic Characterization of Insurance Prices. Insurance: Mathematics and Economics 21, 173-183.


Bahadur, R.R. (1966) A note on quantiles in large samples. Annals of Mathematical Statistics 37, 577-580.

De Rossi, G. and A. Harvey (2006) Time-varying quantiles. Discussion paper CWPE 0649.

Engle, R.F. and S. Manganelli (2004) CAViaR: Conditional Autoregressive Value at Risk by regression quantiles. Journal of Business and Economic Statistics 22, 367-381.

Gouriéroux, C. and J. Jasiak (2007) Dynamic quantile models. To appear in Journal of Econometrics.

Gouriéroux, C. and W. Liu (2007) Converting Tail-VaR to VaR: an econometric study. Unpublished document.

Koenker, R. and Z. Xiao (2002) Inference on the quantile regression process. Econometrica 70, 1583-1612.

Koenker, R. and Z. Xiao (2006) Quantile autoregression. Journal of the American Statistical Association 101, 980-990.

Koenker, R. and Q. Zhao (1996) Conditional quantile estimation and inference for ARCH models. Econometric Theory 12, 793-813.

Kuester, K., Mittnik, S. and M.S. Paolella (2006) Value at Risk prediction: a comparison of alternative strategies. Journal of Financial Econometrics 4, 53-89.

Martin, R. and T. Wilde (2002) Unsystematic Credit Risk, Risk Magazine 15, 123-128.

Scaillet, O. Nonparametric estimation of conditional expected shortfall. Insurance and Risk Management Journal 74, 639-660.

Wilde, T. (2001) Probing Granularity. Risk 14, 103-106.

Wu, W.B. (2005) On the Bahadur representation of sample quantiles for dependent sequences. The Annals of Statistics 33, 1934-1963.
