Value-at-Risk, Expected Shortfall and Density Forecasting … · I Value-at-Risk I Expected Shortfall I Density Estimation 3 / 44. Value-at-Risk. Value-at-Risk Value-at-Risk is a

Value-at-Risk, Expected Shortfalland Density Forecasting

Kevin Sheppardhttp://www.kevinsheppard.com

Oxford MFEThis version: February 20, 2020

February 2020

Risk Measurement Overview� What is risk?� What is Value-at-Risk?� How can VaR be measured and modeled?� How can VaR models be tested?� What is Expected Shortfall?� How can densities be forecasted?� How can density models be evaluated?� What is a coherent risk measure?

2 / 44

Risk� What is risk?� Market Risk

I Liquidity RiskI Credit RiskI Counterparty RiskI Model RiskI Estimation Risk

� Today’s focus: Market Risk� Tools

I Value-at-RiskI Expected ShortfallI Density Estimation

3 / 44

Value-at-Risk

Value-at-Risk� Value-at-Risk is a standard tool of risk management

I Basel Accord

Definition (Value-at-Risk)

The α Value-at-Risk (V aR) of a portfolio is defined as the largestnumber such that the probability that the loss in portfolio value oversome period of time is greater than the V aR is α,

Pr(R < −V aR) = α

where R = W1 −W0 is the total return on the portfolio, Wt, t = 0, 1, isthe value of the assets in the portfolio and 1 and 0 measure anarbitrary time span (e.g. one day or two weeks).

� Units are $, £, ¥� Almost always positive� It is a quantile

4 / 44

Kevin

Kevin

Value-at-Risk in a picture� Returns are N(.001, .0152)� W0 is £10,000,000

9.7 9.8 9.9 10.0 10.1 10.2 10.3 Portfolio Value (£ mil)

5% VaR

Previous Wealth

Expected Wealth

Distribution of Wealth5% quantileValue-at-Risk

5 / 44

Kevin

Kevin

Kevin

Kevin

Kevin

Kevin

Kevin

Percent Value-at-Risk� Value-at-Risk can be normalized and reported as a %

Definition (Percentage Value-at-Risk)

The α percentage Value-at-Risk (%V aR) of a portfolio is defined as thelargest return such that the probability that the return on the portfolio oversome period of time is less than -%V aR is α,

Pr(r < −%V aR) = α

where r is the percentage return on the portfolio. %V aR can be equivalentlydefined as %V aR = V aR/W0.

� Units are returns (no units)� Also almost always positive� Lets VaR be interpreted without knowing the value of the portfolio, W0

I No meaningful loss of information from standard VaRI Used throughout rest of lecture in place of formal definition of VaR

6 / 44

Kevin

Kevin

Kevin

Relationship between Quantiles and VaR� VaR is a quantile

I Quantile of the future distribution

Definition (α-Quantile)

The α-quantile of a random variable X is defined as the smallestnumber qα such that

Pr(X ≤ qα) = α

� Other “-iles”I TercileI QuartileI QuintileI DecileI Percentile

7 / 44

Kevin

Kevin

Kevin

Kevin

Conditional and Unconditional VaR� Condiitonal VaR is similar to conditional mean or conditional

variance

Definition (Conditional Value-at-Risk)

The conditional α Value-at-Risk is defined as

Pr(rt+1 < −V aRt+1|t|Ft) = α

where rt+1 = (Wt+1 −Wt) /Wt is the return on a portfolio at time t+ 1.

� t is an arbitrary measure of time⇒ t+ 1 also refers to an arbitraryunit of timeI day, two-weeks, 5 years, etc.

� Incorporates all information available at time t to assess risk attime t+ 1

� Natural extension of conditional expectation and conditionalvariance to conditional quantile

8 / 44

Kevin

Kevin

Kevin

Kevin

Models for Value-at-Risk

Conditional VaR: RiskMetrics� Industry standard benchmark� Restricted GARCH(1,1)

σ2t+1 = (1− λ)r2t + λσ2

t

� Exponentially Weighted Moving Average (EWMA):

σ2t+1 =

∞∑i=0

(1− λ)λir2t−i

V aRt+1 = −σt+1Φ−1(α)

� Φ−1(·) is the inverse normal CDF� Advantages

I No parameters to estimateI λ =.94 (daily data), .97 (weekly), .99 (monthly)I Easy to extend to portfolios (see notes)

� DisadvantagesI No parameters to estimateI No leverage effectI Random Walk VaR

9 / 44

Kevin

Kevin

Kevin

Kevin

Kevin

Kevin

Kevin

Kevin

Kevin

Kevin

Kevin

Kevin

Kevin

Conditional VaR: GARCH models for Value-at-Risk

rt+1 = µ+ εt+1

σ2t+1 = ω + γε2t + βσ2t

εt+1 = σt+1et+1

et+1i.i.d.∼ f(0, 1)

� Value-at-Risk:V aRt+1 = −µ− σt+1F

−1α

� F−1α is the α quantile of the distribution of et+1

I For example, 1.645 for the 5% from a normal� Advantages

I Flexible volatility model and easy to estimate� Disadvantages

I Must chose f (know f to get the correct VaR)I Location-Scale families

10 / 44

Kevin

Kevin

Kevin

Kevin

Kevin

Kevin

Kevin

Kevin

Kevin

Kevin

Kevin

Kevin

Conditional VaR: Semiparametric/Filtered HS� Parametric GARCH + Nonparametric Density→ Semi-parametric VaR

et+1i.i.d.∼ g(0, 1), gunknown distribution

� Implementation1. Fit an ARCH model using Normal QMLE2. et = εt

σt3. Order residuals

e1 < e2 < . . . < eN−1 < eN

� Quantile is residual α×N residual (N=T).

V aRt+1(α) = −µ− σt+1G−1α

� AdvantagesI All advantages of GARCHI Quantile converges to true quantile

� DisadvantagesI Location-Scale familiesI Quantile convergence is slow

11 / 44

Kevin

Kevin

Kevin

Kevin

Kevin

Kevin

Kevin

Kevin

Kevin

Kevin

Kevin

Conditional VaR: CaViaR� Conditional Autoregressive Value-at-Risk (ARCVaR)

I Conditional quantile regressionI Directly parameterize quantile F−1α = q of the return distribution

qt+1 = ω + γHITt + βqt

HITt = I[rt<qt] − α

V aRt+1 = −qt+1

� AdvantagesI Focuses on quantileI Flexible specification

� DisadvantagesI Hard to estimateI Which specification?I Out-of-order VaR: 5% can less than 10% VaR

12 / 44

Kevin

Kevin

Kevin

Kevin

Kevin

Kevin

Kevin

Kevin

Kevin

Kevin

Kevin

Kevin

Estimation of CaViaR models� Many CaViaR specifications

I Symmetricqt+1 = ω + γHITt + βqt.

I Symmetric absolute value,

qt+1 = ω + γ|rt|+ βqt.

I Asymmetric absolute value

qt+1 = ω + γ1|rt|+ γ2|rt|I[rt<0] + βqt

I Indirect GARCHqt+1 =

(ω + γr2t + βq2t

) 12

� Estimation minimizes the “tick” loss function

argminθ

T−1T∑t=1

α(rt − qt)(1− I[rt<qt]) + (1− α)(qt − rt)I[rt<qt]

I Non-differentiableI Requires “derivative-free” optimizers (e.g. simplex optimizers)

13 / 44

Kevin

Kevin

Kevin

Kevin

Kevin

Weighted Historical Simulation� Uses a weighted empirical CDF� Weights are exponentially decaying

wi = λt−i (1− λ) /(1− λt

), i = 1, 2, . . . , t

� Weighted Empirical CDF

Gt(r) =

t∑i=1

wiI[ri<r]

� Conditional VaR is solution toVaRt+1 = min

rG(r) ≥ α

� Example uses λ = 0.975

14 / 44

Kevin

Kevin

Kevin

Kevin

Kevin

Kevin

Kevin

Kevin

Weighted Historical Simulation2018

4 3 2 1 0 1 20.0

0.2

0.4

0.6

0.8

1.0

15 / 44

Kevin

Kevin

Kevin

Kevin

Weighted Historical Simulation2009

4 2 0 2 4 60.0

0.2

0.4

0.6

0.8

1.0

16 / 44

Kevin

Kevin

Kevin

S&P 500 VaR Estimates, α = 5%

% VaR using RiskMetrics

2000 2002 2004 2006 2008 2010 2012 2014 2016 2018

2%

4%

6%

8%

10%

% VaR using TARCH(1,1,1) with Skew t errors

2000 2002 2004 2006 2008 2010 2012 2014 2016 2018

2%

4%

6%

8%

10%

17 / 44

S&P 500 VaR Estimates, α = 5%

% VaR using Asymmetric CaViaR

2000 2002 2004 2006 2008 2010 2012 2014 2016 2018

2%

4%

6%

8%

10%

% VaR using Weighted Historical Simulation

2000 2002 2004 2006 2008 2010 2012 2014 2016 2018

2%

4%

6%

8%

10%=0.95=0.99=0.999

18 / 44

Unconditional VaR� Parametric Estimation

I Specify some fully parametric model for returnsI Estimate the parameters by MLEI VaR is the α-quantile of the fit distribution

� Nonparametric Estimation (Historical Simulation)I Nonparametric estimation of the density of returns using raw dataI Identical to previous density estimationI Can “smooth” to reduce variance

� Parametric Monte CarloI Estimate a conditional model for short horizon returnsI Simulate the model for many periodsI Use a nonparametric estimate of the density of the simulated

returns

19 / 44

Evaluation of Value-at-Risk Models

Evaluating VaR models� Basic instrument for testing VaR is the “Hit”

get = I[rt<F−1t ] − α = HITt

� Is the generalized error from the “tick” loss function� If the VaR is correct,

Et−1[HITt] = 0

� Leads to a standard Generalized Mincer-Zarnowitz evaluationframework

� Hit Regression

HITt+h = γ0+γ1V aRt+h|t+γ2HITt+γ3HITt−1+. . .+γKHITt−K+1

I Null is H0 : γ0 = γ1 = . . . = γK = 0I Alternative is H1 : γj 6= 0 for some j

� As always, GMZ can be augmented with any time t measurablevariable

20 / 44

Kevin

Kevin

Kevin

Kevin

Kevin

Kevin

Kevin

Kevin

Kevin

Kevin

Kevin

Kevin

Unconditional Evaluation of VaR using the Bernoulli� HITs from a correct VaR model have a Bernoulli distribution

I 1 with probability αI 0 with probability 1− α

� Likelihood for T Bernoulli random variables xt ∈ {0, 1}

f(xt; p) =T∏t=1

pxt(1− p)1−xt

� Log-likelihood is

l(p;xt) =T∑t=1

xt ln p+ (1− xt) ln 1− p

� In terms of α and HIT tl(α; HIT t) =

T∑t=1

HIT t lnα+(

1− HIT t)

ln 1− α

� Easy to conduct a LR testLR = 2(l(α; HIT )− l(α0; HIT )) ∼ χ2

1

� α = T−1∑Tt=1 HIT t, α0 is the α from the VaR

21 / 44

Kevin

Kevin

Kevin

Kevin

Kevin

Kevin

Kevin

Kevin

Kevin

Kevin

Evaluation of Conditional VaR using the Bernoulli� Can also be extended to testing conditional independence ofHITs

� Define

n00 =

T−1∑t=1

(1− HIT t)(1− HIT t+1), n10 =

T−1∑t=1

(1− HIT t)HIT t+1

n01 =

T−1∑t=1

HIT t(1− HIT t+1), n11 =

T−1∑t=1

HIT tHIT t+1

� The log-likelihood for the sequence two VaR exceedences is

l(p; HIT ) = n11 ln(p11)+n01 ln(1−p11)+n00 ln(p00)+n10 ln(1−p00)

22 / 44

Kevin

Kevin

Kevin

Evaluation of Conditional VaR using the Bernoulli� Null is H0 : p11 = 1− p00 = α

� MLEs are

p00 =n00

n00 + n10, p11 =

n11n11 + n01

� Tested using a likelihood ratio test

LR = 2(l(p00, p11; HIT )− l(p00 = 1− α, p11 = α; HIT ))

� Test statistic follows a χ22 distribution

23 / 44

Kevin

Relationship to Probit/Logit� Standard GMZ regression is not an ideal test� Ignores special structure of a HIT� A HIT is a limited dependant variable

I Only takes one of two values� Define a modified hit HIT t = I[rt<F−1

t ]I Takes the value 1 with probability α and 0 with probability 1− αI Name that distribution→

� Leads to a modified regression framework known as a probit orlogitI Probit:

HIT t+1 = Φ (γ0 + xtγ)

– If model is correct, γ0 = Φ−1(α) and γ = 0– Estimated using Bernoulli Maximum Likelihood– Easy to compute Likelihood ratio

� Accounts for the limited range of the variable and that the densityis non-normal

� Allows for simple-yet-powerful likelihood ratio tests under the null24 / 44

Kevin

Kevin

Kevin

Density Forecasting and Evaluation

Density Estimation and Forecasting� End all be all of risk measurement� Issues:

I Equally hardI Lots of estimation and model error

– Can have non obvious effects on nonlinear functions (i.e. options)I Not closed under aggregation

– No multi-step

� Builds off of the GARCH VaR application

25 / 44

Density forecasts from GARCH models� Simple constant mean GARCH(1,1)

rt+1 = µ+ εt+1

σ2t+1 = ω + γε2t + βσ2t

εt+1 = σt+1et+1

et+1i.i.d.∼ g(0, 1).

� g is some known distribution, but not necessarily normal� Density forecast is simply g(µ, σ2t+1|t)

� Flexible through choice of g� Parsimonious� Semiparametric works in same way replacing g with the

standardized residuals of a “smoothed” estimate

26 / 44

Kernel Densities� “Smoothed” densities are more precise than rough estimates

g(e) =1

Th

T∑t=1

K

(et − eh

), et =

yt − µtσt

=εtσt

� Local average of how many et there are in a small neighborhoodof e

� K(·) is a kernelI Gaussian

K(x) =1√2π

exp(−x2/2)

I Epanechnikov

K(x) =

{34(1− x2) −1 ≤ x ≤ 1

0 otherwise

� h: Bandwidth controls smoothing� Silverman’s bandwidth

1.06σxT− 1

5

I h too small produces very rough densities (low bias but lots ofvariance)

I h too large produces overly smooth (low variance but very biased)27 / 44

S&P 500 Parametric and Nonparametric Densities

4 3 2 1 0 1 2 3 4

NormalSkew tNonparametric

28 / 44

Multi-step Density Forecasts� Densities do not aggregate in general

I Multivariate normal is special� Densities from GARCH models do not easily aggregate� 1-step density forecast from a standard GARCH(1,1)

rt+1|Ft ∼ N(µ, σ2t+1|t)

� Wrong 2-step forecast from a standard GARCH(1,1)

rt+2|Ft ∼ N(µ, σ2t+2|t)

� Correct 2-step forecast from a standard GARCH(1,1)

rt+2|Ft ∼∫ ∞−∞

φ(µ, σ2(et+1)t+2|t+1)φ(et+1)det+1.

� Must integrate out the variance uncertainty between t+ 1 and t+ 2

� Easy fix: directly model t+ 2 (or t+ h)29 / 44

The Fan plot� Hard to produce time-series of densities� Solution is the Fan Plot� Popularized by the Bank of England� Horizontal axis (x) is the number of time-periods ahead� Vertical axis (y) is the vale the variable might take� Density is expressed using varying degrees of color intensity.

I Dark color indicate the highest probabilityI Progressively lighter colors represent decreasing likelihoodI Essentially a plot of many quantiles of the distribution through time

� A lot of “wow”� Not necessarily a lot of content

30 / 44

A fan plot for an AR(2)

1 5 10 15 20Forecast Horizon

5

0

5

10

15

20

25

Pred

ictio

n Va

lue

31 / 44

Density “Standardized” Residuals� Consider a generic stochastic process {yt}

I Residuals from mean models:

εt = yt − µtI Residuals from variance models:

et =εtσt

=yt − µtσt

I Residuals from Value-at-Risk models:

HITt = I[yt<qt] − α

I Residual from density models:

ut = Ft(yt)

� Known as the Probability Integral Transformed Residuals.� One very useful property: If yt ∼ F then ut ≡ F (yt) ∼ U(0, 1)

32 / 44

Probability Integral Transform

Theorem (Probability Integral Transform)

Let a random variable X have a continuous, increasing CDF FX(x)and define Y = FX(X). Then Y is uniformly distributed andPr(Y ≤ y) = y, 0 < y < 1.

For any y ∈ (0, 1), Y = FX(X), and so

Pr(Y ≤ y) = Pr(FX(X) ≤ y)

= Pr(F−1X (FX(X)) ≤ F−1X (y)) Since F−1X is increasing

= Pr(X ≤ F−1X (y)) Invertible since strictly increasing

= FX(F−1X (y)) Definition of FX= y

33 / 44

Evaluating Density Forecasts: QQ Plots� Quantile-Quantile Plots� Plots the data against a hypothetical distribution

e1 < e2 < . . . < eN−1 < eN

I N = T but used to indicate that the index is not related to time� en against F−1( j

T+1)

F−1

(1

T + 1

)< F−1

(2

T + 1

)< . . . < F−1

(T − 1

T + 1

)< F−1

(T

T + 1

)� F−1 is inverse CDF of distribution being used for comparison� Should lie along a 45o line� No confidence bands

34 / 44

QQ Plots for the S&P 500Monthly Returns

Normal Student’s t, ν = 5.8

4 2 0 2 4Observed

4

2

0

2

4

Hyp

othe

tical

4 2 0 2 4Observed

4

2

0

2

4

Hyp

othe

tical

GED, ν = 1.25 Skewed t, ν = 6.3, λ = −0.19

4 2 0 2 4Observed

4

2

0

2

4

Hyp

othe

tical

4 2 0 2 4Observed

4

2

0

2

4

Hyp

othe

tical

35 / 44

Evaluating Density Forecasts: Kolmogorov-Smirnov� Formalizes QQ plots� Key property

I If x ∼ F , then u ≡ F (x) ∼ U(0, 1)I Can test U(0, 1)

� KS tests maximum deviation from U(0,1)

maxτ

∣∣∣∣∣ 1

T

(τ∑i=1

I[ui< τT]

)− τ

T

∣∣∣∣∣ , τ = 1, 2, . . . , T

I 1T

∑τi=1 I[uj< τ

T ]: Empirical percentage of u below τ/TI τ/T : How many should be below τ/T

� Nonstandard distribution� Parameter estimation error

I Parameter Estimation Error (PEE) causes significant sizedistortions

I Using a 5% CV will only reject 0.1% of the timeI Solution is to simulate the needed critical values

36 / 44

The Kolmogorov-Smirnov Test

200 400 600 800Observation Index

0.0

0.2

0.4

0.6

0.8

1.0C

umul

ativ

e Pr

obab

ility

95% Conf. BandsKS-test error (Normal)KS-test error (Student's t)

37 / 44

Addressing PEE in a KS test� Model is a complete model so can be easily simulated� Exact KS distribution tabulated

Algorithm (Correct CV for KS test with PEE)

1. Estimate model and save θ

2. Repeat many times (1000+)a. Simulate artificial series from model using θwith same number of

observations as original datab. Estimate parameters from simulated data, θc. Compute KS test statistic on simulated data using θand save as

KSi, i = 1, 2, . . . ,

3. Sort the KSi values and use the 1− α quantile for get correct CVfor α size test

38 / 44

Evaluating Density Forecasts: Berkowitz Test� Berkowitz Test extends KS to evaluation of conditional densities� Exploits probability integral transform property

ut = F (yt)

� But then re-transforms the data to a standard normal

ηt = Φ−1(ut) = Φ−1(F (yt))

I Since uti.i.d.∼ U(0, 1), ηt

i.i.d.∼ N(0, 1)� Test is a likelihood ratio test using an AR(1)

ηt = φ0 + φ1ηt−1 + νt

� If the model is correctly specifiedI φ0 = 0,φ1 = 0,σ2 = V[νt] = 1

� Likelihood ratio

2(l(ηt|φ0, φ1, σ2)− l(ηt|φ0 = 0, φ1 = 0, σ2 = 1)

)∼ χ2

3

I Critical values wrong if F has estimated parameters39 / 44

Idealized Risk Measures

Coherent Risk Measures� Coherence is a desirable property for a risk measure

I But not completely necessary� ρ is the required capital necessary according to some measure of

risk (VaR, ES, Standard Deviation, etc.)� P , P1 and P2 are portfolios of assets� A Coherent measure satisfies:

Drift Invarianceρ(P + c) = ρ(P )− c

Homogeneityρ(λP ) = λρ(P ) for any λ > 0

Monotonicity If P1 first order stochastically dominates P2, then

ρ(P1) ≤ ρ(P2)

Subadditivityρ(P1 + P2) ≤ ρ(P1) + ρ(P2)

40 / 44

Coherent Risk Measures� VaR is not coherent

I Because VaR is a quantile it may not be subadditive

VaR is Not Coherent� Two portfolios P1 and P2 holding a bond

I Each paying 0%, par value of $1,000I Default probability 3%, recovery rate 60%I Two companies, defaults are independent

� Value-at-Risk of P1 and P2is $0� Value-at-RIsk of P3 = 50%× P1 + 50%× P2 = $200

I 5.91% that one or both default

41 / 44

Coherent Risk MeasuresP1 and P2

500 400 300 200 100 0 100Change in Value

0.0

0.2

0.4

0.6

0.8

1.0

P3

500 400 300 200 100 0 100Change in Value

0.0

0.2

0.4

0.6

0.8

1.0

42 / 44

Coherent Risk Measures� ES is coherent

I Doesn’t mean muchI VaR still has a lot of advantagesI More importantly VaR and ES agree in most realistic settings

ES is coherent� ES of P1 and P2 is $240

I Given in lower 5% of distribution, 60% chance of a loss of $400� ES of P3

I Given in lower 5% of distribution:– 0.0009/0.05 = .018 probability of $400 loss (2 defaults)– 0.0491/0.05 = .982 probability of $200 loss (1 default)– ES of $7.20 + $196.40 = $203.60

� ES is subadditive when VaR is not

43 / 44

Expected Shortfall

Expected Shortfall� Conditional Expected Shortfall (ES, also called Tail VaR)

ESt+1 = Et[rt+1|rt+1 < −V aRt+1]

� "Expected Loss given you have a Value-at-Risk violation"� Usually requires the specification of a complete model for the

conditional distribution� Uses all of the information in the tail� Evaluation

I Standard Problem, a conditional meanI GMZ regression

(ESt+1|t −Rt+1)I[Rt+1<−V aRt+1|t] = xtγ

– H0 : γ = 0

� Difficult to test since relatively few observations

44 / 44

1

1

1

1

Kevin

Kevin

Kevin

Kevin

Kevin

Kevin

Kevin

Kevin

Kevin

Kevin

Kevin

Kevin

Kevin

Kevin

Kevin

1

Kevin

Kevin

Kevin

1

Kevin

Kevin

Kevin

Kevin

1

1

1

1

1

Documents

Value-at-Risk, Expected Shortfall and Density Forecasting … · I Value-at-Risk I Expected Shortfall I Density Estimation 3 / 44. Value-at-Risk. Value-at-Risk Value-at-Risk is a