Upload
others
View
9
Download
2
Embed Size (px)
Citation preview
Value-at-Risk, Expected Shortfalland Density Forecasting
Kevin Sheppardhttp://www.kevinsheppard.com
Oxford MFEThis version: February 20, 2020
February 2020
Risk Measurement Overview� What is risk?� What is Value-at-Risk?� How can VaR be measured and modeled?� How can VaR models be tested?� What is Expected Shortfall?� How can densities be forecasted?� How can density models be evaluated?� What is a coherent risk measure?
2 / 44
Risk� What is risk?� Market Risk
I Liquidity RiskI Credit RiskI Counterparty RiskI Model RiskI Estimation Risk
� Today’s focus: Market Risk� Tools
I Value-at-RiskI Expected ShortfallI Density Estimation
3 / 44
Value-at-Risk
Value-at-Risk� Value-at-Risk is a standard tool of risk management
I Basel Accord
Definition (Value-at-Risk)
The α Value-at-Risk (V aR) of a portfolio is defined as the largestnumber such that the probability that the loss in portfolio value oversome period of time is greater than the V aR is α,
Pr(R < −V aR) = α
where R = W1 −W0 is the total return on the portfolio, Wt, t = 0, 1, isthe value of the assets in the portfolio and 1 and 0 measure anarbitrary time span (e.g. one day or two weeks).
� Units are $, £, ¥� Almost always positive� It is a quantile
4 / 44
Value-at-Risk in a picture� Returns are N(.001, .0152)� W0 is £10,000,000
9.7 9.8 9.9 10.0 10.1 10.2 10.3 Portfolio Value (£ mil)
5% VaR
Previous Wealth
Expected Wealth
Distribution of Wealth5% quantileValue-at-Risk
5 / 44
Percent Value-at-Risk� Value-at-Risk can be normalized and reported as a %
Definition (Percentage Value-at-Risk)
The α percentage Value-at-Risk (%V aR) of a portfolio is defined as thelargest return such that the probability that the return on the portfolio oversome period of time is less than -%V aR is α,
Pr(r < −%V aR) = α
where r is the percentage return on the portfolio. %V aR can be equivalentlydefined as %V aR = V aR/W0.
� Units are returns (no units)� Also almost always positive� Lets VaR be interpreted without knowing the value of the portfolio, W0
I No meaningful loss of information from standard VaRI Used throughout rest of lecture in place of formal definition of VaR
6 / 44
Relationship between Quantiles and VaR� VaR is a quantile
I Quantile of the future distribution
Definition (α-Quantile)
The α-quantile of a random variable X is defined as the smallestnumber qα such that
Pr(X ≤ qα) = α
� Other “-iles”I TercileI QuartileI QuintileI DecileI Percentile
7 / 44
Conditional and Unconditional VaR� Condiitonal VaR is similar to conditional mean or conditional
variance
Definition (Conditional Value-at-Risk)
The conditional α Value-at-Risk is defined as
Pr(rt+1 < −V aRt+1|t|Ft) = α
where rt+1 = (Wt+1 −Wt) /Wt is the return on a portfolio at time t+ 1.
� t is an arbitrary measure of time⇒ t+ 1 also refers to an arbitraryunit of timeI day, two-weeks, 5 years, etc.
� Incorporates all information available at time t to assess risk attime t+ 1
� Natural extension of conditional expectation and conditionalvariance to conditional quantile
8 / 44
Models for Value-at-Risk
Conditional VaR: RiskMetrics� Industry standard benchmark� Restricted GARCH(1,1)
σ2t+1 = (1− λ)r2t + λσ2
t
� Exponentially Weighted Moving Average (EWMA):
σ2t+1 =
∞∑i=0
(1− λ)λir2t−i
V aRt+1 = −σt+1Φ−1(α)
� Φ−1(·) is the inverse normal CDF� Advantages
I No parameters to estimateI λ =.94 (daily data), .97 (weekly), .99 (monthly)I Easy to extend to portfolios (see notes)
� DisadvantagesI No parameters to estimateI No leverage effectI Random Walk VaR
9 / 44
Conditional VaR: GARCH models for Value-at-Risk
rt+1 = µ+ εt+1
σ2t+1 = ω + γε2t + βσ2t
εt+1 = σt+1et+1
et+1i.i.d.∼ f(0, 1)
� Value-at-Risk:V aRt+1 = −µ− σt+1F
−1α
� F−1α is the α quantile of the distribution of et+1
I For example, 1.645 for the 5% from a normal� Advantages
I Flexible volatility model and easy to estimate� Disadvantages
I Must chose f (know f to get the correct VaR)I Location-Scale families
10 / 44
Conditional VaR: Semiparametric/Filtered HS� Parametric GARCH + Nonparametric Density→ Semi-parametric VaR
et+1i.i.d.∼ g(0, 1), gunknown distribution
� Implementation1. Fit an ARCH model using Normal QMLE2. et = εt
σt3. Order residuals
e1 < e2 < . . . < eN−1 < eN
� Quantile is residual α×N residual (N=T).
V aRt+1(α) = −µ− σt+1G−1α
� AdvantagesI All advantages of GARCHI Quantile converges to true quantile
� DisadvantagesI Location-Scale familiesI Quantile convergence is slow
11 / 44
Conditional VaR: CaViaR� Conditional Autoregressive Value-at-Risk (ARCVaR)
I Conditional quantile regressionI Directly parameterize quantile F−1α = q of the return distribution
qt+1 = ω + γHITt + βqt
HITt = I[rt<qt] − α
V aRt+1 = −qt+1
� AdvantagesI Focuses on quantileI Flexible specification
� DisadvantagesI Hard to estimateI Which specification?I Out-of-order VaR: 5% can less than 10% VaR
12 / 44
Estimation of CaViaR models� Many CaViaR specifications
I Symmetricqt+1 = ω + γHITt + βqt.
I Symmetric absolute value,
qt+1 = ω + γ|rt|+ βqt.
I Asymmetric absolute value
qt+1 = ω + γ1|rt|+ γ2|rt|I[rt<0] + βqt
I Indirect GARCHqt+1 =
(ω + γr2t + βq2t
) 12
� Estimation minimizes the “tick” loss function
argminθ
T−1T∑t=1
α(rt − qt)(1− I[rt<qt]) + (1− α)(qt − rt)I[rt<qt]
I Non-differentiableI Requires “derivative-free” optimizers (e.g. simplex optimizers)
13 / 44
Weighted Historical Simulation� Uses a weighted empirical CDF� Weights are exponentially decaying
wi = λt−i (1− λ) /(1− λt
), i = 1, 2, . . . , t
� Weighted Empirical CDF
Gt(r) =
t∑i=1
wiI[ri<r]
� Conditional VaR is solution toVaRt+1 = min
rG(r) ≥ α
� Example uses λ = 0.975
14 / 44
Weighted Historical Simulation2018
4 3 2 1 0 1 20.0
0.2
0.4
0.6
0.8
1.0
15 / 44
Weighted Historical Simulation2009
4 2 0 2 4 60.0
0.2
0.4
0.6
0.8
1.0
16 / 44
S&P 500 VaR Estimates, α = 5%
% VaR using RiskMetrics
2000 2002 2004 2006 2008 2010 2012 2014 2016 2018
2%
4%
6%
8%
10%
% VaR using TARCH(1,1,1) with Skew t errors
2000 2002 2004 2006 2008 2010 2012 2014 2016 2018
2%
4%
6%
8%
10%
17 / 44
S&P 500 VaR Estimates, α = 5%
% VaR using Asymmetric CaViaR
2000 2002 2004 2006 2008 2010 2012 2014 2016 2018
2%
4%
6%
8%
10%
% VaR using Weighted Historical Simulation
2000 2002 2004 2006 2008 2010 2012 2014 2016 2018
2%
4%
6%
8%
10%=0.95=0.99=0.999
18 / 44
Unconditional VaR� Parametric Estimation
I Specify some fully parametric model for returnsI Estimate the parameters by MLEI VaR is the α-quantile of the fit distribution
� Nonparametric Estimation (Historical Simulation)I Nonparametric estimation of the density of returns using raw dataI Identical to previous density estimationI Can “smooth” to reduce variance
� Parametric Monte CarloI Estimate a conditional model for short horizon returnsI Simulate the model for many periodsI Use a nonparametric estimate of the density of the simulated
returns
19 / 44
Evaluation of Value-at-Risk Models
Evaluating VaR models� Basic instrument for testing VaR is the “Hit”
get = I[rt<F−1t ] − α = HITt
� Is the generalized error from the “tick” loss function� If the VaR is correct,
Et−1[HITt] = 0
� Leads to a standard Generalized Mincer-Zarnowitz evaluationframework
� Hit Regression
HITt+h = γ0+γ1V aRt+h|t+γ2HITt+γ3HITt−1+. . .+γKHITt−K+1
I Null is H0 : γ0 = γ1 = . . . = γK = 0I Alternative is H1 : γj 6= 0 for some j
� As always, GMZ can be augmented with any time t measurablevariable
20 / 44
Unconditional Evaluation of VaR using the Bernoulli� HITs from a correct VaR model have a Bernoulli distribution
I 1 with probability αI 0 with probability 1− α
� Likelihood for T Bernoulli random variables xt ∈ {0, 1}
f(xt; p) =T∏t=1
pxt(1− p)1−xt
� Log-likelihood is
l(p;xt) =T∑t=1
xt ln p+ (1− xt) ln 1− p
� In terms of α and HIT tl(α; HIT t) =
T∑t=1
HIT t lnα+(
1− HIT t)
ln 1− α
� Easy to conduct a LR testLR = 2(l(α; HIT )− l(α0; HIT )) ∼ χ2
1
� α = T−1∑Tt=1 HIT t, α0 is the α from the VaR
21 / 44
Evaluation of Conditional VaR using the Bernoulli� Can also be extended to testing conditional independence ofHITs
� Define
n00 =
T−1∑t=1
(1− HIT t)(1− HIT t+1), n10 =
T−1∑t=1
(1− HIT t)HIT t+1
n01 =
T−1∑t=1
HIT t(1− HIT t+1), n11 =
T−1∑t=1
HIT tHIT t+1
� The log-likelihood for the sequence two VaR exceedences is
l(p; HIT ) = n11 ln(p11)+n01 ln(1−p11)+n00 ln(p00)+n10 ln(1−p00)
22 / 44
Evaluation of Conditional VaR using the Bernoulli� Null is H0 : p11 = 1− p00 = α
� MLEs are
p00 =n00
n00 + n10, p11 =
n11n11 + n01
� Tested using a likelihood ratio test
LR = 2(l(p00, p11; HIT )− l(p00 = 1− α, p11 = α; HIT ))
� Test statistic follows a χ22 distribution
23 / 44
Relationship to Probit/Logit� Standard GMZ regression is not an ideal test� Ignores special structure of a HIT� A HIT is a limited dependant variable
I Only takes one of two values� Define a modified hit HIT t = I[rt<F−1
t ]I Takes the value 1 with probability α and 0 with probability 1− αI Name that distribution→
� Leads to a modified regression framework known as a probit orlogitI Probit:
HIT t+1 = Φ (γ0 + xtγ)
– If model is correct, γ0 = Φ−1(α) and γ = 0– Estimated using Bernoulli Maximum Likelihood– Easy to compute Likelihood ratio
� Accounts for the limited range of the variable and that the densityis non-normal
� Allows for simple-yet-powerful likelihood ratio tests under the null24 / 44
Density Forecasting and Evaluation
Density Estimation and Forecasting� End all be all of risk measurement� Issues:
I Equally hardI Lots of estimation and model error
– Can have non obvious effects on nonlinear functions (i.e. options)I Not closed under aggregation
– No multi-step
� Builds off of the GARCH VaR application
25 / 44
Density forecasts from GARCH models� Simple constant mean GARCH(1,1)
rt+1 = µ+ εt+1
σ2t+1 = ω + γε2t + βσ2t
εt+1 = σt+1et+1
et+1i.i.d.∼ g(0, 1).
� g is some known distribution, but not necessarily normal� Density forecast is simply g(µ, σ2t+1|t)
� Flexible through choice of g� Parsimonious� Semiparametric works in same way replacing g with the
standardized residuals of a “smoothed” estimate
26 / 44
Kernel Densities� “Smoothed” densities are more precise than rough estimates
g(e) =1
Th
T∑t=1
K
(et − eh
), et =
yt − µtσt
=εtσt
� Local average of how many et there are in a small neighborhoodof e
� K(·) is a kernelI Gaussian
K(x) =1√2π
exp(−x2/2)
I Epanechnikov
K(x) =
{34(1− x2) −1 ≤ x ≤ 1
0 otherwise
� h: Bandwidth controls smoothing� Silverman’s bandwidth
1.06σxT− 1
5
I h too small produces very rough densities (low bias but lots ofvariance)
I h too large produces overly smooth (low variance but very biased)27 / 44
S&P 500 Parametric and Nonparametric Densities
4 3 2 1 0 1 2 3 4
NormalSkew tNonparametric
28 / 44
Multi-step Density Forecasts� Densities do not aggregate in general
I Multivariate normal is special� Densities from GARCH models do not easily aggregate� 1-step density forecast from a standard GARCH(1,1)
rt+1|Ft ∼ N(µ, σ2t+1|t)
� Wrong 2-step forecast from a standard GARCH(1,1)
rt+2|Ft ∼ N(µ, σ2t+2|t)
� Correct 2-step forecast from a standard GARCH(1,1)
rt+2|Ft ∼∫ ∞−∞
φ(µ, σ2(et+1)t+2|t+1)φ(et+1)det+1.
� Must integrate out the variance uncertainty between t+ 1 and t+ 2
� Easy fix: directly model t+ 2 (or t+ h)29 / 44
The Fan plot� Hard to produce time-series of densities� Solution is the Fan Plot� Popularized by the Bank of England� Horizontal axis (x) is the number of time-periods ahead� Vertical axis (y) is the vale the variable might take� Density is expressed using varying degrees of color intensity.
I Dark color indicate the highest probabilityI Progressively lighter colors represent decreasing likelihoodI Essentially a plot of many quantiles of the distribution through time
� A lot of “wow”� Not necessarily a lot of content
30 / 44
A fan plot for an AR(2)
1 5 10 15 20Forecast Horizon
5
0
5
10
15
20
25
Pred
ictio
n Va
lue
31 / 44
Density “Standardized” Residuals� Consider a generic stochastic process {yt}
I Residuals from mean models:
εt = yt − µtI Residuals from variance models:
et =εtσt
=yt − µtσt
I Residuals from Value-at-Risk models:
HITt = I[yt<qt] − α
I Residual from density models:
ut = Ft(yt)
� Known as the Probability Integral Transformed Residuals.� One very useful property: If yt ∼ F then ut ≡ F (yt) ∼ U(0, 1)
32 / 44
Probability Integral Transform
Theorem (Probability Integral Transform)
Let a random variable X have a continuous, increasing CDF FX(x)and define Y = FX(X). Then Y is uniformly distributed andPr(Y ≤ y) = y, 0 < y < 1.
For any y ∈ (0, 1), Y = FX(X), and so
Pr(Y ≤ y) = Pr(FX(X) ≤ y)
= Pr(F−1X (FX(X)) ≤ F−1X (y)) Since F−1X is increasing
= Pr(X ≤ F−1X (y)) Invertible since strictly increasing
= FX(F−1X (y)) Definition of FX= y
33 / 44
Evaluating Density Forecasts: QQ Plots� Quantile-Quantile Plots� Plots the data against a hypothetical distribution
e1 < e2 < . . . < eN−1 < eN
I N = T but used to indicate that the index is not related to time� en against F−1( j
T+1)
F−1
(1
T + 1
)< F−1
(2
T + 1
)< . . . < F−1
(T − 1
T + 1
)< F−1
(T
T + 1
)� F−1 is inverse CDF of distribution being used for comparison� Should lie along a 45o line� No confidence bands
34 / 44
QQ Plots for the S&P 500Monthly Returns
Normal Student’s t, ν = 5.8
4 2 0 2 4Observed
4
2
0
2
4
Hyp
othe
tical
4 2 0 2 4Observed
4
2
0
2
4
Hyp
othe
tical
GED, ν = 1.25 Skewed t, ν = 6.3, λ = −0.19
4 2 0 2 4Observed
4
2
0
2
4
Hyp
othe
tical
4 2 0 2 4Observed
4
2
0
2
4
Hyp
othe
tical
35 / 44
Evaluating Density Forecasts: Kolmogorov-Smirnov� Formalizes QQ plots� Key property
I If x ∼ F , then u ≡ F (x) ∼ U(0, 1)I Can test U(0, 1)
� KS tests maximum deviation from U(0,1)
maxτ
∣∣∣∣∣ 1
T
(τ∑i=1
I[ui< τT]
)− τ
T
∣∣∣∣∣ , τ = 1, 2, . . . , T
I 1T
∑τi=1 I[uj< τ
T ]: Empirical percentage of u below τ/TI τ/T : How many should be below τ/T
� Nonstandard distribution� Parameter estimation error
I Parameter Estimation Error (PEE) causes significant sizedistortions
I Using a 5% CV will only reject 0.1% of the timeI Solution is to simulate the needed critical values
36 / 44
The Kolmogorov-Smirnov Test
200 400 600 800Observation Index
0.0
0.2
0.4
0.6
0.8
1.0C
umul
ativ
e Pr
obab
ility
95% Conf. BandsKS-test error (Normal)KS-test error (Student's t)
37 / 44
Addressing PEE in a KS test� Model is a complete model so can be easily simulated� Exact KS distribution tabulated
Algorithm (Correct CV for KS test with PEE)
1. Estimate model and save θ
2. Repeat many times (1000+)a. Simulate artificial series from model using θwith same number of
observations as original datab. Estimate parameters from simulated data, θc. Compute KS test statistic on simulated data using θand save as
KSi, i = 1, 2, . . . ,
3. Sort the KSi values and use the 1− α quantile for get correct CVfor α size test
38 / 44
Evaluating Density Forecasts: Berkowitz Test� Berkowitz Test extends KS to evaluation of conditional densities� Exploits probability integral transform property
ut = F (yt)
� But then re-transforms the data to a standard normal
ηt = Φ−1(ut) = Φ−1(F (yt))
I Since uti.i.d.∼ U(0, 1), ηt
i.i.d.∼ N(0, 1)� Test is a likelihood ratio test using an AR(1)
ηt = φ0 + φ1ηt−1 + νt
� If the model is correctly specifiedI φ0 = 0,φ1 = 0,σ2 = V[νt] = 1
� Likelihood ratio
2(l(ηt|φ0, φ1, σ2)− l(ηt|φ0 = 0, φ1 = 0, σ2 = 1)
)∼ χ2
3
I Critical values wrong if F has estimated parameters39 / 44
Idealized Risk Measures
Coherent Risk Measures� Coherence is a desirable property for a risk measure
I But not completely necessary� ρ is the required capital necessary according to some measure of
risk (VaR, ES, Standard Deviation, etc.)� P , P1 and P2 are portfolios of assets� A Coherent measure satisfies:
Drift Invarianceρ(P + c) = ρ(P )− c
Homogeneityρ(λP ) = λρ(P ) for any λ > 0
Monotonicity If P1 first order stochastically dominates P2, then
ρ(P1) ≤ ρ(P2)
Subadditivityρ(P1 + P2) ≤ ρ(P1) + ρ(P2)
40 / 44
Coherent Risk Measures� VaR is not coherent
I Because VaR is a quantile it may not be subadditive
VaR is Not Coherent� Two portfolios P1 and P2 holding a bond
I Each paying 0%, par value of $1,000I Default probability 3%, recovery rate 60%I Two companies, defaults are independent
� Value-at-Risk of P1 and P2is $0� Value-at-RIsk of P3 = 50%× P1 + 50%× P2 = $200
I 5.91% that one or both default
41 / 44
Coherent Risk MeasuresP1 and P2
500 400 300 200 100 0 100Change in Value
0.0
0.2
0.4
0.6
0.8
1.0
P3
500 400 300 200 100 0 100Change in Value
0.0
0.2
0.4
0.6
0.8
1.0
42 / 44
Coherent Risk Measures� ES is coherent
I Doesn’t mean muchI VaR still has a lot of advantagesI More importantly VaR and ES agree in most realistic settings
ES is coherent� ES of P1 and P2 is $240
I Given in lower 5% of distribution, 60% chance of a loss of $400� ES of P3
I Given in lower 5% of distribution:– 0.0009/0.05 = .018 probability of $400 loss (2 defaults)– 0.0491/0.05 = .982 probability of $200 loss (1 default)– ES of $7.20 + $196.40 = $203.60
� ES is subadditive when VaR is not
43 / 44
Expected Shortfall
Expected Shortfall� Conditional Expected Shortfall (ES, also called Tail VaR)
ESt+1 = Et[rt+1|rt+1 < −V aRt+1]
� "Expected Loss given you have a Value-at-Risk violation"� Usually requires the specification of a complete model for the
conditional distribution� Uses all of the information in the tail� Evaluation
I Standard Problem, a conditional meanI GMZ regression
(ESt+1|t −Rt+1)I[Rt+1<−V aRt+1|t] = xtγ
– H0 : γ = 0
� Difficult to test since relatively few observations
44 / 44
1
1
1
1
1
1
1
1
1
1
1