The Limits to Volatility Predictability: Quantifying Forecast … ANNUAL MEETINGS... · 2018-03-15 · The Limits to Volatility Predictability: Quantifying Forecast Accuracy Across

The Limits to Volatility Predictability: Quantifying

Forecast Accuracy Across Horizons∗

Xingyi Li† and Valeriy Zakamulin‡

This revision: December 18, 2017

Abstract

Volatility forecasting is crucial for portfolio management, risk management, and pric-

ing of derivative securities. Still, little is known about how far ahead one can forecast

volatility. First, in this paper we introduce the notions of the spot and forward predicted

volatilities and propose to describe the term structure of volatility predictability by the

spot and forward forecast accuracy curves. Then, by employing a few popular time-series

volatility models, we perform a comprehensive empirical study on the horizon of volatility

predictability. Our results suggest that, whereas the spot volatility can be predicted over

horizons that extend to 35 weeks, the horizon of the forward volatility predictability is

rather short and limited to approximately 7.5 weeks. Finally, we suggest a plausible expla-

nation for why standard models fail to provide sensible longer-horizon volatility forecasts.

We argue that volatility is less persistent and does not revert to its long-run mean as the

models assume. Our analysis reveals an important but currently overlooked stylized fact

about volatility: it cycles erratically over time and periods of high or low volatility follow

one another.

Key words: spot volatility, forward volatility, volatility forecasting, forecast accuracy,

term structure, out-of-sample forecasting, model comparison

JEL classification: C22, C53, C58, G17; EFM classification: 450

∗ The authors are grateful to Jochen Jungeilges for his helpful comments and suggestions regarding the previous draft of this paper. Any remaining errors in the manuscript are the authors’ responsibility.

† School of Business and Law, University of Agder, Service Box 422, 4604 Kristiansand, Norway, Tel.: (+47) 38 14 13 38, E-mail: [email protected]

‡ School of Business and Law, University of Agder, Service Box 422, 4604 Kristiansand, Norway, Tel.: (+47) 38 14 10 39, E-mail: [email protected]

1 Introduction

Volatility forecasting is crucial for portfolio management, risk management, and pricing of

financial derivatives. Specifically, the volatility of a financial asset is a primary input to the

optimal portfolio choice problem. Volatility forecasting is a mandatory risk-management ex-

ercise for many financial institutions and banks around the world. Volatility is the most vital

input variable in the valuation of derivative securities. For instance, to price an option one

needs to know the future volatility of the underlying asset till the option maturity.

Nowadays, there is a trade in derivatives that are written on volatility itself. Examples

of such derivatives are Forward Volatility Agreements (FVA). The FVA is a forward contract

on the future spot realized or implied volatility of a financial asset (examples are: individual

stock, stock market index, commodity, foreign currency, etc.). In particular, the FVA specifies

the realized or implied volatility for an interval starting at a future date. The value of the FVA

at maturity is the difference between the contractual volatility level, which is determined at

the contract inception date, and the volatility level observed at the settlement date. The key

motivation to trade FVAs is that they allow investors to hedge volatility risk and to speculate

on volatility levels.

In portfolio management and risk management, the volatility needs to be forecasted over

horizons ranging from 1 day to 1 month. In contrast, in the valuation of derivative securities

the volatility needs to be forecasted over much longer horizons. For example, on the Chicago

Board Options Exchange (CBOE) one can trade short-term options with a maximum of 12

months to maturity and long-term options (LEAPS) that have expiration dates up to 39 months

into the future. Therefore, the successful pricing of options requires accurate forecasting of

volatility over a relatively long-term period starting now. The successful use of an FVA contract

requires accurate forecasting of volatility over a period starting at some point in the future.

FVA contracts are typically traded in over-the-counter markets and have maturities ranging

from 1 to 24 months (see, for example, Corte, Kozhan, and Neuberger (2017)).

There is now an enormous body of research on the properties of volatility, volatility mod-

eling and forecasting. The following stylized facts about volatility have been identified and

described in the financial literature: persistence and mean-reversion. Persistence in volatility

makes it possible to forecast future volatility. In fact, it is well documented in financial econo-

metric literature that volatility is predictable (for a good review of this literature, see Poon

and Granger (2003)). But what is the horizon of volatility predictability?

There is no doubt that volatility is forecastable over short horizons up to 1 month into the

future. However, there is a controversy in the literature about how far ahead the volatility is

forecastable. The answer to this question seems to depend greatly on the methodology used

to measure forecast accuracy. Specifically, in the vast majority of studies on volatility

predictability the researchers run a horse-race between several alternative forecasting models.

In these studies, the forecast accuracy is typically evaluated using measures based on either

(absolute or squared) forecast errors or percentage errors.1 Such studies report the ranking of

competing models and make recommendations to practitioners about which forecasting model

should be preferred. In some of these studies the forecast horizon is extended to 30-60 months

(examples of such studies are Cao and Tsay (1992), Alford and Boatsman (1995), Figlewski

(1997), and Green and Figlewski (1999)). The results of these studies seem to suggest that

volatility is forecastable over long-term horizons that extend to several years.

There are two issues with the above-mentioned studies. The first issue is that the results

are plagued by the fact that they are joint assessments of volatility forecastability and an

assumed model, and the results vary not only with the horizon, but also with the model.

To address this problem, Christoffersen and Diebold (2000) develop a model-free procedure

for measuring volatility predictability across horizons. They implement their procedure using

the data on four stock market indices and four exchange rates. In contrast to the previous

studies, Christoffersen and Diebold (2000) find that volatility forecastability decays quickly

with horizon and volatility is not predictable over horizons longer than 8 weeks.

The second, more serious issue with the above-mentioned studies is that even though forecasting

errors allow comparing alternative forecasting models, they do not allow measuring

predictive accuracy per se. For example, if the volatility over some forecast horizon is unpre-

dictable, all model forecasts are likely to be worthless. In this case using forecasting errors

to select the best model among the poor ones creates the illusion of predictability when none

is present. To overcome this problem, Galbraith (2003) proposes a procedure for determining the horizon (coined the “content horizon”) beyond which forecasts from univariate time series models of stationary processes add nothing to the forecast implicit in the unconditional mean. Using data on two exchange rates, Galbraith and Kisinbay (2005) find that the “content horizon” in forecasting the volatility of exchange rates is limited to 6 weeks.

1 Examples of such measures are Mean Absolute Error (MAE) and Mean Absolute Percentage Error (MAPE).

The present paper attempts to extend our knowledge about how far into the future useful

volatility forecasts can be made from time-series models. Our contributions are threefold. On

the methodological side, we introduce the notions of the spot and forward predicted volatili-

ties. Consequently, we suggest describing the term structure of volatility predictability by two

curves: the spot and forward forecast accuracy curves. In our terminology, the “spot predicted

volatility” is the volatility forecasted over a period starting now. Hence, the “spot forecast

accuracy curve” shows the forecast accuracy for the spot predicted volatility. The “forward

predicted volatility” is the volatility forecasted over a period starting at some point in the

future; the “forward forecast accuracy curve” depicts the forecast accuracy for the forward

predicted volatility.

The motivation for the notion of “forward predicted volatility” comes from two sources.

First, the spot predicted volatility can be decomposed into the spot predicted volatility over

the first period and the forward predicted volatilities over all subsequent periods. Therefore,

the standard spot forecast accuracy curve provides useful, but at the same time misleading

information about the model’s ability to forecast volatility across a specific horizon. This is

because it creates a wrong impression that the predictive accuracy is evenly spread over the

total horizon; this impression is far from the reality. Therefore, much more useful information

about the model’s ability to forecast volatility across various horizons can be obtained from the

forward forecast accuracy curve. Second, the forward forecast accuracy curve is very relevant

in the context of FVAs because it measures directly the accuracy of a forward volatility forecast

and, therefore, allows one to understand the limits to the forward volatility predictability.

On the empirical side, by employing three popular yet simple time-series volatility models,2

we perform a comprehensive study on the horizon of volatility predictability. Specifically, using

data on 23 individual stocks, 39 world stock indices, 16 bond indices, 17 exchange rates, and 8

commodities, we estimate the term structure of volatility predictability for all major financial

markets. As a result, in our study we use a relatively large and broad data set of diverse

financial assets as compared to most of the existing studies on volatility forecasting. In contrast

to the previous studies that employ only a few financial assets, we do not report the results for each individual asset because each of these results might not be truly representative. Instead, we perform a so-called “meta-analysis” that combines the results for all individual assets that belong to the same asset class. A key benefit of this approach is that the aggregation of information on individual assets leads to higher statistical power and more robust forecast accuracy estimates than can be obtained from the information on any individual asset.

2 Each of these models captures different stylized facts about volatility.

Our empirical results are strikingly similar across the financial markets. Whereas the spot

volatility can be predicted over horizons that extend to 20-35 weeks, the horizon of the forward

volatility predictability is rather short and limited to 5-10 weeks. In other words, our results

on the forward forecast accuracy indicate that, regardless of the financial market, the horizon

of the spot (forward) volatility predictability is limited to 8 (2) months ahead. Therefore, the

horizon of volatility predictability is much shorter than the longest maturity of traded LEAPS

(39 months) and FVA (24 months) contracts.

Our third contribution consists in suggesting a plausible explanation for why standard

volatility models fail to provide sensible longer-horizon volatility forecasts. We argue that the

volatility dynamics do not fully correspond to the assumptions embedded in standard models,

which assume that volatility is highly persistent and mean-reverting. To support our argu-

ments, we suggest looking from a different perspective on the volatility dynamics. Specifically,

we propose to consider the volatility dynamics as a process which is constantly “switching”

between high and low volatility states. The interchanging periods of high and low volatility

can be regarded as waves of irregular cycle length and varying amplitude. Put differently, the

important but currently overlooked stylized fact about volatility is its oscillating dynamics:

volatility cycles erratically over time and periods of high or low volatility follow one another.

Using a few distinct financial assets, we detect the periods of rising and falling volatility and

present their summary statistics. We find that the statistics are strikingly similar across vari-

ous asset classes. Most importantly, our analysis reveals that the volatility is sometimes rather

non-persistent: its value can change dramatically over the course of a single day. In addition, we

show that the volatility does not revert to its long-run mean as the models assume.

The rest of the paper is organized as follows. Section 2 introduces the basic notions and

terminology used in the paper. Section 3 describes the data. The empirical methodology is

outlined in Section 4, which covers the volatility forecasting models, how we measure the forecast

accuracy, and how we conduct statistical inference about estimated forecast accuracies. Section

5 presents the empirical results. In the subsequent Section 6 we discuss the results and present

our view on the volatility dynamics. Finally, Section 7 summarizes and concludes the paper.

2 Basic Notions and Terminology

2.1 Spot and Forward Predicted Volatilities

Our notions of spot and forward predicted volatilities build on the notions of spot and forward

implied volatilities,3 see Taleb (1997, Chapter 9), Egelkraut, Garcia, and Sherrick (2007),

Glasserman and Wu (2011), and Corte, Sarno, and Tsiakas (2011). Let t denote the present

time and let $\{T_i\}$ be a set of forecast horizons such that $T_{i+1} > T_i$. Consider a time-series model that uses historical returns for volatility forecasting. Depending on the method of forecasting, such a model can predict the future volatility directly over the horizon of interest, $\hat{\sigma}_{t,t+T_i}$, where the subscript $t, t+T_i$ means “the time $t$ forecast of volatility from time $t+1$ till time $t+T_i$”. In other words, $\hat{\sigma}_{t,t+T_i}$ denotes the time $t$ predicted volatility over horizon $T_i$. We refer to $\hat{\sigma}_{t,t+T_i}$ as “the spot predicted volatility”. When there is no dependency in the time series of returns, using two spot predicted volatilities over horizons $T_i$ and $T_j$ (such that $T_j > T_i$) we can deduce the predicted volatility over the period from $t+T_i$ to $t+T_j$:

$$\hat{\sigma}^2_{t+T_i,\,t+T_j} = \hat{\sigma}^2_{t,\,t+T_j} - \hat{\sigma}^2_{t,\,t+T_i}. \tag{2.1}$$

We refer to $\hat{\sigma}_{t+T_i,t+T_j}$ as “the forward predicted volatility”. Specifically, the forward predicted volatility represents the time $t$ forecasted volatility between two future dates $t+T_i$ and $t+T_j$.

Alternatively, a time-series model can forecast the volatility over the horizon of interest by performing a rolling one-period ahead forecast. That is, first the model predicts the next period volatility $\hat{\sigma}_{t,t+T_1}$, then it predicts the volatility over the second period $\hat{\sigma}_{t+T_1,t+T_2}$, and so on until the last period volatility $\hat{\sigma}_{t+T_{i-1},t+T_i}$. In this case the spot predicted volatility over the total horizon $T_i$ is computed as

$$\hat{\sigma}^2_{t,\,t+T_i} = \hat{\sigma}^2_{t,\,t+T_1} + \hat{\sigma}^2_{t+T_1,\,t+T_2} + \ldots + \hat{\sigma}^2_{t+T_{i-1},\,t+T_i}. \tag{2.2}$$

The forward predicted volatility between two future dates $t+T_i$ and $t+T_j$ is computed as

$$\hat{\sigma}^2_{t+T_i,\,t+T_j} = \hat{\sigma}^2_{t+T_i,\,t+T_{i+1}} + \hat{\sigma}^2_{t+T_{i+1},\,t+T_{i+2}} + \ldots + \hat{\sigma}^2_{t+T_{j-1},\,t+T_j}. \tag{2.3}$$

3 It is worth noting that the notions of spot and forward implied volatilities are built, in turn, on the notions of spot and forward interest rates.
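As a minimal numerical sketch of equations (2.1) and (2.2), assuming no dependency in returns so that variances add across periods, the decomposition can be written in a few lines of Python. The function names and the 3% and 5% figures are illustrative assumptions and are not taken from the paper's data.

```python
import math


def forward_vol(spot_vol_near, spot_vol_far):
    """Forward predicted volatility between two horizons, following (2.1).

    Both inputs are spot predicted volatilities measured from time t;
    the far horizon must cover the near one.
    """
    forward_var = spot_vol_far ** 2 - spot_vol_near ** 2
    if forward_var < 0:
        raise ValueError("far-horizon variance is below near-horizon variance")
    return math.sqrt(forward_var)


def spot_vol_from_periods(period_vols):
    """Spot predicted volatility over the total horizon, following (2.2).

    period_vols holds the first-period spot volatility followed by the
    forward volatilities of the subsequent non-overlapping periods.
    """
    return math.sqrt(sum(v ** 2 for v in period_vols))


# Hypothetical forecasts: 3% over the first period, 5% over two periods.
sigma_first = 0.03
sigma_two_periods = 0.05

# Volatility implied for the second period alone.
sigma_forward = forward_vol(sigma_first, sigma_two_periods)

# Re-aggregating the two components recovers the two-period spot forecast.
sigma_rebuilt = spot_vol_from_periods([sigma_first, sigma_forward])
```

The point of the sketch is only that a spot forecast over a long horizon is an aggregate of one near-term component and a chain of forward components, which is why its accuracy can mask poor accuracy in the later components.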

2.2 Term Structure of Volatility Predictability

As a rule, the procedure for measuring the forecast accuracy across various horizons is performed as follows. First, one predicts the volatility across a set of horizons $\{T_i\}$, $\hat{\sigma}_{t,t+T_i}$. Second, one computes the realized volatility across the same set of horizons, $\sigma_{t,t+T_i}$. Finally, one compares the predicted and realized volatilities. For this purpose one usually employs some function $P_s(T_i)$ that computes the forecast accuracy for a given horizon $T_i$. The subscript $s$ in this function emphasizes that the function uses the spot predicted volatilities. Typically, the value of $P_s(T_i)$ is bounded from above by 1 (or 100%), meaning that if a model correctly predicts the volatility, then the forecast accuracy equals 100%. Usually the value of $P_s(T_i)$ is significantly less than 1, and it is known that the volatility forecastability decays with horizon. Therefore the function $P_s(T_i)$ is decreasing. We refer to the function $P_s(T_i)$ as “the forecast accuracy curve for the spot predicted volatility” or just “the spot forecast accuracy curve” for short.

The spot forecast accuracy curve provides useful, but at the same time misleading information about the model’s ability to forecast volatility across a specific horizon. Consider the following motivating example. Suppose you measure the model’s forecast accuracy in predicting the volatility over two periods, till time $T_2$. You estimate that the forecast accuracy, as measured by function $P_s(T_2)$, amounts to 50%. When you recall that the spot predicted volatility $\hat{\sigma}_{t,t+T_2}$ can be decomposed into the spot predicted volatility over the first period $\hat{\sigma}_{t,t+T_1}$ and the forward volatility over the second period $\hat{\sigma}_{t+T_1,t+T_2}$, the natural question to ask is: how accurate is the volatility forecast for each component? The forecast accuracy of 50% over two periods is a meaningful measure when both components, $\hat{\sigma}_{t,t+T_1}$ and $\hat{\sigma}_{t+T_1,t+T_2}$, are forecasted with about the same accuracy. In reality it might be the case that the value of 50% appears because the first component is forecasted with 100% accuracy, but the other component is forecasted with 0% accuracy.

We argue that the standard procedure for measuring the forecast accuracy across horizons creates the wrong impression that the predictive accuracy is evenly spread over the total horizon. However, this impression is far from reality. Therefore, much more useful information about the model’s ability to forecast volatility across various horizons can be obtained if one compares the forward predicted volatilities, $\hat{\sigma}_{t+T_{i-1},t+T_i}$, with the future realized volatilities over the same periods, $\sigma_{t+T_{i-1},t+T_i}$. For this purpose one can employ exactly the same function, $P_f(T_i)$, but in this case the function computes the volatility forecast accuracy for a set of (non-overlapping) future periods $\{t+T_{i-1}, t+T_i\}$. The subscript $f$ in this function emphasizes that the function uses the forward predicted volatilities. Function $P_f(T_i)$ has the same properties as the function $P_s(T_i)$: the value of the function is bounded from above by 1, and the function is decreasing with horizon. We refer to the function $P_f(T_i)$ as “the forecast accuracy curve for the forward predicted volatility” or just “the forward forecast accuracy curve” for short.

By the “term structure of volatility predictability” we mean the two curves, $P_s(T_i)$ and $P_f(T_i)$, that jointly provide information about the model’s ability to forecast volatility across various horizons. By augmenting the standard spot forecast accuracy curve with the forward forecast accuracy curve, one gets a better insight into how far into the future one can really rely on the model’s volatility forecasts.

3 Data

In our study we use historical data for the four major asset classes; two of the asset classes also

include sub-asset classes. These asset classes (sub-classes) include stocks (individual stocks

and stock market indices), bonds (intermediate- and long-term bonds), currencies, and com-

modities. All data come at the daily frequency and cover the period from January 1995 to

December 2016. The data for the two first asset classes (and sub-classes) are downloaded from

Yahoo Finance.4 The data for the two other asset classes are obtained from the Federal Re-

serve Economic Data (FRED), a database maintained by the Research division of the Federal

Reserve Bank of St. Louis.5 Table 1 lists the components of each data set.

The data set of stocks consists of prices of 23 individual stocks. These stocks represent

either current or previous components of the Dow Jones Industrial Average (DJIA) that have

price data in the whole sample period.6 The data set of stock market indices includes the prices of 39 major world stock indices.

4 https://finance.yahoo.com/.

5 https://fred.stlouisfed.org/.

6 The DJIA is an index of the prices of 30 large US corporations selected to represent a cross section of US industry. As of today, the components of the DJIA have changed 51 times since its beginning in 1896.

Table 1: Datasets: Asset classes, subclasses, and their components

| Asset class | Symbols |
|---|---|
| Individual stocks | AXP, BA, CAT, CSCO, CVX, DD, DIS, GE, HD, INTC, JNJ, JPM, KO, MCD, MMM, MRK, MSFT, NKE, PFE, TRV, VZ, WMT, XOM |
| Stock market indices | ^AORD, ^ATX, ^BFX, ^BTK, ^BVSP, ^DJA, ^DJI, ^DJT, ^DJU, ^FCHI, ^GDAXI, ^GSPC, ^GSPTSE, ^HSI, ^IXIC, ^MID, ^MXX, ^MXY, ^N225, ^NBI, ^NDX, ^NYA, ^OEX, ^PSE, ^RUA, ^RUI, ^RUT, ^SML, ^SOX, ^SSMI, ^TA100, ^UTY, ^XAU, ^XCI, ^XII, ^XMI, ^XNG, ^XOI, ^XTC |
| Intermediate-term bonds | VFIIX, VFITX, VFICX, VWEHX, VCAIX, VWITX |
| Long-term bonds | VBLTX, VUSTX, VWESX, VCITX, VWAHX, VWLTX, VNJTX, VNYTX, VOHIX, VPAIX |
| Exchange rates | DEXBZUS, DEXCAUS, DEXDNUS, DEXHKUS, DEXINUS, DEXJPUS, DEXKOUS, DEXNOUS, DEXSDUS, DEXSFUS, DEXSIUS, DEXUSAL, DEXUSNZ, DEXUSUK, DTWEXB, DTWEXM, DTWEXO |
| Commodities | DCOILWTICO, GOLDAMGBD228NLBM, DCOILBRENTEU, GOLDPMGBD228NLBM, DGASUSGULF, GOLDPMGBD229NLBM, DGASNYH, GOLDAMGBD229NLBM |

Notes: For stocks and bonds, Symbols refers to the symbols used in Yahoo Finance. For exchange rates and commodities, Symbols refers to the symbols used in the FRED database.

The two data sets of bonds consist of prices of 6 intermediate-term bond indices and

10 long-term bond indices respectively. All bond indices represent the most popular bond

index funds provided by the Vanguard Group.7 These intermediate-term (long-term) bond

indices offer a low-cost, diversified approach to bond investing, providing broad exposure to

US investment-grade bonds with maturities of about 6 (14) years.

The data set of currencies consists of 14 individual spot foreign exchange rates and 3

indices. Each of these indices represents a weighted average of the foreign exchange value of

the US dollar against a specific subset of the broad index currencies. Finally, the data set of

commodities includes daily spot prices on 8 commodities.

7 https://investor.vanguard.com. The Vanguard Group is one of the world’s largest and most respected investment companies. It provides low-cost passive funds for its clients to invest in.

4 Methodology

4.1 Volatility Forecasting Models

A number of stylized facts about the volatility of financial asset prices have emerged over the years and have been confirmed in numerous studies. Engle and Patton (2001) highlight the following stylized facts about volatility:

• Even though the volatility is not constant over time, the volatility exhibits persistence.

Simply put, volatility persistence means that period-to-period changes in volatility are

relatively small. In technical terms, volatility persistence means that the volatility process

exhibits a significant positive autocorrelation.

• Volatility tends to cluster through time. The volatility clustering occurs because large

moves in the price process tend to be followed by large moves (of either sign), and

small moves tend to be followed by small moves. The volatility clustering is commonly

considered to be yet another manifestation of volatility persistence.

• Volatility is mean-reverting. This means that a period of high (low) volatility will even-

tually be followed by a period of low (high) volatility. Mean reversion in volatility

is generally interpreted as meaning that there is a normal level of volatility to which

volatility will eventually return.
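The persistence property in the first two bullets can be illustrated numerically. The following is a minimal sketch, assuming returns simulated from a GARCH(1,1)-type recursion with hypothetical parameters (they are not estimates from the paper's data), and using absolute returns as a crude daily volatility proxy. The helper names `simulate_garch` and `autocorr` are introduced here for illustration only.

```python
import numpy as np


def simulate_garch(n, omega, alpha, beta, seed=0):
    """Simulate n zero-mean daily returns with GARCH(1,1)-type variance."""
    rng = np.random.default_rng(seed)
    var = omega / (1.0 - alpha - beta)  # start at the long-run variance
    returns = np.empty(n)
    for t in range(n):
        returns[t] = np.sqrt(var) * rng.standard_normal()
        var = omega + alpha * returns[t] ** 2 + beta * var
    return returns


def autocorr(x, lag):
    """Sample autocorrelation of x at the given positive lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))


# Hypothetical parameters chosen to give strongly persistent volatility.
simulated = simulate_garch(n=5000, omega=1e-6, alpha=0.08, beta=0.90)

# Persistence shows up as positive autocorrelation in the volatility proxy.
rho_abs = autocorr(np.abs(simulated), lag=1)
```

In a series with volatility clustering, `rho_abs` should come out positive: large absolute moves tend to follow large absolute moves, regardless of sign.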

To illustrate the above points, Figure 1 plots the historical volatility of the S&P 500 stock

price index over the period from January 1995 to December 2016. The graph in this plot

clearly illustrates that the volatility is persistent and mean-reverting. A prominent feature

of the dynamics of volatility is that it has many periodic peaks and the spikes in volatility

coincide with the periods of stock market turmoil. For instance, during the period of the

Global Financial Crisis of 2008, the volatility reached 75% (on an annual basis).

Figure 1: Historical volatility of the S&P 500 index

[Figure: time-series plot. Vertical axis: S&P 500 volatility, %; horizontal axis: years 1995 to 2015.]

Notes: The volatility is estimated using daily returns and the EWMA model with λ = 0.96. The volatility is expressed on an annual basis in percentage points.

In our study we use three simple yet different models to forecast volatility. Each of these

models captures some of the stylized facts about volatility. Specifically, given the daily re-

turns of a financial asset as input, we implement the EWMA model, the GARCH model,

and the HAR model to produce $T$-day ahead volatility forecasts, where $T$ takes values in $\{5, 10, 15, 20, \ldots, 170, 175\}$. Our convention is that there are 5 trading days per week. Consequently, we forecast volatility for 1-35 weeks in the future. We assume that the daily logarithmic return process of any asset is given by

$$r_t = \mu + \sigma_t \varepsilon_t, \tag{4.1}$$

where $\mu$ is the daily long-run mean of $r_t$, $\sigma_t$ is the daily volatility, and $\varepsilon_t$ is a white noise process with zero mean and unit variance.

4.1.1 The EWMA Model

Our first volatility forecasting model is the well-known Exponentially Weighted Moving Average (EWMA) model popularized by the RiskMetrics™ group (Longerstaey and Spencer (1996)). The one-step ahead forecasting equation in this model is given by

$$\hat{\sigma}^2_{t,t+1} = (1-\lambda)\, r_t^2 + \lambda\, \hat{\sigma}^2_{t-1,t}, \tag{4.2}$$

where $\lambda$ is the so-called “decay factor”. We estimate the optimal decay factor for each asset by minimizing the Mean Squared Error (MSE) of the daily forecast.8 When the length of the forecast horizon $T_i$ is greater than one day, the multi-step ahead volatility forecast is performed using the square root of time rule

$$\hat{\sigma}_{t,t+T_i} = \hat{\sigma}_{t,t+1} \sqrt{T_i}. \tag{4.3}$$

Observe that the EWMA model assumes that the volatility is highly persistent. That is, the expected volatility over all future periods equals the forecasted volatility for the subsequent period.
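The EWMA recursion (4.2), the square root of time rule (4.3), and the choice of the decay factor can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the recursion is seeded with the sample mean of squared returns, a simple grid search stands in for whatever optimiser was actually used, the daily MSE is measured on volatilities with $|r_t|$ as the proxy, and the returns are simulated.

```python
import numpy as np


def ewma_variance_path(returns, lam):
    """One-step-ahead EWMA variance forecasts, following recursion (4.2).

    out[t] is the variance forecast for day t+1 made after seeing returns[t].
    Assumption: the recursion is seeded with the mean squared return.
    """
    squared = np.asarray(returns, dtype=float) ** 2
    out = np.empty_like(squared)
    prev = squared.mean()
    for t, r2 in enumerate(squared):
        prev = (1.0 - lam) * r2 + lam * prev
        out[t] = prev
    return out


def ewma_spot_vol(returns, lam, horizon_days):
    """T-day spot predicted volatility via the square root of time rule (4.3)."""
    one_day_vol = np.sqrt(ewma_variance_path(returns, lam)[-1])
    return float(one_day_vol * np.sqrt(horizon_days))


def fit_decay_factor(returns, grid=None):
    """Pick lambda on a grid by minimising the MSE of the daily forecast,
    with |r_t| as the proxy of day-t realized volatility."""
    r = np.asarray(returns, dtype=float)
    if grid is None:
        grid = np.arange(0.80, 0.996, 0.005)  # hypothetical search range
    best_lam, best_mse = None, np.inf
    for lam in grid:
        # Drop the last forecast: there is no next-day return to score it.
        vol_forecast = np.sqrt(ewma_variance_path(r, lam))[:-1]
        mse = np.mean((vol_forecast - np.abs(r[1:])) ** 2)
        if mse < best_mse:
            best_lam, best_mse = float(lam), mse
    return best_lam


# Simulated returns, for illustration only.
rng = np.random.default_rng(0)
sample_returns = rng.normal(0.0, 0.01, size=500)
lam_hat = fit_decay_factor(sample_returns)
vol_four_weeks = ewma_spot_vol(sample_returns, lam_hat, horizon_days=20)
```

Because (4.3) scales a single one-day forecast, every forward period inherits the same daily variance, which is the sense in which the EWMA model treats volatility as fully persistent.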

4.1.2 The GARCH Model

We employ the most widely used Generalized AutoRegressive Conditional Heteroskedasticity model, GARCH(1,1), proposed by Bollerslev (1986), as the second alternative volatility forecasting model. In this model the latent daily volatility is assumed to evolve according to the following process

$$\sigma^2_{t+1} = \omega + \alpha (r_t - \mu)^2 + \beta \sigma^2_t, \tag{4.4}$$

where the coefficients α, β, and ω are estimated using daily returns in (4.1) by the maximum

likelihood method. Observe that the EWMA model is a special case of the GARCH(1,1) model

where ω = 0 and α = 1−β. The volatility in the GARCH(1,1) model is not only persistent, but

also mean reverting. The persistence is measured by α+ β. When persistence equals 1, there

is no mean-reversion as in the EWMA model. In a stable GARCH(1,1) model, α+β < 1. The

higher the persistence, the slower the reversion to the mean is. The so-called half-life of mean

reversion, defined as ln(0.5)/ ln(α + β), measures the average time it takes volatility to move

halfway towards its long-term average. In stock markets, a typical estimate of the volatility

half-life amounts to 15 weeks (Engle and Patton (2001)).
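To make the half-life formula concrete, the short sketch below converts between persistence $\alpha + \beta$ and half-life in trading days. The function names are illustrative, and the conversion of 15 weeks into 75 trading days relies on the paper's convention of 5 trading days per week.

```python
import math


def half_life(alpha, beta):
    """Half-life of mean reversion in days: ln(0.5) / ln(alpha + beta)."""
    persistence = alpha + beta
    if not 0.0 < persistence < 1.0:
        raise ValueError("half-life is defined only for 0 < alpha + beta < 1")
    return math.log(0.5) / math.log(persistence)


def persistence_for_half_life(days):
    """Persistence alpha + beta that produces the given half-life in days."""
    return 0.5 ** (1.0 / days)


# A 15-week half-life is 75 trading days at 5 trading days per week.
implied_persistence = persistence_for_half_life(75)

# Hypothetical split of that persistence into alpha and beta.
alpha_example = 0.05
beta_example = implied_persistence - alpha_example
check_days = half_life(alpha_example, beta_example)
```

A 75-day half-life corresponds to a persistence of roughly 0.991, which shows how close to the EWMA limit of 1 typical stock market estimates are.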

The one-step ahead volatility forecast for day $t+1$ is given by equation (4.4). The volatility for day $t+2$ is forecasted using the fact that $E\left[(r_{t+1} - \mu)^2\right] = \sigma^2_{t+1}$. As a result, the multi-step volatility forecast is based on rolling one-day ahead volatility forecasts from day $t+1$ to day $t+T_i$

$$\hat{\sigma}^2_{t+i-1,\,t+i} = \begin{cases} \omega + \alpha (r_t - \mu)^2 + \beta \sigma^2_t & \text{if } i = 1, \\ \omega + (\alpha + \beta)\, \hat{\sigma}^2_{t+i-2,\,t+i-1} & \text{if } i > 1. \end{cases} \tag{4.5}$$

The forecasted $T_i$-day volatility is computed as the square root of the sum of daily forecasted variances

$$\hat{\sigma}_{t,t+T_i} = \sqrt{\sum_{j=1}^{T_i} \hat{\sigma}^2_{t+j-1,\,t+j}}. \tag{4.6}$$

8 We use $\sqrt{r_t^2} = |r_t|$ as the proxy of day $t$ realized volatility.
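The rolling forecast in (4.5) and its aggregation in (4.6) can be sketched as below. The parameter values are hypothetical placeholders, not maximum likelihood estimates, and the estimation step itself is outside the scope of this sketch.

```python
import numpy as np


def garch_spot_vol(omega, alpha, beta, mu, last_return, last_var, horizon_days):
    """Multi-step GARCH(1,1) forecast.

    Returns the T-day spot predicted volatility from (4.6) together with
    the path of daily variance forecasts from (4.5).
    """
    daily_var = np.empty(horizon_days)
    # i = 1: the one-step forecast uses the last observed shock and variance.
    daily_var[0] = omega + alpha * (last_return - mu) ** 2 + beta * last_var
    # i > 1: the unknown future shock is replaced by its expected value.
    for i in range(1, horizon_days):
        daily_var[i] = omega + (alpha + beta) * daily_var[i - 1]
    return float(np.sqrt(daily_var.sum())), daily_var


# Hypothetical parameters with long-run variance omega / (1 - alpha - beta).
omega, alpha, beta, mu = 1e-6, 0.05, 0.90, 0.0
long_run_var = omega / (1.0 - alpha - beta)

# Start from a calm day: the variance forecasts rise towards the long-run level.
vol_10d, var_path = garch_spot_vol(
    omega, alpha, beta, mu,
    last_return=0.0, last_var=0.5 * long_run_var, horizon_days=10,
)
```

Each forward day's variance in `var_path` moves a fraction of the way back to the long-run variance, which is the mean reversion that the GARCH model builds into every longer-horizon forecast.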

4.1.3 The HAR Model

The persistence and mean-reversion of volatility can be captured by a simple Auto-Regressive

(AR(1)) model for volatility

$$\sigma_{t+1} = \beta_0 + \beta_1 \sigma_t + \varepsilon_t, \tag{4.7}$$

where $\beta_0$ and $\beta_1$ are real constants and $\varepsilon_t \sim \text{i.i.d.}(0, \sigma^2_\varepsilon)$. For example, French, Schwert, and

Stambaugh (1987) use both the AR(1) and GARCH(1,1) models to predict the stock market

volatility and report that both the models have about the same forecast accuracy. Because

both the AR(1) and GARCH(1,1) models have the same forecast accuracy, but in the AR(1)

model the volatility can in principle be negative, the AR(1) model is virtually never used in

practical applications.

In the context of Realized Volatility9 (RV), Corsi (2009) introduces a Heterogeneous Auto-

Regressive model of Realized Volatility (HAR-RV) and shows that it has a superior forecast

accuracy compared to a set of alternative models. Inspired by Corsi, we adapt his HAR-RV

model for forecasting volatility using daily data only. This model is a simple auto-regressive

type model where the volatility is forecasted using several past volatilities realized over different

time horizons. When only daily data are available, we simply replace the realized volatilities

computed using past intraday returns with those computed using past daily returns. Therefore, we label our model the HAR model; it is specified by

$$\sigma_{t,t+T} = \beta_{0,T} + \beta_{1,T}\sigma_t + \beta_{2,T}\sigma_{t-4,t} + \beta_{3,T}\sigma_{t-20,t} + \beta_{4,T}\sigma_{t-62,t} + \beta_{5,T}\sigma_{t-125,t} + \varepsilon_t, \tag{4.8}$$

where $\sigma_{t-\tau,t}$ is the realized volatility over $\tau + 1$ days from day $t - \tau + 1$ till day $t$ (by our convention $\sigma_{t-1,t} = \sigma_t$)

$$\sigma_{t-\tau,t} = \sqrt{\sum_{i=0}^{\tau-1} r^2_{t-i}}, \qquad \tau + 1 \in \{1, 5, 21, 63, 126\}. \tag{4.9}$$

9 The “realized volatility” is the estimator of daily volatility computed using intraday returns.

The HAR model extends the AR(1) model (given by equation (4.7)) by adding several regres-

sors. The volatility in the HAR model is also persistent and mean-reverting.10 The inclusion

of lags of realized volatility aggregated over different time horizons improves the forecast ac-

curacy. One possible explanation of this fact can be found in the paper by Engle and Rangel

(2008). In particular, these authors convincingly demonstrate that the volatility dynamics

have two components: a high-frequency component that can be captured by a GARCH pro-

cess and a low-frequency component that can be explained by macroeconomic factors. Because

of the presence of a low-frequency component in volatility, the GARCH model alone is not able

to provide sensible longer-horizon volatility forecasts. In contrast, the HAR model allows, to

some extent, to capture the high- and low-frequency components in the volatility dynamics.

In the original HAR-RV model by Corsi (2009), σt−τ,t is estimated using intraday data. In

addition, in his original model Corsi uses only daily, weekly, and monthly realized volatilities as

regressors; the volatility is forecasted for one day only. Since in our empirical study we forecast

future volatility up to 35-week horizon, we augment the original model by two additional

regressors: 3-month and 6-month past realized volatilities. We assume that there are 21

trading days per month. Also note that our HAR model is used to perform a T -day ahead

volatility forecast where T ≥ 1. Therefore the βi coefficients are re-estimated (by OLS) for

each specific horizon T , hence we use notation βi,T .

10Note that the unconstrained HAR model suffers from the same drawback as the AR(1) model: the forecasted volatility can in principle be negative. However, the main justification for using this model is that it provides a superior long-horizon forecast accuracy as compared to that of the GARCH model.


4.2 Measuring Forecast Accuracy

The standard procedure for assessing the forecast accuracy starts with evaluating the forecast

errors

$$e_{t,t+T} = \hat{\sigma}_{t,t+T} - \sigma_{t,t+T}, \tag{4.10}$$

which is the difference between the T -day forecasted volatility σ̂t,t+T and the realized volatility

σt,t+T . To measure the forecast accuracy, most often researchers use the MSE because it is

robust to the estimation error in the volatility proxy. The MSE is computed according to

$$MSE = \frac{1}{M} \sum_{t=1}^{M} e_{t,t+T}^2, \tag{4.11}$$

where M is the number of T -day ahead volatility forecasts in the out-of-sample period. Quite

obviously, the best forecasting model is the one with the smallest MSE.

However, the MSE measure has several drawbacks. In the context of our empirical study,

two of them deserve mentioning. First, the MSE measures the absolute size of the squared

forecast errors and, therefore, it is scale-dependent (see Hyndman and Athanasopoulos (2013)).

As a result, the MSE can be used for comparing forecasting models on a single dataset only.

Second, even though the MSE allows comparing alternative forecasting models, it does not

allow measuring predictive accuracy per se. For example, if the volatility over some forecast

horizon is unpredictable, all model forecasts are likely to be worthless. In this case using

the MSE criterion (to select the best model among the poor ones) creates the illusion of

predictability when none is present.

To overcome the first drawback and obtain a scale-free measure of forecast accuracy, one

possibility is to compute the following ratio

$$\frac{\sum_{t=1}^{M} e_{t,t+T}^2}{\sum_{t=1}^{M} (\sigma_{t,t+T} - \bar{\sigma}_{t,t+T})^2}, \tag{4.12}$$

where σ̄t,t+T is the mean value of the realized volatility in the out-of-sample period

$$\bar{\sigma}_{t,t+T} = \frac{1}{M} \sum_{t=1}^{M} \sigma_{t,t+T}. \tag{4.13}$$


The ratio given by (4.12) compares the sum of squared forecast errors with the sum of squared

variations of σt,t+T . Hence, this ratio measures the relative size of the forecast errors and

can be used for comparing forecasting models on several datasets. Still, this ratio retains the

second drawback because a particular value of this ratio is difficult to interpret.

Galbraith (2003) suggests measuring the forecast accuracy using the following function

$$C = 1 - \frac{MSE}{MSE_{BM}}, \tag{4.14}$$

where MSEBM is the MSE of a selected Benchmark Model (BM), for example, the historical

mean model.11 Again, function C can be used for comparing forecasting models on several

datasets, but it seems to retain the second drawback because a selected benchmark model may

be totally useless for forecasting.

To overcome both drawbacks, we employ the proportion of variance explained by the

forecasts (this measure is proposed by Blair, Poon, and Taylor (2001)):

$$P_s(T_i) = 1 - \frac{\sum_{t=1}^{M} e_{t,t+T_i}^2}{\sum_{t=1}^{M} (\sigma_{t,t+T_i} - \bar{\sigma}_{t,t+T_i})^2}. \tag{4.15}$$

Notice that this measure equals one minus the ratio given by (4.12), therefore it is a scale-free

measure as well. Observe in addition that the computation of P is similar to the computation of

the out-of-sample R-squared (R2) in the constrained linear regression model with zero intercept

and unit slope. Therefore, the computation of P can be interpreted as

$$P = 1 - \frac{MSE}{TSS}, \tag{4.16}$$

where TSS denotes the Total Sum of Squares. Consequently, the value of P can be conveniently

reported in percentages.

It is worth noting that the smaller the respective MSE, the closer P to 100%. Therefore

this measure allows evaluating predictive accuracy per se. Given that P is equivalent to an R2

in the restricted model, it is likely to be smaller than conventional R2. The value of P can even

be negative since the ratio given by (4.12) can be greater than 1. A negative P indicates that

the forecast errors have a greater amount of variations than the actual volatility, which means

11In the historical mean model, the forecasted volatility equals the sample mean of the historical volatility.


that a forecasting model does not have any predictive power. The subscript s in this function

emphasizes that the function uses the spot predicted volatilities. Thus, in our definition, this

function produces the spot forecast accuracy curve.
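As a small illustration of (4.15), the proportion of variance explained can be computed directly from paired forecasts and realized volatilities. The function name and the toy numbers are hypothetical; the sketch only assumes that both series refer to the same horizon and the same out-of-sample dates.

```python
def forecast_accuracy(forecasts, realized):
    """Proportion of variance explained by the forecasts:
    one minus the sum of squared forecast errors divided by the
    sum of squared deviations of realized volatility from its mean."""
    if len(forecasts) != len(realized) or len(realized) < 2:
        raise ValueError("need two equally long series with at least 2 points")
    mean_realized = sum(realized) / len(realized)
    sse = sum((f - r) ** 2 for f, r in zip(forecasts, realized))
    tss = sum((r - mean_realized) ** 2 for r in realized)
    return 1.0 - sse / tss


realized = [10.0, 20.0, 30.0, 40.0]
perfect = forecast_accuracy(realized, realized)          # forecasts equal outcomes
constant = forecast_accuracy([25.0] * 4, realized)       # forecasting the sample mean
reversed_ = forecast_accuracy(realized[::-1], realized)  # systematically wrong forecasts
```

A perfect forecast gives 1 (100%), forecasting the out-of-sample mean gives 0, and the reversed forecast gives a negative value, which matches the interpretation of a negative P as absence of predictive power.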

The forward forecast accuracy curve is defined by

$$P_f(T_i) = 1 - \frac{\sum_{t=1}^{M} e_{t+T_i-1,t+T_i}^2}{\sum_{t=1}^{M} (\sigma_{t+T_i-1,t+T_i} - \bar{\sigma}_{t+T_i-1,t+T_i})^2}, \tag{4.17}$$

where

$$e_{t+T_i-1,t+T_i} = \hat{\sigma}_{t+T_i-1,t+T_i} - \sigma_{t+T_i-1,t+T_i} \tag{4.18}$$

is the error in forecasting forward volatilities.

4.3 Statistical Inference

We conduct inference about estimated forecast accuracies. Specifically, we test the following

null hypothesis for both the spot and forward forecast accuracy:

H0 : P (Ti) ≤ 0. (4.19)

In words, the null hypothesis assumes the absence of predictive ability over horizon of length

Ti. We illustrate the computation of the p-value of the hypothesis for the spot forecast accu-

racy. The computation of the p-value of the corresponding hypothesis for the forward forecast

accuracy is conducted along similar lines.

We remind the reader that the spot forecast accuracy over a horizon of length Ti is computed

as

$$P_s(T_i) = 1 - \frac{MSE}{TSS} = 1 - \frac{\sum_{t=1}^{M} e_{t,t+T_i}^2}{\sum_{t=1}^{M} \varepsilon_{t,t+T_i}^2}, \tag{4.20}$$

where

$$\varepsilon_{t,t+T_i} = \sigma_{t,t+T_i} - \bar{\sigma}_{t,t+T_i}. \tag{4.21}$$

Therefore, the null hypothesis can alternatively be formulated as

$$H_0 : \frac{MSE}{TSS} \geq 1. \tag{4.22}$$


In words, under the null hypothesis the MSE is greater than or equal to the TSS. Consequently, we

reject the null hypothesis when the MSE is significantly below the TSS.

If the time series of et,t+Ti and εt,t+Ti are assumed to be Gaussian, serially uncorrelated,

and contemporaneously uncorrelated, then the ratio MSE/TSS under the null hypothesis has the

usual F -distribution. However, in our case, the assumptions listed above are not met. First,

because we perform a multi-step ahead volatility forecasting, the time series of et,t+Ti and

εt,t+Ti are serially correlated. Second, the time series of et,t+Ti and εt,t+Ti are

contemporaneously correlated.12 Finally, the assumption of Gaussian errors also seems to be inappropriate.

Therefore, to compute the p-value of the null hypothesis we employ the block bootstrap ap-

proach.

Each bootstrap trial consists of two steps. First, using the original time series $\{e_{1,1+T_i}, e_{2,2+T_i}, \ldots, e_{M,M+T_i}\}$ and $\{\varepsilon_{1,1+T_i}, \varepsilon_{2,2+T_i}, \ldots, \varepsilon_{M,M+T_i}\}$ we construct two re-samples $\{e^*_{1,1+T_i}, e^*_{2,2+T_i}, \ldots, e^*_{M,M+T_i}\}$ and $\{\varepsilon^*_{1,1+T_i}, \varepsilon^*_{2,2+T_i}, \ldots, \varepsilon^*_{M,M+T_i}\}$ using the stationary block-bootstrap method of Politis and Romano (1994). The optimal block length is selected automatically using the method proposed by Politis and White (2004).13 The two re-sampled time series retain not only the historical serial correlations, but also the historical contemporaneous correlation between the original time series. The latter is achieved by ensuring that in the re-sampled data the pair $\{e^*_{t,t+T_i}, \varepsilon^*_{t,t+T_i}\}$ corresponds to the pair of original observations $\{e_{\tau,\tau+T_i}, \varepsilon_{\tau,\tau+T_i}\}$ at some time $\tau$. Second, using the bootstrapped data we compute MSE*, TSS*, and finally the fraction MSE*/TSS*. We estimate the sampling distribution of the ratio MSE/TSS by carrying out N = 1000 bootstrap trials in total. Finally, to estimate the significance level, we count how many times the simulated value of the ratio MSE*/TSS* is greater than or equal to 1. Denote this count by n. The p-value of the predictive ability test (over a horizon of length Ti) is computed as p(Ti) = n/N.
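The two bootstrap steps can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the authors' implementation: the mean block length is passed in by hand (the automatic Politis and White selection is not reproduced), the function names are hypothetical, and the condition MSE*/TSS* ≥ 1 is checked as MSE* ≥ TSS* to avoid dividing by zero.

```python
import random


def stationary_bootstrap_indices(m, mean_block, rng):
    """Index sequence for one stationary-bootstrap re-sample:
    blocks start at random positions, have geometrically distributed
    lengths with the given mean, and wrap around the end of the sample."""
    p_new = 1.0 / mean_block
    idx = [rng.randrange(m)]
    while len(idx) < m:
        if rng.random() < p_new:
            idx.append(rng.randrange(m))       # start a new block
        else:
            idx.append((idx[-1] + 1) % m)      # continue the current block
    return idx


def bootstrap_pvalue(e, eps, mean_block=10.0, n_trials=1000, seed=0):
    """Share of bootstrap trials in which the re-sampled sum of squared
    forecast errors is at least the re-sampled total sum of squares."""
    if len(e) != len(eps):
        raise ValueError("e and eps must have the same length")
    rng = random.Random(seed)
    m = len(e)
    count = 0
    for _ in range(n_trials):
        # The same indices are used for both series, so each re-sampled
        # pair corresponds to an original pair observed at the same time.
        idx = stationary_bootstrap_indices(m, mean_block, rng)
        sse_star = sum(e[i] ** 2 for i in idx)
        tss_star = sum(eps[i] ** 2 for i in idx)
        if sse_star >= tss_star:
            count += 1
    return count / n_trials
```

Because both sums run over the same M observations, comparing the sums is equivalent to comparing MSE* with TSS* in the sense of (4.20).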

Even though we compute the forecast accuracy curves and corresponding p-values for each

individual financial asset, we do not report the results for each individual asset because each of

these results might not be truly representative. Instead, we perform a so-called “meta-analysis”

that combines the results for all individual assets that belong to the same asset class. A key

12For the sake of illustration, suppose that at some time the volatility increases dramatically. Subsequently, as a result of this spike in volatility, both the forecast error and the difference between the current volatility and the long-run volatility increase.

13See also the subsequent correction of the method by Patton, Politis, and White (2009).


benefit of this approach is that the aggregation of information on individual assets leads to

a higher statistical power and more robust forecast accuracy estimates than it is possible to

obtain from the information on any individual asset.

The empirical results on the forecast accuracy curves are reported for each asset class by

averaging the forecast accuracy curves for individual assets. Specifically, denoting by k the

number of individual assets that belong to the same asset class, we compute

$$P(T_i) = \frac{1}{k} \sum_{j=1}^{k} P_j(T_i), \tag{4.23}$$

where Pj(Ti) denotes the (spot or forward) forecast accuracy over horizon of length Ti for asset

j. Similarly, for each asset class we combine the results of multiple tests of the null hypothesis

to ask whether there is evidence from the collection of individual tests that might reject the

null hypothesis. In other words, we combine k p-values for individual assets that belong to the

same asset class to test whether collectively they can reject a common null hypothesis of no

predictive ability.

When the p-values of individual tests are independent, Fisher’s method (Fisher (1925)) of

combining the probabilities is asymptotically optimal among essentially all methods of com-

bining the results of independent tests (Littell and Folks (1971)). The method is to compute

the following test statistic

$$\Psi(T_i) = \sum_{j=1}^{k} -2 \log p_j(T_i), \tag{4.24}$$

where pj(Ti) denotes the p-value of the volatility unpredictability hypothesis over horizon of

length Ti for asset j. Fisher demonstrated that for independent p-values the statistic Ψ(Ti)

follows a chi-squared distribution with 2k degrees of freedom, $\Psi(T_i) \sim \chi^2_{2k}$.
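For independent p-values, Fisher's statistic and its combined p-value can be sketched without any statistical library, because the chi-squared survival function has a closed form when the degrees of freedom are even. The function names are hypothetical, and the requirement that every p-value be strictly positive is an assumption of this sketch (a bootstrap p-value of exactly zero would need separate handling).

```python
import math


def fisher_statistic(pvalues):
    """Sum of -2 log p over the individual tests."""
    if any(p <= 0.0 or p > 1.0 for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    return sum(-2.0 * math.log(p) for p in pvalues)


def chi2_sf_even(x, k):
    """P(X > x) for a chi-squared variable with 2k degrees of freedom,
    k a positive integer: exp(-x/2) * sum_{i<k} (x/2)^i / i!."""
    half = x / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total


def fisher_combined_pvalue(pvalues):
    """Combined p-value under independence of the individual tests."""
    return chi2_sf_even(fisher_statistic(pvalues), len(pvalues))
```

With a single test the combined p-value equals the input, and two tests with p = 0.05 each combine to roughly 0.017.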

Brown (1975) extended Fisher’s method to the dependent case where p-values are

correlated. In the dependent case, the statistic Ψ(Ti) has the following mean and variance

$$E[\Psi(T_i)] = 2k, \qquad Var[\Psi(T_i)] = 4k + 2 \sum\sum_{m<j} Cov\left(-2 \log p_m(T_i), -2 \log p_j(T_i)\right), \tag{4.25}$$

where Cov(x, y) represents the covariance between x and y. Brown’s method is based on

the assumption that the distribution of $\Psi(T_i)$ can be approximated by that of $c\chi^2_{2f}$, where

c represents a re-scaling constant and $\chi^2_{2f}$ is a chi-squared distribution with 2f degrees of

freedom. Brown calculated c and f by equating the first two moments of $\Psi(T_i)$ and $c\chi^2_{2f}$,

resulting in

$$f = \frac{E[\Psi(T_i)]^2}{Var[\Psi(T_i)]}, \qquad c = \frac{Var[\Psi(T_i)]}{2E[\Psi(T_i)]} = \frac{k}{f}. \tag{4.26}$$

The combined p-value is then given by

$$p(T_i) = 1.0 - \Phi_{2f}\left(\frac{\Psi(T_i)}{c}\right), \tag{4.27}$$

where $\Phi_{2f}$ is the cumulative distribution function of $\chi^2_{2f}$.
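The moment matching in (4.25) to (4.27) can be sketched as follows. This is a hedged illustration: the pairwise covariances of the −2 log p terms are taken as an input matrix rather than approximated, the function names are hypothetical, and the chi-squared survival function for non-integer degrees of freedom is evaluated with a plain series for the regularized incomplete gamma function, which is adequate only for moderate arguments.

```python
import math


def chi2_sf(x, df):
    """P(X > x) for a chi-squared variable with (possibly non-integer) df,
    using the series expansion of the regularized lower incomplete gamma."""
    if x <= 0.0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 0
    while term > 1e-16 * total:
        n += 1
        term *= z / (a + n)
        total += term
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def brown_combined_pvalue(pvalues, cov):
    """Combined p-value for dependent tests.
    cov[m][j] is the covariance between -2 log p_m and -2 log p_j;
    only the entries with m < j are used."""
    k = len(pvalues)
    psi = sum(-2.0 * math.log(p) for p in pvalues)
    mean = 2.0 * k
    var = 4.0 * k + 2.0 * sum(cov[m][j] for j in range(k) for m in range(j))
    f = mean ** 2 / var          # the approximating chi-squared has 2f degrees of freedom
    c = var / (2.0 * mean)       # re-scaling constant, equal to k / f
    return chi2_sf(psi / c, 2.0 * f)
```

With all covariances set to zero the result coincides with Fisher's method (f = k, c = 1); positive covariances reduce f, increase c, and therefore produce a larger, more conservative combined p-value.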

The covariances in (4.25) can be evaluated either by numerical integration or by

Gaussian quadrature. We follow Brown’s original method and use a Gaussian quadrature

that approximates the covariances by two quadratic functions of the correlation coefficient

ρ(−2 log pm(Ti),−2 log pj(Ti)). The problem is that we do not have data on the correlation

coefficients between two individual p-values. However, since any such correlation coefficient is

a function of the correlation between forecast accuracies of assets m and j, we assume that

these correlations can be estimated using the time series of squared errors. That is, we assume

that

$$\rho\left(-2 \log p_m(T_i), -2 \log p_j(T_i)\right) = \rho\left((e^m_{t,t+T_i})^2, (e^j_{t,t+T_i})^2\right),$$

where $e^m_{t,t+T_i}$ and $e^j_{t,t+T_i}$ are forecast errors for assets m and j respectively. We find that the

time series of squared forecast errors exhibit significant positive correlations. For example, for

the data set of stocks, depending on the length of the forecast horizon Ti the average correlation

coefficient varies from 0.25 to 0.55.

5 Empirical Results

We remind the reader that our total sample covers the period from January 1995 to December

2016. The period from January 1995 to December 1999 (5 years) is used as the initial

in-sample period. Consequently, the out-of-sample period in our study is from January 2000 to

December 2016 (17 years), which covers several alternating calm and turbulent periods. All

forecasts are obtained using an expanding window scheme. Given a fixed forecasting model


and a fixed forecast horizon, we perform out-of-sample volatility forecasting for every asset

in a selected data set. Specifically, first the parameters of a model are estimated using in-

sample observations [1, 2, . . . , t]. Then the future volatility is forecasted for T -days ahead, T ∈

{5, 10, 15, 20, . . . , 170, 175}. After that, we expand the in-sample period by one day (it becomes

[1, 2, . . . , t + 1]) and repeat the forecasting procedure. Since estimation of the parameters of

each forecasting model is rather time consuming, to speed up the forecasting process we

re-estimate the model’s parameters every 50 days only. At the end of this forecasting process, we

compute the spot and forward forecast accuracy for each asset in a data set, as well as the

corresponding p-values of the predictive ability test. Finally, we compute the average spot and

forward forecast accuracy over all assets in a data set, as well as the p-values of the combined

probability tests.
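The expanding-window scheme with periodic re-estimation can be sketched as a simple loop. The `fit` and `predict` callables and the toy model below are hypothetical placeholders standing in for any of the forecasting models; only the loop structure (expand by one day, re-estimate every 50 days) follows the description above.

```python
import math


def expanding_window_forecasts(returns, first_t, horizon, fit, predict,
                               refit_every=50):
    """Forecast `horizon` days ahead from every day t >= first_t.
    Parameters are re-estimated on returns[0..t] every `refit_every`
    days; in between, the latest estimates are reused."""
    forecasts = {}
    params = None
    for step, t in enumerate(range(first_t, len(returns) - horizon)):
        history = returns[: t + 1]          # in-sample observations up to day t
        if step % refit_every == 0:
            params = fit(history)
        forecasts[t] = predict(params, history, horizon)
    return forecasts


# Toy model used only to make the sketch runnable (an assumption, not one
# of the models in the study): constant variance, scaled with the horizon.
def fit_mean_variance(history):
    return sum(r * r for r in history) / len(history)


def predict_sqrt_time(params, history, horizon):
    return math.sqrt(horizon * params)
```

For example, `expanding_window_forecasts(returns, 1250, 5, fit_mean_variance, predict_sqrt_time)` would produce one-week-ahead forecasts starting after roughly five years of daily data.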

For each financial asset class and sub-class, Figures 2 - 7 plot the average spot and forward

forecast accuracy curves produced by three volatility forecasting models: EWMA, GARCH,

and HAR. The same figures plot the corresponding p-values of the combined probability test

for each forecasting model. Specifically, in each figure the top left panel plots the average

spot forecast accuracy curves, whereas the top right panel plots the average forward forecast

accuracy curves. The forecast accuracy is reported in percentages. The forecast horizon varies

from 1 to 35 (20) weeks for the spot (forward) forecast accuracy. The bottom left panel

plots the p-value of the combined probability test for the spot forecast accuracies, while the

bottom right panel plots the p-values of the combined probability test for the forward forecast

accuracies. The dashed horizontal lines in the bottom panels show the locations of the 5% and

10% significance levels. A p-value below the chosen significance level leads to a rejection of the

volatility unpredictability hypothesis.


Figure 2: Individual stocks.

[Figure: four panels plotted against the forecast horizon in weeks. Top left: spot forecast accuracy, %. Top right: forward forecast accuracy, %. Bottom left: spot forecast p-value. Bottom right: forward forecast p-value. Models: EWMA, GARCH, HAR.]

Notes: The top left panel plots the average spot forecast accuracy curves, whereas the top right panel plots the average forward forecast accuracy curves. The bottom left panel plots the p-value of the combined probability test for the spot forecast accuracies, while the bottom right panel plots the p-values of the combined probability test for the forward forecast accuracies.

The main question of our study is how far ahead one can forecast volatility. Perhaps not

surprisingly, the answer to this question depends on whether the spot or forward forecast accu-

racy curve is used. If the spot forecast accuracy curve is used to gauge the limits to volatility

predictability, depending on the forecasting model, asset class, and chosen significance level,

the volatility can be predicted over horizons ranging from 5 weeks to 35 weeks. In contrast,

the forward forecast accuracy curves (together with the corresponding p-values) indicate that

volatility can be predicted only over horizons ranging from 3 to 12 weeks. On average, at the

5% significance level, the horizon of forward volatility predictability is limited to 7.5 weeks

only. This means, among other things, that the forward volatility is not forecastable when the


future period (over which the forward volatility is predicted) is defined to extend beyond the

first 2 months.

Figure 3: Stock market indices

[Figure: four panels plotted against the forecast horizon in weeks. Top left: spot forecast accuracy, %. Top right: forward forecast accuracy, %. Bottom left: spot forecast p-value. Bottom right: forward forecast p-value.]



The empirical results for all financial asset classes and sub-classes share some similarities.

The first similarity is that, for all asset classes and regardless of the forecasting model, the

forecast accuracy curves have the same shape. Whereas the forward forecast accuracy curve is

a strictly monotonically decreasing and convex function, the spot forecast accuracy curve has

a hump-shaped form. Specifically, the spot forecast accuracy first increases as forecast horizon

increases, then decreases. The maximum is usually attained at a 4-week horizon. Therefore,

judging by the spot forecast accuracy, the volatility can be forecasted with the highest precision


over approximately a 1-month horizon. Then, the longer the forecast horizon, the worse the

forecast accuracy is. For both the spot and forward volatilities the p-values increase with

horizon; the p-values grow faster for the forward volatility than those for the spot volatility.

Figure 4: Long-term bond indices

[Figure: four panels plotted against the forecast horizon in weeks. Top left: spot forecast accuracy, %. Top right: forward forecast accuracy, %. Bottom left: spot forecast p-value. Bottom right: forward forecast p-value.]




Figure 5: Intermediate-term bond indices

[Figure: four panels plotted against the forecast horizon in weeks. Top left: spot forecast accuracy, %. Top right: forward forecast accuracy, %. Bottom left: spot forecast p-value. Bottom right: forward forecast p-value.]



The second similarity lies in the comparative ranking of alternative forecasting models.

Typically, the EWMA model produces the worst forecast accuracy, whereas the HAR model

produces the best one. The reader is reminded that the main difference between the EWMA

and GARCH models is that the GARCH model captures the mean reversion of volatility,

whereas the EWMA model does not. Apparently, accounting for mean reversion allows the

GARCH model to outperform the EWMA model. The HAR model includes the lags of real-

ized volatility aggregated over different time horizons. We conjecture that superior forecast

accuracy provided by the HAR model is explained by the fact that this model captures not

only the persistence and mean reversion of volatility, but also the high- and low-frequency


components in the volatility dynamics.

Figure 6: Currencies

[Figure: four panels plotted against the forecast horizon in weeks. Top left: spot forecast accuracy, %. Top right: forward forecast accuracy, %. Bottom left: spot forecast p-value. Bottom right: forward forecast p-value.]



Qualitatively, for all financial asset classes and sub-classes the forecast accuracy curves

look similar. Yet there are small quantitative differences. In particular, the volatility can

be forecasted with the best accuracy in the stock and currency markets. In these markets,

the spot forecast accuracy amounts to approximately 45% (20%) over horizons ranging from 2

to 5 (20 to 25) weeks. The forward forecast accuracy decreases to zero over horizons ranging

from 10 to 15 weeks. In contrast, the volatility forecast accuracy is worse in the bond and

commodity markets. In these markets, the spot forecast accuracy amounts to about 30% (10%)


over horizons ranging from 2 to 5 (20 to 25) weeks. The forward forecast accuracy decreases

to zero over horizons ranging from 5 to 10 weeks.

Figure 7: Commodities

[Figure: four panels plotted against the forecast horizon in weeks. Top left: spot forecast accuracy, %. Top right: forward forecast accuracy, %. Bottom left: spot forecast p-value. Bottom right: forward forecast p-value.]



6 Discussion

Why do all standard volatility models fail to provide sensible longer-horizon volatility forecasts?

In this section we argue that the volatility dynamics do not fully correspond to the assumptions

embedded in the standard volatility models, which effectively assume highly persistent and

mean-reverting dynamics. In order to motivate our point of view, consider again the plot of


the historical volatility of the S&P 500 index depicted in Figure 1. After having studied the

dynamics of volatility, one conclusion appears to have emerged, namely, that the volatility

is volatile. High volatility eventually gives way to low volatility and vice versa. On the one

hand, this observation confirms that the volatility is persistent, mean-reverting, and exhibits

volatility clustering. On the other hand, this observation suggests looking from a different

perspective on the volatility dynamics. Specifically, the volatility dynamics can be considered

as a process which is constantly “switching” between high and low volatility states. The

interchanging periods of high and low volatility can be regarded as waves of irregular cycle

length and varying amplitude. Put differently, the volatility exhibits oscillating dynamics: it

cycles erratically over time and periods of high or low volatility follow one another.

Considering the aforesaid, the new perspective on the volatility dynamics motivates us to

describe the evolution of volatility in terms of periods of rising and falling volatility. As a

starting point we postulate the existence of two distinct phases in the evolution of volatility,

specifically, periods of rising and falling volatility. Since a movement from a rising (falling)

volatility phase to a falling (rising) volatility phase involves a turning point, we need an algo-

rithm for dating of turning points in volatility cycles. For this purpose, we employ a simple

and well-known algorithm for detecting turning points between the bull and bear phases of a

financial market, namely, the algorithm of Lunde and Timmermann (2004). This algorithm is

motivated by the idea that, in order to qualify for a distinct bull or bear phase, the financial

asset price should change substantially from the previous peak or trough. For example, the

rise (fall) in the price should be greater than 20% from the previous local trough (peak) in

order to qualify for being a distinct bull (bear) market.

The algorithm of Lunde and Timmermann (2004) is based on imposing a minimum on the

price change since the last peak or trough. This dating rule is implemented in the following

manner. Let λ1 be a scalar defining the threshold of the movement in volatility that triggers

a switch from a falling-volatility state to a rising-volatility state, and let λ2 be the threshold

for shifts from a rising-volatility state to a falling-volatility state. Denote by Vt the value of

volatility at time t and suppose that a trough in volatility has been detected at time t0 < t.

Therefore, the algorithm knows that a rising-volatility state begins from time t0 + 1. The


algorithm first finds the maximum value of volatility on the time interval [t0, t]

$$V^{\max}_{t_0,t} = \max\{V_{t_0}, V_{t_0+1}, \ldots, V_t\}$$

and then computes the (inverse of the) relative change in volatility where the maximum value

serves as the reference value

$$\delta_t = \frac{V^{\max}_{t_0,t} - V_t}{V^{\max}_{t_0,t}}.$$

If δt > λ2, then a new peak is detected at time tpeak at which volatility attains maximum on

[t0, t]. The period [t0 + 1, tpeak] is labeled as a rising-volatility state. A falling-volatility state

begins from tpeak + 1.

If, on the other hand, a peak in volatility has been detected at time t0 < t, then the

algorithm finds the minimum value of volatility on the time interval [t0, t]

$$V^{\min}_{t_0,t} = \min\{V_{t_0}, V_{t_0+1}, \ldots, V_t\}$$

and computes the relative change in volatility from the minimum value

$$\delta_t = \frac{V_t - V^{\min}_{t_0,t}}{V^{\min}_{t_0,t}}.$$

If δt > λ1, then a new trough is detected at time ttrough at which volatility attains minimum

on [t0, t]. The period [t0 + 1, ttrough] is labeled as the falling-volatility state. A rising-volatility

state begins from ttrough + 1.
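The dating rule described above can be sketched as a single pass over the volatility series. This is a minimal illustration, not the original Lunde and Timmermann code: the function name is hypothetical, and the treatment of the first observation (assumed to be a trough, so that the series starts in a rising-volatility state) is an assumption of the sketch.

```python
def date_turning_points(vol, lam1=0.20, lam2=0.15, start_state="rising"):
    """Return a list of (index, kind) pairs, kind being "peak" or "trough".
    lam1: relative rise from the running minimum that confirms a trough.
    lam2: relative fall from the running maximum that confirms a peak."""
    turning = []
    state = start_state
    ext = 0  # index of the running extreme since the last turning point
    for t in range(1, len(vol)):
        if state == "rising":
            if vol[t] > vol[ext]:
                ext = t                                  # new running maximum
            elif (vol[ext] - vol[t]) / vol[ext] > lam2:
                turning.append((ext, "peak"))            # fall exceeds threshold
                state, ext = "falling", t
        else:
            if vol[t] < vol[ext]:
                ext = t                                  # new running minimum
            elif (vol[t] - vol[ext]) / vol[ext] > lam1:
                turning.append((ext, "trough"))          # rise exceeds threshold
                state, ext = "rising", t
    return turning


example = [10, 12, 15, 14, 12, 11, 10, 13, 16, 15]
points = date_turning_points(example)
```

In the toy series the peak at index 2 is confirmed once volatility has fallen by 20% to 12, and the trough at index 6 is confirmed once volatility has risen by 30% to 13. Note that a turning point is only identified with a delay, after the threshold move has occurred.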

The application of the dating algorithm requires making an arbitrary choice of two param-

eters {λ1, λ2}. It is unclear how to make an appropriate choice in our case because, to the best

of the authors’ knowledge, no one before has attempted to describe the evolution of volatility

in terms of periods of rising and falling volatility. Lunde and Timmermann (2004) report the

empirical results for several alternative sets of parameters. We select the most typical param-

eter values {λ1 = 20%, λ2 = 15%}. It should be noted, however, that the results reported

below in this section are, to some extent, sensitive to the changes in the parameters. However,

qualitatively our findings remain intact when the parameters of the algorithm are changed.


Figure 8 illustrates the results of turning points detection for four individual assets14 that

belong to different asset classes: the S&P500 stock market index, the long-term bond market

index, the US/UK exchange rate, and the Brent crude oil price. Shaded areas in each panel

indicate the periods of falling volatility. Once we establish turning points in volatility, it

is possible to summarize various characteristics of the movements between each phase. We

compute the duration of each phase, D, and report the minimum, average, median, and the

maximum duration. For instance, the duration of a rising-volatility phase is computed as

$$D = t_{peak} - t_{trough},$$

where ttrough denotes the date of a trough in volatility and tpeak denotes the day of the sub-

sequent peak in volatility. We also compute the amplitude of each phase, A, and report the

minimum, average, median, and the maximum amplitude. The amplitudes of the rising- and

falling-volatility phases are computed as

$$A_{rising} = \frac{V_{peak} - V_{trough}}{V_{trough}}, \qquad A_{falling} = \frac{V_{trough} - V_{peak}}{V_{peak}}.$$

For a rising-volatility phase, Vtrough denotes the value of volatility at a trough and Vpeak denotes

the value of volatility at the subsequent peak. For a falling-volatility phase, Vpeak denotes the

value of volatility at a peak and Vtrough denotes the value of volatility at the subsequent trough.
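Given a list of dated turning points, the duration and amplitude of each phase follow directly from the definitions above. The sketch below is illustrative: the function name and the input layout (a list of index and kind pairs) are assumptions, and durations are returned in days, so dividing by 5 converts them to weeks under a five-day trading week.

```python
def phase_statistics(vol, turning):
    """For each pair of consecutive turning points return
    (kind, duration_in_days, amplitude), where the amplitude is the
    relative change in volatility from the start to the end of the phase."""
    phases = []
    for (i0, kind0), (i1, _) in zip(turning, turning[1:]):
        kind = "rising" if kind0 == "trough" else "falling"
        duration = i1 - i0
        amplitude = (vol[i1] - vol[i0]) / vol[i0]
        phases.append((kind, duration, amplitude))
    return phases


vol = [10, 12, 15, 14, 12, 11, 10, 13, 16, 15]
turning = [(2, "peak"), (6, "trough"), (8, "peak")]
stats = phase_statistics(vol, turning)
```

The single expression for the amplitude covers both cases, since a rising phase starts at a trough and a falling phase starts at a peak; falling phases therefore have negative amplitudes, as in Table 2.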

The summary statistics of periods of rising and falling volatility are reported in Table 2.

The statistics in the table are strikingly similar across asset classes, not only qualitatively but also

quantitatively. It is clear that falling-volatility phases tend to

be longer than rising-volatility phases. Both the median duration and the average duration of

phases are rather stable across various asset classes. Specifically, the average (median) durations

of the rising- and falling-volatility phases amount to 6.5 (5) and 13 (10) weeks respectively.

Consequently, the average (median) duration of a falling-volatility phase exceeds the average

(median) duration of a rising-volatility phase by a factor of 2. The minimum duration of a

14For each asset, the historical volatility is estimated using daily returns and the EWMA model. It should be noted, however, that volatility is unobservable and thus must be estimated from the data. There are several alternative approaches to estimating the historical volatility. We use the EWMA model because it produces a much smoother volatility curve as compared to, for example, a conditional volatility from the fitted GARCH(1,1) model.


Figure 8: Periods of rising and falling volatility identified by the dating algorithm

[Figure: four time-series panels of volatility, %, over 1995 to 2015. Panels: S&P 500 index; Long-term bond index; US/UK exchange rate; Crude oil brent price.]

Notes: Shaded areas indicate the periods of falling volatility.

rising-volatility phase amounts to 1 day only (in bond and currency markets). This number

shows that volatility can increase by more than 20% over the course of a single day. The average

(median) full cycle length amounts to 21.5 (15) weeks. On average, over a rising-volatility

phase, volatility increases by approximately 60% from the previous trough; over a falling-

volatility phase it decreases by about 30% from the previous peak. The median peak amplitude

amounts to approximately 40%. Since the median peak amplitude is substantially less than

the average peak amplitude, the distribution of peaks in volatility is highly non-symmetrical.

Specifically, the distribution of peaks is skewed to the right.

The summary statistics of periods of rising and falling volatility, reported in Table 2, help

explain why all standard volatility models fail to provide sensible longer-horizon volatility

forecasts. First of all, the volatility is sometimes rather non-persistent. That is, its value can

change dramatically over the course of a single day. Second, the volatility does not revert to its

long-run mean as the models assume. There are two issues with the mean-reverting dynamics

incorporated in the GARCH and HAR models. The first issue is that a typical estimate of the


Table 2: Summary statistics of periods of rising and falling volatility

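| Statistics | S&P 500, Rising | S&P 500, Falling | Bond index, Rising | Bond index, Falling | US/UK rate, Rising | US/UK rate, Falling | Crude oil price, Rising | Crude oil price, Falling |
|---|---|---|---|---|---|---|---|---|
| Number of phases | 56 | 57 | 58 | 58 | 56 | 55 | 59 | 59 |
| Minimum duration | 0.6 | 2.6 | 0.2 | 3.0 | 0.2 | 2.2 | 1.0 | 2.4 |
| Average duration | 6.5 | 13.1 | 7.1 | 12.2 | 6.7 | 13.3 | 6.5 | 12.7 |
| Median duration | 4.4 | 10.4 | 5.8 | 10.6 | 5.7 | 11.2 | 5.6 | 10.0 |
| Maximum duration | 25.0 | 45.6 | 27.6 | 40.8 | 24.4 | 40.4 | 22.4 | 52.8 |
| Minimum amplitude | 20 | -15 | 21 | -16 | 20 | -15 | 20 | -16 |
| Average amplitude | 71 | -36 | 49 | -30 | 56 | -32 | 58 | -32 |
| Median amplitude | 47 | -36 | 36 | -28 | 38 | -32 | 39 | -30 |
| Maximum amplitude | 352 | -78 | 177 | -53 | 265 | -58 | 338 | -74 |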

Notes: Duration is measured in weeks. The amplitude is measured in percentages.

volatility half-life amounts to 15 weeks (Engle and Patton (2001)). That is, if volatility is above

the long-run mean, the mean-reverting models assume that during the 15 subsequent weeks the

volatility will move halfway back towards its long-run mean. De facto, our results reveal that in

50% of cases over the course of subsequent 15 weeks the volatility will go through a full cycle:

a period of increasing volatility will be followed by a period of decreasing volatility or vice

versa. Interestingly, the estimated median cycle duration (15 weeks) is twice as long as the

estimated horizon of predictability of forward volatility (7.5 weeks). This observation suggests

that the horizon of volatility predictability is limited from above by the median half-cycle

duration.

The second issue is that no model assumes that if the volatility is above (below) its long-run

mean, it can increase (decrease) even further. To make the discussion more concrete, consider

two specific examples that use data on the historical volatility of the S&P 500 index. Over

the period from 1995 to 2016, the long-run average volatility of the S&P 500 index was 16.6%

(on annual basis). In the first example, consider the volatility forecast on September 19, 2008,

when the index volatility increased to 33%. The EWMA model assumes that the volatility is

persistent and, therefore, it forecasts a volatility of 33% at all future horizons.15 The GARCH model,

on the other hand, assumes that the volatility is persistent and mean-reverting. As a result,

the GARCH model forecasts that over the course of the following 15 weeks the volatility will

gradually decrease to 25%. No model assumes that the volatility can increase further. However,

the volatility kept increasing further and even further and on December 2, 2008, it attained

74%. In the second example, consider the volatility forecast on August 15, 2003, when the

volatility decreased to 15%. Whereas the EWMA model predicts that the volatility will stay at

the same 15% level for all future periods, the GARCH model forecasts that the volatility will gradually

increase to its long-run mean. No model allows the volatility to decrease further. In

reality, the volatility kept falling and decreased to 8%; it stayed below its

long-run mean until August 6, 2007, that is, for about four years.

15 For the sake of illustration, we assume that r_t = 0 in the EWMA model given by equation (4.2).
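
The two forecast paths in the first example can be reproduced approximately with the minimal Python sketch below. It is an illustration under stated assumptions, not the authors' estimated models: the flat forecast stands in for EWMA, and the mean-reverting forecast assumes that the gap between the variance and its long-run level halves every 15 weeks. The long-run volatility (16.6%), the starting level (33%), and the half-life (15 weeks) are taken from the text; the function names are hypothetical.

```python
import math

LONG_RUN_VOL = 0.166      # annualized long-run S&P 500 volatility, from the text
HALF_LIFE_WEEKS = 15.0    # volatility half-life, from the text


def flat_forecast(current_vol: float, horizon_weeks: float) -> float:
    """EWMA-style forecast: the current level is carried to every horizon."""
    return current_vol


def mean_reverting_forecast(
    current_vol: float,
    horizon_weeks: float,
    long_run_vol: float = LONG_RUN_VOL,
    half_life_weeks: float = HALF_LIFE_WEEKS,
) -> float:
    """GARCH-style forecast (assumption: the variance gap decays geometrically)."""
    decay = 0.5 ** (horizon_weeks / half_life_weeks)
    variance = long_run_vol ** 2 + decay * (current_vol ** 2 - long_run_vol ** 2)
    return math.sqrt(variance)


# First example in the text: volatility of 33% on September 19, 2008.
for weeks in (0, 5, 10, 15):
    flat = flat_forecast(0.33, weeks)
    reverting = mean_reverting_forecast(0.33, weeks)
    print(f"{weeks:2d} weeks ahead: flat {flat:.1%}, mean-reverting {reverting:.1%}")
```

Under these assumptions the 15-week-ahead mean-reverting forecast is about 26%, close to the roughly 25% quoted above (applying the half-life to the volatility rather than to the variance would give about 24.8%). Neither path comes near the 74% that was actually reached.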

7 Summary and Conclusions

Even though volatility forecasting is crucial for portfolio management, risk management, and

pricing of derivative securities, little is known about how far ahead one can forecast

volatility. Whereas the results reported in some papers seem to suggest that volatility is

forecastable over long-term horizons that extend to several years, in a handful of studies the

researchers demonstrate that the forecast horizon is limited to 6–8 weeks. In this paper we aim to

fill this gap in the literature on the horizon of volatility predictability.

First of all, we suggest a novel approach to measuring the forecast accuracy over various

horizons. Specifically, we propose to use not only the spot forecast accuracy curve, but also the

forward forecast accuracy curve. We argue that the forward forecast accuracy curve provides

very useful information about the model’s ability to forecast volatility across various horizons.

The term structure of volatility predictability should be described by the spot and forward

forecast accuracy curves. Both curves are highly relevant in practice because financial

markets trade long-term option contracts (LEAPS) with maturities of up to 39

months and contracts on forward volatility (FVA) with maturities of up to 24

months. Traders in these contracts are naturally interested in the horizon of the

spot and forward volatility predictability.

Second, using three popular models (EWMA, GARCH, and HAR), we conduct the most

comprehensive evaluation to date of the horizon of volatility predictability in all major financial

markets. Our empirical results are strikingly similar across the different financial markets. We find

that, depending on the asset class, the horizon of the spot volatility predictability is confined

to 20–35 weeks, whereas the horizon of the forward volatility predictability is even shorter and

limited to 5–10 weeks. The longest horizon of volatility predictability is observed in the

stock and currency markets, whereas the shortest one is observed in the bond markets. In

the majority of cases, the HAR model provides the best forecast accuracy, while the EWMA

model provides the worst one. Our results are thus consistent with academic studies such as

Christoffersen and Diebold (2000) and Galbraith and Kisinbay (2005), who find that volatility

is unforecastable beyond a relatively short-term horizon. In addition, our results suggest that

the horizon of volatility predictability is much shorter than the longest maturity of traded

LEAPS and FVA contracts.

Finally, we suggest a plausible explanation for why standard volatility models are not able

to provide sensible longer-horizon volatility forecasts. Our analysis reveals an important but

currently overlooked stylized fact about volatility: it cycles erratically over time and periods

of high or low volatility follow one another. For a few distinct financial assets, we provide

descriptive statistics of the periods of rising and falling volatility. We demonstrate

that the volatility dynamics do not fully correspond to the assumptions embedded in standard

models, which assume that volatility is highly persistent and mean-reverting. Specifically, we

find that the volatility is sometimes rather non-persistent: its value can change dramatically

over the course of a single day. In addition, we show that the volatility does not revert

to its long-run mean as the models assume. All this suggests the desirability of developing

volatility models that embed the new stylized fact about volatility dynamics. Such models could

potentially extend the horizon of volatility predictability significantly.

References

Alford, A. W. and Boatsman, J. R. (1995). “Predicting Long-Term Stock Return Volatility:

Implications for Accounting and Valuation of Equity Derivatives”, Accounting Review,

70 (4), 599–618.

Blair, B. J., Poon, S.-H., and Taylor, S. J. (2001). “Forecasting S&P 100 Volatility: The

Incremental Information Content of Implied Volatilities and High-Frequency Index Returns”,

Journal of Econometrics, 105 (1), 5–26.

Bollerslev, T. (1986). “Generalized Autoregressive Conditional Heteroskedasticity”, Journal

of Econometrics, 31 (3), 307–327.

Brown, M. B. (1975). “400: A Method for Combining Non-Independent, One-Sided Tests of

Significance”, Biometrics, 31 (4), 987–992.

Cao, C. Q. and Tsay, R. S. (1992). “Nonlinear Time-Series Analysis of Stock Volatilities”,

Journal of Applied Econometrics, 7 (S1), S165–S185.

Christoffersen, P. F. and Diebold, F. X. (2000). “How Relevant is Volatility Forecasting for

Financial Risk Management?”, Review of Economics and Statistics, 82 (1), 12–22.

Corsi, F. (2009). “A Simple Approximate Long-Memory Model of Realized Volatility”, Journal

of Financial Econometrics, 7 (2), 174–196.

Corte, P. D., Kozhan, R., and Neuberger, A. (2017). “The Cross-Section of Currency Volatility

Premia”, Working paper, Imperial College Business School, Warwick Business School,

and Cass Business School.

Corte, P. D., Sarno, L., and Tsiakas, I. (2011). “Spot and Forward Volatility in Foreign

Exchange”, Journal of Financial Economics, 100 (3), 496–513.

Egelkraut, T. M., Garcia, P., and Sherrick, B. J. (2007). “The Term Structure of Implied

Forward Volatility: Recovery and Informational Content in the Corn Options Market”,

American Journal of Agricultural Economics, 89 (1), 1–11.

Engle, R. and Patton, A. (2001). “What Good is a Volatility Model?”, Quantitative Finance,

1 (2), 237–245.

Engle, R. F. and Rangel, J. G. (2008). “The Spline-GARCH Model for Low-Frequency

Volatility and Its Global Macroeconomic Causes”, Review of Financial Studies, 21 (3),

1187–1222.

Figlewski, S. (1997). “Forecasting Volatility”, Financial Markets, Institutions & Instruments,

6 (1), 1–88.

Fisher, R. A. (1925). Statistical Methods for Research Workers. Oliver and Boyd, Edinburgh.

French, K. R., Schwert, G., and Stambaugh, R. F. (1987). “Expected Stock Returns and

Volatility”, Journal of Financial Economics, 19 (1), 3–29.

Galbraith, J. W. (2003). “Content Horizons for Univariate Time-Series Forecasts”,

International Journal of Forecasting, 19 (1), 43–55.

Galbraith, J. W. and Kisinbay, T. (2005). “Content Horizons for Conditional Variance

Forecasts”, International Journal of Forecasting, 21 (2), 249–260.

Glasserman, P. and Wu, Q. (2011). “Forward and Future Implied Volatility”, International

Journal of Theoretical and Applied Finance, 14 (3), 407–432.

Green, T. C. and Figlewski, S. (1999). “Market Risk and Model Risk for a Financial Institution

Writing Options”, Journal of Finance, 54 (4), 1465–1499.

Hyndman, R. J. and Athanasopoulos, G. (2013). Forecasting: Principles and Practice. OTexts.

Littell, R. C. and Folks, J. L. (1971). “Asymptotic Optimality of Fisher’s Method of Combining

Independent Tests”, Journal of the American Statistical Association, 66 (336), 802–806.

Longerstaey, J. and Spencer, M. (1996). “RiskMetrics™ Technical Document”, Tech. rep.,

Morgan Guaranty Trust Company of New York: New York.

Lunde, A. and Timmermann, A. (2004). “Duration Dependence in Stock Prices: An Analysis

of Bull and Bear Markets”, Journal of Business and Economic Statistics, 22 (3), 253–273.

Patton, A., Politis, D. N., and White, H. (2009). “Correction to ‘Automatic Block-Length

Selection for the Dependent Bootstrap’ by D. Politis and H. White”, Econometric

Reviews, 28 (4), 372–375.

Politis, D. N. and White, H. (2004). “Automatic Block-Length Selection for the Dependent

Bootstrap”, Econometric Reviews, 23 (1), 53–70.

Politis, D. and Romano, J. (1994). “The Stationary Bootstrap”, Journal of the American

Statistical Association, 89 (428), 1303–1313.

Poon, S.-H. and Granger, C. W. J. (2003). “Forecasting Volatility in Financial Markets: A

Review”, Journal of Economic Literature, 41 (2), 478–539.

Taleb, N. N. (1997). Dynamic Hedging: Managing Vanilla and Exotic Options. New York:

John Wiley & Sons.
