Time series analysis of option pricing model performance

TIME SERIES ANALYSIS OF OPTION PRICING MODELS PERFORMANCE MASTER THESIS IN FINANCE

Mal Hartl (6060929)

Supervisor: Gildas Blanchard

School of Business and Economics

January 2014


Acknowledgements

When I started this thesis in June 2013, it was a rather ambitious project, considering that my knowledge of option pricing models was limited to a vague understanding of the Black-Scholes equations. Six months of continuous work later, I must say I am rather impressed by all the things this thesis has taught me. I would like to express my gratitude to my thesis supervisor, Gildas Blanchard, for his constant availability, patient guidance and helpful remarks during the writing of this report. Although I take full credit for the results presented below, they were the fruit of a continuous exchange of ideas between Gildas and me, and I must thank him for keeping my eye on the prize. I also thank my family and close ones for their help and support.


Table of Contents

INTRODUCTION

EMPIRICAL METHODOLOGY
    DESCRIPTION OF THE DATA
        Option data
        Index data
    OPTION PRICING MODELS
        Ad-hoc Black and Scholes model
        Heston and Nandi GARCH Model
        Reasons for choosing the ABS and HN models
    CALIBRATION
        ABS procedure
        HN-GARCH model

EMPIRICAL RESULTS
    I. OPTION PRICING PERFORMANCE OF THE MODELS
        In-sample model performance
        In-sample performance across maturity and moneyness categories
        Out-of-sample model performance
    II. FACTORS OF OPTION PRICING PERFORMANCE
        Foreword
        Stationarity and unit root testing
        Common trends among ABS and HN pricing errors
        Error forecasting variables
        In-sample time series regression for the ABS model
        In-sample regressions for the HN GARCH model
        Out-of-sample time series regression for the ABS and HN model

CONCLUSIONS

REFERENCES

APPENDIX


List of figures

Figure 1: S&P 500 index level (Wednesdays only), daily log-returns and autocorrelation of squared returns
Figure 2: Calibration of the ABS model on the cross-section of OTM options on June 16th 2010 and smoothed volatility surface
Figure 3: Weekly VIX Index level (Jan. 1996 - June 2010)
Figure 4: Calibration of the HN GARCH model on the cross-section of OTM SPX options on Jan. 17th, 1996
Figure 5: Time series of the in-sample option pricing performance (January 17th, 1996 - June 30th, 2010)
Figure 6: Comparison between the original DFW procedure and the HN GARCH model
Figure 7: Comparison of the pricing error across moneyness and maturity categories for the HN and ABS models
Figure 8: Out-of-sample pricing errors of the HN and ABS models
Figure 9: Autocorrelations and XCF function of first-differenced $RMSE series for ABS and HN
Figure 10: Autocorrelations and XCF function of original $RMSE series for ABS and HN
Figure 11: Zero curve interpolation, yield spread and option pricing error
Figure 12: Dividend yield and pricing error
Figure 13: Illustration of the distortions created by abnormal demand for "crash-protective" puts

List of tables

Table 1: Sample periods covered across option price studies
Table 2: Results of the filtering process
Table 3: Description of the option data
Table 4: Calibration of the HN GARCH and ABS pricing models
Table 5: In-sample regressions for ABS pricing errors (filtered sample of OTM options)
Table 6: In-sample regressions for HN pricing error (filtered sample of OTM options)
Table 7: Out-of-sample regressions for ABS pricing errors (filtered sample of OTM options)
Table 8: Out-of-sample regressions for HN pricing errors (filtered sample of OTM options)


Introduction

Since the creation of the first listed option exchange in 1973, the market for stock options has grown at an incredibly rapid pace. From 911 contracts on the opening day of the Chicago Board Options Exchange in 1973, the daily average volume increased to an impressive 4.25 million contracts in 2013, with more than 800,000 options traded daily on the S&P 500 index. Every trading day, thousands of market participants rely on various option pricing models to compete in this formidable marketplace.

Along with this dramatic growth, the last forty years have also witnessed the birth and flourishing of an option pricing literature, initiated by Fischer Black and Myron Scholes's pioneering article in 1973. Since then, academics have worked relentlessly on developing ever better-performing option pricing models, either by relaxing the assumptions of the Black-Scholes formula to fit the so-called volatility smile, or by creating alternative models that improve the fit to the prices observed on the market. Examples include the binomial model of Cox, Ross and Rubinstein (1979), the stochastic volatility model of Heston (1993) and the deterministic volatility model of Dumas, Fleming and Whaley (1998). The steady increase in processor computational power has been accompanied by the emergence of increasingly complex option valuation models, to the point that recent models price options with remarkable accuracy most of the time.

Although the literature has achieved a great deal by developing and comparing a large number of pricing models, it has consistently neglected one aspect: the time series behavior of option pricing performance. As noted above, most modern models usually achieve a good level of performance. However, observing option pricing performance over longer periods leads to a common conclusion for all models: performance is not constant over time and it appears to follow some form of cyclical pattern. This thesis addresses this gap in the existing literature by studying the time series of option pricing performance for two widely accepted models, the ad hoc Black and Scholes, or practitioner's, model, and the Heston and Nandi GARCH model with Gaussian innovations. For our study, we rely on fourteen years of data on S&P 500 index options from 1996 to 2010. This long time frame allows us to capture long-run trends in valuation performance that might be overlooked over shorter horizons, such as the two- or three-year sample periods usually found in the option pricing literature. It also allows us to observe the effects of the two most recent financial crises.

Our empirical research consists of two parts. We first derive the time series of the pricing error by calibrating our two models on our sample of option data, and we show that our results support the hypothesis of a cyclical pattern in option pricing performance. In the second part, we review the literature to construct a set of economic factors that could explain the time series behavior of the option pricing error. Using this set of explanatory variables, we then document that upward and downward movements in pricing performance can be explained by changes in the overall economic environment.


For both our models, our results unambiguously show that the level of mispricing on index options considerably increases with worsening financial market conditions. Specifically, our research shows that the level of error grows along with declining dividend yields, fluctuations in short-term interest rates and yield spread, decreasing price-to-book ratios and sentiment drops among institutional investors. We document that, as economic conditions deteriorate, gradually booming demand for crash-protective put options induces distorted implied volatility surfaces and high period-to-period volatility for the pricing model parameters, which ultimately lead to degraded pricing performance. Moreover, we show that these results are consistent across models and that they hold for in- and out-of-sample valuation. To the best of our knowledge, this is the first piece of evidence of a direct relationship between option pricing error and financial market conditions. Our research therefore allows a better understanding of the factors that drive option pricing performance. It marks a first step toward answering a question that has so far gone unaddressed: what explains the ups and downs in option valuation accuracy over time?

This thesis is organized as follows. In the first section, we describe our sample of option and index data, we summarize the theory behind our option pricing models, and we briefly explain our calibration procedure. In the second section, we present our empirical results. We first describe the time series of valuation errors for the ad hoc Black and Scholes and Heston and Nandi models and argue that they follow cyclical trends. We then turn to time series regressions to identify the determinants of this common periodic trend. In the last section, we conclude and review the various ways to improve and extend our study.


Empirical methodology

In this section, we provide an overview of the data, methodology and option pricing theory we use to obtain our empirical results. We first describe our sample of option and index data and then proceed to develop the theoretical framework underlying the two option pricing models we rely on in this research: the ad-hoc Black-Scholes (ABS) model and the Heston and Nandi GARCH model (HN). Next, we provide the motivations for choosing these two models in particular. Finally, we briefly describe the calibration of the two option pricing models on the sample of option data.

Description of the data

Option data

We use intra-day data on European SPX options to test our models and compute the two time series of the dollar root mean square error ($RMSE). We consider closing prices of out-of-the-money (OTM) put and call options downloaded from OptionMetrics for each Wednesday from January 17th, 1996 to June 30th, 2010¹, resulting in 263,127 observations (126,555 calls and 136,572 puts). Note that we choose to exclude ITM and DITM put and call options from our sample to limit the risk of having our results driven by liquidity biases. As Alexander (2008) points out, ITM options behave more like the underlying asset than OTM options, and are therefore a less viable hedging tool for portfolio managers, who are less interested in trading them. Another important remark is that the sample period we consider in our study is significantly longer than those usually covered in the option pricing literature, as Table 1 points out. By choosing longer time series of option and index records, we unavoidably increase the computational burden associated with the calibration process. However, we also expect to capture longer-term dynamic movements of the pricing error that might be overlooked or difficult to observe over shorter periods.

Table 1: Sample periods covered across option price studies

Study                                     SPX options data    Sample period covered
Barone-Adesi, Engle and Mancini (2008)    Wednesdays only     Jan. 2002-Dec. 2004
Christoffersen and Jacobs (2004)          Wednesdays only     Jun. 1988-Jun. 1991
Heston and Nandi (2000)                   Wednesdays only     Jan. 1992-Dec. 1994
Dumas, Fleming and Whaley (1998)          Wednesdays only     Jun. 1988-Dec. 1993

As we noted earlier, we follow common practice and restrict our research to Wednesday-only data, because it allows us to study longer time series for the pricing error. For our empirical setup, the weekly data criterion leaves us with 754 days of option data. Time series for daily data have also been studied for some sub-periods and are available upon request. However, working with daily data considerably increases the size of the total sample and renders the calibration of the ABS and HN models on option prices computationally tedious, with little additional information.

¹ Due to stock market holidays and dramatic events such as 9/11, the CBOE trading floor was closed for eight Wednesdays in our sample. Data for the subsequent trading day is used in the case of missing Wednesdays. Since the stock market was closed from Tuesday 09/11/2001 to Monday 09/17/2001, we have a missing week of data in our sample.

The risk-free rate of return for each particular option maturity is calculated by quadratic interpolation of the term structure of US Treasury Bill interest rates. We use the mid-point of the bid-ask quote as the option price. We also account for dividends paid out over the option's life by calculating their present value and subtracting it from the current index level. To avoid later problems during the calibration process, we also discard option contracts that meet the following filtering criteria:

1) Options for which the bid-ask spread is smaller than the minimal tick size (5 cents for options worth less than $3 and 10 cents for all other options), as well as options with price quotes lower than $3/8. This allows us to avoid concerns of price discretization².

2) Options with a time to maturity of less than 10 days or more than 360 days, to avoid liquidity-driven pricing biases.

3) Options with an implied volatility above 70%.

4) Put and call options that violate the no-arbitrage relationship. Let \(C(t)\) be the price of a call option with strike \(K\) and time to maturity \(\tau\), let \(P(t)\) be the corresponding put price and let \(S(t)\) be the current index level, net of dividends. Options that violate the following inequalities, in which \(r\) is the risk-free rate for maturity \(\tau\), are discarded:

\[ C(t) > \max\left(0,\; S(t) - K e^{-r\tau}\right), \qquad P(t) > \max\left(0,\; K e^{-r\tau} - S(t)\right) \tag{1} \]

Table 2 provides the detail of the data filtering process. For OTM options, about 37% of the data has been screened out, which leaves us with a sizeable sample of 164,675 options.

Table 2: Results of the filtering process

Total OTM options: 263,127

Number of discarded options    Filtering criterion
49,933                         Price [...]
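As a rough illustration, the four screening rules can be written as a single predicate applied to each quote. This is a sketch rather than the procedure actually used for the thesis: the dictionary field names, the use of the mid-quote to decide which tick size applies, the 365-day year and the strict inequality in the no-arbitrage check are all assumptions made for the example.

```python
import math


def passes_filters(opt):
    """Return True if a quote survives the four screening rules.

    `opt` is a dict with hypothetical keys: bid, ask, days_to_maturity,
    implied_vol, strike, spot (index level net of dividends), rate, is_call.
    """
    mid = 0.5 * (opt["bid"] + opt["ask"])

    # Rule 1: spread below the minimal tick size, or price below $3/8.
    tick = 0.05 if mid < 3.0 else 0.10
    if opt["ask"] - opt["bid"] < tick or mid < 3.0 / 8.0:
        return False

    # Rule 2: maturities shorter than 10 days or longer than 360 days.
    if not 10 <= opt["days_to_maturity"] <= 360:
        return False

    # Rule 3: implied volatility above 70%.
    if opt["implied_vol"] > 0.70:
        return False

    # Rule 4: no-arbitrage lower bound on the option price.
    tau = opt["days_to_maturity"] / 365.0  # assumed day-count convention
    pv_strike = opt["strike"] * math.exp(-opt["rate"] * tau)
    if opt["is_call"]:
        bound = max(0.0, opt["spot"] - pv_strike)
    else:
        bound = max(0.0, pv_strike - opt["spot"])
    return mid > bound
```

Applying `passes_filters` to every row of a Wednesday cross-section and keeping the rows that return `True` would yield the filtered sample for that day.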


We now proceed to describe the filtered sample. To this end, we follow Barone-Adesi, Engle and Mancini (2008), Bollen and Whaley (2004) and Bakshi et al. (1997) and segregate our option data into moneyness and time-to-expiration categories. Regarding time to maturity \(\tau\), an option can be labelled short-term (\(\tau < 60\) days), medium-maturity (\(60 \le \tau \le 180\) days) or long-maturity (\(\tau > 180\) days). We also define three moneyness categories based on the option's delta³. Call options at time \(t\) are classified as deep-out-of-the-money (DOTM) if \(0.02 < \Delta_C \le 0.125\), as out-of-the-money (OTM) if \(0.125 < \Delta_C \le 0.375\) and as near-the-money (NTM) if \(0.375 < \Delta_C \le 0.625\). Similarly, put options are said to be deep-out-of-the-money for \(-0.125 < \Delta_P \le -0.02\), out-of-the-money for \(-0.375 < \Delta_P \le -0.125\) and near-the-money for \(-0.625 < \Delta_P \le -0.375\). Table 3 describes some sub-sample properties for our 164,675 option contracts, divided into nine moneyness-maturity categories.

Table 3: Description of the option data

                                         Maturity (days)
Moneyness   Statistic                    < 60        60-180      > 180
DOTM        Mean volume                  1,404.49    340.46      157.37
            Mean option price            2.33        4.61        7.67
            Mean implied volatility      0.27        0.27        0.26
            Proportion of puts           0.67        0.66        0.63
            Observations                 22,861      18,678      14,667
OTM         Mean volume                  1,837.25    640.14      230.12
            Mean option price            11.16       20.62       34.10
            Mean implied volatility      0.24        0.24        0.23
            Proportion of puts           0.57        0.61        0.61
            Observations                 19,595      21,630      21,218
NTM         Mean volume                  2,111.43    869.81      244.56
            Mean option price            27.66       45.93       73.63
            Mean implied volatility      0.22        0.22        0.21
            Proportion of puts           0.06        0.11        0.10
            Observations                 8,385       9,568       8,957
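The moneyness and maturity categories used in Table 3 can be expressed as two small classification functions. The sketch below only encodes the delta and maturity boundaries stated in the text; the function names and the convention of returning `None` for options outside every delta band are assumptions made for illustration.

```python
# (label, lower, upper) bounds on the call delta; puts use the mirrored bands.
DELTA_BANDS = [("DOTM", 0.02, 0.125), ("OTM", 0.125, 0.375), ("NTM", 0.375, 0.625)]


def maturity_bucket(days):
    """Short-term (< 60 days), medium (60 to 180 days) or long (> 180 days)."""
    if days < 60:
        return "short"
    if days <= 180:
        return "medium"
    return "long"


def moneyness_bucket(delta, is_call):
    """Classify an option by its delta; None if it falls outside all bands."""
    for label, lo, hi in DELTA_BANDS:
        if is_call and lo < delta <= hi:
            return label
        if not is_call and -hi < delta <= -lo:
            return label
    return None
```

For example, a put with a delta of -0.30 and 45 days to expiration would be labelled `("OTM", "short")`.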

We provide summary statistics for the average bid-ask midpoint price, the Black-Scholes implied volatility, the average trading volume, the proportion of put options and the total number of observations. The average midpoint prices range from $2.33 for DOTM options with short time to expiration to $73.63 for long-maturity NTM options. NTM puts (all maturities together) are considerably less represented in the total sample than the other categories: they only account for 1.6% of our sample, compared with 25.3% for DOTM puts and 16.8% for NTM calls. The average trading volume is highest for NTM short-term options and decreases as time to expiration increases and moneyness decreases. The average option midpoint price increases along with moneyness and maturity and is maximal for long-term NTM options. Table 3 also shows distinctive patterns for the implied volatility. Given a certain level of moneyness, the implied volatility decreases as the option's time to expiration increases. The volatility smile also manifests itself across moneyness categories for a particular maturity category. The mean number of option contracts per Wednesday is 218.6, with a standard deviation of 118.70, a minimum of 92 and a maximum of 647.

³ See, for example, Bollen and Whaley (2004).

Index data

Figure 1 shows the evolution of the level of the S&P 500 index over the sample period, the daily log-returns for Wednesdays only and the autocorrelation of squared returns. The index level ranges from a minimum of $606.37 on January 17th, 1996 to a maximum of $1562.5 on October 10th, 2007, with an average of $1132 and a standard deviation of $227.41. It is important to note that our sample period contains the bursting of both the dot-com bubble, in early 2000, and the US credit bubble, in late 2007. This is a noteworthy feature of our sample since we are a priori interested in the dynamic behavior of the time-varying HN and ABS pricing errors during these periods of financial turmoil.

Figure 1: S&P 500 index level (Wednesdays only), daily log-returns and autocorrelation of squared returns


Log-returns oscillate around a mean close to zero (7.21e-4), with a standard deviation of 0.0254 on a daily basis, and skewness and kurtosis of -0.59 and 7.043 respectively. One can discern distinctive patterns in volatility from the second plot, with a period of low volatility between 2003 and 2007 and times of high volatility during the two crises. Current squared log-returns are also slightly correlated with their lagged values, as shown in the last plot, with values around 0.2 for the first five lags. GARCH models, such as the one that we will present in the next section, are particularly suited to account for the excess kurtosis of the return distribution (7.043, compared to 3 for a normal distribution) and the autocorrelation of squared returns (around 0.2, compared to a standard error of \(1/\sqrt{N} = 0.036\) under the i.i.d. hypothesis, where \(N\) is the number of observations).
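The summary statistics quoted above (moments of the log-returns and autocorrelation of squared returns) can be reproduced from a series of index levels along the following lines. This is a minimal standard-library sketch; the function names are hypothetical, and population (divide-by-N) moments are assumed since the text does not say which estimator was used.

```python
import math


def log_returns(levels):
    """Log-returns between consecutive index levels."""
    return [math.log(b / a) for a, b in zip(levels, levels[1:])]


def moments(x):
    """Mean, standard deviation, skewness and (non-excess) kurtosis."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return mean, math.sqrt(m2), m3 / m2 ** 1.5, m4 / m2 ** 2


def autocorrelation(x, lag):
    """Sample autocorrelation of x at the given positive lag."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[i] - mean) * (x[i - lag] - mean) for i in range(lag, n))
    return num / denom


def squared_return_acf(levels, max_lag=5):
    """Autocorrelations of squared log-returns for lags 1..max_lag."""
    r2 = [r * r for r in log_returns(levels)]
    return [autocorrelation(r2, k) for k in range(1, max_lag + 1)]
```

Under the i.i.d. hypothesis, each autocorrelation has a standard error of roughly `1 / math.sqrt(n)`, which is the benchmark against which the values around 0.2 are judged.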

Option pricing models

We now proceed to the description of the two option pricing models we rely on to estimate the time series of the option pricing error on out-of-the-money S&P 500 index options over the period January 1996-June 2010.

Ad-hoc Black and Scholes model

The ad-hoc Black and Scholes option pricing formula (hereafter ABS), proposed by Dumas, Fleming and Whaley (1998), refers to a Black-Scholes pricing model for which the implied volatilities of the options are smoothed across a series of explanatory variables. Let \(C(t)\) be the price of a particular European call option at time \(t\). The Black-Scholes model yields the following expression for \(C(t)\):

\[ C(t) = S(t)\,N(d_1) - K e^{-r(T-t)}\,N(d_2), \tag{2} \]

where \(S(t)\) is the spot price of the underlying asset at time \(t\) net of the discounted value of expected dividends paid over the option's life, \(K\) is the strike price of the option, \(T\) is its maturity date, \(r\) is the risk-free interest rate, \(N(d)\) is the cumulative standard normal distribution function with upper limit \(d\) and

\[ d_1 = \frac{\ln\left(S(t)/K\right) + \left(r + 0.5\,\sigma^2\right)(T-t)}{\sigma\sqrt{T-t}}, \tag{3} \]

\[ d_2 = d_1 - \sigma\sqrt{T-t}. \tag{4} \]

The ABS approach extends the classical Black-Scholes model by taking into account the well-documented volatility smile exhibited by the implied volatilities of real options. This is reflected in equations (3) and (4), in which \(\sigma\) is a deterministic quadratic function of relative moneyness \(K/S\) and time to maturity \(\tau = T - t\), of the following form:

\[ \sigma = a_0 + a_1 \frac{K}{S} + a_2 \left(\frac{K}{S}\right)^2 + a_3\,\tau + a_4\,\tau^2 + a_5\,\frac{K}{S}\,\tau. \tag{5} \]

The model in equation (5) captures variations of \(\sigma\) due to disparities in relative moneyness and maturity among the set of options that are neglected when using the original Black-Scholes approach, which assumes constant volatility \(\sigma = a_0\). A minimum value for \(\sigma\) is imposed to avoid any negative volatility values. We choose to restrict our deterministic volatility function to quadratic terms, which, as pointed out in Dumas, Fleming and Whaley, are necessary to fit the parabolic shape of the volatility smile. Berkowitz (2008) shows that in most cases, higher-order terms do not improve the cross-sectional fit of the smile and only result in overparameterization. Note that we also choose to smooth the implied volatilities across relative moneyness \((K/S)\), as in Bollen and Whaley (2004), rather than across strike prices, as in the original formula in Dumas, Fleming and Whaley (henceforth DFW). As we discuss later when we present our results, the model we present here performs considerably better than the original model specified in DFW.
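To make the ABS pricing step concrete, the sketch below evaluates a quadratic volatility function in relative moneyness and maturity and plugs the result into the Black-Scholes call formula. It is an illustration under stated assumptions rather than the thesis code: the function names are hypothetical, the coefficient ordering `a[0]..a[5]` (constant, moneyness, moneyness squared, maturity, maturity squared, cross term) is assumed, and the volatility floor of 0.01 is an assumed value since the text only says that a minimum is imposed.

```python
import math


def norm_cdf(x):
    """Cumulative standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(spot, strike, rate, tau, sigma):
    """Black-Scholes call price for time to maturity tau (in years)."""
    d1 = (math.log(spot / strike) + (rate + 0.5 * sigma ** 2) * tau) / (
        sigma * math.sqrt(tau)
    )
    d2 = d1 - sigma * math.sqrt(tau)
    return spot * norm_cdf(d1) - strike * math.exp(-rate * tau) * norm_cdf(d2)


def abs_vol(a, spot, strike, tau, floor=0.01):
    """Quadratic volatility function in K/S and tau, floored at `floor`."""
    x = strike / spot
    sigma = a[0] + a[1] * x + a[2] * x * x + a[3] * tau + a[4] * tau * tau + a[5] * x * tau
    return max(floor, sigma)


def abs_call(a, spot, strike, rate, tau):
    """ABS call price: Black-Scholes evaluated at the smoothed volatility."""
    return bs_call(spot, strike, rate, tau, abs_vol(a, spot, strike, tau))
```

Setting all coefficients except `a[0]` to zero recovers the constant-volatility Black-Scholes price, which is a convenient check on the implementation.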

Heston and Nandi GARCH Model

Return dynamics

Heston and Nandi (2000) present a discrete-time GARCH model with Gaussian innovations for the variance of a spot asset, from which they derive a closed-form valuation formula for European options. In this research, we focus exclusively on the simplified first-order case \((p = q = 1)\)⁴. In the Heston and Nandi model (hereafter HN), the log-spot price follows the following GARCH process over discrete time steps of length \(\Delta\):

\[ R_t = \ln\frac{S(t)}{S(t-\Delta)} = r + \lambda h(t) + \sqrt{h(t)}\,z(t), \tag{6} \]

\[ h(t) = \omega + \beta_1 h(t-\Delta) + \alpha_1\left(z(t-\Delta) - \gamma_1\sqrt{h(t-\Delta)}\right)^2, \tag{7} \]

where \(S(t)\) is the spot price of the underlying asset at discrete time step \(t\) net of the discounted value of expected dividends paid over the option's life, \(R_t\) is the one-period log return on the spot asset between \(t-\Delta\) and \(t\), \(r\) is the continuously compounded interest rate over the interval between \(t-\Delta\) and \(t\), \(z(t)\) is a standard Gaussian white noise series, \(z(t) \sim N(0,1)\), \(h(t)\) is the conditional variance of the log return between \(t-\Delta\) and \(t\), and \(\theta = (\omega, \alpha_1, \beta_1, \gamma_1, \lambda)\) are the GARCH pricing parameters under the historical measure \(\mathbb{P}\).

⁴ See Heston and Nandi (2000), pp. 588-589.

The GARCH parameters \(\theta = (\omega, \alpha_1, \beta_1, \gamma_1, \lambda)\) influence the nature of the variance process, which itself shapes the distribution of log-returns. Different values for \(\theta\) therefore lead to different payoff distributions. The parameter \(\alpha_1\) controls the kurtosis of the log-return distribution, and \(\alpha_1 = 0\) is equivalent to a deterministic time-varying variance process. Note also that for \(\alpha_1 = \beta_1 = 0\), the variance becomes constant, which yields a valuation model equivalent to a Black and Scholes model observed at discrete intervals. Parameter \(\gamma_1\) determines the skewness of the distribution and captures the negative relationship between shocks to returns and volatility, also called the leverage effect⁵. Finally, \(\lambda\) acts as a risk-premium parameter. The conditional mean of the log asset return is given by:

\[ \mu_t = \mathbb{E}\left[R_t \mid \mathcal{F}_{t-\Delta}\right] = r + \lambda h(t), \tag{8} \]

where \(\mathcal{F}_{t-\Delta}\) denotes the information set available at time step \(t-\Delta\). The conditional expectation for the log return, as expressed in (8), consists of a riskless rate \(r\) and a risk premium \(\lambda h(t)\). A noteworthy feature of the HN GARCH model is that the conditional variance of spot returns between steps \(t\) and \(t+\Delta\), \(h(t+\Delta)\), is directly observable at time \(t\), from the values of the current GARCH parameters \(\theta\), the conditional variance \(h(t)\) between \(t-\Delta\) and \(t\) and the current and lagged prices of the spot asset. Isolating the Gaussian innovation term \(z(t)\) in (6) and substituting in (7) yields:

\[ h(t+\Delta) = \omega + \beta_1 h(t) + \alpha_1\,\frac{\left(R_t - r - \lambda h(t) - \gamma_1 h(t)\right)^2}{h(t)}. \tag{9} \]

⁵ See, for example, Christoffersen and Jacobs (2004) or Heston (1993).

The model, specified under the historical measure \(\mathbb{P}\), is not fit for the direct valuation of derivatives. Pricing real options requires us to derive the risk-neutral distribution of the spot price. In this section, we simply provide the risk-neutral version of equations (6) and (7); further detail on their risk-neutralization can be found in the original article by Heston and Nandi (2000). Under the risk-neutral measure, equations (6) and (7) become:

\[ \ln\frac{S(t)}{S(t-\Delta)} = r - \frac{h(t)}{2} + \sqrt{h(t)}\,z^*(t), \tag{10} \]

\[ h(t) = \omega + \beta_1 h(t-\Delta) + \alpha_1\left(z^*(t-\Delta) - \gamma_1^*\sqrt{h(t-\Delta)}\right)^2. \tag{11} \]

The risk-neutral process is equivalent to equations (6) and (7), with parameter \(\lambda\) set to \(-0.5\) and \(\gamma_1\) replaced by \(\gamma_1^* = \gamma_1 + \lambda + 0.5\). Rewriting equation (8) for the one-period conditional mean of the log asset return with the risk-neutral parameters \(\theta^*\) yields:

\[ \mu_t^* = \mathbb{E}^*\left[R_t \mid \mathcal{F}_{t-\Delta}\right] = r - \frac{h(t)}{2}, \tag{12} \]

so that \(\mathbb{E}^*\left[S(t) \mid \mathcal{F}_{t-\Delta}\right] = S(t-\Delta)\,e^{r}\). The one-period conditional return from investing in the spot asset is therefore equal to the risk-free rate. The conditional variance of the spot asset under the risk-neutral measure is given by:

\[ h(t+\Delta) = \omega + \beta_1 h(t) + \alpha_1\,\frac{\left(R_t - r + \frac{h(t)}{2} - \gamma_1^* h(t)\right)^2}{h(t)}. \tag{13} \]

Note that the expressions for the conditional variance \(h(t+\Delta)\) under the historical and risk-neutral measures are equivalent. The proof is straightforward: plugging \(\gamma_1^* = \gamma_1 + \lambda + 0.5\) into equation (13) yields the expression for \(h(t+\Delta)\) in equation (9).

Call option valuation formula

The value \(C\) at time \(t\) of a European call option is given by:

\[ C = e^{-r(T-t)}\,\mathbb{E}^*_t\left[\max\left(S(T) - K,\, 0\right)\right] = S(t)\,P_1 - K e^{-r(T-t)}\,P_2, \tag{14} \]

where \(K\) is the strike price of the option, \(T\) is its maturity date, and \(\mathbb{E}^*_t\) denotes the expectation under the risk-neutral measure. The discounted expectation in (14) can be reformulated by means of the risk-neutral probabilities \(P_1\) and \(P_2\), where \(P_1\) is the delta of the call and \(P_2 = \Pr^*\left(S(T) > K\right)\) is the probability that the spot price at maturity exceeds the strike price \(K\). The risk-neutral probabilities take the form:

\[ P_1 = \frac{1}{2} + \frac{e^{-r(T-t)}}{\pi S(t)} \int_0^\infty \mathrm{Re}\left[\frac{K^{-i\phi}\,f^*(i\phi + 1)}{i\phi}\right] d\phi, \tag{15} \]

\[ P_2 = \frac{1}{2} + \frac{1}{\pi} \int_0^\infty \mathrm{Re}\left[\frac{K^{-i\phi}\,f^*(i\phi)}{i\phi}\right] d\phi, \tag{16} \]

where \(\mathrm{Re}[\cdot]\) denotes the real part of a complex number, and \(f^*(\phi)\) is the conditional generating function of the asset price for the risk-neutral process. For the first-order HN process described by equations (6) and (7), the generating function \(f(\phi)\) takes the following log-linear form:

\[ f(\phi) = \mathbb{E}_t\left[S(T)^{\phi}\right] = S(t)^{\phi} \exp\left(A(t;T,\phi) + B_1(t;T,\phi)\,h(t+\Delta)\right), \tag{17} \]

where the coefficients \(A(t;T,\phi)\) and \(B_1(t;T,\phi)\) are calculated backward from the terminal conditions \(A(T;T,\phi) = B_1(T;T,\phi) = 0\) and the following recursive equations:

\[ A(t;T,\phi) = A(t+\Delta;T,\phi) + \phi r + B_1(t+\Delta;T,\phi)\,\omega - \frac{1}{2}\ln\left(1 - 2\alpha_1 B_1(t+\Delta;T,\phi)\right), \tag{18} \]

\[ B_1(t;T,\phi) = \phi(\lambda + \gamma_1) - \frac{1}{2}\gamma_1^2 + \beta_1 B_1(t+\Delta;T,\phi) + \frac{\frac{1}{2}(\phi - \gamma_1)^2}{1 - 2\alpha_1 B_1(t+\Delta;T,\phi)}. \tag{19} \]

The risk-neutral generating function \(f^*(\phi)\) follows by setting \(\lambda = -0.5\) and replacing \(\gamma_1\) with \(\gamma_1^*\). The prices of put options are derived from the call price in (14), using put-call parity.

Reasons for choosing the ABS and HN models

The option pricing literature has produced a large number of models for the valuation of European options such as those on the S&P 500 index. In this research, we choose to compute the time series of the cross-sectional pricing error for two particular models: the ABS and the HN model. This section briefly motivates our choice. The ABS, as we described it above, is theoretically inconsistent. Indeed, smoothing Black-Scholes implied volatilities across moneyness and maturities and then plugging them back into the Black-Scholes formula (2) violates the assumption of constant volatility underlying the original Black-Scholes pricing formula. As some authors argue⁶, the ABS procedure may be viewed as nothing more than a sophisticated interpolation tool that provides an implied volatility surface. However, this pricing approach is widely used among option traders and practitioners and is frequently used as a performance benchmark in the option pricing literature⁷. As these studies have shown, it consistently competes with other models. Another advantage of the ABS model is that, due to its theoretical inconsistency, it does not make any assumptions on the nature of the variance process. The time-varying trend observed for its error series is therefore independent of any theoretical constraints. This is not the case for the HN model, which, for example, ignores jumps and relies on the assumption of a continuous variance process. The HN error time series might therefore depend on a time-varying jump likelihood.

⁶ See Berkowitz (2010) or Davis (2001).

We also choose to estimate the time series of the error for the HN GARCH model. We justify this choice by the fact that GARCH models allow us to price options without implying volatilities from the Black-Scholes formula, as valuation is based on observables, such as the discrete observations of the underlying asset price. The main advantage of the model developed by Heston and Nandi over other GARCH models is that it has a closed-form solution and therefore does not require the use of Monte Carlo simulations. Option prices, as equation (14) shows in the previous section, can be directly obtained from the history of asset prices, option specifications and a finite number of risk-neutral parameters \(\theta^* = (\omega, \alpha, \beta, \gamma^*)\). We mainly consider the HN GARCH model in our analysis to assess the generalizability of our ABS results. We want to determine whether our results for the ABS model can be replicated using a more theory-based model. Similar results for two diametrically different models would suggest the existence of the common pattern we are looking for.
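The closed-form valuation mentioned above can be sketched in a few lines: a backward recursion for the generating function, followed by numerical integration of the two risk-neutral probabilities. The code below is an illustration, not the Matlab implementation used for this thesis. It assumes one time step per period (so the rate, \(\omega\) and the variance are per-step quantities), a simple midpoint rule truncated at `upper`, and hypothetical function and argument names; `upper` and `n_points` would need tuning for short maturities or very small variances.

```python
import cmath
import math


def hn_cgf(phi, spot, h_next, rate, n_steps, omega, alpha, beta, gamma_star):
    """Risk-neutral generating function E*[S(T)**phi], n_steps before maturity.

    Runs the backward recursion for A and B_1 from the terminal values
    A = B_1 = 0, with lambda = -0.5 and gamma replaced by gamma_star.
    """
    a = 0j
    b = 0j
    for _ in range(n_steps):
        denom = 1.0 - 2.0 * alpha * b
        a, b = (
            a + phi * rate + b * omega - 0.5 * cmath.log(denom),
            phi * (gamma_star - 0.5) - 0.5 * gamma_star ** 2 + beta * b
            + 0.5 * (phi - gamma_star) ** 2 / denom,
        )
    return cmath.exp(phi * math.log(spot) + a + b * h_next)


def hn_call(spot, strike, rate, n_steps, h_next, omega, alpha, beta, gamma_star,
            upper=100.0, n_points=2000):
    """European call price from the two Fourier-inversion integrals."""
    step = upper / n_points
    log_k = math.log(strike)
    args = (spot, h_next, rate, n_steps, omega, alpha, beta, gamma_star)
    int1 = 0.0
    int2 = 0.0
    for j in range(n_points):
        u = (j + 0.5) * step  # midpoints avoid the removable singularity at 0
        iu = 1j * u
        weight = cmath.exp(-iu * log_k) / iu
        int1 += (weight * hn_cgf(iu + 1.0, *args)).real * step
        int2 += (weight * hn_cgf(iu, *args)).real * step
    discount = math.exp(-rate * n_steps)
    p1 = 0.5 + discount / (math.pi * spot) * int1
    p2 = 0.5 + int2 / math.pi
    return spot * p1 - strike * discount * p2
```

With `alpha = beta = 0` and `h_next = omega` the variance is constant, so `hn_call` should agree with the Black-Scholes price for the same total variance and interest, which gives a simple sanity check before using non-trivial GARCH parameters.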

Calibration

We now turn to a more in-depth description of the methodology used to calibrate the two models on the sample of option data.

ABS procedure

On each Wednesday of the sample, the implied volatility surface of equation (5) is calibrated on market data. We implemented a non-linear least squares (NLS) procedure8 that minimizes an objective function, defined as the dollar root mean squared error between model option values and observed option prices ($RMSE). This cross-sectional fitting gives us 754 sets of optimized parameters (a_0, a_1, a_2, a_3, a_4, a_5), one for each Wednesday, which define 754 implied volatility surfaces. When plugged into the Black-Scholes pricing formulae, these surfaces give the closest match between model prices and market option prices. More importantly, the NLS procedure also yields a time series of the minimized dollar root mean squared error, which we use as input for the empirical analysis detailed in the next chapter.
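
As a rough illustration of the objective function described above, the following Python sketch evaluates the dollar RMSE of an ad-hoc Black-Scholes fit for one cross-section of options. It is not the Matlab code used in this study. The quadratic form of the volatility surface in moneyness K/S and maturity is an assumption made in the spirit of equation (5), dividends are ignored for brevity, and all function names are hypothetical.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_price(spot, strike, rate, tau, sigma, is_call):
    """Black-Scholes price of a European option (no dividends, tau in years)."""
    vol_sqrt = sigma * math.sqrt(tau)
    d1 = (math.log(spot / strike) + (rate + 0.5 * sigma ** 2) * tau) / vol_sqrt
    d2 = d1 - vol_sqrt
    discount = math.exp(-rate * tau)
    if is_call:
        return spot * norm_cdf(d1) - strike * discount * norm_cdf(d2)
    return strike * discount * norm_cdf(-d2) - spot * norm_cdf(-d1)


def abs_volatility(coefs, moneyness, tau, floor=0.01):
    """Assumed surface: a0 + a1*m + a2*m^2 + a3*tau + a4*tau^2 + a5*m*tau."""
    a0, a1, a2, a3, a4, a5 = coefs
    sigma = (a0 + a1 * moneyness + a2 * moneyness ** 2
             + a3 * tau + a4 * tau ** 2 + a5 * moneyness * tau)
    return max(sigma, floor)  # keep the fitted volatility strictly positive


def dollar_rmse(coefs, options, spot, rate):
    """$RMSE over (strike, tau, is_call, market_price) tuples for one day."""
    squared = 0.0
    for strike, tau, is_call, market_price in options:
        sigma = abs_volatility(coefs, strike / spot, tau)
        model_price = bs_price(spot, strike, rate, tau, sigma, is_call)
        squared += (model_price - market_price) ** 2
    return math.sqrt(squared / len(options))
```

A numerical optimizer would minimize `dollar_rmse` over the six coefficients on each Wednesday; the minimized value is that week's entry in the error time series.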

Figure 2 shows the calibration of the ABS model using NLS on the cross-section of out-of-the-money SPX options on June 16th, 2010, as well as the smoothed volatility surface on that day. The first plot shows that the option prices yielded by the ABS model closely match the market prices over the range of maturities on

7 See for instance Barone-Adesi, Engle and Mancini (2008), Heston and Nandi (2000) or Brandt and Wu (2002).

8 The optimization procedure is implemented in Matlab, using the active-set algorithm and the fminunc function.


June 16th, 2010. The second plot shows the smoothed implied volatility surface on that day. Note that it has the expected shape: implied volatilities decrease as moneyness increases, and the surface flattens as the options' time to maturity increases. The trend reverses for long-maturity DOTM put options (low K/S, more than 250 days to maturity), as the bottom-left portion of the surface shows.

Figure 2: Calibration of the ABS model on the cross-section of OTM options on June 16th 2010 and smoothed volatility surface

HN-GARCH model

A similar method is used to calibrate the HN GARCH model on the option data, although the optimization process calls for more caution in this case. First, the NLS procedure is now more computationally demanding because of the nature of the HN GARCH pricing formula: estimating the risk-neutral probabilities in equation (14) requires numerical integration and the handling of complex numbers, and the coefficients A(t; T, φ) and B_1(t; T, φ) must be computed recursively. All in all, the greater number of operations makes pricing one cross-section of options much slower with the HN model than with the ABS procedure.
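
To make the computational burden concrete, the Python sketch below prices a single call with a Heston-Nandi type formula: the two coefficients are rolled back recursively from expiry, and the two inversion integrals are approximated with a midpoint rule on a truncated grid. It is only an illustration under stated assumptions, not the code used in this study. It assumes the standard first-order HN recursions with the risk-neutral value λ = -1/2, an interest rate expressed per period, and a truncation point `upper` chosen by hand; the function names `hn_char_fn` and `hn_call` are hypothetical.

```python
import cmath
import math


def hn_char_fn(phi, spot, h_next, rate, n_steps, omega, alpha, beta, gamma,
               lam=-0.5):
    """Risk-neutral generating function f*(phi) = E[S_T^phi] (assumed HN form).

    The coefficients start at zero at expiry and are rolled back one
    period at a time, n_steps times in total.
    """
    a_coef, b_coef = 0j, 0j
    for _ in range(n_steps):
        denom = 1.0 - 2.0 * alpha * b_coef
        a_next = a_coef + phi * rate + b_coef * omega - 0.5 * cmath.log(denom)
        b_next = (phi * (lam + gamma) - 0.5 * gamma ** 2 + beta * b_coef
                  + 0.5 * (phi - gamma) ** 2 / denom)
        a_coef, b_coef = a_next, b_next
    return spot ** phi * cmath.exp(a_coef + b_coef * h_next)


def hn_call(spot, strike, rate, n_steps, h_next, omega, alpha, beta, gamma,
            upper=200.0, n_grid=4000):
    """European call by midpoint integration of the two inversion integrals."""
    step = upper / n_grid
    args = (spot, h_next, rate, n_steps, omega, alpha, beta, gamma)
    integral_1 = integral_2 = 0.0
    for k in range(n_grid):
        phi = (k + 0.5) * step  # midpoints avoid evaluating at phi = 0
        i_phi = 1j * phi
        k_term = strike ** (-i_phi)
        integral_1 += (k_term * hn_char_fn(i_phi + 1.0, *args) / i_phi).real * step
        integral_2 += (k_term * hn_char_fn(i_phi, *args) / i_phi).real * step
    discount = math.exp(-rate * n_steps)
    return (0.5 * spot + discount / math.pi * integral_1
            - strike * discount * (0.5 + integral_2 / math.pi))
```

Each price requires thousands of recursive evaluations of the generating function, which is why calibrating a whole cross-section is much slower than with a Black-Scholes based procedure. With α = β = 0 and a constant variance, the sketch collapses to the Black-Scholes price, which provides a convenient sanity check.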

Second, the objective function ($RMSE) for the HN-GARCH model contains jump discontinuities, and its optimization often requires a large number of function evaluations, which translates into prohibitively long running times. The results generated by the NLS procedure are also very sensitive to the starting values for θ = (ω, α, β, γ) provided as input to the optimizer. We therefore run the risk of ending up in a local minimum of the loss function ($RMSE) rather than the global minimum we are looking for. A promising way to alleviate such concerns is to use algorithms such as the accelerated random search (ARS) proposed by Mller et al (2013). Mller et al acknowledge the limitations of local optimizers such as the one used in this research (active-set algorithms) for non-smooth objective functions, and show that ARS leads to significantly better results for the Heston and Nandi GARCH model. However, using ARS on our data would require considerable computational power (several multi-core computers for parallel processing), which we did not have at our disposal for this study. We therefore implement a middle-ground solution: the objective function is evaluated


at several starting points for the parameter vector (ω, α, β, γ), and the optimization starts from the set of initial parameters that yields the smallest objective function value.
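
The starting-point selection just described can be sketched in a few lines of Python. The objective below is a toy stand-in with a jump discontinuity, not the actual $RMSE of the HN model, and both function names are hypothetical; the selected point would then be handed to the local optimizer.

```python
def best_starting_point(objective, candidates):
    """Return the candidate with the smallest objective value, and that value."""
    best, best_value = None, float("inf")
    for theta in candidates:
        value = objective(theta)
        if value < best_value:
            best, best_value = theta, value
    return best, best_value


def toy_objective(theta):
    """Hypothetical non-smooth loss with a jump, standing in for the $RMSE."""
    x, y = theta
    jump = 1.0 if x < 0 else 0.0
    return (x - 2.0) ** 2 + (y + 1.0) ** 2 + jump


grid = [(-3.0, 0.0), (0.0, 0.0), (1.5, -1.0), (4.0, 2.0)]
start, start_value = best_starting_point(toy_objective, grid)
```

This does not guarantee a global minimum; it only reduces the chance that the local search begins in a poor region of the parameter space.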

Let t be any given Wednesday of our sample, with t = 1 corresponding to January 17th, 1996 and t = 754 to June 30th, 2010. At each step of the NLS procedure, prices for the cross-section of options at time t are calculated from equation (14), and therefore depend on the values of θ = (ω, α, β, γ) and h(t + Δ). The conditional variance h(t + Δ) is obtained from expression (13), which depends on the GARCH parameters being optimized, the log-return of the spot asset since the previous Wednesday t - Δ, and the conditional variance h(t) between t - Δ and t. We proxy for the variance h(t) with an adjusted level of the CBOE volatility index (VIX) at time t - Δ. This index uses real-time SPX option bid and ask quotes to provide a daily forecast of the expected volatility of the S&P 500 index over the next 30 days. In other words, the VIX uses current information to form expectations about SPX volatility in the near future. This makes the lagged level of the VIX a particularly good candidate to proxy for h(t), defined as the conditional variance between the last period t - Δ and the current period t, based on the information set available at time t - Δ. Figure 4 shows the calibrated prices for the first day of our sample, Jan. 17th, 1996, and Figure 3 shows the weekly VIX index level over our sample period.
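
The variance update can be illustrated with the short Python sketch below. It assumes that expression (13) has the standard first-order Heston-Nandi form, h(t + Δ) = ω + β h(t) + α (z(t) - γ √h(t))², with the shock z(t) backed out from the observed log-return. The conversion from a VIX level to a per-period variance is purely hypothetical, since the exact adjustment applied in this study is not reproduced here, and both function names are illustrative.

```python
import math


def hn_next_variance(h_t, log_return, rate, omega, alpha, beta, gamma, lam=-0.5):
    """One step of the assumed HN(1,1) variance recursion.

    The return equation log_return = rate + lam * h_t + sqrt(h_t) * z_t
    is inverted to recover the standardized shock z_t.
    """
    z_t = (log_return - rate - lam * h_t) / math.sqrt(h_t)
    return omega + beta * h_t + alpha * (z_t - gamma * math.sqrt(h_t)) ** 2


def vix_to_period_variance(vix_level, periods_per_year=252):
    """Hypothetical proxy: annualized VIX (in percent) to a per-period variance."""
    return (vix_level / 100.0) ** 2 / periods_per_year
```

For example, a lagged VIX level of 20 would map to a per-period variance of 0.04 / 252 under this assumed conversion, which then feeds the recursion together with the latest index return.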

Figure 4: Calibration of the HN GARCH model on the cross-section of OTM SPX options on Jan. 17th, 1996

Figure 3: Weekly VIX Index level (Jan. 1996 - June 2010)


Empirical Results

In previous sections, we provided an overall description of the sample of option and index data used in this research, as well as an overview of the theory we rely on to estimate the time series of the option pricing error. We now present our empirical results, which we divide into two parts. We start with a section comparing the in-sample and out-of-sample pricing performance of our two models, ABS and HN, over our sample period 1996-2010. In the second part, we turn to regression analysis in an attempt to explain some of the variation of the in-sample pricing error over time for our sample of out-of-the-money options. Based on the existing literature on option pricing and stock return predictability, we test the explanatory power of a series of potential predictor variables, for both the ABS and the HN model. Next, we generalize our results to the whole spectrum of moneyness, and we show, relying on the ABS model only, that our regression results are robust to the inclusion of ITM and DITM options. Finally, we rely on the time series of out-of-sample RMSEs for the ABS model and study the ability of our explanatory variables to forecast future OOS pricing error.

I. Option pricing performance of the models

In-sample model performance

In this section, we describe the in-sample option pricing performance of the ad-hoc Black and Scholes procedure and the Heston and Nandi GARCH model. Figure 5 shows the time series of the in-sample dollar root mean squared error ($RMSE) for our two models. On the next page, Table 4 provides the yearly means and standard deviations of the coefficient estimates for the two models, as well as yearly descriptive statistics for the $RMSE.

Figure 5: Time series of the in-sample option pricing performance (January 17th, 1996 - June 30th, 2010)

As a first important remark, we note that for both models, pricing performance does not appear to behave randomly over the sample period. For both the ABS and HN, the time series of the $RMSE exhibits a marked trend in Figure 5, with considerably higher and more volatile in-sample pricing errors around the two crises in 2000 and 2008. After the dot-com bubble, the $RMSE seems to revert to its original pre-crisis level. A similar trend is observed in the aftermath of the

Table 4: Calibration of the HN GARCH and ABS pricing models

HN
Year    α ×10^(…)       β               γ ×10^(…)       ω ×10^(…)       Ann. Vol.       RMSE
        Mean    Std.    Mean    Std.    Mean    Std.    Mean    Std.    Mean    Std.    Mean    Std.
1996    2.21    1.89    0.04    0.08    7.99    3.25    0.08    0.17    0.120   0.011   1.37    0.39
1997    4.28    3.11    0.14    0.18    5.14    1.62    0.37    0.82    0.161   0.022   2.20    0.49
1998    4.79    1.80    0.01    0.04    4.80    1.29    0.04    0.19    0.206   0.039   3.76    1.33
1999    4.25    1.95    0.01    0.03    5.34    1.89    0.01    0.05    0.221   0.023   5.21    1.35
2000    4.66    2.37    0.03    0.07    4.98    1.61    0.04    0.25    0.175   0.020   5.19    1.03
2001    6.31    3.62    0.05    0.11    4.26    1.48    0.32    0.94    0.175   0.011   3.49    1.06
2002    8.07    6.20    0.14    0.20    3.61    1.23    0.39    0.90    0.187   0.027   2.22    0.53
2003    9.86    5.66    0.19    0.20    3.19    1.38    0.77    1.47    0.178   0.026   1.52    0.37
2004    3.38    1.71    0.07    0.14    5.96    2.16    0.10    0.31    0.143   0.013   2.10    0.55
2005    1.92    0.84    0.03    0.12    7.58    1.97    0.06    0.18    0.116   0.007   2.33    0.58
2006    1.57    0.66    0.02    0.07    8.31    1.92    0.04    0.17    0.110   0.009   2.60    0.52
2007    2.21    2.26    0.02    0.06    8.40    3.34    0.02    0.07    0.138   0.033   3.44    0.80
2008    8.46    7.00    0.09    0.14    4.03    1.85    0.37    0.84    0.225   0.064   3.30    1.11
2009    19.09   12.26   0.13    0.20    2.47    1.21    0.91    2.03    0.258   0.045   2.00    0.54
2010    7.81    4.38    0.05    0.16    3.89    1.30    0.43    0.80    0.199   0.024   2.49    0.87

ABS
Year    a_0             a_1             a_2             a_3             a_4             a_5             RMSE
        Mean    Std.    Mean    Std.    Mean    Std.    Mean    Std.    Mean    Std.    Mean    Std.    Mean    Std.
1996    -0.878  0.190   1.33    0.31    -0.33   0.13    0.32    0.053   0.01    0.012   -0.33   0.049   1.24    0.25
1997    -0.610  0.148   1.01    0.22    -0.24   0.07    0.25    0.073   0.00    0.012   -0.26   0.078   1.81    0.42
1998    -0.715  0.107   1.17    0.20    -0.27   0.07    0.29    0.039   -0.01   0.012   -0.26   0.040   2.34    0.51
1999    -0.689  0.103   1.16    0.15    -0.30   0.06    0.23    0.046   -0.02   0.012   -0.17   0.026   3.16    0.59
2000    -0.465  0.100   0.82    0.14    -0.18   0.04    0.14    0.060   0.01    0.014   -0.14   0.049   3.59    0.73
2001    -0.325  0.098   0.61    0.16    -0.09   0.07    0.13    0.077   0.01    0.018   -0.15   0.055   2.35    1.07
2002    -0.294  0.117   0.58    0.13    -0.08   0.05    0.15    0.063   0.01    0.018   -0.19   0.040   1.34    0.44
2003    -0.262  0.086   0.47    0.09    -0.04   0.03    0.20    0.051   0.00    0.015   -0.20   0.023   0.85    0.19
2004    -0.472  0.098   0.69    0.15    -0.11   0.06    0.24    0.038   -0.01   0.008   -0.21   0.031   1.08    0.71
2005    -0.673  0.137   0.97    0.22    -0.20   0.08    0.28    0.042   -0.01   0.008   -0.25   0.034   1.44    0.24
2006    -0.921  0.160   1.35    0.26    -0.34   0.10    0.35    0.047   0.00    0.012   -0.33   0.048   1.96    0.30
2007    -0.947  0.131   1.47    0.18    -0.39   0.07    0.31    0.046   0.00    0.015   -0.31   0.041   2.43    0.52
2008    -0.509  0.199   1.00    0.20    -0.24   0.08    0.17    0.106   0.02    0.042   -0.21   0.038   2.00    0.74
2009    -0.293  0.073   0.61    0.09    -0.08   0.03    0.23    0.045   -0.01   0.018   -0.21   0.022   1.10    0.19
2010    -0.424  0.065   0.68    0.14    -0.09   0.04    0.27    0.026   0.00    0.016   -0.24   0.027   1.24    0.25

stock market crash in 2008, with $RMSEs reverting to their levels of 1996 or 2003. The mean yearly coefficient estimates in Table 4 also hint at non-random time series behavior. For example, the HN coefficients γ and α both reach extreme values in 2009, at the climax of the credit crisis. Furthermore, the annualized GARCH volatility (Ann. Vol. in Table 4), given by \( \sqrt{252(\omega + \alpha)/(1 - \beta - \alpha\gamma^2)} \), also exhibits two peaks, around 2000 and in 2009. Altogether, these results suggest that the two series of pricing errors share a common trend, and that this trend might be explained or even forecast by variables from our option sample (for example, average option price or average trading volume), but also by index characteristics or other market variables (such as fluctuations in investor sentiment or short-term interest rates). This will be the focus of our regression analyses in the later sections of this paper.
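
The annualized volatility formula quoted above is easy to evaluate directly. The Python sketch below does so for an arbitrary, purely illustrative parameter set (not an estimate from Table 4) and refuses to return a value when the persistence β + αγ² reaches one, since the long-run variance is then undefined. The function name is hypothetical.

```python
import math


def hn_annualized_volatility(omega, alpha, beta, gamma, periods_per_year=252):
    """Annualized long-run volatility sqrt(252 (omega + alpha) / (1 - beta - alpha gamma^2))."""
    persistence = beta + alpha * gamma ** 2
    if persistence >= 1.0:
        raise ValueError("variance process is not stationary")
    return math.sqrt(periods_per_year * (omega + alpha) / (1.0 - persistence))


# Illustrative values only: persistence = 0.6 + 2e-6 * 400^2 = 0.92
example_vol = hn_annualized_volatility(omega=1e-6, alpha=2e-6, beta=0.6, gamma=400.0)
```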

Figure 5 also seems to indicate that, over our sample period, the ABS procedure consistently prices options more accurately than the HN GARCH model. This appears to contradict the results presented in Heston and Nandi (2000) and Barone-Adesi, Engle and Mancini (2008). In both these studies, the HN-GARCH model is shown to compete closely with the ABS model or even outperform it in terms of in-sample pricing performance. Using the ABS model as their benchmark, Barone-Adesi et al (BEM) show that, on a yearly basis, the HN-GARCH model persistently outperforms over the years 2002-2004. Heston and Nandi (HN) report the same result for the in-sample performance of the two models over the period 1992-1994. Our calibration results, as shown in Figure 5, do not confirm those findings and even suggest the opposite: the $RMSEs for the ABS procedure over 1996-2010 are consistently lower than those of the HN GARCH model. This divergence can be explained by differences in the specification of the ABS procedure. Both BEM and HN use the original equations from DFW (1998) for their ad-hoc Black and Scholes benchmark, which we henceforth denote ABSDFW to differentiate it from our ABS procedure. In DFW, the implied volatilities are smoothed across maturities and strike prices K, rather than across maturities and the moneyness ratio K/S, as we do in equation (5). Although this may seem like a benign difference, there is empirical evidence that the transformation of input variables in implied volatility functions has a significant impact on pricing performance (see Andreou et al. (2013)). To test for differences in performance across ABS model specifications, we calibrate the original ABSDFW procedure on our sample of option data and compare its pricing performance against that of the HN GARCH model. Figure 6 shows that our results actually corroborate the findings in HN and BEM: the HN model consistently outperforms the ABSDFW benchmark for 2002-2004.
Figure 6: Comparison between the original DFW procedure and the HN GARCH model


Figure 6 shows that the error for the ABSDFW benchmark deteriorates especially during periods of financial turbulence. The errors for this model around the years 2000 and 2008 are roughly three times as high as those of the HN model and four times as high as those of the modified ABS model. Overall, Figure 5 and Figure 6 suggest that the modified ABS procedure we use in this study, as described in Bollen and Whaley (2004), is the most efficient at capturing the volatility smile over our sample period. Not only does it largely outperform the original ABSDFW, it also exhibits an overall lower level of pricing error than the HN-GARCH model. We provide several reasons to explain that last result.

1) First, this might be due to liquidity biases. As pointed out in Berkowitz (2010), the ABS procedure is able to price options of all maturities and moneyness categories with consistently impressive empirical performance, even for options that are thinly traded. Since roughly 50% of the option contracts in our sample have a daily volume of fewer than ten trades, the performance gap between the HN-GARCH model and the ABS procedure might be partially explained by the fact that the HN-GARCH model prices illiquid options less accurately.

2) A review of the GARCH option pricing literature provides a second potential explanation: the HN GARCH model is relatively inaccurate for certain moneyness or maturity categories. Hsieh et al (2005) claim that the HN model underperforms for deep-out-of-the-money options, and Ferreira et al (2005) find that its pricing performance deteriorates for options close to expiration. In fact, in their original article, Heston and Nandi already report that short-term OTM options are the most difficult to value with their GARCH model. Since short-maturity OTM and DOTM options are both represented in non-negligible numbers in our option sample, it makes sense to investigate the pricing performance across our range of moneyness and maturities. We investigate this matter in the next section of this paper.

3) Finally, flaws in our calibration algorithm might also explain part of the pricing performance gap. Even though the weekly $RMSEs in our two time series are the outcomes of a minimization algorithm, it is important to acknowledge the potential presence of local minima in our results, for both the HN and the ABS model. This is likely to be more relevant for the HN-GARCH model, for which the objective function minimized during calibration is highly non-linear, has jump discontinuities and is very sensitive to starting values. We therefore suspect that running the NLS procedure with a genetic algorithm might produce time series of the pricing error that differ slightly from those we present in this research, with larger differences for the HN-GARCH than for the ABS model.


In-sample performance across maturity and moneyness categories

In the previous section, we introduced the idea of comparing the pricing performance of our two models for different types of options. We suspect that the relative underperformance of the HN GARCH model observed in Figure 5 might result from the inability of the GARCH model to capture some parts of the volatility smile. For example, the results presented in Hsieh et al (2005) and Ferreira et al (2005) suggest that DOTM options with a short time to maturity might be more severely mispriced by the HN GARCH model. To better understand the time series behaviour of our two models over our aggregate sample of OTM options, we investigate the pricing performance across several moneyness and maturity categories. As in the description of the option data, we divide our sample into three maturity categories: short maturity (less than 60 days), medium maturity (60 to 180 days) and long maturity (more than 180 days). Building on Bollen and Whaley (2004), we also define three moneyness categories based on the option's delta: deep-out-of-the-money (0.02 < Δ_C ≤ 0.125 for calls and -0.125 < Δ_P ≤ -0.02 for puts), out-of-the-money (0.125 < Δ_C ≤ 0.375, -0.375 < Δ_P ≤ -0.125) and near-the-money (0.375 < Δ_C ≤ 0.625, -0.625 < Δ_P ≤ -0.375). Figure 7 illustrates the results of our classification. Note that due to our initial filtering of option data, we do not consider ITM or DITM options. To generalize our results to the full sample of option data, we repeated the analysis including ITM and DITM options, using only the ABS model. These results, along with their interpretation, can be found in Appendix 1. When calibration is performed on the full sample of SPX options, the HN model requires prohibitively long computation times, and it was therefore left out.
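
The classification rules above translate directly into code. The following Python sketch assigns a maturity bucket and a delta-based moneyness bucket to a single option; the function names and the string labels are illustrative only, and options falling outside the three delta bands (for instance ITM options) are returned as `None`.

```python
# Delta bands stated for calls; puts use the mirrored negative intervals.
DELTA_BANDS = [("DOTM", 0.02, 0.125), ("OTM", 0.125, 0.375), ("NTM", 0.375, 0.625)]


def maturity_category(days_to_expiry):
    """Short (< 60 days), medium (60 to 180 days) or long (> 180 days)."""
    if days_to_expiry < 60:
        return "short"
    if days_to_expiry <= 180:
        return "medium"
    return "long"


def moneyness_category(delta, is_call):
    """Return 'DOTM', 'OTM' or 'NTM' (near-the-money), or None if outside all bands."""
    for label, low, high in DELTA_BANDS:
        if is_call and low < delta <= high:
            return label
        if not is_call and -high < delta <= -low:
            return label
    return None
```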

Figure 7 shows that the difference in valuation performance is not uniform across the spectrum of maturities and moneyness. In fact, the two pricing models value some option categories with almost identical accuracy. For example, the HN model appears to compete on an equal basis with ABS for medium-maturity options and only slightly underperforms for options with a long time to expiration. For some other categories, the performance gap between the models widens considerably. This is especially visible for short-term options, for which the pricing error of the HN model consistently fluctuates above that of its ABS counterpart. The HN model performs worst for short-maturity DOTM and OTM options, for which the $RMSEs are more than twice as high as for the ABS model. Figure 7 therefore confirms our intuition: while the ABS model accurately values options across maturities and moneyness, the HN model systematically fails to capture some parts of the volatility surface and therefore underperforms. Figure 7 also allows us to reject our previous hypothesis that the larger errors of the HN model are due to its poor performance with thinly traded options. Indeed, a look back at Table 3 reveals that the options with the lowest average trading volumes are the ones with medium and long maturities, which are also those for which the HN GARCH model performs closest to the ABS model. We therefore conclude that, of the three explanations given in the previous section, the second is the most likely to explain the performance gap.

Figure 7: Comparison of the pricing error across moneyness and maturity categories for the HN and ABS models

Out-of-sample model performance

We now turn to out-of-sample (OOS) valuation performance for both our models. On each Wednesday in our sample, the in-sample parameter estimates provided by the calibration are used to value SPX options one week later. Figure 8 shows the time series of the OOS dollar pricing errors for our two models.
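
The out-of-sample exercise can be summarized by the following Python sketch: the parameters calibrated in week t are applied, unchanged, to the option quotes observed in week t + 1. The pricing function is passed in as an argument so that the same loop serves any model; the function name and the data layout are hypothetical.

```python
import math


def out_of_sample_rmse(params_by_week, quotes_by_week, price_fn):
    """$RMSE in week t + 1 obtained with the parameters calibrated in week t.

    quotes_by_week[t] is a list of (contract, market_price) pairs, and
    price_fn(params, contract) returns the model price of one contract.
    """
    errors = []
    for week in range(len(params_by_week) - 1):
        params = params_by_week[week]
        next_quotes = quotes_by_week[week + 1]
        squared = [(price_fn(params, contract) - market) ** 2
                   for contract, market in next_quotes]
        errors.append(math.sqrt(sum(squared) / len(squared)))
    return errors
```

The resulting list has one entry fewer than the number of calibration dates, since the last set of parameters has no following week to be tested on.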

Figure 8: Out-of-sample pricing errors of the HN and ABS models

As for the in-sample pricing errors, the two OOS time series exhibit similar trends, with pricing errors rising around periods of financial turmoil and returning to pre-crisis levels afterwards. The ABS procedure also slightly outperforms over our aggregate sample period, as was the case in-sample, and valuation errors seem to remain in the same range, with no sudden explosion in the OOS $RMSE values. This reassures us that both models are flexible enough to achieve good valuation performance. On any given day of our sample, both ABS and HN appear to fit the dynamics of index returns as well as the shape of the implied volatility surface quite accurately. Although the ABS model still outperforms in general over the period 1996-2010, the difference in out-of-sample pricing errors between the HN and the ABS model tends to be smaller than the in-sample difference observed in Figure 5.

Appendix 3 shows the time series of the differences in $RMSE between the ABS and HN model in- and out-of-sample. The HN tends to outperform ABS more often when out-of-sample valuation is considered. This partially corroborates the results of Heston and Nandi (2000) and Barone-Adesi, Engle and Mancini (2008), who find that the performance of the ABS model considerably deteriorates out-of-sample because it captures pricing mechanisms by overfitting the data. However, even if the pricing performance of the ABS model indeed deteriorates out-of-sample relative to that of the HN model, our results suggest that the effect is only of moderate magnitude. This indicates that even if the ABS model slightly overfits the data, its estimates remain stable out of sample. Overall, and despite the small differences in performance reported in this section, the pricing errors of our two option pricing models seem to share a common component, which is the focus of our next section.


II. Factors of option pricing performance

Foreword

We mentioned in the beginning of this chapter that there were two major parts to our empirical analysis. In the previous section, we covered the first part by documenting the in-sample and out-of-sample performance of our two option pricing models. We now turn to the second part of our study, which directly derives from our previous findings. In our earlier description of the in and out-of-sample pricing errors for the two models, shown in Figure 5 and Figure 8, we observed that the $RMSE time series of the HN and ABS models seemed to exhibit more than just random fluctuations. This intuition is further confirmed when we consider the time series behaviour of the error across option categories in Figure 7. In all these charts, valuation errors seem to behave over time in a synchronous pattern for both the ABS and HN models. This is all the more striking considering that these models are diametrically different in nature, which reinforces our belief in a common cyclicality. The sharp increases around the two crisis periods, and the relatively stable errors in between, make the error time series in Figure 5 resemble the S&P500 index curve shown in Figure 1. This leads us to believe that valuation errors might vary along with fluctuations of the index level, or even other market variables such as interest rates. Motivated by this intuition, we decide to shift our focus: instead of comparing the differences in performance between models, we try to detect common patterns in the two time series. Moreover, if such a common trend exists, we want to identify its underlying factors. In other words, our primary goal is to find factors able to explain why our alternative models both perform well during some periods and why their accuracy deteriorates in other periods.

Stationarity and unit root testing

As stated above, the first step of our approach is to show statistically that the time series of the pricing error for the ABS and HN models share a common trend. One way to back that claim is to derive the cross-correlation function of the two error series and look for high correlations between the various lags of the two series. However, we first need to consider the issue of stationarity. Before making any inferences about common trends in our two time series, statistical theory requires that our two $RMSE series be stationary, or in other words mean-reverting. This condition also needs to be fulfilled when we use time series regressions later in this chapter, in order to avoid spurious regressions. The outcome of the unit root tests (see Appendix 4) for our two pricing error series indicates that both the ABS (lags = 12, p = 0.31, ADF statistic = -1.97) and the HN (lags = 10, p = 0.14, ADF statistic = -2.41) $RMSE time series have at least one unit root. In other words, the error time series for ABS and HN are integrated processes of order d, with d ≥ 1. This means that at least one order of differencing is necessary to induce stationary behaviour. We calculate the first difference of the two non-stationary $RMSE series, Δ$RMSE_t = $RMSE_t - $RMSE_{t-1}. Charts of these differenced series over our sample period can be found in Appendix 2. A second round of ADF tests on the differenced series shows that neither has a unit root, so one order of differencing was sufficient to induce stationarity.
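
The logic of the unit root test can be illustrated with a simplified version of the procedure. The Python sketch below computes a plain Dickey-Fuller statistic, that is, the t-statistic on ρ in the regression Δy_t = c + ρ y_{t-1} + e_t, without the lagged differences that the augmented test adds. It is therefore only an approximation of the ADF tests reported in Appendix 4, and the simulated random walk is artificial data, not one of our $RMSE series. With a constant in the regression, the asymptotic 5% critical value is about -2.86: statistics above that level do not reject a unit root.

```python
import math
import random


def first_difference(series):
    """Return y_t - y_{t-1} for t = 1, ..., n - 1."""
    return [b - a for a, b in zip(series, series[1:])]


def dickey_fuller_stat(series):
    """t-statistic on rho in  dy_t = c + rho * y_{t-1} + e_t  (no lag augmentation)."""
    lagged = series[:-1]
    diffs = first_difference(series)
    n = len(diffs)
    mean_x, mean_d = sum(lagged) / n, sum(diffs) / n
    sxx = sum((x - mean_x) ** 2 for x in lagged)
    sxy = sum((x - mean_x) * (d - mean_d) for x, d in zip(lagged, diffs))
    rho = sxy / sxx
    const = mean_d - rho * mean_x
    ssr = sum((d - const - rho * x) ** 2 for x, d in zip(lagged, diffs))
    std_err = math.sqrt(ssr / (n - 2) / sxx)
    return rho / std_err


rng = random.Random(0)
walk = [0.0]
for _ in range(500):
    walk.append(walk[-1] + rng.gauss(0.0, 1.0))  # simulated unit-root process

stat_level = dickey_fuller_stat(walk)
stat_diff = dickey_fuller_stat(first_difference(walk))
```

On the simulated data, the statistic for the differenced series is strongly negative, which mirrors the pattern described above: one round of differencing is enough to remove the unit root.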


Common trends among ABS and HN pricing errors

In the last section, we derived two stationary pricing error series by using first differences. We now show that these two differenced series share a common trend by calculating their cross-correlation function (XCF). Figure 9 shows the autocorrelation functions of the two differenced series, as well as the XCF over twenty lag values. Figure 10 shows the same features for the original non-stationary series for comparison purposes. The first two charts of each figure show that first-order differencing removed the slowly decaying pattern of the autocorrelations in the original series. For the first-differenced series, autocorrelations fluctuate around zero across lag values. In each case, the autocorrelation for the first lag is the only one significantly different from zero, with values around -0.5. This is a sign that stationarity for the two error series may have come at the price of slight overdifferencing. We acknowledge that caveat and address it later in the section on time series regressions.
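
The link between a lag-one autocorrelation near -0.5 and overdifferencing can be checked on artificial data. If a series is already white noise e_t, its first difference e_t - e_{t-1} has variance 2σ² and lag-one autocovariance -σ², hence a theoretical lag-one autocorrelation of exactly -0.5. The Python sketch below computes sample auto- and cross-correlations and reproduces this value on simulated noise; it is an illustration, not the computation behind Figure 9, and the function names are hypothetical.

```python
import math
import random


def cross_correlation(x, y, lag):
    """Sample correlation between x[t + lag] and y[t], for lag >= 0.

    Both series must have the same length.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    scale_x = math.sqrt(sum((v - mean_x) ** 2 for v in x))
    scale_y = math.sqrt(sum((v - mean_y) ** 2 for v in y))
    cov = sum((x[t + lag] - mean_x) * (y[t] - mean_y) for t in range(n - lag))
    return cov / (scale_x * scale_y)


def autocorrelation(x, lag):
    """Sample autocorrelation, i.e. the cross-correlation of x with itself."""
    return cross_correlation(x, x, lag)


rng = random.Random(1)
noise = [rng.gauss(0.0, 1.0) for _ in range(5000)]        # already stationary
over_differenced = [b - a for a, b in zip(noise, noise[1:])]
rho_1 = autocorrelation(over_differenced, 1)              # close to -0.5
```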

Figure 9: Autocorrelations and XCF function of first-differenced $RMSE series for ABS and HN

Figure 10: Autocorrelations and XCF function of original $RMSE series for ABS and HN

Note that the first-differenced error series are moderately correlated for lag 0 (Pearson's r = 0.40, p


exhibit a common trend in their valuation performance. This encourages us to search for factors that could explain this observed trend.

Error forecasting variables

In this section, we try to determine a set of variables likely to explain the common pattern in pricing errors across alternative models. We distinguish two broad categories among our set of potential underlying factors: options market variables and other market variables. The first set relates to characteristics of our sample of SPX options, such as option maturity or moneyness. The second regroups factors that are specific to the spot market for the S&P500 index, but also comprises other broad market variables that we expect to have some additional explanatory power. We now give the rationale behind our choice of error forecasting variables.

Option market variables

We have repeatedly mentioned the impact of certain option characteristics on pricing performance when comparing the ABS and HN models in the previous section. Factors such as moneyness, time to expiration or trading volume have been shown to influence model error in the existing option pricing literature (Hsieh et al. (2005), Ferreira et al. (2005), Heston and Nandi (2000), Barone-Adesi et al. (2008)). Our empirical results also provided evidence to support that claim, with Figure 7 showing considerably different levels of error across moneyness and maturity categories. We therefore expect variables such as mean moneyness (defined as K/S), mean trading volume and mean maturity to influence the pricing error over time. To check for additional patterns, we add three complementary variables: the number of options priced under $1, the total number of options and the number of untraded options. By doing so, we want to observe whether large numbers of options with very low prices or zero trading volume have an influence on the overall $RMSE. The variable total number of options is added to account for the explosive growth in size of the S&P500 equity option market over our sample period9. Furthermore, we also include the average implied volatility, the put-call ratio and deviations from put-call parity as predictors. We define the put-call ratio as the daily put trading volume divided by the total trading volume. The put-call ratio is often considered a proxy for investor sentiment by both finance academics and practitioners, and studies have successfully linked it to stock returns. For example, Pan and Poteshman (2006) show that their put-call ratio measure has significant predictive power for the returns of individual stocks, with high (low) ratios indicating short-term underperformance (outperformance).
Since both our RMSE time series exhibit higher levels of pricing error during times of crisis, which can be considered low-sentiment periods, we expect the put-call ratio to explain some of the variance in pricing errors. Finally, we add deviations from put-call parity as our last option market variable. The put-call parity relationship requires that put and call options with the same strike price and maturity have the same Black-Scholes implied volatility. In practice, however, deviations do exist, due to short-sale constraints, difficulties in borrowing the underlying stock, or

9 For our sample of option data, the number of OTM options traded daily increased from 92 on January 17th, 1996 to 557 on June 30th, 2010. In terms of intraday volume, the increase is even more spectacular: from 31,179 on January 17th, 1996 to 449,726 on June 30th, 2010.


information asymmetries. For example, Lamont and Thaler (2003) argue that short-sale restrictions on the spot asset can prevent arbitrageurs from restoring the equilibrium between stock and option prices, which often leads to puts that are more expensive than the corresponding calls. The impact of deviations from put-call parity on both the HN and ABS pricing errors is immediate. For example, for the HN model, deviations directly lead to the mispricing of put options, since the put prices are derived from call prices using put-call parity. To derive our measure of deviations from put-call parity, we use a measure similar to that used by Cremers and Weinbaum (2010). On each day of the sample, we build pairs of put and call options with the same strike price and time to expiration. We then calculate the difference in implied volatility within each of these pairs and use the average daily difference as our proxy. Note that put-call parity deviations can only be observed when the whole sample of SPX options is considered. Indeed, a put and a call option can only have identical strikes and expiration dates if (1) both options are ATM or (2) one is OTM and the other is ITM. We therefore include this variable only in the regressions for the full sample of SPX options, which we estimate only for the ABS model because of the prohibitively long calibration times for the HN model.
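
The two sentiment-related option variables can be computed as in the Python sketch below. The put-call ratio follows the definition used in this study (put volume over total volume). For the parity deviation, the sign convention (call implied volatility minus put implied volatility) and the data layout are assumptions made for illustration, and the function names are hypothetical.

```python
def put_call_ratio(put_volume, call_volume):
    """Daily put trading volume divided by total (put plus call) volume."""
    total = put_volume + call_volume
    return put_volume / total if total > 0 else float("nan")


def parity_deviation(quotes):
    """Average implied-volatility gap over matched call/put pairs.

    quotes is an iterable of (strike, expiry, is_call, implied_vol) tuples;
    a pair is a call and a put sharing the same strike and expiry.
    Assumed sign convention: call implied volatility minus put implied volatility.
    """
    calls, puts = {}, {}
    for strike, expiry, is_call, implied_vol in quotes:
        (calls if is_call else puts)[(strike, expiry)] = implied_vol
    matched = calls.keys() & puts.keys()
    if not matched:
        return float("nan")
    return sum(calls[key] - puts[key] for key in matched) / len(matched)
```

Unmatched options are simply ignored, which is why the measure requires the full sample: in a sample restricted to OTM options, almost no call has a put with the same strike and expiration.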

Other variables

In rational, efficiently functioning and complete markets, returns on the SPX index and SPX options would be perfectly correlated and options would be valued with perfect accuracy. In practice, of course, options are priced with a certain level of error, as our non-zero pricing error series show in Figure 5. Academics have studied feedback effects between the spot and options markets extensively10. This stream of research has shown that spot and derivatives markets interact through lead-lag relationships in returns and volatilities, price discovery and information spillovers, leading us to think that market variables influencing stock returns might also affect option pricing performance. As a result, we consider including internal11 spot market variables, such as one-period SPX returns, but also other broad market variables in our set of error forecasting variables. We justify our choice of variables by reviewing the findings of the literature on stock return predictability. Our first step is to include variables specific to the underlying S&P 500 index. We select the following index-related variables as predictors: the one-period return, the trading volume on the day of the pricing, the average volume over the last trading week, the observed volatility over the last 10 days, the VIX index level, the volatility of volatility, the S&P 500 SKEW index level, the price-to-book ratio, the price-earnings ratio and the index dividend yield. Note that although the VIX index, the SKEW index and the volatility of volatility are option-based, we arbitrarily include them as index-related variables. Our choice to include the contemporaneous and past trading volume on the index relies on the visibility hypothesis described in Gervais, Kaniel and Mingelgrin (2001). They argue that shocks to trading activity, or volume, carry information about the direction of future stock price movements.
They postulate that unusually high trading activity in individual stocks creates shocks in trader interest, which lead to the

10 See for instance Conover and Peterson (1999), Gwilym and Buckle (2001) or Pan and Poteshman (2006).

11 By internal variables, we mean variables that are specifically used as input for the ABS and HN pricing formulae.


existence of a high-volume return premium over short-term holding periods. They build three value-weighted portfolios of stocks: a high-volume portfolio, a normal-volume portfolio and a low-volume portfolio. They show that, for formation periods of one day and one week, the high-volume portfolio earns abnormal risk-adjusted returns for holding periods of up to a hundred days, without being rebalanced. In a similar portfolio study, Huang and Heian (2010) use a sample including all firms listed on the NYSE and AMEX and study the existence of the high-volume return premium over a sample period of fifty years. Their results also support the existence of a high-volume premium for holding periods of one to four weeks. They show that the premiums are mostly concentrated in the first two weeks after the formation period, that they decrease monotonically for extended holding periods, and that they even become negative for periods longer than eight weeks.

Building on this empirical research, we hypothesize that if shocks to the trading volume of the index potentially impact stock returns, they might also influence the option pricing error. To account for both the immediate and the gradual impact of volume on pricing performance, we include a contemporaneous measure of trading volume (the trading volume on the day of the pricing) as well as a lagged measure (the average volume over the last trading week). Both volumes are defined in number of contracts traded.

Next, we include two measures of volatility for the S&P 500 index. The impact of the volatility of the spot asset on option valuation is more direct than what we just described for the trading volume. Indeed, the valuation formulae for the HN and ABS models both take some form of index volatility as an input. For the ABS model, the volatility used in the Black-Scholes pricing formula is implied from option prices, which obviously depend on expected stock market volatility.
In the case of the HN GARCH model, the call price formula relies on the conditional variance over the next period, h(t + Δ), which measures the expected variance between the current and the next period and can therefore be expected to impact the pricing error. As we can see, the pricing performance is likely to be influenced by the expectations the market has for the index volatility in the near future. We account for that fact by adding one forward-looking measure of volatility, the well-accepted VIX index level, to our set of error forecasting variables.

Inspired by an article by Baltussen, Van Bekkum and Van Der Grient (2013), we also decide to add the volatility of volatility (vol-of-vol) as a forecasting variable. We use the authors' original proxy for the vol-of-vol, defined as the standard deviation of the implied volatilities of ATM put and call options over the last 20 trading days. In their empirical study, Baltussen, Van Bekkum and Van Der Grient sort individual stocks by vol-of-vol into value-weighted portfolio quintiles and show that stocks in the lowest quintile outperform those in the highest quintile by roughly 0.85% in the first month after portfolio formation. They also show that their results are robust to controlling for an extensive list of other known drivers of stock returns, such as size, book-to-market ratio, beta, momentum factor, stock turnover, put-minus-call implied volatilities or leverage. In light of these findings, we suspect that, as a measure capturing uncertainty about expected stock returns, vol-of-vol might explain some of the variance in the two time series of option pricing errors.

In addition to the VIX index level, whose inclusion we discussed earlier, we add another widely used index released by the Chicago Board Options Exchange, the SKEW index. The SKEW index provides traders and portfolio managers with a measure of the perceived tail risk of the distribution of the SPX log returns at a 30-day horizon. It acts as an indicator for the expected skewness of the distribution of log returns: when the SKEW level is around a value of 100, the distribution of log returns is expected to be almost normal. When the SKEW level is at higher values, the expected skewness becomes more negative and the probability of outlier negative returns increases. Accordingly, the SKEW index complements our measures of volatility by accounting for the fact that the distribution of log returns is not normal.

Next, we provide the rationale for adding the log price to book ratio, the log cyclically adjusted price-earnings ratio (CAPE)¹² and the log index dividend yield as predictor variables of the pricing error. Pioneering articles by Fama and French (1988a, 1988b) and Campbell and Shiller (1989) or, more recently, research by Campbell and Viceira (2005), show that variables such as the dividend yield or the price-earnings ratio have significant predictive power for stock returns. Fama and French (1988a) examine the predictive power of the dividend yield on stock returns for various holding periods. Their study stresses the importance of the holding horizon in stock return predictability: while the dividend yield explains less than 5% of the return variance for holding periods of a month, it explains up to 25% of the 3-5 year return variance. Campbell and Shiller build on this research by incorporating stock prices and the dividend yield in a vector autoregressive (VAR) model, along with the price-earnings ratio. They show that the P/E ratio is a powerful predictor of stock returns, especially as the holding period increases. Campbell and Viceira (2005) extend the VAR framework presented in Campbell and Shiller by including bond and T-bill returns in the VAR model. Their set of return-forecasting variables includes the log dividend yield, the nominal interest rate on the three-month Treasury bill and the yield spread between long and short maturity government bonds. Their article makes an important contribution to the literature by showing the term structure of the return variance and covariance for stocks, bonds and T-bills. In the specific case of stocks, they conclude that as the holding horizon increases, predictability from the dividend yield induces mean reversion, which makes stocks less risky to hold over long horizons.

For their two other return forecasting variables, short-term interest rates and term spreads, Campbell and Viceira (2005) find no conclusive evidence of stock return predictability. In similar research, Ang and Bekaert (2007) document the opposite: they show that at short horizons, short-term interest rates have significant predictive power for future excess returns. Since results differ across studies for the short-term interest rates and the term spread, we decide to include both as explanatory variables, because we suspect they might influence the option pricing error. We define the short-term interest rate as the log nominal interest rate on the 90-day Treasury bill and the yield spread as the log difference between the yield on the five-year Treasury note and the yield on the 90-day T-bill.

Finally, we also account for the fact that variations in investor sentiment might affect the price of S&P 500 index options and leave room for mispricing. Han (2007) studies the impact of investor sentiment on the shape of the implied volatility surface and the risk-neutral skewness of index returns for SPX options. His results show that bearish sentiment on the market leads to a more negative risk-neutral skewness of index returns and a steeper volatility smile. For bullish markets, implied volatility smiles tend to be flatter. In a more recent study, Frijns, Lehnert and Zwinkels (2012) present a stochastic volatility model that differentiates between three groups of option traders: traders that trade on long-term mean reversion, traders that trade on short-term patterns and traders that form expectations about future volatility based on fluctuations in investor sentiment. The authors find that when the third group of

12 The CAPE or PE10 (Shiller) is based on average inflation-adjusted earnings from the previous 10 years.


sentiment-driven traders is accounted for, in- and out-of-sample pricing errors are significantly reduced. Their results are in line with those of Han, and show that sentiment is a non-negligible determinant of the shape of the volatility smile and of option prices.

In this research, we include four commonly used proxies for investor sentiment: the Baker and Wurgler sentiment index, mutual fund flows, the American Association of Individual Investors (AAII) sentiment survey, and the Investors Intelligence (II) Bearish Sentiment Index. The sentiment measure of Baker and Wurgler (2006) is based on six proxies of investor sentiment: the closed-end fund discount, market turnover, the number of IPOs, the first-day return on IPOs, new equity issuances, and the difference in book-to-market ratios between dividend payers and non-dividend payers. In our analysis, we use the modified version of the index, for which each proxy has been orthogonalized with respect to the NBER recession indicator, consumption growth and industrial production. This allows the index to capture pure sentiment variations, rather than business cycle fluctuations. We obtain the historical values of the index from the authors' website. Our second proxy, mutual fund flows, has also been shown to be a good measure of investor sentiment (see for example Ben-Rephael et al. (2010) and Chiu and Omesh (2013)). We use the net monthly cash flows for domestic US equity funds from the Investment Company Institute (ICI) as our measure of sentiment. Finally, we include the AAII survey and the II report to account for differences in sentiment between institutional and individual investors. The AAII sentiment survey polls a random sample of individual investors every week and asks the respondents how they think the market will evolve over the next six months. Based on the responses, the AAII then calculates the proportion of bullish, bearish and neutral investors. Following Brown and Cliff (2004), we use the spread between the fraction of bullish and bearish investors as a measure of investor sentiment. The II report gathers over a hundred independent market newsletters every week and categorizes their content as bullish, bearish or neutral. A bull-bear spread is then calculated, in a similar manner to that of the AAII.

At first, it may seem redundant to use four different proxies for investor sentiment. We justify our choice by the fact that the contemporaneous values of these four measures are only moderately correlated over our sample period, as the table in Appendix 6 shows. Since our proxies for sentiment hardly move together over time, we fear that using merely one of them might keep us from observing the real effect of investor sentiment on the pricing error.
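The bull-bear spread and the correlation check between sentiment proxies can be illustrated with a short sketch. The survey fractions below are invented for the example and are not taken from the AAII or II data; the function names are likewise hypothetical.

```python
from math import sqrt


def bull_bear_spread(bullish, bearish):
    """Fraction of bullish respondents minus fraction of bearish respondents."""
    return [bull - bear for bull, bear in zip(bullish, bearish)]


def pearson(x, y):
    """Sample Pearson correlation between two equally long series."""
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    cov = sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y))
    var_x = sum((a - x_mean) ** 2 for a in x)
    var_y = sum((b - y_mean) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Made-up weekly fractions of bullish and bearish views (illustration only).
aaii_bull = [0.40, 0.35, 0.45, 0.30, 0.50]
aaii_bear = [0.30, 0.40, 0.25, 0.45, 0.20]
ii_bull = [0.50, 0.52, 0.48, 0.47, 0.51]
ii_bear = [0.25, 0.22, 0.27, 0.30, 0.24]

aaii_spread = bull_bear_spread(aaii_bull, aaii_bear)
ii_spread = bull_bear_spread(ii_bull, ii_bear)

# Contemporaneous correlation between the two spreads.
rho = pearson(aaii_spread, ii_spread)
```

A correlation well below one between two such spreads is the kind of evidence that motivates keeping several proxies rather than a single one.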

The inclusion of the ten option market variables and the fifteen other variables in our set of error predictors leaves us with a total of twenty-five explanatory variables for our time series regressions. We show the time series of these predictors in Appendix 8. As is often the case when working with financial time series, we face the problem that our variables are not all sampled at the same frequency. For example, values for the Baker and Wurgler sentiment index or those for the CAPE are only released monthly by their respective authors. In this case, we use spline interpolation to transform financial time series with monthly frequency into weekly data. A second issue is that some of our pricing error predictors are non-stationary. To meet the assumptions required by OLS regressions on time series, we need to induce stationarity, either by differencing or by detrending our raw predictor variables. The procedure for unit-root testing is identical to the one we described in our previous section on stationarity: augmented Dickey-Fuller (ADF) tests are carried out on each predictor, using the appropriate model (AR, ARD or TS) and the adequate lag length provided by our information criterion. Trend-stationary variables are detrended and all variables for which the ADF test fails to reject the null hypothesis of non-stationarity are first-differenced. Appendix 7 provides the results of the ADF tests for each predictor, as well as an overview of the transformations used to induce stationarity. The next section presents the results of our in-sample time series regression.
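The two transformations used to induce stationarity can be sketched as follows. The ADF test itself is not reproduced here: the sketch assumes that each predictor has already been classified, and the labels `"trend-stationary"` and `"unit-root"` as well as the function names are hypothetical. Detrending is implemented as the residual of an OLS regression on a linear time trend, which is one common choice and an assumption of this example.

```python
def first_difference(series):
    """Return x_t - x_{t-1}; the result is one observation shorter."""
    return [b - a for a, b in zip(series, series[1:])]


def detrend(series):
    """Remove a linear time trend fitted by OLS and return the residuals."""
    n = len(series)
    t_mean = (n - 1) / 2
    x_mean = sum(series) / n
    slope = (
        sum((t - t_mean) * (x - x_mean) for t, x in enumerate(series))
        / sum((t - t_mean) ** 2 for t in range(n))
    )
    intercept = x_mean - slope * t_mean
    return [x - (intercept + slope * t) for t, x in enumerate(series)]


def make_stationary(series, classification):
    """Apply the transformation matching a (hypothetical) unit-root test label."""
    if classification == "trend-stationary":
        return detrend(series)
    if classification == "unit-root":
        return first_difference(series)
    return list(series)  # already stationary: leave unchanged


# A pure linear trend is removed entirely by detrending.
residuals = make_stationary([2.0, 4.0, 6.0, 8.0], "trend-stationary")
changes = make_stationary([1.0, 3.0, 6.0], "unit-root")
```

Note that first-differencing shortens the series by one observation, so the dependent variable and the predictors have to be realigned before estimation.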

In-sample time series regression for the ABS model

The previous section provided the rationale behind our choice of variables to include in our time series regressions. We also gave an overview of our unit-root testing and data transformation process. We now proceed to the central piece of our research: the time series regressions of the option pricing errors on our twenty-five explanatory variables. Regression results are presented for the ad hoc Black and Scholes model, for our filtered sample of out-of-the-money S&P 500 index options. We also provide the regression estimates for the Heston and Nandi GARCH model as confirmatory evidence in the next section. We first focus on in-sample regressions, and then turn to out-of-sample prediction.

Table 5 shows the results of the multiple regressions of the pricing error on our set of predictors for the ABS model. In all four regression model specifications, the dependent variable is the first difference of the $RMSE, and the values of the predictors are contemporaneous to those of the dependent variable. We do not report regression results for the level of the error, since the level series is non-stationary, and its use in time series regressions would lead to spurious results. Note that we use four models in our regression table. Model 1 refers to a first-order autoregressive process, where the only predictor is the first lag of the dependent variable; Model 2 adds option sample characteristics to the set of predictors; and Model 3 includes all other predictors in the regression. Model 4 is a modified version of Model 3, for which all predictors have been first-differenced. Also note that the volatility of volatility has been omitted in Model 3, and that the VIX index level is absent from Model 4. We exclude these variables because of their high correlations with other predictors. In levels, contemporaneous values of VIX and vol-of-vol are strongly correlated (Pearson's r = 0.80, p


Table 5: In-sample regressions for ABS pricing errors (filtered sample of OTM options)

| Predictor | Model 1 | Model 2 | Model 3 | Model 4 |
|---|---|---|---|---|
| Lagged error | -0,412** (0,033) | -0,413*** (0,033) | -0,412*** (0,033) | -0,423*** (0,032) |
| Mean option price | | 0,042*** (0,011) | 0,043*** (0,011) | 0,014 (0,013) |
| Mean moneyness | | -0,191 (0,394) | 0,075 (0,611) | -1,224 (1,085) |
| Mean maturity | | 0,002 (0,001) | 0,001 (0,001) | 0,007*** (0,002) |
| Total number of options | | 3,904*** (1,133) | 3,777*** (1,151) | 0,004*** (0,001) |
| Number of untraded options | | -1,083 (1,258) | -1,406 (1,279) | -0,002 (0,001) |
| Number of options under 1$ | | -2,554 (1,771) | -1,893 (1,849) | -0,001 (0,002) |
| Put-call ratio | | 0,025 (0,180) | -0,018 (0,185) | 0,079 (0,134) |
| Mean volume per contract | | 0,009 (0,058) | 0,018 (0,058) | 0,000008 (0,00006) |
| Mean implied volatility | | -0,282 (1,467) | 2,137 (1,869) | 3,680 (1,870) |
| VIX level | | | 0,018 (0,373) | - |
| CAPE | | | 0,525 (2,180) | 0,705 (2,004) |
| Log T-bill interest rate | | | 51,600*** (17,952) | 62,951*** (16,79) |
| Log yield spread | | | 40,658*** (14,477) | 42,782*** (13,776) |
| Log dividend yield | | | -0,050 (3,899) | -122,916*** (18,78) |
| Bull-Bear spread institutions | | | 0,101 (0,205) | 0,0003 (0,004) |
| Bull-Bear spread individuals | | | -0,081 (0,132) | 0,003** (0,001) |
| Baker and Wurgler index | | | 0,010 (0,494) | 0,072 (0,457) |
| Skew level | | | -0,001 (0,004) | -0,004 (0,006) |
| SPX volume | | | -0,020 (0,019) | -0,00002 (0,00002) |
| SPX volume over past week | | | -0,080** (0,0393) | -0,00008** (0,00004) |
| One-period log returns | | | 0,014 (0,0105) | 0,772 (0,669) |
| Price to book ratio | | | -0,406* (0,225) | -0,446** (0,213) |
| Mutual fund flow | | | -0,003 (0,014) | -0,00005 (0,00006) |
| Volatility of volatility | | | - | 0,015 |
| Constant | 0,001 (0,018) | -0,081 (0,443) | -0,238 (0,628) | 0,00034 (0,031) |
| Num. Obs. | 752 | 752 | 752 | 752 |
| Adjusted R² | 0,1684 | 0,1999 | 0,2037 | 0,2716 |

Reports the unstandardized coefficients, standard errors (in parentheses) and adjusted R² for the multiple regressions of the in-sample $RMSE on contemporaneous values of the predictor variables for the ad hoc Black and Scholes procedure. Model 1 refers to an AR(1) model for the $RMSE. Model 2 and Model 3 respectively add option market factors and equity market factors to the set of predictor variables. Model 4 is equivalent to Model 3, with all predictor variables first-differenced. ***, ** and * denote significance at the 1%, 5% and 10% level respectively.
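To make the structure of Model 1 concrete, the sketch below regresses the first-differenced pricing error on its own first lag by OLS and reports the adjusted R². It is an illustration under simplifying assumptions, not the estimation code behind Table 5: the function name is hypothetical, standard errors are omitted, and the input series is made up so that its first differences follow an exact AR(1) with coefficient -0.5.

```python
def ar1_ols(errors):
    """Regress the first-differenced error on its own first lag (with intercept)."""
    diff = [b - a for a, b in zip(errors, errors[1:])]  # first difference
    y, x = diff[1:], diff[:-1]                          # y_t on y_{t-1}
    n = len(y)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    sxx = sum((a - x_mean) ** 2 for a in x)
    sxy = sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    # R-squared and its degrees-of-freedom adjustment for one regressor.
    ss_res = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - y_mean) ** 2 for b in y)
    r2 = 1 - ss_res / ss_tot
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - 2)
    return intercept, slope, r2, adj_r2


# Made-up error levels whose differences are 1, -0.5, 0.25, -0.125, 0.0625.
errors = [2.0, 3.0, 2.5, 2.75, 2.625, 2.6875]
intercept, slope, r2, adj_r2 = ar1_ols(errors)
```

A negative slope, as in the lagged error row of Table 5, means that an increase in the pricing error in one week tends to be partly reversed in the following week.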


In order to explain a higher fraction of the variance, we could also consider higher-order autoregressive models. To illustrate that claim, Appendix 9 shows the partial autocorrelation function for the first-differenced pricing error of the ABS model. The chart suggests that adding the second or third lag of the dependent variable as predictors in Model 1 might increase the fraction of variance explained by a significant amount. This also raises concerns that the inclusion of additional lags might weaken the significance of our results for Models 2, 3 and 4 of Table 5, or even eliminate it altogether. In untabulated results¹⁴, we show that our regression results are unaffected by the inclusion of additional lags. For the sake of conciseness, we only report our results for the first lag of the pricing error.
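For readers who wish to reproduce this kind of diagnostic, the partial autocorrelation function can be computed from the sample autocorrelations with the Durbin-Levinson recursion. The sketch below is one standard way of doing so and is not necessarily the routine used to produce Appendix 9; the function names and the input series are made up for the example.

```python
def autocorr(series, max_lag):
    """Sample autocorrelations rho_1, ..., rho_max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    var = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / var
        for k in range(1, max_lag + 1)
    ]


def pacf(series, max_lag):
    """Partial autocorrelations for lags 1..max_lag (Durbin-Levinson recursion)."""
    rho = autocorr(series, max_lag)
    phi_prev = []  # coefficients phi_{k-1,1}, ..., phi_{k-1,k-1}
    out = []
    for k in range(1, max_lag + 1):
        if k == 1:
            phi_kk = rho[0]
        else:
            num = rho[k - 1] - sum(phi_prev[j] * rho[k - 2 - j] for j in range(k - 1))
            den = 1 - sum(phi_prev[j] * rho[j] for j in range(k - 1))
            phi_kk = num / den
        # Update phi_{k,j} = phi_{k-1,j} - phi_kk * phi_{k-1,k-j}, then append phi_kk.
        phi_prev = [
            phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)
        ] + [phi_kk]
        out.append(phi_kk)
    return out


# Made-up first-differenced errors (illustration only).
diff_errors = [0.4, -0.3, 0.1, 0.2, -0.5, 0.3, -0.1, 0.0, 0.2, -0.2]
r = autocorr(diff_errors, 2)
p = pacf(diff_errors, 3)
```

A partial autocorrelation at lag two or three that lies outside the usual approximate 95% band of ±1.96/√n (roughly ±0.07 for about 750 observations) would indicate that the corresponding lag adds explanatory power beyond the first lag.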

Model 2

As the second column of Table 5 shows, adding option market-related variables to the set of predictors increases the adjusted fraction of explained variance to 20 percent. The coefficient on the lagged difference in pricing error remains highly significant and negative. Additionally, the coefficient estimates for Model 2 show that two more predictors have a significant influence on the pricing error: the mean option price and the first difference of the total number of index options available for trade. The effect of the mean option price on our measure of dollar pricing error is best understood by considering our previous results. A look back at Table 3 shows that a high mean option price on a given Wednesday is probably associated with a tilt toward long-maturity or near-the-money options, which are the most expensive on average. Figure 7 also reveals that the level of pricing error for the ABS model increases as (1) relative moneyness approaches unity and as (2) time to maturity increases. Combining these two results, we provide the following explanation for the positive effect of the first predictor: a high average price is associated with a tilt of the cross-section of options toward more expensive options (long-maturity NTM options), which are also those that are priced with the lowest accuracy by the ABS model. Ultimately, this results in higher dollar pricing errors. The effect of the second significant predictor, the total number of options, is straightforward. To price options, the ABS model relies on the smoothing of the volatility surface implied by option prices. An increase in the number of options results in a more complex implied volatility surface that needs to be fitted by the ABS procedure with a fixed number of six parameters, or degrees of freedom. This translates into higher interpolation errors during the smoothing process, and ultimately into higher valuation errors when the smoothed implied volatility surface is plugged into the Black-Scholes valuation formulae. Note that, like our dependent variable and unlike the mean option price, the total number of options is first-differenced in the regression model, which means that an increase from the previous week in the number of options is associated with an increase in valuation errors.
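The smoothing step can be illustrated with a small least-squares fit of a six-parameter implied volatility surface. The functional form used here (a constant, moneyness, moneyness squared, maturity, maturity squared and their interaction) is an assumption made for this sketch: this section only states that six parameters are fitted. Moneyness is expressed as K/S - 1 to keep the regressors well scaled, and all function names and data are hypothetical.

```python
def design_row(moneyness, maturity):
    """Regressors of the assumed six-parameter quadratic surface."""
    return [1.0, moneyness, moneyness ** 2, maturity, maturity ** 2,
            moneyness * maturity]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_surface(moneyness, maturities, ivs):
    """OLS fit of implied volatility on the six regressors (normal equations)."""
    rows = [design_row(k, t) for k, t in zip(moneyness, maturities)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(6)] for i in range(6)]
    xty = [sum(r[i] * v for r, v in zip(rows, ivs)) for i in range(6)]
    return solve(xtx, xty)


# Synthetic cross-section generated from known (made-up) coefficients.
true_coefs = [0.20, -0.30, 1.50, 0.05, -0.02, 0.10]
grid = [(k, t) for k in (-0.10, -0.05, 0.0, 0.05, 0.10)
        for t in (0.1, 0.25, 0.5, 1.0)]
ivs = [sum(c * x for c, x in zip(true_coefs, design_row(k, t))) for k, t in grid]
coefs = fit_surface([k for k, _ in grid], [t for _, t in grid], ivs)
```

Because the number of coefficients is fixed at six regardless of how many options are in the cross-section, a larger and more irregular set of implied volatilities leaves larger residuals, which is the mechanism described in the paragraph above.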

Model 3

The inclusion of the rest of the predictors in Model 3 increases the adjusted R² by an additional 0.4 percentage points, and the effects observed in Models 1 and 2 remain significant. In addition to the lagged pricing errors, the total number of options and the mean

14 These results are available from the author upon request


average o
