Correlation Model Risk and Non-Gaussian Factor Models by ......Abstract Correlation Model Risk and Non-Gaussian Factor Models Julio Antonio Hernandez Bellon Doctor of Philosophy Department

Correlation Model Risk and Non-Gaussian Factor Models

by

Julio Antonio Hernandez Bellon

A thesis submitted in conformity with the requirementsfor the degree of Doctor of Philosophy

Department of MathematicsUniversity of Toronto

c© Copyright 2018 by Julio Antonio Hernandez Bellon

Abstract

Correlation Model Risk and Non-Gaussian Factor Models

Julio Antonio Hernandez BellonDoctor of Philosophy

Department of MathematicsUniversity of Toronto

2018

Two problems are considered in this thesis. The first is concerned with correlation model risk and the second with

non-Gaussian factor modelling of asset returns.

A fundamental problem in the application of mathematical finance results in a real world setting, is the de-

pendence of mathematical models on parameters that are hard to observe in markets. The common term for this

problem is model risk. The first part of this thesis studies the sensitivity of mathematical objects (prices) to cor-

relation inputs. In high dimensions, computational complexities increase faster than exponentially. A typical way

to deal with this problem is to introduce a principal component approach for dimension reduction. We consider

the price of portfolios of options and approximations obtained by modifying the eigenvalues of the covariance

matrix, then proceed to find analytical upper bounds on the magnitude of the difference between the price and the

approximation, under different assumptions.

In the second part of this thesis, the assumptions and estimation methods of four different factor models with

time-varying parameters are discussed. These models are based on Sharpe’s single index model. The first model

assumes that residuals follow a Gaussian white noise process, while the other three approaches combine the

structure of a single factor model with time-varying parameters, with dynamic volatility (GARCH) assumptions

on the model components. The four approaches then are used to estimate the time-varying alphas and betas of

three different hedge fund strategies. Results are compared.

ii

To my parents

iii

Acknowledgements

First and foremost, I would like to thank my advisor Prof. Luis A. Seco for his guidance, encouragement, andfinancial support during these years of graduate studies, but most importantly, for believing in me.

I would also like to thank the graduate students, faculty, and staff members of the Department of Mathematicsof University of Toronto.

Thanks to Prof. Matt Davison, Prof. Adrian Nachman, Prof. Robert L. Jerrard, Prof. Adam Stinchcombe,Prof. Catherine Sulem and Prof. Matilde Marcolli for their insightful comments and suggestions.

Thanks to my fellow Risklab students, especially to Chenyu Guo.

Thanks to Jonathan Mostovoy for proofreading my work.

I would also like to thank an amazing group of people that made this experience much more enjoyable: IvanSalgado, Jemima Merisca, Elizabeth Hill, Marcos Escobar, Selim Tawfik, Ling-Sang Tse, Francisco Guevara, mybrother Nicolas, my uncle Julio, and Michael Glas.

Finally, I would like to thank my parents Marivi and Nicolas, for their unconditional love and support.

iv

Table of Contents

Dedication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii

Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iv

Table of Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v

List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii

List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii

1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.1 Literature review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.2 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31.3 Future research . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

2 Estimation of correlation model risk . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52.2 Stochastic calculus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62.3 Black-Scholes theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

2.3.1 The Black-Scholes model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112.3.2 The Multivariate Black-Scholes model . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

2.4 Upper bounds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182.4.1 Non-degenerate Gaussian densities. Mean value theorem. . . . . . . . . . . . . . . . . . 192.4.2 L1 distance inequalities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 242.4.3 Uncorrelated underlying assets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 272.4.4 Correlated underlying assets (Two dimensional case) . . . . . . . . . . . . . . . . . . . . 36

2.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51

3 Gaussian versus non-Gaussian factor models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 533.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 533.2 Linear regression and estimation methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58

3.2.1 Least squares estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 583.2.2 Maximum likelihood estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62

3.3 Time series and stationary processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 663.4 Financial series and GARCH models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 693.5 Model approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76

3.5.1 Model 1 (Gaussian white noise residuals) . . . . . . . . . . . . . . . . . . . . . . . . . . 77

v

3.5.2 Model 2 (GARCH residuals) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 773.5.3 Model 3 (Unobserved GARCH) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 783.5.4 Model 4 (Weak GARCH residuals) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79

3.6 Applications to hedge fund return modelling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 833.7 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91

vi

List of Tables

3.1 Average alpha per period . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 863.2 Average beta per period . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86

vii

List of Figures

2.1 Magnitude of the difference between the price of a portfolio of 2 put options and the approxima-tion obtained after modifying the smallest eigenvalue without making it zero, and the proposedupper bound. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

2.2 Magnitude of the difference between the price of a digital option and the approximation obtainedby making the variance zero. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

2.3 Magnitude of the difference between the price of a call option and the approximation obtained bymaking the variance zero. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

2.4 Magnitude of the difference between the price of a portfolio of 2 digital options and the approxi-mation obtained by making the smallest eigenvalue zero. . . . . . . . . . . . . . . . . . . . . . . 52

2.5 Magnitude of the difference between the price of a portfolio of 2 call options and the approxima-tion obtained by making the smallest eigenvalue zero. . . . . . . . . . . . . . . . . . . . . . . . . 52

3.1 Time-varying beta of the equity hedge strategy for models 1, 2 and 4. . . . . . . . . . . . . . . . . 843.2 Time-varying alpha of the equity hedge strategy for models 1, 2 and 4. . . . . . . . . . . . . . . . 853.3 Time-varying beta from model 4 for each strategy. . . . . . . . . . . . . . . . . . . . . . . . . . . 873.4 Time-varying alpha from model 4 for each strategy. . . . . . . . . . . . . . . . . . . . . . . . . . 873.5 Time-varying beta of the event driven strategy for models 1, 2 and 4. . . . . . . . . . . . . . . . . 883.6 Time-varying alpha of the event driven strategy for models 1, 2 and 4. . . . . . . . . . . . . . . . 883.7 Time-varying beta of the merger arbitrage strategy for models 1, 2 and 4. . . . . . . . . . . . . . . 893.8 Time-varying alpha of the merger arbitrage strategy for models 1, 2 and 4. . . . . . . . . . . . . . 893.9 Time-varying beta of each strategy for model 3. . . . . . . . . . . . . . . . . . . . . . . . . . . . 903.10 Time-varying alpha of each strategy for model 3. . . . . . . . . . . . . . . . . . . . . . . . . . . 90

viii

Chapter 1

Introduction

1.1 Literature review

Finance is one of the fastest developing areas in modern banking and the corporate world. This growth bringsalong the emergence of sophisticated financial products and the resulting new mathematical models and methods.Several assets are traded daily on financial markets around the world: stocks and bonds are some of the mostcommon ones, however, more complex instruments with payoffs dependent on the price of other assets are nowcommonly traded too: forward contracts and options. In particular, European options give the holder of the con-tract the right, but not the obligation, to exercise the option at some future time specified in the contract. Differenttheories have been proposed through the years to model the dynamics of asset prices, particularly relevant isSamuelson, P. (1965) work. Black, F. and Scholes, M. (1973) derived an equation known as the Black-Scholesequation to price derivatives on a single asset within the Black-Scholes model framework, an adjusted Samuelsonmarket environment; that same year Merton, R. (1973) showed that a Black-Scholes type model can be derivedfrom weaker assumptions than in its original formulation. Cox, J., Ingersoll J. and Ross, S. (1985) extended theBlack-Scholes equation to the generalized Black-Scholes equation to price derivatives on multiple assets.

Financial market data often exhibit volatility clustering, where time series show periods of high volatility andperiods of low volatility. In fact, in economic and financial data, time varying volatility is more common thanconstant volatility, therefore accurate modeling of time varying volatility is of great importance. A groundbreakingpaper was proposed by Engel, R (1982) to model the heteroscedastic (time varying) conditional variance of a timeseries as a linear function of past squared observations. This formulation is called Autoregressive ConditionalHeteroscedastic (ARCH) model. Bollerslev, T (1986) proposed the generalized ARCH, or GARCH model, whichspecified the conditional variance to be a function of lagged squared observations and past conditional variance.Several extensions of the GARCH model have been proposed in the literature. The (ARCH-M) model is one of themost popular ones, in this model the conditional variance of the asset is included in the equation of the conditionalmean, see Engle, R., Lilien, D. and Robins, R. (1987). Empirical evidence of stock returns shows that a stockprice decrease tends the increase subsequent volatility by more than a stock price increase, this phenomenon iscalled leverage effect. Different models have been suggested to allow for asymmetric effects of positive andnegative shocks to volatility, some of the more popular ones are the GJR-GARCH model introduced by Glosten,L., Jagannathan, R. and Runkle, E. (1993) and the Exponential GARCH (EGARCH) introduced by Nelson, D.(1991). Drost, F. and Nijman, T. (1993) showed that temporal aggregation of GARCH processes is in general notGARCH but weak GARCH, a class of processes that they defined in their paper. Nijman, T. and Sentana, E. (1996)

1

CHAPTER 1. INTRODUCTION 2

showed that contemporaneous aggregation of independent univariate GARCH processes yields a weak GARCHprocess. Franq, C. and Zakoian, J. (2000) considered weak GARCH (or weak ARMA-GARCH) representationscharacterized by an ARMA structure of the squared error terms and proposed a two stage least squares method toestimate the model parameters.

A large part of modern investment theory is based in the idea that asset returns are driven by a small set ofbasic variables or factors. Factor models are widely used in risk management and portfolio optimization to predictreturns, generate estimates of abnormal return, and estimate the variability and co-variability of returns. The threemain types of multifactor models for asset returns are: macroeconomic factor models, fundamental factor models,and statistical factor models. The most widely used macroeconomic factor model is Sharpe’s single index model,which was proposed by Sharpe, W. (1964). A multifactor extension was later proposed by Chen, N.F., Roll, R,and Ross, S.A. (1986). The two main fundamental factor models were introduced by Fama E. and French, K.R.(Fama-French model) in 1992 and Rosenberg, B. (Barra model) in 1975 respectively. Roll, R. and Ross, S.A.(1980) provided an early empirical application of statistical factor models to equity return data modelling, lateron Connor, G. and Korajczyk, R. (1986), Stock, J.H. and Watson M.W. (2002) and Bai, J. (2003) extended thetheory of statistical factor models.

Hedge funds are one of the largest asset classes in the world. They seek to generate positive alpha for investorsthrough volatility and risk reduction. Hedge fund returns are labelled as absolute return investment vehiclesbecause of their ability to generate stable returns with low systematic risk, which has lead to a significant increaseof their popularity over the last 15 years. In 1992, Sharpe proposed asset class factor modeling for mutual fundsand showed that by using a limited number of asset classes, it was possible to explain the sources of performancefor US mutual funds, see Sharpe, W. (1992). Sharpe’s model is, however, less effective for hedge funds, whichemploy dynamic trading strategies. Fung, W. and Hsieh, D. (1997) employed a multi-factor approach, using threeequity classes, two bond classes, commodities and currencies to model hedge fund returns. Criticism that non-linearities were not being captured through simple linear regression models provided motivation for assessingnon-linear factors. Non-linear factors were implemented by Agarwal, V. and Naik, N. (2004) and Fung, W. andHsieh, D. (2004) in their multifactor models. Roncalli, T. and Weisang, G. (2008) pointed out that dynamic tradingof assets would in itself result in non-linear return profiles. Hence, instead of specifying option based factors, theyhighlighted the importance of capturing time-varying beta, because hedge fund strategies are dynamic, thereforetime invariant factor loadings are unrealistic and simplistic. Research in time varying beta lead to the next stepin hedge fund return modeling, which is replication of hedge fund returns via asset based style factor modeling.Hasanhodzic, J. and Lo, A. (2007), and Darolles, S. and Mero, G. (2007), estimated time varying betas throughrolling window regression, using 24-months and 36-months respectively. However, the length of the windowaffects significantly the estimation results, as longer windows tend to provide more statistical accuracy but areless able to capture recent exposures. The Kalman filter has been suggested to overcome issues with the rollingwindow approach. Takahashi, A. and Yamamoto, K. (2008) applied both methods: rolling window and Kalmanfilters to estimate time varying exposures. They noted that on average the Kalman filter captures exposure earlierthan rolling window. Similar results were obtained by Wei, W. (2010), who also compared factor based modellingvia rolling window against Kalman filters and showed that certain hedge fund strategies are more susceptible tobe cloned. Wei provided an excellent review of the evolution of hedge fund return modelling.


1.2 Contributions

Two problems are considered in this thesis. First, given a portfolio of single asset options with correlated under-lying assets we study how changes in the eigenvalues of the covariance matrix affect the price of the portfolio.Second, four different one factor rolling window regression approaches are discussed and used to model the re-turns of three different hedge fund strategies, the time varying alphas and betas of the returns are estimated andresults are compared.

One of the fundamental problems in the application of mathematical finance results to a real world setting isthe dependence of mathematical models on parameters that are hard to observe in markets. The common termfor this problem is model risk. Chapter 2 aims to provide some building blocks in the estimation of the sensi-tivities of mathematical objects (prices) to correlation inputs. We study the dependence of prices on correlationassumptions or correlation observation errors. In high dimensions, computational complexities increase fasterthan exponentially, a typical approach to deal with this problem is to introduce a principal component approachfor dimension reduction. This approach aims to replace the original covariance (or correlation) matrix by anotherone where the smaller eigenvalues have been set to 0, thereby reducing the effective dimension of the problem andachieving practical computational efficiency. In particular, we consider the price of portfolios of single asset op-tions, obtained under the assumptions of the Black-Scholes multidimensional model (constant and deterministicparameters). The eigenvalues of the covariance matrix are modified and analytical upper bounds of the magnitudeof the difference between the price and the approximation are obtained, for each of the following cases: when theoption payoffs are bounded and eigenvalues are modified without making them zero; when the assets are uncor-related and some eigenvalues are made zero, and the most interesting case, when the portfolio consists of two callor two digital options (both cases where considered) with correlated underlying assets and one of the eigenvaluesis set to 0. Monte Carlo simulations are used to plot the magnitude of the difference between the price and theapproximation.

In chapter 3, the assumptions and estimation methods of four different single factor models with time-varyingalpha and beta are discussed. All the considered models use a rolling window for the estimation of time-varyingparameters. Model 1 makes the traditional assumption that residuals follow a Gaussian white noise process.The other three models combine the structure of a 1-factor model with-time varying parameters, with dynamicvolatility assumptions on the model components. Model 2 considers a 1-factor model with GARCH(1,1) residuals,model 3 assumes that the centered regressor (market return) is GARCH(1,1) and the residuals are Gaussian whitenoise, and model 4 assumes that the residuals follow a weak GARCH(2,2) process. To the best of our knowledgethe last approach has not been used before to estimate time-varying parameters of factor models. Time-varyingalphas and betas of three different hedge fund strategies are then estimated using each model. Finally, results arecompared.

1.3 Future research

Dimension reduction methods play a key role in multi-asset derivatives pricing, therefore it is important to clearlyunderstand how accurate approximations produced by these methods are. In this thesis, analytical upper boundsof the magnitude of the difference between the price and an approximation obtained by making the smallesteigenvalue of the covariance matrix zero, were obtained for portfolios of two options. However, the more generalproblem, when a portfolio consists of an arbitrary number n > 2 of single asset options and k < n eigenvaluesare set to 0 still remains open.


In the second part of this thesis we considered four different approaches to estimate the time-varying param-eters of 1-factor models. It would be interesting however, to extend our analysis to multi-factor models withtime-varying parameters.

Chapter 2

Estimation of correlation model risk

2.1 Introduction

One of the fundamental problems in the application of mathematical finance results in a real world setting, isthe dependence of mathematical models on parameters that are hard to observe in markets. The common termfor this problem is model risk, and its study has become popular in recent years. Perhaps the best example of adisregard for model risk was the cause of the financial crisis of 2008: mathematical models that explained CDOprices, while correct from a mathematical perspective, made assumptions on correlations and were used in creditrating and portfolio valuation applications without giving serious thought to the dependence of conclusions oncorrelation estimations. Correlation matrices are notoriously hard to observe in practice, as historical estimatorsrequire extremely long time series when the dimension is high, and correlation numbers implied from market dataoffer a very reduced field of vision.

This chapter aims to provide some building blocks in the estimation of the sensitivities of prices to correlationinputs. Our motivation is, on the one hand, we are interested, from a risk management perspective, in understand-ing the dependence of prices on correlation assumptions or correlation observation errors, but also with an eye toassisting in the creation of new mathematical models in high dimensions. Indeed, in high dimensions, computa-tional complexities increase faster than exponentially, and a typical approach is to introduce a principal componentapproach for dimension reduction. This approach aims to replace the original covariance (or correlation) matrixby another one where the smallest eigenvalues have been set to 0, thereby reducing the effective dimension of theproblem and achieving practical computational efficiency. While the process of dimension reduction is common,we want to provide some element of error analysis in the always difficult setting of understanding sensitivities ofcalculated objects (prices, mainly) to correlation model inputs.

We consider portfolios of single asset options with correlated underlying assets. The multivariate Black-Scholes model framework can be used to model their prices, however as the number of options in the portfolioincreases finding the portfolio’s value becomes computationally very expensive and time consuming. In order tosolve this problem, dimension reduction methods are commonly used. We approximate the price of portfolios ofsingle asset options by modifying the eigenvalues of the covariance matrix, and obtain analytical upper bounds ofthe magnitude of the difference between the price and the approximation.

This chapter is organized as follows: section 2.2 provides basic definitions and important results of stochasticprocesses that will be used in the derivation of the Black-Scholes pricing formula in section 2.3. Upper boundson the change in price after eigenvalues of the covariance matrix are modified are obtained in section 2.4. The

5

CHAPTER 2. ESTIMATION OF CORRELATION MODEL RISK 6

conclusions of this chapter are in section 2.5.

2.2 Stochastic calculus

Definition 2.2.1. A probability space (Ω,F ,P) is said to be a complete probability space if for all B ∈ F withP(B) = 0 and all A ⊂ B, we have that P(A) = 0.

From now on we assume that all our probability spaces are complete.

Definition 2.2.2. Let (Ω,F ,P) be a probability space. A stochastic process X = (Xt)t∈[0,T ] is a parametrizedcollection of random variables defined on (Ω,F ,P) which take values in Rn.Note that for each t ∈ [0, T ] fixed we have a random vector

ω → Xt(ω).

On the other hand, fixing ω ∈ Ω we have a function

t→ Xt(ω),

which is called a path of Xt.

Definition 2.2.3. Let P and Q be two probability measures defined on the same space (Ω,F), if for any A ∈ F

P(A) > 0⇔ Q(A) > 0

we say that P and Q are equivalent.

Definition 2.2.4. Let (Ω,F ,P) be a probability space. A continuous stochastic process (Wt)t∈[0,T ] is said to beBrownian motion if

1. W0 = 0,

2. Wt −Ws is independent of Wt′ −Ws′ for 0 ≤ s′ < t′ ≤ s < t ≤ T,

3. Wt −Ws ∼ N(0, t− s) for 0 ≤ s < t ≤ T.

Definition 2.2.5. Let (Wt)t∈[0,T ] be n-dimensional Brownian motion. We define Ft = F (n)t to be the σ-algebra

generated by the random variables (Wi,s)1≤i≤n,0≤s≤t. in other words, Ft is the smallest σ-algebra containingall sets of the form ω : Wt1(ω) ∈ F1, · · · ,Wtk(ω) ∈ Fk , where tj ≤ t and Fj ⊂ Rn are Borel sets,j ≤ k = 1, 2, · · · (We assume that all sets of measure zero are included in Ft).

One can think of Ft as “ the history of Ws up to time t”. Note that Fs ⊂ Ft for s < t (i.e. (Ft)t∈[0,T ] isincreasing) and that Ft ∈ F .

Definition 2.2.6. Let (Nt)t∈[0,T ] be an increasing family of σ-algebras of subsets of Ω. A stochastic processgt(ω) : [0, T ]× Ω→ Rn is called Nt- adapted if for each t ∈ [0, T ] the function

ω → gt(ω)

is Nt- measurable.


Definition 2.2.7. A filtration (on (Ω,F)) is a family of σ- algebrasM = (Mt)t∈[0,T ] withMt ⊂ F for eacht ∈ [0, T ] and such that:

Ms ⊂Mt for 0 ≤ s < t.

Remark. Note that the family of σ-algebras (Ft)t∈[0,T ] from definition (2.2.5) is a filtration.

Definition 2.2.8. A filtered probability space is a probability space equipped with a filtration.

Definition 2.2.9. An n-dimensional stochastic processX = (Xt)t∈[0,T ] on a filtered probability space (Ω,F ,P,M)

is called a martingale with respect to the filtrationM = (Mt)t∈[0,T ] ( and with respect to P ) if

1. Xt isMt measurable for all t ∈ [0, T ],

2. E[|Xt|] <∞ for all t ∈ [0, T ],

3. E[Xs|Mt] = Xt for all 0 ≤ t < s ≤ T .

Proposition 1. A Brownian motion (Wt)t∈[0,T ] in Rn is a martingale with respect to the family of σ-algebras

(Ft)t∈[0,T ] generated by Ws; s ≤ t.

Proof. See Oksendal, B. (2003), p. 31.

Proposition 2 (Tower Law). Let (Ω,F ,P) be a probability space, G1 ⊆ G2 ⊆ F sub σ-algebras and Y a random

variable defined on such space. If E[|Y |] <∞ then

E[E[Y |G2]|G1] = E[Y |G1] P a.s.

Proof. See Rosenthal, J. (2006), p. 157.

Definition 2.2.10. Consider the filtered probability space (Ω,F ,P, (Ft)t∈[0,T ]) and let ft(ω) : [0, T ] × Ω → Rbe a process such that:

1. (t, ω)→ ft(ω) is B × F-measurable , where B denotes the Borel σ-algebra on [0, T ],

2. ft(ω) is Ft-adapted,

3. P(∫ T

0ft(ω)2dt <∞

)= 1.

The Ito integral is defined as

∫ T

0

ft(ω)dWt = limN→∞

N−1∑j=0

ftj (ω)(Wtj+1−Wtj )(ω)

where tj = j TN , (Wt)t∈[0,T ] is a one dimensional Brownian motion and the limit is in probability.

Definition 2.2.11. Let W = (W 1, · · · ,Wn) be an n-dimensional Brownian motion defined on the filtered prob-ability space (Ω,F ,P, (Ft)t∈[0,T ]) and σt = [σijt ] a matrix such that for all i = 1, · · · ,m and j = 1, · · · , n,σij(ω) is a process such that:

1. (t, ω)→ σijt (ω) is B × F-measurable , where B denotes the Borel σ-algebra on [0, T ],

2. σijt (ω) is Ft-adapted,


3. P(∫ T

0σijt (ω)2dt <∞

)= 1.∫ T

0σt(ω)dWt(ω) is a column vector whose i-th component is the following sum of 1-dimensional Ito integrals:

n∑j=1

∫ T

0

σijt (ω)dW jt (ω).

Definition 2.2.12. LetW be 1-dimensional Brownian motion on the filtered probability space (Ω,F ,P, (Ft)t∈[0,T ]).A 1-dimensional Ito process X on (Ω,F ,P, (Ft)t∈[0,T ]) is a process of the form

Xt = X0 +

∫ t

0

us(ω)ds+

∫ t

0

vs(ω)dWs(ω),

where both ut(ω) and vt(ω) are B × F-measurable and Ft-adapted, with

P

(∫ T

0

vt(ω)2dt <∞

)= 1,

and

P

(∫ T

0

|ut(ω)|dt <∞

)= 1.

Definition 2.2.13. LetW bem-dimensional Brownian motion on the filtered probability space (Ω,F ,P, (Ft)t∈[0,T ]).Let be µ = (µ1, · · · , µn) be an n- dimensional process and σ = [σij ] an n×m matrix of processes such that forall i = 1, · · · , n and j = 1, · · · ,m , we have that µit(ω) and σijt (ω) are B × F-measurable and Ft-adapted with

P

(∫ T

0

σijt (ω)2dt <∞

)= 1,

and

P

(∫ T

0

|µit(ω)|dt <∞

)= 1.

An n-dimensional Ito process X on (Ω,F ,P, (Ft)t∈[0,T ]) is a process of the form

Xt = X0 +

∫ t

0

µs(ω)ds+

∫ t

0

σs(ω)dWs(ω).

An Ito process can also be written in the following (differential) form

dXt = µdt+ σdWt.

Theorem 3 (n-dimensional Martingale Representation Theorem). Let W be n-dimensional Brownian motion on

the probability space (Ω,F ,P) and Ft be the filtration generated by W . Suppose that M is an n-dimensional

P-martingale process, M = (M1, · · · ,Mn) which has volatility matrix [σij ], in that dM jt =

∑ni=1 σ

ijt dW

it , and

the matrix satisfies the additional condition that (with probability one) it is always non-singular. Then if N is any

1-dimensional P-martingale, then there exists an n-dimensional Ft-adapted process φ = (φ1, · · · , φn) such that


the martingale N can be written as

Nt = N0 +

n∑j=1

∫ t

0

φjsdMjs .

Proof. See Oksedal, B. (2003), pp. 51-54.

Theorem 4 (Ito formula). Let X be a 1-dimensional Ito process given by

dXt = udt+ vdWt.

Let f(t, x) be in C2([0,∞)×R) and define the process Z by Zt = f(t,Xt) . Then Z is again an Ito process and

dZt =∂f

∂t(t,Xt)dt+

∂f

∂x(t,Xt)dXt +

1

2

∂2f

∂x2(t,Xt)(dXt)

2,

where (dXt)2 is computed according to the following rules: (dt)2 = 0, dtdWt = dWtdt = 0 and (dWt)

2 = dt.

Proof. See Oksendal, B. (2003), pp. 46-48.

Definition 2.2.14. Let W be a vector of d independent Brownian motion and δ = [δij ] be a deterministic andconstant matrix with unit length rows i.e. ||δi|| = 1 for i = 1, · · · , n, then the vector process W = δW is a vectorof correlated Brownian motion with correlation matrix ρ = δδ′.

Theorem 5 (General Ito formula). Let X be an n-dimensional Ito process given by

dXt = µdt+ σdWt

where W = (W 1, · · · ,Wm) is a correlated vector of Brownian motion and let f(t,Xt) be a C2 map from

[0,∞)× Rn into R, the stochastic differential of the process Z, where Zt = f(t,Xt), is given by

dZt =∂f

∂t(t,Xt)dt+

n∑i=1

∂f

∂xi(t,Xt)dX

it +

1

2

n∑i,j=1

∂2f

∂xi∂xj(t,Xt)dX

itdX

jt

with the formal multiplication table: (dt)2 = 0, dtdW it = 0 and dW i

t dWjt = ρijdt for all i, j = 1, · · · ,m.

In particular, if m = n and dX has the structure

dXit = µidt+ σidW i

t , for i = 1, · · · , n.

where µi and σi are scalar processes, then the stochastic differential of the process Z is given by

dZt =

∂f∂t

(t,Xt) +

n∑i=1

µi∂f

∂xi(t,Xt) +

1

2

n∑i,j=1

σiσjρij∂2f

∂xi∂xj(t,Xt)

dt+

n∑i=1

σi∂f

∂xi(t,Xt)dW

it

Proof. See Karatzas, I. and Shevre, S. (1991), pp. 150-153.

Definition 2.2.15. Let W be an m-dimensional Brownian motion vector, T > 0, µ(·, ·) : [0, T ] × Rn → Rn,σ(·, ·) : [0, T ]× Rn → Rn×m be measurable functions. An equation of the form

dXt = µ(t,Xt)dt+ σ(t,Xt)dWt


is called a stochastic differential equation (SDE).

The following theorem gives conditions for a solution to the initial value problem

dXt = µ(t,Xt)dt+ σ(t,Xt)dWt, t ∈ [0, T ],

X0 = x0.(2.1)

to exist and be unique.

Theorem 6 (Existence and uniqueness theorem for stochastic differential equations). Suppose that the functions

µ and σ satisfy that

|µ(t, x)|+ |σ(t, x)| ≤ C(1 + |x|); x ∈ Rn, t ∈ [0, T ]

for some constant C > 0, ( where |σ|2 =∑|σij |2 ) and

|µ(t, x)− µ(t, y)|+ |σ(t, x)− σ(t, y)| ≤ D|x− y|; x, y ∈ Rn, t ∈ [0, T ]

for some constant D, then there exists a unique solution to the SDE in (2.1) that is Ft-adapted and such that

E[|Xt|2] ≤ KeKt(1 + |x0|2); t ∈ [0, T ],

for some constant K.


Theorem 7 (Girsanov Theorem). Let W = (W 1, · · · ,Wn) be an n-dimensional P-Brownian motion defined

on the filtered probability space (Ω,F ,P, (Ft)t∈[0,T ]). Suppose that θ = (θ1, · · · , θn) is an Ft-adapted vector

process which satisfies the Novikov condition: E[exp

(12

∫ T0θ2t ds)]

< ∞, and we set W it = W i

t +∫ t

0θisds for

t ∈ [0, T ]. Then there exists a measure Q, equivalent to P up to time T , such that W = (W 1, · · · , Wn) is an

n-dimensional Q-Brownian motion up to time T . The Radon-Nikodym derivative of Q by P is

dQdP

= exp

(−

n∑i=1

∫ T

0

θitdWit −

1

2

∫ T

0

|θt|2dt

)

Proof. See Oksendal, B. (2003), p. 156.

2.3 Black-Scholes theory

Definition 2.3.1 (Geometric Brownian motion). A process X with stochastic differential equation

dXt = µXtdt+ σXtdWt

is called geometric Brownian motion.If X is a 1-dimensional geometric Brownian motion with SDE

dXt = uXtdt+ vXtdWt,

and initial conditionX0 = x,


thenXt = x exp

((u− 1

2v2)t+ vWt

).

This can be verified using Ito formula.We denote the price process of a risk free asset as B, with dynamics:

dBt = rBtdt,

B0 = 1.

where r is a constant.

2.3.1 The Black-Scholes model

Definition 2.3.2 (Black Scholes model). The Black-Scholes model assumes a market consisting of two assets withdynamics given by

dBt = rBtdt,

dSt = µStdt+ σStdWt.

where r, µ and σ are deterministic constants.

Definition 2.3.3. A stochastic variable X is called a contingent claim if its value is determined by a stochasticprocess (St)t∈[0,T ]. In particular, if X depends only on ST (the value of the stochastic process at time T ) then Xis called a simple claim. If (St)t∈[0,T ] is a price process then X is a financial derivative.

Examples of simple contingent claims

• European call option: At time T , the holder of the option has the right, but not the obligation, to buy anasset (with price process (St)t∈[0,T ]) at a predetermined strike price K. This is a contingent claim that pays

(ST −K)+ = max(ST −K, 0).

• European put option: At time T , the holder of the option has the right, but not the obligation, to sell an asset(with price process (St)t∈[0,T ]) at a predetermined strike price K. This is a contingent claim that pays

(K − ST )+ = max(K − ST , 0).

• European digital (call) option: At time T , the holder of the option receives a fix amount D predeterminedat the beginning of the contract if the price of the asset is greater than a predetermined strike price or zerootherwise. This contingent claim pays D, if ST −K > 0,

0, otherwise.

Definition 2.3.4. A portfolio ϕ = (ϕ1, · · · , ϕn) is a vector process. Each of its components describes the amountof a market security that we hold at time t.

Consider a portfolio (φ, ψ) where φt and ψt describe the number of units of one random security and thenumber of units of the bond that we hold at time t respectively. The processes can take positive or negative values


(we allow unlimited short-selling of the stock or bond). The security component of the portfolio φ should beFt-adapted.A portfolio is self financing if and only if the change in its value only depends on the change of the asset prices.Let S and B denote the stock price and the bond price respectively, the value of a portfolio (φt, ψt) at time t isgiven by Vt = φtSt + ψtBt. At the next time instant, two things happen: the old portfolio changes value becauseSt and Bt have changed price; and the old portfolio has to be adjusted to give a new portfolio as instructed bythe trading strategy (φ, ψ). If the cost of the adjustment is perfectly matched by the profits or losses made by theportfolio then no extra money is required from outside, the portfolio is self financing.In discrete time we get a difference equation.

∆Vti = φti∆Sti + ψti∆Bti

In continuous time, we get a stochastic differential equation.

Definition 2.3.5 (Self financing property). If (φ, ψ) is a portfolio with stock price S and bond price B, then

(φ, ψ) is self financing ⇐⇒ dVt = φtdSt + ψtdBt

Definition 2.3.6. A market is said to be arbitrage free if it is not possible to purchase a portfolio for which thereis a risk-less profit to be earned with positive probability.

Definition 2.3.7 ( Replicating strategy). Suppose that we are in a market of a riskless bond B, a risky security S,and a claim X on events up to time T . A replicating strategy for X is a self financing portfolio (φ, ψ) such thatVT = φTST + ψTBT = X .

Replicating strategies are important because if we are able to find a replicating strategy for the claim X , thenthe price of X at time t must be Vt = φtSt + ψtBt otherwise arbitrage opportunities will arise.To obtain the replicating strategy for a given claim X , the first step is to use Theorem 7 (Girsanov Theorem)(take θ = µ−r

σ ) to find a measure Q equivalent to P under which the discounted stock process Zt = B−1t St

is a martingale. (Note that because µ, σ and r are deterministic constants the Novikov condition is safisfied).Second, form the process Et = EQ

[B−1T X

∣∣Ft], this process is a martingale with respect to Q, note that fromProposition 2 (Tower Law) EQ

[EQ[X|Ft]

∣∣Fs] = EQ[X|Fs], for s ≤ t . Therefore, using Theorem 3 (MartingaleRepresentation Theorem), it is possible to find an Ft-adapted process φ such that dEt = φdZt.Consider the following replication strategy:

• hold φt units of stock at time t

• hold ψt = Et − φtSt units of the bond at time t.

The portfolio (φ, ψ) satisfies the self financing condition

dVt = φtdSt + ψtdBt

Note that Vt = BtEt, in particular the terminal value of the strategy is VT = ET = X , therefore there is anarbitrage price for X at all times.


Theorem 8 (Risk Neutral Valuation Formula). Suppose that we have a Black-Scholes model for a continuously

tradable stock and bond. Assume that the market is complete and arbitrage free. Let r be the (constant and

deterministic) risk free interest rate and X be a claim, knowable by some time horizon T . The arbitrage price of

such claim X is given by

Vt = BtEQ[B−1T X

∣∣Ft] = e−r(T−t)EQ[X∣∣Ft]

where Q is the martingale measure for the discounted stock B−1t St.

Proof. See Baxter, M. and Rennie, A. (1996), pp. 87-90.

Let X be a simple claim i.e. X = P (ST ) for some payoff function P , from Theorem 7 (Girsanov Theorem)we obtain that WQ

t = µ−rσ t+Wt, therefore the SDE of the process St written in terms of Q is

dSu = rSudu+ σSudWQu , 0 ≤ u < T,

St = s.

ThereforeST = s exp

((r − 1

2σ2

)(T − t) + σ(WT −Wt)

).

Let Y = ln(ST )s , we have that Y =

(r − 1

2σ2)

(T − t) + σ(WT −Wt), thus Y is normally distributed with mean(r − 1

2σ2)

(T − t) and standard deviation σ√T − t.

However ST = seY , therefore, we obtain the pricing formula

V (s, t) = e−r(T−t)∫RP (sey)pm,σ(y)dy,

where pm,σ is the density function of a normally distributed random variable with mean m =(r − 1

2σ2)

(T − t)and standard deviation σ = σ

√T − t.

The previous result can be derived using another approach. Let V (St, t) denote the price of a simple claimX = P (ST ) at time t, we assume as before that the market has a risky asset and risk free bond with dynamics:

dBt = rBtdt,

dSt = µStdt+ σStdWt.

where r, µ and σ are deterministic constants.Using Theorem 4 (Ito Formula) we obtain an equation for the infinitesimal change in the value of the claim:

dVt = d(V (St, t)) =∂V

∂tdt+

∂V

∂sdSt +

1

2σ2S2

t

∂2V

∂s2dt

Consider a portfolio consisting of 1 unit of the option and −∆ units of the stock; at time t the value of thisportfolio is

Πt = V (St, t)−∆St

After an infinitesimal time increment of dt, the change in portfolio value is given by

dΠt =∂V

∂tdt+

∂V

∂sdSt +

1

2σ2S2

t

∂2V

∂s2dt−∆dSt.


Note that the right hand side of this equation has a deterministic and a random component. Therefore, choosing∆ = ∂V

∂s the randomness is reduced to zero. This choice of ∆ results in a risk free increment of the portfolio

dΠt =

(∂V

∂t+

1

2σ2S2

t

∂2V

∂s2

)dt.

As a consequence, we have that dΠt = rΠdt = r(V − ∂V

∂S St)dt, otherwise arbitrage opportunities will arise.

Combining the two equations we obtain the Black-Scholes equation:

∂V

∂t+ rs

∂V

∂s+

1

2σ2s2 ∂

2V

∂s2− rV = 0.

Theorem 9. The solution to the Black-Scholes equation:

∂V∂t + rs∂V∂s + 1

2σ2s2 ∂2V

∂s2 − rV = 0 for (t, s) ∈ [0, T )× R+

with final condition V (T, s) = P (s) where P ∈ f ∈ L1loc : f = O(|s|−αe|s|2), α > 1 is

V (s, t) =e−r(T−t)

σ√

2π(T − t)

∫RP (sex) exp−

(x−

(r − 1

2σ2)

(T − t))2

2σ2(T − t)dx.

Proof. See Esposito, F. (2010), p.13.

Note that the price of a simple claim can be obtained either as a solution of a PDE or as a expected value, thisis always the case when the parameters r, µ and σ are deterministic constants. The following theorem connectsboth results.

Theorem 10 (Feyman-Kac formula). Suppose that X solves the stochastic differential equation:

dXs = f(Xs, s)ds+ g(Xs, s)dWs,

whereW is a 1-dimensional Brownian motion. Let b be a lower bounded continuous function defined on [0, T ]×R,

then the discounted expected payoff

u(x, t) = EXt=x[e−∫ Ttb(Xs,s)dsP (XT )

]solves the PDE

ut + f(x, t)ux + 12g

2(x, t)uxx − b(x, t)u = 0, for t < T , with u(x, T ) = P (x).


2.3.2 The Multivariate Black-Scholes model

Let’s consider a market with n risky assets and a risk free bond. This model allows for correlation between theassets.

Definition 2.3.8. The Multivariate Black-Scholes model assumes a market of n + 1 assets with dynamics given


bydBt = rBtdt,

dSit = µiSitdt+ σiSitdWit for i = 1, · · · , n.

where W = (W 1, · · · , Wn) is vector of correlated Brownian motion.

From Definition 2.2.14 we have that W = δW where W is a vector of n (independent) Brownian motion andρ = δδ′ is the correlation matrix. Therefore the market dynamics can be rewritten as

dBt = rBtdt,

dSit = Sit(µidt+

∑nj=1 σ

ijdW jt ) for i = 1, · · · , n.

where [σij ] = Mδ with

M =

σ1 0 · · · 0

0 σ2 · · · 0...

.... . .

...0 0 · · · σn

Definition 2.3.9 (Self financing property). Let (φ1, · · · , φn, ψ) be a portfolio with risky securities Si for i =

1, · · · , n and risk free bond price B, then

(φ1, · · · , φn, ψ) is self financing ⇐⇒ dVt =∑ni=1 φ

itdS

it + ψtdBt,

where Vt denotes the value of the portfolio at time t.

Definition 2.3.10. A stochastic variable X is called a contingent claim if its value is determined by a stochasticvector process S = (St)t∈[0,T ] i.e., for t ∈ [0, T ], St = (S1

t , · · · , Snt ) where Si is a stochastic process fori = 1, · · · , n. In particular, if X depends only on ST then X is called a simple claim.

Definition 2.3.11 (Replicating strategy). Suppose that we have a market of n risky securities (S1, · · · , Sn), ariskless bond B, and a claim X on events up to time T . A replicating strategy for X is a self financing portfolio(φ1, · · · , φn, ψ) such that VT =

∑ni=1 φ

iTST + ψTBT = X .

Examples of Simple Claims:

• European Basket call option with weights (wi)i=1,··· ,n and strike price K: At time T the holder of theoption receives the amount (

n∑i=1

wiSiT −K

)+

= max

(n∑i=1

wiSiT −K, 0

),

where∑ni=1 wi = 1 and wi > 0 for all i = 1, · · · , n.

• Product option with strike price K: At time T the holder of the option receives the amount(2∏i=1

SiT −K

)+

= max

(2∏i=1

SiT −K, 0

).


• Portfolio of n European call options (with the same maturity): At time T the holder of the option receives

n∑i=1

(SiT −Ki)+ =

n∑i=1

max(SiT −Ki, 0).

To derive the pricing formula for a simple claimX , we proceed as before. First, assume that Σ = [σij ] is invertibleand let 1 be the constant vector (1, · · · , 1). Note that Σθ = µ − r1 has a unique solution: θ = Σ−1(µ − r1).Let Z = (Z1, · · · , Zn), using Theorem 7 (Girsanov Theorem) we obtain a measure Q such that Zit = B−1

t Sit fori = 1, · · · , n is a Q-martingale. LetX be a claim maturing at time T and define the processEt = EQ(B−1

T X|Ft).This process is a Q-martingale, therefore using Theorem 3 (Martingale Representation Theorem (n-dimensionalversion)), if the matrix Σ is invertible, then there exists a vector process φ with φt = (φ1

t , · · · , φnt ) such that

Et = E0 +

n∑i=1

∫ t

0

φisdZis.

The hedging strategy (φ1t , · · · , φnt , ψt) where φit is the holding of asset i and ψt is the holding of the risk free

bond at time t, with ψt = Et −∑ni=1 φ

itZ

it guarantees that the portfolio (φ1, · · · , φn, ψ) is self financing:

dVt =

n∑i=1

φitdSit + ψtdBt,

also Vt = BtEt and the claim X is attainable by this portfolio. Therefore, we have the following result for amarket with n correlated risky assets and a risk free bond.

Theorem 11 (Multivariate Risk Neutral Valuation Formula). Suppose that we have a market with n correlated

risky assets and a risk free bond, with dynamics given by the Multivariate Black-Scholes model. Assume that the

market is complete and arbitrage free. Let r be the (constant and deterministic) risk free interest rate and X be a

claim, knowable by some time horizon T . The arbitrage price of such claim X is given by

Vt = BtEQ[B−1T X

∣∣Ft] = e−r(T−t)EQ[X∣∣Ft] ,

where Q is the martingale measure for the discounted stock vector B−1t St.

Proof. See Baxter, M. and Rennie, A. (1996), pp. 186-188.

A Black-Scholes pricing equation can be derived in the multidimensional setting too. Set up a portfolio of 1unit of the financial derivative that we want to price and -∆i units of each asset Si. At time t the value of thisportfolio is

Πt = V (S1t , · · · , Snt , t)−

n∑i=1

∆iSit .

Therefore, using Theorem 5 (General Ito Formula), we obtain that the change in value of this portfolio is given by

dΠ =

∂V∂t

+1

2

n∑i,j=1

σiσjρijSitS

jt

∂2V

∂si∂sj

dt+

n∑i=1

(∂V

∂si−∆i

)dSit .

Choosing ∆i = ∂V∂si

we obtain a risk free portfolio. In order to avoid arbitrage opportunities dΠt = rΠdt, which


leads to the following equation:

∂V

∂t+

1

2

n∑i,j=1

σiσjρijsisj∂2V

∂si∂sj+ r

n∑i=1

si∂V

∂si− rV = 0.

This is the multidimensional version of the Black-Scholes equation.

Theorem 12. The solution to the Multidimensional Black-Scholes equation

∂V

∂t+

1

2

n∑i,j=1

σiσjρijsisj∂2V

∂si∂sj+ r

n∑i=1

si∂V

∂si− rV = 0

with final condition V (T, s1, · · · , sn) = P (s1, · · · , sn) where P ∈ f ∈ L1loc : f = O(|s|−αe|s|2), α > n is

V (t, s) =e−r(T−t)

(2π)n2 |det(A)| 12

∫RP (s1e

x1 , · · · , snexn) exp

(−1

2(x−m)′A−1(x−m)

)dx

where

• A = (T − t)M [ρij ]M with (correlation matrix) [ρij ] = δδ′ and M a diagonal matrix with the σi’s in the

main diagonal.

• m = (m1, · · · ,mn) with mi =(r − σ2

i

2

)(T − t).

Proof. See Esposito, F. (2010), p.13.

Theorem 13 (General Feyman Kac formula). Suppose that X solves the vector-valued stochastic equation:

dXis = fi(Xs, s)ds+

∑j

gij(Xs, s)dWjs ,

where each component of W is an independent Brownian motion. Let b be a lower bounded continuous function

defined on [0, T ]× Rn, then the discounted expected payoff

u(x, t) = EXt=x[e−∫ Ttb(Xs,s)dsP (XT )

]solves the PDE

ut + Lu− bu = 0, for t < T , with u(x, T ) = P (x).

where L is the differential operator

Lu =∑i

fi∂u

∂xi+

1

2

∑i,j,k

gikgjk∂2u

∂xi∂xj.



2.4 Upper bounds

We want to obtain upper bounds on the magnitude of the price change of portfolios of single asset options aftereigenvalues of the covariance matrix are modified i.e. we want to find upper bounds on

|uA(st, t)− uA(st, t)|,

where uA denotes the value of the portfolio and A is the covariance matrix of the underlying assets and A isobtained by modifying the eigenvalues of A (the eigenvectors are kept constant). Our first approach to this prob-lem was to consider the value of a multi-asset option as a function of the vector of eigenvalues of the covariancematrix. Note that a portfolio of several 1-asset options can be considered as a multi-asset option with payoff givenby the sum of the payoffs of the 1-asset options, assuming that all the options in the portfolio have the same expirydate. We show the differentiability of the price of a multi-asset option as a function of the vector of eigenvalues,and use partial derivatives to obtain an upper bound when the covariance matrix is positive definite and eigenval-ues are modified without making them zero. In a second approach to this problem, an upper bound is obtainedwhen the eigenvalues are modified without making them zero, under the assumption that the covariance matrixis positive definite and the payoff is bounded, using an estimate of the L1 distance between two non-degenerateGaussian densities. Note that portfolios of digital or put options have bounded payoffs. Finally we consider thecase when some eigenvalues are made zero. We start with the simplest case: portfolios of n single asset optionswith uncorrelated underlying assets. Subsequently, portfolios of two single asset options with correlated underly-ing assets are considered and upper bounds are obtained when one of the eigenvalues is made zero.Before we continue it is necessary to introduce some notation that will be used throughout this section.

Notation:

• pA,m : Rn → R denotes a Gaussian density with mean vector m and covariance matrix A.

• Dλ is a diagonal matrix whose diagonal entries are the components of the vector λ.

• A′ is the transpose of matrix A

• t denotes time.

• T denotes maturity of the contract.

• I(·) is an indicator function

• δ(·) denotes the Dirac delta function.

• Φ denotes the standard normal cumulative distribution function:

Φ(x) =1√2π

∫ x

−∞e−

z2

2 dz

for all x ∈ R.

• The Q-function is the tail probability of a standard normal distribution:

Q(x) = 1√2π

∫∞xe−

z2

2 dz

= 1− Φ(x)


for all x ∈ R.

2.4.1 Non-degenerate Gaussian densities. Mean value theorem.

Theorem 14 (Mean Value). Let F : [c, d] ⊂ R→ R be continuous function on the interval [c, d] and differentiable

on (c, d), then there exists ξ ∈ (c, d) such that:

F (d)− F (c) = F ′(ξ)(d− c).

Proof. See Spivak, M. (1994), pp. 191.

Theorem 15 (Dominated Convergence). Let (fn)n be a sequence in L1(Rn) such that (a) fn → f a.e., and

(b) there exists a nonnegative function g ∈ L1(Rn) such that |fn| ≤ g a.e. for all n . Then f ∈ L1(Rn) and∫f = limn→∞

∫fn.

Proof. See Folland, G. (1999), p. 55.

Theorem 16 (Multidimensional Mean Value). Let U ⊂ Rn be an open convex set and F : U ⊂ Rn → R be

differentiable at each x ∈ U . For all a, b ∈ U there exists θ ∈ [a, b] such that

F (b)− F (a) =

n∑j=1

∂F

∂xj(θ)(bj − aj),

where [a, b] is the segment connecting a and b.

Proof. See Hubbard, B. and Hubbard, J. (2009), pp. 148-149.

Theorem 17 (Tonelli-Fubini). Suppose that (X,M, µ) and (Y,N , ν) are σ-finite measure spaces and letL+(X×Y ) be the space of all measurable functions from (X × Y ) to [0,∞].

(Tonelli) If f ∈ L+(X × Y ), then the functions g(x) =∫fxdν and h(y) =

∫fydµ are in L+(X) and L+(Y ),

respectively, and ∫fd(µ× ν) =

∫ [∫f(x, y)dν(y)

]dµ(x)

=∫ [∫

f(x, y)dµ(x)]dν(y).

(2.2)

(Fubini) If f ∈ L1(µ×ν), then fx ∈ L1(ν) for a.e. x ∈ X , fy ∈ L1(µ) for a.e. y ∈ Y , the a.e.-defined functions

g(x) =∫fxdν and h(y) =

∫fydµ are in L1(µ) and L1(ν) respectively, and 2.2 holds.

Proof. See Folland, G. (1999), p. 67.

Let A be a symmetric positive definite matrix, then A can be factored as

A = QDaQ′,

where Q is an orthogonal matrix, Da is a diagonal matrix with diagonal entries aj > 0 and a is a vector thatcontains the (positive) eigenvalues of A.

DefineB = QDbQ

′,


where Q is the orthogonal matrix previously obtained from the eigenvalue decomposition of A, b is a vector, andDb is a diagonal matrix with diagonal entries bj > 0.

Let uA(st, t) denote the price at time t of an option with payoff P (sT ) and covariance matrix A, then

uA(st, t) = e−r(T−t)

(2π)n2 | det(A)|

12

∫Rn P (ste

x) exp(− 12 (x−m)′A−1(x−m))dx

wherem = (m1, · · · ,mn)

mj = r − σ2j

2 (T − t)A = (T − t)DσCDσ

stex = (s1

t ex1 , · · · , snt exn)

Dσ is a matrix with σj along the diagonal and zeros everywhere else, C is a correlation matrix , C = (ρij)ij andr is the risk free interest rate.

Making the substitution x = Qy we obtain that


(2π)n2 (∏ni=1 aj)

12

∫Rn P (ste

Qy) exp(− 1

2 (y − m)′D−1a (y − m)

)dy

=∫Rn e

−r(T−t)P (steQy)pDa,m(y)dy

with m = Q′m.

We consider uA as a function of the vector of eigenvalues from the covariance matrix (keeping the rest of theparameters fixed including the mean vector m and the rotation matrix Q) i.e.

u(λ) =

∫Rne−r(T−t)P (ste

Qy)pDλ,m(y)dy.

Let a = (a1, · · · , an) ∈ Rn with aj > 0 for all j = 1, · · · , n, we show that there exists R > 0 such that ∂u∂λj

exists and it is continuous on the ball B(a,R) for all j = 1, · · · , nTherefore, as a consequence of Theorem 16 (Multivariate Mean Value) we obtain that

|uA(st, t)− uB(st, t)| = |u(a)− u(b)|≤ C

∑nj=1 |aj − bj |

withC = maxj max

λ∈B(a,R)

∣∣∣ ∂u∂λj (λ)∣∣∣

B = QDbQ′

for all b ∈ B(a,R).More formally, we show that

Proposition 18. Let uA(st, t) be the price at time t of an option with payoff function P (sT ) and positive definite

covariance matrix A. For any of the following payoff functions

P (sT ) =

∑ni=1 max(Ki − siT , 0), portfolio of put options.∑ni=1DiI(siT −Ki > 0), portfolio of digital options.


there exist R,C > 0 such that for all b ∈ B(a,R),

|uA(st, t)− uB(st, t)| ≤ Cn∑j=1

|aj − bj |,

where a = (a1, · · · , an) is the vector of eigenvalues of A and B is obtained from the eigenvalue decomposition

of A (A = QDaQ′), by modifying (only) the vector of eigenvalues i.e. B = QDbQ

′, with b = (b1, · · · , bn).

Proof. Let a = (a1, · · · , an) ∈ Rn with aj > 0 for all j = 1, · · · , n. We show that:

1. ∂u∂λj

(λ) exists at λ = a for all j = 1, · · · , n.

2. There exists R > 0 such that ∂u∂λj

is continuous on B(a,R) for all j = 1, · · · , n.

Let (a(k)j ) ⊂ R be a sequence that converges aj and define

a(k) = (a1, · · · , aj−1, a(k)j , aj+1, · · · , an), k ≥ 1

g(y, λ) = e−r(T−t)P (steQy)pDλ,m(y)

hk(y) = g(y,a(k))−g(y,a)

a(k)j −aj

.

We use Theorem 15 (Dominated Convergence) to show that

limk→∞∫Rn hk(y)dy =

∫Rn limk→∞ hk(y)dy

=∫Rn

∂∂λj

g(y, a)dy <∞.

However, by definition:∂u∂λj

(a) = limk→∞u(a(k))−u(a)

a(k)j −aj

= limk→∞∫Rn hk(y)dy.

Therefore,∂u∂λj

(a) =∫Rn

∂∂λj

g(y, a)dy <∞.

As (a(k)j ) is convergent there exist L,U > 0 and k0 ∈ N such that L < a

(k)j < U for all k ≥ k0.

Letaψ = (a1, · · · , aj−1, ψ, aj+1, · · · , an)

aL = (a1, · · · , aj−1, L, aj+1, · · · , an)

aU = (a1, · · · , aj−1, U, aj+1, · · · , an)

pψ,mj (yj) = 1√2πψ

e−12

(yj−mj)2

ψ

pL,mj (yj) = 1√2πL

e−12

(yj−mj)2

L

pU,mj (yj) = 1√2πU

e−12

(yj−mj)2

U

Note that,∂g(y, λ)

∂λj=

1

2

((yj − mj)

2

λj2 − 1

λj

)g(y, λ)


is defined and continuous for all λ ∈ Rn such that λj > 0 for all j = 1, · · · , n. Therefore by Theorem 14 (MeanValue) for all k ≥ k0,

|hk(y)| ≤ supψ∈[L,U ]

∣∣∣∣∂g(y, aψ)

∂λj

∣∣∣∣ .We want to find h ∈ L1(Rn) such that for all y ∈ Rn

supψ∈[L,U ]

∣∣∣∣∂g(y, aψ)

∂λj

∣∣∣∣ ≤ h(y).

For all y ∈ Rn and ψ ∈ [L,U ]∣∣∣∂g(y,aψ)∂λj

∣∣∣ =∣∣∣ 12e−r(T−t)P (ste

Qy)(

(yj−mj)2ψ2 − 1

ψ

)pDaψ ,m

∣∣∣≤

∣∣∣ 12e−r(T−t)P (steQy)

((yj−mj)2

L2 + 1L

)pDaψ ,m

∣∣∣ .Also, there exists r > 0 such that for all ψ ∈ [L,U ] and yj > r

pψ(yj) ≤ pU (yj).

Therefore, ∣∣∣∂g(y,aψ)∂λj

∣∣∣ ≤ ∣∣∣ 12e−r(T−t)P (steQy)

((yj−mj)2

L2 + 1L

)pDaU ,m

∣∣∣for all ψ ∈ [L,U ] and y /∈ B(m, r) ⊂ Rn, where B(m, r) is a closed ball centered at m with radius r.Hence, defining

h(y) =

M, if y ∈ B(m, r),12e−r(T−t)P (ste

Qy)(

(yj−mj)2L2 − 1

L

)pDaU (y), if y /∈ B(m, r),

with

M = maxψ∈[L,U ]

maxy∈ B(m,r)

∣∣∣∣12e−r(T−t)P (steQy)

((yj − mj)

2

ψ2− 1

ψ

)pDaψ (y)

∣∣∣∣ ,we obtain that

|hk(y)| ≤∣∣∣∣∂g(y, aψ)

∂λj

∣∣∣∣ ≤ h(y),

with h ∈ L1(Rn).From Theorem 15 (Dominated Convergence), we conclude that

∂u∂λj

(a) = limk→∞∫Rn hk(y)dy

=∫Rn limk→∞ hk(y)dy

=∫Rn

∂g(y,a)∂λj

dy.

Note that what we just showed is that for any λ ∈ Rn such that λi > 0 for all i = 1, · · · , n,

∂u

∂λj(λ) =

∫Rn

∂g(y, λ)

∂λjdy.

Next, we show that∫Rn

∂g(y,λ)∂λj

dy is continuous at any λ ∈ Rn such that λi > 0 for all i = 1, · · · , n.


Let λ = (λ1, · · · , λn) ∈ Rn with λi > 0 for all i = 1, · · · , n and (λ(k)) ⊂ Rn be a sequence that converges to λ.Then, there exist ε > 0 and k1 ∈ N such that λi − ε > 0 for all i = 1, · · · , n and λ(k) ∈ V for all k ≥ k1, where

V = θ ∈ Rn : λi − ε ≤ θi ≤ λi + ε.

Let

pθi,mi(yi) = 1√2πθi

e− 1

2

(yi−mi)2

θi

pλi+ε,mi(yi) = 1√2π(λi+ε)

e− 1

2

(yi−mi)2

λi+ε

λU = (λ1 + ε, · · · , λn + ε)

For all i = 1, · · · , n there exists ri > 0 such that for all θ = (θ1, · · · , θn) ∈ V

pθi,mi(y) ≤ pλi+ε,mi(yi)

for all yi > ri. Therefore, taking r = max(ri)ni=1 we obtain that

pDθ,m(yi) =∏ni=1 pθi,mi(yi)

≤∏ni=1 pλi+ε,mi(yi)

≤ pDλU (y).

Hence, ∣∣∣ ∂∂λj

g(y, θ)∣∣∣ ≤ ∣∣∣ 12e−r(T−t)P (ste

Qy)(

(yj−mj)2

(λj−ε)2+ 1

λj−ε

)pDλU ,m(y)

∣∣∣for all θ ∈ V and y /∈ B(m, r).

Let

h(y) =

M, if y ∈ B(m, r),∣∣∣ 12e−r(T−t)P (steQy)

((yj−mj)2

(λj−ε)2+ 1

λj−ε

)pDλU ,m(y)

∣∣∣ , if y /∈ B(m, r),

withM = max

θ∈V,y∈ B(m,r)

(∣∣∣ 12e−r(T−t)P (steQy)

((yj−mj)2

θ2j− 1

θj

)pDθ,m(y)

∣∣∣)and define

fk(y) =∂

∂λjg(y, λ(k)).

We have that fk, h ∈ L1(Rn) and

|fk(y)| ≤ supθ∈V

∣∣∣∣ ∂∂λj g(y, θ)

∣∣∣∣ ≤ h(y)

for all k ≥ k1 and y ∈ Rn.Therefore

limk→∞∂u∂λj

(λ(k)) = limk→∞∫Rn fk(y)dy

=∫Rn limk→∞ fk(y)dy

=∫Rn limk→∞

∂∂λj

g(y, λ(k))dy

=∫Rn

∂∂λj

g(y, λ)dy

= ∂u∂λj

(λ).


As a consequence of Theorem 15 (Dominated Convergence) and the continuity of ∂g∂λj

.

Therefore, we conclude that for any λ = (λ1, · · · , λn) ∈ Rn such that λi > 0 we have that ∂u∂λj

is continuousat λ. In particular, if A is a positive definite matrix, then A = QDaQ

′, where a = (a1, · · · , an) with ai > 0 forall i = 1, · · · , n. Hence, for each j = 1, · · · , n there exists Rj > 0 such that ∂u

∂λjis continuous on B(a,Rj).

Taking R = minj=1,nRj , we obtain that ∂u∂λj

is continuous on B(a,R) for all j = 1, · · · , n.Therefore,

|uA(st, t)− uB(st, t)| = |u(a)− u(b)|≤

(maxj=1,n max

λ∈B(a,R)

∣∣∣ ∂u∂λj (λ)∣∣∣) (∑nj=1 |aj − bj |)

≤ C(∑nj=1 |aj − bj |),

with C = maxj maxλ∈B(a,R)

∣∣∣ ∂u∂λj (λ)∣∣∣, as a consequence of Theorem 16 (Multivariate Mean Value).

2.4.2 L1 distance inequalities

Definition 2.4.1 (Hellinger distance). Let P and Q be two probability measures with densities p and q withrespect to a dominating measure λ. The Hellinger distance between two measures is defined as the L2(λ) distancebetween the square roots of their densities:

H(P,Q) =[ ∫ (√

dPdλ −

√dQdλ

)2

dλ] 1

2

= (λ[(√p−√q)2])

12 .

The Hellinger distance satisfies the inequality 0 ≤ H(P,Q) ≤√

2. However some authors prefer to have anupper bound of 1, therefore they include an extra factor of 1√

2in the definition of H(P,Q).

Definition 2.4.2 (Hellinger affinity). Let P and Q be two probability measures with densities p and q with respectto a dominating measure λ. The Hellinger affinity between two measures is defined as

α(P,Q) = λ[√pq].

Definition 2.4.3 ( Bhattacharya coefficient). Let p and q be probability densities with respect to the Lebesguemeasure in R. The Bhattacharya coefficient is defined as

BC(p, q) =

∫R

√pq.

Note that if λ is the Lebesgue measure and P and Q are probability measures absolutely continuous withrespect to λ then α(P,Q) = BC(p, q).

We obtained an estimate of the change in price after modifying the eigenvalues of the covariance matrix basedon some of the inequalities obtained by Kvatadze, Z. and Shervashidze, T. (2005).

Proposition 19. Let uA(st, t) be the price at time t of an option with payoff P (sT ) and positive definite covari-


ance matrix A. If the payoff function P is bounded then

|uA(st, t)− uB(st, t)| ≤ 2e−r(T−t)||P ||∞n∑i=1

|ai − bi|ai + bi

,


of A: A = QDaQ′, by modifying (only) the vector of eigenvalues i.e. B = QDbQ

′, with b = (b1, · · · , bn).

Corollary 19.1. Let uA(st, t) be the price at time t of an option with positive definite covariance matrix A and

payoff P (sT ) given by

P (sT ) =

∑ni=1 max(Ki − siT , 0), portfolio of put options.∑ni=1DiI(siT −Ki > 0), portfolio of digital options.

then

|uA(st, t)− uB(st, t)| ≤ 2e−r(T−t)||P ||∞n∑i=1

|ai − bi|ai + bi

,


of A (A = QDaQ′), by modifying (only) the vector of eigenvalues i.e. B = QDbQ

′, with b = (b1, · · · , bn).

Proof. Let uA(st, t) be the price at time t of an option with bounded payoff function P (sT ) and positive definitecovariance matrix A,


(2π)n2 | det(A)|

12

∫Rn P (ste

x) exp(− 12 (x−m)′A−1(x−m))dx.

Previously, we showed that by making the substitution x = Qy we obtain that


(2π)n2 (∏ni=1 aj)

12

∫Rn P (ste

Qy) exp(− 1

2 (y − m)′D−1a (y − m)

)dy

=∫Rn e

−r(T−t)P (steQy)pDa,m(y)dy

with m = Q′m.

As before, define B = QDbQ′ with b = (b1, · · · , bn) such that bi > 0 for all i = 1, · · · , n. Also, let pai,mi

and pbi,mi be Gaussian densities with mean mi and variance ai and bi respectively for all i, then

pDa,m − pDb,m =∏ni=1 pai,mi −

∏ni=1 pbi,mi

= pa1,m1· · · pan,mn − pb1,m1

· · · pbn,mn .

Moreover, ∫RnpDa,m − pDb,m =

n∑i=1

∫Rpai,mi − pbi,mi .

This can be proved using mathematical induction:Case k = 1 is trivial .Case k = n :Assume that the result is true for k = n− 1 :


∫Rn−1

pa1,m1 · · · pan−1,mn−1 − pb1,m1 · · · pbn−1,mn−1 =

n−1∑i=1

∫Rpai,mi − pbi,mi .

Then ∫Rn pa1,m1

· · · pan,mn − pb1,m1· · · pbn,mn =

∫Rn pa1,m1

· · · pan−1,mn−1pan,mn

−∫Rn pa1,m1

· · · pan−1,mn−1pbn,mn

+∫Rn pa1,m1

· · · pan−1,mn−1pbn,mn

−∫Rn pb1,m1

· · · pbn−1,mn−1pbn,mn

=∫Rn pa1,m1

· · · pan−1,mn−1(pan,mn − pbn,mn)

+∫Rn−1 pa1,m1

· · · pan−1,mn−1

−∫Rn−1 pb1,m1

· · · pbn−1,mn−1

=∫R pan,mn − pbn,mn+∑n−1i=1

∫R pai,mi − pbi,mi

=∑ni=1

∫R pai,mi − pbi,mi .

Therefore ∫Rn|pDa,m − pDb,m| ≤

n∑i=1

∫R|pai,mi − pbi,mi |.

In addition, let p and q be probability densities with respect to the Lebesgue measure in R. Using the Cauchy-Schwartz inequality we obtain that∫

R |p− q| =∫R |√p−√q||√p+

√q|

≤(∫

R |√p−√q|2

) 12(∫

R |√p+√q|2) 1

2

= (2(1−BC(p, q)))12 (2(1 +BC(p, q)))

12

= 2√

1−BC2(p, q).

This inequality is part of the more general result:

∫R|p− q| ≤ 2H(P,Q) ≤ 2

(∫R|p− q|

) 12

.

Note thatH2(P,Q) =

∫R(√p−√q)2

= 2(1−∫R√pq)

= 2(1−BC(p, q)).

Therefore ∫R |p− q| ≤ (2(1−BC(p, q)))

12 (2(1 +BC(p, q)))

12

= H(P,Q)√

4−H2(P,Q)

≤ 2H(P,Q)

because0 ≤ H(P,Q) ≤

√2.


Similarly, (√p−√q

)2 ≤ |√p−√q|(√p+√q)

= |p− q|.

Therefore,

H(P,Q) ≤(∫

R|p− q|

) 12

.

Making p = pai,mi and q = pbi,mi , we obtain that

BC(pai,mi , pbi,mi) =∫R√pai,mi(yi)pbi,mi(yi)dyi

=(

12π√aibi

) 12 ∫

R exp (− 12 (( 1

2 ( 1ai

+ 1bi

))12 (yi − mi))

2)dyi

=(

12π√aibi

) 12 ∫

R exp (− 12 ((ai+bi2aibi

)12 (yi − mi))

2)dyi

=(

1√aibi

) 12(

2aibiai+bi

) 12 1√

2π

∫R e− 1

2 z2i dzi

=(

2√aibi

ai+bi

) 12

,

after making the change of variables zi = (ai+bi2aibi)

12 (yi − mi).

Therefore, ∫R |pai,mi − pbi,mi | ≤ 2

√1−BC2(pai,mi , pbi,mi)

= 2√

ai−2√aibi+bi

ai+bi

= 2|√ai−

√bi|√

ai+bi

= 2|√ai−

√bi|√

ai+bi·√ai+√bi√

ai+√bi

= 2 |ai−bi|√ai+bi(

√ai+√bi)

≤ 2 |ai−bi|√(ai+bi)2

= 2 |ai−bi|ai+bi.

Hence,|uA(st, t)− uB(st, t)| ≤

∫Rn e

−r(T−t)P (steQy) |pDa,m(y)− pDb,m(y)| dy

≤ e−r(T−t)||P ||∞∑ni=1

∫R |pai,mi − pbi,mi |

≤ 2e−r(T−t)||P ||∞∑ni=1

|ai−bi|ai+bi

.

Monte Carlo simulations were used to compare the magnitude of the difference between the price of a portfolioof 2 put options and the approximation obtained after modifying the smallest eigenvalue , with our proposed upperbound. The following values were used: Sit = 10 and Ki = 9.6 for i = 1, 2, ∆ = T − t = 1, σ = 0.2, σ2 = 0.3,ρ = 0.3 and r = 0.02. The total number of simulations used was 2000000. Figure 2.1 shows the results.

2.4.3 Uncorrelated underlying assets

Theorem 20 (Markov’s Inequality). Let (Ω,F ,P) be a probability space and X : (Ω,F ,P) → [0,∞) be a

random variable, then for any a > 0

P(X > a) ≤ E(X)

a.

Proof. See Rosenthal, J. (2006).


Figure 2.1: Magnitude of the difference between the price of a portfolio of 2 put options and the approximationobtained after modifying the smallest eigenvalue without making it zero, and the proposed upper bound.

Proposition 21 (Chernoff bound of the Q function).

Q(x) ≤ e− x2

2 for all x > 0.

Proof. Let X be a random variable with standard normal distribution and x > 0. For any λ > 0, we have that

Q(x) = P(X > x)

= P(eλX > eλx)

≤ e−λxE(eλX)

from Markov’s inequality.However,

E(eλX) =∫R

1√2πeλxe−

12x

2

dx

=∫R

1√2πe−

12 (x2−2λx+λ2)+λ2

2 dx

= eλ2

2

∫R

1√2πe−

12 (x−λ)2dx

= eλ2

2 .

Hence,Q(x) ≤ e−λxeλ

2

2 .

In particular, if we choose λ = x we obtain that

Q(x) ≤ e− x2

2 .

Proposition 22. Let U be a (Lebesgue) measurable subset of R then

1√2πa

∫U

ecxe−(x−µ)2

2a dx =1√2πa

ecµ+ c2a2

∫U

e−(x−(µ+ca))2

2a dx.


Proof.1√2πa

∫Uecxe−

(x−µ)22a dx = 1√

2πa

∫Ue−(x2−2xµ+µ2)

2a +cxdx

= 1√2πa

∫Ue−x2+2xµ−µ2+2acx

2a dx

= 1√2πa

∫Ue−x2+2(ac+µ)x−µ2

2a dx

= 1√2πa

∫Ue−x2+2(ac+µ)x−(ac+µ)2+(ac+µ)2−µ2

2a dx

= 1√2πa

∫Ue−(x−(ac+µ))2+(ac+µ)2−µ2

2a dx

= 1√2πa

ea2c2+2acµ+µ2−µ2

2a

∫Ue−(x−(ac+µ))2

2a dx

= 1√2πa

ecµ+ c2a2

∫Ue−(x−(ac+µ))2

2a dx.

Proposition 23. Let U be a (Lebesgue) measurable subset of R then

∫U

e−(cx−µ2)2

b e−(x−µ1)2

a dx = e−(µ2−cµ1)2

ac2+b

∫U

e−

(x− acµ2+bµ1

ac2+b

)2ab

ac2+b dx.

Proof. We have that(cx−µ2)2

b =c2x2−2cµ2x+µ2

2

b(x−µ1)2

a =x2−2µ1x+µ2

1

a .

Therefore,

− (cx−µ2)2

b − (x−µ1)2

a =−(a(c2x2−2cµ2x+µ2

2)+b(x2−2µ1x+µ21))

ab

=−((ac2+b)x2−2(acµ2+bµ1)x+aµ2

2+bµ21)

ab

= −(ac2+bab

)(x2 − 2

(acµ2+bµ1

ac2+b

)x+

(aµ2

2+bµ21

ac2+b

))= −

(ac2+bab

)(x2 − 2

(acµ2+bµ1

ac2+b

)x+

(acµ2+bµ1

ac2+b

)2)

+(ac2+bab

)((acµ2+bµ1

ac2+b

)2

−(aµ2

2+bµ21

ac2+b

))= −

(ac2+bab

)(x−

(acµ2+bµ1

ac2+b

))2

+ 1ab

((acµ2+bµ1)2

ac2+b − (aµ22 + bµ2

1)).

However,

1ab

((acµ2+bµ1)2

ac2+b − (aµ22 + bµ2

1))

= 1ab

(a2c2µ2

2+2abcµ1µ2+b2µ21−(a2c2µ2

2+abµ22+abc2µ2

1+b2µ21)

ac2+b

)= 1

ab

(2abcµ1µ2−abµ2

2−abc2µ2

1

ac2+b

)= − (µ2

2−2cµ1µ2+c2µ21)

ac2+b

= − (µ2−cµ1)2

ac2+b .

Hence, ∫U

e−(cx−µ2)2

b e−(x−µ1)2

a dx = e−(µ2−cµ1)2

ac2+b

∫U

e−

(x− acµ2+bµ1

ac2+b

)2ab

ac2+b dx.

Let uA(st, t) be the price of an option with payoff P (ST ) and positive definite covariance matrix A then



(2π)n2 | det(A)|

12

∫Rn P (ste

x) exp(− 12 (x−m)′A−1(x−m))dx.

In particular, if the underlying assets are uncorrelated (A = Da) then


(2π)n2 (∏ni=1 ai)

12

∫Rn P (ste

x) exp(− 1

2 (x−m)′D−1a (x−m)

)dx

=∫Rn e

−r(T−t)P (stex)pDa,m(x)dx.

where Da is a diagonal matrix and a = (a1, · · · , an) is the vector of diagonal entries of Da.As a consequence, if P (ST ) =

∑ni=1DiI(SiT −Ki > 0) (portfolio of digital options) then

uA(st, t) =∫Rn e

−r(T−t) (∑ni=1DiI

(site

xi −Ki > 0))pDa,m(x)dx

= e−r(T−t)∑ni=1

∫Rn DiI

(site

xi −Ki > 0)∏n

i=1 pai,mi(xi)dx


∫RDiI

(site

xi −Ki > 0)pai,mi(xi)dxi .

Similarly, if P (ST ) =∑ni=1 max(SiT −Ki, 0) (portfolio of call options) then

uA(st, t) =∫Rn e

−r(T−t) (∑ni=1 max

(site

xi −Ki, 0))pDa,m(x)dx


∫Rn max

(site

xi −Ki, 0)∏n

i=1 pai,mi(xi)dx


∫R max

(site

xi −Ki, 0)pai,mi(xi)dxi .

Let A = Da with a = (a1, · · · , ak, 0, · · · , 0) we show that

1. If uDA denotes the price of a portfolio of n digital options (with the same maturity) at time t and covariancematrix A then there exist constants Cj1 , C

j2 > 0 with j = 1, · · · , n− k such that

∣∣uDA (st, t)− uDA (st, t)∣∣ ≤ n−k∑

j=1

Cj1e− C

j2

ak+j .

2. If uCA denotes the price of a portfolio of n call options (with the same maturity) at time t and covariancematrix A then there exist constants Cj1 , C

j2 , C

j3 , C


∣∣uCA(st, t)− uCA(st, t)∣∣ ≤ n−k∑

j=1

Cj1(eCj2ak+j − 1) + Cj3e

− Cj4

ak+j .

More formally, we have the following proposition:

Proposition 24. Let uDA (st, t) denote the price of a portfolio of n digital options (with the same maturity) at time

t with positive definite covariance matrix A. If the underlying assets are uncorrelated i.e. A = Da then there

exist constants Cj1 , Cj2 > 0 with j = 1, · · · , n− k such that

∣∣uDA (st, t)− uDA (st, t)∣∣ ≤ n−k∑

j=1

Cj1e− C

j2

ak+j ,

where uDA

(st, t) denotes an approximation of the price obtained by making zero n− k eigenvalues of the covari-

ance matrix i.e A = Da with a = (a1, · · · , ak, 0, · · · , 0).


Proof. If the underlying assets are uncorrelated then A = Da where a = (a1, · · · , an) with ai > 0 for alli = 1, · · · , n, therefore if uDA (st, t) denotes the price of a portfolio of digital n options at time t and covariancematrix A then

uDA (st, t) = e−r(T−t)∑ni=1

∫RDiI

(site

xi −Ki > 0)pai,mi(xi)dxi


∫∞lnKisit

Dipai,mi(xi)dxi.

In addition, if A = Da with a = (a1, · · · , ak, 0, · · · , 0) then

uDA

(st, t) = e−r(T−t)∑ki=1

∫∞lnKisit

Dipai,mi(xi)dxi

+e−r(T−t)∑ni=k+1

∫∞lnKisit

Diδ(xi −mi)dxi.

Letuai(s

it, t) =

∫ ∞lnKisit

pai,mi(xi)dxi

andu0(sit, t) =

∫ ∞lnKisit

δ(xi −mi)dxi,

we have that

|uDA (st, t)− uDA (st, t)| ≤ e−r(T−t)∑ni=k+1Di|uai(sit, t)− u0(sit, t)|.

However,

uai(sit, t) =

∫∞lnKisit

1√2πai

e− (xi−mi)

2

2ai dxi

=∫∞

lnKisti

−mi√ai

e−(x′i)

2

2 dx′i

=

(1− Φ

(lnKisit−mi

√ai

))

= Q

(lnKisit−mi

√ai

).

Also,

u0(sit, t) =

1, if mi > ln Ki

sit,

12 , if mi = ln Ki

sit,

0, if mi < ln Kisit.

Note that if ln Kisit

= mi then |uai(sit, t) − u0(sit, t)| = 0 for all ai > 0. Therefore, only the other two cases arenontrivial:

1. mi > ln Kisit

2. mi < ln Kisit


Case 1: If mi > ln Kisit

then

|uai(sit, t)− u0(sit, t)| = 1−Q

(lnKisit−mi

√ai

)

= Q

(mi−ln

Kisit√

ai

)

≤ e−

(mi−ln

Kisit

)2

2ai .

Case 2: If mi < ln Kisi

then

|uai(sit, t)− u0(sit, t)| = Q

(lnKisit−mi

√ai

)

≤ e−

(lnKisit

−mi

)2

2ai .

Remark. Note that both inequalities are obtained using Proposition 21 (Chernoff bound of the Q function).

Hence, we obtain that there exist constants Cj1 , Cj2 > 0 with j = 1, · · · , n− k such that

|uDA (st, t)− uDA (st, t)| ≤n−k∑j=1

Cj1e− C

j2

ak+j .

A similar result holds for call options.

Proposition 25. Let uCA(st, t) denote the price of a portfolio of n call options (with the same maturity) at time t

with positive definite covariance matrix A. If the underlying assets are uncorrelated i.e. A = Da then there exist

constants Cj1 , Cj2 , C

j3 , C


∣∣uCA(st, t)− uCA(st, t)∣∣ ≤ n−k∑

j=1


− Cj4

ak+j ,

where uCA

(st, t) denotes an approximation of the price obtained by making zero n−k eigenvalues of the covariance

matrix i.e A = Da with a = (a1, · · · , ak, 0, · · · , 0).

Proof. If the underlying assets are uncorrelated then A = Da where a = (a1, · · · , an) with ai > 0 for alli = 1, · · · , n, therefore if uCA(st, t) denotes the price of a portfolio of digital options at time t and covariancematrix A then

uCA(st, t) = e−r(T−t)∑ni=1

∫R max(site

xi −Ki, 0)pai,mi(xi)dxi.

In addition, if A = Da with a = (a1, · · · , ak, 0, · · · , 0) then

uCA

(st, t) = e−r(T−t)∑ki=1

∫∞lnKisit

(sitexi −Ki)pai,mi(xi)dxi

+e−r(T−t)∑ni=k+1

∫∞lnKisit

(sitexi −Ki)δ(xi −mi)dxi.


Let

uai(sit, t) =

∫ ∞lnKisit

(sitexi −Ki)pai,mi(xi)dxi

andu0(sit, t) =

∫ ∞lnKisit

(sitexi −Ki)δ(xi −mi)dxi,

we have that

|uCA(st, t)− uCA(st, t)| ≤ e−(T−t)n∑i=1

|uai(sit, t)− u0(sit, t)|.

Using Proposition 22 we obtain that

∫∞lnKisit

sitexipai,mi(xi)dxi =

∫∞lnKisit

1√2πai

e− (xi−mi)

2

2ai sit · exidxi

= sitemi+

ai2

∫∞lnKisit

1√2πai

e−(

(xi−(mi+ai))√2ai

)2

dxi

= sitemi+

ai2

∫∞lnKisit

−(mi+ai)

√ai

1√2πe−

(x′i)2

2 dx′i

= sitemi+

ai2

(1− Φ

(lnKisit−(mi+ai)√ai

))

= sitemi+

ai2 Q


).

On the other hand,

Ki

∫∞lnKisti

1√2πai

e− (xi−mi)

2

2ai dxi = Ki

∫∞lnKisit

−mi√ai

1√2πe−

(xi)2

2 dxi

= Ki

(1− Φ

(lnKisit−mi

√ai

))

= KiQ

(lnKisit−mi

√ai

).

Therefore

uai(si, t) = sitemi+

ai2 Q


)−KiQ

(lnKisit−mi

√ai

).

In addition, note thatu0(sit, t) = max(site

mi −Ki, 0).

We want to find an upper bound of|uai(sit, t)− u0(sit, t)|

as ai → 0.We must consider two cases:

1. mi ≥ ln(Kisit

)


2. mi < ln(Kisit

)

Case 1: If mi ≥ ln(Kisit

) then max(sitemi −Ki, 0) = site

mi −Ki. Therefore,

|uai(sit, t)− u0(sit, t)| =

∣∣∣∣∣stiemi+ ai2 Q(

lnKisit−(mi+ai)√ai

)−KiQ(lnKisit−mi

√ai

)− (sitemi −Ki)

∣∣∣∣∣=

∣∣∣∣∣sitemi(e ai2 Q(lnKisit−(mi+ai)√ai

)− 1)−Ki(Q(lnKisit−mi

√ai

)− 1)

∣∣∣∣∣=

∣∣∣∣∣sitemi(e ai2 (1−Q((mi+ai)−ln

Kisit√

ai))− 1)

∣∣∣∣∣+KiQ(mi−ln

Kisit√

ai)

≤ sitemi(e

ai2 − 1) + site

mieai2 Q(

(mi+ai)−lnKisit√

ai) +KiQ(

mi−lnKisit√

ai)

≤ sitemi(e

ai2 − 1) + (site

mieai2 +Ki)Q(

mi−lnKisit√

ai)

≤ sitemi(e

ai2 − 1) + (site

mieai2 +Ki)e

−

mi−ln

Kisit√

ai

2

2

= siemi(e

ai2 − 1) + (site

mieai2 +Ki)e

−

(mi−ln

Kisti

)2

2ai .

Remark. Note that the last inequality uses Proposition 21 (Chernoff bound of the Q function).

As a consequence, there exist constants Ci1, Ci2, C

i3, C

i4 > 0 such that

|uai(sit, t)− u0(sit, t)| ≤ Ci1(eCi2ai − 1) + Ci3e

−Ci4ai .

Case 2: If mi < ln(Kisti

) then max(sitemi −Ki, 0) = 0. Consequently,


∣∣∣∣∣sitemi+ ai2 (1− Φ(


))−K1(1− Φ(lnKisit−mi

√ai

))

∣∣∣∣∣≤

∣∣∣∣∣sitemi+ ai2 (1− Φ(


))|+Ki(1− Φ(lnKisit−mi

√ai

))

∣∣∣∣∣for ai > 0 small enough ln Ki

sit− (mi + ai) > 0, and


∼ O

(lnKisit−mi

√ai

), as ai → 0.

Therefore, using Proposition 21 (Chernoff bound of the Q function) we have that


∣∣∣∣∣sitemi+ ai2 Q(


)

∣∣∣∣∣+

∣∣∣∣∣KiQ(lnKisit−mi

√ai

)

∣∣∣∣∣≤ site

mi+ai2 e−

lnKisit

−(mi+ai)

√ai

2

2 +Kie−

lnKisit

−mi√ai

2

2

≤ (sitemi+

ai2 +Ki)e

−

lnKisit

−(mi+ai)

√ai

2

2

∼ O(e− Cai ),


with C > 0.Therefore, if aj > 0 is small enough for j = 1, . . . , n − k , then there exist constants Cj1 , C

j2 , C

j3 , C

j4 > 0 such

that ∣∣uCA(st, t)− uCA(st, t)∣∣ ≤ n−k∑

j=1


− Cj4

ak+j .

Corollary 25.1. Let uCA(st, t) denote the price of a portfolio of n call options (with the same maturity) at time t

with positive definite covariance matrix A. If the underlying assets are uncorrelated i.e. A = Da and aj small

enough for j = k + 1, · · · , n, then there exist a constant C > 0 such that

∣∣uCA(st, t)− uCA(st, t)∣∣ ≤ C n−k∑

j=1

ak+j ,

where uCA

(st, t) denotes an approximation of the price obtained by making zero n−k eigenvalues of the covariance

matrix i.e A = Da with a = (a1, · · · , ak, 0, · · · , 0).

Monte Carlo simulations were used to plot the magnitude of the difference between the price of a singleasset option and the approximation obtained by making the variance zero. Note that the covariance matrix of aportfolio of single asset options with uncorrelated assets is diagonal, therefore the eigenvalues are the variancesof each asset.Figure 2.2 shows the plot for a digital option. 4000000 simulations were used. The following values were assignedto the parameters and variables of the option: St = 100, K = 96 and D = 10, ∆ = T − t = 1, σ = 0.2 andr = 0.

Figure 2.2: Magnitude of the difference between the price of a digital option and the approximation obtained bymaking the variance zero.

Figure 2.3 shows the plot for a call option. The following values were assigned to the parameters and variablesof the option: St = 100, K = 96, ∆ = T − t = 1, σ = 0.2 and r = 0.02. The total number of simulations usedwas 4000000.


Figure 2.3: Magnitude of the difference between the price of a call option and the approximation obtained bymaking the variance zero.

2.4.4 Correlated underlying assets (Two dimensional case)

Let uA(s1t , s

2t , t) denote the price at time t of an option that depends on two underlying assets with payoff

P (s1T , s

2T ) and positive definite covariance matrix A, then


2π| det(A)|12

∫R2 P (ste

x) exp(− 12 (x−m)′A−1(x−m))dx,

wherem = (m1,m2)

mi = r − σ2i

2 (T − t)A = (T − t)DσCDσ

st = (s1t , s

2t )

stex = (s1

t ex1 , s2

t ex2).

Dσ is a matrix with σi along the diagonal and zeros everywhere else, C is a correlation matrix , C = (ρij)ij andr is the risk free interest rate.

Making the substitution x = Qy, whereQ is obtained from the eigenvalue decomposition ofA (A = QDaQ′)

we obtain that


2π(∏2i=1 ai)

12

∫R2 P (ste

Qy) exp(− 1

2 (y − m)′D−1a (y − m)

)dy

=∫R2 e−r(T−t)P (ste

Qy)pDa,m(y)dy,

with m = Q′m.Let

Q =

[c11 c12

c21 c22

]If P (S1

T , S2T ) = D1I(S1

T − K1 > 0) + D2I(S2T − K2 > 0) (portfolio of two digital options with the same

maturity) then

uA(s1t , s

2t , t) = e−r(T−t)

∫R2 D1I(s1

t ec11y1+c12y2 −K1 > 0)pa1,m1

(y1)pa2,m2(y2)dy1dy2+

e−r(T−t)∫R2 D2I(s2

t ec21y1+c22y2 −K2 > 0)pa1,m1

(y1)pa2,m2(y2)dy1dy2.


If P (S1T , S

2T ) = max(S1

T − K1, 0) + max(S2T − K2, 0) (portfolio of two call options with the same maturity)

then

uA(s1t , s

2t , t) = e−r(T−t)

∫R2 max(s1

t ec11y1+c12y2 −K1, 0)pa1,m1


e−r(T−t)∫R2 max(s2

t ec21y1+c22y2 −K2, 0)pa1,m1


From the eigenvalue decomposition of A, we have that

A = QDaQ′,

where a = (a1, a2) with ai > 0 for i = 1, 2.Define

A = QDaQ′,

with a = (a1, 0).Let uDA (st, t) denote the price at time t of a portfolio of two digital options then there exists a constant C > 0

such that ∣∣uDA (st, t)− uDA (st, t)∣∣ ≤ C√a2.

Similarly, Let uCA(st, t) denote the price at time t of a portfolio of two call options then there exist constantsC1, C2, C3, C4 > 0 such that

∣∣uCA(st, t)− uCA(st, t)∣∣ ≤ C1

√a2 + C2(eC3a2 − 1) + C4a2.

These bounds are formally presented in the next two propositions.

Proposition 26. Let uDA (st, t) denote the price at time t of a portfolio of two digital options with the same maturity

and correlated underlying assets, with positive definite covariance matrix A. There exists a constant C > 0 such

that ∣∣uDA (st, t)− uDA (st, t)∣∣ ≤ C√a2,

where uDA

(st, t) denotes an approximation of the price obtained by making zero the second eigenvalue of the

covariance matrix i.e A = QDaQ′ with a = (a1, a2) and A = QDaQ

′ with a = (a1, 0).

Proof. Let uDA (s1t , s

2t , t) denote the price at time t of a portfolio of two correlated digital options with positive

definite covariance matrix A and with the same maturity.Previously, we showed that

uDA (s1t , s

2t , t) = e−r(T−t)

∫R2 D1I(s1

t ec11y1+c12y2 −K1 > 0)pa1,m1


e−r(T−t)∫R2 D2I(s2

t ec21y1+c22y2 −K2 > 0)pa1,m1


LetuDA

(s1t , s

2t , t) = lim

a2→0uDA (s1

t , s2t , t),

andJ1 =

∫R2 D1I(s1

t ec11y1+c12y2 −K1 > 0)pa1,m1

(y1)pa2,m2(y2)dy1dy2

J2 =∫R2 D2I(s2

t ec21y1+c22y2 −K2 > 0)pa1,m1


J01 = lima2→0 J1

J02 = lima2→0 J2,


then|uDA (s1

t , s2t , t)− uDA (s1

t , s2t , t)| ≤ e−r(T−t)

(|J1 − J0

1 |+ |J2 − J02 |).

We just need to find an upper bound of |J1 − J01 | as the same procedure used to bound |J1 − J0

1 | can be used toobtain an upper bound of |J2 − J0

2 |.Different cases are considered based on the possible values of c1j with j = 1, 2.

If c11 6= 0 and c12 = 0 then

J1 =∫R2 D1I(s1

t ec11y1 −K1 > 0)pa1,m1


=(∫

RD1I(s1t ec11y1 −K1 > 0)pa1,m1

dy1

) (∫R pa2,m2

(y2)dy2

)=

∫RD1I(s1

t ec11y1 −K1 > 0)pa1,m1

dy1.

ThereforeJ0

1 =

∫RD1I(s1

t ec11y1 −K1 > 0)pa1,m1

dy1,

and |J1 − J01 | = 0 for all a2 > 0.

If c11 = 0 and c12 6= 0 then

J1 =∫R2 D1I(s1

t ec12y2 −K1 > 0)pa1,m1(y1)pa2,m2(y2)dy1dy2

=(∫

RD1I(s1t ec12y2 −K1 > 0)pa2,m2(y2)dy2

)(∫R pa1,m1(y1)dy1)

=∫RD1I(s1

t ec12y2 −K1 > 0)pa2,m2(y2)dy2.

Therefore there exist constantsC1, C2 > 0 such that |J1−J01 | ≤ C1e

−C2a2 . (This result was proved in the previous

section).We can now consider the case c11, c12 6= 0. To fix ideas and illustrate the procedure assume that c11, c12 > 0,

we obtain that

s1t ec11y1+c12y2 −K1 > 0 ⇐⇒ y2 >

ln K1

s1t

c12− c11

c12y1.

Therefore

J1 =∫∞−∞

∫∞lnK1s1t

c12− c11c12

y1

D1pa2,m2(y2)dy2

pa1,m1(y1)dy1

= D1

∫∞−∞

1√2πa1

e−(y1−m1)2

2a1

∫∞lnK1s1t

c12− c11c12

y1

1√2πa2

e−(y2−m2)2

2a2 dy2

dy1.

Hence

lima2→0

J1 = D1

∫ ∞−∞

1√2πa1

e−(y1−m1)2

2a1

(lima2→0

∫ ∞lnK1s1t

c12− c11c12

y1

1√2πa2

e−(y2−m2)2

2a2 dy2

)dy1.

Note that we can exchange the integrals using Theorem 17 (Tonelli-Fubini) and pass the limit inside the integralusing Theorem 15 (Dominated Convergence).We must determine

lima2→0

∫ ∞lnK1s1t

c12− c11c12

y1

1√2πa2

e−(y2−m2)2

2a2 dy2.


Note that ∫ ∞lnK1s1t

c12− c11c12

y1

1√2πa2

e−(y2−m2)2

2a2 dy2 =

∫ ∞−∞

I

(y2 >

ln K1

s1t

c12− c11

c12y1

)pa2,m2

(y2)dy2.

Thereforelima2→0

∫ ∞lnK1s1t

c12− c11c12

y1

1√2πa2

e−(y2−m2)2

2a2 dy2

is 1, if m2 >

lnK1s1t

c12− c11

c12y1,

12 , if m2 =

lnK1s1t

c12− c11

c12y1,

0, if m2 <lnK1s1t

c12− c11

c12y1.

Note that as we are assuming that c11, c12 > 0 then

m2 >ln K1

s1t

c12− c11

c12y1 ⇐⇒ y1 >

ln K1

s1t

c11− c12

c11m2.

As a consequence,

J01 = lim

a2→0J1 = D1

∫ ∞lnK1s1t

c11− c12c11

m2

1√2πa1

e−(y1−m1)2

2a1 dy1.

In addition,J1 = I1 + I2,

where

I1 = D1

∫ lnK1s1t

c11− c12c11

m2

−∞

(∫ ∞lnK1s1t

c12− c11c12

y1

pa2,m2(y2)dy2

)pa1,m1

(y1)dy1

and

I2 = D1

∫ ∞lnK1s1t

c11− c12c11

m2

(∫ ∞lnK1s1t

c12− c11c12

y1

pa2,m2(y2)dy2

)pa1,m1

(y1)dy1.

Let I01 = lima2→0 I1 and I0

2 = lima2→0 I2 then

|J1 − J01 | ≤ |I1 − I0

1 |+ |I2 − I02 |,

with I01 = 0 and I0

2 = D1

∫∞lnK1s1t

c11− c12c11

m2

1√2πa1

e−(y1−m1)2

2a1 dy1.

Note that

I1 = D1

∫ lnK1s1t

c11− c12c11

m2

−∞Q

lnK1s1t

c12− m2 − c11

c12y1

√a2

pa1,m1(y1)dy1,

withlnK1s1t

c12− m2 − c11

c12y1

√a2

≥ 0


for y1 ≤lnK1s1t

c11− c12

c11m2.

In addition,

|I2 − I02 | = D1

∣∣∣∣∣∣∫∞ln K1s1t

c11− c12c11

m2

1−∫∞

lnK1s1t

c12− c11c12

y1

pa2,m2(y2)dy2

pa1,m1(y1)dy1

∣∣∣∣∣∣= D1

∫∞lnK1s1t

c11− c12c11

m2

1−Q

lnK1s1t

c12−m2− c11c12

y1√a2

pa1,m1(y1)dy1

= D1

∫∞lnK1s1t

c11− c12c11

m2

Q

c11c12

y1−

lnK1s1t

c12−m2

√a2

pa1,m1(y1)dy1.

So far we have shown that under the assumption of c11, c12 > 0,

|J1 − J01 | ≤ T1 + T2

where

T1 = D1

∫ lnK1s1t

c11− c12c11

m2

−∞ Q

lnK1s1t

c12−m2− c11c12

y1√a2

pa1,m1(y1)dy1

T2 = D1

∫∞lnK1s1t

c11− c12c11

m2

Q

c11c12

y1−

lnK1s1t

c12−m2

√a2

pa1,m1(y1)dy1.

On the other hand, define

T3 = D1

∫ lnK1s1t

c11− c12c11

m2

−∞ Q

c11c12

y1−

lnK1s1t

c12−m2

√a2

pa1,m1(y1)dy1

and

T4 = D1

∫∞lnK1s1t

c11− c12c11

m2

Q

lnK1s1t

c12−m2− c11c12

y1√a2

pa1,m1(y1)dy1.

In general, for the remaining cases:

• c11 > 0 and c12 < 0

• c11 < 0 and c12 > 0

• c11 < 0 and c12 < 0

we obtain that

|J1 − J01 | ≤

T1 + T2, if c11c12 > 0,

T3 + T4, if c11c12 < 0.


The proofs of the remaining cases are almost identical to the proof provided above of the case: c11 > 0 andc12 > 0. In addition, we have that:

Ti ≤

D1

∫ lnK1s1t

c11− c12c11

m2

−∞1√

2πa1e−

lnK1s1t

c12−m2−

c11c12

y1

2

2a2 e−(y1−m1)2

2a1 dy1, for i = 1, 3.

D1

∫∞lnK1s1t

c11− c12c11

m2

1√2πa1

e−

c11c12y1−

lnK1s1t

c12−m2

2

2a2 e−(y1−m1)2

2a1 dy1, for i = 2, 4.

using Proposition 21 (Chernoff bound of the Q function).Next, we obtain an upper bound for the first integral:

R = D1

∫ lnK1s1t

c11− c12c11

m2

−∞

1√2πa1

e−

lnK1s1t

c12−m2−

c11c12

y1

2

2a2 e−(y1−m1)2

2a1 dy1.

In particular , we show that there exists a constant C > 0 such that

D1

∫ lnK1s1t

c11− c12c11

m2

−∞1√

2πa1e−

lnK1s1t

c12−m2−

c11c12

y1

2

2a2 e−(y1−m1)2

2a1 dy1 ≤ C√a2

for a2 > 0 small enough.Note that there also exists a constant C ′ such that for the second integral

D1

∫∞lnK1s1t

c11− c12c11

m2

1√2πa1

e−

c11c12y1−

lnK1s1t

c12−m2

2

2a2 e−(y1−m1)2

2a1 dy1 ≤ C ′√a2

for a2 > 0 small enough. The proof of this result is almost identical to the proof for the first integral shown below.

Let

A =lnK1s1t

c11− c12

c11m2

d =lnK1s1t

c12− m2

c = c11c12

b = 2a2

a = 2a1

m = m1.

From Proposition 23,

∫ A−∞ e−

(d−cy1)2

b e−(y1−m)2

a dy1 = e−(cm−d)2

ac2+b∫ A−∞ e

− ac2+bab (y1− acd+bmac2+b

)2dy1.


Therefore there exists a constant C > 0 such that

R ≤ C∫ A

−∞

1√aπe− ac

2+bab (y1− acd+bmac2+b

)2dy1.

Consider the integral ∫ A

−∞

1√aπe− ac


)2dy1.

Lety1√

2=

√ac2 + b√ab

(y1 −

acd+ bm

ac2 + b

),

then dy1 =√ab√

2√ac2+b

dy1 and at y1 = A, y1 = A =√

2√ac2+b√ab

(A− acd+bmac2+b ).

Therefore, ∫ A−∞

1√aπe− ac


)2dy1 =

√ab√

2πa√ac2+b

∫ A−∞ e−

y212 dy1

=√b√

ac2+bΦ(A),

where A =√

2√ac2+b√ab


Returning to our original variables we obtain that

A =

√2√

2a1(c11c12

)2+2a2√

2a12a2

(

lnK1s1t

c11− c12

c11m2

)−

2a1c11c12

lnK1s1t

c12−m2

+2a2m1

2a1(c11c12

)2+2a2

=

√2√

2a2√2a1

lnK1s1t

c11− c12c11

m2−m1√2a1(c11c12

)2+2a2

=

√a2√a1

lnK1s1t

c11− c12c11

m2−m1√a1(c11c12

)2+a2

,

and √b√

ac2 + b=

√a2√

a1( c11c12 )2 + a2

.

As a consequence, as a2 → 0 √a2√

a1( c11c12 )2 + a2

Φ(A) ∼ O(a212 ).

Hence, there exits C > 0 such thatR ≤ C

√a2.

Proposition 27. Let uCA(st, t) denote the price at time t of a portfolio of two call options with the same maturity

and correlated underlying assets with positive definite covariance matrixA. There exist constantsC1, C2, C3, C4 >

0 such that ∣∣uCA(st, t)− uCA(st, t)∣∣ ≤ C1

√a2 + C2(eC3a2 − 1) + C4a2,

where uCA

(st, t) denotes an approximation of the price obtained by making zero the second eigenvalue of the


covariance matrix i.e A = QDaQ′ with a = (a1, a2) and A = QDaQ

′ with a = (a1, 0).

Proof. Let uCA(s1t , s

2t , t) denote the price at time t of a portfolio of two correlated call options with positive definite

covariance matrix A and with the same maturity.Previously, we showed that

uCA(s1t , s

2t , t) = e−r(T−t)

∫R2 max(s1

t ec11y1+c12y2 −K1, 0)pa1,m1


e−r(T−t)∫R2 max(s2

t ec21y1+c22y2 −K2, 0)pa1,m1


LetuCA

(s1t , s

2t , t) = lim

a2→0uCA(s1

t , s2t , t),

andJ1 =

∫R2 max(s1

t ec11y1+c12y2 −K1, 0)pa1,m1


J2 =∫R2 max(s2

t ec21y1+c22y2 −K2, 0)pa1,m1


J01 = lima2→0 J1

J02 = lima2→0 J2.

then|uCA(s1

t , s2t , t)− uCA(s1

t , s2t , t)| ≤ e−r(T−t)

(|J1 − J0

1 |+ |J2 − J02 |).

We just need to find an upper bound of |J1 − J01 |, as the same procedure used to bound |J1 − J0

1 | can be used toobtain an upper bound of |J2 − J0

2 |.Different cases are considered based on the possible values of c1j with j = 1, 2.

If c11 6= 0 and c12 = 0 then

J1 =∫R2 max(s1

t ec11y1 −K1, 0)pa1,m1


=(∫

R max(s1t ec11y1 −K1, 0)pa1,m1

dy1

) (∫R pa2,m2

(y2)dy2

)=

∫R max(s1

t ec11y1 −K1, 0)pa1,m1

dy1.

ThereforeJ0

1 =

∫R

max(s1t ec11y1 −K1, 0)pa1,m1dy1,

and |J1 − J01 | = 0 for all a2 > 0.

If c11 = 0 and c12 6= 0 then

J1 =∫R2 max(s1

t ec12y2 −K1, 0)pa1,m1


=(∫

R max(s1t ec12y2 −K1, 0)pa2,m2

(y2)dy2

)(∫R pa1,m1

(y1)dy1)

=∫R max(s1

t ec12y2 −K1, 0)pa2,m2

(y2)dy2.

Therefore there exist constants C1, C2, C3, C4 > 0 such that

|J1 − J01 | ≤ C1(eC2a2 − 1) + C3e

−C4a2 .

This result was proved in the previous section (portfolio of uncorrelated call options).We can now consider the case c11, c12 6= 0. To fix ideas and illustrate the procedure assume that c11, c12 > 0,

we obtain that

s1t ec11y1+c12y2 −K1 > 0 ⇐⇒ y2 >

ln K1

s1t

c12− c11

c12y1.


Therefore,

J1 =∫∞−∞

∫∞lnK1s1t

c12− c11c12

y1

(s1t ec11y1+c12y2 −K1)pa2,m2

(y2)dy2

pa1,m1(y1)dy1

=∫∞−∞

∫∞lnK1s1t

c12− c11c12

y1

s1t ec11y1+c12y2pa2,m2

(y2)dy2

pa1,m1(y1)dy1

−∫∞−∞

∫∞lnK1s1t

c12− c11c12

y1

K1pa2,m2(y2)dy2

pa1,m1(y1)dy1,

and as a consequence of Theorem 15 (Dominated Convergence)

J01 = lima2→0 J1

=∫∞−∞ pa1,m1

(y1)

lima2→0

∫∞lnK1s1t

c12− c11c12

y1

(s1t ec11y1+c12y2 −K1)pa2,m2

(y2)dy2

dy1.

However,

lima2→0

∫ ∞lnK1s1t

c12− c11c12

y1

1√2πa2

(s1t ec11y1+c12y2 −K1)e−

(y2−m2)2

2a2 dy2

is s1t ec11y1+c12m2 −K1, if m2 ≥

lnK1s1t

c12− c11

c12y1,

0, if m2 <lnK1s1t

c12− c11

c12y1.

ThereforeJ0

1 =

∫ ∞lnK1s1t

c11− c12c11

m2

(s1t ec11y1+c12m2 −K1)pa1,m1

(y1)dy1.

In addition, J1 = I1 + I2 with

I1 =

∫ lnK1s1t

c11− c12c11

m2

−∞

(∫ ∞lnK1s1t

c12− c11c12

y1

(s1t ec11y1+c12y2 −K1)pa2,m2

(y2)dy2

)pa1,m1

(y1)dy1

and

I2 =

∫ ∞lnK1s1t

c11− c12c11

m2

(∫ ∞lnK1s1t

c12− c11c12

y1

(s1t ec11y1+c12y2 −K1)pa2,m2(y2)dy2

)pa1,m1(y1)dy1.

Let I01 = lima2→0 I1 and I0

2 = lima2→0 I2 then

I01 = 0

I02 =

∫∞lnK1s1t

c11− c12c11

m2

(s1t ec11y1+c12m2 −K1)pa1,m1

(y1)dy1,

and

|J1 − J01 | ≤ |I1 − I0

1 |+ |I2 − I02 |.


Therefore,

|I1 − I01 | =

∣∣∣∣∣∣∣∫ ln

K1s1t

c11− c12c11

m2

−∞

∫∞lnK1s1t

c12− c11c12

y1

(s1t ec11y1+c12y2 −K1)pa2,m2

(y2)dy2

pa1,m1(y1)dy1

∣∣∣∣∣∣∣ .Hence

|I1 − I01 | ≤ T1 + T2,

where

T1 =

∫ lnK1s1t

c11− c12c11

m2

−∞

(∫ ∞lnK1s1t

c12− c11c12

y1


(y2)dy2

)pa1,m1

(y1)dy1,

and

T2 = K1

∫ lnK1s1t

c11− c12c11

m2

−∞

(∫ ∞lnK1s1t

c12− c11c12

y1

pa2,m2(y2)dy2

)pa1,m1(y1)dy1.

However, from Proposition 22

∫∞lnK1s1t

c12− c11c12

y1

ec12y2pa2,m2(y2)dy2 = ec12m2e(c12)2a2

2

∫∞lnK1s1t

c12− c11c12

y1

1√2a2π

exp(− (y2−(m2+c12a2))2

2a2

)dy2

= ec12m2e(c12)2a2

2 Q

ln

K1s1t

c12−(m2+c12a2)

− c11c12y1

√a2

.

Therefore,

T1 = s1t ec12m2e

(c12)2a22

∫ lnK1s1t

c11− c12c11

m2

−∞ ec11y1Q

ln

K1s1t

c12−(m2+c12a2)

− c11c12y1

√a2

pa1,m1(y1)dy1,

and

T2 = K1

∫ lnK1s1t

c11− c12c11

m2

−∞ Q

ln

K1s1t

c12−m2

− c11c12y1

√a2

pa1,m1(y1)dy1.

Similarly,

I2 − I02 =

∫∞lnK1s1t

c11− c12c11

m2

∫∞lnK1s1t

c12− c11c12

y1

(s1t ec11y1+c12y2 −K1)pa2,m2

(y2)dy2

pa1,m1(y1)dy1

−∫∞

lnK1s1t

c11− c12c11

m2

(s1t ec11y1+c12y2 −K1)pa1,m1

(y1)dy1.

Therefore|I2 − I0

2 | ≤ R1 +R2,


where

R1 =

∣∣∣∣∣∣∫∞ln K1s1t

c11− c12c11

m2

s1t ec11y1+c12m2 −

∫∞lnK1s1t

c12− c11c12

y1


(y2)dy2

pa1,m1(y1)dy1

∣∣∣∣∣∣and

R2 = K1

∫∞lnK1s1t

c11− c12c11

m2

1−∫∞

lnK1s1t

c12− c11c12

y1

pa2,m2(y2)dy2

pa1,m1(y1)dy1.

Note that

R1 =

∣∣∣∣∣∣∫∞ln K1s1t

c11− c12c11

m2

s1t ec11y1

ec12m2 −∫∞

lnK1s1t

c12− c11c12

y1

ec12y2pa2,m2(y2)dy2

pa1,m1(y1)dy1

∣∣∣∣∣∣ .However, using Proposition 22 and the definition of the Q integral,

ec12m2 −∫ ∞

lnK1s1t

c12− c11c12

y1

ec12y2pa2,m2(y2)dy2

can be rewritten as

ec12m2

1− e(c12)2a2

2 Q

ln

K1s1t

c12−(m2+c12a2)

− c11c12y1

√a2

.

Therefore, as Q(x) = 1−Q(−x) , we obtain that

ec12m2 −∫∞

lnK1s1t

c12− c11c12

y1

ec12y2pa2,m2(y2)dy2 = ec12m2

(1− e

(c12)2a22

)+

ec12m2e(c12)2a2

2 Q

c11c12

y1−

lnK1s1t

c12−(m2+c12a2)

√a2

.

Hence,

R1 ≤ ec12m2

∣∣∣∣1− e (c12)2a22

∣∣∣∣ ∫∞ln K1s1t

c11− c12c11

m2

s1t ec11y1pa1,m1

(y1)dy1+

s1t ec12m2e

(c12)2a22

∫∞lnK1s1t

c11− c12c11

m2

ec11y1Q

c11c12

y1−

lnK1s1t

c12−(m2+c12a2)

√a2

pa1,m1(y1)dy1.

In addition, note that

R2 = K1

∫∞lnK1s1t

c11− c12c11

m2

Q

c11c12

y1−

lnK1s1t

c12−m2

√a2

pa1,m1(y1)dy1.


To sum up, we have that

|J1 − J01 | ≤

5∑i=1

Ti,

where

T1 = s1t ec12m2e

(c12)2a22

∫ lnK1s1t

c11− c12c11

m2

−∞ ec11y1Q

ln

K1s1t

c12−(m2+c12a2)

− c11c12y1

√a2

pa1,m1(y1)dy1

T2 = K1

∫ lnK1s1t

c11− c12c11

m2

−∞ Q

ln

K1s1t

c12−m2

− c11c12y1

√a2

pa1,m1(y1)dy1

T3 = ec12m2

∣∣∣∣1− e (c12)2a22

∣∣∣∣ ∫∞ln K1s1t

c11− c12c11

m2

s1t ec11y1pa1,m1

(y1)dy1

T4 = s1t ec12m2e

(c12)2a22

∫∞lnK1s1t

c11− c12c11

m2

ec11y1Q

c11c12

y1−

lnK1s1t

c12−(m2+c12a2)

√a2

pa1,m1(y1)dy1,

and

T5 = K1

∫∞lnK1s1t

c11− c12c11

m2

Q

c11c12

y1−

lnK1s1t

c12−m2

√a2

pa1,m1(y1)dy1.

Note that previously (portfolio of two Digital options) we showed that there exists a constant C1 > 0 such thatTi ≤ C1

√a2 for a2 > 0 small enough and i = 2, 5.

In addition, there exists constants C2, C3 > 0 such that T3 ≤ C2(eC3a2 − 1). Therefore, we just need to findupper bounds of T1 and T4. We only show the proof for T1 as the proof for T4 is almost identical.

Consider the following integral

S =∫ ln

K1s1t

c11− c12c11

m2

−∞ ec11y1Q

ln

K1s1t

c12−(m2+c12a2)

− c11c12y1

√a2

pa1,m1(y1)dy1.

From Proposition 22, we have that

S = C∫ ln

K1s1t

c11− c12c11

m2

−∞1√

2πa1e−

(y1−(m1+c11a1))2

2a1 Q

ln

K1s1t

c12−m2−c12a2

− c11c12y1

√a2

dy1,

with C = em1c11+(c11)2

2 a1 .Splitting the integration interval we obtain that

S = CS1 + CS2,


where

S1 =∫ ln

K1s1t

c11− c12c11

m2− (c12)2

c11a2

−∞1√

2πa1e−

(y1−(m1+c11a1))2

2a1 Q

ln

K1s1t

c12−m2−c12a2

− c11c12y1

√a2

dy1

S2 =∫ ln

K1s1t

c11− c12c11

m2

lnK1s1t

c11− c12c11

m2− (c12)2

c11a2

1√2πa1

e−(y1−(m1+c11a1))2

2a1 Q

ln

K1s1t

c12−m2−c12a2

− c11c12y1

√a2

dy1.

Note that the integrand in S2 is bounded by 1. Therefore, there exists a constant C4 > 0 such that S2 ≤ C4a2.Using Proposition 21 (Chernoff bound of the Q function), we obtain that

S1 ≤∫ ln

K1s1t

c11− c12c11

m2− (c12)2

c11a2

−∞1√

2πa1e−

(y1−(m1+c11a1))2

2a1 e−

lnK1s1t

c12−m2−c12a2

− c11c12y1

2

2a2 dy1.

Define:

A =lnK1s1t

c11− c12

c11m2 − (c12)2

c11a2

d =lnK1s1t

c12− m2 − c12a2

c = c11c12

b = 2a2

a = 2a1

m = m1 + c11a1.

Using Proposition 23,

∫ A−∞ e−

(d−cy1)2

b e−(y1−m)2

a dy1 = e−(cm−d)2

ac2+b∫ A−∞ e

− ac2+bab (y1− acd+bmac2+b

)2dy1.

Therefore, there exists a constant C > 0 such that

S1 ≤ C∫ A

−∞

1√aπe− ac


)2dy1.

Consider the integral ∫ A

−∞

1√aπe− ac


)2dy1.

Lety1√

2=

√ac2 + b√ab

(y1 −

acd+ bm

ac2 + b

),

then dy1 =√ab√

2√ac2+b

dy1 and at y1 = A, y1 = A =√

2√ac2+b√ab


Therefore,

∫ A−∞

1√aπe− ac


)2dy1 =

√ab√

2πa√ac2+b

∫ A−∞ e−

y212 dy1

=√b√

ac2+bΦ(A),


where A =√

2√ac2+b√ab


Returning to our original variables we obtain that

A =

√2√

2a1(c11c12

)2+2a2√

2a12a2

(

lnK1s1t

c11− c12

c11m2 − (c12)2

c11a2

)−

2a1c11c12

lnK1s1t

c12−m2−c12a2

+2a2m1

2a1(c11c12

)2+2a2

=

√2√

2a2√2a1

lnK1s1t

c11− c12c11

m2−m1− (c12)2

c11a2√

2a1(c11c12

)2+2a2

=

√a2√a1

lnK1s1t

c11− c12c11

m2− (c12)2

c11a2−m1√

a1(c11c12

)2+a2

,

and √b√

ac2 + b=

√a2√

a1( c11c12 )2 + a2

.

Therefore, as a2 → 0 √a2√

a1( c11c12 )2 + a2

Φ(A) ∼ O(a212 ).

Hence, there exits C5 > 0 such thatS1 ≤ C5

√a2

for a2 > 0 small enough. As a consequence , we obtain that there exists constants C ′1, C′2, C

′3, C

′4 > 0 such that

|J1 − J01 | ≤ C ′1

√a2 + C ′2(eC

′3a2 − 1) + C ′4a2

for a2 > 0 small enough.We have obtained an upper bound under the assumption of c11, c12 > 0. In general, for the remaining cases

we have that

• If c11 > 0 and c12 < 0 then

|J1 − J01 | ≤

n∑i=1

Ti,

with

T1 = s1t ec12m2e

(c12)2a22

∫ lnK1s1t

c11− c12c11

m2

−∞ ec11y1Q

c11c12

y1−

lnK1s1t

c12−(m2+c12a2)

√a2

pa1,m1(y1)dy1

T2 = K1

∫ lnK1s1t

c11− c12c11

m2

−∞ Q

c11c12

y1−

lnK1s1t

c12−m2

√a2

pa1,m1(y1)dy1


T3 = ec12m2

∣∣∣∣1− e (c12)2a22

∣∣∣∣ ∫∞ln K1s1t

c11− c12c11

m2

s1t ec11y1pa1,m1

(y1)dy1

T4 = s1t ec12m2e

(c12)2a22

∫∞lnK1s1t

c11− c12c11

m2

ec11y1Q

ln

K1s1t

c12−(m2+c12a2)

− c11c12y1

√a2

pa1,m1(y1)dy1

T5 = K1

∫∞lnK1s1t

c11− c12c11

m2

Q

ln

K1s1t

c12−m2

− c11c12y1

√a2

pa1,m1(y1)dy1.

• If c11 < 0 and c12 > 0 then

|J1 − J01 | ≤

n∑i=1

Ti,

with

T1 = s1t ec12m2e

(c12)2a22

∫∞lnK1s1t

c11− c12c11

m2

ec11y1Q

ln

K1s1t

c12−(m2+c12a2)

− c11c12y1

√a2

pa1,m1(y1)dy1

T2 = K1

∫∞lnK1s1t

c11− c12c11

m2

Q

ln

K1s1t

c12−m2

− c11c12y1

√a2

pa1,m1(y1)dy1

T3 = ec12m2

∣∣∣∣1− e (c12)2a22

∣∣∣∣ ∫lnK1s1t

c11− c12c11

m2

−∞ s1t ec11y1pa1,m1

(y1)dy1

T4 = s1t ec12m2e

(c12)2a22

∫ lnK1s1t

c11− c12c11

m2

−∞ ec11y1Q

c11c12

y1−

lnK1s1t

c12−(m2+c12a2)

√a2

pa1,m1(y1)dy1

T5 = K1

∫ lnK1s1t

c11− c12c11

m2

−∞ ec11y1Q

c11c12

y1−

lnK1s1t

c12−m2

√a2

pa1,m1(y1)dy1.

• If c11 < 0 and c12 < 0 then

|J1 − J01 | ≤

n∑i=1

Ti,

with

T1 = s1t ec12m2e

(c12)2a22

∫∞lnK1s1t

c11− c12c11

m2

ec11y1Q

c11c12

y1−

lnK1s1t

c12−(m2+c12a2)

√a2

pa1,m1(y1)dy1


T2 = K1

∫∞lnK1s1t

c11− c12c11

m2

Q

c11c12

y1−

lnK1s1t

c12−m2

√a2

pa1,m1(y1)dy1

T3 = ec12m2

∣∣∣∣1− e (c12)2a22

∣∣∣∣ ∫lnK1s1t

c11− c12c11

m2

−∞ s1t ec11y1pa1,m1

(y1)dy1

T4 = s1t ec12m2e

(c12)2a22

∫ lnK1s1t

c11− c12c11

m2

−∞ ec11y1Q

ln

K1s1t

c12−(m2+c12a2)

− c11c12y1

√a2

pa1,m1(y1)dy1

T5 = K1

∫ lnK1s1t

c11− c12c11

m2

−∞ ec11y1Q

ln

K1s1t

c12−m2

− c11c12y1

√a2

pa1,m1(y1)dy1.

In the remaining cases, following the same procedure described above, we obtain that there exist constantsC1, C2, C3, C4 > 0 such that

|J1 − J01 | ≤ C1

√a2 + C2(ec3a2 − 1) + C4a2

for a2 > 0 small enough.

Monte Carlo simulations were used to plot the magnitude of the difference between the price of a portfolio of 2options with correlated underlying assets and the approximation obtained by making zero the smallest eigenvalue.

Figure 2.4 shows the plot for a portfolio of two digital options. 4000000 simulations were used. The followingvalues were assigned to the parameters and variables of the options: Sit = 100, Ki = 106 and Di = 10, fori = 1, 2, ∆ = T − t = 1, σ1 = 0.2, σ2 = 0.3, ρ = 0.3 and r = 0.02.

Figure 2.5 shows the plot for a portfolio of two call options. 4000000 simulations were used. The follow-ing values were assigned to the parameters and variables of the options: Sit = 100 and Ki = 94 for i = 1, 2,∆ = T − t = 1, σ1 = 0.2, σ2 = 0.3, ρ = 0.3 and r = 0.02.

2.5 Conclusion

We considered the problem of finding upper bounds of the magnitude of the change in price of a portfolio ofoptions after eigenvalues from the covariance matrix are modified. Upper bounds were obtained using estimatesof the L1 distance of Gaussian densities assuming that the portfolio’s payoff is bounded. Upper bounds were alsoobtained for portfolios of n uncorrelated single asset options, two digital options with correlated underlying assetsand two call options with correlated underlying assets.


Figure 2.4: Magnitude of the difference between the price of a portfolio of 2 digital options and the approximationobtained by making the smallest eigenvalue zero.

Figure 2.5: Magnitude of the difference between the price of a portfolio of 2 call options and the approximationobtained by making the smallest eigenvalue zero.

Chapter 3

Gaussian versus non-Gaussian factormodels

3.1 Introduction

Multiple factor models decompose an asset’s return into factors common to all assets and a specific factor. Theycan be used to predict returns, to generate estimates of abnormal returns, and to describe the covariance structureof returns. The common factors are often interpreted as capturing fundamental risk components. The factor modelisolates an asset’s sensitivity to these risk factors.There are three main types of multifactor models for asset returns: macroeconomic factor models, fundamentalfactor models, and statistical factor models. Macroeconomic factor models use observable economic time seriesas measures of common factors in asset returns. Fundamental factor models use observable firm or asset specificattributes such as firm size and industry classification to determine common factors in asset returns. Statisticalfactor models treat the common factors as unobservable or latent factors. Zivot, E. and Wang, J. (2006) providean excellent description of each type of model.Each of the three types of multifactor models for asset returns has the general form:

Ri,t = αi + β1,if1,t + β2,if2,t + · · ·βK,ifK,t + εi,t

= αi + β′ift + εi,t,

where Ri,t is the return (real or in excess of the risk-free rate) on asset i (i = 1, · · · , N ), fk,t is the k-th commonfactor (k = 1, · · · ,K) and εi,t is the asset specific factor in time period t (t = 1, · · · , T ), αi is the intercept andβk,i is the factor loading for asset i on the k-th factor.The multifactor model has the following assumptions:

• The factor realizations ft are stationary with unconditional moments

E[ft] = µf

Cov(ft) = E[(ft − µf )′(ft − µf )] = Ωf .

53

CHAPTER 3. GAUSSIAN VERSUS NON-GAUSSIAN FACTOR MODELS 54

• The asset specific error terms εi,t are uncorrelated with each of the common factors fk,t so that

Cov(fk,t, εi,t) = 0, for all k, i and t.

• The error terms are serially uncorrelated and contemporaneously uncorrelated across assets

Cov(εi,tεj,s) = σ2i for all i = j and t = s

= 0, otherwise.

In a macroeconomic factor model, the realizations ft are observed macroeconomic variables that are assumed tobe uncorrelated with the asset specific error terms εi,t. The two most common macroeconomic factor models areSharpe’s single factor model (1970) and Chen, Roll and Ross’s (1986) multifactor model. Because the macroeco-nomic factors are observed the problem is then to estimate the factor betas βk,i, residual variances σ2

i and factorcovariance Ωf , using time series regression techniques.Sharpe’s single factor model or market model, is given by

Ri,t = αi + βiRM,t + εi,t, i = 1, · · · , N and t = 1, · · · , T,

where RM,t denotes the return or excess return (relative to the risk free rate ) on a market index (typically a valueweighted index like the S&P 500 index) in time period t. The market index is meant to capture market risk, whilethe error term captures non-market firm specific risk.The general macroeconomic multifactor model specifies K observable macro-economic variables as the factorrealizations ft:

Ri = 1Tαi + Fβi + εi, i = 1, · · · , NE[εiε

′i] = σ2

i IT ,

where 1T is a (T × 1) vector of ones, F is a (T ×K) matrix of factor realizations (the t-th row of F is f ′t), βi isa (K × 1) vector of factor loadings, and εi is a (T × 1) vector of error terms with covariance matrix σ2

i IT .The general form of the covariance matrix for the macroeconomic factor model is

Ω = BΩfB′ +D,

where B = [β1, β2, · · · , βN ]′, Ωf = E[(ft − µt)(ft − µt)′] is the covariance matrix of the observed factors andD is a diagonal matrix with σ2

i = V ar(εi,t) along the diagonal.Because the factor realizations are observable, the parameter matrices B and D of the model may be estimatedusing time series regression. The covariance matrix of the factor realizations may be estimated using the timeseries sample covariance matrix.Fundamental factor models use observable asset specific characteristics (fundamentals) like industry classifica-tion, market capitalization, style classification (value, growth) to determine the common risk factors. The twomost popular ways to estimate the fundamental factors are the ‘BARRA’ approach introduced in 1975 by BarRosenberg, founder of BARRA Inc., which is discussed in detail in Grinold and Kahn (2000) and the ‘Fama-French’ approach proposed by Fama, E and French, K. (1992).In the BARRA, approach the observable asset specific fundamentals are treated as the factor betas βi which aretime invariant, while the factor realizations at time t, ft, are unobservable. The problem is then to estimate thefactor realizations at time t given the factor betas, which is done running T cross-section regressions.


The BARRA-type single factor model is given by

Rt = βft + εt, t = 1, · · · , T,

where β is a vector of observed values of an asset specific attribute (market capitalization, industry classification,style classification) and ft is an unobserved factor realization. It is assumed that

V ar(ft) = σ2f

Cov(ft, εi,t) = 0, for all i, tV ar(εi,t) = σ2

i , i = 1, · · · , N.

The factor realization ft is the parameter to be estimated for each time period t = 1, · · · , T . Because the errorterm εt is heteroskedastic, efficient estimation of ft is done by weighted least squares given that the specificvariances σ2

i are known, in fact

ft,wls = (β′D−1β)−1β′D−1Rt, t = 1, · · · , T,

where D is a diagonal matrix with σ2i along the diagonal.

A popular fundamental factor model with K factors is the following stylized BARRA-type industry factor modelwith K mutually exclusive industries

Rt = β1f1,t + · · · = βKfK,t + εt, t = 1, · · · , T= Bft + εt,

E[εtε′t] = D

Cov(ft) = Ωf ,

where Rt is an (N × 1) vector of returns, B = [β1, · · · , βK ] is a (N ×K) matrix of zeros and ones that reflectsthe industry factor sensitivities for each asset, ft = (f1,t, . . . , fK,t)

′ is a (K × 1) vector of unobserved factorrealizations, εt is an (N × 1) error term, and D is the diagonal matrix with σ2

i along the diagonal.The factor realizations are estimated as follows:

• For each t = 1, · · · , T an ordinary linear regression estimate is obtained,

ft,OLS = (B′B)−1B′Rt.

• Given the time series of factor realizations fOLS = (f1,OLS , · · · , fT,OLS), the covariance matrix of theindustry factors is estimated as the time series sample covariance

ΩFOLS = 1T−1

∑Tt=1(ft,OLS − fOLS)(ft,OLS − fOLS)′

fOLS = 1T

∑Tt=1 ft,OLS .

Similarly, the residual variances can be estimated using

σ2i,OLS = 1

T−1

∑Tt=1(εi,t,OLS − εi,OLS)2, i = 1, · · · , N.

εi,t,OLS = 1T

∑Ti=1 εi,t,OLS .


The covariance matrix of the N assets is then estimated by

ΩOLS = BΩFOLSB′ + DOLS ,

where DOLS is a diagonal matrix with σ2i,OLS along the diagonal.

• The estimates of the residual variances are used as weights for weighted least squares estimation of thefactor realizations

ft,GLS = (B′D−1OLSB)−1B′D−1

OLS(Rt − Rt1N ).

Using the time series of factor realizations (f1,GLS , · · · , fT,GLS), the covariance matrix of the industryfactors ΩFGLS and the residual variances σ2

i,GLS are estimated. Then the covariance matrix of the N assetsis re-estimated by

ΩGLS = BΩFGLSB′ + DGLS ,

where DGLS is a diagonal matrix with σ2i,GLS along the diagonal.

In the Fama-French approach, the factor realizations are determined using a two step precedure, given observedasset characteristics. First, the cross section of assets is sorted based on the values of the asset specific character-istics. Then a hedge portfolio is constructed that is long in the top quintile and short in the bottom quintile of thesorted assets. The observed return of this hedge portfolio at time t is the observed factor realization for the assetspecific characteristic. The process is repeated for each asset specific characteristic. Then, given the observedfactor realizations for t = 1, · · · , T the factor betas for each asset are estimated using N time series regressions.In statistical factor models the factor realizations ft are not directly observable and most be extracted from theobservable returns Rt using statistical methods. The two main methods are factor analysis and principal compo-nents.The traditional factor analysis assumes that

Rt = µ+Bft + εt

Cov(ft, εs) = 0, for all t, sE[ft] = 0

E[εt] = 0

V ar(ft) = IK

V ar(εt) = D,

where D is a diagonal matrix with σ2i along the diagonal. Assuming that returns are jointly normally distributed

and temporarily i.i.d. B and D can be estimated using maximum likelihood. Given estimates B and D, anempirical version of the factor model can be constructed as

Rt − µ = Bft + εt,

where µ is the sample mean vector of Rt. The factor realizations in a given time period t, ft, are then estimatedusing the cross sectional generalized least squares regression,

ft = (B′D−1B)−1B′D−1(Rt − µ)


and the estimated factor model covariance matrix is given by

ΩF = BB′ + D.

Principal components analysis is used to explain the majority of the information in the sample covariance matrixof returns. With N assets there are N principal components, and these principal components are just linear com-binations of returns. The principal components are constructed and ordered so that the first principal componentexplains the largest portion of the sample covariance matrix of returns, the second principal component explainsthe next largest portion, and so on. The principal components are constructed to be orthogonal to each other andto be normalized to have unit length. The K most important principal components are the factor realizations. Thefactor loadings are then estimated using regression techniques.Macroeconomic factor models and in particular Sharpe’s single index model assume time invariant parameters.This assumption may be inadequate, because a constant estimate often fails to capture the dynamic relation be-tween an individual investment and the market. A simple and applicable technique to account for dynamic changesin the model’s parameters is a rolling window ordinary least squares regression. The intercept αt and the factorloading βt at time t are determined by regressing return series from t − τ to t − 1. The determination of thewindow’s length, τ , is somewhat ad hoc.We consider four 1-factor models with time varying parameters. Each of the considered models has the samegeneral structure. For each t ≥ τ + 1,

Rf,t−j = αt + βtRm,t−j + εt−j , for all j = 1, · · · , τ , (3.1)

Rf,t−j = rf,t−j − rF,t−j and Rm,t−j = rm,t−j − rF,t−j where rf,t−j , rm,t−j and rF,t−j denote the daily returnof an asset, the daily return of the market and the daily risk free rate respectively, at time t− j.The differences among the considered models lie on the assumptions on the model components. Model 1 assumesthat the residuals (εt−j) are i.i.d. such that εt−j ∼ N(0, σ2

ε,t) for all j = t − τ, · · · , t − 1, which is a standardassumption of factor models. The other models combine the structure of a 1-factor model with time varyingcoefficients, with dynamic volatility assumptions on the model components. Model 2 assumes that the seriesof residuals (εt−j) follows a GARCH(1,1) process. Model 3 assumes that the series of centered excess marketreturns (Rm,t−j − µm,t) follows a GARCH(1,1) process and the series of residuals (εt−j) is i.i.d. such thatεt−j ∼ N(0, σ2

ε,t) for all j = t − τ, · · · , t − 1. Model 4 assumes that both the series of centered excess assetreturns and the series of centered excess market returns follow GARCH(1,1) processes which leads us to considerweak GARCH(2,2) residuals (εt−j) for this model. The four models previously mentioned are then used toestimate the time varying alpha and beta of three different hedge fund strategies.This chapter starts with a description of the assumptions, and most common estimation methods of the linearregression model. In section 3.3 basic concepts and results from the theory of time series are introduced. Section3.4 focuses on the GARCH model and its estimation methods. In section 3.5 the four 1-factor models mentionedabove and their main assumptions and estimation methods are discussed. Section 3.6 shows an application ofthese models to hedge fund return modelling, in particular, the performance and market exposure of three differenthedge fund strategies are estimated and results are compared. Section 3.7 concludes this chapter.


3.2 Linear regression and estimation methods

Linear regression plays a key role in factor modeling. In fact, rolling window linear regression has become thestandard method to estimate dynamic alpha and betas in macroeconomic factor models. This section providesa detailed description of the simple linear regression model, its main assumptions, properties, and estimationmethods.

Definition 3.2.1. The simple linear regression model is given by the equation:

yt = β1 + β2xt + εt (3.2)

where (εt)nt=1 are independent and identically distributed Gaussian random variables with mean 0 and variance

σ2.

In general, we want to estimate the parameters β1 and β2 given a set of observations of a dependent variableyt and a single independent variable xt. The total number of observations is called the sample size and is denotedby n. In practice, when the successive observations are ordered in time, it often seem plausible that an error termwill be correlated with neighbouring error terms. This phenomenon is called serial correlation, when there isserial correlation, the error terms cannot be i.i.d. because they are not independent. Another possibility is that thevariance of the error terms may be systematically larger for some observations than for others, which may indicatethe presence of conditional heteroskedasticity. Later on we discuss models that capture these types of behavior,the ARMA and GARCH models.The simple linear regression can be written in matrix form:

y = Xβ + ε

with

y =

y1

y2

...yn

, ε =

ε1

ε2...εn

, X =

1 x1

1 x2

...1 xn

and β =

[β1

β2

]

Different methods can be used to estimate the model parameters β1 and β2. The most popular ones are least

squares and maximum likelihood estimation.

3.2.1 Least squares estimation

Consider the expression yt − Xtβ, if β is allowed to vary arbitrarily, then this difference is called the residual

associated with the t-th observation. Similarly the n-dimensional vector y −Xβ is called the vector of residuals.The sum of squares of the components of the vector of residuals is called the sum of squared residuals, or SSR.The idea of least squares estimation is to minimize the sum of squared residuals associated with a regressionmodel. In particular for the ordinary linear regression we have that:

SSR(β) =

n∑t=1

(yt −Xtβ)2


which can be rewritten in matrix form as

SSR(β) = (y −Xβ)′(y −Xβ).

Proposition 28. The least squares estimator of β from the simple linear regression model is

β = (X ′X)−1X ′y.

Proof. We must find a vector β that minimizes SSR(β) = (y −Xβ)′(y −Xβ),

SSR(β) = (y −Xβ)′(y −Xβ)

= (y′ − (Xβ)′)(y −Xβ)

= y′y − y′(Xβ)− (Xβ)′y + (Xβ)′(Xβ).

Therefore,∂SSR

∂β= −2y′X + 2β′(X ′X).

Hence,∂SSR∂β = 0 ⇔ β′(X ′X)− y′X = 0

⇔ (X ′X)β −X ′y = 0.

If X is full rank then X ′X is positive definite and

β = (X ′X)−1X ′y.

Let’s consider some statistical properties that are desirable for an estimator to possess.

Definition 3.2.2. Suppose that θ is an estimator of some parameter θ, the true value of which is θ0, then the bias

of θ is defined as E(θ)− θ0. If the bias of an estimator is zero for every admissible value of θ0, then the estimatoris said to be unbiased.

If the matrixX is considered fixed or non-stochastic then the least squares estimator of the OLR β is unbiased.However this assumption is often too strong, so usually it is assumed that ε and X are independent, or the evenweaker exogeneity assumption is made

E(ε|X) = 0 (3.3)

to show that β is unbiased.

Theorem 29. If it is assumed that E(ε|X) = 0 then the ordinary least squares estimator β is unbiased.

Proof. Let β0 be a vector with the true parameters of model (3.2). We have that:

β = (X ′X)−1X ′y

= (X ′X)−1X ′(Xβ0 + ε)

= β0 + (X ′X)−1X ′ε.


Therefore,E(β) = β0 + E((X ′X)−1X ′ε)

= β0 + E(E((X ′X)−1X ′ε|X)

)= β0 + E((X ′X)−1X ′E(ε|X))

= β0.

Definition 3.2.3. Let a(yn) be a vector function of a random vector yn, a(yn) converges in probability to arandom vector a0 if , for all ε > 0,

limn→∞

P (||a(yn)− a0|| < ε) = 1.

We denote the probability limit by plim, so plimn→∞a(yn) = a0 means that a(yn) converges in probability toa0.

Theorem 30 (Law of large numbers). Let x be the sample mean of xt, with t = 1, · · · , n, a sequence of random

variables, each with expectation µ. Then, provided the xt are independent, a law of large numbers states that

plimn→∞x = plimn→∞1

n

n∑t=1

xt = µ

Proof. See Rosenthal, J. (2006), p. 60.

Definition 3.2.4. A consistent estimator is one for which the estimate converges (in probability) to the quantitybeing estimated as the size of the sample tends to infinity.

Theorem 31. If plimn→∞X′X = SX′X where SX′X is a non-stochastic matrix with full rank and the (exogene-

ity) condition (3.3) is assumed then the ordinary least squares estimator β is consistent.

Proof. When proving Theorem (29), we showed that

β = β0 + (X ′X)−1X ′ε.

Therefore,plimn→∞β = β0 + plimn→∞(X ′X)−1X ′ε

= β0 +(plimn→∞

1nX′X)−1

plimn→∞1nX′ε

= β0 + (SX′X)−1plimn→∞1nX′ε.

However, from Theorem (30) (Law of large numbers) and the exogeneity condition, we obtain that

plimn→∞1nX′ε = plimn→∞

∑nt=1X

′tεt

= E(X ′tεt)

= 0.

Note thatE(X ′tεt) = E(E(X ′tεt|Xt))

= E(X ′tE(εt|Xt))

= 0

because E(εt|Xt) = 0.


Definition 3.2.5. One estimator is said to be more efficient than another if, on average the former yields moreaccurate estimates than the latter.

Theorem 32 (Gauss-Markov Theorem). If it is assumed that E(ε|X) = 0 and E(εε′|X) = σ2I in the linear

regression model (3.2), then the ordinary least squares estimator β is more efficient than any other linear unbiased

estimator β.

Proof. See Davidson, R. and MacKinnon, J. (2004), pp. 107-108.

According to Theorem 32 (Gauss-Markov), if the parameters of a regression model are to be estimated effi-ciently by least squares, the error terms must be uncorrelated and have the same variance:

E(εε′) = σ2I.

However, regression models can have error terms that are heteroskedastic, serially correlated, or both.Consider the generalized linear regression model:

y = Xβ + ε, E(εε′) = Ω, (3.4)

where Ω, the covariance matrix of the error terms, is a positive definite n× n matrix.If Ω is equal to σ2I , it is just the linear regression model with uncorrelated and homoskedastic errors. If Ω isdiagonal with non constant diagonal elements, then the errors are still uncorrelated, but they are heteroskedastic.If Ω is not diagonal, then εi and εj are correlated whenever Ωij the ijth element of Ω, is non zero.An efficient estimator for this model can be obtained transforming the regression so that it satisfies the conditionsof Theorem 32 (Gauss-Markov). This transformation is expressed in terms of an n×n matrix Ψ, which is usuallytriangular, that satisfies the equation

Ω−1 = ΨΨ′. (3.5)

Multiplying the generalized linear regression equation by Ψ gives

Ψ′y = Ψ′Xβ + Ψ′ε.

In addition,E(Ψ′εε′Ψ) = Ψ′E(εε′)Ψ

= Ψ′ΩΨ

= Ψ′(ΨΨ′)−1Ψ

= Ψ′(Ψ′)−1(Ψ)−1Ψ

= I.

Therefore, we obtain that

Ψ′y = Ψ′Xβ + Ψ′ε, E(Ψ′εε′Ψ) = I. (3.6)

Because the covariance matrix Ω is non singular, the matrix Ψ must be as well, and so the transformed regressionmodel (3.6) is equivalent to the original model (3.4).The ordinary least squares estimator of β from regression model (3.6) is

βGLS = (X ′ΨΨ′X)−1X ′ΨΨ′y

= (X ′Ω−1X)−1X ′Ω−1y.(3.7)


This estimator is called the generalized least squares estimator of β. Because Ψ is constant or non deterministicunder the assumptions of Theorems (29), (31) and (32), the estimator βGLS is unbiased, consistent and efficientrespectively.The generalized least squares estimator βGLS can also be obtained by minimizing the generalized least squarescriterion function

(y −Xβ)′Ω−1(y −Xβ),

which is the sum of squared residuals from the transformed regression (3.6). This criterion function can bethought of as a generalization of the SSR function in which the squares and cross products of the residuals fromthe original equation (3.4) are weighted by the inverse of the matrix Ω. When Ω is a diagonal matrix , eachobservation is simply given a weight proportional to the inverse of the variance of its error term. This is the casewhen the errors terms are heteroskedastic but uncorrelated.Let ω2

t denote the tth diagonal element of Ω. Then Ω−1 is a diagonal matrix with tth diagonal element ω−2t , and

Ψ can be chosen as the diagonal matrix with tth diagonal element ω−1t . Therefore, for t = 1, · · · , n we can write

ω−1t yt = ω−1

t Xtβ + ω−1t εt.

This regression can be estimated by ordinary least squares. This special case of generalized least squares is oftencalled weighted least squares. Note that observations for which the variance of the error terms are large, are givenlow weights, and observations for which it is small are given high weights.

Remark. If the vector of regression functions was x(β) instead of Xβ we can obtain generalized non-linear leastsquares estimates by minimizing the criterion function

(y − x(β))′Ω−1(y − x(β)). (3.8)

If we differentiate equation (3.8) with respect to β and divide the result by −2, we obtain that

X ′(β)Ω−1(y − x(β)) = 0, (3.9)

where X(β) is the matrix of derivatives of x(β) with respect to β. Solving equation (3.9) requires some sort ofminimization procedure.

In practice, the covariance matrix Ω is often not known. However, in many cases it is reasonable to supposethat Ω depends in a known way on a vector of unknown parameters γ. If so, it may be possible to estimate γconsistently, so as to obtain Ω(γ). Then Ψ(γ) can be defined as in equation (3.5) and generalized least squaresestimates computed conditional on Ψ(γ). This type of procedure is called feasible generalized least squares.Under suitable regularity conditions it can be shown that this type of procedure yields a feasible generalized leastsquares estimator βF that is consistent and asymptotically equivalent to the generalized least squares estimatorβGLS , see Amemiya, T. (1973).

3.2.2 Maximum likelihood estimation

Let y be an n-dimensional vector of observations of a random variable, for a given k-vector θ of parameters, letthe joint density function of y be written as f(y, θ). If f(y, θ) is evaluated at the n-dimensional y found in a givendata set, then the function f(y, ·) of the model parameters is referred to as the likelihood function of the model fora given data set. The maximum likelihood estimation method maximizes the loglikelihood function with respect


to the parameters. A parameter vector θ at which the likelihood takes on its maximum value is called a maximum

likelihood estimate of the parameters.Often, the successive observations in a sample are assumed to be independent. In that case the joint density ofthe entire sample is the product of the densities of the observations. Let f(yt, θ) denote the probability densityfunction of the tth observation. Then the joint density of the entire sample y is

f(y, θ) =

n∏t=1

f(yt, θ). (3.10)

However, it is customary to work instead with the log-likelihood function

L(y, θ) ≡ log f(y, θ) =

n∑t=1

Lt(y, θ), (3.11)

where Lt(y, θ), the contribution to the loglikelihood function made by observation t, is equal to log ft(yt, θ).Whatever value of θ that maximizes the log-likelihood function (3.11) will also maximize the likelihood function(3.10) because L(y, θ) is just a monotonic transformation of f(y, θ).For regression models, if we make the assumption that the error terms are normally distributed, the maximumlikelihood estimators coincide with some of the least squares estimators that we mentioned previously. Maximumlikelihood estimation can be applied to a wide variety of models other than regression models, yielding estimatorswith excellent asymptotic properties. However, its main disadvantage is that it requires stronger distributionalassumptions than other estimation methods.Consider the classical normal linear model

y = Xβ + u, u ∼ N(0, σ2I). (3.12)

For this model, the explanatory variables in the matrix X are assumed to be exogenous. Therefore, to constructthe likelihood function, we can use the density of y conditional on X . Note that yt is distributed, conditionally onX as N(Xtβ, σ

2), so the probability density function of yt is

ft(yt, β, σ) =1

σ√

2πexp

(− (yt −Xtβ)

2

2σ2

).

The contribution to the log-likelihood function made by the tth observation is

Lt(yt, β, σ) = −1

2log 2π − 1

2log σ2 − 1

2σ2(yt −Xtβ)2.

Because the yt are independent, the log-likelihood is the sum of these contributions over all t:

L(y, β, σ) = −n2 log 2π − n2 log σ2 − 1

2σ2

∑nt=1(yt −Xtβ)2

= −n2 log 2π − n2 log σ2 − 1

2σ2 (y −Xβ)T (y −Xβ).(3.13)

To find the Maximum Likelihood estimator, we need to maximize (3.13)

Theorem 33. The Maximum Likelihood estimator of β coincides with the ordinary least squares estimator under

the assumptions of the classical normal linear model.

Proof. To maximize L(y, β, σ), we first differentiate (3.13) with respect to σ, then we solve the resulting first


order condition for σ as a function of the data and the remaining parameters, and then substitute back into (3.13).Differentiating (3.13) with respect to σ and equating the derivative to zero, yields the first order equation

∂L(y, β, σ)

∂σ= −n

σ+

1

σ3(y −Xβ)′(y −Xβ) = 0, (3.14)

and solving (3.14) yields the result that

σ2(β) =1

n(y −Xβ)′(y −Xβ).

Substituting σ2(β) into equation (3.13), we obtain the concentrated likelihood function

Lc(y, β) = −n2

log 2π − n

2log

(1

n(y −Xβ)′(y −Xβ)

)− n

2.

Maximizing the concentrated log-likelihood function is equivalent to minimizing the sum of squared residualsas function of β. Therefore the maximum likelihood estimator β must be identical to the ordinary least squaresestimator.

Remark. Note that the fact that the maximum likelihood estimator and the ordinary least squares estimator areidentical depends critically on the assumption that error terms are normally distributed. A different distributionwould have yield a different maximum likelihood estimator.

One of the attractive features of the maximum likelihood estimation is that the estimators are consistent underquite weak regularity conditions and asymptotically normally distributed under stronger conditions.

Theorem 34. Let L(y, θ) denote a likelihood function , then under the following assumptions :

1. If L(y, θ1) = L(y, θ2) then θ1 = θ2 for all y (finite sample identification condition)

2. If θ0 is the true model parameter then p limn−1L(y, θ∗) 6= p limn−1L(y, θ0) for all θ∗ 6= θ0 (asymptotic

identification condition)

The Maximum Likelihood estimator θMLE is consistent.

Proof. Let L(θ) = exp(L(θ)) denote the likelihood function, note that the dependency on y of both L and L hasbeen suppressed for notational simplicity. Suppose that θ0 is the true parameter vector and θ∗ another vector inthe parameter space of the model.Since the logarithm function is strictly concave over the nonnegative real line and log-likelihood functions arenonnegative, using Jensen’s Inequality: If X is a real valued random vector, then E(h(X)) ≤ h(E(X)) wheneverh(·) is concave; we obtain that

E0 log

(L(θ∗)

L(θ0)

)< logE0

(L(θ∗)

L(θ0)

).

Note that this inequality holds strictly for all θ∗ 6= θ0 because of our first assumption. In addition, E0(·) denotesthe expectation taken with respect to the true parameter θ0. Since the joint density of the sample is the likelihoodfunction evaluated at θ0, we have that

E0

(L(θ∗)

L(θ0)

)=

∫L(θ∗)

L(θ0)L(θ0)dy =

∫L(θ∗)dy = 1.


Therefore,

logE0

(L(θ∗)

L(θ0)

)= 0.

As a consequence,

E0 log

(L(θ∗)

L(θ0)

)= E0L(θ∗)− E0L(θ0) < 0. (3.15)

Applying a Law of Large Numbers to the contributions to the log-likelihood function, then

p limn−1L(θ) = limn−1E0L(θ).

As a consequence, from (3.15)

p limn→∞

1

nL(θ∗) ≤ p lim

n→∞

1

nL(θ0).

However, since θMLE maximizes L(θ)

p limn→∞

1

nL(θMLE) ≥ p lim

n→∞

1

nL(θ0).

Therefore,p limn→∞

1

nL(θMLE) = p lim

n→∞

1

nL(θ0).

This result on its own does not prove that θMLE is consistent, because there may be other vectors θ∗ for whichp limn→∞

1nL(θ∗) = p limn→∞

1nL(θ0), however our second assumption rules out this possibility, hence the

maximum likelihood estimator θMLE is consistent.

Maximum Likelihood estimators have other attractive features, under certain regularity assumptions they areasymptotically normally distributed and asymptotically more efficient than Least Squares estimators, howeverthey are not unbiased in general. See Davidson, R. and MacKinnon, J. (2004).

Let’s consider next the structure of the likelihood and log-likelihood functions when successive observationsare not independent. Suppose that y1, y2 and y3 are random variables defined in the same probability space then

f(y1, y2, y3) = f(y1, y2)f(y3|y1, y2).

However,f(y1, y2) = f(y1)f(y2|y1).

Therefore,f(y1, y2, y3) = f(y1)f(y2|y1)f(y3|y1, y2).

In general, for a sample of size n,

f(y1, y2, · · · , yn) = f(y1)f(y2|y1) · · · f(yn|y1, · · · , yn−1).

This result can be rewritten as

f(yn) =

n∏t=1

f(yt|yt−1),

where yt is a vector with components y1, y2, · · · , yt. The vector yt can be thought as a subsample consisting ofthe first t observations of the full sample.


For a model to be estimated by maximum likelihood, the density f(yn) will depend a on k-dimensional vector ofparameters θ, and we can then write

f(yn, θ) =

n∏t=1

f(yt|yt−1; θ). (3.16)

The structure of (3.16) is a straightforward generalization of that of (3.10) with the marginal densities of successiveobservations replaced by densities conditional on the previous observations.The log-likelihood function corresponding to (3.16) has an additive structure

L(y, θ) ≡ log f(yn, θ) =

n∑t=1

Lt(yt, θ), (3.17)

where Lt(yt, θ) is the contribution to the log-likelihood function made by observation t. Note that, in the contri-butions Lt(·) to the log-likelihood we do not distinguish between current variable yt and the lag variables in thevector yt−1. Therefore, (3.17) has exactly the same structure as (3.11).

3.3 Time series and stationary processes

The standard time series analysis rests on important concepts such as stationarity, white noise, innovation, auto-correlation, and on a central family of models, the autoregressive moving average (ARMA) models.

Definition 3.3.1. A sequence of random variables (Xt)t∈Z defined on the same probability space is called a time

series, and is an example of a discrete stochastic process.

Stationarity plays a central part in time series analysis, because it replaces in a natural way the assumptionof independent and identically distributed (i.i.d.) distributions in standard statistics, which is often unrealistic inpractice.Next, we introduce two standard notions of stationarity.

Definition 3.3.2. The process (Xt) is said to be strictly (or strongly) stationary if the vectors (X1, · · · , Xk)′ and(X1+h, · · · , Xk+h) have the same joint distribution, for any k ∈ N and h ∈ Z.

The following notion may seem less demanding because it only constraints the first two moments of thevariables Xt, but contrary to strict stationarity, it requires the existence of these moments.

Definition 3.3.3. The process (Xt) is said to be second-order stationary if

• E(Xt) = m, for all t ∈ Z,

• Cov(Xt, Xt+h) = γX(h), for all h ∈ Z.

The function γX(·)(ρX(·) ≡ γX(·)

γX(0)

)is called the autocovariance function (autocorrelation function).

The simplest example of a second-order stationary process is white noise. This process is particularly impor-tant because it allows more complex stationary processes to be constructed.

Definition 3.3.4. The process (εt) is called weak white noise, if for some constant σ2 > 0 :

• E(εt) = 0, for all t ∈ Z,


• E(ε2t ) = σ2, for all t ∈ Z,

• Cov(εt, εt+h) = 0, for all t, h ∈ Z, h 6= 0.

Remark. Note that no independence assumption was made in the definition of weak white noise. The variables atdifferent times are only uncorrelated and this distinction is particularly crucial for time series, however sometimesit is necessary to replace this assumption for a stronger one: the variables εt and εt+h are independent andidentically distributed for all t, h ∈ Z; the process is then said to be strong white noise.

The aim of time series analysis is to construct a model for the underlying stochastic process. This model isthen used for analyzing the causal structure of the process and to obtain optimal predictors. The autoregresivemoving average (ARMA) is the most widely used model for the prediction of second-order stationary processes.ARMA models can be viewed as a natural consequence of a fundamental result from 1938 due to Herman Wold.

Theorem 35 (Wold’s Theorem). Any centered second-order stationary, and purely non-deterministic process,

admits an infinite moving average representation of the form

Xt = εt +

∞∑i=1

ciεt−i, (3.18)

where (εt) is the linear innovation process of (Xt), that is

εt = Xt − E(Xt|HX(t− 1)), (3.19)

whereHX(t− 1) denotes the Hilbert space generated by the random variables Xt−1, Xt−2, · · · and

E(Xt|HX(t− 1)) denotes the orthogonal projection of Xt onto HX(t− 1). The sequence of coefficients (ci) is

such that∑i c

2i <∞.

Proof. See Anderson, T. (1971) pp. 420-421.

Remark. A stationary process (Xt) is said to be purely deterministic if and only if⋂∞n=−∞HX(n) = 0, where

HX(n) denotes, in the Hilbert space of the real, centered, and square integrable variables, the subspace constitutedby the limits of the linear combinations of the variables Xn−i, i ≥ 0. Therefore, for a purely nondeterministic(or regular) process, the linear past, sufficiently far away in the past, is of no use in predicting future values. SeeBrockwell, P. and Davis, R. (1991).

Remark. Note that in equation (3.19), the equivalence classE(Xt|HX(t−1)) is identified with a random variable.

Truncating the infinite sum in (3.18), we obtain the process

Xt(q) = εt +

q∑i=1

ciεt−i,

called a moving average process of order q, or MA(q). We have that

||Xt −Xt(q)||22 = Eε2t∑i>q c

2i → 0, as q →∞.

It follows that the set of all finite order moving average processes is dense in the set of second order stationary andpurely deterministic processes. The class of ARMA models is often preferred to the MA models for parsimonyreasons, because they generally require fewer parameters.


Remark. A linear innovation process can be characterized by two properties i.e. (εt) is linear innovation processof (Xt) if (εt) is white noise and Cov(εt, Xt−k) = 0 for all k > 0.

In an ARMA model the current value Xt of the process is expressed as a linear function of the past values ofthe process and of current and past values of a white noise process (εt), i.e., of a sequence of uncorrelated andhomoscedatic variables with mean zero.

Definition 3.3.5. A second-order stationary process (Xt) is called ARMA(p,q), where p and q are integers, if thereexist real coefficients c, a1, · · · , ap, b1, · · · , bq , such that

Xt −∑pi=1 aiXt−1 = c+ εt +

∑qj=1 bjεt−j , for all t ∈ Z, (3.20)

where (εt) is the linear innovation process of (Xt).

Theorem 36 (Characterization of an ARMA process). Let (Xt) denote a second order stationary process. We

have that

ρX(h) +∑pi=1 aiρX(h− i) = 0, for all |h| > q

if and only if (Xt) is an ARMA(p, q) process.

Proof. See Brockwell, P. and Davis, R. (1991), pp. 89-90.

Remark. For statistical convenience, ARMA models are often used under stronger assumptions on the noise thanthat of white noise. If in Definition 3.3.5, (εt) is assumed to be strong white noise, then (Xt) is said to be a strong

ARMA process.

Equation (3.20) can be rewritten symbolically in a more compact form:

a(L)Xt = c+ b(L)εt, (3.21)

where a(z) = 1−a1z−· · ·− apzp and b(z) = 1 + b1z+ · · ·+ bpzq and L is the backward shift operator defined

by Ljzt = zt−j . The polynomials a(·) and b(·) are called autoregressive and moving average polynomials.The coefficients a1, · · · , aq, b1, · · · , bq are usually subject to stability constraints. These restrictions concern thezeros of the autoregressive and moving average polynomials:

a(z) ≡ 1− a1z − · · · − apzp 6= 0, for all |z| ≤ 1;

b(z) ≡ 1 + b1z + · · ·+ bqzq 6= 0, for all |z| ≤ 1;

Under these stability assumptions the polynomials a(·) and b(·) can be inverted, yielding two alternative repre-sentations of the process, an infinite moving average representation:

Xt = a(L)−1c+ a(L)−1b(L)εt

and an infinite autoregresive representation:

b(L)−1a(L)Xt = b(L)−1c+ εt.

The success of ARMA models is due to the tractability of the model, which is linear in the variables and some ofthe parameters at the same time. This first linearity feature yields forecast formulas that are easy to implement,


and the second one allows for estimation of the parameters using linear least squares methods.Note that making µ = c

1−∑ni=1 ai

we can rewrite equation (3.20) as

(Xt − µ)−p∑i=1

ai(Xt−i − µ) = εt +

q∑j=1

bjεt−j . (3.22)

Different methods have been proposed for the estimation of the parameters of ARMA models.Let (Xt) be an ARMA(p, q) process, from equation (3.22) we have that

(Xt − µ)− a1(Xt−1 − µ)− · · · − ap(Xt−p − µ) = εt + b1εt−1 + · · ·+ bqεt−q

for some real constants µ, a1, · · · , ap, b1, · · · , bq . Assume that (εt) is a sequence of i.i.d. random variables suchthat εt ∼ N(0, σ2) for all t ∈ Z.The parameters of this model can be estimated using a two step least squares method: first, compute µ using thesample average i.e. µ = 1

n

∑ni=1Xt; second, define Yt = Xt − µ and regress Yt on its p lags Yt−1, · · · , Yt−p

using ordinary least squares; then, for each of the q lags of the error take

εt = Yt −p∑i=1

πiYt−i,

where (πi)pi=1 are the OLS estimates obtained in the previous step, write Yt in the form

Yt = a1Yt−1 + · · ·+ apYt−p + b1εt−1 + · · · bq εt−q + εt

and regress Yt on Yt−1, · · · , Yt−p, εt−1 · · · , εt−q using ordinary least squares, to obtain estimates a1, · · · , ap,b1, · · · , bq of the coefficients. See Durbin, J. (1960).Maximum likelihood is widely used in the estimation of ARMA models. The estimation of the parameters canbe done in two different ways. One approach is to try to determine the joint density f(X1, · · · , Xn; θ) directlywhich requires among other things estimates of the n × n covariance matrix. An alternative approach relies onthe factorization of the joint density into a series of conditional densities and the density of a set of initial values:

L(Xn, θ) =

n∑t=p+1

log f(Xt|Xt−1; θ) + log f(Xp, θ). (3.23)

Note that Xk is a k-dimensional vector with components X1, X2, · · · , Xk and θ = (µ, a1, · · · , ap, b1, · · · , bq)denotes the vector of parameters from equation (3.22). Two types of maximum likelihood estimates can be com-puted. The first type is obtained maximizing the conditional log-likelihood function (first term in equation (3.23)),while the second type requires maximizing the exact log-likelihood function L(Xn, θ). For stationary models,both estimators are consistent and have the same limiting distribution.

3.4 Financial series and GARCH models

Modelling financial series is a complex problem. This complexity is mainly due to the existence of statisticalregularities (stylized facts) which are common to a large number of financial series and are difficult to reproduce


artificially using stochastic models.Most of these stylized facts were put forward in a paper by Mandelbrot, B. (1963). Since then they have beendocumented and completed by many empirical studies. Let (rt) where rt = pt−pt−1

pt−1be a series of daily price

returns. The following properties have been amply commented upon in the financial literature:

• Sample paths of returns are generally compatible with the second order stationarity assumption. The returnsoscillate around zero and while the oscillations vary a great deal in magnitude they are almost constant inaverage over long subperiods.

• The series of price returns generally displays small autocorrelations, making it close to white noise.

• Squares returns (r2t ) or absolute returns (|rt|) are generally strongly autocorrelated.

• Volatility clustering. Large absolute returns |rt| tend to appear in clusters. Turbulent (high-volatility)subperiods are followed by quiet (low-volatility) periods.

• Fat tails. When the empirical distribution of daily returns is drawn, one can generally observe that it doesnot resemble a Gaussian distribution. The densities have fat tails and are sharply peaked at zero: they arecalled leptokurtic. A measure of the leptokurticity is the kurtosis coefficient, defined as the ratio of thesample fourth order moment to the squared sample variance. It is asymptotically equal to 3 for Gaussiani.i.d. distributions. This coefficient is much greater than 3 for return series.

• Leverage Effect. The so called leverage effect was noted by Black, F. (1976), and involves asymmetry ofthe impact of past positive and negative values on the current volatility. Negative returns (corresponding toprice decreases) tend to increase volatility by a larger amount than positive returns (price increases) of thesame magnitud.

Any satisfactory model for daily returns must be able to capture the main stylized facts described in the previoussection. Particularly important are, the leptokurticity, the unpredictability of returns, and the existence of positiveautocorrelations in the square and absolute value returns.The fact that large absolute returns tend to be followed by large absolute returns (whatever the sign of the pricevariations might be) is hardly compatible with the assumption of constant conditional variance. This phenomenonis called conditional heteroscedasticity:

V ar(rt|rt−1, rt−2, · · · ) 6= constant.

Remark. Note that conditional heteroscedaticity is perfectly compatible with stationarity (in the strict and secondorder senses).

It is time now to introduce the concept of information set. The information set at time t is denoted by Ωt andcontains all events measurable at time t, in particular it contains realizations of all variables that have occuredon or before t. Given a time series (Xt), we denote the conditional expection and conditional variance of thestochastic variable Xt given the information set at time s by E(Xt|Ωs) and V ar(Xt|Ωs) respectively.

Proposition 37. If (εt) is a strong ARMA(p,q) process i.e. there exist constants c, a1, · · · , ap, b1, · · · , bq such that

εt = c+∑pi=1 aiεt−i + zt +

∑qj=1 bjzt−j ,

where (zt) is strong white noise with mean 0 and variance σ2z . Then the variance of εt conditional on the infor-

mation set at time t− 1 is constant and equal to σ2z .


Proof. Let (εt) be a (strong) ARMA(p, q) process, then

E(εt|Ωt−1) = c+

p∑i=1

aiεt−i +

q∑j=1

bjzt−j .

Therefore,εt − E(εt|Ωt−1) = zt.

Hence,V ar(εt|Ωt−1) = E[(εt − E(εt|Ωt−1))2|Ωt−1]

= E[z2t |Ωt−1]

= E[z2t ]

= σ2z .

Suppose that we have noticed that recent daily returns have been unusually volatile. We might expect thattomorrow’s return will also be more variable than usual, however an ARMA model cannot capture this type ofbehaviour because its conditional variance given the past is constant. Thus, other models must be considered ifwe want to account for non-constant volatility.Autoregresive conditionally heteroscedastic(ARCH) models were introduced by Engle, R. (1982) and their GARCH(generalized ARCH) extension is due to Bollerslev, T. (1986). In these models the key concept is the conditionalvariance, that is the variance conditional on the past. In the classical ARCH models the variance is expressed asa linear function of squared past values of the series. This specification is able to capture the main stylized factscharacterizing financial series.

Definition 3.4.1. A process (εt) is a (strong) ARCH(q) process if there exist q ∈ Z and real constants ω >

0, αi ≥ 0 for i = 1, · · · , q such that

εt =√htzt

ht = ω +∑qi=1 αiε

2t−i,

(3.24)

where (zt) is a sequence of i.i.d. random variables with E(zt) = 0 and V ar(zt) = 1 for all t ∈ Z.

Theorem 38 (Properties of an ARCH(q) process). Let (εt) be a second order stationary ARCH(q) process i.e.

εt =√htzt


2t−i,

with ω > 0 and αi ≥ 0 for i = 1, · · · , q, then

E(εt|Ωt−1) = 0

V ar(εt|Ωt−1) = w +∑qi=1 αiε

2t−1

E(εt) = 0

V ar(εt) = w1−∑qi=1 αi

,

(3.25)

where E(εt|Ωt−1) and V ar(εt|Ωt−1) are the conditional expectation and conditional variance of εt given the

information set at time t − 1, while E(εt) and V ar(εt) are the unconditional expectation and variance of the

process (εt).


Proof. We have that

E[εt|Ωt−1] = E[(√

ω +∑qi=1 αiε

2t−i

)zt∣∣Ωt−1

]=

(√ω +

∑qi=1 αiε

2t−i

)E[zt|Ωt−1]

= 0,

and

V ar(εt|Ωt−1) = E[((√

ω +∑qi=1 αiε

2t−i

)zt − 0

)2 ∣∣Ωt−1

]= E

[(ω +

∑qi=1 αiε

2t−i)z2t |Ωt−1

]= (ω +

∑qi=1 αiε

2t−i)E[z2

t |Ωt−1]

= ω +∑qi=1 αiε

2t−i.

In addition,E [εt] = E

[√ω +

∑qi=1 αiε

2t−i

]E[zt]

= 0,

andV ar(εt) = E[ε2t ]− (E[εt])

2

= E[ε2t ]

= E[htz2t ]

= E[ht]

= ω +∑qi=1 αiE[ε2t−i]

= ω +∑qi=1 αiV ar(εt−i).

Because the process (εt) is assumed to be second order stationary, we have that

V ar(εt) =ω

1−∑qi=1 αi

.

Corollary 38.1. If (εt) is a second order stationary ARCH(q) process then∑qi=1 αi < 1.

Definition 3.4.2. A process (εt) is called (strong) GARCH(p, q) if there exist integers p and q and real coefficientsω, αi and βj such that for all t ∈ Z

εt =√htzt


2t−i +

∑pj=1 βjht−j ,

(3.26)

where (zt) is a sequence of i.i.d. random variables with E(zt) = 0 and V ar(zt) = 1 for all t ∈ Z and ω > 0 ,αi ≥ 0 and βj ≥ 0 for all i = 1, · · · , q and j = 1, · · · , p respectively.

Definition 3.4.3. Let (εt) be a GARCH(p, q) process, if the coefficients α1, · · · , αq, β1, · · · , βp of the processsatisfy that

q∑i=1

αi +

p∑j=1

βj = 1,

then (εt) is called an integrated GARCH(p, q) or IGARCH(p, q) process.


Theorem 39 (Properties of a GARCH(p, q) process). Let (εt) be a second order stationary GARCH(p, q) process:

εt =√htzt


2t−i +

∑pj=1 βjht−j ,

with p, q ∈ Z, ω > 0, αi ≥ 0 and βj ≥ 0 for all i = 1, · · · , q and j = 1, · · · , p respectively, then

E(εt|Ωt−1) = 0

V ar(εt|Ωt−1) = w +∑qi=1 αiε

2t−1 +

∑pj=1 βjht−j

E(εt) = 0

V ar(εt) = w1−(

∑qi=1 αi+

∑pj=1 βj)

,

(3.27)

where E(εt|Ωt−1) and V ar(εt|Ωt−1) are the conditional expectation and conditional variance of εt given the

information set at time t − 1, while E(εt) and V ar(εt) are the unconditional expectation and variance of the

process (εt).

Proof. The proofs of the first three equations of (3.27) are almost identical to the proofs of the first three equationsof (3.25), which were already provided, therefore we only show the derivation of the last equation.

V ar(εt) = E[ε2t ]− (E[εt])2

= E[ε2t ]

= E[htz2t ]

= E[ht]

= ω +∑qi=1 αiE[ε2t−i] +

∑pi=1 βjE[ht−i]

= ω +∑qi=1 αiV ar(εt−i) +

∑pi=1 βjE[ht−j ].

However,E[ht−j ] = E[ht−jz

2t−j ]

= E[ε2t−j ]

= V ar(εt−j).

Therefore,

V ar(εt) = ω +

q∑i=1

αiV ar(εt−i) +

p∑i=1

βjV ar(εt−j).

Because the process (εt) is assumed to be second order stationary, we have that

V ar(εt) =ω

1−∑qi=1 αi −

∑pi=1 βj

.

Remark. Note that for all h, t ∈ Z, h 6= t:

Cov(εt, εt+h) = E(εtεt+h)− E(εt)E(εt+h) = 0.

Corollary 39.1. If (εt) is a second order stationary GARCH(p, q) process then∑qi=1 αi +

∑pj=1 βj < 1.

Let (εt) be a GARCH(p, q) process and define ut = ε2t − ht. Substituting ht−j = ε2t−j − ut−j in (3.26) the


following representation is obtained:

ε2t = ω +∑ri=1(αi + βi)ε

2t−i + ut −

∑pj=1 βjut−j , t ∈ Z, (3.28)

where r = max(p, q), with the convention αi = 0 (βj = 0) if i > q (j > p).Note that ut = ε2t − ht = ht(z

2t − 1). Therefore E(ut) = E(z2

t − 1)E(ht) = 0, for all t ∈ Z because (zt) isassumed to be i.i.d with zero mean and variance 1. Also, for k > 0 and t ∈ Z, we have that

Cov(ut, ut−k) = E(ht(z2t − 1)ht−k(z2

t−k − 1)) = E(z2t − 1)E(htht−k(z2

t−k − 1)) = 0,

Cov(ut, ε2t−k) = E(ht(z

2t − 1)ε2t−k) = E(z2

t − 1)E(htε2t−k) = 0.

Thus, equation (3.28) has the structure of an ARMA model. In fact, under additional assumptions (implying thesecond-order stationarity of (ε2t )), if (εt) is a GARCH(p, q) process, then (ε2t ) is an ARMA(r, p) process.

Since the introduction of ARCH models by Engle in 1982 different methods have been proposed for theestimation of the parameters of both ARCH and GARCH models. Some of the most popular ones are describedbelow.The simplest estimation method for ARCH models is that of ordinary least squares (OLS). Consider the ARCH(q)model:

εt =√htzt,

ht = ω0 +∑qi=1 α0iε

2t−i,with ω0 > 0, α0i ≥ 0, i = 1, · · · , q,

(zt) sequence of i.i.d. random variables, with E(zt) = 0, V ar(zt) = 1.

(3.29)

The OLS method (proposed by Engle in 1982) uses the autoregressive representation on the squares of the ob-served process. No assumption is made on the law of (zt). From (3.28), taking ut = ε2t −ht, the following AR(q)

representation of (εt) is obtained:

ε2t = ω0 +

q∑i=1

α0iε2t−i + ut. (3.30)

Let θ0 denote the true value of the vector of parameters i.e. θ0 = (ω0, α01, · · · , α0q)′ and θ be a generic vector

of parameters. Assume that ε1, · · · , εn is a realization of length n of the process (εt), let ε0, · · · , ε1−q be initialvalues (which can be chosen, for example, equal to zero) and define V ′t−1 = (1, ε2t−1, · · · , εt−q).From (3.30) we have that

ε2t = V ′t−1θ0 + ut, t = 1, · · · , n,

which can be rewritten as Y = Xθ0 + U , with

Y =

ε21...ε2n

, X =

1 ε20 · · · ε2−q+1

...1 ε2n−1 · · · ε2n−q

, U =

u1

...un

,Therefore, assuming that the matrix X ′X is invertible, the OLS estimator of θ0 is given by

θ = arg minθ||Y −Xθ||2 = (X ′X)−1X ′Y. (3.31)

Note that the estimated parameters from (3.31) can be negative, which is problematic as predictions of the volatil-


ity can be negative as well. In order to avoid this problem the constrained OLS estimator is used instead:

θc = arg minθ∈[0,∞)q+1

||Y −Xθ||2.

Note that in (3.30) the errors (ut) satisfy that ut = ε2t − ht = (z2t − 1)ht, therefore the process (ut) is a

conditionally heteroscedastic process, with conditional variance:

V ar(ut|Ωt−1) = (κz − 1)h2t , κz = E(z4

t ).

A feasible generalized least squares estimator of θ0 can be defined as follows:For all θ = (ω, α1, · · · , αq)′, let

ht(θ) = ω +∑qi=1 αiε

2t−i and Ω = diag(h−2

1 (θ), · · · , h−2n (θ)).

The FGLS estimator is defined byθF = (X ′ΩX)−1X ′ΩY.

This estimator was developed, by Bose, A. and Mukherjee, K. (2003).Franq, C. and Zakoian, J. (2010) prove the consistency of both estimators under certain regularity assumptions,and compare the asymptotic properties of both methods. An OLS estimator can be defined for a GARCH processtoo, however, the estimator is not explicit because (εt) is not an AR process. However, the most widely usedmethod to estimate the parameters of a strong GARCH processes is (Gaussian) quasi-maximum likelihood . TheQML estimator for ARCH models was first proposed by Engle, R. (1982), and extended for GARCH models byBollerslev, T. (1986).Suppose that the observations ε1, · · · , εn constitute a realization of a strong GARCH (p, q) process i.e.

εt =√htzt,

ht = ω0 +∑qi=1 α0iε

2t−i +

∑pj=1 β0jht−j , for all t ∈ Z,

where (zt) is a sequence of i.i.d. variables with mean zero and variance 1, ω0 > 0, α0i ≥ 0 (i = 1, · · · , q) andβ0j ≥ 0, (j = 1, · · · , p). The orders of p and q are assumed to be known. The vector of parameters

θ = (θ1, · · · , θp+q+1)′ = (ω, α1, · · · , αq, β1, · · · , βp)′

belongs to a parameter space of the form

Θ ⊂ (0,+∞)× [0,∞)p+q.

The true value of the parameter is unknown, and is denoted by θ0 = (ω0, α01, · · · , α0q, β1, · · · , β0p)′.

Given initial values ε0, · · · , ε1−q, h0, · · · , h1−p, the conditional Gaussian quasi-likelihood is given by

Ln(θ) = Ln(θ; ε1, · · · , εn) =

n∏t=1

1√2πht

exp

(− ε2t

2ht

), (3.32)


where the ht are recursively defined, for t ≥ 1 , by

ht = ht(θ) = ω +

q∑i=1

αiε2t−i +

p∑i=1

βj ht−j .

For a given θ, a reasonable set of initial values is

ε20 = · · · = ε21−q = h0 = · · · = h1−p =ω

1−∑qi=1 αi −

∑pj=1 βj

.

However, this set of initial values is not suitable when (εt) is not second order stationary, which can happen if (εt)

is an IGARCH process for example. Then, more suitable sets of initial values are

ε20 = · · · = ε21−q = h0 = · · · = h1−p = ω,

orε20 = · · · = ε21−q = h0 = · · · = h1−p = ε21.

A QML estimator of θ can be obtained maximizing (3.32) with respect to θ i.e.

θ = arg maxθ∈Θ

Ln(θ)

or, minimizing Ln(θ) = 1n

∑nt=1

(ε2tht

+ log ht

), i.e.

θ = arg minθ∈Θ

Ln(θ).

Franq, C. and Zakoian, J. (2010) show that the QML estimator is consistent and asymptotically normal undercertain regularity conditions and that the choice of initial values is unimportant for the asymptotic properties ofthe QML estimator.

3.5 Model approaches

A popular method to estimate time-varying alphas and betas in a dynamic 1-factor model is via rolling windowregression. Assuming that each window starts at t− τ and ends at t−1, our models satisfy the following equationfor each fixed t ≥ τ + 1:

Rf,t−j = αt + βtRm,t−j + εt−j , for all j = 1, · · · , τ . (3.33)

Rf,t−j = rf,t−j − rF,t−j and Rm,t−j = rm,t−j − rF,t−j where rf,t−j , rm,t−j and rF,t−j denote the daily returnof an asset, the daily return of the market and the daily risk free rate respectively, at time t− j. The parameter βtdenotes the exposure of the asset to the market at time t, while αt denotes the excess return of the asset over themarket excess return given the exposure of the asset to the market. In particular, if the asset is a hedge fund, thenalpha is often considered an indicator of the hedge fund’s manager ability to generate positive returns.We consider different versions of the model in (3.33). The first one assumes that the residuals (εt−j) are i.i.d withεt−j ∼ N(0, σ2

ε,t). The other three models incorporate dynamic volatility assumptions on different components


of (3.33).For simplicity, let us rewrite equation (3.33) as

Rf,t = αs + βsRm,t + εt, for all t = s− τ, · · · , s− 1, (3.34)

where s ≥ τ + 1.

3.5.1 Model 1 (Gaussian white noise residuals)

The first model that we consider is a 1-factor model with time-varying parameters based on the classical CAPMequation introduced independently by Sharpe, W. (1964) and Lintner, J. (1965). For each s ≥ τ + 1 we have that:

Rf,t = αs + βsRm,t + εt, (εt) i.i.d N(0, σ2ε,s) distributed, for all t = s− τ, · · · , s− 1. (3.35)

The parameters of equation (3.35) can be estimated using either ordinary least squares or maximum likelihood.Both methods yield the same estimates. These procedures are described in detail in section 3.2.

3.5.2 Model 2 (GARCH residuals)

The second model assumes that the residuals (εt) follow a GARCH(1,1) process.For s ≥ τ + 1, we have that:

Rf,t = αs + βsRm,t + εt, for all t = s− τ, · · · , s− 1.εt =

√htzt,where (zt) i.i.d. with E(zt) = 0 and V ar(zt) = 1,

ht = ωs + α1,sε2t−1 + β1,sht−1,where α1,s, β1,s ≥ 0 and ωs > 0.

(3.36)

In general, a regression model with (strong) GARCH(p, q) errors can be written as:

Yt = Xtβ + εt,

εt =√htzt,where (zt) i.i.d. with E(zt) = 0 and V ar(zt) = 1,


2t−i +

∑pj=1 βjht−j ,

(3.37)

where αi and βj are nonnegative constants and ω is a strictly positive constant, Yt is the dependent variable, Xt

is a vector of exogenous or predetermined regressors and β is a vector of parameters.The regression model with ARCH errors was proposed by Engle, R. (1982) along with two different methodsthat yield consistent estimates of the parameters of the model. The properties of these estimators are described indetail in Gourieroux, C. (1997). The main estimation procedures for (3.37) are extensions to the GARCH case ofthe two methods proposed by Engel in 1982. See Davidson, R. and MacKinnon, J. (2004).The first estimation method is a three step regression procedure. First, a consistent estimate of β is obtained usingthe ordinary least squares estimator from the regression of Yt on Xt, denoted by β. From this estimation weobtain as residuals

εt = Yt −Xtβ.

The parameters ω, α1, · · · , αq, β1, · · · , βp appearing in the conditional variance can be estimated using any ofthe consistent estimators of GARCH models discussed in Section 3.4, the estimated parameters are denoted byω, α1, · · · αq, β1, · · · , βp. Finally the estimator of β is improved applying weighted least squares to a regression


of Y onX using the conditional variances ht from the previous step as diagonal elements of the weighting matrix.The second step estimator can be denoted by ˜

β.The most popular method to estimate the parameters of (3.37) is quasi-maximum likelihood.Let θ be the vector of parameters and consider the following conditional mean and conditional variance:

mt(θ) = E(Yt|Yt−1, Xt)

σ2t (θ) = V ar(Yt|Yt−1, Xt).

These two conditional moments depend on past values of the process, and current and past values of some exoge-nous regressorsX . A conditional (Normal) quasi-likelihood function is obtained by assuming that the conditionaldistribution of Yt given Yt−1, Xt is normal. This quasi-likelihood function is

Ln(θ) =

n∏t=1

1√2πσ2

t (θ)exp

(− [Yt −mt(θ)]

2

2σ2t (θ)

).

Taking logarithm, we obtain the following conditional quasi-loglikelihood funtion:

logLn(θ) = −n2

log(2π)− 1

2

n∑t=1

log(σ2t (θ))− 1

2

n∑t=1

[Yt −mt(θ)]2

σ2t (θ)

. (3.38)

Note thatmt(θ) = Xtβ,

σ2t (θ) = ht

= ω +∑qi=1 αi(Yt−i −Xt−iβ)2 +

∑pj=1 βjht−j .

(3.39)

A QML estimator of θ can be obtained maximizing (3.38) with respect to θ i.e. θ = arg maxθ∈Θ

logLn(θ).

We estimated the parameters of (3.36) using both the three step regression procedure an the conditional quasi-loglikelihood method.

Remark. Assuming that zt ∼ N(0, 1) for all t (in addition to (zt) being i.i.d.), implies that

Yt|Yt−1, Xt ∼ N(Xtβ, ht).

Therefore, the assumption of (zt) being i.i.d. and zt ∼ N(0, 1) for all t is sometimes used instead of directlyassuming that the conditional distribution of Yt given Yt−1 and Xt is conditionally Gaussian.

3.5.3 Model 3 (Unobserved GARCH)

Our third model assumes that the centered regressor follows a GARCH(1,1) process.For s ≥ τ + 1, we have that:

Rf,t = αs + βsRm,t + ηt, for all t = s− τ, · · · , s− 1.εt =

√htzt,

ht = ωs + α1,sε2t−1 + β1,sht−1, with α1,s, β1,s ≥ 0 and ωs > 0.

(3.40)

where Rm,t = εt + µm,s (note that µm,s = E(Rm,t)), (zt) i.i.d. with zt ∼ N(0, 1), (ηt) i.i.d. with ηt ∼N(0, σ2

η,s) and the disturbances (zt) and (ηt) are also assumed to be mutually independent. For a given fixedwindow, the model in (3.40) belongs to a class often called unobserved GARCH or GARCH with errors. This


family of models was first proposed by Harvey, Ruiz and Sentana in 1982. A number of papers have focusedon these models. See Harvey, A., Ruiz, E. and Sentana, E. (1992), Gourieroux, C., Monfort, A. and Renault, E.(1993) and Franq, C. and Zakoian, J. (2000).

Rf,t = αs + βs(εt + µm,s) + ηt is conditionally Gaussian, with conditional mean and conditional variancegiven by:

E(Rf,t|Ωt−1) = αs + βsµm,s,

V ar(Rf,t|Ωt−1) = β2sht + σ2

η,s.

respectively.Let θs be the vector of parameters of (3.40) i.e. θs = (αs, βs, σ

2η,s, ωs, αs,1, βs,1)′, a conditional quasi-likelihood

function is:

Lτ (θs) =

s−1∏t=s−τ

1√2π(β2

sht + σ2η,s)

exp

(− (Rf,t − (αs + βsµm,s))

2

2(β2sht + σ2

s,η)

).

Therefore, a conditional quasi-loglikelihood function is given by:

logLτ (θs) = −τ2

log(2π)− 1

2

s−1∑t=s−τ

log(β2sht + σ2

η,s)−1

2

s−1∑t=s−τ

(Rf,t − (αs + βsµm,s))2

β2sht + σ2

η,s

. (3.41)

An estimator of θs can be obtained maximizing (3.41) i.e. θs = arg maxθs∈Θs

logLτ (θs).

The following procedure is used to estimate the parameters in (3.40). First, the sample mean of (Rm,t) is com-puted, i.e. µm,s = 1

τ

∑s−1t=s−τ Rm,t. Define εt = Rm,t − µm,s, because (εt) is assumed to follow a GARCH(1,1)

process we can use conditional quasi-loglikelihood to estimate ωs, αs,1 and βs,1, see Section 3.4. The estimatedparameters are denoted by ωs, αs,1, βs,1. These estimates are substituted in equation (3.41), we obtain that

Lτ (αs, βs, σ2s,η) = −τ

2log(2π)− 1

2

s−1∑t=s−τ

log(β2s ht + σ2

η,s)−1

2

s−1∑t=s−τ

(Rf,t − (αs + βsµm,s))2

β2s ht + σ2

η,s

, (3.42)

where ht = ωs + α1,sε2t−1 + β1,sht−1. Finally (3.42) is maximized with respect to αs, βs and σ2

η,s. This last stepyields the estimates of αs, βs and σ2

η,s.

3.5.4 Model 4 (Weak GARCH residuals)

Before our last model is described and a procedure to estimate the model parameters is proposed, we must in-troduce some concepts and results. The notions of temporal aggregation, contemporaneous aggregation, weakGARCH and weak ARMA-GARCH models are particularly relevant.The temporal aggregation problem can be formulated as follows: given a process (Xt) and an integer m, whatare the properties of the sampled process (Xmt) (that is constructed from (Xt) by only keeping the m-th observa-tion)? Does the aggregated process (Xmt) belong to the same class of models as the original (Xt)? If this holdsfor any Xt and m, the class is said to be stable by temporal aggregation.The class of weak GARCH models was first introduced by Drost, F. and Nijman, T. (1993) while studying thetemporal aggregation problem for GARCH models. They showed that although the class of GARCH models is,in general, not closed under temporal aggregation, there exists a wider class of GARCH type models that is stableunder temporal aggregation, the class of weak GARCH models.It is standard in finance to consider linear combinations of several series (for instance, portfolios), the contem-


poraneous aggregation problem can be stated as follows: if these series are GARCH processes, is their linearcombination also a GARCH process? Nijman, T. and Sentana, E. (1996) answered this question. They showedthat contemporaneous aggregation of independent univariate GARCH processes yields a weak GARCH process.Subsequently, they generalized this result by showing that a linear combination of variables generated by a mul-tivariate GARCH process will also be weak GARCH.

Definition 3.5.1 (Weak GARCH). A fourth order stationary process (εt) is said to be weak GARCH(p, r) if:

1. (εt) is a white noise,

2. (ε2t ) admits an ARMA representation of the form

ε2t −r∑i=1

aiε2t−i = c+ ut +

p∑j=1

bjut−j ,

where (ut) is the linear innovation process of (ε2t ).

Remark. Recall that if (ut) is the linear innovation process of (ε2t ) then (ut) is weak white noise and for all k > 0

Cov(ut, ε2t−k) = 0.

Remark. Definition 3.5.1 (which can be found in Franq, C. and Zakoian, J. (2010)) is more general than thatof weak GARCH proposed by Drost, F. and Nijman, T. (1993). From both definitions ut is linear innovationof ε2t , however, in the Drost and Nijman approach, ut is orthogonal to all past values of εt. This additionalconstraint ensures the stability of the class under temporal aggregation, however our model uses contemporaneousaggregation, not temporal aggregation, thus we don’t need this extra assumption.

Theorem 40. (Contemporaneous aggregation of GARCH processes) The sum of two fourth order stationary

GARCH(1,1) processes with independent disturbances is weak GARCH(2,2).

Proof. Consider the sum of two GARCH(1,1) processes (ε1,t) and (ε2,t) i.e.

(εt) satisfies that εt = ε1,t + ε2,t, for all t ∈ Z.

where

εi,t = σi,tηi,t, σ2i,t = ω + αiε

2i,t−1 + βiσ

2i,t−1, ωi > 0, αi, βi ≥ 0, (ηi,t) i.i.d. (0,1), i = 1, 2.

and suppose that the sequences (η1,t) and (η2,t) are independent.Note that (ε1,t) and (ε2,t) are centered, uncorrelated and mutually independent processes, due to the independenceassumptions imposed on (η1,t) and (η2,t), therefore E(εt) = 0 and Cov(εt, εt−h) = 0 for all h > 0, t ∈ Z,therefore (εt) is white noise. Moreover, for h > 0 we have that:

Cov(ε21,t, ε22,t−h) = 0,

Cov(ε1,tε2,t, ε21,t−h) = E(ε1,tε2,tε

21,t−h) = E(η1,t)E(σi,tε2,tε

21,t−h) = 0.

Consequently, for h > 0

Cov(ε2t , ε2t−h) = Cov(ε21,t, ε

21,t−h) + Cov(ε22,t, ε

22,t−h). (3.43)


If (εi,t) is fourth order stationary i.e. E(ε4i,t) = κi <∞ then (ε2i,t) follows an ARMA(1,1) process. In particular,the autocovariances of the squares take the form

γε2i (h) = Cov(ε2i,t, ε2i,t−h) = γε2i (1)(αi + βi)

h−1, h ≥ 1. (3.44)

Combining equations (3.43) and (3.44) we obtain that

γε2(h) = Cov(ε2t , ε2t−h) = γε21(1)(α1 + β1)h−1 + γε22(1)(α2 + β2)h−1. (3.45)

Given a function f defined on the integers, Lf(h) = f(h − 1), h > 0. Therefore, (1 − βL)βh = 0 for h > 0.Hence,

[1− (α1 + β1L)][1− (α2 + β2L)]γε2(h) = 0 h > 2.

From Theorem 36, we obtain that (ε2t ) is a weak GARCH(2,2) process of the form

[1− (α1 + β1L)][1− (α2 + β2L)]ε2t = ω + ut + θ1ut−1 + θ2ut−2,

where (ut) is the linear innovation process of (ε2t ). The orders obtained in the ARMA representation of (ε2t ) arenot necessarily the minimum ones. Note that if α1 +β1 = α2 +β2 then γε2(h) = [γε21(1) + γε22(1)](α1 +β1)h−1

for h ≥ 1. Therefore, [1− (α1 + β1)L]γε2(h) = 0 if h > 1. Hence, (ε2t ) is a weak GARCH(1,1) process.

Remark. This proof can be found in Franq, C. and Zakoian, J. (2010).

The class of weak ARMA-GARCH models is an extension of the class of weak GARCH models. The follow-ing definition was provided by Franq, C. and Zakoian, J. (2000).

Definition 3.5.2 (Weak ARMA-GARCH). A process (Xt) is said to be weak ARMA-GARCH if it satisfies thefollowing two stage representation:

Xt −∑Pi=1 φiXt−i = εt +

∑Qj=1 ψjεt−j

ε2t −∑pi=1 αiε

2t−i = ω + ut +

∑qi=1 βjut−j ,

(3.46)

where the two polynomias Φ(z) = 1−φ1z− · · ·−φP zP and Ψ(z) = 1 +ψ1z+ · · ·+ψQzQ have all their zeros

outside the unit disk and have no common zeros, the same regularity assumptions are made about the polynomialsφ(z) = 1−α1z−· · ·−αpzp and ψ(z) = ω+β1z+ · · ·+βqz

q , and (εt) and (ut) are linear innovation processesof (Xt) and (ε2t ) respectively.

The following procedure was proposed by Francq and Zakoian to estimate the parameters of (3.46). First, de-fine θ(1) = (φ1, · · · , φP , ψ1, · · · , ψQ), θ(2) = (ω, α1, · · · , αp, β1, · · · , βq) and θ = (θ(1), θ(2)). LetX1, · · · , Xn

be a realization of length n of (Xt). For 0 < t ≤ n, εt(θ1) and ut(θ) are approximated by ε(θ1) and ut(θ) obtainedby replacing the unknown starting values by zero (εt(θ1) = 0,−Q+ 1 ≤ t ≤ 0 and ut(θ) = 0,−q+ 1 ≤ t ≤ 0 ).A least squares estimator θ = (θ1, θ2) can be obtained using the following procedure: Minimize Q(1)

n (θ1), whereQ

(1)n (θ1) = 1

n

∑nt=1 εt

2(θ1) and let θ1 = arg minQ(1)n (θ1)

θ1∈Θ1

, then, minimize Q(2)n (θ2) = 1

n

∑nt=1 ut

2(θ1, θ2) and

let θ2 = arg minQ(2)n (θ2)

θ2∈Θ2

. Franq, C. and Zakoian, J. (2000) showed that under certain regularity assumptions the

estimator θ = (θ1, θ2) is consistent. .It is time now to introduce our last model, which assumes that both the series of centered excess market returns


(εm,t) and the series of centered excess asset returns (εf,t) follow fourth order stationary and mutually indepen-dent GARCH(1,1) processes.For each s ≥ τ + 1 we have that:

Rf,t = αs + βsRm,t + ut, for all t = s− τ, · · · , s− 1.

εf,t =√hf,tzf,t,

hf,t = ωf,s + α1,f,sε2f,t−1 + β1,f,shf,t−1.

εm,t =√hm,tzm,t,

hm,t = ωm,s + α1,m,sε2m,t−1 + β1,m,shm,t−1.

(3.47)

where Rm,t = εm,t + µm,s and Rf,t = εf,t + µf,s (note that µm,s = E(Rm,t) and µf,s = E(Rf,t)), (zm,t) and(zf,t) are i.i.d. and mutually independent with E(zm,t) = 0, V ar(zm,t) = 1 and E(zf,t) = 0, V ar(zf,t) = 1

respectively. (ut) is assumed to be centered i.e. E(ut) = 0, also E(ε4m,t) = κm <∞ and E(ε4f,t) = κf <∞, forall t = s− τ, · · · , s− 1.From (3.47), we have that

εf,t + µf,s = αs + βs(εm,t + µm,s) + ut.

Therefore,ut = (εf,t − βsεm,t) + (µf,s − (αs + βsµm,s))

= (εf,t − βsεm,t),(3.48)

because (ut) is assumed to be centered (this is a standard assumption of factor models). From Theorem 40 wehave that the process εf,t − βsεm,t is weak GARCH(2,2). Therefore the residuals (ut) can be written as a weakGARCH(2,2) process.To estimate the parameters αs and βs we follow a procedure similar to the one proposed by Francq and Zakoianto estimate the parameters of a weak ARMA-GARCH model. First, the sample mean of both series of returns arecomputed: µm,s = 1

τ

∑s−1t=s−τ Rm,t and µf,s = 1

τ

∑s−1t=s−τ Rf,t. Let εm,t = Rm,t− µm,s and εf,t = Rf,t− µf,s

and define ut = εf,t − βsεm,t, for s − τ ≤ t ≤ s − 1, then there exist real coefficients ωs, α1,s, α2,s, β1,s, β2,s

such that

u2t = α1,su

2t−1 + α2,su

2t−2 + ωs + ηt + β1,sηt−1 + β2,sηt−2, with ηt linear innovation of u2

t .

Let θs = (ωs, α1,s, α2,s, β1,s, β2,s), we approximate ut(βs) and ηt(θs) by ut(βs) and ηt(θs) where ut(β) = 0

and ηt(θs) = 0 for t = s − (τ + 2), s − (τ + 1). Our least squares estimator of φs = (βs, θs) is obtained byminimizing Qτ (φs) = 1

τ

∑s−1t=s−τ ηt

2(φs) i.e. φs = arg minQτ (φs)φs∈Φs

. Note that

ηs−τ (φs) = u2s−τ (φs)− ωs = (εf,s−τ − βsεm,s−τ )2 − ωs

ηs−(τ−1)(φs) = (εf,s−(τ−1) − βsεm,s−(τ−1))2 − (α1,s + β1,s)(εf,s−τ − βsεm,s−τ )2 + β1,sωs − ωs

ηt = u2t − (α1,su

2t−1 + α2,su

2t−2 + ω + β1,sηt−1 + β2,sηt−2), for all s− (τ − 2) ≤ t ≤ s− 1.

From our assumption of residuals (ut) being centered we obtain that µf,s − (αs + βsµm,s) = 0. Thus αs =

µf,s − βsµm,s, where βs was obtained in the previous step.


3.6 Applications to hedge fund return modelling

A substantial growth in the number of hedge funds has occurred since the 1990s, they have become popular fortheir ability to generate an absolute return at reduced risk regardless of the market conditions. The class of hedgefund strategies is very heterogeneous. Hedge funds can be classified into a number of different strategy groupsdepending on the main strategy followed. According to Fuss, R., Kaiser, D. and Adams, Z. (2007) the three maincategories are: arbitrage, event driven, and directional.Arbitrage strategies try to take advantage of temporarily wrong valuations between different financial instrumentsand attempt to offer their investors very low market exposure. Event driven strategies try to take advantage ofprice anomalies triggered by pending or upcoming firm transactions such as mergers, restructuring, liquidationsand insolvencies. The success of these strategies comes from a false judgment of a situation and the uncertaintyof other investors in the case of takeovers, reorganizations, or management buyouts. Directional strategies arecharacterized by significant market exposure (long and short). These three groups can be subdivided into morespecific strategies. See Fuss, R., Kaiser, D. and Adams, Z. (2007).The merger arbitrage strategy belongs to the event driven management style class, it invests simultaneously in longand short positions by purchasing the stocks of the company being taken over and selling those of the take-overcompany. Generally, stocks that are object of a take-over gain in value while stocks of the take-over companyfall in value. An interesting fact mentioned in the literature is that the merger arbitrage strategy displays highcorrelation to the equity markets when the latter declines and, in turn, displays low correlation when stocks tradeup or sideways. See Mitchell, M. and Pulvino, T. (2001).The equity hedge or long/short equity strategy belongs to the directional management style class. It involvesbuying long equities that are expected to increase in value and selling short equities that are expected to decreasein value. Many equity hedge funds have a long bias. Besides stocks, equity hedge funds can also be invested inother assets on a limited scale. See Kat, H. M. and Lu, S. (2002), Fung, W. and Hsieh, D. (2004) and Fuss, R.,Kaiser, D. and Adams, Z. (2007).

In this section we study the market exposure and performance of three different hedge fund strategies: equityhedge, event driven, and merger arbitrage, using the time-varying factor models described in the previous section.According to Wei, W. (2010) it is generally accepted that hedge fund returns can be separated into alpha and betacomponents. The first one attributable to manager skill and the second one being generated by market exposure.Modelling hedge fund returns involves primarily estimating the magnitude and direction of these alpha and betacomponents.The considered 1-factor models with time-varying parameters use a rolling window regression to account fordynamic changes in the model’s parameters. Time-varying alpha and beta at time t are determined by regressingreturn series from t− τ to t− 1. Each of the models has the general form:For each t ≥ τ + 1,

Rf,t−j = αt + βtRm,t−j + εt−j , for all j = 1, · · · , τ , (3.49)

where τ is the window length, Rf,t−j = rf,t−j − rF,t−j and Rm,t−j = rm,t−j − rF,t−j where rf,t−j , rm,t−jand rF,t−j denote the daily return of a hedge fund strategy, the daily return of the market and the daily risk freerate respectively, at time t− j.Model 1 assumes that residuals (εt−j) are i.i.d. such that εt−j ∼ N(0, σ2

ε,t) for all j = t − τ, · · · , t − 1. Thisis a standard assumption of macroeconomic factor models, however, we consider different assumptions in theremaining models. Model 2 assumes that the series of residuals (εt−j) is a GARCH(1,1) process. Model 3


Figure 3.1: Time-varying beta of the equity hedge strategy for models 1, 2 and 4.

assumes that the series of centered excess market returns (Rm,t−j − µm,t) is a GARCH(1,1) process and theseries of residuals (εt−j) is i.i.d. such that εt−j ∼ N(0, σ2

ε,t) for all j = t− τ, · · · , t− 1. Model 4 assumes thatthe series of residuals (εt−j) is a weak GARCH(2,2) process. Time-varying alpha and beta of three hedge fundstrategies are computed using each of these models, and results are compared.

As proxies for the three hedge fund strategies considered (equity hedge or long/short, event driven and mergerarbitrage), the daily prices of the following indices from HFR (http://www.hedgefundresearch.com) are used:HFRX Equity Hedge Index, HFRX Event Driven Index and HFRX ED: Merger Arbitrage Index. The Standard& Poor’s 500 Index (S&P 500) is used as a proxy for the market returns, the daily prices of the S&P 500 wereobtained from Yahoo Finance (https://finance.yahoo.com/quote/%5EGSPC/history). Our sample contains 3377observations of daily prices of each hedge fund index and of the S&P 500 Index, the first observation correspondsto March 31st, 2003 and the last one to September 5th, 2017. In all our models the explanatory and the responsevariables are excess returns. The excess return of an asset at time t is defined as Rt = rt − rF,t, where rt andrF,t denote the return of the asset and the risk free rate at time t respectively. The 3 month T-bill rate was used asa proxy for the risk free rate (https://fred.stlouisfed.org/series/DTB3). We used a 253 days (approximately 1 yearof business days) rolling window.

Figure 3.1 shows the graphs of time-varying betas of the equity hedge strategy for three of the four modelspreviously discussed in section 3.5: models 1, 2 and 4. Note that models 2.1 and 2.2 both refer to model 2, thedifference lies in the estimation method. Model 2.1 uses a three step least squares procedure while the parametersof model 2.2 are estimated using quasi-loglikelihood.Model 1 assumes that the series of residuals is a Gaussian white noise process while model 2 assumes GARCH(1,1)residuals. Figure 3.1 shows that the graphs of models 1, 2.1 and 2.2 look quite similar and do not seem to capture


Figure 3.2: Time-varying alpha of the equity hedge strategy for models 1, 2 and 4.

recent exposures, which suggests that considering GARCH residuals versus Gaussian white noise residuals doesnot change significantly time-varying beta estimates. Note that we did not include the graph of model 3 in figure3.1 because the time-varying betas obtained using this model, move in a “weird” way that does not reflect thebehavior of hedge fund exposures to markets. The time-varying betas of each strategy, obtained using model 3,are shown in figure 3.9. On the other hand, model 4 is able to capture recent market exposures and to react fasterto market changes, this becomes evident during the subprime mortgage crisis (2008 crisis).To illustrate our point, we quote the article Global financial crisis: five key stages 2007-2011 authored by Elliot,L. (2011, August 7) and published by the British newspaper The Guardian:“... 9 August 2007. 15 September 2008. 2 April 2009. 9 May 2010. 5 August 2011. From sub-prime to down-grade, the five stages of the most serious crisis to hit the global economy since the Great Depression can be foundin those dates.Phase one on 9 August 2007 began with the seizure in the banking system precipitated by BNP Paribas announc-ing that it was ceasing activity in three hedge funds that specialised in US mortgage debt. This was the momentit became clear that there were tens of trillions of dollars worth of dodgy derivatives swilling round which wereworth a lot less than the bankers had previously imagined. Nobody knew how big the losses were or how great theexposure of individual banks actually was, so trust evaporated overnight and banks stopped doing business witheach other.It took a year for the financial crisis to come to a head but it did so on 15 September 2008 when the US govern-ment allowed the investment bank Lehman Brothers to go bankrupt. Up to that point, it had been assumed thatgovernments would always step in to bail out any bank that got into serious trouble...When Lehman Brothers went down, the notion that all banks were “too big to fail” no longer held true, with the


Table 3.1: Average alpha per periodEquity hedge Event driven Merger arbitrage

Post dotcom bubble -0.0009746% 0.011872% 0.007837%Subprime mortgage crisis -0.032356% -0.028842% 0.022657%

Post financial crisis -0.010607% 0.002401% 0.011139%

Table 3.2: Average beta per periodEquity hedge Event driven Merger arbitrage

Post dotcom bubble 0.408% 0.26076% 0.14129%Subprime mortgage crisis 0.2679% 0.1954% 0.24134%

Post financial crisis 0.3072% 0.211467% 0.0981%

result that every bank was deemed to be risky...”The only graph that is able to capture the sudden exposure drop that happened as a consequence of these twoevents from the 2008 crisis, corresponds to model 4 (which assumes that residuals are weak GARCH(2,2)), whichsuggests that this model unlike the other three is able to capture recent market exposures. Similar results wereobserved for the other 2 hedge fund strategies. See figures 3.5 and 3.7.Figure 3.2 shows the time-varying alphas of the equity hedge strategy, obtained using models 1, 2 and 4. Thegraphs look quite similar except during the 2008 crisis period, where model 2 shows a higher alpha than models1 and 4. The time-varying alphas of each strategy for model 3 are shown in figure 3.10.

Our three hedge fund strategies were compared using the graphs of the time-varying betas and alphas producedby model 4 (figures 3.3 and 3.4). The average alpha and beta over three different sub-periods were computed foreach strategy: March 31st, 2004 - July 31st, 2007 (post dotcom bubble), August 1st, 2007 - June 30st, 2009(subprime mortgage crisis), July 1st, 2009 - September 6th, 2017 (post financial crisis). See tables 3.1 and 3.2 .Figure 3.3 reveals some interesting trends. Note that while the largest exposures of the equity hedge strategy andthe event driven strategy occurred during the post dotcom bubble, the highest exposure of the merger arbitragestrategy happened during the 2008 crisis. Overall, equity hedge exhibits the highest market exposure among allthe strategies, followed by event driven and merger arbitrage, in that order. The results from table 3.2 confirmit. Also, during the subprime mortgage crisis, the market exposures of the different strategies where, on average,closer to each other than during the post dotcom bubble and after the 2008 financial crisis, see table 3.2. Inaddition, from November 2008 to October 2009 the merger arbitrage strategy exhibits a higher market exposurethan the rest; this is not surprising as this type of hedge fund takes advantage of mergers and acquisitions.Figure 3.4 shows the graphs of time-varying alphas for each strategy. During the year 2008 the merger arbitragealpha shows an increasing trend, while the opposite happens to the other two strategies. Monday August 8, 2011,marked the day the US and global stock markets crashed following the Friday night credit rating downgrade byStandard and Poor’s of the United States sovereign debt from AAA, or “risk free”, to AA+. It was the first time inhistory the United States was downgraded. Note that a decline of the equity hedge alpha started at the beginningof August of 2011 and lasted until mid January of 2012.Table 3.2 shows that the average equity hedge alpha was negative every sub-period. The event driven averagealpha was negative only during the subprime mortgage crisis and positive during the other two sub-periods, whilethe merger arbitrage average alpha was positive in every sub-period. These results reveal that the event drivenstrategy, and more so the merger arbitrage strategy outperform the equity hedge strategy in the generation ofpositive alpha, specially in turbulent and adverse markets.


Figure 3.3: Time-varying beta from model 4 for each strategy.

Figure 3.4: Time-varying alpha from model 4 for each strategy.


Figure 3.5: Time-varying beta of the event driven strategy for models 1, 2 and 4.

Figure 3.6: Time-varying alpha of the event driven strategy for models 1, 2 and 4.


Figure 3.7: Time-varying beta of the merger arbitrage strategy for models 1, 2 and 4.

Figure 3.8: Time-varying alpha of the merger arbitrage strategy for models 1, 2 and 4.


Figure 3.9: Time-varying beta of each strategy for model 3.

Figure 3.10: Time-varying alpha of each strategy for model 3.


3.7 Conclusion

The assumptions and estimation methods, of four different 1-factor models with time-varying parameters havebeen discussed in this chapter. Using data of three different hedge fund strategies and a market index, the time-varying alpha and beta of each model have been estimated. The first model assumes that the series of residuals is aGaussian white noise process, while the second one assumes that the series of residuals is a GARCH(1,1) process,the third one uses an unobserved GARCH process to model the asset excess return and the last one assumes thatthe series of residuals is a weak GARCH(2,2) process. The model that performed the best was the one with weakGARCH(2,2) residuals, because unlike the other three, it was able to capture recent market exposures. To thebest of our knowledge, this model has not been used before to estimate the time-varying alpha and beta of factormodels.

Bibliography

Agarwal, V. and Naik, N. (2004). Risks and Portfolio Decisions Involving Hedge Funds. Review of Financial

Studies, 17(1), 63-98.

Amemiya, T. (1973). Regression Analysis when the Dependent Variable is Truncated Normal. Econometrica,41(6), 997-1016.

Anderson, T. (1971). The Statistical Analysis of Time Series. Wiley Series in Probability and Statistics.

Bai, J. (2003). Inferential theory for factor models of large dimension. Econometrica, 71, 135-171.

Baxter, M. and Rennie, A. (1996). Financial Calculus: An introduction to Derivative Pricing. Cambridge Uni-versity Press.

Bjork, T. (2009). Arbitrage Theory in Continuous Time. Oxford Finance Series.

Black, F. and Scholes, M. (1973). The Pricing of Options and Corporate Liabilities. Journal of Political Economy,81(3), 637-654.

Black, F. (1976). Studies of Stock Price Volatility Changes. Proceedings of the 1976 Meetings of the American

Statistical Association, Business and Economic Statistics, 177-181.

Bollerslev, T. (1986). Generalized Autoregressive Conditional Heteroskedasticity. Journal of Econometrics,31(3), 307-327.

Bose, A. and Mukherjee, K. (2003). Estimating the ARCH Parameters by Solving Linear Equations. Journal of

Time Series Analysis, 24(2), 127-252.

Brockwell, P. and Davis, R. (1991). Time Series: Theory and Methods. Springer Series in Statistics.

Chen, N.F., Roll, R. and Ross, S.A. (1986). Economic Forces and the Stock Market. The Journal of Business,59(3), 383-404.

Connor, G. and Korajczyk, R. A. (1986). Performance measurement with the arbitrage pricing theory: A newframework for analysis. Journal of Financial Economics, 15, 373-394.

Cox, J., Ingersoll J. and Ross, S. (1985). A Theory of the Term Structure of Interest Rates. Econometrica, 53(2),385-408.

Darolles, S. and Mero, G. (2007) Hedge Funds Replication and Factor Models, Working Paper. Allocataire de

Recherche en Finance: Institut de Gestion de Rennes.

92

BIBLIOGRAPHY 93

Davidson, R. and MacKinnon, J. (2004). Econometric Theory and Methods. Oxford University Press (NewYork).

Drost, F. and Nijman, T. (1993). Temporal Aggregation of GARCH Processes. Econometrica, 61(4), 909-927.

Durbin, J. (1960) The Fitting of Time-Series Models. Revue de l’Institut International de Statistique / Review of

the International Statistical Institute, 28(3), 233-244.

Elliot, L. (2011, August 7). Global financial crisis: five key stages 2007-2011. The Guardian. Retrieved fromhttps://www.theguardian.com.

Engle, R. (1982). Autoregressive Conditional Heteroscedasticity with Estimates of the Variance of United King-dom Inflation. Econometrica, 50(4), 987-1007.

Engle, R., Lilien, D. and Robins, R. (1987). Estimating Time Varying Risk Premia in the Term Structure: TheArch-M Model. Econometrica, 55(2), 391-407.

Esposito, F. (2010). Multidimensional Black-Scholes options. Munich Personal RePEc Archive, 42821.

Fama, E. and French, K.R. (1992) The Cross-Section of Expected Stock Returns. Journal of Finance, 47, 427-465.

Folland, G. (1999). Real Analysis: Modern Techniques and Their Applications. John Wiley and Sons.

Franq, C. and Zakoian, J. (2000). Estimating Weak GARCH Representations. Econometric Theory, 16, 692-728.

Franq, C. and Zakoian, J. (2010). GARCH Models: Structure, Statistical Inference and Financial Applications.John Wiley and Sons Ltd.

Fung, W. and Hsieh, D. (1997) Empirical Characteristics of Dynamic Trading Strategies: The Case of HedgeFunds. Review of Financial Studies, 10, 275-302.

Fung, W. and Hsieh, D. (2004) Extracting Portable Alphas from Equity Long/Short Hedge Funds. Journal of

Investment Management, 2(4), 1-19.

Fuss, R., Kaiser, D. and Adams, Z., (2007). Value at risk, GARCH modelling and the forecasting of hedge fundreturn volatility. Journal of Derivatives and Hedge Funds, 13(1), 2-25.

Glosten, L. Jagannathan, R. and Runkle, E. (1993). On the Relation between the Expected Value and the Volatil-ity of the Nominal Excess Return on Stocks. The Journal of Finance, 48(5), 1779-1801.

Grinold, R.C. and Kahn, R.N. (2000). Active Portfolio Management: A Quantitative Approach for Producing

Superior Returns and Controlling Risk. McGraw-Hill, New York.

Gourieroux, C., Monfort, A. and Renault, E. (1993). Indirect Inference. Journal of Applied Econometrics, 8(S),S85-S118.

Gourieroux, C. (1997). ARCH Models and Financial Applications. Springer Series in Statistics.

Hasanhodzic, J. and Lo, A. (2007). Can Hedge Fund Returns be Replicated?: The Linear Case. Journal of

Investment Management, 5(2), 5-45.

BIBLIOGRAPHY 94

Harvey, A., Ruiz, E. and Sentana, E. (1992). Unobserved Component Time Series Models with ARCH Distur-bances. Journal of Econometrics, 52(1-2), 129-157.

Hubbard, B. and Hubbard, J (2009). Vector Calculus, Linear Algebra, and Differential Forms: A Unified Ap-

proach. Matrix Editions.

Kvatadze, Z. and Shervashidze, T. (2005). On the Accuracy of Kraft Upper Bound for the L1 Distance BetweenGaussian Densities in Rk. Georgian Mathematical Journal, 12(4), 679 - 682.

Karatzas, I. and Shevre, S. (1991). Brownian Motion and Stochastic Calculus. Springer Science and BusinessMedia.

Kat, H. M. and Lu, S. (2002). An Excursion Into the Statistical Properties of Hedge Fund Returns. Working

Paper, The University of Reading.

Lintner, J. (1965). The Valuation of Risk Assets and the Selection of Risky Investments in Stock Portfolios andCapital Budgets. Review of Economics and Statistics, 47(1), 13-37.

Mandelbrot, B. (1963). The Variation of Certain Speculative Prices. textitThe Journal of Business, 36, 392417.

Merton, R. (1973). Theory of Rational Option Pricing. The Bell Journal of Economics and Management Science,4(1): 141-183.

Mitchell, M. and Pulvino, T. (2001). Characteristics of Risk and Return in Risk Arbitrage. The Journal of Fi-

nance, 56(6): 2135-2175.

Nelson, D. (1991). Conditional Heteroskedasticity in Asset Returns: A New Approach. Econometrica, 59(2),347-370.

Nijman, T. and Sentana, E. (1996). Marginalization and Contemporaneous Aggregation in Multivariate GARCHProcesses. Journal of Econometrics, 71, 71-87.

Oksendal, B. (2003). Stochastic Differential Equations: An Introduction with Applications. Springer.

Roll, R. and Ross, S. (1980). An empirical investigation of the arbitrage pricing theory. Journal of Finance, 35,1073-1103.

Roncalli, T. and Weisang, G. (2008) Tracking Problems, Hedge Fund Replication and Alternative Beta. Working

Paper, University of Evry.

Rosenthal, J. (2006). A First Look at Rigorous Probability Theory. World Scientific Publishing Co.

Samuelson, P. (1965). Rational Theory of Warrant Pricing. Industrial Management Review, 6(2), 13-39.

Spivak, M. (1994). Calculus. Publish or Perish.

Sharpe, W. (1964). Capital Asset Prices: A Theory of Market Equilibrium under Conditions of Risk. Journal of

Finance, 19(3), 425 442.

Sharpe, W. (1970). Portfolio Theory and Capital Markets. McGraw-Hill, New York.

Sharpe, W. (1992). Asset Allocation: Management Style and Performance, Journal of Portfolio Management,18, 7-19.

BIBLIOGRAPHY 95

Stock, J. H. and Watson, M. W. (2002) Macroeconomic forecasting using diffusion indexes. Journal of Business

and Economic Statistics, 20, 147-162.

Takahashi, A. and Yamamoto, K. (2008) Hedge Fund Replication, CIRJEF-592 Discussion Paper, Universityof Tokyo.

Wei, W. (2010) Modelling and Replicating Hedge Fund Returns. Centre for Actuarial Studies, The University ofMelbourne.

Wold, H. (1938). A Study in the Analysis of Stationary Time Series. Stockholm: Almqvist and Wiksell.

Zivot, E. and Wang, J. (2006) Modeling Financial Time Series with S-PLUS. Springer-Verlag.

Documents

Correlation Model Risk and Non-Gaussian Factor Models by ......Abstract Correlation Model Risk and Non-Gaussian Factor Models Julio Antonio Hernandez Bellon Doctor of Philosophy Department