
    A Stochastic Processes Toolkit for Risk Management

    Damiano Brigo, Antonio Dalessandro, Matthias Neugebauer, Fares Triki

    15 November 2007

    Abstract

In risk management it is desirable to grasp the essential statistical features of a time series representing a risk factor. This tutorial aims to introduce a number of different stochastic processes that can help in grasping the essential features of risk factors describing different asset classes or behaviors. This paper does not aim at being exhaustive, but gives examples and a feeling for practically implementable models allowing for stylised features in the data. The reader may also use these models as building blocks to build more complex models, although for a number of risk management applications the models developed here suffice for the first step in the quantitative analysis. The broad qualitative features addressed here are fat tails and mean reversion. We give some orientation on the initial choice of a suitable stochastic process and then explain how the process parameters can be estimated based on historical data. Once the process has been calibrated, typically through maximum likelihood estimation, one may simulate the risk factor and build future scenarios for the risky portfolio. On the terminal simulated distribution of the portfolio one may then single out several risk measures, although here we focus on the stochastic processes estimation preceding the simulation of the risk factors. Finally, this first survey report focuses on single time series. Correlation or more generally dependence across risk factors, leading to multivariate processes modeling, will be addressed in future work.

    JEL Classification code: G32, C13, C15, C16.

AMS Classification code: 60H10, 60J60, 60J75, 65C05, 65C20, 91B70

Key words: Risk Management, Stochastic Processes, Maximum Likelihood Estimation, Fat Tails, Mean Reversion, Monte Carlo Simulation

We are grateful to Richard Hrvatin and Lars Jebjerg for reviewing the manuscript and for helpful comments. Kyriakos Chourdakis further helped with comments and formatting issues. Contact: [email protected], [email protected], [email protected]

Fitch Solutions, and Department of Mathematics, Imperial College, London. [email protected]
Fitch Solutions, and Department of Mathematics, Imperial College, London. [email protected]
Fitch Ratings. [email protected]
Fitch Solutions, and Paris School of Economics, Paris. [email protected]


Contents

1 Introduction
2 Modelling with Basic Stochastic Processes: GBM
3 Fat Tails: GARCH Models
4 Fat Tails: Jump Diffusion Models
5 Fat Tails: Variance Gamma (VG) process
6 The Mean Reverting Behaviour
7 Mean Reversion: The Vasicek Model
8 Mean Reversion: The Exponential Vasicek Model
9 Mean Reversion: The CIR Model
10 Mean Reversion and Fat Tails together


    1 Introduction

In risk management and in the rating practice it is desirable to grasp the essential statistical features of a time series representing a risk factor to begin a detailed technical analysis of the product or the entity under scrutiny. The quantitative analysis is not the final tool, since it has to be integrated with judgemental analysis and possible stresses, but it represents a good starting point for the process leading to the final decision.

This tutorial aims to introduce a number of different stochastic processes that, according to the economic and financial nature of the specific risk factor, can help in grasping the essential features of the risk factor itself. For example, a family of stochastic processes that can be suited to capture foreign exchange behaviour might not be suited to model hedge funds or a commodity. Hence there is a need to have at one's disposal a sufficiently large set of practically implementable stochastic processes that may be used to address the quantitative analysis at hand to begin a risk management or rating decision process.

This report does not aim at being exhaustive, since this would be a formidable task beyond the scope of a survey paper. Instead, this survey gives examples and a feeling for practically implementable models allowing for stylised features in the data. The reader may also use these models as building blocks to build more complex models, although for a number of risk management applications the models developed here suffice for the first step in the quantitative analysis.

The broad qualitative features addressed here are fat tails and mean reversion. This report begins with the geometric Brownian motion (GBM) as a fundamental example of an important stochastic process featuring neither mean reversion nor fat tails, and then it goes through generalisations featuring either of the two properties or both of them. According to the specific situation, the different processes are formulated either in (log-) returns space or in levels space, as is more convenient. Levels and returns can be easily converted into each other, so this is indeed a matter of mere convenience.

    This tutorial will address the following processes:

• Basic process: Arithmetic Brownian Motion (ABM, returns) or GBM (levels).

• Fat tails processes: GBM with lognormal jumps (levels), ABM with normal jumps (returns), GARCH (returns), Variance Gamma (returns).

• Mean Reverting processes: Vasicek, CIR (levels if interest rates or spreads, or returns in general), exponential Vasicek (levels).

• Mean Reverting processes with Fat Tails: Vasicek with jumps (levels if interest rates or spreads, or returns in general), exponential Vasicek with jumps (levels).

Different financial time series are better described by different processes above. In general, when first presented with a historical time series of data for a risk factor, one should decide what kind of general properties the series features. The financial nature of the time series can give some hints at whether mean reversion and fat tails are expected to be present or not. Interest rate processes, for example, are often taken to be mean reverting, whereas foreign exchange series are supposed to be often fat tailed and credit spreads feature both characteristics. In general one should:

Check for mean reversion or stationarity. Some of the tests mentioned in the report concern mean reversion. The presence of an autoregressive (AR) feature can be tested in the data, typically on the returns of a series or on the series itself if this is an interest rate or spread series. In linear processes with normal shocks, this amounts to checking stationarity. If this is present, this can be a symptom for mean reversion and a mean reverting process can be adopted as a first assumption for the data. If the AR test is rejected this means that there is no mean reversion, at least under normal shocks and linear models, although the situation can be more complex for nonlinear processes with possibly fat tails, where the more general notions of stationarity and ergodicity may apply. These are not addressed in this report.

If the tests find AR features then the process is considered as mean reverting and one can compute autocorrelation and partial autocorrelation functions to see the lag in the regression. Or one can go


directly to the continuous time model and estimate it on the data through maximum likelihood. In this case, the main model to try is the Vasicek model.

If the tests do not find AR features then the simplest form of mean reversion for linear processes with Gaussian shocks is to be rejected. One has to be careful though, since the process could still be mean reverting in a more general sense. In a way, there could be some sort of mean reversion even under non-Gaussian shocks; examples of such a case are the jump-extended Vasicek or exponential Vasicek models, where mean reversion is mixed with fat tails, as shown below in the next point.

Check for fat tails. If the AR test fails, either the series is not mean reverting, or it is but with fatter tails than the Gaussian distribution and possibly nonlinear behaviour. To test for fat tails a first graphical tool is the QQ-plot, comparing the tails in the data with the Gaussian distribution. The QQ-plot gives immediate graphical evidence on the possible presence of fat tails. Further evidence can be collected by considering the third and fourth sample moments, or better the skewness and excess kurtosis, to see how the data differ from the Gaussian distribution. More rigorous tests on normality should also be run following this preliminary analysis1. If fat tails are not found and the AR test failed, this means that one is likely dealing with a process lacking mean reversion but with Gaussian shocks, that could be modeled as an arithmetic Brownian motion. If fat tails are found (and the AR test has been rejected), one may start calibration with models featuring fat tails and no mean reversion (GARCH, NGARCH, Variance Gamma, arithmetic Brownian motion with jumps) or fat tails and mean reversion (Vasicek with jumps, exponential Vasicek with jumps). Comparing the goodness of fit of the models may suggest which alternative is preferable. Goodness of fit is determined again graphically through a QQ-plot of the model implied distribution against the empirical distribution, although more rigorous approaches can be applied2. The predictive power can also be tested, following the calibration and the goodness of fit tests3.

The above classification may be summarized in the following table, where typically the referenced variable is the return process or the process itself in case of interest rates or spreads:

                       Normal tails    Fat tails
    NO mean reversion  ABM             ABM+Jumps, (N)GARCH, VG
    Mean Reversion     Vasicek         Exponential Vasicek, CIR, Vasicek with Jumps

Once the process has been chosen and calibrated to historical data, typically through regression or maximum likelihood estimation, one may use the process to simulate the risk factor over a given time horizon and build future scenarios for the portfolio under examination. On the terminal simulated distribution of the portfolio one may then single out several risk measures. This report does not focus on the risk measures themselves but on the stochastic processes estimation preceding the simulation of the risk factors. In case more than one model is suited to the task, one can analyze risks with different models and compare the outputs as a way to reduce model risk.

Finally, this first survey report focuses on single time series. Correlation or more generally dependence across risk factors, leading to multivariate processes modeling, will be addressed in future work.

Prerequisites

The following discussion assumes that the reader is familiar with basic probability theory, including probability distributions (density and cumulative density functions, moment generating functions), random variables and expectations. It is also assumed that the reader has some practical knowledge of stochastic calculus and of Ito's formula, and basic coding skills. For each section, the MATLAB code is provided to implement the specific models. Examples illustrating the possible use of the models on actual financial time series are also presented.

1 Such as, for example, the Jarque-Bera, the Shapiro-Wilk and the Anderson-Darling tests, which are not addressed in this report.

2 Examples are the Kolmogorov-Smirnov test, likelihood ratio methods and the Akaike information criteria, as well as methods based on the Kullback-Leibler information or relative entropy, the Hellinger distance and other divergences.

3 The Diebold-Mariano statistics can be mentioned as an example for AR processes. These approaches are not pursued here.


    2 Modelling with Basic Stochastic Processes: GBM

This section provides an introduction to modelling through stochastic processes. All the concepts will be introduced using the fundamental process for financial modelling, the geometric Brownian motion with constant drift and constant volatility. The GBM is ubiquitous in finance, being the process underlying the Black and Scholes formula for pricing European options.

    2.1 The Geometric Brownian Motion

The geometric Brownian motion (GBM) describes the random behaviour of the asset price level S(t) over time. The GBM is specified as follows:

dS(t) = \mu S(t)\,dt + \sigma S(t)\,dW(t)   (1)

Here W is a standard Brownian motion, a special diffusion process4 that is characterised by independent identically distributed (iid) increments that are normally (or Gaussian) distributed with zero mean and a standard deviation equal to the square root of the time step. Independence in the increments implies that the model is a Markov process, a particular type of process for which only the current asset price is relevant for computing probabilities of events involving future asset prices. In other terms, to compute probabilities involving future asset prices, knowledge of the whole past does not add anything to knowledge of the present.

The d terms indicate that the equation is given in its continuous time version5. The property of independent identically distributed increments is very important and will be exploited in the calibration as well as in the simulation of the random behaviour of asset prices. Using some basic stochastic calculus the equation can be rewritten as follows:

d\log S(t) = \left(\mu - \frac{1}{2}\sigma^2\right)dt + \sigma\,dW(t)   (2)

where log denotes the standard natural logarithm. The process followed by the log is called an Arithmetic Brownian Motion. The increments in the logarithm of the asset value are normally distributed. This equation is straightforward to integrate between any two instants, t and u, leading to:

\log S(u) - \log S(t) = \left(\mu - \frac{1}{2}\sigma^2\right)(u - t) + \sigma\left(W(u) - W(t)\right) \sim \mathcal{N}\left(\left(\mu - \frac{1}{2}\sigma^2\right)(u - t),\; \sigma^2 (u - t)\right).   (3)

    Through exponentiation and taking u = T and t = 0 the solution for the GBM equation is obtained:

S(T) = S(0)\exp\left(\left(\mu - \frac{1}{2}\sigma^2\right)T + \sigma W(T)\right)   (4)

This equation shows that the asset price in the GBM follows a log-normal distribution, while the logarithmic returns \log(S_{t+\Delta t}/S_t) are normally distributed.

The moments of S(T) are easily found using the above solution, the fact that W(T) is Gaussian with mean zero and variance T, and finally the moment generating function of a standard normal random variable Z, given by:

E\left[e^{aZ}\right] = e^{\frac{1}{2}a^2}.   (5)
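Identity (5) is easy to verify by simulation; the following is a minimal sketch with an arbitrary value a = 0.7:

    % Monte Carlo check of Equation (5): E[exp(aZ)] = exp(a^2/2) for Z ~ N(0,1)
    a = 0.7;
    Z = randn(1e6, 1);
    [mean(exp(a*Z)), exp(0.5*a^2)]   % the two values should nearly agree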

    Hence the first two moments (mean and variance) of S(T) are:

4 Also referred to as a Wiener process.

5 A continuous-time stochastic process is one where the value of the price can change at any point in time. The theory is very complex and actually this differential notation is just short-hand for an integral equation. A discrete-time stochastic process on the other hand is one where the price of the financial asset can only change at certain fixed times. In practice, discrete behaviour is usually observed, but continuous time processes prove to be useful to analyse the properties of the model, besides being paramount in valuation, where the assumption of continuous trading is at the basis of the Black and Scholes theory and its extensions.


E[S(T)] = S(0)e^{\mu T}, \qquad \mathrm{Var}[S(T)] = e^{2\mu T} S^2(0)\left(e^{\sigma^2 T} - 1\right)   (6)

To simulate this process, the continuous equation between discrete instants t_0 < t_1 < \ldots < t_n needs to be solved as follows:

S(t_{i+1}) = S(t_i)\exp\left(\left(\mu - \frac{1}{2}\sigma^2\right)(t_{i+1} - t_i) + \sigma\sqrt{t_{i+1} - t_i}\, Z_{i+1}\right)   (7)

where Z_1, Z_2, \ldots, Z_n are independent random draws from the standard normal distribution. The simulation is exact and does not introduce any discretisation error, due to the fact that the equation can be solved exactly. The following charts plot a number of simulated sample paths using the above equation, together with the mean plus/minus one standard deviation and the first and 99th percentiles over time. The mean of the asset price grows exponentially with time.

Figure 1: GBM Sample Paths and Distribution Statistics (two panels of simulated paths; axes: time in days vs. spot price).

The following MATLAB function simulates sample paths of the GBM using equation (7), which was vectorised in MATLAB.

    Code 1 MATLAB Code to simulate GBM Sample Paths.

function S = GBM_simulation(N_Sim, T, dt, mu, sigma, S0)
    % One-step log-return drift, as in Equation (7)
    drift = (mu - 0.5*sigma^2)*dt;
    S = S0*ones(N_Sim, T+1);
    % Brownian shocks for all paths and all time steps at once
    BM = sigma*sqrt(dt)*normrnd(0, 1, N_Sim, T);
    % Exact solution: cumulate the log returns along each row and exponentiate
    S(:, 2:end) = S0*exp(cumsum(drift + BM, 2));
end
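As a usage sketch (all parameter values below are arbitrary), one can simulate a batch of paths and compare the terminal sample mean with the analytic mean S(0)e^{\mu T} of Equation (6); note that T here is the number of steps, so the horizon in years is T*dt:

    % Simulate 10000 one-year daily paths and check E[S(T)] = S0*exp(mu*T*dt)
    N_Sim = 10000; T = 250; dt = 1/250; mu = 0.05; sigma = 0.2; S0 = 100;
    S = GBM_simulation(N_Sim, T, dt, mu, sigma, S0);
    [mean(S(:, end)), S0*exp(mu*T*dt)]   % the two values should nearly agree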

    2.2 Parameter Estimation

This section describes how the GBM can be used as an attempt to model the random behaviour of the FTSE100 stock index, the log of which is plotted in the left chart of Figure 2.

The second chart, on the right of Figure 2, plots the quantiles of the log return distribution against the quantiles of the standard normal distribution. This QQ-plot allows one to compare distributions and to check the assumption of normality. The quantiles of the historical distribution are plotted on the Y-axis and the quantiles of the chosen modelling distribution on the X-axis. If the comparison distribution provides a good fit to the historical returns, then the QQ-plot approximates a straight line.


Figure 2: Historical FTSE100 Index and QQ Plot FTSE 100 (left: log of the FTSE100 index, 1940 to 2003; right: QQ plot of sample data versus standard normal, with standard normal quantiles on the X-axis and quantiles of the input sample on the Y-axis).

In the case of the FTSE100 log returns, the QQ-plot shows that the historical quantiles in the tail of the distribution are significantly larger compared to the normal distribution. This is in line with the general observation about the presence of fat tails in the return distribution of financial asset prices. Therefore the GBM at best provides only a rough approximation for the FTSE100. The next sections will outline some extensions of the GBM that provide a better fit to the historical distribution.
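A QQ-plot like the one in the right panel of Figure 2 can be produced directly in MATLAB; this is a minimal sketch assuming a vector P of index levels and the Statistics Toolbox:

    % QQ-plot of log returns against the standard normal distribution
    ret = diff(log(P));   % log returns
    qqplot(ret);          % sample quantiles vs. standard normal quantiles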

Another important property of the GBM is the independence of returns. Therefore, to ensure the GBM is appropriate for modelling the asset price in a financial time series one has to check that the returns of the observed data are truly independent. The assumption of independence in statistical terms means that there is no autocorrelation present in historical returns. A common test for autocorrelation (typically on returns of a given asset) in a time series sample x_1, x_2, \ldots, x_n realised from a random process X(t_i) is to plot the autocorrelation function of the lag k, defined as:

\mathrm{ACF}(k) = \frac{1}{(n-k)v}\sum_{i=1}^{n-k}(x_i - m)(x_{i+k} - m), \quad k = 1, 2, \ldots   (8)

where m and v are the sample mean and variance of the series, respectively. Often, besides the ACF, one also considers the Partial Auto-Correlation function (PACF). Without fully defining PACF, roughly speaking one may say that while ACF(k) gives an estimate of the correlation between X(t_i) and X(t_{i+k}), PACF(k) informs on the correlation between X(t_i) and X(t_{i+k}) that is not accounted for by the shorter lags from 1 to k-1, or more precisely it is a sort of sample estimate of:

\mathrm{Corr}\left(X(t_i) - \bar{X}_i^{i+1,\ldots,i+k-1},\; X(t_{i+k}) - \bar{X}_{i+k}^{i+1,\ldots,i+k-1}\right)

where \bar{X}_i^{i+1,\ldots,i+k-1} and \bar{X}_{i+k}^{i+1,\ldots,i+k-1} are the best estimates (regressions in a linear context) of X(t_i) and X(t_{i+k}) given X(t_{i+1}), \ldots, X(t_{i+k-1}). PACF also gives an estimate of the lag in an autoregressive process.

ACF and PACF for the FTSE 100 equity index returns are shown in the charts in Figure 3. For the FTSE100 one does not observe any significant lags in the historical return time series, which means the independence assumption is acceptable in this example.
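Equation (8) can also be coded directly; a minimal sketch (the function name acf_k is ours, and x is assumed to be a column vector of returns):

    function a = acf_k(x, k)
        % Sample autocorrelation at lag k, per Equation (8)
        n = length(x);
        m = sum(x)/n;             % sample mean
        v = sum((x - m).^2)/n;    % sample variance
        a = sum((x(1:n-k) - m).*(x(1+k:n) - m))/((n - k)*v);
    end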

    Maximum Likelihood Estimation

Having tested the properties of independence as well as normality for the underlying historical data one can now proceed to calibrate the parameter set Θ = (μ, σ) for the GBM based on the historical returns. To find the Θ that yields the best fit to the historical dataset the method of maximum likelihood estimation (MLE) is used.

MLE can be used for both continuous and discrete random variables. The basic concept of MLE, as suggested by the name, is to find the parameter estimates for the assumed probability density function


Figure 3: Autocorrelation and Partial Autocorrelation Function for FTSE 100 (sample ACF and sample PACF by lag).

f (continuous case) or probability mass function (discrete case) that will maximise the likelihood or probability of having observed the given data sample x_1, x_2, x_3, \ldots, x_n for the random vector X_1, \ldots, X_n. In other terms, the observed sample x_1, x_2, x_3, \ldots, x_n is used inside f_{X_1,X_2,\ldots,X_n;\Theta}, so that the only variable in f is Θ, and the resulting function is maximised in Θ. The likelihood (or probability) of observing a particular data sample, i.e. the likelihood function, will be denoted by L(Θ).

In general, for stochastic processes, the Markov property is sufficient to be able to write the likelihood along a time series of observations as a product of transition likelihoods on single time steps between two adjacent instants. The key idea there is to notice that for a Markov process x_t, one can write, again denoting f the probability density of the vector random sample:

L(\Theta) = f_{X(t_0),X(t_1),\ldots,X(t_n);\Theta} = f_{X(t_n)|X(t_{n-1});\Theta} \cdot f_{X(t_{n-1})|X(t_{n-2});\Theta} \cdots f_{X(t_1)|X(t_0);\Theta} \cdot f_{X(t_0);\Theta}   (9)

In the more extreme cases where the observations in the data series are iid, the problem is considerably simplified, since f_{X(t_i)|X(t_{i-1});\Theta} = f_{X(t_i);\Theta} and the likelihood function becomes the product of the probability density of each data point in the sample without any conditioning. In the GBM case,

this happens if the maximum likelihood estimation is done on log-returns rather than on the levels. By defining the X as

X(t_i) := \log S(t_i) - \log S(t_{i-1}),   (10)

one sees that they are already independent, so that one does not need to use explicitly the above decomposition (9) through transitions. For general Markov processes different from GBM, or if one had worked at level space S(t_i) rather than in return space X(t_i), this can be necessary though: see, for example, the Vasicek example later on.

The likelihood function for iid variables is in general:

L(\Theta) = f_\Theta(x_1, x_2, x_3, \ldots, x_n) = \prod_{i=1}^{n} f_\Theta(x_i)   (11)

The MLE estimate \hat\Theta is found by maximising the likelihood function. Since the product of density values could become very small, which would cause numerical problems with handling such numbers, the likelihood function is usually converted6 to the log-likelihood \mathcal{L} = \log L.7 For the iid case the log-likelihood reads:

\mathcal{L}(\Theta) = \sum_{i=1}^{n} \log f_\Theta(x_i)   (12)

6 This is allowed, since maxima are not affected by monotone transformations.

7 The reader has to be careful not to confuse the log taken to move from levels S to returns X with the log used to move from the likelihood to the log-likelihood. The two are not directly related, in that one takes the log-likelihood in general even in situations when there are no log-returns but just general processes.


The maximum of the log-likelihood usually has to be found numerically using an optimisation algorithm8. In the case of the GBM, the log-return increments form normal iid random variables, each with a well known density f(x) = f(x; m, v), determined by mean and variance. Based on Equation (3) (with u = t_{i+1} and t = t_i) the parameters are computed as follows:

m = \left(\mu - \frac{1}{2}\sigma^2\right)\Delta t, \qquad v = \sigma^2\,\Delta t   (13)

The estimates for the GBM parameters are deduced from the estimates of m and v. In the case of the GBM, the MLE method provides closed form expressions for m and v through MLE of Gaussian iid samples9. This is done by differentiating the Gaussian density function with respect to each parameter and setting the resulting derivative equal to zero. This leads to the following well known closed form expressions for the sample mean and variance10 of the log return samples x_i for X_i = \log S(t_i) - \log S(t_{i-1}):

\hat m = \sum_{i=1}^{n} x_i / n, \qquad \hat v = \sum_{i=1}^{n} (x_i - \hat m)^2 / n.   (14)
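Inverting Equation (13) then recovers the GBM parameters from \hat m and \hat v; a minimal sketch, assuming a vector x of log returns observed at time step dt:

    % Closed-form MLE for GBM: sample moments (14), then invert (13)
    n = length(x);
    m = sum(x)/n;
    v = sum((x - m).^2)/n;
    sigma = sqrt(v/dt);
    mu = m/dt + 0.5*sigma^2;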

Similarly, one can find the maximum likelihood estimates for both parameters simultaneously using the following numerical search algorithm.

    Code 2 MATLAB MLE Function for iid Normal Distribution.

function [mu, sigma] = GBM_calibration(S, dt, params)
    % Calibrate GBM drift and volatility on log returns by numerical MLE
    Ret = price2ret(S);
    n = length(Ret);
    options = optimset('MaxFunEvals', 100000, 'MaxIter', 100000);
    params = fminsearch(@normalLL, params, options);
    mu = params(1);
    sigma = abs(params(2));
    function mll = normalLL(params)
        mu_ = params(1);
        sigma_ = abs(params(2));
        % Gaussian log-likelihood of the log returns, Equation (12)
        ll = n*log(1/sqrt(2*pi*dt)/sigma_) + sum(-(Ret - mu_*dt).^2/2/(dt*sigma_^2));
        mll = -ll;   % fminsearch minimises, so return the negative log-likelihood
    end
end

An important remark is in order for this example: given that x_i = \log s(t_i) - \log s(t_{i-1}), where s is the observed sample time series for the geometric Brownian motion S, \hat m is expressed as:

\hat m = \sum_{i=1}^{n} x_i / n = \frac{\log(s(t_n)) - \log(s(t_0))}{n}   (15)

The only relevant information used on the whole S sample are its initial and final points. All of the remaining sample, possibly consisting of hundreds of observations, is useless.11
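The telescoping in Equation (15) is easy to verify numerically; a minimal sketch on a hypothetical path:

    % The drift estimate (15) depends only on the endpoints of the path
    s = cumprod([100; exp(0.001 + 0.02*randn(499, 1))]);   % hypothetical price path
    x = diff(log(s));
    n = length(x);
    [sum(x)/n, (log(s(end)) - log(s(1)))/n]                % identical up to round-off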

This is linked to the fact that drift estimation for random processes like GBM is extremely difficult. In a way, if one could really estimate the drift one would know locally the direction of the market in the future, which is effectively one of the most difficult problems. The impossibility of estimating this quantity with precision is what keeps the market alive. This is also one of the reasons for the success of risk neutral valuation: in risk neutral pricing one is allowed to replace this very difficult drift with the risk free rate.

8 For example, Excel Solver or fminsearch in MATLAB, although for potentially multimodal likelihoods, such as possibly the jump diffusion cases further on, one may have to preprocess the optimisation through a coarse grid global method such as genetic algorithms or simulated annealing, so as to avoid getting stuck in a local optimum given by a lower mode.

9 Hogg and Craig, Introduction to Mathematical Statistics, fourth edition, p. 205, example 4.

10 Note that the estimator for the variance is biased, since E(\hat v) = v(n-1)/n. The bias can be corrected by multiplying the estimator by n/(n-1). However, for large data samples the difference is very small.

11 Jacod, in Statistics of Diffusion Processes: Some Elements, CNR-IAMI 94.17, p. 2, in case v is known, shows that the drift estimation can be statistically consistent, converging in probability to the true drift value, only if the number of points times the time step tends to infinity. This confirms that increasing the observations between two given fixed instants is useless; the only helpful limit would be letting the final time of the sample go to infinity, or letting the number of observations increase while keeping a fixed time step.


The drift estimation will improve considerably for mean reverting processes, as shown in the next sections.

Confidence Levels for Parameter Estimates

For most applications it is important to know the uncertainty around the parameter estimates obtained from historical data, which is statistically expressed by confidence levels. Generally, the shorter the historical data sample, the larger the uncertainty as expressed by wider confidence levels.

For the GBM one can derive closed form expressions for the Confidence Interval (CI) of the mean and standard deviation of returns, using the Gaussian law of independent return increments and the fact that the sum of independent Gaussians is still Gaussian, with the mean given by the sum of the means and the variance given by the sum of variances. Let C_n = X_1 + X_2 + X_3 + \ldots + X_n; then one knows that C_n \sim \mathcal{N}(nm, nv)12. One then has:

P\left(\frac{C_n - nm}{\sqrt{nv}} \le z\right) = P\left(\frac{C_n/n - m}{\sqrt{v/n}} \le z\right) = \Phi(z)   (16)

where Φ(·) denotes the cumulative standard normal distribution. C_n/n, however, is the MLE estimate for the sample mean m. Therefore the 95% confidence interval for the sample mean is given by:

\frac{C_n}{n} - 1.96\sqrt{\frac{v}{n}} \;\le\; m \;\le\; \frac{C_n}{n} + 1.96\sqrt{\frac{v}{n}}   (17)

This shows that as n increases the CI tightens, which means the uncertainty around the parameter estimates declines.

Since a standard Gaussian squared is a chi-squared, and the sum of n independent chi-squared variables is a chi-squared with n degrees of freedom, this gives:

P\left(\sum_{i=1}^{n}\left(\frac{x_i - m}{\sqrt{v}}\right)^2 \le z\right) = P\left(\frac{n\hat v}{v} \le z\right) = \chi^2_n(z),   (18)

hence under the assumption that the returns are normal one finds the following confidence interval for the sample variance:

\frac{n\hat v}{q_U} \;\le\; v \;\le\; \frac{n\hat v}{q_L}   (19)

where q_L, q_U are the quantiles of the chi-squared distribution with n degrees of freedom corresponding to the desired confidence level.
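Both closed form intervals can be computed in a few lines; a minimal sketch at the 95% level, assuming a vector ret of log returns and the Statistics Toolbox function chi2inv:

    % 95% confidence intervals for the sample mean (17) and the variance (19)
    n = length(ret);
    m = sum(ret)/n;
    v = sum((ret - m).^2)/n;
    ci_m = [m - 1.96*sqrt(v/n), m + 1.96*sqrt(v/n)];   % Equation (17)
    q = chi2inv([0.025 0.975], n);                     % quantiles q_L and q_U
    ci_v = [n*v/q(2), n*v/q(1)];                       % Equation (19)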

In the absence of these explicit relationships one can also derive the distribution of each of the parameters numerically, by simulating the sample parameters as follows. After the parameters have been estimated (either through MLE or with the best fit to the chosen model; call these the first parameters), simulate a sample path of length equal to the historical sample with those parameters. For this sample path one then estimates the best fit parameters as if the historical sample were the simulated one, and obtains a sample of the parameters. Then again, simulating from the first parameters, one goes through this procedure and obtains another sample of the parameters. By iterating one builds a random distribution of each parameter, all starting from the first parameters. An implementation of the simulated parameter distributions is provided in the next MATLAB function.

The charts in Figure 4 plot the simulated distribution for the sample mean and sample variance against the Gaussian and chi-squared distributions given by the theory. In both cases the simulated distributions are very close to their theoretical limits. The confidence levels and distributions of individual estimated parameters only tell part of the story. For example, in value-at-risk calculations, one would be interested in finding a particular percentile of the asset distribution. In expected shortfall calculations, one would be interested in computing the expectation of the asset

12 Even without the assumption of Gaussian log return increments, but knowing only that log return increments are iid, one can reach a similar Gaussian limit for C_n, but only asymptotically in n, thanks to the central limit theorem.


    Code 3 MATLAB Simulating distribution for Sample mean and variance.

% load C:\Research\QSCVOTechnicalPaper\Report\Chapter1\ftse100monthly1940.mat
load ftse100monthly1940.mat
S = ftse100monthly1940data;
ret = log(S(2:end, 2)./S(1:end-1, 2));   % monthly log returns
n = length(ret)
m = sum(ret)/n                            % sample mean, Equation (14)
v = sqrt(sum((ret - m).^2)/n)             % sample standard deviation

% Re-sample 10000 return series of the same length from the fitted Gaussian
SimR = normrnd(m, v, 10000, n);

m_dist = sum(SimR, 2)/n;                  % simulated distribution of the sample mean
m_tmp = repmat(m_dist, 1, n);
v_dist = sum((SimR - m_tmp).^2, 2)/n;     % simulated distribution of the sample variance

dt = 1/12;                                % monthly time step
sigma_dist = sqrt(v_dist/dt);
mu_dist = m_dist/dt + 0.5*sigma_dist.^2;

% Simulated distribution of the 99th percentile of S(T) after T = 3 years
S0 = 100;
pct_dist = S0*logninv(0.99, (mu_dist - 0.5*sigma_dist.^2)*3, sigma_dist*sqrt(3));

Figure 4: Distribution of Sample Mean and Variance (left: GBM parameter mu; right: normalised sample variance).

conditional on exceeding such percentile. In the case of the GBM, one knows the marginal distribution of the asset price to be log-normal and one can compute percentiles analytically for a given parameter set. MATLAB Code 3, which computes the parameter distributions, also simulates the distribution for the 99th percentile of the asset price after three years when re-sampling from simulation according to the first estimated parameters. The histogram is plotted in Figure 5.

    2.3 Characteristic and Moment Generating Functions

Let X be a random variable, either discrete with probability p_X (first case) or continuous with density f_X (second case). The k-th moment m_k(X) is defined as:

m_k(X) = E(X^k) = \begin{cases} \sum_x x^k\, p_X(x) \\ \int_{-\infty}^{+\infty} x^k f_X(x)\,dx. \end{cases}

Moments are not guaranteed to exist; for example, the Cauchy distribution does not even admit the first moment (mean). The moment generating function M_X(t) of a random variable X is defined, under some technical conditions for existence, as:

M_X(t) = E(e^{tX}) = \begin{cases} \sum_x e^{tx}\, p_X(x) \\ \int_{-\infty}^{+\infty} e^{tx} f_X(x)\,dx \end{cases}


Figure 5: Distribution for 99th Percentile of FTSE100 over a three year Period (histogram of the simulated 99th percentile with S0 = 100).

    and can be seen to be the exponential generating function of its sequence of moments:

M_X(t) = \sum_{k=0}^{\infty} \frac{m_k(X)}{k!}\, t^k.   (20)

The moment generating function (MGF) owes its name to the fact that from its knowledge one may back out all moments of the distribution. Indeed, one sees easily that by differentiating the MGF j times and computing it in 0 one obtains the j-th moment:

\frac{d^j}{dt^j} M_X(t)\Big|_{t=0} = m_j(X).   (21)

A better and more general definition is that of the characteristic function, defined as:

\phi_X(t) = E(e^{itX}),

where i is the unit imaginary number from complex number theory. The characteristic function exists for every possible density, even for the Cauchy distribution, whereas the moment generating function does not exist for some random variables, notably some with infinite moments. If moments exist, one can derive them also from the characteristic function through differentiation. Basically, the characteristic function forces one to bring complex numbers into the picture but brings in considerable advantages. Finally, it is worth mentioning that the moment generating function can be seen as the Laplace transform of the probability density, whereas the characteristic function is the Fourier transform.

    2.4 Properties of Moment Generating and Characteristic Functions

If two random variables, X and Y, have the same moment generating function M(t), then X and Y have the same probability distribution. The same holds, more generally, for the characteristic function.

If X and Y are independent random variables and W = X + Y, then the moment generating function of W is the product of the moment generating functions of X and Y:

M_{X+Y}(t) = E(e^{t(X+Y)}) = E(e^{tX}e^{tY}) = E(e^{tX})E(e^{tY}) = M_X(t)M_Y(t).

The same property holds for the characteristic function. This is a very powerful property, since the sum of independent returns or shocks is common in modelling. Finding the density of a sum of independent random variables can be difficult, resulting in a convolution, whereas in moment or characteristic function space it is simply a multiplication. This is why typically one moves to moment generating space, computes a simple multiplication and then goes back to density space, rather than working directly in density space through convolution.


As an example, if X and Y are two independent random variables, with X normal with mean \mu_1 and variance \sigma_1^2 and Y normal with mean \mu_2 and variance \sigma_2^2, then one knows that W = X + Y is normal with mean \mu_1 + \mu_2 and variance \sigma_1^2 + \sigma_2^2. In fact, the moment generating function of a normal random variable Z \sim \mathcal{N}(\mu, \sigma^2) is given by:

M_Z(t) = e^{\mu t + \frac{\sigma^2}{2}t^2}

Hence

M_X(t) = e^{\mu_1 t + \frac{\sigma_1^2}{2}t^2} \quad \text{and} \quad M_Y(t) = e^{\mu_2 t + \frac{\sigma_2^2}{2}t^2}

From this,

M_W(t) = M_X(t)M_Y(t) = e^{\mu_1 t + \frac{\sigma_1^2}{2}t^2}\, e^{\mu_2 t + \frac{\sigma_2^2}{2}t^2} = e^{(\mu_1 + \mu_2)t + \frac{(\sigma_1^2 + \sigma_2^2)}{2}t^2}

one obtains the moment generating function of a normal random variable with mean (\mu_1 + \mu_2) and variance (\sigma_1^2 + \sigma_2^2), as expected.
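This closure under products is also easy to check by simulation; a minimal sketch with arbitrary parameter values:

    % MGF of X+Y for independent normals vs. the closed form above
    mu1 = 0.1; s1 = 0.2; mu2 = -0.05; s2 = 0.3; t = 0.8;
    X = mu1 + s1*randn(1e6, 1);
    Y = mu2 + s2*randn(1e6, 1);
    [mean(exp(t*(X + Y))), exp((mu1 + mu2)*t + 0.5*(s1^2 + s2^2)*t^2)]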

    3 Fat Tails: GARCH Models

GARCH13 models were first introduced by Bollerslev (1986). The assumption underlying a GARCH model is that volatility changes with time and with the past information. In contrast, the GBM model introduced in the previous section assumes that volatility (σ) is constant. GBM could have been easily generalised to a time dependent but fully deterministic volatility, not depending on the state of the process at any time. GARCH instead allows the volatility to depend on the evolution of the process. In particular the GARCH(1,1) model, introduced by Bollerslev, assumes that the conditional variance (i.e. conditional on information available up to time t_i) is given by a linear combination of the past variance σ(t_{i-1})² and squared values of past returns.

\frac{\Delta S(t_i)}{S(t_i)} = \mu\,\Delta t_i + \sigma(t_i)\,\Delta W(t_i)

\sigma(t_i)^2 = \omega\bar\sigma^2 + \alpha\,\sigma(t_{i-1})^2 + \beta\,\epsilon(t_{i-1})^2

\epsilon(t_i)^2 = (\sigma(t_i)\,\Delta W(t_i))^2   (22)

where \Delta S(t_i) = S(t_{i+1}) - S(t_i), and the first equation is just the discrete time counterpart of Equation (1).

If one had just written the discrete time version of GBM, σ(t_i) would just be a known function of time and one would stop with the first equation above. But in this case, in the above GARCH model, σ(t)² is assumed itself to be a stochastic process, although one depending only on past information. Indeed, for example

\sigma(t_2)^2 = \omega\bar\sigma^2 + \alpha\,\sigma(t_1)^2 + \beta\,\epsilon(t_1)^2

so that σ(t_2) depends on the random variable ε(t_1)². Clearly, conditional on the information at the previous time t_1 the volatility is no longer random, but unconditionally it is, contrary to the GBM volatility.

In GARCH the variance is an autoregressive process around a long-term average variance rate σ̄². The GARCH(1,1) model assumes one lag period in the variance. Interestingly, with ω = 0 and α + β = 1 one recovers the exponentially weighted moving average model, which, unlike the equally weighted MLE estimator in the previous section, puts more weight on the recent observations. The GARCH(1,1) model can be generalised to a GARCH(p,q) model with p lag terms in the variance and q terms in the squared returns. The GARCH parameters can be estimated by ordinary least squares regression or by maximum likelihood estimation, subject to the constraint ω + α + β = 1, a constraint summarising the fact that the variance at each instant is a weighted sum where the weights add up to one. A one-step update of the variance recursion is sketched below.
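This is a minimal sketch of one step of the variance recursion in (22), with hypothetical parameter values satisfying ω + α + β = 1:

    % One-step GARCH(1,1) variance update
    omega = 0.10; alpha = 0.85; beta = 0.05;   % omega + alpha + beta = 1
    sigma_bar2 = 0.04^2;                       % long-term average variance
    v = sigma_bar2;                            % start at the long-term level
    eps_t = sqrt(v)*randn;                     % return innovation over one step
    v_next = omega*sigma_bar2 + alpha*v + beta*eps_t^2;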

As a result of the time varying state-dependent volatility, the unconditional distribution of returns is non-Gaussian and exhibits so-called fat tails. Fat tail distributions allow the realizations of the random variables having those distributions to assume values that are more extreme than in the normal/Gaussian

13 Generalised Autoregressive Conditional Heteroscedasticity


Figure 6: Historical FTSE100 Return Distribution vs. NGARCH Return Distribution (left: historical vs. NGARCH simulated return histogram against the normal; right: QQ plot of simulated versus historical quantiles).

case, and are therefore more suited to model risks of large movements than the Gaussian. Therefore, returns in GARCH will tend to be riskier than the Gaussian returns of GBM.

GARCH is not the only approach to account for fat tails, and this report will introduce other approaches to fat tails later on.

The particular structure of the GARCH model allows for volatility clustering (autocorrelation in the volatility), which means a period of high volatility will be followed by high volatility, and conversely a period of low volatility will be followed by low volatility. This is an often observed characteristic of financial data. For example, Figure 8 shows the monthly volatility of the FTSE100 index and the corresponding autocorrelation function, which confirms the presence of autocorrelation in the volatility.

A long line of research followed the seminal work of Bollerslev, leading to customisations and generalisations including exponential GARCH (EGARCH) models, threshold GARCH (TGARCH) models and non-linear GARCH (NGARCH) models among others. In particular, the NGARCH model introduced by Engle and Ng (1993) is an interesting extension of the GARCH(1,1) model, since it allows for asymmetric behaviour in the volatility, such that good news or positive returns yield subsequently lower volatility, while bad news or negative returns yield a subsequent increase in volatility. The NGARCH is specified as follows:

\frac{\Delta S(t_i)}{S(t_i)} = \mu\,\Delta t_i + \sigma(t_i)\,\Delta W(t_i)

\sigma(t_i)^2 = \omega + \alpha\,\sigma(t_{i-1})^2 + \beta\,(\epsilon(t_{i-1}) - \gamma\,\sigma(t_{i-1}))^2

\epsilon(t_i)^2 = (\sigma(t_i)\,\Delta W(t_i))^2   (23)

The NGARCH parameters ω, α and β are positive, and α, β and γ are subject to the stationarity constraint α + β(1 + γ²) < 1.

Compared to the GARCH(1,1), this model contains an additional parameter γ, which is an adjustment to the return innovations. γ equal to zero leads to the symmetric GARCH(1,1) model, in which positive and negative ε(t_{i-1}) have the same effect on the conditional variance. In the NGARCH γ is positive, thus reducing the impact of good news (ε(t_{i-1}) > 0) and increasing the impact of bad news (ε(t_{i-1}) < 0) on the variance.

MATLAB Routine 4 extends Routine 1 by simulating the GBM with the NGARCH volatility structure. Notice that there is only one loop in the simulation routine, which is the time loop. Since the GARCH model is path dependent it is no longer possible to vectorise the simulation over the time steps. For illustration purposes, this model is calibrated to the historic asset returns of the FTSE100 index. In order to estimate the parameters for the NGARCH geometric Brownian motion the maximum likelihood method introduced in the previous section is used.


Figure 7: NGARCH News Effect vs GARCH News Effect (panels over 4 years of daily simulation: market news level; GBM log-return volatility with GARCH and with NGARCH effects; GBM log returns, plain, with GARCH effects and with NGARCH effects).

Although the variance is changing over time and is state dependent and hence stochastic conditional on information at the initial time, it is locally deterministic, which means that conditional on the information at t_{i-1} the variance is known and constant in the next step between time t_{i-1} and t_i. In other words,


Figure 8: Monthly Volatility of the FTSE100 and Autocorrelation Function (left: monthly volatility in %, 1940 to 2003; right: sample ACF of the volatility).

    Code 4 MATLAB Code to Simulate GBM with NGARCH vol.

function S = NGARCH_simulation(N_Sim, T, params, S0)
    mu_ = params(1); omega_ = params(2); alpha_ = params(3);
    beta_ = params(4); gamma_ = params(5);
    S = S0*ones(N_Sim, T);
    v0 = omega_/(1 - alpha_ - beta_);   % starting variance
    v = v0*ones(N_Sim, 1);
    BM = normrnd(0, 1, N_Sim, T);
    for i = 2:T   % time loop: the variance recursion is path dependent
        sigma_ = sqrt(v);
        drift = (mu_ - 0.5*sigma_.^2);
        S(:, i) = S(:, i-1).*exp(drift + sigma_.*BM(:, i));
        v = omega_ + alpha_*v + beta_*(sqrt(v).*BM(:, i) - gamma_*sqrt(v)).^2;
    end
end

the variance in the GARCH models is locally constant. Therefore, conditional on the variance at t_{i-1} the density of the process at the next instant is still normal14. Moreover, the conditional returns are still independent. The log-likelihood function for the sample x_1, \ldots, x_i, \ldots, x_n of

X_i = \frac{\Delta S(t_i)}{S(t_i)}

is given by:

\mathcal{L}(\Theta) = \sum_{i=1}^{n} \log f(x_i; x_{i-1})   (24)

f(x_i; x_{i-1}) = f_N(x_i; \mu, \sigma_i^2) = \frac{1}{\sqrt{2\pi\sigma_i^2}} \exp\left(-\frac{(x_i - \mu)^2}{2\sigma_i^2}\right)   (25)

\mathcal{L}(\Theta) = \text{constant} + \frac{1}{2} \sum_{i=1}^{n} \left[-\log(\sigma_i^2) - (x_i - \mu)^2/\sigma_i^2\right]   (26)

where f is the density function of the normal distribution. Note the appearance of x_{i-1} in the function, which indicates that the normal density is only locally applicable between successive returns, as noticed previously. The parameter set for the NGARCH GBM includes five parameters, Θ = (μ, ω, α, β, γ). Due to the time changing state dependent variance one no longer finds explicit solutions for the individual parameters and has to use a numerical scheme to maximise the likelihood function through a solver. To do so one only needs to maximise the term in brackets in (26), since the constant and the factor do not depend on the parameters.

14 Note that the unconditional density of the NGARCH GBM is no longer normal, but an unknown heavy tailed distribution.


    Code 5 MATLAB Code to Estimate Parameters for GBM NGARCH.

function [mu_, omega_, alpha_, beta_, gamma_] = NG_calibration(data, params)
    returns = price2ret(data);
    returnsLength = length(returns);
    options = optimset('MaxFunEvals', 100000, 'MaxIter', 100000);
    params = fminsearch(@NG_JGBM_LL, params, options);
    mu_ = params(1); omega_ = abs(params(2)); alpha_ = abs(params(3));
    beta_ = abs(params(4)); gamma_ = params(5);
    function mll = NG_JGBM_LL(params)
        mu_ = params(1); omega_ = abs(params(2)); alpha_ = abs(params(3));
        beta_ = abs(params(4)); gamma_ = params(5);
        denum = 1 - alpha_ - beta_*(1 + gamma_^2);
        if denum < 0; mll = intmax; return; end   % variance stationarity test
        var_t = omega_/denum;                     % unconditional variance as starting value
        innov = returns(1) - mu_;
        LogLikelihood = log(exp(-innov^2/var_t/2)/sqrt(pi*2*var_t));
        for time = 2:returnsLength
            var_t = omega_ + alpha_*var_t + beta_*(innov - gamma_*sqrt(var_t))^2;
            if var_t < 0; mll = intmax; return; end
            innov = returns(time) - mu_;
            LogLikelihood = LogLikelihood + log(exp(-innov^2/var_t/2)/sqrt(pi*2*var_t));
        end
        mll = -LogLikelihood;
    end
end

Having estimated the best fit parameters one can verify the fit of the NGARCH model for the FTSE100 monthly returns using the QQ-plot. A large number of monthly returns is first simulated using Routine 4. The QQ plot of the simulated returns against the historical returns shows that the GARCH model does a significantly better job when compared to the constant volatility model of the previous section. The quantiles of the simulated NGARCH distribution are much more aligned with the quantiles of the historical return distribution than was the case for the plain normal distribution (compare the QQ-plots in Figures 2 and 6).

    4 Fat Tails: Jump Diffusion Models

Compared with the normal distribution of the returns in the GBM model, the log-returns of a GBM model with jumps are often leptokurtotic15. This section discusses the implementation of a jump-diffusion model with compound Poisson jumps. In this model, a standard time homogeneous Poisson process dictates the arrival of jumps, and the jump sizes are random variables with some preferred distribution (typically Gaussian, exponential or multinomial). Merton (1976) applied this model to option pricing for stocks that can be subject to idiosyncratic shocks, modelled through jumps. The model SDE can be written as:

dS(t) = \mu S(t)\,dt + \sigma S(t)\,dW(t) + S(t)\,dJ(t)   (27)

    JT =NT

    j=1(Yj 1), or dJ(t) = (YN(t) 1)dN(t), (28)

where (N_T)_{T \ge 0} follows a homogeneous Poisson process with intensity λ, and is thus distributed like a Poisson distribution with parameter λT. The Poisson distribution is a discrete probability distribution that expresses the probability of a number of events occurring in a fixed period of time. Since the Poisson distribution is the limiting distribution of a binomial distribution where the number of repetitions of the Bernoulli experiment n tends to infinity, each experiment with probability of success λ/n, it is usually adopted to model the occurrence of rare events. For a Poisson distribution with parameter λT, the Poisson probability mass function is:

f_P(x, \lambda T) = \frac{\exp(-\lambda T)(\lambda T)^x}{x!}, \quad x = 0, 1, 2, \ldots

15 Fat-tailed distribution.


(recall that by convention 0! = 1). The Poisson process (N_T)_{T \ge 0} counts the number of arrivals in [0, T], and Y_j is the size of the j-th jump. The Y_j are iid log-normal variables (Y_j \sim \exp(\mathcal{N}(\mu_Y, \sigma_Y^2))) that are also independent from the Brownian motion W and the basic Poisson process N.

Also in the jump diffusion case, and for comparison with (2), it can be useful to write the equation for the logarithm of S, which in this case reads:

d\log S(t) = \left(\mu - \frac{1}{2}\sigma^2\right)dt + \sigma\,dW(t) + \log(Y_{N(t)})\,dN(t), \quad \text{or, equivalently,}   (29)

d\log S(t) = \left(\mu + \lambda\mu_Y - \frac{1}{2}\sigma^2\right)dt + \sigma\,dW(t) + \left[\log(Y_{N(t)})\,dN(t) - \lambda\mu_Y\,dt\right]

where now both the jump (between square brackets) and diffusion shocks have zero mean, as is shown right below. Here too it is convenient to do MLE estimation in log-returns space, so that here too one defines the X(t_i)'s according to (10).

The solution of the SDE for the levels S is given easily by integrating the log equation:

S(T) = S(0)\exp\left(\left(\mu - \frac{1}{2}\sigma^2\right)T + \sigma W(T)\right) \prod_{j=1}^{N(T)} Y_j   (30)

Code 6 MATLAB Code to Simulate GBM with Jumps.

function S = JGBM_simulation(N_Sim, T, dt, params, S0)
    mu_star = params(1); sigma_ = params(2); lambda_ = params(3);
    mu_y_ = params(4); sigma_y_ = params(5);
    M_simul = zeros(N_Sim, T);
    for t = 1:T
        % Number of jumps per step and the compensated jump contribution, Equation (33)
        jumpnb = poissrnd(lambda_*dt, N_Sim, 1);
        jump = normrnd(mu_y_*(jumpnb - lambda_*dt), sqrt(jumpnb)*sigma_y_);
        M_simul(:, t) = mu_star*dt + sigma_*sqrt(dt)*randn(N_Sim, 1) + jump;
    end
    S = ret2price(M_simul', S0);   % convert the simulated log returns to price levels
end
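A usage sketch follows (the parameter values are hypothetical; note that the routine takes the drift μ* of Equation (33) directly):

    % One year of daily steps for 1000 paths of the jump-diffusion GBM
    params = [0, 0.15, 10, 0.03, 0.001];   % [mu*, sigma, lambda, mu_Y, sigma_Y]
    S = JGBM_simulation(1000, 250, 1/250, params, 100);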

The discretisation of Equation (30) for the given time step \Delta t is:

S(t) = S(t - \Delta t)\exp\left(\left(\mu - \frac{1}{2}\sigma^2\right)\Delta t + \sigma\sqrt{\Delta t}\,\epsilon\right) \prod_{j=1}^{n_t} Y_j   (31)

where \epsilon \sim \mathcal{N}(0, 1) and n_t = N_t - N_{t-\Delta t} counts the jumps between time t - \Delta t and t. When rewriting this equation for the log returns X(t) := \Delta\log(S(t)) = \log(S(t)) - \log(S(t - \Delta t)) we find:

X(t) = \Delta\log(S(t)) = \mu^*\,\Delta t + \sigma\sqrt{\Delta t}\,\epsilon + J^*_t   (32)

where the jumps J^*_t in the time interval \Delta t and the drift \mu^* are defined as:

J^*_t = \sum_{j=1}^{n_t} \log(Y_j) - \lambda\,\Delta t\,\mu_Y, \qquad \mu^* = \mu + \lambda\mu_Y - \frac{1}{2}\sigma^2   (33)

so that the jumps J^*_t have zero mean. Since the calibration and simulation are done in a discrete time framework, we use the fact that for the Poisson process with rate λ, the number of jump events occurring in the fixed small time interval \Delta t has expectation \lambda\Delta t. Compute:

E(J^*_t) = E\left(\sum_{j=1}^{n_t} \log Y_j\right) - \lambda\,\Delta t\,\mu_Y = E\left(E\left(\sum_{j=1}^{n_t} \log Y_j \,\Big|\, n_t\right)\right) - \lambda\,\Delta t\,\mu_Y = E(n_t\,\mu_Y) - \lambda\,\Delta t\,\mu_Y = 0


Figure 9: The Effect of the Jumps on the Simulated GBM Path and on the Probability Density of the Log Returns (one year of daily simulation: jump log-return levels, GBM path with and without jumps, and histograms of the simulated daily log returns). μ = 0, σ = 0.15, λ = 10, μ_Y = 0.03, σ_Y = 0.001

The simulation of this model is best done by using Equation (32). Conditional on the jumps occurrence n_t, the jump disturbance J^*_t is normally distributed:

J^*_t \,|\, n_t \sim \mathcal{N}\left((n_t - \lambda\Delta t)\mu_Y,\; n_t\,\sigma_Y^2\right).

Thus the conditional distribution of X(t) = \Delta\log(S(t)) is also normal and the first two conditional moments are:

E(\Delta\log(S(t)) \,|\, n_t) = \mu^*\Delta t + (n_t - \lambda\Delta t)\mu_Y = \left(\mu - \frac{1}{2}\sigma^2\right)\Delta t + n_t\,\mu_Y,   (34)

\mathrm{Var}(\Delta\log(S(t)) \,|\, n_t) = \sigma^2\Delta t + n_t\,\sigma_Y^2   (35)

The probability density function is the sum of the conditional probability densities weighted by the probability of the conditioning variable, the number of jumps. Then the calibration of the model parameters can be done using MLE on the log-returns X conditioning on the jumps.


    Code 7 MATLAB Code to Calibrate GBM with Jumps.

function [mu_star sigma_ lambda_ mu_y_ sigma_y_] = JGBM_calibration(data, dt, params)
returns = price2ret(data);
dataLength = length(returns);
options = optimset('MaxFunEvals', 100000, 'MaxIter', 100000);
params = fminsearch(@FT_JGBM_LL, params, options);

mu_star = params(1); sigma_ = abs(params(2)); lambda_ = abs(params(3));
mu_y_ = params(4); sigma_y_ = abs(params(5));

function mll = FT_JGBM_LL(params)
    mu_ = params(1); sigma_ = abs(params(2)); lambda_ = abs(params(3));
    mu_y_ = params(4); sigma_y_ = abs(params(5));

    Max_jumps = 5;
    factoriel = factorial(0:Max_jumps);
    LogLikelihood = 0;
    for time = 1:dataLength
        ProbDensity = 0;
        for jumps = 0:Max_jumps-1
            jumpProb = exp(-lambda_*dt)*(lambda_*dt)^jumps/factoriel(jumps+1);
            condVol = dt*sigma_^2 + jumps*sigma_y_^2;
            condMean = mu_*dt + mu_y_*(jumps - lambda_*dt);
            condToJumps = exp(-(returns(time)-condMean)^2/condVol/2)/sqrt(pi*2*condVol);
            ProbDensity = ProbDensity + jumpProb*condToJumps;
        end
        LogLikelihood = LogLikelihood + log(ProbDensity);
    end
    mll = -LogLikelihood;
end
end

The parameters are then obtained by maximising the log-likelihood:

$$\operatorname*{argmax}_{\mu,\,\mu_Y,\;\sigma>0,\;\lambda>0,\;\sigma_Y>0} \mathcal{L}^*, \qquad \mathcal{L}^* = \log(\mathcal{L}) \qquad (36)$$

The log-likelihood function for the returns ΔX(t) at observed times t = t_1, ..., t_n, with values x_1, ..., x_n and Δt = t_i − t_{i−1}, is then given by:

$$\mathcal{L}(\Theta) = \sum_{i=1}^{n}\log f(x_i;\,\mu,\mu_Y,\sigma,\lambda,\sigma_Y) \qquad (37)$$

$$f(x_i;\,\mu,\mu_Y,\sigma,\lambda,\sigma_Y) = \sum_{j=0}^{+\infty} P(n_t = j)\; f_N\!\left(x_i;\,\left(\mu - \frac{\sigma^2}{2}\right)\Delta t + j\mu_Y,\;\sigma^2\Delta t + j\sigma_Y^2\right) \qquad (38)$$

This is an infinite mixture of Gaussian random variables, each weighted by a Poisson probability

P(n_t = j) = f_P(j; λΔt). Notice that if Δt is small, the Poisson process typically jumps at most once. In that case, the above expression simplifies to

$$f(x_i;\,\mu,\mu_Y,\sigma,\lambda,\sigma_Y) = (1-\lambda\Delta t)\, f_N\!\left(x_i;\,\left(\mu - \frac{\sigma^2}{2}\right)\Delta t,\;\sigma^2\Delta t\right) + \lambda\Delta t\; f_N\!\left(x_i;\,\left(\mu - \frac{\sigma^2}{2}\right)\Delta t + \mu_Y,\;\sigma^2\Delta t + \sigma_Y^2\right),$$

i.e. a mixture of two Gaussian random variables weighted by the probabilities of zero or one jump in Δt.
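For illustration, this simplified two-component mixture density can be coded in a few lines. The function below is a minimal sketch (the name JGBM_mixture_pdf is ours and hypothetical, not one of the numbered listings), valid only when λΔt is small.

function f = JGBM_mixture_pdf(x, mu, sigma_, lambda_, mu_y_, sigma_y_, dt)
% Two-Gaussian mixture approximation to the jump diffusion return density:
% no jump with probability (1 - lambda*dt), one jump with probability lambda*dt
f = (1 - lambda_*dt)*normpdf(x, (mu - sigma_^2/2)*dt, sigma_*sqrt(dt)) ...
  + lambda_*dt*normpdf(x, (mu - sigma_^2/2)*dt + mu_y_, sqrt(sigma_^2*dt + sigma_y_^2));
end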

The model parameters are estimated by MLE for the FTSE100 on monthly returns, and different paths are then simulated. The goodness of fit is seen in the QQ-plot of the simulated monthly log returns against the historical FTSE100 monthly returns in Fig. 10. This plot shows that the jump model fits the data better than the simple GBM model.

Finally, one can add markedly negative jumps to the arithmetic Brownian motion return process (29), and then exponentiate it to obtain a new level process for S featuring a more complex jump behaviour. Indeed, if Y is larger than one and σ_Y² is not large, jumps in the log-returns will tend to be markedly positive only, which may not be realistic in some cases. For details on how to incorporate negative jumps and on the more complex mixture features this induces, the reader can refer to Section 10, where the more general mean reverting case is described.


[Figure 10 here: left, overlaid return histograms (Historical, Jumps GBM, GBM); right, QQ-plot of simulated against historical quantiles.]

Figure 10: Historical FTSE100 Return Distribution vs. Jump GBM Return Distribution

5 Fat Tails: Variance Gamma (VG) process

The two previous sections have addressed the modelling of phenomena where numerically large changes in value ('fat tails') are more likely than in the case of the normal distribution. This section presents yet another method to accomplish fat tails, this time through a process obtained by time changing a Brownian motion (with drift θ and volatility σ) with an independent subordinator, that is, an increasing process with independent and stationary increments. In a way, instead of making the volatility state-dependent as in GARCH or random as in stochastic volatility models, and instead of adding a new source of randomness like jumps, one makes the time of the process random through a new process, the subordinator. A possible related interpretative feature is that financial time, or market activity time, related to trading, is a random transform of calendar time.

Two models based on Brownian subordination are the Variance Gamma (VG) process and the Normal Inverse Gaussian (NIG) process. They are a subclass of the generalised hyperbolic (GH) distributions. They are characterised by drift parameters, a volatility parameter of the Brownian motion, and a variance parameter of the subordinator.

Both VG and NIG have exponential tails; the tail decay rate is the same for the VG and NIG distributions and is smaller than for the normal distribution. Therefore these models allow for more flexibility in capturing fat-tail features and the high kurtosis of some historical financial distributions with respect to GBM.

Among fat-tail distributions obtained through subordination, this section will focus on the VG distribution, which can also be thought of as a mixture of normal distributions, where the mixing weights density is given by the Gamma distribution of the subordinator. The philosophy is not so different from the GBM with jumps seen before, where the return process is also a mixture. The VG distribution was introduced in the financial literature by Madan and Seneta (1990).^16

The model considered for the log-returns of an asset price is of this type:

$$d\log S(t) = \bar\mu\,dt + \theta\,dg(t) + \sigma\,dW(g(t)), \qquad S(0) = S_0 \qquad (39)$$

where μ̄, θ and σ are real constants and σ ≥ 0.^17 This model differs from the usual notation of Brownian motion mainly in the term g(t). In fact one introduces g(t) to characterise the market activity time. One can define the market time as a positive increasing random process, g(t), with stationary increments g(u) − g(t) for u ≥ t ≥ 0. In probabilistic jargon g(t) is a subordinator. An important assumption is that E[g(u) − g(t)] = u − t: the market time has to reconcile with the calendar time between t and u on average. Conditional on the subordinator g, between two instants t and u:

$$\big(W(g(u)) - W(g(t))\big)\,\big|\,g \;\sim\; \varepsilon\,\sqrt{g(u) - g(t)}, \qquad (40)$$

^16 For a more advanced introduction and background on the above processes refer to Seneta (2004); Cont and Tankov (2004); Moosbrucker (2006) and Madan, Carr, and Chang (1998).
^17 In addition one may say that (39) is a Levy process of finite variation but of relatively low activity with small jumps.


where ε is a standard Gaussian, and hence:

$$\big(\log(S(u)/S(t)) - \bar\mu(u-t)\big)\,\big|\,g \;\sim\; N\big(\theta\,(g(u)-g(t)),\;\sigma^2\,(g(u)-g(t))\big) \qquad (41)$$

VG assumes {g(t)} ∼ Γ(t/ν, ν), a Gamma process with parameter ν that is independent of the standard Brownian motion {W_t}_{t≥0}. Notice also that the Gamma process definition implies {g(u) − g(t)} ∼ Γ((u−t)/ν, ν) for any u and t: increments are independent and stationary Gamma random variables. Consistently with the above condition on the expected subordinator, this Gamma assumption implies E[g(u) − g(t)] = u − t and Var(g(u) − g(t)) = ν(u − t).
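These two moment conditions are easy to check numerically; the following lines are a purely illustrative sketch, with ν, the horizon t and the sample size chosen arbitrarily.

% Illustrative check of E[g(t)] = t and Var(g(t)) = nu*t for the Gamma subordinator
nu = 0.4; t = 2; N = 1e6;
g = gamrnd(t/nu, nu, N, 1);   % Gamma(t/nu, nu) subordinator values at horizon t
[mean(g) var(g)]              % should be close to [t, nu*t]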

Through iterated expectations, one has easily that:

$$E[\log S(u) - \log S(t)] = (\bar\mu + \theta)(u-t), \qquad \mathrm{Var}(\log S(u) - \log S(t)) = (\sigma^2 + \nu\theta^2)(u-t).$$

Mean and variance of these log-returns can be compared to the log-returns of the GBM in Equation (3), and found to be the same if μ̄ + θ matches the GBM log-return drift and σ² + νθ² matches the GBM log-return variance per unit time. In this case, the differences will be visible in the higher order moments.

Code 8 MATLAB code that simulates the asset model in eq. (39).

function S = VG_simulation(N_Sim, T, dt, params, S0)
theta = params(1); nu = params(2); sigma = params(3); mu = params(4);
g = gamrnd(dt/nu, nu, N_Sim, T);  % Gamma subordinator increments, one path per row
% cumulate the exponentiated VG returns along time (dimension 2)
S = S0*cumprod([ones(N_Sim,1) exp(mu*dt + theta*g + sigma*sqrt(g).*randn(N_Sim,T))], 2);

    The process in equation (39) is called a Variance-Gamma process.

    5.1 Probability Density Function of the VG Process

The unconditional density may then be obtained by integrating out g in Eq. (41) through its Gamma density. This corresponds to the following density function, where one sets u − t =: Δt and ΔX_t := log(S(t + Δt)/S(t)):

$$f_{\Delta X_t}(x) = \int_0^{+\infty} f_N\big(x;\,\bar\mu\Delta t + \theta g,\;\sigma^2 g\big)\, f_\Gamma\big(g;\,\Delta t/\nu,\,\nu\big)\,dg \qquad (42)$$

As mentioned above, this is again an (infinite) mixture of Gaussian densities, where the weights are given by the Gamma densities rather than Poisson probabilities. The above integral converges and the PDF of the VG process ΔX_t defined in Eq. (39) is

$$f_{\Delta X_t}(x) = \frac{2\,e^{\theta(x-\bar\mu\Delta t)/\sigma^2}}{\nu^{\Delta t/\nu}\,\sqrt{2\pi}\;\sigma\,\Gamma(\Delta t/\nu)}\left(\frac{|x-\bar\mu\Delta t|}{\sqrt{2\sigma^2/\nu+\theta^2}}\right)^{\Delta t/\nu-1/2} K_{\Delta t/\nu-1/2}\!\left(\frac{|x-\bar\mu\Delta t|\,\sqrt{2\sigma^2/\nu+\theta^2}}{\sigma^2}\right) \qquad (43)$$

Here K_η(·) is a modified Bessel function of the third kind with index η, given for ω > 0 by

$$K_\eta(\omega) = \frac{1}{2}\int_0^{+\infty} y^{\eta-1}\exp\!\left\{-\frac{\omega}{2}\left(y^{-1} + y\right)\right\} dy$$
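In MATLAB, K_η is available as the built-in besselk. As an illustrative sanity check (with arbitrary η and ω, and a finite upper integration limit introduced here as a truncation), one can compare it with a direct numerical evaluation of the integral:

% Illustrative check of the integral representation against besselk
eta = 0.75; w = 1.3;
integrand = @(y) 0.5*y.^(eta-1).*exp(-(w/2)*(1./y + y));
[quadl(integrand, 1e-8, 50)  besselk(eta, w)]   % the two values should agree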

    (y1 + y) dyCode 9 MATLAB code of the variance gamma density function as in equation (43).

function fx = VGdensity(x, theta, nu, sigma, mu, T)

v1 = 2*exp((theta*(x-mu))/sigma^2)/((nu^(T/nu))*sqrt(2*pi)*sigma*gamma(T/nu));
M2 = (2*sigma^2)/nu + theta^2;
v3 = abs(x-mu)./sqrt(M2);
v4 = v3.^(T/nu - 0.5);
v6 = (abs(x-mu).*sqrt(M2))./sigma^2;
K = besselk(T/nu - 0.5, v6);
fx = v1.*v4.*K;


[Figure 11 here: left, VG densities; right, simulated VG path samples S_t over 500 days of simulation.]

Figure 11: On the left side, VG densities for θ = 0.4, σ = 0.9, μ̄ = 0 and various values of the parameter ν. On the right, simulated VG path samples. To replicate the simulation, use Code 8 with the parameters θ = μ̄ = 0, ν = 6 and σ = 0.03.

    5.2 Calibration of the VG Process

Given a time series of independent log-returns log(S(t_{i+1})/S(t_i)) (with t_{i+1} − t_i = Δt), say (x_1, x_2, ..., x_n), and denoting by Θ = {θ, σ, ν, μ̄} the variance-gamma probability density function parameters that one would like to estimate, the maximum likelihood estimation (MLE) of Θ consists in finding Θ̂ that maximises the logarithm of the likelihood function $\mathcal{L}(\Theta) = \prod_{i=1}^{n} f(x_i;\Theta)$. The moment generating function of ΔX_t := log(S(t + Δt)/S(t)), E[e^{zΔX}], is given by:

$$M_{\Delta X}(z) = e^{\bar\mu z\,\Delta t}\left(1 - \theta\nu z - \frac{1}{2}\sigma^2\nu z^2\right)^{-\Delta t/\nu} \qquad (44)$$

and one obtains the four central moments of the return distribution over an interval of length Δt:

$$\begin{aligned}
E[X] &= \mu_X = (\bar\mu + \theta)\Delta t\\
E[(X - \mu_X)^2] &= (\sigma^2 + \nu\theta^2)\Delta t\\
E[(X - \mu_X)^3] &= (2\theta^3\nu^2 + 3\sigma^2\theta\nu)\Delta t\\
E[(X - \mu_X)^4] &= (3\sigma^4\nu + 12\sigma^2\theta^2\nu^2 + 6\theta^4\nu^3)\Delta t + (3\sigma^4 + 6\sigma^2\theta^2\nu + 3\theta^4\nu^2)(\Delta t)^2.
\end{aligned}$$

Now considering the skewness and the kurtosis:

$$S = \frac{(2\theta^3\nu^2 + 3\sigma^2\theta\nu)\,\Delta t}{\big((\sigma^2 + \nu\theta^2)\Delta t\big)^{3/2}}, \qquad K = \frac{(3\sigma^4\nu + 12\sigma^2\theta^2\nu^2 + 6\theta^4\nu^3)\,\Delta t + (3\sigma^4 + 6\sigma^2\theta^2\nu + 3\theta^4\nu^2)(\Delta t)^2}{\big((\sigma^2 + \nu\theta^2)\Delta t\big)^{2}},$$

and assuming θ to be small, thus ignoring θ², θ³ and θ⁴, one obtains:

$$S = \frac{3\theta\nu}{\sigma\sqrt{\Delta t}}, \qquad K = 3\left(1 + \frac{\nu}{\Delta t}\right).$$


So the initial guess Θ_0 for the parameters will be:

$$\sigma_0 = \sqrt{\frac{V}{\Delta t}}, \qquad \nu_0 = \left(\frac{K}{3} - 1\right)\Delta t, \qquad \theta_0 = \frac{S\,\sigma_0\,\sqrt{\Delta t}}{3\,\nu_0}, \qquad \bar\mu_0 = \frac{\bar X}{\Delta t} - \theta_0,$$

where X̄, V, S and K denote the sample mean, variance, skewness and kurtosis of the observed log-returns.

Code 10 MATLAB code of the MLE for the variance gamma density function in (43).

% function params = VG_calibration(data, dt)
% computation of the initial parameters
data = (EURUSDsv(:,1));
hist(data)

dt = 1;

data = price2ret(data);
M = mean(data);
V = var(data);          % sample variance (the initial guess formulas use the variance)
S = skewness(data);
K = kurtosis(data);
sigma = sqrt(V/dt);
nu = (K/3 - 1)*dt;
theta = (S*sigma*sqrt(dt))/(3*nu);
mu = (M/dt) - theta;

% VG MLE
pdf_VG = @(data, theta, nu, sigma, mu) VGdensity(data, theta, nu, sigma, mu, dt);
start = [theta, nu, sigma, mu];
lb = [-intmax 0 0 -intmax];
ub = [intmax intmax intmax intmax];
options = statset('MaxIter', 1000, 'MaxFunEvals', 10000);
params = mle(data, 'pdf', pdf_VG, 'start', start, 'lower', lb, 'upper', ub, 'options', options);

[Figure 12 here: left, VG densities at horizons from 1yr to 10yr; right, the 0.1, 0.25, 0.5, 0.75, 0.9, 0.95 and 0.99 percentiles of these densities as functions of time.]

Figure 12: On the left side, VG densities for θ = 1, ν = 0.4, σ = 1.4, μ̄ = 0 and time from 1 to 10 years. On the right side, on the y-axis, percentiles of these densities as time varies on the x-axis.

An example can be the calibration of the historical time series of the EUR/USD exchange rate, where the VG process is a more suitable choice than the geometric Brownian motion. The fit is performed on daily returns and the plot in Fig. 13 illustrates the results.


Code 11 MATLAB code to calculate the percentiles of the VG distribution as in Fig. 12.

clear all; close all

lb = 0; ub = 20;
options = optimset('Display', 'off', 'TolFun', 1e-5, 'MaxFunEvals', 2500);

X = linspace(-15, 30, 500);
sigma = 1.4;

f = @(x,t) VGdensity(x, 1, .4, sigma, 0, t);
nT = 10;
t = linspace(1, 10, nT);

pc = [.1 .25 .5 .75 .9 .95 .99];

for k = 1:nT
    Y(:,k) = VGdensity(X, 1, .4, sigma, 0, t(k));
    f = @(x) VGdensity(x, 1, .4, sigma, 0, t(k));
    VG_PDF = quadl(f, -3, 30);
    for zz = 1:length(pc)
        percF = @(a) quadl(f, -3, a) - pc(zz);
        PercA(k,zz) = fzero(percF, 1, options);
    end
end

subplot(1,2,1)
plot(X, Y);
legend('1yr','2yr','3yr','4yr','5yr','6yr','7yr','8yr','9yr','10yr')
xlim([-5 27]); ylim([0 .32]); subplot(1,2,2); plot(t, PercA);
for zz = 1:length(pc)
    text(t(zz), PercA(zz,zz), ['{\fontsize{9}\color{black}', num2str(pc(zz)), '}'], 'HorizontalAlignment', ...
        'center', 'BackgroundColor', [1 1 1]); hold on
end
xlim([.5 10]); xlabel('years'); grid on

Notice in particular that the difference between the arithmetic Brownian motion for the returns and the VG process is expressed by the time change parameter ν, all other parameters being roughly the same in the ABM and VG cases.

    6 The Mean Reverting Behaviour

One can define mean reversion as the property to always revert to a certain constant or time-varying level with limited variance around it. This property holds for an AR(1) process if the absolute value of the autoregression coefficient is less than one (|α| < 1). To be precise, the formulation of the first-order autoregressive process AR(1) is:

$$x_{t+1} = \mu + \alpha x_t + \sigma\varepsilon_{t+1} \;\Longleftrightarrow\; \Delta x_{t+1} = (1-\alpha)\left(\frac{\mu}{1-\alpha} - x_t\right) + \sigma\varepsilon_{t+1} \qquad (45)$$

    All the mean reverting behaviour in the processes that are introduced in this section is due to anAR(1) feature in the discretised version of the relevant SDE.

In general, before thinking about fitting a specific mean reverting process to available data, it is important to check the validity of the mean reversion assumption. A simple way to do this is to test for stationarity. Since for the AR(1) process |α| < 1 is also a necessary and sufficient condition for stationarity, testing for mean reversion is equivalent to testing for stationarity. Note that in the special case where α = 1, the process behaves like a pure random walk with constant drift, Δx_{t+1} = μ + σε_{t+1}.
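A minimal simulation sketch contrasting the two cases (purely illustrative, using the parameter values of Figure 14) is:

% AR(1) with alpha = 0.7 vs. random walk (alpha = 1): the AR(1) variance
% stabilises while the random walk variance grows linearly in time
mu = 0; alpha = 0.7; sigma = 1; T = 120; N = 1000;
eps_ = randn(N, T);
x_ar = zeros(N, T); x_rw = zeros(N, T);
for t = 2:T
    x_ar(:,t) = mu + alpha*x_ar(:,t-1) + sigma*eps_(:,t);   % stationary AR(1)
    x_rw(:,t) = mu + x_rw(:,t-1) + sigma*eps_(:,t);         % random walk
end
plot(1:T, var(x_ar), 'b', 1:T, var(x_rw), 'r')              % variance across paths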

There is a growing number of stationarity statistical tests available and ready for use in many econometric packages.^18 The most popular are:

- the Dickey and Fuller (1979, DF) test;
- the Augmented DF (ADF) test of Said and Dickey (1984);

^18 For example Stata, SAS, R, E-Views, etc.


Figure 13: On the top chart the historical trend of the EUR/USD is reported. In the middle chart the daily return distribution presents fat tails with respect to a normal distribution; the MLE parameters for the normal r.v. are μ = 0.0002 and σ = 0.0061, and the QQ-plot in the middle shows clearly the presence of tails more pronounced than in the normal case. The bottom chart is the plot of the daily return density against the estimated VG one, and the QQ-plot of the fitted daily historical EUR/USD FX return time series using a VG process with parameters θ = 0.0001, ν = 0.4, σ = 0.0061, μ̄ = 0.0001.

- the Phillips and Perron (1988, PP) test;
- the Variance Ratio (VR) test of Poterba and Summers (1988) and Cochrane (2001); and
- the Kwiatkowski, Phillips, Schmidt, and Shin (1992, KPSS) test.

All tests give results that are asymptotically valid, and are constructed under specific assumptions. Once stationarity has been established for the time series data, the only remaining test is for the statistical significance of the autoregressive lag coefficient: a simple ordinary least squares (OLS) regression of the time series on its one-period lagged version (Equation (45)) allows one to assess whether α is effectively significant.
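A minimal sketch of this OLS check, assuming the observations are stored in a column vector x, is:

% OLS of x(t) on x(t-1) as in Equation (45); any OLS routine will do
y = x(2:end); X = [ones(length(y),1) x(1:end-1)];
beta  = X\y;                                   % [mu; alpha] estimates
resid = y - X*beta;
s2    = (resid'*resid)/(length(y) - 2);        % residual variance
tstat = beta./sqrt(s2*diag(inv(X'*X)))         % t-statistics of mu and alpha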

Suppose that, as an example, one is interested in analysing the GLOBOXX index history. Due to the lack of Credit Default Swap (CDS) data for the time prior to 2004, the underlying GLOBOXX spread history has to be proxied by an index based on the Merrill Lynch corporate indices.^19 This proxy index has appeared to be mean reverting. Then the risk of having the CDS spreads rising in the medium term to revert to the index historical long-term mean may impact the risk analysis of a product linked to the GLOBOXX index.

The mean reversion of the spread indices is supported by the intuition that the credit spread is linked to the economic cycle. This assumption also has some support from the econometric analysis. To illustrate this, the mean reversion of a Merrill Lynch index, namely the EMU Corporate A rated 5-7Y index,^20 is tested using the Augmented Dickey-Fuller test (Said and Dickey (1984)).

^19 Asset Swap Spreads data of a portfolio of equally weighted Merrill Lynch Indices that match both the average credit quality of GLOBOXX, its duration, and its geographical composition.
^20 Bloomberg Ticker ER33; Asset Swap Spreads data from 01/98 to 08/07.


The ADF test for mean reversion is sensitive to outliers.^21 Before testing for mean reversion, the index data have been cleaned by removing the innovations that exceed three times the volatility, as they are considered outliers. Figure 15 shows the EMU Corporate A rated 5-7Y index cleaned from outliers, together with the autocorrelation and partial autocorrelation plots of its log series. From this chart one can deduce that the weekly log spread of the EMU Corporate A rated 5-7Y index shows a clear

AR(1) pattern. The AR(1) process is characterised by exponentially decreasing autocorrelations and by a partial autocorrelation that reaches zero after the first lag (see the discussion around Equation (8) for the related definitions):

$$\mathrm{ACF}(k) = \alpha^k,\ k = 0,1,2,\ldots; \qquad \mathrm{PACF}(1) = \alpha; \qquad \mathrm{PACF}(k) = 0,\ k = 2,3,4,\ldots \qquad (46)$$

The mean reversion of the time series has been tested using the Augmented Dickey-Fuller test (Said and Dickey (1984)). The ADF statistic obtained on this time series is -3.7373; the more negative the ADF statistic, the stronger the rejection of the unit root hypothesis.^22 The first and fifth percentiles of the test distribution are -3.44 and -2.87, so the test rejects the null hypothesis of a unit root at the 1% significance level.
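If the MATLAB Econometrics Toolbox is available, an ADF test of this kind can be run as sketched below; the variable name spread_data and the number of lags are illustrative assumptions, not part of the original analysis.

% Illustrative ADF test on the weekly log spreads (Econometrics Toolbox)
y = log(spread_data);                              % weekly index spreads, column vector
[h, pValue, stat] = adftest(y, 'model', 'ARD', 'lags', 4);
stat                                               % compare with the critical values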

    Having discussed AR(1) stylised features in general, this tutorial now embarks on describing specific

    models featuring this behaviour in continuous time.

    7 Mean Reversion: The Vasicek Model

    The Vasicek model, owing its name to Vasicek (1977), is one of the earliest stochastic models of the short-term interest rate. It assumes that the instantaneous spot rate (or short rate) follows an Ornstein-Uhlenbeck process with constant coefficients under the statistical or objective measure used for historicalestimation:

$$dx_t = \alpha(\theta - x_t)\,dt + \sigma\,dW_t \qquad (47)$$

^21 For an outlier due to a bad quote, the up and down movement can be interpreted as a strong mean reversion in that particular data point and is able to influence the final result.
^22 Before cleaning the data from the outliers the ADF statistic was -5.0226. This confirms the effect of outliers in amplifying the detected level of mean reversion.

[Figure 14 here: left, AR(1) paths (in blue) vs. random walk paths (in red); right, variance of the stationary AR(1) process (in blue) vs. variance of the random walk (in red) over simulation time.]

Figure 14: The random walk vs. the AR(1) stationary process. (AR(1): μ = 0, α = 0.7, σ = 1; Random Walk: μ = 0, σ = 1)


Figure 15: On the top: the EMU Corporate A rated 5-7Y weekly index from 01/98 to 08/07. On the bottom: the plots of the first 30 sample autocorrelations and partial autocorrelations of this time series.

Figure 16: The histogram of the EMU Corporate A rated 5-7Y index spread levels.


with α, θ and σ positive and dW_t a standard Wiener process. Under the risk neutral measure used for valuation and pricing one may assume a similar parametrisation for the same process but with one more parameter in the drift modelling the market price of risk; see for example Brigo and Mercurio (2006).

This model assumes a mean-reverting stochastic behaviour of interest rates. The Vasicek model has a major shortcoming, namely the non-null probability of negative rates. This is an unrealistic property for the modelling of positive entities like interest rates or credit spreads. The explicit solution to the SDE (47) between any two instants s and t, with 0 ≤ s < t, can be easily obtained from the solution to the Ornstein-Uhlenbeck SDE, namely:

    the modelling of positive entities like interest rates or credit spreads. The explicit solution to the SDE(47) between any two instants s and t, with 0 s < t, can be easily obtained from the solution to theOrnstein-Uhlenbeck SDE, namely:

    xt =

    1 e(ts)

    + xse(ts) + et

    ts

    eudWu (48)

The discrete time version of this equation, on a time grid 0 = t_0, t_1, t_2, ... with (assumed constant for simplicity) time step Δt = t_i − t_{i−1}, is:

$$x(t_i) = c + b\,x(t_{i-1}) + \delta\,\varepsilon(t_i) \qquad (49)$$

where the coefficients are:

$$c = \theta\left(1 - e^{-\alpha\Delta t}\right) \qquad (50)$$

$$b = e^{-\alpha\Delta t} \qquad (51)$$

and ε(t) is a Gaussian white noise (ε ∼ N(0, 1)). The volatility δ of the innovations can be deduced by the Ito isometry:

$$\delta = \sigma\sqrt{\left(1 - e^{-2\alpha\Delta t}\right)/(2\alpha)} \qquad (52)$$

whereas if one used the Euler scheme to discretise the equation this would simply be σ√Δt, which is the same (to first order in Δt) as the exact expression above.
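As a quick numerical illustration of this last point (parameter values chosen arbitrarily):

% Exact innovation volatility (52) vs. the Euler approximation sigma*sqrt(dt)
alpha = 8; sigma = 25; dt = 1/52;
delta_exact = sigma*sqrt((1 - exp(-2*alpha*dt))/(2*alpha));
delta_euler = sigma*sqrt(dt);
[delta_exact delta_euler]   % close for small alpha*dt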

Equation (49) is exactly the formulation of an AR(1) process (see Section 6). Since α > 0 implies 0 < b < 1, this AR(1) process is stationary and mean reverting to a long-term mean given by θ.

    One can check this also directly by computing mean and variance of the process. As the distribution ofx(t) is Gaussian, it is characterised by its first two moments. The conditional mean and the conditionalvariance of x(t) given x(s) can be derived from Equation (48):

$$E_s[x_t] = \theta + (x_s - \theta)\,e^{-\alpha(t-s)} \qquad (53)$$

$$\mathrm{Var}_s[x_t] = \frac{\sigma^2}{2\alpha}\left(1 - e^{-2\alpha(t-s)}\right) \qquad (54)$$

One can easily check that as time increases the mean tends to the long-term value θ and the variance remains bounded, implying mean reversion. More precisely, the long-term distribution of the Ornstein-Uhlenbeck process is stationary and Gaussian, with θ as mean and $\sqrt{\sigma^2/(2\alpha)}$ as standard deviation.

To calibrate the Vasicek model the discrete form of the process, Equation (49), is used. The coefficients c, b and δ are calibrated on Equation (49): the calibration is simply an OLS regression of the time series x(t_i) on its lagged form x(t_{i−1}), and the OLS regression provides the maximum likelihood estimators for c, b and δ. Solving the resulting three-equation system gives the α, θ and σ parameters:

$$\alpha = -\ln(b)/\Delta t \qquad (55)$$

$$\theta = c/(1 - b) \qquad (56)$$

$$\sigma = \delta\Big/\sqrt{(b^2 - 1)\Delta t/(2\ln(b))} \qquad (57)$$

    One may compare the above formulas with the estimators derived directly via maximum likelihood inBrigo and Mercurio (2006), where for brevity x(ti) is denoted by xi and n is assumed to be the numberof observations.


Figure 17: The distribution of the long-term mean, the long-term variance and the 99th percentile estimated by several simulations of a Vasicek process with α = 8, θ = 50 and σ = 25.


$$b = \frac{n\sum_{i=1}^{n} x_i x_{i-1} - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} x_{i-1}}{n\sum_{i=1}^{n} x_{i-1}^2 - \left(\sum_{i=1}^{n} x_{i-1}\right)^2}, \qquad (58)$$

$$\theta = \frac{\sum_{i=1}^{n}\left[x_i - b\,x_{i-1}\right]}{n(1 - b)}, \qquad (59)$$

$$\delta^2 = \frac{1}{n}\sum_{i=1}^{n}\left[x_i - b\,x_{i-1} - \theta(1 - b)\right]^2. \qquad (60)$$

    From these estimators, using (55) and (57) one obtains immediately the estimators for the parameters and .

    Code 12 MATLAB Calibration of the Vasicek process.

function ML_VasicekParams = Vasicek_calibration(V_data, dt)

N = length(V_data);
x = [ones(N-1, 1) V_data(1:N-1)];
ols = (x'*x)^(-1)*(x'*V_data(2:N));
resid = V_data(2:N) - x*ols;

c = ols(1);
b = ols(2);
delta = std(resid);

alpha = -log(b)/dt;
theta = c/(1-b);
sigma = delta/sqrt((b^2-1)*dt/(2*log(b)));

ML_VasicekParams = [alpha theta sigma];

For Monte Carlo simulation purposes the discrete time expression of the Vasicek model is used. The simulation is straightforward using the calibrated coefficients α, θ and σ. One only needs to draw a random normal process ε_t and then generate recursively the new x_{t_i} process, where now the t_i are future instants, using Equation (49), starting the recursion with the current value x_0.
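A minimal simulation sketch along these lines (the function name Vasicek_simulation is ours, mirroring the style of Code 12) is:

function x = Vasicek_simulation(N_Sim, T, dt, params, x0)
% Simulate N_Sim paths of the Vasicek process over T steps of size dt,
% using the exact discretisation (49) with coefficients (50)-(52)
alpha = params(1); theta = params(2); sigma = params(3);
b = exp(-alpha*dt);
c = theta*(1 - b);
delta = sigma*sqrt((1 - exp(-2*alpha*dt))/(2*alpha));
x = [x0*ones(N_Sim,1) zeros(N_Sim,T)];
for i = 2:T+1
    x(:,i) = c + b*x(:,i-1) + delta*randn(N_Sim,1);
end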

    8 Mean Reversion: The Exponential Vasicek Model

To address the positivity of interest rates and to ensure that their distribution has fatter tails, it is possible to transform a basic Vasicek process y(t) into a new positive process x(t).

    The most common transformations to obtain a positive number x from a possibly negative number yare the square and the exponential functions.

By squaring or exponentiating a normal random variable y one obtains a random variable x with fat tails. Since the exponential function increases faster than the square and is larger (exp(y) ≫ y² for positive and large y), the tails in the exponential Vasicek model below will be fatter than the tails in the squared Vasicek (and CIR as well). In other terms, a lognormal distribution (exponential Vasicek) has fatter tails than a chi-squared (squared Vasicek) or even a noncentral chi-squared (CIR) distribution.

Therefore, if one assumes y to follow a Vasicek process,

$$dy_t = \alpha(\theta - y_t)\,dt + \sigma\,dW_t \qquad (61)$$

then one may transform y. If one takes the square to ensure positivity and defines x(t) = y(t)², then by Ito's formula x satisfies:

$$dx(t) = \left(B + A\sqrt{x(t)} - C\,x(t)\right)dt + \nu\sqrt{x(t)}\,dW_t$$

for suitable constants A, B, C and ν. This squared Vasicek model is similar to the CIR model that the subsequent section will address, so it will not be discussed further here.
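For completeness, the constants can be identified explicitly. Applying Ito's formula to x(t) = y(t)² with dy as in (61), and writing y = √x (for y ≥ 0), gives

$$dx(t) = \big(\sigma^2 + 2\alpha\theta\sqrt{x(t)} - 2\alpha\,x(t)\big)\,dt + 2\sigma\sqrt{x(t)}\,dW_t,$$

so that B = σ², A = 2αθ, C = 2α and ν = 2σ.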

