
Approximate smiles in an extended SABR model

d-fine

Exeter College

University of Oxford

A thesis submitted in partial fulfillment of the MSc in

Mathematical Finance

April 18, 2012

Abstract

Hagan’s asymptotic expansion for the SABR model [27] is widely used by practitioners

for fitting the smile of vanilla interest-rate options. However, it is well known to break

down for very long dated options, large volatility of volatility or very small strikes. With

the current very low short-term rates in the market, deficiencies in the underlying CEV

model, namely an absorbing boundary at zero rates, become more acute. Thus, other

choices of local volatility such as shifted log-normal or shifted CEV receive more attention

as a basis for a stochastic-volatility extension. One recent empirical analysis of very long

time-series data for interest rates [26] suggests three regimes of interest rate dynamics

depending on the level of rates: log-normal behaviour at very low rates, normal dynamics

at intermediate rates and shifted log-normal behaviour at very large rates.

In this thesis, we review two types of approximation schemes used in the literature for the

standard, CEV-based SABR model: Asymptotic expansions for small time to maturity

τ as well as a mixing approach for ρ = 0 suggested by Barjaktarevich and Rebonato [48].

Both approaches are applied to an extended SABR model with a general local volatility

function C(f). Approximate results are compared to Monte Carlo and a two-dimensional

finite difference scheme for a few choices of C(f), including one that models the three

regimes of [26].

Contents

1 Introduction
2 Option pricing: Concepts and notation
3 Short time scales
  3.1 The heat-kernel approach
    3.1.1 Covariant form of pricing equation
    3.1.2 Asymptotic expansion of the transition density
    3.1.3 Expected variance
    3.1.4 Time value of vanilla option
    3.1.5 Implied volatility
  3.2 Application to local volatility model
  3.3 Application to SABR model
    3.3.1 Geodesic distance: Hyperbolic geometry
    3.3.2 van-Vleck-Morette determinant
    3.3.3 Parallel transport
    3.3.4 Transition density
    3.3.5 Marginal transition density and local volatility
    3.3.6 Implied volatility
4 Small correlation ρ
  4.1 Time-inhomogeneous local volatility model
  4.2 Conditioning on volatility path
  4.3 Distribution of total variance
  4.4 Hull and White Expansion
  4.5 Averaging with precalculated quantiles
  4.6 Extension to finite ρ
5 Effective local volatility
6 Numerical methods
  6.1 Finite difference methods
    6.1.1 Methods for one-factor models
    6.1.2 Extension to two-factor models
  6.2 Monte Carlo
7 Quantitative comparison of different approximation schemes
  7.1 Local volatility models
    7.1.1 Constant elasticity of variance (CEV)
    7.1.2 Shifted log-normal and shifted CEV
    7.1.3 Two- and three-regime local volatility models
    7.1.4 Cubic toy local volatility model
  7.2 SABR model
    7.2.1 Standard SABR model
    7.2.2 Extended SABR model
8 Conclusions
A Laplace method for one-dimensional integrals
B Sigma functions
C Moments of the exponential integral of Brownian motion
D Hagan’s approximation
Bibliography

Chapter 1

Introduction

The SABR model [27] is the de facto market standard for the description of smiles in the pricing of

interest-rate derivatives. On the one hand, this ubiquity is due to a qualitatively correct description

of the time evolution of smiles as opposed to previously used local-volatility models. On the other

hand, the model is simple enough for analytical approximations of the implied volatility that allow for

a fast and stable calibration of model to market vanilla option prices. The most widely used of these

approximate smile formulas is already given in the original paper by Hagan and co-workers [27]. Since

it is an asymptotic expansion, the quality of the approximation is well-known to deteriorate for long

times to maturity, large volatility or volatility of volatility or very small strikes. Furthermore, the

CEV parameter β cannot in practice be extracted from the calibration procedure and is generally

fixed a priori to one of the choices β = 0 (normal), β = 1 (log-normal) or β = 1/2 (CIR-type

dynamics) depending on current market conditions and the school of thought of the institution.

An empirical determination of the ‘right’ β requires an analysis of very long time series data for

interest rates and/or interest rate options. In one recent work [26], three regimes of interest rate

dynamics were suggested depending on the level of rates: log-normal behaviour at very low rates,

normal dynamics at intermediate rates and shifted log-normal behaviour at very large rates. In the

modelling of interest rate smiles, there is thus a demand for improved solutions of the standard

SABR model as well as for an analysis of a SABR model with more general local volatility as a

basis.

Historically, the modelling of interest rate options has evolved in several stages, reflecting the

progress of academic ideas but also the maturity of the market and current market conditions.

Shortly after the celebrated pricing formula of Black, Scholes and Merton [11, 44] for stock options

was adopted by the market, a similar formula for interest rate caplets and swaptions [10] was soon

being applied,

CB(τ, f,K, σB) = D(t)[fN(d+) −KN(d−)] . (1.1)

Here, f is the current, time t, value of the forward rate or the swap rate, τ = T − t is the time to

maturity T , K is the strike of the option and σB is the volatility of the interest rate. Furthermore,

$N(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-y^2/2}\,dy$ denotes the normal cumulative distribution function,

$$d_\pm = \frac{\log(f/K) \pm \frac{1}{2}\sigma_B^2\tau}{\sigma_B\sqrt{\tau}}, \tag{1.2}$$

and D(t) is the appropriate discount factor. For caplets, we have D(t) = αP (t, T2) where T2 is the

end of the forward period T to T2 and also the time of payment of the caplet, P (t, Ti) is the price

of the zero coupon bond paying at time Ti and α is the day-count fraction for the forward period

T to T2. For swaptions, $D(t) = \mathrm{Lv}(t) = \sum_{i=1}^{N} \alpha_i P(t, T_i)$ is called the level, where $T_i$, $i = 0, \dots, N$,

with T0 = T denote the beginning and end of the swap periods and αi is the day-count fraction for

the ith period from Ti−1 to Ti. Eq. (1.1) rests on the assumption that the interest rate follows a

log-normal dynamics with constant volatility. The forward rate and the swap rate $F_t$ can be written, respectively, as

$$F_t = \frac{1}{\alpha}\,\frac{P(t,T) - P(t,T_2)}{P(t,T_2)}, \qquad F_t = \frac{P(t,T_0) - P(t,T_N)}{\mathrm{Lv}(t)}, \tag{1.3}$$

i.e. they are ratios of portfolios of tradeable assets to the zero coupon bond P (t, T2) or the level

Lv(t), respectively. If P (t, T2) or the level is chosen as a numeraire, the rate is then a martingale

in the respective measure according to the martingale pricing theorem (see also Chap. 2). Together

with the assumption of a log-normal dynamics with constant volatility, the interest rate Ft in the

pricing measure evolves according to the stochastic differential equation,

dFt = σBFtdWt , (1.4)

where Wt is a standard, driftless Brownian motion in the terminal or swap measure respectively.
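As a minimal illustration of Eqs. (1.1) and (1.2), the Black price can be evaluated in a few lines of Python. The function names `norm_cdf` and `black_call` are hypothetical and not part of the thesis; the discount factor D(t) is passed in as a plain number, and the zero-volatility branch returning the discounted intrinsic value is a convenience assumption.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function N(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_call(tau: float, f: float, strike: float, sigma_b: float,
               discount: float = 1.0) -> float:
    """Black call price of Eq. (1.1) with d_plus, d_minus from Eq. (1.2).

    `discount` plays the role of D(t); f and strike must be positive.
    """
    if tau <= 0.0 or sigma_b <= 0.0:
        # Assumption: degenerate case falls back to the discounted payoff.
        return discount * max(f - strike, 0.0)
    total_vol = sigma_b * math.sqrt(tau)
    d_plus = (math.log(f / strike) + 0.5 * total_vol ** 2) / total_vol
    d_minus = d_plus - total_vol
    return discount * (f * norm_cdf(d_plus) - strike * norm_cdf(d_minus))
```

For an at-the-money option (f = K) the formula collapses to D(t) f [2N(σB√τ/2) − 1], which is a convenient sanity check for any implementation.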

Taken literally, Eqs. (1.1) and (1.4) imply that all options on the same interest rate should be

priced with the same volatility σB independent of the strike. Since forward rates with different

maturities could be considered to be identical, just at different stages of their lives depending on

the remaining time to maturity, one could argue that options for all strikes and maturities should

be priced with the same volatility σB. However, when current prices observed in the market are

compared to Black prices,

CB(τ, f,K, σimpl.(τ,K)) = Cmarket(τ,K) , (1.5)

it is observed that the implied volatilities1 σimpl.(K, τ) do exhibit a term structure, i.e. a dependence

on the time to maturity τ , as well as a smile, i.e. a dependence on the strike K. To model such

dependencies, the assumption of a constant σB has to be relaxed. A time-inhomogeneous model,

dFt = σ(t)FtdWt , (1.6)

Footnote 1: A conversion to implied volatilities is always possible and unique, as observed call prices are always above the payoff at maturity and Eq. (1.1) is monotonic as a function of the volatility σB.

could account for a term structure. In this case Black’s formula (1.1) still remains valid when σB is

replaced by the root-mean-square volatility σr.m.s.(t, T ) given by

$$\sigma_{\mathrm{r.m.s.}}^2(t,T) = \frac{1}{T-t}\int_t^T \sigma^2(s)\,ds . \tag{1.7}$$
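The two operations described above, the inversion of Eq. (1.5) for the implied volatility and the root-mean-square average of Eq. (1.7), can be sketched as follows. This is only an illustration: the helper names, the midpoint quadrature and the bisection bracket [1e-8, 5] are assumptions made here, and normalized prices (D(t) = 1) are used throughout.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_call(tau, f, strike, sigma):
    """Normalized Black call price, Eq. (1.1) with D(t) = 1."""
    total_vol = sigma * math.sqrt(tau)
    d_plus = (math.log(f / strike) + 0.5 * total_vol ** 2) / total_vol
    return f * norm_cdf(d_plus) - strike * norm_cdf(d_plus - total_vol)


def rms_vol(sigma_fn, t, T, n=10_000):
    """Root-mean-square volatility of Eq. (1.7) using the midpoint rule."""
    h = (T - t) / n
    integral = sum(sigma_fn(t + (i + 0.5) * h) ** 2 for i in range(n)) * h
    return math.sqrt(integral / (T - t))


def implied_vol(price, tau, f, strike, lo=1e-8, hi=5.0, tol=1e-12):
    """Invert Eq. (1.5) by bisection, using monotonicity of the price in sigma."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if black_call(tau, f, strike, mid) < price:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)
```

For a hypothetical term structure σ(s) = 0.1 + 0.2 s on [0, 1], `rms_vol` returns approximately √(0.13/3) ≈ 0.208; pricing with this value and feeding the price back into `implied_vol` recovers the same number for every strike, which is the statement that a time-inhomogeneous model produces a term structure but no smile.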

Similarly, smiles can be described when the instantaneous volatility is allowed to also depend on the

level of interest rates in so-called local-volatility models,

dFt = σL(Ft, t)FtdWt . (1.8)

The implied volatility σimpl.(K, τ) is not directly related to the local volatility σL(f, t). In particular,

σimpl.(K, T − t) ≠ σL(K, t). Generally speaking, σimpl. can only be obtained by solving the valuation

problem with time- and level-dependent σL(f, t) and inverting Eq. (1.5) for σimpl.. Dupire [18] has

shown that for a given complete set of market option prices Cmarket(K, τ) a unique σL(f, t) can be

determined that produces these prices. Using the forward Kolmogorov equation for the transition

density (see Chap. 2) and the particular European call payoff, one can easily show the relation

$$\partial_T \tilde{C} = \frac{1}{2}\sigma_L^2(K,T)\,K^2\,\partial_K^2 \tilde{C}, \tag{1.9}$$

where $\tilde{C}(t) = C(t)/D(t)$ is the option price normalized by the appropriate discount factor (numeraire). If a sufficiently complete set of European option prices can be extracted from the market

such that derivatives with respect to expiry T and strike K can be taken numerically, Eq. (1.9) can

in principle be inverted to obtain σL(K,T ). However, this procedure turns out to be numerically

very unstable. Nevertheless, discrete versions of Eq. (1.8) on a tree can be fitted to market data of

vanilla options by forward induction and used for pricing exotics [18, 16].
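The inversion of Eq. (1.9) can be illustrated with central finite differences on a synthetic price surface. The sketch below is an assumption-laden toy: the function `dupire_local_vol` and the bump sizes are hypothetical, and the surface is generated from Black prices with constant volatility, for which the local volatility is known to equal that constant. On noisy market quotes the same differences amplify the noise, which is the instability mentioned above.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_call(tau, f, strike, sigma):
    """Normalized Black call price, Eq. (1.1) with D(t) = 1."""
    total_vol = sigma * math.sqrt(tau)
    d_plus = (math.log(f / strike) + 0.5 * total_vol ** 2) / total_vol
    return f * norm_cdf(d_plus) - strike * norm_cdf(d_plus - total_vol)


def dupire_local_vol(call_price, strike, expiry, d_strike, d_expiry):
    """Local volatility sigma_L(K, T) from Eq. (1.9) by central differences.

    `call_price(K, T)` must return the normalized call price.
    """
    dc_dt = (call_price(strike, expiry + d_expiry)
             - call_price(strike, expiry - d_expiry)) / (2.0 * d_expiry)
    d2c_dk2 = (call_price(strike + d_strike, expiry)
               - 2.0 * call_price(strike, expiry)
               + call_price(strike - d_strike, expiry)) / d_strike ** 2
    return math.sqrt(2.0 * dc_dt / (strike ** 2 * d2c_dk2))


def flat_surface(strike, expiry):
    """Synthetic surface: forward 3 %, constant Black volatility 20 %."""
    return black_call(expiry, 0.03, strike, 0.2)
```

Calling `dupire_local_vol(flat_surface, 0.028, 1.0, 3e-5, 1e-3)` should reproduce the input volatility of 0.2 up to discretization error.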

By construction, local-volatility models yield a perfect fit of current market prices. However,

as pointed out by Hagan [27], the time evolution of future smiles implied by today’s fit is contrary

to what is actually observed in the market. A non-monotonic smile has its minimum close to

the at-the-money point. The minimum of the smile tracks the underlying rate when the latter

changes over time. On the contrary, a local volatility model predicts a movement of the smile in the

opposite direction of the underlying. A local-volatility model thus requires frequent re-calibration

upon movements of the underlying. Hence, hedges based on the model also need re-adjustments

leading to higher hedging costs. A satisfactory description of the dynamics of the smile is thus

essential for good hedges.

For a better description of the dynamics of the smile, Hagan and co-workers proposed to use a

stochastic volatility extension of the well-known CEV model of the following form,

$$dF_t = \alpha_t C(F_t)\,dW_t^1, \qquad d\alpha_t = \nu\alpha_t\,dW_t^2, \tag{1.10}$$

where $W_t^1$ and $W_t^2$ are two correlated Brownian motions with $d[W^1, W^2]_t = \rho\,dt$, and $C(f) = f^\beta$ is

the local volatility function of the CEV model. Option prices in this model are thus a function of

the five parameters f = F0, α = α0, β, ρ, ν the initials of which (omitting f and ν) form the catchy

acronym SABR for ‘stochastic alpha beta rho’. The achievement of the original paper [27] was to provide an analytical approximation for the implied volatility and to use it to gain an intuitive understanding of how the parameters influence the form of the smile. Roughly speaking, α determines the

overall level of the implied volatility, β and ρ together control the skew of the smile and ν influences

its convexity. In practice, matters are not quite as clean, but the given analytical approximation

(see formulas in App. D) is quite accurate for moderately large times to maturity and the market

conditions when the approximation was introduced. Thus it can be used to obtain quick fits to the

smile implied by market quotes of vanilla options. Due to this simplicity the model together with

the analytical approximation was soon adopted by practitioners as the market standard to describe

option smiles. Within the SABR model, only interest-rate options that depend on the dynamics of a single interest rate can be priced. For exotics that depend on several maturities on the yield

curve, market models are needed that describe in an arbitrage-free way the dynamics of the entire

curve. To do so and simultaneously treat smiles in a consistent manner, LIBOR market models with

a stochastic-volatility extension that for an individual rate closely resemble the SABR dynamics

have been proposed by several authors (see [51] for an overview and references). These LMM-SABR

models are not discussed in this thesis.
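To make the dynamics of Eq. (1.10) concrete, the following sketch simulates the two processes with a simple time-stepping scheme and estimates the normalized call price E[(F_T − K)^+]. It is not the Monte Carlo scheme of Chap. 6: the function name, the Euler step for the rate, the exact log-normal step for the volatility and the treatment of the boundary (paths are absorbed once they reach `absorb_at`) are all assumptions made for illustration, and the Euler step carries a discretization bias for non-constant C(f).

```python
import math
import random


def simulate_sabr_call(f0, alpha0, nu, rho, local_vol, strike, tau,
                       n_steps=50, n_paths=20_000, seed=0, absorb_at=0.0):
    """Monte Carlo estimates of E[(F_T - K)^+] and E[F_T] under Eq. (1.10).

    `local_vol` is the function C(f); it is only evaluated for f > absorb_at.
    """
    rng = random.Random(seed)
    dt = tau / n_steps
    sqrt_dt = math.sqrt(dt)
    rho_perp = math.sqrt(1.0 - rho * rho)
    payoff_sum = 0.0
    forward_sum = 0.0
    for _ in range(n_paths):
        f, alpha = f0, alpha0
        for _ in range(n_steps):
            z2 = rng.gauss(0.0, 1.0)                       # drives the volatility
            z1 = rho * z2 + rho_perp * rng.gauss(0.0, 1.0)  # correlated rate shock
            # Euler step for the rate with the volatility at the start of the step.
            f += alpha * local_vol(f) * sqrt_dt * z1
            if f <= absorb_at:
                f = absorb_at  # absorbing boundary: the path stays here
                break
            # Exact step for the driftless log-normal volatility process.
            alpha *= math.exp(nu * sqrt_dt * z2 - 0.5 * nu * nu * dt)
        payoff_sum += max(f - strike, 0.0)
        forward_sum += f
    return payoff_sum / n_paths, forward_sum / n_paths
```

A call such as `simulate_sabr_call(0.03, 0.03, 0.4, -0.3, lambda f: math.sqrt(f), 0.03, 1.0)` corresponds to the standard SABR model with β = 1/2; passing any other callable for `local_vol` gives the extended model without further changes. The second return value is a cheap diagnostic, since F_t should remain a martingale up to sampling and boundary effects.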

The SABR model should be distinguished from its approximate solution by the Hagan expansion.

Indeed, by now it is clearly understood that the expansion deteriorates for long times to maturity

τ , high volatility α and volatility of volatility ν as well as very low or high strikes K. In particular,

for very low strikes the probability density for the rate implied by the approximation can become

negative. In recent market conditions of very low short term rates, this pathology becomes more

relevant.

In the search for better approximations to the SABR model, new insights and cleaner approxi-

mation schemes have been obtained by applying methods from differential geometry (see the presen-

tations [37] and [38]). Hagan and co-workers have derived representations of the transition density

of the SABR model in this context and also a ‘refinement’ of their original approximation for the implied volatility [28], which, however, does not seem to be widely used in practice. Henry-Labordère has

generalized the approach to a wider class of stochastic volatility models including a new λ-SABR

model with a mean-reverting volatility process [30]. He also obtains approximate solutions of a

SABR-extended LIBOR market model [32] as well as a classification of solvable local and stochastic

volatility models [31]. Exact integral representations for the standard SABR model with β = 0 or

β = 1 have been given by several authors [28, 30, 40]. Paulot has pushed the asymptotic expansion

of the implied volatility to second order in the time to maturity τ [46]. For small τ , the second order


smile formula improves the agreement with numerical solutions. However, for larger τ the patholo-

gies at low rates and strikes become worse by the increased order. A different line of attack that also

yields an expansion of the implied volatility for small τ was pursued by Berestycki and coworkers

[9], who give rigorous proofs for the limit τ → 0 and derive a non-linear partial differential equation directly for the implied volatility. Their results were used by Obłój to point out an inconsistency in the original smile formula given in [27]; correcting it mitigates the pathologies at low strikes

[45].

The SABR model in Eq. (1.10) is among a range of so-called stochastic-volatility models in which

the volatility or the variance of the underlying is itself governed by a second stochastic process. Early

models in this class, such as the one proposed by Hull and White [34] were stochastic volatility

extensions of the Black-Scholes model for the underlying. The book by Lewis [40] gives a good

overview of solution methods for these types of models. A particularly popular model is the one

introduced by Heston [33] which is a so-called affine model. The generating function of the moments

of the underlying can be derived analytically and vanilla option prices can then be obtained by fast

Fourier transformation. Approximation schemes generally devised for stochastic-volatility models

might also have some bearing on the SABR model and its extension to general C(f). In particular,

we will discuss mixing solutions derived from the limit ρ = 0. Other interesting limits that will

not be discussed here in detail are long times to maturity [21, 19, 20], asymptotically large strikes

[7] as well as multiscale expansions for models with a quickly mean-reverting volatility [23, 22, 24].

However, the techniques used to derive asymptotic results for long times to maturity are related to

the mixing solutions and we will give some comments in this context.

The particular choice of stochastic processes in the SABR model was mainly motivated by an-

alytical tractability. As Hagan points out in his lectures, the model was put together in a hurry

as many others on the ’street’. The parameter β was introduced because people from different

institutions couldn’t agree on whether interest rates should be normal, log-normal or possibly in

between. With the free parameter, everybody can pick his or her favorite. In a sense, the SABR

model is thus a minimal model that allows for a description of both downward sloping smirks as

well as non-monotonic smiles. As such, it has well-known drawbacks (see e.g. Sec. 3.10 in [51] for a

more detailed discussion of these points). First, the volatility is purely log-normal. Thus, averages

for long times to maturity are dominated by a large number of paths with very low volatility and

very few paths with very high volatility. Intuitively, a mean-reverting feature that keeps volatilities

at intermediate levels appears more realistic. Second, the dynamics of the rates is based on the

CEV model which for 0 < β < 1 has an absorbing boundary at zero (see Sec. 7.1.1), i.e. once a

zero interest rate has been reached it will stay there forever according to the model. This behaviour

clearly contradicts what is observed in reality. Even if zero short-term rates are possible, monetary

authorities will surely not keep them there forever. The absorbing boundary has a significant effect

[Figure 1.1: three panels, each plotting C(f) against f. Axis markers: θ (left panel), fL and fR (middle panel), fc and fc/3 (right panel).]

Figure 1.1: Local volatility functions C(f) used in this thesis. The standard and shifted CEV choices are shown on the left. The middle panel contains the three-regime model, whereas the right panel shows the cubic toy model.

on option prices for longer maturities since a considerable proportion of the probability density will

be trapped at zero. A third drawback stems from the fact that both β and ρ influence the skew of

the smile. As a consequence, β cannot be extracted from fits to vanilla option prices observed in the

market and has to be fixed a priori.

To answer the question of what the ’true’ coefficient β should be or more generally what the

volatility should be as a function of the level of the rates, a range of empirical studies have been

conducted. Evidence is either indirectly gathered from implied volatilities of option prices or directly

through the analysis of very long-dated interest-rate time series. An example of the first kind is the

study of time series of swaption implied volatilities as a function of the level of the rate [47], i.e. a

study of what Hagan and coworkers call the ’backbone’ of the smile. The direct approach is taken

by de Guillaume et al. who study very long time series for swap and government rates [26]. They

point out that previous attempts to obtain the CEV exponent β from fitting of time series data are

inconclusive and depend very much on the era considered. From the analysis of time series with a

length of up to about 40 years, they extract a universal dependence of the variability of rates on

the level of rates involving three regimes. At rates roughly below 1%, the dynamics is log-normal

and thus has no absorbing boundary at zero rates. At intermediate levels, roughly up to 5 or 6% a

normal regime sets in. Finally at very high rates, the dynamics is of shifted log-normal type.

To incorporate these empirical findings into tractable models for option pricing, we will consider

in this thesis the extension of the SABR model in Eq. (1.10) to more general choices of the local

volatility function C(f). Besides the standard SABR model with the CEV local volatility function

$$C(f) = f^\beta, \tag{1.11}$$

we will mainly be interested in a three-regime piecewise linear C(f) that models the universal curve

found in [26], i.e.

$$C(f) = \begin{cases} f, & f \le f_L, \\ f_L, & f_L \le f \le f_R, \\ f_L + \kappa (f - f_R), & f \ge f_R. \end{cases} \tag{1.12}$$

Here, fL and fR are two regime-switching points. Note that the regimes here are of a static na-

ture. Dynamic switching between different types of dynamics has also been modeled and analysed

empirically [50, 49]. We will also briefly consider a shifted CEV local volatility function,

$$C(f) = (f + \theta)^\beta, \tag{1.13}$$

which for β = 1 reduces to shifted log-normal behaviour. In the recent market environment with very

low interest rates, such a model can be appealing as it moves the problematic absorbing boundary

from zero to slightly negative rates −θ. It will turn out that the non-analytic nature of C(f) in

Eq. (1.12) at the regime switching points can cause trouble in the asymptotic expansion. We will

therefore also consider the following cubic toy model,

$$C(f) = \frac{f_c}{3}\left[\left(\frac{f}{f_c} - 1\right)^3 + 1\right]. \tag{1.14}$$

In a smooth way, this roughly models the main features of the universal curve found in [26], i.e. a

linear behaviour at low rates, a levelling off at intermediate rates and an increase at very high rates

again. Fig. 1.1 shows a visualization of the different local volatility functions used in this thesis.

Note that, although we consider different choices of the function C(f) here, the functional form is

not taken as an ingredient for fitting the smile contrary to very recent proposals for an extended

’ZABR’ model [2].
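The four local volatility functions of Eqs. (1.11) to (1.14) are simple enough to be written down directly. The Python names below are hypothetical, and the parameter values used in the comments are chosen only for illustration.

```python
def cev(f, beta):
    """CEV local volatility function, Eq. (1.11); requires f >= 0."""
    return f ** beta


def shifted_cev(f, theta, beta):
    """Shifted CEV local volatility function, Eq. (1.13); requires f >= -theta."""
    return (f + theta) ** beta


def three_regime(f, f_low, f_high, kappa):
    """Piecewise linear three-regime function, Eq. (1.12).

    f_low and f_high are the regime-switching points f_L and f_R.
    """
    if f <= f_low:
        return f                          # log-normal regime at low rates
    if f <= f_high:
        return f_low                      # normal regime at intermediate rates
    return f_low + kappa * (f - f_high)   # shifted log-normal regime at high rates


def cubic_toy(f, f_c):
    """Smooth cubic toy model, Eq. (1.14)."""
    return f_c / 3.0 * ((f / f_c - 1.0) ** 3 + 1.0)
```

Two quick checks follow from the formulas: `three_regime` is continuous at both switching points, and `cubic_toy` satisfies C(0) = 0 with unit slope at the origin and C(fc) = fc/3 with vanishing slope, which reproduces the linear behaviour at low rates and the levelling off at intermediate rates.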

Our contribution in this thesis is two-fold. First, we review the heat-kernel expansion for small

times to maturity putting particular emphasis on expressing results in a way suitable to arbitrary

C(f) wherever possible. Second, it is shown that a mixing approach for ρ = 0 based on pre-calculated

quantiles of the distribution of the average variance as proposed by Barjaktarevic and Rebonato [48]

is applicable for arbitrary C(f) when combining it with numerical solutions of the underlying local

volatility model. This approach is particularly suited to analyse the influence of different functional

shapes of C(f) as it is fast and the numerical implementation does not depend on any particularities

of C(f). We compare the results of both the heat-kernel expansion and the mixing approach to

numerical results obtained from Monte Carlo as well as two-dimensional finite-difference schemes.

The thesis is organized as follows. In Chap. 2, we review some concepts of option pricing and

introduce the notation used in the subsequent chapters. Chap. 3 discusses analytical approximations

for short times to maturity. Chap. 4 treats the limit of vanishing correlation ρ between the forward

rate and the volatility process. The short Chap. 5 describes an effective local-volatility scheme

motivated by [2]. In Chap. 6, we introduce the numerical methods used to gauge the accuracy of

the analytical approximations. Chap. 7 compares the analytical and semi-analytical approximations

to numerics in terms of accuracy and computational speed. Finally, Chap. 8 gives a conclusion.


Chapter 2

Option pricing: Concepts and notation

To derive option prices in the SABR model in Eq. (1.10), we invoke the fundamental theorem of

arbitrage-free pricing. This theorem states that assuming the absence of arbitrage opportunities and

some technical conditions, there exists a probability measure such that the price V (t) of a tradeable

asset normalized by another tradeable and strictly positive asset N(t) (numeraire) is a martingale,

i.e.

$$\frac{V(t)}{N(t)} = \mathbf{E}\left[\frac{V(T)}{N(T)} \,\middle|\, \mathcal{F}_t\right]. \tag{2.1}$$

Note that for the two-factor SABR model, the market is not complete and the pricing measure is

not uniquely determined from arbitrage arguments alone. Nevertheless, there is a unique observable

price in the market. One can then assume that according to the risk preferences of the participants,

the market has chosen a particular measure to price the options. For consistent pricing, the chosen

measure has to be extracted by calibrating model parameters to market prices. The conversion

between the real-world measure and the pricing measure involves two market prices of risks, one

for each factor. In practice, it is more convenient to express the model dynamics directly in the

pricing measure as has been done in Eq. (1.10). The absence of a drift term for the rate follows

directly from the fact that Ft can be expressed as the normalized price of a portfolio of tradeable

assets as shown in Eq. (1.3). The particular driftless form of the SDE for the volatility in Eq. (1.10)

on the contrary is a modelling assumption and does not follow from no arbitrage. All SDEs and

expectation values denoted by bold font E are with respect to the pricing measure; the physical measure will never be invoked here. Note that the explicit construction of hedging portfolios in incomplete market conditions is highly non-trivial and will not be considered here. For an account of

hedging in the SABR and related market models, see [27] and [6] as well as the textbook [51] by

Rebonato and co-workers.

Now consider the case of caplets or swaptions and choose D(t) given below Eq. (1.2) as a numeraire. It turns out that the payoff in both cases can be written as $V(T) = (F_T - K)^+ D(T)$. Thus the normalized price of a caplet or swaption is given by

$$\tilde{C}(t) = \frac{C(t)}{D(t)} = \mathbf{E}\left[(F_T - K)^+ \,\middle|\, \mathcal{F}_t\right]. \tag{2.2}$$

From now on, we will always work with normalized option prices and drop the tilde for convenience.

For comparison with market quotes one thus would have to reintroduce the discount factor D(t) as

appropriate. Since the pair of processes (Ft, αt) is Markov, the option price can only depend on the

current levels of the state variables and not on their history, i.e. C(t) = C(t, Ft, αt). Applying Ito’s

lemma, the (normalized) option price evolves according to the following SDE,

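$$dC_t = \left[\frac{\partial C}{\partial t} + \frac{1}{2}\alpha_t^2 C(F_t)^2\,\frac{\partial^2 C}{\partial f^2} + \frac{1}{2}\nu^2\alpha_t^2\,\frac{\partial^2 C}{\partial \alpha^2} + \rho\nu\alpha_t^2 C(F_t)\,\frac{\partial^2 C}{\partial f\,\partial\alpha}\right]dt + [\dots]\,dW_t^{(1)} + [\dots]\,dW_t^{(2)}. \tag{2.3}$$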

Since Ct is a martingale according to the fundamental theorem of arbitrage-free pricing, the expression in the first square

bracket has to vanish. For the option price,

C(t, T, f, α) = E[(FT −K)+|Ft = f, αt = α], (2.4)

this yields the partial differential equation

[∂t + L]C = 0 , (2.5)

where the differential pricing operator is given by

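$$L = \frac{1}{2}\alpha^2 C(f)^2\,\frac{\partial^2}{\partial f^2} + \frac{1}{2}\nu^2\alpha^2\,\frac{\partial^2}{\partial \alpha^2} + \rho\nu\alpha^2 C(f)\,\frac{\partial^2}{\partial f\,\partial\alpha}. \tag{2.6}$$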

The terminal condition of Eq. (2.5) is given by the payoff, i.e. C(T, T, f, α) = g(f) = (f − K)+.

The SABR model is homogeneous in time. As a consequence, the option value can only depend on

the time τ = T − t to maturity and not on T and t separately, i.e. C(t, T, f, α) = C(τ, f, α). The

pricing PDE in terms of τ now reads as

[∂τ − L]C = 0 , (2.7)

and the terminal condition is transformed into an initial condition C(0, f, α) = g(f).

Apart from the two-factor SABR model, we are also interested in the underlying one-factor

local-volatility model,

dFt = σC(Ft)dWt , (2.8)

where now σ is a constant. Repeating the same argument, the pricing operator becomes

$$L = \frac{1}{2}\sigma^2 C(f)^2\,\frac{\partial^2}{\partial f^2}. \tag{2.9}$$
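For the one-factor model, the pricing equation [∂τ − L]C = 0 with the operator of Eq. (2.9) and the initial condition C(0, f) = (f − K)^+ can be solved on a grid. The sketch below uses a plain explicit scheme, which is an assumption made here for brevity and not necessarily one of the schemes of Chap. 6; the function name, the Dirichlet boundary values (frozen at the payoff) and the grid sizes are likewise illustrative.

```python
import math


def fd_call_price(f0, strike, sigma, local_vol, tau,
                  f_min, f_max, n_space=150, n_time=100):
    """Explicit finite-difference solution of dC/dtau = 0.5 sigma^2 C(f)^2 d2C/df2.

    Returns the normalized call price at the forward f0 (f_min <= f0 <= f_max).
    """
    if not f_min <= f0 <= f_max:
        raise ValueError("f0 must lie inside the grid")
    df = (f_max - f_min) / n_space
    dt = tau / n_time
    grid = [f_min + i * df for i in range(n_space + 1)]
    # Diffusion coefficient 0.5 * sigma^2 * C(f)^2 on each grid node.
    diff = [0.5 * (sigma * local_vol(f)) ** 2 for f in grid]
    if max(diff) * dt / df ** 2 > 0.5:
        raise ValueError("explicit scheme unstable: increase n_time")
    values = [max(f - strike, 0.0) for f in grid]  # initial condition g(f)
    for _ in range(n_time):
        new = values[:]  # boundary nodes keep their payoff values
        for i in range(1, n_space):
            second = (values[i + 1] - 2.0 * values[i] + values[i - 1]) / df ** 2
            new[i] = values[i] + dt * diff[i] * second
        values = new
    # Linear interpolation of the grid solution at f0.
    pos = (f0 - f_min) / df
    i = min(int(pos), n_space - 1)
    w = pos - i
    return (1.0 - w) * values[i] + w * values[i + 1]
```

With C(f) = 1 the model is normal and the at-the-money price is known in closed form, σ√τ/√(2π), so a call such as `fd_call_price(0.05, 0.05, 0.01, lambda f: 1.0, 1.0, -0.1, 0.2)` can be checked against 0.01/√(2π) ≈ 0.00399. The stability check reflects the usual restriction of explicit schemes on the time step; implicit schemes avoid it at the cost of solving a linear system per step.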

For a concise notation in the next chapter, it will be convenient to introduce a common notation

for both the SABR and the local volatility models. To this end, we introduce the tuple $X_t = (F_t, \alpha_t)$ for the SABR model and $X_t = F_t$ for the local volatility model, respectively. The stochastic processes are then of the general form

$$dX_t^i = \sigma_i(X_t)\,dW_t^i, \tag{2.10}$$

where $(\sigma_1, \sigma_2) = (\alpha C(f), \nu\alpha)$ for the SABR model, $\sigma_1 = \sigma C(f)$ for the local volatility model, and $dW_t^i\,dW_t^j = \rho_{ij}\,dt$. The pricing operator then becomes

$$L = \frac{1}{2}\sum_{i,j=1}^{n} \rho_{ij}\,\sigma_i\sigma_j\,\frac{\partial^2}{\partial x_i\,\partial x_j}. \tag{2.11}$$

The price of a European option can also be written in terms of the transition density p(τ ;x, y),

$$V(t;x) = \mathbf{E}[g(X_T) \mid X_t = x] = \int p(T - t; x, y)\,g(y)\,dy . \tag{2.12}$$

Formally, p(τ ;x, y) is the Arrow-Debreu price of an option with a delta-function payoff at maturity

and thus also follows the Kolmogorov backward Eq. (2.7), i.e.

$$\frac{\partial}{\partial\tau}\,p(\tau; x, y) = L_x\,p(\tau; x, y), \tag{2.13}$$

where the subscript x on the operator L indicates that derivatives are to be taken with respect to

the initial variables x. Occasionally, we will also need the Kolmogorov forward or Fokker-Planck

equation (for a derivation see e.g. [54]),

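$$\frac{\partial}{\partial\tau}\,p(\tau; x, y) = \frac{1}{2}\sum_{i,j=1}^{n} \frac{\partial^2}{\partial y_i\,\partial y_j}\left[\rho_{ij}\,\sigma_i(y)\sigma_j(y)\,p(\tau; x, y)\right]. \tag{2.14}$$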

Using Eq. (2.14) and a partial integration, one can easily get the Dupire equation for a European

call option,

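$$\begin{aligned}
\partial_\tau V &= \int (y^{(1)} - K)^+\,[\partial_\tau p]\,d^dy = \frac{1}{2}\int \left[\partial^2_{y^{(1)}}(y^{(1)} - K)^+\right]\sigma_1^2\,p\,d^dy \\
&= \frac{1}{2}\,\frac{\int \sigma_1^2\,\delta(y^{(1)} - K)\,p\,d^dy}{\int \delta(y^{(1)} - K)\,p\,d^dy}\int \delta(y^{(1)} - K)\,p\,d^dy \\
&\equiv \frac{1}{2}K^2\sigma_{\mathrm{loc}}^2(\tau, x, K)\,\partial_K^2 V. \tag{2.15}
\end{aligned}$$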

For a local volatility model, this relation has already been given in Eq. (1.9). For a multi-factor

model, the second to last equality gives a precise definition of the effective local volatility as an

average in the full model.


Chapter 3

Short time scales

Starting with the original paper by Hagan et al. [27], the SABR model has been analysed by

using asymptotic expansions. The derivation in [27] makes use of singular-perturbation theory in

an artificial small parameter ε, which is set to ε = 1 at the end of the derivation, but the result can be recast as an asymptotic series in the time to maturity τ. The procedure, hidden in an appendix, uses a lot of physical intuition and several transformations of variables, the motivation for which is rather hard to follow. Furthermore, some additional approximations geared towards the underlying CEV model are used. Subsequent work in the literature by Hagan et al. [28] and others [30, 46] has obtained cleaner derivations of similar series using the language of differential geometry. We follow this latter route here.

The key insight is to note that the pricing Eq. (2.7) can be recast as a heat or diffusion equation

in a curved space with a geometry given by the functional form of the prefactors of the derivatives.

The leading asymptotics of the transition density at small times is then similar to the Gaussian

transition density of an ordinary diffusion process, but with the distance |x− y| between the initial

and final points replaced by the so-called geodesic distance d(x, y), i.e. the length of the shortest

path connecting x and y in the curved space. This result was first proved by Varadhan [57, 56]

in the 1960s. Correction terms involve an expansion in powers of the time τ and can be obtained

with methods from theoretical physics that are e.g. used to obtain semi-classical expansions for the

Schrödinger equation in quantum mechanics in powers of Planck’s constant ħ as well as methods from geometrical optics. The former also goes under the name of WKB expansion. It is also relatively straightforward to obtain the leading asymptotics from a saddle-point approximation of a Feynman

path integral representation of the transition density (see e.g. Sec. 7.4 in [5]). However, correction

terms are more easily obtained within a differential-operator formalism. In the mathematics or

mathematical physics literature, the transition density is called the kernel of the heat equation

and the approach is therefore known as the heat-kernel expansion. A detailed introduction geared

towards applications in finance is given by Avramidi [5] whereas the manual by Vassilevich shows

applications in quantum field theory [58]. The basic ideas are also summarized in several publicly

available sets of presentation slides [37, 38, 39, 4].

The approach is somewhat heavy on notation as some basic concepts from differential geometry

as used in general relativity are needed. We give a brief summary here and refer the reader to [5] or

[46] and references therein for the details. The road map of Sec. 3.1 containing the general formalism

is as follows:

1. The pricing equation is recast in a covariant form using the language of differential geometry

and some of the concepts are explained on the way as needed.

2. The ingredients of the asymptotic expansion of the transition density are explained and motivated; the result is then quoted without derivation.

3. An integral over the volatility variable α is carried out by the Laplace method to obtain the

marginal transition density as well as the average variance from the heat-kernel expansion of

the transition density.

4. An expansion for the time value of a vanilla European call option is obtained by a local time

integral over the average variance.

5. A similar expansion of the Black formula is matched to the expansion of the option value to

obtain a series expansion of the implied volatility in the time τ to maturity.

In Secs. 3.2 and 3.3 the general formalism is then applied to local-volatility models and the SABR

model with general C(f).

3.1 The heat-kernel approach

3.1.1 Covariant form of pricing equation

In order to make use of known results in the physics and applied mathematics literature, the backward Kolmogorov Eq. (2.13) for the transition density is recast as a heat or diffusion equation on a Riemannian manifold.¹

The key ingredient in the definition of a Riemannian manifold is the metric tensor which allows

the measurement of lengths and angles. To make connection with the PDE approach to derivative

pricing, we identify the matrix elements of the inverse metric tensor with the coefficients of the

quadratic derivative part of the pricing operator in Eq. (2.11),

$$g^{ij} = \frac{1}{2}\,\rho_{ij}\,\sigma_i\,\sigma_j . \tag{3.1}$$

¹ A very accessible introduction to the concepts of differential geometry as needed for general relativity can e.g. be found in [13].

A matrix inversion then yields the components $g_{ij}$ of the metric. Note that there is no summation implied on the right-hand side of Eq. (3.1) whereas in the following, the Einstein summation convention will be used throughout, i.e. an index appearing twice, once as upper and once as lower index, is implicitly summed over.

The differential operator

$$L = g^{ij}\partial_{ij} + b^i\partial_i + \gamma , \tag{3.2}$$

constitutes a generalization of the pricing operator in Eq. (2.11) and also includes possible drift terms $b^i$ and a discount term $\gamma$. The partial derivatives are shorthands for $\partial_i \equiv \partial/\partial x^i$ and $\partial_{ij} \equiv \partial^2/\partial x^i\partial x^j$. The operator L can be written in the following covariant form,

$$L = g^{ij}\nabla^A_i\nabla^A_j + Q = g^{-\frac12}(\partial_i + A_i)\,g^{\frac12}g^{ij}(\partial_j + A_j) + Q , \tag{3.3}$$

where $g = \det(g_{ij})$ and $\nabla^A_i$ is a covariant derivative that contains the Levi-Civita connection $\Gamma^k_{ij}$ induced by the metric as well as an Abelian connection $A_i$. More precisely, the action of $\nabla^A_i$ on a scalar φ is given by

$$\nabla^A_i\phi = [\partial_i + A_i]\phi , \tag{3.4}$$

whereas its action on a vector,

$$\nabla^A_iV^j = [\partial_i + A_i]V^j + \Gamma^j_{ik}V^k , \tag{3.5}$$

as well as a co-vector,

$$\nabla^A_iV_j = [\partial_i + A_i]V_j - \Gamma^k_{ij}V_k , \tag{3.6}$$

contains the connection given by the Christoffel symbols,

$$\Gamma^k_{ij} = \frac12\,g^{kp}(\partial_jg_{ip} + \partial_ig_{pj} - \partial_pg_{ij}) . \tag{3.7}$$

The action of the co-variant derivative on higher-rank tensors can be defined similarly, but will not

be needed in the following. Combining Eqs. (3.4) and (3.6), the Laplace-type operator L can be

expressed as

$$L = g^{ij}\big[(\partial_i + A_i)\delta^k_j - \Gamma^k_{ij}\big][\partial_k + A_k] + Q . \tag{3.8}$$

Sorting terms according to the number of derivatives involved, we see that in order for Eqs. (3.2)

and (3.3) to represent the same operator, the Abelian connection and the constant term need to be

chosen as follows,

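$$A^i = g^{ij}A_j = \frac12\big[b^i + g^{jk}\Gamma^i_{jk}\big] = \frac12\big[b^i - g^{-\frac12}\partial_j(g^{\frac12}g^{ij})\big] , \tag{3.9}$$

$$Q = g^{ij}(A_iA_j - b_jA_i - \partial_jA_i) + \gamma . \tag{3.10}$$

The second equalities in Eqs. (3.3) and (3.9) can be obtained using

$$g^{-\frac12}\partial_j(g^{\frac12}g^{ij}) = -g^{jk}\Gamma^i_{jk} . \tag{3.11}$$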

This identity is shown by straightforward algebra noting that the derivatives of a determinant and the inverse of a matrix A with respect to a parameter l are given by $\partial_l\det(A) = \det(A)\,\mathrm{tr}(A^{-1}\partial_lA)$ as well as $\partial_lA^{-1} = -A^{-1}(\partial_lA)A^{-1}$.
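The two matrix identities can be checked numerically. The following is only a minimal sketch and not part of the derivation: the matrix family `A(l)` and all helper names are hypothetical and chosen for illustration, and central finite differences stand in for the derivative with respect to l.

```python
import math

def A(l):
    # hypothetical smooth, invertible 2x2 family (illustration only)
    return [[2.0 + l * l, math.sin(l)],
            [math.sin(l), 1.0 + math.exp(-l)]]

def det(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def inv(m):
    d = det(m)
    return [[m[1][1] / d, -m[0][1] / d],
            [-m[1][0] / d, m[0][0] / d]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def ddl(fun, l, h=1e-5):
    # central difference; works for scalar- and matrix-valued functions
    fp, fm = fun(l + h), fun(l - h)
    if isinstance(fp, list):
        return [[(fp[i][j] - fm[i][j]) / (2 * h) for j in range(2)]
                for i in range(2)]
    return (fp - fm) / (2 * h)

l0 = 0.7
Ainv, dA = inv(A(l0)), ddl(A, l0)
AinvdA = matmul(Ainv, dA)

# d det(A)/dl  versus  det(A) tr(A^{-1} dA/dl)
err_det = abs(ddl(lambda l: det(A(l)), l0)
              - det(A(l0)) * (AinvdA[0][0] + AinvdA[1][1]))

# dA^{-1}/dl  versus  -A^{-1} (dA/dl) A^{-1}
lhs = ddl(lambda l: inv(A(l)), l0)
rhs = matmul(AinvdA, Ainv)
err_inv = max(abs(lhs[i][j] + rhs[i][j]) for i in range(2) for j in range(2))

print(err_det, err_inv)   # both should be at the level of finite-difference noise
```

Both residuals should be limited only by the finite-difference step, which supports the two identities for this particular example.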

A crucial concept in a curved space is that of parallel transport. Vectors rooted at different points

on a manifold live in different vector spaces, namely the tangent spaces of the respective points, and

can a priori not be compared to each other, i.e. their relative length or an angle between them is not

defined. For such a comparison to be meaningful, one vector needs to be transported to the root of

the other. This transport should occur without extra rotation of the vector. More precisely, a vector

field with components $V^j$ is said to be parallel transported along a curve x(s), if its directional covariant derivative along the curve vanishes, i.e.

$$\dot{x}^i\nabla_iV^j = 0 . \tag{3.12}$$

Here, the dot denotes a derivative with respect to the curve parameter s and the covariant derivative $\nabla_i$ is given by Eq. (3.5) for A = 0. Note that the parallel transport of a vector does in general depend on the chosen path C. The components of the vector V at the initial point x and the final point y are related by a linear transformation $V^i(y) = P^i{}_j(x,y)\,V^j(x)$ where $P^i{}_j$ are the components of the parallel transport operator.

3.1.2 Asymptotic expansion of the transition density

Let us now state the central result that will be used in the following without proof (see e.g. [5] for

a thorough motivation and references): The transition density p(τ ;x, y) given by the initial value

problem,

$$[\partial_\tau - L_x]\,p(\tau;x,y) = 0 , \tag{3.13}$$

and

$$\lim_{\tau\to0}p(\tau;x,y) = \delta(x-y) , \tag{3.14}$$

has the following asymptotic expansion for small times τ ,

$$p(\tau;x,y) = \frac{\sqrt{g(y)}}{(4\pi\tau)^{\frac n2}}\sqrt{\Delta(x,y)}\;\mathcal{P}(x,y)\,e^{-\frac{d^2(x,y)}{4\tau}}\sum_{k=0}^{\infty}a_k(x,y)\,\tau^k , \tag{3.15}$$

where the leading heat-kernel coefficient a0 = 1 and the subsequent two-point functions ak(x, y)

with k ≥ 1 are regular in the coincidence limit x → y. The meaning of the geodesic distance d, the

van-Vleck-Morette determinant ∆ and the Abelian parallel-transport operator P will be explained

in the following paragraphs. Furthermore, the coefficients ak satisfy the recursion relation,

$$a_{k+1}(x,y) = \frac{1}{d(x,y)^{k+1}}\int_{\mathcal{C}(x,y)}d(x',y)^k\,\mathcal{P}(x',y)^{-1}\Delta^{-\frac12}(x',y)\,L_{x'}\,\Delta^{\frac12}(x',y)\,\mathcal{P}(x',y)\,a_k(x',y)\,dx' , \tag{3.16}$$

where the integral is along the geodesic connecting x and y. Note that the line element $dx' = \sqrt{g_{ij}\dot{x}^i\dot{x}^j}\,ds$ involves the metric tensor once a parametrization of the geodesic has been chosen.

The geodesic distance d(x, y) is defined as the minimal length of a curve between the points x

and y, i.e.

$$d(x,y) = \min_{x(s),\;x(0)=x,\;x(s_f)=y}\int_0^{s_f}\sqrt{g_{ij}(x(s))\,\dot{x}^i(s)\,\dot{x}^j(s)}\;ds . \tag{3.17}$$

The minimizing curve C(x, y) is called a geodesic. Considering linear variations δx(s) to such an optimal curve, an Euler-Lagrange ODE can be derived as a necessary condition for a curve to be a geodesic. Provided the curve is parametrized by its arc length (or a constant multiple thereof), the resulting second-order ODE reads as²

$$\ddot{x}^i + \Gamma^i_{jk}\,\dot{x}^j\dot{x}^k = 0 , \tag{3.18}$$

where dots denote derivatives with respect to the arc length s and $\Gamma^i_{jk}$ are the Christoffel symbols as defined in Eq. (3.7). Alternatively, a geodesic can be defined as a curve whose tangent vector $\dot{x}^i(s)$ is parallel transported as one moves along the curve, i.e.

$$\dot{x}^i\nabla_i\dot{x}^j = 0 . \tag{3.19}$$

In other words, a geodesic generalizes the concept of a straight line to a curved geometry: Following a geodesic, one moves forward ‘following one’s nose’ without turning or twisting. Eq. (3.19) directly yields Eq. (3.18) using the definition of the covariant derivative and $\dot{x}^i\partial_i\dot{x}^j = \ddot{x}^j$.

The van-Vleck-Morette determinant ∆(x, y) is defined as

$$\Delta(x,y) = g(x)^{-\frac12}\det\left(-\frac{\partial^2}{\partial x\,\partial y}\,\frac12d^2(x,y)\right)g(y)^{-\frac12} , \tag{3.20}$$

and behaves as a scalar under coordinate transformations.

Similar to parallel transport induced by the Levi-Civita connection and defined in Eq. (3.12), one can consider a generalized parallel transport including the Abelian connection $A_i$, i.e. replacing $\nabla_i \to \nabla^A_i$ in Eq. (3.12). The effect of this inclusion is an additional scalar factor $\mathcal{P}(x,y)$ in the parallel transport operator given by the line integral of A along the geodesic $\mathcal{C}(x,y)$,

$$\mathcal{P}(x,y) = \exp\left(\int_{\mathcal{C}(x,y)}A_i\,dx^i\right) . \tag{3.21}$$

Summarizing this subsection, we reexpress the heat-kernel expansion of the transition density

for small τ in Eq. (3.15) in a form useful for the following calculations,

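$$p(\tau,x,y) = \tau^{-\frac n2}A(x,y)\,e^{-\frac{B(x,y)}{2\tau}}\sum_{k=0}^{\infty}a_k(x,y)\,\tau^k , \tag{3.22}$$

where $B(x,y) = \frac12d^2(x,y)$ and

$$A(x,y) = \frac{\sqrt{g(y)}}{(4\pi)^{\frac n2}}\sqrt{\Delta(x,y)}\;\mathcal{P}(x,y) . \tag{3.23}$$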

² For a detailed derivation, see e.g. Chap. 3 of [13].

3.1.3 Expected variance

In order to calculate the time value of a vanilla call option in the next subsection, the expectation

value of the instantaneous variance of the rate is needed with the final rate fixed at the value of the

strike, i.e.

$$v(\tau,x,K) \equiv E\big[(\sigma_1(X_\tau))^2\,\delta(X^1_\tau-K)\,\big|\,X_0 = x\big] = \int d^ny\,[\sigma_1(y)]^2\,\delta(y^1-K)\,p(\tau;x,y) . \tag{3.24}$$

The expression is for a general n-factor model where the first coordinate is the relevant interest rate, i.e. $x^1 = f$.

For the local volatility model in Eq. (2.8), i.e. for n = 1, Eq. (3.24) reduces to an algebraic one,

$$v(\tau,f,K) = \sigma^2C(K)^2\,p(\tau,f,K) = \tau^{-\frac12}\sigma^2C(K)^2A(f,K)\,e^{-\frac{B(f,K)}{2\tau}}\sum_{k=0}^{\infty}a_k(f,K)\,\tau^k . \tag{3.25}$$

Thus, the asymptotic expansion for the expected variance in the local volatility model is given by

the same coefficients as for the transition density with an extra prefactor.

For the two-factor SABR model in Eq. (1.10), a one-dimensional integral remains to be carried

out,

$$v(\tau;f,\alpha;K) = C(K)^2\sum_{k=0}^{\infty}\tau^{k-1}\int\alpha'^2A(f,\alpha;K,\alpha')\,e^{-\frac{B(f,\alpha;K,\alpha')}{2\tau}}\,a_k(f,\alpha;K,\alpha')\,d\alpha' , \tag{3.26}$$

where we have used the heat-kernel expansion of the transition density in Eq. (3.22). The integrands in Eq. (3.26) are of the form $f(\alpha')\,e^{-g(\alpha')/\tau}$ where τ is a small parameter. The integral is then dominated by the behavior of f and g in the vicinity of the minimum $\hat\alpha$ of $g(\alpha')$. It can be expanded in an asymptotic series using a saddle-point expansion (Laplace method, a short summary is given in App. A) resulting in

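$$v(\tau;f,\alpha;K) = \hat{A}(f,\alpha;K)\,\tau^{-\frac12}e^{-\frac{\hat{B}(f,\alpha;K)}{2\tau}}\sum_{k=0}^{\infty}b_k(f,\alpha;K)\,\tau^k , \tag{3.27}$$

with $b_0 = 1$,

$$\hat{B}(f,\alpha;K) = \min_{\alpha'}B(f,\alpha;K,\alpha') = B(f,\alpha;K,\hat\alpha) , \tag{3.28}$$

as well as

$$\hat{A}(f,\alpha;K) = \sqrt{\frac{4\pi}{B''(f,\alpha;K,\hat\alpha)}}\;\hat\alpha^2C^2(K)\,A(f,\alpha;K,\hat\alpha) . \tag{3.29}$$

Here, primes denote derivatives with respect to α′ evaluated at the position $\hat\alpha$ of the minimum. The first correction term is given by the following rather cumbersome expression,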

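$$b_1 = a_1 + \frac{1}{B''}\left(\frac{2A'}{A} + \frac{2}{\hat\alpha^2} + \frac{A''}{A}\right) - \frac{1}{(B'')^2}\left(\frac{2B'''}{\hat\alpha} + \frac{A'B'''}{A} + \frac{B^{(4)}}{4}\right) + \frac{5}{12}\frac{(B''')^2}{(B'')^3} . \tag{3.30}$$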

For brevity, we have omitted the arguments of the two-point functions which should be clear from

the context and by comparison with Eq. (3.29).
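The structure of such a Laplace expansion can be illustrated on a toy integral. The sketch below is not the SABR integrand: the functions `f` and `g` are hypothetical stand-ins (g plays the role of B/2 and f the role of the slowly varying prefactor), and the first-order coefficient is the standard textbook Laplace correction written in terms of derivatives at the minimum.

```python
import math

def f(a):
    return a * a                      # toy prefactor, plays the role of alpha'^2 A

def g(a):
    return math.cosh(a - 1.0) - 1.0   # toy exponent with its minimum at a = 1

def integral(tau, lo=-4.0, hi=6.0, n=20000):
    # midpoint rule; the integrand is negligible at both integration limits
    h = (hi - lo) / n
    pts = (lo + (i + 0.5) * h for i in range(n))
    return h * sum(f(a) * math.exp(-g(a) / tau) for a in pts)

# derivatives at the minimum a = 1 (known analytically for the toy functions)
f0, f1, f2 = 1.0, 2.0, 2.0
g2, g3, g4 = 1.0, 0.0, 1.0

def laplace(tau, first_order=True):
    lead = f0 * math.sqrt(2.0 * math.pi * tau / g2)      # g(1) = 0 here
    b1 = (f2 / (2 * f0 * g2) - f1 * g3 / (2 * f0 * g2 ** 2)
          - g4 / (8 * g2 ** 2) + 5 * g3 ** 2 / (24 * g2 ** 3))
    return lead * (1.0 + b1 * tau) if first_order else lead

tau = 0.01
exact = integral(tau)
err_leading = laplace(tau, first_order=False) / exact - 1.0
err_first = laplace(tau) / exact - 1.0
print(err_leading, err_first)
```

For this toy case the relative error should drop from order τ for the leading term to order τ² once the first correction is included. Substituting g = B/2 into the same correction reproduces the combinations of B″, B‴ and B⁽⁴⁾ that appear above.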

For comparison with Monte Carlo simulations, it will prove useful to derive a similar expression

for the marginal transition density of the SABR model

$$p_m(\tau;f,\alpha;F) = \int d\alpha'\,p(\tau;f,\alpha,F,\alpha') . \tag{3.31}$$

We can again use Laplace’s method to obtain the following asymptotic expansion,

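$$p_m(\tau;f,\alpha;F) = \tilde{A}(f,\alpha;F)\,\tau^{-\frac12}e^{-\frac{\hat{B}(f,\alpha;F)}{2\tau}}\sum_{k=0}^{\infty}b_k(f,\alpha;F)\,\tau^k , \tag{3.32}$$

with

$$\tilde{A}(f,\alpha;F) = \sqrt{\frac{4\pi}{B''(f,\alpha;F,\hat\alpha)}}\;A(f,\alpha;F,\hat\alpha) , \tag{3.33}$$

$b_0 = 1$ and

$$b_1 = a_1 + \frac{A''}{AB''} - \frac{1}{(B'')^2}\left(\frac{A'B'''}{A} + \frac{B^{(4)}}{4}\right) + \frac{5(B''')^2}{12(B'')^3} . \tag{3.34}$$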

Summarizing this subsection, we have derived an asymptotic expansion for the expected variance

which for both the local volatility model as well as the SABR model is of the form,

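$$v(\tau;x;K) = \hat{A}(x,K)\,\tau^{-\frac12}e^{-\frac{\hat{B}(x,K)}{2\tau}}\sum_{k=0}^{\infty}b_k(x,K)\,\tau^k , \tag{3.35}$$

as well as a similar expansion for the marginal transition density of the form,

$$p_m(\tau;x;K) = \tilde{A}(x,K)\,\tau^{-\frac12}e^{-\frac{\hat{B}(x,K)}{2\tau}}\sum_{k=0}^{\infty}b_k(x,K)\,\tau^k . \tag{3.36}$$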

Note that the ratio of the expressions in Eqs. (3.35) and (3.36) has a leading order expansion where

a lot of factors cancel out. For the SABR model, we obtain

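$$\frac{v(\tau;x;K)}{p_m(\tau;x;K)} = \hat\alpha^2C^2(K)\left[1 + \left\{\frac{1}{B''}\left(\frac{2A'}{A} + \frac{2}{\hat\alpha^2}\right) - \frac{2B'''}{(B'')^2}\right\}\tau + O(\tau^2)\right] . \tag{3.37}$$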

This ratio is just the local volatility as we will see in the next section.

3.1.4 Time value of vanilla option

The (normalized) value of a vanilla European call option is given by

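$$\begin{aligned}
C(\tau,x,K) &= E\big[(X^1_\tau-K)^+\,\big|\,X_0 = x\big] = \int dy^1\,(y^1-K)^+\,p_m(\tau,x,y^1) \\
&= (x^1-K)^+ + \int_0^\tau d\tau'\int dy^1\,(y^1-K)^+\,\partial_{\tau'}p_m(\tau',x,y^1) ,
\end{aligned} \tag{3.38}$$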

where pm is the marginal transition density defined in Eq. (3.31). Now integrating the Kolmogorov

forward Eq. (2.14) over y2 to yn, one obtains

$$\partial_\tau p_m(\tau,x,K) = \frac12\,\partial_K^2\int d^ny\,[\sigma_1(y)]^2\,\delta(y^1-K)\,p(\tau,x,y) . \tag{3.39}$$

Other terms in the Kolmogorov forward equation vanish after integration since the transition density

is sufficiently strongly suppressed at infinity. Eq. (3.39) can also be expressed as

$$\partial_\tau p_m(\tau,x,K) = \frac12\,\partial_K^2\big[K^2\sigma^2_{\mathrm{loc}}(x,K)\,p_m(\tau,x,K)\big] , \tag{3.40}$$

where Dupire’s local volatility σloc can be obtained from the expected variance by normalizing with

the marginal transition density,

$$K^2\sigma^2_{\mathrm{loc}}(x,K) = \frac{v(\tau,x,K)}{p_m(\tau,x,K)} = \frac{E\big[\sigma_1^2(X_\tau)\,\delta(X^1_\tau-K)\,\big|\,X_0 = x\big]}{p_m(\tau,x,K)} . \tag{3.41}$$

Using Eq. (3.39) in Eq. (3.38) and integrating twice by parts, one obtains

$$C(\tau,x,K) = (x^1-K)^+ + \frac12\int_0^\tau E\big[\sigma_1^2(X_{\tau'})\,\delta(X^1_{\tau'}-K)\,\big|\,X_0 = x\big]\,d\tau' . \tag{3.42}$$

The expectation value has been calculated in Eq. (3.35) as an asymptotic series for small τ . To

obtain a similar series for the option value itself, we need to evaluate the following time integrals,

$$I_k(x,s) = \int_0^su^{k-\frac12}\,e^{-\frac{x}{2u}}\,du . \tag{3.43}$$

Integration by parts yields the recursion relation,

$$I_k(x,s) = \frac{s^{k+\frac12}}{k+\frac12}\,e^{-\frac{x}{2s}} - \frac{x}{2(k+\frac12)}\,I_{k-1}(x,s) . \tag{3.44}$$

For k = −1, the integral can be reduced to a Gaussian one by variable transformation such that

$$I_{-1} = \sqrt{\frac{8\pi}{x}}\;N\!\left(-\sqrt{\frac xs}\right) = \sqrt{\frac{2\pi}{x}}\;\mathrm{erfc}\!\left(\sqrt{\frac{x}{2s}}\right) . \tag{3.45}$$

Here, N(x) is the cumulative normal distribution function and erfc is the closely related complementary error function. Thus, all integrals $I_k$ can be reduced to elementary functions and the cumulative

normal distribution function. Now, to obtain an asymptotic expansion for the option value, we can

use results for the asymptotic behaviour of the complementary error function at large arguments.

Alternatively, we can apply the recursion relations backwards ad infinitum, and obtain

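$$\begin{aligned}
I_k(x,s) &= \frac{2s^{k+\frac32}}{x}\,e^{-\frac{x}{2s}} - \frac{2k+3}{x}\,I_{k+1}(x,s) \\
&= \frac{2s^{k+\frac32}}{x}\,e^{-\frac{x}{2s}}\left[1 - \frac sx(2k+3)\right] + \frac{(2k+3)(2k+5)}{x^2}\,I_{k+2}(x,s) \\
&= \dots \\
&= \frac{2s^{k+\frac32}}{x}\,e^{-\frac{x}{2s}}\left[1 + \sum_{l=1}^{\infty}(2k+3)\cdots(2(k+l)+1)\left(-\frac sx\right)^l\right] .
\end{aligned} \tag{3.46}$$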
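The time integrals can be cross-checked numerically. The sketch below is only illustrative: the function names are hypothetical, the quadrature is a plain midpoint rule, and the series is truncated at an arbitrarily chosen order.

```python
import math

def I_base(x, s):
    # I_{-1}(x, s) from Eq. (3.45)
    return math.sqrt(2.0 * math.pi / x) * math.erfc(math.sqrt(x / (2.0 * s)))

def I_recursive(k, x, s):
    # climb from k = -1 with the recursion of Eq. (3.44)
    val = I_base(x, s)
    for j in range(0, k + 1):
        val = (s ** (j + 0.5) * math.exp(-x / (2.0 * s)) - 0.5 * x * val) / (j + 0.5)
    return val

def I_quadrature(k, x, s, n=20000):
    # midpoint rule for Eq. (3.43); the integrand vanishes rapidly as u -> 0
    h = s / n
    return h * sum(((i + 0.5) * h) ** (k - 0.5) * math.exp(-x / (2.0 * (i + 0.5) * h))
                   for i in range(n))

def I_series(k, x, s, order=3):
    # truncated asymptotic series of Eq. (3.46), intended for s much smaller than x
    total, term = 1.0, 1.0
    for l in range(1, order + 1):
        term *= -(2 * (k + l) + 1) * s / x
        total += term
    return 2.0 * s ** (k + 1.5) / x * math.exp(-x / (2.0 * s)) * total

for k in (-1, 0, 1, 2):
    print(k, I_recursive(k, 0.5, 1.0), I_quadrature(k, 0.5, 1.0))
print(I_recursive(0, 1.0, 0.02), I_series(0, 1.0, 0.02))
```

The upward recursion suffers from some cancellation when s is much smaller than x, which is exactly the regime where the truncated series is the more convenient representation.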

Putting things together and sorting by orders of τ , we finally obtain the following asymptotic series

for the option value,

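$$C(\tau,x,K) = (x^1-K)^+ + \frac{\hat{A}(x,K)}{\hat{B}(x,K)}\,\tau^{\frac32}\,e^{-\frac{\hat{B}(x,K)}{2\tau}}\sum_{k=0}^{\infty}c_k(x,K)\,\tau^k , \tag{3.47}$$

with

$$c_k(x,K) = b_k(x,K) + \sum_{l=0}^{k-1}\frac{(2l+3)\cdots(2k+1)}{(-\hat{B})^{k-l}}\,b_l(x,K) . \tag{3.48}$$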

3.1.5 Implied volatility

In order to find the log-normal implied volatility, we need to match the option price in the full model

with the one obtained in the Black model. This is done by writing down the asymptotic expansion

for the time value in the Black model. Since this is a special case of the general local-volatility model

treated in Sec. 3.2 with C(f) = f , we will simply state the result here and postpone the details of

the calculation,

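$$C_B - (f-K)^+ = \sqrt{\frac{fK(\sigma^2\tau)^3}{2\pi\ln^4\frac Kf}}\;e^{-\frac{\ln^2\frac Kf}{2\sigma^2\tau}}\left(1 - \sigma^2\tau\left[\frac18 + 3\ln^{-2}\frac Kf\right] + O(\tau^2)\right) . \tag{3.49}$$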
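Eq. (3.49) can be compared with the exact (normalized, undiscounted) Black formula. The following is a minimal sketch with hypothetical function names and arbitrarily chosen parameters; it assumes an out-of-the-money call (K > f), so that the intrinsic value vanishes, and a small total variance σ²τ compared to ln²(K/f).

```python
import math

def norm_cdf(z):
    # cumulative normal via erfc, accurate in the far tails
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def black_call(f, K, sigma, tau):
    v = sigma * math.sqrt(tau)
    d1 = (math.log(f / K) + 0.5 * v * v) / v
    d2 = d1 - v
    return f * norm_cdf(d1) - K * norm_cdf(d2)

def black_time_value_asymptotic(f, K, sigma, tau, first_order=True):
    # Eq. (3.49), with or without the O(tau) correction
    L = math.log(K / f)
    w = sigma * sigma * tau
    lead = math.sqrt(f * K * w ** 3 / (2.0 * math.pi * L ** 4)) * math.exp(-L * L / (2.0 * w))
    corr = 1.0 - w * (0.125 + 3.0 / (L * L)) if first_order else 1.0
    return lead * corr

f, K, sigma, tau = 1.0, 2.0, 0.2, 0.25
exact = black_call(f, K, sigma, tau) - max(f - K, 0.0)
err0 = black_time_value_asymptotic(f, K, sigma, tau, first_order=False) / exact - 1.0
err1 = black_time_value_asymptotic(f, K, sigma, tau) / exact - 1.0
print(err0, err1)
```

For these parameters the leading term alone is off by several percent, while the first correction brings the relative error below one percent. Closer to the money the expansion parameter σ²τ/ln²(K/f) is no longer small and the truncated series should not be expected to be accurate.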

Assuming that the implied volatility σimpl. has a regular expansion in τ ,

$$\sigma_{\mathrm{impl.}} = \sigma_0 + \sigma_1\tau + \sigma_2\tau^2 + O(\tau^3) , \tag{3.50}$$

we can take logarithms on both sides of the identity CB = C and thoroughly expand in powers of

τ . The coefficients must match on both sides. The leading order ∼ τ−1 yields

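$$\sigma_0(x,K) = \frac{\left|\ln\frac Kf\right|}{\sqrt{\hat{B}(x,K)}} = \frac{\sqrt2\,\left|\ln\frac Kf\right|}{d_{\min}(x,K)} , \tag{3.51}$$

whereas the constant terms (zeroth order in τ) result in

$$\frac{\sigma_1}{\sigma_0} = \frac{1}{\hat{B}}\ln\left(\frac{1}{\sigma_0}\sqrt{\frac{2\pi}{Kf}}\;\hat{A}\right) , \tag{3.52}$$

and the terms proportional to τ yield

$$\frac{\sigma_2}{\sigma_0} = \frac32\left(\frac{\sigma_1}{\sigma_0}\right)^2 + \frac{1}{\hat{B}}\left(c_1 - \frac{3\sigma_1}{\sigma_0} + \sigma_0^2\left[\frac18 + \frac{3}{\ln^2\frac Kf}\right]\right) . \tag{3.53}$$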

In summary, we have shown in this section, how the heat-kernel expansion in Eq. (3.15) can be

used to obtain an expansion for the Black implied volatility. Coefficients in this latter expansion are

algebraic combinations of coefficients of the former expansion and some derivatives with respect to α

thereof. The remaining task is to derive explicit analytical expressions for the quantities of the heat-kernel expansion in Eq. (3.15), namely the geodesic distance, the van-Vleck-Morette determinant,

the parallel transport integral and wherever tractable the first correction term a1.

3.2 Application to local volatility model

Let us now apply the formalism presented in the previous section to the simple case of the local-

volatility model in Eq. (2.8), i.e. n = 1 and σ1 = σC(f). We will assume C(f) to be sufficiently

smooth for the derivations here. In Sec. 7.1.3, we will comment on problems that arise from the

non-analytic nature of the C(f) at regime switching points. Approximate expressions for the implied

volatility for the local volatility model have been obtained early on by Hagan and Woodward using singular-perturbation theory [29]. Mathematically rigorous results, in particular an exact expression for the implied volatility in the limit τ → 0, have been derived by Berestycki et al. using a

PDE approach directly for the implied volatility [8]. Gatheral provides a thorough review of the

existing literature and accurate results for time-inhomogeneous local-volatility problems [25]. The

presentation in the current section is very much along the lines of the recent paper by Taylor [55].

The metric tensor for the local-volatility model only has a single component $g^{11} = \sigma^2C^2(f)/2$, thus $g_{11} = 2/(\sigma^2C^2(f))$ and $\sqrt{g(f)} = \sqrt2/(\sigma C(f))$. According to Eq. (3.9), the Abelian connection is given by

$$A^1(f) = -\frac{\sigma^2}{4}\,C(f)\,C'(f) \quad\Rightarrow\quad A_1(f) = -\frac12\,\frac{C'(f)}{C(f)} . \tag{3.54}$$

There is only a single Christoffel symbol given by

$$\Gamma^1_{11} = -\frac{C'(f)}{C(f)} . \tag{3.55}$$

A geodesic connecting f to F is trivially given by the interval [f, F ] parametrized by the geodesic

distance

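$$d(f,F) = \left|\int_f^F\sqrt{g_{11}(\phi)}\,d\phi\right| = \sqrt2\left|\int_f^F\frac{d\phi}{\sigma C(\phi)}\right| = \frac{\sqrt2}{\sigma}\,|\Sigma(F)-\Sigma(f)| . \tag{3.56}$$

Following the notation in [26], we have defined the Σ-function

$$\Sigma(f) = \int^f\frac{d\phi}{C(\phi)} , \tag{3.57}$$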

where the constant of integration can be chosen arbitrarily. As this quantity will be needed in

various contexts, we have compiled the analytical expressions for all the relevant C(f) in App. B.

Taking derivatives of Eq. (3.56), it follows that

$$-\frac{\partial^2}{\partial f\,\partial F}\left[\frac12d^2(f,F)\right] = \frac{2}{\sigma^2C(f)C(F)} , \tag{3.58}$$

and thus the van-Vleck-Morette determinant is given by ∆(f, F) = 1. The parallel transport integral can easily be calculated,

$$\int_{\mathcal{C}(f,F)}A_1(\phi)\,d\phi = -\frac12\int_f^F\frac{C'(\phi)}{C(\phi)}\,d\phi = -\frac12\ln\left(\frac{C(F)}{C(f)}\right) , \tag{3.59}$$

and the Abelian factor in the parallel transport operator is thus given by

$$\mathcal{P}(f,F) = \sqrt{\frac{C(f)}{C(F)}} . \tag{3.60}$$

Finally assembling the pieces, the heat-kernel expansion for the transition density reads as

$$p(\tau;f,F) = \sqrt{\frac{C(f)}{2\pi\sigma^2\tau\,C^3(F)}}\;e^{-\frac{1}{2\sigma^2\tau}|\Sigma(F)-\Sigma(f)|^2}\sum_{k=0}^{\infty}a_k(f,F)\,\tau^k , \tag{3.61}$$

with a0 = 1. The recursion relation for the heat-kernel coefficients in Eq. (3.16) simplifies in one

dimension,

$$a_{k+1}(f,F) = \frac{\sigma^2}{2[\Sigma(F)-\Sigma(f)]^{k+1}}\int_f^F[\Sigma(F)-\Sigma(\phi)]^k\,\sqrt{C(\phi)}\;\partial_\phi^2\big[\sqrt{C(\phi)}\,a_k(\phi,F)\big]\,d\phi . \tag{3.62}$$

With a0 = 1, the first correction term explicitly reads as

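$$a_1(f,F) = \frac{\sigma^2}{4[\Sigma(F)-\Sigma(f)]}\int_f^Fd\phi\left\{C''(\phi) - \frac{[C'(\phi)]^2}{2C(\phi)}\right\} = \sigma^2\,\frac{C'(F)-C'(f)-\frac12[\tilde\Sigma(F)-\tilde\Sigma(f)]}{4[\Sigma(F)-\Sigma(f)]} , \tag{3.63}$$

where we have defined

$$\tilde\Sigma(f) = \int^f\frac{[C'(\phi)]^2}{C(\phi)}\,d\phi , \tag{3.64}$$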

with a still arbitrary constant of integration. For a local volatility model, the integral in Eq. (3.27)

reduces to an algebraic expression and bk = ak. The time value of a vanilla call option is then

explicitly given by the following asymptotic expansion,

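$$C(\tau,f,K) = (f-K)^+ + \sqrt{\frac{C(f)C(K)(\sigma^2\tau)^3}{2\pi|\Sigma(K)-\Sigma(f)|^4}}\;e^{-\frac{1}{2\sigma^2\tau}[\Sigma(K)-\Sigma(f)]^2}\sum_{k=0}^{\infty}c_k(f,K)\,\tau^k , \tag{3.65}$$

where

$$c_k(f,K) = a_k(f,K) + \sum_{l=0}^{k-1}\frac{(2l+3)(2l+5)\cdots(2k+1)}{[-(\Sigma(K)-\Sigma(f))^2/\sigma^2]^{k-l}}\;a_l(f,K) . \tag{3.66}$$

For the first coefficients, this equation yields,

$$c_0(f,K) = 1 , \tag{3.67}$$

$$c_1(f,K) = a_1(f,K) - \frac{3\sigma^2}{[\Sigma(K)-\Sigma(f)]^2} , \tag{3.68}$$

$$c_2(f,K) = a_2(f,K) - \frac{5\sigma^2}{[\Sigma(K)-\Sigma(f)]^2}\,a_1(f,K) + \frac{15\sigma^4}{[\Sigma(K)-\Sigma(f)]^4} . \tag{3.69}$$

The leading term in the implied volatility is given by

$$\sigma_0(f,K) = \sigma\,\frac{\ln\frac Kf}{\Sigma(K)-\Sigma(f)} . \tag{3.70}$$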

Note that σ on the right-hand side is the prefactor of C(f) in the local volatility model, not to be confused with the Black volatility. Eq. (3.70) can also be written as

$$\int_f^K\frac{d\phi}{\sigma_0(f,K)\,\phi} = \int_f^K\frac{d\phi}{\sigma C(\phi)} , \tag{3.71}$$

giving a precise notion of how the local volatility C(f) has to be ‘harmonically averaged’ to obtain the (leading order) implied volatility σ0 [8]. The first and second order correction terms are explicitly given by

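$$\frac{\sigma_1}{\sigma_0} = \frac{\sigma^2}{2(\Delta\Sigma)^2}\ln\left[\frac{(\Delta\Sigma)^2}{\ln^2\frac Kf}\,\frac{C(K)C(f)}{Kf}\right] , \tag{3.72}$$

and

$$\begin{aligned}
\frac{\sigma_2}{\sigma_0} &= \frac{\sigma^4}{(\Delta\Sigma)^4}\left\{\frac18\ln^2\frac Kf - \frac32\ln\left[\frac{(\Delta\Sigma)^2}{\ln^2\frac Kf}\,\frac{C(K)C(f)}{Kf}\right] + \frac38\ln^2\left[\frac{(\Delta\Sigma)^2}{\ln^2\frac Kf}\,\frac{C(K)C(f)}{Kf}\right]\right\} \\
&\quad + \frac{\sigma^4}{4(\Delta\Sigma)^3}\left\{C'(K) - C'(f) - \frac12\Delta\tilde\Sigma\right\} ,
\end{aligned} \tag{3.73}$$

where we have used the shorthands $\Delta\Sigma = \Sigma(K) - \Sigma(f)$ and $\Delta\tilde\Sigma = \tilde\Sigma(K) - \tilde\Sigma(f)$.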
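As an illustration of Eqs. (3.70) and (3.72), the following sketch evaluates the implied volatility to first order in τ for a CEV backbone C(f) = f^β. The function names are hypothetical, the CEV choice is only one example of C(f), and the code assumes K ≠ f (at the money the ratio ln(K/f)/ΔΣ has to be replaced by its limit).

```python
import math

def Sigma_cev(f, beta):
    # Sigma(f) = int df / C(f) for C(f) = f**beta, cf. Eq. (3.57); log for beta = 1
    return math.log(f) if beta == 1.0 else f ** (1.0 - beta) / (1.0 - beta)

def implied_vol_local(f, K, sigma, beta, tau):
    L = math.log(K / f)
    dS = Sigma_cev(K, beta) - Sigma_cev(f, beta)
    sigma0 = sigma * L / dS                                   # Eq. (3.70)
    arg = dS ** 2 / L ** 2 * (K * f) ** beta / (K * f)        # uses C(K)C(f) = (Kf)**beta
    ratio1 = sigma ** 2 / (2.0 * dS ** 2) * math.log(arg)     # sigma_1 / sigma_0, Eq. (3.72)
    return sigma0 * (1.0 + ratio1 * tau)

print(implied_vol_local(1.0, 1.3, 0.25, 1.0, 0.5))   # Black case: recovers sigma
print(implied_vol_local(0.03, 0.04, 0.02, 0.5, 1.0))  # square-root CEV example
```

For β = 1 the model is the Black model itself, so the function returns σ and the correction vanishes, which is a convenient sanity check of the formulas.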

3.3 Application to SABR model

Let us now tackle the heat-kernel expansion of the main model of interest, namely the extended

SABR model in Eq. (1.10).

3.3.1 Geodesic distance: Hyperbolic geometry

For notational convenience, we follow [46] and set ν = 1 for most of the following calculations. This

amounts to working with a dimensionless time ν2τ → τ . To recover all factors of ν, we need to

perform the following replacements at the end of the calculation,

$$\tau \to \nu^2\tau , \qquad \alpha \to \frac{\alpha}{\nu} , \qquad \sigma_{\mathrm{impl.}} \to \nu\,\sigma_{\mathrm{impl.}} . \tag{3.74}$$

With this choice of variables, the inverse metric reads as

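$$(g^{ij}) = \frac12\,\alpha^2\begin{pmatrix}C^2(f) & \rho\,C(f)\\ \rho\,C(f) & 1\end{pmatrix} . \tag{3.75}$$

Performing a matrix inversion, we obtain

$$(g_{ij}) = \frac{2}{\alpha^2C^2(f)(1-\rho^2)}\begin{pmatrix}1 & -\rho\,C(f)\\ -\rho\,C(f) & C^2(f)\end{pmatrix} . \tag{3.76}$$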

The metric can be diagonalized by the following coordinate transformation,

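$$x = \frac{q-\rho\alpha}{\sqrt{1-\rho^2}} , \qquad y = \alpha , \tag{3.77}$$

where $q = \int^f\frac{du}{C(u)}$ with a still arbitrary lower integration bound. The Jacobian of the transformation is given by

$$\Lambda = \frac{\partial(x,y)}{\partial(f,\alpha)} = \begin{pmatrix}\frac{1}{C(f)\sqrt{1-\rho^2}} & -\frac{\rho}{\sqrt{1-\rho^2}}\\ 0 & 1\end{pmatrix} , \tag{3.78}$$

such that the components of the transformed metric become,

$$(g^{ij}) \to \Lambda\,(g^{ij})\,\Lambda^T = \frac12\,y^2\begin{pmatrix}1 & 0\\ 0 & 1\end{pmatrix} , \tag{3.79}$$

as well as,

$$(g_{ij}) \to \frac{2}{y^2}\begin{pmatrix}1 & 0\\ 0 & 1\end{pmatrix} . \tag{3.80}$$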

Thus, the line element can formally be written as,

$$ds^2 = \frac{2}{y^2}\,(dx^2 + dy^2) . \tag{3.81}$$

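The transformed metric is that of the Poincaré upper half plane (y > 0) which is a model of hyperbolic geometry. The transformed metric is of the form $g_{ij}(x) = \gamma(x)\,\delta_{ij}$ for which the Christoffel symbols read

$$\Gamma^i_{jk} = \frac{1}{2\gamma}\big[\delta^i_k\,\partial_j\gamma + \delta^i_j\,\partial_k\gamma - \delta_{jk}\,\partial_i\gamma\big] . \tag{3.82}$$

Here, we have $\gamma = 2/y^2$, and thus,

$$\Gamma^1_{11} = \Gamma^1_{22} = \Gamma^2_{12} = \Gamma^2_{21} = 0 , \qquad \Gamma^1_{12} = \Gamma^1_{21} = \Gamma^2_{22} = -\Gamma^2_{11} = -\frac1y . \tag{3.83}$$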

The ODE for the geodesic in Eq. (3.18) thus explicitly reads as,

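$$\ddot{x} - \frac2y\,\dot{x}\dot{y} = 0 , \qquad \ddot{y} + \frac1y\,(\dot{x}^2 - \dot{y}^2) = 0 . \tag{3.84}$$

It can further be simplified by using complex coordinates z = x + iy,

$$\ddot{z} + \frac{i\dot{z}^2}{\mathrm{Im}(z)} = 0 . \tag{3.85}$$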

The geodesics for the Poincaré half-plane model are known to be half-circles centered on the real axis. Using the angle φ to the real x axis as a parameter, we thus have $z(\phi) = x_0 + re^{i\phi}$. However, in order to construct a solution of Eq. (3.85), we need to use the arc length instead of the angle as a parameter. The length of a geodesic between the angles φ1 and φ2 is given by,

$$s = \int_{\phi_1}^{\phi_2}d\phi\,\sqrt{\frac{2}{\mathrm{Im}^2(z(\phi))}\,\frac{dz}{d\phi}\,\frac{d\bar z}{d\phi}} = \sqrt2\int_{\phi_1}^{\phi_2}\frac{d\phi}{\sin\phi} = \sqrt2\,\ln[\tan(\phi/2)]\Big|_{\phi_1}^{\phi_2} . \tag{3.86}$$

Inverting this expression to obtain φ in terms of s, we obtain a solution to Eq. (3.85),

$$z(s) = x(s) + iy(s) = x_0 + r\,e^{2i\arctan(e^{s/\sqrt2})} = x_0 - r + \frac{2r}{1 - ie^{s/\sqrt2}} . \tag{3.87}$$

Alternatively, in terms of real coordinates, we have

$$\begin{pmatrix}x(s)\\ y(s)\end{pmatrix} = \begin{pmatrix}x_0 - r\tanh(s/\sqrt2)\\[2pt] \dfrac{r}{\cosh(s/\sqrt2)}\end{pmatrix} . \tag{3.88}$$

Eq. (3.87) is easily verified to provide indeed a solution of Eq. (3.85), using

$$\dot{z} = \frac{\sqrt2\,i\,r\,e^{s/\sqrt2}}{(1 - ie^{s/\sqrt2})^2} , \qquad \ddot{z} = \frac{i\,r\,e^{s/\sqrt2}\,(1 + ie^{s/\sqrt2})}{(1 - ie^{s/\sqrt2})^3} . \tag{3.89}$$

Given two points (x1, y1) and (x2, y2) on a geodesic, its radius r and its center x0 are determined

by algebraic equations,

$$r^2 = (x_1 - x_0)^2 + y_1^2 = (x_2 - x_0)^2 + y_2^2 , \tag{3.90}$$

with the explicit solutions

$$x_0 = \frac{y_2^2 - y_1^2 + x_2^2 - x_1^2}{2(x_2 - x_1)} , \tag{3.91}$$

and

$$r^2 = \frac{(y_1^2 - y_2^2)^2}{4(x_1 - x_2)^2} + \frac{(x_1 - x_2)^2}{4} + \frac{y_1^2 + y_2^2}{2} . \tag{3.92}$$

Given x0 and r, the position s on the geodesic can be obtained by inverting Eq. (3.87). After some

lengthy but straightforward algebra, the geodesic distance |s1 − s2| between two points (x1, y1) and

(x2, y2) can be cast in the form

$$d = \sqrt2\,\mathrm{acosh}\left(1 + \frac{(x_1 - x_2)^2 + (y_1 - y_2)^2}{2y_1y_2}\right) . \tag{3.93}$$

Substituting the coordinate transformation of Eq. (3.77) into Eq. (3.93), the geodesic distance in the original variables is given by

$$d = \sqrt2\,\mathrm{acosh}\left(1 + \frac{(q_1 - q_2)^2 + (\alpha_1 - \alpha_2)^2 - 2\rho\,(q_1 - q_2)(\alpha_1 - \alpha_2)}{2\alpha_1\alpha_2(1-\rho^2)}\right) . \tag{3.94}$$

With (f1, α1) = (f, α) and (f2, α2) = (K, α′), the minimum of d with respect to α′ occurs at

$$\hat\alpha = \sqrt{\alpha^2 + q^2 + 2\rho q\alpha} , \tag{3.95}$$

where $q = q_2 - q_1 = \int_f^K\frac{df'}{C(f')} = \Sigma(K) - \Sigma(f) \equiv \Delta\Sigma$.
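The position of the minimum and the resulting minimal distance can be checked by brute force. The sketch below works in units with ν = 1 and evaluates the distance directly from the half-plane formula (3.93) after applying the change of variables (3.77); the function names, the grid search and the parameter values are hypothetical choices for illustration only. The logarithmic closed form used for comparison is the one quoted later in Sec. 3.3.5.

```python
import math

def geodesic_distance(q1, a1, q2, a2, rho):
    # Eq. (3.93) evaluated at x = (q - rho * alpha) / sqrt(1 - rho^2), y = alpha
    rb2 = 1.0 - rho * rho
    dx2 = (q1 - q2 - rho * (a1 - a2)) ** 2 / rb2     # (x1 - x2)^2
    dy2 = (a1 - a2) ** 2                              # (y1 - y2)^2
    return math.sqrt(2.0) * math.acosh(1.0 + (dx2 + dy2) / (2.0 * a1 * a2))

def alpha_hat(alpha, q, rho):
    # Eq. (3.95): terminal volatility that minimizes the distance
    return math.sqrt(alpha * alpha + q * q + 2.0 * rho * q * alpha)

def d_min_closed(alpha, q, rho):
    # logarithmic closed form of the minimal distance (nu = 1)
    ah = alpha_hat(alpha, q, rho)
    return math.sqrt(2.0) * abs(math.log((ah - q - rho * alpha) / (alpha * (1.0 - rho))))

alpha, q, rho = 0.3, 0.5, -0.4
grid = [0.05 + 1e-4 * i for i in range(20000)]
a_best = min(grid, key=lambda a2: geodesic_distance(0.0, alpha, q, a2, rho))
print(a_best, alpha_hat(alpha, q, rho))
print(geodesic_distance(0.0, alpha, q, alpha_hat(alpha, q, rho), rho),
      d_min_closed(alpha, q, rho))
```

Only the difference q = q2 − q1 matters, so the initial point is placed at q1 = 0 without loss of generality.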

3.3.2 van-Vleck-Morette determinant

For the Poincaré half plane, the van-Vleck-Morette determinant can be expressed in terms of the geodesic distance,

$$\Delta(x,y) = \frac{d(x,y)/\sqrt2}{\sinh\big(d(x,y)/\sqrt2\big)} . \tag{3.96}$$

This can be verified by explicitly differentiating Eq. (3.93) according to the definition in Eq. (3.20).
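Instead of differentiating by hand, the statement can also be checked numerically. The following sketch evaluates the definition (3.20) with central finite differences for the half-plane metric of Eq. (3.81) and compares it with the closed form (3.96); the two test points, the step size and the function names are arbitrary, hypothetical choices.

```python
import math

def half_d2(p, q):
    # (1/2) d^2 on the half plane with ds^2 = 2 (dx^2 + dy^2) / y^2
    (x1, y1), (x2, y2) = p, q
    c = 1.0 + ((x1 - x2) ** 2 + (y1 - y2) ** 2) / (2.0 * y1 * y2)
    return math.acosh(c) ** 2          # d = sqrt(2) acosh(c), hence d^2 / 2 = acosh(c)^2

def mixed(p, q, i, j, h=1e-4):
    # central difference for the mixed derivative d^2 F / (d p_i d q_j)
    def F(si, sj):
        pp, qq = list(p), list(q)
        pp[i] += si * h
        qq[j] += sj * h
        return half_d2(pp, qq)
    return (F(1, 1) - F(1, -1) - F(-1, 1) + F(-1, -1)) / (4.0 * h * h)

def sqrt_g(pt):
    return 2.0 / pt[1] ** 2            # sqrt(det g_ij) = 2 / y^2

def van_vleck_fd(p, q):
    m = [[-mixed(p, q, i, j) for j in range(2)] for i in range(2)]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return det / (sqrt_g(p) * sqrt_g(q))

def van_vleck_closed(p, q):
    rho_h = math.sqrt(half_d2(p, q))   # equals d / sqrt(2)
    return rho_h / math.sinh(rho_h)

p, q = (0.0, 1.0), (0.8, 1.5)
print(van_vleck_fd(p, q), van_vleck_closed(p, q))
```

The finite-difference value carries an error of the order of the squared step size, so agreement to five or six digits is all that should be expected.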

3.3.3 Parallel transport

Carrying out the differentiation in Eq. (3.9) explicitly, the Abelian connection is given by

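$$(A^i) = \left(-\frac12\,g^{-\frac12}\partial_j(g^{\frac12}g^{ij})\right) = -\frac14\,\alpha^2C(f)\,C'(f)\begin{pmatrix}1\\ 0\end{pmatrix} , \tag{3.97}$$

resulting in

$$(A_i) = (g_{ij}A^j) = \frac{C'(f)}{2C(f)(1-\rho^2)}\begin{pmatrix}-1\\ \rho\,C(f)\end{pmatrix} . \tag{3.98}$$

Thus the integral in the scalar factor of the parallel transport operator reads as

$$\int_{\mathcal{C}}A_i\,dx^i = -\frac{1}{2(1-\rho^2)}\int_{\mathcal{C}}\frac{C'(f)}{C(f)}\,df + \frac{\rho}{2(1-\rho^2)}\int_{\mathcal{C}}C'(f)\,d\alpha . \tag{3.99}$$

The first integral is over an exact form, thus independent of the path, and can be easily carried out. We obtain,

$$\mathcal{P}(f,\alpha;F,\alpha') = e^{\int_{\mathcal{C}}A_i\,dx^i} = \left(\frac{C(f)}{C(F)}\right)^{\frac{1}{2(1-\rho^2)}}e^{\frac{\rho}{2(1-\rho^2)}M} , \tag{3.100}$$

where

$$M = \int_{\mathcal{C}}C'(f)\,d\alpha , \tag{3.101}$$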

denotes the remaining path dependent term that needs to be calculated depending on the particular

realization of C(f).

3.3.3.1 Numerical quadrature for M

The remaining integral in Eq. (3.101) is a line integral along a geodesic. Using the angle φ to the x axis in the x-y-plane as a parameter, the integral can be expressed as follows,

$$M = \int_{\phi_1}^{\phi_2}C'\Big(\Sigma^{-1}\big(\bar\rho\,x_0 + r[\bar\rho\cos\phi + \rho\sin\phi]\big)\Big)\,r\cos\phi\,d\phi , \tag{3.102}$$

where $\bar\rho = \sqrt{1-\rho^2}$. For all models considered here, the derivative C′(f) as well as the function Σ(f) and its inverse can be calculated analytically (see App. B). The integral for M can then easily be carried out by numerical quadrature. For special cases, it can also be performed analytically, as shown in the next two subsections.
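A possible numerical quadrature for M is sketched below. It is a minimal illustration rather than the implementation used for the results of this thesis: the function name, the midpoint rule and the test values are hypothetical, and the integrand is written with q = ρ̄x + ρy and ρ̄ = √(1 − ρ²), i.e. the inverse of the transformation (3.77), which is an assumption about the intended reading of Eq. (3.102). The caller supplies C′(Σ⁻¹(q)) as a function of q, and the two end points must have different x coordinates.

```python
import math

def M_quadrature(dC_of_q, q1, a1, q2, a2, rho, n=2000):
    rb = math.sqrt(1.0 - rho * rho)
    x1, x2 = (q1 - rho * a1) / rb, (q2 - rho * a2) / rb               # Eq. (3.77), y = alpha
    x0 = (a2 * a2 - a1 * a1 + x2 * x2 - x1 * x1) / (2.0 * (x2 - x1))  # Eq. (3.91)
    r = math.hypot(x1 - x0, a1)                                       # Eq. (3.90)
    phi1 = math.atan2(a1, x1 - x0)
    phi2 = math.atan2(a2, x2 - x0)
    h = (phi2 - phi1) / n
    total = 0.0
    for i in range(n):                                                # midpoint rule in phi
        phi = phi1 + (i + 0.5) * h
        q = rb * (x0 + r * math.cos(phi)) + rho * r * math.sin(phi)
        total += dC_of_q(q) * r * math.cos(phi)                       # d alpha = r cos(phi) d phi
    return total * h

# constant C'(f) = 1 (log-normal backbone): M must equal alpha_2 - alpha_1
M_const = M_quadrature(lambda q: 1.0, 2.0, 0.3, 2.6, 0.5, -0.3)

# CEV backbone with beta = 1/2: C'(Sigma^{-1}(q)) = beta / ((1 - beta) q) = 1 / q
M_cev = M_quadrature(lambda q: 1.0 / q, 2.0, 0.3, 2.6, 0.5, -0.3)
print(M_const, M_cev)
```

The constant-derivative case provides a simple consistency check, since the line integral then reduces to the difference of the end-point volatilities.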

3.3.3.2 Standard SABR model

For the standard CEV-based SABR model, we have $C(f) = f^\beta$. With

$$\Sigma(f) = \frac{f^{1-\beta}}{1-\beta} , \qquad \Sigma^{-1}(q) = [(1-\beta)q]^{\frac{1}{1-\beta}} , \qquad C'(\Sigma^{-1}(q)) = \frac{\beta}{1-\beta}\,\frac1q , \tag{3.103}$$

the integral for M reduces to

$$M = \frac{\beta}{1-\beta}\int_{\phi_1}^{\phi_2}\frac{r\cos\phi}{\bar\rho\,x_0 + r[\bar\rho\cos\phi + \rho\sin\phi]}\,d\phi . \tag{3.104}$$

The integrand is a rational function of sin φ and cos φ. The Weierstrass substitution

$$t = \tan\frac\phi2 , \qquad d\phi = \frac{2\,dt}{1+t^2} , \qquad \sin\phi = \frac{2t}{1+t^2} , \qquad \cos\phi = \frac{1-t^2}{1+t^2} , \tag{3.105}$$

then reduces it to an integral over a rational function of t which can be carried out by partial fractions. After some algebra, we obtain

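$$M = G(t_1) - G(t_2) , \tag{3.106}$$

with

$$G(t) = \frac{\beta}{1-\beta}\,\frac{r\,\mathrm{sgn}(x_0 - r)}{\sqrt{r^2 - \bar\rho^2x_0^2}}\sum_{s=\pm}\frac{s}{1+t_s^2}\Big\{t_s^2\ln(t - t_s) + \pi\,t_s\,\mathrm{atan}(t^{-1}) - \ln(1+t^2)\Big\} , \tag{3.107}$$

and the roots $t_\pm$ of some characteristic polynomial,

$$t_\pm = -\frac{r\rho}{\bar\rho\,(x_0 - r)} \pm \sqrt{\frac{r^2 - \bar\rho^2x_0^2}{\bar\rho^2(x_0 - r)^2}} . \tag{3.108}$$

Note that in Eq. (3.107), we have kept a potentially complex root and a complex logarithm to avoid some case distinctions.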

Figure 3.1: Visualization of the geodesics of the three-regime model in the f-α plane (left panel) as well as in the standard x-y Poincaré hyperbolic plane (right panel). In the x-y plane, geodesics are half circles centered on the x axis. The coordinate transformation between the left and right panels is given in Eq. (3.77), where q = Σ(f) and Σ(f) for the three-regime model is given in Eq. (B.6).

3.3.3.3 Three-regime model

For the three-regime model with C(f) in Eq. (1.12), the path dependent integral in Eq. (3.101) has

a clear geometric interpretation. As C′(f) is piecewise constant it can be taken out of the integral,

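$$M = \int_{\mathcal{C}_L}d\alpha + \kappa\int_{\mathcal{C}_R}d\alpha = M_L + \kappa M_R . \tag{3.109}$$

Here, $\mathcal{C}_L$ and $\mathcal{C}_R$ are the intersections of the relevant geodesic $\mathcal{C}$ with the regimes $f \le y_L$ and $f \ge y_R$, respectively. Thus $M_L$ and $M_R$ are the lengths of the projections of these intersections onto the α-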

axis. Fig. 3.1 shows a visualization of the geodesics and the regimes in the f -α as well as the x-y

plane. Note that y = α, and thus the length of the projection does not change under the variable

transformation. The calculation of the intersections can thus be done in the x-y plane by elementary

geometry.

3.3.4 Transition density

We are now ready to write down the leading asymptotic expansion for the transition density of the

extended SABR model. Assembling all contributions and reinstating the factors of ν, Eq. (3.15)

becomes

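$$\begin{aligned}
p(\tau;f,\alpha;F,A) &= \frac{1}{2\pi C(F)A^2\tau}\sqrt{\frac{d/\sqrt2}{\sinh(d/\sqrt2)}}\left(\frac{C(f)}{C(F)}\right)^{\frac{1}{2(1-\rho^2)}} \\
&\quad\times\exp\left[-\frac{d^2}{4\nu^2\tau} + \frac{\rho}{2\nu(1-\rho^2)}\,M\!\left(f,\frac\alpha\nu;F,\frac A\nu\right)\right]\times\sum_{k=0}^{\infty}a_k(f,\alpha;F,A)\,(\nu^2\tau)^k ,
\end{aligned} \tag{3.110}$$

where the geodesic distance in the original variables is given by

$$d(f,\alpha;F,A) = \sqrt2\,\mathrm{acosh}\left(1 + \frac{\nu^2(\Delta\Sigma)^2 + (A-\alpha)^2 - 2\rho\nu\,(\Delta\Sigma)(A-\alpha)}{2\alpha A(1-\rho^2)}\right) , \tag{3.111}$$

and M is calculated in Secs. 3.3.3.1 to 3.3.3.3 for all relevant choices of C(f). Formally, we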

have written an infinite series in Eq. (3.110), but we will only obtain the very leading order a0 = 1

in this thesis.

3.3.5 Marginal transition density and local volatility

The asymptotic expansion of the marginal transition density in Eq. (3.32) can now be explicitly

computed. There are some surprising cancellations between A(f, α,K, α) and B′′ and we obtain

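$$\begin{aligned}
p_m(f,\alpha;K) &= \sqrt{\frac{\alpha}{2\pi\hat\alpha^3\tau\,C^2(K)}}\left(\frac{C(f)}{C(K)}\right)^{\frac{1}{2(1-\rho^2)}} \\
&\quad\times\exp\left(-\frac{d_{\min}^2}{4\nu^2\tau} + \frac{\rho}{2(1-\rho^2)}\,M\!\left(f,\frac\alpha\nu;K,\frac{\hat\alpha}{\nu}\right)\right)\times\sum_{k=0}^{\infty}b_k(f,\alpha;K)\,(\nu^2\tau)^k ,
\end{aligned} \tag{3.112}$$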

where again, we will only work with the leading term b0 = 1 in the following. The minimum dmin of

the geodesic distance occurs at

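$$\hat\alpha = \sqrt{\alpha^2 + \nu^2(\Delta\Sigma)^2 + 2\rho\nu\alpha\,\Delta\Sigma} , \tag{3.113}$$

and is given by

$$d_{\min} = \sqrt2\,\mathrm{acosh}\left(\frac{\hat\alpha - \rho\nu\Delta\Sigma - \rho^2\alpha}{\alpha(1-\rho^2)}\right) = \sqrt2\,\left|\ln\left(\frac{\hat\alpha - \nu\Delta\Sigma - \rho\alpha}{\alpha(1-\rho)}\right)\right| . \tag{3.114}$$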

Finally, for the Dupire local volatility given in Eq. (3.41) the regular expansion of σloc in powers of

τ given in Eq. (3.37) can be evaluated explicitly by carrying out all required derivatives. We obtain,

K2σ2loc(τ,K) = α2C2(K)

{

1 +α

2α

sinh(dmin/√

2)

dmin/√

2[ρα2C′(K) + 4ν(1 − ρ2)(2 − α)]τ

}

. (3.115)

The leading time-independent order can be expressed as an effective C-function σCeff(K) = Kσloc(K)

with

$$
\sigma^{C}_{\mathrm{eff}}(K) = \bar\alpha\,C(K) = \sqrt{\alpha^2 + \nu^2(\Delta\Sigma)^2 + 2\rho\nu\alpha\,\Delta\Sigma}\;C(K) . \tag{3.116}
$$

3.3.6 Implied volatility

We can now specialize the results of Sec. 3.1.5 for the Black implied volatility to the case of the

(extended) SABR model. The leading contribution with all factors of ν is given by

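$$
\sigma_0 = \frac{\nu\,\left|\ln\frac{K}{f}\right|}{\left|\ln\left(\frac{\bar\alpha - \nu\Delta\Sigma - \rho\alpha}{\alpha(1-\rho)}\right)\right|} , \tag{3.117}
$$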

where the position $\bar\alpha$ of the minimum is given in Eq. (3.113). Note that for ρ = 0 and taking the

limit ν → 0 the expression in Eq. (3.117) reduces to Eq. (3.70) for the local volatility model as

expected. To show this, we use $\operatorname{acosh}(1 + x) = \sqrt{2x}\,[1 + O(x)]$. For the first correction term, we

obtain

σ1

σ0=

2

d2min

ln

d2min

2 ln2 Kf

αα

ν2C

1−2ρ2

1−ρ2 (K)C1

1−ρ2 (f)

Kf

+ρ

ν(1 − ρ2)M(

f,α

ν;K,

α

ν

)

. (3.118)

We do not give a second order correction here, as for a general C(f) the algebra becomes very

involved and it is not clear whether the recursion relation for a1 in the heat-kernel expansion can

be evaluated analytically. For the standard SABR model, the calculation has been carried out by

Paulot [46], but some parts of the integrals had to be done by numerical quadrature.

Chapter 4

Small correlation ρ

In the early days of option valuation with stochastic volatility, it has been noted by Hull and White

[34] that the limit of vanishing correlation ρ = 0 provides some technical simplification and new

insights. Here, ρ denotes the correlation between the driving Brownian motions $W^{(1)}_t$ and $W^{(2)}_t$ of

the asset and the volatility process. In the limit ρ = 0, the valuation of a European option can be

performed in two steps. First, the option value is calculated for a deterministic but time-dependent

volatility α(t). It turns out that for a large class of processes the option value only depends on the

mean variance and not on the details of how volatility is delivered over time. Second, an average

over volatility paths is performed which reduces to an average over realized mean variance.

Hull and White [34] perform this second step analytically by expanding the option value in

the time-inhomogeneous model around the mean realized variance. The first two terms of the

expansion already provide a good approximation for small times to maturity. We will consider this

approximation here for the extended SABR model with an arbitrary local vol factor C(f). Valuation

in the time-inhomogeneous model and the calculation of time derivatives of the option value has to

be performed numerically by finite-difference or tree methods.

For larger times to maturity the Hull and White expansion [34] quickly fails as the volatility

distribution becomes less localized. The required average can still be carried out numerically, per-

forming the integral over the density of the distribution by numerical quadrature. As the density

is not easily available in closed analytic form, we will follow Rebonato and Barjaktarevic [48] and

replace the averaging by a finite sum of terms placed at evenly spaced quantiles of the volatility

distribution. These quantiles can be precalculated by a Monte Carlo simulation.

In the particular case of an underlying that follows a normal or log-normal dynamics, i.e. β = 0

or β = 1 in the standard SABR model, conditioning on volatility paths can still be fruitful even for

ρ ≠ 0. The option value with deterministic time-dependent volatility still depends on the realized

r.m.s. volatility but now the spot price has to be corrected by a term that depends on a time

integral of the volatility. Therefore, the joint distribution of this correction together with the r.m.s.

volatility has to be taken into account. A review of these so-called mixing solutions is given in the

book by Lewis [40]. Unfortunately, for general C(f) the correction term for the spot price cannot be

disentangled from the dynamics of the underlying. Recently, Antonelli and Scarlatti [3] have used

ideas similar to mixing solutions to perform a power-series expansion in ρ of the option value in a

stochastic volatility model.

This chapter exposes the ideas involved in the ρ = 0 limit in the following logical order. In

Sec. 4.1, we look at a time-inhomogeneous local volatility model and show that, as long as it does

not contain a drift term, the time dependence of α(t) can be absorbed in a rescaling of the time

variable. The option value then only depends on the total variance $v = \bar\alpha^2\tau$ where τ = T − t is the

time to maturity and $\bar\alpha$ is the root-mean-square realized volatility. In Sec. 4.2, we use the Tower

Law to express the option value of the full model as an average of option values in the associated

time-inhomogeneous local-volatility model over realized volatility paths α(t). As the option value

in the local-volatility model only depends on the r.m.s. volatility $\bar\alpha$ or alternatively on total variance

v, the average over volatility paths reduces to a simple one-dimensional integral, provided the

probability density of the mean variance is known. Sec. 4.3 shows how the mean variance is related

to the exponential integral of Brownian motion and reviews some relevant results in the literature

on its distribution. The Hull and White expansion around the expectation of the mean variance is

derived in Sec. 4.4. We also discuss how a single run of an appropriate binomial tree or a finite

difference scheme can be used to obtain the option value in the local volatility model and the required

derivatives with respect to total variance v, where it is important to note that the variance v acts

as a dimensionless time. Sec. 4.5 presents a numerical averaging scheme over the distribution of

r.m.s. volatilities. To this end, the distribution of r.m.s. volatilities $\bar\alpha$ can be represented numerically

by quantiles pre-computed using Monte Carlo. Monte Carlo simulation for the quantiles is rather

time consuming. However, by dimensional analysis only quantiles for the normalized process with

$\alpha_0 = 1$ and ν = 1 have to be calculated as a function of dimensionless time $\nu^2\tau$. Furthermore,

these quantiles only depend on the volatility process and can be used for any C(f) in subsequent

averaging. We show again that a single run of a binomial tree or a finite difference scheme is sufficient

to obtain the time-dependence of the option value in the local volatility model. Some care has to be

taken to obtain accurate results for very small and very large times. Finally in Sec. 4.6, we discuss

a combination of the mixing scheme with the heat-kernel expansion to obtain results for ρ ≠ 0 using

a simple contravariate correction proposed by Barjaktarevich and Rebonato [48].

4.1 Time-inhomogeneous local volatility model

Let us consider a time-inhomogeneous extension of the local volatility model in Eq. (2.8). The

relevant interest rate then follows the SDE,

dFt = α(t)C(Ft)dWt , (4.1)

where α(t) is a deterministic function of time. The Kolmogorov backward equation for the value of

a European option in this model reads as

$$
\partial_t V_{\mathrm{LV}}(t, T, f) = -\frac{1}{2}\,\alpha^2(t)\,C^2(f)\,\partial_f^2 V_{\mathrm{LV}}(t, T, f) . \tag{4.2}
$$

The subscript 'LV' stands for local volatility as opposed to 'SV' for the option value in the full

stochastic volatility SABR model in the next section. The explicit dependence of the pricing equation

on the function α(t) can be eliminated by performing a change of variables from time t to the total

variance in the time interval (t, T ),

$$
v(t, T) = \int_t^T \alpha^2(t')\,dt' \equiv \bar\alpha^2(t, T)\,(T - t) . \tag{4.3}
$$

Here $\bar\alpha(t, T)$ is the root-mean-square (r.m.s.) volatility between the time t and maturity T. Note that v(t, T) acts as a dimensionless time. We denote by $V_{\mathrm{LV}}(v, f)$ the option value as a function of the total variance, i.e. $V_{\mathrm{LV}}(t, T, f) = V_{\mathrm{LV}}(t(v), T, f) \equiv V_{\mathrm{LV}}(v, f) = V_{\mathrm{LV}}(v(t), f)$. In terms of the total variance, the Kolmogorov backward equation now reads as

$$
\partial_v V_{\mathrm{LV}}(v, f) = \frac{1}{2}\,C^2(f)\,\partial_f^2 V_{\mathrm{LV}}(v, f) , \tag{4.4}
$$

with the initial condition $V_{\mathrm{LV}}(0, f) = V_{\mathrm{LV}}(T, T, f) = g(f)$, where g(f) is the payoff of the option, e.g. $g(f) = (f - K)^+$ for a European call. The time-inhomogeneity has disappeared from Eq. (4.4). Thus, the option value can only depend on the total variance v(t, T) to maturity,

$$
V_{\mathrm{LV}}(t, T, f) = V_{\mathrm{LV}}(v(t, T), f) = V_{\mathrm{LV}}(\bar\alpha^2(t, T)(T - t), f) . \tag{4.5}
$$

Similar arguments could be used to show that the terminal distribution of FT in the time-inhomo-

geneous local vol model only depends on the total variance v(t, T ) and not on how the volatility α(t)

is delivered along the path.

Note that the argument in this section requires the absence of a drift term in Eq. (4.1). Otherwise,

a time change would only shift the explicit time dependence from the diffusion to the drift term.
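The step from Eq. (4.2) to Eq. (4.4) can be spelled out with the chain rule. The following short derivation is a sketch that assumes α(t) > 0 on the interval considered, so that v(t, T) is strictly decreasing in t and the map between t and v is invertible:

$$
\begin{aligned}
\partial_t v(t, T) &= -\alpha^2(t) , \\
\partial_t V_{\mathrm{LV}}(t, T, f) &= \partial_t v\;\partial_v V_{\mathrm{LV}}(v, f) = -\alpha^2(t)\,\partial_v V_{\mathrm{LV}}(v, f) , \\
-\alpha^2(t)\,\partial_v V_{\mathrm{LV}}(v, f) &= -\tfrac{1}{2}\,\alpha^2(t)\,C^2(f)\,\partial_f^2 V_{\mathrm{LV}}(v, f) .
\end{aligned}
$$

Dividing the last line by $-\alpha^2(t)$ removes the explicit time dependence and leaves the time-homogeneous equation in the variable v.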

4.2 Conditioning on volatility path

Using the Tower Law to condition on the realized volatility path, the value of a European option in

the (extended) SABR stochastic volatility model for ρ = 0 can be expressed as

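$$
\begin{aligned}
V_{\mathrm{SV}}(t, f, \alpha) &= E[g(F_T)\,|\,F_t = f, \alpha_t = \alpha] \\
&= E\big[E[g(F_T)\,|\,F_t = f, \alpha_{t'} = \alpha(t'), t \le t' \le T]\,\big|\,\alpha_t = \alpha\big] \\
&= E[V_{\mathrm{LV}}(v(t, T), f)\,|\,\alpha_t = \alpha] \\
&= \int V_{\mathrm{LV}}(v, f)\,E[\delta(v - v(t, T))\,|\,\alpha_t = \alpha]\,dv \\
&= \int V_{\mathrm{LV}}(v, f)\,\pi(v; \tau, \alpha, \nu)\,dv .
\end{aligned} \tag{4.6}
$$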

The second line contains the conditioning on the volatility path. The third line uses the fact that,

as shown in the previous subsection, the option value in the local volatility model does not depend

on the details of the volatility α(t) along the path, but only on the total variance v(t, T ). Thus, the

average over realized volatility paths reduces to an average over total variance which is written as

an integral over its probability density π(v; τ, α, ν) in the last line of Eq. (4.6).

Note that VLV (v, f) is simply the option value for the time-homogeneous local volatility model

in Eq.(2.8) with σ = 1. In some cases as for the CEV model, it is known analytically. In all other

cases, we can use a finite-difference scheme as described in Sec. 6.1.1 to solve for the option value.

The averaging in Eq. (4.6) involves option values for the same strike and initial value of the rate f

but for different dimensionless times v. This can be obtained from intermediate results of a single

run of the finite-difference scheme. The distribution of total variance has quite fat tails, so the

averaging requires option values up to quite large dimensionless times. In order not to lose

accuracy for small times where the main probability weight of π is concentrated, a non-uniform grid

in the rate direction f is required in the finite-difference scheme. We have also experimented with

binomial trees and obtained similar results. As no real advantage of the trees over finite differences

has been noticed, we do not present results here. For computational speed, the whole shape of the

smile can be calculated at once using the Dupire forward Eq. (1.9) instead of the backward pricing

equation. This is compatible with the averaging in Eq. (4.6).

4.3 Distribution of total variance

According to Eq. (4.6), we need to evaluate the probability density π(v; τ, α, ν) of the total variance

defined by

$$
v = \int_0^\tau \alpha_t^2\,dt , \tag{4.7}
$$

where αt follows the stochastic process

dαt = ναtdWt , (4.8)

with the initial condition α0 = α. By dimensional analysis, it is easy to see that the probability

density only depends on the following dimensionless combination of parameters,

$$
\pi(v, \tau, \alpha, \nu) = \pi\!\left(\frac{v}{\alpha^2\tau}, \nu^2\tau, 1, 1\right) \equiv \pi\!\left(\frac{v}{\alpha^2\tau}, \nu^2\tau\right) . \tag{4.9}
$$

Thus, it is sufficient to know the distribution for the normalized process with α = ν = 1 as a function of dimensionless time $s = \nu^2\tau$. All other distributions can be obtained by rescaling. Changing the variable of integration in Eq. (4.6) to $u = v/(\alpha^2\tau)$, we obtain,

$$
V_{\mathrm{SV}}(t, f, \alpha) = \int V_{\mathrm{LV}}(u\alpha^2\tau, f)\,\pi(u, \nu^2\tau)\,du . \tag{4.10}
$$

The probability density π(u, s) is not readily available in analytic form. Note however that for

the normalized process (α = ν = 1) the SDE

dαs = αsdWs , (4.11)

has the well known solution,

$$
\alpha_s = e^{-\frac{s}{2} + W_s} = e^{W_s^{(-1/2)}} , \tag{4.12}
$$

where $W_t^{(\mu)} = W_t + \mu t$ is a Brownian motion with drift μ. Thus, the total variance in the interval (0, s) is given by

$$
\int_0^s \alpha_{s'}^2\,ds' = \int_0^s e^{2W_{s'}^{(-1/2)}}\,ds' \equiv A_s^{(-1/2)} . \tag{4.13}
$$

Here, $A_s^{(\mu)}$ is the exponential integral of Brownian motion with drift, which has been studied in detail in the literature. A comprehensive review of available results is given by Matsumoto and Yor [43]. Here, we simply quote some relevant results; for proofs see [43] and references therein. The joint distribution of the Brownian motion and its exponential integral has the following integral representation,

$$
P\big(A_s^{(\mu)} \in da,\, W_s^{(\mu)} \in dx\big) = e^{\mu x - \frac{\mu^2 s}{2}} \exp\left(-\frac{1 + e^{2x}}{2a}\right) \theta\!\left(\frac{e^x}{a}, s\right) \frac{da\,dx}{a} , \tag{4.14}
$$

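where the function θ(r, s) is given by

$$
\theta(r, s) = \frac{r\,e^{\frac{\pi^2}{2s}}}{\sqrt{2\pi^3 s}} \int_0^\infty e^{-\frac{\xi^2}{2s}}\, e^{-r\cosh(\xi)} \sinh(\xi) \sin\left(\frac{\pi\xi}{s}\right) d\xi . \tag{4.15}
$$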

Thus, the probability density of $A_s^{(\mu)}$ independent of $W_s^{(\mu)}$ has a representation in terms of a double integral. In principle, when the option value or the probability density of the underlying local volatility model are known, integral representations of the value of European options in the full stochastic-volatility model can thus be derived. However, these multi-dimensional integrals do not appear to be easily accessible numerically due to the oscillatory nature of the integrand. For s → ∞ a simple asymptotic expression for the law of $A_\infty^{(-\mu)}$ can be derived. In this case $A_\infty^{(-\mu)}$ with μ > 0 is distributed as $(2\gamma_\mu)^{-1}$ where $\gamma_\mu$ is a gamma random variable with parameter μ, i.e.

$$
P(\gamma_\mu \in dx) = \frac{1}{\Gamma(\mu)}\,x^{\mu - 1} e^{-x}\,dx . \tag{4.16}
$$

For the relevant case $\mu = \frac{1}{2}$, it follows that

$$
P\big(A_\infty^{(-1/2)} \in da\big) = \frac{e^{-\frac{1}{2a}}}{\sqrt{2\pi a^3}}\,da . \tag{4.17}
$$

The cumulative distribution function in this limit is thus given by

$$
P\big(A_\infty^{(-1/2)} < a\big) = 2\,[1 - N(a^{-1/2})] . \tag{4.18}
$$
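The limiting law can be sanity-checked by sampling. The sketch below is an illustration under stated assumptions, not part of the numerical scheme of this thesis: it draws gamma variables with shape parameter 1/2 using NumPy, forms $(2\gamma_{1/2})^{-1}$ and compares the empirical distribution function with Eq. (4.18). Function names, sample size and seed are arbitrary choices.

```python
from math import erf, sqrt

import numpy as np


def norm_cdf(x):
    """Standard normal cumulative distribution function N(x)."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def limiting_cdf(a):
    """Right-hand side of Eq. (4.18)."""
    return 2.0 * (1.0 - norm_cdf(a ** -0.5))


def empirical_cdf(a, n_samples=200_000, seed=0):
    """Empirical P(A_inf < a) using A_inf = 1 / (2 * gamma) with shape 1/2."""
    rng = np.random.default_rng(seed)
    gamma = rng.gamma(shape=0.5, scale=1.0, size=n_samples)
    a_inf = 1.0 / (2.0 * gamma)
    return float(np.mean(a_inf < a))
```

With a few hundred thousand samples the two functions should agree to within the Monte Carlo error, which is of the order of $10^{-3}$ here.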

For the distribution of normalized total variance u, this implies an asymptotic expression for $s = \nu^2\tau \gg 1$. Substituting a = us in Eq. (4.17) yields

$$
\pi(u, s) \sim \frac{e^{-\frac{1}{2us}}}{\sqrt{2\pi u^3 s}} . \tag{4.19}
$$

Combining this result with analytical expressions for the option price or the transition density in

the CEV model, the asymptotics of the SABR model for τ → ∞ and ρ = 0 can be derived [21].

In the following section, we will need the moments of the distribution of normalized total variance

defined as

$$
M^{(n)}(s) = \int_0^\infty u^n\,\pi(u, s)\,du , \tag{4.20}
$$

respectively the centralized moments,

$$
M_c^{(n)}(s) = \int_0^\infty (u - \bar u(s))^n\,\pi(u, s)\,du , \tag{4.21}
$$

where $\bar u(s) = M^{(1)}(s)$ is the average normalized total variance. Results for the moments are known in the literature [34, 43]. Appendix C gives an elementary derivation of a recursion relation that can be used to compute all moments. The result for the first few centralized moments reads as follows,

$$
\begin{aligned}
\bar u(s) &= \frac{1}{s}\left(e^s - 1\right) , \\
M_c^{(2)}(s) &= \frac{1}{15 s^2}\left(e^{6s} - 15 e^{2s} + 24 e^s - 10\right) , \\
M_c^{(3)}(s) &= \frac{1}{s^3}\left(\frac{1}{315} e^{15s} - \frac{1}{5} e^{7s} + \frac{8}{45} e^{6s} + 2 e^{3s} - \frac{24}{5} e^{2s} + \frac{136}{35} e^s - \frac{16}{15}\right) .
\end{aligned} \tag{4.22}
$$
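For later use in the expansion, Eq. (4.22) translates directly into code. The following Python functions are a minimal sketch with hypothetical names; they evaluate the closed-form expressions as written and make no attempt to cure the cancellation between the exponentials at very small s.

```python
from math import exp, expm1


def mean_total_variance(s):
    """Average normalized total variance u_bar(s) = (e^s - 1) / s."""
    return expm1(s) / s


def m2c(s):
    """Second centralized moment of Eq. (4.22)."""
    return (exp(6 * s) - 15 * exp(2 * s) + 24 * exp(s) - 10) / (15 * s**2)


def m3c(s):
    """Third centralized moment of Eq. (4.22)."""
    bracket = (exp(15 * s) / 315 - exp(7 * s) / 5 + 8 * exp(6 * s) / 45
               + 2 * exp(3 * s) - 24 * exp(2 * s) / 5 + 136 * exp(s) / 35
               - 16 / 15)
    return bracket / s**3
```

Expanding the exponentials shows that the bracket of the second moment starts at order $s^3$, so that $M_c^{(2)}(s) \approx 4s/3$ for small s. For very small s a series expansion is therefore preferable to the closed form in floating-point arithmetic.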

4.4 Hull and White Expansion

For small times to maturity τ , the distribution of total variance discussed in the previous section

will be fairly localized. Hull and White [34] thus expand the value of the integrand in Eq. (4.10)

around the mean of the distribution to obtain an expansion in its moments,

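$$
\begin{aligned}
V_{\mathrm{SV}}(\tau, f, \alpha) &= \int V_{\mathrm{LV}}(u\alpha^2\tau, f)\,\pi(u, \nu^2\tau)\,du \\
&= \int \sum_{n=0}^{\infty} \frac{1}{n!} \left.\frac{\partial^n V_{\mathrm{LV}}}{\partial\theta^n}\right|_{\theta = \bar u\alpha^2\tau} (\theta - \bar\theta)^n\,\pi(u, \nu^2\tau)\,du \\
&= \sum_{n=0}^{\infty} \frac{(\alpha^2\tau)^n}{n!}\,\frac{\partial^n V_{\mathrm{LV}}}{\partial\theta^n}(\bar u\alpha^2\tau, f)\,M_c^{(n)}(\nu^2\tau) \\
&= V_{\mathrm{LV}}(\bar u\alpha^2\tau, f) + \frac{(\alpha^2\tau)^2}{2}\,\frac{\partial^2 V_{\mathrm{LV}}}{\partial\theta^2}(\bar u\alpha^2\tau, f)\,M_c^{(2)}(\nu^2\tau) \\
&\quad + \frac{(\alpha^2\tau)^3}{6}\,\frac{\partial^3 V_{\mathrm{LV}}}{\partial\theta^3}(\bar u\alpha^2\tau, f)\,M_c^{(3)}(\nu^2\tau) + O(\tau^4) .
\end{aligned} \tag{4.23}
$$

Here, $\theta = u\alpha^2\tau$ and $\bar\theta = \bar u\alpha^2\tau$, and $\bar u(s)$ as well as the centralized moments $M_c^{(n)}(s)$ are defined in the previous section.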

When analytical formulas for the option value and its θ-derivatives for the local volatility model

are available, these can be used in Eq. (4.23) to obtain an analytical approximation for the full

SABR model for small times to maturity. For a general local vol factor C(f) however, the option

value and its derivatives can only be obtained numerically. We use a finite difference scheme as

described in Sec. 6.1 to calculate the time dependence of the option value in the local volatility

model. Subsequently, the required time derivatives are estimated as discrete derivatives from the

last points in the time grid.
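To illustrate the structure of the expansion, the following Python sketch evaluates Eq. (4.23) up to the second-moment term. It is not the scheme used in this thesis: as a simplifying assumption it takes C(f) = 1, for which the local-volatility value is the Bachelier formula, and it estimates the θ-derivative by a central difference with an arbitrarily chosen relative step. All function names are hypothetical.

```python
from math import erf, exp, expm1, pi, sqrt


def bachelier_call(theta, f, strike):
    """Call value for dF = dW run up to total variance theta (assumes C(f) = 1)."""
    sd = sqrt(theta)
    d = (f - strike) / sd
    cdf = 0.5 * (1.0 + erf(d / sqrt(2.0)))
    pdf = exp(-0.5 * d * d) / sqrt(2.0 * pi)
    return (f - strike) * cdf + sd * pdf


def m2c(s):
    """Second centralized moment of the normalized total variance, Eq. (4.22)."""
    return (exp(6 * s) - 15 * exp(2 * s) + 24 * exp(s) - 10) / (15 * s**2)


def hull_white_call(f, strike, alpha, nu, tau, rel_step=1e-3):
    """Second-order Hull-White expansion of Eq. (4.23) for rho = 0."""
    s = nu**2 * tau
    theta_bar = expm1(s) / s * alpha**2 * tau      # u_bar(s) * alpha^2 * tau
    h = rel_step * theta_bar
    v0 = bachelier_call(theta_bar, f, strike)
    # Central second difference in the total-variance direction.
    d2v = (bachelier_call(theta_bar + h, f, strike) - 2.0 * v0
           + bachelier_call(theta_bar - h, f, strike)) / h**2
    return v0 + 0.5 * (alpha**2 * tau) ** 2 * d2v * m2c(s)
```

At the money the option value is concave in total variance, so the second-moment term lowers the value obtained at the mean total variance; for example `hull_white_call(1.0, 1.0, 0.2, 0.3, 1.0)` lies slightly above the Bachelier value at variance α²τ and below the value at $\bar\theta$.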

4.5 Averaging with precalculated quantiles

We will see that the Hull-White expansion only yields reasonably accurate results for the smile for

very small times to maturity and close to at-the-money. A far better approximation can be obtained

by performing the average over total variance numerically. To this end, we follow Barjaktarevich

and Rebonato [48] and replace the continuous distribution π by a finite number of delta functions,

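$$
\pi_N(u, s) = \frac{1}{N} \sum_{i=1}^{N} \delta(u - u_i(s)) , \tag{4.24}
$$

where $u_i(s)$ are evenly spaced quantiles of the distribution, i.e.

$$
\Pi(u_i(s), s) = \int_0^{u_i(s)} \pi(u, s)\,du = \frac{i - \frac{1}{2}}{N} , \tag{4.25}
$$

for i = 1, . . . , N. The factor 1/N gives each quantile equal weight, so that $\pi_N$ integrates to one. The integral in Eq. (4.10) then reduces to a finite sum

$$
V_{\mathrm{SV}}(\tau, f, \alpha) \approx \frac{1}{N} \sum_{i=1}^{N} V_{\mathrm{LV}}(u_i(\nu^2\tau)\,\alpha^2\tau, f) . \tag{4.26}
$$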

The quantiles $u_i(s)$ can be precalculated using a Monte Carlo simulation. To this end, a vector representing the squares $\alpha_s^2$ is initialized with ones and a vector $I_s$ representing the integrals over $\alpha_s^2$ is initialized with zeros. For each time step, a vector ω of independent standard normal variables is drawn and the following updates are performed,

$$
\begin{aligned}
\alpha^2_{s+\Delta} &= \alpha_s^2\,e^{-\Delta + 2\omega\sqrt{\Delta}} , \\
I_{s+\Delta} &= I_s + \frac{\Delta}{2}\left(\alpha_s^2 + \alpha^2_{s+\Delta}\right) ,
\end{aligned} \tag{4.27}
$$

corresponding to a trapezoidal integration rule. At each time step, the vector $I_s$ is sorted and

normalized by the current time to determine the quantiles $u_i(s)$. Fig. 4.1 shows the result for N = 20

quantiles. The functions ui(s) as a function of the dimensionless time s are sufficiently smooth to

allow for a simple interpolation of quantiles in between precalculated times. The Monte Carlo

simulation is rather time consuming to achieve the necessary accuracy. However, this computation

has to be performed only once and can then be used for our numerical scheme for arbitrary local

vol factors C(f).
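The precalculation of the quantiles can be sketched in a few lines of Python. The code below follows the update rule of Eq. (4.27) under some assumptions of its own: it uses a uniform time step, NumPy's default random generator, and `numpy.quantile` (which interpolates between order statistics) in place of an explicit sort. The function name and all default sizes are illustrative.

```python
import numpy as np


def simulate_quantiles(s_max, n_steps, n_paths, n_quantiles, seed=0):
    """Quantiles u_i(s) of the normalized total variance on a uniform grid in s."""
    rng = np.random.default_rng(seed)
    delta = s_max / n_steps
    times = delta * np.arange(1, n_steps + 1)
    levels = (np.arange(1, n_quantiles + 1) - 0.5) / n_quantiles  # Eq. (4.25)
    alpha2 = np.ones(n_paths)       # squares alpha_s^2, initialized with ones
    integral = np.zeros(n_paths)    # integrals I_s, initialized with zeros
    quantiles = np.empty((n_steps, n_quantiles))
    for step in range(n_steps):
        omega = rng.standard_normal(n_paths)
        alpha2_next = alpha2 * np.exp(-delta + 2.0 * omega * np.sqrt(delta))
        integral += 0.5 * delta * (alpha2 + alpha2_next)  # trapezoidal rule
        alpha2 = alpha2_next
        quantiles[step] = np.quantile(integral / times[step], levels)
    return times, quantiles
```

A call such as `simulate_quantiles(1.0, 50, 20000, 20)` returns the time grid and an array with one row of N = 20 quantiles per time; many more paths and finer steps would be needed for production accuracy.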

[Figure 4.1: plot of the quantiles $u_{\mathrm{rms},i}(s)$ against the dimensionless time $s = \nu^2\tau$ on logarithmic axes, with an inset for small s.]

Figure 4.1: Quantiles $u_{\mathrm{rms},i}(s) = \sqrt{u_i(s)}$ of the distribution of r.m.s. volatility $\bar\alpha$. Solid lines show the result of a Monte Carlo simulation. Dotted lines in the main figure are asymptotic quantiles obtained from Eq. (4.18). The inset shows a magnified view for small times s most relevant for the averaging procedure for realistic parameters.

The time dependence of the option value in the local-volatility model required in Eq. (4.26) will

be calculated numerically by a finite difference scheme as explained in Sec. 4.2. As averaging points

in Eq. (4.26) tend to be concentrated at very small times while some large times are also present, it

is essential to use an adapted time grid with reduced step size at small times as well as a non-uniform

grid in the rate direction with higher concentration of points around the strike. We solve the forward

Dupire equation to obtain the whole form of the smile at once. In some more detail the numerical

procedure is as follows:

1. For a given time to maturity τ and parameter ν interpolate the precalculated quantiles to

obtain all $u_i(\nu^2\tau)$.

2. Set up the non-uniform grids in space and time. For convenience, we use a grid in time that

contains all $\theta_i = u_i(\nu^2\tau)\,\alpha^2\tau$ given by the quantiles.

3. Solve the Dupire equation for the local volatility model numerically on the chosen grid to

obtain $V_{\mathrm{LV}}(\theta, f)$.

4. While performing the time-stepping, add the current vector of option prices to a result vector

whenever a grid-point θi is reached.

5. Normalize the result vector of option prices by the number of quantiles yielding a vector of

option prices VSV for the full SABR model.

6. Interpolate the result vector to all strikes needed and numerically invert the Black formula to

obtain the implied volatility.
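The averaging step of the procedure above can be sketched as follows. This is a simplified illustration rather than the implementation used here: instead of a finite-difference solution of the Dupire equation it assumes C(f) = 1, so that the local-volatility value is available in closed form (Bachelier), and it takes the quantiles $u_i$ as a given array. The function names are hypothetical.

```python
from math import erf, exp, pi, sqrt

import numpy as np


def bachelier_call(theta, f, strike):
    """Local-volatility call value for C(f) = 1 at total variance theta > 0."""
    sd = sqrt(theta)
    d = (f - strike) / sd
    cdf = 0.5 * (1.0 + erf(d / sqrt(2.0)))
    return (f - strike) * cdf + sd * exp(-0.5 * d * d) / sqrt(2.0 * pi)


def mixing_price(u_quantiles, alpha, tau, f, strike, v_lv=bachelier_call):
    """Equal-weight average of local-volatility values over the quantiles."""
    thetas = np.asarray(u_quantiles, dtype=float) * alpha**2 * tau
    return float(np.mean([v_lv(theta, f, strike) for theta in thetas]))
```

If all quantiles equal one, the average collapses to the local-volatility value at variance α²τ. Spreading the quantiles around one lowers the at-the-money value, because the option value is concave in total variance there.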

4.6 Extension to finite ρ

The averaging approach described so far in this chapter is only applicable for vanishing correlation

ρ = 0. In order to extend the approach to small finite ρ, Barjaktarevich and Rebonato [48] have

proposed to use a simple contravariate type of correction. The main idea is that the Hagan approx-

imation is approximate in time to maturity, but presumably captures the ρ dependence correctly

for small τ . We can use a similar approximation here for general C(f) by combining the mixing

approach with the heat-kernel method, i.e. we approximate the implied volatility in the full SABR

model as follows

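$$
\sigma_{\mathrm{impl.}}(\tau, f, \alpha, K, \nu, \rho) \approx \sigma_{\mathrm{impl.}}(\tau, f, \alpha, K, \nu, 0) + \sigma^{\mathrm{hk}}_{\mathrm{impl.}}(\tau, f, \alpha, K, \nu, \rho) - \sigma^{\mathrm{hk}}_{\mathrm{impl.}}(\tau, f, \alpha, K, \nu, 0) . \tag{4.28}
$$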

Here, the first term on the right hand side is calculated by the mixing approach described in this

chapter. The last two terms are obtained from the asymptotic heat-kernel expansion of the implied

volatility as described in Chap. 3.

Chapter 5

Effective local volatility

Andreasen and Huge [2] have very recently extended the SABR model to what they call a ’ZABR’

model. In a more local-volatility style of approach they include a general local-volatility function

for the rate process, similar to our approach here, and additionally consider a CEV extension for

the volatility process. In contrast to our philosophy here, they take a non-parametric local-volatility

function the form of which is then calibrated to market data. They claim to obtain very fast and

flexible solutions for the implied volatility smile of their rather complex stochastic volatility model

by the following approach. First an effective local volatility is derived by a special and supposedly

simplified approach as compared to the heat-kernel methodology. They obtain the very leading

time-independent approximation of the effective local volatility. This result is then inserted into

the Dupire equation and solved numerically to obtain option values and thus the implied volatility

smile. They use a one-step fully implicit scheme for the solution of the Dupire equation to speed

up calculations. We will not follow this extra optimization here.

As we have already derived an asymptotic result for the effective local volatility for the extended

SABR model in Eq. (3.115) and set up finite-difference solvers, we can check the accuracy of the

approach without much extra implementation effort. We thus take the leading order volatility as an

effective C-function as defined in Eq. (3.116), i.e.

$$
C_{\mathrm{eff}}(\alpha, f, K) = \bar\alpha\,C(K) = \sqrt{\alpha^2 + \nu^2(\Delta\Sigma)^2 + 2\rho\nu\alpha\,\Delta\Sigma}\;C(K) , \tag{5.1}
$$

and solve the forward Dupire Eq. (2.15) by a one-dimensional finite-difference scheme,

$$
\partial_\tau V(\tau, f, \alpha, K) = \frac{1}{2}\,C_{\mathrm{eff}}^2(\alpha, f, K)\,\partial_K^2 V(\tau, f, \alpha, K) , \tag{5.2}
$$

with the initial condition V (0, f, α,K) = (f − K)+. Using a grid for K, one obtains the whole

functional form of the smile in one finite-difference run. Note that it would be interesting to include

the first time-dependent term in Eq. (3.115) for the effective local volatility to see whether this

improves the accuracy of the approach.

Chapter 6

Numerical methods

In this chapter, we discuss numerical methods for the solution of option-pricing problems. Ultimately,

we are interested in solutions for the original and extended SABR models, i.e. for two-factor models,

as a gauge for the accuracy of analytical approximations. As an important preliminary and also

due to its relevance to the ρ = 0 averaging scheme discussed in Chap. 4, we also analyze numerical

schemes for the one-factor local volatility model with general C(f).

6.1 Finite difference methods

6.1.1 Methods for one-factor models

A finite difference method provides an approximate solution of a PDE by discretizing the derivatives

on a fixed space and time grid. Differential operators are thus replaced by finite and usually sparse

matrices that can be handled on a computer.

Let us start out by considering a simple function f(x) of one real variable x. A discrete approxi-

mation to this function is given by a vector with components fi that represent the function at points

xi of a discrete grid. The simplest such grid would be one with uniform spacing δx = (xmax−xmin)/M

between neighboring points, i.e. xi = xmin + iδx, i = 0, . . . ,M , but non-uniform grids will also be

considered below. Grid points will always be ordered with increasing index i, i.e.

xmin = x0 < x1 < · · · < xM = xmax . (6.1)

Discrete approximations to derivatives of f(x) at the grid point xi can be obtained by Taylor

expanding the function values at neighboring grid points around xi,

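$$
\begin{aligned}
f_{i+1} &= f(x_i + \delta x_i) = f(x_i) + f'(x_i)\,\delta x_i + \frac{1}{2} f''(x_i)\,\delta x_i^2 + O(\delta x_i^3) , \\
f_i &= f(x_i) , \\
f_{i-1} &= f(x_i - \delta x_{i-1}) = f(x_i) - f'(x_i)\,\delta x_{i-1} + \frac{1}{2} f''(x_i)\,\delta x_{i-1}^2 + O(\delta x_{i-1}^3) .
\end{aligned} \tag{6.2}
$$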

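$$
\begin{array}{l|ccc}
 & a_i & b_i & c_i \\ \hline
f(x_i) & 0 & 1 & 0 \\
f'(x_i) & \dfrac{x_i - x_{i-1}}{(x_{i+1} - x_i)(x_{i+1} - x_{i-1})} & \dfrac{x_{i+1} - 2x_i + x_{i-1}}{(x_{i+1} - x_i)(x_i - x_{i-1})} & -\dfrac{x_{i+1} - x_i}{(x_i - x_{i-1})(x_{i+1} - x_{i-1})} \\
f''(x_i) & \dfrac{2}{(x_{i+1} - x_i)(x_{i+1} - x_{i-1})} & -\dfrac{2}{(x_{i+1} - x_i)(x_i - x_{i-1})} & \dfrac{2}{(x_i - x_{i-1})(x_{i+1} - x_{i-1})}
\end{array}
$$

$$
\begin{array}{l|ccc}
 & a_i & b_i & c_i \\ \hline
f(x_i) & 0 & 1 & 0 \\
f'(x_i) & \dfrac{1}{2\delta x} & 0 & -\dfrac{1}{2\delta x} \\
f''(x_i) & \dfrac{1}{\delta x^2} & -\dfrac{2}{\delta x^2} & \dfrac{1}{\delta x^2}
\end{array}
$$

Table 6.1: Coefficients of discrete approximations to the first two derivatives using a centered three-point stencil in a non-uniform grid (upper table) and their well-known limit for a uniform grid (lower table).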

Here, $\delta x_i = x_{i+1} - x_i$ is the distance to the next grid point. Forming a linear combination of the function values at $x_i$ and the neighboring two grid points,

$$
a_i f_{i+1} + b_i f_i + c_i f_{i-1} = [a_i + b_i + c_i]\,f(x_i) + [a_i\,\delta x_i - c_i\,\delta x_{i-1}]\,f'(x_i) + \frac{1}{2}\,[a_i\,\delta x_i^2 + c_i\,\delta x_{i-1}^2]\,f''(x_i) + O(\delta x_i^3, \delta x_{i-1}^3) , \tag{6.3}
$$

we see that approximations to the derivatives can be constructed by demanding that the coefficients

on the right hand side of Eq. (6.3) be one for the required derivative and zero for the other two

terms. This leads to a 3 by 3 linear system for each derivative. The solutions are shown in Tab. 6.1

together with the version for a uniform grid.
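The solutions of these linear systems are easy to verify numerically. The following Python sketch, with a hypothetical function name, returns the coefficients $(a_i, b_i, c_i)$ for the first and second derivative on three arbitrary ordered points; by construction the resulting approximations are exact for polynomials up to second degree.

```python
def stencil_coefficients(x_prev, x, x_next):
    """Three-point coefficients (a, b, c) multiplying (f_{i+1}, f_i, f_{i-1})."""
    dp = x_next - x   # distance to the next grid point
    dm = x - x_prev   # distance to the previous grid point
    first = (dm / (dp * (dp + dm)),
             (dp - dm) / (dp * dm),
             -dp / (dm * (dp + dm)))
    second = (2.0 / (dp * (dp + dm)),
              -2.0 / (dp * dm),
              2.0 / (dm * (dp + dm)))
    return first, second
```

For equal spacing the tuples reduce to the familiar central-difference weights of the uniform grid.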

In this subsection, we are interested in the pricing problem for a local volatility model, i.e. we

need to solve a discrete version of the PDE

$$
\partial_\tau V = \frac{1}{2}\,C^2(f)\,\partial_f^2 V \equiv L V , \tag{6.4}
$$

where V (τ, f) represents the option value for a current rate f and time to maturity τ . Note that we

have set σ = 1 here, such that V (τ, f) corresponds directly to VLV (τ, f) as needed for the mixing

scheme of Chap. 4. The initial condition V (0, f) = g(f) is given by the payoff g(f) of the European

option, e.g. g(f) = (f −K)+ for a call and g(f) = (K − f)+ for a put. To discretize Eq. (6.4), we

introduce a grid in the τ direction

0 = τ0 < τ1 < · · · < τL = τ , (6.5)

as well as a grid in the f direction

fmin = f0 < f1 < · · · < fM = fmax , (6.6)

and denote by $V^l_m = V(\tau_l, f_m)$, with l = 0, . . . , L and m = 0, . . . , M, an approximation to the option value at the grid point specified by the indices l and m. Note that we require neither of the grids to be uniform. The differential operator L can now be discretized using the prescriptions in Tab. 6.1. This yields,

$$
[LV]_m = \frac{1}{2}\,C^2(f_m)\left[a_m^{(2)} V_{m+1} + b_m^{(2)} V_m + c_m^{(2)} V_{m-1}\right] , \tag{6.7}
$$

where $a_m^{(2)}$, $b_m^{(2)}$ and $c_m^{(2)}$ denote the coefficients of the second derivative given in the third line of Tab. 6.1 with $x_i$ replaced by $f_m$. For notational convenience, we have defined $V_{-1} = V_{M+1} = 0$.

Thus, L is tridiagonal and in matrix notation reads as

$$
L = \begin{pmatrix}
b_0 & a_0 & & & & \\
c_1 & b_1 & a_1 & & & \\
 & c_2 & b_2 & a_2 & & \\
 & & \ddots & \ddots & \ddots & \\
 & & & c_{M-1} & b_{M-1} & a_{M-1} \\
 & & & & c_M & b_M
\end{pmatrix} , \tag{6.8}
$$

where we use the shorthand $a_m = \frac{1}{2} C^2(f_m)\,a_m^{(2)}$ and similarly $b_m$ and $c_m$. Note that for the coefficients in the first and last row of the matrix L to be well defined for a non-uniform grid, ghost points $f_{-1}$ and $f_{M+1}$ beyond the limit of the physical grid have to be defined.

To understand the role of these ghost points, a note about the proper treatment of boundary

conditions is in order here. A discrete version of Dirichlet boundary conditions would fix the option

value on the boundary at the ghost points, i.e. $V(\tau_l, f_{-1}) = \alpha(\tau_l) = \alpha^l$, $V(\tau_l, f_{M+1}) = \beta(\tau_l) = \beta^l$. For α(τ) and β(τ), some approximate solution for the option value for large or small rates has to be used and α and β can thus explicitly depend on the time τ. The discrete second derivative with respect to the rate f on the first and last real grid point depends on the boundary value. The prescription in Eq. (6.7) should then be modified as

$$
L V^l \to L V^l + b^l , \tag{6.9}
$$

to take care of the Dirichlet boundary condition. Here, the boundary vector $b^l$ is given by

$$
b^l = \left(c_0\,\alpha^l, 0, 0, \dots, 0, a_M\,\beta^l\right)^T . \tag{6.10}
$$

Alternatively, discrete Neumann boundary conditions can be used which fix the first derivative of the option value on the boundary. Using central differences to approximate the first derivative, we have to specify $\partial_f V(\tau_l, f_0) = \alpha(\tau_l) = \alpha^l$ and $\partial_f V(\tau_l, f_M) = \beta(\tau_l) = \beta^l$ on the first and last point of the real grid. In discrete terms, the condition at the upper boundary reads as

$$
\beta^l = a_M^{(1)} V^l_{M+1} + b_M^{(1)} V^l_M + c_M^{(1)} V^l_{M-1} , \tag{6.11}
$$

where the coefficients are given in the middle row in Tab. 6.1. Eq. (6.11) can be solved for the option value $V^l_{M+1}$ at the ghost point and used in the evaluation of the second derivative at $f_M$. Proceeding in the same way for the lower boundary results in the following modification of Eq. (6.7),

$$
L V^l \to \tilde L V^l + \tilde b^l . \tag{6.12}
$$

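Here, $\tilde L$ is obtained from L by modifying the first and last rows as follows while keeping all other rows fixed,

$$
\begin{aligned}
c_M \to \tilde c_M &= c_M - a_M \frac{c_M^{(1)}}{a_M^{(1)}} , & b_M \to \tilde b_M &= b_M - a_M \frac{b_M^{(1)}}{a_M^{(1)}} , \\
a_0 \to \tilde a_0 &= a_0 - c_0 \frac{a_0^{(1)}}{c_0^{(1)}} , & b_0 \to \tilde b_0 &= b_0 - c_0 \frac{b_0^{(1)}}{c_0^{(1)}} .
\end{aligned} \tag{6.13}
$$

The boundary vector in Eq. (6.12) is given by

$$
\tilde b^l = \left(\alpha^l \frac{c_0}{c_0^{(1)}}, 0, 0, \dots, 0, \beta^l \frac{a_M}{a_M^{(1)}}\right)^T . \tag{6.14}
$$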

A convenient discrete boundary condition which does not have an analog in the continuous case is to fix the value of the second derivative at the boundary, i.e. $\partial_f^2 V(\tau_l, f_0) = \alpha(\tau_l) = \alpha^l$ and $\partial_f^2 V(\tau_l, f_M) = \beta(\tau_l) = \beta^l$. Using central differences for the discrete approximation of the second derivative, the condition at the upper boundary reads as

$$
\beta^l = a_M^{(2)} V^l_{M+1} + b_M^{(2)} V^l_M + c_M^{(2)} V^l_{M-1} . \tag{6.15}
$$

Proceeding as for the Neumann boundary conditions, this results in a modification as in Eq. (6.12) where now $\tilde L$ is obtained from L by simply setting the first and last rows to 0. The boundary vector in this case reads as

$$
\tilde b^l = \left(\alpha^l \frac{c_0}{c_0^{(2)}}, 0, 0, \dots, 0, \beta^l \frac{a_M}{a_M^{(2)}}\right)^T . \tag{6.16}
$$

For the local volatility model in Eq. (6.4), setting the second derivative on the boundary equal to

zero effectively keeps the option value fixed at the payoff, i.e. V (τ, f0) = g(f0) and V (τ, fM ) = g(fM ).

Since for all models of interest here the lower boundary is given by f0 = 0 and C(0) = 0, this

reproduces the exact option value at f = 0. Also due to the absence of a drift and discount term in

Eq. (6.4), V (τ, f) will asymptotically stay at the value of the payoff for large f . Thus, the discrete

boundary condition with a vanishing second order derivative at both boundaries is a convenient way

to impose these asymptotics. This procedure is actually equivalent to Dirichlet boundary conditions

V (τ, 0) = g(0) and V (τ, fM ) = g(fM ) while working with a grid (f1, . . . , fM−1) where we have

dropped the first and last grid point. However, with the vanishing second order boundary condition,

the boundary vector in Eq. (6.16) vanishes and an explicit treatment of boundary terms in the

numerical procedure is not required any more. This simplified book keeping makes up for the loss of

information at two grid points and we will always use it for the numerical solution of local-volatility

models in this thesis.

Let us now come back to the discretization of the PDE of the local volatility model in Eq. (6.4)

after the digression on boundary conditions. Using forward differences for the first-order time derivative yields

$$
\partial_\tau V \to \frac{V^{l+1}_m - V^l_m}{\delta\tau_l} . \tag{6.17}
$$

Combining this with the f derivatives, we have to decide what time slice to use on the right hand

side of Eq. (6.4), l + 1 or l. Using a linear combination of the two yields the well known θ-scheme,

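$$\frac{\mathbf{V}^{l+1} - \mathbf{V}^l}{\delta\tau_l} = \mathbf{L}\left[\theta\,\mathbf{V}^{l+1} + (1-\theta)\,\mathbf{V}^l\right] + \theta\,\mathbf{b}^{l+1} + (1-\theta)\,\mathbf{b}^l . \tag{6.18}$$

Here, $\mathbf{L}$ is understood to be the modified matrix $\tilde{\mathbf{L}}$ according to the boundary conditions used; for notational convenience we drop the tilde from now on. Rearranging terms yields the following linear system of equations,

$$\underbrace{\left[\mathbf{1} - \delta\tau_l\,\theta\,\mathbf{L}\right]}_{\equiv \mathbf{A}(\theta)} \mathbf{V}^{l+1} = \underbrace{\left[\mathbf{1} + \delta\tau_l\,(1-\theta)\,\mathbf{L}\right]}_{\equiv \mathbf{B}(\theta)} \mathbf{V}^l + \tilde{\mathbf{b}}^l . \tag{6.19}$$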

We have spelled out the boundary vector $\tilde{\mathbf{b}}^l = \theta\,\delta\tau_l\,\mathbf{b}^{l+1} + (1-\theta)\,\delta\tau_l\,\mathbf{b}^l$ one last time for completeness of the argument. For the rest of this subsection, we will drop it, assuming that we work with vanishing second-order boundary conditions. For the choice θ = 0, Eq. (6.19) yields the explicit scheme in which the option value at the next time step $\mathbf{V}^{l+1}$ is given in terms of the one at the previous time step $\mathbf{V}^l$ by a simple matrix multiplication. The choice θ = 1 yields a purely implicit scheme that requires the solution of the linear system $\mathbf{A}\mathbf{V}^{l+1} = \mathbf{V}^l$. Such a tridiagonal system can be solved very efficiently with two sweeps of Gaussian elimination, at a computational cost proportional to the matrix size M. In MATLAB, the backslash operator '\' can be used for this purpose. For θ = 1/2, Eq. (6.19) is called the Crank-Nicolson scheme.
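The time stepping of Eq. (6.19) with a vanishing boundary vector can be sketched as follows. This is a minimal illustration in plain Python, not the MATLAB implementation used for this thesis; the function names `solve_tridiagonal` and `theta_step` as well as the small example grid are assumptions made for the sketch. The linear system is solved with the two-sweep elimination (Thomas algorithm) mentioned above.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Solve T x = rhs for a tridiagonal matrix T (Thomas algorithm).

    Row i of T reads lower[i] * x[i-1] + diag[i] * x[i] + upper[i] * x[i+1];
    lower[0] and upper[-1] have no effect. No pivoting is performed.
    """
    n = len(diag)
    c_prime = [0.0] * n
    d_prime = [0.0] * n
    c_prime[0] = upper[0] / diag[0]
    d_prime[0] = rhs[0] / diag[0]
    # Forward sweep: eliminate the sub-diagonal.
    for i in range(1, n):
        denom = diag[i] - lower[i] * c_prime[i - 1]
        c_prime[i] = upper[i] / denom
        d_prime[i] = (rhs[i] - lower[i] * d_prime[i - 1]) / denom
    # Backward sweep: back substitution.
    x = [0.0] * n
    x[-1] = d_prime[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d_prime[i] - c_prime[i] * x[i + 1]
    return x


def theta_step(c, b, a, v, dtau, theta):
    """One step A(theta) V_new = B(theta) V of Eq. (6.19), boundary vector dropped.

    c, b, a hold the sub-, main and super-diagonal of the (modified) matrix L.
    """
    n = len(v)
    rhs = []
    for m in range(n):
        lv = b[m] * v[m]
        if m > 0:
            lv += c[m] * v[m - 1]
        if m < n - 1:
            lv += a[m] * v[m + 1]
        rhs.append(v[m] + dtau * (1.0 - theta) * lv)  # B(theta) V
    lower = [-dtau * theta * x for x in c]            # A(theta) = 1 - dtau theta L
    diag = [1.0 - dtau * theta * x for x in b]
    upper = [-dtau * theta * x for x in a]
    return solve_tridiagonal(lower, diag, upper, rhs)


# Hypothetical example: C(f) = 1 on the uniform grid f = 0, 0.5, ..., 2,
# put payoff with strike K = 1, first and last rows of L set to zero.
df, dtau = 0.5, 0.01
off = 0.5 / df**2
c = [0.0, off, off, off, 0.0]
a = [0.0, off, off, off, 0.0]
b = [0.0, -2.0 * off, -2.0 * off, -2.0 * off, 0.0]
payoff = [max(1.0 - 0.5 * m, 0.0) for m in range(5)]
v_explicit = theta_step(c, b, a, payoff, dtau, theta=0.0)
v_cn = theta_step(c, b, a, payoff, dtau, theta=0.5)
```

Because the first and last rows of L vanish, both boundary values stay at the payoff for any θ, which is the behaviour described for the vanishing second-order boundary condition.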

The practicability of a finite-difference method is determined by its stability. Roughly, a scheme is said to be stable when oscillatory or non-smooth components of the vector $\mathbf{V}^l$ that are introduced by the initial condition or by small errors due to finite numerical accuracy are damped or at least not amplified during time stepping. Considering a small perturbation $\mathbf{V}^l \to \mathbf{V}^l + \delta\mathbf{V}^l$ at time step l, an exact solution of Eq. (6.19) would yield a perturbation $\delta\mathbf{V}^{l+k}$ at a later time step given by

$$\delta\mathbf{V}^{l+k} = \mathbf{C}^k\,\delta\mathbf{V}^l , \tag{6.20}$$

where $\mathbf{C} = \mathbf{A}^{-1}\mathbf{B}$. Applying a vector norm as a measure of the size of the perturbation yields

$$\|\delta\mathbf{V}^{l+k}\| = \|\mathbf{C}^k\,\delta\mathbf{V}^l\| \le \|\mathbf{C}\|^k\,\|\delta\mathbf{V}^l\| , \tag{6.21}$$

where $\|\mathbf{C}\|$ is a matrix norm compatible with the vector norm used. Eq. (6.21) shows that a sufficient condition for the stability of a scheme is that the matrix norm satisfies $\|\mathbf{C}\| \le 1$. Depending on the norm used, different criteria for stability can be defined. For the heat equation with constant coefficients, an eigensystem of the matrix $\mathbf{L}$ can be determined by a discrete Fourier transformation. This has been used by von Neumann to determine the eigenvalues of the matrix $\mathbf{C}$. If all eigenvalues have modulus ≤ 1, the scheme does not amplify errors and is thus stable. For this particular case, the θ-scheme is found to be stable for θ ≥ 1/2, including the implicit and the Crank-Nicolson schemes (for details see e.g. [53] and references therein). The explicit scheme is only conditionally stable, with a condition that requires the time step δτ to be reduced as $\delta f^2$ when the step size in the rate direction is reduced, leading to very small required time steps and thus an inefficient scheme. Although strictly speaking these results are only applicable to a PDE with constant coefficients, they are useful as a rule of thumb more generally.

Figure 6.1: Illustration of the generation of a non-uniform grid. The upper panel shows the piecewise linear distance ratio function r(y). A very low value is chosen at y = 1 to obtain a concentration point in the resulting grid. On the contrary, for large values of y a large distance ratio leads to coverage of the asymptotic region with just a few grid points. The lower panel shows the generating function g(x) and the non-uniform grid in the y direction resulting from a uniform one in the x direction. Note that the axes are flipped for a better comparison with the upper panel.
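The von Neumann argument can be made concrete with a short worked example. As an assumption for this sketch, consider the constant-coefficient equation $\partial_\tau V = \tfrac{1}{2}\sigma^2\partial_f^2 V$ on a uniform grid. A Fourier mode with phase φ is then an eigenvector of L with eigenvalue $-(2\sigma^2/\delta f^2)\sin^2(\phi/2)$, so that the corresponding eigenvalue of $\mathbf{C} = \mathbf{A}^{-1}\mathbf{B}$ is $(1-(1-\theta)x)/(1+\theta x)$ with $x = 2\mu\sin^2(\phi/2)$ and $\mu = \sigma^2\delta\tau/\delta f^2$. The helper names below are hypothetical.

```python
import math


def amplification_factor(theta, mu, phi):
    """Eigenvalue of C = A^{-1} B for a Fourier mode with phase phi.

    Assumes the constant-coefficient heat equation on a uniform grid,
    with mu = sigma^2 * dtau / df^2.
    """
    x = 2.0 * mu * math.sin(0.5 * phi) ** 2  # equals -dtau * (eigenvalue of L)
    return (1.0 - (1.0 - theta) * x) / (1.0 + theta * x)


def is_stable(theta, mu, n_modes=200):
    """Check |eigenvalue| <= 1 for phases phi = 0, ..., pi."""
    return all(
        abs(amplification_factor(theta, mu, math.pi * k / n_modes)) <= 1.0 + 1e-12
        for k in range(n_modes + 1)
    )
```

In this simplified setting the explicit scheme (θ = 0) is stable only for μ ≤ 1, i.e. δτ ≤ δf²/σ², whereas θ ≥ 1/2 passes the check for any μ.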

Another important consideration for a finite difference scheme is its consistency with the PDE

it is meant to approximate. This means that the finite difference operators applied to the true

solution of the continuous PDE approach the continuous derivatives as the grid-size is reduced to

zero. More or less by construction, all finite-difference schemes used here are second order consistent

in the space direction. By Taylor expansion, one can show that the θ-scheme for general choices of

θ is only first-order consistent in the time variable. Only for the special choice θ = 1/2, i.e. for the Crank-Nicolson scheme, is second-order consistency achieved. Therefore, we mainly work with the Crank-Nicolson scheme. Under rather general conditions, it can then be shown that a consistent

and stable finite-difference scheme is also convergent.

To easily generate non-uniform grids, we follow Kluge [36] and use generating functions. A non-uniform grid with points $y_i$ is then obtained from a uniform one with points $x_i$ by applying the generating function g(x),

$$y_i = g(x_i) , \tag{6.22}$$

where g(x) has to be monotonically increasing in x. For small grid spacings, the distance between neighboring points of the non-uniform grid is given by

$$\delta y_i = y_{i+1} - y_i = g(x_{i+1}) - g(x_i) \approx g'(x_i)\,\delta x = g'(g^{-1}(y_i))\,\delta x \equiv r(y_i)\,\delta x , \tag{6.23}$$

where δx is the spacing of the uniform grid and r(y) is called the distance ratio function. We would like to prescribe the form of r(y) externally in order to choose the concentration of the grid in terms of the target variable y. Solving for g(x) leads to the relation

$$x = \int_{y_0}^{g(x)} \frac{dy}{r(y)} , \tag{6.24}$$

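which has to be inverted to obtain g(x). A particularly simple form is obtained for a piecewise linear distance ratio function r(y) given by specifying the values $r_i = r(y_i)$ at a set of points $y_i$. Here, the $y_i$ are just a means of specifying the distance ratio function and should not be confused with the final points $y_i$ of the grid. The distance ratio function is then given by

$$r(y) = \underbrace{\frac{r_{i+1} - r_i}{y_{i+1} - y_i}}_{\equiv c_i}\, y + \underbrace{r_i - y_i\,\frac{r_{i+1} - r_i}{y_{i+1} - y_i}}_{\equiv d_i} , \qquad y_i \le y \le y_{i+1} . \tag{6.25}$$

Integration and inversion yields

$$g(x) = \begin{cases} \dfrac{r_i}{c_i}\left(e^{c_i (x - x_i)} - 1\right) + y_i , & c_i \neq 0 , \\[2mm] r_i\,(x - x_i) + y_i , & c_i = 0 , \end{cases} \qquad x_i \le x \le x_{i+1} , \tag{6.26}$$

where

$$x_i = \sum_{j=0}^{i-1} I_j , \qquad I_i = \int_{y_i}^{y_{i+1}} \frac{dy}{c_i y + d_i} = \begin{cases} \dfrac{1}{c_i}\ln\dfrac{r_{i+1}}{r_i} , & c_i \neq 0 , \\[2mm] \dfrac{y_{i+1} - y_i}{r_i} , & c_i = 0 . \end{cases} \tag{6.27}$$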

Fig. 6.1 shows an example of a grid generated with such a piecewise linear distance ratio function.

Fig. 6.2 and 6.3 show an example of the time-dependence of the value of a European put option in

the Black-Scholes model obtained from a finite-difference solution of the backwards pricing equation.

For practical applications in Chap. 7, we will rather solve the forward Dupire equation for the

European call option to obtain the entire smile for all strikes at once. Due to the absence of drift

and discount terms, the numerical setup to solve the Dupire equation is identical to the one described

in the current section provided that the roles of f and K are interchanged.
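The construction of a non-uniform grid from a piecewise linear distance ratio function, Eqs. (6.22) and (6.25) to (6.27), can be sketched in a few lines. This is an illustrative Python version with hypothetical function names and node values; it measures x within each piece relative to the left node $x_i$, so that g(x) is continuous across the pieces.

```python
import math


def make_generating_function(y_nodes, r_nodes):
    """Return (g, x_nodes) for a piecewise linear r(y) through the given nodes."""
    n = len(y_nodes) - 1
    # Slopes c_i of the distance ratio function on each piece, Eq. (6.25).
    c = [(r_nodes[i + 1] - r_nodes[i]) / (y_nodes[i + 1] - y_nodes[i])
         for i in range(n)]
    # Piece lengths I_i in the uniform variable x and their partial sums x_i.
    x_nodes = [0.0]
    for i in range(n):
        if c[i] != 0.0:
            length = math.log(r_nodes[i + 1] / r_nodes[i]) / c[i]
        else:
            length = (y_nodes[i + 1] - y_nodes[i]) / r_nodes[i]
        x_nodes.append(x_nodes[-1] + length)

    def g(x):
        # Locate the piece with x_i <= x <= x_{i+1}.
        i = 0
        while i < n - 1 and x > x_nodes[i + 1]:
            i += 1
        dx = x - x_nodes[i]
        if c[i] != 0.0:
            return r_nodes[i] / c[i] * math.expm1(c[i] * dx) + y_nodes[i]
        return r_nodes[i] * dx + y_nodes[i]

    return g, x_nodes


def nonuniform_grid(y_nodes, r_nodes, n_points):
    """Map a uniform grid in x onto a non-uniform grid in y, Eq. (6.22)."""
    g, x_nodes = make_generating_function(y_nodes, r_nodes)
    x_max = x_nodes[-1]
    return [g(x_max * k / (n_points - 1)) for k in range(n_points)]


# Hypothetical node values: a concentration point at y = 1, coarse for large y.
grid = nonuniform_grid([0.0, 1.0, 5.0, 20.0], [1.0, 0.1, 2.0, 40.0], 50)
```

The resulting grid covers the interval from the first to the last node, with the smallest spacing close to the node where r(y) is smallest.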

Figure 6.2: Grid in the τ and f directions used for the finite difference solution of the Black model as shown in Fig. 6.3. Each grid contains 100 points. Note the logarithmic scale on both axes.

Figure 6.3: Comparison of the finite difference solution for the time dependence of the option value in the Black model with the analytical expression. The grid used is shown in Fig. 6.2. The left panel shows the option values V(τ, f) for a European put with fixed strike K = 1 and different initial rates f = 0.2, 0.4, . . . , 1.4. Note that the limit of the option value for τ → 0 is the payoff $(K - f)^+$ whereas for τ → ∞ the value of the put goes to K. Solid lines are the analytical solution, crosses give the result of the finite difference scheme. The right panel shows the absolute errors. Note that τ is the dimensionless time to maturity corresponding to σ = 1.

6.1.2 Extension to two-factor models

Basic finite-difference schemes for two-dimensional models can be constructed along the same lines

as in one dimension. The discretization now involves three grids: τl, l = 0, . . . , L, and fm, m =

0, . . . ,M , in the time and rate directions as before as well as a new grid αn, n = 0, . . . , N , for

the volatility variable α. The discrete approximation to the option value on a grid point is now denoted by $V^l_{m,n} = V(\tau_l, f_m, \alpha_n)$. Finite-difference operators for spatial derivatives involve a nine-point stencil centered at the point with indices (m, n). The coefficients can be expressed in terms of products of the one-dimensional coefficients given in Tab. 6.1 [36]. More specifically, the second derivative with respect to the rate now reads as

$$\partial_f^2 V(\tau_l, f_m, \alpha_n) \to \underbrace{\begin{bmatrix} 0 & c^{(2)}_{f,m} & 0 \\ 0 & b^{(2)}_{f,m} & 0 \\ 0 & a^{(2)}_{f,m} & 0 \end{bmatrix}}_{[\Delta^{(2)}_f]_{m,n}} \star \underbrace{\begin{bmatrix} V^l_{m-1,n-1} & V^l_{m-1,n} & V^l_{m-1,n+1} \\ V^l_{m,n-1} & V^l_{m,n} & V^l_{m,n+1} \\ V^l_{m+1,n-1} & V^l_{m+1,n} & V^l_{m+1,n+1} \end{bmatrix}}_{[V^l]_{m,n}} , \tag{6.28}$$

where the coefficients $a^{(2)}_{f,m}$, $b^{(2)}_{f,m}$ and $c^{(2)}_{f,m}$ are taken from the last line of Tab. 6.1 with $x_i$ replaced by $f_m$. We make use of a stencil notation, where the action of the star multiplication is defined as follows,

$$\begin{bmatrix} a_{-1,-1} & a_{-1,0} & a_{-1,1} \\ a_{0,-1} & a_{0,0} & a_{0,1} \\ a_{1,-1} & a_{1,0} & a_{1,1} \end{bmatrix} \star [V^l]_{m,n} \equiv \sum_{r,s=-1}^{1} a_{r,s}\, V^l_{m+r,n+s} . \tag{6.29}$$

Similarly, the discrete version of the second derivative with respect to the volatility variable becomes,

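$$\partial_\alpha^2 V \to \underbrace{\begin{bmatrix} 0 & 0 & 0 \\ c^{(2)}_{\alpha,n} & b^{(2)}_{\alpha,n} & a^{(2)}_{\alpha,n} \\ 0 & 0 & 0 \end{bmatrix}}_{[\Delta^{(2)}_\alpha]_{m,n}} \star\, [V^l]_{m,n} . \tag{6.30}$$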

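Here again, the coefficients are given in Tab. 6.1 with $x_i$ now replaced by $\alpha_n$. First derivatives with respect to the rate and the volatility have the same form as the second derivatives in Eqs. (6.28) and (6.30), with the coefficients then taken from the middle row of Tab. 6.1. These will not be needed for our purposes, however. Finally, the mixed second derivative is given by

$$\begin{aligned} \partial_f\partial_\alpha V &\to \underbrace{\begin{bmatrix} c^{(1)}_{f,m}c^{(1)}_{\alpha,n} & c^{(1)}_{f,m}b^{(1)}_{\alpha,n} & c^{(1)}_{f,m}a^{(1)}_{\alpha,n} \\ b^{(1)}_{f,m}c^{(1)}_{\alpha,n} & b^{(1)}_{f,m}b^{(1)}_{\alpha,n} & b^{(1)}_{f,m}a^{(1)}_{\alpha,n} \\ a^{(1)}_{f,m}c^{(1)}_{\alpha,n} & a^{(1)}_{f,m}b^{(1)}_{\alpha,n} & a^{(1)}_{f,m}a^{(1)}_{\alpha,n} \end{bmatrix}}_{[\Delta^{(2)}_{f\alpha}]_{m,n}} \star\, [V^l]_{m,n} \\ &\to \frac{1}{4\,\delta f\,\delta\alpha} \begin{bmatrix} 1 & 0 & -1 \\ 0 & 0 & 0 \\ -1 & 0 & 1 \end{bmatrix} \star [V^l]_{m,n} , \end{aligned} \tag{6.31}$$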

where the simplified form in the last line is valid for grids with uniform spacing in both directions. The discrete approximation to the spatial operator of the extended SABR model in Eq. (2.6) then reads as

$$\mathcal{L}V \to \left\{ \tfrac{1}{2}\alpha_n^2 C^2(f_m)\,[\Delta^{(2)}_f]_{m,n} + \tfrac{1}{2}\alpha_n^2\,[\Delta^{(2)}_\alpha]_{m,n} + \rho\,\nu^2\alpha_n^2\,[\Delta^{(2)}_{f\alpha}]_{m,n} \right\} \star [V^l]_{m,n} . \tag{6.32}$$

Figure 6.4: Illustration of the lexicographic ordering. Components originally arranged in a rectangular grid have to be read column by column (left panel) to arrive at the f ordering defined in Eq. (6.33), while reading row by row (right panel) results in the α ordering as in Eq. (6.34). For the required index arithmetic, the use of MATLAB's reshape function together with matrix transposes turns out to be very useful.
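The star multiplication of Eq. (6.29) amounts to a weighted sum over the nine neighbouring grid values. The following sketch (plain Python, hypothetical function names) applies a stencil at an interior grid point and builds the uniform-grid mixed-derivative stencil from the last line of Eq. (6.31).

```python
def apply_stencil(stencil, values, m, n):
    """Star product of Eq. (6.29) at the interior grid point (m, n).

    stencil[r + 1][s + 1] holds a_{r,s} and values[m][n] holds V_{m,n}.
    """
    return sum(
        stencil[r + 1][s + 1] * values[m + r][n + s]
        for r in (-1, 0, 1)
        for s in (-1, 0, 1)
    )


def mixed_derivative_stencil(df, dalpha):
    """Uniform-grid stencil for the mixed derivative, last line of Eq. (6.31)."""
    w = 1.0 / (4.0 * df * dalpha)
    return [[w, 0.0, -w],
            [0.0, 0.0, 0.0],
            [-w, 0.0, w]]


# Check on V(f, alpha) = f * alpha, whose mixed derivative equals 1 everywhere.
df, dalpha = 0.1, 0.2
values = [[(m * df) * (n * dalpha) for n in range(3)] for m in range(3)]
mixed = apply_stencil(mixed_derivative_stencil(df, dalpha), values, 1, 1)
```

Since the central difference is exact for a bilinear function, `mixed` reproduces the analytical value up to rounding.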

For an efficient implementation on a computer, the application of the finite difference operator needs

to be expressed as a standard multiplication with a (sparse) matrix. To this end, the components of $\mathbf{V}^l$ at a fixed time slice have to be aligned into a one-dimensional vector. The two following so-called lexicographic orderings will be used,

$$\begin{aligned} \mathbf{V}^l_f &= (V^l_{0,0}, V^l_{1,0}, \ldots, V^l_{M,0}, V^l_{0,1}, \ldots, V^l_{M,1}, \ldots, V^l_{0,N}, \ldots, V^l_{M,N}) \\ &= (V^l_{f,1}, \ldots, V^l_{f,(M+1)(N+1)}) , \end{aligned} \tag{6.33}$$

$$\begin{aligned} \mathbf{V}^l_\alpha &= (V^l_{0,0}, V^l_{0,1}, \ldots, V^l_{0,N}, V^l_{1,0}, \ldots, V^l_{1,N}, \ldots, V^l_{M,0}, \ldots, V^l_{M,N}) \\ &= (V^l_{\alpha,1}, \ldots, V^l_{\alpha,(M+1)(N+1)}) . \end{aligned} \tag{6.34}$$

It is easy to see that the part of the spatial operator containing the second order f derivative will

be tridiagonal with the first ordering while with the second, the derivatives with respect to α will

be tridiagonal. Hence the subscript on the vector $\mathbf{V}^l$ for the two orderings. Fig. 6.4 illustrates how the components of the rectangular grid with indices m and n are ordered in Eqs. (6.33) and (6.34). For a conversion between the two orderings, the components of the vectors have to be rearranged, i.e.

$$\mathbf{V}^l_f = \mathbf{P}_{f\alpha}\mathbf{V}^l_\alpha , \qquad \mathbf{V}^l_\alpha = \mathbf{P}_{\alpha f}\mathbf{V}^l_f . \tag{6.35}$$

Here, $\mathbf{P}_{f\alpha}$ and $\mathbf{P}_{\alpha f}$ are permutation matrices defined as

$$[\mathbf{P}_{f\alpha}]_{n(M+1)+m+1,\; n+1+m(N+1)} = 1 , \qquad m = 0, \ldots, M , \quad n = 0, \ldots, N , \tag{6.36}$$

with all other elements equal to zero, and $\mathbf{P}_{\alpha f} = \mathbf{P}^T_{f\alpha} = \mathbf{P}^{-1}_{f\alpha}$.
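In an implementation, the permutation matrices of Eq. (6.36) never need to be stored; the reshuffling reduces to index arithmetic. A minimal sketch in plain Python with zero-based positions (the one-based indices of Eq. (6.36) minus one) and hypothetical function names:

```python
def index_f(m, n, M, N):
    """Zero-based position of V_{m,n} in the f ordering, Eq. (6.33)."""
    return n * (M + 1) + m


def index_alpha(m, n, M, N):
    """Zero-based position of V_{m,n} in the alpha ordering, Eq. (6.34)."""
    return m * (N + 1) + n


def f_to_alpha(v_f, M, N):
    """Apply P_{alpha f}: reorder a vector from f ordering to alpha ordering."""
    v_alpha = [None] * len(v_f)
    for m in range(M + 1):
        for n in range(N + 1):
            v_alpha[index_alpha(m, n, M, N)] = v_f[index_f(m, n, M, N)]
    return v_alpha


def alpha_to_f(v_alpha, M, N):
    """Apply P_{f alpha}: reorder a vector from alpha ordering to f ordering."""
    v_f = [None] * len(v_alpha)
    for m in range(M + 1):
        for n in range(N + 1):
            v_f[index_f(m, n, M, N)] = v_alpha[index_alpha(m, n, M, N)]
    return v_f
```

Applying the two functions in sequence returns the original vector, which mirrors the statement that the two permutation matrices are inverse to each other.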

Once an ordering is chosen, the discrete approximation of the operator $\mathcal{L}$ in Eq. (6.32) then turns into a matrix $\mathbf{L}$ which has the following contributions,

$$\mathbf{L} = \mathbf{L}^f + \mathbf{L}^\alpha + \mathbf{L}^{f\alpha} . \tag{6.37}$$

Here, $\mathbf{L}^f$ contains the second derivatives with respect to f, $\mathbf{L}^\alpha$ contains the second derivatives with respect to α, and $\mathbf{L}^{f\alpha}$ represents the mixed derivatives. In Eq. (6.37), we have not explicitly stated the ordering employed to set up the matrices. We can do so by adding a subscript f or α. All matrices will thus have two forms, related by reshuffling of indices, e.g.

$$\mathbf{L}^f_\alpha = \mathbf{P}_{\alpha f}\,\mathbf{L}^f_f\,\mathbf{P}_{f\alpha} . \tag{6.38}$$

With their natural ordering, the matrices $\mathbf{L}^f_f$ and $\mathbf{L}^\alpha_\alpha$ are tridiagonal. Explicitly, we have

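$$\mathbf{L}^f_f = \begin{pmatrix} \alpha_0^2\,\mathbf{M}^f = 0 & & & \\ & \alpha_1^2\,\mathbf{M}^f & & \\ & & \ddots & \\ & & & \alpha_N^2\,\mathbf{M}^f \end{pmatrix} , \tag{6.39}$$

where

$$\mathbf{M}^f = \begin{pmatrix} 0 & 0 & & & \\ c_1 & b_1 & a_1 & & \\ & c_2 & b_2 & a_2 & \\ & & \ddots & \ddots & \ddots \\ & & c_{M-1} & b_{M-1} & a_{M-1} \\ & & & 0 & 0 \end{pmatrix} . \tag{6.40}$$

The coefficients are the same as in the one-dimensional local volatility model, i.e. $a_m = \tfrac{1}{2}C^2(f_m)\,a^{(2)}_{f,m}$ and similarly for $b_m$ and $c_m$. Furthermore,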

$$\mathbf{L}^\alpha_\alpha = \begin{pmatrix} \mathbf{M}^\alpha & & & \\ & \mathbf{M}^\alpha & & \\ & & \ddots & \\ & & & \mathbf{M}^\alpha \end{pmatrix} , \tag{6.41}$$

where now all M + 1 blocks are identical and given by

$$\mathbf{M}^\alpha = \begin{pmatrix} 0 & 0 & & & \\ c_1 & b_1 & a_1 & & \\ & c_2 & b_2 & a_2 & \\ & & \ddots & \ddots & \ddots \\ & & c_{N-1} & b_{N-1} & a_{N-1} \\ & & & 0 & 0 \end{pmatrix} , \tag{6.42}$$

with the coefficients $a_n = \tfrac{1}{2}\alpha_n^2\,a^{(2)}_{\alpha,n}$, $b_n = \tfrac{1}{2}\alpha_n^2\,b^{(2)}_{\alpha,n}$ and $c_n = \tfrac{1}{2}\alpha_n^2\,c^{(2)}_{\alpha,n}$. Note that we have already made use of the convenient boundary conditions of vanishing second order derivatives $\partial_f^2 V$ and $\partial_\alpha^2 V$ at the boundaries, similar to the procedure in one dimension. This spares us the explicit treatment of boundary vectors and only requires a simple adjustment of the matrices. The matrix $\mathbf{L}^{f\alpha}$ from mixed derivatives cannot be made tridiagonal in any ordering. In the f or α ordering, it is band diagonal with a very sparsely populated band. Non-zero entries are located at a maximum distance of M + 2 or N + 2 from the diagonal, respectively. We will refrain from giving a graphical representation of $\mathbf{L}^{f\alpha}$ here and simply note that the components of a matrix $\mathbf{M}_f$ for a general 9-point stencil $[a]_{m,n}$ with parts $a^{m,n}_{r,s}$ in the f ordering are given by

$$[\mathbf{M}_f]_{n_1(M+1)+m_1+1,\; n_2(M+1)+m_2+1} = \sum_{r,s=-1}^{1} a^{m,n}_{r,s}\; \delta_{n_1(M+1)+m_1+1,\;(n_2+s)(M+1)+m_2+r+1} , \tag{6.43}$$

with $m_i \in (0, \ldots, M)$ and $n_i \in (0, \ldots, N)$. Eq. (6.43) can be implemented efficiently using vector arithmetic and sparse matrices in MATLAB.

Once the matrices for spatial derivatives are set up, the time stepping in finite-difference schemes

can be expressed through matrix operations as for schemes in one dimension. The θ-scheme again

reads as,

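$$\left[\mathbf{1} - \delta\tau_l\,\theta\,\mathbf{L}\right]\mathbf{V}^{l+1} = \left[\mathbf{1} + \delta\tau_l\,(1-\theta)\,\mathbf{L}\right]\mathbf{V}^l , \tag{6.44}$$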

where we have deliberately not indicated the ordering of vectors and matrices by subscripts. All

matrices and vectors have to be chosen with the same ordering (same subscript f or α), but none

of the orderings is preferable as L is only band diagonal and not tridiagonal as in one dimension.

The solution of the linear system in Eq. (6.44) can thus no longer be done by quick Gaussian

elimination with a cost linear in the matrix size. More involved methods such as iterative solutions

or factorizations adapted to sparse matrices have to be used. In MATLAB, we can still use the

backslash notation ’\’ to solve the system, but the computational cost will be higher as internally

more involved methods are used.

To revert to the simplicity and efficiency for tridiagonal matrices, several so-called splitting

schemes have been proposed in the literature that introduce intermediate fractional time steps. For

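each intermediate step only one of the matrices $\mathbf{L}^f$ or $\mathbf{L}^\alpha$ is involved in the implicit part of the linear system. Choosing the appropriate ordering, the solution of the linear system thus involves only a tridiagonal matrix. The natural ordering changes between subsequent intermediate time steps, requiring a reshuffling of the vectors of option prices. For an explicit introduction of the splitting schemes, consider first the case ρ = 0 for which the mixed derivatives in $\mathbf{L}^{f\alpha}$ vanish. A split version of the Crank-Nicolson scheme then reads as

$$\begin{aligned} \left[\mathbf{1} - \tfrac{1}{2}\delta\tau_l\,\mathbf{L}^f_f\right]\mathbf{V}^{l+\frac{1}{2}}_f &= \left[\mathbf{1} + \tfrac{1}{2}\delta\tau_l\,\mathbf{L}^f_f\right]\mathbf{V}^l_f , \\ \mathbf{V}^{l+\frac{1}{2}}_\alpha &= \mathbf{P}_{\alpha f}\,\mathbf{V}^{l+\frac{1}{2}}_f , \\ \left[\mathbf{1} - \tfrac{1}{2}\delta\tau_l\,\mathbf{L}^\alpha_\alpha\right]\mathbf{V}^{l+1}_\alpha &= \left[\mathbf{1} + \tfrac{1}{2}\delta\tau_l\,\mathbf{L}^\alpha_\alpha\right]\mathbf{V}^{l+\frac{1}{2}}_\alpha , \\ \mathbf{V}^{l+1}_f &= \mathbf{P}_{f\alpha}\,\mathbf{V}^{l+1}_\alpha , \end{aligned} \tag{6.45}$$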

where we have explicitly indicated the required ordering and the reshuffling of the indices between

steps. It turns out however that the second-order consistency in time of the Crank-Nicolson scheme

is lost by the splitting in Eq. (6.45) and reduced to first order consistency even for ρ = 0. A proof

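involves only straightforward Taylor expansions in $\delta\tau_l$ (see [36] for details). A second-order consistent scheme can be recovered by simply interchanging the matrices $\mathbf{L}^f$ and $\mathbf{L}^\alpha$ on the left-hand side of Eq. (6.45) (or alternatively on the right-hand side for that matter), yielding

$$\begin{aligned} \left[\mathbf{1} - \tfrac{1}{2}\delta\tau_l\,\mathbf{L}^\alpha_\alpha\right]\mathbf{V}^{l+\frac{1}{2}}_\alpha &= \mathbf{P}_{\alpha f}\left[\mathbf{1} + \tfrac{1}{2}\delta\tau_l\,\mathbf{L}^f_f\right]\mathbf{V}^l_f , \\ \left[\mathbf{1} - \tfrac{1}{2}\delta\tau_l\,\mathbf{L}^f_f\right]\mathbf{V}^{l+1}_f &= \mathbf{P}_{f\alpha}\left[\mathbf{1} + \tfrac{1}{2}\delta\tau_l\,\mathbf{L}^\alpha_\alpha\right]\mathbf{V}^{l+\frac{1}{2}}_\alpha . \end{aligned} \tag{6.46}$$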

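The scheme is called alternating direction implicit (ADI). However, for ρ ≠ 0 the mixed derivatives have to be taken into account. They are usually added to the explicit parts on the right hand side, again destroying the second-order consistency in time. An alternative splitting scheme is given by Yanenko [59] and is obtained by starting from a purely implicit scheme and adding the mixed derivatives as explicit parts after splitting. In detail, it reads as

$$\begin{aligned} \left[\mathbf{1} - \delta\tau_l\,\mathbf{L}^f_f\right]\mathbf{V}^{l+\frac{1}{2}}_f &= \left[\mathbf{1} + \tfrac{1}{2}\delta\tau_l\,\mathbf{L}^{f\alpha}_f\right]\mathbf{V}^l_f , \\ \left[\mathbf{1} - \delta\tau_l\,\mathbf{L}^\alpha_\alpha\right]\mathbf{V}^{l+1}_\alpha &= \mathbf{P}_{\alpha f}\left[\mathbf{1} + \tfrac{1}{2}\delta\tau_l\,\mathbf{L}^{f\alpha}_f\right]\mathbf{V}^{l+\frac{1}{2}}_f , \\ \mathbf{V}^{l+1}_f &= \mathbf{P}_{f\alpha}\,\mathbf{V}^{l+1}_\alpha . \end{aligned} \tag{6.47}$$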

Again, this scheme is only first order consistent in time (see [53] and references therein for details

and proofs). By combining it with extrapolation methods, the order of consistency in time and thus

the order of convergence can be increased.

We follow the recommendation by Sheppard [53] for the SABR model and use a third order

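consistent Khaliq and Twizell scheme [35]. Formally, the Yanenko scheme in Eq. (6.47) can be written as $\mathbf{V}^{l+1}_f = \mathbf{M}(\delta\tau_l)\,\mathbf{V}^l_f$ where

$$\mathbf{M}(\delta\tau) = \mathbf{P}_{f\alpha}\left[\mathbf{1} - \delta\tau\,\mathbf{L}^\alpha_\alpha\right]^{-1}\mathbf{P}_{\alpha f}\left[\mathbf{1} + \tfrac{1}{2}\delta\tau\,\mathbf{L}^{f\alpha}_f\right]\left[\mathbf{1} - \delta\tau\,\mathbf{L}^f_f\right]^{-1}\left[\mathbf{1} + \tfrac{1}{2}\delta\tau\,\mathbf{L}^{f\alpha}_f\right] . \tag{6.48}$$

Of course, an implementation always reverts back to Eq. (6.47) and never uses numerical matrix inversions. Eq. (6.48) is just a convenient shorthand. A third order extrapolation scheme now subdivides the time step $\delta\tau_l$ into 3 equal subintervals and considers all possible ways to reach the time step l + 1 from l by combining Yanenko substeps with sizes $\delta\tau_l/3$, $2\delta\tau_l/3$ and $\delta\tau_l$, yielding the following possibilities,

$$\begin{aligned} \mathbf{V}^{l+1}_{f,1} &= \mathbf{M}^3(\delta\tau_l/3)\,\mathbf{V}^l_f , \\ \mathbf{V}^{l+1}_{f,2} &= \mathbf{M}(2\delta\tau_l/3)\,\mathbf{M}(\delta\tau_l/3)\,\mathbf{V}^l_f , \\ \mathbf{V}^{l+1}_{f,3} &= \mathbf{M}(\delta\tau_l/3)\,\mathbf{M}(2\delta\tau_l/3)\,\mathbf{V}^l_f , \\ \mathbf{V}^{l+1}_{f,4} &= \mathbf{M}(\delta\tau_l)\,\mathbf{V}^l_f . \end{aligned} \tag{6.49}$$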

A linear combination of these 4 is then chosen for the final value $\mathbf{V}^{l+1}_f$ such that the scheme is third order consistent. This choice involves a straightforward but very lengthy and tedious Taylor expansion of the inverse matrices in $\delta\tau_l$. We will not reproduce it here, but refer the reader to Sheppard's thesis for the details [53]. The result is as follows,

$$\mathbf{V}^{l+1}_f = \tfrac{9}{2}\,\mathbf{V}^{l+1}_{f,1} - \tfrac{9}{4}\,\mathbf{V}^{l+1}_{f,2} - \tfrac{9}{4}\,\mathbf{V}^{l+1}_{f,3} + \mathbf{V}^{l+1}_{f,4} . \tag{6.50}$$
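The effect of the weights in Eq. (6.50) can be checked in a scalar toy setting. As an assumption for this sketch, we replace the matrix M(δτ) by the implicit Euler factor 1/(1 − x) for the test equation dV/dτ = λV with x = λδτ, which ignores the mixed-derivative term and all non-commutativity. The exact one-step factor is then $e^x$, and the extrapolated combination should agree with it up to terms of order $x^4$.

```python
import math


def implicit_factor(x):
    """Scalar stand-in for M(dtau): one implicit Euler step, x = lambda * dtau."""
    return 1.0 / (1.0 - x)


def extrapolated_factor(x):
    """Combination of Eq. (6.50) applied to the scalar factors of Eq. (6.49)."""
    m = implicit_factor
    v1 = m(x / 3.0) ** 3
    v2 = m(2.0 * x / 3.0) * m(x / 3.0)
    v3 = m(x / 3.0) * m(2.0 * x / 3.0)
    v4 = m(x)
    return 4.5 * v1 - 2.25 * v2 - 2.25 * v3 + v4


def local_error(x):
    """Deviation of the extrapolated one-step factor from the exact exp(x)."""
    return abs(extrapolated_factor(x) - math.exp(x))


# Halving the step should reduce the local error by roughly 2**4 = 16.
ratio = local_error(-0.02) / local_error(-0.01)
```

A local error of order $x^4$ per step corresponds to a global error of third order, in line with the third-order consistency stated above; the scalar check says nothing about the full matrix scheme with ρ ≠ 0.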

Figure 6.5: Example of option prices V as a function of f and α in the standard SABR model with β = 1 and ρ = 0 obtained with a two-dimensional finite-difference scheme. Results are from an ADI method; the non-uniform spatial grid has concentration points around the strike K = 1.0 and the initial volatility α = 0.1, with M = N = 100 points in each direction. A uniform temporal grid with L = 100 points was used. Other parameters are T = 5 and ν = 0.3.

6.2 Monte Carlo

A Monte Carlo simulation of a European option price replaces the average in the pricing measure of the payoff at maturity,

$$V(t, x) = E[g(X_T)\,|\,X_t = x] , \tag{6.51}$$

by an average over a finite number of realizations,

$$V(t, x) \approx \frac{1}{N}\sum_{i=1}^{N} g(X^{(i)}_T) . \tag{6.52}$$

Here, in general $X_t$ is a vector of continuous-time stochastic processes satisfying the stochastic differential equations

$$dX_t = \mu(X_t, t)\,dt + \sigma(X_t, t)\,dW_t , \tag{6.53}$$

with the initial condition $X_0 = x$. For rare cases in which the terminal distribution of $X_T$ is known analytically or even an analytic solution of the SDE can be derived, the terminal distribution can be sampled directly in a so-called 'exact' Monte Carlo scheme. For all other cases, the SDE has to be simulated by replacing the continuous-time process by a discrete-time approximation. The simplest and most popular scheme is the stochastic Euler or Euler-Maruyama scheme, which reads as

$$X_{i+1} - X_i = \mu(X_i, t_i)\,\delta t + \sigma(X_i, t_i)\,\omega_i\sqrt{\delta t} . \tag{6.54}$$

Here, δt = T/M is the time step, $X_i$ are approximations to the continuous process at time step $t_i = i\,\delta t$, and $\omega_i$ is a vector of correlated normal variables with unit variance.

A naive application of this prescription to the (extended) SABR model would yield the following

Euler scheme,

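$$\begin{aligned} F_{i+1} &= F_i + \alpha_i\,C(F_i)\,\omega^{(1)}_i\sqrt{\delta t} , \\ \alpha_{i+1} &= \alpha_i + \nu\,\alpha_i\,\omega^{(2)}_i\sqrt{\delta t} , \end{aligned} \tag{6.55}$$

where $\omega^{(1)}_i$ and $\omega^{(2)}_i$ are correlated random variables that can be obtained from independent standard normal variables $z^{(1)}_i$ and $z^{(2)}_i$ via the Cholesky decomposition,

$$\omega^{(2)}_i = z^{(2)}_i , \qquad \omega^{(1)}_i = \sqrt{1 - \rho^2}\,z^{(1)}_i + \rho\,z^{(2)}_i . \tag{6.56}$$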

However, note that the discrete volatility process $\alpha_{i+1}$ in Eq. (6.55) can become negative, if for a small starting value $\alpha_i$ and a relatively large time step δt a negative value of $\omega^{(2)}_i$ is sampled with

$$\omega^{(2)}_i < -\frac{1}{\nu\sqrt{\delta t}} . \tag{6.57}$$

This behaviour is in contrast to the known solution to the SDE for geometric Brownian motion (GBM) of $\alpha_t$ that is guaranteed to stay positive for all times. Similarly, the discrete rate process $F_i$ can also become negative, potentially resulting in undefined or imaginary values of $C(F_i)$ for the following time step. The exact solution of the GBM SDE for $\alpha_t$ suggests transforming the continuous-time dynamics to the logarithmic variables $X_t = \ln F_t$ and $Y_t = \ln\alpha_t$ by using Itô's lemma, yielding

$$\begin{aligned} dX_t &= -\frac{\alpha_t^2\,C(F_t)^2}{2F_t^2}\,dt + \frac{\alpha_t\,C(F_t)}{F_t}\,dW^{(1)}_t , \\ dY_t &= -\frac{\nu^2}{2}\,dt + \nu\,dW^{(2)}_t . \end{aligned} \tag{6.58}$$

Applying the Euler discretization scheme to the processes Xt and Yt and subsequently transforming

back to Fi and αi, yields the logarithmic Euler scheme,

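$$\begin{aligned} F_{i+1} &= F_i\exp\left( -\frac{\alpha_i^2\,C(F_i)^2}{2F_i^2}\,\delta t + \frac{\alpha_i\,C(F_i)}{F_i}\sqrt{\delta t}\,\omega^{(1)}_i \right) , \\ \alpha_{i+1} &= \alpha_i\exp\left( -\frac{\nu^2}{2}\,\delta t + \nu\,\omega^{(2)}_i\sqrt{\delta t} \right) . \end{aligned} \tag{6.59}$$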
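The logarithmic Euler scheme of Eq. (6.59), together with the Cholesky construction of Eq. (6.56), can be sketched as follows. This is a plain Python illustration with a hypothetical function name and hypothetical parameter values, not the MATLAB implementation used for this thesis; the local volatility function C is passed in as a callable.

```python
import math
import random


def simulate_log_euler(f0, alpha0, nu, rho, T, n_steps, n_paths, C, seed=0):
    """Terminal samples (F_T, alpha_T) from the log-Euler scheme of Eq. (6.59)."""
    rng = random.Random(seed)
    dt = T / n_steps
    sqrt_dt = math.sqrt(dt)
    corr = math.sqrt(1.0 - rho * rho)
    terminal = []
    for _ in range(n_paths):
        f, alpha = f0, alpha0
        for _ in range(n_steps):
            # Correlated normals from independent ones, Eq. (6.56).
            z1 = rng.gauss(0.0, 1.0)
            z2 = rng.gauss(0.0, 1.0)
            w2 = z2
            w1 = corr * z1 + rho * z2
            # Effective log-normal volatility alpha * C(F) / F of the rate.
            vol = alpha * C(f) / f
            f *= math.exp(-0.5 * vol * vol * dt + vol * sqrt_dt * w1)
            alpha *= math.exp(-0.5 * nu * nu * dt + nu * sqrt_dt * w2)
        terminal.append((f, alpha))
    return terminal


# Hypothetical example: C(f) = f, i.e. the log-normal SABR model with beta = 1.
samples = simulate_log_euler(1.0, 0.2, 0.3, -0.3, 1.0, 10, 5000,
                             C=lambda f: f, seed=1)
mean_f = sum(f for f, _ in samples) / len(samples)
mean_alpha = sum(a for _, a in samples) / len(samples)
```

By construction both processes stay strictly positive, and the sample means of the rate and the volatility should agree with their starting values within the sampling error.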

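Note that the evolution of the volatility is captured exactly in this scheme. For the rate process, we could have alternatively used the transformation $X_t = \Sigma(F_t)$ with Σ(f) defined in Eq. (3.57), resulting in the scheme

$$\begin{aligned} F_{i+1} &= \Sigma^{-1}\left( \Sigma(F_i) - \frac{\alpha_i^2\,C'(F_i)}{2}\,\delta t + \alpha_i\sqrt{\delta t}\,\omega^{(1)}_i \right) , \\ \alpha_{i+1} &= \alpha_i\exp\left( -\frac{\nu^2}{2}\,\delta t + \nu\,\omega^{(2)}_i\sqrt{\delta t} \right) . \end{aligned} \tag{6.60}$$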

It turns out that for a C(f) that starts out proportional to f , i.e. C(f) → cf for f → 0 as in the

two- or three-regime model or the cubic toy model, both schemes in Eqs. (6.59) and (6.60) are viable

numerically.

In contrast, for the original SABR model with $C(f) = f^\beta$ and 0 < β < 1 the derivative C′(f) and the quotient C(f)/f are divergent for f → 0 and both schemes become unstable. This is a signature of the fact that zero rates are attainable in the SABR and the corresponding CEV model (see Sec. 7.1.1). In order to include this possibility in the discrete simulation, we can stick with the naive Euler discretization and force the rate process to stay non-negative by hand, i.e. we use the scheme

$$\begin{aligned} F_{i+1} &= q\left( F_i + \alpha_i\,F_i^\beta\,\omega^{(1)}_i\sqrt{\delta t} \right) , \\ \alpha_{i+1} &= \alpha_i\exp\left( -\frac{\nu^2}{2}\,\delta t + \nu\,\omega^{(2)}_i\sqrt{\delta t} \right) , \end{aligned} \tag{6.61}$$

with $q(x) = x^+ = \max(x, 0)$ (absorbing boundary) or q(x) = |x| (reflecting boundary). Note that in the continuous time case a reflecting boundary leads to arbitrage opportunities as the rate can leave the boundary at 0 with a finite probability, leading to a potential upside gain with no possibility of a downside loss. We will thus also exclude the reflecting boundary q(x) = |x| in the discrete case. However, even for $q(x) = x^+$, the strict Martingale property of the rate process is lost. This is most easily seen for a single time step. In the unfixed scheme in Eq. (6.55) the rate is a Martingale, i.e. $E[F_{i+1}] = F_i$ since $E[\omega^{(1)}_i] = 0$. By the fix in Eq. (6.61), the probability weight of those samples of $\omega^{(1)}_i$ that would drive the rate negative is shifted by hand upward to 0, leading to a positive drift. This truncation bias is reduced with increasing number of time steps, but a potentially very large number of steps is needed to make it vanish up to the required accuracy. On the contrary, for the log scheme or the H scheme in Eqs. (6.59) and (6.60) the Martingale property is preserved exactly.

In the context of the Heston model and extensions thereof, Lord et al. [42] have compared different fixes for an Euler discretization of a square root variance process. Transferring their notation to the rate process of the SABR model would yield a set of schemes,

$$\begin{aligned} \tilde{F}_{i+1} &= q_1(\tilde{F}_i) + \alpha_i\,q_2(\tilde{F}_i)^\beta\,\omega^{(1)}_i\sqrt{\delta t} , \\ F_{i+1} &= q_2(\tilde{F}_{i+1}) . \end{aligned} \tag{6.62}$$

An auxiliary process $\tilde{F}_i$ is introduced which can be allowed to become negative. The functions $q_i(x)$ are chosen to be either the identity x, the absolute value function |x| or $x^+$. In addition to the absorbing boundary with $q_i(x) = x^+$, i = 1, 2, and the reflecting boundary with $q_i(x) = |x|$, i = 1, 2, they consider truncated schemes which here would result in the choice $q_1(x) = x$ and $q_2(x) = x^+$. Note that in their case a drift term is present, leading to additional degrees of freedom and the distinction between a partially and a fully truncated scheme. Their recommendation from numerical experiments is to use a truncated scheme due to a smaller truncation bias (in their case the fully truncated version is superior). Note that the auxiliary process does in this case preserve the Martingale property.

Recently, Chen et al. [14] have introduced a Monte Carlo scheme for the SABR model that

has a significantly reduced truncation bias. The spirit builds on the idea of ’exact simulations’

as introduced by Broadie and Kaya [12] for the Heston model. The low-bias scheme for the SABR

model makes use of the known transition density for the CEV model (see Sec. 7.1.1) and uses mixing

to combine this with sampling of the distribution of total variance. More specifically, for each time

step a sample of the terminal volatility $\alpha_{i+1}$ is first drawn from a log-normal distribution as in the schemes discussed here. The conditional distribution of the integrated variance $\int_{t_i}^{t_{i+1}} \alpha_t^2\,dt$ is then sampled numerically by using a moment-matched approximate log-normal probability density.

Finally a sample of Fi+1 is drawn from the CEV transition density conditional on the total variance.

Even though some intermediate approximations are used to avoid the evaluation of special functions

as much as possible, individual time steps are quite expensive in this scheme. However, due to the

reduced bias a considerably smaller number of time steps can be used as compared to the fixed

Euler schemes which makes the scheme attractive in terms of total computational cost for a required

accuracy. Here, we are mainly interested in Monte Carlo as a gauge of more efficient analytical or


semi-analytical approximations. Efficiency of the MC scheme is thus not our major concern. Even

optimized schemes do not seem to be efficient and accurate enough for a fast calibration procedure.

We will thus refrain from implementing the reduced-bias scheme here.

For practical applications of Monte Carlo, an estimate of the error bar of the results is of vital

importance. The total error has contributions from two components. First, the finite number of

sampling paths introduces a sampling error. From the central limit theorem, the contribution to the error of the European option value can be estimated as

$$\text{sampling error} \approx \sqrt{\frac{\mathrm{var}(V_M)}{N}} , \tag{6.63}$$

where $\mathrm{var}(V_M)$ is the sample variance of the payoff evaluated at the final time step, $V_M = g(F_M)$. A second contribution comes from the discretization of the time domain. For European options the weak error, defined as

$$\text{weak error} = E[g(F_T) - g(F^{\delta t}_T)] , \tag{6.64}$$

is the relevant measure of discretization error. Here, $E[g(F^{\delta t}_T)]$ is meant to denote the Monte Carlo average as defined in Eq. (6.52). An estimate of the weak error can be obtained from comparing option values obtained with M and 2M time steps, i.e.

$$\text{weak error} \approx E[g(F^{\delta t/2}_T) - g(F^{\delta t}_T)] . \tag{6.65}$$

In evaluating Eq. (6.65), it is important to use the same source of randomness for the simulation of the option value with time step δt and δt/2. To this end, the random increments $\Delta W^{\delta t/2}_t = \omega^{\delta t/2}_i\sqrt{\delta t/2}$ are first drawn for the process with the finer time step and then combined as $\Delta W^{\delta t}_t = \Delta W^{\delta t/2}_t + \Delta W^{\delta t/2}_{t+\delta t/2}$ to random increments for the coarser grid. This allows the weak error to be estimated separately from the sampling error. The total error bar is usually estimated as the root-mean-square error,

$$\text{r.m.s. error} = \sqrt{(\text{sampling error})^2 + (\text{weak error})^2} . \tag{6.66}$$
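The error estimates of Eqs. (6.63), (6.65) and (6.66) can be sketched for a simple test case. As an assumption for this illustration, we use a naive Euler discretization of the driftless one-factor process dX = σX dW rather than the full SABR dynamics; all function names are hypothetical. The fine Brownian increments are drawn first and then summed pairwise to obtain the coarse ones, so that both discretizations share the same source of randomness.

```python
import math
import random


def coarse_increments(fine):
    """Sum consecutive pairs of fine Brownian increments into coarse ones."""
    return [fine[2 * k] + fine[2 * k + 1] for k in range(len(fine) // 2)]


def euler_terminal(x0, sigma, increments):
    """Naive Euler scheme for dX = sigma X dW (illustrative test model)."""
    x = x0
    for dw in increments:
        x += sigma * x * dw
    return x


def error_estimates(payoff, x0, sigma, T, n_steps, n_paths, seed=0):
    """Return (price, sampling error, weak error, r.m.s. error) on the coarse grid."""
    rng = random.Random(seed)
    sd_fine = math.sqrt(T / (2 * n_steps))
    coarse_vals, diffs = [], []
    for _ in range(n_paths):
        fine = [rng.gauss(0.0, sd_fine) for _ in range(2 * n_steps)]
        v_fine = payoff(euler_terminal(x0, sigma, fine))
        v_coarse = payoff(euler_terminal(x0, sigma, coarse_increments(fine)))
        coarse_vals.append(v_coarse)
        diffs.append(v_fine - v_coarse)
    mean = sum(coarse_vals) / n_paths
    var = sum((v - mean) ** 2 for v in coarse_vals) / (n_paths - 1)
    sampling = math.sqrt(var / n_paths)   # Eq. (6.63)
    weak = sum(diffs) / n_paths           # Eq. (6.65)
    rms = math.sqrt(sampling ** 2 + weak ** 2)  # Eq. (6.66)
    return mean, sampling, weak, rms
```

For example, `error_estimates(lambda x: max(1.0 - x, 0.0), 1.0, 0.2, 1.0, 8, 2000)` returns the estimates for an at-the-money put with 8 coarse time steps and 2000 paths.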

The total computational cost of a Monte Carlo calculation is roughly proportional to the product MN of the number of time steps M and the number of sample paths N. Thus, for a required accuracy as measured by the r.m.s. error, M and N should be chosen such that the sampling error and the weak error are of equal magnitude. For a globally Lipschitz continuous payoff, the order of weak convergence is usually equal to 1, i.e. weak error ∼ δt as δt → 0. The variance of the payoff tends to a constant with increasing number of paths such that sampling error ∼ $N^{-1/2}$. Consequently, we should increase the number of paths as $N \sim M^2$ with increasing number of time steps to keep the balance between weak and sampling error. With this choice the cost grows as $MN \sim M^3$ while the r.m.s. error ε decreases as 1/M, so that the computational cost scales as $\epsilon^{-3}$ [17].

Chapter 7

Quantitative comparison of different approximation schemes

In this chapter, we apply the approximate analytical and semi-analytical as well as numerically exact

methods developed in previous chapters. The aim is to gauge the accuracy of the approximations for

the extended SABR model. However, we will also look at the underlying local volatility models in some detail, as some qualitative features can already be seen there and the heat-kernel approximation can be evaluated to higher orders. Thus, the relevance of higher-order correction terms can be

analysed. Furthermore, a good handle on the local-volatility models is also required for the mixing

approach to the full SABR model as described in Chap. 4. All implementations for this thesis are

in MATLAB, the run times quoted are for a standard laptop (Lenovo T400).

7.1 Local volatility models

7.1.1 Constant elasticity of variance (CEV)

The constant elasticity of variance (CEV) model, defined in our notation by $C(f) = f^\beta$, is one of

the rare cases beyond the Black and Black-Scholes models for which analytical results are available

for the transition density and European option prices. Furthermore, the heat-kernel expansion can

be evaluated up to very high order [55]. Thus, besides being the basis for the stochastic volatility

extension to the SABR model, the CEV model constitutes a very good test case for the accuracy

of the perturbative results. The model was first proposed by Cox and Ross [15]. A recent

comprehensive review of analytical results is given by Brecher and Lindsay [41]. Here, we will cite

some of the results without derivation as far as needed for the purpose of the comparison with the

heat-kernel expansion. For the details and references to the original literature, the reader is referred

to [41].

Different regimes have to be distinguished according to the value of the exponent β. For β = 1,

the model is equivalent to Black’s model and results have already been given before. In the regime

0 < β < 1 most relevant for applications, the boundary f = 0 is attainable and thus boundary

conditions have to be given special care. For 1/2 ≤ β < 1, f = 0 is an absorbing barrier, i.e. once

the process reaches the boundary it will stay there forever. For 0 < β < 1/2, the boundary can be

absorbing or reflecting and an additional boundary condition has to be imposed. It has been argued

that in order to obtain an arbitrage free model, the boundary should always be absorbing and we

will only consider this case here (see e.g. Sec. 3.10.2 in [51] for a discussion of this point). For β = 0,

the process is equivalent to a Brownian motion and negative rates are possible. Note however that

the limit β → 0 is different from β = 0, since for any β > 0 negative rates are excluded and replaced

by a δ-contribution to the transition density at F = 0 (see below). The regime β > 1 poses special

problems due to a violation of the strict martingale property [41], but we will not be concerned with this

regime here.

For 0 < β < 1 with absorbing boundary conditions at f = 0, the analytical expression for the

transition density is given by

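$$ p(\tau, f, F) = \frac{F^{\frac{1}{2}-2\beta}}{(1-\beta)\sigma^2\tau}\,\sqrt{f}\,\exp\left(-\frac{F^{2(1-\beta)} + f^{2(1-\beta)}}{2(1-\beta)^2\sigma^2\tau}\right) I_{\frac{1}{2(1-\beta)}}\left(\frac{(fF)^{1-\beta}}{(1-\beta)^2\sigma^2\tau}\right), \tag{7.1} $$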

where Iν(x) is the modified Bessel function of the first kind defined by

$$ I_\nu(x) = \left(\frac{x}{2}\right)^{\nu} \sum_{k=0}^{\infty} \frac{(x^2/4)^k}{k!\,\Gamma(\nu+k+1)}, \tag{7.2} $$

and Γ(s) = Γ(s, 0) is the standard gamma function where Γ(s, x) is defined as

$$ \Gamma(s, x) = \int_x^{\infty} u^{s-1} e^{-u}\,du\,. \tag{7.3} $$

Due to absorption at zero, the distribution in Eq. (7.1) is norm deficient with

$$ \int_0^{\infty} p(\tau, f, F)\,dF = P\left(\frac{1}{2(1-\beta)}, \frac{f^{2(1-\beta)}}{2(1-\beta)^2\sigma^2\tau}\right) < 1\,. \tag{7.4} $$

Here, P (s, x) = 1 − Γ(s, x)/Γ(s) is the normalized incomplete gamma function with Γ(s, x) defined

in Eq. (7.3). Note that we use the notation for gamma functions from Abramowitz and Stegun’s

reference manual [1], which differs slightly from the notation in [41]. The missing probability weight in

Eq. (7.4) is the probability of absorption and is concentrated as a δ-function contribution at F = 0.
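
As a worked instance of Eq. (7.4), consider the special case β = 1/2, chosen here only because the gamma functions become elementary. Then 1/(2(1 − β)) = 1 and Γ(1, x) = e^{−x}, so the absorbed probability weight is

$$ 1 - P\left(1, \frac{2f}{\sigma^2\tau}\right) = \exp\left(-\frac{2f}{\sigma^2\tau}\right). $$

For the parameters of Fig. 7.1 (f = 1, σ = 0.3) this evaluates to roughly e^{−4.44} ≈ 0.012 for τ = 5 and e^{−2.22} ≈ 0.108 for τ = 10, so a noticeable share of the probability weight sits in the δ-contribution at the longer maturity.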

The value of a European call option in the CEV model (again for 0 < β < 1 and an absorbing

boundary at zero) can be expressed in terms of the cumulative distribution function of the non-central χ² distribution,

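$$ C(\tau, f, K) = f\left[1 - \chi'^2\left(\frac{K^{2(1-\beta)}}{(1-\beta)^2\sigma^2\tau}, \frac{3-2\beta}{1-\beta}, \frac{f^{2(1-\beta)}}{(1-\beta)^2\sigma^2\tau}\right)\right] - K\,\chi'^2\left(\frac{f^{2(1-\beta)}}{(1-\beta)^2\sigma^2\tau}, \frac{1}{1-\beta}, \frac{K^{2(1-\beta)}}{(1-\beta)^2\sigma^2\tau}\right), \tag{7.5} $$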

where χ′²(x, k, λ) is the cumulative distribution function of the non-central χ²-distribution with k degrees of freedom and non-centrality parameter λ. Major software packages such as MATLAB provide built-in functionality for all the special functions required in this section. However, the evaluation of χ′² (ncx2cdf in MATLAB) can be quite slow as it is usually based on a truncation of the infinite series,

$$ \chi'^2(x, k, \lambda) = \sum_{i=0}^{\infty} \frac{e^{-\lambda/2}(\lambda/2)^i}{i!}\,\chi^2(x, k+2i)\,, \tag{7.6} $$

where the cumulative central χ²-distribution χ²(x, k) can be expressed in terms of (incomplete) gamma functions. Schroder [52] compiles a range of more efficient numerical approximation strategies. For our purposes, however, the built-in functions will suffice.
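
To make Eqs. (7.5) and (7.6) concrete, the following is a minimal Python sketch using only the standard library. It is not the MATLAB implementation used in this thesis; the function names (`reg_lower_gamma`, `ncx2_cdf`, `cev_call`) are hypothetical, and the plain series evaluation is an assumption that is only adequate for moderate arguments, since the Poisson weight e^{−λ/2} underflows for very large λ.

```python
import math

def reg_lower_gamma(a, x):
    """Regularized lower incomplete gamma P(a, x) from its power series."""
    if x <= 0.0:
        return 0.0
    term = math.exp(a * math.log(x) - x - math.lgamma(a + 1.0))
    if term == 0.0:
        # Underflow deep in a tail: P is ~0 left of the bulk, ~1 right of it.
        return 0.0 if x < a else 1.0
    total = term
    for n in range(1, 100000):
        term *= x / (a + n)
        total += term
        if a + n > x and term < 1e-17 * total:
            break
    return min(total, 1.0)

def chi2_cdf(x, k):
    """Central chi-square CDF with k degrees of freedom."""
    return reg_lower_gamma(0.5 * k, 0.5 * x)

def ncx2_cdf(x, k, lam, tol=1e-16, max_terms=10000):
    """Non-central chi-square CDF as the Poisson mixture of Eq. (7.6)."""
    weight = math.exp(-0.5 * lam)          # Poisson weight for i = 0
    total = 0.0
    for i in range(max_terms):
        total += weight * chi2_cdf(x, k + 2 * i)
        # Stop once past the Poisson mode and the weights are negligible.
        if i > 0.5 * lam and weight < tol:
            break
        weight *= 0.5 * lam / (i + 1)
    return total

def cev_call(tau, f, K, sigma, beta):
    """Undiscounted CEV call of Eq. (7.5); 0 < beta < 1, absorbing at zero."""
    s = (1.0 - beta) ** 2 * sigma ** 2 * tau
    x_K = K ** (2.0 * (1.0 - beta)) / s
    x_f = f ** (2.0 * (1.0 - beta)) / s
    dof_first = (3.0 - 2.0 * beta) / (1.0 - beta)
    dof_second = 1.0 / (1.0 - beta)
    return f * (1.0 - ncx2_cdf(x_K, dof_first, x_f)) - K * ncx2_cdf(x_f, dof_second, x_K)

if __name__ == "__main__":
    # At the money, the value should be close to sigma * f**beta * sqrt(tau / (2 pi)).
    print(cev_call(1.0, 1.0, 1.0, 0.3, 0.5))
```

The nested series makes the cost grow with both x and λ, which is consistent with the remark above that a direct truncation of Eq. (7.6) can be slow.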

Turning to the heat-kernel approximations for the CEV model, we note that the Σ-integrals

are given in App. B. As pointed out by Taylor [55], the recursion relation in Eq. (3.62) for the

coefficients ak(f, F ) of the asymptotic heat-kernel expansion of the transition density in Eq. (3.61)

can be solved explicitly and reduced to a simple algebraic recursion. The coefficient functions are

all of the form

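$$ a_k(f, F) = \left[\frac{(1-\beta)^2}{2(fF)^{1-\beta}}\right]^k \alpha_k \tag{7.7} $$

with α0 = 1 and

$$ \alpha_k = \frac{\left[k - \frac{2-\beta}{2(1-\beta)}\right]\left[k + \frac{\beta}{2(1-\beta)}\right]}{k}\,\alpha_{k-1}\,. \tag{7.8} $$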

The heat-kernel expansion for the transition density can thus be written as follows,

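$$ p(\tau, f, F) = \sqrt{\frac{f^{\beta}}{2\pi\sigma^2\tau F^{3\beta}}}\,\exp\left(-\frac{(F^{1-\beta} - f^{1-\beta})^2}{2\sigma^2\tau(1-\beta)^2}\right) \sum_{k=0}^{\infty} \left[\frac{(1-\beta)^2\sigma^2\tau}{2(fF)^{1-\beta}}\right]^k \alpha_k\,. \tag{7.9} $$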

Note that the recursion in Eq. (7.8) terminates at the kth term for the following particular values

of β,

$$ \beta = \frac{2k-2}{2k-1}\,, \tag{7.10} $$

leading to a finite sum in Eq. (7.9). This occurs for the values β = 0, 2/3, 4/5, 6/7, etc., for which also

simplified option valuation formulas in terms of purely the normal distribution function are known

(see e.g. Sec. V in [52]).
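
The termination of the recursion at the special values of β can be checked in exact rational arithmetic. The snippet below is a small Python sketch (the function name `alpha_coefficients` is hypothetical and not part of the thesis code) that iterates Eq. (7.8) with `fractions.Fraction`, so that a vanishing coefficient is an exact zero rather than a rounding artefact.

```python
from fractions import Fraction

def alpha_coefficients(beta, k_max):
    """Return [alpha_0, ..., alpha_kmax] from Eq. (7.8) as exact fractions."""
    beta = Fraction(beta)
    shift_minus = (2 - beta) / (2 * (1 - beta))   # subtracted from k in the first factor
    shift_plus = beta / (2 * (1 - beta))          # added to k in the second factor
    alphas = [Fraction(1)]                        # alpha_0 = 1
    for k in range(1, k_max + 1):
        alphas.append((k - shift_minus) * (k + shift_plus) / k * alphas[-1])
    return alphas

if __name__ == "__main__":
    for beta in (Fraction(0), Fraction(2, 3), Fraction(4, 5), Fraction(1, 2)):
        print(beta, [str(a) for a in alpha_coefficients(beta, 4)])
```

For β = 4/5 this gives α1 = −6, α2 = 12 and α3 = 0, so all higher coefficients vanish, whereas for β = 1/2 no coefficient is zero and the sum in Eq. (7.9) does not terminate.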

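Eq. (7.8) remains valid for β → 1 provided it is properly normalized as $\tilde\alpha_k = (1-\beta)^{2k}\alpha_k$. In this case, we can resum the expansion in Eq. (7.9), noting that the recursion has the solution $a_k(f,K) = \frac{(-1)^k}{8^k k!}$ such that $\sum_{k=0}^{\infty} a_k \sigma^{2k}\tau^k = \exp(-\sigma^2\tau/8)$, and Eq. (7.9) reduces to the well-known log-normal distribution,

$$ p(\tau, f, F) = \sqrt{\frac{f}{2\pi\sigma^2\tau F^3}}\; e^{-\frac{\ln^2\frac{F}{f}}{2\sigma^2\tau} - \frac{\sigma^2\tau}{8}} = \frac{1}{\sqrt{2\pi\sigma^2\tau F^2}}\; e^{-\frac{1}{2\sigma^2\tau}\left(\ln\frac{F}{f} + \frac{\sigma^2\tau}{2}\right)^2}. \tag{7.11} $$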

Figs. 7.1 and 7.2 show a comparison of the heat-kernel expansion of the transition density with

the exact expression in Eq. (7.1) for different values of β, times to maturity τ and orders of the

expansion. For β = 1/2, one notes a degeneracy at low rates F that becomes worse with increasing

order of the approximation. As is typical of an asymptotic series, for fixed parameters and fixed τ ,

[Figure: two panels, p(τ, f, F) versus F.]

Figure 7.1: CEV transition density for f = 1, σ = 0.3, β = 1/2 and τ = 5 (left) or τ = 10 (right). The solid line is the exact density in Eq. (7.1), the dashed lines give the heat-kernel expansion with a maximal k from top to bottom of 0, 1, 2, 3, 4, 5, 10, 20.

[Figure: two panels, p(τ, f, F) versus F.]

Figure 7.2: Same as Fig. 7.1 for β = 4/5.

there is an optimal number of terms to be retained and beyond which the error committed increases

again until the series eventually diverges. For β = 4/5, which is among the special values in Eq. (7.10),

the series terminates after k = 2 and the first three terms actually provide an exact expression for

the transition density. We have only verified this equality numerically, but expect that it can also

be shown analytically by using the properties of special functions.
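
The numerical check mentioned above can be reproduced along the following lines. This is a Python sketch under stated assumptions, not the thesis code: the function names are hypothetical, and the Bessel function is summed directly from the power series in Eq. (7.2), which is only sensible for moderate arguments. The comparison is made at a single point where the Bessel argument is large.

```python
import math

def bessel_i(nu, x, max_terms=2000):
    """Modified Bessel function I_nu(x) from the power series of Eq. (7.2)."""
    term = (0.5 * x) ** nu / math.gamma(nu + 1.0)
    total = term
    for k in range(1, max_terms):
        term *= 0.25 * x * x / (k * (nu + k))
        total += term
        if term < 1e-17 * total:
            break
    return total

def cev_density_exact(tau, f, F, sigma, beta):
    """Exact CEV transition density, Eq. (7.1)."""
    s = (1.0 - beta) ** 2 * sigma ** 2 * tau
    nu = 1.0 / (2.0 * (1.0 - beta))
    z = (f * F) ** (1.0 - beta) / s
    pref = F ** (0.5 - 2.0 * beta) * math.sqrt(f) / ((1.0 - beta) * sigma ** 2 * tau)
    expo = -(F ** (2.0 * (1.0 - beta)) + f ** (2.0 * (1.0 - beta))) / (2.0 * s)
    return pref * math.exp(expo) * bessel_i(nu, z)

def cev_density_heat_kernel(tau, f, F, sigma, beta, k_max):
    """Heat-kernel expansion of Eq. (7.9), truncated after order k_max."""
    s = (1.0 - beta) ** 2 * sigma ** 2 * tau
    eps = s / (2.0 * (f * F) ** (1.0 - beta))      # expansion parameter
    shift_minus = (2.0 - beta) / (2.0 * (1.0 - beta))
    shift_plus = beta / (2.0 * (1.0 - beta))
    alpha, power, series = 1.0, 1.0, 1.0
    for k in range(1, k_max + 1):
        alpha *= (k - shift_minus) * (k + shift_plus) / k   # Eq. (7.8)
        power *= eps
        series += alpha * power
    gauss = math.exp(-(F ** (1.0 - beta) - f ** (1.0 - beta)) ** 2 / (2.0 * s))
    pref = math.sqrt(f ** beta / (2.0 * math.pi * sigma ** 2 * tau * F ** (3.0 * beta)))
    return pref * gauss * series

if __name__ == "__main__":
    args = (5.0, 1.0, 1.2, 0.3, 0.8)   # tau, f, F, sigma, beta = 4/5
    print(cev_density_exact(*args))
    for k_max in (0, 1, 2):
        print(k_max, cev_density_heat_kernel(*args, k_max))
```

At this point the order-2 truncation agrees with the series evaluation of Eq. (7.1) to within floating-point accuracy, while the leading order alone is off by several percent.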

Using the integrals defined in Eq. (3.43), the value of the European call option has the following

series expansion,

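$$ C(\tau, f, K) = (f - K)^+ + \sqrt{\frac{f^{\beta}K^{\beta}}{8\pi}} \sum_{k=0}^{\infty} \left[\frac{(1-\beta)^2\sigma^2\tau}{2(fK)^{1-\beta}}\right]^k \alpha_k\, I_k\left(\frac{(K^{1-\beta} - f^{1-\beta})^2}{(1-\beta)^2}, \sigma^2\tau\right). \tag{7.12} $$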

Here, the coefficients αk follow the recursion relation in Eq. (7.8) and the functions Ik can be

expressed in terms of elementary exponentials and the standard normal cumulative distribution

function via the recursion in Eq. (3.44).

Figs. 7.3 and 7.4 show a comparison of the option value in Eq. (7.12) with the exact result. Note

that for the special value β = 4/5, the series again terminates with a maximal k of 2 and then coincides

[Figure: three panels, C(τ, f, K) versus τ on a logarithmic time axis from 10^−3 to 10^1.]

Figure 7.3: Time dependence of the value of the European call option for the CEV model in the heat-kernel approximation as compared to the exact solution for β = 1/2 and σ = 1. Left, middle and right panels are for f = 0.5, f = 1 and f = 1.5, respectively. The strike is K = 1. The solid line is the exact option value in Eq. (7.5), the dashed lines give the heat-kernel expansion in Eq. (7.12) with a maximal k from top to bottom of 0, 1, 2, 3, 4, 5, 10, 20.

[Figure: three panels, C(τ, f, K) versus τ on a logarithmic time axis from 10^−3 to 10^1.]

Figure 7.4: Same as Fig. 7.3 for β = 4/5.

with the exact result. But also for values of β away from the special values, the first few terms of

the expansion provide a good approximation of the option values for dimensionless time of up to

σ2τ = 1 or even larger. In contrast, if the involved cumulative normal distribution functions are

further expanded in an asymptotic expansion for small times τ , the series in Eq. (3.65) is obtained

which for the CEV model reduces to

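$$ C(\tau, f, K) = \sqrt{\frac{f^{\beta}K^{\beta}(\sigma^2\tau)^3}{2\pi}}\;\frac{2(1-\beta)^2}{(K^{1-\beta} - f^{1-\beta})^2}\,\exp\left(-\frac{(K^{1-\beta} - f^{1-\beta})^2}{2(1-\beta)^2\sigma^2\tau}\right) \times \sum_{k} c_k(f, K)\,(\sigma^2\tau)^k\,, \tag{7.13} $$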

where the coefficients satisfy the recursion relation,

$$ c_{k+1}(f, K) = -\frac{(2k+3)(1-\beta)^2}{(K^{1-\beta} - f^{1-\beta})^2}\,c_k(f, K) + \alpha_{k+1}\left(\frac{(1-\beta)^2}{2(fK)^{1-\beta}}\right)^{k+1}, \tag{7.14} $$

[Figure: three panels, C(τ, f, K) versus τ.]

Figure 7.5: Comparison of the asymptotic expansion of the option value in the heat-kernel approximation with the exact solution for K = 1.0, β = 4/5, σ = 1 and f = 0.5 (left), f = 1.1 (middle), f = 1.5 (right). The solid black line is the exact option value in Eq. (7.5), the dashed and dotted lines give the asymptotic expansion in Eq. (7.13) with a maximal k of 0 (dashed black), 1 (dotted black), 2 (dashed red), 3 (dotted red), 4 (dashed green), 5 (dotted green), 10 (dashed blue), 20 (dotted blue).

with the initial condition c0(f,K) = 1 and αk given by Eq. (7.8). Fig. 7.5 compares this asymptotic series with the analytical formula. The approximation is much worse than Eq. (7.12) and only applicable for times that are truly asymptotically small. Worse, the approximation deteriorates with increasing order, and oscillations around the true solution from even to odd orders are observed. The range of applicability in time shrinks with the distance to the at-the-money point. Thus, as an approximation to the option value on its own, the series in Eq. (7.13) is useless. It has to be resummed in order to avoid the oscillatory behaviour. One way of doing so is to determine the implied volatility and use the Black model as a resummation. Thus, Eq. (7.13) should only be used to match coefficients with a similarly expanded effective model such as the Black model.
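
The oscillation between even and odd orders can be traced directly to the recursion in Eq. (7.14). The following Python sketch (with a hypothetical function name, written only for illustration) iterates the recursion in floating point for f ≠ K and prints the size of the individual terms c_k (σ²τ)^k.

```python
def asymptotic_coefficients(f, K, beta, k_max):
    """Return [c_0, ..., c_kmax] from Eq. (7.14) with alpha_k from Eq. (7.8)."""
    one_m_b = 1.0 - beta
    # (1 - beta)^2 / (K^(1-beta) - f^(1-beta))^2; requires f != K
    inv_dist2 = one_m_b ** 2 / (K ** one_m_b - f ** one_m_b) ** 2
    eps = one_m_b ** 2 / (2.0 * (f * K) ** one_m_b)
    shift_minus = (2.0 - beta) / (2.0 * one_m_b)
    shift_plus = beta / (2.0 * one_m_b)
    alpha = 1.0
    coeffs = [1.0]                      # initial condition c_0 = 1
    for k in range(k_max):
        # alpha now holds alpha_{k+1}
        alpha *= (k + 1 - shift_minus) * (k + 1 + shift_plus) / (k + 1)
        coeffs.append(-(2 * k + 3) * inv_dist2 * coeffs[-1] + alpha * eps ** (k + 1))
    return coeffs

if __name__ == "__main__":
    sigma2_tau = 1.0
    for k, c in enumerate(asymptotic_coefficients(0.5, 1.0, 0.8, 6)):
        print(k, c, c * sigma2_tau ** k)
```

For f = 0.5, K = 1 and β = 4/5 the coefficients alternate in sign and grow roughly factorially, so for σ²τ of order one each additional term is larger than the previous one.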

Finally, Figs. 7.6 and 7.7 show the dependence of the implied volatility in the CEV model on the

time to maturity τ as well as the strike K. The heat-kernel approximation is given in Eqs. (3.70) to (3.73). One notices that the approximation is best for β close to 1, i.e. for a CEV model that is still close to the effective Black model that is used when working with a log-normal implied volatility. With decreasing β the time domain of applicability of the approximation shrinks and for β = 0.5 the benefit of the second-order approximation over the first order is not clear any more. For large τ, the second order even shows a pathological degeneracy at low rates. A similar behaviour was

observed for the full SABR model by Paulot [46]. As the second order is very cumbersome to obtain

in the full SABR model and from the CEV results is not expected to yield great improvements, we

restrict the calculations to the first order in τ for the full model.

[Figure: two panels, σimpl./σL versus σL²τ.]

Figure 7.6: Time dependence of the implied volatility in the CEV model for β = 0.8 (left) and β = 0.5 (right). Note that due to the power law in the CEV model the prefactor σ has a dimension different from the volatility in the Black model. We therefore use the effective log-normal volatility σL = σC(f)/f = σf^(β−1) appropriate for the initial rate f as a normalization factor. Curves are for K/f = 0.8, 1.01, 1.2 from bottom to top. Solid lines show the exact result obtained from the option value in Eq. (7.5) and a numerical inversion of Black’s formula. The dotted lines show the first-order heat-kernel approximation, the dashed lines the second order.

7.1.2 Shifted log-normal and shifted CEV

The shifted log-normal and shifted CEV models in Eq. (1.13) move the lower boundary of the range

of possible interest rates from zero to a slightly negative rate −θ. These variants are therefore

discussed as modelling choices to account for occasional historical occurrences of negative rates.

Also, in the current environment with very low short-term rates, it might be advantageous to push

down the problematic absorbing boundary. Furthermore, the shifted log-normal model can be seen

as a local linearization of a more complex C(f) around the initial value f . It provides an easy way of

generating a skew in the implied volatility. Analytical results for the shifted models can be obtained

directly from the un-shifted versions by noting that the SDE of the shifted model reverts to the

original one by the change of variables $\tilde F_t = F_t + \theta$. Thus, the transition density is given by

$$ p_{\text{shifted}}(\tau, f, F) = E[\delta(F_\tau - F)\,|\,F_0 = f] = E[\delta(\tilde F_\tau - (F + \theta))\,|\,\tilde F_0 = f + \theta] = p(\tau, f + \theta, F + \theta)\,. \tag{7.15} $$

Similarly, for the option value, we have

$$ C_{\text{shifted}}(\tau, f, K) = C(\tau, f + \theta, K + \theta)\,. \tag{7.16} $$

Using an implied Black volatility, this simple shift in the rates is accounted for by a non-trivial

time-dependent smile σimpl.(τ,K). An example of how well the heat-kernel approximation accounts

for the implied volatility in the shifted log-normal model is given in Fig. 7.8. Note that contrary

to the CEV model, the second order approximation does not show a pathological behaviour at low

strikes.
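
The way a pure shift turns into a skew of the implied Black volatility can be illustrated with a short Python sketch. It prices the shifted log-normal model through Eq. (7.16) and inverts Black’s formula by bisection. The function names and the bisection bracket are assumptions made for this illustration; the parameters σ = 0.3 and τ = 1 are chosen for convenience and are not those of Fig. 7.8.

```python
import math

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def black_call(f, K, vol, tau):
    """Undiscounted Black call on a forward f with log-normal volatility vol."""
    sd = vol * math.sqrt(tau)
    d_plus = (math.log(f / K) + 0.5 * sd * sd) / sd
    return f * norm_cdf(d_plus) - K * norm_cdf(d_plus - sd)

def shifted_lognormal_call(f, K, sigma, tau, theta):
    """Eq. (7.16): shift both forward and strike by theta, then use Black."""
    return black_call(f + theta, K + theta, sigma, tau)

def implied_black_vol(price, f, K, tau, lo=1e-6, hi=5.0, n_iter=200):
    """Invert Black's formula by bisection; assumes the root lies in [lo, hi]."""
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if black_call(f, K, mid, tau) < price:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    f, sigma, tau, theta = 1.0, 0.3, 1.0, 0.1
    for K in (0.8, 1.0, 1.2):
        price = shifted_lognormal_call(f, K, sigma, tau, theta)
        print(K, implied_black_vol(price, f, K, tau))
```

With θ = 0.1 the implied volatility lies above σ for all three strikes and decreases with increasing strike, which is the skew generated by the shift.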

[Figure: two panels, σimpl./σL versus K.]

Figure 7.7: Smile for the CEV model with β = 0.5 as well as σL²τ = 2 (left) and σL²τ = 5 (right). The effective log-normal volatility σL is defined in the caption to Fig. 7.6. The solid line shows the exact result whereas the dotted and dashed lines show the first- and second-order heat-kernel approximation.

[Figure: two panels, σimpl./σ versus K (left) and versus σ²τ (right).]

Figure 7.8: Smile (left) and time dependence of the implied volatility (right) for the shifted log-normal model with θ = 0.1, f = 1, σ = 1. On the left, we have σ²τ = 4 whereas on the right the curves are for K = 1.2, 1.01, 0.8 from top to bottom. The solid line shows the exact result, the dotted, dot-dashed and dashed lines show the zeroth, first and second order heat-kernel approximation.

7.1.3 Two- and three-regime local volatility models

Let us now turn to the three-regime model in Eq. (1.12). To roughly mimic the empirical findings

of de Guillaume et al. [26], we take fL = 1, fR = 6 and κ = 0.25. However, we will mainly be

interested in the behaviour close to the lower switching point which could also be analysed by an

analogous two-regime model omitting the upper switching point.

Fig. 7.9 shows a comparison of the first two orders of the heat-kernel expansion for the transition

density with a Monte Carlo simulation. Diffusion starts at f = 1.5 to the right of the lower transition

point fL = 1. The distribution starts out normal until the lower regime is reached. With increasing

time some probability density seems to get trapped in the lower regime. The first order correction in

the heat-kernel expansion shows a pathological step at the transition point originating from a jump

in the first derivative of C(f). Due to this step, the approximation breaks down first close to the

[Figure: six panels, p(τ, f, F) versus F.]

Figure 7.9: Comparison of heat-kernel expansion with a Monte Carlo simulation for the transition density for the local volatility model with 3 linear regimes, fL = 1, fR = 6, κ = 0.25, f = 1.5. Dotted lines correspond to the leading order in the heat-kernel expansion, dashed lines include the first correction term. From upper-left to lower right: σ²τ = 0.1, 0.2, 0.5, 0.7, 1.0, 2.0. For the MC simulation, 100 time steps and 10^6 paths were used in a logarithmic Euler scheme (see Sec. 6.2). The transition density is obtained by a histogram of the final rates scaled appropriately to match the normalization of the probability density.

[Figure: six panels, σimpl./σ versus K.]

Figure 7.10: Implied volatility smile in the 3 regime local volatility model with fL = 1, fR = 6, σ²τ = 1. The initial rate is f = 0.9, 1.1, 3, 5.9, 6.1, 7 from top left to bottom right. Shown are the finite difference solution of the forward Dupire equation (black) as well as the zeroth (red), first (green) and second order (blue) heat-kernel approximations.

transition point where at large times, the transition density can become negative.

Fig. 7.10 shows the implied volatility smile for different values of the initial rate f . Unfortunately,

the heat-kernel approximation shows again a degenerate behaviour for f close to a switching point

fL or fR. The approximation deteriorates with increasing order. For a better understanding of the

failure of the heat-kernel approach, let us look at the time dependence of the at-the-money implied

volatility for the same initial values f in Fig. 7.11. The first and second order of the approximation

seem to be completely off, except well inside the middle regime and at small times τ . It seems

that before any correction to the zeroth order smile, i.e. the τ = 0 limit, can be of a sizable effect,

the influence of the switching points is felt resulting in a different type of τ -dependence that is not

captured by the heat-kernel approach. In a sense, the quantities entering the approximation are of

a local nature. The functional shape of C(f) only enters the formulas on the interval [f,K] through

the geodesic distance and integrals of the derivatives in Σ. For both f and K inside the same

regime, the approximation can thus not be influenced by a nearby switching point. On the other

hand, one can view the switching point as a boundary between regimes with well-known solutions

for the pricing PDE. Starting from a binomial-tree representation of a two-regime model, we have

attempted to set up an expansion in the number of barrier crossings. The first two terms account

only for a very small fraction of the total transition density, thus a large and in the continuous

limit presumably infinite number of barrier crossings need to be taken into account. As we have not

managed to resum any subset of paths to a workable approximation for the local volatility model,

we do not show explicit results here. However, the argument shows that a good approximation

[Figure: six panels, σimpl./σ versus σ²τ on a logarithmic axis from 10^−2 to 10^1.]

Figure 7.11: Time dependence of the at-the-money implied volatility in the 3 regime model with fL = 1, fR = 6 for different initial rates f = 0.9, 1.1, 3, 5.9, 6.1, 7 from top left to bottom right. Shown are the finite-difference solution (black) as well as the zeroth (red), first (green) and second order (blue) heat-kernel approximations.

for a model with non-analytic switching points cannot be obtained with a local approach such as

the heat-kernel approximation. Also, the Monte Carlo results for the transition density in Fig. 7.9 clearly show an influence of the boundary even when both f and F are inside the middle regime. This is in contrast to the heat-kernel approximation, for which the leading order and the first order are identical in this case, showing no influence of the nearby fL.

7.1.4 Cubic toy local volatility model

To avoid the non-analytic switching points of C(f) while still roughly capturing the empirical findings

of [26], we analyze the cubic toy model introduced in Eq. (1.14) in the introduction. Fig. 7.12

shows a comparison of the heat-kernel approximation for the transition density with a Monte Carlo

simulation. Obviously, the pathological step from the three-regime model does not show up here. The

second-order heat-kernel approximation provides a very good description of the transition density

and even the first order yields the correct behaviour up to rather large τ . Fig. 7.13 compares the

implied volatility smile from the heat-kernel approximation with numerical finite-difference results.

Again, very good agreement is observed. Deviations first show for large f and large strike K as

the effective instantaneous volatility σinst. ≡ σC(f)/f and therefore the effective dimensionless time σinst.²τ is much larger there.

[Figure: four panels, p(τ, f, F) versus F.]

Figure 7.12: Comparison of heat-kernel expansion with a Monte Carlo simulation for the transition density for the cubic toy model with fc = 2, f = 1. Dotted lines correspond to the leading order in the heat-kernel expansion, dashed lines include the first correction term. From upper-left to lower right: σ²τ = 0.1, 0.5, 1.0, 2.0.

[Figure: six panels, σimpl./σ versus K.]

Figure 7.13: Implied volatility smile in the cubic toy model with fc = 2, σ²τ = 3. The initial rate is f = 1, 2, 3, 4, 5, 6 from top left to bottom right. Shown are the finite difference solution of the forward Dupire equation (black) as well as the zeroth (red), first (green) and second order (blue) heat-kernel approximations.

[Figure: four panels, pm versus FT.]

Figure 7.14: Marginal transition density for the SABR model with ρ = 0, β = 0.5, f = 1, α = 0.3, ν = 0.5 and different times to maturity τ = 1, 5, 10, 20 from top left to bottom right. The solid line is the leading heat-kernel expansion whereas the histograms are from a Monte Carlo simulation using a truncated Euler scheme with 10^6 paths and 1000 time steps.

7.2 SABR model

This section presents the main results for the full SABR model. We treat the standard SABR model

first and then move on to the extended version. The cases ρ = 0 and ρ ≠ 0 are treated separately

as for the former we can use the mixing approach described in Chap. 4.

7.2.1 Standard SABR model

7.2.1.1 ρ = 0

We start with the special case of vanishing correlation ρ between the forward rate and the volatility

processes. As an illustration, Fig. 7.14 shows a comparison of the approximate marginal transition

density in Eq. (3.112) with histograms from a Monte Carlo simulation. The qualitative features

are described correctly up to rather large time τ . However, the absorbing boundary seems to be

treated differently. Whereas the true transition density has a delta function contribution at zero

as for the underlying CEV model, the heat-kernel approximation shows some continuously smeared

out probability weight for small rates.

The main focus of the approximations is on the description of the implied volatility smile. To

clearly show the differences between the various approaches, we use an adverse test case with a

rather high volatility of volatility ν = 0.5, an initial volatility of α = 0.3 and a low initial rate of

f = 1. Fig. 7.16 compares the different approximation schemes with numerical results for τ = 5.

The Hagan formula and the first order heat-kernel approximation are of comparable quality. They

agree at the money. For small and large strikes the heat-kernel approximation is slightly better.

[Figure: four panels, pm versus FT.]

Figure 7.15: Same as Fig. 7.14 with ρ = −0.5.

Very similar results are obtained from a numerical solution of the one-dimensional Dupire equation

with an effective local volatility as described in Chap. 5. A very fast and accurate description of

the smile for ρ = 0 is obtained by the mixing approach described in Chap. 4. Fig. 7.17 shows the

same comparison for different times to maturity. For small τ , all approaches yield similar results

and deciding which one is better would amount to chasing numerical noise. In practice, bid-ask

spreads will be larger than differences between approaches for small times. The mixing approach

yields a good description for all choices of τ , although at intermediate times some deviations to the

finite difference solution do show up. The strength of the mixing approach lies in the fact that it

gives very good results even or in particular for very large τ where asymptotic expansions fail. The

Hull-White expansion as described in Sec. 4.4 yields the correct form of the smile for very small τ ,

but quickly breaks down for larger τ . We will therefore not consider it any further in the following.

7.2.1.2 ρ ≠ 0

Fitting of the standard SABR model to market option prices usually results in rather large negative

correlations ρ. Thus, we analyse the same test case as in the previous subsection, now with ρ = −0.5.

Mixing is not applicable any more, but a contravariate type of correction as discussed in Sec. 4.6

can be attempted. Fig. 7.15 shows the marginal transition density. Figs. 7.18 and 7.19 show results

for the smile obtained within the different approximation schemes. The corrected mixing scheme is

not quite as accurate as pure mixing for ρ = 0, but still seems to be the best choice of method for

the parameter range considered.

[Figure: σimpl. versus K.]

Figure 7.16: Implied volatility smile for the SABR model with ρ = 0, β = 0.5, f = 1, α = 0.3, ν = 0.5, τ = 5. Shown are the heat-kernel approximation in the leading time-independent order (dotted black) and with the order τ correction (solid black), the Hagan formula (blue), the mixing approach with 10 (dotted green), 20 (dashed green) and 100 quantiles (solid green), the effective-local-volatility scheme (magenta) as well as numerical results from a finite-difference ADI scheme (black crosses) and a Monte Carlo simulation (solid red) with an estimated r.m.s. error bar (dotted red). To achieve the shown accuracy, the finite difference scheme uses 200 grid points in all three directions for a uniform grid in time and non-uniform grids in space with concentration points around the strike K and the initial volatility α. The Monte Carlo simulation uses the truncated Euler scheme with 4 · 10^6 paths with 1000 time steps each, adjusted such that the weak and sampling error are roughly of the same size. The Monte Carlo simulation took about 15 minutes, the two-dimensional finite-difference scheme about 5 seconds per data point and the one-dimensional finite difference runs to obtain option values for all strikes for the averaging scheme and the Dupire scheme about 5 milliseconds.

[Figure: four panels, σimpl. versus K.]

Figure 7.17: Same as Fig. 7.16 for other times to maturity τ = 1, 2, 4, 10 from top left to bottom right. We only show the leading result for each approximation scheme and the time-consuming Monte Carlo simulation is omitted. In addition, the solid yellow lines show the Hull-White expansion as described in Sec. 4.4.

[Figure: σimpl. versus K.]

Figure 7.18: Implied volatility smile for the SABR model with ρ = −0.5 and all other parameters identical to Fig. 7.16. The solid green line now shows the result of the mixing approach corrected in a contravariate type by the heat-kernel result as described in Sec. 4.6. The color code for all other results is the same as in Fig. 7.16. The 2D finite difference results are now obtained with a third-order Khaliq-Twizell scheme as described in Sec. 6.1.2.

[Figure: four panels, σimpl. versus K.]

Figure 7.19: Implied volatility smile for the SABR model with ρ = −0.5 for τ = 1, 4, 7, 10 from top left to bottom right. All other parameters and color codes are the same as in Fig. 7.18.

7.2.2 Extended SABR model

7.2.2.1 ρ = 0

[Figure: four panels, pm versus FT.]

Figure 7.20: Marginal transition density for the extended 3 regime SABR model with ρ = 0, fL = 1, fR = 6, κ = 0.25, f = 1.5, α = 0.3, ν = 0.5 and different times to maturity τ = 1, 5, 10, 20 from top left to bottom right. The solid line is the leading heat-kernel expansion whereas the histograms are from a Monte Carlo simulation using a logarithmic Euler scheme with 10^6 paths and 1000 time steps.

Let us now turn to the model of main interest, namely the extended 3 regime SABR model.

Fig. 7.20 shows the marginal transition density for ρ = 0 and an initial value f to the right of

the left switching point. The qualitative features are again well described by the leading-order

heat-kernel approximation.

Fig. 7.21 shows the implied volatility smile for different times to maturity for the same initial

value f = 1.5. Similar to the underlying local volatility model discussed in Sec. 7.1.3, the first-order

heat-kernel approximation to the implied volatility shows a pathological step at the switching point.

The effective-local-volatility scheme does not show this step, but seems otherwise to yield the same

level of accuracy as the heat-kernel expansion. The quantile-averaging procedure works very well for

all times τ although at intermediate times, the asymptotic expansions are slightly better. Note that

we had to increase the number of gridpoints in the K-grid for the one-dimensional finite-difference

solvers in order to avoid small oscillations from the non-analytic switching points.

[Figure: four panels, σimpl. versus K.]

Figure 7.21: Implied volatility smile for the extended 3 regime SABR model with ρ = 0, fL = 1, fR = 6, κ = 0.25, f = 1.5, α = 0.3, ν = 0.5 and different times to maturity τ = 1, 2, 5, 10 from top left to bottom right. The color codes are the same as in Fig. 7.17. Note that there is no Hagan approximation for the extended model.

7.2.2.2 ρ ≠ 0

For finite correlation ρ, the mixing approach is not applicable by itself. We have seen for ρ = 0 that

the heat-kernel approach fails for the model with 3 regimes and non-analytic switching points between

them. We will thus use the effective-local-volatility scheme to obtain a contravariate correction to

mixing instead of the first-order heat-kernel approximation. Fig. 7.22 shows the time evolution

of the smile for an initial value of f = 1.5 to the right of the first switching point. Here again,

the corrected mixing scheme provides a very good description of the smile for all τ although the

effective-local-volatility scheme is slightly superior at intermediate times to maturity. The leading-

order heat-kernel approximation provides the correct limit for τ → 0, but the first correction term

shows a pathology at the switching point and should therefore not be used. Fig. 7.23 shows the

dependence of the smile on the level of the current rate f . The pathology of the first-order heat-

kernel approximation is strongest for f close to the lower switching point. A similar behaviour

occurs at the upper switching point, but as the effective instantaneous volatility C(f)/f and thus

[Figure: four panels, σimpl. versus K.]

Figure 7.22: Implied volatility smile for the extended 3 regime SABR model with ρ = −0.5, and all other parameters and color codes as in Fig. 7.21. The corrected mixing scheme (green line) is now corrected by the effective local volatility scheme and not the heat-kernel result (see text).

the effective dimensionless time to maturity is smaller for larger f the pathology is less pronounces

there. The effective-local-volatility scheme produces the correct shape of the smile for all f . For

small f however the overall level of the smile is not described well.

Finally, Fig. 7.24 shows the implied volatility in the extended SABR model with a cubic local-volatility function. The parameter fc has been chosen such that the value C(fc) at the plateau is identical to that of the 3 regime model. The qualitative features are very similar to the three regime model

although the details differ in particular at the right wing. Note that the heat-kernel approximation

does not show any pathology for this toy model and can also be used for the contravariate correction

to mixing.

[Figure 7.23: six panels showing the implied volatility σ_impl as a function of the strike K.]

Figure 7.23: Implied volatility smile for the extended 3 regime SABR model with ρ = −0.5, T = 10 for different values of the current rate f = 0.9, 1.1, 4, 5.9, 6.1, 7 from top left to bottom right. All other parameters and color codes are as in Fig. 7.22.

[Figure 7.24: four panels showing the implied volatility σ_impl as a function of the strike K.]

Figure 7.24: Implied volatility smile for the extended SABR model with a cubic local volatility function C(f) with fc = 3, ρ = −0.5, α = 0.3, ν = 0.5, T = 10 for different values of the current rate f = 0.5, 1, 3, 5 from top left to bottom right. The mixing scheme (green line) is corrected by the heat-kernel result. The blue line gives the result of the Hagan formula in Eq. (D.3) with I0 and I1 from Eqs. (D.4) and (D.6).


Chapter 8

Conclusions

In this thesis, we have reviewed and extended approximation schemes for the standard SABR model

as well as an extended SABR model for several choices of a general local volatility function C(f).

We have also analyzed the underlying local volatility models in some detail.

We find that asymptotic heat-kernel expansions yield good agreement with numerical solutions

for small to intermediate times to maturity as long as the local volatility function C(f) does not

contain kinks as for a piecewise linear 3 regime model. An effective-local-volatility scheme based on

a very recent proposal by Andreasen and Huge [2] yields the same level of accuracy as the first-order

heat-kernel expansion of the implied volatility. However, it has the advantage of also being robust for a

model with different linear regimes and non-analytic switching points. We extend a mixing scheme

proposed by Barjaktarevich and Rebonato [48] to a general local volatility function by solving the

one-dimensional pricing problem numerically using a finite-difference solver for the forward Dupire

equation. We find that it works remarkably well for ρ = 0 and is best at large times to maturity

τ where asymptotic expansions fail. Even for ρ ≠ 0 it can be combined with the asymptotic heat-kernel or effective-local-volatility expansions using a simple contravariate type of correction. For

the parameter ranges analyzed, we find that this corrected mixing scheme is superior to the other

approximation schemes except for some intermediate ranges of τ , where the asymptotic expansions

are slightly better.


Appendix A

Laplace method for one-dimensional integrals

In this appendix, we give a brief summary of the Laplace asymptotic expansion method for one-

dimensional integrals on the real axis as needed in Sec. 3.1.3 to evaluate the expected local volatility

and the marginal density function of the SABR model in the heat-kernel expansion. The Laplace

method is also known as the steepest descent method, and its generalizations to higher dimensions and functional integrals are the basis for Feynman diagram expansions in theoretical physics. Thus, consider the integral

$$I(\lambda) = \int_a^b f(x)\, e^{-\lambda g(x)}\, dx , \tag{A.1}$$

where f(x) and g(x) are sufficiently smooth functions of one real variable x. For asymptotically

large real λ the integral will be dominated by values of x around minima of g(x). For simplicity, we

will assume that g(x) has a single minimum at x∗ such that we can expand both f(x) and g(x) in

a Taylor expansion around x∗ yielding,

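$$
\begin{aligned}
I(\lambda) &= e^{-\lambda g(x^*)} \sum_{k=0}^{\infty} \int_a^b \frac{f^{(k)}(x^*)}{k!}\,(x - x^*)^k \exp\left( -\lambda \sum_{l=2}^{\infty} \frac{g^{(l)}(x^*)}{l!}\,(x - x^*)^l \right) dx \\
&= \frac{e^{-\lambda g(x^*)}}{\sqrt{\bar\lambda}} \sum_{k=0}^{\infty} \int_{y_a}^{y_b} \frac{f^{(k)}(x^*)}{k!}\, \frac{y^k}{\bar\lambda^{k/2}} \exp\left( -\sum_{l=3}^{\infty} \frac{g^{(l)}(x^*)}{l!\, g^{(2)}(x^*)}\, \frac{y^l}{\bar\lambda^{l/2-1}} \right) e^{-y^2/2}\, dy .
\end{aligned}
\tag{A.2}
$$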

In the second line, we have changed the variable of integration to $y = \sqrt{\bar\lambda}\,(x - x^*)$ and defined the shortcuts $\bar\lambda = \lambda g^{(2)}(x^*)$ as well as $y_a = \sqrt{\bar\lambda}\,(a - x^*)$ and $y_b = \sqrt{\bar\lambda}\,(b - x^*)$. The integrand in Eq. (A.2) can now be expanded in a series in powers of $\bar\lambda^{-1/2}$. The remaining integrals can be calculated recursively by partial integration,

$$\int_{y_a}^{y_b} y^n e^{-y^2/2}\, dy = -\,y^{n-1} e^{-y^2/2}\,\Big|_{y_a}^{y_b} + (n-1) \int_{y_a}^{y_b} y^{n-2} e^{-y^2/2}\, dy , \tag{A.3}$$

and be reduced to the case n = 0 for even n and n = 1 for odd n. Thus in principle, we can evaluate the integrals exactly in terms of elementary functions and the cumulative normal distribution function. However, the boundaries of the integral are asymptotically pushed to ±∞ since they contain a factor $\sqrt{\bar\lambda}$. Eq. (A.3) shows that the boundaries only yield exponentially small correction terms which do not contribute to the asymptotic expansion. We will be interested in the case a = 0 and b = ∞ and thus only need the moments of the Gaussian distribution. For odd n = 2m + 1 the integrals then vanish whereas for even n = 2m, we have

$$\int_{-\infty}^{+\infty} y^{2m} e^{-y^2/2}\, dy = \sqrt{2\pi}\,(2m-1)(2m-3)\cdots 3 \cdot 1 = \sqrt{2\pi}\, \frac{(2m)!}{2^m\, m!} . \tag{A.4}$$

Carefully expanding the integrand in Eq. (A.2) in inverse powers of λ and thereby the prefactor of

the Gaussian exponential in powers of y, we obtain the asymptotic expansion of I(λ) up to the first

two correction terms as follows,

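$$
\begin{aligned}
I(\lambda) = \sqrt{\frac{2\pi}{\bar\lambda}}\; e^{-\lambda g(x^*)} \Bigg[\, & f_0 + \frac{1}{\bar\lambda} \left( \frac{f_2}{2} - \frac{f_1 g_3}{2} - \frac{f_0 g_4}{8} + \frac{5 f_0 g_3^2}{24} \right) \\
& + \frac{1}{\bar\lambda^2} \bigg( \frac{f_4}{8} - \frac{5 f_3 g_3}{12} - \frac{5 f_2 g_4}{16} + \frac{35 f_2 g_3^2}{48} - \frac{f_1 g_5}{8} + \frac{35 f_1 g_3 g_4}{48} - \frac{35 f_1 g_3^3}{48} \\
& \qquad\quad - \frac{f_0 g_6}{48} + \frac{35 f_0 g_4^2}{384} + \frac{7 f_0 g_3 g_5}{48} - \frac{35 f_0 g_3^2 g_4}{64} + \frac{385 f_0 g_3^4}{1152} \bigg) + O(\bar\lambda^{-3}) \Bigg] ,
\end{aligned}
\tag{A.5}
$$

where we have defined the shortcuts $f_k = f^{(k)}(x^*)$ and $g_k = g^{(k)}(x^*)/g^{(2)}(x^*)$.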
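As a sanity check of the leading term and the first correction in Eq. (A.5), one can apply them to an integral with a known closed form. The sketch below is an illustration only and not part of the calculations in the thesis: it assumes the test case f(x) = 1 and g(x) = x − ln x on (0, ∞), for which x∗ = 1, g(x∗) = 1, g''(x∗) = 1, g'''(x∗) = −2, g''''(x∗) = 6 and the exact result is Γ(λ + 1)/λ^(λ+1). The function names `laplace_approx` and `gamma_integral_exact` are hypothetical.

```python
import math


def laplace_approx(lam, g0, g2, g3, g4, f0=1.0, f1=0.0, f2=0.0):
    """Leading term and first correction of the Laplace expansion.

    g0 is g(x*); g2, g3, g4 are the raw derivatives of g at the minimum x*.
    They are normalized by g2 below, following the convention g_k = g^(k)/g^(2).
    Returns (leading order, leading order plus first correction).
    """
    lam_bar = lam * g2
    n3, n4 = g3 / g2, g4 / g2
    prefactor = math.sqrt(2.0 * math.pi / lam_bar) * math.exp(-lam * g0)
    correction = (f2 / 2 - f1 * n3 / 2 - f0 * n4 / 8 + 5 * f0 * n3 ** 2 / 24) / lam_bar
    return prefactor * f0, prefactor * (f0 + correction)


def gamma_integral_exact(lam):
    """Exact value of int_0^inf exp(-lam * (x - ln x)) dx = Gamma(lam+1) / lam**(lam+1)."""
    return math.exp(math.lgamma(lam + 1.0) - (lam + 1.0) * math.log(lam))


if __name__ == "__main__":
    for lam in (5.0, 10.0, 50.0):
        lead, first = laplace_approx(lam, g0=1.0, g2=1.0, g3=-2.0, g4=6.0)
        exact = gamma_integral_exact(lam)
        print(lam, lead / exact - 1.0, first / exact - 1.0)
```

For this test case the first correction reduces to 1/(12λ), i.e. the familiar first term of the Stirling series, and the relative error drops from order 1/λ to order 1/λ² once it is included.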


Appendix B

Sigma functions

In this appendix, we compile analytical expressions for the two Σ-functions,

$$\Sigma(f) = \int^{f} \frac{d\varphi}{C(\varphi)} , \qquad \tilde\Sigma(f) = \int^{f} \frac{[C'(\varphi)]^2}{C(\varphi)}\, d\varphi , \tag{B.1}$$

for all choices of C(f) discussed in Eqs. (1.11) to (1.14) of the introduction. These integrals are

needed throughout the thesis in different contexts, in particular for the heat-kernel expansions.

The CEV model and the shifted log-normal model are particular cases of the shifted CEV model

in Eq. (1.13),

$$C(f) = (f + \theta)^{\beta} . \tag{B.2}$$

For this choice and 0 ≤ β < 1, the integrals read as,

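$$\Sigma(f) = \frac{(f+\theta)^{1-\beta}}{1-\beta} , \qquad \Sigma^{-1}(q) = \left[(1-\beta)\,q\right]^{\frac{1}{1-\beta}} - \theta , \qquad \tilde\Sigma(f) = -\frac{\beta^2}{(1-\beta)\,(f+\theta)^{1-\beta}} . \tag{B.3}$$

For β = 1, we have

$$\Sigma(f) = \tilde\Sigma(f) = \ln(f+\theta) , \qquad \Sigma^{-1}(q) = e^{q} - \theta . \tag{B.4}$$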

For the three regime model in Eq. (1.12), i.e.

$$
C(f) = \begin{cases}
f , & f \le f_L , \\
f_L , & f_L \le f \le f_R , \\
f_L + \kappa\,(f - f_R) , & f \ge f_R ,
\end{cases}
\tag{B.5}
$$

the integrals and the inverse can also be calculated analytically. We obtain

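$$
\Sigma(f) = \begin{cases}
\ln(f/f_L) , & f \le f_L , \\
\dfrac{f}{f_L} - 1 , & f_L \le f \le f_R , \\
\dfrac{f_R}{f_L} - 1 + \dfrac{1}{\kappa} \ln\left[ 1 + \kappa\, \dfrac{f - f_R}{f_L} \right] , & f \ge f_R ,
\end{cases}
\tag{B.6}
$$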

and

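$$
\Sigma^{-1}(q) = \begin{cases}
f_L\, e^{q} , & q \le 0 , \\
f_L\,(q + 1) , & 0 \le q \le \dfrac{f_R}{f_L} - 1 , \\
\dfrac{f_L}{\kappa} \left[ e^{\kappa \left( q - \frac{f_R}{f_L} + 1 \right)} - 1 \right] + f_R , & q \ge \dfrac{f_R}{f_L} - 1 ,
\end{cases}
\tag{B.7}
$$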

as well as

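$$
\tilde\Sigma(f) = \begin{cases}
\ln(f/f_L) , & f \le f_L , \\
0 , & f_L \le f \le f_R , \\
\kappa \ln\left[ 1 + \kappa\, \dfrac{f - f_R}{f_L} \right] , & f \ge f_R .
\end{cases}
\tag{B.8}
$$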
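The piecewise expressions for the three regime model translate directly into code. The following is a minimal sketch, assuming plain Python floats and the hypothetical function names `local_vol`, `sigma`, `sigma_inv` and `sigma_tilde` (the last one denotes the second Σ-function of Eq. (B.1)). The boundary between the second and third branch of the inverse is taken at q = Σ(fR) = fR/fL − 1, as it follows from Eq. (B.6).

```python
import math


def local_vol(f, f_l, f_r, kappa):
    """Three regime local-volatility function C(f) of Eq. (B.5)."""
    if f <= f_l:
        return f
    if f <= f_r:
        return f_l
    return f_l + kappa * (f - f_r)


def sigma(f, f_l, f_r, kappa):
    """Sigma(f): integral of 1/C, normalized to vanish at f = f_l."""
    if f <= f_l:
        return math.log(f / f_l)
    if f <= f_r:
        return f / f_l - 1.0
    return f_r / f_l - 1.0 + math.log1p(kappa * (f - f_r) / f_l) / kappa


def sigma_inv(q, f_l, f_r, kappa):
    """Inverse of sigma; q_r = Sigma(f_r) separates plateau and upper regime."""
    q_r = f_r / f_l - 1.0
    if q <= 0.0:
        return f_l * math.exp(q)
    if q <= q_r:
        return f_l * (q + 1.0)
    return f_l / kappa * math.expm1(kappa * (q - q_r)) + f_r


def sigma_tilde(f, f_l, f_r, kappa):
    """Second Sigma-function: integral of C'^2 / C, vanishing on the plateau."""
    if f <= f_l:
        return math.log(f / f_l)
    if f <= f_r:
        return 0.0
    return kappa * math.log1p(kappa * (f - f_r) / f_l)


if __name__ == "__main__":
    params = dict(f_l=1.0, f_r=6.0, kappa=0.25)
    for f in (0.5, 1.5, 4.0, 7.0):
        q = sigma(f, **params)
        print(f, q, sigma_inv(q, **params), sigma_tilde(f, **params))
```

A round trip `sigma_inv(sigma(f))` should reproduce f in all three regimes, which is a convenient test when fL differs from 1.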

Finally, for the cubic toy model in Eq. (1.14), i.e.

$$C(f) = \frac{f_c}{3} \left[ \left( \frac{f}{f_c} - 1 \right)^{3} + 1 \right] , \tag{B.9}$$

we obtain

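$$
\Sigma(f) = 3 \int_{0}^{\frac{f}{f_c} - 1} \frac{du}{1 + u^3}
= \frac{1}{2} \ln \frac{\left( \frac{f}{f_c} \right)^2}{\left( \frac{f}{f_c} - \frac{3}{2} \right)^2 + \frac{3}{4}}
+ \sqrt{3} \left\{ \frac{\pi}{6} + \arctan\left[ \frac{2}{\sqrt{3}} \left( \frac{f}{f_c} - \frac{3}{2} \right) \right] \right\} ,
\tag{B.10}
$$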

as well as

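$$
\tilde\Sigma(f) = 3 \int_{0}^{\frac{f}{f_c} - 1} \frac{u^4}{1 + u^3}\, du
= \frac{3}{2} \left( \frac{f}{f_c} - 1 \right)^2
+ \frac{1}{2} \ln \frac{\left( \frac{f}{f_c} \right)^2}{\left( \frac{f}{f_c} - \frac{3}{2} \right)^2 + \frac{3}{4}}
- \sqrt{3} \left\{ \frac{\pi}{6} + \arctan\left[ \frac{2}{\sqrt{3}} \left( \frac{f}{f_c} - \frac{3}{2} \right) \right] \right\} .
\tag{B.11}
$$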

The function in Eq. (B.10) does not seem to be invertible in terms of elementary functions.
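Since no elementary inverse is available, the inverse can be obtained numerically. The sketch below is one possible approach and not the implementation used for the results in this thesis: it evaluates the closed form of Eq. (B.10), checks it against a direct Simpson quadrature of 1/C(φ) from fc to f, and inverts it by bisection, which works because Σ(f) is strictly increasing for f > 0. The function names and the bracketing interval are assumptions made for illustration.

```python
import math


def local_vol_cubic(f, f_c):
    """Cubic local-volatility function of Eq. (B.9)."""
    return f_c / 3.0 * ((f / f_c - 1.0) ** 3 + 1.0)


def sigma_cubic(f, f_c):
    """Closed form of Eq. (B.10); vanishes at f = f_c."""
    r = f / f_c
    log_term = 0.5 * math.log(r ** 2 / ((r - 1.5) ** 2 + 0.75))
    atan_term = math.pi / 6.0 + math.atan(2.0 / math.sqrt(3.0) * (r - 1.5))
    return log_term + math.sqrt(3.0) * atan_term


def simpson(func, a, b, n=2000):
    """Composite Simpson rule with an even number n of sub-intervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3.0


def sigma_cubic_inv(q, f_c, lo_ratio=1e-12, hi_ratio=1e6, iterations=200):
    """Invert sigma_cubic by bisection on [lo_ratio * f_c, hi_ratio * f_c]."""
    lo, hi = lo_ratio * f_c, hi_ratio * f_c
    if not sigma_cubic(lo, f_c) < q < sigma_cubic(hi, f_c):
        raise ValueError("q lies outside the bracketed range of Sigma")
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if sigma_cubic(mid, f_c) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    f_c = 3.0
    for f in (0.5, 1.0, 3.0, 5.0):
        quad = simpson(lambda u: 1.0 / local_vol_cubic(u, f_c), f_c, f)
        print(f, sigma_cubic(f, f_c), quad)
```

Note that Σ(f) approaches the finite value 2π/√3 for f → ∞, so values of q at or above this bound have no preimage and are rejected by the bracket check.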


Appendix C

Moments of the exponential integral of Brownian motion

In this appendix, we give an elementary derivation of a recursion relation for the moments of the

distribution of total variance as needed in Sec. 4.4. The normalized volatility process follows the

SDE,

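$$d\alpha_s = \alpha_s\, dW_s , \tag{C.1}$$

with $\alpha_0 = 1$. The solution of this SDE is well-known to be

$$\alpha_s = e^{-\frac{1}{2} s + W_s} , \tag{C.2}$$

and thus the total variance is given by

$$A_s = \int_0^s \alpha_{s'}^2\, ds' = \int_0^s e^{-s' + 2 W_{s'}}\, ds' . \tag{C.3}$$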

Note that in this appendix, we drop the superscript $(-\tfrac{1}{2})$ from the expression for the exponential integral $A_t$ of Brownian motion for notational convenience. The expectation of $A_s$ can now be expressed as follows,

$$\langle A_s \rangle = E[A_s | \mathcal{F}_0] = \int_0^s e^{-s'}\, E[e^{2 W_{s'}} | \mathcal{F}_0]\, ds' = \int_0^s e^{s'}\, ds' = e^{s} - 1 . \tag{C.4}$$

Here, we have used that $E[e^{2 W_s} | \mathcal{F}_0] = e^{2s}$. To show this, consider the process $X_t = e^{x W_t - x^2 t/2}$. It is a martingale, since by Ito's Lemma, we have $dX_t = x X_t\, dW_t$, and thus no drift term. Consequently, we obtain for $s_1 < s_2$,

$$E[e^{x (W_{s_2} - W_{s_1})} | \mathcal{F}_{s_1}] = e^{\frac{x^2 (s_2 - s_1)}{2}} . \tag{C.5}$$

In particular for $s_2 = s$ and $s_1 = 0$, this yields $E[e^{x W_s}] = e^{x^2 s/2}$. The second moment of the distribution is given by

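$$
\begin{aligned}
\langle A_s^2 \rangle &= \int_0^s ds_1 \int_0^s ds_2\; e^{-s_1 - s_2}\, E[e^{2 (W_{s_1} + W_{s_2})} | \mathcal{F}_0] \\
&= 2 \int_0^s ds_1 \int_{s_1}^s ds_2\; e^{-s_1 - s_2}\, E[e^{4 W_{s_1} + 2 (W_{s_2} - W_{s_1})} | \mathcal{F}_0] \\
&= 2 \int_0^s ds_1 \int_{s_1}^s ds_2\; e^{-s_1 - s_2}\, E\big[ e^{4 W_{s_1}}\, E[e^{2 (W_{s_2} - W_{s_1})} | \mathcal{F}_{s_1}] \,\big|\, \mathcal{F}_0 \big] \\
&= 2 \int_0^s ds_1 \int_{s_1}^s ds_2\; e^{-s_1 - s_2}\, e^{8 s_1}\, e^{2 (s_2 - s_1)} \\
&= \frac{1}{15}\, e^{6s} - \frac{2}{5}\, e^{s} + \frac{1}{3} .
\end{aligned}
\tag{C.6}
$$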

For the centralized moment, this yields

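$$\langle (A_s - \langle A_s \rangle)^2 \rangle = \langle A_s^2 \rangle - \langle A_s \rangle^2 = \frac{1}{15}\, e^{6s} - e^{2s} + \frac{8}{5}\, e^{s} - \frac{2}{3} . \tag{C.7}$$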

We can now proceed along the same line for higher moments, obtaining

$$\langle A_t^n \rangle = n! \int_0^t ds_1 \int_{s_1}^t ds_2 \cdots \int_{s_{n-1}}^t ds_n\; e^{-\sum_{i=1}^n s_i}\, E\big[ e^{2 \sum_{i=1}^n W_{s_i}} \big] . \tag{C.8}$$

As before, we separate the sum in the exponent into independent increments

$$\sum_{i=1}^{n} W_{s_i} = \sum_{i=0}^{n-1} (n - i)\, (W_{s_{i+1}} - W_{s_i}) , \tag{C.9}$$

where for convenience, we have defined s0 = 0. Thus, taking the expectation value yields

$$E\big[ e^{2 \sum_{i=1}^n W_{s_i}} \big] = e^{2 \sum_{i=0}^{n-1} (n-i)^2 (s_{i+1} - s_i)} = e^{\sum_{i=1}^n [4(n-i)+2]\, s_i} . \tag{C.10}$$

It is then easily shown by induction, that the remaining integral

$$I_n = \int_s^t ds_1 \int_{s_1}^t ds_2 \cdots \int_{s_{n-1}}^t ds_n\; e^{\sum_{i=1}^n [4(n-i)+1]\, s_i} , \tag{C.11}$$

has a representation of the form

$$I_n = \sum_{k=0}^{n} c_{n,k}\, e^{a_k t + (a_n - a_k) s} , \tag{C.12}$$

with $a_k = 2k^2 - k$, $c_{1,1} = -c_{1,0} = 1$. The moments can be expressed in terms of these coefficients as

$$\langle A_s^n \rangle = n! \sum_{k=0}^{n} c_{n,k}\, e^{a_k s} . \tag{C.13}$$

From the induction step, the following recursion relation for the coefficients emerges

$$
\begin{aligned}
c_{n+1,k} &= -\frac{c_{n,k}}{a_{n+1} - a_k} , \qquad k = 0, \dots, n , \\
c_{n+1,n+1} &= \sum_{k=0}^{n} \frac{c_{n,k}}{a_{n+1} - a_k} .
\end{aligned}
\tag{C.14}
$$

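Starting from $c_{1,1} = -c_{1,0} = 1$, we then obtain for the first few coefficients,

$$c_{2,0} = \frac{1}{6} , \qquad c_{2,1} = -\frac{1}{5} , \qquad c_{2,2} = \frac{1}{30} , \tag{C.15}$$

reproducing Eq. (C.6) as well as

$$c_{3,0} = -\frac{1}{90} , \qquad c_{3,1} = \frac{1}{70} , \qquad c_{3,2} = -\frac{1}{270} , \qquad c_{3,3} = \frac{1}{1890} , \tag{C.16}$$

and thus

$$\langle A_s^3 \rangle = \frac{1}{315}\, e^{15s} - \frac{1}{45}\, e^{6s} + \frac{3}{35}\, e^{s} - \frac{1}{15} . \tag{C.17}$$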
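The recursion in Eq. (C.14) is easy to automate. The following sketch, with the hypothetical function names `a`, `coefficients` and `moment`, uses exact rational arithmetic from Python's standard `fractions` module to generate the coefficients c_{n,k} and then evaluates the raw moments via Eq. (C.13). It is meant as an illustration of the recursion and not as the code used to produce the results of Sec. 4.4.

```python
import math
from fractions import Fraction


def a(k):
    """Exponents a_k = 2 k^2 - k appearing in Eq. (C.12)."""
    return 2 * k * k - k


def coefficients(n_max):
    """Return {n: [c_{n,0}, ..., c_{n,n}]} for n = 1..n_max via Eq. (C.14)."""
    c = {1: [Fraction(-1), Fraction(1)]}  # c_{1,0} = -1, c_{1,1} = 1
    for n in range(1, n_max):
        a_next = a(n + 1)
        row = [-c[n][k] / (a_next - a(k)) for k in range(n + 1)]
        row.append(sum(c[n][k] / (a_next - a(k)) for k in range(n + 1)))
        c[n + 1] = row
    return c


def moment(n, s, c):
    """Raw moment <A_s^n> = n! * sum_k c_{n,k} exp(a_k s), Eq. (C.13)."""
    total = sum(float(c[n][k]) * math.exp(a(k) * s) for k in range(n + 1))
    return math.factorial(n) * total


if __name__ == "__main__":
    c = coefficients(4)
    for n in (2, 3, 4):
        print(n, [str(x) for x in c[n]])
    s = 0.3
    variance = moment(2, s, c) - moment(1, s, c) ** 2
    print(variance)
```

Because the exponentials exp(a_k s) grow rapidly with k while the coefficients alternate in sign, the floating-point sum in `moment` can lose accuracy for small s and large n; keeping the coefficients as exact fractions at least avoids rounding errors in the recursion itself.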

Using the binomial formula, we can use this result to obtain the centralized third moment,

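$$\langle (A_s - \langle A_s \rangle)^3 \rangle = \langle A_s^3 \rangle - 3\, \langle A_s^2 \rangle \langle A_s \rangle + 2\, \langle A_s \rangle^3 . \tag{C.18}$$

The result of the algebra is summarized in Eq. (4.22). Note that the normalization of the moments is such that $\langle (A_s - \langle A_s \rangle)^n \rangle = s^n M_c^{(n)}(s)$.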

Appendix D

Hagan’s approximation

In this appendix, we give a brief summary of the original asymptotic formulas for the SABR model

derived by Hagan and co-workers for convenient reference. We slightly change the notation for

better comparison with the results presented in this thesis. In the original paper [27], they give the

following formula for the implied volatility as their central result,

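$$
\begin{aligned}
\sigma_B(f,K) = {} & \frac{\alpha}{(fK)^{(1-\beta)/2} \left\{ 1 + \frac{(1-\beta)^2}{24} \log^2 (f/K) + \frac{(1-\beta)^4}{1920} \log^4 (f/K) + \dots \right\}} \cdot \frac{\zeta}{x(\zeta)} \\
& \cdot \left\{ 1 + \left[ \frac{(1-\beta)^2}{24}\, \frac{\alpha^2}{(fK)^{1-\beta}} + \frac{1}{4}\, \frac{\rho \beta \nu \alpha}{(fK)^{(1-\beta)/2}} + \frac{2 - 3\rho^2}{24}\, \nu^2 \right] \tau + \dots \right\} ,
\end{aligned}
\tag{D.1}
$$

where

$$\zeta = \frac{\nu}{\alpha}\, (fK)^{(1-\beta)/2} \log (f/K) , \qquad x(z) = \log\left\{ \frac{\sqrt{1 - 2\rho z + z^2} + z - \rho}{1 - \rho} \right\} . \tag{D.2}$$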
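For reference, Eqs. (D.1) and (D.2) can be coded in a few lines. The sketch below truncates both series at the terms written out explicitly above and assumes |ρ| < 1, α > 0 and positive f and K; the function names `x_of` and `hagan_implied_vol` and the at-the-money threshold of 1e-8 are choices made for this illustration only.

```python
import math


def x_of(z, rho):
    """The function x(z) of Eq. (D.2); requires |rho| < 1."""
    return math.log((math.sqrt(1.0 - 2.0 * rho * z + z * z) + z - rho) / (1.0 - rho))


def hagan_implied_vol(f, K, tau, alpha, beta, rho, nu):
    """Black implied volatility from Eq. (D.1), truncated at the displayed terms."""
    log_fk = math.log(f / K)
    fk_pow = (f * K) ** ((1.0 - beta) / 2.0)
    denom = fk_pow * (
        1.0
        + (1.0 - beta) ** 2 / 24.0 * log_fk ** 2
        + (1.0 - beta) ** 4 / 1920.0 * log_fk ** 4
    )
    zeta = nu / alpha * fk_pow * log_fk
    # zeta / x(zeta) tends to 1 for zeta -> 0 (at the money or nu = 0).
    ratio = 1.0 if abs(zeta) < 1e-8 else zeta / x_of(zeta, rho)
    time_corr = (
        (1.0 - beta) ** 2 / 24.0 * alpha ** 2 / fk_pow ** 2
        + 0.25 * rho * beta * nu * alpha / fk_pow
        + (2.0 - 3.0 * rho ** 2) / 24.0 * nu ** 2
    )
    return alpha / denom * ratio * (1.0 + time_corr * tau)


if __name__ == "__main__":
    f, tau = 1.5, 1.0
    for K in (0.5, 1.0, 1.5, 2.0, 3.0):
        print(K, hagan_implied_vol(f, K, tau, alpha=0.3, beta=0.5, rho=-0.5, nu=0.5))
```

Two simple checks are useful when testing such an implementation: for β = 1 and ν = 0 the formula must return α for every strike, and for β = 1 and ρ = 0 the smile must be symmetric in the log-moneyness log(f/K).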

This seems to be the formula most widely used in practice to fit the form of the smile. In intermediate

steps of the derivation, they also give a similar formula in Eq. (B.65) of [27] for a general C(f). Both

formulas are of the form (see also [45]),

$$\sigma_B(f,K) = I_0(f,K)\, [1 + \tau I_1(f,K)] + O(\tau^2) . \tag{D.3}$$

For the general C(f), Eq. (B.65) in [27] yields

$$I_0^{H}(f,K) = \frac{\nu \log (f/K)}{x(\zeta)}\, \frac{\zeta}{z} , \tag{D.4}$$

where

$$z = \frac{\nu}{\alpha} \int_K^f \frac{df'}{C(f')} , \tag{D.5}$$

and $\zeta = \frac{\nu}{\alpha} \frac{f - K}{C(f_{av})}$, with the geometric average $f_{av} = \sqrt{fK}$, is an approximation of z for small f − K. As a general first order term, they give

$$I_1(f,K) = \frac{2\gamma_2 - \gamma_1^2 + 1/f_{av}^2}{24}\, \alpha^2 C^2(f_{av}) + \frac{1}{4}\, \rho \nu \alpha \gamma_1 C(f_{av}) + \frac{2 - 3\rho^2}{24}\, \nu^2 , \tag{D.6}$$

where $\gamma_1 = C'(f_{av})/C(f_{av})$ and $\gamma_2 = C''(f_{av})/C(f_{av})$.

Comparing with the heat-kernel approximation, we recognize $z = \frac{\nu}{\alpha} \Delta\Sigma$ and $|x(z)| = d_{min}/\sqrt{2}$ as the minimal geodesic distance. However, the prefactor of the heat-kernel result for the implied volatility differs from Eq. (D.4) and in their notation reads as

$$I_0^{hk}(f,K) = \frac{\nu \log (f/K)}{x(z)} . \tag{D.7}$$
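The two prefactors can be compared numerically for an arbitrary local volatility function. The following sketch assumes that C(f) is passed as a Python callable, that f ≠ K, and that the integral in Eq. (D.5) is evaluated with a composite Simpson rule; the function names `z_exact`, `i0_heat_kernel` and `i0_hagan` are hypothetical and the quadrature is only one of several reasonable choices.

```python
import math


def x_of(z, rho):
    """The function x(z) of Eq. (D.2); requires |rho| < 1."""
    return math.log((math.sqrt(1.0 - 2.0 * rho * z + z * z) + z - rho) / (1.0 - rho))


def simpson(func, a, b, n=2000):
    """Composite Simpson rule with an even number n of sub-intervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3.0


def z_exact(f, K, alpha, nu, C):
    """z of Eq. (D.5): (nu / alpha) times the integral of 1/C from K to f."""
    return nu / alpha * simpson(lambda u: 1.0 / C(u), K, f)


def i0_heat_kernel(f, K, alpha, nu, rho, C):
    """Leading-order prefactor of Eq. (D.7); not defined for f == K."""
    return nu * math.log(f / K) / x_of(z_exact(f, K, alpha, nu, C), rho)


def i0_hagan(f, K, alpha, nu, rho, C):
    """Leading-order prefactor of Eq. (D.4) with zeta evaluated at f_av."""
    f_av = math.sqrt(f * K)
    zeta = nu / alpha * (f - K) / C(f_av)
    z = z_exact(f, K, alpha, nu, C)
    return nu * math.log(f / K) / x_of(zeta, rho) * zeta / z


if __name__ == "__main__":
    cev = lambda u: u ** 0.5  # CEV local volatility with beta = 0.5
    for K in (0.1, 0.5, 1.0, 2.0, 3.0):
        print(K,
              i0_heat_kernel(1.5, K, 0.3, 0.5, -0.5, cev),
              i0_hagan(1.5, K, 0.3, 0.5, -0.5, cev))
```

Close to the money both expressions approach αC(f)/f, whereas they start to differ at strikes far from f, where ζ is no longer a good approximation of z.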

Obłój [45] has pointed out this inconsistency and showed that using Eq. (D.7) instead of Eq. (D.4) mitigates the degeneracy of the Hagan approximation for very low strikes. By using the approximate ζ instead of z, an approximation for small moneyness is introduced. Also, in Eq. (D.1) an additional expansion of z close to the at-the-money point is used. Thus, the Hagan approximation is effectively a double expansion in time to maturity τ and moneyness. Note also that it is somewhat inconsistent to use log f/K and powers of K in the same formulas, as the former implies an expansion in moneyness.

In a later paper [28], Hagan et al. also give an explicit formula for the effective local (normal)

volatility, which in our notation reads as

Ceff.(K) = αC(K)

[

1 +α3C′(f)[ρ cosh(dmin/

√2) − sinh(dmin/

√2)]

α2αC′(f)dmin/√

2 + 2√

1 − ρ2α2νν2τ + . . .

]

. (D.8)

This should be compared to Eq. (3.115); the correction term proportional to τ does, however, not seem to agree. Also, they erroneously state that their result is a 'refinement of the original expression for the implied volatility', whereas their calculations suggest that the quantity is in fact a local volatility.

References

[1] M. Abramowitz and I.A. Stegun. Handbook of mathematical functions with formulas, graphs, and mathematical tables, 55. Dover publications, 1964.

[2] J. Andreasen and B. Huge. ZABR–expansions for the masses. http://ssrn.com/abstract=1980726, 2012.

[3] F. Antonelli and S. Scarlatti. Pricing options under stochastic volatility: a power series approach. Finance Stoch., 13:269–303, 2009.

[4] M. Avellaneda. From SABR to geodesics. Presentation available at https://www.math.nyu.edu/faculty/avellane/Lecture12Quant.pdf, 2009.

[5] I. G. Avramidi. Analytic and geometric methods for heat kernel applications in finance. http://infohost.nmt.edu/ iavramid/notes/hkt/hktutorial13.pdf, 2007.

[6] B. Bartlett. Hedging under SABR model. Wilmott Magazine, pages 2–4, 2006.

[7] S. Benaim, P. Friz, and R. Lee. On the Black-Scholes implied volatility at extreme strikes. Frontiers in Quantitative Finance, pages 19–45, 2008.

[8] H. Berestycki, J. Busca, and I. Florent. Asymptotics and calibration of local volatility models. Quantitative Finance, 2(1):61–69, 2002.

[9] H. Berestycki, J. Busca, and I. Florent. Computing the implied volatility in stochastic volatility models. Comm. Pure Appl. Math., 57(10):1352–1373, 2004.

[10] F. Black. The pricing of commodity contracts. J. of Financial Economics, 3:167–179, 1976.

[11] F. Black and M. Scholes. The pricing of options and corporate liabilities. Journal of Political Economy, 81:631–659, 1973.

[12] M. Broadie and O. Kaya. Exact simulation of stochastic volatility and other affine jump diffusion processes. Operations Research, pages 217–231, 2006.

[13] Sean M. Carroll. Lecture notes on general relativity. arXiv:gr-qc/9712019, 1997.

[14] B. Chen, C.W. Oosterlee, and J.A.M. Weide. A low-bias simulation scheme for the SABR stochastic volatility model. International Journal of Theoretical and Applied Finance, 2012.

[15] J.C. Cox and S.A. Ross. The valuation of options for alternative stochastic processes. Journal of Financial Economics, 3(1):145–166, 1976.

[16] E. Derman and I. Kani. Riding on a smile. Risk, pages 277–284, 1994.

[17] D. Duffie and P. Glynn. Efficient Monte Carlo simulation of security prices. The Annals of Applied Probability, 5(4):897–905, 1995.

[18] B. Dupire. Pricing with a smile. Risk, 7:18–20, 1994.

[19] M. Forde. Exact pricing and large-time asymptotics for the modified SABR model and the Brownian exponential functional. Int. J. Theor. Appl. Fin., 14:119, 2011.

[20] M. Forde. Stocvol smile toolpack documentation: a survey of asymptotic results for stochastic volatility and exponential Lévy models. Available at http://webpages.dcu.ie/ fordem/research/FormulaSheet.pdf, 2011.

[21] M. Forde and A. Pogudin. The large-maturity smile for the SABR and CEV-Heston models, 2011.

[22] J.-P. Fouque, G. Papanicolaou, K.R. Sircar, and K. Solna. Singular perturbations in option pricing. SIAM J. Appl. Math., 63(5):1648–1665, 2003.

[23] J.P. Fouque, G. Papanicolaou, and K.R. Sircar. Mean-reverting stochastic volatility. International Journal of Theoretical and Applied Finance, 3(1):101–142, 2000.

[24] J.P. Fouque, G. Papanicolaou, R. Sircar, K. Solna, et al. Multiscale stochastic volatility asymptotics. Multiscale Modeling and Simulation, 2:22–42, 2004.

[25] J. Gatheral, E.P. Hsu, P. Laurence, C. Ouyang, and T.H. Wang. Asymptotics of implied volatility in local volatility models. Mathematical Finance, 2010.

[26] N. de Guillaume, R. Rebonato, and A. Pogudin. The nature of the dependence of the magnitude of rate moves on the level of rates: A universal relationship. preprint, November 2010.

[27] P. S. Hagan, D. Kumar, A. S. Lesniewski, and D. E. Woodward. Managing smile risk. Wilmott Magazine, pages 84–108, July 2002.

[28] P.S. Hagan, A.S. Lesniewski, and D.E. Woodward. Probability distribution in the SABR model of stochastic volatility. preprint, March 2005.

[29] P.S. Hagan and D.E. Woodward. Equivalent Black volatilities. Applied Mathematical Finance, 6:147–157, 1999.

[30] P. Henry-Labordère. A general asymptotic implied volatility for stochastic volatility models. Available at SSRN: http://ssrn.com/abstract/698601, 2005.

[31] P. Henry-Labordère. Solvable local and stochastic volatility models: Supersymmetric methods in option pricing. arXiv:cond-mat/0511028, 2005.

[32] P. Henry-Labordère. Unifying the BGM and SABR models: A short ride in hyperbolic geometry. arXiv:cond-mat/0504317, 2005.

[33] S. Heston. A closed-form solution for options with stochastic volatility with applications to bond and currency options. Review Financ. Stud., 6:327–343, 1993.

[34] J. Hull and A. White. The pricing of options on assets with stochastic volatilities. Journal of Finance, pages 281–300, 1987.

[35] A. Q. M. Khaliq and E. H. Twizell. L0-stable splitting methods for the simple heat equation in two space dimensions with homogeneous boundary conditions. SIAM Journal on Numerical Analysis, pages 473–484, 1986.

[36] T. Kluge. Pricing derivatives in stochastic volatility models using the finite difference method. Thesis, Technische Universität Chemnitz, Fakultät für Mathematik, 2002.

[37] A. Lesniewski. WKB method for swaption smile. Courant Institute Lecture, available at http://lesniewski.us/papers/presentations/Courant020702.pdf, 2002.

[38] A. Lesniewski. The uses of differential geometry in finance. Bloomberg, available at http://lesniewski.us/papers/presentations/Bloomberg112505.pdf, November 2005.

[39] A. Lewis. Geometries and smile asymptotics for a class of stochastic volatility models. www.optioncity.net, 2007.

[40] A. L. Lewis. Option valuation under stochastic volatility. Finance press, 2000.

[41] A. Lindsay and D. Brecher. Results on the CEV Process, Past and Present. SSRN eLibrary, 2010.

[42] R. Lord, R. Koekkoek, and D. Van Dijk. A comparison of biased simulation schemes for stochastic volatility models. Quantitative Finance, 10(2):177–194, 2010.

[43] H. Matsumoto and M. Yor. Exponential functionals of Brownian motion, I: Probability laws at fixed time. Probability Surveys, 2:312–347, 2005.

[44] R.C. Merton. Theory of rational option pricing. The Bell Journal of Economics and Management Science, 4(1):141–183, 1973.

[45] J. Obłój. Fine tune your smile, correction to Hagan et al. arXiv:0708.0998, 2008.

[46] L. Paulot. Asymptotic implied volatility at the second order with application to the SABR model. Order, 2:20, 2009.

[47] R. Rebonato. Which process gives rise to the observed dependence of swaption implied volatilities on the underlying? Intern. J. of Theor. and Appl. Finance, 6(4):419–442, 2003.

[48] R. Rebonato and J. P. Barjaktarevic. Improving on the Hagan expansion, presentation slides. private communication, 2011.

[49] R. Rebonato and J. Chen. Evidence for state transition and altered serial codependence in US $ interest rates. Quantitative Finance, 9(3):259–278, 2009.

[50] R. Rebonato and D. Kainth. A two-regime stochastic-volatility extension of the LIBOR market model. Intern. J. of Theoretical and Applied Finance, 7(5):555–576, 2004.

[51] R. Rebonato, K. McKay, and R. White. The SABR/LIBOR Market Model. Wiley, 2009.

[52] M. Schroder. Computing the constant elasticity of variance option pricing formula. Journal of Finance, pages 211–219, 1989.

[53] R. Sheppard. Pricing equity derivatives under stochastic volatility: A partial differential equation approach. PhD thesis, Faculty of Science, University of the Witwatersrand, 2007.

[54] S.E. Shreve. Stochastic calculus for finance: Continuous-time models, 2. Springer, 2004.

[55] S.M. Taylor. Explicit density approximations for local volatility models using heat kernel expansions. preprint available at http://ssrn.com/abstract=1800415, 2011.

[56] S.R.S. Varadhan. Diffusion processes in a small time interval. Communications on Pure and Applied Mathematics, 20(4):659–685, 1967.

[57] S.R.S. Varadhan. On the behavior of the fundamental solution of the heat equation with variable coefficients. Communications on Pure and Applied Mathematics, 20(2):431–455, 1967.

[58] D.V. Vassilevich. Heat kernel expansion: user's manual. Physics Reports, 388(5-6):279–360, 2003.

[59] N.N. Yanenko. A difference method of solution in the case of the multidimensional equation of heat conduction. In Dokl. Akad. Nauk SSSR, 125, pages 1207–1210, 1959.
