
IN DEGREE PROJECT MATHEMATICS, SECOND CYCLE, 30 CREDITS
STOCKHOLM, SWEDEN 2017

Calibrating the Hull-White model using Adjoint Algorithmic Differentiation

SIMON CARMELID

KTH ROYAL INSTITUTE OF TECHNOLOGY
SCHOOL OF ENGINEERING SCIENCES

Degree Projects in Financial Mathematics (30 ECTS credits)
Degree Programme in Mathematics
KTH Royal Institute of Technology, year 2017
Supervisor at KTH: Henrik Hult
Examiner at KTH: Henrik Hult

TRITA-MAT-E 2017:57
ISRN-KTH/MAT/E--17/57--SE
Royal Institute of Technology
School of Engineering Sciences
KTH SCI
SE-100 44 Stockholm, Sweden
URL: www.kth.se/sci

Abstract

This thesis includes a brief introduction to Adjoint Algorithmic Differentiation (AAD), accompanied by numerical examples, step-by-step explanations and runtime comparisons to a finite difference method. In order to show the applicability of AAD in a stochastic setting, it is also applied in the calculation of the arbitrage free price and partial derivatives of a European call option, where the underlying stock has Geometric Brownian motion dynamics.

Finally the Hull-White model is calibrated using a set of zero coupon bonds and a set of swaptions. Using AAD, the partial derivatives of the model are found and used in a Newton-Raphson method in order to find the market's implied volatility. The end result is a Monte Carlo simulated short rate curve and its derivatives with respect to the calibration parameters, i.e. the zero coupon bond and swaption prices.

Sammanfattning (Swedish summary, translated)

This thesis contains an introduction to Adjoint Algorithmic Differentiation (AAD), together with numerical examples, step-by-step descriptions and runtime comparisons with a finite difference method. To illustrate the applicability of AAD in a stochastic framework, the method is applied in the computation of the arbitrage free price and the partial derivatives of a European call option, where the underlying stock has geometric Brownian dynamics.

Finally, the Hull-White model is calibrated using a number of zero coupon bonds and swaptions. Via AAD, the partial derivatives of the model are computed and then used in the Newton-Raphson method to find the market's implied volatility. The end result is a Monte Carlo simulated short rate curve and its derivatives with respect to the calibration parameters, i.e. the zero coupon bond and swaption prices.

Acknowledgements

First of all, I'd like to acknowledge the helpful guidance and input from Professor Henrik Hult. You course-corrected me when needed and provided much needed input in times of trouble. Secondly, my family, for your continued support throughout my studies. Especially my fiancée Nina: this would not have been possible without your help and backing!

Contents

1 Introduction
  1.1 Stochastic Differential Equations
  1.2 Monte Carlo simulations

2 Adjoint Algorithmic Differentiation
  2.1 Inner workings of the CPU
  2.2 Finite Difference Methods
  2.3 Adjoint Algorithmic Differentiation
    2.3.1 Example
  2.4 Chaining
  2.5 Performance of AAD
  2.6 Notes on recursion

3 AAD in the stochastic setting
  3.1 Geometric Brownian Motion
  3.2 Numerical results

4 The Hull-White model
  4.1 Short Rate Models
  4.2 Choosing θ(t)
  4.3 Volatility Calibration
  4.4 The case when σ(t) is piecewise constant

5 Calibrating the Hull-White model
  5.1 Notations
  5.2 The Assignment Function
  5.3 Choosing f*(0, T)
    5.3.1 Adjoint of f*(0, T)
  5.4 Adjoint of ϕ(t)
  5.5 Discretization of the diffusion process
  5.6 Black's Formula for Swaptions
    5.6.1 Discount factors p(0, T)
    5.6.2 Accrual factor and forward swap rate
  5.7 Newton-Raphson
  5.8 Numerical Results

6 Conclusions

7 Bibliography

A Useful Proofs

Chapter 1

Introduction

Risk management has always been of interest for banks and investors, and following the financial crisis of 2008, the emphasis on risk management and reporting has increased, with several new regulations introduced in the EU. Also, with the electrification of securities trading that began in the 1970s and led to most of Europe's exchanges being fully automated in the 1990s, high-frequency trading emerged as a way of gaining an edge on the market (see (10)). Thus, with more and more complicated products, traded at higher frequency, and an increased emphasis on risk management, there is a need to develop faster methods for obtaining sensitivities with respect to model parameters (see (1)). While several approaches have been proposed (see for example (9), (7)), this thesis will focus on Adjoint Algorithmic Differentiation (AAD for short). AAD is sometimes referred to as Automatic Differentiation, and the idea was first introduced as early as 1964 (see (15)). An excellent source on AAD is the book by Griewank & Walther (see (11)).

A model often encountered in financial mathematics is the Hull-White one factor model for short rates. The model has several appealing properties, since it is possible to fit the model to an initial yield curve and to let the volatility be time dependent. However, existing studies on calibrating the Hull-White model with time-dependent parameters are rather scarce (see (13)). The objective of this thesis is to propose a method for calibrating the Hull-White model and to find its sensitivities with respect to the calibration parameters.

1.1 Stochastic Differential Equations

For most pricing problems, there exists an element of uncertainty about what the future value will be. The value of some financial contract is thus derived from some unpredictable process, which determines the value of the contract at maturity. Contracts of this form are called derivatives and, in a sense, almost any financial contract can be viewed as a derivative of some process. A stock price is a derivative of the market valuation of the company, and a savings account is a derivative of the interest rate. Even lottery tickets can be viewed as derivatives, in which case the underlying process is the numbers drawn in the lottery. In all of these processes, the investor estimates the behavior of the underlying process in order to determine if the listed price is fair or not. In an arbitrage-free market, the expected outcome according to the market is represented in the current price of the asset. Since the underlying process is unknown, it is often modeled by a stochastic process, with the general representation

dX(t) = α(t, X(t))dt + σ(t, X(t))dW(t).   (1.1)

The function α(t, X(t)) represents a drift (seasonal drift, mean reverting drift, constant drift, etc.), and σ(t, X(t)) represents the volatility at time t, with the underlying process at X(t). W(t) is a one-dimensional Wiener process, responsible for the randomness in the process. The general representation (1.1) is an example of a stochastic differential equation (SDE). SDEs are a central part of financial mathematics and modeling, and hence of this thesis.

1.2 Monte Carlo simulations

Another central part of financial modeling is to determine, today, the value of an asset at a future time T. For deterministic processes this is simply a matter of calculating the result, but in financial markets there are seldom any deterministic assets. Instead, one simulates the value process of a given asset based on a set of assumptions, including drift and volatility. In order to simulate a stochastic process such as (1.1) on a computer, a Wiener process is generated such that the value of (1.1) can be computed. However, with only one Wiener process generated, the result is just one random outcome out of all possible outcomes of the simulation, and does not say much about the distribution of possible values at time T. When simulating the process a large number of times and averaging the result, however, the mean will tend toward the expected value of the process. Each simulated process X_j is often referred to as a path, and the expected value of X(T) after a large number of simulations N is then

E[X(T)] ≈ (1/N) Σ_{j=1}^{N} X_j(T).   (1.2)

Figure 1.1: Sample of 2000 iid ∼ N(0, 1), and the first 1000 together with their antithetic counterparts (density plotted against output; non-symmetric vs. symmetric samples).

The result from generating many random samples will converge towards the true probability distribution of the process. However, the convergence can sometimes be slow, and a way of speeding it up is to use antithetic sampling (see (9)). In Figure 1.1, 2000 standard normal variables are plotted, as well as the first 1000 of those together with their antithetic (or symmetric) counterparts. The antithetic sample converges faster towards the standard normal density function, and will be used in all Monte Carlo simulations in this thesis.
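As a minimal sketch of the idea in R (the implementation language used in this thesis; the variable names are ours), the antithetic sample is produced by mirroring each draw around zero:

# Antithetic sampling: mirror each draw, so the mean of the combined
# sample of any odd function of Z is exactly zero by construction.
set.seed(1)
z_half  <- rnorm(1000)           # 1000 iid N(0,1) draws
z_anti  <- c(z_half, -z_half)    # 2000 values: originals plus antithetic twins
z_plain <- rnorm(2000)           # plain sample of the same size, for comparison
mean(z_anti)                     # exactly 0
mean(z_plain)                    # nonzero sampling error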


Chapter 2

Adjoint Algorithmic Differentiation

Typically, when using a computer to compute derivatives of a function, a finite difference method, also known as a bump-and-run technique, is used. These methods are based on the assumption that over a sufficiently small interval [x, x + Δx], a function F(x) is approximately linear. These methods are explained further in Section 2.2. When using algorithmic differentiation, however, there is no need for the linearity assumptions associated with finite difference approximations.

The benefits of using algorithmic differentiation are twofold. First of all, the desired derivatives are computed at machine accuracy, and truncation errors are avoided (14). Second, the computation of first order gradients can be completed at a small constant multiple of the cost of computing the original function. According to Griewank ((12)) and Capriotti ((4), (5)), this multiple is bounded at around 4; however, Neuman ((14)) claims it typically ranges between 3 and 30. Regardless, it is often a vast improvement over the finite difference methods, which scale linearly with the number of inputs n.

For complex models with a large number of inputs, there is a limiting factor in terms of computing power when using the "brute force" methods of the past, which is a reason why algorithmic differentiation, or AD for short, has gained interest in recent years. Before explaining further what AD is, let's examine the following function

f(x₁, x₂, x₃, x₄) = (x₁x₂/x₃ + e^{sin(x₄)} − sin(x₄)cos(x₃) + x₁/x₂)(x₁ − x₂x₃)².   (2.1)

The partial derivatives of (2.1) can be derived analytically in this example, but doing so can in many real world situations be a very tedious task. Furthermore, the resulting expressions can be even more complex than the original model. As an example, consider

∂f/∂x₃ = (−x₁x₂/x₃² + sin(x₄)sin(x₃))(x₁ − x₂x₃)² + (−2x₂)(x₁x₂/x₃ + e^{sin(x₄)} − sin(x₄)cos(x₃) + x₁/x₂)(x₁ − x₂x₃),   (2.2)

which is a lot more complex than the original function (2.1). However, notice the similarities between the two expressions (2.1) and (2.2). With a bit of clever design, there are many terms which need only be calculated once, and then reused to speed up the calculations. The idea of reusing past calculations will be exemplified in Section 2.3.

2.1 Inner workings of the CPU

In order to fully appreciate the advantages and drawbacks of AAD, a brief introduction into how the computer actually calculates is in order. It is common knowledge that the computer works with ones and zeros, or rather high and low voltage signals, in order to process what the user asks from it. Most of these signals are handled within the central processing unit, the CPU.

The central processing unit contains various parts, such as a clock, a program counter, registers, cache memory and an arithmetic logic unit (ALU). The various parts of the CPU communicate and send data to each other via a central bus. What the CPU does is determined by the codes in the currently loaded program, which are interpreted by the control logic to instruct other parts of the CPU whether to load or store data from the central bus.

The clock in the CPU determines when an operation is performed. The clock is binary, and sends a high and a low voltage signal one after the other, at a frequency often between 2 and 4 GHz on modern CPUs. Each time the clock signal goes high, the CPU does something, but only one thing. For our purposes, the registers, cache and ALU are of most importance. The ALU is the part of the CPU that does the actual math. In order to do so, there are two registers connected to it. The registers are small 'memories' of 8, 16, 32 or 64 bits, depending on the type of CPU. When data is loaded into the two registers, the ALU can output the sum, difference, product or quotient of the two numbers onto the bus the next time the clock signal goes high. Then some other instruction determines what happens to the output during the next clock signal.

Let's illustrate the process of adding and multiplying some numbers together by an example. Consider the following program

example {
  a = 3
  b = 4
  c = 2
  print((a + b) * c)
}

Figure 2.1: Example of a simple program.

Table 2.1 shows an example of how the small program in Figure 2.1 could be computed.

Clock tick   Operation               On Bus
1            Load 3 into RegA        0000 0011
2            Load 4 into RegB        0000 0100
3            Compute Sum             0000 0111
4            Load output into RegA   0000 0111
5            Load 2 into RegB        0000 0010
6            Compute Product         0000 1110
7            Output bus to XXX       0000 1110

Table 2.1: Example of operations performed in the CPU in order to process the code in Figure 2.1.

By observing that loading data into the registers, and storing data from the bus to some memory location, is just as slow as doing a multiplication, some take-aways are as follows:

• The ALU only operates on two numbers at a time.

• Storing or fetching data is just as time consuming as doing a calculation on two numbers.

While the cost of computing derivatives using AAD should, in theory, only be about 2 times the cost of calculating the original function, the number is offset by the extra operations performed relating to storing and fetching data (see (11), (12), (14)).

2.2 Finite Difference Methods

In Edsberg's book ((8)) a number of methods are presented on how to numerically calculate derivatives of functions, or rather, approximations of derivatives. The numerical solutions are often obtained by discretizing the function and using a finite difference method (FDM). Examples of well-known FDMs are Euler's method and the Runge-Kutta method.

Finite difference methods are sometimes referred to as bump-and-run techniques, and are based on the idea that in a sufficiently small interval [x₁, x₁ + Δx] the function is essentially linear, and the derivative can therefore be approximated with:

Definition 2.1. Euler forward

∂f/∂x₁ = [f(x₁ + Δx, x₂, x₃, x₄) − f(x₁, x₂, x₃, x₄)]/Δx + O(Δx),

Definition 2.2. Euler backward

∂f/∂x₁ = [f(x₁, x₂, x₃, x₄) − f(x₁ − Δx, x₂, x₃, x₄)]/Δx + O(Δx),

Definition 2.3. Central difference method

∂f/∂x₁ = [f(x₁ + Δx, x₂, x₃, x₄) − f(x₁ − Δx, x₂, x₃, x₄)]/(2Δx) + O(Δx²),

and similarly for x₂, x₃, ..., x_N. For smaller perturbations Δx the error becomes smaller; however, this leads us into the topic of truncation errors. Truncation errors arise from approximating a value of high precision with fewer digits.


This comes from the fact that Δx needs to become infinitely small in order for the difference quotient to converge to the analytical result. However, since the computer only has a finite number of bits to represent the values, the truncation error will distort the calculated value away from the analytical solution. With algorithmic differentiation, however, there are no truncation errors, since the derivatives are never approximated in the first place (see (11)).
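The interplay between the two error sources is easy to observe numerically. The following illustrative R sketch shows the forward-difference error for sin(x) first shrinking with Δx, and then growing again once finite precision dominates:

# Forward-difference approximation of d/dx sin(x) at x = 1 for shrinking bumps.
# The error first decreases with dx, then increases again as the subtraction
# of two nearly equal floating point numbers loses significant digits.
dx <- 10^-(1:14)
fd <- (sin(1 + dx) - sin(1)) / dx
abs(fd - cos(1))                 # approximation error per bump size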

2.3 Adjoint Algorithmic Differentiation

Adjoint Algorithmic Differentiation takes advantage of the fact that any operation in a computer is a series of simpler operations that are combined into a larger model. As mentioned earlier, the Arithmetic Logic Unit (ALU) inside the computer's CPU takes two inputs at a time and performs some operation on those, creating a large set of temporary outcomes (or states) along the way. The main idea is that a vector of inputs, IN, is inserted into the function or method to produce a vector of outputs, OUT. Depending on the complexity of the method, there are a number of intermediate steps I₀, I₁, I₂, ..., I_s produced "under the hood" before the output is presented. The process of going from the vector IN to the vector OUT can be schematically described as

IN → I₀ → I₁ → I₂ → ... → I_{s−1} → I_s → OUT,

which as a whole is the mapping METHOD(IN) → OUT. Suppose we are interested in some partial derivative ∂OUT/∂INᵢ; then it can be expressed using the chain rule as

∂OUT/∂INᵢ = (∂OUT/∂I_s)(∂I_s/∂I_{s−1}) ⋯ (∂I₁/∂I₀)(∂I₀/∂INᵢ).

As an example, consider equation (2.1), which could be performed in the following steps:

1) f = (v₁/x₃ + e^{v₂} − v₂v₃ + v₄)(x₁ − v₅)², where v₁ = x₁x₂, v₂ = sin(x₄), v₃ = cos(x₃), v₄ = x₁/x₂ and v₅ = x₂x₃,
2)   = (v₆ + v₇ − v₈ + v₄)v₉², where v₆ = v₁/x₃, v₇ = e^{v₂}, v₈ = v₂v₃ and v₉ = x₁ − v₅,
3)   = (v₁₀ + v₁₁)v₁₂, where v₁₀ = v₆ + v₇, v₁₁ = −v₈ + v₄ and v₁₂ = v₉²,
4)   = v₁₃v₁₂, where v₁₃ = v₁₀ + v₁₁,
5)   = v₁₄, where v₁₄ = v₁₃v₁₂,
6) v₁₄ = f(x₁, x₂, x₃, x₄).

These steps exemplify how equation (2.1) is actually the result of (in this case) 14 separate, rather simple operations. AAD takes advantage of this fact with the help of the chain rule in order to produce the partial derivatives of f. Let's define the steps described above as a series of intermediate vector valued functions I₁, I₂, ..., I_s:

I₀(x) = [x₁, x₃, x₁x₂, sin(x₄), cos(x₃), x₁/x₂, x₂x₃] = [x₁, x₃, v₁, v₂, v₃, v₄, v₅]
I₁(I₀) = [v₄, v₁/x₃, e^{v₂}, v₂v₃, x₁ − v₅] = [v₄, v₆, v₇, v₈, v₉]
I₂(I₁) = [v₆ + v₇, −v₈ + v₄, v₉v₉] = [v₁₀, v₁₁, v₁₂]
I₃(I₂) = [v₁₀ + v₁₁, v₁₂] = [v₁₃, v₁₂]
I₄(I₃) = [v₁₃v₁₂] = [v₁₄],   (2.3)

giving

f(x) = I₄(I₃(I₂(I₁(I₀(x))))).

Looking at the partial derivatives of f,

∂f/∂x₁ = (∂I₄/∂I₃)(∂I₃/∂I₂)(∂I₂/∂I₁)(∂I₁/∂I₀)(∂I₀/∂x₁),
∂f/∂x₂ = (∂I₄/∂I₃)(∂I₃/∂I₂)(∂I₂/∂I₁)(∂I₁/∂I₀)(∂I₀/∂x₂),
∂f/∂x₃ = (∂I₄/∂I₃)(∂I₃/∂I₂)(∂I₂/∂I₁)(∂I₁/∂I₀)(∂I₀/∂x₃),
∂f/∂x₄ = (∂I₄/∂I₃)(∂I₃/∂I₂)(∂I₂/∂I₁)(∂I₁/∂I₀)(∂I₀/∂x₄),

it can be noted that the right hand sides of the equations look very similar. In fact, they can be written as

∂f/∂xᵢ = ((∂I₄/∂I₃)(∂I₃/∂I₂)(∂I₂/∂I₁)(∂I₁/∂I₀)) ∂I₀/∂xᵢ,

where the part in parentheses is equal for all i. This is the fundamental idea of AAD. Once the parenthesis is calculated, the CPU time required for each partial derivative is drastically reduced.

In accordance with previous literature, we will be using the notation v̄ᵢ = ∂y/∂vᵢ for the so-called adjoint of vᵢ. The adjoint variables in the context of algorithmic differentiation are defined as follows.

Definition 2.4. Adjoint vector
The adjoint of some intermediate vector Iᵢ is

Īᵢ(Iᵢ, Īᵢ₊₁) = Σ_{j=i+1}^{s} Īⱼ · ∂Iⱼ/∂Iᵢ,   Ī_s ≡ (1, 1, 1, ..., 1)ᵀ,

where s is the total number of intermediate steps.

So when constructing the adjoint vectors, one has to go recursively backwards through the program, starting at Ī_{s−1}.

2.3.1 Example

Looking back at (2.1),

f(x₁, x₂, x₃, x₄) = (x₁x₂/x₃ + e^{sin(x₄)} − sin(x₄)cos(x₃) + x₁/x₂)(x₁ − x₂x₃)²,

and using the intermediate steps defined in (2.3), the adjoint vectors become

Ī₄ ≡ 1 = v̄₁₄

Ī₃ = [v̄₁₄ ∂v₁₄/∂v₁₃, v̄₁₄ ∂v₁₄/∂v₁₂] = [v₁₂, v₁₃] = [v̄₁₃, v̄₁₂]

Ī₂ = [v̄₁₃ ∂v₁₃/∂v₁₀ + v̄₁₂ ∂v₁₂/∂v₁₀, v̄₁₃ ∂v₁₃/∂v₁₁ + v̄₁₂ ∂v₁₂/∂v₁₁, v̄₁₃ ∂v₁₃/∂v₁₂ + v̄₁₂ ∂v₁₂/∂v₁₂] = [v̄₁₃, v̄₁₃, v̄₁₂] = [v̄₁₀, v̄₁₁, v̄₁₂]

Ī₁ = [v̄₁₀ ∂v₁₀/∂v₄ + v̄₁₁ ∂v₁₁/∂v₄ + v̄₁₂ ∂v₁₂/∂v₄, ..., v̄₁₀ ∂v₁₀/∂v₉ + v̄₁₁ ∂v₁₁/∂v₉ + v̄₁₂ ∂v₁₂/∂v₉] = [v̄₁₁, v̄₁₀, v̄₁₀, −v̄₁₁, 2v̄₁₂v₉] = [v̄₄, v̄₆, v̄₇, v̄₈, v̄₉]

Ī₀, where each component is the corresponding sum v̄₄ ∂v₄/∂(·) + v̄₆ ∂v₆/∂(·) + v̄₇ ∂v₇/∂(·) + v̄₈ ∂v₈/∂(·) + v̄₉ ∂v₉/∂(·) over the entries [x₁, x₃, v₁, v₂, v₃, v₄, v₅] of I₀, reduces to

Ī₀ = [v̄₉, −v̄₆v₁/x₃², v̄₆/x₃, v̄₇e^{v₂} + v̄₈v₃, v̄₈v₂, v̄₄, −v̄₉] = [x̄₁, x̄₃, v̄₁, v̄₂, v̄₃, v̄₄, v̄₅],

and finally

x̄ = [x̄₁ + v̄₁x₂ + v̄₄(1/x₂), v̄₁x₁ − v̄₄x₁/x₂² + v̄₅x₃, x̄₃ − v̄₃sin(x₃) + v̄₅x₂, v̄₂cos(x₄)] = [x̄₁, x̄₂, x̄₃, x̄₄].   (2.4)

The final adjoint vector [x̄₁, x̄₂, x̄₃, x̄₄] is precisely

[x̄₁, x̄₂, x̄₃, x̄₄]ᵀ = [∂f/∂x₁, ∂f/∂x₂, ∂f/∂x₃, ∂f/∂x₄]ᵀ,

although expressed in a series of new variables vᵢ and v̄ᵢ. As a numeric example, consider the forward and backward passes below. For the values (x₁, x₂, x₃, x₄) = (1.5, 2.5, 1.0, 0.4) (the value 0.4 for x₄ is what the printed intermediate values correspond to), the result is

Forward pass:
x:  x₁ = 1.5000, x₂ = 2.5000, x₃ = 1.0000, x₄ = 0.4000
I₀: x₁ = 1.5000, x₃ = 1.0000, v₁ = 3.7500, v₂ ≈ 0.3894183, v₃ ≈ 0.5403023, v₄ = 0.6000, v₅ = 2.5000
I₁: v₄ = 0.6000, v₆ = 3.7500, v₇ ≈ 1.4761, v₈ ≈ 0.2104, v₉ = −1.0000
I₂: v₁₀ ≈ 5.2261, v₁₁ ≈ 0.3896, v₁₂ = 1.0000
I₃: v₁₃ ≈ 5.6157, v₁₂ = 1.0000
I₄: v₁₄ ≈ 5.6157

Backward pass:
Ī₄: v̄₁₄ = 1.0000
Ī₃: v̄₁₃ = 1.0000, v̄₁₂ ≈ 5.6157
Ī₂: v̄₁₀ = 1.0000, v̄₁₁ = 1.0000, v̄₁₂ ≈ 5.6157
Ī₁: v̄₄ = 1.0000, v̄₆ = 1.0000, v̄₇ = 1.0000, v̄₈ = −1.0000, v̄₉ ≈ −11.2314
Ī₀: x̄₁ ≈ −11.2314, x̄₃ = −3.7500, v̄₁ = 1.0000, v̄₂ ≈ 0.9358, v̄₃ ≈ −0.3894, v̄₄ = 1.0000, v̄₅ ≈ 11.2314
x̄:  x̄₁ ≈ −8.3314, x̄₂ ≈ 12.4914, x̄₃ ≈ 24.6563, x̄₄ ≈ 0.8619
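The two passes are compact enough to be implemented directly. The following R sketch (the function name aad_f and the explicit variable bookkeeping are ours, not the thesis implementation) carries out the forward sweep of (2.3) followed by the backward sweep of (2.4):

aad_f <- function(x1, x2, x3, x4) {
  # forward sweep: evaluate and store every intermediate value v1..v14
  v1 <- x1 * x2; v2 <- sin(x4); v3 <- cos(x3); v4 <- x1 / x2; v5 <- x2 * x3
  v6 <- v1 / x3; v7 <- exp(v2); v8 <- v2 * v3; v9 <- x1 - v5
  v10 <- v6 + v7; v11 <- v4 - v8; v12 <- v9 * v9
  v13 <- v10 + v11
  v14 <- v13 * v12
  # backward sweep: one adjoint per intermediate, in reverse order (v14b = 1)
  v13b <- v12; v12b <- v13          # v14 = v13 * v12
  v10b <- v13b; v11b <- v13b        # v13 = v10 + v11
  v9b <- 2 * v12b * v9              # v12 = v9^2
  v6b <- v10b; v7b <- v10b          # v10 = v6 + v7
  v4b <- v11b; v8b <- -v11b         # v11 = v4 - v8
  v1b <- v6b / x3                   # v6 = v1 / x3
  v2b <- v7b * exp(v2) + v8b * v3   # v7 = exp(v2), v8 = v2 * v3
  v3b <- v8b * v2
  v5b <- -v9b                       # v9 = x1 - v5
  grad <- c(x1 = v9b + v1b * x2 + v4b / x2,
            x2 = v1b * x1 - v4b * x1 / x2^2 + v5b * x3,
            x3 = -v6b * v1 / x3^2 - v3b * sin(x3) + v5b * x2,
            x4 = v2b * cos(x4))
  list(value = v14, grad = grad)
}

aad_f(1.5, 2.5, 1.0, 0.4)   # reproduces the passes above

Note that a single call produces the value and all four partial derivatives in one forward and one backward sweep.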


2.4 Chaining

To illustrate the power of using the chain rule in algorithmic differentiation, we introduce a concept called chaining. When using AAD, the task is, in each step along the way, to find the relationship between input variables and output variables. Most times, the output of one method is the input to the next method, which is the input to the method after that, and so on. In order to find the relationship between the original input variables and the final results, there needs to be a way of linking (or chaining) these methods together. With AAD, it is simply a matter of multiplying them together.

Let’s once again return to (2.1),

f(x₁, x₂, x₃, x₄) = (x₁x₂/x₃ + e^{sin(x₄)} − sin(x₄)cos(x₃) + x₁/x₂)(x₁ − x₂x₃)².

With the notation introduced in (2.3), f(x) is a mapping x → v₁₄, and the partial derivatives ∂v₁₄/∂xᵢ are defined in (2.4).

What if v₁₄ is the input to yet another function? Or what if x₁ is the output of some earlier procedure? We illustrate what happens by the following example:

g(y) = sin(e^{0.01y³}).   (2.5)

By deconstructing (2.5) in the same manner as earlier, the following can be shown:

I₀(y) = 0.01y³ = v₁
I₁(v₁) = e^{v₁} = v₂
I₂(v₂) = sin(v₂) = v₃
g(y) = I₂(I₁(I₀(y))).

As usual, v̄₃ = 1, and the full adjoint of g is

v̄₂ = v̄₃ ∂v₃/∂v₂ = cos(v₂),
v̄₁ = v̄₂ ∂v₂/∂v₁ = v̄₂e^{v₁} = cos(v₂)v₂,
ȳ = v̄₁ ∂v₁/∂y = v̄₁(0.03y²) = 0.03y²v₂cos(v₂).   (2.6)

Note that ȳ = 0.03y²e^{0.01y³}cos(e^{0.01y³}) is equal to the analytical expression for dg/dy, which is expected. The adjoint expression is therefore not an approximation of the derivative of g, but the exact solution. If f(x₁, x₂, x₃, x₄) = v₁₄ = y is the input to (2.5), the partial derivatives

∂g(f(x₁, x₂, x₃, x₄))/∂xᵢ

can be calculated by means of chaining. That means the adjoints x̄ᵢ calculated in (2.4) can simply be multiplied by the adjoint ȳ to produce the desired partial derivatives, since (remembering that f(x) = y)

x̄ᵢȳ = (∂f(x)/∂xᵢ) · (dy/df(x)) · (dg(y)/dy) = ∂g(y)/∂xᵢ,

where dy/df(x) = 1.


Using the example values (x₁, x₂, x₃, x₄) = (1.5, 2.5, 1.0, 0.5), the result is v₁₄ = y = 5.706112 and

g(5.706112) = sin(e^{0.01·5.706112³}) ≈ sin(e^{1.857893}) ≈ sin(6.410218) ≈ 0.1266917.

The question now is how g(y) changes when some of the initial xᵢs are changed. Since v₂ = e^{0.01y³}, Equation (2.6) gives

ȳ = 0.03y²v₂cos(v₂) ≈ 6.210992,

and multiplying ȳ with the adjoints from (2.4), evaluated at this point, yields

x̄₁ȳ = ∂g(f(x))/∂x₁ ≈ −52.869346,   x̄₂ȳ = ∂g(f(x))/∂x₂ ≈ 78.707071,
x̄₃ȳ = ∂g(f(x))/∂x₃ ≈ 156.417491,   x̄₄ȳ = ∂g(f(x))/∂x₄ ≈ 5.858607.

When implementing

∂g(f(x))/∂x₁ ≈ [g(f(x₁ + h, x₂, x₃, x₄)) − g(f(x₁, x₂, x₃, x₄))]/h,   h = 0.0001,   (2.7)

the results are

x̄₁ȳ = ∂g(f(x))/∂x₁ ≈ −52.857276,   x̄₂ȳ = ∂g(f(x))/∂x₂ ≈ 78.737702,
x̄₃ȳ = ∂g(f(x))/∂x₃ ≈ 156.528886,   x̄₄ȳ = ∂g(f(x))/∂x₄ ≈ 5.858979,

which are similar, but the runtime of the finite difference method is longer; this is discussed further in Section 2.5.

Furthermore, suppose x₃ is the result of some earlier input, such that

x₃ = γ(a₁, a₂, a₃, a₄, a₅, a₆) = (2a₁ − a₂a₃)/(a₄a₅) + a₆.

With a₁ = 3, a₂ = a₃ = √2, a₄ = 4, a₅ = 2 and a₆ = 0.5, x₃ is still

γ(3, √2, √2, 4, 2, 0.5) = (6 − 2)/8 + 0.5 = 1.   (2.8)

The adjoint variables are

ā₁ = 2/(a₄a₅) = 0.25,   ā₂ = −a₃/(a₄a₅) ≈ −0.1767767,
ā₃ = −a₂/(a₄a₅) ≈ −0.1767767,   ā₄ = (a₂a₃ − 2a₁)/(a₄²a₅) = −0.125,
ā₅ = (a₂a₃ − 2a₁)/(a₄a₅²) = −0.25,   ā₆ = 1.


It is now possible to calculate

∂g(f(x₁, x₂, γ(a₁, a₂, a₃, a₄, a₅, a₆), x₄))/∂a₁

by simply multiplying x̄₃ȳ ≈ 156.417491 with the corresponding adjoint. Numerical results at the point

(x₁, x₂, a₁, a₂, a₃, a₄, a₅, a₆, x₄) = (1.5, 2.5, 3.0, √2, √2, 4.0, 2.0, 0.5, 0.5)

are

(x̄₁, x̄₂, ā₁, ā₂, ā₃, ā₄, ā₅, ā₆, x̄₄) = (−52.869346, 78.707071, 39.104373, −27.650967, −27.650967, −19.552186, −39.104373, 156.417491, 5.858607).   (2.9)

As expected, ∂g(y)/∂x₃ = ∂g(y)/∂a₆, since ā₆ = 1.

Hopefully this clarifies what is meant by chaining in this context, and the power of chaining when looking for derivatives that are the result of multiple functions feeding each other.

2.5 Performance of AAD

Continuing with the functions presented in Section 2.4, specifically

g(y) = sin(e^{0.01y³}),

where

y = f(x₁, x₂, x₃, x₄) = (x₁x₂/x₃ + e^{sin(x₄)} − sin(x₄)cos(x₃) + x₁/x₂)(x₁ − x₂x₃)²,

where in turn

x₃ = γ(a₁, a₂, a₃, a₄, a₅, a₆) = (2a₁ − a₂a₃)/(a₄a₅) + a₆,

the partial derivatives

∂g(y)/∂xᵢ, i ∈ {1, 2, 4},  and  ∂g(y)/∂aᵢ, i ∈ {1, 2, 3, 4, 5, 6},

were calculated both through the finite difference method corresponding to Euler forward, and by adjoint algorithmic differentiation. Both methods were relatively quick in solving these partial derivatives, so the computation was repeated 10 000 times for each method in order to get a good estimate. In order to get a standard deviation of the estimates, the entire computation of 10 000 repetitions was done 100 times, and the standard deviation as well as the mean runtime of 10 000 repetitions could thus be computed. In Table 2.2 the runtime results of computing the partial derivatives using both techniques are given, along with the computation of only the function value.

Function                                       Runtime (s)   Std. Dev. (s)
g(f(x₁, x₂, γ(a₁, a₂, a₃, a₄, a₅, a₆), x₄))    0.14286       0.008735184
AAD derivatives                                0.54379       0.02163158
FDM derivatives                                1.45272       0.01780142

Table 2.2: Standard deviation and mean runtime per cycle, over 100 cycles, where each cycle executes the function 10 000 times. The runtime is therefore exaggerated by a factor of 10 000. Measured on a mid-2009 13-inch MacBook Pro.

Note that the FDM method is not only slower, but also only an approximation, since the function g(f(x₁, x₂, γ(a₁, a₂, a₃, a₄, a₅, a₆), x₄)) is not linear. Greater accuracy can be achieved with a smaller perturbation h, but that will lead to more truncation errors and is not the point of this demonstration. Rather, observe that the time required by the FDM method is 1.45272/0.14286 ≈ 10.2 times the time to compute only the value of the function. This is expected: there are 9 input variables, giving 9 partial derivatives, and FDM scales as n + 1 function evaluations. For the AAD method, the ratio is only 0.54379/0.14286 ≈ 3.8, which is within Griewank's proposed limit of 4-5 times the original runtime (see (12)).

2.6 Notes on recursion

First of all, consider the function

y_{i+1} = αx + βxy_i,

and the partial derivative of y_{i+1} with respect to x,

x̄ = ∂y_{i+1}/∂x = α + βy_i + βx ∂y_i/∂x = α + βy_i + βx(α + βy_{i−1} + βx ∂y_{i−1}/∂x) = ...

The structure of the above equation suggests that it needs to be solved recursively. However, Griewank (see (11)) proposes a set of rules regarding AAD, and one of those (rule 10) states that only one forward sweep and one backward sweep are needed. In other words, recursive functions can be solved either from "right-to-left", implying that i lies further right along the grid than i − 1, or from "left-to-right". This can be illustrated by the fact that

∂y_i/∂x = (∂y_i/∂y_{i−1})(∂y_{i−1}/∂y_{i−2}) ⋯ (∂y₁/∂y₀)(∂y₀/∂x) = (∂y₀/∂x)(∂y₁/∂y₀) ⋯ (∂y_{i−1}/∂y_{i−2})(∂y_i/∂y_{i−1}).

Merely an aesthetic difference perhaps, but the implication is that for some applications, the adjoint variables can be constructed along with the original function in the forward pass. As an example, for some starting value y₀, the next value y₁ is

y₁ = αx + βxy₀,


giving the adjoint variables

x̄₁ = α + βy₀,   ȳ₀ = βx.

The adjoint variables are computed along with the value y₁, and carried along into the next propagation, giving

y₂ = αx + βxy₁,

with adjoint variable (omitting ȳ)

x̄₂ = ∂y₂/∂x = (α + βy₁) + βx ∂y₁/∂x = α + βy₁ + βx x̄₁,

since ∂y₁/∂x = x̄₁. These values are again carried along to create

x̄₃ = ∂y₃/∂x = (α + βy₂) + βx ∂y₂/∂x = α + βy₂ + βx x̄₂,

and so on. The subscript on the adjoint variables indicates that x̄_k is the partial derivative ∂y_k/∂x. When implementing this method in code, it is of course up to the programmer whether there is a need to save the intermediate x̄_ks, or if only the final x̄_N is of interest, in which case the variable can be overwritten as

x̄ ← α + βy_i + βx x̄,

where the arrow denotes assignment in code.
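A minimal R sketch of this forward accumulation (function name and parameter values are illustrative):

# Forward accumulation for y_{i+1} = a*x + b*x*y_i: the adjoint xb = dy/dx
# is overwritten alongside y, so a single forward sweep yields both.
propagate <- function(x, y0, a, b, n) {
  y <- y0; xb <- 0
  for (i in 1:n) {
    xb <- a + b * y + b * x * xb   # adjoint update: uses the old y and xb
    y  <- a * x + b * x * y        # then the value itself is advanced
  }
  c(y = y, dy_dx = xb)
}

propagate(x = 0.9, y0 = 1, a = 0.5, b = 0.2, n = 10)

The order of the two assignments matters: the adjoint update must use the value y_i from the previous step, so it is performed before y is overwritten.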


Chapter 3

AAD in the stochastic setting

In order to evaluate the feasibility of AAD in financial settings, we must consider the stochastic case as well, as financial mathematics relies heavily on stochastic processes. Let's consider a European call option on a stock with geometric Brownian motion dynamics, for which we can derive the derivatives analytically, and hence see whether the numerical results are in line with what is to be expected. The value of such a call option will be priced using the Black-Scholes option pricing formula.

Definition 3.1. Black-Scholes formula
For a stock with dynamics as in Definition 3.2, the arbitrage free price at time t of a European call option with strike K and maturity T > t is

c(S(t), t) = S(t)φ(d₁) − e^{−r(T−t)}Kφ(d₂),

where

d₁ = (1/(σ√(T − t)))(ln(S(t)/K) + (r + σ²/2)(T − t)),
d₂ = d₁ − σ√(T − t),

and r is the risk free rate, σ the volatility of the underlying stock, and φ(·) is the standard normal cumulative distribution function.

For simplicity, consider the case when r = t = 0 and T = 1, in which case

C₀ = S₀φ(μ₁) − Kφ(μ₂),   (3.1)

where

μ₁ = ln(S₀/K)/σ + σ/2,   μ₂ = ln(S₀/K)/σ − σ/2,

giving delta and vega as

∂C₀/∂S₀ = φ(μ₁),   ∂C₀/∂σ = S₀φ′(μ₁).


3.1 Geometric Brownian Motion

In the usual Black-Scholes model, the underlying asset is assumed to have log-normal daily returns; that is, the underlying asset is a geometric Brownian motion (GBM),

S(t) = S₀e^{(μ − σ²/2)t + σW(t)}.   (3.2)

Here is the formal definition:

Definition 3.2. Geometric Brownian Motion
A stochastic process S(t) is said to follow a Geometric Brownian Motion, GBM, if it solves

dS(t) = μS(t)dt + σS(t)dW(t),   S(0) = S₀,

where μ is the drift, σ is the volatility, and W is a Wiener process.

Using Itô calculus, the process in Definition 3.2 can be written as

S(t) = S₀ + ∫₀ᵗ μS(u)du + ∫₀ᵗ σS(u)dW(u).   (3.3)

However, under the dynamics in Definition 3.2 (and subsequently Equation (3.3)) the local rate of return is μ, while for pricing it must equal the short rate. This can be arranged via a Girsanov transformation using the kernel

h = −(μ − r)/σ.

With r = 0 as in (3.1), the dynamics of (3.3) become

dS(t) = S(t)((μ + σh)dt + σdW^P(t)) = S(t)σdW^P(t),

where W^P denotes the transformed Wiener process.

Furthermore, recall that

∫_t^{t+dt} dS(u) = S(t + dt) − S(t)  ⇒  S(t + dt) = S(t) + ∫_t^{t+dt} dS(u).

This is still the continuous-time case, but if a Monte Carlo simulation is to be applied, the time axis is divided into t ∈ [0, t₁, ..., tₙ]. The continuous time Wiener process is not available; however, for a sufficiently small interval dt, a first-order Euler approximation becomes

S(t + dt) ≈ S(t) + σS(t)(W^P(t + dt) − W^P(t)).

From the fundamental properties of Wiener processes it is evident that

W(t) − W(s) ∼ N(0, √(t − s)),

and therefore

S(t + dt) ≈ S(t)(1 + σN(0, √dt)).   (3.4)

A simple propagation algorithm for (3.4) could then be


prop <- function(i, S, dt, sigma) {
  # advance the path one step according to (3.4); since R passes S by value,
  # the updated vector is returned to the caller
  S[i + 1] <- S[i] * (1 + sigma * rnorm(1, mean = 0, sd = sqrt(dt)))
  S
}
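A full path is then generated by repeated propagation; a small usage sketch with illustrative parameter values:

# Simulate one path of (3.4): 250 steps from S(0) = 100 with sigma = 0.25.
n  <- 250
dt <- 1 / n
S  <- numeric(n + 1)
S[1] <- 100
for (i in 1:n) S <- prop(i, S, dt, sigma = 0.25)
S[n + 1]   # the simulated value S(T) for this path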

In the context of algorithmic differentiation, the propagation is a mapping of the state vector at time tᵢ to the state at tᵢ₊₁,

[Sᵢ, σᵢ, Zᵢ] →_prop [Sᵢ(1 + σᵢZᵢ), σᵢ, N(0, √dt)] = [Sᵢ₊₁, σᵢ₊₁, Zᵢ₊₁],   (3.5)

such that Z is a vector of n iid ∼ N(0, √dt) variables. Finally, the value of a European call option at maturity T is

V(S_{n,k}, K) = S_{n,k} − K if S_{n,k} > K, and 0 otherwise,   (3.6)

for each Monte Carlo simulated path k. The arbitrage free value at time t = y·dt is therefore

C(S_y, y·dt) = (e^{−r(n·dt − y·dt)}/m) Σ_{k=1}^{m} V(S_{n,k}, K),

where m is the number of simulated paths. The Monte Carlo simulation further implies that

lim_{m→∞} (1/m) Σ_{k=1}^{m} V(S_{n,k}, K) = C₀,

where C₀ is defined as in (3.1).

In order to calculate delta and vega using AAD, we need the adjoints

S̄₀ = ∂V(Sₙ, K)/∂S₀,   σ̄₀ = ∂V(Sₙ, K)/∂σ₀.

However, V(Sₙ, K) is not differentiable at Sₙ = K. In a practical sense this does not matter, since the grid can be designed so as not to include the point K. By setting V̄ ≡ 1, the first adjoint becomes

S̄ₙ = V̄ ∂V/∂Sₙ = 1 if Sₙ > K, undefined if Sₙ = K, and 0 if Sₙ < K.

The rest of the adjoints are then computed by taking the adjoint of (3.5), such that

S̄ᵢ = S̄ᵢ₊₁ ∂Sᵢ₊₁/∂Sᵢ = S̄ᵢ₊₁(1 + σᵢZᵢ),
σ̄ᵢ = S̄ᵢ₊₁ ∂Sᵢ₊₁/∂σᵢ + σ̄ᵢ₊₁ ∂σᵢ₊₁/∂σᵢ = S̄ᵢ₊₁SᵢZᵢ + σ̄ᵢ₊₁.   (3.7)
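Putting the forward pass (3.4) and the backward pass (3.7) together gives price, delta and vega from the same set of simulated paths. The sketch below is our illustrative assembly of the pieces above (r = 0, so no discounting), not the thesis code itself:

# Monte Carlo price, delta and vega of a European call using the adjoints (3.7).
mc_call_aad <- function(S0, K, sigma, n, m) {
  dt <- 1 / n
  price <- delta <- vega <- 0
  for (k in 1:m) {
    Z <- rnorm(n, mean = 0, sd = sqrt(dt))
    S <- numeric(n + 1)
    S[1] <- S0
    for (i in 1:n) S[i + 1] <- S[i] * (1 + sigma * Z[i])  # forward pass (3.4)
    Sb <- as.numeric(S[n + 1] > K)  # payoff adjoint: 1 if in the money, else 0
    sigmab <- 0
    for (i in n:1) {                # backward pass (3.7), sigmab before Sb
      sigmab <- sigmab + Sb * S[i] * Z[i]
      Sb <- Sb * (1 + sigma * Z[i])
    }
    price <- price + max(S[n + 1] - K, 0)
    delta <- delta + Sb
    vega  <- vega  + sigmab
  }
  c(price = price / m, delta = delta / m, vega = vega / m)
}

mc_call_aad(S0 = 100, K = 102, sigma = 0.25, n = 250, m = 1000)   # cf. Table 3.1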


3.2 Numerical results

In order to test the feasibility of Adjoint Algorithmic Differentiation in a stochastic setting, the derivatives obtained using AAD should coincide with the analytical values of Definition 3.1. Using Definition 3.1, with values r = 0, t = 0, T = 1, σ = 0.25, S(0) = 100 and K = 102, the arbitrage free price of the option becomes

c(S(0), 0) = 100φ(ln(100/102)/0.25 + 0.25/2) − 102φ(ln(100/102)/0.25 − 0.25/2) ≈ 9.078459.

Furthermore, the delta and vega of the call option are

∂c(S(0), 0)/∂S(0) = φ(μ₁) ≈ 0.518261,
∂c(S(0), 0)/∂σ = S(0)φ′(μ₁)√(1 − 0) ≈ 39.852427.

The process defined in (3.4) was then simulated by discretizing the grid to 250 steps between t = 0 and T, such that dt = 1/250, and the value computed at maturity according to (3.6). In Table 3.1 the results are presented for different numbers of Monte Carlo paths, along with the deviation in percent from the analytical results.

MC paths   c(S(0), 0) (±%)       ∂c/∂S (±%)          ∂c/∂σ (±%)
10 000     8.939778 (−1.53%)     0.51545 (−0.54%)    39.57476 (−0.70%)
5 000      9.02042 (−0.64%)      0.52044 (+0.42%)    39.89564 (+0.11%)
1 000      8.76619 (−3.44%)      0.51810 (−0.03%)    38.64311 (−3.03%)
500        9.14848 (+0.77%)      0.51580 (−0.47%)    40.36231 (+1.28%)
100        9.96143 (+9.73%)      0.54841 (+5.82%)    45.53559 (+14.26%)

Table 3.1: Monte Carlo simulated European call option prices along with delta and vega for the option. The deviation from the analytical value in parentheses.

What is not obvious from Table 3.1 is that the results become less volatile with an increased number of simulated Monte Carlo paths. The fact that 500 paths has similar accuracy to 10 000 paths is simply chance. This phenomenon is illustrated in Figure 3.1, where 200 Monte Carlo simulations were performed, half with 1000 paths and the other half with 100 paths.


Figure 3.1: 100 simulations of call prices using 1000 Monte Carlo paths (red) and 100 MC paths (blue). Analytical price in black.

As expected, the results become less volatile and converge towards the analytical solution as the number of simulated Monte Carlo paths is increased. However, the main result is the derivatives, which are in line with the expected accuracy of a Monte Carlo simulation (see (9)), and thus show the feasibility of AAD in financial mathematics.


Chapter 4

The Hull-White model

A central part of financial mathematics is of course interest rates, which are used not only to discount future payments into today's equivalent value, but also as underlying instruments for various assets. In order to simulate the future behavior of interest rates, some model of their movements is needed. Since the future movements are unknown today, a stochastic process can serve as a model.

4.1 Short Rate Models

There are several different stochastic representations of short rate dynamics. The general formula is

dr(t) = μ(t, r(t))dt + σ(t, r(t))dW(t),   r(0) = r₀,   (4.1)

where W(t) is a Wiener process. This general formulation has several variations with different properties. The classic Vasicek model describes the short rate through an Ornstein-Uhlenbeck process (see (9)) and is defined as follows.

Definition 4.1. Vasicek model

dr(t) = α(b − r(t))dt + σdW(t),

where α, b, and σ are positive constants and W is a Wiener process.

This gives a so-called mean reverting process: if r(t) > b the drift is negative, and if r(t) < b the drift is positive. In other words, the process will tend towards the mean level b.

The Cox-Ingersoll-Ross model expands upon the Vasicek model by having the volatility proportional to the square root of the rate (see (6)). In other words, the greater the rate, the greater the variance.

Definition 4.2. Cox-Ingersoll-Ross

dr(t) = α(b − r(t))dt + σ√(r(t))dW(t),   α > 0,

where W is a Wiener process.


The Cox-Ingersoll-Ross model, or CIR for short, is also a mean reverting process.

Both the Vasicek model and the CIR model have constant parameters α, b and σ. This can make it difficult for the models to precisely reproduce the yield curve given by a set of bonds. Therefore, there is also the Ho-Lee model.

Definition 4.3. Ho-Lee

dr(t) = θ(t)dt + σdW(t),

where W is a Wiener process.

The Ho-Lee model can perform well when fitted against a yield curve of bond prices; however, since the volatility σ is constant, there is yet another model to present, namely the Hull-White model.

The Hull-White model is a popular and commonly used model in financial modeling (see (13)). There are two main definitions of the model.

Definition 4.4. Hull-White (extended Vasicek)

dr(t) = (θ(t) − α(t)r(t))dt + σ(t)dW(t),   α(t) > 0,

where W is a Wiener process.

Definition 4.5. Hull-White (extended CIR)

dr(t) = (θ(t) − α(t)r(t))dt + σ(t)√(r(t))dW(t),   α(t) > 0,

where W is a Wiener process.

They are similar to the Vasicek and Cox-Ingersoll-Ross models, but with time dependent parameters instead of constants. In the rest of this thesis, Definition 4.4 will be used.

4.2 Choosing θ(t)

The θ(t) in Definition 4.4 needs to be chosen such that the initial term structure of discount bonds is replicated by the model. First of all, assume that the instantaneous short rate under the risk adjusted measure Q is given by

r(t) = x(t) + ϕ(t),   r(0) = r₀,   (4.2)

where

dx(t) = −ax(t)dt + σ(t)dW(t),   x(0) = 0,

which implies that ϕ(0) = r(0) − x(0) = r₀. Using the filtration F_s at s together with Proposition A.1, (4.2) can be written as

r(t) = x(s)e^{−a(t−s)} + ∫ₛᵗ e^{−a(t−u)}σ(u)dW(u) + ϕ(t).   (4.3)


This means that r(t) conditional on F_s is normally distributed with mean

E[r(t)|F_s] = x(s)e^{−a(t−s)} + ϕ(t),   (4.4)

and variance

Var{r(t)|F_s} = E[r(t)²|F_s] − E[r(t)|F_s]²

= E[(x(s)e^{−a(t−s)} + ∫ₛᵗ e^{−a(t−u)}σ(u)dW(u) + ϕ(t))² | F_s] − (x(s)e^{−a(t−s)} + ϕ(t))²

= E[(∫ₛᵗ e^{−a(t−u)}σ(u)dW(u))² | F_s]
  + E[2(x(s)e^{−a(t−s)} + ϕ(t))(∫ₛᵗ e^{−a(t−u)}σ(u)dW(u)) | F_s]
  + E[(x(s)e^{−a(t−s)} + ϕ(t))² | F_s]
  − (x(s)e^{−a(t−s)} + ϕ(t))².

The second expectation is equal to zero, which follows from the fundamental properties of Wiener processes. The third expectation is deterministic, and thus the expectation can be removed, making it equal to the fourth term. Left behind is

Var{r(t)|F_s} = E[(∫ₛᵗ e^{−a(t−u)}σ(u)dW(u))² | F_s] = ∫ₛᵗ e^{−2a(t−u)}σ(u)²du.   (4.5)

The price p(t, T) of a zero-coupon bond at time t, maturing at time T and with unit face value, is

p(t, T) = E[e^{−∫ₜᵀ r(u)du} | F_t],   (4.6)

where E denotes the expectation under Q. Inserting (4.2), this can be written as

p(t, T) = E[e^{−∫ₜᵀ x(u)du − ∫ₜᵀ ϕ(u)du} | F_t].

With ϕ(t) being a deterministic function, it can be moved outside the expectation, giving

p(t, T) = e^{−∫ₜᵀ ϕ(u)du} E[e^{−∫ₜᵀ x(u)du} | F_t].

Using Lemma A.6 with Y(t) = 0, the exponent inside the expectation has the representation

I(t, T) = ∫ₜᵀ X(u)du = ∫ₜᵀ −x(u)du,

where I(t, T) conditional on the sigma field F_t is normally distributed with mean

M(t, T) = ((1 − e^{−a(T−t)})/a) X(t) = −((1 − e^{−a(T−t)})/a) x(t)

and variance

V(t, T) = ∫ₜᵀ (σ(u)²/a²)(1 − e^{−a(T−u)})² du.

Furthermore, with I(t, T) being a normal random variable with the above mentioned mean and variance,

E[e^{I(t,T)} | F_t] = e^{M(t,T) + V(t,T)/2}.   (4.7)

Therefore, equation (4.6) can be expressed as

p(t, T) = exp{−∫ₜᵀ ϕ(u)du − ((1 − e^{−a(T−t)})/a) x(t) + (1/2)∫ₜᵀ (σ(u)²/a²)(1 − e^{−a(T−u)})² du}.

The objective is to find a function ϕ(t) which accurately reproduces the observed term structure in the market. The prices of discount bonds can only be observed at time t = 0, for some maturities Tᵢ, and we denote these observable prices by p*(0, Tᵢ). Therefore ϕ(t) must solve the equation

p*(0, Tᵢ) = exp{−∫₀^{Tᵢ} ϕ(u)du + (1/2)∫₀^{Tᵢ} (σ(u)²/a²)(1 − e^{−a(Tᵢ−u)})² du},

since x(0) = 0. From the relationship between forward rates and bond prices,

f(0, T) = −∂ln(p(0, T))/∂T,

it follows that

f*(0, T) = ϕ(T) − ∂/∂T {(1/2)∫₀ᵀ (σ(u)²/a²)(1 − e^{−a(T−u)})² du}.

This result yields Proposition 4.6.

Proposition 4.6. Term structure fit
The model defined in (4.2) fits the currently observed term structure of discount factors if and only if, for each T,

ϕ(T) = f*(0, T) + (1/2) ∂/∂T (∫₀ᵀ (σ(u)²/a²)(1 − e^{−a(T−u)})² du).

However, this does not answer the question of how to choose θ(t) in Definition 4.4. The question can be answered by applying Itô's lemma to (4.2), namely

dr(t) = (∂ϕ(t)/∂t − ax(t))dt + σ(t)dW(t)
      = (∂ϕ(t)/∂t − a(r(t) − ϕ(t)))dt + σ(t)dW(t)
      = ((∂ϕ(t)/∂t + aϕ(t)) − ar(t))dt + σ(t)dW(t).


It follows that the model is fully specified by setting

θ(t) = ∂ϕ(t)/∂t + aϕ(t).   (4.8)

Calculating the derivative of ϕ(t) might seem daunting; however, calculating θ(t) explicitly is not necessary. By again using x(t) = r(t) − ϕ(t), equations (4.3), (4.4) and (4.5) become

r(t) = ϕ(t) + r(s)e^{−a(t−s)} − ϕ(s)e^{−a(t−s)} + ∫ₛᵗ e^{−a(t−u)}σ(u)dW(u),   (4.9)

E[r(t)|F_s] = ϕ(t) + r(s)e^{−a(t−s)} − ϕ(s)e^{−a(t−s)},   (4.10)

Var{r(t)|F_s} = ∫ₛᵗ e^{−2a(t−u)}σ(u)²du,   (4.11)

where r(t) conditional on F_s is still normally distributed. Hence there is no need for an explicit expression for θ(t), and thus no need to find the partial derivative ∂ϕ(t)/∂t. In fact, (4.9), (4.10) and (4.11) are all that is needed in order to propagate from r(tᵢ) to r(tᵢ₊₁); more on this in Chapter 5.

4.3 Volatility Calibration

In the previous section we showed how θ(T) is determined by the forward rate f*(0, T), the mean reversion parameter a and the volatility σ(t). Since the forward rate is observed, θ(T) will depend on our choice of a and σ(t). Since we have already let the market decide θ(t) for us through Proposition 4.6, it would be satisfactory if the market could also decide the volatility σ(t). This can be done by considering swap options, commonly known as swaptions. We will be using the following definitions found in Björk (see (2)).

Definition 4.7. LIBOR forward rate
Let pᵢ(t) denote the zero coupon bond price p(t, Tᵢ) and let Lᵢ(t) denote the LIBOR forward rate contracted at t for the period [Tᵢ₋₁, Tᵢ], i.e.,

Lᵢ(t) = (1/(Tᵢ − Tᵢ₋₁)) · (pᵢ₋₁(t) − pᵢ(t))/pᵢ(t),   i = 1, ..., N.

Definition 4.8. Payer swap contract
The payments in a Tₙ × (T_N − Tₙ) payer swap are as follows:

• Payments will be made and received at Tₙ₊₁, Tₙ₊₂, ..., T_N.

• For every elementary period [Tᵢ, Tᵢ₊₁], i = n, ..., N − 1, the LIBOR rate Lᵢ₊₁(Tᵢ) is set at time Tᵢ, and the floating leg

(Tᵢ₊₁ − Tᵢ) · Lᵢ₊₁(Tᵢ) = (pᵢ(Tᵢ) − pᵢ₊₁(Tᵢ))/pᵢ₊₁(Tᵢ) = 1/pᵢ₊₁(Tᵢ) − 1

is received at Tᵢ₊₁.

• For the same period the fixed leg (Tᵢ₊₁ − Tᵢ) · K is paid at Tᵢ₊₁.

The value of such a payer swap at time t < Tₙ is denoted PS_n^N(t; K).


The payer swap defined above is a so-called fixed-for-floating swap, allowing the holder of the contract to know what they will be paying, regardless of future interest rate movements. For a receiver swap the payments go in the other direction, so the holder receives a fixed rate and pays the floating payments. With these two definitions, we have the following proposition.

Proposition 4.9. Value of floating payment
The arbitrage free value at time t of a floating payment (Tᵢ₊₁ − Tᵢ) · Lᵢ₊₁(Tᵢ) at time Tᵢ₊₁ is

p(t, Tᵢ) − p(t, Tᵢ₊₁).

Proof. The discounted value of the payment at time Tᵢ₊₁ is

pᵢ₊₁(t)(Tᵢ₊₁ − Tᵢ) · Lᵢ₊₁(t) = pᵢ₊₁(t)(Tᵢ₊₁ − Tᵢ) · (1/(Tᵢ₊₁ − Tᵢ)) · (pᵢ(t) − pᵢ₊₁(t))/pᵢ₊₁(t)   (by Definition 4.7)
= pᵢ(t) − pᵢ₊₁(t) = p(t, Tᵢ) − p(t, Tᵢ₊₁).

Using Proposition 4.9, the total value of the floating payments at time t ≤ Tₙ is

Σ_{i=n}^{N−1} [p(t, Tᵢ) − p(t, Tᵢ₊₁)] = p(t, Tₙ) − p(t, T_N) = pₙ(t) − p_N(t).

Furthermore, the total value of the fixed payments at time t < Tₙ is

Σ_{i=n}^{N−1} p(t, Tᵢ₊₁)(Tᵢ₊₁ − Tᵢ)K = K Σ_{i=n}^{N−1} (Tᵢ₊₁ − Tᵢ)pᵢ₊₁(t).

The sum on the right-hand side is an important object, deserving of its own definition.

Definition 4.10. Accrual factor
For each pair n, N with n < N, the process S_n^N(t) is defined by

S_n^N(t) = Σ_{i=n+1}^{N} (Tᵢ − Tᵢ₋₁) p(t, Tᵢ).

S_n^N is referred to as the accrual factor or as the present value of a basis point.

The value of the payer swap PS_n^N(t; K) is the difference between the floating leg and the fixed leg, and is thus given by

PS_n^N(t; K) = pₙ(t) − p_N(t) − K S_n^N(t).

For some K the value of the swap is 0, which gives us the next definition.


Definition 4.11. Forward Swap Rate
The forward swap rate R_n^N(t) of the Tₙ × (T_N − Tₙ) swap is the value of K for which PS_n^N(t; K) = 0, i.e.

R_n^N(t) = (pₙ(t) − p_N(t))/S_n^N(t).

Using Definition 4.11, the arbitrage free price of a payer swap with swap rate K can be rewritten as

PS_n^N(t; K) = (R_n^N(t) − K) S_n^N(t).

An option written on the swap described by PS_n^N(t; K) gives the holder the right, but not the obligation, to enter the swap contract at Tₙ. Such options are known as swap options, or swaptions, and can be priced using Black's formula for swaptions.

Definition 4.12. Black's Formula for Swaptions
The Black-76 formula for a Tₙ × (T_N − Tₙ) payer swaption with strike K is defined as

PSN_n^N(t) = S_n^N(t){R_n^N(t)N[d₁] − KN[d₂]},

where

d₁ = (1/(σ_{n,N}√(T_N − t)))[ln(R_n^N(t)/K) + (1/2)σ_{n,N}²(T_N − t)],
d₂ = d₁ − σ_{n,N}√(T_N − t).

The constant σ_{n,N} is known as the Black volatility. Given a market price for the swaption, the Black volatility implied by the Black formula is referred to as the implied Black volatility, and we denote it by σ_imp.
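For reference, Definition 4.12 translates directly into a few lines of R; the sketch below takes the accrual factor, forward swap rate and Black volatility as inputs (argument names are ours):

# Black-76 payer swaption price as in Definition 4.12.
black76_payer <- function(S_nN, R_nN, K, sigma_nN, TN, t = 0) {
  tau <- TN - t
  d1 <- (log(R_nN / K) + 0.5 * sigma_nN^2 * tau) / (sigma_nN * sqrt(tau))
  d2 <- d1 - sigma_nN * sqrt(tau)
  S_nN * (R_nN * pnorm(d1) - K * pnorm(d2))   # pnorm: standard normal CDF
}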

Furthermore, according to Wilmott (see (16)), the Black-Scholes formulae are still valid when volatility is time dependent, provided we use

√((1/(T − t)) ∫ₜᵀ σ(s)²ds)   (4.12)

in place of σ, i.e. we now use

d₁ = [ln(R_n^N(t)/K) + (1/2)∫ₜ^{T_N} σ(s)²ds] / √(∫ₜ^{T_N} σ(s)²ds),
d₂ = [ln(R_n^N(t)/K) − (1/2)∫ₜ^{T_N} σ(s)²ds] / √(∫ₜ^{T_N} σ(s)²ds).   (4.13)

Now, for some fixed a, t, T and a set of market factors such as the prices of various discount bonds, it is clear that the function in Definition 4.12 depends only on σ, and it can be written as

PSN_n^N(σ; t, K, a, p*).

Suppose now that there is a swaption with price C*, on the underlying swap PS_n^N with swap rate K. Using Definition 4.12, we should therefore have

C* = PSN_n^N(σ_imp; 0, K, a, p*).   (4.14)

How to determine σ_imp is discussed in Section 5.7.


4.4 The case when σ(t) is piecewise constant

Suppose there is a set of n observable swaptions in the market with maturities T₁ < T₂ < ... < Tₙ, and that between each pair of maturities the volatility is constant. The expression for σ then becomes

σ(t) = (Σ_{i=1}^{n} 1_{t∈[Tᵢ₋₁,Tᵢ)} σᵢ) + 1_{t∈[Tₙ,∞)} σₙ,   (4.15)

where T₀ = 0. We further define

T⁺(t) = min{i : Tᵢ ≥ t},   T⁻(t) = max{i : Tᵢ ≤ t};

in other words, T⁺(t) is the index of the first maturity after (or at) t, and T⁻(t) the index of the last maturity before (or at) t. Note that if t = Tᵢ, then T⁺(t) = T⁻(t) = i.

With the expression for σ(t) in (4.15) inserted into Proposition 4.6, the result is

ϕ(t) = f*(0, t) + (1/(2a²)) ∂/∂t (Σ_{i=0}^{T⁻(t)−1} σᵢ² ∫_{Tᵢ}^{Tᵢ₊₁} (1 − e^{−a(t−u)})² du + σ_{T⁻(t)}² ∫_{T_{T⁻(t)}}^{t} (1 − e^{−a(t−u)})² du)

= f*(0, t) + (1/(2a²)) ∂/∂t (Σ_{i=0}^{T⁻(t)−1} σᵢ² [u − (2/a)e^{−a(t−u)} + (1/(2a))e^{−2a(t−u)}]_{Tᵢ}^{Tᵢ₊₁} + σ_{T⁻(t)}² [u − (2/a)e^{−a(t−u)} + (1/(2a))e^{−2a(t−u)}]_{T_{T⁻(t)}}^{t})

= f*(0, t) + (1/(2a²)) (Σ_{i=0}^{T⁻(t)−1} σᵢ² [2e^{−a(t−u)} − e^{−2a(t−u)}]_{Tᵢ}^{Tᵢ₊₁} + σ_{T⁻(t)}² [2e^{−a(t−u)} − e^{−2a(t−u)}]_{T_{T⁻(t)}}^{t})

= f*(0, t) + (1/(2a²)) (Σ_{i=0}^{T⁻(t)−1} σᵢ² (2e^{−a(t−Tᵢ₊₁)} − e^{−2a(t−Tᵢ₊₁)} − 2e^{−a(t−Tᵢ)} + e^{−2a(t−Tᵢ)}) + σ_{T⁻(t)}² (2 − 1 − 2e^{−a(t−T_{T⁻(t)})} + e^{−2a(t−T_{T⁻(t)})})).   (4.16)


In the same manner, the variance (4.5) becomes

Var[r(t)|F_s] = ∫ₛᵗ e^{−2a(t−u)}σ(u)²du

= ∫ₛ^{T_{T⁺(s)}} e^{−2a(t−u)}σ(u)²du + ∫_{T_{T⁺(s)}}^{T_{T⁻(t)}} e^{−2a(t−u)}σ(u)²du + ∫_{T_{T⁻(t)}}^{t} e^{−2a(t−u)}σ(u)²du

= σ_{T⁺(s)}² [e^{−2a(t−u)}/(2a)]ₛ^{T_{T⁺(s)}} + Σ_{i=T⁺(s)}^{T⁻(t)−1} σᵢ² [e^{−2a(t−u)}/(2a)]_{Tᵢ}^{Tᵢ₊₁} + σ_{T⁻(t)}² [e^{−2a(t−u)}/(2a)]_{T_{T⁻(t)}}^{t}

= (σ_{T⁺(s)}²/(2a)) (e^{−2a(t−T_{T⁺(s)})} − e^{−2a(t−s)}) + Σ_{i=T⁺(s)}^{T⁻(t)−1} (σᵢ²/(2a)) (e^{−2a(t−Tᵢ₊₁)} − e^{−2a(t−Tᵢ)}) + (σ_{T⁻(t)}²/(2a)) (1 − e^{−2a(t−T_{T⁻(t)})}).   (4.17)

Furthermore, the expressions for d₁ and d₂ in Equations (4.13) become

d₁ = [ln(R_n^N(t)/K) + (1/2)(σ_{T⁻(t)}²(T_{T⁺(t)} − t) + Σ_{i=T⁺(t)}^{N−1} σᵢ²(Tᵢ₊₁ − Tᵢ))] / √(σ_{T⁻(t)}²(T_{T⁺(t)} − t) + Σ_{i=T⁺(t)}^{N−1} σᵢ²(Tᵢ₊₁ − Tᵢ)),   (4.18)

and

d₂ = [ln(R_n^N(t)/K) − (1/2)(σ_{T⁻(t)}²(T_{T⁺(t)} − t) + Σ_{i=T⁺(t)}^{N−1} σᵢ²(Tᵢ₊₁ − Tᵢ))] / √(σ_{T⁻(t)}²(T_{T⁺(t)} − t) + Σ_{i=T⁺(t)}^{N−1} σᵢ²(Tᵢ₊₁ − Tᵢ)),   (4.19)

which comes from the fact that

∫ₜ^{T_N} σ(s)²ds = ∫ₜ^{T_{T⁺(t)}} σ(s)²ds + ∫_{T_{T⁺(t)}}^{T_N} σ(s)²ds

= σ_{T⁻(t)}² ∫ₜ^{T_{T⁺(t)}} ds + Σ_{i=T⁺(t)}^{N−1} σᵢ² ∫_{Tᵢ}^{Tᵢ₊₁} ds

= σ_{T⁻(t)}²(T_{T⁺(t)} − t) + Σ_{i=T⁺(t)}^{N−1} σᵢ²(Tᵢ₊₁ − Tᵢ).
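A small helper for this integral is useful; the sketch below (names are ours) sums σᵢ² times the interval length over the sub-intervals of [t, T_N] cut out by the maturities:

# Integrated variance of the piecewise constant sigma in (4.15) over [t, TN].
# Ts[i] is the left end of the interval on which sig[i] applies (Ts[1] = 0).
int_sigma2 <- function(t, TN, Ts, sig) {
  knots <- sort(unique(c(t, Ts[Ts > t & Ts < TN], TN)))  # grid on [t, TN]
  left  <- head(knots, -1)                # left end of each sub-interval
  idx   <- findInterval(left, Ts)         # which sigma is in force there
  sum(sig[idx]^2 * diff(knots))
}

int_sigma2(t = 0.5, TN = 3, Ts = c(0, 1, 2), sig = c(0.2, 0.3, 0.25))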


Chapter 5

Calibrating the Hull-White model

In Chapter 4 we derived an expression for θ in Definition 4.4, and showed how σ is related to swaption prices. In this chapter we will develop a method for calibrating the model in Definition 4.4 using Adjoint Algorithmic Differentiation. At the end, we will derive the partial derivatives of the final short rate curve with respect to the calibration parameters. The chapter is structured so as to present the forward pass equations, and then the corresponding adjoint versions in a subsection, before moving on to the next method.

5.1 Notations

Before going further into the calibration and the adjoint versions of the equations discussed in Chapter 4, some notation must first be introduced.

Maturity T^s: The times of maturity of traded swaptions are denoted by T^s. Furthermore, since the code is implemented in R, the first index of a vector is 1, not 0 as in many other coding languages. Therefore, the vector of maturities is Ts = [T^s_1 = 0, T^s_2, T^s_3, ..., T^s_n], where T^s_i < T^s_j for i < j.

Maturity T^r: The time of maturity of a quoted discount bond is denoted T^r, and the corresponding vector of maturities is Tr = [T^r_1, T^r_2, ..., T^r_n], where T^r_i < T^r_j for i < j. Note that the lengths of the vectors Tr and Ts may differ.

Superscript σ^(i): Between two maturities T^s_i and T^s_{i+1}, a constant σ, denoted by σ^(i), applies.

Subscript σ_i: For any time t_i along the grid there is a corresponding σ_i. It follows from the above that {σ_i : t_i ∈ [T^s_j, T^s_{j+1})} = σ^(j).

Short rate r: The short rate r(t) at time t_i is denoted r_i.

Less-than function L(t_k): The function L(t_k) returns the index of the swaption maturity that is nearest, but less than or equal to, t_k:

L(t_k) = max{i : T^s_i ≤ t_k}.   (5.1)

Hence, for any σ_k,

σ_k = σ^(L(t_k)).   (5.2)

Greater-than function G(t_k): The function G(t_k) returns the index of the swaption maturity that is nearest, but greater than, t_k:

G(t_k) = min{i : T^s_i > t_k}.   (5.3)

5.2 The Assignment Function

As mentioned earlier, in order to simulate the short rate at some future time T, the time axis needs to be discretized. When doing so, we end up with a time array

t = [t₁ = 0, t₂, t₃, ..., t_{N−1}, t_N],

where t_N is the time at the end of the simulation. For each time t_i there is a corresponding short rate r_i, such that r(t_i) = r_i, producing a similar vector for the short rates,

r = [r₁ = r(0), r₂, r₃, ..., r_{N−1}, r_N].

How these r_is are determined is discussed in Section 5.5 below. Furthermore, the vector of ϕs,

ϕ = [ϕ₁, ϕ₂, ϕ₃, ..., ϕ_{N−1}, ϕ_N],

is assigned using equation (4.16); how that is done is discussed in further detail in Section 5.4 below. One easy assignment function, however, is the one assigning the various σs into a vector.

Suppose there are n swaptions with good enough liquidity that we can assume their prices correspond to the market's implied volatility. Suppose further that these swaptions have different times of maturity T^s_1 < T^s_2 < ... < T^s_n. Then all σ_is between two of these maturities are equal, and the assignment function is

σ_i = σ^(1) if t_i < T^s_2,  σ^(2) if T^s_2 ≤ t_i < T^s_3,  ...,  σ^(n) if T^s_n ≤ t_i.

For example, suppose there are 5 swaptions trading, with maturities at times t₁₀, t₂₀, t₃₀, t₄₀ and t₅₀. Suppose further that the corresponding implied volatilities are 0.2, 0.3, 0.25, 0.3 and 0.35 respectively. Then the assignment function for σ would produce the graph seen in Figure 5.1.

34

0 10 20 30 40 50

0.00.10.20.30.40.5

Assigment function sigma

Index, k

Sigma

Figure 5.1: Example output of the σ-Assignment Function.
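In R the assignment is essentially a one-liner; a sketch with the numbers from the example above (the function name is ours):

# sigma assignment: sigma^(L(t_k)) applies at grid time t_k, cf. (5.1)-(5.2).
assign_sigma <- function(t_grid, Ts, sig) {
  sig[findInterval(t_grid, Ts)]   # findInterval gives the last maturity <= t_k
}

t_grid <- 0:50                         # grid indices as in Figure 5.1
Ts  <- c(0, 10, 20, 30, 40)            # left ends of the constant-sigma intervals
sig <- c(0.2, 0.3, 0.25, 0.3, 0.35)    # implied volatilities from the example
assign_sigma(t_grid, Ts, sig)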

As the objective is to calibrate these σ^(k)s, the adjoint variables of the assignment function are needed. Since the σ_ks are just simple assignments, the adjoint variables are simply

σ̄^(k) = Σ_{{i : t_i ∈ [T^s_k, T^s_{k+1})}} σ̄_i.   (5.4)

5.3 Choosing f*(0, T)

From Proposition 4.6 it is clear that we need the instantaneous forward rate f*(0, T) for all maturities T in order to compute the short rate r(T) at that time. However, since there are only a finite number of quoted rates r*_i and associated maturities T^r_i available in the market, a method for interpolating the instantaneous forward rate to some T ∉ Tr is needed. First of all, for some observed interest rate r*_i, associated with a bond with maturity T^r_i, the price of the bond is

p*(0, T^r_i) = e^{−∫₀^{T^r_i} r*_i du} = e^{−r*_i ∫₀^{T^r_i} du} = e^{−r*_i T^r_i}.

From the relationship between the instantaneous forward rate and bond prices,

p(t, T) = e^{−∫ₜᵀ f(t,u)du}  ⇒  f(t, T) = −∂ln(p(t, T))/∂T,

it is clear that f(0, T) must satisfy

−∫₀^{T^r_i} f*(0, u)du = ln(p*(0, T^r_i)) = −∫₀^{T^r_i} r*_i du = −r*_i T^r_i,   (5.5)

for all maturities T^r_i. We will be using two assumptions regarding f(0, T), specifically:


• During the first period, T ≤ T^r_1, the instantaneous forward rate is equal to the rate corresponding to the first bond, i.e. f(0, T) = r*_1 for T ≤ T^r_1.

• For the other periods, f(0, T) is a linear function,

f(0, T) = f(0, T^r_i)((T^r_{i+1} − T)/(T^r_{i+1} − T^r_i)) + f(0, T^r_{i+1})((T − T^r_i)/(T^r_{i+1} − T^r_i)),   T ∈ [T^r_i, T^r_{i+1}),

such that f(0, T) fulfills the condition given by (5.5).

The assumptions about f(0, T) imply that f(0, T^r_1) = r*_1 and

∫₀^{T^r_1} f(0, u)du = r*_1 T^r_1.

Furthermore, f(0, T^r_2) must solve

∫₀^{T^r_2} f(0, u)du = ∫₀^{T^r_1} f(0, u)du + ∫_{T^r_1}^{T^r_2} f(0, u)du

= r*_1 T^r_1 + (f(0, T^r_1)/(T^r_2 − T^r_1))[T^r_2 u − u²/2]_{T^r_1}^{T^r_2} + (f(0, T^r_2)/(T^r_2 − T^r_1))[u²/2 − T^r_1 u]_{T^r_1}^{T^r_2}

= r*_1 T^r_1 + (f(0, T^r_1)/(T^r_2 − T^r_1))((T^r_2)² − (T^r_2)²/2 − T^r_2 T^r_1 + (T^r_1)²/2) + (f(0, T^r_2)/(T^r_2 − T^r_1))((T^r_2)²/2 − T^r_1 T^r_2 − (T^r_1)²/2 + (T^r_1)²)

= r*_1 T^r_1 + (f(0, T^r_1)/(2(T^r_2 − T^r_1)))(T^r_2 − T^r_1)² + (f(0, T^r_2)/(2(T^r_2 − T^r_1)))(T^r_2 − T^r_1)²

= r*_1 T^r_1 + (f(0, T^r_1)/2)(T^r_2 − T^r_1) + (f(0, T^r_2)/2)(T^r_2 − T^r_1) = r*_2 T^r_2.

Thus

f(0, T^r_2) = 2(r*_2 T^r_2 − r*_1 T^r_1)/(T^r_2 − T^r_1) − f(0, T^r_1) = 2(r*_2 T^r_2 − r*_1 T^r_1)/(T^r_2 − T^r_1) − r*_1 T^r_1.

Similar calculations show that

f(0, T ri+1) =2(r?i+1T

ri+1 − r?i T ri

)T ri+1 − T ri

− f(0, T ri ), if i > 0. (5.6)

Hence f(0, T ) can be written as

f(T, r; Tr) =

{r?1 if T ≤ T r1f(0, T ri )

(T ri+1−TT ri+1−T ri

)+ f(0, T ri+1)

(T−T ri

T ri+1−T ri

)if T ∈

[T ri , T

ri+1

),

(5.7)where f(0, T ri ) is defined by (5.6). By adjusting the notation, such that

f(0, T ri ) = Hi,

allows (5.6) to be rewritten as

Hi =2(r?i T

ri − r?i−1T

ri−1

)T ri − T ri−1

−Hi−1, if i > 1,

withH1 = r?1T

r1 .
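A minimal R sketch of this bootstrap and of the interpolation (5.7) is given below (hypothetical function names; evaluation points are assumed to lie below the last quoted maturity):

# Knots H_i = f(0, T^r_i) from the quoted rates r.star with maturities Tr,
# using H_1 = r*_1 and the recursion above.
forward.knots <- function(r.star, Tr) {
  n <- length(r.star)
  H <- numeric(n)
  H[1] <- r.star[1]
  for (i in 2:n)
    H[i] <- 2 * (r.star[i] * Tr[i] - r.star[i - 1] * Tr[i - 1]) /
            (Tr[i] - Tr[i - 1]) - H[i - 1]
  H
}

# Piecewise linear f(0,T), cf. (5.7): constant r*_1 before the first
# maturity, linear interpolation between the knots afterwards.
f0 <- function(Tm, r.star, Tr, H = forward.knots(r.star, Tr)) {
  i <- findInterval(Tm, Tr)
  if (i == 0) return(r.star[1])
  H[i]     * (Tr[i + 1] - Tm) / (Tr[i + 1] - Tr[i]) +
  H[i + 1] * (Tm - Tr[i])     / (Tr[i + 1] - Tr[i])
}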


5.3.1 Adjoint of $f^\star(0, T)$

To find the adjoint variables $\bar r^\star_i$ of $f(0,T)$, we will use the concept of chaining discussed in Section 2.4. First note that
\[
\frac{\partial H_1}{\partial r^\star_1} = 1, \quad \text{and} \quad \frac{\partial H_1}{\partial r^\star_i} = 0, \ i > 1.
\]
To find an expression for the derivatives of $H_i$, $i > 1$, consider the following examples:

\[
\begin{aligned}
\frac{\partial H_2}{\partial r^\star_1} &= \frac{-2T^r_1}{T^r_2 - T^r_1} - 1, \\
\frac{\partial H_3}{\partial r^\star_1} &= -\frac{\partial H_2}{\partial r^\star_1} = (-1)\left(\frac{-2T^r_1}{T^r_2 - T^r_1} - 1\right), \\
\frac{\partial H_4}{\partial r^\star_1} &= -\frac{\partial H_3}{\partial r^\star_1} = \frac{\partial H_2}{\partial r^\star_1}, \\
&\;\;\vdots \\
\frac{\partial H_n}{\partial r^\star_1} &= (-1)^n\,\frac{\partial H_2}{\partial r^\star_1},
\end{aligned}
\qquad
\begin{aligned}
\frac{\partial H_2}{\partial r^\star_2} &= \frac{2T^r_2}{T^r_2 - T^r_1}, \\
\frac{\partial H_3}{\partial r^\star_2} &= \frac{-2T^r_2}{T^r_3 - T^r_2} - \frac{2T^r_2}{T^r_2 - T^r_1}, \\
\frac{\partial H_4}{\partial r^\star_2} &= -\frac{\partial H_3}{\partial r^\star_2}, \\
&\;\;\vdots \\
\frac{\partial H_n}{\partial r^\star_2} &= (-1)^{n+1}\,\frac{\partial H_3}{\partial r^\star_2}.
\end{aligned}
\]

Hence the general formula can be expressed as
\[
\frac{\partial H_i}{\partial r^\star_j} = (-1)^{i+j+1}\,\frac{\partial H_{j+1}}{\partial r^\star_j} = (-1)^{i+j+1}\left(\frac{-2T^r_j}{T^r_{j+1} - T^r_j} - \frac{\partial H_j}{\partial r^\star_j}\right), \quad i > j,
\]
where
\[
\frac{\partial H_1}{\partial r^\star_1} = 1 \quad \text{and} \quad \frac{\partial H_j}{\partial r^\star_j} = \frac{2T^r_j}{T^r_j - T^r_{j-1}}, \ j > 1,
\]
and
\[
\frac{\partial H_i}{\partial r^\star_j} = 0 \quad \text{if } i < j.
\]
In a more condensed form this can be written as
\[
\frac{\partial H_i}{\partial r^\star_j} =
\begin{cases}
0, & i < j, \\
1, & i = j = 1, \\
\dfrac{2T^r_j}{T^r_j - T^r_{j-1}}, & i = j > 1, \\
(-1)^{i+j+1}\left(\dfrac{-2T^r_j}{T^r_{j+1} - T^r_j} - \dfrac{\partial H_j}{\partial r^\star_j}\right), & i > j.
\end{cases} \tag{5.8}
\]
Thus the adjoint of an observed rate $r^\star_j$ on $f(0,T)$ is
\[
\bar r^\star_j = \frac{\partial f(0,T)}{\partial r^\star_j} = \frac{\partial H_i}{\partial r^\star_j}\left(\frac{T^r_{i+1} - T}{T^r_{i+1} - T^r_i}\right) + \frac{\partial H_{i+1}}{\partial r^\star_j}\left(\frac{T - T^r_i}{T^r_{i+1} - T^r_i}\right), \quad T \in \left[T^r_i, T^r_{i+1}\right), \tag{5.9}
\]
where $i > 0$. For $T < T^r_1$, the adjoint becomes
\[
\bar r^\star_j = \frac{\partial f(0,T)}{\partial r^\star_j} =
\begin{cases}
1 & \text{if } j = 1, \\
0 & \text{otherwise.}
\end{cases} \tag{5.10}
\]
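The triangular structure of (5.8) translates directly into a small routine; a sketch in R, continuing the helpers above (hypothetical names):

# dH[i, j] = dH_i / dr*_j from (5.8); only the diagonal and the
# alternating-sign rule for i > j are needed.
dH.matrix <- function(Tr) {
  n <- length(Tr)
  dH <- matrix(0, n, n)
  dH[1, 1] <- 1
  for (j in 2:n) dH[j, j] <- 2 * Tr[j] / (Tr[j] - Tr[j - 1])
  for (j in 1:(n - 1))
    for (i in (j + 1):n)
      dH[i, j] <- (-1)^(i + j + 1) *
        (-2 * Tr[j] / (Tr[j + 1] - Tr[j]) - dH[j, j])
  dH
}

# Adjoint of r*_j on f(0,T), cf. (5.9) and (5.10).
f0.bar <- function(Tm, j, Tr, dH = dH.matrix(Tr)) {
  i <- findInterval(Tm, Tr)
  if (i == 0) return(as.numeric(j == 1))
  dH[i, j]     * (Tr[i + 1] - Tm) / (Tr[i + 1] - Tr[i]) +
  dH[i + 1, j] * (Tm - Tr[i])     / (Tr[i + 1] - Tr[i])
}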


5.4 Adjoint of $\phi(t)$

Recall (4.16) with piecewise constant $\sigma$, which using the notation introduced in Section 5.1 is written as
\[
\phi_k = f(0,t_k) + \frac{1}{2a^2}\Bigg(\sum_{i=1}^{L(t_k)-1} \sigma^{(i)2}\Big(2e^{-a(t_k - T^s_{i+1})} - e^{-2a(t_k - T^s_{i+1})} - 2e^{-a(t_k - T^s_i)} + e^{-2a(t_k - T^s_i)}\Big) + \sigma^{(L(t_k))2}\Big(1 - 2e^{-a(t_k - T^s_{L(t_k)})} + e^{-2a(t_k - T^s_{L(t_k)})}\Big)\Bigg).
\]
To avoid cumbersome notation, we introduce the functions $g_i(t_k)$ and $h_i(t_k)$, defined as
\[
g_i(t_k) = 2e^{-a(t_k - T^s_{i+1})} - e^{-2a(t_k - T^s_{i+1})} - 2e^{-a(t_k - T^s_i)} + e^{-2a(t_k - T^s_i)}, \qquad
h_i(t_k) = 1 - 2e^{-a(t_k - T^s_i)} + e^{-2a(t_k - T^s_i)},
\]
hence
\[
\phi_k = f(0,t_k) + \frac{1}{2a^2}\left(\sum_{i=1}^{L(t_k)-1} \sigma^{(i)2}\,g_i(t_k) + \sigma^{(L(t_k))2}\,h_{L(t_k)}(t_k)\right).
\]

Since $f(0,t_k)$ is a function of the observed market rates $r^\star_i$, so is $\phi_k$, and the adjoint variables of the observed rates are
\[
\bar r^\star_i = \frac{\partial \phi_k}{\partial r^\star_i} = \underbrace{\frac{\partial \phi_k}{\partial f(0,t_k)}}_{=1}\;\underbrace{\frac{\partial f(0,t_k)}{\partial r^\star_i}}_{\text{Equations (5.9) and (5.10)}},
\]
hence the expressions in (5.9) and (5.10) apply to $\phi_k$ as well without modification. Furthermore, the adjoint variables of $\sigma$ become
\[
\bar\sigma^{(i)}_{\phi_k} = \frac{\sigma^{(i)}}{a^2}\,g_i(t_k), \quad i \in [1, L(t_k)), \qquad
\bar\sigma^{(i)}_{\phi_k} = \frac{\sigma^{(i)}}{a^2}\,h_i(t_k), \quad i = L(t_k), \qquad
\bar\sigma^{(i)}_{\phi_k} = 0, \quad i > L(t_k), \tag{5.11}
\]
where the subscript is there to emphasize that these are the adjoint variables of $\sigma$ with respect to $\phi_k$.

For each $\phi_k$ there will therefore be a vector of $n$ adjoint $\sigma^{(i)}$'s. For $\phi_1$ this vector is simply
\[
\bar\sigma_{\phi_1} = \frac{1}{a^2}\begin{pmatrix} \sigma^{(1)} h_1(t_1) & 0 & 0 & \ldots & 0 \end{pmatrix}^T,
\]
and for $\phi_2$ it is
\[
\bar\sigma_{\phi_2} = \frac{1}{a^2}\begin{pmatrix} \sigma^{(1)} h_1(t_2) & 0 & 0 & \ldots & 0 \end{pmatrix}^T.
\]
This will continue until $T^s_1 < t_k \le T^s_2$, in which case
\[
\bar\sigma_{\phi_k} = \frac{1}{a^2}\begin{pmatrix} \sigma^{(1)} g_1(t_k) & \sigma^{(2)} h_2(t_k) & 0 & \ldots & 0 \end{pmatrix}^T.
\]
Similarly, for $T^s_2 < t_k \le T^s_3$ the vector becomes
\[
\bar\sigma_{\phi_k} = \frac{1}{a^2}\begin{pmatrix} \sigma^{(1)} g_1(t_k) & \sigma^{(2)} g_2(t_k) & \sigma^{(3)} h_3(t_k) & 0 & \ldots & 0 \end{pmatrix}^T.
\]
The full set of adjoint variables $\bar\sigma_\phi$ can thus be represented by a matrix. With the $n$ different $\sigma^{(i)}$'s and the $N$ different $\phi_k$'s, the matrix becomes
\[
\bar\sigma_\phi =
\begin{pmatrix}
| & | & & | \\
\bar\sigma_{\phi_1} & \bar\sigma_{\phi_2} & \ldots & \bar\sigma_{\phi_N} \\
| & | & & |
\end{pmatrix}_{[n \times N]}.
\]
We will refer to the adjoint variable $\bar\sigma^{(i)}_{\phi_k}$, i.e. the derivative of the $k$th $\phi$ with respect to the $i$th block of $\sigma$'s, as
\[
\bar\sigma^{(i)}_{\phi_k} = \bar\sigma_\phi[i,k],
\]
which is consistent with how elements are referenced in the programming language R.
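A sketch of how this matrix might be assembled in R follows (hypothetical names; the handling of $L(t_k)$ at the block boundaries is an assumption based on the examples above):

g.fun <- function(i, tk, a, Ts)                 # g_i(t_k) as defined above
  2 * exp(-a * (tk - Ts[i + 1])) - exp(-2 * a * (tk - Ts[i + 1])) -
  2 * exp(-a * (tk - Ts[i]))     + exp(-2 * a * (tk - Ts[i]))
h.fun <- function(i, tk, a, Ts)                 # h_i(t_k) as defined above
  1 - 2 * exp(-a * (tk - Ts[i])) + exp(-2 * a * (tk - Ts[i]))

# sigma.bar.phi[i, k] = d phi_k / d sigma^(i), cf. (5.11).
sigma.bar.phi <- function(sig.blocks, Ts, t, a) {
  n <- length(sig.blocks); N <- length(t)
  out <- matrix(0, n, N)
  for (k in 1:N) {
    L <- min(sum(Ts < t[k]) + 1, n)             # active block L(t_k)
    if (L > 1) for (i in 1:(L - 1))
      out[i, k] <- sig.blocks[i] / a^2 * g.fun(i, t[k], a, Ts)
    out[L, k] <- sig.blocks[L] / a^2 * h.fun(L, t[k], a, Ts)
  }
  out
}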

5.5 Discretization of the diffusion process

The approach in this section will be similar to the one outlined in Section 3.1, but in this case the diffusion process is described by (4.9),
\[
r(t) = \phi(t) + r(s)e^{-a(t-s)} - \phi(s)e^{-a(t-s)} + \int_s^t e^{-a(t-u)}\sigma(u)\,dW(u).
\]
Recall that $r(t)$ conditional on $\mathcal{F}_s$ is normally distributed with mean
\[
E\left[r(t)\,|\,\mathcal{F}_s\right] = \phi(t) + e^{-a(t-s)}\left(r(s) - \phi(s)\right),
\]
and variance
\[
Var\left\{r(t)\,|\,\mathcal{F}_s\right\} = \int_s^t e^{-2a(t-u)}\sigma(u)^2\,du.
\]
Because of the piecewise constant nature of our chosen $\sigma(t)$, the propagation from $t_i$ to $t_{i+1} = t_i + dt$ can be written as
\[
r(t_{i+1}) = \phi(t_{i+1}) + e^{-a\,dt}\left(r(t_i) - \phi(t_i)\right) + \sigma(t_i)\,N(0,1)\sqrt{\int_{t_i}^{t_{i+1}} e^{-2a(t_{i+1}-u)}\,du}. \tag{5.12}
\]
With the notation introduced in Section 5.1, including the vector $Z$ of iid $N(0,1)$ variables, (5.12) can be written as
\[
r_{i+1} = \phi_{i+1} + e^{-a\,dt}\left(r_i - \phi_i\right) + \sigma_i Z_i \sqrt{\int_{t_i}^{t_{i+1}} e^{-2a(t_{i+1}-u)}\,du}.
\]
The integral under the root is
\[
\int_{t_i}^{t_{i+1}} e^{-2a(t_{i+1}-u)}\,du = \left[\frac{e^{-2a(t_{i+1}-u)}}{2a}\right]_{t_i}^{t_{i+1}} = \frac{1 - e^{-2a\,dt}}{2a}.
\]
To avoid cumbersome notation, let $\alpha = e^{-a\,dt}$ and $\beta = (1 - \alpha^2)/(2a)$, which allows (5.12) to be written as
\[
r_{k+1} = \phi_{k+1} + \alpha\left(r_k - \phi_k\right) + \sigma^{(L(t_k))} Z_k \sqrt{\beta}.
\]

In order to calculate the derivatives of the $r_k$ generated by the model with respect to the quoted interest rates $r^\star_i$, the following equation needs to be solved:
\[
\frac{\partial r_k}{\partial r^\star_i} = \frac{\partial \phi_k}{\partial r^\star_i} - \alpha\frac{\partial \phi_{k-1}}{\partial r^\star_i} + \alpha\frac{\partial r_{k-1}}{\partial r^\star_i}
= \frac{\partial f(0,t_k)}{\partial r^\star_i} - \alpha\frac{\partial f(0,t_{k-1})}{\partial r^\star_i} + \alpha\frac{\partial r_{k-1}}{\partial r^\star_i}.
\]
The first two terms on the right-hand side are defined by (5.9) and (5.10). From the last term on the right-hand side it is clear that this expression needs to be solved recursively. However, recall from Section 2.6 that it is possible to compute the adjoint variables during the forward pass in order to avoid recursion. Hence the adjoint variable of some $r^\star_i$ on $r_k$ is simply
\[
\bar r^\star_{i,r_k} = \frac{\partial f(0,t_k)}{\partial r^\star_i} - \alpha\frac{\partial f(0,t_{k-1})}{\partial r^\star_i} + \alpha\,\bar r^\star_{i,r_{k-1}}, \tag{5.13}
\]
where $\bar r^\star_{i,r_{k-1}}$ is known when it is time to compute $\bar r^\star_{i,r_k}$.

The adjoint variable for $\sigma^{(i)}$ can be computed using the same reasoning as above, realizing that $\sigma^{(i)}$ affects $r_k$ both explicitly through $\sigma^{(L(t_k))} Z_k \sqrt\beta$ if $i = L(t_k)$, and implicitly by affecting $\phi_k$, $\phi_{k-1}$ and potentially even $r_{k-1}$. The resulting adjoint is therefore
\[
\bar\sigma^{(i)}_{r_k} = \frac{\partial \phi_k}{\partial \sigma^{(i)}} - \alpha\frac{\partial \phi_{k-1}}{\partial \sigma^{(i)}} + \alpha\,\bar\sigma^{(i)}_{r_{k-1}} + Z_{k-1}\sqrt\beta\,\frac{\partial \sigma^{(L(t_k))}}{\partial \sigma^{(i)}}
= \bar\sigma_\phi[i,k] - \alpha\,\bar\sigma_\phi[i,k-1] + \alpha\,\bar\sigma^{(i)}_{r_{k-1}} + Z_{k-1}\sqrt\beta\,\mathbf{1}_{[i = L(t_k)]}, \tag{5.14}
\]
where again $\bar\sigma^{(i)}_{r_{k-1}}$ is known when computing $\bar\sigma^{(i)}_{r_k}$, and $\mathbf{1}$ denotes the indicator function.
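A minimal R sketch of the propagation together with the forward-pass adjoints (5.13) and (5.14) is given below (hypothetical names; df0 is assumed to hold the derivatives from (5.9)-(5.10) row-wise, sbar.phi is the matrix from Section 5.4, and L holds the block indices L(t_k)):

simulate.rate <- function(phi, df0, sbar.phi, L, sig.blocks, a, dt, r0) {
  N <- length(phi); m <- ncol(df0); n <- nrow(sbar.phi)
  alpha <- exp(-a * dt); beta <- (1 - alpha^2) / (2 * a)
  r <- numeric(N); r[1] <- r0
  rbar <- matrix(0, N, m)     # rbar[k, j] = dr_k / dr*_j,      cf. (5.13)
  sbar <- matrix(0, N, n)     # sbar[k, i] = dr_k / dsigma^(i), cf. (5.14)
  Z <- rnorm(N - 1)           # one path; the antithetic twin uses -Z
  for (k in 2:N) {
    # the shock entering r_k carries the block active at t_{k-1}
    r[k] <- phi[k] + alpha * (r[k - 1] - phi[k - 1]) +
            sig.blocks[L[k - 1]] * Z[k - 1] * sqrt(beta)
    rbar[k, ] <- df0[k, ] - alpha * df0[k - 1, ] + alpha * rbar[k - 1, ]
    sbar[k, ] <- sbar.phi[, k] - alpha * sbar.phi[, k - 1] +
                 alpha * sbar[k - 1, ] +
                 Z[k - 1] * sqrt(beta) * (1:n == L[k - 1])
  }
  list(r = r, rbar = rbar, sbar = sbar)
}

Note that the sketch attaches the indicator to the block active at $t_{k-1}$, matching the propagation equation; (5.14) writes $L(t_k)$, and the two only differ at the grid points where a new block starts.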

5.6 Black's Formula for Swaptions

In Definition 4.12, Black's formula for swaptions is introduced. When using a time dependent (albeit piecewise constant) volatility, it becomes
\[
\mathrm{PSN}^N_n(t_i) = S^N_n(t_i)\left(R^N_n(t_i)\,N[d_1] - K\,N[d_2]\right), \tag{5.15}
\]
where
\[
d_1 = \frac{\ln\left(\frac{R^N_n(t_i)}{K}\right) + \frac{1}{2}\int_{t_n}^{t_N}\sigma(s)^2\,ds}{\sqrt{\int_{t_n}^{t_N}\sigma(s)^2\,ds}}, \qquad
d_2 = \frac{\ln\left(\frac{R^N_n(t_i)}{K}\right) - \frac{1}{2}\int_{t_n}^{t_N}\sigma(s)^2\,ds}{\sqrt{\int_{t_n}^{t_N}\sigma(s)^2\,ds}}. \tag{5.16}
\]
Using Algorithmic Differentiation, the aim is to find the volatilities $\sigma^{(i)}$ such that
\[
\mathrm{PSN}^{N_i}_{n_i}(0; K_i) = C^\star_i,
\]
for all observed swaption prices $C^\star_i$ with strike $K_i$, maturity $t_{n_i}$ and tenor $t_{N_i} - t_{n_i}$. Since the swaption price is computed at time $t = 0$, it follows that the accrual factor $S^N_n$ and forward swap rate $R^N_n$ are also computed at $t = 0$. This in turn implies that the discount factors $p(t,T)$ used in the computations of $S^N_n$ and $R^N_n$ will also be evaluated at $t = 0$. Recall that
\[
p(0,T) = e^{-\int_0^T f(0,u)\,du},
\]
hence we can use the expression for $f(0,T)$ in (5.7) to compute $S^N_n$ and $R^N_n$.

5.6.1 Discount factors $p(0,T)$

With $f(0,T)$ as in (5.7),
\[
f(T, r^\star; \mathbf{T}_r) =
\begin{cases}
r^\star_1 & \text{if } T \le T^r_1, \\
f(0,T^r_i)\left(\dfrac{T^r_{i+1} - T}{T^r_{i+1} - T^r_i}\right) + f(0,T^r_{i+1})\left(\dfrac{T - T^r_i}{T^r_{i+1} - T^r_i}\right) & \text{if } T \in \left[T^r_i, T^r_{i+1}\right),
\end{cases}
\]
we define the integrated forward rate $F$ as
\[
\begin{aligned}
F(t) = \int_0^t f(0,u)\,du &= \underbrace{\int_0^{T^r_i} f(0,u)\,du}_{\text{Using (5.5)}} + \int_{T^r_i}^t f(0,u)\,du \\
&= r^\star_i T^r_i + \int_{T^r_i}^t \left(H_i\left(\frac{T^r_{i+1} - u}{T^r_{i+1} - T^r_i}\right) + H_{i+1}\left(\frac{u - T^r_i}{T^r_{i+1} - T^r_i}\right)\right)du \\
&= r^\star_i T^r_i + \frac{H_i}{T^r_{i+1} - T^r_i}\left[T^r_{i+1} u - \frac{u^2}{2}\right]_{T^r_i}^t + \frac{H_{i+1}}{T^r_{i+1} - T^r_i}\left[\frac{u^2}{2} - T^r_i u\right]_{T^r_i}^t \\
&= r^\star_i T^r_i + \frac{H_i}{T^r_{i+1} - T^r_i}\left(T^r_{i+1} t - \frac{t^2}{2} - T^r_{i+1} T^r_i + \frac{(T^r_i)^2}{2}\right) + \frac{H_{i+1}}{T^r_{i+1} - T^r_i}\underbrace{\left(\frac{t^2}{2} - T^r_i t - \frac{(T^r_i)^2}{2} + (T^r_i)^2\right)}_{\frac{1}{2}(t - T^r_i)^2} \\
&= r^\star_i T^r_i + \frac{H_i\left(T^r_{i+1} t - \frac{t^2}{2} - T^r_{i+1} T^r_i + \frac{(T^r_i)^2}{2}\right)}{T^r_{i+1} - T^r_i} + \frac{H_{i+1}}{2}\,\frac{\left(t - T^r_i\right)^2}{T^r_{i+1} - T^r_i}.
\end{aligned}
\]
To avoid cumbersome notation, let
\[
\Gamma_i(t) = \frac{T^r_{i+1} t - \frac{t^2}{2} - T^r_{i+1} T^r_i + \frac{(T^r_i)^2}{2}}{T^r_{i+1} - T^r_i},
\]
and let $F(t_k) = F_k$, giving $F_k$ as
\[
F_k =
\begin{cases}
r^\star_1 t_k & \text{if } t_k \le T^r_1, \\
r^\star_i T^r_i + H_i\,\Gamma_i(t_k) + \dfrac{H_{i+1}}{2}\,\dfrac{\left(t_k - T^r_i\right)^2}{T^r_{i+1} - T^r_i} & \text{if } t_k \in \left[T^r_i, T^r_{i+1}\right).
\end{cases}
\]
Hence, the adjoint variable for some $r^\star_j$ on $F_k$ is
\[
\bar r^\star_{j,F_k} =
\begin{cases}
t_k & \text{if } t_k \le T^r_1 \text{ and } j = 1, \\
0 & \text{if } t_k \le T^r_1 \text{ and } j \ne 1, \\
\mathbf{1}_{[j=i]}\,T^r_i + \dfrac{\partial H_i}{\partial r^\star_j}\Gamma_i(t_k) + \dfrac{\partial H_{i+1}}{\partial r^\star_j}\,\dfrac{\left(t_k - T^r_i\right)^2}{2\left(T^r_{i+1} - T^r_i\right)} & \text{if } t_k \in \left[T^r_i, T^r_{i+1}\right),
\end{cases}
\]
where the partial derivatives $\partial H_i/\partial r^\star_j$ are defined by (5.8). It follows that
\[
\frac{\partial p(0,t_k)}{\partial r^\star_j} = \frac{\partial p(0,t_k)}{\partial F_k}\,\frac{\partial F_k}{\partial r^\star_j} = -e^{-F_k}\,\bar r^\star_{j,F_k} = -\bar r^\star_{j,F_k}\,p(0,t_k).
\]
Remark. Note that since the discount factors are evaluated at $t = 0$, they are independent of the volatility $\sigma$, hence
\[
\frac{\partial p(0,t_k)}{\partial \sigma^{(j)}} = 0, \quad \forall\, j, k.
\]
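In code, $F_k$ and the discount factors reduce to a few lines; a sketch in R reusing forward.knots from Section 5.3 (hypothetical names):

Gamma.fun <- function(i, t, Tr)                 # Gamma_i(t) as defined above
  (Tr[i + 1] * t - t^2 / 2 - Tr[i + 1] * Tr[i] + Tr[i]^2 / 2) /
  (Tr[i + 1] - Tr[i])

# F(t) = int_0^t f(0,u) du, and the discount factor p(0,t) = exp(-F(t)).
F.fun <- function(t, r.star, Tr, H = forward.knots(r.star, Tr)) {
  i <- findInterval(t, Tr)
  if (i == 0) return(r.star[1] * t)
  r.star[i] * Tr[i] + H[i] * Gamma.fun(i, t, Tr) +
    H[i + 1] / 2 * (t - Tr[i])^2 / (Tr[i + 1] - Tr[i])
}
p0 <- function(t, r.star, Tr) exp(-F.fun(t, r.star, Tr))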

5.6.2 Accrual factor and forward swap rate

In order to calculate the price of a swaption, the accrual factor and forward swap rate are needed. Recall the definition of the accrual factor in Definition 4.10,
\[
S^N_n(t_k) = \sum_{i=n+1}^N \left(T_i - T_{i-1}\right) p(t_k, T_i),
\]
where the sum is over the payments during the lifetime of the swap contract. Hence, if the swap contract is written on the 3-month interest rate with a tenor of 2 years, the accrual factor will be a sum over the $4 \cdot 2 = 8$ payments. Furthermore, the adjoint of $r^\star_j$ on $S^N_n$ is
\[
\bar r^\star_{j,S^N_n} = \sum_{i=n+1}^N \left(T_i - T_{i-1}\right)\frac{\partial p(0,T_i)}{\partial r^\star_j} = \sum_{i=n+1}^N \left(T_{i-1} - T_i\right)\bar r^\star_{j,F_i}\,p(0,T_i).
\]
The definition of the forward swap rate, Definition 4.11,
\[
R^N_n(t_k) = \frac{p(t_k, T_n) - p(t_k, T_N)}{S^N_n(t_k)},
\]
again evaluated at $t_k = 0$, yields the adjoint variables as
\[
\begin{aligned}
\bar r^\star_{j,R^N_n} = \frac{\partial R^N_n(0)}{\partial r^\star_j}
&= \frac{-\bar r^\star_{j,F_n}\,p(0,T_n)}{S^N_n(0)} - \frac{-\bar r^\star_{j,F_N}\,p(0,T_N)}{S^N_n(0)} - \bar r^\star_{j,S^N_n}\,\frac{p(0,T_n) - p(0,T_N)}{S^N_n(0)^2} \\
&= \frac{1}{S^N_n(0)}\left(\bar r^\star_{j,F_N}\,p(0,T_N) - \bar r^\star_{j,F_n}\,p(0,T_n) - \bar r^\star_{j,S^N_n}\,R^N_n(0)\right).
\end{aligned}
\]
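Evaluated at $t = 0$, both quantities follow directly from the discount factors; a sketch in R (hypothetical names; the payment grid is assumed to align with the payment frequency):

# Accrual factor S and forward swap rate R at t = 0 for a swap paying
# 'freq' times per year between t_n and t_N, using p0() above.
swap.SR <- function(tn, tN, r.star, Tr, freq = 4) {
  Ti   <- seq(tn, tN, by = 1 / freq)         # payment grid T_n, ..., T_N
  disc <- sapply(Ti, p0, r.star = r.star, Tr = Tr)
  S <- sum(diff(Ti) * disc[-1])              # sum (T_i - T_{i-1}) p(0, T_i)
  R <- (disc[1] - disc[length(disc)]) / S    # (p(0,T_n) - p(0,T_N)) / S
  list(S = S, R = R)
}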

With the above results we return to (5.15) and find its adjoint variables. This can be done by constructing a thorough forward pass of $\mathrm{PSN}^N_n(0;K)$, which is then stepped back through in order to create the adjoint variables. First of all, since $\sigma$ is piecewise constant, the integrals in (5.16) can be rewritten as
\[
\begin{aligned}
\int_{t_n}^{t_N} \sigma(s)^2\,ds
&= \sigma^{(T_-(t_n))2}\int_{t_n}^{T^s_{T_+(t_n)}} ds + \sum_{i=T_+(t_n)}^{T_-(t_N)-1} \sigma^{(i)2}\int_{T^s_i}^{T^s_{i+1}} ds + \sigma^{(T_-(t_N))2}\int_{T^s_{T_-(t_N)}}^{t_N} ds \\
&= \sigma^{(T_-(t_n))2}\left(T^s_{T_+(t_n)} - t_n\right) + \sum_{i=T_+(t_n)}^{T_-(t_N)-1} \sigma^{(i)2}\left(T^s_{i+1} - T^s_i\right) + \sigma^{(T_-(t_N))2}\left(t_N - T^s_{T_-(t_N)}\right),
\end{aligned}
\]
where $\sigma^{(k)}$ denotes one of the constant $\sigma$'s, and $T^s_i$ denotes the time of maturity of the $i$th swaption. By setting
\[
\int_{t_n}^{t_N} \sigma(s)^2\,ds = \Gamma^N_n,
\]
equations (5.16) become
\[
d_1 = \frac{\ln\left(\frac{R^N_n(0)}{K}\right) + \frac{1}{2}\Gamma^N_n}{\sqrt{\Gamma^N_n}}, \qquad
d_2 = d_1 - \sqrt{\Gamma^N_n}.
\]
Now let $R^N_n(0)/K = v_1$ and $\ln(v_1) = v_2$, such that
\[
d_1 = \frac{v_2 + \Gamma^N_n/2}{\sqrt{\Gamma^N_n}}, \qquad d_2 = d_1 - \sqrt{\Gamma^N_n}.
\]

Furthermore, recall that
\[
N[x] = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^x e^{-\frac{1}{2}s^2}\,ds, \tag{5.17}
\]
which means that
\[
\frac{\partial N[x]}{\partial x} = \frac{e^{-x^2/2}}{\sqrt{2\pi}}.
\]
Using (5.17), (5.15) can be rewritten as
\[
\mathrm{PSN}^N_n(0;K) = \frac{S^N_n(0)}{\sqrt{2\pi}}\left(R^N_n(0)\int_{-\infty}^{d_1} e^{-x^2/2}\,dx - K\int_{-\infty}^{d_2} e^{-x^2/2}\,dx\right).
\]

Continuing the forward pass and creating temporary variables yields
\[
\mathrm{PSN}^N_n(0;K) = \underbrace{\frac{S^N_n(0)}{\sqrt{2\pi}}}_{v_3}\Bigg(R^N_n(0)\underbrace{\int_{-\infty}^{d_1} e^{-x^2/2}\,dx}_{v_4} - K\underbrace{\int_{-\infty}^{d_2} e^{-x^2/2}\,dx}_{v_5}\Bigg)
= v_3\underbrace{\left(R^N_n(0)\,v_4 - K v_5\right)}_{v_6} = v_3 v_6 = v_7.
\]

The corresponding backward pass is then
\[
\begin{aligned}
\bar v_6 &= v_3, \\
\bar v_3 &= v_6, \\
\bar v_5 &= \bar v_6(-K) = -K v_3, \\
\bar v_4 &= \bar v_6\,R^N_n(0) = R^N_n(0)\,v_3, \\
\bar d_2 &= \bar v_5\,\frac{\partial v_5}{\partial d_2} = -K v_3\,e^{-d_2^2/2}, \\
\bar d_1 &= \bar v_4\,e^{-d_1^2/2} + \bar d_2 = \frac{S^N_n(0)}{\sqrt{2\pi}}\left(R^N_n(0)\,e^{-d_1^2/2} - K e^{-d_2^2/2}\right), \\
\bar\Gamma^N_n &= \bar d_1\left(\frac{1}{4\sqrt{\Gamma^N_n}} - \frac{v_2}{2\left(\Gamma^N_n\right)^{3/2}}\right) - \frac{\bar d_2}{2\sqrt{\Gamma^N_n}}, \\
\bar v_2 &= \frac{\bar d_1}{\sqrt{\Gamma^N_n}}, \\
\bar v_1 &= \frac{\bar v_2}{v_1}.
\end{aligned}
\]


And finally
\[
\begin{aligned}
\bar S^N_n(0) &= \frac{\bar v_3}{\sqrt{2\pi}} = \frac{v_6}{\sqrt{2\pi}} = \frac{\mathrm{PSN}^N_n(0;K)}{S^N_n(0)}, \\
\bar R^N_n(0) &= \bar v_6\,v_4 + \frac{\bar v_1}{K} = \frac{S^N_n(0)}{\sqrt{2\pi}}\int_{-\infty}^{d_1} e^{-x^2/2}\,dx + \frac{\bar d_1}{R^N_n(0)\sqrt{\Gamma^N_n}}, \\
\bar\sigma^{(T_-(t_n))} &= 2\,\bar\Gamma^N_n\,\sigma^{(T_-(t_n))}\left(T^s_{T_+(t_n)} - t_n\right), \\
\bar\sigma^{(i)} &= 2\,\bar\Gamma^N_n\,\sigma^{(i)}\left(T^s_{i+1} - T^s_i\right), \quad i \in \left[T_+(t_n),\, T_-(t_N) - 1\right], \\
\bar\sigma^{(T_-(t_N))} &= 2\,\bar\Gamma^N_n\,\sigma^{(T_-(t_N))}\left(t_N - T^s_{T_-(t_N)}\right).
\end{aligned} \tag{5.18}
\]
This means that the derivative of a swaption price with respect to some observed interest rate $r^\star_j$ is
\[
\bar r^\star_j = \bar r^\star_{j,S^N_n}\,\bar S^N_n + \bar r^\star_{j,R^N_n}\,\bar R^N_n.
\]
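Collecting the forward and backward pass in one routine gives a sketch like the following, where R's pnorm and dnorm stand in for $N[x]$ and $e^{-x^2/2}/\sqrt{2\pi}$ (hypothetical names):

# Price (5.15) and the adjoints of S, R and Gamma^N_n via the backward pass.
black.swaption <- function(S, R, K, Gamma) {
  v2 <- log(R / K)
  d1 <- (v2 + Gamma / 2) / sqrt(Gamma)
  d2 <- d1 - sqrt(Gamma)
  price <- S * (R * pnorm(d1) - K * pnorm(d2))
  # backward pass, following the display above
  d2.bar <- -S * K * dnorm(d2)
  d1.bar <-  S * R * dnorm(d1) + d2.bar
  G.bar  <- d1.bar * (1 / (4 * sqrt(Gamma)) - v2 / (2 * Gamma^1.5)) -
            d2.bar / (2 * sqrt(Gamma))
  list(price = price,
       S.bar = price / S,
       R.bar = S * pnorm(d1) + d1.bar / (R * sqrt(Gamma)),
       Gamma.bar = G.bar)
}

The block vegas $\bar\sigma^{(i)}$ of (5.18) then follow by multiplying Gamma.bar by $2\sigma^{(i)}$ times the length of block $i$'s overlap with $[t_n, t_N]$.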

5.7 Newton-Raphson

Recall that the objective of using Algorithmic Differentiation is to find the volatilities $\sigma^{(i)}$ such that
\[
\mathrm{PSN}^{N_i}_{n_i}(0; K_i) = C^\star_i,
\]
for all observed swaption prices $C^\star_i$ with strike rate $K_i$, maturity $t_{n_i}$ and tenor $t_{N_i} - t_{n_i}$. In order to solve for the implied volatilities, consider the Newton-Raphson method.

Definition 5.1. Newton-Raphson
The point $x$ where the function $f(x) = 0$ can be found by iterating
\[
x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}, \qquad f'(x) = \frac{df(x)}{dx},
\]
until
\[
|x_{n+1} - x_n| \le \varepsilon,
\]
for some cut-off value $\varepsilon$.

Proof. The tangent to a function $f(x)$ at $x_0$ has, according to the point-slope form, the representation
\[
y = f'(x_0)(x - x_0) + f(x_0).
\]
At the point where the tangent crosses $y = 0$, this becomes
\[
0 = f'(x_0)(x - x_0) + f(x_0), \quad
x - x_0 = -\frac{f(x_0)}{f'(x_0)}, \quad
x = x_0 - \frac{f(x_0)}{f'(x_0)}, \tag{5.19}
\]
which can be iterated until $f(x) = 0$ is reached.


Using Definition 5.1 with
\[
f(\sigma^{(i)}) = C^\star_i - \mathrm{PSN}^{N_i}_{n_i}(0; K_i),
\]
it is possible to find the volatility $\sigma^{(i)}$ where $f(\sigma^{(i)}) = 0$, which gives
\[
C^\star_i = \mathrm{PSN}^{N_i}_{n_i}(0; K_i)\big|_{\sigma = \sigma_{imp}}.
\]
Since $C^\star_i$ is an observed constant, it follows that
\[
\frac{\partial f(\sigma^{(i)})}{\partial \sigma^{(i)}} = -\frac{\partial \mathrm{PSN}^{N_i}_{n_i}(0; K_i)}{\partial \sigma^{(i)}} = -\bar\sigma^{(i)},
\]
where $\bar\sigma^{(i)}$ is defined by (5.18). Definition 5.1 then implies that the correct (or close enough) implied volatility $\sigma^{(i)}_{imp}$ is found by iterating
\[
\sigma^{(i)}_{k+1} = \sigma^{(i)}_k - \frac{C^\star_i - \mathrm{PSN}^{N_i}_{n_i}(0; K_i)}{-\bar\sigma^{(i)}}, \tag{5.20}
\]
until
\[
\left|\frac{C^\star_i - \mathrm{PSN}^{N_i}_{n_i}(0; K_i)}{-\bar\sigma^{(i)}}\right| < \varepsilon, \tag{5.21}
\]
for some cut-off value $\varepsilon$.
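A sketch of this iteration in R, where price.and.vega is a hypothetical closure returning the swaption price and its AAD vega for the block currently being solved (earlier blocks held fixed):

solve.implied.vol <- function(C.star, price.and.vega, sig0 = 0.2, eps = 1e-10) {
  sig <- sig0
  repeat {
    pv   <- price.and.vega(sig)                # list(price = ..., vega = ...)
    step <- (C.star - pv$price) / (-pv$vega)   # cf. (5.20)
    sig  <- sig - step
    if (abs(step) < eps) break                 # cf. (5.21)
  }
  sig
}

Bootstrapping then amounts to calling solve.implied.vol once per swaption in order of maturity, locking each solved block before moving on, as described in Section 5.8.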

Furthermore, there are two factors that affect the price of a swaption:

• The implied volatility $\sigma^{(i)}_{imp}$ generated by the actors on the market can change, resulting in a new price $C^\star_i + \Delta C^\star_i$. These changes are picked up by (5.18), since
\[
C^\star_i = \mathrm{PSN}^{N_i}_{n_i}(0; K_i) \;\Rightarrow\; \frac{\partial C^\star_i}{\partial \sigma^{(i)}} = \frac{\partial \mathrm{PSN}^{N_i}_{n_i}(0; K_i)}{\partial \sigma^{(i)}} = \bar\sigma^{(i)}.
\]

• The quoted rates $r^\star_i$ can change, resulting in a different term structure, and subsequently a different swap value and swaption price. However, if the term structure changes while the swaption price stays the same, there has been a change in implied volatility. These scenarios require a new pass through the Newton-Raphson solver in order to obtain the altered implied volatilities. Such scenarios are not covered further in this thesis; hence we assume that if the term structure changes, the swaption prices change accordingly so as to eliminate arbitrage.

5.8 Numerical Results

Numerical values to test against were provided by ICAP, one of the largest interest rate derivative brokers in the world, making their quoted prices more reliable than those from a less liquid marketplace. The data is a snapshot of the market values on the 29th of June 2017. The first objective in the calibration is to find the instantaneous forward rate discussed in Section 5.3. In Figure 5.2, the instantaneous forward rate is plotted along with the quoted rates in red. The quoted rates have been increased by 0.75%, to represent the borrow-lending gap at Sweden's central bank, the Riksbank, giving the vector (in %)
\[
r^\star = \begin{bmatrix} 0.29 & 0.495 & 0.717 & 0.938 & 1.145 & 1.338 & 1.515 & 1.675 & 1.815 & 1.943 \end{bmatrix},
\]
with corresponding maturities (in years)
\[
\mathbf{T}_r = \begin{bmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 \end{bmatrix}.
\]

Figure 5.2: Graph of $f(0,T)$ for various $T$. The red lines represent the quoted rates, ending with a circle at maturity for that rate.

A question might arise as to why the quoted rates seem so much lower than the instantaneous forward rate. This can be explained by the fact that the instantaneous forward rate is fitted against the integrals
\[
\int_0^{T^r_i} r^\star_i\,du = \int_0^{T^r_i} f(0,u)\,du, \quad \forall i.
\]
Therefore, the shape of Figure 5.2 is a result of $f(0,T)$ "compensating" for being lower than $r^\star_i$ during most of the interval from $0$ to $T^r_i$. The integrated curve $\int_0^T f(0,u)\,du$ is plotted in Figure 5.3, where the red circles represent the integrated observed values $r^\star_i T^r_i$.


Figure 5.3: Graph of $\int_0^T f(0,u)\,du$ for various $T$. The red circles represent the yield of the observed rates, i.e. $\int_0^{T^r_i} r^\star_i\,du = r^\star_i T^r_i$.

The curve fit is much clearer in Figure 5.3, and thus we can be confident that the observed term structure is reproduced by $f(0,T)$.

To obtain the implied volatilities, a bootstrapping method was applied, whereby the first implied volatility $\sigma^{(1)}_{imp}$ is solved against the first swaption maturity $T^s_1$. Note that the maturity in this sense refers to the maturity of the underlying swap contract, and not the maturity of the option. The maturity of the swaptions was constructed as $T^s_{i-1}$ for swaption $i$, giving the tenor $T^s_i - T^s_{i-1}$ of the underlying swap contract. When $\sigma^{(1)}_{imp}$ is found it is "locked" and treated as a constant while solving $\sigma^{(2)}_{imp}$ using the swaption with maturity at $T^s_2$. Then $\sigma^{(2)}_{imp}$ is "locked" while solving for $\sigma^{(3)}_{imp}$, and so on. Furthermore, with the implied volatilities provided in the ICAP data, a set of at-the-money swaptions was created, meaning that the strike rate $K$ is equal to the forward swap rate $R^N_n$. These swaptions were constructed on swap contracts written on the quarterly interest rate, hence there are 4 payments per year during the tenor of the swap. The implied volatilities are
\[
\sigma = \begin{bmatrix} 0.347 & 0.389 & 0.404 & 0.450 & 0.410 & 0.376 \end{bmatrix},
\]
with corresponding maturities
\[
\mathbf{T}^s = \begin{bmatrix} 0.5 & 1.0 & 3.0 & 5.0 & 8.0 & 10 \end{bmatrix},
\]
where the times in $\mathbf{T}^s$ are denoted in years.

With the maturities, tenors, strike rates and prices of the swaptions corresponding to the implied volatilities computed, the volatilities were reset to default values of 0.2. Then the Newton-Raphson solver iterated each volatility $\sigma^{(i)}$ until $\left|C^\star_i - \mathrm{PSN}^{N_i}_{n_i}(0; K_i)\right| < \varepsilon$ for $\varepsilon = 10^{-10}$. The resulting estimates are
\[
\sigma = \begin{bmatrix} 0.3469759 & 0.3889807 & 0.4039994 & 0.4499999 & 0.4100003 & 0.3779617 \end{bmatrix}^T,
\]
which are close to the real implied volatilities provided.

With the instantaneous forward rate as in Figure 5.2, and the solved implied volatilities, a Monte Carlo simulation was performed in order to obtain the distribution of $r(t)$. With mean-reversion parameter $a = 0.5$, the result can be seen in Figure 5.4, where 5 simulated paths and their antithetic counterparts (giving 10 paths in total) are plotted.

Figure 5.4: Graph of $f(0,T)$ in red, along with 10 Monte Carlo paths for the short rate $r(t)$.

A version of Figure 5.3 together with integrated Monte Carlo paths,
\[
\int_0^T r(u)\,du,
\]
is plotted in Figure 5.5. Again, each path has an antithetic counterpart, making the distribution symmetric around the mean $\int_0^T f(0,u)\,du$.


Figure 5.5: Graph of $\int_0^T f(0,u)\,du$ in red, along with 30 integrated Monte Carlo paths for the short rate, $\int_0^T r(u)\,du$.

With the paths generated, the sensitivities of the short rate $r(t_k)$ at time $t_k$ can be expressed in terms of the calibration parameters $r^\star_i$ and $C^\star_i$. Since the sensitivities $\bar r^\star_{i,r_k}$ defined in (5.13) are independent of the random variables $Z$, they are not affected by the Monte Carlo simulation. For a random point $t_k$ (in this case $k = 1337$) along the grid, the sensitivities of the short rate with respect to the calibration parameters $r^\star_i$ are
\[
\begin{aligned}
\frac{\partial r_{1337}}{\partial r^\star_0} &= 0.936, &
\frac{\partial r_{1337}}{\partial r^\star_1} &= -2.496, &
\frac{\partial r_{1337}}{\partial r^\star_2} &= 3.744, &
\frac{\partial r_{1337}}{\partial r^\star_3} &= -4.992, &
\frac{\partial r_{1337}}{\partial r^\star_4} &= 6.240, \\
\frac{\partial r_{1337}}{\partial r^\star_5} &= 16.128, &
\frac{\partial r_{1337}}{\partial r^\star_6} &= 4.816, &
\frac{\partial r_{1337}}{\partial r^\star_7} &= 0, &
\frac{\partial r_{1337}}{\partial r^\star_8} &= 0, &
\frac{\partial r_{1337}}{\partial r^\star_9} &= 0.
\end{aligned}
\]
This is due to the fact that $t_{1337} = 5.344$ in the discretized grid, hence $r^\star_5$ has the largest impact on the short rate at that point. These results can be replicated exactly using a finite difference method, since $r(t)$ is linear with respect to $r^\star_i$, albeit via a linear function through both $\phi$ and $f(0,T)$.

The sensitivities of $r_{1337}$ with respect to the volatilities $\sigma$, as defined in (5.14), are affected by the particular Monte Carlo path at hand. However, when using an antithetic sample, the random effects cancel immediately, and $\bar\sigma^{(i)}_{r_k}$ converges towards $\bar\sigma^{(i)}_{\phi_k} = \bar\sigma_\phi[i,k]$. The resulting adjoints are therefore (even with a sample of 2 antithetic paths)
\[
\begin{aligned}
\frac{\partial r_{1337}}{\partial \sigma^{(1)}} &= 0.05018857, &
\frac{\partial r_{1337}}{\partial \sigma^{(2)}} &= 0.07048655, &
\frac{\partial r_{1337}}{\partial \sigma^{(3)}} &= 0.49875497, \\
\frac{\partial r_{1337}}{\partial \sigma^{(4)}} &= 0.81266168, &
\frac{\partial r_{1337}}{\partial \sigma^{(5)}} &= 0.04095178, &
\frac{\partial r_{1337}}{\partial \sigma^{(6)}} &= 0.
\end{aligned}
\]

The maturities $\mathbf{T}^s$ imply that $\sigma^{(6)}$ is only applicable for $t_k > 8$, thus it has no effect on $r_{1337}$. These results are the analytical solutions to the partial derivatives, and since the relationship between $r(t)$ and $\sigma$ is not linear, they are not exactly replicated by a finite difference approximation.

The change in a swaption price due to a change in implied volatility corresponds to moving the red line in Figure 5.6 up or down. Since $\mathrm{PSN}^N_n(0, R^N_n)$ is approximately linear in the region around $C^\star_i$, we approximate the derivative
\[
\frac{d\sigma^{(i)}}{dC^\star_i} \approx \left(\frac{\partial C^\star_i}{\partial \sigma^{(i)}}\right)^{-1} = \left(\frac{\partial \mathrm{PSN}^{N_i}_{n_i}(0; K_i)}{\partial \sigma^{(i)}}\right)^{-1} = \frac{1}{\bar\sigma^{(i)}},
\]
where $\bar\sigma^{(i)}$ is defined by (5.18). This is a very crude approximation indeed, but in the small region around $C^\star_i$ it is good enough due to the shape of the graph in Figure 5.6. With this approximation, the derivative $d\sigma^{(i)}/dC^\star_i$ is of the order $10^5$. This might seem like a lot, but recall the small values of $C^\star_i$. Furthermore, if $C^\star_i$ increases by 10%, then $\sigma_{imp}$ will only increase by 20%.

Figure 5.6: Different swaption prices $\mathrm{PSN}^N_n(0, R^N_n)$ for $n = 0$ and $N = 0.5$, for different volatilities $\sigma^{(1)}$. The red line corresponds to the price of the swaption using the implied volatility $\sigma_{imp}$.


Chapter 6

Conclusions

Chapter 2 introduced Adjoint Algorithmic Differentiation and described the process, assisted by examples. Section 2.5 confirmed the performance of AAD suggested by Griewank (see (11)) and Capriotti (see (5)), specifically that the runtime is in the region of 3-5x the original runtime of the function. The results were compared to the forward Euler Finite Difference Method, whose runtime scales linearly with the number of inputs $n$.

In Chapter 3 AAD was implemented on a stochastic model, where the results could be compared to the analytical ones. By simulating the stochastic process using Monte Carlo with antithetic sampling, Section 3.2 could confirm that the numerical results using AAD converge toward the analytical results, thereby confirming that AAD works even in the stochastic setting.

In Chapter 4 the Hull-White model was introduced, and a method for fitting the model to an initial yield curve was presented. Then Section 4.3 introduced swaptions and described how their prices can be used to find the market's implied volatility $\sigma_{imp}$. Finally, all expressions were rewritten in a piecewise constant volatility version in Section 4.4.

Finally, Chapter 5 combines Chapters 2-4 by applying Adjoint Algorithmic Differentiation in the calibration of the Hull-White model. In order to do that, the adjoint versions of all input parameters are presented for the methods included in the calibration. The proposed instantaneous forward rate $f(0,T)$ is shown to have a nice fit to the initial yield curve generated by the market. The implied volatilities can then be solved with very high accuracy using the Newton-Raphson method. The sensitivities of the final short rate model with respect to the calibration parameters can then be computed without the need for a finite difference approximation. Furthermore, the sensitivities of the simulated short rate with respect to the calibration parameters could be computed without the need for a Monte Carlo simulation. Hence, the time-consuming Monte Carlo simulation is only needed when the probability distribution of the short rate $r(t)$ at $t$ is desired.


Chapter 7

Bibliography

[1] Benhamou, E. Smart Monte Carlo: various tricks using Malliavin calculus. Quantitative Finance 2, 5 (January 2002), 329-336.

[2] Björk, T. Arbitrage Theory in Continuous Time. Oxford University Press, 2009.

[3] Brigo, D., and Mercurio, F. Interest Rate Models - Theory and Practice: With Smile, Inflation and Credit. Springer Science & Business Media, 2007.

[4] Capriotti, L. Fast Greeks by algorithmic differentiation. Available at SSRN: https://ssrn.com/abstract=1619626.

[5] Capriotti, L., and Giles, M. Algorithmic differentiation: adjoint Greeks made easy. Available at SSRN: https://ssrn.com/abstract=1801522.

[6] Cox, J. C., Ingersoll Jr, J. E., and Ross, S. A. A theory of the term structure of interest rates. Econometrica: Journal of the Econometric Society (1985), 385-407.

[7] Di Nunno, G., Øksendal, B., and Proske, F. Malliavin Calculus for Lévy Processes with Applications to Finance, 1st ed. Springer Berlin Heidelberg, 2009.

[8] Edsberg, L. Introduction to Computation and Modeling for Differential Equations. John Wiley & Sons, 2008.

[9] Glasserman, P. Monte Carlo Methods in Financial Engineering, vol. 53. Springer Science & Business Media, 2013.

[10] Gomber, P., Arndt, B., Lutat, M., and Uhle, T. High-frequency trading. Available at SSRN: https://ssrn.com/abstract=1858626.

[11] Griewank, A., and Walther, A. Evaluating Derivatives. Society for Industrial and Applied Mathematics, 2008.

[12] Griewank, A., et al. On automatic differentiation. Mathematical Programming: Recent Developments and Applications 6, 6 (1989), 83-107.

[13] Gurrieri, S., Nakabayashi, M., and Wong, T. Calibration methods of Hull-White model. Available at SSRN: https://ssrn.com/abstract=1514192.

[14] Naumann, U. The Art of Differentiating Computer Programs. Society for Industrial and Applied Mathematics, 2012.

[15] Wengert, R. E. A simple automatic derivative evaluation program. Communications of the ACM 7, 8 (1964), 463-464.

[16] Wilmott, P. Derivatives: The Theory and Practice of Financial Engineering. Wiley, 1998.


Appendix A

Useful Proofs

Here are some theorems, propositions and lemmas that are used in the thesis.

Proposition A.1. Ito process
Suppose $\theta, \alpha \in \mathcal{L}^1$ and $\sigma \in \mathcal{L}^2$, and let $r_0$ be a random variable, measurable with respect to $\mathcal{F}_0$. Then there exists a unique Ito process $r$ such that
\[
dr(t) = \left(\theta(t) - \alpha(t) r(t)\right)dt + \sigma(t)\,dW(t), \quad r(0) = r_0. \tag{A.1}
\]
It is given by
\[
r(t) = e^{-\int_0^t \alpha(u)\,du}\left[r_0 + \int_0^t e^{\int_0^s \alpha(u)\,du}\,\theta(s)\,ds + \int_0^t e^{\int_0^s \alpha(u)\,du}\,\sigma(s)\,dW(s)\right],
\]
or, given the filtration $\mathcal{F}_\tau$,
\[
r(t) = e^{-\int_\tau^t \alpha(u)\,du}\left[r(\tau) + \int_\tau^t e^{\int_\tau^s \alpha(u)\,du}\,\theta(s)\,ds + \int_\tau^t e^{\int_\tau^s \alpha(u)\,du}\,\sigma(s)\,dW(s)\right].
\]
Proof. Suppose $r(t)$ is a process that solves (A.1). We then have
\[
\begin{aligned}
d\left(e^{\int_s^t \alpha(\nu)\,d\nu}\,r(t)\right) &= e^{\int_s^t \alpha(\nu)\,d\nu}\,dr(t) + e^{\int_s^t \alpha(\nu)\,d\nu}\,r(t)\alpha(t)\,dt + e^{\int_s^t \alpha(\nu)\,d\nu}\,\alpha(t)\underbrace{dt\,dr(t)}_{=0} \\
&= e^{\int_s^t \alpha(\nu)\,d\nu}\left(\left(\theta(t) - \alpha(t)r(t)\right)dt + \sigma(t)\,dW(t) + \alpha(t)r(t)\,dt\right) \\
&= e^{\int_s^t \alpha(\nu)\,d\nu}\left(\theta(t)\,dt + \sigma(t)\,dW(t)\right).
\end{aligned}
\]
Integrating both sides from $s$ to $t$ yields
\[
e^{\int_s^t \alpha(\nu)\,d\nu}\,r(t) - \underbrace{e^{\int_s^s \alpha(\nu)\,d\nu}}_{=1}\,r(s) = \int_s^t e^{\int_s^u \alpha(\nu)\,d\nu}\,\theta(u)\,du + \int_s^t e^{\int_s^u \alpha(\nu)\,d\nu}\,\sigma(u)\,dW(u),
\]
which implies
\[
r(t) = e^{-\int_s^t \alpha(\nu)\,d\nu}\left(r(s) + \int_s^t e^{\int_s^u \alpha(\nu)\,d\nu}\,\theta(u)\,du + \int_s^t e^{\int_s^u \alpha(\nu)\,d\nu}\,\sigma(u)\,dW(u)\right).
\]
Furthermore, since the above process satisfies
\[
e^{\int_s^t \alpha(\nu)\,d\nu}\left(\theta(t)\,dt + \sigma(t)\,dW(t)\right) = d\left(e^{\int_s^t \alpha(\nu)\,d\nu}\,r(t)\right) = e^{\int_s^t \alpha(\nu)\,d\nu}\,dr(t) + e^{\int_s^t \alpha(\nu)\,d\nu}\,r(t)\alpha(t)\,dt,
\]
we have
\[
\theta(t)\,dt + \sigma(t)\,dW(t) = dr(t) + \alpha(t) r(t)\,dt,
\]
which implies
\[
dr(t) = \left(\theta(t) - \alpha(t) r(t)\right)dt + \sigma(t)\,dW(t),
\]
and so the process $r(t)$ solves (A.1). $\square$

Proposition A.2. Integration by parts
If $X(t)$ and $Y(t)$ are one-dimensional Ito processes with
\[
dX(t) = a(t)\,dt + b(t)\,dW(t), \qquad dY(t) = \alpha(t)\,dt,
\]
then
\[
\int_t^T X(s)\,dY(s) = X(T)Y(T) - X(t)Y(t) - \int_t^T Y(s)\,dX(s).
\]
Proof. By Ito's lemma for a product, $X(t)Y(t)$ is an Ito process with
\[
d\left(X(t)Y(t)\right) = X(t)\,dY(t) + Y(t)\,dX(t),
\]
where the cross-variation term vanishes since $dY(t)$ has no $dW$ component. Integrating both sides of the above expression yields
\[
X(T)Y(T) - X(t)Y(t) = \int_t^T d\left(X(s)Y(s)\right) = \int_t^T X(s)\,dY(s) + \int_t^T Y(s)\,dX(s). \;\square
\]

Proposition A.3. Interchanging the order of integration
Suppose $\alpha \in \mathcal{L}^1$ and $\beta \in \mathcal{L}^2$. Then:
\[
\int_\tau^t \int_\tau^s \alpha(s)\beta(u)\,dW(u)\,ds = \int_\tau^t \int_u^t \alpha(s)\beta(u)\,ds\,dW(u). \tag{A.2}
\]
Proof. Define processes $X$ and $Y$ by:
\[
\begin{aligned}
dX(t) &= \beta(t)\,dW(t), \quad X(\tau) = 0 \;\Rightarrow\; X(t) = \int_\tau^t \beta(u)\,dW(u), \\
dY(t) &= \alpha(t)\,dt, \qquad\; Y(\tau) = 0 \;\Rightarrow\; Y(t) = \int_\tau^t \alpha(s)\,ds.
\end{aligned}
\]
Then, by applying Proposition A.2, the left-hand side of (A.2) becomes
\[
\begin{aligned}
\int_\tau^t \int_\tau^s \alpha(s)\beta(u)\,dW(u)\,ds &= \int_\tau^t \alpha(s)\underbrace{\int_\tau^s \beta(u)\,dW(u)}_{=X(s)}\,ds = \int_\tau^t X(s)\underbrace{\alpha(s)\,ds}_{=dY(s)} = \{\text{Proposition A.2}\} \\
&= X(t)Y(t) - \underbrace{X(\tau)Y(\tau)}_{=0} - \int_\tau^t Y(s)\,dX(s) \\
&= \int_\tau^t dX(u)\int_\tau^t dY(s) - \int_\tau^t\left(\int_\tau^s dY(u)\right)dX(s) \\
&= \int_\tau^t\left(\int_\tau^t dY(s)\right)dX(u) - \int_\tau^t\left(\int_\tau^u dY(s)\right)dX(u) \\
&= \int_\tau^t\left(\int_\tau^t dY(s) - \int_\tau^u dY(s)\right)dX(u) \\
&= \int_\tau^t\left(\int_u^t dY(s)\right)dX(u) = \int_\tau^t \int_u^t \alpha(s)\beta(u)\,ds\,dW(u). \;\square
\end{aligned}
\]

Definition A.4. Stochastic exponential
If $\alpha \in \mathcal{L}^1$ and $\beta \in \mathcal{L}^2$ are processes of dimension $1 \times 1$ and $1 \times K$, respectively, we define the one-dimensional process $\eta[\alpha,\beta]$ by
\[
\eta[\alpha,\beta](t) = \exp\left[\int_0^t \left(\alpha - \frac{1}{2}\beta\beta^T\right)ds + \int_0^t \beta\,dW\right].
\]
A process of this form is called a stochastic exponential.

Proposition A.5. Positive Ito processes
If $X$ is a one-dimensional positive Ito process, then there exist $\alpha \in \mathcal{L}^1$ and $\beta \in \mathcal{L}^2$ such that
\[
X(t) = X(0)\,\eta[\alpha,\beta](t),
\]
where $\eta[\alpha,\beta](t)$ is defined as in Definition A.4.
Proof. Since $X$ is positive, $\ln(X(t))$ is a well-defined Ito process. Write its differential as
\[
d\ln(X(t)) = \gamma\,dt + \beta\,dW.
\]
Now set
\[
\alpha = \gamma + \frac{1}{2}\beta\beta^T.
\]
Then we get
\[
\ln(X(t)) = \ln(X(0)) + \int_0^t \left(\alpha - \frac{1}{2}\beta\beta^T\right)ds + \int_0^t \beta\,dW.
\]
Taking the exponential of this gives us
\[
X(t) = X(0)\,\eta[\alpha,\beta](t). \;\square
\]


Remark. An interesting follow-up to Proposition A.5 is if $\eta[\alpha,\beta](t)$ is written using $Y(t) = \ln(X(t))$:
\[
\eta[\alpha,\beta](t) = \frac{X(t)}{X(0)} = \frac{\exp\left[\ln(X(t))\right]}{X(0)} = \frac{e^{Y(t)}}{X(0)}. \tag{A.3}
\]
By applying Ito's lemma to $f(x) = e^x$, noting that $\frac{\partial f(x)}{\partial x} = f(x)$ and $\frac{\partial f(x)}{\partial t} = 0$, it becomes
\[
\begin{aligned}
df(Y(t)) &= \left(\frac{\partial f(Y)}{\partial t} + \gamma\frac{\partial f(Y)}{\partial Y} + \frac{1}{2}\beta\beta^T\frac{\partial^2 f(Y)}{\partial Y^2}\right)dt + \beta\frac{\partial f(Y)}{\partial Y}\,dW \\
&= \left(0 + \gamma e^{Y(t)} + \frac{1}{2}\beta\beta^T e^{Y(t)}\right)dt + \beta e^{Y(t)}\,dW \\
&= \left(\left(\alpha - \frac{1}{2}\beta\beta^T\right)X(t) + \frac{1}{2}\beta\beta^T X(t)\right)dt + \beta X(t)\,dW \\
&= X(t)\left(\alpha\,dt + \beta\,dW\right) = \{\text{Using Proposition A.5}\} = X(0)\,\eta[\alpha,\beta](t)\left(\alpha\,dt + \beta\,dW\right). \tag{A.4}
\end{aligned}
\]
Combining (A.3) and (A.4) yields
\[
d\left(\eta[\alpha,\beta](t)\right) = d\left(\frac{f(Y(t))}{X(0)}\right) = \eta[\alpha,\beta](t)\left(\alpha\,dt + \beta\,dW\right). \tag{A.5}
\]

Lemma A.6. Integral of Gaussian processes (Lemma 4.2.1 in (3))
Consider the following two processes:
\[
\begin{aligned}
dX(t) &= -aX(t)\,dt + \sigma(t)\,dW_1(t), \quad X(0) = 0, \\
dY(t) &= -bY(t)\,dt + \eta(t)\,dW_2(t), \quad Y(0) = 0,
\end{aligned}
\]
where $(W_1, W_2)$ is a two-dimensional Brownian motion with instantaneous correlation $\rho$, i.e.
\[
dW_1(t)\,dW_2(t) = \rho\,dt,
\]
$a, b$ are constants, and $\sigma(t), \eta(t)$ are deterministic functions. Then, for each $t \le T$, the random variable
\[
I(t,T) = \int_t^T \left[X(u) + Y(u)\right]du, \tag{A.6}
\]
conditional on the sigma-field $\mathcal{F}_t$, is normally distributed with mean
\[
M(t,T) = \frac{1 - e^{-a(T-t)}}{a}X(t) + \frac{1 - e^{-b(T-t)}}{b}Y(t), \tag{A.7}
\]
and variance
\[
\begin{aligned}
V(t,T) &= \int_t^T \frac{\sigma(u)^2}{a^2}\left(1 - e^{-a(T-u)}\right)^2 du + \int_t^T \frac{\eta(u)^2}{b^2}\left(1 - e^{-b(T-u)}\right)^2 du \\
&\quad + \frac{2\rho}{ab}\int_t^T \sigma(u)\eta(u)\left(1 - e^{-a(T-u)}\right)\left(1 - e^{-b(T-u)}\right)du. \tag{A.8}
\end{aligned}
\]


Proof. By first splitting the integral in (A.6) into
\[
I(t,T) = \int_t^T X(u)\,du + \int_t^T Y(u)\,du, \tag{A.9}
\]
it is possible to work with each integral separately. By Proposition A.2 with $Z(t) = t$, such that $dZ(u) = du$, the first integral in (A.9) can be written as
\[
\begin{aligned}
\int_t^T X(u)\,du &= \int_t^T X(u)\,dZ(u) = X(T)Z(T) - X(t)Z(t) - \int_t^T Z(u)\,dX(u) \\
&= TX(T) - tX(t) - \int_t^T u\,dX(u) \\
&= \left(TX(t) - TX(t)\right) + TX(T) - tX(t) - \int_t^T u\,dX(u) \\
&= (T-t)X(t) + TX(T) - TX(t) - \int_t^T u\,dX(u) \\
&= (T-t)X(t) + \int_t^T T\,dX(u) - \int_t^T u\,dX(u) \\
&= (T-t)X(t) + \int_t^T (T-u)\,dX(u). \tag{A.10}
\end{aligned}
\]
From the definition of $dX(t)$, the integral on the right-hand side can be written as
\[
\int_t^T (T-u)\,dX(u) = -a\int_t^T (T-u)X(u)\,du + \int_t^T (T-u)\sigma(u)\,dW_1(u). \tag{A.11}
\]

By applying Proposition A.1 to $X(u)$, we can rewrite the first integral on the right-hand side of (A.11) as
\[
\begin{aligned}
-a\int_t^T (T-u)X(u)\,du
&= -a\int_t^T (T-u)\left(X(t)e^{-a(u-t)} + \int_t^u e^{-a(u-s)}\sigma(s)\,dW_1(s)\right)du \\
&= -aX(t)\int_t^T (T-u)e^{-a(u-t)}\,du - a\int_t^T (T-u)\int_t^u e^{-a(u-s)}\sigma(s)\,dW_1(s)\,du. \tag{A.12}
\end{aligned}
\]
These two integrals can be calculated separately, the first one being
\[
\begin{aligned}
-aX(t)\int_t^T (T-u)e^{-a(u-t)}\,du
&= -aX(t)\Bigg(\underbrace{\left[(T-u)\,\frac{e^{-a(u-t)}}{-a}\right]_{u=t}^{u=T}}_{=(T-t)/a} + \int_t^T \frac{e^{-a(u-t)}}{-a}\,du\Bigg) \\
&= -aX(t)\left(\frac{T-t}{a} + \left[\frac{e^{-a(u-t)}}{a^2}\right]_{u=t}^{u=T}\right) \\
&= X(t)\left(-(T-t) - \frac{e^{-a(T-t)} - 1}{a}\right). \tag{A.13}
\end{aligned}
\]

The second integral in (A.12) becomes
\[
\begin{aligned}
-a\int_t^T (T-u)\int_t^u e^{-a(u-s)}\sigma(s)\,dW_1(s)\,du
&= -a\int_t^T \int_t^u (T-u)e^{-a(u-s)}\sigma(s)\,dW_1(s)\,du \\
&= \{\text{Using Proposition A.3}\} \\
&= \int_t^T \underbrace{\left(-a\int_s^T (T-u)e^{-a(u-s)}\,du\right)}_{\text{Use Equation (A.13)}}\sigma(s)\,dW_1(s) \\
&= \int_t^T \left(-(T-s) - \frac{e^{-a(T-s)} - 1}{a}\right)\sigma(s)\,dW_1(s) \\
&= -\int_t^T \sigma(s)\left((T-s) + \frac{e^{-a(T-s)} - 1}{a}\right)dW_1(s). \tag{A.14}
\end{aligned}
\]
Putting (A.13) and (A.14) back into (A.12) gives
\[
-a\int_t^T (T-u)X(u)\,du = X(t)\left(-(T-t) - \frac{e^{-a(T-t)} - 1}{a}\right) - \int_t^T \sigma(s)\left((T-s) + \frac{e^{-a(T-s)} - 1}{a}\right)dW_1(s).
\]

This can be inserted back into (A.11) to get
\[
\begin{aligned}
\int_t^T (T-u)\,dX(u) &= -X(t)(T-t) - \frac{e^{-a(T-t)} - 1}{a}X(t) - \int_t^T \sigma(u)(T-u)\,dW_1(u) \\
&\quad - \int_t^T \sigma(u)\,\frac{e^{-a(T-u)} - 1}{a}\,dW_1(u) + \int_t^T \sigma(u)(T-u)\,dW_1(u) \\
&= -X(t)(T-t) - \frac{e^{-a(T-t)} - 1}{a}X(t) - \int_t^T \sigma(u)\,\frac{e^{-a(T-u)} - 1}{a}\,dW_1(u),
\end{aligned}
\]
which, in turn, implies that (A.10) becomes
\[
\begin{aligned}
\int_t^T X(u)\,du &= (T-t)X(t) - X(t)(T-t) - \frac{e^{-a(T-t)} - 1}{a}X(t) - \int_t^T \sigma(u)\,\frac{e^{-a(T-u)} - 1}{a}\,dW_1(u) \\
&= \frac{1 - e^{-a(T-t)}}{a}X(t) + \int_t^T \frac{\sigma(u)}{a}\left(1 - e^{-a(T-u)}\right)dW_1(u). \tag{A.15}
\end{aligned}
\]
The steps above are similar for $Y(t)$ in (A.9), giving the analogous expression
\[
\int_t^T Y(u)\,du = \frac{1 - e^{-b(T-t)}}{b}Y(t) + \int_t^T \frac{\eta(u)}{b}\left(1 - e^{-b(T-u)}\right)dW_2(u). \tag{A.16}
\]

Hence
\[
\begin{aligned}
E\left[I(t,T)\,|\,\mathcal{F}_t\right] &= E\left[\int_t^T X(u)\,du + \int_t^T Y(u)\,du \,\Big|\, \mathcal{F}_t\right] \\
&= \frac{1 - e^{-a(T-t)}}{a}X(t) + \frac{1 - e^{-b(T-t)}}{b}Y(t) = M(t,T), \tag{A.17}
\end{aligned}
\]
since the stochastic integrals in (A.15) and (A.16) have zero conditional expectation, which proves (A.7).

As for the conditional variance of (A.6), the definition of variance yields
\[
Var\left\{I(t,T)\,|\,\mathcal{F}_t\right\} = E\left[\left(\int_t^T X(u)\,du + \int_t^T Y(u)\,du\right)^2 \Bigg|\, \mathcal{F}_t\right] - E\left[\int_t^T X(u)\,du + \int_t^T Y(u)\,du \,\Bigg|\, \mathcal{F}_t\right]^2, \tag{A.18}
\]
where it follows from the calculations above that the second expectation is equal to Equation (A.7) squared, specifically
\[
M(t,T)^2 = \left(\frac{1 - e^{-a(T-t)}}{a}X(t)\right)^2 + \left(\frac{1 - e^{-b(T-t)}}{b}Y(t)\right)^2 + 2\left(\frac{1 - e^{-a(T-t)}}{a}X(t)\right)\left(\frac{1 - e^{-b(T-t)}}{b}Y(t)\right). \tag{A.19}
\]

The first expectation needs a bit more work, however. Inserting (A.15) and (A.16),
\[
E\left[\left(\int_t^T X(u)\,du + \int_t^T Y(u)\,du\right)^2 \Bigg|\, \mathcal{F}_t\right]
= E\Bigg[\bigg(\frac{1 - e^{-a(T-t)}}{a}X(t) + \int_t^T \frac{\sigma(u)}{a}\left(1 - e^{-a(T-u)}\right)dW_1(u)
+ \frac{1 - e^{-b(T-t)}}{b}Y(t) + \int_t^T \frac{\eta(u)}{b}\left(1 - e^{-b(T-u)}\right)dW_2(u)\bigg)^2 \Bigg|\, \mathcal{F}_t\Bigg].
\]
For a less cumbersome notation, let
\[
\begin{aligned}
A(t) &= \frac{1 - e^{-a(T-t)}}{a}X(t), & \alpha(t) &= \int_t^T \frac{\sigma(u)}{a}\left(1 - e^{-a(T-u)}\right)dW_1(u), \\
B(t) &= \frac{1 - e^{-b(T-t)}}{b}Y(t), & \beta(t) &= \int_t^T \frac{\eta(u)}{b}\left(1 - e^{-b(T-u)}\right)dW_2(u).
\end{aligned}
\]
This gives
\[
\begin{aligned}
&E\left[A^2(t) + 2A(t)\alpha(t) + 2A(t)B(t) + 2A(t)\beta(t) + \alpha(t)^2 + 2\alpha(t)B(t) + 2\alpha(t)\beta(t) + B(t)^2 + 2B(t)\beta(t) + \beta(t)^2 \,\big|\, \mathcal{F}_t\right] \\
&= E\left[A^2(t)\,\big|\,\mathcal{F}_t\right] + 2E\left[A(t)B(t)\,\big|\,\mathcal{F}_t\right] + E\left[B^2(t)\,\big|\,\mathcal{F}_t\right] \\
&\quad + E\left[2A(t)\alpha(t) + 2A(t)\beta(t) + 2\alpha(t)B(t) + 2B(t)\beta(t)\,\big|\,\mathcal{F}_t\right] \\
&\quad + E\left[\alpha(t)^2 + 2\alpha(t)\beta(t) + \beta(t)^2\,\big|\,\mathcal{F}_t\right].
\end{aligned}
\]
The first three expectations are deterministic and together equal the $M(t,T)^2$ seen in (A.19). The fourth expectation vanishes, since $A(t) + B(t)$ is $\mathcal{F}_t$-measurable and the stochastic integrals have zero conditional expectation:
\[
E\left[2\alpha(t)\left(A(t) + B(t)\right) + 2\beta(t)\left(A(t) + B(t)\right)\,\big|\,\mathcal{F}_t\right]
= 2\left(A(t) + B(t)\right)\Big(\underbrace{E\left[\alpha(t)\,\big|\,\mathcal{F}_t\right]}_{=0} + \underbrace{E\left[\beta(t)\,\big|\,\mathcal{F}_t\right]}_{=0}\Big) = 0.
\]

Thus (A.18) becomes
\[
\begin{aligned}
Var\left\{I(t,T)\,|\,\mathcal{F}_t\right\} &= M(t,T)^2 + E\left[\alpha(t)^2 + 2\alpha(t)\beta(t) + \beta(t)^2\,\big|\,\mathcal{F}_t\right] - M(t,T)^2 \\
&= E\left[\alpha(t)^2\,\big|\,\mathcal{F}_t\right] + 2E\left[\alpha(t)\beta(t)\,\big|\,\mathcal{F}_t\right] + E\left[\beta(t)^2\,\big|\,\mathcal{F}_t\right].
\end{aligned}
\]
Returning to the original notation, the expectations become
\[
\begin{aligned}
E\left[\left(\int_t^T \frac{\sigma(u)}{a}\left(1 - e^{-a(T-u)}\right)dW_1(u)\right)^2 \Bigg|\,\mathcal{F}_t\right] &= \frac{1}{a^2}\int_t^T \sigma(u)^2\left(1 - e^{-a(T-u)}\right)^2 du, \\
E\left[\left(\int_t^T \frac{\eta(u)}{b}\left(1 - e^{-b(T-u)}\right)dW_2(u)\right)^2 \Bigg|\,\mathcal{F}_t\right] &= \frac{1}{b^2}\int_t^T \eta(u)^2\left(1 - e^{-b(T-u)}\right)^2 du, \\
2E\left[\int_t^T \frac{\sigma(u)}{a}\left(1 - e^{-a(T-u)}\right)dW_1(u)\int_t^T \frac{\eta(u)}{b}\left(1 - e^{-b(T-u)}\right)dW_2(u) \,\Bigg|\,\mathcal{F}_t\right] &= \frac{2\rho}{ab}\int_t^T \sigma(u)\eta(u)\left(1 - e^{-a(T-u)}\right)\left(1 - e^{-b(T-u)}\right)du,
\end{aligned}
\]
which proves (A.8). $\square$
