
MSc Stochastics and Financial Mathematics

Master Thesis

Calibration of Affine Model

Author: Jian He

Supervisors: dr. P.J.C. Spreij, dr. P.W. den Iseger

Examination date: August, 2016

Korteweg-de Vries Institute for Mathematics

ABN AMRO Bank N.V.

Abstract

With the development of financial modeling, the need for more general and sophisticated models is rapidly increasing. These models can capture more features of the real underlying, such as stochastic volatilities and interest rate spreads, and hence can model derivatives more accurately. To this end, this thesis is devoted to research on the affine model, oriented towards financial applications.

The affine model is a sophisticated model that describes a class of stochastic processes with affine structure in the logarithm of their characteristic function. The main contribution of this thesis is a method for the calibration of affine models. The numerical caplet pricing method and the optimization algorithm form the core of this thesis. We introduce an efficient caplet (floorlet) pricing algorithm that involves the Laplace transform inversion technique. Moreover, the Kalman filter algorithm is used for the calibration of affine models. Finally, the two factor Hull White model with stochastic volatility is used as an example to illustrate the calibration performance.

Key words: Affine Model, Riccati Equations, Laplace Transform Inversion, Kalman Filter

Title: Calibration of Affine Model
Author: Jian He, [email protected], 10867317
Supervisors: dr. P.J.C. Spreij, dr. P.W. den Iseger
Examination date: August, 2016

Korteweg-de Vries Institute for Mathematics
University of Amsterdam
Science Park 105-107, 1098 XG Amsterdam
http://kdvi.uva.nl

ABN AMRO Bank N.V.
Gustav Mahlerlaan 10, 1082 PP Amsterdam
http://www.abnamro.com


Acknowledgements

First, I would like to thank my daily supervisor Dr. Peter den Iseger for his guidance and support. I would also like to thank my supervisor Dr. Peter Spreij for his guidance in mathematics. Furthermore, I want to thank my colleagues in the Market & ALM/T Risk Modelling department of ABN AMRO Bank for their help during the process. Finally, I would like to thank my family for their support.


Contents

1. Interest Rates
1.1. Bank Account and Short Rate
1.2. Zero Coupon Bond and Forward Rates
1.3. Interest Rate Swaps
1.4. Caps and Floors

2. Affine Model
2.1. Definition of Affine Processes
2.2. Discounting in Affine Models

3. European Bond Option Pricing in Affine Model
3.1. Numerical Solution to Riccati equations
3.2. Laplace Transform Inversion
3.2.1. Legendre Polynomials
3.2.2. Poisson Summation Formula
3.2.3. Laplace Transform Inversion Algorithm
3.2.4. Damping factors in IFFT Algorithm
3.2.5. Example: Vasicek Short Rate Model
3.2.6. Example: CIR Model
3.3. Valuation of Option
3.3.1. Bond Option Valuation
3.3.2. Caplet Valuation
3.3.3. Example: CIR Model
3.3.4. Example: Heston Stochastic Volatility Model

4. Calibration of Affine Models by Kalman Filter
4.1. Outline of Kalman Filter Algorithm
4.2. Linear Kalman Filter
4.3. Extended Kalman Filter
4.4. Calibration Algorithm
4.5. QR Decomposition for Linear or Extended Kalman Filter

5. Calibration Example: Two Factor Hull White Model with Stochastic Volatility
5.1. Data Selection
5.2. Kalman Filter Model for Two Factor Hull White Model with Stochastic Volatility
5.3. Derivative Computation of Affine Model for EKF Application
5.4. Results
5.4.1. Parameterization
5.4.2. Small Noise Testing
5.4.3. Higher Noise Testing

6. Conclusion

Popular summary

Bibliography

Appendices

A. Proof of Lemma 3.2.1
B. Proof of Theorem 3.2.2
C. Proof of Theorem 3.2.3
D. Numerical Algorithm for Swaption Pricing

Introduction

While many traditional models are successfully used in financial modeling, more sophisticated models are desired. One example is the stochastic volatility model. It resolves a shortcoming of the models based on Black-Scholes, namely that the underlying volatility is constant over the life of the derivative. Sophisticated models can model derivatives more accurately; however, the complexity of these models may bring many difficulties in derivative valuation and model calibration, which makes their financial application difficult. The balance between model complexity and derivative calculation speed motivates the choice of the affine model.

The affine model is the model for a class of continuous stochastic Markov processes whose log-characteristic function is affine w.r.t. the initial state of the process. The earliest paper to discuss processes with this property was by Kawazu and Watanabe [1971]. In recent years, with the development of financial mathematics, the affine model has become popular in financial applications due to its nice properties. Duffie, Filipovic, and Schachermayer [2003] defined an affine process as a continuous Markov process with canonical state space $\mathbb{R}^m_+ \times \mathbb{R}^n$ and with an affine structure of the log-characteristic function. They also gave a rigorous mathematical foundation to the theory of affine processes, such as the characterization of affine processes by admissible parameters and the ordinary differential equations implied by the affine process. The theoretical background of affine models in this thesis is based on their research.

The affine model has good properties for financial applications. First, the affine model is a generic model which includes most models commonly used in financial engineering, such as the CIR model, the Heston model and the Hull White model. Second, the affine model can describe many asset classes simultaneously, such as yield curves, inflation, FX and equity returns. Moreover, it can easily capture features like spreads and stochastic volatility. Finally, a semi-analytical solution to the option price is available in affine models. This enables us to compute option prices efficiently.

The parameters of a model need to be determined before the model is used. The process of finding the best parameters is often called calibration. The parameters are determined by matching the model predictions to the available market data (i.e. derivative prices). The classical methods to match predictions and market data are maximum likelihood estimation (MLE) and the least squares method (LSM). The main calibration instruments (market data) are yield curves, caplets/floorlets and European-style swaptions. Depending on the application, there are two different sets of data for financial model calibration: historical data and current market data. Calibration on time series of historical data can lead to the dynamics under the real-world measure $\mathbb{P}$. By contrast, calibration on current market data gives us the dynamics under the risk neutral measure $\mathbb{Q}$. The difference between these two measures is that the real-world measure $\mathbb{P}$ contains a risk premium. This risk premium is the market price of risk. Although the real world and the risk neutral world are different, in financial risk modeling we need both measures. On the one hand, we need to generate scenarios for the risk factors (underlying process) under the real-world measure $\mathbb{P}$; on the other hand, option pricing should be consistent with the market, namely under the risk neutral measure. Affine models allow for easy translation between these two measures. In the calibration part of this thesis, for convenience, we assume that the real-world measure $\mathbb{P}$ and the risk neutral measure $\mathbb{Q}$ are the same.

The calibration approach of a financial model depends on its characteristics. One thing to note is that we can only observe the prices of the calibration instruments (yield curves, options etc.); the states of the underlying process are latent. However, in affine models, the expressions for the instruments are based on the state of the underlying process. So in order to calibrate affine models, the algorithm should be able to estimate the risk factors. Moreover, the model cannot exactly recover the yield curves and option prices, since the model is only a reflection of reality. In other words, the market prices contain noise. We therefore need to add a measurement error in pricing. For these reasons, we decide to use the Kalman filter for the calibration.

The Kalman filter is an algorithm to estimate the latent states of a dynamic system using measurements that are a function of the system state but corrupted by white noise. It is named after Rudolf Kalman, one of the primary developers of its theory. The Kalman filter is widely used in practical areas such as signal processing, continuous manufacturing processes and aircraft. The traditional Kalman filter (linear Kalman filter) requires the dynamic system to be linear, and likewise the mapping from the system states to the measurements. For nonlinear systems and mappings, extensions and generalizations have also been developed, such as the extended Kalman filter and the unscented Kalman filter. In this thesis, the extended Kalman filter is used.
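To make the predict/update idea concrete, here is a minimal scalar sketch of one linear Kalman filter step in Python. It is only an illustration under assumed notation (transition coefficient `F`, observation coefficient `H`, noise variances `Q` and `R`, all hypothetical names), not the filter developed later in this thesis.

```python
def kalman_step(x_est, p_est, y, F=1.0, H=1.0, Q=0.0, R=1.0):
    """One predict/update step for the scalar model
    x_k = F x_{k-1} + w_k,  y_k = H x_k + v_k,  Var(w) = Q, Var(v) = R."""
    # Prediction of the latent state and its variance
    x_pred = F * x_est
    p_pred = F * p_est * F + Q
    # Update with the noisy measurement y
    innovation = y - H * x_pred
    innovation_var = H * p_pred * H + R
    gain = p_pred * H / innovation_var
    x_new = x_pred + gain * innovation
    p_new = (1.0 - gain * H) * p_pred
    return x_new, p_new


# Hypothetical data: a constant latent state observed three times as 2.0
x, p = 0.0, 1.0  # prior mean and variance (assumed)
for y in [2.0, 2.0, 2.0]:
    x, p = kalman_step(x, p, y)
print(x, p)
```

With each measurement the estimate moves from the prior towards the observed level while the variance of the estimate shrinks.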

In the first chapter of this thesis, the basic concepts of interest rates and associated derivatives are introduced. Next we discuss the definition of affine processes and their properties. They form the theoretical foundation of this thesis. The calibration requires two pieces of work: derivative pricing and optimization. These are the core parts of this thesis and are presented in the next two chapters. In the third chapter, the option pricing algorithm is discussed. This algorithm involves the numerical solution of the ordinary differential equations (Riccati equations) and the Laplace transform inversion algorithm. The discrete Kalman filter and the calibration algorithm are discussed in the fourth chapter. Moreover, the QR decomposition is applied in the linear and extended Kalman filter to improve their efficiency. In the fifth chapter, we use the extended Kalman filter to calibrate the two factor Hull White model with stochastic volatility. In this special case, we also develop an efficient algorithm to compute the derivatives of the caplet (floorlet) value w.r.t. the underlying process. Finally, we propose improvements and directions for future research.

This thesis is motivated by ABN AMRO's project "Multi-curve scenario model", which aims to estimate the risk exposure across multiple curves under market shocks. This project consists of two parts (steps): the first is to calibrate the chosen multi-curve interest rate model, and the second is to generate scenarios from the calibrated model and estimate the risk exposure.

The main contribution of this thesis is a method to calibrate the affine model for financial applications. We develop a numerical algorithm to solve the complex Riccati equations in the affine model. We also present an efficient numerical pricing algorithm for the caplet (floorlet) in affine models. This pricing algorithm is based on the Laplace transform inversion techniques developed by den Iseger [2006] and den Iseger [2008]. Furthermore, we propose the discrete Kalman filter algorithm for the calibration of the affine model and illustrate the calibration performance of the extended Kalman filter through the example of the two factor Hull White model with stochastic volatility. It is also worth noting that the Kalman filter can statistically estimate the correlation between the latent risk factors of different asset classes, such as interest rate products (yield curves, caplets) and FX. This enables us to generate multiple scenarios under the real-world measure $\mathbb{P}$, which is an essential step in the second part (step) of the project.

1. Interest Rates

Money has time value. Normally, money in hand today is worth more than the same amount in the future. This is due to the opportunity cost of holding the money, since one can always put money into a bank account to earn interest at some rate, which normally is positive. These rates are called interest rates in daily life. However, expressing such concepts mathematically is not straightforward since these rates change over time. In this chapter, we introduce the definitions of different interest rates and interest rate derivatives.

1.1. Bank Account and Short Rate

The bank account represents a riskless investment, whose profit accumulates continuously at the risk-free rate. This rate is usually referred to as the "short rate" or "instantaneous spot rate".

Definition 1.1.1. Let $B(t)$ be the value of a bank account at time $t \ge 0$, with $B(0) = 1$. We assume $B(t)$ satisfies the differential equation

$$dB(t) = r_t B(t)\,dt, \qquad B(0) = 1,$$

where $r_t$ is the short rate, which can be stochastic. Equivalently, $B(t)$ can be represented as

$$B(t) = e^{\int_0^t r_u\,du}.$$

When dealing with interest rate products, it is important to realise the probabilistic nature ofr. The models that describe the stochastic property of r are called short rate models.

1.2. Zero Coupon Bond and Forward Rates

Definition 1.2.1. (Zero Coupon Bond) A $T$-maturity zero coupon bond is a contract that guarantees its holder the payment of one unit of currency at time $T$. The contract value at time $t \le T$ is denoted by $P(t, T)$.

It is clear that $P(T, T) = 1$. A natural question is what the fair price $P(t, T)$ of the zero coupon bond should be for $t < T$. Assume that the market is arbitrage free and that $\mathbb{Q}$ is a martingale measure. Then for $t < T$,

$$P(t, T) = B(t)\,E^{\mathbb{Q}}\left[\frac{P(T, T)}{B(T)} \,\Big|\, \mathcal{F}_t\right] = E^{\mathbb{Q}}\left[\frac{B(t)}{B(T)} \,\Big|\, \mathcal{F}_t\right] = E^{\mathbb{Q}}\left[e^{-\int_t^T r_u\,du} \,\Big|\, \mathcal{F}_t\right].$$

The fair price of a zero coupon bond is thus the expectation of the future payment (one unit of currency) discounted by the bank account.

While zero coupon bonds reflect the price of a loan from today to a future date, forward rates reflect the price of a loan between two future dates. The forward rate involves three time instants $t \le T \le S$: the current date $t$, the expiry date $T$ and the maturity date $S$. We can define the forward rates through a prototypical forward rate agreement (FRA). There are several different forward rates:

Definition 1.2.2. (Forward rates)

1. The simple forward rate for $[T, S]$ prevailing at $t$ is given by

$$F(t, T, S) = \frac{1}{S - T}\left(\frac{P(t, T)}{P(t, S)} - 1\right).$$

2. The simple spot rate for $[t, T]$ is

$$F(t, T) = F(t, t, T) = \frac{1}{T - t}\left(\frac{1}{P(t, T)} - 1\right).$$

3. The continuously compounded forward rate for $[T, S]$ prevailing at $t$ is

$$R(t, T, S) = -\frac{\log P(t, S) - \log P(t, T)}{S - T},$$

which is equivalent to

$$e^{R(t, T, S)(S - T)} = \frac{P(t, T)}{P(t, S)}.$$

4. The continuously compounded spot rate (zero rate) for $[t, T]$ is

$$R(t, T) = R(t, t, T) = -\frac{\log P(t, T)}{T - t}.$$
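As a small illustration of the forward rate definitions, the following Python sketch computes the simple and the continuously compounded forward rate from two bond prices. The flat 3% zero curve and all function names are assumptions made for this example only.

```python
import math


def simple_forward_rate(p_t_T: float, p_t_S: float, T: float, S: float) -> float:
    """Simple forward rate F(t, T, S) from the bond prices P(t, T) and P(t, S)."""
    return (p_t_T / p_t_S - 1.0) / (S - T)


def continuous_forward_rate(p_t_T: float, p_t_S: float, T: float, S: float) -> float:
    """Continuously compounded forward rate R(t, T, S)."""
    return -(math.log(p_t_S) - math.log(p_t_T)) / (S - T)


# Assumption for illustration: flat curve with a 3% continuously compounded zero rate.
ZERO_RATE = 0.03


def discount(t: float, T: float) -> float:
    """Bond price P(t, T) on the assumed flat curve."""
    return math.exp(-ZERO_RATE * (T - t))


t, T, S = 0.0, 1.0, 1.5
F = simple_forward_rate(discount(t, T), discount(t, S), T, S)
R = continuous_forward_rate(discount(t, T), discount(t, S), T, S)
print(f"simple forward rate F = {F:.6f}, continuous forward rate R = {R:.6f}")
```

On a flat curve $R(t, T, S)$ equals the zero rate, and the two conventions are linked by $1 + (S - T)F(t, T, S) = e^{R(t, T, S)(S - T)}$. The spot rates are obtained by passing $P(t, t) = 1$ as the first bond price.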

1.3. Interest Rate Swaps

A swap is an agreement between two parties to exchange a payment stream at a fixed rate of interest for a payment stream at a floating rate (typically LIBOR). The two payment streams are usually referred to as the fixed leg and the floating leg, respectively.

There are many versions of interest rate swaps. A payer interest rate swap is specified by:

1. a number of future dates $T_0 < T_1 < \cdots < T_n$, with $T_i - T_{i-1} = \delta_i$;
2. a fixed rate $K$;
3. the floating rate at time $T_i$: $F(T_{i-1}, T_i)$, $i = 1, \cdots, n$;
4. a nominal value $N$.

By definition the net gain at time $T_i$ is $\delta_i N (F(T_{i-1}, T_i) - K)$. The value of this net gain at time $t \le T_0$ is

$$N\left(P(t, T_{i-1}) - P(t, T_i) - K\delta_i P(t, T_i)\right).$$

So the total value of the swap at time $t$ is

$$\Pi_p(t) = N\left(P(t, T_0) - P(t, T_n) - K\sum_{i=1}^n \delta_i P(t, T_i)\right).$$

Conversely, the holder of a receiver interest rate swap receives the fixed leg and pays the floating leg, so the value of a receiver interest rate swap at time $t$ is

$$\Pi_r(t) = -\Pi_p(t).$$

The "fair" price of the swap comes naturally from the word "fair": both parties in the interest rate swap earn the same amount. This gives the forward swap rate (the fair rate) $R_{swap}(t)$ at time $t$, which is the fixed rate $K$ such that $\Pi_r(t) = \Pi_p(t) = 0$. Hence

$$R_{swap}(t) = \frac{P(t, T_0) - P(t, T_n)}{\sum_{i=1}^n \delta_i P(t, T_i)}.$$
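The swap value and the forward swap rate only require the bond prices on the tenor dates. The Python sketch below evaluates both; the flat 2% curve, the tenor dates and the function names are hypothetical choices for illustration.

```python
import math


def annuity(discounts, deltas):
    """Sum of delta_i * P(t, T_i) for i = 1..n; discounts[0] is P(t, T_0)."""
    return sum(d * p for d, p in zip(deltas, discounts[1:]))


def payer_swap_value(discounts, deltas, fixed_rate, notional=1.0):
    """Value of the payer swap: N (P(t,T_0) - P(t,T_n) - K * annuity)."""
    return notional * (discounts[0] - discounts[-1]
                       - fixed_rate * annuity(discounts, deltas))


def forward_swap_rate(discounts, deltas):
    """Fixed rate that makes the swap value zero."""
    return (discounts[0] - discounts[-1]) / annuity(discounts, deltas)


# Assumed inputs: semi-annual dates T_0..T_4 and a flat 2% zero curve seen from t = 0
dates = [1.0, 1.5, 2.0, 2.5, 3.0]
discounts = [math.exp(-0.02 * T) for T in dates]
deltas = [dates[i] - dates[i - 1] for i in range(1, len(dates))]

par_rate = forward_swap_rate(discounts, deltas)
value_at_par = payer_swap_value(discounts, deltas, par_rate)
value_below_par = payer_swap_value(discounts, deltas, par_rate - 0.01)
print(par_rate, value_at_par, value_below_par)
```

At the forward swap rate the swap is worth zero to both parties; a payer who pays less than that rate holds a swap with positive value.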

1.4. Caps and Floors

A caplet with reset date $T$ and settlement date $S > T$ is an option which pays the holder the positive part of the difference between the simple spot rate $F(T, S)$ and the strike rate $k$ at time $S$. Its cash flow at time $S$ is

$$(S - T)(F(T, S) - k)^+.$$

Its value at time $t$ is written as $cpl(t, T, S)$. We call the time period $T - t$ the expiry and $S - T$ the maturity.

A cap is a strip of caplets. The price of an $n$-period cap with strike rate $k$ is given by

$$cp(t) = \sum_{i=1}^n cpl(t, T_{i-1}, T_i).$$

A floor is the converse of a cap. It is a strip of floorlets, and its cash flow at time $T_i$ is

$$(T_i - T_{i-1})(k - F(T_{i-1}, T_i))^+.$$

Writing $fll(t, T_{i-1}, T_i)$ for the price of the floorlet at time $t$, the time $t$ price of the floor is given by

$$fl(t) = \sum_{i=1}^n fll(t, T_{i-1}, T_i).$$

A cap gives the holder protection against high interest rates. It guarantees that the floating rate to be paid never exceeds the predetermined strike rate $k$. Floors, on the other hand, protect against low rates.

Caps and floors are strongly related to interest rate swaps. A cap can be viewed as a payer swap where each payment is executed only if its net gain is positive. Similarly, a floor can be viewed as a receiver swap where each payment is executed only if its net gain is positive. Moreover, if the swap has nominal one and the same strike $k$ and tenor structure as the cap and floor, one can show the parity relation

$$cp(t) - fl(t) = \Pi_p(t).$$
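A short sketch of why the parity holds, using only the payoffs defined above and the identity $x^+ - (-x)^+ = x$: for each period,

$$\delta_i\,(F(T_{i-1}, T_i) - k)^+ - \delta_i\,(k - F(T_{i-1}, T_i))^+ = \delta_i\,(F(T_{i-1}, T_i) - k),$$

which is the net gain at $T_i$ of a payer swap with nominal one and fixed rate $k$. Taking time $t$ values and summing over $i = 1, \cdots, n$ gives

$$cp(t) - fl(t) = \sum_{i=1}^n \left(P(t, T_{i-1}) - P(t, T_i) - k\,\delta_i P(t, T_i)\right) = \Pi_p(t).$$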

Definition 1.4.1. The cap or floor is said to be at-the-money (ATM) if the strike equals the swap rate, namely,

$$k = R_{swap}(t) = \frac{P(t, T_0) - P(t, T_n)}{\sum_{i=1}^n \delta_i P(t, T_i)}, \qquad \delta_i = T_i - T_{i-1}.$$

Moreover, the cap (floor) is in-the-money (ITM) if $k < R_{swap}(t)$ ($k > R_{swap}(t)$) and out-of-the-money (OTM) if $k > R_{swap}(t)$ ($k < R_{swap}(t)$).

2. Affine Model

The affine model is the model that describes affine processes. Affine processes are a class of continuous Markov processes with the "affine" property. The key word "affine" means that the logarithm of the conditional characteristic function of the underlying process is affine with respect to the initial state of the process. The coefficients defining the affine relation are given by the solutions of a system of ordinary differential equations, called "Riccati equations". The nice analytical properties and the combination of many different models within one theoretical framework make affine processes attractive in financial applications.

In this chapter, we introduce the definition of affine processes and their properties. The definitions and theorems are based on [1]. We fix the probability space $(\Omega, \mathcal{F}, \mathbb{Q})$, where $\mathbb{Q}$ is the risk neutral measure, and fix the filtration $\mathbb{F} = (\mathcal{F}_t)_{t \ge 0}$ on this probability space.

2.1. Definition of Affine Processes

Given a state space $\chi \subset \mathbb{R}^d$ and a filtration $\mathbb{F}$, suppose the adapted $d$-dimensional stochastic process $X$ satisfies the stochastic differential equation

$$dX(t) = b(t, X(t))\,dt + \rho(t, X(t))\,dW_t, \qquad X(0) = x,$$

where $b$ takes values in $\mathbb{R}^d$, $\rho$ takes values in $\mathbb{R}^{d \times d}$, and both are measurable. The diffusion matrix is defined by $a(t, x) = \rho(t, x)\rho(t, x)^\top$.

Definition 2.1.1. We say $X$ is affine if the $\mathcal{F}_t$-conditional characteristic function of $X(T)$ is exponential affine in $X(t)$, for all $t \le T$. Namely, there exist $\mathbb{C}$- and $\mathbb{C}^d$-valued functions $\phi(t, u)$ and $\psi(t, u)$, respectively, satisfying

$$E\left[e^{u^\top X(T)} \mid \mathcal{F}_t\right] = e^{\phi(T - t, u) + \psi(T - t, u)^\top X(t)}, \tag{2.1.1}$$

for all $u \in i\mathbb{R}^d$, $t \le T$ and $x \in \chi$.

Equation (2.1.1) gives the characteristic function of $X(T)$ conditioned on $X(t)$. The characteristic function is quite useful in financial engineering: in most cases we cannot obtain an explicit formula for the distribution of the underlying process, but the characteristic function provides the full information about the distribution.

Note that the left hand side of (2.1.1) is a martingale in $t$, so if we apply Ito's lemma to the right hand side, the drift term has to be zero. This leads to the following theorem, which gives an equivalent condition for affine processes and the implied dynamics of the functions $\phi(t, u)$ and $\psi(t, u)$.

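Theorem 2.1.1. Suppose $X$ is affine. Then the drift $b(t, x)$ and the diffusion matrix $a(t, x)$ are affine w.r.t. $x$, namely

$$a(t, x) = a + \sum_{i=1}^d x_i \alpha_i, \qquad b(t, x) = b + \sum_{i=1}^d x_i \beta_i = b + Bx \tag{2.1.2}$$

for some $d \times d$ matrices $a$ and $\alpha_i$ and $d$-dimensional vectors $b$ and $\beta_i$, where we denote $B = (\beta_1, \cdots, \beta_d)$. Moreover, $\phi$ and $\psi = (\psi_1, \cdots, \psi_d)^\top$ solve the system of Riccati equations

$$\begin{aligned}
\partial_t \phi(t, u) &= \tfrac{1}{2}\psi(t, u)^\top a\,\psi(t, u) + b^\top \psi(t, u), & \phi(0, u) &= 0, \\
\partial_t \psi_i(t, u) &= \tfrac{1}{2}\psi(t, u)^\top \alpha_i\,\psi(t, u) + \beta_i^\top \psi(t, u), & \psi(0, u) &= u.
\end{aligned} \tag{2.1.3}$$

Conversely, suppose the drift $b(t, x)$ and the diffusion matrix $a(t, x)$ are affine of the form (2.1.2), and suppose there exists a solution $(\phi, \psi)$ of the system of Riccati equations (2.1.3) such that $\phi(t, u) + \psi(t, u)^\top x$ has non-positive real part for all $t \ge 0$, $u \in i\mathbb{R}^d$ and $x \in \chi$. Then $X$ is affine.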
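As a worked illustration of the affine structure, consider a one-dimensional square-root (CIR type) process on $\chi = \mathbb{R}_+$. The volatility parameter $\sigma > 0$ is notation introduced only for this example:

$$dX(t) = (b + \beta X(t))\,dt + \sigma\sqrt{X(t)}\,dW_t.$$

Here $\rho(t, x) = \sigma\sqrt{x}$, so the diffusion matrix is $a(t, x) = \sigma^2 x$, which is of the form (2.1.2) with $a = 0$ and $\alpha_1 = \sigma^2$, while the drift is already affine. The Riccati equations (2.1.3) then reduce to

$$\partial_t \phi(t, u) = b\,\psi(t, u), \quad \phi(0, u) = 0; \qquad \partial_t \psi(t, u) = \tfrac{1}{2}\sigma^2 \psi(t, u)^2 + \beta\,\psi(t, u), \quad \psi(0, u) = u.$$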

In some cases, the values of the process should be nonnegative, as for the CIR process. In financial engineering, we use such processes to model volatility and spreads. We then need to restrict the state space to the canonical state space $\chi = \mathbb{R}^m_+ \times \mathbb{R}^n$ for some non-negative integers $m, n$ with $m + n = d$, and further restrictions are required on the parameters in (2.1.2):

· $a, \alpha_i$ should be such that $a(t, x) = a + \sum_{i=1}^d x_i \alpha_i$ is symmetric and positive semi-definite for all $x \in \chi$, due to the fact that $a(t, x) = \rho(t, x)\rho(t, x)^\top$.
· $a, \alpha_i, b, \beta_i$ should be such that $X$ does not leave $\chi$.

Define the index sets

$$I = \{1, \cdots, m\}, \qquad J = \{m + 1, \cdots, d\}.$$

For the canonical state space, we can give an equivalent condition on the parameters. Parameters that satisfy these conditions are called admissible parameters. The intuition behind admissibility is to make sure that the processes $X_i$, $i \in I$, remain nonnegative. Once the process hits the boundary, i.e. $X_k(t_0) = 0$ for some $k \in I$ and some time $t_0$, the diffusion must be parallel to the boundary and the drift in that coordinate must be nonnegative.

Theorem 2.1.2. The process $X$ on the canonical state space $\mathbb{R}^m_+ \times \mathbb{R}^n$ is affine if and only if $a(t, x)$ and $b(t, x)$ are affine of the form (2.1.2) with parameters $a, \alpha_i, b, \beta_i$ satisfying:

$$\begin{aligned}
& a, \alpha_i \text{ are symmetric and positive semi-definite}, \\
& a_{II} = a_{IJ} = a_{JI} = 0, \\
& \alpha_j = 0 \text{ for all } j \in J, \\
& \alpha_{i,kl} = \alpha_{i,lk} = 0 \text{ for } k \in I \setminus \{i\}, \text{ for all } 1 \le i, l \le d, \\
& b \in \mathbb{R}^m_+ \times \mathbb{R}^n, \\
& B_{IJ} = 0, \\
& B_{II} \text{ has nonnegative off-diagonal elements}.
\end{aligned} \tag{2.1.4}$$

In this case the system of Riccati equations (2.1.3) admits a unique global solution $(\phi(\cdot, u), \psi(\cdot, u)) : \mathbb{R}_+ \to \mathbb{C}_- \times \mathbb{C}^m_- \times i\mathbb{R}^n$ for all initial values $u \in \mathbb{C}^m_- \times i\mathbb{R}^n$.

Remark 2.1.1. The above shows that for an affine process with admissible parameters, the functions $\phi(t, u)$, $\psi(t, u)$ and the characteristic function in (2.1.1) can be extended beyond $u \in i\mathbb{R}^d$. In fact, (2.1.1) holds for all $u \in \mathbb{C}^d$ for which either side is well defined. In applications, we assume that (2.1.1) holds for all $u \in \mathbb{C}^d$, since we assume that the moments of the underlying process always exist.

2.2. Discounting in Affine Models

Suppose that $X$ is an affine process on the canonical state space $\mathbb{R}^m_+ \times \mathbb{R}^n$ with admissible parameters. Assume that the short rate is of the form

$$r_t = c + \gamma^\top X(t),$$

for some constants $c \in \mathbb{R}$ and $\gamma \in \mathbb{R}^d$.

One purpose of this thesis is to derive a numerically tractable option pricing formula. Consider a $T$-claim with payoff $f(X(T))$. The price of the European option on this claim with strike $K$ is given by

$$\pi(t) = E\left[e^{-\int_t^T r_s\,ds}\,(f(X(T)) - K)^+ \mid \mathcal{F}_t\right] < \infty.$$

In order to compute the option price, in principle the joint distribution of $X(T)$ and $\int_t^T r_s\,ds$ is required. Moreover, the double integral with respect to this distribution in most cases turns out to be very hard to evaluate. The good news is that the option pricing formula can be simplified under the forward measure. The forward measure is based on the change of numeraire technique: we replace the risk-free numeraire by the $T$-bond. Formally, the forward measure $\mathbb{Q}^T$ is the probability measure on $\mathcal{F}_T$ which is equivalent to the risk neutral measure $\mathbb{Q}$ with density

$$\frac{d\mathbb{Q}^T}{d\mathbb{Q}} = \frac{1}{P(0, T)B(T)}.$$

For $t \le T$,

$$\frac{d\mathbb{Q}^T}{d\mathbb{Q}}\Big|_{\mathcal{F}_t} = E\left[\frac{1}{P(0, T)B(T)} \,\Big|\, \mathcal{F}_t\right] = \frac{P(t, T)}{P(0, T)B(t)}.$$

Under the forward measure $\mathbb{Q}^T$, the option pricing formula simplifies to

$$\pi(t) = P(t, T)\,E^{\mathbb{Q}^T}\left[(f(X(T)) - K)^+ \mid \mathcal{F}_t\right].$$

In this form the discount factor and the payoff function are separated, so we only need to compute the bond price and the single integral $E^{\mathbb{Q}^T}\left[(f(X(T)) - K)^+ \mid \mathcal{F}_t\right]$, using the distribution of $X(T)$ under the forward measure. In the following, we first introduce the theorem which enables us to compute the bond price $P(t, T)$. Using this theorem, we then derive the $\mathcal{F}_t$-conditional characteristic function of $X(T)$ under the forward measure $\mathbb{Q}^T$, which can be used to compute $E^{\mathbb{Q}^T}\left[(f(X(T)) - K)^+ \mid \mathcal{F}_t\right]$.

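Theorem 2.2.1. Assume $X$ is an affine process on the canonical state space $\mathbb{R}^m_+ \times \mathbb{R}^n$ with admissible parameters satisfying (2.1.4), and that the Riccati equations

$$\begin{aligned}
\partial_t \phi(t, u) &= \tfrac{1}{2}\psi(t, u)^\top a\,\psi(t, u) + b^\top \psi(t, u) - c, & \phi(0, u) &= 0, \\
\partial_t \psi_i(t, u) &= \tfrac{1}{2}\psi(t, u)^\top \alpha_i\,\psi(t, u) + \beta_i^\top \psi(t, u) - \gamma_i, & \psi(0, u) &= u
\end{aligned} \tag{2.2.1}$$

have a solution $(\phi(t, u), \psi(t, u))$ up to time $T$. Moreover, assume $E\left[e^{\int_t^T r_s\,ds}\right] < \infty$. Then

$$E\left[e^{-\int_t^T r_s\,ds}\,e^{u^\top X(T)} \mid \mathcal{F}_t\right] = e^{\phi(T - t, u) + \psi(T - t, u)^\top X(t)}. \tag{2.2.2}$$

In particular, setting $u = 0$,

$$P(t, T) = E\left[e^{-\int_t^T r_s\,ds} \mid \mathcal{F}_t\right] = e^{\phi(T - t, 0) + \psi(T - t, 0)^\top X(t)}. \tag{2.2.3}$$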
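To see (2.2.3) at work, the sketch below integrates the Riccati equations (2.2.1) for a one-factor model with a classical Runge-Kutta scheme and compares the resulting bond price with the textbook Vasicek formula. The Runge-Kutta integrator is a swapped-in generic ODE method chosen for this illustration, not the Taylor-based method of this thesis, and the Vasicek parameters `kappa`, `theta`, `sigma` are hypothetical values.

```python
import math


def riccati_rhs(psi, a, alpha, b, beta, c, gamma):
    """Right-hand side of (2.2.1) for d = 1; it depends on psi only."""
    dphi = 0.5 * a * psi * psi + b * psi - c
    dpsi = 0.5 * alpha * psi * psi + beta * psi - gamma
    return dphi, dpsi


def solve_riccati(tau, u, params, steps=2000):
    """RK4 integration of (phi, psi) on [0, tau] with phi(0) = 0, psi(0) = u."""
    h = tau / steps
    phi, psi = 0.0, u
    for _ in range(steps):
        k1 = riccati_rhs(psi, *params)
        k2 = riccati_rhs(psi + 0.5 * h * k1[1], *params)
        k3 = riccati_rhs(psi + 0.5 * h * k2[1], *params)
        k4 = riccati_rhs(psi + h * k3[1], *params)
        phi += h / 6.0 * (k1[0] + 2.0 * k2[0] + 2.0 * k3[0] + k4[0])
        psi += h / 6.0 * (k1[1] + 2.0 * k2[1] + 2.0 * k3[1] + k4[1])
    return phi, psi


def bond_price(tau, x, params):
    """P(t, t + tau) = exp(phi(tau, 0) + psi(tau, 0) * X(t)), cf. (2.2.3)."""
    phi, psi = solve_riccati(tau, 0.0, params)
    return math.exp(phi + psi * x)


# Assumed Vasicek model dr = kappa (theta - r) dt + sigma dW with r = X,
# i.e. a = sigma^2, alpha = 0, b = kappa*theta, beta = -kappa, c = 0, gamma = 1.
kappa, theta, sigma = 0.5, 0.03, 0.01
params = (sigma ** 2, 0.0, kappa * theta, -kappa, 0.0, 1.0)


def vasicek_closed_form(tau, r):
    """Textbook Vasicek bond price exp(A(tau) - B(tau) r)."""
    B = (1.0 - math.exp(-kappa * tau)) / kappa
    A = (theta - sigma ** 2 / (2.0 * kappa ** 2)) * (B - tau) - sigma ** 2 * B ** 2 / (4.0 * kappa)
    return math.exp(A - B * r)


tau, r0 = 5.0, 0.02
numeric = bond_price(tau, r0, params)
exact = vasicek_closed_form(tau, r0)
print(numeric, exact)
```

In this Gaussian case $\psi(\tau, 0) = -B(\tau)$ and $\phi(\tau, 0) = A(\tau)$, so the two prices should agree up to the integration error.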

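Remark 2.2.1. The discount rate in affine models can be very general, for example a credit-risky rate, as long as the rate is assumed to be affine in the underlying process. This property makes affine models very useful in counterparty risk calculations such as CVA.

Using Theorem 2.2.1, we can derive the conditional characteristic function of $X(T)$ under the forward measure. Assume the affine process $X$ satisfies the conditions in Theorem 2.2.1. For $t \le T \le S$, we have

$$\begin{aligned}
E^{\mathbb{Q}^S}\left[e^{u^\top X(T)} \mid \mathcal{F}_t\right]
&= \frac{E^{\mathbb{Q}}\left[\frac{P(T, S)}{B(T)}\,e^{u^\top X(T)} \mid \mathcal{F}_t\right]}{E^{\mathbb{Q}}\left[\frac{P(T, S)}{B(T)} \mid \mathcal{F}_t\right]} \\
&= \frac{E^{\mathbb{Q}}\left[e^{-\int_t^T r_s\,ds}\,e^{u^\top X(T)}\,e^{\phi(S - T, 0) + \psi(S - T, 0)^\top X(T)} \mid \mathcal{F}_t\right]}{P(t, S)} \\
&= \frac{e^{\phi(S - T, 0)}\,E^{\mathbb{Q}}\left[e^{-\int_t^T r_s\,ds}\,e^{(u + \psi(S - T, 0))^\top X(T)} \mid \mathcal{F}_t\right]}{P(t, S)} \\
&= \frac{e^{\phi(S - T, 0) + \phi(T - t,\,u + \psi(S - T, 0)) + \psi(T - t,\,u + \psi(S - T, 0))^\top X(t)}}{P(t, S)}.
\end{aligned} \tag{2.2.4}$$

In particular, for $S = T$,

$$E^{\mathbb{Q}^T}\left[e^{u^\top X(T)} \mid \mathcal{F}_t\right] = \frac{e^{\phi(T - t, u) + \psi(T - t, u)^\top X(t)}}{P(t, T)}. \tag{2.2.5}$$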

3. European Bond Option Pricing in Affine Model

Derivative pricing is essential for the financial application of a model, since a main purpose of financial modelling is to predict derivative prices in the future. Derivative pricing is also necessary in model calibration. In this chapter, we therefore develop an efficient algorithm to price the European bond option (including caplets and floorlets), which is one of the most important derivatives in financial modelling. A swaption pricing algorithm is also presented in the Appendices, but this algorithm is not efficient enough and needs to be improved.

Generally, knowledge of the distribution of the underlying process is crucial for option pricing. In some cases, a closed form expression of the distribution function can be derived. One famous example is the Black-Scholes formula. But in most cases, determining the distribution function is impossible due to the complexity of the underlying process. One approach for these cases is to link the characteristic function directly to the expected payoff of the option. The idea behind this approach is that the characteristic function completely determines the distribution function. The approach is carried out by inverting the Laplace (Fourier) transform. In this way, option prices can be calculated easily even for complex processes.

One advantage of affine models is the availability of a semi-analytical solution for the European bond option (and hence caplet and floorlet) price. Semi-analytical means that one can derive an explicit option price formula in affine models, but it has to be evaluated numerically. According to the previous chapter, if we know the solution $(\phi, \psi)$ to the Riccati equations (2.2.1), we can easily calculate the bond price (and the zero rate) by using (2.2.3) and the characteristic function of the transition distribution by using (2.2.4). Therefore, we begin this chapter by deriving a numerical solution to the Riccati equations. Then we introduce the Laplace (Fourier) inversion algorithm based on [3] and [4], and price the option by inverting the Laplace (Fourier) transform. Finally, we end this chapter by presenting pricing results for some specific affine models. In this chapter, the measure is assumed to be risk neutral unless specified otherwise.

3.1. Numerical Solution to Riccati equations

For many Riccati equations, it is hard (or even impossible) to find a closed-form solution, especially in high dimensional cases. So a numerical approach is needed. In this thesis, given the ODE (2.2.1), we use the Taylor series to approximate the solution $(\phi, \psi)$. To do so, we first need to determine the coefficients of the Taylor expansion.

Proposition 3.1.1. Suppose $(\phi(t, u), \psi(t, u))$ is the solution of (2.2.1). Given the value of $u$, assume the Taylor expansions of $(\phi(t, u), \psi(t, u))$ are given by $\phi(t, u) = \sum_{k=0}^\infty C_k(u) t^k < \infty$ and $\psi_i(t, u) = \sum_{k=0}^\infty D^i_k(u) t^k < \infty$. Then we have the following recursion for the coefficients:

$$\begin{aligned}
C_0(u) &= 0, \\
C_1(u) &= \tfrac{1}{2}u^\top a\,u + b^\top u - c, \\
C_{k+1}(u) &= \frac{1}{1 + k}\left(\frac{1}{2}\sum_{n=0}^k D_n(u)^\top a\,D_{k-n}(u) + b^\top D_k(u)\right), \quad k \ge 1, \\
D^i_0(u) &= u_i, \\
D^i_1(u) &= \tfrac{1}{2}u^\top \alpha_i\,u + \beta_i^\top u - \gamma_i, \\
D^i_{k+1}(u) &= \frac{1}{1 + k}\left(\frac{1}{2}\sum_{n=0}^k D_n(u)^\top \alpha_i\,D_{k-n}(u) + \beta_i^\top D_k(u)\right), \quad k \ge 1,
\end{aligned} \tag{3.1.1}$$

where $D_n(u) = (D^1_n(u), \cdots, D^d_n(u))^\top$.

Proof. Suppose φ(t, u) =∑∞

k=0Ck(u)tk, ψi(t, u) =∑∞

k=0Dik(u)tk, let t = 0, we obtain C0(u) =

0, Di0(u) = ui. Taking the derivative of ψi(t, u) w.r.t. t,

∂tψi(t, u) =

∞∑k=1

Dik(u)ktk−1

=

∞∑k=0

Dik+1(u)(k + 1)tk

(3.1.2)

On the other hand, according to (2.2.1),

∂tψi(t, u) =1

2ψ(t, u)>αiψ(t, u) + β>i ψ(t, u)− γi

=1

2

d∑l,r=1

ψl(t, u)αi(l, r)ψr(t, u) +

d∑s=1

B(s, i)ψs(t, u)− γi

=1

2

d∑l,r=1

(∞∑k=0

Dlk(u)tk)αi(l, r)(

∞∑k=0

Drk(u)tk) +

d∑s=1

B(s, i)(

∞∑k=0

Dsk(u)tk)− γi

=1

2

d∑l,r=1

αi(l, r)∞∑k=0

(k∑

m=0

Dlm(u)Dr

k−m(u))tk +d∑s=1

B(s, i)(∞∑k=0

Dsk(u)tk)− γi

=

∞∑k=0

1

2

d∑l,r=1

k∑m=0

Dlm(u)αi(l, r)D

rk−m(u) +

d∑s=1

B(s, i)Dsk(u)tk

tk − γi

=∞∑k=0

(1

2

k∑m=0

D>m(u)αiDk−m(u) + βTi Dk(u)

)tk − γi

= (1

2u>αiu+ β>i u− γi) +

∞∑k=1

(1

2

k∑m=0

D>m(u)αiDk−m(u) + β>i Dk(u)

)tk

(3.1.3)

19

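Comparing the Taylor coefficients of (3.1.2) and (3.1.3), we obtain

\[
\begin{aligned}
D^i_1(u) &= \tfrac{1}{2} u^\top \alpha_i u + \beta_i^\top u - \gamma_i,\\
D^i_{k+1}(u) &= \frac{1}{1+k}\Big(\frac{1}{2}\sum_{n=0}^{k} D_n^\top(u)\,\alpha_i\,D_{k-n}(u) + \beta_i^\top D_k(u)\Big).
\end{aligned}
\]

Similarly, we can also obtain

\[
\begin{aligned}
C_1(u) &= \tfrac{1}{2} u^\top a u + b^\top u - c,\\
C_{k+1}(u) &= \frac{1}{1+k}\Big(\frac{1}{2}\sum_{n=0}^{k} D_n^\top(u)\,a\,D_{k-n}(u) + b^\top D_k(u)\Big).
\end{aligned}
\]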

This proposition allows us to approximate $(\phi(t,u), \psi(t,u))$ by

\[
\phi(t,u) \approx \sum_{k=0}^{N} C_k(u) t^k, \qquad \psi_i(t,u) \approx \sum_{k=0}^{N} D^i_k(u) t^k.
\]

The approximation errors are of the form $\sum_{k=N+1}^{\infty} A_k(u) t^k$, where $A_k$ stands for $C_k$ or $D^i_k$. The approximation is accurate and converges quickly if $t \approx 0$. For larger $t$, we divide the time interval into several subintervals which are small enough to make the approximation accurate.

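Choose time steps $\Delta_i > 0$, $i = 1, \cdots, n$, such that $T - t = \Delta_1 + \cdots + \Delta_n$. Then, by the tower property,

\[
\begin{aligned}
e^{\phi(T-t,u) + \psi(T-t,u)^\top X(t)} &= E\big[e^{u^\top X(T)} \mid \mathcal{F}_t\big]\\
&= E\big[E[e^{u^\top X(T)} \mid \mathcal{F}_{T-\Delta_1}] \mid \mathcal{F}_t\big]\\
&= e^{\phi(\Delta_1,u)} E\big[e^{\psi(\Delta_1,u)^\top X(T-\Delta_1)} \mid \mathcal{F}_t\big]\\
&= e^{\phi(\Delta_1,u)} E\big[E[e^{\psi(\Delta_1,u)^\top X(T-\Delta_1)} \mid \mathcal{F}_{T-\Delta_1-\Delta_2}] \mid \mathcal{F}_t\big]\\
&= e^{\phi(\Delta_1,u) + \phi(\Delta_2,\psi(\Delta_1,u))} E\big[e^{\psi(\Delta_2,\psi(\Delta_1,u))^\top X(T-\Delta_1-\Delta_2)} \mid \mathcal{F}_t\big]\\
&= \cdots\\
&= e^{\phi(\Delta_1,u_0) + \phi(\Delta_2,u_1) + \cdots + \phi(\Delta_n,u_{n-1})}\, e^{\psi(\Delta_n,u_{n-1})^\top X(t)},
\end{aligned}
\]

where $u_{i+1} = \psi(\Delta_{i+1}, u_i)$ and $u_0 = u$.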

Comparing the two sides of the equation, we obtain

\[
\phi(T-t,u) = \sum_{i=1}^{n} \phi(\Delta_i, u_{i-1}), \qquad \psi(T-t,u) = \psi(\Delta_n, u_{n-1}).
\]


In practice, we can set the approximation error level to be $\varepsilon$. Then at each step we can choose $\Delta = (\varepsilon / A_N(u))^{1/N}$, so that the last term in the Taylor expansion is $A_N(u)\,\Delta^N = \varepsilon$. Hence we can keep the approximation error around the level $\varepsilon$.

Remark 3.1.1. The values of the functions φ(t, u), ψ(t, u) might go to infinity for some values of t and u. In these situations, the Taylor expansion approximation does not work. However, in financial applications we assume that these cases do not occur, since in finance we always assume that the moments of the underlying process exist.

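Example 3.1.1. Consider the admissible parameters α = γ = 1, β = −1. The ODE for ψ is

\[
\partial_t \psi(t,u) = \psi(t,u)^2 - \psi(t,u) - 1, \qquad \psi(0,u) = u.
\]

The unique closed-form solution to this equation is given by

\[
\psi(t,u) = -\,\frac{2(e^{\sqrt{5}t} - 1) - \big((\sqrt{5}-1)e^{\sqrt{5}t} + \sqrt{5} + 1\big)u}{(\sqrt{5}+1)e^{\sqrt{5}t} + \sqrt{5} - 1 - 2(e^{\sqrt{5}t} - 1)u}.
\]

The leading minus sign is required for the initial condition: at $t = 0$ the fraction equals $-2\sqrt{5}u / (2\sqrt{5}) = -u$.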

For the numerical approximation, we choose the Taylor expansion order n = 10 and the error tolerance ε = 10^{-16}. The following figures show the numerical errors for different t and u:

[Two panels. Left: numerical error (scale 10^{-16}) against u ∈ [−10, 0] for t = 5. Right: numerical error (scale 10^{-15}) against t ∈ [0, 10] for u = −1.]

Figure 3.1.: Numerical Error
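The Taylor recursion and the step subdivision can be sketched for the scalar equation of Example 3.1.1. This is a minimal illustration and not the implementation used for the figures: the function names are hypothetical, only ψ is computed (φ would follow from the same coefficients), and the closed form is taken with a leading minus sign so that ψ(0, u) = u.

```python
import math

def riccati_taylor_coeffs(u, order):
    """Taylor coefficients D_0..D_order of psi(t, u) for psi' = psi^2 - psi - 1."""
    D = [u]
    for k in range(order):
        # Cauchy product term: coefficient of t^k in psi^2
        conv = sum(D[n] * D[k - n] for n in range(k + 1))
        rhs = conv - D[k] - (1.0 if k == 0 else 0.0)
        D.append(rhs / (k + 1))
    return D

def psi_numeric(t, u, order=10, eps=1e-16):
    """Propagate psi over [0, t] in sub-steps, restarting the expansion each step."""
    remaining = t
    while remaining > 0.0:
        D = riccati_taylor_coeffs(u, order)
        last = abs(D[-1])
        # Step size chosen so that the last Taylor term is about eps
        step = remaining if last == 0.0 else min(remaining, (eps / last) ** (1.0 / order))
        value = 0.0
        for coeff in reversed(D):  # Horner evaluation of the polynomial
            value = value * step + coeff
        u = value                  # u_{i+1} = psi(Delta_{i+1}, u_i)
        remaining -= step
    return u

def psi_closed_form(t, u):
    """Closed-form solution with psi(0, u) = u."""
    s5 = math.sqrt(5.0)
    e = math.exp(s5 * t)
    num = 2.0 * (e - 1.0) - ((s5 - 1.0) * e + s5 + 1.0) * u
    den = (s5 + 1.0) * e + s5 - 1.0 - 2.0 * (e - 1.0) * u
    return -num / den

if __name__ == "__main__":
    for t, u in [(5.0, -1.0), (0.5, -3.0)]:
        print(t, u, abs(psi_numeric(t, u) - psi_closed_form(t, u)))
```

Restarting the expansion at each sub-step keeps every evaluation well inside the radius of convergence, which is the reason for the subdivision described above.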

3.2. Laplace Transform Inversion

According to den Iseger (2006), given the Laplace transform of the target function, one can compute its value at the points k_1δ, · · · , k_Mδ, where M ∈ N+, k_i = i, i ∈ {1, · · · , M} and δ is a certain positive number. This algorithm can be very efficient and accurate, but it can only calculate the function value at points kδ (i.e. on a uniform grid). To overcome this limitation, den Iseger (2008) offers another numerical inversion method based on the previous paper. Instead of directly relating the target function and its Laplace transform through the Poisson Summation Formula as in the first paper, the new method relates the periodic summation of the Fourier series coefficients of the function's Legendre expansion to the Fourier transform of the target function. Given the Laplace transform of the function, the algorithm returns the coefficients of the Legendre expansion of the desired function. These coefficients can be used to compute the value of the target function at arbitrary points, since the shifted Legendre polynomials form a complete orthogonal set in L2(−∞,∞).


To turn this relation into an algorithm, more mathematical tools are needed. So in this section, we first introduce the two tools required for the construction of the algorithm: Legendre polynomials and Gaussian quadrature. We also introduce the Poisson Summation Formula and use it to derive the algorithm.

3.2.1. Legendre Polynomials

One version of the Legendre polynomials $l_n$, $n \in \mathbb{N}^+$, is defined by

\[
l_n(t) = \frac{\sqrt{2n+1}}{n!}\,\partial^n\big(t^n(1-t)^n\big), \quad t \in [0,1],
\]

where $\partial$ is the differential operator.

The set $\{l_n, n \in \mathbb{N}^+\}$ forms a complete orthogonal set in $L^2[0,1]$. We can extend this to $L^2(-\infty,\infty)$ by the fact that the shifted Legendre polynomials $\{l_n(\cdot - j), n \in \mathbb{N}^+, j \in \mathbb{Z}\}$ form a complete orthogonal set in $L^2(-\infty,\infty)$. So any function $f \in L^2(-\infty,\infty)$ can be expanded into

\[
f = \sum_{n=0}^{\infty}\sum_{j=-\infty}^{\infty} \langle f, l_n(\cdot - t_0 - j)\rangle\, l_n(\cdot - t_0 - j).
\]

Here $t_0$ is a shift parameter and the inner product $\langle\cdot,\cdot\rangle$ is defined by

\[
\langle f, g\rangle = \int_D f(x) g(x)\,dx,
\]

with $D$ the domain of the function $f(x)g(x)$. Since the domain of the Legendre polynomials is $[0,1]$, we obtain

\[
f(t) = \sum_{n=0}^{\infty} \langle f, l_n(\cdot - t_0 - j)\rangle\, l_n(t - t_0 - j), \quad t \in [t_0 + j, t_0 + j + 1),\ j = \lfloor t - t_0\rfloor.
\tag{3.2.1}
\]

So we can approximate the function $f$ by

\[
f(t) \approx \sum_{k=0}^{n} \langle f, l_k(\cdot - t_0 - j)\rangle\, l_k(t - t_0 - j), \quad t \in [t_0 + j, t_0 + j + 1),\ j = \lfloor t - t_0\rfloor.
\]

This approximation requires the values of the functions $l_k$. The derivative-based expression for $l_k$ is impractical to evaluate directly, so a good alternative is to find a recursion for $l_k$.

The Legendre polynomials $l_n(t)$ can be obtained by scaling and shifting the normalised Legendre polynomials. The normalised Legendre polynomial of order $k$ is defined by

\[
L_k(t) = \frac{1}{2^k k!}\,\partial^k (t^2 - 1)^k, \quad t \in [-1,1].
\]


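We can derive the relation between $L_k(t)$ and $l_k(t)$:

\[
\begin{aligned}
l_k(t) &= \frac{\sqrt{2k+1}}{k!}\,\partial^k\big(t^k(1-t)^k\big)\\
&= \frac{\sqrt{2k+1}}{k!}\,(-1)^k\,\partial^k (t^2 - t)^k\\
&= \frac{\sqrt{2k+1}}{k!}\,(-1)^k \Big(\frac{1}{4}\Big)^k \partial^k\big((2t-1)^2 - 1\big)^k\\
&= \frac{\sqrt{2k+1}}{k!}\,(-1)^k \Big(\frac{1}{4}\Big)^k 2^k\, \frac{d^k}{dx^k}(x^2-1)^k\Big|_{x=2t-1}\\
&= \frac{\sqrt{2k+1}}{2^k k!}\,(-1)^k\, \frac{d^k}{dx^k}(x^2-1)^k\Big|_{x=2t-1}\\
&= (-1)^k \sqrt{2k+1}\, L_k(2t-1), \quad t \in [0,1].
\end{aligned}
\]

The fourth line uses the chain rule: each derivative w.r.t. $t$ of a function of $2t-1$ produces a factor 2.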

Moreover, the normalised Legendre polynomials satisfy the following recursion: $L_0(t) = 1$, $L_1(t) = t$ and

\[
L_k(t) = \frac{2k-1}{k}\, t\, L_{k-1}(t) - \frac{k-1}{k}\, L_{k-2}(t), \quad k \ge 2.
\]

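Plugging $l_k(t) = (-1)^k\sqrt{2k+1}\,L_k(2t-1)$ into this recursion, we obtain

\[
\begin{aligned}
l_k(t) &= (-1)^k\sqrt{2k+1}\,L_k(2t-1)\\
&= (-1)^k\sqrt{2k+1}\Big(\frac{2k-1}{k}(2t-1)L_{k-1}(2t-1) - \frac{k-1}{k}L_{k-2}(2t-1)\Big)\\
&= (-1)^k\sqrt{2k+1}\Big(\frac{2k-1}{k}(2t-1)\frac{(-1)^{k-1}}{\sqrt{2k-1}}\,l_{k-1}(t) - \frac{k-1}{k}\frac{(-1)^{k-2}}{\sqrt{2k-3}}\,l_{k-2}(t)\Big)\\
&= -\frac{2}{k}\sqrt{4k^2-1}\,\Big(t - \frac{1}{2}\Big)\,l_{k-1}(t) - \frac{k-1}{k}\sqrt{\frac{2k+1}{2k-3}}\,l_{k-2}(t), \quad k \ge 2.
\end{aligned}
\]

Since $l_k(1-t) = (-1)^k l_k(t)$, the reflected polynomials $t \mapsto l_k(1-t) = \sqrt{2k+1}\,L_k(2t-1)$ form an orthogonal set with the same properties, and for them the first term of the recursion changes sign:

\[
l_k(t) = \frac{2}{k}\sqrt{4k^2-1}\,\Big(t - \frac{1}{2}\Big)\,l_{k-1}(t) - \frac{k-1}{k}\sqrt{\frac{2k+1}{2k-3}}\,l_{k-2}(t), \quad k \ge 2.
\]

We work with this version in what follows and keep the notation $l_k$.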

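Hence the polynomials $l_k(t)$, $t \in [0,1]$, satisfy the recursion

\[
\begin{aligned}
l_0(t) &= 1,\\
l_1(t) &= \sqrt{3}\,(2t-1),\\
l_k(t) &= \frac{2}{k}\sqrt{4k^2-1}\,\Big(t - \frac{1}{2}\Big)\,l_{k-1}(t) - \frac{k-1}{k}\sqrt{\frac{2k+1}{2k-3}}\,l_{k-2}(t), \quad k \ge 2.
\end{aligned}
\]

So given the point $t$, we can easily calculate the function value.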
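The three-term recursion can be sketched in a few lines of NumPy. The function name `shifted_legendre` is hypothetical, and the coefficient of l_{k-2} is taken as the square root of (2k+1)/(2k-3), which is what the substitution into the recursion for L_k gives. The Gram matrix computed with Gauss-Legendre quadrature serves as a check that the polynomials are orthonormal on [0, 1].

```python
import numpy as np

def shifted_legendre(t, n):
    """Evaluate l_0, ..., l_n at the points t in [0, 1]; returns shape (n+1, len(t))."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    l = np.empty((n + 1, t.size))
    l[0] = 1.0
    if n >= 1:
        l[1] = np.sqrt(3.0) * (2.0 * t - 1.0)
    for k in range(2, n + 1):
        a = 2.0 / k * np.sqrt(4.0 * k * k - 1.0)
        b = (k - 1.0) / k * np.sqrt((2.0 * k + 1.0) / (2.0 * k - 3.0))
        l[k] = a * (t - 0.5) * l[k - 1] - b * l[k - 2]
    return l

def gram_matrix(n, nodes=20):
    """Inner products <l_i, l_j> on [0, 1] via Gauss-Legendre quadrature."""
    x, w = np.polynomial.legendre.leggauss(nodes)  # nodes and weights on [-1, 1]
    t, w = 0.5 * (x + 1.0), 0.5 * w                # mapped to [0, 1]
    vals = shifted_legendre(t, n)
    return (vals * w) @ vals.T

if __name__ == "__main__":
    print(np.round(gram_matrix(4), 12))
```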

3.2.2. Poisson Summation Formula

The Poisson summation formula is an equation that relates the Fourier series coefficients of the periodic summation of a function to the values of the function's continuous Fourier transform. Assume that the function $f$ on $(-\infty,\infty)$ is of bounded variation and $\sum_{k=-\infty}^{\infty} f(k) < \infty$. The simplest Poisson summation formula is stated as:

\[
\sum_{k=-\infty}^{\infty} f(k) = \sum_{k=-\infty}^{\infty} \hat f(i2\pi k),
\]

where $\hat f$ is defined by

\[
\hat f(s) = \int_{\mathbb{R}} e^{-sx} f(x)\,dx, \quad s \in (-i\infty, i\infty).
\]

Substituting $e^{-i2\pi x v} f(x)$ for $f(x)$, with $v \in [0,1)$, we get the following theorem:

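Theorem 3.2.1. (PSF): Suppose that $f$ is defined on $(-\infty,\infty)$ and is of bounded variation. Then for all $v \in [0,1)$,

\[
\sum_{k=-\infty}^{\infty} \hat f\big(i2\pi(k+v)\big) = \sum_{k=-\infty}^{\infty} e^{-i2\pi k v} f(k),
\tag{3.2.2}
\]

where $\hat f$ is defined by

\[
\hat f(s) = \int_{\mathbb{R}} e^{-sx} f(x)\,dx.
\]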
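A quick numerical check of the formula is possible for a Gaussian test function, whose transform is known in closed form. The sketch below assumes f(x) = exp(-a x^2), for which the transform at s = i2πξ equals sqrt(π/a) exp(-π²ξ²/a); the function name and the truncation level K are illustrative choices.

```python
import numpy as np

def psf_sides(a, v, K=50):
    """Both sides of the PSF for f(x) = exp(-a x^2), truncated to |k| <= K."""
    k = np.arange(-K, K + 1)
    # Left side: transform of f evaluated at i 2 pi (k + v)
    lhs = np.sum(np.sqrt(np.pi / a) * np.exp(-(np.pi * (k + v)) ** 2 / a))
    # Right side: Fourier series built from the samples f(k)
    rhs = np.sum(np.exp(-2j * np.pi * k * v) * np.exp(-a * k ** 2))
    return lhs, rhs

if __name__ == "__main__":
    lhs, rhs = psf_sides(a=0.7, v=0.3)
    print(lhs, rhs)
```

Both sums converge very quickly for a Gaussian, so a small truncation level is enough here; for slowly decaying functions the truncation would need more care.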

Based on the PSF, den Iseger (2006) gives an algorithm to compute the function value on a uniform grid given the Laplace transform of the function; however, this algorithm cannot compute the function value at an arbitrary point. One way to tackle this is to relate the Legendre expansion of the target function to its Fourier transform. Then we can invert the Laplace transform to obtain the Legendre expansion coefficients. Finally, by using the expansion (3.2.1), we can compute the function value at an arbitrary point. In order to find the relation we want, let us first introduce the polynomials $\{q^v_k : k \in \mathbb{N}\}$ with

\[
q^v_n(s) = p_n(s) - (-1)^n \exp(-2\pi i v)\, p_n(-s),
\]

where $p_n(s) = \sqrt{2n+1}\,\sum_{k=0}^{n} \frac{(k+n)!}{(n-k)!}\,\frac{(-s)^k}{k!}$. Then we have the following lemma.

Lemma 3.2.1. Define the operator $\Psi$ by $\Psi f(s) = \frac{1}{s} f\big(\frac{1}{s}\big)$, and denote the Laplace transform of the Legendre polynomials by $\hat l_n$. Then

\[
q^v_n = \Psi \hat l_n
\tag{3.2.3}
\]

holds on the set $\big\{\frac{1}{i2\pi(k+v)},\ k \in \mathbb{Z}\big\}$.

Proof. See the Appendices.

This lemma shows the relation between the Legendre polynomials $l_k$ and $q^v_k$. This relation enables us to find the desired result.

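Theorem 3.2.2. Define the inner product $\langle\cdot,\cdot\rangle_v$ by

\[
\langle f, g\rangle_v = \sum_{k=-\infty}^{\infty} \frac{1}{|\lambda^v_k|^2}\, f\Big(\frac{1}{\lambda^v_k}\Big)\, \overline{g\Big(\frac{1}{\lambda^v_k}\Big)} < \infty,
\]

where $\lambda^v_k$, $k \in \mathbb{Z}$, $0 < v < 1$, is defined by $\lambda^v_k = i2\pi(k+v)$, and $\overline{g(x)}$ denotes the complex conjugate of $g(x)$. Then the following relation holds for $f \in L^2(-\infty,\infty)$:

\[
\langle \Psi \hat f, q^v_k\rangle_v = \sum_{j=-\infty}^{\infty} e^{-2\pi i j v}\, \langle f, l_k(\cdot - j)\rangle.
\tag{3.2.4}
\]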



3.2.3. Laplace Transform Inversion Algorithm

The right hand side of (3.2.4) is a Fourier series, so if we can compute the left hand side, we can obtain the coefficients of the Legendre expansion $\langle f, l_k(\cdot - j)\rangle$ with the inverse fast Fourier transform (IFFT) algorithm. However, the left hand side of (3.2.4) is an infinite summation. To solve this problem, we use Gaussian quadrature to approximate $\langle \Psi \hat f, q^v_k\rangle_v$.

Gaussian quadrature is widely used to approximate the integral of a function. More generally, Gaussian quadrature can be used to approximate any integral w.r.t. a positive measure and, equivalently, any inner product induced by this measure.

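Definition 3.2.1. Given an inner product $\langle\cdot,\cdot\rangle_\mu$:

\[
\langle f, g\rangle_\mu = \int_I f\,\bar g\,d\mu,
\]

where $\mu$ is a positive measure and $I$ is a subinterval of the real axis. Let $\{q_k\}$ be a complete orthogonal set w.r.t. this inner product. The Gaussian quadrature is defined by the inner product $\langle\cdot,\cdot\rangle_{\mu,n}$:

\[
\langle f, g\rangle_{\mu,n} = \sum_{k=1}^{n} \alpha_k\, f(\mu_k)\,\overline{g(\mu_k)},
\]

where $\mu_k$, $k = 1, \cdots, n$, are the roots of $q_n$ and $\alpha_k$, $k = 1, \cdots, n$, are given by $\alpha_k = \frac{1}{\sum_{j=1}^{n} |q_j(\mu_k)|^2}$.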

Define the corresponding function space $L^2_v = \{h : \|h\|^2_v = \langle h, h\rangle_v < \infty\}$. In order to implement Gaussian quadrature, we need to find a complete orthogonal set in $L^2_v$. It turns out that the set of polynomials $\{q^v_k, k = 0, 1, \cdots\}$ forms a complete orthogonal set in $L^2_v$.

Theorem 3.2.3. For any $0 < v < 1$, the set $Q_v = \{q^v_n, n \in \mathbb{N}^+\}$ is a complete orthogonal set in $L^2_v$.


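Then the Gaussian quadrature corresponding to $\langle \Psi \hat f, q^v_k\rangle_v$ is given by

\[
\langle \Psi \hat f, q^v_k\rangle_{v,n} = \sum_{j=1}^{n} \alpha^v_j\, \Psi \hat f(\mu^v_j)\, \overline{q^v_k(\mu^v_j)},
\tag{3.2.5}
\]

where $\mu^v_j$, $j = 1, \cdots, n$, are the roots of $q^v_n$ and $\alpha^v_j$, $j = 1, \cdots, n$, are given by $\alpha^v_j = \frac{1}{\sum_{m=1}^{n} |q^v_m(\mu^v_j)|^2}$. The summation index is written as $j$ to keep it distinct from the index $k$ of the polynomial $q^v_k$. The Gaussian quadrature is a finite summation, and hence it is straightforward to compute.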

Now assume we have obtained the coefficients $\langle f, l_k(\cdot - j)\rangle$ with the IFFT algorithm. By (3.2.1), with the shift parameter $t_0 = 0$, we have the approximation

\[
f(t) = \sum_{k=0}^{\infty} \langle f, l_k(\cdot - j)\rangle\, l_k(t - j) \approx \sum_{k=0}^{n} \langle f, l_k(\cdot - j)\rangle\, l_k(t - j), \quad t \in [j, j+1).
\]

Consequently, we have the Laplace transform inversion algorithm:

Algorithm:
1. Compute $\langle \Psi \hat f, q^v_k\rangle_{v,n}$ with Gaussian quadrature to approximate $\langle \Psi \hat f, q^v_k\rangle_v$;
2. Compute the coefficients $\langle f, l_k(\cdot - j)\rangle$ by IFFT;
3. Compute $f(t)$ with $\sum_{k=0}^{n} \langle f, l_k(\cdot - j)\rangle\, l_k(t - j)$, $t \in [j, j+1)$.

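Remark 3.2.1. To improve the approximation, we can introduce the function $f_\Delta(t) = f(\Delta t)$. Then

\[
f(t) = f_\Delta\Big(\frac{t}{\Delta}\Big) \approx \sum_{k=1}^{n} \langle f_\Delta, l_k(\cdot - j)\rangle\, l_k\Big(\frac{t}{\Delta} - j\Big)
\]

for $t \in [\Delta j, \Delta(j+1))$.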

3.2.4. Damping factors in IFFT Algorithm

In step 2 of the above algorithm, the IFFT algorithm is used to invert the infinite Fourier series. The IFFT algorithm computes the inverse of the discrete Fourier transform very quickly. Suppose the infinite z-transform (Fourier series) is given by

\[
H(z) = \sum_{k=-\infty}^{\infty} h_k z^k,
\]

where $z = e^{i2\pi j/N}$. Since $z^N = 1$, we can rewrite the z-transform as

\[
H(z) = \sum_{k=0}^{N-1}\sum_{j=-\infty}^{\infty} h_{k+jN}\, z^{k+jN} = \sum_{k=0}^{N-1}\Big(\sum_{j=-\infty}^{\infty} h_{k+jN}\Big) z^k.
\tag{3.2.6}
\]

Note that for a polynomial $P(z) = \sum_{k=0}^{N-1} p_k z^k$, the following inversion formula holds:

\[
p_k = \frac{1}{N}\sum_{j=0}^{N-1} e^{-\frac{i2\pi jk}{N}}\, P\big(e^{\frac{i2\pi j}{N}}\big), \quad k = 0, \cdots, N-1.
\]

This result comes from the fact that

\[
\frac{1}{N}\sum_{j=0}^{N-1} e^{-\frac{i2\pi jk}{N}}\, e^{\frac{i2\pi jm}{N}} = \delta_{km}.
\]

Applying this to (3.2.6),

\[
\sum_{j=-\infty}^{\infty} h_{k+jN} = \frac{1}{N}\sum_{j=0}^{N-1} e^{-\frac{i2\pi jk}{N}}\, H\big(e^{\frac{i2\pi j}{N}}\big), \quad k = 0, \cdots, N-1.
\]

If the residual sum $\sum_{j \ne 0} h_{k+jN}$ is small, we get the approximation

\[
h_k \approx \frac{1}{N}\sum_{j=0}^{N-1} e^{-\frac{i2\pi jk}{N}}\, H\big(e^{\frac{i2\pi j}{N}}\big), \quad k = 0, \cdots, N-1.
\]

In this way, given the values of the function $H$, the IFFT algorithm returns an approximation of the corresponding coefficients $h_k$.


To make sure the residual sum is small, in general we need to choose $N$ large enough. However, if $h_{k+jN}$ converges slowly to 0, a very large $N$ slows the algorithm down. To avoid this problem, one can use a damping factor, provided $h_k$ converges quickly to 0 for negative indices. Define $H_r(z) = H(rz)$ and replace $H(z)$ in (3.2.6) by $H_r(z)$. Choose $N$ large enough such that $h_{k+jN} \approx 0$ for $j < 0$; then we obtain the new residual sum

\[
\sum_{j \ne 0} r^{jN} h_{k+jN} \approx \sum_{j=1}^{\infty} r^{jN} h_{k+jN}.
\]

Hence we can make the residual sum arbitrarily small by choosing a sufficiently small $r$. On the other hand, in this case

\[
h_k \approx \frac{1}{r^k}\,\frac{1}{N}\sum_{j=0}^{N-1} e^{-\frac{i2\pi jk}{N}}\, H_r\big(e^{\frac{i2\pi j}{N}}\big).
\]

So a very small $r$ amplifies the approximation error through the factor $r^{-k}$ and makes the algorithm unstable. In applications, we need to find a good balance between $N$ and $r$. If we want to calculate $(h_0, \cdots, h_{m-1})$, we should choose an $\hat m \ge m$ with $N = 8\hat m$ such that $h_k = 0$ for $k \le -7\hat m$, and the damping factor $r = e^{-\frac{44}{8\hat m}}$ (cf. den Iseger (2006)).
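The effect of the damping factor can be illustrated on a series with known coefficients. The sketch below is an illustration under stated assumptions, not the thesis implementation: it uses the test function H(z) = 1/(1 - 0.9 z), i.e. h_k = 0.9^k for k ≥ 0, the hypothetical helper `coefficients_by_fft`, and the inversion formula with the kernel exp(-i2πjk/N), which in NumPy's sign convention is `np.fft.fft` divided by N.

```python
import numpy as np

def coefficients_by_fft(H, N, m, r=1.0):
    """Approximate h_0..h_{m-1} of H(z) = sum_k h_k z^k from N samples on |z| = r."""
    j = np.arange(N)
    z = np.exp(2j * np.pi * j / N)
    values = H(r * z)                    # samples of H_r on the unit circle
    aliased = np.fft.fft(values) / N     # sum over j of r^(k+jN) h_(k+jN)
    k = np.arange(m)
    return (aliased[:m] / r ** k).real   # undo the damping on the kept coefficients

if __name__ == "__main__":
    H = lambda z: 1.0 / (1.0 - 0.9 * z)  # h_k = 0.9**k, decays slowly
    m_hat = 8
    N = 8 * m_hat
    exact = 0.9 ** np.arange(m_hat)
    plain = coefficients_by_fft(H, N, m_hat)
    damped = coefficients_by_fft(H, N, m_hat, r=np.exp(-44.0 / N))
    print(np.max(np.abs(plain - exact)), np.max(np.abs(damped - exact)))
```

Only the first N/8 coefficients are kept, so the amplification factor r^{-k} stays moderate while the aliasing term is suppressed by r^N = e^{-44}.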

3.2.5. Example: Vasicek Short Rate Model

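The SDE of the Vasicek short rate model is

\[
dr_t = (b + \beta r_t)\,dt + \sigma\,dW_t,
\]

where $b$, $\beta$ and $\sigma$ are constants and $W$ is a Brownian motion under the risk neutral measure. The solution to the SDE is given by

\[
r_t = r_0 e^{\beta t} + \frac{b}{\beta}(e^{\beta t} - 1) + \sigma e^{\beta t}\int_0^t e^{-\beta s}\,dW_s.
\]

It follows that $r_t$ is normally distributed with mean

\[
r_0 e^{\beta t} + \frac{b}{\beta}(e^{\beta t} - 1)
\]

and variance

\[
\frac{\sigma^2}{2\beta}(e^{2\beta t} - 1).
\]

On the other hand, the Vasicek model is an affine model, and hence the Laplace transform of the density of $r_t$ conditioned on $r_0$ is given by

\[
E\big[e^{-u r_t} \mid \mathcal{F}_0\big] = e^{\phi(t,-u) + \psi(t,-u) r_0},
\]

where the Riccati equations are

\[
\partial_t \phi(t,u) = \frac{\sigma^2}{2}\psi^2(t,u) + b\,\psi(t,u), \qquad \partial_t \psi(t,u) = \beta\,\psi(t,u).
\]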
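As a consistency check, the two descriptions can be compared directly. The following short derivation assumes the initial conditions $\phi(0,u) = 0$, $\psi(0,u) = u$ and the quadratic term $\frac{\sigma^2}{2}\psi^2$ in the equation for $\phi$:

\[
\begin{aligned}
\psi(t,u) &= u\,e^{\beta t}, \qquad
\phi(t,u) = \frac{\sigma^2 u^2}{4\beta}(e^{2\beta t} - 1) + \frac{b\,u}{\beta}(e^{\beta t} - 1),\\
\phi(t,-u) + \psi(t,-u)\,r_0 &= -u\Big(r_0 e^{\beta t} + \frac{b}{\beta}(e^{\beta t} - 1)\Big) + \frac{u^2}{2}\cdot\frac{\sigma^2}{2\beta}(e^{2\beta t} - 1).
\end{aligned}
\]

The right-hand side is $-u\,\mathrm{mean} + \frac{u^2}{2}\,\mathrm{variance}$, which is the exponent of the Laplace transform of a normal random variable with the mean and variance stated above.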


We use (2.1) to approximate φ, ψ, and invert the Fourier transform to get the density of rt. Set b = 0.09, β = −0.9, σ = 0.1, r0 = 0.02. Choose the Taylor expansion order N = 10 and the tolerance ε = 10^{-10}. For the parameters of the Laplace transform inversion, we choose m = 8. We use the same parameters for the remaining examples in this chapter. The following figures show the numerical density and its error compared to the explicit density, respectively.

[Plot: density against rt for rt from −0.4 to 0.8.]

Figure 3.2.: The numerical density function for rt conditioned on r0

[Plot: error (scale 10^{-9}) against rt for rt from −0.4 to 0.8.]

Figure 3.3.: The error between numerical density and explicit density

3.2.6. Example: CIR Model

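On the canonical state space $\mathbb{R}_+$, the CIR process is defined by

\[
dX(t) = k(\theta - X(t))\,dt + \sigma\sqrt{X(t)}\,dW_t,
\]

where $k$, $\theta$, $\sigma$ are positive constants, and $W$ is a Brownian motion under the risk neutral measure.

One can check that the CIR model is an affine model and the system (2.1.3) is given by

\[
\phi(t,u) = k\theta\int_0^t \psi(s,u)\,ds, \qquad \partial_t\psi(t,u) = \frac{\sigma^2}{2}\psi^2(t,u) - k\,\psi(t,u).
\]

The solution is given by

\[
\phi(t,u) = -\frac{2k\theta}{\sigma^2}\log\Big(1 - \frac{\sigma^2}{2k}(1 - e^{-kt})\,u\Big), \qquad
\psi(t,u) = \frac{2ku}{2k e^{kt} - \sigma^2(e^{kt} - 1)\,u}.
\]

Define $C(t) = -\frac{\sigma^2(e^{-kt} - 1)}{4k}$. According to (2.1.1), we have

\[
E\Big[e^{u\frac{X(T)}{C(T-t)}} \,\Big|\, \mathcal{F}_t\Big] = \frac{e^{\frac{A(t,T)\,u}{1-2u}}}{(1-2u)^{B/2}},
\]

where $A(t,T) = \frac{e^{-k(T-t)}}{C(T-t)}X(t)$ and $B = \frac{4k\theta}{\sigma^2}$. So the $\mathcal{F}_t$-conditional distribution of $\frac{X(T)}{C(T-t)}$ is noncentral $\chi^2$ with $\frac{4k\theta}{\sigma^2}$ degrees of freedom and noncentrality parameter $\frac{e^{-k(T-t)}X(t)}{C(T-t)}$.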

Now set the short rate r = X, with the parameters k = 0.9, θ = 0.88/k, σ² = 0.033, r0 = 0.08. The numerical density and its error are presented in the following two figures:

[Plot: density against rt for rt from 0 to 1.]

Figure 3.4.: The numerical density function for rt conditioned on r0

[Plot: error (scale 10^{-10}) against rt for rt from 0 to 1.]

Figure 3.5.: The error between numerical density and explicit density

3.3. Valuation of Option

Now we turn to the main purpose of this chapter: option valuation. Assume that X is a generalaffine process. One general assumption in affine models is that the short rate rt = c + γ>Xt,with constant c and γ.

3.3.1. Bond Option Valuation

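The time $t$ price of a European put option on an $S$-bond with expiry time $T < S$ and strike price $K > 0$ is given by

\[
\pi(t) = E_{\mathbb{Q}}\Big[e^{-\int_t^T r_s\,ds}\,(K - P(T,S))^+ \,\Big|\, \mathcal{F}_t\Big].
\]

Under the forward measure $\mathbb{Q}^T$, the price can be represented as

\[
\pi(t) = P(t,T)\,E_{\mathbb{Q}^T}\big[(K - P(T,S))^+ \mid \mathcal{F}_t\big].
\]

Plugging (2.2.3) into this formula,

\[
\begin{aligned}
\pi(t) &= P(t,T)\,E_{\mathbb{Q}^T}\Big[\big(K - e^{\phi(S-T,0) + \psi^\top(S-T,0)X(T)}\big)^+ \,\Big|\, \mathcal{F}_t\Big]\\
&= K P(t,T)\,E_{\mathbb{Q}^T}\Big[\big(1 - e^{-(\log K - \phi(S-T,0) - \psi^\top(S-T,0)X(T))}\big)^+ \,\Big|\, \mathcal{F}_t\Big]\\
&= K P(t,T)\int_{\mathbb{R}} G\big(\log K - \phi(S-T,0) - x\big)\, f^{\mathbb{Q}^T}_{\psi^\top(S-T,0)X(T)\mid X(t)}(x)\,dx\\
&= K P(t,T)\,\big(G * f^{\mathbb{Q}^T}_{\psi^\top(S-T,0)X(T)\mid X(t)}\big)\big(\log K - \phi(S-T,0)\big)\\
&= K P(t,T)\,h\big(\log K - \phi(S-T,0)\big),
\end{aligned}
\tag{3.3.1}
\]

where $f^{\mathbb{Q}^T}_{\psi^\top(S-T,0)X(T)\mid X(t)}$ is the density function of $\psi^\top(S-T,0)X(T)$ conditioned on $X(t)$ under the forward measure $\mathbb{Q}^T$, and $G(x) = (1 - e^{-x})^+$. The function $h$ is the convolution of these two functions.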


According to (2.2.5), we know the Laplace transform of $f^{\mathbb{Q}^T}_{\psi^\top(S-T,0)X(T)\mid X(t)}$:

\[
\begin{aligned}
\hat f^{\mathbb{Q}^T}_{\psi^\top(S-T,0)X(T)\mid X(t)}(u) &= E_{\mathbb{Q}^T}\big[e^{-u\,\psi^\top(S-T,0)X(T)} \mid \mathcal{F}_t\big]\\
&= e^{\phi(T-t,\,-u\psi(S-T,0)) - \phi(T-t,0) + (\psi(T-t,\,-u\psi(S-T,0)) - \psi(T-t,0))^\top X(t)}.
\end{aligned}
\tag{3.3.2}
\]

Moreover, the transform of $G(x)$ is given by

\[
\hat G(u) = \frac{1}{u(u+1)}, \quad u > 0.
\]

Since the transform of the convolution of two functions is equal to the product of their transforms,

\[
\hat h(u) = \frac{1}{u(u+1)}\, e^{\phi(T-t,\,-u\psi(S-T,0)) - \phi(T-t,0) + (\psi(T-t,\,-u\psi(S-T,0)) - \psi(T-t,0))^\top X(t)}.
\]

Then we use the inversion method in section 3.2 to calculate the convolution and finally theput option price. The call option price can be easily calculated by the put-call parity.
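For reference, the transform of $G$ used above follows from a direct integration. This short check uses only the definition $G(x) = (1 - e^{-x})^+$, which vanishes for $x < 0$:

\[
\hat G(u) = \int_0^{\infty} e^{-ux}\,(1 - e^{-x})\,dx = \frac{1}{u} - \frac{1}{u+1} = \frac{1}{u(u+1)}, \quad u > 0.
\]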

3.3.2. Caplet Valuation

The LIBOR rate (London Interbank Offered Rate) is the most important interbank rate and is usually taken as a reference for fixed-income contracts. The LIBOR rate for the period $[T,S]$, $S > T$, at time $T$ is

\[
L(T,S) = \frac{1}{S-T}\Big(\frac{1}{P(T,S)} - 1\Big).
\]

A caplet with reset date $T$ and settlement date $S$, $S > T$, is an option which pays the holder the positive part of the difference between the LIBOR rate and the strike rate $k$ at time $S$. Its cash flow at time $S$ is

\[
(S-T)\,(L(T,S) - k)^+.
\]

Under the forward measure $\mathbb{Q}^T$, the value of the caplet at time $t$ is given by

\[
\begin{aligned}
Cpl(t,T,S) &= P(t,T)\,E_{\mathbb{Q}^T}\big[P(T,S)\,(S-T)\,(L(T,S) - k)^+ \mid \mathcal{F}_t\big]\\
&= P(t,T)\,E_{\mathbb{Q}^T}\Big[P(T,S)\Big(\Big(\frac{1}{P(T,S)} - 1\Big) - (S-T)k\Big)^+ \,\Big|\, \mathcal{F}_t\Big]\\
&= P(t,T)\,\big(1 + (S-T)k\big)\,E_{\mathbb{Q}^T}\Big[\Big(\frac{1}{1 + (S-T)k} - P(T,S)\Big)^+ \,\Big|\, \mathcal{F}_t\Big].
\end{aligned}
\]

We find that pricing the caplet is equivalent to pricing a put option on a bond with strike price $\frac{1}{1+(S-T)k}$. We can use the approach in the previous section to price it, and the floorlet can then be computed by put-call parity.
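The payoff identity behind this equivalence can be checked numerically. The sketch below only compares the two payoffs as functions of the bond price P(T, S); the function names and the sample values are illustrative assumptions, and no pricing model is involved.

```python
def caplet_payoff_at_T(bond_price, delta, strike_rate):
    """Caplet cash flow (S-T)(L-k)^+ paid at S, discounted to T with P(T,S)."""
    libor = (1.0 / bond_price - 1.0) / delta
    return bond_price * delta * max(libor - strike_rate, 0.0)

def scaled_put_payoff(bond_price, delta, strike_rate):
    """(1 + delta*k) put options on the bond with strike 1 / (1 + delta*k)."""
    put_strike = 1.0 / (1.0 + delta * strike_rate)
    return (1.0 + delta * strike_rate) * max(put_strike - bond_price, 0.0)

if __name__ == "__main__":
    delta, k = 0.25, 0.03
    for p in (0.97, 0.99, 0.995, 1.0):
        print(p, caplet_payoff_at_T(p, delta, k), scaled_put_payoff(p, delta, k))
```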


3.3.3. Example: CIR Model

We introduced the CIR process in Section 3.2.6. Let r = X. As an application, we compute the option price for different strike prices.

[Plot: option price (from 0 to 0.2) against strike price (from 0.6 to 2.2).]

Figure 3.6.: option prices vs strike prices

We can also compute the ATM cap price. The tenor structure is as follows: t = 0 (today), T0 = 1/4 (the first reset day) and Ti − Ti−1 = 1/4, i = 1, · · · , 199. Table 3.1 shows the ATM cap prices for various maturities.

Table 3.1.: ATM cap price of CIR model

Maturity    ATM cap price
1           0.0073
2           0.0190
3           0.0302
4           0.0406
5           0.0501
6           0.0588
7           0.0668
8           0.0742
10          0.0871
12          0.0979
15          0.1110
20          0.1265
30          0.1430


3.3.4. Example: Heston Stochastic Volatility Model

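The dynamics of the Heston model are given by

\[
\begin{aligned}
dV(t) &= \alpha(\beta - V(t))\,dt + \sigma\sqrt{V(t)}\,dW^v(t),\\
dS(t) &= r S(t)\,dt + \sqrt{V(t)}\,S(t)\,dW^s(t),\\
dW^v(t)\,dW^s(t) &= \rho\,dt.
\end{aligned}
\]

Here $V$ is the stochastic volatility and $S$ is the stock price, $\rho$ is the correlation of the two driving Brownian motions and $r$ is the constant interest rate. If we define $X_1(t) = V(t)/2$ and $X_2(t) = \log S(t)$ and apply Itô's lemma, we obtain

\[
\begin{aligned}
dX_1(t) &= (k + \kappa X_1(t))\,dt + \sigma\sqrt{2X_1(t)}\,dW_1(t),\\
dX_2(t) &= (r - X_1(t))\,dt + \sqrt{2X_1(t)}\,\big(\rho\,dW_1(t) + \sqrt{1-\rho^2}\,dW_2(t)\big),
\end{aligned}
\]

for some constants $k, \sigma > 0$, $\kappa \in \mathbb{R}$ and $\rho \in [-1,1]$. The factor $1/2$ in $X_1$ is what makes the drift of $\log S$, which is $r - V/2$, equal to $r - X_1$.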

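The process $X = (X_1, X_2)^\top$ is affine with the canonical state space $\mathbb{R}_+ \times \mathbb{R}$. Since the stock price is $S(t) = e^{X_2(t)}$, the price of the put option is given by

\[
\begin{aligned}
\pi(t) &= e^{-r(T-t)}\,E\big[(K - S(T))^+ \mid \mathcal{F}_t\big]\\
&= K e^{-r(T-t)}\,E\Big[\big(1 - e^{-(\log K - (0,1)X(T))}\big)^+ \,\Big|\, \mathcal{F}_t\Big]\\
&= K e^{-r(T-t)}\,\big(f_{(0,1)X(T)\mid X(t)} * G\big)(\log K),
\end{aligned}
\]

with $G(x) = (1 - e^{-x})^+$ as in Section 3.3.1. Inverting the transform, we obtain the put option price; using put-call parity, we also get the call option price.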

For illustration, we set the parameters X(0) = (0.02, 0)^T, k = 0.02, κ = −2.0, σ = 0.1, r = 0.01 and ρ = 0.5. Table 3.2 shows the results for European call option prices at time t = 0 with different strike prices and maturities.

Table 3.2.: Call option prices for Heston model

K \ T    0.2       0.4       0.6       0.8       1.0
0.8      0.2016    0.2037    0.2061    0.2088    0.2115
0.9      0.1049    0.1120    0.1183    0.1239    0.1291
1.0      0.0348    0.0478    0.0571    0.0646    0.0711
1.1      0.0074    0.0168    0.0245    0.0310    0.0368
1.2      0.0012    0.0053    0.0100    0.0144    0.0186

Comparing the results in Tables 3.1 and 3.2 with the results in [1], we find that the two sets of results are the same. This indicates that our pricing results are reliable.


4. Calibration of Affine Models by Kalman Filter

The Kalman filter is an efficient method to estimate the internal states of a Markov process from a series of noisy measurements. The choice of the Kalman filter for calibration is motivated by its characteristics. First, the Kalman filter can estimate the states of the underlying process, which are required for pricing the calibration instruments. Secondly, given the parameters of the model, one output of the Kalman filter is the likelihood of the observations. This gives us an approach to find the optimal parameters: maximum likelihood. Moreover, the likelihood of the observations also gives us an indication of the quality of the calibration. A higher likelihood means the calibrated model can produce more accurate prices.

The Kalman filter algorithm can be separated into two steps: the prediction step and the update step. In the prediction step, given the previous state, the next state of the process is predicted according to the model of the underlying process. In the update step, the prediction of the next state is updated based on the measurements (with some random noise) we observe.

The Kalman filter in this thesis is the discrete Kalman filter, which has a discrete time domain. There is also a continuous Kalman filter with a continuous time domain, but it is not suitable in our case because the data (calibration instruments) we have are observed at discrete times. So if the original dynamics are continuous Markov processes, a standard (Euler) time discretization should be applied first.

4.1. Outline of Kalman Filter Algorithm

In general, the Kalman filter model is as follows:

\[
\begin{aligned}
X_k &= F_k(X_{k-1}, \mu_k, w_k),\\
Z_k &= H_k(X_k, v_k),
\end{aligned}
\tag{4.1.1}
\]

where
· $X_k \in \mathbb{R}^d$ is the state of the process at time $k$.
· $Z_k \in \mathbb{R}^m$ is the observable measurement at time $k$.
· $\mu_k \in \mathbb{R}^d$ is the optional control term, which is an input of the system. In our case we can set this control term equal to zero.
· $w_k$ is a normally distributed random variable with mean 0 and covariance $Q_k$.
· $v_k$ is the measurement noise, which is normally distributed with mean 0 and covariance $R_k$.
· $F_k$ and $H_k$ are general functions at time $k$.

The initial state, say $X_0$, is required in the Kalman filter system. Normally, we assume $X_0 \sim N(x, P_0)$, where $x$ and $P_0$ are inputs determined by known information about the underlying process.


We assume that the initial state $X_0$, the random variables $w_1, \cdots, w_k$ and the noise series $v_1, \cdots, v_k$ are independent. Then we have the following Markov property:

\[
\begin{aligned}
P(X_k \mid X_{k-1}, \cdots, X_0) &= P(X_k \mid X_{k-1}),\\
P(Z_k \mid X_k, \cdots, X_0) &= P(Z_k \mid X_k).
\end{aligned}
\]

The Kalman filter is directly linked to dynamic Bayesian networks. The core of the Kalman filter is using the observations to correct the estimate of the latent states. Define $\hat X^-_k$, the prior, as the estimator of $X_k$ based on the information up to time $k-1$. Normally, the prior is given by

\[
\hat X^-_k = E[F_k(X_{k-1}, w_k) \mid \hat X_{k-1}].
\]

Denote its estimation error by $\varepsilon^-_k = X_k - \hat X^-_k$. Then the prior error matrix is

\[
P^-_k = E[\varepsilon^-_k (\varepsilon^-_k)^\top].
\]

Define $\hat X_k$, the posterior, as the updated estimator of $X_k$ given the measurement $Z_k$, and denote its error by $\varepsilon_k = X_k - \hat X_k$. The posterior error matrix is given by

\[
P_k = E[\varepsilon_k \varepsilon_k^\top].
\]

The challenge is to make a connection between the posterior and the prior. In the Kalman filter, the variance control technique is used by assuming

\[
\hat X_k = \hat X^-_k + K_k(Z_k - \hat Z_k).
\]

Here $\hat Z_k = E[H_k(X_k, v_k) \mid \hat X^-_k]$ is the estimate of the measurement $Z_k$ based on the prior $\hat X^-_k$. The difference $(Z_k - \hat Z_k)$ is called the measurement innovation. It reflects the deviation between the real measurement and the predicted measurement. The matrix $K_k$ is called the optimal gain, which minimizes the expected posterior error $\sum_{k=1}^{n} E|X_k - \hat X_k|^2$. It determines how much the prediction should be corrected. One can easily see that the expected posterior error is the trace of $P_k$. By definition,

\[
\begin{aligned}
P_k &= E(X_k - \hat X_k)(X_k - \hat X_k)^\top\\
&= E\big(X_k - \hat X^-_k - K(Z_k - \hat Z_k)\big)\big(X_k - \hat X^-_k - K(Z_k - \hat Z_k)\big)^\top\\
&= P^-_k - E(X_k - \hat X^-_k)(Z_k - \hat Z_k)^\top K^\top - K E(Z_k - \hat Z_k)(X_k - \hat X^-_k)^\top + K E(Z_k - \hat Z_k)(Z_k - \hat Z_k)^\top K^\top.
\end{aligned}
\]

Taking the gradient of the trace of $P_k$ w.r.t. $K$ and setting the gradient to zero, we obtain the optimal gain

\[
K_k = P_{X_k,Z_k} P^{-1}_{Z_k,Z_k},
\]

with

\[
\begin{aligned}
P_{X_k,Z_k} &= E(X_k - \hat X^-_k)(Z_k - \hat Z_k)^\top,\\
P_{Z_k,Z_k} &= E(Z_k - \hat Z_k)(Z_k - \hat Z_k)^\top.
\end{aligned}
\]

Hence the posterior error matrix is

\[
P_k = P^-_k - P_{X_k,Z_k} P^{-1}_{Z_k,Z_k} P^\top_{X_k,Z_k} = P^-_k - K P_{Z_k,Z_k} K^\top.
\]


4.2. Linear Kalman Filter

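The simplest Kalman filter models are the linear models, where the functions $F$ and $H$ can be represented as linear operators. These models are of the form

\[
\begin{aligned}
X_k &= F_k X_{k-1} + w_k, \quad w_k \sim N(0, Q_k),\\
Z_k &= H_k X_k + v_k, \quad v_k \sim N(0, R_k),
\end{aligned}
\]

where $F_k$ and $H_k$ are matrices. Then

\[
X_k \mid X_{k-1} \sim N(F_k X_{k-1}, Q_k), \qquad Z_k \mid X_k \sim N(H_k X_k, R_k).
\]

The prior and its error matrix are given by

\[
\begin{aligned}
\hat X^-_k &= F_k \hat X_{k-1},\\
P^-_k &= E\big(F_k(X_{k-1} - \hat X_{k-1}) + w_k\big)\big(F_k(X_{k-1} - \hat X_{k-1}) + w_k\big)^\top = F_k P_{k-1} F_k^\top + Q_k.
\end{aligned}
\]

The error covariances are

\[
\begin{aligned}
P_{X_k,Z_k} &= E(X_k - \hat X^-_k)\big(H_k(X_k - \hat X^-_k) + v_k\big)^\top = P^-_k H_k^\top,\\
P_{Z_k,Z_k} &= E\big(H_k(X_k - \hat X^-_k) + v_k\big)\big(H_k(X_k - \hat X^-_k) + v_k\big)^\top = H_k P^-_k H_k^\top + R_k.
\end{aligned}
\]

It follows that the optimal gain is

\[
K_k = P^-_k H_k^\top (H_k P^-_k H_k^\top + R_k)^{-1}.
\]

Hence we can update the estimate and its error matrix:

\[
\begin{aligned}
\hat X_k &= \hat X^-_k + K_k(Z_k - H_k \hat X^-_k),\\
P_k &= P^-_k - P^-_k H_k^\top (H_k P^-_k H_k^\top + R_k)^{-1} H_k P^-_k = (I - K_k H_k) P^-_k.
\end{aligned}
\]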

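We want to obtain the likelihood of the observations $Z_k$, $k = 1, 2, \cdots$. Writing $Z^n = (Z_n, \cdots, Z_1)$, note that

\[
P(Z^n) = P(Z_n, \cdots, Z_1) = \Big(\prod_{k=2}^{n} P(Z_k \mid Z^{k-1})\Big) P(Z_1).
\]

Since the joint distribution of $(X_k, Z_k)$ is Gaussian, we have

\[
Z_k \mid Z^{k-1} \sim N(\hat Z_k, P_{Z_k,Z_k}).
\]

Using the log-likelihood $l_n = \log P(Z^n)$, and adopting the convention that $l_1 = 0$, we obtain the recursion for the estimated likelihood, which is the logarithm of the Gaussian density of $Z_k \in \mathbb{R}^m$ added to $l_{k-1}$:

\[
l_k = l_{k-1} - \frac{1}{2}\Big(m \log 2\pi + \log\det(P_{Z_k,Z_k}) + (Z_k - H_k \hat X^-_k)^\top P^{-1}_{Z_k,Z_k} (Z_k - H_k \hat X^-_k)\Big).
\]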
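One step of the linear filter, including its contribution to the log-likelihood, can be sketched as follows. This is a minimal NumPy illustration with hypothetical names; the log-likelihood term is the standard Gaussian log-density of the innovation with covariance $H_k P^-_k H_k^\top + R_k$.

```python
import numpy as np

def kalman_step(x_post, P_post, z, F, H, Q, R):
    """One prediction/update step; returns new posterior mean, covariance, log-likelihood term."""
    # Prediction step: prior mean and prior error matrix
    x_prior = F @ x_post
    P_prior = F @ P_post @ F.T + Q
    # Measurement innovation and its covariance P_{Z,Z}
    innovation = z - H @ x_prior
    S = H @ P_prior @ H.T + R
    # Optimal gain K = P^- H^T S^{-1}, computed by a linear solve (S, P_prior symmetric)
    K = np.linalg.solve(S, H @ P_prior).T
    # Update step
    x_new = x_prior + K @ innovation
    P_new = (np.eye(len(x_post)) - K @ H) @ P_prior
    # Gaussian log-density of the innovation
    _, logdet = np.linalg.slogdet(S)
    loglik = -0.5 * (len(z) * np.log(2.0 * np.pi) + logdet
                     + innovation @ np.linalg.solve(S, innovation))
    return x_new, P_new, loglik

if __name__ == "__main__":
    one = np.array([[1.0]])
    x, P, ll = kalman_step(np.array([0.0]), one, np.array([2.0]),
                           F=one, H=one, Q=np.array([[0.0]]), R=one)
    print(x, P, ll)
```

Summing the returned log-likelihood terms over all observation dates gives the value that the calibration maximizes.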


4.3. Extended Kalman Filter

In general, the functions $F_k$ and $H_k$ in (4.1.1) are not linear. In our case, although the underlying affine process $X$ can be discretized into linear dynamics (we will show this later), the caplet pricing formula w.r.t. the underlying process $X$ is nonlinear. So the traditional (linear) Kalman filter cannot be applied to such systems. There are several approaches to deal with the nonlinearity. In this section, the extended Kalman filter (EKF) is presented.

The extended Kalman filter linearizes the system by Taylor expansion to form a Gaussian approximation to the joint distribution of the state $X_k$ and the measurement $Z_k$. The first order EKF is presented and used in this thesis. A higher order EKF is also possible and can be constructed in a similar way.

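In the first order EKF, the prior and the prediction of the measurement are given by the linear approximation:

\[
\begin{aligned}
\hat X^-_k &= E[F_k(X_{k-1}, w_k) \mid \hat X_{k-1}] \approx F_k(\hat X_{k-1}),\\
\hat Z_k &= E[H_k(\hat X^-_k, v_k) \mid \hat X^-_k] \approx H_k(\hat X^-_k).
\end{aligned}
\]

The covariance matrices are determined by the first order linearized system

\[
\begin{aligned}
X_k &= \hat X^-_k + A_k(X_{k-1} - \hat X_{k-1}) + W_k w_k,\\
Z_k &= \hat Z_k + B_k(X_k - \hat X^-_k) + V_k v_k,
\end{aligned}
\]

where

\[
\begin{aligned}
A_k(i,j) &= \frac{\partial F^i_k}{\partial x_j}(\hat X_{k-1}, 0), &
B_k(i,j) &= \frac{\partial H^i_k}{\partial x_j}(\hat X^-_k, 0),\\
W_k(i,j) &= \frac{\partial F^i_k}{\partial w_j}(\hat X_{k-1}, 0), &
V_k(i,j) &= \frac{\partial H^i_k}{\partial v_j}(\hat X^-_k, 0).
\end{aligned}
\]

The derivatives of $H_k$ are evaluated at the prior $\hat X^-_k$, the point around which the measurement equation is linearized. Then we have the approximation for the error matrices

\[
\begin{aligned}
P^-_k &\approx E\big(A_k(X_{k-1} - \hat X_{k-1}) + W_k w_k\big)\big(A_k(X_{k-1} - \hat X_{k-1}) + W_k w_k\big)^\top = A_k P_{k-1} A_k^\top + W_k Q_k W_k^\top,\\
P_{X_k,Z_k} &\approx E(X_k - \hat X^-_k)\big(B_k(X_k - \hat X^-_k) + V_k v_k\big)^\top = P^-_k B_k^\top,\\
P_{Z_k,Z_k} &\approx E\big(B_k(X_k - \hat X^-_k) + V_k v_k\big)\big(B_k(X_k - \hat X^-_k) + V_k v_k\big)^\top = B_k P^-_k B_k^\top + V_k R_k V_k^\top.
\end{aligned}
\]

It follows that the optimal gain is

\[
K_k = P^-_k B_k^\top (B_k P^-_k B_k^\top + V_k R_k V_k^\top)^{-1}.
\]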


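Hence the posterior and its error matrix are

\[
\hat X_k = \hat X^-_k + K_k(Z_k - \hat Z_k), \qquad P_k = (I - K_k B_k) P^-_k.
\]

One of the assumptions in the EKF is that the joint distribution of $(X_k, Z_k)$ is Gaussian. Given the information up to time $k-1$,

\[
(X_k, Z_k)^\top \sim N\big((\hat X^-_k, \hat Z_k)^\top, \Sigma\big),
\]

with

\[
\Sigma = \begin{pmatrix} P^-_k & P_{X_k,Z_k}\\ P^\top_{X_k,Z_k} & P_{Z_k,Z_k} \end{pmatrix},
\tag{4.3.1}
\]

and the recursion for the log-likelihood function, with $Z_k \in \mathbb{R}^m$, is given by

\[
l_k = l_{k-1} - \frac{1}{2}\Big(m \log 2\pi + \log\det(P_{Z_k,Z_k}) + (Z_k - H_k(\hat X^-_k))^\top P^{-1}_{Z_k,Z_k} (Z_k - H_k(\hat X^-_k))\Big).
\]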
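The EKF update step can be sketched for a generic measurement function. The example below makes two simplifying assumptions that are not part of the text: the measurement noise is additive (so $V_k = I$), and the Jacobian $B_k$ is approximated by central finite differences instead of an analytic derivative. All names are hypothetical.

```python
import numpy as np

def numerical_jacobian(func, x, h=1e-6):
    """Central finite-difference Jacobian of func at x (stand-in for analytic derivatives)."""
    fx = np.atleast_1d(func(x))
    J = np.empty((fx.size, x.size))
    for j in range(x.size):
        step = np.zeros_like(x)
        step[j] = h
        J[:, j] = (np.atleast_1d(func(x + step)) - np.atleast_1d(func(x - step))) / (2.0 * h)
    return J

def ekf_update(x_prior, P_prior, z, h_func, R):
    """EKF update with additive noise: linearize h_func at the prior mean."""
    B = numerical_jacobian(h_func, x_prior)
    z_hat = np.atleast_1d(h_func(x_prior))
    S = B @ P_prior @ B.T + R                 # P_{Z,Z}
    K = np.linalg.solve(S, B @ P_prior).T     # K = P^- B^T S^{-1}
    x_post = x_prior + K @ (z - z_hat)
    P_post = (np.eye(x_prior.size) - K @ B) @ P_prior
    return x_post, P_post

if __name__ == "__main__":
    # Nonlinear toy measurement: a discount-factor-like function of the state
    h_func = lambda x: np.array([np.exp(-x[0] - 0.5 * x[1])])
    x_post, P_post = ekf_update(np.array([0.02, 0.01]), 1e-4 * np.eye(2),
                                np.array([0.97]), h_func, np.array([[1e-6]]))
    print(x_post, P_post)
```

For a linear measurement function the update reduces to the linear Kalman filter update, which gives a simple way to test an implementation.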

Remark 4.3.1. The most difficult step in the EKF is the calculation of the derivatives, due to the complexity of the pricing formula. In affine models, we propose an efficient way to calculate the derivatives. Recall that the caplet pricing formula is of the form $P(0,T)\,h(x,k)$, where $k$ is the strike price, which is constant w.r.t. $x$, and $h$ is the convolution of a known function $G$ and the density function $f = f^{\mathbb{Q}^T}_{\psi^\top(S-T,0)X_T \mid X_t}$ (see (3.3.1)). It is easy to compute the derivative of $P(0,T)$ w.r.t. the risk factor. So what remains is to compute the derivative of the convolution w.r.t. the state $X$. Assume that the density function is regular enough that differentiation and integration can be interchanged. Denote $f'_i = \frac{df}{dx_i}$; then

\[
\frac{d\hat f}{dx_i} = \frac{d}{dx_i}\int_{-\infty}^{\infty} e^{-st} f(t)\,dt = \int_{-\infty}^{\infty} e^{-st}\,\frac{df}{dx_i}(t)\,dt = \hat{f'_i}.
\tag{4.3.2}
\]

The formula for $\hat f$ is given by (3.3.2) and we can easily compute its derivative w.r.t. $x_i$; hence we obtain the Laplace transform $\hat{f'_i}$. Moreover, the derivatives in the EKF are

\[
\begin{aligned}
B(i) = \frac{dh}{dx_i} &= \frac{d}{dx_i}\int_{-\infty}^{\infty} G(t-u) f(u)\,du\\
&= \int_{-\infty}^{\infty} \frac{d}{dx_i}\big(G(t-u) f(u)\big)\,du\\
&= \int_{-\infty}^{\infty} G(t-u)\,\frac{df}{dx_i}(u)\,du\\
&= (G * f'_i)(t).
\end{aligned}
\]

So the derivative $B(i)$ can be represented as a convolution. We can use the Laplace transform inversion algorithm to compute this convolution, since we can compute $\hat{f'_i}$ by (4.3.2).


Remark 4.3.2. The EKF has the following two shortcomings:

1. The EKF uses a first or higher order expansion to linearize the nonlinear system and to approximate the mean and variance. This kind of approximation only works well for almost linear functions. For more general functions, the linearization may introduce large errors.

2. Although in our case the Jacobian matrices can be computed efficiently, in general the calculation of Jacobian matrices can be difficult and can introduce extra errors. Moreover, there are even cases in which the Jacobian matrices do not exist (Julier and Uhlmann, 2004).

4.4. Calibration Algorithm

In applications, the underlying processes should be discretized before applying the Kalman filter. The discretization leads to a linear relation between the underlying states (we will see this later in the calibration testing part). Hence the Kalman filter model in our application is of the form

\[
\begin{aligned}
X_k &= F_k X_{k-1} + w_k,\\
Z_k &= H_k(X_k) + v_k.
\end{aligned}
\]

Here $F_k$ is a linear operator and $H_k$ is the pricing function. The linearity of $H_k$ depends on the calibration instruments. For instance, if the calibration only uses zero rates, then $H_k$ is linear; on the other hand, if caplets are used, as in our case, then $H_k$ is nonlinear. The following figure illustrates the Kalman filter in our case:

Figure 4.1.: The algorithm of extended Kalman filter


Given the parameters of the underlying process, denoted by Θ, and the noise level ε (which defines the covariance matrices of $v_k$), the Kalman filter algorithm gives the likelihood of the observations. Hence we can build a likelihood function w.r.t. the parameters and the noise level. Denote this function by l = Kalman(Θ, ε). The calibration procedure is to find the parameters and the noise level that maximize the likelihood, which is equivalent to the following optimization problem:

\[
\min_{\Theta,\varepsilon}\ -\mathrm{Kalman}(\Theta, \varepsilon), \quad \text{s.t. the parameters satisfy the admissibility conditions.}
\]

This optimization problem can be solved with the function "fmincon" in MATLAB.

4.5. QR Decomposition for Linear or Extended Kalman Filter

The QR decomposition of a matrix A decomposes A into the product QR with Q an orthogonal matrix and R an upper triangular matrix. Moreover, if A is an m × n matrix with m > n, then Q can be taken as an m × m unitary matrix and R as an m × n upper triangular matrix whose bottom m − n rows are all zero. In this case, we can rewrite the matrix A as

\[ A = QR = Q \begin{pmatrix} R_1 \\ 0 \end{pmatrix} = (Q_1, Q_2) \begin{pmatrix} R_1 \\ 0 \end{pmatrix} = Q_1 R_1, \]

with $R_1$ an n × n upper triangular matrix, and $Q_1$, $Q_2$ an m × n matrix and an m × (m − n) matrix respectively, both with orthogonal columns.
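The reduced factorization $A = Q_1 R_1$ can be illustrated with a few lines of Python. This is a minimal sketch that uses modified Gram-Schmidt on a full-column-rank matrix; the thesis does not state which QR algorithm is used, and production code would normally call a library routine (typically Householder based). The function name `thin_qr` is hypothetical.

```python
def thin_qr(a):
    """Thin QR of an m x n matrix (list of rows, full column rank) by
    modified Gram-Schmidt: returns Q1 (m x n) and R1 (n x n)."""
    m, n = len(a), len(a[0])
    cols = [[a[i][j] for i in range(m)] for j in range(n)]
    q_cols = []
    r = [[0.0] * n for _ in range(n)]
    for j in range(n):
        v = cols[j][:]
        # Remove the components along the already computed orthonormal columns
        for i, q in enumerate(q_cols):
            r[i][j] = sum(qk * vk for qk, vk in zip(q, v))
            v = [vk - r[i][j] * qk for qk, vk in zip(q, v)]
        # Normalize what is left; its norm is the diagonal entry of R1
        r[j][j] = sum(vk * vk for vk in v) ** 0.5
        q_cols.append([vk / r[j][j] for vk in v])
    q1 = [[q_cols[j][i] for j in range(n)] for i in range(m)]
    return q1, r

a = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # arbitrary 3 x 2 example
q1, r1 = thin_qr(a)  # q1 has orthonormal columns, r1 is upper triangular
```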

In the linear or extended Kalman filter, we need to invert a matrix to compute the optimal Kalman gain. In the linear Kalman filter, the optimal Kalman gain is given by $K_k = P_k^- H_k^\top (H_k P_k^- H_k^\top + R_k)^{-1}$. Denote $H_k P_k^- H_k^\top + R_k$ by $S_k$. We assume that the noises are independent, so the covariance matrix $R_k$ is given by

\[ R_k = \begin{pmatrix} h_1 & & & \\ & h_2 & & \\ & & \ddots & \\ & & & h_N \end{pmatrix}, \]

where N is the number of different calibration instruments, and $h_i$, $i = 1, \dots, N$ are the noise levels (the variances of the noises). If a certain noise level $h_i$ is very small, then $S_k$ is almost singular and it is difficult to invert numerically. We can solve this problem by applying the QR decomposition.

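Since $P_k^-$ is positive semi-definite, we denote the Cholesky factor of $P_k^-$ by $L_k$, so $L_k L_k^\top = P_k^-$. Then we apply the QR decomposition to $H_k L_k$, namely $H_k L_k = Q_k R_k$ (the symbol $R_k$ is overloaded: in $S_k$ it is the noise covariance, here taken to be $hI$, whereas in the factorization it is the triangular QR factor). Assume we use N different calibration instruments, then $Q_k$ is $N \times N$ and $R_k = \begin{pmatrix} R_k^1 \\ 0 \end{pmatrix}$ is $N \times m$ with $m \ll N$ the dimension of the underlying process. Then

\begin{align*}
S_k &= H_k P_k^- H_k^\top + R_k \\
    &= H_k L_k L_k^\top H_k^\top + hI \\
    &= Q_k \left( R_k R_k^\top + hI \right) Q_k^\top \\
    &= Q_k \begin{pmatrix} R_k^1 (R_k^1)^\top + hI_1 & 0 \\ 0 & hI_2 \end{pmatrix} Q_k^\top .
\end{align*}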

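It follows that

\begin{align*}
K_k &= P_k^- H_k^\top S_k^{-1} \\
    &= L_k L_k^\top H_k^\top Q_k \begin{pmatrix} (R_k^1 (R_k^1)^\top + hI_1)^{-1} & 0 \\ 0 & h^{-1} I_2 \end{pmatrix} Q_k^\top \\
    &= L_k R_k^\top \begin{pmatrix} (R_k^1 (R_k^1)^\top + hI_1)^{-1} & 0 \\ 0 & h^{-1} I_2 \end{pmatrix} Q_k^\top \\
    &= \left[ L_k (R_k^1)^\top (R_k^1 (R_k^1)^\top + hI_1)^{-1}, \; 0 \right] Q_k^\top .
\end{align*}

So in order to compute $K_k$, we only need to invert the $m \times m$ matrix $R_k^1 (R_k^1)^\top + hI_1$. This matrix is guaranteed to be positive semi-definite (indeed positive definite for $h > 0$) even if $h$ is small. Moreover, inverting an $m \times m$ matrix instead of an $N \times N$ matrix makes the algorithm more efficient. The QR decomposition can be applied to the extended Kalman filter similarly.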
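As a sanity check of the gain formula, consider the special case of a one-dimensional state, $m = 1$. This worked example is an illustration only; the symbol $u$ is introduced here and does not appear elsewhere in the thesis. Write $u := H_k L_k \in \mathbb{R}^{N \times 1}$ with $u \neq 0$, so that one may take $R_k^1 = \|u\|$ and $u/\|u\|$ as the first column of $Q_k$. The QR-based expression and the direct expression for the gain then agree:

```latex
% Assumptions: m = 1, u := H_k L_k \neq 0, noise covariance hI with h > 0.
\begin{align*}
\text{QR form:}\quad
K_k &= \Big[ L_k \,\|u\|\, \big(\|u\|^2 + h\big)^{-1},\; 0 \Big] Q_k^\top
     = \frac{L_k\, u^\top}{\|u\|^2 + h}, \\
\text{direct form:}\quad
K_k &= L_k\, u^\top \big(u u^\top + hI\big)^{-1}
     = \frac{L_k\, u^\top}{\|u\|^2 + h},
\end{align*}
% The last equality uses u^\top (u u^\top + hI) = (\|u\|^2 + h)\, u^\top.
```

In the QR form only the scalar $\|u\|^2 + h$ is inverted, whereas the direct form inverts the $N \times N$ matrix $uu^\top + hI$, which has $N - 1$ eigenvalues equal to $h$.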


5. Calibration Example: Two Factor Hull White Model with Stochastic Volatility

In this chapter, we calibrate the two factor Hull White model with stochastic volatility and check the performance of the calibration algorithm. For convenience, we assume that the real-world measure P and the risk-neutral measure Q are the same.

The two factor Hull White model with stochastic volatility in our testing is defined by

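\begin{align*}
dV(t) &= k_1(\theta_1 - V(t))\,dt + \sigma_1 \sqrt{V(t)}\,dW_1(t), \\
dX_1(t) &= k_2(\theta_2 - X_1(t))\,dt + \sigma_2 \sqrt{V(t)}\,\Big(\rho_1\,dW_1(t) + \sqrt{1-\rho_1^2}\,dW_2(t)\Big), \\
dX_2(t) &= k_3(\theta_3 - X_2(t))\,dt + \sigma_3 \sqrt{V(t)}\,\Big(\rho_2\,dW_1(t) + a\,dW_2(t) + \sqrt{1-\rho_2^2-a^2}\,dW_3(t)\Big),
\end{align*}

where $a = \frac{\rho_3 - \rho_1\rho_2}{\sqrt{1-\rho_1^2}}$. The $\theta_1, \theta_2, \theta_3$ are the long-term means and $k_1, k_2, k_3$ are the mean reversion rates, which correspond to the speed of adjustment to the long-term means. $\rho_1, \rho_2, \rho_3$ are the correlations between V and X1, V and X2, and X1 and X2, respectively. Let $Y = (V, X_1, X_2)$, then Y is an affine process if $k_1\theta_1 \ge 0$.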

Note that in general Hull White processes, the long-term means θ2(t) and θ3(t) are functions of time t. For convenience, we just assume θ2(t) ≡ θ2 and θ3(t) ≡ θ3.

Remark 5.0.1. The difference between the real-world measure P and the risk-neutral measure Q is that the real-world measure P contains a risk premium. By the Girsanov theorem, this risk premium only has an impact on the drift term of a process. Therefore, assuming that X(t), the underlying process under the measure Q, is affine, when P ≠ Q we just need to introduce a process λ(t) to model the risk premium; the same underlying process under the real-world measure, written $X^P(t)$ here, then has the dynamics $dX^P(t) = dX(t) + \lambda(t)\,dt$. In this way, the affine model can easily translate the underlying process between measure P and measure Q.

5.1. Data Selection

In the calibration, we use simulated historical data instead of real historical data. We assign the “real” values to the parameters beforehand, and use Monte Carlo to generate the underlying process under the “real” parameters. Then we compute the option prices based on the simulated underlying process and add noise to the prices. These noisy prices are the simulated historical data we use. Using the simulated historical data, we calibrate the model with the extended Kalman filter and finally get the “calibrated” parameters. By comparing the “calibrated” parameters with the “real” parameters, we can judge the calibration performance.

The calibration instruments are zero rates with different maturities, and the sums of caplet prices and the corresponding floorlet prices (with the same tenor and strike) for different expiries and strikes. The reason why we choose the sum of caplets and floorlets is as follows: extreme strikes are needed to calibrate the covariance; however, the prices of caplets with large strikes and the prices of floorlets with small strikes can be almost zero, depending on the parameters and the values of the risk factors. In this case, the relative estimation error can be very large even if the absolute error is small, since the prices are very small. The Kalman filter would therefore become very unstable.

In single caplet (or floorlet) pricing, solving the Riccati equations accounts for over 90% of the total time. Note, however, that given the parameters and the value of u, the solution of the Riccati equations only depends on the expiry and maturity. So in calibration, for caplets (or floorlets) with the same expiry and maturity, we can use the same Riccati solution to compute their prices. Moreover, for each run of the Kalman filter, we only need to solve the Riccati equations once and can then price all the caplets. In the calibration testing, the maturity of all the options is fixed at 3 months.

Defining fixed strikes beforehand may lose information, since we do not know at which level the given strikes are. If all the given strikes are at the ITM level, then we lose the information of OTM caplets (or floorlets). So for the calibration, the strike of each caplet is defined as a proportion of the ATM strike.

5.2. Kalman Filter Model for Two Factor Hull White Model with Stochastic Volatility

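Before the calibration, we need to discretize the underlying process. Denote $K = \mathrm{diag}(k_1, k_2, k_3)$, $\Theta = (\theta_1, \theta_2, \theta_3)^\top$,

\[ \rho = \begin{pmatrix} \sigma_1 & 0 & 0 \\ \sigma_2\rho_1 & \sigma_2\sqrt{1-\rho_1^2} & 0 \\ \sigma_3\rho_2 & \sigma_3 a & \sigma_3\sqrt{1-\rho_2^2-a^2} \end{pmatrix}, \]

and $\Sigma = \rho\rho^\top$. Fix the step size to be $\delta = t_{k+1} - t_k$ for all k. Then the discretized process is given by

\begin{align*}
Y(k+1) &= Y(k) + K(\Theta - Y(k))\delta + \sqrt{V(k)}\,\rho Z_k \\
       &= K\Theta\delta + (I - K\delta)Y(k) + w_k,
\end{align*}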

where $Y = (V, X_1, X_2)^\top$, and $Z_k \sim N(0, \delta I)$ with I the three-dimensional identity matrix. The noise part is not independent of the process Y. However, if the step size δ is small enough, we can assume the volatility to be constant in each step. In our case, we choose $\delta = 1/365$. Hence we obtain the Kalman filter model for the two factor Hull White model with stochastic volatility

\begin{align*}
Y(k+1) &= K\Theta\delta + (I - K\delta)Y(k) + w_k, \\
Z_{k+1} &= h(Y(k+1)) + v_k,
\end{align*}

where $w_k \sim N(0, V(k)\delta\Sigma)$ with constant V(k), h is the non-linear pricing function, and $v_k$ is the noise.
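One step of this discretization can be sketched in Python as follows. The helper names `make_rho` and `euler_step` are hypothetical, and truncating V at zero inside the square root is an assumption of this sketch (the thesis does not say how negative values of V are handled). The numerical values used in the usage lines are the “real” parameters of the calibration test.

```python
import math
import random

def make_rho(s1, s2, s3, r1, r2, r3):
    """Lower-triangular matrix rho with Sigma = rho * rho^T."""
    a = (r3 - r1 * r2) / math.sqrt(1.0 - r1 * r1)
    return [
        [s1, 0.0, 0.0],
        [s2 * r1, s2 * math.sqrt(1.0 - r1 * r1), 0.0],
        [s3 * r2, s3 * a, s3 * math.sqrt(1.0 - r2 * r2 - a * a)],
    ]

def euler_step(y, z, k, theta, rho, delta):
    """Y(k+1) = Y(k) + K(Theta - Y(k))*delta + sqrt(V(k)) * rho * Z_k,
    where y = (V, X1, X2), k holds the diagonal of K and z ~ N(0, delta*I)."""
    sqrt_v = math.sqrt(max(y[0], 0.0))  # assumption: truncate V at 0
    return [
        y[i] + k[i] * (theta[i] - y[i]) * delta
        + sqrt_v * sum(rho[i][j] * z[j] for j in range(3))
        for i in range(3)
    ]

delta = 1.0 / 365.0
k = [0.4, 0.3, 0.21]
theta = [1.0, 0.05, 0.05]
rho = make_rho(0.2, 0.008, 0.01, -0.3, -0.5, -0.4)
z = [random.gauss(0.0, math.sqrt(delta)) for _ in range(3)]
y_next = euler_step(theta, z, k, theta, rho, delta)  # one random step from Y = Theta
```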


5.3. Derivative Computation of Affine Model for EKF Application

In the last chapter, we showed how to compute the derivatives of caplet prices w.r.t. the risk factors given constant strike prices. However, in the calibration the strikes are related to the ATM strike, which is also a function of the risk factors. Hence the strikes are functions of the risk factors as well, and the previous algorithm for derivative calculation is not suitable in this case. Fortunately, we can still compute the derivative in a similar way.

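Recall the pricing formula for a caplet: $P(t,T)h(x,k)$, with $h = G * f$ and $k$ the strike price. The difference now lies in the derivative of the function h. Denote $H(x) = h(x, k(x))$, then

\[ \frac{dH}{dx} = \frac{\partial h}{\partial x} + \frac{\partial h}{\partial k}\frac{dk}{dx}. \]

The derivative $\frac{\partial h}{\partial x}$ can be computed as before. The function k is proportional to the ATM strike. The ATM strike price for a caplet with expiry T and maturity S is given by

\begin{align*}
\mathrm{ATM} &= \frac{1}{S-T}\left( P(t,T)/P(t,S) - 1 \right) \\
             &= \frac{1}{S-T}\left( e^{\phi(T,0) - \phi(S,0) + (\psi(T,0) - \psi(S,0))^\top X_t} - 1 \right).
\end{align*}

Hence $\frac{dk}{dx}$ can be computed easily. The only thing left is to compute $\frac{\partial h}{\partial k}$. Note that $G(x) = G'(x) = 0$ if $x < 0$, so

\begin{align*}
\frac{\partial h}{\partial k} &= \frac{\partial}{\partial k} \int_{-\infty}^{k} G(k-t) f(t)\,dt \\
  &= \int_{-\infty}^{\infty} G'(k-t) f(t)\,dt \\
  &= G' * f. \tag{5.3.1}
\end{align*}

The Laplace transform of $G'$ is $\frac{1}{s+1}$. So we can compute (5.3.1) by inverting the Laplace transform.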
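For completeness, the derivative $\frac{dk}{dx}$ can be written out explicitly. The following is a short worked derivation under the assumption that the strike is $k(x) = c \cdot \mathrm{ATM}(x)$ with a fixed proportion $c$ (the symbol $c$ is introduced here for illustration) and that $x$ stands for the current state $X_t$:

```latex
% Assumption: k(x) = c * ATM(x), with c a fixed strike proportion and x = X_t.
\begin{align*}
\frac{dk}{dx}
  &= \frac{c}{S-T}\,\big(\psi(T,0) - \psi(S,0)\big)\,
     e^{\phi(T,0) - \phi(S,0) + (\psi(T,0) - \psi(S,0))^\top x} \\
  &= c\,\big(\psi(T,0) - \psi(S,0)\big)\Big(\mathrm{ATM} + \frac{1}{S-T}\Big).
\end{align*}
```

The second line uses $e^{\phi(T,0) - \phi(S,0) + (\psi(T,0) - \psi(S,0))^\top x} = (S-T)\,\mathrm{ATM} + 1$, so no quantity beyond the ATM strike and the Riccati solutions is needed.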

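Remark 5.3.1. The function G(x) is not differentiable at x = 0. But since $G(x) = G'(x) = 0$ for $x < 0$,

\begin{align*}
\frac{\partial}{\partial k} \int_{-\infty}^{k} G(k-t) f(t)\,dt
  &= \frac{\partial}{\partial k} \int_{-\infty}^{k^-} G(k-t) f(t)\,dt \\
  &= \int_{-\infty}^{k^-} G'(k-t) f(t)\,dt \\
  &= \int_{-\infty}^{\infty} G'(k-t) f(t)\,dt,
\end{align*}

with

\[ G'(x) = \begin{cases} e^{-x}, & x > 0; \\ 0, & x < 0. \end{cases} \]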


Example 5.3.1. This example shows the performance of the EKF under the noise level $10^{-6}$. We randomly choose a starting point for the Kalman filter and observe that the algorithm estimates the real risk factors very well.

Figure 5.1.: V: filtered by improved EKF v.s. real (curves: EKF, Real)

Figure 5.2.: X1: filtered by improved EKF v.s. real (curves: EKF, Real)

Figure 5.3.: X2: filtered by improved EKF v.s. real (curves: EKF, Real)

5.4. Results

In this section we present the calibration results of the extended Kalman filter. We use 20 different zero rates with maturities from 1 up to 20, and 42 different options with expiries [1/12, 1/4, 1, 6, 12, 20] and strikes [0.25, 0.5, 0.75, 1, 1.25, 1.5, 1.75] × ATM, where ATM means the at-the-money strike. For each calibration instrument, we simulate a time series of 1000 observations with time step 1/365. Furthermore, in order to show how the calibration performance is affected by the noise, the noise level in this testing will vary from $10^{-10}$ to $10^{-8}$.

5.4.1. Parameterization

Before the calibration, selecting a proper parameter space is necessary. We call this step the “parameterization”. Not all the parameters in the model need to be calibrated, since scaling or shifting the processes may give an equivalent model. “Over-parameterization” means that more parameters are calibrated than needed. It makes the calibration algorithm unstable and inefficient.

There are 12 parameters in this model and 4 parameters in the short rate formula r(t) = c + (γ1, γ2, γ3)⊤X(t), according to the assumption of the affine model. We assume that the stochastic volatility impacts the short rate only through the volatility of X1 and X2; hence, without loss of generality, the short rate can be written as r(t) = X1(t) + X2(t). Moreover, since the short rate is the sum of the two Hull White processes, the drift term of the short rate $r_t$ is given by

(k2θ2 + k3θ3 − k2X1(t)− k3X2(t)) dt

Different values of θ2 and θ3 can lead to the same drift term of $r_t$. So we can further assume that the long-term means of these two processes are the same without losing any generality, namely θ2 = θ3. Note that the two factor Hull White model with stochastic volatility is of the form:

\[ dY(t) = b(\beta - Y(t))\,dt + \Sigma\sqrt{V(t)}\,dW(t). \]


An interchange between Σ and V(t) can result in the same underlying processes, and hence the same option prices. To avoid over-parameterization, we need to fix another parameter in the model. Note that exchanging the processes X1 and X2 leads to the same prices, so finally we choose to fix the volatility of X2, namely σ3.

Moreover, we assume that the noises in the prices are independent and at the same level (with the same variance ε). Since the covariances $R_k$ of the noises are also an input of the Kalman filter, we include the noise level in the parameter space as well and try to find the “optimal” noise level. Although in our testing we can easily check the calibration performance by comparing the estimated parameters with the “real” parameters, in application we do not know the “real” parameters. The (estimated) optimal noise level reflects how much noise there is in the observations under the estimated model. So in application we can judge the calibration performance through the estimated noise level: the lower the estimated noise level, the better the calibration result. So finally, we initially set the parameter space to be [k1; k2; k3; σ1; σ2; ρ1; ρ2; ρ3; θ1; θ2(= θ3); ε]. In the following subsections, we will find that the model is still over-parameterized if we only fix σ3.

5.4.2. Small Noise Testing

Table 5.1 shows the calibration results under the noise level $10^{-10}$. We call this noise level the small noise level. In this case, the calibration fails to replicate all the “real” parameters.

Table 5.1.: Calibration Results 1

parameter      real          estimated
k1             0.4000        0.4006
k2             0.3000        0.2097
k3             0.2100        0.3004
θ1             1.00          0.5872
θ2, θ3         0.0500        0.0500
σ1             0.200         0.1511
σ2             0.0080        0.0120
σ3 (fixed)     0.0100        0.0100
ρ1             -0.3000       -0.4425
ρ2             -0.5000       -0.6723
ρ3             -0.4000       -0.2921
ε              1e-10         8.9171e-11
likelihood     6.0986e+5     6.0844e+5

However, if we plot the real process X1 together with the estimated process X2, and the real process X2 together with the estimated process X1, we find that the real and the estimated processes match each other (Figure 5.4). This shows that in this case the algorithm cannot identify the two Hull White processes and exchanges them.

Figure 5.4.: The mix between the Hull White processes (panels: real X1 with estimated X2; real X2 with estimated X1)

Moreover, the low estimated noise level and the high calibration likelihood indicate that the calibrated model reproduces the observations very well. Figure 5.5 shows the mean and the maximum relative estimation error of the price time series for each calibration instrument. We find that for all 62 instruments, the means and the maximums of the relative estimation error are below 0.025%.

Figure 5.5.: The relative estimation error for all 62 calibration instruments (curves: mean, maximum)

It is also interesting to notice the peaks in this figure. All these peaks are the relative errors of the options with ATM strikes. The reason is that the prices of the options with ATM strike are much smaller than those of the other instruments, so these prices are more strongly influenced by the noise. As a consequence, the Kalman filter gives less precise estimates of them. Figure 5.6 shows that the relative estimation error is consistent with the relative noise in the simulated historical data.

Figure 5.6.: The relative noise in all 62 calibration instruments (curves: mean, maximum)

The exchange between the two Hull White risk factors and the good estimation of the observations indicate over-parameterization in the model. Different parameters in the diffusion part of the model can produce the same underlying processes even if the parameter σ3 is fixed. Note that the prices of the instruments depend on the short rate process $r_t$. If we look into the diffusion part of $r_t$:

\[ \sqrt{V(t)}\Big[ (\sigma_2\rho_1 + \sigma_3\rho_2)\,dW_1(t) + \big(\sigma_2\sqrt{1-\rho_1^2} + \sigma_3 a\big)\,dW_2(t) + \sigma_3\sqrt{1-\rho_2^2-a^2}\,dW_3(t) \Big] \tag{5.4.1} \]

there are three independent Brownian motions in the diffusion part. The three coefficients of the Brownian motions, denoted by B1, B2 and B3 respectively, together with the stochastic volatility V determine the diffusion part. However, even if σ3 is fixed, there are still 4 parameters (σ2, ρ1, ρ2, ρ3) that determine the three terms. So normalising only one diffusion parameter in the model is not enough to determine the model uniquely: different parameter settings can give the same short rate process and hence the same prices. Moreover, Table 5.1 shows that the estimates of θ1 and σ1 are wrong. Figure 5.7 also shows the wrong estimation of the stochastic volatility process.
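The three coefficients can be evaluated numerically. The sketch below takes B3 as the coefficient of dW3 in dX2, that is $\sigma_3\sqrt{1-\rho_2^2-a^2}$, since r = X1 + X2; the function names are hypothetical. Expanding the squares shows that $B_1^2 + B_2^2 + B_3^2 = \sigma_2^2 + \sigma_3^2 + 2\rho_3\sigma_2\sigma_3$, which for the “real” parameters of the test equals 0.0001.

```python
import math

def diffusion_terms(s2, s3, r1, r2, r3):
    """Coefficients B1, B2, B3 of dW1, dW2, dW3 in the diffusion of r = X1 + X2."""
    a = (r3 - r1 * r2) / math.sqrt(1.0 - r1 * r1)
    b1 = s2 * r1 + s3 * r2
    b2 = s2 * math.sqrt(1.0 - r1 * r1) + s3 * a
    b3 = s3 * math.sqrt(1.0 - r2 * r2 - a * a)
    return b1, b2, b3

def square_sum(s2, s3, r1, r2, r3):
    """B1^2 + B2^2 + B3^2, the variance rate of r per unit of V."""
    return sum(b * b for b in diffusion_terms(s2, s3, r1, r2, r3))

# "Real" parameters of the calibration test: sigma2, sigma3, rho1, rho2, rho3
total = square_sum(0.008, 0.01, -0.3, -0.5, -0.4)
```

Three coefficients are thus fed by four free parameters, and their square sum depends only on σ2, σ3 and ρ3.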

Figure 5.7.: The stochastic volatility process: real v.s. estimated

To resolve the over-parameterization in the correlations and the Hull White volatility parameters, one more parameter in the diffusion part of the Hull White processes should be normalised (or fixed). In the following example, based on the same simulated data and parameter setting as in the previous case, we further fix the correlation ρ1 = −0.3. Table 5.2 shows that the estimates are accurate and that the algorithm can identify the two Hull White processes.


Table 5.2.: Calibration Results 2

parameter      real          estimated by EKF
k1             0.4000        0.3999
k2             0.3000        0.3000
k3             0.2100        0.2100
θ1             1.00          0.9987
θ2, θ3         0.0500        0.0500
σ1             0.2000        0.1997
σ2             0.0080        0.0080
σ3 (fixed)     0.0100        0.0100
ρ1 (fixed)     -0.3000       -0.3000
ρ2             -0.5000       -0.5000
ρ3             -0.4000       -0.3988
ε              1e-10         8.8735e-11
likelihood     6.0986e+5     6.09323e+5

5.4.3. Higher Noise Testing

Although in the second case the estimation of the stochastic volatility is correct, the over-parameterization in the correlations and the volatilities of the Hull White processes cannot fully explain the wrong estimation of the volatility process in our first case. In order to understand this problem, we also test the calibration performance under higher noise levels: a “medium” noise level and a “large” noise level, with noise variances $10^{-9}$ and $10^{-8}$ respectively. The choice of the calibration instruments is the same as before. Furthermore, we fix σ3 = 0.01 and ρ1 = −0.3 and assume θ2 = θ3.

Table 5.3 shows the calibration results for the simulated data with medium noise level ($10^{-9}$). We can see that the estimates of the long-term mean of the stochastic volatility, the volatility of the stochastic volatility and the correlations differ from the real values.

If we compare the estimates of the underlying processes (Figure 5.8), the estimation of the Hull White processes is good, but again the estimation of the stochastic volatility is wrong.


Table 5.3.: Calibration Results (medium noise)

parameter      real          noise (10^{-9})
k1             0.4000        0.3971
k2             0.3000        0.3005
k3             0.2100        0.2098
θ1             1.00          0.9323
θ2, θ3         0.0500        0.0500
σ1             0.2000        0.1892
σ2             0.0080        0.0079
σ3 (fixed)     0.0100        0.0100
ρ1 (fixed)     -0.3000       -0.3000
ρ2             -0.5000       -0.5363
ρ3             -0.4000       -0.3393
ε              1e-9          6.8847e-10


Figure 5.8.: The underlying processes: real v.s. estimated (panels: V, X1, X2)


Figure 5.9 shows the average noise in the simulated data and the average estimation error of the prices for each instrument. We find that the average estimation error is smaller than the average noise. This shows that although the estimated parameters and the estimated stochastic volatility process differ from the real ones, the calibrated model still describes the observations well.

Figure 5.9.: The average noise in the data v.s. the average estimation error

If we increase the noise level to $10^{-8}$, we can see from Table 5.4 and Figure 5.10 that the calibration result is worse. The estimation errors of the long-term mean of the stochastic volatility, the volatility of volatility and the correlations are larger than in the previous case, and the estimated stochastic volatility process is further away from the real process.

Table 5.4.: Calibration Results (large noise)

parameter      real          noise (10^{-8})
k1             0.4000        0.4023
k2             0.3000        0.3006
k3             0.2100        0.2092
θ1             1.00          0.8519
θ2, θ3         0.0500        0.0500
σ1             0.2000        0.1719
σ2             0.0080        0.0078
σ3 (fixed)     0.0100        0.0100
ρ1 (fixed)     -0.3000       -0.3000
ρ2             -0.5000       -0.6122
ρ3             -0.4000       -0.2611
ε              1e-8          7.1539e-9


Figure 5.10.: The underlying processes: real v.s. estimated (panels: V, X1, X2)

A more interesting thing to notice is that in the higher noise cases and in the first case, the estimated stochastic volatility processes do have the same fluctuations as the real stochastic volatility process. Recall (5.4.1), the diffusion part of the short rate process: it depends on the three coefficients of the three Brownian motions (i.e. B1, B2, B3 as denoted before) and the stochastic volatility V. The variance of $r_{k+1} - r_k$ is $V_k (B_1^2 + B_2^2 + B_3^2)\delta$. So scaling the stochastic process V and dividing the square sum $B_1^2 + B_2^2 + B_3^2$ by the same number leads to the same distribution for the short rate process r. In our case, the real square sum of the three terms is 0.0001, and the estimated square sums are 0.00010838 and 0.00012043 under medium and large noise respectively. If we scale the estimated process (under the medium and large noise level) by the corresponding ratio between the estimated and the real square sum (1.0838 and 1.2043 respectively), we find that the estimated stochastic volatility processes match the real process well in both cases (see Figure 5.11). Moreover, by scaling the estimated stochastic volatility processes, we obtain the scaled estimates θ1 = 1.0150, σ1 = 19.70% and θ1 = 1.0260, σ1 = 18.86% under the medium and large noise level respectively. These scaled estimates have approximately the same values as the real parameters.

Figure 5.11.: The stochastic volatility process: real v.s. scaled estimation (panels: large noise, medium noise)

We also scale the estimated stochastic volatility process in our first case. In this case, the estimate of $B_1^2 + B_2^2 + B_3^2$ is 0.0001746, so the ratio is 1.746. Figure 5.12 shows that the scaled stochastic volatility process matches the real process. The scaled estimates θ1 = 1.0253 and σ1 = 19.97% also match the real values.
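The effect of the scaling on the parameters of V can be made explicit. If $V' = cV$ and V follows $dV = k_1(\theta_1 - V)dt + \sigma_1\sqrt{V}dW_1$, then $dV' = k_1(c\theta_1 - V')dt + \sqrt{c}\,\sigma_1\sqrt{V'}dW_1$, so the long-term mean scales by $c$ and the volatility of volatility by $\sqrt{c}$. The Python sketch below applies this rule to the rounded table values; the function name `rescale_cir` is hypothetical.

```python
import math

def rescale_cir(theta1, sigma1, c):
    """Parameters of V' = c*V when dV = k1(theta1 - V)dt + sigma1*sqrt(V)dW:
    long-term mean c*theta1 and volatility of volatility sqrt(c)*sigma1."""
    return c * theta1, math.sqrt(c) * sigma1

# Large noise case (Table 5.4), ratio 1.2043
theta_large, sigma_large = rescale_cir(0.8519, 0.1719, 1.2043)
# First (small noise) case (Table 5.1), ratio 1.746
theta_first, sigma_first = rescale_cir(0.5872, 0.1511, 1.746)
```

With these inputs the rule gives approximately (1.026, 0.1886) and (1.025, 0.1997), in line with the scaled estimates quoted in the text.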

Figure 5.12.: The stochastic volatility process: real v.s. scaled estimation

The results illustrate that the mismatch between the estimated and the real stochastic volatility process is due to the mismatch in the estimated correlations. Note that in the Kalman filter, the underlying process is estimated under the given model parameters. In the optimization procedure, given a wrong estimate of the correlations, the algorithm can adjust the short rate process to match the real one by scaling the stochastic volatility process, and thus still give a good estimation of the observations. We also observe that the estimation errors of the correlations and of the volatility process are positively correlated with the noise in the simulated data.

In all four cases, the estimated noise is lower than the real noise in the data. This suggests that the calibrated model can replicate the simulated data very well. It also indicates that the instrument prices have a high tolerance to the correlation parameter sets. By scaling the stochastic volatility process, a model with wrong correlations can produce prices that are close to the real prices. When the noise is small, the estimation error of the wrong model is larger than the noise level, so the algorithm can identify the wrong model. In the higher noise cases, the estimation error of the wrong model may be much smaller than the noise level; the wrong model is then good enough to describe the market data, and hence the algorithm mistakes the wrong model for the optimal one. As we can see from the calibration results, the estimated noise level (which reflects the estimation error) under the medium noise (6.8847e−10) is larger than the small noise level ($10^{-10}$), so the algorithm can detect the wrong estimation result under the small noise and give the correct calibration result. By contrast, in the large noise case, the estimated noise level (7.1539e−9) is smaller than the noise level ($10^{-8}$), so the wrong model is chosen as the “optimal” one.

The calibration results show that there is still freedom in the model parameters to obtain almost the same prices. In order to further improve the performance of the calibration, we need to understand where this freedom comes from. The first possible reason is over-parameterization. However, if the model were over-parameterized, the terms B1, B2 and B3 would have to be scaled at the same level to make sure the calibrated model is equivalent to the real model, which is not the case in our testing. So the freedom is not due to over-parameterization. The freedom actually comes from the model itself: different (but not equivalent) model parameters give almost the same prices. In this sense, using more complicated instruments (like swaptions) can reduce the chance that different model parameters give almost the same prices, and hence improve the calibration performance. It is also worth noting that in all the calibration examples, the “optimal” likelihood is less than the likelihood under the real parameters, i.e. the optimization procedure stopped before reaching the real optimal value. So to improve the calibration performance, developing a better optimization algorithm is also a good option.


6. Conclusion

The affine model in general is a very sophisticated model which can easily include stochastic volatility and interest rate spreads. The calibration of the affine model is therefore difficult. The calibration of a financial model consists of two parts: valuation of the calibration instruments and optimization. The most used instruments are zero rates, caplets (floorlets) and European-style swaptions. In Chapter 3 we discuss the valuation of caplets (floorlets) using Laplace transform inversion techniques. The pricing example shows that the pricing algorithm is stable and values the caplets (floorlets) correctly. The valuation of European-style swaptions is also presented in the Appendices. This algorithm can price swaptions correctly but is time consuming. That is the main reason why we do not use swaptions as calibration instruments in this thesis. Adding swaptions to the calibration instruments will improve the performance of the calibration, so developing an efficient way to compute swaption prices in the affine model is a main task for the future.

The optimization algorithm is introduced in Chapter 4 and some parts of Chapter 5. Thisalgorithm is based on extended Kalman filter. In Chapter 5 we also develop an algorithm tocompute the derivatives of caplet pricing function w.r.t. the underlying states. This algorithmcan compute the derivatives very efficiently and can also be used in delta hedging calculation.

In Chapter 5, we use the two factor Hull White model with stochastic volatility as an example to illustrate the performance of the extended Kalman filter in the calibration of affine models. From the results, we find that in order to identify the two Hull White processes, we need to normalise two parameters in the diffusion part of the model. The calibration works very well with data with small noise ($10^{-10}$). In the higher noise cases, the estimated stochastic volatility process is scaled away from the simulated stochastic volatility process due to the wrong estimation of the correlation parameters. The larger the noise, the larger the estimation error in the correlation parameters, which results in a more strongly scaled estimated volatility process to compensate for the difference between the real and the estimated variance $B_1^2 + B_2^2 + B_3^2$. This freedom of balancing between the stochastic volatility process and the correlations is due to the complexity of the model and the complexity of the calibration instruments. More complicated calibration instruments will lower the degree of freedom and give more accurate calibration results.

In this testing, we only simulate one set of observations at each noise level. To get more comprehensive knowledge about the calibration performance, we can simulate more sets of observations based on the same “real” parameters. Then for each set of observations, we run the calibration and get one set of estimated parameters. Finally, we collect all the estimated parameter sets, observe how the estimated parameters are distributed and compare them to the real parameters. Moreover, in our testing all the simulated data are based on the same “real” parameters, so we can also change the “real” parameters and test the calibration performance. However, the current calibration algorithm is too time consuming for such tests, so improving the efficiency of the calibration algorithm is strongly desired.

Regarding the performance of the calibration, one challenge is the convergence of the optimization procedure. In the testing, all the optimization procedures converge to a local maximum rather than the global maximum. Improving the convergence would be extremely helpful for the accuracy of the calibration, especially in the higher noise cases.

Another challenge of the algorithm, as stated before, is the computation time. One run of the calibration normally takes 20 hours up to 2 days, depending on the parameter setting, the boundary conditions and the calibration instruments. Such a long calculation time makes the algorithm hard to test. In the testing, we find that most of the running time is spent on matrix computations in the Laplace transform inversion and on solving the Riccati equations for the option pricing. Hence, in order to reduce the running time to an acceptable level, more efficient algorithms for the Laplace transform inversion and the Riccati equations are needed.


Popular summary

One important application of probability and stochastic theory is financial modelling. Financial modelling aims to build mathematical models which can describe the stochastic nature of the market and represent the financial underlying. Such financial models have wide applications, including asset pricing, modelling the term structure of interest rates and credit spreads, calculating risk exposures, etc.

In principle, financial modelling consists of four parts: proposing a mathematical model, derivative pricing under this model, calibration of the model, and applications of the model. These four parts are connected to each other. For example, the choice of a model depends not only on its own characteristics, but also on its possible applications and its performance in derivative pricing and calibration.

In this thesis, the attractive properties and wide application possibilities originally motivate the proposal of the affine model. We further develop an efficient caplet (floorlet) pricing algorithm and test the calibration performance for a specific affine model: the two factor Hull White model with stochastic volatility. The results show that the calibration algorithm gives correct estimates of the model parameters and the underlying processes for data with small noise. When the data contain relatively larger noise, the estimates of the model parameters are wrong. At the same time, the wrongly estimated model can reproduce the observations very well. This suggests that by balancing the value of the underlying process and the model parameters we can get almost the same prices for the instruments. Including swaptions in the calibration and developing a more advanced optimization algorithm are two possible ways to improve the calibration performance.


Bibliography

[1] Filipovic D., Term-Structure Models, A Graduate Course, Springer (2009).

[2] Brigo D. and Mercurio F., Interest Rate Models: Theory and Practice with Smile, Inflation and Credit, Springer (2nd ed. 2006).

[3] den Iseger P.W., Numerical Laplace Inversion Using Gaussian Quadrature, Probability in the Engineering and Informational Sciences 20 (2006), no. 1, 1-44.

[4] den Iseger P.W., Laplace Transform Inversion on the Entire Line, SSRN (2009).

[5] Conway J.B., A Course in Functional Analysis, Springer (2nd ed. 1990).

[6] Julier S.J. and Uhlmann J.K., Unscented Filtering and Nonlinear Estimation, Proceedings of the IEEE 92 (2004), no. 3.

[7] Wan E.A. and van der Merwe R., The Unscented Kalman Filter, in Kalman Filtering and Neural Networks, John Wiley & Sons, Inc., New York, USA (2001).

[8] Bishop G. and Welch G., An Introduction to the Kalman Filter, Chapel Hill, NC (2006).

[9] Duffie D., Filipovic D. and Schachermayer W., Affine Processes and Applications in Finance, Ann. Appl. Probab. 13 (2003), no. 3, 984–1053.

[10] https://en.wikipedia.org/wiki/QR_decomposition


Appendices


A. Proof of Lemma 3.2.1

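Proof. Note that for integers $m < n$,

\[
D^m\big(t^n(1-t)^n\big) = \sum_{k=0}^{m} (-1)^{m-k} C_m^k \,\frac{n!}{(n-k)!}\,\frac{n!}{(n-m+k)!}\, t^{n-k}(1-t)^{n-m+k}.
\]

Hence $D^m(t^n(1-t)^n)|_{t=0} = D^m(t^n(1-t)^n)|_{t=1} = 0$ for $m < n$. So by partial integration,

\[
\begin{aligned}
\int_0^1 e^{-st} l_n(t)\,dt
&= \Big( e^{-st}\,\frac{\sqrt{2n+1}}{n!}\, D^{n-1}\big(t^n(1-t)^n\big) \Big)\Big|_0^1
 + s \int_0^1 e^{-st}\,\frac{\sqrt{2n+1}}{n!}\, D^{n-1}\big(t^n(1-t)^n\big)\,dt \\
&= s \int_0^1 e^{-st}\,\frac{\sqrt{2n+1}}{n!}\, D^{n-1}\big(t^n(1-t)^n\big)\,dt \\
&\;\;\vdots \\
&= \frac{\sqrt{2n+1}}{n!}\, s^n \int_0^1 e^{-st}\, t^n (1-t)^n\,dt \\
&= \frac{\sqrt{2n+1}}{n!}\, s^n \sum_{k=0}^{n} (-1)^k C_n^k \int_0^1 e^{-st}\, t^{n+k}\,dt.
\end{aligned}
\]

Using partial integration again, one can easily show that

\[
\int_0^1 e^{-st}\, t^{n+k}\,dt = (-1)^{n+k+1} e^{-s}\,\frac{1}{s^{n+k+1}}\,(n+k)! + \frac{1}{s^{n+k+1}}\,(n+k)!.
\]

So we have

\[
\begin{aligned}
\hat{l}_n(s)
&= \sqrt{2n+1} \sum_{k=0}^{n} \frac{(n+k)!}{(n-k)!\,k!}\,(-1)^{n+1} e^{-s}\,\frac{1}{s^{k+1}}
 + \sqrt{2n+1} \sum_{k=0}^{n} \frac{(n+k)!}{(n-k)!\,k!}\,(-1)^{k}\,\frac{1}{s^{k+1}} \\
&= \frac{1}{s}\Big( p_n\big(\tfrac{1}{s}\big) - (-1)^n e^{-s}\, p_n\big(-\tfrac{1}{s}\big) \Big).
\end{aligned}
\]

Since $e^{-1/s} = e^{-i2\pi v}$ on the set $\big\{ \tfrac{1}{i2\pi(k+v)} : k \in \mathbb{Z} \big\}$, we obtain on this set

\[
q_n^v = \Psi \hat{l}_n.
\]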
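The closed form for the Laplace transform of $l_n$ can be checked numerically. The following is a minimal sketch, not the thesis implementation. It assumes that $l_n(t) = \sqrt{2n+1}\,P_n(1-2t)$ on $[0,1]$, where $P_n$ is the classical Legendre polynomial (this is what the Rodrigues-type expression used in the proof gives), and that $p_n(x) = \sqrt{2n+1}\sum_{k=0}^{n}\frac{(n+k)!}{(n-k)!\,k!}(-x)^k$, as read off from the last display of the proof. The function names and the test point `s` are hypothetical.

```python
import math
import numpy as np
from numpy.polynomial import legendre as leg


def p_n(n, x):
    """Polynomial p_n(x) = sqrt(2n+1) * sum_k (n+k)! / ((n-k)! k!) * (-x)^k (assumed form)."""
    total = 0.0
    for k in range(n + 1):
        coeff = math.factorial(n + k) / (math.factorial(n - k) * math.factorial(k))
        total += coeff * (-x) ** k
    return math.sqrt(2 * n + 1) * total


def laplace_closed_form(n, s):
    """Closed form (1/s) * (p_n(1/s) - (-1)^n e^{-s} p_n(-1/s)) from the lemma."""
    return (p_n(n, 1 / s) - (-1) ** n * np.exp(-s) * p_n(n, -1 / s)) / s


def laplace_by_quadrature(n, s, nodes=64):
    """Direct evaluation of int_0^1 e^{-st} l_n(t) dt by Gauss-Legendre quadrature."""
    x, w = leg.leggauss(nodes)           # nodes and weights on [-1, 1]
    t = 0.5 * (x + 1.0)                  # map the nodes to [0, 1]
    coeffs = np.zeros(n + 1)
    coeffs[n] = 1.0                      # coefficient vector selecting P_n
    l_n = math.sqrt(2 * n + 1) * leg.legval(1.0 - 2.0 * t, coeffs)
    return 0.5 * np.sum(w * np.exp(-s * t) * l_n)


# Compare both evaluations at an arbitrary complex point (hypothetical choice).
s = 2.0 + 3.0j
for n in range(5):
    gap = abs(laplace_closed_form(n, s) - laplace_by_quadrature(n, s))
    print(n, gap)
```

For small $n$ the two evaluations should agree up to rounding error. For large $n$ or small $|s|$ the closed form suffers from cancellation, so the comparison is only meant as a sanity check.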


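B. Proof of Theorem 3.2.2

Proof. By the definition of the inner product and the previous lemma, we obtain

\[
\langle \Psi\hat{f}, q_k^v \rangle_v = \langle \Psi\hat{f}, \Psi\hat{l}_k \rangle_v
= \sum_{m=-\infty}^{\infty} \hat{f}(\lambda_m^v)\,\overline{\hat{l}_k(\lambda_m^v)}.
\]

Denote $h_k(j) = \langle f, l_k(\cdot - j) \rangle$, then

\[
\begin{aligned}
\hat{h}_k(s)
&= \int_{\mathbb{R}} e^{-sj}\, h_k(j)\,dj \\
&= \int_{\mathbb{R}} \int_{j}^{j+1} e^{-sj} f(t)\, l_k(t-j)\,dt\,dj \\
&= \int_{\mathbb{R}} \int_{0}^{1} e^{-s(t-y)} f(t)\, l_k(y)\,dy\,dt \\
&= \int_{\mathbb{R}} e^{-st} f(t)\,dt \int_{0}^{1} e^{sy}\, l_k(y)\,dy \\
&= \hat{f}(s)\,\hat{l}_k(-s).
\end{aligned}
\]

It follows that

\[
\hat{h}_k(\lambda_m^v) = \hat{f}(\lambda_m^v)\,\hat{l}_k(-\lambda_m^v) = \hat{f}(\lambda_m^v)\,\overline{\hat{l}_k(\lambda_m^v)}.
\]

According to the Poisson summation formula (PSF), we obtain

\[
\begin{aligned}
\langle \Psi\hat{f}, q_k^v \rangle_v
&= \sum_{m=-\infty}^{\infty} \hat{h}_k(\lambda_m^v)
 = \sum_{j=-\infty}^{\infty} e^{-i2\pi j v}\, h_k(j) \\
&= \sum_{j=-\infty}^{\infty} e^{-i2\pi j v}\, \langle f, l_k(\cdot - j) \rangle.
\end{aligned}
\]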
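The Poisson summation step used in this proof, $\sum_{m}\hat{h}(i2\pi(m+v)) = \sum_{j} e^{-i2\pi j v}h(j)$, can be illustrated numerically. The sketch below is only an illustration under an assumption that is not in the thesis: it replaces $h_k$ by the Gaussian test function $h(t) = e^{-t^2/2}$, whose two-sided Laplace transform is $\hat{h}(s) = \sqrt{2\pi}\,e^{s^2/2}$. The function names and truncation limits are hypothetical.

```python
import cmath
import math


def h(t):
    """Gaussian test function (an assumption used only for this illustration)."""
    return math.exp(-0.5 * t * t)


def h_hat(s):
    """Two-sided Laplace transform of h: sqrt(2*pi) * exp(s^2 / 2)."""
    return math.sqrt(2.0 * math.pi) * cmath.exp(0.5 * s * s)


def transform_side(v, terms=20):
    """Left-hand side: sum over m of h_hat(i * 2*pi * (m + v)), truncated."""
    return sum(h_hat(2j * math.pi * (m + v)) for m in range(-terms, terms + 1))


def time_side(v, terms=40):
    """Right-hand side: sum over j of exp(-i * 2*pi * j * v) * h(j), truncated."""
    return sum(cmath.exp(-2j * math.pi * j * v) * h(j) for j in range(-terms, terms + 1))


v = 0.3
print(transform_side(v), time_side(v))
```

Both sums decay very quickly for this test function, so the truncation at a few dozen terms is harmless here. For the functions $h_k$ of the proof the time-domain sum is what the inversion algorithm exploits.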


C. Proof of Theorem 3.2.3

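Proof. By the previous lemma, we have

\[
\langle q_m^v, q_n^v \rangle_v = \langle \Psi\hat{l}_m, \Psi\hat{l}_n \rangle_v.
\]

Define $l_{m,n}(t) = \int_{-\infty}^{\infty} l_m(u)\, l_n(u-t)\,du$. As before, we obtain that

\[
\hat{l}_{m,n}(s) = \hat{l}_m(s)\,\hat{l}_n(-s).
\]

Hence

\[
\begin{aligned}
\langle \Psi\hat{l}_m, \Psi\hat{l}_n \rangle_v
&= \sum_{k=-\infty}^{\infty} \hat{l}_m(i2\pi(k+v))\,\overline{\hat{l}_n(i2\pi(k+v))} \\
&= \sum_{k=-\infty}^{\infty} \hat{l}_{m,n}(i2\pi(k+v)).
\end{aligned}
\]

Applying the PSF to $l_{m,n}$, we have

\[
\langle q_m^v, q_n^v \rangle_v = \langle \Psi\hat{l}_m, \Psi\hat{l}_n \rangle_v
= \sum_{k=-\infty}^{\infty} \hat{l}_{m,n}(i2\pi(k+v))
= \sum_{k=0}^{\infty} e^{-i2\pi k v}\, l_{m,n}(k).
\]

Note that by definition $l_{m,n}(t) = 0$ if $t \geq 1$, hence

\[
\langle q_m^v, q_n^v \rangle_v = l_{m,n}(0) = \int_{-1}^{0} l_m(-u)\, l_n(-u)\,du = \langle l_m, l_n \rangle.
\]

So $Q_v$ is an orthogonal set.

Define

\[
G(t) = \sum_{k=-\infty}^{\infty} \hat{f}(i2\pi(k+v))\, e^{-i2\pi(k+v)t}, \qquad t \in [0,1].
\]

Since $\int_0^1 e^{-i2\pi(k-l)t}\,dt = \delta_{kl}$, we obtain

\[
\begin{aligned}
\|G\|^2
&= \int_0^1 \Big( \sum_{k=-\infty}^{\infty} \hat{f}(i2\pi(k+v))\, e^{-i2\pi(k+v)t} \Big)
 \Big( \sum_{l=-\infty}^{\infty} \overline{\hat{f}(i2\pi(l+v))}\, e^{i2\pi(l+v)t} \Big)\,dt \\
&= \sum_{k=-\infty}^{\infty} \sum_{l=-\infty}^{\infty} \hat{f}(i2\pi(k+v))\,\overline{\hat{f}(i2\pi(l+v))} \int_0^1 e^{-i2\pi(k-l)t}\,dt \\
&= \sum_{k=-\infty}^{\infty} |\hat{f}(i2\pi(k+v))|^2 < \infty.
\end{aligned}
\]

So $G \in L^2([0,1])$. Since the Legendre polynomials are a complete orthogonal set in $L^2([0,1])$, we have

\[
\sum_{k=0}^{\infty} |\langle G, l_k \rangle|^2 = \|G\|^2 = \sum_{k=-\infty}^{\infty} |\hat{f}(i2\pi(k+v))|^2.
\]

On the other hand,

\[
\begin{aligned}
|\langle G, l_k \rangle|
&= \Big| \int_0^1 \sum_{j=-\infty}^{\infty} \hat{f}(i2\pi(j+v))\, e^{-i2\pi(j+v)t}\, l_k(t)\,dt \Big| \\
&= \Big| \sum_{j=-\infty}^{\infty} \hat{f}(i2\pi(j+v)) \int_0^1 e^{-i2\pi(j+v)t}\, l_k(t)\,dt \Big| \\
&= \Big| \sum_{j=-\infty}^{\infty} \hat{f}(i2\pi(j+v))\, \hat{l}_k(i2\pi(j+v)) \Big| \\
&= |\langle \Psi\hat{f}, \Psi\hat{l}_k \rangle_v|.
\end{aligned}
\]

So

\[
\sum_{k=0}^{\infty} |\langle G, l_k \rangle|^2
= \sum_{k=0}^{\infty} |\langle \Psi\hat{f}, \Psi\hat{l}_k \rangle_v|^2
= \sum_{k=0}^{\infty} |\langle \Psi\hat{f}, q_k^v \rangle_v|^2.
\]

Hence

\[
\sum_{k=0}^{\infty} |\langle \Psi\hat{f}, q_k^v \rangle_v|^2
= \sum_{k=-\infty}^{\infty} |\hat{f}(i2\pi(k+v))|^2 = \|\Psi\hat{f}\|_v^2.
\]

By Conway (1990, Thm. 4.13), we obtain that $Q_v$ is a complete orthogonal set in $L_v^2$.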


D. Numerical Algorithm for Swaption Pricing

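The payoff of a payer swaption at maturity $T_0$ is given by

\[
p(T_0) = \Big( 1 - P(T_0, T_n) - k\delta \sum_{i=1}^{n} P(T_0, T_i) \Big)^{+}.
\]

In the affine model, the bond prices are

\[
P(T_0, T_i) = e^{\phi(T_i - T_0, 0) + \psi^{\top}(T_i - T_0) X_{T_0}}.
\]

Denote $A_i = \phi(T_i - T_0, 0)$, $B_i = \psi^{\top}(T_i - T_0)$, $i \geq 0$, then

\[
p(T_0) = \Big( 1 - e^{A_n + B_n X_{T_0}} - k\delta \sum_{i=1}^{n} e^{A_i + B_i X_{T_0}} \Big)^{+}.
\]

Now assume the affine model is one-dimensional. Since $B_i < 0$ for $i > 0$, the left-hand side below is strictly increasing in $x$, so there exists a unique solution $x^*$ such that

\[
1 - e^{A_n + B_n x^*} - k\delta \sum_{i=1}^{n} e^{A_i + B_i x^*} = 0,
\]

and

\[
1 - e^{A_n + B_n x} - k\delta \sum_{i=1}^{n} e^{A_i + B_i x} > 0
\]

if and only if $x > x^*$. So

\[
\begin{aligned}
p(T_0)
&= \Big( 1 - e^{A_n + B_n X_{T_0}} - k\delta \sum_{i=1}^{n} e^{A_i + B_i X_{T_0}} \Big) \mathbf{1}_{\{X_{T_0} > x^*\}} \\
&= \big( e^{A_n + B_n x^*} - e^{A_n + B_n X_{T_0}} \big) \mathbf{1}_{\{X_{T_0} > x^*\}}
 + k\delta \sum_{i=1}^{n} \big( e^{A_i + B_i x^*} - e^{A_i + B_i X_{T_0}} \big) \mathbf{1}_{\{X_{T_0} > x^*\}} \\
&= e^{A_n + B_n x^*} \big( 1 - e^{B_n (X_{T_0} - x^*)} \big)^{+}
 + k\delta \sum_{i=1}^{n} e^{A_i + B_i x^*} \big( 1 - e^{B_i (X_{T_0} - x^*)} \big)^{+}.
\end{aligned}
\]

Denote $K_i = e^{A_i + B_i x^*}$. The swaption price at time $t$ is then given by

\[
\begin{aligned}
\mathrm{swapt}(t)
&= P(t, T_0)\, E[\, p(T_0) \mid \mathcal{F}_t \,] \\
&= P(t, T_0)\, K_n\, E\big[ \big( 1 - e^{-B_n (x^* - X_{T_0})} \big)^{+} \mid \mathcal{F}_t \big]
 + P(t, T_0)\, k\delta \sum_{i=1}^{n} K_i\, E\big[ \big( 1 - e^{-B_i (x^* - X_{T_0})} \big)^{+} \mid \mathcal{F}_t \big].
\end{aligned}
\]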


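Now, in order to compute the swaption price, we only need to compute the conditional expectations

\[
E\big[ \big( 1 - e^{-B_i (x^* - X_{T_0})} \big)^{+} \mid \mathcal{F}_t \big], \qquad i = 1, \cdots, n.
\]

Define the function $G_i(x) = (1 - e^{-B_i x})^{+}$, so that the payoff inside the expectation is $G_i(x^* - X_{T_0})$. The conditional expectations can then be written as convolutions $G_i * f_{X_{T_0} \mid X_t}$ evaluated at $x^*$. The Laplace transform of $G_i$ is

\[
\hat{G}_i(s) = -\frac{B_i}{(s + B_i)\,s},
\]

and the Laplace transform of $f_{X_{T_0} \mid X_t}$ can be easily computed. So, by applying the Laplace transform inversion algorithm, we can compute the swaption prices.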
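The decomposition of the swaption payoff into a sum of scaled option-type payoffs can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the thesis code: the coefficients `A` and `B` below are made-up Vasicek-like numbers with every $B_i < 0$, the root $x^*$ is found by plain bisection (any root finder would do), and all function names are hypothetical. List index `i` in the code corresponds to $i+1$ in the text.

```python
import math


def swap_residual(x, A, B, k, delta):
    """h(x) = 1 - exp(A_n + B_n x) - k*delta * sum_i exp(A_i + B_i x)."""
    n = len(A)
    coupons = sum(math.exp(A[i] + B[i] * x) for i in range(n))
    return 1.0 - math.exp(A[n - 1] + B[n - 1] * x) - k * delta * coupons


def find_x_star(A, B, k, delta, lo=-1.0, hi=1.0, tol=1e-13):
    """Unique root of h by bisection; h is increasing because every B_i < 0."""
    while swap_residual(lo, A, B, k, delta) > 0.0:   # widen until h(lo) <= 0
        lo *= 2.0
    while swap_residual(hi, A, B, k, delta) < 0.0:   # widen until h(hi) >= 0
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if swap_residual(mid, A, B, k, delta) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def payoff_direct(x, A, B, k, delta):
    """Swaption payoff (h(x))^+ evaluated directly."""
    return max(swap_residual(x, A, B, k, delta), 0.0)


def payoff_decomposed(x, x_star, A, B, k, delta):
    """K_n (1 - e^{B_n (x - x*)})^+ + k*delta * sum_i K_i (1 - e^{B_i (x - x*)})^+."""
    n = len(A)
    K = [math.exp(A[i] + B[i] * x_star) for i in range(n)]
    legs = [max(1.0 - math.exp(B[i] * (x - x_star)), 0.0) for i in range(n)]
    return K[n - 1] * legs[n - 1] + k * delta * sum(K[i] * legs[i] for i in range(n))


# Illustrative (hypothetical) inputs: semi-annual payments, four periods.
delta, k, a = 0.5, 0.03, 0.5
taus = [delta * (i + 1) for i in range(4)]
A = [-0.02 * tau for tau in taus]
B = [-(1.0 - math.exp(-a * tau)) / a for tau in taus]

x_star = find_x_star(A, B, k, delta)
for x in (-0.5, 0.0, x_star, 0.2, 1.0):
    print(x, payoff_direct(x, A, B, k, delta), payoff_decomposed(x, x_star, A, B, k, delta))
```

The two payoff columns should coincide for every state value, which is the identity used in the derivation above. Each leg can then be priced separately as a one-dimensional expectation.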

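For the higher-dimensional case, we need to use conditional expectations. For example, in the two-factor Hull-White model with stochastic volatility, denote the pricing formula of the swaption by $E[\, p(V_{T_0}, X^{(1)}_{T_0}, X^{(2)}_{T_0}) \mid \mathcal{F}_t \,]$. Then, using conditional expectation,

\[
\begin{aligned}
\mathrm{swapt}(t)
&= E\big[\, p(V_{T_0}, X^{(1)}_{T_0}, X^{(2)}_{T_0}) \mid \mathcal{F}_t \,\big] \\
&= E_{X^{(1)}_{T_0}, X^{(2)}_{T_0}}\Big[ E_{V_{T_0}}\big[\, p(V_{T_0}, X^{(1)}_{T_0}, X^{(2)}_{T_0}) \mid X^{(1)}_{T_0}, X^{(2)}_{T_0}, \mathcal{F}_t \,\big] \mid \mathcal{F}_t \Big] \\
&= E_{X^{(1)}_{T_0}, X^{(2)}_{T_0}}\big[\, g(X^{(1)}_{T_0}, X^{(2)}_{T_0}) \mid \mathcal{F}_t \,\big] \\
&\approx \sum_{i,j} w_i w_j\, g(\lambda_i, \lambda_j)\, f_{X^{(1)}_{T_0} \mid X_t}(\lambda_i)\, f_{X^{(2)}_{T_0} \mid X_t}(\lambda_j).
\end{aligned}
\]

The function $g$ is the swaption pricing function for the one-factor model, and we can compute its value according to the previous approach. In the last step we use Gaussian quadrature: the $\lambda_i, \lambda_j$ are the quadrature points and the $w_i, w_j$ are the weights. The functions $f_{X^{(1)}_{T_0} \mid X_t}$ and $f_{X^{(2)}_{T_0} \mid X_t}$ are the conditional density functions, which can be derived by inverting the Laplace transform as we do in option pricing.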
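The double quadrature sum in the last step can be sketched as follows. This is an illustration under assumptions that are not in the thesis: Gauss-Legendre nodes on truncated intervals are used as the quadrature rule, Gaussian densities stand in for the conditional densities (in the thesis these come from Laplace transform inversion), and a simple polynomial stands in for the one-factor pricing function $g$. All function names and numbers are hypothetical.

```python
import math
import numpy as np


def gauss_legendre_on(a, b, nodes):
    """Gauss-Legendre points and weights rescaled from [-1, 1] to [a, b]."""
    x, w = np.polynomial.legendre.leggauss(nodes)
    return 0.5 * (b - a) * x + 0.5 * (b + a), 0.5 * (b - a) * w


def normal_pdf(x, mean, std):
    """Gaussian density, used here only as a stand-in conditional density."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))


def expectation_2d(g, pdf1, pdf2, range1, range2, nodes=80):
    """Approximate sum_{i,j} w_i w_j g(lam_i, lam_j) pdf1(lam_i) pdf2(lam_j)."""
    lam1, w1 = gauss_legendre_on(*range1, nodes)
    lam2, w2 = gauss_legendre_on(*range2, nodes)
    values = g(lam1[:, None], lam2[None, :])      # g on the tensor grid
    return float((w1 * pdf1(lam1)) @ values @ (w2 * pdf2(lam2)))


# Stand-in inputs: g(x, y) = x^2 + y with known expectation m1^2 + s1^2 + m2.
m1, s1, m2, s2 = 0.02, 0.01, -0.01, 0.02
approx = expectation_2d(
    lambda x, y: x ** 2 + y,
    lambda x: normal_pdf(x, m1, s1),
    lambda y: normal_pdf(y, m2, s2),
    (m1 - 8 * s1, m1 + 8 * s1),
    (m2 - 8 * s2, m2 + 8 * s2),
)
print(approx, m1 ** 2 + s1 ** 2 + m2)
```

Writing the double sum as a vector-matrix-vector product keeps the cost at one evaluation of $g$ per grid point. In practice the integration ranges would have to be chosen from the inverted densities rather than from known means and standard deviations.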

