An Introductory Course in Stochastic Processes

Stéphane Crépey, University of Evry, France
(after Tomasz R. Bielecki, Illinois Institute of Technology, Chicago)

February 13, 2012


This is an introductory course in stochastic processes. Its purpose is to introduce students to a range of stochastic processes which are used as modeling tools in diverse fields of application, especially risk management applications in finance and insurance. In addition, students will be introduced to some basic stochastic analysis.

The course introduces the most fundamental ideas in the area of modeling and analysis of real-world phenomena in terms of stochastic processes. It covers different classes of Markov processes: discrete and continuous-time Markov chains, Brownian motion and diffusion processes. It also presents some aspects of stochastic calculus, with emphasis on applications to financial and insurance modeling, as well as financial engineering.

Main references

1. Introduction to Stochastic Processes, Gregory F. Lawler. Chapman & Hall, old version 1996 (new version 2004).
2. Elementary Stochastic Calculus with Finance in View, Thomas Mikosch. World Scientific, 1998 or later.
3. Stochastic Calculus for Finance II: Continuous-Time Models, Steven E. Shreve. Springer, 2004 or later.

Other references

4. Stochastic Differential Equations and Diffusion Processes, N. Ikeda and S. Watanabe. Second edition, North-Holland, 1989.
5. Stochastic Integration and Differential Equations, Philip E. Protter. Second edition, Springer, 2004.

Sections marked with a “∗” correspond to more advanced material that can be skipped at first reading.


Contents

Part I: Some classes of discrete-time stochastic processes

1 Discrete-time stochastic processes
  1.1 Conditional expectations and filtrations
    1.1.1 Main properties

2 Discrete-time Markov chains
  2.1 Motivation and construction
    2.1.1 An introductory example
    2.1.2 Definitions and examples
  2.2 Chapman-Kolmogorov equations
    2.2.1 Long-range behavior

3 Discrete-time martingales [Lawler, Chapter 5; Mikosch, Section 1.4; Shreve, Chapter 2 and Section 3.2]
  3.1 Definitions and examples
  3.2 Doob-Meyer decomposition
  3.3 Stopping times and optional stopping theorem
    3.3.1 Applications to random walks
  3.4 Uniform integrability and martingales
  3.5 Martingale convergence theorems

Part II: Some classes of continuous-time stochastic processes

4 Continuous-time stochastic processes
  4.1 Generalities
  4.2 Continuous-time martingales
    4.2.1 Optional Stopping Theorem

5 Continuous-time Markov chains [Lawler, Chapter 3]
  5.1 Poisson process [Shreve, Sections 11.2 and 11.3]
  5.2 Two-state continuous-time Markov chain
  5.3 Birth-and-death process [Lawler, Section 3.3]

6 Brownian motion [Lawler, Chapter 8; Mikosch, Section 1.3; Shreve, Sections 3.3–3.7]
  6.1 Definition and basic properties
    6.1.1 Random walk approximation
    6.1.2 Second order properties
  6.2 Markov properties
  6.3 Martingale methods
    6.3.1 Martingales associated with Brownian motion
    6.3.2 Exit time from a corridor
    6.3.3 Laplace transform of the first passage time of a drifted Brownian motion
  6.4 Geometric Brownian motion [see also Mikosch, Example 1.3.8]

Part III: Elements of stochastic analysis

7 Stochastic integration [Lawler, Chapter 9; Mikosch, Chapter 2; Shreve, Sections 4.2 and 4.3]
  7.1 Integration with respect to symmetric random walk
  7.2 The Itô stochastic integral for simple processes
  7.3 The general Itô stochastic integral
  7.4 Stochastic integral with respect to a Poisson process
  7.5 Semimartingale integration theory [See Protter]∗

8 Itô formula [Mikosch, Chapter 2; Shreve, Section 4.4]
  8.1 Introduction
    8.1.1 What about ∫₀ᵗ Ws dWs?
    8.1.2 What about ∫₀ᵗ Ns− dNs?
  8.2 Itô formulas for continuous processes
    8.2.1 Examples
  8.3 Itô formulas relative to jump processes [See Ikeda and Watanabe]∗
    8.3.1 Brackets

9 Stochastic differential equations (SDEs) [Mikosch, Chapter 3; Shreve, Section 6.2]
  9.1 Introduction
  9.2 Diffusions
    9.2.1 SDEs for diffusions
    9.2.2 Examples
  9.3 Solving diffusion SDEs
  9.4 SDEs driven by a Poisson process
  9.5 Jump-diffusions [See Ikeda and Watanabe]∗

10 Girsanov transformations
  10.1 Girsanov transformation relative to Gaussian distributions
    10.1.1 Gaussian random variables
    10.1.2 Brownian motion [Mikosch, Section 4.2; Shreve, Sections 1.6 and 5.2.1]
  10.2 Girsanov transformation relative to Poisson distributions
    10.2.1 Poisson random variables
    10.2.2 Poisson process
  10.3 Girsanov transformation relative to both Brownian motion and Poisson process
  10.4 Abstract Bayes formula

11 Feynman-Kac formulas∗
  11.1 Linear case
  11.2 Backward stochastic differential equations
    11.2.1 Non-linear Feynman-Kac formula
    11.2.2 Optimal stopping

Part I

Some classes of discrete-time stochastic processes

Chapter 1

Discrete-time stochastic processes

1.1 Conditional expectations and filtrations

In this section we discuss the notions of conditional expectations and filtrations, which are key in the study of stochastic processes.

Definition 1.1. Let X and ε1, . . . , εn be random variables. The conditional expectation E(X ∣ ε1, . . . , εn) is a random variable characterized by two properties:

1. The value of E(X ∣ ε1, . . . , εn) depends only on the values of ε1, . . . , εn, i.e., we can write E(X ∣ ε1, . . . , εn) = φ(ε1, . . . , εn) for some function φ. If a random variable can be written as a function of ε1, . . . , εn, it is called measurable with respect to ε1, . . . , εn.

2. Suppose A is any event that depends only on ε1, . . . , εn. Let IA denote the indicator function of A, i.e., the random variable which equals 1 if A occurs and 0 otherwise. Then

E(X IA) = E(E(X ∣ ε1, . . . , εn) IA).   (1.1)

Let (Ω, P) be the underlying probability space. That is, Ω is the set of elementary events ω and P is the probability defined on Ω. Every random variable considered here is a function of ω. Thus, the quantity E(X ∣ ε1, . . . , εn)(ω) is a value of E(X ∣ ε1, . . . , εn). Sometimes a slightly less formal, but a bit more convenient, notation is used: suppose ω ∈ Ω is such that εi(ω) = xi, i = 1, 2, . . . , n. Then the notation E(X ∣ ε1, . . . , εn)(x1, x2, . . . , xn) or E(X ∣ ε1 = x1, ε2 = x2, . . . , εn = xn) is used in place of E(X ∣ ε1, . . . , εn)(ω). Likewise, for the value of the indicator random variable IA, where A ⊂ Ω, the notation Iε(A)(x1, x2, . . . , xn) is used instead of IA(ω), with the understanding that ε(A) = {(ε1(ω), ε2(ω), . . . , εn(ω)), ω ∈ A}.

Lawler p.85-87 (p.101-106) shows how to compute conditional expectations using joint and marginal distributions of random variables.

Example 1.2. We illustrate the equality (1.1) with an example in which n = 1. Suppose that X and ε are discrete random variables and A is an event which involves ε. (For concreteness you may think of ε as the value of the first roll and X as the sum of the two rolls of a pair of dice, and A = {ε ≤ 2}.) Then indeed we have, using the Bayes formula in the third line,

E(E(X∣ε)IA) = ∑_x E(X ∣ ε = x) Iε(A)(x) P(ε = x)

  = ∑_x [∑_y y P(X = y ∣ ε = x)] Iε(A)(x) P(ε = x)

  = ∑_y y ∑_x Iε(A)(x) P(X = y, ε = x)

  = ∑_{y,x} (y Iε(A)(x)) P(X = y, ε = x) = E(X IA).

For the example involving dice, taking ε(A) = {x ≤ 2},

E(X IA) = ∑_{y,x} y I{x≤2} P(X = y, ε = x)

  = ∑_y y P(X = y, ε = 1) + ∑_y y P(X = y, ε = 2)

  = (1/36)(2 + 3 + 4 + 5 + 6 + 7) + (1/36)(3 + 4 + 5 + 6 + 7 + 8)

  = 27/36 + 33/36 = 5/3,

and

E[E(X∣ε)IA] = E[(ε + 3.5) I{ε≤2}]

  = ∑_{x=1}^{2} x P(ε = x) + 3.5 P(ε ≤ 2) = 3/6 + 3.5 · (1/3) = 5/3.
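This equality can also be checked by brute force. The sketch below (plain Python with exact rational arithmetic; the helper names are ours, not part of the notes) enumerates the 36 equally likely outcomes of the two dice and confirms that both sides of (1.1) equal 5/3.

```python
from fractions import Fraction

# Enumerate the 36 equally likely outcomes of two dice.
# eps = first roll, X = sum of the two rolls, A = {eps <= 2}.
outcomes = [(e, e + f) for e in range(1, 7) for f in range(1, 7)]
p = Fraction(1, 36)  # probability of each outcome

# Left-hand side of (1.1): E(X I_A).
lhs = sum(p * x for e, x in outcomes if e <= 2)

# Right-hand side: E(E(X | eps) I_A), where E(X | eps = e) is computed
# by averaging X over the outcomes consistent with eps = e.
def cond_exp(e):
    vals = [x for ee, x in outcomes if ee == e]
    return Fraction(sum(vals), len(vals))

rhs = sum(Fraction(1, 6) * cond_exp(e) for e in range(1, 3))

print(lhs, rhs)  # both equal 5/3
```

Note that cond_exp(1) = 4.5 and cond_exp(2) = 5.5, i.e., E(X ∣ ε = x) = x + 3.5, as used below.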

It will be convenient to make the notation more compact. If ε1, ε2, . . . is a sequence of random variables, we will use ℱn to denote the information contained in ε1, . . . , εn, and we will write E(X∣ℱn) for E(X ∣ ε1, . . . , εn). We have ℱn ⊂ ℱm if 1 ≤ n ≤ m. This is because the collection ε1, . . . , εn of random variables contains no more information than ε1, . . . , εn, . . . , εm.

Mikosch, section 1.4.2, defines the information carried by random variables ε1, . . . , εn in terms of the associated σ-field σ(ε1, . . . , εn). Thus, ℱn = σ(ε1, . . . , εn) [we say that ℱn is the σ-field generated by ε1, . . . , εn]. As we already noted, we have that

ℱn ⊂ ℱm if 1 ≤ n ≤ m.

A collection ℱn, n = 1, 2, 3, . . . , of σ-fields satisfying the above property is called a filtration. [We similarly define a filtration ℱt, t ≥ 0, for a continuous time index t, and the conditional expectations with respect to such filtrations. This will be needed later.]

1.1.1 Main properties

0. Conditional expectation is a linear operation: if a, b are constants,

E(aX1 + bX2∣ℱn) = aE(X1∣ℱn) + bE(X2∣ℱn).

Page 11: An Introductory Course in Stochastic Processes

11

1. The following property follows from (1.1) if the event A is the entire sample space, so that IA = 1:

E(E(X∣ℱn)) = E(X).

1’. [Tower rule] If m ≤ n, then

E(E(X∣ℱn)∣ℱm) = E(X∣ℱm).

2. If X is measurable with respect to [is a function of] ε1, . . . , εn, then

E(X∣ℱn) = X.

2’. If Z is measurable with respect to ε1, . . . , εn, then for any random variable X

E(ZX∣ℱn) = Z E(X∣ℱn).

3. If X is independent of ε1, . . . , εn, then

E(X∣ℱn) = E(X).

3’. [See Mikosch, section 1.4.4, rule 7.] If X is independent of ε1, . . . , εn and Z is measurable with respect to ε1, . . . , εn, then for every function φ = φ(y, z)

E(φ(X, Z)∣ℱn) = Êφ(X, Z),

where Êφ(X, Z) means that we “freeze” Z and take the expectation with respect to X, so Êφ(X, Z) = Eφ(X, z)∣z=Z.

4. [Projection property of the conditional expectation; see Mikosch, section 1.4.5.] Let X be a random variable with EX² < ∞. The conditional expectation E(X∣ℱn) is the random variable in L²(ℱn) which is closest to X in the mean-square sense, so

E[X − E(X∣ℱn)]² = min_{Z∈L²(ℱn)} E(X − Z)².

Example of the verification of the tower rule. Let X = ε1 + ε2 + ε3, where εi is the outcome of the i-th toss of a fair coin, so that P(εi = 1) = P(εi = 0) = 1/2, and the εi’s are independent. Consider

E(E(X∣ℱ2)∣ℱ1) = E(E(X∣ε1, ε2)∣ε1) = E(ε1 + ε2 + Eε3 ∣ ε1) = ε1 + Eε2 + Eε3 = ε1 + 1,

and

E(X∣ε1) = ε1 + E(ε2 + ε3) = ε1 + 1/2 + 1/2 = ε1 + 1.
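The tower rule can likewise be verified exhaustively on this coin example. A small Python sketch (the helper names are ours) enumerates the eight equally likely outcomes:

```python
from fractions import Fraction
from itertools import product

half = Fraction(1, 2)

# All 8 equally likely outcomes (e1, e2, e3) of three fair {0,1} coin tosses.
paths = list(product((0, 1), repeat=3))

def cond_exp_X_given(k, prefix):
    """E(e1+e2+e3 | first k tosses equal `prefix`), averaging over the rest."""
    vals = [sum(p) for p in paths if p[:k] == prefix]
    return Fraction(sum(vals), len(vals))

# E( E(X | F_2) | F_1 ) as a function of e1: average E(X | e1, e2) over e2.
def iterated(e1):
    return half * cond_exp_X_given(2, (e1, 0)) + half * cond_exp_X_given(2, (e1, 1))

for e1 in (0, 1):
    # Both computations give e1 + 1, matching the display above.
    assert iterated(e1) == cond_exp_X_given(1, (e1,)) == e1 + 1
print("tower rule verified: E(E(X|F2)|F1) = E(X|F1) = e1 + 1")
```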


Chapter 2

Discrete-time Markov chains

2.1 Motivation and construction

2.1.1 An introductory example

Suppose that Rn denotes the short-term interest rate prevailing on day n (n ≥ 0). Suppose also that the rate Rn is a random variable which may only take two values: Low (L) and High (H), for every n. [We call the possible values of Rn the states.] Thus, we consider a random sequence Rn, n = 0, 1, 2, . . . . Sequences like this are frequently called discrete-time stochastic processes.

Next suppose that we have the following information available about the conditional probabilities:

P(Rn = jn ∣ R0 = j0, R1 = j1, . . . , Rn−2 = jn−2, Rn−1 = jn−1)
  = P(Rn = jn ∣ Rn−2 = jn−2, Rn−1 = jn−1)   (2.1)

for every n ≥ 2 and every sequence of states (j0, j1, . . . , jn−1, jn), and

P(Rn = jn ∣ R0 = j0, R1 = j1, . . . , Rn−2 = jn−2, Rn−1 = jn−1)
  ≠ P(Rn = jn ∣ Rn−1 = jn−1)   (2.2)

for some n ≥ 1 and some sequence of states (j0, j1, . . . , jn−1, jn).

In other words, today’s interest rate depends on the entire history of past interest rates only through the rates prevailing on the two immediately preceding days [this is condition (2.1) above]. But the information contained in these two values may sometimes affect today’s conditional distribution of the interest rate in a different way than the information provided by yesterday’s value alone [this is condition (2.2) above].

The type of stochastic dependence subject to condition (2.2) is not of the Markovian type. [It will soon be clear what we mean by the Markovian type of dependence.] However, due to condition (2.1), the stochastic process Rn, n = 0, 1, 2, . . . can be “enlarged” [or augmented] to a so-called Markov chain that will exhibit the Markovian type of dependence.

To see this, let us see what happens when we create a new stochastic process Xn, n = 0, 1, 2, . . . , by enlarging the state space of the original sequence Rn, n = 0, 1, 2, . . . . Towards this end, let us define

Xn = (Rn, Rn+1).


Observe that the state space for the sequence Xn, n = 0, 1, 2, . . . contains four elements: (L,L), (L,H), (H,L) and (H,H). We shall now examine the conditional probabilities for the sequence Xn, n = 0, 1, 2, . . . :

P(Xn = in ∣ X0 = i0, X1 = i1, . . . , Xn−2 = in−2, Xn−1 = in−1)

  = P(Rn+1 = jn+1, Rn = jn ∣ R0 = j0, R1 = j1, . . . , Rn−1 = jn−1, Rn = jn)

  = P(Rn+1 = jn+1 ∣ R0 = j0, R1 = j1, . . . , Rn−1 = jn−1, Rn = jn)

  = P(Rn+1 = jn+1 ∣ Rn−1 = jn−1, Rn = jn)   [by condition (2.1)]

  = P(Rn+1 = jn+1, Rn = jn ∣ Rn−1 = jn−1, Rn = jn)

  = P(Xn = in ∣ Xn−1 = in−1)

for every n ≥ 1 and every sequence of states (i0, i1, . . . , in−1, in), where ik = (jk, jk+1). We see that the enlarged sequence Xn exhibits the so-called Markov property.
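The enlargement argument can be tested numerically. In the Python sketch below, the initial law of (R0, R1) and the second-order transition probabilities are hypothetical values chosen purely for illustration (they are not the ones of Homework 1); the script computes the exact law of (R0, . . . , R4) and checks that Xn = (Rn, Rn+1) satisfies the Markov property.

```python
from itertools import product
from fractions import Fraction

S = "LH"
# Hypothetical ingredients (illustration only):
init = {("L", "L"): Fraction(1, 4), ("L", "H"): Fraction(1, 4),
        ("H", "L"): Fraction(1, 4), ("H", "H"): Fraction(1, 4)}  # law of (R0, R1)

def k(prev2, prev1):  # assumed P(R_{n+2} = H | R_{n+1} = prev1, R_n = prev2)
    table = {("L", "L"): Fraction(1, 5), ("L", "H"): Fraction(2, 5),
             ("H", "L"): Fraction(3, 5), ("H", "H"): Fraction(4, 5)}
    return table[(prev2, prev1)]

# Exact law of the path (R0, ..., R4).
law = {}
for path in product(S, repeat=5):
    p = init[path[:2]]
    for n in range(3):
        pH = k(path[n], path[n + 1])
        p *= pH if path[n + 2] == "H" else 1 - pH
    law[path] = p

def prob(event):  # P(event), where event: path -> bool
    return sum(p for path, p in law.items() if event(path))

X = lambda path, n: (path[n], path[n + 1])  # X_n = (R_n, R_{n+1})

# Markov check: P(X3 | X0, X1, X2) depends on the history only through X2.
for hist in product(S, repeat=4):           # candidate values of (R0, ..., R3)
    for x3 in product(S, repeat=2):
        if x3[0] != hist[3]:
            continue                        # X3 must be consistent with X2
        joint = prob(lambda q: q[:4] == hist and X(q, 3) == x3)
        ph = prob(lambda q: q[:4] == hist)
        pjoint2 = prob(lambda q: X(q, 2) == (hist[2], hist[3]) and X(q, 3) == x3)
        px2 = prob(lambda q: X(q, 2) == (hist[2], hist[3]))
        # Cross-multiplied form of P(X3 | X0..X2) == P(X3 | X2), exact arithmetic.
        assert joint * px2 == pjoint2 * ph
print("enlarged process is Markov")
```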

2.1.2 Definitions and examples

Definition 2.1. A random sequence Xn, n = 0, 1, 2, . . . , where Xn takes values in the set S, is called a Markov chain with the (discrete: finite or countable) state space S if it satisfies the Markov property:

P (Xn = in∣X0 = i0, X1 = i1, . . . , Xn−2 = in−2, Xn−1 = in−1)

= P (Xn = in∣Xn−1 = in−1)

for every n ≥ 1 and for every sequence of states (i0, i1, . . . , in−1, in) from the set S.

Every discrete-time stochastic process satisfies the following property [given that the conditional probabilities are well defined; property (1.2) in Lawler, p.7 (p.9)]:

P(X0 = i0, X1 = i1, . . . , Xn−1 = in−1, Xn = in)
  = P(X0 = i0) P(X1 = i1 ∣ X0 = i0) × · · · × P(Xn = in ∣ X0 = i0, X1 = i1, . . . , Xn−2 = in−2, Xn−1 = in−1).

Not every random sequence satisfies the Markov property. Sometimes a random sequence which is not a Markov chain can be transformed into a Markov chain by means of an enlargement of the state space.

Definition 2.2. A random sequence Xn, n = 0, 1, 2, . . . , where Xn takes values in the set S, is called a time-homogeneous Markov chain with the state space S if it satisfies the Markov property of Definition 2.1 and, in addition,

P(Xn = in ∣ Xn−1 = in−1) = q(in−1, in)   (2.3)

for every n ≥ 1 and every two states in−1, in from the set S, where q : S × S → [0, 1] is some given function.

We shall only study time-homogeneous Markov chains. A time-inhomogeneous Markov chain can be transformed into a time-homogeneous one by including the time variable in the state vector.


Definition 2.3. The (possibly infinite) matrix Q = [q(i, j)]i,j∈S is called the (one-step) transition matrix for the Markov chain Xn.

The transition matrix of a Markov chain Xn is a stochastic matrix. That is, its rows can be interpreted as probability distributions [with non-negative entries summing up to unity, see (1.4) and (1.5) in Lawler]. To every pair (π0, Q), where π0 = (π0(i))i∈S is an initial probability distribution on S and Q is a stochastic matrix, there corresponds some Markov chain with the state space S. Such a chain can be constructed via the formula [Equation (1.3) in Lawler, p.8]

P(X0 = i0, X1 = i1, . . . , Xn−1 = in−1, Xn = in) = π0(i0) q(i0, i1) · · · q(in−1, in).

In other words, the initial distribution π0 and the transition matrix Q determine a Markov chain completely by determining its finite-dimensional distributions.

Remark 2.4. There is an obvious analogy with a difference equation:

xn+1 = axn, n = 0, 1, 2, . . . ,

x0 = x.

The solution path {x0, x1, x2, . . . } is uniquely determined by the initial condition x and the transition rule a.

Example 2.5. Let εn, n = 1, 2, . . . be i.i.d. (independent, identically distributed) random variables such that P(εn = −1) = p, P(εn = 1) = q = 1 − p. Define X0 = 0 and, for n = 1, 2, 3, . . . ,

Xn = Xn−1 + εn.

The process Xn, n = 0, 1, 2, . . . is a time-homogeneous Markov chain on S = {. . . , −2, −1, 0, 1, 2, . . . }, the set of all integers, and the corresponding transition matrix Q is given by

q(i, i + 1) = 1 − p,  q(i, i − 1) = p,  i = 0, ±1, ±2, . . . .

This is a random walk (on the integer lattice) starting at zero. If p = 1/2, the walk is called symmetric.

Example 2.6. Let εn, n = 1, 2, . . . be i.i.d. random variables such that P(εn = −1) = p, P(εn = 1) = q = 1 − p. Define X0 = 0 and, for n = 1, 2, 3, . . . ,

Xn = −N          if Xn−1 = −N,
Xn = Xn−1 + εn   if −N < Xn−1 < N,
Xn = N           if Xn−1 = N.

The process Xn, n = 0, 1, 2, . . . is a time-homogeneous Markov chain on S = {−N, −N + 1, . . . , −1, 0, 1, . . . , N − 1, N}, and the corresponding transition matrix Q is given by

q(i, i + 1) = 1 − p,  q(i, i − 1) = p,  −N < i < N,

q(−N, −N) = q(N, N) = 1.

This is a random walk starting at zero with absorbing boundaries at −N and N. If p = 1/2, the walk is called symmetric.
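As an illustration, the transition matrix of this absorbing walk can be assembled explicitly and checked to be stochastic. In the Python sketch below, N = 3 and p = 1/3 are arbitrary choices, and the indexing convention is ours:

```python
from fractions import Fraction

def absorbing_walk_Q(N, p):
    """Transition matrix of the random walk on {-N, ..., N} with absorbing
    boundaries; row/column k corresponds to state k - N."""
    p = Fraction(p)
    size = 2 * N + 1
    Q = [[Fraction(0)] * size for _ in range(size)]
    Q[0][0] = Q[size - 1][size - 1] = Fraction(1)  # q(-N,-N) = q(N,N) = 1
    for k in range(1, size - 1):                   # interior states
        Q[k][k - 1] = p                            # step down with probability p
        Q[k][k + 1] = 1 - p                        # step up with probability q = 1 - p
    return Q

Q = absorbing_walk_Q(N=3, p=Fraction(1, 3))
assert all(sum(row) == 1 for row in Q)             # stochastic matrix: rows sum to 1
print(len(Q), "states; all rows sum to 1")
```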


Example 2.7. Let εn, n = 1, 2, . . . be i.i.d. random variables such that P(εn = −1) = p, P(εn = 1) = q = 1 − p. Define X0 = 0 and, for n = 1, 2, 3, . . . ,

Xn = −N + 1       if Xn−1 = −N,
Xn = Xn−1 + εn    if −N < Xn−1 < N,
Xn = N − 1        if Xn−1 = N.

The process Xn, n = 0, 1, 2, . . . is a time-homogeneous Markov chain on S = {−N, −N + 1, . . . , −1, 0, 1, . . . , N − 1, N}, and the corresponding transition matrix Q is given by

q(i, i + 1) = 1 − p,  q(i, i − 1) = p,  −N < i < N,

q(−N, −N + 1) = q(N, N − 1) = 1.

This is a random walk starting at zero with reflecting boundaries at −N and N. If p = 1/2, the walk is called symmetric.

Example 2.8. Let εn, n = 0, 1, 2, . . . be i.i.d. random variables such that P(εn = −1) = p, P(εn = 1) = q = 1 − p. Then the stochastic process Xn = εn, n = 0, 1, 2, . . . is a time-homogeneous Markov chain on S = {−1, 1}, and the corresponding transition matrix, with rows and columns indexed by the states −1, 1 (in that order), is

Q = ( p  q )
    ( p  q ).

Here of course

q(i, −1) = P(Xn = −1 ∣ Xn−1 = i) = P(Xn = −1)

for i = −1, 1, and, likewise,

q(i, 1) = P(Xn = 1 ∣ Xn−1 = i) = P(Xn = 1)

for i = −1, 1.

2.2 Chapman-Kolmogorov equations

Definition 2.9. Given any two states i, j ∈ S, the n-step transition probability qn(i, j) is defined as

qn(i, j) = P (Xn = j∣X0 = i)

for every n ≥ 0. We define the n-step transition matrix Qn as

Qn = [qn(i, j)]i,j∈S .

Proposition 2.1. We have
(i) q1(i, j) = q(i, j) and

q0(i, j) = 1 if i = j,  q0(i, j) = 0 if i ≠ j,

and thus Q1 = Q and Q0 = I (the identity matrix).

(ii) qn(i, j) = P (Xk+n = j∣Xk = i) for each k ≥ 0.


Proof. Part (i) is obvious, and part (ii) holds since we only consider time-homogeneous Markov chains. In fact, for n = 2 we have

P(Xk+2 = j ∣ Xk = i) = ∑_{r∈S} P(Xk+2 = j, Xk+1 = r ∣ Xk = i)

= ∑_{r∈S} [P(Xk+2 = j, Xk+1 = r, Xk = i) / P(Xk+1 = r, Xk = i)] · [P(Xk+1 = r, Xk = i) / P(Xk = i)]

= ∑_{r∈S} P(Xk+2 = j ∣ Xk+1 = r, Xk = i) P(Xk+1 = r ∣ Xk = i)

= ∑_{r∈S} P(Xk+2 = j ∣ Xk+1 = r) P(Xk+1 = r ∣ Xk = i)   [by the Markov property]

= ∑_{r∈S} P(X2 = j ∣ X1 = r) P(X1 = r ∣ X0 = i)   [by time homogeneity]

= ∑_{r∈S} P(X2 = j ∣ X1 = r, X0 = i) P(X1 = r ∣ X0 = i)

= ∑_{r∈S} P(X2 = j, X1 = r ∣ X0 = i)

= P(X2 = j ∣ X0 = i) = q2(i, j).

A similar argument may be used for an arbitrary n ≥ 1. □

Proposition 2.2. The following representation for the n-step transition matrix holds:

Qn = Qⁿ

for every n ≥ 0. [Recall: by definition we have Q⁰ = I.]

Proof. The proof is done by induction. See Lawler, p.11 (p.13). □

Corollary 2.3. The Chapman-Kolmogorov equation is satisfied:

Qm+n = QmQn = QnQm

for every m, n ≥ 0. Or, equivalently,

qm+n(i, j) = ∑_{k∈S} qm(i, k) qn(k, j) = ∑_{k∈S} qn(i, k) qm(k, j)

for every m, n ≥ 0, and every i, j ∈ S.
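The Chapman-Kolmogorov equation is easy to verify numerically. The Python sketch below uses a two-state stochastic matrix with arbitrary illustrative entries (not taken from the text) and checks Qm+n = Qm Qn entrywise, in exact arithmetic:

```python
from fractions import Fraction

def matmul(A, B):
    """Multiply two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def power(Q, n):
    """n-th matrix power; Q^0 is the identity, matching Q_0 = I."""
    R = [[Fraction(int(i == j)) for j in range(len(Q))] for i in range(len(Q))]
    for _ in range(n):
        R = matmul(R, Q)
    return R

# A two-state stochastic matrix with illustrative entries.
Q = [[Fraction(3, 4), Fraction(1, 4)],
     [Fraction(1, 3), Fraction(2, 3)]]

for m in range(4):
    for n in range(4):
        # Chapman-Kolmogorov: Q_{m+n} = Q_m Q_n, entry by entry.
        assert power(Q, m + n) == matmul(power(Q, m), power(Q, n))
print("Chapman-Kolmogorov verified for m, n = 0, ..., 3")
```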


Proof. Qm+n = Qᵐ⁺ⁿ = Qᵐ Qⁿ = Qm Qn. □

The Chapman-Kolmogorov equation provides the basis for the first step analysis:

Qn+1 = QQn. (2.4)

We shall see applications later. The last step analysis would be Qn+1 = Qn Q. Observe that Equation (2.4) and the Chapman-Kolmogorov equations are equivalent. Equation (2.4) can also be written as

ΔQn+1 = AQn = QnA,

where ΔQn+1 = Qn+1 − Qn and A = Q − I. Note that the diagonal elements of A are nonpositive and that its rows add up to 0. The matrix A is called the generator of any Markov chain associated with Q.

Definition 2.10. The (unconditional) n-step probabilities πn(i) are defined as

πn(i) = P(Xn = i)

for every n ≥ 0. In particular, π0(i) = P(X0 = i) (the initial probabilities).

We shall use the notation πn = [πn(i)]i∈S. This is a (possibly infinite) row vector representing the distribution of the states of the Markov chain at time n.

Proposition 2.4. We have

πn = π0 Qⁿ

for every n ≥ 0.

Proof. It is straightforward:

P(Xn = j) = ∑_{i∈S} P(X0 = i) P(Xn = j ∣ X0 = i) = ∑_{i∈S} π0(i) qn(i, j).

We already know that the n-step transition probability qn(i, j) is the (i, j) entry of the matrix Qⁿ. □

A recursive equation for the n-step transition probabilities [that is, for the conditional probabilities P(Xn = j ∣ X0 = i)] is

Qn+1 = Qn Q,  n = 0, 1, 2, . . . ,

with the initial condition Q0 = I.

A recursive equation for the unconditional probabilities P(Xn = j) is

πn+1 = πn Q,  n = 0, 1, 2, . . . ,

with the initial condition π0 corresponding to the distribution of X0. See also Example 6 in Lawler, p.11 (p.13).


2.2.1 Long-range behavior

By the long-time behavior of a Markov chain we mean the behavior of the conditional probabilities Qn and the unconditional probabilities πn for large n. In view of the fact that πn = π0 Qn = π0 Qⁿ, this essentially boils down to the behavior of the powers Qⁿ of the transition matrix for large n.

Understanding the long-time behavior of Markov chains that model real systems is important for various applications in operations research and engineering [manufacturing, investment, scheduling, etc.].
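A numerical sketch of this long-range behavior (Python; the two-state transition matrix is again an illustrative choice of ours): iterating the recursion πn+1 = πn Q drives πn toward a vector π satisfying π = πQ, here (2/3, 1/3), whatever the initial distribution.

```python
# Two-state chain with illustrative transition probabilities.
Q = [[0.9, 0.1],
     [0.2, 0.8]]

def step(pi, Q):
    """One application of pi_{n+1} = pi_n Q (row vector times matrix)."""
    return [sum(pi[i] * Q[i][j] for i in range(len(pi))) for j in range(len(Q))]

pi = [1.0, 0.0]          # start from state 0 with probability one
for _ in range(200):
    pi = step(pi, Q)

# The limit satisfies pi = pi Q; solving 0.1 pi0 = 0.2 pi1 gives (2/3, 1/3).
assert all(abs(a - b) < 1e-12 for a, b in zip(pi, step(pi, Q)))
assert abs(pi[0] - 2/3) < 1e-9 and abs(pi[1] - 1/3) < 1e-9
print("stationary distribution ~", pi)
```

Starting instead from pi = [0.0, 1.0] produces the same limit, which is the point of the long-range analysis.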


Homework 1: Conditional expectations and discrete-time Markov chains

Recall our introductory example.

1. Supposing that

P(Rn+2 = L ∣ Rn+1 = H, Rn = H) = P(Rn+2 = H ∣ Rn+1 = L, Rn = L) = 1/8,
P(Rn+2 = L ∣ Rn+1 = L, Rn = H) = P(Rn+2 = H ∣ Rn+1 = H, Rn = L) = 1/2,
P(Rn+2 = H ∣ Rn+1 = L, Rn = H) = P(Rn+2 = L ∣ Rn+1 = H, Rn = L) = 1/2,
P(Rn+2 = L ∣ Rn+1 = L, Rn = L) = P(Rn+2 = H ∣ Rn+1 = H, Rn = H) = 7/8,

derive the transition matrix for the enlarged process Xn.

2. Assuming further P (R0 = L) = 1/3 and

P (R1 = L∣R0 = L) = P (R1 = H∣R0 = H) = 3/4,

what is the probability that the interest rates will be low for three consecutive days starting from day 0? from day 2?

3. Given that the initial interest rate is low, that is R0 = L, what is the conditionalprobability that R4 = H?

4. What is the probability that the interest rate will be high in the long range?


Chapter 3

Discrete-time martingales [Lawler, Chapter 5; Mikosch, Section 1.4; Shreve, Chapter 2 and Section 3.2]

3.1 Definitions and examples

Definition 3.1. A stochastic process Yn, n ≥ 0, is a martingale with respect to a filtration ℱn, n ≥ 0, if
(i)

E∣Yn∣ < ∞, for all n ≥ 0,

and
(ii)

E(Ym∣ℱn) = Yn, for all m ≥ n.   (3.1)

In this definition, as in the introductory section to this part, ℱn denotes the information contained in a sequence ε1, . . . , εn of random variables. The second condition of the definition implies that Yn is ℱn-measurable. The first condition ensures that the conditional expectations are well defined. When we say that Yn is a martingale without reference to ℱn, n ≥ 0, we understand that ℱn is the information contained in Y0, . . . , Yn, so εn = Yn.

In order to verify (3.1) it is enough to show that for all n

E(Yn+1∣ℱn) = Yn, (3.2)

since then by the tower rule

E(Yn+2∣ℱn) = E(E(Yn+2∣ℱn+1)∣ℱn) = E(Yn+1∣ℱn) = Yn,

and so on. We also note that for every n ≥ 0

E(Yn+1) = E[E(Yn+1∣ℱn)] = E(Yn),

so that a martingale is a process with a constant mean. Because of property (3.2), a martingale is thought of as a model of a fair game. A process that can be thought of as a model of a favorable (unfavorable) game is called a submartingale (supermartingale), as defined below.


Definition 3.2. A stochastic process Yn is a submartingale (supermartingale) with respect to (ℱn) if, for all n ≥ 0,

E∣Yn∣ <∞,

E(Ym∣ℱn) ≥ (≤)Yn, for all m ≥ n

Yn is ℱn-measurable.

We note that the last condition is automatically satisfied for a martingale. A process Yn such that Yn is measurable with respect to ℱn for every n ≥ 0 is called adapted to the filtration ℱ = (ℱn)n≥0. We shall normally consider adapted processes only.

Example 3.3. Let the εi be i.i.d. random variables with mean μ. Let S0 = S̄0 = 0 and for n > 0 let

Sn = ε1 + . . . + εn,  S̄n = Sn − nμ.

[This is Example 1 in Lawler, p.90 (p.107).] Are all the processes εn, Sn and S̄n Markov chains? Are all these processes martingales with respect to the filtration ℱn if μ ≠ 0? What is the answer to the preceding question when μ = 0?

Example 3.4. [Compare with Example 3, Section 5.2 in Lawler.] Consider a gambler who is playing a sequence of independent games, in each of which he wins one with probability p or loses one with probability 1 − p. Let εn, n ≥ 1, be a sequence of i.i.d. random variables indicating the outcome of the i-th game:

P(εi = 1) = p = 1 − P(εi = −1), i ≥ 1, ε0 = 0.

We note that E(εi) = 2p − 1, i ≥ 1. Suppose that the gambler employs a betting strategy based on the past history of the game; that is, the bet Bn+1 on the (n + 1)-th game is

Bn+1 = Bn+1(ε1, . . . , εn), n ≥ 0,

where Bn+1 ≥ 0. Let Yn, n ≥ 1, denote the gambler’s fortune after n games and set Y0 = 0. Then

Yn+1 = Yn + Bn+1(ε1, . . . , εn) εn+1, n ≥ 0.

Now denote by ℱn the information contained in ε0, . . . , εn and consider

E(Yn+1∣ℱn) = E[Yn + Bn+1(ε1, . . . , εn) εn+1 ∣ ℱn] = Yn + Bn+1(ε1, . . . , εn) E(εn+1)

  = Yn if E(εn+1) = 0 ⇔ p = 1/2,
  ≤ Yn if E(εn+1) < 0 ⇔ p < 1/2,
  ≥ Yn if E(εn+1) > 0 ⇔ p > 1/2.

Thus when p = 1/2, Yn is a martingale with respect to ℱn. When p < 1/2 (p > 1/2), Yn is a supermartingale (submartingale) with respect to ℱn. An interesting aspect of this example, when p = 1/2, is that no matter what betting strategy is used in the class of strategies based on the past history of the game, we have E(Yn) = E(Y0) = 0 for every n.
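This "no free lunch" conclusion can be checked exhaustively for a small horizon. The Python sketch below fixes one concrete (and arbitrary) predictable betting rule and enumerates all outcome sequences for p = 1/2, confirming E(Y_N) = 0 exactly:

```python
from fractions import Fraction
from itertools import product

half = Fraction(1, 2)

def bet(history):
    """An arbitrary predictable strategy: bet 1 plus the number of past losses.
    It depends only on eps_1, ..., eps_n, as required."""
    return 1 + sum(1 for e in history if e == -1)

N = 6  # horizon
expectation = Fraction(0)
for eps in product((-1, 1), repeat=N):     # all 2^N equally likely outcome paths
    Y = Fraction(0)
    for n in range(N):
        Y += bet(eps[:n]) * eps[n]         # Y_{n+1} = Y_n + B_{n+1} eps_{n+1}
    expectation += half ** N * Y

assert expectation == 0                    # fair game: E(Y_N) = 0 when p = 1/2
print("E(Y_N) =", expectation)
```

Any other predictable rule substituted into bet() gives the same expectation, which is precisely the martingale property at work.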

Now recall Example 3.3 above. If p = 1/2, then the process

Sn = ε1 + · · · + εn, n ≥ 0,

is a martingale w.r.t. ℱn, n ≥ 0. [It is a supermartingale if p < 1/2 and a submartingale if p > 1/2.] Next, observe that Bn = Bn(ε1, . . . , εn−1) is ℱn−1-measurable for every n ≥ 1. [Such a process is called predictable with respect to the filtration ℱn.] The gambler’s fortune Yn can be written as

Yn = ∑_{k=1}^{n} Bk (Sk − Sk−1), n = 0, 1, 2, 3, . . . .

This expression is a martingale transform of the process Sn by the process Bn. It is the discrete counterpart of the stochastic integral ∫ B dS. [We know that Yn is a martingale (supermartingale, submartingale) with respect to ℱn, n ≥ 0, if Sn is a martingale (supermartingale, submartingale) with respect to ℱn, n ≥ 0.]

Example 3.5. [Compare with Example 2, Section 5.2 in Lawler.] This example is a special case of Example 3.4 with p = 1/2 and the following betting strategy. Bet $1 on the first game. Stop if you win. If not, double your bet. If you win, stop betting (i.e. set βn = 0 for all greater n). Otherwise, keep doubling your bet until you eventually win. This is a very attractive betting strategy which involves a random stopping rule: you stop when you win. Let Yn denote your fortune after n games and assume Y0 = 0. We already know from Example 3.4 that Yn is a martingale, with EYn = EY0 = 0. But in the present case the gambler employs a randomized stopping strategy: she stops the game at the random time τ = min{i ≥ 1 : Yi = 1}, the time at which she wins. Note that Yτ = 1 on {τ < ∞}, and that

P(τ = n) = (1/2)^n, n ≥ 1,

so

P(τ < ∞) = 1.

Therefore the gambler wins 1 in finite time with probability one. In particular

E(Yτ) = 1 ≠ 0 = E(Yn), n = 0, 1, 2, . . . .

The reason this inequality can happen is that τ is an unbounded stopping time [i.e. there is no finite constant K such that P(τ ≤ K) = 1]. We shall talk about this more in a following lecture. That is why, employing this randomized [doubling] strategy, the gambler can guarantee that she finishes the game ahead. However, consider the expected amount lost before the gambler wins (the total of the lost bets is 2^n − 1 when τ = n + 1):

E(amount lost) = ∑_{n=0}^{∞} P(τ = n + 1) [2^n − 1] = ∑_{n=0}^{∞} (1/2)^{n+1} [2^n − 1] = ∞.

Thus, on average, you need infinite capital to play this winning game, which makes the doubling strategy much less attractive.
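Both computations — P(τ < ∞) = 1 but an infinite expected capital requirement — are easy to reproduce numerically. A Python sketch using exact rational arithmetic (the truncation level 30 is an arbitrary choice):

```python
from fractions import Fraction

# Doubling strategy: P(tau = n) = (1/2)^n; if tau = n+1, the gambler has
# lost 1 + 2 + ... + 2^{n-1} = 2^n - 1 dollars before the winning bet.
p_tau = lambda n: Fraction(1, 2**n)

# Partial sums of P(tau = n) approach 1 ...
mass = sum(p_tau(n) for n in range(1, 31))
# ... while each term of the expected-loss series equals 1/2 - (1/2)^{n+1},
# so the partial sums grow without bound (roughly like N/2).
loss = sum(p_tau(n + 1) * (2**n - 1) for n in range(31))
print(float(mass), float(loss))
```

Doubling the truncation level roughly doubles `loss` while `mass` stays pinned near 1, which is the divergence in miniature.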

Example 3.6 (Martingales associated with a driftless random walk). This complements Example 3.3. Let εi, i ≥ 1, be i.i.d. random variables with E(εi) = 0 and E(εi²) = σ² < ∞. We verify that

Sn = x + ∑_{i=1}^{n} εi, n ≥ 0,

and

Mn = Sn² − nσ², n ≥ 0,

are martingales with respect to ℱn, n ≥ 0, the information contained in ε1, . . . , εn or, equivalently, the information contained in S0, . . . , Sn. We have
(i)

E∣Sn∣ ≤ ∣x∣ + ∑_{i=1}^{n} E∣εi∣ ≤ ∣x∣ + ∑_{i=1}^{n} [E(εi²)]^{1/2} = ∣x∣ + nσ < ∞,

and
(ii)

E(Sn+1∣ℱn) = Sn.

Similarly
(i)

E∣Mn∣ ≤ E(Sn²) + nσ² = x² + nσ² + nσ² = x² + 2nσ² < ∞,

and
(ii)

E(Mn+1∣ℱn) = E[S²_{n+1} − (n + 1)σ² ∣ ℱn] = E[Sn² + ε²_{n+1} + 2 Sn εn+1 − (n + 1)σ² ∣ ℱn]
= Sn² + σ² + 0 − (n + 1)σ² = Mn.

Recall that Sn is a Markov chain. So Sn is both a Markov chain and a martingale. Is Mn
a Markov chain as well?
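For ±1 increments (so σ² = 1), the constancy of E(Mn) can be checked exactly by averaging over all 2^n outcome sequences. A brute-force Python sketch (the starting point x = 2 is an arbitrary choice):

```python
from itertools import product
from fractions import Fraction

def mean_Mn(n, x):
    """E[S_n^2 - n] for the +/-1 random walk started at x,
    computed by exhaustive enumeration of all 2^n paths."""
    p = Fraction(1, 2**n)   # each outcome sequence has probability 2^{-n}
    return sum(p * ((x + sum(eps))**2 - n) for eps in product((1, -1), repeat=n))

# E(M_n) = E(M_0) = x^2 for every n, as the martingale property requires.
print([int(mean_Mn(n, x=2)) for n in range(6)])   # -> [4, 4, 4, 4, 4, 4]
```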

Example 3.7 (Wald's martingale). Let εi, i ≥ 1, be i.i.d. random variables with E(εi) < ∞ and Var(εi) < ∞. Set Sn = x + ∑_{i=1}^{n} εi, and for i ≥ 1 let

m(θ) = E[exp(θεi)], −∞ < θ < ∞,

be the moment generating function of εi. Define

Zn = exp(θSn) / [m(θ)]^n, n ≥ 0.

We verify that Zn is a martingale for every θ such that m(θ) < ∞. We have
(i)

E∣Zn∣ = E[exp(θSn)] / [m(θ)]^n = exp(θx) [m(θ)]^n / [m(θ)]^n = exp(θx) < ∞,

and
(ii)

E(Zn+1∣ℱn) = E{ exp(θS_{n+1}) / [m(θ)]^{n+1} ∣ ℱn } = [exp(θSn) / [m(θ)]^{n+1}] E(exp(θε_{n+1})) = Zn.

Now suppose that each εi is normally distributed with mean μ and variance σ². Then, letting x = 0 (to simplify the presentation), we have Sn ∼ N(nμ, nσ²), m(θ) = exp(θμ + θ²σ²/2), and thus

Zn = exp( −θnμ − (1/2)θ²nσ² + θSn ).

This model is related to the so-called geometric Brownian motion with drift.

Example 3.8. Let εi, i ≥ 1, be i.i.d. random variables with P(εi = 1) = p, P(εi = −1) = q = 1 − p, 0 < p, q < 1. Set Sn = x + ∑_{i=1}^{n} εi. We note that E(εi) = p − q = 2p − 1 and Var(εi) = 1 − (2p − 1)² = 4pq. We now verify that

S̄n ≡ Sn − n(p − q), n ≥ 0,

and

Zn = (q/p)^{Sn}, n ≥ 0,

are martingales. That S̄n is a martingale with respect to σ(ε1, . . . , εn), n ≥ 0, follows immediately from Example 3.3 by writing

S̄n ≡ x + ∑_{i=1}^{n} [εi − (p − q)].

We now show that Zn is a Wald martingale. We have, for i ≥ 1,

m(θ) = E[exp(θεi)] = p exp(θ) + q exp(−θ).

If we choose θ = ln(q/p), then m(θ) = 1 and Wald's martingale takes the form

exp[ln(q/p) Sn] = (q/p)^{Sn} = Zn.
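The choice θ = ln(q/p) can be checked in a few lines of Python (the value p = 0.3 is an arbitrary example):

```python
import math

def mgf(theta, p):
    """m(theta) = E exp(theta * eps) for P(eps = 1) = p, P(eps = -1) = q = 1 - p."""
    return p * math.exp(theta) + (1 - p) * math.exp(-theta)

p = 0.3
theta = math.log((1 - p) / p)   # theta = ln(q/p)
print(mgf(theta, p))            # -> 1.0 up to floating-point error
```

Indeed m(ln(q/p)) = p(q/p) + q(p/q) = q + p = 1, so the normalizing factor [m(θ)]^n disappears and the Wald martingale reduces to (q/p)^{Sn}.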

3.2 Doob-Meyer decomposition

Recall the driftless random walk of Example 3.6 and set S0 = 0. In this example we saw that the process Mn = Sn² − nσ² is a martingale with respect to the filtration ℱn. The process An = nσ² is non-decreasing [in fact strictly increasing in this case], and it is predictable [of course, since it is deterministic]. Finally, observe that for each n ≥ 0 we have the following decomposition of the process Sn²:

Sn² = S0² + Mn + An.

Thus we have decomposed the submartingale Sn² into the sum of a martingale and a non-decreasing, predictable process. This particular result, obtained for our example, is a special case of the celebrated result known as the Doob-Meyer decomposition:

Theorem 3.1. Let Xn, n ≥ 0, be a process adapted to some filtration ℱn, n ≥ 0. Assume E∣Xn∣ < ∞ for every n ≥ 0. Then Xn has a Doob-Meyer decomposition

Xn = X0 + Mn + An, ∀n ≥ 0,

where Mn is a martingale with M0 = 0 and An is a predictable process with A0 = 0. The decomposition is unique [in an appropriate sense].

Xn is a submartingale if and only if the process An is non-decreasing. □

Definition 3.9. If a process Xn is a square integrable martingale, the predictable quadratic variation process ⟨X⟩n of Xn is the predictable Doob-Meyer component of Xn², i.e. the predictable process such that

Xn² − ⟨X⟩n

is a martingale with respect to ℱn.


Homework 2: Conditional expectations and discrete-time martingales

1. Show that if Xn is a martingale then it has uncorrelated increments, i.e.,

E[(Xm − Xn)(Xk − Xr)] = 0,

where 0 ≤ r < k ≤ n < m < ∞.

2. Answer the questions posed in Example 3.3.

(a) Are the processes εn, Sn and S̄n Markov chains?

(b) Are the processes εn, Sn and S̄n martingales w.r.t. ℱn when μ ≠ 0?

(c) Are the processes εn, Sn and S̄n martingales w.r.t. ℱn when μ = 0?

3. Let Sn = ∑_{i=1}^{n} εi, n ≥ 0, where εi, i ≥ 1, is a sequence of independent, identically distributed random variables, and εi has an exponential distribution with parameter λ. Identify three different martingales associated with the process {Sn, n ≥ 0}. Represent these martingales as functions of Sn and the parameter λ.

[Hint: The first martingale is S̄n = Sn − f(n, λ), where f(n, λ) is some function of n and λ. The second martingale is Mn = S̄n² − g(n, λ), where g(n, λ) is some function of n and λ. The third martingale is Zn = exp(θSn)/h(n, θ, λ), where h(n, θ, λ) is some function of n, θ and λ.]

4. Let εi, i ≥ 1, be i.i.d. random variables with P(εi = −1) = P(εi = 1) = 1/2. Set Sn = ∑_{i=1}^{n} εi, n = 0, 1, 2, . . . , and let ℱn be the information contained in S0, . . . , Sn [which is the same as the information contained in ε1, . . . , εn]. Finally, let

Zn = e^{−Sn}, n ≥ 0.

(a) Verify whether the process Zn is a martingale, a super-martingale, a sub-martingale, or neither with respect to the filtration ℱn.

(b) Find a numerical sequence zn so that the process Z̄n given by

Z̄n = Zn / zn

is a (Wald) martingale with respect to the filtration ℱn.


3.3 Stopping times and optional stopping theorem

The optional stopping theorem is also referred to in the literature as Doob's optional sampling theorem.

Definition 3.10. A random variable τ is called a stopping time with respect to ℱn, n ≥ 0 (where ℱn is the information contained in Y0, . . . , Yn), if
(i) τ takes values in {0, 1, . . . , ∞},
(ii) for each n, I(τ = n) is measurable with respect to ℱn.

Thus a stopping time is a stopping rule based only on the information contained in ℱn. Put another way, if we know which events from ℱn took place, then we know whether τ = n or not. The notation I(A) is used instead of I_A in this section.

Example 3.11. Let τ = j for some j ≥ 0. Clearly τ is a stopping time. This is the most elementary example of a bounded [finite] stopping time. Let now εi, i ≥ 1, be i.i.d. random variables with P(εi = 1) = p, P(εi = −1) = q = 1 − p, 0 < p, q < 1. Set Sn = ∑_{i=1}^{n} εi. Let ℱn be the information contained in S0, . . . , Sn [which is the same as the information contained in ε1, . . . , εn]. We consider different stopping rules.

1. Let

τj = min{n ≥ 0 : Sn = j} (with τj = ∞ if Sn ≠ j for all n ≥ 0).

Since I(τj = n) is determined by the information in ℱn, τj is a stopping time with respect to ℱn.

2. Let

σj = τj − 1, j ≠ 0;

then, since I(σj = n) = I(τj − 1 = n) = I(τj = n + 1), I(σj = n) is not ℱn-measurable (it is ℱn+1-measurable). Hence σj is not a stopping time.

3. τj^r, the r-th passage time of the process Sn to j, r = 1, 2, . . . , is a stopping time with respect to ℱn.

4. Let

νj = max{n ≥ 0 : Sn = j}.

Thus νj is the last time Sn visits state j. Clearly νj is not a stopping time.

Exercise 1. Suppose that τ is a stopping time. Show that τj := min(τ, j), where j is a fixed integer, is also a stopping time. Clearly τj ≤ j.

Exercise 2. Show that if τ and σ are stopping times, then so are min(τ, σ) and max(τ, σ).

Let τ be any non-negative integer-valued random variable which is finite with probability one, and let Xn, n = 0, 1, 2, . . . , be a random sequence. Then Xτ denotes the random variable that takes the value X_{τ(ω)}(ω). The following proposition says that you cannot beat a fair game by using a stopping rule which is a bounded stopping time.


Proposition 3.2. Let Mn be a martingale and τ a stopping time with respect to ℱn, n ≥ 0, where ℱn is the information contained in M0, . . . , Mn. Then

E M_{min(τ, n)} = E M0, ∀n ≥ 0.

Proof. We have

M_{min(τ, n)} = Mτ I(τ ≤ n) + Mn I(τ > n)
= Mτ ∑_{k=0}^{n} I(τ = k) + Mn I(τ > n)
= ∑_{k=0}^{n} Mk I(τ = k) + Mn I(τ > n).

Hence

E M_{min(τ, n)} = ∑_{k=0}^{n} E{Mk I(τ = k)} + E{Mn I(τ > n)}
= ∑_{k=0}^{n} E{E(Mn∣ℱk) I(τ = k)} + E{Mn I(τ > n)}
= ∑_{k=0}^{n} E{E(Mn I(τ = k)∣ℱk)} + E{Mn I(τ > n)}
= ∑_{k=0}^{n} E{Mn I(τ = k)} + E{Mn I(τ > n)}
= E{Mn I(τ ≤ n)} + E{Mn I(τ > n)} = E Mn = E M0,

where the second equality follows from the martingale property of Mn, the third from the fact that I(τ = k) is measurable with respect to ℱk, and the fourth from the tower rule. □
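Proposition 3.2 can be confirmed exactly by brute force: enumerate all ±1 paths up to a horizon n and average the stopped values. A Python sketch (the hitting level 2 is an arbitrary illustrative stopping rule):

```python
from itertools import product
from fractions import Fraction

def mean_stopped(n, level=2):
    """E[M_{min(tau, n)}] for the symmetric +/-1 walk M started at 0,
    tau = first hitting time of +level, by exhaustive enumeration."""
    total = Fraction(0)
    for eps in product((1, -1), repeat=n):
        M = 0
        for e in eps:
            if M == level:   # tau has already occurred: value frozen at M_tau
                break
            M += e
        total += Fraction(1, 2**n) * M
    return total

# E[M_{min(tau, n)}] = E[M_0] = 0 for every horizon n.
print([int(mean_stopped(n)) for n in range(1, 8)])   # -> [0, 0, 0, 0, 0, 0, 0]
```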

In many situations of interest the stopping time is not bounded but is almost surely finite, as in the doubling strategy of Example 3.5. In that example E Yτ = 1 ≠ 0 = E Y0. The question arises: when is it that E Mτ = E M0 for a stopping time which is not bounded? We have

Mτ = M_{min(τ, n)} + Mτ I(τ > n) − Mn I(τ > n).

Hence, using Proposition 3.2, we obtain for every n

E Mτ = E M0 + E[Mτ I(τ > n)] − E[Mn I(τ > n)]. (3.3)

This provides motivation for the following

Theorem 3.3 (Optional Stopping Theorem). Let Mn be a martingale and τ a stopping time with respect to ℱn, n ≥ 0. If

P(τ < ∞) = 1, (3.4)

E∣Mτ∣ < ∞, (3.5)

and

lim_{n→∞} E[∣Mn∣ I(τ > n)] = 0, (3.6)

then

E Mτ = E M0. (3.7)


Proof. It follows from (3.3) and (3.6) that we only have to show

lim_{n→∞} E[Mτ I(τ > n)] = 0. (3.8)

By (3.4) and (3.5),

E∣Mτ∣ = ∑_{k=0}^{∞} E[∣Mτ∣ I(τ = k)] = ∑_{k=0}^{n} E[∣Mτ∣ I(τ = k)] + E[∣Mτ∣ I(τ > n)] < ∞. (3.9)

Now (3.8) follows because we see from (3.9) that E[∣Mτ∣ I(τ > n)] is the tail of a convergent series. □

Example 3.12. For the doubling strategy of Example 3.5 we know that (3.7) does not hold. We also know that for this strategy P(τ < ∞) = 1 and E∣Yτ∣ = 1 < ∞, so it must be that (3.6) does not hold. Indeed, as n → ∞,

E[∣Yn∣ I(τ > n)] = ∣1 − 2^n∣ P(τ > n) = ∣1 − 2^n∣ (1/2)^n → 1.

3.3.1 Applications to random walks

Let εi, i ≥ 1, be i.i.d. random variables with P(εi = 1) = p, P(εi = −1) = q = 1 − p, 0 < p, q < 1. Set Sn = x + ∑_{i=1}^{n} εi, and let

τ = min{n ≥ 0 : Sn = a or Sn = b}, a ≤ x ≤ b,

where a, b are integers. Then τ is a stopping time with respect to ℱn, n ≥ 0, where ℱn is the information contained in S0, . . . , Sn. One can show [see Exercise 1.7 in Lawler] that Eτ < ∞, which implies that P(τ < ∞) = 1. Our goal is to compute, using the OST, P(Sτ = b), the probability that Sn reaches b before a, and Eτ, for a symmetric random walk and for a random walk with drift.

Symmetric random walk: p = q = 1/2.

In this case we know from Example 3.6 that Sn and Mn = Sn² − n are martingales with respect to ℱn, n ≥ 0, where ℱn is the information contained in S0, . . . , Sn. We apply the OST to Sn and the stopping time τ. Since

E∣Sτ∣ ≤ max(∣a∣, ∣b∣) < ∞,

and

lim_{n→∞} E[∣Sn∣ I(τ > n)] ≤ lim_{n→∞} max(∣a∣, ∣b∣) P(τ > n) = 0

(recall that P(τ < ∞) = 1), by the OST we have

E Sτ = E S0 = x. (3.10)

But also

E Sτ = b P(Sτ = b) + a P(Sτ = a). (3.11)

Combining (3.10) and (3.11) we obtain

P(Sτ = b) = (x − a)/(b − a), P(Sτ = a) = (b − x)/(b − a). (3.12)

Setting a = 0 and b = N, we get

P(Sτ = N) = x/N, P(Sτ = 0) = (N − x)/N,

which are the probabilities of winning and of ruin, respectively, in the so-called Gambler's ruin problem. To compute Eτ we apply the OST to Mn and τ. We check the assumptions of the OST. We have

E∣Mτ∣ ≤ E Sτ² + Eτ < ∞,

and

E[∣Mn∣ I(τ > n)] ≤ E[Sn² I(τ > n)] + n P(τ > n) ≤ max(a², b²) P(τ > n) + n P(τ > n).

Clearly

lim_{n→∞} max(a², b²) P(τ > n) = 0.

To show that lim_{n→∞} n P(τ > n) = 0 we write

Eτ = ∑_{k=1}^{n} k P(τ = k) + ∑_{k=n+1}^{∞} k P(τ = k).

Since Eτ < ∞, the sum ∑_{k=n+1}^{∞} k P(τ = k) is the tail of a convergent series, and thus

0 = lim_{n→∞} ∑_{k=n+1}^{∞} k P(τ = k) ≥ lim_{n→∞} ∑_{k=n+1}^{∞} n P(τ = k) = lim_{n→∞} n P(τ > n),

which shows that (3.6) holds. Hence by the OST we have

E Mτ = E Sτ² − Eτ = E S0² = x²,

so that, using (3.12), we obtain

Eτ = E Sτ² − x² = b² (x − a)/(b − a) + a² (b − x)/(b − a) − x² = (b − x)(x − a). (3.13)

By setting a = 0 and b = N we obtain the expected duration of the game in the Gambler's ruin chain:

Eτ = (N − x) x.
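The closed forms (3.12) and (3.13) can be cross-checked against a direct first-step analysis: the win probability solves h(x) = ½[h(x−1) + h(x+1)] with h(0) = 0, h(N) = 1, and the expected duration solves g(x) = 1 + ½[g(x−1) + g(x+1)] with g(0) = g(N) = 0. A Python sketch solving both by Gauss-Seidel iteration (the sweep count is an arbitrary choice):

```python
def gamblers_ruin(N, sweeps=20000):
    """First-step analysis for the symmetric walk absorbed at 0 and N:
    h[x] = win probability, g[x] = expected duration, from state x."""
    h = [0.0] * (N + 1); h[N] = 1.0
    g = [0.0] * (N + 1)
    for _ in range(sweeps):
        for x in range(1, N):
            h[x] = 0.5 * (h[x - 1] + h[x + 1])
            g[x] = 1.0 + 0.5 * (g[x - 1] + g[x + 1])
    return h, g

N = 10
h, g = gamblers_ruin(N)
# Compare with the closed forms P(win) = x/N and E(tau) = (N - x) x.
print(round(h[3], 6), round(g[3], 6))   # -> 0.3 21.0
```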

Random walk with drift: p ≠ q.

In this case (see Example 3.8) S̄n ≡ Sn − n(p − q) and Zn = (q/p)^{Sn} are martingales with respect to ℱn, n ≥ 0, where Sn is defined as in Example 3.8 (started at x). We first apply the OST to Zn and τ. We check the assumptions of the OST:

E∣Zτ∣ ≤ max((q/p)^a, (q/p)^b) < ∞,

and

lim_{n→∞} E[∣Zn∣ I(τ > n)] ≤ lim_{n→∞} max((q/p)^a, (q/p)^b) P(τ > n) = 0.

Hence we can apply the OST to conclude

E Zτ = E Z0 = (q/p)^x. (3.14)

But

E Zτ = (q/p)^b P(Sτ = b) + (q/p)^a (1 − P(Sτ = b)). (3.15)

From (3.14) and (3.15) we obtain

P(Sτ = b) = [(q/p)^x − (q/p)^a] / [(q/p)^b − (q/p)^a]. (3.16)

Setting a = 0 and b = N in (3.16) we obtain the probability that the gambler, who starts with x dollars, will win the desired N − x dollars:

P(Sτ = N) = [(q/p)^x − 1] / [(q/p)^N − 1].

We now compute Eτ by applying the OST to the martingale S̄n. The assumptions of the OST can be verified in the same way as in the previous cases. By the OST we have

E S̄τ = E Sτ − Eτ (p − q) = E S̄0 = x,

so that

Eτ = (E Sτ − x) / (p − q),

where E Sτ can easily be computed from (3.16).

Exercise 3. Compute the expected duration of the game for the Gambler's ruin chain with p ≠ q.
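Equation (3.16) can likewise be cross-checked against a first-step analysis, where the win probability now solves h(y) = p h(y+1) + q h(y−1). A Python sketch (the values p = 0.45, N = 10, x = 3 are arbitrary examples):

```python
def win_prob_closed(x, N, p):
    """P(S_tau = N) from (3.16) with a = 0, b = N."""
    r = (1 - p) / p           # r = q/p
    return (r**x - 1) / (r**N - 1)

def win_prob_first_step(x, N, p, sweeps=40000):
    """Solve h(y) = p h(y+1) + q h(y-1), h(0) = 0, h(N) = 1, by iteration."""
    q = 1 - p
    h = [0.0] * (N + 1); h[N] = 1.0
    for _ in range(sweeps):
        for y in range(1, N):
            h[y] = p * h[y + 1] + q * h[y - 1]
    return h[x]

a = win_prob_closed(3, 10, 0.45)
b = win_prob_first_step(3, 10, 0.45)
print(abs(a - b) < 1e-9)   # -> True
```

With p = 0.45 the win probability from x = 3 drops well below the symmetric value 3/10, as the unfavourable drift suggests.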


Homework 3: Discrete-time optional stopping theorem

1. Exercise 1.

2. Exercise 2.

3. Exercise 3.

4. Let Sn = 1 + ∑_{i=1}^{n} εi, n ≥ 0, be a random walk with p = 1/4 and q = 3/4 starting from x = 1. Suppose that

τ = min{n ≥ 0 : Sn = −2}.

Use the Optional Stopping Theorem (admitting that its conditions hold) to compute

(a) E1(τ),

(b) Var1(τ).


3.4 Uniform integrability and martingales

Condition (3.6) is difficult to verify. Here we present some conditions that imply it.

Definition 3.13. A sequence of random variables X1, X2, . . . is uniformly integrable (UI for short) if for every ε > 0 there exists a δ > 0 such that if for some random event A ⊂ Ω we have P(A) < δ, then

E(∣Xn∣ I_A) < ε, (3.17)

for each n.

Observe that δ must not depend on n, and (3.17) must hold for all values of n.

Example 3.14. Let X1, X2, . . . be a random sequence with ∣Xn∣ ≤ K < ∞ for every n [K does not depend on n, which means that the sequence is uniformly bounded]. This sequence is UI. To see this, fix ε > 0 and take δ = ε/K. Now take any event A with P(A) < δ. We have

E(∣Xn∣ I_A) ≤ K P(A) < Kδ = ε,

for every n. Thus the sequence X1, X2, . . . is UI.

Exercise 1. Let the sequence Xn be as in the above example. Consider the sequence Sn = ∑_{k=1}^{n} Xk. Is the sequence Sn UI?

Here is an equivalent definition of a UI sequence:

Definition 3.15. A sequence of random variables X1, X2, . . . is uniformly integrable (UI for short) if

sup_{n≥0} E(∣Xn∣ I_{∣Xn∣>a}) → 0 as a → ∞.

Example 3.16. Consider the fortune process Yn of the doubling strategy from Example 3.5. We know that this process is a martingale with respect to ℱn. Is it a UI martingale? In order to answer this question, consider the event An = {ε1 = ε2 = ⋅⋅⋅ = εn = −1}. Then P(An) = (1/2)^n and E(∣Yn∣ I_{An}) = (2^n − 1)/2^n [this is because ∣Yn∣ = 2^n − 1 if the event An occurs]. Thus E(∣Yn∣ I_{An}) = 1 − (1/2)^n. Now take any ε < 1. No matter how small (but positive) a δ you select, you will always find n large enough so that P(An) < δ and E(∣Yn∣ I_{An}) ≥ ε. Thus the gambler's fortune process is not a UI martingale.

Suppose now that M0, M1, . . . is a UI martingale with respect to some filtration, and that τ is a stopping time s.t. P(τ < ∞) = 1. By uniform integrability we then conclude that [since P(τ > n) → 0]

lim_{n→∞} E(∣Mn∣ I(τ > n)) = 0,

so that condition (3.6) holds. Thus we may state a weaker version of the OST:

Theorem 3.4. Let Mn be a UI martingale and τ a stopping time with respect to ℱn. Suppose that P(τ < ∞) = 1 and E(∣Mτ∣) < ∞. Then E(Mτ) = E(M0).

A useful criterion for uniform integrability is the following: if for a sequence of random variables Xn there exists a constant C < ∞ such that E(Xn²) < C for each n, then the sequence Xn is uniformly integrable. [See Lawler, p. 98 (p. 115), for a proof.]

Example 3.17. Consider a driftless random walk Sn as in Example 3.8, assuming P(εi = −1) = P(εi = 1) = 1/2 for every i ≥ 1. That is, we have a symmetric random walk on the integers starting at 0. We know this random walk is a martingale with respect to ℱn. Now consider the process S̄n = Sn/n, n ≥ 1. We have E(S̄n²) = 1/n for every n ≥ 1. Thus the sequence S̄n is UI [of course, as it is a bounded sequence in the first place]. This criterion is not satisfied by the random walk Sn itself. In fact, the random walk Sn is not UI!

3.5 Martingale convergence theorems

The following theorem is important.

Theorem 3.5. Suppose M0, M1, . . . is a supermartingale with respect to a filtration ℱn, and there exists a finite constant C so that E∣Mn∣ < C for all n. Then there exists a random variable M∞ so that, with probability one,

Mn → M∞.

This result is proved in Lawler (Section 5.5) for a martingale sequence; we shall skip the general proof. The above convergence means that if you denote A = {ω ∈ Ω : lim_{n→∞} Mn(ω) = M∞(ω)}, then P(A) = 1. That is, the probability that the convergence lim_{n→∞} Mn(ω) = M∞(ω) does not hold is zero, or, equivalently, this convergence holds for almost every elementary event ω ∈ Ω. Such a mode of convergence of random variables is called convergence with probability one or almost sure convergence [see the appendix in Mikosch].

Corollary 3.6. Suppose M0, M1, . . . is a non-negative supermartingale with respect to a filtration ℱn. Then there exists a random variable M∞ so that, with probability one,

Mn → M∞.

For UI supermartingales we obtain a stronger result:

Theorem 3.7. Suppose M0, M1, . . . is a UI supermartingale with respect to a filtration ℱn. Then there exists a random variable M∞ so that

Mn → M∞.

Moreover, E(M∞∣ℱn) ≤ Mn. In particular, we get E(M∞) ≤ E(M0) [equality holds here in the case of martingales].

Example 3.18. See Example 6, p. 102 (p. 120), in Lawler for an interesting example of a non-UI martingale Mn which almost surely converges to M∞ = 0, so that E M∞ ≠ E M0 = 1.


Homework 4: Discrete-time uniformly integrable martingales

1. Exercise 1.

2. An urn contains k red balls and m green balls at the initial time n = 0. One ball is chosen randomly from the urn. The ball is then put back into the urn, together with another ball of the same color. Hence the total number of balls in the urn grows [Polya's urn scheme; see Lawler, Example 4 p. 92 (p. 109) and Example 3 p. 101 (p. 119)]. Let Xn denote the proportion of green balls in the urn at time n ≥ 0.

(a) Is Xn a Markov chain?

(b) Is Xn a martingale?

(c) Is it a UI martingale?

Part II

Some classes of continuous-time stochastic processes

Chapter 4

Continuous-time stochastic processes

4.1 Generalities

So far we have been studying random processes in discrete time. We now turn to random processes in continuous time [see Mikosch, Section 1.2].

A collection of random variables {Xt, t ∈ [0, ∞)} is called a continuous-time random (or stochastic) process. We shall frequently use the notation X to denote the process {Xt, t ∈ [0, ∞)}. Here Xt denotes the state at time t ≥ 0 of our random process: for every fixed t ≥ 0, Xt is a random variable on some underlying probability space (Ω, P). This means that Xt(⋅) is a function from Ω to the state space S [that is, Xt(⋅) : Ω → S]. On the other hand, for every fixed ω ∈ Ω we deal with a trajectory (or a sample path), denoted by X⋅(ω), of our random process. That is, X⋅(ω) is a function from [0, ∞) to S [that is, X⋅(ω) : [0, ∞) → S].

The (natural) filtration generated by the process X is ℱt = σ(Xs, 0 ≤ s ≤ t), the information contained in the random variables Xs, 0 ≤ s ≤ t. A process Yt is said to be ℱ-adapted if Yt is ℱt-measurable for every t.

4.2 Continuous-time martingales

Since the definitions and results concerning martingales in continuous time are essentially analogous to those in discrete time, in this section we state definitions and results without much elaboration [see Mikosch, Section 1.5].

Definition 4.1. A process Y = (Yt, t ≥ 0) is called a martingale, resp. submartingale, resp. supermartingale, with respect to the family of information sets ℱ = (ℱt, t ≥ 0) [which satisfy ℱs ⊂ ℱt, s ≤ t] if Y is ℱ-adapted and
(i) E∣Yt∣ < ∞, t ≥ 0,
(ii) for every s ≤ t one has E(Yt∣ℱs) = Ys, resp. E(Yt∣ℱs) ≥ Ys, resp. E(Yt∣ℱs) ≤ Ys.

4.2.1 Optional Stopping Theorem

Definition 4.2. A nonnegative random variable τ is called a stopping time relative to (ℱt, t ≥ 0) if for each t, I(τ ≤ t), the indicator function of the event {τ ≤ t}, is measurable with respect to ℱt.

Theorem 4.1. Let (Mt, t ≥ 0) be a martingale and τ a stopping time with respect to (ℱt, t ≥ 0). If
(i) P(τ < ∞) = 1,
(ii) E∣Mτ∣ < ∞,
(iii) lim_{t→∞} E[∣Mt∣ I(τ > t)] = 0,
then

E Mτ = E M0.


Chapter 5

Continuous-time Markov chains [Lawler, Chapter 3]

Definition 5.1. A random process X is called a continuous-time Markov chain with a discrete (finite or countable) state space S if for any 0 ≤ s, t and for any y ∈ S it holds that

P(Xs+t = y∣ℱs) = P(Xs+t = y∣Xs).

The above Markov property can equivalently be stated as: for any sequence of times 0 ≤ t1 ≤ t2 ≤ ⋅⋅⋅ ≤ tn−1 ≤ tn < ∞ and for any collection of states x1, x2, . . . , xn−1, xn we have

P(Xtn = xn∣Xtn−1 = xn−1, . . . , Xt2 = x2, Xt1 = x1) = P(Xtn = xn∣Xtn−1 = xn−1).

Definition 5.2. A Markov chain X is time homogeneous iff for all x, y ∈ S and all 0 ≤ s, t we have

P(Xs+t = y∣Xs = x) = P(Xt = y∣X0 = x) =: q(t; x, y).

Let

Q(t) = (q(t; x, y))_{x,y∈S}, t ≥ 0,

denote the transition probability function of a time homogeneous Markov chain X. Note that Q(0) = I.

Proposition 5.1. For every s, t ≥ 0, the transition probability function of a time homogeneous Markov chain X satisfies
(i) 0 ≤ q(t; x, y) ≤ 1, ∀x, y ∈ S,
(ii) ∑_{y∈S} q(t; x, y) = 1, ∀x ∈ S,
(iii) (Chapman-Kolmogorov equations)

q(s + t; x, y) = ∑_{z∈S} q(s; x, z) q(t; z, y), ∀x, y ∈ S, ∀s, t ≥ 0, (5.1)

or, equivalently,

Q(s + t) = Q(s) Q(t), ∀s, t ≥ 0.

Proof. Left as an exercise. □

Recall that if a real-valued continuous function f(t) satisfies the equation

f(s + t) = f(s) f(t), ∀s, t ≥ 0,

then it is differentiable and such that (with ḟ = df/dt)

ḟ(t) = a f(t), f(t) = e^{at},

for some real number a. Similarly, in the case of a continuous semigroup of transition probabilities Q(t) with a so-called matrix generator A of X, the matrix function Q is differentiable in t ≥ 0 and such that Q(0) = I and, for t ≥ 0,

Q̇(t) = Q(t) A (5.2)

(forward form) or, equivalently, since Q(t) commutes with A,

Q̇(t) = A Q(t) (5.3)

(backward form). The above equations are called the Kolmogorov equations. The (infinite) matrix form of the unique solution is

Q(t) = e^{At}, t ≥ 0, (5.4)

where for any matrix [finite or infinite] the matrix exponential is defined in the usual manner:

e^{At} = ∑_{n=0}^{∞} (tA)^n / n!.

For any initial distribution vector π0 = (P(X0 = x), x ∈ S) one then has

(P(Xt = x), x ∈ S) =: πt = π0 Q(t) = π0 e^{At}.

Remark 5.3. The Chapman-Kolmogorov equation (5.1) can equivalently be written as

Q(s + t) − Q(t) = Q(t) (Q(s) − I), ∀s, t ≥ 0.

Now fix t and rewrite the above for s > 0:

[Q(t + s) − Q(t)] / s = Q(t) [Q(s) − Q(0)] / s.

Since the matrix function Q(t) is differentiable, letting s → 0 we obtain

Q̇(t) = Q(t) Q̇(0)

(with Q̇(0) the right derivative of Q at t = 0). Comparing this with the forward equation (5.2), we conclude that

A = Q̇(0).

Remark 5.4. Recall from Chapter 1 that for a discrete-time Markov chain the n-step transition matrix Qn satisfies the first-step equation: Q0 = I and, for n = 0, 1, 2, . . . ,

ΔQn+1 := Qn+1 − Qn = A Qn,

or, equivalently, the last-step equation: Q0 = I and, for n = 0, 1, 2, . . . ,

ΔQn+1 = Qn A,

where A = Q − I. The solution to both equations is Qn = Q^n = (I + A)^n, n = 0, 1, 2, 3, . . . . The forward Kolmogorov equation (5.2) is the continuous-time counterpart of the last-step equation, and the backward Kolmogorov equation (5.3) is the continuous-time counterpart of the first-step equation.


5.1 Poisson process [Shreve, Sections 11.2 and 11.3]

Although it is customary to denote a Poisson process by N = (Nt, t ≥ 0), Lawler uses the notation X = (Xt, t ≥ 0). We shall follow Lawler's notation. The notation h stands for a small time increment.

Definition 5.5. Let X0 = 0, and let Xt denote the (random) number of occurrences of some underlying random event in the time interval (0, t], t > 0. If Xt satisfies the following two conditions, we call X a Poisson process with intensity (or rate) λ > 0:
(i) For every t > 0 we have

P(Xt+h − Xt = k) = λh + o(h) if k = 1, o(h) if k ≥ 2, 1 − λh + o(h) if k = 0,

where lim_{h→0} o(h)/h = 0.
(ii) For any sequence 0 ≤ s1 ≤ t1 ≤ s2 ≤ t2 ≤ ⋅⋅⋅ ≤ sn ≤ tn < ∞, the random variables Xt1 − Xs1, Xt2 − Xs2, . . . , Xtn − Xsn are independent.

For any 0 ≤ s ≤ t the random variable Xt − Xs denotes the number of occurrences of our underlying random event in the time interval (s, t]. Any random process X satisfying condition (ii) of the above definition is called a process with independent increments. See Lawler, Section 3.1, for a motivating example where Xt represents the number of customers arriving at a service facility by time t. The following important result explains the name "Poisson process":

Theorem 5.2. Let X be a Poisson process with rate λ. Then, for any 0 ≤ s, t we have

P(Xs+t − Xs = k) = (λt)^k e^{−λt} / k!, k = 0, 1, 2, . . . .

In other words, the increment Xs+t − Xs is a random variable that has a Poisson distribution with parameter λt.

Proof. Let us fix s and denote Pk(t) = P(Xs+t − Xs = k). Now,

P0(t + h) = P(Xs+t+h − Xs = 0)
= P(Xs+t − Xs = 0, Xs+t+h − Xs+t = 0)
= P(Xs+t − Xs = 0) P(Xs+t+h − Xs+t = 0)
= P0(t) [1 − λh + o(h)].

Therefore

[P0(t + h) − P0(t)] / h = −λ P0(t) + [o(h)/h] P0(t).

Letting h → 0 we get

P0′(t) = −λ P0(t), t ≥ 0, (5.5)

with the initial condition

P0(0) = P(X0 = 0) = 1.

The ordinary differential equation (5.5) has the well known solution (check it as an exercise)

P0(t) = e^{−λt}, t ≥ 0.

Thus P(Xs+t − Xs = 0) = P0(t) = e^{−λt} for t ≥ 0. Next, for any k ≥ 1, we have

Pk(t + h) = ∑_{i=0}^{k} P(Xs+t − Xs = k − i, Xs+t+h − Xs+t = i)
= ∑_{i=0}^{k} Pk−i(t) P(Xs+t+h − Xs+t = i)
= Pk(t) P(Xs+t+h − Xs+t = 0) + Pk−1(t) P(Xs+t+h − Xs+t = 1) + ∑_{i=2}^{k} Pk−i(t) P(Xs+t+h − Xs+t = i)
= Pk(t) [1 − λh + o(h)] + Pk−1(t) [λh + o(h)] + ∑_{i=2}^{k} Pk−i(t) o(h).

So, collecting all the o(h) terms,

[Pk(t + h) − Pk(t)] / h = λ (Pk−1(t) − Pk(t)) + o(h)/h.

Again letting h → 0 we get

Pk′(t) = λ (Pk−1(t) − Pk(t)), t ≥ 0, (5.6)

with the initial condition

Pk(0) = P(X0 = k) = P(0 = k) = 0,

for k = 1, 2, 3, . . . . Thus for k = 1 we get

P1′(t) = λ (e^{−λt} − P1(t)), t ≥ 0, (5.7)

with the initial condition P1(0) = 0. The ordinary differential equation (5.7) has the well known solution

P1(t) = λt e^{−λt}, t ≥ 0.

Thus P(Xs+t − Xs = 1) = P1(t) = (λt)^1 e^{−λt} / 1! for t ≥ 0. Proceeding similarly (by induction) for k ≥ 2, we finally obtain

P(Xs+t − Xs = k) = Pk(t) = (λt)^k e^{−λt} / k!, t ≥ 0,

for all k = 0, 1, 2, 3, . . . . □
We now have the important

Page 45: An Introductory Course in Stochastic Processes

45

Corollary 5.3. Let X be a Poisson process with rate �. Then the process X is a timehomogeneous Markov chain.

Proof. It is enough to verify that any three times 0 ≤ s ≤ r ≤ t < ∞, and for any threeintegers k ≤ m ≤ n we have

P (Xt = n∣Xr = m,Xs = k) = P (Xt = n∣Xr = m),

and thatP (Xt = n∣Xr = m)

only depends on m, n and the time diffrential t− r. Now,P (Xt = n∣Xr = m,Xs = k)

=P (Xt −Xr = n−m, Xr −Xs = m− k,Xs = k)

P (Xr −Xs = m− k,Xs = k)

= P (Xt −Xr = n−m) = P (Xt −Xr = n−m∣Xr = m)

= P (Xt = n∣Xr = m).

(5.8)

This proves the Markov property. From Theorem (5.2) we know that

P (Xt −Xr = n−m) =(�(t− r))n−me−�(t−r)

(n−m)!

which proves the time homogeneity in view of (5.8). □

In particular,

P (Xt = n∣X0 = 0) = P (Xt = n) =(�t)ne−�t

n!.

Consider the random time �1 = min{t > 0 : Xt = 1}. This is the time of the first jump ofthe process X. Then, of course, we have for all t ≥ 0

P (�1 > t) = P (Xt = 0) = e−�t,

and so the random time �1 has exponential distribution with parameter �. More generally,the so called sojourn times, that is the random times �n that elapse between the consec-utive jumps of the Poisson process X, are i.i.d random variables each having exponentialdistribution with parameter �.

The previous remark tells us that the trajectories of a Poisson process with rate � areright-continuous step functions. The height of each step is 1. The length of each step isthe value of exponential random variable with parameter �. The lengths of different stepsare distributed independently. Now sketch a graph of a trajectory (or a sample path) of aPoisson process [you may want to look at Figure 1.2.9 in Mikosch].
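This description is also a recipe for simulating a Poisson sample path: draw i.i.d. exponential sojourn times and accumulate them until the horizon is exceeded. A Python sketch (the seed, rate and horizon are arbitrary choices):

```python
import random

def poisson_path(lam, T, rng):
    """Jump times of a Poisson process on (0, T], built from i.i.d.
    Exp(lam) sojourn times; X_t is then the number of jump times <= t."""
    times, t = [], 0.0
    while True:
        t += rng.expovariate(lam)   # next sojourn time
        if t > T:
            return times
        times.append(t)

rng = random.Random(0)
jumps = poisson_path(lam=2.0, T=10.0, rng=rng)
print(len(jumps), "jumps on (0, 10]; E X_10 = lam * T = 20")
```

Averaging `len(jumps)` over many independent runs approaches λT, in line with Theorem 5.2.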

The time-t transition matrix of a Poisson process is Q(t) = (q(t; n, m))_{n,m=0,1,2,...} = (P(Xs+t − Xs = m − n))_{n,m=0,1,2,...} = (P_{m−n}(t))_{n,m=0,1,2,...}, where P_{m−n}(t) = 0 for m < n. Thus, with rows indexed by the current state n and columns by the target state m,

Q(t) =

        0      1      2      3      4    ⋅⋅⋅
  0   P0(t)  P1(t)  P2(t)  P3(t)  P4(t)  ⋅⋅⋅
  1     0    P0(t)  P1(t)  P2(t)  P3(t)  ⋅⋅⋅
  2     0      0    P0(t)  P1(t)  P2(t)  ⋅⋅⋅
  3     0      0      0    P0(t)  P1(t)  ⋅⋅⋅
  ⋮     ⋮      ⋮      ⋮      ⋮      ⋮     ⋱

Let us now introduce the countably infinite matrix

A =

        0    1    2    3    4   ⋅⋅⋅
  0   −λ    λ    0    0    0   ⋅⋅⋅
  1    0   −λ    λ    0    0   ⋅⋅⋅
  2    0    0   −λ    λ    0   ⋅⋅⋅
  3    0    0    0   −λ    λ   ⋅⋅⋅
  ⋮    ⋮    ⋮    ⋮    ⋮    ⋮    ⋱          (5.9)

Consistently with the general form (5.2)-(5.3) of the Kolmogorov equations of a Markov process, the system of ordinary differential equations (5.5), (5.6) can be written vectorially as: Q(0) = I and, for t ≥ 0,

Q̇(t) = A Q(t) = Q(t) A, Q(t) = e^{At}.

For 0 ≤ s ≤ t the increment Xt − Xs is a random variable that has the Poisson distribution with parameter λ(t − s). We also know that this random variable is independent of all the random variables Xr, 0 ≤ r ≤ s. Thus we have

E(Xt − λt∣ℱs) = E(Xt − Xs − λ(t − s)∣ℱs) + E(Xs − λs∣ℱs) = Xs − λs.

This means that the process

Mt := Xt − λt, t ≥ 0,

is a martingale with respect to the natural filtration of X.

5.2 Two-state continuous time Markov chain

We now consider a two-state Markov chain with the infinitesimal generator

A =

        0    1
  0   −λ    λ
  1    μ   −μ

If the process is in state 0, it waits for a random time τ0 before it decides to jump to state 1; the random time τ0 has an exponential distribution with parameter λ. If the process is in state 1, it waits for a random time τ1 before it decides to jump to state 0; the random time τ1 has an exponential distribution with parameter μ.

The forward Kolmogorov equation is

Q̇(t) = Q(t) A, Q(0) = I, t ≥ 0;

that is,

q̇(t; 0, 0) = −λ q(t; 0, 0) + μ q(t; 0, 1), q̇(t; 0, 1) = λ q(t; 0, 0) − μ q(t; 0, 1),
q̇(t; 1, 0) = −λ q(t; 1, 0) + μ q(t; 1, 1), q̇(t; 1, 1) = λ q(t; 1, 0) − μ q(t; 1, 1),

for t ≥ 0, with the initial conditions

q(0; 0, 0) = q(0; 1, 1) = 1, q(0; 1, 0) = q(0; 0, 1) = 0.


The backward Kolmogorov equation is

Q̇(t) = A Q(t), Q(0) = I, t ≥ 0;

that is,

q̇(t; 0, 0) = −λ q(t; 0, 0) + λ q(t; 1, 0), q̇(t; 0, 1) = −λ q(t; 0, 1) + λ q(t; 1, 1),
q̇(t; 1, 0) = μ q(t; 0, 0) − μ q(t; 1, 0), q̇(t; 1, 1) = μ q(t; 0, 1) − μ q(t; 1, 1),

for t ≥ 0, with the initial conditions

q(0; 0, 0) = q(0; 1, 1) = 1, q(0; 1, 0) = q(0; 0, 1) = 0.

The matrix A diagonalizes as follows (check it as an exercise!)

A =

(1 −�

1 1

)(0 00 −(�+ �)

)( ��+�

��+�

− ��+�

��+�

).

Thus, the solution to both equations (forward and backward) is

Q(t) = e^{At} =
⎛ 1  −λ/μ ⎞    ⎛ 0      0    ⎞   ⎛  μ/(λ+μ)   λ/(λ+μ) ⎞
⎝ 1    1  ⎠ exp⎝ 0  −(λ+μ)  ⎠t  ⎝ −μ/(λ+μ)   μ/(λ+μ) ⎠

=
⎛ 1  −λ/μ ⎞ ⎛ 1       0        ⎞ ⎛  μ/(λ+μ)   λ/(λ+μ) ⎞
⎝ 1    1  ⎠ ⎝ 0  e^{−t(λ+μ)}   ⎠ ⎝ −μ/(λ+μ)   μ/(λ+μ) ⎠.

That is,

q(t; 0, 0) = μ/(λ+μ) + (λ/(λ+μ)) e^{−t(λ+μ)}, q(t; 0, 1) = 1 − q(t; 0, 0),

q(t; 1, 1) = λ/(λ+μ) + (μ/(λ+μ)) e^{−t(λ+μ)}, q(t; 1, 0) = 1 − q(t; 1, 1)

for t ≥ 0. Observe that

lim_{t→∞} Q(t) =
⎛ μ/(λ+μ)   λ/(λ+μ) ⎞   ⎛ π ⎞
⎝ μ/(λ+μ)   λ/(λ+μ) ⎠ = ⎝ π ⎠,

where π = ( μ/(λ+μ)   λ/(λ+μ) ). Thus, π is the unique stationary distribution for this chain.
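[As an illustrative numerical check, not in the original text: the closed-form Q(t) above can be compared with e^{At} computed by eigendecomposition, and π can be verified to satisfy πA = 0. The rate values below are arbitrary.]

```python
import numpy as np

lam, mu, t = 0.7, 1.3, 2.0
A = np.array([[-lam, lam],
              [mu, -mu]])

# Matrix exponential e^{At} via eigendecomposition (A is diagonalizable).
w, V = np.linalg.eig(A)
Q = (V @ np.diag(np.exp(w * t)) @ np.linalg.inv(V)).real

# Closed-form entries from the text.
e = np.exp(-t * (lam + mu))
q00 = mu / (lam + mu) + lam / (lam + mu) * e
q11 = lam / (lam + mu) + mu / (lam + mu) * e
Q_closed = np.array([[q00, 1.0 - q00],
                     [1.0 - q11, q11]])

pi = np.array([mu / (lam + mu), lam / (lam + mu)])  # stationary distribution
```

The two matrices agree to machine precision, each row of Q(t) sums to 1, and πA = 0, confirming stationarity.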

5.3 Birth-and-death Process [Lawler, Section 3.3.]

The infinitesimal generator of a birth-and-death process is the infinite matrix, with rows and columns indexed by the states 0, 1, 2, 3, . . . :

A =
⎛ −λ0      λ0        0         0        0   ⋅ ⋅ ⋅ ⎞
⎜  μ1   −λ1−μ1      λ1         0        0   ⋅ ⋅ ⋅ ⎟
⎜   0      μ2    −λ2−μ2       λ2        0   ⋅ ⋅ ⋅ ⎟
⎜   0       0       μ3     −λ3−μ3      λ3   ⋅ ⋅ ⋅ ⎟
⎝   ⋮       ⋮        ⋮          ⋮        ⋮     ⋱   ⎠.


The constants μi ≥ 0, i = 0, 1, 2, 3, . . . , represent the “death” rates at the various states of the process. They are intensities of “downward” transitions; note that we always have μ0 = 0. The constants λi ≥ 0, i = 0, 1, 2, 3, . . . , represent the “birth” rates at the various states of the process. They are intensities of “upward” transitions. Observe that the diagonal elements of the matrix A are non-positive, and that the rows of the matrix sum up to 0.

In each state i the process waits a random amount of time, τi, before the process “decides” to jump to either the higher state i + 1 or the lower state i − 1 [the latter possibility is valid only if i ≥ 1]. The waiting time τi has an exponential distribution with parameter λi + μi. Thus, the intensity (or rate) of a jump out of state i is λi + μi. Once the process decides to jump from state i, the probability of the jump up [to i + 1] is λi/(λi + μi), and the probability of the jump down [to i − 1] is μi/(λi + μi). Obviously, if λi + μi = 0 then the process never leaves state i with probability one [we deal here with a degenerate exponential distribution with all probability mass concentrated at ∞].

The Poisson process is a birth-and-death process for which μi = 0, i = 0, 1, 2, 3, . . . , and λi = λ, i = 0, 1, 2, 3, . . . , for some positive λ.
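[The jump-chain description above translates directly into a simulation: draw an exponential holding time with rate λi + μi, then jump up with probability λi/(λi + μi). The sketch below is not from the text; the specific rates used (constant births, deaths proportional to the population) are an arbitrary illustrative choice.]

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_bdp(birth, death, x0, horizon):
    """Simulate one path of a birth-and-death process up to time `horizon`.

    `birth(i)` and `death(i)` return the rates lambda_i and mu_i; death(0) must be 0.
    Returns a list of (jump time, new state) pairs."""
    t, state, path = 0.0, x0, []
    while True:
        rate = birth(state) + death(state)
        if rate == 0.0:                      # lambda_i + mu_i = 0: state is never left
            return path
        t += rng.exponential(1.0 / rate)     # exponential(lambda_i + mu_i) holding time
        if t > horizon:
            return path
        # Jump up with probability lambda_i / (lambda_i + mu_i), else down.
        if rng.random() < birth(state) / rate:
            state += 1
        else:
            state -= 1
        path.append((t, state))

# Illustrative rates: constant birth rate 2, death rate 1 per individual.
path = simulate_bdp(lambda i: 2.0, lambda i: 1.0 * i, x0=0, horizon=50.0)
```

Every recorded state is nonnegative (μ0 = 0 blocks downward jumps from 0), jump times increase, and each jump moves the state by exactly ±1.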


Homework 5: Poisson processes and Continuous-time Markov chains

1. Prove Proposition 5.1.

2. Let Xt be a Markov chain with state space {1, 2} and intensities α(1, 2) = 1, α(2, 1) = 4. Find Q(t).

3. Let X be a Poisson process with parameter λ = 2. Determine the following expectations:

(a) E(X2)

(b) E(X1²)

(c) E(X1X2)

(d) E(τ2), E(τ1)

(e) E(T2), where T2 denotes the second jump time of Xt.

(f) E(τ1τ2)

4. Lawler, Ex. 3.1. Let Xt = the number of calls arrived by time t. Thus, X is a Poisson process with rate λ = 4/hour. Compute:

(a) P(X1 < 2)

(b) P(X2 ≥ 8 ∣ X1 = 6)

5. Let X be a Poisson process with parameter λ. Is the process Yt = Xt² − λt a martingale with respect to the natural filtration of X?


Chapter 6

Brownian motion [Lawler, Chapter 8; Mikosch, Section 1.3; Shreve, Sections 3.3–3.7]

Up to now we have been dealing with stochastic processes taking at most a countable number of values. The one-dimensional Brownian motion (BM) process takes values in the entire real line and that is why it is successfully used to model certain types of random continuous motions. [See Mikosch, section 1.3, for graphs of simulated sample paths of a 1-d BM process.]

6.1 Definition and basic properties

Our intention is to model a “random continuous motion” that satisfies certain desirable physical postulates. [It is worth emphasizing that here the continuity is understood both in the time variable and the space variable.] Let Xt denote the position at time t ≥ 0 of our random process. Our postulates regarding Xt are as follows:

∙ X0 = 0.

∙ The random process has independent and time-homogeneous increments. That is, for any 0 ≤ s ≤ t ≤ u ≤ v the random variables Xt − Xs and Xv − Xu are independent. In addition, for any 0 ≤ s ≤ t the distribution of Xt − Xs depends only on the time differential t − s.

Recall that a Poisson process with rate λ satisfies the above two postulates. But it is an integer-valued process. Let us make instead the following additional postulate.

∙ The sample paths X·(ω) of our random process are continuous functions from [0, ∞) to the state space S = (−∞, ∞).

It turns out [compare Lawler p.143-144 (p.173-174)] that the above three postulates imply that for any 0 ≤ s ≤ t the distribution of the increment Xt − Xs must be Gaussian:

Xt − Xs ∼ N(μ(t − s), σ²(t − s))

for some constants μ and σ > 0 [we assume strict positivity of σ to avoid trivialities]. All this motivates the following


Definition 6.1. The stochastic process (Xt, t ≥ 0) is called Brownian motion with drift, or Wiener process with drift, starting at 0, if X0 = 0 and

1. Sample paths of Xt are continuous functions of t,

2. Xt has independent increments: for any sequence of times 0 ≤ t1 < t2 < ... < tn < ∞, the increments Xt1 − X0, Xt2 − Xt1, . . . , Xtn − Xtn−1 are independent random variables; for t ≥ s the distribution of Xt − Xs depends on t and s only through t − s,

3. For 0 ≤ s ≤ t,

Xt − Xs ∼ N(μ(t − s), σ²(t − s)).

It was indicated above that condition 3 in Definition 6.1 is implied by conditions 1 and 2 (assuming X0 = 0). Nevertheless, it is customary to include this condition as a part of the definition of the BM process. Note that in particular we have Xt ∼ N(μt, σ²t). If we change X0 = 0 into X0 = x in the definition, then Xt ∼ N(x + μt, σ²t). Here μ is called the drift parameter and σ² the variance (or diffusion) parameter.

Frequently, the definition of a BM process is formulated for the case μ = 0. This is the definition of the BM process given in Lawler p.144 (p.174). We have chosen to give the general (i.e. μ ∈ (−∞, ∞)) definition above. Such a BM process is called Brownian motion with drift in Lawler, section 8.7, and denoted by Yt, t ≥ 0.

Definition 6.2. When μ = 0 and σ² = 1, the process Xt, t ≥ 0, is called standard Brownian motion (SBM) and is often denoted by {Wt, t ≥ 0}. [Mikosch denotes the SBM by Bt, as is also done in some other texts.]

It can be shown that sample paths of a BM, though continuous, are nowhere differentiable [see Lawler p.145 (p.175), or Mikosch p. 36]. It can also be shown that sample paths of BM do not have bounded variation on any finite time interval [see Mikosch p. 39].

6.1.1 Random walk approximation

A random walk may serve as a “discrete-time prototype” of the BM process. As a matter of fact, the BM process can be constructed as an appropriate limit of random walk processes. Here is how [the construction is done for the case μ = 0 for simplicity; also, compare Lawler p.144-145 (p.174-175), or Mikosch section 1.3.3]. Let ε1, ε2, ... be i.i.d. with P(ε1 = ±1) = 1/2, and let

X^{h,k}_t = k(ε1 + ε2 + ... + ε_{[t/h]}),

where [x] denotes the greatest integer ≤ x. X^{h,k}_t can be interpreted as the time-t location of a particle executing a random walk with step size k and time unit h. We have

E(X^{h,k}_t) = 0, Var(X^{h,k}_t) = k²[t/h] ≈ k²t/h.

Let h, k → 0 in such a way that Var(X^{h,k}_t) converges to a finite, positive number. (Note that if we set k = h, then Var(X^{h,k}_t) → 0.) This can be accomplished by maintaining k² = σ²h, for a finite constant σ. In particular, by considering X^n = X^{1/n, σ/√n} we obtain Var(X^n_t) ≈ σ²t. We then have

X^n_t = (σ/√n)(ε1 + ε2 + ... + ε_{[nt]})
      = (σ/√n)(ε1 + ε2 + ... + ε_{[nt]}) ⋅ √[nt]/√[nt]
      = σ√t ⋅ ((ε1 + ε2 + ... + ε_{[nt]})/√[nt]) ⋅ √[nt]/√(nt)

for every n. We note that, as n → ∞, √[nt]/√(nt) → 1, and thus by the central limit theorem, as n → ∞,

X^n_t → N(0, σ²t) in distribution.

In addition one can show that all joint distributions of X^n converge to the corresponding joint normal distributions.
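[The scaling above can be illustrated numerically; this sketch is not from the text, and the particular σ, t, n are arbitrary. Sampling many copies of X^n_t shows its mean near 0 and its variance near σ²t.]

```python
import numpy as np

rng = np.random.default_rng(2)
sigma, t, n, n_paths = 1.5, 2.0, 400, 10000

# X^n_t = (sigma / sqrt(n)) * (eps_1 + ... + eps_[nt]) with P(eps = +/-1) = 1/2.
eps = rng.choice([-1.0, 1.0], size=(n_paths, int(n * t)))
Xn_t = sigma / np.sqrt(n) * eps.sum(axis=1)

mean_est = Xn_t.mean()        # should be ~ 0
var_est = Xn_t.var()          # should be ~ sigma^2 * t = 4.5
```

The empirical variance matches σ²t exactly in expectation here, since [nt] = nt for the chosen values.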

6.1.2 Second order properties

We know that Xt ∼ N(μt, σ²t). We now consider the joint probability density function of (Xt1, Xt2, . . . , Xtn). Recall that the random variables Xt1, Xt2, . . . , Xtn are said to have a joint normal distribution if they can be represented as

Xti = Σ_{j=1}^m ai,j εj, i = 1, 2, . . . , n,

where εj, 1 ≤ j ≤ m, are independent normal random variables and the ai,j are arbitrary constants. For Brownian motion, we have

Xt1 = Xt1 − X0
Xt2 = (Xt2 − Xt1) + (Xt1 − X0)
. . .
Xtn = (Xtn − Xtn−1) + . . . + (Xt1 − X0)

where, by definition of BM, the increments are independent normal random variables. Hence the distribution of (Xt1, Xt2, . . . , Xtn) is multivariate normal with EXtj = μtj, 1 ≤ j ≤ n, and with covariance matrix C = (Ci,j) given by

Ci,j = Cov(Xti, Xtj) = σ² min(ti, tj).

We verify the last statement. Assume that s < t. Then

Cov(Xs, Xt) = E(XsXt) − (EXs)(EXt)
= E[Xs(Xt − Xs)] + EXs² − μ²st
= EXs E(Xt − Xs) + σ²s + μ²s² − μ²st
= μ²s(t − s) + σ²s + μ²s² − μ²st
= σ²s = σ² min(s, t).


6.2 Markov properties

Brownian motion, as a process with independent increments, is a Markov process. This canbe verified as follows:

P(Xt+s ≤ y ∣ Xs = x, Xt1 = x1, . . . , Xtn = xn)
= P(Xt+s − Xs ≤ y − x ∣ Xs = x, Xt1 = x1, . . . , Xtn = xn)
= P(Xt+s − Xs ≤ y − x) = P(Xt+s ≤ y ∣ Xs = x),

where 0 ≤ t1 < t2 < ... < tn < s. Let

q(t; x, y) ≡ ∂y P(Xt+s ≤ y ∣ Xs = x) = ∂y P(Xt+s − Xs ≤ y − x)
= (1/(σ√(2πt))) exp[−(y − x − μt)²/(2σ²t)],

where the first and second equality follow from the first and second defining property of BM, respectively. The function q(t; x, y) is the probability density function of Xt+s given that Xs = x. It is called the transition density function of (Xt, t ≥ 0).

Note that q(t; x, y) depends on x, y only through (y − x). Therefore BM is a spatially homogeneous process as well as a time-homogeneous process. You may recall that analogous properties were satisfied for a Poisson process.

Remark 6.3. One has the following two properties of the function q(t; x, y), where ∂z ≡ ∂/∂z and ∂²_{z²} ≡ ∂²/∂z² for every variable z.

(i) For every x ∈ (−∞, ∞),

∂t q(t; x, y) = A∗q(t; x, y), ∀t ≥ 0, y ∈ (−∞, ∞),

where

A∗q(t; x, y) = −μ ∂y q(t; x, y) + (1/2)σ² ∂²_{y²} q(t; x, y). (6.1)

The operator A∗ is the adjoint infinitesimal generator for BM. Equation (6.1) is called the forward Kolmogorov equation for the transition probability density function. [Compare with equation (5.2) in the case of a continuous-time Markov chain.]

(ii) For every y ∈ (−∞, ∞),

∂t q(t; x, y) = Aq(t; x, y), ∀t ≥ 0, x ∈ (−∞, ∞), (6.2)

where

Aq(t; x, y) = μ ∂x q(t; x, y) + (1/2)σ² ∂²_{x²} q(t; x, y).

The operator A is the infinitesimal generator for BM. Equation (6.2) is called the backward Kolmogorov equation for the transition probability density function. [Compare with equation (5.3) in the case of a continuous-time Markov chain.]

The BM process also satisfies the so-called strong Markov property, namely the fact that Bt = Wt+τ − Wτ is a Brownian motion independent of ℱτ, for every stopping time τ [see Lawler p.147 (p.178)]. Using this property the following three important features of the SBM process (i.e. Xt = Wt) can be demonstrated [see Lawler p.148-150 (p.178-180)]:


Reflection Principle For any b > 0 and for any t > 0

P (Ws ≥ b for some 0 ≤ s ≤ t) = 2P (Wt ≥ b).

Equivalently,

P(τb < t) = 2P(Wt ≥ b),

where τb = inf{s ≥ 0 : Ws = b}.

Arctan law and recurrence For any t > 1,

P(Ws = 0 for some 1 ≤ s ≤ t) = 1 − (2/π) Arctan(1/√(t − 1)).

Consequently,

P(Ws = 0 for some s ≥ 1) = 1.

Strong law of large numbers With probability 1 we have

lim_{t→∞} Wt/t = 0.


Homework 6: Brownian motion

1. Suppose Wt is a standard Brownian motion and Bt = (1/√a) Wat, with a > 0. Show that Bt is a standard Brownian motion (known as a time-rescaled standard Brownian motion).

2. Suppose Wt is a standard Brownian motion and Zt = tW1/t for t > 0, with Z0 = 0. Show that Zt is a standard Brownian motion (known as a time-reversed standard Brownian motion).

3. Let Wt be a standard Brownian motion. Compute the following conditional probability: P(W2 > 0 ∣ W1 > 0). Are the events {W1 > 0} and {W2 > 0} independent?

4. Let Wt be a standard Brownian motion.

(a) Express the joint density function of W2, W4, W6 in terms of the transition density function of (Wt, t ≥ 0).

(b) Compute the probability density function of W4 conditional on W2 = 0 and W6 = 0.

(c) Compute E(W6∣W2,W4).

(d) Compute E(W6W2W4).


6.3 Martingale methods

Let Wt be a standard BM with W0 = 0, and let

τa = min{t ≥ 0 : Wt = a}, a ≠ 0.

Assume a > 0, and let F(t) = P(τa ≤ t). We have

P(Wt > a) = P(Wt > a, τa ≤ t) = ∫₀ᵗ P(Wt > a ∣ τa = s) dF(s)
= ∫₀ᵗ P(Wt > a ∣ Ws = a, (Wu < a, u < s)) dF(s)
= ∫₀ᵗ P(Wt − Ws > 0 ∣ Ws = a, (Wu < a, u < s)) dF(s)
= ∫₀ᵗ P(Wt − Ws > 0) dF(s) = (1/2) P(τa ≤ t),

where the next-to-last equality follows by the independent increments property and the last one by the distributional properties of BM. Hence

P(τa ≤ t) = 2P(Wt > a) = (2/√(2πt)) ∫_a^∞ exp(−x²/2t) dx = (2/√(2π)) ∫_{a/√t}^∞ exp(−y²/2) dy,

where in the last step we substituted y = x/√t. [Note that we have just demonstrated the so-called reflection principle for the standard BM process.] Thus the probability density function of τa is

f(t) = (d/dt) P(τa ≤ t) = −(2/√(2π)) exp(−a²/2t) (d/dt)(a/√t) = (a/√(2π)) t^{−3/2} exp(−a²/2t), t > 0, a > 0.

For a < 0, by symmetry,

f(t) = (−a/√(2π)) t^{−3/2} exp(−a²/2t),

so that

f(t) = (∣a∣/√(2π)) t^{−3/2} exp(−a²/2t), a ≠ 0, t > 0,

is the pdf of τa. It is called an inverse Gaussian density. We have

P(τa < ∞) = lim_{t→∞} P(τa ≤ t) = lim_{t→∞} (2/√(2π)) ∫_{a/√t}^∞ exp(−y²/2) dy = 1,

and Eτa = ∞. This shows that standard BM is a null recurrent process [it behaves like a driftless random walk on the integers, see Lawler].

Let Mt = max_{0≤s≤t} Ws, the maximum of BM on the time interval [0, t]. We can now easily obtain the distribution of Mt:

P(Mt > a) = P(max_{0≤s≤t} Ws > a) = P(τa ≤ t) = (2/√(2π)) ∫_{a/√t}^∞ exp(−y²/2) dy.
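[The identity P(Mt > a) = 2P(Wt > a) can be checked by simulating discretized SBM paths; this sketch is not from the text, and the slight downward bias of the discrete-grid maximum relative to the continuous one is expected.]

```python
import math
import numpy as np

rng = np.random.default_rng(3)
t, a, n_steps, n_paths = 1.0, 0.8, 1000, 5000

dt = t / n_steps
dW = rng.normal(0.0, math.sqrt(dt), size=(n_paths, n_steps))
W = np.cumsum(dW, axis=1)

M_t = W.max(axis=1)                   # running maximum (on the discrete grid)
lhs = (M_t > a).mean()                # estimates P(M_t > a) = P(tau_a <= t)
rhs = 2.0 * (W[:, -1] > a).mean()     # estimates 2 P(W_t > a)

# Exact value 2 (1 - Phi(a / sqrt(t))) using the standard normal cdf via erf.
exact = 2.0 * (1.0 - 0.5 * (1.0 + math.erf(a / math.sqrt(2.0 * t))))
```

The two Monte Carlo estimates and the exact value agree up to sampling and discretization error.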


6.3.1 Martingales associated with Brownian motion

Let (Xt, t ≥ 0) be BM with μ = 0 and variance parameter σ². Then the following processes are martingales:

a. Xt

b. Mt = Xt² − σ²t

c. Zt = exp(θXt − (1/2)θ²σ²t), θ ∈ ℝ.

To verify that the above processes are martingales we write Xt = (Xt − Xs) + Xs, and use the fact that BM has independent increments. As an example we verify that Xt, t ≥ 0, is a martingale. We have

E(Xt ∣ ℱs) = E(Xt − Xs + Xs ∣ ℱs) = E(Xt − Xs) + Xs = Xs,

and E∣Xt∣ < ∞, since Xt is a normal random variable. Hence indeed (Xt, t ≥ 0) is a martingale.

Let now Xt, t ≥ 0, be BM with parameters μ ≠ 0 and σ². Then the following processes are martingales:

d. Yt = Xt − μt

e. Mt = (Xt − μt)² − σ²t

f. Zt = exp(θXt − (θμ + (1/2)θ²σ²)t), θ ∈ ℝ.

The processes in c. and f. are Wald’s martingales (recall the structure of Wald’s martingale in discrete time) since in both cases Zt = exp(θXt)/E[exp(θXt)].
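[A quick simulation check of the Wald martingale in f., not in the original notes: since EZt = EZ0 = 1, the sample mean of Zt over many paths should be close to 1. The parameter values are arbitrary.]

```python
import numpy as np

rng = np.random.default_rng(4)
mu, sigma, theta, t, n_paths = 0.5, 1.2, 0.7, 1.0, 200000

# X_t = mu*t + sigma*W_t with X_0 = 0; sample X_t directly from its N(mu*t, sigma^2*t) law.
X_t = mu * t + sigma * np.sqrt(t) * rng.normal(size=n_paths)

Z_t = np.exp(theta * X_t - (theta * mu + 0.5 * theta**2 * sigma**2) * t)
mean_Z = Z_t.mean()   # E Z_t = E Z_0 = 1 for Wald's martingale
```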

6.3.2 Exit time from a corridor

Let Xt be BM with parameters μ and σ², and X0 = x. Then for a, b ∈ ℝ,

τa = min{t ≥ 0 : Xt = a}, τb = min{t ≥ 0 : Xt = b}, and τa,b = τ = min{t ≥ 0 : Xt = a or Xt = b}

are stopping times (hitting times of given levels and the exit time from a corridor) relative to (ℱt, t ≥ 0), where ℱt = σ(Xs, s ≤ t). Let us now use the OST to compute P(Xτ = b) and Eτ (given the initial condition x of X). We omit the verification of the assumptions of the OST, as these can be verified similarly as for random walks (see Chapter 2).

Case 1: μ = 0. In this case (Xt, t ≥ 0) is a martingale. By the OST,

EXτ = EX0 = x.

But

EXτ = bP(Xτ = b) + aP(Xτ = a) = x.


Solving the last equation for P(Xτ = b) = 1 − P(Xτ = a) gives

P(Xτ = b) = (x − a)/(b − a), a ≤ x ≤ b.

To compute Eτ we use the martingale in point b. of Subsection 6.3.1. By the OST,

EMτ = EXτ² − σ²Eτ = x²,

so that

Eτ = (EXτ² − x²)/σ² = [((x − a)/(b − a))b² + ((b − x)/(b − a))a² − x²]/σ² = (x − a)(b − x)/σ².

Case 2: μ ≠ 0. To compute P(Xτ = b) we apply the OST to the martingale in point f. of Subsection 6.3.1:

Zt = exp(θXt − (θμ + (1/2)θ²σ²)t),

with θ = θ∗ := −2μ/σ². With this choice of θ,

Zt = exp(−(2μ/σ²)Xt) = exp(θ∗Xt).

By the OST,

EZτ = EZ0 = exp(θ∗x).

Solving

EZτ = exp(θ∗b)P(Xτ = b) + exp(θ∗a)P(Xτ = a) = exp(θ∗x)

for P(Xτ = b) = 1 − P(Xτ = a) gives

P(Xτ = b) = (exp(θ∗x) − exp(θ∗a))/(exp(θ∗b) − exp(θ∗a)).

It now follows from the above equation that

P(τb < ∞) = lim_{a→−∞} P(Xτ = b) =
{ 1 if μ > 0, hence θ∗ < 0,
{ exp[θ∗(x − b)] < 1 if μ < 0, hence θ∗ > 0.

Similarly,

P(τa < ∞) = lim_{b→∞} P(Xτ = a) = lim_{b→∞} (exp(θ∗b) − exp(θ∗x))/(exp(θ∗b) − exp(θ∗a)) =
{ exp[θ∗(x − a)] < 1 if μ > 0, hence θ∗ < 0,
{ 1 if μ < 0, hence θ∗ > 0.

These results show that BM with μ ≠ 0 is a transient process. We shall now compute Eτ. For this we use the martingale

Yt ≡ Xt − μt, t ≥ 0.


By the OST we have

EYτ = EXτ − μEτ = x,

so that

Eτ = (EXτ − x)/μ,

where

EXτ = a (exp(θ∗b) − exp(θ∗x))/(exp(θ∗b) − exp(θ∗a)) + b (exp(θ∗x) − exp(θ∗a))/(exp(θ∗b) − exp(θ∗a)),

and, as before, θ∗ = −2μ/σ². Suppose now that we let a → −∞ and assume that μ > 0, hence θ∗ < 0. Then

lim_{a→−∞} EXτ = b,

which shows that

Eτ = Eτb = (b − x)/μ.
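[The exit probability formula of Case 2 can be checked against a crude Euler simulation of the drifted BM; this sketch is not from the text, the parameter values are arbitrary, and the discrete time step introduces a small barrier-monitoring bias.]

```python
import numpy as np

rng = np.random.default_rng(5)
mu, sigma, x, a, b = 0.3, 1.0, 0.0, -1.0, 2.0
dt, n_paths = 0.002, 5000

# Euler walk until every path leaves the corridor (a, b).
X = np.full(n_paths, x)
alive = np.ones(n_paths, dtype=bool)
hit_b = np.zeros(n_paths, dtype=bool)
while alive.any():
    X[alive] += mu * dt + sigma * np.sqrt(dt) * rng.normal(size=alive.sum())
    up = alive & (X >= b)
    down = alive & (X <= a)
    hit_b |= up
    alive &= ~(up | down)

p_b_mc = hit_b.mean()

theta = -2.0 * mu / sigma**2   # theta* from the text
p_b = (np.exp(theta * x) - np.exp(theta * a)) / (np.exp(theta * b) - np.exp(theta * a))
```

The Monte Carlo exit probability agrees with the closed-form P(Xτ = b) within the combined sampling and discretization error.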


6.3.3 Laplace transform of the first passage time of a drifted Brownian motion

Let now τa be the first passage time to level a of BM Xt with drift μ > 0 and variance parameter σ². We assume X0 = x < a, hence Eτa < +∞. Recall that

Zt = exp(θXt − (θμ + (1/2)θ²σ²)t), θ ∈ ℝ,

is a martingale with

EZt = EZ0 = exp(θx).

Applying the OST to Zt with the stopping time τa, we obtain

EZτa = exp(θx),

or

E[exp(θa − (θμ + (1/2)θ²σ²)τa)] = exp(θx),

so that

E exp{−(θμ + (1/2)θ²σ²)τa} = exp[θ(x − a)]. (6.3)

Let

α = θμ + (1/2)θ²σ².

We require that α > 0, so that E exp(−ατa) is the Laplace transform of τa. Solving the last equation for θ gives

θ± = (−μ ± √(μ² + 2σ²α))/σ².

Taking the positive root θ = θ+ (this guarantees that θ > 0), and substituting it into the right-hand side of (6.3), gives the Laplace transform of τa as

E exp(−ατa) = exp[−(a − x)(√(μ² + 2σ²α) − μ)/σ²].

This transform can be inverted to obtain the pdf of τa:

f(t) = ((a − x)/(σ√(2πt³))) exp[−(a − x − μt)²/(2σ²t)], t > 0.

Note that if x = 0, μ = 0, and σ² = 1, we formally recover the pdf of τa obtained before for the standard BM. However, the assumptions of the OST are not satisfied in the case μ = 0, since in that case Eτa = +∞. Obviously the Laplace transform can be used to obtain moments of τa. We have

(d/dα) E exp(−ατa) = E (d/dα) exp(−ατa) = −E(τa exp(−ατa)),

so that

(d/dα) E exp(−ατa) ∣_{α=0} = −Eτa.

Carrying out the computation on the left side of the above equation gives the result obtained previously, that is,

Eτa = (a − x)/μ.

Higher moments of τa can be obtained by further differentiation.
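[The transform pair can be verified numerically, outside the text: integrating e^{−αt} against the inverse Gaussian pdf above should reproduce the closed-form Laplace transform, and the pdf should integrate to P(τa < ∞) = 1 since μ > 0. The parameter values are arbitrary.]

```python
import numpy as np

mu, sigma, x, a, alpha = 0.8, 1.0, 0.0, 1.5, 0.5

# Closed-form Laplace transform of tau_a from the text.
lt_closed = np.exp(-(a - x) * (np.sqrt(mu**2 + 2 * sigma**2 * alpha) - mu) / sigma**2)

def trapezoid(y, grid):
    """Simple trapezoidal rule (written out to avoid numpy version differences)."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(grid)))

# Numerical check: integrate exp(-alpha*t) against the inverse Gaussian pdf of tau_a.
t = np.linspace(1e-8, 200.0, 2_000_001)
pdf = (a - x) / (sigma * np.sqrt(2 * np.pi * t**3)) \
      * np.exp(-(a - x - mu * t)**2 / (2 * sigma**2 * t))
lt_num = trapezoid(np.exp(-alpha * t) * pdf, t)
total_mass = trapezoid(pdf, t)   # P(tau_a < infinity), equal to 1 here since mu > 0
```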


Homework 7: Continuous-time optional stopping theorem

1. Two independent Brownian motions X¹t and X²t, with drift parameters μ1 and μ2, respectively, where μ1 ≤ μ2, and the same variance parameter σ², start out at positions x1 and x2, respectively, where x1 < x2. Calculate the probability that they will never meet.
Hint: Consider the process X²t − X¹t and notice that it is also a Brownian motion. Then consider separately the two cases μ1 = μ2 and μ1 < μ2.

2. Let Xt, t ≥ 0, be a Brownian motion with drift parameter μ > 0 and variance parameter σ². Let

ϑ = min{t > 0 : Xt = a},

where X0 = x < a. Compute Eϑ and Var(ϑ) using the Optional Stopping Theorem.


6.4 Geometric Brownian motion [see also Mikosch, Example 1.3.8]

Definition 6.4. Let Xt be a Brownian motion with parameters μ, σ², and X0 = x. The process

St = exp(Xt) = exp(x + μt + σWt)

is called geometric Brownian motion (GBM).

Note that the state space of (St, t ≥ 0) is S = (0, ∞). Let

0 = t0 < t1 < t2 < ... < tn < ∞

be an increasing sequence of times, and consider the relative changes

(St1 − St0)/St0, (St2 − St1)/St1, . . . , (Stn − Stn−1)/Stn−1,

which can be expressed as

exp(Xt1 − Xt0) − 1, exp(Xt2 − Xt1) − 1, . . . , exp(Xtn − Xtn−1) − 1,

from which we see that for GBM, relative changes in disjoint time intervals are independent random variables. St is also called a lognormal process, and is often used to model prices of financial assets. Modeling prices with GBM involves the assumption that returns are independent from period to period.

We compute ESt and Var(St). To this end we recall that the moment generating function M(θ) of ε ∼ N(m, v) is

M(θ) = E exp(θε) = exp(θm + θ²v/2).

In particular, setting θ = 1, we obtain

E exp(ε) = exp(m + v/2).

In our case, Xt ∼ N(x + μt, σ²t). Hence

ESt = E exp(Xt) = exp(x + μt + (1/2)σ²t).

We can show in a similar way that

Var(St) = exp(2x + 2μt + σ²t)(exp(σ²t) − 1),

and also that the mean and the variance of the return are

E[(St − Ss)/Ss] = exp[μ(t − s) + (1/2)σ²(t − s)] − 1,

Var[(St − Ss)/Ss] = exp[2μ(t − s) + σ²(t − s)]{exp[σ²(t − s)] − 1}.
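[A simulation sketch, not in the original text, checking the GBM mean and variance formulas above; the parameter values are arbitrary.]

```python
import numpy as np

rng = np.random.default_rng(6)
x, mu, sigma, t, n_paths = 0.0, 0.1, 0.3, 1.0, 400000

W_t = np.sqrt(t) * rng.normal(size=n_paths)
S_t = np.exp(x + mu * t + sigma * W_t)     # GBM at time t

mean_mc, var_mc = S_t.mean(), S_t.var()
mean_th = np.exp(x + mu * t + 0.5 * sigma**2 * t)
var_th = np.exp(2 * x + 2 * mu * t + sigma**2 * t) * (np.exp(sigma**2 * t) - 1.0)
```

The Monte Carlo mean and variance match the lognormal formulas up to sampling error.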


Part III

Elements of stochastic analysis


Chapter 7

Stochastic integration [Lawler,Chapter 9; Mikosch, Chapter 2;Shreve, Sections 4.2 and 4.3 ]

Our purpose in this chapter is to give an overview of the basics of stochastic calculus. Stochastic calculus is one of the mathematical tools used in engineering (e.g. control engineering), in the modern finance industry, in the modern insurance industry, and in modern management science, among others.

We begin with the study of stochastic integrals with respect to an SBM process. We shall proceed in three stages. First, in Section 7.1, we shall define and analyze stochastic “integrals” with respect to a discrete-time symmetric random walk. Next, in Section 7.2, we shall define and analyze stochastic integrals of random step functions with respect to a BM process. Lastly, in Section 7.3, we shall generalize the results of Section 7.2 to stochastic integrals of general stochastic integrands with respect to an SBM process. We shall indicate that the important properties of stochastic integrals derived in Sections 7.1 and 7.2 carry over to the general case studied in Section 7.3. Section 7.4 discusses integration with respect to a Poisson process. Finally, Section 7.5 provides a glimpse of a more general semimartingale integration theory.

7.1 Integration with respect to symmetric random walk

Recall that we constructed a discrete-time stochastic integral with respect to a symmetric random walk in Example 3.4. We called it then a martingale transform of the process Sn by the betting process Bn. For convenience we repeat here the content of Example 3.4. We have, for n = 0, 1, 2, . . . ,

Sn = x + ε1 + ⋅ ⋅ ⋅ + εn,

where the εn are i.i.d. random variables with P(εn = −1) = P(εn = 1) = 1/2 for n ≥ 1.

We know that the symmetric random walk Sn is a martingale with respect to the filtration ℱn = σ(ε1, . . . , εn) = σ(S0, S1, ⋅ ⋅ ⋅ , Sn). We saw that for n ≥ 0 the gambler’s fortune process Yn could be represented as a discrete-time stochastic integral

Yn = Σ_{k=1}^n Bk ΔSk, n = 1, 2, 3, . . . ,


where ΔSk = Sk − Sk−1. Recall that the process Bn was supposed to be predictable with respect to the filtration ℱn, that is, for every n ≥ 1 we have that Bn is ℱn−1-measurable.

Properties enjoyed by the stochastic integral Yn = Σ_{k=1}^n Bk ΔSk:

(i) We verified in Example 3.3 that Yn, n ≥ 0, is a martingale with respect to the filtration ℱn;

(ii) EYn = 0 for every n ≥ 0. Here is why:

EYn = E[Σ_{k=1}^n Bk ΔSk] = Σ_{k=1}^n E[Bk ΔSk] = Σ_{k=1}^n E[E[Bk ΔSk ∣ ℱk−1]] = Σ_{k=1}^n E[Bk E[εk ∣ ℱk−1]] = Σ_{k=1}^n E[Bk Eεk] = 0.

Of course, a much faster proof is possible: due to the martingale property we have EYn = EY0 = 0.

(iii) Var(Yn) = EYn² = Σ_{k=1}^n EBk² for every n ≥ 1. See Lawler p.164 (p.199).
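[Properties (ii) and (iii) can be checked by simulation; this sketch is not from the text, and the particular betting rule below (bet 1 when the walk is currently nonnegative, else bet 2) is an arbitrary illustrative choice of a predictable strategy.]

```python
import numpy as np

rng = np.random.default_rng(7)
n, n_paths = 50, 100000

eps = rng.choice([-1.0, 1.0], size=(n_paths, n))
S = np.cumsum(eps, axis=1)                    # symmetric random walk (x = 0)

# Predictable bets: B_k may depend only on S_0, ..., S_{k-1}.
prev = np.concatenate([np.zeros((n_paths, 1)), S[:, :-1]], axis=1)  # S_{k-1}
B = np.where(prev >= 0, 1.0, 2.0)

Y_n = (B * eps).sum(axis=1)                   # Y_n = sum_k B_k * Delta S_k

mean_Y = Y_n.mean()                           # property (ii): E Y_n = 0
var_Y = Y_n.var()                             # property (iii): equals sum_k E B_k^2
var_th = (B**2).mean(axis=0).sum()            # Monte Carlo estimate of sum_k E B_k^2
```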

7.2 The Itô stochastic integral for simple processes

Definition 7.1. A stochastic process Zt, t ∈ [0, T ], is called a simple process if it satisfies the following properties:

∙ There exists a partition

τn : 0 = t0 < t1 < ⋅ ⋅ ⋅ < tn−1 < tn = T,

and a sequence of random variables Z1, Z2, . . . , Zn such that

Zt = { Zi if ti−1 ≤ t < ti, i = 1, 2, 3, . . . , n,
     { Zn if t = T.

∙ The sequence (Zi, i = 1, 2, . . . , n) is (ℱti−1, i = 1, 2, . . . , n)-adapted. That is, Zi is a function of the SBM up to time ti−1. Moreover, EZi² is finite.

We can now define the Itô stochastic integral for simple processes Z:

Definition 7.2. The Itô¹ stochastic integral of a simple process Z on the interval (0, t], where ti ≤ t < ti+1, is given by [a random Riemann-Stieltjes sum]

Y^n_t = ∫₀ᵗ Zs dWs = Σ_{k=1}^i Zk ΔWtk + Zi+1(Wt − Wti), (7.1)

where ΔWtk = Wtk − Wtk−1, and where, for i = 0, Σ_{k=1}^0 Zk ΔWtk = 0.

¹The pioneering work is the paper by Kiyoshi Itô (1915-2008): Stochastic Integral, Proc. Imperial Acad. Tokyo, 20, 519-524, 1944.


We can think of the SBM Wt as of a symmetric random walk continuous in time and space. Suppose now that a gambler may place bets depending only on the history of the SBM. The bets may be placed only at a certain finite set of times 0 = t0 < t1 < ⋅ ⋅ ⋅ < tn−1 < tn = T. A bet Zi placed at time ti−1 may only depend on the history of the SBM up to time ti−1. The game is stopped at time t. If the player bets Zi at time ti−1 then the player receives Zi ΔWti at time ti if ti < t, and the player receives Zi(Wt − Wti−1) if ti−1 ≤ t < ti. Then the integral Y^n_t = ∫₀ᵗ Zs dWs represents the player’s fortune at time t in such a game.

The Itô stochastic integral of Z is then defined likewise on any interval (r, t] with 0 ≤ r ≤ t. When considered as a function of t, the Itô stochastic integral Y^n_t of the simple process Z is a stochastic process.

Properties enjoyed by the Itô stochastic integral for simple processes Z:

(i) The Itô stochastic integral for simple processes Z is a martingale. We check now the two martingale conditions:

∙ We have E∣Y^n_t∣ < ∞ for all t ∈ [0, T ]. This follows from the isometry property (iii) below.

∙ We have E(Y^n_t ∣ ℱs) = Y^n_s for every 0 ≤ s ≤ t ≤ T. To demonstrate this we first take ti ≤ s ≤ t ≤ ti+1. In this case we have

Y^n_t = Y^n_s + Zi+1(Wt − Ws).

Thus, since both Y^n_s and Zi+1 are ℱs-measurable [why?], we have

E(Y^n_t ∣ ℱs) = Y^n_s + Zi+1 E(Wt − Ws ∣ ℱs) = Y^n_s + Zi+1 E(Wt − Ws) = Y^n_s,

where the second equality follows due to the independent increments property of the SBM.

Exercise 1. Verify that E(Y^n_t ∣ ℱs) = Y^n_s is true for ti ≤ s ≤ ti+1 and tk ≤ t ≤ tk+1 where ti+1 ≤ tk.

(ii) EY^n_t = 0 for every t ∈ [0, T ].

Exercise 2. Verify property (ii).

(iii) (Isometry property) We have that

Var(Y^n_t) = E[(Y^n_t)²] = ∫₀ᵗ EZs² ds, ∀t ∈ [0, T ].

Exercise 3. Verify property (iii). [Hint: See Lawler p.166-167 (p.202-203) or Mikosch p.106.]

(iv) (Linearity with respect to integrands) Let Zt and Ut be two simple processes, and let a, b be two constants. Then

∫₀ᵗ (aZs + bUs) dWs = a ∫₀ᵗ Zs dWs + b ∫₀ᵗ Us dWs, ∀t ∈ [0, T ].

This property follows immediately from the linearity property of summation.


(v) (Linearity on adjacent intervals) Let 0 ≤ r ≤ t ≤ T. Then

∫ᵣᵗ Zs dWs = ∫₀ᵗ Zs dWs − ∫₀ʳ Zs dWs.

Exercise 4. Verify property (v). [Hint: See Mikosch p. 107.]

(vi) The sample paths of the process Y^n_t are continuous. This follows since the sample paths of Wt are continuous and we have

Y^n_t = Y^n_{ti−1} + Zi(Wt − Wti−1), ti−1 ≤ t ≤ ti.

Example 7.3. This is a simple, but important, example, discussed in detail in Mikosch, section 2.2.1. Take Zi = Wti−1. Here we have, for t = ti,

Y^n_t = Σ_{k=1}^i Wtk−1 ΔWtk = Σ_{k=1}^i Wtk−1 (Wtk − Wtk−1) = (1/2)Wt² − (1/2) Σ_{k=1}^i (ΔWtk)².

It is demonstrated in Mikosch (p.98) that when the partition τn becomes finer and finer [i.e. mesh(τn) := max_{i=1,2,...,n} [ti − ti−1] → 0] then the sum Σ_{k=1}^i (ΔWtk)² converges to t in the mean square sense. This is a very important observation, as you will see later.
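[Both facts in Example 7.3 — the algebraic identity for the discrete Itô sum and the mean-square convergence of Σ(ΔWtk)² to t — can be illustrated on simulated paths; this sketch is not from the text.]

```python
import numpy as np

rng = np.random.default_rng(8)
t, n, n_paths = 1.0, 5000, 2000

dW = rng.normal(0.0, np.sqrt(t / n), size=(n_paths, n))   # Brownian increments
W = np.cumsum(dW, axis=1)
W_prev = np.concatenate([np.zeros((n_paths, 1)), W[:, :-1]], axis=1)  # W_{t_{k-1}}

ito_sum = (W_prev * dW).sum(axis=1)          # sum_k W_{t_{k-1}} * Delta W_{t_k}
quad_var = (dW**2).sum(axis=1)               # sum_k (Delta W_{t_k})^2, ~ t
identity_rhs = 0.5 * W[:, -1]**2 - 0.5 * quad_var

mse_qv = ((quad_var - t)**2).mean()          # mean-square distance of the QV sum from t
```

The identity holds path by path up to rounding, and the mean-square error of the quadratic variation sum is of order 2t²/n, which vanishes as the partition is refined.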

Exercise 5. Suppose that the simple integrand process Z is deterministic. That is, suppose that Z1, Z2, . . . , Zn are constants. Verify that in this case the stochastic integral Y^n_t is a random variable that has a normal distribution with mean zero and variance ∫₀ᵗ (Zs)² ds.

7.3 The general Itô stochastic integral

The general Itô stochastic integral for an appropriately regular integrand process Zt, t ∈ [0, T ], is defined as the mean square limit of a sequence of Itô stochastic integrals for simple processes Z^n_t, t ∈ [0, T ]. The processes Z^n_t, t ∈ [0, T ], are chosen in such a way that they converge to the process Zt, t ∈ [0, T ], in an appropriate sense. This was the main idea of K. Itô. Here are some details.

Lemma 7.1. Let Z be a process satisfying the following assumptions:

∙ Z is adapted to the SBM on [0, T ], that is, for every t ∈ [0, T ], the random variable Zt is a function of Ws, 0 ≤ s ≤ t.

∙ The integral ∫₀ᵀ EZs² ds is finite.

Then, there exists a sequence {(Z^n_t, t ≥ 0), n = 1, 2, 3, . . . } of simple processes so that

∫₀ᵀ E[Zs − Z^n_s]² ds → 0

as n → ∞.


Proof. See Mikosch, Appendix A4. The construction of an approximating sequence {(Z^n_t, t ≥ 0), n = 1, 2, 3, . . . } of simple processes involves making the partitions τn finer and finer. □

Now, we already know how to evaluate the Itô stochastic integral for each simple process Z^n in the sequence above. Let us denote by Y^n_t the Itô stochastic integral of Z^n on the interval [0, t]. From general results in the area of functional analysis it follows that there exists a process Yt, t ∈ [0, T ], so that

E sup_{0≤t≤T} [Yt − Y^n_t]² → 0

as n → ∞. We say that the sequence of processes Y^n converges in the mean square to the process Y. Moreover, one can prove that the limit Y does not depend on the choice of a sequence Z^n of simple processes approximating Z. As a consequence we can state the following

Definition 7.4. The mean square limit process Y is called the Itô stochastic integral of Z, and it is denoted by

Yt = ∫₀ᵗ Zs dWs, t ∈ [0, T ]. (7.2)

If Z is a simple process then the Itô stochastic integral of Z is given by the Riemann-Stieltjes sum (7.1). Mikosch uses the notation It(Z) to denote the Itô stochastic integral of Z.

The Itô stochastic integral of Z can be defined likewise on any interval (r, t] with 0 ≤ r ≤ t.

Properties enjoyed by the (general) Itô stochastic integral. All the properties enjoyed by the Itô integral of simple processes are inherited by the general Itô stochastic integral Y = Y(Z). Thus,

(i) The (general) Itô stochastic integral of a process Z is a martingale with respect to the natural filtration of the SBM.

(ii) EYt = 0 for every t ∈ [0, T ].

(iii) (Isometry property) We have that

Var(Yt) = EYt² = ∫₀ᵗ EZs² ds, ∀t ∈ [0, T ].

(iv) (Linearity with respect to integrands) Let Zt and Ut be two admissible integrands, and let a, b be two constants. Then

∫₀ᵗ (aZs + bUs) dWs = a ∫₀ᵗ Zs dWs + b ∫₀ᵗ Us dWs, ∀t ∈ [0, T ].

(v) (Linearity on adjacent intervals) Let 0 ≤ r ≤ t ≤ T. Then

∫ᵣᵗ Zs dWs = ∫₀ᵗ Zs dWs − ∫₀ʳ Zs dWs.

(vi) The sample paths of the process Yt are continuous.


The definition of the general Itô stochastic integral can be extended to the case of T = ∞. Suppose that the integrand process Z is deterministic. In this case the stochastic integral Yt is a random variable that has a normal distribution with mean zero and variance ∫₀ᵗ (Zs)² ds.

Example 7.5. (This is a continuation of Example 7.3) From the above results it follows that the mean square limit of the stochastic integrals

Y_t^n = Σ_{k=1}^i W_{t_{k−1}} ΔW_{t_k} = Σ_{k=1}^i W_{t_{k−1}} (W_{t_k} − W_{t_{k−1}})

is the stochastic integral

Y_t = ∫_0^t W_s dW_s, t ∈ [0, T].

But we have seen that

Y_t^n = (1/2) W_t^2 − (1/2) Σ_{k=1}^i (ΔW_{t_k})^2,

and that when the partition τ_n becomes finer and finer the sum Σ_{k=1}^i (ΔW_{t_k})^2 converges to t in the mean square sense. Thus, we obtain the following formula

Y_t = ∫_0^t W_s dW_s = (1/2) W_t^2 − (1/2) t, t ∈ [0, T].

This result indicates that Itô stochastic calculus is different from ordinary calculus. [But you may want to read about the Stratonovich and other integrals in Mikosch, Section 2.4.]
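One can watch this formula emerge from the left-endpoint Riemann sums. The following sketch is our own illustration (step count and tolerance are arbitrary): it simulates one Brownian path and compares Σ W_{t_{k−1}} ΔW_{t_k} with (1/2)W_t^2 − (1/2)t.

```python
import math
import random

random.seed(1)

def check_ito_WdW(t=1.0, n_steps=100000):
    """Compare the Itô (left-endpoint) Riemann sum of W dW with the
    closed form (1/2) W_t^2 - (1/2) t along one simulated path."""
    dt = t / n_steps
    W = 0.0
    riemann = 0.0
    for _ in range(n_steps):
        dW = random.gauss(0.0, math.sqrt(dt))
        riemann += W * dW   # evaluate the integrand BEFORE the increment
        W += dW
    ito_formula = 0.5 * W * W - 0.5 * t
    return riemann, ito_formula

riemann, ito_formula = check_ito_WdW()
```

The difference between the two quantities is (t − Σ(ΔW)^2)/2, which vanishes in mean square as the mesh shrinks, exactly as in the argument above.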

7.4 Stochastic Integral with respect to a Poisson process

Let N be a PP(λ), λ > 0. In view of the simple structure of the trajectories of a Poisson process, it is easy to define a stochastic integral with respect to such a process. However, in order that the stochastic integral with respect to a Poisson process have nice properties, it is required that integrands be predictable and that they satisfy some mild integrability conditions. A detailed discussion of the concept of predictability for continuous time processes is beyond the scope of this course. It will be enough for us to know that whenever a process Z is left-continuous and adapted with respect to the natural filtration of the process N, or in the case of an arbitrary deterministic process2 Z, then Z is predictable.

We define the stochastic integral of a predictable integrand Z with respect to a Poisson process N as

I_t := ∫_0^t Z_s dN_s := Σ_{s≤t} Z_s ΔN_s, (7.3)

where ΔN_t := N_t − N_{t−}, with N_{t−} = lim_{s↑t} N_s the left-limit of N at time t.

Note that one can represent dN_t as Σ_n δ_{T_n}(dt), where T_n represents the nth jump time of N and δ_{T_n} is a Dirac mass at T_n (a random measure over the half-line). In this perspective one can view the Poisson stochastic integral I_t as the Lebesgue-Stieltjes integral of Z against the measure dN_t, “ω by ω”. In particular one has ΔI_t = Z_t ΔN_t.

2 A Borel-measurable function of time.
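Since (7.3) is a pathwise sum over the jump times, it is straightforward to compute by simulation. The sketch below is our own illustration (the helper names, λ and T are arbitrary choices): it samples the jump times from exponential interarrivals and evaluates I_t for a given integrand.

```python
import random

random.seed(2)

def poisson_jump_times(lam, T):
    """Jump times of a Poisson process with intensity lam on [0, T],
    built from i.i.d. exponential interarrival times."""
    times, t = [], 0.0
    while True:
        t += random.expovariate(lam)
        if t > T:
            return times
        times.append(t)

def poisson_integral(Z, jump_times, t):
    """I_t = ∫_0^t Z_s dN_s: sum of Z at the jump times <= t, cf. (7.3)."""
    return sum(Z(s) for s in jump_times if s <= t)

T, lam = 10.0, 2.0
jumps = poisson_jump_times(lam, T)
# With Z ≡ 1 the integral simply counts the jumps: I_T = N_T
I_T = poisson_integral(lambda s: 1.0, jumps, T)
```

The check I_T = N_T holds exactly, path by path, because the integral is an ω-by-ω Lebesgue-Stieltjes sum.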


Properties of this integral are analogous to the properties of the Itô integral discussed above, except that the process I is not a martingale with respect to the natural filtration of our Poisson process. The reason why I is not a martingale is that the process N itself is not a martingale. Letting M_t = N_t − λt denote the compensated martingale of N, it can be verified that the process Y defined as

Y_t = ∫_0^t Z_s dM_s := ∫_0^t Z_s dN_s − λ ∫_0^t Z_s ds (7.4)

is a martingale.
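The martingale property of (7.4) implies in particular E Y_t = 0, which can be checked by Monte Carlo. The sketch below is our own illustration, with the deterministic integrand Z_s = s hard-coded (so the Lebesgue part is λ t²/2); the parameters and tolerance are arbitrary.

```python
import random

random.seed(3)

def compensated_integral(lam, t):
    """One sample of Y_t = ∫_0^t Z_s dN_s − lam ∫_0^t Z_s ds, cf. (7.4),
    for the deterministic integrand Z(s) = s (hard-coded below)."""
    s, jump_part = 0.0, 0.0
    while True:
        s += random.expovariate(lam)   # next jump time
        if s > t:
            break
        jump_part += s                 # Z evaluated at the jump time
    lebesgue_part = lam * (t * t / 2.0)  # lam * ∫_0^t s ds
    return jump_part - lebesgue_part

lam, t = 3.0, 2.0
samples = [compensated_integral(lam, t) for _ in range(20000)]
mean = sum(samples) / len(samples)   # should be close to 0
```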

7.5 Semimartingale Integration Theory [See Protter]∗

In this Section we give a very brief account of the general semimartingale integration theory, such as it is developed, for instance, in the book by Protter. Given the previous developments, the reader should be able to accept the following notions and results without too much harm (a detailed exposition would take us very far away from the scope of these notes).

Semimartingales are a class of integrator-processes giving rise to the most flexible theory of stochastic integration. In mathematical finance another motivation for modeling prices of traded assets as semimartingales is that price processes outside this class involve arbitrages, unless rather strong constraints are imposed on the trading strategies.

It is well known that, in a filtration satisfying the usual conditions, every semimartingale can be considered as càdlàg, a French acronym for “right continuous with left limits”. All semimartingales in these notes are understood in a càdlàg version. In one of several equivalent characterizations, a semimartingale X corresponds to the sum of a local martingale M and a finite variation process A, where:

∙ A local martingale M admits an increasing sequence of stopping times τ_n such that every stopped process M_{·∧τ_n} is a uniformly integrable martingale, and

∙ A finite variation process is a difference between two adapted non-decreasing processes starting from 0.

Any such representation X = M + A is called a Doob-Meyer decomposition of the semimartingale X. A Doob-Meyer decomposition is not unique in general. However, there is at most one such representation of a process X with A predictable. One then talks of “the canonical Doob-Meyer decomposition of a special semimartingale X”. In particular,

Proposition 7.2. A predictable local martingale of finite variation (e.g., a time-differentiable local martingale) is constant.

The stochastic integral of a predictable and locally bounded process Z with respect to a semimartingale X is then defined as

Y_t = ∫_0^t Z_s dX_s := ∫_0^t Z_s dM_s + ∫_0^t Z_s dA_s, (7.5)

where X = M + A is a Doob-Meyer decomposition of X, and ∫_0^t Z_s dM_s is defined by localization of M. A remarkable fact is that the corresponding notion of stochastic integral is independent of the Doob-Meyer decomposition of X which is used in (7.5). Predictable and locally bounded processes notably include all left-limit processes of the form Z = U_−, where U is a semimartingale.


Proposition 7.3. In case X is a local martingale, the integral process Y is again a local martingale.

In the case of a continuous integrator X, it is possible (as we saw earlier in the case of Brownian motion) to define the stochastic integral Y for a class of admissible integrands Z larger than that of the predictable and locally bounded processes, namely for integrands Z that are only progressive and subject to suitable integrability conditions. One then has that

∫_0^t Z_s dX_s = ∫_0^t Z_{s−} dX_s (7.6)

for every admissible semimartingale Z.


Homework 8: Stochastic integration

1. Do

(a) Exercise 1. Verify that E(Y_t^n | ℱ_s) = Y_s^n is true for t_i ≤ s ≤ t_{i+1} and t_k ≤ t ≤ t_{k+1}, where t_{i+1} ≤ t_k.

(b) Exercise 2. Verify property (ii), that is, verify that E Y_t^n = 0 for every t ∈ [0, T].

(c) Exercise 3.

(d) Exercise 4.

(e) Exercise 5.

2. Compute the Itô stochastic integral for the process Zt = 1, ∀t ∈ [0, T ].

3. Compute the Poisson stochastic integral IT in (7.3) for the process Zt = 1, ∀t ∈ [0, T ].

4. Define Z_t = N_{t−}, ∀t ∈ (0, T]. Explain why the process Z is predictable. Compute the stochastic integrals I_t and Y_t for the process Z.


Chapter 8

Itô formula [Mikosch, Chapter 2; Shreve, Section 4.4]

8.1 Introduction

Consider the integral ∫_0^t s ds. We know that

∫_0^t s ds = (1/2) t^2.

Now, think of the function w(t) = t. Thus, we have

∫_0^t w(s) dw(s) = (1/2) w^2(t).

In fact, if w(t) is any differentiable function of t, such that the integrals below exist, then we have

∫_0^t w(s) dw(s) = (1/2) w^2(t) − (1/2) w^2(0). (8.1)

This is just the chain rule formula:

d[f(w(s))]/ds = f′(w(s)) w′(s) (8.2)

which for f(x) = x^2 yields

d(w^2(t)) = 2w(t) w′(t) dt = 2w(t) dw(t) (8.3)

or, in integrated form, (8.1).

Observe next that for an arbitrary differentiable function f (like f(x) = x^2 above) the expression (8.2) can be written as

df(w(t)) = f(w(t + dt)) − f(w(t)) = f′(w(t)) dw(t).

On the other hand, if the function f is more than once differentiable then we have from the Taylor expansion

f(w(t) + dw(t)) − f(w(t)) = f′(w(t)) dw(t) + (1/2) f″(w(t)) (dw(t))^2 + ⋯,

where, as usual, dw(t) = w(t + dt) − w(t) is the increment of the function w on the interval [t, t + dt]. So, in the case of a differentiable function w, we may in fact neglect all terms of order 2 and higher in the above Taylor expansion. This is because (dw(t))^k = o(dt) for any k ≥ 2.


8.1.1 What about ∫_0^t W_s dW_s?

Now, could it be that for the SBM we would have

∫_0^t W_s dW_s = (1/2) W_t^2? (8.4)

Of course not! First of all, from the properties of the Itô integral we know that the expectation of the left hand side in (8.4) is zero, whereas the expectation of the right hand side in (8.4) is (1/2)t. Secondly, we already saw that the true value of the stochastic integral in (8.4) is

∫_0^t W_s dW_s = (1/2) W_t^2 − (1/2) t.

Applying the Taylor expansion to the function f(W_t) = W_t^2, we see that

dW_t^2 = (W_t + dW_t)^2 − W_t^2 = 2W_t dW_t + (dW_t)^2. (8.5)

But,

E((dW_t)^2) = E(W_{t+dt} − W_t)^2 = dt.

Thus, the term (dW_t)^2 is like dt [it is frequently written that (dW_t)^2 = dt]. That is, the term (dW_t)^2 is not o(dt), and therefore it must not be neglected in (8.5). This is the reason why (8.4) is not true. This is the reason why Prof. Kiyosi Itô invented the Itô calculus, and ... became famous!

8.1.2 What about ∫_0^t N_{s−} dN_s?

Could it be that for the Poisson process N we would have

∫_0^t N_{s−} dN_s = (1/2) N_t^2? (8.6)

Of course not! If you already did Exercise 4 from Homework 8, then you know that

∫_0^t N_{s−} dN_s = (1/2)(N_t^2 − N_t).

8.2 Itô formulas for continuous processes

There is a general semimartingale Itô formula which leads to all of the formulas that you will see below in this section and in the following sections. We shall not state this general result though. We shall only state some of its simpler versions that we are going to use in the future lectures.

Let f(x) be a twice continuously differentiable function. We already know that we must not neglect the terms dW_t and (dW_t)^2 = dt in the Taylor expansion of f(W_t + dW_t). However, it is known that we may neglect the higher order terms. The suitably amended Taylor expansion yields the following simple Itô formula

df(W_t) = f′(W_t) dW_t + (1/2) f″(W_t) dt


or in integral form

f(W_t) − f(W_r) = ∫_r^t f′(W_s) dW_s + (1/2) ∫_r^t f″(W_s) ds, r ≤ t.

We know that the process W_t^2 − t is a martingale with respect to the natural filtration of the SBM. Thus, the quadratic variation process of the martingale W_t is ⟨W⟩_t = t. This is the reason why the term f″(W_t) dt is sometimes written as f″(W_t) d⟨W⟩_t.

Before we proceed, let us introduce some notation: for a function f(t, x) we denote

∂_t f(t, x) = ∂f(t, x)/∂t, ∂f(t, x) = ∂f(t, x)/∂x, ∂^2 f(t, x) = ∂^2 f(t, x)/∂x^2.

The first extension of the simple Itô formula. Let the function f(t, x) be once continuously differentiable w.r.t. t and twice continuously differentiable w.r.t. x. Then, for every 0 ≤ r ≤ t,

f(t, W_t) − f(r, W_r) = ∫_r^t ∂f(s, W_s) dW_s + ∫_r^t (∂_s f(s, W_s) + (1/2) ∂^2 f(s, W_s)) ds (8.7)

or, in differential form, for t ≥ 0,

df(t, W_t) = ∂f(t, W_t) dW_t + (∂_t f(t, W_t) + (1/2) ∂^2 f(t, W_t)) dt. (8.8)

The second extension of the simple Itô formula. Suppose the processes b_t and σ_t are adapted to the natural filtration of the SBM, and are such that the two integrals below are well defined. Define a new process X_t by

X_t = X_0 + ∫_0^t b_s ds + ∫_0^t σ_s dW_s, t ≥ 0.

Let the function f(t, x) be once continuously differentiable w.r.t. t and twice continuously differentiable w.r.t. x. Then for the process X we have, for 0 ≤ r ≤ t,

f(t, X_t) − f(r, X_r) = ∫_r^t (∂_s f(s, X_s) + ∂f(s, X_s) b_s + (1/2) ∂^2 f(s, X_s) σ_s^2) ds + ∫_r^t ∂f(s, X_s) σ_s dW_s (8.9)

or, in differential form,

df(t, X_t) = (∂_t f(t, X_t) + ∂f(t, X_t) b_t + (1/2) ∂^2 f(t, X_t) σ_t^2) dt + ∂f(t, X_t) σ_t dW_t. (8.10)

Observe that formulas (8.7) and (8.8) are special cases of formulas (8.9) and (8.10) for the case where b_t ≡ 0 and σ_t ≡ 1.

8.2.1 Examples

Example 8.1. Take f(x) = x^2. From the Itô formula we get

W_t^2 − W_r^2 = 2 ∫_r^t W_s dW_s + ∫_r^t 1 ds = 2 ∫_r^t W_s dW_s + (t − r), 0 ≤ r ≤ t.

In particular, with r = 0 we get

∫_0^t W_s dW_s = (1/2) W_t^2 − (1/2) t.


Example 8.2. Take f(x) = e^x. From the Itô formula we get

e^{W_t} − e^{W_r} = ∫_r^t e^{W_s} dW_s + (1/2) ∫_r^t e^{W_s} ds.

Recall that for a differentiable function w(t) we have

de^{w(t)} = e^{w(t)} w′(t) dt = e^{w(t)} dw(t)

and thus

e^{w(t)} − e^{w(r)} = ∫_r^t e^{w(s)} dw(s).

Example 8.3. (Itô exponential) Take f(t, x) = e^{x − 0.5t}. From the first extension of the Itô formula we get

e^{W_t − 0.5t} − e^{W_r − 0.5r} = ∫_r^t e^{W_s − 0.5s} dW_s.

Example 8.4. (Geometric Brownian motion) Take f(t, x) = e^{(μ − 0.5σ^2)t + σx}, where μ ∈ (−∞, ∞) and σ > 0. From the first extension of the Itô formula we get

e^{(μ − 0.5σ^2)t + σW_t} − e^{(μ − 0.5σ^2)r + σW_r} = σ ∫_r^t e^{(μ − 0.5σ^2)s + σW_s} dW_s + μ ∫_r^t e^{(μ − 0.5σ^2)s + σW_s} ds.

Thus, defining a GBM process S_t by

S_t = S_0 e^{(μ − 0.5σ^2)t + σW_t}, t ≥ 0,

we get

S_t − S_r = σ ∫_r^t S_s dW_s + μ ∫_r^t S_s ds, 0 ≤ r ≤ t

or, in differential form,

dS_t = μ S_t dt + σ S_t dW_t, t ≥ 0.
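As a sanity check on the GBM dynamics, one can verify by Monte Carlo that E S_t = S_0 e^{μt}, which follows from the lognormal law of S_t. The sketch below is our own illustration; the parameter values and the tolerance are arbitrary choices.

```python
import math
import random

random.seed(4)

S0, mu, sigma, t = 1.0, 0.05, 0.2, 1.0
n_paths = 50000
total = 0.0
for _ in range(n_paths):
    W_t = random.gauss(0.0, math.sqrt(t))  # W_t ~ N(0, t)
    # Exact GBM solution S_t = S_0 exp((mu - sigma^2/2) t + sigma W_t)
    total += S0 * math.exp((mu - 0.5 * sigma * sigma) * t + sigma * W_t)
mc_mean = total / n_paths
exact_mean = S0 * math.exp(mu * t)   # E S_t = S_0 e^{mu t}
```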

8.3 Itô formulas relative to jump processes [See Ikeda and Watanabe]∗

Suppose the function f(t, n) : [0, ∞) × {0, 1, . . .} → ℝ is differentiable in the first variable, and such that the process f(t, N_t) satisfies some mild integrability condition (e.g., f bounded). Then we have the following Itô formula for a Poisson process (integral form)

f(t, N_t) − f(r, N_r) = ∫_r^t ∂_s f(s, N_s) ds + ∫_r^t (f(s, N_s) − f(s, N_{s−})) dN_s (8.11)

in which (in differential form)

(f(s, N_s) − f(s, N_{s−})) dN_s = (f(s, N_{s−} + 1) − f(s, N_{s−})) dN_s. (8.12)


Example 8.5. Take f(t, n) = n^2. We get

N_t^2 = 2 ∫_0^t N_{s−} dN_s + N_t.

Example 8.6. Take f(t, n) = e^n. We get

e^{N_t} = 1 + (e − 1) ∫_0^t e^{N_{s−}} dN_s.

Example 8.7. Take f(t, n) = 2^n. Then,

2^{N_t} = 1 + ∫_0^t 2^{N_{s−}} dN_s.
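Example 8.7 can be verified exactly along any path: at the k-th jump 2^{N_{s−}} = 2^{k−1}, and Σ_{k=1}^n 2^{k−1} = 2^n − 1, so the identity reads 2^n = 1 + (2^n − 1). A minimal check (our own illustration):

```python
def integral_2_pow_Nminus(n_jumps):
    """∫_0^t 2^{N_{s-}} dN_s along a path with n_jumps jumps:
    at the k-th jump 2^{N_{s-}} = 2^{k-1}, so the integral is sum_{k=1}^{n} 2^{k-1}."""
    return sum(2 ** (k - 1) for k in range(1, n_jumps + 1))

# Example 8.7 states 2^{N_t} = 1 + ∫_0^t 2^{N_{s-}} dN_s
checks = [(n, 2 ** n, 1 + integral_2_pow_Nminus(n)) for n in range(0, 30)]
```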

Let A be the generator of the Poisson process N in the sense introduced earlier, i.e., of the matrix (5.9). Note that, for fixed t, f(t, ·) may be considered as an infinite column vector f = (f(t, 0), f(t, 1), f(t, 2), . . .)ᵀ. Consequently, for fixed t, the expression Af defines another vector Af = (Af(t, 0), Af(t, 1), Af(t, 2), . . .)ᵀ. In view of the form (5.9) of A, one has that

Af(t, n) = λ (f(t, n + 1) − f(t, n)).

Letting M_t = N_t − λt, it follows that (8.12) can be rewritten as the following Itô formula for a Poisson process (canonical differential form)

df(t, N_t) = ∂_t f(t, N_t) dt + (f(t, N_{t−} + 1) − f(t, N_{t−})) dN_t
= (∂_t f + Af)(t, N_t) dt + (f(t, N_{t−} + 1) − f(t, N_{t−})) dM_t.

Consequently, we have that

f(t, N_t) − ∫_0^t (∂_s f(s, N_s) + Af(s, N_s)) ds = f(0, N_0) + ∫_0^t (f(s, N_{s−} + 1) − f(s, N_{s−})) dM_s,

which by application of Proposition 7.3 is a (local) martingale with respect to the natural filtration of N.

In addition to the Poisson process N with intensity λ, let us consider a standard d-variate Brownian motion W, and let J(t) denote a family of i.i.d. d-variate random variables with distribution denoted by w(dy), all assumed to coexist on the same probability space. Given adapted coefficients b_t (a random vector in ℝ^d), σ_t (a random matrix in ℝ^{d×d}) and a predictable function δ_t(x) (a random vector in ℝ^d marked, or parameterized, by x ∈ ℝ^d), we shall now consider an Itô process in the sense of an adapted d-variate process X obeying the following dynamics over [0, T]:

dX_t = b_t dt + σ_t dW_t + δ_t(J(t)) dN_t. (8.13)

In particular, the description of the jumps of X is decomposed into, on the one hand, the frequency of the jumps of X (given by, on average, λ jumps of N per unit of time), and, on the other hand, the distribution w of the marks determining the jump size of X incurred by


a jump of N . Given a real valued, “regular enough” function f = f(t, x) on [0, T ]×ℝd, onethen has for any t ∈ [0, T ] the following Itô formula for an Itô process

df(t,Xt) = ∂tf(t,Xt)dt+ ∂f(t,Xt)btdt+ ∂f(t,Xt)�t dWt (8.14)

+1

2Tr(∂2f(t,Xt)at

)dt+ �ft (Xt−, J(t))dNt

in which:

∙ ∂f(t, x) and ∂2f(t, x) denote the row-gradient and the Hessian of f with respect to x,

∙ at = �t�Tt is the covariance matrix of X,

∙ Tr stands for the trace operator (sum of the diagonal elements of a square matrix),

∙ and

�ft (x, z) = f(x+ �t(z))− f(x).

The Itô formula (8.14) reads equivalently, in canonical form:

df(t, X_t) = (∂_t f(t, X_t) + ∂f(t, X_t) b_t + (1/2) Tr(∂^2 f(t, X_t) a_t) + λ δ̄f_t(X_t)) dt + ∂f(t, X_t) σ_t dW_t + dM^f_t (8.15)

with

δ̄f_t(x) = E(δf_t(x, J(t)) | ℱ_{t−}) = ∫_{ℝ^d} (f(x + δ_t(y)) − f(x)) w(dy)

and

dM^f_t = δf_t(X_{t−}, J(t)) dN_t − λ δ̄f_t(X_t) dt. (8.16)

Moreover, one has the following

Lemma 8.1. The process M^f is a local martingale, and a martingale under suitable integrability conditions.

In the formalism of measure-stochastic integration, the process M^f can equivalently be written as the stochastic integral of a predictable random function with respect to a compensated Poisson random measure. Lemma 8.1 then appears as an analog of Proposition 7.3. Using measure-stochastic integration, it is also possible to adapt and extend the Itô formula (8.15) to more general Itô processes with an infinite activity of jumps.

8.3.1 Brackets

Introducing the (random) generator

Af_t(x) = ∂f(t, x) b_t + (1/2) Tr(∂^2 f(t, x) a_t) + λ δ̄f_t(x),

it follows from (8.15), under suitable integrability conditions:

(dt)^{−1} E(df(t, X_t) | ℱ_{t−}) = ∂_t f(t, X_t) + Af_t(X_t). (8.17)


Given another real valued function g = g(t, x), it is also easy to show that

(dt)^{−1} Cov(df(t, X_t), dg(t, X_t) | ℱ_{t−}) = ∂f(t, X_t) a_t (∂g(t, X_t))ᵀ + λ δ̄^{f,g}_t(X_t) = A^{f,g}_t(X_t) (8.18)

where (f, g) ↦ δ̄^{f,g} and (f, g) ↦ A^{f,g} are the bilinear “carré du champ” (random) operators associated with the linear (random) operators f ↦ Af and f ↦ δ̄f, so

δ̄^{f,g}_t(x) = δ̄^{fg}_t(x) − f(t, x) δ̄^g_t(x) − g(t, x) δ̄^f_t(x) = ∫_{ℝ^d} (f(x + δ_t(y)) − f(x)) (g(x + δ_t(y)) − g(x)) w(dy)

and A^{f,g} = A(fg) − fAg − gAf. Letting Y_t = f(t, X_t) and Z_t = g(t, X_t), the process Cov(dY_t, dZ_t | ℱ_{t−}) also corresponds to the so-called sharp bracket d⟨Y, Z⟩_t. To sum up,

Proposition 8.2. The (random) generator f ↦ Af_t of the process X, and its carré du champ (f, g) ↦ A^{f,g}, are such that for all functions f, g of x,

E(df(t, X_t) | ℱ_{t−}) = (∂_t f(t, X_t) + Af_t(X_t)) dt
Cov(df(t, X_t), dg(t, X_t) | ℱ_{t−}) = A^{f,g}_t(X_t) dt = d⟨Y, Z⟩_t. (8.19)

In particular,

(dt)^{−1} Var(dY_t | ℱ_{t−}) = A^{f,f}_t(X_t) = d⟨Y⟩_t/dt = ∂f(t, X_t) a_t (∂f(t, X_t))ᵀ + λ δ̄^{f,f}_t(X_t) (8.20)

with

δ̄^{f,f}_t(x) = ∫_{ℝ^d} (f(t, x + δ_t(y)) − f(t, x))^2 w(dy).

By letting f and g range over the various coordinate mappings of X in the second line of (8.19), and denoting in matrix form ⟨X⟩ = (⟨X^i, X^j⟩)_{i,j}, one obtains that

(dt)^{−1} Cov(dX_t | ℱ_{t−}) = d⟨X⟩_t/dt = a_t + λ ∫_{ℝ^d} (δ_t δ_tᵀ)(y) w(dy). (8.21)

Observe that the above sharp brackets compensate the corresponding square brackets (quadratic covariations and variations) defined by [Y, Z]_0 = 0 and

d[Y, Z]_t = ∂f(t, X_t) a_t (∂g(t, X_t))ᵀ dt + δf_t(X_{t−}, J(t)) δg_t(X_{t−}, J(t)) dN_t

and thus [Y, Y]_0 = 0 and

d[Y, Y]_t = ∂f(t, X_t) a_t (∂f(t, X_t))ᵀ dt + (δf_t(X_{t−}, J(t)))^2 dN_t.

Notably, if X is a continuous Itô process, the corresponding sharp and square brackets (exist and) coincide.

The square brackets can equivalently be defined as limits in probability1 of realized covariance and variance processes. They can be defined as such for any semimartingales Y, Z. They are key in the following semimartingale integration by parts formula (in differential form):

d(Y_t Z_t) = Y_{t−} dZ_t + Z_{t−} dY_t + d[Y, Z]_t (8.22)

and can also be used for stating a general semimartingale Itô formula.

1 Or almost sure limits in the case of nested time-meshes.


Homework 9: Itô formula

1. Verify the formulas presented in Examples 8.1-8.4 and 8.5-8.7.

2. Apply the Itô formula to f(W_t) for

(a) f(x) = x,

(b) f(x) = x^3.

3. Verify that the process Y given as Y_t = ∫_0^t W_s dW_s, t ≥ 0, is a martingale with respect to the natural filtration of the process W.

4. Denoting as usual M_t = N_t − λt:

(a) Verify that the process Y given as Y_t = ∫_0^t N_{s−} dM_s, t ≥ 0, is a martingale with respect to the natural filtration of the process N;

(b) Compute J_t = ∫_0^t N_s dM_s. Is the process J a martingale with respect to the natural filtration of the process N?

5. Derive the Itô formula relative to both a Brownian motion and a Poisson process, i.e., for df(t, W_t, N_t). Apply this formula to f(t, x, y) = txy + t + x + y.

6. Let W = (W_t, t ≥ 0) be a SBM. Define a process Y by

Y_t = ∫_0^t e^{W_s} dW_s, t ≥ 0.

Verify whether the process Y is a martingale with respect to the natural filtration of W.


Chapter 9

Stochastic differential equations (SDEs) [Mikosch, Chapter 3; Shreve, Section 6.2]

9.1 Introduction

Consider the ordinary differential equation

dx(t)/dt = b, t ≥ 0, x(0) = x_0. (9.1)

The constant b may be interpreted as an infinitesimal [instantaneous] absolute rate of change of the function x(t). This is because, in view of (9.1), we have

x(t + dt) − x(t) = b dt.

The solution to equation (9.1) is

x(t) = x_0 + bt, t ≥ 0.

Now, consider the ordinary differential equation

dx(t)/dt = b x(t), t ≥ 0, x(0) = x_0. (9.2)

Here, the constant b may be interpreted as an infinitesimal [instantaneous] relative rate of change of the function x(t). This is because, in view of (9.2), we have

(x(t + dt) − x(t))/x(t) = b dt.

The solution to equation (9.2) is

x(t) = x_0 e^{bt}, t ≥ 0. (9.3)

Now, imagine that both rates are perturbed by normally distributed random shocks. In the case of (9.1) this phenomenon can be modeled as

x(t + dt) − x(t) = b dt + σ(W_{t+dt} − W_t)


or, equivalently,

dx(t) = b dt + σ dW_t. (9.4)

In the case of (9.2) this phenomenon can be modeled as

(x(t + dt) − x(t))/x(t) = b dt + σ(W_{t+dt} − W_t)

or, equivalently,

dx(t) = x(t)(b dt + σ dW_t). (9.5)

Equations (9.4) and (9.5) are prototypes of stochastic differential equations (SDEs). It needs to be explained what is meant by a solution to an SDE, and how an SDE can be solved.

9.2 Diffusions

Definition 9.1. A Markov process X on the state space S = (a, b), −∞ ≤ a < b ≤ ∞, is said to be a diffusion with drift coefficient b(t, x) and diffusion coefficient σ^2(t, x) > 0, if

(i) (X_t, t ≥ 0) has continuous sample paths, and

(ii) the following relations hold as h → 0, for every (t, x) ∈ ℝ₊ × ℝ:

E(X_{t+h} − X_t | X_t = x) = b(t, x) h + o(h) (9.6)
E[(X_{t+h} − X_t)^2 | X_t = x] = σ^2(t, x) h + o(h). (9.7)

The functions b(t, x) and σ^2(t, x) are usually assumed to be continuous. They are also called the local mean function and the local variance function of a diffusion. Diffusion processes behave locally like a BM.

One has

Var(X_{t+h} − X_t | X_t = x) = E[(X_{t+h} − X_t)^2 | X_t = x] − [E(X_{t+h} − X_t | X_t = x)]^2
= σ^2(t, x) h + o(h) − [b(t, x) h + o(h)]^2
= σ^2(t, x) h + o(h).

Hence

Var(X_{t+h} − X_t | X_t = x) − E[(X_{t+h} − X_t)^2 | X_t = x] = o(h).

Therefore (9.7) is equivalent to

Var(X_{t+h} − X_t | X_t = x) = σ^2(t, x) h + o(h). (9.8)

In case the coefficients b and σ do not depend on time, one calls X a time homogeneous diffusion.

9.2.1 SDEs for diffusions

Let ξ be a random variable. Let b(t, x) and σ(t, x) be two real valued functions. Suppose now that a real valued process X satisfies the following three properties:

∙ Property 1. The process X is adapted with respect to the filtration generated by ξ and the SBM W.


∙ Property 2. The ordinary and Itô integrals below are well-defined for every t ∈ [0, T]:

∫_0^t b(s, X_s) ds, ∫_0^t σ(s, X_s) dW_s.

∙ Property 3. The equation

X_t = ξ + ∫_0^t b(s, X_s) ds + ∫_0^t σ(s, X_s) dW_s

is satisfied for all t ∈ [0, T].

Definition 9.2. We say that a process X is a strong solution to the SDE

dX_t = b(t, X_t) dt + σ(t, X_t) dW_t, t ≥ 0, X_0 = ξ, (9.9)

if the process X satisfies Properties 1-3 above.

A strong solution to the SDE (9.9) is readily seen to be a diffusion in the sense of Definition 9.1. The SDE is thus also known as the diffusion equation with drift coefficient b(t, x), diffusion coefficient σ(t, x) and initial condition ξ.

Observe that the strong solution process X to the diffusion SDE (9.9) is an Itô process in the sense of Section 8.3 (special case without jumps).

Solving equation (9.9) means determining a process X that is a strong solution to (9.9). [There is another concept of a solution to equation (9.9), the so-called weak solution. We shall not discuss it here, however.]

Typically, the drift and diffusion coefficients, as well as the initial condition, are the input data for a modeler of physical phenomena. Therefore, if one attempts to model the evolution of a physical phenomenon using an equation like (9.9) above, one must address questions of existence and uniqueness of strong solutions to equations like (9.9), just as is the case with ordinary differential equations. That is to say, one must answer the following two questions:

∙ What conditions on b(t, x), σ(t, x) and ξ are necessary and/or sufficient for solutions to equation (9.9) to exist?

∙ What conditions on b(t, x), σ(t, x) and ξ are necessary and/or sufficient for equation (9.9) to have a unique solution?

We shall not address these questions here. The reader is referred to Mikosch, p. 138, for a brief discussion of the above issues.

9.2.2 Examples

Here are some basic examples of time-homogeneous diffusions:

1. BM: b(x) = b, σ^2(x) = σ^2; SBM: b(x) = 0, σ^2(x) = 1.

2. Ornstein-Uhlenbeck process, S = (−∞, ∞):

b(x) = a(b − x), σ^2(x) = σ^2, a, b, σ^2 > 0.

This is a mean-reverting process.

3. Square-root process, S = (0, ∞):

b(x) = bx, σ^2(x) = σ^2 x, b, σ^2 > 0.

4. Square-root process with mean reversion, S = (0, ∞):

b(x) = a(b − x), σ^2(x) = σ^2 x, a, b, σ^2 > 0.

5. Constant Elasticity of Variance diffusion, S = (0, ∞):

b(x) = bx, σ^2(x) = σ^2 x^γ, b, σ^2 > 0, 0 ≤ γ ≤ 2.

6. Geometric BM S_t, S = (0, ∞). By definition S_t = exp(X_t), where X_t is a BM. An application of the Itô formula yields

b(S) = S(b + σ^2/2), σ^2(S) = σ^2 S^2.

The latter result can also be obtained by direct computation of

E(S_{t+h} − S_t | S_t = S) = S[exp(bh + (σ^2/2)h) − 1] = S(1 + bh + (σ^2/2)h + o(h) − 1) = S(bh + (σ^2/2)h) + o(h)

and

Var(S_{t+h} − S_t | S_t = S) = S^2 exp(2bh + σ^2 h)[exp(σ^2 h) − 1] = S^2(1 + 2bh + σ^2 h + o(h))(σ^2 h + o(h)) = σ^2 S^2 h + o(h).

9.3 Solving diffusion SDEs

Deriving an explicit formula for a strong solution to the SDE (9.9) is not possible in general [just as is the case with ODEs or PDEs]. So, in general, one needs to approximate solutions to equations like (9.9) by using numerical methods [see Mikosch, Section 3.4]. Nevertheless, sometimes it is possible to guess an explicit formula for a strong solution to the SDE (9.9), and then to use the Itô formula to verify that the guessed formula is a correct one (Property 3 may be verified using the Itô formula).

We shall discuss several examples of diffusion SDEs that can be explicitly solved by using Itô formulas.
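As a glimpse of the numerical methods mentioned above [Mikosch, Section 3.4], the Euler-Maruyama scheme discretizes (9.9) by freezing the coefficients over each time step. The sketch below is our own minimal implementation, with arbitrary parameter choices, applied here to the Ornstein-Uhlenbeck equation (9.20) with a = 1, b = 0.5, σ = 0.3.

```python
import math
import random

def euler_maruyama(b, sigma, x0, T, n_steps, rng):
    """One Euler-Maruyama path for dX_t = b(t, X_t) dt + sigma(t, X_t) dW_t,
    X_0 = x0: on each step the coefficients are frozen at the left endpoint."""
    dt = T / n_steps
    t, x = 0.0, x0
    path = [x]
    for _ in range(n_steps):
        dW = rng.gauss(0.0, math.sqrt(dt))   # Brownian increment ~ N(0, dt)
        x = x + b(t, x) * dt + sigma(t, x) * dW
        t += dt
        path.append(x)
    return path

rng = random.Random(5)
path = euler_maruyama(lambda t, x: 1.0 * (0.5 - x),   # drift a(b - x)
                      lambda t, x: 0.3,               # constant sigma
                      x0=0.0, T=5.0, n_steps=500, rng=rng)
```

The same function can be pointed at any of the diffusion coefficients listed in Subsection 9.2.2.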

Example 9.3. Consider the equation

dX_t = dW_t, t ≥ 0; X_0 = 0. (9.10)

Here b(t, x) = 0, σ(t, x) = 1. The obvious strong solution is X_t = W_t, t ≥ 0.

Exercise 1. Verify that Properties 1-3 are verified by this solution.

Example 9.4. Consider the equation

dX_t = b dt + σ dW_t, t ≥ 0; X_0 = x. (9.11)

Here b(t, x) = b, σ(t, x) = σ. The obvious strong solution is X_t = x + bt + σW_t, t ≥ 0.


Exercise 2. Verify that Properties 1-3 are verified by this solution.

Example 9.5. Consider the equation

dX_t = dt + 2 sgn(W_t) √X_t dW_t, t ≥ 0; X_0 = 0. (9.12)

Here b(t, x) = 1, σ(t, x, w) = 2 sgn(w) √x, where

sgn(x) = 1 if x > 0, and sgn(x) = −1 if x ≤ 0.

We shall verify that X_t = W_t^2, t ≥ 0, is a strong solution to this equation.

∙ Property 1. For every t ≥ 0 the random variable X_t = W_t^2 is a function of W_t.

∙ Property 2. The integral ∫_0^t 1 ds = t is well defined. The integral ∫_0^t sgn(W_s) √X_s dW_s is well defined. This is because the process √X_t is adapted to the SBM, and

∫_0^t E[(sgn(W_s) √X_s)^2] ds = ∫_0^t s ds = (1/2) t^2

is well defined.

∙ Property 3. Using the Itô formula we get [recall that here X_0 = 0]

X_t = 2 ∫_0^t W_s dW_s + ∫_0^t 1 ds = 2 ∫_0^t sgn(W_s) √X_s dW_s + ∫_0^t 1 ds, t ≥ 0.

Thus, indeed, X_t = W_t^2, t ≥ 0, is a strong solution to equation (9.12).

Remark 9.6. Equation (9.12) is not of the form (9.9). In fact, this equation can be considered as part of the following system of SDEs for two processes X and Y:

dX_t = dt + 2 sgn(Y_t) √X_t dW_t, t ≥ 0; X_0 = 0, (9.13)
dY_t = dW_t, t ≥ 0; Y_0 = 0. (9.14)

It can be demonstrated [using the so-called Lévy characterization theorem] that the process

W̃_t = ∫_0^t sgn(W_s) dW_s, t ≥ 0,

is a SBM. Observe that dW̃_t = sgn(W_t) dW_t. Thus the process X_t = W_t^2, which is the strong solution to equation (9.12), is a weak solution to the following SDE

dX_t = dt + 2 √X_t dW̃_t, t ≥ 0; X_0 = 0.

In this sense the process X_t is a homogeneous diffusion with drift coefficient b(x) = b(t, x) = 1 and diffusion coefficient σ(x) = σ(t, x) = 2√x. Finally, observe that the process X_t is an example of the square-root diffusion (see Subsection 9.2.2).


Example 9.7. Consider the equation

dX_t = (1/2) X_t dt + X_t dW_t, t ≥ 0; X_0 = 1. (9.15)

Here b(t, x) = (1/2)x and σ(t, x) = x. Using Example 8.2, we easily deduce that the process X_t = e^{W_t}, t ≥ 0, is a strong solution to this equation.

Exercise 3. Verify that Properties 1-3 are verified by this solution.

Example 9.8. Consider the equation

dX_t = b X_t dt + σ X_t dW_t, t ≥ 0; X_0 = e^y. (9.16)

Here b(t, x) = bx and σ(t, x) = σx. Using Example 8.4, we easily deduce that the GBM process

X_t = e^{y + (b − σ^2/2)t + σW_t}, t ≥ 0,

is a strong solution to equation (9.16). Note that the random variable Y_t = ln X_t is normally distributed with mean y + (b − σ^2/2)t and variance σ^2 t. That is, the random variable X_t has a lognormal distribution. Example 9.7 and Example 9.9 below are special cases of this example.

Example 9.9. Consider the equation

dX_t = (b + σ^2/2) X_t dt + σ X_t dW_t, t ≥ 0; X_0 = y. (9.17)

Here b(t, x) = (b + σ^2/2)x and σ(t, x) = σx. Using Example 9.8 above, we easily deduce that the GBM process

X_t = y e^{bt + σW_t}, t ≥ 0,

is a strong solution to equation (9.17). In particular, for b = −1/2, σ = 1 and y = 1 we get that

X_t = e^{−t/2 + W_t} (9.18)

solves

dX_t = X_t dW_t, t ≥ 0; X_0 = 1, (9.19)

and thus it is a Brownian martingale, called the stochastic exponential of the Brownian motion W.

Example 9.10. Consider the Ornstein-Uhlenbeck equation

dX_t = a(b − X_t) dt + σ dW_t, t ≥ 0; X_0 = x. (9.20)

Here b(t, x) = a(b − x) and σ(t, x) = σ. The strong solution to equation (9.20) is the Ornstein-Uhlenbeck (OU) process

X_t = x e^{−at} + b(1 − e^{−at}) + σ e^{−at} ∫_0^t e^{as} dW_s, t ≥ 0.

Exercise 4. Verify that Properties 2-3 are satisfied here.


The random variable ∫_0^t e^{as} dW_s has a normal distribution with mean zero and variance ∫_0^t e^{2as} ds. Thus, for the OU process we have

X_t ∼ N(x e^{−at} + b(1 − e^{−at}), σ^2 e^{−2at} ∫_0^t e^{2as} ds).

If a > 0 then for large values of t the distribution of the OU random variable is close to N(b, σ^2/(2a)). This is the reason why the constant b is called the mean reversion level.

Example 9.11. Consider the SDE

dX_t = − X_t/(1 − t) dt + dW_t, t ∈ [0, 1), X_0 = 0.

The strong solution of this equation is

X_t = (1 − t) ∫_0^t 1/(1 − s) dW_s, t ∈ [0, 1).

It can be shown by continuity that X_1 = 0. Thus, the process X_t is a Gaussian process with mean function m(t) := EX_t = 0 and covariance function c(t, s) := Cov(X_t, X_s) = EX_tX_s = min(t, s) − ts, t, s ∈ [0, 1]. The process X_t is known as the Brownian bridge.

In Mikosch, Example 1.3.5, the Brownian bridge is given as

Y_t = W_t − tW_1, t ∈ [0, 1].

Observe that the processes X and Y are both Gaussian with the same mean function m(t) and the same covariance function c(t, s). However, the process Y is not adapted to the filtration of W.
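The covariance function c(t, s) = min(t, s) − ts can be checked by Monte Carlo using the representation Y_t = W_t − tW_1, simulating (W_s, W_t, W_1) from independent Brownian increments. The sketch below is our own illustration; s, t and the tolerance are arbitrary choices.

```python
import math
import random

random.seed(7)

def bridge_pair(s, t):
    """Sample (Y_s, Y_t) for the bridge Y_u = W_u - u W_1, 0 < s < t < 1,
    building W at times s, t, 1 from independent Gaussian increments."""
    Ws = random.gauss(0.0, math.sqrt(s))
    Wt = Ws + random.gauss(0.0, math.sqrt(t - s))
    W1 = Wt + random.gauss(0.0, math.sqrt(1.0 - t))
    return Ws - s * W1, Wt - t * W1

s, t, n = 0.3, 0.7, 100000
acc = 0.0
for _ in range(n):
    ys, yt = bridge_pair(s, t)
    acc += ys * yt        # both factors have mean zero
mc_cov = acc / n
exact_cov = min(s, t) - s * t   # = 0.09 for s = 0.3, t = 0.7
```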

All the above examples, with the exception of Example 9.5, are special cases of the General Linear SDE (3.32) of Mikosch, Section 3.3.

9.4 SDEs Driven by a Poisson Process

Let N be a PP(λ). In many ways SDEs driven by a Poisson process are easier to deal with than SDEs driven by the SBM. As was the case with SDEs driven by the SBM, the Itô formula plays a fundamental role in solving SDEs driven by a Poisson process.

Example 9.12. The unique strong solution to the equation

dX_t = γ X_{t−} dN_t, t ≥ 0; X_0 = 1,

is the process X_t = (1 + γ)^{N_t}, t ≥ 0. In particular, when γ = 1 we obtain X_t = 2^{N_t}, t ≥ 0.

Example 9.13. The unique strong solution to the equation

dX_t = b X_{t−}(dt + dN_t), t ≥ 0; X_0 = 1,

is the process X_t = e^{bt}(1 + b)^{N_t}, t ≥ 0. In particular, when b = 1 we obtain X_t = e^t 2^{N_t}, t ≥ 0.

Example 9.14. The unique strong solution of

dX_t = X_{t−} dM_t, t ≥ 0; X_0 = 1,

is

X_t = 1 + ∫_0^t X_{u−} dM_u = e^{N_t ln 2 − λt}, t ≥ 0.

Thus, the process e^{−λt + N_t ln 2} is a martingale. Compare this with (9.18) and (9.19).
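The martingale property of e^{N_t ln 2 − λt} implies in particular E X_t = X_0 = 1 for all t, which can be checked by Monte Carlo. The sketch below is our own illustration; λ, t and the tolerance are arbitrary choices.

```python
import math
import random

random.seed(8)

lam, t, n_paths = 2.0, 1.0, 100000
total = 0.0
for _ in range(n_paths):
    # Sample N_t by counting exponential interarrivals up to time t
    s, n_jumps = 0.0, 0
    while True:
        s += random.expovariate(lam)
        if s > t:
            break
        n_jumps += 1
    total += math.exp(n_jumps * math.log(2.0) - lam * t)
mc_mean = total / n_paths   # close to 1 by the martingale property
```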


Homework 10: Stochastic differential equations

1. Do

(a) Exercise 1.

(b) Exercise 2.

(c) Exercise 3.

(d) Exercise 4.

2. Let Wt, t ≥ 0 be a SBM.

(a) Compute the expectation EXT and the variance Var(XT), where Xt is the strong solution to the SDE

dXt = Xtdt+XtdWt, t ∈ [0, T ], X0 = 1.

Is the process X a martingale with respect to the natural filtration of W?

(b) Compute the covariance Cov(Yt, Ys) for 0 ≤ s ≤ t, where Yt is the strong solution to the SDE

dYt = −Yt dt + dWt, t ≥ 0, Y0 = 0.

In addition compute the mean and the variance of the limiting distribution of the process Y. [Hint: The process e^t Yt has independent increments.]

(c) What is the distribution of Yt, where Y is the strong solution to the SDE

dYt = −Ytdt+ dWt, t ≥ 0, Y0 = y

in which y ∼ N(0, 1/4) is independent of the SBM W.

3. Let Nt, t ≥ 0 be a PP(λ). Compute the covariance Cov(Yt, Ys) for 0 ≤ s ≤ t, where Yt = ln Zt, and where Z is the strong solution to the SDE

dZt = Zt−(dt+ dNt), t ≥ 0, Z0 = 1.

4. Let Wt, t ≥ 0 be a SBM. Let Nt, t ≥ 0 be a PP(λ). Let Xt = 2^{Nt} e^{3t} + Wt². Compute the differential dXt.


9.5 Jump-Diffusions [See Ikeda and Watanabe]∗

By jump-diffusion we mean hereafter an Itô process X in the sense of Section 8.3, but for a Markovian SDE (8.13), meaning that the random coefficients bt, σt and δt(x) of (8.13) are now given deterministically in terms of Xt−¹ as

bt = b(t, Xt), σt = σ(t, Xt), δt(x) = δ(t, Xt−, x).

Equation (8.13) is thus now an SDE in X. Well-posedness² of such jump-diffusion SDEs can be studied by classical Picard iteration techniques under suitable Lipschitz and growth conditions on the coefficients. A notable feature of the solution is the so-called Markov property, meaning that

E(Φ(Xs, s ∈ [t, T ]) ∣ ℱt) = E(Φ(Xs, s ∈ [t, T ]) ∣Xt)

for every (possibly path-dependent) functional Φ of X giving sense to both sides of the equality. So “the past of X does not influence its future,” the present of X gathering all the relevant information.

Given a real valued, “regular enough” function f = f(t, x) on [0, T] × ℝd, one has by (8.15) the following Itô formula for a jump-diffusion (canonical form)

df(t, Xt) = (∂t + A) f(t, Xt) dt + ∂f(t, Xt)σt dWt + dM^f_t (9.21)

for the compensated jump (local) martingale

dM^f_t = δf(t, Xt−, J(t)) dNt − λ δ̄f(t, Xt) dt

in which we let, for every t ≥ 0 and x, y in ℝd,

δf(t, x, y) = f(t, x + δ(t, x, y)) − f(t, x),   δ̄f(t, x) = ∫ℝd δf(t, x, y) w(dy)

and where the infinitesimal generator A of X acts on f at time t as

(Af)(t, x) = ∂f(t, x) b(t, x) + (1/2) Tr(∂²f(t, x) a(t, x)) + λ δ̄f(t, x). (9.22)

Therefore, for all suitable functions f and g (cf. (8.19))

E(df(t, Xt) | ℱt−) = (∂t + A)f(t, Xt) dt
Cov(df(t, Xt), dg(t, Xt) | ℱt−) = (A(fg) − fAg − gAf)(t, Xt) dt (9.23)

in which, with Yt = f(t, Xt) and Zt = g(t, Xt),

(A(fg) − fAg − gAf)(t, Xt) = d⟨Y, Z⟩t/dt = ∂f(t, Xt) at (∂g(t, Xt))^T + λ ∫ℝd δf(t, Xt, y) δg(t, Xt, y) w(dy).

Also, the conditionings with respect to ℱt− in (9.23) can be replaced by the conditioning with respect to Xt−, by the Markov property of X.

¹Or of Xt in the case of b and σ, which in view of (7.6) makes no difference in (8.13), by continuity of t and Wt.

²In the so-called strong sense, to which we shall limit ourselves in the context of these notes.


Chapter 10

Girsanov transformations

The Girsanov transformation is a very useful technique for converting processes that are not martingales into martingales. It amounts to changing the probabilities of random events (changing probability measures).

10.1 Girsanov transformation relative to Gaussian distribu-tions

10.1.1 Gaussian random variables

Suppose ε is a standard normal variable, that is, ε P∼ N(0, 1). Its probability density function is φ(x) = (2π)^{−1/2} exp(−x²/2), x ∈ (−∞, ∞). Now, consider the function w(x) = e^{qx − q²/2}, where q is a constant. Let us now transform the probability measure P via the function w in order to produce a new probability measure on Ω, denoted by Q, and defined by

dQ(ω)/dP(ω) = η(ω) (10.1)

where η = w(ε). We call the random variable η the density of the measure Q with respect to the measure P. This is equivalent to writing

dQ = η dP [meaning that Q is absolutely continuous with respect to P]

or

dP = η^{−1} dQ [meaning that P is absolutely continuous with respect to Q].

We say that the measures P and Q are equivalent.

Exercise 1. Verify that Q is a probability measure.
[Hint: Since obviously Q is non-negative and σ-additive, verifying that Q is a probability measure amounts to the verification that Q(Ω) = 1, where Q(Ω) = ∫Ω dQ(ω) = ∫Ω η(ω) dP(ω) = EP η = ∫_{−∞}^{∞} w(x)φ(x) dx.]


In view of (10.1) we obtain

Q(ε ∈ dx) = w(x) P(ε ∈ dx)
          = e^{qx − q²/2} (2π)^{−1/2} exp(−x²/2) dx
          = (2π)^{−1/2} exp(−(x − q)²/2) dx.

But the function (2π)^{−1/2} exp(−(x − q)²/2) is the density of the normal distribution N(q, 1). Thus, the random variable ε has the normal distribution N(q, 1) under the measure Q, which we write ε Q∼ N(q, 1).

10.1.2 Brownian motion [Mikosch, Section 4.2; Shreve, Sections 1.6 and 5.2.1]

Let q be a constant as before, let T <∞ and let us consider a process B defined as

Bt = Wt − qt, t ∈ [0, T ],

where the process Wt is the SBM on some filtered probability space (Ω, ℱt, P). Now, define a process ηt by

ηt(ω) = exp{qWt(ω) − (1/2) q² t}, t ∈ [0, T], (10.2)

and then define a new measure Q on ℱT by

dQ(ω) = ηT(ω) dP(ω).

A remarkable result, known as the Girsanov Theorem, states that:

∙ The process ηt is a martingale with respect to the filtration ℱt under the measure P,

∙ The measure Q is a probability measure,

∙ The process Bt is a SBM under the measure Q.

Remark 10.1. The Girsanov theorem cannot be generalized to the case T = ∞.

For us, the most important application of the Girsanov theorem is the following example.

Example 10.2. (Elimination of the drift term in a linear SDE) Consider the linear SDE

dXt = bXt dt + σXt dWt, t ∈ [0, T]; X0 = e^y. (10.3)

From Example 9.16 we know that the strong solution to this equation is

Xt = e^{y + (b − σ²/2)t + σWt}, t ∈ [0, T].

This is not a martingale under P . Let us introduce a process

Bt = Wt − qt, t ∈ [0, T ],

with q = −b/σ. We may now rewrite equation (10.3) as

dXt = σXt dBt, t ∈ [0, T]; X0 = e^y. (10.4)

Due to the Girsanov theorem the process B is a SBM under Q. Thus the process Xt = e^{y + (b − σ²/2)t + σWt} = e^{y − σ²t/2 + σBt}, which is the strong solution to both (10.3) and (10.4), is a martingale under Q. This observation plays a crucial role in the so-called risk-neutral approach to pricing financial assets.


Exercise 2. Verify that the strong solution to (10.4) is a martingale with respect to the filtration ℱt.

Exercise 3. Write down an SDE satisfied by the process ηt of (10.2).

10.2 Girsanov transformation relative to Poisson distributions

10.2.1 Poisson random variables

Let ν P∼ Pλ be a Poisson random variable with parameter λ under P, so

P(ν = k) = e^{−λ} λ^k / k!

for k = 0, 1, 2, . . . , and zero otherwise. Letting w(k) = e^{(λ − γ) − k ln(λ/γ)} for some γ > 0, define a new measure Q on (Ω, ℱ) by

dQ(ω)/dP(ω) = η(ω),

where η = w(ν). Note

EP η = ∫ η(ω) P(dω) = Σ_k e^{(λ − γ) − k ln(λ/γ)} e^{−λ} λ^k / k! = 1,

showing that Q is a probability measure (with total mass equal to one). Observe now that

Q(ν = k) = ∫_{ω: ν(ω)=k} dQ(ω) = ∫_{ω: ν(ω)=k} η(ω) dP(ω)
         = e^{(λ − γ) − k ln(λ/γ)} P(ν = k) = e^{−γ} γ^k / k!

for k = 0, 1, 2, . . . , and zero otherwise. Thus,

ν Q∼ Pγ.
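This discrete change of measure is easy to verify numerically. The sketch below (not from the notes; λ and γ are illustrative) reweights Poisson(λ) samples by w(ν) and checks that the result has Poisson(γ) statistics.

```python
import numpy as np

# Reweight Poisson(lam) samples by w(k) = e^{lam - gam} (gam/lam)^k
# and check the reweighted law is Poisson(gam).
# (Sketch; parameters illustrative.)
rng = np.random.default_rng(4)
lam, gam = 2.0, 3.5
nu = rng.poisson(lam, size=1_000_000)
w = np.exp(lam - gam) * (gam / lam) ** nu

mass = w.mean()             # total mass of Q ≈ 1
mean_Q = (w * nu).mean()    # E_Q[nu] ≈ gam
print(mass, mean_Q)
```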

10.2.2 Poisson process

Let Nt be a PP(λ) on some filtered probability space (Ω, ℱt, P). Now, for γ > 0, define a process ηt by

ηt(ω) = e^{(λ − γ)t − Nt(ω) ln(λ/γ)}, t ∈ [0, T], (10.5)

and then define a new measure Q on (Ω, ℱT) by

dQ(ω) = ηT(ω) dP(ω).

An appropriate version of the Girsanov Theorem states that:

∙ The process ηt is a martingale with respect to the filtration ℱt under the measure P,

∙ The measure Q is a probability measure,

∙ The process Nt is a PP(γ) under the measure Q.


Example 10.3. (Elimination of the drift term in a linear SDE) Consider the linear SDE

dXt = Xt−(−γ dt + dNt), t ∈ [0, T], X0 = 1. (10.6)

The process Xt is not a martingale with respect to ℱt under P. The above equation can of course be written as

dXt = Xt− dMt, t ∈ [0, T]; X0 = 1, (10.7)

where we let Mt = Nt − γt. So the process Xt is a martingale with respect to ℱt under Q.

Exercise 4. Verify that the strong solution to (10.7) is a martingale with respect to thenatural filtration of N under Q.

Exercise 5. Take λ = 1. Verify that the process η of (10.5) is the strong solution to

dηt = (γ − 1) ηt−(−dt + dNt), t ∈ [0, T]; η0 = 1.

10.3 Girsanov transformation relative to both Brownian mo-tion and Poisson process

Girsanov transformation can be applied jointly to a pair (X, N) where X and N are a Brownian motion and a Poisson process, respectively, defined on the same probability space, say (Ω, ℱ, P).

If X ∼ BM(x, b, σ) and N ∼ PP(λ), then one can apply a simultaneous change of the probability measure P to a new measure Q so that under the new measure we have that X ∼ BM(x, m, σ) and N ∼ PP(γ).

10.4 Abstract Bayes formula

Suppose that χ is an ℱT-measurable and integrable random variable and let Q be defined via P by dQ/dP = ηT, for some positive P-martingale η with unit mean. We shall admit the following lemma.

Lemma 10.1. A process X is a Q-local martingale if and only if ηX is a P-local martingale.

As a consequence,

Corollary 10.2. The following Bayes formula holds:

ηt EQ(χ | ℱt) = EP(ηT χ | ℱt). (10.8)

Proof. The process EQ(χ | ℱt) is a (Doob) Q-martingale, and therefore, by the lemma, ηt EQ(χ | ℱt) is a P-martingale. Since ηt EQ(χ | ℱt) is a P-martingale with terminal condition ηT χ at T, (10.8) follows (admitting the required integrability). □


Homework 11: Girsanov transformations

1. Solve Exercises 1-5 above.

2. Let Nt, t ∈ [0, T] be a PP(λ) on a filtered probability space (ℱt, P). Let X be the strong solution to the following SDE on this space:

dXt = Xt−(dt+ dNt), t ≥ 0, X0 = 1.

Define a new probability measure Q on (Ω, ℱT) so that the process X is a martingale under Q.


Chapter 11

Feynman-Kac formulas∗

11.1 Linear case

Let X be given as a jump-diffusion and let f = f(t, x) be a function such that f(t, Xt) is a local martingale. Then in view of the Itô formula in canonical form (9.21), one concludes from Proposition 7.2 that the time-differentiable local martingale

(∂t + A) f(t, Xt) dt = df(t, Xt) − ∂f(t, Xt)σt dWt − dM^f_t

is constant. Using for instance BSDE techniques to be mentioned in Section 11.2, this in turn translates into the following partial integro-differential equation (deterministic PIDE) to be satisfied by the function f:

(∂t + A) f(t, x) = 0, x ∈ ℝd. (11.1)

The fundamental situation of this kind corresponds to a Doob-martingale

f(t, Xt) := E(φ(XT) | Xt) = E(φ(XT) | ℱt)

for an integrable terminal condition φ(XT). Here the second equality, which grounds the martingale property of f(t, Xt), holds in virtue of the Markov property of X. In this case the function f can typically be characterized as the unique solution to the PIDE (11.1), along with the terminal condition f = φ at time T.

More generally, given suitable running and terminal cost functions c and φ, and a discount rate function r, let

u(t, Xt) := E(∫_t^T e^{−∫_t^s r(ζ, X_ζ) dζ} c(s, Xs) ds + e^{−∫_t^T r(s, Xs) ds} φ(XT) | Xt)
          = E(∫_t^T e^{−∫_t^s r(ζ, X_ζ) dζ} c(s, Xs) ds + e^{−∫_t^T r(s, Xs) ds} φ(XT) | ℱt), (11.2)

by the Markov property of X. Let βt = e^{−∫_0^t r(s, Xs) ds} denote the discount factor at rate r(t, Xt).

By immediate extensions of the previous computations, one then has:

∙ On one hand, the following (local) martingale that arises from the integration by parts and Itô formulas applied to u(t, Xt):

du(t, Xt) − (∂tu + Au)(t, Xt) dt = ∂u(t, Xt)σt dWt + dM^u_t; (11.3)


∙ On the other hand, the following Doob-martingale (conditional expectation process of an integrable terminal condition) that arises from (11.2):

du(t, Xt) + (c(t, Xt) − r u(t, Xt)) dt. (11.4)

Subtracting (11.3) from (11.4) yields the local martingale

(∂tu + Au + c − ru)(t, Xt) dt (11.5)

which is therefore constant as a time-differentiable local martingale (Proposition 7.2). Also accounting for the terminal condition u = φ at time T, this translates into the following PIDE to be satisfied by the function u:

u(T, x) = φ(x), x ∈ ℝd
(∂tu + Au + c − ru)(t, x) = 0, t < T, x ∈ ℝd (11.6)

The function u can then typically be characterized and computed (including numerically if needed/possible) as the unique solution in some sense to (11.6).
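In a case with a closed form, the probabilistic representation (11.2) of the PIDE solution can be checked by simulation. The sketch below (not from the notes; all parameters illustrative) takes a geometric Brownian motion dXt = bXt dt + σXt dWt, a constant rate r, no running cost, and terminal cost φ(x) = x, for which u(0, x0) = e^{−rT} E[XT] = x0 e^{(b−r)T}.

```python
import numpy as np

# Monte Carlo evaluation of the Feynman-Kac formula
# u(0, x0) = E[e^{-rT} phi(X_T)] for GBM and phi(x) = x,
# against the closed form x0 * e^{(b-r)T}.
# (Sketch; all parameters illustrative.)
rng = np.random.default_rng(5)
b, sigma, r, T, x0 = 0.1, 0.4, 0.05, 1.0, 2.0

W_T = rng.normal(0.0, np.sqrt(T), size=2_000_000)
X_T = x0 * np.exp((b - sigma**2 / 2) * T + sigma * W_T)

u0_mc = np.exp(-r * T) * X_T.mean()
u0_exact = x0 * np.exp((b - r) * T)
print(u0_mc, u0_exact)
```

This "simulate the SDE, average the discounted payoff" recipe is exactly how (11.2) is used numerically when a deterministic PIDE solver is unavailable.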

11.2 Backward Stochastic Differential Equations

The SDEs that were discussed in Chapter 9 were so-called forward SDEs. This is because any solution process of such an equation was supposed to satisfy a given initial condition. In Example 9.3 we considered the equation

dYt = dWt, t ≥ 0; Y0 = 0, (11.7)

with the obvious solution Yt = Wt. In the above equation the initial condition was specified – the equation is to be solved forward in time. The backward version of equation (11.7) would read

dYt = dWt, t ≥ 0; YT = ξ, (11.8)

where ξ is some random variable. In equation (11.8) the terminal condition is specified – the equation is to be solved backward in time, and it is therefore called a backward stochastic differential equation (BSDE). It is rather clear that equation (11.8) is only solvable if ξ = WT + c, where c is a constant, in which case we have Yt = Wt + c. Note that for c = 0 the solution of this equation is the same as the solution of equation (11.7), that is Yt = Wt. Also note that equation (11.8) can be written as

dYt = Zt dWt, t ≥ 0; YT = ξ, (11.9)

where Zt = 1 (observe that the process Z is (trivially) adapted to the filtration ℱt, t ≥ 0, generated by the Brownian motion W). In the case when ξ = WT + c we saw that the BSDE (11.9) admits a solution pair (Yt, Zt) that is adapted to the filtration ℱt, t ≥ 0, where Yt = Wt + c and Zt = 1.

In more generality, consider the BSDE (11.9) where ξ is some random variable measurable with respect to ℱT (not necessarily ξ = WT + c). If we additionally suppose that ξ is square integrable, then one can show that BSDE (11.9) admits a solution pair (Yt, Zt) that is adapted to the filtration ℱt, t ≥ 0, where Yt = E(ξ | ℱt) and Zt is the unique process appearing in the so-called Brownian martingale representation of ξ, that is, ξ = E(ξ) + ∫_0^T Zs dWs. We then have

Yt = Y0 + ∫_0^t Zs dWs = E(ξ) + ∫_0^t Zs dWs = ξ − ∫_t^T Zs dWs.


11.2.1 Non-linear Feynman-Kac formula

In the jump-diffusion set-up of Subsection 11.1, one can in view of (11.3) and (11.6) interpret the triplet of processes (parameterized by x ∈ ℝd in the case of V)

(Yt, Zt, Vt(x)) := (u(t, Xt), ∂u(t, Xt)σ(t, Xt), δu(t, Xt−, x)) (11.10)

as a solution to the following BSDE:

YT = φ(XT) and for t < T,
−dYt = (c(t, Xt) − r(t, Xt)Yt) dt − Zt dWt − (Vt(J(t)) dNt − λ V̄t dt), (11.11)

with V̄t := ∫ℝd Vt(x) w(dx). In the case of a diffusion X (without jumps, so δ = 0), there is no component V involved in the solution (or formally V = 0 above). In a BSDE perspective, the Feynman-Kac formula (11.2), written in the equivalent

form (as obvious from (11.11))

Yt = E(∫_t^T (c(s, Xs) − r(s, Xs)Ys) ds + φ(XT) | ℱt), (11.12)

is regarded as the Feynman-Kac representation of the solution (Y, Z, V) of the BSDE (11.11). In fact, the simplest way to rigorously solve (11.6) in a function u satisfying (11.2) is actually to go the other way round, namely solving “upfront” (11.11) in a triplet of processes (Yt, Zt, Vt(x)), and redoing the above computations in reverse order to establish that the function u then defined via u(t, Xt) = Yt solves (11.6) and satisfies (11.2).

Note that the “intrinsic” (non discounted) form (11.12) of the Feynman-Kac representation (11.2) is implicit, meaning that the right-hand side of (11.12) also depends on Y. In this case this is not a real issue however, as revealed by the equivalent explicit discounted representation (11.2). Now, the power of BSDEs lies precisely in the fact that this theory allows one to solve more general problems than the linear equations (11.6), (11.11), namely nonlinear problems in which the BSDE coefficient, g(t, Xt, Yt) := c(t, Xt) − r(t, Xt)Yt in the case of (11.6), (11.11), depends nonlinearly on Y, and possibly also on Z and V. Let us thus consider the following BSDE to be solved in a triplet of processes (Yt, Zt, Vt(x)):

YT = φ(XT) and for t < T,
−dYt = g(t, Xt, Yt, Zt, V̄t) dt − Zt dWt − (Vt(J(t)) dNt − λ V̄t dt) (11.13)

where V̄t := ∫ℝd Vt(y) ρ(t, Xt, y) w(dy) for a suitable (possibly vector-valued) integration kernel ρ. Let now a function u = u(t, x) solve the following semilinear PIDE:

u(T, x) = φ(x), x ∈ ℝd
∂tu(t, x) + Au(t, x) + g(t, x, u(t, x), ∂u(t, x)σ(t, x), δ̃u(t, x)) = 0, t < T, x ∈ ℝd (11.14)

with δ̃u(t, x) := ∫ℝd δu(t, x, y) ρ(t, x, y) w(dy).

Straightforward extensions of the computations having led from (11.6) to (11.11) show that the triplet (Y, Z, V) given in terms of u by formula (11.10) solves the nonlinear BSDE (11.13). For this reason formula (11.10) is known as a nonlinear Feynman-Kac formula.


11.2.2 Optimal stopping

BSDEs allow one to deal not only with semilinearities, referring to the possible nonlinear dependence of the coefficient g with respect to (Y, Z, V), but also with nonlinearities resulting from optimal stopping features which may be involved in a control problem.

For instance, one may consider, instead of u in (11.2), the function v = v(t, x) such that

v(t, Xt) := ess sup_{τ ∈ T[t,T]} E(∫_t^τ e^{−∫_t^s r(ζ, X_ζ) dζ} c(s, Xs) ds + e^{−∫_t^τ r(s, Xs) ds} φ(X_τ) | Xt) (11.15)

or equivalently in implicit intrinsic form:

v(t, Xt) = ess sup_{τ ∈ T[t,T]} E(∫_t^τ (c(s, Xs) − r(s, Xs) v(s, Xs)) ds + φ(X_τ) | Xt). (11.16)

In these dynamic programming equations, T[t,T] denotes the set of all [t, T]-valued ℱ-stopping times. The set T[t,T] is uncountable, so care is needed in taking the supremum over an uncountable family of random variables in the r.h.s. of (11.15); this is taken care of by the use of the essential supremum (“ess sup”). In particular, one has v(t, x) ≥ φ(x), which arises from considering τ ≡ t in (11.15), and of course v(t, x) ≥ u(t, x). In (11.15), as in (11.2), the conditioning with respect to ℱt can be replaced by the conditioning with respect to Xt, by the Markov property of a jump-diffusion X.

Computations similar to the above ones allow one to establish the “usual” connection (11.10) (nonlinear Feynman-Kac formula) between:

∙ On one hand, the solution (Y, Z, V, A) to the following reflected BSDE:

YT = φ(XT) and for t < T:
−dYt = g(t, Xt, Yt, Zt, V̄t) dt + dAt − Zt dWt − (Vt(J(t)) dNt − λ V̄t dt),
Yt ≥ φ(Xt), (Yt − φ(Xt)) dAt = 0; (11.17)

Here A represents a further adapted, continuous and non-decreasing process which is required for preventing the component Yt of the solution from falling below the barrier level φ(Xt); this non-decreasing process is only allowed to increase on the random set {Yt = φ(Xt)}, as imposed by the minimality condition in the third line;

∙ On the other hand, the following obstacle problem:

v(T, x) = φ(x), x ∈ ℝd
max(∂tv(t, x) + Av(t, x) + g(t, x, v(t, x), ∂v(t, x)σ(t, x), δ̃v(t, x)), φ(x) − v(t, x)) = 0, t < T, x ∈ ℝd. (11.18)

The corresponding dynamic programming equation generalizing (11.16) reads as follows:

Yt = ess sup_{τ ∈ T[t,T]} E(∫_t^τ g(s, Xs, Ys, Zs, V̄s) ds + φ(X_τ) | Xt). (11.19)


The proof of these results involves some technicalities though, since, in particular, the value function v of an optimal stopping problem is well known to be only of class C^{1,1} at the boundary {v = φ} of the continuation region {v > φ}. This implies that v being a “solution” to (11.18) (which involves, through the generator A, the second derivatives in space of v) can only be understood in a weak sense. But again BSDEs are useful here, since there is a well established theory connecting solutions of BSDEs to the so-called viscosity class of weak solutions to nonlinear PDEs or PIDEs.

Finally, it is important to have in mind that BSDEs are not only useful theoretically, as we showed, but also in practice, when the dimension d of X is such that the numerical solution of a nonlinear P(I)DE by a deterministic scheme is ruled out by Bellman's “curse of dimensionality.” For solving such high-dimensional nonlinear problems, BSDE-based simulation schemes are the only viable alternative.