
Stochastic Differential Equations

Lecture notes for courses given

at Humboldt University Berlin

and University of Heidelberg

Markus ReißInstitute of Applied Mathematics

University of Heidelberg

http://math.uni-heidelberg.de/studinfo/reiss

This version: February 12, 2007

Contents

1 Stochastic integration
  1.1 White Noise
  1.2 The Ito Integral
    1.2.1 Construction in L2
    1.2.2 Properties
    1.2.3 Doob's Martingale Inequality
    1.2.4 Extension of the Ito integral
    1.2.5 The Fisk-Stratonovich integral
    1.2.6 Multidimensional Case
    1.2.7 Ito's formula

2 Strong solutions of SDEs
  2.1 The strong solution concept
  2.2 Uniqueness
  2.3 Existence
  2.4 Explicit solutions
    2.4.1 Linear Equations
    2.4.2 Transformation methods

3 Weak solutions of SDEs
  3.1 The weak solution concept
  3.2 The two concepts of uniqueness
  3.3 Existence via Girsanov's theorem
  3.4 Applications in finance and statistics

4 The Markov properties
  4.1 General facts about Markov processes
  4.2 The martingale problem
  4.3 The strong Markov property
  4.4 The infinitesimal generator
  4.5 The Kolmogorov equations
  4.6 The Feynman-Kac formula

5 Stochastic control: an outlook

Bibliography

Notation

The notation follows the usual conventions; nevertheless, the general mathematical symbols that will be used are gathered in the first table. The notation for the different function spaces is presented in the second table. The last table shows some regularly used own notation.

General symbols

A := B              A is defined by B
[a, b], (a, b)      closed, open interval from a to b
N, N0, Z            {1, 2, . . .}, {0, 1, . . .}, {0, +1, −1, +2, −2, . . .}
R, R+, R−, C        (−∞, ∞), [0, ∞), (−∞, 0], complex numbers
Re(z), Im(z), z̄     real part, imaginary part, complex conjugate of z ∈ C
⌊x⌋                 largest integer smaller or equal to x ∈ R
⌈x⌉                 smallest integer larger or equal to x ∈ R
a ∨ b, a ∧ b        maximum, minimum of a and b

|x|                 modulus of x ∈ R or Euclidean norm of x ∈ R^d
A ⊂ B               A is contained in B or A = B
span(v, w, . . .)   the subspace spanned by v, w, . . .
U + V, U ⊕ V        the sum, the direct sum (U ∩ V = {0}) of U and V
dim V, codim V      linear dimension, codimension of V
ran T, ker T        range and kernel of the operator T
E_d                 identity matrix in R^{d×d}
det(M)              determinant of M
‖T‖, ‖T‖_{X→Y}      operator norm of T : X → Y
f(•), g(•1, •2)     the functions x ↦ f(x), (x1, x2) ↦ g(x1, x2)
supp(f)             support of the function f
f|_S                function f restricted to the set S
f′, f″, f^(m)       first, second, m-fold (weak) derivative of f
f′(a+)              derivative of f at a to the right
1_S                 indicator function of the set S
f̂, F(f)             f̂(ξ) = F(f)(ξ) = ∫_R f(t)e^{−iξt} dt, or estimator f̂ of f
â, F(a), a ∈ M(I)   â(ξ) = F(a)(ξ) = ∫_I e^{−iξt} da(t)
log                 natural logarithm
cos, sin, cosh, sinh  (hyperbolic) trigonometric functions


P, E, Var, Cov      probability, expected value, variance and covariance
L(X), X ∼ P         the law of X, L(X) = P
Xn →(P) X           Xn converges P-stochastically to X
Xn →(L) X           Xn converges in law to X
N(µ, σ²)            normal distribution with mean µ and variance σ²
σ(Zi, i ∈ I)        σ-algebra generated by (Zi)_{i∈I}
δx                  Dirac measure at x
A ≲ B               A = O(B), i.e. ∃ c > 0 ∀ p : A(p) ≤ cB(p) (p parameter)
A ≳ B               B ≲ A
A ∼ B               A ≲ B and B ≲ A

Function spaces and norms

Lp(I, R^d)          p-integrable functions f : I → R^d (∫_I |f|^p < ∞)
C(I, R^d)           {f : I → R^d | f continuous}
C_K(I, R^d)         {f ∈ C(I, R^d) | f has compact support}
C_0(R^{d1}, R^{d2}) {f ∈ C(R^{d1}, R^{d2}) | lim_{‖x‖→∞} f(x) = 0}
‖f‖_∞               sup_x |f(x)|
‖µ‖_TV              total variation norm: ‖µ‖_TV = sup_{‖f‖_∞=1} ∫ f dµ

Specific definitions

W(t)                Brownian motion at time t
X(t)                solution process to SDE at time t
(Ft)                filtration for (W(t), t ≥ 0)

Chapter 1

Stochastic integration

1.1 White Noise

Many processes in nature involve random fluctuations which we have to account for in our models. In principle, everything can be random, and the probabilistic structure of these random influences can be arbitrarily complicated. As it turns out, the so-called "white noise" plays an outstanding role.

Engineers want the white noise process (W(t), t ∈ R) to have the following properties:

• The random variables {W(t) | t ∈ R} are independent.

• W is stationary, that is, the distribution of (W(t + t1), W(t + t2), . . . , W(t + tn)) does not depend on t.

• The expectation E[W(t)] is zero.

Hence, this process is supposed to model independent and identically distributed shocks with zero mean. Unfortunately, mathematicians can prove that such a real-valued stochastic process cannot have measurable trajectories t ↦ W(t) except for the trivial process W(t) = 0.

1.1.1 Problem. If (t, ω) ↦ W(t, ω) is jointly measurable with E[W(t)²] < ∞ and W has the above stated properties, then for all t ≥ 0

E[(∫_0^t W(s) ds)²] = 0

holds and W(t) = 0 almost surely. Can we relax the hypothesis E[W(t)²] < ∞?

Nevertheless, applications forced people to consider equations like

x′(t) = αx(t) + W(t), t ≥ 0.


The way out of this dilemma is found by looking at the corresponding integrated equation:

x(t) = x(0) + ∫_0^t αx(s) ds + ∫_0^t W(s) ds, t ≥ 0.

What properties should we thus require for the integral process W(t) := ∫_0^t W(s) ds, t ≥ 0? A straight-forward deduction (from wrong premises...) yields:

• W(0) = 0.

• The increments (W(t1) − W(t2), W(t3) − W(t4), . . . , W(tn−1) − W(tn)) are independent for t1 ≥ t2 ≥ · · · ≥ tn.

• The increments are stationary, that is, W(t1 + t) − W(t2 + t) has the same law as W(t1) − W(t2) for all t ≥ 0.

• The expectation E[W(t)] is zero.

• The trajectories t ↦ W(t) are continuous.

The last point is due to the fact that integrals over measurable (and integrable) functions are always continuous. It is highly nontrivial to show that, up to indistinguishability and up to the norming Var[W(1)] = 1, the only stochastic process fulfilling these properties is Brownian motion (also known as the Wiener process) (Øksendal 1998). Recall that Brownian motion is almost surely nowhere differentiable!
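As a numerical illustration (not part of the original notes), the following Python sketch simulates a Brownian path by cumulating independent N(0, dt) increments and checks empirically that W(0) = 0 and Var[W(1)] = 1; the function name and parameters are chosen here for illustration only.

```python
import math
import random

def brownian_path(n_steps, t_max=1.0, rng=None):
    """Simulate Brownian motion on [0, t_max] at n_steps + 1 grid points
    by cumulating independent N(0, dt) increments, starting from W(0) = 0."""
    rng = rng or random.Random()
    dt = t_max / n_steps
    w = [0.0]
    for _ in range(n_steps):
        w.append(w[-1] + rng.gauss(0.0, math.sqrt(dt)))
    return w

# Empirical check: over many independent paths, W(1) should have
# mean 0 and variance 1 (the norming Var[W(1)] = 1 from the text).
rng = random.Random(0)
samples = [brownian_path(16, 1.0, rng)[-1] for _ in range(20000)]
mean = sum(samples) / len(samples)
var = sum((x - mean) ** 2 for x in samples) / len(samples)
```

The grid size does not matter for the law of W(1): a sum of n independent N(0, 1/n) variables is exactly N(0, 1).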

Rephrasing the stochastic differential equation, we now look for a stochastic process (X(t), t ≥ 0) satisfying

X(t) = X(0) + ∫_0^t αX(s) ds + W(t), t ≥ 0, (1.1.1)

where (W(t), t ≥ 0) is a standard Brownian motion. The precise formulation involving filtrations will be given later; here we shall focus on finding processes X solving (1.1.1).

The so-called variation of constants approach in ODEs would suggest the solution

X(t) = X(0)e^{αt} + ∫_0^t e^{α(t−s)} dW(s), (1.1.2)

which we give a sense (in fact, that was Wiener's idea) by partial integration:

X(t) = X(0)e^{αt} + W(t) + ∫_0^t αe^{α(t−s)} W(s) ds. (1.1.3)

This makes perfect sense now since Brownian motion is (almost surely) continuous and we could even take the Riemann integral. The verification that (1.1.3) defines a solution is straightforward:

∫_0^t αX(s) ds = X(0) ∫_0^t αe^{αs} ds + α ∫_0^t W(s) ds + α² ∫_0^t ∫_0^s e^{α(s−u)} W(u) du ds

             = X(0)(e^{αt} − 1) + α ∫_0^t W(s) ds + α² ∫_0^t W(u) ∫_u^t e^{α(s−u)} ds du

             = X(0)(e^{αt} − 1) + ∫_0^t αW(u)e^{α(t−u)} du

             = X(t) − X(0) − W(t).

Note that the initial value X(0) can be chosen arbitrarily. The expectation µ(t) := E[X(t)] = E[X(0)]e^{αt} exists if X(0) is integrable. Surprisingly, this expectation function satisfies the deterministic linear equation; hence it converges to zero for α < 0 and explodes for α > 0. How about the variation around this mean value? Let us suppose that X(0) is deterministic, α ≠ 0, and consider the variance function

v(t) := Var[X(t)] = E[(W(t) + ∫_0^t αe^{α(t−s)} W(s) ds)²]

  = E[W(t)²] + 2 ∫_0^t αe^{α(t−s)} E[W(t)W(s)] ds + ∫_0^t ∫_0^t α² e^{α(2t−u−s)} E[W(s)W(u)] du ds

  = t + 2 ∫_0^t αe^{α(t−s)} s ds + 2 ∫_0^t ∫_s^t α² e^{α(2t−u−s)} s du ds

  = t + ∫_0^t (2αe^{α(t−s)} s + 2α(e^{2α(t−s)} − e^{α(t−s)}) s) ds

  = (e^{2αt} − 1)/(2α).

This shows that for α < 0 the variance converges to 1/(2|α|), indicating a stationary behaviour, which will be made precise in the sequel. On the other hand, for α > 0 we find that the standard deviation √v(t) grows with the same order as µ(t) for t → ∞, which lets us expect a very erratic behaviour. In anticipation of the Ito calculus, the preceding calculation can be simplified by regarding (1.1.2) directly. The second moment of ∫_0^t e^{α(t−s)} dW(s) is immediately seen to be ∫_0^t e^{2α(t−s)} ds, the above value.
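The stationary variance 1/(2|α|) can be checked by simulation. The following Python sketch is illustrative only: it uses a simple Euler scheme for dX = αX dt + dW (a discretisation not discussed in the notes, which introduces a small bias of order dt) and estimates Var[X(t)] for α = −1 at a large time t, where the theoretical limit is 1/2.

```python
import math
import random

def simulate_ou(alpha, t_max, dt, n_paths, seed=0):
    """Euler scheme for dX = alpha*X dt + dW with X(0) = 0;
    returns the terminal values X(t_max) of n_paths independent paths."""
    rng = random.Random(seed)
    n_steps = int(t_max / dt)
    finals = []
    for _ in range(n_paths):
        x = 0.0
        for _ in range(n_steps):
            x += alpha * x * dt + rng.gauss(0.0, math.sqrt(dt))
        finals.append(x)
    return finals

finals = simulate_ou(alpha=-1.0, t_max=4.0, dt=0.02, n_paths=4000)
mean = sum(finals) / len(finals)
var = sum((x - mean) ** 2 for x in finals) / len(finals)
# theory: variance tends to 1/(2*|alpha|) = 0.5 for alpha = -1
```

For α > 0 the same code would show the variance exploding like e^{2αt}/(2α), matching the formula above.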

1.1.2 Problem. Justify the name "white noise" by calculating the expectation and the variance of the Fourier coefficients of W on [0, 1] by formal partial integration, i.e. using formally

a_k = ∫_0^1 W(t) √2 sin(2πkt) dt = −∫_0^1 W(t) 2πk √2 cos(2πkt) dt

and the analogon for the cosine coefficients. Conclude that the coefficients are i.i.d. standard normal, hence the intensity of each frequency component is equally strong ("white").


1.2 The Ito Integral

1.2.1 Construction in L2

We shall only need the Ito integral with respect to Brownian motion, so the general semimartingale theory will be left out. From now on we shall always be working on a complete probability space (Ω, F, P) where a filtration (Ft)_{t≥0}, that is a nested family of σ-fields Fs ⊂ Ft ⊂ F for s ≤ t, is defined that satisfies the usual conditions:

• Fs = ⋂_{t>s} Ft for all s ≥ 0 (right-continuity);

• all A ∈ F with P(A) = 0 are contained in F0.

A family (X(t), t ≥ 0) of R^d-valued random variables on our probability space is called a stochastic process, and this process is (Ft)-adapted if all X(t) are Ft-measurable. Denoting the Borel σ-field on [0, ∞) by B, this process X is measurable if (t, ω) ↦ X(t, ω) is a B ⊗ F-measurable mapping. We say that (X(t), t ≥ 0) is continuous if the trajectories t ↦ X(t, ω) are continuous for all ω ∈ Ω. One can show that a process is measurable if it is (right-)continuous (Karatzas and Shreve 1991, Thm. 1.14).

1.2.1 Definition. A (standard one-dimensional) Brownian motion with respect to the filtration (Ft) is a continuous (Ft)-adapted real-valued process (W(t), t ≥ 0) such that

• W(0) = 0;

• for all 0 ≤ s ≤ t: W(t) − W(s) is independent of Fs;

• for all 0 ≤ s ≤ t: W(t) − W(s) is N(0, t − s)-distributed.

1.2.2 Remark. Brownian motion can be constructed in different ways (Karatzas and Shreve 1991), but the proof of the existence of such a process is in any case non-trivial.

We shall often consider a larger filtration (Ft) than the canonical filtration (F_t^W) of Brownian motion in order to include random initial conditions. Given a Brownian motion W′ on a probability space (Ω′, F′, P′) with the canonical filtration F′_t = σ(W′(s), s ≤ t) and a random variable X″_0 on a different space (Ω″, F″, P″), we can construct the product space with Ω = Ω′ × Ω″, F = F′ ⊗ F″, P = P′ ⊗ P″ such that W(t, ω′, ω″) := W′(t, ω′) and X_0(ω′, ω″) := X″_0(ω″) are independent and W is an (Ft)-Brownian motion for Ft = σ(X_0; W(s), s ≤ t). Note that X_0 is F_0-measurable, which always implies that X_0 and W are independent.

Our aim here is to construct the integral ∫_0^t Y(s) dW(s) with Brownian motion as integrator and a fairly general class of stochastic integrands Y.

1.2.3 Definition. Let V be the class of real-valued stochastic processes (Y(t), t ≥ 0) that are adapted, measurable and that satisfy

‖Y‖_V := (∫_0^∞ E[Y(t)²] dt)^{1/2} < ∞.


A process Y ∈ V is called simple if it is of the form

Y(t, ω) = Σ_{i=0}^∞ η_i(ω) 1_{[t_i, t_{i+1})}(t),

with an increasing sequence (t_i)_{i≥0} and F_{t_i}-measurable random variables η_i.

For such simple processes Y ∈ V we naturally define

∫_0^∞ Y(t) dW(t) := Σ_{i=0}^∞ η_i (W(t_{i+1}) − W(t_i)). (1.2.1)

1.2.4 Proposition. The right hand side in (1.2.1) converges in L²(P), hence the integral ∫_0^∞ Y(t) dW(t) is a P-almost surely well defined random variable. Moreover, the following isometry is valid for simple processes Y:

E[(∫_0^∞ Y(t) dW(t))²] = ‖Y‖²_V.

Proof. We show that the partial sums S_k := Σ_{i=0}^k η_i(W(t_{i+1}) − W(t_i)) form a Cauchy sequence in L²(P). Let k ≤ l; then by the independence and zero mean property of Brownian increments we obtain

E[(S_l − S_k)²] = Σ_{i=k+1}^l E[(η_i(W(t_{i+1}) − W(t_i)))²]
                  + 2 Σ_{k+1≤i<j≤l} E[η_i(W(t_{i+1}) − W(t_i))η_j] E[W(t_{j+1}) − W(t_j)]

              = Σ_{i=k+1}^l E[(η_i(W(t_{i+1}) − W(t_i)))²]

              = Σ_{i=k+1}^l E[η_i²](t_{i+1} − t_i)

              = ∫_{t_{k+1}}^{t_{l+1}} E[Y(t)²] dt.

Due to ‖Y‖_V < ∞ the last line tends to zero for k, l → ∞. By the completeness of L²(P) the Ito integral of Y is therefore well defined as the L²-limit of (S_k). The same calculations as before also show

E[S_k²] = ∫_0^{t_{k+1}} E[Y(t)²] dt.

By taking the limit k → ∞ on both sides the asserted isometry property follows.
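The isometry can be seen numerically. The following Python sketch (an illustration added here, not part of the notes) uses the simple, adapted integrand Y(t) = sign(W(t_i)) on each [t_i, t_{i+1}), for which E[Y(t)²] = 1 and hence ‖Y‖²_V = T = 1, and checks that the Monte Carlo second moment of the sum (1.2.1) is close to 1.

```python
import math
import random

# Monte Carlo check of the Ito isometry for the simple integrand
# Y(t) = sign(W(t_i)) on [t_i, t_{i+1}). Since Y^2 = 1, the theory
# predicts E[(integral over [0,1])^2] = ||Y||_V^2 = 1.
rng = random.Random(1)
n_steps, n_paths = 50, 20000
dt = 1.0 / n_steps
integrals = []
for _ in range(n_paths):
    w, integral = 0.0, 0.0
    for _ in range(n_steps):
        eta = 1.0 if w >= 0 else -1.0      # F_{t_i}-measurable coefficient
        dw = rng.gauss(0.0, math.sqrt(dt))  # W(t_{i+1}) - W(t_i)
        integral += eta * dw                # term eta_i * increment
        w += dw
    integrals.append(integral)
mean = sum(integrals) / n_paths
second_moment = sum(x * x for x in integrals) / n_paths
```

Using the left endpoint for eta is essential: an integrand peeking at the future increment would break both adaptedness and the isometry.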


The main idea for extending the Ito integral to general integrands in V is to show that the simple processes lie dense in V with respect to the ‖•‖_V-seminorm and to use the isometry to define the integral by approximation.

1.2.5 Proposition. For any process Y ∈ V there is a sequence of simple processes (Y_n)_{n≥1} in V with lim_{n→∞} ‖Y − Y_n‖_V = 0.

Proof. We proceed by relaxing the assumptions on Y step by step:

1. Y is continuous, |Y(t)| ≤ K for t ≤ T and Y(t) = 0 for t ≥ T: Set t_i^n := i/n and define

Y_n(t) := Σ_{i=0}^{Tn−1} Y(t_i^n) 1_{[t_i^n, t_{i+1}^n)}(t).

Then Y_n is clearly a simple process in V and by the continuity of Y the processes Y_n converge to Y pointwise for all (t, ω). Since ‖Y_n‖²_V ≤ TK² holds, the dominated convergence theorem implies lim_{n→∞} ‖Y − Y_n‖_V = 0.

2. |Y(t)| ≤ K for t ≤ T and Y(t) = 0 for t ≥ T, T ∈ N: Y can be approximated by continuous functions Y_n in V with these properties (only T replaced by T + 1). For this suppose that h : [0, ∞) → [0, ∞) is continuous, satisfies h(t) = 0 for t ≥ 1 and ∫ h = 1. For n ∈ N define the convolution

Y_n(t) := ∫_0^t Y(s) n h(n(t − s)) ds.

Then Y_n is continuous, has support in [0, T + 1/n] and satisfies |Y_n(t)| ≤ K for all ω. Moreover, Y_n(t) is Ft-adapted so that Y_n ∈ V holds. Real analysis shows that ∫ (Y_n − Y)² → 0 holds for n → ∞ and all ω, so the assertion follows again by dominated convergence.

3. Y ∈ V arbitrary: The processes

Y_n(t) := 0 for t ≥ n; Y_n(t) := Y(t) if |Y(t)| ≤ n, t < n; Y_n(t) := n if Y(t) > n, t < n; Y_n(t) := −n if Y(t) < −n, t < n

are as in the preceding step with T = K = n. Moreover, they converge to Y pointwise and satisfy |Y_n(t, ω)| ≤ |Y(t, ω)| for all (t, ω) so that dominated convergence gives lim_{n→∞} ‖Y_n − Y‖_V = 0.

Putting the different approximations together completes the proof.

By the completeness of L²(P) and the isometry in Proposition 1.2.4 the following definition of the Ito integral makes sense; in particular, it does not depend on the approximating sequence.


1.2.6 Definition. For any Y ∈ V choose a sequence (Y_n) of simple processes with lim_{n→∞} ‖Y_n − Y‖_V = 0 and define the Ito integral by

∫_0^∞ Y(t) dW(t) := lim_{n→∞} ∫_0^∞ Y_n(t) dW(t),

where the limit is understood in an L²(P)-sense.

For 0 ≤ A ≤ B and Y ∈ V we set

∫_A^B Y(t) dW(t) = ∫_0^∞ Y(t) 1_{[A,B]}(t) dW(t).

1.2.7 Problem.

1. The quadratic covariation up to time t between two functions f, g : R+ → R is given by

〈f, g〉_t = lim_{|Π|→0} Σ_{t_i∈Π} (f(t_{i+1} ∧ t) − f(t_i ∧ t))(g(t_{i+1} ∧ t) − g(t_i ∧ t)) for all t ≥ 0,

if the limit exists, where Π denotes a partition given by real numbers (t_i) with t_0 = 0, t_i ↑ ∞ and width |Π| = max_i(t_{i+1} − t_i). We call 〈f〉_t := 〈f, f〉_t the quadratic variation of f. Show that Brownian motion satisfies 〈W〉_t = t for t ≥ 0, when the involved limit is understood to hold in probability. Hint: consider convergence in L²(P).

2. Show that the process X with X(t) = W(t) 1_{[0,T]}(t) is in V for any T ≥ 0. Prove the identity

∫_0^T W(t) dW(t) = ∫_0^∞ X(t) dW(t) = ½ W(T)² − ½ T.

Hint: Consider X_n = Σ_{k=0}^{n−1} W(kT/n) 1_{[kT/n, (k+1)T/n)} and use part 1.
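Both identities of the problem can be observed on a single fine partition. The Python sketch below (an added illustration, not a solution from the notes) accumulates the left-point Riemann-Ito sums and the quadratic variation sums along one simulated path of [0, 1].

```python
import math
import random

# Pathwise check on one fine partition of [0, 1]:
#   sum_i W(t_i)(W(t_{i+1}) - W(t_i))  ~  (1/2) W(T)^2 - (1/2) T,
#   sum_i (W(t_{i+1}) - W(t_i))^2      ~  T   (quadratic variation).
rng = random.Random(2)
T, n = 1.0, 200000
dt = T / n
w, ito_sum, quad_var = 0.0, 0.0, 0.0
for _ in range(n):
    dw = rng.gauss(0.0, math.sqrt(dt))
    ito_sum += w * dw        # left endpoint: W(t_i) * increment
    quad_var += dw * dw      # increment squared
    w += dw
lhs = ito_sum
rhs = 0.5 * w * w - 0.5 * T
```

The algebraic identity W(t_i)∆W = ½(W(t_{i+1})² − W(t_i)² − (∆W)²) shows that lhs − rhs = ½(T − quad_var), so the two checks are really one and the same statement.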

1.2.2 Properties

In this subsection we gather the main properties of the Ito integral without givingproofs. Often the properties are trivial for simple integrands and follow by approxi-mation for the general case, the continuity property will be shown in Corollary 1.2.11.Good references are Øksendal (1998) and Karatzas and Shreve (1991).


1.2.8 Theorem. Let X and Y be processes in V. Then:

(a) E[(∫_0^∞ X(t) dW(t))²] = ‖X‖²_V (Ito isometry)

(b) E[∫_0^∞ X(t) dW(t) · ∫_0^∞ Y(t) dW(t)] = ∫_0^∞ E[X(t)Y(t)] dt

(c) ∫_A^C X(t) dW(t) = ∫_A^B X(t) dW(t) + ∫_B^C X(t) dW(t) P-a.s. for all 0 ≤ A ≤ B ≤ C

(d) ∫_0^∞ (cX(t) + Y(t)) dW(t) = c ∫_0^∞ X(t) dW(t) + ∫_0^∞ Y(t) dW(t) P-a.s. for all c ∈ R

(e) E[∫_0^∞ X(t) dW(t)] = 0

(f) ∫_0^t X(s) dW(s) is Ft-measurable for t ≥ 0

(g) (∫_0^t X(s) dW(s), t ≥ 0) is an (Ft)-martingale

(h) (∫_0^t X(s) dW(s), t ≥ 0) has a continuous version

(i) 〈∫_0^• X(s) dW(s), ∫_0^• Y(s) dW(s)〉_t = ∫_0^t X(s)Y(s) ds (quadratic covariation process)

(j) X(t)W(t) = ∫_0^t X(s) dW(s) + ∫_0^t W(s) dX(s) P-a.s. for X with bounded variation

1.2.3 Doob’s Martingale Inequality

1.2.9 Theorem. Suppose (X_n, F_n)_{0≤n≤N} is a martingale. Then for every p ≥ 1 and λ > 0

λ^p P(sup_{0≤n≤N} |X_n| ≥ λ) ≤ E[|X_N|^p],

and for every p > 1

E[sup_{0≤n≤N} |X_n|^p] ≤ (p/(p−1))^p E[|X_N|^p].

Proof. Introduce the stopping time τ := inf{n | |X_n| ≥ λ} ∧ N. Since (|X_n|^p) is a submartingale the optional stopping theorem gives

E[|X_N|^p] ≥ E[|X_τ|^p] ≥ λ^p P(sup_n |X_n| ≥ λ) + E[|X_N|^p 1_{sup_n |X_n| < λ}],


which proves the first part. Moreover, we deduce from this inequality for any K > 0 and p > 1

E[(sup_n |X_n| ∧ K)^p] = E[∫_0^K pλ^{p−1} 1_{sup_n |X_n| ≥ λ} dλ]

                       ≤ ∫_0^K pλ^{p−2} E[|X_N| 1_{sup_n |X_n| ≥ λ}] dλ

                       = p E[|X_N| ∫_0^{sup_n |X_n| ∧ K} λ^{p−2} dλ]

                       = (p/(p−1)) E[|X_N| (sup_n |X_n| ∧ K)^{p−1}].

By Hölder's inequality,

E[(sup_n |X_n| ∧ K)^p] ≤ (p/(p−1)) E[(sup_n |X_n| ∧ K)^p]^{(p−1)/p} E[|X_N|^p]^{1/p},

which after cancellation and taking the limit K → ∞ yields the asserted moment bound.
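For p = 2 the moment bound reads E[sup_n |X_n|²] ≤ 4 E[|X_N|²]. A quick Monte Carlo sanity check (an added illustration, not from the notes) on the simple symmetric random walk, which is a martingale with E[|X_N|²] = N:

```python
import random

# Monte Carlo check of Doob's L^2 inequality
# E[sup_{n<=N} |X_n|^2] <= 4 E[|X_N|^2] for a simple random walk.
rng = random.Random(3)
N, n_paths = 100, 5000
sup_sq_sum = 0.0
final_sq_sum = 0.0
for _ in range(n_paths):
    x, sup_abs = 0, 0
    for _ in range(N):
        x += rng.choice((-1, 1))
        sup_abs = max(sup_abs, abs(x))
    sup_sq_sum += sup_abs * sup_abs
    final_sq_sum += x * x
lhs = sup_sq_sum / n_paths          # estimate of E[sup |X_n|^2]
rhs = 4.0 * final_sq_sum / n_paths  # estimate of 4 E[|X_N|^2], about 4N
```

Empirically lhs sits well below rhs, so the constant (p/(p−1))^p = 4 is not attained by this martingale, though it is sharp over all martingales.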

1.2.10 Corollary. (Doob's L^p-inequality) If (X(t), Ft)_{t∈I} is a right-continuous martingale indexed by a subinterval I ⊂ R, then for any p > 1

E[sup_{t∈I} |X(t)|^p]^{1/p} ≤ (p/(p−1)) sup_{t∈I} E[|X(t)|^p]^{1/p}.

Proof. By the right-continuity of X we can restrict the supremum on the left to a countable subset D ⊂ I. This countable set D can be exhausted by an increasing sequence of finite sets D_n ⊂ D with ⋃_n D_n = D. Then the supremum over D_n increases monotonically to the supremum over D, the preceding theorem applies for each D_n, and the monotone convergence theorem yields the asserted inequality.

Be aware that Doob’s Lp-inequality is different for p = 1 (Revuz and Yor 1999, p.55).

1.2.11 Corollary. For any X ∈ V there exists a version of ∫_0^t X(s) dW(s) that is continuous in t, i.e. a continuous process (J(t), t ≥ 0) with

P(J(t) = ∫_0^t X(s) dW(s)) = 1 for all t ≥ 0.

Proof. Let (X_n)_{n≥1} be an approximating sequence for X of simple processes in V. Then by definition I_n(t) := ∫_0^t X_n(s) dW(s) is continuous in t for all ω. Moreover, I_n(t) is an Ft-martingale so that Doob's inequality and the Ito isometry yield the Cauchy property

E[sup_{t≥0} |I_m(t) − I_n(t)|²] ≤ 4 sup_{t≥0} E[|I_m(t) − I_n(t)|²] = 4 ‖X_m − X_n‖²_V → 0

for m, n → ∞. By the Chebyshev inequality and the Lemma of Borel-Cantelli there exist a subsequence (I_{n_l})_{l≥1} and L(ω) such that P-almost surely

sup_{t≥0} |I_{n_{l+1}}(t) − I_{n_l}(t)| ≤ 2^{−l} for all l ≥ L(ω).

Hence with probability one the sequence (I_{n_l}(t))_{l≥1} converges uniformly and the limit function J(t) is continuous. Since for all t ≥ 0 the random variables (I_{n_l}(t))_{l≥1} converge in probability to the integral I(t) = ∫_0^t X(s) dW(s), the random variables I(t) and J(t) must coincide for P-almost all ω.

In the sequel we shall consider only t-continuous versions of the stochastic integral.

1.2.4 Extension of the Ito integral

We extend the stochastic integral from processes in V to the more general class of processes V*.

1.2.12 Definition. Let V* be the class of real-valued stochastic processes (Y(t), t ≥ 0) that are adapted, measurable and that satisfy

P(∫_0^∞ Y(t)² dt < ∞) = 1.

1.2.13 Theorem. For Y ∈ V* and n ∈ N consider the R+ ∪ {+∞}-valued stopping time (!)

τ_n(ω) := inf{T ≥ 0 | ∫_0^T Y(t, ω)² dt ≥ n}.

Then lim_{n→∞} τ_n = ∞ P-a.s. and

∫_0^∞ Y(t) dW(t) := lim_{n→∞} ∫_0^{τ_n} Y(t) dW(t)

exists as limit in probability. More precisely, we have P-a.s.

∫_0^∞ Y(t) dW(t) = ∫_0^{τ_n} Y(t) dW(t) on {ω | ∫_0^∞ Y(t, ω)² dt < n}.

Proof. That τ_n = ∞ holds for all n ≥ N on the event Ω_N := {ω | ∫_0^∞ Y(t, ω)² dt < N} is clear. By assumption the event ⋃_{n≥1} Ω_n has probability one. Choosing N ∈ N so large that P(⋃_{n=1}^N Ω_n) ≥ 1 − ε, the random variables ∫_0^{τ_n} Y(t) dW(t) are constant for all n ≥ N with probability at least 1 − ε. This implies that these random variables form a Cauchy sequence with respect to convergence in probability. By completeness the limit exists. The last assertion is obvious from the construction.

1.2.14 Remark. Observe that the first idea, to set ∫ Y(t) dW(t) = ∫ Y(t) 1_{Ω_N} dW(t) for all ω ∈ Ω_N, is not feasible because 1_{Ω_N} is generally not adapted.


By localisation via the stopping times (τ_n) one can infer the properties of the extended integral from Theorem 1.2.8. The last assertion of the following theorem is proved in (Revuz and Yor 1999, Prop. IV.2.13).

1.2.15 Theorem. The stochastic integral over integrands in V* has the same properties as that over integrands in V regarding linearity (Theorem 1.2.8(c,d)), measurability (1.2.8(f)) and existence of a continuous version (1.2.8(h)). However, it is only a local (Ft)-martingale with quadratic covariation as in (1.2.8(i)).

Moreover, if Y ∈ V* is left-continuous and Π is a partition of [0, t], then the finite sum approximations converge in probability:

∫_0^t Y(s) dW(s) = lim_{|Π|→0} Σ_{t_i∈Π} Y(t_i)(W(t_{i+1}) − W(t_i)).

1.2.5 The Fisk-Stratonovich integral

For integrands Y ∈ V an alternative reasonable definition of the stochastic integral is by interpolation:

∫_0^T Y(t) ∘ dW(t) := lim_{|Π|→0} Σ_{t_i∈Π} ½ (Y(t_{i+1}) + Y(t_i))(W(t_{i+1}) − W(t_i)),

where Π denotes a partition of [0, T] with 0 = t_0 < t_1 < · · · < t_{n−1} < t_n = T and |Π| = max_i(t_{i+1} − t_i), and where the limit is understood in the L²(Ω)-sense. This is the Fisk-Stratonovich integral, written here with the symbol ∘ to distinguish it from the Ito integral.

1.2.16 Theorem. For an arbitrary integrand Y ∈ V we have in probability

lim_{|Π|→0} Σ_{t_i∈Π} ½ (Y(t_{i+1}) + Y(t_i))(W(t_{i+1}) − W(t_i)) = ∫_0^T Y(t) dW(t) + ½ 〈Y, W〉_T.

Proof. Since the process Y^Π := Σ_{t_i∈Π} Y(t_i) 1_{[t_i, t_{i+1})} is a simple integrand in V and satisfies lim_{|Π|→0} E[‖Y^Π − Y‖²_{L²(0,T)}] = 0, we have

lim_{|Π|→0} Σ_{t_i∈Π} Y(t_i)(W(t_{i+1}) − W(t_i)) = ∫_0^T Y(t) dW(t)

even in L²(Ω) by the Ito isometry. The assertion thus reduces to

lim_{|Π|→0} Σ_{t_i∈Π} (Y(t_{i+1}) − Y(t_i))(W(t_{i+1}) − W(t_i)) = 〈Y, W〉_T,

which is just the definition of the quadratic covariation between Y and W.

1.2.17 Corollary. The Fisk-Stratonovich integral is linear and has a continuous version, but it is usually not a martingale and not even centred.

1.2.18 Example. We have ∫_0^T W(t) dW(t) = ½ W(T)² − ½ T, but for the Fisk-Stratonovich integral ∫_0^T W(t) ∘ dW(t) = ½ W(T)².
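The difference between the two integrals is visible already on one discretised path. The Python sketch below (illustration only, not part of the notes) accumulates the left-point (Ito) and averaged (Fisk-Stratonovich) sums for the integrand W on a fine partition of [0, T].

```python
import math
import random

# Compare the Ito (left-point) and Fisk-Stratonovich (averaged) sums
# for integrating W against dW over one fine partition of [0, 1].
rng = random.Random(4)
T, n = 1.0, 200000
dt = T / n
w, ito, strat = 0.0, 0.0, 0.0
for _ in range(n):
    dw = rng.gauss(0.0, math.sqrt(dt))
    w_next = w + dw
    ito += w * dw                       # Y(t_i) * (W(t_{i+1}) - W(t_i))
    strat += 0.5 * (w + w_next) * dw    # averaged integrand
    w = w_next
# theory: strat = W(T)^2 / 2 exactly (telescoping sum),
#         ito   = W(T)^2 / 2 - T/2 up to the quadratic-variation error,
# so strat - ito should be close to T/2.
```

The averaged sum telescopes to ½ W(T)² exactly, while the left-point sum picks up the extra −½ 〈W〉_T = −½ T, exactly as Theorem 1.2.16 predicts.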


1.2.6 Multidimensional Case

1.2.19 Definition.

1. An R^m-valued (Ft)-adapted stochastic process W(t) = (W_1(t), . . . , W_m(t))^T is an m-dimensional Brownian motion if each component W_i, i = 1, . . . , m, is a one-dimensional (Ft)-Brownian motion and all components are independent.

2. If Y is an R^{d×m}-valued stochastic process such that each component Y_{ij}, 1 ≤ i ≤ d, 1 ≤ j ≤ m, is an element of V*, then the multidimensional Ito integral ∫ Y dW for m-dimensional Brownian motion W is an R^d-valued random variable with components

(∫_0^∞ Y(t) dW(t))_i := Σ_{j=1}^m ∫_0^∞ Y_{ij}(t) dW_j(t), 1 ≤ i ≤ d.

1.2.20 Proposition. The Ito isometry extends to the multidimensional case such that for R^{d×m}-valued processes X, Y with components in V and m-dimensional Brownian motion W

E[〈∫_0^∞ X(t) dW(t), ∫_0^∞ Y(t) dW(t)〉] = ∫_0^∞ Σ_{i=1}^d Σ_{j=1}^m E[X_{ij}(t)Y_{ij}(t)] dt.

Proof. The term in the brackets on the left hand side is equal to

Σ_{i=1}^d Σ_{j=1}^m Σ_{k=1}^m ∫_0^∞ X_{ij}(t) dW_j(t) · ∫_0^∞ Y_{ik}(t) dW_k(t)

and the result follows from the one-dimensional Ito isometry once the following claim has been proved: stochastic integrals with respect to independent Brownian motions are uncorrelated (attention: they may well be dependent!).

For this let us consider two independent Brownian motions W_1 and W_2 and two simple processes Y_1, Y_2 in V on the same filtered probability space with

Y_k(t) = Σ_{i=0}^∞ η_{ik}(ω) 1_{[t_i, t_{i+1})}(t), k ∈ {1, 2}.

The common partition of the time axis can always be achieved by taking a common refinement of the two partitions. Then by the F_{t_i}-measurability of η_{ik} we obtain

E[∫_0^∞ Y_1(t) dW_1(t) · ∫_0^∞ Y_2(t) dW_2(t)]

= Σ_{0≤i≤j<∞} E[η_{i1}η_{j2}(W_1(t_{i+1}) − W_1(t_i))(W_2(t_{j+1}) − W_2(t_j))]
  + Σ_{0≤j<i<∞} E[η_{i1}η_{j2}(W_1(t_{i+1}) − W_1(t_i))(W_2(t_{j+1}) − W_2(t_j))]

= Σ_{0≤i≤j<∞} E[η_{i1}η_{j2}(W_1(t_{i+1}) − W_1(t_i))] E[W_2(t_{j+1}) − W_2(t_j)]
  + Σ_{0≤j<i<∞} E[η_{i1}η_{j2}(W_2(t_{j+1}) − W_2(t_j))] E[W_1(t_{i+1}) − W_1(t_i)]

= 0.

By Proposition 1.2.5, for each process in V there exists a sequence of simple processes such that the corresponding stochastic integrals converge in L²(P), which implies that the respective covariances converge, too. This density argument proves the general case.

1.2.7 Ito's formula

For complete proofs see Karatzas and Shreve (1991) or any other textbook on stochastic integration. Note in particular that different proof strategies exist, e.g. Revuz and Yor (1999), and that many extensions exist.

1.2.21 Theorem. For a process h ∈ V* and an adapted process (g(t), t ≥ 0) with ∫_0^T |g(t)| dt < ∞ P-almost surely for all T > 0 set

X(t) := ∫_0^t g(s) ds + ∫_0^t h(s) dW(s), t ≥ 0.

Then Y(t) = F(X(t), t), t ≥ 0, with a function F ∈ C^{2,1}(R × R+, R) satisfies

Y(t) = Y(0) + ∫_0^t (∂F/∂t (X(s), s) + ∂F/∂x (X(s), s) g(s) + ½ ∂²F/∂x² (X(s), s) h²(s)) ds

       + ∫_0^t ∂F/∂x (X(s), s) h(s) dW(s), t ≥ 0.

Proof. We only sketch the proof and assume that F, ∂F/∂x, ∂²F/∂x², ∂F/∂t are even uniformly bounded. Then for a partition Π of [0, t] with 0 = t_0 < t_1 < · · · < t_n = t we infer from Taylor's formula

F(X(t), t) = F(X(0), 0) + Σ_{k=1}^n (F(X(t_k), t_k) − F(X(t_{k−1}), t_{k−1}))

           = F(X(0), 0) + Σ_{k=0}^{n−1} (∂F/∂t ∆t_k + ∂F/∂x ∆X(t_k) + ½ ∂²F/∂x² (∆X(t_k))²
             + o(∆t_k) + O((∆t_k)(∆X(t_k))) + o((∆X(t_k))²)),

where all derivatives are evaluated at (X(t_k), t_k) and where we have set ∆t_k = t_{k+1} − t_k, ∆X(t_k) = X(t_{k+1}) − X(t_k). If we now let the width of the partition |Π| tend to zero, we obtain by the continuity of X and the construction of the Riemann integral

Σ_{k=0}^{n−1} ∂F/∂t ∆t_k → ∫_0^t ∂F/∂t (X(s), s) ds

and by the identity ∆X(t_k) = ∫_{t_k}^{t_{k+1}} g(s) ds + ∫_{t_k}^{t_{k+1}} h(s) dW(s) and the construction of the Ito integral

Σ_{k=0}^{n−1} ∂F/∂x ∆X(t_k) → ∫_0^t ∂F/∂x (X(s), s) g(s) ds + ∫_0^t ∂F/∂x (X(s), s) h(s) dW(s)

with convergence in L²(P). Note that for the precise derivation of the stochastic integral we have to consider as approximating integrands the processes

Y_Π(s) = h(s) Σ_{k=0}^{n−1} ∂F/∂x (X(t_k), t_k) 1_{[t_k, t_{k+1})}(s), s ∈ [0, t].

The third term converges to the quadratic variation process, using that an absolutely continuous function has zero quadratic variation:

Σ_{k=0}^{n−1} ∂²F/∂x² (∆X(t_k))² → ∫_0^t ∂²F/∂x² (X(s), s) h²(s) ds.

The remainder terms converge to zero owing to the finite variation of ∫_0^• g(s) ds and the finite quadratic variation of ∫_0^• h(s) dW(s), which implies that the respective higher order variations vanish.

1.2.22 Theorem. For an R^{d×m}-valued process h with components in V* and an adapted R^d-valued process (g(t), t ≥ 0) with ∫_0^T ‖g(t)‖ dt < ∞ P-almost surely for all T > 0 set

X(t) := ∫_0^t g(s) ds + ∫_0^t h(s) dW(s), t ≥ 0,

where W is an m-dimensional Brownian motion. Then Y(t) = F(X(t), t), t ≥ 0, with a function F ∈ C^{2,1}(R^d × R+, R^p) satisfies

Y(t) = Y(0) + ∫_0^t (∂F/∂t (X(s), s) + DF(X(s), s) g(s)
       + ½ Σ_{i,j=1}^d ∂²F/∂x_i∂x_j (X(s), s) (Σ_{l=1}^m h_{i,l}(s) h_{j,l}(s))) ds

       + ∫_0^t DF(X(s), s) h(s) dW(s), t ≥ 0.

Here DF = (∂_{x_j} F_i)_{1≤i≤p, 1≤j≤d} denotes the Jacobian of F.

1.2.23 Remark. Ito's formula is best remembered in differential form

dF = F_t dt + F_x dX(t) + ½ F_xx d〈X〉_t (one-dimensional).

A rule of thumb for deriving also the multi-dimensional formula is to simplify the Taylor expansion by proceeding formally and then substituting dt dt = dt dW_j(t) = 0 and dW_i(t) dW_j(t) = δ_{ij} dt.

If the stochastic integrals on the right hand side in the two preceding theorems are interpreted in the Fisk-Stratonovich sense and if h is constant, then the terms involving second derivatives do not appear in the corresponding formulae:

dF = F_t dt + F_x g dt + h F_x ∘ dW(t) (one-dimensional).

Note, however, that for non-constant h we should write dF = F_t dt + F_x g dt + F_x ∘ h dW(t), where the last term is a Stratonovich integral with respect to the continuous martingale ∫_0^• h(s) dW(s) instead of Brownian motion.
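The Ito correction term ½ F_xx d〈X〉_t has observable consequences. For F(x) = e^x and X = W, Ito's formula gives d e^{W(t)} = e^{W(t)} dW(t) + ½ e^{W(t)} dt, so m(t) := E[e^{W(t)}] satisfies m′(t) = m(t)/2, i.e. m(t) = e^{t/2} rather than 1. The Python sketch below (an added illustration, not from the notes) verifies this by Monte Carlo, using only the fact that W(t) ∼ N(0, t).

```python
import math
import random

# Monte Carlo check of E[exp(W(t))] = exp(t/2), the mean growth forced
# by the second-derivative (Ito correction) term in Ito's formula.
rng = random.Random(5)
t, n_paths = 1.0, 200000
total = 0.0
for _ in range(n_paths):
    total += math.exp(rng.gauss(0.0, math.sqrt(t)))
estimate = total / n_paths
theory = math.exp(t / 2)
```

A naive chain rule (the Stratonovich reading with constant h = 1 and g = 0) would predict E[e^{W(t)}] = 1; the correction term is exactly what the martingale property of the dW-integral forces.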

1.2.24 Problem.

1. Consider again Problem 1.2.7 and evaluate ∫_0^t W(s) dW(s) by regarding Y(t) = W(t)².

2. Show that X(t) = exp(σW(t) + (a − σ²/2)t), W a one-dimensional Brownian motion, satisfies the linear Ito stochastic differential equation

dX(t) = aX(t) dt + σX(t) dW(t).

What would be the solution of the same equation in the Fisk-Stratonovich interpretation?

3. Suppose W is an m-dimensional Brownian motion, m ≥ 2, started in x ≠ 0. Consider the process Y(t) = ‖W(t)‖ (Euclidean norm) and find an expression for the differential dY(t), assuming that W does not hit zero.


The Ito formula allows a rather simple proof of Levy's martingale characterisation of Brownian motion.

1.2.25 Theorem. Let (M(t), Ft, t ≥ 0) be a continuous R^m-valued local martingale with M(0) = 0 and cross-variations 〈M_k, M_l〉_t = δ_{kl} t for 1 ≤ k, l ≤ m P-almost surely. Then (M(t), t ≥ 0) is an m-dimensional (Ft)-Brownian motion.

Proof. We only sketch the proof for m = 1 and proper martingales M; details can be found in (Karatzas and Shreve 1991, Thm. 3.16), in particular the integration theory for general semimartingales. In order to show that M has independent normally distributed increments, it suffices to show

E[exp(iu(M(t) − M(s))) | F_s] = exp(−u²(t − s)/2), u ∈ R, t ≥ s ≥ 0.

By Ito's formula for general continuous semimartingales, applied to real and imaginary part separately, we obtain

exp(iuM(t)) = exp(iuM(s)) + iu ∫_s^t exp(iuM(v)) dM(v) − ½u² ∫_s^t exp(iuM(v)) dv.

Due to |exp(iuM(v))| = 1 the stochastic integral is a martingale and the function F(t) = E[exp(iu(M(t) − M(s))) | F_s] satisfies

F(t) = 1 − ½u² ∫_s^t F(v) dv, P-a.s.

This integral equation has the unique solution F(t) = exp(−u²(t − s)/2).

Chapter 2

Strong solutions of SDEs

2.1 The strong solution concept

The first definition of a solution of a stochastic differential equation reflects the interpretation that the solution process X at time t is determined by the equation and the exogenous input of the initial condition and the path of the Brownian motion up to time t. Mathematically, this is translated into a measurability condition on X_t or, equivalently, into the smallest reasonable choice of the filtration to which X should be adapted, see condition (a) below.

2.1.1 Definition. A strong solution X of the stochastic differential equation

dX(t) = b(X(t), t) dt + σ(X(t), t) dW (t), t ≥ 0, (2.1.1)

with b : R^d × R_+ → R^d, σ : R^d × R_+ → R^{d×m} measurable, on the given probability space (Ω, F, P) with respect to the fixed m-dimensional Brownian motion W and the independent initial condition X0 over this probability space is a stochastic process (X(t), t ≥ 0) satisfying:

(a) X is adapted to the filtration (G_t), where G⁰_t := σ(W(s), 0 ≤ s ≤ t) ∨ σ(X0) and G_t is the completion of ⋂_{s>t} G⁰_s with P-null sets;

(b) X is a continuous process;

(c) P(X(0) = X0) = 1;

(d) P(∫_0^t ‖b(X(s), s)‖ + ‖σ(X(s), s)‖² ds < ∞) = 1 holds for all t > 0;

(e) With probability one we have

X(t) = X(0) + ∫_0^t b(X(s), s) ds + ∫_0^t σ(X(s), s) dW(s), ∀ t ≥ 0.


2.1.2 Remark. It can be shown (Karatzas and Shreve 1991, Section 2.7) that the completion of the filtration of Brownian motion (or more generally of any strong Markov process) is right-continuous. This means that G_t already equals the completion of G⁰_t.

With this definition at hand the notion of the existence of a strong solution is clear. We will say that strong uniqueness of a solution holds only if the construction of a strong solution is unique on any probability space carrying the random elements W and X0, where X0 is an arbitrary initial condition.

2.1.3 Definition. Suppose that, whenever (Ω, F, P) is a probability space equipped with a Brownian motion W and an independent random variable X0, any two strong solutions X and X′ of (2.1.1) with initial condition X0 satisfy P(∀ t ≥ 0 : X(t) = X′(t)) = 1. Then we say that strong uniqueness holds for equation (2.1.1) or, more precisely, for the pair (b, σ).

2.1.4 Remark. Since solution processes are by definition continuous and R_+ is separable, it suffices to have the weaker condition P(X(t) = X′(t)) = 1 for all t ≥ 0 in the above definition.

2.1.5 Problem. Consider the stochastic differential equation

dX(t) = −(1− t)−1X(t) dt + dW (t), 0 ≤ t < 1.

Find a suitable modification of the notion of a strong solution for SDEs defined on bounded time intervals and check that the following process, called Brownian bridge, is a strong solution with initial condition X(0) = 0:

X(t) = (1 − t) ∫_0^t (1 − s)^{-1} dW(s), 0 ≤ t < 1.

Does limt↑1 X(t) exist in some sense of convergence?
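The defining stochastic integral can be discretised directly; the following sketch (grid size and Monte Carlo sample count are arbitrary choices) checks the known variance Var X(t) = t(1 − t) of the Brownian bridge at t = 1/2.

```python
import numpy as np

rng = np.random.default_rng(1)

t_star, n, m = 0.5, 500, 20_000   # evaluation time, grid points, sample paths
dt = t_star / n
s = dt * np.arange(n)             # left endpoints of the grid cells

# Left-point Riemann-Ito sums of int_0^{t*} (1-s)^{-1} dW(s), m paths at once.
dW = rng.normal(0.0, np.sqrt(dt), size=(m, n))
integral = dW @ (1.0 / (1.0 - s))

X = (1.0 - t_star) * integral     # Brownian bridge at time t* for each path

var_hat = float(X.var())
print(f"empirical Var X(1/2) = {var_hat:.4f}, "
      f"theoretical t(1-t) = {t_star * (1 - t_star):.4f}")
```

By the Ito isometry, Var X(t) = (1 − t)² ∫_0^t (1 − s)^{-2} ds = t(1 − t), which the sample variance reproduces up to Monte Carlo and discretisation error.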

2.2 Uniqueness

2.2.1 Example. Consider the one-dimensional equation

dX(t) = b(X(t), t) dt + dW (t)

with a bounded, Borel-measurable function b : R × R_+ → R that is nonincreasing in the first variable. Then strong uniqueness holds for this equation, that is, for the pair (b, 1). To prove this, define for two strong solutions X and X′ on the same filtered probability space the process D(t) := X(t) − X′(t). This process is (weakly) differentiable with

(d/dt) D²(t) = 2D(t)D′(t) = 2(X(t) − X′(t))(b(X(t), t) − b(X′(t), t)) ≤ 0, a.e.

From X(0) = X′(0) we infer D²(t) = 0 for all t ≥ 0.


Already for deterministic differential equations examples of nonuniqueness are well known. For instance, the differential equation ẋ(t) = |x(t)|^α with 0 < α < 1 and x(0) = 0 has the family of solutions x_τ(t) = ((t − τ)/β)^β for t ≥ τ, x_τ(t) = 0 for t ≤ τ with β = 1/(1 − α) and τ ≥ 0. The usual sufficient condition for uniqueness in the deterministic theory is Lipschitz continuity of the analogue of the drift function in the space variable. Also for SDEs Lipschitz continuity, even in its local form, suffices. First, we recall the classical Gronwall Lemma.

2.2.2 Lemma. Let T > 0, c ≥ 0 and u, v : [0, T] → R_+ be measurable functions. If u is bounded and v is integrable, then

u(t) ≤ c + ∫_0^t u(s)v(s) ds ∀ t ∈ [0, T]

implies

u(t) ≤ c exp(∫_0^t v(s) ds), t ∈ [0, T].

Proof. Suppose c > 0 and set

z(t) := c + ∫_0^t u(s)v(s) ds, t ∈ [0, T].

Then u(t) ≤ z(t), z(t) is weakly differentiable and for almost all t

z′(t)/z(t) = u(t)v(t)/z(t) ≤ v(t)

holds, so that log(z(t)) ≤ log(z(0)) + ∫_0^t v(s) ds follows. This shows that

u(t) ≤ z(t) ≤ c exp(∫_0^t v(s) ds), t ∈ [0, T].

For c = 0 apply the inequality for c_n > 0 with lim_n c_n = 0 and take the limit.

2.2.3 Theorem. Suppose that b and σ are locally Lipschitz continuous in the space variable, that is, for all n ∈ N there is a K_n > 0 such that for all t ≥ 0 and all x, y ∈ R^d with ‖x‖, ‖y‖ ≤ n

‖b(x, t)− b(y, t)‖+ ‖σ(x, t)− σ(y, t)‖ ≤ Kn‖x− y‖

holds. Then strong uniqueness holds for equation (2.1.1).

Proof. Let two solutions X and X′ of (2.1.1) with the same initial condition X0 be given on some common probability space (Ω, F, P). We define the stopping times τ_n := inf{t > 0 : ‖X(t)‖ ≥ n} and τ′_n in the same manner for X′, n ∈ N. Then


τ*_n := τ_n ∧ τ′_n converges P-almost surely to infinity. The difference X(t ∧ τ*_n) − X′(t ∧ τ*_n) equals P-almost surely

∫_0^{t∧τ*_n} (b(X(s), s) − b(X′(s), s)) ds + ∫_0^{t∧τ*_n} (σ(X(s), s) − σ(X′(s), s)) dW(s).

We conclude by the Ito isometry and the Cauchy-Schwarz inequality:

E[‖X(t ∧ τ*_n) − X′(t ∧ τ*_n)‖²]
≤ 2 E[(∫_0^{t∧τ*_n} ‖b(X(s), s) − b(X′(s), s)‖ ds)²] + 2 E[∫_0^{t∧τ*_n} ‖σ(X(s), s) − σ(X′(s), s)‖² ds]
≤ 2TK_n² ∫_0^t E[‖X(s ∧ τ*_n) − X′(s ∧ τ*_n)‖²] ds + 2K_n² ∫_0^t E[‖X(s ∧ τ*_n) − X′(s ∧ τ*_n)‖²] ds.

By Gronwall's inequality we conclude P(X(t ∧ τ*_n) = X′(t ∧ τ*_n)) = 1 for all n ∈ N and t ∈ [0, T]. Letting n, T → ∞, we see that X(t) = X′(t) holds P-almost surely for all t ≥ 0, and by Remark 2.1.4 strong uniqueness follows.

2.2.4 Remark. In the one-dimensional case strong uniqueness already holds for a Hölder-continuous diffusion coefficient σ of order 1/2, see (Karatzas and Shreve 1991, Proposition 5.2.13) for more details and refinements.

2.3 Existence

In the deterministic theory differential equations are usually solved locally around the initial condition. In the stochastic framework one is rather interested in global solutions and then uses appropriate stopping in order to solve an equation up to some random explosion time. To exclude explosions in finite time, linear growth of the coefficients suffices. The standard example for explosion is the ODE

ẋ(t) = x(t)², t ≥ 0, x(0) ≠ 0.

Its solution is given by x(t) = 1/(x_0^{-1} − t), which explodes for x_0 > 0 and t ↑ x_0^{-1}. Note already here that with the opposite sign, ẋ(t) = −x(t)², the solution x(t) = x(0)/(1 + x(0)t) exists globally for x(0) > 0. Intuitively, the different behaviour is clear because in the first case x grows the faster the further away from zero it is ("positive feedback"), while in the second case x monotonically converges to zero ("negative feedback").
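A forward-Euler sketch (step size and thresholds are arbitrary choices) illustrates the two behaviours: for ẋ = x² with x(0) = 1 the numerical solution blows up near the exact explosion time 1/x(0) = 1, while for ẋ = −x² it tracks the global solution x(t) = 1/(1 + t).

```python
# Forward Euler for the two ODEs x' = x**2 and x' = -x**2, both with x(0) = 1.
dt = 1e-4

# Positive feedback: integrate until the solution exceeds a large threshold.
x, t = 1.0, 0.0
while x < 1e6 and t < 2.0:
    x += dt * x * x
    t += dt
t_blowup = t          # exact explosion time is 1/x(0) = 1

# Negative feedback: integrate up to T = 1; exact value is 1/(1 + 1) = 0.5.
y = 1.0
for _ in range(10_000):
    y -= dt * y * y

print(f"numerical blow-up time ~ {t_blowup:.3f}, y(1) ~ {y:.4f}")
```

Since the Euler polygon lies below the convex exact solution, the numerical blow-up time slightly overshoots t = 1 and converges to it as dt shrinks.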

We shall first establish an existence theorem under rather strong growth and Lipschitz conditions and then later improve on that.

2.3.1 Theorem. Suppose that the coefficients satisfy the global Lipschitz and linear growth conditions

‖b(x, t)− b(y, t)‖+ ‖σ(x, t)− σ(y, t)‖ ≤ K‖x− y‖ ∀x, y ∈ Rd, t ≥ 0 (2.3.1)

‖b(x, t)‖+ ‖σ(x, t)‖ ≤ K(1 + ‖x‖) ∀x ∈ Rd, t ≥ 0 (2.3.2)


with some constant K > 0. Moreover, suppose that on some probability space (Ω, F, P) there exist an m-dimensional Brownian motion W and an initial condition X0 with E[‖X0‖²] < ∞. Then there exists a strong solution of the SDE (2.1.1) with initial condition X0 on this probability space, which in addition satisfies with some constant C > 0 the moment bound

E[‖X(t)‖²] ≤ C(1 + E[‖X0‖²])e^{Ct}, t ≥ 0.

Proof. As in the deterministic case we perform successive approximations and apply a Banach fixed point argument ("Picard-Lindelöf iteration"). Define recursively

X⁰(t) := X0, t ≥ 0, (2.3.3)

X^{n+1}(t) := X0 + ∫_0^t b(X^n(s), s) ds + ∫_0^t σ(X^n(s), s) dW(s), t ≥ 0. (2.3.4)

Obviously, the processes X^n are continuous and adapted to the filtration generated by X0 and W. Let us fix some T > 0. We are going to show that for arbitrary t ∈ [0, T]

E[sup_{0≤s≤t} ‖X^{n+1}(s) − X^n(s)‖²] ≤ C1 (C2 t)^n / n! (2.3.5)

holds with suitable constants C1, C2 > 0 independent of t and n and C2 = O(T). Let us see how we can derive the theorem from this result. From Chebyshev's inequality we obtain

P(sup_{0≤s≤T} ‖X^{n+1}(s) − X^n(s)‖ > 2^{-n-1}) ≤ 4C1 (4C2 T)^n / n!.

The term on the right-hand side is summable over n, whence by the Borel-Cantelli Lemma we conclude

P(for infinitely many n: sup_{0≤s≤T} ‖X^{n+1}(s) − X^n(s)‖ > 2^{-n-1}) = 0.

Therefore, by summation, sup_{m≥1} sup_{0≤s≤T} ‖X^{n+m}(s) − X^n(s)‖ ≤ 2^{-n} holds for all n ≥ N(ω) with some P-almost surely finite random index N(ω). In particular, the random variables X^n(s) form a Cauchy sequence P-almost surely and converge to some limit X(s), s ∈ [0, T]. Obviously, this limiting process X does not depend on T and is thus defined on R_+. Since the convergence is uniform over s ∈ [0, T], the limiting process X is continuous. Of course, it is also adapted by the adaptedness of the X^n. Taking the limit n → ∞ in equation (2.3.4), we see that X solves the SDE (2.1.1) up to time T because of

sup_{0≤s≤T} ‖b(X^n(s), s) − b(X(s), s)‖ ≤ K sup_{0≤s≤T} ‖X^n(s) − X(s)‖ → 0 (in L²(P)),

E[‖σ(X^n(•), •) − σ(X(•), •)‖²_{V([0,T])}] ≤ K²T sup_{0≤s≤T} E[‖X^n(s) − X(s)‖²] → 0.


Since T > 0 was arbitrary, the equation (2.1.1) holds for all t ≥ 0. From estimate (2.3.5) and the asymptotic bound C2 = O(T) we finally obtain, by summation over n and putting T = t, the asserted estimate on E[‖X(t)‖²].

It thus remains to establish the claimed estimate (2.3.5), which follows essentially from Doob's martingale inequality and the type of estimates used for proving Theorem 2.2.3. Proceeding inductively, we infer from the linear growth condition that (2.3.5) is true for n = 0 with some C1 > 0. Assuming it to hold for n − 1, we obtain with a constant D > 0 from Doob's inequality:

E[sup_{0≤s≤t} ‖X^{n+1}(s) − X^n(s)‖²]
≤ 2 E[sup_{0≤s≤t} ‖∫_0^s b(X^n(u), u) − b(X^{n-1}(u), u) du‖²] + 2 E[sup_{0≤s≤t} ‖∫_0^s σ(X^n(u), u) − σ(X^{n-1}(u), u) dW(u)‖²]
≤ 2K²t ∫_0^t E[‖X^n(u) − X^{n-1}(u)‖²] du + 2DK² ∫_0^t E[‖X^n(u) − X^{n-1}(u)‖²] du
≤ 2K²(T + D) C1 C2^{n-1} t^n / n!.

The choice C2 = 2K²(T + D) = O(T) thus gives the result.
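The Picard scheme (2.3.3)-(2.3.4) can be imitated numerically along one fixed Brownian path by replacing the integrals with left-point Riemann sums. The sketch below (the coefficients b(x) = ax, σ(x) = sx of geometric Brownian motion and all numerical parameters are arbitrary choices) exhibits the factorial-type decay of the sup-distance between successive iterates.

```python
import numpy as np

rng = np.random.default_rng(2)

a, s = 0.1, 0.2                 # b(x) = a*x, sigma(x) = s*x
T, n = 1.0, 500
dt = T / n
dW = rng.normal(0.0, np.sqrt(dt), n)
x0 = 1.0

def picard_step(X):
    """One application of X -> X0 + int b(X) ds + int sigma(X) dW,
    discretised with left-point sums on the grid 0, dt, ..., T."""
    incr = a * X[:-1] * dt + s * X[:-1] * dW
    return x0 + np.concatenate(([0.0], np.cumsum(incr)))

X = np.full(n + 1, x0)          # X^0(t) = X0
diffs = []
for _ in range(25):
    X_new = picard_step(X)
    diffs.append(float(np.max(np.abs(X_new - X))))  # sup-distance of iterates
    X = X_new

print("first sup-differences:", [f"{d:.2e}" for d in diffs[:5]],
      "last:", f"{diffs[-1]:.2e}")
```

Discretely, the Picard map is affine with a strictly lower-triangular linear part, so the iterates converge to the Euler solution and the differences decay like (C t)^n / n!, mirroring estimate (2.3.5).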

The last theorem is the key existence theorem that allows generalisations into many directions. The most powerful one is essentially based on conditions ensuring that a solution X exists locally and that ‖X(t)‖² remains bounded for all t ≥ 0 (L(x) = ‖x‖² is a Lyapunov function). Our presentation follows Durrett (1996).

2.3.2 Lemma. Suppose X1 and X2 are adapted continuous processes with X1(0) = X2(0) and E[‖X1(0)‖²] < ∞. Let τ_R := inf{t ≥ 0 : ‖X1(t)‖ ≥ R or ‖X2(t)‖ ≥ R}. If both X1 and X2 satisfy the stochastic differential equation (2.1.1) on the random time interval [0, τ_R] with Lipschitz conditions on the coefficients b and σ, then X1(t ∧ τ_R) = X2(t ∧ τ_R) holds P-almost surely for all t ≥ 0.

Proof. We proceed as in the proof of inequality (2.3.5) and obtain for 0 ≤ t ≤ T:

E[sup_{0≤s≤t∧τ_R} ‖X1(s) − X2(s)‖²] ≤ 2K²(t + D) ∫_0^t E[‖X1(u ∧ τ_R) − X2(u ∧ τ_R)‖²] du
≤ 2K²(T + D) ∫_0^t E[sup_{0≤s≤u∧τ_R} ‖X1(s) − X2(s)‖²] du.

Hence, Gronwall’s Lemma implies that the expectation is zero and the result follows.


2.3.3 Theorem. Suppose the drift and diffusion coefficients b and σ are locally Lipschitz continuous in the space variable and satisfy for some B ≥ 0

2⟨x, b(x, t)⟩ + trace(σ(x, t)σ(x, t)ᵀ) ≤ B(1 + ‖x‖²), ∀ x ∈ R^d, t ≥ 0.

Then the stochastic differential equation (2.1.1) has a strong solution for any initial condition X0 satisfying E[‖X0‖²] < ∞.

Proof. We extend the previous theorem by a suitable cut-off scheme. For any R > 0 define coefficient functions b_R, σ_R such that

b_R(x) = b(x) for ‖x‖ ≤ R, b_R(x) = 0 for ‖x‖ ≥ 2R, and σ_R(x) = σ(x) for ‖x‖ ≤ R, σ_R(x) = 0 for ‖x‖ ≥ 2R,

and b_R and σ_R are interpolated for ‖x‖ ∈ (R, 2R) in such a way that they are Lipschitz continuous in the state variable. Then let X_R be the strong solution, unique by Theorem 2.3.1, of the stochastic differential equation with coefficients b_R and σ_R. Introduce the stopping time τ_R := inf{t ≥ 0 : ‖X_R(t)‖ ≥ R}. Then by Lemma 2.3.2, X_R(t) and X_S(t) coincide for t ≤ min(τ_R, τ_S) and we can define

X_∞(t) := X_R(t) for t ≤ τ_R.

The process X_∞ will be a strong solution of the stochastic differential equation (2.1.1) if we can show lim_{R→∞} τ_R = ∞ P-almost surely.

Put φ(x) = 1 + ‖x‖². Then Ito's formula yields for any t, R > 0

e^{-Bt}φ(X_R(t)) − φ(X_R(0))
= −B ∫_0^t e^{-Bs}φ(X_R(s)) ds + Σ_{i=1}^d ∫_0^t e^{-Bs} 2X_{R,i}(s) dX_{R,i}(s) + ½ Σ_{i=1}^d ∫_0^t e^{-Bs} 2 Σ_{j=1}^m σ_{R,ij}(X_R(s), s)² ds
= local martingale + ∫_0^t e^{-Bs}(−Bφ(X_R(s)) + 2⟨X_R(s), b_R(X_R(s), s)⟩ + trace(σ_R(X_R(s), s)σ_Rᵀ(X_R(s), s))) ds.

Our assumption implies that (e^{-B(t∧τ_R)}φ(X_R(t ∧ τ_R)))_{t≥0} is a supermartingale by the optional stopping theorem. We conclude

E[φ(X0)] ≥ E[e^{-B(t∧τ_R)}φ(X_R(t ∧ τ_R))] = E[e^{-B(t∧τ_R)}φ(X_∞(t ∧ τ_R))] ≥ e^{-Bt} P(τ_R ≤ t) min_{‖x‖=R} φ(x).

Because of lim_{‖x‖→∞} φ(x) = ∞ we have lim_{R→∞} P(τ_R ≤ t) = 0. Since the events {τ_R ≤ t}, R > 0, decrease in R, there exists for all t > 0 and P-almost all ω an index R0 such that τ_R(ω) ≥ t for all R ≥ R0, which is equivalent to τ_R → ∞ P-almost surely.


2.4 Explicit solutions

2.4.1 Linear Equations

In this paragraph we want to study the linear or affine equations

dX(t) = (A(t)X(t) + a(t)) dt + σ(t) dW(t), t ≥ 0. (2.4.1)

Here, A is a d × d-matrix, a is a d-dimensional vector and σ is a d × m-dimensional matrix, where all objects are deterministic as well as measurable and locally bounded in the time variable. As usual, W is an m-dimensional Brownian motion and X a d-dimensional process.

The corresponding deterministic linear equation

ẋ(t) = A(t)x(t) + a(t), t ≥ 0, (2.4.2)

has for every initial condition x0 an absolutely continuous solution x, which is given by

x(t) = Φ(t)(x0 + ∫_0^t Φ^{-1}(s)a(s) ds), t ≥ 0,

where Φ is the so-called fundamental solution. This means that Φ solves the matrix equation

Φ̇(t) = A(t)Φ(t), t ≥ 0, with Φ(0) = Id.

In the case of a matrix A that is constant in time, the fundamental solution is given by

Φ(t) = e^{At} := Σ_{k=0}^∞ (tA)^k / k!.

2.4.1 Proposition. The strong solution X of equation (2.4.1) with initial condition X0 is given by

X(t) = Φ(t)(X0 + ∫_0^t Φ^{-1}(s)a(s) ds + ∫_0^t Φ^{-1}(s)σ(s) dW(s)), t ≥ 0.

Proof. Apply Ito’s formula.

2.4.2 Problem.

1. Show that the function µ(t) := E[X(t)] under the hypothesis E[‖X(0)‖] < ∞ satisfies the deterministic linear differential equation (2.4.2).

2. Assume that A, a and σ are constant. Calculate the covariance function Cov(X(t), X(s)) and investigate under which conditions on A, a, σ and X0 this function only depends on |t − s| (weak stationarity). When do we have strong stationarity?
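For the scalar Ornstein-Uhlenbeck-type case d = m = 1 with constant A = −1, a = 1, σ = 1/2 (arbitrary test values), Φ(t) = e^{−t}, and the formula of Proposition 2.4.1 can be evaluated on a grid and compared with an Euler scheme driven by the same Brownian increments; a minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(3)

A, a, sig = -1.0, 1.0, 0.5     # dX = (A X + a) dt + sig dW
x0, T, n = 2.0, 1.0, 10_000
dt = T / n
dW = rng.normal(0.0, np.sqrt(dt), n)
t = dt * np.arange(n)          # left grid points

# Explicit solution: X(T) = Phi(T)(x0 + int Phi^{-1} a ds + int Phi^{-1} sig dW)
# with Phi(t) = exp(A t); the integrals are discretised by left-point sums.
phi_T = np.exp(A * T)
x_explicit = phi_T * (x0
                      + np.sum(np.exp(-A * t) * a * dt)
                      + np.sum(np.exp(-A * t) * sig * dW))

# Euler scheme along the same Brownian path.
x = x0
for k in range(n):
    x += (A * x + a) * dt + sig * dW[k]

print(f"explicit formula: {x_explicit:.4f}, Euler: {x:.4f}")
```

With additive noise the Euler scheme has strong order 1, so both values coincide up to an O(dt) discrepancy.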


2.4.2 Transformation methods

We follow the presentation by Kloeden and Platen (1992) and consider scalar equations that can be solved explicitly by suitable transformations.

Consider the scalar stochastic differential equation

dX(t) = ½b(X(t))b′(X(t)) dt + b(X(t)) dW(t), (2.4.3)

where b : R → R is continuously differentiable and does not vanish and W is a one-dimensional Brownian motion. This equation is equivalent to the Fisk-Stratonovich equation

dX(t) = b(X(t)) dW(t).

Define

h(x) := ∫_c^x 1/b(y) dy for some c ∈ R.

Then X(t) := h^{-1}(W(t) + h(X0)), where h^{-1} denotes the inverse of h which exists by monotonicity, solves the equation (2.4.3). This follows easily from (h^{-1})′(W(t) + h(X0)) = b(X(t)) and (h^{-1})″(W(t) + h(X0)) = b′(X(t))b(X(t)).

2.4.3 Example.

1. (geometric Brownian motion) dX(t) = (α²/2)X(t) dt + αX(t) dW(t) has the solution X(t) = X0 exp(αW(t)).

2. The choice b(x) = β|x|^α for α, β ∈ R corresponds formally to the equation

dX(t) = ½αβ²|X(t)|^{2α−1} sgn(X(t)) dt + β|X(t)|^α dW(t).

For α < 1 we obtain formally the solution

X(t) = |β(1 − α)W(t) + |X0|^{1−α} sgn(X0)|^{1/(1−α)} sgn(β(1 − α)W(t) + |X0|^{1−α} sgn(X0)).

This is well defined and indeed a strong solution if 1/(1 − α) is nonnegative. The specific choice α = (n − 1)/n with n ∈ N odd gives

X(t) = (βn^{-1}W(t) + X0^{1/n})^n.

For even n and X0 ≥ 0 this formula defines a solution of

dX(t) = ((n − 1)β²/(2n)) X(t)^{(n−2)/n} dt + βX(t)^{(n−1)/n} dW(t),

and X remains nonnegative for all times t ≥ 0. Observe that a solution exists, although the coefficients are not locally Lipschitz. One can show that for n = 2 strong uniqueness holds, whereas for n > 2 also the trivial process X(t) = 0 is a solution.

28 Chapter 2. Strong solutions of SDEs

3. The equation

dX(t) = −a² sin(X(t)) cos³(X(t)) dt + a cos²(X(t)) dW(t)

has for X0 ∈ (−π/2, π/2) the solution X(t) = arctan(aW(t) + tan(X0)), which remains contained in the interval (−π/2, π/2). This can be explained by the fact that for x = ±π/2 the coefficients vanish and for values x close to this boundary the drift pushes the process towards zero more strongly than the diffusion part can possibly disturb.

4. The equation

dX(t) = a²X(t)(1 + X(t)²) dt + a(1 + X(t)²) dW(t)

is solved by X(t) = tan(aW(t) + arctan X0) and thus explodes P-almost surely in finite time.
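Example 3 can be checked numerically: an Euler scheme for the SDE with the bounded coefficients −a² sin x cos³x and a cos²x stays close to the closed form arctan(aW(t) + tan X0) along the same Brownian path. A sketch (a = 1, X0 = 0 and the step size are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(4)

a_par = 1.0
T, n = 1.0, 50_000
dt = T / n
dW = rng.normal(0.0, np.sqrt(dt), n)
W = np.cumsum(dW)

# Closed-form solution of example 3 with X0 = 0.
X_closed = np.arctan(a_par * W)

# Euler scheme for dX = -a^2 sin(X) cos^3(X) dt + a cos^2(X) dW.
x = 0.0
X_euler = np.empty(n)
for k in range(n):
    x += (-a_par**2 * np.sin(x) * np.cos(x)**3 * dt
          + a_par * np.cos(x)**2 * dW[k])
    X_euler[k] = x

err = float(np.max(np.abs(X_euler - X_closed)))
print(f"max |Euler - closed form| = {err:.3f}; "
      f"sup |X_closed| = {np.max(np.abs(X_closed)):.3f} < pi/2")
```

The closed-form path never leaves (−π/2, π/2), matching the boundary discussion above, and the Euler path tracks it within the discretisation error.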

The transformation idea allows certain generalisations. With the same assumptions on b and the same definition of h we can solve the equation

dX(t) = (αb(X(t)) + ½b(X(t))b′(X(t))) dt + b(X(t)) dW(t)

by X(t) = h^{-1}(αt + W(t) + h(X0)). Equations of the type

dX(t) = (αh(X(t))b(X(t)) + ½b(X(t))b′(X(t))) dt + b(X(t)) dW(t)

are solved by X(t) = h^{-1}(e^{αt}h(X0) + e^{αt} ∫_0^t e^{−αs} dW(s)).

Finally, we consider for n ∈ N, n ≥ 2, the equation

dX(t) = (aX(t)^n + bX(t)) dt + cX(t) dW(t).

Writing Y(t) = X(t)^{1−n} we obtain

dY(t) = (1 − n)X(t)^{−n} dX(t) + ½(1 − n)(−n)X(t)^{−n−1}c²X(t)² dt
= (1 − n)(a + (b − (c²/2)n)Y(t)) dt + (1 − n)cY(t) dW(t).

Hence, Y is a geometric Brownian motion and we obtain after transformation for all X0 ≠ 0

X(t) = e^{(b − c²/2)t + cW(t)} (X0^{1−n} + a(1 − n) ∫_0^t e^{(n−1)(b − c²/2)s + c(n−1)W(s)} ds)^{1/(1−n)}.

In addition to the trivial solution X(t) = 0 we therefore always have a nonnegative global solution in the case X0 ≥ 0 and a ≤ 0. For odd integers n and a ≤ 0 a global solution exists for any initial condition, cf. Theorem 2.3.3. In the other cases it is easily seen that the solution explodes in finite time.

Chapter 3

Weak solutions of SDEs

3.1 The weak solution concept

We start with the famous example of H. Tanaka. Consider the scalar SDE

dX(t) = sgn(X(t)) dW (t), t ≥ 0, X(0) = 0, (3.1.1)

where sgn(x) = 1_{(0,∞)}(x) − 1_{(−∞,0]}(x). Any adapted process X satisfying (3.1.1) is a continuous martingale with quadratic variation ⟨X⟩_t = t. Levy's Theorem 1.2.25 implies that X has the law of Brownian motion. If X satisfies this equation, then so does −X, since the Lebesgue measure of {t ∈ [0, T] : X(t) = 0} vanishes almost surely for any Brownian motion. Hence strong uniqueness cannot hold.
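Discretely, the symmetry is easy to observe: running the Tanaka equation "backwards", take a simulated Brownian path X, form W(t) = ∫_0^t sgn(X(s)) dX(s) by left-point sums, and note that −X produces the same increments of W, while the quadratic variation of W is ≈ t. A sketch with arbitrary grid parameters:

```python
import numpy as np

rng = np.random.default_rng(5)

T, n = 1.0, 100_000
dt = T / n

def sgn(x):
    """sgn as defined in the text: +1 on (0, inf), -1 on (-inf, 0]."""
    return np.where(x > 0, 1.0, -1.0)

dX = rng.normal(0.0, np.sqrt(dt), n)
X = np.concatenate(([0.0], np.cumsum(dX)))     # Brownian path with X(0) = 0

# W(t) = int_0^t sgn(X(s)) dX(s), left-point discretisation.
dW_plus = sgn(X[:-1]) * dX                     # driven by X
dW_minus = sgn(-X[:-1]) * (-dX)                # driven by -X

# Apart from the single starting cell (where X = 0 exactly and sgn(0) = -1),
# both drivers yield identical increments of W.
same = bool(np.allclose(dW_plus[1:], dW_minus[1:]))

qv = float(np.sum(dW_plus**2))                 # discrete quadratic variation <W>_T
print(f"<W>_T ~ {qv:.4f} (should be ~ {T}); increments from X and -X agree: {same}")
```

Since sgn(x)² = 1, the quadratic variation of W equals that of X, in line with Levy's characterisation used above.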

We now invert the roles of X and W, for equation (3.1.1) obviously implies dW(t) = sgn(X(t)) dX(t). Hence, we take a probability space (Ω, F, P) equipped with a Brownian motion X and consider the filtration (F^X_t)_{t≥0} generated by X and completed under P. Then we define the process

W(t) := ∫_0^t sgn(X(s)) dX(s), t ≥ 0.

W is a continuous (F^X_t)-adapted martingale with quadratic variation ⟨W⟩_t = t, hence also an (F^X_t)-Brownian motion. The couple (X, W) then solves the Tanaka equation. However, X is not a strong solution because the filtration (F^W_t)_{t≥0} generated by W and completed under P satisfies F^W_t ⊊ F^X_t, as we shall see.

For the proof let us take a sequence (f_n) of continuously differentiable functions on the real line that satisfy f_n(x) = sgn(x) for |x| ≥ 1/n and |f_n(x)| ≤ 1, f_n(−x) = −f_n(x) for all x ∈ R. If we set F_n(x) = ∫_0^x f_n(y) dy, then F_n ∈ C²(R) and lim_{n→∞} F_n(x) = |x| holds uniformly on compact intervals. By Ito's formula, for any solution X of (3.1.1)

F_n(X(t)) − ∫_0^t f_n(X(s)) dX(s) = ½ ∫_0^t f′_n(X(s)) ds, t ≥ 0,


follows, and by Lebesgue's Theorem the left-hand side converges in probability for n → ∞ to |X(t)| − ∫_0^t sgn(X(s)) dX(s) = |X(t)| − W(t). By symmetry, f′_n(x) = f′_n(|x|) and we have for t ≥ 0 P-almost surely

W(t) = |X(t)| − lim_{n→∞} ½ ∫_0^t f′_n(|X(s)|) ds.

Hence, F^W_t ⊂ F^{|X|}_t holds with obvious notation. The event {X(t) > 0} has probability ½ > 0 and is not F^{|X|}_t-measurable. Therefore F^X_t \ F^{|X|}_t is non-void and F^W_t ⊊ F^X_t holds for any solution X, which is thus not a strong solution in our definition. Note that the above derivation would be clearer with the aid of Tanaka's formula and the concept of local time.

3.1.1 Definition. A weak solution of the stochastic differential equation (2.1.1) is a triple (X, W), (Ω, F, P), (F_t)_{t≥0} where

(a) (Ω, F, P) is a probability space equipped with the filtration (Ft)t≥0 that satisfiesthe usual conditions;

(b) X is a continuous, (F_t)-adapted R^d-valued process and W is an m-dimensional (F_t)-Brownian motion on the probability space;

(c) conditions (d) and (e) of Definition 2.1.1 are fulfilled.

The distribution PX(0) of X(0) is called initial distribution of the solution X.

3.1.2 Remark. Any strong solution is also a weak solution with the additional filtration property F^X_t ⊂ F^W_t ∨ σ(X(0)). The Tanaka equation provides a typical example of a weakly solvable SDE that has no strong solution.

3.1.3 Definition. We say that pathwise uniqueness for equation (2.1.1) holds whenever two weak solutions (X, W), (Ω, F, P), (F_t)_{t≥0} and (X′, W), (Ω, F, P), (F′_t)_{t≥0} on a common probability space with a common Brownian motion with respect to both filtrations (F_t) and (F′_t), and with P(X(0) = X′(0)) = 1, satisfy P(∀ t ≥ 0 : X(t) = X′(t)) = 1.

3.1.4 Definition. We say that uniqueness in law holds for equation (2.1.1) whenever two weak solutions (X, W), (Ω, F, P), (F_t)_{t≥0} and (X′, W′), (Ω′, F′, P′), (F′_t)_{t≥0} with the same initial distribution have the same law, that is, P(X(t1) ∈ B1, ..., X(tn) ∈ Bn) = P′(X′(t1) ∈ B1, ..., X′(tn) ∈ Bn) holds for all n ∈ N, t1, ..., tn > 0 and Borel sets B1, ..., Bn.

3.1.5 Example. For the Tanaka equation pathwise uniqueness fails because X and −X are at the same time solutions. We have, however, seen that X must have the law of a Brownian motion and thus uniqueness in law holds.


3.2 The two concepts of uniqueness

Let us discuss the notions of pathwise uniqueness and of uniqueness in law in some detail. When we consider weak solutions we are mostly interested in the law of the solution process, so that uniqueness in law is usually all we require. However, as we shall see, the concept of pathwise uniqueness is stronger than that of uniqueness in law. If we reconsider the proof of Theorem 2.2.3, we immediately see that we have not used the special filtration properties of strong solutions and we obtain:

3.2.1 Theorem. Suppose that b and σ are locally Lipschitz continuous in the space variable, that is, for all n ∈ N there is a K_n > 0 such that for all t ≥ 0 and all x, y ∈ R^d with ‖x‖, ‖y‖ ≤ n

‖b(x, t) − b(y, t)‖ + ‖σ(x, t) − σ(y, t)‖ ≤ K_n‖x − y‖

holds. Then pathwise uniqueness holds for equation (2.1.1).

The same remark applies to Example 2.2.1. As Tanaka's example has shown, pathwise uniqueness can fail when uniqueness in law holds. It is not clear, though, that the converse implication is true.

3.2.2 Theorem. Pathwise uniqueness implies uniqueness in law.

Proof. We have to show that two weak solutions (X_i, W_i), (Ω_i, F_i, P_i), (F^i_t), i = 1, 2, on possibly different filtered probability spaces agree in distribution. The main idea is to define two weak solutions with the same law on a common space with the same Brownian motion and to apply the pathwise uniqueness assumption. To this end we set

S := R^d × C(R_+, R^m) × C(R_+, R^d), S = Borel σ-field of S,

and consider the image measures

Qi(A) := Pi((Xi(0), Wi, Xi) ∈ A), A ∈ S, i = 1, 2.

Since X_i(t) is by definition F^i_t-measurable, X_i(0) is independent of W_i under P_i. If we call µ the law of X_i(0) under P_i (which by assumption does not depend on i), we thus have that the product measure µ ⊗ W is the law of the first two coordinates (X_i(0), W_i) under P_i, where W denotes the Wiener measure. Since C(R_+, R^k) is a Polish space, a regular conditional distribution (Markov kernel) K_i of X_i under P_i given (X_i(0), W_i) exists (Karatzas and Shreve 1991, Section 5.3D) and we may write for Borel sets F ⊂ R^d × C(R_+, R^m), G ⊂ C(R_+, R^d)

Q_i(F × G) = ∫_F K_i(x0, w; G) µ(dx0) W(dw).

Let us now define

T = S × C(R_+, R^d), T = Borel σ-field of T,

and equip this space with the probability measure

Q(d(x0, w, y1, y2)) = K1(x0, w; dy1) K2(x0, w; dy2) µ(dx0) W(dw).

Finally, denote by T* the completion of T under Q and consider the filtration

T_t = σ((x0, w(s), y1(s), y2(s)), s ≤ t),

its Q-completion T*_t and its right-continuous version T**_t = ⋂_{s>t} T*_s. Then the projection on the first coordinate has under Q the law of the initial distribution of the X_i and the projection on the second coordinate is under Q a T**_t-Brownian motion (recall Remark 2.1.2). Moreover, the distribution of the projection (w, y_i) under Q is the same as that of (W_i, X_i) under P_i, such that we have constructed two weak solutions on the same probability space with the same initial condition and the same Brownian motion.

Pathwise uniqueness now implies Q({(x0, w, y1, y2) ∈ T : y1 = y2}) = 1. This entails

P1((W1, X1) ∈ A) = Q((w, y1) ∈ A) = Q((w, y2) ∈ A) = P2((W2, X2) ∈ A).

The same methodology allows one to prove the following, at first glance rather striking result.

3.2.3 Theorem. The existence of a weak solution and pathwise uniqueness imply the existence of a strong solution on any sufficiently rich probability space.

Proof. See (Karatzas and Shreve 1991, Cor. 5.3.23).

3.3 Existence via Girsanov’s theorem

The Girsanov theorem is one of the main tools of stochastic analysis. In the theory of stochastic differential equations it often allows one to extend results for a particular equation to those with more general drift coefficients. Abstractly seen, a Radon-Nikodym density for a new measure is obtained, under which the original process behaves differently. We only work in dimension one and start with a lemma on conditional Radon-Nikodym densities.

3.3.1 Lemma. Let (Ω, F, P) be a probability space, H ⊂ F be a sub-σ-algebra and f ∈ L¹(P) be a density, that is, nonnegative and integrating to one. Then a new probability measure Q on F is defined by Q(dω) = f(ω) P(dω), and for any F-measurable random variable X with E_Q[|X|] < ∞ we obtain

E_Q[X | H] E_P[f | H] = E_P[Xf | H] P-a.s.


3.3.2 Remark. In the unconditional case we obviously have

E_Q[X] = ∫ X dQ = ∫ Xf dP = E_P[Xf].

Proof. We show that the left-hand side is a version of the conditional expectation on the right. Since it is obviously H-measurable, it suffices to verify

∫_H E_Q[X | H] E_P[f | H] dP = ∫_H Xf dP = ∫_H X dQ ∀ H ∈ H.

By the projection property of conditional expectations we obtain

E_P[1_H E_Q[X | H] E_P[f | H]] = E_P[1_H E_Q[X | H] f] = E_Q[1_H E_Q[X | H]] = E_Q[1_H X],

which is the above identity.

3.3.3 Lemma. Let (β(t), 0 ≤ t ≤ T) be an (F_t)-adapted process with β1_{[0,T]} ∈ V*. Then

M(t) := exp(−∫_0^t β(s) dW(s) − ½ ∫_0^t β²(s) ds), 0 ≤ t ≤ T,

is an (F_t)-supermartingale. It is a martingale if and only if E[M(T)] = 1 holds.

Proof. If we apply Ito's formula to M, we obtain

dM(t) = −β(t)M(t) dW(t), 0 ≤ t ≤ T.

Hence, M is always a nonnegative local P-martingale. By Fatou's lemma for conditional expectations we infer that M is a supermartingale, and a proper martingale if and only if E_P[M(T)] = E_P[M(0)] = 1.

3.3.4 Lemma. M is a martingale if β satisfies one of the following conditions:

1. β is uniformly bounded;

2. Novikov's condition: E[exp(½ ∫_0^T β²(t) dt)] < ∞;

3. Kazamaki's condition: E[exp(½ ∫_0^T β(t) dW(t))] < ∞.


Proof. By the previous proof we know that M solves the linear SDE dM(t) = −β(t)M(t) dW(t) with M(0) = 1. If β(t) is uniformly bounded, the diffusion coefficient satisfies the linear growth and Lipschitz conditions, and we could modify Theorem 2.3.1 to cover also stochastic coefficients and obtain equally that sup_{0≤t≤T} E[M(t)²] is finite. This implies βM1_{[0,T]} ∈ V and M is a martingale.

Alternatively, we prove βM1_{[0,T]} ∈ V by hand: If β is uniformly bounded by some K > 0, then we have for any p > 0 and any partition 0 = t0 ≤ t1 ≤ ··· ≤ tn = t

E[exp(p Σ_{i=1}^n β(t_{i−1})(W(t_i) − W(t_{i−1})))]
= E[exp(p Σ_{i=1}^{n−1} β(t_{i−1})(W(t_i) − W(t_{i−1}))) E[exp(pβ(t_{n−1})(W(t_n) − W(t_{n−1}))) | F_{t_{n−1}}]]
= E[exp(p Σ_{i=1}^{n−1} β(t_{i−1})(W(t_i) − W(t_{i−1}))) exp(½p²β(t_{n−1})²(t_n − t_{n−1}))]
≤ E[exp(p Σ_{i=1}^{n−1} β(t_{i−1})(W(t_i) − W(t_{i−1}))) exp(p²K²(t_n − t_{n−1}))]
≤ exp(Σ_{i=1}^n p²K²(t_i − t_{i−1})) = exp(p²K²t).

This shows that the random variables exp(Σ_{i=1}^n β(t_{i−1})(W(t_i) − W(t_{i−1}))) are uniformly bounded in any L^p(P)-space and thus uniformly integrable. Since by taking finer partitions these random variables converge to exp(∫_0^t β(s) dW(s)) in P-probability, we infer that M(t) has finite expectation and even moments of all orders. Consequently, ∫_0^T E[(β(t)M(t))²] dt is finite and M is a martingale.

For the sufficiency of Novikov's and Kazamaki's condition we refer to (Liptser and Shiryaev 2001) and the references and examples (!) there.
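For a bounded adapted β the martingale property can be probed by Monte Carlo: with left-point discretisation the product Π exp(−β ΔW − ½β²Δt) is an exact discrete martingale, so its sample mean at time T should be close to 1. A sketch with the arbitrary choice β(t) = sin(W(t)):

```python
import numpy as np

rng = np.random.default_rng(6)

T, n, m = 1.0, 200, 50_000     # horizon, time steps, Monte Carlo paths
dt = T / n

W = np.zeros(m)
logM = np.zeros(m)
for _ in range(n):
    beta = np.sin(W)                       # bounded, adapted (depends on the past only)
    dW = rng.normal(0.0, np.sqrt(dt), m)
    logM += -beta * dW - 0.5 * beta**2 * dt
    W += dW

M_T = np.exp(logM)
mean_M = float(M_T.mean())
print(f"E[M(T)] ~ {mean_M:.4f} (martingale property predicts 1)")
```

Here |β| ≤ 1, so condition 1 of Lemma 3.3.4 applies; for an unbounded β violating the conditions the sample mean can fall visibly below 1, reflecting the strict supermartingale case.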

3.3.5 Theorem. Let (X(t), 0 ≤ t ≤ T) be a stochastic (Ito) process on (Ω, F, (F_t), P) satisfying

X(t) = ∫_0^t β(s) ds + W(t), 0 ≤ t ≤ T,

with a Brownian motion W and a process β1_{[0,T]} ∈ V*. If β is such that M is a martingale, then (X(t), 0 ≤ t ≤ T) is a Brownian motion under the measure Q on (Ω, F, (F_t)) defined by Q(dω) = M(T, ω) P(dω).

Proof. We use Levy’s characterisation of Brownian motion from Theorem 1.2.25.Since M is a martingale, M(T ) is a density and Q is well-defined.


We put Z(t) = M(t)X(t) and obtain by Ito's formula (or partial integration)

dZ(t) = M(t) dX(t) + X(t) dM(t) + d⟨M, X⟩_t
= M(t)(β(t) dt + dW(t) − X(t)β(t) dW(t) − β(t) dt)
= M(t)(1 − X(t)β(t)) dW(t).

This shows that Z is a local martingale. If Z is a martingale, then we accomplish the proof using the preceding lemma:

E_Q[X(t) | F_s] = E_P[M(t)X(t) | F_s] / E_P[M(t) | F_s] = Z(s)/M(s) = X(s), s ≤ t,

implies that X is a Q-martingale which by its very definition has quadratic variation t. Hence, X is a Brownian motion under Q.

If Z is only a local martingale with associated stopping times (τ_n), then the above relation holds for the stopped processes X^{τ_n}(t) = X(t ∧ τ_n), which shows that X is a local Q-martingale and Levy's theorem applies.

3.3.6 Proposition. Suppose X is a stochastic process on (Ω, F, (F_t), P) satisfying for some T > 0 and measurable functions b and σ

dX(t) = b(X(t), t) dt + σ(X(t), t) dW(t), 0 ≤ t ≤ T, X(0) = X0.

Assume further that u(x, t) := −c(x, t)/σ(x, t), c measurable, is such that

M(t) = exp(−∫_0^t u(X(s), s) dW(s) − ½ ∫_0^t u²(X(s), s) ds), 0 ≤ t ≤ T,

is an (F_t)-martingale. Then the stochastic differential equation

dY(t) = (b(Y(t), t) + c(Y(t), t)) dt + σ(Y(t), t) dW̃(t), 0 ≤ t ≤ T, Y(0) = X0, (3.3.1)

has a weak solution given by ((X, W̃), (Ω, F, Q), (F_t)) for the Q-Brownian motion

W̃(t) := W(t) + ∫_0^t u(X(s), s) ds, t ≥ 0,

and the probability Q given by Q(dω) := M(T, ω) P(dω).

3.3.7 Remark. Usually, the martingale (M(t), t ≥ 0) is not closable, whence we are led to consider stochastic differential equations on finite time intervals.

The martingale condition is for instance satisfied if σ is bounded away from zero and c is uniformly bounded. Putting σ(x, t) = 1 and b(x, t) = 0, we have weak existence for the equation dX(t) = c(X(t), t) dt + dW(t) if c is Borel-measurable and satisfies a linear growth condition in the space variable, but without any continuity assumption (Karatzas and Shreve 1991, Prop. 5.3.6).


Proof. From Theorem 3.3.5 we infer that W̃ is a Q-Brownian motion. Hence, we can write

dX(t) = (b(X(t), t) − σ(X(t), t)u(X(t), t)) dt + σ(X(t), t) dW̃(t),

which by definition of u shows that (X, W̃) solves equation (3.3.1) under Q.

The Girsanov theorem also allows statements concerning uniqueness in law. The following is a typical version, which is proved in (Karatzas and Shreve 1991, Prop. 5.3.10, Cor. 5.3.11).

3.3.8 Proposition. Let two weak solutions ((X_i, W_i), (Ω_i, F_i, P_i), (F_{i,t})), i = 1, 2, of

dX(t) = b(X(t), t) dt + dW(t),  0 ≤ t ≤ T,

with b : R × R_+ → R measurable be given with the same initial distribution. If P_i(∫_0^T |b(X_i(t), t)|² dt < ∞) = 1 holds for i = 1, 2, then (X_1, W_1) and (X_2, W_2) have the same law under the respective probability measures. In particular, if b is uniformly bounded, then uniqueness in distribution holds.

3.4 Applications in finance and statistics

Chapter 4

The Markov properties

4.1 General facts about Markov processes

Let us fix the measurable space (state space) (S, S) and the filtered probability space (Ω, F, P; (F_t)_{t≥0}) until further notice. We present certain notions and results concerning Markov processes without proof and refer e.g. to Kallenberg (2002) for further information. We specialise immediately to processes in continuous time and later on also to processes with continuous trajectories.

4.1.1 Definition. An S-valued stochastic process (X(t), t ≥ 0) is called a Markov process if X is (F_t)-adapted and satisfies

∀ 0 ≤ s ≤ t, B ∈ S : P(X(t) ∈ B | F_s) = P(X(t) ∈ B | X(s))  P-a.s.

In the sequel we shall always suppose that regular conditional transition probabilities (Markov kernels) μ_{s,t} exist, that is, for all s ≤ t the functions μ_{s,t} : S × S → R are measurable in the first component and probability measures in the second component and satisfy

μ_{s,t}(X(s), B) = P(X(t) ∈ B | X(s)) = P(X(t) ∈ B | F_s)  P-a.s.   (4.1.1)

4.1.2 Lemma. The Markov kernels (μ_{s,t}) satisfy the Chapman-Kolmogorov equation

μ_{s,u}(x, B) = ∫_S μ_{t,u}(y, B) μ_{s,t}(x, dy)  ∀ 0 ≤ s ≤ t ≤ u, x ∈ S, B ∈ S.

4.1.3 Definition. Any family of regular conditional probabilities (μ_{s,t})_{s≤t} satisfying the Chapman-Kolmogorov equation is called a semigroup of Markov kernels. The kernels (or the associated process) are called time homogeneous if μ_{s,t} = μ_{0,t−s} holds. In this case we just write μ_{t−s}.
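On a finite state space the kernels μ_{s,t} are just row-stochastic matrices, μ_{s,t}(x, {y}) = (P_{s,t})_{xy}, and the Chapman-Kolmogorov equation reduces to the matrix product P_{s,u} = P_{s,t} P_{t,u}. A minimal illustration (the particular matrices below are made-up examples):

```python
# Chapman-Kolmogorov for a 3-state chain: mu_{s,u} = mu_{s,t} * mu_{t,u}
# as row-stochastic matrices. The kernels are arbitrary illustrative values.

def matmul(A, B):
    """Multiply two square matrices given as lists of lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

mu_st = [[0.9, 0.1, 0.0],   # kernel from time s to t
         [0.2, 0.5, 0.3],
         [0.0, 0.4, 0.6]]
mu_tu = [[0.7, 0.2, 0.1],   # kernel from time t to u
         [0.1, 0.8, 0.1],
         [0.3, 0.3, 0.4]]

# mu_{s,u}(x, B) = sum_y mu_{t,u}(y, B) mu_{s,t}(x, dy): a matrix product.
mu_su = matmul(mu_st, mu_tu)

# Each row of the composed kernel is again a probability vector.
for row in mu_su:
    assert abs(sum(row) - 1.0) < 1e-12
print(mu_su[0])
```

In the time-homogeneous case the same computation gives the semigroup property μ_{t+s} = μ_t μ_s by iterating the matrix product.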

4.1.4 Theorem. For any initial distribution ν on (S, S) and any semigroup of Markov kernels (μ_{s,t}) there exists a Markov process X such that X(0) is ν-distributed and equation (4.1.1) is satisfied.


If S is a metric space with Borel σ-algebra S and if the process has a continuous version, then the process can be constructed on the path space Ω = C(R_+, S) with its Borel σ-algebra B and canonical right-continuous filtration F_t = ⋂_{s>t} σ(X(u), u ≤ s), where X(u, ω) := ω(u) are the coordinate projections. The probability measure obtained is called P_ν and it satisfies

P_ν(A) = ∫_S P_x(A) ν(dx),  A ∈ B,

with P_x := P_{δ_x}.

For the formal statement of the strong Markov property we introduce the shift operator ϑ_t that induces a left-shift on the function space Ω.

4.1.5 Definition. The shift operator ϑ_t on the canonical space Ω is given by ϑ_t : Ω → Ω, ϑ_t(ω) = ω(t + •) for all t ≥ 0.

4.1.6 Lemma.

1. ϑ_t is measurable for all t ≥ 0.

2. For (F_t)-stopping times σ and τ the random time γ := σ + τ ∘ ϑ_σ is again an (F_t)-stopping time.

4.1.7 Theorem. Let X be a time homogeneous Markov process and let τ be an (F_t)-stopping time with at most countably many values. Then we have for all x ∈ S

P_x(X ∘ ϑ_τ ∈ A | F_τ) = P_{X(τ)}(A)  P_x-a.s.  ∀ A ∈ B.   (4.1.2)

If X is the canonical process on the path space, then this is just an identity concerning the image measure under ω ↦ ϑ_{τ(ω)}(ω): P_x(• | F_τ) ∘ (ϑ_τ)^{−1} = P_{X(τ)}.

4.1.8 Definition. A process X satisfying (4.1.2) for any finite (or equivalently bounded) stopping time τ is called strong Markov.

4.1.9 Remark. The strong Markov property entails the Markov property by setting τ = t and A = {X(s) ∈ B} for some B ∈ S in (4.1.2).

4.2 The martingale problem

We specialise now to the state space S = R^d. As before we work on the path space Ω = C(R_+, R^d) with its Borel σ-algebra B.

4.2.1 Definition. A probability measure P on the path space (Ω, B) is a solution of the local martingale problem for (b, σ) if

M^f(t) := f(X(t)) − f(X(0)) − ∫_0^t A_s f(X(s)) ds,  t ≥ 0,


where

A_s f(x) := (1/2) Σ_{i,j=1}^d (σσ^T(x, s))_{ij} (∂²f/∂x_i∂x_j)(x) + ⟨b(x, s), grad f(x)⟩,

with b : R^d × R_+ → R^d and σ : R^d × R_+ → R^{d×m} measurable, is a local martingale under P for all functions f ∈ C_K^∞(R^d, R).

4.2.2 Remark. If b and σ are bounded, then P even solves the martingale problem, for which M^f is required to be a proper martingale.

4.2.3 Theorem. The stochastic differential equation

dX(t) = b(X(t), t) dt + σ(X(t), t) dW(t),  t ≥ 0,

has a weak solution ((X, W), (Ω, A, P), (F_t)) if and only if a solution to the local martingale problem for (b, σ) exists. In this case the law P^X of X on the path space equals the solution of the local martingale problem.

Proof. For simplicity we only give the proof in the one-dimensional case; the multi-dimensional method of proof follows the same ideas.

1. Given a weak solution, Ito's rule yields for any f ∈ C_K^∞(R)

df(X(t)) = f′(X(t)) dX(t) + (1/2) f″(X(t)) d⟨X⟩_t
= f′(X(t))σ(X(t), t) dW(t) + A_t f(X(t)) dt.

Hence, M^f is a local martingale; just note that σ(X(•)) ∈ V* is required for the weak solution and f′ is bounded, such that the stochastic integral is indeed well defined and a local martingale under P. Of course, this remains true when considered on the path space under the image measure P^X.

2. Conversely, let P be a solution of the local martingale problem and consider functions f_n ∈ C_K^∞(R) with f_n(x) = x for |x| ≤ n. Then the standard stopping argument applied to M^{f_n} for n → ∞ shows that

M(t) := X(t) − X(0) − ∫_0^t b(X(s), s) ds,  t ≥ 0,

is a local martingale. Similarly approximating g(x) = x², we obtain that

N(t) := X²(t) − X²(0) − ∫_0^t (σ²(X(s), s) + 2b(X(s), s)X(s)) ds,  t ≥ 0,

is a local martingale. By Ito's formula, dX²(t) = 2X(t) dX(t) + d⟨X⟩_t holds and shows

N(t) = ∫_0^t 2X(s) dM(s) + ⟨M⟩_t − ∫_0^t σ²(X(s), s) ds,  t ≥ 0.


Therefore ⟨M⟩_t − ∫_0^t σ²(X(s), s) ds is a continuous local martingale of bounded variation. By (Revuz and Yor 1999, Prop. IV.1.2) it must therefore vanish identically and d⟨M⟩_t = σ²(X(t), t) dt follows. By the representation theorem for continuous local martingales (Kallenberg 2002, Thm. 18.12) there exists a Brownian motion W such that M(t) = ∫_0^t σ(X(s), s) dW(s) holds for all t ≥ 0.

Consequently (X, W) solves the stochastic differential equation.

4.2.4 Corollary. A stochastic differential equation has a (in distribution) unique weak solution if and only if the corresponding local martingale problem is uniquely solvable, given some initial distribution.

4.3 The strong Markov property

We immediately start with the main result that solutions of stochastic differential equations are, under mild conditions, strong Markov processes. This entails that the solutions are diffusion processes in the sense of Feller (Feller 1971).

4.3.1 Theorem. Let b : R^d → R^d and σ : R^d → R^{d×m} be time-homogeneous measurable coefficients such that the local martingale problem for (b, σ) has a unique solution P_x for all initial distributions δ_x, x ∈ R^d. Then the family (P_x) satisfies the strong Markov property.

Proof. In order to state the strong Markov property we need that (P_x)_{x∈R^d} are Markov kernels. Theorem 21.10 of Kallenberg (2002) shows by abstract arguments that x ↦ P_x(B) is measurable for all B ∈ B.

We thus have to show

P_x(X ∘ ϑ_τ ∈ B | F_τ) = P_{X(τ)}(B)  P_x-a.s.  ∀ B ∈ B and bounded stopping times τ.

By the unique solvability of the martingale problem it suffices to show that the random(!) probability measure Q_τ := P_x((ϑ_τ)^{−1} • | F_τ) solves P_x-almost surely the martingale problem for (b, σ) with initial distribution δ_{X(τ)}. Concerning the initial distribution we find for any Borel set A ⊂ R^d by the stopping time property of τ

P_x((ϑ_τ)^{−1}{ω′ | ω′(0) ∈ A} | F_τ)(ω) = P_x({ω′ | ω′(τ(ω′)) ∈ A} | F_τ)(ω)
= 1_A(ω(τ(ω)))
= 1_A(X(τ(ω), ω))
= P_{X(τ(ω),ω)}({ω′ | ω′(0) ∈ A}).

It remains to prove the local martingale property of M^f under Q_τ, that is, the martingale property of M^{f,n}(t) := M^f(t ∧ τ_n) with τ_n := inf{t ≥ 0 | ‖M^f(t)‖ ≥ n}.


By its very definition M^f(t) is always F_t-measurable, so we prove that P_x-almost surely

∫_F M^{f,n}(t, ω′) Q_τ(dω′) = ∫_F M^{f,n}(s, ω′) Q_τ(dω′)  ∀ F ∈ F_s, s ≤ t.

By the separability of Ω and the continuity of M^{f,n} it suffices to prove this identity for countably many F, s and t (Kallenberg 2002, Thm. 21.11). Consequently, we need not worry about P_x-null sets. We obtain

∫_F M^{f,n}(t, ω′) Q_τ(dω′) = ∫ 1_F(ϑ_τ(ω″)) M^{f,n}(t, ϑ_τ(ω″)) P_x(dω″ | F_τ)
= E_x[1_{(ϑ_τ)^{−1}F} M^{f,n}(t, ϑ_τ) | F_τ].

Because of M^{f,n}(t, ϑ_τ) = M^f((t + τ) ∧ σ_n) with σ_n := τ_n ∘ ϑ_τ + τ, which is a stopping time by Lemma 4.1.6, the process M^{f,n}(t, ϑ_τ) is a martingale under P_x adapted to (F_{t+τ})_{t≥0}. Since (ϑ_τ)^{−1}F is an element of F_{s+τ}, we conclude by optional stopping that P_x-almost surely

∫_F M^{f,n}(t, ω′) Q_τ(dω′) = E_x[1_{(ϑ_τ)^{−1}F} E_x[M^{f,n}(t, ϑ_τ) | F_{s+τ}] | F_τ]
= E_x[1_{(ϑ_τ)^{−1}F} M^{f,n}(s + τ) | F_τ]
= ∫_F M^{f,n}(s, ω′) Q_τ(dω′).

Consequently, we have shown that with P_x-probability one Q_τ solves the martingale problem with initial distribution δ_{X(τ)} and therefore equals P_{X(τ)}.

4.3.2 Example. A famous application is the reflection principle for Brownian motion W. By the strong Markov property, for any finite stopping time τ the process (W(t + τ) − W(τ), t ≥ 0) is again a Brownian motion independent of F_τ, such that with τ_b := inf{t ≥ 0 | W(t) ≥ b} for some b > 0:

P_0(τ_b ≤ t) = P_0(τ_b ≤ t, W(t) ≥ b) + P_0(τ_b ≤ t, W(t) < b)
= P_0(W(t) ≥ b) + P_0(τ_b ≤ t, W(τ_b + (t − τ_b)) − W(τ_b) < 0)
= P_0(W(t) ≥ b) + (1/2) P_0(τ_b ≤ t).

This implies P_0(τ_b ≤ t) = 2 P_0(W(t) ≥ b) and the stopping time τ_b has a distribution with density

f_b(t) = (b/√(2πt³)) e^{−b²/(2t)},  t ≥ 0.

Because of {τ_b ≤ t} = {max_{0≤s≤t} W(s) ≥ b} we have at the same time determined the distribution of the maximum of Brownian motion on any finite interval.
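The density f_b can be checked numerically: integrating it over [0, t] should reproduce P_0(τ_b ≤ t) = 2 P_0(W(t) ≥ b) = erfc(b/√(2t)). A small sketch (the values of b and t are arbitrary illustrative choices):

```python
import math

# Check that the first-passage density f_b(s) = b/sqrt(2 pi s^3) exp(-b^2/(2s))
# integrates over [0, t] to P_0(tau_b <= t) = 2 P_0(W(t) >= b) = erfc(b/sqrt(2t)).

def f_b(s, b):
    if s <= 0.0:
        return 0.0                       # the integrand vanishes as s -> 0
    return b / math.sqrt(2.0 * math.pi * s**3) * math.exp(-b * b / (2.0 * s))

def simpson(g, lo, hi, n=2000):          # composite Simpson rule, n even
    h = (hi - lo) / n
    total = g(lo) + g(hi)
    for i in range(1, n):
        total += g(lo + i * h) * (4 if i % 2 else 2)
    return total * h / 3.0

b, t = 1.0, 2.0
lhs = simpson(lambda s: f_b(s, b), 0.0, t)
rhs = math.erfc(b / math.sqrt(2.0 * t))
print(lhs, rhs)   # the two values agree up to quadrature error
```

Both sides evaluate to about 0.48 for b = 1, t = 2, confirming that f_b is indeed the density of τ_b.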


4.4 The infinitesimal generator

We first gather some facts concerning Markov transition operators and their semigroup property; see Kallenberg (2002) or Revuz and Yor (1999).

4.4.1 Lemma. Given a family (μ_t)_{t≥0} of time-homogeneous Markov kernels, the operators

T_t f(x) := ∫ f(y) μ_t(x, dy),  f : S → R bounded and measurable,

form a semigroup, that is, T_t ∘ T_s = T_{t+s} holds for all t, s ≥ 0.

Proof. Use the Chapman-Kolmogorov equation.

We now specialise to the state space S = Rd with its Borel σ-algebra.

4.4.2 Definition. If the operators (T_t)_{t≥0} satisfy (a) T_t f ∈ C_0(R^d) for all f ∈ C_0(R^d) and (b) lim_{h→0} T_h f(x) = f(x) for all f ∈ C_0(R^d), x ∈ R^d, then (T_t) is called a Feller semigroup.

4.4.3 Theorem. A Feller semigroup (T_t)_{t≥0} is a strongly continuous operator semigroup on C_0(R^d), that is, lim_{h→0} T_h f = f holds in supremum norm. It is uniquely determined by its generator A : D(A) ⊂ C_0(R^d) → C_0(R^d) with

A f := lim_{h→0} (T_h f − f)/h,  D(A) := {f ∈ C_0(R^d) | lim_{h→0} (T_h f − f)/h exists}.

Moreover, the semigroup uniquely determines the Markov kernels and thus the distribution of the associated Markov process (which is called a Feller process).

4.4.4 Corollary. We have for all f ∈ D(A)

(d/dt) T_t f = A T_t f = T_t A f.

4.4.5 Theorem. (Hille-Yosida) Let A be a closed linear operator on C_0(R^d) with dense domain D(A). Then A is the generator of a Feller semigroup if and only if

1. the range of λ_0 Id − A is dense in C_0(R^d) for some λ_0 > 0;

2. whenever f ∈ D(A) satisfies f(x) ≥ 0 and f(x) = max_{y∈R^d} f(y) for some x ∈ R^d, then Af(x) ≤ 0 follows (positive maximum principle).

4.4.6 Theorem. If b and σ are bounded and satisfy the conditions of Theorem 4.3.1, then the Markov kernels (P_x)_{x∈R^d} solving the martingale problem for (b, σ) give rise to a Feller semigroup (T_t). Any function f ∈ C_0²(R^d) lies in D(A) and fulfills

A f(x) = (1/2) Σ_{i,j=1}^d (σσ^T(x))_{ij} (∂²f/∂x_i∂x_j)(x) + ⟨b(x), grad f(x)⟩.


We shall even prove a stronger result under less restrictive conditions, which turns out to be a very powerful tool in calculating certain distributions of the solution processes.

4.4.7 Theorem. (Dynkin's formula) Assume that b and σ are measurable, locally bounded and such that the SDE (2.1.1) with time-homogeneous coefficients has a (in distribution) unique weak solution. Then for all x ∈ R^d, f ∈ C_K²(R^d) and all bounded stopping times τ we have

E_x[f(X(τ))] = f(x) + E_x[∫_0^τ A f(X(s)) ds].

Proof. By Theorem 4.2.3 the process M^f is a local martingale under P_x. By the compact support of f and the local boundedness of b and σ we infer that M^f(t) is uniformly bounded and therefore M^f is a martingale. Then the optional stopping result E[M^f(τ)] = E[M^f(0)] = 0 yields Dynkin's formula.

4.4.8 Example.

1. Let W be an m-dimensional Brownian motion starting in some point a and τ_R := inf{t ≥ 0 | ‖W(t)‖ ≥ R}. Then E_a[τ_R] = (R² − ‖a‖²)/m holds for ‖a‖ < R. To infer this from Dynkin's formula put f(x) = ‖x‖² for ‖x‖ ≤ R and extend f outside of the ball such that f ∈ C²(R^m) with compact support. Then A f(x) = m for ‖x‖ ≤ R and therefore Dynkin's formula yields E_a[f(W(τ_R ∧ n))] = f(a) + m E_a[τ_R ∧ n]. By monotone convergence,

E_a[τ_R] = lim_{n→∞} E_a[τ_R ∧ n] = lim_{n→∞} (E_a[‖W(τ_R ∧ n)‖²] − ‖a‖²)/m

holds and we can conclude by dominated convergence (‖W(τ_R ∧ n)‖ ≤ R).
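The identity E_a[τ_R] = (R² − ‖a‖²)/m can also be checked by simulating Brownian paths on a time grid; the grid introduces a small positive bias since exits are detected late. A sketch for the planar case m = 2 started at the origin, where the expected exit time from the unit disc is 1/2 (step size and path count are arbitrary illustrative choices):

```python
import math, random

# Monte Carlo check of E_a[tau_R] = (R^2 - |a|^2)/m for planar Brownian
# motion (m = 2) started at the origin: the expected exit time from the
# unit disc should be (1 - 0)/2 = 0.5, up to discretisation bias.
random.seed(1)
R, dt, n_paths = 1.0, 1e-3, 2000
sd = math.sqrt(dt)

total = 0.0
for _ in range(n_paths):
    x = y = 0.0
    t = 0.0
    while x * x + y * y < R * R:       # run until the path leaves the disc
        x += random.gauss(0.0, sd)
        y += random.gauss(0.0, sd)
        t += dt
    total += t
est = total / n_paths

print(est)   # close to 0.5
```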

2. Consider the one-dimensional stochastic differential equation

dX(t) = b(X(t)) dt + σ(X(t)) dW(t).

Suppose a weak solution exists for some initial value X(0) with E[X(0)²] < ∞ and that σ²(x) + 2xb(x) ≤ C(1 + x²) holds. Then E[X(t)²] ≤ (E[X(0)²] + 1)e^{Ct} − 1 follows. To prove this, use the same f and put κ_t := τ_R ∧ t with τ_R from above for all t ≥ 0, such that by Dynkin's formula

E_x[X(κ_t)²] = x² + E_x[∫_0^{κ_t} (σ²(X(s)) + 2b(X(s))X(s)) ds] ≤ x² + ∫_0^t C(1 + E_x[X(κ_s)²]) ds.

By Gronwall's lemma, we obtain E_x[1 + X(κ_t)²] ≤ (x² + 1)e^{Ct}. Since this is valid for any R > 0 we get E_x[X(t)²] ≤ (x² + 1)e^{Ct} − 1, and averaging over the initial condition yields E[X(t)²] ≤ (E[X(0)²] + 1)e^{Ct} − 1. Note that this kind of approach was already used in Theorem 2.3.3 and improves significantly on the moment estimate of Theorem 2.3.1.


3. For the solution process X of a one-dimensional SDE as before we consider the stopping time τ := inf{t ≥ 0 | X(t) = 0}. We want to decide whether E_a[τ] is finite or infinite for a > 0. For this set τ_R := τ ∧ inf{t ≥ 0 | X(t) ≥ R}, R > a, and consider a function f ∈ C²(R) with compact support, f(0) = 0 and solving A f(x) = 1 for x ∈ [0, R]. Then Dynkin's formula yields

E_a[f(X(τ_R ∧ n))] = f(a) + E_a[τ_R ∧ n].

For a similar function g with A g = 0 and g(0) = 0 we obtain E_a[g(X(τ_R ∧ n))] = g(a). Hence,

E_a[τ_R ∧ n] = E_a[f(X(τ_R ∧ n))] − f(a)
= P_a(X(τ_R ∧ n) = R) f(R) + E_a[f(X(n)) 1_{τ_R > n}] − f(a)
= (g(a) − E_a[g(X(n)) 1_{τ_R > n}]) f(R)/g(R) + E_a[f(X(n)) 1_{τ_R > n}] − f(a)

follows. Using the uniform boundedness of f and g we infer by monotone and dominated convergence for n → ∞

E_a[τ_R] = g(a) f(R)/g(R) − f(a).

Monotone convergence for R → ∞ thus gives E_a[τ] < ∞ if and only if lim_{R→∞} f(R)/g(R) is finite. The functions f and g can be determined in full generality, but we restrict ourselves to the case of vanishing drift b(x) = 0 and strictly positive diffusion coefficient inf_{0≤y≤x} σ(y) > 0 for all x > 0. Then

f(x) = ∫_0^x ∫_0^y (2/σ²(z)) dz dy  and  g(x) = x

will do. Since f(x) → ∞ and g(x) → ∞ hold for x → ∞, L'Hopital's rule gives

lim_{R→∞} f(R)/g(R) = lim_{R→∞} f′(R)/g′(R) = ∫_0^∞ (2/σ²(z)) dz.

We conclude that the solution of dX(t) = σ(X(t)) dW(t) with X(0) = a satisfies E_a[τ] < ∞ if and only if σ^{−2} is integrable. For constant σ we obtain a multiple of Brownian motion, which satisfies E_a[τ] = ∞. For σ(x) = x + ε, ε > 0, we have E_a[τ] < ∞, but in the limit ε → 0 the expectation tends to infinity. This can be understood when observing that a solution of dX(t) = (X(t) + ε) dW(t) is given by the translated geometric Brownian motion X(t) = exp(W(t) − t/2) − ε, which tends to −ε almost surely, but never reaches the value −ε. Concerning the behaviour of σ(x) for x → ∞ we note that E_a[τ] is finite as soon as σ(x) grows at least like x^α for some α > 1/2, such that the rapid fluctuations of X for large x make excursions towards zero more likely.


4.5 The Kolmogorov equations

The main object one is usually interested in calculating for the solution process X of an SDE is the transition probability P(X(t) ∈ B | X(s) = x) for t ≥ s ≥ 0 and any Borel set B. A concise description is possible if a transition density p(x, y; t) exists satisfying

P(X(t) ∈ B | X(s) = x) = ∫_B p(x, y; t − s) dy.

Here we shall present analytical tools to determine this transition density if it exists. The proof of its existence usually relies either entirely on analytical results or on Malliavin calculus, both being beyond our scope.

4.5.1 Lemma. Assume that b and σ are continuous and such that the SDE (2.1.1) has a (in distribution) unique weak solution for any deterministic initial value. For any f ∈ C_K²(R^d) set u(x, t) := E_x[f(X(t))]. Then u is a solution of the parabolic partial differential equation

∂u/∂t (x, t) = (A u(•, t))(x)  ∀ x ∈ R^d, t ≥ 0,  with u(x, 0) = f(x) ∀ x ∈ R^d.

Proof. Dynkin's formula for τ = t yields by the Fubini-Tonelli theorem

u(x, t) = f(x) + ∫_0^t E_x[A f(X(s))] ds  ∀ x ∈ R^d, t ≥ 0.

Since the coefficients b and σ are continuous, the integrand is continuous and u is continuously differentiable with respect to t, satisfying ∂u/∂t (x, t) = E_x[A f(X(t))]. On the other hand we obtain by the Markov property for t, h > 0

E_x[u(X(h), t)] = E_x[E_{X(h)}[f(X(t))]] = E_x[f(X(t + h))] = u(x, t + h).

For fixed t > 0 we infer that the left-hand side of

(u(x, t + h) − u(x, t))/h = (E_x[u(X(h), t)] − u(x, t))/h

converges for h → 0 to ∂u/∂t, and therefore also the right-hand side does. Therefore u(•, t) lies in the domain D(A) and the assertion follows.

4.5.2 Corollary. If the transition density p(x, y; t) exists, is twice continuously differentiable with respect to x and continuously differentiable with respect to t, then p(x, y; t) solves for all y ∈ R^d the backward Kolmogorov equation

∂u/∂t (x, t) = (A u(•, t))(x)  ∀ x ∈ R^d, t ≥ 0,  with u(x, 0) = δ_y(x).

In other words, for fixed y the transition density is the fundamental solution of this parabolic partial differential equation.


Proof. Writing the identity in the preceding lemma in terms of p, we obtain for any f ∈ C_K²(R^d)

(∂/∂t) ∫_{R^d} f(y) p(x, y; t) dy = A(∫_{R^d} f(y) p(x, y; t) dy).

By the compact support of f and the smoothness properties of p, we may interchange integration and differentiation on both sides. From ∫ (∂/∂t − A) p(x, y; t) f(y) dy = 0 for any test function f we then conclude by a continuity argument.

4.5.3 Corollary. If the transition density p(x, y; t) exists, is twice continuously differentiable with respect to y and continuously differentiable with respect to t, then p(x, y; t) solves for all x ∈ R^d the forward Kolmogorov equation

∂u/∂t (y, t) = (A* u(•, t))(y)  ∀ y ∈ R^d, t ≥ 0,  with u(y, 0) = δ_x(y),

where

A* f(y) = (1/2) Σ_{i,j=1}^d (∂²/∂y_i∂y_j)((σσ^T(y))_{ij} f(y)) − Σ_{i=1}^d (∂/∂y_i)(b_i(y) f(y))

is the formal adjoint of A. Hence, for fixed x the transition density is the fundamental solution of the parabolic partial differential equation with the adjoint operator.

Proof. Let us evaluate E_x[A f(X(t))] for any f ∈ C_K²(R^d) in two different ways. First, we obtain by definition

E_x[A f(X(t))] = ∫ A f(y) p(x, y; t) dy = ∫ f(y) (A* p(x, •; t))(y) dy.

On the other hand, by dominated convergence and by Dynkin's formula we find

∫ f(y) (∂/∂t) p(x, y; t) dy = (∂/∂t) E_x[f(X(t))] = E_x[A f(X(t))].

We conclude again by testing this identity with all f ∈ C_K²(R^d).

4.5.4 Remark. The preceding results are in a sense not very satisfactory because we had to postulate properties of the unknown transition density in order to derive a determining equation. Karatzas and Shreve (1991) state on page 368 sufficient conditions on the coefficients b and σ, obtained from the analysis of the partial differential equations, under which the transition density is the unique classical solution of the forward and backward Kolmogorov equation, respectively. The main hypotheses are ellipticity of the diffusion coefficient and boundedness of both coefficients together with certain Hölder-continuity requirements. In the case of the forward equation, in addition the first two derivatives of σ and the first derivative of b have to have these properties, which is intuitively explained by the form of the adjoint A*.


4.5.5 Example. We have seen that a solution of the scalar Ornstein-Uhlenbeck equation

dX(t) = αX(t) dt + σ dW(t),  t ≥ 0,

is given by X(t) = X(0)e^{αt} + ∫_0^t e^{α(t−s)}σ dW(s). Hence, the transition density is given by the normal density

p(x, y; t) = (2πσ²(2α)^{−1}(e^{2αt} − 1))^{−1/2} exp(−(y − xe^{αt})²/(σ²α^{−1}(e^{2αt} − 1))).

It can be easily checked that p solves the Kolmogorov equations

∂u/∂t (x, t) = (σ²/2) ∂²u/∂x² (x, t) + αx ∂u/∂x (x, t)

and

∂u/∂t (y, t) = (σ²/2) ∂²u/∂y² (y, t) − α (∂/∂y)(y u(y, t)).

For α = 0 and σ = 1 we obtain the Brownian motion transition density p(x, y; t) = (2πt)^{−1/2} exp(−(y − x)²/(2t)), which is the fundamental solution of the classical heat equation ∂u/∂t = (1/2) ∂²u/∂x² in both variables x and y.
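That p solves the backward equation can be checked numerically by finite differences: the sketch below evaluates the residual ∂_t p − ((σ²/2) ∂_{xx}p + αx ∂_x p) at one point (the parameter values and evaluation point are arbitrary illustrative choices):

```python
import math

# Finite-difference check that the Ornstein-Uhlenbeck transition density
# solves the backward Kolmogorov equation
#   du/dt = (sigma^2/2) d^2u/dx^2 + alpha x du/dx.
ALPHA, SIGMA = 0.5, 1.0

def p(x, y, t):
    """OU density: normal with mean x e^{alpha t}, variance sigma^2 (e^{2 alpha t}-1)/(2 alpha)."""
    var = SIGMA**2 * (math.exp(2 * ALPHA * t) - 1.0) / (2 * ALPHA)
    mean = x * math.exp(ALPHA * t)
    return math.exp(-(y - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

x, y, t, h = 0.3, 0.7, 1.0, 1e-4
dt  = (p(x, y, t + h) - p(x, y, t - h)) / (2 * h)          # central difference in t
dx  = (p(x + h, y, t) - p(x - h, y, t)) / (2 * h)          # central difference in x
dxx = (p(x + h, y, t) - 2 * p(x, y, t) + p(x - h, y, t)) / h**2

residual = dt - (0.5 * SIGMA**2 * dxx + ALPHA * x * dx)
print(residual)   # close to 0 up to discretisation error
```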

4.6 The Feynman-Kac formula

Chapter 5

Stochastic control: an outlook

In this chapter we briefly present one main approach for solving optimal control problems for dynamical systems described by stochastic differential equations: Bellman's principle of dynamic programming and the resulting Hamilton-Jacobi-Bellman equation.

For some T > s ≥ 0 and y ∈ R^d we consider the controlled stochastic differential equation

dX(t) = b(X(t), u(t), t) dt + σ(X(t), u(t), t) dW(t),  t ∈ [s, T],  X(s) = y,

where X is d-dimensional, W is an m-dimensional Brownian motion and the coefficients b : R^d × U × [0, T] → R^d, σ : R^d × U × [0, T] → R^{d×m} are regular, say Lipschitz continuous, in x and depend on the controls u(t), which take values in some abstract metric space U and are F_t-adapted. The goal is to choose the control u in such a way that a given cost functional

J(s, y; u) := E[∫_s^T f(X(t), u(t), t) dt + h(X(T))],

where f and h are certain continuous functions, is minimized.

5.0.1 Example. A standard example of stochastic control is to select a portfolio of assets which is in some sense optimal. Suppose a riskless asset S_0 like a bond grows at a constant rate r > 0 over time,

dS_0(t) = rS_0(t) dt,

while a risky asset S_1 like a stock follows the scalar diffusion equation of a geometric Brownian motion (Black-Scholes model)

dS_1(t) = S_1(t)(b dt + σ dW(t)).

Since this second asset is risky, it is natural to suppose b > r. At each time t the agent has the possibility to trade, that is, to decide the fraction u(t) of his wealth X(t)


which is invested in the risky asset S_1. Under this model we can derive the stochastic differential equation governing the dynamics of the agent's wealth:

dX(t) = u(t)X(t)(b dt + σ dW(t)) + (1 − u(t))X(t)r dt
= (r + (b − r)u(t))X(t) dt + σu(t)X(t) dW(t).

Note that necessarily u(t) ∈ [0, 1] has to hold for all t and we should choose U = [0, 1]. Suppose the investor wants to maximize his average utility at time T > 0, where the utility is usually assumed to be a concave function of the wealth. Then a mathematically tractable cost functional would for instance be

J(s, y; u) = −E_{s,y;u}[X(T)^α],  α ∈ (0, 1].

Note that the expectation depends of course on the initial wealth endowment X(s) = y and the chosen investment strategy u. The special linear form of the system automatically implies that the wealth process X cannot become negative and the cost functional is well-defined, but usually one has to treat this kind of restriction separately, either by specifying the set of admissible controls more precisely or by introducing a stopping time instead of the deterministic final time T.

We will not write down all assumptions on b, σ, f and h properly, but refer to (Yong and Zhou 1999, Section 4.3.1) for details, in particular the discussion about the requirement of having a weak or a strong solution of the SDE involved. Here we only want to stress the fact that we allow all controls u in a set of admissible controls U(s, T) ⊂ {u : [s, T] × Ω → U | u(t, •) is F_t^s-adapted}, where F_t^s is generated by the Brownian motion W in the period [s, t] and augmented by null sets.

The problem of optimal stochastic control can then be stated as follows: find for given (s, y) a control process ū ∈ U(s, T) such that

J(s, y; ū) = inf_{u∈U(s,T)} J(s, y; u).

The existence of an optimal control process is not always ensured, but in many cases it follows from the setup of the problem or by compactness arguments.

We can now state the main tool we want to use for solving this optimization problem. Conceptually, the idea is to study how the optimal cost changes over time and state. This means that we shall consider the so-called value function

V(s, y) := inf_{u∈U(s,T)} J(s, y; u),  (s, y) ∈ [0, T) × R^d,

with its natural extension V(T, y) = h(y).

5.0.2 Theorem. (Bellman's dynamic programming principle) Under certain regularity conditions we have for any (s, y) ∈ [0, T) × R^d and z ∈ [s, T]:

V(s, y) = inf_{u∈U(s,T)} E[∫_s^z f(X^{s,y,u}(t), u(t), t) dt + V(z, X^{s,y,u}(z))].


Proof. See Theorem 4.3.3 in Yong and Zhou (1999).

Intuitively, this principle asserts that a globally optimal control over the period [s, T] is also locally optimal on shorter periods [z, T]. In other words, we cannot improve upon a globally optimal control by optimising separately on smaller subintervals: if this were possible, we could simply patch together the locally better controls to obtain a globally better control.

The key point is that knowledge of the value function for all arguments also allows one to determine the optimal controls which have to be applied in order to attain the optimal cost. Therefore we have to study the equation for V in the Bellman principle more thoroughly. Since integral equations are more difficult to handle, we look for infinitesimal changes in s, which amounts to letting z ↓ s appropriately. Heuristically, we interchange limit and infimum in the following formal(!) calculations, which would have to be justified much more accurately:

0 = (z − s)^{−1} inf_{u∈U(s,T)} E[∫_s^z f(X^{s,y,u}(t), u(t), t) dt + V(z, X^{s,y,u}(z)) − V(s, y)]

then gives formally for z ↓ s

0 = inf_{u∈U(s,T)} (f(y, u(s), s) + (∂/∂t) E[V(t, X^{s,y,u}(t))]|_{t=s}),

which using the theory developed in the preceding chapter yields

0 = inf_{u∈U} (f(y, u, s) + ∂V/∂s (s, y) + A^{s,y,u} V(s, y)),

where we have denoted by A^{s,y,u} the infinitesimal generator associated with X^{s,y,u}:

A^{s,y,u} f(y) = (1/2) Σ_{i,j=1}^d (σσ^T(s, y, u))_{ij} (∂²f/∂y_i∂y_j)(y) + ⟨b(s, y, u), grad f(y)⟩.

In terms of the so-called Hamiltonian

H(s, y, u, p, P) := (1/2) trace(P(σσ^T)(s, y, u)) + ⟨b(s, y, u), p⟩ − f(s, y, u)

we arrive at the Hamilton-Jacobi-Bellman (HJB) equation

∂V/∂s = sup_{u∈U} H(s, y, u, −(∂V/∂y_i)_i, −(∂²V/∂y_i∂y_j)_{ij}),  (s, y) ∈ [0, T) × R^d.

Together with V(T, y) = h(y) we thus face a terminal value problem for a partial differential equation.

In general, the value function only solves the HJB equation in a weak sense, as a so-called viscosity solution.


In the sequel we assume that we have found the value function, e.g. via solving the HJB equation and proving uniqueness of the solution in a certain sense. Then the optimal control is given in feedback form u(t) = u*(X(t), t) with u* found by the maximizing property

H(s, y, u*(y, s), −(∂V/∂y_i)_i, −(∂²V/∂y_i∂y_j)_{ij}) = sup_{u∈U} H(s, y, u, −(∂V/∂y_i)_i, −(∂²V/∂y_i∂y_j)_{ij}).

For a correct mathematical statement we cite the standard classical verification theorem from (Yong and Zhou 1999, Thm. 5.5.1).

5.0.3 Theorem. Suppose W ∈ C^{1,2}([0, T] × R^d) solves the HJB equation together with its final value. Then

W(s, y) ≤ J(s, y; u)

holds for all controls u and all (s, y), that is, W is a lower bound for the value function. Furthermore, an admissible control u is optimal if and only if

∂V/∂t (t, X^{s,y,u}(t)) = H(t, X^{s,y,u}(t), u(t), −(∂V/∂y_i (t, X^{s,y,u}(t)))_i, −(∂²V/∂y_i∂y_j (t, X^{s,y,u}(t)))_{ij})

holds for t ∈ [s, T] almost surely.

Let us close this chapter by reconsidering the optimal investment example. The Hamiltonian in this case is given by

H(t, x, u, p, P) = (1/2)σ²u²x²P + (r + (b − r)u)xp,

such that the HJB equation reads

∂_t V(t, x) = sup_{u∈[0,1]} (−(1/2)σ²u²x² ∂_{xx}V(t, x) − (r + (b − r)u)x ∂_x V(t, x)).

Neglecting for a moment the restriction u ∈ [0, 1], we find the optimizing value u* in this equation by the first-order condition

σ²u*x² ∂_{xx}V(t, x) + (b − r)x ∂_x V(t, x) = 0,

leading to the more explicit HJB equation

∂_t V(t, x) = −(1/2)(b − r)²x²(∂_x V)²/(σ²x² ∂_{xx}V) − rx ∂_x V + (b − r)²x²(∂_x V)²/(σ²x² ∂_{xx}V)
= −rx ∂_x V + (1/2)(b − r)²(∂_x V)²/(σ² ∂_{xx}V).

Due to the good choice of the cost functional we find for α ∈ (0, 1) a solution satisfying the HJB equation and having the correct final value to be

V(t, x) = e^{λ(T−t)}x^α  with  λ = rα + (b − r)²α/(2σ²(1 − α)).


This yields the optimal feedback function

u*(x, t) = (b − r)/(σ²(1 − α)).

Hence, if u* ∈ [0, 1] is valid, we have found that the optimal strategy is just to keep a constant fraction of the wealth invested in the risky asset. Some special choices of the parameters make the optimal choice clearer: for b ↓ r we will not invest in the risky asset because it does not offer a higher average yield; for σ → ∞ the same phenomenon occurs due to the concavity of the utility function penalizing relative losses more than gains; for σ → 0 or α → 1 we do not run into high risk when investing in the stock and thus will do so (even with borrowing for u* > 1!).
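The optimal fraction can also be cross-checked without any PDE: for a constant fraction u the wealth is a geometric Brownian motion, so E[X(T)^α] = x^α exp(αT(r + (b − r)u − σ²u²(1 − α)/2)) in closed form, and maximising this expression over a grid of fractions should recover u* = (b − r)/(σ²(1 − α)). A sketch with arbitrary illustrative market parameters:

```python
import math

# Cross-check of the Merton fraction u* = (b - r)/(sigma^2 (1 - alpha)):
# for a constant fraction u the wealth is a geometric Brownian motion with
# E[X(T)^alpha] = x^alpha * exp(alpha T (r + (b-r)u - sigma^2 u^2 (1-alpha)/2)),
# so the expected utility can be maximised over a grid exactly.
r, b, sigma, alpha, T, x0 = 0.02, 0.07, 0.4, 0.5, 1.0, 1.0

def expected_utility(u):
    rate = r + (b - r) * u - 0.5 * sigma**2 * u**2 * (1.0 - alpha)
    return x0**alpha * math.exp(alpha * T * rate)

grid = [i / 1000.0 for i in range(1001)]      # fractions in [0, 1]
u_best = max(grid, key=expected_utility)

u_star = (b - r) / (sigma**2 * (1.0 - alpha))
print(u_best, u_star)
assert abs(u_best - u_star) < 1e-3
```

With these parameters u* = 0.05/(0.16 · 0.5) = 0.625 lies inside [0, 1], and the grid search recovers exactly this constant fraction.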

Bibliography

Durrett, R. (1996): Stochastic calculus. A practical introduction. Probability and Stochastics Series. Boca Raton, FL: CRC Press.

Feller, W. (1971): An introduction to probability theory and its applications. Vol. II. 2nd ed. Wiley Series in Probability and Mathematical Statistics. New York etc.: John Wiley and Sons.

Kallenberg, O. (2002): Foundations of modern probability. 2nd ed. Probability and Its Applications. New York: Springer.

Karatzas, I., and S. E. Shreve (1991): Brownian motion and stochastic calculus. 2nd ed. New York: Springer.

Kloeden, P. E., and E. Platen (1992): Numerical solution of stochastic differential equations. Applications of Mathematics 23. Berlin: Springer-Verlag.

Liptser, R. S., and A. N. Shiryaev (2001): Statistics of random processes. I: General theory. 2nd ed. Berlin: Springer.

Øksendal, B. (1998): Stochastic differential equations. An introduction with applications. 5th ed. Universitext. Berlin: Springer.

Revuz, D., and M. Yor (1999): Continuous martingales and Brownian motion. 3rd ed. Berlin: Springer.

Yong, J., and X. Y. Zhou (1999): Stochastic controls. Hamiltonian systems and HJB equations. Applications of Mathematics 43. New York, NY: Springer.