APPLIED STOCHASTIC PROCESSES

G.A. Pavliotis
Department of Mathematics
Imperial College London
London SW7 2AZ, UK

January 18, 2009

APPLIED STOCHASTIC PROCESSES - Imperial College …pavl/lecture_notesM4A42.pdf · APPLIED STOCHASTIC PROCESSES ... 5.2 The Backward and Forward Kolmogorov Equations ... 6.9 Brownian


Contents

1 Introduction
  1.1 Historical Overview
  1.2 The One-Dimensional Random Walk

2 Elements of Probability Theory
  2.1 Basic Definitions
  2.2 Random Variables

3 Basics of the theory of Stochastic Processes
  3.1 Definition of a Stochastic Process
  3.2 Stationary Processes
    3.2.1 Strictly Stationary Processes
    3.2.2 Second Order Stationary Processes
    3.2.3 Ergodic Properties of Stationary Processes
  3.3 Brownian Motion
  3.4 Other Examples of Stochastic Processes
    3.4.1 Brownian Bridge
    3.4.2 Fractional Brownian Motion
    3.4.3 The Poisson Process
  3.5 The Karhunen-Loeve Expansion
  3.6 The Path Space
  3.7 Discussion and Bibliography

4 Markov Processes
  4.1 Introduction and Examples
  4.2 Definition of a Markov Process
  4.3 The Chapman-Kolmogorov Equation
  4.4 The Generator of a Markov Process
  4.5 The Adjoint Semigroup
  4.6 Ergodic Markov Processes
  4.7 Stationary Markov Processes

5 Diffusion Processes
  5.1 Definition of a Diffusion Process
  5.2 The Backward and Forward Kolmogorov Equations
    5.2.1 The Backward Kolmogorov Equation
    5.2.2 The Forward Kolmogorov Equation
  5.3 The Fokker-Planck Equation in Arbitrary Dimensions
  5.4 Connection with Stochastic Differential Equations
  5.5 Discussion and Bibliography

6 The Fokker-Planck Equation
  6.1 Basic Properties of the FP Equation
    6.1.1 Existence and Uniqueness of Solutions
    6.1.2 The FP Equation as a Conservation Law
    6.1.3 Boundary Conditions for the Fokker-Planck Equation
  6.2 Examples of Diffusion Processes
    6.2.1 Brownian Motion
    6.2.2 The Ornstein-Uhlenbeck Process
    6.2.3 The Geometric Brownian Motion
  6.3 The OU Process and Hermite Polynomials
  6.4 Gradient Flows
  6.5 Eigenfunction Expansions
  6.6 Self-adjointness
  6.7 Reduction to a Schrodinger Equation
  6.8 The Klein-Kramers-Chandrasekhar Equation
  6.9 Brownian Motion in a Harmonic Potential
  6.10 Discussion and Bibliography

7 Stochastic Differential Equations
  7.1 Introduction
  7.2 The Ito Stochastic Integral
    7.2.1 The Stratonovich Stochastic Integral
  7.3 Existence and Uniqueness of Solutions for SDEs
  7.4 Derivation of the Stratonovich SDE
  7.5 Ito versus Stratonovich
  7.6 The Generator
  7.7 The Ito Formula
  7.8 Examples of SDEs
    7.8.1 The Stochastic Landau Equation
  7.9 The Backward Kolmogorov Equation
    7.9.1 Derivation of the Fokker-Planck Equation

8 The Smoluchowski and Freidlin-Wentzell Limits
  8.1 Asymptotics for the Langevin Equation
  8.2 The Kramers to Smoluchowski Limit
  8.3 The Freidlin-Wentzell Limit

9 Exit Time Problems and Reaction Rate Theory
  9.1 Introduction
  9.2 Kramers' Theory
  9.3 The Mean First Passage Time Approach
    9.3.1 The Boundary Value Problem for the MFPT
  9.4 Escape from a Potential Barrier
    9.4.1 Calculation of the Reaction Rate in the Smoluchowski Regime
    9.4.2 The Intermediate Regime: γ = O(1)
    9.4.3 Calculation of the Reaction Rate in the Energy-Diffusion-Limited Regime
  9.5 Extension to Higher Dimensions
  9.6 Discussion and Bibliography

Chapter 1

Introduction

1.1 Historical Overview

• The theory of stochastic processes started with Einstein's work on the theory of Brownian motion: Concerning the motion, as required by the molecular-kinetic theory of heat, of particles suspended in liquids at rest (1905).

  – Explanation of Brown's observation (1827): when suspended in water, small pollen grains are found to be in a very animated and irregular state of motion.

  – Einstein's theory is based on:

    ∗ A Markov chain model for the motion of the particle (molecule, pollen grain...).

    ∗ The idea that it makes more sense to talk about the probability of finding the particle at position x at time t, rather than about individual trajectories.

• In his work many of the main aspects of the modern theory of stochastic processes can be found:

  – The assumption of Markovianity (no memory) expressed through the Chapman-Kolmogorov equation.

  – The Fokker-Planck equation (in this case, the diffusion equation).

  – The derivation of the Fokker-Planck equation from the master (Chapman-Kolmogorov) equation through a Kramers-Moyal expansion.

  – The calculation of a transport coefficient (the diffusion coefficient) using macroscopic (kinetic theory-based) considerations:

$$D = \frac{k_B T}{6 \pi \eta a}.$$

  – Here k_B is Boltzmann's constant, T is the temperature, η is the viscosity of the fluid and a is the radius of the particle.

• Einstein's theory is based on the Fokker-Planck equation. Langevin (1908) developed a theory based on a stochastic differential equation. The equation of motion for a Brownian particle is

$$m \frac{d^2 x}{dt^2} = -6 \pi \eta a \frac{dx}{dt} + \xi,$$

where ξ is a random force.

• There is complete agreement between Einstein’s theory and Langevin’s theory.

• The theory of Brownian motion was developed independently by Smoluchowski.

• The approaches of Langevin and Einstein represent the two main approaches in the theory of stochastic processes:

  – Study individual trajectories of Brownian particles. Their evolution is governed by a stochastic differential equation:

$$\frac{dX}{dt} = F(X) + \Sigma(X)\,\xi(t),$$

  – where ξ(t) is a random force.

• Study the probability ρ(x, t) of finding a particle at position x at time t. This probability distribution satisfies the Fokker-Planck equation:

$$\frac{\partial \rho}{\partial t} = -\nabla \cdot \big(F(x)\rho\big) + \frac{1}{2} \nabla \nabla : \big(A(x)\rho\big),$$

where A(x) = Σ(x)Σ(x)^T.

• The theory of stochastic processes was developed during the 20th century:

  – Physics:

    ∗ Smoluchowski.
    ∗ Planck (1917).
    ∗ Klein (1922).
    ∗ Ornstein and Uhlenbeck (1930).
    ∗ Kramers (1940).
    ∗ Chandrasekhar (1943).
    ∗ . . .

  – Mathematics:

    ∗ Wiener (1922).
    ∗ Kolmogorov (1931).
    ∗ Ito (1940's).
    ∗ Doob (1940's and 1950's).
    ∗ . . .


1.2 The One-Dimensional Random Walk

We let time be discrete, i.e. t = 0, 1, . . . . Consider the following stochastic process S_n:

• S_0 = 0;

• at each time step it moves to ±1 with equal probability 1/2.

In other words, at each time step we flip a fair coin. If the outcome is heads, we move one unit to the right. If the outcome is tails, we move one unit to the left.

Alternatively, we can think of the random walk as a sum of independent random variables:

$$S_n = \sum_{j=1}^{n} X_j,$$

where X_j ∈ {−1, 1} with P(X_j = ±1) = 1/2.

We can simulate the random walk on a computer:

• We need a (pseudo)random number generator to generate n independent random variables which are uniformly distributed in the interval [0, 1].

• If the value of the random variable is > 1/2 then the particle moves to the left, otherwise it moves to the right.

• We then take the sum of all these random moves.

• The sequence {S_n}_{n=1}^N indexed by the discrete time T = {1, 2, . . . , N} is the path of the random walk. We use a linear interpolation (i.e. connect the points (n, S_n) by straight lines) to generate a continuous path.

• Every path of the random walk is different: it depends on the outcome of a sequence of independent random experiments.

• We can compute statistics by generating a large number of paths and computing averages. For example, E(S_n) = 0, E(S_n²) = n.

• The paths of the random walk (without the linear interpolation) are not continuous: the random walk has a jump of size 1 at each time step.

• This is an example of a discrete time, discrete space stochastic process.

• The random walk is a time-homogeneous Markov process.

• If we take a large number of steps, the random walk starts looking like a continuous time process with continuous paths.
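The simulation recipe above can be sketched in a few lines of Python (an illustrative sketch; the function and variable names are mine, not from the notes):

```python
import random

def random_walk(n_steps, rng):
    """One path S_0, S_1, ..., S_n of the symmetric simple random walk."""
    path = [0]
    for _ in range(n_steps):
        # a uniform draw > 1/2 sends the particle one unit to the left,
        # otherwise one unit to the right
        step = -1 if rng.random() > 0.5 else 1
        path.append(path[-1] + step)
    return path

# Estimate E(S_n) = 0 and E(S_n^2) = n by averaging over many paths.
rng = random.Random(42)
n, n_paths = 50, 20000
endpoints = [random_walk(n, rng)[-1] for _ in range(n_paths)]
mean_est = sum(endpoints) / n_paths
second_moment_est = sum(s * s for s in endpoints) / n_paths
```

With 20000 paths, `mean_est` should be close to 0 and `second_moment_est` close to n = 50, in line with E(S_n) = 0 and E(S_n²) = n.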


Figure 1.1: Three paths of the random walk of length N = 50.

Figure 1.2: Three paths of the random walk of length N = 1000.


Figure 1.3: Space-time rescaled random walks (first 50, 100, 200, 1000, 5000 and 25000 steps).

• Consider the sequence of continuous time stochastic processes

$$Z_t^n := \frac{1}{\sqrt{n}} S_{nt}.$$

• In the limit as n → ∞, the sequence {Z_t^n} converges (in some appropriate sense) to a Brownian motion with diffusion coefficient D = Δx²/(2Δt) = 1/2.

• Brownian motion W(t) is a continuous time stochastic process with continuous paths that starts at 0 (W(0) = 0) and has independent, normally distributed (Gaussian) increments.

• We can simulate Brownian motion on a computer using a random number generator that generates normally distributed, independent random variables.
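A minimal sketch of that simulation: a Brownian path is sampled exactly on a grid by summing independent Gaussian increments of variance Δt (names are mine; this uses the standard normalization E W(t)² = t):

```python
import random

def brownian_path(n_steps, t_final, rng):
    """Sample W(t) on [0, t_final]: W(0) = 0, independent N(0, dt) increments."""
    dt = t_final / n_steps
    w = [0.0]
    for _ in range(n_steps):
        w.append(w[-1] + rng.gauss(0.0, dt ** 0.5))
    return w

# For standard Brownian motion, E W(1) = 0 and E W(1)^2 = 1.
rng = random.Random(1)
endpoints = [brownian_path(200, 1.0, rng)[-1] for _ in range(5000)]
mean_w1 = sum(endpoints) / len(endpoints)
var_w1 = sum(w * w for w in endpoints) / len(endpoints)
```

Averaging over 5000 paths, `mean_w1` should be near 0 and `var_w1` near 1, up to sampling error.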

Why introduce randomness in the description of physical systems?

• To describe outcomes of a repeated set of experiments. Think of tossing a coin repeatedly or of throwing a die.

• To describe a deterministic process for which we have incomplete information. Think of an ODE for which we don't know exactly the initial conditions (weather prediction).

• To describe a complicated deterministic system with many degrees of freedom using a simpler, low dimensional stochastic system. Think of the physical model for Brownian motion (a heavy particle colliding with many small particles).

• To describe a system that is inherently random. Think of quantum mechanics.


Figure 1.4: Sample Brownian paths U(t) (five individual paths and the mean of 1000 paths).


Chapter 2

Elements of Probability Theory

2.1 Basic Definitions

Definition 2.1.1. The set of all possible outcomes of an experiment is called the sample space and is denoted by Ω.

Example 2.1.2.

• The possible outcomes of the experiment of tossing a coin are H and T. The sample space is Ω = {H, T}.

• The possible outcomes of the experiment of throwing a die are 1, 2, 3, 4, 5 and 6. The sample space is Ω = {1, 2, 3, 4, 5, 6}.

• Events are subsets of the sample space.

• We would like the unions and intersections of events to also be events.

Definition 2.1.3. A collection F of subsets of Ω is called a field on Ω if

i. ∅ ∈ F;

ii. if A ∈ F then A^c ∈ F;

iii. if A, B ∈ F then A ∪ B ∈ F.

From the definition of a field we immediately deduce that F is closed under finite unions and finite intersections:

$$A_1, \dots, A_n \in \mathcal{F} \;\Rightarrow\; \cup_{i=1}^n A_i \in \mathcal{F}, \quad \cap_{i=1}^n A_i \in \mathcal{F}.$$

When Ω is infinite the above definition is not appropriate, since we need to consider countable unions of events.

Definition 2.1.4. A collection F of subsets of Ω is called a σ-field or σ-algebra on Ω if

i. ∅ ∈ F;

ii. if A ∈ F then A^c ∈ F;

iii. if A_1, A_2, · · · ∈ F then ∪_{i=1}^∞ A_i ∈ F.

A σ-algebra is closed under the operation of taking countable intersections.

Example 2.1.5.

• F = {∅, Ω}.

• F = {∅, A, A^c, Ω} where A is a subset of Ω.

• The power set of Ω, denoted by {0, 1}^Ω, which contains all subsets of Ω.

A sub-σ-algebra is a collection of subsets of a σ-algebra which satisfies the axioms of a σ-algebra.

Definition 2.1.6. A probability measure P on the measurable space (Ω, F) is a function P : F → [0, 1] satisfying

i. P(∅) = 0, P(Ω) = 1;

ii. for A_1, A_2, . . . with A_i ∩ A_j = ∅, i ≠ j,

$$\mathbb{P}(\cup_{i=1}^{\infty} A_i) = \sum_{i=1}^{\infty} \mathbb{P}(A_i).$$

Definition 2.1.7. The triple (Ω, F, P), comprising a set Ω, a σ-algebra F of subsets of Ω and a probability measure P on (Ω, F), is called a probability space.

• Assume that E|X| < ∞ and let G be a sub-σ-algebra of F. The conditional expectation of X with respect to G is defined to be the function E[X|G] : Ω → E which is G-measurable and satisfies

$$\int_G \mathbb{E}[X|\mathcal{G}] \, d\mu = \int_G X \, d\mu \quad \forall \, G \in \mathcal{G}.$$

• We can define E[f(X)|G] and the conditional probability P[X ∈ F|G] = E[I_F(X)|G], where I_F is the indicator function of F, in a similar manner.

2.2 Random Variables

Definition 2.2.1. Let (Ω, F) and (E, G) be two measurable spaces. A function X : Ω → E such that the event

$$\{\omega \in \Omega : X(\omega) \in A\} =: \{X \in A\} \quad (2.1)$$

belongs to F for arbitrary A ∈ G is called a measurable function or random variable.

When E is R equipped with its Borel σ-algebra, then (2.1) can be replaced with

$$\{X \leq x\} \in \mathcal{F} \quad \forall x \in \mathbb{R}.$$


• Let X be a random variable (measurable function) from (Ω, F, µ) to (E, G). If E is a metric space then we may define the expectation with respect to the measure µ by

$$\mathbb{E}[X] = \int_{\Omega} X(\omega) \, d\mu(\omega).$$

• More generally, let f : E → R be G-measurable. Then,

$$\mathbb{E}[f(X)] = \int_{\Omega} f(X(\omega)) \, d\mu(\omega).$$

• Let U be a topological space. We will use the notation B(U) to denote the Borel σ-algebra of U: the smallest σ-algebra containing all open sets of U. Every random variable from a probability space (Ω, F, µ) to a measurable space (E, B(E)) induces a probability measure on E:

$$\mu_X(B) = \mathbb{P} X^{-1}(B) = \mu(\omega \in \Omega : X(\omega) \in B), \quad B \in \mathcal{B}(E).$$

The measure µ_X is called the distribution (or sometimes the law) of X.

Example 2.2.2. Let I denote a subset of the positive integers. A vector ρ_0 = {ρ_{0,i}, i ∈ I} is a distribution on I if it has nonnegative entries and its total mass equals 1: Σ_{i∈I} ρ_{0,i} = 1.

In the case where E = R, the distribution function of the random variable X is the function F_X : R → [0, 1] given by

$$F_X(x) = \mathbb{P}(X \leq x). \quad (2.2)$$

In this case, (R, B(R), F_X) becomes a probability space.

The distribution function F(x) of a random variable has the properties that lim_{x→−∞} F(x) = 0, lim_{x→+∞} F(x) = 1, and it is right continuous.

Definition 2.2.3. A random variable X with values on R is called discrete if it takes values in some countable subset {x_0, x_1, x_2, . . . } of R, i.e. P(X = x) ≠ 0 only for x = x_0, x_1, . . . .

With a random variable we can associate the probability mass function p_k = P(X = x_k). We will consider nonnegative integer valued discrete random variables. In this case p_k = P(X = k), k = 0, 1, 2, . . . .

Example 2.2.4. The Poisson random variable is the nonnegative integer valued random variable with probability mass function

$$p_k = \mathbb{P}(X = k) = \frac{\lambda^k}{k!} e^{-\lambda}, \quad k = 0, 1, 2, \dots,$$

where λ > 0.
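A quick numerical check of this probability mass function (a sketch; the helper name is mine): the probabilities sum to 1, and the mean of a Poisson(λ) variable is λ.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """p_k = P(X = k) = (lambda^k / k!) e^{-lambda}."""
    return lam ** k * exp(-lam) / factorial(k)

lam = 3.0
# Truncating the sum at k = 100 is harmless: the tail mass is negligible.
total_mass = sum(poisson_pmf(k, lam) for k in range(100))
mean_value = sum(k * poisson_pmf(k, lam) for k in range(100))
```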

Example 2.2.5. The binomial random variable is the nonnegative integer valued random variable with probability mass function

$$p_k = \mathbb{P}(X = k) = \frac{N!}{k!(N-k)!} p^k q^{N-k}, \quad k = 0, 1, 2, \dots, N,$$

where p ∈ (0, 1), q = 1 − p.


Definition 2.2.6. A random variable X with values on R is called continuous if P(X = x) = 0 ∀x ∈ R.

Let X be a random variable from (Ω, F, µ) to (R^d, B(R^d)). The (joint) distribution function F_X : R^d → [0, 1] is defined as F_X(x) = P(X ≤ x).

• We can use the distribution of a random variable to compute expectations and probabilities:

$$\mathbb{E}[f(X)] = \int_E f(x) \, d\mu_X(x)$$

and

$$\mathbb{P}[X \in G] = \int_G d\mu_X(x), \quad G \in \mathcal{B}(E).$$

• When E = R^d and we can write dµ_X(x) = ρ(x) dx, then we refer to ρ(x) as the probability density function (pdf), or density with respect to Lebesgue measure, for X.

• When E = R^d then by L^p(Ω; R^d), or sometimes L^p(Ω; µ) or even simply L^p(µ), we mean the Banach space of measurable functions on Ω with norm

$$\|X\|_{L^p} = \left( \mathbb{E}|X|^p \right)^{1/p}.$$

• Let X be a nonnegative integer valued random variable with probability mass function p_k. We can compute the expectation of an arbitrary function of X using the formula

$$\mathbb{E}(f(X)) = \sum_{k=0}^{\infty} f(k) \, p_k.$$

Example 2.2.7. • Consider the random variable X : Ω → R with pdf

$$\gamma_{\sigma,m}(x) := (2\pi\sigma)^{-\frac{1}{2}} \exp\left( -\frac{(x-m)^2}{2\sigma} \right).$$

Such an X is termed a Gaussian or normal random variable. The mean is

$$\mathbb{E}X = \int_{\mathbb{R}} x \, \gamma_{\sigma,m}(x) \, dx = m$$

and the variance is

$$\mathbb{E}(X - m)^2 = \int_{\mathbb{R}} (x - m)^2 \, \gamma_{\sigma,m}(x) \, dx = \sigma.$$

• Let m ∈ R^d and Σ ∈ R^{d×d} be symmetric and positive definite. The random variable X : Ω → R^d with pdf

$$\gamma_{\Sigma,m}(x) := \left( (2\pi)^d \det \Sigma \right)^{-\frac{1}{2}} \exp\left( -\frac{1}{2} \langle \Sigma^{-1}(x-m), (x-m) \rangle \right)$$

is termed a multivariate Gaussian or normal random variable. The mean is

$$\mathbb{E}(X) = m \quad (2.3)$$

and the covariance matrix is

$$\mathbb{E}\left( (X - m) \otimes (X - m) \right) = \Sigma. \quad (2.4)$$


• Since the mean and variance specify completely a Gaussian random variable on R, the Gaussian is commonly denoted by N(m, σ). The standard normal random variable is N(0, 1).

• Since the mean and covariance matrix completely specify a Gaussian random variable on R^d, the Gaussian is commonly denoted by N(m, Σ).
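Note that in this convention σ denotes the variance, not the standard deviation. A crude quadrature check of the normalization, mean and variance of γ_{σ,m} (a sketch; the grid and names are my own choices):

```python
from math import exp, pi

def gaussian_pdf(x, m, sigma):
    """gamma_{sigma,m}(x) = (2*pi*sigma)^{-1/2} exp(-(x-m)^2/(2*sigma)),
    with sigma the *variance*, following the notes' convention."""
    return (2.0 * pi * sigma) ** -0.5 * exp(-((x - m) ** 2) / (2.0 * sigma))

# Riemann-sum check: total mass 1, mean m, variance sigma.
m, sigma, h = 1.0, 2.0, 0.001
xs = [-20.0 + i * h for i in range(40001)]
mass = sum(gaussian_pdf(x, m, sigma) for x in xs) * h
mean = sum(x * gaussian_pdf(x, m, sigma) for x in xs) * h
variance = sum((x - m) ** 2 * gaussian_pdf(x, m, sigma) for x in xs) * h
```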

Example 2.2.8. An exponential random variable T : Ω → R^+ with rate λ > 0 satisfies

$$\mathbb{P}(T > t) = e^{-\lambda t}, \quad \forall t \geq 0.$$

We write T ∼ exp(λ). The related pdf is

$$f_T(t) = \begin{cases} \lambda e^{-\lambda t}, & t \geq 0, \\ 0, & t < 0. \end{cases} \quad (2.5)$$

Notice that

$$\mathbb{E}T = \int_{-\infty}^{\infty} t f_T(t) \, dt = \frac{1}{\lambda} \int_0^{\infty} (\lambda t) e^{-\lambda t} \, d(\lambda t) = \frac{1}{\lambda}.$$

If the times τ_n = t_{n+1} − t_n are i.i.d. random variables with τ_0 ∼ exp(λ) then, for t_0 = 0,

$$t_n = \sum_{k=0}^{n-1} \tau_k,$$

and it is possible to show that

$$\mathbb{P}(0 \leq t_k \leq t < t_{k+1}) = \frac{e^{-\lambda t} (\lambda t)^k}{k!}. \quad (2.6)$$
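The statement (2.6), that the number of i.i.d. exp(λ) interarrival times falling in [0, t] is Poisson distributed with parameter λt, can be checked by simulation (a sketch; the helper names are mine):

```python
import random
from math import exp, factorial, log

def exp_sample(lam, rng):
    """Inverse-transform sampling: -log(U)/lam is exp(lam)-distributed."""
    return -log(1.0 - rng.random()) / lam  # 1 - U avoids log(0)

def count_arrivals(lam, t, rng):
    """Number of partial sums t_1 < t_2 < ... of exp(lam) gaps inside [0, t]."""
    s, k = 0.0, 0
    while True:
        s += exp_sample(lam, rng)
        if s > t:
            return k
        k += 1

rng = random.Random(7)
lam, t, n_trials = 2.0, 1.5, 40000
counts = [count_arrivals(lam, t, rng) for _ in range(n_trials)]
# Empirical P(N_t = k) versus e^{-lam*t} (lam*t)^k / k!, cf. (2.6).
empirical = [counts.count(k) / n_trials for k in range(5)]
poisson = [exp(-lam * t) * (lam * t) ** k / factorial(k) for k in range(5)]
```

The two lists should agree to within sampling error, and the average count should be near λt = 3.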


Chapter 3

Basics of the theory of Stochastic Processes

3.1 Definition of a Stochastic Process

• Let T be an ordered set. A stochastic process is a collection of random variables X = {X_t; t ∈ T} where, for each fixed t ∈ T, X_t is a random variable from (Ω, F) to (E, G).

• The measurable space (Ω, F) is called the sample space. The space (E, G) is called the state space.

• In this course we will take the set T to be [0, +∞).

• The state space E will usually be R^d equipped with the σ-algebra of Borel sets.

• A stochastic process X may be viewed as a function of both t ∈ T and ω ∈ Ω. We will sometimes write X(t), X(t, ω) or X_t(ω) instead of X_t. For a fixed sample point ω ∈ Ω, the function X_t(ω) : T → E is called a sample path (realization, trajectory) of the process X.

• The finite dimensional distributions (fdd) of a stochastic process are the distributions of the E^k-valued random variables (X(t_1), X(t_2), . . . , X(t_k)) for arbitrary positive integer k and arbitrary times t_i ∈ T, i ∈ {1, . . . , k}:

$$F(x) = \mathbb{P}(X(t_i) \leq x_i, \; i = 1, \dots, k)$$

with x = (x_1, . . . , x_k).

• We will say that two processes X_t and Y_t are equivalent if they have the same finite dimensional distributions.

• From experiments or numerical simulations we can only obtain information about the fdd of a process.

Definition 3.1.1. A Gaussian process is a stochastic process for which E = R^d and all the finite dimensional distributions are Gaussian:

$$F(x) = \mathbb{P}(X(t_i) \leq x_i, \; i = 1, \dots, k) = (2\pi)^{-k/2} (\det K_k)^{-1/2} \exp\left[ -\frac{1}{2} \langle K_k^{-1}(x - \mu_k), \, x - \mu_k \rangle \right],$$

for some vector µ_k and a symmetric positive definite matrix K_k.

• A Gaussian process x(t) is characterized by its mean

$$m(t) := \mathbb{E} x(t)$$

and the covariance function

$$C(t, s) = \mathbb{E}\left( \big(x(t) - m(t)\big) \otimes \big(x(s) - m(s)\big) \right).$$

• Thus, the first two moments of a Gaussian process are sufficient for a complete characterization of the process.

3.2 Stationary Processes

3.2.1 Strictly Stationary Processes

Definition 3.2.1. A stochastic process is called (strictly) stationary if all finite dimensional distributions are invariant under time translation: for any integer k and times t_i ∈ T, the distribution of (X(t_1), X(t_2), . . . , X(t_k)) is equal to that of (X(s+t_1), X(s+t_2), . . . , X(s+t_k)) for any s such that s + t_i ∈ T for all i ∈ {1, . . . , k}. In other words,

$$\mathbb{P}(X_{t_1+t} \in A_1, X_{t_2+t} \in A_2, \dots, X_{t_k+t} \in A_k) = \mathbb{P}(X_{t_1} \in A_1, X_{t_2} \in A_2, \dots, X_{t_k} \in A_k), \quad \forall t \in T.$$

• Let X_t be a strictly stationary stochastic process with finite second moment (i.e. X_t ∈ L²). The definition of strict stationarity implies that EX_t = µ, a constant, and E((X_t − µ)(X_s − µ)) = C(t − s). Hence, a strictly stationary process with finite second moment is also stationary in the wide sense. The converse is not true.

Example 3.2.2. 1. A sequence Y_0, Y_1, . . . of independent, identically distributed random variables with mean µ and variance σ² is a stationary process with R(k) = σ²δ_{k0}, σ² = E(Y_k)². Let Z_j = Y_j, j = 0, 1, . . . . We have

$$\frac{1}{N} \sum_{j=0}^{N-1} Z_j = \frac{1}{N} \sum_{j=0}^{N-1} Y_j \to \mathbb{E}Y_0 = \mu,$$

by the strong law of large numbers. Hence, we can calculate the mean (as well as higher order moments) using a single sample path of our stochastic process, provided that it is long enough (N ≫ 1). This is an example of an ergodic stationary process.

2. Let Z be a single random variable with mean µ and variance σ² and set Z_j = Z, j = 0, 1, 2, . . . . Then the sequence Z_0, Z_1, Z_2, . . . is a stationary sequence with R(k) = σ². In this case we have that

$$\frac{1}{N} \sum_{j=0}^{N-1} Z_j = \frac{1}{N} \sum_{j=0}^{N-1} Z = Z,$$

which is independent of N and does not converge to the mean of the stochastic process Z_n. This is an example of a non-ergodic process.
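The two cases can be illustrated numerically (a sketch using Gaussian samples for concreteness; the example above only assumes mean µ and variance σ²):

```python
import random

rng = random.Random(0)
N, mu, sigma = 200000, 2.0, 1.0

# Case 1 (ergodic): Z_j = Y_j with Y_j i.i.d. -- the time average tends to mu.
time_avg_iid = sum(rng.gauss(mu, sigma) for _ in range(N)) / N

# Case 2 (non-ergodic): Z_j = Z for a single fixed draw Z -- the time average
# equals Z for every N and does not converge to mu.
z = rng.gauss(mu, sigma)
time_avg_frozen = sum(z for _ in range(N)) / N
```

For large N, `time_avg_iid` is close to µ = 2, while `time_avg_frozen` simply reproduces the single draw `z`.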


We will see in the next section that, for stationary processes, ergodicity is related to fast decay of correlations. In the first of the examples above, there was no correlation between our stochastic process at different times. On the contrary, in our second example there was very strong correlation between the stochastic process at different times (in fact the stochastic process is given by the same random variable at all times).

3.2.2 Second Order Stationary Processes

Let (Ω, F, P) be a probability space. Let X_t, t ∈ T (with T = R or Z) be a real-valued random process on this probability space with finite second moment, E|X_t|² < +∞ (i.e. X_t ∈ L²).

Definition 3.2.3. A stochastic process X_t ∈ L² is called second-order stationary or wide-sense stationary if the first moment EX_t is a constant and the covariance function E(X_t − µ)(X_s − µ) depends only on the difference t − s:

$$\mathbb{E}X_t = \mu, \quad \mathbb{E}((X_t - \mu)(X_s - \mu)) = C(t - s).$$

The constant µ is the expectation of the process X_t. Without loss of generality we can set µ = 0, since if EX_t = µ then the process Y_t = X_t − µ is mean zero. A mean zero process will be called a centered process. The function C(t) is the covariance or the autocorrelation function of X_t. Notice that C(t) = E(X_t X_0), whereas C(0) = E(X_t²), which is finite by assumption. Since we have assumed that X_t is a real valued process, we have that C(t) = C(−t), t ∈ R.

Remark 3.2.4. The first two moments of a Gaussian process are sufficient for a complete characterization of the process. A corollary of this is that a second order stationary Gaussian process is also a (strictly) stationary process.

Continuity properties of the covariance function are equivalent to continuity properties of the paths of X_t in the L² sense, i.e.

$$\lim_{h \to 0} \mathbb{E}|X_{t+h} - X_t|^2 = 0.$$

Lemma 3.2.5. Assume that the covariance function C(t) of a second order stationary process is continuous at t = 0. Then it is continuous for all t ∈ R. Furthermore, the continuity of C(t) is equivalent to the continuity of the process X_t in the L²-sense.

Proof. Fix t ∈ R. Using the Cauchy-Schwarz inequality, we calculate:

$$|C(t+h) - C(t)|^2 = |\mathbb{E}(X_{t+h}X_0) - \mathbb{E}(X_t X_0)|^2 = |\mathbb{E}\big((X_{t+h} - X_t)X_0\big)|^2$$
$$\leq \mathbb{E}(X_0)^2 \, \mathbb{E}(X_{t+h} - X_t)^2 = C(0)\big(\mathbb{E}X_{t+h}^2 + \mathbb{E}X_t^2 - 2\mathbb{E}X_t X_{t+h}\big) = 2C(0)\big(C(0) - C(h)\big) \to 0,$$

as h → 0. Assume now that C(t) is continuous. From the above calculation we have

$$\mathbb{E}|X_{t+h} - X_t|^2 = 2\big(C(0) - C(h)\big), \quad (3.1)$$

which converges to 0 as h → 0. Conversely, assume that X_t is L²-continuous. Then, from the above equation we get lim_{h→0} C(h) = C(0).

Notice that from (3.1) we immediately conclude that C(0) ≥ C(h), h ∈ R.

The Fourier transform of the covariance function of a second order stationary process always exists. This enables us to study second order stationary processes using tools from Fourier analysis. To make the link between second order stationary processes and Fourier analysis we will use Bochner's theorem, which applies to all nonnegative definite functions.

Definition 3.2.6. A function f(x) : R → R is called nonnegative definite if

$$\sum_{i,j=1}^{n} f(t_i - t_j) \, c_i \bar{c}_j \geq 0 \quad (3.2)$$

for all n ∈ N, t_1, . . . , t_n ∈ R, c_1, . . . , c_n ∈ C.

Lemma 3.2.7. The covariance function of a second order stationary process is a nonnegative definite function.

Proof. We will use the notation X_t^c := Σ_{i=1}^n X_{t_i} c_i. We have

$$\sum_{i,j=1}^{n} C(t_i - t_j) c_i \bar{c}_j = \sum_{i,j=1}^{n} \mathbb{E} X_{t_i} X_{t_j} c_i \bar{c}_j = \mathbb{E}\left( \sum_{i=1}^{n} X_{t_i} c_i \sum_{j=1}^{n} X_{t_j} \bar{c}_j \right) = \mathbb{E}\big( X_t^c \, \overline{X_t^c} \big) = \mathbb{E}|X_t^c|^2 \geq 0.$$

Theorem 3.2.8. (Bochner) There is a one-to-one correspondence between the set of continuous nonnegative definite functions and the set of finite measures on the Borel $\sigma$-algebra of $\mathbb{R}$: if $\rho$ is a finite measure, then
\[
C(t) = \int_{\mathbb{R}} e^{ixt} \, \rho(dx) \quad (3.3)
\]
is nonnegative definite. Conversely, any continuous nonnegative definite function can be represented in the form (3.3).

Definition 3.2.9. Let $X_t$ be a second order stationary process with covariance $C(t)$ whose Fourier transform is the measure $\rho(dx)$. The measure $\rho(dx)$ is called the spectral measure of the process $X_t$.

In the following we will assume that the spectral measure is absolutely continuous with respect to the Lebesgue measure on $\mathbb{R}$ with density $f(x)$, i.e. $\rho(dx) = f(x)\,dx$. The Fourier transform $f(x)$ of the covariance function is called the spectral density of the process:
\[
f(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-itx} C(t) \, dt.
\]


From (3.3) it follows that the covariance function of a mean zero, second order stationary process is given by the inverse Fourier transform of the spectral density:
\[
C(t) = \int_{-\infty}^{\infty} e^{itx} f(x) \, dx.
\]
There are various cases where the experimentally measured quantity is the spectral density (or power spectrum) of the stochastic process.

Remark 3.2.10. The correlation function of a second order stationary process enables us to associate a time scale to $X_t$, the correlation time $\tau_{cor}$:
\[
\tau_{cor} = \frac{1}{C(0)} \int_0^{\infty} C(\tau) \, d\tau = \int_0^{\infty} \mathbb{E}(X_\tau X_0)/\mathbb{E}(X_0^2) \, d\tau.
\]
The slower the decay of the correlation function, the larger the correlation time is. Of course, we have to assume sufficiently fast decay of correlations so that the correlation time is finite.

Example 3.2.11. Consider the mean zero, second order stationary process with covariance function
\[
R(t) = \frac{D}{\alpha} e^{-\alpha |t|}. \quad (3.4)
\]
The spectral density of this process is:
\[
f(x) = \frac{1}{2\pi} \frac{D}{\alpha} \int_{-\infty}^{+\infty} e^{-ixt} e^{-\alpha|t|} \, dt
= \frac{1}{2\pi} \frac{D}{\alpha} \left( \int_{-\infty}^{0} e^{-ixt} e^{\alpha t} \, dt + \int_{0}^{+\infty} e^{-ixt} e^{-\alpha t} \, dt \right)
\]
\[
= \frac{1}{2\pi} \frac{D}{\alpha} \left( \frac{1}{ix + \alpha} + \frac{1}{-ix + \alpha} \right)
= \frac{D}{\pi} \frac{1}{x^2 + \alpha^2}.
\]
This function is called the Cauchy or the Lorentz distribution. The correlation time is (we have that $R(0) = D/\alpha$)
\[
\tau_{cor} = \int_0^{\infty} e^{-\alpha t} \, dt = \alpha^{-1}.
\]
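The relation between the exponential covariance (3.4) and its Lorentzian spectral density can be checked numerically. The following is a minimal sketch (the parameter values, grid size and truncation of the integral are arbitrary choices, not part of the notes) that approximates the Fourier integral by a Riemann sum:

```python
import numpy as np

# Covariance R(t) = (D/alpha) exp(-alpha |t|); its spectral density should be
# f(x) = (D/pi) / (x^2 + alpha^2). Parameters are illustrative choices.
D, alpha = 1.0, 2.0
t = np.linspace(-40.0, 40.0, 400001)          # truncated integration grid
R = (D / alpha) * np.exp(-alpha * np.abs(t))

def spectral_density(x):
    # f(x) = (1/2pi) \int e^{-ixt} R(t) dt, approximated by a Riemann sum
    integrand = np.exp(-1j * x * t) * R
    return (integrand.sum() * (t[1] - t[0])).real / (2.0 * np.pi)

x = 0.7
exact = (D / np.pi) / (x**2 + alpha**2)
assert abs(spectral_density(x) - exact) < 1e-5
```

The quadrature reproduces the Lorentzian to within the discretization error of the grid.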

A Gaussian process with an exponential correlation function is of particular importance in the theory and applications of stochastic processes.

Definition 3.2.12. A Gaussian stochastic process with mean zero and covariance function (3.4) is called the stationary Ornstein-Uhlenbeck process.


The OU process was introduced by Ornstein and Uhlenbeck in 1930 (G.E. Uhlenbeck, L.S. Ornstein, Phys. Rev. 36, 823 (1930)) as a model for the velocity of a Brownian particle. It is of interest to calculate the statistics of the position of the Brownian particle, i.e. of the integral
\[
X(t) = \int_0^t Y(s) \, ds, \quad (3.5)
\]
where $Y(t)$ denotes the stationary OU process.

Lemma 3.2.13. Let $Y(t)$ denote the stationary OU process with covariance function (3.4) with $D = \alpha = 1$. Then the position process (3.5) is a mean zero Gaussian process with covariance function
\[
\mathbb{E}(X(t)X(s)) = 2\min(t,s) + e^{-\min(t,s)} + e^{-\max(t,s)} - e^{-|t-s|} - 1.
\]

Exercise 3.2.14. Prove Lemma 3.2.13.

3.2.3 Ergodic Properties of Stationary Processes

Second order stationary processes with integrable covariance are ergodic in the mean-square sense: time averages equal phase space (ensemble) averages. An example of an ergodic theorem for a stationary process is the following $L^2$ (mean-square) ergodic theorem.

Theorem 3.2.15. Let $\{X_t\}_{t \geq 0}$ be a second order stationary process on a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ with mean $\mu$ and covariance $R(t)$, and assume that $R(t) \in L^1(0, +\infty)$. Then
\[
\lim_{T \to +\infty} \mathbb{E} \left| \frac{1}{T} \int_0^T X(s) \, ds - \mu \right|^2 = 0. \quad (3.6)
\]

For the proof of this result we will first need an elementary lemma.

Lemma 3.2.16. Let $R(t)$ be an integrable symmetric function. Then
\[
\int_0^T \int_0^T R(t - s) \, dt\,ds = 2 \int_0^T (T - s) R(s) \, ds. \quad (3.7)
\]

Proof. We make the change of variables $u = t - s$, $v = t + s$. The domain of integration in the $(t, s)$ variables is $[0,T] \times [0,T]$; in the $(u, v)$ variables, for each $u \in [-T, T]$ the variable $v$ ranges over an interval of length $2(T - |u|)$. The Jacobian of the transformation is
\[
J = \frac{\partial(t,s)}{\partial(u,v)} = \frac{1}{2}.
\]
The integral becomes
\[
\int_0^T \int_0^T R(t-s) \, dt\,ds = \int_{-T}^{T} 2(T - |u|)\, R(u)\, J \, du
= \int_{-T}^{T} (T - |u|) R(u) \, du = 2 \int_0^T (T - u) R(u) \, du,
\]
where the symmetry of the function $R(u)$ was used in the last step.


Proof of Theorem 3.2.15. We use Lemma 3.2.16 to calculate:
\[
\mathbb{E} \left| \frac{1}{T} \int_0^T X_s \, ds - \mu \right|^2
= \frac{1}{T^2} \mathbb{E} \left| \int_0^T (X_s - \mu) \, ds \right|^2
= \frac{1}{T^2} \int_0^T \int_0^T \mathbb{E}\big( (X_t - \mu)(X_s - \mu) \big) \, dt\,ds
\]
\[
= \frac{1}{T^2} \int_0^T \int_0^T R(t - s) \, dt\,ds
= \frac{2}{T^2} \int_0^T (T - u) R(u) \, du
= \frac{2}{T} \int_0^T \left( 1 - \frac{u}{T} \right) R(u) \, du
\leq \frac{2}{T} \int_0^{+\infty} |R(t)| \, dt \to 0,
\]
using the dominated convergence theorem and the assumption $R(\cdot) \in L^1$.

From the above calculation we conclude that
\[
\lim_{T \to +\infty} \int_0^T \left( 1 - \frac{u}{T} \right) R(u) \, du = \int_0^{\infty} R(t) \, dt =: D,
\]
which is a finite quantity. Assume that $\mu = 0$. The above calculation suggests that
\[
\lim_{T \to +\infty} \frac{1}{T} \mathbb{E} \left( \int_0^T X(t) \, dt \right)^2 = 2D.
\]
Hence, for $T \gg 1$, we have that
\[
\mathbb{E} \left( \int_0^T X(t) \, dt \right)^2 \approx 2DT.
\]

This implies that, at sufficiently long times, the mean square displacement of the integral of the ergodic stationary process $X_t$ scales linearly in time, with proportionality coefficient $2D$.

Assume that $X_t$ is the velocity of a (Brownian) particle. In this case, the integral of $X_t$,
\[
Z_t = \int_0^t X_s \, ds,
\]
represents the particle position. From our calculation above we conclude that, for long times,
\[
\mathbb{E} Z_t^2 \approx 2Dt,
\]
where
\[
D = \int_0^{\infty} R(t) \, dt = \int_0^{\infty} \mathbb{E}(X_t X_0) \, dt \quad (3.8)
\]
is the diffusion coefficient. Thus, one expects that at sufficiently long times and under appropriate assumptions on the correlation function, the time integral of a stationary process will approximate a Brownian motion with diffusion coefficient $D$. The diffusion coefficient is an example of a transport coefficient and (3.8) is an example of the Green-Kubo formula: a transport coefficient can be calculated in terms of the time integral of an appropriate autocorrelation function. In the case of the diffusion coefficient we need to calculate the integral of the velocity autocorrelation function.
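The Green-Kubo relation (3.8) can be illustrated numerically. The sketch below (all numerical parameters — step size, horizon, number of paths — are arbitrary choices) simulates a stationary OU velocity with covariance $e^{-|t|}$, so that $D = \int_0^\infty e^{-t}\,dt = 1$, and checks that the mean square displacement of the time-integrated velocity grows like $2Dt$:

```python
import numpy as np

rng = np.random.default_rng(0)
dt, n_steps, n_paths = 0.01, 20000, 2000   # illustrative discretization
t_final = n_steps * dt                      # T = 200

# Exact update for the stationary OU process with covariance exp(-|t|):
# Y_{k+1} = a Y_k + sqrt(1 - a^2) xi,  a = exp(-dt).
a = np.exp(-dt)
Y = rng.standard_normal(n_paths)            # start in the stationary N(0,1) law
Z = np.zeros(n_paths)                       # positions Z_t = int_0^t Y(s) ds
for _ in range(n_steps):
    Z += Y * dt                             # left-endpoint Riemann sum
    Y = a * Y + np.sqrt(1.0 - a**2) * rng.standard_normal(n_paths)

D = 1.0                                     # Green-Kubo: D = int_0^inf exp(-t) dt
msd = np.mean(Z**2)
assert abs(msd / (2.0 * D * t_final) - 1.0) < 0.2   # E Z_T^2 ~ 2 D T
```

The exact value here is $\mathbb{E}Z_T^2 = 2T - 2 + 2e^{-T}$, so for $T = 200$ the ratio to $2DT$ is very close to one up to Monte Carlo noise.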


Example 3.2.17. Consider the stochastic process with an exponential correlation function from Example 3.2.11, and assume that this stochastic process describes the velocity of a Brownian particle. Since $R(t) \in L^1(0,+\infty)$, Theorem 3.2.15 applies. Furthermore, the diffusion coefficient of the Brownian particle is given by
\[
\int_0^{+\infty} R(t) \, dt = R(0)\, \tau_{cor} = \frac{D}{\alpha^2}.
\]

3.3 Brownian Motion

• The most important continuous time stochastic process is Brownian motion. Brownian motion is a mean zero, continuous (i.e. it has continuous sample paths: for a.e. $\omega \in \Omega$ the function $X_t$ is a continuous function of time) process with independent Gaussian increments.

• A process $X_t$ has independent increments if for every sequence $t_0 < t_1 < \dots < t_n$ the random variables
\[
X_{t_1} - X_{t_0}, \; X_{t_2} - X_{t_1}, \; \dots, \; X_{t_n} - X_{t_{n-1}}
\]
are independent.

• If, furthermore, for any $t_1, t_2, s \in T$ and Borel set $B \subset \mathbb{R}$,
\[
\mathbb{P}(X_{t_2+s} - X_{t_1+s} \in B) = \mathbb{P}(X_{t_2} - X_{t_1} \in B),
\]
then the process $X_t$ has stationary independent increments.

Definition 3.3.1.

• A one dimensional standard Brownian motion $W(t) : \mathbb{R}^+ \to \mathbb{R}$ is a real valued stochastic process such that

i. $W(0) = 0$;

ii. $W(t)$ has independent increments;

iii. for every $t > s \geq 0$, $W(t) - W(s)$ has a Gaussian distribution with mean $0$ and variance $t - s$. That is, the density of the random variable $W(t) - W(s)$ is
\[
g(x; t, s) = \big( 2\pi (t - s) \big)^{-\frac{1}{2}} \exp\left( -\frac{x^2}{2(t-s)} \right); \quad (3.9)
\]

• A $d$-dimensional standard Brownian motion $W(t) : \mathbb{R}^+ \to \mathbb{R}^d$ is a collection of $d$ independent one dimensional Brownian motions:
\[
W(t) = (W_1(t), \dots, W_d(t)),
\]
where $W_i(t)$, $i = 1, \dots, d$, are independent one dimensional Brownian motions. The density of the Gaussian random vector $W(t) - W(s)$ is thus
\[
g(x; t, s) = \big( 2\pi (t - s) \big)^{-d/2} \exp\left( -\frac{\|x\|^2}{2(t-s)} \right).
\]


Figure 3.1: Brownian sample paths on $t \in [0,1]$ (five individual paths together with the mean of 1000 paths).
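A figure like Figure 3.1 can be reproduced with a few lines of code. The sketch below (grid size, path count and seed are arbitrary choices) generates Brownian paths by summing independent Gaussian increments:

```python
import numpy as np

rng = np.random.default_rng(1)
n_steps, n_paths = 1000, 1000
dt = 1.0 / n_steps
t = np.linspace(0.0, 1.0, n_steps + 1)

# W(0) = 0 and W(t+dt) - W(t) ~ N(0, dt), independent across steps.
increments = np.sqrt(dt) * rng.standard_normal((n_paths, n_steps))
W = np.concatenate([np.zeros((n_paths, 1)), np.cumsum(increments, axis=1)], axis=1)

# Sanity checks at t = 1: mean 0 and variance t.
assert abs(W[:, -1].mean()) < 0.15
assert abs(W[:, -1].var() - 1.0) < 0.15
# Plotting (e.g. with matplotlib) would show 5 paths and the mean of all paths:
# plt.plot(t, W[:5].T); plt.plot(t, W.mean(axis=0))
```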

Brownian motion is sometimes referred to as the Wiener process.

Brownian motion has continuous paths. More precisely:

Definition 3.3.2. Let $X_t$ and $Y_t$, $t \in T$, be two stochastic processes defined on the same probability space $(\Omega, \mathcal{F}, \mathbb{P})$. The process $Y_t$ is said to be a modification of $X_t$ if $\mathbb{P}(X_t = Y_t) = 1$ for all $t \in T$.

Lemma 3.3.3. There is a continuous modification of Brownian motion.

This follows from a theorem due to Kolmogorov.

Theorem 3.3.4. (Kolmogorov) Let $X_t$, $t \in [0, \infty)$, be a stochastic process on a probability space $(\Omega, \mathcal{F}, \mathbb{P})$. Suppose that there are positive constants $\alpha$ and $\beta$, and for each $T \geq 0$ there is a constant $C(T)$ such that
\[
\mathbb{E}|X_t - X_s|^{\alpha} \leq C(T)\, |t - s|^{1+\beta}, \quad 0 \leq s, t \leq T. \quad (3.10)
\]
Then there exists a continuous modification $Y_t$ of the process $X_t$.

Exercise 3.3.5. Use Theorem 3.3.4 to show that Brownian motion has a continuous modification.

Remark 3.3.6. Equivalently, we could have defined the one dimensional standard Brownian motion as a stochastic process on a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ with continuous paths for almost all $\omega \in \Omega$, and Gaussian finite dimensional distributions with zero mean and covariance $\mathbb{E}(W_{t_i} W_{t_j}) = \min(t_i, t_j)$. One can then show that Definition 3.3.1 together with continuity of paths is equivalent to the above definition.

It is possible to prove rigorously the existence of the Wiener process (Brownian motion):

Theorem 3.3.7. (Wiener) There exists an almost-surely continuous process $W_t$ with independent increments and $W_0 = 0$, such that for each $t \geq 0$ the random variable $W_t$ is $\mathcal{N}(0, t)$. Furthermore, $W_t$ is almost surely locally Hölder continuous with exponent $\alpha$ for any $\alpha \in (0, \frac{1}{2})$.


Notice that Brownian paths are not differentiable.

We can also construct Brownian motion through the limit of the random walk: let $X_1, X_2, \dots$ be i.i.d. random variables on a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ with mean $0$ and variance $1$. Define the discrete time stochastic process $S_n$ with $S_0 = 0$, $S_n = \sum_{j=1}^{n} X_j$, $n \geq 1$. Define now a continuous time stochastic process with continuous paths as the linearly interpolated, appropriately rescaled random walk:
\[
W^n_t = \frac{1}{\sqrt{n}} S_{[nt]} + (nt - [nt]) \frac{1}{\sqrt{n}} X_{[nt]+1},
\]
where $[\cdot]$ denotes the integer part of a number. Then $W^n_t$ converges weakly to a one dimensional standard Brownian motion.

Brownian motion is a Gaussian process. For the $d$-dimensional Brownian motion, and for $I$ the $d \times d$ dimensional identity, we have (see (2.3) and (2.4))
\[
\mathbb{E} W(t) = 0 \quad \forall t \geq 0
\]
and
\[
\mathbb{E}\big( (W(t) - W(s)) \otimes (W(t) - W(s)) \big) = (t - s) I. \quad (3.11)
\]
Moreover,
\[
\mathbb{E}\big( W(t) \otimes W(s) \big) = \min(t, s)\, I. \quad (3.12)
\]
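The random walk construction above is easy to test empirically. In the sketch below (the step distribution, sample sizes and the test time $t = 0.5$ are arbitrary choices), the rescaled walk $S_{[nt]}/\sqrt{n}$ is checked against the Brownian scaling $\mathrm{Var}\, W_t = t$:

```python
import numpy as np

rng = np.random.default_rng(2)
n, n_paths = 2000, 5000

# i.i.d. steps with mean 0 and variance 1 (here Rademacher +-1 steps).
steps = rng.choice([-1.0, 1.0], size=(n_paths, n))

# W^n_t ~ S_[nt] / sqrt(n); at t = 0.5 the mean and variance should
# approach 0 and t respectively as n grows.
t = 0.5
W_t = steps[:, :int(n * t)].sum(axis=1) / np.sqrt(n)
assert abs(W_t.mean()) < 0.05
assert abs(W_t.var() - t) < 0.05
```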

From the formula for the Gaussian density $g(x, t-s)$, eqn. (3.9), we immediately conclude that $W(t) - W(s)$ and $W(t+u) - W(s+u)$ have the same pdf. Consequently, Brownian motion has stationary increments. Notice, however, that Brownian motion itself is not a stationary process. Since $W(t) = W(t) - W(0)$, the pdf of $W(t)$ is
\[
g(x, t) = \frac{1}{\sqrt{2\pi t}} e^{-x^2/2t}.
\]

We can easily calculate all moments of the Brownian motion:
\[
\mathbb{E}(W^n(t)) = \frac{1}{\sqrt{2\pi t}} \int_{-\infty}^{+\infty} x^n e^{-x^2/2t} \, dx
= \begin{cases} 1 \cdot 3 \cdots (n-1)\, t^{n/2}, & n \text{ even}, \\ 0, & n \text{ odd}. \end{cases}
\]
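The moment formula can be checked against numerical integration. The sketch below (the fourth moment and $t = 2$ are arbitrary test cases) compares $\mathbb{E} W^4(t) = 1 \cdot 3 \, t^2 = 3t^2$ with a Riemann-sum approximation of the Gaussian integral:

```python
import numpy as np

t = 2.0
x = np.linspace(-30.0, 30.0, 600001)   # truncated grid; Gaussian tails negligible
dx = x[1] - x[0]
pdf = np.exp(-x**2 / (2.0 * t)) / np.sqrt(2.0 * np.pi * t)

fourth_moment = np.sum(x**4 * pdf) * dx   # E W^4(t) by a Riemann sum
assert abs(fourth_moment - 3.0 * t**2) < 1e-6   # formula gives 1*3*t^2
```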

Exercise 3.3.8. Let $a_1, \dots, a_n$ and $s_1, \dots, s_n$ be positive real numbers. Calculate the mean and variance of the random variable
\[
X = \sum_{i=1}^{n} a_i W(s_i).
\]

Exercise 3.3.9. Let $W(t)$ be the standard one-dimensional Brownian motion and let $\sigma, s_1, s_2 > 0$. Calculate

i. $\mathbb{E} e^{\sigma W(t)}$;

ii. $\mathbb{E}\big( \sin(\sigma W(s_1)) \sin(\sigma W(s_2)) \big)$.


Notice that if $W_t$ is a standard Brownian motion, then so is the rescaled process
\[
X_t = \frac{1}{\sqrt{c}} W_{ct}.
\]
To prove this we show that the two processes have the same distribution function:
\[
\mathbb{P}(X_t \leq x) = \mathbb{P}\big( W_{ct} \leq \sqrt{c}\, x \big)
= \int_{-\infty}^{\sqrt{c}\, x} \frac{1}{\sqrt{2\pi c t}} \exp\left( -\frac{z^2}{2ct} \right) dz
= \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi c t}} \exp\left( -\frac{y^2}{2t} \right) \sqrt{c} \, dy
= \mathbb{P}(W_t \leq x).
\]

We can also add a drift and change the diffusion coefficient of the Brownian motion: we will define a Brownian motion with drift $\mu$ and variance $\sigma^2$ as the process
\[
X_t = \mu t + \sigma W_t.
\]
Notice that
\[
\mathbb{E} X_t = \mu t, \quad \mathbb{E}(X_t - \mathbb{E} X_t)^2 = \sigma^2 t.
\]

We can define the OU process through the Brownian motion via a time change.

Lemma 3.3.10. Let $W(t)$ be a standard Brownian motion and consider the process
\[
V(t) = e^{-t} W(e^{2t}).
\]
Then $V(t)$ is a Gaussian second order stationary process with mean $0$ and covariance
\[
R(s, t) = e^{-|t-s|}. \quad (3.13)
\]

For the proof of this result we first need to show that time changed Gaussian processes are also Gaussian.

Lemma 3.3.11. Let $X(t)$ be a Gaussian stochastic process and let $Y(t) = X(f(t))$ where $f(t)$ is a strictly increasing function. Then $Y(t)$ is also a Gaussian process.

Proof. We need to show that, for all positive integers $N$ and all sequences of times $t_1, t_2, \dots, t_N$, the random vector
\[
\big( Y(t_1), Y(t_2), \dots, Y(t_N) \big) \quad (3.14)
\]
is a multivariate Gaussian random variable. Since $f(t)$ is strictly increasing, it is invertible and hence there exist $s_i$, $i = 1, \dots, N$, such that $s_i = f^{-1}(t_i)$. Thus, the random vector (3.14) can be rewritten as
\[
\big( X(s_1), X(s_2), \dots, X(s_N) \big),
\]
which is Gaussian for all $N$ and all choices of times $s_1, s_2, \dots, s_N$. Hence $Y(t)$ is also Gaussian.


Proof of Lemma 3.3.10. The fact that $V(t)$ is mean zero follows immediately from the fact that $W(t)$ is mean zero. To show that the correlation function of $V(t)$ is given by (3.13), we calculate
\[
\mathbb{E}(V(t)V(s)) = e^{-t-s}\, \mathbb{E}(W(e^{2t}) W(e^{2s})) = e^{-t-s} \min(e^{2t}, e^{2s}) = e^{-|t-s|}.
\]
The Gaussianity of $V(t)$ follows from Lemma 3.3.11 (notice that the transformation that gives $V(t)$ in terms of $W(t)$ is invertible and we can write $W(s) = s^{1/2} V(\frac{1}{2} \ln(s))$).

Exercise 3.3.12. Use Lemma 3.3.10 to obtain the probability density function of the stationary Ornstein-Uhlenbeck process.

Exercise 3.3.13. Calculate the mean and the correlation function of the integral of a standard Brownian motion
\[
Y_t = \int_0^t W_s \, ds.
\]

Exercise 3.3.14. Show that the process
\[
Y_t = \int_t^{t+1} (W_s - W_t) \, ds, \quad t \in \mathbb{R},
\]
is second order stationary.

3.4 Other Examples of Stochastic Processes

3.4.1 Brownian Bridge

Let $W(t)$ be a standard one dimensional Brownian motion. We define the Brownian bridge (from $0$ to $0$) to be the process
\[
B_t = W_t - t W_1, \quad t \in [0, 1]. \quad (3.15)
\]
Notice that $B_0 = B_1 = 0$. Equivalently, we can define the Brownian bridge to be the continuous Gaussian process $\{B_t : 0 \leq t \leq 1\}$ such that
\[
\mathbb{E} B_t = 0, \quad \mathbb{E}\big( (B_t - \mathbb{E}B_t)(B_s - \mathbb{E}B_s) \big) = \min(s, t) - st, \quad s, t \in [0, 1]. \quad (3.16)
\]
Another, equivalent definition of the Brownian bridge is through an appropriate time change of the Brownian motion:
\[
B_t = (1 - t)\, W\!\left( \frac{t}{1 - t} \right), \quad t \in [0, 1). \quad (3.17)
\]
Conversely, we can write the Brownian motion as a time change of the Brownian bridge:
\[
W_t = (t + 1)\, B\!\left( \frac{t}{1 + t} \right), \quad t \geq 0.
\]
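The covariance (3.16) and the time change (3.17) agree: for $s \leq t$, $(1-s)(1-t)\min\!\big(\frac{s}{1-s}, \frac{t}{1-t}\big) = s(1-t) = \min(s,t) - st$. The following sketch (the sampled times, grid size and seed are arbitrary choices) checks this identity and samples a bridge via (3.15):

```python
import numpy as np

def bridge_cov(s, t):
    # Covariance of the Brownian bridge from (3.16).
    return min(s, t) - s * t

for s in np.linspace(0.05, 0.95, 19):
    for t in np.linspace(0.05, 0.95, 19):
        # Covariance implied by (3.17): (1-s)(1-t) Cov(W(s/(1-s)), W(t/(1-t))).
        implied = (1 - s) * (1 - t) * min(s / (1 - s), t / (1 - t))
        assert abs(implied - bridge_cov(s, t)) < 1e-12

# Sampling via (3.15): B_t = W_t - t W_1 vanishes at both endpoints.
rng = np.random.default_rng(4)
n = 1000
W = np.concatenate([[0.0], np.cumsum(np.sqrt(1.0 / n) * rng.standard_normal(n))])
B = W - np.linspace(0.0, 1.0, n + 1) * W[-1]
assert B[0] == 0.0 and abs(B[-1]) < 1e-12
```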

Exercise 3.4.1. Show the equivalence between definitions (3.15), (3.16) and (3.17).


Exercise 3.4.2. Use (3.15) to show that the probability density function of the Brownian bridge is
\[
f(x, t) = \big( 2\pi t(1 - t) \big)^{-1/2} \exp\left( -\frac{x^2}{2t(1 - t)} \right).
\]

Exercise 3.4.3. Give the definition of the Brownian bridge from $0$ to $a$, where $a \in \mathbb{R}$. Calculate the mean and covariance of this Brownian bridge.

3.4.2 Fractional Brownian Motion

Definition 3.4.4. A (normalized) fractional Brownian motion $W^H_t$, $t \geq 0$, with Hurst parameter $H \in (0, 1)$ is a centered Gaussian process with continuous sample paths whose covariance is given by
\[
\mathbb{E}(W^H_t W^H_s) = \frac{1}{2} \big( s^{2H} + t^{2H} - |t - s|^{2H} \big). \quad (3.18)
\]
Fractional Brownian motion has the following properties.

i. When $H = \frac{1}{2}$, $W^{1/2}_t$ becomes the standard Brownian motion.

ii. $W^H_0 = 0$, $\mathbb{E} W^H_t = 0$, $\mathbb{E}(W^H_t)^2 = |t|^{2H}$, $t \geq 0$.

iii. It has stationary increments, $\mathbb{E}(W^H_t - W^H_s)^2 = |t - s|^{2H}$.

iv. It has the following self similarity property
\[
(W^H_{\alpha t}, \, t \geq 0) = (\alpha^H W^H_t, \, t \geq 0), \quad \alpha > 0,
\]
where the equivalence is in law.

3.4.3 The Poisson Process

• Another fundamental continuous time process is the Poisson process:

Definition 3.4.5. The Poisson process with intensity $\lambda$, denoted by $N(t)$, is an integer-valued, continuous time, stochastic process with independent increments satisfying
\[
\mathbb{P}\big[ (N(t) - N(s)) = k \big] = \frac{e^{-\lambda(t-s)} \big( \lambda(t-s) \big)^k}{k!}, \quad t > s \geq 0, \; k \in \mathbb{N}.
\]

• Both Brownian motion and the Poisson process are homogeneous (or time-homogeneous): the increments between successive times $s$ and $t$ depend only on $t - s$.
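The Poisson process is easy to simulate by drawing i.i.d. exponential inter-arrival times. The following sketch (intensity, horizon, path count and seed are arbitrary choices) checks the count statistics of Definition 3.4.5, namely that $N(t)$ is Poisson with mean and variance $\lambda t$:

```python
import numpy as np

rng = np.random.default_rng(5)
lam, t_final, n_paths = 3.0, 10.0, 20000

counts = np.empty(n_paths)
for i in range(n_paths):
    # Inter-arrival times of a Poisson process are i.i.d. Exp(lambda);
    # N(t) counts the arrivals up to time t.
    arrivals = np.cumsum(rng.exponential(1.0 / lam, size=int(3 * lam * t_final)))
    counts[i] = np.searchsorted(arrivals, t_final)

# N(t) ~ Poisson(lambda t): mean and variance both equal lambda t = 30.
assert abs(counts.mean() - lam * t_final) < 0.2
assert abs(counts.var() - lam * t_final) < 1.0
```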

Exercise 3.4.6. Use Theorem 3.3.4 to show that there does not exist a continuous modification of the Poisson process.


3.5 The Karhunen-Loeve Expansion

Let $f \in L^2(\Omega)$ where $\Omega$ is a subset of $\mathbb{R}^d$ and let $\{e_n\}_{n=1}^{\infty}$ be an orthonormal basis in $L^2(\Omega)$. Then, it is well known that $f$ can be written as a series expansion:
\[
f = \sum_{n=1}^{\infty} f_n e_n,
\]
where
\[
f_n = \int_{\Omega} f(x) e_n(x) \, dx.
\]
The convergence is in $L^2(\Omega)$:
\[
\lim_{N \to \infty} \Big\| f(x) - \sum_{n=1}^{N} f_n e_n(x) \Big\|_{L^2(\Omega)} = 0.
\]

It turns out that we can obtain a similar expansion for an $L^2$ mean zero process which is continuous in the $L^2$ sense:
\[
\mathbb{E} X_t^2 < +\infty, \quad \mathbb{E} X_t = 0, \quad \lim_{h \to 0} \mathbb{E} |X_{t+h} - X_t|^2 = 0. \quad (3.19)
\]
For simplicity we will take $T = [0, 1]$. Let $R(t, s) = \mathbb{E}(X_t X_s)$ be the correlation function.

Exercise 3.5.1. Show that the correlation function of a process $X_t$ satisfying (3.19) is continuous in both $t$ and $s$.

Let us assume that an expansion of the form
\[
X_t(\omega) = \sum_{n=1}^{\infty} \xi_n(\omega) e_n(t), \quad t \in [0, 1], \quad (3.20)
\]
exists, where $\{e_n\}_{n=1}^{\infty}$ is an orthonormal basis in $L^2([0,1])$. The random variables $\xi_n$ are calculated as
\[
\int_0^1 X_t e_k(t) \, dt = \int_0^1 \sum_{n=1}^{\infty} \xi_n e_n(t) e_k(t) \, dt = \sum_{n=1}^{\infty} \xi_n \delta_{nk} = \xi_k,
\]
where we assumed that we can interchange the summation and integration. We will assume that these random variables are orthogonal:
\[
\mathbb{E}(\xi_n \xi_m) = \lambda_n \delta_{nm},
\]
where $\{\lambda_n\}_{n=1}^{\infty}$ are positive numbers that will be determined later.


Assuming that an expansion of the form (3.20) exists, we can calculate
\[
R(t, s) = \mathbb{E}(X_t X_s) = \mathbb{E}\Big( \sum_{k=1}^{\infty} \sum_{\ell=1}^{\infty} \xi_k e_k(t)\, \xi_\ell e_\ell(s) \Big)
= \sum_{k=1}^{\infty} \sum_{\ell=1}^{\infty} \mathbb{E}(\xi_k \xi_\ell)\, e_k(t) e_\ell(s)
= \sum_{k=1}^{\infty} \lambda_k e_k(t) e_k(s).
\]
Consequently, in order for the expansion (3.20) to be valid we need
\[
R(t, s) = \sum_{k=1}^{\infty} \lambda_k e_k(t) e_k(s). \quad (3.21)
\]

From equation (3.21) it follows that
\[
\int_0^1 R(t, s) e_n(s) \, ds = \int_0^1 \sum_{k=1}^{\infty} \lambda_k e_k(t) e_k(s) e_n(s) \, ds
= \sum_{k=1}^{\infty} \lambda_k e_k(t) \int_0^1 e_k(s) e_n(s) \, ds
= \sum_{k=1}^{\infty} \lambda_k e_k(t) \delta_{kn} = \lambda_n e_n(t).
\]
Hence, in order for the expansion (3.20) to be valid, $\{\lambda_n, e_n(t)\}_{n=1}^{\infty}$ have to be the eigenvalues and eigenfunctions of the integral operator whose kernel is the correlation function of $X_t$:
\[
\int_0^1 R(t, s) e_n(s) \, ds = \lambda_n e_n(t). \quad (3.22)
\]

Hence, in order to prove the expansion (3.20) we need to study the eigenvalue problem for the integral operator $\mathcal{R} : L^2[0,1] \to L^2[0,1]$. It is easy to check that this operator is self-adjoint ($(\mathcal{R}f, h) = (f, \mathcal{R}h)$ for all $f, h \in L^2(0,1)$) and nonnegative ($(\mathcal{R}f, f) \geq 0$ for all $f \in L^2(0,1)$). Hence, all its eigenvalues are real and nonnegative. Furthermore, it is a compact operator (if $\{\phi_n\}_{n=1}^{\infty}$ is a bounded sequence in $L^2(0,1)$, then $\{\mathcal{R}\phi_n\}_{n=1}^{\infty}$ has a strongly convergent subsequence). The spectral theorem for compact, self-adjoint operators implies that $\mathcal{R}$ has a countable sequence of eigenvalues tending to $0$. Furthermore, for every $f \in L^2(0,1)$ we can write
\[
f = f_0 + \sum_{n=1}^{\infty} f_n e_n(t),
\]
where $\mathcal{R}f_0 = 0$, the $e_n(t)$ are the eigenfunctions of $\mathcal{R}$ corresponding to nonzero eigenvalues, and the convergence is in $L^2$. Finally, Mercer's Theorem states that for $R(t,s)$ continuous on $[0,1] \times [0,1]$, the expansion (3.21) is valid, where the series converges absolutely and uniformly.


Exercise 3.5.2. Let $X_t$ be a stochastic process satisfying (3.19) and $R(t,s)$ its correlation function. Show that the integral operator $\mathcal{R} : L^2[0,1] \to L^2[0,1]$,
\[
\mathcal{R}f := \int_0^1 R(t, s) f(s) \, ds, \quad (3.23)
\]
is self-adjoint and nonnegative. Show that all of its eigenvalues are real and nonnegative. Show that eigenfunctions corresponding to different eigenvalues are orthogonal.

Exercise 3.5.3. Let $H$ be a Hilbert space. An operator $\mathcal{R} : H \to H$ is said to be Hilbert-Schmidt if there exists a complete orthonormal sequence $\{\phi_n\}_{n=1}^{\infty}$ in $H$ such that
\[
\sum_{n=1}^{\infty} \| \mathcal{R} \phi_n \|^2 < \infty.
\]
Let $\mathcal{R} : L^2[0,1] \to L^2[0,1]$ be the operator defined in (3.23) with $R(t,s)$ being continuous both in $t$ and $s$. Show that it is a Hilbert-Schmidt operator.

Now we are ready to prove (3.20).

Theorem 3.5.4. (Karhunen-Loeve). Let $X_t$, $t \in [0,1]$, be an $L^2$ process with zero mean and continuous correlation function $R(t,s)$. Let $\{\lambda_n, e_n(t)\}_{n=1}^{\infty}$ be the eigenvalues and eigenfunctions of the operator $\mathcal{R}$ defined in (3.23). Then
\[
X_t = \sum_{n=1}^{\infty} \xi_n e_n(t), \quad t \in [0, 1], \quad (3.24)
\]
where
\[
\xi_n = \int_0^1 X_t e_n(t) \, dt, \quad \mathbb{E}\xi_n = 0, \quad \mathbb{E}(\xi_n \xi_m) = \lambda_n \delta_{nm}. \quad (3.25)
\]
The series converges in $L^2$ to $X(t)$, uniformly in $t$.

Proof. The fact that $\mathbb{E}\xi_n = 0$ follows from the fact that $X_t$ is mean zero. The orthogonality of the random variables $\{\xi_n\}_{n=1}^{\infty}$ follows from the orthogonality of the eigenfunctions of $\mathcal{R}$:
\[
\mathbb{E}(\xi_n \xi_m) = \mathbb{E} \int_0^1 \int_0^1 X_t X_s e_n(t) e_m(s) \, dt\,ds
= \int_0^1 \int_0^1 R(t, s) e_n(t) e_m(s) \, ds\,dt
= \lambda_n \int_0^1 e_n(s) e_m(s) \, ds = \lambda_n \delta_{nm}.
\]


Consider now the partial sum $S_N = \sum_{n=1}^{N} \xi_n e_n(t)$. We have
\[
\mathbb{E}|X_t - S_N|^2 = \mathbb{E} X_t^2 + \mathbb{E} S_N^2 - 2\,\mathbb{E}(X_t S_N)
\]
\[
= R(t,t) + \mathbb{E} \sum_{k,\ell=1}^{N} \xi_k \xi_\ell e_k(t) e_\ell(t) - 2\, \mathbb{E}\Big( X_t \sum_{n=1}^{N} \xi_n e_n(t) \Big)
\]
\[
= R(t,t) + \sum_{k=1}^{N} \lambda_k |e_k(t)|^2 - 2\, \mathbb{E} \sum_{k=1}^{N} \int_0^1 X_t X_s e_k(s) e_k(t) \, ds
\]
\[
= R(t,t) - \sum_{k=1}^{N} \lambda_k |e_k(t)|^2 \to 0,
\]
by Mercer's theorem.

Exercise 3.5.5. Let $X_t$, $t \in [0,T]$, be a mean zero second order stationary process with continuous covariance $R(t)$. Show that
\[
\sum_{n=1}^{\infty} \lambda_n = T\, R(0).
\]

Remark 3.5.6. Let $X_t$ be a Gaussian second order process with continuous covariance $R(t,s)$. Then the random variables $\{\xi_k\}_{k=1}^{\infty}$ are Gaussian, since they are defined through the time integral of a Gaussian process. Furthermore, since they are Gaussian and orthogonal, they are also independent. Hence, for Gaussian processes the Karhunen-Loeve expansion becomes
\[
X_t = \sum_{k=1}^{+\infty} \sqrt{\lambda_k}\, \xi_k e_k(t), \quad (3.26)
\]
where $\{\xi_k\}_{k=1}^{\infty}$ are independent $\mathcal{N}(0,1)$ random variables.

Example 3.5.7. The Karhunen-Loeve Expansion for Brownian Motion. The correlation function of Brownian motion is $R(t,s) = \min(t,s)$. The eigenvalue problem $\mathcal{R}\psi_n = \lambda_n \psi_n$ becomes
\[
\int_0^1 \min(t, s)\, \psi_n(s) \, ds = \lambda_n \psi_n(t).
\]
Let us assume that $\lambda_n > 0$ (it is easy to check that $0$ is not an eigenvalue). Upon setting $t = 0$ we obtain $\psi_n(0) = 0$. The eigenvalue problem can be rewritten in the form
\[
\int_0^t s\, \psi_n(s) \, ds + t \int_t^1 \psi_n(s) \, ds = \lambda_n \psi_n(t).
\]
We differentiate this equation once:
\[
\int_t^1 \psi_n(s) \, ds = \lambda_n \psi_n'(t).
\]


We set $t = 1$ in this equation to obtain the second boundary condition $\psi_n'(1) = 0$. A second differentiation yields
\[
-\psi_n(t) = \lambda_n \psi_n''(t),
\]
where primes denote differentiation with respect to $t$. Thus, in order to calculate the eigenvalues and eigenfunctions of the integral operator whose kernel is the covariance function of Brownian motion, we need to solve the Sturm-Liouville problem
\[
-\psi_n(t) = \lambda_n \psi_n''(t), \quad \psi(0) = \psi'(1) = 0.
\]
It is easy to check that the eigenvalues and (normalized) eigenfunctions are
\[
\psi_n(t) = \sqrt{2}\, \sin\left( \frac{1}{2}(2n + 1)\pi t \right), \quad \lambda_n = \left( \frac{2}{(2n + 1)\pi} \right)^2, \quad n = 0, 1, \dots
\]
Thus, the Karhunen-Loeve expansion of Brownian motion on $[0, 1]$ is
\[
W_t = \sqrt{2} \sum_{n=0}^{\infty} \xi_n \frac{2}{(2n + 1)\pi} \sin\left( \frac{1}{2}(2n + 1)\pi t \right). \quad (3.27)
\]
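The eigenpairs found above can be checked against the integral equation, and truncating (3.27) gives approximate Brownian paths. The sketch below (the quadrature grid, the tested modes and times, the truncation level and the seed are all arbitrary choices) verifies the eigen-relation $\int_0^1 \min(t,s)\psi_n(s)\,ds = \lambda_n \psi_n(t)$ numerically:

```python
import numpy as np

s = np.linspace(0.0, 1.0, 200001)
ds = s[1] - s[0]

def psi(n, t):
    # Eigenfunctions psi_n(t) = sqrt(2) sin((2n+1) pi t / 2)
    return np.sqrt(2.0) * np.sin(0.5 * (2 * n + 1) * np.pi * t)

for n in range(4):
    lam = (2.0 / ((2 * n + 1) * np.pi)) ** 2
    for t in (0.3, 0.7):
        # Quadrature approximation of int_0^1 min(t, s) psi_n(s) ds
        integral = np.sum(np.minimum(t, s) * psi(n, s)) * ds
        assert abs(integral - lam * psi(n, t)) < 1e-4

# A truncated version of (3.27) evaluated at a single time:
rng = np.random.default_rng(6)
xi = rng.standard_normal(500)
modes = np.arange(500)
t0 = 0.5
W_half = np.sqrt(2.0) * np.sum(xi * (2.0 / ((2 * modes + 1) * np.pi))
                               * np.sin(0.5 * (2 * modes + 1) * np.pi * t0))
assert np.isfinite(W_half)
```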

Exercise 3.5.8. Calculate the Karhunen-Loeve expansion for a second order stochastic process with correlation function $R(t,s) = ts$.

Exercise 3.5.9. Calculate the Karhunen-Loeve expansion of the Brownian bridge on $[0,1]$.

Exercise 3.5.10. Let $X_t$, $t \in [0,T]$, be a second order process with continuous covariance and Karhunen-Loeve expansion
\[
X_t = \sum_{k=1}^{\infty} \xi_k e_k(t).
\]
Define the process
\[
Y(t) = f(t)\, X_{\tau(t)}, \quad t \in [0, S],
\]
where $f(t)$ is a continuous function and $\tau(t)$ a continuous, nondecreasing function with $\tau(0) = 0$, $\tau(S) = T$. Find the Karhunen-Loeve expansion of $Y(t)$, in terms of the KL expansion of $X_t$. Use this in order to calculate the KL expansion of the Ornstein-Uhlenbeck process.

Exercise 3.5.11. Calculate the Karhunen-Loeve expansion of a centered Gaussian stochastic process with covariance function $R(s,t) = \cos(2\pi(t-s))$.

3.6 The Path Space

• Let $(\Omega, \mathcal{F}, \mu)$ be a probability space, $(E, \rho)$ a metric space and let $T = [0, \infty)$. Let $X_t$ be a stochastic process from $(\Omega, \mathcal{F}, \mu)$ to $(E, \rho)$ with continuous sample paths.

• The above means that for every $\omega \in \Omega$ we have that $X_t \in C_E := C([0, \infty); E)$.

• The space of continuous functions $C_E$ is called the path space of the stochastic process.


• We can put a metric on $C_E$ as follows:
\[
\rho_{C_E}(X^1, X^2) := \sum_{n=1}^{\infty} \frac{1}{2^n} \max_{0 \leq t \leq n} \min\big( \rho(X^1_t, X^2_t), 1 \big).
\]

• We can then define the Borel sets on $C_E$, using the topology induced by this metric, and $X_t$ can be thought of as a random variable on $(\Omega, \mathcal{F}, \mu)$ with state space $(C_E, \mathcal{B}(C_E))$.

• The probability measure $\mu X^{-1}$ on $(C_E, \mathcal{B}(C_E))$ is called the law of $X_t$.

• The law of a stochastic process is a probability measure on its path space.

Example 3.6.1. The space of continuous functions $C_E$ is the path space of Brownian motion (the Wiener process). The law of Brownian motion, that is the measure that it induces on $C([0, \infty), \mathbb{R}^d)$, is known as the Wiener measure.

3.7 Discussion and Bibliography

The kind of analysis presented in Section 3.2.3 was initiated by G.I. Taylor in [25]. The spectral theorem for compact, self-adjoint operators which was needed in the proof of the Karhunen-Loeve theorem can be found in [19]. A similar result is valid for random fields. See [23] and the references therein.


Chapter 4

Markov Processes

4.1 Introduction and Examples

In this chapter we will study some of the basic properties of Markov stochastic processes. Roughly speaking, a Markov process is a stochastic process that retains no memory of where it has been in the past: only the current state of a Markov process can influence where it will go next. A bit more precisely: a Markov process is a stochastic process for which, given the present, the past and the future are statistically independent.

Perhaps the simplest example of a Markov process is that of a random walk in one dimension. We defined the one dimensional random walk as the sum of independent, mean zero and variance $1$ random variables $\xi_i$, $i = 1, \dots$:
\[
X_N = \sum_{n=1}^{N} \xi_n, \quad X_0 = 0.
\]

Let $i_1, \dots, i_n, \dots$ be a sequence of integers. Then, clearly,
\[
\mathbb{P}(X_{n+1} = i_{n+1} \,|\, X_1 = i_1, \dots, X_n = i_n) = \mathbb{P}(X_{n+1} = i_{n+1} \,|\, X_n = i_n). \quad (4.1)
\]
In words, the probability that the random walk will be at $i_{n+1}$ at time $n+1$ depends only on its current value and not on how it got there.

Consider the case where the $\xi_n$'s are independent Bernoulli random variables with $\mathbb{P}(\xi_n = \pm 1) = \frac{1}{2}$. Then, $\mathbb{P}(X_{n+1} = i_{n+1} \,|\, X_1 = i_1, \dots, X_n = i_n) = \frac{1}{2}$ if $i_{n+1} = i_n \pm 1$ and $0$ otherwise.

The random walk is an example of a discrete time Markov chain:

Definition 4.1.1. A stochastic process $\{S_n; \, n \in \mathbb{N}\}$ with state space $S = \mathbb{Z}$ is called a discrete time Markov chain provided that the Markov property (4.1) is satisfied.

Consider now a continuous-time stochastic process $X_t$ with state space $S = \mathbb{Z}$ and denote by $\{X_s, \, s \leq t\}$ the collection of values of the stochastic process up to time $t$. We will say that $X_t$ is a Markov process provided that
\[
\mathbb{P}(X_{t+h} = i_{t+h} \,|\, X_s, \, s \leq t) = \mathbb{P}(X_{t+h} = i_{t+h} \,|\, X_t = i_t), \quad (4.2)
\]
for all $h \geq 0$. A continuous-time, discrete state space Markov process is called a continuous-time Markov chain.


Example 4.1.2. The Poisson process is a continuous-time Markov chain with
\[
\mathbb{P}(N_{t+h} = j \,|\, N_t = i) =
\begin{cases}
0, & j < i, \\[4pt]
\dfrac{e^{-\lambda h} (\lambda h)^{j-i}}{(j-i)!}, & j \geq i.
\end{cases}
\]

Similarly, we can define a Markov process for a continuous-time process whose state space is $\mathbb{R}$. In this case, the above definition becomes
\[
\mathbb{P}(X_{t+h} \in \Gamma \,|\, X_s, \, s \leq t) = \mathbb{P}(X_{t+h} \in \Gamma \,|\, X_t) \quad (4.3)
\]
for all Borel sets $\Gamma$.

Example 4.1.3. The Brownian motion is a Markov process with conditional probability density
\[
p(y, t \,|\, x, s) := p(W_t = y \,|\, W_s = x) = \frac{1}{\sqrt{2\pi(t-s)}} \exp\left( -\frac{|x - y|^2}{2(t-s)} \right). \quad (4.4)
\]

Example 4.1.4. The Ornstein-Uhlenbeck process $V_t = e^{-t} W(e^{2t})$ is a Markov process with conditional probability density
\[
p(y, t \,|\, x, s) := p(V_t = y \,|\, V_s = x) = \frac{1}{\sqrt{2\pi(1 - e^{-2(t-s)})}} \exp\left( -\frac{|y - x e^{-(t-s)}|^2}{2(1 - e^{-2(t-s)})} \right). \quad (4.5)
\]
To prove (4.5) we use the formula for the distribution function of the Brownian motion to calculate, for $t > s$,
\[
\mathbb{P}(V_t \leq y \,|\, V_s = x) = \mathbb{P}(e^{-t} W(e^{2t}) \leq y \,|\, e^{-s} W(e^{2s}) = x)
= \mathbb{P}(W(e^{2t}) \leq e^t y \,|\, W(e^{2s}) = e^s x)
\]
\[
= \int_{-\infty}^{e^t y} \frac{1}{\sqrt{2\pi(e^{2t} - e^{2s})}} \, e^{-\frac{|z - x e^s|^2}{2(e^{2t} - e^{2s})}} \, dz
= \int_{-\infty}^{y} \frac{e^t}{\sqrt{2\pi e^{2t}(1 - e^{-2(t-s)})}} \, e^{-\frac{|\rho e^t - x e^s|^2}{2 e^{2t}(1 - e^{-2(t-s)})}} \, d\rho
\]
\[
= \int_{-\infty}^{y} \frac{1}{\sqrt{2\pi(1 - e^{-2(t-s)})}} \, e^{-\frac{|\rho - x e^{-(t-s)}|^2}{2(1 - e^{-2(t-s)})}} \, d\rho.
\]

Consequently, the transition probability density for the OU process is given by the formula
\[
p(y, t \,|\, x, s) = \frac{\partial}{\partial y} \mathbb{P}(V_t \leq y \,|\, V_s = x)
= \frac{1}{\sqrt{2\pi(1 - e^{-2(t-s)})}} \exp\left( -\frac{|y - x e^{-(t-s)}|^2}{2(1 - e^{-2(t-s)})} \right).
\]

Markov stochastic processes appear in a variety of applications in physics, chemistry, biology and finance. In this and the next chapter we will develop various analytical tools for studying them. In particular, we will see that we can obtain an equation for the transition probability
\[
\mathbb{P}(X_{n+1} = i_{n+1} \,|\, X_n = i_n), \quad \mathbb{P}(X_{t+h} = i_{t+h} \,|\, X_t = i_t), \quad p(X_{t+h} = y \,|\, X_t = x), \quad (4.6)
\]


which will enable us to study the evolution of a Markov process. This equation will be called the Chapman-Kolmogorov equation.

We will be mostly concerned with time-homogeneous Markov processes, i.e. processes for which the conditional probabilities are invariant under time shifts. For time-homogeneous discrete-time Markov chains we have
\[
\mathbb{P}(X_{n+1} = j \,|\, X_n = i) = \mathbb{P}(X_1 = j \,|\, X_0 = i) =: p_{ij}.
\]

We will refer to the matrix $P = \{p_{ij}\}$ as the transition matrix. It is easy to check that the transition matrix is a stochastic matrix, i.e. it has nonnegative entries and $\sum_j p_{ij} = 1$. Similarly, we can define the $n$-step transition matrix $P_n = \{p_{ij}(n)\}$ as
\[
p_{ij}(n) = \mathbb{P}(X_{m+n} = j \,|\, X_m = i).
\]

We can study the evolution of a Markov chain through the Chapman-Kolmogorov equation:

\[
p_{ij}(m+n) = \sum_k p_{ik}(m) p_{kj}(n). \tag{4.7}
\]

Indeed, let $\mu_i^{(n)} := \mathbb{P}(X_n = i)$. The (possibly infinite dimensional) vector $\mu^{(n)}$ determines the state of the Markov chain at time $n$. A simple consequence of the Chapman-Kolmogorov equation is that we can write an evolution equation for the vector $\mu^{(n)}$:

\[
\mu^{(n)} = \mu^{(0)} P^n, \tag{4.8}
\]

where $P^n$ denotes the $n$th power of the matrix $P$. Hence in order to calculate the state of the Markov chain at time $n$ all we need is the initial distribution $\mu^{(0)}$ and the transition matrix $P$. Componentwise, the above equation can be written as

\[
\mu_j^{(n)} = \sum_i \mu_i^{(0)} p_{ij}(n).
\]
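The evolution equation (4.8) is a single matrix-vector product per step; a small sketch with a hypothetical 3-state transition matrix, which also checks the Chapman-Kolmogorov identity (4.7) in matrix form $P^{m+n} = P^m P^n$:

```python
import numpy as np

# Hypothetical 3-state transition matrix; every row sums to one.
P = np.array([[0.9, 0.1, 0.0],
              [0.2, 0.5, 0.3],
              [0.0, 0.4, 0.6]])
mu0 = np.array([1.0, 0.0, 0.0])          # start in state 0

# Chapman-Kolmogorov (4.7) in matrix form: P^{m+n} = P^m P^n
assert np.allclose(np.linalg.matrix_power(P, 12),
                   np.linalg.matrix_power(P, 5) @ np.linalg.matrix_power(P, 7))

# Evolution equation (4.8): mu^(n) = mu^(0) P^n
mun = mu0 @ np.linalg.matrix_power(P, 25)
print(mun, mun.sum())                    # a probability vector; sums to 1
```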

Consider now a continuous time Markov chain with transition probability

\[
p_{ij}(s,t) = \mathbb{P}(X_t = j | X_s = i), \qquad s \le t.
\]

If the chain is homogeneous, then

\[
p_{ij}(s,t) = p_{ij}(0, t-s) \quad \text{for all } i, j, s, t.
\]

In particular, $p_{ij}(t) = \mathbb{P}(X_t = j | X_0 = i)$. The Chapman-Kolmogorov equation for a continuous time Markov chain is

\[
\frac{dp_{ij}}{dt} = \sum_k p_{ik}(t) g_{kj}, \tag{4.9}
\]

where the matrix $G = \{g_{ij}\}$ is called the generator of the Markov chain. Equation (4.9) can also be written in matrix notation:

\[
\frac{dP_t}{dt} = P_t G.
\]


The generator of the Markov chain is defined as

\[
G = \lim_{h \to 0} \frac{1}{h}(P_h - I).
\]

Let now $\mu_t^i = \mathbb{P}(X_t = i)$. The vector $\mu_t$ is the distribution of the Markov chain at time $t$. We can study its evolution using the equation

\[
\mu_t = \mu_0 P_t.
\]

Thus, as in the case of discrete time Markov chains, the evolution of a continuous time Markov chain is completely determined by the initial distribution and the transition matrix.
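For a finite state space the transition semigroup can be computed directly as the matrix exponential $P_t = e^{tG}$; a sketch with a hypothetical generator (nonnegative off-diagonal rates, rows summing to zero), using SciPy's `expm`:

```python
import numpy as np
from scipy.linalg import expm

# Hypothetical generator: nonnegative off-diagonal rates, rows sum to zero.
G = np.array([[-1.0,  1.0,  0.0],
              [ 0.5, -1.5,  1.0],
              [ 0.0,  2.0, -2.0]])

def Pt(t):
    """Transition matrix P_t = exp(tG), which solves dP_t/dt = P_t G."""
    return expm(t * G)

assert np.allclose(Pt(1.0).sum(axis=1), 1.0)    # each P_t is stochastic
assert np.allclose(Pt(1.5), Pt(1.0) @ Pt(0.5))  # semigroup property

mu0 = np.array([1.0, 0.0, 0.0])
print(mu0 @ Pt(2.0))                            # the law mu_t = mu_0 P_t at t = 2
```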

Consider now the case of a continuous time Markov process with continuous state space and with continuous paths. As we have seen in Example 4.1.3, the Brownian motion is an example of such a process. It is a standard result in the theory of partial differential equations that the conditional probability density of the Brownian motion (4.4) is the fundamental solution of the diffusion equation:

\[
\frac{\partial p}{\partial t} = \frac{1}{2} \frac{\partial^2 p}{\partial y^2}, \qquad \lim_{t \to s} p(y,t|x,s) = \delta(y - x). \tag{4.10}
\]

Exercise 4.1.5. Show that (4.4) is the solution of the initial value problem (4.10) as well as of the final value problem

\[
-\frac{\partial p}{\partial s} = \frac{1}{2} \frac{\partial^2 p}{\partial x^2}, \qquad \lim_{s \to t} p(y,t|x,s) = \delta(y - x).
\]

Similarly, the conditional distribution of the OU process satisfies the initial value problem

\[
\frac{\partial p}{\partial t} = \frac{\partial (yp)}{\partial y} + \frac{\partial^2 p}{\partial y^2}, \qquad \lim_{t \to s} p(y,t|x,s) = \delta(y - x). \tag{4.11}
\]

The Brownian motion and the OU process are examples of a diffusion process. A diffusion process is a continuous time Markov process with continuous paths. We will see in Chapter 5 that the conditional probability density $p(y,t|x,s)$ of a diffusion process satisfies the forward Kolmogorov or Fokker-Planck equation

\[
\frac{\partial p}{\partial t} = -\frac{\partial}{\partial y}(a(y,t)p) + \frac{1}{2}\frac{\partial^2}{\partial y^2}(b(y,t)p), \qquad \lim_{t \to s} p(y,t|x,s) = \delta(y - x), \tag{4.12}
\]

as well as the backward Kolmogorov equation

\[
-\frac{\partial p}{\partial s} = a(x,s)\frac{\partial p}{\partial x} + \frac{1}{2}b(x,s)\frac{\partial^2 p}{\partial x^2}, \qquad \lim_{t \to s} p(y,t|x,s) = \delta(y - x), \tag{4.13}
\]

for appropriate functions $a(y,t)$, $b(y,t)$, the drift and diffusion coefficients. Hence, a diffusion process is determined uniquely from these two functions.

Exercise 4.1.6. Use (4.5) to show that the forward and backward Kolmogorov equations for the OU process are

\[
\frac{\partial p}{\partial t} = \frac{\partial}{\partial y}(yp) + \frac{\partial^2 p}{\partial y^2}
\]

and

\[
-\frac{\partial p}{\partial s} = -x\frac{\partial p}{\partial x} + \frac{\partial^2 p}{\partial x^2}.
\]

(For this process $a(y,t) = -y$ and $b(y,t) = 2$, consistently with (4.11).)


4.2 Definition of a Markov Process

In Section 4.1 we gave the definition of a Markov process whose time is either discrete or continuous, and whose state space is the set of integers. We also gave several examples of Markov chains as well as of processes whose state space is the real line. In this section we give the precise definition of a Markov process with $t \in T$, a general index set, and $S = E$, an arbitrary metric space. We will use this formulation in the next section to derive the Chapman-Kolmogorov equation.

In order to state the definition of a continuous-time Markov process that takes values in a metric space we need to introduce various new concepts. For the definition of a Markov process we need to use the conditional expectation of the stochastic process conditioned on all past values. We can encode all past information about a stochastic process into an appropriate collection of $\sigma$-algebras. Our setting will be that we have a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ and an ordered set $T$. Let $X = X_t(\omega)$ be a stochastic process from the sample space $(\Omega, \mathcal{F})$ to the state space $(E, \mathcal{G})$, where $E$ is a metric space (we will usually take $E$ to be either $\mathbb{R}$ or $\mathbb{R}^d$). Remember that the stochastic process is a function of two variables, $t \in T$ and $\omega \in \Omega$. We start with the definition of a $\sigma$-algebra generated by a collection of sets.

Definition 4.2.1. Let $K$ be a collection of subsets of $\Omega$. The smallest $\sigma$-algebra on $\Omega$ which contains $K$ is denoted by $\sigma(K)$ and is called the $\sigma$-algebra generated by $K$.

Definition 4.2.2. Let $X_t : \Omega \mapsto E$, $t \in T$. The smallest $\sigma$-algebra $\sigma(X_t, t \in T)$, such that the family of mappings $\{X_t, t \in T\}$ is a stochastic process with sample space $(\Omega, \sigma(X_t, t \in T))$ and state space $(E, \mathcal{G})$, is called the $\sigma$-algebra generated by $\{X_t, t \in T\}$.

In other words, the $\sigma$-algebra generated by $X_t$ is the smallest $\sigma$-algebra such that $X_t$ is a measurable function (random variable) with respect to it: the set

\[
\left\{ \omega \in \Omega : X_t(\omega) \le x \right\} \in \sigma(X_t, t \in T)
\]

for all $x \in \mathbb{R}$ (we have assumed that $E = \mathbb{R}$).

Definition 4.2.3. A filtration on $(\Omega, \mathcal{F})$ is a nondecreasing family $\{\mathcal{F}_t, t \in T\}$ of sub-$\sigma$-algebras of $\mathcal{F}$: $\mathcal{F}_s \subseteq \mathcal{F}_t \subseteq \mathcal{F}$ for $s \le t$.

We set $\mathcal{F}_\infty = \sigma(\cup_{t \in T} \mathcal{F}_t)$. The filtration generated by $X_t$, where $X_t$ is a stochastic process, is

\[
\mathcal{F}_t^X := \sigma(X_s; \, s \le t).
\]

Definition 4.2.4. A stochastic process $\{X_t; t \in T\}$ is adapted to the filtration $\{\mathcal{F}_t\} := \{\mathcal{F}_t, t \in T\}$ if for all $t \in T$, $X_t$ is an $\mathcal{F}_t$-measurable random variable.

Definition 4.2.5. Let $X_t$ be a stochastic process defined on a probability space $(\Omega, \mathcal{F}, \mu)$ with values in $E$ and let $\mathcal{F}_t^X$ be the filtration generated by $X_t$. Then $X_t$ is a Markov process if

\[
\mathbb{P}(X_t \in \Gamma | \mathcal{F}_s^X) = \mathbb{P}(X_t \in \Gamma | X_s) \tag{4.14}
\]

for all $t, s \in T$ with $t \ge s$, and $\Gamma \in \mathcal{B}(E)$.


Remark 4.2.6. The filtration $\mathcal{F}_t^X$ is generated by events of the form $\{\omega \,|\, X_{s_1} \in B_1, X_{s_2} \in B_2, \dots, X_{s_n} \in B_n\}$, with $0 \le s_1 < s_2 < \dots < s_n \le s$ and $B_i \in \mathcal{B}(E)$. The definition of a Markov process is thus equivalent to the hierarchy of equations

\[
\mathbb{P}(X_t \in \Gamma | X_{t_1}, X_{t_2}, \dots, X_{t_n}) = \mathbb{P}(X_t \in \Gamma | X_{t_n}) \quad \text{a.s.}
\]

for $n \ge 1$, $0 \le t_1 < t_2 < \dots < t_n \le t$ and $\Gamma \in \mathcal{B}(E)$.

Roughly speaking, the statistics of $X_t$ for $t \ge s$ are completely determined once $X_s$ is known; information about $X_t$ for $t < s$ is superfluous. In other words: a Markov process has no memory. More precisely: when a Markov process is conditioned on the present state, there is no memory of the past. The past and future of a Markov process are statistically independent when the present is known.

Remark 4.2.7. A non-Markovian process $X_t$ can be described through a Markovian one $Y_t$ by enlarging the state space: the additional variables that we introduce account for the memory in $X_t$. This "Markovianization" trick is very useful since there are many more tools for analyzing Markovian processes.

Example 4.2.8. The velocity of a Brownian particle is modeled by the stationary Ornstein-Uhlenbeck process $Y_t = e^{-t}W(e^{2t})$. The particle position is given by the integral of the OU process (we take $X_0 = 0$):

\[
X_t = \int_0^t Y_s \, ds.
\]

The particle position depends on the past of the OU process and, consequently, is not a Markov process. However, the joint position-velocity process $\{X_t, Y_t\}$ is. Its transition probability density $p(x, y, t|x_0, y_0)$ satisfies the forward Kolmogorov equation

\[
\frac{\partial p}{\partial t} = -y\frac{\partial p}{\partial x} + \frac{\partial}{\partial y}(yp) + \frac{\partial^2 p}{\partial y^2}.
\]
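The joint process can be simulated with the Euler-Maruyama scheme, assuming (consistently with (4.5)) that the OU velocity solves the SDE $dY = -Y\,dt + \sqrt{2}\,dW$. For the stationary velocity one can check the closed-form variance $\mathrm{Var}(X_T) = 2(T - 1 + e^{-T})$, which follows by integrating the correlation function $e^{-|t-s|}$ over $[0,T]^2$:

```python
import numpy as np

rng = np.random.default_rng(1)
n_paths, dt, T = 20_000, 0.01, 2.0

Y = rng.normal(0.0, 1.0, n_paths)      # Y_0 drawn from the stationary N(0,1)
X = np.zeros(n_paths)
for _ in range(int(T / dt)):
    X += Y * dt                        # X_t = int_0^t Y_s ds
    Y += -Y * dt + np.sqrt(2 * dt) * rng.normal(size=n_paths)

var_exact = 2 * (T - 1 + np.exp(-T))   # Var(X_T) = 2(T - 1 + e^{-T})
print(X.var(), var_exact)              # both close to 2.27
```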

4.3 The Chapman-Kolmogorov Equation

With a Markov process $X_t$ we can associate a function $P : T \times T \times E \times \mathcal{B}(E) \to \mathbb{R}^+$ defined through the relation

\[
\mathbb{P}\left[ X_t \in \Gamma \,|\, \mathcal{F}_s^X \right] = P(s, t, X_s, \Gamma),
\]

for all $t, s \in T$ with $t \ge s$ and all $\Gamma \in \mathcal{B}(E)$. Assume that $X_s = x$. Since $\mathbb{P}\left[ X_t \in \Gamma \,|\, \mathcal{F}_s^X \right] = \mathbb{P}\left[ X_t \in \Gamma \,|\, X_s \right]$ we can write

\[
P(\Gamma, t|x, s) = \mathbb{P}\left[ X_t \in \Gamma \,|\, X_s = x \right].
\]

The transition function $P(\Gamma, t|x, s)$ is (for fixed $t$, $x$, $s$) a probability measure on $E$ with $P(E, t|x, s) = 1$; it is $\mathcal{B}(E)$-measurable in $x$ (for fixed $t$, $s$, $\Gamma$) and satisfies the Chapman-Kolmogorov equation

\[
P(\Gamma, t|x, s) = \int_E P(\Gamma, t|y, u) P(dy, u|x, s) \tag{4.15}
\]

for all $x \in E$, $\Gamma \in \mathcal{B}(E)$ and $s, u, t \in T$ with $s \le u \le t$. The derivation of the Chapman-Kolmogorov equation is based on the assumption of Markovianity and on properties of the conditional probability. Let


$(\Omega, \mathcal{F}, \mu)$ be a probability space, $X$ a random variable from $(\Omega, \mathcal{F}, \mu)$ to $(E, \mathcal{G})$ and let $\mathcal{F}_1 \subset \mathcal{F}_2 \subset \mathcal{F}$. Then

\[
\mathbb{E}(\mathbb{E}(X|\mathcal{F}_2)|\mathcal{F}_1) = \mathbb{E}(\mathbb{E}(X|\mathcal{F}_1)|\mathcal{F}_2) = \mathbb{E}(X|\mathcal{F}_1). \tag{4.16}
\]

Given $\mathcal{G} \subset \mathcal{F}$ we define the function $P_X(B|\mathcal{G}) = \mathbb{P}(X \in B|\mathcal{G})$ for $B \in \mathcal{F}$. Assume that $f$ is such that $\mathbb{E}(f(X)) < \infty$. Then

\[
\mathbb{E}(f(X)|\mathcal{G}) = \int_{\mathbb{R}} f(x) P_X(dx|\mathcal{G}). \tag{4.17}
\]

Now we use the Markov property, together with equations (4.16) and (4.17) and the fact that $s < u \Rightarrow \mathcal{F}_s^X \subset \mathcal{F}_u^X$, to calculate:

\begin{align*}
P(\Gamma, t|x, s) &:= \mathbb{P}(X_t \in \Gamma | X_s = x) = \mathbb{P}(X_t \in \Gamma | \mathcal{F}_s^X) \\
&= \mathbb{E}(I_\Gamma(X_t)|\mathcal{F}_s^X) = \mathbb{E}(\mathbb{E}(I_\Gamma(X_t)|\mathcal{F}_u^X)|\mathcal{F}_s^X) \\
&= \mathbb{E}(\mathbb{P}(X_t \in \Gamma|X_u)|\mathcal{F}_s^X) \\
&= \mathbb{E}(\mathbb{P}(X_t \in \Gamma|X_u = y)|X_s = x) \\
&= \int_{\mathbb{R}} P(\Gamma, t|X_u = y) P(dy, u|X_s = x) \\
&=: \int_{\mathbb{R}} P(\Gamma, t|y, u) P(dy, u|x, s).
\end{align*}

$I_\Gamma(\cdot)$ denotes the indicator function of the set $\Gamma$. We have also set $E = \mathbb{R}$. The CK equation is an integral equation and is the fundamental equation in the theory of Markov processes. Under additional assumptions we will derive from it the Fokker-Planck PDE, which is the fundamental equation in the theory of diffusion processes, and will be the main object of study in this course.

Definition 4.3.1. A Markov process is homogeneous if

\[
P(\Gamma, t|X_s = x) := P(s, t, x, \Gamma) = P(0, t-s, x, \Gamma).
\]

We set $P(0, t, \cdot, \cdot) = P(t, \cdot, \cdot)$. The Chapman-Kolmogorov (CK) equation becomes

\[
P(t+s, x, \Gamma) = \int_E P(s, x, dz) P(t, z, \Gamma). \tag{4.18}
\]

Let $X_t$ be a homogeneous Markov process and assume that the initial distribution of $X_t$ is given by the probability measure $\nu(\Gamma) = \mathbb{P}(X_0 \in \Gamma)$ (for deterministic initial conditions $X_0 = x$ we have that $\nu(\Gamma) = I_\Gamma(x)$). The transition function $P(x, t, \Gamma)$ and the initial distribution $\nu$ determine the finite dimensional distributions of $X$ by

\[
\mathbb{P}(X_0 \in \Gamma_0, X_{t_1} \in \Gamma_1, \dots, X_{t_n} \in \Gamma_n)
= \int_{\Gamma_0} \int_{\Gamma_1} \dots \int_{\Gamma_{n-1}} P(t_n - t_{n-1}, y_{n-1}, \Gamma_n) P(t_{n-1} - t_{n-2}, y_{n-2}, dy_{n-1}) \cdots P(t_1, y_0, dy_1) \, \nu(dy_0). \tag{4.19}
\]

Theorem 4.3.2. ([3, Sec. 4.1]) Let $P(t, x, \Gamma)$ satisfy (4.18) and assume that $(E, \rho)$ is a complete separable metric space. Then there exists a Markov process $X$ in $E$ whose finite-dimensional distributions are uniquely determined by (4.19).


Let $X_t$ be a homogeneous Markov process with initial distribution $\nu(\Gamma) = \mathbb{P}(X_0 \in \Gamma)$ and transition function $P(x, t, \Gamma)$. We can calculate the probability of finding $X_t$ in a set $\Gamma$ at time $t$:

\[
\mathbb{P}(X_t \in \Gamma) = \int_E P(x, t, \Gamma) \, \nu(dx).
\]

Thus, the initial distribution and the transition function are sufficient to characterize a homogeneous Markov process. Notice that they do not provide us with any information about the actual paths of the Markov process. The transition probability $P(\Gamma, t|x, s)$ is a probability measure. Assume that it has a density for all $t > s$:

\[
P(\Gamma, t|x, s) = \int_\Gamma p(y, t|x, s) \, dy.
\]

Clearly, for $t = s$ we have $P(\Gamma, s|x, s) = I_\Gamma(x)$. The Chapman-Kolmogorov equation becomes

\[
\int_\Gamma p(y, t|x, s) \, dy = \int_{\mathbb{R}} \int_\Gamma p(y, t|z, u) p(z, u|x, s) \, dz \, dy,
\]

and, since $\Gamma \in \mathcal{B}(\mathbb{R})$ is arbitrary, we obtain the equation

\[
p(y, t|x, s) = \int_{\mathbb{R}} p(y, t|z, u) p(z, u|x, s) \, dz. \tag{4.20}
\]

The transition probability density is a function of 4 arguments: the initial position and time $x, s$ and the final position and time $y, t$.
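Equation (4.20) can be verified numerically for the Brownian transition density (4.4): replacing the $z$-integral by a Riemann sum on a wide grid reproduces the left-hand side to high accuracy. A minimal sketch (the grid limits and the values of $x, s, u, t, y$ are arbitrary choices):

```python
import numpy as np

def p(y, t, x, s):
    """Brownian transition density (4.4)."""
    return np.exp(-(y - x) ** 2 / (2 * (t - s))) / np.sqrt(2 * np.pi * (t - s))

x, s, u, t, y = 0.3, 0.0, 0.7, 2.0, 1.1
z = np.linspace(-15.0, 15.0, 20_001)       # grid for the intermediate state z
dz = z[1] - z[0]

lhs = p(y, t, x, s)
rhs = np.sum(p(y, t, z, u) * p(z, u, x, s)) * dz   # Riemann sum over z
print(lhs, rhs)                             # both ≈ 0.2404
```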

In words, the CK equation tells us that, for a Markov process, the transition from $x, s$ to $y, t$ can be done in two steps: first the system moves from $x$ to $z$ at some intermediate time $u$; then it moves from $z$ to $y$ at time $t$. In order to calculate the probability for the transition from $(x, s)$ to $(y, t)$ we need to sum (integrate) the transitions from all possible intermediary states $z$. The above description suggests that a Markov process can be described through a semigroup of operators, i.e. a one-parameter family of linear operators with the properties

\[
P_0 = I, \qquad P_{t+s} = P_t \circ P_s \quad \forall \, t, s \ge 0.
\]

A semigroup of operators is characterized through its generator.

Indeed, let $P(t, x, dy)$ be the transition function of a homogeneous Markov process. It satisfies the CK equation (4.18):

\[
P(t+s, x, \Gamma) = \int_E P(s, x, dz) P(t, z, \Gamma).
\]

Let $X := C_b(E)$ and define the operator

\[
(P_t f)(x) := \mathbb{E}(f(X_t)|X_0 = x) = \int_E f(y) P(t, x, dy).
\]

This is a linear operator with

\[
(P_0 f)(x) = \mathbb{E}(f(X_0)|X_0 = x) = f(x) \quad \Rightarrow \quad P_0 = I.
\]


Furthermore:

\begin{align*}
(P_{t+s} f)(x) &= \int f(y) P(t+s, x, dy) \\
&= \int \int f(y) P(s, z, dy) P(t, x, dz) \\
&= \int \left( \int f(y) P(s, z, dy) \right) P(t, x, dz) \\
&= \int (P_s f)(z) P(t, x, dz) \\
&= (P_t \circ P_s f)(x).
\end{align*}

Consequently: $P_{t+s} = P_t \circ P_s$.

4.4 The Generator of a Markov Process

• Let $(E, \rho)$ be a metric space and let $X_t$ be an $E$-valued homogeneous Markov process. Define the one parameter family of operators $P_t$ through

\[
P_t f(x) = \int f(y) P(t, x, dy) = \mathbb{E}[f(X_t)|X_0 = x]
\]

for all $f(x) \in C_b(E)$ (continuous bounded functions on $E$).

• Assume for simplicity that $P_t : C_b(E) \to C_b(E)$. Then the one-parameter family of operators $P_t$ forms a semigroup of operators on $C_b(E)$.

• We define by $D(\mathcal{L})$ the set of all $f \in C_b(E)$ such that the strong limit

\[
\mathcal{L}f = \lim_{t \to 0} \frac{P_t f - f}{t}
\]

exists.

Definition 4.4.1. The operator $\mathcal{L} : D(\mathcal{L}) \to C_b(E)$ is called the infinitesimal generator of the operator semigroup $P_t$.

Definition 4.4.2. The operator $\mathcal{L} : C_b(E) \to C_b(E)$ defined above is called the generator of the Markov process $X_t$.

• The study of operator semigroups started in the late 1940s independently with Hille and Yosida. Semigroup theory was developed in the 1950s and 1960s by Feller, Dynkin and others, mostly in connection with the theory of Markov processes.

• Necessary and sufficient conditions for an operator $\mathcal{L}$ to be the generator of a (contraction) semigroup are given by the Hille-Yosida theorem (e.g. Evans, Partial Differential Equations, AMS 1998, Ch. 7).


• The semigroup property and the definition of the generator of a semigroup imply that, formally at least, we can write:

\[
P_t = \exp(\mathcal{L}t).
\]

• Consider the function $u(x, t) := (P_t f)(x)$. We calculate its time derivative:

\[
\frac{\partial u}{\partial t} = \frac{d}{dt}(P_t f) = \frac{d}{dt}\left( e^{\mathcal{L}t} f \right) = \mathcal{L}\left( e^{\mathcal{L}t} f \right) = \mathcal{L} P_t f = \mathcal{L}u.
\]

• Furthermore, $u(x, 0) = P_0 f(x) = f(x)$. Consequently, $u(x, t)$ satisfies the initial value problem

\[
\frac{\partial u}{\partial t} = \mathcal{L}u, \qquad u(x, 0) = f(x). \tag{4.21}
\]

• When the semigroup $P_t$ is the transition semigroup of a Markov process $X_t$, then equation (4.21) is called the backward Kolmogorov equation. It governs the evolution of an observable

\[
u(x, t) = \mathbb{E}(f(X_t)|X_0 = x).
\]

• Thus, given the generator of a Markov process $\mathcal{L}$, we can calculate all the statistics of our process by solving the backward Kolmogorov equation.

• In the case where the Markov process is the solution of a stochastic differential equation, the generator is a second order elliptic operator and the backward Kolmogorov equation becomes an initial value problem for a parabolic PDE.

• The space $C_b(E)$ is natural in a probabilistic context, but other Banach spaces often arise in applications; in particular when there is a measure $\mu$ on $E$, the spaces $L^p(E; \mu)$ sometimes arise. We will quite often use the space $L^2(E; \mu)$, where $\mu$ is the invariant measure of our Markov process.

• The generator is frequently taken as the starting point for the definition of a homogeneous Markov process.

• Conversely, let $P_t$ be a contraction semigroup (let $X$ be a Banach space and $T : X \to X$ a bounded operator; then $T$ is a contraction provided that $\|Tf\|_X \le \|f\|_X$ $\forall f \in X$), with $D(P_t) \subset C_b(E)$, closed. Then, under mild technical hypotheses, there is an $E$-valued homogeneous Markov process $X_t$ associated with $P_t$ defined through

\[
\mathbb{E}[f(X(t))|\mathcal{F}_s^X] = P_{t-s} f(X(s))
\]

for all $t, s \in T$ with $t \ge s$ and $f \in D(P_t)$.

Example 4.4.3. The Poisson process is a homogeneous Markov process.


Example 4.4.4. The one dimensional Brownian motion is a homogeneous Markov process. The transition function is the Gaussian defined in the example in Lecture 2:

\[
P(t, x, dy) = \gamma_{t,x}(y) \, dy, \qquad \gamma_{t,x}(y) = \frac{1}{\sqrt{2\pi t}} \exp\left( -\frac{|x-y|^2}{2t} \right).
\]

The semigroup associated to the standard Brownian motion is the heat semigroup $P_t = e^{\frac{t}{2}\frac{d^2}{dx^2}}$. The generator of this Markov process is $\frac{1}{2}\frac{d^2}{dx^2}$.

• Notice that the transition probability density $\gamma_{t,x}$ of the one dimensional Brownian motion is the fundamental solution (Green's function) of the heat (diffusion) PDE

\[
\frac{\partial u}{\partial t} = \frac{1}{2}\frac{\partial^2 u}{\partial x^2}.
\]
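The action of the heat semigroup can be checked numerically: since $(P_t f)(x) = \mathbb{E}f(x + \sqrt{t}Z)$ with $Z \sim \mathcal{N}(0,1)$, the sine is an eigenfunction, $P_t \sin = e^{-t/2}\sin$. A sketch using simple quadrature against the Gaussian kernel (the values of $t$ and $x$ are arbitrary choices):

```python
import numpy as np

t, x = 0.8, 1.3
y = np.linspace(-12.0, 12.0, 40_001)
dy = y[1] - y[0]
gauss = np.exp(-y ** 2 / (2 * t)) / np.sqrt(2 * np.pi * t)   # N(0, t) density

# (P_t sin)(x) = integral of sin(x + y) against the heat kernel gamma_{t,0}(y)
Ptf = np.sum(np.sin(x + y) * gauss) * dy
print(Ptf, np.exp(-t / 2) * np.sin(x))      # both ≈ 0.6459
```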

4.5 The Adjoint Semigroup

• The semigroup $P_t$ acts on bounded measurable functions.

• We can also define the adjoint semigroup $P_t^*$ which acts on probability measures:

\[
P_t^* \mu(\Gamma) = \int_{\mathbb{R}} \mathbb{P}(X_t \in \Gamma|X_0 = x) \, d\mu(x) = \int_{\mathbb{R}} P(t, x, \Gamma) \, d\mu(x).
\]

• The image of a probability measure $\mu$ under $P_t^*$ is again a probability measure. The operators $P_t$ and $P_t^*$ are adjoint in the $L^2$-sense:

\[
\int_{\mathbb{R}} P_t f(x) \, d\mu(x) = \int_{\mathbb{R}} f(x) \, d(P_t^* \mu)(x). \tag{4.22}
\]

• We can, formally at least, write

\[
P_t^* = \exp(\mathcal{L}^* t),
\]

where $\mathcal{L}^*$ is the $L^2$-adjoint of the generator of the process:

\[
\int \mathcal{L}f \, h \, dx = \int f \mathcal{L}^* h \, dx.
\]

• Let $\mu_t := P_t^* \mu$. This is the law of the Markov process and $\mu$ is the initial distribution. An argument similar to the one used in the derivation of the backward Kolmogorov equation (4.21) enables us to obtain an equation for the evolution of $\mu_t$:

\[
\frac{\partial \mu_t}{\partial t} = \mathcal{L}^* \mu_t, \qquad \mu_0 = \mu.
\]

• Assuming that $\mu_t = \rho(y, t) \, dy$ and $\mu = \rho_0(y) \, dy$, this equation becomes:

\[
\frac{\partial \rho}{\partial t} = \mathcal{L}^* \rho, \qquad \rho(y, 0) = \rho_0(y). \tag{4.23}
\]


• This is the forward Kolmogorov or Fokker-Planck equation. When the initial conditions are deterministic, $X_0 = x$, the initial condition becomes $\rho_0 = \delta(y - x)$.

• Given the initial distribution and the generator of the Markov process $X_t$, we can calculate the transition probability density by solving the forward Kolmogorov equation. We can then calculate all statistical quantities of this process through the formula

\[
\mathbb{E}(f(X_t)|X_0 = x) = \int f(y) \rho(t, y; x) \, dy.
\]

• We will derive rigorously the backward and forward Kolmogorov equations for Markov processes that are defined as solutions of stochastic differential equations later on.

• We can study the evolution of a Markov process in two different ways: either through the evolution of observables (Heisenberg/Koopman)

\[
\frac{\partial (P_t f)}{\partial t} = \mathcal{L}(P_t f),
\]

or through the evolution of states (Schrödinger/Frobenius-Perron)

\[
\frac{\partial (P_t^* \mu)}{\partial t} = \mathcal{L}^*(P_t^* \mu).
\]

• We can also study Markov processes at the level of trajectories. We will do this after we define the concept of a stochastic differential equation.

4.6 Ergodic Markov processes

• A very important concept in the study of limit theorems for stochastic processes is that of ergodicity.

• This concept, in the context of Markov processes, provides us with information on the long-time behavior of a Markov semigroup.

Definition 4.6.1. A Markov process is called ergodic if the equation

\[
P_t g = g, \qquad g \in C_b(E), \ \forall t \ge 0
\]

has only constant solutions.

• Roughly speaking, ergodicity corresponds to the case where the semigroup $P_t$ is such that $P_t - I$ has only constants in its null space, or, equivalently, to the case where the generator $\mathcal{L}$ has only constants in its null space. This follows from the definition of the generator of a Markov process.

• Under some additional compactness assumptions, an ergodic Markov process has an invariant measure $\mu$ with the property that, in the case $T = \mathbb{R}^+$,

\[
\lim_{t \to +\infty} \frac{1}{t} \int_0^t g(X_s) \, ds = \mathbb{E}g(x),
\]


• where $\mathbb{E}$ denotes the expectation with respect to $\mu$.

• This is a physicist's definition of an ergodic process: time averages equal phase space averages.

• Using the adjoint semigroup we can define an invariant measure as the solution of the equation

\[
P_t^* \mu = \mu.
\]

• If this measure is unique, then the Markov process is ergodic.

• Using this, we can obtain an equation for the invariant measure in terms of the adjoint $\mathcal{L}^*$ of the generator, which is the generator of the semigroup $P_t^*$. Indeed, from the definition of the generator of a semigroup and the definition of an invariant measure, we conclude that a measure $\mu$ is invariant if and only if

\[
\mathcal{L}^* \mu = 0
\]

in some appropriate generalized sense ($(\mathcal{L}^* \mu, f) = 0$ for every bounded measurable function $f$).

• Assume that $\mu(dx) = \rho(x) \, dx$. Then the invariant density satisfies the stationary Fokker-Planck equation

\[
\mathcal{L}^* \rho = 0.
\]

• The invariant measure (distribution) governs the long-time dynamics of the Markov process.

4.7 Stationary Markov Processes

• If $X_0$ is distributed according to $\mu$, then so is $X_t$ for all $t > 0$. The resulting stochastic process, with $X_0$ distributed in this way, is stationary.

• In this case the transition probability density (the solution of the Fokker-Planck equation) is independent of time: $\rho(x, t) = \rho(x)$.

• Consequently, the statistics of the Markov process are independent of time.

Example 4.7.1. The one dimensional Brownian motion is not an ergodic process: the null space of the generator $\mathcal{L} = \frac{1}{2}\frac{d^2}{dx^2}$ on $\mathbb{R}$ is not one dimensional!

Example 4.7.2. Consider a one-dimensional Brownian motion on $[0, 1]$, with periodic boundary conditions. The generator of this Markov process is the differential operator $\mathcal{L} = \frac{1}{2}\frac{d^2}{dx^2}$, equipped with periodic boundary conditions on $[0, 1]$. This operator is self-adjoint. The null space of both $\mathcal{L}$ and $\mathcal{L}^*$ comprises constant functions on $[0, 1]$. Both the backward Kolmogorov and the Fokker-Planck equation reduce to the heat equation

\[
\frac{\partial \rho}{\partial t} = \frac{1}{2}\frac{\partial^2 \rho}{\partial x^2}
\]

with periodic boundary conditions in $[0, 1]$. Fourier analysis shows that the solution converges to a constant at an exponential rate.
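The exponential convergence to a constant can be seen numerically by solving the heat equation with a Fourier (FFT) spectral method: the $k$-th Fourier mode decays like $e^{-2\pi^2 k^2 t}$. A sketch (the initial density is an arbitrary choice):

```python
import numpy as np

N = 256
x = np.arange(N) / N
rho0 = 1.0 + 0.8 * np.cos(2 * np.pi * x)        # initial density on [0,1]

def heat(rho, t):
    """Solve drho/dt = (1/2) d^2 rho/dx^2, periodic on [0,1], by FFT."""
    k = np.fft.fftfreq(N, d=1.0 / N)             # integer wavenumbers
    decay = np.exp(-0.5 * (2 * np.pi * k) ** 2 * t)
    return np.real(np.fft.ifft(np.fft.fft(rho) * decay))

rho1 = heat(rho0, 1.0)
# The k = 1 mode has decayed by e^{-2 pi^2} ≈ 3e-9: rho1 is essentially constant.
print(np.max(np.abs(rho1 - 1.0)))
```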


Example 4.7.3.

• The one dimensional Ornstein-Uhlenbeck (OU) process is a Markov process with generator

\[
\mathcal{L} = -\alpha x \frac{d}{dx} + D\frac{d^2}{dx^2}.
\]

• The null space of $\mathcal{L}$ comprises constants in $x$. Hence, it is an ergodic Markov process. In order to calculate the invariant measure we need to solve the stationary Fokker-Planck equation:

\[
\mathcal{L}^* \rho = 0, \qquad \rho \ge 0, \qquad \|\rho\|_{L^1(\mathbb{R})} = 1. \tag{4.24}
\]

• Let us calculate the $L^2$-adjoint of $\mathcal{L}$. Assuming that $f, h$ decay sufficiently fast at infinity, we have:

\begin{align*}
\int_{\mathbb{R}} \mathcal{L}f \, h \, dx &= \int_{\mathbb{R}} \left[ (-\alpha x \partial_x f) h + (D \partial_x^2 f) h \right] dx \\
&= \int_{\mathbb{R}} \left[ f \partial_x(\alpha x h) + f (D \partial_x^2 h) \right] dx =: \int_{\mathbb{R}} f \mathcal{L}^* h \, dx,
\end{align*}

where

\[
\mathcal{L}^* h := \frac{d}{dx}(\alpha x h) + D\frac{d^2 h}{dx^2}.
\]

• We can calculate the invariant distribution by solving equation (4.24).

• The invariant measure of this process is the Gaussian measure

\[
\mu(dx) = \sqrt{\frac{\alpha}{2\pi D}} \exp\left( -\frac{\alpha}{2D} x^2 \right) dx.
\]

• If the initial condition of the OU process is distributed according to the invariant measure, then the OU process is a stationary Gaussian process.

• Let $X_t$ be the 1d OU process and let $X_0 \sim \mathcal{N}(0, D/\alpha)$. Then $X_t$ is a mean zero, Gaussian second order stationary process on $[0, \infty)$ with correlation function

\[
R(t) = \frac{D}{\alpha} e^{-\alpha|t|}
\]

and spectral density

\[
f(x) = \frac{D}{\pi} \frac{1}{x^2 + \alpha^2}.
\]

Furthermore, the OU process is the only real-valued mean zero Gaussian second-order stationary Markov process defined on $\mathbb{R}$.
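The stationary Fokker-Planck equation (4.24) for the Gaussian invariant density can be checked symbolically; a sketch using SymPy:

```python
import sympy as sp

x = sp.symbols('x', real=True)
alpha, D = sp.symbols('alpha D', positive=True)

# Invariant Gaussian density of the OU process
rho = sp.sqrt(alpha / (2 * sp.pi * D)) * sp.exp(-alpha * x**2 / (2 * D))

# L* rho = d/dx(alpha x rho) + D d^2 rho / dx^2
Lstar_rho = sp.diff(alpha * x * rho, x) + D * sp.diff(rho, x, 2)
print(sp.simplify(Lstar_rho))   # 0
```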


Chapter 5

Diffusion Processes

5.1 Definition of a Diffusion Process

• A Markov process consists of three parts: a deterministic drift part, a random (diffusion) part and a jump part.

• A diffusion process is a Markov process that has continuous sample paths (trajectories). Thus, it is a Markov process with no jumps.

• A diffusion process can be defined by specifying its first two moments:

Definition 5.1.1. A Markov process $X_t$ with transition probability $P(\Gamma, t|x, s)$ is called a diffusion process if the following conditions are satisfied.

i. (Continuity). For every $x$ and every $\varepsilon > 0$

\[
\int_{|x-y| > \varepsilon} P(dy, t|x, s) = o(t-s) \tag{5.1}
\]

uniformly over $s < t$.

ii. (Definition of drift coefficient). There exists a function $a(x, s)$ such that for every $x$ and every $\varepsilon > 0$

\[
\int_{|y-x| \le \varepsilon} (y - x) P(dy, t|x, s) = a(x, s)(t-s) + o(t-s) \tag{5.2}
\]

uniformly over $s < t$.

iii. (Definition of diffusion coefficient). There exists a function $b(x, s)$ such that for every $x$ and every $\varepsilon > 0$

\[
\int_{|y-x| \le \varepsilon} (y - x)^2 P(dy, t|x, s) = b(x, s)(t-s) + o(t-s) \tag{5.3}
\]

uniformly over $s < t$.

Remark 5.1.2. In Definition 5.1.1 we had to truncate the domain of integration since we didn't know whether the first and second moments exist. If we assume that there exists a $\delta > 0$ such that

\[
\lim_{t \to s} \frac{1}{t-s} \int_{\mathbb{R}^d} |y - x|^{2+\delta} P(s, x, t, dy) = 0, \tag{5.4}
\]


then we can extend the integration over the whole $\mathbb{R}^d$ and use expectations in the definition of the drift and the diffusion coefficient. Indeed, let $k = 0, 1, 2$ and notice that

\begin{align*}
\int_{|y-x| > \varepsilon} |y - x|^k P(s, x, t, dy) &= \int_{|y-x| > \varepsilon} |y - x|^{2+\delta} |y - x|^{k - (2+\delta)} P(s, x, t, dy) \\
&\le \frac{1}{\varepsilon^{2+\delta-k}} \int_{|y-x| > \varepsilon} |y - x|^{2+\delta} P(s, x, t, dy) \\
&\le \frac{1}{\varepsilon^{2+\delta-k}} \int_{\mathbb{R}^d} |y - x|^{2+\delta} P(s, x, t, dy).
\end{align*}

Using this estimate together with (5.4) we conclude that:

\[
\lim_{t \to s} \frac{1}{t-s} \int_{|y-x| > \varepsilon} |y - x|^k P(s, x, t, dy) = 0, \qquad k = 0, 1, 2.
\]

This implies that assumption (5.4) is sufficient for the sample paths to be continuous ($k = 0$) and for the replacement of the truncated integrals in (5.2) and (5.3) by integrals over $\mathbb{R}$ ($k = 1$ and $k = 2$, respectively). The definitions of the drift and diffusion coefficients become:

\[
\lim_{t \to s} \mathbb{E}\left( \frac{X_t - X_s}{t-s} \,\Big|\, X_s = x \right) = a(x, s) \tag{5.5}
\]

and

\[
\lim_{t \to s} \mathbb{E}\left( \frac{|X_t - X_s|^2}{t-s} \,\Big|\, X_s = x \right) = b(x, s). \tag{5.6}
\]
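Formulas (5.5) and (5.6) suggest a way of estimating the drift and diffusion coefficients from short-time increments of sample paths. A Monte Carlo sketch, assuming the OU process $dX = -X\,dt + \sqrt{2}\,dW$ (so that $a(x) = -x$ and $b(x) = 2$) and using its exact Gaussian transition density to generate the increments:

```python
import numpy as np

rng = np.random.default_rng(2)
x, h, n = 0.5, 1e-3, 400_000

# Exact OU transition: X_{s+h} | X_s = x  ~  N(x e^{-h}, 1 - e^{-2h})
X_h = x * np.exp(-h) + np.sqrt(1.0 - np.exp(-2 * h)) * rng.normal(size=n)
incr = X_h - x

a_hat = incr.mean() / h          # estimate of a(x) via (5.5)
b_hat = (incr ** 2).mean() / h   # estimate of b(x) via (5.6)
print(a_hat, b_hat)              # close to -0.5 and 2.0
```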

5.2 The Backward and Forward Kolmogorov Equations

Under mild regularity assumptions we can show that a diffusion process is completely determined from its first two moments. In particular, we will obtain partial differential equations that govern the evolution of the conditional expectation of an arbitrary function of a diffusion process $X_t$, $u(x, s) = \mathbb{E}(f(X_t)|X_s = x)$, as well as of the transition probability density $p(y, t|x, s) = \mathbb{P}(X_t = y|X_s = x)$. These are the backward and forward Kolmogorov equations.

5.2.1 The Backward Kolmogorov Equation

Theorem 5.2.1. (Kolmogorov) Let $f(x) \in C_b(\mathbb{R})$ and let

\[
u(x, s) := \mathbb{E}(f(X_t)|X_s = x) = \int f(y) P(dy, t|x, s) \in C_b^2(\mathbb{R}).
\]

Assume furthermore that the functions $a(x, s)$, $b(x, s)$ are continuous in both $x$ and $s$. Then $u(x, s) \in C^{2,1}(\mathbb{R} \times \mathbb{R}^+)$ and it solves the final value problem

\[
-\frac{\partial u}{\partial s} = a(x, s)\frac{\partial u}{\partial x} + \frac{1}{2}b(x, s)\frac{\partial^2 u}{\partial x^2}, \qquad \lim_{s \to t} u(x, s) = f(x). \tag{5.7}
\]


Proof. First we notice that the continuity assumption (5.1), together with the fact that the function $f(x)$ is bounded, imply that

\begin{align*}
u(x, s) &= \int_{\mathbb{R}} f(y) P(dy, t|x, s) \\
&= \int_{|y-x| \le \varepsilon} f(y) P(dy, t|x, s) + \int_{|y-x| > \varepsilon} f(y) P(dy, t|x, s) \\
&\le \int_{|y-x| \le \varepsilon} f(y) P(dy, t|x, s) + \|f\|_{L^\infty} \int_{|y-x| > \varepsilon} P(dy, t|x, s) \\
&= \int_{|y-x| \le \varepsilon} f(y) P(dy, t|x, s) + o(t-s).
\end{align*}

We add and subtract the final condition $f(x)$ and use the previous calculation to obtain:

\begin{align*}
u(x, s) &= \int_{\mathbb{R}} f(y) P(dy, t|x, s) = f(x) + \int_{\mathbb{R}} (f(y) - f(x)) P(dy, t|x, s) \\
&= f(x) + \int_{|y-x| \le \varepsilon} (f(y) - f(x)) P(dy, t|x, s) + \int_{|y-x| > \varepsilon} (f(y) - f(x)) P(dy, t|x, s) \\
&= f(x) + \int_{|y-x| \le \varepsilon} (f(y) - f(x)) P(dy, t|x, s) + o(t-s).
\end{align*}

Now the final condition follows from the fact that $f(x) \in C_b(\mathbb{R})$ and the arbitrariness of $\varepsilon$.

Now we show that $u(x, s)$ solves the backward Kolmogorov equation. We use the Chapman-Kolmogorov equation (4.15) to obtain

\begin{align}
u(x, \sigma) &= \int_{\mathbb{R}} f(z) P(dz, t|x, \sigma) \tag{5.8} \\
&= \int_{\mathbb{R}} \int_{\mathbb{R}} f(z) P(dz, t|y, \rho) P(dy, \rho|x, \sigma) \notag \\
&= \int_{\mathbb{R}} u(y, \rho) P(dy, \rho|x, \sigma). \tag{5.9}
\end{align}

The Taylor series expansion of the function $u(x, \rho)$ gives

\[
u(z, \rho) - u(x, \rho) = \frac{\partial u(x, \rho)}{\partial x}(z - x) + \frac{1}{2}\frac{\partial^2 u(x, \rho)}{\partial x^2}(z - x)^2 (1 + \alpha_\varepsilon), \qquad |z - x| \le \varepsilon, \tag{5.10}
\]

where

\[
\alpha_\varepsilon = \sup_{\rho, |z-x| \le \varepsilon} \left| \frac{\partial^2 u(x, \rho)}{\partial x^2} - \frac{\partial^2 u(z, \rho)}{\partial x^2} \right|.
\]

Notice that, since $u(x, s)$ is twice continuously differentiable in $x$, $\lim_{\varepsilon \to 0} \alpha_\varepsilon = 0$.


We combine now (5.9) with (5.10) to calculate

\begin{align*}
\frac{u(x, s) - u(x, s+h)}{h} &= \frac{1}{h}\left( \int_{\mathbb{R}} P(dy, s+h|x, s) u(y, s+h) - u(x, s+h) \right) \\
&= \frac{1}{h} \int_{\mathbb{R}} P(dy, s+h|x, s)(u(y, s+h) - u(x, s+h)) \\
&= \frac{1}{h} \int_{|x-y| < \varepsilon} P(dy, s+h|x, s)(u(y, s+h) - u(x, s+h)) + o(1) \\
&= \frac{\partial u}{\partial x}(x, s+h) \, \frac{1}{h} \int_{|x-y| < \varepsilon} (y - x) P(dy, s+h|x, s) \\
&\quad + \frac{1}{2}\frac{\partial^2 u}{\partial x^2}(x, s+h) \, \frac{1}{h} \int_{|x-y| < \varepsilon} (y - x)^2 P(dy, s+h|x, s)(1 + \alpha_\varepsilon) + o(1) \\
&= a(x, s)\frac{\partial u}{\partial x}(x, s+h) + \frac{1}{2}b(x, s)\frac{\partial^2 u}{\partial x^2}(x, s+h)(1 + \alpha_\varepsilon) + o(1).
\end{align*}

Equation (5.7) follows by taking the limits $\varepsilon \to 0$, $h \to 0$.

Assume now that the transition function has a density $p(y, t|x, s)$. In this case the formula for $u(x, s)$ becomes

\[
u(x, s) = \int_{\mathbb{R}} f(y) p(y, t|x, s) \, dy.
\]

Substituting this in the backward Kolmogorov equation we obtain

\[
\int_{\mathbb{R}} f(y)\left( \frac{\partial p(y, t|x, s)}{\partial s} + \mathcal{A}_{s,x} p(y, t|x, s) \right) dy = 0, \tag{5.11}
\]

where

\[
\mathcal{A}_{s,x} := a(x, s)\frac{\partial}{\partial x} + \frac{1}{2}b(x, s)\frac{\partial^2}{\partial x^2}.
\]

Since (5.11) is valid for arbitrary functions $f(y)$, we obtain a partial differential equation for the transition probability density:

\[
-\frac{\partial p(y, t|x, s)}{\partial s} = a(x, s)\frac{\partial p(y, t|x, s)}{\partial x} + \frac{1}{2}b(x, s)\frac{\partial^2 p(y, t|x, s)}{\partial x^2}. \tag{5.12}
\]

Notice that the differentiation is with respect to the "backward" variables $x, s$. We will obtain an equation with respect to the "forward" variables $y, t$ in the next section.

5.2.2 The Forward Kolmogorov Equation

In this section we will obtain the forward Kolmogorov equation; in the physics literature it is called the Fokker-Planck equation. We assume that the transition function has a density with respect to Lebesgue measure:

\[
P(\Gamma, t|x, s) = \int_\Gamma p(y, t|x, s) \, dy.
\]


Theorem 5.2.2. (Kolmogorov) Assume that conditions (5.1), (5.2), (5.3) are satisfied and that $p(y, t|\cdot, \cdot)$, $a(y, t)$, $b(y, t) \in C^{2,1}(\mathbb{R} \times \mathbb{R}^+)$. Then the transition probability density satisfies the equation

\[
\frac{\partial p}{\partial t} = -\frac{\partial}{\partial y}(a(y, t) p) + \frac{1}{2}\frac{\partial^2}{\partial y^2}(b(y, t) p), \qquad \lim_{t \to s} p(y, t|x, s) = \delta(y - x). \tag{5.13}
\]

Proof. Fix a function $f(y) \in C_0^2(\mathbb{R})$. An argument similar to the one used in the proof of the backward Kolmogorov equation gives

\[
\lim_{h \to 0} \frac{1}{h}\left( \int f(y) p(y, s+h|x, s) \, dy - f(x) \right) = a(x, s) f_x(x) + \frac{1}{2}b(x, s) f_{xx}(x), \tag{5.14}
\]

where subscripts denote differentiation with respect to $x$. On the other hand,

\begin{align*}
\int f(y) \frac{\partial}{\partial t} p(y, t|x, s) \, dy &= \frac{\partial}{\partial t} \int f(y) p(y, t|x, s) \, dy \\
&= \lim_{h \to 0} \frac{1}{h} \int (p(y, t+h|x, s) - p(y, t|x, s)) f(y) \, dy \\
&= \lim_{h \to 0} \frac{1}{h}\left( \int p(y, t+h|x, s) f(y) \, dy - \int p(z, t|x, s) f(z) \, dz \right) \\
&= \lim_{h \to 0} \frac{1}{h}\left( \int \int p(y, t+h|z, t) p(z, t|x, s) f(y) \, dy dz - \int p(z, t|x, s) f(z) \, dz \right) \\
&= \lim_{h \to 0} \frac{1}{h} \int p(z, t|x, s)\left( \int p(y, t+h|z, t) f(y) \, dy - f(z) \right) dz \\
&= \int p(z, t|x, s)\left( a(z, t) f_z(z) + \frac{1}{2}b(z, t) f_{zz}(z) \right) dz \\
&= \int \left( -\frac{\partial}{\partial z}(a(z, t) p(z, t|x, s)) + \frac{1}{2}\frac{\partial^2}{\partial z^2}(b(z, t) p(z, t|x, s)) \right) f(z) \, dz.
\end{align*}

In the above calculation we used the Chapman-Kolmogorov equation. We have also performed two integrations by parts and used the fact that, since the test function $f$ has compact support, the boundary terms vanish. Since the above equation is valid for every test function $f(y)$, the forward Kolmogorov equation follows.

Exercise 5.2.3. Prove equation (5.14).

Assume now that the initial distribution of X_t is \rho_0(x) and set s = 0 (the initial time) in (5.13). Define
\[
p(y,t) := \int p(y,t|x,0)\rho_0(x)\,dx.
\]
We multiply the forward Kolmogorov equation (5.13) by \rho_0(x) and integrate with respect to x to obtain the equation
\[
\frac{\partial p(y,t)}{\partial t} = -\frac{\partial}{\partial y}\big(a(y,t)p(y,t)\big) + \frac{1}{2}\frac{\partial^2}{\partial y^2}\big(b(y,t)p(y,t)\big), \tag{5.15}
\]
together with the initial condition
\[
p(y,0) = \rho_0(y). \tag{5.16}
\]


The solution of equation (5.15) provides us with the probability that the diffusion process X_t, which was initially distributed according to the probability density \rho_0(x), is equal to y at time t. Alternatively, we can think of the solution to (5.13) as the Green's function for the PDE (5.15).

Quite often we need to calculate joint probability densities, for example the probability that X_{t_1} = x_1 and X_{t_2} = x_2. From the properties of conditional expectation we have that
\begin{align*}
p(x_1,t_1,x_2,t_2) &= \mathbb{P}(X_{t_1} = x_1, X_{t_2} = x_2) \\
&= \mathbb{P}(X_{t_1} = x_1 | X_{t_2} = x_2)\,\mathbb{P}(X_{t_2} = x_2) \\
&= p(x_1,t_1|x_2,t_2)\,p(x_2,t_2).
\end{align*}

Using the joint probability density we can calculate the statistics of a function of the diffusion process X_t at times t and s:
\[
\mathbb{E}\big(f(X_t,X_s)\big) = \int\!\!\int f(y,x)p(y,t|x,s)p(x,s)\,dxdy. \tag{5.17}
\]
The autocorrelation function at times t and s is given by
\[
\mathbb{E}(X_t X_s) = \int\!\!\int yx\,p(y,t|x,s)p(x,s)\,dxdy.
\]
In particular,
\[
\mathbb{E}(X_t X_0) = \int\!\!\int yx\,p(y,t|x,0)p(x,0)\,dxdy.
\]
Notice also that from (5.17) it follows that
\begin{align*}
\mathbb{E}\big(f(X_t)\big) &= \int\!\!\int f(y)p(y,t|x,0)p(x,0)\,dxdy \\
&= \int f(y)p(y,t)\,dy,
\end{align*}
where p(y,t) is the solution of (5.15).

5.3 The Fokker-Planck Equation in Arbitrary Dimensions

The drift and diffusion coefficients of a diffusion process in \mathbb{R}^d are defined as
\[
\lim_{t\to s}\frac{1}{t-s}\int_{|y-x|<\varepsilon}(y-x)\,P(dy,t|x,s) = a(x,s)
\]
and
\[
\lim_{t\to s}\frac{1}{t-s}\int_{|y-x|<\varepsilon}(y-x)\otimes(y-x)\,P(dy,t|x,s) = b(x,s).
\]
The drift coefficient a(x,s) is a d-dimensional vector field and the diffusion coefficient b(x,s) is a d \times d symmetric matrix (second order tensor). The generator of a d-dimensional diffusion process is
\begin{align*}
\mathcal{L} &= a(s,x)\cdot\nabla + \frac{1}{2}b(s,x):\nabla\nabla \\
&= \sum_{j=1}^{d} a_j\frac{\partial}{\partial x_j} + \frac{1}{2}\sum_{i,j=1}^{d} b_{ij}\frac{\partial^2}{\partial x_i\partial x_j}.
\end{align*}


Exercise 5.3.1. Derive rigorously the forward and backward Kolmogorov equations in arbitrary dimensions.

Assuming that the first and second moments of the multidimensional diffusion process exist, we can write the formulas for the drift vector and diffusion matrix as
\[
\lim_{t\to s}\mathbb{E}\left(\frac{X_t - X_s}{t-s}\,\Big|\,X_s = x\right) = a(x,s) \tag{5.18}
\]
and
\[
\lim_{t\to s}\mathbb{E}\left(\frac{(X_t - X_s)\otimes(X_t - X_s)}{t-s}\,\Big|\,X_s = x\right) = b(x,s). \tag{5.19}
\]
Notice that from the above definitions it follows that the diffusion matrix is symmetric and nonnegative definite.

5.4 Connection with Stochastic Differential Equations

Notice also that the continuity condition can be written in the form
\[
\mathbb{P}\big(|X_t - X_s| > \varepsilon \,\big|\, X_s = x\big) = o(t-s).
\]

Now it becomes clear that this condition implies that the probability of large changes in X_t over short time intervals is small. Notice, on the other hand, that the above condition implies that the sample paths of a diffusion process are not differentiable: if they were, then the right-hand side of the above equation would have to be 0 when t - s \ll 1. The sample paths of a diffusion process have the regularity of Brownian paths. A Markovian process cannot be differentiable: we can define the derivative of a sample path only for processes for which the past and future are not statistically independent when conditioned on the present.

Let us denote the expectation conditioned on X_s = x by \mathbb{E}^{s,x}. Notice that the definitions of the drift and diffusion coefficients (5.5) and (5.6) can be written in the form
\[
\mathbb{E}^{s,x}(X_t - X_s) = a(x,s)(t-s) + o(t-s)
\]
and
\[
\mathbb{E}^{s,x}\big((X_t - X_s)\otimes(X_t - X_s)\big) = b(x,s)(t-s) + o(t-s).
\]

Consequently, the drift coefficient defines the mean velocity vector for the stochastic process X_t, whereas the diffusion coefficient (tensor) is a measure of the local magnitude of fluctuations of X_t - X_s about the mean value. Hence, we can write locally:
\[
X_t - X_s \approx a(s,X_s)(t-s) + \sigma(s,X_s)\,\xi_t,
\]
where b = \sigma\sigma^T and \xi_t is a mean-zero Gaussian process with
\[
\mathbb{E}^{s,x}(\xi_t\otimes\xi_s) = (t-s)I.
\]
Since we have that W_t - W_s \sim \mathcal{N}(0,(t-s)I),


we conclude that we can write locally:
\[
\Delta X_t \approx a(s,X_s)\Delta t + \sigma(s,X_s)\Delta W_t.
\]
Or, replacing the differences by differentials:
\[
dX_t = a(t,X_t)\,dt + \sigma(t,X_t)\,dW_t.
\]
Hence, the sample paths of a diffusion process are governed by a stochastic differential equation (SDE).
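The local description above is also the basis of the simplest numerical scheme for SDEs. The following Python sketch (not from the notes) implements the Euler-Maruyama discretization; the OU-type coefficients a(t,x) = -x and sigma = sqrt(2), and all numerical parameters, are illustrative assumptions, chosen so that the short-time statistics (5.18)-(5.19) can be recovered from the simulated paths.

```python
import numpy as np

def euler_maruyama(a, sigma, x0, T, n_steps, n_paths, rng):
    """Simulate dX = a(t, X) dt + sigma(t, X) dW with the Euler-Maruyama scheme."""
    dt = T / n_steps
    X = np.full(n_paths, float(x0))
    t = 0.0
    for _ in range(n_steps):
        dW = rng.normal(0.0, np.sqrt(dt), size=n_paths)
        X = X + a(t, X) * dt + sigma(t, X) * dW
        t += dt
    return X

rng = np.random.default_rng(0)
# illustrative OU-type example: a(t, x) = -x, sigma = sqrt(2), hence b = sigma^2 = 2
X = euler_maruyama(lambda t, x: -x,
                   lambda t, x: np.sqrt(2.0) * np.ones_like(x),
                   x0=1.0, T=0.01, n_steps=100, n_paths=200_000, rng=rng)
# short-time statistics recover the drift and diffusion coefficients:
drift_est = (X.mean() - 1.0) / 0.01          # close to a(0, 1) = -1
diff_est = ((X - 1.0) ** 2).mean() / 0.01    # close to b(0, 1) = 2, up to O(t)
```

Over the short horizon T the sample mean and mean-square of the increments reproduce a(x,s)(t-s) and b(x,s)(t-s), in agreement with the local formulas above.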

5.5 Discussion and Bibliography

The argument used in the derivation of the forward and backward Kolmogorov equations goes back to Kolmogorov's original work. More material on diffusion processes can be found in [11], [14].


Chapter 6

The Fokker-Planck Equation

In the previous chapter we derived the backward and forward (Fokker-Planck) Kolmogorov equations and we showed that all statistical properties of a diffusion process can be calculated from the solution of the Fokker-Planck equation.¹ In this chapter we will study various properties of this equation, such as existence and uniqueness of solutions, long time asymptotics, boundary conditions and spectral properties of the Fokker-Planck operator. We will also study in some detail various examples. We will restrict attention to time-homogeneous diffusion processes, for which the drift and diffusion coefficients do not depend on time.

6.1 Basic Properties of the FP Equation

6.1.1 Existence and Uniqueness of Solutions

Consider a homogeneous diffusion process on \mathbb{R}^d with drift vector a(x) and diffusion matrix b(x). The Fokker-Planck equation is
\[
\frac{\partial p}{\partial t} = -\sum_{j=1}^{d}\frac{\partial}{\partial x_j}\big(a_j(x)p\big) + \frac{1}{2}\sum_{i,j=1}^{d}\frac{\partial^2}{\partial x_i\partial x_j}\big(b_{ij}(x)p\big), \quad t>0, \ x\in\mathbb{R}^d, \tag{6.1a}
\]
\[
p(x,0) = f(x), \quad x\in\mathbb{R}^d. \tag{6.1b}
\]
Since f(x) is the probability density of the initial condition (which is a random variable), we have that
\[
f(x) \geq 0, \quad \text{and} \quad \int_{\mathbb{R}^d} f(x)\,dx = 1.
\]

We can also write the equation in non-divergence form:
\[
\frac{\partial p}{\partial t} = \sum_{j=1}^{d}\tilde{a}_j(x)\frac{\partial p}{\partial x_j} + \frac{1}{2}\sum_{i,j=1}^{d}b_{ij}(x)\frac{\partial^2 p}{\partial x_i\partial x_j} + c(x)p, \quad t>0, \ x\in\mathbb{R}^d, \tag{6.2a}
\]
\[
p(x,0) = f(x), \quad x\in\mathbb{R}^d, \tag{6.2b}
\]

¹In this chapter we will call the equation the Fokker-Planck equation, which is more customary in the physics literature, rather than the forward Kolmogorov equation, which is more customary in the mathematics literature.


where
\[
\tilde{a}_i(x) = -a_i(x) + \sum_{j=1}^{d}\frac{\partial b_{ij}}{\partial x_j}, \qquad c(x) = \frac{1}{2}\sum_{i,j=1}^{d}\frac{\partial^2 b_{ij}}{\partial x_i\partial x_j} - \sum_{i=1}^{d}\frac{\partial a_i}{\partial x_i}.
\]

By definition (see equation (5.19)), the diffusion matrix is always symmetric and nonnegative. We will assume that it is actually uniformly positive definite, i.e. we will impose the uniform ellipticity condition:
\[
\sum_{i,j=1}^{d} b_{ij}(x)\xi_i\xi_j \geq \alpha\|\xi\|^2, \quad \forall\,\xi\in\mathbb{R}^d. \tag{6.3}
\]
Furthermore, we will assume that the coefficients \tilde{a}, b, c are smooth and that they satisfy the growth conditions
\[
\|b(x)\| \leq M, \quad \|\tilde{a}(x)\| \leq M(1+\|x\|), \quad \|c(x)\| \leq M(1+\|x\|^2). \tag{6.4}
\]

Definition 6.1.1. We will call a solution to the Cauchy problem for the Fokker-Planck equation (6.2) a classical solution if:

i. u \in C^{2,1}(\mathbb{R}^d,\mathbb{R}^+);

ii. for every T > 0 there exists a c > 0 such that
\[
\|u(t,x)\|_{L^\infty(0,T)} \leq c\,e^{\alpha\|x\|^2};
\]

iii. \lim_{t\to 0} u(t,x) = f(x).

It is a standard result in the theory of parabolic partial differential equations that, under the regularity and uniform ellipticity assumptions, the Fokker-Planck equation has a unique smooth solution. Furthermore, the solution can be estimated in terms of an appropriate heat kernel (i.e. the solution of the heat equation on \mathbb{R}^d).

Theorem 6.1.2. Assume that conditions (6.3) and (6.4) are satisfied, and assume that |f| \leq c\,e^{\alpha\|x\|^2}. Then there exists a unique classical solution to the Cauchy problem for the Fokker-Planck equation. Furthermore, there exist positive constants K, \delta so that
\[
|p|,\ |p_t|,\ \|\nabla p\|,\ \|D^2 p\| \leq K t^{-(n+2)/2}\exp\left(-\frac{1}{2t}\delta\|x\|^2\right). \tag{6.5}
\]

Notice that from estimates (6.5) it follows that all moments of a uniformly elliptic diffusion process exist. In particular, we can multiply the Fokker-Planck equation by monomials x^n, integrate over \mathbb{R}^d and integrate by parts. No boundary terms will appear, in view of the estimate (6.5).

Remark 6.1.3. The solution of the Fokker-Planck equation is nonnegative for all times, provided that the initial distribution is nonnegative. This follows from the maximum principle for parabolic PDEs.


6.1.2 The FP equation as a conservation law

The Fokker-Planck equation is in fact a conservation law: it expresses the law of conservation of probability. To see this we define the probability current to be the vector whose ith component is
\[
J_i := a_i(x)p - \frac{1}{2}\sum_{j=1}^{d}\frac{\partial}{\partial x_j}\big(b_{ij}(x)p\big). \tag{6.6}
\]

We use the probability current to write the Fokker-Planck equation as a continuity equation:
\[
\frac{\partial p}{\partial t} + \nabla\cdot J = 0.
\]
Integrating the FP equation over \mathbb{R}^d and integrating by parts on the right hand side of the equation we obtain
\[
\frac{d}{dt}\int_{\mathbb{R}^d} p(x,t)\,dx = 0.
\]
Consequently:
\[
\|p(\cdot,t)\|_{L^1(\mathbb{R}^d)} = \|p(\cdot,0)\|_{L^1(\mathbb{R}^d)} = 1. \tag{6.7}
\]

Hence, the total probability is conserved, as expected. Equation (6.7) simply means that
\[
\mathbb{P}(X_t \in \mathbb{R}^d) = 1, \quad t \geq 0.
\]
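The continuity-equation form also suggests a simple numerical illustration: if the FP equation is discretized in flux form (a finite-volume scheme with the current J evaluated at cell interfaces and set to zero at the boundary), the flux differences telescope and the discrete total probability is conserved exactly. The Python sketch below is illustrative and not from the notes; the OU-type coefficients a(x) = -x, b(x) = 2 and all grid parameters are assumptions.

```python
import numpy as np

# Flux-form discretization of the 1d FP equation
#   dp/dt = -dJ/dx,  J = a(x) p - (1/2) d/dx (b(x) p),
# with zero-flux boundaries, so that sum(p) * dx is conserved by telescoping.
n, dt, n_steps = 200, 1e-4, 2000
x = np.linspace(-5.0, 5.0, n)
dx = x[1] - x[0]
p = np.exp(-(x - 1.0) ** 2 / 0.1)
p /= p.sum() * dx                        # normalized initial density

a = -x                                   # assumed drift a(x) = -x
b = np.full(n, 2.0)                      # assumed diffusion b(x) = 2
for _ in range(n_steps):
    bp = b * p
    # current at the interior cell interfaces (centered averages/differences)
    J_int = 0.5 * (a[:-1] * p[:-1] + a[1:] * p[1:]) - 0.5 * (bp[1:] - bp[:-1]) / dx
    J = np.concatenate(([0.0], J_int, [0.0]))  # zero flux at both ends
    p = p - dt * (J[1:] - J[:-1]) / dx

mass = p.sum() * dx                      # stays equal to 1 up to round-off
mean = (x * p).sum() * dx                # tracks the decaying mean of the process
```

Exact discrete conservation is a property of the flux form itself, independent of the accuracy of the scheme.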

6.1.3 Boundary conditions for the Fokker–Planck equation

When studying a diffusion process that can take values on the whole of \mathbb{R}^d, we study the pure initial value (Cauchy) problem for the Fokker-Planck equation, equation (6.1). The boundary condition was that the solution decays sufficiently fast at infinity. For ergodic diffusion processes this is equivalent to requiring that the solution of the backward Kolmogorov equation is an element of L^2(\mu), where \mu is the invariant measure of the process. There are many applications where it is important to study stochastic processes in bounded domains. In this case it is necessary to specify the value of the stochastic process (or equivalently of the solution to the Fokker-Planck equation) on the boundary.

To understand the type of boundary conditions that we can impose on the Fokker-Planck equation, let us consider the example of a random walk on the domain \{0, 1, \ldots, N\}.² When the random walker reaches either the left or the right boundary we can either set

i. X_0 = 0 or X_N = 0, which means that the particle gets absorbed at the boundary;

ii. X_0 = X_1 or X_N = X_{N-1}, which means that the particle is reflected at the boundary;

iii. X_0 = X_N, which means that the particle is moving on a circle (i.e., we identify the left and right boundaries).

²Of course, the random walk is not a diffusion process. However, as we have already seen, Brownian motion can be defined as the limit of an appropriately rescaled random walk. A similar construction exists for more general diffusion processes.


Hence, we can have absorbing, reflecting or periodic boundary conditions. Consider the Fokker-Planck equation posed in \Omega \subset \mathbb{R}^d, where \Omega is a bounded domain with smooth boundary. Let J denote the probability current and let n be the unit outward pointing normal vector to the surface. The above boundary conditions become:

i. The transition probability density vanishes on an absorbing boundary:
\[
p(x,t) = 0 \quad \text{on } \partial\Omega.
\]

ii. There is no net flow of probability on a reflecting boundary:
\[
n\cdot J(x,t) = 0 \quad \text{on } \partial\Omega.
\]

iii. The transition probability density is a periodic function in the case of periodic boundary conditions.

Notice that, using the terminology customary in PDE theory, absorbing boundary conditions correspond to Dirichlet boundary conditions and reflecting boundary conditions to Neumann boundary conditions. Of course, one can consider more complicated, mixed boundary conditions.

Consider now a diffusion process in one dimension on the interval [0,L]. The boundary conditions are
\[
p(0,t) = p(L,t) = 0 \quad \text{(absorbing)},
\]
\[
J(0,t) = J(L,t) = 0 \quad \text{(reflecting)},
\]
\[
p(0,t) = p(L,t) \quad \text{(periodic)},
\]
where the probability current is defined in (6.6). An example of mixed boundary conditions would be absorbing boundary conditions at the left end and reflecting boundary conditions at the right end:
\[
p(0,t) = J(L,t) = 0.
\]

There is a complete classification of boundary conditions in one dimension, the Feller classification: the boundary conditions can be regular, exit, entrance or natural.

6.2 Examples of Diffusion Processes

6.2.1 Brownian Motion

Set a(y,t) \equiv 0, b(y,t) \equiv 2D > 0. This diffusion process is Brownian motion with diffusion coefficient D. The Fokker-Planck equation is:
\[
\frac{\partial p}{\partial t} = D\frac{\partial^2 p}{\partial y^2}, \qquad p(y,s|x,s) = \delta(x-y). \tag{6.8}
\]

As we have already seen, the solution to this equation is
\[
p_W(y,t|x,s) = \frac{1}{\sqrt{4\pi D(t-s)}}\exp\left(-\frac{(y-x)^2}{4D(t-s)}\right).
\]

64

Page 65: APPLIED STOCHASTIC PROCESSES - Imperial College …pavl/lecture_notesM4A42.pdf · APPLIED STOCHASTIC PROCESSES ... 5.2 The Backward and Forward Kolmogorov Equations ... 6.9 Brownian

Notice that using the Fokker-Planck equation for Brownian motion we can immediately show that the mean squared displacement scales linearly in time:
\begin{align*}
\frac{d}{dt}\mathbb{E}x^2 &= \frac{d}{dt}\int_{\mathbb{R}} x^2 p(x,t)\,dx \\
&= D\int_{\mathbb{R}} x^2\frac{\partial^2 p(x,t)}{\partial x^2}\,dx \\
&= 2D\int_{\mathbb{R}} p(x,t)\,dx = 2D,
\end{align*}
after two integrations by parts. Consequently,
\[
\mathbb{E}x^2 = 2Dt.
\]

Assume now that the initial distribution of the Brownian motion is \rho_0(x). The solution of the Fokker-Planck equation for Brownian motion with this initial distribution is
\[
p_W(y,t) = \int p(y,t|x,0)\rho_0(x)\,dx.
\]

We can also consider Brownian motion in a bounded domain, with either absorbing, reflecting or periodic boundary conditions. Set D = \frac{1}{2} and consider the Fokker-Planck equation (6.8) on [0,1] with absorbing boundary conditions:
\[
\frac{\partial p}{\partial t} = \frac{1}{2}\frac{\partial^2 p}{\partial x^2}, \qquad p(0,t) = p(1,t) = 0. \tag{6.9}
\]

We look for a solution to this equation in a sine Fourier series:
\[
p(x,t) = \sum_{n=1}^{\infty} p_n(t)\sin(n\pi x). \tag{6.10}
\]

Notice that the boundary conditions are automatically satisfied. The initial condition is
\[
p(x,0) = \delta(x-x_0),
\]
where we have assumed that W_0 = x_0. The Fourier coefficients of the initial condition are
\[
p_n(0) = 2\int_0^1 \delta(x-x_0)\sin(n\pi x)\,dx = 2\sin(n\pi x_0).
\]

We substitute the expansion (6.10) into (6.9) and use the orthogonality properties of the Fourier basis to obtain the equations
\[
\dot{p}_n = -\frac{n^2\pi^2}{2}p_n, \quad n = 1, 2, \ldots
\]
The solution of this equation is
\[
p_n(t) = p_n(0)e^{-\frac{n^2\pi^2}{2}t}.
\]

Consequently, the transition probability density for Brownian motion on [0,1] with absorbing boundary conditions is
\[
p(x,t|x_0,0) = 2\sum_{n=1}^{\infty} e^{-\frac{n^2\pi^2}{2}t}\sin(n\pi x_0)\sin(n\pi x).
\]


Notice that
\[
\lim_{t\to\infty} p(x,t|x_0,0) = 0.
\]
This is not surprising, since all Brownian particles will eventually get absorbed at the boundary.
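As a numerical cross-check (an illustration, not part of the notes), the sine series can be compared against the method-of-images representation of the same absorbing transition density: for D = 1/2 the free transition density at time t is Gaussian with variance t, and alternating image charges at the reflections x_0 + 2k and -x_0 + 2k enforce the absorbing conditions at 0 and 1.

```python
import numpy as np

def p_series(x, t, x0, n_terms=200):
    """Sine-series transition density for BM (D = 1/2) on [0,1], absorbing BCs."""
    n = np.arange(1, n_terms + 1)
    return 2.0 * np.sum(np.exp(-n ** 2 * np.pi ** 2 * t / 2.0)
                        * np.sin(n * np.pi * x0) * np.sin(n * np.pi * x))

def p_images(x, t, x0, n_images=50):
    """Same density via the method of images; the free density has variance t."""
    phi = lambda z: np.exp(-z ** 2 / (2.0 * t)) / np.sqrt(2.0 * np.pi * t)
    k = np.arange(-n_images, n_images + 1)
    return float(np.sum(phi(x - x0 - 2.0 * k) - phi(x + x0 - 2.0 * k)))

x, t, x0 = 0.3, 0.1, 0.6
err = abs(p_series(x, t, x0) - p_images(x, t, x0))  # the two representations agree
```

The series converges fast for large t and the image sum for small t, so the two forms are complementary.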

Exercise 6.2.1. Consider Brownian motion with D = \frac{1}{2} on the interval [0,1] with reflecting boundary conditions. The Fokker-Planck equation is
\[
\frac{\partial p}{\partial t} = \frac{1}{2}\frac{\partial^2 p}{\partial x^2}, \qquad \partial_x p(0,t) = \partial_x p(1,t) = 0.
\]

i. Find the transition probability density for a Brownian particle starting at x_0.

ii. Show that the process is ergodic and find the invariant distribution p_s(x).

iii. Calculate the stationary autocorrelation function
\[
\mathbb{E}\big(W(t)W(0)\big) = \int_0^1\!\!\int_0^1 x x_0\,p(x,t|x_0,0)p_s(x_0)\,dxdx_0.
\]

6.2.2 The Ornstein-Uhlenbeck Process

We now set a(y,t) = -\alpha y, b(y,t) \equiv 2D > 0 in the Fokker-Planck equation to obtain
\[
\frac{\partial p}{\partial t} = \alpha\frac{\partial (yp)}{\partial y} + D\frac{\partial^2 p}{\partial y^2}. \tag{6.11}
\]

This is the Fokker-Planck equation for the Ornstein-Uhlenbeck process. The corresponding stochastic differential equation is
\[
dX_t = -\alpha X_t\,dt + \sqrt{2D}\,dW_t.
\]
So, in addition to Brownian motion, there is a linear force pulling the particle towards the origin. This is the reason why the OU process is ergodic.

The transition probability density for the OU process is
\[
p_{OU}(y,t|x,s) = \sqrt{\frac{\alpha}{2\pi D(1-e^{-2\alpha(t-s)})}}\exp\left(-\frac{\alpha\big(y-e^{-\alpha(t-s)}x\big)^2}{2D(1-e^{-2\alpha(t-s)})}\right). \tag{6.12}
\]

We obtained this formula in Example 4.1.4 (for \alpha = D = 1) by using the fact that the OU process can be defined through a time change of Brownian motion. We can also derive it by solving equation (6.11).

Exercise 6.2.2. Solve equation (6.11) by taking the Fourier transform, using the method of characteristics for first order PDEs and taking the inverse Fourier transform.

We set x = 0, s = 0 in (6.12):
\[
p_{OU}(y,t|0,0) = \sqrt{\frac{\alpha}{2\pi D(1-e^{-2\alpha t})}}\exp\left(-\frac{\alpha y^2}{2D(1-e^{-2\alpha t})}\right).
\]


We have that
\[
\lim_{\alpha\to 0} p_{OU}(y,t|0,0) = p_W(y,t|0,0).
\]
Thus, in the limit where the friction coefficient goes to 0, we recover the transition probability of Brownian motion from that of the OU process. Notice also that

\[
\lim_{t\to+\infty} p_{OU}(y,t|0,0) = \sqrt{\frac{\alpha}{2\pi D}}\exp\left(-\frac{\alpha y^2}{2D}\right).
\]
As we have already seen, the Ornstein-Uhlenbeck process is an ergodic Markov process, and its invariant measure is Gaussian.
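As a quick numerical sanity check (illustrative, with arbitrarily chosen parameter values), one can verify by quadrature that (6.12) with x fixed is a probability density with mean e^{-\alpha t}x and variance (D/\alpha)(1 - e^{-2\alpha t}):

```python
import numpy as np

# Quadrature checks on the OU transition density (6.12) with s = 0:
# it should be Gaussian with mean e^{-alpha t} x, variance (D/alpha)(1 - e^{-2 alpha t}).
alpha, D, x, t = 2.0, 0.5, 1.0, 0.7      # arbitrary illustrative parameters
decay = np.exp(-2.0 * alpha * t)
y = np.linspace(-6.0, 6.0, 20001)
dy = y[1] - y[0]
p = np.sqrt(alpha / (2.0 * np.pi * D * (1.0 - decay))) \
    * np.exp(-alpha * (y - np.exp(-alpha * t) * x) ** 2 / (2.0 * D * (1.0 - decay)))
mass = p.sum() * dy                       # should be close to 1
mean = (y * p).sum() * dy                 # should be close to e^{-alpha t} x
var = ((y - mean) ** 2 * p).sum() * dy    # should be close to (D/alpha)(1 - decay)
```

The Riemann sum is spectrally accurate here because the integrand decays to zero well inside the truncated interval.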

Using (6.12) we can calculate all moments of the OU process:
\[
\mathbb{E}\big((X_t)^n\big) = \int\!\!\int y^n p(y,t|x,0)p_0(x)\,dxdy,
\]
where p_0(x) is the initial distribution. We will calculate the moments by using the Fokker-Planck equation, rather than the explicit formula for the transition probability density.

Define the nth moment of the OU process:
\[
M_n = \int_{\mathbb{R}} y^n p(y,t)\,dy, \quad n = 0, 1, 2, \ldots,
\]
where p(y,t) = \int_{\mathbb{R}} p(y,t|x,0)p_0(x)\,dx. Let n = 0. We integrate the FP equation over \mathbb{R} to obtain:

\[
\int\frac{\partial p}{\partial t} = \alpha\int\frac{\partial(yp)}{\partial y} + D\int\frac{\partial^2 p}{\partial y^2} = 0,
\]
after an integration by parts and using the fact that p(y,t) decays sufficiently fast at infinity. Consequently:
\[
\frac{d}{dt}M_0 = 0 \quad\Rightarrow\quad M_0(t) = M_0(0) = 1.
\]

In other words:
\[
\frac{d}{dt}\|p\|_{L^1(\mathbb{R})} = 0 \quad\Rightarrow\quad \int_{\mathbb{R}} p(y,t)\,dy = \int_{\mathbb{R}} p(y,0)\,dy = 1.
\]

Consequently, probability is conserved, as we have already shown. Let n = 1. We multiply the FP equation for the OU process by y, integrate over \mathbb{R} and perform an integration by parts to obtain:
\[
\frac{d}{dt}M_1 = -\alpha M_1.
\]
Consequently, the first moment converges exponentially fast to 0:
\[
M_1(t) = e^{-\alpha t}M_1(0).
\]

Let now n \geq 2. We multiply the FP equation for the OU process by y^n and integrate by parts (once on the first term on the RHS and twice on the second) to obtain:
\[
\frac{d}{dt}\int y^n p = -\alpha n\int y^n p + Dn(n-1)\int y^{n-2}p.
\]


Or, equivalently:
\[
\frac{d}{dt}M_n = -\alpha n M_n + Dn(n-1)M_{n-2}, \quad n \geq 2.
\]
This is a first order linear inhomogeneous differential equation. We can solve it using the variation of constants formula:
\[
M_n(t) = e^{-\alpha nt}M_n(0) + Dn(n-1)\int_0^t e^{-\alpha n(t-s)}M_{n-2}(s)\,ds. \tag{6.13}
\]

We can use this formula, together with the formulas for the first two moments, to calculate all higher order moments in an iterative way. For example, for n = 2 we have
\begin{align*}
M_2(t) &= e^{-2\alpha t}M_2(0) + 2D\int_0^t e^{-2\alpha(t-s)}M_0(s)\,ds \\
&= e^{-2\alpha t}M_2(0) + \frac{D}{\alpha}e^{-2\alpha t}\big(e^{2\alpha t}-1\big) \\
&= \frac{D}{\alpha} + e^{-2\alpha t}\left(M_2(0) - \frac{D}{\alpha}\right).
\end{align*}

Consequently, the second moment converges exponentially fast to its stationary value \frac{D}{\alpha}. The stationary moments of the OU process are:
\begin{align*}
\langle y^n\rangle_{OU} &= \sqrt{\frac{\alpha}{2\pi D}}\int_{\mathbb{R}} y^n e^{-\frac{\alpha y^2}{2D}}\,dy \\
&= \begin{cases} 1\cdot 3\cdots(n-1)\left(\dfrac{D}{\alpha}\right)^{n/2}, & n \text{ even}, \\[4pt] 0, & n \text{ odd}. \end{cases}
\end{align*}

It is not hard to check that
\[
\lim_{t\to\infty}M_n(t) = \langle y^n\rangle_{OU}. \tag{6.14}
\]
Furthermore, if the initial conditions of the OU process are stationary, then:
\[
M_n(t) = M_n(0) = \langle y^n\rangle_{OU}. \tag{6.15}
\]
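The moment hierarchy can also be iterated numerically. The pure-Python sketch below (parameter and starting values are arbitrary illustrative choices) integrates the ODEs for M_2 and M_4 with forward Euler and checks convergence to the moments of the invariant Gaussian density proportional to e^{-\alpha y^2/(2D)}, namely 1\cdot 3\cdots(n-1)(D/\alpha)^{n/2} for even n.

```python
# Integrate dM_n/dt = -alpha n M_n + D n (n-1) M_{n-2} with M_0 = 1,
# starting from arbitrary (non-stationary) second and fourth moments.
alpha, D = 1.0, 0.8
dt, n_steps = 1e-4, 200_000            # integrate up to T = 20
M2, M4 = 2.0, 7.0                      # arbitrary illustrative starting values
for _ in range(n_steps):
    M2, M4 = (M2 + dt * (-2.0 * alpha * M2 + 2.0 * D),
              M4 + dt * (-4.0 * alpha * M4 + 12.0 * D * M2))

stat2 = D / alpha                      # <y^2> of the invariant Gaussian
stat4 = 3.0 * (D / alpha) ** 2         # <y^4> = 3 <y^2>^2
```

The fixed point of the discrete iteration coincides with the exact stationary moments, so the numerical limit matches (6.14) to round-off.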

Exercise 6.2.3. Use (6.13) to show (6.14) and (6.15).

Exercise 6.2.4. Show that the autocorrelation function of the stationary Ornstein-Uhlenbeck process is
\begin{align*}
\mathbb{E}(X_t X_0) &= \int_{\mathbb{R}}\int_{\mathbb{R}} x x_0\,p_{OU}(x,t|x_0,0)p_s(x_0)\,dxdx_0 \\
&= \frac{D}{\alpha}e^{-\alpha|t|},
\end{align*}
where p_s(x) denotes the invariant Gaussian distribution.


6.2.3 Geometric Brownian Motion

We set a(x) = \mu x, b(x) = \sigma^2 x^2. This is the geometric Brownian motion. The corresponding stochastic differential equation is
\[
dX_t = \mu X_t\,dt + \sigma X_t\,dW_t.
\]
This equation is one of the basic models of mathematical finance. The coefficient \sigma is called the volatility. The generator of this process is
\[
\mathcal{L} = \mu x\frac{\partial}{\partial x} + \frac{\sigma^2 x^2}{2}\frac{\partial^2}{\partial x^2}.
\]

Notice that this operator is not uniformly elliptic. The Fokker-Planck equation of the geometric Brownian motion is:
\[
\frac{\partial p}{\partial t} = -\frac{\partial}{\partial x}(\mu x p) + \frac{\partial^2}{\partial x^2}\left(\frac{\sigma^2 x^2}{2}p\right).
\]

We can easily obtain an equation for the nth moment of the geometric Brownian motion:
\[
\frac{d}{dt}M_n = \left(\mu n + \frac{\sigma^2}{2}n(n-1)\right)M_n, \quad n \geq 2.
\]
The solution of this equation is
\[
M_n(t) = e^{\left(\mu + (n-1)\frac{\sigma^2}{2}\right)nt}M_n(0), \quad n \geq 2,
\]
and
\[
M_1(t) = e^{\mu t}M_1(0).
\]

Notice that the nth moment might diverge as t \to \infty, depending on the values of \mu and \sigma. Consider for example the second moment and assume that \mu < 0. We have
\[
M_2(t) = e^{(2\mu+\sigma^2)t}M_2(0),
\]
which diverges when \sigma^2 + 2\mu > 0.
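This can be illustrated with a Monte Carlo sketch (not from the notes; all parameter values are arbitrary assumptions) using the exact solution X_t = X_0\exp\big((\mu-\sigma^2/2)t + \sigma W_t\big) of the GBM equation:

```python
import numpy as np

# E X_t = e^{mu t} and E X_t^2 = e^{(2 mu + sigma^2) t}: even with mu < 0,
# the second moment grows whenever 2 mu + sigma^2 > 0.
rng = np.random.default_rng(1)
mu, sigma, t, x0 = -0.5, 1.2, 1.0, 1.0   # arbitrary illustrative parameters
W = rng.normal(0.0, np.sqrt(t), size=2_000_000)
X = x0 * np.exp((mu - sigma ** 2 / 2.0) * t + sigma * W)

m1 = X.mean()          # close to e^{mu t}, which decays since mu < 0
m2 = (X ** 2).mean()   # close to e^{(2 mu + sigma^2) t} > 1, i.e. growth
```

Here 2\mu + \sigma^2 = 0.44 > 0, so the sample second moment exceeds its initial value 1 even though the mean decays.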

6.3 The OU Process and Hermite Polynomials

The Ornstein-Uhlenbeck process is one of the few stochastic processes for which we can calculate explicitly the solution of the corresponding SDE, the solution of the Fokker-Planck equation, as well as the eigenfunctions of the generator of the process. In this section we will show that the eigenfunctions of the OU process are the Hermite polynomials. We will also study various properties of the generator of the OU process. In the next section we will show that many of the properties of the OU process (ergodicity, self-adjointness of the generator, exponentially fast convergence to equilibrium, real discrete spectrum) are shared by a large class of diffusion processes, namely those for which the drift term can be written in terms of the gradient of a smooth function. We will refer to these processes as gradient flows.

The generator of the d-dimensional OU process is (we set the drift coefficient equal to 1)
\[
\mathcal{L} = -p\cdot\nabla_p + \beta^{-1}\Delta_p. \tag{6.16}
\]


We have already seen that the OU process is an ergodic Markov process whose unique invariant measure is absolutely continuous with respect to the Lebesgue measure on \mathbb{R}^d with Gaussian density \rho_\beta \in C^\infty(\mathbb{R}^d),
\[
\rho_\beta(p) = \frac{1}{(2\pi\beta^{-1})^{d/2}}e^{-\beta\frac{|p|^2}{2}}.
\]

The natural function space for studying the generator of the OU process is the L^2-space weighted by the invariant measure of the process. This is a separable Hilbert space with norm
\[
\|f\|_\rho^2 := \int_{\mathbb{R}^d} f^2\rho_\beta\,dp
\]
and corresponding inner product
\[
(f,h)_\rho = \int_{\mathbb{R}^d} fh\rho_\beta\,dp.
\]

Similarly, we can define weighted L^2-spaces involving derivatives.

Exercise 6.3.1. Give the definition of the Sobolev space H^p(\mathbb{R}^d;\rho_\beta).

The reason why this is the right function space is that the generator of the OU process becomes a self-adjoint operator in this space. In fact, \mathcal{L} defined in (6.16) has many nice properties that are summarized in the following proposition.

Proposition 6.3.2. The operator \mathcal{L} has the following properties:

• For every f, h \in C_0^2(\mathbb{R}^d) \cap L_\rho^2(\mathbb{R}^d),
\[
(\mathcal{L}f,h)_\rho = (f,\mathcal{L}h)_\rho = -\beta^{-1}\int_{\mathbb{R}^d}\nabla f\cdot\nabla h\,\rho_\beta\,dp. \tag{6.17}
\]

• \mathcal{L} is a non-positive operator on L_\rho^2.

• \mathcal{L}f = 0 iff f \equiv \mathrm{const}.

• For every f \in C_0^2(\mathbb{R}^d) \cap L_\rho^2(\mathbb{R}^d) with \int f\rho_\beta = 0,
\[
(-\mathcal{L}f,f)_\rho \geq \|f\|_\rho^2. \tag{6.18}
\]

Proof. Equation (6.17) follows from an integration by parts, using \nabla\rho_\beta = -\beta p\rho_\beta:
\begin{align*}
(\mathcal{L}f,h)_\rho &= \int -p\cdot\nabla f\,h\rho_\beta\,dp + \beta^{-1}\int \Delta f\,h\rho_\beta\,dp \\
&= \int -p\cdot\nabla f\,h\rho_\beta\,dp - \beta^{-1}\int \nabla f\cdot\nabla h\,\rho_\beta\,dp + \int p\cdot\nabla f\,h\rho_\beta\,dp \\
&= -\beta^{-1}(\nabla f,\nabla h)_\rho.
\end{align*}
Non-positivity of \mathcal{L} follows from (6.17) upon setting h = f:
\[
(\mathcal{L}f,f)_\rho = -\beta^{-1}\|\nabla f\|_\rho^2 \leq 0.
\]


Similarly, multiplying the equation \mathcal{L}f = 0 by f\rho_\beta, integrating over \mathbb{R}^d and using (6.17) gives
\[
\|\nabla f\|_\rho = 0,
\]
from which we deduce that f \equiv \mathrm{const}. The spectral gap follows from (6.17), together with the Poincaré inequality for Gaussian measures:
\[
\int_{\mathbb{R}^d} f^2\rho_\beta\,dp \leq \beta^{-1}\int_{\mathbb{R}^d}|\nabla f|^2\rho_\beta\,dp \tag{6.19}
\]
for every f \in H^1(\mathbb{R}^d;\rho_\beta) with \int f\rho_\beta = 0. Indeed, upon combining (6.17) with (6.19) we obtain:
\[
(\mathcal{L}f,f)_\rho = -\beta^{-1}\|\nabla f\|_\rho^2 \leq -\|f\|_\rho^2.
\]

The spectral gap of the generator of the OU process, which is equivalent to the compactness of its resolvent, implies that \mathcal{L} has discrete spectrum. Furthermore, since it is also a self-adjoint operator, its eigenfunctions form a countable orthonormal basis for the separable Hilbert space L_\rho^2. In fact, we can calculate the eigenvalues and eigenfunctions of the generator of the OU process in one dimension.³

Theorem 6.3.3. Consider the eigenvalue problem for the generator of the OU process in one dimension
\[
-\mathcal{L}f_n = \lambda_n f_n. \tag{6.20}
\]
Then the eigenvalues of \mathcal{L} are the nonnegative integers:
\[
\lambda_n = n, \quad n = 0, 1, 2, \ldots.
\]
The corresponding eigenfunctions are the normalized Hermite polynomials:
\[
f_n(p) = \frac{1}{\sqrt{n!}}H_n\big(\sqrt{\beta}p\big), \tag{6.21}
\]
where
\[
H_n(p) = (-1)^n e^{\frac{p^2}{2}}\frac{d^n}{dp^n}\left(e^{-\frac{p^2}{2}}\right). \tag{6.22}
\]

Hermite polynomials appear very frequently in applications and they also play a fundamental role in analysis. It is possible to prove that the Hermite polynomials form an orthonormal basis for L^2(\mathbb{R},\rho_\beta) without using the fact that they are the eigenfunctions of a symmetric operator with compact resolvent.⁴ A typical result along these lines is [24, Lemma 2.3.4] (we use the notation \rho_1 = \rho):

³The multidimensional problem can be treated similarly by taking tensor products of the eigenfunctions of the one dimensional problem. See Exercise.

⁴In fact, the Poincaré inequality for Gaussian measures can be proved using the fact that the Hermite polynomials form an orthonormal basis for L^2(\mathbb{R},\rho_\beta).


Proposition 6.3.4. For each \lambda \in \mathbb{C}, set
\[
H(p;\lambda) = e^{\lambda p - \frac{\lambda^2}{2}}, \quad p \in \mathbb{R}.
\]
Then
\[
H(p;\lambda) = \sum_{n=0}^{\infty}\frac{\lambda^n}{n!}H_n(p), \quad p \in \mathbb{R}, \tag{6.23}
\]
where the convergence is both uniform on compact subsets of \mathbb{R}\times\mathbb{C}, and, for \lambda's in compact subsets of \mathbb{C}, uniform in L^2(\mathbb{R};\rho). In particular, \left\{f_n(p) := \frac{1}{\sqrt{n!}}H_n(\sqrt{\beta}p) : n \in \mathbb{N}\right\} is an orthonormal basis in L^2(\mathbb{R};\rho_\beta).

From (6.22) it is clear that H_n is a polynomial of degree n. Furthermore, only odd (even) powers appear in H_n(p) when n is odd (even), and the coefficient multiplying p^n in H_n(p) is always 1. The orthonormality of the modified Hermite polynomials f_n(p) defined in (6.21) implies that
\[
\int_{\mathbb{R}} f_n(p)f_m(p)\rho_\beta(p)\,dp = \delta_{nm}.
\]

The first few Hermite polynomials and the corresponding rescaled/normalized eigenfunctions of the generator of the OU process are:
\begin{align*}
H_0(p) &= 1, & f_0(p) &= 1, \\
H_1(p) &= p, & f_1(p) &= \sqrt{\beta}p, \\
H_2(p) &= p^2 - 1, & f_2(p) &= \frac{\beta p^2 - 1}{\sqrt{2}}, \\
H_3(p) &= p^3 - 3p, & f_3(p) &= \frac{\beta^{3/2}p^3 - 3\sqrt{\beta}p}{\sqrt{6}}, \\
H_4(p) &= p^4 - 6p^2 + 3, & f_4(p) &= \frac{1}{\sqrt{24}}\big(\beta^2 p^4 - 6\beta p^2 + 3\big), \\
H_5(p) &= p^5 - 10p^3 + 15p, & f_5(p) &= \frac{1}{\sqrt{120}}\big(\beta^{5/2}p^5 - 10\beta^{3/2}p^3 + 15\beta^{1/2}p\big).
\end{align*}
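The table can be checked numerically with NumPy's probabilists' Hermite module. The sketch below (illustrative, not from the notes) verifies the orthonormality \int f_n f_m \rho_\beta\,dp = \delta_{nm} for \beta = 1 by Gauss-Hermite quadrature, and the table entry for H_4:

```python
import math
import numpy as np
from numpy.polynomial import hermite_e as He

# quadrature nodes/weights for the weight e^{-p^2/2} (probabilists' convention)
nodes, weights = He.hermegauss(30)
Z = math.sqrt(2.0 * math.pi)            # normalization of the Gaussian density

def f(n, p, beta=1.0):
    """Normalized eigenfunction f_n(p) = H_n(sqrt(beta) p)/sqrt(n!), eq. (6.21)."""
    coeff = np.zeros(n + 1)
    coeff[n] = 1.0                      # unit coefficient vector selects He_n
    return He.hermeval(np.sqrt(beta) * p, coeff) / math.sqrt(math.factorial(n))

# Gram matrix w.r.t. the invariant Gaussian measure (beta = 1): should be I
gram = np.array([[np.sum(weights * f(n, nodes) * f(m, nodes)) / Z
                  for m in range(6)] for n in range(6)])

# the table entry H_4(p) = p^4 - 6 p^2 + 3
H4_err = np.max(np.abs(He.hermeval(nodes, [0, 0, 0, 0, 1])
                       - (nodes ** 4 - 6 * nodes ** 2 + 3)))
```

With 30 quadrature nodes the rule is exact for all polynomial products appearing in the Gram matrix.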

The proof of Theorem 6.3.3 follows essentially from the properties of the Hermite polynomials. First, notice that by combining (6.21) and (6.23) we obtain
\[
H\big(\sqrt{\beta}p,\lambda\big) = \sum_{n=0}^{+\infty}\frac{\lambda^n}{\sqrt{n!}}f_n(p).
\]

We differentiate this formula with respect to p to obtain
\[
\lambda\sqrt{\beta}\,H\big(\sqrt{\beta}p,\lambda\big) = \sum_{n=1}^{+\infty}\frac{\lambda^n}{\sqrt{n!}}\partial_p f_n(p),
\]


since f_0 = 1. From this equation we obtain
\begin{align*}
H\big(\sqrt{\beta}p,\lambda\big) &= \sum_{n=1}^{+\infty}\frac{\lambda^{n-1}}{\sqrt{\beta}\sqrt{n!}}\partial_p f_n(p) \\
&= \sum_{n=0}^{+\infty}\frac{\lambda^n}{\sqrt{\beta}\sqrt{(n+1)!}}\partial_p f_{n+1}(p),
\end{align*}
from which we deduce that
\[
\frac{1}{\sqrt{\beta}}\partial_p f_k = \sqrt{k}\,f_{k-1}. \tag{6.24}
\]

Similarly, if we differentiate (6.23) with respect to \lambda we obtain
\[
(p-\lambda)H(p;\lambda) = \sum_{k=0}^{+\infty}\frac{\lambda^k}{k!}pH_k(p) - \sum_{k=1}^{+\infty}\frac{\lambda^k}{(k-1)!}H_{k-1}(p) = \sum_{k=0}^{+\infty}\frac{\lambda^k}{k!}H_{k+1}(p),
\]
from which we obtain the recurrence relation
\[
pH_k = H_{k+1} + kH_{k-1}.
\]

Upon rescaling, we deduce that
\[
pf_k = \sqrt{\beta^{-1}(k+1)}\,f_{k+1} + \sqrt{\beta^{-1}k}\,f_{k-1}. \tag{6.25}
\]
We combine now equations (6.24) and (6.25) to obtain
\[
\left(\sqrt{\beta}p - \frac{1}{\sqrt{\beta}}\partial_p\right)f_k = \sqrt{k+1}\,f_{k+1}. \tag{6.26}
\]

Now we observe that
\begin{align*}
-\mathcal{L}f_n &= \left(\sqrt{\beta}p - \frac{1}{\sqrt{\beta}}\partial_p\right)\frac{1}{\sqrt{\beta}}\partial_p f_n \\
&= \left(\sqrt{\beta}p - \frac{1}{\sqrt{\beta}}\partial_p\right)\sqrt{n}\,f_{n-1} = nf_n.
\end{align*}

The operators \left(\sqrt{\beta}p - \frac{1}{\sqrt{\beta}}\partial_p\right) and \frac{1}{\sqrt{\beta}}\partial_p play the role of creation and annihilation operators. In fact, we can generate all eigenfunctions of the OU operator from the ground state f_0 = 1 through a repeated application of the creation operator.

Proposition 6.3.5. Set \beta = 1 and let a^- = \partial_p. Then the L_\rho^2-adjoint of a^- is
\[
a^+ = -\partial_p + p.
\]
The generator of the OU process can be written in the form
\[
\mathcal{L} = -a^+a^-.
\]


Furthermore, a^+ and a^- satisfy the following commutation relation:
\[
[a^+,a^-] = -1.
\]
Define now the creation and annihilation operators on C^1(\mathbb{R}) by
\[
S^+ = \frac{1}{\sqrt{n+1}}a^+ \quad\text{and}\quad S^- = \frac{1}{\sqrt{n}}a^-.
\]
Then
\[
S^+f_n = f_{n+1} \quad\text{and}\quad S^-f_n = f_{n-1}. \tag{6.27}
\]
In particular,
\[
f_n = \frac{1}{\sqrt{n!}}(a^+)^n 1 \tag{6.28}
\]
and
\[
1 = \frac{1}{\sqrt{n!}}(a^-)^n f_n. \tag{6.29}
\]

Proof. Let f, h \in C^1(\mathbb{R}) \cap L_\rho^2. We calculate
\begin{align}
\int \partial_p f\,h\rho &= -\int f\partial_p(h\rho) \tag{6.30} \\
&= \int f\big(-\partial_p + p\big)h\rho. \tag{6.31}
\end{align}
Now,
\[
-a^+a^- = -(-\partial_p + p)\partial_p = \partial_p^2 - p\partial_p = \mathcal{L}.
\]
Similarly,
\[
a^-a^+ = -\partial_p^2 + p\partial_p + 1,
\]
and hence
\[
[a^+,a^-] = -1.
\]
Formulas (6.27) follow from (6.24) and (6.26). Finally, formulas (6.28) and (6.29) are a consequence of (6.24) and (6.26), together with a simple induction argument.
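The operator identities in this proof can be checked symbolically. The SymPy sketch below (illustrative, \beta = 1) builds the Hermite polynomials from the recurrence pH_k = H_{k+1} + kH_{k-1} derived above and verifies both the eigenvalue relation -\mathcal{L}H_n = nH_n and the commutation relation:

```python
import sympy as sp

# a- = d/dp, a+ = -d/dp + p, and L = -a+ a- acts as f'' - p f'
p = sp.symbols('p')
aminus = lambda f: sp.diff(f, p)
aplus = lambda f: -sp.diff(f, p) + p * f
L = lambda f: -aplus(aminus(f))

# Hermite polynomials via the recurrence H_{k+1} = p H_k - k H_{k-1}
H = [sp.Integer(1), p]
for k in range(1, 6):
    H.append(sp.expand(p * H[k] - k * H[k - 1]))

eig_ok = all(sp.simplify(-L(H[n]) - n * H[n]) == 0 for n in range(6))

# commutator [a+, a-] acting on a generic smooth function g equals -g
g = sp.Function('g')(p)
commutator = sp.simplify(aplus(aminus(g)) - aminus(aplus(g)))
```

Working with a generic symbolic function g checks the commutation relation as an operator identity, not just on polynomials.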

Notice that, upon using (6.28) and (6.29) and the fact that a^+ is the adjoint of a^-, we can easily check the orthonormality of the eigenfunctions:
\begin{align*}
\int f_n f_m\,\rho &= \frac{1}{\sqrt{m!}}\int f_n (a^+)^m 1\,\rho \\
&= \frac{1}{\sqrt{m!}}\int \big((a^-)^m f_n\big)\rho = \delta_{nm},
\end{align*}
since (a^-)^m f_n is proportional to f_{n-m} (and vanishes for m > n), and \int f_k\,\rho = \delta_{k0}.

From the eigenfunctions and eigenvalues of \mathcal{L} we can easily obtain the eigenvalues and eigenfunctions of \mathcal{L}^*, the Fokker-Planck operator.


Lemma 6.3.6. The eigenvalues and eigenfunctions of the Fokker-Planck operator
\[
\mathcal{L}^*\cdot = \partial_p^2\cdot + \partial_p(p\,\cdot)
\]
are \lambda_n^* = -n, n = 0, 1, 2, \ldots, and f_n^* = \rho f_n.

Proof. We have
\[
\mathcal{L}^*(\rho f_n) = f_n\mathcal{L}^*\rho + \rho\mathcal{L}f_n = -n\rho f_n,
\]
since \mathcal{L}^*\rho = 0.

An immediate corollary of the above calculation is that the nth eigenfunction of the Fokker-Planck operator is given by
\[
f_n^* = \rho(p)\frac{1}{\sqrt{n!}}(a^+)^n 1.
\]

Consider now the Fokker-Planck equation with an arbitrary initial condition. We can relate the generator of the OU process with the Schrödinger operator.

6.4 Gradient Flows


• Let V(x) = \frac{1}{2}\alpha x^2. The generator of the OU process can be written as:
\[
\mathcal{L} = -\partial_x V\partial_x + D\partial_x^2.
\]

• Consider diffusion processes with a potential V(x), not necessarily quadratic:
\[
\mathcal{L} = -\nabla V(x)\cdot\nabla + D\Delta. \tag{6.32}
\]

• This is a gradient flow perturbed by noise whose strength is D = k_B T, where k_B is Boltzmann's constant and T the absolute temperature.

• The corresponding stochastic differential equation is
\[
dX_t = -\nabla V(X_t)\,dt + \sqrt{2D}\,dW_t.
\]

• The corresponding FP equation is:
\[
\frac{\partial p}{\partial t} = \nabla\cdot(\nabla V p) + D\Delta p. \tag{6.33}
\]

• It is not possible to calculate the time-dependent solution of this equation for an arbitrary potential. We can, however, always calculate the stationary solution.


Definition 6.4.1. A potential $V$ will be called confining if $\lim_{|x|\to+\infty} V(x) = +\infty$ and
\[ e^{-\beta V(x)} \in L^1(\mathbb{R}^d) \quad (6.34) \]
for all $\beta \in \mathbb{R}^+$.

Proposition 6.4.2. Let $V(x)$ be a smooth confining potential. Then the Markov process with generator (6.32) is ergodic. The unique invariant distribution is the Gibbs distribution
\[ p(x) = \frac{1}{Z}e^{-V(x)/D}, \quad (6.35) \]
where the normalization factor $Z$ is the partition function
\[ Z = \int_{\mathbb{R}^d} e^{-V(x)/D}\,dx. \]

• The fact that the Gibbs distribution is an invariant distribution follows by direct substitution. Uniqueness follows from a PDE argument (see discussion below).

• It is more convenient to "normalize" the solution of the Fokker-Planck equation with respect to the invariant distribution.
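A minimal simulation sketch of Proposition 6.4.2 (the potential, step size, and run length are my own illustrative choices, not from the notes): sample $dX = -V'(X)\,dt + \sqrt{2D}\,dW$ with the quartic potential $V(x) = x^4/4$ via Euler–Maruyama and compare the empirical second moment with that of the Gibbs density $Z^{-1}e^{-V/D}$.

```python
import math, random

random.seed(0)
D, dt, nsteps = 1.0, 1e-3, 600_000
x = 0.0
sigma = math.sqrt(2.0 * D * dt)
acc, cnt = 0.0, 0
for step in range(nsteps):
    x += -x**3 * dt + sigma * random.gauss(0.0, 1.0)   # V'(x) = x^3
    if step > 50_000:                                  # discard burn-in
        acc += x * x
        cnt += 1
em_moment = acc / cnt

# Quadrature for the exact Gibbs second moment of Z^{-1} exp(-x^4/(4D)).
a, b, N = -5.0, 5.0, 2000
h = (b - a) / N
Z = m2 = 0.0
for i in range(N + 1):
    xi = a + i * h
    w = (0.5 if i in (0, N) else 1.0) * math.exp(-xi**4 / (4 * D))
    Z += w
    m2 += w * xi * xi
gibbs_moment = m2 / Z

assert abs(em_moment - gibbs_moment) < 0.15
```

The agreement is only statistical, so the tolerance is generous; a smaller `dt` and a longer run tighten it.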

Theorem 6.4.3. Let $p(x,t)$ be the solution of the Fokker-Planck equation (6.33), assume that (6.34) holds, and let $\rho(x)$ be the Gibbs distribution (6.35). Define $h(x,t)$ through
\[ p(x,t) = h(x,t)\rho(x). \]
Then the function $h$ satisfies the backward Kolmogorov equation:
\[ \frac{\partial h}{\partial t} = -\nabla V\cdot\nabla h + D\Delta h, \qquad h(x,0) = p(x,0)\rho^{-1}(x). \quad (6.36) \]

Proof. The initial condition follows from the definition of $h$. We calculate the gradient and Laplacian of $p$:
\[ \nabla p = \rho\nabla h - \rho h D^{-1}\nabla V \]
and
\[ \Delta p = \rho\Delta h - 2\rho D^{-1}\nabla V\cdot\nabla h - h D^{-1}\Delta V\,\rho + h|\nabla V|^2 D^{-2}\rho. \]
We substitute these formulas into the FP equation to obtain
\[ \rho\frac{\partial h}{\partial t} = \rho\left(-\nabla V\cdot\nabla h + D\Delta h\right), \]
from which the claim follows.

• Consequently, in order to study properties of solutions to the FP equation, it is sufficient to study thebackward equation (6.36).

• The generatorL is self-adjoint, in the right function space.


• We define the weighted $L^2$ space $L^2_\rho$:
\[ L^2_\rho = \left\{ f \,\Big|\, \int_{\mathbb{R}^d} |f|^2\rho(x)\,dx < \infty \right\}, \]
• where $\rho(x)$ is the Gibbs distribution. This is a Hilbert space with inner product
\[ (f,h)_\rho = \int_{\mathbb{R}^d} f h\,\rho(x)\,dx. \]

Theorem 6.4.4. Assume that $V(x)$ is a smooth potential and assume that condition (6.34) holds. Then the operator
\[ \mathcal{L} = -\nabla V(x)\cdot\nabla + D\Delta \]
is self-adjoint in $L^2_\rho$. Furthermore, it is non-positive and its kernel consists of constants.

Proof. Let $f, h \in C_0^2(\mathbb{R}^d)$. We calculate
\[ (\mathcal{L}f, h)_\rho = \int_{\mathbb{R}^d}(-\nabla V\cdot\nabla f + D\Delta f)\,h\rho\,dx = -\int_{\mathbb{R}^d}\nabla V\cdot\nabla f\,h\rho\,dx - D\int_{\mathbb{R}^d}\nabla f\cdot\nabla h\,\rho\,dx - D\int_{\mathbb{R}^d}\nabla f\cdot\nabla\rho\,h\,dx = -D\int_{\mathbb{R}^d}\nabla f\cdot\nabla h\,\rho\,dx, \]
since $D\nabla\rho = -\nabla V\rho$, so the first and third terms cancel. The right hand side is symmetric in $f$ and $h$, from which self-adjointness follows.

If we set $h = f$ in the above equation we get
\[ (\mathcal{L}f, f)_\rho = -D\|\nabla f\|_\rho^2, \]
which shows that $\mathcal{L}$ is non-positive. Clearly, constants are in the null space of $\mathcal{L}$. Conversely, assume that $f \in \mathcal{N}(\mathcal{L})$. Then, from the above equation, we get
\[ 0 = -D\|\nabla f\|_\rho^2, \]
and, consequently, $f$ is a constant.

Remarks:

• The expression $(-\mathcal{L}f, f)_\rho$ is called the Dirichlet form of the operator $\mathcal{L}$. In the case of a gradient flow, it takes the form
\[ (-\mathcal{L}f, f)_\rho = D\|\nabla f\|_\rho^2. \quad (6.37) \]

• Using the properties of the generatorL we can show that the solution of the Fokker-Planck equationconverges to the Gibbs distribution exponentially fast.

• For this we need the following result.


Theorem 6.4.5. Assume that the potential $V$ satisfies the convexity condition
\[ D^2 V \geq \lambda I. \]
Then the corresponding Gibbs measure satisfies the Poincaré inequality with constant $\lambda$:
\[ \int_{\mathbb{R}^d} f\rho = 0 \;\Rightarrow\; \|\nabla f\|_\rho \geq \sqrt{\lambda}\,\|f\|_\rho. \quad (6.38) \]

Theorem 6.4.6. Assume that $p(x,0) \in L^2(e^{V/D})$. Then the solution $p(x,t)$ of the Fokker-Planck equation (6.33) converges to the Gibbs distribution exponentially fast:
\[ \|p(\cdot,t) - Z^{-1}e^{-V}\|_{\rho^{-1}} \leq e^{-\lambda D t}\,\|p(\cdot,0) - Z^{-1}e^{-V}\|_{\rho^{-1}}. \quad (6.39) \]

Proof. We use (6.36), (6.37) and (6.38) to calculate
\[ -\frac{d}{dt}\|h-1\|_\rho^2 = -2\left(\frac{\partial h}{\partial t}, h-1\right)_\rho = -2(\mathcal{L}h, h-1)_\rho = 2(-\mathcal{L}(h-1), h-1)_\rho = 2D\|\nabla(h-1)\|_\rho^2 \geq 2D\lambda\|h-1\|_\rho^2. \]
Our assumption on $p(\cdot,0)$ implies that $h(\cdot,0) \in L^2_\rho$. Consequently, the above calculation shows that
\[ \|h(\cdot,t)-1\|_\rho \leq e^{-\lambda D t}\|h(\cdot,0)-1\|_\rho. \]
This, and the definition of $h$, $p = \rho h$, lead to (6.39).

• The assumption
\[ \int_{\mathbb{R}^d} |p(x,0)|^2\,Z^{-1}e^{V/D}\,dx < \infty \]
• is very restrictive (think of the case where $V = x^2$).

• The function space $L^2(\rho^{-1}) = L^2(e^{V/D})$ in which we prove convergence is not the right space to use. Since $p(\cdot,t) \in L^1$, ideally we would like to prove exponentially fast convergence in $L^1$.

• We can prove convergence in $L^1$ using the theory of logarithmic Sobolev inequalities. In fact, we can also prove convergence in relative entropy:
\[ H(p|\rho_V) := \int_{\mathbb{R}^d} p\ln\left(\frac{p}{\rho_V}\right)dx. \]

• The relative entropy controls the $L^1$ norm (Csiszár-Kullback-Pinsker inequality):
\[ \|\rho_1 - \rho_2\|_{L^1} \leq C\sqrt{H(\rho_1|\rho_2)}. \]

• Using a logarithmic Sobolev inequality, we can prove exponentially fast convergence to equilibrium,assuming only that the relative entropy of the initial conditions is finite.
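A small numerical illustration (the two Gaussian densities are my own example): compute the $L^1$ distance and relative entropy of two Gaussians by quadrature, cross-check the entropy against its closed form, and verify the Csiszár-Kullback-Pinsker bound $\|p - q\|_{L^1} \leq \sqrt{2H(p|q)}$.

```python
import math

def gauss(x, mu, s):
    return math.exp(-(x - mu) ** 2 / (2 * s * s)) / (s * math.sqrt(2 * math.pi))

mu1, s1, mu2, s2 = 0.0, 1.0, 0.7, 1.3
a, b, N = -15.0, 15.0, 6000
h = (b - a) / N
l1 = H = 0.0
for i in range(N + 1):
    x = a + i * h
    w = 0.5 if i in (0, N) else 1.0
    p, q = gauss(x, mu1, s1), gauss(x, mu2, s2)
    l1 += w * abs(p - q)
    H += w * p * math.log(p / q)
l1 *= h
H *= h

# Closed form for the relative entropy of two Gaussians, as a cross-check.
H_exact = math.log(s2 / s1) + (s1**2 + (mu1 - mu2) ** 2) / (2 * s2**2) - 0.5

assert abs(H - H_exact) < 1e-6
assert l1 <= math.sqrt(2 * H)
```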


Theorem 6.4.7. Let $p$ denote the solution of the Fokker-Planck equation (6.33), where the potential is smooth and uniformly convex. Assume that the initial conditions satisfy
\[ H(p(\cdot,0)|\rho_V) < \infty. \]
Then $p$ converges to the Gibbs distribution exponentially fast in relative entropy:
\[ H(p(\cdot,t)|\rho_V) \leq e^{-\lambda D t}H(p(\cdot,0)|\rho_V). \]

• Convergence to equilibrium for kinetic equations, both linear and non-linear (e.g., the Boltzmann equation), has been studied extensively.

• It has been recognized that the relative entropy plays a very important role.

• For more information see On the trend to equilibrium for the Fokker-Planck equation: an interplay between physics and functional analysis by P.A. Markowich and C. Villani, 1999.

6.5 Eigenfunction Expansions

• Consider the generator of a gradient stochastic flow with a uniformly convex potential

L = −∇V · ∇ +D∆. (6.40)

• We know that:

i. $\mathcal{L}$ is a non-positive self-adjoint operator on $L^2_\rho$;

ii. it has a spectral gap:
\[ (\mathcal{L}f, f)_\rho \leq -D\lambda\|f\|_\rho^2, \]
where $\lambda$ is the Poincaré constant of the potential $V$.

• The above imply that we can study the spectral problem for−L:

−Lfn = λnfn, n = 0, 1, . . .

• The operator−L has real, discrete spectrum with

0 = λ0 < λ1 < λ2 < . . .

• Furthermore, the eigenfunctions $\{f_j\}_{j=1}^{\infty}$ form an orthonormal basis in $L^2_\rho$: we can express every element of $L^2_\rho$ in the form of a generalized Fourier series:
\[ \phi = \sum_{n=0}^{\infty}\phi_n f_n, \qquad \phi_n = (\phi, f_n)_\rho, \quad (6.41) \]


• with (fn, fm)ρ = δnm.

• This enables us to solve the time dependent Fokker–Planck equation in terms of an eigenfunctionexpansion.

• Consider the backward Kolmogorov equation (6.36).

• We assume that the initial condition $h_0(x) = \phi(x) \in L^2_\rho$, and consequently we can expand it in the form (6.41).

• We look for a solution of (6.36) in the form
\[ h(x,t) = \sum_{n=0}^{\infty} h_n(t) f_n(x). \]

• We substitute this expansion into the backward Kolmogorov equation:
\[ \frac{\partial h}{\partial t} = \sum_{n=0}^{\infty}\dot{h}_n f_n = \mathcal{L}\left(\sum_{n=0}^{\infty} h_n f_n\right) \quad (6.42) \]
\[ = \sum_{n=0}^{\infty} -\lambda_n h_n f_n. \quad (6.43) \]

• We multiply this equation by $f_m$, integrate with respect to the Gibbs measure and use the orthonormality of the eigenfunctions to obtain the sequence of equations
\[ \dot{h}_n = -\lambda_n h_n, \qquad n = 0, 1, \ldots \]

• The solution is $h_0(t) = \phi_0$, $h_n(t) = e^{-\lambda_n t}\phi_n$, $n = 1, 2, \ldots$

• Notice that
\[ 1 = \int_{\mathbb{R}^d} p(x,0)\,dx = \int_{\mathbb{R}^d} p(x,t)\,dx = \int_{\mathbb{R}^d} h(x,t)\,Z^{-1}e^{-\beta V}\,dx = (h,1)_\rho = (\phi,1)_\rho = \phi_0. \]

• Consequently, the solution of the backward Kolmogorov equation is
\[ h(x,t) = 1 + \sum_{n=1}^{\infty} e^{-\lambda_n t}\phi_n f_n. \]

• This expansion, together with the fact that all eigenvalues $\lambda_n$, $n \geq 1$, are positive, shows that the solution of the backward Kolmogorov equation converges to $1$ exponentially fast.

• The solution of the Fokker-Planck equation is
\[ p(x,t) = Z^{-1}e^{-\beta V(x)}\left(1 + \sum_{n=1}^{\infty} e^{-\lambda_n t}\phi_n f_n\right). \]


6.6 Self-adjointness

• The Fokker–Planck operator of a general diffusion process is not self-adjoint in general.

• In fact, it is self-adjoint if and only if the drift term is the gradient of a potential (Nelson, ≈1960). This is also true in infinite dimensions (stochastic PDEs).

• Markov processes whose generator is a self-adjoint operator are called reversible: for all $t \in [0,T]$, $X_t$ and $X_{T-t}$ have the same transition probability (when $X_t$ is stationary).

• Reversibility is equivalent to the invariant measure beinga Gibbs measure.

• See Thermodynamics of the general diffusion process: time-reversibility and entropy production, H. Qian, M. Qian, X. Tang, J. Stat. Phys., 107 (5/6), 2002, pp. 1129–1141.

6.7 Reduction to a Schrödinger Equation

Lemma 6.7.1. The Fokker-Planck operator for a gradient flow can be written in the self-adjoint form
\[ \frac{\partial p}{\partial t} = D\nabla\cdot\left(e^{-V/D}\nabla\left(e^{V/D}p\right)\right). \quad (6.44) \]
Define now $\psi(x,t) = e^{V/2D}p(x,t)$. Then $\psi$ solves the PDE
\[ \frac{\partial\psi}{\partial t} = D\Delta\psi - U(x)\psi, \qquad U(x) := \frac{|\nabla V|^2}{4D} - \frac{\Delta V}{2}. \quad (6.45) \]
Let $\mathcal{H} := -D\Delta + U$. Then $\mathcal{L}^*$ and $\mathcal{H}$ have the same eigenvalues. The $n$th eigenfunction $\phi_n$ of $\mathcal{L}^*$ and the $n$th eigenfunction $\psi_n$ of $\mathcal{H}$ are associated through the transformation
\[ \psi_n(x) = \phi_n(x)\exp\left(\frac{V(x)}{2D}\right). \]

Remarks 6.7.2. i. Equation (6.44) shows that the FP operator can be written in the form
\[ \mathcal{L}^*\cdot = D\nabla\cdot\left(e^{-V/D}\nabla\left(e^{V/D}\,\cdot\right)\right). \]

ii. The operator that appears on the right hand side of eqn. (6.45) is $-\mathcal{H}$, where
\[ \mathcal{H} = -D\Delta + U(x) \]
has the form of a Schrödinger operator.

iii. The spectral problem for the FP operator can be transformed into the spectral problem for a Schrödinger operator. We can thus use all the available results from quantum mechanics to study the FP equation and the associated SDE.


iv. In particular, the weak noise asymptotics $D \ll 1$ is equivalent to the semiclassical approximation from quantum mechanics.

Proof. We calculate
\[ D\nabla\cdot\left(e^{-V/D}\nabla\left(e^{V/D}f\right)\right) = D\nabla\cdot\left(e^{-V/D}\left(D^{-1}\nabla V f + \nabla f\right)e^{V/D}\right) = \nabla\cdot(\nabla V f + D\nabla f) = \mathcal{L}^* f. \]

Consider now the eigenvalue problem for the FP operator:
\[ -\mathcal{L}^*\phi_n = \lambda_n\phi_n. \]
Set $\phi_n = \psi_n\exp\left(-\frac{V}{2D}\right)$. We calculate $-\mathcal{L}^*\phi_n$:
\[ -\mathcal{L}^*\phi_n = -D\nabla\cdot\left(e^{-V/D}\nabla\left(e^{V/D}\psi_n e^{-V/2D}\right)\right) = -D\nabla\cdot\left(e^{-V/D}\left(\nabla\psi_n + \frac{\nabla V}{2D}\psi_n\right)e^{V/2D}\right) = \left(-D\Delta\psi_n + \left(\frac{|\nabla V|^2}{4D} - \frac{\Delta V}{2}\right)\psi_n\right)e^{-V/2D} = e^{-V/2D}\mathcal{H}\psi_n. \]
From this we conclude that $e^{-V/2D}\mathcal{H}\psi_n = \lambda_n\psi_n e^{-V/2D}$, from which the equivalence between the two eigenvalue problems follows.
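The identity behind the lemma, $\mathcal{L}^* p = D p'' + (V'p)' = e^{-V/2D}(D\psi'' - U\psi)$ for $p = e^{-V/2D}\psi$, can be checked by finite differences; the potential, noise strength, and test function below are my own illustrative choices.

```python
import math

D = 0.7
V = lambda x: x**4 / 4 + x**2 / 2
Vp = lambda x: x**3 + x           # V'
Vpp = lambda x: 3 * x**2 + 1      # V''
psi = lambda x: math.exp(-x * x) * (1 + x)
p = lambda x: math.exp(-V(x) / (2 * D)) * psi(x)

def d1(f, x, h=1e-4):
    return (f(x + h) - f(x - h)) / (2 * h)

def d2(f, x, h=1e-4):
    return (f(x + h) - 2 * f(x) + f(x - h)) / (h * h)

for x in (-1.2, -0.3, 0.5, 1.1):
    # Fokker-Planck operator applied to p = e^{-V/2D} psi.
    lhs = D * d2(p, x) + d1(lambda y: Vp(y) * p(y), x)
    # Schrodinger form: e^{-V/2D} (D psi'' - U psi), U = |V'|^2/4D - V''/2.
    U = Vp(x) ** 2 / (4 * D) - Vpp(x) / 2
    rhs = math.exp(-V(x) / (2 * D)) * (D * d2(psi, x) - U * psi(x))
    assert abs(lhs - rhs) < 1e-5
```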

Remarks 6.7.3. i. We can rewrite the Schrödinger operator in the form
\[ \mathcal{H} = D\mathcal{A}^*\mathcal{A}, \qquad \mathcal{A} = \nabla + \frac{\nabla V}{2D}, \qquad \mathcal{A}^* = -\nabla\cdot + \frac{\nabla V}{2D}\cdot. \]

ii. These are creation and annihilation operators. They can also be written in the form
\[ \mathcal{A}\cdot = e^{-V/2D}\nabla\left(e^{V/2D}\,\cdot\right), \qquad \mathcal{A}^*\cdot = -e^{V/2D}\nabla\cdot\left(e^{-V/2D}\,\cdot\right). \]

iii. The forward and the backward Kolmogorov operators have the same eigenvalues. Their eigenfunctions are related (up to the normalization constant) through
\[ \phi^F_n = \phi^B_n\exp\left(-V/D\right), \]
where $\phi^B_n$ and $\phi^F_n$ denote the eigenfunctions of the backward and forward operators, respectively.

6.8 The Klein-Kramers-Chandrasekhar Equation

• Consider a diffusion process in two dimensions for the variables $q$ (position) and momentum $p$. The generator of this Markov process is
\[ \mathcal{L} = p\cdot\nabla_q - \nabla_q V\cdot\nabla_p + \gamma(-p\cdot\nabla_p + D\Delta_p). \quad (6.46) \]


• The $L^2(dp\,dq)$-adjoint is
\[ \mathcal{L}^*\rho = -p\cdot\nabla_q\rho + \nabla_q V\cdot\nabla_p\rho + \gamma\left(\nabla_p\cdot(p\rho) + D\Delta_p\rho\right). \]

• The corresponding FP equation is:
\[ \frac{\partial p}{\partial t} = \mathcal{L}^* p. \]

• The corresponding stochastic differential equation is the Langevin equation
\[ \ddot{X}_t = -\nabla V(X_t) - \gamma\dot{X}_t + \sqrt{2\gamma D}\,\dot{W}_t. \quad (6.47) \]

• This is Newton’s equation perturbed by dissipation and noise.

• The Klein-Kramers-Chandrasekhar equation was first derived by Klein in 1923 and was studied further by Kramers in his famous paper "Brownian motion in a field of force and the diffusion model of chemical reactions", Physica 7 (1940), pp. 284–304.

• Notice that $\mathcal{L}^*$ is not a uniformly elliptic operator: there are second order derivatives only with respect to $p$ and not $q$. This is an example of a degenerate elliptic operator. It is, however, hypoelliptic: we can still prove existence and uniqueness of solutions for the FP equation, and obtain estimates on the solution.

• It is not possible to obtain the solution of the FP equation for an arbitrary potential.

• We can calculate the (unique normalized) solution of the stationary Fokker-Planck equation.

Theorem 6.8.1. Let $V(x)$ be a smooth confining potential. Then the Markov process with generator (6.46) is ergodic. The unique invariant distribution is the Maxwell-Boltzmann distribution
\[ \rho(p,q) = \frac{1}{Z}e^{-\beta H(p,q)}, \quad (6.48) \]
where
\[ H(p,q) = \frac{1}{2}\|p\|^2 + V(q) \]
is the Hamiltonian, $\beta = (k_B T)^{-1}$ is the inverse temperature and the normalization factor $Z$ is the partition function
\[ Z = \int_{\mathbb{R}^{2d}} e^{-\beta H(p,q)}\,dp\,dq. \]
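An Euler–Maruyama sketch of the Langevin dynamics (6.47) with a harmonic potential (the parameters and the choice $V(q) = q^2/2$ are mine, for illustration): in stationarity the momentum marginal of the Maxwell–Boltzmann distribution is Gaussian with variance $\beta^{-1} = D$, independently of $\gamma$ and of $V$.

```python
import math, random

random.seed(1)
gamma, D, dt, nsteps = 1.0, 0.5, 1e-2, 500_000
q = p = 0.0
s = math.sqrt(2.0 * gamma * D * dt)
acc, cnt = 0.0, 0
for step in range(nsteps):
    q += p * dt
    p += (-q - gamma * p) * dt + s * random.gauss(0.0, 1.0)   # V'(q) = q
    if step > 50_000:          # discard burn-in
        acc += p * p
        cnt += 1

assert abs(acc / cnt - D) < 0.08
```

The check is statistical, so the tolerance allows for both sampling error and the $O(dt)$ discretization bias.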

• It is possible to obtain rates of convergence in either a weighted $L^2$-norm or the relative entropy norm:
\[ H(p(\cdot,t)|\rho) \leq Ce^{-\alpha t}. \]

• The proof of this result is very complicated, since the generatorL is degenerate and non-selfadjoint.

• See F. Hérau and F. Nier, Isotropic hypoellipticity and trend to equilibrium for the Fokker-Planck equation with a high-degree potential, Arch. Ration. Mech. Anal., 171(2), (2004), 151–218.


Let $\rho(q,p,t)$ be the solution of the Kramers equation and let $\rho_\beta(q,p)$ be the Maxwell-Boltzmann distribution. We can write
\[ \rho(q,p,t) = h(q,p,t)\rho_\beta(q,p), \]
where $h(q,p,t)$ solves the equation
\[ \frac{\partial h}{\partial t} = -\mathcal{A}h + \gamma\mathcal{S}h, \quad (6.49) \]
where
\[ \mathcal{A} = p\cdot\nabla_q - \nabla_q V\cdot\nabla_p, \qquad \mathcal{S} = -p\cdot\nabla_p + \beta^{-1}\Delta_p. \]
The operator $\mathcal{A}$ is antisymmetric in $L^2_\rho := L^2(\mathbb{R}^{2d};\rho_\beta(q,p))$, whereas $\mathcal{S}$ is symmetric.

Let $X_i := -\frac{\partial}{\partial p_i}$. The $L^2_\rho$-adjoint of $X_i$ is
\[ X_i^* = -\beta p_i + \frac{\partial}{\partial p_i}. \]
We have that
\[ \mathcal{S} = -\beta^{-1}\sum_{i=1}^{d} X_i^* X_i. \]
Consequently, the generator of the Markov process $\{q(t), p(t)\}$ can be written in Hörmander's "sum of squares" form:
\[ \mathcal{L} = \mathcal{A} - \gamma\beta^{-1}\sum_{i=1}^{d} X_i^* X_i. \quad (6.50) \]

We calculate the commutators between the vector fields in (6.50):
\[ [\mathcal{A}, X_i] = \frac{\partial}{\partial q_i}, \qquad [X_i, X_j] = 0, \qquad [X_i, X_j^*] = \beta\delta_{ij}. \]
Consequently,
\[ \mathrm{Lie}\big(X_1, \ldots, X_d, [\mathcal{A},X_1], \ldots, [\mathcal{A},X_d]\big) = \mathrm{Lie}(\nabla_p, \nabla_q), \]
which spans $T_{p,q}\mathbb{R}^{2d}$ for all $p, q \in \mathbb{R}^d$. This shows that the generator $\mathcal{L}$ is a hypoelliptic operator.

Let now $Y_i = -\frac{\partial}{\partial q_i}$, with $L^2_\rho$-adjoint $Y_i^* = \frac{\partial}{\partial q_i} - \beta\frac{\partial V}{\partial q_i}$. We have that
\[ X_i^* Y_i - Y_i^* X_i = \beta\left(p_i\frac{\partial}{\partial q_i} - \frac{\partial V}{\partial q_i}\frac{\partial}{\partial p_i}\right). \]

Consequently, the generator can be written in the form
\[ \mathcal{L} = \beta^{-1}\sum_{i=1}^{d}\left(X_i^* Y_i - Y_i^* X_i - \gamma X_i^* X_i\right). \quad (6.51) \]
Notice also that
\[ \mathcal{L}_V := -\nabla_q V\cdot\nabla_q + \beta^{-1}\Delta_q = -\beta^{-1}\sum_{i=1}^{d} Y_i^* Y_i. \]


• The phase-space Fokker-Planck equation can be written in the form
\[ \frac{\partial\rho}{\partial t} + p\cdot\nabla_q\rho - \nabla_q V\cdot\nabla_p\rho = Q(\rho, f_B), \]
• where the collision operator has the form
\[ Q(\rho, f_B) = D\nabla_p\cdot\left(f_B\nabla_p\left(f_B^{-1}\rho\right)\right). \]

• The Fokker-Planck equation has a similar structure to the Boltzmann equation (the basic equation inthe kinetic theory of gases), with the difference that the collision operator for the FP equation is linear.

• Convergence of solutions of the Boltzmann equation to the Maxwell-Boltzmann distribution has also been proved. See L. Desvillettes and C. Villani, On the trend to global equilibrium for spatially inhomogeneous kinetic systems: the Boltzmann equation, Invent. Math. 159, 2 (2005), 245–316.

• We can study the backward and forward Kolmogorov equations for (6.47) by expanding the solution with respect to the Hermite basis.

• We consider the problem in 1d. We set $D = 1$. The generator of the process is:
\[ \mathcal{L} = p\partial_q - V'(q)\partial_p + \gamma\left(-p\partial_p + \partial_p^2\right) =: \mathcal{L}_1 + \gamma\mathcal{L}_0, \]
• where $\mathcal{L}_0 := -p\partial_p + \partial_p^2$ and $\mathcal{L}_1 := p\partial_q - V'(q)\partial_p$.

• The backward Kolmogorov equation is
\[ \frac{\partial h}{\partial t} = \mathcal{L}h. \quad (6.52) \]

• The solution should be an element of the weighted $L^2$-space
\[ L^2_\rho = \left\{ f \,\Big|\, \int_{\mathbb{R}^2} |f|^2\,Z^{-1}e^{-\beta H(p,q)}\,dp\,dq < \infty \right\}. \]

• We notice that the invariant measure of our Markov process is a product measure:
\[ e^{-\beta H(p,q)} = e^{-\beta\frac{1}{2}|p|^2}\,e^{-\beta V(q)}. \]

• The space $L^2(e^{-\beta\frac{1}{2}|p|^2}\,dp)$ is spanned by the Hermite polynomials. Consequently, we can expand the solution of (6.52) in the Hermite basis:
\[ h(p,q,t) = \sum_{n=0}^{\infty} h_n(q,t) f_n(p), \quad (6.53) \]


• where $f_n(p) = \frac{1}{\sqrt{n!}}H_n(p)$.

• Our plan is to substitute (6.53) into (6.52) and obtain a sequence of equations for the coefficients $h_n(q,t)$.

• We have:
\[ \mathcal{L}_0 h = \mathcal{L}_0\sum_{n=0}^{\infty} h_n f_n = -\sum_{n=0}^{\infty} n h_n f_n. \]

• Furthermore,
\[ \mathcal{L}_1 h = -\partial_q V\,\partial_p h + p\partial_q h. \]

• We calculate each term on the right hand side of the above equation separately. For this we will need the formulas
\[ \partial_p f_n = \sqrt{n}\,f_{n-1} \quad\text{and}\quad p f_n = \sqrt{n}\,f_{n-1} + \sqrt{n+1}\,f_{n+1}. \]

\[ p\partial_q h = p\partial_q\sum_{n=0}^{\infty} h_n f_n = \partial_q h_0\,pf_0 + \sum_{n=1}^{\infty}\partial_q h_n\,pf_n = \partial_q h_0 f_1 + \sum_{n=1}^{\infty}\partial_q h_n\left(\sqrt{n}\,f_{n-1} + \sqrt{n+1}\,f_{n+1}\right) = \sum_{n=0}^{\infty}\left(\sqrt{n+1}\,\partial_q h_{n+1} + \sqrt{n}\,\partial_q h_{n-1}\right)f_n, \]

• with $h_{-1} \equiv 0$.

• Furthermore,
\[ \partial_q V\,\partial_p h = \sum_{n=0}^{\infty}\partial_q V\,h_n\,\partial_p f_n = \sum_{n=0}^{\infty}\partial_q V\,h_n\sqrt{n}\,f_{n-1} = \sum_{n=0}^{\infty}\partial_q V\,h_{n+1}\sqrt{n+1}\,f_n. \]

• Consequently:
\[ \mathcal{L}h = \mathcal{L}_1 h + \gamma\mathcal{L}_0 h = \sum_{n=0}^{\infty}\left(-\gamma n h_n + \sqrt{n+1}\,\partial_q h_{n+1} + \sqrt{n}\,\partial_q h_{n-1} - \sqrt{n+1}\,\partial_q V\,h_{n+1}\right)f_n. \]


• Using the orthonormality of the eigenfunctions of $\mathcal{L}_0$ we obtain the following set of equations which determine $\{h_n(q,t)\}_{n=0}^{\infty}$:
\[ \dot{h}_n = -\gamma n h_n + \sqrt{n+1}\,\partial_q h_{n+1} + \sqrt{n}\,\partial_q h_{n-1} - \sqrt{n+1}\,\partial_q V\,h_{n+1}, \qquad n = 0, 1, \ldots \]

• This set of equations is usually called the Brinkman hierarchy (1956).

• We can use this approach to develop a numerical method for solving the Klein-Kramers equation.

• For this we need to expand each coefficienthn in an appropriate basis with respect toq.

• Obvious choices are the Hermite basis (polynomial potentials) or the standard Fourier basis (periodic potentials).

• We will do this for the case of periodic potentials.

• The resulting method is usually called thecontinued fraction expansion. See Risken (1989).

• The Hermite expansion of the distribution function with respect to the velocity is used in the study of various kinetic equations (including the Boltzmann equation). It was initiated by Grad in the late 1940s.

• It is quite often used in the approximate calculation of transport coefficients (e.g. the diffusion coefficient).

• This expansion can be justified rigorously for the Fokker-Planck equation. See

• J. Meyer and J. Schroter,Comments on the Grad Procedure for the Fokker-Planck Equation, J. Stat.Phys. 32(1) pp.53-69 (1983).

• This expansion can also be used in order to solve the Poisson equation $-\mathcal{L}\phi = f(p,q)$. See G.A. Pavliotis and T. Vogiannou, Diffusive Transport in Periodic Potentials: Underdamped dynamics, Fluct. Noise Lett., 8(2), L155–L173 (2008).

6.9 Brownian Motion in a Harmonic Potential

There are very few potentials for which we can solve the Langevin equation or calculate the eigenvalues and eigenfunctions of the generator of the Markov process $\{q(t), p(t)\}$. One case where we can calculate everything explicitly is that of a Brownian particle in a quadratic (harmonic) potential
\[ V(q) = \frac{1}{2}\omega_0^2 q^2. \quad (6.54) \]

The Langevin equation is
\[ \ddot{q} = -\omega_0^2 q - \gamma\dot{q} + \sqrt{2\gamma\beta^{-1}}\,\dot{W}, \quad (6.55) \]
or
\[ \dot{q} = p, \qquad \dot{p} = -\omega_0^2 q - \gamma p + \sqrt{2\gamma\beta^{-1}}\,\dot{W}. \quad (6.56) \]


This is a linear equation that can be solved explicitly. Rather than doing this, we will calculate the eigenvalues and eigenfunctions of the generator, which takes the form
\[ \mathcal{L} = p\partial_q - \omega_0^2 q\partial_p + \gamma\left(-p\partial_p + \beta^{-1}\partial_p^2\right). \quad (6.57) \]
The Fokker-Planck operator is
\[ \mathcal{L}^*\cdot = -p\partial_q\cdot + \omega_0^2 q\partial_p\cdot + \gamma\left(\partial_p(p\,\cdot) + \beta^{-1}\partial_p^2\,\cdot\right). \quad (6.58) \]

The process $\{q(t), p(t)\}$ is an ergodic Markov process with Gaussian invariant measure
\[ \rho_\beta(q,p)\,dq\,dp = \frac{\beta\omega_0}{2\pi}\,e^{-\frac{\beta}{2}p^2 - \frac{\beta\omega_0^2}{2}q^2}\,dq\,dp. \quad (6.59) \]

For the calculation of the eigenvalues and eigenfunctions of the operator $\mathcal{L}$ it is convenient to introduce creation and annihilation operators in both the position and momentum variables. We set
\[ a^- = \beta^{-1/2}\partial_p, \qquad a^+ = -\beta^{-1/2}\partial_p + \beta^{1/2}p \quad (6.60) \]
and
\[ b^- = \omega_0^{-1}\beta^{-1/2}\partial_q, \qquad b^+ = -\omega_0^{-1}\beta^{-1/2}\partial_q + \omega_0\beta^{1/2}q. \quad (6.61) \]

We have that
\[ a^+a^- = -\beta^{-1}\partial_p^2 + p\partial_p \quad\text{and}\quad b^+b^- = -\omega_0^{-2}\beta^{-1}\partial_q^2 + q\partial_q. \]
Consequently, the operator
\[ \mathcal{L} = -a^+a^- - b^+b^- \quad (6.62) \]
is the generator of the OU process in two dimensions.

Exercise 6.9.1. Calculate the eigenvalues and eigenfunctions of $\mathcal{L}$. Show that there exists a transformation that transforms $\mathcal{L}$ into the Schrödinger operator of the two-dimensional quantum harmonic oscillator.

Exercise 6.9.2. Show that the operators $a^\pm$, $b^\pm$ satisfy the commutation relations
\[ [a^+, a^-] = -1, \quad (6.63a) \]
\[ [b^+, b^-] = -1, \quad (6.63b) \]
\[ [a^\pm, b^\pm] = 0. \quad (6.63c) \]

Using now the operators $a^\pm$ and $b^\pm$ we can write the generator $\mathcal{L}$ in the form
\[ \mathcal{L} = -\gamma a^+a^- - \omega_0(b^+a^- - a^+b^-), \quad (6.64) \]
which is a particular case of (6.51). In order to calculate the eigenvalues and eigenfunctions of (6.64) we need to make an appropriate change of variables in order to bring the operator $\mathcal{L}$ into the "decoupled" form (6.62). Clearly, this is a linear transformation and can be written in the form
\[ Y = AX \]


where $X = (q,p)$ for some $2\times 2$ matrix $A$. It is somewhat easier to make this change of variables at the level of the creation and annihilation operators. In particular, our goal is to find first order differential operators $c^\pm$ and $d^\pm$ so that the operator (6.64) becomes
\[ \mathcal{L} = -Cc^+c^- - Dd^+d^- \quad (6.65) \]
for some appropriate constants $C$ and $D$. Since our goal is, essentially, to map $\mathcal{L}$ to the two-dimensional OU process, we require that the operators $c^\pm$ and $d^\pm$ satisfy the canonical commutation relations
\[ [c^+, c^-] = -1, \quad (6.66a) \]
\[ [d^+, d^-] = -1, \quad (6.66b) \]
\[ [c^\pm, d^\pm] = 0. \quad (6.66c) \]

The operators $c^\pm$ and $d^\pm$ should be given as linear combinations of the old operators $a^\pm$ and $b^\pm$. From the structure of the generator $\mathcal{L}$ (6.64), the decoupled form (6.65) and the commutation relations (6.66) and (6.63) we conclude that $c^\pm$ and $d^\pm$ should be of the form
\[ c^+ = \alpha_{11}a^+ + \alpha_{12}b^+, \quad (6.67a) \]
\[ c^- = \alpha_{21}a^- + \alpha_{22}b^-, \quad (6.67b) \]
\[ d^+ = \beta_{11}a^+ + \beta_{12}b^+, \quad (6.67c) \]
\[ d^- = \beta_{21}a^- + \beta_{22}b^-. \quad (6.67d) \]

Notice that $c^-$ and $d^-$ are not the adjoints of $c^+$ and $d^+$. If we substitute now these equations into (6.65), equate it with (6.64), and substitute into the commutation relations (6.66), we obtain a system of equations for the coefficients $\{\alpha_{ij}\}$, $\{\beta_{ij}\}$. In order to write down the formulas for these coefficients it is convenient to introduce the eigenvalues of the deterministic problem
\[ \ddot{q} = -\gamma\dot{q} - \omega_0^2 q. \]
The solution of this equation is
\[ q(t) = C_1 e^{-\lambda_1 t} + C_2 e^{-\lambda_2 t}, \]

with
\[ \lambda_{1,2} = \frac{\gamma\pm\delta}{2}, \qquad \delta = \sqrt{\gamma^2 - 4\omega_0^2}. \quad (6.68) \]

The eigenvalues satisfy the relations
\[ \lambda_1 + \lambda_2 = \gamma, \qquad \lambda_1 - \lambda_2 = \delta, \qquad \lambda_1\lambda_2 = \omega_0^2. \quad (6.69) \]

Proposition 6.9.3. Let $\mathcal{L}$ be the generator (6.64) and let $c^\pm$, $d^\pm$ be the operators
\[ c^+ = \frac{1}{\sqrt{\delta}}\left(\sqrt{\lambda_1}\,a^+ + \sqrt{\lambda_2}\,b^+\right), \quad (6.70a) \]
\[ c^- = \frac{1}{\sqrt{\delta}}\left(\sqrt{\lambda_1}\,a^- - \sqrt{\lambda_2}\,b^-\right), \quad (6.70b) \]


\[ d^+ = \frac{1}{\sqrt{\delta}}\left(\sqrt{\lambda_2}\,a^+ + \sqrt{\lambda_1}\,b^+\right), \quad (6.70c) \]
\[ d^- = \frac{1}{\sqrt{\delta}}\left(-\sqrt{\lambda_2}\,a^- + \sqrt{\lambda_1}\,b^-\right). \quad (6.70d) \]

Then $c^\pm$, $d^\pm$ satisfy the canonical commutation relations (6.66) as well as
\[ [\mathcal{L}, c^\pm] = -\lambda_1 c^\pm, \qquad [\mathcal{L}, d^\pm] = -\lambda_2 d^\pm. \quad (6.71) \]
Furthermore, the operator $\mathcal{L}$ can be written in the form
\[ \mathcal{L} = -\lambda_1 c^+c^- - \lambda_2 d^+d^-. \quad (6.72) \]

Proof. First we check the commutation relations:
\[ [c^+, c^-] = \frac{1}{\delta}\left(\lambda_1[a^+, a^-] - \lambda_2[b^+, b^-]\right) = \frac{1}{\delta}(-\lambda_1 + \lambda_2) = -1. \]
Similarly,
\[ [d^+, d^-] = \frac{1}{\delta}\left(-\lambda_2[a^+, a^-] + \lambda_1[b^+, b^-]\right) = \frac{1}{\delta}(\lambda_2 - \lambda_1) = -1. \]
Clearly, we have that
\[ [c^+, d^+] = [c^-, d^-] = 0. \]
Furthermore,
\[ [c^+, d^-] = \frac{1}{\delta}\left(-\sqrt{\lambda_1\lambda_2}\,[a^+, a^-] + \sqrt{\lambda_1\lambda_2}\,[b^+, b^-]\right) = \frac{1}{\delta}\left(\sqrt{\lambda_1\lambda_2} - \sqrt{\lambda_1\lambda_2}\right) = 0. \]

Finally:
\[ [\mathcal{L}, c^+] = -\lambda_1 c^+c^-c^+ + \lambda_1 c^+c^+c^- = -\lambda_1 c^+(1 + c^+c^-) + \lambda_1 c^+c^+c^- = -\lambda_1 c^+, \]

and similarly for the other equations in (6.71). Now we calculate
\[ -\lambda_1 c^+c^- - \lambda_2 d^+d^- = \frac{\lambda_2^2 - \lambda_1^2}{\delta}\,a^+a^- + 0\cdot b^+b^- + \frac{\sqrt{\lambda_1\lambda_2}}{\delta}(\lambda_1 - \lambda_2)\,a^+b^- + \frac{\sqrt{\lambda_1\lambda_2}}{\delta}(\lambda_2 - \lambda_1)\,b^+a^- = -\gamma a^+a^- - \omega_0(b^+a^- - a^+b^-), \]
which is precisely (6.64). In the above calculation we used (6.69).


Using now (6.72) we can readily obtain the eigenvalues and eigenfunctions of $\mathcal{L}$. From our experience with the two-dimensional OU process (or the Schrödinger operator for the two-dimensional quantum harmonic oscillator), we expect that the eigenfunctions should be tensor products of Hermite polynomials. Indeed, we have the following, which is the main result of this section.

Theorem 6.9.4. The eigenvalues and eigenfunctions of the generator of the Markov process $\{q, p\}$ (6.56) are
\[ \lambda_{nm} = \lambda_1 n + \lambda_2 m = \frac{1}{2}\gamma(n+m) + \frac{1}{2}\delta(n-m), \qquad n, m = 0, 1, \ldots \quad (6.73) \]
and
\[ \phi_{nm}(q,p) = \frac{1}{\sqrt{n!m!}}(c^+)^n(d^+)^m 1, \qquad n, m = 0, 1, \ldots \quad (6.74) \]

Proof. We have
\[ [\mathcal{L}, (c^+)^2] = \mathcal{L}(c^+)^2 - (c^+)^2\mathcal{L} = (c^+\mathcal{L} - \lambda_1 c^+)c^+ - c^+(\mathcal{L}c^+ + \lambda_1 c^+) = -2\lambda_1(c^+)^2, \]
and similarly $[\mathcal{L}, (d^+)^2] = -2\lambda_2(d^+)^2$. A simple induction argument now shows that (see Exercise 6.9.5)
\[ [\mathcal{L}, (c^+)^n] = -n\lambda_1(c^+)^n \quad\text{and}\quad [\mathcal{L}, (d^+)^m] = -m\lambda_2(d^+)^m. \quad (6.75) \]
We use (6.75) to calculate
\[ \mathcal{L}(c^+)^n(d^+)^m 1 = (c^+)^n\mathcal{L}(d^+)^m 1 - n\lambda_1(c^+)^n(d^+)^m 1 = (c^+)^n(d^+)^m\mathcal{L}1 - m\lambda_2(c^+)^n(d^+)^m 1 - n\lambda_1(c^+)^n(d^+)^m 1 = -n\lambda_1(c^+)^n(d^+)^m 1 - m\lambda_2(c^+)^n(d^+)^m 1, \]
from which (6.73) and (6.74) follow.

Exercise 6.9.5. Show that
\[ [\mathcal{L}, (c^\pm)^n] = -n\lambda_1(c^\pm)^n, \quad [\mathcal{L}, (d^\pm)^n] = -n\lambda_2(d^\pm)^n, \quad [c^-, (c^+)^n] = n(c^+)^{n-1}, \quad [d^-, (d^+)^n] = n(d^+)^{n-1}. \quad (6.76) \]

Remark 6.9.6. In terms of the operators $a^\pm$, $b^\pm$ the eigenfunctions of $\mathcal{L}$ are
\[ \phi_{nm} = \sqrt{n!m!}\,\delta^{-\frac{n+m}{2}}\lambda_1^{n/2}\lambda_2^{m/2}\sum_{\ell=0}^{n}\sum_{k=0}^{m}\frac{1}{k!(m-k)!\,\ell!(n-\ell)!}\left(\frac{\lambda_1}{\lambda_2}\right)^{\frac{k-\ell}{2}}(a^+)^{n+m-k-\ell}(b^+)^{\ell+k}\,1. \]

The first few eigenfunctions are
\[ \phi_{00} = 1, \qquad \phi_{10} = \frac{\sqrt{\beta}\left(\sqrt{\lambda_1}\,p + \sqrt{\lambda_2}\,\omega_0 q\right)}{\sqrt{\delta}}. \]


\[ \phi_{01} = \frac{\sqrt{\beta}\left(\sqrt{\lambda_2}\,p + \sqrt{\lambda_1}\,\omega_0 q\right)}{\sqrt{\delta}}, \]
\[ \phi_{11} = \frac{-2\sqrt{\lambda_1\lambda_2} + \sqrt{\lambda_1\lambda_2}\,\beta p^2 + \beta(\lambda_1+\lambda_2)\,\omega_0 pq + \sqrt{\lambda_1\lambda_2}\,\omega_0^2\beta q^2}{\delta}, \]
\[ \phi_{20} = \frac{-\lambda_1 + \beta p^2\lambda_1 + 2\sqrt{\lambda_1\lambda_2}\,\beta\omega_0 pq - \lambda_2 + \omega_0^2\beta q^2\lambda_2}{\sqrt{2}\,\delta}, \]
\[ \phi_{02} = \frac{-\lambda_2 + \beta p^2\lambda_2 + 2\sqrt{\lambda_1\lambda_2}\,\beta\omega_0 pq - \lambda_1 + \omega_0^2\beta q^2\lambda_1}{\sqrt{2}\,\delta}. \]

Notice that the eigenfunctions are not orthonormal. As we already know, the first eigenvalue, corresponding to the constant eigenfunction, is $0$:
\[ \lambda_{00} = 0. \]
Notice that the operator $\mathcal{L}$ is not self-adjoint and consequently we do not expect its eigenvalues to be real. Indeed, whether the eigenvalues are real or not depends on the sign of the discriminant $\Delta = \gamma^2 - 4\omega_0^2$. In the underdamped regime, $\gamma < 2\omega_0$, the eigenvalues are complex:
\[ \lambda_{nm} = \frac{1}{2}\gamma(n+m) + \frac{1}{2}i\sqrt{-\gamma^2 + 4\omega_0^2}\,(n-m), \qquad \gamma < 2\omega_0. \]

This is to be expected, since in the underdamped regime the dynamics is dominated by the deterministic Hamiltonian dynamics that gives rise to the antisymmetric Liouville operator. We set $\omega = \frac{1}{2}\sqrt{4\omega_0^2 - \gamma^2}$, i.e. $\delta = 2i\omega$. The eigenvalues can be written as
\[ \lambda_{nm} = \frac{\gamma}{2}(n+m) + i\omega(n-m). \]

In Figure 6.1 we present the first few eigenvalues of $\mathcal{L}$ in the underdamped regime. The eigenvalues are contained in a cone on the right half of the complex plane. The cone is determined by
\[ \lambda_{n0} = \frac{\gamma}{2}n + i\omega n \quad\text{and}\quad \lambda_{0m} = \frac{\gamma}{2}m - i\omega m. \]
The eigenvalues along the diagonal are real:
\[ \lambda_{nn} = \gamma n. \]

On the other hand, in the overdamped regime, $\gamma > 2\omega_0$, all eigenvalues are real:
\[ \lambda_{nm} = \frac{1}{2}\gamma(n+m) + \frac{1}{2}\sqrt{\gamma^2 - 4\omega_0^2}\,(n-m), \qquad \gamma > 2\omega_0. \]

In fact, in the overdamped limit $\gamma \to +\infty$ (which we will study in Chapter 8), the eigenvalues of the generator $\mathcal{L}$ converge to the eigenvalues of the generator of the OU process:
\[ \lambda_{nm} = \gamma n - \frac{\omega_0^2}{\gamma}(n-m) + O(\gamma^{-3}). \]


[Figure 6.1: First few eigenvalues of $\mathcal{L}$ for $\gamma = \omega = 1$; horizontal axis $\mathrm{Re}(\lambda_{nm})$, vertical axis $\mathrm{Im}(\lambda_{nm})$.]


This is consistent with the fact that in this limit the solution of the Langevin equation converges to thesolution of the OU SDE. See Chapter 8 for details.

The eigenfunctions of $\mathcal{L}$ do not form an orthonormal basis in $L^2_\beta := L^2(\mathbb{R}^2, Z^{-1}e^{-\beta H})$ since $\mathcal{L}$ is not a self-adjoint operator. Using the eigenfunctions/eigenvalues of $\mathcal{L}$ we can easily calculate the eigenfunctions/eigenvalues of the $L^2_\beta$-adjoint of $\mathcal{L}$. From the calculations presented in Section 6.8 we know that the adjoint operator is
\[ \widehat{\mathcal{L}} := -\mathcal{A} + \gamma\mathcal{S} = \omega_0(b^+a^- - b^-a^+) - \gamma a^+a^- = -\lambda_1(c^-)^*(c^+)^* - \lambda_2(d^-)^*(d^+)^*, \]

−) ∗ (d+)∗,

where
\[ (c^+)^* = \frac{1}{\sqrt{\delta}}\left(\sqrt{\lambda_1}\,a^- + \sqrt{\lambda_2}\,b^-\right), \quad (6.77a) \]
\[ (c^-)^* = \frac{1}{\sqrt{\delta}}\left(\sqrt{\lambda_1}\,a^+ - \sqrt{\lambda_2}\,b^+\right), \quad (6.77b) \]
\[ (d^+)^* = \frac{1}{\sqrt{\delta}}\left(\sqrt{\lambda_2}\,a^- + \sqrt{\lambda_1}\,b^-\right), \quad (6.77c) \]
\[ (d^-)^* = \frac{1}{\sqrt{\delta}}\left(-\sqrt{\lambda_2}\,a^+ + \sqrt{\lambda_1}\,b^+\right). \quad (6.77d) \]

Exercise 6.9.7. i. Show by direct substitution that $\widehat{\mathcal{L}}$ can be written in the form
\[ \widehat{\mathcal{L}} = -\lambda_1(c^-)^*(c^+)^* - \lambda_2(d^-)^*(d^+)^*. \]

ii. Calculate the commutators
\[ [(c^+)^*, (c^-)^*], \quad [(d^+)^*, (d^-)^*], \quad [(c^\pm)^*, (d^\pm)^*], \quad [\widehat{\mathcal{L}}, (c^\pm)^*], \quad [\widehat{\mathcal{L}}, (d^\pm)^*]. \]

$\widehat{\mathcal{L}}$ has the same eigenvalues as $\mathcal{L}$:
\[ -\widehat{\mathcal{L}}\psi_{nm} = \lambda_{nm}\psi_{nm}, \]
where $\lambda_{nm}$ are given by (6.73). The eigenfunctions are
\[ \psi_{nm} = \frac{1}{\sqrt{n!m!}}\,((c^-)^*)^n((d^-)^*)^m 1. \quad (6.78) \]

Proposition 6.9.8. The eigenfunctions of $\mathcal{L}$ and $\widehat{\mathcal{L}}$ satisfy the biorthonormality relation
\[ \int\!\!\int \phi_{nm}\psi_{\ell k}\,\rho_\beta\,dp\,dq = \delta_{n\ell}\delta_{mk}. \quad (6.79) \]

Proof. We will use formulas (6.76). Notice that using the third and fourth of these equations together with the fact that $c^-1 = d^-1 = 0$ we can conclude that (for $n \geq \ell$)
\[ (c^-)^\ell(c^+)^n 1 = n(n-1)\cdots(n-\ell+1)\,(c^+)^{n-\ell}1. \quad (6.80) \]


We have
\[ \int\!\!\int \phi_{nm}\psi_{\ell k}\,\rho_\beta\,dp\,dq = \frac{1}{\sqrt{n!m!\ell!k!}}\int\!\!\int (c^+)^n(d^+)^m 1\,((c^-)^*)^\ell((d^-)^*)^k 1\,\rho_\beta\,dp\,dq = \frac{n(n-1)\cdots(n-\ell+1)\,m(m-1)\cdots(m-k+1)}{\sqrt{n!m!\ell!k!}}\int\!\!\int (c^+)^{n-\ell}(d^+)^{m-k}1\,\rho_\beta\,dp\,dq = \delta_{n\ell}\delta_{mk}, \]
since all non-constant eigenfunctions average to $0$ with respect to $\rho_\beta$.

From the eigenfunctions of $\widehat{\mathcal{L}}$ we can obtain the eigenfunctions of the Fokker-Planck operator. Using the formula (see equation (6.49))
\[ \mathcal{L}^*(f\rho_\beta) = \rho_\beta\widehat{\mathcal{L}}f, \]
we immediately conclude that the Fokker-Planck operator has the same eigenvalues as those of $\mathcal{L}$ and $\widehat{\mathcal{L}}$. The eigenfunctions are
\[ \psi^*_{nm} = \rho_\beta\psi_{nm} = \rho_\beta\frac{1}{\sqrt{n!m!}}\,((c^-)^*)^n((d^-)^*)^m 1. \quad (6.81) \]

6.10 Discussion and Bibliography

The proof of existence and uniqueness of classical solutions for the Fokker-Planck equation of a uniformly elliptic diffusion process with smooth drift and diffusion coefficients, Theorem 6.1.2, can be found in [7]. A standard textbook on PDEs, with a lot of material on parabolic PDEs, is [4]. The Fokker-Planck equation is studied extensively in Risken's monograph [20]. See also [10] and [14]. The connection between the Fokker-Planck equation and stochastic differential equations is presented in Chapter 7. See also [1, 8, 9].

Diffusion processes in one dimension are studied in [16]. The Feller classification for one dimensional diffusion processes can also be found in [15, 5].


Chapter 7

Stochastic Differential Equations

7.1 Introduction

• In this part of the course we will study stochastic differential equations (SDEs): ODEs driven by Gaussian white noise.

• LetW (t) denote a standardm–dimensional Brownian motion,h : Z → Rd a smooth vector-valued

function andγ : Z → Rd×m a smooth matrix valued function (in this course we will takeZ = T

d, Rd

or Rl ⊕ T

d−l.

• Consider the SDEdz

dt= h(z) + γ(z)

dW

dt, z(0) = z0. (7.1)

• We think of the termdWdt as representing Gaussian white noise: a mean-zero Gaussianprocess with

correlationδ(t− s)I.

• The functionh in (7.1) is sometimes referred to as thedrift andγ as thediffusion coefficient.

• Such a process exists only as a distribution. The precise interpretation of (7.1) is as an integral equationfor z(t) ∈ C(R+,Z):

z(t) = z0 +

∫ t

0h(z(s))ds +

∫ t

0γ(z(s))dW (s). (7.2)

• In order to make sense of this equation we need to define the stochastic integral againstW (s).
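The integral form (7.2) also suggests the simplest numerical scheme, the Euler-Maruyama discretization. The sketch below (not part of the original notes; the scalar case $d = m = 1$ and NumPy are assumed) integrates an SDE of the form (7.1) step by step:

```python
import numpy as np

def euler_maruyama(h, gamma, z0, T, n_steps, rng):
    """Integrate dz = h(z) dt + gamma(z) dW with the Euler-Maruyama scheme.

    The Brownian increments W(t_{k+1}) - W(t_k) are independent N(0, dt)
    random variables, mirroring the integral equation (7.2) step by step.
    """
    dt = T / n_steps
    z = np.empty(n_steps + 1)
    z[0] = z0
    for k in range(n_steps):
        dW = rng.normal(0.0, np.sqrt(dt))
        z[k + 1] = z[k] + h(z[k]) * dt + gamma(z[k]) * dW
    return z

# Sanity check: with gamma = 0 the scheme reduces to explicit Euler for
# the ODE dz/dt = -z, so z(1) should be close to exp(-1).
rng = np.random.default_rng(0)
path = euler_maruyama(lambda z: -z, lambda z: 0.0, 1.0, 1.0, 10_000, rng)
```
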

7.2 The Ito Stochastic Integral

• For the rigorous analysis of stochastic differential equations it is necessary to define stochastic integrals of the form
$$I(t) = \int_0^t f(s)\,dW(s), \tag{7.3}$$


• where $W(t)$ is a standard one-dimensional Brownian motion. This is not straightforward because $W(t)$ does not have bounded variation.

• In order to define the stochastic integral we assume that $f(t)$ is a random process, adapted to the filtration $\mathcal{F}_t$ generated by the process $W(t)$, and such that
$$\mathbb{E}\left(\int_0^T f(s)^2\,ds\right) < \infty.$$

• The Ito stochastic integral $I(t)$ is defined as the $L^2$-limit of the Riemann sum approximation of (7.3):
$$I(t) := \lim_{K\to\infty}\sum_{k=1}^{K-1} f(t_{k-1})\,(W(t_k) - W(t_{k-1})), \tag{7.4}$$

• where $t_k = k\Delta t$ and $K\Delta t = t$.

• Notice that the function $f(t)$ is evaluated at the left end of each interval $[t_{k-1}, t_k]$ in (7.4).

• The resulting Ito stochastic integral $I(t)$ is a.s. continuous in $t$.

• These ideas are readily generalized to the case where $W(s)$ is a standard $d$-dimensional Brownian motion and $f(s) \in \mathbb{R}^{m\times d}$ for each $s$.

• The resulting integral satisfies the Ito isometry
$$\mathbb{E}|I(t)|^2 = \int_0^t \mathbb{E}|f(s)|_F^2\,ds, \tag{7.5}$$

• where $|\cdot|_F$ denotes the Frobenius norm $|A|_F = \sqrt{\mathrm{tr}(A^T A)}$.

• The Ito stochastic integral is a martingale:
$$\mathbb{E}I(t) = 0 \qquad \text{and} \qquad \mathbb{E}[I(t)|\mathcal{F}_s] = I(s) \quad \forall\, t \geqslant s,$$
where $\mathcal{F}_s$ denotes the filtration generated by $W(s)$.
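Both the martingale property and the Ito isometry (7.5) can be checked by Monte Carlo for the concrete choice $f(s) = W(s)$, for which the right hand side of (7.5) equals $\int_0^t s\,ds = t^2/2$. A minimal sketch (NumPy assumed; not part of the original notes):

```python
import numpy as np

rng = np.random.default_rng(1)
t, K, n_paths = 1.0, 500, 20_000
dt = t / K
# Brownian increments for all paths at once; W has shape (n_paths, K+1)
dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, K))
W = np.concatenate([np.zeros((n_paths, 1)), np.cumsum(dW, axis=1)], axis=1)
# Left-endpoint (Ito) Riemann sum (7.4) with f = W
I = np.sum(W[:, :-1] * dW, axis=1)
mean_I = I.mean()              # martingale property predicts E I(t) = 0
second_moment = (I**2).mean()  # Ito isometry predicts t^2/2 = 0.5
```
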

Example 7.2.1. • Consider the Ito stochastic integral
$$I(t) = \int_0^t f(s)\,dW(s),$$

• where $f, W$ are scalar-valued. This is a martingale with quadratic variation
$$\langle I \rangle_t = \int_0^t (f(s))^2\,ds.$$

• More generally, for $f$, $W$ in arbitrary finite dimensions, the integral $I(t)$ is a martingale with quadratic variation
$$\langle I \rangle_t = \int_0^t (f(s)\otimes f(s))\,ds.$$


7.2.1 The Stratonovich Stochastic Integral

• In addition to the Ito stochastic integral, we can also define the Stratonovich stochastic integral. It is defined as the $L^2$-limit of a different Riemann sum approximation of (7.3), namely
$$I_{\mathrm{strat}}(t) := \lim_{K\to\infty}\sum_{k=1}^{K-1}\frac{1}{2}\big(f(t_{k-1}) + f(t_k)\big)\,(W(t_k) - W(t_{k-1})), \tag{7.6}$$

• where $t_k = k\Delta t$ and $K\Delta t = t$. Notice that the function $f(t)$ is evaluated at both endpoints of each interval $[t_{k-1}, t_k]$ in (7.6).

• The multidimensional Stratonovich integral is defined in a similar way. The resulting integral is written as
$$I_{\mathrm{strat}}(t) = \int_0^t f(s)\circ dW(s).$$

• The limit in (7.6) gives rise to an integral which differs from the Ito integral.

• The situation is more complex than that arising in the standard theory of Riemann integration for functions of bounded variation: in that case the points in $[t_{k-1}, t_k]$ where the integrand is evaluated do not affect the definition of the integral, via a limiting process.

• In the case of integration against Brownian motion, which does not have bounded variation, the limits differ.

• When $f$ and $W$ are correlated through an SDE, then a formula exists to convert between them.
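For $f = W$ the two limits can be computed in closed form: the left-endpoint sum (7.4) telescopes to $(W(t)^2 - \sum_k(\Delta W_k)^2)/2 \to (W(t)^2 - t)/2$, while the averaged sum (7.6) telescopes exactly to $W(t)^2/2$, so the two integrals differ by the deterministic quantity $t/2$. A short numerical illustration (NumPy assumed; not part of the original notes):

```python
import numpy as np

rng = np.random.default_rng(2)
t, K = 1.0, 100_000
dt = t / K
dW = rng.normal(0.0, np.sqrt(dt), size=K)
W = np.concatenate([[0.0], np.cumsum(dW)])

ito = np.sum(W[:-1] * dW)                    # f at the left endpoints, as in (7.4)
strat = np.sum(0.5 * (W[:-1] + W[1:]) * dW)  # f averaged over endpoints, as in (7.6)
# The Stratonovich sum telescopes exactly to W(t)^2 / 2, and the gap
# strat - ito = sum(dW^2) / 2 converges to t/2 as K grows.
```
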

7.3 Existence and Uniqueness of solutions for SDEs

Definition 7.3.1. By a solution of (7.1) we mean a $\mathcal{Z}$-valued stochastic process $z(t)$ on $t\in[0,T]$ with the properties:

i. $z(t)$ is continuous and $\mathcal{F}_t$-adapted, where the filtration is generated by the Brownian motion $W(t)$;

ii. $h(z(t)) \in L^1((0,T))$, $\gamma(z(t)) \in L^2((0,T))$;

iii. equation (7.1) holds for every $t\in[0,T]$ with probability $1$.

The solution is called unique if any two solutions $x_i(t)$, $i = 1, 2$, satisfy
$$\mathbb{P}(x_1(t) = x_2(t),\ \forall t\in[0,T]) = 1.$$

• It is well known that existence and uniqueness of solutions for ODEs (i.e. when $\gamma \equiv 0$ in (7.1)) holds for globally Lipschitz vector fields $h(x)$.


• A very similar theorem holds when $\gamma \neq 0$.

• As for ODEs the conditions can be weakened, when a priori bounds on the solution can be found.

Theorem 7.3.2. Assume that both $h(\cdot)$ and $\gamma(\cdot)$ are globally Lipschitz on $\mathcal{Z}$ and that $z_0$ is a random variable independent of the Brownian motion $W(t)$ with
$$\mathbb{E}|z_0|^2 < \infty.$$
Then the SDE (7.1) has a unique solution $z(t)\in C(\mathbb{R}^+;\mathcal{Z})$ with
$$\mathbb{E}\left[\int_0^T |z(t)|^2\,dt\right] < \infty \qquad \forall\, T < \infty.$$
Furthermore, the solution of the SDE is a Markov process.

Remarks 7.3.3. • The Stratonovich analogue of (7.1) is
$$\frac{dz}{dt} = h(z) + \gamma(z)\circ\frac{dW}{dt}, \qquad z(0) = z_0. \tag{7.7}$$

• By this we mean that $z \in C(\mathbb{R}^+,\mathcal{Z})$ satisfies the integral equation
$$z(t) = z(0) + \int_0^t h(z(s))\,ds + \int_0^t \gamma(z(s))\circ dW(s). \tag{7.8}$$

• By using definitions (7.4) and (7.6) it can be shown that $z$ satisfying the Stratonovich SDE (7.7) also satisfies the Ito SDE
$$\frac{dz}{dt} = h(z) + \frac{1}{2}\nabla\cdot\big(\gamma(z)\gamma(z)^T\big) - \frac{1}{2}\gamma(z)\,\nabla\cdot\big(\gamma(z)^T\big) + \gamma(z)\frac{dW}{dt}, \tag{7.9a}$$
$$z(0) = z_0, \tag{7.9b}$$

• provided that $\gamma(z)$ is differentiable.

• White noise is, in most applications, an idealization of a stationary random process with short correlation time. In this context the Stratonovich interpretation of an SDE is particularly important because it often arises as the limit obtained by using smooth approximations to white noise.

• On the other hand, the martingale machinery which comes with the Ito integral makes it more important as a mathematical object.

• It is very useful that we can convert from the Ito to the Stratonovich interpretation of the stochastic integral.

• There are other interpretations of the stochastic integral, e.g. the Klimontovich stochastic integral.

• The definition of Brownian motion implies the scaling property
$$W(ct) = \sqrt{c}\,W(t),$$


• where the above should be interpreted as holding in law. From this it follows that, if $s = ct$, then
$$\frac{dW}{ds} = \frac{1}{\sqrt{c}}\frac{dW}{dt},$$
again in law.

• Hence, if we rescale time to $s = ct$ in (7.1), then we get the equation
$$\frac{dz}{ds} = \frac{1}{c}h(z) + \frac{1}{\sqrt{c}}\gamma(z)\frac{dW}{ds}, \qquad z(0) = z_0.$$

7.4 Derivation of the Stratonovich SDE

The Stratonovich Stochastic Integral: A first application of multiscale methods

• When white noise is approximated by a smooth process this often leads to Stratonovich interpretations of stochastic integrals, at least in one dimension.

• We use multiscale analysis (singular perturbation theory for Markov processes) to illustrate this phenomenon in a one-dimensional example.

• Consider the equations
$$\frac{dx}{dt} = h(x) + \frac{1}{\varepsilon}f(x)y, \tag{7.10a}$$
$$\frac{dy}{dt} = -\frac{\alpha y}{\varepsilon^2} + \sqrt{\frac{2D}{\varepsilon^2}}\frac{dV}{dt}, \tag{7.10b}$$

• with $V$ being a standard one-dimensional Brownian motion.

• We say that the process $x(t)$ is driven by colored noise: the noise that appears in (7.10a) has non-zero correlation time. The correlation function of the colored noise $\eta(t) := y(t)/\varepsilon$ is (we take $y(0) = 0$)
$$R(t) = \mathbb{E}(\eta(t)\eta(s)) = \frac{1}{\varepsilon^2}\frac{D}{\alpha}e^{-\frac{\alpha}{\varepsilon^2}|t-s|}.$$

• The power spectrum of the colored noise $\eta(t)$ is
$$f^\varepsilon(x) = \frac{1}{\varepsilon^2}\frac{D\varepsilon^{-2}}{\pi}\frac{1}{x^2 + (\alpha\varepsilon^{-2})^2} = \frac{D}{\pi}\frac{1}{\varepsilon^4 x^2 + \alpha^2} \to \frac{D}{\pi\alpha^2}$$

• and, consequently,
$$\lim_{\varepsilon\to 0}\mathbb{E}\left(\frac{y(t)}{\varepsilon}\frac{y(s)}{\varepsilon}\right) = \frac{2D}{\alpha^2}\delta(t-s),$$

• which implies the heuristic
$$\lim_{\varepsilon\to 0}\frac{y(t)}{\varepsilon} = \sqrt{\frac{2D}{\alpha^2}}\frac{dV}{dt}. \tag{7.11}$$


• Another way of seeing this is by solving (7.10b) for $y/\varepsilon$:
$$\frac{y}{\varepsilon} = \sqrt{\frac{2D}{\alpha^2}}\frac{dV}{dt} - \frac{\varepsilon}{\alpha}\frac{dy}{dt}. \tag{7.12}$$

• If we neglect the $O(\varepsilon)$ term on the right hand side then we arrive, again, at the heuristic (7.11).

• Both of these arguments lead us to conjecture the limiting Ito SDE
$$\frac{dX}{dt} = h(X) + \frac{\sqrt{2D}}{\alpha}f(X)\frac{dV}{dt}. \tag{7.13}$$

• In fact, as applied, the heuristic gives the incorrect limit.

• Whenever white noise is approximated by a smooth process, the limiting equation should be interpreted in the Stratonovich sense, giving
$$\frac{dX}{dt} = h(X) + \frac{\sqrt{2D}}{\alpha}f(X)\circ\frac{dV}{dt}. \tag{7.14}$$

• This is usually called the Wong-Zakai theorem. A similar result is true in arbitrary finite and even infinite dimensions.

• We will show this using singular perturbation theory.
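Before the proof, the heuristic can be sanity-checked on the correlation function itself: $R(t)$ is a delta-family, and its total mass should approach the white-noise strength $2D/\alpha^2$ appearing above. A quick quadrature check (NumPy assumed; illustration only, not part of the notes):

```python
import numpy as np

D, alpha, eps = 0.7, 1.3, 0.1

def R(t):
    """Stationary correlation function of eta(t) = y(t)/eps for (7.10b)."""
    return (D / (alpha * eps**2)) * np.exp(-alpha * np.abs(t) / eps**2)

# Integrate R over a window much wider than the correlation time eps^2/alpha
tt = np.linspace(-1.0, 1.0, 400_001)
vals = R(tt)
area = float(np.sum(0.5 * (vals[1:] + vals[:-1]) * np.diff(tt)))  # trapezoid rule
white_noise_strength = 2 * D / alpha**2  # coefficient of delta(t - s) in the limit
```
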

Theorem 7.4.1. Assume that the initial conditions for $y(t)$ are stationary and that the function $f$ is smooth. Then the solution of eqn (7.10a) converges, in the limit as $\varepsilon\to 0$, to the solution of the Stratonovich SDE (7.14).

Remarks

i. It is possible to prove pathwise convergence under very mild assumptions.

ii. The generator of a Stratonovich SDE has the form
$$\mathcal{L}_{\mathrm{strat}} = h(x)\partial_x + \frac{D}{\alpha^2}f(x)\partial_x\big(f(x)\partial_x\big).$$

iii. Consequently, the Fokker-Planck operator of the Stratonovich SDE can be written in divergence form:
$$\mathcal{L}^*_{\mathrm{strat}}\cdot = -\partial_x(h(x)\cdot) + \frac{D}{\alpha^2}\partial_x\big(f(x)\partial_x(f(x)\cdot)\big).$$

iv. In most applications in physics the white noise is an approximation of a more complicated noise process with non-zero correlation time. Hence, the physically correct interpretation of the stochastic integral is the Stratonovich one.

v. In higher dimensions an additional drift term might appear due to the noncommutativity of the row vectors of the diffusion matrix. This is related to the Lévy area correction in the theory of rough paths.


Proof of Theorem 7.4.1. The generator of the process $(x(t), y(t))$ is
$$\mathcal{L} = \frac{1}{\varepsilon^2}\big(-\alpha y\partial_y + D\partial_y^2\big) + \frac{1}{\varepsilon}f(x)y\partial_x + h(x)\partial_x =: \frac{1}{\varepsilon^2}\mathcal{L}_0 + \frac{1}{\varepsilon}\mathcal{L}_1 + \mathcal{L}_2.$$

The "fast" process is a stationary Markov process with invariant density
$$\rho(y) = \sqrt{\frac{\alpha}{2\pi D}}\,e^{-\frac{\alpha y^2}{2D}}. \tag{7.15}$$

The backward Kolmogorov equation is
$$\frac{\partial u^\varepsilon}{\partial t} = \left(\frac{1}{\varepsilon^2}\mathcal{L}_0 + \frac{1}{\varepsilon}\mathcal{L}_1 + \mathcal{L}_2\right)u^\varepsilon. \tag{7.16}$$

We look for a solution to this equation in the form of a power series expansion in $\varepsilon$:
$$u^\varepsilon(x,y,t) = u_0 + \varepsilon u_1 + \varepsilon^2 u_2 + \dots$$

We substitute this into (7.16) and equate terms of the same power in $\varepsilon$ to obtain the following hierarchy of equations:
$$-\mathcal{L}_0 u_0 = 0,$$
$$-\mathcal{L}_0 u_1 = \mathcal{L}_1 u_0,$$
$$-\mathcal{L}_0 u_2 = \mathcal{L}_1 u_1 + \mathcal{L}_2 u_0 - \frac{\partial u_0}{\partial t}.$$

The ergodicity of the fast process implies that the null space of the generator $\mathcal{L}_0$ consists only of constants in $y$. Hence:
$$u_0 = u(x,t).$$

The second equation in the hierarchy becomes
$$-\mathcal{L}_0 u_1 = f(x)y\partial_x u.$$

This equation is solvable since the right hand side is orthogonal to the null space of the adjoint of $\mathcal{L}_0$ (this is the Fredholm alternative). We solve it using separation of variables:
$$u_1(x,y,t) = \frac{1}{\alpha}f(x)\partial_x u\, y + \psi_1(x,t).$$

In order for the third equation to have a solution we need to require that the right hand side is orthogonal to the null space of $\mathcal{L}_0^*$:
$$\int_{\mathbb{R}}\left(\mathcal{L}_1 u_1 + \mathcal{L}_2 u_0 - \frac{\partial u_0}{\partial t}\right)\rho(y)\,dy = 0.$$

We calculate:
$$\int_{\mathbb{R}}\frac{\partial u_0}{\partial t}\,\rho(y)\,dy = \frac{\partial u}{\partial t}.$$


Furthermore:
$$\int_{\mathbb{R}}\mathcal{L}_2 u_0\,\rho(y)\,dy = h(x)\partial_x u.$$

Finally,
$$\int_{\mathbb{R}}\mathcal{L}_1 u_1\,\rho(y)\,dy = \int_{\mathbb{R}}f(x)y\partial_x\left(\frac{1}{\alpha}f(x)\partial_x u\, y + \psi_1(x,t)\right)\rho(y)\,dy$$
$$= \frac{1}{\alpha}f(x)\partial_x\big(f(x)\partial_x u\big)\,\langle y^2\rangle + f(x)\partial_x\psi_1(x,t)\,\langle y\rangle$$
$$= \frac{D}{\alpha^2}f(x)\partial_x\big(f(x)\partial_x u\big) = \frac{D}{\alpha^2}f(x)\partial_x f(x)\,\partial_x u + \frac{D}{\alpha^2}f(x)^2\partial_x^2 u.$$

Putting everything together we obtain the limiting backward Kolmogorov equation
$$\frac{\partial u}{\partial t} = \left(h(x) + \frac{D}{\alpha^2}f(x)\partial_x f(x)\right)\partial_x u + \frac{D}{\alpha^2}f(x)^2\partial_x^2 u,$$
from which we read off the limiting Stratonovich SDE
$$\frac{dX}{dt} = h(X) + \frac{\sqrt{2D}}{\alpha}f(X)\circ\frac{dV}{dt}.$$

7.5 Ito versus Stratonovich

• A Stratonovich SDE
$$dX(t) = f(X(t))\,dt + \sigma(X(t))\circ dW(t) \tag{7.17}$$

• can be written as an Ito SDE
$$dX(t) = \left(f(X(t)) + \frac{1}{2}\left(\sigma\frac{d\sigma}{dx}\right)(X(t))\right)dt + \sigma(X(t))\,dW(t).$$

• Conversely, an Ito SDE
$$dX(t) = f(X(t))\,dt + \sigma(X(t))\,dW(t) \tag{7.18}$$

• can be written as a Stratonovich SDE
$$dX(t) = \left(f(X(t)) - \frac{1}{2}\left(\sigma\frac{d\sigma}{dx}\right)(X(t))\right)dt + \sigma(X(t))\circ dW(t).$$

• The Ito and Stratonovich interpretations of an SDE can lead to equations with very different properties!
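The two conversion formulas above are inverse to each other, which is easy to verify in code. A small sketch for the scalar case (illustration only, not from the notes):

```python
def strat_to_ito_drift(f, sigma, dsigma, x):
    """Ito drift of the SDE whose Stratonovich form is dX = f dt + sigma o dW."""
    return f(x) + 0.5 * sigma(x) * dsigma(x)

def ito_to_strat_drift(f, sigma, dsigma, x):
    """Stratonovich drift of the SDE whose Ito form is dX = f dt + sigma dW."""
    return f(x) - 0.5 * sigma(x) * dsigma(x)

# Example: sigma(x) = sigma0 * x (geometric Brownian motion). Reading
# dX = mu X dt + sigma0 X o dW in the Ito sense adds the drift sigma0^2 X / 2.
mu, sigma0 = 0.1, 0.4
f = lambda x: mu * x
sigma = lambda x: sigma0 * x
dsigma = lambda x: sigma0

x = 2.0
ito_drift = strat_to_ito_drift(f, sigma, dsigma, x)
```
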

• Multiplicative Noise.

– When the diffusion coefficient depends on the solution of the SDE $X(t)$, we will say that we have an equation with multiplicative noise.


– Multiplicative noise can lead to noise-induced phase transitions. See

∗ W. Horsthemke and R. Lefever, Noise-Induced Transitions, Springer-Verlag, Berlin 1984.

– This is a topic of current interest for SDEs in infinite dimensions (SPDEs).

Colored Noise

• When the noise which drives an SDE has non-zero correlation time we will say that we have colored noise.

• The properties of the SDE (stability, ergodicity etc.) are quite robust under "coloring of the noise". See

– G. Blankenship and G.C. Papanicolaou, Stability and control of stochastic systems with wide-band noise disturbances. I, SIAM J. Appl. Math., 34(3), 1978, pp. 437–476.

• Colored noise appears in many applications in physics and chemistry. For a review see

– P. Hanggi and P. Jung, Colored noise in dynamical systems, Adv. Chem. Phys. 89, 239 (1995).

• In the case where there is an additional small time scale in the problem, in addition to the correlation time of the colored noise, it is not clear what the right interpretation of the stochastic integral is (in the limit as both small time scales go to $0$). This is usually called the Ito versus Stratonovich problem.

• Consider, for example, the SDE
$$\tau\ddot{X} = -\dot{X} + v(X)\eta^\varepsilon(t),$$

• where $\eta^\varepsilon(t)$ is colored noise with correlation time $\varepsilon^2$.

• In the limit where both small time scales go to $0$ we can get either Ito or Stratonovich or neither. See

– G.A. Pavliotis and A.M. Stuart, Analysis of white noise limits for stochastic systems with two fast relaxation times, Multiscale Model. Simul., 4(1), 2005, pp. 1–35.

7.6 The Generator

• Given the function $\gamma(z)$ in the SDE (7.1) we define
$$\Gamma(z) = \gamma(z)\gamma(z)^T. \tag{7.19}$$

• The generator $\mathcal{L}$ is then defined as
$$\mathcal{L}v = h\cdot\nabla v + \frac{1}{2}\Gamma : \nabla\nabla v. \tag{7.20}$$

• This operator, equipped with a suitable domain of definition, is the generator of the Markov process given by (7.1).

• The formal $L^2$-adjoint operator $\mathcal{L}^*$ is
$$\mathcal{L}^* v = -\nabla\cdot(hv) + \frac{1}{2}\nabla\cdot\nabla\cdot(\Gamma v).$$


7.7 The Ito Formula

• The Ito formula enables us to calculate the rate of change in time of functions $V : \mathcal{Z}\to\mathbb{R}^n$ evaluated at the solution of a $\mathcal{Z}$-valued SDE.

• Formally, we can write:
$$\frac{d}{dt}\big(V(z(t))\big) = \mathcal{L}V(z(t)) + \left\langle \nabla V(z(t)),\ \gamma(z(t))\frac{dW}{dt}\right\rangle.$$

• Note that if $W$ were a smooth time-dependent function this formula would not be correct: there is an additional term in $\mathcal{L}V$, proportional to $\Gamma$, which arises from the lack of smoothness of Brownian motion.

• The precise interpretation of the expression for the rate of change of $V$ is in integrated form:

Lemma 7.7.1. (Ito's Formula) Assume that the conditions of Theorem 7.3.2 hold. Let $z(t)$ solve (7.1) and let $V\in C^2(\mathcal{Z},\mathbb{R}^n)$. Then the process $V(z(t))$ satisfies
$$V(z(t)) = V(z(0)) + \int_0^t \mathcal{L}V(z(s))\,ds + \int_0^t \langle\nabla V(z(s)),\ \gamma(z(s))\,dW(s)\rangle.$$

• Let $\phi : \mathcal{Z}\mapsto\mathbb{R}$ and consider the function
$$v(z,t) = \mathbb{E}\big(\phi(z(t))\,|\,z(0) = z\big), \tag{7.21}$$

• where the expectation is with respect to all Brownian driving paths. By averaging in the Ito formula, which removes the stochastic integral, and using the Markov property, it is possible to obtain the backward Kolmogorov equation.

• For a Stratonovich SDE the rules of standard calculus apply:

– Consider the Stratonovich SDE (7.17) and let $V(x)\in C^2(\mathbb{R})$. Then
$$dV(X(t)) = \frac{dV}{dx}(X(t))\,\big(f(X(t))\,dt + \sigma(X(t))\circ dW(t)\big).$$

– Consider the Stratonovich SDE (7.17) on $\mathbb{R}^d$ (i.e. $f\in\mathbb{R}^d$, $\sigma : \mathbb{R}^d\mapsto\mathbb{R}^{d\times n}$, $W(t)$ is standard Brownian motion on $\mathbb{R}^n$). The corresponding Fokker-Planck equation is
$$\frac{\partial\rho}{\partial t} = -\nabla\cdot(f\rho) + \frac{1}{2}\nabla\cdot\big(\sigma\nabla\cdot(\sigma\rho)\big). \tag{7.22}$$


7.8 Examples of SDEs

i. The SDE for Brownian motion is:
$$dX = \sqrt{2\sigma}\,dW, \qquad X(0) = x.$$

• The solution is:
$$X(t) = x + \sqrt{2\sigma}\,W(t).$$

ii. The SDE for the Ornstein-Uhlenbeck process is
$$dX = -\alpha X\,dt + \sqrt{2\lambda}\,dW, \qquad X(0) = x.$$

• We can solve this equation using the variation of constants formula:
$$X(t) = e^{-\alpha t}x + \sqrt{2\lambda}\int_0^t e^{-\alpha(t-s)}\,dW(s).$$

• We can use Ito's formula to obtain equations for the moments of the OU process. The generator is:
$$\mathcal{L} = -\alpha x\partial_x + \lambda\partial_x^2.$$

• We apply Ito's formula to the function $f(x) = x^n$ to obtain:
$$dX(t)^n = \mathcal{L}X(t)^n\,dt + \sqrt{2\lambda}\,\partial_x X(t)^n\,dW = -\alpha n X(t)^n\,dt + \lambda n(n-1)X(t)^{n-2}\,dt + n\sqrt{2\lambda}\,X(t)^{n-1}\,dW.$$

• Consequently:
$$X(t)^n = x^n + \int_0^t\big(-\alpha n X(s)^n + \lambda n(n-1)X(s)^{n-2}\big)\,ds + n\sqrt{2\lambda}\int_0^t X(s)^{n-1}\,dW.$$

• By taking the expectation in the above equation we obtain the equation for the moments of the OU process that we derived earlier using the Fokker-Planck equation:
$$M_n(t) = x^n + \int_0^t\big(-\alpha n M_n(s) + \lambda n(n-1)M_{n-2}(s)\big)\,ds.$$
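The moment hierarchy closes from below, so it can be integrated numerically and compared with the exact second moment $M_2(t) = x^2 e^{-2\alpha t} + \frac{\lambda}{\alpha}(1 - e^{-2\alpha t})$, which solves $\dot M_2 = -2\alpha M_2 + 2\lambda$. A minimal sketch (explicit Euler in time; NumPy assumed, not part of the notes):

```python
import numpy as np

alpha, lam, x0 = 1.0, 0.5, 2.0
T, n = 1.0, 20_000
dt = T / n
N = 4  # track the moments M_0, ..., M_4
M = np.array([x0**k for k in range(N + 1)], dtype=float)  # M_n(0) = x^n
for _ in range(n):
    dM = np.zeros_like(M)
    for k in range(1, N + 1):
        lower = M[k - 2] if k >= 2 else 0.0
        dM[k] = -alpha * k * M[k] + lam * k * (k - 1) * lower
    M = M + dt * dM

# Closed-form second moment of the OU process for comparison
M2_exact = x0**2 * np.exp(-2 * alpha * T) + (lam / alpha) * (1 - np.exp(-2 * alpha * T))
```
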

• Consider the geometric Brownian motion
$$dX(t) = \mu X(t)\,dt + \sigma X(t)\,dW(t), \tag{7.23}$$


– where we use the Ito interpretation of the stochastic differential. The generator of this process is
$$\mathcal{L} = \mu x\partial_x + \frac{\sigma^2 x^2}{2}\partial_x^2.$$

– The solution to this equation is
$$X(t) = X(0)\exp\left(\left(\mu - \frac{\sigma^2}{2}\right)t + \sigma W(t)\right). \tag{7.24}$$

• To derive this formula, we apply Ito's formula to the function $f(x) = \log(x)$:
$$d\log(X(t)) = \mathcal{L}\big(\log(X(t))\big)\,dt + \sigma x\,\partial_x\log(X(t))\,dW(t)$$
$$= \left(\mu x\frac{1}{x} + \frac{\sigma^2 x^2}{2}\left(-\frac{1}{x^2}\right)\right)dt + \sigma\,dW(t) = \left(\mu - \frac{\sigma^2}{2}\right)dt + \sigma\,dW(t).$$

• Consequently:
$$\log\left(\frac{X(t)}{X(0)}\right) = \left(\mu - \frac{\sigma^2}{2}\right)t + \sigma W(t),$$

• from which (7.24) follows.

• Notice that the Stratonovich interpretation of this equation leads to the solution
$$X(t) = X(0)\exp(\mu t + \sigma W(t)).$$

• Exercise: calculate all moments of the geometric Brownian motion for the Ito and Stratonovich interpretations of the stochastic integral.
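The closed-form solution (7.24) gives a convenient consistency test for a direct Euler-Maruyama discretization of (7.23), driving both with the same Brownian increments. A sketch (NumPy assumed; illustration only, not from the notes):

```python
import numpy as np

mu, sigma, X0 = 0.05, 0.3, 1.0
T, n = 1.0, 20_000
dt = T / n
rng = np.random.default_rng(3)
dW = rng.normal(0.0, np.sqrt(dt), size=n)

# Euler-Maruyama for the Ito SDE dX = mu X dt + sigma X dW ...
X_em = X0
for k in range(n):
    X_em = X_em + mu * X_em * dt + sigma * X_em * dW[k]

# ... compared against the exact Ito solution (7.24) on the same path
X_exact = X0 * np.exp((mu - sigma**2 / 2) * T + sigma * np.sum(dW))
rel_err = abs(X_em - X_exact) / X_exact
```
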

7.8.1 The Stochastic Landau Equation

• Consider the Landau equation:
$$\frac{dX_t}{dt} = X_t(c - X_t^2), \qquad X_0 = x. \tag{7.25}$$

• This is a gradient flow for the potential $V(x) = \frac{1}{4}x^4 - \frac{1}{2}cx^2$.

• When $c < 0$ all solutions are attracted to the single steady state $X_* = 0$.

• When $c > 0$ the steady state $X_* = 0$ becomes unstable and $X_t\to\sqrt{c}$ if $x > 0$ and $X_t\to-\sqrt{c}$ if $x < 0$.

• Consider additive random perturbations to the Landau equation:
$$\frac{dX_t}{dt} = X_t(c - X_t^2) + \sqrt{2\sigma}\,\frac{dW_t}{dt}, \qquad X_0 = x. \tag{7.26}$$


• This equation defines an ergodic Markov process on $\mathbb{R}$: there exists a unique invariant distribution
$$\rho(x) = Z^{-1}e^{-V(x)/\sigma}, \qquad Z = \int_{\mathbb{R}}e^{-V(x)/\sigma}\,dx, \qquad V(x) = \frac{1}{4}x^4 - \frac{1}{2}cx^2.$$

• $\rho(x)$ is a probability density for all values of $c\in\mathbb{R}$.

• The presence of additive noise in some sense "trivializes" the dynamics.

• The dependence of various averaged quantities on $c$ resembles the physical situation of a second order phase transition.

• Consider now multiplicative perturbations of the Landau equation:
$$\frac{dX_t}{dt} = X_t(c - X_t^2) + \sqrt{2\sigma}\,X_t\frac{dW_t}{dt}, \qquad X_0 = x, \tag{7.27}$$

• where the stochastic differential is interpreted in the Ito sense.

• The generator of this process is
$$\mathcal{L} = x(c - x^2)\partial_x + \sigma x^2\partial_x^2.$$

• Notice that $X_t = 0$ is always a solution of (7.27). Thus, if we start with $x > 0$ ($x < 0$) the solution will remain positive (negative).

• We will assume that $x > 0$.

• Consider the function $Y_t = \log(X_t)$. We apply Ito's formula to this function:
$$dY_t = \mathcal{L}\log(X_t)\,dt + \sqrt{2\sigma}\,X_t\partial_x\log(X_t)\,dW_t$$
$$= \left(X_t(c - X_t^2)\frac{1}{X_t} - \sigma X_t^2\frac{1}{X_t^2}\right)dt + \sqrt{2\sigma}\,X_t\frac{1}{X_t}\,dW_t$$
$$= (c - \sigma)\,dt - X_t^2\,dt + \sqrt{2\sigma}\,dW_t.$$

• Thus, we have been able to transform (7.27) into an SDE with additive noise:
$$dY_t = \big[(c - \sigma) - e^{2Y_t}\big]\,dt + \sqrt{2\sigma}\,dW_t. \tag{7.28}$$

This is a gradient flow with potential
$$V(y) = -\left[(c - \sigma)y - \frac{1}{2}e^{2y}\right].$$

• The invariant measure, if it exists, is of the form
$$\rho(y)\,dy = Z^{-1}e^{-V(y)/\sigma}\,dy.$$


• Going back to the variable $x$ we obtain:
$$\rho(x)\,dx = Z^{-1}x^{(c/\sigma - 2)}e^{-\frac{x^2}{2\sigma}}\,dx.$$

• We need to make sure that this distribution is integrable:
$$Z = \int_0^{+\infty}x^\gamma e^{-\frac{x^2}{2\sigma}}\,dx < \infty, \qquad \gamma = \frac{c}{\sigma} - 2.$$

• For this it is necessary that $\gamma > -1 \;\Rightarrow\; c > \sigma$.

• Not all multiplicative random perturbations lead to ergodic behavior!
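The integrability threshold can be made fully explicit: substituting $u = x^2/(2\sigma)$ gives $Z = \frac{1}{2}(2\sigma)^{(\gamma+1)/2}\,\Gamma\!\big(\tfrac{\gamma+1}{2}\big)$ for $\gamma > -1$, which blows up as $\gamma \downarrow -1$. A sketch comparing this closed form with a direct quadrature (NumPy assumed; illustration only, not from the notes):

```python
import math
import numpy as np

def landau_Z(c, sigma):
    """Z = int_0^inf x^gamma exp(-x^2 / (2 sigma)) dx with gamma = c/sigma - 2,
    via the Gamma function; infinite when gamma <= -1, i.e. when c <= sigma."""
    gamma_exp = c / sigma - 2.0
    if gamma_exp <= -1.0:
        return math.inf  # non-normalizable: no invariant probability density
    return 0.5 * (2.0 * sigma) ** ((gamma_exp + 1) / 2) * math.gamma((gamma_exp + 1) / 2)

# Cross-check against a trapezoid quadrature well inside the ergodic regime
c, sigma = 2.0, 0.5  # gamma = 2 > -1, i.e. c > sigma
x = np.linspace(1e-8, 20.0, 400_001)
vals = x ** (c / sigma - 2.0) * np.exp(-x**2 / (2 * sigma))
Z_num = float(np.sum(0.5 * (vals[1:] + vals[:-1]) * np.diff(x)))
```
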

• The dependence of the invariant distribution on $c$ is similar to the physical situation of first order phase transitions.

• Exercise: Analyze this problem for the Stratonovich interpretation of the stochastic integral.

• Exercise: Study additive and multiplicative random perturbations of the ODE
$$\frac{dx}{dt} = x(c + 2x^2 - x^4).$$

• For more information see M.C. Mackey, A. Longtin, A. Lasota, Noise-Induced Global Asymptotic Stability, J. Stat. Phys. 60 (5/6), pp. 735–751.

7.9 The Backward Kolmogorov Equation

Theorem 7.9.1. Assume that $\phi$ is chosen sufficiently smooth so that the backward Kolmogorov equation
$$\frac{\partial v}{\partial t} = \mathcal{L}v \quad \text{for } (z,t)\in\mathcal{Z}\times(0,\infty),$$
$$v = \phi \quad \text{for } (z,t)\in\mathcal{Z}\times\{0\}, \tag{7.29}$$
has a unique classical solution $v(z,t)\in C^{2,1}(\mathcal{Z}\times(0,\infty))$. Then $v$ is given by (7.21) where $z(t)$ solves (7.2).

7.9.1 Derivation of the Fokker-Planck Equation

• Now we can derive rigorously the Fokker-Planck equation.

Theorem 7.9.2. Consider equation (7.2) with $z(0)$ a random variable with density $\rho_0(z)$. Assume that the law of $z(t)$ has a density $\rho(z,t)\in C^{2,1}(\mathcal{Z}\times(0,\infty))$. Then $\rho$ satisfies the Fokker-Planck equation
$$\frac{\partial\rho}{\partial t} = \mathcal{L}^*\rho \quad \text{for } (z,t)\in\mathcal{Z}\times(0,\infty), \tag{7.30a}$$
$$\rho = \rho_0 \quad \text{for } (z,t)\in\mathcal{Z}\times\{0\}. \tag{7.30b}$$


Proof. • Let $\mathbb{E}^\mu$ denote averaging with respect to the product measure induced by the measure $\mu$ with density $\rho_0$ on $z(0)$ and the independent driving Wiener measure on the SDE itself.

• Averaging over random $z(0)$ distributed with density $\rho_0(z)$, we find
$$\mathbb{E}^\mu(\phi(z(t))) = \int_{\mathcal{Z}}v(z,t)\rho_0(z)\,dz = \int_{\mathcal{Z}}(e^{\mathcal{L}t}\phi)(z)\rho_0(z)\,dz = \int_{\mathcal{Z}}(e^{\mathcal{L}^*t}\rho_0)(z)\phi(z)\,dz.$$

• But since $\rho(z,t)$ is the density of $z(t)$ we also have
$$\mathbb{E}^\mu(\phi(z(t))) = \int_{\mathcal{Z}}\rho(z,t)\phi(z)\,dz.$$

• Equating these two expressions for the expectation at time $t$ we obtain
$$\int_{\mathcal{Z}}(e^{\mathcal{L}^*t}\rho_0)(z)\phi(z)\,dz = \int_{\mathcal{Z}}\rho(z,t)\phi(z)\,dz.$$

• We use a density argument so that the identity can be extended to all $\phi\in L^2(\mathcal{Z})$. Hence, from the above equation we deduce that
$$\rho(z,t) = \big(e^{\mathcal{L}^*t}\rho_0\big)(z).$$

• Differentiation of the above equation gives (7.30a).

• Setting $t = 0$ gives the initial condition (7.30b).


Chapter 8

The Smoluchowski and Freidlin-Wentzell Limits

• There are very few SDEs/Fokker-Planck equations that can be solved explicitly.

• In most cases we need to study the problem under investigation either approximately or numerically.

• In this part of the course we will develop approximate methods for studying various stochastic systems of practical interest.

• There are many problems of physical interest that can be analyzed using techniques from perturbation theory and asymptotic analysis:

i. Small noise asymptotics at finite time intervals.

ii. Small noise asymptotics/large times (rare events): the theory of large deviations, escape from a potential well, exit time problems.

iii. Small and large friction asymptotics for the Fokker-Planck equation: the Freidlin-Wentzell (underdamped) and Smoluchowski (overdamped) limits.

iv. Large time asymptotics for the Langevin equation in a periodic potential: homogenization and averaging.

v. Stochastic systems with two characteristic time scales: multiscale problems and methods.

8.1 Asymptotics for the Langevin equation

• We will study various asymptotic limits for the Langevin equation (we have set $m = 1$)
$$\ddot{q} = -\nabla V(q) - \gamma\dot{q} + \sqrt{2\gamma\beta^{-1}}\,\dot{W}. \tag{8.1}$$

• There are two parameters in the problem, the friction coefficient $\gamma$ and the inverse temperature $\beta$.


• We want to study the qualitative behavior of solutions to this equation (and to the corresponding Fokker-Planck equation).

• There are various asymptotic limits in which we can eliminate some of the variables of the equation and obtain a simpler equation for fewer variables.

• In the large temperature limit, $\beta\ll 1$, the dynamics of (8.1) is dominated by diffusion: the Langevin equation (8.1) can be approximated by free Brownian motion:
$$\dot{q} = \sqrt{2\gamma\beta^{-1}}\,\dot{W}.$$

• The small temperature asymptotics, $\beta\gg 1$, is much more interesting and more subtle. It leads to exponential, Arrhenius type asymptotics for the reaction rate (in the case of a particle escaping from a potential well due to thermal noise) or the diffusion coefficient (in the case of a particle moving in a periodic potential in the presence of thermal noise)
$$\kappa = \nu\exp(-\beta E_b), \tag{8.2}$$

• where $\kappa$ can be either the reaction rate or the diffusion coefficient. The small temperature asymptotics will be studied later for the case of a bistable potential (reaction rate) and for the case of a periodic potential (diffusion coefficient).

• Assuming that the temperature is fixed, the only parameter that is left is the friction coefficient $\gamma$. The large and small friction asymptotics can be expressed in terms of a slow/fast system of SDEs.

• In many applications (especially in biology) the friction coefficient is large: $\gamma\gg 1$. In this case the momentum is the fast variable, which we can eliminate to obtain an equation for the position. This is the overdamped or Smoluchowski limit.

• In various problems in physics the friction coefficient is small: $\gamma\ll 1$. In this case the position is the fast variable whereas the energy is the slow variable. We can eliminate the position and obtain an equation for the energy. This is the underdamped or Freidlin-Wentzell limit.

• In both cases we have to look at sufficiently long time scales.

• We rescale the solution to (8.1): $q^\gamma(t) = \lambda_\gamma\, q(t/\mu_\gamma)$.

• This rescaled process satisfies the equation
$$\ddot{q}^\gamma = -\frac{\lambda_\gamma}{\mu_\gamma^2}\partial_q V(q^\gamma/\lambda_\gamma) - \frac{\gamma}{\mu_\gamma}\dot{q}^\gamma + \sqrt{2\gamma\lambda_\gamma^2\mu_\gamma^{-3}\beta^{-1}}\,\dot{W}. \tag{8.3}$$

• Different choices for these two parameters lead to the overdamped and underdamped limits:

• $\lambda_\gamma = 1$, $\mu_\gamma = \gamma^{-1}$, $\gamma\gg 1$.


• In this case equation (8.3) becomes
$$\gamma^{-2}\ddot{q}^\gamma = -\partial_q V(q^\gamma) - \dot{q}^\gamma + \sqrt{2\beta^{-1}}\,\dot{W}. \tag{8.4}$$

• Under this scaling, the interesting limit is the overdamped limit, $\gamma\gg 1$.

• We will see later that in the limit as $\gamma\to+\infty$ the solution to (8.4) can be approximated by the solution to
$$\dot{q} = -\partial_q V + \sqrt{2\beta^{-1}}\,\dot{W}.$$

• $\lambda_\gamma = 1$, $\mu_\gamma = \gamma$, $\gamma\ll 1$:
$$\ddot{q}^\gamma = -\gamma^{-2}\nabla V(q^\gamma) - \dot{q}^\gamma + \sqrt{2\gamma^{-2}\beta^{-1}}\,\dot{W}. \tag{8.5}$$

• Under this scaling the interesting limit is the underdamped limit, $\gamma\ll 1$.

• We will see later that in the limit as $\gamma\to 0$ the energy of the solution to (8.5) converges to a stochastic process on a graph.

8.2 The Kramers to Smoluchowski Limit

• We consider the rescaled Langevin equation (8.4):
$$\varepsilon^2\ddot{q}^\gamma(t) = -\nabla V(q^\gamma(t)) - \dot{q}^\gamma(t) + \sqrt{2\beta^{-1}}\,\dot{W}(t), \tag{8.6}$$

• where we have set $\varepsilon^{-1} = \gamma$, since we are interested in the limit $\gamma\to\infty$, i.e. $\varepsilon\to 0$.

• We will show that, in the limit as $\varepsilon\to 0$, $q^\gamma(t)$, the solution of the Langevin equation (8.6), converges to $q(t)$, the solution of the Smoluchowski equation
$$\dot{q} = -\nabla V + \sqrt{2\beta^{-1}}\,\dot{W}. \tag{8.7}$$

• We write (8.6) as a system of SDEs:
$$\dot{q} = \frac{1}{\varepsilon}p, \tag{8.8}$$
$$\dot{p} = -\frac{1}{\varepsilon}\nabla V(q) - \frac{1}{\varepsilon^2}p + \sqrt{\frac{2}{\beta\varepsilon^2}}\,\dot{W}. \tag{8.9}$$

• This system of SDEs defines a Markov process in phase space. Its generator is
$$\mathcal{L}^\varepsilon = \frac{1}{\varepsilon^2}\big(-p\cdot\nabla_p + \beta^{-1}\Delta_p\big) + \frac{1}{\varepsilon}\big(p\cdot\nabla_q - \nabla_q V\cdot\nabla_p\big) =: \frac{1}{\varepsilon^2}\mathcal{L}_0 + \frac{1}{\varepsilon}\mathcal{L}_1.$$

• This is a singularly perturbed differential operator.


• We will derive the Smoluchowski equation (8.7) using a pathwise technique, as well as by analyzing the corresponding Kolmogorov equations.

• We apply Ito's formula to $p$:
$$dp(t) = \mathcal{L}^\varepsilon p(t)\,dt + \frac{1}{\varepsilon}\sqrt{2\beta^{-1}}\,\partial_p p(t)\,dW$$
$$= -\frac{1}{\varepsilon^2}p(t)\,dt - \frac{1}{\varepsilon}\nabla_q V(q(t))\,dt + \frac{1}{\varepsilon}\sqrt{2\beta^{-1}}\,dW.$$

• Consequently:
$$\frac{1}{\varepsilon}\int_0^t p(s)\,ds = -\int_0^t \nabla_q V(q(s))\,ds + \sqrt{2\beta^{-1}}\,W(t) + O(\varepsilon).$$

• From equation (8.8) we have that
$$q(t) = q(0) + \frac{1}{\varepsilon}\int_0^t p(s)\,ds.$$

• Combining the above two equations we deduce
$$q(t) = q(0) - \int_0^t \nabla_q V(q(s))\,ds + \sqrt{2\beta^{-1}}\,W(t) + O(\varepsilon),$$

• from which (8.7) follows.
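The pathwise argument can be illustrated numerically for the harmonic potential $V(q) = q^2/2$: integrating the system (8.8)-(8.9) with a small $\varepsilon$ and the Smoluchowski equation (8.7) with the same Brownian increments produces nearby paths. A sketch (NumPy assumed; the step size must resolve the fast $O(\varepsilon^2)$ relaxation; illustration only, not from the notes):

```python
import numpy as np

eps, beta = 0.1, 1.0  # eps = 1/gamma; V(q) = q^2/2 so grad V(q) = q
T, n = 1.0, 100_000
dt = T / n
rng = np.random.default_rng(4)
dW = rng.normal(0.0, np.sqrt(dt), size=n)
dV = lambda q: q

# Rescaled Langevin dynamics (8.8)-(8.9), explicit Euler-Maruyama
q_lang, p = 0.0, 0.0
for k in range(n):
    q_next = q_lang + dt * p / eps
    p_next = p + dt * (-dV(q_lang) / eps - p / eps**2) \
        + np.sqrt(2.0 / (beta * eps**2)) * dW[k]
    q_lang, p = q_next, p_next

# Smoluchowski limit (8.7), driven by the same Brownian path
q_smol = 0.0
for k in range(n):
    q_smol = q_smol + dt * (-dV(q_smol)) + np.sqrt(2.0 / beta) * dW[k]

gap = abs(q_lang - q_smol)  # small, consistent with the O(eps) error above
```
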

• Notice that in this derivation we assumed that
$$\mathbb{E}|p(t)|^2 \leqslant C.$$

• This estimate is true, under appropriate assumptions on the potential $V(q)$ and on the initial conditions.

• In fact, we can prove a pathwise approximation result:
$$\left(\mathbb{E}\sup_{t\in[0,T]}|q^\gamma(t) - q(t)|^p\right)^{1/p} \leqslant C\varepsilon^{2-\kappa},$$

• where $\kappa > 0$ is arbitrarily small (it accounts for logarithmic corrections).

• For the rigorous proof see

– E. Nelson, Dynamical Theories of Brownian Motion, Princeton University Press, 1967.

• A similar approximation theorem is also valid in infinite dimensions (i.e. for SPDEs):


– S. Cerrai and M. Freidlin, On the Smoluchowski-Kramers approximation for a system with an infinite number of degrees of freedom, Probab. Theory Related Fields, 135(3), 2006, pp. 363–394.

• The pathwise derivation of the Smoluchowski equation implies that the solution of the Fokker-Planck equation corresponding to the Langevin equation (8.6) converges (in some appropriate sense to be explained below) to the solution of the Fokker-Planck equation corresponding to the Smoluchowski equation (8.7).

• It is important in various applications to calculate corrections to the limiting Fokker-Planck equation.

• We can accomplish this by analyzing the Fokker-Planck equation for (8.6) using singular perturbation theory.

• We will consider the problem in one dimension. This is mainly to simplify the notation. The multi-dimensional problem can be treated in a very similar way.

• The Fokker-Planck equation associated to equations (8.8) and (8.9) is
$$\frac{\partial\rho}{\partial t} = \mathcal{L}^*\rho = \frac{1}{\varepsilon}\big(-p\partial_q\rho + \partial_q V(q)\partial_p\rho\big) + \frac{1}{\varepsilon^2}\big(\partial_p(p\rho) + \beta^{-1}\partial_p^2\rho\big) =: \left(\frac{1}{\varepsilon^2}\mathcal{L}_0^* + \frac{1}{\varepsilon}\mathcal{L}_1^*\right)\rho. \tag{8.10}$$

• The invariant distribution of the Markov process $(q, p)$, if it exists, is
$$\rho_0(p,q) = \frac{1}{Z}e^{-\beta H(p,q)}, \qquad Z = \int_{\mathbb{R}^2}e^{-\beta H(p,q)}\,dp\,dq,$$

• where $H(p,q) = \frac{1}{2}p^2 + V(q)$. We define the function $f(p,q,t)$ through
$$\rho(p,q,t) = f(p,q,t)\,\rho_0(p,q). \tag{8.11}$$

Theorem 8.2.1. The function $f(p,q,t)$ defined in (8.11) satisfies the equation
$$\frac{\partial f}{\partial t} = \left[\frac{1}{\varepsilon^2}\big(-p\partial_p + \beta^{-1}\partial_p^2\big) - \frac{1}{\varepsilon}\big(p\partial_q - \partial_q V(q)\partial_p\big)\right]f =: \left(\frac{1}{\varepsilon^2}\mathcal{L}_0 - \frac{1}{\varepsilon}\mathcal{L}_1\right)f. \tag{8.12}$$

Remark

• This is "almost" the backward Kolmogorov equation, with the difference that we have $-\mathcal{L}_1$ instead of $\mathcal{L}_1$. This is related to the fact that $\mathcal{L}_0$ is a symmetric operator in $L^2(\mathbb{R}^2; Z^{-1}e^{-\beta H(p,q)})$, whereas $\mathcal{L}_1$ is antisymmetric.


Proof. • We note that L0*ρ0 = 0 and L1*ρ0 = 0, and that ∂pρ0 = −βpρ0. We use this to calculate:

\[
\begin{aligned}
\mathcal{L}_0^*\rho = \mathcal{L}_0^*(f\rho_0)
&= \partial_p(p f\rho_0) + \beta^{-1}\partial_p^2(f\rho_0) \\
&= \rho_0\,p\,\partial_p f + \rho_0\,\beta^{-1}\partial_p^2 f + f\,\mathcal{L}_0^*\rho_0 + 2\beta^{-1}\partial_p f\,\partial_p\rho_0 \\
&= \bigl(-p\,\partial_p f + \beta^{-1}\partial_p^2 f\bigr)\rho_0 = \rho_0\,\mathcal{L}_0 f.
\end{aligned}
\]

• Similarly,

\[
\mathcal{L}_1^*\rho = \mathcal{L}_1^*(f\rho_0)
= \bigl(-p\,\partial_q + \partial_q V\,\partial_p\bigr)(f\rho_0)
= \rho_0\bigl(-p\,\partial_q f + \partial_q V\,\partial_p f\bigr) = -\rho_0\,\mathcal{L}_1 f.
\]

• Consequently, the Fokker–Planck equation (8.10) becomes

\[
\rho_0\,\frac{\partial f}{\partial t} = \rho_0\Bigl(\frac{1}{\epsilon^2}\mathcal{L}_0 f - \frac{1}{\epsilon}\mathcal{L}_1 f\Bigr),
\]

• from which the claim follows.
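The two operator identities used in this proof can be checked symbolically. The following sketch (an illustration, not part of the notes) uses sympy, with the operators defined as in (8.10) and (8.12) and the unnormalized density e^{−βH} standing in for ρ0:

```python
import sympy as sp

p, q = sp.symbols('p q', real=True)
beta = sp.symbols('beta', positive=True)
f = sp.Function('f')(p, q)
V = sp.Function('V')(q)

rho0 = sp.exp(-beta * (p**2 / 2 + V))     # unnormalised invariant density

# Forward (adjoint) operators acting on densities, and the generators
L0_star = lambda r: sp.diff(p * r, p) + sp.diff(r, p, 2) / beta
L1_star = lambda r: -p * sp.diff(r, q) + sp.diff(V, q) * sp.diff(r, p)
L0 = lambda g: -p * sp.diff(g, p) + sp.diff(g, p, 2) / beta
L1 = lambda g: p * sp.diff(g, q) - sp.diff(V, q) * sp.diff(g, p)

# The identities L0*(f rho0) = rho0 L0 f and L1*(f rho0) = -rho0 L1 f
assert sp.simplify(L0_star(f * rho0) - rho0 * L0(f)) == 0
assert sp.simplify(L1_star(f * rho0) + rho0 * L1(f)) == 0
print("identities verified")
```

Both differences cancel exactly, confirming the symmetry/antisymmetry structure noted in the remark above.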

• We look for a solution to (8.12) in the form of a power series in ε:

\[
f(p,q,t) = \sum_{n=0}^{\infty}\epsilon^n f_n(p,q,t). \tag{8.13}
\]

• We substitute this expansion into eqn. (8.12) to obtain the following system of equations:

\[
\begin{aligned}
\mathcal{L}_0 f_0 &= 0, &\text{(8.14)}\\
-\mathcal{L}_0 f_1 &= -\mathcal{L}_1 f_0, &\text{(8.15)}\\
-\mathcal{L}_0 f_2 &= -\mathcal{L}_1 f_1 - \frac{\partial f_0}{\partial t}, &\text{(8.16)}\\
-\mathcal{L}_0 f_{n+1} &= -\mathcal{L}_1 f_n - \frac{\partial f_{n-1}}{\partial t}, \qquad n = 2, 3, \dots &\text{(8.17)}
\end{aligned}
\]

• The null space of L0 consists of constants in p. Consequently, from equation (8.14) we conclude that

\[
f_0 = f(q,t).
\]

• Now we can calculate the right-hand side of equation (8.15):

\[
\mathcal{L}_1 f_0 = p\,\partial_q f.
\]

• Equation (8.15) becomes:

\[
\mathcal{L}_0 f_1 = p\,\partial_q f.
\]


• The right-hand side of this equation is orthogonal to N(L0*) and consequently there exists a unique solution. We obtain this solution using separation of variables:

\[
f_1 = -p\,\partial_q f + \psi_1(q,t).
\]

• Now we can calculate the right-hand side of equation (8.16). We need to calculate L1f1:

\[
-\mathcal{L}_1 f_1 = \bigl(p\,\partial_q - \partial_q V\,\partial_p\bigr)\bigl(p\,\partial_q f - \psi_1(q,t)\bigr)
= p^2\,\partial_q^2 f - p\,\partial_q\psi_1 - \partial_q V\,\partial_q f.
\]

• The solvability condition for (8.16) is

\[
\int_{\mathbb{R}}\Bigl(-\mathcal{L}_1 f_1 - \frac{\partial f_0}{\partial t}\Bigr)\rho_{OU}(p)\,dp = 0,
\]

• from which we obtain the backward Kolmogorov equation corresponding to the Smoluchowski SDE:

\[
\frac{\partial f}{\partial t} = -\partial_q V\,\partial_q f + \beta^{-1}\partial_q^2 f. \tag{8.18}
\]

• Now we solve the equation for f2. We use (8.18) to write (8.16) in the form

\[
\mathcal{L}_0 f_2 = \bigl(\beta^{-1} - p^2\bigr)\partial_q^2 f + p\,\partial_q\psi_1.
\]

• The solution of this equation is

\[
f_2(p,q,t) = \frac{1}{2}\partial_q^2 f(q,t)\,p^2 - \partial_q\psi_1(q,t)\,p + \psi_2(q,t).
\]

• The right-hand side of the equation for f3 is

\[
\mathcal{L}_1 f_2 = \frac{1}{2}p^3\,\partial_q^3 f - p^2\,\partial_q^2\psi_1 + p\,\partial_q\psi_2
- \partial_q V\,\partial_q^2 f\,p + \partial_q V\,\partial_q\psi_1.
\]

• The solvability condition

\[
\int_{\mathbb{R}} \mathcal{L}_1 f_2\,\rho_{OU}(p)\,dp = 0
\]

• leads to the equation

\[
-\partial_q V\,\partial_q\psi_1 + \beta^{-1}\partial_q^2\psi_1 = 0.
\]

• The only solution to this equation which is an element of L²(e^{−βV(q)}) is

\[
\psi_1 \equiv 0.
\]

• Putting everything together we obtain the first two terms in the ε-expansion of the solution of the Fokker–Planck equation (8.10):

\[
\rho(p,q,t) = Z^{-1}e^{-\beta H(p,q)}\bigl(f + \epsilon(-p\,\partial_q f) + \mathcal{O}(\epsilon^2)\bigr),
\]


• where f is the solution of (8.18).

• Notice that we can rewrite the leading-order term of the expansion in the form

\[
\rho(p,q,t) = (2\pi\beta^{-1})^{-1/2}\,e^{-\beta p^2/2}\,\rho_V(q,t) + \mathcal{O}(\epsilon),
\]

• where ρV = Z⁻¹e^{−βV(q)} f is the solution of the Smoluchowski Fokker–Planck equation

\[
\frac{\partial \rho_V}{\partial t} = \partial_q\bigl(\partial_q V\,\rho_V\bigr) + \beta^{-1}\partial_q^2\rho_V.
\]
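A quick numerical illustration of this factorization (a sketch, assuming the rescaled Langevin system behind (8.10) reads dq = (p/ε) dt, dp = (−V′(q)/ε − p/ε²) dt + √(2β⁻¹/ε²) dW, and using the hypothetical double-well potential V(q) = q⁴/4 − q²/2): the empirical p-marginal should be close to the Maxwellian N(0, β⁻¹).

```python
import numpy as np

rng = np.random.default_rng(0)
beta, eps, dt, nsteps = 1.0, 0.5, 1e-3, 200_000
Vp = lambda q: q**3 - q                  # V'(q) for V(q) = q^4/4 - q^2/2

q, p = 0.0, 0.0
ps = np.empty(nsteps)
for n in range(nsteps):
    # Euler-Maruyama: dq = (p/eps) dt, dp = (-V'(q)/eps - p/eps^2) dt + sqrt(2/(beta eps^2)) dW
    dW = rng.normal(0.0, np.sqrt(dt))
    q_new = q + (p / eps) * dt
    p = p + (-Vp(q) / eps - p / eps**2) * dt + np.sqrt(2.0 / (beta * eps**2)) * dW
    q = q_new
    ps[n] = p

print(np.var(ps))   # close to 1/beta, the variance of the Maxwellian p-marginal
```

The sampled momentum variance matches β⁻¹ up to statistical and time-discretization error, consistent with the leading-order product form of ρ.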

• It is possible to expand the n-th term in the expansion (8.13) in terms of Hermite functions (the eigenfunctions of the generator of the OU process):

\[
f_n(p,q,t) = \sum_{k=0}^{n} f_{nk}(q,t)\,\phi_k(p), \tag{8.19}
\]

where φk(p) is the k-th eigenfunction of L0:

\[
-\mathcal{L}_0\phi_k = \lambda_k\phi_k.
\]

• We can obtain the following system of equations (with \(\mathcal{L} := \beta^{-1}\partial_q - \partial_q V\)):

\[
\begin{aligned}
\mathcal{L} f_{n,1} &= 0, \\
\sqrt{(k+1)\beta^{-1}}\,\mathcal{L} f_{n,k+1} + \sqrt{k\beta^{-1}}\,\partial_q f_{n,k-1} &= -k f_{n+1,k}, \qquad k = 1, 2, \dots, n-1, \\
\sqrt{n\beta^{-1}}\,\partial_q f_{n,n-1} &= -n f_{n+1,n}, \\
\sqrt{(n+1)\beta^{-1}}\,\partial_q f_{n,n} &= -(n+1) f_{n+1,n+1}.
\end{aligned}
\]

• Using this method we can obtain the first three terms in the expansion:

\[
\begin{aligned}
\rho(p,q,t) = \rho_0(p,q)\biggl(& f + \epsilon\bigl(-\sqrt{\beta^{-1}}\,\partial_q f\,\phi_1\bigr)
+ \epsilon^2\Bigl(\frac{\beta^{-1}}{\sqrt{2}}\,\partial_q^2 f\,\phi_2 + f_{20}\Bigr) \\
&+ \epsilon^3\Bigl(-\sqrt{\frac{\beta^{-3}}{3!}}\,\partial_q^3 f\,\phi_3
+ \bigl(-\sqrt{\beta^{-1}}\,\mathcal{L}\,\partial_q^2 f - \sqrt{\beta^{-1}}\,\partial_q f_{20}\bigr)\phi_1\Bigr)\biggr)
+ \mathcal{O}(\epsilon^4),
\end{aligned}
\]

8.3 The Freidlin–Wentzell Limit

Consider now the rescaling λ_{γ,ε} = 1, μ_{γ,ε} = γ. The Langevin equation becomes

\[
\ddot q^\gamma = -\gamma^{-2}\nabla V(q^\gamma) - \dot q^\gamma + \sqrt{2\gamma^{-2}\beta^{-1}}\,\dot W. \tag{8.20}
\]

We write equation (8.20) as a system of two equations:

\[
\dot q^\gamma = \gamma^{-1}p^\gamma, \qquad
\dot p^\gamma = -\gamma^{-1}V'(q^\gamma) - p^\gamma + \sqrt{2\beta^{-1}}\,\dot W.
\]


This is the equation for an O(1/γ) Hamiltonian system perturbed by O(1) noise. We expect that, to leading order, the energy is conserved, since it is conserved for the Hamiltonian system. We apply Itô's formula to the Hamiltonian of the system to obtain

\[
\dot H = \bigl(\beta^{-1} - p^2\bigr) + \sqrt{2\beta^{-1}p^2}\,\dot W,
\]

with p² = p²(H,q) = 2(H − V(q)).

Thus, in order to study the γ → 0 limit we need to analyze the following fast/slow system of SDEs:

\[
\dot H = \bigl(\beta^{-1} - p^2\bigr) + \sqrt{2\beta^{-1}p^2}\,\dot W, \tag{8.21a}
\]
\[
\dot p^\gamma = -\gamma^{-1}V'(q^\gamma) - p^\gamma + \sqrt{2\beta^{-1}}\,\dot W. \tag{8.21b}
\]

The Hamiltonian is the slow variable, whereas the momentum (or position) is the fast variable. Assuming that we can average over the Hamiltonian dynamics, we obtain the limiting SDE for the Hamiltonian:

\[
\dot H = \bigl(\beta^{-1} - \langle p^2\rangle\bigr) + \sqrt{2\beta^{-1}\langle p^2\rangle}\,\dot W. \tag{8.22}
\]

The limiting SDE lives on the graph associated with the Hamiltonian system. The domain of definition of the limiting Markov process is defined through appropriate boundary conditions (the gluing conditions) at the interior vertices of the graph.

• We identify all points belonging to the same connected component of the level set {x : H(x) = H}, x = (q, p).

• Each point on the edges of the graph corresponds to a trajectory.

• Interior vertices correspond to separatrices.

• Let Ii, i = 1, ..., d be the edges of the graph. Then (i, H) defines a global coordinate system on the graph.

• For more information see

– Freidlin and Wentzell, Random Perturbations of Dynamical Systems, Springer 1998.

– Freidlin and Wentzell, Random Perturbations of Hamiltonian Systems, AMS 1994.

– Sowers, A boundary layer theory for diffusively perturbed transport around a heteroclinic cycle, CPAM 58 (2005), no. 1, 30–84.

We will study the small γ asymptotics by analyzing the corresponding backward Kolmogorov equation using singular perturbation theory. The generator of the process (qγ, pγ) is

\[
\mathcal{L}^\gamma = \gamma^{-1}\bigl(p\,\partial_q - \partial_q V\,\partial_p\bigr) - p\,\partial_p + \beta^{-1}\partial_p^2
=: \gamma^{-1}\mathcal{L}_0 + \mathcal{L}_1.
\]


Let uγ = E(f(pγ(p,q;t), qγ(p,q;t))). It satisfies the backward Kolmogorov equation associated to the process (qγ, pγ):

\[
\frac{\partial u^\gamma}{\partial t} = \Bigl(\frac{1}{\gamma}\mathcal{L}_0 + \mathcal{L}_1\Bigr)u^\gamma. \tag{8.23}
\]

We look for a solution in the form of a power series expansion in γ:

\[
u^\gamma = u_0 + \gamma u_1 + \gamma^2 u_2 + \dots
\]

We substitute this ansatz into (8.23) and equate equal powers of γ to obtain the following sequence of equations:

\[
\begin{aligned}
\mathcal{L}_0 u_0 &= 0, &\text{(8.24a)}\\
\mathcal{L}_0 u_1 &= -\mathcal{L}_1 u_0 + \frac{\partial u_0}{\partial t}, &\text{(8.24b)}\\
\mathcal{L}_0 u_2 &= -\mathcal{L}_1 u_1 + \frac{\partial u_1}{\partial t}, &\text{(8.24c)}\\
&\;\;\vdots
\end{aligned}
\]

Notice that the operator L0 is the backward Liouville operator of the Hamiltonian system with Hamiltonian

\[
H = \frac{1}{2}p^2 + V(q).
\]

We assume that there are no integrals of motion other than the Hamiltonian. This means that the null space of L0 consists of functions of the Hamiltonian:

\[
\mathcal{N}(\mathcal{L}_0) = \{\text{functions of } H\}. \tag{8.25}
\]

Let us now analyze equations (8.24). We start with (8.24a); eqn. (8.25) implies that u0 depends on q, p through the Hamiltonian function H:

\[
u_0 = u(H(p,q), t). \tag{8.26}
\]

Now we proceed with (8.24b). For this we need to find the solvability condition for equations of the form

\[
\mathcal{L}_0 u = f. \tag{8.27}
\]

We multiply it by an arbitrary smooth function of H(p,q), integrate over ℝ², and use the skew-symmetry of the Liouville operator L0 to deduce:¹

\[
\begin{aligned}
\int_{\mathbb{R}^2}\mathcal{L}_0 u\,F(H(p,q))\,dp\,dq
&= \int_{\mathbb{R}^2} u\,\mathcal{L}_0^* F(H(p,q))\,dp\,dq \\
&= \int_{\mathbb{R}^2} u\,\bigl(-\mathcal{L}_0 F(H(p,q))\bigr)\,dp\,dq \\
&= 0, \qquad \forall F \in C_b^\infty(\mathbb{R}),
\end{aligned}
\]

since L0F(H) = 0 for every function F of the Hamiltonian.

¹We assume that both u and F decay to 0 as |p| → ∞ to justify the integration by parts.


This implies that the solvability condition for equation (8.27) is that

\[
\int_{\mathbb{R}^2} f(p,q)\,F(H(p,q))\,dp\,dq = 0, \qquad \forall F \in C_b^\infty(\mathbb{R}). \tag{8.28}
\]

We use the solvability condition in (8.24b) to obtain

\[
\int_{\mathbb{R}^2}\Bigl(\mathcal{L}_1 u_0 - \frac{\partial u_0}{\partial t}\Bigr)F(H(p,q))\,dp\,dq = 0. \tag{8.29}
\]

To proceed, we need to understand how L1 acts on functions of H(p,q). Let φ = φ(H(p,q)). We have that

\[
\frac{\partial \phi}{\partial p} = \frac{\partial H}{\partial p}\frac{\partial \phi}{\partial H} = p\,\frac{\partial \phi}{\partial H}
\]

and

\[
\frac{\partial^2 \phi}{\partial p^2} = \frac{\partial}{\partial p}\Bigl(p\,\frac{\partial \phi}{\partial H}\Bigr)
= \frac{\partial \phi}{\partial H} + p^2\,\frac{\partial^2 \phi}{\partial H^2}.
\]

The above calculations imply that, when L1 acts on functions φ = φ(H(p,q)), it becomes

\[
\mathcal{L}_1 = \bigl(\beta^{-1} - p^2\bigr)\partial_H + \beta^{-1}p^2\,\partial_H^2, \tag{8.30}
\]

where

\[
p^2 = p^2(H,q) = 2\bigl(H - V(q)\bigr).
\]

We want to change variables in the integral (8.29) and go from (p, q) to (H, q). The Jacobian of the transformation is:

\[
\frac{\partial(p,q)}{\partial(H,q)}
= \begin{vmatrix} \dfrac{\partial p}{\partial H} & \dfrac{\partial p}{\partial q} \\[4pt] \dfrac{\partial q}{\partial H} & \dfrac{\partial q}{\partial q} \end{vmatrix}
= \frac{\partial p}{\partial H} = \frac{1}{p(H,q)}.
\]

We use this, together with (8.30), to rewrite eqn. (8.29) as

\[
\int\!\!\int\Bigl(-\frac{\partial u}{\partial t} + \bigl[(\beta^{-1} - p^2)\partial_H + \beta^{-1}p^2\,\partial_H^2\bigr]u\Bigr)F(H)\,p^{-1}(H,q)\,dH\,dq = 0.
\]

We introduce the notation

\[
\langle\cdot\rangle := \int \cdot \;dq.
\]

The integration over q can be performed "explicitly":

\[
\int\Bigl[-\frac{\partial u}{\partial t}\,\langle p^{-1}\rangle
+ \bigl(\bigl(\beta^{-1}\langle p^{-1}\rangle - \langle p\rangle\bigr)\partial_H + \beta^{-1}\langle p\rangle\,\partial_H^2\bigr)u\Bigr]F(H)\,dH = 0.
\]

This equation should be valid for every smooth function F(H), and this requirement leads to the differential equation

\[
\langle p^{-1}\rangle\,\frac{\partial u}{\partial t}
= \bigl(\beta^{-1}\langle p^{-1}\rangle - \langle p\rangle\bigr)\partial_H u + \beta^{-1}\langle p\rangle\,\partial_H^2 u,
\]


or,

\[
\frac{\partial u}{\partial t}
= \bigl(\beta^{-1} - \langle p^{-1}\rangle^{-1}\langle p\rangle\bigr)\partial_H u
+ \beta^{-1}\langle p^{-1}\rangle^{-1}\langle p\rangle\,\partial_H^2 u.
\]

Thus, we have obtained the limiting backward Kolmogorov equation for the energy, which is the "slow variable". From this equation we can read off the limiting SDE for the Hamiltonian:

\[
\dot H = b(H) + \sigma(H)\,\dot W, \tag{8.31}
\]

where

\[
b(H) = \beta^{-1} - \langle p^{-1}\rangle^{-1}\langle p\rangle, \qquad
\sigma^2(H) = 2\beta^{-1}\langle p^{-1}\rangle^{-1}\langle p\rangle.
\]

Notice that the noise that appears in the limiting equation (8.31) is multiplicative, in contrast to the additive noise in the Langevin equation.

As is well known from classical mechanics, the action and the frequency are defined as

\[
I(E) = \int p(q,E)\,dq
\]

and

\[
\omega(E) = 2\pi\Bigl(\frac{dI}{dE}\Bigr)^{-1},
\]

respectively. Using the action and the frequency we can write the limiting Fokker–Planck equation for the distribution function of the energy in a very compact form.
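As a concrete illustration (not from the notes), the action and the frequency can be evaluated numerically. For the harmonic oscillator V(q) = q²/2 one has I(E) = 2πE and ω(E) = 1 exactly, which makes a convenient check; the turning-point substitution q = √(2E) sin θ below assumes this quadratic potential:

```python
import numpy as np

def action(E):
    """I(E) = oint p dq = 2 * int sqrt(2(E - V(q))) dq for V(q) = q^2/2,
    with the substitution q = sqrt(2E) sin(theta), which removes the
    square-root singularity at the turning points."""
    a = np.sqrt(2.0 * E)
    theta = np.linspace(-np.pi / 2, np.pi / 2, 20001)
    q = a * np.sin(theta)
    g = 2.0 * np.sqrt(np.maximum(2.0 * E - q**2, 0.0)) * a * np.cos(theta)
    return np.sum((g[:-1] + g[1:]) / 2) * (theta[1] - theta[0])   # trapezoid rule

E = 0.7
I = action(E)
omega = 2.0 * np.pi / ((action(E + 1e-6) - action(E - 1e-6)) / 2e-6)
print(I, omega)   # I(E) = 2*pi*E and omega(E) = 1 for the harmonic oscillator
```

The same quadrature applies to any well of a more general potential once the turning points are located numerically.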

Theorem 8.3.1. The limiting Fokker–Planck equation for the energy distribution function ρ(E,t) is

\[
\frac{\partial\rho}{\partial t}
= \frac{\partial}{\partial E}\biggl(I(E)\Bigl(1 + \beta^{-1}\frac{\partial}{\partial E}\Bigr)\Bigl(\frac{\omega(E)}{2\pi}\rho\Bigr)\biggr). \tag{8.32}
\]

Proof. We notice that

\[
\frac{dI}{dE} = \int\frac{\partial p}{\partial E}\,dq = \int p^{-1}\,dq
\]

and consequently

\[
\langle p^{-1}\rangle^{-1} = \frac{\omega(E)}{2\pi}.
\]

Hence, the limiting Fokker–Planck equation can be written as

\[
\begin{aligned}
\frac{\partial\rho}{\partial t}
&= -\frac{\partial}{\partial E}\Bigl(\Bigl(\beta^{-1} - \frac{I(E)\omega(E)}{2\pi}\Bigr)\rho\Bigr)
+ \beta^{-1}\frac{\partial^2}{\partial E^2}\Bigl(\frac{I\omega}{2\pi}\rho\Bigr) \\
&= -\beta^{-1}\frac{\partial\rho}{\partial E} + \frac{\partial}{\partial E}\Bigl(\frac{I\omega}{2\pi}\rho\Bigr)
+ \beta^{-1}\frac{\partial}{\partial E}\Bigl(\frac{dI}{dE}\,\frac{\omega\rho}{2\pi}\Bigr)
+ \beta^{-1}\frac{\partial}{\partial E}\Bigl(I\,\frac{\partial}{\partial E}\Bigl(\frac{\omega\rho}{2\pi}\Bigr)\Bigr) \\
&= \frac{\partial}{\partial E}\Bigl(\frac{I\omega}{2\pi}\rho\Bigr)
+ \beta^{-1}\frac{\partial}{\partial E}\Bigl(I\,\frac{\partial}{\partial E}\Bigl(\frac{\omega\rho}{2\pi}\Bigr)\Bigr) \\
&= \frac{\partial}{\partial E}\biggl(I(E)\Bigl(1 + \beta^{-1}\frac{\partial}{\partial E}\Bigr)\Bigl(\frac{\omega(E)}{2\pi}\rho\Bigr)\biggr),
\end{aligned}
\]

where we used (dI/dE)(ω/2π) = 1 in the third step. This is precisely equation (8.32).


Remarks 8.3.2. i. We emphasize that the above formal procedure does not provide us with the boundary conditions for the limiting Fokker–Planck equation. We will discuss this issue in the next section.

ii. If we rescale back to the original time-scale we obtain the equation

\[
\frac{\partial\rho}{\partial t}
= \gamma\,\frac{\partial}{\partial E}\biggl(I(E)\Bigl(1 + \beta^{-1}\frac{\partial}{\partial E}\Bigr)\Bigl(\frac{\omega(E)}{2\pi}\rho\Bigr)\biggr). \tag{8.33}
\]

We will use this equation later on to calculate the rate of escape from a potential barrier in the energy-diffusion-limited regime.


Chapter 9

Exit Time Problems and Reaction Rate Theory

9.1 Introduction

There are many systems in physics, chemistry and biology that exist in at least two stable states. Among the many applications we mention the switching and storage devices in computers. Another example is biological macromolecules that can exist in many different states. The problems that we would like to solve are:

• How stable are the various states relative to each other?

• How long does it take for a system to switch spontaneously from one state to another?

• How is the transfer made, i.e. through what path in the relevant state space? There is a lot of important current work on this problem by E, Vanden-Eijnden and others.

• How does the system relax to an unstable state?

We can distinguish between the one-dimensional problem, the finite dimensional problem and the infinite dimensional problem (SPDEs). We will solve the one-dimensional problem completely and discuss the finite dimensional problem in some detail. The infinite dimensional situation is an extremely hard problem and we will only make some remarks. The study of bistability and metastability is a very active research area, in particular the development of numerical methods for the calculation of various quantities such as reaction rates, transition pathways etc.

We will mostly consider the dynamics of a particle moving in a bistable potential, under the influence of thermal noise, in one dimension:

\[
\dot x = -V'(x) + \sqrt{2 k_B T}\,\dot\beta. \tag{9.1}
\]

An example of the class of potentials that we will consider is shown in Figure. It has two local minima and one local maximum, and it increases at least quadratically at infinity. This ensures that the state space is "compact", i.e. that the particle cannot escape to infinity. The standard potential that satisfies these assumptions is

\[
V(x) = \frac{1}{4}x^4 - \frac{1}{2}x^2 + \frac{1}{4}. \tag{9.2}
\]


[Figure: horizontal axis 0 to 500, vertical axis −3 to 3.]

It is easily checked that this potential has three critical points: a local maximum at x = 0 and two local minima at x = ±1. The values of the potential at these points are:

\[
V(\pm 1) = 0, \qquad V(0) = \frac{1}{4}.
\]

We will say that the height of the potential barrier is 1/4. The physically (and mathematically!) interesting case is when the thermal fluctuations are weak compared to the potential barrier that the particle has to climb over.
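These properties of the standard potential (9.2) are easy to verify numerically; a small sketch:

```python
import numpy as np

V = lambda x: 0.25 * x**4 - 0.5 * x**2 + 0.25     # the potential (9.2)

crit = np.sort(np.roots([1.0, 0.0, -1.0, 0.0]).real)   # roots of V'(x) = x^3 - x
print(crit)                 # approximately -1, 0, 1: two minima and the maximum
print(V(0.0) - V(1.0))      # 0.25: the barrier height
```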

More generally, we assume that the potential has two local minima at the points a and c and a local maximum at b. Let us consider the problem of the escape of the particle from the left local minimum a. The potential barrier is then defined as

\[
\Delta E = V(b) - V(a).
\]

Our assumption that the thermal fluctuations are weak can be written as

\[
\frac{k_B T}{\Delta E} \ll 1.
\]

In this limit, it is intuitively clear that the particle is most likely to be found at either a or c. There it will perform small oscillations around either of the local minima. This is a result that we can obtain by studying the small temperature limit using perturbation theory: locally, the dynamics of the particle is described by appropriate Ornstein–Uhlenbeck processes. Of course, this result is valid only for finite times: at sufficiently long times the particle can escape from one local minimum, a say, surmount the potential barrier and end up at c. It will then spend a long time in the neighborhood of c until it escapes again over the potential barrier and ends up at a. This is an example of a rare event. The relevant time scale, the exit time or the mean first passage time, scales exponentially in β := (k_BT)⁻¹:

\[
\tau = \nu^{-1}\exp(\beta\Delta E).
\]

It is more customary to calculate the reaction rate κ := τ⁻¹, which gives the rate at which particles escape from a local minimum of the potential:

\[
\kappa = \nu\exp(-\beta\Delta E). \tag{9.3}
\]


It is very important to notice that the escape from a local minimum, i.e. from a state of local stability, can happen only at positive temperatures: it is a noise assisted event. Indeed, consider the case T = 0. The equation of motion becomes

\[
\dot x = -V'(x), \qquad x(0) = x_0.
\]

In this case the potential becomes a Lyapunov function:

\[
\frac{dV(x)}{dt} = V'(x)\,\frac{dx}{dt} = -\bigl(V'(x)\bigr)^2 < 0.
\]

Hence, depending on the initial condition, the particle will converge either to a or to c. The particle cannot escape from either state of local stability.

On the other hand, at high temperatures the particle does not "see" the potential barrier: it essentially jumps freely from one local minimum to the other.

To get a better understanding of the dependence of the dynamics on the depth of the potential barrier relative to the temperature, we solve the equation of motion (9.1) numerically. In Figure we present the time series of the particle position. We observe that at small temperatures the particle spends most of its time around x = ±1, with rapid transitions from −1 to 1 and back.
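The experiment just described can be reproduced with a few lines of code; a minimal Euler–Maruyama sketch (the threshold |x| > 1/2 used to detect a change of well is an ad hoc choice):

```python
import numpy as np

rng = np.random.default_rng(2)
Vp = lambda x: x**3 - x                    # V'(x) for the potential (9.2)

def count_hops(kT, T=500.0, dt=1e-2):
    """Euler-Maruyama for dx = -V'(x) dt + sqrt(2 kT) dW; count changes of well."""
    x, well, hops = -1.0, -1, 0
    noise = rng.normal(0.0, np.sqrt(2.0 * kT * dt), size=int(T / dt))
    for dW in noise:
        x += -Vp(x) * dt + dW
        if x * well < -0.5:                # particle crossed into the other well
            well, hops = -well, hops + 1
    return hops

n_cold, n_hot = count_hops(kT=0.05), count_hops(kT=0.5)
print(n_cold, n_hot)   # hopping is rare at low temperature, frequent at high
```

With the barrier height 1/4, kT = 0.05 corresponds to βΔE = 5 (rare hops) while kT = 0.5 corresponds to βΔE = 0.5 (essentially free hopping), matching the qualitative picture above.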

9.2 Kramers’ Theory

9.3 The Mean First Passage Time Approach

The Arrhenius-type factor in the formula for the reaction rate, eqn. (9.3), is intuitively clear and was observed experimentally in the late nineteenth century by Arrhenius and others. What is extremely important, both from a theoretical and an applied point of view, is the calculation of the prefactor ν, the rate coefficient. A systematic approach for the calculation of the rate coefficient, as well as for the justification of the Arrhenius kinetics, is the mean first passage time (MFPT) method. Since this method is of independent interest and is useful in various other contexts, we will present it in a quite general setting and apply it to the problem of the escape from a potential barrier in later sections. We will first treat the one-dimensional problem and then extend the theory to arbitrary finite dimensions.

We will restrict ourselves to the case of homogeneous Markov processes. It is not very easy to extend the method to non-Markovian processes.

9.3.1 The Boundary Value Problem for the MFPT

• Let Xt be a continuous-time diffusion process on ℝ^d whose evolution is governed by the SDE

\[
dX_t^x = b(X_t^x)\,dt + \sigma(X_t^x)\,dW_t, \qquad X_0^x = x. \tag{9.4}
\]

• Let D be a bounded subset of ℝ^d with smooth boundary. Given x ∈ D, we want to know how long it takes for the process Xt to leave the domain D for the first time:

\[
\tau_D^x = \inf\bigl\{t > 0 : X_t^x \notin D\bigr\}.
\]


• Clearly, this is a random variable. The average of this random variable is called the mean first passage time (MFPT) or the first exit time:

\[
\tau(x) := \mathbb{E}\,\tau_D^x.
\]

• We can calculate the MFPT by solving an appropriate boundary value problem.

Theorem 9.3.1. The MFPT is the solution of the boundary value problem

\[
-\mathcal{L}\tau = 1, \quad x \in D, \tag{9.5a}
\]
\[
\tau = 0, \quad x \in \partial D, \tag{9.5b}
\]

where L is the generator of the SDE (9.4).

• The homogeneous Dirichlet boundary conditions correspond to an absorbing boundary: the particles are removed when they reach the boundary. Other choices of boundary conditions are also possible.

• The rigorous proof of Theorem 9.3.1 is based on Itô's formula.

Derivation

• Let ρ(X, x, t) be the probability distribution of the particles that have not left the domain D at time t. It solves the FP equation with absorbing boundary conditions:

\[
\frac{\partial\rho}{\partial t} = \mathcal{L}^*\rho, \qquad \rho(X,x,0) = \delta(X - x), \qquad \rho|_{\partial D} = 0. \tag{9.6}
\]

• We can write the solution to this equation in the form

\[
\rho(X,x,t) = e^{\mathcal{L}^* t}\,\delta(X - x),
\]

• where the absorbing boundary conditions are included in the definition of the semigroup e^{L*t}.

• The homogeneous Dirichlet (absorbing) boundary conditions imply that

\[
\lim_{t\to+\infty}\rho(X,x,t) = 0.
\]

• That is: all particles will eventually leave the domain.

• The (normalized) number of particles that are still inside D at time t is

\[
S(x,t) = \int_D \rho(X,x,t)\,dX.
\]

• Notice that this is a decreasing function of time. We can write

\[
\frac{\partial S}{\partial t} = -f(x,t),
\]


• where f(x,t) is the first passage time distribution.

• The MFPT is the first moment of the distribution f(x,t):

\[
\begin{aligned}
\tau(x) &= \int_0^{+\infty} f(s,x)\,s\,ds = \int_0^{+\infty} -\frac{dS}{ds}\,s\,ds \\
&= \int_0^{+\infty} S(s,x)\,ds = \int_0^{+\infty}\!\!\int_D \rho(X,x,s)\,dX\,ds \\
&= \int_0^{+\infty}\!\!\int_D e^{\mathcal{L}^* s}\,\delta(X - x)\,dX\,ds \\
&= \int_0^{+\infty}\!\!\int_D \delta(X - x)\bigl(e^{\mathcal{L} s}\mathbf{1}\bigr)\,dX\,ds
= \int_0^{+\infty}\bigl(e^{\mathcal{L} s}\mathbf{1}\bigr)\,ds.
\end{aligned}
\]

• We apply L to the above equation to deduce:

\[
\mathcal{L}\tau = \int_0^{+\infty}\bigl(\mathcal{L}e^{\mathcal{L} t}\mathbf{1}\bigr)\,dt
= \int_0^{+\infty}\frac{d}{dt}\bigl(e^{\mathcal{L} t}\mathbf{1}\bigr)\,dt = -1.
\]
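The boundary value problem (9.5) is easy to solve numerically. As an illustration (not from the notes), for standard Brownian motion on D = (0,1) the generator is L = ½ d²/dx² and the MFPT is known in closed form, τ(x) = x(1 − x); a centered finite-difference sketch reproduces it:

```python
import numpy as np

# Finite-difference solution of -L tau = 1, tau = 0 on the boundary, for
# standard Brownian motion on D = (0,1), where L = (1/2) d^2/dx^2.
n = 201
x = np.linspace(0.0, 1.0, n)
h = x[1] - x[0]

A = np.zeros((n, n))
b = np.ones(n)
for i in range(1, n - 1):
    # -(1/2)(tau_{i-1} - 2 tau_i + tau_{i+1}) / h^2 = 1
    A[i, i - 1] = A[i, i + 1] = -0.5 / h**2
    A[i, i] = 1.0 / h**2
A[0, 0] = A[-1, -1] = 1.0      # absorbing boundary: tau = 0
b[0] = b[-1] = 0.0

tau = np.linalg.solve(A, b)
print(np.max(np.abs(tau - x * (1.0 - x))))   # essentially machine precision here
```

The same discretization extends directly to a generator with drift, and to the escape problems considered below.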

9.4 Escape from a potential barrier

In this section we use the theory developed in the previous section to study the long time/small temperature asymptotics of solutions to the Langevin equation for a particle moving in a one-dimensional potential of the form (9.2):

\[
\ddot x = -V'(x) - \gamma\dot x + \sqrt{2\gamma k_B T}\,\dot W. \tag{9.7}
\]

In particular, we justify the Arrhenius formula for the reaction rate

\[
\kappa = \nu(\gamma)\exp(-\beta\Delta E)
\]

and we calculate the escape rate ν = ν(γ). In particular, we analyze the dependence of the escape rate on the friction coefficient. We will see that we need to distinguish between the cases of large and small friction coefficients.

9.4.1 Calculation of the Reaction Rate in the Smoluchowski regime

We consider the Langevin equation (9.7) in the limit of large friction. As we saw in Chapter ??, in the overdamped limit γ ≫ 1, the solution to (9.7) can be approximated by the solution to the Smoluchowski equation (9.1):

\[
\gamma\dot x = -V'(x) + \sqrt{2\gamma k_B T}\,\dot W.
\]

We want to calculate the rate of escape from the potential barrier in this case. We assume that the particle is initially at x0, which is near a, the left potential minimum.

• Consider the boundary value problem for the MFPT of the one-dimensional diffusion process (9.1) on the interval (a, b):

\[
-\beta^{-1}e^{\beta V}\,\partial_x\bigl(e^{-\beta V}\,\partial_x\tau\bigr) = 1. \tag{9.8}
\]


• We choose a reflecting BC at x = a and an absorbing BC at x = b. We can solve (9.8) with these boundary conditions by quadratures:

\[
\tau(x) = \beta\int_x^b dy\,e^{\beta V(y)}\int_a^y dz\,e^{-\beta V(z)}. \tag{9.9}
\]

• Now we can solve the problem of the escape from a potential well: the reflecting boundary is at x = a, the left local minimum of the potential, and the absorbing boundary is at x = b, the local maximum.

• We can replace the BC at x = a by a repelling BC at x = −∞:

\[
\tau(x) = \beta\int_x^b dy\,e^{\beta V(y)}\int_{-\infty}^y dz\,e^{-\beta V(z)}.
\]

• When βEb ≫ 1 the integral with respect to z is dominated by the value of the potential near a. Furthermore, we can replace the upper limit of integration by +∞:

\[
\int_{-\infty}^{y}\exp(-\beta V(z))\,dz
\approx \int_{-\infty}^{+\infty}\exp(-\beta V(a))\exp\Bigl(-\frac{\beta\omega_0^2}{2}(z - a)^2\Bigr)\,dz
= \exp(-\beta V(a))\sqrt{\frac{2\pi}{\beta\omega_0^2}},
\]

• where we have used the Taylor series expansion around the minimum:

\[
V(z) = V(a) + \frac{1}{2}\omega_0^2(z - a)^2 + \dots
\]

• Similarly, the integral with respect to y is dominated by the value of the potential around the saddle point. We use the Taylor series expansion

\[
V(y) = V(b) - \frac{1}{2}\omega_b^2(y - b)^2 + \dots
\]

• Assuming that x is close to a, the minimum of the potential, we can replace the lower limit of integration by −∞. We finally obtain

\[
\int_x^b \exp(\beta V(y))\,dy
\approx \int_{-\infty}^b \exp(\beta V(b))\exp\Bigl(-\frac{\beta\omega_b^2}{2}(y - b)^2\Bigr)\,dy
= \frac{1}{2}\exp(\beta V(b))\sqrt{\frac{2\pi}{\beta\omega_b^2}}.
\]

• Putting everything together we obtain a formula for the MFPT:

\[
\tau(x) = \frac{\pi}{\omega_0\omega_b}\exp(\beta E_b).
\]

• The rate of arrival at b is 1/τ. Only half of the particles escape. Consequently, the escape rate (or reaction rate) is given by 1/(2τ):

\[
\kappa = \frac{\omega_0\omega_b}{2\pi}\exp(-\beta E_b).
\]
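This asymptotic formula can be checked against a direct quadrature of the MFPT, here taken in the form τ(a) = β ∫_a^b e^{βV(y)} ∫_{−∞}^y e^{−βV(z)} dz dy (a sketch for the standard potential (9.2), with a = −1, b = 0, ω₀² = V″(−1) = 2 and ω_b² = |V″(0)| = 1):

```python
import numpy as np

beta = 20.0
V = lambda x: 0.25 * x**4 - 0.5 * x**2 + 0.25     # the potential (9.2), E_b = 1/4

# tau(a) = beta * int_a^b e^{beta V(y)} int_{-inf}^y e^{-beta V(z)} dz dy
z = np.linspace(-4.0, 0.0, 4001)
y = np.linspace(-1.0, 0.0, 2001)
dz, dy = z[1] - z[0], y[1] - y[0]
inner = np.array([np.sum(np.exp(-beta * V(z[z <= yy]))) * dz for yy in y])
tau = beta * np.sum(np.exp(beta * V(y)) * inner) * dy
kappa_quad = 1.0 / (2.0 * tau)

omega0, omegab = np.sqrt(2.0), 1.0
kappa_kramers = omega0 * omegab / (2.0 * np.pi) * np.exp(-beta * 0.25)
print(kappa_quad, kappa_kramers)   # close already at beta*E_b = 5
```

The agreement improves as βE_b grows, as expected from the Laplace-asymptotics derivation above.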


9.4.2 The Intermediate Regime: γ = O(1)

• Consider now the problem of escape from a potential well for the Langevin equation

\[
\ddot q = -\partial_q V(q) - \gamma\dot q + \sqrt{2\gamma\beta^{-1}}\,\dot W. \tag{9.10}
\]

• The reaction rate depends on the friction coefficient and the temperature. In the overdamped limit (γ ≫ 1) we retrieve (??), appropriately rescaled with γ:

\[
\kappa = \frac{\omega_0\omega_b}{2\pi\gamma}\exp(-\beta E_b). \tag{9.11}
\]

• We can also obtain a formula for the reaction rate for γ = O(1):

\[
\kappa = \frac{\sqrt{\frac{\gamma^2}{4} + \omega_b^2} - \frac{\gamma}{2}}{\omega_b}\,\frac{\omega_0}{2\pi}\exp(-\beta E_b). \tag{9.12}
\]

• Naturally, in the limit as γ → +∞, (9.12) reduces to (9.11).
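The reduction of (9.12) to (9.11) as γ → +∞ is easy to see numerically, since √(γ²/4 + ω_b²) − γ/2 ≈ ω_b²/γ for large γ; a small sketch (the parameter values are arbitrary):

```python
import numpy as np

omega0, omegab, betaEb = np.sqrt(2.0), 1.0, 1.0

def kappa_intermediate(gamma):      # eqn (9.12)
    return (np.sqrt(gamma**2 / 4 + omegab**2) - gamma / 2) / omegab \
        * omega0 / (2 * np.pi) * np.exp(-betaEb)

def kappa_overdamped(gamma):        # eqn (9.11)
    return omega0 * omegab / (2 * np.pi * gamma) * np.exp(-betaEb)

for gamma in (1.0, 10.0, 100.0):
    print(gamma, kappa_intermediate(gamma) / kappa_overdamped(gamma))
# the ratio tends to 1 as gamma grows (about 0.62, 0.99, 0.9999)
```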

9.4.3 Calculation of the Reaction Rate in the energy-diffusion-limited regime

• In order to calculate the reaction rate in the underdamped or energy-diffusion-limited regime γ ≪ 1 we need to study the diffusion process for the energy, (8.31) or (8.32). The result is

\[
\kappa = \gamma\beta\,\frac{I(E_b)\,\omega_0}{2\pi}\,e^{-\beta E_b}, \tag{9.13}
\]

• where I(Eb) denotes the action evaluated at b.

• A formula for the escape rate which is valid for all values of the friction coefficient was obtained by Melnikov and Meshkov in 1986, J. Chem. Phys. 85(2), pp. 1018–1027. This formula requires the calculation of integrals and it reduces to (9.11) and (9.13) in the overdamped and underdamped limits, respectively.

9.5 Extension to Higher Dimensions

9.6 Discussion and Bibliography

The calculation of reaction rates and the stochastic modeling of chemical reactions has been a very active area of research since the 30's. One of the first methods that were developed was that of transition state theory CITE WIGNER. Kramers developed his theory in his celebrated paper CITE KRAMERS. In this chapter we have based our approach on the calculation of the mean first passage time. Our analysis is based mostly on [10, Ch. 5, Ch. 9], [26, Ch. 4] and the excellent review article [13]. We highly recommend this review article for further information on reaction rate theory. See also [12] and the review article of Melnikov (1991).


There are many applications of interest where it is important to calculate reaction rates for non-Markovian Langevin equations of the form

\[
\ddot x = -V'(x) - \int_0^t \gamma(t - s)\,\dot x(s)\,ds + \xi(t), \tag{9.14a}
\]
\[
\langle\xi(t)\xi(0)\rangle = k_B T M^{-1}\gamma(t). \tag{9.14b}
\]

We will derive generalized non-Markovian equations of the form (9.14a), together with the fluctuation–dissipation theorem (9.14b), in Chapter ??. The calculation of reaction rates for the generalized Langevin equation is presented in [12].

The long time/small temperature asymptotics can be studied rigorously by means of the theory of Freidlin and Wentzell [6]. See also [2]. A related issue is that of the small temperature asymptotics for the eigenvalues (in particular, the first eigenvalue) of the generator of the Markov process x(t) which is the solution of

\[
\gamma\dot x = -\nabla V(x) + \sqrt{2\gamma k_B T}\,\dot W.
\]

The theory of Freidlin and Wentzell has also been extended to infinite dimensional problems. This is a very important problem in many applications such as micromagnetics... We refer to CITE... for more details.

A systematic study of the problem of the escape from a potential well was developed by Matkowsky, Schuss and collaborators [21, 17, 18]. This approach is based on a systematic use of singular perturbation theory. In particular, the calculation of the transition rate which is uniformly valid in the friction coefficient is presented in [18]. This formula is obtained through a careful analysis of the PDE

\[
p\,\partial_q\tau - \partial_q V\,\partial_p\tau + \gamma\bigl(-p\,\partial_p + k_B T\,\partial_p^2\bigr)\tau = -1,
\]

for the mean first passage time τ. The PDE is equipped, of course, with the appropriate boundary conditions. Singular perturbation theory is used to study the small temperature asymptotics of solutions to the boundary value problem. The formula derived in this paper reduces, in the appropriate asymptotic limits, to the formulas which are valid at large and small values of the friction coefficient.

The study of rare transition events between long lived metastable states is a key feature in many systems in physics, chemistry and biology. Rare transition events play an important role, for example, in the analysis of the transition between different conformation states of biological macromolecules such as DNA [22]. The study of rare events is one of the most active research areas in applied stochastic processes. Recent developments in this area involve the transition path theory of W. E and Vanden-Eijnden. Various simple applications of this theory are presented in Metzner, Schütte et al. (2006). As in the mean first passage time approach, transition path theory is also based on the solution of an appropriate boundary value problem for the so-called committor function.


Bibliography

[1] L. Arnold. Stochastic differential equations: theory and applications. Wiley-Interscience [John Wiley & Sons], New York, 1974. Translated from the German.

[2] N. Berglund and B. Gentz. Noise-induced phenomena in slow-fast dynamical systems. Probability and its Applications (New York). Springer-Verlag London Ltd., London, 2006. A sample-paths approach.

[3] S.N. Ethier and T.G. Kurtz. Markov processes. Wiley Series in Probability and Mathematical Statistics. John Wiley & Sons Inc., New York, 1986.

[4] L.C. Evans. Partial Differential Equations. AMS, Providence, Rhode Island, 1998.

[5] W. Feller. An introduction to probability theory and its applications. Vol. II. Second edition. John Wiley & Sons Inc., New York, 1971.

[6] M.I. Freidlin and A.D. Wentzell. Random Perturbations of Dynamical Systems. Springer-Verlag, New York, 1984.

[7] A. Friedman. Partial differential equations of parabolic type. Prentice-Hall Inc., Englewood Cliffs, N.J., 1964.

[8] A. Friedman. Stochastic differential equations and applications. Vol. 1. Academic Press [Harcourt Brace Jovanovich Publishers], New York, 1975. Probability and Mathematical Statistics, Vol. 28.

[9] A. Friedman. Stochastic differential equations and applications. Vol. 2. Academic Press [Harcourt Brace Jovanovich Publishers], New York, 1976. Probability and Mathematical Statistics, Vol. 28.

[10] C. W. Gardiner. Handbook of stochastic methods. Springer-Verlag, Berlin, second edition, 1985. For physics, chemistry and the natural sciences.

[11] I. I. Gikhman and A. V. Skorokhod. Introduction to the theory of random processes. Dover Publications Inc., Mineola, NY, 1996.

[12] P. Hanggi. Escape from a metastable state. J. Stat. Phys., 42(1/2):105–140, 1986.

[13] P. Hanggi, P. Talkner, and M. Borkovec. Reaction-rate theory: fifty years after Kramers. Rev. Modern Phys., 62(2):251–341, 1990.


[14] W. Horsthemke and R. Lefever. Noise-induced transitions, volume 15 of Springer Series in Synergetics. Springer-Verlag, Berlin, 1984. Theory and applications in physics, chemistry, and biology.

[15] S. Karlin and H. M. Taylor. A second course in stochastic processes. Academic Press Inc. [Harcourt Brace Jovanovich Publishers], New York, 1981.

[16] P. Mandl. Analytical treatment of one-dimensional Markov processes. Die Grundlehren der mathematischen Wissenschaften, Band 151. Academia Publishing House of the Czechoslovak Academy of Sciences, Prague, 1968.

[17] B. J. Matkowsky, Z. Schuss, and E. Ben-Jacob. A singular perturbation approach to Kramers' diffusion problem. SIAM J. Appl. Math., 42(4):835–849, 1982.

[18] B. J. Matkowsky, Z. Schuss, and C. Tier. Uniform expansion of the transition rate in Kramers' problem. J. Statist. Phys., 35(3-4):443–456, 1984.

[19] Frigyes Riesz and Bela Sz.-Nagy. Functional analysis. Dover Publications Inc., New York, 1990. Translated from the second French edition by Leo F. Boron, reprint of the 1955 original.

[20] H. Risken. The Fokker-Planck equation, volume 18 of Springer Series in Synergetics. Springer-Verlag, Berlin, 1989.

[21] Z. Schuss. Singular perturbation methods in stochastic differential equations of mathematical physics. SIAM Review, 22(2):119–155, 1980.

[22] Ch. Schutte and W. Huisinga. Biomolecular conformations can be identified as metastable sets of molecular dynamics. In Handbook of Numerical Analysis (Computational Chemistry), Vol. X, 2003.

[23] C. Schwab and R.A. Todor. Karhunen-Loeve approximation of random fields by generalized fast multipole methods. J. Comput. Phys., 217(1):100–122, 2006.

[24] D.W. Stroock. Probability theory, an analytic view. Cambridge University Press, Cambridge, 1993.

[25] G. I. Taylor. Diffusion by continuous movements. London Math. Soc., 20:196, 1921.

[26] R. Zwanzig. Nonequilibrium statistical mechanics. Oxford University Press, New York, 2001.
