
NOTES FOR MATH 6050D, INFINITE DIMENSIONAL ANALYSIS, PART I

1. Basics on Measure Theory and Probability Theory

This section collects some basic facts in measure theory and probability theory. All the results can be found in standard textbooks such as [4].

1.1. Measure Theory.

Definition. Let Ω be a set. A system of subsets of Ω is called an algebra (a field) if it contains Ω and is closed under taking complements and finite unions. An algebra is called a σ-algebra (σ-field) if it is in addition closed under countable unions. A measurable space is a pair (Ω, Σ), where Ω is a set and Σ is a σ-algebra on Ω.

Definition. A measure on a measurable space (Ω, Σ) assigns to each A ∈ Σ a number µ(A) ∈ [0, ∞] such that µ is countably additive.

The easiest example is (S, Σ, P), where S is a countable set S = {x_1, x_2, . . .}, Σ is the set of all subsets of S, and P is determined by P({x_i}) = p_i, where the p_i are non-negative real numbers.

A Probability Measure on (Ω, Σ) is a measure µ such that µ(Ω) = 1.

Theorem 1.1. Let P be a finitely additive positive function defined over the algebra Σ with P(Ω) = 1. The following conditions are equivalent:
(1) P is σ-additive;
(2) P is continuous from below;
(3) P is continuous from above;
(4) if A_1 ⊃ A_2 ⊃ · · · and ∩_i A_i is empty, then lim_{n→∞} P(A_n) = 0.

Theorem 1.2 (Carathéodory's Theorem). Let Ω be a set, A an algebra of its subsets, and σ(A) the smallest σ-algebra containing A. Let µ_0 be a σ-additive measure on (Ω, A) (i.e., if S_i ∈ A are countably many disjoint sets and ∪_i S_i ∈ A, then µ_0(∪_i S_i) = ∑_i µ_0(S_i)). Then there is a unique measure µ on (Ω, σ(A)) which extends µ_0.

For a topological space X, the open sets in X generate a σ-algebra, called the Borel σ-algebra; a set in the Borel σ-algebra is called a Borel set.

A map f : Ω_1 → Ω_2 between two measurable spaces is called a measurable map if f^{-1}(A) is a measurable set in Ω_1 for every measurable set A in Ω_2. A measurable map f : Ω → R, where R is equipped with the Borel σ-algebra, is called a measurable function.


If Ω_1 has a measure µ and f : Ω_1 → Ω_2 is a measurable map, we can define a measure f_*µ on Ω_2, called the push-forward measure of µ under f, by

f_*µ(A) = µ(f^{-1}(A)).

If φ : Ω_2 → R is a measurable function, we can define a measurable function f^*φ on Ω_1 by

f^*φ(x) = φ(f(x)), x ∈ Ω_1;

f^*φ is called the pull-back of φ (under f).

The push-forward operator f_* and the pull-back f^* are dual to each other in the sense that, for every measure µ on Ω_1 and every measurable function φ on Ω_2 satisfying either φ ≥ 0 or

∫_{Ω_1} |f^*φ(x)| µ(dx) < ∞,

we have

∫_{Ω_2} φ(y) (f_*µ)(dy) = ∫_{Ω_1} f^*φ(x) µ(dx).
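As a quick numerical sketch of this duality (not part of the notes' development; the finite measure space, the map f, and the function φ below are arbitrary illustrative choices), the identity can be checked exactly on a finite set:

    # Check: the integral of phi against f_* mu equals the integral of the pull-back f^* phi against mu,
    # for a finite measure mu on Omega_1 = {0,...,5} and a map f into Omega_2 = {0,1,2}.
    import numpy as np

    rng = np.random.default_rng(0)
    mu = rng.random(6)                      # weights mu({x}) on Omega_1
    f = rng.integers(0, 3, size=6)          # measurable map f : Omega_1 -> Omega_2
    phi = np.array([1.0, -2.0, 0.5])        # function phi on Omega_2

    push = np.array([mu[f == y].sum() for y in range(3)])   # (f_* mu)({y}) = mu(f^{-1}({y}))
    lhs = np.dot(phi, push)                 # integral of phi d(f_* mu)
    rhs = np.dot(phi[f], mu)                # integral of (f^* phi) d(mu)
    print(lhs, rhs)                         # the two values coincide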


1.2. Random Variables and Distribution Functions.

Let ξ be a random variable on a probability space (Ω, Σ, P), i.e., a measurable function ξ : Ω → R, and let F(x) = P(ξ ≤ x) be its distribution function. Then
(1) F(x) is non-decreasing;
(2) lim_{x→−∞} F(x) = 0 and lim_{x→+∞} F(x) = 1;
(3) F(x) is continuous on the right and has a limit on the left at each x ∈ R.

Definition. A function satisfying (1), (2), (3) is called a distribution function. A generalized distribution function is a real-valued function on R that is non-decreasing and continuous on the right.

Theorem 1.3. Let F(x) be a distribution function on the real line R. There exists a unique probability measure P on (R, B) such that

P((−∞, b]) = F(b) for all b ∈ R.

Expectation of a random variable:

Eξ = ∫_Ω ξ(ω) µ(dω).

Variance V ξ and standard deviation √(V ξ):

V ξ = E(ξ − Eξ)^2.

Independent Random Variables.

1.3. Central Limit Theorem.

A random variable X is called a Gaussian random variable if it has density function

(1/(√(2π) σ)) e^{−(x−a)^2/(2σ^2)}.

It has expectation EX = a and variance V X = σ^2. The corresponding distribution is called a normal distribution and is denoted by N(a, σ^2).

Theorem 1.4 (Central Limit Theorem). Suppose X_1, X_2, . . . is a sequence of independent and identically distributed random variables with EX_i = 0 and V X_i = σ^2. Then (X_1 + · · · + X_n)/√n converges in distribution to N(0, σ^2), i.e., for all a < b,

lim_{n→∞} P(a ≤ (X_1 + · · · + X_n)/√n ≤ b) = (1/(σ√(2π))) ∫_a^b e^{−t^2/(2σ^2)} dt.


Example. We have

((x + x^{−1})/2)^n = (1/2^n) ∑_{i=0}^n \binom{n}{i} x^{n−2i}.

For each a < b,

lim_{n→∞} ∑_{(n−b√n)/2 ≤ i ≤ (n−a√n)/2} (1/2^n) \binom{n}{i} = (1/√(2π)) ∫_a^b e^{−t^2/2} dt.

Example. Let f(x) be a Laurent polynomial: f(x) = ∑_i p_i x^i, p_i ≥ 0, f(1) = 1, f′(1) = 0, f′′(1) = σ^2. Let f(x)^n = ∑_i C_i(n) x^i. Then for a < b,

lim_{n→∞} ∑_{a√n ≤ i ≤ b√n} C_i(n) = (1/(σ√(2π))) ∫_a^b e^{−t^2/(2σ^2)} dt.

The second example includes the first as a special case. To see that it can be proved using the central limit theorem, we set up a finite probability space Ω = {i ∈ Z | p_i ≠ 0}, with measure given by µ(i) = p_i. Let X : Ω → R be the random variable X(i) = i. Then f′(1) = 0 is equivalent to EX = 0, and the variance is V X = f′′(1) = σ^2. Let X_1, X_2, . . . be independent copies of X (X_1, X_2, . . . can be realized on the infinite product space Ω × Ω × · · ·). The central limit theorem tells us that

lim_{n→∞} P(a ≤ (X_1 + · · · + X_n)/√n ≤ b) = (1/(σ√(2π))) ∫_a^b e^{−t^2/(2σ^2)} dt.

This is equivalent to the limit formula in the example.
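A small numerical sketch of the second example (the particular Laurent polynomial f(x) = x^{−1}/4 + 1/2 + x/4, so σ^2 = f′′(1) = 1/2, and the values of a, b, n are illustrative choices, not taken from the notes): the coefficients of f(x)^n can be computed by repeated convolution and the partial sum compared with the Gaussian integral.

    # Coefficients C_i(n) of f(x)^n via repeated convolution; compare the sum over
    # a*sqrt(n) <= i <= b*sqrt(n) with (1/(sigma*sqrt(2*pi))) * int_a^b exp(-t^2/(2*sigma^2)) dt.
    import numpy as np
    from math import erf, sqrt

    p = np.array([0.25, 0.5, 0.25])     # coefficients of x^{-1}, x^0, x^1
    sigma2 = 0.5                        # = f''(1)
    a, b, n = -1.0, 1.0, 2000

    coeffs = np.array([1.0])
    for _ in range(n):
        coeffs = np.convolve(coeffs, p) # coefficients of f(x)^n; coeffs[k] = C_{k-n}(n)
    i = np.arange(-n, n + 1)
    lhs = coeffs[(i >= a * sqrt(n)) & (i <= b * sqrt(n))].sum()

    rhs = 0.5 * (erf(b / sqrt(2 * sigma2)) - erf(a / sqrt(2 * sigma2)))
    print(lhs, rhs)                     # both are close to erf(1), about 0.8427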

1.4. Riemann–Stieltjes integrals.

The Riemann–Stieltjes integral of a real-valued function f(x) of a real variable on [a, b] with respect to a real function g(x) on [a, b] is denoted by

∫_a^b f(x) dg(x)

and defined to be the limit, as the mesh of the partition

P = {a = x_0 < x_1 < · · · < x_n = b}

of the interval [a, b] approaches zero, of the approximating sum

S(P, f, g) = ∑_{i=0}^{n−1} f(c_i)(g(x_{i+1}) − g(x_i)),

where c_i is in the i-th subinterval [x_i, x_{i+1}]. The two functions f and g are respectively called the integrand and the integrator. The "limit" here is understood to be a number A (the value of the Riemann–Stieltjes integral) such that for every ε > 0, there exists δ > 0 such that


for every partition P with mesh(P) < δ and for every choice of points c_i ∈ [x_i, x_{i+1}],

|S(P, f, g) − A| < ε.

The following are cases in which ∫_a^b f(x) dg(x) exists.
(1) g(x) is continuous, and f(x) is bounded and has at most countably many discontinuity points.
(2) g(x) is a distribution function (see Section 1.2 for the definition) and f(x) is continuous.
In case (2), let µ be the probability measure on R associated to g(x) (Theorem 1.3); then the Riemann–Stieltjes integral ∫_a^b f(x) dg(x) is equal to the integral ∫_{(a,b]} f(x) µ(dx).

If g(x) ∈ C^1[a, b], then the Riemann–Stieltjes integral ∫_a^b f(x) dg(x) reduces to the Riemann integral

∫_a^b f(x) dg(x) = ∫_a^b f(x) g′(x) dx.
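This reduction is easy to check numerically; the following sketch (with an arbitrarily chosen integrand and integrator, not from the notes) compares the approximating sums S(P, f, g) with Riemann sums of f g′:

    # Riemann-Stieltjes sums for f = cos and the C^1 integrator g(x) = x^2 on [0, 1],
    # compared with Riemann sums of f(x) * g'(x) = 2x*cos(x) on the same partition.
    import numpy as np

    a, b, n = 0.0, 1.0, 100000
    x = np.linspace(a, b, n + 1)
    f, g = np.cos, lambda u: u**2

    rs_sum = np.sum(f(x[:-1]) * (g(x[1:]) - g(x[:-1])))        # S(P, f, g) with c_i = x_i
    riemann = np.sum(f(x[:-1]) * 2 * x[:-1] * (x[1:] - x[:-1]))
    print(rs_sum, riemann)                                     # both approximate the same value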

1.5. Conditional Expectation.

In this section we explain the concept of conditional expectation. Let (Ω, F, P) be a fixed probability space. Suppose we have another σ-algebra G ⊂ F. Let X be a random variable with E|X| < ∞, i.e., X ∈ L^1(Ω). Define a real-valued function µ on G by

µ(A) = ∫_A X(ω) dP(ω), A ∈ G.

Note that µ satisfies the following properties:
(1) µ(∅) = 0;
(2) for a sequence of disjoint sets A_i in G,

∑_{i=1}^∞ µ(A_i) = µ(∪_{i=1}^∞ A_i),

where the series on the left-hand side converges absolutely;
(3) if P(A) = 0, then µ(A) = 0.

A function µ : G → R satisfying conditions (1) and (2) is called a signed measure on (Ω, G). A signed measure µ is said to be absolutely continuous with respect to P if it satisfies condition (3). Therefore, the function µ defined above is a signed measure on (Ω, G) and is absolutely continuous with respect to P. Applying the Radon–Nikodym theorem to the signed measure µ, we get a G-measurable random variable Y with E|Y| < ∞ such that

µ(A) = ∫_A Y(ω) dP(ω) for all A ∈ G.


The random variable Y is called the conditional expectation of X given G and is denoted by E(X | G). It is characterized as the unique random variable Y ∈ L^1(Ω, G, P) satisfying

∫_A X(ω) dP(ω) = ∫_A Y(ω) dP(ω) for all A ∈ G.
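When G is generated by a finite partition of Ω, the defining property above forces E(X | G) to be the block-wise average of X. A minimal sketch (the finite probability space and the partition below are arbitrary choices for illustration):

    # Conditional expectation with respect to the sigma-algebra generated by a finite partition:
    # Y is constant on each block A_j with value E(X 1_{A_j}) / P(A_j).
    import numpy as np

    rng = np.random.default_rng(1)
    P = np.full(12, 1.0 / 12)                                  # uniform P on Omega = {0,...,11}
    X = rng.normal(size=12)
    labels = np.array([0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 2])    # partition defining G

    Y = np.empty(12)
    for j in np.unique(labels):
        block = labels == j
        Y[block] = np.sum(X[block] * P[block]) / np.sum(P[block])

    for j in np.unique(labels):                                # defining property: integrals agree
        block = labels == j
        print(np.sum(X[block] * P[block]), np.sum(Y[block] * P[block]))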

1.6. Martingales.

Let Ω be a space and let T be an interval in R or the set of non-negative integers. A filtration of σ-algebras is a family of σ-algebras {F_t, t ∈ T} such that F_t ⊂ F_{t′} whenever t < t′.

A stochastic process X_t, t ∈ T, is said to be adapted to {F_t, t ∈ T} if for each t, the random variable X_t is F_t-measurable.

Definition. Let X_t, t ∈ T, be a stochastic process adapted to a filtration {F_t, t ∈ T} with E|X_t| < ∞ for all t ∈ T. Then X_t is called a martingale with respect to {F_t} if for any s < t in T,

E(X_t | F_s) = X_s.

Definition. Let X_t, t ∈ T, be a stochastic process adapted to a filtration {F_t, t ∈ T} with E|X_t| < ∞ for all t ∈ T. Then X_t is called a submartingale with respect to {F_t} if for any s < t in T,

E(X_t | F_s) ≥ X_s.

If X_t, t ∈ T, is a martingale, then |X_t|, t ∈ T, is a submartingale.

Theorem 1.5 (Doob's Inequality for Discrete Submartingales). Let X_1, X_2, . . . , X_n be a submartingale with respect to F_1, F_2, . . . , F_n with all X_i ≥ 0. Then for all c > 0,

P(max_{1≤i≤n} X_i ≥ c) ≤ (1/c) E(X_n).

Proof. Notice that for n = 1 the inequality is just Chebyshev's inequality. For each 1 ≤ i ≤ n, let

A_i = {ω | X_i(ω) ≥ c, X_j(ω) < c for j < i}.

Then A_i is F_i-measurable and the A_i are mutually disjoint; moreover,

{ω | max_{1≤i≤n} X_i(ω) ≥ c} ⊂ ∐_{i=1}^n A_i.


Therefore

E(X_n) ≥ ∑_{i=1}^n ∫_{A_i} X_n(ω) P(dω)
       ≥ ∑_{i=1}^n ∫_{A_i} X_i(ω) P(dω)
       ≥ ∑_{i=1}^n c ∫_{A_i} P(dω)
       = c ∑_{i=1}^n P(A_i) = c P(max_{1≤i≤n} X_i ≥ c).  □
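A Monte Carlo sketch of Theorem 1.5 (the submartingale |S_i| built from a simple random walk and the threshold c are illustrative choices, not from the notes):

    # X_i = |S_i| for a simple random walk S_i is a non-negative submartingale;
    # estimate P(max_{1<=i<=n} X_i >= c) and compare with E(X_n)/c.
    import numpy as np

    rng = np.random.default_rng(2)
    n_steps, n_paths, c = 100, 20000, 15.0
    S = np.cumsum(rng.choice([-1.0, 1.0], size=(n_paths, n_steps)), axis=1)
    X = np.abs(S)

    lhs = np.mean(X.max(axis=1) >= c)
    rhs = np.mean(X[:, -1]) / c
    print(lhs, "<=", rhs)          # Doob's inequality: the left side is bounded by the right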

Theorem 1.6 (Doob's Inequality for Continuous Submartingales). Let X_t, a ≤ t ≤ b, be a submartingale with respect to F_t, t ∈ [a, b]. Assume all sample paths are continuous and X_t ≥ 0 for all t ∈ [a, b]. Then for all c > 0,

P(sup_{a≤t≤b} X_t ≥ c) ≤ (1/c) E(X_b).

Proof. Let Q = {r_1, r_2, . . .} be an enumeration of all rational numbers in [a, b]. Then by the continuity of the sample paths of X_t, we have

sup_{t∈[a,b]} X_t = sup_{r∈Q} X_r.

For each k, arrange the numbers in the set {r_1, r_2, . . . , r_k} in increasing order: r^k_1 < r^k_2 < · · · < r^k_k. Then for any c > 0,

{sup_{r∈Q} X_r ≥ c} = ∩_{n=1}^∞ ∪_{k=1}^∞ {max_{1≤j≤k} X_{r^k_j} ≥ c − 1/n}.

It follows that

P(sup_{a≤t≤b} X_t ≥ c) = P(sup_{r∈Q} X_r ≥ c) = lim_{n→∞} lim_{k→∞} P{max_{1≤j≤k} X_{r^k_j} ≥ c − 1/n}.

We note that X_{r^k_1}, . . . , X_{r^k_k} is a discrete submartingale, so we have

P{max_{1≤j≤k} X_{r^k_j} ≥ c − 1/n} ≤ (1/(c − 1/n)) E(X_{r^k_k}) ≤ (1/(c − 1/n)) E(X_b).

Letting k → ∞ and then n → ∞ gives the stated inequality.  □

1.7. Different Types of Convergence.

Definition. A sequence of random variables X_n converges in probability to a random variable X if

lim_{n→∞} P(|X_n − X| ≥ ε) = 0

for all ε > 0.


Definition. A sequence of random variables X_n converges almost everywhere (or almost surely) to a random variable X if, outside a subset of measure zero, X_n converges to X pointwise.

Definition. A sequence of random variables X_n in L^2 converges in L^2 to a random variable X if

lim_{n→∞} E(|X_n − X|^2) = 0.

Definition. A sequence of random variables X_n converges in distribution to a random variable X if the distribution functions F_{X_n}(x) converge to the distribution function F_X(x) of X at every continuity point x of F_X.

The relations between the types of convergence:
lim_{n→∞} X_n = X almost surely implies lim_{n→∞} X_n = X in probability.
lim_{n→∞} X_n = X in L^p implies lim_{n→∞} X_n = X in probability.
lim_{n→∞} X_n = X in probability implies X_n converges to X in distribution.

A sequence of random variables {X_n} is called fundamental in probability if for every ε > 0,

lim_{m,n→∞} P(|X_m − X_n| > ε) = 0.

This is equivalent to the condition that for every ε > 0, there is N such that whenever m, n > N,

P(|X_m − X_n| > ε) < ε.

We have the Cauchy Criterion: a sequence of random variables {X_n} converges in probability iff it is fundamental in probability.

2. Wiener Measure and Wiener Integral in C[0, 1]

In this section we construct the Wiener measure on the function space C[0, 1]; our exposition follows [2], pages 36–53. The reference for Section 2.5 is [1], pages 44–50.

2.1. Heat Kernel on R^n.

The heat kernel on R^n is

p(t, x) = (4πt)^{−n/2} e^{−(x_1^2 + · · · + x_n^2)/(4t)};

it satisfies the heat equation

∂_t p(t, x) = ∆p(t, x), p(0, x) = δ(x).


In our definition of the Wiener measure on paths in R^n, we use the slightly modified function

p(t, x) = (2πt)^{−n/2} e^{−(x_1^2 + · · · + x_n^2)/(2t)}.

It satisfies

2 ∂_t p(t, x) = ∆p(t, x), p(0, x) = δ(x),

and the semigroup property

∫_{R^n} p(t_1, x − y) p(t_2, y) dy = p(t_1 + t_2, x).
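A quick numerical check of the semigroup property in dimension n = 1 (the grid and the times t_1, t_2 below are arbitrary choices for illustration):

    # Discretize int p(t1, x - y) p(t2, y) dy on a grid and compare with p(t1 + t2, x).
    import numpy as np

    def p(t, x):
        return np.exp(-x**2 / (2 * t)) / np.sqrt(2 * np.pi * t)

    t1, t2 = 0.3, 0.7
    x = np.linspace(-10.0, 10.0, 2001)
    dy = x[1] - x[0]
    conv = np.convolve(p(t1, x), p(t2, x), mode="same") * dy
    print(np.max(np.abs(conv - p(t1 + t2, x))))   # small: the two sides agree up to discretization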

2.2. Banach Algebra C[0, 1] and cylinder sets.

Let C[0, 1] denote the space of continuous functions x(t) on [0, 1] such that x(0) = 0. It is a Banach space with norm

‖x(t)‖ = max{|x(t)| : 0 ≤ t ≤ 1}.

A subset of C[0, 1] of the form

I = {x(t) ∈ C[0, 1] | (x(t_1), . . . , x(t_n)) ∈ E},     (2.1)

where t_1, . . . , t_n are distinct points in [0, 1] and E ⊂ R^n is a Borel subset, is called a cylinder set.

Lemma 2.1. The system of all cylinder sets forms an algebra.

Proof. We need to prove that C[0, 1] itself is a cylinder set, that the complement of a cylinder set is a cylinder set, and that the union of two cylinder sets is a cylinder set. Taking E = R^n in (2.1), we see I = C[0, 1]. For I as in (2.1), its complement is

I^c = {x(t) ∈ C[0, 1] | (x(t_1), . . . , x(t_n)) ∈ E^c},

which is a cylinder set. To prove that the union of two cylinder sets is a cylinder set, we notice that I in (2.1) can also be expressed as

I = {x(t) ∈ C[0, 1] | (x(t_1), . . . , x(t_n), . . . , x(t_m)) ∈ E × R^{m−n}}.

So we may assume that two cylinder sets have the same set of cutting points t_1, . . . , t_m.  □

Theorem 2.2. The Borel σ-algebra of C[0, 1] is the same as the σ-algebra generated by the cylinder sets.

Proof. Let B denote the system of all Borel sets in C[0, 1] and C the σ-algebra generated by the cylinder sets. We prove B ⊂ C and C ⊂ B.

To prove B ⊂ C, it is enough to prove that a closed ball B = {x(t) | ‖x(t) − y(t)‖ ≤ r} is in C. This follows from the identity

B = ∩_{a∈Q∩[0,1]} {x(t) | |x(a) − y(a)| ≤ r}.


To prove C ⊂ B, it is enough to prove that a cylinder set of the form

I = {x(t) ∈ C[0, 1] | x(t_0) ∈ E},

where E is an open subset of R, is in B. But I is clearly an open subset of C[0, 1].  □

2.3. Wiener Measure on C[0, 1].

For a subset I of C[0, 1] of the form (2.1) with 0 < t_1 < t_2 < · · · < t_n ≤ 1, we define

w(I) = ((2π)^n t_1 (t_2 − t_1) · · · (t_n − t_{n−1}))^{−1/2} ∫_E exp( −( x_1^2/(2t_1) + (x_2 − x_1)^2/(2(t_2 − t_1)) + · · · + (x_n − x_{n−1})^2/(2(t_n − t_{n−1})) ) ) dx_1 · · · dx_n.

The semigroup property implies that w(I) is well defined, and it is clear that w is finitely additive.
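The formula for w(I) says exactly that, under w, the increments x(t_1), x(t_2) − x(t_1), . . . , x(t_n) − x(t_{n−1}) are independent centered Gaussians with variances t_1, t_2 − t_1, . . . . A simulation sketch (the time points and sample size below are arbitrary choices) recovers the covariance E[x(s)x(t)] = min(s, t):

    # Sample the finite-dimensional distributions of w via independent Gaussian increments
    # and check that the empirical covariance of (x(t_1), ..., x(t_n)) is min(t_i, t_j).
    import numpy as np

    rng = np.random.default_rng(3)
    t = np.array([0.1, 0.3, 0.55, 0.8, 1.0])
    n_paths = 200000
    increments = rng.normal(scale=np.sqrt(np.diff(t, prepend=0.0)), size=(n_paths, len(t)))
    x = np.cumsum(increments, axis=1)

    print(np.round(x.T @ x / n_paths, 2))   # approximately min(t_i, t_j)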

Theorem 2.3. w extends to a measure on the Borel σ-algebra of C[0, 1].

In view of Theorem 1.1 and Theorem 1.2, we only need to prove:

Lemma 2.4. If I_n (n = 1, 2, . . .) is a decreasing sequence of cylinder sets and ∩_{n=1}^∞ I_n is empty, then lim_{n→∞} w(I_n) = 0.

We first reduce the lemma to the following lemma:

Lemma 2.5. If K_n (n = 1, 2, . . .) is a decreasing sequence of closed cylinder sets and ∩_{n=1}^∞ K_n is empty, then lim_{n→∞} w(K_n) = 0.

Proof that Lemma 2.5 implies Lemma 2.4. Let

I_n = I(t^{(n)}_1, . . . , t^{(n)}_{s_n}; E_n) = {x(t) ∈ C[0, 1] | (x(t^{(n)}_1), . . . , x(t^{(n)}_{s_n})) ∈ E_n}.

For arbitrary ε > 0 and n ∈ Z_{>0}, there is a closed subset C_n ⊂ E_n such that

w(I(t^{(n)}_1, . . . , t^{(n)}_{s_n}; C_n)) ≤ w(I(t^{(n)}_1, . . . , t^{(n)}_{s_n}; E_n)) ≤ w(I(t^{(n)}_1, . . . , t^{(n)}_{s_n}; C_n)) + ε/2^n.

Let us denote K_n = I(t^{(n)}_1, . . . , t^{(n)}_{s_n}; C_n) and L_n = ∩_{i=1}^n K_i; then L_n is a decreasing sequence of closed cylinder sets with ∩_{n=1}^∞ L_n ⊂ ∩_{n=1}^∞ I_n = ∅, and

w(I_n) = w(L_n) + w(I_n − L_n) ≤ w(L_n) + ∑_{i=1}^n w(I_n − K_i) ≤ w(L_n) + ∑_{i=1}^n w(I_i − K_i) < w(L_n) + ε.

Suppose Lemma 2.5 holds; then lim sup_{n→∞} w(I_n) ≤ ε, and since this holds for all ε, lim_{n→∞} w(I_n) = 0.  □

To prove Lemma 2.5, we introduce some sets. Let S be the set of rational numbers r in [0, 1] of the form r = b/2^n (dyadic rationals). It is clear that S is dense in [0, 1].


Let

H_α(a) = {x(t) ∈ C[0, 1] | ∃ s_1 < s_2 in S such that |x(s_1) − x(s_2)| > a|s_1 − s_2|^α}.

Its complement is

H_α(a)^c = {x(t) ∈ C[0, 1] | ∀ s_1 < s_2 in S, |x(s_1) − x(s_2)| ≤ a|s_1 − s_2|^α}.

Since S is dense in [0, 1], we have

H_α(a)^c = {x(t) ∈ C[0, 1] | ∀ s_1 < s_2 in [0, 1], |x(s_1) − x(s_2)| ≤ a|s_1 − s_2|^α}.

Lemma 2.6. For any 0 < α < 1/2 and any ε > 0, we have w(H_α(a)) < ε for a large enough.

Proof of Lemma 2.5. Assume Lemma 2.6 holds. If lim_{n→∞} w(K_n) ≠ 0, then there is ε > 0 such that w(K_n) > ε for all n. Choose a so large that w(H_α(a)) < ε; then K_n is not contained in H_α(a), i.e., we can find x_n ∈ K_n ∩ H_α(a)^c. We get a sequence x_1(t), . . . , x_n(t), . . . in C[0, 1]. This sequence is bounded and equicontinuous (since all its members are in H_α(a)^c). By the Arzelà–Ascoli theorem, there is a subsequence x_{n_i} that converges uniformly to some x(t); then x(t) ∈ ∩_{n=1}^∞ K_n. Contradiction.  □

Steps to prove Lemma 2.6.

Step 1. We prove that if x ∈ C[0, 1] satisfies

|x(k/2^n) − x((k−1)/2^n)| ≤ a (1/2^n)^α for all 1 ≤ k ≤ 2^n and all n = 1, 2, . . . ,

for some α > 0, a > 0, then

|x(s_1) − x(s_2)| ≤ (2a/(1 − 2^{−α})) |s_1 − s_2|^α for all s_1, s_2 ∈ S,

and hence

|x(s_1) − x(s_2)| ≤ (2a/(1 − 2^{−α})) |s_1 − s_2|^α for all s_1, s_2 ∈ [0, 1].

Step 2. Set

I_{α,a,k,n} = {x(t) ∈ C[0, 1] | |x(k/2^n) − x((k−1)/2^n)| > a (1/2^n)^α}.

Step 1 implies that

∩_{n=1}^∞ ∩_{k=1}^{2^n} I^c_{α,a,k,n} ⊂ H_α(2a/(1 − 2^{−α}))^c,

so

H_α(2a/(1 − 2^{−α})) ⊂ ∪_{n=1}^∞ ∪_{k=1}^{2^n} I_{α,a,k,n}.

Using the estimate

w(I_{α,a,k,n}) ≤ √(2/π) (1/a) 2^{n(α−1/2)} e^{−(a^2/2)·2^{n(1−2α)}},


we get

w^*(H_α(2a/(1 − 2^{−α}))) ≤ √(2/π) (1/a) ∑_{k=1}^∞ 2^{k(α+1/2)} e^{−(a^2/2)·2^{k(1−2α)}}.

From this we see that Lemma 2.6 holds.

For α > 0, let

C_α = {x(t) ∈ C[0, 1] | ∃ a such that |x(s_1) − x(s_2)| ≤ a|s_1 − s_2|^α for all s_1, s_2 ∈ [0, 1]}.

Then it is clear that

C_α = {x(t) ∈ C[0, 1] | ∃ a such that |x(s_1) − x(s_2)| ≤ a|s_1 − s_2|^α for all s_1, s_2 ∈ S},

so C_α is a Borel set. It is clear that

C_α = ∪_{n=1}^∞ C_α(n),

where

C_α(n) = {x(t) ∈ C[0, 1] | |x(s_1) − x(s_2)| ≤ n|s_1 − s_2|^α for all s_1, s_2 ∈ S}.

It is clear that C_α(n)^c = H_α(n), so

C_α^c = (∪_{n=1}^∞ C_α(n))^c = ∩_{n=1}^∞ H_α(n).

Lemma 2.6 implies that w(C_α^c) = 0 when α < 1/2, so we have

Theorem 2.7. For 0 < α < 1/2, w(C_α) = 1.

Theorem 2.8. For α > 1/2, w(C_α) = 0.

2.4. Some Simple Integrations against the Wiener Measure.

If f(t) is a bounded function on [0, 1] with at most countably many discontinuity points, then the Riemann–Stieltjes integral defines a linear function

I(f) : C[0, 1] → R, x(t) ∈ C[0, 1] ↦ I(f)(x) = ∫_0^1 f(t) dx(t).

For f(t) = ∑_{j=1}^n a_j 1_{(t_j, t_{j+1}]}, where 0 = t_1 < t_2 < · · · < t_n < t_{n+1} = 1,

I(f)(x) = ∫_0^1 f(t) dx(t) = ∑_{j=1}^n a_j (x(t_{j+1}) − x(t_j));


this is a sum of n independent random variables of mean 0, so it is a Gaussian random variable of mean 0 and variance

∑_{j=1}^n a_j^2 (t_{j+1} − t_j) = ∫_0^1 |f(t)|^2 dt.

The variance of I(f)(x) is thus also given by

∫_{C[0,1]} |I(f)(x)|^2 w(dx) = ∫_0^1 |f(t)|^2 dt.     (2.2)

Let V be the space of step functions f on [0, 1]; it is a dense subspace of L^2[0, 1]. The map V → L^2(C[0, 1], w), f ↦ I(f), is an isometry by (2.2), so it extends to an isometry

L^2[0, 1] → L^2(C[0, 1], w).

For f ∈ L^2[0, 1] we denote the image of f formally as

I(f) = ∫_0^1 f(t) dx(t).

It is called the stochastic integral.
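A Monte Carlo sketch of the isometry (2.2) for one step function (the cutting points and values below are arbitrary choices, not from the notes):

    # For f = sum_j a_j 1_{(t_j, t_{j+1}]}, I(f)(x) = sum_j a_j (x(t_{j+1}) - x(t_j)) should have
    # mean 0 and variance int_0^1 f(t)^2 dt under the Wiener measure.
    import numpy as np

    rng = np.random.default_rng(4)
    t = np.array([0.0, 0.2, 0.5, 0.9, 1.0])
    a = np.array([1.0, -2.0, 0.5, 3.0])            # values of f on the four subintervals
    n_paths = 200000

    incr = rng.normal(scale=np.sqrt(np.diff(t)), size=(n_paths, len(a)))
    I = incr @ a                                   # samples of I(f)

    print(I.mean(), I.var())                       # approximately 0 and 2.4
    print(np.sum(a**2 * np.diff(t)))               # int_0^1 f(t)^2 dt = 2.4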

Proposition 2.9. I(f) = ∫_0^1 f(t) dx(t) is a Gaussian random variable of mean 0 and variance ∫_0^1 f(t)^2 dt.

Proof. It is enough to prove that the characteristic function

E(e^{iyI(f)(x)}) = ∫_{C[0,1]} e^{iyI(f)(x)} w(dx)

is a Gaussian function of y. Let f_n be a sequence of step functions on [0, 1] that converges to f in L^2[0, 1]; then I(f_n) converges to I(f) in L^2(C[0, 1], w), and since w is a probability measure, I(f_n) converges to I(f) in L^1(C[0, 1], w). Therefore

lim_{n→∞} ∫_{C[0,1]} e^{iyI(f_n)(x)} w(dx) = ∫_{C[0,1]} e^{iyI(f)(x)} w(dx).

Since I(f_n) is a Gaussian random variable of mean 0 and variance ∫_0^1 f_n(t)^2 dt,

∫_{C[0,1]} e^{iyI(f_n)(x)} w(dx) = e^{−(y^2/2) ∫_0^1 f_n(t)^2 dt}.

Therefore

∫_{C[0,1]} e^{iyI(f)(x)} w(dx) = e^{−(y^2/2) ∫_0^1 f(t)^2 dt}.  □

Proposition 2.10. If f_1, . . . , f_n ∈ L^2[0, 1] are mutually orthogonal, then the Gaussian random variables I(f_1), . . . , I(f_n) are independent.


Theorem 2.11. For f_1, . . . , f_{2n} ∈ L^2[0, 1],

∫_{C[0,1]} I(f_1)(x) · · · I(f_{2n})(x) w(dx) = ∑ (f_{i_1}, f_{j_1}) · · · (f_{i_n}, f_{j_n}),     (2.3)

where {(i_1, j_1), . . . , (i_n, j_n)} runs through all partitions of {1, 2, . . . , 2n} into two-element subsets, and (f_i, f_j) = ∫_0^1 f_i(t) f_j(t) dt.

Proof. Since both sides are linear in each f_i and symmetric under permutations of 1, 2, . . . , 2n, it is enough to prove the case f_1 = f_2 = · · · = f_{2n} = f. Then the left-hand side is the 2n-th moment of the random variable I(f). Since its characteristic function is

E e^{iI(f)(x)y} = e^{−(1/2)σy^2}, σ = ∫_0^1 f(t)^2 dt,

we have

∫_{C[0,1]} I(f)^{2n}(x) w(dx) = ((2n)!/(n! 2^n)) (∫_0^1 f(t)^2 dt)^n.

Since there are (2n)!/(n! 2^n) partitions of {1, . . . , 2n} into two-element subsets, the formula is correct for the case f_1 = · · · = f_{2n}.  □

2.5. Wiener Measure on C[0, ∞).

Let C[0, ∞) be the space of continuous functions x : [0, ∞) → R with x(0) = 0. C[0, ∞) has a metric that gives the topology of locally uniform convergence. The metric is constructed as follows. First, for each positive integer n, we introduce the semi-norm | · |_n by

|f|_n = max_{0≤x≤n} |f(x)|.

It is clear that a sequence {f_i} in C[0, ∞) converges uniformly to f on [0, n] iff

lim_{i→∞} |f_i − f|_n = 0.

We then define a metric on C[0, ∞) by

d(f, g) = ∑_{n=1}^∞ (1/2^n) min(|f − g|_n, 1).

It is easy to see that

lim_{i→∞} d(f_i, f) = 0

iff f_i converges to f uniformly on each compact interval I ⊂ [0, ∞). We consider C[0, ∞) as a complete metric space with the above metric d(·, ·), and let B be the σ-algebra of its Borel sets.


We can also consider cylinder sets

I = {x(t) ∈ C[0, ∞) | (x(t_1), . . . , x(t_n)) ∈ E},     (2.4)

where t_1, . . . , t_n are distinct points in [0, ∞) and E ⊂ R^n is a Borel subset. It turns out that the σ-algebra generated by the cylinder sets coincides with the Borel σ-algebra B; the proof is similar to that of Theorem 2.2.

The Wiener measure on (C[0, ∞), B) is defined in the same way as the Wiener measure on C[0, 1]: for a subset I of C[0, ∞) of the form (2.4) with 0 < t_1 < t_2 < · · · < t_n, w(I) is given by the Gaussian formula of Section 2.3, and w extends to a measure on B.


3. Brownian Motion

3.1. Quadratic Variation of Brownian Motion.

Theorem 3.1. For a cutting of a finite interval [a, b], ∆ : a = t_0 < t_1 < · · · < t_n = b, let us consider

V(∆) := ∑_{j=1}^n (B_{t_j} − B_{t_{j−1}})^2.

Then

lim_{‖∆‖→0} V(∆) = b − a in L^2(Ω).

Proof. We have

E((V(∆) − (b − a))^2) = E( ( ∑_{j=1}^n ((B_{t_j} − B_{t_{j−1}})^2 − (t_j − t_{j−1})) )^2 )
  = ∑_{i,j=1}^n E( ((B_{t_i} − B_{t_{i−1}})^2 − (t_i − t_{i−1})) ((B_{t_j} − B_{t_{j−1}})^2 − (t_j − t_{j−1})) ).

If i ≠ j, then (B_{t_i} − B_{t_{i−1}})^2 − (t_i − t_{i−1}) and (B_{t_j} − B_{t_{j−1}})^2 − (t_j − t_{j−1}) are independent and each has mean 0, so

E( ((B_{t_i} − B_{t_{i−1}})^2 − (t_i − t_{i−1})) ((B_{t_j} − B_{t_{j−1}})^2 − (t_j − t_{j−1})) ) = 0.

So

E((V(∆) − (b − a))^2) = ∑_{i=1}^n E( ((B_{t_i} − B_{t_{i−1}})^2 − (t_i − t_{i−1}))^2 ).

For each i,

E( ((B_{t_i} − B_{t_{i−1}})^2 − (t_i − t_{i−1}))^2 ) = (1/√(2π(t_i − t_{i−1}))) ∫_R (x^2 − (t_i − t_{i−1}))^2 e^{−x^2/(2(t_i − t_{i−1}))} dx
  = ((t_i − t_{i−1})^2/√(2π)) ∫_R (y^2 − 1)^2 e^{−y^2/2} dy = 2(t_i − t_{i−1})^2.

So

E((V(∆) − (b − a))^2) = 2 ∑_{i=1}^n (t_i − t_{i−1})^2 ≤ 2‖∆‖(b − a) → 0 as ‖∆‖ → 0.  □
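A simulation sketch of Theorem 3.1 on [a, b] = [0, 1] (the path count and the meshes below are arbitrary choices): the sums V(∆) concentrate at b − a = 1 as the mesh shrinks.

    # Quadratic variation sums of simulated Brownian increments over uniform partitions of [0, 1].
    import numpy as np

    rng = np.random.default_rng(5)
    for n in [10, 100, 1000, 10000]:
        dB = rng.normal(scale=np.sqrt(1.0 / n), size=(1000, n))   # increments of 1000 paths
        V = np.sum(dB**2, axis=1)
        print(n, V.mean(), V.var())   # mean near 1, variance about 2/n, tending to 0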

3.2. Canonical Model of Brownian Motion.

Let (C[0, ∞), B, w) be the probability space constructed in Section 2.5. For each time t ≥ 0, we define the random variable

B_t : (C[0, ∞), B, w) → R, B_t(x) = x(t).

It is clear that the process {B_t}, t ∈ [0, ∞), is a Brownian motion, and the space C[0, ∞) is the space of sample paths. More generally, for any probability space (Ω, Σ, P) with a measurable map

π : Ω → C[0, ∞)

preserving the measure in the sense that

P(π^{−1}(A)) = w(A),

the pull-back process

π^*B_t(ω) = B_t(π(ω)), t ∈ [0, ∞),

is a Brownian motion with continuous sample paths.

Conversely, given any Brownian motion {B_t} on a probability space (Ω, Σ, P), after removing a subspace Z ⊂ Ω of measure 0, the modified Brownian motion

B_t : Ω − Z → R

has continuous sample paths, i.e., for each ω ∈ Ω − Z, t ↦ B_t(ω) is a continuous function on [0, ∞) with B_0(ω) = 0. So we have a map

π : Ω − Z → C[0, ∞), π(ω)(t) = B_t(ω).

π is a measurable map, as the preimage of the cylinder set (2.4) is equal to

{ω | (B_{t_1}(ω), . . . , B_{t_n}(ω)) ∈ E},

which is clearly measurable in Ω. It is also clear that

P(π^{−1}(A)) = w(A)

for every cylinder set A ⊂ C[0, ∞), which implies the equality for all measurable sets A ⊂ C[0, ∞). This shows that every Brownian motion is a pull-back of {B_t} on C[0, ∞). We call this Brownian motion on C[0, ∞) the canonical model.

4. Stochastic Integrals

The main reference for this section is [3], Sections 4 and 5. Let B(t) be a Brownian motion and let F_t be the σ-algebra generated by B(s) with s ≤ t. We denote by

L^2_{ad}([a, b] × Ω)

the space of all stochastic processes f(t, ω), a ≤ t ≤ b, ω ∈ Ω, satisfying the following conditions:


(1) f(t, ω) is adapted to the filtration {F_t};
(2) ∫_a^b E|f(t)|^2 dt < ∞.

4.1. Stochastic Integral of Step Processes.

For a step process f(t, ω) = ∑_{i=1}^n ξ_{i−1}(ω) 1_{[t_{i−1}, t_i)}(t) in L^2_{ad}([a, b] × Ω), where a = t_0 < t_1 < · · · < t_n = b and each ξ_{i−1} is F_{t_{i−1}}-measurable with E(ξ_{i−1}^2) < ∞, the stochastic integral is defined by

I(f) = ∫_a^b f(t) dB(t) = ∑_{i=1}^n ξ_{i−1} (B(t_i) − B(t_{i−1})),

and I_t(f) = ∫_a^t f(s) dB(s) is defined analogously for a ≤ t ≤ b.

Theorem 4.1. Let f be a step process in L^2_{ad}([a, b] × Ω). Then
(1) E(|I(f)|^2) = ∫_a^b E(|f(t)|^2) dt;
(2) I_t(f), a ≤ t ≤ b, is a martingale with respect to {F_t}, and so is I_t(f)^2 − ∫_a^t f(s)^2 ds;
(3) the quadratic variation of I_t(f) over [a, b] is ∫_a^b f(s)^2 ds.


Proof. (1) We have

E(|I(f)|^2) = ∑_{i,j=1}^n E( ξ_{i−1} ξ_{j−1} (B(t_i) − B(t_{i−1}))(B(t_j) − B(t_{j−1})) )
  = ∑_{i=1}^n E( ξ_{i−1}^2 (B(t_i) − B(t_{i−1}))^2 )
  = ∑_{i=1}^n E(ξ_{i−1}^2)(t_i − t_{i−1})
  = ∫_a^b E(|f|^2) dt.

In the last steps we used the facts that ξ_{i−1} and B(t_i) − B(t_{i−1}) are independent and that E((B(t_i) − B(t_{i−1}))^2) = t_i − t_{i−1}.

(2) It is clear that I_t(f) is a martingale. We now prove that for t′ < t,

E( I_t^2 − ∫_a^t f(s)^2 ds | F_{t′} ) = I_{t′}^2 − ∫_a^{t′} f(s)^2 ds.

By adding more cutting points, we may assume that t = t_m and t′ = t_{m−1}. We have

I_{t_m}^2 − ∫_a^{t_m} f(s)^2 ds = ∑_{i,j=1}^m ξ_{i−1} ξ_{j−1} (B(t_i) − B(t_{i−1}))(B(t_j) − B(t_{j−1})) − ∑_{i=1}^m ξ_{i−1}^2 (t_i − t_{i−1}),

so (the cross terms with exactly one index equal to m have conditional expectation 0)

E( I_{t_m}^2 − ∫_a^{t_m} f(s)^2 ds | F_{t_{m−1}} )
  = ∑_{i,j=1}^{m−1} ξ_{i−1} ξ_{j−1} (B(t_i) − B(t_{i−1}))(B(t_j) − B(t_{j−1}))
    + ξ_{m−1}^2 E( (B(t_m) − B(t_{m−1}))^2 | F_{t_{m−1}} ) − ∑_{i=1}^m ξ_{i−1}^2 (t_i − t_{i−1})
  = ∑_{i,j=1}^{m−1} ξ_{i−1} ξ_{j−1} (B(t_i) − B(t_{i−1}))(B(t_j) − B(t_{j−1})) − ∑_{i=1}^{m−1} ξ_{i−1}^2 (t_i − t_{i−1})
  = I_{t_{m−1}}^2 − ∫_a^{t_{m−1}} f(s)^2 ds.

(3) We want to prove that

S(∆, f) = ∑_{j=1}^m ( (I_{s_j}(f) − I_{s_{j−1}}(f))^2 − ∫_{s_{j−1}}^{s_j} f(s)^2 ds ) = ∑_{j=1}^m A_j     (4.4)

has L^2-limit 0 as ‖∆‖ → 0, where ∆ : a = s_0 < s_1 < · · · < s_m = b is a partition of [a, b].

Claim 1. (4.4) has limit 0 as ‖∆‖ → 0 when ∆ contains all the cutting points t_1, . . . , t_n of f.

Claim 2. lim_{‖∆‖→0} (S(∆, f) − S(∆′, f)) = 0, where ∆′ is obtained by adding t_1, . . . , t_n to ∆.

It is easy to see that Claim 2 holds, so the proof reduces to the proof of Claim 1. Notice that

S(∆, f_1 + f_2)
  = ∑_j ( (I_{s_j}(f_1) − I_{s_{j−1}}(f_1))^2 + (I_{s_j}(f_2) − I_{s_{j−1}}(f_2))^2 + 2(I_{s_j}(f_1) − I_{s_{j−1}}(f_1))(I_{s_j}(f_2) − I_{s_{j−1}}(f_2)) )
    − ∑_j ( ∫_{s_{j−1}}^{s_j} f_1(s)^2 ds + ∫_{s_{j−1}}^{s_j} f_2(s)^2 ds + 2 ∫_{s_{j−1}}^{s_j} f_1 f_2 ds ).

From this we see that if f_1 and f_2 have disjoint supports and ∆ contains all the cutting points of f_1 and f_2, then

(I_{s_j}(f_1) − I_{s_{j−1}}(f_1))(I_{s_j}(f_2) − I_{s_{j−1}}(f_2)) = 0 and ∫_{s_{j−1}}^{s_j} f_1 f_2 ds = 0,

so we have

S(∆, f_1 + f_2) = S(∆, f_1) + S(∆, f_2).

Using the fact that I_{s_j}(f) − I_{s_{j−1}}(f) = ∫_{s_{j−1}}^{s_j} f(s) dB(s), we have

E( (I_{s_j}(f) − I_{s_{j−1}}(f))^2 ) = ∫_{s_{j−1}}^{s_j} E f(s)^2 ds,

so

∑_{j=1}^m E|I_{s_j} − I_{s_{j−1}}|^2 = ∫_a^b E f(s)^2 ds.

The proof of Claim 1 thus further reduces to the case f = ξ 1_{[c,d)}, and this case reduces to Theorem 3.1.  □

4.2. An Approximation Lemma.

Lemma 4.2. Let S ⊂ L^2_{ad}([a, b] × Ω) be the span of the step functions in L^2_{ad}([a, b] × Ω). Then S is dense in L^2_{ad}([a, b] × Ω).


The proof of the lemma in [3] consists of three steps. We discuss here the second step, which is the following: for every f ∈ L^2_{ad}([a, b] × Ω), there is a sequence {g_n} in L^2_{ad}([a, b] × Ω) such that g_n converges to f in L^2([a, b] × Ω) and, for each n, the two-variable function

E(g_n(t) g_n(s))

is continuous.

Approximation results in analysis can often be proved using convolution, and this is also the case for the above statement. Consider the functions K_n(t) (n = 1, 2, . . .) on [0, ∞) given by

K_n(t) = n e^{−nt}.

They have the properties that K_n ≥ 0 and

∫_0^∞ K_n(t) dt = 1,

and for n large, K_n is concentrated in a small neighborhood of t = 0. We consider the convolution product of K_n(t) and f(t, ω) (as a function of t),

g_n(t, ω) = ∫_a^t K_n(t − u) f(u, ω) du = ∫_a^t n e^{−n(t−u)} f(u, ω) du.

We prove that the sequence g_n (n = 1, 2, . . .) satisfies the conditions in the claim. First,

E(g_n(s) g_n(t)) = ∫_a^s ∫_a^t n^2 e^{−n(s+t−u_1−u_2)} ∫_Ω f(u_1, ω) f(u_2, ω) P(dω) du_1 du_2,

which is clearly continuous in (s, t). Next,

f(t) − g_n(t) = ∫_0^∞ e^{−τ} (f(t) − f(t − n^{−1}τ)) dτ,

where f(s) is understood to be 0 if s < a or s > b. The Schwarz inequality implies that

|f(t) − g_n(t)|^2 ≤ ∫_0^∞ e^{−τ} |f(t) − f(t − n^{−1}τ)|^2 dτ,

so we have

∫_a^b E(|f(t) − g_n(t)|^2) dt ≤ ∫_0^∞ e^{−τ} E( ∫_a^b |f(t) − f(t − n^{−1}τ)|^2 dt ) dτ.

For almost all ω, |f(t, ω) − f(t − n^{−1}τ, ω)|^2 is a measurable function of t and is in L^1[a, b]. One can prove that for such ω,

lim_{n→∞} ∫_a^b |f(t, ω) − f(t − n^{−1}τ, ω)|^2 dt = 0.


(This can be proved as follows. For every ε > 0, there is a step function ∑_{i=1}^N c_i 1_{A_i}(t) such that

∫_R |f(t, ω) − ∑_{i=1}^N c_i 1_{A_i}(t)|^2 dt < ε,

where each A_i is an interval of finite length. It follows that

∫_R |f(t − τ/n, ω) − ∑_{i=1}^N c_i 1_{A_i}(t − τ/n)|^2 dt < ε

for all n. Then one can prove directly that

∫_R |∑_{i=1}^N c_i 1_{A_i}(t) − ∑_{i=1}^N c_i 1_{A_i}(t − τ/n)|^2 dt → 0 as n → ∞,

and the claim follows from the triangle inequality.) This proves the claim.

4.3. The Definition of the Integral for L^2_{ad}([a, b] × Ω).

Let S ⊂ L^2_{ad}([a, b] × Ω) be the span of the step functions in L^2_{ad}([a, b] × Ω). Theorem 4.1 (1) shows that the linear map

I : f ↦ ∫_a^b f(s) dB(s)

is an isometry on S, so it extends to an isometry

I : L^2_{ad}([a, b] × Ω) → L^2(Ω).

More explicitly, for f ∈ L^2_{ad}([a, b] × Ω) we can find a sequence of step functions {f_n} such that

lim_{n→∞} ∫_a^b E(|f − f_n|^2) dt = 0;

then ∫_a^b f(s) dB(s) is the L^2-limit of ∫_a^b f_n(s) dB(s). It is clear that for each t,

∫_a^t f_n(s) dB(s)

has an L^2-limit, which we denote by

∫_a^t f(s) dB(s).

The convergence in L^2 of

lim_{n→∞} ∫_a^t f_n(s) dB(s) = ∫_a^t f(s) dB(s)

is uniform for t ∈ [a, b].


Theorem 4.3. Suppose the Brownian motion B_t has continuous sample paths. Let f ∈ L^2_{ad}([a, b] × Ω). Then the stochastic process

∫_a^t f(s) dB(s), a ≤ t ≤ b,

is continuous, namely, almost all of its sample paths are continuous functions on the interval [a, b].

Proof. For a step process, formula (4.3) clearly indicates that the sample paths are continuous. For the general case, let {f_n} be a sequence of step stochastic processes in L^2_{ad}([a, b] × Ω) such that

∫_a^b E|f_n − f|^2 dt ≤ 1/n^6.

In particular, {f_n} converges to f in L^2. For each n, define a stochastic process

X^{(n)}_t = ∫_a^t f_n(s) dB(s), a ≤ t ≤ b.

All the sample paths of X^{(n)} are continuous. Since X_t − X^{(n)}_t is a martingale, |X_t − X^{(n)}_t| is a submartingale. By Doob's inequality (Theorem 1.6), we have

P( sup_{t∈[a,b]} |X_t − X^{(n)}_t| ≥ 1/n ) ≤ n E|X_b − X^{(n)}_b|.

By the Schwarz inequality and the isometry,

n E|X_b − X^{(n)}_b| ≤ n ( E|X_b − X^{(n)}_b|^2 )^{1/2} = n ( ∫_a^b E|f − f_n|^2 dt )^{1/2} ≤ n · (1/n^3) = 1/n^2.

So we have

P( sup_{t∈[a,b]} |X_t − X^{(n)}_t| ≥ 1/n ) ≤ 1/n^2.

Since ∑_n 1/n^2 < ∞, the Borel–Cantelli lemma implies that almost surely sup_{t∈[a,b]} |X_t − X^{(n)}_t| < 1/n for all sufficiently large n. Hence X^{(n)}_t converges to X_t uniformly on [a, b] almost surely, and therefore almost all sample paths of X_t are continuous.  □


4.4. Extension to L_{ad}(Ω, L^2[a, b]).

Let L_{ad}(Ω, L^2[a, b]) denote the space of {F_t}-adapted stochastic processes f(t, ω) such that ∫_a^b |f(t, ω)|^2 dt < ∞ for almost all ω. Then L^2_{ad}([a, b] × Ω) ⊂ L_{ad}(Ω, L^2[a, b]). The purpose of this subsection is to extend the stochastic integral ∫_a^b f(t, ω) dB_t to f(t, ω) ∈ L_{ad}(Ω, L^2[a, b]).

Lemma 4.4. Let f ∈ L_{ad}(Ω, L^2[a, b]). Then there exists a sequence {f_n} of step functions in L^2_{ad}([a, b] × Ω) such that

lim_{n→∞} ∫_a^b |f_n(t) − f(t)|^2 dt = 0

almost surely, and hence also in probability.

Lemma 4.5. Let f(t) be a step stochastic process in L^2_{ad}([a, b] × Ω). Then for all ε > 0 and C > 0,

P( |∫_a^b f(t) dB_t| > ε ) ≤ C/ε^2 + P( ∫_a^b |f(t)|^2 dt > C ).

To define the general stochastic integral ∫_a^b f(t, ω) dB_t, let {f_n} be a sequence of step functions in L^2_{ad}([a, b] × Ω) as in Lemma 4.4. The stochastic integral

I(f_n) = ∫_a^b f_n(t) dB(t)

is defined in Subsection 4.1. We now verify that the sequence {I(f_n)} converges in probability, using the Cauchy criterion of Section 1.7. Apply Lemma 4.5 to f = f_n − f_m with ε > 0 and C = ε^3/2 to get

P(|I(f_m) − I(f_n)| > ε) ≤ ε/2 + P( ∫_a^b |f_m(t) − f_n(t)|^2 dt > ε^3/2 ).

Since

lim_{m,n→∞} ∫_a^b |f_m(t) − f_n(t)|^2 dt = 0 almost surely,

we have

lim_{m,n→∞} P( ∫_a^b |f_m(t) − f_n(t)|^2 dt > ε^3/2 ) = 0.

This proves that the sequence {I(f_n)} is fundamental in probability, so by the Cauchy criterion (see Section 1.7), {I(f_n)} converges in probability;

∫_a^b f(t, ω) dB_t

is defined to be the limit. It is easy to see that ∫_a^b f(t, ω) dB_t is independent of the choice of {f_n}.


Theorem 4.6. Suppose f is a continuous {F_t}-adapted stochastic process. Then f ∈ L_{ad}(Ω, L^2[a, b]) and

∫_a^b f(t) dB_t = lim_{‖∆_n‖→0} ∑_{i=1}^n f(t_{i−1})(B_{t_i} − B_{t_{i−1}}) in probability,

where ∆_n = {a = t_0 < t_1 < · · · < t_{n−1} < t_n = b} is a partition of the interval [a, b] and ‖∆_n‖ = max_{1≤i≤n}(t_i − t_{i−1}).

Proof. Since f(t, ω) is a continuous function of t ∈ [a, b] for fixed ω, ∫_a^b |f(t, ω)|^2 dt < ∞, so f ∈ L_{ad}(Ω, L^2[a, b]). We define

f_n(t, ω) = ∑_{i=1}^n f(t_{i−1}, ω) 1_{[t_{i−1}, t_i)}(t).

It is clear that

lim_{n→∞} ∫_a^b |f_n(t, ω) − f(t, ω)|^2 dt = 0,

so

∫_a^b f_n(t) dB_t → ∫_a^b f(t) dB_t in probability.

From the definition in Subsection 4.1, we have

∫_a^b f_n(t) dB_t = ∑_{i=1}^n f(t_{i−1})(B_{t_i} − B_{t_{i−1}}).  □
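A sketch of Theorem 4.6 for the standard test case f(t) = B(t) on [0, 1] (the closed form used for comparison, ∫_0^1 B dB = (B(1)^2 − 1)/2, is the Itô formula of the next subsection applied to f(x) = x^2; the mesh size is an arbitrary choice):

    # Left-endpoint Riemann sums sum f(t_{i-1}) (B_{t_i} - B_{t_{i-1}}) for f(t) = B(t),
    # compared with (B(1)^2 - 1)/2 on the same simulated path.
    import numpy as np

    rng = np.random.default_rng(6)
    n = 100000
    dB = rng.normal(scale=np.sqrt(1.0 / n), size=n)
    B = np.concatenate([[0.0], np.cumsum(dB)])

    riemann_sum = np.sum(B[:-1] * dB)
    closed_form = (B[-1]**2 - 1.0) / 2.0
    print(riemann_sum, closed_form)     # agree up to a discretization error of order n^{-1/2}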

4.5. Itô Formula.

Theorem 4.7. Let f(x) be a C^2-function. Then

f(B(t)) − f(B(a)) = ∫_a^t f′(B(s)) dB(s) + (1/2) ∫_a^t f″(B(s)) ds,

where the first integral is the stochastic integral (as defined in Subsection 4.4) and the second integral is a Riemann integral for each sample path of B(s).

Theorem 4.8. Let f(t, x) be a continuous function of (t, x) ∈ [a, b] × R such that ∂f/∂t, ∂f/∂x and ∂^2f/∂x^2 are continuous. Then

f(t, B(t)) − f(a, B(a)) = ∫_a^t f_x(s, B(s)) dB(s) + ∫_a^t ( f_t(s, B(s)) + (1/2) f_{xx}(s, B(s)) ) ds,

where the first integral is the stochastic integral (as defined in Subsection 4.4) and the second integral is a Riemann integral for each sample path of B(s).
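A discretized sanity check of Theorem 4.7 along one simulated path (the choice f(x) = x^4, the horizon, and the mesh are illustrative; both integrals are approximated on the same grid):

    # Compare f(B(T)) - f(B(0)) with the discretized right-hand side of Ito's formula.
    import numpy as np

    rng = np.random.default_rng(7)
    n, T = 100000, 1.0
    dt = T / n
    dB = rng.normal(scale=np.sqrt(dt), size=n)
    B = np.concatenate([[0.0], np.cumsum(dB)])

    f = lambda x: x**4
    fp = lambda x: 4 * x**3          # f'
    fpp = lambda x: 12 * x**2        # f''

    lhs = f(B[-1]) - f(B[0])
    rhs = np.sum(fp(B[:-1]) * dB) + 0.5 * np.sum(fpp(B[:-1])) * dt
    print(lhs, rhs)                  # close for a fine grid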


References

[1] T. Hida, Brownian Motion, Springer, 1980.
[2] H. Kuo, Gaussian Measures in Banach Spaces, Springer, 1975.
[3] H. Kuo, Introduction to Stochastic Integration, Springer, 2006.
[4] A. N. Shiryaev, Probability, G.T.M. 95, 2nd edition, Springer, 1989.