Donsker’s Theorem

Pierre Yves Gaudreau Lamarre

August 2012

Abstract

In this paper we provide a detailed proof of Donsker’s Theorem, including a review ofthe majority of the results on which the theorem is based, and we design algorithms thatprovide experimental evidence for the Classical Central Limit Theorem and that allow usto observe the effects of Donsker’s Theorem.

Acknowledgments. I would like to thank the University of Ottawa Work Study Program formaking this work term possible and Professor Raluca Balan for accepting to hire me as herassistant for the summer.

Contents

1 Weak Convergence in Metric Spaces
  1.1 Definition
  1.2 Criteria for Weak Convergence
  1.3 Prohorov's Theorem
  1.4 Convergence in Distribution
  1.5 Notes on Other Modes of Convergence

2 Spaces of Functions
  2.1 Continuous Functions on [0,1]
  2.2 Donsker's Theorem

3 Numerical Verifications
  3.1 Classical Central Limit Theorem
  3.2 Donsker's Theorem

A Appendix - Theoretical Background
  A.1 Measure Theory
  A.2 Integration
  A.3 Lebesgue Measure
  A.4 Probability

Index

1 Weak Convergence in Metric Spaces

In this section we introduce the concept of weak convergence in metric spaces, otherwise known as convergence in distribution in particular contexts, which will be fundamental to the study of central limit theorems. Let (S, ρ) be an arbitrary metric space. Given x ∈ S and ε > 0, we use B(x, ε) to denote the set {y ∈ S : ρ(x, y) < ε}, and S to denote the Borel σ-field on S. Given A ⊆ S, we use the notation ∂A for the boundary of A, A° for the interior of A, A− for the closure of A and IA for the indicator function of A.

1.1 Definition

Definition 1.1 Let (S, ρ) be a metric space and P, {Pn}n∈N be probability measures on (S, S). We say that the sequence {Pn}n∈N converges weakly to P, and write Pn −w→ P, if, for every function f : S → R that is bounded and continuous, we have

∫S f dPn → ∫S f dP.¹
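To make Definition 1.1 concrete, here is a small numerical sketch (our own illustration, separate from the experiments of Section 3): the laws Bin(n, λ/n) are classically known to converge weakly to Poisson(λ), so for a bounded continuous test function, here the arbitrary choice f(x) = 1/(1 + x²) with λ = 3, the integrals ∫S f dPn approach ∫S f dP.

```python
import math

# Hypothetical illustration of Definition 1.1 (not from the paper):
# P_n = Bin(n, lam/n) converges weakly to P = Poisson(lam), so the
# integrals of the bounded continuous f(x) = 1/(1 + x^2) converge.
lam = 3.0

def f(x):
    return 1.0 / (1.0 + x * x)   # bounded by 1 and continuous on R

def binomial_integral(n):
    """Integral of f with respect to Bin(n, lam/n)."""
    p = lam / n
    return sum(math.comb(n, k) * p**k * (1.0 - p)**(n - k) * f(k)
               for k in range(n + 1))

def poisson_integral(cutoff=200):
    """Integral of f with respect to Poisson(lam), tail truncated."""
    total, term = 0.0, math.exp(-lam)   # term = e^{-lam} lam^k / k!
    for k in range(cutoff):
        total += term * f(k)
        term *= lam / (k + 1)
    return total

target = poisson_integral()
errors = [abs(binomial_integral(n) - target) for n in (10, 100, 1000)]
print(errors)   # shrinking gap as n grows
```

Weak convergence requires this for every bounded continuous f; the script exhibits only one, so it illustrates the definition rather than verifying it.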

To ensure that the concept of weak convergence is well defined, it is necessary to verify that the limit for Pn −w→ P is unique.

Definition 1.2 Let (S, τ) be an arbitrary topological space. A probability measure P on (S, S) is regular if, for every A ∈ S and every ε > 0, there exist an open set G and a closed set F of S such that F ⊆ A ⊆ G and P(G \ F) < ε.

Proposition 1.3 ([2] Theorem 1.1) Let (S, ρ) be a metric space. Every probability measure P on (S, S) is regular.

Proposition 1.4 Let (S, ρ) be a metric space and P, P′ be probability measures on (S, S). If P(F) = P′(F) for every closed set F ∈ S, then P = P′.

Proof Let A ∈ S and ε > 0 be arbitrary. By Proposition 1.3, there exist an open set G ∈ S and a closed set F ∈ S such that F ⊆ A ⊆ G and P(G \ F) < ε. As probability measures are monotone, A ⊆ G implies that P(A) ≤ P(G), and F ⊆ A implies that −P′(A) ≤ −P′(F). We then have

P(A) − P′(A) ≤ P(G) − P′(F)
  = P(F ∪ (G \ F)) − P′(F)    (since F ⊆ G)
  = P(F) + P(G \ F) − P′(F)   (F and G \ F are disjoint)
  = P(G \ F)                  (P(F) = P′(F), F being closed)
  < ε.

Since the hypothesis is symmetric in P and P′ and Proposition 1.3 applies to P′ as well, the same argument with the roles of P and P′ exchanged gives P′(A) − P(A) < ε, hence |P(A) − P′(A)| < ε. Letting ε → 0, we obtain the equality P(A) = P′(A), and since A ∈ S was arbitrary, P = P′.

Lemma 1.5 ([2] Theorem 1.2) Let (S, ρ) be a metric space and P, P′ be probability measures on (S, S). If

∫S f dP = ∫S f dP′

for every bounded and uniformly continuous function f : S → R, then P = P′.

Proof Let F ∈ S be an arbitrary closed set. For every ε > 0, define Fε := {x ∈ S : ρ(x, F) < ε} and fε : S → R as

fε(x) = 1 − ρ(x, F)/ε  if x ∈ Fε,  and  fε(x) = 0  otherwise.

¹ Given a sequence {xn}n∈N, we write xn → x for limn→∞ xn = x to alleviate notation.

Note that fε is bounded by 1, uniformly continuous on S, and for every x ∈ S we have IF(x) ≤ fε(x) ≤ IFε(x). Thus,

P(F ) =

∫S

IF dP

≤∫S

fε dP (by monotonicity, IF ≤ fε)

=

∫S

fε dP′ (by hypothesis, fε bounded and uniformly continuous)

≤∫S

IFε dP′ (by monotonicity, fε ≤ IFε)

= P′(Fε).

As the above holds for every ε > 0, we take the limit as ε → 0 and conclude P(F) ≤ P′(F). Using a similar argument, we can prove P′(F) ≤ P(F), which implies P(F) = P′(F). The result then follows by Proposition 1.4.

With these results, we can obtain the following theorem, which establishes that a single sequence of probability measures cannot converge weakly to two different probability measures.

Theorem 1.6 Let (S, ρ) be a metric space and P, P′, {Pn}n∈N be probability measures on (S, S). If Pn −w→ P and Pn −w→ P′, then P = P′.

Proof Let f : S → R be an arbitrary bounded and continuous function. Then, by the definition of weak convergence,

∫S f dP = limn→∞ ∫S f dPn = ∫S f dP′.

Since every uniformly continuous function is continuous, we can apply Lemma 1.5 to obtain the equality P = P′.

1.2 Criteria for Weak Convergence

It is often impractical to demonstrate weak convergence for a sequence of probability measures {Pn}n∈N by taking a limit as n → ∞ of integrals with respect to dPn for an arbitrary bounded continuous function. We will now see that there are various criteria, equivalent to or implying weak convergence, that can in some cases be easier to work with.

Definition 1.7 Let (S, ρ) be a metric space and P be a probability measure on (S, S). A set A ∈ S is called a P-continuity set if P(∂A) = 0.

Theorem 1.8 (Portmanteau Theorem) ([2] Theorem 2.1) Let (S, ρ) be a metric space and P, {Pn}n∈N be probability measures on (S, S). The following conditions are equivalent.

i. Pn −w→ P

ii. ∫S f dPn → ∫S f dP for every bounded, uniformly continuous function f : S → R

iii. lim supn→∞ Pn(F) ≤ P(F) for every closed set F ∈ S

iv. lim infn→∞ Pn(G) ≥ P(G) for every open set G ∈ S

v. Pn(A) → P(A) for every P-continuity set A ∈ S
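A hypothetical worked example (our own, not from the paper) of why condition v. is restricted to P-continuity sets: take Pn to be the point mass at 1/n and P the point mass at 0 on R. Then Pn −w→ P, yet Pn(A) does not converge to P(A) for A = (0, 1], because 0 lies on the boundary of A and P(∂A) = 1, so A is not a P-continuity set. The sketch below just evaluates these measures directly on half-open intervals.

```python
# Hypothetical illustration (not from the paper): P_n = point mass at 1/n
# converges weakly to P = point mass at 0, but P_n(A) fails to converge to
# P(A) for A = (0, 1], whose boundary carries all the limit mass.

def P_n(n, a, b):          # P_n((a, b]) for the point mass at 1/n
    return 1.0 if a < 1.0 / n <= b else 0.0

def P(a, b):               # P((a, b]) for the point mass at 0
    return 1.0 if a < 0.0 <= b else 0.0

# A = (0, 1]: every P_n puts full mass in A, but the limit P puts none.
masses = [P_n(n, 0.0, 1.0) for n in (1, 10, 100, 1000)]
print(masses, P(0.0, 1.0))   # [1.0, 1.0, 1.0, 1.0] vs 0.0

# B = (-1, 1]: the boundary {-1, 1} carries no P-mass, so v. applies
# and the masses do converge.
print([P_n(n, -1.0, 1.0) for n in (1, 10, 100)], P(-1.0, 1.0))
```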

Proof

i. =⇒ ii. This is a direct consequence of the definition of weak convergence, since every uniformly continuous function is continuous.

ii. =⇒ iii. Let F ∈ S be an arbitrary closed set. Given any ε > 0, let Fε ∈ S and fε : S → R be defined as in the proof of Lemma 1.5. We then have

lim supn→∞ Pn(F) = lim supn→∞ ∫S IF dPn
  ≤ lim supn→∞ ∫S fε dPn   (by monotonicity, IF ≤ fε)
  = ∫S fε dP                (by ii., fε is bounded and uniformly continuous)
  ≤ ∫S IFε dP               (by monotonicity, fε ≤ IFε)
  = P(Fε).

Taking the limit ε → 0, we have lim supn→∞ Pn(F) ≤ P(F).

iii. ⇐⇒ iv. Suppose that iii. holds. Let G ∈ S be an arbitrary open set. Since G is open, Gᶜ is closed, and therefore, by hypothesis and the properties of inferior and superior limits, we obtain:

lim infn→∞ (Pn(G) − 1) = lim infn→∞ −(1 − Pn(G))
  = lim infn→∞ −Pn(Gᶜ)
  = − lim supn→∞ Pn(Gᶜ)
  ≥ −P(Gᶜ)      (by iii., Gᶜ is closed)
  = P(G) − 1,

which proves iv. A similar argument shows that iv. implies iii.

iv. =⇒ v. Let A be an arbitrary P-continuity set of S. We first notice that

P(A−) = P((A− \ A°) ∪ A°)   (since A° ⊆ A−)
  = P(∂A ∪ A°)
  = P(∂A) + P(A°)           (the interior and the boundary are disjoint)
  = P(A°).                  (A is a P-continuity set)

Moreover, we have A° ⊆ A ⊆ A−, which implies that P(A°) ≤ P(A) ≤ P(A−) by monotonicity, and therefore P(A°) = P(A) = P(A−). By hypothesis and the equivalence of iii. and iv.,

P(A) = P(A−)
  ≥ lim supn→∞ Pn(A−)   (by iii., since A− is closed)
  ≥ lim supn→∞ Pn(A)    (by monotonicity, since A ⊆ A−)
  ≥ lim infn→∞ Pn(A)
  ≥ lim infn→∞ Pn(A°)   (by monotonicity, since A° ⊆ A)
  ≥ P(A°)               (by iv., since A° is open)
  = P(A),

and thus

P(A) = lim supn→∞ Pn(A) = lim infn→∞ Pn(A),

which implies that Pn(A) → P(A).

v. =⇒ i. Let f : S → R be an arbitrary continuous function such that 0 < f(x) < 1 for every x ∈ S. For every t ∈ (0, 1), define the set [f > t] = {x ∈ S : f(x) > t}.

Let x ∈ [f > t]− \ [f > t]. We will prove f(x) = t. The fact that x ∉ [f > t] implies that f(x) ≤ t, and x ∈ [f > t]− implies that there exists a sequence {xn}n∈N ⊆ [f > t] such that xn → x. Suppose f(x) = k < t. Then there exists some ε > 0, namely ε = t − k, such that, for every n ∈ N,

f(xn) − f(x) = f(xn) − k
  > t − k    (since xn ∈ [f > t] for every n ∈ N)
  = ε.

The above implies that f(xn) does not converge to f(x), which contradicts the fact that f is continuous. Thus, it is necessary that f(x) = t. From here, it follows that

[f > t]− ⊆ {x ∈ S : f(x) ≥ t} = [f ≥ t]. (1)

Let x ∈ [f > t], and k := f(x) > t. Since f is continuous, for ε = k − t > 0 there exists δ > 0 such that for any y ∈ B(x, δ), we have

|f(x) − f(y)| = |k − f(y)| < ε = k − t.

This implies that f(y) > t for every y ∈ B(x, δ). Thus, B(x, δ) ⊆ [f > t], which implies that [f > t] is open since x was arbitrary. We then have

[f > t]° = [f > t]. (2)

We can now show the following with regards to the boundary of [f > t]:

∂[f > t] = [f > t]− \ [f > t]°
  ⊆ [f ≥ t] \ [f > t]°   (by (1))
  = [f ≥ t] \ [f > t]    (by (2))
  = {x ∈ S : f(x) = t}.

By Corollary A.17, f is a random variable; thus, for every probability measure Q on S, it has a distribution Q ∘ f⁻¹ with associated distribution function FQf. Therefore,

Q(∂[f > t]) ≤ Q({x ∈ S : f(x) = t}) = FQf(t) − limx↑t FQf(x),

which, by Proposition A.45, is nonzero for at most countably many t. Applying this to P, we have that the sets of the form [f > t] are P-continuity sets except for at most countably many t, and consequently, by condition v., Pn([f > t]) → P([f > t]) except for at most countably many t. We then have

limn→∞ ∫S f dPn = limn→∞ ∫[0,∞) Pn([f > t]) dt   (by Proposition A.56)
  = limn→∞ ∫(0,1) Pn([f > t]) dt   (since 0 < f(x) < 1)
  = ∫(0,1) P([f > t]) dt           (by the bounded convergence theorem)
  = ∫S f dP.                       (by Proposition A.56)

To see how the above equality generalizes to every bounded continuous function, we notice that for every continuous function f : S → R that is bounded (i.e. there exists some M ∈ R such that −M < f(x) < M for every x ∈ S), the function g : S → R defined as g(x) = (f(x) + M)/(2M) is such that 0 < g(x) < 1 for every x ∈ S. The conclusion follows by the linearity of integrals.

Let (S, ρ) and (S′, ρ′) be metric spaces, P be a probability measure on (S, S) and h : S → S′ be S/S′-measurable. Then², P ∘ h⁻¹ is a probability measure on (S′, S′)³. Given a sequence of probability measures {Pn}n∈N on (S, S), the following result gives sufficient conditions on h for Pn −w→ P to imply Pn ∘ h⁻¹ −w→ P ∘ h⁻¹.

Theorem 1.9 (Continuous Mapping Theorem) ([2] Theorem 2.7) Let (S, ρ) and (S′, ρ′) be metric spaces, P, {Pn}n∈N be probability measures on (S, S) and h : S → S′ be an arbitrary S/S′-measurable function. Let Dh ⊆ S be the set of points at which h is discontinuous. If P(Dh) = 0 and Pn −w→ P, then Pn ∘ h⁻¹ −w→ P ∘ h⁻¹.

Proof Let F ∈ S′ be an arbitrary closed set. If x is a point in (h⁻¹(F))−, we know there exists a sequence {xn}n∈N ⊆ h⁻¹(F) such that xn → x. Suppose x ∉ Dh. Then h is continuous at x, which implies that h(xn) → h(x), and since h(xn) ∈ F for every n ∈ N, h(x) is a limit point of F; hence h(x) ∈ F−. Consequently, x ∈ Dhᶜ ∩ (h⁻¹(F))− implies that x ∈ h⁻¹(F−). This proves that Dhᶜ ∩ (h⁻¹(F))− ⊆ h⁻¹(F−). We then have

lim supn→∞ Pn ∘ h⁻¹(F) = lim supn→∞ Pn(h⁻¹(F))
  ≤ lim supn→∞ Pn((h⁻¹(F))−)   (since h⁻¹(F) ⊆ (h⁻¹(F))−)
  ≤ P((h⁻¹(F))−)               (Portmanteau theorem, (h⁻¹(F))− is closed)
  = P(Dhᶜ ∩ (h⁻¹(F))−)         (Proposition A.37 b., since P(Dhᶜ) = 1)
  ≤ P(h⁻¹(F−))                 (since Dhᶜ ∩ (h⁻¹(F))− ⊆ h⁻¹(F−))
  = P ∘ h⁻¹(F)                 (F is closed, hence equal to its closure).

By the Portmanteau theorem, it follows that Pn ∘ h⁻¹ −w→ P ∘ h⁻¹.
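A hedged numerical sketch of Theorem 1.9 (our own construction, not the paper's): with Xn a normalized sum of uniforms, Xn converges in distribution to N(0, 1) by the CLT, and since h(x) = x² is continuous everywhere (Dh is empty), the theorem predicts that h(Xn) converges in distribution to the chi-square law with one degree of freedom, whose distribution function is F(x) = erf(√(x/2)). We compare an empirical CDF against F at a few points.

```python
import bisect
import math
import random

# Hypothetical sketch of the continuous mapping theorem: X_n => N(0,1)
# (normalized uniform sums), h(x) = x^2 continuous everywhere, hence
# h(X_n) => chi-square(1), whose CDF is erf(sqrt(x / 2)).
random.seed(0)
n, m = 50, 20000   # terms per sum, Monte Carlo replications

def x_n():
    s = sum(random.random() for _ in range(n))    # Uniform(0,1) summands
    return (s - 0.5 * n) / math.sqrt(n / 12.0)    # centred and scaled

samples = sorted(x_n() ** 2 for _ in range(m))    # draws of h(X_n)

def ecdf(x):
    return bisect.bisect_right(samples, x) / m

worst = max(abs(ecdf(x) - math.erf(math.sqrt(x / 2.0)))
            for x in (0.25, 0.5, 1.0, 2.0, 4.0))
print(worst)   # small gap between empirical and chi-square(1) CDF
```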

In the above result, no mention was made of conditions on h for Dh to be measurable. We now see that this is always the case. Let (S, ρ) and (S′, ρ′) be metric spaces and h : S → S′ be an arbitrary measurable function. Given ε > 0 and δ > 0, define the set

Aε,δ = {x ∈ S : ∃ y, z ∈ S such that ρ(x, y) < δ, ρ(x, z) < δ and ρ′(h(y), h(z)) ≥ ε}.

Lemma 1.10 For every ε > 0 and δ > 0, Aε,δ is open.

Proof Let ε > 0 and δ > 0 be arbitrary. Given x ∈ Aε,δ, there exist y, z ∈ S such that ρ(x, y) < δ, ρ(x, z) < δ and ρ′(h(y), h(z)) ≥ ε. Let η = min{δ − ρ(x, y), δ − ρ(x, z)}. We show that B(x, η) ⊆ Aε,δ. Let x′ be in B(x, η). By the triangle inequality, we have

ρ(x′, y) ≤ ρ(x′, x) + ρ(x, y) < η + ρ(x, y) ≤ δ − ρ(x, y) + ρ(x, y) = δ,

and similarly,

ρ(x′, z) ≤ ρ(x′, x) + ρ(x, z) < η + ρ(x, z) ≤ δ − ρ(x, z) + ρ(x, z) = δ.

² Where ∘ denotes function composition.
³ See Proposition A.15 and Remark A.43.

Hence, there exists y, z ∈ S such that ρ(x′, y) < δ, ρ(x′, z) < δ and ρ′(h(y), h(z)) ≥ ε, whichimplies that x′ ∈ Aε,δ. As x′ ∈ B(x, η) was arbitrary, we conclude B(x, η) ⊆ Aε,δ. Since x, εand δ were also arbitrary, it follows that, for every ε > 0 and δ > 0, for every x ∈ Aε,δ, thereexists an open ball B(x, η) contained in Aε,δ, from which the result follows.

Proposition 1.11 For every measurable function h : S → S′, Dh is a measurable set.

Proof By the definition of continuity on metric spaces, h is discontinuous at x ∈ S if and only if there exists some ε > 0 such that for every δ > 0, x ∈ Aε,δ. Hence

Dh = ⋃ε>0 (⋂δ>0 Aε,δ).

Since the positive rational numbers are dense in the positive real numbers, we see that

Dh = ⋃ε>0 (⋂δ>0 Aε,δ) = ⋃ε∈Q+ (⋂δ∈Q+ Aε,δ),

where Q+ = {x ∈ Q : x > 0}. From Lemma 1.10, we have that every Aε,δ is open, hence an element of S. It then follows from Proposition A.4 and the definition of σ-fields that Dh ∈ S.

Theorem 1.12 ([2] Theorem 2.6) Let (S, ρ) be a metric space and P, {Pn}n∈N be probability measures on (S, S). Then Pn −w→ P if and only if every subsequence {Pni}i∈N contains a further subsequence {Pnij}j∈N such that Pnij −w→ P.

Proof The fact that Pn −w→ P implies Pnij −w→ P is obvious from the definition of weak convergence.

We now show the other implication by contraposition. Suppose Pn does not converge weakly to P. Consequently, there exists a bounded and continuous function f : S → R such that ∫S f dPn does not converge to ∫S f dP. In other words, there exists ε > 0 such that, for every N ∈ N, there is some n ≥ N such that

|∫S f dPn − ∫S f dP| > ε.

We can therefore find an increasing sequence of indices n1 < n2 < ... such that the subsequence {Pni}i∈N has the following property:

|∫S f dPni − ∫S f dP| > ε

for every i ∈ N. Clearly, no further subsequence of {Pni}i∈N converges weakly to P.

Theorem 1.13 Given an integer k ≥ 2, let (S1, S1), ..., (Sk, Sk) be measurable metric spaces⁴, (S1 × ... × Sk, S1 × ... × Sk) be the product space on S1, ..., Sk and, for every i ≤ k, let P^i, {P^i_n}n∈N be probability measures on (Si, Si). If S1 × ... × Sk is separable, then the sequence of product measures {P^1_n × ... × P^k_n}n∈N converges weakly to the product measure P^1 × ... × P^k if and only if P^i_n −w→ P^i for every i ≤ k.

⁴ For every i ≤ k, Si denotes the Borel σ-field on Si.

Proof We proceed by induction.

The case k = 2 follows from [2] Theorem 2.8 (ii).

Suppose the result holds for some k = m ≥ 2. We prove it holds for k = m + 1. Suppose S1 × ... × Sm+1 = (S1 × ... × Sm) × Sm+1 is separable. From the case k = 2, we know that (P^1_n × ... × P^m_n) × P^{m+1}_n −w→ (P^1 × ... × P^m) × P^{m+1} if and only if

P^1_n × ... × P^m_n −w→ P^1 × ... × P^m  and  P^{m+1}_n −w→ P^{m+1}. (3)

From [2] Appendix M10, a product of two metric spaces S1 × S2 is separable if and only if S1 and S2 are separable. Consequently, the fact that (S1 × ... × Sm) × Sm+1 is separable implies that S1 × ... × Sm is separable. Since the result holds for k = m, we then have that P^1_n × ... × P^m_n −w→ P^1 × ... × P^m if and only if P^i_n −w→ P^i for every i ≤ m. The result follows by combining the last remark with (3).

1.3 Prohorov’s Theorem

Definition 1.14 A set K of probability measures on (S, S) is relatively compact if for every sequence of probability measures {Pn}n∈N ⊆ K, there exists a subsequence of {Pn}n∈N that converges weakly to some probability measure P on (S, S).

Definition 1.15 A probability measure P on (S, S) is tight if, for every ε > 0, there exists a compact set K ∈ S such that P(K) > 1 − ε. Similarly, a set K of probability measures on (S, S) is tight if for every ε > 0, there exists a compact set K ∈ S such that P(K) > 1 − ε for every P ∈ K.
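As a hypothetical check of Definition 1.15 (our own numbers, not from the paper): the family {N(0, s²) : 0 < s ≤ 1} is tight, because a single compact interval K = [−c, c] can carry mass at least 1 − ε under every member. The Gaussian mass of [−c, c] under N(0, s²) is erf(c/(s√2)), which is smallest at s = 1 within this family.

```python
import math

# Hypothetical tightness check for {N(0, s^2) : 0 < s <= 1}: the compact
# set K = [-c, c] has mass P_s(K) = erf(c / (s * sqrt(2))), minimized over
# the family at s = 1, so one compact set works uniformly.

def normal_mass(c, s):
    return math.erf(c / (s * math.sqrt(2.0)))   # P_s([-c, c])

eps = 1e-3
c = 3.3 * math.sqrt(2.0)                        # candidate half-width
worst = min(normal_mass(c, s) for s in (0.1, 0.25, 0.5, 0.75, 1.0))
print(worst > 1.0 - eps)   # True: the whole family fits in one compact set
```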

Theorem 1.16 (Prohorov's Theorem, Direct Part) ([2] Theorem 5.1) Let K be a collection of probability measures on (S, S). If K is tight, then K is relatively compact.

Theorem 1.17 (Prohorov's Theorem, Converse Part) ([2] Theorem 5.2) Let K be a collection of probability measures on (S, S). If (S, ρ) is a separable and complete metric space and K is relatively compact, then K is tight.

Proof Let {Gn}n∈N be a sequence of open sets such that Gi ⊆ Gj if i ≤ j and Gn → S. We first prove that, for every ε > 0, there exists N ∈ N such that for every P ∈ K, if n ≥ N, then

P(Gn) > 1 − ε. (4)

To show this, suppose by contradiction that there exists some ε > 0 such that, for every n ∈ N, there exists Pn ∈ K such that Pn(Gn) ≤ 1 − ε. Since K is relatively compact, there exists a subsequence {Pni}i∈N of {Pn}n∈N that converges weakly to some probability measure P on (S, S). Since every Gn is open, it follows from the Portmanteau theorem that, for every n ∈ N,

P(Gn) ≤ lim infi→∞ Pni(Gn)
  ≤ lim infi→∞ Pni(Gni)   (by monotonicity, Gi ⊆ Gj for i ≤ j)
  ≤ 1 − ε,

which contradicts the fact that Gn → S, by Proposition A.10 c.

Since S is separable, for every k > 0 there exists a countable set {Ak,n}n∈N of open balls of radius 1/k that covers S. Applying (4), we know that for every k > 0 and every ε > 0, there exists n_k ∈ N such that, for any P ∈ K, if n ≥ n_k, then

P(⋃_{i≤n} Ak,i) > 1 − ε/2^k. (5)

Define the set

K := ⋂_{k=1}^{∞} (⋃_{i≤n_k} Ak,i)−.

For every k > 0, the set ⋃_{i≤n_k} Ak,i is obviously totally bounded, hence so is its closure, and since S is complete, (⋃_{i≤n_k} Ak,i)− is compact⁵. Since every intersection of compact sets is compact⁶, K is compact. Moreover, for every probability measure P ∈ K,

P(K) = 1 − P(Kᶜ)
  = 1 − P((⋂_{k=1}^{∞} (⋃_{i≤n_k} Ak,i)−)ᶜ)
  = 1 − P(⋃_{k=1}^{∞} ((⋃_{i≤n_k} Ak,i)−)ᶜ)     (De Morgan's law)
  ≥ 1 − Σ_{k=1}^{∞} P(((⋃_{i≤n_k} Ak,i)−)ᶜ)     (countable subadditivity)
  = 1 − Σ_{k=1}^{∞} [1 − P((⋃_{i≤n_k} Ak,i)−)]
  ≥ 1 − Σ_{k=1}^{∞} [1 − P(⋃_{i≤n_k} Ak,i)]     (by monotonicity, the closure contains the set)
  > 1 − Σ_{k=1}^{∞} ε/2^k                       (by (5))
  = 1 − ε                                       (geometric series),

which proves that K is tight.

Remark 1.18 An obvious corollary to the converse part of Prohorov's Theorem is that, if (S, ρ) is a complete and separable metric space, then every probability measure on (S, S) is tight. Indeed, for every probability measure P, the singleton {P} is relatively compact, since its only sequence P, P, ... converges weakly to P.

1.4 Convergence in Distribution

We now look at the particular case of weak convergence for the distributions of random elements.

Definition 1.19 Let (Ω, F, P), {(Ωn, Fn, Pn)}n∈N be probability spaces, (S, ρ) be a metric space and X : Ω → S, {Xn : Ωn → S}n∈N be random elements with respective distributions µX, {µXn}n∈N. We say {Xn}n∈N converges in distribution to X, and write Xn −d→ X, if µXn −w→ µX.

We can rewrite the Portmanteau and continuous mapping theorems in the present contextand directly obtain the following modifications of the results.

Definition 1.20 Let (Ω, F, P) be a probability space, (S, ρ) be a metric space and X : Ω → S be a random element with distribution µX. A set A ∈ S is called an X-continuity set if µX(∂A) = 0.

⁵ See [4] Theorem 92.
⁶ See [5] Corollary 3.10.

Theorem 1.21 (Portmanteau Theorem) Let (Ω, F, P), {(Ωn, Fn, Pn)}n∈N be probability spaces, (S, ρ) be a metric space and X : Ω → S, {Xn : Ωn → S}n∈N be random elements with respective distributions µX, {µXn}n∈N. The following conditions are equivalent.

i. Xn −d→ X

ii. E[f(Xn)] → E[f(X)] for every function f : S → R that is bounded and uniformly continuous

iii. lim supn→∞ µXn(F) ≤ µX(F) for every closed set F ∈ S

iv. lim infn→∞ µXn(G) ≥ µX(G) for every open set G ∈ S

v. µXn(A) → µX(A) for every X-continuity set A ∈ S

Let (Ω, F, P) be a probability space and (S, S) and (S′, S′) be measurable spaces. By Theorem A.14, we know that, given a random element X : Ω → S and a S/S′-measurable mapping h : S → S′, the composition h(X) : Ω → S′ is a random element.

Theorem 1.22 (Continuous Mapping Theorem) Let (Ω, F, P), {(Ωn, Fn, Pn)}n∈N be probability spaces, (S, S) and (S′, S′) be measurable spaces, X : Ω → S, {Xn : Ωn → S}n∈N be random elements, h : S → S′ be a S/S′-measurable mapping and Dh ⊆ S be the set of points at which h is discontinuous. If Xn −d→ X and P(X ∈ Dh) = 0⁷, then h(Xn) −d→ h(X).

Definition 1.23 Let (Ω, F, P) and (Ω′, F′, P′) be probability spaces, (S, S) be a measurable metric space and X : Ω → S and Y : Ω′ → S be random elements. We say X and Y are equal in distribution, and write X =d Y, if µX = µY, where µX and µY are the distributions of X and Y respectively.

Proposition 1.24 Let (Ω, F, P), {(Ωn, Fn, Pn)}n∈N and {(Ω′n, F′n, P′n)}n∈N be probability spaces, (S, S) be a measurable metric space and X : Ω → S, {Xn : Ωn → S}n∈N and {Yn : Ω′n → S}n∈N be random elements. If Xn −d→ X and Xn =d Yn for every n ∈ N, then Yn −d→ X.

Proof Let f : S → R be an arbitrary bounded and continuous function. Then,

limn→∞ ∫S f dµYn = limn→∞ ∫S f dµXn   (since Xn =d Yn for every n ∈ N)
  = ∫S f dµX,

which proves Yn −d→ X.

1.5 Notes on Other Modes of Convergence

Definition 1.25 Let (Ω, F, P) be a probability space, (S, ρ) be a metric space and X, {Xn}n∈N be random elements on Ω. We say {Xn}n∈N converges in probability to X, and write Xn −p→ X, if, for every ε > 0,

P(ρ(Xn, X) < ε) → 1.
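A minimal sketch of Definition 1.25 (our own illustration, with arbitrary choices): taking Xn = X + Zn/n for standard normal noise Zn, we have ρ(Xn, X) = |Zn|/n, and the probability P(ρ(Xn, X) < ε) climbs to 1 as n grows, for any fixed ε > 0.

```python
import random

# Hypothetical illustration of convergence in probability: with
# X_n = X + Z_n / n and Z_n standard normal, rho(X_n, X) = |Z_n| / n,
# so P(rho(X_n, X) < eps) -> 1 for every fixed eps > 0.
random.seed(2)
m, eps = 10000, 0.05

def hit_rate(n):
    """Monte Carlo estimate of P(|Z| / n < eps)."""
    return sum(abs(random.gauss(0.0, 1.0)) / n < eps for _ in range(m)) / m

rates = [hit_rate(n) for n in (1, 10, 100)]
print(rates)   # increasing toward 1
```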

Proposition 1.26 ([2] Theorem 3.1) Let X, {Xn}n∈N and {Yn}n∈N be random elements on a probability space (Ω, F, P) taking values in (S, S). If Xn −d→ X and ρ(Xn, Yn) −p→ 0, then Yn −d→ X.

⁷ See Definition A.42.

Proof For an arbitrary closed set F ⊆ S and ε > 0, let Fε = {x ∈ S : ρ(x, F) ≤ ε}. For every n ∈ N, if ω ∈ Ω is such that Yn(ω) ∈ F and Xn(ω) ∉ Fε, then ρ(Yn(ω), Xn(ω)) > ε. Therefore,

{ω ∈ Ω : Yn(ω) ∈ F} \ {ω ∈ Ω : Xn(ω) ∈ Fε} ⊆ {ω ∈ Ω : ρ(Yn(ω), Xn(ω)) > ε}.

By monotonicity and Proposition A.10 e., we then have

P(Yn ∈ F) ≤ P(ρ(Yn, Xn) > ε) + P(Xn ∈ Fε).

By hypothesis and the Portmanteau Theorem, since Fε is closed, we have

lim supn→∞ P(Yn ∈ F) ≤ 0 + lim supn→∞ P(Xn ∈ Fε) ≤ P(X ∈ Fε).

Letting ε → 0, we have

lim supn→∞ P(Yn ∈ F) ≤ P(X ∈ F)

for every closed set F, which proves Yn −d→ X by the Portmanteau Theorem.

Theorem 1.27 (Slutsky's Theorem) ([10] Theorem 6.1) Let {Xn}n∈N, {Yn}n∈N and X be random variables on a probability space (Ω, F, P). If Xn −d→ X and Yn −p→ c, where c ∈ R, then

a. Xn + Yn −d→ X + c

b. XnYn −d→ cX
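A small simulation sketch of Slutsky's Theorem (our own, with arbitrary parameter choices, separate from Section 3): Xn −d→ N(0, 1) via normalized fair-coin sums and Yn −p→ 2, so for large n the law of Xn + Yn should be close to N(2, 1) and that of XnYn close to N(0, 4).

```python
import math
import random

# Hypothetical Slutsky simulation: X_n => N(0,1) (normalized coin flips),
# Y_n -> 2 in probability, so X_n + Y_n is roughly N(2, 1) and
# X_n * Y_n roughly N(0, 4).
random.seed(1)
n, m = 100, 10000   # terms per X_n, Monte Carlo replications

def sample_pair():
    s = sum(random.random() < 0.5 for _ in range(n))   # fair-coin sum
    x = (s - 0.5 * n) / math.sqrt(0.25 * n)            # X_n, roughly N(0,1)
    y = 2.0 + random.gauss(0.0, 1.0) / n               # Y_n, near the constant 2
    return x, y

pairs = [sample_pair() for _ in range(m)]
mean_sum = sum(x + y for x, y in pairs) / m           # should be near 2
var_prod = sum((x * y) ** 2 for x, y in pairs) / m    # should be near 4
print(mean_sum, var_prod)
```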

2 Spaces of Functions

2.1 Continuous Functions on [0,1]

Let C be the set of all continuous functions x : [0, 1] → R, where [0, 1] carries the standard Euclidean metric d(t1, t2) = |t1 − t2|. As C is a real vector space, we can define a norm on its elements from which we can obtain a metric. Given x ∈ C, let

‖x‖ := sup_{t∈[0,1]} |x(t)|.

Then C together with the distance ρ(x1, x2) = ‖x1 − x2‖ is a metric space. Note that, throughout this section, we will denote by C the Borel σ-field on C, and by Bk the Borel σ-field on Rk for every k ∈ N.

Proposition 2.1 The space (C, ρ) is complete.

Proof Let {xn}n∈N be an arbitrary Cauchy sequence of functions in C. Thus, for every ε > 0, there exists N ∈ N such that if n, m ≥ N, then ‖xm − xn‖ < ε. In particular, for every t ∈ [0, 1],

|xn(t) − xm(t)| ≤ sup_{t∈[0,1]} |xn(t) − xm(t)| = ‖xn − xm‖ < ε,

which implies that {xn(t)}n∈N is a Cauchy sequence of real numbers. Since R with the standard Euclidean metric | · | is a complete metric space, we know that for every t ∈ [0, 1], the sequence {xn(t)}n∈N has a limit xt ∈ R.

Define the function x : [0, 1] → R as x(t) = xt for every t ∈ [0, 1]. We show that xn → x in the metric ‖ · ‖ and that x ∈ C. Since {xn}n∈N is a Cauchy sequence, for every ε > 0, there exists N ∈ N such that if n, m ≥ N, then ‖xn − xm‖ < ε. Thus, for every t ∈ [0, 1], we have |xn(t) − xm(t)| < ε, which implies that

|xn(t) − x(t)| = limm→∞ |xn(t) − xm(t)| ≤ ε.

Since this is true for every t ∈ [0, 1], we have ‖xn − x‖ ≤ ε, which proves xn → x in C. The fact that x ∈ C follows from a classical result in functional analysis⁸. Consequently, every Cauchy sequence of functions in C has a limit in C, which proves (C, ‖ · ‖) is complete.

Proposition 2.2 The space (C, ρ) is separable.

Proof For every k ∈ N, let Qk ⊆ C be the set of polynomials of degree k with rational coefficients. Then, for every k ∈ N, we have |Qk| ≤ |Q^{k+1}| (where | · | denotes cardinality), which implies that

Q := ⋃_{k∈N} Qk

is a countable set, since it is a countable union of countable sets.

Let p(t) = a0 + a1 t + ... + ak t^k be a polynomial of degree k with real coefficients. Since Q is dense in R, for every ε > 0 and i ≤ k, there exists bi ∈ Q such that |ai − bi| < ε/(2(k + 1)). If we define q(t) = b0 + b1 t + ... + bk t^k, then for every t ∈ [0, 1] we have

|p(t) − q(t)| = |Σ_{i=0}^{k} (ai − bi) t^i|
  ≤ Σ_{i=0}^{k} |ai − bi|          (since |t^i| ≤ 1 for t ∈ [0, 1])
  < Σ_{i=0}^{k} ε/(2(k + 1)) = ε/2.

⁸ See [5] Theorem 5.10 (the uniform limit of continuous functions is continuous).

Since the above holds for every t ∈ [0, 1] and k ∈ N, we have that, for every polynomial p ∈ C with real coefficients and every ε > 0, there exists a polynomial q ∈ Q with rational coefficients such that ρ(p, q) = ‖p − q‖ ≤ ε/2. Furthermore, we know from the Weierstrass approximation Theorem⁹ that, given any x ∈ C, for every ε > 0, there exists a polynomial p ∈ C with real coefficients such that ρ(x, p) < ε/2. Hence, for every x ∈ C and ε > 0, there exist a polynomial p ∈ C with real coefficients and a polynomial q ∈ Q with rational coefficients such that ρ(x, q) ≤ ρ(x, p) + ρ(p, q) ≤ ε. This proves that Q is dense in C.

Consequently, C has a countable dense set, which makes it separable.
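The proof above invokes the abstract Weierstrass theorem; as a constructive companion (our own sketch, using Bernstein polynomials, which the paper does not mention), one can exhibit an explicit polynomial that is sup-norm-close to a given x ∈ C.

```python
import math

# Hypothetical constructive companion to Proposition 2.2: the degree-n
# Bernstein polynomial of x is an explicit element of C that is close to
# x in the sup norm, the metric of (C, rho).

def bernstein(x, n):
    """Degree-n Bernstein approximation of x on [0, 1]."""
    coeffs = [x(k / n) for k in range(n + 1)]
    def p(t):
        return sum(c * math.comb(n, k) * t**k * (1.0 - t)**(n - k)
                   for k, c in enumerate(coeffs))
    return p

x = lambda t: abs(t - 0.5)             # continuous but not differentiable
p = bernstein(x, 400)
grid = [i / 200.0 for i in range(201)]
sup_err = max(abs(x(t) - p(t)) for t in grid)
print(sup_err)   # small sup-norm distance
```

Rounding the sampled coefficients x(k/n) to nearby rationals, as in the proof, would then land in the countable dense set Q.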

Corollary 2.3 Every probability measure on (C, ρ) is tight.

Proof This follows from Proposition 2.1, Proposition 2.2 and Remark 1.18.

Some useful criteria for convergence of probability measures on (C, C) involve mappings that project functions of C into Euclidean space.

Definition 2.4 For every integer k ≥ 1, define the set

Uk = {(t1, ..., tk) ∈ [0, 1]^k : t1 < ... < tk}.

Definition 2.5 Given k ≥ 1 and T = (t1, ..., tk) ∈ Uk, define the projection mapping, denoted πT : C → Rk, as

πT(x) := (x(t1), ..., x(tk))

for every x ∈ C.

Proposition 2.6 For every integer k ≥ 1 and T = (t1, ..., tk) ∈ Uk, the projection πT is continuous everywhere on C with respect to the Euclidean ℓ2 metric (hence C/Bk-measurable¹⁰).

Proof For every x1 ∈ C and ε > 0, there exists δ > 0, namely δ = ε/√k, such that if x2 ∈ B(x1, δ), then

‖πT(x1) − πT(x2)‖ℓ2 = (Σ_{i=1}^{k} |x1(ti) − x2(ti)|²)^{1/2}
  ≤ (Σ_{i=1}^{k} sup_{t∈[0,1]} |x1(t) − x2(t)|²)^{1/2}
  < √k · δ = ε.

Definition 2.7 A set A ⊆ C is called finite-dimensional if there exist T ∈ Uk for some k ≥ 1 and H ∈ Bk such that πT⁻¹(H) = A. Let Cf be the class of all finite-dimensional sets.

Remark 2.8 We notice that Cf ⊆ C. To see this, let A ∈ Cf. Then there exist T ∈ Uk with k ≥ 1 and H ∈ Bk such that πT⁻¹(H) = A. From Proposition 2.6, we know πT is C/Bk-measurable, which by definition implies that A ∈ C.

Proposition 2.9 ([2] p. 12) Let P, P′ be probability measures on (C, C). If P(A) = P′(A) for every A ∈ Cf, then P = P′.

⁹ See [5] Theorem 5.27.
¹⁰ See Corollary A.17.

Proof We first prove that Cf is a π-system. Let A, B ∈ Cf be arbitrary. Then we know there exist HA ∈ Bk and HB ∈ Bl, where k, l ≥ 1, and T = (t1, ..., tk) ∈ Uk, S = (s1, ..., sl) ∈ Ul such that A = πT⁻¹(HA) and B = πS⁻¹(HB). If we set U = (t1, ..., tk, s1, ..., sl), we then have

A ∩ B = πT⁻¹(HA) ∩ πS⁻¹(HB)
  = {x ∈ C : (x(t1), ..., x(tk)) ∈ HA and (x(s1), ..., x(sl)) ∈ HB}
  = {x ∈ C : (x(t1), ..., x(tk), x(s1), ..., x(sl)) ∈ HA × HB}
  = πU⁻¹(HA × HB).

As Bk+l = σ({A × B : A ∈ Bk and B ∈ Bl})¹¹, we have that HA × HB ∈ Bk+l, which implies that A ∩ B ∈ Cf. Since A and B were arbitrary, Cf is a π-system.

We now prove σ(Cf) = C. From Remark 2.8, we have that σ(Cf) ⊆ C, since σ(Cf) is the intersection of every σ-field that contains Cf. We now show that B(x, ε) ∈ σ(Cf) for any x ∈ C and ε > 0. Given arbitrary x ∈ C and ε > 0, from the definition of the norm ρ on C, we have

B(x, ε) = ⋂_{t∈[0,1]} {x′ ∈ C : |x(t) − x′(t)| < ε}.

Since Q is dense in R,

B(x, ε) = ⋂_{t∈Q∩[0,1]} {x′ ∈ C : |x(t) − x′(t)| < ε}.

We notice that, for any t ∈ Q ∩ [0, 1], πt⁻¹(B(x(t), ε)) ∈ Cf, since B(x(t), ε) ∈ B1. Also, by definition,

πt⁻¹(B(x(t), ε)) = {x′ ∈ C : πt(x′) ∈ B(x(t), ε)}
  = {x′ ∈ C : x′(t) ∈ B(x(t), ε)}
  = {x′ ∈ C : |x(t) − x′(t)| < ε}.

As σ-fields are closed under countable intersections¹², B(x, ε) ∈ σ(Cf); hence σ(Cf) contains every open ball in C. Furthermore, since C is separable by Proposition 2.2, every open set of C is the union of at most countably many open balls; hence every open set of C is in σ(Cf). Since C is the intersection of every σ-field containing all the open sets of C, we have C ⊆ σ(Cf). Since C ⊆ σ(Cf) and σ(Cf) ⊆ C, we conclude σ(Cf) = C. The equality P = P′ then follows from Theorem A.12.

We know from Proposition 2.6 that every projection mapping $\pi_T$ with $T \in U^k$ is $\mathcal{C}/\mathcal{B}^k$-measurable, and thus, for every probability measure $\mathbb{P}$ on $(C,\mathcal{C})$, we can define a probability measure $\mathbb{P}\circ\pi_T^{-1}$ on $(\mathbb{R}^k,\mathcal{B}^k)$¹³.

Theorem 2.10 ([2] Example 5.1) Let $\mathbb{P}, \{\mathbb{P}_n\}_{n\in\mathbb{N}}$ be probability measures on $(C,\mathcal{C})$. If for every $k\geq1$ and every $T\in U^k$ we have that $\mathbb{P}_n\circ\pi_T^{-1} \xrightarrow{w} \mathbb{P}\circ\pi_T^{-1}$ and $\{\mathbb{P}_n\}_{n\in\mathbb{N}}$ is relatively compact, then $\mathbb{P}_n \xrightarrow{w} \mathbb{P}$.

Proof Since the set $\{\mathbb{P}_n\}_{n\in\mathbb{N}}$ is relatively compact, every sequence of elements in that set contains a weakly convergent subsequence. In particular, for every subsequence $\{\mathbb{P}_{n_i}\}_{i\in\mathbb{N}}$ of $\{\mathbb{P}_n\}$, there exists a further subsequence $\{\mathbb{P}_{n_{i_j}}\}_{j\in\mathbb{N}}$ such that $\mathbb{P}_{n_{i_j}} \xrightarrow{w} \mathbb{P}'$ for some probability measure $\mathbb{P}'$ on $(C,\mathcal{C})$. For every $k\geq1$ and $T\in U^k$, it follows from the Continuous Mapping Theorem and Proposition 2.6 that $\mathbb{P}_{n_{i_j}}\circ\pi_T^{-1} \xrightarrow{w} \mathbb{P}'\circ\pi_T^{-1}$. By Theorem 1.6, we then have that $\mathbb{P}\circ\pi_T^{-1} = \mathbb{P}'\circ\pi_T^{-1}$ for every projection mapping $\pi_T$, as $\mathbb{P}_n\circ\pi_T^{-1} \xrightarrow{w} \mathbb{P}\circ\pi_T^{-1}$ obviously implies that $\mathbb{P}_{n_{i_j}}\circ\pi_T^{-1} \xrightarrow{w} \mathbb{P}\circ\pi_T^{-1}$.

Given an arbitrary $A\in\mathcal{C}_f$, we know there exist $T\in U^k$ for some $k\geq1$ and $H\in\mathcal{B}^k$ such that $A = \pi_T^{-1}(H)$. We then have

¹¹ See [6] Proposition 5.3.
¹² See Proposition A.4.
¹³ See Proposition A.15 and Remark A.43.


$$\mathbb{P}(A) = \mathbb{P}\circ\pi_T^{-1}(H) = \mathbb{P}'\circ\pi_T^{-1}(H) = \mathbb{P}'(A).$$

It then follows from Proposition 2.9 that $\mathbb{P} = \mathbb{P}'$.

We have just shown that, for every subsequence $\{\mathbb{P}_{n_i}\}_{i\in\mathbb{N}}$ of $\{\mathbb{P}_n\}$, there exists a further subsequence $\{\mathbb{P}_{n_{i_j}}\}_{j\in\mathbb{N}}$ that converges weakly to $\mathbb{P}$, which, by Theorem 1.12, implies that $\mathbb{P}_n \xrightarrow{w} \mathbb{P}$.

Corollary 2.11 ([2] Theorem 7.1) Let $\mathbb{P}, \{\mathbb{P}_n\}_{n\in\mathbb{N}}$ be probability measures on $(C,\mathcal{C})$. If $\{\mathbb{P}_n\}_{n\in\mathbb{N}}$ is tight and $\mathbb{P}_n\circ\pi_T^{-1} \xrightarrow{w} \mathbb{P}\circ\pi_T^{-1}$ for every integer $k\geq1$ and every $T\in U^k$, then $\mathbb{P}_n \xrightarrow{w} \mathbb{P}$.

Proof This is a consequence of Theorem 2.10 and the direct part of Prohorov's Theorem.

The results in this section up to this point give useful criteria for showing weak convergence of probability measures on the space $C$. However, attempting to prove that a sequence of probability measures is tight using the definition of tightness alone can be problematic. We now go over some conditions that are equivalent to tightness in $(C,\mathcal{C})$ and are much easier to work with.

Definition 2.12 Let $x\in C$. Given $\delta>0$, define the modulus of continuity of $x$ for $\delta$, denoted $w(x,\delta)$, as

$$w(x,\delta) := \sup_{\substack{t,s\in[0,1]\\|t-s|\leq\delta}} |x(s)-x(t)|.$$

Remark 2.13 A direct consequence of the definition of convergence for sequences and uniform continuity is that a function $x\in C$ is uniformly continuous if and only if $w(x,\delta)\to0$ as $\delta\to0$.

Proposition 2.14 For a fixed $\delta>0$, $w(\cdot,\delta):C\to\mathbb{R}$ is continuous on $C$.

Proof Let $x\in C$ be arbitrary and $t,s\in[0,1]$ be such that $|t-s|<\delta$. For every $x'\in C$, we have

$$|x(t)-x(s)| = |x(t)-x(s)+x'(t)-x'(t)+x'(s)-x'(s)| \leq |x(t)-x'(t)| + |x'(s)-x(s)| + |x'(t)-x'(s)| \leq 2\|x-x'\| + |x'(t)-x'(s)|.$$

Taking a supremum on both sides of the inequality with respect to all pairs $t,s\in[0,1]$ such that $|t-s|<\delta$, we obtain

$$w(x,\delta) \leq 2\|x-x'\| + w(x',\delta). \tag{6}$$

Using the same argument but interchanging $x$ with $x'$ gives us

$$w(x',\delta) \leq 2\|x-x'\| + w(x,\delta). \tag{7}$$

Depending on which of $w(x,\delta)$ or $w(x',\delta)$ is smaller, $|w(x,\delta)-w(x',\delta)|$ is either equal to $w(x,\delta)-w(x',\delta)$ or $w(x',\delta)-w(x,\delta)$. In both cases, we conclude from (6) and (7) that, for any $x,x'\in C$, $|w(x,\delta)-w(x',\delta)| \leq 2\|x-x'\|$.

Therefore, for every $x\in C$ and $\varepsilon>0$, there exists $\eta>0$, namely $\eta=\varepsilon/2$, such that if $x'\in B(x,\eta)$, then

$$|w(x,\delta)-w(x',\delta)| \leq 2\rho(x,x') < 2\eta = \varepsilon.$$

This proves that $w(\cdot,\delta)$ is continuous at $x$ for every $x\in C$.
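The Lipschitz bound $|w(x,\delta)-w(x',\delta)| \leq 2\|x-x'\|$ obtained above can be observed numerically. The sketch below is a grid approximation (NumPy assumed; the helper name `modulus` and the two sample paths are our own illustrative choices, not from the text); on a common grid the same triangle-inequality argument applies verbatim, so the bound holds exactly for the discretized quantities:

```python
import numpy as np

def modulus(x, grid, delta):
    """Grid approximation of w(x, delta) = sup over |t - s| <= delta of |x(t) - x(s)|."""
    best = 0.0
    for i in range(len(grid)):
        for j in range(i, len(grid)):
            if grid[j] - grid[i] > delta:
                break  # grid is sorted, so later j only increase the gap
            best = max(best, abs(x[j] - x[i]))
    return best

grid = np.linspace(0.0, 1.0, 201)
x = np.sin(2 * np.pi * grid)                    # two paths in C[0,1]
xp = np.sin(2 * np.pi * grid) + 0.05 * np.cos(7 * grid)
delta = 0.1

wx, wxp = modulus(x, grid, delta), modulus(xp, grid, delta)
sup_dist = np.max(np.abs(x - xp))               # rho(x, x') restricted to the grid

# The Lipschitz estimate from the proof: |w(x,d) - w(x',d)| <= 2 * rho(x, x').
assert abs(wx - wxp) <= 2 * sup_dist + 1e-12
```

Refining the grid tightens the approximation of the true suprema without affecting the inequality.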


Definition 2.15 A set $F$ of real-valued functions that are defined on the same domain $D\subseteq\mathbb{R}$ is uniformly bounded if there exists $M>0$ such that, for every $x\in F$ and $t\in D$, we have $|x(t)|<M$, and uniformly equicontinuous if for every $\varepsilon>0$ there exists $\delta>0$ such that, if $t,s\in D$ satisfy $|t-s|<\delta$, then $|x(t)-x(s)|<\varepsilon$ for every $x\in F$.

Theorem 2.16 (Arzela-Ascoli Theorem) ([7] Theorem 7.6.1) Let $\{x_n\}_{n\in\mathbb{N}}$ be a sequence of real-valued functions defined on a compact interval $[a,b]\subset\mathbb{R}$. If the set $\{x_n\}_{n\in\mathbb{N}}$ is uniformly bounded and uniformly equicontinuous, then there exists a subsequence of $\{x_n\}$ that converges uniformly to some function $x:[a,b]\to\mathbb{R}$.

The hypotheses of the above theorem can be expressed in a way that is better suited to our context. First, every function in $C$ is a real-valued function defined on a compact interval of $\mathbb{R}$, so the result applies to every sequence of functions in $C$. Secondly, uniform convergence of functions is convergence in the $\ell^\infty$ norm, which is the metric $\rho$ we are using, so we will simply express uniform convergence of a subsequence as convergence in the space $(C,\rho)$. As for uniform equicontinuity and uniform boundedness, we have the following lemma.

Lemma 2.17 ([2] proof of Theorem 7.2) Suppose $A\subseteq C$ is such that

i. $\displaystyle\sup_{x\in A}|x(0)| = M < \infty$;

ii. $\displaystyle\lim_{\delta\to0}\Big(\sup_{x\in A} w(x,\delta)\Big) = 0$.

Then $A$ is uniformly bounded and uniformly equicontinuous.

Proof Given any $x\in A$, $k\in\mathbb{N}$ and $t\in[0,1]$, we have

$$|x(t)| = \left|x(0) + \sum_{i=1}^k\left(x\Big(\frac{it}{k}\Big) - x\Big(\frac{(i-1)t}{k}\Big)\right)\right| \leq |x(0)| + \sum_{i=1}^k\left|x\Big(\frac{it}{k}\Big) - x\Big(\frac{(i-1)t}{k}\Big)\right|. \tag{8}$$

From ii. we know that for every finite $S>0$, there exists some $k\in\mathbb{N}$ large enough such that if $\delta\leq1/k$, then

$$\sup_{x\in A} w(x,\delta) < S.$$

Combining this with i. and taking a supremum with respect to all $x\in A$ on both sides of inequality (8) above, we obtain

$$\sup_{x\in A}|x(t)| < M + kS < \infty,$$

and since this holds for every $t\in[0,1]$, $A$ is uniformly bounded.

Condition ii. implies that, for every $\varepsilon>0$, there exists $\eta>0$ such that if $\delta\leq\eta$,

$$\sup_{x\in A} w(x,\delta) < \varepsilon.$$

Thus, for every $\varepsilon>0$, there exists $\eta>0$ such that, given $t,s\in[0,1]$, if $|t-s|<\eta$, then

$$|x(t)-x(s)| \leq w(x,\eta) \leq \sup_{x\in A} w(x,\eta) < \varepsilon$$

for every $x\in A$. Hence, $A$ is uniformly equicontinuous.

Definition 2.18 Let $(S,\rho)$ be a metric space. A set $A\subset S$ is relatively compact if its closure, $A^-$, is compact.


Note that a set $A$ in a metric space $(S,\rho)$ is relatively compact if and only if every sequence in that set has a subsequence that converges in the space $S$ (the limit is not necessarily in the set $A$ itself)¹⁴.

Theorem 2.19 ([2] Theorem 7.2) A class $A\subseteq C$ is relatively compact if and only if it satisfies conditions i. and ii. of Lemma 2.17.

Proof Suppose conditions i. and ii. of Lemma 2.17 hold. Then we know $A$ is uniformly bounded and uniformly equicontinuous, which implies that every sequence $\{x_n\}_{n\in\mathbb{N}}\subseteq A$ is also uniformly bounded and uniformly equicontinuous. Thus, by the Arzela-Ascoli Theorem, every sequence $\{x_n\}_{n\in\mathbb{N}}\subseteq A$ has a subsequence that converges in $(C,\rho)$, which implies that $A$ is relatively compact.

Suppose that $A$ is relatively compact, i.e. the closure $A^-$ is compact. Since every compact set in a metric space is bounded¹⁵, we know

$$\infty > \sup_{x\in A}\|x\| = \sup_{x\in A}\Big(\sup_{t\in[0,1]}|x(t)|\Big) \geq \sup_{x\in A}|x(0)|,$$

which proves i.

Consider now the sequence of functions $w(\cdot,\tfrac1k):A^-\to\mathbb{R}$ for $k\in\mathbb{N}$. By Proposition 2.14, every function $w(\cdot,\tfrac1k)$ is continuous. Moreover, since every $x\in C$ is uniformly continuous (as it is continuous on the compact $[0,1]$¹⁶), by Remark 2.13, for every $x\in A^-$, $w(x,\tfrac1k)\to0$ as $k\to\infty$. Lastly, for every $i\geq1$ and $x\in A^-$, we have

$$w\Big(x,\frac{1}{i+1}\Big) = \sup_{|t-s|\leq\frac{1}{i+1}}|x(t)-x(s)| \leq \sup_{|t-s|\leq\frac1i}|x(t)-x(s)| = w\Big(x,\frac1i\Big),$$

since $|t-s|\leq\frac{1}{i+1}$ implies that $|t-s|\leq\frac1i$. Combining the three properties of $w(\cdot,\tfrac1k)$ mentioned in the previous sentences with the fact that $A^-$ is compact, we can apply Dini's Theorem¹⁷ and conclude that $w(\cdot,\tfrac1k)$ converges uniformly to 0, i.e.

$$\lim_{k\to\infty}\Big(\sup_{x\in A^-}\big|w(x,\tfrac1k) - 0\big|\Big) = 0.$$

This implies that ii. holds.

Theorem 2.20 ([2] Theorem 7.3) A sequence $\{\mathbb{P}_n\}_{n\in\mathbb{N}}$ of probability measures on $(C,\mathcal{C})$ is tight if and only if the following conditions hold.

I. For every $\eta>0$, there exist $\alpha>0$ and $n_1\in\mathbb{N}$ such that if $n\geq n_1$, then

$$\mathbb{P}_n\{x\in C : |x(0)|\geq\alpha\} \leq \eta. \tag{9}$$

II. For every $\varepsilon>0$ and $\eta>0$, there exist $\delta\in(0,1)$ and $n_2\in\mathbb{N}$ such that if $n\geq n_2$, then

$$\mathbb{P}_n\{x\in C : w(x,\delta)\geq\varepsilon\} \leq \eta. \tag{10}$$

Proof Suppose $\{\mathbb{P}_n\}_{n\in\mathbb{N}}$ is tight. Then, for every $\eta>0$, there exists a compact $K\in\mathcal{C}$ such that $\mathbb{P}_n(K) > 1-\eta$ for every $n\in\mathbb{N}$. Since every compact set is closed, and consequently equal to its closure, every compact set is relatively compact. Therefore, by Theorem 2.19, $K$ satisfies conditions i. and ii. of Lemma 2.17. Condition i. obviously implies that there exists $\alpha>0$ such

¹⁴ See [2] Theorem p.239.
¹⁵ See [5] Proposition 3.7.
¹⁶ See [5] Théorème 3.17 (Heine's Theorem).
¹⁷ See [5] Théorème 5.14.


that $|x(0)| < \alpha$ for every $x\in K$, i.e. $K\subseteq\{x\in C : |x(0)|<\alpha\}$. It then follows by monotonicity of probability measures that there exists $n_1\in\mathbb{N}$, namely $n_1=1$, such that if $n\geq n_1$, we have

$$\mathbb{P}_n\{x\in C : |x(0)|\geq\alpha\} = 1 - \mathbb{P}_n\{x\in C : |x(0)|<\alpha\} \leq 1 - \mathbb{P}_n(K) < \eta,$$

which implies that condition I. holds. Condition ii. implies that, for every $\varepsilon>0$, there exists $\Delta>0$ such that, for any $0<\delta\leq\Delta$, we have

$$\sup_{x\in K} w(x,\delta) < \varepsilon,$$

which implies that $w(x,\delta)<\varepsilon$ for every $x\in K$. Hence, for every $0<\delta\leq\Delta$, $K\subseteq\{x\in C : w(x,\delta)<\varepsilon\}$. Given that the only constraints on $\delta$ are $0<\delta\leq\Delta$, we can choose it such that $\delta\in(0,1)$ for every $\Delta>0$. Therefore, there exist $\delta\in(0,1)$ and $n_2\in\mathbb{N}$, namely $n_2=1$, such that, if $n\geq n_2$,

$$\mathbb{P}_n\{x\in C : w(x,\delta)\geq\varepsilon\} = 1 - \mathbb{P}_n\{x\in C : w(x,\delta)<\varepsilon\} \leq 1 - \mathbb{P}_n(K) < \eta,$$

which implies that II. holds.

Now suppose conditions I. and II. are satisfied. Thus, for every $\varepsilon>0$ and $\eta>0$, there exist $\alpha>0$, $n_1\in\mathbb{N}$, $\delta\in(0,1)$ and $n_2\in\mathbb{N}$ such that (9) and (10) hold. Since $(C,\rho)$ is separable and complete, we know by Remark 1.18 that for every $n\in\mathbb{N}$, $\mathbb{P}_n$ is tight. Hence each individual $\mathbb{P}_n$ satisfies I. and II. In other words, for every $n\in\mathbb{N}$, for any $\eta>0$ and $\varepsilon>0$, there exists $\alpha_n>0$ such that

$$\mathbb{P}_n\{x\in C : |x(0)|\geq\alpha_n\} \leq \eta \tag{11}$$

and $\delta_n\in(0,1)$ such that

$$\mathbb{P}_n\{x\in C : w(x,\delta_n)\geq\varepsilon\} \leq \eta. \tag{12}$$

Let $n_0 = \max\{n_1,n_2\}$, let $\alpha_n$ and $\delta_n$ be defined as in the inequalities (11) and (12) above, and define $\alpha'>0$ and $\delta'>0$ as

$$\alpha' = \max\Big\{\alpha,\ \max_{n\leq n_0}\alpha_n\Big\}, \qquad \delta' = \min\Big\{\delta,\ \min_{n\leq n_0}\delta_n\Big\}.$$

Knowing that if $a\geq b$, then $|x(0)|\geq a$ implies $|x(0)|\geq b$, and that if $a\leq b$, then $w(x,a)\geq\varepsilon$ implies $w(x,b)\geq\varepsilon$, we have the following inclusions:

$$\{x\in C : |x(0)|\geq\alpha'\} \subseteq \{x\in C : |x(0)|\geq\alpha_n\} \quad \forall n\leq n_0,$$
$$\{x\in C : |x(0)|\geq\alpha'\} \subseteq \{x\in C : |x(0)|\geq\alpha\},$$
$$\{x\in C : w(x,\delta')\geq\varepsilon\} \subseteq \{x\in C : w(x,\delta_n)\geq\varepsilon\} \quad \forall n\leq n_0,$$
$$\{x\in C : w(x,\delta')\geq\varepsilon\} \subseteq \{x\in C : w(x,\delta)\geq\varepsilon\}.$$

Thus, applying monotonicity of probability measures to the inequalities (9), (10), (11) and (12) above, we obtain:

a. For every $\eta>0$, there exists $\alpha'>0$ such that, for every $n\in\mathbb{N}$,

$$\mathbb{P}_n\{x\in C : |x(0)|\geq\alpha'\} \leq \eta;$$


b. For every $\varepsilon>0$ and $\eta>0$, there exists $\delta'\in(0,1)$ such that, for every $n\in\mathbb{N}$,

$$\mathbb{P}_n\{x\in C : w(x,\delta')\geq\varepsilon\} \leq \eta.$$

Let $\eta>0$ be arbitrary. Then there exists $\alpha'>0$ with the property that the set $B := \{x\in C : |x(0)|<\alpha'\}$ satisfies $\mathbb{P}_n(B^c)<\eta$ for every $n\in\mathbb{N}$. Similarly, we can find a sequence $\{\delta_k\}_{k\geq1}\subseteq(0,1)$ with the property that the sets $B_k := \{x\in C : w(x,\delta_k)<\tfrac1k\}$ satisfy $\mathbb{P}_n(B_k^c)<\eta/2^k$ for every $k\in\mathbb{N}$. Define $K := A^-$, where

$$A = B \cap \left(\bigcap_{k=1}^\infty B_k\right).$$

Since $A$ obviously satisfies conditions i. and ii. of Lemma 2.17, $K$ is compact. For every $n\in\mathbb{N}$, we have

$$\mathbb{P}_n(K) \geq \mathbb{P}_n\left(B\cap\left(\bigcap_{k=1}^\infty B_k\right)\right) \quad (A^-\supseteq A\ \text{for every set}\ A)$$
$$= 1 - \mathbb{P}_n\left[\left(B\cap\left(\bigcap_{k=1}^\infty B_k\right)\right)^{\!c}\right] = 1 - \mathbb{P}_n\left(B^c\cup\left(\bigcup_{k=1}^\infty B_k^c\right)\right)$$
$$\geq 1 - \left(\mathbb{P}_n(B^c) + \sum_{k=1}^\infty \mathbb{P}_n(B_k^c)\right) > 1 - \left(\eta + \sum_{k=1}^\infty \frac{\eta}{2^k}\right) = 1 - 2\eta.$$

This proves that $\{\mathbb{P}_n\}_{n\in\mathbb{N}}$ is tight.

Definition 2.21 Let $(\Omega,\mathcal{F},\mathbb{P})$ be a probability space. A random function on $C$ is a mapping $X:\Omega\to C$ that is $\mathcal{F}/\mathcal{C}$-measurable.

Similar to the situation with probability measures and weak convergence on $C$, the projection mappings and the class $\mathcal{C}_f$ of finite-dimensional sets determine which mappings $Y:\Omega\to C$ are random functions and when the distributions of random functions converge weakly.

Proposition 2.22 Let $(\Omega,\mathcal{F},\mathbb{P})$ be a probability space and $X:\Omega\to C$ be an arbitrary mapping. Then $X$ is a random function on $C$ if and only if for every $k\geq1$ and $T\in U^k$, the composition $\pi_T\circ X:\Omega\to\mathbb{R}^k$ is a random vector¹⁸.

Proof If $X$ is a random function, then every composition of $X$ with a projection mapping is a composition of measurable functions, since it was shown in this section that every projection is measurable. It then follows from Theorem A.14 that every such composition is a random vector.

Suppose now that every composition of $X$ with a projection is a random vector. Let $A\in\mathcal{C}_f$ be an arbitrary finite-dimensional set. Thus, there exist $k\geq1$ and $T\in U^k$ such that $A = \pi_T^{-1}(H)$ for some set $H\in\mathcal{B}^k$, which implies that

$$X^{-1}(A) = X^{-1}(\pi_T^{-1}(H)) = (\pi_T\circ X)^{-1}(H) \in \mathcal{F} \quad\text{(since every composition } \pi_T\circ X \text{ is measurable)}.$$

As shown in the proof of Proposition 2.9, we know that $\sigma(\mathcal{C}_f) = \mathcal{C}$; hence, by Proposition A.16, $X$ is $\mathcal{F}/\mathcal{C}$-measurable.

¹⁸ Recall that for $T = (t_1,\ldots,t_k)\in U^k$, $\pi_T\circ X = (X(t_1),\ldots,X(t_k))$.


Theorem 2.23 ([2] Theorem 7.5) Let $X, \{X^n\}_{n\in\mathbb{N}}$ be random functions on $C$ defined on a probability space $(\Omega,\mathcal{F},\mathbb{P})$. If, for every $T\in U^k$ and $k\geq1$,

$$\pi_T\circ X^n \xrightarrow{d} \pi_T\circ X \tag{13}$$

and, for every $\varepsilon>0$,

$$\lim_{\delta\to0}\Big(\limsup_{n\to\infty}\,\mathbb{P}(w(X^n,\delta)\geq\varepsilon)\Big) = 0, \tag{14}$$

then $X^n \xrightarrow{d} X$.

Proof Given an arbitrary projection mapping $\pi_T$, we know by Definition A.42 that the random vectors $\pi_T\circ X$, $\{\pi_T\circ X^n\}_{n\in\mathbb{N}}$ have respective distributions $\mathbb{P}\circ(\pi_T\circ X)^{-1}$, $\{\mathbb{P}\circ(\pi_T\circ X^n)^{-1}\}_{n\in\mathbb{N}}$, which, by associativity of composition and composition of inverse functions, are equal to $(\mathbb{P}\circ X^{-1})\circ\pi_T^{-1}$, $\{(\mathbb{P}\circ(X^n)^{-1})\circ\pi_T^{-1}\}_{n\in\mathbb{N}}$. To improve readability, given $T\in U^k$ and $n\in\mathbb{N}$, we adopt the following notation for the remainder of this proof:

$$\mu_T := (\mathbb{P}\circ X^{-1})\circ\pi_T^{-1}, \qquad \mu_T^n := (\mathbb{P}\circ(X^n)^{-1})\circ\pi_T^{-1}.$$

Thus, by the definition of convergence in distribution, (13) implies that

$$\mu_T^n \xrightarrow{w} \mu_T \tag{15}$$

for every $k\in\mathbb{N}$ and $T\in U^k$.

We now show that the set of probability measures $\{\mathbb{P}\circ(X^n)^{-1}\}_{n\in\mathbb{N}}$ satisfies conditions I. and II. of Theorem 2.20. Restricting (15) to the case $T = 0$, we have $\mu_0^n \xrightarrow{w} \mu_0$. Given any sequence $\{\mu_0^{n_i}\}_{i\in\mathbb{N}}$ where the indices $n_i$ appear in no particular order but are distinct, we can always find a subsequence $\{\mu_0^{n_{i_j}}\}_{j\in\mathbb{N}}$ such that $n_{i_0} < n_{i_1} < \cdots$; otherwise the sequence $\{\mu_0^{n_i}\}_{i\in\mathbb{N}}$ would not be infinite. Since $\{\mu_0^{n_{i_j}}\}_{j\in\mathbb{N}}$, having increasing indices, is also a subsequence of $\{\mu_0^n\}_{n\in\mathbb{N}}$, it converges weakly to $\mu_0$. This proves that $\{\mu_0^n\}_{n\in\mathbb{N}}$ is relatively compact, which implies that it is tight by the converse part of Prohorov's Theorem (since $\mathbb{R}$ is separable and complete).

Let $\eta>0$ be arbitrary. Then there exists a compact set $K\subset\mathbb{R}$ such that $\mu_0^n(K) > 1-\eta$ for every $n\in\mathbb{N}$. By the Heine-Borel Theorem¹⁹, $K$ is closed and bounded. Thus, there exists $\alpha>0$ such that $K\subseteq\{x\in\mathbb{R} : |x|\leq\alpha\}$. By monotonicity of probability measures, it follows that $-\mu_0^n\{x\in\mathbb{R} : |x|\leq\alpha\} \leq -\mu_0^n(K)$ for every $n\in\mathbb{N}$. Consequently, there exists $n_1\in\mathbb{N}$, namely $n_1=1$, such that, if $n\geq n_1$, we have

$$\mathbb{P}\circ(X^n)^{-1}\{x\in C : |x(0)|>\alpha\} = \mathbb{P}(|\pi_0\circ X^n| > \alpha) = \mu_0^n\{x\in\mathbb{R} : |x|>\alpha\} = 1 - \mu_0^n\{x\in\mathbb{R} : |x|\leq\alpha\} \leq 1 - \mu_0^n(K) < \eta.$$

This proves that $\{\mathbb{P}\circ(X^n)^{-1}\}_{n\in\mathbb{N}}$ satisfies condition I. of Theorem 2.20.

For every $\varepsilon>0$, (14) implies that, for every $\eta>0$, there exists $\Delta>0$ such that if $0<\delta\leq\Delta$, we have

$$\limsup_{n\to\infty}\mathbb{P}(w(X^n,\delta)\geq\varepsilon) = \inf_{n\in\mathbb{N}}\Big(\sup_{m\geq n}\mathbb{P}(w(X^m,\delta)\geq\varepsilon)\Big) < \eta.$$

This implies that there exists $n_2\in\mathbb{N}$ such that

$$\sup_{m\geq n_2}\mathbb{P}(w(X^m,\delta)\geq\varepsilon) = \sup_{m\geq n_2}\ \mathbb{P}\circ(X^m)^{-1}\{x\in C : w(x,\delta)\geq\varepsilon\} < \eta.$$

¹⁹ See [7] page 105.


Hence, if $m\geq n_2$, then

$$(\mathbb{P}\circ(X^m)^{-1})\{x\in C : w(x,\delta)\geq\varepsilon\} < \eta,$$

which proves condition II. of Theorem 2.20, since $\delta$ can always be chosen in $(0,1)$ while satisfying $0<\delta\leq\Delta$.

Finally, by Theorem 2.20, $\{\mathbb{P}\circ(X^n)^{-1}\}_{n\in\mathbb{N}}$ is tight, which, by Corollary 2.11, implies that $\mathbb{P}\circ(X^n)^{-1} \xrightarrow{w} \mathbb{P}\circ X^{-1}$. This proves that $X^n \xrightarrow{d} X$.

Theorem 2.24 ([2] Example 5.2) Let $\{\mathbb{P}_n\}_{n\in\mathbb{N}}$ be probability measures on $(C,\mathcal{C})$ such that

i. for every $k\geq1$ and $T\in U^k$, there exists a probability measure $\mu_T$ on $(\mathbb{R}^k,\mathcal{B}^k)$ such that $\mathbb{P}_n\circ\pi_T^{-1} \xrightarrow{w} \mu_T$;

ii. $\{\mathbb{P}_n\}_{n\in\mathbb{N}}$ is relatively compact.

Then there exists a probability measure $\mathbb{P}$ on $(C,\mathcal{C})$ such that $\mathbb{P}\circ\pi_T^{-1} = \mu_T$ for every $T\in U^k$, $k\geq1$.

Proof Since $\{\mathbb{P}_n\}_{n\in\mathbb{N}}$ is relatively compact, there exists a subsequence $\{\mathbb{P}_{n_i}\}_{i\in\mathbb{N}}$ that converges weakly to some probability measure $\mathbb{P}$ on $(C,\mathcal{C})$. By the Continuous Mapping Theorem, for every $k\in\mathbb{N}$ and $T\in U^k$, we have $\mathbb{P}_{n_i}\circ\pi_T^{-1} \xrightarrow{w} \mathbb{P}\circ\pi_T^{-1}$. By i., we have $\mathbb{P}_n\circ\pi_T^{-1} \xrightarrow{w} \mu_T$, which implies that $\mathbb{P}_{n_i}\circ\pi_T^{-1} \xrightarrow{w} \mu_T$. Thus, by uniqueness of limits for weak convergence, we have $\mathbb{P}\circ\pi_T^{-1} = \mu_T$ for every $T\in U^k$, $k\geq1$.

Remark 2.25 Notice that, in the previous theorem, it follows by Theorem 2.10 that $\mathbb{P}_n \xrightarrow{w} \mathbb{P}$.

2.2 Donsker’s Theorem

Definition 2.26 A stochastic process is a collection of random variables $\{X_i : i\in I\}$ defined on the same probability space $(\Omega,\mathcal{F},\mathbb{P})$.

Definition 2.27 A stochastic process $\{W_t : t\in[0,1]\}$ defined on a given probability space $(\Omega,\mathcal{F},\mathbb{P})$ is called a Brownian motion on $[0,1]$ if it satisfies the following conditions:

i. $W$ starts at zero, i.e.

$$\mathbb{P}(W_0 = 0) = 1.$$

ii. $W$ has independent increments, i.e. for any $k>0$, for every $0\leq t_0<\cdots<t_k\leq1$ and $H_1,\ldots,H_k\in\mathcal{B}$,

$$\mathbb{P}\left(\bigcap_{i=1}^k \{W_{t_i}-W_{t_{i-1}}\in H_i\}\right) = \prod_{i=1}^k \mathbb{P}(W_{t_i}-W_{t_{i-1}}\in H_i).$$

iii. For every $0\leq s<t\leq1$, $W_t-W_s$ has a normal distribution with mean 0 and variance $t-s$, i.e. for every $H\in\mathcal{B}$,

$$\mathbb{P}(W_t-W_s\in H) = \frac{1}{\sqrt{2\pi(t-s)}}\int_H e^{-u^2/[2(t-s)]}\,du.$$

iv. $W$ has continuous sample paths, i.e. for any $\omega\in\Omega$, the function $W(\omega,\cdot):[0,1]\to\mathbb{R}$ defined as $W(\omega,t) = W_t(\omega)$ for every $t\in[0,1]$ is continuous everywhere on $[0,1]$.


Proposition 2.28 Let $\{W_t : t\in[0,1]\}$ be a Brownian motion on a probability space $(\Omega,\mathcal{F},\mathbb{P})$. For every $0\leq t_1<\cdots<t_k\leq1$,

$$\mathbb{P}\circ(W_{t_1},\ldots,W_{t_k})^{-1} = \mathbb{P}'\circ\left(N_1\sqrt{t_1},\ N_1\sqrt{t_1}+N_2\sqrt{t_2-t_1},\ \ldots,\ \sum_{i=1}^k N_i\sqrt{t_i-t_{i-1}}\right)^{-1},$$

where $t_0 = 0$ and $N_1,\ldots,N_k$ are i.i.d. normally distributed random variables with mean 0 and variance 1 defined on a probability space $(\Omega',\mathcal{F}',\mathbb{P}')$.

Proof From Definition 2.27 ii. and iii. it is clear that

$$(W_{t_1},\ W_{t_2}-W_{t_1},\ \ldots,\ W_{t_k}-W_{t_{k-1}}) \stackrel{d}{=} (N_1\sqrt{t_1},\ N_2\sqrt{t_2-t_1},\ \ldots,\ N_k\sqrt{t_k-t_{k-1}}).$$

Let $g:\mathbb{R}^k\to\mathbb{R}^k$ be defined as

$$g(x_1,\ldots,x_k) = (x_1,\ x_1+x_2,\ \ldots,\ x_1+\cdots+x_k).$$

Then,

$$\mathbb{P}\circ(W_{t_1},\ldots,W_{t_k})^{-1} = \mathbb{P}\circ(W_{t_1},W_{t_2}-W_{t_1},\ldots,W_{t_k}-W_{t_{k-1}})^{-1}\circ g^{-1}$$
$$= \mathbb{P}'\circ\big[g(N_1\sqrt{t_1},\ N_2\sqrt{t_2-t_1},\ \ldots,\ N_k\sqrt{t_k-t_{k-1}})\big]^{-1}$$
$$= \mathbb{P}'\circ\left(N_1\sqrt{t_1},\ N_1\sqrt{t_1}+N_2\sqrt{t_2-t_1},\ \ldots,\ \sum_{i=1}^k N_i\sqrt{t_i-t_{i-1}}\right)^{-1}.$$
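The cumulative-sum construction of Proposition 2.28 can be sampled directly: draw independent increments $N_i\sqrt{t_i-t_{i-1}}$ and take running sums. A minimal sketch (NumPy assumed; the time points, sample size and tolerance are illustrative choices) that checks the resulting empirical covariance against the Brownian covariance $\mathrm{Cov}(W_s,W_t)=\min(s,t)$:

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.array([0.2, 0.5, 0.9])             # 0 <= t1 < t2 < t3 <= 1
dt = np.diff(np.concatenate(([0.0], t)))  # increments t_i - t_{i-1}, with t0 = 0

# Draw (W_{t1}, ..., W_{tk}) as cumulative sums of N_i * sqrt(t_i - t_{i-1}).
N = rng.standard_normal((100_000, len(t)))
W = np.cumsum(N * np.sqrt(dt), axis=1)

# Brownian motion has covariance Cov(W_s, W_t) = min(s, t).
emp_cov = np.cov(W, rowvar=False)
expected = np.minimum.outer(t, t)
assert np.allclose(emp_cov, expected, atol=0.02)
```

This is exactly the finite-dimensional law that the uniqueness arguments below (Propositions 2.33 and 2.34) pin down.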

Definition 2.29 A probability measure $\mathcal{W}$ on $(C,\mathcal{C})$ is called a Wiener measure if it satisfies the following conditions:

i. For every $t\in[0,1]$, the random variable $\mathcal{Z}_t$ on $(C,\mathcal{C},\mathcal{W})$ defined as $\mathcal{Z}_t(x) = x(t)$ for every $x\in C$ is normally distributed with mean 0 and variance $t$, i.e.

$$\mathcal{W}(\mathcal{Z}_t\leq\alpha) = \mathcal{W}\{x\in C : x(t)\leq\alpha\} = \int_{(-\infty,\alpha]}\frac{e^{-u^2/2t}}{\sqrt{2\pi t}}\,du$$

for every $\alpha\in\mathbb{R}$.

ii. The stochastic process $\{\mathcal{Z}_t : t\in[0,1]\}$ has independent increments, i.e. for every $0\leq t_0\leq\cdots\leq t_k = 1$, the random variables $\{\mathcal{Z}_{t_i}-\mathcal{Z}_{t_{i-1}}\}_{i=1}^k$ are independent.

The stochastic process $\{\mathcal{Z}_t : t\in[0,1]\}$ introduced in the above definition is usually referred to as the coordinate-variable process.

Definition 2.30 Define the coordinate-variable random function on $(C,\mathcal{C},\mathcal{W})$, denoted $\mathcal{Z}:C\to C$, as $\mathcal{Z}(x) = x$ for every $x\in C$.

Proposition 2.31 For every $0\leq s<t\leq1$, $\mathcal{Z}_t-\mathcal{Z}_s$ has a normal distribution with mean 0 and variance $t-s$, i.e. for every $H\in\mathcal{B}$,

$$\mathcal{W}(\mathcal{Z}_t-\mathcal{Z}_s\in H) = \frac{1}{\sqrt{2\pi(t-s)}}\int_H e^{-u^2/[2(t-s)]}\,du.$$

Proof Let $0\leq s<t\leq1$ be arbitrary. Letting $\varphi_X$ denote the characteristic function of a random variable $X$, it follows from Definition 2.29 i. and Proposition A.59 that

$$\varphi_{\mathcal{Z}_s}(u) = e^{-u^2 s/2} \quad\text{and}\quad \varphi_{\mathcal{Z}_t}(u) = e^{-u^2 t/2}.$$


Also,

$$\varphi_{\mathcal{Z}_t}(u) = \mathbb{E}\big[e^{iu\mathcal{Z}_t}\big] = \mathbb{E}\big[e^{iu(\mathcal{Z}_t-\mathcal{Z}_s+\mathcal{Z}_s)}\big] = \mathbb{E}\big[e^{iu(\mathcal{Z}_t-\mathcal{Z}_s)}e^{iu\mathcal{Z}_s}\big] = \mathbb{E}\big[e^{iu(\mathcal{Z}_t-\mathcal{Z}_s)}\big]\,\mathbb{E}\big[e^{iu\mathcal{Z}_s}\big]$$

(by Definition 2.29 ii. and Proposition A.54). Therefore,

$$\varphi_{\mathcal{Z}_t-\mathcal{Z}_s}(u) = \frac{\varphi_{\mathcal{Z}_t}(u)}{\varphi_{\mathcal{Z}_s}(u)} = e^{-u^2(t-s)/2},$$

which implies that the result holds by Propositions A.58 and A.59.

From Definitions 2.27 and 2.29 alone, it is not clear whether there exist stochastic processes on $[0,1]$ that are Brownian motions, or probability measures on $(C,\mathcal{C})$ that are Wiener measures. However, it will only be necessary to show the existence of a Wiener measure since, as we will show, the existence of a Brownian motion can be inferred from the existence of a Wiener measure.

Proposition 2.32 If there exists a Wiener measure $\mathcal{W}$, then the coordinate-variable process on $(C,\mathcal{C},\mathcal{W})$ is a Brownian motion.

Proof We show that conditions i., ii., iii. and iv. of Definition 2.27 are satisfied.

By Definition 2.29 i., we know that $\mathcal{Z}_0$ is normally distributed with mean 0 and variance 0. Thus $\mathcal{W}(\mathcal{Z}_0 = 0) = 1$, which proves condition i. of Definition 2.27. Condition ii. follows directly from Definition 2.29 ii. Condition iii. is a consequence of Proposition 2.31. Condition iv. is obvious given that sample paths of $\{\mathcal{Z}_t : t\in[0,1]\}$ are elements of $C$ and that every element of $C$ is continuous by definition.

Proposition 2.33 If there exists a Wiener measure on (C, C), it is unique.

Proof Let $\mathcal{W}$ and $\mathcal{W}'$ be Wiener measures on $(C,\mathcal{C})$, let $(\Omega,\mathcal{F},\mathbb{P})$ be a probability space and let $\{N_n:\Omega\to\mathbb{R}\}_{n\in\mathbb{N}}$ be i.i.d. normally distributed random variables with mean 0 and variance 1. Let $A\in\mathcal{C}_f$ be arbitrary. By the definition of finite-dimensional sets, there exist $k\in\mathbb{N}$, $T=(t_1,\ldots,t_k)\in U^k$ and $H\in\mathcal{B}^k$ that satisfy $\pi_T^{-1}(H)=A$. Then, by Propositions 2.32 and 2.28, we have

$$\mathcal{W}(A) = \mathcal{W}(\pi_T^{-1}(H)) = \mathcal{W}\{x\in C : (x(t_1),\ldots,x(t_k))\in H\} = \mathcal{W}((\mathcal{Z}_{t_1},\ldots,\mathcal{Z}_{t_k})\in H) = \mathcal{W}\circ(\mathcal{Z}_{t_1},\ldots,\mathcal{Z}_{t_k})^{-1}(H)$$
$$= \mathbb{P}\circ\left(N_1\sqrt{t_1},\ N_1\sqrt{t_1}+N_2\sqrt{t_2-t_1},\ \ldots,\ \sum_{i=1}^k N_i\sqrt{t_i-t_{i-1}}\right)^{-1}(H),$$

where $t_0 = 0$. Similarly,

$$\mathcal{W}'(A) = \mathcal{W}'\circ(\mathcal{Z}_{t_1},\ldots,\mathcal{Z}_{t_k})^{-1}(H) = \mathbb{P}\circ\left(N_1\sqrt{t_1},\ N_1\sqrt{t_1}+N_2\sqrt{t_2-t_1},\ \ldots,\ \sum_{i=1}^k N_i\sqrt{t_i-t_{i-1}}\right)^{-1}(H).$$

Thus, $\mathcal{W}(A) = \mathcal{W}'(A)$ for every $A\in\mathcal{C}_f$, which, by Proposition 2.9, implies that $\mathcal{W} = \mathcal{W}'$.

As a consequence of the above proposition, from now on we use the phrase "the Wiener measure" to emphasize that there cannot be more than one Wiener measure on $(C,\mathcal{C})$.


Proposition 2.34 The Wiener measure (if it exists) is the only measure $\mathcal{W}$ on $(C,\mathcal{C})$ such that for any $k\in\mathbb{N}$ and $T=(t_1,\ldots,t_k)\in U^k$,

$$\mathcal{W}\circ\pi_T^{-1} = \mathbb{P}\circ\left(N_1\sqrt{t_1},\ N_1\sqrt{t_1}+N_2\sqrt{t_2-t_1},\ \ldots,\ \sum_{i=1}^k N_i\sqrt{t_i-t_{i-1}}\right)^{-1},$$

where $t_0=0$ and $N_1,\ldots,N_k$ are i.i.d. normally distributed random variables with mean 0 and variance 1 on a probability space $(\Omega,\mathcal{F},\mathbb{P})$.

Proof Let $\mathcal{Q}$ be a probability measure on $(C,\mathcal{C})$ such that for every $k\in\mathbb{N}$ and $T=(t_1,\ldots,t_k)\in U^k$,

$$\mathcal{Q}\circ\pi_T^{-1} = \mathbb{P}\circ\left(N_1\sqrt{t_1},\ N_1\sqrt{t_1}+N_2\sqrt{t_2-t_1},\ \ldots,\ \sum_{i=1}^k N_i\sqrt{t_i-t_{i-1}}\right)^{-1} = \mathcal{W}\circ\pi_T^{-1}.$$

Let $A\in\mathcal{C}_f$. Then there exist $l\in\mathbb{N}$, $S\in U^l$ and $H\in\mathcal{B}^l$ such that $\pi_S^{-1}(H)=A$. Therefore,

$$\mathcal{Q}(A) = \mathcal{Q}(\pi_S^{-1}(H)) = \mathcal{W}(\pi_S^{-1}(H)) = \mathcal{W}(A),$$

which implies $\mathcal{Q} = \mathcal{W}$ by Proposition 2.9.

Definition 2.35 Define the floor function $\lfloor\cdot\rfloor:\mathbb{R}\to\mathbb{Z}$ as $\lfloor x\rfloor = \max\{z\in\mathbb{Z} : z\leq x\}$ for every $x\in\mathbb{R}$, and the ceiling function $\lceil\cdot\rceil:\mathbb{R}\to\mathbb{Z}$ as $\lceil x\rceil = \min\{z\in\mathbb{Z} : z\geq x\}$ for every $x\in\mathbb{R}$.

Proposition 2.36 For every $\alpha\in\mathbb{R}$,

$$\lim_{n\to\infty}\frac{\lfloor n\alpha\rfloor}{n} = \lim_{n\to\infty}\frac{\lceil n\alpha\rceil}{n} = \alpha.$$

Proof For every $n\in\mathbb{N}$ and $\alpha\in\mathbb{R}$, we have

$$\frac{\lceil n\alpha\rceil}{n} \geq \frac{n\alpha}{n} = \alpha \quad\text{and}\quad \frac{\lfloor n\alpha\rfloor}{n} \leq \frac{n\alpha}{n} = \alpha.$$

For every $\varepsilon>0$, there exists $N\in\mathbb{N}$, namely any $N$ strictly greater than $1/\varepsilon$, such that, if $n\geq N$, then

$$\left|\frac{\lceil n\alpha\rceil}{n} - \frac{\lfloor n\alpha\rfloor}{n}\right| = \frac{\lceil n\alpha\rceil - \lfloor n\alpha\rfloor}{n} \leq \frac1n \leq \frac1N < \varepsilon.$$

Thus, if the limits exist, we have

$$\lim_{n\to\infty}\frac{\lfloor n\alpha\rfloor}{n} = \lim_{n\to\infty}\frac{\lceil n\alpha\rceil}{n}.$$

Furthermore, for every $\varepsilon>0$, there exists $N\in\mathbb{N}$ with $N > 1/\varepsilon$ such that if $n\geq N$, then

$$\left|\frac{\lceil n\alpha\rceil}{n} - \alpha\right| = \frac{\lceil n\alpha\rceil}{n} - \alpha \quad\left(\text{since } \frac{\lceil n\alpha\rceil}{n} \geq \alpha\right) \leq \frac{\lceil n\alpha\rceil}{n} - \frac{\lfloor n\alpha\rfloor}{n} \quad\left(\text{since } -\frac{\lfloor n\alpha\rfloor}{n} \geq -\alpha\right) < \varepsilon,$$

which proves the result.
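The squeeze in this proof is easy to observe numerically. A short sketch (standard library only; the choice $\alpha=\pi/7$ is an arbitrary irrational example) checks $\lfloor n\alpha\rfloor/n \leq \alpha \leq \lceil n\alpha\rceil/n$ and the $1/n$ gap that drives both ratios to $\alpha$:

```python
import math

alpha = math.pi / 7  # an arbitrary irrational alpha

for n in (10, 1000, 100000):
    lo = math.floor(n * alpha) / n
    hi = math.ceil(n * alpha) / n
    assert lo <= alpha <= hi          # the two ratios bracket alpha
    assert hi - lo <= 1 / n + 1e-12   # gap at most 1/n, up to float rounding

# For n = 100000 the floor ratio is already within 1e-5 of alpha.
assert abs(math.floor(100000 * alpha) / 100000 - alpha) < 1e-5
```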


Definition 2.37 Let $\{\xi_n\}_{n\in\mathbb{N}}$ be a sequence of i.i.d. random variables with mean 0 and variance $0<\sigma^2<\infty$ on a probability space $(\Omega,\mathcal{F},\mathbb{P})$. Let $S_0 = 0$ and $S_n = \xi_1+\cdots+\xi_n$ for every $n\in\mathbb{N}$. For every $t\in[0,1]$ and integer $n\geq1$, define

$$X^n_t(\omega) := \frac{1}{\sigma\sqrt n}S_{\lfloor nt\rfloor}(\omega) + (nt-\lfloor nt\rfloor)\frac{1}{\sigma\sqrt n}\xi_{\lfloor nt\rfloor+1}(\omega).$$

Proposition 2.38 For any sequence $\{\xi_n\}_{n\in\mathbb{N}}$ of i.i.d. random variables with mean 0 and variance $0<\sigma^2<\infty$ on $(\Omega,\mathcal{F},\mathbb{P})$ and every integer $n\geq1$, the mapping $X^n:\Omega\to C$ defined for every $\omega\in\Omega$ as

$$X^n(\omega) : [0,1]\to\mathbb{R}, \qquad t\mapsto X^n_t(\omega),$$

is a random function on $C$.

Proof We first justify the assertion that for every $n\in\mathbb{N}$, $X^n$ maps $\Omega$ into $C$. For every $\omega\in\Omega$, $X^n(\omega):[0,1]\to\mathbb{R}$ is obviously continuous, since it is a linear interpolation of the set of points

$$\left\{\left(\frac in,\ \frac{S_i(\omega)}{\sigma\sqrt n}\right)\right\}_{i=0}^n$$

in the plane $[0,1]\times\mathbb{R}$.

We now prove $X^n$ is $\mathcal{F}/\mathcal{C}$-measurable. It was shown in the proof of Proposition 2.9 that $\mathcal{C} = \sigma(\mathcal{C}_f)$. Therefore, by Proposition A.16, $X^n$ is $\mathcal{F}/\mathcal{C}$-measurable if $(X^n)^{-1}(A)\in\mathcal{F}$ for every $A\in\mathcal{C}_f$. Let $A\in\mathcal{C}_f$ be arbitrary. Then there exist $k\in\mathbb{N}$, $T=(t_1,\ldots,t_k)\in U^k$ and $H\in\mathcal{B}^k$ such that $\pi_T^{-1}(H)=A$. We then have

$$(X^n)^{-1}(A) = (X^n)^{-1}(\pi_T^{-1}(H)) = (X^n_{t_1},\ldots,X^n_{t_k})^{-1}(H).$$

Given that, for every $i\leq k$, $X^n_{t_i}$ is obviously a random variable, hence a measurable function, it follows from Proposition A.19 that $(X^n)^{-1}(A)\in\mathcal{F}$.
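The linear-interpolation description of $X^n$ translates directly into code. The helper below is our own naming (a sketch, not code from the text); it implements Definition 2.37 and checks that the path agrees with the rescaled walk at the grid points $i/n$ and interpolates linearly in between:

```python
import numpy as np

def X_n(xi, sigma, t):
    """X^n_t = S_{floor(nt)}/(sigma*sqrt(n)) + (nt - floor(nt)) * xi_{floor(nt)+1}/(sigma*sqrt(n))."""
    n = len(xi)
    S = np.concatenate(([0.0], np.cumsum(xi)))  # S_0, S_1, ..., S_n
    k = int(np.floor(n * t))
    frac = n * t - k
    nxt = xi[k] if k < n else 0.0               # xi_{k+1}; the term vanishes at t = 1
    return (S[k] + frac * nxt) / (sigma * np.sqrt(n))

rng = np.random.default_rng(1)
n = 1000
xi = rng.choice([-1.0, 1.0], size=n)            # simple random walk, sigma = 1
S = np.concatenate(([0.0], np.cumsum(xi)))

# X^n agrees with the rescaled walk at the grid points i/n ...
assert np.isclose(X_n(xi, 1.0, 500 / n), S[500] / np.sqrt(n))
# ... and interpolates linearly in between.
mid = 0.5 * (X_n(xi, 1.0, 500 / n) + X_n(xi, 1.0, 501 / n))
assert np.isclose(X_n(xi, 1.0, 500.5 / n), mid)
```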

Lemma 2.39 ([2] Theorem 7.4) Let $0 = t_0 < \cdots < t_v = 1$ be such that

$$\min_{1<i<v}|t_i - t_{i-1}| \geq \delta \tag{16}$$

for some $\delta>0$. Then:

I. For every $x\in C$,

$$w(x,\delta) \leq 3\max_{1\leq i\leq v}\Big(\sup_{t_{i-1}\leq s\leq t_i}|x(s)-x(t_{i-1})|\Big).$$

II. For every probability measure $\mathbb{P}$ on $(C,\mathcal{C})$ and $\varepsilon>0$,

$$\mathbb{P}\{x\in C : w(x,\delta)\geq3\varepsilon\} \leq \sum_{i=1}^v \mathbb{P}\Big\{x\in C : \sup_{t_{i-1}\leq s\leq t_i}|x(s)-x(t_{i-1})| \geq \varepsilon\Big\}.$$

Proof We first prove I. Let

$$m := \max_{1\leq i\leq v}\Big(\sup_{t_{i-1}\leq s\leq t_i}|x(s)-x(t_{i-1})|\Big).$$


For every $x\in C$, if $s,t\in[t_{i-1},t_i]$ for some $1\leq i\leq v$, we then have by the triangle inequality that

$$|x(s)-x(t)| = |x(s)-x(t_{i-1})+x(t_{i-1})-x(t)| \leq |x(s)-x(t_{i-1})| + |x(t)-x(t_{i-1})| \leq 2m. \tag{17}$$

If $s\in[t_{i-1},t_i]$ and $t\in[t_i,t_{i+1}]$, then

$$|x(s)-x(t)| = |x(s)-x(t_{i-1})+x(t_{i-1})-x(t_i)+x(t_i)-x(t)| \leq |x(s)-x(t_{i-1})| + |x(t_i)-x(t_{i-1})| + |x(t)-x(t_i)| \leq 3m. \tag{18}$$

Furthermore, for every $t,s\in[0,1]$ such that $|t-s|\leq\delta$, we know there exists $1\leq i<v$ such that $t,s\in[t_{i-1},t_i]$, or $s\in[t_{i-1},t_i]$ and $t\in[t_i,t_{i+1}]$; otherwise (16) would be false. Combining this remark with (17) and (18) proves I.

For II., let $\varepsilon>0$ and $x\in\{x\in C : w(x,\delta)\geq3\varepsilon\}$. By I. we have that

$$m = \max_{1\leq i\leq v}\Big(\sup_{t_{i-1}\leq s\leq t_i}|x(s)-x(t_{i-1})|\Big) \geq \varepsilon.$$

Thus, there exists $j\leq v$ such that

$$x \in \Big\{x\in C : \sup_{t_{j-1}\leq s\leq t_j}|x(s)-x(t_{j-1})| \geq \varepsilon\Big\},$$

otherwise $m$ would be strictly smaller than $\varepsilon$. Hence,

$$\{x\in C : w(x,\delta)\geq3\varepsilon\} \subseteq \bigcup_{i=1}^v \Big\{x\in C : \sup_{t_{i-1}\leq s\leq t_i}|x(s)-x(t_{i-1})| \geq \varepsilon\Big\}.$$

II. then follows from monotonicity and countable subadditivity.
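Inequality I. of Lemma 2.39 can be sanity-checked on a grid. In the sketch below (NumPy assumed; the path, the partition and $\delta$ are arbitrary illustrative choices), both sides are evaluated over the same grid, on which the three-term triangle-inequality argument of the proof applies verbatim:

```python
import numpy as np

grid = np.linspace(0.0, 1.0, 501)
x = np.cos(3 * np.pi * grid) + 0.3 * grid       # an arbitrary path in C[0,1]

t_pts = np.array([0.0, 0.25, 0.5, 0.75, 1.0])   # partition with spacing >= delta
delta = 0.25

# Left side: w(x, delta) restricted to the grid.
w = max(abs(x[j] - x[i])
        for i in range(len(grid)) for j in range(i, len(grid))
        if grid[j] - grid[i] <= delta)

# Right side: max_i sup over [t_{i-1}, t_i] of |x(s) - x(t_{i-1})|.
m = 0.0
for a, b in zip(t_pts[:-1], t_pts[1:]):
    mask = (grid >= a) & (grid <= b)
    x_a = x[np.argmin(abs(grid - a))]           # value at the left endpoint
    m = max(m, np.max(abs(x[mask] - x_a)))

assert w <= 3 * m + 1e-12                       # inequality I. of the lemma
```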

Lemma 2.40 ([2] Lemma p.88) Let $\{\xi_n\}_{n\in\mathbb{N}}$ be i.i.d. random variables with mean 0 and variance $0<\sigma^2<\infty$ on a probability space $(\Omega,\mathcal{F},\mathbb{P})$, let $\{S_n\}_{n\in\mathbb{N}}$ be defined as in Definition 2.37 and let $\{X^n\}_{n\in\mathbb{N}}$ be defined as in Proposition 2.38. If

$$\lim_{\lambda\to\infty}\left(\limsup_{n\to\infty}\ \lambda^2\,\mathbb{P}\Big(\max_{i\leq n}|S_i| \geq \lambda\sigma\sqrt n\Big)\right) = 0, \tag{19}$$

then the set of distributions $\{\mu_{X^n}\}_{n\in\mathbb{N}}$ is tight.

Proof We show that $\{\mu_{X^n}\}_{n\in\mathbb{N}}$ satisfies conditions I. and II. of Theorem 2.20.

For any $\omega\in\Omega$, we have

$$X^n(\omega)(0) = X^n_0(\omega) = \frac{1}{\sigma\sqrt n}S_0(\omega) + 0\cdot\frac{1}{\sigma\sqrt n}\xi_1(\omega) = 0.$$

Thus, for every $\alpha>0$, $\{\omega\in\Omega : |X^n(\omega)(0)|\geq\alpha\} = \varnothing$. Therefore, for every $\eta>0$ and $n\in\mathbb{N}$,

$$\mu_{X^n}\{x\in C : |x(0)|\geq\alpha\} = \mathbb{P}(\varnothing) = 0 < \eta.$$

This proves $\{\mu_{X^n}\}_{n\in\mathbb{N}}$ satisfies condition I. of Theorem 2.20.


We now prove that $\{\mu_{X^n}\}$ satisfies condition II. of Theorem 2.20, i.e. for every $\varepsilon>0$,

$$\lim_{\delta\to0}\Big(\limsup_{n\to\infty}\,\mathbb{P}(w(X^n,\delta)\geq\varepsilon)\Big) = 0. \tag{20}$$

For every $n\in\mathbb{N}$ and $\delta>0$, define

$$m(n) := \lceil n\delta\rceil \quad\text{and}\quad v(n) := \left\lceil\frac{n}{m(n)}\right\rceil.$$

Let $n\in\mathbb{N}$ and $\delta>0$ be arbitrary. For every integer $i<v(n)$ (including zero), let $m_i = i\cdot m(n)$, and let $m_{v(n)} = n$. We then have $0 = m_0 < \cdots < m_{v(n)} = n$ and, for every $0<i<v(n)$, $m_i - m_{i-1} = m(n)$; hence

$$\min_{1<i<v(n)}\left|\frac{m_i}{n} - \frac{m_{i-1}}{n}\right| \geq \delta.$$

Therefore, it follows from Lemma 2.39 that

$$\mathbb{P}(w(X^n,\delta)\geq3\varepsilon) \leq \sum_{i=1}^{v(n)} \mathbb{P}\Big(\sup_{m_{i-1}/n\,\leq\, s\,\leq\, m_i/n}\big|X^n_s - X^n_{m_{i-1}/n}\big| \geq \varepsilon\Big).$$

We notice that, for every integer $t\leq n$,

$$X^n_{t/n} = \frac{S_t}{\sigma\sqrt n},$$

since $\frac tn n - \lfloor\frac tn n\rfloor = 0$. Consequently, for every $s\in[m_{i-1}/n,\,m_i/n]$, if $s = t/n$ for some integer $t$, then

$$\big|X^n_{m_{i-1}/n} - X^n_s\big| = \left|\frac{S_{m_{i-1}} - S_t}{\sigma\sqrt n}\right|.$$

If there is no such $t\in\mathbb{N}$, then there exists an integer $t'\in[m_{i-1},m_i]$ such that $t'/n < s < (t'+1)/n$. Since for every $\omega\in\Omega$, $X^n(\omega)$ is linear on every interval of the form $[t'/n,(t'+1)/n]$, we either have

$$X^n_{t'/n}(\omega) = \frac{S_{t'}(\omega)}{\sigma\sqrt n} < X^n_s(\omega) < \frac{S_{t'+1}(\omega)}{\sigma\sqrt n} = X^n_{(t'+1)/n}(\omega), \quad\text{or}\quad X^n_{(t'+1)/n}(\omega) = \frac{S_{t'+1}(\omega)}{\sigma\sqrt n} < X^n_s(\omega) < \frac{S_{t'}(\omega)}{\sigma\sqrt n} = X^n_{t'/n}(\omega).$$

In either case, it follows that there exists $t''\in\mathbb{N}$ (either $t'$ or $t'+1$, depending on whether $S_{m_{i-1}}(\omega)/(\sigma\sqrt n)$ is larger or smaller than $X^n_s(\omega)$) such that

$$\big|X^n_{m_{i-1}/n}(\omega) - X^n_s(\omega)\big| \leq \left|\frac{S_{m_{i-1}}(\omega) - S_{t''}(\omega)}{\sigma\sqrt n}\right|.$$

Hence, we have that

$$\mathbb{P}(w(X^n,\delta)\geq3\varepsilon) \leq \sum_{i=1}^{v(n)} \mathbb{P}\Big(\max_{m_{i-1}\leq t\leq m_i}|S_{m_{i-1}} - S_t| \geq \varepsilon\sigma\sqrt n\Big).$$

Since the random variables $\{\xi_n\}_{n\in\mathbb{N}}$ are i.i.d., we know by Proposition A.49 that they are stationary. For every $l\in\mathbb{N}$, define the mapping $\Sigma:\mathbb{R}^l\to\mathbb{R}$ as $\Sigma(x_1,\ldots,x_l) = x_1+\cdots+x_l$. Given $i\in\mathbb{N}$ and $m_{i-1}\leq t\leq m_i$, we then have

$$S_t - S_{m_{i-1}} = \sum_{j=1}^t\xi_j - \sum_{j=1}^{m_{i-1}}\xi_j = \sum_{j=m_{i-1}+1}^t\xi_j = \Sigma(\xi_{m_{i-1}+1},\ldots,\xi_t).$$


Therefore,

$$|S_t - S_{m_{i-1}}| = |\Sigma(\xi_{m_{i-1}+1},\ldots,\xi_t)| \stackrel{d}{=} |\Sigma(\xi_1,\ldots,\xi_{t-m_{i-1}})| \quad\text{(since the $\xi_n$ are stationary)}$$
$$= |S_{t-m_{i-1}} - S_0| = |S_{t-m_{i-1}}|. \tag{21}$$

We then have

$$\sum_{i=1}^{v(n)}\mathbb{P}\Big(\max_{m_{i-1}\leq t\leq m_i}|S_{m_{i-1}} - S_t| \geq \varepsilon\sigma\sqrt n\Big) = \sum_{i=1}^{v(n)}\mathbb{P}\Big(\max_{m_{i-1}\leq t\leq m_i}|S_{t-m_{i-1}}| \geq \varepsilon\sigma\sqrt n\Big)$$
$$= \sum_{i=1}^{v(n)}\mathbb{P}\Big(\max_{t\leq m_i-m_{i-1}}|S_t| \geq \varepsilon\sigma\sqrt n\Big) = \sum_{i=1}^{v(n)}\mathbb{P}\Big(\max_{t\leq m(n)}|S_t| \geq \varepsilon\sigma\sqrt n\Big) = v(n)\cdot\mathbb{P}\Big(\max_{t\leq m(n)}|S_t| \geq \varepsilon\sigma\sqrt n\Big).$$

We notice that, by Proposition 2.36, as $n\to\infty$,

$$v(n) = \left\lceil\frac{n}{m(n)}\right\rceil = \left\lceil\frac{n}{\lceil n\delta\rceil}\right\rceil \to \left\lceil\frac1\delta\right\rceil < \frac2\delta, \quad\text{and}\quad \frac{n}{m(n)} \to \frac1\delta > \frac1{2\delta}.$$

Thus, there exists N0 ∈ N such that, if n ≥ N0, then

    P(w(X^n, δ) ≥ 3ε) ≤ (2/δ) · P( max_{t ≤ m(n)} |S_t| ≥ (ε/√(2δ)) · σ√m(n) ).

The inequality above implies that, for every ε > 0 and δ > 0, if n ≥ N0,

    P(w(X^n, δ) ≥ 3ε) ≤ (2/δ) · P( max_{t ≤ ⌈nδ⌉} |S_t| ≥ (ε/√(2δ)) · σ√⌈nδ⌉ ).   (22)

We now prove (20). Let ε > 0 and η > 0 be arbitrary. We need to show that there exist δ0 ∈ (0, 1) and N ∈ N such that for every n ≥ N,

    P(w(X^n, δ0) ≥ 3ε) ≤ η.

By (19), there exist λ0 > ε/√2 and N1 ∈ N such that for every n ≥ N1,

    λ0² · P( max_{i ≤ n} |S_i| ≥ λ0 σ√n ) ≤ η ε²/4.   (23)

If we choose δ0 such that ε/√(2δ0) = λ0, i.e.

    2/δ0 = 4λ0²/ε²,

then it follows from (22) and (23) that there exists N ∈ N^20 such that if n ≥ N, then

    P(w(X^n, δ0) ≥ 3ε) ≤ (2/δ0) · P( max_{i ≤ ⌈nδ0⌉} |S_i| > (ε/√(2δ0)) σ√⌈nδ0⌉ )
                       = (4λ0²/ε²) · P( max_{i ≤ ⌈nδ0⌉} |S_i| ≥ λ0 σ√⌈nδ0⌉ )
                       ≤ η.

^20 Any N such that N ≥ N0 and ⌈Nδ0⌉ ≥ N1.


Theorem 2.41 (Donsker's Theorem) ([2] Theorem 8.2) Let {ξ_n}_{n∈N} be i.i.d. random variables with mean 0 and variance 0 < σ² < ∞ defined on a probability space (Ω, F, P), and let {X^n}_{n∈N} be defined as in Proposition 2.38. If the Wiener measure W exists, then {X^n}_{n∈N} converges in distribution to the coordinate-variable random function Z on (C, C, W).

Proof We prove the result using Theorem 2.23.

We first show that X^n and Z satisfy (13). Given n ∈ N and t ∈ [0, 1], let

    φ_{n,t}(ω) := S_{⌊nt⌋}(ω)/(σ√n)   and   ψ_{n,t}(ω) := ((nt − ⌊nt⌋)/(σ√n)) ξ_{⌊nt⌋+1}(ω).

Let 0 ≤ s < t ≤ 1 and n ∈ N be arbitrary. Then X^n_t − X^n_s = (φ_{n,t} − φ_{n,s}) + (ψ_{n,t} − ψ_{n,s}). Since the ξ_i have mean zero, it follows from the linearity of integrals that

    E[ψ_{n,t} − ψ_{n,s}] = E[ψ_{n,t}] − E[ψ_{n,s}]
        = ((nt − ⌊nt⌋)/(σ√n)) ∫_Ω ξ_{⌊nt⌋+1} dP − ((ns − ⌊ns⌋)/(σ√n)) ∫_Ω ξ_{⌊ns⌋+1} dP
        = ((nt − ⌊nt⌋)/(σ√n)) E[ξ_{⌊nt⌋+1}] − ((ns − ⌊ns⌋)/(σ√n)) E[ξ_{⌊ns⌋+1}]
        = 0.

Moreover,

    Var(ψ_{n,t} − ψ_{n,s}) = E[(ψ_{n,t} − ψ_{n,s})²] − E[ψ_{n,t} − ψ_{n,s}]²
        = E[ψ²_{n,t} − 2ψ_{n,t}ψ_{n,s} + ψ²_{n,s}] − 0
        = E[ψ²_{n,t}] − 2E[ψ_{n,t}ψ_{n,s}] + E[ψ²_{n,s}].

We see that

    E[ψ²_{n,t}] = ((nt − ⌊nt⌋)²/(σ²n)) ∫_Ω ξ²_{⌊nt⌋+1} dP
                = ((nt − ⌊nt⌋)²/(σ²n)) E[ξ²_{⌊nt⌋+1}]
                = (nt − ⌊nt⌋)²/n        (E[ξ_{⌊nt⌋+1}] = 0, thus σ² = E[ξ²_{⌊nt⌋+1}])
                < 1/n                   (∀x ∈ R, x − ⌊x⌋ < 1),

and similarly, E[ψ²_{n,s}] < 1/n. Furthermore,

    −2E[ψ_{n,t}ψ_{n,s}] = −2((nt − ⌊nt⌋)(ns − ⌊ns⌋)/(σ²n)) ∫_Ω ξ_{⌊nt⌋+1}ξ_{⌊ns⌋+1} dP
                        = −2((nt − ⌊nt⌋)(ns − ⌊ns⌋)/(σ²n)) E[ξ_{⌊nt⌋+1}ξ_{⌊ns⌋+1}].

If ⌊nt⌋ + 1 = ⌊ns⌋ + 1, then E[ξ_{⌊nt⌋+1}ξ_{⌊ns⌋+1}] = E[ξ²_{⌊nt⌋+1}] = σ² > 0. Otherwise, by independence, it follows from Proposition A.54 that E[ξ_{⌊nt⌋+1}ξ_{⌊ns⌋+1}] = E[ξ_{⌊nt⌋+1}]E[ξ_{⌊ns⌋+1}] = 0. In either case, we conclude that E[ξ_{⌊nt⌋+1}ξ_{⌊ns⌋+1}] ≥ 0. Combining this with the fact that n > 0 and that, for every x ∈ R, x − ⌊x⌋ ≥ 0, we then have that −2E[ψ_{n,t}ψ_{n,s}] ≤ 0, and therefore Var(ψ_{n,t} − ψ_{n,s}) < 2/n. By Chebyshev's inequality, for every ε > 0,

    P(|(ψ_{n,t} − ψ_{n,s}) − 0| < ε) = 1 − P(|(ψ_{n,t} − ψ_{n,s}) − 0| ≥ ε)
                                     > 1 − 2/(ε²n) → 1 as n → ∞,


hence ψ_{n,t} − ψ_{n,s} →_p 0. As shown in equation (21) in the proof of Lemma 2.40, the fact that the ξ_n are i.i.d., and hence stationary, implies that

    φ_{n,t} − φ_{n,s} = (S_{⌊nt⌋} − S_{⌊ns⌋})/(σ√n)
                      =_d S_{⌊nt⌋−⌊ns⌋}/(σ√n)
                      = (√(⌊nt⌋ − ⌊ns⌋)/√n) · S_{⌊nt⌋−⌊ns⌋}/(σ√(⌊nt⌋ − ⌊ns⌋)).

As n → ∞, it follows from Proposition 2.36 that √(⌊nt⌋ − ⌊ns⌋)/√n converges in probability to √(t − s), and from the Lindeberg-Lévy Theorem that S_{⌊nt⌋−⌊ns⌋}/(σ√(⌊nt⌋ − ⌊ns⌋)) converges in distribution to N, which has a normal distribution with mean 0 and variance 1. We can then apply Slutsky's Theorem and Proposition 1.24 to show that φ_{n,t} − φ_{n,s} →_d N√(t − s), and thus, by Slutsky's Theorem, X^n_t − X^n_s →_d N√(t − s) + 0. In particular, if s = 0, we then have that X^n_t − X^n_0 = X^n_t →_d N√t.

Let 0 = t_0 ≤ t_1 < ... < t_k ≤ 1 be arbitrary and Y^n = (X^n_{t_1}, X^n_{t_2} − X^n_{t_1}, ..., X^n_{t_k} − X^n_{t_{k−1}}). We notice that the components of the vector Y^n are sums of random variables that are linear combinations of the independent random variables {ξ_n}_{n∈N}, and that no ξ_i occurs in two different components of Y^n. Therefore, the components of Y^n are independent. As a consequence, for every A_1, ..., A_k ∈ B, we have that

    P(Y^n ∈ A_1 × ... × A_k) = P( ⋂_{i=1}^{k} {ω ∈ Ω : X^n_{t_i}(ω) − X^n_{t_{i−1}}(ω) ∈ A_i} )
        = ∏_{i=1}^{k} P( X^n_{t_i} − X^n_{t_{i−1}} ∈ A_i )
        = ∏_{i=1}^{k} P∘(X^n_{t_i} − X^n_{t_{i−1}})^{−1}(A_i)
        = ( P∘(X^n_{t_1})^{−1} × ... × P∘(X^n_{t_k} − X^n_{t_{k−1}})^{−1} )(A_1 × ... × A_k),

where P∘(X^n_{t_1})^{−1} × ... × P∘(X^n_{t_k} − X^n_{t_{k−1}})^{−1} is the product measure defined from the probability spaces (R, B, P∘(X^n_{t_1})^{−1}), ..., (R, B, P∘(X^n_{t_k} − X^n_{t_{k−1}})^{−1}). Since R is separable, we know by [2] Appendix M10 that the product space R^k is also separable. It then follows by Theorem 1.13 that

    Y^n →_d (N_1√t_1, N_2√(t_2 − t_1), N_3√(t_3 − t_2), ..., N_k√(t_k − t_{k−1})),

where N_1, ..., N_k are i.i.d. normally distributed random variables with mean 0 and variance 1. Therefore, by Definition 2.29, we see that

    Y^n →_d (Z_{t_1}, Z_{t_2} − Z_{t_1}, Z_{t_3} − Z_{t_2}, ..., Z_{t_k} − Z_{t_{k−1}}).

Furthermore, since the mapping g : R^k → R^k defined as

    g(x_1, ..., x_k) = (x_1, x_1 + x_2, x_1 + x_2 + x_3, ..., x_1 + ... + x_k)

is obviously continuous, it follows from the continuous mapping theorem that

    (X^n_{t_1}, ..., X^n_{t_k}) = g(Y^n) →_d g(Z_{t_1}, Z_{t_2} − Z_{t_1}, ..., Z_{t_k} − Z_{t_{k−1}}) = (Z_{t_1}, ..., Z_{t_k}).

Thus, we have that

    π_T ∘ X^n →_d π_T ∘ Z


for every k ∈ N and T ∈ U_k, which proves that X^n and Z satisfy (13).

We now prove that (14) holds, which, by Theorem 2.20, will be satisfied if we show that the set of distributions {μ_{X^n}}_{n∈N} is tight. To show this, we apply Lemma 2.40 by showing that {μ_{X^n}}_{n∈N} satisfies (19). By Etemadi's inequality, for every λ > 0 and n ∈ N, we have that

    P( max_{i ≤ n} |S_i| ≥ λσ√n ) ≤ 3 max_{i ≤ n} P( |S_i| ≥ (λ/3)σ√n ).

Therefore, if

    lim_{λ→∞} [ limsup_{n→∞} ( λ² max_{i ≤ n} P( |S_i| ≥ (λ/3)σ√n ) ) ] = 0,

or equivalently,

    lim_{λ→∞} [ limsup_{n→∞} ( λ² max_{i ≤ n} P( |S_i| ≥ λσ√n ) ) ] = 0,   (24)

then equation (19) will be satisfied, and the proof will be complete. The remainder of this proof is dedicated to showing (24). For every n ∈ N, let μ_n denote the distribution of the random variable S_n/(σ√n). By linearity of integrals, for every n ∈ N, we have that

    E[S_n] = E[ξ_1] + ... + E[ξ_n] = 0.

Thus, by the Lindeberg-Lévy Theorem, we have that

    μ_n →_w μ_N,

where μ_N is the distribution of a normally distributed random variable N with mean 0 and variance 1. For every λ > 0, let R_λ = {x ∈ R : |x| ≥ λ}. Then we obviously have ∂(R_λ) = {x ∈ R : |x| = λ} = {−λ, λ}, hence

    μ_N(∂(R_λ)) = ∫_{∂(R_λ)} e^{−u²/2}/√(2π) du = ∫_{{λ}} e^{−u²/2}/√(2π) du + ∫_{{−λ}} e^{−u²/2}/√(2π) du = 0,

since singletons have Lebesgue measure zero. Therefore, for every λ > 0, R_λ is a μ_N-continuity set, which implies by the Portmanteau Theorem that

    μ_k(R_λ) → μ_N(R_λ) as k → ∞.   (25)

By Corollary A.52, we know that for every λ > 0,

    μ_N(R_λ) = P(|N| ≥ λ) < 3λ⁻⁴.

Hence, it follows from (25) that, for any λ > 0, there exists k_λ ∈ N such that if k ≥ k_λ, then

    μ_k(R_λ) = P( |S_k/(σ√k)| ≥ λ ) = P( |S_k| ≥ λσ√k ) < 3λ⁻⁴.   (26)

Take n ≥ k_λ and choose 1 ≤ k ≤ n. We have two possible cases:


1. If 1 ≤ k < k_λ, then it follows from the fact that

       Var(S_k) = kσ²   (by [11] (4.1) p. 324, since the ξ_i are i.i.d.)

   and Chebyshev's inequality that

       P( |S_k| ≥ λσ√n ) ≤ kσ²/(σ²λ²n) = k/(λ²n) < k_λ/(λ²n).

2. If k_λ ≤ k ≤ n, it follows from monotonicity of measures and (26) that

       P( |S_k| ≥ λσ√n ) ≤ P( |S_k| ≥ λσ√k ) < 3λ⁻⁴.

Consequently,

    max_{i ≤ n} P( |S_i| ≥ λσ√n ) < max{ k_λ/(λ²n), 3/λ⁴ },

which, if we multiply by λ², implies that

    λ² max_{i ≤ n} P( |S_i| ≥ λσ√n ) < max{ k_λ/n, 3/λ² }.   (27)

Since we obviously have that

    limsup_{n→∞} max{ k_λ/n, 3/λ² } = 3/λ²,

it follows from (27) that

    limsup_{n→∞} ( λ² max_{i ≤ n} P( |S_i| ≥ λσ√n ) ) ≤ 3/λ².

Finally, taking λ → ∞, we obtain

    lim_{λ→∞} [ limsup_{n→∞} ( λ² max_{i ≤ n} P( |S_i| ≥ λσ√n ) ) ] = 0.

This concludes the proof of (19), and thus the proof of the theorem.

Theorem 2.42 The Wiener measure on (C, C) exists.

Proof We use Theorem 2.24.

Let {ξ_n}_{n∈N} be i.i.d. random variables with mean 0 and variance 0 < σ² < ∞ defined on a probability space (Ω, F, P), and let {X^n}_{n∈N} be defined as in Proposition 2.38. In the proof of Donsker's Theorem, we have shown that the set of distributions {μ_{X^n}}_{n∈N} is tight. By the direct part of Prohorov's Theorem, it follows that {μ_{X^n}}_{n∈N} is relatively compact, which verifies condition ii. of Theorem 2.24.

We now prove that condition i. is also satisfied. In the proof of Donsker's Theorem, we showed that for any k ∈ N and (t_1, ..., t_k) = T ∈ U_k,

    (X^n_{t_1}, X^n_{t_2} − X^n_{t_1}, ..., X^n_{t_k} − X^n_{t_{k−1}}) →_d (N_1√t_1, N_2√(t_2 − t_1), ..., N_k√(t_k − t_{k−1})),

where N_1, ..., N_k are i.i.d. normally distributed random variables with mean 0 and variance 1 on a probability space (Ω′, F′, P′). By the continuous mapping theorem, this implies that

    (X^n_{t_1}, ..., X^n_{t_k}) →_d (N_1√t_1, N_1√t_1 + N_2√(t_2 − t_1), ..., ∑_{i=1}^{k} N_i√(t_i − t_{i−1})),


where t_0 = 0, which proves condition i., since (X^n_{t_1}, ..., X^n_{t_k}) = π_T ∘ X^n, and thus P∘(π_T ∘ X^n)^{−1} = (P∘(X^n)^{−1})∘π_T^{−1}.

Therefore, by Theorem 2.24, there exists a probability measure W on (C, C) such that for every T ∈ U_k,

    W∘π_T^{−1} = P′∘( N_1√t_1, N_1√t_1 + N_2√(t_2 − t_1), ..., ∑_{i=1}^{k} N_i√(t_i − t_{i−1}) )^{−1}.

By Proposition 2.34, W is the Wiener measure.

Corollary 2.43 There exists a Brownian motion on [0, 1].

Proof Since the coordinate-variable process obviously exists, it follows from Theorem 2.42 and Proposition 2.32 that there exists a Brownian motion on [0, 1].

In conclusion, if for every n ∈ N we define a stochastic process {X^n_t}_{t∈[0,1]} as above, it follows from Donsker's Theorem and the previous remark that, as n approaches infinity, {X^n_t}_{t∈[0,1]} converges in distribution to a Brownian motion on [0, 1].


3 Numerical Verifications

In this section, we present programs in the language R that provide visual corroboration of the classical Central Limit Theorem and Donsker's Theorem.

3.1 Classical Central Limit Theorem

Our first goal is to observe the following result.

Theorem 3.1 (Lindeberg-Lévy Theorem) ([1] Theorem 27.1) Let (Ω, F, P) be a probability space, {X_n}_{n∈N} be a sequence of i.i.d. random variables on Ω with mean μ and finite positive variance σ², and, for every n ∈ N, S_n := X_1 + ... + X_n. Then

    Σ_n = (S_n − μ·n)/(σ√n) →_d N,

where N is normally distributed with mean 0 and variance 1, i.e. its distribution μ_N is defined as

    μ_N(A) = P(N ∈ A) := ∫_A e^{−u²/2}/√(2π) du

for every A ∈ B.

Note that, for every μ ∈ R and σ ≠ 0, the function g : R^n → R defined as

    g(x_1, ..., x_n) = ((x_1 + ... + x_n) − μ·n)/(σ√n)

is continuous, hence B_n/B-measurable^21. This ensures that for any sequence {X_n}_{n∈N} of i.i.d. random variables, Σ_n as defined above is a random variable, as compositions of measurable mappings are measurable^22. Furthermore, it is obvious from its distribution that the random variable N has the density function

    f_N(t) = e^{−t²/2}/√(2π),   t ∈ R,

which implies that its distribution function is given by

    F_N(t) = ∫_{(−∞,t]} f_N(u) du = ∫_{(−∞,t]} e^{−u²/2}/√(2π) du,   t ∈ R.

Proposition 3.2 The distribution function FN is uniformly continuous on R.

Proof For every t ∈ R, 0 < e^{−t²/2} ≤ 1, hence |f_N(t)| ≤ 1/√(2π). For every ε > 0, there exists δ > 0, namely δ = ε√(2π), with the property that for every t_1, t_2 ∈ R such that t_1 < t_2 and t_2 − t_1 < δ,

    |F_N(t_1) − F_N(t_2)| = | ∫_{(−∞,t_1]} f_N(u) du − ∫_{(−∞,t_2]} f_N(u) du |
        = | ∫_{(−∞,t_1]} f_N(u) du − ∫_{(−∞,t_1]∪(t_1,t_2]} f_N(u) du |
        = | ∫_{(−∞,t_1]} f_N(u) du − ∫_{(−∞,t_1]} f_N(u) du − ∫_{(t_1,t_2]} f_N(u) du |   (by Proposition A.26 d.)
        ≤ ∫_{(t_1,t_2]} |f_N(u)| du   (by Proposition A.26 c.)
        ≤ ∫_{(t_1,t_2]} du/√(2π) = (t_2 − t_1)/√(2π) < δ/√(2π) = ε.   (by monotonicity)

^21 See Corollary A.17.
^22 See Theorem A.14.


Proposition 3.3 ([1] p. 327 (25.2)) Let X, {X_n}_{n∈N} be random variables on a probability space (Ω, F, P) with respective distributions μ_X, {μ_{X_n}}_{n∈N} and distribution functions F_X, {F_{X_n}}_{n∈N}. Then μ_{X_n} →_w μ_X if and only if F_{X_n}(t) → F_X(t) for every t at which F_X is continuous.

Given an arbitrary collection {X_n}_{n∈N} of i.i.d. random variables, Proposition 3.3 shows that the Lindeberg-Lévy Theorem holds if and only if F_{Σ_n}(t) → F_N(t) for every t ∈ R. To observe this convergence using computer simulations, we approximate F_{Σ_n} using a finite number of experimental realizations of Σ_n.

Definition 3.4 Let {X_n}_{n∈N} be a sequence of i.i.d. random variables on a probability space (Ω, F, P). For every k ∈ N and for a fixed ω ∈ Ω, define the empirical distribution function for X_1, ..., X_k at ω as

    F_k(ω, t) := |{i ≤ k : X_i(ω) ∈ (−∞, t]}| / k,   t ∈ R,

where | · | denotes cardinality.
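As a concrete illustration (a sketch in R, the language used throughout this section; the helper name Fk below is ours, not part of the report's program), the definition can be computed directly and agrees with base R's ecdf:

```r
# Empirical distribution function: |{i <= k : X_i(omega) <= t}| / k
Fk <- function(X, t) sum(X <= t) / length(X)

X <- c(0.2, -1.5, 0.7, 0.7, 2.1)  # a fixed realization X_1(omega), ..., X_5(omega)
Fk(X, 0.7)                        # 4 of 5 sample points are <= 0.7, so 0.8
ecdf(X)(0.7)                      # base R's ecdf evaluates to the same value
```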

The following theorem shows that, if the number of experimental realizations of Σ_n we use to estimate F_{Σ_n} is very large, the approximation is almost surely reliable.

Theorem 3.5 (Glivenko-Cantelli Theorem) ([1] Theorem 20.6) Let {X_n}_{n∈N} be a sequence of i.i.d. random variables on a probability space (Ω, F, P) with common distribution function F. For every k ∈ N, define D_k : Ω → R as

    D_k(ω) = sup_t |F_k(ω, t) − F(t)|.

Then P( lim_{k→∞} D_k = 0 ) = 1.
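The shrinking of D_k can be glimpsed numerically (a sketch under our own assumptions: uniform draws on [0, 1], so F(t) = t there, and the supremum approximated on a finite grid rather than computed exactly):

```r
set.seed(1)
# Grid approximation of D_k = sup_t |F_k(., t) - F(t)| for Uniform(0,1) samples
Dk <- function(k) {
  X <- runif(k)               # i.i.d. Uniform(0,1), common distribution function F(t) = t
  t <- seq(0, 1, by = 0.001)
  max(abs(ecdf(X)(t) - t))
}
Dk(100)    # noticeably above 0
Dk(10000)  # much closer to 0
```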

We now go over a segment-by-segment presentation of a program in the R language that simulates realizations of i.i.d. random variables and then superimposes the plots of the distribution function of N and of the empirical distribution function of Σ_n for large values of n, for visual comparison.

First, we define the parameters. n determines how far towards the limit simulations of Σ_n will be taken. As n gets larger, we can expect F_{Σ_n} to approach F_N. For a given n ∈ N, k determines how many realizations of Σ_n will be used to compute the empirical distribution function. The larger k gets, the more reliable our estimate of F_{Σ_n} becomes. D, a and b specify the distribution of the i.i.d. sequence {X_n}_{n∈N} with which Σ_n is defined. More details on these last three parameters will be provided.

#Parameters

n=1000

k=1000

D=2

a=-1;b=1

In the above, the default value n = 1000 is arbitrary and could be changed to any desired positive integer. However, to observe the Lindeberg-Lévy theorem, which describes convergence in distribution as n → ∞, it is ideal to choose n to be as large as possible. The same goes for k, which needs to be large for the Glivenko-Cantelli Theorem to be observed. As for D, a and b, the following table illustrates what effect these parameters have on the distribution of the variables X_n.


D   Distribution of the X_n
1   Normal with mean a ∈ R and variance b > 0
2   Uniform on [a, b] (a ≠ b)
3   Beta with parameters alpha a > 0 and beta b > 0
4   Binomial with size a ∈ N and probability b ∈ [0, 1]
5   Chi-squared with a ∈ N degrees of freedom
6   Exponential with rate a > 0
7   Gamma with shape a > 0 and rate b > 0

Secondly, we look at the portion of the program that generates W ∈ R^k, a vector whose entries are independent simulations of Σ_n.

#Simulations of \Sigma_n
W=rep(NA,k)
ctr=1
while(ctr<k+1){
  if(D==1){ #Normal (rnorm takes the standard deviation, hence sqrt(b))
    L=rnorm(n,a,sqrt(b))
    m=a;v=b
  }
  if(D==2){ #Uniform
    L=runif(n,a,b)
    m=(b+a)/2;v=((b-a)^2)/12
  }
  if(D==3){ #Beta
    L=rbeta(n,a,b)
    m=a/(a+b);v=(a*b)/(((a+b)^2)*(a+b+1))
  }
  if(D==4){ #Binomial
    L=rbinom(n,a,b)
    m=a*b;v=(a*b)*(1-b)
  }
  if(D==5){ #Chi-Squared
    L=rchisq(n,a)
    m=a;v=2*a
  }
  if(D==6){ #Exponential
    L=rexp(n,a)
    m=1/a;v=1/(a^2)
  }
  if(D==7){ #Gamma
    L=rgamma(n,a,b)
    m=a/b;v=a/(b^2)
  }
  Sn=sum(L)
  W[ctr]=(Sn-m*n)/sqrt(n*v)
  ctr=ctr+1
}

Notice that W is first defined as rep(NA,k), which is an "empty" vector containing k copies of the entry "NA". Then, the while loop gradually replaces the NA values in W by realizations of Σ_n. In detail, for every 1 ≤ ctr < k + 1, the loop generates L, a vector containing n realizations of i.i.d. random variables with distribution specified by D, a and b, then generates m and v, respectively the mean and variance of these random variables^23, and Sn, which is the sum of every element of L, and finally replaces the ctr-th NA entry in W by

^23 See [8] for details.


    Σ_n = (S_n − m·n)/√(n·v).

Finally, we look at the portion of the code that superimposes the plots of the distribution functions.

#Plots

plot(ecdf(W),pch=46)

x=-400:400

x=x/100

y=pnorm(x)

lines(x,y,col=2)

Given a vector W ∈ R^k, the function ecdf(W) computes an empirical distribution function using the entries of W as experimental realizations. plot(ecdf(W),pch=46) displays a plot of the empirical distribution function of W, where pch=46 indicates which character is used to display the plot, in this case dots. x=-400:400 defines x as the vector x = (−400, −399, ..., 399, 400), and x=x/100 redefines it as x = (−4, −3.99, ..., 3.99, 4). Given t ∈ R, the function pnorm(t) computes an approximation of

    ∫_{(−∞,t]} f_N(u) du.

Thus, if we define y=pnorm(x)=(pnorm(-4),pnorm(-3.99),...,pnorm(4)), then the function lines(x,y,col=2) adds a curve representing the theoretical distribution function of N from −4 to 4 on the already existing plot, where col=2 indicates the curve is to be displayed in red.
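Besides the visual superimposition, the agreement between the two curves can be quantified (a sketch; the parameter values follow the program above with uniform summands on [−1, 1], so m = 0 and v = 1/3, and ks.test is base R's one-sample Kolmogorov-Smirnov test):

```r
set.seed(2)
n <- 1000; k <- 1000
# k realizations of Sigma_n for Uniform(-1,1) summands (m = 0, v = 1/3)
W <- replicate(k, sum(runif(n, -1, 1)) / sqrt(n * (1/3)))
grid <- seq(-4, 4, by = 0.01)
max(abs(ecdf(W)(grid) - pnorm(grid)))  # grid sup-distance between the two plotted curves
unname(ks.test(W, "pnorm")$statistic)  # the exact sup-distance
```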

We observe the effect that changing the parameters n and k has by looking at some examples of plots generated with the program.

Figure 1: n = 1000, k = 1000, D = 6, a = 5, b = 1

The convergence is already convincing with n = k = 1000. Increasing to n = k = 10000, we see in the following figure that the two plots are nearly indistinguishable.


Figure 2: n = 10000, k = 10000, D = 6, a = 5, b = 1

If we significantly decrease n but let k = 10000, we can expect the empirical distribution function to be a good approximation of the distribution function of Σ_n, but F_{Σ_n} will not necessarily be close to F_N.

Figure 3: n = 2, k = 10000, D = 5, a = 4, b = 7

If we significantly decrease k and let n = 10000, we cannot expect the empirical distribution function to be a good approximation of the distribution function of Σ_n, but F_{Σ_n} should be close to F_N.


Figure 4: n = 10000, k = 5, D = 2, a = −10, b = 10

As expected, changing the parameters D, a and b appears to have no significant effect.

3.2 Donsker’s Theorem

In this section our goal is to observe Donsker's Theorem. Given that the codomain of the coordinate-variable random function Z is the set of continuous functions defined on [0, 1], it is not easy to visualize the distribution μ_Z with two-dimensional plots. However, any function in C can obviously be represented in two dimensions, so we will simulate individual sample paths of the Brownian motion on [0, 1] by generating experimental realizations of X^n^24 for large n. A justification of the reliability of plots produced with such methods to simulate sample paths of a Brownian motion on [0, 1] can be found in [9], Section 1.2.4. We will briefly review these arguments.

The main idea is to regard the window on a computer screen on which a plot is displayed as the unit square [0, 1] × [0, 1]. Thus, the plot of a function x ∈ C is defined as another continuous function x_plot : [0, 1] → [0, 1], which is a shrunk or stretched version of x such that a graphic representation of x_plot fits in the unit square, i.e.

    {(t, x_plot(t)) : t ∈ [0, 1]} ⊂ [0, 1] × [0, 1],

and such that there is no wasted space on the display, i.e.

    sup_{t∈[0,1]} x_plot(t) = 1   and   inf_{t∈[0,1]} x_plot(t) = 0.

If we let C′ be the set of all functions x : [0, 1] → [0, 1] that are continuous, we can imagine a program that prints the plot of a function x ∈ C as a mapping plot : C → C′ that assigns x_plot to every x ∈ C. To define this function, we first need to introduce a few other mappings. Define

^24 X^n is defined as in Proposition 2.38.


sup : C → R, inf : C → R and range : C → R as follows:

    sup(x) := sup_{t∈[0,1]} x(t),   x ∈ C,
    inf(x) := inf_{t∈[0,1]} x(t),   x ∈ C,
    range(x) := sup(x) − inf(x),   x ∈ C.

Proposition 3.6 The functions sup, inf and range are continuous with respect to the standard Euclidean metric | · | on R.

Proof Let ε > 0 be arbitrary. Then there exists δ > 0, namely δ = ε, such that if ‖x − y‖ < δ, then, since x(t) ≤ y(t) + ‖x − y‖ for every t ∈ [0, 1], we have sup(x) ≤ sup(y) + ‖x − y‖, and by symmetry sup(y) ≤ sup(x) + ‖x − y‖. Hence

    |sup(x) − sup(y)| ≤ ‖x − y‖ < δ = ε,

and, using inf(x) = −sup(−x),

    |inf(x) − inf(y)| = |sup(−y) − sup(−x)| ≤ ‖(−y) − (−x)‖ = ‖x − y‖ < ε;

thus, sup and inf are uniformly continuous. Therefore, since range is a linear combination of continuous functions, it is continuous.

We can now define the plot function as

    plot(x) = (x − inf(x))/range(x),   x ∈ C.

We notice that this function is not defined for x ∈ C such that range(x) = 0, i.e. for the constant functions. For our purposes, this is not a source of great concern given that, for every ω ∈ Ω and n ∈ N, X^n(ω, 0) = 0; hence, for X^n(ω) to be constant, we must have ξ_1(ω) = ... = ξ_n(ω) = 0, which is always an event of probability zero.

Proposition 3.7 The function plot is continuous at every non-constant x ∈ C with respect to the metric ρ(a, b) = ‖a − b‖ on C′ ⊂ C.

Proof Let p_0 : C → p_0(C) and p_1 : p_0(C) → C′ be defined as

    p_0(x) = x − inf(x)   and   p_1(x) = x/range(x)   for every x ∈ C.

Then we have that plot = p_1 ∘ p_0, which implies plot is continuous if p_0 and p_1 are continuous^25.

^25 See [12], Composition of continuous mappings, p. 185.


Let ε > 0 be arbitrary. Then there exists δ > 0, namely δ = ε/2, such that if ‖x − y‖ < δ (where x, y ∈ C are non-constant), then

    ‖p_0(x) − p_0(y)‖ = ‖(x − inf(x)) − (y − inf(y))‖
        = ‖(x − y) + (inf(y) − inf(x))‖
        ≤ ‖x − y‖ + ‖inf(y) − inf(x)‖   (by the triangle inequality)
        = ‖x − y‖ + |inf(x) − inf(y)|
        ≤ ‖x − y‖ + ‖x − y‖   (see proof of Proposition 3.6)
        < δ + δ = ε.

Hence, p_0 is uniformly continuous. Let x ∈ p_0(C) be an arbitrary non-constant function and {x_n}_{n∈N} ⊆ p_0(C) be a sequence of non-constant functions such that x_n → x in the metric ρ(a, b) = ‖a − b‖. By Proposition 3.6, we know that range is continuous, which implies that range(x_n) → range(x) in the Euclidean metric d(t, s) = |t − s|. Therefore, if we define the constant function 1 ∈ C as 1(t) = 1 for every t ∈ [0, 1], we have that range(x_n)·1 → range(x)·1 in the metric ρ(a, b) = ‖a − b‖. Hence,

    lim_{n→∞} p_1(x_n) = lim_{n→∞} ( x_n / range(x_n) )
        = lim_{n→∞} ( x_n / (range(x_n)·1) )   (since x_n(t)/(range(x_n)·1(t)) = x_n(t)/range(x_n) for every t)
        = ( lim_{n→∞} x_n ) / ( lim_{n→∞} range(x_n)·1 )
        = x / (range(x)·1)
        = x / range(x)   (since x(t)/(range(x)·1(t)) = x(t)/range(x) for every t)
        = p_1(x),

which implies that p_1 is continuous at x. As x was arbitrary, p_1 is continuous at every non-constant x ∈ p_0(C).

Proposition 3.8 Discarding all ω ∈ Ω such that ξ_n(ω) = 0 for every n ∈ N, we have that

    plot(X^n) →_d plot(Z).

Proof The result follows from Donsker's Theorem and the continuous mapping theorem.

We now look at a segment-by-segment presentation of a program in R that generates approximate plots of realizations of a Brownian motion on [0, 1]. We first define the parameters.

#Parameters

n=10000

D=2

a=4

n dictates how far towards the limit simulations of X^n will be taken. Again, n = 10000 is arbitrary, but large values are preferred to observe the convergence. D and a specify the distribution of the ξ_n as follows:

D   Distribution of the ξ_n
1   Normal with mean 0 and variance a > 0
2   Uniform on [−a, a] (a > 0)
3   Student t with parameter a > 2


Next, we generate Xn, which is a simulation of the random vector

    X^n = (0, S_1/(σ√n), S_2/(σ√n), ..., S_n/(σ√n)),

where S_n and σ are defined as in Definition 2.37, and the distribution of the ξ_n is specified by D and a.

#Points
if(D==1){ #Normal (rnorm takes the standard deviation, hence sqrt(a))
  L=rnorm(n,0,sqrt(a))
  v=a
}
if(D==2){ #Uniform
  L=runif(n,-a,a)
  v=((2*a)^2)/12
}
if(D==3){ #Student
  L=rt(n,a)
  v=a/(a-2)
}
L=L/sqrt(v*n)
Xn=cumsum(L)
Xn=c(0,Xn)

where L is first defined as an approximation of (ξ_1, ..., ξ_n), and then L=L/sqrt(v*n) (division by σ√n, with σ = √v) redefines it as

    L = (ξ_1/(σ√n), ξ_2/(σ√n), ..., ξ_n/(σ√n)).

The function cumsum takes a vector V as input and generates a new vector V′ such that, for every i ∈ N, the ith entry of V′ is the cumulative sum of the first i entries of V. Xn=cumsum(L) thus defines

    X^n = (S_1/(σ√n), S_2/(σ√n), ..., S_n/(σ√n)),

and Xn=c(0,Xn) concatenates 0 with Xn, which yields

    X^n = (0, S_1/(σ√n), S_2/(σ√n), ..., S_n/(σ√n)).

Finally, we examine the portion of the program that prints the plot of Xn.

#Plots
x=1:n
x=x/n
x=c(0,x)
plot(x,Xn,'l')

x is defined as (0, 1/n, ..., (n − 1)/n, 1), and plot(x,Xn,'l') prints a linear interpolation of the set of points

    { (i/n, S_i(ω)/(σ√n)) }_{i=0}^{n}.
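As a quick numerical sanity check on this construction (a sketch; the variable names are ours), the terminal value X^n_1 = S_n/(σ√n) of the simulated path should be approximately standard normal, so across many simulated paths its sample mean and variance should be near 0 and 1:

```r
set.seed(3)
n <- 5000
endpoints <- replicate(1000, {
  L <- runif(n, -1, 1)   # xi_1, ..., xi_n uniform on [-1, 1]
  v <- ((2 * 1)^2) / 12  # their variance, as in the D == 2 branch above
  sum(L / sqrt(v * n))   # X^n_1 = S_n / (sigma * sqrt(n))
})
mean(endpoints)  # close to 0
var(endpoints)   # close to 1
```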

We now look at examples of plots generated with this program.


Figure 5: n = 10000, D = 2, a = 1

Figure 6: n = 10000, D = 3, a = 4


A Appendix - Theoretical Background

A.1 Measure Theory

Definition A.1 Let Ω be a set. The power set of Ω, denoted P(Ω), is defined as the set of all subsets of Ω. In other words, Y ∈ P(Ω) if and only if Y ⊆ Ω.

By convention, ∅ ⊂ Ω for any set Ω.

Definition A.2 Let Ω be an arbitrary set and F ⊆ P(Ω). F is a field of sets if the following conditions hold:

1. Ω ∈ F
2. If A, B ∈ F, then A ∪ B ∈ F
3. If A ∈ F, then A^c = {ω ∈ Ω : ω ∉ A} ∈ F

Notice that in our use of the notation ^c, complements are always understood to be with respect to the set Ω under consideration, regardless of whether or not Ω is a subset of some other set. It can also be noticed that every field of sets contains the empty set: for any field F on a set Ω, the definition implies that Ω^c ∈ F, and Ω^c = ∅.

Definition A.3 Let Ω be an arbitrary set. A field of sets F ⊆ P(Ω) is a σ-field on Ω if for every countably infinite collection {A_n}_{n∈N} of sets in F,

    ⋃_{n∈N} A_n ∈ F.

Given a set Ω and a σ-field F on Ω, we call the pair (Ω, F) a measurable space.

Proposition A.4 ([1] p. 20) Let (Ω, F) be a measurable space. If {A_n}_{n∈N} ⊆ F, then

    ⋂_{n∈N} A_n ∈ F.

Definition A.5 Let Ω be a set and A ⊆ P(Ω) be an arbitrary collection of subsets of Ω. Let F_A = {F ⊆ P(Ω) : A ⊆ F and F is a σ-field on Ω}. The σ-field generated by A, denoted σ(A), is defined as

    σ(A) := ⋂_{F∈F_A} F.

Given a topological space (Ω, τ), the σ-field generated by the open sets of Ω, denoted B(Ω), is called the Borel σ-field of Ω. The Borel σ-field generated by the standard Euclidean topology (R^n, τ) is denoted B_n.

Definition A.6 Let (Ω, F) be a measurable space. A function μ : F → [0, ∞] is a measure on F if it satisfies the following properties:

i. For every countably infinite pairwise disjoint collection {A_n}_{n∈N} of sets of F, i.e. A_i ∩ A_j = ∅ if i ≠ j,

    μ( ⋃_{n∈N} A_n ) = ∑_{n∈N} μ(A_n)

ii. μ(∅) = 0

Given a set Ω, a σ-field F on Ω and a measure μ on F, we call the triple (Ω, F, μ) a measure space.


Remark A.7 The concept of measure can be defined for any field of sets F that is not a σ-field by restricting condition i. of Definition A.6 to all disjoint unions that are in F.

Definition A.8 Let (Ω, F, μ) be a measure space. We call μ a finite measure if μ(Ω) < ∞, and a σ-finite measure if there exists a sequence of sets A_1, A_2, ... ∈ F such that μ(A_n) < ∞ for every n ≥ 1 and

    ⋃_{n≥1} A_n = Ω.

Proposition A.9 Every finite measure is σ-finite.

Proof Let (Ω, F, μ) be a measure space such that μ is finite. Then the sequence Ω, ∅, ∅, ... only has elements with finite measure, and Ω ∪ ∅ ∪ ∅ ∪ ... = Ω.

Proposition A.10 Let (Ω, F, μ) be a measure space.

a. (Monotonicity of Measures [1] p. 162) If A, B ∈ F are such that A ⊆ B, then μ(A) ≤ μ(B).

b. (Inclusion-Exclusion Formula [1] (10.5) p. 162) Let A_1, ..., A_n ∈ F be an arbitrary collection of sets. Then

    μ( ⋃_{i=1}^{n} A_i ) = ∑_{i=1}^{n} μ(A_i) − ∑_{i<j} μ(A_i ∩ A_j) + ∑_{i<j<k} μ(A_i ∩ A_j ∩ A_k) − ... + (−1)^{n+1} μ( ⋂_{i=1}^{n} A_i ).

c. ([1] Theorem 10.2) Let {A_n}_{n∈N} ⊆ F and A ∈ F be such that A_i ⊆ A_j for every i ≤ j and A_n → A. Then μ(A_n) → μ(A).

d. (Countable Subadditivity [1] Theorem 10.2 (iii)) Let {A_n}_{n∈N} ⊆ F be arbitrary. Then

    μ( ⋃_{n∈N} A_n ) ≤ ∑_{n∈N} μ(A_n).

e. For every A, B ∈ F, μ(A \ B) ≥ μ(A) − μ(B).

Proof
e. Given arbitrary A, B ∈ F, we have

    μ(A \ B) + μ(B) = μ(A ∩ B^c) + μ(B)
        = μ((A ∩ B^c) ∪ B)   ((A ∩ B^c) and B are disjoint)
        = μ((A ∪ B) ∩ (B^c ∪ B))
        = μ(A ∪ B)   (B^c ∪ B = Ω)
        ≥ μ(A)   (A ∪ B ⊇ A).

Definition A.11 Let Ω be an arbitrary set. Π ⊆ P(Ω) is a π-system if A, B ∈ Π implies that A ∩ B ∈ Π. Λ ⊆ P(Ω) is a λ-system if

1. Ω ∈ Λ
2. If A ∈ Λ, then A^c ∈ Λ
3. If {A_n}_{n∈N} ⊆ Λ are pairwise disjoint, then ⋃_{n∈N} A_n ∈ Λ

Theorem A.12 ([1] Theorem 10.3) Let μ_1 and μ_2 be measures on the space (Ω, F). Suppose that F = σ(Π), where Π is a π-system. If μ_1 and μ_2 are σ-finite on Π and such that μ_1(A) = μ_2(A) for every A ∈ Π, then μ_1(A) = μ_2(A) for every A ∈ σ(Π).


Definition A.13 Let (Ω, F) and (Ω′, F′) be measurable spaces. A mapping f : Ω → Ω′ is F/F′-measurable if for every A ∈ F′, f^{−1}(A) ∈ F, where f^{−1}(A) := {ω ∈ Ω : f(ω) ∈ A}.

Note that, when the σ-fields with respect to which a function is measurable are unambiguous,we sometimes call the function “measurable”.

Theorem A.14 ([1] Theorem 13.1) The composition of two measurable mappings is measurable.

Proposition A.15 ([1] p. 185-186) Let (Ω, F) and (Ω′, F′) be measurable spaces, and f : Ω → Ω′ be F/F′-measurable. Given any measure μ on F, the function μ∘f^{−1} : F′ → [0, ∞] defined as μ∘f^{−1}(A) = μ(f^{−1}(A)) for every A ∈ F′ is a measure on F′.

Proposition A.16 ([3] Lemma 7.2) Let (Ω, F) and (Ω′, F′) be measurable spaces such that F′ = σ(A) for some class A ⊆ P(Ω′), and let f : Ω → Ω′. If f^{−1}(A) ∈ F for every A ∈ A, then f is F/F′-measurable.

Corollary A.17 Let (Ω, τ) and (Ω′, τ ′) be topological spaces and f : Ω → Ω′ be a continuousfunction. Then, f is B(Ω)/B(Ω′)-measurable.

Definition A.18 Let (Ω_1, F_1), ..., (Ω_n, F_n) be measurable spaces and

    F_1 × ... × F_n := σ({A_1 × ... × A_n : A_i ∈ F_i for all i ≤ n}).

We call the measurable space (Ω_1 × ... × Ω_n, F_1 × ... × F_n) the product space of Ω_1, ..., Ω_n.

Proposition A.19 Let (Ω1,F1), ..., (Ωn,Fn), (Ω′1,F′1), ..., (Ω′n,F′n) be measurable spaces and let f1 : Ω1 → Ω′1, ..., fn : Ωn → Ω′n be functions. If fi is Fi/F′i-measurable for every i ≤ n, then (f1, ..., fn) : Ω1 × ... × Ωn → Ω′1 × ... × Ω′n is F1 × ... × Fn/F′1 × ... × F′n-measurable.

Proof Let A′1 ∈ F′1, ..., A′n ∈ F′n be arbitrary. Since every fi is measurable, setting Ai = fi−1(A′i) ∈ Fi for each i ≤ n, it follows that

(f1, ..., fn)−1(A′1 × ... × A′n) = A1 × ... × An,

and we obviously have that A1 × ... × An ∈ F1 × ... × Fn. Given that

F′1 × ... × F′n := σ({A′1 × ... × A′n : A′i ∈ F′i for all i ≤ n}),

the result follows from Proposition A.16.

A.2 Integration

Proposition A.20 Let x ∈ R. Then, there exist x+, x− ∈ [0,∞) such that x = x+ − x−.

Proof Let x+ = max{0, x} and x− = −min{0, x}.
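This decomposition into positive and negative parts is easy to transcribe; the following Python snippet (illustrative, not part of the paper) computes x+ and x−:

```python
def pos_neg_parts(x: float) -> tuple[float, float]:
    """Decompose x into nonnegative parts with x = x_plus - x_minus."""
    x_plus = max(0.0, x)
    x_minus = -min(0.0, x)
    return x_plus, x_minus

print(pos_neg_parts(-3.5))  # -> (0.0, 3.5)
```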

Definition A.21 Let Ω be an arbitrary set. A partition of Ω is a finite set of sets {Ai}_{i=1}^{n} ⊆ P(Ω) that are pairwise disjoint and such that

⋃_{i=1}^{n} Ai = Ω.

Definition A.22 Let (Ω,F,µ) be a measure space and f : Ω → R be a F/B-measurable mapping. Given a partition ∆ = {Ai}_{i=1}^{n} of Ω such that Ai ∈ F for every i, define

S∆(f) := ∑_{i=1}^{n} ( inf_{x∈Ai} f(x) ) µ(Ai).


Definition A.23 Let (Ω,F,µ) be a measure space, Γ = {∆ ⊆ F : ∆ is a partition of Ω} and f : Ω → R be a F/B-measurable mapping such that for every ω ∈ Ω, f(ω) ≥ 0. We define the integral of f over Ω with respect to the measure µ as

∫_Ω f dµ := sup_{∆∈Γ} S∆(f).
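The supremum over partitions can be approached numerically. The sketch below (an illustration, not from the paper) computes lower sums S∆(f) over uniform partitions of [0, 1] with µ the Lebesgue measure, for the assumed example f(x) = x:

```python
def lower_sum(f, a, b, n):
    """Lower sum S_Delta(f) for the uniform partition of [a, b] into n
    half-open intervals, with mu = Lebesgue measure (interval length).
    The infimum over each cell is taken as the smaller endpoint value,
    which is exact for monotone f."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        lo, hi = a + i * h, a + (i + 1) * h
        total += min(f(lo), f(hi)) * h  # inf over the cell times its measure
    return total

# The supremum over partitions is approached as n grows; for f(x) = x on
# [0, 1] the integral is 1/2, and every lower sum stays below it.
approx = lower_sum(lambda x: x, 0.0, 1.0, 1000)
print(approx)  # close to 0.5, from below
```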

Remark A.24 In some cases, integrals can be defined for functions that take negative values. Given a measure space (Ω,F,µ) and an arbitrary F/B-measurable mapping f : Ω → R, find f+ : Ω → [0,∞) and f− : Ω → [0,∞) such that f(ω) = f+(ω) − f−(ω) for every ω ∈ Ω26. If at least one of

∫_Ω f+ dµ or ∫_Ω f− dµ

is finite, then we define

∫_Ω f dµ := ∫_Ω f+ dµ − ∫_Ω f− dµ.

Definition A.25 Given a subset A ⊆ Ω, we define

∫_A f dµ := ∫_Ω I_A · f dµ.

Proposition A.26 Let (Ω,F,µ) be a measure space and f, g : Ω → R be F/B-measurable mappings.

a. (Monotonicity of Integrals, [1] Theorem 16.1) If f(ω) ≤ g(ω) almost everywhere on Ω, that is, if f(ω) > g(ω) holds only on a set of µ-measure zero, then

∫_Ω f dµ ≤ ∫_Ω g dµ.

b. (Linearity of Integrals, [1] Theorem 16.1) For every a, b ∈ R,

∫_Ω (af + bg) dµ = a ∫_Ω f dµ + b ∫_Ω g dµ.

c. ([1] p.207 (16.4))

|∫_Ω f dµ| ≤ ∫_Ω |f| dµ.

d. ([1] Theorem 16.9) For every pairwise disjoint collection {Ai}i∈N ⊆ F,

∫_{⋃_{i∈N} Ai} f dµ = ∑_{i∈N} ∫_{Ai} f dµ.

Theorem A.27 (Bounded Convergence Theorem) ([1] Theorem 16.5) Let (Ω,F,µ) be a measure space and {fn : Ω → R}n∈N be a sequence of uniformly bounded functions (i.e., there exists M ∈ R such that |fi(ω)| < M for every i ∈ N and ω ∈ Ω). If, for some function f : Ω → R,

lim_{n→∞} fn(ω) = f(ω)

holds almost everywhere (i.e., for all ω ∈ Ω \ D where µ(D) = 0), and µ(A) < ∞ where A ∈ F, then

lim_{n→∞} ∫_A fn dµ = ∫_A f dµ.

26See Proposition A.20
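The theorem can be observed numerically. In the hypothetical example below (not from the paper), fn(x) = x^n on [0, 1] is uniformly bounded by 1 and converges to 0 for every x < 1, i.e., almost everywhere with respect to Lebesgue measure, so the integrals should tend to 0:

```python
# f_n(x) = x**n is bounded by 1 on [0, 1] and converges to 0 almost
# everywhere, so the Bounded Convergence Theorem gives ∫ f_n dλ -> 0.
def integral_xn(n: int, grid: int = 100_000) -> float:
    """Midpoint-rule approximation of the integral of x**n over [0, 1]."""
    h = 1.0 / grid
    return sum(((i + 0.5) * h) ** n for i in range(grid)) * h

vals = [integral_xn(n) for n in (1, 5, 25, 125)]
print(vals)  # decreasing toward 0 (the exact values are 1/(n+1))
```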

Definition A.28 Let {(Ωi,Fi,µi)}_{i=1}^{n} be a set of measure spaces such that for every i ≤ n, µi is a σ-finite measure, and let (Ω1 × ... × Ωn, F1 × ... × Fn) be the product space over Ω1, ..., Ωn. For every E ∈ F1 × ... × Fn, define

µ1 × ... × µn(E) := ∫_{Ω1} ( ∫_{Ω2} ( ... ( ∫_{Ωn} I_E dµn ) ... ) dµ2 ) dµ1,

where I_E denotes the indicator function of E. The function µ1 × ... × µn : F1 × ... × Fn → [0,∞] is called the product measure on Ω1 × ... × Ωn.

Remark A.29 As explained in [1] Theorem 18.2 and p.238, the product measure µ1 × ... × µn on (Ω1 × ... × Ωn, F1 × ... × Fn) is a σ-finite measure and it is the only measure such that µ1 × ... × µn(A1 × ... × An) = µ1(A1) · ... · µn(An) for every A1 ∈ F1, ..., An ∈ Fn.

A.3 Lebesgue Measure

Definition A.30 Let Ω be a set and A ⊆ P(Ω). A is a semiring if

i. ∅ ∈ A

ii. For every A,B ∈ A, A ∩ B ∈ A

iii. If A,B ∈ A and A ⊆ B, then there exists a finite collection of disjoint sets C1, ..., Cn ∈ A such that

B \ A = ⋃_{i=1}^{n} Ci.

Notice that by condition ii. of the above definition, every semiring is a π-system.

Theorem A.31 ([1] Theorem 11.3) Let Ω be an arbitrary set and A ⊆ P(Ω) be a semiring. If µ0 : A → [0,∞) is such that

a. µ0(∅) = 0

b. If A,B ∈ A are disjoint, then µ0(A ∪ B) = µ0(A) + µ0(B)

c. If {An}n∈N ⊆ A, then µ0(⋃_{n∈N} An) ≤ ∑_{n∈N} µ0(An)

then there exists a measure µ on σ(A) such that µ(A) = µ0(A) for every A ∈ A.

Theorem A.32 ([1] Theorem 11.4) Let Ω be an arbitrary set, A ⊆ P(Ω) be a semiring and µ be a measure on σ(A). Then µ is σ-finite on A.

Proposition A.33 For every k ≥ 1, the following collection A of subsets of Rk forms a semiring:

A = { (a1, b1] × ... × (ak, bk] : ai, bi ∈ R, 1 ≤ i ≤ k }.

Proof We verify the three conditions of the definition of semirings.

1. ∅ = ∅ × ... × ∅ = (0, 0] × ... × (0, 0] ∈ A

2. Let A = (a1, b1] × ... × (ak, bk], B = (a′1, b′1] × ... × (a′k, b′k] ∈ A. For every i ≤ k, set

Ii = (max{ai, a′i}, min{bi, b′i}] if (ai, bi] ∩ (a′i, b′i] ≠ ∅, and Ii = ∅ otherwise.

Then, A ∩ B = I1 × ... × Ik, which is in A.

3. Let A = (a1, b1] × ... × (ak, bk], B = (a′1, b′1] × ... × (a′k, b′k] ∈ A be such that A ⊆ B. Then, for every i ≤ k, we have a′i ≤ ai and bi ≤ b′i. For each i ≤ k, split (a′i, b′i] into the three disjoint half-open intervals (a′i, ai], (ai, bi] and (bi, b′i] (some of which may be empty). Taking all products of one such interval per coordinate decomposes B into at most 3^k pairwise disjoint sets of A, and A is exactly the product in which the middle interval (ai, bi] is chosen in every coordinate. Hence B \ A is the finite disjoint union of the remaining products, each of which belongs to A.
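Step 2 of the proof amounts to intersecting boxes coordinate-wise. A minimal Python sketch (illustrative, not from the paper; a box (a1, b1] × ... × (ak, bk] is assumed to be represented as a list of (a, b) pairs):

```python
def box_intersection(A, B):
    """Return the coordinate intervals of A ∩ B, or None if it is empty.

    Each box is a list of (a, b) pairs standing for (a, b] in each
    coordinate; the intersection interval is (max(a, a2), min(b, b2)].
    """
    out = []
    for (a, b), (a2, b2) in zip(A, B):
        lo, hi = max(a, a2), min(b, b2)
        if lo >= hi:  # empty half-open interval => empty box
            return None
        out.append((lo, hi))
    return out

print(box_intersection([(0, 2), (0, 2)], [(1, 3), (-1, 1)]))  # [(1, 2), (0, 1)]
```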

Definition A.34 Let A ⊆ P(Rk) be defined as in Proposition A.33, and define the set function λ0 : A → [0,∞) as

λ0((a1, b1] × ... × (ak, bk]) = ∏_{i=1}^{k} (bi − ai).

We define the Lebesgue measure λ : Bk → [0,∞] to be the extension of λ0 to σ(A) = Bk, whose existence and uniqueness are guaranteed by Theorems A.31, A.32 and A.12.
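On a single box, λ0 is just the product of the side lengths; a one-line illustration (not from the paper, same (a, b)-pair representation of boxes as assumed above):

```python
import math

def lambda0(box):
    """Pre-measure of a half-open box: the product of its side lengths."""
    return math.prod(b - a for a, b in box)

print(lambda0([(0.0, 2.0), (1.0, 4.0), (-1.0, 1.0)]))  # 2 * 3 * 2 = 12.0
```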

Remark A.35 Let f : Rk → R be Bk/B-measurable. We write

∫_{Rk} f(x) dx

for the integral of f over Rk with respect to the Lebesgue measure.

A.4 Probability

Definition A.36 A probability space is a measure space (Ω,F ,P) such that P(Ω) = 1.

For a given probability space (Ω,F,P), we call Ω the sample space, F the event space and P a probability measure. Elements of F are usually referred to as events, and subsets of F as classes. It can also be noticed that a direct consequence of monotonicity of measures is that, for any given probability measure P on a probability space, 0 ≤ P(A) ≤ 1 for every event A.

Proposition A.37 Let (Ω,F ,P) be a probability space.

a. ([1] (2.6) p.24) For every A ∈ F, P(Ac) = 1 − P(A).

b. If A ∈ F is such that P(A) = 1, then for every B ∈ F, P(A ∩ B) = P(B).

Proof of b. We first notice that, by monotonicity, A ⊆ A ∪ B implies that 1 = P(A) ≤ P(A ∪ B), hence P(A ∪ B) = 1. Then, by the inclusion-exclusion formula,

1 = P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = 1 + P(B) − P(A ∩ B),

from which the result follows.

Theorem A.38 ([1] Theorem 3.1) Let Ω be a set, F0 ⊆ P(Ω) be a field of sets, and P0 be a probability measure on that field (see Remark A.7). Given F = σ(F0), there exists a unique probability measure P on F such that for every A ∈ F0, P(A) = P0(A).

Definition A.39 Let (Ω,F,P) be a probability space. We define the concept of independence for events in F as follows

1. A,B ∈ F are independent if P(A ∩B) = P(A)P(B)

51

Page 52: Donsker’s Theorem - University of Ottawaaix1.uottawa.ca/~rbalan/PierreYves-rapport.pdfDonsker’s Theorem Pierre Yves Gaudreau Lamarre August 2012 Abstract In this paper we provide

2. A1, ..., An ∈ F are independent if P(⋂_{i∈I} Ai) = ∏_{i∈I} P(Ai) for every subset I ⊆ {1, ..., n} of at least 2 elements.

3. Let I be an arbitrary infinite set. The elements of {Ai}i∈I ⊆ F are independent if for every finite subset J ⊂ I of at least 2 elements, the sets in {Ai}i∈J are independent.

Definition A.40 Let (Ω,F,P) be a probability space. We define the concept of independence for classes of F as

1. A1, ..., An ∈ P(F) are independent if for every choice of events A1, ..., An such that Ai ∈ Ai for every i ≤ n, A1, ..., An are independent.

2. Let I be an arbitrary infinite set. The elements of {Ai}i∈I ⊆ P(F) are independent if for every finite subset J ⊂ I of at least 2 elements, the classes in {Ai}i∈J are independent.

Definition A.41 Let (Ω,F,P) be a probability space and (Ω′,F′) be an arbitrary measurable space. A random element on Ω is a F/F′-measurable mapping X : Ω → Ω′.

A random element X defined on a space (Ω,F,P) is called simple if it has a finite image, a random variable if it takes values in R and is F/B-measurable, a random vector if it takes values in Rk for 1 < k < ∞ and is F/Bk-measurable, and a random sequence if it takes values in R∞ and is F/B∞-measurable.

Definition A.42 Let (Ω,F,P) be a probability space, (Ω′,F′) be a measurable space and X : Ω → Ω′ be a random element. We call the distribution of X the probability measure on F′ defined by PX−1(A) = P(X−1(A))27. To improve readability, we sometimes adopt the notation µX for PX−1, and sometimes refer to µX(A) as P(X ∈ A), or as the “probability that the event A occurs”.

Remark A.43 In the above definition, PX−1 is asserted to be a probability measure. This is due to the fact that PX−1(Ω′) = P(X−1(Ω′)) = P(Ω) = 1, where X−1(Ω′) = Ω; otherwise we would have some ω ∈ Ω such that X(ω) ∉ Ω′, which would be a contradiction.

Definition A.44 Let X be a random variable on a probability space (Ω,F,P) with distribution µX. The distribution function of X, denoted FX : R → [0, 1], is defined as FX(x) = µX((−∞, x]). The density function of X (if it exists) is a B/B-measurable function fX : R → R such that

µX(A) = ∫_A fX(x) dx

for every A ∈ B.
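The distribution function can be estimated empirically. The sketch below (illustrative, not from the paper) compares the fraction of simulated standard normal samples falling in (−∞, x] with the exact value of FX(x), computed here via the error function:

```python
import math
import random

random.seed(0)

def normal_cdf(x: float) -> float:
    """Distribution function of N(0,1), via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Estimate F_X(x) = mu_X((-inf, x]) by the fraction of samples <= x.
samples = [random.gauss(0.0, 1.0) for _ in range(200_000)]
for x in (-1.0, 0.0, 1.0):
    emp = sum(s <= x for s in samples) / len(samples)
    print(x, round(emp, 3), round(normal_cdf(x), 3))  # empirical vs exact
```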

Proposition A.45 ([1] p.256) A distribution function FX for a given random variable X has at most countably many discontinuities.

Definition A.46 Let (Ω,F,P) be a probability space and (Ω′,F′) be a measurable space. Let {Xi : Ω → Ω′}_{i=1}^{n} be random elements. We say that X1, ..., Xn are independent if the events X1−1(A1), ..., Xn−1(An) are independent for every A1, ..., An ∈ F′.

Definition A.47 Let (Ω,F,P) be a probability space and (Ω′,F′) be a measurable space. We say that the random elements {Xi : Ω → Ω′}_{i=1}^{n} are identically distributed if there exists a probability measure µ : F′ → [0, 1] such that µXi = µ for every 1 ≤ i ≤ n.

When a collection of random elements is independent and identically distributed, we say they are i.i.d.

27See Proposition A.15


Definition A.48 Let (Ω,F,P) be a probability space. We call a sequence of random variables {Xn : Ω → R}n∈N stationary if, for every k ∈ N, the distribution of the random vector (Xn, Xn+1, ..., Xn+k) : Ω → R^{k+1} does not depend on n.

Proposition A.49 Let {Xn}n∈N be a sequence of i.i.d. random elements on a probability space (Ω,F,P). Then, {Xn}n∈N is stationary.

Proof From [1] Example 27.7, we know that for every integer m ≥ 0 and B^{m+1}/B-measurable function f : R^{m+1} → R, the sequence of random variables {f(Xn, ..., Xn+m)}n∈N is stationary. The result is a consequence of the special case where m = 0 and f is the identity function on R.

Definition A.50 Let (Ω,F,P) be a probability space and X : Ω → R be a random variable. The expected value of X, denoted E[X], is defined as

E[X] := ∫_Ω X dP.

The variance of X, denoted Var(X), is defined as Var(X) := E[X²] − E[X]².

Note that the expected value of a given random variable is sometimes referred to as the mean.
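Both quantities are straightforward to estimate from simulated samples. An illustrative sketch (not from the paper) for a fair die roll, where E[X] = 3.5 and Var(X) = E[X²] − E[X]² = 35/12 ≈ 2.917:

```python
import random

random.seed(5)

# Sample mean and sample variance (via E[X^2] - E[X]^2) of a fair die.
n = 200_000
xs = [random.randint(1, 6) for _ in range(n)]
mean = sum(xs) / n
var = sum(x * x for x in xs) / n - mean ** 2
print(round(mean, 2), round(var, 2))  # near 3.5 and 35/12
```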

Proposition A.51 (Markov's Strict Inequality) Let X be a random variable on a probability space (Ω,F,P). For every α > 0 and k ∈ N,

P{ω ∈ Ω : |X(ω)| > α} < α^{−k} E[|X|^k].

Proof Let α > 0 and k ∈ N be arbitrary and Xα = {ω ∈ Ω : |X(ω)| > α}. Then,

E[|X|^k] = E[|X|^k · I_{Xα}] + E[|X|^k · I_{Xα^c}]

≥ E[|X|^k · I_{Xα}]   (since E[|X|^k · I_{Xα^c}] ≥ 0)

= ∫_Ω |X|^k · I_{Xα} dP

= ∫_{Xα} |X|^k dP

> ∫_{Xα} α^k dP,

where the last strict inequality is due to [1] Theorem 15.2 (ii), since |X|^k − α^k > 0 everywhere on Xα. We then have

E[|X|^k] > α^k ∫_{Xα} 1 dP   (by linearity of integrals)

= α^k P{ω ∈ Ω : |X(ω)| > α},

from which the result follows.
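The inequality can be checked by simulation. An illustrative Monte Carlo sketch (not from the paper) for X uniform on [0, 1], where P(|X| > 0.9) = 0.1 and E[X²] = 1/3:

```python
import random

random.seed(1)

# Monte Carlo check of P(|X| > alpha) <= E[|X|^k] / alpha^k
# for X ~ Uniform(0, 1), alpha = 0.9, k = 2.
n = 100_000
xs = [random.random() for _ in range(n)]
alpha, k = 0.9, 2
lhs = sum(abs(x) > alpha for x in xs) / n            # ~ P(|X| > 0.9) = 0.1
rhs = sum(abs(x) ** k for x in xs) / n / alpha ** k  # ~ (1/3) / 0.81 ~ 0.412
print(lhs, rhs)  # lhs stays below rhs
```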

Corollary A.52 Let (Ω,F,P) be a probability space and N : Ω → R be a random variable with a normal distribution with mean 0 and variance 1. Then, for every α > 0,

P(|N| ≥ α) < 3α^{−4}.


Proof Let α > 0 be arbitrary. We first notice that

P(|N| ≥ α) = P(|N| = α) + P(|N| > α)

= ∫_{{α,−α}} e^{−u²/2}/√(2π) du + P(|N| > α)

= 0 + P(|N| > α).

Furthermore, we notice that for every n ∈ N,

E[N^{n+1}] = ∫_R u^{n+1} e^{−u²/2}/√(2π) du   (by [1] (21.4) p.274)

= (1/√(2π)) ∫_R u^n (u · e^{−u²/2}) du

= n ∫_R u^{n−1} e^{−u²/2}/√(2π) du   (integration by parts)

= n E[N^{n−1}].

Hence,

E[|N|⁴] = E[N⁴] = 3 · E[N²] = 3(1 · E[N⁰]) = 3.

The result then follows from Markov's strict inequality with k = 4.
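Both E[N⁴] = 3 and the tail bound can be observed numerically; an illustrative Monte Carlo sketch (not from the paper):

```python
import random

random.seed(2)

# For N ~ N(0,1): E[N^4] = 3 and P(|N| >= alpha) < 3 / alpha^4.
n = 200_000
ns = [random.gauss(0.0, 1.0) for _ in range(n)]
fourth_moment = sum(x ** 4 for x in ns) / n
alpha = 2.0
tail = sum(abs(x) >= alpha for x in ns) / n
print(round(fourth_moment, 2), tail, 3 / alpha ** 4)
# fourth moment near 3; tail near 0.046, well below the bound 0.1875
```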

Proposition A.53 (Etemadi's Inequality) ([1] Theorem 22.5) Let X1, ..., Xn be independent random variables on a probability space (Ω,F,P) and, for every k ≤ n, Sk = X1 + ... + Xk. For any α ≥ 0,

P{ω ∈ Ω : max_{k≤n} |Sk(ω)| ≥ 3α} ≤ 3 max_{k≤n} P{ω ∈ Ω : |Sk(ω)| ≥ α}.

Proposition A.54 ([1] (21.18) p.277) Let X1, ..., Xn be random variables on a probability space (Ω,F,P). If X1, ..., Xn are independent, then

E[X1 · ... ·Xn] = E[X1] · ... · E[Xn].

Proposition A.55 (Chebyshev's Inequality) ([1] (21.13) p.276) Let X be a random variable on a probability space (Ω,F,P) with mean µ and finite variance σ². For every α > 0,

P{ω ∈ Ω : |X(ω) − µ| ≥ α} ≤ σ²/α².
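An illustrative simulation check (not from the paper) for an Exponential(1) variable, which has mean 1 and variance 1:

```python
import random

random.seed(3)

# Chebyshev check: P(|X - mu| >= alpha) <= sigma^2 / alpha^2
# for X ~ Exponential(1), where mu = 1 and sigma^2 = 1.
n = 100_000
xs = [random.expovariate(1.0) for _ in range(n)]
mu, var = 1.0, 1.0
alpha = 2.0
lhs = sum(abs(x - mu) >= alpha for x in xs) / n  # ~ P(X >= 3) = e^{-3} ~ 0.0498
print(lhs, var / alpha ** 2)  # lhs stays below the bound 0.25
```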

Proposition A.56 ([1] p.275) Let (Ω,F,P) be a probability space and X be a random variable taking only positive values. Then,

E[X] = ∫_{[0,∞)} P(X > t) dt.

Definition A.57 Let X be a random variable on a probability space (Ω,F,P). Define the characteristic function of X, denoted ϕX, as

ϕX(t) = E[e^{itX}]

for every t ∈ R.

Proposition A.58 ([1] p.342 (ii)) The characteristic function uniquely determines the distribution of random variables.

Proposition A.59 ([13] p.14) Let N be a random variable with a normal distribution with mean µ and variance σ². Then,

ϕN(t) = e^{iµt − σ²t²/2}.
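The formula can be verified by averaging e^{itN} over simulated samples; an illustrative sketch (not from the paper) for the standard normal case µ = 0, σ² = 1, where ϕN(t) = e^{−t²/2}:

```python
import cmath
import math
import random

random.seed(4)

# Estimate phi(t) = E[exp(itN)] for N ~ N(0,1) by a sample average and
# compare it with the closed form exp(-t^2 / 2).
n = 200_000
ns = [random.gauss(0.0, 1.0) for _ in range(n)]
t = 1.0
phi_hat = sum(cmath.exp(1j * t * x) for x in ns) / n
print(abs(phi_hat - math.exp(-t * t / 2)))  # small estimation error
```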


References

[1] P. Billingsley, Probability and Measure, Wiley, Third Edition (1995)

[2] P. Billingsley, Convergence of Probability Measures, Wiley, Second Edition (1999)

[3] R. L. Schilling, Measures, Integrals and Martingales, Cambridge University Press (2005)

[4] W. Sierpinski, General Topology, Dover, Translated and Revised by C. Cecilia Krieger (2000)

[5] G. Auliac, J.-Y. Caby, Mathématiques, Topologie et Analyse, EdiScience (2005)

[6] J. K. Hunter, Measure Theory (Lecture Notes), Department of Mathematics, University of California at Davis, http://www.math.ucdavis.edu/~hunter/measure_theory/measure_notes.pdf

[7] R. S. Strichartz, The Way Of Analysis, Jones & Bartlett Learning, Revised Edition (2000)

[8] R. V. Hogg, E. A. Tanis, Probability and Statistical Inference, Prentice Hall, 8th edition(2010)

[9] W. Whitt, Stochastic-Process Limits, Springer (2002)

[10] A. K. Basu, Measure Theory and Probability, Prentice-Hall (2004)

[11] S. Ross, A First Course in Probability, Prentice-Hall, Eighth Edition (2010)

[12] P. J. Pahl, R. Damrath, F. Pahl (Translator), Mathematical Foundations of Computational Engineering: A Handbook, Springer (2001)

[13] T. M. Bisgaard, Z. Sasvári, Characteristic Functions and Moment Sequences: Positive Definiteness in Probability, Nova Science Publishers Inc. (2000)


Index

Arzelà-Ascoli Theorem, 18
Borel σ-field, 46
Bounded Convergence Theorem, 49
Brownian Motion, 23
Cf, 15
Ceiling Function, 26
Characteristic Function, 54
Chebyshev's Inequality, 54
Class, 51
Continuity Set
    for a probability measure P, 5
    for a random variable X, 11
Continuous Mapping Theorem, 8, 12
Convergence
    in Distribution, 11
    in Probability, 12
    Weak, 4
Coordinate-variable
    Process, 24
    Random Function, 24
Countable Subadditivity, 47
Density Function, 52
Distribution, 52
Distribution Function, 52
Donsker's Theorem, 31
Empirical Distribution Function, 37
Equality in Distribution, 12
Etemadi's Inequality, 54
Event, 51
Event Space, 51
Expected Value, 53
Field of Sets, 46
Finite Measure, 47
Finite-Dimensional Sets, 15
Floor Function, 26
Glivenko-Cantelli Theorem, 37
i.i.d., 52
Inclusion-Exclusion Formula, 47
Independent
    Classes, 52
    Events, 51
    Random Elements, 52
Integral, 49
λ-system, 47
Lebesgue Measure, 51
Lindeberg-Lévy Theorem, 36
Markov's Strict Inequality, 53
Mean, 53
Measurable
    Mapping, 48
    Space, 46
Measure, 46
Measure Space, 46
Modulus of Continuity, 17
Monotonicity
    of Integrals, 49
    of Measures, 47
Partition, 48
π-system, 47
Portmanteau Theorem, 5, 12
Power Set, 46
Probability Measure, 51
    Regular, 4
    Tight, 10
Probability Space, 51
Product Measure, 50
Product Space, 48, 50
Prohorov's Theorem, 10
Projection Mapping, 15
Random Element, 52
    Identically Distributed, 52
    Simple, 52
Random Function, 21
Random Sequence, 52
Random Variable, 52
Random Vector, 52
Relative Compactness
    of Probability Measures, 10
    of Sets, 18
Sample Path, 23
Sample Space, 51
Semiring, 50
σ-field, 46
    Generated by a class of sets, 46
σ-finite Measure, 47
Slutsky's Theorem, 13
Stationary Sequence of Random Variables, 53
Stochastic Process, 23
Uk, 15
Uniformly Bounded, 18
Uniformly Equicontinuous, 18
Variance, 53