
Monte Carlo Methods and Stochastic Algorithms

BERNARD LAPEYRE

October 2019


Contents

1 Monte-Carlo Methods for Options
  1.1 On the convergence rate of Monte-Carlo methods
  1.2 Simulation methods of classical laws
      1.2.1 Simulation of the uniform law
      1.2.2 Simulation of some common laws of finance
  1.3 Variance Reduction
      1.3.1 Control variates
      1.3.2 Importance sampling
      1.3.3 Antithetic variables
      1.3.4 Stratification methods
      1.3.5 Mean value or conditioning
  1.4 Low discrepancy sequences
  1.5 Exercises and problems
      1.5.1 Exercises
      1.5.2 Problems

2 Monte-Carlo methods for pricing American options
  2.1 Introduction
  2.2 The Longstaff and Schwartz algorithm
  2.3 The Tsitsiklis-Van Roy algorithm
  2.4 The quantization algorithms
  2.5 The Broadie-Glassermann algorithm
  2.6 The Barraquand-Martineau algorithm

3 Introduction to stochastic algorithms
  3.1 A reminder on martingale convergence theorems
      3.1.1 Consequence and examples of uses
  3.2 Almost sure convergence for some stochastic algorithms
      3.2.1 Almost sure convergence for the Robbins-Monro algorithm
      3.2.2 Almost sure convergence for the Kiefer-Wolfowitz algorithm
  3.3 Speed of convergence of stochastic algorithms
      3.3.1 Introduction in a simplified context
      3.3.2 L2 and locally L2 martingales
      3.3.3 Central limit theorem for martingales
      3.3.4 A central limit theorem for the Robbins-Monro algorithm
  3.4 Exercises and problems

A  A proof of the central limit theorem for martingales

References


Chapter 1

Monte-Carlo Methods for Options

Monte-Carlo methods are extensively used in financial institutions to compute European option prices, to evaluate sensitivities of portfolios to various parameters and to compute risk measurements.

Let us describe the principle of the Monte-Carlo methods on an elementary example. Let
$$I = \int_{[0,1]^d} f(x)\,dx,$$
where $f(\cdot)$ is a bounded real valued function. Represent $I$ as $E(f(U))$, where $U$ is a uniformly distributed random variable on $[0,1]^d$. By the Strong Law of Large Numbers, if $(U_i, i \ge 1)$ is a family of uniformly distributed independent random variables on $[0,1]^d$, then the average
$$S_N = \frac{1}{N}\sum_{i=1}^{N} f(U_i) \tag{1.1}$$
converges to $E(f(U))$ almost surely when $N$ tends to infinity. This suggests a very simple algorithm to approximate $I$: call a random number generator $N$ times and compute the average (1.1). Observe that the method converges for any integrable function on $[0,1]^d$: $f$ is not necessarily a smooth function.
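As an illustration, the following minimal sketch (Python with NumPy, a choice made here only for illustration; the integrand `f` is a hypothetical example, not taken from the text) implements the crude estimator (1.1):

```python
import numpy as np

rng = np.random.default_rng(0)

def f(u):
    # hypothetical bounded integrand on [0,1]^d
    return np.cos(u.sum(axis=1))

d, N = 40, 100_000
U = rng.random((N, d))      # N independent uniform points in [0,1]^d
S_N = f(U).mean()           # the average (1.1), approximating the integral I
print(S_N)
```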

In order to use the above Monte-Carlo method efficiently, we need to know its rate of convergence and to determine when it is more efficient than deterministic algorithms. The Central Limit Theorem provides the asymptotic distribution of $\sqrt{N}(S_N - I)$ when $N$ tends to $+\infty$. Various refinements of the Central Limit Theorem, such as the Berry-Esseen and Bikelis theorems, provide non-asymptotic estimates.

The preceding consideration shows that the convergence rate of a Monte-Carlo method is rather slow ($1/\sqrt{N}$). Moreover, the approximation error is random and may take large values even if $N$ is large (however, the probability of such an event tends to 0 when $N$ tends to infinity). Nevertheless, Monte-Carlo methods are useful in practice. For instance, consider an integral in a hypercube $[0,1]^d$, with $d$ large ($d = 40$, e.g.). It is clear that quadrature methods require too many points (the number of points increases exponentially with the dimension of the space). Low discrepancy sequences are efficient for moderate values of $d$ but this efficiency decreases drastically when $d$ becomes large (the discrepancy behaves like $C(d)\frac{\log^d(N)}{N}$, where the constant $C(d)$ may be extremely large). A Monte-Carlo method does not have such disadvantages: it requires the simulation of independent random vectors $(X_1,\ldots,X_d)$, whose coordinates are independent. Thus, compared to the one-dimensional situation, the number of trials is multiplied by $d$ only and the method remains tractable even when $d$ is large. In addition, another advantage of Monte-Carlo methods is their parallel nature: each processor of a parallel computer can be assigned the task of making a random trial.

To summarize the preceding discussion: probabilistic algorithms are used in situations where deterministic methods are inefficient, especially when the dimension of the state space is very large. Obviously, the approximation error is random and the rate of convergence is slow, but in these cases a Monte-Carlo method is often the best method known.


1.1 On the convergence rate of Monte-Carlo methods

In this section we present results which justify the use of Monte-Carlo methods and help to choose the appropriate number of simulations $N$ of a Monte-Carlo method in terms of the desired accuracy and the confidence interval on the accuracy.

Theorem 1.1.1 (Strong Law of Large Numbers). Let $(X_i, i \ge 1)$ be a sequence of independent identically distributed random variables such that $E(|X_1|) < +\infty$. Then one has:
$$\lim_{n\to+\infty}\frac{1}{n}(X_1 + \cdots + X_n) = E(X_1) \quad \text{a.s.}$$

Remark 1.1.1. The random variable $X_1$ needs to be integrable. Therefore the Strong Law of Large Numbers does not apply when $X_1$ is Cauchy distributed, that is, when its density is $\frac{1}{\pi(1+x^2)}$.

Convergence rate. We now seek estimates on the error
$$\varepsilon_n = E(X) - \frac{1}{n}(X_1 + \cdots + X_n).$$
The Central Limit Theorem gives the asymptotic distribution of $\sqrt{n}\,\varepsilon_n$.

Theorem 1.1.2 (Central Limit Theorem). Let $(X_i, i \ge 1)$ be a sequence of independent identically distributed random variables such that $E(X_1^2) < +\infty$. Let $\sigma^2$ denote the variance of $X_1$, that is
$$\sigma^2 = E(X_1^2) - E(X_1)^2 = E\left((X_1 - E(X_1))^2\right).$$
Then
$$\frac{\sqrt{n}}{\sigma}\,\varepsilon_n \ \text{converges in distribution to } G,$$
where $G$ is a Gaussian random variable with mean 0 and variance 1.

Remark 1.1.2. From this theorem it follows that for all $c_1 < c_2$
$$\lim_{n\to+\infty} P\left(\frac{\sigma}{\sqrt{n}}c_1 \le \varepsilon_n \le \frac{\sigma}{\sqrt{n}}c_2\right) = \int_{c_1}^{c_2} e^{-\frac{x^2}{2}}\,\frac{dx}{\sqrt{2\pi}}.$$
In practice, one applies the following approximate rule: for $n$ large enough, the law of $\varepsilon_n$ is that of a Gaussian random variable with mean 0 and variance $\sigma^2/n$.

Note that it is impossible to bound the error, since the support of any (non-degenerate) Gaussian random variable is $\mathbb{R}$. Nevertheless the preceding rule allows one to define a confidence interval: for instance, observe that
$$P(|G| \le 1.96) \approx 0.95.$$
Therefore, with a probability close to 0.95, for $n$ large enough, one has
$$|\varepsilon_n| \le 1.96\,\frac{\sigma}{\sqrt{n}}.$$


How to estimate the variance. The previous result shows that it is crucial to estimate the standard deviation $\sigma$ of the random variable. It is easy to do this using the same samples as for the expectation. Let $X$ be a random variable, and $(X_1,\ldots,X_N)$ a sample drawn along the law of $X$. We will denote by $\bar{X}_N$ the Monte-Carlo estimator of $E(X)$ given by
$$\bar{X}_N = \frac{1}{N}\sum_{i=1}^{N} X_i.$$
A standard estimator for the variance is given by
$$\sigma_N^2 = \frac{1}{N-1}\sum_{i=1}^{N}\left(X_i - \bar{X}_N\right)^2;$$
$\sigma_N^2$ is often called the empirical variance of the sample. Note that $\sigma_N^2$ can be rewritten as
$$\sigma_N^2 = \frac{N}{N-1}\left(\frac{1}{N}\sum_{i=1}^{N} X_i^2 - \bar{X}_N^2\right).$$
From this last formula, it is obvious that $\bar{X}_N$ and $\sigma_N^2$ can be computed using only $\sum_{i=1}^{N} X_i$ and $\sum_{i=1}^{N} X_i^2$.

Moreover, one can prove, when $E(X^2) < +\infty$, that $\lim_{N\to+\infty}\sigma_N^2 = \sigma^2$ almost surely, and that $E(\sigma_N^2) = \sigma^2$ (the estimator is unbiased). This leads to an (approximate) confidence interval by replacing $\sigma$ by $\sigma_N$ in the standard confidence interval. With a probability near 0.95, $E(X)$ belongs to the (random) interval given by
$$\left[\bar{X}_N - \frac{1.96\,\sigma_N}{\sqrt{N}},\ \bar{X}_N + \frac{1.96\,\sigma_N}{\sqrt{N}}\right].$$
So, with very little additional computation (we only have to compute $\sigma_N$ on a sample already drawn), we can give a reasonable estimate of the error made by approximating $E(X)$ with $\bar{X}_N$. The possibility of giving an error estimate at a small numerical cost is a very useful feature of Monte-Carlo methods.
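A minimal sketch of this error estimate (Python/NumPy; the lognormal random variable used for $X$ below is only an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(1)

N = 100_000
X = np.exp(rng.standard_normal(N))        # illustrative samples drawn along the law of X

X_bar = X.mean()                          # Monte-Carlo estimator of E(X)
sigma_N = X.std(ddof=1)                   # square root of the empirical variance (unbiased)
half_width = 1.96 * sigma_N / np.sqrt(N)  # half-width of the 95% confidence interval

print(f"E(X) lies in [{X_bar - half_width:.4f}, {X_bar + half_width:.4f}] with probability close to 0.95")
```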

1.2 Simulation methods of classical laws

The aim of this section is to give a short introduction to sampling methods used in finance. Our aim is not to be exhaustive on this broad subject (for this we refer to, e.g., [Devroye(1986)]) but to describe methods needed for the simulation of random variables widely used in finance. Thus we concentrate on Gaussian random variables and Gaussian vectors.

1.2.1 Simulation of the uniform law

In this subsection we present basic algorithms producing sequences of "pseudo random numbers", whose statistical properties mimic those of sequences of independent and identically uniformly distributed random variables. For a recent survey on random generators see, for instance, [L'Ecuyer(1990)] and for a mathematical treatment of these problems, see Niederreiter [Niederreiter(1995)] and the references therein. To generate a deterministic sequence which "looks like" independent random variables uniformly distributed on $[0,1]$, the simplest (and the most widely used) methods are congruential methods. They are defined through four integers $a$, $b$, $m$ and $U_0$. The integer $U_0$ is the seed of the generator, $m$ is the order of the congruence, $a$ is the multiplicative term. A pseudo random sequence is obtained from the following inductive formula:
$$U_n = (aU_{n-1} + b) \pmod{m}.$$
In practice, the seed is set to $U_0$ at the beginning of a program and must never be changed inside the program.
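A minimal sketch of such a congruential generator (Python; the constants $a = 16807$, $b = 0$, $m = 2^{31}-1$ are the classical Park-Miller parameters, used here only as an illustration of one possible choice):

```python
def lcg(seed, a=16807, b=0, m=2**31 - 1):
    """Congruential generator U_n = (a*U_{n-1} + b) mod m, normalized to [0,1)."""
    u = seed
    while True:
        u = (a * u + b) % m
        yield u / m

gen = lcg(seed=12345)
print([next(gen) for _ in range(5)])   # five pseudo random numbers
```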

Observe that a pseudo random number generator consists of a completely deterministic algorithm. Such an algorithm produces sequences which statistically behave (almost) like sequences of independent and identically uniformly distributed random variables. There is no theoretical criterion which ensures that a pseudo random number generator is statistically acceptable. Such a property is established on the basis of empirical tests. For example, one builds a sample from successive calls to the generator, and one then applies the Chi-square test or the Kolmogorov-Smirnov test in order to test whether one can reasonably accept the hypothesis that the sample results from independent and uniformly distributed random variables. A generator is good when no severe test has rejected that hypothesis. Good choices for $a$, $b$, $m$ are given in [L'Ecuyer(1990)] and [Knuth(1998)]. The reader is also referred to the following web site entirely devoted to Monte-Carlo simulation: http://random.mat.sbg.ac.at/links/.

1.2.2 Simulation of some common laws of finance

We now explain the basic methods used to simulate laws in financial models.

Using the distribution function in simulation. The simplest method of simulation relies on the use of the distribution function.

Proposition 1.2.1. Let $X$ be a real random variable with strictly positive and continuous density $p_X(x)$. Let $F$ be its distribution function defined by
$$F(x) = P(X \le x).$$
Let $U$ be a random variable uniformly distributed on $[0,1]$. Then $X$ and $F^{-1}(U)$ have the same distribution function, that is to say, $X$ and $F^{-1}(U)$ have the same law.

Proof: Clearly, as $F^{-1}$ is strictly increasing, we have
$$P\left(F^{-1}(U) \le x\right) = P(U \le F(x)).$$
Now, as $U$ is uniformly distributed on $[0,1]$ and $F(x) \le 1$, we have $P\left(F^{-1}(U) \le x\right) = F(x)$. So $F^{-1}(U)$ and $X$ have the same distribution function and, hence, the same law.


Remark 1.2.2 (Simulation of an exponential law). This result can be extended to a general case (that is to say, a law which does not admit a density, or with a density not necessarily strictly positive). In this case we have to define the inverse $F^{-1}$ of the increasing function $F$ by
$$F^{-1}(u) = \inf\{x \in \mathbb{R},\ F(x) \ge u\}.$$
If we note that $F^{-1}(u) \le x$ if and only if $u \le F(x)$, the end of the previous proof remains the same.

Remark 1.2.3. Simulation of asset models with jumps uses random variables with exponential laws. The preceding proposition applies to the simulation of an exponential law of parameter $\lambda > 0$, whose density is given by
$$\lambda\exp(-\lambda x)\mathbf{1}_{\mathbb{R}_+}(x).$$
In this case, a simple computation leads to $F(x) = 1 - e^{-\lambda x}$, so the equation $F(x) = u$ can be solved as $x = -\frac{\log(1-u)}{\lambda}$. If $U$ follows a uniform distribution on $[0,1]$, then $-\frac{\log(1-U)}{\lambda}$ (or $-\frac{\log(U)}{\lambda}$) follows an exponential law with parameter $\lambda$.
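A minimal sketch of this inversion method for the exponential law (Python/NumPy):

```python
import numpy as np

rng = np.random.default_rng(2)

lam = 2.0                       # parameter lambda > 0
U = rng.random(100_000)         # uniform samples on [0,1)
X = -np.log(1.0 - U) / lam      # F^{-1}(U): exponential samples with parameter lambda

print(X.mean(), 1.0 / lam)      # the empirical mean should be close to 1/lambda
```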

Remark 1.2.4. This method can also be used to sample Gaussian random variables. Of course neither the distribution function nor its inverse are exactly known, but some rather good polynomial approximations can be found, e.g., in [Abramovitz and Stegun(1970)]. This method is numerically more complex than the Box-Muller method (see below) but can be used when using low discrepancy sequences to sample Gaussian random variables.

Conditional simulation using the distribution function. In stratification methods, described later in this chapter, it is necessary to sample a real random variable $X$, given that this random variable belongs to a given interval $]a,b]$. This can easily be done using the distribution function. Let $U$ be a random variable uniform on $[0,1]$, $F$ be the distribution function of $X$, $F(x) = P(X \le x)$, and $F^{-1}$ be its inverse. The law of $Y$ defined by
$$Y = F^{-1}\left(F(a) + (F(b) - F(a))U\right)$$
is equal to the conditional law of $X$ given that $X \in\, ]a,b]$. This can easily be proved by checking that the distribution function of $Y$ is equal to that of $X$ knowing that $X \in\, ]a,b]$.
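A minimal sketch of this conditional sampling, taking the standard Gaussian law as an example (Python with SciPy; norm.cdf and norm.ppf play the roles of $F$ and $F^{-1}$):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)

a, b = 0.5, 1.5                 # sample a standard Gaussian X conditioned on X in ]a, b]
U = rng.random(100_000)
Y = norm.ppf(norm.cdf(a) + (norm.cdf(b) - norm.cdf(a)) * U)

print(Y.min(), Y.max())         # all samples fall inside ]a, b]
```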

Gaussian Law. The Gaussian law with mean 0 and variance 1 on $\mathbb{R}$ is the law with density
$$\frac{1}{\sqrt{2\pi}}\exp\left(-\frac{x^2}{2}\right).$$
Therefore, the distribution function of the Gaussian random variable $X$ is given by
$$P(X \le z) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{z}\exp\left(-\frac{x^2}{2}\right)dx, \quad \forall z \in \mathbb{R}.$$
The most widely used simulation method for the Gaussian law is the Box-Muller method. This method is based upon the following result (see exercise 2 for a proof).


Proposition 1.2.5. Let $U$ and $V$ be two independent random variables which are uniformly distributed on $[0,1]$. Let $X$ and $Y$ be defined by
$$X = \sqrt{-2\log U}\,\sin(2\pi V),$$
$$Y = \sqrt{-2\log U}\,\cos(2\pi V).$$
Then $X$ and $Y$ are two independent Gaussian random variables with mean 0 and variance 1.

Of course, the method can be used to simulate $N$ independent realizations of the same real Gaussian law. The simulation of the first two realizations is performed by calling a random number generator twice and by computing $X$ and $Y$ as above. Then the generator is called twice more to compute the corresponding two new values of $X$ and $Y$, which provides two new realizations which are independent and mutually independent of the first two realizations, and so on.
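A minimal sketch of the Box-Muller method (Python/NumPy):

```python
import numpy as np

rng = np.random.default_rng(4)

n = 50_000
U = 1.0 - rng.random(n)         # uniform on (0,1], avoids log(0)
V = rng.random(n)

R = np.sqrt(-2.0 * np.log(U))
X = R * np.sin(2.0 * np.pi * V)
Y = R * np.cos(2.0 * np.pi * V)

G = np.concatenate([X, Y])      # 2n independent N(0,1) realizations
print(G.mean(), G.var())        # close to 0 and 1
```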

Simulation of a Gaussian vector. To simulate a Gaussian vector
$$X = (X_1,\ldots,X_d)$$
with zero mean and with a $d\times d$ covariance matrix $C = (c_{ij}, 1 \le i,j \le d)$, with $c_{ij} = E(X^i X^j)$, one can proceed as follows.

$C$ is a covariance matrix, so it is positive (since, for each $v \in \mathbb{R}^d$, $v.Cv = E((v.X)^2) \ge 0$). Standard results of linear algebra prove that there exists a $d\times d$ matrix $A$, called a square root of $C$, such that
$$AA^* = C,$$
where $A^*$ is the transposed matrix of $A = (a_{ij}, 1 \le i,j \le d)$.

Moreover, one can compute a square root of a given positive symmetric matrix by specifying that $a_{ij} = 0$ for $i < j$ (i.e. $A$ is a lower triangular matrix). Under this hypothesis, it is easy to see that $A$ is uniquely determined by the following algorithm:

$$a_{11} := \sqrt{c_{11}},$$
and, for $2 \le i \le d$,
$$a_{i1} := \frac{c_{i1}}{a_{11}};$$
then, for $i$ increasing from 2 to $d$,
$$a_{ii} := \sqrt{c_{ii} - \sum_{j=1}^{i-1}|a_{ij}|^2},$$
$$a_{ij} := \frac{c_{ij} - \sum_{k=1}^{j-1} a_{ik}a_{jk}}{a_{jj}} \quad \text{for } j < i \le d,$$
$$a_{ij} := 0 \quad \text{for } 1 \le i < j.$$


This way of computing a square root of a positive symmetric matrix is known as the Cholesky algorithm.

Now, if we assume that $G = (G_1,\ldots,G_d)$ is a vector of independent Gaussian random variables with mean 0 and variance 1 (which are easy to sample as we have already seen), one can check that $Y = AG$ is a Gaussian vector with mean 0 and with covariance matrix given by $AA^* = C$. As $X$ and $Y$ are two Gaussian vectors with the same mean and covariance matrix, the laws of $X$ and $Y$ are the same. This leads to the following simulation algorithm.

Simulate the vector $(G_1,\ldots,G_d)$ of independent Gaussian variables as explained above. Then return the vector $X = AG$.
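A minimal sketch of this algorithm (Python/NumPy; np.linalg.cholesky computes the lower triangular square root $A$ of $C$, and the covariance matrix below is an illustrative example):

```python
import numpy as np

rng = np.random.default_rng(5)

C = np.array([[1.0, 0.5],
              [0.5, 2.0]])        # an example positive definite covariance matrix
A = np.linalg.cholesky(C)         # lower triangular A with A A^T = C

n = 100_000
G = rng.standard_normal((2, n))   # independent N(0,1) coordinates
X = A @ G                         # Gaussian vectors with covariance C

print(np.cov(X))                  # empirical covariance matrix, close to C
```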

Discrete law. Consider a random variable $X$ taking values in a finite set $\{x_k, k = 1,\ldots,N\}$. The value $x_k$ is taken with probability $p_k$. To simulate the law of $X$, one simulates a random variable $U$ uniform on $[0,1]$. If the value $u$ of the trial satisfies
$$\sum_{j=0}^{k-1} p_j < u \le \sum_{j=0}^{k} p_j$$
(with the convention $p_0 = 0$), one decides to return the value $x_k$. Clearly the random variable obtained by using this procedure follows the same law as $X$.
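A minimal sketch of this procedure (Python/NumPy; np.searchsorted returns the index $k$ whose cumulative sums bracket $u$; the values and probabilities are illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)

x = np.array([-1.0, 0.0, 2.5])    # values x_k
p = np.array([0.2, 0.5, 0.3])     # probabilities p_k

cum = np.cumsum(p)                # p_1, p_1+p_2, ..., 1
U = rng.random(100_000)
samples = x[np.searchsorted(cum, U)]

print([(samples == v).mean() for v in x])   # empirical frequencies, close to p
```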

Bibliographic remark. A very complete discussion on the simulation of non-uniform random variables can be found in [Devroye(1986)], and results and discussion on the construction of pseudo-random sequences in Knuth [Knuth(1998)].

[Ripley(2006)], [Rubinstein(1981)] and [Hammersley and Handscomb(1979)] are reference books on simulation methods. See also the survey paper by Niederreiter [Niederreiter(1995)] and the references therein, in particular those which concern nonlinear random number generators.

1.3 Variance Reduction

We have shown in the preceding section that the ratio $\sigma/\sqrt{N}$ governs the accuracy of a Monte-Carlo method with $N$ simulations. An obvious consequence of this fact is that it is always worthwhile to rewrite the quantity to compute as the expectation of a random variable which has a smaller variance: this is the basic idea of variance reduction techniques. For complements, we refer the reader to [Kalos and Whitlock(2008)], [Hammersley and Handscomb(1979)], [Rubinstein(1981)] or [Ripley(2006)].

Suppose that we want to evaluate $E(X)$. We try to find an alternative representation for this expectation as
$$E(X) = E(Y) + C,$$
using a random variable $Y$ with lower variance and $C$ a known constant. A lot of techniques are known in order to implement this idea. This paragraph gives an introduction to some standard methods.

1.3.1 Control variates

The basic idea of control variates is to write $E(f(X))$ as
$$E(f(X)) = E(f(X) - h(X)) + E(h(X)),$$
where $E(h(X))$ can be explicitly computed and $\mathrm{Var}(f(X) - h(X))$ is smaller than $\mathrm{Var}(f(X))$. In these circumstances, we use a Monte-Carlo method to estimate $E(f(X) - h(X))$, and we add the value of $E(h(X))$. Let us illustrate this principle by several financial examples.

Using the call-put arbitrage formula for variance reduction. Let $S_t$ be the price at time $t$ of a given asset and denote by $C$ the price of the European call option
$$C = E\left(e^{-rT}(S_T - K)_+\right),$$
and by $P$ the price of the European put option
$$P = E\left(e^{-rT}(K - S_T)_+\right).$$
There exists a relation between the price of the put and the call which does not depend on the model for the price of the asset, namely, the "call-put arbitrage formula":
$$C - P = E\left(e^{-rT}(S_T - K)\right) = S_0 - Ke^{-rT}.$$
This formula (easily proved using linearity of the expectation) can be used to reduce the variance of a call option since
$$C = E\left(e^{-rT}(K - S_T)_+\right) + S_0 - Ke^{-rT}.$$

The Monte-Carlo computation of the call is then reduced to the computation of the put option.
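A minimal sketch of this control variate in the Black-Scholes model (Python/NumPy; the parameter values are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(7)

S0, K, r, sigma, T = 100.0, 100.0, 0.05, 0.2, 1.0   # illustrative parameters
N = 100_000

G = rng.standard_normal(N)
ST = S0 * np.exp((r - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * G)

call = np.exp(-r * T) * np.maximum(ST - K, 0.0)          # direct estimator of C
put = np.exp(-r * T) * np.maximum(K - ST, 0.0)           # estimator of P
call_via_put = put + S0 - K * np.exp(-r * T)             # C = P + S0 - K e^{-rT}

print("direct :", call.mean(), "+/-", 1.96 * call.std(ddof=1) / np.sqrt(N))
print("via put:", call_via_put.mean(), "+/-", 1.96 * call_via_put.std(ddof=1) / np.sqrt(N))
```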

Remark 1.3.1. For the Black-Scholes model, explicit formulas for the variances of the put and the call options can be obtained. In most cases, the variance of the put option is smaller than the variance of the call since the payoff of the put is bounded whereas the payoff of the call is not. Thus, one should compute put option prices even when one needs a call price.

Remark 1.3.2. Observe that call-put relations can also be obtained for Asian options or basket options.

For example, for Asian options, set $\bar{S}_T = \frac{1}{T}\int_0^T S_s\,ds$. We have:
$$E\left(\left(\bar{S}_T - K\right)_+\right) - E\left(\left(K - \bar{S}_T\right)_+\right) = E\left(\bar{S}_T\right) - K,$$
and, in the Black-Scholes model,
$$E\left(\bar{S}_T\right) = \frac{1}{T}\int_0^T E(S_s)\,ds = \frac{1}{T}\int_0^T S_0 e^{rs}\,ds = S_0\,\frac{e^{rT}-1}{rT}.$$


Basket options. A very similar idea can be used for pricing basket options. Assume that, for $i = 1,\ldots,d$,
$$S^i_T = x_i\,e^{\left(r - \frac{1}{2}\sum_{j=1}^{p}\sigma_{ij}^2\right)T + \sum_{j=1}^{p}\sigma_{ij}W^j_T},$$
where $W^1,\ldots,W^p$ are independent Brownian motions. Let $a_i$, $1 \le i \le d$, be positive real numbers such that $a_1 + \cdots + a_d = 1$. We want to compute a put option on a basket
$$E\left((K - X)_+\right),$$
where $X = a_1 S^1_T + \cdots + a_d S^d_T$. The idea is to approximate
$$\frac{X}{m} = \frac{a_1 x_1}{m}\,e^{rT + \sum_{j=1}^{p}\sigma_{1j}W^j_T} + \cdots + \frac{a_d x_d}{m}\,e^{rT + \sum_{j=1}^{p}\sigma_{dj}W^j_T},$$
where $m = a_1 x_1 + \cdots + a_d x_d$, by $Y/m$, where $Y$ is the log-normal random variable
$$Y = m\,e^{\sum_{i=1}^{d}\frac{a_i x_i}{m}\left(rT + \sum_{j=1}^{p}\sigma_{ij}W^j_T\right)}.$$
As we can compute an explicit formula for
$$E\left[(K - Y)_+\right],$$
one can use the control variate $Z = (K - Y)_+$ and sample $(K - X)_+ - (K - Y)_+$.

1.3.2 Importance sampling

Importance sampling is another variance reduction procedure. It is obtained by changing the sampling law.

We start by introducing this method in a very simple context. Suppose we want to compute

E(g(X)),

$X$ being a random variable with density $f(x)$ on $\mathbb{R}$; then
$$E(g(X)) = \int_{\mathbb{R}} g(x)f(x)\,dx.$$
Let $\tilde{f}$ be another density such that $\tilde{f}(x) > 0$ and $\int_{\mathbb{R}}\tilde{f}(x)\,dx = 1$. Clearly one can write $E(g(X))$ as
$$E(g(X)) = \int_{\mathbb{R}}\frac{g(x)f(x)}{\tilde{f}(x)}\,\tilde{f}(x)\,dx = E\left(\frac{g(Y)f(Y)}{\tilde{f}(Y)}\right),$$
where $Y$ has density $\tilde{f}(x)$ under $P$. We can thus approximate $E(g(X))$ by
$$\frac{1}{n}\left(\frac{g(Y_1)f(Y_1)}{\tilde{f}(Y_1)} + \cdots + \frac{g(Y_n)f(Y_n)}{\tilde{f}(Y_n)}\right),$$


where $(Y_1,\ldots,Y_n)$ are independent copies of $Y$. Set $Z = g(Y)f(Y)/\tilde{f}(Y)$. We have decreased the variance of the simulation if $\mathrm{Var}(Z) < \mathrm{Var}(g(X))$. It is easy to compute the variance of $Z$ as
$$\mathrm{Var}(Z) = \int_{\mathbb{R}}\frac{g^2(x)f^2(x)}{\tilde{f}(x)}\,dx - E(g(X))^2.$$
From this and an easy computation it follows that if $g(x) > 0$ and $\tilde{f}(x) = g(x)f(x)/E(g(X))$ then $\mathrm{Var}(Z) = 0$! Of course this result cannot be used in practice as it relies on the exact knowledge of $E(g(X))$, which is exactly what we want to compute. Nevertheless, it leads to a heuristic approach: choose $\tilde{f}(x)$ as a good approximation of $|g(x)f(x)|$ such that $\tilde{f}(x)/\int_{\mathbb{R}}\tilde{f}(x)\,dx$ can be sampled easily.

An elementary financial example. Suppose that $G$ is a Gaussian random variable with mean zero and unit variance, and that we want to compute
$$E(\phi(G)),$$
for some function $\phi$. We choose to sample the law of $\tilde{G} = G + m$, $m$ being a real constant to be determined carefully. We have:
$$E(\phi(G)) = E\left(\phi(\tilde{G})\,\frac{f(\tilde{G})}{\tilde{f}(\tilde{G})}\right) = E\left(\phi(\tilde{G})\,e^{-m\tilde{G} + \frac{m^2}{2}}\right).$$
This equality can be rewritten as
$$E(\phi(G)) = E\left(\phi(G + m)\,e^{-mG - \frac{m^2}{2}}\right).$$
Suppose we want to compute a European call option in the Black and Scholes model; we have
$$\phi(G) = \left(\lambda e^{\sigma G} - K\right)_+,$$
and assume that $\lambda \ll K$. In this case, $P(\lambda e^{\sigma G} > K)$ is very small and it is unlikely that the option will be exercised. This fact can lead to a very large error in a standard Monte-Carlo method. In order to increase the exercise probability, we can use the previous equality
$$E\left(\left(\lambda e^{\sigma G} - K\right)_+\right) = E\left(\left(\lambda e^{\sigma(G+m)} - K\right)_+ e^{-mG - \frac{m^2}{2}}\right),$$
and choose $m = m_0$ with $\lambda e^{\sigma m_0} = K$, since
$$P\left(\lambda e^{\sigma(G + m_0)} > K\right) = \frac{1}{2}.$$
This choice of $m$ is certainly not optimal; however it drastically improves the efficiency of the Monte-Carlo method when $\lambda \ll K$ (see exercise 4 for a mathematical hint of this fact).
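A minimal sketch of this importance sampling estimator (Python/NumPy; the values of lambda, sigma and K are illustrative, with lambda much smaller than K):

```python
import numpy as np

rng = np.random.default_rng(8)

lam, sigma, K = 1.0, 0.3, 5.0     # illustrative parameters, lambda << K
N = 100_000
G = rng.standard_normal(N)

payoff = lambda g: np.maximum(lam * np.exp(sigma * g) - K, 0.0)

crude = payoff(G)                                        # standard estimator
m0 = np.log(K / lam) / sigma                             # shift with lambda*exp(sigma*m0) = K
shifted = payoff(G + m0) * np.exp(-m0 * G - 0.5 * m0**2)

print("crude  :", crude.mean(), "std", crude.std(ddof=1))
print("shifted:", shifted.mean(), "std", shifted.std(ddof=1))
```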


The multidimensional case. Monte-Carlo simulations are really useful for problems in large dimension, and thus we have to extend the previous method to a multidimensional setting. The ideas of this section come from [Glasserman et al.(1999)].

Let us start by considering the pricing of index options. Let $\sigma$ be an $n\times d$ matrix and $(W_t, t \ge 0)$ a $d$-dimensional Brownian motion. Denote by $(S_t, t \ge 0)$ the solution of
$$dS^i_t = S^i_t\left(r\,dt + [\sigma\,dW_t]_i\right), \quad 1 \le i \le n,$$
where $[\sigma\,dW_t]_i = \sum_{j=1}^{d}\sigma_{ij}\,dW^j_t$. Moreover, denote by $I_t$ the value of the index
$$I_t = \sum_{i=1}^{n} a_i S^i_t,$$
where $a_1,\ldots,a_n$ is a given set of positive numbers such that $\sum_{i=1}^{n} a_i = 1$. Suppose that we want to compute the price of a European call option with payoff at time $T$ given by
$$h = (I_T - K)_+.$$
As
$$S^i_T = S^i_0\exp\left(\left(r - \frac{1}{2}\sum_{j=1}^{d}\sigma_{ij}^2\right)T + \sum_{j=1}^{d}\sigma_{ij}W^j_T\right),$$
there exists a function $\phi$ such that
$$h = \phi(G_1,\ldots,G_d),$$
where $G_j = W^j_T/\sqrt{T}$. The price of this option can be rewritten as
$$E(\phi(G)),$$

where $G = (G_1,\ldots,G_d)$ is a $d$-dimensional Gaussian vector with unit covariance matrix. As in the one-dimensional case, it is easy (by a change of variable) to prove that, if $m = (m_1,\ldots,m_d)$,
$$E(\phi(G)) = E\left(\phi(G+m)\,e^{-m.G - \frac{|m|^2}{2}}\right), \tag{1.2}$$
where $m.G = \sum_{i=1}^{d} m_i G_i$ and $|m|^2 = \sum_{i=1}^{d} m_i^2$. In view of (1.2), the variance $V(m)$ of the random variable
$$X_m = \phi(G+m)\,e^{-m.G - \frac{|m|^2}{2}}$$
is
$$\begin{aligned}
V(m) &= E\left(\phi^2(G+m)\,e^{-2m.G - |m|^2}\right) - E(\phi(G))^2\\
&= E\left(\phi^2(G+m)\,e^{-m.(G+m) + \frac{|m|^2}{2}}\,e^{-m.G - \frac{|m|^2}{2}}\right) - E(\phi(G))^2\\
&= E\left(\phi^2(G)\,e^{-m.G + \frac{|m|^2}{2}}\right) - E(\phi(G))^2.
\end{aligned}$$


Exercise 5 provides an example of the use of this formula to reduce variance. The reader is referred to [Glasserman et al.(1999)] for an almost optimal way to choose the parameter $m$ based on this representation.

1.3.3 Antithetic variables

The use of antithetic variables is widespread in Monte-Carlo simulation. This technique is often efficient but its gains are less dramatic than those of other variance reduction techniques.

We begin by considering a simple and instructive example. Let
$$I = \int_0^1 g(x)\,dx.$$
If $U$ follows a uniform law on the interval $[0,1]$, then $1-U$ has the same law as $U$, and thus
$$I = \frac{1}{2}\int_0^1\left(g(x) + g(1-x)\right)dx = E\left(\frac{1}{2}\left(g(U) + g(1-U)\right)\right).$$
Therefore one can draw $n$ independent random variables $U_1,\ldots,U_n$ following a uniform law on $[0,1]$, and approximate $I$ by
$$I_{2n} = \frac{1}{n}\left(\frac{1}{2}(g(U_1) + g(1-U_1)) + \cdots + \frac{1}{2}(g(U_n) + g(1-U_n))\right) = \frac{1}{2n}\left(g(U_1) + g(1-U_1) + \cdots + g(U_n) + g(1-U_n)\right).$$

We need to compare the efficiency of this Monte-Carlo method with the standard one with $2n$ drawings
$$I^0_{2n} = \frac{1}{2n}\left(g(U_1) + g(U_2) + \cdots + g(U_{2n-1}) + g(U_{2n})\right) = \frac{1}{n}\left(\frac{1}{2}(g(U_1) + g(U_2)) + \cdots + \frac{1}{2}(g(U_{2n-1}) + g(U_{2n}))\right).$$
We will now compare the variances of $I_{2n}$ and $I^0_{2n}$. Observe that in doing this we assume that most of the numerical work lies in the evaluation of $g$ and that the time devoted to the simulation of the random variables is negligible. This is often a realistic assumption.

An easy computation shows that the variance of the standard estimator is
$$\mathrm{Var}\left(I^0_{2n}\right) = \frac{1}{2n}\,\mathrm{Var}(g(U_1)),$$
whereas
$$\begin{aligned}
\mathrm{Var}(I_{2n}) &= \frac{1}{n}\,\mathrm{Var}\left(\frac{1}{2}(g(U_1) + g(1-U_1))\right)\\
&= \frac{1}{4n}\left(\mathrm{Var}(g(U_1)) + \mathrm{Var}(g(1-U_1)) + 2\,\mathrm{Cov}(g(U_1),g(1-U_1))\right)\\
&= \frac{1}{2n}\left(\mathrm{Var}(g(U_1)) + \mathrm{Cov}(g(U_1),g(1-U_1))\right).
\end{aligned}$$
Obviously, $\mathrm{Var}(I_{2n}) \le \mathrm{Var}(I^0_{2n})$ if and only if $\mathrm{Cov}(g(U_1),g(1-U_1)) \le 0$. One can prove that if $g$ is a monotonic function this is always true (see exercise 6 for a proof), and thus the Monte-Carlo method using antithetic variables is better than the standard one.


These ideas can be generalized to dimension greater than 1, in which case we use the transformation
$$(U_1,\ldots,U_d) \to (1-U_1,\ldots,1-U_d).$$
More generally, if $X$ is a random variable taking its values in $\mathbb{R}^d$ and $T$ is a transformation of $\mathbb{R}^d$ such that the law of $T(X)$ is the same as the law of $X$, we can construct an antithetic method using the equality
$$E(g(X)) = \frac{1}{2}E\left(g(X) + g(T(X))\right).$$
Namely, if $(X_1,\ldots,X_n)$ are independent and sampled along the law of $X$, we can consider the estimator
$$I_{2n} = \frac{1}{2n}\left(g(X_1) + g(T(X_1)) + \cdots + g(X_n) + g(T(X_n))\right)$$
and compare it to
$$I^0_{2n} = \frac{1}{2n}\left(g(X_1) + g(X_2) + \cdots + g(X_{2n-1}) + g(X_{2n})\right).$$
The same computations as before prove that the estimator $I_{2n}$ is better than the crude one if and only if $\mathrm{Cov}(g(X), g(T(X))) \le 0$. We now show a few elementary examples in finance.

A toy financial example. Let $G$ be a standard Gaussian random variable and consider the call option
$$E\left(\left(\lambda e^{\sigma G} - K\right)_+\right).$$
Clearly the law of $-G$ is the same as the law of $G$, and thus the function $T$ to be considered is $T(x) = -x$. As the payoff is increasing as a function of $G$, the following antithetic estimator certainly reduces the variance:
$$I_{2n} = \frac{1}{2n}\left(g(G_1) + g(-G_1) + \cdots + g(G_n) + g(-G_n)\right),$$
where $g(x) = \left(\lambda e^{\sigma x} - K\right)_+$.
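A minimal sketch of this antithetic estimator (Python/NumPy; the parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(9)

lam, sigma, K = 100.0, 0.2, 100.0
g = lambda x: np.maximum(lam * np.exp(sigma * x) - K, 0.0)

n = 50_000
G = rng.standard_normal(n)

pair_means = 0.5 * (g(G) + g(-G))           # averages over antithetic pairs
crude = g(rng.standard_normal(2 * n))       # crude estimator with 2n drawings

print("antithetic:", pair_means.mean(), "estimator variance ~", pair_means.var(ddof=1) / n)
print("crude     :", crude.mean(), "estimator variance ~", crude.var(ddof=1) / (2 * n))
```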

Antithetic variables for path-dependent options. Consider the path-dependent option with payoff at time $T$
$$\psi(S_s, s \le T),$$
where $(S_t, t \ge 0)$ is the lognormal diffusion
$$S_t = x\exp\left(\left(r - \frac{\sigma^2}{2}\right)t + \sigma W_t\right).$$
As the law of $(-W_t, t \ge 0)$ is the same as the law of $(W_t, t \ge 0)$, one has
$$E\left(\psi\left(x\exp\left(\left(r - \frac{\sigma^2}{2}\right)s + \sigma W_s\right), s \le T\right)\right) = E\left(\psi\left(x\exp\left(\left(r - \frac{\sigma^2}{2}\right)s - \sigma W_s\right), s \le T\right)\right),$$
and, for appropriate functionals $\psi$, the antithetic variable method may be efficient.


1.3.4 Stratification methods

These methods are widely used in statistics (see [Cochran(1953)]). Assume that we want to compute the expectation
$$I = E(g(X)) = \int_{\mathbb{R}^d} g(x)f(x)\,dx,$$
where $X$ is an $\mathbb{R}^d$-valued random variable with density $f(x)$. Let $(D_i, 1 \le i \le m)$ be a partition of $\mathbb{R}^d$. $I$ can be expressed as
$$I = \sum_{i=1}^{m} E\left(\mathbf{1}_{\{X \in D_i\}}\,g(X)\right) = \sum_{i=1}^{m} E(g(X)\,|\,X \in D_i)\,P(X \in D_i),$$
where
$$E(g(X)\,|\,X \in D_i) = \frac{E\left(\mathbf{1}_{\{X \in D_i\}}\,g(X)\right)}{P(X \in D_i)}.$$
Note that $E(g(X)\,|\,X \in D_i)$ can be interpreted as $E(g(X^i))$, where $X^i$ is a random variable whose law is the law of $X$ conditioned on $X$ belonging to $D_i$, that is, with density
$$\frac{1}{\int_{D_i} f(y)\,dy}\,\mathbf{1}_{\{x \in D_i\}}\,f(x)\,dx.$$

Remark 1.3.3. The random variable $X^i$ is easily simulated using an acceptance-rejection procedure. But this method is clearly inefficient when $P(X \in D_i)$ is small.

When the numbers $p_i = P(X \in D_i)$ can be explicitly computed, one can use a Monte-Carlo method to approximate each conditional expectation $I_i = E(g(X)\,|\,X \in D_i)$ by
$$\hat{I}_i = \frac{1}{n_i}\left(g(X^i_1) + \cdots + g(X^i_{n_i})\right),$$
where $(X^i_1,\ldots,X^i_{n_i})$ are independent copies of $X^i$. An estimator $\hat{I}$ of $I$ is then
$$\hat{I} = \sum_{i=1}^{m} p_i\,\hat{I}_i.$$
Of course the samples used to compute the $\hat{I}_i$ are supposed to be independent, and so the variance of $\hat{I}$ is
$$\sum_{i=1}^{m} p_i^2\,\frac{\sigma_i^2}{n_i},$$
where $\sigma_i^2$ is the variance of $g(X^i)$.

Fix the total number of simulations $\sum_{i=1}^{m} n_i = n$. To minimize the variance above, one must choose
$$n_i = n\,\frac{p_i\sigma_i}{\sum_{i=1}^{m} p_i\sigma_i}.$$


For these values of $n_i$, the variance of $\hat{I}$ is given by
$$\frac{1}{n}\left(\sum_{i=1}^{m} p_i\sigma_i\right)^2.$$

Note that this variance is smaller than the one obtained without stratification. Indeed,
$$\begin{aligned}
\mathrm{Var}(g(X)) &= E(g(X)^2) - E(g(X))^2\\
&= \sum_{i=1}^{m} p_i\,E\left(g^2(X)\,|\,X \in D_i\right) - \left(\sum_{i=1}^{m} p_i\,E(g(X)\,|\,X \in D_i)\right)^2\\
&= \sum_{i=1}^{m} p_i\,\mathrm{Var}(g(X)\,|\,X \in D_i) + \sum_{i=1}^{m} p_i\,E(g(X)\,|\,X \in D_i)^2 - \left(\sum_{i=1}^{m} p_i\,E(g(X)\,|\,X \in D_i)\right)^2.
\end{aligned}$$
Using the convexity inequality for $x^2$ we obtain $\left(\sum_{i=1}^{m} p_i a_i\right)^2 \le \sum_{i=1}^{m} p_i a_i^2$ if $\sum_{i=1}^{m} p_i = 1$, and the inequality
$$\mathrm{Var}(g(X)) \ge \sum_{i=1}^{m} p_i\,\mathrm{Var}(g(X)\,|\,X \in D_i) \ge \left(\sum_{i=1}^{m} p_i\sigma_i\right)^2$$
follows.

Remark 1.3.4. The optimal stratification involves the $\sigma_i$'s, which are seldom explicitly known. So one needs to estimate these $\sigma_i$'s by Monte-Carlo simulations.

Moreover, note that arbitrary choices of $n_i$ may increase the variance. A common way to circumvent this difficulty is to choose
$$n_i = n\,p_i.$$
The corresponding variance
$$\frac{1}{n}\sum_{i=1}^{m} p_i\sigma_i^2$$
is always smaller than the original one as $\sum_{i=1}^{m} p_i\sigma_i^2 \le \mathrm{Var}(g(X))$. This choice is often made when the probabilities $p_i$ can be computed. For more considerations on the choice of the $n_i$ and also for hints on suitable choices of the sets $D_i$, see [Cochran(1953)].
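A minimal sketch of stratified sampling with the proportional allocation $n_i = n p_i$, using $m$ equiprobable strata of a standard Gaussian variable and the conditional simulation method of Section 1.2.2 (Python/SciPy; the payoff g is a hypothetical example):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(10)

g = lambda x: np.maximum(np.exp(0.2 * x) - 1.0, 0.0)   # hypothetical payoff

m, n = 20, 100_000
n_i = n // m                                  # proportional allocation, p_i = 1/m
estimate = 0.0
for i in range(m):
    U = rng.random(n_i)
    X_i = norm.ppf((i + U) / m)               # Gaussian conditioned on the i-th stratum
    estimate += (1.0 / m) * g(X_i).mean()     # p_i times the estimate of E(g(X) | stratum i)

print("stratified:", estimate)
print("crude     :", g(rng.standard_normal(n)).mean())
```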

A toy example in finance. In the standard Black and Scholes model the price of a call option is
$$E\left(\left(\lambda e^{\sigma G} - K\right)_+\right).$$
It is natural to use the following strata for $G$: either $G \le d = \frac{\log(K/\lambda)}{\sigma}$ or $G > d$. Of course the variance of the stratum $G \le d$ is equal to zero, so if you follow the optimal choice of the $n_i$, you do not have to simulate points in this stratum: all points have to be sampled in the stratum $G > d$! This can easily be done by using the (numerical) inverse of the distribution function of a Gaussian random variable.

Of course, one does not need Monte-Carlo methods to compute call options in the Black and Scholes model; we now consider a more convincing example.

Basket options. Most of what follows comes from [Glasserman et al.(1999)]. The computation of a European basket option in a multidimensional Black-Scholes model can be expressed as
$$E(h(G)),$$
for some function $h$ and for $G = (G_1,\ldots,G_n)$ a vector of independent standard Gaussian random variables. Choose a vector $u \in \mathbb{R}^n$ such that $|u| = 1$ (note that $\langle u,G\rangle = u_1 G_1 + \cdots + u_n G_n$ is also a standard Gaussian random variable). Then choose a partition $(B_i, 1 \le i \le n)$ of $\mathbb{R}$ such that
$$P(\langle u,G\rangle \in B_i) = P(G_1 \in B_i) = 1/n.$$
This can be done by setting
$$B_i = \left]N^{-1}((i-1)/n),\,N^{-1}(i/n)\right],$$
where $N$ is the distribution function of a standard Gaussian random variable and $N^{-1}$ is its inverse. We then define the strata by setting
$$D_i = \{\langle u,x\rangle \in B_i\}.$$

In order to implement our stratification method we need to solve two simulation problems:

• sample a Gaussian random variable $\langle u,G\rangle$ given that $\langle u,G\rangle$ belongs to $B_i$,

• sample a new vector $G$ knowing the value of $\langle u,G\rangle$.

The first problem is easily solved since the law of
$$N^{-1}\left(\frac{i-1}{n} + \frac{U}{n}\right) \tag{1.3}$$
is precisely the law of a standard Gaussian random variable conditioned to be in $B_i$ (see the conditional simulation method described in Section 1.2.2). To solve the second point, observe that
$$G - \langle u,G\rangle\,u$$
is a Gaussian vector independent of $\langle u,G\rangle$ with covariance matrix $I - u\otimes u'$ (where $u\otimes u'$ denotes the matrix defined by $(u\otimes u')_{ij} = u_i u_j$). Let $Y$ be a copy of the vector $G$. Obviously $Y - \langle u,Y\rangle\,u$ is independent of $G$ and has the same law as $G - \langle u,G\rangle\,u$. So
$$G = \langle u,G\rangle\,u + G - \langle u,G\rangle\,u \quad \text{and} \quad \langle u,G\rangle\,u + Y - \langle u,Y\rangle\,u$$
have the same probability law. This leads to the following simulation method of $G$ given $\langle u,G\rangle = \lambda$:


• sample $n$ independent standard Gaussian random variables $Y^i$,

• set $G = \lambda u + Y - \langle u,Y\rangle\,u$.

To make this method efficient, the choice of the vector $u$ is crucial: an almost optimal way to choose the vector $u$ can be found in [Glasserman et al.(1999)].

1.3.5 Mean value or conditioning

This method uses the well known fact that conditioning reduces the variance. Indeed, for any square integrable random variable $Z$, we have
$$E(Z) = E(E(Z|Y)),$$
where $Y$ is any random variable defined on the same probability space as $Z$. It is well known that $E(Z|Y)$ can be written as
$$E(Z|Y) = \phi(Y),$$
for some measurable function $\phi$. Suppose in addition that $Z$ is square integrable. As the conditional expectation is an $L^2$ projection,
$$E(\phi(Y)^2) \le E(Z^2),$$
and thus $\mathrm{Var}(\phi(Y)) \le \mathrm{Var}(Z)$.

Of course the practical efficiency of simulating $\phi(Y)$ instead of $Z$ heavily relies on an explicit formula for the function $\phi$. This can be achieved when $Z = f(X,Y)$, where $X$ and $Y$ are independent random variables. In this case, we have
$$E(f(X,Y)|Y) = \phi(Y),$$
where $\phi(y) = E(f(X,y))$.

A basic example. Suppose that we want to compute $P(X \le Y)$ where $X$ and $Y$ are independent random variables. This situation occurs in finance, when one computes the hedge of an exchange option (or the price of a digital exchange option).

Using the preceding, we have
$$P(X \le Y) = E(F(Y)),$$
where $F$ is the distribution function of $X$. The variance reduction can be significant, especially when the probability $P(X \le Y)$ is small.
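A minimal sketch of this conditioning (Python/SciPy; $X$ and $Y$ are chosen Gaussian with illustrative parameters, so that $F$ is norm.cdf):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(11)

N = 100_000
X = rng.standard_normal(N)          # X ~ N(0,1)
Y = rng.standard_normal(N) - 3.0    # Y ~ N(-3,1), so P(X <= Y) is small

crude = (X <= Y).astype(float)      # standard estimator of P(X <= Y)
conditioned = norm.cdf(Y)           # E(1_{X<=Y} | Y) = F(Y)

print("crude      :", crude.mean(), "std", crude.std(ddof=1))
print("conditioned:", conditioned.mean(), "std", conditioned.std(ddof=1))
```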


1.4 Low discrepancy sequences

Using sequences of points "more regular" than random points may sometimes improve Monte-Carlo methods. We look for deterministic sequences $(x_i, i \ge 1)$ such that
$$\int_{[0,1]^d} f(x)\,dx \approx \frac{1}{n}\left(f(x_1) + \cdots + f(x_n)\right),$$
for all functions $f$ in a large enough set. When the considered sequence is deterministic, the method is called a quasi Monte-Carlo method. One can find sequences such that the speed of convergence of the previous approximation is of the order $K\frac{\log(n)^d}{n}$ (when the function $f$ is regular enough). Such a sequence is called a "low discrepancy sequence".

We now give a mathematical definition of a uniformly distributed sequence. In this definition the notion of discrepancy is involved. By definition, if $x$ and $y$ are two points in $[0,1]^d$, $x \le y$ if and only if $x_i \le y_i$ for all $1 \le i \le d$.

Definition 1.4.1. A sequence $(x_n)_{n\ge 1}$ is said to be uniformly distributed on $[0,1]^d$ if one of the following equivalent properties is fulfilled:

1. For all $y = (y_1,\ldots,y_d) \in [0,1]^d$:
$$\lim_{n\to+\infty}\frac{1}{n}\sum_{k=1}^{n}\mathbf{1}_{\{x_k \in [0,y]\}} = \prod_{i=1}^{d} y_i = \mathrm{Volume}([0,y]),$$
where $[0,y] = \{z \in [0,1]^d,\ 0 \le z \le y\}$.

2. Let
$$D_n^*(x) = \sup_{y\in[0,1]^d}\left|\frac{1}{n}\sum_{k=1}^{n}\mathbf{1}_{\{x_k\in[0,y]\}} - \mathrm{Volume}([0,y])\right|$$
be the discrepancy of the sequence; then $\lim_{n\to+\infty} D_n^*(x) = 0$.

3. For every bounded continuous function $f$ on $[0,1]^d$,
$$\lim_{n\to+\infty}\frac{1}{n}\sum_{k=1}^{n} f(x_k) = \int_{[0,1]^d} f(x)\,dx.$$

Remark 1.4.1.
• If $(U_n)_{n\ge 1}$ is a sequence of independent random variables with uniform law on $[0,1]$, the random sequence $(U_n(\omega), n \ge 1)$ is almost surely uniformly distributed. Moreover, we have an iterated logarithm law for the discrepancy, namely,
$$\limsup_{n}\sqrt{\frac{2n}{\log(\log n)}}\,D_n^*(U) = 1 \quad \text{a.s.}$$


• The discrepancy of any infinite sequence satisfies the following property:
$$D_n^* > C_d\,\frac{(\log n)^{\max\left(\frac{d-1}{2},\,1\right)}}{n} \quad \text{for an infinite number of values of } n,$$
where $C_d$ is a constant which depends on $d$ only. This result is known as the Roth theorem (see [Roth(1954)]).

• It is possible to construct $d$-dimensional sequences with discrepancies bounded by $(\log n)^d/n$. We will see later in this section some examples of such sequences. Note that, by the Roth theorem, these sequences are almost optimal. These sequences are, in principle, asymptotically better than random numbers.

In practice we use a number of drawings between $10^3$ and $10^8$ and, in this case, the best known sequences are not clearly better than random numbers in terms of discrepancy. This is especially true in large dimension (greater than 100).

The discrepancy allows one to give an estimate of the approximation error
$$\frac{1}{n}\sum_{k=1}^{n} f(x_k) - \int_{[0,1]^d} f(x)\,dx,$$
when $f$ has finite variation in the sense of Hardy and Krause. This estimate is known as the Koksma-Hlawka inequality.

Proposition 1.4.2 (Koksma-Hlawka inequality). Let $g$ be a function of finite variation in the sense of Hardy and Krause and denote by $V(g)$ its variation. Then for $N \ge 1$
$$\left|\frac{1}{N}\sum_{k=1}^{N} g(x_k) - \int_{[0,1]^d} g(u)\,du\right| \le V(g)\,D_N^*(x).$$

Remark 1.4.3. This result is very different from the central limit theorem used for random sequences, which leads to a confidence interval for a given probability. Here, the estimate is deterministic. This can be seen as a useful property of low discrepancy sequences, but the estimate involves $V(g)$ and $D_N^*(x)$, and both of these quantities are extremely hard to estimate in practice. So, in most cases, the theorem gives a large overestimation of the real error (very often, too large to be useful).

For a general definition of functions of finite variation in the sense of Hardy and Krause see [Niederreiter(1992)]. In dimension 1, this notion coincides with the notion of a function with finite variation in the classical sense. In dimension $d$, when $g$ is $d$ times continuously differentiable, the variation $V(g)$ is given by
$$\sum_{k=1}^{d}\ \sum_{1\le i_1<\cdots<i_k\le d}\ \int_{\{x\in[0,1]^d,\ x_j = 1 \text{ for } j \ne i_1,\ldots,i_k\}}\left|\frac{\partial^k g(x)}{\partial x_{i_1}\cdots\partial x_{i_k}}\right|dx_{i_1}\ldots dx_{i_k}.$$


When the dimension $d$ increases, a function with finite variation has to be smoother. For instance, the set function $\mathbf{1}_{\{f(x_1,\ldots,x_d)>\lambda\}}$ has infinite variation when $d \ge 2$. Moreover, most of the standard option payoffs for basket options, such as
$$(a_1 x_1 + \cdots + a_d x_d - K)_+ \quad \text{or} \quad (K - (a_1 x_1 + \cdots + a_d x_d))_+,$$
do not have finite variation when $d \ge 3$ (see [Kasas(2000)] for a proof).

Note that the efficiency of a low discrepancy method depends not only on the representation of the expectation, but also on the way the random variable is simulated. Moreover, the method chosen can lead to functions with infinite variation, even when the variance is bounded.

For instance, assume that we want to compute $E(f(G))$, where $G$ is a real random variable and $f$ is a function such that $\mathrm{Var}(f(G)) < +\infty$, $f$ is increasing, $f(-\infty) = 0$ and $f(+\infty) = +\infty$. Assume that we simulate along the law of $G$ using the inverse of the distribution function, denoted by $N(x)$. For the sake of simplicity, we will assume that $N$ is differentiable and strictly increasing. If $U$ is a random variable drawn uniformly on $[0,1]$, we have
$$E(f(G)) = E\left(f(N^{-1}(U))\right) = E(g(U)).$$
In order to use the Koksma-Hlawka inequality we need to compute the variation of $g$. But
$$V(g) = \int_0^1 |g'|(u)\,du = \int_0^1 f'(N^{-1}(u))\,dN^{-1}(u) = \int_{\mathbb{R}} f'(x)\,dx = f(+\infty) - f(-\infty) = +\infty.$$
An example in finance is given by the call option where
$$f(G) = \left(\lambda e^{\sigma G} - K\right)_+,$$
and $G$ is a standard Gaussian random variable. Of course, it is easy in this case to solve this problem by first computing the price of the put option and then using the call-put arbitrage relation to retrieve the call price.

We will now give examples of some of the most widely used low discrepancy sequences in finance. For other examples and an exhaustive and rigorous presentation of this subject see [Niederreiter(1992)].

The Van Der Corput sequence. Let $p$ be an integer, $p \ge 2$, and $n$ a positive integer. We denote by $a_0, a_1, \ldots, a_r$ the $p$-adic decomposition of $n$, that is to say the unique set of integers $a_i$ such that $0 \le a_i < p$ for $0 \le i \le r$ and $a_r > 0$, with
$$n = a_0 + a_1 p + \cdots + a_r p^r.$$


Using standard notations, $n$ can be written as
$$n = a_r a_{r-1}\ldots a_1 a_0 \ \text{ in base } p.$$
The Van Der Corput sequence in base $p$ is given by
$$\phi_p(n) = \frac{a_0}{p} + \frac{a_1}{p^2} + \cdots + \frac{a_r}{p^{r+1}}.$$
The definition of $\phi_p(n)$ can be rewritten as follows:
$$\text{if } n = a_r a_{r-1}\ldots a_0, \text{ then } \phi_p(n) = 0.a_0 a_1\ldots a_r,$$
where $0.a_0 a_1\ldots a_r$ denotes the $p$-adic decomposition of a number.

Halton sequences. Halton sequences are multidimensional generalizations of the Van Der Corput sequence. Let $p_1,\ldots,p_d$ be the first $d$ prime numbers. The Halton sequence is defined by
$$x^d_n = \left(\phi_{p_1}(n),\ldots,\phi_{p_d}(n)\right) \tag{1.4}$$
for an integer $n$, where $\phi_{p_i}(n)$ is the Van Der Corput sequence in base $p_i$. One can prove that the discrepancy of a $d$-dimensional Halton sequence can be estimated by
$$D_n^* \le \frac{1}{n}\prod_{i=1}^{d}\frac{p_i\log(p_i n)}{\log(p_i)}.$$
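A minimal sketch of the Van Der Corput radical inverse and the corresponding Halton points (Python; the bases used are the first prime numbers):

```python
def van_der_corput(n, p):
    """phi_p(n): reverse the base-p digits of n behind the radix point."""
    phi, denom = 0.0, 1.0
    while n > 0:
        n, a = divmod(n, p)
        denom *= p
        phi += a / denom
    return phi

def halton(n, primes=(2, 3, 5)):
    """n-th point of the Halton sequence in dimension len(primes)."""
    return tuple(van_der_corput(n, p) for p in primes)

# first points of the 2-dimensional Halton sequence (bases 2 and 3)
print([halton(n, primes=(2, 3)) for n in range(1, 6)])
```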

Faure sequence. These sequences are defined in [Faure(1981)] and [Faure(1982)]. The Faure sequence in dimension $d$ is defined as follows. Let $r$ be an odd integer greater than $d$. Define a function $T$ on the set of numbers $x$ of the form
$$x = \sum_{k\ge 0}\frac{a_k}{r^{k+1}},$$
where the sum is finite and each $a_k$ belongs to $\{0,\ldots,r-1\}$, by
$$T(x) = \sum_{k\ge 0}\frac{b_k}{r^{k+1}},$$
where
$$b_k = \sum_{i\ge k}\binom{i}{k}a_i \mod r,$$
and $\binom{i}{k}$ denotes the binomial coefficients. The Faure sequence is then defined as follows:
$$x_n = \left(\phi_r(n-1),\ T(\phi_r(n-1)),\ \ldots,\ T^{d-1}(\phi_r(n-1))\right), \tag{1.5}$$
where $\phi_r(n)$ is the Van Der Corput sequence in base $r$. The discrepancy of this sequence is bounded by $C\,\frac{\log(n)^d}{n}$.


Irrational translation of the torus. These sequences are defined by
$$x_n = \left(\{n\alpha_1\},\ldots,\{n\alpha_d\}\right), \tag{1.6}$$
where $\{x\}$ is the fractional part of the number $x$ and $\alpha = (\alpha_1,\ldots,\alpha_d)$ is a vector of real numbers such that $(1,\alpha_1,\ldots,\alpha_d)$ is a free family over $\mathbb{Q}$. This is equivalent to saying that there is no linear relation with integer coefficients $(\lambda_i, i = 0,\ldots,d)$, not all zero, such that
$$\lambda_0 + \lambda_1\alpha_1 + \cdots + \lambda_d\alpha_d = 0.$$
Note that this condition implies that the $\alpha_i$ are irrational numbers. One convenient way to choose such a family is to define $\alpha$ by
$$\left(\sqrt{p_1},\ldots,\sqrt{p_d}\right),$$
where $p_1,\ldots,p_d$ are the $d$ first prime numbers. See [Pagès and Xiao(1997)] for numerical experiments on this sequence.

Sobol sequence ([Sobol’(1967)]) One of the most used low discrepancy sequences is theSobol sequence. This sequence uses the binary decomposition of a number n

n = ∑k≥1

ak(n)2k−1,

where the ak(n) ∈ 0,1. Note that ak(n) = 0, for k large enough.First choose a polynomial of degree q with coefficient in Z/2Z

P = α0 +α1X + · · ·+αqXq,

such that α0 = αq = 1. The polynomial P is supposed to be irreducible and primitive in Z/2Z.See [Roman(1992)] for definitions and appendix A.4 of this book for an algorithm for comput-ing such polynomials (a table of (some) irreducible polynomials is also available in this bookand algorithm for testing the primitivity of a polynomial is available in Maple).

Choose an arbitrary vector of (M1, . . . ,Mq)∈Nq, such that Mk is odd and less than 2k. DefineMn, for n > q by

Mn =⊕qi=12i

αiMn−i⊕Mk−q,

where ⊕ is defined bym⊕n = ∑

k≥0(ak(m) XOR ak(n))2k,

and XOR is the bitwise operator defined by

aXORb = (a+b) mod. 2.

A direction sequence (Vk,k ≥ 0) of real numbers is then defined by

Vk =Mk

2k ,

Page 28: Monte Carlo Methods and Stochastic Algorithmsbl/monte-carlo/poly.pdf · 1.1. ON THE CONVERGENCE RATE OF MONTE-CARLO METHODS 3 1.1 On the convergence rate of Monte-Carlo methods In

24 CHAPTER 1. MONTE-CARLO METHODS FOR OPTIONS

and a one dimensional Sobol sequence xn, by

xn =⊕k≥0ak(n)Vk,

if n=∑k≥1 ak(n)2k−1, A multidimensional sequence can be constructed by using different poly-nomials for each dimension.

A variant of the Sobol sequence can be defined using a "Gray code". For a given integer $n$, we can define the Gray code of $n$, $(b_k(n), k \ge 0)$, by the binary decomposition of $G(n) = n \oplus [n/2]$:
$$n \oplus [n/2] = \sum_{k\ge 0} b_k(n)2^k.$$
Note that the function $G$ is bijective from $\{0,\ldots,2^N - 1\}$ to itself. The main interest of Gray codes is that the binary representations of $G(n)$ and $G(n+1)$ differ in exactly one bit. The variant proposed by Antonov and Saleev (see [Antonov and Saleev(1980)]) is defined by
$$x_n = b_1(n)V_1 \oplus \cdots \oplus b_r(n)V_r.$$
For an exhaustive study of the Sobol sequence, see [Sobol'(1967)] and [Sobol'(1976)]. A program for generating Sobol sequences in small dimensions can be found in [Press et al.(1992)]; see also [Fox(1988)]. Empirical studies indicate that Sobol sequences are among the most efficient low discrepancy sequences (see [Fox et al.(1992)] and [Radovic et al.(1996)] for numerical comparisons of sequences).


1.5 Exercises and problems

1.5.1 Exercises

Exercise 1. Let $Z$ be a Gaussian random variable and $K$ a positive real number.

1. Let $d = \frac{E(Z) - \log(K)}{\sqrt{\mathrm{Var}(Z)}}$. Prove that
$$E\left(\mathbf{1}_{\{Z \ge \log(K)\}}\,e^Z\right) = e^{E(Z) + \frac{1}{2}\mathrm{Var}(Z)}\,N\left(d + \sqrt{\mathrm{Var}(Z)}\right).$$

2. Prove the formula (Black and Scholes formula)
$$E\left(\left(e^Z - K\right)_+\right) = e^{E(Z) + \frac{1}{2}\mathrm{Var}(Z)}\,N\left(d + \sqrt{\mathrm{Var}(Z)}\right) - K\,N(d).$$

Exercise 2. Let $X$ and $Y$ be two independent Gaussian random variables with mean 0 and variance 1.

1. Prove that, if $R = \sqrt{X^2 + Y^2}$ and $\theta$ is the polar angle, then $\theta/2\pi$ and $\exp(-R^2/2)$ are two independent random variables following a uniform law on $[0,1]$.

2. Using the previous result, deduce proposition 1.2.5.

Exercise 3. Consider the case of a European call in the Black and Scholes model with a stochastic interest rate. Suppose that the price of the stock is 1, and that the option price at time 0 is given by $E(Z)$ with $Z$ defined by
$$Z = e^{-\int_0^T r_\theta\,d\theta}\left[e^{\int_0^T r_\theta\,d\theta - \frac{\sigma^2}{2}T + \sigma W_T} - K\right]_+.$$

1. Prove that the variance of $Z$ is bounded by $E\left(e^{-\sigma^2 T + 2\sigma W_T}\right)$.

2. Prove that $E\left(e^{-\frac{1}{2}\gamma^2 T + \gamma W_T}\right) = 1$, and deduce an estimate for the variance of $Z$.

Exercise 4. Let $\lambda$ and $K$ be two real positive numbers such that $\lambda < K$, and let $X_m$ be the random variable
$$X_m = \left(\lambda e^{\sigma(G+m)} - K\right)_+ e^{-mG - \frac{m^2}{2}}.$$
We denote its variance by $\sigma^2_m$. Give an expression for the derivative of $\sigma^2_m$ with respect to $m$ as an expectation, then deduce that $\sigma^2_m$ is a decreasing function of $m$ when $m \le m_0 = \log(K/\lambda)/\sigma$.

Exercise 5. Let $G = (G_1,\ldots,G_d)$ be a $d$-dimensional Gaussian vector with covariance equal to the identity matrix. For each $m \in \mathbb{R}^d$, let $V(m)$ denote the variance of the random variable
$$X_m = \phi(G+m)\,e^{-m.G - \frac{|m|^2}{2}}.$$

1. Prove that
$$\frac{\partial V}{\partial m_i}(m) = E\left(\phi^2(G)\,e^{-m.G + \frac{|m|^2}{2}}\,(m_i - G_i)\right).$$

2. Assume that $\phi(G) = \left(\sum_{i=1}^{d}\lambda_i e^{\sigma_i G_i} - K\right)_+$, with $\lambda_i$, $\sigma_i$, $K$ real positive constants. Let $m_i(\alpha) = \alpha\lambda_i\sigma_i$. Prove that, if $\sum_{i=1}^{d}\lambda_i < K$, then
$$\frac{dV(m(\alpha))}{d\alpha} \le 0,$$
for $0 \le \alpha \le \frac{K - \sum_{i=1}^{d}\lambda_i}{\sum_{i=1}^{d}(\lambda_i\sigma_i)^2}$, and deduce that if
$$m^0_i = \sigma_i\lambda_i\,\frac{K - \sum_{i=1}^{d}\lambda_i}{\sum_{i=1}^{d}(\lambda_i\sigma_i)^2},$$
then $V(m^0) \le V(0)$.

Exercise 6. The aim of this exercise is to prove that the antithetic variable method decreases the variance for a function which is monotonic with respect to each of its arguments.

1. Let $f$ and $g$ be two increasing functions from $\mathbb{R}$ to $\mathbb{R}$. Prove that, if $X$ and $Y$ are two real random variables, then we have
$$E(f(X)g(X)) + E(f(Y)g(Y)) \ge E(f(X)g(Y)) + E(f(Y)g(X)).$$

2. Deduce that, if $X$ is a real random variable, then
$$E(f(X)g(X)) \ge E(f(X))\,E(g(X)).$$

3. Prove that if $X_1,\ldots,X_n$ are $n$ independent random variables, then
$$E\left(f(X_1,\ldots,X_n)g(X_1,\ldots,X_n)\,|\,X_n\right) = \phi(X_n),$$
where $\phi$ is a function which can be computed as an expectation.

4. Deduce from this property that if $f$ and $g$ are two functions increasing with respect to each of their arguments, then
$$E\left(f(X_1,\ldots,X_n)g(X_1,\ldots,X_n)\right) \ge E\left(f(X_1,\ldots,X_n)\right)E\left(g(X_1,\ldots,X_n)\right).$$

5. Let $h$ be a function from $[0,1]^n$ to $\mathbb{R}$ which is monotonic with respect to each of its arguments. Let $U_1,\ldots,U_n$ be $n$ independent random variables following the uniform law on $[0,1]$. Prove that
$$\mathrm{Cov}\left(h(U_1,\ldots,U_n),\,h(1-U_1,\ldots,1-U_n)\right) \le 0,$$
and deduce that in this case the antithetic variable method decreases the variance.


Exercise 7. Let $X$ and $Y$ be independent real random variables. Let $F$ and $G$ be the distribution functions of $X$ and $Y$ respectively. We want to compute by a Monte-Carlo method the probability
$$\theta = P(X + Y \le t).$$

1. Propose a variance reduction procedure using a conditioning method.

2. We assume that $F$ and $G$ are (at least numerically) easily invertible. Explain how to implement the antithetic variates method. Why does this method decrease the variance in this case?

3. Assume that $h$ is a function such that $\int_0^1 |h(s)|^2\,ds < +\infty$. Let $(U_i, i \ge 1)$ be a sequence of independent random variates with a uniform distribution on $[0,1]$. Prove that $\frac{1}{N}\sum_{i=1}^{N} h\left(\frac{i-1+U_i}{N}\right)$ has a lower variance than $\frac{1}{N}\sum_{i=1}^{N} h(U_i)$.

Exercise 8. Let $Z$ be a random variable given by
$$Z = \lambda_1 e^{\beta_1 X_1} + \lambda_2 e^{\beta_2 X_2},$$
where $(X_1,X_2)$ is a couple of real random variables and $\lambda_1$, $\lambda_2$, $\beta_1$ and $\beta_2$ are real positive numbers. This exercise studies various methods to compute the price of an index option given by $p = P(Z > t)$.

1. In this question, we assume that $(X_1,X_2)$ is a Gaussian vector with mean 0 such that $\mathrm{Var}(X_1) = \mathrm{Var}(X_2) = 1$ and $\mathrm{Cov}(X_1,X_2) = \rho$, with $|\rho| \le 1$. Explain how to simulate random samples along the law of $Z$. Describe a Monte-Carlo method to estimate $p$ and explain how to estimate the error of the method.

2. Explain how to use low discrepancy sequences to compute $p$.

3. We assume that $X_1$ and $X_2$ are two independent Gaussian random variables with mean 0 and variance 1. Let $m$ be a real number. Prove that $p$ can be written as
$$p = E\left[\phi(X_1,X_2)\,\mathbf{1}_{\{\lambda_1 e^{\beta_1(X_1+m)} + \lambda_2 e^{\beta_2(X_2+m)} \ge t\}}\right],$$
for some function $\phi$. How can we choose $m$ such that
$$P\left(\lambda_1 e^{\beta_1(X_1+m)} + \lambda_2 e^{\beta_2(X_2+m)} \ge t\right) \ge \frac{1}{4}\,?$$
Propose a new Monte-Carlo method to compute $p$. Explain how to check on the drawings that the method does reduce the variance.

4. Assume now that $X_1$ and $X_2$ are two independent random variables with distribution functions $F_1(x)$ and $F_2(x)$ respectively. Prove that
$$p = E\left[1 - G_2\left(t - \lambda_1 e^{\beta_1 X_1}\right)\right],$$
where $G_2(x)$ is a function such that the variance of $1 - G_2\left(t - \lambda_1 e^{\beta_1 X_1}\right)$ is always less than the variance of $\mathbf{1}_{\{\lambda_1 e^{\beta_1 X_1} + \lambda_2 e^{\beta_2 X_2} > t\}}$. Propose a new Monte-Carlo method to compute $p$.

5. We assume again that $(X_1,X_2)$ is a Gaussian vector with mean 0 and such that $\mathrm{Var}(X_1) = \mathrm{Var}(X_2) = 1$ and $\mathrm{Cov}(X_1,X_2) = \rho$, with $|\rho| \le 1$. Prove that $p = E\left[1 - F_2(\phi(X_1))\right]$, where $F_2$ is the distribution function of $X_2$ and $\phi$ a function to be computed. Deduce a variance reduction method for computing $p$.

Exercise 9. Let X be a standard Gaussian random variable.

1. For f a bounded function, set I = E(f(X)) and consider the estimators
I_n^1 = (1/2n)( f(X_1) + f(X_2) + … + f(X_{2n−1}) + f(X_{2n}) ),
I_n^2 = (1/2n)( f(X_1) + f(−X_1) + … + f(X_n) + f(−X_n) ),
where (X_n, n ≥ 1) is a sequence of independent random variables drawn along the law of X.
Identify the limit in law of the random variables √n (I_n^1 − I). Same question for the family of random variables √n (I_n^2 − I). Compute the variances of the limit laws.

2. How can the variances of the preceding limit laws be estimated from the sample (X_i, 1 ≤ i ≤ 2n) for I_n^1 and (X_i, 1 ≤ i ≤ n) for I_n^2? How can the error of a Monte-Carlo method based on I_n^1 or I_n^2 be evaluated?

3. Show that, if f is an increasing function, then Cov(f(X), f(−X)) ≤ 0. In this case, which estimator seems preferable, I_n^1 or I_n^2? Same question when f is decreasing.

Exercise 10. Let G be a standard Gaussian random variable.

1. Set L_m = exp(−mG − m²/2). Show that E(L_m f(G+m)) = E(f(G)) for every bounded function f.
Let X_m be another integrable random variable such that E(X_m f(G+m)) = E(f(G)) for every bounded function f. Show that E(X_m | G) = L_m.
In a simulation method, which representation of E(f(G)) is it better to use, E(X_m f(G+m)) or E(L_m f(G+m))?

2. Show that the variance of L_m f(G+m) can be written as
E( e^{−mG + m²/2} f²(G) ) − E(f(G))²,
and that the value of m which minimizes this variance is m = E(G f²(G)) / E(f²(G)). What is this optimal m when f(x) = x? Comment.

3. Let p_1 and p_2 be two positive numbers with sum 1 and m_1, m_2 two real numbers; set
l(g) = p_1 e^{m_1 g − m_1²/2} + p_2 e^{m_2 g − m_2²/2}.
For f measurable and bounded, set μ(f) = E(l(G) f(G)). Show that
μ(f) = ∫_R f(x) p(x) dx,
p being a density to be made explicit.

4. Propose a technique for simulating along the law with density p.

5. Assume now that G is a random variable with the preceding law. Show that
E( l^{−1}(G) f(G) ) = E(f(G)),
Var( l^{−1}(G) f(G) ) = E( l^{−1}(G) f²(G) ) − E(f(G))².

6. Consider the case p_1 = p_2 = 1/2, m_1 = −m_2 = m and f(x) = x. Show that
Var( l^{−1}(G) G ) = E( e^{m²/2} G² / cosh(mG) ).
Denote by v(m) this variance viewed as a function of m. Check that v′(0) = 0 and v″(0) < 0. How should m be chosen in order to reduce the variance in a computation of E(G)?

Exercise 11. Let X be a real random variable and Y a real control variate. Assume that E(X²) < +∞, E(Y²) < +∞ and E(Y) = 0.

1. For λ a real number, compute Var(X − λY) and the value λ* which minimizes this variance. Is there any point in assuming X and Y independent?

2. Assume that ((X_n, Y_n), n ≥ 0) is a sequence of independent random variables drawn along the law of the pair (X, Y). Define λ*_n by
λ*_n = ( ∑_{i=1}^n X_i Y_i − (1/n) ∑_{i=1}^n X_i ∑_{i=1}^n Y_i ) / ( ∑_{i=1}^n Y_i² − (1/n) (∑_{i=1}^n Y_i)² ).
Show that λ*_n tends almost surely to λ* as n tends to +∞.

3. Show that √n (λ*_n − λ*) Ȳ_n, where Ȳ_n = (Y_1 + … + Y_n)/n, tends to 0 almost surely.

4. Using Slutsky's theorem (see the next question), show that
√n ( (1/n)(X_1 − λ*_n Y_1 + … + X_n − λ*_n Y_n) − E(X) )
converges to a Gaussian law with variance Var(X − λ*Y). How should this result be interpreted for a Monte-Carlo method using λY as a control variate?

5. Prove Slutsky's theorem, that is: if X_n converges in law to X and Y_n converges in law to a constant a, then the pair (X_n, Y_n) converges in law to (X, a) (one may consider the characteristic function of the pair (X_n, Y_n)).
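The following sketch (Python with numpy assumed) illustrates questions 1-4 on a hypothetical pair X = exp(G), Y = G, for which E(Y) = 0; it estimates λ*_n empirically and compares the variances.

import numpy as np

# Sketch of Exercise 11: control variate with the empirically estimated
# coefficient lambda*_n.  The choice X = exp(G), Y = G is hypothetical,
# made only to illustrate the formulas (here E(Y) = 0 holds exactly).
rng = np.random.default_rng(0)
n = 100_000
g = rng.standard_normal(n)
x, y = np.exp(g), g

lam_n = (((x * y).sum() - x.sum() * y.sum() / n)
         / ((y**2).sum() - y.sum()**2 / n))       # empirical lambda*_n
est = (x - lam_n * y).mean()
var_cv = (x - lam_n * y).var()

print("plain MC estimate    ", x.mean(), " variance", x.var())
print("control variate est. ", est, " variance", var_cv)
# A 95% confidence interval follows from the CLT of question 4:
print("CI half-width        ", 1.96 * np.sqrt(var_cv / n))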


1.5.2 Problems

PROBLEM 1. Computing the price of a CDS by a Monte-Carlo method.
Consider a sequence of deterministic times 0 < T_1 < T_2 < … < T_n and an interest rate r ≥ 0. Let τ denote a random (default) time; we want to evaluate by a Monte-Carlo method the price of a CDS given by
P = E(H(τ))
where
H(τ) = R ∑_{i=1}^{n−1} e^{−rT_i} 1_{τ > T_i} − e^{−rτ} 1_{τ ≤ T_n}.

Deterministic intensity. Assume that (λ_s, s ≥ 0) is a continuous non-random function of s such that λ_s > 0 for every s ≥ 0, and that τ has the law
1_{t ≥ 0} e^{−∫_0^t λ_s ds} λ_t dt.

1. Identify the law of τ when λ_s does not depend on s. How can one then simulate a random variable along the law of τ? Show that P can then be computed by a simple formula.

2. Let ξ be an exponential random variable with parameter 1. Denote by Λ the function from R_+ to R_+ defined by
Λ(t) = ∫_0^t λ_s ds,
and by Λ^{−1} its inverse. Show that Λ^{−1}(ξ) has the same law as τ, and deduce a method of simulation along the law of τ.

3. Assuming that Λ and Λ^{−1} can be computed explicitly, propose a Monte-Carlo method to compute P. Explain how the resulting error can be estimated.

4. Assume that Λ(T_n) ≤ K. Show that
P = ( R ∑_{i=1}^{n−1} e^{−rT_i} ) P(ξ > K) + E( H(τ) | ξ ≤ K ) P(ξ ≤ K).
Explain how to simulate efficiently a random variable along the law of ξ conditional on the event {ξ ≤ K}. Deduce a variance reduction method for the computation of P.
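A minimal sketch of questions 2-3 is given below (Python with numpy assumed); the affine intensity λ_s = a + b s is a hypothetical choice for which Λ and Λ^{−1} are explicit.

import numpy as np

# Sketch of questions 2-3: simulate tau = Lambda^{-1}(xi) and estimate P.
# The intensity lambda_s = a + b*s is hypothetical; then
# Lambda(t) = a*t + b*t^2/2 and its inverse is explicit.
a, b, r, R = 0.02, 0.01, 0.03, 0.04
T = np.arange(1.0, 6.0)                    # payment dates T_1,...,T_n

def Lambda_inv(x):
    return (-a + np.sqrt(a * a + 2.0 * b * x)) / b

def H(tau):
    disc = np.exp(-r * T[:-1])
    prem = R * np.sum(disc * (tau[:, None] > T[:-1]), axis=1)
    return prem - np.exp(-r * tau) * (tau <= T[-1])

rng = np.random.default_rng(0)
M = 200_000
tau = Lambda_inv(rng.exponential(size=M))
sample = H(tau)
print("P ~", sample.mean(), "+/-", 1.96 * sample.std() / np.sqrt(M))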


Random intensity. In this paragraph we assume that λ_t is a random process given by λ_t = exp(X_t), where (X_t, t ≥ 0) is the solution of
dX_t = −c(X_t − α) dt + σ dW_t,  X_0 = x,
with c, σ, α given real numbers and (W_t, t ≥ 0) a Brownian motion. We set τ = Λ^{−1}(ξ), ξ being an exponential random variable with parameter 1 independent of W.

1. Compute E(λ_t). Propose a control variate for the computation of P. How can one check numerically that this control variate actually reduces the variance?

2. Compute
E( R ∑_{i=1}^{n−1} e^{−rT_i} 1_{τ > T_i} − e^{−rτ} 1_{τ ≤ T_n} | λ_t, t ≥ 0 ).
Deduce a simulation method which avoids simulating the random variable ξ, and show that it reduces the variance.

An importance function method.

1. Let β > 0 and set ξ_β = ξ/β. Show that, for every nonnegative measurable function f and every β > 0,
E(f(ξ)) = E( f(ξ_β) g_β(ξ_β) ),
g_β being a function to be made explicit in terms of β.

2. Deduce that for every β > 0,
P = E( g_β(ξ/β) H(Λ^{−1}(ξ/β)) )
with H(t) = R ∑_{i=1}^{n−1} e^{−rT_i} 1_{t > T_i} − e^{−rt} 1_{t ≤ T_n}.

3. Set X_β = g_β(ξ/β) H(Λ^{−1}(ξ/β)); show that, for every β > 0,
Var(X_β) = E( g_β(ξ) H²(Λ^{−1}(ξ)) ) − P².

4. Show that Var(X_β) < +∞ if β < 2 and that
Var(X_2) = (1/2) E( ∫_0^{+∞} H²(Λ^{−1}(t)) dt ) − P².

5. Show that β → Var(X_β) is a strictly convex function on the interval (0,2) and that
lim_{β→0+} Var(X_β) = +∞.
Admitting that lim_{t→+∞} Λ^{−1}(t) = +∞ almost surely, show that lim_{β→2−} Var(X_β) = +∞.

6. Which β is it desirable to choose in a Monte-Carlo method? Propose a method allowing to approximate this value.


PROBLEM 2. Biasing a uniform drawing by a β law.

The one-dimensional case. Let U be a random variable with the uniform law on the interval [0,1]. Consider the family of β(a,1) laws whose density is given, for a > 0, by
a u^{a−1} 1_{u ∈ [0,1]}.
Denote by V_a a random variable with the β(a,1) law.

1. Propose a simulation method for the β(a,1) law. For g a bounded function, how can E(g(V_a)) be estimated by a Monte-Carlo method? How can an order of magnitude of the error of this method be obtained?

2. Check that, if f is a function from R to R such that E(|f(U)|) < +∞, then for every a > 0,
E(f(U)) = E( f(V_a) / (a V_a^{a−1}) ).
How can this relation be used to compute E(f(U)) by a Monte-Carlo method? Which function of a must then be minimized in order to obtain an optimal method?

3. Assume that f is bounded. Show that, for every 0 < a < 2,
σ_a² = Var( f(V_a) / (a V_a^{a−1}) ) = E( f²(U) / (a U^{a−1}) ) − E(f(U))².

4. Using Fatou's lemma, show that, if P(f(U) ≠ 0) > 0, then lim_{a→0+} σ_a² = +∞; then that, if f is continuous at 0 with f(0) ≠ 0, lim_{a→2−} σ_a² = +∞.
In the sequel we assume that f is bounded, continuous at 0 and such that f(0) ≠ 0 (which implies P(f(U) ≠ 0) > 0).

5. Show that, for 0 < a < 2, σ_a² has derivatives of order 1 and 2 with respect to a which can be written in the form
dσ_a²/da = E( f²(U) g_1(a,U) ) and d²σ_a²/da² = E( f²(U) g_2(a,U) ),
g_1 and g_2 being functions from R_+* × [0,1] to R to be computed.

6. Deduce that σ_a² is a convex function on the interval (0,2) which attains its minimum at the point a which is the unique solution of the equation ψ(a) = 0, where
ψ(a) = E( ( f²(U) / (a² U^{a−1}) ) ( 1 − a |ln(U)| ) ).

7. Let (U_n, n ≥ 1) be a sequence of independent random variables with the uniform law on [0,1]. Define ψ_n(a), for a > 0, by
ψ_n(a) = (1/n) ∑_{i=1}^n ( f²(U_i) / (a² U_i^{a−1}) ) ( 1 − a |ln(U_i)| ).
Check that, for a ∈ (0,2), ψ_n(a) converges almost surely to ψ(a) as n tends to +∞.

8. Set τ = inf{ k ≥ 0, |f(U_k)| > 0 }. Check that P(τ < +∞) = 1. Show that, for n ≥ τ, ψ_n(a) is a continuous and increasing function of a satisfying ψ_n(0+) = −∞ and ψ_n(+∞) = +∞. Deduce that there exists a unique a_n > 0 such that ψ_n(a_n) = 0. Propose an algorithm to compute a_n and then (without proof) a method of approximation of a.
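A numerical sketch of questions 1-3 (Python with numpy assumed): V_a = U^{1/a} has the β(a,1) law, and the weighted estimator is compared with the crude one on a hypothetical function f concentrated near 0.

import numpy as np

# Sketch of questions 1-3.  V_a = U^(1/a) has density a*u^(a-1) on [0,1],
# and E(f(U)) = E( f(V_a) / (a V_a^(a-1)) ).  The function f below is a
# hypothetical example for which a value a < 1 reduces the variance.
def f(u):
    return (u < 0.05).astype(float)

rng = np.random.default_rng(0)
n, a = 200_000, 0.5
u = rng.uniform(size=n)

crude = f(u)
v = u ** (1.0 / a)                          # simulation of beta(a,1)
weighted = f(v) / (a * v ** (a - 1.0))      # importance-sampling weights

for name, s in [("crude", crude), ("beta-biased", weighted)]:
    print(name, "estimate", s.mean(), " variance", s.var())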

Le cas d’une dimension plus grande que 1. Pour d ≥ 1, on note U = (U1, . . . ,Ud) unevariable aléatoire de loi uniforme sur [0,1]d . Pour a > 0, on considère une famille de vecteursaléatoires Va = (V 1

a , . . . ,Vda ) composés de variables aléatoires indépendantes de loi β (a,1).

1. Montrer que, si f est une fonction de [0,1]d dans R telle que E(| f (U)|)<+∞, pour touta > 0, on peut écrire :

E( f (U)) = E(l(a,Va) f (Va)) ,

l étant une fonction de R+∗× [0,1]d dans R que l’on précisera.

2. Pour f bornée et 0 < a < 2, exprimer σ2a = Var (l(a,Va) f (Va)) à l’aide de la fonction l et

du vecteur aléatoire U .

3. Expliquer comment l’on a intérêt à choisir a pour mettre en oeuvre une méthode deMonte-Carlo.

On suppose que f est bornée, continue en 0 et telle que f (0) 6= 0. Vérifier que σ2a est

une fonction convexe, telle que σ20+ = +∞ et σ2

2− = +∞ Montrer que la a optimal est la

solution unique de dσ2a

da = 0. Montrer que cette condition peut s’écrire sous la forme

dσ2a

da= E(H(a,U)) = 0.

Proposer (sans preuve) une méthode d’approximation de a.


PROBLEM 3. "Predictable" Gaussian biasing.

Let G = (G_1, G_2, …, G_n) be a vector of independent standard Gaussian random variables. Consider n functions λ_i(g), i = 1, …, n, such that λ_i(g) = f_i(g_1, …, g_{i−1}), where the f_i are functions from R^{i−1} to R, assumed regular and bounded (f_1, and hence λ_1, is assumed constant).

Denote by R_λ the transformation from R^n to R^n which maps g to g′ = R_λ(g), whose coordinates are computed, for i increasing from 1 to n, by
g′_i = g_i + f_i(g′_1, …, g′_{i−1}) = g_i + λ_i(g′).

2. Show that R_λ is a bijective transformation from R^n to R^n; check that its inverse R_λ^{−1} is given by R_λ^{−1}(g′)_i = g′_i − λ_i(g′). Show that the determinant of the Jacobian matrix of R_λ^{−1} is identically equal to 1.

Define the random variable M_n^λ(G) by
M_n^λ(G) = exp( ∑_{i=1}^n λ_i(G) G_i − (1/2) ∑_{i=1}^n λ_i²(G) ).

3. Let F_i = σ(G_1, …, G_i). Show that, for i ≤ n,
E( exp( λ_i(G) G_i − (1/2) λ_i²(G) ) | F_{i−1} ) = 1.
Deduce that E(M_n^λ(G)) = 1.

4. Check that one defines a new probability P^{(λ)} by setting
P^{(λ)}(A) = E( M_n^λ(G) 1_A ),
and that (E^{(λ)} denoting the expectation under the probability P^{(λ)})
E^{(λ)}(X) = E( M_n^λ(G) X )
for every P^{(λ)}-integrable random variable X.

5. Using the change of variable g = R_λ^{−1}(g′), show that the law of G under P^{(λ)} is identical to that of R_λ(G) under P. Deduce a simulation method for the law of G under P^{(λ)}.

6. Write X_λ = φ(G)/M_n^λ(G). Check that E^{(λ)}(X_λ) = E(φ(G)) and show that the variance of X_λ under P^{(λ)} is given by
Var^{(λ)}(X_λ) = E( e^{−∑_{i=1}^n λ_i(G) G_i + (1/2) ∑_{i=1}^n λ_i²(G)} φ(G)² ) − E(φ(G))².

7. How can one simulate a random variable whose law is that of X_λ = φ(G)/M_n^λ(G) under P^{(λ)}? What is the interest of being able to represent E(φ(G)) in the form E^{(λ)}(X_λ)?

8. For α a given real number, set u(α) = Var^{(αλ)}(X_{αλ}). Show that u is a differentiable function and that
u′(α) = E( ( ∑_{i=1}^n α λ_i(G)² − ∑_{i=1}^n λ_i(G) G_i ) e^{−∑_{i=1}^n α λ_i(G) G_i + (1/2) ∑_{i=1}^n α² λ_i(G)²} φ(G)² ).
Note, in particular, that u′(0) = −E( ( ∑_{i=1}^n λ_i(G) G_i ) φ(G)² ).

9. Show that u is twice differentiable and that u″ can be written in the form
u″(α) = E( [ ∑_{i=1}^n λ_i²(G) + ( ∑_{i=1}^n α λ_i(G)² − ∑_{i=1}^n λ_i(G) G_i )² ] e^{−∑_{i=1}^n α λ_i(G) G_i + (1/2) ∑_{i=1}^n α² λ_i(G)²} φ(G)² ).

10. Assume, for the rest of this problem, that the function φ|λ| is nonzero on a set of positive Lebesgue measure. Show that u is strictly convex and that lim_{α→±∞} u(α) = +∞.

11. Show that, if u′(0) ≠ 0, there exists an α such that u(α) < u(0), and that, for this α, αλ is a choice which reduces the variance of the Monte-Carlo method.

12. Show, using the strict convexity of u, that if u′(0) = 0, then 0 achieves the minimum of u. Interpret this result.

13. Assume, for this question, that λ is a vector of even functions (that is, satisfying λ(−g) = λ(g) for every g), not identically zero, and that the function φ² is even. Show that, under these assumptions, λ does not allow the variance to be reduced.


PROBLEM 4. Directional simulation and stratification.

Let G = (G_1, …, G_d) be a vector of d independent standard Gaussian random variables and H a continuous function from R^d to R; we want to compute the probability
p = P( H(G) ≥ λ ),
for a given positive value λ.

1. Describe the usual Monte-Carlo method for estimating p and explain how the error of the method can be evaluated.

2. Suppose that the value of p to be approximated is of order 10^{−10}. Give a (rough) evaluation of the minimal number of drawings n needed to evaluate p within 20%. When each evaluation of H takes about one second, what do you conclude? (1 year ≈ 3×10^7 seconds.)

3. Write the vector G in the form G = RA, where R = |G| and A = G/|G| takes its values in the unit sphere of R^d, S_d = {x ∈ R^d, |x| = 1}. Show that R² follows a χ² law with d degrees of freedom and that A follows a rotation-invariant law supported on S_d (we admit that this implies that the law of A is the uniform law on the sphere).

4. Using the generalized polar change of variables (g_1, …, g_d) → (r, θ_1, …, θ_{d−1}):
g_1 = r cos θ_1
g_2 = r sin θ_1 cos θ_2
…
g_{d−1} = r sin θ_1 sin θ_2 … sin θ_{d−2} cos θ_{d−1}
g_d = r sin θ_1 sin θ_2 … sin θ_{d−2} sin θ_{d−1}
show that R and A are independent random variables.

5. Assume, from now on, that for every a ∈ S_d the function r → H(ra) is strictly increasing, that H(0) = 0 and that lim_{r→+∞} H(ra) = +∞.
Let a ∈ S_d and set φ(a) = P( H(Ra) ≥ λ ). Compute φ(a) explicitly in terms of the distribution function of the χ² law with d degrees of freedom and of the solution r* of H(r*a) = λ.

6. Let (A_i, i ≥ 0) be a sequence of independent random variables uniformly distributed on the unit sphere. Show that lim_{n→+∞} (1/n) ∑_{i=1}^n φ(A_i) = p, in the sense of almost sure convergence.

7. Show that Var(φ(A_1)) ≤ Var( 1_{H(G) ≥ λ} ) and compare the convergence speed of the estimator of question 1 with that of the previous question.

8. We shall now reduce the variance of the previous estimator by a stratification technique.
Consider the 2^d quadrants defined, for ε ∈ {−1,+1}^d, by
C_ε = { x ∈ S_d, ε_1 x_1 ≥ 0, …, ε_d x_d ≥ 0 }.
Show that P(A_1 ∈ C_ε ∩ C_{ε′}) = 0 for ε ≠ ε′. Deduce the value of ρ_ε = P(A_1 ∈ C_ε).

9. Consider the estimator I_n stratified over the strata (C_ε, ε ∈ {−1,+1}^d):
I_n = ∑_{ε ∈ {−1,+1}^d} (ρ_ε / n_ε) ∑_{k=1}^{n_ε} φ(A_k^{(ε)}),
where n_ε = n w_ε(n), with w_ε(n) ≥ 0 and ∑_{ε ∈ {−1,+1}^d} w_ε(n) = 1, and where the (A_k^{(ε)}, k ≥ 0) are sequences of independent random variables uniformly distributed on C_ε, these sequences being independent of one another.
Compute Var(I_n) in terms of n, of the (w_ε(n), ε ∈ {−1,+1}^d), of the (ρ_ε, ε ∈ {−1,+1}^d) and of the (σ_ε, ε ∈ {−1,+1}^d), where σ_ε² = Var(φ(A_1^{(ε)})).

10. Show the inequality
( ∑_{i=1}^N α_i a_i )² ≤ ∑_{i=1}^N α_i a_i²,
for (α_i, a_i, 1 ≤ i ≤ N) nonnegative reals such that ∑_{i=1}^N α_i = 1.
Deduce that Var(I_n) ≥ (1/n) ( ∑_{ε ∈ {−1,+1}^d} ρ_ε σ_ε )².
How can the (w_ε(n), ε ∈ {−1,+1}^d) be chosen so as to attain this lower bound?

11. Propose a simulation method for the uniform law on C_ε.

12. Fix once and for all the w_ε (independently of n) by
w_ε = ρ_ε σ_ε / ∑_{ε ∈ {−1,+1}^d} ρ_ε σ_ε,
and set n_ε = [n w_ε]. Show that, in the sense of almost sure convergence,
lim_{n→+∞} I_n = p.

13. Show that the limit in law of the random variables √n (I_n − p) is a centered Gaussian with variance ( ∑_{ε ∈ {−1,+1}^d} ρ_ε σ_ε )².
How can this result be used to estimate the error made when p is estimated by I_n?
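The directional estimator of questions 5-6 can be sketched as follows (Python with numpy and scipy assumed); the function H(g) = ∑_i w_i² g_i² is a hypothetical choice for which r* is explicit.

import numpy as np
from scipy.stats import chi2

# Sketch of questions 5-6.  H(g) = sum_i w_i^2 g_i^2 is hypothetical:
# r -> H(ra) is increasing, H(0) = 0, H(ra) -> infinity, and H(r*a) = lam
# is solved explicitly by r*^2 = lam / sum_i w_i^2 a_i^2.
d, lam = 5, 40.0
w = np.linspace(1.0, 2.0, d)

def phi(a):                                   # a: rows on the unit sphere
    r_star_sq = lam / np.sum((w * a) ** 2, axis=1)
    return 1.0 - chi2.cdf(r_star_sq, df=d)    # P(R^2 >= r*^2), R^2 ~ chi2_d

rng = np.random.default_rng(0)
n = 100_000
g = rng.standard_normal((n, d))
a = g / np.linalg.norm(g, axis=1, keepdims=True)   # uniform on the sphere

crude = (np.sum((w * g) ** 2, axis=1) >= lam).astype(float)
directional = phi(a)
print("crude      ", crude.mean(), " var", crude.var())
print("directional", directional.mean(), " var", directional.var())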


PROBLEM 5. The "Latin square" method.

The one-dimensional case. Consider a sequence U = (U_i, i ≥ 1) of independent random variables with the uniform law on [0,1], and a permutation σ drawn uniformly from the set of permutations of {1, …, N} and independent of the sequence U. For a fixed integer N and for 1 ≤ i ≤ N, set
Y_i^N = (i − U_i)/N and X_i^N = Y_{σ(i)}^N.
Consider a bounded measurable function f; we want to compare the classical estimator
I_0(N) = (1/N)( f(U_1) + … + f(U_N) )
with
I_1(N) = (1/N)( f(X_1^N) + … + f(X_N^N) ).

1. What is the limit of I_0(N) as N tends to +∞? How can the error made when this limit is approximated by I_0(N) be estimated?

2. For a fixed i, 1 ≤ i ≤ N, identify the law of X_i^N. Deduce that E(I_1(N)) = E(f(U_1)).

3. Show that I_1(N) = (1/N)( f(Y_1^N) + … + f(Y_N^N) ), and deduce that
N Var(I_1(N)) = ∫_0^1 f²(s) ds − N ∑_{k=1}^N ( ∫_{(k−1)/N}^{k/N} f(s) ds )².

4. Check that
N Var(I_1(N)) = (N/2) ∑_{i=1}^N ∫_{(i−1)/N}^{i/N} ∫_{(i−1)/N}^{i/N} ( f(s) − f(s′) )² ds ds′.
When f is a continuous function, show that N Var(I_1(N)) tends to 0 as N tends to +∞. What is the value of N Var(I_0(N))?

5. For f continuous, in what sense does I_1(N) converge to E(f(U))?

6. Which estimator, I_0(N) or I_1(N), seems preferable from the point of view of its rate of convergence in quadratic mean?
From a practical point of view, what seem to be the drawbacks of I_1(N) compared with I_0(N)?

7. How can the estimator I_1(N) be interpreted in terms of a stratification method?
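Before turning to dimension 2, here is a small numerical comparison of I_0(N) and I_1(N) (Python with numpy assumed; f is a hypothetical continuous function).

import numpy as np

# Sketch comparing the classical estimator I_0(N) with the shuffled
# stratified estimator I_1(N) of this problem.
def f(x):
    return np.cos(3 * x) + x

rng = np.random.default_rng(0)
N, n_rep = 200, 2000
i0, i1 = [], []
for _ in range(n_rep):
    u = rng.uniform(size=N)
    i0.append(f(u).mean())
    y = (np.arange(1, N + 1) - u) / N          # Y_i^N, one point per cell
    x = y[rng.permutation(N)]                  # X_i^N = Y_{sigma(i)}^N
    i1.append(f(x).mean())

i0, i1 = np.array(i0), np.array(i1)
print("I0: mean %.5f  N*var %.3e" % (i0.mean(), N * i0.var()))
print("I1: mean %.5f  N*var %.3e" % (i1.mean(), N * i1.var()))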

The two-dimensional case. We now consider the two-dimensional case. U denotes a sequence U = ((U_i^1, U_i^2), i ≥ 1) of independent random variables with the uniform law on [0,1]², and σ_1, σ_2 are two independent permutations drawn uniformly from the set of permutations of {1, …, N}, independent of the sequence U. For a fixed integer N and for 1 ≤ i, j ≤ N, set
Y_{i,j}^N = ( (i − U_i^1)/N, (j − U_j^2)/N ) and X_i^N = Y_{σ_1(i), σ_2(i)}^N.
Set, as in the first part, for f a bounded function from [0,1]² to R:
I_0(N) = (1/N)( f(U_1) + … + f(U_N) ) and I_1(N) = (1/N)( f(X_1^N) + … + f(X_N^N) ).

1. Show that
E( f(X_i^N) ) = (1/N²) ∑_{1≤k≤N, 1≤l≤N} E( f(Y_{k,l}^N) ) = E( f(U_1) ).

2. Show that, for i ≠ j,
E( f(X_i^N) f(X_j^N) ) = (1/(N²(N−1)²)) ∑_{1≤k_1≠k_2≤N, 1≤l_1≠l_2≤N} E( f(Y_{k_1,l_1}^N) ) E( f(Y_{k_2,l_2}^N) ),
and then that
E( f(X_i^N) f(X_j^N) ) = (1/(N²(N−1)²)) [ N⁴ E(f(U_1))² − ∑_{1≤k≤N, 1≤l_1,l_2≤N} E( f(Y_{k,l_1}^N) ) E( f(Y_{k,l_2}^N) ) − ∑_{1≤k_1,k_2≤N, 1≤l≤N} E( f(Y_{k_1,l}^N) ) E( f(Y_{k_2,l}^N) ) + ∑_{1≤k≤N, 1≤l≤N} E( f(Y_{k,l}^N) )² ].

3. Deduce that, for f continuous,
lim_{N→+∞} (N−1) Cov( f(X_1^N), f(X_2^N) ) = 2 E(f(U))² − ∫_{[0,1]³} f(x,y) f(x,y′) dx dy dy′ − ∫_{[0,1]³} f(x,y) f(x′,y) dx dx′ dy.

4. Show that
Var( E(f(U) | U^1) ) = ∫_{[0,1]³} f(x,y) f(x,y′) dx dy dy′ − E(f(U))²,
Var( E(f(U) | U^2) ) = ∫_{[0,1]³} f(x,y) f(x′,y) dx dx′ dy − E(f(U))².

5. Deduce that
lim_{N→+∞} N Var(I_1(N)) = Var(f(U)) − Var( E(f(U) | U^1) ) − Var( E(f(U) | U^2) ).

6. What is the interest of using I_1(N) rather than I_0(N)? What are the drawbacks?

7. Check that
Var( f(U) − E(f(U) | U^1) − E(f(U) | U^2) ) = Var(f(U)) − Var( E(f(U) | U^1) ) − Var( E(f(U) | U^2) ).


PROBLEM 6. Removing the bias of a sequence of estimators.

We are given a square-integrable random variable Y with values in R.

1. Assume first that the random variable Y can be simulated exactly. For a positive real ε, show that the number N(ε) of independent drawings of Y which allows one to obtain (with high probability) an approximation of E(Y) within an error of about ε grows like Cst/ε².

We now assume that Y cannot be simulated exactly, but only that there exists a sequence of square-integrable and easily simulable (1) random variables (Y_n, n ≥ 0) such that
lim_{n→+∞} E( (Y_n − Y)² ) = 0.   (1.7)

2. Show that the bias b_n = E(Y) − E(Y_n) tends to 0 as n tends to +∞. Check that lim_{n→+∞} E((Y_n − Y)Y) = 0. Deduce that lim_{n→+∞} E(Y_n²) = E(Y²) and conclude that lim_{n→+∞} Var(Y_n) = Var(Y).

3. We consider a case where an a priori estimate of the bias of the form |b_n| ≤ C/n^α holds, with α > 0. Choose n > 0 and a number of drawings N, (Y_n^1, …, Y_n^N), along the law of Y_n, and approximate E(Y) by Ȳ_{n,N} = (1/N) ∑_{i=1}^N Y_n^i. Show that, for N large, with probability close to 1,
| Ȳ_{n,N} − E(Y) | ≤ R(n,N) := C/n^α + 2K/√N,
K being a constant (an upper bound for the standard deviation of the Y_n).

4. We take R(n,N) as the measure of the error and assume that the cost of N drawings along the law of Y_n is given by Cost(n,N) = C_1 n N. A constraint is imposed on the total computational cost C_total, and we are interested in the minimal error R(n,N) achievable under the cost constraint Cost(n,N) = C_total, defined by
Error(C_total) = min_{n,N : Cost(n,N) = C_total} R(n,N).
Show that Error(C_total) = Cst × C_total^{−α/(2α+1)}. Deduce that the minimal cost needed to reach a precision ε grows like Cst × ε^{−(2α+1)/α}. Comparing with the result of question 1, explain how the presence of the bias degrades the performance of the simulation algorithm.

How to get rid of the bias? We shall see that one can construct, from the sequence (Y_n, n ≥ 0), a simulable and unbiased random variable Ŷ (that is, such that E(Ŷ) = E(Y)), at the price, however, of an increase of the variance.

To this end, consider a random variable τ with values in N*, independent of the sequence (Y_n, n ≥ 0), whose law is given by
P(τ = n) = p_n, with p_n > 0 for every n ≥ 1.   (1.8)

Write ΔY_n = Y_n − Y_{n−1} for n ≥ 1 (for n = 1, we set Y_0 = 0). Define Ŷ by
Ŷ = ΔY_τ / p_τ = ∑_{n≥1} (ΔY_n / p_n) 1_{τ=n}.

5. Show that, under hypothesis (1.8), E(|Ŷ|) = ∑_{n≥1} E(|ΔY_n|). Deduce that, when E(|Ŷ|) < +∞, one has E(Ŷ | Y_n, n ≥ 1) = Y and E(Ŷ) = E(Y).

6. Give an example of a deterministic sequence (ΔY_n, n ≥ 1) for which Y = ∑_{n≥1} ΔY_n converges but E(|Ŷ|) = +∞ for every choice of p satisfying hypothesis (1.8).

7. Show that
E(Ŷ²) = ∑_{n≥1} E((ΔY_n)²) / p_n,
and that, when E(Ŷ²) < +∞, Var(Ŷ) ≥ Var(Y).

8. We want to approximate E(Y) where Y = f(X_T) and Y_n = f(X_T^{(n)}), X being the solution of an SDE and X^{(n)} its approximation by an Euler scheme with step h_n = 1/2^n. Show (assuming the coefficients of the SDE and f as regular as needed) that
E( | f(X_T^{(n)}) − f(X_T) |² ) ≤ C ρ^n,
ρ being a positive real smaller than 1 to be made explicit. Deduce that E((ΔY_n)²) ≤ 4 C ρ^n.
Moreover, the cost of one simulation of Y_n is Cst × 2^n. Show that, neglecting the simulation time of τ, the average cost of one simulation of Ŷ is proportional to ∑_{n≥1} p_n 2^n.

9. Assume that, as in the previous example, E((ΔY_n)²) ≤ C ρ^n with ρ < 1 and that the cost of one simulation of Y_n is given by c α^n with α > 1.
Assume finally that the law of τ is a geometric law (2) with parameter q, 0 < q < 1 (write q′ = 1 − q). Check that, if q′ > ρ, then E(Ŷ²) < +∞ and
E(Ŷ²) ≤ C ρ q′ / ( q (q′ − ρ) ).
Moreover, show that the average cost of one simulation of Ŷ, C_avg = E(c α^τ), is finite when α q′ < 1, in which case it equals c α q / (1 − α q′).
Assuming (3) that the performance of the simulation is measured by the product C_avg × E(Ŷ²), show that one must have ρ α < 1 in order to hope for an efficient method and that, in this case, the best choice of q′ is q′ = √(ρ/α).

(1) This is the case, for instance, when a stochastic differential equation is discretized.
(2) i.e. p_n = q (1−q)^{n−1} for every n ≥ 1.
(3) See, for elements of justification, Glynn P.W., Whitt W., 1992, The asymptotic efficiency of simulation estimators, Oper. Res. 40(3):505-520.
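The unbiased estimator Ŷ = ΔY_τ/p_τ can be sketched as follows (Python with numpy assumed). The approximation sequence Y_n used here is a toy rounding scheme, not the Euler scheme of question 8; it is chosen so that E((Y_n − Y)²) ≤ C 4^{−n}, i.e. ρ = 1/4, and the geometric parameter is taken with q′ = 1/2 > ρ.

import numpy as np

rng = np.random.default_rng(0)

def Y_level(u, n):
    # Toy approximation sequence: Y_n = exp(U rounded down to a multiple of
    # 2^-n), so that Y_n -> Y = exp(U) and E((Y_n - Y)^2) <= C * 4^-n.
    return np.exp(np.floor(u * 2.0**n) / 2.0**n)

q = 0.5                                    # geometric parameter, q' = 0.5

def one_draw():
    u = rng.uniform()
    tau = rng.geometric(q)                 # P(tau = n) = q (1-q)^(n-1), n >= 1
    p_tau = q * (1.0 - q) ** (tau - 1)
    y_prev = 0.0 if tau == 1 else Y_level(u, tau - 1)   # Y_0 = 0 by convention
    return (Y_level(u, tau) - y_prev) / p_tau

sample = np.array([one_draw() for _ in range(100_000)])
exact = np.exp(1.0) - 1.0                  # E(exp(U)) for U uniform on [0,1]
print("debiased estimate", sample.mean(), " exact", exact,
      " variance", sample.var())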


PROBLEM 7. Biasing a law.

Let X be a real random variable such that, for some t_0 > 0 and every t < t_0, E(e^{t|X|}) < +∞. Write
M_t = E(e^{tX}).   (1.9)

1. Check that this hypothesis is satisfied by a standard Gaussian random variable X, with t_0 = +∞. What is M_t in this case?
Let now X be a random variable with the gamma law of density p(λ,α,x),
p(λ,α,x) = 1_{x>0} x^{α−1} λ^α e^{−λx} / Γ(α),
where α > 0, λ > 0 and Γ(α) = ∫_0^{+∞} e^{−x} x^{α−1} dx. Compute E(e^{tX}) for t < λ. What is E(e^{λX})?

Under hypothesis (1.9) we define, for t < t_0, a measure μ_t by setting, for f a nonnegative measurable function,
μ_t(f) = E( e^{tX} f(X) ) / M_t.
μ_t is then a positive measure with total mass 1 on R (in other words, a probability law).

2. Let Y^{(t)} be a random variable with law μ_t under a probability P_θ (so that, for f measurable and nonnegative, E_θ(f(Y^{(t)})) = E(e^{tX} f(X))/M_t). Show that, for 0 ≤ t < t_0, E_θ(|Y^{(t)}|) < +∞; then check that t → E_θ(Y^{(t)}) is a differentiable function whose derivative is
d/dt E_θ(Y^{(t)}) = [ E(e^{tX} X²) E(e^{tX}) − E(e^{tX} X)² ] / E(e^{tX})².
Deduce that t → E_θ(Y^{(t)}) is an increasing function.

The law μ_t is not always easy to simulate. Here are, however, two cases where things go well.

3. What is the law of Y^{(t)} when X is a standard Gaussian random variable? How can Y^{(t)} be simulated in this case?

4. Let X be a random variable with the gamma law of density p(λ,α,x). What is the law of Y^{(t)}, for t < λ? How can this law be simulated when α = 1 and, more generally, when α is an arbitrary real?

Consider n independent random variables (X_1, …, X_n) with the law of X. Set S_n^X = X_1 + … + X_n; we want to compute
θ = P( S_n^X ≥ a )
for a value of a such that θ is of order 10^{−12}.

5. What is the order of magnitude of the number N of drawings of the vector (X_1, …, X_n) needed to obtain a relative precision of 50% on the estimate of θ by a Monte-Carlo method?
We shall (therefore) try to improve the method.

6. For t < t_0 and Y^{(t)} a random variable with law μ_t under P_θ, check that, for every nonnegative measurable function g,
E(g(X)) = M_t E_θ( e^{−tY^{(t)}} g(Y^{(t)}) ).

7. For t < t_0, consider n independent random variables (Y_1^{(t)}, …, Y_n^{(t)}) with law μ_t under a probability P_θ. Write S_n^{Y(t)} = Y_1^{(t)} + … + Y_n^{(t)}. Show that, for every function F of the form f_1(x_1) × … × f_n(x_n), the f_i being measurable and bounded from R to R,
E( F(X_1, …, X_n) ) = M_t^n E_θ( F(Y_1^{(t)}, …, Y_n^{(t)}) e^{−t S_n^{Y(t)}} ).
We admit that this equality extends to every bounded measurable function F(x_1, …, x_n) from R^n to R.

8. Deduce that
θ = P( S_n^X ≥ a ) = M_t^n E_θ( 1_{S_n^{Y(t)} ≥ a} e^{−t S_n^{Y(t)}} ).
Write Z_t for the random variable Z_t = 1_{S_n^{Y(t)} ≥ a} M_t^n e^{−t S_n^{Y(t)}}. How can Z_t be used to implement a Monte-Carlo method for the computation of θ which may be more efficient? Which criterion should be used to choose t optimally for this method?

In what follows we assume (4) that E(X) < a/n and (5) that a/n < E_θ(Y^{(t_0)}).

9. Check that Var(Z_t) ≤ (M_t^n e^{−ta})² − θ². By minimizing the right-hand side of this inequality, show that a reasonable choice of t is the unique solution t_c in the interval (0, t_0) of the equation
E_θ(Y^{(t)}) = E(e^{tX} X) / E(e^{tX}) = a/n.

10. Show that, for every t < t_0, Var(Z_t) = E( 1_{S_n^X ≥ a} M_t^n e^{−t S_n^X} ) − θ², and that Var(Z_t) admits a derivative given by
d/dt Var(Z_t) = M_t^n E( 1_{S_n^X ≥ a} e^{−t S_n^X} [ n E_θ(Y^{(t)}) − S_n^X ] ).

11. Deduce that (d/dt) Var(Z_t) ≤ 0 for 0 ≤ t ≤ t_c, where t_c is the unique solution of E_θ(Y^{(t_c)}) = a/n. Is there any interest in using values t < t_c when implementing the proposed Monte-Carlo method?

12. Show that the result of the previous question remains true if 1_{S_n^X ≥ a} is replaced by H(S_n^X), where H is a function vanishing on the set {x < a} (for instance H(x) = (x−a)_+).

13. Assuming that E_θ(Y^{(t)}) can be computed explicitly, and without trying to prove its convergence, propose a stochastic algorithm which solves the Euler equation for the minimization of the variance, namely
E( 1_{S_n^X ≥ a} M_t^n e^{−t S_n^X} [ n E_θ(Y^{(t)}) − S_n^X ] ) = 0.

(4) A natural hypothesis when θ is assumed to be small.
(5) This is true as soon as E_θ(Y^{(t_0)}) = +∞, a hypothesis satisfied in the two preceding examples.
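For standard Gaussian X_i the tilted law μ_t is explicit, and the estimator Z_t of question 8 can be sketched as follows (Python with numpy and scipy assumed; the values of n and a are hypothetical, chosen so that θ is of order 10^{−12}).

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n, a, N = 10, 22.0, 200_000          # hypothetical values, theta ~ 1e-12
t = a / n                            # t_c: for N(0,1), E_theta(Y^(t)) = t
M_t = np.exp(t * t / 2.0)            # M_t = E(exp(tX)) for a standard Gaussian

# under the tilted law mu_t the X_i become N(t,1), so S_n ~ N(n*t, n)
s = rng.standard_normal((N, n)).sum(axis=1) + n * t
z = (s >= a) * M_t**n * np.exp(-t * s)          # Z_t = 1_{S>=a} M_t^n e^{-tS}

print("tilted estimate", z.mean(), "+/-", 1.96 * z.std() / np.sqrt(N))
print("exact value    ", norm.sf(a / np.sqrt(n)))  # theta for Gaussian X_i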


PROBLEM 8. Vector variance reduction.

Preliminaries. Assume that (U_1, U_2) is a pair of independent real random variables with given laws.

1. Describe the classical Monte-Carlo method which computes P(U_1 ≤ U_2) from independent drawings along the law of (U_1, U_2), and explain how the error after n drawings can be estimated.

Consider a square-integrable real random variable X on a probability space (Ω, A, P) and a sub-σ-field B of A. Set Y = E(X | B).

2. Show that Var(Y) ≤ Var(X).

3. Set X = 1_{U_1 ≤ U_2}. Compute Y = E(X | U_1) in terms of the distribution function F_2 of U_2, assumed explicit. Explain how this computation can be used to build a reduced-variance estimator of P(U_1 ≤ U_2) which uses only independent drawings along the law of U_1.

Covariance matrix of a conditioned vector. Consider now a random vector X = (X_1, …, X_d) and a σ-field B. Assume that the X_i are square integrable and denote by Γ(X) the variance-covariance matrix of X,
Γ(X)_{i,j} = Cov(X_i, X_j).
Define the vector Y = (Y_1, …, Y_d) by setting Y_i = E(X_i | B) for every i, 1 ≤ i ≤ d.

4. Show that the coordinates of Y are square integrable and that, for every vector λ ∈ R^d,
Var( ∑_{i=1}^d λ_i Y_i ) ≤ Var( ∑_{i=1}^d λ_i X_i ).
Denote by Γ(Y) the variance-covariance matrix of Y. Deduce that Γ(Y) ≤ Γ(X) in the sense of the order on symmetric nonnegative matrices, which means that, for every λ ∈ R^d,
λ·Γ(Y)λ ≤ λ·Γ(X)λ,
where x·y = ∑_{i=1}^d x_i y_i denotes the scalar product in R^d.

The "Delta" method. Consider a function f from R^d to R such that the partial derivatives ∂f/∂x_i(θ) exist for i = 1, …, d and which moreover satisfies (6)
f(x) = f(θ) + ∑_{i=1}^d (x_i − θ_i) ∂f/∂x_i(θ) + o(x−θ),
with |o(h)| ≤ |h| ε(h) and lim_{|h|→0} ε(h) = 0.
Assume that (Z_n, n ≥ 0) is a sequence of R^d-valued random variables converging almost surely to θ, a vector of R^d, and such that √n (Z_n − θ) converges in law to a Gaussian vector with zero mean and variance-covariance matrix Γ.

5. Show that, writing ∇f(θ) = ( ∂f/∂x_i(θ), 1 ≤ i ≤ d ),
√n ( f(Z_n) − f(θ) ) = ∇f(θ) · √n (Z_n − θ) + R_n,
where |R_n| ≤ |√n (Z_n − θ)| ε(Z_n − θ); then, using Slutsky's lemma, prove that R_n tends to 0 in law.

6. Deduce that √n ( f(Z_n) − f(θ) ) converges in law to a Gaussian law with zero mean and variance ∇f(θ)·Γ∇f(θ).

An application. Let ((X_n, Y_n), n ≥ 1) be a sequence of independent pairs of real random variables drawn along the law of the pair (X, Y). A and B are two subsets of R. We want to compute P(X ∈ A)/P(Y ∈ B) by a Monte-Carlo type method.

7. What is the almost sure limit of the quotient Q_n = ∑_{i=1}^n 1_{X_i ∈ A} / ∑_{i=1}^n 1_{Y_i ∈ B}?

8. Consider the sequence of R²-valued random variables
Z_n = ( (1/n) ∑_{i=1}^n 1_{X_i ∈ A}, (1/n) ∑_{i=1}^n 1_{Y_i ∈ B} ).
Set p_A = P(X ∈ A), p_B = P(Y ∈ B) and θ = (p_A, p_B). Show, by applying the multidimensional central limit theorem (see question 15 for a statement and a proof), that √n (Z_n − θ) converges to a centered Gaussian vector whose variance-covariance matrix Γ_1 is to be computed in terms of p_A, p_B and p_{AB} = P(X ∈ A, Y ∈ B).

9. Deduce, using question 6, the convergence in law of √n (Q_n − p_A/p_B) to a Gaussian law with variance σ_1² = λ_0·Γ_1λ_0, with λ_0 = (1/p_B, −p_A/p_B²).

10. How can this result be used in practice to estimate the error made when p_A/p_B is approximated by Q_n?

11. Let T be a random variable; then there exist two functions f_X and f_Y such that E(1_{X ∈ A} | T) = f_X(T) and E(1_{Y ∈ B} | T) = f_Y(T). Assume that f_X and f_Y can be computed explicitly and consider a sequence (T_n, n ≥ 1) of independent random variables drawn along the law of T. What is the almost sure limit of
Q̃_n = ∑_{i=1}^n f_X(T_i) / ∑_{i=1}^n f_Y(T_i) ?

12. Show that √n (Q̃_n − p_A/p_B) converges to a Gaussian law whose variance σ_2² can be computed in terms of λ_0 and the variance-covariance matrix Γ_2 of the pair (f_X(T), f_Y(T)).

13. Using the result of question 4, compare σ_1 and σ_2 and explain why this second method is preferable for computing p_A/p_B.

(6) This hypothesis means that f is differentiable at the point θ, which is more demanding than the mere existence of the partial derivatives at that point.
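A sketch of the delta-method error estimate of questions 7-10 follows (Python with numpy assumed; the laws of X, Y and the sets A, B are hypothetical choices).

import numpy as np

# Sketch of questions 7-10: estimate p_A / p_B by Q_n and build a confidence
# interval from the delta method, sigma_1^2 = lambda_0 . Gamma_1 lambda_0.
rng = np.random.default_rng(0)
n = 100_000
x = rng.standard_normal(n)
y = 0.5 * x + np.sqrt(0.75) * rng.standard_normal(n)   # correlated with x
ind_a = (x > 1.0).astype(float)            # 1_{X in A}
ind_b = (y > 0.5).astype(float)            # 1_{Y in B}

pa, pb = ind_a.mean(), ind_b.mean()
q_n = ind_a.sum() / ind_b.sum()

gamma1 = np.cov(np.vstack([ind_a, ind_b]))             # empirical Gamma_1
lam0 = np.array([1.0 / pb, -pa / pb**2])
sigma1 = np.sqrt(lam0 @ gamma1 @ lam0)
print("Q_n =", q_n, "  95% CI half-width =", 1.96 * sigma1 / np.sqrt(n))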


Reminder: the multidimensional central limit theorem.
Let (Z_n, n ≥ 1) be a sequence of random vectors with the same law as Z = (Z^1, …, Z^d). Assume that Z is square integrable (i.e. E(|Z|²) < +∞), that E(Z) = 0, and denote by Γ_Z the variance-covariance matrix of the vector Z.

14. Show, using the central limit theorem in R, that for every λ ∈ R^d and every real u,
lim_{n→+∞} E( e^{i (u/√n) (λ·Z_1 + … + λ·Z_n)} ) = e^{−(u²/2) λ·Γ_Zλ}.

15. Deduce that the vectors (1/√n)(Z_1 + … + Z_n) converge in law to a centered Gaussian vector with variance-covariance matrix Γ_Z. Extend the result to the case E(Z) ≠ 0.


PROBLEM 9. Partial simulation of a vector.

Preliminaries. Consider a vector U = (U_1, …, U_d) of R^d made of independent random variables with the same law. We want to compute by simulation E(f(U)), where f is a bounded measurable function from R^d to R.

1. Assume that the total cost of one simulation of f(U) is proportional to the dimension d. Describe the classical Monte-Carlo method and show that it yields an approximation of E(f(U)) to within ε for a cost bounded by K d × ε^{−2}.

The aim of this problem is to show that, under suitable hypotheses, the multiplicative factor d in the previous complexity estimate can be improved.

To this end we use a sequence of independent random variables (N_k, k ≥ 1) with values in {1, 2, …, d} and law given by P(N_k = i) = p_i for i = 1, …, d. We assume throughout that p_d > 0.

2. Set T_0 = 0, then, for n ≥ 1, T_n = inf{ k > T_{n−1}, N_k = d } and ΔT_n = T_n − T_{n−1}. Show that (ΔT_n, n ≥ 1) is a sequence of independent random variables with the same law, and compute this common law. What is the limit of T_n/n as n tends to +∞? In what sense should this limit be understood?

Consider (U^k, k ≥ 0) a sequence of independent random variables drawn along the law of U, independent of the sequence (N_k, k ≥ 1). We shall modify the sequence (U^k, k ≥ 0), giving up the independence between the drawings, using the sequence (N_k, k ≥ 1).

We build the sequence (Ũ^k, k ≥ 0) as follows: set Ũ^0 = U^0, then define Ũ^{k+1} from Ũ^k by drawing N_{k+1} and setting
Ũ^{k+1}_i = U^{k+1}_i for i = 1, …, N_{k+1}, and Ũ^{k+1}_i = Ũ^k_i for i = N_{k+1}+1, …, d.
In other words, only the first N_{k+1} coordinates are redrawn, the other coordinates being left unchanged.

3. Show that the law of Ũ^1 is identical to that of U.

4. Show that (Ũ^k, k ≥ 0) is a Markov chain (use the fact that Ũ^{k+1} is a function of Ũ^k and of the pair (U^{k+1}, N_{k+1})), and deduce that for every k ≥ 1 the law of Ũ^k is identical to that of U.

Set
F_n = (1/n)( f(Ũ^0) + … + f(Ũ^{n−1}) ).

5. Check that E(F_n) = E(f(U)).

6. Why, when p_d = 0, can one not hope (in general) that F_n converges to E(f(U))?


It can however be shown (see the last part of this problem) that F_n converges almost surely to E(f(U)) as n tends to +∞, under the hypothesis p_d > 0.

Study of the variance. To study the interest of this method, we now compute the variance of the estimator F_n. Write a_j = Cov( f(Ũ^0), f(Ũ^j) ) for j ≥ 0.

7. Show that the law of (Ũ^m, Ũ^{m+j}) is identical to that of (Ũ^0, Ũ^j), and deduce that, for every m ≥ 0 and j ≥ 0,
Cov( f(Ũ^m), f(Ũ^{m+j}) ) = a_j.

8. Show that
n Var(F_n) = a_0 + 2 ∑_{j=1}^{n−1} ((n−j)/n) a_j ≤ a_0 + 2 ∑_{j≥1} a_j.

9. Assuming that the (a_j, j ≥ 0) are all nonnegative, check that lim_{n→+∞} n Var(F_n) = a_0 + 2 ∑_{j≥1} a_j.

Let U′ = (U′_1, …, U′_d) have the same law as U = (U_1, …, U_d) and be independent of U; define, for j ≥ 0, C_j by
C_j = Cov( f(U_1, …, U_j, U_{j+1}, …, U_d), f(U′_1, …, U′_j, U_{j+1}, …, U_d) ).

10. Write M_j = max_{1≤l≤j} N_l, the sequence (N_k, k ≥ 1) being assumed independent of the vectors U and U′. Show that, for j ≥ 1,
a_j = Cov( f(U_1, …, U_{M_j}, U_{M_j+1}, …, U_d), f(U′_1, …, U′_{M_j}, U_{M_j+1}, …, U_d) ).

11. Write q_i = p_1 + … + p_i for i = 1, …, d and q_0 = 0. Check that P(M_j = i) = q_i^j − q_{i−1}^j, and deduce that, for every j ≥ 1,
a_j = Cov( f(Ũ^0), f(Ũ^j) ) = ∑_{i=1}^d ( q_i^j − q_{i−1}^j ) C_i.

12. Show, by conditioning on (U_{j+1}, …, U_d), that C_j ≥ 0 for every j ≥ 0. Deduce that a_j ≥ 0 for every j ≥ 0.

13. Show that C_i is a decreasing function of i and check that C_d = 0.

14. Show that lim_{n→+∞} n Var(F_n) = σ², where the asymptotic variance σ² is given by
σ² = C_0 + 2 ∑_{i=1}^d C_i ( 1/(1−q_i) − 1/(1−q_{i−1}) ) = C_0 − 2C_1 + 2 ∑_{i=1}^{d−1} (C_i − C_{i+1}) / (1−q_i).

15. Check that the expected result is recovered in the classical case (that is, when p_1 = … = p_{d−1} = 0 and p_d = 1) and that one always has σ² ≥ Var(f(U)). How should this last result be interpreted?


To understand why, in spite of the previous result, this algorithm can be of interest, one has to take into account the time saved in the computation of the Ũ^k. To this end, denote by t_i the average computation time when only the first i coordinates of the vector Ũ^k are modified. The average update time is then ∑_{i=1}^d p_i t_i. The relevant criterion to optimize is then the product (expected time) × (variance). For more details see the article Randomized Dimension Reduction for Monte-Carlo Simulation, Nabil Kahalé, October 2018.

Almost sure convergence of the estimator. With the previous notations, set (f being assumed bounded), for n ≥ 0,
Z_{n+1} = f(Ũ^{T_n}) + f(Ũ^{T_n+1}) + … + f(Ũ^{T_{n+1}−1}).

16. Show that (Z_n, n ≥ 1) is a sequence of independent random variables with the same law and that E(Z_n) = E(f(U)) × E(ΔT_1).

17. Deduce that
(1/n)( f(Ũ^0) + … + f(Ũ^{T_n−1}) ) and (1/n)( f(Ũ^0) + … + f(Ũ^{T_{n+1}−1}) )
converge almost surely to a common limit value, to be computed.

18. Consider a time k and define n(k) as the unique integer such that T_{n(k)} ≤ k < T_{n(k)+1}. Check that n(k) tends a.s. to +∞ as k tends to +∞. Deduce, treating first the case where f is nonnegative, that
F_k = (1/k)( f(Ũ^0) + … + f(Ũ^{k−1}) )
tends a.s. to E(f(U)).

Consider finally the product cost × variance, written (setting t_0 = 0 and, with a slight abuse of notation, q_i = p_{i+1} + … + p_d, so that the average update cost is ∑_{i=0}^{d−1} q_i (t_{i+1} − t_i)) as
V(q) = ( ∑_{i=0}^{d−1} q_i (t_{i+1} − t_i) ) ( (C_0 − 2C_1)/q_0 + 2 ∑_{i=1}^{d−1} (C_i − C_{i+1})/q_i ).

19. Using the Cauchy-Schwarz inequality, show that
V(q) ≥ ( √(C_0 − 2C_1) √t_1 + ∑_{i=1}^{d−1} √(2(C_i − C_{i+1})) √(t_{i+1} − t_i) )²,
and give a value of the vector q which attains this lower bound.
When the cost is proportional to i (i.e. t_i = c × i), show that this lower bound improves on that of the classical case (i.e. p_d = 1, t_d = c × d, for which it equals c × d × C_0).
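The partially resimulated chain Ũ^k and the estimator F_n can be sketched as follows (Python with numpy assumed; the function f and the law p of N_k are hypothetical choices used only to check E(F_n) = E(f(U)) numerically).

import numpy as np

# Sketch of the chain U~^k of Problem 9: at each step only the first N_k
# coordinates are redrawn, and F_n averages f along the chain.
rng = np.random.default_rng(0)
d, n_steps = 4, 200_000
p = np.array([0.4, 0.2, 0.2, 0.2])         # P(N_k = i); note p[d-1] > 0

def f(u):
    return np.prod(1.0 + u)                # a bounded function of U

u_tilde = rng.uniform(size=d)
total = 0.0
for _ in range(n_steps):
    total += f(u_tilde)
    n_k = rng.choice(np.arange(1, d + 1), p=p)
    u_tilde[:n_k] = rng.uniform(size=n_k)  # redraw only the first N_k coords

print("F_n     =", total / n_steps)
print("E(f(U)) =", 1.5**d, "(exact, since E(1+U_i) = 3/2)")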


Chapter 2

Monte-Carlo methods for pricing American options.

We present some recent Monte-Carlo algorithms devoted to the pricing of American options. This part follows the work done by Pierre Cohort (see [Cohort(2001)]). We give the description of two classes of algorithms. The first class uses an approximation of the conditional expectation (Longstaff and Schwartz [Longstaff and Schwartz(1998)] and Tsitsiklis and VanRoy [Tsitsiklis and Roy(2000)]). The second one is based on an approximation of the underlying Markov chain (quantization algorithms [Pagès and Bally(2000), Pagès et al.(2001)], Broadie and Glassermann [Broadie and Glassermann(1997a), Broadie and Glassermann(1997b)] and Barraquand and Martineau [Barraquand and Martineau(1995)]).

2.1 Introduction

We assume that an American option can only be exercised at a given set of times (t_0, …, t_N = T). In this case we say that the option is a Bermudan option.

The asset price at time t_n is supposed to be given by a Markov chain (X_n, 0 ≤ n ≤ N) with a transition kernel between n and n+1 given by P_n(x,dy). This means that, for every bounded f,
E( f(X_{n+1}) | F_n ) = P_n f(X_n) := ∫ f(y) P_n(X_n, dy),
where F_n = σ(X_k, k ≤ n).

We assume that the payoff at time t_n is given by φ(X_n), where φ is a bounded function, and that the actualization between times t_j and t_{j+1} can be expressed as 1/(1+R_j), R_j being a constant positive interest rate between times t_j and t_{j+1}. This allows us to define an actualization coefficient B between arbitrary times j and k, for j < k, by setting
B(j,k) = ∏_{l=j}^{k−1} 1/(1+R_l).


By convention we set B(j,j) = 1. Note that B(j, j+1) = 1/(1+R_j). With these notations, the price at time 0 of this option is
Q_0 = sup_{τ∈T_{0,N}} E( B(0,τ) φ(X_τ) )   (2.1)
where T_{j,N} is the set of F_n-stopping times taking values in {j, …, N}.

Remark 2.1.1. For the d-dimensional Black and Scholes model, the asset prices (S^{x,i}_t, 1 ≤ i ≤ d) can be written as
S^{x,i}_t = x_i exp( ( r − σ_i²/2 ) t + ∑_{1≤j≤p} σ_{ij} W^j_t ),
where (W_t, t ≥ 0) is a p-dimensional Brownian motion, σ is a d×p matrix, x belongs to R^d, σ_i² = ∑_{1≤j≤p} σ_{ij}² and r > 0 is the riskless interest rate. So P_n is defined by
P_n f(x) = E( f( S^x_{t_{n+1}−t_n} ) ).
Moreover, we have
1 + R_n = exp( r (t_{n+1} − t_n) ).

Standard theory of optimal stopping proves that Q_0 = u(0, X_0), where u is the solution of the dynamic programming algorithm
u(N,x) = φ(x),
u(j,x) = max( φ(x), P̄_j u(j+1, x) ), 0 ≤ j ≤ N−1,   (2.2)
where P̄_j is the discounted transition kernel of the Markov chain (X_j, j = 0, …, N) between j and j+1, given by
P̄_j f(x) = B(j, j+1) P_j f(x).
The stopping time
τ* = inf{ j ≥ 0, u(j, X_j) = φ(X_j) }   (2.3)
is optimal, that is to say,
Q_0 = u(0, X_0) = E( B(0, τ*) φ(X_{τ*}) ).
Moreover, the price Q_j of this American option at time j, given by
Q_j = sup_{τ∈T_{j,N}} E( B(j,τ) φ(X_τ) | F_j ),
can be computed as u(j, X_j), and an optimal stopping time at time j is given by τ*_j, where
τ*_j = inf{ i ≥ j; u(i, X_i) = φ(X_i) }.


2.2 The Longstaff and Schwartz algorithm.

This algorithm approximates the optimal stopping time τ* on M given paths by random variables τ^{(m)} depending on the path m, and then estimates the price Q_0 according to the Monte-Carlo formula
Q_0^M = (1/M) ∑_{1≤m≤M} B(0, τ^{(m)}) φ( X^{(m)}_{τ^{(m)}} ).   (2.4)
The construction of (τ^{(m)}, 1 ≤ m ≤ M) is presented according to the following steps. First we write the dynamic programming algorithm directly on the optimal stopping time τ*. Then we present the Longstaff and Schwartz approximation of this recursion. Finally we give a Monte-Carlo version, leading to τ^{(m)}.

A dynamic programming algorithm for the optimal stopping time. The main feature of the Longstaff and Schwartz algorithm is to use a dynamic programming principle for optimal stopping times rather than for the value function. Note that τ* can be computed using the sequence (τ*_j, 0 ≤ j ≤ N) defined by the backward induction
τ*_N = N,
τ*_j = j 1_{φ(X_j) ≥ u(j,X_j)} + τ*_{j+1} 1_{φ(X_j) < u(j,X_j)}.   (2.5)
Note that
{ φ(X_j) < u(j,X_j) } = { φ(X_j) < P̄_j u(j+1, X_j) } = { φ(X_j) < E( B(j, τ*_{j+1}) φ(X_{τ*_{j+1}}) | X_j ) }.   (2.6)
With this definition of τ*_j, it is easy to check recursively that
τ*_j = min{ i ≥ j, φ(X_i) = u(i, X_i) }.   (2.7)
Thus τ*_j (and in particular τ* = τ*_0) are optimal stopping times at time j. In order to estimate τ* we have to find a way to approximate the conditional expectation
E( B(j, τ*_{j+1}) φ(X_{τ*_{j+1}}) | X_j ).   (2.8)
The basic idea of Longstaff and Schwartz is to introduce a least squares regression method to perform this approximation.

The regression method. We denote by Z_{j+1} the random variable
Z_{j+1} = B(j, τ*_{j+1}) φ(X_{τ*_{j+1}}).
Obviously, as B and φ are bounded, Z_{j+1} is also a bounded random variable. Let us recall that the conditional expectation
E( Z_{j+1} | X_j )   (2.9)
can be expressed as ψ_j(X_j), where ψ_j minimizes
E( [ Z_{j+1} − f(X_j) ]² )
among all functions f such that E(f(X_j)²) < +∞. This comes from the definition of the conditional expectation as an L²-projection on the set of σ(X_j)-measurable random variables and from the well known property that all these random variables can be written as f(X_j).

Assume now that we have a sequence of functions (g_l, l ≥ 1) which is a total basis of L²_j = L²(R^d, law of X_j) for every time j, 1 ≤ j ≤ N. For every time index j we are thus able to express ψ_j as
ψ_j = ∑_{l≥1} α_l g_l,
where the convergence of the series has to be understood in L²_j. So we have a way to compute recursively the optimal exercise time:

1. initialize τ*_N = N. Then inductively

2. define α^j = (α^j_l, l ≥ 1) as the sequence which minimizes
E( [ B(j, τ*_{j+1}) φ(X_{τ*_{j+1}}) − (α·g)(X_j) ]² )
where α·g = ∑_{l≥1} α_l g_l.

3. define τ*_j = j 1_{φ(X_j) ≥ (α^j·g)(X_j)} + τ*_{j+1} 1_{φ(X_j) < (α^j·g)(X_j)}.

This program is not really implementable, since we have to perform a minimization in an infinite dimensional space and we are not able to compute the expectation involved in the minimization problem of step 2.

In order to implement the algorithm, the basic idea is to truncate the series at an index k. This leads to the following modified program.

1. initialize τ̃_N = N. Then inductively

2. define α^{j,k} = (α^{j,k}_l, 1 ≤ l ≤ k) as the vector which minimizes
E( [ B(j, τ̃_{j+1}) φ(X_{τ̃_{j+1}}) − (α^{j,k}·g)(X_j) ]² )   (2.10)
where (α^{j,k}·g) = ∑_{l=1}^k α^{j,k}_l g_l.

3. define τ̃_j = j 1_{φ(X_j) ≥ (α^{j,k}·g)(X_j)} + τ̃_{j+1} 1_{φ(X_j) < (α^{j,k}·g)(X_j)}.

To make the previous program implementable, we introduce a Monte-Carlo version of (2.10).


A Monte-Carlo approach to the regression problem. Let (X^{(m)}_n, 0 ≤ n ≤ N), for 1 ≤ m ≤ M, be M paths sampled along the law of the process (X_n, 0 ≤ n ≤ N). We replace the minimization problem (2.10) by the following one.

1. initialize τ^{(m)}_N = N for every path m. Then inductively

2. define α^{j,k}_M = (α^{j,k}_{M,l}, 1 ≤ l ≤ k) as the vector which minimizes
(1/M) ∑_{1≤m≤M} ( (α·g)( X^{(m)}_j ) − B( j, τ^{(m)}_{j+1} ) φ( X^{(m)}_{τ^{(m)}_{j+1}} ) )²   (2.11)

3. define, for each trajectory m,
τ^{(m)}_j = j 1_{φ(X^{(m)}_j) ≥ (α^{j,k}_M·g)(X^{(m)}_j)} + τ^{(m)}_{j+1} 1_{φ(X^{(m)}_j) < (α^{j,k}_M·g)(X^{(m)}_j)}.

This algorithm is now fully implementable, and the estimator of the price is given by
(1/M) ∑_{1≤m≤M} B(0, τ^{(m)}) φ( X^{(m)}_{τ^{(m)}} ).

Remark 2.2.1.
• The minimization problem (2.11) is a standard least squares approximation problem. An algorithm for solving it can be found for instance in [Press et al.(1992)].
• The basis (g_k, k ≥ 1) is supposed to be independent of the time index j. This is just for the sake of simplicity, and one can change the basis at each time index if needed for numerical efficiency. For instance, it is quite usual to add the payoff function to the basis, at least when j approaches N.
• Very often, for financial products such as call or put options,
P( φ(X_j) = 0 ) > 0
for every time j. Since {φ(X_j) = 0} ⊂ A_j, it is useless to compute
E( B(j, τ*_{j+1}) φ(X_{τ*_{j+1}}) | X_j )
on the set {φ(X_j) = 0}, and we can thus restrict ourselves, in the regression, to the trajectories such that φ(X_j) > 0.
• A rigorous proof of the convergence of the Longstaff-Schwartz algorithm is given in [Clément et al.(2002)].
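A minimal sketch of the algorithm described above follows (Python with numpy assumed). The one-dimensional Black-Scholes model, the Bermudan put payoff and the polynomial basis (1, x, x²) are illustrative choices, not prescribed by the text; following the remark above, the regression is performed on the in-the-money paths only.

import numpy as np

# Minimal Longstaff-Schwartz sketch for a Bermudan put (illustrative model).
rng = np.random.default_rng(0)
s0, K, r, sigma, T, N, M = 100.0, 100.0, 0.05, 0.2, 1.0, 50, 100_000
dt = T / N
disc = np.exp(-r * dt)                       # B(j, j+1)

# simulate M paths of the underlying Markov chain X_j = S_{t_j}
drift = (r - 0.5 * sigma**2) * dt
increments = drift + sigma * np.sqrt(dt) * rng.standard_normal((M, N))
s = s0 * np.exp(np.cumsum(increments, axis=1))
s = np.hstack([np.full((M, 1), s0), s])      # shape (M, N+1)

payoff = lambda x: np.maximum(K - x, 0.0)

# cash[m] holds B(j+1, tau^(m)) * payoff(X^(m)_{tau^(m)}); backward induction
cash = payoff(s[:, N])
for j in range(N - 1, 0, -1):
    cash *= disc                             # discount back to time j
    itm = payoff(s[:, j]) > 0                # regress on in-the-money paths
    coef = np.polyfit(s[itm, j], cash[itm], deg=2)
    continuation = np.polyval(coef, s[itm, j])
    exercise = payoff(s[itm, j]) >= continuation
    cash[np.where(itm)[0][exercise]] = payoff(s[itm, j][exercise])

price = max(payoff(s0), disc * cash.mean())
print("Longstaff-Schwartz price estimate:", price)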

2.3 The Tsitsiklis–VanRoy algorithm.

This algorithm uses a revised dynamic programming algorithm similar to (2.2), but involving
ū(j,x) = P̄_j u(j+1, x)
instead of u(j,x). An approximation of ū is then performed by using a regression method.


Another form of the dynamic programming algorithm. Using ū, the algorithm (2.2) can be rewritten as
ū(N−1, x) = P̄_{N−1} φ(x),
ū(j,x) = P̄_j max( φ(·), ū(j+1, ·) )(x), 0 ≤ j ≤ N−2,   (2.12)
so that Q_0 = max( φ(x_0), ū(0, x_0) ).

For every 1 ≤ j ≤ N−1, let (g_l, l ≥ 1) be a basis of L²_j. At time N−1, the authors approximate ū(N−1, ·) by its projection on the subspace spanned by g_1, …, g_k, according to the L²_{N−1} norm. They iterate this procedure backward in time, projecting at time j the function
B(j, j+1) max( φ(x), (α^{j+1}·g)(x) )
on the vector space generated by g_1, …, g_k, according to the L²_j norm. The resulting approximated dynamic programming algorithm is then
α^{N−1} = D_{N−1}^{−1} E( B(N−1, N) φ(X_N) ᵗg(X_{N−1}) ),
α^j = D_j^{−1} E( B(j, j+1) max( φ(X_{j+1}), (α^{j+1}·g)(X_{j+1}) ) ᵗg(X_j) ), 1 ≤ j ≤ N−2,
α = E( B(0,1) max( φ(X_1), (α^1·g)(X_1) ) ),   (2.13)
where D_j is the covariance matrix of the vector g(X_j), defined as
(D_j)_{k,l} = E( g_k(X_j) g_l(X_j) ) − E( g_k(X_j) ) E( g_l(X_j) ).
At time 0, the proposed approximation of the price is given by
max( φ(x_0), α ).

The Monte-Carlo algorithm. In order to obtain an implementable algorithm, we use a set of trajectories denoted by (X^{(m)}, 1 ≤ m ≤ M), sampled along the law of X. The Monte-Carlo version of (2.13) is
α^{N−1}_M = (D^M_{N−1})^{−1} (1/M) ∑_{1≤m≤M} B(N−1, N) φ( X^{(m)}_N ) ᵗg( X^{(m)}_{N−1} ),
α^j_M = (D^M_j)^{−1} B(j, j+1) (1/M) ∑_{1≤m≤M} max( φ( X^{(m)}_{j+1} ), (α^{j+1}_M·g)( X^{(m)}_{j+1} ) ) ᵗg( X^{(m)}_j ), 1 ≤ j ≤ N−2,
α^0_M = B(0,1) (1/M) ∑_{1≤m≤M} max( φ( X^{(m)}_1 ), (α^1_M·g)( X^{(m)}_1 ) ),   (2.14)
where D^M_j denotes the empirical dispersion matrix of the set of points (g(X^{(m)}_j), 1 ≤ m ≤ M) of R^k.
The price at time 0 is approximated by max( φ(x_0), α^0_M ).

Page 63: Monte Carlo Methods and Stochastic Algorithmsbl/monte-carlo/poly.pdf · 1.1. ON THE CONVERGENCE RATE OF MONTE-CARLO METHODS 3 1.1 On the convergence rate of Monte-Carlo methods In

2.4. THE QUANTIZATION ALGORITHMS. 59

2.4 The quantization algorithms.

Algorithms based on quantization methods have been introduced by Bally and Pagès in[Pagès and Bally(2000)]. The principle of these algorithms is to discretize, at time j, the under-lying Markov process X j by a set of “quantization levels” y j =

(y j

i

)1≤i≤n j

and then to approxi-

mate the transition kernel of X at time j by a discrete kernel Pj = (Pj(yjk,y

j+1l ),1≤ k ≤ n j, l ≤

n j+1).More precisely, let Pj (x,dy) be the transition kernel between times j and j+1 of a Markov

chain (Xn,0≤ n≤ N) taking its values in Rn and starting from x0.For 1≤ j ≤ N, let y j be a set of vectors of Rd

y j =

y j1, . . . ,y

jn j

.

Note that X0 = x0 is supposed to be constant and need not to be discretized. We recall that

supτ∈T0,N

E(B(0,τ)ϕ(Xτ)) = u(0,x0),

where u is given by the dynamic programming algorithm (2.2)u(N,x) = ϕ(x)u( j,x) = max

(ϕ(x), Pju( j+1,x)

)0≤ j ≤ N−1. (2.15)

We will now approxime the actualized transition kernel PjB( j, j + 1)Pj by a discrete transi-tion kernel Pj defined on y j × y j+1. This kernel must satisfy the following requirement :

Pj

(y j

k,yj+1l

)approximate the actualized probability that the process X moves from a neigh-

borhood of y jk at time j to a neighborhood of y j+1

l at time j+1. This can be done using Pjdefined by

Pj

(y j

k,yj+1l

)= B( j, j+1)

P(X j ∈Ck(y j),X j+1 ∈Cl(y j+1))

P(X j ∈Ck(y j))(2.16)

where the set Ci(y j) is the ith Voronoï cell of the set of points y j, defined by

Ci(y j)=z ∈ Rd,

∥∥∥z− y ji

∥∥∥= min1≤l≤n j

∥∥∥z− y jl

∥∥∥ . (2.17)

If we assume that, for every x ∈ Rd and l, Pj(x,∂Cl

(y j+1)) = 0, equation (2.16) defines a

probability transition kernel.The approximation of the dynamic programming algorithm (2.15) using the matrix Pj is

given by u(N,yN

k ) = f(yN

k)

1≤ k ≤ nNFor 0≤ j ≤ N−1, 1≤ k ≤ n j,

u( j,y jk)max

(f(

y jk

), ∑

1≤l≤n j+1

Pj

(y j

k,yj+1l

)u( j+1,y j+1

l )

) (2.18)

Page 64: Monte Carlo Methods and Stochastic Algorithmsbl/monte-carlo/poly.pdf · 1.1. ON THE CONVERGENCE RATE OF MONTE-CARLO METHODS 3 1.1 On the convergence rate of Monte-Carlo methods In

60 CHAPTER 2. MONTE–CARLO METHODS FOR PRICING AMERICAN OPTIONS.

and the proposed price approximation is u(0,x0).In order to have an algorithm we need to construct a Monte-Carlo procedure to approximate

Pj. Let (X (m),1 ≤ m ≤ M) be M sample paths drawned along the law of the markov chain Xand independent. The formula (2.16) suggests to consider the MonteCarlo estimator Pj definedby

Pj

(y j

k,yj+1l

)= B( j, j+1)

∑1≤m≤M 1X (m)

j ∈Ck(y j),X (m)j+1Cl(y j+1)

∑1≤m≤M 1

X (m)j ∈Ck(y j)

(2.19)

An implementable version of (2.18) obtained by replacing Pj by Pj in (2.18).

Remark 2.4.1. When u has been computed at the point (y ji ,1≤ i≤ n j), we can extend it to Rn

by setting u( j,x) = u( j,y ji ) for x ∈Ci(y j). This allows us to approximate the stopping time τ∗

byτ = min

j;ϕ(X j) = u( j,X j)

, (2.20)

and leads to another approximation for the price at time zero by

1M ∑

1≤m≤MB(

0, τ(m))

f(Q

τ(m),X(m)

τ(m)

),

where τ(m) is the stopping time computed using 2.20 on the trajectory m.

Choice of the quantization sets There are many ways to implement the quantization method,depending on the choice of the quantization sets y j. We present here two of them.

The first one is the Random Quantization algorithm. We assume that n j = n for all j. We usea self–quantization procedure to generate the quantization sets, that is to say we set

y jm = X (m)

j 1≤ j ≤ n 1≤ m≤ n.

where X (m),1≤ m≤M are independently drawn along the law of X . This choice is acceptablebut is not the best among all the random choices (see [Cohort(2004)]).

A second way to choose the quantization sets is to minimize a quantity reflecting the qualityof the quantization set. The distortion D defined for a set y = (yk,1leqk ≤ n) and a randomvariable X by

D (y,X)E(

min1≤k≤n

‖X− yk‖2)

is a classical choice for this. Moreover, an error bound for the previously defined ap-proximation of the price can be derived in terms of the distortions (D

(y j,X j

),1 ≤ j ≤ N)

(see [Pagès and Bally(2000)]). It can be shown that there exists an optimal set y∗ satisfying

D (y∗,X) = miny∈(Rd)

nD (y,X)

Page 65: Monte Carlo Methods and Stochastic Algorithmsbl/monte-carlo/poly.pdf · 1.1. ON THE CONVERGENCE RATE OF MONTE-CARLO METHODS 3 1.1 On the convergence rate of Monte-Carlo methods In

2.5. THE BROADIE–GLASSERMANN ALGORITHM. 61

This optimal set can be computed numerically by using the Lloyd algorithm : let y(0) ∈(Rd)n

be such that all points are different and define a sequence y(n) setting

y(n+1)k = E

(Y |Y ∈Ck

(y(n)))

.

Note that a Monte-Carlo method must be used to approximate the previous expectation. ThenD(

y(n),X)

is decreasing and the sequence y(n) converges to a point y∗ which realizes alocal minimum of D(.,X). For more details on the Lloyd algorithm and its convergencesee [Sabin and Gray(1986)]. Another classical optimization procedure for D uses the Koho-nen algorithm, see [Fort and Pagès(1995)].

2.5 The Broadie–Glassermann algorithm.

For this algorithm we use a random mesh depending on time and write a dynamical program-ming algorithm at points of this mesh.

Mesh generation. We assume that the transition matrices on Rn of the process (Xn,ngeq0)Pj(x,dy) have a density

f j (x,y) .

This is obviously the case in the Black and Scholes model. We assume moreover that X0 = x0 ∈Rn.

Let (Y (m)1 ,1≤m≤ n) be a sample following the density f0 (x0,y). The set (Y (m)

1 ,1≤m≤ n)define the mesh at time 1.

Then the mesh at time j+1 (Y (m)j+1,1≤m≤ n) is defined assuming that Y j = (Y (m)

j ,1≤m≤ n)is already drawn and that the conditional law of Y j+1 knowing Y j is the law of n independantrandom variables with (conditional) density

1n ∑

1≤l≤nf j

(Y (l)

j ,y)

dy. (2.21)

The next step consists in writing an approximation of the dynamical programming algorithmu(N,x) = ϕ(x),u( j,x)max

(ϕ(x), Pju( j+1,x)

)0≤ j ≤ N−1. (2.22)

using only the points (Y (m)j ,1≤m≤ n). For this Broadie and Glassermann propose to approxi-

matePju( j+1,Y (k)

j ),

by a sum1n ∑

1≤l≤nβ

k,lj (Y (k)

j ,Y (m)j+1)u( j+1,Y (m)

j+1). (2.23)

Page 66: Monte Carlo Methods and Stochastic Algorithmsbl/monte-carlo/poly.pdf · 1.1. ON THE CONVERGENCE RATE OF MONTE-CARLO METHODS 3 1.1 On the convergence rate of Monte-Carlo methods In

62 CHAPTER 2. MONTE–CARLO METHODS FOR PRICING AMERICAN OPTIONS.

They choose the weights βj

kl by considering (2.23) as a Monte-Carlo sum. Indeed, one has

E

(u( j+1,Y (m)

j+1)f j

(Y (k)

j ,Y (m)j+1

)1n ∑1≤l≤n f j

(Y (l)

j ,Y (m)j+1

)∣∣∣∣∣Y (k)j ,k = 1, . . . ,n

)=

=∫Rd

u( j+1,u)f j

(Y (k)

j ,u)

1n ∑1≤l≤n f j(Y

(l)j ,y)

1n ∑

1≤l≤nf j

(Y (l)

j ,y)

dy

=∫Rd

u( j+1,u) f j

(Y (k)

j ,y)

dy

= Pju( j+1,Y (k)j )

(2.24)

But (Y (m)j+1,1 ≤ m ≤ n) are, conditionally to (Y (m)

j ,1 ≤ m ≤ n), independant and identicallydistributed. So a natural estimator for (2.24) is given by

1n ∑

1≤m≤nβ

jk,mu( j+1,Y (m)

j+1) (2.25)

where

βj

k,m =f j

(Y (k)

j ,Y (m)j+1

)1n ∑1≤l≤n f j

(Y (k)

j ,Y (l)j+1

) . (2.26)

The approximation of the dynamical programming algorithm is then given byu(N,Y N

(k)) = ϕ(Y N(k)),

for 0≤ k ≤ n, 1≤ j ≤ N,

u( j,Y (k)j )max

(Y (k)

j

),∑1≤l≤n β

jk,m, u( j+1,Y (m)

j+1)),

and the price approximation by u(0,x0).

2.6 The Barraquand–Martineau algorithm

To avoid dimensionality issue, the authors propose to approximate the optimal stopping strat-egy a sub–optimal one : assume that the option holder knows at time j the payoff valuesϕ (Xi) , i≤ n but not the stock values Xi, i≤ n. Then the option holder can only exerciseaccording to a stopping time with respect to (Gn,n ≥ 0) where Gn = σ(Xi, i ≤ n). Doing this,the price is approximated (by below) by

supτ∈G0,N

E(B(0,τ)ϕ (Xτ)) (2.27)

where G0,N is the set of the Gn–stopping times taking values in 0, . . . ,N.

Page 67: Monte Carlo Methods and Stochastic Algorithmsbl/monte-carlo/poly.pdf · 1.1. ON THE CONVERGENCE RATE OF MONTE-CARLO METHODS 3 1.1 On the convergence rate of Monte-Carlo methods In

2.6. THE BARRAQUAND–MARTINEAU ALGORITHM 63

To compute (2.27), the authors propose a quantization method for the one-dimensional dy-namic programming algorithm. Obviously, in general, the process (ϕ(Xn),n≥ 0) is not Marko-vian with respect to G and a second approximation has to be done in order to write the followingdynamical programming algorithm assuming that (ϕ(Xn),n≥ 0) is Markovian

u(N,x) = ϕ(x)u( j,x) = max

(ϕ(x), Pju( j+1,x)

), 1≤ j ≤ N,

(2.28)

where Pj(x,dy) is given by

E(B(n,n+1) f (ϕ(Xn+1)) |ϕ(Xn)) = Pn f (ϕ(Xn)) .

Assume that the two previous approximations are acceptable. The algorithm (2.28) can now behandled by a one–dimensional quantization technique. For 1≤ j ≤ N, let

z j2 < .. . < z j

n j

⊂ R

and let z j1 =−∞, z j

n j+1 =+∞. Then, define the sets y j =

y j1, . . . ,y

jn j

by

y jk = E

(ϕ(X j)|ϕ(X j)∈[z j

k,zjk+1

]). (2.29)

As in the quantization algorithm, the kernel Pj is discretized on y j× y j+1 by ˆPj where

ˆPj

(y j

k,yj+1l

)= B( j, j+1)

P(

ϕ(X j) ∈ [z jk,z

jk+1],ϕ(X j+1) ∈ [z j+1

l ,z j+1l+1 ]

)P(

ϕ(X j) ∈ [z jk,z

jk+1]

) (2.30)

The next step is to approximate (2.28) using Pj

(y j

k,yj+1l

),

ˆu(N,yNk ) = yN

k , 1≤ k ≤ nN

ˆu( j,y jk) = max

(y j

k,∑1≤l≤n j+1ˆPj

(y j

k,yj+1l

)ˆu( j+1,y j+1

l )),

For 1≤ k ≤ n j, 0≤ j ≤ N.

(2.31)

The approximation of the price at time 0 is given by ˆu(0,x0).

A Monte-Carlo version of (2.31) is obtain using m samples of the process X , (X (m)j ,1≤ j ≤

N) and approximating

y jk with

1

Card (A jk)

∑m∈A j

k

ϕ(X (m)j ),

andˆ

jP withCard (A j

k∩A j+1l )

Card (A jk)

,

whereA j

k =

1≤ m≤ n,ϕ(X (m)j ) ∈

[z j

k,zjk+1

].

Remark 2.6.1. Comparison of most of the algorithms presented here can be foundin [Fu et al.(2001)].

Page 68: Monte Carlo Methods and Stochastic Algorithmsbl/monte-carlo/poly.pdf · 1.1. ON THE CONVERGENCE RATE OF MONTE-CARLO METHODS 3 1.1 On the convergence rate of Monte-Carlo methods In
Page 69: Monte Carlo Methods and Stochastic Algorithmsbl/monte-carlo/poly.pdf · 1.1. ON THE CONVERGENCE RATE OF MONTE-CARLO METHODS 3 1.1 On the convergence rate of Monte-Carlo methods In

Chapter 3

Introduction to stochastic algorithms

3.1 A reminder on martingale convergence theorems

F = (Fn,n≥ 0) denote an increasing sequence of σ -algebra of a probability space (Ω,A ,P).

Definition 3.1.1. A sequence of real random variable (Mn,n≥ 0) is a F -martingale if and onlyif, for all n≥ 0 :

• Mn is Fn-measurable

• Mn is integrable, E(|Mn|)<+∞.

• E(Mn+1|Fn) = Mn.

- When, for all n≥ 0, E(Mn+1|Fn)≤Mn the sequence is called a super-martingale.

- When, for all n≥ 0, E(Mn+1|Fn)≥Mn the sequence is called a sub-martingale.

Definition 3.1.2. An F -stopping time is a random variable τ taking its values in N∪+∞such that, for all n≥ 0, τ ≤ n ∈Fn.

Given a stopping time τ and a process (Mn,n≥ 0), we can define a stopped process by Mn∧τ .It is easy to check that a stopped martingale (resp. sub, super) remains an F -martingale (resp.sub, super).

Exercice 12. Check it using the fact that :

M(n+1)∧τ −Mn∧τ = 1τ>n (Mn+1−Mn) .

Convergence of super-martingale Almost sure convergence of super-martingale can be ob-tained under weak conditions.

Theorem 3.1.3. Let (Mn,n ≥ 0) be a positive super-martingale with respect to F (i.e. theconditional expectation is decreasing E(Mn+1|Fn)≤Mn, then Mn converge almost surely to arandom variable M∞ when n goes to +∞.

65

Page 70: Monte Carlo Methods and Stochastic Algorithmsbl/monte-carlo/poly.pdf · 1.1. ON THE CONVERGENCE RATE OF MONTE-CARLO METHODS 3 1.1 On the convergence rate of Monte-Carlo methods In

66 CHAPTER 3. INTRODUCTION TO STOCHASTIC ALGORITHMS

For a proof see [Williams(1991)].

Remark 3.1.1. The previous result remain true if, for all n, Mn ≥−a, with a≥ 0 (as Mn +a isa positive super-martingale).

To obtain Lp-convergence we need stronger assumptions.

Theorem 3.1.4. Assume (Mn,n ≥ 0) is a martingale with respect to F , bounded in Lp for ap > 1 (i.e. supn≥0E(|Mn|p)<+∞), then then Mn converge almost surely and in Lp to a randomvariable M∞ when n goes to +∞.

Remark 3.1.2. The case p = 1 is a special case, if (Mn,n ≥ 1) is bounded in L1, Mn convergeto M∞ almost surely but we need to add the uniform integrability of the sequence to obtainconvergence in L1.

For a proof of these theorems see for instance [Williams(1991)] chapter 11 and 12.

3.1.1 Consequence and examples of uses

We first remind a deterministic lemma know as Kronecker Lemma.

Lemma 3.1.3 (Kronecker Lemma). Let (An,n≥ 1) be an increasing sequence of strictly positivereal numbers, such that limn→+∞ An =+∞.

Let (εk,k ≥ 1) be a sequence of real numbers, such that Sn = ∑nk=1 εk/Ak converge when n

goes to +∞.Then :

limn→+∞

1An

n

∑k=1

εk = 0.

Proof. As Sn converge, we can write Sn = S∞ +ηn, ηn being a sequence converging to 0 whenn goes to +∞. Moreover, using Abel transform :

n

∑k=1

εk =n

∑k=1

Akεk

Ak= AnSn−

n−1

∑k=1

Sk (Ak+1−Ak) .

So :

1An

n

∑k=1

εk = S∞ +ηn−S∞

An−A1

An− 1

An

n−1

∑k=1

ηk (Ak+1−Ak)

= ηn +S∞

A1

An− 1

An

n−1

∑k=1

ηk (Ak+1−Ak) .

(3.1)

The first two terms converge to 0 (as An converge to +∞ and ηn to 0 when n goes to +∞). Thethird one is a (generalized) Cesaro mean of a sequence which converges to 0 so it converges to0.

Page 71: Monte Carlo Methods and Stochastic Algorithmsbl/monte-carlo/poly.pdf · 1.1. ON THE CONVERGENCE RATE OF MONTE-CARLO METHODS 3 1.1 On the convergence rate of Monte-Carlo methods In

3.1. A REMINDER ON MARTINGALE CONVERGENCE THEOREMS 67

A proof of the strong law of large number As an application of martingale convergencetheorem, we will give a short proof of the strong law of large numbers for a square integrablerandom variable X . Let (Xn,n ≥ 1) be a sequence of independent random variables followingthe law of X . Denote by X ′n = Xn−E(X).

Let Fn = σ(Xk,k ≤ n) and Mn be :

Mn =n

∑k=1

X ′kk.

Note that, using independence and E(X ′k) = 0, Mn is an F -martingale. Moreover, using onceagain independence, we get :

E(M2n) = Var (X)

n

∑k=1

1k2 < K <+∞.

So the martingale M is bounded in L2, and using theorem 3.1.4 converge almost surely to M∞.Using Kronecker lemma this implies that :

limn→+∞

1n

n

∑k=1

X ′k = 0,

or limn→+∞1n ∑

nk=1 Xk = E(X).

We can relax the L2 hypothesis to obtain the full strong law of large numbers under thetraditional L1 condition. See the following exercise (and [Williams(1991)] for a solution ifneeded).

Exercice 13. Suppose that (Xn,n ≥ 1) are independent variables following the law of X , withE(|X |)<+∞. Define Yn by :

Yn = Xn1|Xn|≤n.

1. Prove that limn→+∞E(Yn) = E(X).

2. Prove that ∑+∞

n=1P(|X |> n)≤ E(|X |), and deduce that

P(Exists n0(ω), for all n≥ n0, Xn = Yn) = 1.

3. Check that Var (Yn)≤ E(|X |21|X |≤n

)and prove that :

∑n≥1

Var (Yn)

n2 ≤ E(|X |2 f (|X |)

),

wheref (z) = ∑

n≥max(1,z)

1n2 ≤

2max(1,z)

.

Deduce that ∑n≥1 Var (Yn)/n2 ≤ 2E(|X |)<+∞.

Page 72: Monte Carlo Methods and Stochastic Algorithmsbl/monte-carlo/poly.pdf · 1.1. ON THE CONVERGENCE RATE OF MONTE-CARLO METHODS 3 1.1 On the convergence rate of Monte-Carlo methods In

68 CHAPTER 3. INTRODUCTION TO STOCHASTIC ALGORITHMS

4. Let Wn =Yn−E(Yn), prove that ∑k≥nWkk converge when n goes to +∞, and deduce, using

Kronecker lemma, that

limn→+∞

1n ∑

k≤nWk = 0,

then deduce limn→+∞1n ∑k≤nYn = E(X).

5. Using the result of question 2, prove that limn→+∞1n ∑k≤n Xn = E(X)

An extension of the super-martingale convergence theorem For the proof of the conver-gence of stochastic algorithms we will need on extension of the super-martingale convergencetheorem 3.1.3 known as Robbins-Sigmund lemma.

Lemma 3.1.4 (Robbins-Sigmund lemma). Assume Vn, an, bn, cn are sequences of positive ran-dom variables adapted to (Fn,n≥ 0) and that, moreover, for all n≥ 0 :

E(Vn+1|Fn)≤ (1+an)Vn +bn− cn,

then, on

∑n≥1 an <+∞,∑n≥1 bn <+∞

, Vn converge to a random variable V∞ and ∑n≥1 cn <+∞.

Proof. Let :

αn =1

∏nk=1 (1+ak)

.

Then define V ′n = αn−1Vn, b′n = αnbn, c′n = αncn. Clearly the hypothesis can be rewritten as :

E(V ′n+1|Fn

)≤V ′n +b′n− c′n.

This means, if Xn =V ′n−∑n−1k=0(b

′k− c′k), that Xn is a super-martingale.

Now consider the stopping time τa :

τa = inf

n≥ 0,

n−1

∑k=0

(b′k− c′k)≥ a

,

(the infimum is +∞ if the set is empty).τa is a stooping time such that, if n ≤ τa, then Xn ≥ −a. So Xn∧τa is a super-martingale,

bounded from below, so it converges. So we can conclude that limn→+∞ Xn exists on the set∪a>0 τa =+∞. But, as an is positive, αn is a positive, decreasing sequence, so it converges toα when n goes to +∞. Moreover :

αn =1

∏nk=1 (1+ak)

= e−∑nk=0 log(1+ak) ≥ e−∑

nk=0 ak .

So on the set

∑n≥1 an <+∞,∑n≥1 bn <+∞

, α > 0. It follows that ∑n≥1 b′n < +∞, and as,c′n > 0 that, for all n≥ 0

n

∑k=0

b′k−n

∑k=0

c′k ≤n

∑k=0

b′k <+∞.

Page 73: Monte Carlo Methods and Stochastic Algorithmsbl/monte-carlo/poly.pdf · 1.1. ON THE CONVERGENCE RATE OF MONTE-CARLO METHODS 3 1.1 On the convergence rate of Monte-Carlo methods In

3.2. ALMOST SURE CONVERGENCE FOR SOME STOCHASTIC ALGORITHMS 69

So for a > ∑k≥0 b′k, Ta =+∞, and we can conclude that, on the set∑n≥1

an <+∞, ∑n≥1

bn <+∞

,

Xn converge to X∞ when n goes to ∞. Moreover :

0≤n

∑k=0

c′k = Xn−V ′n +n

∑k=0

b′k,

so, as Vn ≥ 0, and Xn and ∑nk=0 b′k converge when n goes to +∞, we get that ∑k≥1 c′k is finite,

then, as αn converge to α > 0 that ∑k≥1 ck is finite.

3.2 Almost sure convergence for some stochastic algorithms

3.2.1 Almost sure convergence for the Robbins-Monro algorithm

Theorem 3.2.1. Let f be a function from Rd to Rd . Assume that :

H1 Hypothesis on the function f . f is continuous and there exists x∗ ∈Rd , such that f (x∗)= 0and 〈 f (x),x− x∗〉> 0 for x 6= x∗.

H2 Hypothesis on the step size γ . (γn,n≥ 1) is a decreasing sequence of positive real numberssuch that ∑n≥1 γn =+∞ and ∑n≥1 γ2

n <+∞.

H3 Hypothesis on the sequence Y . (Fn,n ≥ 0) is a filtration on a probability space and(Yn,n≥ 1) is sequence of random variables on this probability space such that

H3.1 E(Yn+1|Fn) = f (Xn),

H3.2 E(|Yn+1− f (Xn)|2 |Fn

)≤ σ2(Xn) where

s2(x) = σ2(x)+ f 2(x)≤ K(1+ |x|2).

Define the sequence (Xn,n≥ 0), by X0 = x0, where x0 is a point in Rd and, for n≥ 0

Xn+1 = Xn− γnYn+1.

Then limn→+∞ Xn = x∗, a.s..

Remark 3.2.1. • The main application of this algorithm arise when f (x) can be written as

f (x) = E(F(x,U)) ,

where U follows a known law and F is a function. In this case Yn is defined as Yn =F(Xn,Un+1) where (Un,n≥ 1) is a sequence of independent random variables followingthe law of U . Clearly, if Fn = σ(U1, . . . ,Un), we have

E(F(Xn,Un+1)|Fn) = f (Xn).

Page 74: Monte Carlo Methods and Stochastic Algorithmsbl/monte-carlo/poly.pdf · 1.1. ON THE CONVERGENCE RATE OF MONTE-CARLO METHODS 3 1.1 On the convergence rate of Monte-Carlo methods In

70 CHAPTER 3. INTRODUCTION TO STOCHASTIC ALGORITHMS

MoreoverE(|F(Xn,Un+1)− f (Xn)|2|Fn

)= σ

2(Xn),

where σ2(x) =E(|F(x,U)− f (x)|2

). So H3.2 is an hypothesis on behavior of the expec-

tation and the variance of F(x,U) when |x| goes to +∞.

• The hypothesis H1 is fulfilled when f (x) = ∇V (x) where V is a strictly convex functionand there exists x∗ which minimize V (x). This will be the case in most of our examples.

Proof. First note that hypothesis H3 implies that

E(Y 2

n+1|Fn)= E

((Yn+1− f (Xn))

2|Fn)+E

(f (Xn)

2)≤ s2(Xn)≤ K(1+ |Xn|2)≤ K′(1+ |Xn− x∗|2).

(3.2)

Let V (x) = |x− x∗|2 and set Vn =V (Xn). Clearly :

Vn+1 =Vn + γ2nY 2

n+1−2γn〈Xn− x∗,Yn+1〉.

Taking the conditional expectation we obtain :

E(Vn+1|Fn) =Vn + γ2nE(Y 2

n+1|Fn)−2γn〈Xn− x∗,E(Yn+1|Fn)〉,

and using hypothesis H3 we get

E(Vn+1|Fn)≤Vn + γ2n s2(Xn)−2γn〈Xn− x∗, f (Xn)〉.

So, using inequality 3.2, we obtain :

E(Vn+1|Fn)≤Vn +Kγ2n (1+Vn)−2γn〈Xn− x∗, f (Xn)〉

=Vn(1+Kγ

2n)+Kγ

2n −2γn〈Xn− x∗, f (Xn)〉.

(3.3)

So, using Robbins-Sigmund lemma, with an = bn = Kγ2n , and cn = γn〈Xn−x∗, f (Xn)〉 (which is

positive because of hypothesis H1), we get (by hypotheses H2 ∑n≥1 γ2n <+∞) that, both

• Vn converge to V∞, almost surely,

• ∑n≥1 γn〈Xn− x∗, f (Xn)〉<+∞.

Obviously V∞ is a positive random variable, and we only need to check that this random is equalto 0.

Assume that P(V∞ > 0)> 0, then on the set V∞ > 0 we have 0 <V∞/2≤Vn ≤ 3V∞/2 forn≥ n0(ω), so

∑n≥1

γn〈Xn− x∗, f (Xn)〉 ≥ infV∞/2≤|x−x∗|2≤3V∞

〈x− x∗, f (x)〉 ∑n≥n0

γn.

But ∑n≥n0 γn = +∞ and infV∞/2≤|x−x∗|2≤3V∞〈x− x∗, f (x)〉 > 0 (remind that f , and so 〈x−

x∗, f (x)〉, are continuous and V∞/2≤ |x− x∗|2 ≤ 3V∞ is a compact set). So on the set V∞ > 0we should have ∑n≥1 γn〈Xn− x∗, f (Xn)〉 = +∞, but we know that this sum is almost surely fi-nite. So we have proved that P(V∞ > 0) = 0, which prove the almost sure convergence of thealgorithm.

Page 75: Monte Carlo Methods and Stochastic Algorithmsbl/monte-carlo/poly.pdf · 1.1. ON THE CONVERGENCE RATE OF MONTE-CARLO METHODS 3 1.1 On the convergence rate of Monte-Carlo methods In

3.2. ALMOST SURE CONVERGENCE FOR SOME STOCHASTIC ALGORITHMS 71

3.2.2 Almost sure convergence for the Kiefer-Wolfowitz algorithm

The Kiefer-Wolfowitz algorithm is a variant of the Robbins-Monro algorithm. Its convergencecan be proved using the Robbins-Siegmund lemma.

Theorem 3.2.2. Let φ be a function from R to R, such that

φ(x) = E(F(x,U)) ,

where U is a random variable taking its values in Rp and F is a function from R×Rp to R.We assume that

• φ is a C2 strictly convex function such that∣∣φ ′′(x)∣∣≤ K(1+ |x|),

and there exist x∗ which minimize φ on R.

• (γn,n≥ 1) and (cn,n≥ 1) are decreasing sequence of real numbers such that

∑n≥1

γn =+∞, ∑n≥1

γncn <+∞, ∑n≥1

γ2n

c2n<+∞,

• s2(x) = E(F2(x,U)

)≤ K(1+ |x|).

• (U1n ,n ≥ 1) and (U2

n ,n ≥ 1) are 2 independent sequences of independent random vari-ables following the of U.

We define (Xn,n≥ 0) by X0 = x0 ∈ R and, inductively

Xn+1 = Xn− γnF(Xn + cn,U1

n+1)−F(Xn− cn,U2n+1)

2cn.

Thenlim

n→+∞Xn = x∗,a.s.

Proof. First note that because of the assumptions on φ , for |c| ≤ 1,∣∣φ(x+ c)−φ(x− c)−2cφ′(x+ c)

∣∣≤ c2K (1+ |x− x∗|) . (3.4)

Define Vn = |Xn− x∗|2, clearly

Vn+1 =Vn + |Xn+1−Xn|2 +2(Xn− x∗)(Xn+1−Xn) .

Moreover

|Xn+1−Xn|2 ≤2γ2

n4c2

n

(∣∣ f (Xn + cn,U1n+1∣∣2 + ∣∣ f (Xn− cn,U2

n+1∣∣2 .)

Page 76: Monte Carlo Methods and Stochastic Algorithmsbl/monte-carlo/poly.pdf · 1.1. ON THE CONVERGENCE RATE OF MONTE-CARLO METHODS 3 1.1 On the convergence rate of Monte-Carlo methods In

72 CHAPTER 3. INTRODUCTION TO STOCHASTIC ALGORITHMS

So :

E(|Xn+1−Xn|2 |Fn

)≤ γ2

n2c2

n

(s2(Xn + cn)+ s2(Xn− cn)

),

and

E(Vn+1|Fn)≤Vn

A1 +γ2

n2c2

n

(s2(Xn + cn)+ s2(Xn− cn)

)A2 − γn

cn(Xn− x∗)

[φ(Xn + cn)−φ(Xn− cn)−2cnφ

′(Xn)]

A3 − γn(Xn− x∗)φ ′(Xn).

(3.5)

Now, assuming that n is large enough to have cn ≤ 1,

A1≤ γ2n

c2n

K(1+ |Xn + cn|2 + |Xn− cn|2

)≤ γ2

nc2

nK′(1+ |Xn|2

)≤ γ2

nc2

nK′′(1+ |Xn− x∗|2

)=

γ2n

c2n

K′′(1+Vn),

(3.6)

Note that K < K′ < K′′ but that, as usual in this kind of proof, K′′ is still denoted by K 1.Moreover using (3.4), we obtain (using also |x| ≤ (1+ x2)/2)

A2≤ 2 |Xn− x∗| γn

2cnKc2

n = Kγncn |Xn− x∗| ≤ Kγncn

(1+ |Xn− x∗|2

)= Kγncn(1+Vn).

Finally we obtain

E(Vn+1|Fn)≤Vn(1+Kγ2

nc2

n+Kγncn)+K

γ2n

c2n+Kγncn− γn(Xn− x∗)φ ′(Xn).

Now we can using the Robbins-Siegmund lemma, setting

• an = bn = K γ2n

c2n+Kγncn

• cn = γn(Xn− x∗)φ ′(Xn), which is positive because of the convexity of φ .

So, using the fact that ∑n≥1 an = ∑n≥1 bn < +∞, we obtain Vn converge to V∞ and ∑n≥1 cn <+∞. Now using the same argument as in the proof the convergence of the Robbins-Monroalgorithm, we can conclude that P(V∞ = 0) = 1. This ends the proof of the convergence of thealgorithm.

1K denote a “constant” which can change from line to line (and even in the same line)!

Page 77: Monte Carlo Methods and Stochastic Algorithmsbl/monte-carlo/poly.pdf · 1.1. ON THE CONVERGENCE RATE OF MONTE-CARLO METHODS 3 1.1 On the convergence rate of Monte-Carlo methods In

3.3. SPEED OF CONVERGENCE OF STOCHASTIC ALGORITHMS 73

3.3 Speed of convergence of stochastic algorithms

3.3.1 Introduction in a simplified context

Robbins-Monro type algorithms are well known to cause problems of speed of convergence.We will see that these algorithms can lead to central limit theorem (convergence in C/

√n) but

not for an arbitrary choice of γn, in some sense, γn has to be large enough to have an optimalrate of convergence.

It is easy to show this in a simplified framework. We assume that

F(x,u) = cx+u,

where c > 0 and that U follows a Gaussian law with mean 0 and variance 1. The standardRobbins-Monro algorithm can be written as

Xn+1 = Xn− γn (cXn +Un+1) .

with γn = α/(n+ 1). In this case f (x) = cx and, using theorem 3.2.1, we can prove that Xnconverge, almost surely, to 0 when n goes to +∞.

To obtain more explicit computations we replace the discrete dynamic by a continuous one

dXt =−γt (cXtdt +σdWt) ,X0 = x.

where γt =α

t+1 . Using a standard way to solve this equation, we compute

d(

ec∫ t

0 γsdsXt

)= ec

∫ t0 γsds [cγtXtdt− cγtXt− γtσdWt ] =−ec

∫ t0 γsds

γtσdWt .

Butec∫ t

0 γsds = ecα∫ t

01

s+1 ds = (t +1)cα .

So, solving the previous equation, we get

Xt =X0

(t +1)cα− σα

(t +1)cα

∫ t

0

1(s+1)1−cα

dWs.

An easy computation leads to

E(X2t ) =

|X0|2

(t +1)2cα+

σ2α2

2cα−1

[1

t +1− 1

(t +1)2cα

].

We can now guess the asymptotic behavior of Xt

• if 2αx > 1, E(X2t ) behave as C/(t + 1), so we can hope a central limit behavior for the

algorithm,

• if 2αx < 1, E(X2t ) behave as C/(t + 1)2cα which is worse than the awaited central limit

behavior.

Page 78: Monte Carlo Methods and Stochastic Algorithmsbl/monte-carlo/poly.pdf · 1.1. ON THE CONVERGENCE RATE OF MONTE-CARLO METHODS 3 1.1 On the convergence rate of Monte-Carlo methods In

74 CHAPTER 3. INTRODUCTION TO STOCHASTIC ALGORITHMS

We can check on this example (exercise), that, when 2αx > 1√

tXt converge in distribution toa Gaussian random variable, but that, when 2αx < 1, tcαXt converge almost surely to a randomvariable.

We will see in what follows, that we can fully justify on the discrete algorithm : when α islarge enough, a central limit theorem is true and when α is too small an asymptotic convergenceworse that a central limit theorem occurs.

The result on which relies the proof is the central limit theorem for sequence of martingales(see below).

Practical considerations When using a Robbins-Monro style algorithm, in order to have agood behavior, γn has to be chosen large enough in order to have a central limit theorem, butnot too large in order to minimize the variance of the algorithm. One way to do this is to chooseγn = α/n with α large enough, another way is to choose γn = α/nβ , with 1/2 < β < 1. Butthis last choice increase the variance of the algorithm.

3.3.2 L2 and locally L2 martingales

Definition 3.3.1. Let (Fn,n ≥ 0) be a filtration on a probability space. An F -martingale(Mn,n≥ 0) is called a F -square integrable martingale if, for all n≥ 0, E(M2

n)<+∞.

in this case, we are able to define a very useful object, the bracket of the martingale. We willsee that the bracket of a martingale give a good indication of the asymptotic behavior of themartingale. This object will be useful to write the central limit theorem for martingales.

Definition 3.3.2. Assume (Mn,n≥ 0) is a square integrable martingale. There exists a unique,previsible, increasing process (〈M〉n,n≥ 0), equal at 0 at time 0 such that

M2n −〈M〉n

is a martingale. Moreover 〈M〉n can be defined by 〈M〉0 = 0 and

〈M〉n+1−〈M〉n = E((Mn+1−Mn)

2|Fn)= E

(M2

n+1|Fn)−M2

n .

Proof. Previsible means here that 〈M〉n is Fn−1-measurable. So, it is easy to check that if 〈M〉nis previsible and M2

n −〈M〉n is a martingale

〈M〉n+1−〈M〉n = E(M2

n+1|Fn)−M2

n ,

which proves the unicity of 〈M〉 adding 〈M〉0 = 0. Moreover, using the martingale property ofM, on can check that

E(M2

n+1|Fn)−M2

n = E((Mn+1−Mn)

2|Fn)≥ 0,

which proves that 〈M〉n is increasing.

Page 79: Monte Carlo Methods and Stochastic Algorithmsbl/monte-carlo/poly.pdf · 1.1. ON THE CONVERGENCE RATE OF MONTE-CARLO METHODS 3 1.1 On the convergence rate of Monte-Carlo methods In

3.3. SPEED OF CONVERGENCE OF STOCHASTIC ALGORITHMS 75

The following theorem relate the almost sure asymptotic behavior of a square integrablemartingale Mn to its bracket.

Theorem 3.3.3 (Strong law of large number for martingales). Let (Mn,n ≥ 0) be a squareintegrable martingale and denote by (〈M〉n,n≥ 0) its bracket, then

• on 〈M〉∞ := limn→+∞〈M〉n <+∞, Mn converge almost surely to a random variabledenoted as M∞.

• on 〈M〉∞ =+∞,lim

n→+∞

Mn

〈M〉n= 0,a.s..

Moreover, as soon as a(t) is a positive, increasing function such that∫+∞

0dt

1+a(t) <+∞

limn→+∞

Mn√a(〈M〉n)

= 0,a.s..

Proof. Define τp asτp = infn≥ 0,〈M〉n+1 ≥ p .

τp is a stopping time as 〈M〉 is previsible. Note that, by definition, 〈M〉τp∧n ≤ 〈M〉τp ≤ p.

So M(p)n = Mn∧τp is also a martingale and 〈M(p)〉n = 〈M〉n∧τp (since M2

n∧τp−〈M〉n∧τp is a

martingale).Now remark that, for all n≥ 0

E((M(p)

n )2)= E(M2

0)+E(〈M(p)〉n

)≤ E(M2

0)+ p.

So (Mn∧τp ,n≥ 0) is a martingale bounded in L2 so it converges when n goes to +∞. So on the setτp =+∞

, Mn itself converges to a random variable M∞. As this is true for every p, we have

proved that Mn converge to M∞ on the set ∪p≥0

τp =+∞

. But 〈M〉∞ < p ⊂

τp =+∞

,so

〈M〉∞ <+∞= ∪p≥0 〈M〉∞ < p ⊂ ∪p≥0

τp =+∞.

So, on the set 〈M〉∞ <+∞, Mn converge to M∞, which proves the first point.For the second one, we will consider new martingale defined by

Nn =n

∑k=1

Mk−Mk−1√1+a(〈M〉k)

.

(Nn,n ≥ 0) is a martingale as it is defined as martingale transform of the martingale M (recallthat 〈M〉k is Fk−1-measurable, which is exactly what is needed to prove that N is a martingaletransform).

Moreover it is easy to check that this martingale is still a square integrable martingale andthat we can compute its bracket because

〈N〉n+1−〈N〉n := E(

(Mn+1−Mn)2

1+a(〈M〉n+1)

∣∣∣∣Fn

)=〈M〉n+1−〈M〉n1+a(〈M〉n+1)

.

Page 80: Monte Carlo Methods and Stochastic Algorithmsbl/monte-carlo/poly.pdf · 1.1. ON THE CONVERGENCE RATE OF MONTE-CARLO METHODS 3 1.1 On the convergence rate of Monte-Carlo methods In

76 CHAPTER 3. INTRODUCTION TO STOCHASTIC ALGORITHMS

But :

〈N〉n =n

∑k=1

〈M〉k−〈M〉k−1

1+a(〈M〉k)≤

n

∑k=1

∫ 〈M〉k〈M〉k−1

dt1+a(t)

≤∫ +∞

0

dt1+a(t)

<+∞.

So 〈N〉∞ < +∞, a.s., and, using first part of this theorem, Nn converge a.s. to N∞. But usingKronecker lemma, as we know that

N∞ =∞

∑k=1

Mk−Mk−1√1+a(〈M〉k)

we conclude that limn→+∞Mn√

1+a(〈M〉n)= 0, and so that limn→+∞

Mn√a(〈M〉n)

= 0. The first case is

obtained for a(t) = t2.

Application to the strong law of large numbers Assume that (Xn,n ≥ 1) is a sequenceof independent random variables following the law of X , such that E(|X |2) < +∞. DefineX ′n = Xn−E(X), then

Sn = X1 + · · ·+Xn−nE(X) = X ′1 + · · ·+X ′n,

is a martingale with respect to σ(X1, . . . ,Xn). As, using independence, E((Sn+1−Sn)

2|Fn)=

E((X ′n+1)2|Fn) = Var (X), 〈S〉n = nVar (X). So 〈S〉∞ = ∞ and using the previous theorem we

get limn→+∞ Sn/n = 0, which give the strong law of large numbers.Moreover using a(t) = t1+ε , with ε > 0, we get

limn→+∞

1nε/2

√n

X1 + · · ·+Xn

n−E(X)

= 0,a.s..

So we obtain a useful information on the speed of convergence of X1+···+Xnn to E(X).

Nevertheless, to obtain the central limit theorem a different tool is needed.

Extension to martingales locally in L2 We can extend the definition of the bracket for alarger class of martingale : the martingale locally in L2. We now give some definitions.

Definition 3.3.4. A process (Mn,n ≥ 0) is a local martingale, if there exists a sequence ofstopping times (τp, p≥ 0) such that

• τp increase in p and, a.s., goes to ∞ when p goes to ∞.

• (Mn∧τp,n≥ 0) is a martingale.

Definition 3.3.5. A process (Mn,n ≥ 0) is a locally square integrable martingale, if it existssequence of stopping times (τp, p≥ 0) such that

• τp increase in p and, a.s., goes to ∞ when p goes to ∞.

• (Mn∧τp,n≥ 0) is a martingale bounded in L2.

Page 81: Monte Carlo Methods and Stochastic Algorithmsbl/monte-carlo/poly.pdf · 1.1. ON THE CONVERGENCE RATE OF MONTE-CARLO METHODS 3 1.1 On the convergence rate of Monte-Carlo methods In

3.3. SPEED OF CONVERGENCE OF STOCHASTIC ALGORITHMS 77

Proposition 3.3.1. If (Mn,n≥ 0) is a locally square integrable martingale, there exist a uniqueFn−1-adapted increasing process (〈M〉n,n≥ 0), equal at 0 at time 0, such that (M2

n−〈M〉n,n≥0) is a local martingale.

If τp, increasing with p and, a.s., going to ∞ when p goes to ∞, is such that Mτpn = Mτp∧n is

a square integrable martingale then 〈Mτp〉n = 〈M〉τp∧n.

Proof. The proof of this proposition is left as an exercise.

The strong law of large number remains true and unchange for martingale locally in L2.

Theorem 3.3.6. Let (Mn,n ≥ 0) be a locally square integrable martingale and denote by(〈M〉n,n≥ 0) its bracket, then

• on 〈M〉∞ := limn→+∞〈M〉n <+∞, Mn converge almost surely to a random variabledenoted as M∞.

• on 〈M〉∞ =+∞, as soon as a(t) is a positive, increasing function such that∫+∞

0dt

1+a(t) <+∞

limn→+∞

Mn√a(〈M〉n)

= 0,a.s..

Proof. The proof is almost identical to the square integrable case. it is left as an exercise.

Exercice 14. Let (Xn,n ≥ 1) be a sequence of independent real random variables followingthe law of X , such that P(X = ±1) = 1/2 and by (λn,n ≥ 1) a sequence of random variablesindependant of the sequence (Xn,n ≥ 1). Denote by Fn = σ(X1, . . . ,Xn,λ0, . . . ,λn−1). DefineMn by

Mn =n−1

∑k=0

λkXk+1.

1. Prove that M is an Fn-martingale if and only if E(|λk|)<+∞, for all k ≥ 0.

2. Prove that M is a L2-martingale if and only if, for all k ≥ 0 E(λ 2

k

)<+∞.

3. Prove that M is bounded in L2 if and only if ∑+∞

k=0E(λ 2

k

)<+∞.

4. Prove that M is a locally L2 martingale if and only if, for all k ≥ 0 |λk|<+∞.

5. Give an example of a martingales locally in L2 which is not a martingale.

3.3.3 Central limit theorem for martingales

We begin by stating the central limit theorem for martingales.

Page 82: Monte Carlo Methods and Stochastic Algorithmsbl/monte-carlo/poly.pdf · 1.1. ON THE CONVERGENCE RATE OF MONTE-CARLO METHODS 3 1.1 On the convergence rate of Monte-Carlo methods In

78 CHAPTER 3. INTRODUCTION TO STOCHASTIC ALGORITHMS

Theorem 3.3.7. Let (Mn,n≥ 0) be a locally in L2 martingale and a(n) be a sequence of strictlypositive real numbers increasing to +∞. Assume that

Bracket condition : limn→+∞

1a(n)〈M〉n = σ

2 in probability. (3.7)

Lindeberg condition : for all ε > 0,

limn→+∞

1a(n)

n

∑k=1

E((∆Mk+1)

21|∆Mk+1|≥ε

√a(n)

∣∣∣∣Fk

)= 0, in probability. (3.8)

Then :Mn√a(n)

converge in distribution to σG,

where G is a Gaussian random variable with mean 0 and variance 1.

Remark 3.3.2. Roughly speaking in order to obtain a Central limit theorem for a martingale, weneed to get, first, an asymptotic deterministic estimate for 〈M〉n ≈ a(n) when n goes to infinityand then, to check the Lindeberg condition.

Exercise 16 gives a proof in a simple case in which the role of the martingale hypotheseis clearer than in the detailled proof which is given page 89. For a complete discussion onmartingale convergence theorem, we refer to [Hall and Heyde(1980)].

Application to the standard case It is easy to recover the traditional central limit theoremusing the previous corollary. For this, consider a sequence of independent random variablesfollowing the law of X such that E(|X |2)<+∞. Let a(n) = n and Mn = X1 + · · ·+Xn−nE(X).M is a martingale and its bracket 〈M〉n = nVar (X) (so obviously 〈M〉n/n converge to Var (X) =σ2!). It remains to check the Lindeberg condition. But

1n

n

∑k=1

E(

X2k+11|Xk+1|≥ε

√n

∣∣∣Fk

)= E

(X21|X |≥ε

√n

).

But integrability of X2 and Lebesgue theorem proves that E(

X21|X |≥ε√

n

)converge to 0 when

n goes to ∞.

3.3.4 A central limit theorem for the Robbins-Monro algorithm

We can now derive a central limit theorem for the Robbins-Monro algorithm. We will onlydeal with the uni-dimensional case. This part follows closely [Duflo(1997)] (or [Duflo(1990)]in french).

Theorem 3.3.8. Let F(x,u) be a function from R×Rp to R and U is a random variable takingits values in Rd . We assume that

• f (x) = E(F(x,U)) is C2.

Page 83: Monte Carlo Methods and Stochastic Algorithmsbl/monte-carlo/poly.pdf · 1.1. ON THE CONVERGENCE RATE OF MONTE-CARLO METHODS 3 1.1 On the convergence rate of Monte-Carlo methods In

3.3. SPEED OF CONVERGENCE OF STOCHASTIC ALGORITHMS 79

• f (x∗) = 0 and 〈 f (x),x− x∗〉> 0, for x 6= x∗.

• f ′(x∗) = c, where c > 0.

• If σ2(x) = Var(F(x,U)2), s2(x) = σ2(x)+ f 2(x)≤ K

(1+ |x|2

).

• It exists η > 0 such that, for all x ∈ R

v2+η(x) := E(|F(x,U)|2+η

)<+∞,

and supn≥0 v2+η(Xn)<+∞.

We consider a sequence (Un,n ≥ 1) of independent random variables following the law of Uand γ = α

n . We define Xn by

Xn+1 = Xn− γnF(Xn,Un+1),Xo = x ∈ R.

Then Xn converge almost surely to x∗ and

• if cα > 1/2,√

n(Xn−x∗) converge in distribution to a zero mean Gaussian random vari-able with variance σ2 = α2σ2(x∗)/(2cα−1)

• if cα < 1/2, ncα(Xn− x∗) converge almost surely to a random variable.

Remark 3.3.3. It is easy to optimize in α the asymptotic variance α2σ2(x∗)/(2cα−1) and toprove that the optimal choice is given by α = 1/c.

The same type of TCL can be obtained when γn = α/nβ , with 1/2 < β < 1. In this case itcan be proved that, for every α , (Xn− x∗)/

√γn converge in distribution to a gaussian random

variable, wathever the value of α .

Proof. We begin the proof with a simple deterministic lemma.

Lemma 3.3.4. Let c,α be strictly positive numbers. Assume that γn = α/n.Define αn = 1− cγn, βn = α1 . . .αn, then there exists a strictly positive number C such that

limn→+∞

βnncα =C and limn→+∞

(γn/βn)n1−cα =α

C.

Proof. Using Taylor formula, we get, for 0 < x < 1

0≤ log(1− x)+ x≤ x2

21

(1− x)2 .

So if n≥ n0, such that cγn < 1/2, we have

0≤ log(1− cγn)+ cγn ≤ 2c2γ

2n .

From this we deduce that, if sn = ∑nk=1 γk,

limn→+∞

βnecsn =C1 > 0.

Page 84: Monte Carlo Methods and Stochastic Algorithmsbl/monte-carlo/poly.pdf · 1.1. ON THE CONVERGENCE RATE OF MONTE-CARLO METHODS 3 1.1 On the convergence rate of Monte-Carlo methods In

80 CHAPTER 3. INTRODUCTION TO STOCHASTIC ALGORITHMS

But limn→+∞ ∑nk=1 1/k− log(n) =C2 where C2 is the Euler constant. So we have

limn→+∞

βnnαc =C3 =C1eαC2.

and from this we deduce that limn→+∞(γn/βn)n1−cα = α

C3.

In what follows we will assume that x∗ = 0. Our algorithm writes as

Xn+1 = Xn− γnF(Xn,Un+1)

= Xn(1− cγn)− γn [F(Xn,Un+1)− f (Xn)]− γn [ f (Xn)− cXn] .

Now, denote

• ∆Mn+1 = F(Xn,Un+1)− f (Xn) (∆Mn+1 is a martingale increment),

• Rn = f (Xn)− cXn (Rn will be “small”),

• αn = 1− cγn and βn = α1 . . .αn.

With these notationsXn+1

βn=

Xn

βn−1− γn

βn∆Mn+1−

γn

βnRn,

so

Xn = X0βn−1−βn−1

n−1

∑k=0

γk

βk∆Mk+1−βn−1

n−1

∑k=0

γk

βkRk.

The main point is now to estimate the bracket of Nn the martingale defined by

Nn =n−1

∑k=0

γk

βk∆Mk+1.

From this we will deduce the asymptotic behavior of Xn, by computing its bracket. Clearly

〈N〉n =n−1

∑k=0

γ2k

β 2k

σ2(Xn),

and we already know that Xn converge to x∗ = 0, so, by continuity of σ , limn→+∞ σ2(Xn) =σ2(x∗)> 0.

The case where 2cα > 1 In this case we are interested in the convergence in distribution of√n(Xn− x∗) to a gaussian random variable. The main step will be to apply the central limit

theorem for the martingale N. So we have to get an asymptotic estimate to its bracket 〈N〉n.For this, taking into account that (see previous lemma) γk

βk≈ 1

C kcα−1, we can prove that, for2cα > 1, limn→+∞〈N〉n =+∞, and it is easy to check that

limn→+∞

∑n−1k=0

γ2k

β 2k

σ2(Xn)

∑n−1k=0

γ2k

β 2k

σ2(x∗)= 1,a.s..

Page 85: Monte Carlo Methods and Stochastic Algorithmsbl/monte-carlo/poly.pdf · 1.1. ON THE CONVERGENCE RATE OF MONTE-CARLO METHODS 3 1.1 On the convergence rate of Monte-Carlo methods In

3.3. SPEED OF CONVERGENCE OF STOCHASTIC ALGORITHMS 81

A more precise analysis (left as an exercise) shows that

n1−2cαn−1

∑k=0

γ2k

β 2k= (1+ εn)

α2

C2(2cα−1),

and leads to

limn→+∞

〈N〉nn2cα−1 =

α2σ2(x∗)C2(2cα−1)

,a.s..

So we have the bracket condition needed to apply the central limit theorem for martingale Nwith a(n) = n2cα−1. It remains to check the Lindeberg condition. For this note that

E(|X |21|X |≥δ

∣∣B)≤ E(|X |2+η

∣∣B)δ 2 .

So we have

1a(n)

n−1

∑k=1

E(|∆Nk+1|21

|∆Nk+1|≥ε√

a(n)∣∣∣∣Fk

)≤ 1

a(n)

n−1

∑k=1

1(ε√

a(n))ηE(|∆Nk+1|2+η

∣∣Fk)

=1

a(n)1+η/21

εη

n−1

∑k=1

(γk

βk

)2

E(|F(Xk,Uk+1)− f (Xk)|2+η

∣∣∣Fk

)≤ L(ω)

εη

1a(n)1+η/2

n−1

∑k=1

(γk

βk

)2

,

where L(ω) = supn≥0 v2+η(Xn) (which is supposed to be a.s. finite by hypothesis). So we get

1a(n)

n−1

∑k=1

E(|∆Nk+1|21

|∆Nk+1|≥ε√

a(n)∣∣∣∣Fk

)≤ L(ω)

εη

1a(n)1+η/2

n−1

∑k=1

(γk

βk

)2

≈ Kn−(2cα−1)(1+η/2)n(cα−1)(2+η)+1= Kn−η/2.

And this ends the proof that Lindeberg condition is fulfilled. So Nn/√

a(n) converge in dis-

tribution to a gaussian random variable with variance α2σ2(x∗)C2(2cα−1) , and so

√nβnNn converge to a

gaussian random variable with variance α2σ2(x∗)(2cα−1) .

To conclude (using Slutzky lemma) that√

nXn converge in distribution to the same randomvariable we need to prove that

√nβn−1X0 converge to 0 in probability (this is immediate) and

that εn =√

nβn−1 ∑n−1k=0

γkβk

Rk also converge to 0 in probability. This part is heavily technical andwill not be proved here2.

2Note, though, that, |Rn| ≤C|Xn|2, so if we can prove that E(|Xn|2)≤ K/n, we have

√nβn−1

n−1

∑k=0

E(∣∣∣∣ γk

βkRk

∣∣∣∣)≤√nβn−1

n−1

∑k=0

γk

kβk≤ K/

√n.

Page 86: Monte Carlo Methods and Stochastic Algorithmsbl/monte-carlo/poly.pdf · 1.1. ON THE CONVERGENCE RATE OF MONTE-CARLO METHODS 3 1.1 On the convergence rate of Monte-Carlo methods In

82 CHAPTER 3. INTRODUCTION TO STOCHASTIC ALGORITHMS

The case where 2cα < 1 In this case, we will prove that Xn−x∗βn

converge almost surely to arandom variable.

For this, we first check that

〈N〉n ≈ σ2(x∗)

n−1

∑k=0

γ2k

β 2k≈ σ

2(x∗)α2Kn−1

∑k=0

k2cα−2 <+∞.

So, using the strong law for martingale, Nn converge a.s. to N∞. If remains to check that∑

n−1k=0

γkβk

Rk converge a.s. to obtain the result of the theorem.

But, as |Rk| ≤C|Xk|2, we have

E

(∣∣∣∣∣n−1

∑k=0

γk

βkRk

∣∣∣∣∣)≤CE

(n−1

∑k=0

γk

βk|Xk|2

)≤C

n−1

∑k=0

γk

βkE(|Xk|2)≈C

n−1

∑k=0

E(|Xk|2)k1−cα

.

So if we can prove there exist β > cα such that E(|Xn|2) ≤ Cnβ

, we will be able to deduce theabsolute convergence of ∑

n−1k=0

γkβk

Rk. For a proof of this point we refer to [Duflo(1997)].

So, εn converge in L1 (and so in probability) to 0. See [Duflo(1997)] for details on the end of this proof

Page 87: Monte Carlo Methods and Stochastic Algorithmsbl/monte-carlo/poly.pdf · 1.1. ON THE CONVERGENCE RATE OF MONTE-CARLO METHODS 3 1.1 On the convergence rate of Monte-Carlo methods In

3.4. EXERCISES AND PROBLEMS 83

3.4 Exercises and problems

Exercice 15. We are interested in at the solution of

dXt =−γt (cXtdt +σdWt) ,X0 = x. (3.9)

where c, σ are positive real numbers, (Wt , t ≥ 0) is a brownian motion and

γt =α

t +1.

1. When σ = 0 prove that the unique solution of (3.9) is given by

Xt = xe−c∫ t

0 γsds =x

(t +1)cα

2. When σ is not zero, prove that

Xt =x

(t +1)cα− σα

(t +1)cα

∫ t

0

1(s+1)1−cα

dWs.

3. When 2αx > 1, prove that Xt is a gaussian random variable and that√

(t +1)Xt convergein distribution to a gaussian random variable with mean 0 and variance σ2α2

(2cα−1)G.

4. When 2αx > 1, prove that√

(t +1)cαXt converge almost surely to a finite random vari-able Z∞

Z∞ = x−σα

∫ +∞

0

1(s+1)1−cα

dWs.

Prove that Z∞ is a gaussian random variable. Compute its mean and its variance.

Exercice 16. In this exercise, we prove the central limit theorem for martingale in a specialcase.

Let (Mn,n≥ 0) be a martingale such that supn≥0 |∆Mn| ≤K <+∞, where ∆Mn = Mn−Mn−1and K is a constant. M is a square integrable martingale (why?) and, so, we can denote by 〈M〉its bracket. Assume moreover that

limn→+∞

〈M〉nn

= σ2,a.s. (3.10)

where σ is a positive real number.

1. For λ real, let φ j(λ ) = logE(

eλ∆M j

∣∣∣F j−1

), prove that

Xn = exp

(λMn−

n

∑j=1

φ j(λ )

),

is a martingale.

Page 88: Monte Carlo Methods and Stochastic Algorithmsbl/monte-carlo/poly.pdf · 1.1. ON THE CONVERGENCE RATE OF MONTE-CARLO METHODS 3 1.1 On the convergence rate of Monte-Carlo methods In

84 CHAPTER 3. INTRODUCTION TO STOCHASTIC ALGORITHMS

2. We want to extend φ j(z) to z a complex numbers. For this, we define the complex loga-rithm around 1 as, for |z| ≤ 1/2

log(1+ z) = ∑k≥1

(−1)k+1 zk

k. (3.11)

We this definition, one can prove that elog(1+z) = 1+ z for |z| ≤ 1/2, e denoting the com-plex exponential defined by ez = ∑k≥0

zk

k! .

Prove that, for u real,∣∣eiu∆M j −1

∣∣≤ e|u|K−1, then that∣∣∣E(eiu∆M j∣∣∣F j−1

)−1∣∣∣≤ e|u|K−1,

For |u| ≤CK = 1K log(3/2), prove that we can define, using the definition (3.11)

φ j(iu) = logE(

eiu∆M j∣∣∣F j−1

),

and that we have eφ j(iu) = E(

eiu∆M j∣∣F j−1

).

3. Prove that, for |u| ≤CK , (exp

iuMn−

n

∑j=1

φ j(iu)

,n≥ 0

)is a (complex) martingale.

4. Le u be a given real number, show that for a n large enough

E

[exp

(iu

Mn√n−

n

∑j=1

φ j(iu/√

n)

)]= 1.

5. Prove that, for x a complex number such that |x| ≤ 1/2

|ex−1− x− x2/2| ≤ |x|3 and | log(1+ x)− x| ≤ |x|2.

6. Show that, for n large enough∣∣∣∣E(ei u√n ∆M j

∣∣∣F j−1

)−1+

u2

2nE((∆M j)

2∣∣F j−1)∣∣∣∣≤ u3

n3/2 K3,

and that, for a c > 0 (depending on u), for n large enough, for all j ≤ n∣∣∣∣φ j

(iu√

n

)+

u2

2nE((∆M j)

2∣∣F j−1)∣∣∣∣≤ c

n3/2 ,

and deduce, using (3.10), that, for a given u

limn→+∞

n

∑j=1

φ j

(iu√

n

)=−σ2u2

2,a.s.

Page 89: Monte Carlo Methods and Stochastic Algorithmsbl/monte-carlo/poly.pdf · 1.1. ON THE CONVERGENCE RATE OF MONTE-CARLO METHODS 3 1.1 On the convergence rate of Monte-Carlo methods In

3.4. EXERCISES AND PROBLEMS 85

7. Proves that

limn→+∞

E

[exp

(iu

Mn√n−

n

∑j=1

φ j(iu/√

n)

)]−E

[exp(

iuMn√

n+

σ2u2

2

)]= 0,

and deduce that limn→+∞E[exp(

iu Mn√n

)]= exp

(σ2u2

2

). Conclude that Mn√

n converge indistribution to a gaussian random variable.

8. Generalize the result when

limn→+∞

〈M〉na(n)

= σ2,a.s.

where a(n) is a sequence of positive real numbers increasing to +∞ with n.

Exercice 17. Assume that (un,n≥ 0) and (bn,n≥ 0) are two sequence of positive real numbers,c > 0 such that, for all n≥ 0

un+1 ≤ un

(1− c

n

)+bn.

1. Let βn = 1/(∏

n−1k=1(1−

ck)). Prove that

un ≤Kβn

+1βn

n−1

∑k=1

βk+1bk ≤Kβn

+n−1

∑k=1

bk

2. Assume that bk =Ckα , with α > 1, prove that

un ≤Knc +

Knα−1 ≤

Kninf(c,α−1)

.

Exercice 18. We assume that Xn converge in probability to X and that |Xn| ≤ X with E(X)<

+∞. We want to prove that limn→+∞E(Xn) = E(X).

1. Let K be a positive real number and define φK(x) by φK(x) = (−K)1x<−K+x1|x|≤K+K1K<x. Prove that |φK(x)−φK(y)| ≤ |x− y|.

2. Prove thatE(|Xn−X |)≤ E(|φK(Xn)−φK(X)|)+2E(X1X≥K).

3. Prove that limK→∞E(X1X≥K) = 0.

4. Prove, for a given K, that

E(|φK(Xn)−φK(X)|)≤ 2KP(|Xn−X | ≥ ε)+ ε,

and deduce the extended Lebesgue theorem.

Page 90: Monte Carlo Methods and Stochastic Algorithmsbl/monte-carlo/poly.pdf · 1.1. ON THE CONVERGENCE RATE OF MONTE-CARLO METHODS 3 1.1 On the convergence rate of Monte-Carlo methods In

86 CHAPTER 3. INTRODUCTION TO STOCHASTIC ALGORITHMS

Exercice 19. We assume that Xn converge in distribution to X and that f is a continuous functionsuch that E(| f (Xn)|)<+∞, for all n≥ 1 and satisfies a property of equi-integrability

limA→+∞

E(| f (Xn)|1| f (Xn)|>A

)= 0

We want to prove that limn→+∞E( f (Xn)) = E( f (X)).

1. Prove that supn≥1E(| f (Xn)|) < +∞ and, considering the continuous function f (x)∧K,that E(| f (X)|)≤ supn≥1E(| f (Xn)|).

2. Prove that there exists a family of continuous functions φ Aδ(x), x ∈ R such that, φ A

δ(x) ≤

1x>A and for every x ∈ R, φ Aδ(x) converge to 1x>A.

3. Prove that

|E( f (Xn)−E( f (X))| ≤∣∣∣E( f (Xn)[1−φ

Aδ(Xn)

])−E

(f (X)

[1−φ

Aδ(X)])∣∣∣

+E(| f (Xn)|1| f (Xn)|>A

)+E

(| f (X)|1| f (X)|>A

),

and conclude.

4. Find an alternative way to recover the result of the exercise 18.

PROBLEM 10. Une méthode de Monte-Carlo adaptativeOn considère une fonction f , mesurable et bornée, de Rp dans R et X une variable aléatoire

à valeur dans Rp.On s’intéresse à un cas où l’on sait représenter E( f (X)) sous la forme

E( f (X)) = E(H( f ;λ ,U)) , (3.12)

où λ ∈ Rn, U est une variable aléatoire à valeur dans [0,1]d suivant une loi uniforme et où,pour tout λ ∈Rn, H( f ;λ ,U) est une variable aléatoire de carré intégrable qui prend des valeursréelles. La question 2 montrer que cela est généralement possible.

Le but de ce problème est de montrer que l’on peut, dans ce cas, faire varier λ au cours destirages tout en conservant les propriétés de convergence d’un algorithme de type Monte-Carlo.

1. On note φ la fonction de répartition d’une gaussienne centrée réduite et φ−1 son inverse.On suppose que X est une variable aléatoire réelle de loi gaussienne centrée réduite.

Montrer que, si λ ∈ R, H définie par :

H( f ;λ ,U) = e−λG− λ22 f (G+λ ) , où G = φ−1(U)

permet de satisfaire la condition (3.12).

Page 91: Monte Carlo Methods and Stochastic Algorithmsbl/monte-carlo/poly.pdf · 1.1. ON THE CONVERGENCE RATE OF MONTE-CARLO METHODS 3 1.1 On the convergence rate of Monte-Carlo methods In

3.4. EXERCISES AND PROBLEMS 87

2. Soit X une variable aléatoire à valeur dans Rp, de loi arbitraire, que l’on peut obtenir parune méthode de simulation: cela signifie qu’il existe une fonction ψ de [0,1]d dans Rp

telle que, si U = (U1, . . . ,Ud) suit une loi uniforme sur [0,1]d , la loi de ψ(U) est identiqueà celle de X .

On pose, pour i = 1, . . . ,d, Gi = φ−1(Ui) et l’on considère λ = (λ1, . . . ,λd) ∈ Rd . Pro-poser à partir de (G1 + λ1, . . . ,Gd + λd) un choix de variable aléatoire H( f ;λ ,U) quipermet de satisfaire l’égalité (3.12).

3. On considère une suite (Un,n ≥ 1) de variables aléatoires indépendantes uniformémentdistribuées sur [0,1]d . On note Fn = σ

(Uk,1≤ k ≤ n

).

Soit λ un vecteur fixé, comment peut on utiliser H( f ;λ ,U) pour estimer E( f (X)) ?Comment peut-on estimer l’erreur commise sur cette estimation ?

Quel est le critère pertinent pour choisir λ ?

On suppose que λ n’est plus constant mais évolue au fils du temps et est donné par une suite(λn,n≥ 0) de variables aléatoires Fn-mesurable (λ0 est supposée constante). On suppose que

pour tout λ ∈ Rp,s2(λ ) = Var (H( f ;λ ,U))<+∞,et s2(λ ) est une fonction continue de Rd dans R.

(3.13)

4. On pose

Mn =n−1

∑i=0

[H( f ;λi,Ui+1)−E( f (X))] .

Montrer que, si H est uniformément bornée3, (Mn,n ≥ 0) est une Fn-martingale dont lecrochet 〈M〉n s’exprime sous la forme 〈M〉n = ∑

n−1i=0 s2(λi).

5. Plus généralement montrer que, sous l’hypothèse (3.13), (Mn,n ≥ 0) est une martingalelocalement dans L2 de crochet toujours donné par 〈M〉n = ∑

n−1i=0 s2(λi) (on pourra utiliser

la famille de temps d’arrêt τA = infn≥ 0, |λn|> A et vérifier que (Mt∧τA,n≥ 0) est unemartingale bornée dans L2).

6. En utilisant la loi forte de grand nombre pour les martingales localement dans L2, montrerque si

limn→+∞

∑n−1i=0 s2(λi)

n= c, (3.14)

où c est une constante strictement positive, alors

limn→+∞

1n

n−1

∑i=0

H( f ;λi,Ui+1) = E( f (X)).

Interpréter ce résultat en terme de méthode de Monte-Carlo. Vérifier que, si λn convergepresque sûrement vers λ ∗ tel que s2(λ ∗)> 0, le résultat de cette question est vrai.

3i.e., il existe K réel positif tel que pour tout λ et u, |H( f ;λ ,u)| ≤ K <+∞

Page 92: Monte Carlo Methods and Stochastic Algorithmsbl/monte-carlo/poly.pdf · 1.1. ON THE CONVERGENCE RATE OF MONTE-CARLO METHODS 3 1.1 On the convergence rate of Monte-Carlo methods In

88 CHAPTER 3. INTRODUCTION TO STOCHASTIC ALGORITHMS

7. Quelle hypothèse faudrait-t’il ajouter à (3.14) pour obtenir un théorème central limitedans la méthode précédente (ne pas chercher à la vérifier) ? Enoncer le résultat que l’onobtiendrait alors.

Page 93: Monte Carlo Methods and Stochastic Algorithmsbl/monte-carlo/poly.pdf · 1.1. ON THE CONVERGENCE RATE OF MONTE-CARLO METHODS 3 1.1 On the convergence rate of Monte-Carlo methods In

Appendix A

A proof of the central limit theorem formartingales

The proof given here is a slightly adapted version of [Major(2013)].

Proof. We will need an extension of Lebesgue theorem (also known as Lebesgue theorem) which saythat if Xn converge in probability1 to X and |Xn| ≤ X with E(X)<+∞, then limn→+∞E(Xn) = E(X) (seeexercise 18 for a proof).

We denote by M(k) the locally in L2 martingale M(k)j = M j/

√a(k) and by 〈M(k)〉 its bracket. Then

we introduce for each k ≥ 0 the stopping time

τk = inf

j ≥ 0,〈M(k)〉 j+1 > 2σ2, (A.1)

The random variable τk is a stopping time with respect to the σ -algebras F j, j≥ 0, since the random vari-able 〈M(k)〉 j+1 is F j measurable. Moreover P(limk→+∞ τk =+∞) = 1, because P(τk > j) = P(〈M〉 j+1 ≤2σ2a(k)) and a(k) tends to +∞ when k goes to +∞. Now introduce, the stopped process M[k] defined by

M[k]j = M(k)

j∧τk.

We can check that M[k] is a martingale bounded in L2 as

〈M[k]〉 j = 〈M(k)〉 j∧τk ≤ 2σ2, (A.2)

so E(〈M[k]〉 j

)≤ 2σ2 < +∞ and E

((M[k]

j )2)≤ E

((M(k)

0 )2)+ 2σ2 < +∞. So M[k] is an L2 martingale

whose braket can be computed as

∆〈M[k]〉 j := 〈M[k]〉 j−〈M[k]〉 j−1 = E((∆M[k]

j )2∣∣∣F j−1

), where ∆M[k]

j := M[k]j −M[k]

j−1.

Note that we can rewrite the Lindeberg condition (3.8) as

limn→+∞

k

∑j=1

E((∆M(k)

j )21|∆M(k)j |≥ε

∣∣∣∣F j−1

)= 0, in probability.

1which is a weaker asumption than the almost surely convergence usually assumed.

89

Page 94: Monte Carlo Methods and Stochastic Algorithmsbl/monte-carlo/poly.pdf · 1.1. ON THE CONVERGENCE RATE OF MONTE-CARLO METHODS 3 1.1 On the convergence rate of Monte-Carlo methods In

90APPENDIX A. A PROOF OF THE CENTRAL LIMIT THEOREM FOR MARTINGALES

Moreover |∆M[k]j |= 1 j<τk|∆M(k)

j | ≤ |∆M(k)j |, so

limn→+∞

k

∑j=1

E((∆M[k]

j )21|∆M[k]j |≥ε

∣∣∣∣F j−1

)= 0, in probability.

Now, as E((∆M[k]

j )21|∆M[k]j |≥ε

∣∣∣∣F j−1

)≤ E

((∆M[k]

j )2∣∣∣F j−1

)= ∆〈M[k]〉 j, taking expectation in the

previous convergence and justifying it by the (extended) Lebesgue theorem (as 〈M[k]〉k ≤ 2σ2) we obtaina stronger Lindeberg condition for M[k]

limn→+∞

k

∑j=1

E((∆M[k]

j )21|∆M[k]j |≥ε

)= 0. (A.3)

Now note that 〈M[k]〉k = 1a(k)〈M〉k∧τk and, using hyphothesis (3.7), that

limk→+∞

P(τk > k) = limk→+∞

P(〈M〉k ≤ 2σ

2a(k))= 1.

So we have limk→+∞〈M[k]〉k = σ2, in probability. Now taking expectation and using again Lebesguetheorem we get a stronger bracket condition for M[k]

limk→+∞

E(〈M[k]〉k

)= σ

2. (A.4)

Now let Sk = M(k)k and Sk = M[k]

k = M(k)k∧τk

for k ≥ 1. With these notation, we want to prove that Sk

converge in law to to a gaussian random variable. But Sk−Sk converges in probability to 0 when k→ ∞

(since Sk = Sk if τk > k and we have already seen that limk→+∞P(τk > k) = 1.). So, using Slutzky lemma,it remains to prove the convergence in probability of Sk to a gaussian random variable, i.e.

limk→∞

E(eitSk) = e−σ2t2/2 for all real numbers t. (A.5)

And we will show that relation (A.5) follows from

limk→∞

E(eitSk+t2Uk/2) = 1 for all real numbers t, (A.6)

where Uk = 〈M[k]〉k. Indeed, Uk converge in probability to σ2 if k→ ∞, and 0 ≤Uk ≤ 2σ2 for all k ≥ 1because of (A.2). Hence eitSk+t2Uk/2− eitSk+σ2t2/2 converge in probability to 0 for all real numbers t ifk→ ∞, and ∣∣∣eitSk+t2Uk/2− eitSk+σ2t2/2

∣∣∣≤ ∣∣∣et2Uk/2− eσ2t2/2∣∣∣≤ et2σ2

Hence by (extended) Lebesgue’s theorem limk→∞E(eitSk+t2Uk/2−eitSk+σ2t2/2)= 0. Formula (A.5) followsfrom this statement if we can prove (A.6).

For this we first show that∣∣∣E(eitSk+t2Uk/2)−1∣∣∣≤ eσ2t

k

∑j=1

E∣∣∣et2∆〈M[k]〉 j/2E( eit∆M[k]

j∣∣F j−1)−1

∣∣∣ . (A.7)

Page 95: Monte Carlo Methods and Stochastic Algorithmsbl/monte-carlo/poly.pdf · 1.1. ON THE CONVERGENCE RATE OF MONTE-CARLO METHODS 3 1.1 On the convergence rate of Monte-Carlo methods In

91

Indeed, let us introduce the random variables $S_{k,j} = M^{[k]}_j$, $U_{k,j} = \langle M^{[k]}\rangle_j$, for $j \ge 1$, and $S_{k,0} = 0$, $U_{k,0} = 0$ for all indices $k \ge 1$. Then we have $S_{k,k} = S_k$, $U_{k,k} = U_k$, and
\[
\begin{aligned}
E\bigl( e^{itS_k + t^2U_k/2} - 1 \bigr)
&= \sum_{j=1}^{k} E\bigl( e^{itS_{k,j} + t^2U_{k,j}/2} - e^{itS_{k,j-1} + t^2U_{k,j-1}/2} \bigr) \\
&= \sum_{j=1}^{k} E\Bigl[ e^{itS_{k,j-1} + t^2U_{k,j-1}/2}\, E\bigl( e^{it\Delta M^{[k]}_j + t^2\Delta\langle M^{[k]}\rangle_j/2} - 1 \,\big|\, \mathcal{F}_{j-1}\bigr) \Bigr].
\end{aligned}
\]
Since $e^{itS_{k,j-1} + t^2U_{k,j-1}/2}$ is bounded by $e^{\sigma^2 t^2}$, it follows from the above identity that
\[
\Bigl| E\bigl( e^{itS_k + t^2U_k/2} - 1 \bigr) \Bigr| \le e^{\sigma^2 t^2} \sum_{j=1}^{k} E\Bigl| E\bigl( e^{it\Delta M^{[k]}_j + t^2\Delta\langle M^{[k]}\rangle_j/2} - 1 \,\big|\, \mathcal{F}_{j-1}\bigr) \Bigr|,
\]
and as $E\bigl( e^{it\Delta M^{[k]}_j + t^2\Delta\langle M^{[k]}\rangle_j/2} - 1 \,\big|\, \mathcal{F}_{j-1}\bigr) = e^{t^2\Delta\langle M^{[k]}\rangle_j/2}\, E\bigl( e^{it\Delta M^{[k]}_j} \,\big|\, \mathcal{F}_{j-1}\bigr) - 1$, this implies the estimate (A.7).

To prove formula (A.6) with the help of inequality (A.7) we have to give an estimate for
\[
E\Bigl| e^{t^2\Delta\langle M^{[k]}\rangle_j/2}\, E\bigl( e^{it\Delta M^{[k]}_j} \,\big|\, \mathcal{F}_{j-1}\bigr) - 1 \Bigr|.
\]

The expression $e^{t^2\Delta\langle M^{[k]}\rangle_j/2}$ can be written in the form $e^{t^2\Delta\langle M^{[k]}\rangle_j/2} = 1 + \frac{t^2\Delta\langle M^{[k]}\rangle_j}{2} + \eta^{(1)}_{k,j}$ with an appropriate random variable $\eta^{(1)}_{k,j}$ which satisfies the inequality $|\eta^{(1)}_{k,j}| \le K_1(t)\,\bigl(\Delta\langle M^{[k]}\rangle_j\bigr)^2$ with some number $K_1(t)$ depending only on the parameter $t$, because $\langle M^{[k]}\rangle_j \le 2\sigma^2$ by formula (A.2). We can estimate the expression
\[
\eta^{(2)}_{k,j} = E\Bigl( e^{it\Delta M^{[k]}_j} - 1 + \frac{t^2 (\Delta M^{[k]}_j)^2}{2} \,\Big|\, \mathcal{F}_{j-1}\Bigr)
\]

in a similar way. To do this let us fix a small number $\varepsilon > 0$ and show that the inequality
\[
\Bigl| e^{it\Delta M^{[k]}_j} - 1 - it\Delta M^{[k]}_j + \frac{t^2 (\Delta M^{[k]}_j)^2}{2} \Bigr| \le \alpha\bigl(\Delta M^{[k]}_j\bigr)
\]
holds with $\alpha(x) = t^2 x^2\, \mathbf{1}_{\{|x|>\varepsilon\}} + \frac{\varepsilon}{6} |t|^3 x^2\, \mathbf{1}_{\{|x|\le\varepsilon\}}$. Indeed, we get this estimate by bounding the expression $\bigl| e^{itx} - 1 - itx + \frac{t^2x^2}{2} \bigr|$ by $t^2 x^2$ if $|x| > \varepsilon$ and by $\frac{|t|^3|x|^3}{6} \le \frac{\varepsilon |t|^3 x^2}{6}$ if $|x| \le \varepsilon$.

|η(2)k, j | ≤ E

(∣∣∣∣∣eit∆M[k]j −1− it∆M[k]

j +t2(∆M[k]

j )2

2

∣∣∣∣∣∣∣∣∣∣F j−1

)

≤ E(

α(∆M[k]j )∣∣∣Fk, j

)≤ t2E

((∆M[k]

j )21|∆M[k]j |>ε

∣∣∣∣F j−1

)+

ε

6|t|3∆〈M[k]〉 j.

Since $\langle M^{[k]}\rangle_j \le 2\sigma^2$, both $\eta^{(1)}_{k,j}$ and $\eta^{(2)}_{k,j}$ are bounded random variables (with a bound depending only on the parameter $t$), and the above estimates imply that
\[
\begin{aligned}
\Bigl| e^{t^2\Delta\langle M^{[k]}\rangle_j/2}\, E\bigl( e^{it\Delta M^{[k]}_j} \,\big|\, \mathcal{F}_{j-1}\bigr) - 1 \Bigr|
&= \Bigl| \Bigl( 1 + \frac{t^2\Delta\langle M^{[k]}\rangle_j}{2} + \eta^{(1)}_{k,j} \Bigr)\Bigl( 1 - \frac{t^2\Delta\langle M^{[k]}\rangle_j}{2} + \eta^{(2)}_{k,j} \Bigr) - 1 \Bigr| \\
&\le t^4 \bigl(\Delta\langle M^{[k]}\rangle_j\bigr)^2 + K_3(t)\bigl( |\eta^{(1)}_{k,j}| + |\eta^{(2)}_{k,j}| \bigr) \\
&\le K_4(t)\Bigl( \bigl(\Delta\langle M^{[k]}\rangle_j\bigr)^2 + E\Bigl( (\Delta M^{[k]}_j)^2\, \mathbf{1}_{\{|\Delta M^{[k]}_j|>\varepsilon\}} \,\Big|\, \mathcal{F}_{j-1}\Bigr) + \varepsilon \Delta\langle M^{[k]}\rangle_j \Bigr).
\end{aligned}
\]
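To see where the first inequality comes from (the algebra, written out here), expand the product with $a = t^2\Delta\langle M^{[k]}\rangle_j/2$:
\[
(1 + a + \eta^{(1)}_{k,j})(1 - a + \eta^{(2)}_{k,j}) - 1
= -a^2 + \eta^{(1)}_{k,j} + \eta^{(2)}_{k,j} + a\bigl(\eta^{(2)}_{k,j} - \eta^{(1)}_{k,j}\bigr) + \eta^{(1)}_{k,j}\eta^{(2)}_{k,j}.
\]
Since $a \le t^2\sigma^2$ and $\eta^{(1)}_{k,j}$, $\eta^{(2)}_{k,j}$ are bounded by constants depending only on $t$, all the terms other than $-a^2$ are bounded in absolute value by $K_3(t)(|\eta^{(1)}_{k,j}| + |\eta^{(2)}_{k,j}|)$, while $a^2 = t^4(\Delta\langle M^{[k]}\rangle_j)^2/4 \le t^4(\Delta\langle M^{[k]}\rangle_j)^2$; the second inequality then follows from the bounds on $|\eta^{(1)}_{k,j}|$ and $|\eta^{(2)}_{k,j}|$ obtained above.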

Let us take the expectation of the left-hand and right-hand sides of the last inequality and sum over the indices $j = 1, \dots, k$. The inequality obtained in this way, together with formula (A.7), implies that
\[
\bigl| E e^{itS_k + t^2U_k/2} - 1 \bigr| \le K_5(t) \Bigl( \sum_{j=1}^{k} E\Bigl( \bigl(\Delta\langle M^{[k]}\rangle_j\bigr)^2 \Bigr)
+ \sum_{j=1}^{k} E\Bigl( (\Delta M^{[k]}_j)^2\, \mathbf{1}_{\{|\Delta M^{[k]}_j|>\varepsilon\}}\Bigr)
+ \varepsilon \sum_{j=1}^{k} E\bigl( \Delta\langle M^{[k]}\rangle_j \bigr) \Bigr). \tag{A.8}
\]

To estimate the first sum on the right-hand side of (A.8), let us make the following estimate:
\[
\begin{aligned}
E\Bigl( \bigl(\Delta\langle M^{[k]}\rangle_j\bigr)^2 \Bigr)
&= E\Bigl[ E\Bigl( (\Delta M^{[k]}_j)^2\, \mathbf{1}_{\{|\Delta M^{[k]}_j|>\varepsilon\}} \,\Big|\, \mathcal{F}_{j-1}\Bigr) + E\Bigl( (\Delta M^{[k]}_j)^2\, \mathbf{1}_{\{|\Delta M^{[k]}_j|\le\varepsilon\}} \,\Big|\, \mathcal{F}_{j-1}\Bigr) \Bigr]^2 \\
&\le 2 E\Bigl[ E\Bigl( (\Delta M^{[k]}_j)^2\, \mathbf{1}_{\{|\Delta M^{[k]}_j|>\varepsilon\}} \,\Big|\, \mathcal{F}_{j-1}\Bigr) \Bigr]^2
+ 2 E\Bigl[ E\Bigl( (\Delta M^{[k]}_j)^2\, \mathbf{1}_{\{|\Delta M^{[k]}_j|\le\varepsilon\}} \,\Big|\, \mathcal{F}_{j-1}\Bigr) \Bigr]^2 \\
&\le 2 E\Bigl[ \Delta\langle M^{[k]}\rangle_j\, E\Bigl( (\Delta M^{[k]}_j)^2\, \mathbf{1}_{\{|\Delta M^{[k]}_j|>\varepsilon\}} \,\Big|\, \mathcal{F}_{j-1}\Bigr) \Bigr]
+ 2\varepsilon^2 E\Bigl[ E\Bigl( (\Delta M^{[k]}_j)^2\, \mathbf{1}_{\{|\Delta M^{[k]}_j|\le\varepsilon\}} \,\Big|\, \mathcal{F}_{j-1}\Bigr) \Bigr] \\
&\le 4\sigma^2 E\Bigl( (\Delta M^{[k]}_j)^2\, \mathbf{1}_{\{|\Delta M^{[k]}_j|>\varepsilon\}}\Bigr) + 2\varepsilon^2 E\bigl( \Delta\langle M^{[k]}\rangle_j \bigr).
\end{aligned}
\]
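Summing this bound over $j = 1, \dots, k$ and using $\sum_{j=1}^{k} E(\Delta\langle M^{[k]}\rangle_j) = E(\langle M^{[k]}\rangle_k) \le 2\sigma^2$ (an intermediate step spelled out here, assuming as we may that $\varepsilon \le 1$) gives
\[
\sum_{j=1}^{k} E\Bigl( \bigl(\Delta\langle M^{[k]}\rangle_j\bigr)^2 \Bigr)
\le 4\sigma^2 \sum_{j=1}^{k} E\Bigl( (\Delta M^{[k]}_j)^2\, \mathbf{1}_{\{|\Delta M^{[k]}_j|>\varepsilon\}}\Bigr) + 2\varepsilon \sum_{j=1}^{k} E\bigl( \Delta\langle M^{[k]}\rangle_j \bigr),
\]
so the first sum in (A.8) is controlled by the same two quantities as the other terms.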

Using this estimate and (A.8), we obtain
\[
\bigl| E e^{itS_k + t^2U_k/2} - 1 \bigr| \le K_6(t) \Bigl( \sum_{j=1}^{k} E\Bigl( (\Delta M^{[k]}_j)^2\, \mathbf{1}_{\{|\Delta M^{[k]}_j|>\varepsilon\}}\Bigr)
+ \varepsilon \sum_{j=1}^{k} E\bigl( \Delta\langle M^{[k]}\rangle_j \bigr) \Bigr).
\]
By (A.3) the first sum tends to $0$ as $k \to +\infty$, while by (A.4) the second sum, which equals $E(\langle M^{[k]}\rangle_k)$, tends to $\sigma^2$; hence $\limsup_{k\to+\infty} \bigl| E e^{itS_k + t^2U_k/2} - 1 \bigr| \le K_6(t)\,\varepsilon\,\sigma^2$, and since $\varepsilon > 0$ is arbitrary this proves formula (A.6). Thus we have proved the central limit theorem.
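As a purely numerical sanity check of the statement just proved (an illustration added here, not part of the proof), the following Python sketch simulates a martingale with i.i.d. bounded, centred increments, for which $\langle M\rangle_n/n$ is constant and the Lindeberg condition holds trivially, and compares the distribution of $M_n/\sqrt{n}$ with the limiting Gaussian law. The increment distribution and sample sizes are arbitrary choices made for the illustration.

import numpy as np

rng = np.random.default_rng(0)

n = 1_000        # number of martingale steps
n_paths = 2_000  # number of independent simulated paths

# Increments xi_j = eps_j * V_j with eps_j = +/-1 (fair coin) and V_j uniform on [0, 1],
# so E(xi_j | F_{j-1}) = 0 and E(xi_j^2 | F_{j-1}) = E(V^2) = 1/3 =: sigma^2.
eps = rng.choice([-1.0, 1.0], size=(n_paths, n))
v = rng.uniform(0.0, 1.0, size=(n_paths, n))
increments = eps * v

# With a(n) = n, M_n / sqrt(n) should be approximately N(0, sigma^2), sigma^2 = 1/3.
normalized = increments.sum(axis=1) / np.sqrt(n)

print("empirical mean     :", normalized.mean())
print("empirical variance :", normalized.var())
print("theoretical sigma^2:", 1.0 / 3.0)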
