Lecture 2: Importance sampling – the basics

Henrik Hult
Department of Mathematics, KTH Royal Institute of Technology, Sweden

Summer School on Monte Carlo Methods and Rare Events
Brown University, June 13–17, 2016

H. Hult Lecture 2



Outline

1 A random walk example with poor performance

2 A Markov random walk model

3 Markov chains in continuous time

4 Small-noise diffusions



A random walk example, with poor performance
The model

Let Z_1, Z_2, . . . be iid N(0, 1) random variables and put

H(α) = log E[exp{αZ_1}] = α²/2, α ∈ R.

Let X^n_0 = 0 and X^n_k = (1/n)(Z_1 + · · · + Z_k), for k ≥ 1, be the normalized random walk.
Consider computing the probability P{X^n_n ∈ (−∞, a] ∪ [b, ∞)} by importance sampling, where a < 0 < b and b < |a|.
Clearly, there is no need for importance sampling, because

P{X^n_n ∈ (−∞, a] ∪ [b, ∞)} = Φ(a√n) + 1 − Φ(b√n),

where Φ is the standard normal cdf.
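As a quick numerical check, the closed-form probability can be evaluated directly. A minimal Python sketch; the lecture does not fix n, so n = 100 is an assumption here, chosen because it reproduces the true value 0.029 quoted in the numerics table:

```python
import math

def norm_cdf(x: float) -> float:
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def two_sided_prob(a: float, b: float, n: int) -> float:
    """P{X^n_n in (-inf, a] U [b, inf)} = Phi(a*sqrt(n)) + 1 - Phi(b*sqrt(n))."""
    s = math.sqrt(n)
    return norm_cdf(a * s) + 1.0 - norm_cdf(b * s)

print(two_sided_prob(-0.25, 0.2, 100))  # roughly 0.029
```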


A random walk example, with poor performance
Importance sampling

When applying importance sampling we may suggest a sampling distribution with density

ϕ_α(z) = e^{αz − H(α)} ϕ(z) = e^{αz − α²/2} ϕ(z),

where ϕ is the standard normal density.
Note that ϕ_α is simply the density of a N(α, 1) distribution.
For large n, because b < |a|,

P{X^n_n ≥ b} ≫ P{X^n_n ≤ a},

so it may seem reasonable to focus on changing the measure associated with the probability P{X^n_n ≥ b}.
Guided by, say, Cramér’s theorem, we suggest taking α = b.
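The tilted sampler is easy to sketch: draw each increment from N(α, 1) with α = b, and weight by the likelihood ratio dP/dP^α = ∏ exp{−αZ_i + α²/2}. A minimal Python sketch; function names and the Monte Carlo parameters are illustrative, not from the lecture:

```python
import math
import random

def is_estimator(a: float, b: float, n: int, N: int, seed: int = 1) -> float:
    """Estimate P{X^n_n <= a or X^n_n >= b}, sampling each Z_i from N(b, 1)."""
    rng = random.Random(seed)
    alpha = b  # exponential tilt suggested by Cramer's theorem
    total = 0.0
    for _ in range(N):
        z = [rng.gauss(alpha, 1.0) for _ in range(n)]
        x = sum(z) / n
        # log of dP/dP_alpha = sum_i (-alpha * z_i + alpha^2 / 2)
        log_w = -alpha * sum(z) + n * alpha * alpha / 2.0
        if x <= a or x >= b:
            total += math.exp(log_w)
    return total / N
```

Samples that happen to land in the left tail (−∞, a] carry weights of order e^{−nαz̄ + nα²/2}, which is enormous when z̄ ≤ a < 0; this is exactly the variance blow-up seen in the numerics table.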


A random walk example, with poor performance
Numerics

Table: Simulation results for the random walk, a = −0.25, b = 0.2.

          True    Estimate  Std·√N  Relative Error
N = 10³   0.029   0.022     0.034   1.55
N = 10⁴   0.029   0.022     0.034   1.55
N = 10⁵   0.029   0.034     3.63    106


A random walk example, with poor performance
Weights and trajectories

Figure: Left: histogram of the log likelihood-ratio weights, log(W). Right: trajectories of the random walk.


A Markov random walk model

Let {v_i(x), x ∈ R^d, i ≥ 0} be independent and identically distributed random vector fields with distribution

P{v_i(x) ∈ ·} = θ(· | x),

where θ is a regular conditional probability distribution.
Let

X^n_{i+1} = X^n_i + (1/n) v_i(X^n_i), X^n_0 = x_0.

Denote the log moment generating function of θ(· | x) by

H(x, α) = log E[exp{⟨α, v_1(x)⟩}]

and suppose H(x, α) < ∞ for all x and α in R^d.
The Fenchel–Legendre transform (convex conjugate) of H(x, ·) is denoted by

L(x, β) = sup_{α ∈ R^d} [⟨α, β⟩ − H(x, α)].
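For intuition, the Fenchel–Legendre transform can be approximated by maximizing over a grid. A minimal one-dimensional Python sketch, suppressing the x-dependence of H; the grid bounds and names are illustrative assumptions:

```python
def legendre_transform(H, beta: float, grid) -> float:
    """Approximate L(beta) = sup_alpha [alpha * beta - H(alpha)] over a grid."""
    return max(alpha * beta - H(alpha) for alpha in grid)

# Gaussian increments: H(alpha) = alpha^2 / 2, so L(beta) = beta^2 / 2.
grid = [i / 1000.0 for i in range(-5000, 5001)]
print(legendre_transform(lambda a: a * a / 2.0, 1.0, grid))  # 0.5
```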


Examples

Queueing models
Credit risk models
Epidemic models
Chemical reactions


A Markov random walk model
Probabilities and expectations

In the generic setup we will be interested in computing an expectation of the form

E[exp{−nF(X^n_n)}],

where F : R^d → R is a bounded continuous function.
Rare-event probabilities such as P{X^n_n ∈ A} can be incorporated by formally writing

F(x) = 0 for x ∈ A, F(x) = ∞ for x ∈ A^c.
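The formal convention just says that exp{−nF(x)} collapses to the indicator of A, so the expectation reduces to the probability. A tiny Python sketch of that identity; the predicate A used here is an illustrative assumption:

```python
import math

def F(x: float, A) -> float:
    """Formal cost: 0 on A, infinity on the complement."""
    return 0.0 if A(x) else math.inf

def weight(x: float, n: int, A) -> float:
    """exp{-n F(x)}: equals 1 if x is in A and 0 otherwise (exp(-inf) = 0)."""
    return math.exp(-n * F(x, A))

A = lambda x: x >= 0.2  # hypothetical rare set A = [0.2, inf)
print(weight(0.3, 100, A), weight(0.1, 100, A))  # 1.0 0.0
```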


A Markov random walk model
The backward equation

Let A^n denote the backward evolution operator associated with X^n, that is,

A^n f(i, x) = E_{i,x}[f(i + 1, X^n_{i+1}) − f(i, x)]
            = ∫ [f(i + 1, x + (1/n)z) − f(i, x)] θ(dz | x).

The (Kolmogorov) backward equation implies that V^n(i, x) = E_{i,x}[exp{−nF(X^n_n)}] satisfies

A^n V^n(i, x) = 0,
V^n(n, x) = exp{−nF(x)},

where V^n(0, x_0) = E[exp{−nF(X^n_n)}] is the quantity we are interested in computing.
To see this, simply use the iterated expectation.
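The backward equation is just the dynamic-programming recursion V^n(i, x) = ∫ V^n(i + 1, x + z/n) θ(dz | x), solved backwards from the terminal condition. A minimal Python sketch for a kernel with finite support; for simplicity the kernel here is taken independent of x, an assumption not made in the lecture:

```python
import math
from functools import lru_cache

def value_function(n: int, x0: float, F, support, probs) -> float:
    """Backward recursion: V(n, x) = exp(-n F(x)),
    V(i, x) = sum_z theta(z) * V(i + 1, x + z / n)."""
    @lru_cache(maxsize=None)
    def V(i: int, x: float) -> float:
        if i == n:
            return math.exp(-n * F(x))
        return sum(p * V(i + 1, round(x + z / n, 12))
                   for z, p in zip(support, probs))
    return V(0, x0)
```

With a deterministic step z = 1 and F(x) = x, the chain ends at x_0 + 1 and the recursion returns exp{−n(x_0 + 1)}, which is a handy sanity check.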


A Markov random walk model
Importance sampling

Let the sampling distribution be given by

θ_α(dz | x) = exp{⟨α, z⟩ − H(x, α)} θ(dz | x).

Consider the controlled process X^n, where

X^n_{i+1} = X^n_i + (1/n) Z_i, X^n_0 = x_0.

The likelihood ratio is given by

dP^α/dP = ∏_{i=0}^{n−1} exp{⟨α^n_i, Z_i⟩ − H(X^n_i, α^n_i)}.

The importance sampling estimator is

(dP/dP^α) exp{−nF(X^n_n)}.
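For numerical stability the product is best accumulated in log space: log(dP/dP^α) = Σ_i [−⟨α^n_i, Z_i⟩ + H(X^n_i, α^n_i)]. A one-dimensional Python sketch; the names are illustrative:

```python
def log_likelihood_ratio(alphas, zs, xs, H) -> float:
    """log(dP/dP^alpha) = sum_i (-alpha_i * z_i + H(x_i, alpha_i)), in one dimension."""
    return sum(-a * z + H(x, a) for a, z, x in zip(alphas, zs, xs))

# Gaussian increments, constant tilt alpha = 0.2 (cf. the random walk example):
H = lambda x, a: a * a / 2.0
print(log_likelihood_ratio([0.2] * 3, [0.1, 0.2, 0.3], [0.0] * 3, H))
```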


A Markov random walk model
Analysis of the second moment

With S^α_j = Σ_{i=0}^{j−1} [−⟨α^n_i, v_i(X^n_i)⟩ + H(X^n_i, α^n_i)] we can write the second moment as

W^n(0, x_0) := E^α[(e^{Σ_{i=0}^{n−1} [−⟨α^n_i, Z_i⟩ + H(X^n_i, α^n_i)]} e^{−nF(X^n_n)})²]
             = E[e^{Σ_{i=0}^{n−1} [−⟨α^n_i, v_i⟩ + H(X^n_i, α^n_i)]} e^{−2nF(X^n_n)}]
             = E[e^{S^α_n − 2nF(X^n_n)}].

Let W^n(j, x) be the second moment of the importance sampling estimator, starting from x at time j. Then,

W^n(j, x) = E^α_{j,x}[(e^{Σ_{i=j}^{n−1} [−⟨α^n_i, Z_i⟩ + H(X^n_i, α^n_i)]} e^{−nF(X^n_n)})²]
          = E_{j,x}[e^{S^α_n − S^α_j − 2nF(X^n_n)}].

W^n satisfies a backward equation, similar to V^n.


A Markov random walk model
A backward equation for the second moment

Theorem. The second moment W^n satisfies the backward equation

∫ [W^n(i + 1, x + (1/n)z) e^{−⟨α^n_i(x), z⟩ + H(x, α^n_i(x))} − W^n(i, x)] θ(dz | x) = 0,
W^n(n, x) = e^{−2nF(x)}.


A Markov random walk model
Proof of the backward equation

Let

W^n(j, x, s) = E_{j,x,s}[exp{S^α_n − S^α_j − 2nF(X^n_n)}]

and let A^n denote the backward operator for the Markov chain (X^n, S^α), i.e.,

A^n f(i, x, s) = E_{i,x,s}[f(i + 1, X^n_{i+1}, S^α_{i+1}) − f(i, x, s)]
              = ∫ [f(i + 1, x + (1/n)z, s − ⟨α^n_i(x), z⟩ + H(x, α^n_i(x))) − f(i, x, s)] θ(dz | x).

By the backward equation for W^n:

A^n W^n(i, x, s) = 0,
W^n(n, x, s) = exp{−2nF(x)},

and, since W^n(j, x, s) = e^s W^n(j, x), the equation for W^n follows from a short calculation.


Markov chains in continuous time
The model

Consider a continuous time Markov chain X^n(t).
At X^n(t) = x the process can jump to new states x + (1/n)v with v ∈ V.
V denotes the set of possible jumps.
The intensity of jumping from x to x + (1/n)v is nλ_v(x) ≥ 0.
The stochastic kernel associated with X^n is

Θ^n(dt, v | x) = P{T_{k+1} − T_k ∈ dt, X^n(T_{k+1}) = x + (1/n)v | X^n(T_k) = x}
               = nλ_v(x) e^{−nΛ(x)t} dt,

where 0 = T_0 < T_1 < · · · denotes the jump times of X^n and Λ(x) = Σ_{v∈V} λ_v(x).
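Such a chain is straightforward to simulate with the standard Gillespie (stochastic simulation) algorithm: wait an Exp(nΛ(x)) holding time, then pick a jump v with probability proportional to nλ_v(x). A minimal Python sketch; function names and the example rates below are illustrative:

```python
import random

def simulate_ctmc(n: int, x0: float, T: float, rates: dict, seed: int = 0) -> float:
    """Simulate X^n on [0, T]; `rates` maps each jump v to the function lambda_v.
    From state x, the chain jumps to x + v/n with intensity n * lambda_v(x)."""
    rng = random.Random(seed)
    t, x = 0.0, x0
    while True:
        intensities = {v: n * lam(x) for v, lam in rates.items()}
        total = sum(intensities.values())
        if total <= 0.0:
            return x  # no jumps possible: the state is absorbing
        t += rng.expovariate(total)  # holding time ~ Exp(n * Lambda(x))
        if t > T:
            return x
        u = rng.random() * total
        for v, intensity in intensities.items():
            u -= intensity
            if u <= 0.0:
                x += v / n
                break
```

For instance, a pure-death chain with V = {−1} and λ_{−1}(x) = x steps down by 1/n until it is absorbed at 0.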


Markov chains in continuous time
Importance sampling

The importance sampling algorithm is implemented using sampling intensities λ^α_v(x) of the form

λ^α_v(x) = e^{⟨α,v⟩} λ_v(x).

The corresponding likelihood ratio is

dP^α/dP = ∏_{k=1}^{N_T} Θ^α(dτ_k, v_k | X^n(T_{k−1})) / Θ^n(dτ_k, v_k | X^n(T_{k−1})),

where N_T = inf{k ≥ 1 : T_k > T}, τ_k = T_k − T_{k−1}, v_k = n(X^n(T_k) − X^n(T_{k−1})), and Λ^α(x) = Σ_{v∈V} λ^α_v(x).

For a given λ^α the corresponding importance sampling estimator is given as the sample mean of independent copies of

(dP/dP^α) exp{−nF(X^n(T))}.


A simple epidemic model

Consider a population of n individuals, each of whom is either susceptible (S) to a virus or infected (I).
The Markov chain X^n(t) is the fraction of infected individuals at time t.
This is an example of a continuous time Markov chain with V = {−1, 1}, and we take λ_{−1}(x) = x, λ_1(x) = ρx(1 − x), ρ > 1.
As n → ∞ the process X^n(t) converges (by the law of large numbers) to a deterministic limit x satisfying the ODE

ẋ(t) = −λ_{−1}(x(t)) + λ_1(x(t)) = −x(t) + ρx(t)(1 − x(t)),
x(0) = x_0.

This dynamical system has an absorbing state at x = 0 and a stable equilibrium at x = 1 − ρ^{−1}.
We may be interested in the probability that an infection, starting from x_0, reaches a high level x_1 > x_0 > 1 − ρ^{−1} before coming back to the equilibrium at 1 − ρ^{−1}.
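The law-of-large-numbers limit is easy to check numerically: integrating the ODE drives x(t) to the stable equilibrium 1 − ρ^{−1}. A minimal Euler-scheme sketch in Python; the step size, horizon, and ρ = 2 below are illustrative choices:

```python
def epidemic_limit(x0: float, rho: float, T: float = 50.0, dt: float = 1e-3) -> float:
    """Euler integration of x' = -x + rho * x * (1 - x) up to time T."""
    x = x0
    for _ in range(int(T / dt)):
        x += dt * (-x + rho * x * (1.0 - x))
    return x

print(epidemic_limit(0.1, 2.0))  # approaches the equilibrium 1 - 1/2 = 0.5
```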


Small-noise diffusions
The model

Compute an expectation of the form E[exp{−(1/ε)F(X^ε(T))}], where F is a bounded continuous function and X^ε is the unique strong solution to the stochastic differential equation

dX^ε(t) = b(X^ε(t))dt + √ε σ(X^ε(t))dB(t),
X^ε(0) = x_0.

Change the measure by a Girsanov transformation:

dP^α/dP = exp{(1/√ε) ∫_0^T ⟨α(s, X^ε(s)), dB(s)⟩ − (1/(2ε)) ∫_0^T |α(s, X^ε(s))|² ds}.

Under P^α the underlying process has the dynamics

dX^ε(t) = [b(X^ε(t)) + σ(X^ε(t))α(t, X^ε(t))]dt + √ε σ(X^ε(t))dB(t),
X^ε(0) = x_0,

where B is a P^α-Brownian motion.
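Either measure can be simulated with a standard Euler–Maruyama discretization; under P^α one simply adds the drift σα. A minimal Python sketch of the base dynamics; function names and parameters are illustrative:

```python
import math
import random

def euler_maruyama(b, sigma, eps: float, x0: float, T: float, steps: int,
                   seed: int = 0) -> float:
    """One path of dX = b(X)dt + sqrt(eps)*sigma(X)dB; returns X(T)."""
    rng = random.Random(seed)
    dt = T / steps
    x = x0
    for _ in range(steps):
        dB = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment over dt
        x += b(x) * dt + math.sqrt(eps) * sigma(x) * dB
    return x
```

Simulating under P^α amounts to replacing b by b + σα in the same loop and weighting each path by dP/dP^α.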



Small-noise diffusions
The model

For a given α the corresponding importance sampling estimator is the sample mean of independent copies of

(dP/dP^α) exp{−(1/ε) F(X^ε(T))},

where X^ε is simulated under P^α.

Theorem

The second moment W^ε(t, x) = E_{t,x}[ (dP/dP^α) e^{−(2/ε)F(X^ε(T))} ] satisfies the backward equation

(α²/ε) W^ε − (α/√ε) σ DW^ε + W^ε_t + b DW^ε + (εσ²/2) D²W^ε = 0,

W^ε(T, x) = e^{−(2/ε)F(x)}.

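The quantity W^ε in the theorem can also be estimated empirically, which is a practical way to gauge how well a given α performs. A minimal sketch for the toy case b = 0, σ = 1, F(x) = x²/2 with a constant tilt (all parameter values are assumptions):

```python
import math
import random

def is_moments(alpha, eps=0.1, T=1.0, steps=200, reps=4000, seed=0):
    """Empirical first and second moments of the IS estimator
    (dP/dP^alpha) exp(-F(X(T))/eps) for dX = sqrt(eps) dB, F(x) = x^2/2,
    simulated under the constant tilt alpha."""
    rng = random.Random(seed)
    dt = T / steps
    s1 = s2 = 0.0
    for _ in range(reps):
        x, log_lr = 0.0, 0.0
        for _ in range(steps):
            db = rng.gauss(0.0, math.sqrt(dt))
            # log dP/dP^alpha accumulated along the tilted path
            log_lr += -(alpha / math.sqrt(eps)) * db - (alpha * alpha) / (2.0 * eps) * dt
            x += alpha * dt + math.sqrt(eps) * db   # tilted drift b + sigma*alpha
        z = math.exp(log_lr - x * x / (2.0 * eps))  # one copy of the estimator
        s1 += z
        s2 += z * z
    return s1 / reps, s2 / reps

m1, m2 = is_moments(alpha=0.0)
print(m1, m2)
```

Since the estimator is unbiased for every admissible α, m1 should be stable across tilts while m2 varies; comparing m2 for different choices of α is the empirical counterpart of analyzing W^ε.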



Small-noise diffusions
Example: Loss probabilities

Consider a multidimensional Black–Scholes model for the evolution of n financial asset prices X^ε_i(t), i = 1, . . . , n, where

dX^ε_i(t) = μ_i X^ε_i(t) dt + √ε ∑_{j=1}^n L_{ij} dB_j(t),  X^ε_i(0) = x_{0,i}.

The price of the k-th derivative at time T is π_k(T, X^ε(T)), and if the portfolio contains h_k contracts of type k, the value of the portfolio at time T is given by

∑_k h_k π_k(T, X^ε(T)).

We may be interested in the probability P{∑_k h_k π_k(T, X^ε(T)) < b}, that is, the probability that the value of the portfolio at a future time T is below some small number b.

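As a crude baseline before introducing any tilting, this loss probability can be estimated by plain Monte Carlo with an Euler scheme. A sketch with a hypothetical two-asset configuration and the simplest payoffs π_k(T, x) = x_k (i.e., the portfolio just holds the assets); all numbers are illustrative assumptions:

```python
import math
import random

def loss_probability(mu, L, h, b, x0, eps=0.05, T=1.0,
                     steps=100, reps=5000, seed=0):
    """Plain Monte Carlo estimate of P{ sum_k h_k X_k(T) < b } for
    dX_i = mu_i X_i dt + sqrt(eps) sum_j L_ij dB_j  (Euler scheme).
    Here the 'derivatives' are the assets themselves, pi_k(T, x) = x_k."""
    rng = random.Random(seed)
    n = len(mu)
    dt = T / steps
    losses = 0
    for _ in range(reps):
        x = list(x0)
        for _ in range(steps):
            db = [rng.gauss(0.0, math.sqrt(dt)) for _ in range(n)]
            x = [x[i] + mu[i] * x[i] * dt
                 + math.sqrt(eps) * sum(L[i][j] * db[j] for j in range(n))
                 for i in range(n)]
        if sum(h[k] * x[k] for k in range(n)) < b:
            losses += 1
    return losses / reps

p = loss_probability(mu=[0.05, 0.03], L=[[0.2, 0.0], [0.1, 0.2]],
                     h=[1.0, 1.0], b=2.0, x0=[1.0, 1.0])
print(p)
```

As the loss level b moves further below the portfolio's expected value, the hit frequency collapses and the relative error of this plain estimator blows up, which is what motivates the Girsanov tilting of the previous slides.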
