Paul Glasserman - Malliavin Greeks without Malliavin calculus.pdf

Embed Size (px)

Citation preview

  • 8/14/2019 Paul Glasserman - Malliavin Greeks without Malliavin calculus.pdf

    1/35

    Stochastic Processes and their Applications 117 (2007) 16891723

    www.elsevier.com/locate/spa

    Malliavin Greeks without Malliavin calculus

    Nan Chena,, Paul Glassermanb,1

    a Chinese University of Hong Kong, Hong Kongb Columbia Business School, Columbia University, New York, NY 10027, USA

    Received 2 August 2006; received in revised form 13 March 2007; accepted 25 March 2007

    Available online 5 June 2007

    Abstract

    We derive and analyze Monte Carlo estimators of price sensitivities (Greeks) for contingent claims

    priced in a diffusion model. There have traditionally been two categories of methods for estimating

    sensitivities: methods that differentiate paths and methods that differentiate densities. A more recent line of

    work derives estimators through Malliavin calculus. The purpose of this article is to investigate connections

    between Malliavin estimators and the more traditional and elementary pathwise method and likelihoodratio method. Malliavin estimators have been derived directly for diffusion processes, but implementation

    typically requires simulation of a discrete-time approximation. This raises the question of whether one

    should discretize first and then differentiate, or differentiate first and then discretize. We show that in several

    important cases the first route leads to the same estimators as are found through Malliavin calculus, but using

    only elementary techniques. Time-averaging of multiple estimators emerges as a key feature in achieving

    convergence to the continuous-time limit.

    c 2007 Elsevier B.V. All rights reserved.

    Keywords: Monte Carlo simulation; Likelihood ratio method; Pathwise derivative method; Malliavin calculus; Weak

    convergence

    1. Introduction

    The calculation of price sensitivities is a central modeling and computational problem for

    derivative securities. The prices of derivative securities are, to varying degrees, observable in

    Corresponding address: Chinese University of Hong Kong, System Engineering, Shatin NT, Hong Kong. Tel.: +85226098237.

    E-mail addresses:[email protected](N. Chen),[email protected](P. Glasserman).1 Tel.: +1 212 854 4102.

    0304-4149/$ - see front matter c 2007 Elsevier B.V. All rights reserved.doi:10.1016/j.spa.2007.03.012

    http://www.elsevier.com/locate/spamailto:[email protected]:[email protected]://dx.doi.org/10.1016/j.spa.2007.03.012http://dx.doi.org/10.1016/j.spa.2007.03.012mailto:[email protected]:[email protected]://www.elsevier.com/locate/spa
  • 8/14/2019 Paul Glasserman - Malliavin Greeks without Malliavin calculus.pdf

    2/35

    1690 N. Chen, P. Glasserman / Stochastic Processes and their Applications 117 (2007) 16891723

    the market; but the hedging of derivative securities is based on price sensitivities, and these

    sensitivities which are not observable require models and computational tools.

    The computational effort required for the accurate calculation of price sensitivities (or

    Greeks) is often substantially greater than that required for the calculation of the prices

    themselves. This is particularly true of Monte Carlo simulation, for which the computing timerequired for sensitivities can easily be 10100 times greater than the computing time required to

    estimate prices to the same level of precision.

    The simplest and crudest approach to the Monte Carlo estimation of sensitivities to a

    parameter simulates at two or more values of the underlying parameter and produces a finite-

    difference approximation to the price sensitivity. In the case of delta, this means simulating

    from different initial states; in the case of vega, this means simulating at different values of a

    volatility parameter; and in the case of rho, this means simulating at different values of a drift

    parameter. Finite-difference estimators are easy to implement, but are prone to large bias, large

    variance, and added computational requirements.

    Alternative methods seek to produce better estimators through some analysis of the underly-ing model. These methods evaluate derivatives directly, without finite-difference approximations.

    Pathwisemethods treat the parameter of differentiation as a parameter of the evolution of the un-

    derlying model and differentiate this evolution. At the other extreme, the likelihood ratio method

    puts the parameter in the measure describing the underlying model and differentiates this mea-

    sure. Because a price calculated by Monte Carlo is an expectation the integral of a discounted

    payoff evaluated on each path, integrated against a probability measure all estimators of sensi-

    tivities must involve some combination of these basic ideas: differentiating the evolution of the

    path, or differentiating the measure. For general background on estimating sensitivities and many

    references to the literature on this problem, see, e.g., Chapter 7 of Glasserman [9].

    More recently, a fairly large and growing literature has developed around the derivation

    of sensitivity estimators using Malliavin calculus. This line of work originated in Fournie,

    Lasry, Lebuchoux, and Touzi [7] and includes Benhamou [1], Bermin, Kohatsu-Higa, and

    Montero [2], Cvitanic, Ma, and Zhang [3], Davis and Johansson[4], Fournie et al. [8], Gobet and

    Kohatsu-Higa [10], Kohatsu-Higa and Montero [12], and many others. Using the tools of

    Malliavin calculus (cf. Nualart [15]), this approach derives estimators in continuous time, though

    their implementation typically requires some form of time-discretization. The purpose of this

    article is to investigate the connection between Malliavin estimators and estimators derived using

    the more elementary ideas of the pathwise and likelihood ratio methods (LRM).

    Our approach is as follows. We begin with a model specified through a stochastic differentialequation. Whereas the application of Malliavin calculus would, in effect, first differentiate and

    then discretize, we discretize first. For simplicity, we use an Euler scheme. In the time-discrete

    approximation, it is easy to derive pathwise and LRM estimators. Our main contribution is to

    show how to combine these methods and then pass to the continuous-time limit in a way that

    produces the Malliavin estimators. To put this another way, discretizing the Malliavin estimators

    yields estimators that are equivalent (up to terms that vanish in the continuous-time limit) to

    estimators derived using the more elementary methods. We carry this out for three important

    cases of the Malliavin approach considered in Fournie et al. [7].

    An insight that emerges from this analysis is the critical role played by time-averaging of

    multiple unbiased sensitivity estimators in passing to the continuous-time limit. This becomesparticularly evident in the case of delta. A straightforward application of LRM to an Euler

    scheme produces a delta estimator that explodes as the time increment decreases to zero. To

    obtain a meaningful limit, we associate a separate unbiased estimator with each step along a

  • 8/14/2019 Paul Glasserman - Malliavin Greeks without Malliavin calculus.pdf

    3/35

    N. Chen, P. Glasserman / Stochastic Processes and their Applications 117 (2007) 16891723 1691

    (time-discretized) path and average these estimators. The average converges to the Malliavin

    estimator, though none of the individual estimators does. This observation sheds light on the

    flexible weights that often appear in Malliavin estimators, and indicates that a virtue of the

    Malliavin derivation is that it implicitly undertakes the necessary averaging.

    We do not see the derivations in this article as inherently better or worse than those usingMalliavin calculus. Working directly in continuous time often permits the use of powerful and

    efficient tools for analysis; working in discrete time allows more elementary arguments and can

    produce estimators that can be implemented without further approximation. Both approaches

    have advantages, and the purpose of this article is to illustrate connections between them. We do

    this for three important cases sensitivities to an initial state, a drift parameter, and a diffusion

    parameter. Because our objective is to provide insight, we restrict our analysis to one-dimensional

    problems.

    The rest of this paper is organized as follows. Section 2 outlines the main steps in our

    derivations. In Section3, we verify that the estimators we derive are unbiased for the discrete-

    time approximations with which we work. In Section4,we show that these estimators convergeweakly as the time step decreases. Several technical results are collected in Appendices AC.

    2. Preview of main results

    To prevent technical considerations from obscuring the simplicity of our main results, in this

    section we outline our derivations without discussing the conditions required for their validity.

    Subsequent sections are devoted to justifying the approach we sketch here.

    We suppose that the underlying model dynamics are given by a stochastic differential equation

    on

    [0, T

    ],

    dXt= (Xt)dt+ (Xt)dWt, X0= x, (1)where W is a standard Brownian motion. For simplicity, we restrict attention to scalar X. Con-

    sider a (discounted) payoff function that depends on the values of the underlying asset at times

    0t1

  • 8/14/2019 Paul Glasserman - Malliavin Greeks without Malliavin calculus.pdf

    4/35

    1692 N. Chen, P. Glasserman / Stochastic Processes and their Applications 117 (2007) 16891723

    dYt= (Xt)Ytdt+ (Xt)YtdWt, Y0=1.The likelihood ratio method (LRM) estimator starts from the transition density g(x, )

    describing the distribution ofXT given X0= x . The priceu (x)is given by

    u(x )= (xT)g(x,xT) dxT,so bringing the derivative inside the integral and then multiplying and dividing byg(x,xT) yields

    u(x)= (xT)

    d

    dxg(x,xT) dxT

    = (xT)

    d

    dxlog g(x ,xT)

    g(x,xT) dxT

    = E(XT)

    d

    dx

    log g(x,XT) (3)and the unbiased estimator

    (XT)d

    dxlog g(x,XT).

    By differentiating the density, the LRM method avoids imposing any smoothness conditions on

    . However, it requires existence and knowledge ofg .

    The (or rather,a ) Malliavin estimator for this problem is (cf. Fournie et al.[7], p. 399)

    (XT)1

    T T

    0

    Yt

    (Xt)dWt. (4)

    Like the LRM estimator, this estimator multiplies the payoff(XT)by a random weight to esti-

    mate the derivative. In contrast to the LRM estimator, it does not involve the transition densityg.

    Consider, now, an Euler approximation,

    Xi= Xi1+ (Xi1)t+ (Xi1)Wi , X0=x , (5)i = 1, . . . ,N, with time step t = T/N and Wi = W(i t) W((i 1)t). Letu(x)=E[(XN)] and letYi= d Xi /dx ,

    Yi=Yi1+ (Xi1)Yi1t+ (Xi1)Yi1Wi , Y0=1. (6)The pathwise estimator ofu(x), the delta for the Euler scheme, is

    (XN)YN.

    For the LRM estimator, we may write

    u(x )=

    (xN)g(x,x1) g(xN1,xN) dxN dx1, (7)

    whereg(xi1,xi )is the transition density from Xi1= xi1 to Xi= xi . Proceeding as before,we arrive at the estimator

    (XN) Ni=1

    ddx

    logg(Xi1,Xi )= (XN) ddx

    logg(x,X1), (8)

    noting that only the first of the transition densities depends on the initial state x .

  • 8/14/2019 Paul Glasserman - Malliavin Greeks without Malliavin calculus.pdf

    5/35

    N. Chen, P. Glasserman / Stochastic Processes and their Applications 117 (2007) 16891723 1693

    Whereas the transition density of the continuous-time process X is often unknown, the

    transition density for the Euler scheme is Gaussian. In particular, X1 is normally distributedwith mean x+(x )tand variance 2(x)t. As a consequence, we can differentiate the logdensity and, after some simplification, write the estimator(8)as

    (XN) W1

    (x)t+ op(1)

    ,

    whereop(1)converges weakly to zero astapproaches zero. While this estimator is, under mild

    conditions, unbiased foru(x)for allt, it clearly behaves badly as tapproaches zero.But we have more flexibility than(8)initially indicates. For anyi= 1, . . . ,N, we may write

    u(x)=E

    (xN)g(Xi1(x),xi ) g(xN1,xN) dxN dxi

    .

    Here, we have written Xi1 as Xi1(x) to stress that Xi1 now has a functional dependenceon the initial state x through the Euler recursion(5).Differentiating inside the expectation and

    integral and proceeding as before, we get the estimator

    (XN)d

    dXi1logg(Xi1,Xi )

    dXi1dx

    . (9)

    The new factor (which we will write asYi1) enters through the chain rule of ordinary calculus.This estimator puts the dependence on x in the path up to the (i 1)st step (as in the pathwisemethod), and then treatsXi1as a parameter of the conditional distribution ofXi (as in the LRMmethod).

    Again using the fact thatgis Gaussian, we can write(9)as

    (XN)

    Wi

    (Xi1)t+ op(1)

    Yi1.

    Under mild conditions, this is unbiased foru(x) for all t, for all i = 1, . . . ,N. If we nowaverage these unbiased estimators, we get

    (XN)1

    N

    N

    i=1

    Wi

    (Xi1)t +op(1)Yi1

    (XT)1

    T

    T

    0

    Yt

    (Xt)

    dWt, (10)

    for small t. Thus, we recover the Malliavin estimator (4) as the limit of the average of

    combinations of pathwise and LRM estimators.Theorem 4.6makes this limit precise.

    2.2. Vega

    Next, we turn to the estimation of vega, or sensitivity to changes in the diffusion coefficient.

    Suppose X satisfies

    dX

    t=(X

    t)

    dt+ [

    (X

    t)

    + (X

    t)] d

    Wt, X

    0=x ,

    (11)for some, and Zt=dXt/dsatisfies

    dZt=(Xt)Zt dt+ (Xt) dWt+ [(Xt) + (Xt)]Zt dWt, Z0=0. (12)

  • 8/14/2019 Paul Glasserman - Malliavin Greeks without Malliavin calculus.pdf

    6/35

    1694 N. Chen, P. Glasserman / Stochastic Processes and their Applications 117 (2007) 16891723

    Let Ztdenote Zt at=0. The Malliavin estimator for the sensitivity ofE[(XT)] with respect

    to at =0 is (cf. Fournie et al. [7], p. 403)

    (XT)ZTYT

    1

    T

    T

    0

    Yt

    (Xt)

    dWt

    1

    T

    T

    0

    DtZTYT Yt

    (Xt)

    dt . (13)Here, Dtdenotes the Malliavin derivative operator.

    For the Euler scheme, letZi= dXi /dat =0, fori= 1, . . . ,N. The pathwise estimator ofdE[(XN)]/dis

    (XN) ZN. (14)

    The LRM estimator has the form in(18),but now affects the transition densities through their

    variances rather than their means. Straightforward calculation shows that(18)becomes

    (XN)N

    i=1 (

    X

    i1)

    (Xi1)W2

    it 1

    . (15)

    This estimator fails to converge as tapproaches zero; so, as we did for delta, we combine

    the ideas of the pathwise and LRM techniques with averaging along the path. Observe that, for

    small, the effect on XNof perturbing byis the same as the effect of perturbing the initialstatex byZN/YN, providedYN=0. Thus, we may write the pathwise derivative in(14)as

    d

    d(XN)=

    d

    dx(XN)

    ZNYN

    .

    By converting the sensitivity to to a sensitivity to x , we can take advantage of the derivation inSection2.1.In order not to rely on differentiability of, we can take the derivative with respect

    tox by multiplying by the LRM factor appearing on the left side of(10)to get1

    N

    Ni=1

    Yi1Wi (Xi1)t

    + op(1)(XN)

    ZNYN

    .

    But multiplying by the LRM factor has the effect of differentiating the product (XN)( ZN/YN),though what we want is the derivative of the first factor. To compensate, we subtract the derivative

    of the second factor and (recalling that Nt= T) get

    (XN)

    1

    T

    Ni=1

    Yi1Wi (Xi1)

    + op(1)ZN

    YN d

    dx

    ZNYN

    . (16)

    This estimator subtracts a pathwise derivative from an LRM derivative.The new term in (16) can be evaluated directly by recursively differentiating the Euler

    approximation(5). To make its connection to the Malliavin estimator more evident, we again

    average over the path and then use the fact that dXi /dWi= (Xi1)to getd

    dx ZN

    YN =

    1

    N

    N

    i=1

    d

    dXi

    ZN

    YNYi=

    1

    T

    N

    i=1

    d

    dWi ZN

    YN

    Yi t (

    Xi

    1)

    .

    Substituting this expression in(16)then suggests the convergence of(16)to(13).We will show

    that this approach does indeed produce estimators that are unbiased for all tand that converge

    ast0. Some care will be required to handle division byYN.

  • 8/14/2019 Paul Glasserman - Malliavin Greeks without Malliavin calculus.pdf

    7/35

    N. Chen, P. Glasserman / Stochastic Processes and their Applications 117 (2007) 16891723 1695

    2.3. Rho

    To consider sensitivities with respect to changes in drift, we consider a family of processes

    X satisfying

    dXt= [(Xt) + (Xt)] dt+ (Xt) dWt, X0= x,for some , and we consider the derivative with respect toat = 0. The Malliavin estimatorfor this problem is (cf. Fournie et al.[7], p. 398)

    (XT)

    T0

    (Xt)

    (Xt)dWt. (17)

    The Euler approximation for the perturbed process is

    Xi= [(Xi1) + (Xi1)] t+ (Xi1) Wi , X0= x .Lettingg denote the transition density for this Euler approximation, the LRM estimator of thesensitivity is

    (XN)N

    i=1

    d

    dlogg(Xi1,Xi )

    =0

    . (18)

    The fact that the transition density is Gaussian simplifies this to

    (XN)N

    i=1 (Xi1) (Xi1)

    Wi .

    The convergence of this estimator to the Malliavin estimator (17) now seems evident, and is

    stated precisely inTheorem 4.10.

    3. Unbiased estimators for Greeks

    In this section, we derive estimators of delta, vega and rho that are unbiased for the Euler

    approximation.

    Introduce a sequence of i.i.d. random variables{i , i 1}, where i follows the normaldistribution with mean 0 and variance 1. Write the Euler scheme as

    Xi= Xi1+ (Xi1)t+ (Xi1)

    ti , X0= x .We also need the following technical conditions on the payoff function and the drift and

    volatility functions. The payoff is a function of m variables, : Rm R, given by(Xi1 , . . . ,Xim )for some {i1, . . . , im} {1, . . . ,N}. We require the following:

    Assumption 3.1. The growth rate of the function is polynomial, i.e., there exists a positive

    integer psuch that

    lim supx+

    |(x)|xp

  • 8/14/2019 Paul Glasserman - Malliavin Greeks without Malliavin calculus.pdf

    8/35

    1696 N. Chen, P. Glasserman / Stochastic Processes and their Applications 117 (2007) 16891723

    Assumption 3.2. (1) The drift and the volatilityare both twice differentiable with bounded

    first and second derivatives;

    (2)is not degenerate, i.e., infx (x) > for some >0;

    (3) The function

    preserves non-degeneracy under a small perturbation, i.e., there exists a

    neighborhood of 0, K, and some >0 such that for any K,infx

    [ (x) + (x)]> .

    (4)is differentiable and its derivative is bounded.

    Assumption 3.3. The drift perturbation function is bounded.

    In Sections3.13.3,we consider the sensitivity ofu(x)with respect to a perturbation of the

    initial point x (delta), the volatility (vega) and the drift (rho), respectively. As in(6), define

    Y

    ito be the derivative of

    X

    iwith respect tox , which satisfies the following recursion:

    Yi=Yi1+ (Xi1)Yi1t+ (Xi1)Yi1

    ti , Y0=1.We will want to divide by values of the processYin Section3.2and therefore we need to restrictthe process Y to be positive almost surely to avoid the complication that 1/Y could be non-integrable around 0. But this does not hold with normally distributed increments, so we will use

    truncated normalsi where necessary.

    3.1. Delta

    In the outline of Section2.1,we averaged N unbiased estimators of delta with even weightsand considered payoff functions depending only on the value of the underlying asset at the claim

    maturity. For the analysis of this section, we consider a more general case of uneven weights and

    path-dependent claims, as in Fournie et al. [7].

    Theorem 3.1. Suppose the weights {ai: 0i N} satisfyij

    i=1ai t= 1 for all1 j m ,and suppose Assumptions 3.1 and3.2 hold. Then, the following is an unbiased estimator for

    u(x ):

    (Xi1 , . . . ,Xim ) N

    i=1ai

    Yi

    1

    (Xi1) ti+ I + II, (19)

    where

    I= (Xi1 , . . . ,Xim ) N

    i=1ai

    (Xi1) (Xi1)

    [(

    t i )2 t];

    II= (Xi1 , . . . ,Xim )

    Ni=1

    ai(Xi1)

    (Xi1)

    t i

    t.

    Proof. By the Markov property ofX, we may write

    u(x )=

    (xi1 , . . . ,xim )g(x,x1) g(xN1,xN) dxN. . . dx1,

  • 8/14/2019 Paul Glasserman - Malliavin Greeks without Malliavin calculus.pdf

    9/35

    N. Chen, P. Glasserman / Stochastic Processes and their Applications 117 (2007) 16891723 1697

    whereg(xi1,xi )is the transition density function ofXi , given the condition that Xi1= xi1.

    g(xi1,xi )= 1

    2 t (xi1)exp

    (xi xi1 (xi1)t)

    2

    22(xi1)t

    because of the assumption of normal increments . UnderAssumptions 3.1and3.2, the growthrate of function is polynomial and the transition density g decays exponentially. So we caninterchange the order of differentiation and integration to get

    u (x )=

    (xi1 , . . . ,xim )

    dg(x,x1)

    dxg(x1,x2) g(xN1,xN) dxN. . . dx1

    =

    (xi1 , . . . ,xim )

    d lng(x,x1)dx

    g(x,x1)g(x1,x2) g(xN1,xN) dxN. . . dx1.

    After some algebra using the form ofggiven above, we have

    u (x )=

    (xi1 , . . . ,xim )

    (x1 x (x )t)/((x)

    t)

    (x)

    t

    + x1 x (x)t (x)

    (x) (x)

    + (x)

    (x)

    x1 x (x)t

    (x)

    t

    2 1

    g(x,x1) g(xN1,xN) dxN dx1

    = E(Xi1 , . . . ,Xim ) (X1 x (x)t)/((x)

    t)

    (x)t

    +X1 x (x)t

    (x)

    (x) (x)

    +(x)

    (x)

    X1 x (x)t

    (x)

    t

    2 1

    = E(Xi1 , . . . ,Xim )

    1

    (x)

    t+ 1

    t

    (x) (x )

    + (x)

    (x)(21 1)

    Y0

    where we use the iterationX1=x+ (x)t+ (x)

    t1andY0=1.For the case thati

    i 1, the first reference date, by the Markovian property of

    X,

    u(x)=E[E[(Xi1 , . . . ,Xim )|Xi1]].Interchanging the order of differentiation and expectation, and applying the chain rule, we have

    u (x )= ddx

    E[E[|Xi1]] = E

    d

    dXi1E[|Xi1]

    dXi1dx

    = E

    d

    dXi1E[|Xi1] Yi1

    .

    Treating Xi1 as a new initial point and applying the same arguments as above here,d

    dXi1E[|Xi1] = E

    it (Xi1)

  • 8/14/2019 Paul Glasserman - Malliavin Greeks without Malliavin calculus.pdf

    10/35

    1698 N. Chen, P. Glasserman / Stochastic Processes and their Applications 117 (2007) 16891723

    +

    ti(Xi1) (Xi1)

    + (2i 1)(Xi1) (Xi1)

    Xi1

    .

    Thus,

    u(x)=E

    it (Xi1)

    +

    ti(Xi1) (Xi1)

    + (2i 1)(Xi1) (Xi1)

    Yi1

    . (20)

    For the case thati >i 1,

    Ji:=

    it (Xi1)

    +

    ti(Xi1) (Xi1)

    + (2i 1)(Xi1) (Xi1)

    Yi1

    is not necessarily an unbiased estimator foru(x). But we can show that E[Jl] = E[Jk]for anyij

  • 8/14/2019 Paul Glasserman - Malliavin Greeks without Malliavin calculus.pdf

    11/35

    N. Chen, P. Glasserman / Stochastic Processes and their Applications 117 (2007) 16891723 1699

    3.2. Vega

    We take vega to be the derivative ofu(x) with respect to a perturbation that takes () to (

    )

    +

    (

    ). In other words,

    vega= dd

    =0

    E[(Xi1 , . . . ,Xim )],

    where

    Xi=Xi1+ (Xi1)t+ [ (Xi1) + (Xi1)]

    ti , X0= x,

    andi is a truncated normal distributed random variable whose density function is given by

    f(w)=0, w < wL ;

    C2

    ew2

    2 , wL < w < wR;0, w > wR .

    The parametersC, , wL , wR satisfy (cf.Lemma A.1for the existence of such parameters):

    wR= wL >0, C >0, >0; (22) wRwL

    C2

    e w2

    2 dw=1; (23)

    Var[

    i ] = wR

    wL

    w2

    C

    2 e

    w2

    2 dw=

    1. (24)

    (22)implies E[i ] =0, (23)makes sure the function fis a probability density function.DefineZ :=dX/d, which satisfies the following recursion

    Zi = Zi1+ (Xi1) Zi1t+ [(Xi1)+ (Xi1)] Zi1

    ti+ (Xi1)

    ti , Z0= 0.

    LetZdenoteZ at=0. In addition, we defineY =dX/dx , which satisfies the recursion:

    Y

    i =Y

    i1+ (X

    i1)Y

    i1t+ [(X

    i1) + (X

    i1)]Y

    i1

    ti , Y

    0=1.It is easy to see thatYis equal toY at=0.

    The following is the main theorem in the subsection:

    Theorem 3.2. SupposeAssumptions3.1and3.2hold. Then, for any sufficiently large integer N ,

    and for any weight set{ai: 0i N} such thatij

    i=ij +1ai t= 1 for all0 j m ,

    vega= E(Xi1 , . . . ,Xim )

    m

    k=1

    Zik

    Y

    ik

    Zik1

    Y

    ik1

    iki=ik1+1

    aiYi1

    (Xi1)

    ti

    ddx

    ZimYim

  • 8/14/2019 Paul Glasserman - Malliavin Greeks without Malliavin calculus.pdf

    12/35

    1700 N. Chen, P. Glasserman / Stochastic Processes and their Applications 117 (2007) 16891723

    + E(Xi1 , . . . ,Xim )

    mk=1

    ZikYik

    Zik1Yik1

    iki=ik1+1

    ai t (III + IV)

    + C2

    ew2R2

    mk=1

    ik

    i=ik1+1ai

    t E(Xi1 , . . . ,Xim )

    Zik

    Yik

    Zik1

    Yik1

    Yi

    (Xi1)

    i =wR

    C2

    ew2

    L2

    m

    k=1

    ik

    i=ik1+1

    ai

    t E

    (Xi1 , . . . ,Xim )

    Zik

    Yik

    Zik1

    Yik1

    Yi

    (Xi1)

    i =wL

    ,

    where

    III= (Xi1)Yi1

    (Xi1)

    ti

    ; IV=

    (Xi1)Yi1 (Xi1)

    2i

    1

    and notation f(x1, . . . ,xn)|xj =w denotes the value of function f when xj is fixed asw.Proof. We first show that this result holds for with a continuous derivative. By the truncated

    normal assumption, the range of all possible values of X and Zmust be within a compact set.Over this range, /xis bounded because it is continuous. Therefore, we can take the derivative

    inside the expectation to get, by the bounded convergence theorem,

    vega= dd

    =0

    E[(Xi1 , . . . ,Xim )] = E

    mj=1

    Xij(Xi1 , . . . ,Xim ) Zij

    . (25)

    For any given(wR , wL , C, ) satisfying(22)(24),we always can find a big enough N such

    that

    1 supx

    |(x)|t supx

    |(x)|

    twR= 1 sup

    x|(x)|T

    N

    supx

    |(x)|TwR

    N>0.

    Under such choice ofN,Yis positive almost surely because, for any i ,

    Yi

    =

    i

    j=1

    (1

    +(

    Xj

    1)t

    +(

    Xj

    1)

    t

    j )

    i

    j=1(1 sup

    x|(x)|t sup

    x|(x)|

    twR ) >0.

  • 8/14/2019 Paul Glasserman - Malliavin Greeks without Malliavin calculus.pdf

    13/35

    N. Chen, P. Glasserman / Stochastic Processes and their Applications 117 (2007) 16891723 1701

    Now divide and multiply byYsimultaneously in the expectation of(25)to get

    vega=E

    m

    j=1

    Xij

    (Xi1 , . . . ,Xim ) Zij

    Yij

    Yij

    . (26)

    For any weight set {ai: 0i N} such thatij

    i=ij +1ai t= 1, notice thatZ0/Y0=0,

    ZijYij

    Yij=j

    k=1

    ZikYik

    Zik1Yik1

    Yij=

    jk=1

    ik

    i=ik1+1ai t

    Zik

    Yik

    Zik1Yik1

    Yij . (27)

    Plugging(27)back into(26)and interchanging the order of the sum over j and the sums over k

    andi , we get

    vega= mk=1

    iki=ik1+1

    ai tEZik

    Yik

    Zik1Yik1

    mj=k

    Xij(Xi1 , . . . ,Xim ) Y

    ij

    YiYi

    .

    Notice thatYij /Yi= (dXij /dx )/(dXi /dx)=dXij /dXi , so

    vega=m

    k=1

    iki=ik1+1

    ai tEZik

    Yik

    Zik1Yik1

    mj=k

    Xij(Xi1 , . . . ,Xim )

    dXijdXi

    Yi

    .

    (28)

    Given a sample path of1, . . . ,i1,i+1, . . . ,N(all increments buti ),Xij andXi becomefunctions ofi . The implicit function theorem leads to

    dXijdXi

    = dXij /di

    dXi /di= d

    Xij /di (Xi1)

    t

    and sinceik1

  • 8/14/2019 Paul Glasserman - Malliavin Greeks without Malliavin calculus.pdf

    14/35

    1702 N. Chen, P. Glasserman / Stochastic Processes and their Applications 117 (2007) 16891723

    C2

    wRwL

    ew2

    i2 Zik

    Yik

    Zik1Yik1

    Yi

    t (Xi1) wi

    d

    dwiZik

    Yik Zik1Yik1

    Yi

    t (Xi1) ZikYik

    Zik1Yik1

    Yi1(Xi1)

    (Xi1) dwi+Ce

    w2R

    22

    Zik

    Yik

    Zik1

    Yik1

    Yi

    t (Xi1)

    wi =wR

    Zik

    Yik

    Zik1Yik1

    Yi

    t (Xi1)

    wi =wL

    where we view all of X,Y and Z as functions of wi and note that dYi /dwi =

    Yi1(Xi1)/(Xi1). In addition, dXi /dwi= t (Xi1)and Yi= dXi /dx . So,

    d

    dwi

    ZikYik

    Zik1

    Yik1

    Yi

    t (Xi1)= d

    dXi

    ZikYik

    Zik1

    Yik1

    d

    Xidx

    = ddx

    ZikYik

    Zik1

    Yik1

    .

    In summary,

    E

    ZikYik

    Zik1Yik1

    mj=k

    Xij d

    XijdXi

    Yi

    = EEZikYik

    Zik1Yik1

    mj=k

    XijdXij

    dXiYi 1, . . . ,i1,i+1, . . . ,N

    = E Zik

    Yik

    Zik1Yik1

    i

    Yi

    (Xi1)

    t d

    dx

    ZikYik

    Zik1Yik1

    E Zik

    Yik

    Zik1Yik1

    Yi1

    (Xi1) (Xi1)

    + C

    2 E

    ZikYik

    Zik1

    Yik1

    Yi (Xi1)

    t

    i =wR

    Zik

    Yik

    Zik1Yik1

    Yi

    (Xi1)

    t

    i =wL

    e

    w2R2

    .We get the result of the theorem by plugging the above equality back into (28).

    To finish the proof, we can apply Lemma A.2to extend the conclusion to a general payoff

    function satisfyingAssumption 3.1.

    3.3. Rho

    In this subsection, we go back to normal incrementsi. Define rho to be the derivative ofu(x)

    with respect to a perturbation that takes the drift ()to () + (). In other words,

    rho= dd

    =0

    E[(Xi1 , . . . ,Xim )]

  • 8/14/2019 Paul Glasserman - Malliavin Greeks without Malliavin calculus.pdf

    15/35

    N. Chen, P. Glasserman / Stochastic Processes and their Applications 117 (2007) 16891723 1703

    where

    Xi=Xi1+ [(Xi1) + (Xi1)]t+ (Xi1)

    t i .LetU be

    U =exp N

    i=1

    (Xi1) (Xi1)

    ti

    1

    22

    Ni=1

    (Xi1) (Xi1)

    2t

    .

    It is easy to check that U >0 and E[U] =1. Thus, we can define a new probability measureusingU, P(d)=U P(d). Moreover, it is easy to see (cf. Lemma A.3)that this change ofmeasure corresponds to a change of drift, in the sense that

    E[(Xi1 , . . . ,Xim )] =E[(Xi1 , . . . ,Xim ) U]. (29)

    Theorem 3.3. Suppose thatAssumptions3.13.3hold. Then,

    rho=E(Xi1 , . . . ,Xim )

    Ni=1

    (Xi1) (Xi1)

    ti

    .

    Proof. Fix a compact neighborhood of= 0, say,[l, l]. By(29)and the CauchySchwarzinequality,

    1 (E[(Xi1 , . . . ,Xim )] u(x)) E(Xi1 , . . . ,Xim ) N

    i=1

    (X

    i1)

    (Xi1)ti

    =E(Xi1 , . . . ,Xim )

    1

    (U 1)

    E

    (Xi1 , . . . ,Xim )

    Ni=1

    (Xi1) (Xi1)

    ti

    E

    (Xi1 , . . . ,Xim )21/2

    E

    1 (U 1) N

    i=1

    (Xi1) (Xi1)

    ti

    2

    1/2

    .

    By Taylor expansion with Cauchys remainder and the mean-value theorem for integration, we

    can find () [l, l] (which may also depend on the values ofs) such that

    U = U0 + dU

    d

    =0

    + 12

    0

    d2Ux

    dx 2 ( x)2dx

    = 1 +

    Ni=1

    (Xi1) (Xi1)

    ti

    + U() ( ())

    N

    i=1

    (Xi1) (Xi1)

    ti ()

    Ni=1

    (Xi1) (Xi1)

    2t

    2

    N

    i=1

    (Xi1) (Xi1)

    2t

    .

  • 8/14/2019 Paul Glasserman - Malliavin Greeks without Malliavin calculus.pdf

    16/35

    1704 N. Chen, P. Glasserman / Stochastic Processes and their Applications 117 (2007) 16891723

    UnderAssumptions 3.2and3.3,and 1/are bounded, and we clearly have () [l, l].It follows that there are positive constantsC1andC2 (not dependent on) such that

    U()

    = exp()N

    i=1

    (

    Xi

    1)

    (Xi1)ti1

    2 ()2

    Ni=1 (Xi1) (Xi1)

    2

    t

    exp

    C1

    t

    Ni=1

    |i | + C2N

    i=1t

    . (30)

    Furthermore, there also exist positive constantsC3, C4 (not dependent on) such that

    N

    i=1 (Xi1) (

    Xi

    1)

    ti ()

    N

    i=1

    (Xi1) (

    Xi

    1)

    2t

    2

    N

    i=1

    (Xi1) (

    Xi

    1)

    2t

    C3tN

    i=12i+ C4

    Ni=1

    t. (31)

    With(30)and(31),using the exponential decay of the density of thes, it is easy to show that

    sup[l,l]

    E

    U()

    Ni=1

    (Xi1) (Xi1)

    ti ()

    Ni=1

    (Xi1) (Xi1)

    2t

    2

    N

    i=1

    (Xi1) (Xi1)

    2t

    2

  • 8/14/2019 Paul Glasserman - Malliavin Greeks without Malliavin calculus.pdf

    17/35

    N. Chen, P. Glasserman / Stochastic Processes and their Applications 117 (2007) 16891723 1705

    4. Convergence results

    In this section, we show the consistency between the estimators obtained in Section3and the

    Malliavin estimators. We prove that each of the unbiased estimators converges in distribution to

    the corresponding Malliavin estimator as N +. The theoretical cornerstone of this analysisis the theory of weak convergence for stochastic differential equations. We review the necessary

    results in Appendix B. For further background, see Jacod and Shiryayev [11] and Kurtz

    and Protter[13,14]. In the section, we attach index Nto such notations as X,Y andZto stressthat they are under N-step Euler approximation.

    For each positive integer N, there is a probability space ((N),F(N), {F(N)t },P (N))and onit defined a sequence of i.i.d. random variables{(N)i , 1i N}or{(N)i , 1i N}, whereF

    (N)t = ((N)1 , . . . , (N)[N t/ T]) or ((N)1 , . . . ,(N)[N t/ T]). The distributions of (N)1 and(N)1 are

    standard normal or truncated normal, respectively. Define

    L (N)t =[N t/ T]

    i=1

    t(N)

    (N)i and

    L (N)t =[N t/ T]

    i=1

    t(N)

    (N)i .

    Thus, L (N) and L (N) induce two different sequences of probability measures on the spaceD1[0, T] (cf.Appendix Bfor the definition of the space D1[0, T]).

    To show the estimator of vega will converge to the Malliavin estimator, we need to get rid of

    the last two terms in the expression inTheorem 3.2.For this purpose, we choose the parameters

    of the truncated normal as follows:

    Assumption 4.1. (w(N)L , w(N)R , C

    (N), (N))satisfy

    C(N) = 12(

    N x) 1 ,

    (N) = 11 F(N x) ,

    w(N)R = w(N)L =

    N x(N),

    wherex >0 is a fixed number such that

    1 supx

    |(x)|

    T x >0,

    is the cumulative probability function of standard normal and Fis defined as inLemma A.1.

    Thus, as N +,C(N) and(N) converge to 1 andw(N)R goes to + in the order of

    N.

    Remark 4.1. By Lemma A.1, it is easy to show that (w(N)L , w

    (N)R , C

    (N), (N)) must satisfy

    (22)(24).Meanwhile, for all sufficiently large N,

    1 supx

    |(x)|t(N) supx

    |(x)|t(N)w

    (N)R

    =1

    supx

    |(x)|T

    N sup

    x |

    (x )

    |

    T x

    (N) >0.

    Thus, allY(N) are indeed positive, almost surely, and the proof ofTheorem 3.2goes through for

    all sufficiently large N.

  • 8/14/2019 Paul Glasserman - Malliavin Greeks without Malliavin calculus.pdf

    18/35

    1706 N. Chen, P. Glasserman / Stochastic Processes and their Applications 117 (2007) 16891723

    4.1. Preliminary convergence results

    As the first step, we establish the weak convergence result of processes (X(N),Y(N),Z(N))with increments(N) or

    (N) in this subsection. For the purpose, we introduce functions

    g(N)(x ,y,z)= g(x,y,z)= (x ) (x)(x )y (x )y

    (x )z (x)z+ (x)

    .

    One can easily see that(X(N),Y(N),Z(N))is the solution to the following SDEs:

    M(N)t = t

    0g(N)(M(N)s )d L (N)s or M(N)t =

    t0

    g(N)(M(N)s )d L (N)s , (32)

    dependent on which increments we are using. The sample paths of both

    M(N) and

    M(N) are in

    D3[0, T].The following lemma is needed to establish the tightness of processes L (N) and L (N) (see

    Appendix Bfor the definition of P-UT):

    Lemma 4.2. ProcessesL (N) andL (N) are P-UT.Proof. In the following proof, we do not need any special properties of the normal (or truncated

    normal) distribution except that E[] = 0 and Var[] = 1. So, without loss of generality, weonly consider the case ofL (N). For any elementary predictable process

    H(N)t =Y(N)0 1{0}+k

    i=1Y(N)i 1(si ,si+1](t)

    defined on the probability space ((N),F(N), {F(N)t },P (N)) and satisfying|Y(N)i | 1 for all1i kand for anya >0, by Chebyshevs inequality,

    P

    t0

    H(N)s d L (N)s

    >a

    E[

    t0 H

    (N)s d L (N)s ]2a2

    =

    ki=1

    E[|Y(N)i |2 | L (N)si+1L (N)si |2]

    a2 .

    By the assumption that |Y(N)

    i | 1, the right side is bounded above byk

    i=1E[| L (N)si+1L (N)si |2]

    a2 = t

    (N)

    a2

    [N t/ T]j=1

    Var[(N)j ].

    Because the variance of(N)j is 1, the right side of the inequality above will bet

    (N)[N t/T]/a2.This upper bound does not depend on Hand N, and it converges to 0 as a +. Thus,

    lima+

    supH(N)H(N),N

    P t0

    H(N)

    s d

    L(N)

    s >a0.

    where H(N) is the set of all elementary processes on ((N),F(N), {F(N)t },P (N)), i.e.,L (N) isP-UT.

  • 8/14/2019 Paul Glasserman - Malliavin Greeks without Malliavin calculus.pdf

    19/35

    N. Chen, P. Glasserman / Stochastic Processes and their Applications 117 (2007) 16891723 1707

    Now we are able to show the weak convergence of (X(N),Y(N),Z(N)) in the Skorokhodtopology:

    Lemma 4.3. SupposeAssumptions 3.1, 3.2 and4.1 hold and we have (X(N),Y(N),Z(N)) definedas the solution to the SDE(32).As N +,

    (X(N),Y(N),Z(N))(X, Y,Z),where(X, Y,Z)is a global solution of the following SDE:

    Mt= t

    0g(Ms)dWs .

    Proof. Suppose that(X(N),Y(N),Z(N))is the solution to the first SDE in(32).Using Donskersinvariance principle for L (N), we know that L (N) weakly converge to Brownian motion W.Furthermore,L

    (N)

    are P-UT byLemma 4.2.Thus, applyingLemma B.3,we have the conclusionthat(X(N),Y(N),Z(N))(X, Y,Z)in the Skorokhod topology if the solution(X, Y,Z)existsglobally.

    As for the case that ( X(N),Y(N),Z(N))is the solution for the second SDE in(32).All of theabove arguments works except that now we can not use Donskers invariance principle to showL (N) converges to Brownian motion becauseL (N) is the sum of a triangular array (the distribution

    of(N)1 depends on N by Assumption 4.1). Here we cite the LindebergFeller theorem formartingales (cf. Durrett[5], Theorem 7.7.3, p. 414) to establish such convergence. Indeed, for

    anyt [0, T], as N +,

    [N t/ T

    ]j=1

    E[t(N)(N)i ]2 = t(N) [N t/ T

    ]j=1

    E[(N)i ]2 = t(N) [N t/T] t

    and for any >0,

    Nj=1

    E[(t(N)

    (N)i )

    2 1{|t(N)(N)i |>}] 1

    2

    Nj=1

    E[(t(N)

    (N)i )

    4]

    = N2

    (t(N))2E[((N)1 )4].

    Note thatE[((N)1 )4] is uniformly bounded for all N. The right hand side of the above inequality

    converges to 0. In summary, both (i) and (ii) of Theorem 7.7.3 are satisfied. We have the

    conclusion thatL(N) weakly converges to Brownian motionW.

    Remark 4.4. This lemma implies that Y(N) is a tight process. In other words, we have thefollowing limit holds (cf. [11], p. 350): for any K >0,

    limN

    P (N)( sup1iN

    |Y(N)i | > K)=0.

    4.2. Convergence of delta estimators

    To show the convergence of estimators for delta, vega and rho, we need another two

    assumptions:

  • 8/14/2019 Paul Glasserman - Malliavin Greeks without Malliavin calculus.pdf

    20/35

    1708 N. Chen, P. Glasserman / Stochastic Processes and their Applications 117 (2007) 16891723

    Assumption 4.2. is a.s. continuous under the measure of Xt, i.e., there exists a set E Rmsuch that is continuous outside ofEand

    P((Xt1 , . . . ,Xtm )E)=0.

    Assumption 4.3. There exists an L2 function{at : 1 t T} for which the sequence ofweights {a(N)i :1i N} satisfies

    limN+

    sup0tT

    |a(N)[N t/ T] at| =0.

    Lemma 4.5. UnderAssumptions3.1,3.2,4.2 and4.3,

    I(N) := (X(N)i1 , . . . ,X(N)im

    ) N

    i=1a

    (N)i

    (X(N)i1) (

    X

    (N)

    i1)

    [(t(N) (N)i )2 t(N)]

    and

    II(N) := (X(N)i1 , . . . ,X(N)im

    )

    Ni=1

    a(N)i

    (X(N)i1)

    (X(N)i1)t(N) (N)i

    t(N)

    converge weakly to0.

    Proof. Notice that

    EN

    i=1

    a(N)i

    (X(N)i1)

    (X(N)

    i1)

    [(t(N) (N)i )

    2 t(N)]2

    =2N

    i=1

    i1j=1

    a(N)i a

    (N)j (t

    (N))2E

    (X(N)i1)

    (X(N)i1)

    (X(N)j1)

    (X(N)j1)

    (((N)i )

    2 1)(((N)j )2 1)

    +N

    i=1(a

    (N)i t

    (N))2E

    (X(N)i1)

    (X(N)i1)

    2((

    (N)i )

    2 1)2 . (33)

    The expectation in the first summand in the right hand side of(33)is 0 because, conditional on

    Fi

    1,

    E

    (X(N)i1)

    (X(N)i1)(X(N)j1) (X(N)j1)

    (((N)i )

    2 1)(((N)j )2 1)Fi1

    = (X(N)i1)

    (X(N)i1)(X(N)j1) (X(N)j1)

    (((N)j )

    2 1)E[(((N)i )2 1)|Fi1] =0

    for alli, j ,i > j . And the second summand in the right hand side of(33)is bounded by:

    supx(x )

    (x)

    2 N

    i=1

    (a(N)

    i

    t(N))2E[((

    (N)

    i

    )2

    1)2

    ]

    =2

    (x) (x )

    2 Ni=1

    (a(N)i )

    2t(N) t(N),

  • 8/14/2019 Paul Glasserman - Malliavin Greeks without Malliavin calculus.pdf

    21/35

    N. Chen, P. Glasserman / Stochastic Processes and their Applications 117 (2007) 16891723 1709

    because the assumption thathas a standard normal distribution leads to E[((N)i )2 1]2 =2.On the other hand, according toAssumption 4.3,supN

    Ni=1(a

    (N)i )

    2t(N)

  • 8/14/2019 Paul Glasserman - Malliavin Greeks without Malliavin calculus.pdf

    22/35

    1710 N. Chen, P. Glasserman / Stochastic Processes and their Applications 117 (2007) 16891723

    Lemma 4.7. UnderAssumptions3.1,3.2 and4.14.3,

    ik

    i=ik1+1a

    (N)i t

    (N) III(N) :=ik

    i=ik1+1a

    (N)i t

    (N) (X(N)

    i1)Y(N)

    i1 (

    X

    (N)i

    1)t(N)

    (N)i

    (N)

    and

    iki=ik1+1

    a(N)i t

    (N) IV(N) :=ik

    i=ik1+1a

    (N)i t

    (N) (X(N)

    i1)Y(N)

    i1 (X(N)i1)

    ((N)i )

    2

    (N) 1

    converge to0 weakly.

    Proof. Using similar arguments as those inTheorem 4.6,it is easy to show that

    iki=ik1+1

    a(N)

    i (X(N)i

    1)Y

    (N)i

    1

    (X(N)i1) t(N) (N)i

    tk

    tk1a

    s

    (Xs )Ys (Xs )

    dWs

    .

    On the other hand, byAssumption 4.1,(N) converge to 1. Then we have

    iki=ik1+1

    a(N)i t

    (N) III(N) = t(N)

    (N)

    ik

    i=ik1+1a

    (N)i

    (X(N)i1)Y(N)i1 (X(N)

    i1)t(N)

    (N)i

    .

    0.As for the second convergence in the statement of the lemma, we first show

    limN+

    E

    iki=ik1+1

    a(N)i t

    (N) (X(N)i1)Y(N)i1

    (X(N)i1)

    ((N)i )

    2

    (N) 1

    1{(N)b i}

    2 =0,

    for any fixedb > 0, where (N)b := inf{j: |Y(N)j | > b}. Once we establish this limit, we can

    conclude thatik

    i=ik1+1a(N)i t

    (N) IV(N) 0, in light ofRemark 4.4(followingLemma 4.3)noting that P (N)(

    (N)b N)1.

    Indeed, the left hand side of the above limit is bounded by, using the inequality (a+ b)2

    2a2

    +2b2,

    2

    ((N))2E

    iki=ik1+1

    a(N)i t

    (N) (X(N)i1)Y(N)i1

    (X(N)i1)

    [((N)i )2 1] 1{(N)b i}

    2

    + 2

    1

    ((N))2 1

    2 E

    iki=ik1+1

    a(N)i t

    (N) (X(N)

    i1)Y(N)

    i1 (X(N)i1)

    1{(N)b i}

    2 . (34)

    It is easy to see that the second term in(34)converges to 0 because(N)

    1 and

    E

    iki=ik1+1

    a(N)i t

    (N) (X(N)i1)Y(N)i1

    (X(N)i1) 1{(N)

    b i}

    2

  • 8/14/2019 Paul Glasserman - Malliavin Greeks without Malliavin calculus.pdf

    23/35

    N. Chen, P. Glasserman / Stochastic Processes and their Applications 117 (2007) 16891723 1711

    Nik

    i=ik1+1E

    (a(N)i )2(t(N))2

    (X(N)i1)Y(N)i1

    (X(N)i1)

    2 1{(N)b i}

    supx

    (x)b (x)

    2 iki=ik1+1

    (a(N)i )2(t(N)) T

    is bounded uniformly for all N. For the first term in(34),

    a(N)i t

    (N) (X(N)i1)Y(N)i1

    (X(N)i1)

    [((N)i )2 1] 1{(N)b

    i}

    is a martingale difference sequence because the variance of (N)i is 1 and t(N) (X(N)i

    1)Y

    (N)i

    1 / (

    X(N)i

    1) 1{

    (N)b

    i

    } is F

    (N)i

    1 -measurable. By the orthogonality of martingale

    differences and the assumption that is bounded and is non-degenerate,

    E

    iki=ik1+1

    a(N)i t

    (N) (X(N)i1)Y(N)i1

    (X(N)i1)

    [((N)i )2 1] 1{(N)b i}

    2

    = t(N) ik

    i=ik1+1|a(N)i |2E

    (X(N)i1)Y(N)i1 (X(N)i1)

    [((N)i )2 1] 1{(N)b i}

    2

    t(N)

    C b2t(N) ik

    i=ik1+1|a(N)i |2E[|((N)i )2 1|2] t(N). (35)

    Noting that the variance of(N)i is 1 and the fourth moment of(N)i is also bounded by someconstant independent ofN, we find that E[|((N)i )2 1|2]is bounded uniformly in N. Thus theright hand side of(35)converges to 0, which implies the convergence of the first term of(34)

    to 0.

    Lemma 4.8. Under all assumptions in Section 3 and Assumptions 4.14.3, there exists a

    constant C which does not depend on N such that, for every 1

    k

    m ,

    supik1+1iik

    E

    (X(N)i1 , . . . ,X(N)im )

    Z(N)ik

    Y(N)

    ik

    Z(N)ik1Y

    (N)ik1

    Y(N)i

    (X(N)i1)

    (N)i =w(N)

    R (orw

    (N)L

    )

  • 8/14/2019 Paul Glasserman - Malliavin Greeks without Malliavin calculus.pdf

    24/35

    1712 N. Chen, P. Glasserman / Stochastic Processes and their Applications 117 (2007) 16891723

    is bounded by C1N3/2 exp(C2N) for some constant C1 and C2 which do not depend on N

    either, and thus it converges to 0 as N +. (Here, (N) = (X(N)i1 , . . . ,X(N)im

    ).)

    Proof. We consider only the conditional expectation on the condition that

    (N)i

    =w

    (N)R because

    the arguments are the same for both cases ofw(N)R andw(N)L .

    By the definition ofY(N) andZ(N) (cf. the part beforeTheorem 3.2in Section3.2), we knowthat

    Z(N)jY

    (N)j

    =Z(N)

    j1Y

    (N)j

    (1 + (X(N)j1)t(N) + (X(N)j1)t(N)

    (N)j ) +

    (X(N)j1)t(N)

    (N)j

    Y(N)

    j

    =Z(N)j1Y

    (N)j

    Y

    (N)j

    Y(N)

    j

    1

    + (X(N)j1)

    t(N)

    (N)j

    Y(N)

    j

    =Z(N)j1Y

    (N)j

    1

    + (X(N)j1)

    t(N)

    (N)j

    Y(N)

    j

    .

    Thus,

    Z(N)ik

    Y(N)

    ik

    Z(N)ik1Y

    (N)ik1

    =ik

    j=ik1+1

    Z(N)j

    Y(N)

    j

    Z(N)

    j1Y

    (N)j1

    = ik

    j=ik1+1

    (X(N)j1)t(N)

    (N)j

    Y(N)

    j

    .

    Because is non-degenerate and| (x)| | (0)| + supx|(x)| |x |, there must exists twoconstantsC3 and C4 which does not depend on Nsuch that

    E(X(N)i1 , . . . ,X(N)im )

    Z(N)

    ik

    Y(N)

    ik

    Z

    (N)

    ik1Y

    (N)ik1

    Y(N)i (X(N)i1)

    (N)i =w(N)R

    =ik

    j=ik1+1E

    (X(N)i1 , . . . ,X(N)im )

    (X(N)j1)t(N)

    (N)j

    Y(N)

    j

    Y(N)

    i

    (X(N)i1)

    (N)i =w(N)

    R

    ik

    j=ik1+1E

    |(X(N)i1 , . . . ,X

    (N)im

    )| t(N)(C3+ C4|X(N)j1|)|(N)j |

    |Y(N)j |

    |Y(N)i |

    (N)i =w(

    N)R

    .

    Using the arithmeticgeometric mean inequality, for any positive numbers a, b, c, d, we know

    thatabcd (a4 + b4 + c4 + d4)/4. So the right hand side of the above inequality is less than

    1

    4

    ik

    j=ik1+1 E[|(X(N)i1 , . . . ,X

    (N)im

    )|4|

    (N)i =w

    (N)R

    ]

    + (t(N))2E[|C3+ C4|X(N)j1||4|(N)j |4|(N)i =w(N)

    R

    ]

    + E[|Y(N)j |4|(N)i =w(N)R ] + E[|Y(N)

    i |4|(N)i =w(N)R ]

    . (36)

  • 8/14/2019 Paul Glasserman - Malliavin Greeks without Malliavin calculus.pdf

    25/35

    N. Chen, P. Glasserman / Stochastic Processes and their Applications 117 (2007) 16891723 1713

    ByLemmas C.3andC.2, the last two terms in(36)are both bounded by constants independent

    ofN. The polynomial growth ofimplies that

    E[|(X(N)i1 , . . . ,X

    (N)im )|

    4

    |(N)i =w(N)R ] Em

    k=1(X

    (N)ik )

    2

    (N)i =w

    (N)R

    2p

    m2pm

    k=1E[(X(N)ik )

    4p|

    (N)i =w

    (N)R

    ].

    ByLemma C.1,we know that it is also bounded by a constant independent ofN.For

    (t(N))2E[|X(N)j1|4|(N)j |4|(N)i =w(N)R ],

    we consider two cases. If j=i , noting that X(N)

    j1 and(N)

    j are independent, this equals

    (t(N))2E[|X(N)j1|4|(N)i =w(N)R ] E[|(N)j |4]

    sup1jN

    E|(N)j |4 (t(N))2 E[|X(N)j1|4|(N)i =w(N)R ],

    which is bounded by a constant byLemma C.1and the fact that sup1jN E|(N)j |4 is uniformlybounded for all N; if j= i , it equals

    E[|X(N)j1|4 (t(N)w

    (N)R )

    4] =(x

    T(N))4 supj

    E|X(N)j1|4,

    which is also bounded by a constant because(N) 1 and supj E|X(N)j1|4 bounded.In summary, all summands in (36)are bounded by constants independent of N. Therefore,

    there exists a constant Csuch that

    E

    (X(N)i1 , . . . ,X(N)im )

    Z(N)ik

    Y(N)ik

    Z(N)ik1Y

    (N)ik1

    Y(N)i

    (X(N)i1)

    (N)i =w

    (N)R

  • 8/14/2019 Paul Glasserman - Malliavin Greeks without Malliavin calculus.pdf

    26/35

    1714 N. Chen, P. Glasserman / Stochastic Processes and their Applications 117 (2007) 16891723

    Eq.(37)holds by arguments similar to those used in the proof ofTheorem 4.6and the fact

    thatL (N) is P-UT (cf.Lemma 4.2).Now turn to the proof of(38).Notice that

    d Z(N)

    i

    dx= d Z

    (N)

    i1dx

    + (X(N)i1)Y(N)i1Z(N)i1+ (X(N)i1) d Z(N)

    i1dx

    t(N) + (X(N)i1)Y(N)i1 (N)i

    +

    (X(N)i1)Y(N)i1Z(N)i1+ (X(N)i1)d Z(N)i1

    dx

    (N)i

    t(N)

    and

    dY(N)

    i

    dx= dY

    (N)i1

    dx+

    (X(N)i1)(Y(N)i1 )2 + (X(N)i1)dY

    (N)i1

    dx

    t(N) +

    (X(N)i1)(Y(N)i1 )2

    + (X(N)i1) dY(N)

    i1dx

    (N)i

    t(N).

    Using arguments similar to those in Lemma 4.3, we can show that (d Z(N)/dx, dY(N)/dx) weaklyconverges to(dZ/dx, dY/dx)in the Skorohod topology. Thus,

    d

    dx

    Z(N)imY

    (N)im

    =

    d Z(N)imdx

    1Y

    (N)im

    Z(N)im

    (Y(N)

    im)2

    dY

    (N)im

    dx

    dZtm

    dx 1

    Ytm Ztm

    (Ytm )2dYtm

    dx = d

    dxZtm

    Ytm .

    4.4. Convergence of rho estimators

    In this section, we turn to the convergence of the rho estimators. Because the necessary

    arguments are very similar to (and simpler than) what we used in Theorems 4.6 and4.9, we

    just list the theorem as follows without detailed justification:

    Theorem 4.10. Under all assumptions in Section 3 andAssumption4.2,the estimator given in

    Theorem3.3 converges weakly to

    (Xt1 , . . . ,Xtm ) T

    0

    (Xs )

    (Xs )dWs .

    The rest of this article presents technical results inAppendices AC.

    Acknowledgment

    This work is supported, in part, by NSF grants DMI0300044 and DMS0410234.

    Appendix A. Some technical lemmas

    Lemma A.1. There exist parameters satisfying (22)(24).

  • 8/14/2019 Paul Glasserman - Malliavin Greeks without Malliavin calculus.pdf

    27/35

    N. Chen, P. Glasserman / Stochastic Processes and their Applications 117 (2007) 16891723 1715

    Proof. Notice that

    limx+

    xex 2/2

    2(x) 1= 0,

    where ()is the cumulative normal distribution function. Thus, we can choose an x such that

    F(x):= 22

    x ex 2/2

    2(x) 1

  • 8/14/2019 Paul Glasserman - Malliavin Greeks without Malliavin calculus.pdf

    28/35

    1716 N. Chen, P. Glasserman / Stochastic Processes and their Applications 117 (2007) 16891723

    the above equality should be less than

    sup

    KRN

    |n( x i1 , . . . ,x im ) ( x i1 , . . . ,x im )|2 N

    j=1

    C2

    exp

    w2j

    2

    dw1 dwN

    if we enlarge the whole integration region from[wL , wR]N to RN. Now applying change ofvariable here to switch integration variables from (w1, . . . , wN) to ( x 1 , . . . ,x N), we have theabove should equal to

    C2

    NsupK

    RN

    |n |2

    N

    j=1

    1t( (xj

    1)

    +

    (xj

    1))

    exp

    (xi xi1 (xi1)t)

    2

    2((xj

    1)

    +

    (xj

    1))2t

    dx .

    By part (3) ofAssumption 3.2,we know that+ is bounded below by a positive constant for allK. Thus the above should be less than

    CN

    RN|n |2

    Nj=1

    12 t

    exp

    (xi xi1 (xi1)t)

    2

    22t

    dx1 dxN.

    Notice that the production is the joint density function of(X1, . . . ,XN) where Xi is definedby the recursion: Xi = Xi1+ (Xi1)t+

    ti . One can easily show that the above

    converges to 0 as n +

    if using the argument of Theorem C.6(iv) in Evans [6] (cf. p. 630.

    The proof there is under the Lebesgue measure. But one can easily change it to cover our case

    under the measure induced by(X1, . . . ,XN)). This implies,

    limn supK

    E[|n (Xi1 , . . . ,Xim ) (Xi1 , . . . ,Xim )|2] =0.

    Next, applying the arguments which lead toTheorem 3.2and noticing that n is differentiable,

    for any K,d

    d

    E

    [n (

    Xi1 , . . . ,

    Xim )

    ] = E

    [n (

    Xi1 , . . . ,

    Xim )

    D()

    ] (39)

    where

    D()=Zik

    Yik

    Zik1Yik1

    iki=ik1+1

    aiY

    i1 (X

    i1)

    ti

    ddx

    ZimYim

    +

    mk=1

    ZikYik

    Zik1Yik1

    iki=ik1+1

    ai t

    (Xi1)Yi1 (Xi1)

    ti

    +

    (Xi1)Yi1 (Xi1)

    2i

    1

  • 8/14/2019 Paul Glasserman - Malliavin Greeks without Malliavin calculus.pdf

    29/35

    N. Chen, P. Glasserman / Stochastic Processes and their Applications 117 (2007) 16891723 1717

    + C2

    ew2

    R2

    mk=1

    iki=ik1+1

    ai

    tZik

    Yik

    Zik1

    Yik1

    Y

    i

    (Xi1)

    i =wR

    C2

    ew2L2

    mk=1

    ik

    i=ik1+1ai

    tZik

    Yik

    Zik1

    Yik1

    Yi

    (Xi1)

    i =wL

    .

    By the CauchySchwarz inequality,

    supK

    E[|n (Xi1 , . . . ,Xim ) (Xi1 , . . . ,Xim )| |D()|]

    supK

    E[|n(Xi1 , . . . ,Xim ) (Xi1 , . . . ,Xim )|2] supK

    E[|D()|2].

    Note that D()is a continuous function with respect to and realization of(1, . . . ,N). The following result is a discrete version of the well-known Girsanov theorem. We state it

    without proof.

    Lemma A.3. Define a new probability measured P =UdP with U defined as in Section3.3.Then,

    P[

    t1dw1, . . . ,

    tN dwN] = 1

    (2t)N/2

    exp

    1

    2

    N

    i=1

    (wi (Xi1) (Xi1) t)2

    t dw1

    dwN.

    Appendix B. Weak convergence of stochastic differential equations

    This appendix presents a lemma, based on Kurtz and Protter [13] or Kurtz and Protter [14]

    on the stability of stochastic differential equations that serves as the theoretical foundation of

    Section4.

    Suppose that the space of all cadlag functions:[0, T] Rm (that is, functions with rightcontinuity and left limits) by Dm[0, T]. Equip this space with the Skorokhod topology: asequence of processes n

    Dm

    [0, T

    ]converges to a process if and only if there exists a

    sequence of functionsn , which are all strictly increasing withn(0)=0 andn (T)=T, suchthat

    limn+ sup0tT

    |n(s) s| =0 and limn+ sup0tT

    |n (n (s)) (s)| =0.

    We work with a sequence of probability spaces B(N) := ((N),F(N), {F(N)t },P (N)) anda probability space B := (,F, {Ft},P ). For each N, an m-dimensional process L (N) is asemimartingale defined on the probability space B(N), which is adapted to the filtration{F(N)t }.The sample paths of L (N) is in the space Dm[0, T]. Let g(N) : Rk [0, +) Mkm be acontinuous function mapping, with

    Mkm

    the space of k m matrices. Assume that a pair ofk-dimensional processes M(N) andU(N) satisfies the following SDE:

    M(N)t =U(N)t +

    t0

    g(N)(M(N)s , s)dL (N)s .

  • 8/14/2019 Paul Glasserman - Malliavin Greeks without Malliavin calculus.pdf

    30/35

    1718 N. Chen, P. Glasserman / Stochastic Processes and their Applications 117 (2007) 16891723

Similarly, on the space $(\Omega, \mathcal{F}, \{\mathcal{F}_t\}, P)$, a triple $(M, U, L)$ and a continuous function $g$ satisfy
$$M_t = U_t + \int_0^t g(M_s, s)\,dL_s, \qquad (40)$$
where the sample paths of $L$ are in $D^m[0,T]$ too.

In addition, we say that a sequence of probability measures $\mu_n$ on $D^m[0,T]$ converges weakly in the Skorokhod topology to another probability measure $\mu$ if for any bounded continuous functional $f : D^m[0,T] \to \mathbb{R}$ we have
$$\lim_{n\to+\infty}\int_{D^m[0,T]} f(\xi)\,d\mu_n(\xi) = \int_{D^m[0,T]} f(\xi)\,d\mu(\xi).$$
Then we write $\mu_n \Rightarrow \mu$.

We would like to present a sufficient condition under which the processes $M^{(N)}$ converge weakly to $M$. But before doing that, several definitions are needed.

Definition B.1 (Elementary Processes). An elementary process on $\mathcal{B}^{(N)}$ is a predictable process of the form
$$H_t = Y_0\,\mathbf{1}_{\{0\}}(t) + \sum_{i=1}^{k} Y_i\,\mathbf{1}_{(s_i, s_{i+1}]}(t),$$
for some positive integer $k$ and $0 < s_1 < s_2 < \cdots < s_{k+1}$, where each $Y_i$ is $\mathcal{F}^{(N)}_{s_i}$-measurable and bounded by $1$. Denote by $\mathcal{H}^{(N)}$ the collection of all such processes.

Definition B.2 (P-UT). The sequence of semimartingales $\{L^{(N)}\}$ is said to be predictably uniformly tight (P-UT) if, for every $t > 0$,
$$\lim_{a\to+\infty}\ \sup_{H^{(N)}\in\mathcal{H}^{(N)},\,N}\ P^{(N)}\!\left(\left|\int_0^t H^{(N)}_{s}\,dL^{(N)}_s\right| > a\right) = 0.$$

We now have the following lemma on the weak convergence of $M^{(N)}$. It is a simple corollary of Theorem 5.4 in Kurtz and Protter [13] or Theorem 8.6 in Kurtz and Protter [14]:

Lemma B.3. Suppose that $(U^{(N)}, L^{(N)})$ converges weakly to $(U, L)$ in the Skorokhod topology and that $\{L^{(N)}\}$ is P-UT. Assume that $g^{(N)}$ converges to $g$ uniformly on compact sets: for any compact $K$,
$$\lim_{N\to+\infty}\ \sup_{(x,t)\in K}\ \bigl|g^{(N)}(x,t) - g(x,t)\bigr| = 0. \qquad (41)$$
If there exists a global solution $M$ of (40) and weak local uniqueness holds, then $(U^{(N)}, L^{(N)}, M^{(N)}) \Rightarrow (U, L, M)$ in the Skorokhod topology.
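As a minimal numerical illustration of the type of limit this lemma is used for, the sketch below compares a functional of the Euler approximation with its continuous-time limit. Geometric Brownian motion is chosen only because $E[X_T^2] = x_0^2\,e^{(2\mu+\sigma^2)T}$ is available in closed form; the coefficients and the test functional are illustrative choices, not taken from the paper.

```python
# Weak convergence of Euler approximations, illustrated on geometric Brownian
# motion where E[X_T^2] is known exactly.  All parameter values are illustrative.
import numpy as np

rng = np.random.default_rng(1)
x0, mu, sigma, T, n_paths = 1.0, 0.05, 0.2, 1.0, 400_000

def euler_second_moment(N):
    dt = T / N
    x = np.full(n_paths, x0)
    for _ in range(N):
        eps = rng.standard_normal(n_paths)
        # Euler step: X_i = X_{i-1} + mu(X_{i-1})*dt + sigma(X_{i-1})*sqrt(dt)*eps_i
        x = x + mu * x * dt + sigma * x * np.sqrt(dt) * eps
    return (x ** 2).mean()

exact = x0 ** 2 * np.exp((2 * mu + sigma ** 2) * T)
for N in (4, 16, 64):
    print("N =", N, " Euler estimate:", euler_second_moment(N), " exact:", exact)
```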

    Appendix C. Uniform bounds on moments

In this appendix, we derive various upper bounds for the moments of $X^{(N)}$, $Y^{(N)}$ and $(Y^{(N)})^{-1}$ used in the proof of weak convergence of the vega estimators. Throughout this appendix, these processes are assumed to be constructed from the truncated normal increments $\varepsilon^{(N)}_i$.
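For reference, here is a sketch of the discretized pair $(X^{(N)}, Y^{(N)})$ as it appears in the bounds below. It assumes that the truncation acts by clipping standard normal increments to $[w^{(N)}_L, w^{(N)}_R]$ (consistent with the point mass at $w^{(N)}_R$ used in the conditioning) and that $Y^{(N)}$ follows the linearized recursion visible in the proofs; the coefficient functions and the truncation level are illustrative assumptions.

```python
# Sketch of the truncated-increment Euler scheme X and its first-variation
# process Y used in Appendix C.  Assumptions (not from the paper): increments
# are standard normals clipped to [-wR, wR]; mu, sigma and wR are illustrative.
import numpy as np

def simulate(mu, dmu, sigma, dsigma, x0, T, N, wR, rng):
    dt = T / N
    x, y = x0, 1.0                                   # Y_0 = 1
    for _ in range(N):
        eps = np.clip(rng.standard_normal(), -wR, wR)            # truncated increment
        y *= 1.0 + dmu(x) * dt + dsigma(x) * np.sqrt(dt) * eps   # Y_i recursion
        x += mu(x) * dt + sigma(x) * np.sqrt(dt) * eps           # Euler step for X_i
    return x, y

rng = np.random.default_rng(2)
mu, dmu = (lambda x: 0.05 * x), (lambda x: 0.05)
sigma, dsigma = (lambda x: 0.2 * x), (lambda x: 0.2)
print(simulate(mu, dmu, sigma, dsigma, 1.0, 1.0, 100, wR=4.0, rng=rng))
```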


Lemma C.1. Suppose that $p$ is a positive integer. Then, for any $j$,
$$\sup_{N}\ \sup_{1\le i\le N}\ E\bigl[(X^{(N)}_i)^{2p}\,\big|\,\varepsilon^{(N)}_j = w^{(N)}_R\bigr] < +\infty.$$


For index $j$,
$$
\begin{aligned}
E\bigl[(X^{(N)}_j)^{2p}\,\big|\,\varepsilon^{(N)}_j = w^{(N)}_R\bigr] &= E\bigl[\bigl(X^{(N)}_{j-1} + \mu(X^{(N)}_{j-1})\Delta t^{(N)} + \sigma(X^{(N)}_{j-1})\sqrt{\Delta t^{(N)}}\,\varepsilon^{(N)}_j\bigr)^{2p}\,\big|\,\varepsilon^{(N)}_j = w^{(N)}_R\bigr]\\
&= E\bigl[\bigl(X^{(N)}_{j-1} + \mu(X^{(N)}_{j-1})\Delta t^{(N)} + \sigma(X^{(N)}_{j-1})\sqrt{\Delta t^{(N)}}\,w^{(N)}_R\bigr)^{2p}\bigr].
\end{aligned}
$$
By Assumption 4.1, $\sqrt{\Delta t^{(N)}}\,w^{(N)}_R = \sqrt{T}\,\delta^{(N)}_x$, which is bounded by a constant $C_3$ for all $N$ because $\delta^{(N)}_x \le 1$. Using the vector inequality $(a+b+c)^{2p} \le 3^{2p-1}(a^{2p}+b^{2p}+c^{2p})$, and $|\mu(x)| \le C_1(1+|x|)$ and $|\sigma(x)| \le C_1(1+|x|)$, we have
$$
\begin{aligned}
E\bigl[(X^{(N)}_j)^{2p}\,\big|\,\varepsilon^{(N)}_j = w^{(N)}_R\bigr] &\le 3^{2p-1}\,E\bigl[|X^{(N)}_{j-1}|^{2p} + |\mu(X^{(N)}_{j-1})|^{2p}\,|\Delta t^{(N)}|^{2p} + C_3^{2p}\,|\sigma(X^{(N)}_{j-1})|^{2p}\bigr]\\
&\le 3^{2p-1}\,E\bigl[\bigl(1 + |X^{(N)}_{j-1}|\bigr)^{2p}\bigr]\,\bigl(1 + C_1^{2p}\,|\Delta t^{(N)}|^{2p} + C_3^{2p} C_1^{2p}\bigr).
\end{aligned}
$$
By the vector inequality $(a+b)^{2p} \le 2^{2p}(a^{2p}+b^{2p})$ again, we can find a constant $C_4$ such that the right-hand side of the above is less than $C_4\bigl(1 + E[|X^{(N)}_{j-1}|^{2p}]\bigr)$.

In summary, for a general $i$, we can find a common constant $C$, which is independent of $N$, such that
$$E\bigl[(X^{(N)}_i)^{2p}\,\big|\,\varepsilon^{(N)}_j = w^{(N)}_R\bigr] \le (1 + C\Delta t^{(N)})\,E\bigl[(X^{(N)}_{i-1})^{2p}\,\big|\,\varepsilon^{(N)}_j = w^{(N)}_R\bigr] + C\Delta t^{(N)}$$
if $i \ne j$, or
$$E\bigl[(X^{(N)}_i)^{2p}\,\big|\,\varepsilon^{(N)}_j = w^{(N)}_R\bigr] \le C\,E\bigl[(X^{(N)}_{i-1})^{2p}\,\big|\,\varepsilon^{(N)}_j = w^{(N)}_R\bigr] + C$$
if $i = j$. Accordingly, by induction, we can easily get that
$$E\bigl[(X^{(N)}_i)^{2p}\,\big|\,\varepsilon^{(N)}_j = w^{(N)}_R\bigr] \le (1 + C + 2C\Delta t^{(N)})(1 + C\Delta t^{(N)})^{N-1}.$$
The limit of the right-hand side exists as $N \to +\infty$. Therefore,
$$\sup_{N}\ \sup_{1\le i\le N}\ E\bigl[(X^{(N)}_i)^{2p}\,\big|\,\varepsilon^{(N)}_j = w^{(N)}_R\bigr] < +\infty.$$
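Indeed, with the uniform step $\Delta t^{(N)} = T/N$, the elementary bound $1 + u \le e^u$ gives
$$(1 + C\Delta t^{(N)})^{N-1} \le \exp\bigl(C(N-1)\Delta t^{(N)}\bigr) \le e^{CT},$$
so the right-hand side above is bounded by a constant that does not depend on $N$.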


For $i \ne j$, arguments similar to those in the proof of Lemma C.1 give
$$E\bigl[(Y^{(N)}_i)^{4}\,\big|\,\varepsilon^{(N)}_j = w^{(N)}_R\bigr] \le (1 + C_5\Delta t^{(N)})\,E\bigl[(Y^{(N)}_{i-1})^{4}\,\big|\,\varepsilon^{(N)}_j = w^{(N)}_R\bigr],$$
where $C_5$ is a constant which does not depend on $N$ and the index $i$.

For index $j$,
$$
\begin{aligned}
E\bigl[(Y^{(N)}_j)^{4}\,\big|\,\varepsilon^{(N)}_j = w^{(N)}_R\bigr] &= E\bigl[(Y^{(N)}_{j-1})^{4}\,\bigl(1 + \mu'(X^{(N)}_{j-1})\Delta t^{(N)} + \sigma'(X^{(N)}_{j-1})\sqrt{\Delta t^{(N)}}\,\varepsilon^{(N)}_j\bigr)^{4}\,\big|\,\varepsilon^{(N)}_j = w^{(N)}_R\bigr]\\
&= E\bigl[(Y^{(N)}_{j-1})^{4}\,\bigl(1 + \mu'(X^{(N)}_{j-1})\Delta t^{(N)} + \sigma'(X^{(N)}_{j-1})\sqrt{\Delta t^{(N)}}\,w^{(N)}_R\bigr)^{4}\,\big|\,\varepsilon^{(N)}_j = w^{(N)}_R\bigr].
\end{aligned}
$$
We know from the proof of Lemma C.1 that $\sqrt{\Delta t^{(N)}}\,w^{(N)}_R \le C_3$. Thus, by the boundedness of $\mu'$ and $\sigma'$ and similar arguments as those in the proof of Lemma C.1,
$$E\bigl[(Y^{(N)}_j)^{4}\,\big|\,\varepsilon^{(N)}_j = w^{(N)}_R\bigr] \le 3^{3}\,E\bigl[|Y^{(N)}_{j-1}|^{4}\bigr]\,\bigl(1 + C_6^{4}\,|\Delta t^{(N)}|^{4} + C_6^{4} C_3^{4}\bigr),$$
where $\sup_x|\mu'| < C_6$ and $\sup_x|\sigma'| < C_6$.
By induction,
$$E\bigl[(Y^{(N)}_i)^{4}\,\big|\,\varepsilon^{(N)}_j = w^{(N)}_R\bigr] \le (1 + C_5\Delta t^{(N)})^{N-1}\cdot 3^{3}\,\bigl(1 + C_6^{4}\,|\Delta t^{(N)}|^{4} + C_6^{4} C_3^{4}\bigr) \le e^{C_5 T}\,3^{3}\,\bigl(1 + C_6^{4} C_3^{4}\bigr).$$

Lemma C.3. For any $j$,
$$\sup_{N}\ \sup_{1\le i\le N}\ E\bigl[(Y^{(N)}_i)^{-4}\,\big|\,\varepsilon^{(N)}_j = w^{(N)}_R\bigr] < +\infty.$$

Proof. Let $G(x) = (1+x)^{-4}$ and choose $\delta > 0$ (by Assumption 4.1, such a $\delta$ exists because $1 - \sup_x|\sigma'(x)|\sqrt{T}\,\delta_x > 0$). Suppose $x$ lies in the interval $[-\sup_x|\sigma'(x)|\sqrt{T}\,\delta_x - \delta,\ \sup_x|\sigma'(x)|\sqrt{T}\,\delta_x + \delta]$; then we can bound the function $G(x)$ by $1 - 4x + 10mx^2$, where $m$ is the maximum value of $1/(1+\xi)^6$ over the interval $[-\sup_x|\sigma'(x)|\sqrt{T}\,\delta_x - \delta,\ \sup_x|\sigma'(x)|\sqrt{T}\,\delta_x + \delta]$.
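This is the second-order Taylor bound for $G(x) = (1+x)^{-4}$: since $G'(x) = -4(1+x)^{-5}$ and $G''(x) = 20(1+x)^{-6}$, for some $\xi$ between $0$ and $x$,
$$(1+x)^{-4} = 1 - 4x + 10(1+\xi)^{-6}x^{2} \le 1 - 4x + 10mx^{2}$$
whenever $x$ lies in the interval above.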

Notice that for any index $i \ne j$, by Assumption 4.1,
$$\bigl|\mu'(X^{(N)}_{i-1})\Delta t^{(N)} + \sigma'(X^{(N)}_{i-1})\sqrt{\Delta t^{(N)}}\,\varepsilon^{(N)}_i\bigr| \le \sup_x|\mu'(x)|\,\Delta t^{(N)} + \sup_x|\sigma'(x)|\,\sqrt{\Delta t^{(N)}}\,w^{(N)}_R = \sup_x|\mu'(x)|\,\Delta t^{(N)} + \sup_x|\sigma'(x)|\,\sqrt{T}\,\delta^{(N)}_x.$$
In other words, $\mu'(X^{(N)}_{i-1})\Delta t^{(N)} + \sigma'(X^{(N)}_{i-1})\sqrt{\Delta t^{(N)}}\,\varepsilon^{(N)}_i$ falls in the interval $[-\sup_x|\sigma'(x)|\sqrt{T}\,\delta_x - \delta,\ \sup_x|\sigma'(x)|\sqrt{T}\,\delta_x + \delta]$ almost surely when $N$ is big enough. Therefore,

$$
\begin{aligned}
\bigl(1 + \mu'(X^{(N)}_{i-1})\Delta t^{(N)} + \sigma'(X^{(N)}_{i-1})\sqrt{\Delta t^{(N)}}\,\varepsilon^{(N)}_i\bigr)^{-4} &\le 1 - 4\bigl[\mu'(X^{(N)}_{i-1})\Delta t^{(N)} + \sigma'(X^{(N)}_{i-1})\sqrt{\Delta t^{(N)}}\,\varepsilon^{(N)}_i\bigr]\\
&\quad + 10m\bigl[\mu'(X^{(N)}_{i-1})\Delta t^{(N)} + \sigma'(X^{(N)}_{i-1})\sqrt{\Delta t^{(N)}}\,\varepsilon^{(N)}_i\bigr]^{2}\\
&= 1 + \bigl[-4\mu'(X^{(N)}_{i-1})\Delta t^{(N)} + 10m\,\mu'(X^{(N)}_{i-1})^{2}(\Delta t^{(N)})^{2}\bigr]\\
&\quad + \bigl[-4\sigma'(X^{(N)}_{i-1}) + 20m\,\mu'(X^{(N)}_{i-1})\sigma'(X^{(N)}_{i-1})\Delta t^{(N)}\bigr]\sqrt{\Delta t^{(N)}}\,\varepsilon^{(N)}_i\\
&\quad + 10m\,\sigma'(X^{(N)}_{i-1})^{2}\,\Delta t^{(N)}\,(\varepsilon^{(N)}_i)^{2}.
\end{aligned}
$$
Taking expectations on both sides of the above inequality, we have
$$E\bigl[\bigl(1 + \mu'(X^{(N)}_{i-1})\Delta t^{(N)} + \sigma'(X^{(N)}_{i-1})\sqrt{\Delta t^{(N)}}\,\varepsilon^{(N)}_i\bigr)^{-4}\,\big|\,\varepsilon^{(N)}_j = w^{(N)}_R\bigr] \le 1 + \bigl[4\sup_x|\mu'(x)|\,\Delta t^{(N)} + 10m\,\sup_x|\mu'(x)|^{2}(\Delta t^{(N)})^{2}\bigr] + 10m\,\sup_x|\sigma'(x)|^{2}\,\Delta t^{(N)}\,E\bigl[(\varepsilon^{(N)}_i)^{2}\bigr].$$
Because $E[(\varepsilon^{(N)}_i)^{2}]$ is bounded uniformly for all $N$ and $\mu'$ and $\sigma'$ are both bounded, there exists a constant $C_7$ such that
$$E\bigl[\bigl(1 + \mu'(X^{(N)}_{i-1})\Delta t^{(N)} + \sigma'(X^{(N)}_{i-1})\sqrt{\Delta t^{(N)}}\,\varepsilon^{(N)}_i\bigr)^{-4}\,\big|\,\varepsilon^{(N)}_j = w^{(N)}_R\bigr] \le 1 + C_7\Delta t^{(N)}.$$

For index $j$,
$$
\begin{aligned}
E\bigl[\bigl(1 + \mu'(X^{(N)}_{j-1})\Delta t^{(N)} + \sigma'(X^{(N)}_{j-1})\sqrt{\Delta t^{(N)}}\,\varepsilon^{(N)}_j\bigr)^{-4}\,\big|\,\varepsilon^{(N)}_j = w^{(N)}_R\bigr] &= E\bigl[\bigl(1 + \mu'(X^{(N)}_{j-1})\Delta t^{(N)} + \sigma'(X^{(N)}_{j-1})\sqrt{\Delta t^{(N)}}\,w^{(N)}_R\bigr)^{-4}\,\big|\,\varepsilon^{(N)}_j = w^{(N)}_R\bigr]\\
&= E\bigl[\bigl(1 + \mu'(X^{(N)}_{j-1})\Delta t^{(N)} + \sigma'(X^{(N)}_{j-1})\sqrt{T}\,\delta^{(N)}_x\bigr)^{-4}\,\big|\,\varepsilon^{(N)}_j = w^{(N)}_R\bigr],
\end{aligned}
$$
where $w^{(N)}_R = \sqrt{N}\,\delta^{(N)}_x$ by Assumption 4.1. Using the fact that $\mu'$ and $\sigma'$ are bounded and that $\delta^{(N)}_x \le 1$, we find that the right-hand side of the above equality is bounded by $\bigl(1 - \sup_x|\sigma'(x)|\sqrt{T}\,\delta_x\bigr)^{-4}$ if $N$ is big enough.

Accordingly,
$$E\bigl[(Y^{(N)}_i)^{-4}\,\big|\,\varepsilon^{(N)}_j = w^{(N)}_R\bigr] = E\bigl[(Y^{(N)}_{i-1})^{-4}\,\bigl(1 + \mu'(X^{(N)}_{i-1})\Delta t^{(N)} + \sigma'(X^{(N)}_{i-1})\sqrt{\Delta t^{(N)}}\,\varepsilon^{(N)}_i\bigr)^{-4}\,\big|\,\varepsilon^{(N)}_j = w^{(N)}_R\bigr]$$
will be less than
$$(1 + C_7\Delta t^{(N)})\,E\bigl[(Y^{(N)}_{i-1})^{-4}\,\big|\,\varepsilon^{(N)}_j = w^{(N)}_R\bigr]$$
if $i \ne j$, and will be less than
$$\bigl(1 - \sup_x|\sigma'(x)|\sqrt{T}\,\delta_x\bigr)^{-4}\,E\bigl[(Y^{(N)}_{i-1})^{-4}\,\big|\,\varepsilon^{(N)}_j = w^{(N)}_R\bigr]$$
if $i = j$. By induction, we get
$$E\bigl[(Y^{(N)}_i)^{-4}\,\big|\,\varepsilon^{(N)}_j = w^{(N)}_R\bigr] \le (1 + C_7\Delta t^{(N)})^{N-1}\,\bigl(1 - \sup_x|\sigma'(x)|\sqrt{T}\,\delta_x\bigr)^{-4} \le \frac{e^{C_7 T}}{\bigl(1 - \sup_x|\sigma'(x)|\sqrt{T}\,\delta_x\bigr)^{4}}.$$


    References

[1] E. Benhamou, Optimal Malliavin weighting function for the computation of Greeks, Mathematical Finance 13 (2003) 37–53.
[2] H.-P. Bermin, A. Kohatsu-Higa, M. Montero, Local vega index and variance reduction methods, Mathematical Finance 13 (2003) 85–97.
[3] J. Cvitanic, J. Ma, J. Zhang, Efficient computation of hedging portfolios for options with discontinuous payoffs, Mathematical Finance 13 (2003) 135–151.
[4] M. Davis, M. Johansson, Malliavin Monte Carlo Greeks for jump diffusions, Stochastic Processes and their Applications 116 (2006) 101–129.
[5] R. Durrett, Probability: Theory and Examples, 2nd ed., Duxbury Press, 1996.
[6] L. Evans, Partial Differential Equations, in: Graduate Studies in Mathematics, vol. 19, American Mathematical Society, 1998.
[7] E. Fournié, J.-M. Lasry, J. Lebuchoux, P.-L. Lions, N. Touzi, Applications of Malliavin calculus to Monte Carlo methods in finance, Finance and Stochastics 3 (1999) 391–412.
[8] E. Fournié, J.-M. Lasry, J. Lebuchoux, P.-L. Lions, Applications of Malliavin calculus to Monte Carlo methods in finance. II, Finance and Stochastics 5 (2001) 201–236.
[9] P. Glasserman, Monte Carlo Methods in Financial Engineering, Springer-Verlag, New York, 2004.
[10] E. Gobet, A. Kohatsu-Higa, Computation of Greeks for barrier and lookback options using Malliavin calculus, Electronic Communications in Probability 8 (2003) 51–62.
[11] J. Jacod, A.N. Shiryayev, Limit Theorems for Stochastic Processes, 2nd ed., Springer-Verlag, Berlin, Heidelberg, 2003.
[12] A. Kohatsu-Higa, M. Montero, Malliavin calculus in finance, in: Handbook of Computational and Numerical Methods in Finance, Birkhäuser, 2004, pp. 111–174.
[13] T.G. Kurtz, P. Protter, Weak limit theorems for stochastic integrals and stochastic differential equations, Annals of Probability 19 (3) (1991) 1035–1070.
[14] T.G. Kurtz, P. Protter, Weak convergence of stochastic integrals and differential equations, Lecture Notes in Mathematics 1627 (1995) 1–41.
[15] D. Nualart, Malliavin Calculus and Related Topics, Springer-Verlag, 1995.
[16] P. Protter, Stochastic Integration and Differential Equations, Springer-Verlag, Berlin, Heidelberg, 1990.