Asymptotic properties of Monte Carlo estimators of ...€¦ · Journal of Econometrics 134 (2006) 1–68 Asymptotic properties of Monte Carlo estimators of diffusion processes Je´roˆme

ARTICLE IN PRESS

Journal of Econometrics 134 (2006) 1–68

0304-4076/$ -

doi:10.1016/j

�CorrespoE-mail ad

www.elsevier.com/locate/jeconom

Asymptotic properties of Monte Carlo estimatorsof diffusion processes

Jerome Detemplea,b, Rene Garciac,b,d,�, Marcel Rindisbacherb,e

aBoston University School of Management, USAbCIRANO, Canada

cEconomics Department, Universite de Montreal, CanadadCIREQ, Canada

eRotman School of Management, University of Toronto, Canada

Available online 31 August 2005

Abstract

This paper studies the limit distributions of Monte Carlo estimators of diffusion processes.

We examine two types of estimators based on the Euler scheme, one applied to the original

processes, the other to a Doss transformation of the processes. We show that the

transformation increases the speed of convergence of the Euler scheme. We also study

estimators of conditional expectations of diffusions. After characterizing expected approx-

imation errors, we construct second-order bias-corrected estimators. We also derive new

convergence results for the Mihlstein scheme. Illustrations of the results are provided in the

context of simulation-based estimation of diffusion processes.

r 2005 Elsevier B.V. All rights reserved.

JEL classification: C13; C15

Keywords: Monte Carlo estimators; Diffusion processes; Simulation-based estimation; Doss

transformation

see front matter r 2005 Elsevier B.V. All rights reserved.

.jeconom.2005.06.028

nding author. Tel.: +1514 343 5960; fax: +1 514 343 5831.

dress: [email protected] (R. Garcia).

www.elsevier.com/locate/jeconom

ARTICLE IN PRESS

J. Detemple et al. / Journal of Econometrics 134 (2006) 1–682

1. Introduction

Standard applications in finance, such as portfolio allocation, asset pricing andrisk management, rely on models of prices and state variables described by diffusionprocesses. For practical implementations of these financial models estimates of thedrift and volatility coefficients of the underlying processes are needed. Precisestatistical methods are essential for that purpose. The importance of accuratemethods is further enhanced by the potentially large impact of parameter uncertaintyand estimation errors on economic decisions.1

While maximum likelihood is the estimation method of choice, it is, in general,infeasible as explicit formulas for transition densities are often unknown. Inferencecan, therefore, only proceed with the use of an auxiliary numerical proceduredesigned to approximate the transition density. An example is the simulatedmaximum likelihood estimation (SMLE) method, suggested by Pedersen (1995a,b).2

This method approximates the transition density by applying a Euler discretizationto the diffusion process. It splits the time between observations into subintervals andevaluates the transition of the process between any two observations by integratingout convolutions of subdensities (generally Gaussian) using a Monte Carloprocedure. Consistency of the estimator is established when the numbers ofsubintervals and Monte Carlo replications go to infinity, while a particular ratio ofthe two converges to zero.3 Any feasible estimator, however, involves finite numbersof subintervals and replications and will therefore be biased and inefficient.Moreover, there are trade-offs between the number of subintervals and that ofreplications. Increasing the former will decrease the bias but will boost the varianceof the estimator. Increasing the latter will reduce the variance, but will also, if theestimator is biased, diminish the probability that a confidence interval of a given sizecovers the true value of the parameter. The main contribution of our paper is tocharacterize the asymptotic properties of the errors associated with efficientapproximation procedures for the estimation of diffusion processes. Our resultsenable us to construct (second-order) bias-corrected estimators that are asympto-tically equivalent to estimators obtained by sampling from the true distribution ofthe processes.

In a recent paper, Durham and Gallant (2002) stress that SMLE can becomecomputationally expensive, if the goal is to achieve a reasonable degree of accuracy,and propose various ways to reduce this cost. Their approach uses bias- andvariance-reduction schemes to limit both the number of subintervals and that ofreplications. They investigate, by experimentation, the trade-offs between these twocontrol parameters for various discretization and simulation schemes. Our goal is tosupplement experimentation with an explicit characterization of the asymptotic errorproperties of Monte Carlo estimators for general diffusion processes.

1For optimal portfolio choice, see, for example, Bawa et al. (1979) and Barberis (2000).2See also Santa-Clara (1995) and Brandt and Santa-Clara (2002).3In Brandt and Santa-Clara (2002), the ratio S1=2=M ! 0, with S the number of replications and M the

number of subintervals. A discussion of this ratio is provided in Section 5.1.

ARTICLE IN PRESS

J. Detemple et al. / Journal of Econometrics 134 (2006) 1–68 3

As an illustration, we apply our convergence results to simulated methods ofmoments.4 These include the simulated method of moments (Duffie and Singleton,1993), indirect inference procedures (Smith, 1990; Gourieroux et al., 1993) andefficient method of moment procedures (EMM) (Gallant and Tauchen, 1996).5

Recently, Broze et al. (1998) studied the asymptotic bias associated with indirectinference. This procedure relies on the simulation of diffusion processes at a step thatis finer than the observation step to correct the discretization bias. They emphasizethat the use of any fixed time interval to perform the simulation implies anasymptotic bias.6 Although their theoretical asymptotic results support the choice ofan arbitrarily small discretization step, their numerical experiments show a trade-offbetween the length of the simulated data MT (where T is the length of the observedsample and M the number of trajectories) and the size h ¼ T=N of the discretizationstep. Our results provide an analytical characterization of this trade-off. We show, inparticular, that the second-order bias is a function of both M and N and provide aprocedure to correct simulation-based method-of-moment estimators for it.

To state the problem in general terms, note that auxiliary numerical proceduresfor SMM estimators involve the calculation of conditional expectations of functionsof the solutions of stochastic differential equations. The use of a Monte Carloprocedure for that purpose induces two types of approximation errors. The firsterror is due to the numerical solution of a stochastic differential equation based on atime discretization scheme. The second error is the Monte Carlo error due to theapproximation of the expectation in the moment condition, by an average overindependent replications. Suppose that one seeks to compute the conditionalexpectation f ðt;xÞ ¼ Et½gðX T Þ�; where X T is the terminal value of the solution of thestochastic differential equation (SDE)

dX v ¼ AðX vÞdvþ BðX vÞdW v; X t ¼ x. (1)

To approximate the terminal value X T , of the solution of (1), several discretizationschemes can be used.7 The most popular, perhaps because of its ease ofimplementation, is the Euler scheme. This iterative procedure evaluates the driftand volatility functions at the value X ðtnÞ at time tn in order to infer the value X ðtnþ1Þ

at tnþ1, and proceeds in this manner until tN ¼ T . The second approximation is inthe computation of the conditional expectation that is performed by averaging over

4The methodology developed here could be applied to other simulation-based estimation methods such

as SMLE. In those applications the steps needed to characterize the bias will typically be more involved

than for the simulated method of moments.5Other numerical approaches have been proposed to estimate a diffusion process when the transition

density is not known in closed form. Let us mention the numerical resolution of the Kolmogorov forward

equation (Lo, 1988), the estimation of the infinitesimal generator based on a truncated set of

eigenfunctions (Kessler and Sørensen, 1999; Hansen and Scheinkman, 1995), orthogonal series expansions

(Ait-Sahalia, 2002), and Markov chain Monte Carlo (MCMC) Bayesian techniques (Eraker, 2001; Chib et

al., 2001). Except for Ait-Sahalia (2002), all other methods require that the sampling interval go to zero to

obtain convergence and involve the two types of approximation errors mentioned above.6Because the simulation is based on a fixed time step, and therefore not performed using the true

probability density function, they rename the method quasi-indirect inference.7A detailed analysis of discretization schemes available can be found in Kloeden and Platen (1997).

ARTICLE IN PRESS


a finite sample of approximated terminal values X ðTÞ. Justification for this averagingrests on the law of large numbers. The combination of these two operations, labelledMCE (Monte Carlo with Euler discretization), produces an estimate of f ðt;xÞ thatinvolves the two types of errors mentioned above. Understanding the trade-offbetween these errors requires the asymptotic error distribution.

In an insightful paper, Duffie and Glynn (1995) highlighted the trade-off betweenthe discretization error and the Monte Carlo averaging error, and showed theexistence of an efficient choice of discretization steps and Monte Carlo replications.For this efficient MCE scheme, they also characterized the asymptotic distribution ofthe approximation error and found it to be non-centered. A consequence is that theefficient procedure has a second-order bias. In this paper we characterize the second-order bias as the expected value of a known random variable. This random variablecan be simulated along with the diffusion and used to design a new approximationthat corrects for second-order bias. The bias corrected estimate is asymptoticallyequivalent to a Monte Carlo procedure that samples directly from the truedistribution of the terminal point of the diffusion.8

In the first part of this paper (Sections 2–4), we study the asymptotic distributionsof errors associated with discretization schemes for general diffusion processes andof Monte Carlo estimators of conditional expectations of diffusions. Errorproperties for approximations of solutions of SDEs (the first type of error) havebeen studied before. For the Euler discretization scheme the asymptotic errordistribution was found by Kurtz and Protter (1991a) and Jacod and Protter (1998).We extend their results by proposing a change of variables, commonly referred to asa Doss transformation (see Doss, 1977; Detemple et al., 2005) that reduces thediffusion coefficient of the SDE to unity. This transformation has enjoyed recentpopularity in financial econometrics (see, for instance, Ait-Sahalia, 2002; Durhamand Gallant, 2002) and has been used for the computation of optimal portfolios indynamic asset allocation models (Detemple et al., 2003). We show that a Dosstransformation of the SDE can improve the speed of convergence of thediscretization scheme as the martingale part of the transformed SDE can beapproximated without error. The asymptotic law of the estimate of the Dosstransformed state variable is derived and found to be non-centered. This can becontrasted with the simple Euler scheme applied to the original (non-transformed)SDE that produces an error whose asymptotic law is centered.

One of the numerical schemes used to reduce bias is the Milshtein second-orderscheme (Milshtein, 1995). We characterize its asymptotic error distribution and showthat it does not dominate the Euler scheme with transformation, in terms ofconvergence behavior. This highlights the benefits of the transformation. Our resultsfor the weak limit of the solution of SDEs based on the Milshtein scheme and thesecond-order bias of estimators of conditional expectations are new and therefore of

8Our method is simpler than the one in Talay and Tubaro (1990) as it does not require solving a PDE in

addition to calculating an expectation. In addition, it leads to the construction of computationally feasible

second-order bias corrected approximation schemes.

ARTICLE IN PRESS


independent interest. Moreover, they enlighten some of the experimentation resultsof Durham and Gallant (2002).

In the second part of the paper (Section 5), we apply these results to the estimationof the parameters of diffusions by simulated methods of moments. We first provide ageneral setup for parameter estimation and then develop an explicit simulatedextended quasi-maximum likelihood estimator based on the estimator presented byWefelmeyer (1996). We illustrate the method with L-CEV and CIR processes thatare widely used in the literature to model the short rate of interest.

As mentioned above, our theoretical convergence results enable us to design bias-corrected estimators that have the same asymptotic distributions as the infeasibleestimators based on the unknown true transition densities. We illustrate theimproved performance of these estimators in simulation experiments similar to thoseperformed by Durham and Gallant (2002) for the CIR model. Their simulationsshow that the Doss transformation reduces the bias of the parameter estimates. Ourasymptotic convergence results establish that this transformation indeed increasesthe speed of convergence, hence provides a theoretical explanation for their finding.

The paper is organized as follows. In Section 2 we study the asymptotic errordistributions of approximations of solutions of SDEs and provide numericalillustrations for standard processes in finance. Section 3 describes the asymptoticlaws of estimators of conditional expectations and characterizes expectedapproximation errors, second-order discretization biases and bias-corrected estima-tors. New asymptotic convergence results for the Milshtein scheme are presented inSection 4. Section 5 provides an application to a simulation-based method ofmoments. Conclusions are formulated in Section 6. All proofs are collected inAppendix A. Appendix B contains expressions needed to characterize the second-order bias-corrected estimators.

2. Asymptotic laws of estimators of solutions of SDEs

Continuous-time financial models are often based on multivariate diffusions withgeneral drift and diffusion functions. The solution of the financial application, be itasset pricing, portfolio allocation or risk management, relies on the simulation ofdiscretized versions of these stochastic differential equations (SDEs). The Eulerscheme is most often used for this purpose. This discretization involves anapproximation error. In this section we study the asymptotic error distribution ofEuler approximations of solutions of SDEs. We also study the error distributionassociated with a Doss transformation of the state variables. This change ofvariables is useful for numerical efficiency, as shown by the dynamic portfolioapplication in Detemple et al. (2003). It has also been used by Ait-Sahalia (2002) andDurham and Gallant (2002) as a variance reduction device. The importance ofobtaining an explicit expression for the asymptotic distribution of the approximationerror cannot be underestimated. As we will see, it is not possible to approximate thedistribution of this error in a finite simulation experiment using a simulatedbenchmark for the true value X T , no matter how finely the process is sampled.

ARTICLE IN PRESS


Convergence results for the Euler scheme are reviewed in Section 2.1. Theirapplication to Doss-transformed processes is studied in Section 2.2. Numericalillustrations are provided in Section 2.3.9

2.1. Euler approximation without transformation

To set the stage for the convergence results with the transformation and theMilshtein scheme, we recall known results for the standard Euler scheme. Theseresults are also essential for finding an explicit expression for the second-order bias.

Consider the d � 1 random vector X T given by the terminal value of the solutionof the SDE

dX v ¼ AðX vÞdvþXd

j¼1

BjðX vÞdW jv, (2)

where A and Bj are d � 1 vectors such that A 2 C3ðRd Þ, Bj 2 C3ðRd Þ and A;Bj are atmost of linear growth.10 The Euler approximation of (2) is

X NT ¼ X 0 þ

XN�1n¼0

AðX NnhÞhþ

XN�1n¼0

Xd

j¼1

BjðXNnhÞDW

jnh, (3)

where h ¼ T=N and DWjnh ¼W

jðnþ1Þh �W

jnh.

11

Kurtz and Protter (1991a) and Jacod and Protter (1998) deduce the asymptoticbehavior of the error associated with the Euler approximation of X T (see Jacod andProtter, 1998, Theorem 3.2, p. 276).

Theorem 1. The approximation error X NT � X T converges weakly at the rate 1=

ffiffiffiffiffiNp

(i.e.ffiffiffiffiffiNpðX N

T � X T Þ ) UXT ).

12 The asymptotic error is

UXT ¼ �

1ffiffiffi2p OT

Z T

0

O�1v

Xd

l;j¼1

½qBjBl �ðX vÞdZl;jv (4)

with ½Zl;j �l;j2f1;...;dg a d2� 1 standard Brownian motion independent of W, qBj a d � d

matrix of derivatives of Bj with respect to X and

Ov ¼ ER

Z �0

qAðX sÞdsþXd

j¼1

Z �0

qBjðX sÞdW js

!v

. (5)

9To simplify the presentation we assume homogeneous dynamics of the process to be simulated.10The space CkðRd Þ denotes the space of k times continuously differentiable, Rd -valued functions.11To simplify the notation we restrict the error analysis to equidistant discretization schemes.12Let S be a metric space and S its Borel sets. A sequence of random variables X N is said to converge

weakly to a random variable X whenever, with PXN � P � ðX N Þ�1 and PX � P � X�1, we haveR

Sf ðsÞdPXN ðsÞ !

RS

f ðsÞdPX ðsÞ for all continuous and bounded functions f on S (see Billingsley, 1968).

ARTICLE IN PRESS


In this last expression qA is the d � d matrix of derivatives of the vector A with respect

to the elements of X and ERð�Þ is the right stochastic exponential.13

Theorem 1 says that the error converges in law at the rate 1=ffiffiffiffiffiNp

to the randomvariable UX

T , as the number of discretization points N becomes large. The asymptoticerror UX

T depends on the coefficients of the SDE and their derivatives. Surprisingly,it also depends on new Brownian motions (½Zl;j�l;j2f1;...;dg), which are orthogonal tothe original ones (W). These appear because the stochastic integral of a time-dependent function with respect to a Brownian motion is imperfectly correlated withthe terminal value of the Brownian motion. A second Brownian motion is thenneeded to describe the law of the integral. The result in Theorem 1 follows becausethe limit law of X N

T depends on stochastic integrals of this sort.To illustrate the result consider the simple case of a geometric Brownian motion

dX v ¼ aX v dvþ bX v dW v. (6)

The asymptotic error is UXT ¼ �ðb=

ffiffiffi2pÞX T ZT ¼ Z

ðb2=2ÞX2T, where X T is log-normally

distributed and ZT is normal. The error distribution is a mixture of normals wherethe mixing distribution is the square of the geometric Brownian motion X T .

The scaling property of the Brownian motions Zl;j shows that the asymptoticdistribution, in the univariate case, is always a mixture of normals. But the law of themixing distribution is not always explicit. For instance, in the case of a CIR process

dX v ¼ kðX � X vÞdvþ sffiffiffiffiffiffiX v

pdW v, (7)

one finds that

UXT ¼ �

s2

2ffiffiffi2p OT

Z T

0

O�1v dZv ¼ Zs48O2

T

R T

0O�2v dv

.

The law of the mixing random variable ðs4=8ÞO2T

R T

0 O�2v dv is unknown as O satisfiesthe equation dOv ¼ ð�kdvþ ððs=2Þ=

ffiffiffiffiffiffiX v

pÞdW vÞOv where O0 ¼ 1, whose solution

depends on the path of X. One must then resort to numerical procedures to examinethe asymptotic error distribution. Illustrations of this are provided in Figs.1 and 2for the CIR process and the constant elasticity of variance process with linear drift(L-CEV).

The dependence of the asymptotic distribution on an independent Brownianmotion, that does not exist on the original probability space, shows that it is notpossible to approximate the distribution of the approximation error in a finitesimulation experiment using a simulated benchmark for the true value X T . Thisunderscores the importance of an explicit formula for the asymptotic error. In theabsence of an exact expression for UX

T , error analysis using a simulated benchmarkX N%

T with N% large (i.e. analysis offfiffiffiffiffiNpðX N

T � X N%

T Þ) will always depend just on theoriginal Brownian motions W j and not on the independent Brownian motions Zl;j

that appear in the random limit.

13For a d � d semimartingale M, the right stochastic exponential Zv ¼ ERðMÞv is the unique solution of

the d � d matrix SDE dZv ¼ dMv Zv with Z0 ¼ Id .

ARTICLE IN PRESS


Theorem 1 shows that the standard Euler procedure converges at the rate of1=

ffiffiffiffiffiNp

. Next, we introduce a change of variables that simplifies the volatility of theunderlying SDE, thereby leading to an approximation of the true value X T with animproved rate of convergence, equal to 1=N. Durham and Gallant (2002) havealready remarked in their experiments that this transformation improves theperformance of their simulation and importance sampling schemes. They suggestthat the improvement might be related to the fact that the transformed process is‘‘closer’’ to a Gaussian process. We formalize this intuition and illustrate it withexamples in the next sub-sections.

2.2. Euler approximation with Doss transformation

Let us first introduce the Doss transformation.14 Consider the transformedvolatility coefficient, BBðxÞ � BðxÞB

�1, where B is an arbitrary matrix of constants.

Suppose that the rotated volatility coefficient BB satisfies the rank condition,

rankðBBÞ ¼ d; a.e., (8)

and the commutativity condition,

qBBj BB

i ¼ qBBi BB

j for all i; j ¼ 1; . . . ; d. (9)

Then, there exists a function GB : Rd ! Rd solution of the total ODE

qzGBðzÞ ¼ BBðGBðzÞÞ; GBð0Þ ¼ 0, (10)

such that X t ¼ GBðX tÞ, where

dX v ¼ AðX vÞdvþXd

j¼1

Bj dW jv with GBðX 0Þ ¼ X 0, (11)

and

AðxÞ � BBðxÞ�1AtGBðxÞ, (12)

where the operator A is defined by

AGB � AðGBÞ �1

2

Xd

j¼1

qBBj ðG

BÞ. (13)

Note that in the case where B is commutative, one can choose B ¼ Id . In thisinstance the transformed state variables X j have unit volatility coefficients.

The Euler approximation of the transformed state variables satisfying (11) is

XN

T ¼ X 0 þXN�1n¼0

AðXN

nhÞhþXN�1n¼0

Xd

j¼1

BjDWjnh.

The error distribution of this approximation of the d-vector X T is given next.

14See Doss (1977) and Detemple et al. (2005) for details.

ARTICLE IN PRESS


Theorem 2. Suppose that the rank and commutativity conditions (8) and (9) are

satisfied. The approximation error XN

T � X T converges weakly at the rate 1=N (i.e.

NðXN

T � X T Þ ) UXT ). The asymptotic error is

UXT ¼ � OT

Z T

0

O�1

v qAðX vÞ

�1

2dX v þ

1ffiffiffiffiffi12p

Xd

j¼1

Bj dZjv þ

1

2

Xd

j;k;l¼1

ql;kAðX vÞBk;j Bl;j dv

!ð14Þ

with ½Zj �j2f1;...;dg a d � 1 standard Brownian motion independent of W and Zh;j ,

qAðX vÞ ¼ ½q1AðX vÞ; . . . ; qdAðX vÞ� the d � d matrix with columns given by the

derivatives of the vector AðX vÞ, and ql;kAðX sÞ the d � 1 vector of cross derivatives of

AðX vÞ with respect to arguments l; k. The d � d matrix Ov is

Ov ¼ ER

Z �0

qAðX sÞds

� �v

. (15)

Theorem 2 shows that the speed of convergence increases after application of thetransformation. It also shows that the limiting random variable is different and, inparticular, involves exponentials of a bounded total variation process instead of astochastic integral. But, in contrast to the limit without transformation UX

T , theasymptotic error UX

T does not have a zero mean.The simplest example is the Ornstein–Uhlenbeck process

dX v ¼ kðX � X vÞdvþ sdW v. (16)

Its asymptotic error UXT is the sum of two normal random variables

UXT ¼

k2e�kT

Z T

0

ekv dX v þksffiffiffiffiffi12p e�kT

Z T

0

ekv dZv

�k2

2ðX � X 0Þe

�kT T þW aðTÞ þ ZbðTÞ, ð17Þ

where

aðTÞ �s2kðe2kT � 1þ 2kTð1� kTÞÞ

16e2kTand bðTÞ �

ks2

24ð1� e�2kT Þ.

If X 0aX the asymptotic law is clearly non-centered.The result in Theorem 2 can now be used to construct an approximation of

X T ¼ GðX T Þ with an improved speed of convergence.

Corollary 1. Under the conditions of Theorem 2, NðGðXN

T Þ � X T Þ ) BðX T ÞUXT .

The convergence rate 1=N attained by GðXN

T Þ is the same as the convergence rateof the Euler scheme applied to an ordinary differential equation. This is the best ratethat can be attained with an Euler scheme.

ARTICLE IN PRESS


2.3. A numerical example

For concreteness we illustrate Theorems 1 and 2 with a mean reverting, constantelasticity of variance process

dX v ¼ kðX � X vÞdvþ sX gv dW v. (18)

This specification is often used to model the short rate in term structure models, as inChan et al. (1992). For g ¼ 0:5 we obtain a CIR process; otherwise it is a standardL-CEV process.

For a precise characterization of the asymptotic error we need to identify theexpressions in OT and UX

T . Straightforward computations give qBðxÞ ¼ sgxg�1 andqAðxÞ ¼ �k. The SDE for the transformed process is15

dX v ¼qA

B�

1

2qB

� �� G

� �ðX vÞ þ dW v, (19)

where GðxÞ ¼ ðsð1� gÞxÞ1=ð1�gÞ. Similarly, expressions for qA and q2A that appear inOT and UX

T , are

qAðxÞ ¼ qA�AqB

B�

1

2q2BB

� �� G

� �ðxÞ

and

q2AðxÞ ¼ q2AB� qA qB� Aq2 BþAðqBÞ2

B�

1

2ðq3B Bþ q2BqBÞB

� �� G

� �ðxÞ

(20)

with q2BðxÞ ¼ sgðg� 1Þxg�2, q3BðxÞ ¼ sgðg� 1Þðg� 2Þxg�3 and q2AðxÞ ¼ 0.For the L-CEV process we adopt the parameter values in Chan et al. (1992):

k ¼ 0:0171, X ¼ 0:1138, s ¼ �0:0655, g ¼ 0:9997. Parameter values for the CIRprocess are taken from Broze et al. (1998): k ¼ 0:0305, X ¼ 0:0791, s ¼ �0:0219.

Figs. 1 and 2 graph the asymptotic error distributions for CIR and L-CEV. Thegraphs plot the empirical distribution functions based on M ¼ 50 000 replicationsand use T ¼ 1 and N ¼ 365. Observe that the empirical distributions using thetransformation are non-centered and that the errors, with the transformation, areconsiderably smaller.

Comparing Figs. 3 and 4 for the CIR process and Figs. 5 and 6 for the L-CEVprocess reveals the increased speed of convergence with the transformation. In thoseexperiments the benchmark ‘‘true’’ value is computed without transformation andtaking N ¼ 214.16 Approximation errors, relative to this benchmark, are thencomputed using N ¼ 2x with x ¼ 2; . . . ; 9. Distribution functions are again based onM ¼ 50 000 replications.

The simulation of the discretized version of the process of interest is usually thefirst step of a Monte Carlo procedure for an econometric technique such as the

15Given two functions f and g, the notation � reads ½f � g�ðxÞ � f ðgðxÞÞ.16We take this shortcut to illustrate the relative speed of convergence.

ARTICLE IN PRESS

1

0.9

0.8

0.7

0.6

0.5

0.4

0.3

0.2

0.1

-5e-05

-2e-07 -1e-07 1e-07 2e-07 3e-07 4e-07 5e-070

-4e-05 -3e-05 -2e-05 -1e-05 1e-05 2e-05 3e-05 4e-05 5e-0500

1

0.9

0.8

0.7

0.6

0.5

0.4

0.3

0.2

0.1

0

Fig. 1. Asymptotic error distribution function of CIR process, approximated with a Euler scheme without

Doss transformation (PðUX1 pxÞ upper graph), and with Doss transformation (PðU

GðX Þ1 pxÞ lower graph).


simulated method of moments. The next step is to generate a large number ofdiscretized trajectories to compute an average designed to approximate the momentcondition. Common wisdom suggests that the precision of this estimator can beimproved by discretizing the process as finely as possible and taking a very largenumber of independent replications. In practice, however, one faces a limited budgetof computation time. Duffie and Glynn (1995) propose a computationally efficientscheme which optimizes the gains achieved by reducing the length of thediscretization step and by increasing the number of simulations of the sample pathof the discretized process. For this efficient MCE scheme, they also characterized theasymptotic distribution of the approximation error and found it to be non-centered.The efficient procedure has therefore a second-order bias. In the next section, weextend their results in several directions. We start with a characterization of thesecond-order bias as the expected value of a known random variable. As this randomvariable can be simulated, along with the diffusion, a new approximation thatcorrects for second-order bias can be designed. We also provide equivalent resultsfor the Doss-transformed process introduced in the previous section.

ARTICLE IN PRESS

1

0.9

0.8

0.7

0.6

0.5

0.4

0.3

0.2

0.1

0

1

0.9

0.8

0.7

0.6

0.5

0.4

0.3

0.2

0.1

0

-1.5e-05

-1e-07 -5e-08 5e-08 1e-07 1.5e-07 2e-07 2.5e-07 3e-07 3.5e-070

-1e-05 -5e-06 5e-06 1e-05 1.5e-050

Fig. 2. Asymptotic error distribution function of a CEV process, approximated with a Euler scheme

without Doss transformation (PðUX1 pxÞ upper graph), and with Doss transformation (PðUGðX Þ

1 pxÞ lower

graph).


3. Asymptotic laws of estimators of conditional expectations

We now derive the asymptotic error of the estimate of the conditional expectationof a function of the terminal value of an SDE, X T . When the distribution of X T isunknown an estimator of the expected value is obtained by sampling independentreplications of the numerical solution of the SDE and averaging over the sampledvalues. The approximation error of this scheme has two components (Duffie andGlynn, 1995). The first is the error due to the discretization of the SDE. The secondis the error in the approximation of the conditional expectation by a sample average.

Section 3.1 presents our central result, namely the asymptotic error distributionsassociated with estimators of conditional expectations. Auxiliary results concerningthe error component associated with the discretization scheme are described inSection 3.2. The second-order biases of these estimators are discussed in Section 3.3,and bias correction is performed in Section 3.4.

ARTICLE IN PRESS

0

0.1

−7 − 6 − 5 − 4 − 3 − 2 −1 0 1 2

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Convergence of Cumulative Distribution Function: CIR (no transformation)

Pro

babi

lity

Error x 10− 4

Fig. 3. Speed of convergence of distribution function of the approximation error X N1 � X 1 for a CIR

process approximated with a Euler scheme for N ¼ 2x and x ¼ 2; . . . ; 9.


3.1. Asymptotic error distributions

Suppose that we wish to calculate E½gðX T ÞjF0� ¼ E0½gðX T Þ� ¼ E0½gðX T Þ� where X

solves (2) or (11). The estimators without and with transformation are

gN ;M �1

M

XMi¼1

gðX i;NT Þ, (21)

and

gN ;M�

1

M

XMi¼1

gðXi;N

T Þ. (22)

These estimators of the conditional expectation rely on the law of large num-

bers and draw independent replications X i;NT (resp. X

i;N

T ) of the terminal points

ARTICLE IN PRESS

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Convergence of Cumulative Distribution Function: CIR (transformation)

Pro

babi

lity

− 1 − 0.5 0.5 1.51 0

Error x 10− 4

Fig. 4. Speed of convergence of the distribution function of the approximation error GðXN

1 Þ � X 1 for a

Doss-transformed CIR process approximated with a Euler scheme for N ¼ 2x and x ¼ 2; . . . ; 9.


X NT (resp. X

N

T ) of the Euler discretized diffusion without (resp. with) Dosstransformation. Our next theorem describes their asymptotic laws.

Theorem 3. Let g 2 C1ðRdÞ, g 2 C3ðRd Þ, and suppose that gðX T Þ 2 D1;2.17 Also

suppose that the assumptions of Theorems 4 and 5 hold. For the schemes without and

with transformation, we have, as M !1,ffiffiffiffiffiffiMpðgNM ;M � E0½gðX T Þ�Þ ) �1

2KT ðX 0Þ þ LT ðX 0Þ, (23)

ffiffiffiffiffiffiMpðgNM ;M

� E0½gðX T Þ�Þ ) �12

KT ðX 0Þ þ LT ðX 0Þ, (24)

17The spaceD1;2 is defined as the domain of the Malliavin derivative operator (see Nualart (1995) for an

exact definition and more on Malliavin calculus). A brief introduction to Malliavin calculus also appears

in Appendix D of Detemple et al. (2003).

ARTICLE IN PRESS

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Convergence of Cumulative Distribution Function: CKLS (no transformation)

Pro

babi

lity

− 8 − 6 − 4 − 2 0 2

x 10− 4Error

Fig. 5. Speed of convergence of the distribution function of the approximation error X N1 � X 1 for a CEV

process approximated with a Euler scheme for N ¼ 2x and x ¼ 2; . . . ; 9.


where limM!1NM ¼ þ1 and � ¼ limM!1

ffiffiffiffiffiffiMp

=NM , and LT ðX 0Þ is the terminal

value of a centered Gaussian martingale with (deterministic) quadratic variation and

conditional variance given by

½L;L�T ¼

Z T

0

E0½NvðNvÞ0�dv � VAR½gðX T ÞjF0�, (25)

Nv ¼ Ev½qgðX T ÞDvX T �. (26)

In these expressions DsX T is the Malliavin derivative of X T . The deterministic

functions K and K are defined in Theorems 4 and 5 below.

The random variable DsX T captures the impact of an innovation in the Brownianmotion W at time s on the state variable X at time T. In essence this derivativemeasures the persistence of a shock in the state variable. It is similar to an impulse

ARTICLE IN PRESS

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Convergence of Cumulative Distribution Function: CKLS (transformation)

Pro

babi

lity

− 6 − 4 − 2 0 2 4 6 8 10

Error x 10− 5

Fig. 6. Speed of convergence of the distribution function of the approximation error GðXN

T Þ �X T for a

Doss-transformed CEV process approximated with a Euler scheme for N ¼ 2x and x ¼ 2; . . . ; 9:


response function that quantifies the sensitivity of the variable X T to an uncertaintyshock at the prior time s.

The theorem shows that the asymptotic laws of the estimators have two parts. Thefirst, K, corresponds to the discretization bias; the second, L, results from the MonteCarlo estimation of the expectation. Note that L would not vanish, even if sampleswhere taken from the law of X T . This is because the conditional expectation cannotbe calculated in closed form.

The theorem also shows that the estimators converge at the same rate. Thisfollows from the fact that the convergence rate of the expected approxi-mation error, described in Theorems 4 and 5 in the next section, is the same.As the rate 1=


is obtained from a central limit theorem, an additional con-clusion is that higher-order schemes would fail to improve the convergencespeed. They would just reduce the factor � in the limits and therefore the second-order bias.

ARTICLE IN PRESS


3.2. Expected approximation errors

We now provide auxiliary results concerning the error component associated withthe discretization scheme. Let eN

T (eNT ) be the expected approximation error for the

scheme without (with) transformation. By definition

eNT � E0½gðX

NT Þ� � E0½gðX T Þ�, (27)

eNT � E0½gðX

N

T Þ� � E0½gðX T Þ�, (28)

where g � g � F with F the inverse of G, gðX NT Þ is an approximation of gðX T Þ based

on the Euler discretization of X, and gðXN

T Þ is an approximation of gðX T Þ based onthe Euler discretization of the transformed state variables X .

Next we study the convergence properties of these errors. These results arerelevant to the extent that the limits of NeN

T and NeNT determine the means of the

asymptotic error distributions of the corresponding efficient estimators of condi-tional expectations. As we will show, these limits characterize the second-orderapproximation bias of efficient Monte Carlo estimators of diffusions.

3.2.1. Euler scheme on the original state variables

Our first result describes the convergence of the expected approximation error eNT ,

in (27). Define the random variables

V1;T � � OT

Z T

0

O�1s qAðX sÞdX s þXd

j¼1

½qBjA�ðX sÞdW js

�Xd

i;j¼1

½qBj qBj Bi�ðX sÞdW is

!

þ OT

Z T

0

O�1s

Xd

j¼1

½qBj qBj A� þXd

j;k;l¼1

½qkðqlABk;jÞ�

" #ðX sÞds

þ OT

Z T

0

O�1s

Xd

i;j¼1

ð½q½qBj qBj Bi�Bi � qBi qBj qBj Bi�ðX sÞÞds, ð29Þ

V2;T � �

Z T

0

Xd

i;j¼1

ni;jðs;TÞds, (30)

where qA, qBj are d � d matrices of Jacobians, O is defined in Theorem 1 andni;jðs;TÞ is defined in (137)–(141). With this notation we have:

Theorem 4. Suppose that A;Bj 2 C2ðRdÞ. Let g 2 C3ðRd Þ be such that

limr!1

lim supN

E0½1fjNðgðX NTÞ�gðX T ÞÞj4rgNjgðX

NT Þ � gðX T Þj� ¼ 0 (31)

ARTICLE IN PRESS


(P-a.s.). Then,

NeNT !

12

KT ðX 0Þ �12E0½qgðX T ÞV 1;T þ V 2;T �, (32)

where V1;T , V 2;T are given by (29) and (30), and eNT is defined in (27).

Theorem 4 provides a probabilistic characterization of the asymptotic expectederror. The expressions in (32) depend on random variables V1 and V 2 that aredetermined in closed form by the derivatives of the drift and volatility coefficients ofthe SDE. They can therefore be simulated along with the state variables to derivebias-corrected estimators (see Section 3.4). This characterization of the asymptoticexpected error is easier to evaluate than the expressions presented in Talay andTubaro (1990) and Bally and Talay (1996a,b). These authors show that theasymptotic expected error for the Euler scheme can be written as the expectation of afunction of a random variable, where the function solves a PDE. In contrast, ourcharacterization is fully probabilistic as it does not require the resolution of a PDE,and therefore, does not suffer from the curse of dimensionality affecting numericalsolutions of PDEs. As a consequence, it remains applicable for multivariatediffusions. Even in the univariate case, using Monte Carlo simulation incombination with the solution of a PDE is computationally costly. This mayexplain why the theoretical results of Talay and Tubaro (1990) and Bally and Talay(1996a,b) have seen limited use in applications.

Implementation, in numerous applications, requires the computation of condi-tional expectations of path-dependent functionals of diffusion processes, such as theRiemann integral

R T

0gðX sÞds (see the example in Section 5.2). A convergent

estimator for this integral, based on the Euler scheme, isPN�1

n¼0 gðX nhÞh, whereh ¼ T=N, and X N is the solution of the Euler-discretized SDE starting at X 0.Theorem 4 can be used to deduce the asymptotic expected approximation error inthese cases.

Corollary 2. Under the assumptions of Theorem 4, we have, as N !1,

NE0

XN�1n¼0

gðX NnhÞh�

Z T

0

gðX sÞds

" #!

1

2K1T ðX 0Þ

P–a.s., with

K1T ðX 0Þ �

Z T

0

KsðX 0Þdsþ K2T ðX 0Þ, (33)

where K is defined in Theorem 4, and

K2T ðX 0Þ � � E0

Z T

0

qgðX sÞ AðX sÞ þXd

j¼1

½ðqBjÞBj�ðX sÞ

! "

þXd

j¼1

½B0j q2g Bj�ðX sÞ

!ds

#. ð34Þ

ARTICLE IN PRESS


The expected approximation error has two terms. The first term,R T

0 KsðX 0Þds,captures the cumulated expected approximation error X N � X in the Riemannintegral over the interval ½0;T �. The second term, K2T , emerges because thecontinuous Riemann integral,

R T

0 gðX sÞds, is approximated by a discrete sum,PN�1n¼0 gðX N

nhÞh.

3.2.2. Euler scheme on the transformed state variables

To derive the expected approximation error for the estimator with transformationdefine the random variable

VT ¼ �OT

Z T

0

O�1

v qAðX vÞ dX v þXd

j;k;l¼1

ql;kAðX vÞBk;j Bl;j dv

!(35)

with OT ¼ ERðR v

0 qAðX sÞdsÞ. We obtain

Theorem 5. Suppose that A 2 C1ðRdÞ, and that the conditions of Theorem 2 hold. For

g 2 C1ðRdÞ such that

limr!1

lim supN

E0½1fjNðgðX

NT Þ�gðX T ÞÞj4rg

NjgðXN

T Þ � gðX T Þj� ¼ 0 (36)

P-a.s. we have, P-a.s., as N !1,

NeNT !

12

KT ðX 0Þ �12E0½qgðX T ÞVT �, (37)

where VT is defined in (35), and eNT is defined in (28).

A comparison of (32) with (37) suggests that it will be difficult, in general, toestablish the dominance of one method over the other on the basis of the asymptoticexpected error. Indeed, the formulas reveal that both methods converge at the samespeed 1=N and that the second-order biases, while different (KT ðX 0ÞaKT ðX 0Þ), donot appear to be ordered in a systematic manner. To compare the two methods onemay want to use additional criteria, such as the computational cost.

The difference in the convergence rates to the limit errors (Theorems 1 and 2) andthose to the limit expected errors (Theorems 4 and 5) can be explained as follows.The speed of convergence for the limit error is determined by the martingale part ofthe error expansion that converges more slowly (1=

ffiffiffiffiffiNp

) than the bounded variationpart (1=N). The transformation eliminates the martingale part of the error. Takingthe expectation also eliminates the martingale part of the error. It followsimmediately that the expected errors will converge at the same rate. To see thatthe expectation eliminates the martingale part of the error in the absence of atransformation it suffices to note that this term converges weakly to a stochasticintegral whose expectation is null.18

Let us now consider estimators of Riemann integrals. The counterpart ofCorollary 2, for the scheme based on the transformed state variables, is

18The martingale part of the error converges to the product of a random variable and a stochastic

integral with independent integrator.

ARTICLE IN PRESS



NE0

XN�1n¼0

gðXN

nhÞh�

Z T

0

gðX sÞds

" #!

1

2K1T ðX 0Þ

P–a.s., with

K1T ðX 0Þ �

Z T

0

KsðX 0Þdsþ K2T ðX 0Þ, (38)

where K is defined in Theorem 5, and

K2T ðX 0Þ � �E0

Z T

0

qgðX sÞAðX sÞ þXd

j¼1

B0

j q2gðX sÞBj

!ds

" #. (39)

The two terms in this decomposition have the same interpretation as those ofCorollary 2. Note, in particular, that the expression for K2 follows immediately fromK2 and the deterministic nature of the volatility coefficient B (i.e. qBj ¼ 0).

3.3. Second-order discretization biases

Theorem 3 shows that the two procedures have asymptotic second-orderdiscretization biases that are, respectively, given by �

2K and �

2K . It follows that

any confidence interval, based solely on the Gaussian process L, will suffer from asize distortion. More precisely, when M !1 we have

P E0½gðX tÞ� 2 CNM ;Mg ðaÞ

� �! CðdÞ ð40Þ

with

CNM ;Mg ðaÞ �


gNM ;M þ F�1ða=2ÞsNM ;Mffiffiffiffiffiffi

Mp ;


gNM ;M þ F�1ða=2ÞsNM ;Mffiffiffiffiffiffi

Mp

� �

and

CðdÞ � FðF�1ð1� a=2Þ � dÞ � FðF�1ða=2Þ � dÞ, (41)

where F denotes the cumulative Gaussian distribution, d ¼ �KT ðX 0Þ=2ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiVAR½LT jF0�

pand where ðsN;M Þ

2¼ VARN ;M

½LT jF0� is a convergent estimatorof the variance. A confidence interval of nominal size a, based on L, will cover thetrue value E0½gðX T Þ� only with probability CðdÞ and not 1� a. For the method withtransformation the coverage probability of the true value is CðdÞ with d ¼ �KT ðX 0Þ=2ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiVAR½LT jF0�

p.

The degree of size distortion can be measured by sðzÞ � 1� a�CðzÞ, wherez 2 fd; dg. As CðzÞ is strictly monotone with Cð0Þ ¼ 1� a we conclude that theseconfidence intervals have the requested nominal size if and only if there is no second-order bias, i.e. d ¼ 0. Clearly, an increase in the second-order bias reduces the realcoverage probability. Likewise, a decrease in the asymptotic variance or an increasein � will increase the size distortion.

ARTICLE IN PRESS


A benefit of formulas (32), (37) for the second-order biases KT ðX 0Þ and KT ðX 0Þ, isthat they can be computed by simulation. Theorems 4 and 5 can then be used todevelop approximation schemes that correct for second-order bias. Likewise,asymptotically valid confidence intervals can easily be implemented. Furthermore,bias correction is feasible even when the number of state variables is large. Incontrast bias correction based on the solutions of PDEs quickly becomes infeasiblewhen the number of state variables increases.19

3.4. Bias-corrected estimators

Bias-corrected estimators are asymptotically equivalent to Monte Carlo estimatorsobtained by sampling from the true distribution of X T . As a result they do not sufferfrom the size distortion problem described above. Bias-corrected estimators ofconditional expectations can be constructed as

gN ;Mc ¼

1

M

XMi¼1

½gðX i;NNh Þ þ

12qgðX i;N

Nh ÞCi;N1;Nh þ

12

Ci;N2;Nh�, (42)

gN ;Mc ¼

1

M

XMi¼1

gðXi;N

Nh Þ þ1

2qgðX

i;N

Nh ÞCi;N

Nh

� �, (43)

where Ci;N1;Nh;C

i;N2;Nh; C

i;N

Nh , for n ¼ 0; . . . ;N � 1, are defined in Appendix B and g isimplicitly defined at the beginning of Section 3.2.2. Our next result shows thatgN ;Mc and gN;M

c are bias-corrected estimators.

Theorem 6. Suppose that the conditions of Theorems 4 and 5 hold. Then,ffiffiffiffiffiffiMpðgNM ;M

c � E0½gðX T Þ�Þ ) LT ðX 0Þ (44)

and ffiffiffiffiffiffiMpðgNM ;M

c � gNM ;Mc Þ ) 0 (45)

as M !1, with limM!1NM ¼ þ1.

Theorem 6 shows that gN;Mc and gN ;M

c correct for the second-order bias andare asymptotically equivalent.20 This means that a bias correction eliminates allthe potential benefits of the transformation. However, one should bear in mind thatthe equivalence is asymptotic and that bias-corrected estimators may performdifferently in finite samples.

19Duffie and Glynn (1995) propose the Richardson–Romberg type of estimator ð2=4MÞP4M

i¼1 gðX i;2NT Þ �

ð1=MÞPM

i¼1 gðX i;NT Þ to eliminate the second-order bias asymptotically. To calculate this estimator one must

quadruple the number of replications and double the number of discretization points. In our method the

second-order bias can be simulated along with the state variables, which is computationally cheaper than a

Richardson–Romberg approximation scheme.20Two estimators are called equivalent if they share the same asymptotic error distribution.

ARTICLE IN PRESS


Corresponding bias-corrected estimators for conditional expectations of Riemannintegrals

f ðX 0Þ ¼ E0

Z T

0

gðX sÞds

� �(46)

are

f N ;Mc ¼

1

M

XMi¼1

XN�1n¼0

gðX i;Nnh Þ þ

1

2qgðX i;N

nh ÞCi;N1;nh þ

1

2Ci;N

2;nh

� �hþ

1

2Ci;N

3;Nh

!, (47)

fN ;M

c ¼1

M

XMi¼1

XN�1n¼0

gðXi;N

nh Þ þ1

2qgðX

i;N

nh ÞCi;N

nh

� �hþ

1

2C

i;N

1;Nh

!. (48)

These estimators are sums over (42) and (43). Each involves an additional term(C3 and C1), that corrects for the bias due to the approximation of a continuousintegral by a discrete sum. Appendix B provides exact expressions for theseadditional bias correction terms. The next result parallels Theorem 6.

Corollary 4. Suppose that the conditions of Theorem 4 and 5 hold. Then,ffiffiffiffiffiffiMpðf NM ;M

c �

f ðX 0ÞÞ ) LfT ðX 0Þ, where L

fT ðX 0Þ is a centered Gaussian martingale with (deter-

ministic) quadratic variation and conditional variance given by

½Lf ;Lf �T ¼

Z T

0

E0½Nfv ðN

fvÞ0�dv � VAR

Z T

0

gðX sÞds

��F0

� �, (49)

Nfv ¼ Ev

Z T

v

qgðX sÞDvX s ds

� �. (50)

Furthermore,ffiffiffiffiffiffiMpðf NM ;M

c � fNM ;M

c Þ ) 0 (51)

as M !1 with limM!1NM ¼ 1.

4. A comparison with Milshtein’s second-order approximation

While Euler schemes for SDEs are appealing from a computational point of view,they might be judged insufficiently accurate. Second-order schemes such asMilshtein’s scheme (see Milshtein, 1984, 1995; Talay, 1984, 1986, 1996) have infact been proposed to provide improved approximations. In this section, we extendour analysis to the Milshtein second-order approximation.

The Milshtein approximation of X T in (2) is

~XN

T ¼ X 0 þXN�1n¼0

Að ~XN

nhÞhþXd

j¼1

Bjð ~XN

nhÞDWjnh þ

Xd

j;l¼1

½qBl Bj�ð ~XN

nhÞDF l;j

!,

(52)

ARTICLE IN PRESS


where

DF l;j �

Z ðnþ1Þhnh

Z s

nh

dW lv dW j

s, (53)

with h ¼ T=N and DWjnh ¼W

jðnþ1Þh �W

jnh. This scheme is obtained using a

stochastic Taylor expansion for the diffusion coefficient.21

The Milshtein scheme is often difficult to implement for multivariate diffusions.The increments DF l;j, defined in (53) cannot, in general, be written in a form that iseasy to simulate. In fact, for multivariate diffusions without commutative noise(qBl BkaqBkBl for some kal), the increment DFl;j can only be simulated by using afurther discretization of the intervals ½nh; ðnþ 1Þh� (see Gaines and Lyons, 1997).This entails a substantial increase in computational cost. The comparison to a Eulerscheme with a number of discretization points equal to the total number of pointsrequired to implement the Milshtein scheme is unclear.22

For commutative noise (i.e. qBl Bk ¼ qBk Bl), the last term in (52) simplifies to23

Xd

j;l¼1

½qBl Bj�ð ~XN

nhÞDFl;j ¼1

2

Xd

j¼1

½qBj Bj�ð ~XN

nhÞððDWjnhÞ

2� hÞ

!

þ1

2

Xd

j;l¼1jal

½qBl Bj �ð ~XN

nhÞDWjnh DW l

nh

0BB@

1CCA. ð54Þ

The representation of ~XN

T as a functional of the Brownian increments DW knh for

k ¼ l; j follows. Note, in particular, that every univariate diffusion has commutativenoise and can be implemented on the basis of (54).

Our next result describes the asymptotic distribution of the approximation errorassociated with (52). It will enable us to find an explicit expression for the MonteCarlo estimator of a conditional expectation based on the Milshtein scheme.

Theorem 7. The approximation error ~XN

T � X T converges weakly at the rate 1=N (i.e.

Nð ~XN

T � X T Þ ) ~UX

T ). The asymptotic error is

~UX

T ¼ �1

2OT

Z T

0

O�1s qAðX sÞdX s �Xd

j¼1

½ðqBjÞðqAÞBj�ðX sÞds

!

21For details on the stochastic Taylor expansion and the derivation of this result see Kloeden and Platen

(1997) or Milshtein (1995).22The commutative noise condition necessary for this representation is the same condition that is needed

for the Doss transformation. When commutativity fails the transformation cannot be used (see Remarks

3.1–3.4 in Detemple et al., 2005 for further details) and the implementation of the Milshtein scheme

requires the simulation of the iterated Wiener integrals DF l;j . In those cases implementation based on a

standard Euler scheme without transformation is considerably less demanding.23See Milshtein (1995) or Gaines and Lyons (1997).

ARTICLE IN PRESS


�1

2OT

Z T

0

O�1s

Xd

j¼1

½ðq½ðqAÞBj�ÞBj � ðqBjÞðqBjÞA�ðX sÞds

�1

2OT

Z T

0

O�1s

Xd

j¼1

½ðqBjÞA�ðX sÞdW js

�1ffiffiffiffiffi12p OT

Z T

0

O�1s

Xd

j¼1

½ðqAÞBj � ðqBjÞA�ðX sÞdZjs

�1ffiffiffi6p OT

Z T

0

O�1s

Xd

i;l;j¼1

½qBi qBl Bj�ðX sÞd ~Zl;j;is ,

where the process ððZjÞj2f1;...;dg; ð ~Zl;j;iÞi;l;j¼1;...;dÞ is a d þ d3

� 1 standard Brownian

motion independent of W. The process OT is given in Theorem 1.

Theorem 7 shows that the speed of convergence increases when one uses thestochastic Taylor expansion of the diffusion term. This expansion eliminates theerror component of order 1=

ffiffiffiffiffiNp

in the martingale part of the Euler approximation.It follows that the error for the Milshtein scheme converges at the same speed as theerror for the Euler scheme with transformation: both schemes improve on thestandard Euler approach. Note also that, in the special case of a constant volatilitycoefficient, estimators with and without transformation are identical, and asymptoticerror distributions for Milshtein and Euler with transformation are the same (i.e.~U

X

T ¼ UXT ). In this instance, all the schemes have the same asymptotic distribution.

The asymptotic error distribution of estimators of conditional expectations isdescribed next.

Theorem 8. Suppose that the assumptions of Theorem 9 below hold. Let g 2 C1ðRd Þ

and suppose that gðX T Þ 2 D1;2. For the Milshtein scheme, we have, as M !1,

ffiffiffiffiffiffiMp 1

M

XMi¼1

gð ~Xi;NM

T Þ � E0½gðX T Þ�

!) �

1

2~KT ðX 0Þ þ LT ðX 0Þ, (55)

where limM!1NM ¼ þ1 and � ¼ limM!1


=NM . The random variable LT ðX 0Þ

is the terminal value of a centered Gaussian martingale with (deterministic) quadratic

variation and conditional variance defined in Theorem 3. The deterministic function ~KT

is defined in Theorem 9.

For estimates of conditional expectations the rate of convergence of the Milshteinscheme is identical to the rates of the Euler schemes with and withouttransformation. The three schemes differ only in their second-order biases. Thesecond-order bias ~K can be found explicitly, as shown in Theorem 9 below. Its sizerelative to the second-order biases of Euler schemes depends on the coefficients ofthe underlying processes. A global ordering of the three schemes in terms of second-order asymptotic properties is not readily apparent.

ARTICLE IN PRESS


As for the estimator with transformation, the expected approximation error

~eNt � E0½gð ~X

N

T Þ � gðX T Þ�, (56)

can be directly deduced from Theorem 7 under an additional uniform integrabilitycondition. In order to do this, define the random variable

~VT ¼ � OT

Z T

0

O�1s qAðX sÞdX s �Xd

j¼1

½ðqBjÞðqAÞBj�ðX sÞds

!

� OT

Z T

0

O�1s

Xd

j¼1

½ðq½ðqAÞBj�ÞBj � ðqBjÞðqBjÞA�ðX sÞds

� OT

Z T

0

O�1s

Xd

j¼1

½ðqBjÞA�ðX sÞdW js. ð57Þ

The equivalent of Theorems 4 and 5, is

Theorem 9. For g 2 C1ðRdÞ such that

limr!1

lim supN

E0½1fjNðgð ~X NT Þ�gðX T ÞÞj4rg

Njgð ~XN

T Þ � gðX T Þj� ¼ 0 (58)

we have, P-a.s., as N !1,

N ~eNT !

12~KT ðX 0Þ �

12E0½qgðX T Þ ~VT �, (59)

where ~VT is defined in (57) and ~eNT is defined in (56).

Our next result extends Theorem 9 to Riemann integrals. The result parallelsCorollary 2.


NE0

XN�1n¼0

gð ~XN

nhÞh�

Z T

0

gðX sÞds

" #!

1

2~K1T ðX 0Þ

P–a.s., with

~K1T ðX 0Þ �

Z T

0

~KsðX 0Þdsþ ~K2T ðX 0Þ, (60)

where ~K is given in Theorem 9, and where ~K2T ¼ K2T is given in Corollary 2.

The asymptotic expected errors, and hence the second-order bias, for estimators ofRiemann integrals based on the Milshtein approximation, have the same structuresas the corresponding quantities for Euler schemes in Corollaries 2 and 3. The firstcomponent captures the cumulative expected error of the Milshtein approximationof the state variables in the Riemann integral. The second component is the expectederror due to the approximation of the continuous integral by a discrete sum.

ARTICLE IN PRESS


Bias-corrected estimators based on the Milshtein scheme are asymptoticallyequivalent to Monte Carlo estimators obtained by sampling from the truedistribution of X T . As a result they do not suffer from the size distortion problemdescribed earlier. Bias-corrected estimators of conditional expectations can beconstructed as

~gcN ;M ¼

1

M

XMi¼1

gð ~Xi;NNh Þ þ

1

2qgð ~X

i;NNh Þ

~Ci;N

Nh

� �, (61)

where ~Ci;N

Nh is a convergent approximation of � ~VT , defined in Appendix B. Our nextresult shows that ~gN;M

c is a bias-corrected estimator of the conditional expectationbased on the Milshtein scheme.

Theorem 10. Suppose that the conditions of Theorems 4, 5 and 9 hold. Then,ffiffiffiffiffiffiMpð ~gNM ;M

c � E0½gðX T Þ�Þ ) LT ðX 0Þ, (62)

ffiffiffiffiffiffiMpðgNM ;M

c � gcNM ;M

Þ ) 0 (63)

and ffiffiffiffiffiffiMpðgNM ;M

c � ~gcNM ;M Þ ) 0 (64)

as M !1, with limM!1NM ¼ 1.

It is important to note that bias-corrected Milshtein does not improve over bias-corrected Euler, even without transformation. Indeed, the theorem shows that allestimators (Milshtein and Euler) are asymptotically equivalent after second-orderbias corrections. For non-commutative noise, our explicit expressions for the second-order biases can be used to develop estimators based on the Euler scheme that arecomputationally superior to the Milshtein scheme because they do not requirefurther subdivisions of discretization intervals to approximate the increments DF l;j .

Second-order bias-corrected estimators for conditional expectations of Riemannintegrals based on the Milshtein scheme (as in (47)–(48)) are

~fN ;M

c ¼1

M

XMi¼1

XN�1n¼0

gð ~Xi;Nnh Þ þ

1

2qgð ~X

i;Nnh Þ

~Ci;N

nh

� �hþ

1

2~C

i;N

1;Nh

!. (65)

As for Euler-based estimators, they involve a new term ~C1 that corrects the second-order bias due to the approximation of a continuous integral by a discrete sum. Anexplicit expression can be found in Appendix B. In parallel with Corollary 4, wehave:

Corollary 6. Suppose that the conditions of Theorems 4 and 9 hold. ThenffiffiffiffiffiffiMpðf NM ;M

c � ~fNM ;M

c Þ ) 0 (66)

as M !1, with limM!1NM ¼ 1.

ARTICLE IN PRESS


Corollary 6 combined with Corollary 4 enable us to conclude that bias-corrected

estimators based on the Euler scheme f N ;Mc , the Euler scheme applied to the

transformed variables fN;M

c , and the Milshtein scheme ~fN ;M

c , are asymptotically

equivalent.

5. Simulation-based inference for diffusions

Standard dynamic models in finance involve diffusion processes with unknowncoefficients. Estimation and statistical inference are therefore essential forimplementation purposes. As mentioned in the introduction, the absence of ex-plicit formulas for transition densities of general diffusions implies that maxi-mum likelihood inference can only proceed with the use of an auxiliary numericalprocedure designed to approximate the transition density.24 Our characterizationsof asymptotic errors apply directly to these procedures and can be used tofind efficient experimental designs for simulation-based inference methods fordiffusions.

Section 5.1 sets up a general framework for the estimation of the parameters of adiffusion process using simulation-based methods of moments. A simulation-basedversion of extended quasi maximum likelihood estimation is presented in Section 5.2.Illustrations of the results are provided in Section 5.3.

5.1. A simulated method-of-moment approach for diffusions

We briefly sketch a general setup for parameter estimation of diffusion processes.Suppose that we observe an equally spaced discrete-time sample fY 0; . . . ;Y Lg, whereY l � Y tl

and D � tlþ1 � tl , of the following diffusion process, supposed to beunivariate for notational convenience,

dY t ¼ AðY t; yÞdtþ BðY t; yÞdW t and Y t0 given. (67)

Assume that coefficients depend on an unknown parameter vector y 2 C � Rp. Thetransition law of the Markov chain fY 0; . . . ;Y Lg is assumed to be absolutelycontinuous with respect to the Lebesgue measure l on Rd ,

Py0ðY tlþ12 dzjFtl

Þ � py0D ðY l ; zÞlðdzÞ, (68)

24Numerical approximations can be avoided if we let the observation interval decrease to zero. For this

type of fill-in asymptotics, efficient estimators have been obtained by Dacunha-Castelle and Florens-

Zmirou (1986), Florens-Zmirou (1989, 1993), Genon-Catalot (1990), Genon-Catalot and Jacod (1993) and

Bibby and Sørensen, 1995. This type of asymptotics seems less appropriate for non-experimental setups

like those encountered in financial econometrics.

ARTICLE IN PRESS


where py0D ðY l ; zÞ is the transition density. Also suppose that the Markov chain

fY 0; . . . ;Y Lg is geometrically ergodic.25 This last assumption guarantees theexistence of a stationary density, denoted py0ðyÞ.26

We seek to estimate the parameter y using the conditional constraints:ZRd

hj;DðY l ; z; yÞpyDðY l ; zÞlðdzÞ ¼ 0 for all j ¼ 1; . . . ; q. (70)

The structure of (70) implies that hDðY l ; z; y0Þ � ðhj;DðY l ; z; y0ÞÞj¼1;...;q is a vector of

martingale increments. This suggests estimating y0 by solutions yL

D of martingale esti-mating equations

1

L

XL�1l¼0

gDðY l ;Y lþ1; yÞ ¼ 0, (71)

where gDðY l ;Y lþ1; yÞ � JDðY lÞ0hDðY l ;Y lþ1; yÞ and JDðY lÞ is a q� p matrix of

adapted weights.In contrast to the estimator above, GMM estimators can be written as M

estimators,

~yL

D ¼ argminy

S1=2L

1

L

XL�1l¼0

gDðY l ;Y lþ1; yÞ

!2

, (72)

where the weight JDðY lÞ is now interpreted as the optimal instrument. The matrix SL

is a weighting matrix that controls the efficiency of the GMM estimator.It is known from Hansen (1985); Hansen et al. (1988); Chamberlain (1987, 1992);

Newey (1990) and Wefelmeyer (1996) that the efficient estimator ~yL

D is obtained forinstruments given by

JDðyÞ ¼ CDðyÞ�1GDðyÞ,

CDðyÞ �

ZR

hDðy; z; y0ÞðhDðy; z; y0ÞÞ0p

y0D ðy; zÞlðdzÞ,

25This assumption can be avoided by introducing a stochastic normalization for limit theorems, i.e. a

random sequence cL such that cLðyL

D � y0Þ ) Z where Z is a standard normal variate and cL !1.

Random normalizations arise naturally in the definitions of locally asymptotically mixed normal (LAMN)

and locally asymptotically Brownian functional (LABF) estimators. See Basawa and Scott (1982) for more

on this. As shown by Heyde (1992) confidence intervals based on a stochastic normalization of the

estimator are, in general, shorter than those based on a deterministic normalization.26A Markov chain is geometrically ergodic if there exists a constant r41 such that

X1k¼1

rk

ZB

py0D ðY l ; zÞlðdzÞ �

ZB

py0 ðzÞlðdzÞ

V

� �k

o1 for all B 2 R, (69)

where k � kV is the variational norm (see Meyn and Tweedie, 1993). See Duffie and Singleton (1993) for an

application of this in a discrete time model. General convergence results for Markov chains appear in

Meyn and Tweedie (1993).

ARTICLE IN PRESS


GDðyÞ �

ZR

ðqyhDðy; z; y0ÞÞ0p

y0D ðy; zÞlðdzÞ.

Our next result shows that the optimal weight SL does not play any role if optimal

instruments are used. This follows from the first-order asymptotic equivalence of ~yL

D

and yL

D.

Theorem 11. In an ergodic model, the efficient estimator yL

D has the asymptotic error

distributionffiffiffiffiLpðy

L

D � y0Þ ) S�1=2D Z (73)

as L!1, where

SD �

ZR

GDðyÞ0CDðyÞ

�1GDðyÞpy0ðyÞlðdyÞ

� �, (74)

and Z�Nð0; IpÞ is a standard normal random variable vector. Furthermore,ffiffiffiffiLpðy

L

D �~y

L

DÞ ) 0, (75)

as L!1.

A similar result for d-order Markov chains can be found in Muller andWefelmeyer (2002) (see also references therein). They show that the martingaleestimating function framework covers GMM, generalized quasi-likelihood, extendedquasi-likelihood, and quasi-likelihood estimations. Our result simply expresses thefact that the GMM estimator with optimal instruments is just identified andconsequently can be obtained as the solution of (71).

If the transition density pyDðy; zÞ is known, the estimation of the unknown

parameters based on (70) is equivalent to the estimation of unknown parameters in aMarkovian ergodic time series model.27 It is of particular interest to note that thediscrete nature of observations has no impact on the inference procedure.

Unfortunately, the transition density pyDðy; zÞ is usually unknown or the optimal

weights JðyÞ cannot be calculated in closed form. The corresponding optimalestimating function cannot therefore be used as it stands for estimating y, i.e. theefficient estimator is infeasible. Similarly, for moment-based constraints, indirectinference or EMM estimators, the functions hj;DðY l ;Y lþ1; yÞ that identify theparameters through (70), are not known in explicit form and are obtained by MonteCarlo simulation.28

27The particular choice hj;Dðy; z; yÞ ¼ qyjpyDðy; zÞ leads to a maximum likelihood estimator which attains

the Cramer–Rao lower bound ðRRd CDðyÞp

y0 ðyÞlðdyÞÞ�1.28See Gallant and Tauchen (2002) for a survey. The moment-based estimating functions hj are often of

the form hj;DðY l ;Y lþ1; yÞ ¼ f jðY l ;Y lþ1Þ �RRd f jðY l ; zÞpyDðY l ; zÞlðdzÞ. The functions hj cannot be obtained

in closed form, if the transition density is unknown or the Riemann integral cannot be solved explicitly, as

it is often the case in applications. Similarly, for indirect inference or EMM the functions hj are associated

with the score function of an auxiliary model, where the parameters of the auxiliary model are replaced by

their estimates based on the sample of data and where the structural processes are simulated for given

values of the structural parameters. Estimates of the latter are the values that set the moment conditions as

close to zero as possible based on a suitable GMM metric.

ARTICLE IN PRESS


In a diffusion setting, simulations and numerical solution techniques for SDEs canbe used to approximate the optimal weights J and/or the functions hj in theconstraints that identify the parameters.29 For simulation-based estimatorscomputations can be performed by first discretizing time and using the Euler orthe Milshtein schemes to approximate the solution of the stochastic differentialequation, and then replicating this approximation M times to generate M

independent realizations of the terminal point fY i;Nlþ1ðY lÞ : i ¼ 1; . . . ;Mg. The integral

in the expression for hj and/or J can then be replaced by an empirical mean overindependent replications given the initial value.

If the transition density is unknown, the estimator yL;M;N

is obtained from asimulation-based martingale estimating function, i.e. as the unique root of

1

L

XL�1l¼0

gM ;ND ðY l ;Y lþ1; y

L;M ;NÞ �

1

L

XL�1l¼0

JM ;ND ðY lÞh

M;ND ðY l ;Y lþ1; y

L;M ;NÞ ¼ 0.

(76)

The results of this paper can be used to study the asymptotic properties of thenormalized error

ffiffiffiffiffiffiMpðgM ;N

D � gDÞ resulting from this procedure. In particular, wenow show that an approximation of the true estimating function based on M

independent Monte Carlo replications with N discretization points introduces asecond-order bias. As a result simulation-based estimators have higher-orderasymptotic properties that differ from those of their infeasible optimal counterparts.The second-order bias has important implications for the construction of asymptoticconfidence intervals and for parametric hypothesis testing.

To describe the second-order bias of the simulation-based estimator, introduce theexpression,

kj;Dðy0Þ �ZR�R

Kj;Dðy; y; z; y0ÞpDðy; z; y0ÞlðdzÞpy0ðyÞlðdyÞ for j ¼ 1; . . . ; p,

(77)

where Kj;Dðy; y; z; y0Þ is defined in Theorem 4 (the notation emphasizes that thefunction being averaged over depends, for simulation-based estimating functions, onthe observations Y l and Y lþ1. As in Theorem 4, the first argument corresponds tothe starting point of the simulations Y l , i.e., the state for which the conditionalexpectation is calculated).30

Our next result shows that the simulation procedure introduces an additionalsecond-order bias.

Theorem 12. Assume that the conditions of Theorem 4, for the Euler scheme without

transformation, are satisfied. Then, for geometrically ergodic diffusions observed over

29Alternative kernel-based estimation methods for conditional expectations converge at a slower rate

(L�2=ðdþ4Þ) that depends on the dimension of the diffusion. Therefore, for multivariate diffusions, a large

sample is necessary to get reasonably accurate estimates of the weights and/or the functions hj . Simulation-

based estimators are particularly interesting in those cases.30For MCE estimators these observations do not affect the convergence properties.

ARTICLE IN PRESS


time intervals of equal fixed length D, we have, as L!1,ffiffiffiffiLpðy

L;ML;NL

D � yL

DÞ ) �e1e2SDðy0Þ�1kDðy0Þ, (78)

where e1 ¼ limL!1

ffiffiffiffiLp

=ffiffiffiffiffiffiffiffiML

pand e2 ¼ limL!1

ffiffiffiffiffiffiffiffiML

p=NL, with limL!1ML ¼

limL!1NL ¼ 1.Corresponding estimators for the Euler scheme with Doss transformation (under the

assumptions of Theorem 5) and the Milshtein scheme (under the assumptions of

Theorem 9), are obtained if Kj;Dð�; y; z; y0Þ in (77) is replaced by Kj;Dð�; y; z; y0Þ of

Theorem 5 and ~Kj;Dð�; y; z; y0Þ of Theorem 9, both adjusted for the dependence of hj and

J on the observations ðY l ;Y lþ1Þ ¼ ðy; zÞ.

Theorem 12 shows that the efficient simulated estimator has a second-order biasthat depends on the discretization scheme chosen. The numerical scheme used tosolve the SDE becomes irrelevant only if e1e2kDðy0Þ ¼ 0. The conditions stating thate1 and e2 are finite, which are conditions linking the sample size L, the number ofreplications ML and the number of discretization points NL, are therefore necessary.If these conditions fail, the growth of L has to be restricted, and the resultingestimators become inefficient. Ignoring the second-order bias leads to size-distortedstatistical tests.

The size of the second-order bias depends on the choice of the discretizationscheme only through the functions K, K and ~K that characterize the expectedapproximation errors of the different methods. This follows because the speed ofconvergence of the expected error is the same for all discretization schemes. We alsosee that the second-order bias is inversely related to the asymptotic variance of theestimator. Variance reduction, without second-order bias correction, will thereforeincrease the size distortion of asymptotic confidence intervals and hypothesis tests.

It is also important to note that the results of Theorem 12 do not directly coversimulated maximum likelihood estimators obtained from the numerical integrationof the Chapman–Kolmogorov equation (see Pedersen, 1995a,b; Brandt and Santa-Clara, 2002). For an estimator of this type, the Monte Carlo integration of the truetransition density kernel is based on a Gaussian kernel approximation withbandwidth 1=N determined by the number of discretization points N. The resultingscore function depends on the time discretization parameter N not just through thesimulated discretized trajectories but also through its functional form. As describedby Milshtein et al. (2004), this estimator of the transition density can be interpretedas a kernel estimator based on simulated i.i.d. data. It is well known that such anestimator will converge at a rate slower than M�1=2. In fact the rate for the score isM�2=ðdþ4Þ, a quantity that depends on the dimension of the diffusion. Optimality fora simulated maximum likelihood estimator requires that

ffiffiffiffiLp

=M2=ðdþ4ÞL ! e1 and

M2=ðdþ4ÞL =NL ! e2. It follows that efficient simulated maximum likelihood estima-

tors based on the numerical integration of the Chapman–Kolmogorov equation areless efficient than maximum likelihood estimators, even in the univariate case.

Theorem 12 shows that estimators with e1e2 ¼ 0 are inefficient. If L;ML;NL arechosen such that e1e2 ¼ 0, L has to grow at a slower rate than if 0oe1e2o1. Itfollows that the asymptotic variance and therefore the lengths of confidence intervals

ARTICLE IN PRESS


are larger than the variance and length of the asymptotic confidence interval of theefficient estimator, as L is too small compared to the efficient estimator. The efficientestimator requires quadrupling the sample length, quadrupling the number ofsimulations and doubling the number of discretization points, in order to cut theasymptotic variance and therefore the length of the confidence interval in half.

Corresponding optimal designs for the simulated maximum likelihood estimatorsof Pedersen (1995a,b) and Brandt and Santa-Clara (2002), depend in addition on thedimension of the diffusion. These authors assume e2 ¼ 0. This assumption,unfortunately, is not sufficient. To preclude an exploding second-order bias it mustalso be the case that

ffiffiffiffiLp

=ffiffiffiffiffiffiffiffiML

pdoes not diverge faster than


p=NL vanishes. The

rate of divergence of ML can therefore not be chosen independent of L.For estimators with a fixed number of discretization points and a fixed number of

Monte Carlo simulations the second-order bias always explodes when the samplesize becomes large. Asymptotic confidence intervals will then cover the true valueswith null probability. Hypothesis tests become invalid as their effective size canbecome considerably smaller than the nominal significance level of the test.

The results in Theorems 4, 5 and 9 describing the second-order biases for the Eulerschemes with and without Doss transformation and for the Milshtein scheme, enableus to construct second-order bias-corrected estimators that are both efficient andhave asymptotic confidence intervals that do not suffer from size distortion. It isimportant to note that asymptotic efficiency (i.e. estimators with the shortestasymptotic confidence intervals) can only be achieved by estimators that correct forsecond-order bias. Non-corrected estimators require a refinement of the timediscretization in the simulation of SDEs as the sample size increases. As aconsequence, the computational cost of efficient estimators without second-orderbias corrections explodes. Second-order bias-corrected estimators, alone, have thesame asymptotic distributions as estimators obtained from the infeasible optimalestimating function.31

5.2. Simulated extended quasi-maximum likelihood estimator

We now present a simulation-based version of the extended quasi-maximumlikelihood estimator (EQMLE) introduced for Markovian semi-martingales byWefelmeyer (1996). This estimator is a special case of a moment-based estimator. Itis based on parameter restrictions

aðY l ; yÞ � E

Z tlþ1

tl

AðY s; yÞds

��Y l

" #, (79)

31This does not imply that simulation-based inference methods without second-order bias correction are

inferior to estimators based on analytic orthogonal series expansions of the likelihood, such as those

proposed by Ait-Sahalia (2002). Explosions of the second-order biases associated with the efficient

estimators of this type can only be avoided by increasing the dimensionality of the orthogonal basis as the

sample increases. Second-order biases for such estimators are unknown.

ARTICLE IN PRESS


bðY l ; yÞ � E

Z tlþ1

tl

BðY s; yÞBðY s; yÞ0 ds

��Y l

" #(80)

implied by the drift and volatility functions of the diffusion process. The momentconditions associated with

h1;DðY l ;Y lþ1; yÞ � Y lþ1 � Y l � aðY l ; yÞ, (81)

h2;DðY l ;Y lþ1; yÞ � ðY lþ1 � Y l � aðY l ; yÞÞ2� bðY l ; yÞ, (82)

together with the optimal instruments JDðY lÞ are then used to obtain an estimatingfunction, as in the previous section, based on

giðY l ;Y lþ1; yÞ ¼ JDðY lÞhiðY l ;Y lþ1; yÞ for i ¼ 1; 2. (83)

Such an estimator is labelled extended quasi-maximum likelihood estimator

(EQMLE). This estimator is the most efficient in the class of estimating functionsbased on hi and is asymptotically equivalent to the GMM estimator based on theoptimal instruments (see Theorem 11).

The EQMLE is infeasible if both the drift and volatility functions ða; bÞ are notavailable in closed form. This is the case for general diffusions for which thetransition density is unknown or the expectation cannot be obtained in closed form.In this case calculation of the drift and volatility functions can proceed bysimulation. The resulting estimator, which uses the same optimal weights as theEQMLE, is called the simulated extended quasi-maximum likelihood estimator

(SEQMLE). The asymptotic error distribution of this estimator can be studiedwithin our theoretical framework. We also design estimators that correct for second-order bias. They are called simulated, bias-corrected, extended quasi-maximum

likelihood estimator (SBCEQMLE). The next section presents results for bothestimators for CIR and L-CEV processes.

5.3. An application of the simulated quasi-maximum likelihood estimator to CIR and

L-CEV processes

We perform a Monte Carlo study to analyze the properties of the simulation-based estimators associated with various discretization and transformation schemesintroduced in the previous sections. The candidate methods are the Eulerdiscretization scheme, with and without the Doss transformation, and the Milshteindiscretization scheme. The Monte Carlo setup is inspired by the study of Chan et al.(1992), also used by Ait-Sahalia (1999). We simulate 500 series of 306 monthlyobservations and compute our estimators, with and without bias correction(SEQMLE and SBCEQMLE) for the three methods just mentioned. For comparisonpurposes, we also report the results obtained with maximum likelihood estimatorsbased on the Euler approximation (ML-Euler) and on the approximations of thetransition densities proposed by Ait-Sahalia (1999) (ML-YAS) with Hermitepolynomials of order one (J ¼ 1) and two (J ¼ 2). The series in our setup areshort and the models considered are non-linear. The estimation task is therefore

ARTICLE IN PRESS

Table 1

RMSE and Average Bias of SEQMLE and SBCEQMLE estimators for CIR process

CIR: dY t ¼ kðY � Y tÞdtþ sffiffiffiffiffiffiY t

pdW t

True values: k ¼ 0:5, Y ¼ 0:06, s ¼ 0:072Method Y k s

RMSE (500 trials)

MLE-Euler 0.0073 0.2778 0.0032

MLE-YAS (J ¼ 1) 0.0073 0.2945 0.0029

MLE-YAS (J ¼ 2) 0.0073 0.3011 0.0029

SEQMLE-Euler 0.0108 0.2740 0.0047

SEQMLE-Doss 0.0108 0.2740 0.0047

SEQMLE-Milshtein 0.0108 0.2740 0.0047

SBCEQMLE-Euler 0.0109 0.2867 0.0048

SBCEQMLE-Doss 0.0109 0.2866 0.0048

SBCEQMLE-Milshtein 0.0109 0.2867 0.0048

Average bias (500 trials)

ML-Euler �0.0002 0.1342 �0.0016

MLE-YAS (J ¼ 1) �0.0002 0.1502 0.0002

MLE-YAS (J ¼ 2) �0.0002 0.1544 0.0002

SEQMLE-Euler 0.0043 0.0962 �0.0039

SEQMLE-Doss 0.0043 0.0962 �0.0039

SEQMLE-Milshtein 0.0043 0.0962 �0.0039

SBCEQMLE-Euler 0.0044 0.1085 �0.0040

SBCEQMLE-Doss 0.0044 0.1084 �0.0040

SBCEQMLE-Milshtein 0.0044 0.1085 �0.0040


difficult and standard errors of estimators will generally be high, especially for theL-CEV process.

In the simulation-based methods, the discretization parameter N ¼ 24 and thenumber of replications M ¼ 500 per observation. For the optimization procedure,we used quasi-Newton methods for computing the SEQMLE and SBCEQMLE, buta simplex method for the maximum likelihood estimators. The choice of the simplexmethod was motivated by the difficulties we experienced by using gradient-basedoptimization methods for the estimators based on the Hermite polynomialsapproximations (ML-YAS). The calculation of numerical derivatives using finitedifferences appeared numerically unstable and optimization in a Monte Carlocontext with gradient-based methods proved simply infeasible. We made sure thatthe simplex method provided reasonable results by comparing the estimation resultswith the actual series of short-term interest rates used in Ait-Sahalia (1996). Wereproduced the results reported in Table VI of Aıt-Sahalia, and verified that thesimplex method and a gradient-based method were giving very similar results interms of likelihood value and parameter estimates.

Table 1 displays the results for the CIR process. Algorithms for the variousestimators were started with the same initial values. These are obtained based on the

ARTICLE IN PRESS

Table 2

RMSE and Average Bias of SEQMLE and SBCEQMLE estimators for L-CEV process

L-CEV: dY t ¼ kðY � Y tÞdtþ sminfY t;Y%gg dW t

True values: k ¼ 0:5, Y ¼ 0:06, s ¼ 1:2, g ¼ 1:5 Y% ¼ 1010

(Y% guarantees thatR �0 minfY s;Y

%gg dW s remains a martingale for g41)

Method Y k s g

RMSE (500 trials)

ML-Euler 0.0131 0.3124 0.5828 0.1905

MLE-YAS (J ¼ 1) 0.0133 0.3232 0.6815 0.1807

SEQMLE-Euler 0.0433 0.3130 0.6173 0.1823

SEQMLE-Doss 0.0438 0.2965 0.6292 0.1822

SEQMLE-Milshtein 0.0432 0.3133 0.6169 0.1822

SBCEQMLE-Euler 0.0418 0.3221 0.5905 0.1699

SBCEQMLE-Doss 0.0460 0.3067 0.5958 0.1765

SBCEQMLE-Milshtein 0.0417 0.3222 0.5925 0.1744

Average bias (500 trials)

ML-Euler 0.0013 0.1336 �0.0679 �0.0573

MLE-YAS (J ¼ 1) 0.0015 0.1448 0.1014 �0.0149

SEQMLE-Euler 0.0243 �0.0357 0.0264 �0.0071

SEQMLE-Doss 0.0249 �0.0495 0.0320 �0.0071

SEQMLE-Milshtein 0.0241 �0.0346 0.0263 �0.0071

SBCEQMLE-Euler 0.0243 �0.0279 0.0151 0.0046

SBCEQMLE-Doss 0.0260 �0.0411 0.0284 0.0077

SBCEQMLE-Milshtein 0.0242 �0.0274 0.0165 0.0047


first-order autocorrelation coefficient for the speed of mean reversion, theunconditional sample mean for the long-term mean and the square-root of theunconditional sample variance divided by the sample mean for the volatilityparameter. The results differ across parameters. For the long-term mean, the resultsare better for the MLE methods both in terms of RMSE and average bias. For themean reversion parameter, the results of all methods are of the same order ofmagnitude for the RMSE, but both simulation-based methods, with and withoutbias correction, reduce the bias with respect to the MLE methods. Finally, for thevolatility parameter, the results appear to be slightly better for the MLE methods,both in terms of RMSE and average bias. For all parameters, the RMSE and theaverage bias obtained with the bias correction methods (SBCEQMLE) are, ifanything slightly larger than without bias correction. As discussed below, the effectsof bias corrections are more evident for the L-CEV process.

Table 2 provides the estimation results for the L-CEV process.32 Initial values forall estimators have been obtained by regressing observed increments on a constant

32The second-order approximation is not included because of severe numerical difficulties in maximizing

this function. Note that Ait-Sahalia (1999) does not report results with the second-order approximation

for estimating the parameters of the CEV model with interest rate data in Table VI.

ARTICLE IN PRESS


and the level of the process for the parameters of the drift function, and by regressingthe logarithm of the observed increments of the quadratic variation on a con-stant and the level of the process for the diffusion function. In terms of RMSE,the order of magnitude is similar for MLE and simulation-based methods exceptfor the long-term mean and the volatility parameter g. For the long-term mean,the MLE methods perform better, while for g the RMSE is smallest for thesimulation-based methods with bias correction (SBCEQMLE). However, forthe average bias, simulation-based methods improve results by a significant marginwith respect to the MLE methods, with the exception again of the long-termmean. The bias correction also appears sizable for k, s and g with respect tothe simulation-based methods without correction. Contrary to the results for theCIR process, the second-order bias-corrected estimates dominate in generalestimates without bias correction in terms of average bias. If we compare theresults between the various simulation-based methods, there are no differencesfor the CIR process, while the results appear to be better for Euler and Mihlsteinthan for Doss. In Durham and Gallant (2002), the simulation results wereshowing that the second-order bias was smallest if the Doss transformation wasused.

6. Conclusion

Error analysis for Monte Carlo estimators of numerical solutions of SDEs is adifficult task. The present paper introduced key elements to perform this analysis.Our investigation produced explicit formulas for error distributions that can be usedto construct asymptotically valid confidence intervals and to assess the asymptoticerrors of Euler schemes, with and without Doss transformation, and of the Milshteinscheme. Numerical experiments showed that the error distributions differsignificantly depending on whether the transformation is applied or not. Biascorrections were found to be important in the case of non-linear state variableprocesses, such as the L-CEV process. This underscores the importance of the errorstatistics obtained, as processes of relevance in financial economics often have a non-linear structure.

The characterizations obtained here permit a rigorous analysis of the asymptoticerror properties of a given estimator. They are therefore crucial for the design ofefficient Monte Carlo estimators of diffusion processes. Feasible efficient simulatedestimating function estimators were shown to require second-order bias corrections.Our explicit expressions for the second-order biases can be used to derive suchestimators.

In principle, error distributions, second-order biases and bias-corrected condi-tional estimators can also be derived when the conditional expectations to beestimated depend on unknown state variables. This situation is often encountered infinancial applications such as stochastic volatility models. The convergence results ofsimulation schemes presented could serve as a starting point for the study ofsimulation-based estimators for such latent variable models.

ARTICLE IN PRESS


Acknowledgements

The paper was presented at the CIRANO conference on Mathematical Financeand Econometrics (June 2001), the CIRANO conference on Monte Carlo Methodsin Finance (March 2002), the Bachelier Society Meetings in Crete (June 2002), andthe Econometric Society North American Winter Meeting in San Diego (January2004). Funding from MITACS is gratefully acknowledged. Rene Garcia also thanksHydro-Quebec and the Bank of Canada for financial support.

Appendix A. Proofs

We start with a series of auxiliary lemmas that are used to prove the main resultsof the paper. These lemmas give the weak limits of the components in the errorexpansions of the estimators (102) and (103).

Let ½Nt� be the integer part of Nt and define the time change ZNt ¼ ½Nt�=N if NteN

and ZNt ¼ t� 1=N otherwise. Note that limN!1 ZN

t ¼ t. We have

Lemma 1. The following weak convergence results hold:

V1;Nt � N

Z t

0

ðs� ZNs Þds)

1

2t � V1

t , (84)

V2;i;Nt � N

Z t

0

ðW is �W i � ZN

s Þds)1

2W i

t þ1ffiffiffiffiffi12p Zi

t � V2;it , (85)

V3;i;Nt � N

Z t

0

ðs� ZNs ÞdW i

s )1

2W i

t �1ffiffiffiffiffi12p Zi

t � V 3;it , (86)

V4;i;j;Nt �

ffiffiffiffiffiNp

Z t

0


s ÞdW js )

1ffiffiffi2p Z

i;jt � V

4;i;jt (87)

as N !1, where ððW iÞi2f1;...;dg; ðZiÞi2f1;...;dg; ðZ

i;jÞi;j2f1;...;dgÞ is a ð2d þ d2Þ-dimensional

standard Brownian motion.

Proof. As ZNt ¼ ½Nt�=N for NteN we have

R t

0ðs� ZN

s Þds ¼ ð1=N2ÞRNt

0ðs� ½s�Þds.

Using NR t

0ðs� ZNs Þds ¼ ð1=NÞ

P½Nt�k¼1

R½k�1;k½ðs� ½s�Þdsþ ð1=NÞ

RNt

½Nt�ðs� ½s�Þds we

then obtain

N

Z t

0

ðs� ZNs Þds ¼

1

2

½Nt�

Nþ

1

2

1

NðNt� ½Nt�Þ2 !

1

2t (88)

as N !1. The limit on the right-hand side uses ½s� ¼ k � 1 for s 2 ½k � 1; k½ and thebound Nt� ½Nt�p1.

ARTICLE IN PRESS


Similarly, it can be shown, using the scaling property of Brownian motion that

N

Z t

0


s Þds ¼1ffiffiffiffiffiNp

X½Nt�

k¼1

Z½k�1;k½

ðW is �W i

½s�Þds

þ1ffiffiffiffiffiNp

Z Nt

½Nt�

ðW is �W i

½s�Þds. ð89Þ

Ito’s lemma then impliesR½k�1;k½ðW

is �W i

½s�Þds ¼R k

k�1ðk � sÞdW i

s so that NR t

0ðW i

s�

W i � ZNs Þds ¼ ð1=

ffiffiffiffiffiNpÞP½Nt�

k¼1

R½k�1;k½ðk � sÞdW i

s þ ð1=ffiffiffiffiffiNpÞRNt

½Nt�ðW i

s �W i½s�Þds. Note

that the sequence of i.i.d. random variablesR½k�1;k½ðk � sÞdW i

s has variance 13 and

covariance 12 1fi¼jg with the Brownian motion W j. Donsker’s functional central limit

theorem (see Kallenberg, 1997, Theorem 12.9, p. 225) then gives

1ffiffiffiffiffiNp

X½Nt�

k¼1

Z½k�1;k½

ðk � sÞdW is )

1

2W i

t þ1ffiffiffiffiffi12p Zi

t, (90)

where Zi is a standard Brownian motion independent of W j for all j 2 f1; . . . ; dg.This establishes (85) because of the continuity of the pathwise integral with respect tothe Lebesgue measure,

P� limN!1

1ffiffiffiffiffiNp

Z Nt

½Nt�

ðW is �W i

½s�Þds ¼ 0. (91)

The same type of argument shows that

N

Z t

0

ðs� ZNs ÞdW i

s ¼1ffiffiffiffiffiNp

X½Nt�

k¼1

Z½k�1;k½

ðs� ½s�ÞdW is þ

1ffiffiffiffiffiNp

Z Nt

½Nt�

ðs� ½s�ÞdW is.

(92)

Donsker’s functional central limit theorem implies that the first part converges to aBrownian motion. The second part converges to zero, in probability, by thecontinuity of the Wiener integral,

P� limN!1

1ffiffiffiffiffiNp

Z Nt

½Nt�

ðs� ½s�ÞdW is ¼ 0. (93)

As the sequence of i.i.d. random variablesR½k�1;k½ðs� ½s�ÞdW i

s has variance 13 and

covariance 121fi¼jg with W j as well as covariance 1

61fi¼jg with

R½k�1;k½ðk � sÞdW j

s wehave

1ffiffiffiffiffiNp

X½Nt�

k¼1

Z½k�1;k½

ðs� ½s�ÞdW is )

1

2W i

t �1ffiffiffiffiffi12p Zi

t, (94)

ARTICLE IN PRESS


which establishes (86). It remains to show (87). Again, by the scaling property ofBrownian motion

ffiffiffiffiffiNp

Z t

0

ðW js �W j � ZN

s ÞdW is ¼

1ffiffiffiffiffiNp

X½Nt�

k¼1

Z½k�1;k½

ðW js �W

j½s�ÞdW i

s


Z Nt

½Nt�

ðW js �W

j½s�ÞdW i

s. ð95Þ

As the sequence of i.i.d random variablesR½k�1;k½ðW

js �W

j½s�ÞdW i

s has variance12and

is independent of W j,R½k�1;k½ðW

is �W i

½s�Þds as well as ofR½k�1;k½ðs� ½s�ÞdW i

s we can

appeal to Donsker’s invariance principle to conclude that

1ffiffiffiffiffiNp

X½Nt�

k¼1

Z½k�1;k½

ðW js �W

j½s�ÞdW i

s )1ffiffiffi2p Z

i;jt . (96)

The continuity of the Ito integral implies that P� limN!1ð1=ffiffiffiffiffiNpÞRNt

½Nt�ðW j

s �

Wj½s�ÞdW i

s ¼ 0. This establishes (87). &

In the sequel, we need to assess the convergence of stochastic integrals with respectto the processes VN � ½V1;N ;V 2;N ;V 3;N ;V 4;N � defined in Lemma 1. Results of Duffieand Protter (1992), that provide sufficient conditions for the weak convergence ofstochastic integrals, i.e. goodness, prove useful in that regard.

Definition 1. A process Y N is good if ðX N ;Y N Þ ) ðX ;Y Þ implies thatðX N ;Y N ;

R �0 X N

s dY Ns Þ ) ðX ;Y ;

R �0 X s dY sÞ.

Lemma 2. The semimartingales ðV 1;N ;V 3;N ;V 4;N Þ are good.

Proof. From condition A in Duffie and Protter (1992) it follows that V 1;N is good ifsupNN

R t

0ðs� ZNs Þdso1 for all t 2 ½0;T �. Similarly, V 3;N and V4;N are good if

supN VAR½V 3;i;Nt �o1 for all i ¼ 1; . . . ; d and supNVAR½V

4;i;j;Nt �o1 for all i; j ¼

1; . . . ; d and t 2 ½0;T �. But,

VAR½V 3;i;Nt � ¼ N2

Z t

0

ðs� ZNs Þ

2 ds, (97)

VAR½V4;i;j;Nt � ¼ NE

Z t

0

ðW is �W i

ZNsÞ2 ds1i¼j

� �¼ N

Z t

0

ðs� ZNs Þds1i¼j. (98)

It is therefore sufficient to show that N�R t

0ðs� ZNs Þ

� dso1 for � ¼ 1; 2 and t 2 ½0;T �.The bound 14ðt� ½t�Þ�X0 implies

N�

Z t

0

ðs� ZNs Þ

� ds ¼1

N

Z Nt

0

ðs� ½s�Þ� dso1

N

Z Nt

0

ds ¼ t. (99)

We conclude that V 1;N ;V 3;N ;V4;N are good. &

ARTICLE IN PRESS


The proof of our next lemma shows that the total variation of the semi-martingaleV 2;N is OPð

ffiffiffiffiffiNpÞ.33 As a result V 2;N cannot be good.

Lemma 3. The semi-martingale V2;N is not good.

Proof. We will show that N�1=2NR t

0 jW s �W ZNsjds ¼ OPð1Þ so that the total

variation of V 2;N satisfiesR t

0 jdV 2;Ns j ¼ OPð

ffiffiffiffiffiNpÞ. Note that

N1=2

Z t

0

jW s �W ZNsjds ¼

1

N

Z Nt

0

jW s �W ½s�jds

¼1

N

X½Nt��1

k¼0

Z kþ1

k

jW s �W kjdsþ1

N

Z Nt

½Nt�

jW s �W ½Nt�jds.

ð100Þ

By the continuity of the integral the second term ð1=NÞRNt

½Nt�jW s �W ½Nt�jds ¼ oPð1Þ.

For the first term, note that, by Jensen’s inequality,

1

N2

X½Nt��1

k¼0

E

Z kþ1

k

jW s �W kjds

� �2" #

p1

N2

X½Nt��1

k¼0

Z kþ1

k

ðs� kÞds

¼1

N

1

2

½Nt�

N! 0, ð101Þ

as ½Nt�=N ! t as N !1. By (100), (101) and invoking Chebyshev’s weak law oflarge numbers, we conclude34

P� limN!1

N�1=2N

Z t

0

jW s �W ZNsjds

¼ limN!1

1

N

X½Nt��1

k¼0

E

Z kþ1

k

jW s �W kjds

� �

¼ limN!1

1

N

X½Nt��1

k¼0

E½jW 1j�

Z 1

0

ffiffisp

ds ¼ t

ffiffiffi2

p

r2

3

� �,

where the last equality uses E½jW 1j� ¼ffiffiffiffiffiffiffiffi2=p

pand

R 10

ffiffisp

ds ¼ 23as well as ð½Nt�=NÞ !

t as N !1. This establishesR t

0jdV 2;N

s j ¼ OPðffiffiffiffiffiNpÞ. It follows that

supN

R t

0 jdV 2;Ns j ¼ supNN

R t

0 jW s �W ZNsjds ¼ 1. Theorem 3.2 of Jacod and Protter

(1998) then shows that V 2;Nt cannot be good. &

Define the errors UXN

t ¼ ½UXN1

t ; . . . ;UXN

dt � where U

XNl

t ¼ X Nl;t � X l;t, and U

X N

t ¼

½UXN

lt ; . . . ;U

X Nl

t � where UXN

lt ¼ X N

l;t � X Nl;ZN

t. To prove Theorem 1 we use the mean

33The random variable X N is OPðNÞ if P� limN!1ð1=NÞX N ¼ Ka0 for some random variable K. The

random variable X N is oPðNÞ if P� limN!1ð1=NÞX N ¼ 0. If P� limN!1X N ¼ Ka0 (resp.

P� limN!1X N ¼ 0) we say that X N is OPð1Þ (resp. oPð1Þ).34Chebyshev’s weak law of large numbers states that if ð1=NÞ

PNi¼1 Zi is such that

ð1=N2ÞPN

i¼1 E½ðZiÞ2� ! 0 then P� limN!1ðð1=NÞ

PNi¼1ðZ

i � E½Zi �ÞÞ ¼ 0.

ARTICLE IN PRESS


value theorem to write

AðX NZN

tÞ � AðX tÞ ¼ AðX N

t Þ � AðX tÞ � ðAðXNt Þ � AðX N

ZNtÞÞ

¼Xd

l¼1

ðqlAðX t þ l1;lUXN

lt elÞU

X Nl

t � qlAðXNt þ l3;lU

XNl

t elÞUX N

lt Þ,

where l�;l 2�0; 1½ for all l ¼ 1; . . . ; d, and e0l ¼ ½0; . . . ; 0; 1; 0; . . . ; 0� is the lth unit vectorof dimension d. A similar expression holds for BðX N

ZNtÞ � BðX tÞ. The expansion of the

error UX N

t of the Euler continuous approximation can be decomposed as

UXN

T ¼

Z T

0

Xd

l¼1

qlAðX s þ l1;lelUXN

ls ÞU

XNl

s ds

þ

Z T

0

Xd

l¼1

Xd

j¼1

qlBjðX s þ l2;lelUX N

ls ÞU

X Nl

s dW js

�

Z T

0

Xd

l¼1

qlAðXNs þ l3;lelU

XNl

s ÞUXN

ls ds

�

Z T

0

Xd

l¼1

Xd

j¼1

qlBjðXNs þ l4;lelU

XNl

s ÞUXN

ls dW j

s.

As UX N

ls ¼ AlðX

NZN

sÞðs� ZN

s Þ þPd

i¼1 Bl;iðXNZN

sÞðW i

s �W iZN

sÞ we have

UXN

T ¼

Z T

0

Xd

l¼1

qlAðX s þ l1;lelUXN

ls ÞU

XNl

s ds

þ

Z T

0

Xd

l¼1

Xd

j¼1


ls ÞU

X Nl

s dW js

�1

N

Z T

0

Xd

l¼1

qlAðXNs þ l3;lelU

XNl

s ÞAlðXNZN

sÞdV1;N

s

�1

N

Z T

0

Xd

l¼1

qlAðXNs þ l3;lelU

XNl

s ÞXd

j¼1

Bl;jðXNZN

sÞdV 2;j;N

s

�1

N

Z T

0

Xd

l¼1

Xd

j¼1


XNl

s ÞAlðXNZN

sÞdV3;j;N

s

�1ffiffiffiffiffiNp

Z T

0

Xd

l¼1

Xd

j¼1


X Nl

s ÞXd

i¼1

Bl;iðXNZN

sÞdV 4;i;j;N

s . ð102Þ

Lemmas 2 and 3 imply that the integral with respect to V 2;N is the only term whoselimit is difficult to find. Our next result shows that its limit must account for a‘‘Wong-Zakai correction term’’ (Wong and Zakai, 1964).

ARTICLE IN PRESS


Lemma 4. For a sequence of good semimartingales aN such that ðaN ;V2;NÞ ) ða;V2Þ

we haveR T

0 aNs dV 2;j;N

s )R T

0 as dV2;js þ ½a;V

2;j�T , for j ¼ 1; . . . ; d.

Proof. The integration by parts formula combined with the fact that V2;j;N is ofbounded total variation for fixed N, and that aN is good givesZ t

0

aNs dV 2;j;N

s ¼ aNt V

2;j;Nt �

Z t

0

V2;j;Ns daN

s � ½V2;j;N ; aN �t

¼ aNt V

2;j;Nt �

Z t

0

V2;j;Ns daN

s

) atV2;jt �

Z t

0

V 2;js das

¼

Z t

0

as dV 2;js þ ½a;V

2;j �t.

The second equality follows from ½V 2;j;N ; aN � ¼ 0, a consequence of the fact thatV 2;j;N is of bounded total variation for fixed N. The last equality is an application ofthe integration by parts formula. &

The asymptotic error expansion for X N in (102) involves an integral with respectto the bounded variation process V 2;j;N that is not good (by Lemma 3). The limit ofthis integral is therefore not the integral of the weak limit of the integrand withrespect to the weak limit of the integrator. Lemma 4 shows how the limit can beconstructed if the integrand is good. To apply this result to the integral with respectto V2;N in (102) it remains to show that the integrand is good. For this we need thefollowing lemma.

Lemma 5. The semimartingale X N is good.

Proof. By Theorem 2.5 of Kurtz and Protter (1991b) goodness is equivalent touniform tightness, defined as follows,

Definition 2 (Jakubowski et al., 1989). A sequence of semimartingales X N isuniformly tight if and only if for each t and any simple predictable HN such thatjHN jp1 (P-a.s.), the set f

R t

0 HNs� dX N

s ; NX1g, is stochastically bounded (uniformlyin N).35,36

To prove Lemma 5 it suffices to show that X N is uniformly tight. The proof is bycontradiction and uses arguments in Kurtz and Protter (1996).

To simplify notation set Y 0t ¼ ½t;W0t� and f ¼ ½A;B1; . . . ;Bd �, and write X N

T ¼

X 0 þPd

j¼0

R T

0 f jðXNZN

sÞdY j

s. As f 2 C2 (by assumption), P� limN!1X N ¼ X (byTheorem 3.1 of Jacod and Protter (1998)) and ZN

t ! t as N !1, we can apply thecontinuous mapping Theorem to conclude ðX N ; f ðX N

ZN Þ;Y Þ ) ðX ; f ðX Þ;Y Þ.

35H is simple predictable if there exists a sequence of stopping times 0 ¼ t1p �ptnþ1 ¼ T , and Fti

measurable finite random variables ðHiÞ0pipn such that Ht ¼ H01f0gðtÞ þPn

i¼1 Hi1�ti ;tiþ1 �ðtÞ.

36A set of random variables fX N ; NX1g is uniformly stochastically bounded if for any �40 there exists

d40 such that supN PðjX N j4dÞo�.

ARTICLE IN PRESS


Suppose now that X N is not uniformly tight. By Definition 2 it follows that thereexist a time t and simple predictable processes HN with jHN jp1 (P-a.s.), for whichthe set f

R t

0 HNs� dX N

s ; NX1g is not stochastically bounded, i.e. there exists an � forwhich there is no d40 such that supN Pðj

R t

0 HNs� dX sj4dÞo�. For this sequence

jHN jp1 (P-a.s.), and this �, we have supN0 infN4N0 PðjR t

0HN

s� dX Ns j4dÞX� for any

d40. Thus, there exists a sequence of positive constants cN !1, such that

�p lim infN!1

P

Z t

0

HNs� dX N

s

��4cN

� �

p lim infN!1

P

Z t

0

HNs� dX N

s 4cN

� �þ P �

Z t

0

HNs� dX N

s 4cN

� ��

¼ lim infN!1

P

Z t

0

HNs�

cNdX N

s 41

� �þ P �

Z t

0

HNs�

cNdX N

s 41

� ��

¼ lim infN!1

PXd

j¼0

Z t

0

HNj;s�

cNf jðX

NZN

sÞdY j

s41

!

þP �Xd

j¼0

Z t

0

HNj;s�

cNf jðX

NZN

sÞdY j

s41

!!

¼ 0.

The null limit on the last line uses goodness of Y and ðHNj;�f jðX

NZN�Þ=cN ;Y jÞj¼0;...;d )

ð0;Y Þ. The limit for the integrand follows from jHNj jp1, f jðX

NZN�Þ ) f jðX Þ, and

cN !1 (so that HNj f jðX

NZN� Þ=cN ) 0 uniformly in N). Goodness of Y then impliesPd

j¼0

R t

0ðHNj;s=cNÞf jðX

NZN

sÞdY j

s ) 0.

From this contradiction we conclude that X N is uniformly tight and hence

good. &

The weak limit results in Lemmas 1–5 are sufficient to prove Theorem 1.

Proof of Theorem 1 (Kurtz and Protter, 1991a). It follows from the weak limits inLemmas 1–5 that lines 2, 3 and 4 of the approximation error UX N

T in (102) convergeweakly to zero. The remaining terms (lines 1 and 5) converge weakly to a linear SDEwhose solution corresponds to the expression for UX

T in the theorem. &

If we apply the Doss transformation and sample Gaussian increments Wjtþðnþ1Þh �

Wjtþnh then the error UX

N

¼ XN� X of the Euler approximation with transforma-

tion can be expanded as

UXN

T ¼

Z T

0

Xd

l¼1

ql AðX s þ l1;lelUX

Nl

s ÞUX

Nl

s ds

�

Z T

0

Xd

l¼1

ql AðXN

s þ l3;lelUX

Nl

s ÞUX

Nl

s ds,

ARTICLE IN PRESS


where l�;l 2�0; 1½ for all l ¼ 1; . . . ; d, el ¼ ½0; . . . ; 1; . . . ; 0�0 is the lth unit vector and

UX

Nl

s ¼ XN

l;s � XN

l;ZNs. As U

XNl

s ¼ AlðX ZNsÞðs� ZN

s Þ þPd

j¼1 Bl;jðWjs �W

j

ZNsÞ we obtain

UXN

T ¼

Z T

0

Xd

l¼1

ql AðX s þ l1;lelUX

Nl

s ÞUX

Nl

s ds

�1

N

Z T

0

Xd

l¼1

ql AðXN

s þ l3;lelUX

Nl

s ÞAlðX ZNsÞdV1;N

s

�1

N

Z T

0

Xd

l¼1

ql AðXN

s þ l3;lelUX

Nl

s ÞXd

j¼1

Bl;j dV2;j;Ns . ð103Þ

As for the error expansion without transformation (102) we have integrals with

respect to XN. To find the limit using integration by parts we need to establish

goodness of XN. Our next lemma verifies this property.

Lemma 6. The semimartingale XN

is good.

Proof. The proof parallels the proof of Lemma 5. &

Proof of Theorem 2. As in the proof of Theorem 1 integrals with respect to V 2;j;N

need special treatment. Let al;j;Ns � ql AðX

N

s þ l3;lelUX

Nl

s ÞBl;j and note that al;j;N isgood by the continuity of ql A. An application of Lemma 4 then shows thatZ T

0

al;j;Ns dV 2;j;N

s )

Z T

0

ql AðX sÞBl;j dV2;js þ

1

2

Z T

0

Xd

k¼1

ql;kAðX sÞBk;j Bl;j ds.

(104)

As limN!1 ZNs ¼ s and P� limN!1U

XNl ¼ P� limN!1UX

Nl ¼ 0 and as V1;N is

good by Lemma 2, it follows that NUXN

) UX where

UXT ¼

Z T

0

Xd

l¼1

ql AðX sÞUX ls ds�

1

2

Z T

0

Xd

l¼1

½ql AðX sÞAlðX sÞ

þXd

j;k¼1

ql;kAðX sÞBk;j Bl;j �ds

�

Z T

0

Xd

l¼1

ql AðX sÞXd

j¼1

Bl;j1

2dW j

s þ1ffiffiffiffiffi12p dZj

s

� �.

This SDE is linear and its solution corresponds to the result announced. &

Proof of Corollary 1. The result follows from NðGðXN

T Þ � X T Þ ¼ qGðX T ÞUX

N

T þ

oPð1Þ, where oPð1Þ denotes terms that vanish in probability (see footnote 33), andqGðzÞ ¼ BðGðzÞÞ. &

ARTICLE IN PRESS


Proof of Theorem 3. The proofs are the same for the cases with and withouttransformation. The approximation error can be written as

1ffiffiffiffiffiffiMp

XMi¼1

gðX i;NT Þ � E0½gðX T Þ�

!

¼1ffiffiffiffiffiffiMp

XMi¼1

ðgðX i;NT Þ � gðX i

T ÞÞ þ1ffiffiffiffiffiffiMp

XMi¼1

ðgðX iT Þ � E0½gðX T Þ�Þ. ð105Þ

The Lindeberg central limit theorem for i.i.d. random variables gives


XMi¼1

ðgðX iT Þ � E0½gðX T Þ�Þ )

ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiVAR½gðX T ÞjF0�

pZ, (106)

where Z�Nð0; 1Þ. The Clark–Ocone formula gðX T Þ � E0½gðX T Þ� ¼R T

0 Es½DsgðX T Þ�dW s combined with the chain rule of Malliavin calculus (seeNualart, 1995) enables us to write VAR½gðX T ÞjF0� ¼

R T

0 E0½kEs½qgðX T ÞDsX T �k2�ds.

This establishes thatffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiVAR½gðX T ÞjF0�

pZ ¼ LT ðX 0Þ in distribution. It remains to

find the weak limit of ð1=ffiffiffiffiffiffiMpÞPM

i¼1 ðgðXi;NT Þ � gðX i

T ÞÞ.If we introduce two sequences of numbers, NM and �M ¼


=NM such thatlimM!1 �M ¼ �o1 and limM!1NM ¼ 1, we can invoke Kolmogorov’s stronglaw of large numbers, to conclude that

limM!1

�MNM

1

M

XMi¼1

ðgðXi;NMT Þ � gðX i

T ÞÞ

!

¼ � limM!1

NME0½gðXNMT Þ � gðX T Þ�; P-a.s. ð107Þ

Theorem 4 on the expected approximation error (see below) then gives


XMi¼1

ðgðXi;NMT Þ � gðX i

T ÞÞ !�

2KT ðX 0Þ (108)

as M !1. The corresponding result for the estimator with transformation isobtained by invoking Theorem 5 instead Theorem 4. This establishes the claim forboth estimators. &

We now turn to the proof of the expected approximation error. The errorexpansion (102) shows that UX N

satisfies a linear SDE. The solution is

NðONT Þ�1UX N

T ¼ �

Z T

0

ðONs Þ�1 dIN

1;s �

Z T

0

ðONs Þ�1 dIN

2;s

�

Z T

0

ðONs Þ�1 dðIN

3;s � ½RN ; IN

3 �sÞ

�ffiffiffiffiffiNp

Z T

0

ðONs Þ�1 dðIN

4;s � ½RN ; IN

4 �sÞ, ð109Þ

ARTICLE IN PRESS


where ONT ¼ ERðRNÞT with RN

T ¼ ½RN1;T ; . . . ;R

Nd;T � such that

RNi;T ¼

Z T

0

qiAðX s þ l1;ieiUX N

is Þdsþ

Z T

0

Xd

j¼1

qiBjðX s þ l2;ieiUXN

is ÞdW j

s (110)

for i ¼ 1; . . . ; d, and

IN1;T �

Z T

0

Xd

l¼1

qlAðXNs þ l3;lelU

X Nl

s ÞAlðXNZN

sÞdV 1;N

s ,

IN2;T �

Z T

0

Xd

l¼1

qlAðXNs þ l3;lelU

X Nl

s ÞXd

j¼1

Bl;jðXNZN

sÞdV2;j;N

s ,

IN3;T �

Z T

0

Xd

l¼1

Xd

j¼1


X Nl

s ÞAlðXNZN

sÞdV 3;j;N

s ,

IN4;T �

Z T

0

Xd

i;j¼1

Xd

l¼1


XNl

s ÞBl;iðXNZN

sÞ

!dV4;i;j;N

s .

In order to find the limit of the expectation of NUXN

T we need the following lemma.

Lemma 7. Define JN4;T �

R T

0 ðONs Þ�1 dðIN

4;s � ½RN ; IN

4 �sÞ. For any FT -measurable

square integrable d � 1 random vector HT , and FT -measurable sequence of d � 1random vectors such that HN

T ) HT and

limr!1

lim supN

E0½jffiffiffiffiffiNpðHN

T Þ0JN

4;T j1fjffiffiffiNp

H 0T

JN4;Tj4rg� ¼ 0 (111)

(i.e. ðHNT Þ0JN

4;T is asymptotically uniformly integrable) the cross momentffiffiffiffiffiNp

E0½ðHNT Þ0JN

4;T � !12E0½U2;T �, (112)

where

U2;T ¼Xd

i;j;k¼1

½hk;jbk;i;j ;W i�T �HkT

Z T

0

dk;i;js dW i

s þ ½dk;i;j ;W i�T

� �� (113)

with bk;i;jt ; dk;i;j

t the kth elements of the d � 1 vectors bi;jt ; d

i;jt given by

di;jt ¼ O�1t qBj

Xd

l¼1

qlBj Bl;i

" #ðX tÞ for i; j ¼ 1; . . . ; d, (114)

bi;jt ¼ O�1t

Xd

l¼1

qlBjBl;i

" #ðX tÞ for i; j ¼ 1; . . . ; d (115)

ARTICLE IN PRESS


and with fhk;jt : j ¼ 1; . . . ; dg the integrands in the martingale representation of Hk

T ,

HkT � E0½H

kT � ¼

Xd

j¼1

Z T

0

hk;js dW j

s for k ¼ 1; . . . ; d. (116)

Proof. To establish this limit, recall that X N is good and that the coefficients arecontinuous by assumption. It follows that RN is good as well. Consider now the limit

offfiffiffiffiffiNp

E0½ðHNT Þ0R T

0 ðONs Þ�1d½RN ; IN

4 �s�. Given that

ffiffiffiffiffiNp

Z T

0

ðONs Þ�1 d½RN ; IN

4 �s ¼Xd

i;j¼1

Z T

0

di;js dV 2;i;N

s þ oPð1Þ, (117)

where di;j is the fixed random variable (114) we can apply Lemma 4 to obtain theweak limit

ffiffiffiffiffiNp

Z T

0


4 �s )Xd

i;j¼1

Z T

0

di;js dV 2;i

s þXd

i;j¼1

½di;j ;V2;i�T . (118)

Weak convergence, along with the uniform integrability offfiffiffiffiffiNpðHN

T Þ0R T

0ðON

s Þ�1 d½RN ; IN

4 �s in assumption (111), implies convergence in means. We obtain,

limN!1

ffiffiffiffiffiNp

E0 ðHNT Þ0

Z T

0


4 �s

� �

¼ E0 H 0T

Xd

i;j¼1

Z T

0

di;js dV 2;i

s þXd

i;j¼1

½di;j ;V 2;i�T

!" #

¼1

2

Xd

i;j¼1

E0 H 0T

Z T

0

di;js dW i

s þ ½di;j ;W i�T

� �� ,

where the last equality uses the independence of ðdi;j� ;HT Þ from Zi

� in V 2;i.Next, consider the limit of

ffiffiffiffiffiNp

E0½ðHNT Þ0R T

0 ðONs Þ�1 dIN

4;s� where HT is an arbitrarysquare integrable vector of FT -measurable random variables. By the Martingale

Representation Theorem we can, for each k ¼ 1; . . . ; d, write Hk;NT � E0½H

k;NT � ¼Pd

l¼1

R T

0hk;l;N

s dW ls. As

ffiffiffiffiffiNp

d½V4;i;j;N ;W l �t ¼ 1fj¼lg dV2;i;Nt we have

Xd

l¼1

Z �0

hk;l;Ns dW l

s;ffiffiffiffiffiNp

Z �0

ðONs Þ�1 dIN

4;s

� �T

¼Xd

i;j¼1

Z T

0

hk;j;Ns bi;j

s dV 2;i;Ns þ oPð1Þ

(119)

)Xd

i;j¼1

Z T

0

hk;js bi;j

s dV 2;is þ ½h

k;jbi;j ;V 2;i�T , (120)

ARTICLE IN PRESS


where the limit in the second line follows from Lemma 4 and goodness of thesequence hk;j;N .

Combining the results above gives, with hk;N0 �

ffiffiffiffiffiNp

E0½Hk;NT

R T

0 ðONs Þ�1 dIN

4;s�,

limN!1

hk;N0 ¼ lim

N!1

ffiffiffiffiffiNp

E0 E0½Hk;NT � þ

Xd

l¼1

Z T

0

hk;l;Ns dW l

s

! Z T

0

ðONs Þ�1 dIN

4;s

� �" #

¼ limN!1

ffiffiffiffiffiNp Xd

l¼1

E0

Z �0

hk;l;Ns dW l

s;

Z �0

ðONs Þ�1 dIN

4;s

� �T

� �

¼ limN!1

Xd

i;j¼1

E0

Z T

0

hk;j;Ns bi;j

s dV2;i;Ns

� �

¼Xd

i;j¼1

E0

Z T

0

hk;js bi;j

s dV 2;is þ ½h

k;jbi;j ;V2;i�T

� �

¼1

2

Xd

i;j¼1

E0½½hk;jbi;j ;W i�T �.

The second line in this derivation uses E0½R T

0 ðONs Þ�1 dIN

4;s� ¼ 0. The third and fourthlines follow from (119), (120) and the uniform integrability assumption (111). The

last line uses E0½R T

0 hk;js bi;j

s dV2;is � ¼ 0 for all i; j; k ¼ 1; . . . ; d. &

We now proceed with the proof of Theorem 4.

Proof of Theorem 4. By Lemma 7 we know that

ffiffiffiffiffiNp

E0 ðHNT Þ0

Z T

0

ðONs Þ�1dðIN

4;s � ½RN ; IN

4 �sÞ

� �!

1

2E0½U2;T �, (121)

with U2;T as in (113), for any FT -measurable sequence of random vectors HNT ) HT .

Given that other terms in (109) are of the same order and that each term isasymptotically uniformly integrable by assumption, we can find the expectedapproximation error from the weak limits of these terms.

As P� limN!1UX N¼ 0, and

d½RN ; IN3 �s ¼

Xd

j;k¼1

qBkðX sÞgj;N3;s d½W

k;V3;j;N �s

¼Xd

j;k¼1

qBkðX sÞgj;N3;s dV 1;N

s 1fk¼jg

¼Xd

j¼1

qBjðX sÞgj;N3;s dV 1;N

s

with

gj;N3;s �

Xd

l¼1


ls ÞAlðX

NZN

sÞ, (122)

ARTICLE IN PRESS


we get

½RN ; IN3 �T )

1

2

Z T

0

Xd

j¼1

½qBj qBj A�ðX sÞds. (123)

For the remaining terms we have ðIN1 ; I

N2 ; I

N3 Þ ) ðI1; I2; I3Þ where

I1;T ¼1

2

Z T

0

qAðX sÞAðX sÞds,

I2;T ¼

Z T

0

qAðX sÞXd

j¼1

BjðX sÞ1

2dW j


s

� �

þ1

2

Z T

0

Xd

j;k;l¼1

½qkðqlABl;jÞBk;j�ðX sÞds,

I3;T ¼

Z T

0

Xd

j¼1

qBjðX sÞAðX sÞ1

2dW j

s �1ffiffiffiffiffi12p dZj

s

� �.

For completeness we establish the limit of IN2;T . The other two limits are obtained

along the same lines. Applying Lemma 4 with aNs ¼

Pdl;j¼1 qlAðX

Ns þ

l3;lelUX N

ls ÞBl;jðX

NZN

sÞ, where the sequence aN is good, givesZ T

0

aNs dV 2;j;N

s )

Z T

0

as dV 2;js þ ½a;V

2;j�T , (124)

where as ¼Pd

l;j¼1 ½qlABl;j�ðX sÞ. Given that V2;jt ¼

12

Wjt þ

1ffiffiffiffi12p Z

jt Ito’s lemma implies

½a;V2;j�T ¼1

2

Z T

0

Xd

l;j¼1

½q½qlABl;j��ðX sÞd½X ;Wj�s

¼1

2

Z T

0

Xd

l;j;k¼1

½qk½qlABl;j�Bk;j�ðX sÞds, ð125Þ

where the second equality follows from d½X ;W j�s ¼ BjðX sÞds. This explains the limitof IN

2 .As g 2 C3ðRdÞ and as X N converges, the mean value theorem shows that

NðgðX NT Þ � gðX T ÞÞ ¼ qgðX T þ diag½l�UX N

T ÞNUXN

T

¼ qgðX T þ diag½l�UX N

T ÞONT ðNðO

NT Þ�1UX N

T Þ

for some vector l ¼ ½l1; . . . ; ld � with li 2�0; 1½; i ¼ 1; . . . d, where diag½l� is the

diagonal matrix with l on the diagonal and zeros elsewhere, and where NðONT Þ�1UXN

T

is defined in (109).Now define Hk

T � qgðX T ÞOkT for k ¼ 1; . . . ; d where Ok is the kth column of the

matrix process O. As NðgðX NT Þ � gðX T ÞÞ is uniformly integrable by assumption (31) we

can apply Lemma 7 to conclude that the expected approximation error converges to

NE0½gðXNT Þ � gðX T Þ� ! E0½qgðX T ÞU1;T �U2;T �, (126)

ARTICLE IN PRESS


where ðOT Þ�1U1;T is the limit of the first 3 terms in (109), i.e.

ðOT Þ�1U1;T ¼ �

Z T

0

ðOsÞ�1 dI1;s �

Z T

0

ðOsÞ�1 dI2;s �

Z T

0

ðOsÞ�1 dI3;s

þ1

2

Z T

0

Xd

j¼1

½qBj qBj A�ðX sÞds� ð127Þ

and U2;T is defined in Lemma 7 with HkT ¼ qgðX T ÞOk

T .

Let us now simplify U2;T . Note that (113) can be written as U2;T �Pdi;j;k¼1 ðU

k;i;j2;1;T�Hk

T Uk;i;j2;2;T Þ where U

k;i;j2;1;T � ½h

k;jbk;i;j ;W i�T and Uk;i;j2;2;T �

R T

0 dk;i;js dW i

s

þ½dk;i;j ;W i�T .

To simplify the second term, Uk;i;j2;2;T , recall the d � 1 process di;j

t ¼

O�1t ½qBj

Pdl¼1 qlBj Bl;i�ðX tÞ. Ito’s lemma gives

ddi;jt ¼ O�1t d qBj

Xd

l¼1

qlBj Bl;i

!þ ðdðO�1t ÞÞ qBj

Xd

l¼1

qlBj Bl;i

!

þ d O�1; qBj

Xd

l¼1

qlBj Bl;i

" #. ð128Þ

From dOt ¼ ½qAðX tÞdtþPd

j¼1 qBjðX tÞdWjt�Ot we deduce

dO�1t ¼ � O�1t ðdOtÞO�1t � O�1t d½O;O�1�t

¼ � O�1t qAðX tÞdtþXd

j¼1

qBjðX tÞdWjt

" #þ O�1t

Xd

j¼1

½qBj qBj�ðX tÞdt

ð129Þ

and

d½di;j ;W i�t ¼ O�1t d qBj

Xd

l¼1

qlBj Bl;i;Wi

" #t

þ d½O�1;W i�t qBj

Xd

l¼1

qlBj Bl;i

!

¼ O�1t q qBj

Xd

l¼1

qlBj Bl;i

" #d½X ;W i�t

� O�1X

j

qBj d½Wj ;W i�t qBj

Xd

l¼1

qlBj Bl;i

!

¼ O�1t q qBj

Xd

l¼1

qlBj Bl;i

" #Bi dt� O�1qBi qBj

Xd

l¼1

qlBj Bl;i

!dt.

Given thatPd

l¼1 qlBj Bl;i ¼ qBj Bi we can write

Xd

i;j¼1

½di;j ;W i�T ¼

Z T

0

Xd

i;j¼1

d½di;j ;W i�t ¼

Z T

0

O�1t ZðX tÞdt, (130)

ARTICLE IN PRESS


with

Z �Xd

i;j¼1

ðq½qBj qBj Bi�Bi � qBi qBj qBj BiÞ. (131)

Let us now simplify the first term, Uk;i;j2;1;T � ½h

k;jbk;i;j ;W i�T . The Clark–Oconeformula (see Nualart, 1995) and the chain rule of Malliavin calculus give

Uk;i;j2;1;T � ½h

k;jbk;i;j ;W i�T ¼

Z T

0

Es½Di;sðhk;js bk;i;j

s Þ�ds

¼

Z T

0

Es½ðDi;shk;js Þb

k;i;js þ hk;j

s ðDi;sbk;i;js Þ�ds.

By definition, hk;j is the process in the representation of HkT � qgðX T ÞOk

T . TheClark–Ocone formula gives hk;j

t ¼ Et½Dj;tðPd

l¼1 qlgðX T ÞOl;kT Þ� and the rules of

Malliavin calculus establish that

Dj;t

Xd

l¼1

qlgðX T ÞOl;kT

!¼Xd

l;n¼1

q2l;ngðX T ÞðDj;tX n;T ÞOl;kT þ

Xd

l¼1

qlgðX T ÞDj;tOl;kT ,

(132)

D2i;j;t

Xd

l¼1

qlgðX T ÞOl;kT

!¼

Xd

l;m;n¼1

q3l;m;ngðX T ÞðDi;tX m;T ÞðDj;tX n;T ÞOl;kT

þXd

l;n¼1

q2l;ngðX T ÞðD2i;j;tX n;T ÞO

l;kT

þXd

l;n¼1

q2l;ngðX T ÞðDj;tX n;T ÞDi;tOl;kT

þXd

l;n¼1

q2l;ngðX T ÞðDi;tX n;T ÞDj;tOl;kT

þXd

l¼1

qlgðX T ÞD2i;j;tO

l;kT ,

where

Di;tX m;T ¼Xd

h¼1

Om;ht;T Bh;iðX tÞ, (133)

D2i;j;tX m;T ¼

Xd

h¼1

ðDi;tOm;ht;T ÞBh;jðX tÞ þ

Xd

h¼1

Om;ht;T ½ðqBh;jÞBi�ðX tÞ (134)

and where, from

dOl;kt;v ¼

Xd

h¼1

qhAlðX vÞdvþXd

j¼1

qhBl;jðX vÞdW jv

" #Oh;k

t;v ; Ol;kt;t ¼ 1fl¼kg, (135)

ARTICLE IN PRESS


we deduce that Gl;k1;i ðt;TÞ � Di;tO

l;kt;T and Gl;k

2;i;jðt;TÞ ¼ D2i;j;tO

l;kt;T solve the linear

SDEs,

dGl;k1;i ðt; vÞ ¼

Xd

h¼1

qhAlðX vÞdvþXd

j¼1

qhBl;jðX vÞdW jv

!Gh;k1;i ðt; vÞ

þXd

h;m¼1

q2h;mAlðX vÞdvþXd

j¼1

q2h;mBl;jðX vÞdW jv

!Oh;k

v ðDi;tX m;vÞ

with Gl;k1;i ðt; tÞ ¼

Pdh¼1 qhBl;iðX tÞOh;k

t for i; l; k ¼ 1; . . . ; d, and

dGl;k2;i;jðt; vÞ ¼

Xd

h¼1

qhAlðX vÞdvþXd

p¼1

qhBl;pðX vÞdW pv

!Gh;k2;i;jðt; vÞ

þXd

h;m¼1


p¼1

q2h;mBl;pðX vÞdW pv

!ðDi;tX m;vÞG

h;k1;j ðt; vÞ

þXd

h;m¼1


p¼1


!Oh;k

v D2i;j;tX m;v

þXd

h;m;n¼1

q3h;m;nAlðX vÞdvþXd

p¼1

q3h;m;nBl;pðX vÞdW pv

!

�Oh;kv ðDj;tX m;vÞDi;tX n;v,

with Gl;k2;i;jðt; tÞ¼

Pdh;n¼1 q

2h;nBl;jðX tÞBn;iðX tÞOh;k

t þPd

h¼1 qhBl;jðX tÞGh;ki ðt; tÞ for i; j; l; k ¼

1; . . . ; d.From the definition bi;j

t ¼ O�1t ½ðqBjÞBi�ðX tÞ we also deduce that

Di;tbi;jt ¼ Di;tðO�1t ½ðqBjÞBi�ðX tÞÞ ¼ O�1t ½ðq½ðqBjÞBi� � qBiðqBjÞÞBi�ðX tÞ. (136)

Setting ni;jðt;TÞ � Et½ðDi;thk;jt Þb

k;i;jt þ hk;j

t ðDi;tbk;i;jt Þ� and using the law of iterated

expectations gives

ni;jðt;TÞ ¼Xd

k¼1

ðCk;i;jðt;TÞ½ðqBjÞBi�ðX tÞ þ Fk;i;jðt;TÞO�1t

�½ðq½ðqBjÞBi� � qBiðqBjÞÞBi�ðX tÞÞ, ð137Þ

where

Fk;i;jðt;TÞ ¼Xd

l;n¼1

q2l;ngðX T ÞXd

h¼1

On;ht;T Bh;jðX tÞ

!Ol;k

T þXd

l¼1

qlgðX T ÞGl;k1;j ðt;TÞ,

(138)

ARTICLE IN PRESS


Ck;i;jðt;TÞ ¼Xd

l;m;n¼1

q3l;m;ngðX T ÞXd

h¼1

Om;ht;T Bh;iðX tÞ

! Xd

h¼1

On;ht;T Bh;jðX tÞ

!Ol;k

T

þXd

l;m¼1

q2l;mgðX T ÞXd

h¼1

Gm;h1;i ðt;TÞBh;jðX tÞ þ

Xd

h¼1

Om;hT ½ðqBh;jÞBi�ðX tÞ

!Ol;k

T

þXd

l;m¼1

q2l;mgðX T ÞXd

h¼1

Om;ht;T Bh;jðX tÞ

!Gl;k1;i ðt;TÞ ð139Þ

and where Ol;kt;v solves (135) while Gl;k

1;i ðt; vÞ and Gl;k2;i;jðt; vÞ solve the linear SDEs

dGl;k1;i ðt; vÞ ¼

Xd

h¼1

qhAlðX vÞdvþXd

j¼1

qhBl;jðX vÞdW jv

!Gh;k1;i ðt; vÞ

þXd

h;m¼1


j¼1

q2h;mBl;jðX vÞdW jv

!

�Oh;kv

Xd

n¼1

Om;nt;v Bn;iðX tÞ

!ð140Þ

with Gl;k1;i ðt; tÞ ¼

Pdh¼1 qhBl;iðX tÞOh;k

t , and

dGl;k2;i;jðt; vÞ ¼

Xd

h¼1

qhAlðX vÞdvþXd

p¼1

qhBl;pðX vÞdW pv

!Gh;k2;i;jðt; vÞ

þXd

h;m¼1


p¼1


!

�Xd

n¼1

Om;nt;v Bn;iðX tÞ

!Gh;k1;j ðt; vÞ

þXd

h;m¼1


p¼1


!Oh;k

v

�Xd

h¼1

Gm;h1;i ðt; vÞBh;jðX tÞ þ Om;h

t;v ½ðqBh;jÞBi�ðX tÞ

� �

þXd

h;m;n¼1

q3h;m;nAlðX vÞdvþXd

p¼1

q3h;m;nBl;pðX vÞdW pv

!Oh;k

v

�Xd

p¼1

Om;pt;v Bp;jðX tÞ

! Xd

q¼1

On;qt;v Bq;iðX tÞ

!ð141Þ

with Gl;k2;i;jðt; tÞ ¼

Pdh;n¼1 q

2h;nBl;jðX tÞBn;iðX tÞOh;k

t þPd

h¼1 qhBl;jðX tÞGh;k1;i ðt; tÞ.

Substituting the definitions U2;T �Pd

i;j;k¼1 ðUk;i;j2;1;T �Hk

T Uk;i;j2;2;T Þ and Hk

T �

qgðX T ÞOkT in (126), using the results above and eliminating expectations of

stochastic integrals with respect to Zj (which is independent from FT ), enables us

ARTICLE IN PRESS


to rewrite the limit (126) as

NE0½gðXNT Þ � gðX T Þ� ! E0 qgðX T ÞU1;T �

Xd

i;j;k¼1

ðUk;i;j2;1;T �Hk

T Uk;i;j2;2;T Þ

" #

¼ E0 qgðX T Þ U1;T þXd

i;j;k¼1

OkT U

k;i;j2;2;T

!�Xd

i;j;k¼1

Uk;i;j2;1;T

" #

¼1

2E0½qgðX T ÞV 1;T þ V 2;T � ð142Þ

with V1;T ;V 2;T as defined in (29) and (30). This establishes the result. &

Proof of Corollary 2. The estimatorPN�1

n¼0 gðX NnhÞh based on the Euler discrete

approximation X Nnh with h ¼ T=N, is asymptotically equivalent to the estimatorR T

0gðX N

ZNsÞds based on the Euler continuous approximation X N

ZNs(Jacod and Protter,

1998, Lemma 1). To find the asymptotic limit of the expected approximation error ofthe Riemann integral, we can then study the approximation error of the continuousEuler approximation. This error has two parts:Z T

0

ðgðX NZN

sÞds� gðX sÞÞds ¼ HN

1T ðX 0Þ þHN2T ðX 0Þ, (143)

where

HN1T �

Z T

0

ðgðX Ns Þ � gðX sÞÞds,

HN2T � �

Z T

0

ðgðX Ns Þ � gðX N

ZNsÞÞds.

By Theorem 4 NE0½gðXNs Þ � gðX sÞ� !

12

KsðX 0Þ, P0-a.s., so that

NE0½HN1T � !

1

2

Z T

0

KsðX 0Þds, (144)

P0-a.s. It remains to study the second component, HN2T . As shown in the proof of

Theorem 1, we get, by the mean value theorem,

�NHN2T ¼

Z T

0

qgðX sÞ AðX NZN

sÞdV 1;N

s þXd

j¼1

BjðXNZN

sÞdV2;j;N

s

!þ oPð1Þ. (145)

The result now follows using the same arguments as those invoked in the proof ofTheorem 4 to find the expected value of limits involving the processes V1;N and V 2;N .In particular, with aN

1s � qgðX sÞAðXNZN

sÞ and aj;N

2s � qgðX sÞBjðXNZN

sÞ, we use

E0

Z T

0

aN1s dV 1;N

s

� �!

1

2E0

Z T

0

a1s ds

� �, (146)

E0

Z T

0

aj;N2s dV 2;j;N

s

� �! E0

Z T

0

ðaj2s dV 2;j

s þ d½aj2;V

2;j�sÞ

� �, (147)

ARTICLE IN PRESS


P0-a.s., where a1s ¼ ½ðqgÞA�ðX sÞ and aj2s ¼ ½ðqgÞBj�ðX sÞ. The expression for K2T ðX 0Þ

follows because the expectation of the integral relative to the martingale V2;j is nulland because d½aj

2;V2;j�s ¼

12d½aj

2;Wj�s. Calculating d½aj

2;Wj �s yields the expression for

K2T ðX 0Þ. &

We now turn to the expected error associated with the Doss transformation.

Proof of Theorem 5. Note that all the terms of the error expansion (103) are of order1=N and that the limit distribution is non-centered. If the terms in the errorexpansion are uniformly integrable, the result follows by taking the expectation ofthe solution of the linear SDE for the approximation error

NUX

Nt

T ¼ �ON

T

Z T

0

ðON

s Þ�1 dI

N

1;s þ

Z T

0

ðON

s Þ�1 dI

N

2;s

� �, (148)

where ON

T ¼ EðRNÞT with R

N

T ¼ ½RN

1;T ; . . . ; RN

d ;T � and RN

i;T ¼ qiAðX s þ l1;ieiUX

N

s Þ fori ¼ 1; . . . ; d and where

IN

1;T �

Z T

0

Xd

l¼1

ql AðXN

s þ l3;lelUX

Nl

s ÞAlðXN

ZNsÞdV 1;N

s ,

IN

2;T �

Z T

0

Xd

l¼1

ql AðXN

s þ l3;lelUX

Nl

s ÞXd

j¼1

Bl;jðXN

ZNsÞdV2;j;N

s .

As ðON

t ; IN

1;t; IN

2;t; XN

t Þ ) ðOt; I1;t; I2;t; X tÞ with Ot; X t as defined in Theorem 2 and

I1;T �1

2

Z T

0

½ðqAÞA�ðX sÞds

I2;T �

Z T

0

qAXd

j¼1

Bj

" #ðX sÞ

1

2dW j


s

� �

þ1

2

Z T

0

Xd

j;k;l¼1

ql;kAðX sÞBk;j Bl;j ds

(the expression for I2;T follows from (104)), the result follows using the samearguments as in the proof of Theorem 2. &

Proof of Corollary 3. We proceed as in the proof of Corollary 2. From Theorem 5 itfollows that

NE0

Z T

0

ðgðXN

s Þ � gðX sÞÞds

� �!

1

2E0

Z T

0

qgðX sÞV s ds

� �. (149)

The result for the second part of the expected approximation error, NE0½R T

0 ðgðXN

s Þ�

gðXN

ZNsÞÞds�, follows using the same arguments as in the proof of Corollary 2 for

K2T ðX 0Þ. Simply replace the drift coefficient A by A and the volatility coefficient B

by B. With Bj deterministic we get qBj ¼ 0 and the expression announcedfollows. &

ARTICLE IN PRESS


Proof of Theorem 6. Consider the Euler discrete approximations (172)–(173), inAppendix B, of the processes NCN

j;� for j ¼ 1; 2. The Euler continuous approxima-tions are

dðNCN1;sÞ ¼ qAðX N

ZNsÞhþ

Xd

j¼1

qBjðXNZN

sÞdW j

s

" #ðNCN

1;sÞ

�Xd

i;j¼1

½q½qBj qBj Bi�Bi�ðXNZN

sÞ

!ds

þ ðqAÞAþXd

j¼1

qBj qA Bj þXd

j;k;l¼1

qkðqlABl;jÞBk;j

" #ðX N

ZNsÞ

!ds

þXd

j¼1

½ðqAÞBj þ ðqBjÞA�ðXNZN

sÞdW j

s �Xd

i;j¼1

ðqBjÞðqBjÞBi�ðXNZN

sÞdW i

s,

dðNCN2;sÞ ¼

Xd

i;j¼1

nNi;jðZ

Ns ;TÞds

with CN1;0 ¼ CN

2;0 ¼ 0. Thus, NE0½qgðX NNhÞC

N1;Nh þ CN

2;Nh� ! �KT ðX 0Þ.We can use the arguments in the proofs of Theorems 4 and 5 to show that, for NM

and �M ¼ffiffiffiffiffiffiMp

=NM such that limM!1 �M ¼ �o1 and limM!1NM ¼ 1, we have

limM!1

1

2


XMi¼1

qgðXi;NMNM hÞC

i;NM1;NM h þ C

i;NM2;NM h

� �

¼1

2� lim

NM!1NME0 qgðX

NMT ÞC

NM1;T þ C

NM2;T

h i

¼ �1

2�KT ðX 0Þ.

Defining

gN ;Mc �

1

M

XMi¼1

gðX i;NNh Þ þ

1

2ðqgðX i;N

Nh ÞCi;N1;Nh þ Ci;N

2;NhÞ

� �, (150)

it then follows that

1

2


XMi¼1

ðqgðXi;NMNM hÞC

i;NM1;NM h þ C

i;NM2;NM hÞ

!(151)

corrects the asymptotic second-order bias for the estimator without transformationwhen M !1.

The proof for the estimator with transformation follows the same steps. In this

case the average over independent replications of the random variables 12 qgðX

N

T ÞCN

nh

approximates the negative of the second-order bias with transformation.

ARTICLE IN PRESS


The asymptotic equivalence of the bias-corrected estimators with and withouttransformation is a consequence of the fact that they have the same asymptoticdistribution. &

Proof of Corollary 4. The proof follows from the arguments used to establishTheorem 3 and from Corollaries 2 and 3. &

To prove convergence results for the Milshtein scheme, we need the followinglemmas.

Lemma 8. The following weak convergence results hold

V5;l;j;Nt � N3=2

Z t

0

Z v

ZNv

Z u

ZNv

dW lr dW j

u dv)

ffiffiffi2p

6Z

l;jt þ

1

6~Z

l;j� V

5;l;jt , (152)

V6;l;j;i;Nt � N

Z t

0

Z v

ZNv

Z u

ZNv

dW lr dW j

u dW iv )

1ffiffiffi6p ~Z

l;j;it � V

6;l;j;it (153)

as N !1, where ððZl;jÞl;j2f1;...;dg; ð ~Zl;jÞl;j2f1;...;dg; ð ~Z

l;j;iÞl;j;i2f1;...;dgÞ is a ð2d2

þ d3Þ-

dimensional standard Brownian motion and where ðZl;jÞl;j2f1;...;dg is the Brownian

motion in the limit V 4;l;j in Lemma 1.

Proof. Using the scaling property of Brownian motion we obtain

V5;l;j;Nt ¼

1ffiffiffiffiffiNp

Z Nt

0

Z v

½v�

Z u

½v�

dW lr dW j

u dv

¼1ffiffiffiffiffiNp

X½Nt�

k¼1

Z½k�1;k½

Z v

k�1

Z u

k�1

dW lr dW j

u dv


Z Nt

½Nt�

Z v

½Nt�

Z u

½Nt�

dW lr dW j

u dv.

Integration by parts givesR½k�1;k½

R v

k�1

R u

k�1 dW lr dW j

u dv ¼R½k�1;k½ðk � vÞR v

k�1 dW lr dW j

v. The sequence of i.i.d. random variablesR½k�1;k½ðk � vÞR v

k�1dW l

r dW jv is independent of W j, has mean zero, variance 1=12 and covariance

withR½k�1;k½

R v

k�1 dW mr dW n

v of ð1=6Þ1fl¼m;j¼ng. Donsker’s functional central limit

Theorem (see Kallenberg, 1997, Theorem 2.9, p. 225) then gives

1ffiffiffiffiffiNp

X½Nt�

k¼1

Z½k�1;k½

ðk � vÞ

Z v

k�1

dW lr dW j

v )

ffiffiffi2p

6Z

l;jt þ

1

6~Z

l;jt . (154)

This result and the continuity of the Riemann integral P� limN!1ð1=ffiffiffiffiffiNpÞRNt

½Nt�

R v

½Nt�

R u

½Nt�dW l

r dW ju dv ¼ 0, establishes (152).

ARTICLE IN PRESS


Similarly, using the scaling property of Brownian motion we obtain

V6;l;j;i;Nt ¼

1ffiffiffiffiffiNp

Z Nt

0

Z v

½v�

Z u

½v�

dW lr dW j

u dW iv

¼1ffiffiffiffiffiNp

X½Nt�

k¼1

Z½k�1;k½

Z v

k�1

Z u

k�1

dW lr dW j

u dW iv


Z Nt

½Nt�

Z v

½Nt�

Z u

½Nt�

dW lr dW j

u dW iv.

As the sequence of i.i.d random variablesR½k�1;k½

R v

k�1

R u

k�1 dW lr dW j

u dW iv is

independent of W j andR½k�1;k½

R v

k�1 dW mr dW n

v for all m; n ¼ 1; . . . ; d, and hasvariance 1=6, Donsker’s invariance principle establishes

1ffiffiffiffiffiNp

X½Nt�

k¼1

Z½k�1;k½

Z v

k�1

Z u

k�1

dW lr dW j

u dW iv )

1ffiffiffi6p ~Z

l;j;it . (155)

As by the continuity of the stochastic integral P� limN!1ð1=ffiffiffiffiffiNpÞRNt

½Nt�

R v

½Nt�

R u

½Nt�dW l

r dW ju dW i

v ¼ 0, this proves (153). &

Lemma 9. The semimartingale V6;l;j;i;N is good.

Proof. As for Lemma 2 it is sufficient to show that supN VAR½V6;l;j;i;Nt �o1.

Using

VAR½V6;l;j;i;Nt � ¼ N�1E

Z Nt

0

Z v

½v�

Z u

½v�

dW lr dW j

u dW iv

� �2" #

¼1

N

Z Nt

0

Z v

½v�

ðu� ½v�Þdudv ¼1

2

1

N

Z Nt

0

ðv� ½v�Þ2 dv

¼1

2

1

N

X½Nt�

k¼1

Z½k�1;k½

ðv� ðk � 1ÞÞ2 dvþ1

2

1

N

Z Nt

½Nt�

ðv� ½Nt�Þ2 dv

¼1

6

½Nt�

NþðNt� ½Nt�Þ3

N

� �o

1

6t,

we conclude that V6;l;j;i;N is good. &

Lemma 10. The semimartingale V 5;l;j;N is not good. For any sequence of good semi-

martingales aN such that ðaN ;V5;l;j;N Þ ) ða;V 5;l;jÞ we haveR T

0 aNs dV 5;l;j;N

s )R T

0as dV 5;l;j

s þ ½a;V5;l;j�T , for l; j ¼ 1; . . . ; d.

ARTICLE IN PRESS


Proof. Let aN be an arbitrary sequence of good semimartingales such thatðaN ;V5;l;j;N Þ ) ða;V 5;l;jÞ. Integration by parts givesZ T

0

aNs dV 5;l;j;N

s ¼ aNT V

5;l;j;NT �

Z T

0

V 5;l;j;Ns daN

s

) aT V5;l;jT �

Z T

0

V 5;l;js das

¼

Z T

0

as dV5;l;js þ ½a;V

5;l;j �T . ð156Þ

The semimartingale V 5;l;j;N is good if and only if ½a;V5;l;j�T ¼ 0. To show thatgoodness fails it suffices to find an aN such that ½a;V5;l;j �Ta0. Taking aN ¼ V 4;l;j;N

we obtain ½V4;l;j ;V5;l;j�T ¼16 T . The conclusion follows. The limit is given by

(156). &

The expansion of ~U~X

N

T � ~XN

T � X T for the approximation error of the Milshteinscheme is obtained along the same lines as (102) for the Euler scheme. We get

~U~X

N

T ¼

Z T

0

Xd

l¼1

qlAðX s þ l1;lel~U~X

Nl

s Þ~U~X

Nl

s ds

þ

Z T

0

Xd

l¼1

Xd

j¼1

qlBjðX s þ l2;lel~U~X

Nl

s Þ~U~X

Nl

s dW js

�1

N

Z T

0

Xd

l¼1

qlAð ~XN

s þ l3;lelU~X

Nl

s ÞAlð ~XN

ZNsÞdV1;N

s

�1

N

Z T

0

Xd

l¼1

qlAð ~XN

s þ l3;lelU~X

Nl

s ÞXd

j¼1

Bl;jð ~XN

ZNsÞdV 2;j;N

s

�1

N

Z T

0

Xd

l;j¼1

qlBjð ~XN

s þ l4;lelU~X

Nl

s ÞAlð ~XN

ZNsÞdV 3;j;N

s

�1

N

Z T

0

Xd

i;k;l;j¼1

qkBið ~XN

s þ l4;kekU~X

Nk

s Þ½qBlBj�kð~X

N

ZNsÞdV 6;l;j;i;N

s

�1

N3=2

Z T

0

Xd

k;l;j¼1

qkAð ~XN

s þ l3;kekU~X

Nk

s Þ½qBl Bj�kð~X

N

ZNsÞdV5;l;j;N

s , ð157Þ

where ~U~X

Nl

s ¼ ~XN

l;s � X l;s and U~X

Nl

s � ~XN

l;s �~X l;ZN

s. Comparing (157) with (102),

shows that the additional term in the Milshtein scheme eliminates integrals withrespect to V 4;i;j;N in the error expansion of the Euler scheme. At the same time itinduces two additional integrals, with respect to V 5;l;j;N and V 6;l;j;i;N (lines 5 and 6 of

(157)), in the approximation errors Að ~XN

ZNsÞ � Að ~X

N

s Þ and Bjð ~XN

ZNsÞ � Bjð ~X

N

s Þ.

Proof of Theorem 7. Lemmas 9 and 10 imply that the term in line 6 of (157) is ofhigher order than those in lines 2–5. Lemmas 1–5 and 8–10 imply that lines 1–5 of

ARTICLE IN PRESS


(157) converge weakly to a linear SDE whose solution is the expression for U~X

T inTheorem 7. &

Proof of Theorem 8. Note that the last two terms of ~UX

T are both products of twoindependent random variables one of which has null expectation. Thus, using

E OT

Z T

0

O�1s ½ðqAÞBj � ðqBjÞA�ðX sÞdZjs

� �

¼ E OT

Z T

0

O�1s ½ðqBiÞðqBlÞBj�ðX sÞd ~Zl;j;is

� �¼ 0, ð158Þ

for l; j; i ¼ 1; . . . ; d leads to the result announced. &

Proof of Theorem 9. The proof is the same as for the Euler scheme withtransformation. &

Proof of Corollary 5. We proceed as in the proof of Corollary 2. Decompose the

approximation error in two parts, ~HN

1T and ~HN

2T . These are the equivalents of HN1T

and HN2T in Corollary 2, obtained by replacing the Euler scheme by the Milshtein

scheme. The limit of NE0½ ~HN

1T �, namely 12

R T

0~KsðX 0Þds, is found using the same

arguments as for the Euler scheme.

The second component ~HN

2T can be written, using the mean value theorem, as

�N ~HN

2T ¼

Z T

0

qgðX sÞ Að ~XN

ZNsÞdV 1;N

s þXd

j¼1

Bjð ~XN

ZNsÞdV 2;N

s


Xd

l;j¼1

½ðqBlÞBj �ðX sÞdV5;l;j;Ns

!þ oPð1Þ. ð159Þ

The limits for the integrals with respect to V1;N and V2;j;N follow from the argumentsin the proof of Corollary 2. For the integral with respect to V5;l;j;N

s note that Lemmas8 and 10 imply

E1ffiffiffiffiffiNp

Xd

l;j¼1

½ðqBlÞBj�ðX sÞdV5;l;j;Ns

" #! 0, (160)

as N !1. This establishes the result announced. &

Proof of Theorem 10. The proof parallels the proof for the Euler scheme with

transformation. The average over independent replications of 12qgð ~X

N

T Þ~C

N

nh approx-

imates the negative of the second-order bias with transformation. The bias-correctedestimators with and without transformation are asymptotically equivalent becausethey share the same asymptotic distribution. &

Proof of Corollary 6. The proof is identical to the proof of Corollary 4. &

ARTICLE IN PRESS


Proof of Theorem 11. From (70) we get

�1

L

XL�1l¼0

gDðY l ;Y lþ1; y0Þ ¼1

L

XL�1l¼0

JDðY lÞ0qyhDðY l ;Y lþ1; y0Þðy

L

D � y0Þ þ oPð1Þ.

(161)

By assumption, Y is an ergodic Markov process. We then get, from the central limittheorem for ergodic processes,ffiffiffiffi

Lpðy

L

D � y0Þ ) S�1=2D Z (162)

where Z�Nð0; IpÞ, and SDðy0Þ � ðADðy0Þ�1Þ0BDðy0ÞADðy0Þ

�1, with

ADðy0Þ0�

ZRd

JDðyÞ0GDðyÞp

y0 ðyÞlðdyÞ,

BDðy0Þ �ZRd

JDðyÞ0CDðyÞJDðyÞp

y0 ðyÞlðdyÞ,

GDðyÞ �

ZRd

qyhðy; z; y0Þpy0D ðy; zÞlðdzÞ,

CDðyÞ �

ZRd

hðy; z; y0Þðhðy; z; y0ÞÞ0p

y0D ðy; zÞlðdzÞ.

Setting JDðyÞ0¼ GDðyÞ

0CDðyÞ�1 and using the Cauchy–Schwartz inequality gives the

variance lower bound

SDðy0Þ ¼ZRd

GDðyÞ0CDðyÞ

�1GDðyÞpy0 ðyÞlðdyÞ. (163)

Similarly, if P� limL!1 SL ¼ S, we can use (72) to show that

�1

L

XL

l¼1

ðS1=2gDðY l ;Y lþ1; y0ÞÞ

!01

L

XL

l¼1

qyðS1=2gDðY l ;Y lþ1; ~yL

DÞÞ

!

¼1

L

XL

l¼1

qyðS1=2gDðY l ;Y lþ1; y0ÞÞ

!01

L

XL

l¼1

qyðS1=2gDðY l ;Y lþ1; ~yL

DÞÞ

!

�ð~yL

D � y0Þ þ oPð1Þ.

It now follows thatffiffiffiffiLpð~y

L

D � y0Þ ) XDðy0Þ�1=2Z (164)

where XDðy0Þ ¼ ðCDðy0Þ�1Þ0DDðy0Þ

�1CDðy0Þ, with

CDðy0Þ0¼

ZRdðS1=2JDðyÞÞ

0GDðyÞpy0 ðyÞlðdyÞ, (165)

ARTICLE IN PRESS


DDðy0Þ ¼ZRdðS1=2JDðyÞÞ

0CDðyÞðS1=2JDðyÞÞp

y0ðyÞlðdyÞ. (166)

This is of the same form as the variance SD offfiffiffiffiLpðy

L

D � y0Þ above for which JDðyÞ isthe optimal weight (by the Cauchy–Schwartz inequality). We conclude that theefficient GMM must have SL ¼ Ip. &

Proof of Theorem 12. The mean value theorem and P� limL!1 yL;ML;NL

D � yL

D

� �¼

0 imply the existence of yL%

such that P� limL!1 yL%¼ y0 and

� GL;ML;NLD ðY l ;Y lþ1; y

L%ÞffiffiffiffiLpðy

L;ML;NL

D � yL

DÞ

¼

ffiffiffiffiLpffiffiffiffiffiffiffiffiML

p1

L

XL�1l¼0

UgML ;NLðY l ;Y lþ1; y

L;ML;NL

D ; yL

DÞ

!, ð167Þ

where

UgML ;NLY l ;Y lþ1; y

L;ML;NL

D ; yL

D

� ��


pg

ML;NLD ðY l ;Y lþ1; y

L;ML;NLÞ � gDðY l ;Y lþ1; y

LÞ

� �, ð168Þ

and

GL;ML;NLD ðY l ;Y lþ1; y

L%Þ �

1

L

XL�1l¼0

qygML;NLD ðY l ;Y lþ1; y

L%Þ. (169)

By the law of large numbers for ergodic Markov chains

P� limL!1

GL;ML;NLD ðY l ;Y lþ1; y

L%Þ ¼ SDðy0Þ. (170)

Similarly, using Theorems 4, 5, or 9, adjusted for the dependence on the observations

Y l , Y lþ1, and the fact that P� limL!1 ðyL;ML;NL

D � yL

DÞ ¼ 0, we conclude that

P� limL!1

1

L

XL�1l¼0

UgML ;NLðY l ;Y lþ1; y

L;ML;NL

D ; yL

DÞ ¼ e2kDðy0Þ (171)

if limL!1


p=NL ¼ e2, where kDðy0Þ is defined in (77). The result announced

follows if limL!1

ffiffiffiffiffiffiffiffiffiffiffiffiffiffiL=ML

p¼ e1o1. &

Appendix B. Correcting for second-order bias

This appendix provides formulas for the bias-corrected estimators of condi-tional expectations. In the expressions that follow we use the notation h ¼ Dt

for the time increment. The terms in the bias-corrected estimator without

ARTICLE IN PRESS


transformation, (42), are

CN1;ðnþ1Þh ¼ CN

1;nh þ qAðX NnhÞhþ

Xd

j¼1

qBjðXNnhÞDW

jnh

" #CN

1;nh

þ ðqAÞAþXd

j¼1

qBj qABj þXd

j;k;l¼1

qkðqlABl;jÞBk;j

" #ðX N

nhÞ

!h

N

�Xd

j;k¼1

½q½qBj qBj Bk�Bk�ðXNnhÞ

!h

N

þXd

j¼1

½ðqAÞBj þ ðqBjÞA�ðXNnhÞ

DWjnh

N

�Xd

j;k¼1

½ðqBjÞðqBjÞBk�ðXNnhÞ

DW knh

N, ð172Þ

Ci;N2;ðnþ1Þh ¼ CN

2;nh þXd

l;j¼1

nNl;jðnh;NhÞ

!h

N, (173)

where

ni;jðnh;NhÞ ¼Xd

k¼1

CNk;i;jðnh;NhÞ½ðqBjÞBi�ðX

NnhÞ

þXd

k¼1

FNk;i;jðnh;NhÞðON

nhÞ�1½ðq½ðqBjÞBi� � qBiðqBjÞÞBi�ðX

NnhÞ

with

FNk;i;jðnh;NhÞ ¼

Xd

l;n¼1

q2l;hgðX nN ÞXd

h¼1

On;h;Nnh;NhBh;jðX

NnhÞ

!Ol;k;N

Nh

þXd

l¼1

qlgðXNNhÞG

l;k1;j ðnh;NhÞ, ð174Þ

CNk;i;jðnh;NhÞ ¼

Xd

l;m;n¼1

q3l;m;ngðX NNhÞ

Xd

h¼1

Om;h;Nnh;Nh Bh;iðX

NnhÞ

!

�Xd

h¼1

On;h;Nnh;NhBh;jðX

NnhÞ

!Ol;k;N

Nh þXd

l;m¼1

q2l;mgðX NNhÞ

�Xd

h¼1

Gm;h;N1;i ðnh;NhÞBh;jðX

NnhÞ þ

Xd

h¼1

Om;h;NNh ½ðqBh;jÞBi�ðX

NnhÞ

!Ol;k;N

Nh

þXd

l;m¼1

q2l;hgðX NNhÞ

Xd

h¼1

Om;h;Nnh;Nh Bh;jðX

NnhÞ

!Gl;k;N1;i ðnh;NhÞ, ð175Þ

ARTICLE IN PRESS


and Gl;k;N1;i ðnh;NhÞ ¼

Pdp¼1 qpBl;iðX

NnhÞO

p;k;Nnh þ

PNhs¼nh DhG

l;k;N1;i ðnh; shÞ where

DhGl;k;N1;i ðnh; shÞ ¼

Xd

p¼1

qpAlðXNshÞhþ

Xd

j¼1

qpBl;jðX shÞDhWjsh

!Gp;k;N1;i ðnh; shÞ

þXd

p;m¼1

q2p;mAlðXNshÞhþ

Xd

j¼1

q2p;mBl;jðXNshÞDhW

jsh

!Op;k;N

sh

�Xd

q¼1

Om;q;Nnh;sh Bq;iðX

NshÞ

!, ð176Þ

and with Gl;k;N2;i;j ðnh;NhÞ ¼

Pdp;r¼1 q

2p;rBl;jðX

NnhÞBr;iðX

NnhÞO

p;k;Nnh þ

Pdp¼1 qpBl;jðX

NnhÞG

p;k;N1;i

ðnh; nhÞ þPNh

s¼nh DhGl;k;N2;i;j ðnh; shÞ where

DhGl;k;N2;i;j ðnh; shÞ ¼

Xd

p¼1

qpAlðXNshÞhþ

Xd

q¼1

qpBl;qðXNshÞDhW

qsh

!Gp;k;N2;i;j ðnh; shÞ

þXd

p;m¼1

q2p;mAlðXNshÞhþ

Xd

q¼1

q2p;mBl;qðXNshÞDhW

qsh

!

�Xd

r¼1

Om;r;Nnh;sh Br;iðX nhÞ

!Gp;k;N1;j ðnh; shÞ

þXd

p;m¼1

q2p;mAlðXNshÞhþ

Xd

q¼1

q2p;mBl;qðXNshÞDhW

qsh

!Op;k;N

sh

�Xd

p¼1

Gm;p;N1;i ðnh; shÞBp;jðX nhÞ þ Om;p;N

nh;sh ½ðqBp;jÞBi�ðXNnhÞ

� �

þXd

p;m;r¼1

q3p;m;rAlðXNshÞhþ

Xd

q¼1

q3p;m;rBl;qðX shÞDhWqsh

!Op;k;N

sh

�Xd

q¼1

Om;q;Nnh;sh Bq;jðX

NnhÞ

! Xd

q¼1

Or;q;Nnh;sh Bq;iðX

NnhÞ

!, ð177Þ

ONðnþ1Þh ¼ ON

nh þ qAðX NnhÞhþ

XN

j¼1

qBjðXNnhÞðDW

jnhÞ

!ON

nh,

X Nðnþ1Þh ¼ X N

nh þ AðX NnhÞhþ

Xd

j¼1

BjðXNnhÞðDW

jnhÞ

ARTICLE IN PRESS


with X N0 ¼ X 0, ON

0 ¼ Id and CN1;0 ¼ CN

2;0 ¼ 0. For the bias-corrected estimator withthe transformation, (43), we have

CN

ðnþ1Þh ¼ CN

nh þ ½qAðXN

nhÞh�CN

nh þ ðqAÞAþXd

j;k;l¼1

ql;kABk;j Bl;j

" #ðX

N

nhÞ

!h

N

þXd

j¼1

qAðXN

nhÞBj

DWjnh

N,

XN

ðnþ1Þh ¼ XN

nh þ AðXN

nhÞhþXd

j¼1

BjðDWjnhÞ (178)

with XN

0 ¼ X 0 and C0 ¼ 0, when the transformation is applied.Additional correction terms for the estimators of conditional expectations of

Riemann integrals are, for the Euler scheme,

CN3;ðnþ1Þh ¼ CN

3;nh þ ðqgÞ AþXd

j¼1

ðqBjÞBj

!" #ðX N

nhÞ þXd

j¼1

½B0jðq2gÞBj�ðX

NnhÞ

!h

N,

(179)

and for the Euler scheme with transformation,

CN

1;ðnþ1Þh ¼ CN

1;nh þ ½ðqgÞðAÞ�ðXN

nhÞ þXd

j¼1

B0

jðq2gðX

N

nhÞÞBj

!h

N. (180)

For the Milshtein scheme the process ~Cðnþ1Þh in the bias-corrected estimator (61) is

~CN

ðnþ1Þh ¼~C

N

nh þ qAð ~XN

nhÞhþXd

j¼1

qBjð ~XN

nhÞDWjnh

" #~C

N

nh

þ qAð ~XN

nhÞDh

~XN

nh

NþXd

j¼1

½ðq½ðqAÞBj�ÞBj�ð ~XN

nhÞh

N

þXd

j¼1

½ðqBjÞA�ð ~XN

nhÞDhW

jnh

N, ð181Þ

~XN

ðnþ1Þh ¼~X

N

nh þ Að ~XN

nhÞhþXd

j¼1

Bjð ~XN

nhÞðDWjnhÞ þ

Xd

l;j¼1

½ðqBlÞBj�ð ~XN

nhÞDFl;j,

(182)

where DFl;j is defined in (53). The additional correction term for second-order bias-corrected estimators of Riemann integrals based on the Milshtein scheme is

~CN

1;ðnþ1Þh ¼~C

N

1;nh þ ðqgÞ AþXd

j¼1

ðqBjÞBj

!" #ð ~X

N

nhÞ þXd

j¼1

½B0jðq2gÞBj�ð ~X

N

nhÞ

!h

N.

(183)

ARTICLE IN PRESS


An alternative is to use the Euler scheme to approximate X in the recursion for ~C.As both ~X

N

� and X N� converge to X � the resulting approximation is first-order

asymptotically equivalent.

References

Ait-Sahalia, Y., 1996. Testing continuous-time models of the spot interest rate. Review of Financial

Studies 9, 385–426.

Ait-Sahalia, Y., 1999. Transition densities for interest rate and other nonlinear diffusions. Journal of

Finance 54, 1361–1395.

Ait-Sahalia, Y., 2002. Maximum likelihood estimation of discretely observed diffusions: a closed form

approximation approach. Econometrica 70, 223–262.

Bally, V., Talay, D., 1996a. The law of the Euler scheme for stochastic differential equations (I):

convergence rate of the distribution function. Probability Theory and Related Fields 104, 43–60.

Bally, V., Talay, D., 1996b. The law of the Euler scheme for stochastic differential equations (II):

convergence rate of the density. Monte Carlo Methods and its Applications 2, 93–128.

Barberis, N., 2000. Investing for the long run when returns are predictable. Journal of Finance 55,

225–264.

Basawa, I.V., Scott, D.J., 1982. Asymptotic optimal inference for non-ergodic models. Lecture Notes in

Statistics, vol. 17. Springer, New York.

Bawa, V.S., Brown, S.J., Klein, R.W., 1979. Estimation Risk and Optimal Portfolio Choice. North-

Holland, New York.

Bibby, B.M., Sørensen, M., 1995. Martingale estimating functions for discretely observed diffusion

processes. Bernoulli 1, 17–39.

Billingsley, P., 1968. Convergence of Probability Measures. Wiley, New York.

Brandt, M., Santa-Clara, P., 2002. Simulated likelihood estimation of diffusions with an application to

exchange rate dynamics in incomplete markets. Journal of Financial Economics 63, 161–210.

Broze, L., Scaillet, O., Zakoian, J.-M., 1998. Quasi-indirect inference for diffusion processes. Econometric

Theory 14, 161–186.

Chamberlain, G., 1987. Asymptotic efficiency in estimation with conditional moment restrictions. Journal

of Econometrics 34, 305–334.

Chamberlain, G., 1992. Efficiency bounds for semiparametric regression. Econometrica 60, 567–596.

Chan, K.C., Karolyi, A., Longstaff, F.A., Saunders, A., 1992. A comparison of alternative models of the

short-term interest rates. Journal of Finance 47, 1209–1227.

Chib, S., Elerian, O., Shephard, N., 2001. Likelihood inference for discretely observed non-linear

diffusions. Econometrica 69, 959–993.

Dacunha-Castelle, D., Florens-Zmirou, D., 1986. Estimation of the coefficients of a diffusion from

discrete observations. Stochastics 19, 263–284.

Detemple, J.B., Garcia, R., Rindisbacher, M., 2003. A Monte Carlo method for optimal portfolios.

Journal of Finance 58, 401–446.

Detemple, J.B., Garcia, R., Rindisbacher, M., 2005. Representation formulas for Malliavin derivatives of

diffusion processes. Finance and Stochastics 9, 349–369.

Doss, H., 1977. Liens entre equations differentielles stochastiques et ordinaires. Annales de l’Institut H.

Poincare 13, 99–125.

Duffie, D., Glynn, P., 1995. Efficient Monte Carlo simulation of security prices. Annals of Applied

Probability 5, 897–905.

Duffie, D., Protter, P., 1992. From discrete to continuous finance: weak convergence of the financial gain

process. Mathematical Finance 2, 1–15.

Duffie, D., Singleton, K., 1993. Simulated moments estimation of Markov models of asset prices.

Econometrica 61, 929–952.

Durham, G.B., Gallant, A.R., 2002. Numerical techniques for maximum likelihood estimation of

continuous-time diffusion processes. Journal of Business and Economic Statistics 20, 297–316.

ARTICLE IN PRESS


Eraker, B., 2001. MCMC analysis of diffusion models with application to finance. Journal of Business and

Economic Statistics 19, 177–191.

Florens-Zmirou, D., 1989. Approximate discrete-time schemes for statistics of diffusion processes.

Statistics 20, 547–557.

Florens-Zmirou, D., 1993. On estimating the diffusion coefficient from discrete observations. Journal of

Applied Probability 30, 790–804.

Gaines, J.G., Lyons, T.J., 1997. Variable step size control in the numerical solution of stochastic

differential equations. SIAM Journal of Applied Mathematics 57, 1455–1484.

Gallant, A.R., Tauchen, G., 1996. Which moments to match? Econometric Theory 12, 657–681.

Gallant, A.R., Tauchen, G., 2002. Simulated score methods and indirect inference for continuous-time

models. Handbook of Financial Econometrics, forthcoming.

Genon-Catalot, V., 1990. Maximum contrast estimation for diffusion processes from discrete

observations. Statistics 21, 99–116.

Genon-Catalot, V., Jacod, J., 1993. On the estimation of the diffusion coefficient for multidimensional

diffusion processes. Annales de l’I.H.P., Probabilite et Statistiques 29, 119–151.

Gourieroux, C., Montfort, A., Renault, E., 1993. Indirect inference. Journal of Applied Econometrics 8,

85–118.

Hansen, L., 1985. A method for calculating bounds on the asymptotic covariance matrices of generalized

method of moments estimators. Journal of Econometrics 30, 203–238.

Hansen, L.P., Scheinkman, J.A., 1995. Back to the future: generating moment implications for

continuous-time Markov processes. Econometrica 63, 767–804.

Hansen, L.P., Heaton, J.C., Ogaki, M., 1988. Efficiency bounds implied by multi-period conditional

moment restrictions. Journal of the American Statistical Association 83, 863–871.

Heyde, C.C., 1992. On best asymptotic confidence intervals for parameters of stochastic processes. Annals

of Statistics 20, 603–607.

Jacod, J., Protter, P., 1998. Asymptotic error distributions for the Euler method for stochastic differential

equations. Annals of Probability 26, 267–307.

Jakubowski, A., Memin, J., Pages, G., 1989. Convergence en loi des suites d’integrals stochastiques sur

l’espace D1 de Skorohod. Probability Theory and Related Fields 81, 111–137.

Kallenberg, O., 1997. Foundations of Modern Probability Theory. Springer, New York.

Kessler, M., Sørensen, M., 1999. Estimating equations based on eigenfunctions for a discretely observed

diffusion process. Bernoulli 5 (2), 299–314.

Kloeden, P., Platen, E., 1997. Numerical Solution of Stochastic Differential Equations. Springer, Berlin.

Kurtz, T.G., Protter, P., 1991a. Wong–Zakai corrections, random evolutions and numerical schemes for

SDE’s. In: Stochastic Analysis. Academic Press, New York, pp. 331–346.

Kurtz, T.G., Protter, P., 1991b. Weak limit theorems of stochastic integrals and stochastic differential

equations. Annals of Probability 19, 1035–1070.

Kurtz, T.G., Protter, P., 1996. Weak convergence and stochastic integrals and differential equations, 1995

CIME School in Probability. Lecture Notes in Mathematics, vol. 1627. Springer, Berlin, pp. 1–41.

Lo, A., 1988. Maximum likelihood estimation of generalized Ito processes with discretely-sampled data.

Econometric Theory 4, 231–247.

Meyn, S.P., Tweedie, R.L., 1993. Markov Chains and Stochastic Stability. Communications and Control

Engineering Series. Springer, Berlin, New York.

Milshtein, G.N., 1984. Weak approximation of solutions of systems of stochastic differential equation.

Theory of Probability and its Application 4, 750–768.

Milshtein, G.N., 1995. Numerical Integration of Stochastic Differential Equations. Kluwer Academic

Publishers, New York.

Milshtein, G.N., Schoenmakers, J., Spokoiny, V., 2004. Transition density estimation for stochastic

differential equations via forward-reverse representations. Bernoulli 10, 281–312.

Muller, U.U., Wefelmeyer, W., 2002. Autoregression, estimating functions, and optimality criteria. In

Advances in Statistics, Combinatorics and Related Areas, Selected Papers from the SCRA2001-FIM

VIII Proceedings of the Wollongong Conference. World Scientific, Singapore.

Newey, W.K., 1990. Semiparametric efficiency bounds. Journal of Applied Econometrics 5, 90–135.

ARTICLE IN PRESS


Nualart, D., 1995. The Malliavin Calculus and Related Topics. Springer, New York.

Pedersen, A.R., 1995a. A new approach to maximum likelihood estimation for stochastic differential

equations based on discrete observations. Scandinavian Journal of Statistics 22, 55–71.

Pedersen, A.R., 1995b. Consistency and asymptotic normality of an approximate maximum likelihood

estimator for discretely observed diffusion processes. Bernoulli 1, 257–279.

Santa-Clara, P., 1995. Simulated likelihood estimation of diffusions with and application to the short term

interest rate. Ph.D. Dissertation, INSEAD.

Smith, A.A., Jr. 1990. Three Essays on the Solution and estimation of dynamic macroeconomic models.

Ph.D. Thesis, Duke University.

Talay, D., 1984. Efficient numerical schemes for the approximation of expectations of functionals of

S.D.E. In: Korezioglu, H., Maziotto, G., Szpirglas, J. (Eds.), Filtering and Control of Random

Processes, Lecture Notes in Control and Information Sciences, vol. 61, Springer, New York, pp.

294–313.

Talay, D., 1986. Discretisation d’une equation differentielle stochastique et calcul approche d’esperance de

fonctionelles de la solution. Mathematical Modelling and Numerical Analysis 0, 141–179.

Talay, D., 1996. Probabilistic numerical methods for partial differential equations: elements of analysis.

CIME School in Probability, Lecture Notes in Mathematics, vol. 1627. Springer, Berlin, pp. 149–196.

Talay, D., Tubaro, L., 1990. Expansion of the global error for numerical schemes solving stochastic

differential equations. Stochastic Analysis and its Application 8, 483–509.

Wefelmeyer, W., 1996. Quasi-likelihood models and optimal inference. Annals of Statistics 24, 405–422.

Wong, E., Zakai, M., 1964. On the convergence of ordinary integrals to stochastic integrals. Annals of

Mathematical Statistics 36, 1560–1564.

Documents

Asymptotic properties of Monte Carlo estimators of ...€¦ · Journal of Econometrics 134 (2006) 1–68 Asymptotic properties of Monte Carlo estimators of diffusion processes Je´roˆme