
Stochastic root finding via retrospective approximation



IIE Transactions. Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/uiie20

Stochastic root finding via retrospective approximation. HUIFEN CHEN a and BRUCE W. SCHMEISER b. a Department of Industrial Engineering, Chung Yuan Christian University, Chung Li, Taiwan. b School of Industrial Engineering, Purdue University, West Lafayette, IN 47907-1287, USA. Published online: 27 Apr 2007.

To cite this article: HUIFEN CHEN & BRUCE W. SCHMEISER (2001) Stochastic root finding via retrospective approximation, IIE Transactions, 33:3, 259-275, DOI: 10.1080/07408170108936827

To link to this article: http://dx.doi.org/10.1080/07408170108936827


IIE Transactions (2001) 33, 259-275

Stochastic root finding via retrospective approximation

HUIFEN CHEN¹ and BRUCE W. SCHMEISER²

¹Department of Industrial Engineering, Chung Yuan Christian University, Chung Li, Taiwan. E-mail: [email protected]
²School of Industrial Engineering, Purdue University, West Lafayette, IN 47907-1287, USA. E-mail: [email protected]

Received October 1999 and accepted July 2000

Given a user-provided Monte Carlo simulation procedure to estimate a function at any specified point, the stochastic root-finding problem is to find the unique argument value that provides a specified function value. To solve such problems, we introduce the family of Retrospective Approximation (RA) algorithms. RA solves, with decreasing error, a sequence of sample-path equations that are based on increasing Monte Carlo sample sizes. Two variations are developed: IRA, in which each sample-path equation is generated independently of the others, and DRA, in which each equation is obtained by appending new random variates to the previous equation. We prove that such algorithms converge with probability one to the desired solution as the number of iterations grows, discuss implementation issues to obtain good performance in practice without tuning algorithm parameters, provide experimental results for an illustrative application, and argue that IRA dominates DRA in terms of the generalized mean squared error.

1. Introduction

The root-finding problem is to find the unique root x* of the equation g(x*) = γ, where g : ℜ → ℜ. We consider Stochastic Root-Finding Problems (SRFPs), the case where g(x) can only be estimated by a consistent estimator ȳ(x) for any given value of x. SRFPs arise in designing stochastic systems with computer simulation: γ is the desired system performance, x is the value of a design variable, g(x) is the corresponding system performance, and ȳ(x) is the estimated performance obtained from a user-provided Monte Carlo simulation procedure. More specifically, the SRFP is defined as follows.

Stochastic Root-Finding Problem (SRFP):

Given:

(a) a constant vector γ ∈ ℜ^d,

(b) a (computer) procedure for generating, for any x ∈ ℜ^d, a d-dimensional consistent estimate ȳ(x) of g(x).

Find: the unique root x* satisfying g(x*) = γ using only the estimator ȳ.

Examples of SRFPs can be found in Chen and Schmeiser (1994a) and in Section 5. As an illustrative example, we consider here a one-dimensional problem arising in statistical inference: compute the analogue of a Student's t critical value when the independent observations arise from an arbitrary distribution function, say F_V, rather than from the normal distribution. Specifically:


given F_V, a sample size n, and a probability 1 − α, the problem is to determine the critical value t_{1−α,n} such that P{T ≤ t_{1−α,n}} = 1 − α, where T = √n(V̄ − μ)/S, V̄ = n^{−1} Σ_{j=1}^{n} V_j, S² = (n − 1)^{−1} Σ_{j=1}^{n} (V_j − V̄)², the observations V_1, V_2, ..., V_n are sampled independently from F_V, and μ = E(V) is the mean assumed in the null hypothesis. This problem is an SRFP with true root x* = t_{1−α,n}, target γ = 1 − α, and function g(x) = P(T ≤ x). The user-written computer procedure ȳ(x) is the sample average of y_1(x), ..., y_m(x), where y_j(x) = 1{T_j ≤ x}, T_j is the jth observation of T, and 1 is the indicator function. That is, the computer procedure mimics the process: each observation y_j(x) is obtained by generating one random sample of size n from F_V, computing the value of T_j, and returning 1 if T_j ≤ x and 0 otherwise.
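To make the procedure concrete, the following is a minimal Python sketch of the Monte Carlo estimator ȳ(x) just described. The exponential choice of F_V, the sample sizes, and the function names are illustrative assumptions of ours, not part of the original article.

```python
import numpy as np

def y_bar(x, m, n, rng, sample_fv=lambda rng, n: rng.exponential(size=n), mu=1.0):
    """Monte Carlo estimate of g(x) = P(T <= x) from m independent
    observations y_j(x) = 1{T_j <= x}; sample_fv draws one sample of
    size n from F_V, and mu = E(V) is the null-hypothesis mean."""
    count = 0
    for _ in range(m):
        v = sample_fv(rng, n)                                 # one sample of size n from F_V
        t = np.sqrt(n) * (v.mean() - mu) / v.std(ddof=1)      # T = sqrt(n)(Vbar - mu)/S
        count += (t <= x)
    return count / m

# Example: estimate g(1.8) for n = 10 exponential(1) observations.
rng = np.random.default_rng(0)
print(y_bar(1.8, m=10_000, n=10, rng=rng))
```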

This root-finding problem could be solved deterministically using numerical quadrature. Rather than ȳ(x), the user-provided procedure would compute

g(x) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} 1\{T \le x\}\, dF_V(v_1) \cdots dF_V(v_n),

probably using an off-the-shelf quadrature routine. This value, in turn, could be given to an off-the-shelf deterministic root-finding routine. Even for this simple example, and even for small values of n, both simplicity of code and computing time favor the stochastic approach. Of course, for the important normal-distribution case, specialized deterministic routines are quite efficient.


We are interested in black-box algorithms to solve the SRFP. Given such algorithms, a practitioner needs only to provide a computer simulation procedure that provides the estimator ȳ(x) of g(x) by mimicking the behavior of the modeled system. Often, as in this example, ȳ(x) is an unbiased sample average, but more generally, such as in problems involving steady-state behavior of stochastic systems, ȳ(x) could be biased. Even more generally, ȳ(x) need not be a sample average. The concepts necessary to develop such simulation procedures are discussed in standard textbooks, for example Law and Kelton (2000) and Banks et al. (1996).

The general problem is to solve d equations in d unknowns. As with deterministic root finding, many interesting problems lie in a single dimension. We consider only d = 1 in detail. The ideas, but not all of the details, of this paper extend to multiple dimensions, e.g., Chen (1998).

We propose, in Section 2, a family of Retrospective Approximation (RA) algorithms, which iteratively solve a sequence of sample-path approximation problems with increasing sample sizes. In each iteration, the sample-path approximation problem is solved to within an error tolerance; the root estimator is then a function of those solutions. Using results from M-estimators, we show in Section 3 that, under proper conditions, the RA root estimator converges to the true root with probability one (w.p.1). Section 4 is a discussion of implementation issues. Section 5 provides numerical results for two specific RA algorithms and empirically compares RA to stochastic-approximation algorithms (Robbins and Monro, 1951), a classic stochastic root-finding approach. The family of RA algorithms has been discussed previously in Chen (1994) and Chen and Schmeiser (1994b).

That an algorithm converges is not sufficient to make the algorithm interesting; the finite-sample convergence can be so slow as to make the algorithm impractical, even when the asymptotic convergence rate is quite good. Our empirical experience with various retrospective approximation algorithms, however, has been good. Section 5.2 shows that RA yields root estimates with much smaller mean squared error (mse) than stochastic approximation for practical numbers of replications. Work continues on stochastic approximation, although usually for optimization rather than root finding (Andradottir, 1992; Fu, 1994; L'Ecuyer et al., 1994; L'Ecuyer and Glynn, 1994; Fu and Hill, 1997). Simon (1998) proposes a natural stochastic root-finding procedure that iteratively solves a sequence of approximation functions of g(x), assuming that the solution can be determined exactly. A given approximation function depends on all the past solutions and their function estimates. This procedure converges if the root-finding function g is continuous, the estimator ȳ is uniformly consistent, and the approximation function satisfies the "cluster point property". Despite the convergence proof, it is not clear how to construct an approximation function that satisfies the cluster point property, is easy to implement, and is robust with respect to the algorithm parameters.

2. Retrospective approximation

Here we develop RA algorithms. In the following five subsections we discuss sample-path approximations, two variations of RA algorithms, the rationale behind the RA logic, estimators for the variance of the retrospective root estimators, and retrospective approaches for stochastic optimization.

2.1. Sample-path approximations

Fundamental to the retrospective approach is the concept of the sample-path approximation to the function g. At any point x, this approximation is simply ȳ(x). The approximation to g is obtained by using common random numbers for every point x. We let ω = {ω_1, ..., ω_m} denote the random numbers used to obtain ȳ(x) = ȳ_m(x; ω). Holding ω constant over all points x yields a sample-path approximation ȳ_m(·; ω) to g. Although given ω the sample-path approximation ȳ_m(x; ω) is a deterministic function of x, we need not write it explicitly; rather we calculate its value only at each desired point x.

The sample-path equation

\bar{y}_m(x^*(\omega); \omega) = \gamma,   (1)

defines the random root x*(ω), which is an estimate of the true root x*. We call x*(ω) the retrospective root.

For the example in Section 1, the random numbers ω_j are used to generate the jth observation y_j(x) = y(x; ω_j) = 1{T_j(ω_j) ≤ x}. Given independently generated random numbers ω_1, ..., ω_m, the sample-path equation is

\bar{y}_m(x^*(\omega); \omega) = m^{-1} \sum_{j=1}^{m} y(x^*(\omega); \omega_j) = m^{-1} \sum_{j=1}^{m} 1\{T_j(\omega_j) \le x^*(\omega)\} = 1 - \alpha.   (2)

Despite the uniqueness of the true root t_{1−α,n} of the strictly increasing, continuous (assuming F_V is continuous) function g, the retrospective root might not be unique or might not exist. The sample-path equation has many roots if 1 − α lies on a step and has no root otherwise. In the latter case, there is, however, a unique point x*(ω) at which ȳ_m crosses 1 − α. In Section 3.1 we relax the definition of retrospective roots to allow "crossing" roots.
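For this step-function example the crossing root can be written down directly: with ω fixed, ȳ_m(·; ω) is the empirical CDF of the realized T_1, ..., T_m, so it crosses the level 1 − α at an order statistic. The short Python sketch below (the names and the t-distributed stand-in sample are our illustrative assumptions) computes that crossing root for one realization.

```python
import numpy as np

def retrospective_root(t_sample, alpha):
    """Crossing root of the sample-path equation (2): the empirical CDF of
    T_1, ..., T_m jumps from below 1 - alpha to (at or) above it at the
    ceil(m * (1 - alpha))-th order statistic.  When m * (1 - alpha) is an
    integer, every point of that step is a root and this one is returned."""
    m = len(t_sample)
    k = int(np.ceil(m * (1.0 - alpha)))     # index of the crossing order statistic
    return np.sort(t_sample)[k - 1]         # k-th smallest value (1-based k)

# One fixed realization of omega = (T_1, ..., T_m):
rng = np.random.default_rng(1)
t_sample = rng.standard_t(df=9, size=500)   # stand-in for m simulated values of T
print(retrospective_root(t_sample, alpha=0.05))
```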

2.2. RA algorithms

RA iteratively solves, approximately, a sequence of sample-path equations


\bar{y}_{m_i}(x^*(\omega_i);\, \omega_i) = \gamma,   (3)

for i = 1, 2, ..., where the components of ω_i = {ω_{i,1}, ..., ω_{i,m_i}} are generated independently. RA uses a strictly increasing sample-size sequence {m_i}. At each retrospective iteration i, RA uses a numerical root-finding algorithm to evaluate ȳ_{m_i}(·; ω_i) at one or more x values to find a retrospective solution x(ω_1, ..., ω_i) that is close to the retrospective root x*(ω_i). In particular, RA uses the retrospective-iteration stopping criterion |x(ω_1, ..., ω_i) − x*(ω_i)| < ε_i, where {ε_i} is a positive sequence converging to zero. Possibly {ε_i} is a function of the previous retrospective solutions and therefore random; in this case we assume only w.p.1 convergence to zero. One approach to satisfy the stopping criterion is to bound the retrospective root and use bisection search until the stopping criterion is satisfied.

The values of x at which the sample-path approximation ȳ_{m_i}(·; ω_i) is evaluated by the numerical root-finding algorithm, as well as the number of such x values, are random, depending upon the particular sample-path equation and (less directly) any information from earlier retrospective iterations. For example, in our implementation the numerical root-finding algorithm uses an initial x computed from earlier sample-path equations. In fact, the computational efficiency of RA arises because small sample sizes in the early retrospective iterations are used to find the region of the root before more expensive and precise computations are performed with larger sample sizes in the later retrospective iterations.

Two strategies seem natural for seeding the retrospective iterations. In Dependent RA (DRA) all retrospective iterations use the same random-number stream. Therefore, the retrospective iterations are dependent because the m_{i−1} observations ω_{i−1} in retrospective iteration i − 1 are the first m_{i−1} observations of the m_i observations ω_i in retrospective iteration i. In Independent RA (IRA) all retrospective iterations are independently seeded. Therefore, the observations from iteration to iteration are independent, with retrospective iteration i containing m_i new observations ω_i independent of ω_1, ..., ω_{i−1}.

After i retrospective iterations, the root estimator x̄(ω_1, ..., ω_i) differs between DRA and IRA. In DRA,

\bar{x}(\omega_1, \ldots, \omega_i) = x(\omega_1, \ldots, \omega_i),   (4)

the ith retrospective solution. In IRA, the root estimator is the weighted average of the solutions x(ω_1), ..., x(ω_1, ..., ω_i):

\bar{x}(\omega_1, \ldots, \omega_i) = \sum_{j=1}^{i} m_j\, x(\omega_1, \ldots, \omega_j) \Big/ \sum_{j=1}^{i} m_j.   (5)

Section 5.1 empirically compares the efficiency of DRA and IRA algorithms.

Specifically, RA algorithms work as follows.

RA Algorithms:

Components:

1. An initial sample size m_1 and a rule for successively increasing m_i for i ≥ 2.

2. A rule for computing an error-tolerance sequence {ε_i} that goes to zero w.p.1.

3. A numerical root-finding method for solving sample-path Equation (3) to within a specified error ε_i for each i.

Find: the root x*.

Step 0. Initialize the retrospective iteration number i = 1. Set m_1 and ε_1.

Step 1. Generate ω_i. For IRA, ω_i is m_i new observations, generated independently of ω_{i−1}. For DRA, ω_i is obtained by generating m_i − m_{i−1} new independent observations and appending them to ω_{i−1}.

Step 2. Use the numerical method to solve the deterministic sample-path Equation (3) to obtain a retrospective solution x(ω_1, ..., ω_i) that satisfies |x(ω_1, ..., ω_i) − x*(ω_i)| < ε_i.

Step 3. Compute the root estimator x̄(ω_1, ..., ω_i) using Equation (4) for DRA or (5) for IRA.

Step 4. Compute m_{i+1} and ε_{i+1}. Set i ← i + 1 and go to Step 1.

In addition to choosing between DRA and IRA, a specific RA algorithm is obtained by choosing the three components: sample-size rule, error-tolerance sequence, and numerical root-finding method. We discuss these choices, as well as stopping rules (for the entire RA algorithm), in Section 4.
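As a concrete illustration of the steps above, here is a minimal Python sketch of IRA using the component choices discussed in Section 4: sample sizes that double, error tolerances that shrink by a factor of √c_1, and bisection as the numerical root-finding method. The function and parameter names are ours, the initial interval is assumed to bracket the root, and stopping rules and variance estimation are omitted.

```python
import numpy as np

def ira(gen_omega, y_bar, gamma, iterations=8, m1=10, c1=2,
        x_lo=-10.0, x_hi=10.0, eps1=1e-2, rng=None):
    """Sketch of Independent RA: gen_omega(m, rng) returns m independent
    observations omega_i; y_bar(x, omega) evaluates the sample-path
    approximation.  Each iteration solves y_bar(x; omega_i) = gamma by
    bisection to within eps_i and the weighted average (5) is returned."""
    rng = rng if rng is not None else np.random.default_rng()
    m, eps = m1, eps1
    sizes, solutions = [], []
    for _ in range(iterations):
        omega = gen_omega(m, rng)               # Step 1: m_i new, independent observations
        lo, hi = x_lo, x_hi                     # Step 2: bisection, assuming y_bar is
        while hi - lo > eps:                    #         nondecreasing and the root is bracketed
            mid = 0.5 * (lo + hi)
            if y_bar(mid, omega) < gamma:
                lo = mid
            else:
                hi = mid
        sizes.append(m)
        solutions.append(0.5 * (lo + hi))
        m *= c1                                 # Step 4: m_{i+1} = c1 * m_i
        eps /= np.sqrt(c1)                      #         eps_{i+1} = eps_i / sqrt(c1)
    w = np.asarray(sizes, dtype=float)
    return float(np.dot(w, solutions) / w.sum())    # Step 3: weighted average, Equation (5)

# Example use on the Student's t analogue of Section 1 (exponential F_V, n = 10):
n, mu, alpha = 10, 1.0, 0.05

def gen(m, rng):
    v = rng.exponential(size=(m, n))            # m independent samples of size n from F_V
    return np.sqrt(n) * (v.mean(axis=1) - mu) / v.std(axis=1, ddof=1)   # T_1, ..., T_m

def ybar(x, t):
    return np.mean(t <= x)                      # y_bar_m(x; omega) with omega = (T_1, ..., T_m)

print(ira(gen, ybar, gamma=1 - alpha))
```

For DRA the only change would be in Step 1, appending m_i − m_{i−1} new observations to the stored ω_{i−1}, and in Step 3, returning the last solution instead of the weighted average.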

2.3. RA rationale

Here we briefly discuss the rationale of the RA algorithm structure. We have three criteria for algorithm development: practical computational efficiency, guaranteed convergence to the root x*, and a standard-error estimator for the root estimator. The first two underlie the discussion here, and the last we discuss in the next subsection.

RA algorithms use, by definition, multiple iterations with increasing sample sizes and decreasing error tolerances. The key idea for practical computational efficiency is that previous retrospective iterations, based upon small sample sizes, can provide information for efficient numerical solution in future retrospective iterations. Computing in early iterations is inexpensive because sample sizes are small; computing in later iterations is inexpensive because the approximate location of the retrospective root is known. Also for efficiency, the error tolerances need to be appropriately sized: there is no need to chase randomness. In earlier retrospective iterations the


retrospective root has larger sampling error (due to smaller sample sizes), so the error tolerances should be larger; later the retrospective root has less sampling error, so the error tolerances should be smaller. The second criterion, guaranteeing convergence, is obtained by approaching two limits simultaneously: the sample-path approximation ȳ_m approaches g, because of the increasing sample size m_i, and the retrospective solutions approach the retrospective roots, because the error tolerances converge to zero.

Two other sample-size rules, both commonly used, have disadvantages compared to increasing sample sizes. The first rule, using a single iteration (called a static algorithm in Shapiro (1996)), is typical but often inefficient. Here only m_1 needs to be specified, and a solution x(ω_1) of Equation (3) with i = 1 is returned. The sample size m_1 is necessarily large for Equation (3) with i = 1 to approximate g well, which causes the root-finding method to be inefficient because the lack of information from previous iterations causes the method to examine many points x, each requiring the computation of ȳ_{m_1}(x; ω_1) based on a large sample size of m_1. We discuss this first sample-size rule in Section 3.1. A second sample-size rule, also not allowed within RA, is to use k iterations, each of the same sample size: m_1 = m_2 = ... = m_k. The multiple iterations allow subsequent iterations to begin with a good guess of the retrospective root x*(ω_i), saving computational effort. The disadvantage is lack of convergence as k goes to infinity because of the fixed finite sample size m_1.

Any bias in the retrospective root for the true root (or also in the retrospective solution for the retrospective root) does not go to zero. As with using k independent replications in any simulation experiment, reasonable practical performance can sometimes be obtained with such an approach (Healy, 1992, p. 9). Although bias cannot be estimated, estimating the standard error of the root estimator is straightforward. The tradeoffs between these two sample-size rules are analogous to those between steady-state simulation using one long run versus k shorter runs: less bias versus easy standard-error estimation. RA is designed to obtain both advantages.

By definition of RA, the new observations in Step 1 (m_i in IRA and m_i − m_{i−1} in DRA) that are used to obtain ȳ_{m_i}(x; ω_i) must be independent of previously generated observations so that the variance estimators can work. The new observations do not, however, need to be independent of each other. Any variance reduction via dependence-induction methods must be applied within the set of new observations, thereby hiding the dependence within ȳ_{m_i}(x). For example, antithetic pairs can be applied directly for either IRA or DRA. Stratified sampling, however, is straightforward for IRA but not for DRA. Application to steady-state simulation is straightforward if ω is chosen to be the random numbers, which are easily made independent; the dependence between the retrospective roots will become negligible as the sample size grows.


2.4. RA variance estimation

The third algorithm criterion is the ability to estimate the standard error of the algorithm's current root estimator x̄(ω_1, ..., ω_i). We derive variance estimators for both IRA and DRA here. As discussed in Section 4.1, these estimators are helpful for determining error tolerances, for use within the numerical root-finding algorithm, for stopping the overall algorithm, and for reporting the precision of the final root estimator. These variance estimators are based on assumptions that are consistent with our intuition and computational results in Section 5, but certainly do not hold exactly in practice. For example, the computational results show that bias in the variance estimator is large in early iterations before becoming negligible.

First consider IRA. For now assume that IRA retrospective solutions are unbiased uncorrelated estimators of the root x*, with variances inversely proportional to sample size; that is, m_j Var(x(ω_1, ..., ω_j)) = v² for each j. RA algorithms contain no systematic reason for estimates to be high or low, so biases should be small. That the variance is inversely proportional to sample size is natural, at least asymptotically (see also Lemma 2 in Section 3.1). The independent retrospective roots {x*(ω_j)} of IRA yield retrospective solutions {x(ω_1, ..., ω_j)} that are nearly uncorrelated; some correlation might arise from the use of information from previous retrospective iterations, such as an initial search point. This correlation should be minor because RA forces the jth retrospective solution to be close to (within ε_j of) the jth retrospective root. Consider the covariance between two successive retrospective solutions. Let b(ω_1, ..., ω_j) denote the error of the jth retrospective solution, i.e.,

x(\omega_1, \ldots, \omega_j) = x^*(\omega_j) + b(\omega_1, \ldots, \omega_j).

Then Cov[x(ω_1, ..., ω_j), x(ω_1, ..., ω_{j+1})] = Cov[x*(ω_j) + b(ω_1, ..., ω_j), x*(ω_{j+1}) + b(ω_1, ..., ω_{j+1})] = Cov[x*(ω_j), b(ω_1, ..., ω_{j+1})] + Cov[b(ω_1, ..., ω_j), b(ω_1, ..., ω_{j+1})], because ω_1, ω_2, ... are generated independently in IRA. The first covariance is small because: (i) x*(ω_j) and b(ω_1, ..., ω_{j+1}) are correlated only through ω_j, not the other ω's; and (ii) the value of b(ω_1, ..., ω_{j+1}) depends more on x(ω_1, ..., ω_j), for example as an initial point, but little on x*(ω_j). The second covariance is small because: (i) the signs of the two errors b(ω_1, ..., ω_j) and b(ω_1, ..., ω_{j+1}) are nearly independent; and (ii) the magnitude of the two errors depends mainly on their error tolerances ε_j and ε_{j+1}; hence if the error tolerances {ε_l : l = 1, 2, ...} are substantially smaller than the standard deviation of the current retrospective root, then the correlation between successive errors should be small. For these reasons assuming zero correlation seems reasonable.

These IRA assumptions yield two useful results. First, after i IRA retrospective iterations, the minimal-variance unbiased root estimator is x̄(ω_1, ..., ω_i), the weighted


average of the i retrospective solutions, as given in Equation (5). Second, these assumptions provide an estimator of the variance of the root estimator. The variance is Var(x̄(ω_1, ..., ω_i)) = v²/Σ_{j=1}^{i} m_j. Therefore, after i retrospective iterations, an unbiased sum-of-squared-differences estimator of the variance of the IRA root estimator is, for i > 1,

\widehat{\mathrm{Var}}(\bar{x}(\omega_1, \ldots, \omega_i)) = \hat{v}^2 \Big/ \sum_{j=1}^{i} m_j
 = \sum_{j=1}^{i} m_j \left[x(\omega_1, \ldots, \omega_j) - \bar{x}(\omega_1, \ldots, \omega_i)\right]^2 \Big/ \left[(i-1) \sum_{j=1}^{i} m_j\right]
 = \left[\sum_{j=1}^{i} m_j\, x^2(\omega_1, \ldots, \omega_j) - \Big(\sum_{j=1}^{i} m_j\Big)\bar{x}^2(\omega_1, \ldots, \omega_i)\right] \Big/ \left[(i-1) \sum_{j=1}^{i} m_j\right].   (6)

Now consider DRA. We make assumptions and arguments analogous to those of IRA. Assume that the DRA retrospective solutions are unbiased estimators of the root with variance inversely proportional to the sample size and that covariances are inversely proportional to the larger sample size; that is, Cov[x(ω_1, ..., ω_j), x(ω_1, ..., ω_k)] = v²/max{m_j, m_k} for each j and k. The arguments for unbiasedness and variances are identical to those for IRA. The argument for covariance is based on the assumption that when j < k, the kth retrospective solution x(ω_1, ..., ω_k) = m_j m_k^{−1} x(ω_1, ..., ω_j) + (m_k − m_j) m_k^{−1} x(ω̃), the weighted average of the jth retrospective solution x(ω_1, ..., ω_j) and the retrospective solution x(ω̃) of the sample-path equation ȳ_m̃(x; ω̃) = γ, where m̃ = m_k − m_j and ω̃ = ω_k \ ω_j = {ω_{m_j+1}, ..., ω_{m_k}}, the set of (m_k − m_j) new observations independent of ω_j. This covariance assumption is true if the retrospective solution equals the retrospective root at every iteration, g is linear, and y(x; ω) = g(x) + z(ω), where the random noise z(ω) is functionally independent of x for every ω. If this assumption is not true, then observations are becoming either more or less valuable. Under the assumption, Cov[x(ω_1, ..., ω_j), x(ω_1, ..., ω_k)] = Cov[x(ω_1, ..., ω_j), m_j m_k^{−1} x(ω_1, ..., ω_j) + (m_k − m_j) m_k^{−1} x(ω̃)] = m_j m_k^{−1} Var[x(ω_1, ..., ω_j)] = v²/m_k. Hence the form of the covariance is obtained.

These DRA assumptions suggest an estimator of its variance. The variance of the DRA root estimator can be estimated using

\widehat{\mathrm{Var}}(\bar{x}(\omega_1, \ldots, \omega_i)) = \frac{\sum_{j=1}^{i-1} \alpha_{ij} \left[x(\omega_1, \ldots, \omega_j) - \bar{x}(\omega_1, \ldots, \omega_i)\right]^2}{i-1}, \qquad i > 1,   (7)

where the coefficients α_ij = m_j/(m_i − m_j) are chosen to make each term of the sum an unbiased estimator of the variance. Because α_ij functionally depends on m_i, the estimator cannot be computed using cumulative sums as in Equation (6), so the values of the i retrospective solutions must be stored. For any fixed i, however, m_i is fixed; then expanding the squared differences allows computation using only the two cumulative sums for α_ij x(ω_1, ..., ω_j) and α_ij x²(ω_1, ..., ω_j) over j.
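A sketch of the IRA estimator in Equation (6), computed from the stored sample sizes and retrospective solutions; as noted above, only two cumulative quantities are needed. The function name and the illustrative numbers are ours.

```python
import numpy as np

def ira_variance_estimate(sizes, solutions):
    """Equation (6): the sample-size-weighted sum of squared differences
    between the retrospective solutions and their weighted average,
    divided by (i - 1) * sum(m_j).  Only the cumulative sums of
    m_j * x_j and m_j * x_j**2 are required."""
    m = np.asarray(sizes, dtype=float)
    x = np.asarray(solutions, dtype=float)
    i = len(x)
    if i < 2:
        raise ValueError("Equation (6) requires at least two retrospective iterations")
    total_m = m.sum()
    xbar = np.dot(m, x) / total_m                                  # root estimator, Equation (5)
    var_hat = (np.dot(m, x**2) - total_m * xbar**2) / ((i - 1) * total_m)
    var_hat = max(var_hat, 0.0)                                    # guard against rounding error
    return xbar, var_hat, np.sqrt(var_hat)                         # estimate, variance, std. error

# e.g., after four iterations with m_j = 10, 20, 40, 80:
print(ira_variance_estimate([10, 20, 40, 80], [1.90, 1.81, 1.84, 1.83]))
```

The DRA estimator (7) can be computed analogously from the stored solutions, with the weights α_ij = m_j/(m_i − m_j) recomputed at each iteration i.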

2.5. Retrospective approaches for stochastic optimization

The genesis of RA algorithms lies in retrospective optimization. The optimization problem is to find the optimal point of an objective function using only an estimator of the function at any feasible point x (Fu, 1994). A retrospective approach optimizes sample-path problems that approximate the optimization problem of interest. Rubinstein and Shapiro (1993) use the phrase stochastic counterpart and Gürkan et al. (1994), for example, use the phrase sample-path for this approach. We adopt Healy and Schruben's (1991) phrase retrospective to capture the idea of solving a random problem that happened in the past. Robinson (1996) contains a good summary of such optimization algorithms.

The design of the family of RA algorithms has five fundamental features. We discuss these features and how they distinguish RA from retrospective optimization algorithms as follows:

• We use a sequence of sample-path approximations with increasing sample sizes and decreasing error tolerances. This structure is central to our simultaneous goals of proving convergence and achieving good practical performance (as discussed in Section 2.3). Shapiro (1996) describes this approach as dynamic to distinguish it from the more-typical static algorithms, which solve a single sample-path problem. Shapiro and Homem-de-Mello (1997) and Homem-de-Mello et al. (1999) recently have used a structure similar to ours (also discussed earlier in Chen and Schmeiser (1994b) and Chen (1994)), but with random sample sizes and a stopping rule based on a statistical test of hypothesis. Shapiro and Wardi


(1996) consider gradient descent algorithms, which are both dynamic and Markovian. The bundle-type method (Plambeck et al., 1996) is dynamic, but the error tolerance does not decrease.

• Averaging solutions from a sequence of independent problems, which we argue is more efficient. That is, IRA is more efficient than DRA because DRA reprocesses old data whereas IRA is always working on new data (as discussed further in Section 5.1). Healy and Schruben (1991) average solutions of independent sample-path approximations, but each has the same sample size.

• A black-box approach, in which the problem structure is not used. Our work is in the spirit of methods such as bisection search or regula falsi rather than much of the recent literature that exploits problem structure. Stochastic approximation (Andradottir, 1992; L'Ecuyer et al., 1994; L'Ecuyer and Glynn, 1994) and bundle-type methods (Plambeck et al., 1996) fall into this category. Healy (1992) and Healy and Xu (1994, 1995), on the other hand, develop problem-specific algorithms.

• Robustness. We strive for good practical performance without algorithmic tuning. RA logic is non-Markovian, in that information from previous sample-path approximations is used in the solution of the current approximation. The bundle-type algorithm used by Plambeck et al. (1996) is also non-Markovian. In contrast, the stochastic approximation algorithm is typically Markovian.

• Convergence proof. We prove convergence under fairly broad conditions. Shapiro (1996) proves convergence for various static algorithms. Shapiro and Wardi (1996) prove convergence of dynamic Markovian algorithms. Robinson (1996) proves convergence of the single-iteration bundle-type algorithm.

Substantial earlier work considered static algorithms. Rubinstein and Shapiro (1993) solve stochastic optimization problems by solving SRFPs, finding the zero of the associated gradient functions. They estimate the optimal point by finding the zero of a sample-path gradient function using the score-function gradient estimates from a single long-run simulation experiment. They assume that a finite-time algorithm is available for solving the sample-path equation exactly; that is, in our notation they assume that x(ω_1) = x*(ω_1). Healy and Schruben (1991) solve stochastic optimization problems by analyzing each problem's structure to find the exact optimal point of a sample-path (or retrospective) objective function. A specific algorithm therefore differs from problem to problem (Healy and Schruben, 1991; Fu and Healy, 1992). Huber (1964) proposed M-estimators, which are obtained by optimizing an error function based on sample data; examples include maximum-likelihood and least-square estimators. In many applications, such optimization is obtained by solving for the zero of the gradient function. Following Serfling (1980, p. 243) we view M-estimators as the solution of a sample-path equation, such as Equation (1). The retrospective roots x*(ω) satisfying Equation (1) and x*(ω_i) satisfying Equation (3) are therefore M-estimators.

3. Convergence of RA algorithms

We show here that under weak conditions RA algorithms converge w.p.1 to the true root x* that satisfies g(x*) = γ. As discussed in Section 2.2, RA algorithms estimate x* by solving, to within an error bound, a sequence of sample-path equations based on increasing sample sizes. To show the convergence of RA algorithms, we consider two cases: (i) a single RA iteration using a long-sample-path equation; and (ii) a complete RA algorithm using a sequence of equations of the form (3) for i = 1, 2, .... The first case, discussed in Section 3.1, could be used to solve problems with a single iteration with a fixed value of m_1 and ε_1 (as with the static methods of Section 2.5), but our intent here is to use the single-iteration results of Case (i) to prove in Section 3.2 that RA algorithms converge to x* w.p.1 as the iteration index i goes to infinity. The advantages of using multiple retrospective iterations rather than a single iteration are discussed in Section 2.3. In Section 4 we discuss the choice of RA algorithm components.

3.1. Case (i): a single RA iteration

Consider using an RA algorithm but stopping after the first retrospective iteration, using only the first sample-path equation, Equation (3) with i = 1, and returning x(ω_1) as the estimator of the root x*. (IRA and DRA are identical in this case.) We consider here the asymptotic behavior of x(ω_1) in the limit as the first-iteration sample size m_1 goes to infinity and ε_1 goes to zero. Although in practice m_1 and ε_1 are fixed, the asymptotic analysis here is interesting in the same sense that it is interesting to show that the sample mean converges to the population mean as the sample size goes to infinity even though in practice the sample size does not grow.

To simplify notation and provide emphasis on the sample size, we denote the sample size m_1 by m, the error ε_1 (dependent on the sample size) by ε(m), the sample path ω_1 by ω, the retrospective root x*(ω_1) by X*(m), and the retrospective solution x(ω_1) by X(m). Equation (3) with i = 1 is then

\bar{y}_m(X^*(m); \omega) = \gamma,   (8)

where ȳ_m(x; ω) is a consistent estimate of g(x) for all real x (i.e., for any x, w.p.1 ȳ_m(x; ω) converges to g(x) as m → ∞).


The retrospective root X*(m) satisfying Equation (8) is an M-estimator (Section 2.5). Our purpose here is to show that X(m), the approximation to X*(m), converges to the true root x* as the sample size m goes to infinity.

We show convergence in two parts: (i) X*(m) converges to x* w.p.1; and then (ii) X(m) converges to x* w.p.1. Since X*(m) is an M-estimator, the first part follows from the results for M-estimators as stated in Lemma 1, based on the assumption that ȳ_m(x; ω) is an unbiased estimate that averages m monotonic functions. We further develop Lemmas 3 and 4 to extend the result to more general functions g and sample-path approximations ȳ_m(·; ω). The second part, the consistency of X(m), is shown in Lemma 5.

Since Equation (8) may have no root or multiple roots, we define the concepts of a crossing root and a crossing set, which are then used in Lemmas 3, 4, and 5 and Theorems 1 and 2 in Section 3.2. Every root is a crossing root; a crossing root might additionally be a discontinuity point where ȳ_m(·; ω) crosses γ. To be more specific, we define the crossing root and crossing set as follows. Let sets

R^N = \{x : \bar{y}_m(x; \omega) - \gamma < 0,\ x \in \Re\},
R^Z = \{x : \bar{y}_m(x; \omega) - \gamma = 0,\ x \in \Re\}, \text{ and}
R^P = \{x : \bar{y}_m(x; \omega) - \gamma > 0,\ x \in \Re\}.   (9)

Assume that the function ȳ_m(·; ω) is defined over the whole real-number line ℜ. Then the three mutually exclusive sets R^N, R^Z, and R^P partition ℜ, i.e., ℜ = R^N ∪ R^Z ∪ R^P. Further define

R^{NP} = \{x : x \in R^N \text{ and } x \text{ is on the boundary of } R^P \text{ or } R^Z\} \cup \{x : x \in R^P \text{ and } x \text{ is on the boundary of } R^N \text{ or } R^Z\}.

Then R^Z contains all (true) roots of Equation (8) and R^{NP} contains all discontinuity points where ȳ_m(·; ω) crosses γ but does not intersect the level γ. The crossing set C_m(ω) of Equation (8) is therefore defined as

C_m(\omega) = R^Z \cup R^{NP}.   (10)

Every element of C_m(ω) is a "crossing root". The set C_m(ω) is empty if the function ȳ_m(·; ω) lies entirely below or entirely above γ.

Lemmas 1 and 2 below use the concept of an isolated root, which is a discrete point in C_m(ω). Of (true) roots, crossing roots, and isolated roots, the most general are crossing roots; all (true) roots and all isolated roots are also crossing roots. A crossing root, however, might be a (true) root, an isolated root, both, or neither.

We illustrate crossing roots, isolated roots, and (true) roots for four different functions ȳ_m(·; ω) in Fig. 1. The function ȳ_m(·; ω) in Fig. 1(a) crosses the value γ at three distinct points x_1, x_2, and x_3; hence, any of these points is a crossing root and C_m(ω) = {x_1, x_2, x_3}. All three of these points are also isolated roots. The only root is x_2; the points x_1 and x_3 are not roots because ȳ_m(x_1; ω) ≠ γ and ȳ_m(x_3; ω) ≠ γ. The function ȳ_m(·; ω) in Fig. 1(b) crosses the value γ only once with an intersecting interval (x_4, x_5] on the x-axis. Hence, all points in the interval (x_4, x_5] are roots, all points in [x_4, x_5] are crossing roots but none is an isolated root, and C_m(ω) = [x_4, x_5]. The function ȳ_m(·; ω) in Fig. 1(c) lies below the level γ except at one point x_6, where ȳ_m(·; ω) crosses the value γ and then immediately drops down to below γ. Therefore, C_m(ω) = {x_6}; x_6 is also an isolated (crossing) root but not a (true) root. The function ȳ_m(·; ω) in Fig. 1(d) has no intersection with the level γ. Hence, there is no root and no crossing root; C_m(ω) is the empty set.

Equation (8) may have zero, one, or multiple crossing roots. Zero crossing roots occur when the sample size m is small, allowing ȳ_m(x; ω) to lie entirely below or above γ for all real x. Multiple crossing roots occur, for example, when ȳ_m is a step function and one of the steps has height γ. Step functions occur in the two examples of Sections 1, 2 and 5, where y(x) is an indicator function. Indicator functions occur, for example, when y(x) indicates whether an event occurs or not.

We redefine the retrospective root X*(m) of Equation (8) to be

X^*(m) = \begin{cases} \text{any crossing root}, & \text{if } C_m(\omega) \text{ is not empty}, \\ 0, & \text{otherwise}. \end{cases}   (11)

The retrospective root X*(m) can be selected arbitrarily from C_m(ω) using any rule that produces a solution sequence {X*(m)}. This selection rule is usually implicit in the solution method. The choice of X*(m) = 0 when C_m(ω) is empty is arbitrary because asymptotically C_m(ω) is empty with probability zero. A better practical value might be the current estimate of x*.

Asymptotically the retrospective root X*(m) is an M-estimator. Lemmas 1 and 2 show the consistency and asymptotic normality of M-estimators, and hence can be used for retrospective roots. Proofs can be found in Huber (1964) and Serfling (1980, p. 251), respectively.

Lemma 1. Let x* be an isolated root of g(x) = γ. Suppose that ȳ_m(x; ω) = Σ_{j=1}^{m} y(x; ω_j)/m and each y(x; ω_j) yields an unbiased estimator of g(x) for every x. Further, suppose that, for every ω, the function y(x; ω) is monotone in x. Then x* is unique, and any solution sequence {X*(m)} satisfying Equation (8) converges to x* w.p.1 as m → ∞. Further, if with probability one there is a neighborhood of x* in which y(x; ω) is a continuous function of x, then there exists such a solution sequence.

Lemma 2 (Serfling, 1980, p. 251) states conditions under which √m(X*(m) − x*) is asymptotically normally distributed.


[Fig. 1 appears here: four panels (a)-(d), each plotting ȳ_m(x; ω) against the level γ, with crossing points x_1, x_2, x_3 in (a), the interval from x_4 to x_5 in (b), the single point x_6 in (c), and no crossing in (d).]

Fig. 1. Four functions illustrating different types of roots.

Lemma 2. Let x* be an isolated root of g(x) = γ. Suppose the unbiasedness and monotonicity of y(x; ω) as in Lemma 1. Further, suppose that g(x) is differentiable at x* with g'(x*) ≠ 0 and that E[y²(x; ω)] is finite for x in a neighborhood of x* and is continuous at x*. Then, as m → ∞ any scaled solution sequence {√m(X*(m) − x*)} of Equation (8) has an asymptotic normal distribution with mean zero and variance Var[y(x*; ω)]/[g'(x*)]².

Lemmas 3 and 4 show that a solution sequence {X*(m)} converges to x* w.p.1 under certain conditions on the function g and the sample-path approximation ȳ_m. Both lemmas relax the assumption of unbiasedness and monotonicity of ȳ_m(x; ω), with Lemma 3 requiring only uniform convergence of ȳ_m(x; ω) to g(x) w.p.1 as m → ∞, and with Lemma 4 yet more relaxed with only point-wise convergence of ȳ_m(x; ω) to g(x) w.p.1 as m → ∞ but requiring ȳ_m(·; ω) to cross γ no more than once. (That consistency is a relaxation of unbiasedness follows from the strong law of large numbers, Billingsley (1979, p. 70). Dropping the unbiasedness assumption allows, for example, the initial transient of steady-state simulation.) In addition, both lemmas allow the sample-path approximation ȳ_m(·; ω) to have multiple crossing roots for finite values of m. Lemma 3 considers a monotonic function g. We list the proof in the Appendix.

Lemma 3. Assume that

1. the function g : ℜ → ℜ is a non-decreasing function with unique root x* satisfying g(x*) = γ, and
2. the sample-path approximation ȳ_m(x; ω) converges to g(x) uniformly in x w.p.1 as m → ∞.

Then, for any selection rule defining the solution sequence {X*(m)} from {C_m(ω)},

\lim_{m \to \infty} X^*(m) = x^* \quad \text{w.p.1}.

Lemma 4 considers a function g that has a unique root but that is not necessarily monotonic. It assumes that ȳ_m(·; ω) is point-wise convergent to g and that its crossing set, if not empty, is a unique interval. That is, for every value of m the crossing set C_m(ω) of Equation (8), if not empty, is an interval w.p.1, i.e.,

\Pr\{\omega : C_m(\omega) = [x^L(\omega), x^U(\omega)]\} = 1,

where x^L(ω) = sup{x : ȳ_m(x; ω) < γ} and x^U(ω) = inf{x : ȳ_m(x; ω) > γ}. If x^L(ω) > x^U(ω), then C_m(ω) is empty. For example, x^L(ω) = x_4 and x^U(ω) = x_5 in Fig. 1(b). This condition restricts the function ȳ_m(·; ω) to cross the target value γ at most once, but allows multiple contiguous roots. In addition to Fig. 1(b), examples include the step functions that arise in the two examples of Sections 1, 2 and 5 and any function ȳ_m(x; ω) monotonic in x. Like Lemma 3, Lemma 4 does not require an unbiased ȳ_m(x; ω). The proof is listed in the Appendix.

Lemma 4. Assume that

1. the function g : ℜ → ℜ has a unique root x* and satisfies

g(x) \begin{cases} > \gamma & \text{if } x > x^*, \\ = \gamma & \text{if } x = x^*, \\ < \gamma & \text{if } x < x^*, \end{cases}

2. for every x ∈ ℜ, the estimate ȳ_m(x; ω) converges to g(x) w.p.1 as m → ∞, and
3. for every positive integer m, w.p.1 the crossing set C_m(ω) of the function ȳ_m(x; ω) is either empty or an interval.

Then, for any selection rule defining the solution sequence {X*(m)} from {C_m(ω)},

\lim_{m \to \infty} X^*(m) = x^* \quad \text{w.p.1}.

Lemmas 3 and 4 are stated for functions g that are increasing in the neighborhood of the root x*. Trivial changes, e.g., redefining g as −g, allow them to be stated for functions g that are decreasing near x*.

Lemma 5 states that, despite allowing an error ε(m) in finding the root of the sample-path equation, a single iteration of an RA algorithm will converge as m goes to infinity.


Lemma 5. Assume that the conditions in Lemma 1, 3, or 4 hold. If X(m) is obtained by one RA iteration and if ε(m) converges to zero w.p.1 as m goes to infinity, then

\lim_{m \to \infty} X(m) = x^* \quad \text{w.p.1}.

Proof. By Lemma 1, 3, or 4, X*(m) converges to x* w.p.1. Therefore, w.p.1 there exists an N(ω) such that for every m > N(ω) the sample-path equation has a crossing root. When there is a crossing root, the root-finding method returns a solution X(m) that satisfies |X(m) − X*(m)| < ε(m). Because ε(m) converges to zero w.p.1 as m goes to infinity, the random absolute numerical error |X(m) − X*(m)| converges to 0 w.p.1. Then Lemma 1, 3, or 4 implies that X(m) converges to x* w.p.1. ∎

3.2. Case (ii): solving a sequence of sample-path equations

We now show that RA algorithms, defined in Section 2.2, converge w.p.1. Recall that RA algorithms solve a sequence of equations of the form (3), for i = 1, 2, ..., for the retrospective roots {x*(ω_i)}, using an increasing sample-size sequence {m_i}. At each iteration i, RA returns a solution x(ω_1, ..., ω_i), an approximation of x*(ω_i), within error tolerance ε_i. The dependently seeded RA, DRA, repeats the m_{i−1} observations ω_{i−1} in ω_i, while in IRA all retrospective roots are independent. After i iterations, the DRA root estimator of the root x* is x̄(ω_1, ..., ω_i) = x(ω_1, ..., ω_i), the last retrospective solution; the IRA root estimator is x̄(ω_1, ..., ω_i) = Σ_{j=1}^{i} m_j x(ω_1, ..., ω_j)/Σ_{j=1}^{i} m_j, a weighted average of the solutions x(ω_1), ..., x(ω_1, ..., ω_i), where each weight is proportional to the sample size.

Theorems 1 and 2 respectively show that DRA and IRA algorithms converge to the solution of an SRFP if its function g and associated estimator ȳ_m are well behaved.

Theorem 1. Let a specific DRA algorithm be used to find the unique root x* of the equation g(x) = γ using the estimator ȳ_m(x; ω) for g(x). If g and ȳ_m(·; ω) satisfy the conditions in Lemma 1, 3, or 4, then the DRA root estimator x̄(ω_1, ..., ω_i) converges to x* w.p.1 as i → ∞.

Proof. The proof proceeds sequentially in three parts: (i) lim_{i→∞} x*(ω_i) = x* w.p.1; (ii) lim_{i→∞} x(ω_1, ..., ω_i) = x* w.p.1; (iii) lim_{i→∞} x̄(ω_1, ..., ω_i) = x* w.p.1.

The sample-size sequence {m_i} is increasing, so "i → ∞" implies "m_i → ∞"; therefore, Lemma 1, 3, or 4 yields the first part. The second part follows because, by definition of RA algorithms, the sequence of error tolerances {ε_i} converging to zero w.p.1 implies convergence using Lemma 5. The third part is trivial because x̄(ω_1, ..., ω_i) = x(ω_1, ..., ω_i) by definition of DRA. ∎

Theorem 2 for IRA is analogous to Theorem 1 for DRA. The new condition, that Σ_{j=1}^{∞} E|x*(ω_j) − x*| is finite, seems likely to be satisfied in practice. For example, assume that there is a finite constant α such that E|x*(ω_j) − x*| < α/m_j for every j, as would be consistent with Lemma 2. Then Σ_{j=1}^{∞} E|x*(ω_j) − x*| = α Σ_{j=1}^{∞} m_j^{−1}, which is finite if, for example and as we recommend for other reasons, m_j = c_1 m_{j−1} for some c_1 > 1. As further evidence that the condition is weak, notice that the above argument holds even if the assumption is changed to use the squared difference rather than the absolute difference.

IRA convergence differs from DRA convergence in that each retrospective iteration is independent. Define ω̄ = (ω_1, ω_2, ...), the infinite sequence of observed random numbers corresponding to one realization from the sample space. Convergence w.p.1 in Theorem 2 is with respect to ω̄.

Theorem 2. Let a specific IRA algorithm be used to find the unique root x* of the equation g(x) = γ using the estimator ȳ_m(x; ω) for g(x). Assume that Σ_{j=1}^{∞} E|x*(ω_j) − x*| is finite. Then the IRA root estimator x̄(ω_1, ..., ω_i) converges to x* w.p.1 as i → ∞.

Proof. As with Theorem 1, the proof proceeds sequentially in three parts: (i) lim_{i→∞} x*(ω_i) = x* w.p.1; (ii) lim_{i→∞} x(ω_1, ..., ω_i) = x* w.p.1; and (iii) lim_{i→∞} x̄(ω_1, ..., ω_i) = x* w.p.1.

The first part shows that the retrospective roots converge to the root x* of g. The finiteness of Σ_{j=1}^{∞} E|x*(ω_j) − x*| is sufficient, using Result 1.3.5 of Serfling (1980).

The second part shows that the retrospective solutions converge to x*. By definition of RA algorithms, ε_i(ω̄) converges to zero w.p.1 and |x(ω_1, ..., ω_i) − x*(ω_i)| < ε_i(ω̄) for every i. Hence w.p.1 we have the following property: for every ε > 0 there exists an I(ε, ω̄) such that |x(ω_1, ..., ω_i) − x*(ω_i)| < ε_i(ω̄) < ε for every i > I(ε, ω̄). Then the first part implies the second part.

The third part shows convergence of IRA algorithms. Let b_i = Σ_{j=1}^{i} m_j; then lim_{i→∞} b_i = ∞. By definition

\bar{x}(\omega_1, \ldots, \omega_i) = \sum_{j=1}^{i} m_j\, x(\omega_1, \ldots, \omega_j) \Big/ \sum_{j=1}^{i} m_j = b_i^{-1} \sum_{j=1}^{i} m_j\, x(\omega_1, \ldots, \omega_j).

By the Toeplitz Lemma (Loeve, 1977, p. 250), the event lim_{i→∞} x(ω_1, ..., ω_i) = x* implies the event

\lim_{i \to \infty} \bar{x}(\omega_1, \ldots, \omega_i) = x^*.

Therefore, convergence of x(ω_1, ..., ω_i) to x* w.p.1 implies convergence of x̄(ω_1, ..., ω_i) to x* w.p.1. ∎


4. Implementation of RA algorithms

We discuss here two issues associated with implementing RA algorithms: the choice of a specific RA algorithm and the choice of independent random variables ω. Although all RA algorithms converge under weak conditions, as shown in the previous section, the computational effort to obtain roots to a specified precision depends upon the specific algorithm used. Similarly, computational effort depends upon the user's definition of the random variables ω; our definition of ω based on random numbers is a safe, but not necessarily efficient, default. We discuss these choices in the next two subsections, respectively. Despite sometimes giving quite specific implementation suggestions, our intention here is only to provide a sense of the issues.

4.1. Choice of a specific RA algorithm

Specifying an RA algorithm requires choosing three components: a rule for increasing the sample sizes {m_i}, a rule for decreasing the error bounds {ε_i}, and a method for solving the sample-path equations. We discuss each, as well as stopping rules for the algorithm as a whole.

The rule to determine the sample-size sequence {m_i} can take many forms. A reasonable family of sequences to consider is m_i being the integer part of a + c_1(m_{i−1})^b for a ≥ 0, b ≥ 1, and c_1 ≥ 1. If, as we argued earlier, Var(x*(ω_i))/Var(x*(ω_{i−1})) = m_{i−1}/m_i, setting a = 0 and b = 1 is required to make the variance ratio independent of the iteration number i. Therefore, m_i = c_1 m_{i−1}, with c_1 > 1, seems natural. Chen (1994) showed experimentally that computational performance is robust to choices of c_1 over the set {1.5, 2, 5, 10}. Smaller values of c_1 provide more times at which to stop the algorithm. To cleanly obtain only integer values of m_i, we typically use c_1 = 2. The remaining issue is the choice of the initial sample size m_1. Any small value, including m_1 = 1, is fine; some numerical root-finding methods might be able to use standard-error information about ȳ_{m_1}(x; ω_1), in which case a larger value of m_1 might be useful. But m_1 should be small because on the first iteration nothing is known about the location of the retrospective root x*(ω_1), causing the root-finding method to examine many points x, with each examination requiring m_1 observations for ȳ_{m_1}(x). The goal is that m_i is small when many points are examined in the early iterations, and that m_i grows large in later iterations when very few points need to be examined because the previous iterations provide a good initial guess x̄(ω_1, ..., ω_{i−1}) of x*(ω_i).

The rule to determine the decreasing error-tolerance sequence {ε_i} also can take many forms. If the sample sizes {m_i} increase by a factor of c_1, then our earlier assumptions about variance decreasing with sample size suggest the form ε_i = c_1^{−1/2} ε_{i−1}. The problem is then to specify an appropriate ε_1. Chosen too small, the root-finding method wastes computation finding a solution x(ω_1, ..., ω_i) close to x*(ω_i), which might not be close to x* because of the small sample size. Because we are assuming that no prior information is available about sampling error, a value of ε_1 scaled to the problem at hand is not known. In our implementation discussed below and used in Section 5.1, we have used a very large value of ε_1 (such as 10^50), which eliminates the need to select a problem-dependent value; the error-tolerance logic is then reduced to being a device to prove convergence.

We allow random sequences of error tolerances to facilitate future development of RA algorithms whose ε_i at each retrospective iteration is based on an estimate of the standard error of x*(ω_i). Such algorithms would use a random sequence {ε_i}, each a factor θ, say, of the standard-error estimate of x*(ω_i). This standard error could be estimated based on Equation (7) for DRA and Equation (6) for IRA, or the asymptotic formula in Lemma 2. Choosing θ ∈ [0.1, 1] seems reasonable; less than one-tenth of a standard error is certainly too much precision and more than one standard error introduces a substantial new source of error.

The deterministic root-finding method for solving Equations (3) for i = 1, 2, ... can be either analytical or numerical. Analytical approaches require a known and simpler structure of Equation (3); Healy (1992) investigates optimization problems for which an analytical solution can be obtained. Our statement of the SRFP includes no structural information; in this black-box context numerical methods must be used. The advantage is that no analyst effort is required to estimate a root; the disadvantage is that the computational effort is greater than if Equation (3) could be solved analytically. Various numerical search methods are easily implemented, are reasonably efficient, and provide error bounds. Well-known examples include bisection and modified regula falsi (also called the modified false-position method), as discussed, for example, in Conte and de Boor (1980). Such methods iterate from a starting point, which for RA algorithms would be x̄(ω_1, ..., ω_{i−1}). They also require initially bounding the solution x*(ω_i), which can be done by searching points in increments or decrements of δ_i. If x*(ω_i) were known, the optimal δ_i would be |x̄(ω_1, ..., ω_{i−1}) − x*(ω_i)|. Because |x̄(ω_1, ..., ω_{i−1}) − x*(ω_i)| and [Var(x̄(ω_1, ..., ω_{i−1}) − x*(ω_i))]^{1/2} are proportional, we could choose δ_i = c_2[Var(x̄(ω_1, ..., ω_{i−1}) − x*(ω_i))]^{1/2}, where c_2 > 0, for i > 1. Chen (1994) shows that RA is robust with respect to the choices of c_2 and δ_1; the empirical results favor a small δ_1. Larger values of c_2 accelerate the bound search but result in a bigger bounding interval. Setting c_2 = 1 seems reasonable. Using the assumptions for variances and the additional assumption x*(ω_i) = x̄(ω_1, ..., ω_i), then Var(x̄(ω_1, ..., ω_{i−1}) − x*(ω_i)) equals v²(m_{i−1}^{−1} − m_i^{−1}) for DRA and


v²((Σ_{j=1}^{i−1} m_j)^{−1} + m_i^{−1}) for IRA, and hence can be estimated based on Equation (7) or (6). The chosen root-finding method must be implemented to handle multiple roots of ȳ_{m_i}(·; ω_i) and to detect the difficult situation of no root in the range of the computer arithmetic.
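A sketch of this bound-then-bisect step for a sample-path approximation that is nondecreasing in x, following the description in Sections 2.2 and 4.1. The function and parameter names are ours; delta and eps would be set from the δ_i and ε_i rules above, and the starting point would be the previous root estimate.

```python
def solve_sample_path(y_bar, omega, gamma, x_start, delta, eps, max_steps=60):
    """Bound the crossing root of y_bar(. ; omega) = gamma by stepping in
    increments of delta from x_start, then bisect the bracket to width eps.
    Assumes y_bar is nondecreasing in x; raises if no bracket is found."""
    lo = hi = x_start
    for _ in range(max_steps):                  # bounding search
        if y_bar(lo, omega) >= gamma:
            lo -= delta
        elif y_bar(hi, omega) < gamma:
            hi += delta
        else:
            break                               # y_bar(lo) < gamma <= y_bar(hi)
    else:
        raise RuntimeError("no crossing root found within the search range")
    while hi - lo > eps:                        # bisection within the bracket
        mid = 0.5 * (lo + hi)
        if y_bar(mid, omega) < gamma:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```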

The RA algorithms defined in Section 2.2 do not have stopping rules, which are irrelevant to our proofs of convergence (unlike the ε_i stopping rules for the numerical search during each retrospective iteration i). Nevertheless, all but interactive implementations must automate stopping. Most natural stopping rules center on a standard-error estimate for the current root estimator, such as Equation (6) for IRA and Equation (7) for DRA. When solving problems iteratively with c_1 = 2, each iteration takes about twice as long as the previous iteration, so by retrospective iteration eight (or so) the results appear on the screen slowly enough to read in real time. Deciding when to stop is then easy. To obtain an automated stopping rule, after each retrospective iteration a standard-error estimate can be computed and the algorithm stopped when the estimate is less than a user-specified precision. Because the standard-error estimate can be misleading when the number of retrospective iterations is small, we might require at least four (or so) iterations to avoid premature stopping.

4.2. Choice of ω

In Section 2 we define ω as pseudorandom numbers. This is a natural and safe choice in simulation. Depending on the model, however, ω can be chosen differently for greater computational efficiency, while not affecting the solution returned by the RA algorithm. Alternative choices of ω must be functionally independent of x and ideally include more of the computations needed for generating the observations used in ȳ_m. Whatever the choice, the sample {ω_1, ..., ω_m} needs to be generated only once for ȳ_m(x; ω̄) to be computed at any number of points x during the process of solving the sample-path equation.

Consider again the example from Section 1, the analogy of Student's t distribution. There are five possible choices for each component ω in ω̄ = (ω_1, ..., ω_m). In order of increasing efficiency, they are: (i) the random-number seed; (ii) the random numbers used to generate values for V_1, ..., V_n; (iii) the random variates V_1, ..., V_n; (iv) the pair of statistics (V̄, S); and (v) the statistic T. Because ω̄ needs to be computed only once for solving the sample-path equation, the higher-level definitions of ω lead to less total computing time. Of course, the values of ω need to be stored from the computation at the first x value for use at later x values.

The choice of ω affects the computing time of IRA and DRA. A high-level choice reduces computing time for both, but the ratio of computing times for IRA and DRA increases as the choice of ω becomes more efficient.


Whatever the choice, if the sample-size rule is m_i = c_1 m_{i−1}, the ratio lies in the interval [1, u), where u = m_i/(m_i − m_{i−1}) = c_1/(c_1 − 1) is the rate at which the retrospective samples grow (as discussed in Section 4.1). The argument is straightforward. Of the four RA algorithm steps (Section 2.2), only the first two involve significant computing time. First consider Step 2, which solves the deterministic root-finding problem. Because at each retrospective iteration i the sample size m_i is the same for both DRA and IRA, the time for Step 2 is essentially identical, although usually non-trivial. Now consider Step 1, which generates the ω values. At retrospective iteration i, IRA must generate m_i new values but DRA has the opportunity to save and reuse the m_{i−1} = m_i/c_1 previous values. Therefore, if storing and saving the previous values is essentially free, the Step 1 computation time for IRA is about u times that for DRA. Thus, the ratio is approximately 1 if the Step 2 computation time dominates and approximately u if the Step 1 time dominates. Therefore, in practice the ratio lies in the interval [1, u), completing the argument.
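
A small numerical check of this argument follows; the per-value generation cost and the Step-1 share p below are illustrative assumptions, not quantities from the paper.

```python
c1 = 2
m = [2 * c1 ** i for i in range(10)]                    # m_1 = 2, m_{i+1} = c1 * m_i
u = c1 / (c1 - 1)                                       # = 2 here

ira_new = m[1:]                                         # IRA generates all m_i values anew
dra_new = [mi - mp for mp, mi in zip(m, m[1:])]         # DRA reuses the previous m_{i-1}
print([ni / nd for ni, nd in zip(ira_new, dra_new)])    # every ratio equals u

# If Step 1 accounts for a fraction p of a DRA iteration's time, the IRA/DRA
# time ratio is (1 - p) + p * u, which stays inside [1, u) for p < 1.
for p in (0.0, 0.5, 0.9, 0.99):
    print(p, (1 - p) + p * u)
```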

A particularly important case is when ω is only the initial random-number seed, in which case the ratio is one, as illustrated in our empirical results in Table 1. An efficient choice of ω moves computation from Step 2 to Step 1, and therefore increases the ratio, but not beyond u. Although inefficient in our example and often elsewhere, choosing the random-number seed for ω often simplifies the programming necessary to create the procedure ȳ_m. Storing a more-complex ω in a discrete-event simulation would be difficult, and the bookkeeping necessary to store and retrieve the values might become non-trivial, since the order in which the values are used would change from one x value to the next. The argument of the previous paragraph would then be invalid.

5. Application and empirical results

To illustrate SRFPs and RA algorithms, we now discuss the Guaranteed-Coverage Tolerance Interval (GCTI) problem that motivates our research interest. Numerical results are provided in Section 5.1. Thiokol Corporation asked us to develop an algorithm to determine, in real time, the constant x* satisfying the non-normal α-coverage γ-confidence tolerance-interval relationship

Pr_{W̄,S}{ Pr_W{ W ≥ W̄ − x*S } ≥ α } = γ.   (11)

Here the product characteristic W is a continuous random variable with distribution function F_W having known shape but unknown mean and variance. (For example, possibly W is normally distributed with unknown mean and variance.) The sample mean W̄ and sample standard deviation S are computed from product characteristics W_1, W_2, ..., W_n previously generated from F_W independent of each other and of W.

Table 1. DRA and IRA comparison for a GCTI problem

      Squared bias     Variance        MSE             E[Var̂(x̄_i)]     CPU time (sec./1000 repl.)
                                                                        DRA             IRA
 i    DRA    IRA       DRA    IRA      DRA    IRA      DRA    IRA      R.N.   Seed     R.N.   Seed
 1    0.42   0.42      0.17   0.17     0.59   0.59     -      -        2      3        2      3
 2    0.26   0.27      0.14   0.09     0.40   0.36     0.17   0.075    6      9        6      9
 3    0.10   0.13      0.15   0.08     0.25   0.21     0.15   0.054    8      12       8      12
 4    0.03   0.05      0.15   0.07     0.18   0.12     0.11   0.040    12     18       13     18
 5    0.00   0.01      0.14   0.06     0.14   0.07     0.08   0.030    19     27       20     32
 6    0.00   0.00      0.11   0.04     0.11   0.04     0.06   0.021    33     46       40     52
 7    0.000  0.000     0.046  0.024    0.046  0.024    0.027  0.012    61     87       75     96
 8    0.000  0.000     0.022  0.012    0.022  0.012    0.015  0.007    121    168      143    178
 9    0.000  0.000     0.010  0.006    0.010  0.006    0.008  0.004    227    322      278    345
10    0.000  0.000     0.005  0.003    0.005  0.003    0.004  0.002    435    660      540    670

Given sample size n, coverage α, confidence γ, and distribution shape, the problem is to determine the value of x* so that with 100γ% confidence the random tolerance interval [W̄ − x*S, ∞) contains at least the proportion α of the distribution. The Thiokol application is to reliability design issues, but such non-normal tolerance-interval problems arise in many contexts (Chen and Schmeiser, 1995; Chen and Yang, 1999).

For this application, the root-finding function is g(x) = Pr_{W̄,S}{Pr_W{W ≥ W̄ − xS} ≥ α}, an (n+1)-dimensional integral. Numerical integration would be inefficient even for small sample size n. The function g(x) can be estimated easily, however, by ȳ_m(x), the sample average of m realizations of the conditional random variable

Y(x) = 1 if Pr_W{W ≥ W̄ − xS | W̄, S} ≥ α, and 0 otherwise.

Notice that the random variable Y(x) is not a function of the mean E(W) or variance Var(W), because the random probability Pr_W{W ≥ W̄ − xS | W̄, S} does not depend on E(W) or Var(W). Hence the reliability Pr_W{W ≥ W̄ − xS | W̄, S} can be computed using arbitrary values of E(W) and Var(W) > 0, given W̄ and S based on a sample with these arbitrary values of the mean and variance. The Monte Carlo computer procedure for generating an observation of Y(x) then consists of four steps: (i) generate a sample W_1, W_2, ..., W_n from F_W with arbitrarily chosen mean and variance; (ii) compute W̄ and S from the sample; (iii) compute p = Pr_W{W ≥ W̄ − xS | W̄, S} = 1 − F_W(W̄ − xS); and (iv) set Y(x) equal to zero if p is less than α and to one otherwise.

As discussed in Section 4.2, to implement RA the independent variable ω needs to be chosen. Various choices are possible for this SRFP. As always, the pseudorandom-number seed can be used, and all computations repeated for each value of x: compute the pseudorandom numbers, transform them to random variates W_j, j = 1, ..., n, compute W̄ and S, compute the reliability p, and compute Y(x). Choices of ω, in order of increasing efficiency, are: (i) the pseudorandom-number seed; (ii) the pseudorandom numbers; (iii) (W_1, ..., W_n); and (iv) (W̄, S).

Chen and Schmeiser (1995) analyze the Thiokol SRFP in some detail. They develop an efficient Monte Carlo algorithm to solve the problem by viewing it as a quantile-estimation problem. As expected, the special-purpose quantile-estimation algorithm is more efficient than black-box RA algorithms, which have no information about the problem's structure. Here, however, we use this example to compare the efficiency of DRA and IRA algorithms in Section 5.1 and to compare IRA to a tuned version of classical stochastic approximation in Section 5.2.
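
The four-step Monte Carlo procedure described above translates directly into code. The sketch below assumes, for illustration only, that F_W is the standard normal distribution (mean 0 and variance 1 are the arbitrary values permitted above); the function names are not from the paper.

```python
import random
import statistics
from math import erf, sqrt

def normal_cdf(w):
    """F_W for the illustrative choice of a standard normal distribution."""
    return 0.5 * (1.0 + erf(w / sqrt(2.0)))

def generate_Y(x, n, alpha, rng):
    """One observation of Y(x) for the GCTI problem."""
    w = [rng.gauss(0.0, 1.0) for _ in range(n)]          # (i) sample W_1, ..., W_n from F_W
    wbar, s = statistics.mean(w), statistics.stdev(w)    # (ii) sample mean and std deviation
    p = 1.0 - normal_cdf(wbar - x * s)                   # (iii) p = 1 - F_W(Wbar - x S)
    return 1 if p >= alpha else 0                        # (iv) indicator of coverage >= alpha

def ybar_m(x, n, alpha, m, rng):
    """Sample average of m realizations of Y(x): the estimate of g(x)."""
    return sum(generate_Y(x, n, alpha, rng) for _ in range(m)) / m

rng = random.Random(1)
print(ybar_m(x=3.0, n=10, alpha=0.99, m=2000, rng=rng))
```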

5.1. Comparing DRA and IRA

Using a GCTI application, we illustrate the statistical and computational efficiency of specific DRA and IRA algorithms with a Monte Carlo experiment. For this application, IRA converges faster than DRA, the estimators of sampling error work reasonably well after the first few iterations, and saving the random numbers rather than only the seed value reduces computational effort by about one-third for DRA and about one-fifth for IRA.

Following the discussion in Section 4.1, we choose the following RA components:

• Sample-size sequence: m_1 = 2 and m_{i+1} = c_1 m_i, where c_1 = 2. With these choices, each new retrospective iteration uses a sample size almost equal to the total used by all previous iterations.

• Error-tolerance sequence: ε_1 = 10^50 and ε_{i+1} = c_1^{−1/2} ε_i. That is, we essentially remove the error-tolerance logic.

• A numerical root-finding method that returns a retrospective solution, an approximation of the root of Equation (3): The initial solution is, rather arbitrarily, x_0 = 1. At retrospective iteration i, ȳ_{m_i} is
computed at the current root estimate x̄(ω_1, ..., ω_{i−1}) for i > 1 and at x_0 for i = 1, which yields either an upper or lower bound. The other bound is found by searching either left or right at a distance δ_i. This distance is doubled until the retrospective root is bounded. The sequence {δ_i} is defined by δ_1 = 0.0001 and δ_i = c_2[Var̂(x̄(ω_1, ..., ω_{i−1}) − x*(ω_i))]^{1/2}, except when this value is zero, in which case δ_i = δ_{i−1}. See Section 4.1, where c_2 = 1 is suggested. Once the retrospective root is bounded, regula falsi search is used to find a retrospective solution within the error tolerance ε_i. Because of the large error-tolerance values, the logic reduces to two steps: (i) bounding the retrospective solution; and (ii) returning the linear interpolate of the bounds as the retrospective solution. We refer to this variation as the Bounding RA algorithm (sketched below).
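
A minimal sketch of one root step of this variation follows; ybar stands for the sample-path function ȳ_{m_i}(·; ω_i) of the current retrospective iteration, and the guard on the number of doublings is an added assumption for handling the no-root situation mentioned in Section 4.1.

```python
def bounding_step(ybar, x_start, delta, max_doublings=60):
    """Bound the retrospective root by stepping away from x_start in increments of
    delta (doubled until a sign change is found), then return the linear
    interpolate of the two bounds as the retrospective solution."""
    f_a = ybar(x_start)
    if f_a == 0.0:
        return x_start
    step = delta if f_a < 0 else -delta          # search toward where ybar changes sign
    x_a, x_b = x_start, x_start + step
    f_b = ybar(x_b)
    doublings = 0
    while (f_a < 0) == (f_b < 0):                # not yet bounded: double the distance
        doublings += 1
        if doublings > max_doublings:
            raise RuntimeError("no sign change found; ybar may have no root")
        x_a, f_a = x_b, f_b
        step *= 2
        x_b = x_a + step
        f_b = ybar(x_b)
    # With the large error tolerances of Section 5.1, the search reduces to
    # returning the linear interpolate of the bounds.
    return x_b - f_b * (x_b - x_a) / (f_b - f_a)
```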

We intentionally did not tune the algorithm parameters x_0, m_1, δ_1, c_1, and c_2 to this application. As discussed in Chen (1994), IRA performance is robust to these parameter values, both in that these values work well over many applications and in that small changes in the values have little effect on performance.

The application is the GCTI problem with n = 10, α = γ = 0.99, and a Johnson distribution (Johnson, 1949) with skewness 4 and kurtosis 30. The true root is the tolerance factor x* ≈ 1.938 (see Chen and Schmeiser, 1995, Table 2).

Table 1 compares DRA and IRA performance based on 20 independent runs of 1000 independent Monte Carlo replications. Each replication solves the GCTI application once with ω̄ = (ω_1, ω_2, ...) generated independently of other replications. All digits shown are statistically significant, except possibly the last digits of cpu times. For iterations i = 1, ..., 10, the quality of the solutions is measured by the squared bias [E(x̄(ω_1, ..., ω_i)) − x*]², variance Var(x̄(ω_1, ..., ω_i)), and mean square error (squared bias plus variance, denoted by MSE), shown in columns two through four, respectively. The fifth column shows E[Var̂(x̄_i)], the mean of the DRA and IRA variance estimators Var̂(x̄(ω_1, ..., ω_i)) from Equations (7) and (6), respectively.

Table 1 shows that IRA produces better solutions than DRA for every retrospective iteration after the first, where the two algorithms are identical. For both DRA and IRA the squared biases quickly become negligible compared to the variance. The IRA bias accumulates the biases from past iterations and hence is bigger than, but decreases at about the same rate as, the DRA bias. Because the biases are small, the variances and mse's are nearly identical after the first few retrospective iterations. For both DRA and IRA these decrease by about 50% with each retrospective iteration, as is expected because c_1 = 2 and therefore the sample sizes are doubling.


The final few iterations suggest that asymptotically IRA has mse and variance that are 50 to 60% of those of DRA. This seems reasonable because IRA has the benefit of using twice (or c_1/(c_1 − 1) in general) as many independent observations as DRA (m_i for DRA and Σ_{j=1}^{i} m_j for IRA).

The averages of the variance estimators Var̂(x̄(ω_1, ..., ω_i)) from Equations (7) and (6) are shown in the fifth column. Although these estimators correctly sense the order of magnitude of the sampling error, they underestimate the variance shown in the third column. The relative error is less for DRA than for IRA, and both relative errors decrease with retrospective iteration number i. The biases in the variance estimators are caused by the early iterations not having the asymptotic behavior assumed in the derivations of Equations (7) and (6), due to the arbitrary starting point x_0 = 1. Better variance estimators might be found, perhaps by taking a moving average of only the previous few iterations. The current estimators work well at their primary purpose, which is to predict the proper scaling (via δ_{i+1}) for the next retrospective iteration.

The right-most columns show the number of cpu seconds required for 1000 Monte Carlo replications on a Sun SparcCenter 1000 computer. The times almost double with each retrospective iteration because the sample sizes are doubling (c_1 = 2). The times to generate the random numbers, transform them to random variates, and compute the observations y quickly dominate the fixed cost of each iteration.

Times for four RA algorithms, two versions each of DRA and IRA, are shown. Both versions, which correspond to implementations using two different ω's, create the same realizations and therefore have the same statistical properties; they differ only in the method of computing. The columns labeled R.N. correspond to an implementation in which the random numbers are stored when first generated. The columns labeled Seed correspond to storing only the initial random-number seed, from which random numbers are recomputed as needed. The latter is simpler to code, primarily because it requires a fixed amount of storage. The R.N. versions are faster, but negligibly so for IRA. The decrease in DRA times is greater than for IRA because in DRA each random number is used again and again, whereas in IRA each random number is used in only one retrospective iteration. The ratio of the IRA time to the DRA time is about one for the Seed versions and 1.2 for the R.N. versions. These results are consistent with our argument in Section 4.2 that the ratio lies in [1, u), where u = 2 here. The speed improvement could differ considerably with yet another choice of ω (e.g., storing the random samples or, better, only the sample means and standard deviations) or in another application.

A natural way to compare algorithm performance is via the generalized mse, the product of cpu time and mse.
In this sense, IRA dominates DRA. The IRA mse is only half of the DRA mse, while the cpu-time ratio (IRA to DRA) is no larger than two for any choice of ω. Overall, IRA/R.N. has the best generalized mse. IRA/Seed, which is only a bit slower, is likely to be a better choice in practice because it is easier to implement and requires finite storage, qualities not measured in generalized mse. If the algorithms are to be run for many iterations, R.N. versions would eventually require disk (rather than random-access-memory) storage, and the elapsed times would quickly become longer than for the Seed versions.

5.2. Comparing IRA and stochastic approximation

We now compare an IRA algorithm to a Stochastic Approximation (SA) algorithm; in the GCTI application IRA has much smaller sampling error than SA. As shown in Fig. 2, RA algorithms are substantially more efficient than Robbins and Monro's SA, the classical black-box solution method, even when it is tuned to the application.

Figure 2 compares IRA/Seed and SA using the GCTI application with tolerance parameters n = 5, α = 0.5, γ = 0.9, and a normal population, for which x* = 0.6857. The IRA/Seed algorithm used here is the same as that in Table 1 except that the initial point x_0 is generated randomly from the normal distribution with mean x* and variance 10^4 (denoted by N(x*, 10^4)). The early convergence rate of SA depends on the initial point and the sample size (for example, Chen (1994, p. 71) and Fu and Healy (1992)), although the asymptotic convergence rate does not. We generate the initial point from N(x*, 1) and define each SA iteration to use the average of five observations, which is the (empirically determined) optimal value associated with this random initial point (Chen, 1994, p. 71). The performance measure here is again generalized mse, except that now computing effort is measured by N, the total number of y(x) observations computed. On a Sun SparcStation 2 computer, N = 8000 corresponds to about 30 seconds of cpu time. Figure 2 shows that IRA/Seed, despite having less information about the location of the root, and not being tuned to the application, converges faster than SA for these numbers of observations. The sample size N needed for the SA generalized mse to drop below that of IRA is, as can be inferred from the figure, so large that in practice IRA performance is substantially better.
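
For reference, the SA competitor is the classical Robbins and Monro (1951) recursion, shown below with the five-observation averaging described above; the gain sequence a_k = a/k and the constant a are illustrative assumptions, since the tuned values used in the experiment are not reported here.

```python
def robbins_monro(y_obs, gamma, x0, a=1.0, iterations=1600, batch=5):
    """Robbins-Monro stochastic approximation for solving g(x) = gamma, where
    y_obs(x) is one noisy observation with E[y_obs(x)] = g(x) and g is
    non-decreasing; each iteration averages `batch` observations."""
    x = x0
    for k in range(1, iterations + 1):
        ybar = sum(y_obs(x) for _ in range(batch)) / batch
        x -= (a / k) * (ybar - gamma)      # step against the estimated error
    return x
```

Wired to a Y(x) generator such as the generate_Y sketch in Section 5 (for example, y_obs = lambda x: generate_Y(x, 5, 0.5, rng)) with gamma = 0.9, the recursion targets the same root x* ≈ 0.6857 as IRA/Seed; 1600 iterations of five observations correspond to N = 8000.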

That the SA asymptotic generalized mse is less might seem surprising. The asymptotic generalized-mse ratio (IRA to SA) for this Thiokol example is about 2.3, which is quite close to the asymptotic number of points at which IRA evaluates ȳ_{m_i}(x) at each retrospective iteration i (Chen, 1994). Therefore, the asymptotic performance seems to be the same if we count sample-path approximations for IRA and function evaluations for SA. This makes sense because asymptotically the function g is linear and the slope at the root is known, so more than one function evaluation is wasted. But the additional function evaluations are, in fact, quite useful early, when g is not linear, maybe explaining why IRA performs better than SA in practice.

Fig. 2. A comparison of the IRA and SA algorithms: generalized mse versus N, the number of y's generated.

6. Conclusions

We introduce a family of retrospective approximation (RA) algorithms to solve SRFPs. The algorithms are based on solving sample-path approximations to the problem of interest; that is, pseudorandom data from the problem are generated and used to create a sequence of approximate problems, which have increasing sample sizes and decreasing solution-error tolerances. Our algorithms differ from retrospective optimization algorithms in that they are iterative and in that they explicitly allow some error in the solutions to the approximate problems. The latter is necessary if the approximate problems are to be solved numerically.

In addition to introducing retrospective approximation algorithms, we prove convergence for one-dimensional SRFPs, we discuss implementation issues, and we specify two algorithm variations: Dependent Retrospective Approximation (DRA), which uses all past observations at each iteration, and Independent Retrospective Approximation (IRA), which uses each observation in only one iteration. Monte Carlo results for an application, as well as some analytical arguments, indicate that IRA is superior to DRA. Monte Carlo results for a related application show that IRA has smaller generalized mean squared error than a version of stochastic approximation that is tuned to the application.

Recommendations for future work in this area include: (i) proposing specific RA algorithms and proving convergence for multi-dimensional SRFPs; (ii) deriving asymptotic distributions for the root estimator; and (iii) extending RA's application from SRFPs to more general optimization problems.


Acknowledgements

This research is supported by Purdue Research Foundation Grant 690-1287-2104, Thiokol Corporation Contract A46111430, and NSF Grant DMS 93-00058. We thank Colm O'Cinneide for helpful discussions, Anton Kleywegt for helpful comments, and IIE Transactions editor Jim Wilson for thoughtful suggestions that substantially improved the presentation.

References

Andradottir, S. (1992) An empirical comparison of stochastic approximation methods for simulation optimization, in Proceedings of the First Industrial Engineering Research Conference, Klutke, G., Mitta, D.A., Nnaji, B.O. and Seiford, L.M. (eds), Institute of Industrial Engineers, Norcross, GA, pp. 471-475.

Banks, J., Carson, J.S., II, and Nelson, B.L. (1996) Discrete-Event System Simulation, Prentice Hall, Upper Saddle River, NJ.

Billingsley, P. (1979) Probability and Measure, John Wiley & Sons, New York, NY.

Chen, H. (1994) Stochastic Root Finding in System Design, Ph.D. Dissertation, School of Industrial Engineering, Purdue University, West Lafayette, IN.

Chen, H.-S. (1998) Multi-dimensional independent retrospective approximation for making resource allocation decisions to manufacturing systems, Master's Thesis, Department of Industrial Engineering, Da-Yeh University, Chang-Hwa, Taiwan (in Chinese).

Chen, H. and Schmeiser, B.W. (1994a) Stochastic root finding: problem definition, examples, and algorithms, in Proceedings of the Third Industrial Engineering Research Conference, Burke, L. and Jackman, J. (eds), Institute of Industrial Engineers, Norcross, GA, pp. 605-610.

Chen, H. and Schmeiser, B.W. (1994b) Retrospective approximation algorithms for stochastic root finding, in Proceedings of the 1994 Winter Simulation Conference, Tew, J.D., Manivannan, S., Sadowski, D.A. and Seila, A.F. (eds), Institute of Electrical and Electronics Engineers, Piscataway, NJ, pp. 255-261.

Chen, H. and Schmeiser, B.W. (1995) Monte Carlo estimation for guaranteed-coverage nonnormal tolerance intervals. Journal of Statistical Computation and Simulation, 51, 223-238.

Chen, H. and Yang, T.-K. (1999) Computation of the sample size and coverage for guaranteed-coverage nonnormal tolerance intervals. Journal of Statistical Computation and Simulation, 63, 299-320.

Conte, S.D. and de Boor, C. (1980) Elementary Numerical Analysis: An Algorithmic Approach, McGraw-Hill, New York, NY.

Fu, M.C. (1994) Optimization via simulation: a review. Annals of Operations Research, 53, 199-247.

Fu, M.C. and Healy, K.J. (1992) Simulation optimization of (s,S) inventory systems, in Proceedings of the 1992 Winter Simulation Conference, Swain, J.J., Goldsman, D., Crain, R.C. and Wilson, J.R. (eds), Institute of Electrical and Electronics Engineers, Piscataway, NJ, pp. 506-514.

Fu, M.C. and Hill, S.D. (1997) Optimization of discrete event systems via simultaneous perturbation. IIE Transactions, 29, 233-243.

Gurkan, G., Ozge, A.Y. and Robinson, S.M. (1994) Sample-path optimization in simulation, in Proceedings of the 1994 Winter Simulation Conference, Tew, J.D., Manivannan, S., Sadowski, D.A. and Seila, A.F. (eds), Institute of Electrical and Electronics Engineers, Piscataway, NJ, pp. 247-254.

Healy, K.J. (1992) Optimizing Stochastic Systems: A Retrospective/Deterministic Approach, Ph.D. Dissertation, Department of Operations Research, Cornell University, Ithaca, NY.

Healy, K.J. and Schruben, L.W. (1991) Retrospective simulation response optimization, in Proceedings of the 1991 Winter Simulation Conference, Nelson, B.L., Kelton, W.D. and Clark, G.M. (eds), Institute of Electrical and Electronics Engineers, Piscataway, NJ, pp. 954-957.

Healy, K.J. and Xu, Y. (1994) Simulation based retrospective approaches to stochastic system optimization. Research Memorandum No. 94-14, School of Industrial Engineering, Purdue University, West Lafayette, IN.

Healy, K.J. and Xu, Y. (1995) Simulation optimization of a bulk server. Research Memorandum No. 95-3, School of Industrial Engineering, Purdue University, West Lafayette, IN.

Homem-de-Mello, T., Shapiro, A. and Spearman, M.L. (1999) Finding optimal material release times using simulation based optimization. Management Science, 45, 86-102.

Huber, P.J. (1964) Robust estimation of a location parameter. Annals of Mathematical Statistics, 35, 73-101.

Johnson, N.L. (1949) Systems of frequency curves generated by methods of translation. Biometrika, 36, 149-176.

Law, A.M. and Kelton, W.D. (2000) Simulation Modeling and Analysis, McGraw-Hill, New York, NY.

L'Ecuyer, P., Giroux, N. and Glynn, P.W. (1994) Stochastic optimization by simulation: numerical experiments with the M/M/1 queue in steady state. Management Science, 40, 1245-1261.

L'Ecuyer, P. and Glynn, P.W. (1994) Stochastic optimization by simulation: convergence proofs for the GI/G/1 queue in steady state. Management Science, 40, 1562-1578.

Loève, M. (1977) Probability Theory I, Springer-Verlag, New York, NY.

Plambeck, E.L., Fu, B.-R., Robinson, S.M. and Suri, R. (1996) Sample-path optimization of convex stochastic performance functions. Mathematical Programming, 75, 137-176.

Robbins, H. and Monro, S. (1951) A stochastic approximation method. Annals of Mathematical Statistics, 22, 400-407.

Robinson, S.M. (1996) Analysis of sample-path optimization. Mathematics of Operations Research, 21, 513-528.

Rubinstein, R.Y. and Shapiro, A. (1993) Discrete Event Systems - Sensitivity Analysis and Stochastic Optimization by the Score Function Method, John Wiley & Sons, New York, NY.

Serfling, R.J. (1980) Approximation Theorems of Mathematical Statistics, John Wiley & Sons, New York, NY.

Shapiro, A. (1996) Simulation-based optimization - convergence analysis and statistical inference. Communications in Statistics - Stochastic Models, 12, 425-454.

Shapiro, A. and Wardi, Y. (1996) Convergence analysis of stochastic algorithms. Mathematics of Operations Research, 21, 615-628.

Shapiro, A. and Homem-de-Mello, T. (1997) A simulation-based approach to two-stage stochastic programming with recourse. Mathematical Programming, 81, 301-325.

Simon, B. (1998) Convergence of a stochastic rootfinding procedure. Working Paper, Department of Mathematics, University of Colorado at Denver, Denver, CO.

Appendix

Proof of Lemma 3. Without loss of generality, assume that γ = 0. By the second assumption, for every ε > 0, w.p.1 there exists a positive integer N(ε, ω̄) such that

m > N(ε, ω̄) implies g(x) − ε < ȳ_m(x; ω̄) < g(x) + ε for all x ∈ ℜ.

By the first assumption g is a non-decreasing function, so the inverse function g⁻¹(a) = inf{x : g(x) ≥ a} is also non-decreasing. Now choose a particular positive value of ε, small enough that both g⁻¹(−ε) and g⁻¹(ε) are defined. Because ε > 0 and x* is the unique root, g⁻¹(−ε) ≤ x* ≤ g⁻¹(ε).

Now, consider the region of the crossing set. Because "x > g⁻¹(ε)" implies "g(x) ≥ ε", w.p.1 for every m > N(ε, ω̄),

ȳ_m(x; ω̄) > g(x) − ε ≥ ε − ε = 0, for every x > g⁻¹(ε).

Similarly, w.p.1 for every m > N(ε, ω̄),

ȳ_m(x; ω̄) < g(x) + ε ≤ −ε + ε = 0, for every x < g⁻¹(−ε).

Therefore, for every m > N(ε, ω̄), the crossing set C_m(ω̄) is not empty and satisfies

C_m(ω̄) ⊆ [g⁻¹(−ε), g⁻¹(ε)] w.p.1.

To prove that the crossing set C_m(ω̄) is non-empty, we let

x⁺ ≡ inf{x : ȳ_m(z; ω̄) > 0 for every z > x},   (A1)

and

x⁻ ≡ sup{x : ȳ_m(z; ω̄) < 0 for every z < x}.   (A2)

Since ȳ_m(z; ω̄) > 0 for every z > g⁻¹(ε) and ȳ_m(z; ω̄) < 0 for every z < g⁻¹(−ε), we see that

x⁺ ≤ g⁻¹(ε) and x⁻ ≥ g⁻¹(−ε).   (A3)

Next we prove that

x⁻ ≤ x⁺   (A4)

by contradiction. Assume that x⁺ < x⁻. For every x ∈ (x⁺, x⁻), we can pick z ∈ (x⁺, x) arbitrarily; and applying the definition (A1) of x⁺, we must have ȳ_m(x; ω̄) > 0 since x > z > x⁺. Similarly we can pick z′ ∈ (x, x⁻) arbitrarily; and applying the definition (A2) of x⁻, we must have ȳ_m(x; ω̄) < 0 since x < z′ < x⁻. This is a contradiction, since ȳ_m(x; ω̄) cannot be both positive and negative; and thus the assumption that x⁺ < x⁻ must be false. This establishes the inequality (A4).

Now if x⁻ = x⁺, then we see that for every neighborhood 𝒩(x⁺) = 𝒩(x⁻) of the point x⁺ = x⁻, we can find points z and z′ satisfying z > x⁺, z′ < x⁻, and

z ∈ 𝒩(x⁺) so that ȳ_m(z; ω̄) > 0, and z′ ∈ 𝒩(x⁻) = 𝒩(x⁺) so that ȳ_m(z′; ω̄) < 0;   (A5)

and it follows immediately that x⁻ = x⁺ is on the boundary of R^N and on the boundary of R^P. Thus we see that in this case, x⁻ = x⁺ ∈ R^Z ∪ R^NP = C_m(ω̄).

If x⁻ < x⁺, then both x⁻ and x⁺ are crossing roots. Here we prove for x⁻ only; the proof for x⁺ is similar. For every neighborhood 𝒩(x⁻) of x⁻, we can find points z and z′ satisfying

z ∈ 𝒩(x⁻) so that ȳ_m(z; ω̄) ≥ 0, and z′ ∈ 𝒩(x⁻) so that ȳ_m(z′; ω̄) < 0;   (A6)

otherwise, sup{x : ȳ_m(z; ω̄) < 0 for every z < x} > x⁻, and this contradicts the definition of x⁻. There are now three cases to consider:

(a) We have ȳ_m(x⁻; ω̄) > 0 so that x⁻ ∈ R^P; and it follows from (A6) that x⁻ is on the boundary of R^N or R^Z, so that x⁻ ∈ R^NP.

(b) We have ȳ_m(x⁻; ω̄) = 0; and in this case we have immediately that x⁻ ∈ R^Z.

(c) We have ȳ_m(x⁻; ω̄) < 0 so that x⁻ ∈ R^N; and it follows from (A6) that x⁻ is on the boundary of R^P or R^Z, so that x⁻ ∈ R^NP.

Therefore, x⁻ is a crossing root. Using a similar argument, we can show that x⁺ is also a crossing root. We can summarize all of these results as follows:

x⁻ = x⁺ implies that x⁻ = x⁺ ∈ R^Z ∪ R^NP = C_m(ω̄);
x⁻ < x⁺ implies that C_m(ω̄) ⊂ [x⁻, x⁺].   (A7)

Combining (A4) and (A7), we see that C_m(ω̄) must be non-empty.

Let X*(m) be any point in C_m(ω̄). Because x* is in the interval [g⁻¹(−ε), g⁻¹(ε)], for every m > N(ε, ω̄)

|X*(m) − x*| ≤ g⁻¹(ε) − g⁻¹(−ε) w.p.1.

Since ε can be arbitrarily small and x* is the unique root, for any selection rule defining the solution sequence {X*(m)},

lim_{m→∞} X*(m) = x*, w.p.1.   ■

Proof of Lemma 4. Without loss of generality, assume γ = 0. By the second assumption, for every x ∈ ℜ and every η > 0, w.p.1 there exists a positive integer N(x, η, ω̄) such that

m > N(x, η, ω̄) implies |ȳ_m(x; ω̄) − g(x)| < η.

Given any ε > 0, the first assumption implies

g(x* + ε) > 0 and g(x* − ε) < 0.

Now, choose x = x* + ε and η = g(x* + ε), and define N₁(x*, ε, ω̄) = N(x, η, ω̄). Then for m > N₁(x*, ε, ω̄)

|ȳ_m(x* + ε; ω̄) − g(x* + ε)| < g(x* + ε) w.p.1.

Therefore, w.p.1, ȳ_m(x* + ε; ω̄) > 0 for every m > N₁(x*, ε, ω̄). Similarly, choose x = x* − ε and
η = −g(x* − ε) > 0, and define N₂(x*, ε, ω̄) = N(x, η, ω̄). Then, for m > N₂(x*, ε, ω̄),

|ȳ_m(x* − ε; ω̄) − g(x* − ε)| < −g(x* − ε) w.p.1.

Therefore, w.p.1, ȳ_m(x* − ε; ω̄) < 0 for every m > N₂(x*, ε, ω̄).

Define N₃(x*, ε, ω̄) = max{N₁(x*, ε, ω̄), N₂(x*, ε, ω̄)}. Then, for every m > N₃(x*, ε, ω̄)

ȳ_m(x* − ε; ω̄) < 0 < ȳ_m(x* + ε; ω̄) w.p.1.

Therefore, w.p.1 there exists at least one crossing root in [x* − ε, x* + ε] and hence the crossing set C_m(ω̄) is not empty. (The argument is similar to that in the proof of Lemma 3.) Furthermore, by the third assumption, C_m(ω̄) = [x^L(ω̄), x^U(ω̄)] for every m, where x^L = sup{x : ȳ_m(x; ω̄) < γ} and x^U = inf{x : ȳ_m(x; ω̄) > γ}. Hence, for every m > N₃(x*, ε, ω̄)

x* − ε ≤ x^L ≤ x^U ≤ x* + ε w.p.1,

and therefore C_m(ω̄) ⊆ [x* − ε, x* + ε] w.p.1. Because X*(m) is a point selected from C_m(ω̄), then w.p.1 every X*(m) ∈ [x* − ε, x* + ε] for m > N₃(x*, ε, ω̄). Hence, for any selection rule defining the solution sequence {X*(m)},

lim_{m→∞} X*(m) = x* w.p.1.   ■


Biographies

Huifen Chen is an Associate Professor in the Department of Industrial Engineering at Chung Yuan Christian University in Taiwan. She completed her Ph.D. in the School of Industrial Engineering at Purdue University in August 1994. She received a B.S. degree in Accounting from National Cheng-Kung University in Taiwan in 1986 and an M.S. degree in Statistics from Purdue University in 1990. Her current research focuses on stochastic root finding, random-vector generation, non-normal tolerance intervals, and stochastic operations research applied in reliability and transportation.

Bruce Schmeiser is a Professor in the School of Industrial Engineering at Purdue University. He received his Ph.D. from the School of Industrial and Systems Engineering at Georgia Tech in 1975; his undergraduate degree in the Mathematical Sciences and Master's degree in Industrial Engineering are from The University of Iowa. His research interests include stochastic root finding, simulation output analysis, input modeling, random-variate generation, variance-reduction techniques, Markov chain Monte Carlo methods, and applied operations research. He has served in a variety of roles for the IIE, INFORMS, and the Winter Simulation Conference.
