This article was downloaded by: [University of California Santa Cruz] on 15 October 2014, at 14:39. Publisher: Taylor & Francis. Informa Ltd, registered in England and Wales, registered number 1072954; registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK.

IIE Transactions. Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/uiie20

Stochastic root finding via retrospective approximation. Huifen Chen (Department of Industrial Engineering, Chung Yuan Christian University, Chung Li, Taiwan) and Bruce W. Schmeiser (School of Industrial Engineering, Purdue University, West Lafayette, IN 47907-1287, USA). Published online: 27 Apr 2007.
To cite this article: HUIFEN CHEN & BRUCE W. SCHMEISER (2001) Stochastic root finding via retrospective approximation, IIE Transactions, 33:3, 259-275, DOI: 10.1080/07408170108936827
To link to this article: http://dx.doi.org/10.1080/07408170108936827
IIE Transactions (2001) 33, 259-275
Stochastic root finding via retrospective approximation
HUIFEN CHEN¹ and BRUCE W. SCHMEISER²

¹Department of Industrial Engineering, Chung Yuan Christian University, Chung Li, Taiwan. E-mail: [email protected]
²School of Industrial Engineering, Purdue University, West Lafayette, IN 47907-1287, USA. E-mail: [email protected]
Received October 1999 and accepted July 2000
Given a user-provided Monte Carlo simulation procedure to estimate a function at any specified point, the stochastic root-finding problem is to find the unique argument value to provide a specified function value. To solve such problems, we introduce the family of Retrospective Approximation (RA) algorithms. RA solves, with decreasing error, a sequence of sample-path equations that are based on increasing Monte Carlo sample sizes. Two variations are developed: IRA, in which each sample-path equation is generated independently of the others, and DRA, in which each equation is obtained by appending new random variates to the previous equation. We prove that such algorithms converge with probability one to the desired solution as the number of iterations grows, discuss implementation issues to obtain good performance in practice without tuning algorithm parameters, provide experimental results for an illustrative application, and argue that IRA dominates DRA in terms of the generalized mean squared error.
1. Introduction
The root-finding problem is to find the unique root x* of the equation g(x*) = γ, where g : ℜ → ℜ. We consider Stochastic Root-Finding Problems (SRFPs), the case where g(x) can only be estimated by a consistent estimator ȳ(x) for any given value of x. SRFPs arise in designing stochastic systems with computer simulation: γ is the desired system performance, x is the value of a design variable, g(x) is the corresponding system performance, and ȳ(x) is the estimated performance obtained from a user-provided Monte Carlo simulation procedure. More specifically, the SRFP is defined as follows.
Stochastic Root-Finding Problem (SRFP):

Given:
(a) a constant vector γ ∈ ℜ^d,
(b) a (computer) procedure for generating, for any x ∈ ℜ^d, a d-dimensional consistent estimate ȳ(x) of g(x).

Find: the unique root x* satisfying g(x*) = γ using only the estimator ȳ.

Examples of SRFPs can be found in Chen and
Schmeiser (1994a) and in Section 5. As an illustrative example, we consider here a one-dimensional problem arising in statistical inference: compute the analogue of a Student's t critical value when the independent observations arise from an arbitrary distribution function, say F_V, rather than from the normal distribution. Specifically: given F_V, a sample size n, and a probability 1 − α, the problem is to determine the critical value t_{1−α,n} such that P{T ≤ t_{1−α,n}} = 1 − α, where T = √n (V̄ − μ)/S, V̄ = n⁻¹ Σ_{j=1}^n V_j, S² = (n − 1)⁻¹ Σ_{j=1}^n (V_j − V̄)², the observations V₁, V₂, …, V_n are sampled independently from F_V, and μ = E(V) is the mean assumed in the null hypothesis. This problem is an SRFP with true root x* = t_{1−α,n}, target γ = 1 − α, and function g(x) = P(T ≤ x). The user-written computer procedure ȳ(x) is the sample average of Y₁(x), …, Y_m(x), where Y_j(x) = 1{T_j ≤ x}, T_j is the jth observation of T, and 1 is the indicator function. That is, the computer procedure mimics the process: each observation Y_j(x) is obtained by generating one random sample of size n from F_V, computing the value of T_j, and returning 1 if T_j ≤ x and 0 otherwise.

0740-817X © 2001 "IIE"
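The process the paper describes for this example can be sketched in a few lines. This is an illustrative sketch only: the choice of F_V (here an exponential distribution with mean 1, so μ = 1), the default sample size n, and all names are our assumptions, not part of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def y_bar(x, m, n=10, mu=1.0):
    # Estimate g(x) = P(T <= x) by m Monte Carlo replications.
    hits = 0
    for _ in range(m):
        v = rng.exponential(scale=1.0, size=n)             # one sample of size n from F_V
        t = np.sqrt(n) * (v.mean() - mu) / v.std(ddof=1)   # T = sqrt(n) * (Vbar - mu) / S
        hits += int(t <= x)                                # Y_j(x) = 1{T_j <= x}
    return hits / m                                        # sample average of Y_1(x), ..., Y_m(x)
```

Each call returns a consistent estimate of g(x); the root-finding machinery of the paper then searches over x using only such calls.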
This root-finding problem could be solved deterministically using numerical quadrature. Rather than ȳ(x), the user-provided procedure would compute

g(x) = ∫_{−∞}^{∞} ⋯ ∫_{−∞}^{∞} 1{T ≤ x} dF_V(v₁) ⋯ dF_V(v_n),

probably using an off-the-shelf quadrature routine. This value, in turn, could be given to an off-the-shelf deterministic root-finding routine. Even for this simple example, and even for small values of n, both simplicity of code and computing time favor the stochastic approach. Of course, for the important normal-distribution case, specialized deterministic routines are quite efficient.
We are interested in black-box algorithms to solve the SRFP. Given such algorithms, a practitioner needs only to provide a computer simulation procedure that provides the estimator ȳ(x) of g(x) by mimicking the behavior of the modeled system. Often, as in this example, ȳ(x) is an unbiased sample average, but more generally, such as in problems involving steady-state behavior of stochastic systems, ȳ(x) could be biased. Even more generally, ȳ(x) need not be a sample average. The concepts necessary to develop such simulation procedures are discussed in standard textbooks, for example Law and Kelton (2000) and Banks et al. (1996).
The general problem is to solve d equations in d unknowns. As with deterministic root finding, many interesting problems lie in a single dimension. We consider only d = 1 in detail. The ideas, but not all of the details, of this paper extend to multiple dimensions, e.g., Chen (1998).
We propose, in Section 2, a family of Retrospective Approximation (RA) algorithms, which iteratively solve a sequence of sample-path approximation problems with increasing sample sizes. In each iteration, the sample-path approximation problem is solved to within an error tolerance; the root estimator is then a function of those solutions. Using results from M-estimators, we show in Section 3 that, under proper conditions, the RA root estimator converges to the true root with probability one (w.p.1). Section 4 is a discussion of implementation issues. Section 5 provides numerical results for two specific RA algorithms and empirically compares RA to stochastic-approximation algorithms (Robbins and Monro, 1951), a classic stochastic root-finding approach. The family of RA algorithms has been discussed previously in Chen (1994) and Chen and Schmeiser (1994b).
That an algorithm converges is not sufficient to make the algorithm interesting; the finite-sample convergence can be so slow as to make the algorithm impractical, even when the asymptotic convergence rate is quite good. Our empirical experience with various retrospective approximation algorithms, however, has been good. Section 5.2 shows that RA yields root estimates with much smaller mean squared error (mse) than stochastic approximation for practical numbers of replications. Work continues on stochastic approximation, although usually for optimization rather than root finding (Andradóttir, 1992; Fu, 1994; L'Ecuyer et al., 1994; L'Ecuyer and Glynn, 1994; Fu and Hill, 1997). Simon (1998) proposes a natural stochastic root-finding procedure that iteratively solves a sequence of approximation functions of g(x), assuming that the solution can be determined exactly. A given approximation function depends on all the past solutions and their function estimates. This procedure converges if the root-finding function g is continuous, the estimator ȳ is uniformly consistent, and the approximation function satisfies the "cluster point property". Despite the convergence proof, it is not clear how to construct an approximation function that satisfies the cluster point
property, is easy to implement, and is robust with respect to the algorithm parameters.
2. Retrospective approximation
Here we develop RA algorithms. In the following five subsections we discuss sample-path approximations, two variations of RA algorithms, the rationale behind the RA logic, estimators for the variance of the retrospective root estimators, and retrospective approaches for stochastic optimization.
2.1. Sample-path approximations
Fundamental to the retrospective approach is the concept of the sample-path approximation to the function g. At any point x, this approximation is simply ȳ(x). The approximation to g is obtained by using common random numbers for every point x. We let ω = {w₁, …, w_m} denote the random numbers used to obtain ȳ(x) = ȳ_m(x; ω). Holding ω constant over all points x yields a sample-path approximation ȳ_m(·; ω) to g. Although given ω the sample-path approximation ȳ_m(x; ω) is a deterministic function of x, we need not write it explicitly; rather we calculate its value only at each desired point x.
The sample-path equation

ȳ_m(x*(ω); ω) = γ,   (1)

defines the random root x*(ω), which is an estimate of the true root x*. We call x*(ω) the retrospective root.
For the example in Section 1, the random numbers w_j are used to generate the jth observation Y_j(x) = y(x; w_j) = 1{T_j(w_j) ≤ x}. Given independently generated random numbers w₁, …, w_m, the sample-path equation is

ȳ_m(x*(ω); ω) = m⁻¹ Σ_{j=1}^{m} y(x*(ω); w_j) = m⁻¹ Σ_{j=1}^{m} 1{T_j(w_j) ≤ x*(ω)} = 1 − α.   (2)
Despite the uniqueness of the true root t_{1−α,n} of the strictly increasing, continuous (assuming F_V is continuous) function g, the retrospective root might not be unique or might not exist. The sample-path equation has many roots if 1 − α lies on a step and has no root otherwise. In the latter case, there is, however, a unique point x*(ω) at which ȳ_m crosses 1 − α. In Section 3.1 we relax the definition of retrospective roots to allow "crossing" roots.
2.2. RA algorithms
RA iteratively solves approximately a sequence of sample-path equations
ȳ_{m_i}(x*(ω_i); ω_i) = γ,   (3)

for i = 1, 2, …, where the components of ω_i = {w_{i,1}, …, w_{i,m_i}} are generated independently. RA uses a strictly increasing sample-size sequence {m_i}. At each retrospective iteration i, RA uses a numerical root-finding algorithm to evaluate ȳ_{m_i}(·; ω_i) at one or more x values to find a retrospective solution x(ω₁, …, ω_i) that is close to the retrospective root x*(ω_i). In particular, RA uses the retrospective-iteration stopping criterion |x(ω₁, …, ω_i) − x*(ω_i)| < ε_i, where {ε_i} is a positive sequence converging to zero. Possibly {ε_i} is a function of the previous retrospective solutions and therefore random; in this case we assume only w.p.1 convergence to zero. One approach to satisfy the stopping criterion is to bound the retrospective root and use bisection search until the stopping criterion is satisfied.

The values of x at which the sample-path approximation ȳ_{m_i}(·; ω_i) is evaluated by the numerical root-finding algorithm, as well as the number of such x values, are random, depending upon the particular sample-path equation and (less directly) any information from earlier retrospective iterations. For example, in our implementation the numerical root-finding algorithm uses an initial x computed from earlier sample-path equations. In fact, the computational efficiency of RA arises because small sample sizes in the early retrospective iterations are used to find the region of the root before more expensive and precise computations are performed with larger sample sizes in the later retrospective iterations.

Two strategies seem natural for seeding the retrospective iterations. In Dependent RA (DRA) all retrospective iterations use the same random-number stream. Therefore, the retrospective iterations are dependent because the m_{i−1} observations ω_{i−1} in retrospective iteration i − 1 are the first m_{i−1} observations of the m_i observations ω_i in retrospective iteration i. In Independent RA (IRA) all retrospective iterations are independently seeded. Therefore, the observations from iteration to iteration are independent, with retrospective iteration i containing m_i new observations ω_i independent of ω₁, …, ω_{i−1}.

After i retrospective iterations, the root estimator x̄(ω₁, …, ω_i) differs between DRA and IRA. In DRA, the root estimator is the ith retrospective solution:

x̄(ω₁, …, ω_i) = x(ω₁, …, ω_i).   (4)

In IRA, the root estimator is the weighted average of the solutions x(ω₁), …, x(ω₁, …, ω_i):

x̄(ω₁, …, ω_i) = Σ_{j=1}^{i} m_j x(ω₁, …, ω_j) / Σ_{j=1}^{i} m_j.   (5)

Section 5.1 empirically compares the efficiency of DRA and IRA algorithms. Specifically, RA algorithms work as follows.
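Bounding the retrospective root and bisecting until the tolerance is met can be sketched as follows. This is a hypothetical helper, not the paper's code; it assumes the sample-path approximation is nondecreasing in x and that an initial bracket [a, b] containing the crossing root is known.

```python
def bisect_sample_path(ybar, gamma, a, b, eps):
    # Invariant: ybar(a) < gamma <= ybar(b), so a crossing root of
    # ybar(x) = gamma stays inside [a, b] throughout the search.
    while b - a >= 2.0 * eps:
        mid = 0.5 * (a + b)
        if ybar(mid) < gamma:
            a = mid
        else:
            b = mid
    return 0.5 * (a + b)   # within eps of the retrospective root
```

Because the bracket never loses the crossing root, the returned midpoint satisfies the per-iteration stopping criterion |x(ω₁, …, ω_i) − x*(ω_i)| < ε_i.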
RA Algorithms:

Components:
1. An initial sample size m₁ and a rule for successively increasing m_i for i ≥ 2.
2. A rule for computing an error-tolerance sequence {ε_i} that goes to zero w.p.1.
3. A numerical root-finding method for solving sample-path Equation (3) to within a specified error ε_i for each i.

Find: the root x*.

Step 0. Initialize the retrospective iteration number i = 1. Set m₁ and ε₁.
Step 1. Generate ω_i. For IRA, ω_i is m_i new observations, generated independently of ω_{i−1}. For DRA, ω_i is obtained by generating m_i − m_{i−1} new independent observations and appending them to ω_{i−1}.
Step 2. Use the numerical method to solve the deterministic sample-path Equation (3) to obtain a retrospective solution x(ω₁, …, ω_i) that satisfies |x(ω₁, …, ω_i) − x*(ω_i)| < ε_i.
Step 3. Compute the root estimator x̄(ω₁, …, ω_i) using Equation (4) for DRA or (5) for IRA.
Step 4. Compute m_{i+1} and ε_{i+1}. Set i ← i + 1 and go to Step 1.
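A complete IRA pass through Steps 0-4 can be sketched compactly. This is an illustrative sketch only: the doubling sample-size rule, the halving error tolerances, bisection as the numerical method, and all names (`ira`, `simulate`) are our assumptions, not prescribed by the paper.

```python
import numpy as np

def ira(simulate, gamma, a, b, m1=64, iters=6, seed=1):
    # simulate(x, w): one observation y(x; w) from a single uniform random number w.
    rng = np.random.default_rng(seed)
    m, eps = m1, (b - a) / 4.0                   # assumed initial tolerance
    sols, sizes = [], []
    for _ in range(iters):
        omega = rng.random(m)                    # Step 1: fresh omega_i (IRA seeding)
        def ybar(x):                             # sample-path approximation, omega held fixed
            return np.mean([simulate(x, w) for w in omega])
        lo, hi = a, b                            # Step 2: solve ybar(x) = gamma to within eps
        while hi - lo >= 2.0 * eps:
            mid = 0.5 * (lo + hi)
            if ybar(mid) < gamma:
                lo = mid
            else:
                hi = mid
        sols.append(0.5 * (lo + hi))
        sizes.append(m)
        m, eps = 2 * m, eps / 2.0                # Step 4: increase m_i, decrease eps_i
    return np.average(sols, weights=sizes)       # Step 3: Equation (5)
```

For instance, with simulate(x, w) = 1{w ≤ x} the function g is the U(0, 1) distribution function, and ira(lambda x, w: float(w <= x), 0.9, 0.0, 1.0) estimates the root x* = 0.9.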
In addition to choosing between DRA and IRA, a specific RA algorithm is obtained by choosing the three components: sample-size rule, error-tolerance sequence, and numerical root-finding method. We discuss these choices, as well as stopping rules (for the entire RA algorithm), in Section 4.
2.3. RA rationale
Here we briefly discuss the rationale of the RA algorithm structure. We have three criteria for algorithm development: practical computational efficiency, guaranteed convergence to the root x*, and a standard-error estimator for the root estimator. The first two underlie the discussion here, and the last we discuss in the next subsection.
RA algorithms use, by definition, multiple iterations with increasing sample sizes and decreasing error tolerances. The key idea for practical computational efficiency is that previous retrospective iterations, based upon small sample sizes, can provide information for efficient numerical solution in future retrospective iterations. Computing in early iterations is inexpensive because sample sizes are small; computing in later iterations is inexpensive because the approximate location of the retrospective root is known. Also for efficiency, the error tolerances need to be appropriately sized: there is no need to chase randomness. In earlier retrospective iterations the
retrospective root has larger sampling error (due to smaller sample sizes), so the error tolerances should be larger; later the retrospective root has less sampling error, so the error tolerances should be smaller. The second criterion, guaranteed convergence, is obtained by approaching two limits simultaneously: the sample-path approximation ȳ_{m_i} approaches g, because of the increasing sample size m_i, and the retrospective solutions approach the retrospective roots, because the error tolerances converge to zero.
Two other sample-size rules, both commonly used, have disadvantages compared to increasing sample sizes. The first, to use a single iteration, called a static algorithm in Shapiro (1996), is typical, but often inefficient. Here only m₁ needs to be specified, and a solution x(ω₁) of Equation (3) with i = 1 is returned. The sample size m₁ is necessarily large for Equation (3) with i = 1 to approximate g well, which causes the root-finding method to be inefficient because the lack of information from previous iterations causes the method to examine many points x, each requiring the computation of ȳ_{m₁}(x; ω₁) based on a large sample size of m₁. We discuss this first sample-size rule in Section 3.1. A second sample-size rule, also not allowed within RA, is to use k iterations, each with the same sample size: m₁ = m₂ = ⋯ = m_k. The multiple iterations allow subsequent iterations to begin with a good guess of the retrospective root x*(ω_i), saving computational effort. The disadvantage is lack of convergence as k goes to infinity because of the fixed finite sample size m_k. Any bias in the retrospective root for the true root (or also in the retrospective solution for the retrospective root) does not go to zero. As with using k independent replications in any simulation experiment, reasonable practical performance can sometimes be obtained with such an approach (Healy, 1992, p. 9). Although bias cannot be estimated, estimating the standard error of the root estimator is straightforward. The tradeoffs between these two sample-size rules are analogous to those between steady-state simulation using one long run versus k shorter runs: less bias versus easy standard-error estimation. RA is designed to obtain both advantages.
By definition of RA, the new observations in Step 1 (m_i in IRA and m_i − m_{i−1} in DRA) that are used to obtain ȳ_{m_i}(x; ω_i) must be independent of previously generated observations so that the variance estimators can work. The new observations do not, however, need to be independent of each other. Any variance reduction via dependence-induction methods must be applied within the set of new observations, thereby hiding the dependence within ȳ_{m_i}(x). For example, antithetic pairs can be applied directly for either IRA or DRA. Stratified sampling, however, is straightforward for IRA but not for DRA. Application to steady-state simulation is straightforward if ω is chosen to be the random numbers, which are easily made independent; the dependence between the retrospective roots will become negligible as the sample size grows.
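As a concrete illustration of inducing dependence only within the new observations, antithetic pairing of the uniform random numbers that make up one block ω_i might look like this. The pairing scheme and names are our assumptions, added for illustration.

```python
import numpy as np

def antithetic_block(m, rng):
    # Pair each uniform u with 1 - u, so all induced dependence stays
    # inside the one block of m new random numbers (and hence inside
    # the sample-path average computed from that block).
    u = rng.random(m // 2)
    return np.concatenate([u, 1.0 - u])
```

The block as a whole remains independent of every earlier block, which is all the variance estimators require.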
2.4. RA variance estimation
The third algorithm criterion is the ability to estimate the standard error of the algorithm's current root estimator x̄(ω₁, …, ω_i). We derive variance estimators for both IRA and DRA here. As discussed in Section 4.1, these estimators are helpful for determining error tolerances, for use within the numerical root-finding algorithm, for stopping the overall algorithm, and for reporting precision of the final root estimator. These variance estimators are based on assumptions that are consistent with our intuition and computational results in Section 5, but certainly do not hold exactly in practice. For example, the computational results show that bias in the variance estimator is large in early iterations before becoming negligible.
First consider IRA. For now assume that IRA retrospective solutions are unbiased uncorrelated estimators of the root x*, with variances inversely proportional to sample size; that is, m_j Var(x(ω₁, …, ω_j)) = v² for each j. RA algorithms contain no systematic reason for estimates to be high or low, so biases should be small. That the variance is inversely proportional to sample size is natural, at least asymptotically (see also Lemma 2 in Section 3.1). The independent retrospective roots {x*(ω_j)} of IRA yield retrospective solutions {x(ω₁, …, ω_j)} that are nearly uncorrelated; some correlation might arise from the use of information from previous retrospective iterations, such as an initial search point. This correlation should be minor because RA forces the jth retrospective solution to be close to (within ε_j of) the jth retrospective root. Consider the covariance between two successive retrospective solutions. Let δ(ω₁, …, ω_j) denote the error of the jth retrospective solution, i.e.,

x(ω₁, …, ω_j) = x*(ω_j) + δ(ω₁, …, ω_j).

Then Cov[x(ω₁, …, ω_j), x(ω₁, …, ω_{j+1})] = Cov[x*(ω_j) + δ(ω₁, …, ω_j), x*(ω_{j+1}) + δ(ω₁, …, ω_{j+1})] = Cov[x*(ω_j), δ(ω₁, …, ω_{j+1})] + Cov[δ(ω₁, …, ω_j), δ(ω₁, …, ω_{j+1})], because ω₁, ω₂, … are generated independently in IRA. The first covariance is small because: (i) x*(ω_j) and δ(ω₁, …, ω_{j+1}) are correlated only through ω_j, not the other ω's; and (ii) the value of δ(ω₁, …, ω_{j+1}) depends more on x(ω₁, …, ω_j), for example as an initial point, but little on x*(ω_j). The second covariance is small because: (i) the signs of the two errors δ(ω₁, …, ω_j) and δ(ω₁, …, ω_{j+1}) are nearly independent; and (ii) the magnitudes of the two errors depend mainly on their error tolerances ε_j and ε_{j+1}; hence if the error tolerances {ε_l : l = 1, 2, …} are substantially smaller than the standard deviation of the current retrospective root, then the correlation between successive errors should be small. For these reasons assuming zero correlation seems reasonable.
These IRA assumptions yield two useful results. First, after i IRA retrospective iterations, the minimal-variance unbiased root estimator is x̄(ω₁, …, ω_i), the weighted
average of the i retrospective solutions, as given in Equation (5). Second, these assumptions provide an estimator of the variance of the root estimator. The variance is Var(x̄(ω₁, …, ω_i)) = v² / Σ_{j=1}^{i} m_j. Therefore, after i retrospective iterations, an unbiased sum-of-squared-differences estimator of the variance of the IRA root estimator is, for i > 1,

V̂ar(x̄(ω₁, …, ω_i)) = Σ_{j=1}^{i} m_j [x(ω₁, …, ω_j) − x̄(ω₁, …, ω_i)]² / [(i − 1) Σ_{j=1}^{i} m_j]
= [Σ_{j=1}^{i} m_j x²(ω₁, …, ω_j) − (Σ_{j=1}^{i} m_j) x̄²(ω₁, …, ω_i)] / [(i − 1) Σ_{j=1}^{i} m_j].   (6)

Now consider DRA. We make assumptions and arguments analogous to those of IRA. Assume that the DRA retrospective solutions are unbiased estimators of the root with variance inversely proportional to the sample size and that covariances are inversely proportional to the larger sample size; that is, Cov[x(ω₁, …, ω_j), x(ω₁, …, ω_k)] = v² / max{m_j, m_k} for each j and k. The arguments for unbiasedness and variances are identical to those for IRA. The argument for covariance is based on the assumption that when j < k, the kth retrospective solution x(ω₁, …, ω_k) = m_j m_k⁻¹ x(ω₁, …, ω_j) + (m_k − m_j) m_k⁻¹ x(ω̃), the weighted average of the jth retrospective solution x(ω₁, …, ω_j) and the retrospective solution x(ω̃) of the sample-path equation ȳ_m̃(x; ω̃) = γ, where m̃ = m_k − m_j and ω̃ = ω_k \ ω_j = {w_{m_j+1}, …, w_{m_k}}, the set of (m_k − m_j) new observations independent of ω_j. This covariance assumption is true if the retrospective solution equals the retrospective root at every iteration, g is linear, and y(x; w) = g(x) + z(w), where the random noise z(w) is functionally independent of x for every w. If this assumption is not true, then observations are becoming either more or less valuable. Under the assumption, Cov[x(ω₁, …, ω_j), x(ω₁, …, ω_k)] = Cov[x(ω₁, …, ω_j), m_j m_k⁻¹ x(ω₁, …, ω_j) + (m_k − m_j) m_k⁻¹ x(ω̃)] = m_j m_k⁻¹ Var[x(ω₁, …, ω_j)] = v²/m_k. Hence the form of the covariance is obtained.

These DRA assumptions suggest an estimator of its variance. The variance of the DRA root estimator can be estimated using

V̂ar(x(ω₁, …, ω_i)) = Σ_{j=1}^{i−1} α_ij [x(ω₁, …, ω_j) − x(ω₁, …, ω_i)]² / (i − 1),   i > 1,   (7)

where the coefficients α_ij = m_j/(m_i − m_j) are chosen to make each term of the sum an unbiased estimator of the variance. Because α_ij functionally depends on m_i, the estimator cannot be computed using cumulative sums as in Equation (6), so the values of the i retrospective solutions must be stored. For any fixed i, however, m_i is fixed; then expanding the squared differences allows computation using only the two cumulative sums for α_ij x(ω₁, …, ω_j) and α_ij x²(ω₁, …, ω_j) over j.
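Equations (6) and (7) translate directly into code. The two small functions below are sketches under the paper's assumptions; the names and the plain-list interface (stored retrospective solutions and their sample sizes) are our choices.

```python
def ira_variance(sols, sizes):
    # Equation (6): variance estimator for the IRA weighted-average
    # root estimator x_bar = sum(m_j * x_j) / sum(m_j), for i > 1.
    total = sum(sizes)
    i = len(sols)
    x_bar = sum(m * x for m, x in zip(sizes, sols)) / total
    ss = sum(m * (x - x_bar) ** 2 for m, x in zip(sizes, sols))
    return ss / ((i - 1) * total)

def dra_variance(sols, sizes):
    # Equation (7): the DRA root estimator is the last solution x_i;
    # each squared difference is scaled by alpha_ij = m_j / (m_i - m_j).
    x_i, m_i = sols[-1], sizes[-1]
    i = len(sols)
    ss = sum((m_j / (m_i - m_j)) * (x_j - x_i) ** 2
             for x_j, m_j in zip(sols[:-1], sizes[:-1]))
    return ss / (i - 1)
```

Note that `ira_variance` could also be maintained with cumulative sums of m_j x_j and m_j x_j², whereas `dra_variance` must see all stored solutions because α_ij depends on m_i, exactly as the text observes.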
2.5. Retrospective approaches for stochastic optimization
The genesis of RA algorithms lies in retrospective optimization. The optimization problem is to find the optimal point of an objective function using only an estimator of the function at any feasible point x (Fu, 1994). A retrospective approach optimizes sample-path problems that approximate the optimization problem of interest. Rubinstein and Shapiro (1993) use the phrase stochastic counterpart and Gürkan et al. (1994), for example, use sample-path for this approach. We adopt Healy and Schruben's (1991) phrase retrospective to capture the idea of solving a random problem that happened in the past. Robinson (1996) contains a good summary of such optimization algorithms.
The design of the family of RA algorithms has five fundamental features. We discuss these features and how they distinguish RA from retrospective optimization algorithms as follows:
• We use a sequence of sample-path approximations with increasing sample sizes and decreasing error tolerances. This structure is central to our simultaneous goals of proving convergence and achieving good practical performance (as discussed in Section 2.3). Shapiro (1996) describes this approach as dynamic to distinguish it from the more-typical static algorithms, which solve a single sample-path problem. Shapiro and Homem-de-Mello (1997) and Homem-de-Mello et al. (1999) recently have used a structure similar to ours (also discussed earlier in Chen and Schmeiser (1994b) and Chen (1994)), but with random sample sizes and a stopping rule based on a statistical test of hypothesis. Shapiro and Wardi
(1996) consider gradient descent algorithms, which are both dynamic and Markovian. The bundle-type method (Plambeck et al., 1996) is dynamic, but the error tolerance does not decrease.
• Averaging solutions from a sequence of independent problems, which we argue is more efficient. That is, IRA is more efficient than DRA because DRA reprocesses old data whereas IRA is always working on new data (as discussed further in Section 5.1). Healy and Schruben (1991) average solutions of independent sample-path approximations, but each has the same sample size.
• A black-box approach, in which the problem structure is not used. Our work is in the spirit of methods such as bisection search or regula falsi rather than much of the recent literature that exploits problem structure. Stochastic approximation (Andradóttir, 1992; L'Ecuyer et al., 1994; L'Ecuyer and Glynn, 1994) and bundle-type methods (Plambeck et al., 1996) fall into this category. Healy (1992) and Healy and Xu (1994, 1995), on the other hand, develop problem-specific algorithms.
• Robustness. We strive for good practical performance without algorithmic tuning. RA logic is non-Markovian, in that information from previous sample-path approximations is used in the solution of the current approximation. The bundle-type algorithm used by Plambeck et al. (1996) is also non-Markovian. In contrast, the stochastic approximation algorithm is typically Markovian.
• Convergence proof. We prove convergence under fairly broad conditions. Shapiro (1996) proves convergence for various static algorithms. Shapiro and Wardi (1996) prove convergence of dynamic Markovian algorithms. Robinson (1996) proves convergence of the single-iteration bundle-type algorithm.

Substantial earlier work considered static algorithms. Rubinstein and Shapiro (1993) solve stochastic optimization problems by solving SRFPs, finding the zero of the associated gradient functions. They estimate the optimal point by finding the zero of a sample-path gradient function using the score-function gradient estimates from a single long-run simulation experiment. They assume that a finite-time algorithm is available for solving the sample-path equation exactly; that is, in our notation they assume that x(ω₁) = x*(ω₁). Healy and Schruben (1991) solve stochastic optimization problems by analyzing each problem's structure to find the exact optimal point of a sample-path (or retrospective) objective function. A specific algorithm therefore differs from problem to problem (Healy and Schruben, 1991; Fu and Healy, 1992). Huber (1964) proposed M-estimators, which are obtained by optimizing an error function based on sample data; examples include maximum-likelihood and least-square estimators. In many applications, such optimization is obtained by solving for the zero of the gradient function. Following Serfling (1980, p. 243) we view M-estimators as the solution of a sample-path equation, such as Equation (1). The retrospective roots x*(ω) satisfying Equation (1) and x*(ω_i) satisfying Equation (3) are therefore M-estimators.

3. Convergence of RA algorithms

We show here that under weak conditions RA algorithms converge w.p.1 to the true root x* that satisfies g(x*) = γ. As discussed in Section 2.2, RA algorithms estimate x* by solving, to within an error bound, a sequence of sample-path equations based on increasing sample sizes. To show the convergence of RA algorithms, we consider two cases: (i) a single RA iteration using a long-sample-path equation; and (ii) a complete RA algorithm using a sequence of equations of the form (3) for i = 1, 2, …. The first case, discussed in Section 3.1, could be used to solve problems with a single iteration with a fixed value of m₁ and ε₁ (as with the static methods of Section 2.5), but our intent here is to use the single-iteration results of Case (i) to prove in Section 3.2 that RA algorithms converge to x* w.p.1 as the iteration index i goes to infinity. The advantages of using multiple retrospective iterations rather than a single iteration are discussed in Section 2.3. In Section 4 we discuss the choice of RA algorithm components.

3.1. Case (i): a single RA iteration

Consider using an RA algorithm but stopping after the first retrospective iteration, using only the first sample-path equation, Equation (3) with i = 1, and returning x(ω₁) as the estimator of the root x*. (IRA and DRA are identical in this case.) We consider here the asymptotic behavior of x(ω₁) in the limit as the first-iteration sample size m₁ goes to infinity and ε₁ goes to zero. Although in practice m₁ and ε₁ are fixed, the asymptotic analysis here is interesting in the same sense that it is interesting to show that the sample mean converges to the population mean as sample size goes to infinity even though in practice sample size does not grow.

To simplify notation and provide emphasis on the sample size, we denote the sample size m₁ by m, the error ε₁ (dependent on the sample size) by ε(m), the sample path ω₁ by ω, the retrospective root x*(ω₁) by X*(m), and the retrospective solution x(ω₁) by X(m). Equation (3) with i = 1 is then

ȳ_m(X*(m); ω) = γ,   (8)

where ȳ_m(x; ω) is a consistent estimate of g(x) for all real x (i.e., for any x, w.p.1 ȳ_m(x; ω) converges to g(x) as m → ∞).
Downloaded by [University of California Santa Cruz] at 14:39 15 October 2014
Stochastic root finding via retrospective approximation
The retrospective root X*(m) satisfying Equation (8) is an M-estimator (Section 2.5). Our purpose here is to show that X(m), the approximation to X*(m), converges to the true root x* as the sample size m goes to infinity.
We show convergence in two parts: (i) X*(m) converges to x* w.p.1; and then (ii) X(m) converges to x* w.p.1. Since X*(m) is an M-estimator, the first part follows from the results for M-estimators as stated in Lemma 1, based on the assumption that ȳ_m(x; ω) is an unbiased estimate that averages m monotonic functions. We further develop Lemmas 3 and 4 to extend the result to more general functions g and sample-path approximations ȳ_m(·; ω). The second part, the consistency of X(m), is shown in Lemma 5.
Since Equation (8) may have no root or multiple roots, we define the concepts of a crossing root and a crossing set, which are then used in Lemmas 3, 4, and 5 and Theorems 1 and 2 in Section 3.2. Every root is a crossing root; a crossing root might additionally be a discontinuity point where ȳ_m(·; ω) crosses γ. To be more specific, we define the crossing root and crossing set as follows. Let the sets
R^N = {x : ȳ_m(x; ω) − γ < 0, x ∈ ℝ},
R^Z = {x : ȳ_m(x; ω) − γ = 0, x ∈ ℝ}, and
R^P = {x : ȳ_m(x; ω) − γ > 0, x ∈ ℝ}.   (9)
Assume that the function ȳ_m(·; ω) is defined over the whole real-number line ℝ. Then the three mutually exclusive sets R^N, R^Z, and R^P partition ℝ, i.e., ℝ = R^N ∪ R^Z ∪ R^P. Further define
R^NP = {x : x ∈ R^N and x on the boundary of R^P or R^Z} ∪
       {x : x ∈ R^P and x on the boundary of R^N or R^Z}.
Then R^Z contains all (true) roots of Equation (8) and R^NP contains all discontinuity points where ȳ_m(·; ω) crosses γ but does not intersect the level γ. The crossing set C_m(ω) of Equation (8) is therefore defined as

C_m(ω) = R^Z ∪ R^NP.   (10)
Every element of C_m(ω) is a "crossing root". The set C_m(ω) is empty if the function ȳ_m(·; ω) lies entirely below or entirely above γ.
Lemmas 1 and 2 below use the concept of an isolated root, which is a discrete point in C_m(ω). Of (true) roots, crossing roots, and isolated roots, the most general are crossing roots; all (true) roots and all isolated roots are also crossing roots. A crossing root, however, might be a (true) root, an isolated root, both, or neither.
We illustrate crossing roots, isolated roots, and (true) roots for four different functions ȳ_m(·; ω) in Fig. 1. The function ȳ_m(·; ω) in Fig. 1(a) crosses the value γ at three distinct points x₁, x₂, and x₃; hence, any of these points is a crossing root and C_m(ω) = {x₁, x₂, x₃}. All three of these points are also isolated roots. The only root is x₂; the
points x₁ and x₃ are not roots because ȳ_m(x₁; ω) ≠ γ and ȳ_m(x₃; ω) ≠ γ. The function ȳ_m(·; ω) in Fig. 1(b) crosses the value γ only once, with an intersecting interval (x₄, x₅] on the x-axis. Hence, all points in the interval (x₄, x₅] are roots, all points in [x₄, x₅] are crossing roots but none is an isolated root, and C_m(ω) = [x₄, x₅]. The function ȳ_m(·; ω) in Fig. 1(c) lies below the level γ except at one point x₆, where ȳ_m(·; ω) crosses the value γ and then immediately drops back down below γ. Therefore, C_m(ω) = {x₆}; x₆ is an isolated (crossing) root but not a (true) root. The function ȳ_m(·; ω) in Fig. 1(d) has no intersection with the level γ. Hence, there is no root and no crossing root; C_m(ω) is the empty set.
Equation (8) may have zero, one, or multiple crossing roots. Zero crossing roots occur when the sample size m is small, allowing ȳ_m(x; ω) to lie entirely below or above γ for all real x. Multiple crossing roots occur, for example, when ȳ_m is a step function and one of the steps has height γ. Step functions occur in the two examples of Sections 1.2 and 5, where y(x) is an indicator function. Indicator functions occur, for example, when y(x) indicates whether an event occurs or not.
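When ȳ_m is the average of m indicator functions, it is an empirical step function, and its crossing set can be read directly off the sorted observations. The sketch below (the function name and the tuple return convention are our own illustrative assumptions, not from the paper) computes the crossing set of ȳ_m(x) = γ for ȳ_m(x) = (1/m)·#{tᵢ ≤ x}:

```python
import math

def crossing_set(sample, gamma):
    """Crossing set of ybar_m(x) = gamma for the empirical step function
    ybar_m(x) = (1/m) * #{ t_i <= x }.  Returns (x_lo, x_hi) with
    x_lo = sup{x : ybar_m(x) < gamma} and x_hi = inf{x : ybar_m(x) > gamma};
    x_hi is +infinity when ybar_m never exceeds gamma."""
    t = sorted(sample)
    m = len(t)
    k = math.ceil(gamma * m)          # smallest k with k/m >= gamma
    j = math.floor(gamma * m) + 1     # smallest j with j/m >  gamma
    x_lo = t[k - 1]
    x_hi = t[j - 1] if j <= m else float("inf")
    return x_lo, x_hi
```

For example, with observations {1, 2, 3, 4} and γ = 0.5, ȳ_m equals γ on the whole interval [2, 3), so the crossing set is the interval [2, 3], the step-function analogue of Fig. 1(b); with γ = 0.6 the crossing set reduces to the single crossing root {3}.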
We redefine the retrospective root X*(m) of Equation (8) to be

X*(m) = { any crossing root   if C_m(ω) is not empty,
        { 0                   otherwise.                 (11)
The retrospective root X*(m) can be selected arbitrarily from C_m(ω) using any rule that produces a solution sequence {X*(m)}. This selection rule is usually implicit in the solution method. The choice of X*(m) = 0 when C_m(ω) is empty is arbitrary, because asymptotically C_m(ω) is empty with probability zero. A better practical value might be the current estimate of x*.
Asymptotically the retrospective root X*(m) is an M-estimator. Lemmas 1 and 2 show the consistency and asymptotic normality of M-estimators, and hence can be used for retrospective roots. Proofs can be found in Huber (1964) and Serfling (1980, p. 251), respectively.
Lemma 1. Let x* be an isolated root of g(x) = γ. Suppose that ȳ_m(x; ω) = Σᵢ₌₁ᵐ y(x; ωᵢ)/m and each y(x; ωᵢ) yields an unbiased estimator of g(x) for every x. Further, suppose that, for every ω, the function y(x; ω) is monotone in x. Then x* is unique, and any solution sequence {X*(m)} satisfying Equation (8) converges to x* w.p.1 as m → ∞. Further, if with probability one there is a neighborhood of x* in which y(x; ω) is a continuous function of x, then there exists such a solution sequence.
Lemma 2 (Serfling, 1980, p. 251) states conditions under which √m(X*(m) − x*) is asymptotically normally distributed.
266 Chen and Schmeiser
[Figure 1: four panels plotting ȳ_m(x; ω) against the level γ; graphics not reproduced.]
Fig. 1. Four functions illustrating different types of roots.
Lemma 2. Let x* be an isolated root of g(x) = γ. Suppose the unbiasedness and monotonicity of y(x; ω) as in Lemma 1. Further, suppose that g(x) is differentiable at x* with g′(x*) ≠ 0 and that E[y²(x; ω)] is finite for x in a neighborhood of x* and is continuous at x*. Then, as m → ∞, any scaled solution sequence {√m(X*(m) − x*)} of Equation (8) has an asymptotic normal distribution with mean zero and variance Var[y(x*; ω)]/[g′(x*)]².

Lemmas 3 and 4 show that a solution sequence {X*(m)} converges to x* w.p.1 under certain conditions on the function g and the sample-path approximation ȳ_m. Both lemmas relax the assumption of unbiasedness and monotonicity of ȳ_m(x; ω), with Lemma 3 requiring only uniform convergence of ȳ_m(x; ω) to g(x) w.p.1 as m → ∞, and with Lemma 4 yet more relaxed, requiring only point-wise convergence of ȳ_m(x; ω) to g(x) w.p.1 as m → ∞ but restricting ȳ_m(·; ω) to cross γ no more than once. (That consistency is a relaxation of unbiasedness follows from the strong law of large numbers, Billingsley (1979, p. 70). Dropping the unbiasedness assumption allows, for example, the initial transient of steady-state simulation.) In addition, both lemmas allow the sample-path approximation ȳ_m(·; ω) to have multiple crossing roots for finite values of m. Lemma 3 considers a monotonic function g. We list the proof in the Appendix.

Lemma 3. Assume that

1. the function g : ℝ → ℝ is a non-decreasing function with unique root x* satisfying g(x*) = γ, and
2. the sample-path approximation ȳ_m(x; ω) converges to g(x) uniformly in x w.p.1 as m → ∞.

Then, for any selection rule defining the solution sequence {X*(m)} from {C_m(ω)},

lim_{m→∞} X*(m) = x* w.p.1.

Lemma 4 considers a function g that has a unique root but that is not necessarily monotonic. It assumes that ȳ_m(·; ω) is point-wise convergent to g and that its crossing set, if not empty, is a unique interval. That is, for every value of m the crossing set C_m(ω) of Equation (8), if not empty, is an interval w.p.1, i.e.,

Pr{ω : C_m(ω) = [x^L(ω), x^U(ω)]} = 1,

where x^L(ω) = sup{x : ȳ_m(x; ω) < γ} and x^U(ω) = inf{x : ȳ_m(x; ω) > γ}. If x^L(ω) > x^U(ω), then C_m(ω) is empty. For example, x^L(ω) = x₄ and x^U(ω) = x₅ in Fig. 1(b). This condition restricts the function ȳ_m(·; ω) to cross the target value γ at most once, but allows multiple contiguous roots. In addition to Fig. 1(b), examples include the step functions that arise in the two examples of Sections 1.2 and 5 and any function ȳ_m(x; ω) monotonic in x. Like Lemma 3, Lemma 4 does not require an unbiased ȳ_m(x; ω). The proof is listed in the Appendix.

Lemma 4. Assume that

1. the function g : ℝ → ℝ has a unique root x* and satisfies

   g(x) > γ if x > x*,
   g(x) = γ if x = x*,
   g(x) < γ if x < x*,

2. for every x ∈ ℝ, the estimate ȳ_m(x; ω) converges to g(x) w.p.1 as m → ∞, and
3. for every positive integer m, w.p.1 the crossing set C_m(ω) of the function ȳ_m(x; ω) is either empty or an interval.

Then, for any selection rule defining the solution sequence {X*(m)} from {C_m(ω)},

lim_{m→∞} X*(m) = x* w.p.1.

Lemmas 3 and 4 are stated for functions g that are increasing in the neighborhood of the root x*. Trivial changes, e.g., redefining g as −g, allow them to be stated for functions g that are decreasing near x*.

Lemma 5 states that, despite allowing an error ε(m) in finding the root of the sample-path equation, a single iteration of an RA algorithm will converge as m goes to infinity.
Lemma 5. Assume that the conditions in Lemma 1, 3, or 4 hold. If X(m) is obtained by one RA iteration and if ε(m) converges to zero w.p.1 as m goes to infinity, then

lim_{m→∞} X(m) = x* w.p.1.

Proof. By Lemma 1, 3, or 4, X*(m) converges to x* w.p.1. Therefore, w.p.1 there exists an N(ω) such that for every m > N(ω) the sample-path equation has a crossing root. When there is a crossing root, the root-finding method returns a solution X(m) that satisfies |X(m) − X*(m)| < ε(m). Because ε(m) converges to zero w.p.1 as m goes to infinity, the random absolute numerical error |X(m) − X*(m)| converges to 0 w.p.1. Then Lemma 1, 3, or 4 implies that X(m) converges to x* w.p.1. ∎
3.2. Case (ii): Solving a sequence of sample-path equations
We now show that RA algorithms, defined in Section 2.2, converge w.p.1. Recall that RA algorithms solve a sequence of equations of the form (3), for i = 1, 2, …, for the retrospective roots {X*(ω_i)}, using an increasing sample-size sequence {m_i}. At each iteration i, RA returns a solution X(ω₁, …, ω_i), an approximation of X*(ω_i), within error tolerance ε_i. The dependently seeded RA, DRA, repeats the m_{i−1} observations ω_{i−1} in ω_i, while in IRA all retrospective roots are independent. After i iterations, the DRA root estimator of the root x* is x̄(ω₁, …, ω_i) = X(ω₁, …, ω_i), the last retrospective solution; the IRA root estimator is x̄(ω₁, …, ω_i) = Σⱼ₌₁ⁱ mⱼ X(ω₁, …, ωⱼ) / Σⱼ₌₁ⁱ mⱼ, a weighted average of the solutions X(ω₁), …, X(ω₁, …, ω_i), where each weight is proportional to the sample size.
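In code, the two root estimators differ only in how the per-iteration retrospective solutions are combined; a minimal sketch (the function names are ours, not from the paper):

```python
def dra_estimate(solutions):
    """DRA root estimator: simply the last retrospective solution."""
    return solutions[-1]

def ira_estimate(solutions, sample_sizes):
    """IRA root estimator: average of the retrospective solutions,
    each weighted by its sample size m_1, ..., m_i."""
    return sum(m * x for m, x in zip(sample_sizes, solutions)) / sum(sample_sizes)
```

With solutions [1.0, 2.0] and sample sizes [1, 3], the IRA estimate is (1·1.0 + 3·2.0)/4 = 1.75, while DRA returns 2.0.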
Theorems 1 and 2 respectively show that DRA and IRA algorithms converge to the solution of an SRFP if its function g and associated estimator ȳ_m are well behaved.

Theorem 1. Let a specific DRA algorithm be used to find the unique root x* of the equation g(x) = γ using the estimator ȳ_m(x; ω) for g(x). If g and ȳ_m(·; ω) satisfy the conditions in Lemma 1, 3, or 4, then the DRA root estimator x̄(ω₁, …, ω_i) converges to x* w.p.1 as i → ∞.

Proof. The proof proceeds sequentially in three parts: (i) lim_{i→∞} X*(ω_i) = x* w.p.1; (ii) lim_{i→∞} X(ω₁, …, ω_i) = x* w.p.1; (iii) lim_{i→∞} x̄(ω₁, …, ω_i) = x* w.p.1.

The sample-size sequence {m_i} is increasing, so "i → ∞" implies "m_i → ∞"; therefore, Lemma 1, 3, or 4 yields the first part. The second part follows because, by definition of RA algorithms, the sequence of error tolerances {ε_i} converging to zero w.p.1 implies convergence using Lemma 5. The third part is trivial because x̄(ω₁, …, ω_i) = X(ω₁, …, ω_i) by definition of DRA. ∎
Theorem 2 for IRA is analogous to Theorem 1 for DRA. The new condition, that Σᵢ₌₁^∞ E|X*(ω_i) − x*| is
finite, seems likely to be satisfied in practice. For example, assume that there is a finite constant α such that E|X*(ω_i) − x*| < α/m_i^{1/2} for every i, as would be consistent with Lemma 2. Then Σᵢ₌₁^∞ E|X*(ω_i) − x*| = α Σᵢ₌₁^∞ m_i^{−1/2}, which is finite if, for example and as we recommend for other reasons, m_i = c₁ m_{i−1} for some c₁ > 1. As further evidence that the condition is weak, notice that the above argument holds even if the assumption is changed to use the squared difference rather than the absolute difference.
IRA convergence differs from DRA convergence in that each retrospective iteration is independent. Define ω̄ = (ω₁, ω₂, …), the infinite sequence of observed random numbers corresponding to the one realization from the sample space. Convergence w.p.1 in Theorem 2 is with respect to ω̄.
Theorem 2. Let a specific IRA algorithm be used to find the unique root x* of the equation g(x) = γ using the estimator ȳ_m(x; ω) for g(x). Assume that Σᵢ₌₁^∞ E|X*(ω_i) − x*| is finite. Then the IRA root estimator x̄(ω₁, …, ω_i) converges to x* w.p.1 as i → ∞.
Proof. As with Theorem 1, the proof proceeds sequentially in three parts: (i) lim_{i→∞} X*(ω_i) = x* w.p.1; (ii) lim_{i→∞} X(ω₁, …, ω_i) = x* w.p.1; and (iii) lim_{i→∞} x̄(ω₁, …, ω_i) = x* w.p.1.

The first part shows that the retrospective roots converge to the root x* of g. The finiteness of Σᵢ₌₁^∞ E|X*(ω_i) − x*| is sufficient, using Result 1.3.5 of Serfling (1980).

The second part shows that the retrospective solutions converge to x*. By definition of RA algorithms, ε_i(ω̄) converges to zero w.p.1 and |X(ω₁, …, ω_i) − X*(ω_i)| < ε_i(ω̄) for every i. Hence w.p.1 we have the following property: for every ε > 0 there exists an I(ε, ω̄) such that |X(ω₁, …, ω_i) − X*(ω_i)| < ε_i(ω̄) < ε for every i > I(ε, ω̄). Then the first part implies the second part.
The third part shows convergence of IRA algorithms. Let b_i = Σⱼ₌₁ⁱ mⱼ; then lim_{i→∞} b_i = ∞. By definition,

x̄(ω₁, …, ω_i) = Σⱼ₌₁ⁱ mⱼ X(ω₁, …, ωⱼ) / Σⱼ₌₁ⁱ mⱼ = b_i^{−1} Σⱼ₌₁ⁱ mⱼ X(ω₁, …, ωⱼ).

By the Toeplitz Lemma (Loève, 1977, p. 250), the event lim_{i→∞} X(ω₁, …, ω_i) = x* implies the event

lim_{i→∞} x̄(ω₁, …, ω_i) = x*.

Therefore, convergence of X(ω₁, …, ω_i) to x* w.p.1 implies convergence of x̄(ω₁, …, ω_i) to x* w.p.1. ∎
4. Implementation of RA algorithms
We discuss here two issues associated with implementing RA algorithms: choice of a specific RA algorithm and choice of independent random variables w. Although all RA algorithms converge under weak conditions, as shown in the previous section, the computational effort to obtain roots to a specified precision depends upon the specific algorithm used. Similarly, computational effort depends upon the user's definition of the random variables w; our definition of w based on random numbers is a safe, but not necessarily efficient, default. We discuss these choices in the next two subsections, respectively. Despite sometimes giving quite specific implementation suggestions, our intention here is only to provide a sense of the issues.
4.1. Choice of a specific RA algorithm
Specifying an RA algorithm requires choosing three components: a rule for increasing the sample sizes {m_i}, a rule for decreasing the error bounds {ε_i}, and a method for solving the sample-path equations. We discuss each, as well as stopping rules for the algorithm as a whole.
The rule to determine the sample-size sequence {m_i} can take many forms. A reasonable family of sequences to consider is m_i being the integer part of a + c₁(m_{i−1})^b for a ≥ 0, b ≥ 1, and c₁ ≥ 1. If, as we argued earlier, Var(X*(ω_i))/Var(X*(ω_{i−1})) = m_{i−1}/m_i, setting a = 0 and b = 1 is required to make the variance ratio independent of the iteration number i. Therefore, m_i = c₁ m_{i−1}, with c₁ > 1, seems natural. Chen (1994) showed experimentally that computational performance is robust to choices of c₁ over the set {1.5, 2, 5, 10}. Smaller values of c₁ provide more times at which to stop the algorithm. To cleanly obtain only integer values of m_i, we typically use c₁ = 2. The remaining issue is the choice of the initial sample size m₁. Any small value, including m₁ = 1, is fine; some numerical root-finding methods might be able to use standard-error information about ȳ_{m₁}(x; ω₁), in which case a larger value of m₁ might be useful. But m₁ should be small because on the first iteration nothing is known about the location of the retrospective root X*(ω₁), causing the root-finding method to examine many points x, with each examination requiring m₁ observations for ȳ_{m₁}(x). The goal is that m_i is small when many points are examined in the early iterations, and that m_i grows large in later iterations when very few points need to be examined because the previous iterations provide a good initial guess x̄(ω₁, …, ω_{i−1}) of X*(ω_i).
The rule to determine the decreasing error-tolerance sequence {ε_i} also can take many forms. If the sample sizes {m_i} increase by a factor of c₁, then our earlier assumptions about variance decreasing with sample size suggest the form ε_i = c₁^{−1/2} ε_{i−1}. The problem is then to
specify an appropriate ε₁. Chosen too small, the root-finding method wastes computation finding a solution X(ω₁, …, ω_i) close to X*(ω_i), which might not be close to x* because of the small sample size. Because we are assuming that no prior information is available about sampling error, a value of ε₁ scaled to the problem at hand is not known. In our implementation discussed below and used in Section 5.1, we have used a very large value of ε₁ (such as 10⁵⁰), which eliminates the need to select a problem-dependent value; the error-tolerance logic is then reduced to being a device to prove convergence.
We allow random sequences of error tolerances to facilitate future development of RA algorithms whose ε_i at each retrospective iteration is based on an estimate of the standard error of X*(ω_i). Such algorithms would use a random sequence {ε_i}, each a factor δ, say, of the standard-error estimate of X*(ω_i). This standard error could be estimated based on Equation (7) for DRA and Equation (6) for IRA, or the asymptotic formula in Lemma 2. Choosing δ ∈ [0.1, 1] seems reasonable; less than one-tenth of a standard error is certainly too much precision and more than one standard error introduces a substantial new source of error.
The deterministic root-finding method for solving Equations (3) for i = 1, 2, … can be either analytical or numerical. Analytical approaches require a known and simpler structure of Equation (3); Healy (1992) investigates optimization problems for which an analytical solution can be obtained. Our statement of the SRFP includes no structural information; in this black-box context numerical methods must be used. The advantage is that no analyst effort is required to estimate a root; the disadvantage is that the computational effort is greater than if Equation (3) could be solved analytically. Various numerical search methods are easily implemented, are reasonably efficient, and provide error bounds. Well-known examples include bisection and modified regula falsi (also called the modified false-position method), as discussed, for example, in Conte and de Boor (1980). Such methods iterate from a starting point, which for RA algorithms would be x̄(ω₁, …, ω_{i−1}). They also require initially bounding the solution X*(ω_i), which can be done by searching points in increments or decrements of δ_i. If X*(ω_i) were known, the optimal δ_i would be |x̄(ω₁, …, ω_{i−1}) − X*(ω_i)|. Because |x̄(ω₁, …, ω_{i−1}) − X*(ω_i)| and [Var(x̄(ω₁, …, ω_{i−1}) − X*(ω_i))]^{1/2} are proportional, we could choose δ_i = c₂[Var(x̄(ω₁, …, ω_{i−1}) − X*(ω_i))]^{1/2}, where c₂ > 0, for i > 1. Chen (1994) shows that RA is robust with respect to the choices of c₂ and δ₁; the empirical results favor a small δ₁. Larger values of c₂ accelerate the bound search but result in a bigger bounding interval. Setting c₂ = 1 seems reasonable. Using the assumptions for variances and the additional assumption X*(ω_i) = x̄(ω₁, …, ω_i), then Var(x̄(ω₁, …, ω_{i−1}) − X*(ω_i)) equals v²(m_{i−1}^{−1} − m_i^{−1}) for DRA and
v²((Σⱼ₌₁^{i−1} mⱼ)^{−1} + m_i^{−1}) for IRA, and hence can be estimated based on Equation (7) or (6). The chosen root-finding method must be implemented to handle multiple roots of ȳ_{m_i}(·; ω_i) and to detect the difficult situation of no root in the range of the computer arithmetic.
The RA algorithms defined in Section 2.2 do not have stopping rules, which are irrelevant to our proofs of convergence (unlike the ε_i stopping rules for the numerical search during each retrospective iteration i). Nevertheless, all but interactive implementations must automate stopping. Most natural stopping rules center on a standard-error estimate for the current root estimator, such as Equation (6) for IRA and Equation (7) for DRA. When solving problems interactively with c₁ = 2, each iteration takes about twice as long as the previous iteration, so by retrospective iteration eight (or so) the results appear on the screen slowly enough to read in real time. Deciding when to stop is then easy. To obtain an automated stopping rule, after each retrospective iteration a standard-error estimate can be computed and the algorithm stopped when the estimate is less than a user-specified precision. Because the standard-error estimate can be misleading when the number of retrospective iterations is small, we might require at least four (or so) iterations to avoid premature stopping.
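Putting the pieces of Section 4.1 together, a minimal IRA-style loop might look as follows. This is our own illustrative sketch, not the authors' implementation: the doubling sample sizes follow the recommendation above, but the toy SRFP (estimating the median of a unit exponential, so g(x) = 1 − e^{−x} and x* = ln 2), the bisection root-finder with a fixed depth in place of an explicit error tolerance ε_i, and all function names are assumptions.

```python
import random

def ira_sketch(ybar, gamma, x_lo, x_hi, m1=2, c1=2, iterations=8, seed=0):
    """Minimal IRA sketch.  At iteration i an independent sample path of size
    m_i = c1 * m_{i-1} is drawn; a crossing root of ybar_m(x; omega) = gamma
    is located by bisection on [x_lo, x_hi] (the root is assumed bracketed);
    the estimate is the sample-size-weighted average of the solutions."""
    rng = random.Random(seed)
    m, solutions, sizes = m1, [], []
    for _ in range(iterations):
        omega = rng.getrandbits(32)   # w = random-number seed for this sample path
        def f(x):
            # common random numbers: the same m observations regenerate at every x
            return ybar(x, m, random.Random(omega)) - gamma
        lo, hi = x_lo, x_hi           # assumes f(lo) <= 0 <= f(hi)
        for _ in range(50):           # bisection in place of regula falsi
            mid = (lo + hi) / 2.0
            if f(mid) < 0.0:
                lo = mid
            else:
                hi = mid
        solutions.append((lo + hi) / 2.0)
        sizes.append(m)
        m *= c1
    return sum(mj * xj for mj, xj in zip(sizes, solutions)) / sum(sizes)

def ybar_exp(x, m, rng):
    """Sample-path estimate of g(x) = Pr{V <= x} for V ~ exponential(1)."""
    return sum(1 for _ in range(m) if rng.expovariate(1.0) <= x) / m
```

For gamma = 0.5 the true root is the median ln 2 ≈ 0.693, and `ira_sketch(ybar_exp, 0.5, 0.0, 10.0)` returns an estimate near that value, with the later (larger-sample) iterations dominating the weighted average.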
4.2. Choice of w
In Section 2 we defined w as pseudorandom numbers. This is a natural and safe choice in simulation. Depending on the model, however, w can be chosen differently for greater computational efficiency, while not affecting the solution returned by the RA algorithm. Alternative choices of w must be functionally independent of x and ideally include more of the computations needed for generating the observations used in ȳ_m. Whatever the choice, the sample {w₁, …, w_m} needs to be generated only once for ȳ_m(x; ω) to be computed at any number of points x during the process of solving the sample-path equation.
Consider again the example from Section 1, the analogy of Student's t distribution. There are five possible choices for each component w in ω = (w₁, …, w_m). In order of increasing efficiency, they are: (i) the random-number seed; (ii) the random numbers used to generate values for V₁, …, V_n; (iii) the random variates V₁, …, V_n; (iv) the pair of statistics (V̄, S); and (v) the statistic T. Because ω needs to be computed only once for solving the sample-path equation, the higher-level definitions of w lead to less total computing time. Of course, the values of w need to be stored from the computation at the first x value for use at later x values.
The choice of w affects the computing time of IRA and DRA. A high-level choice reduces computing time for both, but the ratio of computing times for IRA and DRA increases as the choice of w becomes more efficient.
Whatever the choice, if the sample-size rule is m_i = c₁ m_{i−1}, the ratio lies in the interval [1, u), where u = m_i/(m_i − m_{i−1}) = c₁/(c₁ − 1) is the rate at which the retrospective samples grow (as discussed in Section 4.1). The argument is straightforward. Of the four RA algorithm steps (Section 2.2), only the first two involve significant computing time. First consider Step 2, which solves the deterministic root-finding problem. Because at each retrospective iteration i the sample size m_i is the same for both DRA and IRA, the time for Step 2 is essentially identical, although usually non-trivial. Now consider Step 1, which generates the ω values. At retrospective iteration i, IRA must generate m_i new values but DRA has the opportunity to save and reuse the m_{i−1} = m_i/c₁ previous values. Therefore, if storing and saving the previous values is essentially free, the Step 1 computation time for IRA is about u times that for DRA. Thus, the ratio is approximately 1 if the Step 2 computation time dominates and approximately u if the Step 1 time dominates. Therefore, in practice the ratio lies in the interval [1, u), completing the argument.
A particularly important case is when w is only the initial random-number seed, in which case the ratio is one, as illustrated in our empirical results in Table 1. An efficient choice of w moves computation from Step 2 to Step 1, and therefore increases the ratio, but not beyond u. Although inefficient in our example and often elsewhere, choosing the random-number seed for w often simplifies the programming necessary to create the procedure ȳ_m. Storing a more-complex w in a discrete-event simulation would be difficult, and the bookkeeping necessary to store and retrieve the values might become non-trivial, since the order in which the values are used would change from one x value to the next. The argument of the previous paragraph would then be invalid.
5. Application and empirical results
To illustrate SRFPs and RA algorithms, we now discuss the Guaranteed-Coverage Tolerance Interval (GCTI) problem that motivates our research interest. Numerical results are provided in Section 5.1. Thiokol Corporation asked us to develop an algorithm to determine, in real time, the constant x* satisfying the non-normal α-coverage γ-confidence tolerance-interval relationship

Pr_{W̄,S}{Pr_W{W ≥ W̄ − x*S} ≥ α} = γ.   (12)

Here the product characteristic W is a continuous random variable with distribution function F_W having known shape but unknown mean and variance. (For example, possibly W is normally distributed with unknown mean and variance.) The sample mean W̄ and sample standard deviation S are computed from product characteristics W₁, W₂, …, W_n previously generated from F_W independently of each other and of W. Given sample size n, coverage α,
Table 1. DRA and IRA comparison for a GCTI problem

                                                                          CPU time (sec./1000 repl.)
      Squared bias      Variance          MSE           E[Var̂(x̄ᵢ)]          DRA              IRA
 i    DRA     IRA      DRA     IRA      DRA     IRA      DRA     IRA      R.N.   Seed      R.N.   Seed
 1    0.42    0.42     0.17    0.17     0.59    0.59      -       -         2      3         2      3
 2    0.26    0.27     0.14    0.09     0.40    0.36     0.17    0.075      6      9         6      9
 3    0.10    0.13     0.15    0.08     0.25    0.21     0.15    0.054      8     12         8     12
 4    0.03    0.05     0.15    0.07     0.18    0.12     0.11    0.040     12     18        13     18
 5    0.00    0.01     0.14    0.06     0.14    0.07     0.08    0.030     19     27        20     32
 6    0.00    0.00     0.11    0.04     0.11    0.04     0.06    0.021     33     46        40     52
 7    0.000   0.000    0.046   0.024    0.046   0.024    0.027   0.012     61     87        75     96
 8    0.000   0.000    0.022   0.012    0.022   0.012    0.015   0.007    121    168       143    178
 9    0.000   0.000    0.010   0.006    0.010   0.006    0.008   0.004    227    322       278    345
10    0.000   0.000    0.005   0.003    0.005   0.003    0.004   0.002    435    660       540    670
confidence γ, and distribution shape, the problem is to determine the value of x* so that with 100γ% confidence the random tolerance interval [W̄ − x*S, ∞) contains at least the proportion α of the distribution. The Thiokol application is to reliability design issues, but such non-normal tolerance-interval problems arise in many contexts (Chen and Schmeiser, 1995; Chen and Yang, 1999).
For this application, the root-finding function is g(x) = Pr_{W̄,S}{Pr_W{W ≥ W̄ − xS} ≥ α}, an (n+1)-dimensional integral. Numerical integration would be inefficient even for small sample sizes n. The function g(x) can be estimated easily, however, by ȳ_m(x), the sample average of m realizations of the conditional random variable

Y(x) = { 1   if Pr_W{W ≥ W̄ − xS | W̄, S} ≥ α,
       { 0   otherwise.

Notice that the random variable Y(x) is not a function of the mean E(W) or variance Var(W), because the random probability Pr_W{W ≥ W̄ − xS | W̄, S} does not depend on E(W) or Var(W). Hence the reliability Pr_W{W ≥ W̄ − xS | W̄, S} can be computed using arbitrary values of E(W) and Var(W) > 0, given W̄ and S based on a sample with these arbitrary values of the mean and variance. The Monte Carlo computer procedure for generating an observation of Y(x) then consists of four steps: (i) generate a sample W₁, W₂, …, W_n from F_W with arbitrarily chosen mean and variance; (ii) compute W̄ and S from the sample; (iii) compute p = Pr_W{W ≥ W̄ − xS | W̄, S} = 1 − F_W(W̄ − xS); and (iv) set Y(x) equal to zero if p is less than α and to one otherwise.

As discussed in Section 4.2, to implement RA the independent variable w needs to be chosen. Various choices are possible for this SRFP. As always, the pseudorandom-number seed can be used, and all computations repeated for each value of x: compute the pseudorandom numbers, transform them to random variates W_j, j = 1, …, n, compute W̄ and S, compute the reliability p, and compute Y(x). Choices of w, in order of increasing efficiency, are: (i) the pseudorandom-number seed, (ii) the pseudorandom numbers, (iii) (W₁, …, W_n), and (iv) (W̄, S).

Chen and Schmeiser (1995) analyze the Thiokol SRFP in some detail. They develop an efficient Monte Carlo algorithm to solve the problem by viewing it as a quantile-estimation problem. As expected, the special-purpose quantile-estimation algorithm is more efficient than black-box RA algorithms, which have no information about the problem's structure. Here, however, we use this example to compare the efficiency of DRA and IRA algorithms in Section 5.1 and to compare IRA to a tuned version of classical stochastic approximation in Section 5.2.
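The four-step procedure for generating Y(x) is easy to code. In this sketch we take F_W to be standard normal (any mean and variance greater than zero are valid, by the invariance argument above); the function names and the seed-based choice of w are our own illustrative assumptions:

```python
import math
import random

def y_observation(x, n, alpha, rng):
    """One observation of Y(x), following steps (i)-(iv), with F_W standard normal."""
    # (i) generate W_1, ..., W_n from F_W (arbitrary mean 0, variance 1)
    w = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # (ii) compute the sample mean and sample standard deviation
    wbar = sum(w) / n
    s = math.sqrt(sum((wi - wbar) ** 2 for wi in w) / (n - 1))
    # (iii) coverage p = 1 - F_W(wbar - x*s), with F_W the standard normal cdf
    p = 1.0 - 0.5 * (1.0 + math.erf((wbar - x * s) / math.sqrt(2.0)))
    # (iv) indicator of achieving coverage alpha
    return 1 if p >= alpha else 0

def ybar_m(x, m, n, alpha, seed):
    """Sample-path estimate of g(x): the average of m observations of Y(x).
    Using the seed as w regenerates the same m samples at every x, so
    ybar_m is a deterministic function of x for a fixed seed."""
    rng = random.Random(seed)
    return sum(y_observation(x, n, alpha, rng) for _ in range(m)) / m
```

Since p increases with x whenever S > 0, each Y(x; w) is nondecreasing in x, so for a fixed seed ȳ_m(·; ω) is a nondecreasing step function of x that crosses the level γ at most once, as in the monotone case covered by Lemma 1.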
5.1. Comparing DRA and IRA
Using a GCTI application, we illustrate the statistical and computational efficiency of specific DRA and IRA algorithms with a Monte Carlo experiment. For this application, IRA converges faster than DRA, the estimators of sampling error work reasonably well after the first few iterations, and saving the random numbers rather than only the seed value reduces computational effort by about one third for DRA and about one fifth for IRA.
Following the discussion in Section 4.1, we choose the following RA components:
• Sample-size sequence: m₁ = 2 and m_{i+1} = c₁ m_i, where c₁ = 2. With these choices, each new retrospective iteration uses a sample size almost equal to the total used by all previous iterations.
• Error-tolerance sequence: ε₁ = 10⁵⁰ and ε_{i+1} = c₁^{−1/2} ε_i. That is, we essentially remove the error-tolerance logic.
• A numerical root-finding method that returns a retrospective solution, an approximation of the root of Equation (3): The initial solution is, rather arbitrarily, x₀ = 1. At retrospective iteration i, ȳ_{m_i} is computed at the current root estimate x̄(ω₁, …, ω_{i−1}) for i > 1 and at x₀ for i = 1, which yields either an upper or a lower bound. The other bound is found by searching either left or right at a distance δ_i. This distance is doubled until the retrospective root is bounded. The sequence {δ_i} is defined by δ₁ = 0.0001 and δ_i = c₂[Var̂(x̄(ω₁, …, ω_{i−1}) − X*(ω_i))]^{1/2}, except when this value is zero, in which case δ_i = δ_{i−1}. See Section 4.1, where c₂ = 1 is suggested. Once the retrospective root is bounded, regula falsi search is used to find a retrospective solution within the error tolerance ε_i. Because of the large error-tolerance values, the logic reduces to two steps: (i) bounding the retrospective solution, and (ii) returning the linear interpolate of the bounds as the retrospective solution. We refer to this variation as the Bounding RA algorithm.
We intentionally did not tune the algorithm parameters x₀, m₁, δ₁, c₁, and c₂ to this application. As discussed in Chen (1994), IRA performance is robust to these parameter values, both in that these values work well over many applications and in that small changes in the values have little effect on performance.
The application is the GCTI problem with n = 10, α = γ = 0.99, and a Johnson distribution (Johnson, 1949) with skewness 4 and kurtosis 30. The true root is the tolerance factor x* ≈ 1.938 (see Chen and Schmeiser, 1995, Table 2).
Table 1 compares DRA and IRA performance based on 20 independent runs of 1000 independent Monte Carlo replications. Each replication solves the GCTI application once, with ω̄ = (ω₁, ω₂, …) generated independently of other replications. All digits shown are statistically significant, except possibly the last digits of the cpu times. For iterations i = 1, …, 10, the quality of the solutions is measured by the squared bias [E(x̄(ω₁, …, ω_i)) − x*]², variance Var(x̄(ω₁, …, ω_i)), and mean square error (squared bias plus variance, denoted by MSE), shown in columns two through four, respectively. The fifth column shows E[Var̂(x̄ᵢ)], the mean of the DRA and IRA variance estimators Var̂(x̄(ω₁, …, ω_i)) from Equations (7) and (6), respectively.
Table 1 shows that IRA produces better solutions than DRA for every retrospective iteration after the first, where the two algorithms are identical. For both DRA and IRA the squared biases quickly become negligible compared to the variance. The IRA bias accumulates the biases from past iterations and hence is bigger than, but decreases at about the same rate as, the DRA bias. Because the biases are small, the variances and mse's are nearly identical after the first few retrospective iterations. For both DRA and IRA these decrease by about 50% with each retrospective iteration, as is expected because c₁ = 2 and therefore the sample sizes are doubling.
The final few iterations suggest that asymptotically IRA has mse and variance that are 50 to 60% those of DRA. This seems reasonable because IRA has the benefit of using twice (or c₁/(c₁ − 1) in general) as many independent observations as DRA (mᵢ for DRA and Σⱼ₌₁ⁱ mⱼ for IRA).

The averages of the variance estimators V̂ar(x(ω₁, …, ωᵢ)) from Equations (7) and (6) are shown in the fifth column. Although these estimators correctly sense the order of magnitude of the sampling error, they underestimate the variance from the third column. The relative error is less for DRA than for IRA, and both relative errors decrease with retrospective iteration number i. The biases in the variance estimators arise because the early iterations do not have the asymptotic behavior assumed in the derivations of Equations (7) and (6), due to the arbitrary starting point x₀ = 1. Better variance estimators might be found, perhaps by taking a moving average of only the previous few iterations. The current estimators work well at their primary purpose, which is to predict the proper scaling (via δᵢ₊₁) for the next retrospective iteration.
The two right-most columns show the number of cpu seconds required for 1000 Monte Carlo replications on a Sun SparcCenter 1000 computer. The times almost double with each retrospective iteration because the sample sizes are doubling (c₁ = 2). The times to generate the random numbers, transform them to random variates, and compute the observations y quickly dominate the fixed cost of each iteration.
Times for four RA algorithms, two versions each of DRA and IRA, are shown. Both versions, which correspond to implementations using two different ω's, create the same realizations and therefore have the same statistical properties; they differ only in the method of computing. The columns labeled R.N. correspond to an implementation in which the random numbers are stored when first generated. The columns labeled Seed correspond to storing only the initial random-number seed, from which random numbers are recomputed as needed. The latter is simpler to code, primarily because it requires a fixed amount of storage. The R.N. versions are faster, but negligibly so for IRA. The decrease in DRA times is greater than for IRA because in DRA each random number is used again and again, whereas in IRA each random number is used in only one retrospective iteration. The ratio of the IRA time to the DRA time is about one for the Seed versions and 1.2 for the R.N. versions. These results are consistent with our argument in Section 4.2 that the ratio lies in [1, u), where u = 2 here. The speed improvement could differ considerably with yet another choice of ω (e.g., storing the random samples or, better, only the sample means and standard deviations) or in another application.
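The Seed variant relies on pseudorandom reproducibility: regenerating a stream from its stored seed yields exactly the same observations as storing the numbers themselves. A toy Python illustration, in which the observation model y = x − 2 + noise is a hypothetical stand-in:

```python
import random

def ybar(x, seed, m):
    """Sample mean of m observations y(x), regenerated on demand from a
    stored seed, so only the seed (not the random numbers) must be kept.
    The observation model y = x - 2 + noise is a hypothetical stand-in."""
    rng = random.Random(seed)        # recompute the same stream every call
    total = 0.0
    for _ in range(m):
        u = rng.random()
        total += x - 2.0 + (u - 0.5)
    return total / m
```

Because `random.Random(seed)` reproduces an identical stream, `ybar(x, 42, m)` returns the same value on every call; the Seed version thus matches the R.N. version's realizations exactly, trading recomputation time for storage.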
A natural way to compare algorithm performance is via the generalized mse, the product of cpu time and mse.
[Figure 2 appears here: generalized mse versus N (0 to 8000), with curves for SA (x₀ ~ N(x*, 1), m = 5) and IRA (x₀ ~ N(x*, 10⁴), m₁ = 2, c₁ = 2).]
In this sense, IRA dominates DRA. The IRA mse is only half of the DRA mse, while the cpu time ratio (IRA to DRA) is no larger than two for any choice of ω. Overall, IRA/R.N. has the best generalized mse. IRA/Seed, which is only a bit slower, is likely to be a better choice in practice because it is easier to implement and requires finite storage, qualities not measured in generalized mse. If the algorithms are to be run for many iterations, R.N. versions would eventually require disk (rather than random-access memory) storage, and the elapsed times would quickly be longer than for Seed versions.
5.2. Comparing IRA and stochastic approximation
We now compare an IRA algorithm to a Stochastic Approximation (SA) algorithm; in the GCTI application IRA has much smaller sampling error than SA. As shown in Fig. 2, RA algorithms are substantially more efficient than Robbins and Monro's SA, the classical black-box solution method, even when it is tuned to the application.
Figure 2 compares IRA/Seed and SA using the GCTI application with tolerance parameters n = 5, α = 0.5, γ = 0.9, and normal population, for which x* = 0.6857. The IRA/Seed algorithm used here is the same as that in Table 1 except that the initial point x₀ is generated randomly from the normal distribution with mean x* and variance 10⁴ (denoted by N(x*, 10⁴)). The early convergence rate of SA depends on the initial point and the sample size (for example, Chen (1994, p. 71) and Fu and Healy (1992)), although the asymptotic convergence rate does not. We generate the initial point from N(x*, 1) and define each SA iteration to use the average of five observations, which is the (empirically determined) optimal value associated with this random initial point (Chen, 1994, p. 71). The performance measure here is again generalized mse, except that now computing effort is measured by N, the total number of y(x) observations computed. On a Sun SparcStation 2 computer, N = 8000 corresponds to about 30 seconds of cpu time. Figure 2 shows that IRA/Seed, despite having less information about the location of the root, and not being tuned to the application, converges faster than SA for these numbers of observations. The sample size N needed for the SA generalized mse to drop below that of IRA is, as can be inferred from the figure, so large that in practice IRA performance is substantially better.
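For concreteness, a bare-bones Robbins-Monro recursion with each iteration averaging m observations might look like the following sketch. The step-size constant a = 1 and the observation model in the example below are illustrative assumptions, not the tuned settings behind the figure.

```python
import random

def robbins_monro(y, x0, iterations, m=5, a=1.0, rng=None):
    """Robbins-Monro stochastic approximation for g(x) = 0 with a
    non-decreasing g, each iteration averaging m noisy observations
    y(x, rng).  The gain a/i satisfies the classical step-size
    conditions (sum of a/i diverges, sum of (a/i)^2 converges).
    A sketch of the SA competitor, not the authors' exact code."""
    rng = rng or random.Random(0)
    x = x0
    for i in range(1, iterations + 1):
        ybar = sum(y(x, rng) for _ in range(m)) / m
        x -= (a / i) * ybar   # step against the sign of the estimate of g(x)
    return x
```

With a hypothetical observation model such as y(x) = (x − 0.5) + noise, the recursion settles near the root 0.5; the averaging over m = 5 observations per iteration is the empirically tuned choice described above.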
That the SA asymptotic generalized mse is less might seem surprising. The asymptotic generalized mse ratio (IRA to SA) for this Thiokol example is about 2.3, which is quite close to the asymptotic number of points at which IRA evaluates ȳₘ(x) at each retrospective iteration i (Chen, 1994). Therefore, the asymptotic performance seems to be the same if we count sample-path approximations for IRA and function evaluations for SA. This makes sense because asymptotically the function g is linear and the slope at the root is known, so more than one function evaluation is wasted. But the additional function evaluations are, in fact, quite useful early, when g is not linear, which may explain why IRA performs better than SA in practice.

Fig. 2. A comparison of the IRA and SA algorithms (generalized mse versus N, the number of y's generated).
6. Conclusions
We introduce a family of retrospective approximation (RA) algorithms to solve SRFPs. The algorithms are based on solving sample-path approximations to the problem of interest; that is, pseudorandom data from the problem are generated and used to create a sequence of approximate problems, which have increasing sample sizes and decreasing solution-error tolerances. Our algorithms differ from retrospective optimization algorithms in that they are iterative and in that they explicitly allow some error in the solutions to the approximate problems. The latter is necessary if the approximate problems are to be solved numerically.
In addition to introducing retrospective approximation algorithms, we prove convergence for one-dimensional SRFPs, discuss implementation issues, and specify two algorithm variations: Dependent Retrospective Approximation (DRA), which uses all past observations at each iteration, and Independent Retrospective Approximation (IRA), which uses each observation in only one iteration. Monte Carlo results for an application, as well as some analytical arguments, indicate that IRA is superior to DRA. Monte Carlo results for a related application show that IRA has smaller generalized mean squared error than a version of stochastic approximation that is tuned to the application.
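Schematically, an IRA-style loop has the following shape. This is a structural sketch only: `solve(m, seed, x_start)` stands for any sample-path root finder, and the sample-size-weighted combination of the independent retrospective roots is our simplified stand-in for the estimator analyzed in the paper.

```python
def ira(solve, m1=2, c1=2, iterations=8, x0=1.0, seed0=123):
    """Skeleton of Independent Retrospective Approximation: iteration i
    solves an independent sample-path approximation of size
    m_i = c1 * m_{i-1}, warm-started at the current combined estimate,
    and the retrospective roots are combined by a sample-size-weighted
    average.  A structural sketch, not the paper's exact estimator."""
    m, x_bar = m1, x0
    roots = []
    for i in range(iterations):
        xi = solve(m, seed0 + i, x_bar)   # fresh observations each iteration
        roots.append((m, xi))
        total = sum(mj for mj, _ in roots)
        x_bar = sum(mj * xj for mj, xj in roots) / total
        m *= c1                           # grow the sample size
    return x_bar
```

Because each iteration draws a fresh, independent sample path, every observation is used exactly once, which is the property that distinguishes IRA from DRA.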
Recommendations for future work in this area include: (i) proposing specific RA algorithms and proving convergence for multi-dimensional SRFPs; (ii) deriving asymptotic distributions for the root estimator; and (iii) extending RA from SRFPs to more general optimization problems.
Acknowledgements
This research is supported by Purdue Research Foundation Grant 690-1287-2104, Thiokol Corporation Contract A46111430, and NSF Grant DMS 93-00058. We thank Colm O'Cinneide for helpful discussions, Anton Kleywegt for helpful comments, and IIE Transactions editor Jim Wilson for thoughtful suggestions that substantially improved the presentation.
References
Andradóttir, S. (1992) An empirical comparison of stochastic approximation methods for simulation optimization, in Proceedings of the First Industrial Engineering Research Conference, Klutke, G., Mitta, D.A., Nnaji, B.O. and Seiford, L.M. (eds), Institute of Industrial Engineers, Norcross, GA, pp. 471-475.

Banks, J., Carson II, J.S. and Nelson, B.L. (1996) Discrete-Event System Simulation, Prentice Hall, Upper Saddle River, NJ.

Billingsley, P. (1979) Probability and Measure, John Wiley & Sons, New York, NY.

Chen, H. (1994) Stochastic Root Finding in System Design, Ph.D. Dissertation, School of Industrial Engineering, Purdue University, West Lafayette, IN.

Chen, H.-S. (1998) Multi-dimensional independent retrospective approximation for making resource allocation decisions in manufacturing systems, Master's Thesis, Department of Industrial Engineering, Da-Yeh University, Chang-Hwa, Taiwan (in Chinese).

Chen, H. and Schmeiser, B.W. (1994a) Stochastic root finding: problem definition, examples, and algorithms, in Proceedings of the Third Industrial Engineering Research Conference, Burke, L. and Jackman, J. (eds), Institute of Industrial Engineers, Norcross, GA, pp. 605-610.

Chen, H. and Schmeiser, B.W. (1994b) Retrospective approximation algorithms for stochastic root finding, in Proceedings of the 1994 Winter Simulation Conference, Tew, J.D., Manivannan, S., Sadowski, D.A. and Seila, A.F. (eds), Institute of Electrical and Electronics Engineers, Piscataway, NJ, pp. 255-261.

Chen, H. and Schmeiser, B.W. (1995) Monte Carlo estimation for guaranteed-coverage nonnormal tolerance intervals. Journal of Statistical Computation and Simulation, 51, 223-238.

Chen, H. and Yang, T.-K. (1999) Computation of the sample size and coverage for guaranteed-coverage nonnormal tolerance intervals. Journal of Statistical Computation and Simulation, 63, 299-320.

Conte, S.D. and de Boor, C. (1980) Elementary Numerical Analysis: An Algorithmic Approach, McGraw-Hill, New York, NY.

Fu, M.C. (1994) Optimization via simulation: a review. Annals of Operations Research, 53, 199-247.

Fu, M.C. and Healy, K.J. (1992) Simulation optimization of (s,S) inventory systems, in Proceedings of the 1992 Winter Simulation Conference, Swain, J.J., Goldsman, D., Crain, R.C. and Wilson, J.R. (eds), Institute of Electrical and Electronics Engineers, Piscataway, NJ, pp. 506-514.

Fu, M.C. and Hill, S.D. (1997) Optimization of discrete event systems via simultaneous perturbation. IIE Transactions, 29, 233-243.

Gürkan, G., Özge, A.Y. and Robinson, S.M. (1994) Sample-path optimization in simulation, in Proceedings of the 1994 Winter Simulation Conference, Tew, J.D., Manivannan, S., Sadowski, D.A. and Seila, A.F. (eds), Institute of Electrical and Electronics Engineers, Piscataway, NJ, pp. 247-254.
Healy, K.J. (1992) Optimizing Stochastic Systems: A Retrospective/Deterministic Approach, Ph.D. Dissertation, Department of Operations Research, Cornell University, Ithaca, NY.

Healy, K.J. and Schruben, L.W. (1991) Retrospective simulation response optimization, in Proceedings of the 1991 Winter Simulation Conference, Nelson, B.L., Kelton, W.D. and Clark, G.M. (eds), Institute of Electrical and Electronics Engineers, Piscataway, NJ, pp. 954-957.

Healy, K.J. and Xu, Y. (1994) Simulation based retrospective approaches to stochastic system optimization. Research Memorandum No. 94-14, School of Industrial Engineering, Purdue University, West Lafayette, IN.

Healy, K.J. and Xu, Y. (1995) Simulation optimization of a bulk server. Research Memorandum No. 95-3, School of Industrial Engineering, Purdue University, West Lafayette, IN.

Homem-de-Mello, T., Shapiro, A. and Spearman, M.L. (1999) Finding optimal material release times using simulation based optimization. Management Science, 45, 86-102.

Huber, P.J. (1964) Robust estimation of a location parameter. Annals of Mathematical Statistics, 35, 73-101.

Johnson, N.L. (1949) Systems of frequency curves generated by methods of translation. Biometrika, 36, 149-176.

Law, A.M. and Kelton, W.D. (2000) Simulation Modeling and Analysis, McGraw-Hill, New York, NY.

L'Ecuyer, P., Giroux, N. and Glynn, P.W. (1994) Stochastic optimization by simulation: numerical experiments with the M/M/1 queue in steady state. Management Science, 40, 1245-1261.

L'Ecuyer, P. and Glynn, P.W. (1994) Stochastic optimization by simulation: convergence proofs for the GI/G/1 queue in steady state. Management Science, 40, 1562-1578.

Loève, M. (1977) Probability Theory I, Springer-Verlag, New York, NY.

Plambeck, E.L., Fu, B.-R., Robinson, S.M. and Suri, R. (1996) Sample-path optimization of convex stochastic performance functions. Mathematical Programming, 75, 137-176.

Robbins, H. and Monro, S. (1951) A stochastic approximation method. Annals of Mathematical Statistics, 22, 400-407.

Robinson, S.M. (1996) Analysis of sample-path optimization. Mathematics of Operations Research, 21, 513-528.

Rubinstein, R.Y. and Shapiro, A. (1993) Discrete Event Systems: Sensitivity Analysis and Stochastic Optimization by the Score Function Method, John Wiley & Sons, New York, NY.

Serfling, R.J. (1980) Approximation Theorems of Mathematical Statistics, John Wiley & Sons, New York, NY.

Shapiro, A. (1996) Simulation-based optimization: convergence analysis and statistical inference. Communications in Statistics - Stochastic Models, 12, 425-454.

Shapiro, A. and Wardi, Y. (1996) Convergence analysis of stochastic algorithms. Mathematics of Operations Research, 21, 615-628.

Shapiro, A. and Homem-de-Mello, T. (1997) A simulation-based approach to two-stage stochastic programming with recourse. Mathematical Programming, 81, 301-325.

Simon, B. (1998) Convergence of a stochastic rootfinding procedure. Working Paper, Department of Mathematics, University of Colorado at Denver, Denver, CO.
Appendix
Proof of Lemma 3. Without loss of generality, assume that γ = 0. By the second assumption, for every ε > 0, w.p.1 there exists a positive integer N(ε, ω) such that

    m > N(ε, ω) implies g(x) − ε < ȳₘ(x; ω) < g(x) + ε for all x ∈ ℝ.

By the first assumption g is a non-decreasing function, so the inverse function g⁻¹(a) = inf{x : g(x) ≥ a} is also non-decreasing. Now choose a particular positive value of ε, small enough that both g⁻¹(−ε) and g⁻¹(ε) are defined. Because ε > 0 and x* is the unique root, g⁻¹(−ε) ≤ x* ≤ g⁻¹(ε).

Now, consider the region of the crossing set. Because "x > g⁻¹(ε)" implies "g(x) ≥ ε", w.p.1 for every m > N(ε, ω),

    ȳₘ(x; ω) > g(x) − ε ≥ ε − ε = 0, for every x > g⁻¹(ε).

Similarly, w.p.1 for every m > N(ε, ω),

    ȳₘ(x; ω) < g(x) + ε ≤ −ε + ε = 0, for every x < g⁻¹(−ε).

Therefore, for every m > N(ε, ω), the crossing set Cₘ(ω) satisfies

    Cₘ(ω) ⊆ [g⁻¹(−ε), g⁻¹(ε)] w.p.1.

To prove that the crossing set Cₘ(ω) is non-empty, we let

    x⁺ ≡ inf{x : ȳₘ(z; ω) > 0 for every z > x},   (A1)

and

    x⁻ ≡ sup{x : ȳₘ(z; ω) < 0 for every z < x}.   (A2)

Since ȳₘ(z; ω) > 0 for every z > g⁻¹(ε) and ȳₘ(z; ω) < 0 for every z < g⁻¹(−ε), we see that

    x⁺ ≤ g⁻¹(ε) and x⁻ ≥ g⁻¹(−ε).   (A3)

Next we prove that

    x⁻ ≤ x⁺   (A4)

by contradiction. Assume that x⁺ < x⁻. For every x ∈ (x⁺, x⁻), we can pick z ∈ (x⁺, x) arbitrarily; applying the definition (A1) of x⁺, we must have ȳₘ(x; ω) > 0 since x > z > x⁺. Similarly, we can pick z′ ∈ (x, x⁻) arbitrarily; applying the definition (A2) of x⁻, we must have ȳₘ(x; ω) < 0 since x < z′ < x⁻. This is a contradiction, since ȳₘ(x; ω) cannot be both positive and negative; thus the assumption that x⁺ < x⁻ must be false. This establishes the inequality (A4).

Now if x⁻ = x⁺, then for every neighborhood 𝒩(x⁺) = 𝒩(x⁻) of the point x⁺ = x⁻, we can find points z and z′ satisfying

    z > x⁺ and z ∈ 𝒩(x⁺) so that ȳₘ(z; ω) > 0,
    z′ < x⁻ and z′ ∈ 𝒩(x⁻) = 𝒩(x⁺) so that ȳₘ(z′; ω) < 0;   (A5)

and it follows immediately that x⁻ = x⁺ is on the boundary of Rᴺ and on the boundary of Rᴾ. Thus we see that in this case x⁻ = x⁺ ∈ Rᶻ ∪ Rᴺᴾ = Cₘ(ω).

If x⁻ < x⁺, then both x⁻ and x⁺ are crossing roots. Here we prove this for x⁻ only; the proof for x⁺ is similar. For every neighborhood 𝒩(x⁻) of x⁻, we can find points z and z′ satisfying

    z ∈ 𝒩(x⁻) so that ȳₘ(z; ω) ≥ 0,
    z′ ∈ 𝒩(x⁻) so that ȳₘ(z′; ω) < 0;   (A6)

otherwise, sup{x : ȳₘ(z; ω) < 0 for every z < x} > x⁻, which contradicts the definition of x⁻. There are now three cases to consider:

(a) We have ȳₘ(x⁻; ω) > 0, so that x⁻ ∈ Rᴾ; it follows from (A6) that x⁻ is on the boundary of Rᴺ or Rᶻ, so that x⁻ ∈ Rᴺᴾ.
(b) We have ȳₘ(x⁻; ω) = 0; in this case we have immediately that x⁻ ∈ Rᶻ.
(c) We have ȳₘ(x⁻; ω) < 0, so that x⁻ ∈ Rᴺ; it follows from (A6) that x⁻ is on the boundary of Rᴾ or Rᶻ, so that x⁻ ∈ Rᴺᴾ.

Therefore x⁻ is a crossing root. Using a similar argument, we can show that x⁺ is also a crossing root. We can summarize these results as follows:

    x⁻ = x⁺ implies that x⁻ = x⁺ ∈ Rᶻ ∪ Rᴺᴾ = Cₘ(ω),
    x⁻ < x⁺ implies that x⁻, x⁺ ∈ Cₘ(ω).   (A7)

Combining (A4) and (A7), we see that Cₘ(ω) must be non-empty.

Let x*(m) be any point in Cₘ(ω). Because x* is in the interval [g⁻¹(−ε), g⁻¹(ε)], for every m > N(ε, ω)

    |x*(m) − x*| ≤ g⁻¹(ε) − g⁻¹(−ε) w.p.1.

Since ε can be arbitrarily small and x* is the unique root, for any selection rule defining the solution sequence {x*(m)},

    lim_{m→∞} x*(m) = x*, w.p.1.   ∎

Proof of Lemma 4. Without loss of generality, assume γ = 0. By the second assumption, for every x ∈ ℝ and every η > 0, w.p.1 there exists a positive integer N(x, η, ω) such that

    m > N(x, η, ω) implies |ȳₘ(x; ω) − g(x)| < η.

Given any ε > 0, the first assumption implies

    g(x* + ε) > 0 and g(x* − ε) < 0.

Now, choose x = x* + ε and η = g(x* + ε), and define N₁(x*, ε, ω) = N(x, η, ω). Then for m > N₁(x*, ε, ω)

    |ȳₘ(x* + ε; ω) − g(x* + ε)| < g(x* + ε) w.p.1.

Therefore, w.p.1, ȳₘ(x* + ε; ω) > 0 for every m > N₁(x*, ε, ω). Similarly, choose x = x* − ε and
η = −g(x* − ε) > 0, and define N₂(x*, ε, ω) = N(x, η, ω). Then, for m > N₂(x*, ε, ω),

    |ȳₘ(x* − ε; ω) − g(x* − ε)| < −g(x* − ε) w.p.1.

Therefore, w.p.1, ȳₘ(x* − ε; ω) < 0 for every m > N₂(x*, ε, ω).

Define N₃(x*, ε, ω) = max{N₁(x*, ε, ω), N₂(x*, ε, ω)}. Then, for every m > N₃(x*, ε, ω)

    ȳₘ(x* − ε; ω) < 0 < ȳₘ(x* + ε; ω) w.p.1.

Therefore, w.p.1 there exists at least one crossing root in [x* − ε, x* + ε] and hence the crossing set Cₘ(ω) is not empty. (The argument is similar to that in the proof of Lemma 3.) Furthermore, by the third assumption, Cₘ(ω) = [xᴸ(ω), xᵁ(ω)] for every m, where xᴸ = sup{x : ȳₘ(x; ω) < γ} and xᵁ = inf{x : ȳₘ(x; ω) > γ}. Hence, for every m > N₃(x*, ε, ω)

    x* − ε ≤ xᴸ ≤ xᵁ ≤ x* + ε w.p.1,

and therefore Cₘ(ω) ⊆ [x* − ε, x* + ε] w.p.1. Because x*(m) is a point selected from Cₘ(ω), w.p.1 every x*(m) ∈ [x* − ε, x* + ε] for m > N₃(x*, ε, ω). Hence, for any selection rule defining the solution sequence {x*(m)},

    lim_{m→∞} x*(m) = x* w.p.1.   ∎
Biographies
Huifen Chen is an Associate Professor in the Department of Industrial Engineering at Chung Yuan Christian University in Taiwan. She completed her Ph.D. in the School of Industrial Engineering at Purdue University in August 1994. She received a B.S. degree in Accounting from National Cheng-Kung University in Taiwan in 1986 and an M.S. degree in Statistics from Purdue University in 1990. Her current research focuses on stochastic root finding, random-vector generation, non-normal tolerance intervals, and stochastic operations research applied in reliability and transportation.

Bruce Schmeiser is a Professor in the School of Industrial Engineering at Purdue University. He received his Ph.D. from the School of Industrial and Systems Engineering at Georgia Tech in 1975; his undergraduate degree in the Mathematical Sciences and Master's degree in Industrial Engineering are from The University of Iowa. His research interests include stochastic root finding, simulation output analysis, input modeling, random-variate generation, variance-reduction techniques, Markov chain Monte Carlo methods, and applied operations research. He has served in a variety of roles for the IIE, INFORMS and the Winter Simulation Conference.