
IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. AC-19, NO. 4, AUGUST 1974 349

Stochastic Approximation Type Methods for Constrained Systems: Algorithms and Numerical Results

HAROLD J. KUSHNER, FELLOW, IEEE, AND TOM GAVIN, MEMBER, IEEE

Abstract—A stochastic version of the standard nonlinear programming problem is considered. A function f(x) is observed in the presence of noise, and we seek to minimize f(x) for x ∈ C = {x: q^i(x) ≤ 0}, where the q^i(x) are constraints. Numerous practical examples exist. Algorithms are discussed for selecting a sequence X_n which converges wp1 to a point where a necessary condition for optimality holds. The algorithms use, of course, only noise-corrupted observations on the f(x). Numerical results are presented. They indicate that the approach is quite versatile and can be a useful tool for systematic Monte-Carlo optimization of constrained systems, a much-neglected area. However, many practical problems remain to be resolved, e.g., investigation of efficient one-dimensional search methods and of the tradeoffs between the effort spent per search cycle and the number of search cycles.


I. INTRODUCTION

Let f(x), q^i(x), i = 0,···,s, denote real-valued continuously differentiable functions on Euclidean r-space. (It is convenient to assume below that the second derivatives of f(·) are bounded.) The problem of minimizing f(x), for x in the set C = {x: q^i(x) ≤ 0, i = 1,···,s}, is the standard nonlinear programming problem. In this paper the problem of minimizing f(x) over C is considered, but it is not assumed that f(x) is known; information on f(x) can be obtained only via noise-corrupted observations. The algorithms presented here yield a sequence X_n of random variables which converge to a point x̄ which satisfies a suitable necessary condition (Fritz John), denoted by NC, for constrained optimality.

The problem is essentially a constrained version of the types of problems to which stochastic approximation has generally been applied. Indeed, heavy use is made of the ideas of both stochastic approximation and nonlinear programming, the former in the general sense treated in Kushner [1] (unconstrained) and Kushner and Gavin [2],[3] (unconstrained), the latter in the "computationally-oriented" sense of Polak [4], and both together in the sense of Kushner [5] (constrained). The article draws heavily on the last reference.

This paper will present algorithms and theorems which yield a sequence {X_n} such that any limit point satisfies NC. A number of typical simulation results are also discussed. The class of problems treated is not only interesting and important, and admits of many fascinating (theoretically and computationally) variations, but has been almost entirely neglected in the literature. Many types of algorithms are possible, and it is hoped that this article, besides presenting results of interest, will also encourage further development.

Manuscript received July 5, 1973. Paper recommended by D. D. Sworder, Past Chairman of the SCS Stochastic Control Committee. This work was supported in part by Grants NONR N00014-67-A-0191-0018, NSF GK 3107351, and AFOSR-71-20788. The authors are with the Department of Applied Mathematics and Engineering, Brown University, Providence, R.I.

Many examples of interest fit the above format, both within and without control theory.

Example 1: Consider a material whose average breaking strength and whose unit cost depend on a vector parameter x. It is desired to maximize the average breaking strength under a cost constraint; the cost is known for each parameter value, but the actual breaking strength can only be observed experimentally, and is a random variable for each value of x.

Example 2: Let Y_0 be given and {Y_n, n ≥ 0} defined by Y_{n+1} = F(Y_n, u_n, ψ_n), n = 0,1,···,N−1, where {ψ_n} is a sequence of random variables, F(·,·,·) is a known function, and u_0,···,u_{N−1} is a sequence of parameters which is to be chosen to minimize some function Ek(Y_N) ≡ EK(u_0,···,u_{N−1}; ψ_0,···,ψ_{N−1}) ≡ f(u_0,···,u_{N−1}), under a constraint (where we write k(Y_N) ≡ K(u_0,···,u_{N−1}; ψ_0,···,ψ_{N−1})).

For example, if u_i represents the fuel used on the ith step, then the constraint could be a fuel constraint Σ_{i=0}^{N−1} u_i ≤ U, where U is a given constant. Generally, even if the distribution of {ψ_n} is known, it is difficult (and usually practically impossible in nonlinear problems) to evaluate Ek(Y_N) analytically, and even more difficult to minimize it analytically with respect to (u_0,···,u_{N−1}). Suppose that the distribution of {ψ_n} is known. Then one frequently resorts to simulation, where, by generating sample paths {Y_n} for a {ψ_n} sequence selected according to the given distribution, (zero-mean) noise-perturbed samples (of the objective function Ek(Y_N)) of the type k(Y_N) = Ek(Y_N) + (k(Y_N) − Ek(Y_N)) are observed.

Very commonly, the derivatives of k(Y_N) with respect to (u_0,···,u_{N−1}) can also be simulated. For illustrative purposes, suppose that u is a single (vector-valued) parameter (which does not depend on n). Then we may also be able to simulate the equations (for each given {ψ_n} sequence)

dY_{n+1}/du = ∂F(Y_n, u, ψ_n)/∂u + [∂F(Y_n, u, ψ_n)/∂Y_n] (dY_n/du),   n = 0,···,N−1
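Example 2's path-plus-derivative simulation can be sketched as below. This is a minimal illustration, not taken from the paper: the scalar dynamics F, its partial derivatives, and the terminal cost k are stand-ins chosen only to make the recursions concrete.

```python
import numpy as np

rng = np.random.default_rng(0)

def F(y, u, psi):
    # Stand-in dynamics: a known function of state, parameter, and noise.
    return 0.9 * y + u + psi

def dF_du(y, u, psi):
    return 1.0          # partial of F with respect to u (stand-in)

def dF_dy(y, u, psi):
    return 0.9          # partial of F with respect to Y_n (stand-in)

def noisy_sample(u, y0=1.0, N=10):
    """Simulate one path: returns k(Y_N) and dk(Y_N)/du, i.e. noise-
    perturbed samples of Ek(Y_N) and its derivative, obtained via the
    recursion dY_{n+1}/du = dF/du + (dF/dY_n)(dY_n/du)."""
    y, dy_du = y0, 0.0
    for _ in range(N):
        psi = rng.normal()                                    # one psi_n draw
        dy_du = dF_du(y, u, psi) + dF_dy(y, u, psi) * dy_du   # derivative recursion
        y = F(y, u, psi)                                      # state recursion
    k = y ** 2                  # stand-in terminal cost k(Y_N)
    return k, 2 * y * dy_du     # chain rule: dk/du = k'(Y_N) dY_N/du

k_sample, grad_sample = noisy_sample(u=0.5)
```

Averaging many such samples estimates Ek(Y_N) and its gradient; a single sample is the kind of noisy observation the algorithms below consume.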


to get a noisy estimate of the derivative ∂Ek(Y_N)/∂u of the objective function in the form ∂Ek(Y_N)/∂u + ζ, where ζ represents "derivative" observation noise.

In this paper it is assumed that (zero-mean) noise-corrupted observations of ∇f(x) (i.e., ∇f(x) + ξ) can be observed. The noise can depend on the current parameter x or X_n, etc. The method of the paper can easily be adapted to handle the problem where only f(x) + noise is observed. The differences in development and in the basic methods of proof are slight, but there is a significant notational burden in the latter case. Indeed, [2],[3],[5] all deal with the case where only f(x) + noise is observed, and the standard finite-difference techniques of stochastic approximation can be adopted here also. The practical interest motivating the paper arose in dynamical systems, in examples such as Example 2, although the results are much more broadly applicable.

III. A FEASIBLE DIRECTIONS METHOD: GENERAL OUTLINE

...influenced ours.

h_n will not be a unit vector but, rather, |h_n| ≤ 1, where h = (h_1, h_2, ···), in conformance with the usual practice in the deterministic method of feasible directions.

Consider the unconstrained problem: minimize a continuously differentiable function f(·). If f(X_{n+1}) − f(X_n) ≤ −g(|∇f(X_n)|), where g(·) satisfies the conditions above, then any bounded and convergent sequence {X_n} converges to a point x̄ where ∇f(x̄) = 0; i.e., where the necessary condition holds.

Page 3: Stochastic approximation type methods for constrained systems: Algorithms and numerical results

HUSHNER AND GAVIN : SMCH.4STIC BPPROSIMATION 351

This concept is crucial in the stochastic case, where we cannot, even in principle, exactly minimize φ_n(z) on I_n^+. This remark will help explain Conditions 3 and 4 below.

Discussion of Conditions 3 and 4

Condition 3 will simply guarantee (essentially) that γ(ε, X_n, h_n) is bounded above by a suitable (negative) function of ε_0 2^{−k_n} sufficiently often, even though k_n is unknown. The function n_1(·) is introduced since there is no need for Condition 3 to hold for small n. The function n_1(·) is unimportant in the proof, but allows the procedure for determining h_n to be relatively crude for a length of time that can increase as the difficulty of finding good directions increases (i.e., as γ(ε, x, h) → 0). The c_1(·) in Condition 4 plays the same role as n_1(·).

Condition 4 says simply that the (conditional average) decrease in f(x) at the nth step is bounded away from zero by some suitable function of γ(ε, X_n, h_n), which measures (essentially) the least negative of 1) the directional derivative of f(X_n) (in h_n) and 2) the interval width λ_n^+. Conditions 3 and 4 together guarantee that E_{B_n} f(X_{n+1}) − f(X_n) is sufficiently negative, modulo a summable error sequence. The β_n' term in Condition 4 is essential. Without it, the problem would be deterministic, and we could not find procedures which guarantee Condition 4 (for β_n' ≡ 0). It incorporates some of the bias and noise effects of the nth search. See the discussion in [1] concerning an analogous term in the iterations for the unconstrained stochastic problem.

IV. A GENERAL CONVERGENCE THEOREM FOR A STOCHASTIC FEASIBLE DIRECTIONS METHOD

First a general convergence theorem (Theorem 1) will be stated and discussed. The theorem (proved in [5]) involves conditions on the methods of selecting h_n and X_n which guarantee convergence for the stochastic problem. The conditions are valid for a great variety of specific methods. Then a particular procedure which fits into the framework of Theorem 1 will be discussed. Then several modifications are discussed, for which convergence can also be proved. So far it has not been possible to prove convergence for several methods (to be mentioned) which seem to be more efficient in their use of observations than the ones for which convergence can be proved. It is hoped that this situation will be remedied soon, since the entire approach is both natural and appealing and also useful, as the simulation data in the last section show.

The proofs of the convergence theorems are omitted. They involve numerous estimates which are fairly close to estimates already given in [1], [2], [3], [5], and the general method follows that in [5] very closely.

The following conditions are required.

Condition 1: f(·) has continuous and bounded first and second derivatives.

Condition 2: The q^i(·), i = 1,···,s, have continuous first derivatives, and C is compact and is the closure of its interior.

Let δ_1(·), δ_2(·), and g(·) denote some arbitrary real-valued nondecreasing and positive functions on (0, ∞), and n_1(·) and c_1(·) arbitrary real-valued nonincreasing positive functions on (0, ∞). Let B_n denote the smallest σ-algebra which measures all the data available up to and including the calculation of X_n but not h_n, and let B_n^+ measure, in addition, h_n and the data involved in calculating h_n.

Condition 3: Let α, 0 < α ≤ 1, denote an arbitrary real number. For each ε > 0, let h_n satisfy (see the Remark following Condition 4)

P_{B_n}{γ(αε, X_n, h_n) ≤ −δ_1(ε)} ≥ δ_2(ε)   (4)

wp1 relative to the set where

n ≥ n_1(−γ(ε, X_n)),  γ(ε, X_n) ≤ −ε.

Condition 4: For some random sequence {β_n'} satisfying E Σ_n |β_n'| < ∞, let the one-dimensional search procedure satisfy, with probability one,

E_{B_n^+} f(X_{n+1}) − f(X_n) ≤ β_n'   (5)

and also

E_{B_n^+} f(X_{n+1}) − f(X_n) ≤ −g(−γ(ε, X_n, h_n)) + β_n'   (6)

with probability one relative to the set where

n ≥ c_1(−γ(ε, X_n, h_n)),  γ(ε, X_n, h_n) ≤ −δ_1(ε).

Remark: In [5], where Theorem 1 is proved (for the more difficult case where ∇f can only be estimated via finite differences), the α in Conditions 3 and 4 is set equal to unity. It is easily seen from the proof that any α ∈ (0,1] will do, for precisely the same reason that we can replace ε in (1) by αε, and accept there (in the deterministic feasible directions method) the h which corresponds to the first value of k for which γ(αε_0 2^{−k}, x) ≤ −ε_0 2^{−k}, without compromising convergence to NC. Below, α = 1/2 will be used. Setting α = 1/2 simply means that in certain cases the sequence of tests or linear programs which lead to the selection of h_n, which is similar to that used for the deterministic problem, is guaranteed to stop at a k no greater than k(x)+2 (where k(x) is the smallest k for which γ(ε_0 2^{−k}, x) ≤ −ε_0 2^{−k}), instead of at k(x). The α is inserted here simply because a procedure which was found to be useful requires α = 1/2.

Define ε(x) = max{ε: γ(ε, x) ≤ −ε, ε > 0}. Then Condition 3 is equivalent to (4) holding for ε ≤ ε(X_n) and n ≥ n_1(−γ(ε, X_n)). Since γ(αε, X_n, h_n) is nonincreasing as ε → 0 (it does not become less negative), to show Condition 3 it is enough to show that

P_{B_n}{γ(αε(X_n), X_n, h_n) ≤ −δ_1(ε(X_n))} ≥ δ_2(ε(X_n))   (7)

for all n and suitable δ_1(·), δ_2(·).

Theorem 1: (The proof is a specialization of the proof of Theorem 2 in [5].) If we assume Conditions 1-4, then any


accumulation point x̄ of {X_n} satisfies the NC γ(0, x̄) = 0 with probability one.

V. A PROCEDURE GUARANTEEING CONDITION 3

Condition 3 is relatively easy to satisfy; indeed, even randomly chosen directions work. Following is a procedure which was found more satisfactory and which was used in the simulations. First the method will be outlined, then a proof given. X_n is available, and an h_n is sought.

For each n, let O_s^n, s = 0,1,···, denote an arbitrary, but fixed, sequence of positive integers, and fix ε_0 > 0. Define M_{−1}^n = 0 and M_s^n = Σ_{j≤s} O_j^n. A sequence of observations ∇f(X_n) + ξ^{n,i} ≡ Y^{n,i}, i = 1,2,···, will be taken, where the observation noises {ξ^{n,i}, i = 1,2,···} are mutually orthogonal³, given B_n. They will be taken in groups, O_s^n at a time, until the test below succeeds. Suppose that E_{B_n}|ξ^{n,i}|² ≤ σ² for a real number σ². Define ψ̄^{n,s} = (1/M_s^n) Σ_{i=1}^{M_s^n} ξ^{n,i}; ψ̄^{n,s} is the sample average noise after s groups of observations are taken. Define γ_n^0(ε) as γ_n(ε) without the index (0); the index of the objective function is deleted.

Step 1: Set k = 0, ε_k = ε_0.
Step 2: Take O_k^n observations (observe Y^{n,i} = ∇f(X_n) + ξ^{n,i}, i = M_{k−1}^n + 1,···,M_k^n), calculate ∇f(X_n) + ψ̄^{n,k} and (8), the noisy version of (1)-(2), or the linear program (3), for ε = ε_k.

Remark: Since ψ̄^{n,k} → 0 wp1 as k → ∞, (9) is not very strong. It is actually enough (for Condition 3) to show that (9) holds only for k ≥ k_0(ε), where k_0(ε) is real valued and continuous on (0, ∞), but the extra notation required for the proof is not worth it. In fact, this extended condition always holds, since E|ψ̄^{n,i}|² ≤ σ², where σ² does not depend on n or i.

Proof of Condition 3: Equation (7) will be proved for α = 1/2. Define δ_1(ε) = ε/2 and take n_1(·) = 0. For the moment suppose that

−ε(X_n)/2 = −δ_1(ε(X_n))

and, consequently, (12) follows.

³Instead of assuming independent noises (which rarely occur, since they usually depend on the parameter), the usual assumption (in stochastic approximation) of orthogonality is made; in particular, E[noise in current observation | all data leading to current parameter value] = 0 is assumed.
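The grouped-observation loop of Steps 1 and 2 can be sketched as below. The actual acceptance test (8) and the programs (1)-(3) are not reproduced in this scan, so the norm test here is only a stand-in, and the group sizes O_k^n are illustrative.

```python
import numpy as np

def find_direction(grad_obs, eps0=0.1, groups=(2, 2, 4, 4, 8)):
    """Average noisy gradient observations in groups of O_k^n, halving
    eps_k each time the test fails; return a unit descent direction once
    the averaged estimate passes. grad_obs() returns grad f(X_n) + noise."""
    eps_k, samples = eps0, []
    for O_k in groups:
        samples.extend(grad_obs() for _ in range(O_k))   # the Y^{n,i}
        g_bar = np.mean(samples, axis=0)  # averaged estimate; its noise is psi-bar^{n,k}
        if np.linalg.norm(g_bar) >= eps_k:   # stand-in for the paper's test
            return -g_bar / np.linalg.norm(g_bar)   # candidate h_n, |h_n| <= 1
        eps_k /= 2.0                         # eps_{k+1} = eps_k / 2
    return None                              # test never succeeded

rng = np.random.default_rng(1)
h = find_direction(lambda: np.array([1.0, 0.5]) + rng.normal(size=2))
```

Averaging across groups is what drives ψ̄^{n,k} → 0, so later groups pass the test even though each single observation is very noisy.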

Fig. 1. Notation for the procedure guaranteeing Condition 3 (the points ε_0 2^{−m−1}, ε_0 2^{−m}, ε_0 2^{−m+1}, ε_0 2^{−m+2}, with m = k(x)).

setting Z_1^n = X_n. To calculate Z_{i+1}^n from Z_i^n, an observation Y_i^n ≡ ∇f(X_n + h_n Z_i^n) + ξ_i^n is taken, and the estimate of the directional derivative computed as φ_{n,z}(Z_i^n) + ξ̃_i^n ≡ (∇f(X_n + h_n Z_i^n), h_n) + (ξ_i^n, h_n), where we define (ξ_i^n, h_n) ≡ ξ̃_i^n, and ξ_i^n is the observation noise in the gradient estimate.

Let B_i^n denote the minimal σ-algebra which measures all the data f(X_m + h_m Z_j^m) + ξ_j^m, m = 0,1,···,n−1, all j; m = n, j = 1,···,i; and all the observations used to calculate h_0,···,h_n. It will always be supposed that there is a real number σ² so that

E_{B_i^n} ξ̃_i^n = 0,  E_{B_i^n}|ξ̃_i^n|² ≤ σ².

Let {a_i^n} and {N_n} denote sequences of real positive numbers, or random variables which are nonanticipative with respect to the {Z_i^n} to be defined below, and let (14) hold.

Now the search procedure can be defined. Define Z_1^n ≡ X_n; if Z_i^n ∈ I_n^+ (i.e., X_n + Z_i^n h_n ∈ C), then define Z_{i+1}^n by

Z_{i+1}^n = Z_i^n − a_i^n (φ_{n,z}(Z_i^n) + ξ̃_i^n).   (15)

If the Z_i^n calculated by the previous iteration (15) is not in I_n^+, then, to calculate Z_{i+1}^n, use (15) also, but take the actual observation at the nearest endpoint (at either λ_n^+ or 0). (Since the φ_n(z) was defined by a linear interpolation for z ∉ I_n^+, the slope at z ∉ I_n^+ is the slope at the nearest endpoint.) Thus, the observations are always taken in C, and we keep track of the distance that the iterates Z_i^n try to move out of I_n^+.

The search (15) stops at the N_n-th iterate. The {N_n} can be selected rather arbitrarily, provided only that (D1) below holds. Then define X_{n+1} = Z_{N_n}^n if Z_{N_n}^n ∈ I_n^+; otherwise define X_{n+1} as the nearest endpoint of I_n^+ (either λ_n^+ or 0).
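The truncated search (15) can be sketched as below on a hypothetical one-dimensional problem: phi_z is a stand-in directional derivative along h_n with its minimum inside I_n^+ = [0, lam_plus], and observations are clamped to the nearest endpoint whenever the iterate leaves the interval, as described above.

```python
import numpy as np

def sfd_search(phi_z, lam_plus, steps, noise):
    """Iteration (15): z_{i+1} = z_i - a_i (phi_z(z_i) + noise_i), with the
    observation taken at the nearest endpoint of [0, lam_plus] when the
    iterate is outside; the final iterate is truncated back to the
    interval if it escaped."""
    z = 0.0
    for a_i in steps:
        z_obs = min(max(z, 0.0), lam_plus)        # observe inside C only
        z = z - a_i * (phi_z(z_obs) + noise())     # (15)
    return min(max(z, 0.0), lam_plus)              # the X_{n+1}-analogue

rng = np.random.default_rng(2)
z_final = sfd_search(phi_z=lambda z: 2.0 * (z - 1.5),   # minimum at z = 1.5
                     lam_plus=2.0,
                     steps=[0.4 / (i + 1) for i in range(30)],
                     noise=lambda: 0.1 * rng.normal())
```

With small noise and decreasing steps the final iterate settles near the interior minimum; with a poor direction (positive slope at z = 0) the iterates drift left out of the interval and the truncation returns the endpoint, which is exactly the bunching behavior discussed in Section VIII.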

Theorem 3 is a special case of a result (Theorem 4) in [5]. Unfortunately, the theorem has not been proved under the most general conditions that we would wish it to be. The basic problem is that, to verify Condition 4, we need to obtain the estimate (5), where E Σ_n β_n' < ∞. Equation (5) guarantees that f(X_n) converges, irrespective of the limit of any subsequence of the {X_n}, and it is basic to the proof that (6) implies convergence to a point where NC holds. The particular stumbling block was in not being

If Z_i^n ∉ I_n^+ and Z_i^n > 0, then X_n + h_n Z_i^n may not be in C. In any case, I_n^+ is a "one-sided" interval, which is what one would always use in the deterministic case. See extensions below.

able to prove (5) if φ_z(z|X_n,h_n) > 0 at z = 0 and either φ_z(λ_n^+|X_n,h_n) < 0 or φ(z|X_n,h_n) had several local minima on [0, λ_n^+]. The method of proof and the difficulties to be overcome are quite similar to those in [1], except that the constraints must be taken into account. The estimates used to prove (5), (6) for this procedure in [5] are, in fact, taken directly from [1].

Theorem 3: If (14) and (D1)-(D3) hold, then Condition 4 holds.

(D1) Σ_i a_i^n ≥ B_1 n for a positive real number B_1.
(D2) f(x) is convex on C.
(D3) a_i^n → 0 as n + i → ∞, uniformly in ω, the probability-space variable.

Remark: The condition (D1) implies that the length of the one-dimensional searches increases with n. The condition is probably dispensable, but we have not been successful in eliminating it. It is used in [5] to prove that the expectations of the β_n', arising in all searches where φ_z(0|X_n,h_n) > 0, are summable. In the case where φ_z(0|X_n,h_n) > 0, we expect, on the average, that Z_{N_n}^n ∉ I_n^+ and, hence, that f(X_{n+1}) > f(X_n) very frequently. (In this case, the average force on Z_i^n at z = 0 is to the left, out of the region C, since the directional derivative is positive there and the iteration (15) is attempting to minimize.) Also, we expect that E_{X_n} f(X_{n+1}) > f(X_n) will frequently occur in this case. Increasing the search lengths as n is increased was useful to show that the average errors E_{X_n} f(X_{n+1}) − f(X_n) are summable, even though they may frequently be positive. But simulations indicate that convergence is faster, and more efficient use is made of the observations, if the number of steps in the searches is relatively short (certainly not tending to ∞ as n → ∞). This is a major gap.

VII. EXTENSIONS

There are several types of extensions to the one-dimensional search method which can be considered.

Extension 1: Search on a two-sided interval. Define λ_n^− = max{λ: X_n + λh_n ∉ C, λ ≤ 0} and I_n ≡ [λ_n^−, λ_n^+], and use the method of Section VI on I_n rather than on I_n^+. Convergence can still be proved if f(x) is convex on C. A two-sided interval is more natural in the stochastic problem; because of the errors in estimating ∇f(X_n), the directions h_n are not always such that f(X_n + z h_n) decreases as z increases.

Extension 2: Suppose that a number of consecutive (for fixed n) Z_i^n are monotonic; then it seems reasonable that a_i^n should not be decreased, since the monotonic behavior suggests that the noise is relatively small with respect to the directional derivative. If a number of consecutive Z_i^n oscillate, then it is suggested that the iterates are near a local minimum, or the noise effects are important. Kesten

⁵The proof of Theorem 4 in [5] contains an error. The next-to-last sentence of Theorem 4 in [5] was used to show (incorrectly) that the hypotheses of Theorem 3 there on φ(z|x,h) (called φ̄(z) there) hold. However, those hypotheses do hold if f(x) (hence, φ(z|x,h) for all x,h) is convex.


Fig. 2. Typical run.


Fig. 4. Typical run. New noise sequence.

the a_n actually used in the SA steps (need hold only for those paths for which infinitely many SA steps are used), and (14) holds for the sequence of a_i^n actually used in the SFD steps (provided that there are infinitely many). The proof involves a relatively straightforward combination of the ideas in [1], [2], and [5]. Indeed, if Conditions 3 and 4 hold for X_n ∈ C − C_0 and (19) holds for X_n ∈ C_0, then convergence can be proved (without any concern for the details of getting h_n, or X_{n+1}, from X_n). The details are roughly as follows.

Fig. 5. Typical run. Number of Z_i^n ∉ C limited.

Let D denote a set in C for which

sup_{x∈D} γ(0, x) ≤ −ρ < 0

for some ρ > 0. Then it is proved in [5] that only finitely many stochastic feasible directions searches can start in D (the average number being bounded from above by a function of ρ and the diameter of D). This gives the basic convergence result in [5]. The same (finiteness) result must hold if an SA step is used at each iterate n for which X_n is in C_0. Now suppose that D is strictly interior to C. Let D′ denote a set containing D which also is strictly interior to C, and let

inf_{x∈D, y∉D′} |x − y| ≥ ρ_1 > 0

for some ρ_1, and similarly for some ρ_2. In D or D′, γ(0, x) = 0 is equivalent to ∇f(x) = 0. Suppose D ∩ C_0 is not empty. Again, only finitely many (wp1) stochastic feasible directions searches can be initiated in D. Then, if X_n ∈ D infinitely often, all but a finite number (wp1) of those iterates must be of the SA type. But the results in [1] imply that the {X_n} must leave D′ infinitely often also (since ρ_2 > 0), and that f(X_n) → −∞ for those paths that enter D and leave D′ (via the use of SA) infinitely often. This contradicts the fact that f(x) is bounded in C. These results imply that, for any ρ > 0, lim_n γ(0, X_n) ≥ −ρ, which yields the desired convergence.

Experience with simulations indicates that excessive effort should not be spent searching in any particular direction h_n, but that the effort should be commensurate with the effort spent in obtaining h_n (and also consistent with the requirements for convergence). It appears that the better methods aim at getting the best "local" improvement, as, for example, a steepest-descent method would do. This idea motivates the use of SA, rather than stochastic feasible directions, in C_0.

VIII. DISCUSSION OF THE DATA

A number of runs, using versions of Extension 3 and 1000 observations (noisy gradient estimates), were made with f(x) = q^0(x) = x_1²/2 + x_2²/9,


For i = 1 in (16), for the term φ_z(Z_i^n|X_n,h_n) + ξ̃_i^n we use the last noisy estimate of ∇f(X_n) which was used in (8) to find h_n. Thus Z_2^n is always in the h_n direction from Z_1^n = X_n.

Once h_n and I_n are determined, the SFD cycle searches along a two-sided interval, so that the final point can possibly lie in direction −h_n from X_n.

The a_i^n and d_n were selected in the following way. Define d_k = 1/(k + 1). Define a_1^n = d_{5(n−1)}, a_2^n = d_{5(n−1)+1},···, a_i^n = d_{5(n−1)+i−1},···. Thus, the first SFD cycle starts with a_1^1 = 1, the second with a_1^2 = 1/6, etc. Assume cycle n is starting, and it is to be an SA. Let d_k denote the last value of d_j initiating a cycle of any type (SFD or SA step). Then let d_n = d_{k+1}. The choice is somewhat arbitrary. In unconstrained stochastic approximation the procedure is sensitive to the {a_n, c_n} sequence, but it is hard to select good values a priori unless the function is known. The same problem arises in the constrained case. Ultimately, the preferred methods will undoubtedly have a variety of adaptive features. Just as for the deterministic problem, a good one-dimensional search procedure is critical here, but still does not seem to exist for the stochastic case.
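The step-size bookkeeping above can be sketched as follows, under the reading d_k = 1/(k+1) with the nth SFD cycle taking consecutive entries of {d_k} starting at index 5(n−1); this indexing is our interpretation of the garbled formulas, chosen because it reproduces the stated values a_1^1 = 1 and a_1^2 = 1/6.

```python
def d(k):
    # d_k = 1 / (k + 1)
    return 1.0 / (k + 1)

def sfd_cycle_steps(n, length=5):
    """Step sizes a_1^n, a_2^n, ... for the n-th SFD cycle (n = 1, 2, ...):
    consecutive entries of {d_k} starting at index 5(n - 1)."""
    start = 5 * (n - 1)
    return [d(start + i) for i in range(length)]

first_cycle = sfd_cycle_steps(1)    # starts with a_1^1 = d_0 = 1
second_cycle = sfd_cycle_steps(2)   # starts with a_1^2 = d_5 = 1/6
```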

Figs. 2-5 represent typical results, and the N_n and O_i^n are listed in the graphs. We set ε_0 = 0.1 and g_0 = 0.25. The observation noise was normal with zero mean and unit variance. Consider Fig. 2, where N_n = 3n, O_s^n = 2. The starting point (1) is rather disadvantageous, since −∇f(x) points sharply outward near the boundary {x: q^1(x) = 0}, and more so as x moves toward the optimum point. SA is used in steps 5, 6, 8, 9, and 10. Each of these steps required only a single noisy observation of ∇f(x), and the average improvement per observation is very probably greater than what would have been obtained if the SFD replaced the SA. The SFD must, of course, be used near the boundary for the general problem, if jamming is to be guaranteed to be avoided.

Our choice of O_i^n and N_n is arbitrary, to some extent. N_n is allowed to increase linearly to guarantee the convergence of Σ β_n'. Experimentation with various other O_i^n and N_n sequences has not yielded any definite conclusions, the results depending on the function and starting point to some extent. Also, the rate of increase of N_n seems tied to the values of the {a_i^n} sequence. A too-rapid increase in N_n seems to be a waste, as is a large O_i^n. Initially, especially if the noise is not dominant, shorter cycles seem preferable. Indeed, in many of the runs, the shorter cycles (slowest rate of increase of N_n) were the best, but it must be kept in mind that the asymptotic rates of convergence also depend on {N_n, O_i^n} and are very costly to determine experimentally. A general, but not universal, rule is that it is preferable to spend the least effort per cycle, consistent with convergence. But this is hard to quantify, since the needs of the "initial" and "later" stages of the search process are somewhat opposed.

Note that contraction arguments are not used. Hence, just as for the usual algorithms in the deterministic case, it is not easy to determine rates of convergence.

The process behaves reasonably well until step 12. The direction h_12 was poor, and N_12 was too small to allow the one-dimensional search to compensate for the poor h_12; h_3 and h_7 resemble the types of directions which one would obtain in an analogous deterministic problem. Due to the values of the noise at steps 12 and 13, the directions were poor, so that f(X_12 + zh_12) and f(X_13 + zh_13) increased as z increased (from zero). The iterates pushed outward from the boundary, and the Z_i^n were not in C. Consequently, X_12 = X_13 = X_14. As will be seen, this type of situation occurred frequently, and even more frequently as X_n approached the constrained minimum. At point 20, h_20 actually pointed away from point 21. Thus a poor direction was chosen, but the SFD iterates generally drifted in the direction -h_20, which was preferable. Note that, as X_n approaches the optimum (along or near the boundary), the difficulty of finding a feasible h_n along which f(X_n + zh_n) decreases as z increases grows greatly.

In view of the noise, it is not surprising that many of the h_n are poor, and that several successive points coincide. Increasing the O_i^n reduces the (variance of the) observation noise in the h_n determination, but then (for a fixed total number of observations) fewer iterates can be taken. In general, the simulations do not indicate that it is worthwhile to use an O_i^n beyond 2 or so, although the best way of determining the number remains to be understood. Increasing O_i^n does postpone the bunching problem until the iterates are closer to the minimum.
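The tradeoff just described, that averaging O_i^n observations lowers the variance of each estimate but buys fewer iterates for a fixed observation budget, rests on the standard i.i.d. averaging result. A minimal numerical check, under an assumed unit-variance noise:

```python
import random

def avg_grad_obs(true_grad, n_obs, noise=1.0):
    """Average n_obs noisy observations of a gradient component;
    the variance of the average falls like 1/n_obs."""
    return sum(true_grad + random.gauss(0.0, noise)
               for _ in range(n_obs)) / n_obs

random.seed(2)
trials = 2000
variances = {}
for o in (1, 2, 8):
    errs = [avg_grad_obs(0.0, o) for _ in range(trials)]
    variances[o] = sum(e * e for e in errs) / trials
print({o: round(v, 3) for o, v in variances.items()})
```

The empirical variance falls roughly as 1/O, but each unit of reduction costs O observations that could otherwise have produced new iterates, which is why the simulations favor a small O_i^n.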

Another phenomenon occurred at points 12, 16, and 21. Owing to round-off errors (checked by hand calculations), the simplex procedure used to calculate (8) terminated with the message that there was an unbounded optimal solution, a clearly impossible situation. At the "unbounded solution" signal, a direction h_n was selected at random, provided only that (∇q^i(X_n), h_n) < 0 for all active i. Indeed, such random selections also satisfy Condition 3.
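The random fallback just described can be sketched by rejection sampling: draw h at random and accept it only if it makes a strictly negative inner product with every active constraint gradient. The constraint gradients and the dimension below are assumptions made for illustration:

```python
import random

def random_feasible_direction(active_grads, dim, max_tries=1000):
    """Draw random directions h until one satisfies
    (grad q_i, h) < 0 for every active constraint gradient."""
    for _ in range(max_tries):
        h = [random.gauss(0.0, 1.0) for _ in range(dim)]
        if all(sum(g * hj for g, hj in zip(grad, h)) < 0
               for grad in active_grads):
            return h
    raise RuntimeError("no acceptable direction found")

random.seed(3)
grads = [[1.0, 0.0], [0.0, 1.0]]   # two assumed active constraints
h = random_feasible_direction(grads, dim=2)
print(h[0] < 0 and h[1] < 0)       # prints True
```

With a nonempty interior of acceptable directions, the expected number of draws is modest (here one in four random directions is acceptable), so the fallback is cheap.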

In the run of Fig. 2, for many n where X_n was on the boundary, it frequently occurred that (∇f(X_n), h_n) > 0, and many of the Z_i^n in such cycles were not in C. The run of Fig. 3 used the same noise sequence and parameters as that of Fig. 2, but the SFD cycles were stopped at either the first time that X_n oscillations occurred (as in Fig. 2) or the first time that γN_n of the Z_i^n were outside C, whichever came first, where γ = 0.25. Consequently, on the cycles where (∇f(X_n), h_n) > 0, and where the Z_i^n tended on the average to move out of C, fewer observations were used. Unfortunately, we do not have a convergence theorem for the method which would stop the (say) νth cycle at the first time that a Z_i^ν ∉ C, although there probably is such a theorem.
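The modified stopping rule of Fig. 3 amounts to a counter inside the cycle: stop as soon as γN_n of the trial points Z_i^n have fallen outside C. A minimal sketch, with a hypothetical constraint set and trial sequence assumed for illustration:

```python
def run_cycle(trial_points, in_C, N_n, gamma=0.25):
    """Walk through the cycle's trial points Z_i^n and stop early
    once gamma * N_n of them have landed outside the constraint
    set C; return the number of observations actually used."""
    outside = 0
    used = 0
    for z in trial_points[:N_n]:
        used += 1
        if not in_C(z):
            outside += 1
            if outside >= gamma * N_n:
                break
    return used

in_C = lambda z: z <= 2.0                           # assumed constraint set
points = [0.5, 1.0, 2.5, 3.0, 1.5, 2.7, 0.2, 2.9]  # hypothetical Z_i^n
print(run_cycle(points, in_C, N_n=8))               # prints 4
```

On cycles whose direction pushes out of C, the counter trips early and the remaining observations are saved, which is the efficiency gain seen in the Fig. 3 run.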

The run of Fig. 3 seemed to use the observations more efficiently than that of Fig. 2, and the same stopping rule was used in all subsequent runs depicted in the figures here (except Fig. 4), using either γN_n = 0.25N_n or γN_n = 0.1N_n. Fig. 4 is a re-run of Fig. 2, but with a different noise sequence. Fig. 5 uses a different constraint q^1(·), and the stop rule of Fig. 3, with γ = 0.1. Note that X_14 = X_15 = ... = X_20. This bunching can probably be reduced if use is made of all observations at a point when subsequent h_n are calculated; e.g., to calculate h_15, use (since X_14 = X_15) all observations taken at X_15. This was not done, in order to keep the program relatively simple. Points 25-60 are virtually at the same point, very close to the minimum. Note that the direction δ = X_14 - X_13 points toward the boundary. This implies that h_13 is in the opposite direction to δ, and that h_13 was selected with too "poor" observations of ∇f(X_13).

In general the runs are quite satisfactory. Many improvements are no doubt possible, and major questions concerning the allocation of effort (observations) between finding the h_n and iterating in z remain to be answered, in spite of the many simulations taken. Yet the approach seems to have promise.

IX. CONCLUSIONS

Methods for systematically optimizing a constrained noisy system by Monte Carlo were presented, together with numerical data. The approach seems to provide a very useful tool for a much-neglected area of applications.

REFERENCES

[1] H. J. Kushner, "Stochastic approximation algorithms for the local optimization of functions with nonunique stationary points," IEEE Trans. Automat. Contr., vol. AC-17, pp. 646-654, Oct. 1972.

[2] H. J. Kushner and T. Gavin, "Extensions of Kesten's adaptive stochastic approximation method," Ann. Statist., vol. 1, no. 5, pp. 831-862, 1973; also Brown University, Providence, R.I., CDS Rep. 72-5.

[3] ——, "A versatile method for the Monte Carlo optimization of stochastic systems," Int. J. Contr., vol. 18, no. 5, pp. 963-975, 1973; also Brown University, Providence, R.I., CDS Rep. 72-5.

[4] E. Polak, Computational Methods in Optimization. New York: Academic, 1971.

[5] H. J. Kushner, "Stochastic approximation algorithms for constrained optimization problems," Brown University, Providence, R.I., CDS Rep. 72-1; also Ann. Statist., to be published.

[6] M. D. Canon, C. D. Cullum, and E. Polak, Theory of Optimal Control and Mathematical Programming. New York: McGraw-Hill, 1970.

[7] H. Kesten, "Accelerated stochastic approximation," Ann. Math. Statist., vol. 29, pp. 41-59, 1958.

Harold J. Kushner (S'54-A'56-M'59-SM'73-F'74) received the B.S. degree in electrical engineering from the City College of the City University of New York, New York, N.Y., in 1955, and the M.S. and Ph.D. degrees in electrical engineering from the University of Wisconsin, Madison, in 1956 and 1958, respectively.

He has worked at the M.I.T. Lincoln Laboratory, the Research Institute for Advanced Studies, Baltimore, Md., and Brown University, Providence, R.I., where he is currently Professor of Applied Mathematics and Engineering. He has consulted for various industries and government laboratories and has written numerous papers and two books on virtually all aspects of stochastic control and filtering theory. He is currently active in the application of operations research and systems theory to hospital systems.

Dr. Kushner is a member of SIAM, the Institute of Mathematical Statistics, and the Operations Research Society of America, and was the first chairman of the Automatic Control Group's Stochastic Systems Committee.

Tom Gavin (S'67-M'74) was born in Montreal, P.Q., Canada, on August 20, 1947. He received the B.S. degree in electrical engineering from McGill University, Montreal, Canada, in 1969.

He is presently a candidate for the Ph.D. degree in applied mathematics at Brown University, Providence, R.I., and is employed at the Centre de Recherches Mathématiques, Université de Montréal, Montreal, P.Q., Canada, working in the areas of stochastic approximation and applied optimal control.