
On the Stochastic Approximation Coefficients




[Fig. 1. Matrix $U_A$: $D_n(l) = D^{m(n)}(l)$. The entries of the matrix are not recoverable from the scanned text.]

Theorem 3: If bounds on the gain rate converge, then the recurrence relation (4) converges to a stationary optimal policy.

Proof: The bounds on the gain rate converge, and so

$$\lim_{n \to \infty} \frac{1}{L}\,[w_i(n) - w_i(n - L)] = f, \qquad \text{for all } i.$$

For this equation to hold, $w_i(n) \to fn + w_i$ for all $i$ and for some constant $w_i$. Substitution into the recurrence relation for $w_i(n)$ gives

$$w_i(n) \to \max_k \left[ q_i^k + \sum_{j=1}^{N} \sum_{l=1}^{L} d_{ij}^k(l)\bigl(f(n - l) + w_j\bigr) \right].$$

Summing the terms containing $fn$ (and using $\sum_{j}\sum_{l} d_{ij}^k(l) = 1$) gives

$$w_i(n) \to fn + \max_k \left[ q_i^k + \sum_{j=1}^{N} \sum_{l=1}^{L} d_{ij}^k(l)\,(w_j - fl) \right].$$

The quantity to be maximized is independent of $n$, so the optimal policy will be stationary.

In summary, at each stage in the iteration we can compute the maximum possible gain rate of the optimal policy and the minimum possible gain rate of the current L-stage policy sequence using

$$\min_{1 \le i \le N}\; \min_{n - L \le x < n}\; \frac{1}{L}\,[w_i(x) - w_i(x - L)] \;\le\; f_A \;\le\; f \;\le\; \max_{1 \le i \le N}\; \max_{n - L \le x < n}\; \frac{1}{L}\,[w_i(x) - w_i(x - L)].$$

When these bounds have converged to a satisfactory degree, the iterations are stopped and the policy sequence in use at that time is chosen as the control policy to be used for the infinite-time process. In practice, of course, we would like to control the system with a single policy rather than having to use an L-stage policy sequence. Theorem 3 says that convergence is to a single policy, and experience [6] has been that convergence to a single policy occurs after a reasonable number of iterations.
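As an illustration only (not part of the original correspondence), the stopping test just described can be written as a short routine. The array name `w_hist`, the tolerance `tol`, and the assumption that the relative values $w_i(x)$ from the last $2L$ iterations are stored row-wise are hypothetical; the recurrence relation (4) that produces them is assumed to be computed elsewhere.

```python
import numpy as np

def gain_rate_bounds(w_hist, L):
    """Lower/upper bounds on the gain rate from the value-iteration history.

    w_hist : array of shape (T, N), T >= 2*L; row t holds the values w_i at
             iteration t, with the most recent iteration in the last row.
    Returns (lower, upper) = (min, max) over states i and the last L iterations x
    of (w_i(x) - w_i(x - L)) / L.
    """
    diffs = (w_hist[-L:] - w_hist[-2 * L:-L]) / L
    return diffs.min(), diffs.max()

def bounds_converged(w_hist, L, tol=1e-6):
    """Stop the iteration once the two gain-rate bounds agree to within tol."""
    lower, upper = gain_rate_bounds(w_hist, L)
    return upper - lower < tol
```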

Since completing this work, the work of Schweitzer [7] and Morton [8] has been brought to the author's attention. These solution methods may offer computational advantages over those presented here and so should not be overlooked.

ACKNOWLEDGMENT

The author wishes to thank Prof. K. B. Irani for his critical advice during the course of this research and the reviewer for pointing out related work.

REFERENCES

[1] D. J. White, "Dynamic programming, Markov chains, and the method of successive approximations," J. Math. Anal. Appl., vol. 6, pp. 373-376, 1963.
[2] A. Odoni, "On finding the maximal gain for Markov decision processes," Oper. Res., vol. 17, pp. 857-860, 1969.
[3] N. A. J. Hastings, "Bounds on the gain of a Markov decision process," Oper. Res., vol. 19, pp. 240-244, 1971.
[4] R. A. Howard, Dynamic Programming and Markov Processes. Cambridge, Mass.: M.I.T. Press, 1960.
[5] W. S. Jewell, "Markov renewal programming," Oper. Res., vol. 11, pp. 938-971, 1963.
[6] J. W. Boyse, "Solution of Markov renewal decision processes with application to computer system scheduling," Ph.D. dissertation, Syst. Eng. Lab., Univ. Michigan, Ann Arbor, Rep. 026410-3-T, 1971.
[7] P. J. Schweitzer, "Iterative solution of the functional equations of undiscounted Markov renewal programming," J. Math. Anal. Appl., vol. 34, pp. 495-501, 1971.
[8] T. E. Morton, "Undiscounted Markov renewal programming via modified successive approximations," Oper. Res., vol. 19, pp. 1081-1089, 1971.

On the Stochastic Approximation Coefficients

K. KIRVAITIS

Abstract—A simple and straightforward derivation of the optimal form for the Kiefer-Wolfowitz stochastic approximation coefficients is presented. The results follow immediately from the mean-square sense convergence proof for the Kiefer-Wolfowitz algorithm by minimizing the upper bound of the error variance.

Manuscript received May 18, 1973; revised October 29, 1973.
The author is with the Department of Electrical Engineering, Illinois Institute of Technology, Chicago, Ill. 60616.



I. INTRODUCTION

The application of the Kiefer-Wolfowitz stochastic approximation algorithm, given by [1], [2],

$$\hat{q}_{n+1} = \hat{q}_n - \left(\frac{a_n}{c_n}\right) Y_n, \qquad n = 1, 2, \ldots \qquad (1)$$

(where the quantities $\hat{q}_n$ and $Y_n$ are defined in Section II), to the noisy hill-climbing problem stipulates that, for a class of problems [2], the positive real number sequences $a_n, c_n$ in (1) satisfy

$$\frac{a_n}{c_n} \to 0, \qquad \sum_{n=1}^{\infty} \frac{a_n}{c_n} = \infty, \qquad \sum_{n=1}^{\infty} \left(\frac{a_n}{c_n}\right)^2 < \infty. \qquad (2)$$

In this correspondence, a straightforward derivation is given of the optimal form for the sequences $a_n$ and $c_n$ which minimizes the upper bound of the error variance. The results follow immediately from the mean-square sense convergence proof for the stochastic approximation algorithm. In [3] a version of stochastic approximation is applied to a different problem, and the forms of the optimal weighting-coefficient sequences in that paper are similar to those of the sequences $a_n, c_n$ described herein.

II. PROBLEM STATEMENT AND ASSUMPTIONS

Let $q_n$ and $Y_n$ be two uncorrelated random $m$-vectors, $n = 1, 2, \ldots$, and let $q$ be a constant $m$-vector, the estimate of which at the $n$th time instant, $\hat{q}_n$, is to be obtained given the noisy observations $Y_n^i$, where

$$Y_n^i = y_n^i + e_n^i, \qquad i = 1, 2, \ldots, m, \quad n = 1, 2, \ldots \qquad (3)$$

where $e_n^i$ is zero-mean stationary noise with variance $\sigma^2 < \infty$, and

$$y_n^i = y(\hat{q}_n^1, \hat{q}_n^2, \ldots, \hat{q}_n^i + c_n, \ldots, \hat{q}_n^m) - y(\hat{q}_n^1, \hat{q}_n^2, \ldots, \hat{q}_n^i, \ldots, \hat{q}_n^m) \qquad (4)$$

with $y(\hat{q}_n)$ a scalar unimodal function whose slopes are bounded from above and from below by some positive real constants $K_2$ and $K_1$, respectively. The stochastic approximation algorithm states that the next estimate of the vector $q$ should be computed using the recursive formula of (1), or

$$\hat{q}_{n+1} = \hat{q}_n - \left(\frac{a_n}{c_n}\right) Y_n \qquad (5)$$

where $a_n, c_n$ must satisfy (2), and $\hat{q}_1$ is an arbitrary but finite real vector. It will be shown that, under the assumptions stated in the next sentence,

$$\lim_{n \to \infty} E[\|q_n\|^2] = 0 \qquad (6a)$$

and

$$\frac{a_n}{c_n} = \frac{1/K}{n + \sigma^2/(K^2 b_1)} \qquad (6b)$$

where $q_n = \hat{q}_n - q$, and $K, b_1$ are positive constants to be defined later. The assumptions necessary to prove (6a) are the following: there exist positive real constants $K_2$ and $K_1$ such that

$$\|E[Y_n \mid q_n]\| \le K_2 \|q_n\|, \qquad K_1 \|q_n\|^2 \le (q_n, E[Y_n \mid q_n]) \qquad (7)$$

where $e_n$ is independent of $q_n$, and the last term in (7) denotes the inner product.
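To make the recursion (1), (3)-(5) concrete, a minimal sketch is given below; it is not from the correspondence. The quadratic test function, the noise level, and the schedules $a_n = 1/n$, $c_n = n^{-1/4}$ (one admissible choice satisfying (2), since $a_n/c_n = n^{-3/4}$) are all hypothetical.

```python
import numpy as np

def kiefer_wolfowitz(y, q1, sigma, n_iter=5000, seed=0):
    """One-sided finite-difference Kiefer-Wolfowitz recursion, eqs. (1), (3)-(5).

    y     : scalar objective of an m-vector (assumed unimodal, minimum at q)
    q1    : initial estimate (arbitrary finite m-vector)
    sigma : standard deviation of the zero-mean observation noise e_n^i
    """
    rng = np.random.default_rng(seed)
    q_hat = np.asarray(q1, dtype=float)
    m = q_hat.size
    for n in range(1, n_iter + 1):
        a_n = 1.0 / n            # example schedule: a_n/c_n = n**-0.75 satisfies (2)
        c_n = n ** -0.25
        # Noisy one-sided differences Y_n^i = y(..., q_hat_i + c_n, ...) - y(q_hat) + e_n^i
        Y = np.array([y(q_hat + c_n * np.eye(m)[i]) - y(q_hat) for i in range(m)])
        Y += sigma * rng.standard_normal(m)
        q_hat = q_hat - (a_n / c_n) * Y   # eq. (5)
    return q_hat

# Example: quadratic bowl with minimum at q = (1, -2)
q_est = kiefer_wolfowitz(lambda x: (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2,
                         q1=np.zeros(2), sigma=0.1)
```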

III. SOLUTION

Subtracting $q$ from both sides of (5) and taking the norm yields

$$\|q_{n+1}\|^2 = \|q_n\|^2 + \left(\frac{a_n}{c_n}\right)^2 \|Y_n\|^2 - 2\left(\frac{a_n}{c_n}\right)(q_n, Y_n) \qquad (8)$$

where $(q_n, Y_n)$ is the inner product. As is well known [4],

$$E[\|q_{n+1}\|^2] = E\{E[\|q_{n+1}\|^2 \mid q_n]\} \qquad (9)$$

and taking the conditional expectation of (8), one obtains

$$E[\|q_{n+1}\|^2 \mid q_n] = \|q_n\|^2 + \left(\frac{a_n}{c_n}\right)^2 E[\|Y_n\|^2 \mid q_n] - 2\left(\frac{a_n}{c_n}\right)(q_n, E[Y_n \mid q_n]). \qquad (10)$$

From the assumptions on $y(\hat{q}_n)$ and (7), and taking the expectation indicated by (9), one obtains the inequality

$$E[\|q_{n+1}\|^2] \le E[\|q_n\|^2]\left[1 + \left(\frac{a_n}{c_n}\right)^2 K_2^2 - 2\left(\frac{a_n}{c_n}\right) K_1\right] + \left(\frac{a_n}{c_n}\right)^2 \sigma^2 \qquad (11)$$

which is an upper bound on the error variance. Without loss of generality we can let $K_2 = K_1 = K$ and, for notational convenience, let $d_n = a_n/c_n$ and $b_n = E[\|q_n\|^2]$ in (11), which then can be written as

$$b_{n+1} \le b_n[1 - d_n K]^2 + d_n^2 \sigma^2. \qquad (12)$$

On successively iterating (12) one obtains

$$b_{n+1} \le b_1 \prod_{j=1}^{n} [1 - d_j K]^2 + \sigma^2 \sum_{j=1}^{n} d_j^2 \prod_{i=j+1}^{n} [1 - d_i K]^2. \qquad (13)$$

The first term on the right-hand side of (13) can be shown to vanish as $n \to \infty$ using the fact that

$$[1 - d_i K] \le \exp[-d_i K], \qquad \text{for some } i > N \qquad (14)$$

and that $\sum_{i=1}^{\infty} d_i = \infty$, while the second term can be shown to vanish by invoking the Kronecker lemma [4]. Thus

$$\lim_{n \to \infty} E[\|q_{n+1}\|^2] = 0 \qquad (15)$$

and the proof of (6a) is complete.

To prove (6b), differentiate (12) with respect to $d_n$ to obtain

$$d_n = \frac{b_n K}{b_n K^2 + \sigma^2}. \qquad (16)$$

Substituting for $d_n$ in (12) and iterating the result yields

$$b_{n+1} \le \frac{b_n \sigma^2}{b_n K^2 + \sigma^2} \qquad (17)$$

and, finally, on iterating (16) and (17) simultaneously, one obtains

$$d_n = \frac{1/K}{n + \sigma^2/(K^2 b_1)} \qquad (18)$$

which is (6b), and the ancillary result that

$$b_{n+1} \le \frac{\sigma^2/K^2}{n + \sigma^2/(K^2 b_1)}. \qquad (19)$$

Equation (19) allows one to draw two conclusions. First, $b_{n+1}$ vanishes in the limit as $1/n$, which means that the convergence rate of the stochastic approximation method is $1/n$. Second, given $b_{n+1}$, $b_1$, $K$, and $\sigma^2$, one can [2] estimate the number of iterations necessary to obtain a given $b_{n+1}$.
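The closed forms are easy to check numerically. The following sketch (illustrative only; the values of $K$, $\sigma^2$, and $b_1$ are arbitrary) iterates (16) and the bound (17) with equality and compares the results with the closed-form expressions (18) and (19).

```python
import numpy as np

K, sigma2, b1 = 2.0, 0.5, 1.0   # arbitrary gain constant, noise variance, initial error variance
N = 20

b = b1
for n in range(1, N + 1):
    d_n = b * K / (b * K**2 + sigma2)                        # optimal step ratio, eq. (16)
    b_next = b * sigma2 / (b * K**2 + sigma2)                # error-variance bound, eq. (17)
    d_closed = (1.0 / K) / (n + sigma2 / (K**2 * b1))        # closed form, eq. (18)
    b_closed = (sigma2 / K**2) / (n + sigma2 / (K**2 * b1))  # closed form, eq. (19)
    assert np.isclose(d_n, d_closed) and np.isclose(b_next, b_closed)
    b = b_next

print("b_{N+1} decays roughly like 1/n:", b, "vs", (sigma2 / K**2) / N)
```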

REFERENCES

[1] J. Kiefer and J. Wolfowitz, "Stochastic estimation of the maximum of a regression function," Ann. Math. Statist., vol. 23, 1952.
[2] K. Kirvaitis and K. S. Fu, "Identification of nonlinear systems by stochastic approximation," in 1966 Joint Automatic Control Conf., Preprints.
[3] G. N. Saridis, Z. J. Nikolic, and K. S. Fu, "Stochastic approximation algorithms for system identification, estimation, and decomposition of mixtures," IEEE Trans. Syst. Sci. Cybern., vol. SSC-5, pp. 8-15, Jan. 1969.
[4] M. Loeve, Probability Theory. New York: Van Nostrand Reinhold, 1955.



Modeling and Prediction of the Daily Maximum Temperature

K. L. S. SHARMA AND A. K. MAHALANABIS

Abstract—This correspondence examines the possibility of recursive prediction of the daily maximum temperature based on a state-variable model. It is shown that past data can be processed for identifying a suitable noisy state model for this process. Once the model is selected, the "corrector-predictor" algorithm of Kalman is readily applied for predicting the daily maximum temperature for a chosen lead time. The method is illustrated by predicting the daily maximum temperature of Delhi, India.

Manuscript received October 13, 1972; revised September 17, 1973.
The authors are with the Department of Electrical Engineering, Indian Institute of Technology, New Delhi, India.

I. INTRODUCTION

Prediction of weather conditions occupies a very important place in our daily life, and many techniques have been developed by meteorologists for this purpose. As is well known, with the proper amount of data and with a good computer available, it is possible to predict weather conditions such as temperature, humidity, wind velocity, etc. The present authors have been interested in making use of the Kalman filtering technique for obtaining weather predictions. This technique has been applied successfully for predicting a wide variety of physical variables, such as orbits of space bodies (both man made and natural), states of dynamical systems, and power demand in electric utility systems. However, there does not seem to have been any attempt to make use of this powerful technique for weather prediction. In this correspondence, the possibility of predicting the daily maximum temperature (DMT) of a given place based on previous records of this quantity is examined through an application of Kalman filtering.

II. DEVELOPMENT OF THE MODEL

Before an application of Kalman filtering techniques is possible, it is necessary to model the physical process in state-variable form. Since the DMT constitutes a slowly varying random time series with nonstationary statistics, the simplest model would be a scalar difference equation of the form

$$x_{k+1} = a_k x_k + w_k^1 \qquad (1)$$

where $x_k$ denotes the maximum temperature on the $k$th day and $w_k^1$ is a noise sequence introduced to account for modeling errors. If it is assumed that the parameter $a_k$ remains constant except for some random fluctuations, a model for this parameter can be chosen to be of the form

$$a_{k+1} = a_k + w_k^2. \qquad (2)$$

The sequences $w_k^1$ and $w_k^2$ introduced in the model are assumed to be independent zero-mean white Gaussian noise sequences having variances $q_k^{11}$ and $q_k^{22}$, respectively. The parameters of this model, namely, $a_k$, $q_k^{11}$, and $q_k^{22}$, can be identified using adaptive estimation techniques. Once these are known, it is possible to predict the DMT a few days ahead, as discussed in the next section. It is seen that the accuracy of prediction depends on the noise variance $q_k^{11}$ of the state model (1). A question that naturally arises is whether a better model (in the sense that $q_k^{11}$ becomes smaller) can be set up for the process under consideration.

A standard technique for improving the model accuracy is to remove the assumption of constancy of the coefficient $a_k$. This is done, for example, if (2) is replaced with

$$a_{k+1} = b_k a_k + w_k^2 \qquad (3)$$

where the coefficient $b_k$ is now assumed to be constant with random fluctuations superposed:

$$b_{k+1} = b_k + w_k^3. \qquad (4)$$

The model corresponding to (1), (3), and (4) represents a third-order model, and its parameters $a_k$, $b_k$, $q_k^{11}$, $q_k^{22}$, and $q_k^{33}$ can again be identified adaptively. Apparently, the added complexity of the third-order model can only be justified if it leads to a better prediction. It has, however, been recognized [1] that there is always an optimum order of the model for a given physical process. The accuracy of estimation and prediction deteriorates if the model chosen is more complex than necessary. It is shown that, for the process under consideration, the second-order model corresponding to (1) and (2) appears to be more reliable for prediction purposes.

In order that the proposed prediction algorithm may be derived, it is necessary to add a measurement model to the state models already discussed. It is assumed that the measured DMT, $y_k$, on the $k$th day is given by

$$y_k = x_k + v_k \qquad (5)$$

where $v_k$ is introduced to account for the measurement inaccuracies. It is also assumed for convenience that $v_k$ is an independent zero-mean white Gaussian noise having a finite variance $R_k$.
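As a purely illustrative sketch (the numerical values of the initial temperature, coefficient, and noise variances below are hypothetical, not taken from the correspondence), the second-order model (1)-(2) together with the measurement equation (5) can be simulated as follows.

```python
import numpy as np

def simulate_dmt(n_days, x0=30.0, a0=1.0, q11=0.5, q22=1e-4, R=0.1, seed=0):
    """Generate synthetic daily-maximum-temperature data from the
    second-order state model (1)-(2) and the measurement model (5)."""
    rng = np.random.default_rng(seed)
    x, a = x0, a0
    xs, ys = [], []
    for _ in range(n_days):
        y = x + rng.normal(scale=np.sqrt(R))        # y_k = x_k + v_k, eq. (5)
        xs.append(x)
        ys.append(y)
        x = a * x + rng.normal(scale=np.sqrt(q11))  # x_{k+1} = a_k x_k + w_k^1, eq. (1)
        a = a + rng.normal(scale=np.sqrt(q22))      # a_{k+1} = a_k + w_k^2,     eq. (2)
    return np.array(xs), np.array(ys)

x_true, y_meas = simulate_dmt(710)   # 710 days, matching the record length used in Section III
```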

III. THE PREDICTION ALGORITHM

In order to evaluate the model parameters, a total of 710 past days of data were processed. The procedure is explained for the second-order model.

Equations (1) and (2) are adjoined to obtain a nonlinear vector equation of the form

$$X_{k+1} = f(X_k) + W_k \qquad (6)$$

where $X_k = [x_k, a_k]'$, $W_k = [w_k^1, w_k^2]'$, and $f(X_k) = [a_k x_k, a_k]'$. In terms of the augmented state, the output equation is rewritten as

$$y_k = H X_k + v_k \qquad (7)$$

where $H = [1, 0]$. Equations (6) and (7) can be used for finding a minimum-variance estimate of the state $X_k$ as well as the covariance $Q_k$ of $W_k$ via nonlinear adaptive estimation algorithms [2], [3]. This, however, requires the knowledge of initial estimates $\hat{X}_{0/0}$, $P_{0/0}$ of, respectively, the state $X_0$ and its error covariance, and also of the variance $R_k$ of the measurement noise. The latter quantity is known from a knowledge of the accuracy of the temperature-measuring instruments. The selection of proper initial conditions for the state and its error covariance, however, requires a statistical analysis of past temperature data.

The extended Kalman filtering algorithm is used for on-line identification of the parameter $a_k$. The variances $q_k^{11}$ and $q_k^{22}$ have been identified by adapting the method of Sage and Husa.
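The extended Kalman filter for the augmented model (6)-(7) is not spelled out in the correspondence; a minimal sketch of one filtering step is given below, assuming the standard EKF equations, with the Jacobian of $f(X) = [a x, a]'$ taken as $F = \begin{bmatrix} a & x \\ 0 & 1 \end{bmatrix}$. The covariances `Q` and `R`, and the initial conditions, are placeholders; the adaptive estimation of $Q_k$ described in the text ([2], [3]) is not included.

```python
import numpy as np

def ekf_step(X, P, y, Q, R):
    """One predict-update cycle of the extended Kalman filter for the
    augmented DMT model (6)-(7), with X = [x, a]'.  Sketch only."""
    x, a = X
    # Prediction through f(X) = [a*x, a]' with Jacobian F.
    X_pred = np.array([a * x, a])
    F = np.array([[a, x],
                  [0.0, 1.0]])
    P_pred = F @ P @ F.T + Q
    # Measurement update with H = [1, 0], eq. (7).
    H = np.array([[1.0, 0.0]])
    S = H @ P_pred @ H.T + R                 # innovation variance
    K = P_pred @ H.T / S                     # Kalman gain (2x1)
    X_new = X_pred + (K * (y - X_pred[0])).ravel()
    P_new = (np.eye(2) - K @ H) @ P_pred
    return X_new, P_new

# Example usage with placeholder values:
Q = np.diag([0.5, 1e-4])                     # placeholder process-noise covariance
R = np.array([[0.1]])                        # placeholder measurement-noise variance
X, P = np.array([25.0, 1.0]), np.eye(2)      # placeholder initial estimates X_{0/0}, P_{0/0}
# for y in y_meas:                           # y_meas from the simulation sketch above
#     X, P = ekf_step(X, P, y, Q, R)
```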