
New algorithm for stochastic approximation (Corresp.)


494 IEEE TRANSACTIONS ON INFORMATION THEORY, JULY 1971

is minimized with respect to \(a_1, a_2, \cdots, a_N\). The minimization of (4) yields a set of linear equations equivalent to the first \(N\) equations of (3) with the exact autocorrelation functions replaced by their estimates. It is clear that the maximum entropy procedure yields the same results when applied to these estimates. Therefore, the extrapolation beyond the \((N + 1)\)th sample autocorrelation point is equivalent to least-squares fitting of an \(N\)th-order all-pole model to the data.

Asymptotic sampling properties of the least-squares coefficient estimates were derived by Mann and Wald [6]. The asymptotic expression for the \(N\)th-order covariance matrix of the estimates is given in terms of the matrix \(R_M(N)\), whose \((i, j)\)th element is the sample autocorrelation function \(r_M(i - j)\) of \(y(m)\), and of \(s^2\), which denotes the minimum value of (4) divided by \(M - N\). Moreover, the coefficient estimates are maximum-likelihood estimates if the \(e(N)\) in (2) are Gaussian.

V. CONCLUDING REMARKS

Maximum entropy spectral analysis is equivalent to least-squares fitting of an all-pole model to the available data. The order of the model is determined by the number of available mean-lagged products. This implies the possibility of introducing poles that were not actually present in the process. On the other hand, if the process contains both poles and zeros, a relatively large number of poles is needed to approximate the true spectrum sufficiently well [7]. In this case the number of available mean-lagged products may be too small. Therefore, it seems more adequate to use parametric methods, such as the minimum-residual methods developed by Åström and Steiglitz [7], [8] for models with both poles and zeros, or conventional fitting of all-pole models, since these approaches provide, in addition, tests for the order of the model.

A. VAN DEN BOS
Dep. Appl. Phys.
Delft Technol. Univ.
Lorentzweg 1
Delft, Holland

REFERENCES
[1] R. B. Blackman and J. W. Tukey, The Measurement of Power Spectra. New York: Dover, 1958.
[2] N. R. Zagalsky, "Exact spectral representation of truncated data," Proc. IEEE (Lett.), vol. 55, Jan. 1967, pp. 117-118.
[3] J. P. Burg, "Maximum entropy spectral analysis," presented at the 37th Annu. Meeting Soc. of Exploration Geophysicists, Oklahoma City, Okla., 1967.
[4] C. E. Shannon and W. Weaver, The Mathematical Theory of Communication. Urbana, Ill.: University of Illinois Press, 1949, pp. 54-57.
[5] M. G. Kendall and A. Stuart, The Advanced Theory of Statistics, vol. 3. London: Griffin, 1966, pp. 476-481.
[6] H. B. Mann and A. Wald, "On the statistical treatment of linear stochastic difference equations," Econometrica, vol. 11, July/Oct. 1943, pp. 173-219.
[7] S. A. Tretter and K. Steiglitz, "Power-spectrum identification in terms of rational models," IEEE Trans. Automat. Contr. (Short Papers), vol. AC-12, Apr. 1967, pp. 185-188.
[8] K. J. Åström and T. Bohlin, "Numerical identification of linear dynamic systems from normal operating records," Proc. IFAC Symp. Self-Adaptive Control Systems, Sept. 1966, pp. 3.2-1-3.2-9.

New Algorithm for Stochastic Approximation

Abstract—A general stochastic approximation algorithm is given, along with the assumptions and conditions necessary to show that it converges. Convergence is proven in the mean-square sense. The rate of convergence is shown to be better than that of two algorithms proposed previously.

Manuscript received March 25, 1970; revised November 25, 1970.

CONVERGENCE

The problem of stochastic approximation has been mentioned by Sklansky [1] and Fu et al. [2]. Dvoretzky [3] provided a powerful general theorem useful in proving the mean-square convergence of a class of stochastic approximation algorithms. To make this class more useful, particularly in the area of learning systems, it has been suggested by Fu that methods be developed to improve the rate of convergence.

Consider an algorithm of the form
\[
\hat{x}_{n+1} = \hat{x}_n + \gamma_{n+1}\{f_{n+1}(\mathbf{r}_{n+1}) - \hat{x}_n\} \tag{1}
\]
where
\[
\mathbf{r}_{n+1} = (r_1, r_2, \cdots, r_{n+1}),
\]
to be used in the presence of an ergodic process. Here \(\hat{x}_n\) is the \(n\)th estimate of \(x\) (the true value sought), the mean of a normal distribution from which the samples \(r_i\) are taken to calculate the function \(f_i(\mathbf{r}_i)\), and \(\gamma_i\) is a gain sequence. It is required of the function \(f_i(\mathbf{r}_i)\) that
\[
\lim_{n \to \infty} E\{f_n(\mathbf{r}_n)\} = x. \tag{2}
\]
Two forms have been used for \(f_n(\mathbf{r}_n)\) in previous work, namely,
\[
f_n(\mathbf{r}_n) = r_n \tag{3}
\]
and
\[
f_n(\mathbf{r}_n) = \frac{1}{n} \sum_{i=1}^{n} r_i. \tag{4}
\]
With the appropriate gain sequence \(\gamma_i\), Fu has shown that using (4) rather than (3) in (1) gives a faster decrease in the expected mean-square error.

It is suggested that the function \(f\) be chosen as
\[
f_{n+1}(\mathbf{r}_{n+1}) = \bigl(\hat{R}_{n+1}(l)\bigr)^{1/2} \tag{5}
\]
where \(\hat{R}_{n+1}(l)\) is an estimate of the sample autocorrelation function of the samples \(r_1, r_2, \cdots, r_{n+1}\).

By definition,
\[
\hat{R}_n(l) = \overline{r_{i-l}\, r_i} \tag{6}
\]
and if
\[
r_i = x + \xi_i, \tag{7}
\]
where \(\xi_i\) is the value of an element of zero-mean Gaussian white noise, then combining (6) and (7) gives
\[
\hat{R}_n(l) = \overline{[x + \xi_{i-l}][x + \xi_i]}, \tag{8}
\]
which, since the cross terms in (8) vanish, reduces to
\[
\hat{R}_n(l) = x^2 + \hat{R}_{\xi_n}(l), \tag{9}
\]
where \(\hat{R}_{\xi_n}(l) = \overline{\xi_{i-l}\,\xi_i}\) is the sample autocorrelation of the noise.

Now combining (9) with (5) and (1) and subtracting \(x\) from both sides gives
\[
(\hat{x}_{n+1} - x) = (1 - \gamma_{n+1})(\hat{x}_n - x) + \gamma_{n+1} V_{n+1} \tag{10}
\]
where, for \(x \neq 0\),
\[
V_{n+1} = x\left[\left(1 + \frac{\hat{R}_{\xi_{n+1}}(l)}{x^2}\right)^{1/2} - 1\right]. \tag{11}
\]
Now iterating (10) and writing it in closed form gives
\[
(\hat{x}_{n+1} - x) = (\hat{x}_0 - x)\prod_{i=1}^{n+1}(1 - \gamma_i) + \sum_{i=1}^{n+1} \gamma_i V_i \prod_{j=i+1}^{n+1}(1 - \gamma_j) \tag{12}
\]
if the void product is taken as unity.
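As a sanity check, the closed form (12) can be compared numerically with direct iteration of (10). The sketch below uses an arbitrary gain sequence and arbitrary stand-in values for the \(V_i\); all names and parameter values are illustrative, not taken from the paper.

```python
# Numerical check that closed form (12) reproduces direct iteration of (10):
#   e_i = (1 - gamma_i) e_{i-1} + gamma_i V_i,   with e_0 = xhat_0 - x.
import random

rng = random.Random(1)
N = 50
gamma = [0.0] + [1.0 / (i + 1) for i in range(1, N + 1)]  # illustrative gains
V = [0.0] + [rng.uniform(-1.0, 1.0) for _ in range(N)]    # stand-ins for V_i
e0 = 3.0

# Direct iteration of recursion (10).
e = e0
for i in range(1, N + 1):
    e = (1.0 - gamma[i]) * e + gamma[i] * V[i]

# Closed form (12), with the void product taken as unity:
#   e_N = e0 * prod_i (1 - gamma_i) + sum_i gamma_i V_i * prod_{j>i} (1 - gamma_j)
prod_all = 1.0
for i in range(1, N + 1):
    prod_all *= (1.0 - gamma[i])
tail_sum = 0.0
for i in range(1, N + 1):
    tail = 1.0
    for j in range(i + 1, N + 1):
        tail *= (1.0 - gamma[j])
    tail_sum += gamma[i] * V[i] * tail
closed = e0 * prod_all + tail_sum

print(abs(e - closed))  # agreement to floating-point precision
```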


Fig. 1. Normalized expected mean-square error as a function of \(n\) (\(\sigma^2 = 1\)).
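One natural baseline for the comparison in Fig. 1 is easy to reproduce: with \(f_n(\mathbf{r}_n) = r_n\) and \(\gamma_n = 1/n\), recursion (1) reduces to the running sample mean, whose expected mean-square error is \(\sigma^2/n\). A quick Monte-Carlo check follows; all parameter values are illustrative.

```python
# Monte-Carlo estimate of E[(xhat_n - x)^2] for f_n = r_n and gamma_n = 1/n,
# in which case recursion (1) reduces to the running sample mean of the r_i.
import random

rng = random.Random(42)
x, sigma, n, trials = 2.0, 1.0, 100, 2000

total = 0.0
for _ in range(trials):
    x_hat = 0.0
    for i in range(1, n + 1):
        r = x + rng.gauss(0.0, sigma)
        x_hat += (r - x_hat) / i  # xhat_i = xhat_{i-1} + (1/i)(r_i - xhat_{i-1})
    total += (x_hat - x) ** 2

mse = total / trials
print(mse)  # close to sigma**2 / n = 0.01
```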

Combining (11) and (12) and squaring both sides gives an expression for the mean-square error,
\[
\begin{aligned}
(\hat{x}_{n+1} - x)^2 ={}& (\hat{x}_0 - x)^2 \prod_{i=1}^{n+1}(1 - \gamma_i)^2
+ \left[\sum_{i=1}^{n+1} \gamma_i \bigl(x^2 + \hat{R}_{\xi_i}(l)\bigr)^{1/2} \prod_{j=i+1}^{n+1}(1 - \gamma_j)\right]^2 \\
&+ x^2 \left[\sum_{i=1}^{n+1} \gamma_i \prod_{j=i+1}^{n+1}(1 - \gamma_j)\right]^2 \\
&+ 2(\hat{x}_0 - x)\prod_{i=1}^{n+1}(1 - \gamma_i) \sum_{i=1}^{n+1} \gamma_i \bigl(x^2 + \hat{R}_{\xi_i}(l)\bigr)^{1/2} \prod_{j=i+1}^{n+1}(1 - \gamma_j) \\
&- 2x(\hat{x}_0 - x)\prod_{i=1}^{n+1}(1 - \gamma_i) \sum_{i=1}^{n+1} \gamma_i \prod_{j=i+1}^{n+1}(1 - \gamma_j) \\
&- 2x \sum_{i=1}^{n+1} \gamma_i \prod_{j=i+1}^{n+1}(1 - \gamma_j) \sum_{i=1}^{n+1} \gamma_i \bigl(x^2 + \hat{R}_{\xi_i}(l)\bigr)^{1/2} \prod_{j=i+1}^{n+1}(1 - \gamma_j). \tag{13}
\end{aligned}
\]
Suppose first that
\[
R_{\xi}(l) = \sigma^2 e^{-a|l|}, \tag{14}
\]
where \(\sigma^2\) is the variance and \(a\) a constant, for colored Gaussian noise. As the spectrum becomes wider, the noise approaches white [4], \(a\) tends to \(\infty\), and
\[
R_{\xi}(l) \to 0, \qquad l \neq 0. \tag{15}
\]
Hence, for \(l \neq 0\), (13) reduces to
\[
(\hat{x}_{n+1} - x)^2 = (\hat{x}_0 - x)^2 \prod_{i=1}^{n+1}(1 - \gamma_i)^2. \tag{16}
\]
Therefore, for the mean-square error to become zero regardless of the starting value, and for the algorithm to converge, it is required only that
\[
\lim_{n \to \infty} \prod_{i=1}^{n+1}(1 - \gamma_i) = 0. \tag{17}
\]
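Condition (17) can be checked numerically for candidate gain sequences: gains decaying like \(1/i\) drive the product to zero, while gains decaying like \(1/i^2\) leave it bounded away from zero, so (17) fails. A sketch follows; the shift to \(1/(i+1)\) simply avoids a degenerate first factor.

```python
# Evaluate the product appearing in condition (17) for two gain sequences.
def product_term(gammas):
    p = 1.0
    for g in gammas:
        p *= (1.0 - g)
    return p

n = 100_000
harmonic = product_term(1.0 / (i + 1) for i in range(1, n + 1))  # gamma_i ~ 1/i
inv_sq = product_term(1.0 / (i * i) for i in range(2, n + 1))    # gamma_i = 1/i^2

print(harmonic)  # equals 1/(n+1), about 1e-5: tends to 0, so (17) holds
print(inv_sq)    # about 0.5: bounded away from 0, so (17) fails
```

The second product telescopes to \((n+1)/(2n)\), which makes the failure of (17) for \(\gamma_i = 1/i^2\) exact rather than merely numerical.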

The selection of a \(\gamma\)-sequence determines the rate of convergence. The optimization of this rate was attempted by formulating it as a discrete optimum control problem and applying Pontryagin's maximum principle; the resulting equations were nonlinear, and hence a solution was not easily obtained. Simulations using a CDC 6400 digital computer were made for various conditions and the \(\gamma\)-sequence
\[
\gamma_n = 1/n. \tag{18}
\]

Convergence was particularly good for \(\sigma/x\) ratios less than 4. A graph (Fig. 1) compares the expected mean-square error of the three algorithms mentioned herein.
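A single run of the proposed scheme, recursion (1) with \(f\) given by (5) and \(\gamma_n = 1/n\), can be sketched as follows. The lag, noise level, and sample count are illustrative choices, not values reported in the paper.

```python
# Sketch of the proposed algorithm: estimate the mean x from samples
# r_i = x + xi_i via the square root of the sample autocorrelation, cf. (5).
import math
import random

def estimate_mean(x_true, sigma, n_steps, lag=1, seed=0):
    rng = random.Random(seed)
    r = []       # observed samples r_i
    x_hat = 0.0  # starting value xhat_0
    for n in range(1, n_steps + 1):
        r.append(x_true + rng.gauss(0.0, sigma))
        if n <= lag:
            continue  # need at least lag + 1 samples for the estimate
        # sample autocorrelation estimate: average of r_{i-l} r_i, cf. (6)
        R_hat = sum(r[i] * r[i - lag] for i in range(lag, n)) / (n - lag)
        f = math.sqrt(max(R_hat, 0.0))    # f from (5), guarded against negatives
        x_hat += (1.0 / n) * (f - x_hat)  # recursion (1) with gamma_n = 1/n
    return x_hat

est = estimate_mean(x_true=2.0, sigma=0.5, n_steps=5000)
print(est)  # close to 2.0
```

For long runs the autocorrelation estimate would be updated recursively rather than recomputed from scratch; the quadratic-time version above keeps the correspondence with (6) explicit.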

CONCLUSION

An algorithm of the Dvoretzky type making use of the sample autocorrelation function was shown to be convergent: if a \(\gamma\)-sequence of the form (18) is used, then (17) is satisfied and the proof holds.

N. K. SINHA
M. P. GRISCIK
Dep. Elec. Eng.
McMaster Univ.
Hamilton, Ont., Canada

REFERENCES
[1] J. Sklansky, "Learning systems for automatic control," IEEE Trans. Automat. Contr., vol. AC-11, Jan. 1966, pp. 6-19.
[2] K. S. Fu, Z. J. Nikolic, T. Y. Chien, and W. G. Wee, "On the stochastic approximation and related learning techniques," Purdue Univ., Lafayette, Ind., Rep. TR-EE-66-6, Apr. 1966.
[3] A. Dvoretzky, "On stochastic approximation," in Proc. 3rd Berkeley Symp. Math. Statist. and Probability. Berkeley, Calif.: University of California Press, 1956, pp. 39-55.
[4] A. E. Bryson and Y.-C. Ho, Applied Optimal Control. Waltham, Mass.: Blaisdell, 1969, pp. 331-332.

Relations Among Sequency, Axis Symmetry, and Period of Walsh Functions

Abstract—A parameter is defined that distinguishes the members of the set \(\{\mathrm{WAL}(s, m)\}\) of Walsh functions. There is a one-to-one correspondence between sequency and axis symmetry of each Walsh function. Axis symmetry is derived from the sequency number. A procedure is introduced for obtaining the period of a Walsh function from its sequency number.

I. INTRODUCTION

When a function \(x(t)\) is sampled at \(N = 2^{n+1}\) discrete points in time, the discrete function \(x = (x_n x_{n-1} \cdots x_1 x_0)\) is produced. A dyadic translation [1] \(x_\tau\) of the function \(x(t)\) gives the function \(x(t \oplus \tau)\), where \(t \oplus \tau\) represents the modulo-2 sum between the respective entries of the binary expressions for \(t\) and \(\tau\).

A discrete function with \(N = 2^n\) entries defines \(n\) axes of dyadic translation. Fig. 1 shows the three axes \(a_2\), \(a_1\), and \(a_0\) defined for functions \(x\) of \(N = 2^2\) and \(N = 2^3\) entries. Table I shows the functions \(x(t \oplus \tau)\) resulting from dyadic translations about the axes \(a_i\) that correspond to \(\tau_i = 1\) in the binary expression of \(\tau\). The column at the left designates the functions under the dyadic translations \(\tau\). The binary
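A dyadic translation is just an XOR re-indexing of the samples, \(y(t) = x(t \oplus \tau)\), which can be illustrated directly; the function and sample values below are illustrative.

```python
# Dyadic translation: y[t] = x[t XOR tau], for a function sampled at a
# power-of-two number of points.
def dyadic_translate(x, tau):
    n = len(x)
    assert n & (n - 1) == 0, "length must be a power of two"
    return [x[t ^ tau] for t in range(n)]

x = [0, 1, 2, 3, 4, 5, 6, 7]       # N = 2^3 sample values
print(dyadic_translate(x, 0b001))  # translation about axis a0: [1, 0, 3, 2, 5, 4, 7, 6]
print(dyadic_translate(x, 0b100))  # translation about axis a2: [4, 5, 6, 7, 0, 1, 2, 3]
```

Each set bit \(\tau_i = 1\) swaps the two halves of every block of \(2^{i+1}\) consecutive samples, which is exactly the reflection about the axis \(a_i\) described above.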

Manuscript received December 7, 1970; revised January 11, 1971. This work was supported in part by the U.S. Air Force Office of Scientific Research, Office of Aerospace Research, under AFOSR Grant 70-1915.