
A note on stochastic approximation algorithms in system identification




A Note on Stochastic Approximation Algorithms in System Identification

H. G. KWATNY

Abstract-This correspondence considers the application of stochastic approximation algorithms to a broad class of system identification problems. Both asymptotic and initial convergence properties of the algorithms are discussed. A suboptimal procedure for parameter selection and a means of convergence acceleration are suggested.

INTRODUCTION

A variety of interesting problems of system identification, both linear and nonlinear, can be formulated in terms of the discrete-time linear model

$$y_n = a_n' Y_n + \eta_n \qquad (1)$$

relating the output y_n (scalar, for simplicity), the input Y_n (a random s-vector), the unknown parameter a_n (an s-vector), and the random noise η_n [2]-[5], [7]. Numerous papers have been concerned with the on-line estimation of a_n using stochastic approximation algorithms (for example, [3]-[7]).
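As one illustration of how a dynamic system can be brought into the form of (1), the sketch below builds the regression vector Y_n from past outputs and inputs, so that a_n collects the coefficients of an assumed second-order difference equation; the structure and the names (build_regressor, u, the model order) are hypothetical and not taken from the paper.

```python
import numpy as np

def build_regressor(y_hist, u_hist):
    """Form the s-vector Y_n of model (1) from past data.
    Here s = 4: two past outputs and two past inputs (an assumed ARX form)."""
    return np.array([y_hist[-1], y_hist[-2], u_hist[-1], u_hist[-2]])

# With this choice, y_n = a_n' Y_n + eta_n, and a_n holds the (possibly
# slowly varying) coefficients of the assumed difference equation.
```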

It appears to be well known, although not explicitly stated, that if (1) is an exact representation, then unbiased estimates of a_n can be obtained by these methods without prior knowledge of the statistical parameters of Y_n and η_n, provided Y_n and η_n are statistically orthogonal (and, of course, Y_n repeatedly spans s-space). If Y_n and η_n are not orthogonal, then the estimators are generally biased and some statistical parameters of Y_n and/or η_n must be known in order to remove the bias [4]-[7]. However, the requirements for a nonparametric formulation can frequently be met (although not frequently enough, unfortunately). This is, perhaps, the strongest attraction of stochastic approximation methods.

It should also be noted that it is generally assumed that Y_n and η_n are temporally independent sequences (as well as being mutually independent). However, this assumption is not necessary, and it greatly restricts the class of problems which can be cast into the form of (1). In [3] it was shown to be sufficient that Y_n (and η_n) become independent at a geometric rate, and in fact this can be relaxed to a harmonic rate, which is in accord with conditions originally presented by Sakrison [1].

Even under such fortuitous circumstances, most proposed (nonparametric) stochastic approximation algorithms have the disadvantage that, although asymptotic convergence is assured, very little can be done to control the initial convergence properties. However, an algorithm proposed in [3], which also appeared in [6], has some very favorable characteristics in this regard. These will be explored in the following.

Manuscript received July 12, 1971; revised February 4, 1972.
The author is with the College of Engineering, Drexel University, Philadelphia, Pa.

PROPERTIES OF STOCHASTIC APPROXIMATION ALGORITHMS

Discussion of the major points is facilitated by comparing the behavior of the more-or-less standard algorithm (2) with the proposed algorithm (3).
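Because the displayed forms of estimators (2) and (3) did not survive extraction, the sketch below gives one plausible reading consistent with the surrounding discussion: an unnormalized gradient-type correction for (2) and the same correction scaled by ||Y_n||² for (3), each with gain k/n. The 1/n step size and the exact form of the corrections are assumptions inferred from the quantities that appear in the convergence statements (B proportional to k², the rates 2kγ and 2kβ, and the matrices E(Y_nY_n') and E(Y_nY_n'/||Y_n||²)).

```python
import numpy as np

def standard_update(a_hat, Y, y, n, k):
    """One step of an unnormalized stochastic-approximation estimator,
    a sketch of the 'standard' algorithm (2) (assumed form)."""
    err = y - a_hat @ Y                  # prediction error under model (1)
    return a_hat + (k / n) * err * Y     # gradient-type correction

def normalized_update(a_hat, Y, y, n, k):
    """One step of the normalized estimator, a sketch of the 'proposed'
    algorithm (3) (assumed form); the correction is scaled by ||Y_n||^2."""
    err = y - a_hat @ Y
    return a_hat + (k / n) * err * Y / (Y @ Y)
```

Normalizing by ||Y_n||² is what makes E(Y_nY_n'/||Y_n||²), whose eigenvalues sum to one, the relevant matrix, and it is this property that is exploited in the suboptimal procedure discussed later.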

It can be shown (as in [3]) that if the joint density functions of Y_n and η_n satisfy P(Y_p|Y_n) = P(Y_p) + o((p - n)^{-1}) and P(η_p|η_n) = P(η_p) + o((p - n)^{-1}) as p - n → ∞, and that variations in a_n vanish asymptotically at the rate n^{-ω}, where ω > 1, then, in both cases, the mean-square estimation error E(||ρ_n||²), where ρ_n = â_n - a_n, is bounded by a quantity x_n satisfying the difference equation (4), where B depends mainly on the observation noise and is proportional to k², and ν = min(2ω - 1, 2). The sequence ξ_n will be discussed below.
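The display (4) itself is not reproduced here; a recursion of the form below, with a contraction factor ξ_n and a forcing term proportional to B, is consistent with the surrounding statements (|ξ_n| < 1 for n ≥ N_0, N_0 = ⌊k/2⌋ for estimator (3), and the asymptotic rates quoted in (5) and (6)), but the coefficients are a reconstruction under stated assumptions, not the paper's equation; γ_n, β_n, and μ_n are as defined below:

$$x_{n+1} = \xi_n x_n + \frac{B}{n^{\nu}}, \qquad
\xi_n =
\begin{cases}
1 - \dfrac{2k\gamma_n}{n} + \dfrac{k^2\mu_n}{n^2}, & \text{estimator (2)},\\[1.5ex]
1 - \left(\dfrac{2k}{n} - \dfrac{k^2}{n^2}\right)\beta_n, & \text{estimator (3)}.
\end{cases}$$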

On the basis of (4), both estimators can be shown to converge in the mean; in fact, the asymptotic behavior is expressed by relation (5) for estimator (2) and by

$$x_n = O\left(n^{-2k\beta}\right), \qquad \text{for } \nu - 1 \ge 2k\beta \qquad (6)$$

for estimator (3), where γ = inf_n γ_n and β = inf_n β_n, γ_n being the smallest eigenvalue of the positive-definite matrix E(Y_nY_n') and β_n the smallest eigenvalue of the positive-definite matrix E(Y_nY_n'/||Y_n||²). Since the maximum value of ν - 1 is 1, it is evident that the asymptotic convergence of either estimator (2) or (3) will never be faster than 1/n, regardless of how large 2kγ or 2kβ may be.

It is interesting to focus on the transient response of (4). In [3] it is shown that |ξ_n| < 1 for all n ≥ N_0, where N_0 is given by (7) for estimator (2), with μ = sup_n μ_n, μ_n being the largest eigenvalue of the positive-definite matrix E(||Y_n||²Y_nY_n'), and γ defined above. For estimator (3),

$$N_0 = \text{integral part}\left[\frac{k}{2}\right]. \qquad (8)$$

This means that the transient part of (4) will converge monotonically for n ≥ N_0. Alternatively, it is possible for the estimate to diverge from the desired vector for n < N_0. Since the quantity μ/γ is not known beforehand, it is impossible to determine N_0 for estimator (2). This is typical of most stochastic approximation procedures. Furthermore, μ/γ may be quite large, especially if the dimension of a is large. This means that there is likely to be an initial period of divergence, although asymptotic convergence is assured. The novel feature of estimator (3) is that N_0 as given by


(8) can be controlled by judicious choice of k. By choosing k = 1, N_0 = 0, so that the transient term is monotonically decreasing from the inception of the iteration process.

A SUBOPTIMAL PROCEDURE

From (5) or (6), it is seen that it is always possible to choose k sufficiently large so that the maximum rate of asymptotic convergence is obtained; i.e., since ν - 1 is at most 1, select k ≥ 1/(2γ) or k ≥ 1/(2β). Unfortunately, γ and β are not known a priori. On the other hand, the disturbance coefficient B increases with k², and hence it is undesirable to choose k larger than necessary. To further complicate the problem, the number N_0 increases with k. In the case of estimator (2), nothing can be done, in view of the limited a priori information, in the way of selecting a near-optimum value of k. An advantage of estimator (3) is that something more can be done. To begin with, suppose that Y_n is s-dimensional and let λ_1, λ_2, ..., λ_s be the eigenvalues of the positive-definite matrix E(Y_nY_n'/||Y_n||²). Then

$$\sum_{i=1}^{s} \lambda_i = \operatorname{tr} E\!\left(\frac{Y_n Y_n'}{\|Y_n\|^2}\right). \qquad (9)$$

However,

$$\operatorname{tr} E\!\left(\frac{Y_n Y_n'}{\|Y_n\|^2}\right) = E\!\left(\frac{\|Y_n\|^2}{\|Y_n\|^2}\right) = 1. \qquad (10)$$

Now, a rather conservative upper bound for the minimum eigenvalue λ_1 can be obtained by assuming all of the eigenvalues to be equal. In this case, (9) and (10) yield λ_1 = 1/s. Thus, a conservative estimate of β is 1/s, and hence we should choose k accordingly, say, k = s. Note that s is the number of unknown parameters and may be quite large. Once k is specified, N_0 is given by (8). In order to retain the property that the transient response of x_n decreases from the start of the iteration procedure, the process is started with n = N_0.
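A minimal sketch of this recipe, using the assumed form of estimator (3) from above (the synthetic data model, the noise level, and the helper name run_normalized_estimator are illustrative, not from the paper):

```python
import numpy as np

def run_normalized_estimator(Y, y, k=None):
    """Run the assumed form of estimator (3) with the suboptimal choice k = s
    and the iteration started near n = N0 = integral part [k/2]."""
    N, s = Y.shape
    if k is None:
        k = s                      # conservative choice based on beta ~ 1/s
    n0 = max(k // 2, 1)            # N0 = floor(k/2); start at max(N0, 1) so the 1/n gain is defined
    a_hat = np.zeros(s)
    for n in range(n0, N):
        Yn, yn = Y[n], y[n]
        a_hat = a_hat + (k / n) * (yn - a_hat @ Yn) * Yn / (Yn @ Yn)
    return a_hat

# Illustrative use on synthetic data drawn from model (1) with a constant parameter.
rng = np.random.default_rng(0)
s, N = 5, 20000
a_true = rng.normal(size=s)
Y = rng.normal(size=(N, s))                  # inputs repeatedly spanning s-space
y = Y @ a_true + 0.1 * rng.normal(size=N)    # noise orthogonal to the input
print(run_normalized_estimator(Y, y))        # should be close to a_true
```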

CONVERGENCE ACCELERATION

In some instances, when β is quite small, the above estimate can be far too conservative and poor convergence is obtained. This situation is accompanied by a large spread in the eigenvalues of E(Y_nY_n'/||Y_n||²) (the matrix is ill conditioned), and arbitrarily increasing the value of k is generally unsatisfactory. In such cases it has been found advantageous to use a modified algorithm as follows. At the time of computation of â_{n+1}, in addition to the estimate correction provided in (3), a correction orthogonal to Y_n and lying in the plane of Y_n and Y_{n-1} is added. The algorithm is given by (11), where

$$z_n = \|Y_n\|^2 Y_{n-1} - (Y_n' Y_{n-1})\, Y_n. \qquad (12)$$

This algorithm converges to the true value of the parameter, and its convergence properties may be characterized in terms of parameters similar to those used above. The proof parallels the proof of convergence of (3), as outlined in [3]. In this case, the property N_0 = integral part [k/2] is retained. A conservative choice of k can be shown to be s/2. An example of the application of the algorithm can be found in [3].
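Since the display (11) is not reproduced in the text, the sketch below implements only what the description pins down: the usual correction of the assumed estimator (3) plus an added correction along z_n of (12), which is orthogonal to Y_n and lies in the plane of Y_n and Y_{n-1}. Driving the added term with the previous residual and giving it a matching normalized gain are assumptions of this sketch.

```python
import numpy as np

def accelerated_update(a_hat, Y_n, Y_prev, y_n, y_prev, n, k):
    """One step of the modified (convergence-accelerated) algorithm, sketched
    under the stated assumptions; only the direction z_n comes from (12)."""
    # Usual normalized correction along Y_n (assumed form of estimator (3)).
    a_hat = a_hat + (k / n) * (y_n - a_hat @ Y_n) * Y_n / (Y_n @ Y_n)

    # z_n = ||Y_n||^2 Y_{n-1} - (Y_n' Y_{n-1}) Y_n, which satisfies Y_n' z_n = 0.
    z = (Y_n @ Y_n) * Y_prev - (Y_n @ Y_prev) * Y_n
    zz = z @ z
    if zz > 0.0:  # skip the extra step when Y_n and Y_{n-1} are collinear
        a_hat = a_hat + (k / n) * (y_prev - a_hat @ Y_prev) * z / zz
    return a_hat
```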

REFERENCES

[1] D. J. Sakrison, "Application of stochastic approximation methods to system optimization," Res. Lab. Electron., Mass. Inst. Technol., Cambridge, Tech. Rep. 391, 1962.
[2] Y. C. Ho and R. C. K. Lee, "Identification of linear dynamic systems," Inform. Contr., vol. 8, pp. 93-110, Feb. 1965.
[3] H. G. Kwatny and D. W. C. Shen, "Identification of nonlinear systems using a method of stochastic approximation," in Proc. 8th Joint Automatic Control Conf., 1967, pp. 81-26.
[4] G. N. Saridis and G. Stein, "Stochastic approximation algorithms for linear discrete-time system identification," IEEE Trans. Automat. Contr., vol. AC-13, pp. 515-523, Oct. 1968.
[5] G. N. Saridis and G. Stein, "A new algorithm for linear system identification," IEEE Trans. Automat. Contr. (Corresp.), vol. AC-13, pp. 592-594, Oct. 1968.
[6] Y. Bar-Shalom and S. C. Schwartz, "Application of stochastic approximation to on-line system identification," IEEE Trans. Automat. Contr. (Corresp.), vol. AC-15, pp. 606-607, Oct. 1970.
[7] A. N. Netravali and R. J. P. de Figueiredo, "On the identification of nonlinear dynamical systems," IEEE Trans. Automat. Contr., vol. AC-16, pp. 28-36, Feb. 1971.
