

1192 IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 34, NO. 11, NOVEMBER 1989


On the Convergence of Self-Tuning Stochastic Servo Algorithms Based on Stochastic Approximation

SRDJAN S. STANKOVIC AND MILOJE S. RADENKOVIC

Abstract- Algorithms of the stochastic approximation type for self-tuning tracking of stochastic references in the general delay case are proposed. Global stability, asymptotic optimality, convergence of the adaptive control law in a Cesaro sense, and the strong consistency of the parameter estimates are proved. The persistence of excitation condition is also analyzed.

I. INTRODUCTION

The problem of self-tuning control of linear stochastic systems has received considerable attention, starting from the pioneering contribution due to Åström and Wittenmark [1]. The problem of convergence of self-tuning control algorithms has been studied using different methodologies, e.g., [2]-[5]. Goodwin, Ramadge, and Caines [6] have proved the global stability and asymptotic optimality for some algorithms. Becker, Kumar, and Wei [7] have demonstrated convergence of the adaptive control law to the optimal one for the regulation problem in the unit delay case. The available strong consistency results cover some specific aspects of the problem. Caines and Lafortune [8] studied an identifier working in parallel with the self-tuning control loop. Chen [9], Chen and Guo [10], Kumar and Praly [11], and Lai and Wei [13] have considered the tracking problem with deterministic references in the unit delay case.

In this note we shall analyze asymptotic properties of direct self-tuning control algorithms of the stochastic approximation type for tracking stochastic ARMA references in the case of ARMAX processes with general delay. The motivation for such a control problem formulation can be found in numerous applications where reference signals are neither known nor reconstructable ahead of time [12]. We shall construct two self-tuning control algorithms providing adaptation to both process and reference models, differing by a priori assumptions about the regulator parameters.

Manuscript received October 9, 1986; revised April 6, 1988. Paper recommended by Past Associate Editor, L. Valavani.

The authors are with the Faculty of Electrical Engineering, University of Belgrade, Belgrade, Yugoslavia.

IEEE Log Number 8930719.

The analysis is first concerned with global stability and asymptotic optimality. The attention is focused on the general properties of the sequence of parameter estimates; some new results are derived by using Kolmogorov's inequality. Starting from the time-varying model of the closed-loop system, convergence of the adaptive control law to the optimal one in a Cesaro sense is proved for both algorithms. This is a generalization of the results presented in [7]. Introducing additional assumptions concerning the structure of the adaptive regulator and the properties of the optimal regulator transfer function, it is proved that the algorithm in which the a priori knowledge about the sign of one of the regulator parameters is incorporated provides the strong consistency of the parameter estimates. This is, to the authors' knowledge, the first strong consistency result for self-tuning algorithms in the general delay case. When no prior knowledge is available, i.e., when all the adaptive regulator parameters are estimated, it is proved that each recursion in the estimation algorithm converges to a multiple of the optimal parameter vector. This result shows that the main conclusions in [7] can be extended to the tracking problem and the general delay case. An analysis of the persistence of excitation condition is also given. It is shown that this condition holds only for the algorithms providing the strong consistency.

II. PROBLEM FORMULATION

Let the process be represented by a single-input single-output ARMAX model

\[ A(q^{-1})y(i) = q^{-d}B(q^{-1})u(i) + C(q^{-1})w(i) \quad (i \ge 1) \tag{1} \]

where $\{y(i)\}$, $\{u(i)\}$, and $\{w(i)\}$ are output, input, and stochastic disturbance sequences, respectively, $q^{-1}$ stands for the unit delay operator, $d$ for the pure process time-delay, while $A(q^{-1}) = 1 + a_1q^{-1} + \cdots + a_{n_A}q^{-n_A}$, $B(q^{-1}) = b_0 + b_1q^{-1} + \cdots + b_{n_B}q^{-n_B}$ $(b_0 \ne 0)$, and $C(q^{-1}) = 1 + c_1q^{-1} + \cdots + c_{n_C}q^{-n_C}$.

Let the reference sequence $\{y^*(i)\}$ be generated by an ARMA model

\[ P(q^{-1})y^*(i) = Q(q^{-1})v(i) \tag{2} \]

where $\{v(i)\}$ is a stochastic sequence independent of $\{w(i)\}$, while $P(q^{-1}) = 1 + p_1q^{-1} + \cdots + p_{n_P}q^{-n_P}$ and $Q(q^{-1}) = 1 + q_1q^{-1} + \cdots + q_{n_Q}q^{-n_Q}$.

Equations (1) and (2) are taken together with their initial conditions $x_0 = \{y^0(0), y^0(-1), \ldots, y^0(1-k), u^0(1-d), \ldots, u^0(1-k), e(0), \ldots, e(1-k)\}$, where $y^0(i)^T = [y(i) : y^*(i)]$, $u^0(i)^T = [u(i) : 0]$, and $e(i)^T = [w(i) : v(i)]$, while $k = \max\{n_A, n_B + d, n_C, n_P, n_Q\}$. The process $\{x_0, e(1), e(2), \ldots\}$ is defined on the probability space $\{\Omega, \mathcal{F}, P\}$.

Denote by $\mathcal{F}_0$ the $\sigma$-algebra generated by $x_0$, and by $\mathcal{F}_i$ the $\sigma$-algebras generated by $\{x_0, e(1), \ldots, e(i)\}$. Introduce also the following assumptions concerning the process and the reference.

A1: All finite dimensional distributions of $x_0$ and $\{e(i)\}$ are absolutely continuous with respect to the Lebesgue measure.

A2:

\[ E\{e(i) \mid \mathcal{F}_{i-1}\} = 0; \quad E\{e(i)e(i)^T \mid \mathcal{F}_{i-1}\} = \begin{bmatrix} \sigma_w^2 & 0 \\ 0 & \sigma_v^2 \end{bmatrix} \ \text{(a.s.)} \tag{3} \]

\[ E\{w(i)^4 \mid \mathcal{F}_{i-1}\} \le k_w < \infty; \quad E\{v(i)^4 \mid \mathcal{F}_{i-1}\} \le k_v < \infty; \quad (\sigma_w, \sigma_v < \infty;\ i \ge 1). \]

A3: Polynomials B(z), C(z), P(z), and Q(z) have zeros strictly outside the unit disk.
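As an illustration of the models (1) and (2), the following minimal sketch simulates both recursions with zero initial conditions. It is not part of the original note: the function name `simulate`, the argument layout, and the use of Gaussian noise are assumptions made only for this example (A2 asks for martingale-difference noise, of which independent Gaussian samples are one special case).

```python
import random

def simulate(a, b, c, p, q, d, u_seq, n, sigma_w=1.0, sigma_v=1.0, seed=0):
    """Simulate y from (1) and y* from (2) for n steps, zero initial conditions.

    a, c, p, q hold the coefficients after the leading 1, e.g. a = [a_1, ..., a_nA];
    b = [b_0, ..., b_nB]; d >= 1 is the delay; u_seq(i) returns the input at time i.
    All names here are hypothetical helpers for illustration.
    """
    rng = random.Random(seed)
    w = [rng.gauss(0.0, sigma_w) for _ in range(n)]  # process disturbance w(i)
    v = [rng.gauss(0.0, sigma_v) for _ in range(n)]  # reference-driving noise v(i)
    y, y_ref, u = [0.0] * n, [0.0] * n, [0.0] * n

    def past(x, i):
        # Values before time 0 are taken as zero (assumed initial conditions).
        return x[i] if i >= 0 else 0.0

    for i in range(n):
        u[i] = u_seq(i)
        # (1): A y(i) = q^{-d} B u(i) + C w(i)
        y[i] = (-sum(ak * past(y, i - k - 1) for k, ak in enumerate(a))
                + sum(bk * past(u, i - d - k) for k, bk in enumerate(b))
                + w[i] + sum(ck * past(w, i - k - 1) for k, ck in enumerate(c)))
        # (2): P y*(i) = Q v(i)
        y_ref[i] = (-sum(pk * past(y_ref, i - k - 1) for k, pk in enumerate(p))
                    + v[i] + sum(qk * past(v, i - k - 1) for k, qk in enumerate(q)))
    return y, y_ref, u, w, v
```

For instance, `simulate([], [2.0], [], [0.5], [], 1, lambda i: 1.0, 20)` corresponds to $y(i) = 2u(i-1) + w(i)$ and $y^*(i) = -0.5\,y^*(i-1) + v(i)$, a choice for which $P(z) = 1 + 0.5z$ has its zero outside the unit disk, as A3 requires.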

0018-9286/89/1100-1192\$01.00 © 1989 IEEE

Assume that the admissible control action $u(i)$ is measurable with respect to the $\sigma$-algebra generated by $\{y(1), \ldots, y(i), y^*(1), \ldots, y^*(i), u(1), \ldots, u(i-1)\}$, which is, in general, smaller than $\mathcal{F}_i$. The optimal control minimizing the mean-square tracking error $J = \lim_{N\to\infty} \frac{1}{N}\sum_{i=1}^{N}(y(i) - y^*(i))^2$ is generated by

\[ B(q^{-1})F(q^{-1})Q(q^{-1})u(i) = -G(q^{-1})Q(q^{-1})y(i) + C(q^{-1})T(q^{-1})y^*(i) \tag{4} \]

where polynomials $F(q^{-1})$, $G(q^{-1})$, $W(q^{-1})$, and $T(q^{-1})$ represent the minimum degree solutions with respect to $F(q^{-1})$ and $W(q^{-1})$ of the Diophantine equations

\[ C(q^{-1}) = A(q^{-1})F(q^{-1}) + q^{-d}G(q^{-1}); \quad Q(q^{-1}) = P(q^{-1})W(q^{-1}) + q^{-d}T(q^{-1}) \tag{5} \]

where $\deg F(q^{-1}) = n_F \le \bar{n}_F = d - 1$, $\deg G(q^{-1}) = n_G \le \bar{n}_G = \max\{n_A - 1, n_C - d\}$, $\deg W(q^{-1}) = n_W \le \bar{n}_W = d - 1$, $\deg T(q^{-1}) = n_T \le \bar{n}_T = \max\{n_P - 1, n_Q - d\}$. The achieved minimum of the criterion is $J_{\min} = \sigma^2 = E\{(F(q^{-1})w(i+d))^2 + (W(q^{-1})v(i+d))^2 \mid \mathcal{F}_i\}$. If $z(i) = y(i+d) - y^*(i+d) - F(q^{-1})w(i+d) + W(q^{-1})v(i+d)$, it can be easily shown that (4) is equivalent to

\[ C(q^{-1})Q(q^{-1})z(i) = \theta_1^T\varphi_1(i) = |b_0|\,(\operatorname{sgn} b_0\, u(i) + \theta_2^T\varphi_2(i)) = 0 \tag{6} \]

where

\[ \theta_1 = \operatorname{coeff}\{G(q^{-1})Q(q^{-1}),\ B(q^{-1})F(q^{-1})Q(q^{-1}),\ C(q^{-1})T(q^{-1})\} \]

is the vector of all the optimal regulator parameters,

\[ \theta_2 = \operatorname{coeff}\{|b_0|^{-1}G(q^{-1})Q(q^{-1}),\ |b_0|^{-1}(B(q^{-1})F(q^{-1})Q(q^{-1}) - b_0),\ |b_0|^{-1}C(q^{-1})T(q^{-1})\}, \]

while $\varphi_1(i)^T = [y(i), \ldots, y(i - n_G - n_Q), u(i), \ldots, u(i - n_B - n_F - n_Q), -y^*(i), \ldots, -y^*(i - n_C - n_T)]$ and $\varphi_2(i)^T = [y(i), \ldots, y(i - n_G - n_Q), u(i-1), \ldots, u(i - n_B - n_F - n_Q), -y^*(i), \ldots, -y^*(i - n_C - n_T)]$ are the corresponding measurement vectors (notice that $\dim\theta_2 = \dim\theta_1 - 1$).
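The Diophantine equations (5) can be solved by $d$ steps of polynomial long division in powers of $q^{-1}$. The sketch below is only an illustration under stated assumptions: the helper names `diophantine` and `min_cost` are hypothetical, polynomials are coefficient lists in ascending powers of $q^{-1}$ with leading coefficient 1, and the cost formula $\sigma^2 = \sigma_w^2\sum_k f_k^2 + \sigma_v^2\sum_k w_k^2$ is the expansion of $J_{\min}$ that follows from the conditional moments in (3).

```python
def diophantine(a, c, d):
    """Solve C = A*F + q^{-d}*G for F (degree d-1) and G.

    a and c are coefficient lists in ascending powers of q^{-1}, with a[0] = c[0] = 1.
    """
    rem = list(c) + [0.0] * (len(a) + d)  # running remainder, padded with zeros
    f = []
    for k in range(d):
        fk = rem[k]                # a[0] = 1, so the next quotient term is rem[k]
        f.append(fk)
        for j, aj in enumerate(a):
            rem[k + j] -= fk * aj  # subtract fk * q^{-k} * A from the remainder
    g = rem[d:]                    # what is left is q^{-d} * G
    while len(g) > 1 and abs(g[-1]) < 1e-12:
        g.pop()                    # drop trailing zero coefficients
    return f, g

def min_cost(f, w_poly, sigma_w2, sigma_v2):
    """sigma^2 = sigma_w^2 * sum f_k^2 + sigma_v^2 * sum w_k^2 (expansion of J_min)."""
    return sigma_w2 * sum(x * x for x in f) + sigma_v2 * sum(x * x for x in w_poly)

# Hypothetical example: A = 1 - 0.5 q^{-1}, C = 1 + 0.2 q^{-1}, P = 1 + 0.5 q^{-1}, Q = 1, d = 2.
F, G = diophantine([1.0, -0.5], [1.0, 0.2], 2)
W, T = diophantine([1.0, 0.5], [1.0], 2)
```

The same routine serves both equations in (5), since the second has the same structure with $(P, Q, W, T)$ in place of $(A, C, F, G)$.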

If the parameters of the process and reference models are not known to the designer, one is faced with a specific adaptive control problem. Introduce the following additional assumptions.

A4: Time-delay $d$ is known.

A5: Upper bounds of $n_A$, $n_B$, $n_C$, $n_Q$, and $n_P$ are known.

We shall construct two direct adaptive control algorithms of stochastic approximation type for the stochastic reference tracking problem using the methodology of Goodwin and co-workers [4], [6]. The first algorithm, denoted as P1, is based on the assumption that no a priori knowledge of the optimal regulator parameters is available, while in the second, denoted as P2, the sign of $b_0$, the leading coefficient of $B(q^{-1})$, is introduced (see [6] for a similar approach). Both algorithms can be represented for $i = d, d+1, \ldots$, by the $d$ interlaced recursions (7), where $j = 1$ stands for P1 and $j = 2$ for P2, together with the corresponding control laws

\[ \hat\theta(i)^T\varphi_1(i) = 0 \ \text{for P1}; \quad u(i) = -(\operatorname{sgn} b_0)\,\hat\theta(i)^T\varphi_2(i) \ \text{for P2}. \tag{8} \]

In (7) and (8) the vectors of regressors $\varphi_j(i)$ $(j = 1, 2)$ are constructed, on the basis of A5, as $\varphi_1(i)^T = [y(i), \ldots, y(i - n_1), u(i), \ldots, u(i - n_2), -y^*(i), \ldots, -y^*(i - n_3)]$ and $\varphi_2(i)^T = [y(i), \ldots, y(i - n_1), u(i-1), \ldots, u(i - n_2), -y^*(i), \ldots, -y^*(i - n_3)]$, where $n_1 \ge \bar{n}_G + n_Q$, $n_2 \ge n_B + \bar{n}_F + n_Q$, and $n_3 \ge n_C + \bar{n}_T$.
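To make the structure of the recursions and of the control law (8) concrete, here is a minimal sketch of one update step. It assumes that each interlaced recursion has the usual stochastic-gradient form of [6], namely $\hat\theta(i) = \hat\theta(i-d) + \frac{\mu_j}{r_j(i-d)}\varphi_j(i-d)\,(y(i) - y^*(i))$ with $r_j(i-d) = r_j(i-d-1) + \|\varphi_j(i-d)\|^2$. This form, and the names `sa_update` and `control_p2`, are assumptions for illustration and may differ in detail from the recursions (7).

```python
def sa_update(theta_prev, phi_delayed, r_prev, y_now, y_ref_now, mu=1.0):
    """One assumed stochastic-approximation step of a single interlaced recursion.

    theta_prev  : estimate theta_hat(i - d)
    phi_delayed : regressor phi_j(i - d)
    r_prev      : previous normalizing gain r_j(i - d - 1)
    Returns (theta_hat(i), r_j(i - d)).
    """
    r = r_prev + sum(x * x for x in phi_delayed)  # normalization grows with regressor energy
    err = y_now - y_ref_now                       # tracking error y(i) - y*(i)
    theta = [t + (mu / r) * x * err for t, x in zip(theta_prev, phi_delayed)]
    return theta, r

def control_p2(theta, phi2, sgn_b0):
    """Control law (8) for P2: u(i) = -(sgn b_0) * theta_hat(i)^T phi_2(i)."""
    return -sgn_b0 * sum(t * x for t, x in zip(theta, phi2))
```

With this sign convention the tracking error equals $-|b_0|\,\tilde\theta^T\varphi_2$ plus noise under (8), so the update moves the estimate toward the optimal parameter vector; for P1 the control $u(i)$ is instead obtained by solving $\hat\theta(i)^T\varphi_1(i) = 0$ for the $u(i)$ entry of $\varphi_1(i)$.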

III. GLOBAL STABILITY

In this section we shall formulate the global stability results for the algorithms P1 and P2 using, in general, the methodology presented in [4], [6]. Novel details are related to some properties of the sequence of regulator parameter estimates.

Theorem 1: Let the assumptions A1-A5 hold, together with the following.

A6: $\nu_jC(z)Q(z) - \mu_j/2$, $j = 1, 2$, are strictly positive real functions, where $\nu_1 = 1$ (for P1) and $\nu_2 = |b_0|^{-1}$ (for P2).

Let $\tilde\theta(i) = \hat\theta(i) - \theta_1^0$ for P1 and $\tilde\theta(i) = \hat\theta(i) - \theta_2^0$ for P2, where

\[ (\theta_1^0)^T = [\operatorname{coeff}\{G(q^{-1})Q(q^{-1})\}^T, 0, \ldots, 0, \operatorname{coeff}\{B(q^{-1})F(q^{-1})Q(q^{-1})\}^T, 0, \ldots, 0, \operatorname{coeff}\{C(q^{-1})T(q^{-1})\}^T, 0, \ldots, 0] \]

and

\[ (\theta_2^0)^T = [\operatorname{coeff}\{|b_0|^{-1}G(q^{-1})Q(q^{-1})\}^T, 0, \ldots, 0, \operatorname{coeff}\{|b_0|^{-1}(B(q^{-1})F(q^{-1})Q(q^{-1}) - b_0)\}^T, 0, \ldots, 0, \operatorname{coeff}\{|b_0|^{-1}C(q^{-1})T(q^{-1})\}^T, 0, \ldots, 0]; \]

the number of inserted zeros depends on the chosen values of $n_1$, $n_2$, and $n_3$ in (7), (8), i.e., on the number of adaptive regulator parameters. Then, for both algorithms P1 and P2, for any finite integer $l$ and $i_0 = 0, 1, \ldots, d-1$, with probability one

...

iv) $\limsup_{N\to\infty} \sum_{i=1}^{N} \cdots\, z(i)^2 < \infty$;

...

Proof: The major part of the proof can be derived as a direct extension of the results presented in [6]. We shall pay attention only to the assertions iii) and i). Consider one of $d$ recursions in P1 and put $i_0 = 0$. One can directly show that

\[ E\{\|\tilde\theta((i+1)d)\|^2 \mid \mathcal{F}_{id}\} \le \|\tilde\theta(id)\|^2 + \alpha(id) \ \text{(a.s.)} \tag{9} \]

where ... it is easy to verify that

\[ E\{X((i+1)d) \mid \mathcal{F}_{id}\} \le X(id) \ \text{(a.s.)} \tag{10} \]

i.e., $\{X(id), \mathcal{F}_{id}\}$ is a nonnegative supermartingale satisfying $E\{X(id)\} \le E\{X(0)\} \le c_3 < \infty$. The application of Kolmogorov's inequality [14] leads directly to

\[ \cdots \le \cdots \quad (\varepsilon > 0) \ \text{(a.s.)}. \tag{11} \]

The assertion iii) follows from the fact that $\gamma(id) > 0$ for all $i$. The assertion i) can now be proved after noticing that $X(id)$ converges to some finite random variable $X_0^\infty$ and that $\lim_{i\to\infty}\gamma(id) = 0$ (a.s.), i.e., $\lim_{i\to\infty}\|\tilde\theta(id)\|^2 = X_0^\infty < \infty$ (a.s.). The above conclusions obviously hold for all $i_0 \in \{0, \ldots, d-1\}$. The algorithm P2 can be analyzed in a similar way.
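For reference, the maximal inequality for nonnegative supermartingales, which is presumably the form of Kolmogorov's inequality invoked in the proof above, can be stated as follows. This is the standard textbook statement written in the notation of the proof; whether it coincides term by term with (11) is an assumption.

```latex
% Standard maximal inequality for a nonnegative supermartingale {X(id), F_{id}}:
\[
  P\Bigl\{ \sup_{i \ge 0} X(id) \ge \varepsilon \Bigr\}
  \;\le\; \frac{E\{X(0)\}}{\varepsilon}
  \;\le\; \frac{c_3}{\varepsilon}
  \qquad (\varepsilon > 0).
\]
% Letting \varepsilon \to \infty shows that \sup_i X(id) is finite with probability one.
```

Boundedness of $\|\tilde\theta(id)\|^2$ then follows whenever it is dominated by $X(id)$, which is the role the positivity of $\gamma(id)$ plays in the argument.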


IV. CONVERGENCE OF THE ADAPTIVE CONTROL LAW

In this section stronger results will be derived; it will be shown that both analyzed algorithms provide convergence of the adaptive control law to the optimal one in the Cesaro sense.

Any three-term time-varying controller for the analyzed system, having the structure implicit in (8), can be represented as

\[ R(\theta(i), q^{-1})u(i) = -S(\theta(i), q^{-1})y(i) + L(\theta(i), q^{-1})y^*(i) \tag{12} \]

where $\theta(i)$ is the $(n_1 + n_2 + n_3 + 3)$-dimensional regulator parameter vector and $S(\theta(i), q^{-1}) = S_i = \theta_1(i) + \cdots + \theta_{n_1+1}(i)q^{-n_1}$, $R(\theta(i), q^{-1}) = R_i = \theta_{n_1+2}(i) + \cdots + \theta_{n_1+n_2+2}(i)q^{-n_2}$, $L(\theta(i), q^{-1}) = L_i = \theta_{n_1+n_2+3}(i) + \cdots + \theta_{n_1+n_2+n_3+3}(i)q^{-n_3}$. For the algorithm P1 $\theta(i) = \hat\theta(i)$, while for P2 $\theta(i) = [\hat\theta_1(i), \ldots, \hat\theta_{n_1+1}(i), \operatorname{sgn} b_0, \hat\theta_{n_1+2}(i), \ldots, \hat\theta_{n_1+n_2+1}(i), \hat\theta_{n_1+n_2+2}(i), \ldots, \hat\theta_{n_1+n_2+n_3+2}(i)]^T$. According to (5), $\theta(i)$ produces the optimal control law if

\[ S_iBF = R_iG; \quad L_iBFQ = R_iTC. \tag{13} \]

Denote by $V$ the set of regulator parameter vectors satisfying (13).

Theorem 2: Let the assumptions A1-A6 hold. Then the adaptive control law converges to the optimal one for both algorithms P1 and P2 in the sense that for every open set $V_1 \supseteq V$

\[ \lim_{N\to\infty}\frac{1}{N}\sum_{i=1}^{N} I(\theta(i_0 + id) \in V_1) = 1 \ \text{(a.s.)}, \quad i_0 \in \{0, \ldots, d-1\} \tag{14} \]

where $I(\cdot)$ is the indicator function.

Proof: Introducing (12) into (1) one obtains the closed-loop system model

\[ H_iy(i+d) = L_iBy^*(i) + R_iCw(i+d) + \Delta_{i,1}u(i) + \Delta_{i,2}y(i) + \Delta_{i,3}y^*(i) \tag{15} \]

where $H_i = R_iA + S_iBq^{-d}$, $\Delta_{i,1} = R_iB - BR_i$, $\Delta_{i,2} = S_iB - BS_i$, and $\Delta_{i,3} = BL_i - L_iB$. After straightforward manipulations one obtains

\[ PH_iz(i) = M_iw(i) + N_iv(i) + \Delta_i^uu(i) + \Delta_i^yy(i) + \Delta_i^{y^*}y^*(i+d) + \Delta_i^vv(i+d) \tag{16} \]

where $M_i = P(R_iG - S_iBF)$, $N_i = L_iBQ - H_iT = L_iBFQ - R_iTC$, $\Delta_i^u = \Delta_{i,1}$, $\Delta_i^y = \Delta_{i,2}$, $\Delta_i^{y^*} = \Delta_{i,3}q^{-d} + (PL_iB - L_iBP)q^{-d} + H_iP - PH_i$, and $\Delta_i^v = PH_iW - H_iWP$.

Define a new sequence $\{\xi(i)\}$ by the following difference equation (obtained from (16) by shifting $\theta(i)$ $l$ steps backwards):

\[ PH_{i-l}\,\xi(i) = M_{i-l}w(i) + N_{i-l}v(i) + \Delta_{i-l}^uu(i) + \Delta_{i-l}^yy(i) + \Delta_{i-l}^{y^*}y^*(i+d) + \Delta_{i-l}^vv(i+d) \tag{17} \]

where $\max\{n_M, n_N\} < l < \infty$ ($n_M = \deg M_i$, $n_N = \deg N_i$) and the initial conditions are assumed to be finite w.p.1.

Analyze now one of the recursions in (7) and put $i_0 = 0$. After subtracting (17) from (16), one obtains

\[ \sum_{i=1}^{\infty}\frac{(PH_{(i-l)d}\,\xi(id))^2}{r_j(id)} \le K_1\sum_{i=1}^{\infty}\frac{(PH_{(i-l)d}\,z(id))^2}{r_j(id)} + K_2\sum_{i=1}^{\infty}\|\hat\theta(id) - \hat\theta((i-l)d)\|^2, \quad K_1, K_2 < \infty,\ j = 1, 2 \ \text{(a.s.)}. \tag{18} \]

The second term in (18) majorizes the sum of squared norms of the terms $(M_i - M_{i-l})w(i)$, $(N_i - N_{i-l})v(i)$, etc., having in mind that $\|\varphi_j(i)\|^2/r_j(i) \le 1$ and $(w(i)^2 + v(i)^2)/r_j(i) \le c_4 < \infty$ (a.s.). The first term on the right-hand side of (18) is finite by virtue of iv), and the second by virtue of ii); the left-hand side is, therefore, finite.

Applying the same reasoning, one obtains from (17)

\[ \sum_{i=1}^{\infty}\frac{(\Delta_{(i-l)d}^u\,u(id))^2}{r_j(id)} \le K_3\sum_{i=1}^{\infty}\|\hat\theta(id) - \hat\theta((i-l)d)\|^2 \ \text{(a.s.)} \quad (K_3 < \infty). \tag{19} \]

Analogous inequalities hold also for the terms from (17) involving $\Delta_{(i-l)d}^y$, $\Delta_{(i-l)d}^{y^*}$, and $\Delta_{(i-l)d}^v$. Therefore, from (17) and (19), after applying Kronecker's lemma, one obtains

\[ \cdots \tag{20} \]

The application of [8, Lemma A.1] leads to the conclusion that the cross product terms in (20) converge to zero. If $M_i = \sum_{k=0}^{n_M} m_k(i)q^{-k}$ and $N_i = \sum_{k=0}^{n_N} n_k(i)q^{-k}$ and if, for example, $\{x(id)\} = \{m_j((i-l)d)\,n_k((i-l)d)\,v(id-k)\,w(id-j)\}$, one easily concludes that $x(id)$ is $\mathcal{F}_{id-\delta}$-measurable, where $\delta = \min(j, k)$. Moreover,

\[ E\{x(id) \mid \mathcal{F}_{(i-l)d-\delta}\} = 0; \quad E\{x(id)^2 \mid \mathcal{F}_{(i-l)d-\delta}\} < \infty \ \text{(a.s.)} \tag{21} \]

since both $m_j((i-l)d)$ and $n_k((i-l)d)$ are $\mathcal{F}_{(i-l)d-\delta}$-measurable, and both $\{\sum_{j=1}^{i}(1/j^2)w(j)^2, \mathcal{F}_i\}$ and $\{\sum_{j=1}^{i}(1/j^2)v(j)^2, \mathcal{F}_i\}$ are convergent supermartingales. Therefore, it follows that

\[ \lim_{N\to\infty}\frac{1}{N}\sum_{i=1}^{N} d(\theta((i-l)d)) = 0 \ \text{(a.s.)} \tag{22} \]

where $d(\theta((i-l)d)) = m_j((i-l)d)^2\sigma_w^2 + n_k((i-l)d)^2\sigma_v^2$. In general, condition $d(\theta) = 0$ implies (13), i.e., that $\theta \in V$. Reasoning as in [5], [7] one concludes, consequently, that (22) implies (14). The theorem is thus proved.

V. CONVERGENCE OF THE REGULATOR PARAMETER ESTIMATES

In this section we shall analyze asymptotic properties of the regulator parameter estimates under the following additional assumptions concerning the structure of the adaptive regulator and the properties of the optimal regulator.

A7: At least one of the following three conditions holds:

(a) $n_1 = n_Q + \bar{n}_G = n_Q + n_G$;
(b) $n_2 = n_B + \bar{n}_F + n_Q = n_B + n_F + n_Q$;
(c) $n_3 = n_C + \bar{n}_T = n_C + n_T$.

A8: The optimal regulator transfer function $G_c(z^{-1}) = [-G(z^{-1})Q(z^{-1}) : C(z^{-1})T(z^{-1})]\,(B(z^{-1})F(z^{-1})Q(z^{-1}))^{-1}$ is irreducible.

Theorem 4: Let the assumptions A1-A8 hold. Then for the algorithm P2

\[ \lim_{i\to\infty}\hat\theta(i) = \theta_2^0 \ \text{(a.s.)}. \tag{23} \]

Proof: Assume $i_0 = 0$. Convergence of the sequence $\{d(\theta((i-l)d))\}$ to zero in the Cesaro sense (22) implies that zero is its cluster point, i.e., that there exists a subsequence of positive integers $\{s_p : s_{p-1} < s_p\}$ on which

\[ \lim_{p\to\infty} d(\theta((s_p - l)d)) = 0 \ \text{(a.s.)}. \tag{24} \]

Having in mind that $\|\hat\theta(i)\|$ is finite w.p.1 (Theorem 1), there exists a subsequence $\{t_p\} \subseteq \{s_p\}$ satisfying

\[ \lim_{p\to\infty} P(R_{(t_p-l)d}G - S_{(t_p-l)d}BF) = P(R_0G - S_0BF) = 0 \ \text{(a.s.)} \tag{25} \]

\[ \lim_{p\to\infty} (L_{(t_p-l)d}BQ - H_{(t_p-l)d}T) = L_0BQ - H_0T = 0 \ \text{(a.s.)} \tag{26} \]

where $L_0 = \lim_{p\to\infty} L_{(t_p-l)d}$, $S_0 = \lim_{p\to\infty} S_{(t_p-l)d}$, $R_0 = \lim_{p\to\infty} R_{(t_p-l)d}$, and $H_0 = \lim_{p\to\infty} H_{(t_p-l)d} = R_0A + S_0Bq^{-d}$ w.p.1.

Assume first that $n_1 = n_Q + n_G$, $n_2 = n_B + n_F + n_Q$, and $n_3 = n_C + n_T$, i.e., that the entire structure of the optimal regulator is known. Let $D_0$ be the greatest common divisor (g.c.d.) of $G$ and $BF$, i.e., $G = D_0G_0$ and $BF = D_0B_0F$ ($G$ and $F$ are coprime by A8). Therefore, by (25), $S_0 = D'D_0'G_0$ and $R_0 = D'D_0'B_0F$, where $D'$ and $D_0'$ are arbitrary polynomials satisfying $\deg D' = n_Q$ and $\deg D_0' = \deg D_0$. Thus, $H_0 = D'D_0'B_0C$ and (26) yields $L_0QD_0 = D'D_0'TC$. Consequently, $L_0 = k_0TC$ where $k_0$ is an arbitrary constant since $TC$ is coprime with both $Q$ and $D_0$ by virtue of A8. Therefore, $D'D_0' = k_0QD_0$ and the solution of (25), (26) is

\[ R_0 = k_0BFQ; \quad S_0 = k_0GQ; \quad L_0 = k_0CT. \tag{27} \]

In the algorithm P2, however, the leading coefficient of $R(\theta(i), q^{-1})$ is fixed, and equal to $\operatorname{sgn} b_0$; therefore, for P2, $k_0 = |b_0|^{-1}$, so that $\lim_{p\to\infty}\hat\theta(t_pd) = \theta_2^0$ [see (6)].
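The claim that every scalar multiple (27) of the optimal regulator polynomials satisfies the optimality conditions (13) can be checked numerically. The following sketch is only a sanity check on made-up polynomials; the helpers `pmul`, `scale`, and `satisfies_13` and all coefficient values are hypothetical.

```python
def pmul(x, y):
    """Multiply two polynomials given as ascending coefficient lists."""
    out = [0.0] * (len(x) + len(y) - 1)
    for i, xi in enumerate(x):
        for j, yj in enumerate(y):
            out[i + j] += xi * yj
    return out

def scale(k, x):
    return [k * xi for xi in x]

def satisfies_13(S, R, L, B, F, G, Q, T, C, tol=1e-9):
    """Check S*B*F == R*G and L*B*F*Q == R*T*C, i.e., conditions (13)."""
    BF = pmul(B, F)

    def close(u, v):
        return len(u) == len(v) and all(abs(a - b) <= tol for a, b in zip(u, v))

    return (close(pmul(S, BF), pmul(R, G))
            and close(pmul(L, pmul(BF, Q)), pmul(R, pmul(T, C))))

# Made-up example with b_0 = 2, so that k_0 = |b_0|^{-1} = 0.5 as for P2.
B, F, G = [2.0, 1.0], [1.0, 0.7], [0.35]
Q, T, C = [1.0, 0.3], [0.25], [1.0, 0.2]
k0 = 0.5
R0 = scale(k0, pmul(pmul(B, F), Q))  # R_0 = k_0 B F Q
S0 = scale(k0, pmul(G, Q))           # S_0 = k_0 G Q
L0 = scale(k0, pmul(C, T))           # L_0 = k_0 C T
ok = satisfies_13(S0, R0, L0, B, F, G, Q, T, C)
```

In this example the leading coefficient of `R0` equals $\operatorname{sgn} b_0 = 1$, which is the normalization that pins down $k_0$ in P2; perturbing any single coefficient of `R0` makes the check fail.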

It is clear that (27) holds also in the cases when:

a) $n_1 = n_Q + n_G$, $n_2 \ge n_B + n_Q + n_F$, $n_3 \ge n_C + n_T$

and

...

When

...

it is possible to obtain (27) after supposing that $S_0$ and $R_0$ have a greatest common divisor and concluding that it can be only a constant, since $\deg L_0 = \deg CT$. After applying the same reasoning for the remaining recursions in (7), one concludes that there exist subsequences $\{t_p^{i_0}\}$ satisfying

\[ \lim_{p\to\infty}\hat\theta(i_0 + t_p^{i_0}d) = \theta_2^0 \quad (i_0 = 0, \ldots, d-1) \ \text{(a.s.)}. \tag{28} \]

Thus, zero is the cluster point of the sequences $\{\|\tilde\theta(i_0 + id)\|^2\} = \{\|\hat\theta(i_0 + id) - \theta_2^0\|^2\}$. Since, by virtue of i), these sequences converge, it follows that for $i_0 \in \{0, \ldots, d-1\}$ $\lim_{i\to\infty}\|\tilde\theta(i_0 + id)\|^2 = 0$; this leads directly to (23).

Theorem 5: Let the assumptions A1-A8 hold. Then for algorithm P1

\[ \lim_{i\to\infty}\hat\theta(i_0 + id) = k_{i_0}\theta_1^0 \quad (i_0 = 0, \ldots, d-1) \ \text{(a.s.)} \tag{29} \]

where $k_{i_0}$ are random constants.

Proof: One can easily conclude that the part of the proof of Theorem 4 up to (27) is valid for both algorithms P1 and P2. In P1, however, all the regulator parameters are estimated, so that $k_0$ in (27) remains an arbitrary random constant. After applying the same reasoning for all the recursions, one concludes that there exist subsequences $\{t_p^{i_0}\}$ on which sequences $\{\|\hat\theta(i_0 + id) - k_{i_0}\theta_1^0\|^2\}$, where $k_{i_0}$ are random constants, converge to zero. By applying directly the methodology of Becker, Kumar, and Wei [7], one can readily prove that, for P1, the increments $\Delta\hat\theta(i) = \hat\theta(i) - \hat\theta(i-d)$ are orthogonal to $\hat\theta(i-d)$ (this property does not hold for P2), and that the sequences $\{\|\hat\theta(i_0 + id)\|^2\}$ $(i_0 = 0, \ldots, d-1)$ converge to finite random variables. This result leads to the conclusion that $\{\|\hat\theta(i_0 + id) - k_{i_0}\theta_1^0\|^2\}$ converges, i.e., that (29) holds.

Remark: More general conclusions can be derived by using the methodology of Theorems 3 and 4. Assume that an arbitrary value $\theta_j^a$ is assigned to an arbitrary $j$th element of the adaptive regulator parameter vector $\theta(i)$; let the corresponding component of the vector $\theta_1^0$ be $\theta_j^0 \ne 0$ [see (6)]. The parameter estimation recursion is obtained in this case from (7) by replacing the vector of regressors $\varphi_1(i)$ by $\varphi_a(i)$, which contains all the elements of $\varphi_1(i)$ except its $j$th component $\varphi_{1,j}(i)$, and $\mu_j$ by $\mu_a > 0$. The control law is defined by $\theta_j^a\varphi_{1,j}(i) + \hat\theta(i)^T\varphi_a(i) = 0$.

The global stability analysis shows that the positive real condition should be satisfied for $\alpha C(z)Q(z) - \mu_a/2$, where $\alpha = \theta_j^a/\theta_j^0$; obviously, $\alpha$ has to be positive, implying that the sign of the parameter $\theta_j^0$ has to be a priori known (A6 is a special case in which $\theta_j^0 = b_0$). The analysis of convergence of the parameter estimates is almost the same as the analysis related to the algorithm P2; the only difference is that, in (27), $k_0 = \theta_j^a/\theta_j^0$, wherefrom the conclusion about the strong consistency follows directly.

In general, one may conclude that the strong consistency of minimum variance self-tuning regulators can be achieved when at least the sign of one of the optimal regulator parameters is known and introduced into the algorithm, reducing the overall number of estimated parameters by one. Analogous conclusions can be drawn for the deterministic reference case [15].

VI. PERSISTENCE OF EXCITATION

In numerous papers dealing with parameter estimation algorithms, the persistence of excitation of the vector of regressors has been pointed out as one of the basic properties related to convergence of the parameter estimates, e.g., [5], [8], [16]. The persistence of excitation condition is satisfied for the algorithm Pj (j = 1, 2) if

Theorem 6: Let the assumptions A1-A8 hold. Then the persistence of excitation condition (30) is satisfied for the algorithm P2 and not for the algorithm P1.

Proof: It can be easily shown, extending the methodology of Caines and Lafortune [8], that for P2 &(ir = b(i) , . . . , y ( i - nl ) . u(i - I),...,u(i -n2),-y*(i),...,-y*(i -n3)] = $ f ( i ) T + $ f ( i ) T , where

G . A T . G . B B P B

- - W ( I - l ) + - - u ( z - l ) , . . . , - - w ( r - n 2 )

A T . B P + - -u(z - n 2 ) -

Having in mind that B(z) is stable, it follows from AI, A4, and iv) that 1) limN+w 1 / N ELI $ f ( i ) $ f ( i ) T 1 1 = 0, and 11 limN,, I/N $f(i)$f(i)Tll = o (as . ) . Consequently, the persistence of excitation condition holds for P2 if R; = limN-,, I / N ELl $ f ( i ) @ ( i ) T 2 C ~ Z , o < ci < m, (a.s.1. AC- cording to the ergodicity property in A2, the multivariable version of Herglotz's theorem [17] can be applied, yielding

where

Page 5: On the convergence of self-tuning stochastic servo algorithms based on stochastic approximation

1196 IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 34, NO. 11, NOVEMBER 1989

and

1 :o: . . . . . . : o F ( e J w ) : . . . . . . :eJnlwF(eJw) :eJwW2(eJw): . . . . . . : eJn2wW2(eiw)

~ e J ~ ~ ~ + d ) ~ ~ ~ ( e J w ) : e j ~ ~ ~ ( e J ~ ) ~ ~ ( e J ~ ) : . . . :eJn2w Wl(ejw)W3(eJw): W4(eJw): . . . :eJn3w W wl(elu):. . . -. . . 4 (eJw E2(eJW)T =

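where $W_1(e^{j\omega}) = T(e^{j\omega})/P(e^{j\omega})$, $W_2(e^{j\omega}) = G(e^{j\omega})/B(e^{j\omega})$, $W_3(e^{j\omega}) = A(e^{j\omega})/B(e^{j\omega})$, and $W_4(e^{j\omega}) = Q(e^{j\omega})/P(e^{j\omega})$.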

After a straightforward analysis based on [8], one can derive that $R_2$ is positive definite if $\lambda_1(z) = 0$, $\lambda_2(z) = 0$, and $\lambda_3(z) = 0$ is the only solution of the following pair of polynomial equations

where $\deg \lambda_1(z) = n_1$, $\deg \lambda_2(z) = n_2 - 1$, and $\deg \lambda_3(z) = n_3$. Suppose first that $n_1 = n_C + n_Q$, $n_2 = n_B + n_F + n_Q$, and $n_3 = n_C + n_T$. If $D_0$ is the g.c.d. of $G$ and $BF$, one obtains from the first equation in (33) that $\lambda_1 = zG_0D_0'$ and $\lambda_2 = B_0FD_0'$, where $D_0'$ is an arbitrary polynomial with $\deg D_0' = \deg D_0 + n_Q - 1$ (see Theorem 4). Assume that $D_1$ is the g.c.d. of $BFQ$ and $TC$; then $TC = T_1D_1^TC_1D_1^C$ and $BFQ = B_1D_1^BF_1D_1^FQ$, where $D_1^TD_1^C = D_1^BD_1^F = D_1$, since $Q$ is coprime with $TC$ by virtue of A8. From the second equation in (33) one obtains $\lambda_3 = zT_1C_1D_1'$ and $\lambda_2 = QB_1F_1D_1'$, where $D_1'$ is any polynomial satisfying $\deg D_1' = \deg D_1 - 1$, and, consequently, $QB_1F_1D_1' = B_0FD_0'$. Since $D_0$ and $D_1$ are coprime, one obtains that $B_1 = B^*D_0$, $B_0 = B^*D_1^B$, and $D_1D_0' = QD_0D_1'$. As $Q$ and $D_1$ are coprime, $D_0' = D_0''Q$, i.e., $D_1D_0'' = D_0D_1'$. The last equation implies that $D_0'' = 0$ and $D_1' = 0$, since $D_0$ and $D_1$ are coprime and $\deg D_1' = \deg D_1 - 1$, $\deg D_0'' = \deg D_0 - 1$. Therefore, $\lambda_1 = 0$, $\lambda_2 = 0$, and $\lambda_3 = 0$ is the only solution of (33). It is easy to verify that the same conclusion holds for $n_1$, $n_2$, and $n_3$ satisfying A1, i.e., $R_2 > 0$.

Applying the same methodology to the algorithm P1, one obtains, instead of (33), the following equations:

$\lambda_1(z)B(z)F(z) = \lambda_2(z)G(z); \quad \lambda_3(z)B(z)F(z)Q(z) = \lambda_2(z)T(z)C(z)$ (34)

where $\deg \lambda_1(z) = n_1$, $\deg \lambda_2(z) = n_2$, and $\deg \lambda_3(z) = n_3$. These equations have at least one nonzero solution, i.e., $R_1$ is singular. This result is in accordance with the observations of Kumar in [16], related to the regulation case.
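The role of the degree constraints can be checked numerically on a single equation of this type. The sketch below is a minimal illustration under stated assumptions, not the authors' procedure. It takes two coprime polynomials `a` and `b` with made-up coefficients, standing in for $B(z)F(z)$ and $G(z)$, and computes the dimension of the solution space of $\lambda_1 a = z^s \lambda_2 b$ from a stacked convolution matrix. The case $s = 0$ with full degrees mimics the first equation in (34). The case $s = 1$ with the degree of $\lambda_2$ lowered by one is an assumed variant, suggested by the solution form $\lambda_1 = zG_0D_0'$ quoted in the proof for P2. The helper names are hypothetical.

```python
import numpy as np


def conv_matrix(p, deg_x):
    """Matrix M such that M @ x gives the coefficients of p(z) * x(z),
    where x has degree deg_x; coefficients are in ascending powers of z."""
    p = np.asarray(p, dtype=float)
    m = np.zeros((len(p) + deg_x, deg_x + 1))
    for j in range(deg_x + 1):
        m[j:j + len(p), j] = p
    return m


def solution_space_dim(a, b, deg1, deg2, shift=0):
    """Dimension of {(lam1, lam2)} solving lam1 * a = z**shift * lam2 * b
    with deg lam1 <= deg1 and deg lam2 <= deg2."""
    m1 = conv_matrix(a, deg1)
    zb = np.concatenate([np.zeros(shift), np.asarray(b, dtype=float)])
    m2 = conv_matrix(zb, deg2)
    rows = max(m1.shape[0], m2.shape[0])
    m = np.zeros((rows, deg1 + deg2 + 2))
    m[:m1.shape[0], :deg1 + 1] = m1
    m[:m2.shape[0], deg1 + 1:] = -m2
    return int(m.shape[1] - np.linalg.matrix_rank(m))


a = [2.0, 3.0, 1.0]  # 2 + 3z + z^2 = (z + 1)(z + 2), illustrative stand-in for B(z)F(z)
b = [3.0, 1.0]       # 3 + z, illustrative stand-in for G(z), coprime with a

# Full degrees, no shift: lam1 = b, lam2 = a is a nonzero solution.
print(solution_space_dim(a, b, deg1=1, deg2=2))           # 1
# One degree fewer for lam2 and a factor z: only the zero solution remains.
print(solution_space_dim(a, b, deg1=1, deg2=1, shift=1))  # 0
```

A positive dimension corresponds to a nonzero solution and hence to a singular limit matrix; a zero dimension corresponds to the trivial solution only.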

VII. CONCLUSION

In this note the problem of adaptive tracking of stochastic reference signals is considered, supposing that the process can be represented by an ARMAX model with arbitrary time-delay and the reference signal by an ARMA model. Two algorithms of stochastic approximation type, providing adaptation to both process and reference characteristics, are proposed. They differ by the incorporated a priori knowledge about the optimal regulator parameters.

The given analysis provides a complete insight into the asymptotic properties of the algorithms, including the global stability, asymptotic optimality, convergence of the adaptive control law, and the strong consistency of the parameter estimates. It is demonstrated that the information about the sign of one of the optimal regulator parameters enables achieving the strong consistency, provided the structure of the adaptive regulator is adequate (A7) and the optimal regulator is irreducible (A8). In the case of no a priori information, the self-tuning property (convergence of the adaptive control law to the optimal one) still holds, even though the persistence of excitation condition is not satisfied. To the authors' knowledge, this is the first attempt to prove the strong consistency of self-tuning regulators in the general delay case.

The proposed methodology of analysis, based on the time-varying closed-loop system model, can be applied to a broader class of problems [18].

REFERENCES

[1] K. J. Åström and B. Wittenmark, "On self-tuning regulators," Automatica, vol. 9, pp. 185-199, 1973.
[2] L. Ljung, "Analysis of recursive stochastic algorithms," IEEE Trans. Automat. Contr., vol. AC-22, pp. 551-575, Aug. 1977.
[3] G. C. Goodwin, D. J. Hill, and M. Palaniswami, "A perspective of convergence of adaptive control algorithms," Automatica, vol. 20, pp. 519-531, 1984.
[4] G. C. Goodwin and K. S. Sin, Adaptive Filtering, Prediction and Control. Englewood Cliffs, NJ: Prentice-Hall, 1984.
[5] P. R. Kumar and P. Varaiya, Stochastic Systems: Estimation, Identification and Adaptive Control. Englewood Cliffs, NJ: Prentice-Hall, 1986.
[6] G. C. Goodwin, P. Ramadge, and P. Caines, "Discrete-time stochastic adaptive control," SIAM J. Contr. Optimiz., vol. 19, pp. 829-853, 1981.
[7] A. H. Becker, P. R. Kumar, and C. Z. Wei, "Adaptive control with the stochastic approximation algorithm: Geometry and convergence," IEEE Trans. Automat. Contr., vol. AC-30, pp. 330-338, Apr. 1985.
[8] P. Caines and S. Lafortune, "Adaptive control with recursive identification for stochastic linear systems," IEEE Trans. Automat. Contr., vol. AC-29, pp. 312-321, Apr. 1984.
[9] H. F. Chen, "Recursive system identification and adaptive control by use of the modified least squares algorithm," SIAM J. Contr. Optimiz., vol. 22, pp. 758-776, 1984.
[10] H. F. Chen and L. Guo, "Asymptotically optimal adaptive control with consistent parameter estimates," SIAM J. Contr. Optimiz., vol. 25, pp. 558-575, May 1987.
[11] P. R. Kumar and L. Praly, "Self-tuning trackers," SIAM J. Contr. Optimiz., vol. 25, pp. 1053-1071, July 1987.
[12] S. S. Stanković and M. S. Radenković, "Self-tuning servo for stochastic references," Automatica, vol. 22, pp. 241-244, 1986.
[13] T. L. Lai and C. Z. Wei, "Extended least-squares and their applications to adaptive control and prediction in linear systems," IEEE Trans. Automat. Contr., vol. AC-31, pp. 898-906, Oct. 1986.
[14] I. I. Gikhman and A. V. Skorokhod, Introduction into the Theory of Random Processes (in Russian). Moscow: Nauka.
[15] M. S. Radenković and S. S. Stanković, "Strong consistency of parameter estimates in direct self-tuning control algorithms based on stochastic approximation," Fac. Elec. Eng., Univ. Belgrade, Tech. Rep. A3/86, 1986.
[16] P. R. Kumar, "A survey of some results in stochastic adaptive control," SIAM J. Contr. Optimiz., vol. 23, pp. 329-380, May 1985.
[17] C. W. Burrill, Measure, Integration and Probability. New York: McGraw-Hill, 1972.
[18] S. S. Stanković and M. S. Radenković, "On the asymptotic behavior of an adaptive pole-placement algorithm," in Preprints 2nd IFAC Workshop on Adaptive Syst., Lund, Sweden, 1986, pp. 55-61.

Complete Results for a Class of State Feedback Disturbance Attenuation Problems

IAN R. PETERSEN

Abstract-This note considers a problem of disturbance attenuation using full state feedback. The particular class of linear systems under consideration is closely related to the so-called one block problem in H infinity control. This note gives a complete solution to this problem in terms of a certain algebraic Riccati equation.

I. INTRODUCTION AND DEFINITIONS

In recent years, a large degree of interest has been focused on the $H^\infty$ optimization problem; e.g., see [1]-[3]. However, the existing ap-

Manuscript received February 24, 1988; revised November 10, 1988. This work was supported by the Australian Research Grants Scheme.

The author is with the Department of Electrical Engineering, University College, University of New South Wales, Australian Defence Force Academy, Canberra ACT 2600, Australia.

IEEE Log Number 8930292

0018-9286/89/1100-1196$01.00 © 1989 IEEE