
IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 46, NO. 8, AUGUST 2001 1333

On a Discrete-Time Stochastic Learning Control Algorithm

Samer S. Saab

Manuscript received June 5, 2000; revised November 16, 2000 and March 9, 2001. This work was supported by the University Research Council at the Lebanese American University. The author is with the Department of Electrical and Computer Engineering, Lebanese American University, Byblos, Lebanon. Publisher Item Identifier S 0018-9286(01)07686-3.

Abstract—In an earlier paper, the learning gain for a D-type learning algorithm was derived by minimizing the trace of the input error covariance matrix for linear time-varying systems. It was shown that, if the product of the input/output coupling matrices is full-column rank, then the input error covariance matrix converges uniformly to zero in the presence of uncorrelated random disturbances, whereas the state error covariance matrix converges uniformly to zero in the presence of measurement noise. However, in general, the proposed algorithm requires knowledge of the state matrix. In this note, it is shown that equivalent results can be achieved without knowledge of the state matrix. Furthermore, the convergence rate of the input error covariance matrix is shown to be inversely proportional to the number of learning iterations.

Index Terms—Iterative learning control, stochastic control.

NOMENCLATURE

$K_k$ and $P_{u,k}$ Learning gain and the input error covariance matrices resulting from the optimal control algorithm presented in [1] (Section I).

$K_k$, $P_{u,k}$ Learning gain and sequence of matrices (analogous to the $P_{u,k}$ of [1]) used to define the modified learning algorithm proposed in this note (Section II).

$\overline{P}_{u,k}$ Actual input error covariance matrices resulting from the proposed (modified) learning algorithm.

I. PRELIMINARY

The system considered in [1] is a discrete time-varying linear system described by the following difference equation:

$$x(t+1,k) = A(t)x(t,k) + B(t)u(t,k) + w(t,k)$$
$$y(t,k) = C(t)x(t,k) + v(t,k) + v_b(k) \qquad (1)$$

where
$t \in [0, n_t]$;
$x(t,k) \in \mathbb{R}^n$;
$u(t,k) \in \mathbb{R}^p$;
$w(t,k) \in \mathbb{R}^n$;
$y(t,k) \in \mathbb{R}^q$;
$v(t,k) \in \mathbb{R}^q$;
$v_b(k) \in \mathbb{R}^q$ undesired bias vector;
$A(t)$ state matrix;
$B(t)$ input coupling matrix;
$C(t)$ output coupling matrix.

The learning update is given by

$$u(t,k+1) = u(t,k) + K(t,k)\left[e(t+1,k) - e(t,k)\right] \qquad (2)$$

where $K(t,k)$ is the $(p \times q)$ learning control gain matrix and $e(t,k)$ is the output error, i.e., $e(t,k) = y_d(t) - y(t,k)$, where $y_d(t)$ is a realizable desired output trajectory. It is assumed that, for any realizable output trajectory and an appropriate initial condition $x_d(0)$, there exists a unique control input $u_d(t) \in \mathbb{R}^p$ generating the trajectory for the nominal plant; that is, the following difference equation is satisfied:

$$x_d(t+1) = A(t)x_d(t) + B(t)u_d(t)$$
$$y_d(t) = C(t)x_d(t). \qquad (3)$$
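To make the mechanics of the D-type update (2) concrete, the following sketch runs the learning loop on a hypothetical scalar plant of the form (1). The plant parameters, desired trajectory, and decaying scalar gain are illustrative assumptions only; they are not the example of [1] and not the optimal gain derived below.

```python
import numpy as np

rng = np.random.default_rng(0)
n_t = 50                                   # time horizon [0, n_t]
A, B, C = 0.9, 1.0, 1.0                    # hypothetical scalar plant (1)

def run_trial(u, w_std=0.01, v_std=0.01):
    """Simulate one trial of (1) and return the measured output y(t, k)."""
    x = 0.0
    y = np.zeros(n_t + 1)
    for t in range(n_t + 1):
        y[t] = C * x + v_std * rng.standard_normal()
        if t < n_t:
            x = A * x + B * u[t] + w_std * rng.standard_normal()
    return y

y_d = np.sin(np.linspace(0.0, 2.0 * np.pi, n_t + 1))   # desired trajectory
u = np.zeros(n_t)                                      # u(t, 0)
for k in range(100):                                   # learning iterations
    e = y_d - run_trial(u)                             # output error e(t, k)
    K = 1.0 / (k + 1.0)                                # decaying gain (illustrative)
    u = u + K * (e[1:] - e[:-1])                       # D-type update (2)
```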

Define the state and input error vectors as $\delta x(t,k) \triangleq x_d(t) - x(t,k)$ and $\delta u(t,k) \triangleq u_d(t) - u(t,k)$, respectively. It is assumed that the initial state error $\delta x(0,k)$, the initial input error $\delta u(t,0)$, the state disturbance $w(t,k)$, and the unbiased measurement error $v(t,k)$ are all modeled as zero-mean white Gaussian noise and are statistically independent. Define the input error and state error covariance matrices as $P_{u,k} = E[\delta u(t,k)\,\delta u(t,k)^T]$ and $P_{x,k} = E[\delta x(t,k)\,\delta x(t,k)^T]$, respectively, where $E$ is the expectation operator. It is shown in [1] that the learning gain which minimizes the trace of the input error covariance matrix is given by

$$K_k = P_{u,k}(C^+B)^T\big[(C^+B)P_{u,k}(C^+B)^T + (C - C^+A)P_{x,k}(C - C^+A)^T + C^+Q_tC^{+T} + R_t + R_{t+1}\big]^{-1} \qquad (4)$$

where the argument $t$ is dropped for compactness, $C^+ \triangleq C(t+1)$, and $Q_t$ and $R_t$ denote the covariance matrices of $w(t,k)$ and $v(t,k)$, respectively.

The corresponding input error covariance update is given by

$$P_{u,k+1} = (I - K_kN)P_{u,k} \qquad (5)$$
$$= \big[I + P_{u,k}N^TS_{1,k}^{-1}N\big]^{-1}P_{u,k} \qquad (6)$$

where $S_{1,k} \triangleq (C - C^+A)P_{x,k}(C - C^+A)^T + C^+Q_tC^{+T} + R_t + R_{t+1}$ and $N \triangleq C^+B$. The corresponding results are summarized as follows. If $C(t+1)B(t)$ is full-column rank, then the learning algorithm, presented by (2), (4), and (5), guarantees the following.

2) $P_{u,k}$ is a symmetric positive-definite matrix $\forall k$ and $t \in [0, n_t]$. Moreover, the eigenvalues of $(I - K_kC^+B)$ are positive and strictly less than one; i.e., $0 < \lambda(I - K_kC^+B) < 1$ $\forall k$ and $t \in [0, n_t]$. Consequently, there exists a consistent norm $\|\cdot\|$ such that $\forall k$ and $t \in [0, n_t]$, $\|I - K_kC^+B\| < 1$.
3) $\|P_{u,k+1}\| < \|P_{u,k}\|$ $\forall k$. In addition, $P_{u,k} \to 0$ and $K_k \to 0$ uniformly in $[0, n_t]$ as $k \to \infty$.
4) In the absence of state disturbance and reinitialization errors (excluding biased measurement noise, i.e., $R_t$ is positive-definite), $\|P_{u,k+1}\| < \|P_{u,k}\|$ $\forall k$; $P_{u,k} \to 0$, $K_k \to 0$, and the state error covariance matrix $P_{x,k} \to 0$ uniformly in $[0, n_t]$ as $k \to \infty$.
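As a concrete reading of (4)-(6), the sketch below evaluates the optimal gain and both forms of the covariance update for one iteration at a fixed time $t$, and checks that (5) and (6) produce the same matrix. All matrices are random positive-definite placeholders, not the example of [1]; $Q_t$ and $R_t$ are the covariances of $w$ and $v$ as above.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, q = 3, 2, 2                      # illustrative dimensions

def spd(m):
    """Random symmetric positive-definite matrix (placeholder)."""
    s = rng.standard_normal((m, m))
    return s @ s.T + m * np.eye(m)

A = rng.standard_normal((n, n))        # A(t)
B = rng.standard_normal((n, p))        # B(t)
C = rng.standard_normal((q, n))        # C(t)
Cp = rng.standard_normal((q, n))       # C+ = C(t+1)
Pu, Px = spd(p), spd(n)                # P_{u,k}, P_{x,k}
Qt, Rt, Rt1 = spd(n), spd(q), spd(q)   # cov(w), cov(v(t)), cov(v(t+1))

N = Cp @ B                             # N = C(t+1) B(t)
S1 = (C - Cp @ A) @ Px @ (C - Cp @ A).T + Cp @ Qt @ Cp.T + Rt + Rt1
K = Pu @ N.T @ np.linalg.inv(N @ Pu @ N.T + S1)        # optimal gain (4)
Pu_next = (np.eye(p) - K @ N) @ Pu                     # update (5)
# Equation (6) yields the same matrix (matrix inversion lemma):
Pu_alt = np.linalg.inv(np.eye(p) + Pu @ N.T @ np.linalg.inv(S1) @ N) @ Pu
assert np.allclose(Pu_next, Pu_alt)
```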

II. MAIN RESULTS

In this section, the “modified,” or suboptimal, stochastic learningcontrol algorithm and its convergence characteristics are presented.Consider the “modified” learning gain matrix to be given by

$$K_k = P_{u,k}N^T\big[NP_{u,k}N^T + S_2\big]^{-1} \qquad (7)$$

where $S_2 \triangleq C^+Q_tC^{+T} + R_t + R_{t+1}$, and the recursion of the matrix $P_{u,k}$ to be given by

$$P_{u,k+1} = (I - K_kN)P_{u,k}(I - K_kN)^T + K_kS_2K_k^T$$
$$= P_{u,k} - K_kNP_{u,k} - P_{u,k}N^TK_k^T + K_k\big(NP_{u,k}N^T + S_2\big)K_k^T.$$


Note that this learning gain can no longer be claimed to be an optimal gain matrix. Substituting the value of $K_k$ into the last equality, we get

$$P_{u,k+1} = P_{u,k} - P_{u,k}N^T\big(NP_{u,k}N^T + S_2\big)^{-1}NP_{u,k} - P_{u,k}N^T\big(NP_{u,k}N^T + S_2\big)^{-1}NP_{u,k} + P_{u,k}N^T\big(NP_{u,k}N^T + S_2\big)^{-1}NP_{u,k} = (I - K_kN)P_{u,k}.$$

Making use of [1, Claim 1], we have

$$P_{u,k+1} = (I - K_kN)P_{u,k} = \big[I + P_{u,k}N^TS_2^{-1}N\big]^{-1}P_{u,k}. \qquad (8)$$

It is worthwhile noting that by eliminating $(C - C^+A)P_{x,k}(C - C^+A)^T$, the matrix $S_{1,k}$ becomes $S_2$; consequently, the resulting expressions for $P_{u,k}$, that is, $P_{u,k+1} = (I - K_kN)P_{u,k}$ and (8), are consistent with (5) and (6), respectively. Since the term $(C - C^+A)P_{x,k}(C - C^+A)^T$ is eliminated from the learning gain matrix $K_k$ and from the update of the error covariance matrix, 1) knowledge of the state matrix is no longer needed, and 2) $P_{u,k}$ no longer represents the "true" input error covariance matrix.
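The following sketch iterates the modified gain (7) and recursion (8); note that, in contrast with (4), neither $A(t)$ nor $P_{x,k}$ appears anywhere. The matrices are again random placeholders, used only for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, q = 3, 2, 3                                 # illustrative dimensions

def spd(m):
    """Random symmetric positive-definite matrix (placeholder)."""
    s = rng.standard_normal((m, m))
    return s @ s.T + m * np.eye(m)

B = rng.standard_normal((n, p))                   # B(t)
Cp = rng.standard_normal((q, n))                  # C+ = C(t+1)
Qt, Rt, Rt1 = spd(n), spd(q), spd(q)              # cov(w), cov(v(t)), cov(v(t+1))

N = Cp @ B                                        # N = C(t+1) B(t)
S2 = Cp @ Qt @ Cp.T + Rt + Rt1                    # S2: the state matrix never enters
Pu = spd(p)                                       # P_{u,0}
for k in range(10):
    K = Pu @ N.T @ np.linalg.inv(N @ Pu @ N.T + S2)   # modified gain (7)
    Pu = (np.eye(p) - K @ N) @ Pu                     # recursion (8)
```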

In the following, the convergence characteristics of the modified gain $K_k$ and of $P_{u,k}$ are shown to be equivalent to those of the optimal algorithm of [1].

Theorem 1: If $N = C(t+1)B(t)$ is full-column rank, then the learning algorithm, presented by (2), (7), and (8), guarantees the following:
2) $P_{u,k}$ is a symmetric positive-definite matrix $\forall k$ and $t \in [0, n_t]$;
3) the eigenvalues of $(I - K_kN)$ are positive and strictly less than one; i.e., $0 < \lambda(I - K_kN) < 1$ $\forall k$ and $t \in [0, n_t]$;
4) $\|P_{u,k+1}\| < \|P_{u,k}\|$ $\forall k$. In addition, $P_{u,k} \to 0$ and $K_k \to 0$ uniformly in $[0, n_t]$ as $k \to \infty$.
The proofs of Theorem 1 are identical to the proofs of their counterparts presented in [1], and are thus omitted.

In what follows, we show that as $k \to \infty$, $P_{u,k} \to 0$ (and consequently $K_k \to 0$) uniformly in $[0, n_t]$ if and only if the "true" input error covariance matrix $\overline{P}_{u,k} \to 0$ uniformly in $[0, n_t]$. This fact underlines the main contribution of this note. In addition, we show that the convergence rate is inversely proportional to the number of learning iterations.

Claim 1: If $N$ is full-column rank, then the learning algorithm, presented by (2), (7), and (8), assures that

$$P_{u,k} = \big[I + kP_{u,0}N^TS_2^{-1}N\big]^{-1}P_{u,0}. \qquad (9)$$

In the following, we denote by $\lambda(M)$ the eigenvalues of $M$.

Proof: The proof proceeds by induction. Using (8) for $k = 1$, we obtain

$$P_{u,1} = [I + P_{u,0}M]^{-1}P_{u,0}$$

where $M \triangleq N^TS_2^{-1}N$ is a symmetric positive-definite matrix. Since (9) is true for $k = 1$, we assume that the equality is true for $k - 1$, i.e., $P_{u,k-1} = [I + (k-1)P_{u,0}M]^{-1}P_{u,0}$. Again using (8), we get

$$P_{u,k} = \big[I + [I + (k-1)P_{u,0}M]^{-1}P_{u,0}M\big]^{-1}[I + (k-1)P_{u,0}M]^{-1}P_{u,0}$$
$$= \Big[[I + (k-1)P_{u,0}M]\big[I + [I + (k-1)P_{u,0}M]^{-1}P_{u,0}M\big]\Big]^{-1}P_{u,0}$$
$$= \{I + (k-1)P_{u,0}M + P_{u,0}M\}^{-1}P_{u,0}$$
$$= (I + kP_{u,0}M)^{-1}P_{u,0}.$$
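The closed form (9) is easy to verify numerically: iterating (8) $k$ times from a random positive-definite $P_{u,0}$ should reproduce $[I + kP_{u,0}M]^{-1}P_{u,0}$ at every step. A minimal sketch with placeholder matrices:

```python
import numpy as np

rng = np.random.default_rng(3)
p, q = 2, 3                                     # N is q x p, full column rank a.s.
N = rng.standard_normal((q, p))
s = rng.standard_normal((q, q)); S2 = s @ s.T + q * np.eye(q)
s = rng.standard_normal((p, p)); Pu0 = s @ s.T + p * np.eye(p)

M = N.T @ np.linalg.inv(S2) @ N                 # M = N^T S2^{-1} N
Pu = Pu0.copy()
for k in range(1, 21):
    K = Pu @ N.T @ np.linalg.inv(N @ Pu @ N.T + S2)        # gain (7)
    Pu = (np.eye(p) - K @ N) @ Pu                          # recursion (8)
    closed = np.linalg.inv(np.eye(p) + k * Pu0 @ M) @ Pu0  # closed form (9)
    assert np.allclose(Pu, closed)              # (8) iterated k times equals (9)
```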

Theorem 2: If $N$ is full-column rank and $P_{u,0}$ is a symmetric positive-definite matrix, then the learning algorithm, presented by (2), (7), and (8), guarantees that

$$\|P_{u,k}\| < \frac{1}{k}\,c_1 \qquad (10)$$

where $c_1 \triangleq 1/\min[\lambda(N^TS_2^{-1}N)]$. Let $c_2 \triangleq c_1\|N\|\,\|S_2^{-1}\|$; then $\|K(t,k)\| < (1/k)c_2$.

Proof: Since $P_{u,k}$ is a symmetric positive-definite matrix, using the results of Claim 1 we have

$$\|P_{u,k}\| = \big\|\big(P_{u,0}^{-1} + kN^TS_2^{-1}N\big)^{-1}\big\| = 1\big/\min\big[\lambda\big(P_{u,0}^{-1} + kN^TS_2^{-1}N\big)\big].$$

Note that since $P_{u,0}^{-1}$ is a symmetric positive-definite matrix, $\min[\lambda(P_{u,0}^{-1} + kN^TS_2^{-1}N)] > \min[\lambda(kN^TS_2^{-1}N)] = k\min[\lambda(N^TS_2^{-1}N)]$. Therefore

$$\|P_{u,k}\| < \frac{1}{k}\,\frac{1}{\min[\lambda(N^TS_2^{-1}N)]} = \frac{1}{k}\,c_1.$$

Using (7), we have

$$\|K(t,k)\| < \frac{c_1}{k}\,\|N\|\,\big\|\big(NP_{u,k}N^T + S_2\big)^{-1}\big\| \le \frac{c_1}{k}\,\|N\|\,\big\|S_2^{-1}\big\|.$$
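Continuing the previous sketch, the bounds of Theorem 2 can be probed directly: with $c_1$ and $c_2$ computed from the same placeholder matrices, the spectral norms of $P_{u,k}$ and $K_k$ stay below $c_1/k$ and $c_2/k$ at every iteration.

```python
import numpy as np

rng = np.random.default_rng(4)
p, q = 2, 3
N = rng.standard_normal((q, p))
s = rng.standard_normal((q, q)); S2 = s @ s.T + q * np.eye(q)
s = rng.standard_normal((p, p)); Pu = s @ s.T + p * np.eye(p)

M = N.T @ np.linalg.inv(S2) @ N
c1 = 1.0 / np.linalg.eigvalsh(M).min()            # c1 = 1 / min eig(N^T S2^{-1} N)
c2 = c1 * np.linalg.norm(N, 2) * np.linalg.norm(np.linalg.inv(S2), 2)

for k in range(1, 101):
    K = Pu @ N.T @ np.linalg.inv(N @ Pu @ N.T + S2)     # K_{k-1}, used in (8)
    Pu = (np.eye(p) - K @ N) @ Pu                       # P_{u,k} via (8)
    K = Pu @ N.T @ np.linalg.inv(N @ Pu @ N.T + S2)     # K_k via (7), from P_{u,k}
    assert np.linalg.norm(Pu, 2) < c1 / k               # Eq. (10)
    assert np.linalg.norm(K, 2) < c2 / k                # gain bound of Theorem 2
```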

Theorem 3: Assuming that $N$ is a full-column rank matrix, the learning algorithm, presented by (2), (7), and (8), guarantees that the input error covariance matrix $\overline{P}_{u,k} \to 0$ uniformly in $[0, n_t]$ as $k \to \infty$. Furthermore, the rate of convergence of $\overline{P}_{u,k}$ is inversely proportional to $k$ for $k > 1$; that is, there exists a positive constant $c_P$ such that $\|\overline{P}_{u,k}\| < c_P/k$. Conversely, if $\overline{P}_{u,k} \to 0$ as $k \to \infty$ and $\|\overline{P}_{u,k}\| < c_P/k$, then $P_{u,k} \to 0$ as $k \to \infty$. Furthermore, if the rate of convergence of $\overline{P}_{u,k}$ is inversely proportional to $k$, then the rate of convergence of $P_{u,k}$ is also inversely proportional to $k$.

Proof: In [1], it is shown that for any given learning gain matrix, in particular for the gain $K_k$ of (7), the input error covariance matrix is given by

$$\overline{P}_{u,k+1} \triangleq E\big[\delta u(t,k+1)\,\delta u(t,k+1)^T\big] = (I - K_kN)\overline{P}_{u,k}(I - K_kN)^T + K_k\big((C - C^+A)\overline{P}_{x,k}(C - C^+A)^T + S_2\big)K_k^T$$

where $\overline{P}_{x,k}$ is the state error covariance matrix corresponding to $K_k$ $\big(= P_{u,k}N^T[NP_{u,k}N^T + S_2]^{-1}\big)$. Define $L_k \triangleq (C - C^+A)\overline{P}_{x,k}(C - C^+A)^T$; this implies that $L_k = L_k^T$ and $L_k \ge 0$. Employing the fact that

$$P_{u,k+1} = (I - K_kN)P_{u,k}(I - K_kN)^T + K_kS_2K_k^T$$

and subtracting the last two equations, we get

$$\Delta\overline{P}_{u,k+1} \triangleq \overline{P}_{u,k+1} - P_{u,k+1} = (I - K_kN)\,\Delta\overline{P}_{u,k}\,(I - K_kN)^T + K_kL_kK_k^T.$$

Without loss of generality (see the subsequent remark), it is assumed that only at initialization we set $P_{u,0} = \overline{P}_{u,0}$, that is, $\Delta\overline{P}_{u,0} = 0$. Define $D_k \triangleq K_kL_kK_k^T$. Iterating the last equation up to $k - 1$, we have

$$\overline{P}_{u,k} = P_{u,k} + \Delta\overline{P}_{u,k}$$


Fig. 1. Top plot: $\overline{P}_{u,k}$ (solid) and $P_{u,k}$ (dashed). Bottom plot: $K_k$.

where

$$\Delta\overline{P}_{u,k} = \sum_{i=0}^{k-1}\left[\prod_{j=i+1}^{k-1}(I - K_{k-j}N)\right]D_i\left[\prod_{j=i+1}^{k-1}(I - K_{k-j}N)\right]^T \qquad (11)$$

with $\prod_{j=k}^{k-1}(\cdot) \triangleq I$. Note that since $D_i$ is a symmetric positive-semidefinite matrix, $\Delta\overline{P}_{u,k}$ is also a symmetric positive-semidefinite matrix. Employing (8), we have

$$P_{u,k-i} = (I - K_{k-i-1}N)P_{u,k-i-1} = \prod_{j=i+1}^{k-1}(I - K_{k-j}N)\,P_{u,1} = [I + (k-i)G]^{-1}P_{u,0}$$

where $G \triangleq P_{u,0}N^TS_2^{-1}N$, and (9) is used to obtain the last equality. Note that since $P_{u,0}$ and $N^TS_2^{-1}N$ are symmetric positive-definite matrices, the eigenvalues of $G$ are strictly positive; that is, $\lambda(G) > 0$. Substituting $P_{u,1} = [I + G]^{-1}P_{u,0}$ into the second equality of the last equation, we get

$$\prod_{j=i+1}^{k-1}(I - K_{k-j}N) = [I + (k-i)G]^{-1}(I + G).$$

Since $\min\{\lambda[I + (k-i)G]\} > \min\{\lambda[(k-i)G]\} = (k-i)\min[\lambda(G)]$, then

$$\big\|[I + (k-i)G]^{-1}(I + G)\big\| < \frac{1}{(k-i)}\,\frac{\|I + G\|}{\min[\lambda(G)]}.$$

From the previous results, we have $\|K_k\| < (1/k)c_2$, or equivalently $\|K_i\| < (1/i)c_2$. Since $\|I - K_kN\| < 1$ for all $k$, the boundedness of $\overline{P}_{x,k}$ is guaranteed [1], which implies that there exists a positive constant $c_L$ such that $\|L_k\| \le c_L$. Therefore, $\|D_i\| < (1/i^2)c_2^2c_L$. Define $c_{c1} \triangleq \big(c_2^2c_L\|I + G\|^2\big)\big/\{\min[\lambda(G)]\}^2$ and $c_{c2} \triangleq \big(\|D_0\|\,\|I + G\|^2\big)\big/\{\min[\lambda(G)]\}^2$. Taking the norm on both sides of $\Delta\overline{P}_{u,k}$ defined in (11) and using the derived norm bounds, we get

$$\|\Delta\overline{P}_{u,k}\| \le \sum_{i=1}^{k-1}\left\|\prod_{j=i+1}^{k-1}(I - K_{k-j}N)\right\|\,\|D_i\|\,\left\|\left[\prod_{j=i+1}^{k-1}(I - K_{k-j}N)\right]^T\right\| + \left\|\prod_{j=1}^{k-1}(I - K_{k-j}N)\right\|\,\|D_0\|\,\left\|\left[\prod_{j=1}^{k-1}(I - K_{k-j}N)\right]^T\right\| < c_{c1}\sum_{i=1}^{k-1}\frac{1}{i^2(k-i)^2} + c_{c2}\,\frac{1}{k^2}.$$

Note that $\sum_{i=1}^{k-1}\big[1/i^2(k-i)^2\big]$ equals $1$, $0.5$, and $\approx 0.285$ for $k = 2$, $3$, and $4$, respectively, and is less than $1/k$ for $k \ge 5$. This implies that for $k > 1$, $\sum_{i=1}^{k-1}\big[1/i^2(k-i)^2\big] \le 2/k$. Define $c_{c3} \triangleq 2c_{c1} + c_{c2}$; then we have

$$\|\Delta\overline{P}_{u,k}\| < \frac{c_{c3}}{k}.$$
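The combinatorial bound used in the last step can be confirmed numerically; a short check of $\sum_{i=1}^{k-1} 1/[i^2(k-i)^2] \le 2/k$:

```python
# Verify sum_{i=1}^{k-1} 1 / (i^2 (k - i)^2) <= 2 / k for 1 < k <= 1000.
for k in range(2, 1001):
    s = sum(1.0 / (i * i * (k - i) ** 2) for i in range(1, k))
    assert s <= 2.0 / k
```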


Equation (11) implies that

$$\|\overline{P}_{u,k}\| \le \|P_{u,k}\| + \|\Delta\overline{P}_{u,k}\|.$$

Therefore, by applying the learning algorithm presented by (2), (7), and (8), and thus $\|P_{u,k}\| < (1/k)c_1$, we have

$$\|\overline{P}_{u,k}\| < \frac{c_P}{k}$$

where $c_P \triangleq c_1 + c_{c3}$. Consequently, as $k \to \infty$, $\overline{P}_{u,k} \to 0$. Conversely, (11) implies that $\|\overline{P}_{u,k}\| = \|P_{u,k} + \Delta\overline{P}_{u,k}\|$. Since $P_{u,k}$ is a symmetric positive-definite matrix and $\Delta\overline{P}_{u,k}$ is a symmetric positive-semidefinite matrix, then 1) if $\overline{P}_{u,k} \to 0$ as $k \to \infty$, then $P_{u,k} \to 0$ and $\Delta\overline{P}_{u,k} \to 0$ as $k \to \infty$; and 2) if $\|\overline{P}_{u,k}\| < c_P/k$, then there exist positive constants $c_{P1}$ and $c_{\Delta P1}$ such that $\|P_{u,k}\| < (1/k)c_{P1}$ and $\|\Delta\overline{P}_{u,k}\| < c_{\Delta P1}/k$ for all $k > 1$.
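The mechanism of this proof can be illustrated by propagating the recursion $\Delta\overline{P}_{u,k+1} = (I - K_kN)\Delta\overline{P}_{u,k}(I - K_kN)^T + K_kL_kK_k^T$ alongside (8). In the sketch below, a fixed positive-semidefinite placeholder stands in for the bounded sequence $L_k$ (an assumption replacing the $\overline{P}_{x,k}$ recursion of [1]); the quantity $k\,\|\overline{P}_{u,k}\|$ then remains bounded, consistent with the $c_P/k$ rate.

```python
import numpy as np

rng = np.random.default_rng(5)
p, q = 2, 3
N = rng.standard_normal((q, p))
s = rng.standard_normal((q, q)); S2 = s @ s.T + q * np.eye(q)
s = rng.standard_normal((p, p)); Pu = s @ s.T + p * np.eye(p)
s = rng.standard_normal((q, q)); L = s @ s.T      # placeholder for bounded L_k >= 0

dP = np.zeros((p, p))                             # Delta Pbar_{u,0} = 0
sup_k_norm = 0.0
for k in range(1, 201):
    K = Pu @ N.T @ np.linalg.inv(N @ Pu @ N.T + S2)   # K_{k-1} via (7)
    I_KN = np.eye(p) - K @ N
    dP = I_KN @ dP @ I_KN.T + K @ L @ K.T             # Delta Pbar recursion
    Pu = I_KN @ Pu                                    # P_{u,k} via (8)
    Pbar = Pu + dP                                    # true covariance, as in (11)
    sup_k_norm = max(sup_k_norm, k * np.linalg.norm(Pbar, 2))
print(sup_k_norm)   # stays bounded: consistent with ||Pbar_{u,k}|| < c_P / k
```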

Remark: When applying the proposed algorithm, it is intuitive to initially set $P_{u,0}$ to the estimate of $\overline{P}_{u,0}$. However, if this is not the case, then $\Delta\overline{P}_{u,0} \ne 0$. Thus, $\Delta\overline{P}_{u,k}$ in (11) becomes

$$\Delta\overline{P}_{u,k} = \sum_{i=0}^{k-1}\left[\prod_{j=i+1}^{k-1}(I - K_{k-j}N)\right]D_i\left[\prod_{j=i+1}^{k-1}(I - K_{k-j}N)\right]^T + \left[\prod_{i=0}^{k-1}(I - K_{k-i-1}N)\right]\Delta\overline{P}_{u,0}\left[\prod_{i=0}^{k-1}(I - K_{k-i-1}N)\right]^T.$$

For instance, to guarantee that $\Delta\overline{P}_{u,k}$ is a symmetric positive-semidefinite matrix, it may be assumed that $\Delta\overline{P}_{u,0}$ is also a symmetric positive-semidefinite matrix. Applying an argument similar to that of the original proof leads to the desired results.

Theorem 4: If $N$ is a full-column rank matrix, then in the absence of state disturbance and reinitialization errors (excluding biased measurement noise), the learning algorithm, presented by (2), (7), and (8), guarantees that the input error covariance matrix $\overline{P}_{u,k} \to 0$ and the state error covariance matrix $\overline{P}_{x,k} = E[\delta x(t,k)\,\delta x(t,k)^T] \to 0$ uniformly in $[0, n_t]$ as $k \to \infty$.

Proof: Theorem 1 implies that $P_{u,k} \to 0$, and consequently Theorem 3 implies that $\overline{P}_{u,k} \to 0$. The rest of the proof is similar to the proof of its counterpart in [1], and is thus omitted.

III. NUMERICAL EXAMPLE

The algorithm presented by (2), (7), and (8) is now applied to the same example given in [1]. The convergence characteristics of $\overline{P}_{u,k}$, $P_{u,k}$, and $K_k$ are illustrated in Fig. 1. The top and bottom plots conform with the 10 dB/decade attenuation characteristics of $\overline{P}_{u,k}$, $P_{u,k}$, and $K_k$.
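The example of [1] is not reproduced here, but the qualitative behavior of Fig. 1 can be checked on any small system. The sketch below runs a Monte-Carlo ensemble of input errors for a hypothetical scalar plant, with the state-error coupling term neglected (so $L_k = 0$ and the empirical variance should track $P_{u,k}$); the coupling values and noise levels are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(6)
trials = 20000
B, C = 1.0, 1.0                 # scalar input/output coupling: N = C B = 1
Qw, Rv = 1e-4, 1e-4             # variances of w and v (illustrative)

N = C * B
S2 = C * Qw * C + 2.0 * Rv      # S2 = C+ Q_t C+^T + R_t + R_{t+1}, scalar form
du = rng.standard_normal(trials)        # ensemble of du(t, 0), zero mean
Pu = du.var()                           # P_{u,0}
for k in range(1, 101):
    K = Pu * N / (N * Pu * N + S2)      # scalar form of (7)
    # du(t, k+1) = (1 - K N) du(t, k) - K eta, eta ~ N(0, S2); state errors neglected
    du = (1.0 - K * N) * du - K * np.sqrt(S2) * rng.standard_normal(trials)
    Pu = (1.0 - K * N) * Pu             # recursion (8), scalar form
    if k in (1, 10, 100):
        print(k, du.var(), Pu)          # both decay roughly like 1/k (10 dB/decade)
```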

IV. CONCLUSION

This note presented an "upgraded," or suboptimal, version of the stochastic learning algorithm presented in [1]. The presented algorithm does not require knowledge of the state matrix. In the presence of uncorrelated random state disturbances, reinitialization errors, and biased measurement errors, the algorithm is shown to drive the input error covariance matrix to zero as the number of learning iterations increases. The rate of convergence is shown to be inversely proportional to the number of iterations. The state error covariance matrix is also shown to converge uniformly to zero in the presence of random measurement errors.

ACKNOWLEDGMENT

The author would like to thank one of the earlier reviewers for constructive suggestions for improving this work.

REFERENCES

[1] S. S. Saab, "A discrete-time stochastic learning control algorithm," IEEE Trans. Automat. Contr., vol. 46, pp. 877-887, June 2001.
