IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 46, NO. 8, AUGUST 2001 1333
On a Discrete-Time Stochastic Learning Control Algorithm
Samer S. Saab
Abstract—In an earlier paper, the learning gain for a D-type learning algorithm is derived based on minimizing the trace of the input error covariance matrix for linear time-varying systems. It is shown that, if the product of the input/output coupling matrices is full-column rank, then the input error covariance matrix converges uniformly to zero in the presence of uncorrelated random disturbances, whereas the state error covariance matrix converges uniformly to zero in the presence of measurement noise. However, in general, the proposed algorithm requires knowledge of the state matrix. In this note, it is shown that equivalent results can be achieved without knowledge of the state matrix. Furthermore, the convergence rate of the input error covariance matrix is shown to be inversely proportional to the number of learning iterations.
Index Terms—Iterative learning control, stochastic control.
NOMENCLATURE
$K^*_k$, $P^*_{u,k}$ — Learning gain and input error covariance matrices resulting from the optimal control algorithm presented in [1].
$K_k$, $P_{u,k}$ — Learning gain and sequence of matrices (analogous to $P^*_{u,k}$) used to define the modified learning algorithm proposed in this note.
$\bar{P}_{u,k}$ — Actual input error covariance matrices resulting from the proposed (modified) learning algorithm.
I. PRELIMINARY
The system considered in [1] is a discrete time-varying linear system described by the following difference equation:

$$x(t+1,k) = A(t)x(t,k) + B(t)u(t,k) + w(t,k)$$
$$y(t,k) = C(t)x(t,k) + v(t,k) + v_b(k) \qquad (1)$$

where $t \in [0, n_t]$; $x(t,k) \in \mathbb{R}^n$; $u(t,k) \in \mathbb{R}^p$; $w(t,k) \in \mathbb{R}^n$; $y(t,k) \in \mathbb{R}^q$; $v(t,k) \in \mathbb{R}^q$ is the measurement noise; $v_b(k) \in \mathbb{R}^q$ is an undesired bias vector; $A(t)$ is the state matrix; $B(t)$ is the input coupling matrix; and $C(t)$ is the output coupling matrix.
The learning update is given by
$$u(t,k+1) = u(t,k) + K(t,k)\,[e(t+1,k) - e(t,k)] \qquad (2)$$
where $K(t,k)$ is the $(p \times q)$ learning control gain matrix, and $e(t,k)$ is the output error, i.e., $e(t,k) = y_d(t) - y(t,k)$, where $y_d(t)$ is a realizable desired output trajectory. It is assumed that, for any realizable output trajectory and an appropriate initial condition $x_d(0)$, there exists a unique control input $u_d(t) \in \mathbb{R}^p$ generating the trajectory for the nominal plant. That is, the following difference equation is satisfied:

$$x_d(t+1) = A(t)x_d(t) + B(t)u_d(t)$$
$$y_d(t) = C(t)x_d(t). \qquad (3)$$

Manuscript received June 5, 2000; revised November 16, 2000 and March 9, 2001. This work was supported by the University Research Council at the Lebanese American University.
The author is with the Department of Electrical and Computer Engineering, Lebanese American University, Byblos, Lebanon.
Publisher Item Identifier S 0018-9286(01)07686-3.
Define the state and input error vectors as $\Delta x(t,k) \triangleq x_d(t) - x(t,k)$ and $\Delta u(t,k) \triangleq u_d(t) - u(t,k)$, respectively. It is assumed that the initial state error $\Delta x(0,k)$, the initial input error $\Delta u(t,0)$, the state disturbance $w(t,k)$, and the unbiased measurement error $v(t,k)$ are all modeled as zero-mean white Gaussian noise and are statistically independent.
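Before turning to the covariance analysis, the mechanics of system (1), trajectory (3), and update (2) can be exercised in a few lines. The matrices, horizon, and constant scalar gain below are illustrative assumptions, not values from the paper (whose algorithm uses the iteration-varying stochastic gain derived later); the sketch only shows the D-type update contracting the output error in the noise-free case when $CB$ is full-column rank:

```python
import numpy as np

# Noise-free sketch of the D-type update (2) on a small instance of (1).
# A, B, C, the horizon nt, and the fixed scalar gain K are assumed
# illustrative values, not taken from the paper or its example.
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])
B = np.array([[0.0],
              [1.0]])
C = np.array([[1.0, 1.0]])          # C B = 1, i.e., full column rank
nt = 5                              # time horizon: t = 0, ..., nt

def run(u):
    """Simulate (1) with w = v = v_b = 0 and x(0, k) = x_d(0) = 0."""
    x, ys = np.zeros(2), []
    for t in range(nt + 1):
        ys.append((C @ x).item())
        if t < nt:
            x = A @ x + B @ u[t]
    return np.array(ys)             # y(0), ..., y(nt)

rng = np.random.default_rng(1)
u_d = rng.standard_normal((nt, 1))  # input defining a realizable y_d via (3)
y_d = run(u_d)

u = np.zeros((nt, 1))               # initial input guess u(t, 0)
K = 0.8 / (C @ B).item()            # constant gain with |1 - K*CB| < 1
e0 = np.max(np.abs(y_d - run(u)))
for _ in range(30):                 # learning iterations k
    e = y_d - run(u)
    for t in range(nt):
        u[t] += K * (e[t + 1] - e[t])   # update (2)
e_final = np.max(np.abs(y_d - run(u)))
print(e_final < 1e-6 and e_final < e0)  # True
```

Since the reinitialization error is zero here, the input error at $t = 0$ contracts geometrically, and the later time steps follow by induction along the horizon, mirroring the deterministic part of the analysis below.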
Define the input error and state error covariance matrices as $P^*_{u,k} = E[\Delta u(t,k)\Delta u(t,k)^T]$ and $P_{x,k} = E[\Delta x(t,k)\Delta x(t,k)^T]$, respectively, where $E$ is the expectation operator. It is shown in [1] that the learning gain which minimizes the trace of the input error covariance matrix is given by
$$K^*_k = P^*_{u,k}(C^+B)^T\big[(C^+B)P^*_{u,k}(C^+B)^T + (C - C^+A)P_{x,k}(C - C^+A)^T + C^+Q_tC^{+T} + R_t + R_{t+1}\big]^{-1} \qquad (4)$$

where the argument $t$ is dropped for compactness, and $C^+ \triangleq C(t+1)$.
The corresponding input error covariance update is given by
$$P^*_{u,k+1} = (I - K^*_kN)P^*_{u,k} \qquad (5)$$
$$P^*_{u,k+1} = \big[I + P^*_{u,k}N^TS_{1,k}^{-1}N\big]^{-1}P^*_{u,k} \qquad (6)$$

where $S_{1,k} \triangleq (C - C^+A)P_{x,k}(C - C^+A)^T + C^+Q_tC^{+T} + R_t + R_{t+1}$ and $N \triangleq C^+B$. The corresponding results are summarized as follows. If $C(t+1)B(t)$ is full-column rank, then the learning algorithm, presented by (2), (4), and (5), guarantees the following.
1) $P^*_{u,k}$ is a symmetric positive-definite matrix $\forall k$ and $t \in [0, n_t]$. Moreover, the eigenvalues of $(I - K^*_kC^+B)$ are positive and strictly less than one, i.e., $0 < \lambda(I - K^*_kC^+B) < 1$ $\forall k$ and $t \in [0, n_t]$. Consequently, there exists a consistent norm $\|\cdot\|$ such that $\|I - K^*_kC^+B\| < 1$ $\forall k$ and $t \in [0, n_t]$.
2) $\|P^*_{u,k+1}\| < \|P^*_{u,k}\|$ $\forall k$. In addition, $P^*_{u,k} \to 0$ and $K^*_k \to 0$ uniformly in $[0, n_t]$ as $k \to \infty$.
3) In the absence of state disturbance and reinitialization errors (excluding biased measurement noise, which implies that $R_t$ is positive definite), $\|P^*_{u,k+1}\| < \|P^*_{u,k}\|$ $\forall k$; moreover, $P^*_{u,k} \to 0$, $K^*_k \to 0$, and the state error covariance matrix $P_{x,k} \to 0$ uniformly in $[0, n_t]$ as $k \to \infty$.
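The equivalence of the two covariance updates (5) and (6) under the gain (4) is the standard covariance/information-form duality (a matrix-inversion-lemma identity), and is easy to spot-check numerically. All matrices below are random stand-ins (an assumption of this sketch, not the paper's example), with the covariances built symmetric positive definite:

```python
import numpy as np

# Check that, with the gain chosen by (4), update (5) equals update (6).
# All matrices are assumed random stand-ins for the system data.
rng = np.random.default_rng(0)
n, p, q = 3, 2, 2

def spd(m):
    """Random symmetric positive-definite m x m matrix."""
    X = rng.standard_normal((m, m))
    return X @ X.T + m * np.eye(m)

A = rng.standard_normal((n, n))
B = rng.standard_normal((n, p))
C = rng.standard_normal((q, n))     # stands in for C(t)
Cp = rng.standard_normal((q, n))    # stands in for C^+ = C(t+1)
Pu, Px = spd(p), spd(n)
Qt, Rt, Rt1 = spd(n), spd(q), spd(q)

N = Cp @ B                                            # N = C^+ B
S1 = (C - Cp @ A) @ Px @ (C - Cp @ A).T + Cp @ Qt @ Cp.T + Rt + Rt1
K = Pu @ N.T @ np.linalg.inv(N @ Pu @ N.T + S1)       # optimal gain (4)

lhs = (np.eye(p) - K @ N) @ Pu                        # update (5)
rhs = np.linalg.inv(np.eye(p) + Pu @ N.T @ np.linalg.inv(S1) @ N) @ Pu  # (6)
print(np.allclose(lhs, rhs))   # True
```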
II. MAIN RESULTS

In this section, the "modified," or suboptimal, stochastic learning control algorithm and its convergence characteristics are presented. Consider the "modified" learning gain matrix to be given by
$$K_k = P_{u,k}N^T\big[NP_{u,k}N^T + S_2\big]^{-1} \qquad (7)$$

where $S_2 \triangleq C^+Q_tC^{+T} + R_t + R_{t+1}$, and the recursion of the matrix $P_{u,k}$ to be given by

$$P_{u,k+1} = (I - K_kN)P_{u,k}(I - K_kN)^T + K_kS_2K_k^T$$
$$\quad\; = P_{u,k} - K_kNP_{u,k} - P_{u,k}N^TK_k^T + K_k(NP_{u,k}N^T + S_2)K_k^T.$$
0018–9286/01$10.00 © 2001 IEEE
Note that this learning gain can no longer be claimed to be the optimal gain matrix. Substituting the value of $K_k$ into the last equality, we get

$$P_{u,k+1} = P_{u,k} - P_{u,k}N^T(NP_{u,k}N^T + S_2)^{-1}NP_{u,k} - P_{u,k}N^T(NP_{u,k}N^T + S_2)^{-1}NP_{u,k} + P_{u,k}N^T(NP_{u,k}N^T + S_2)^{-1}NP_{u,k}$$
$$\quad\; = (I - K_kN)P_{u,k}.$$

Making use of [1, Claim 1], we have

$$P_{u,k+1} = (I - K_kN)P_{u,k} = \big[I + P_{u,k}N^TS_2^{-1}N\big]^{-1}P_{u,k}. \qquad (8)$$
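The algebra above can be confirmed numerically: with the gain (7), the Joseph-form recursion, its collapsed form $(I - K_kN)P_{u,k}$, and the information form (8) all agree. The matrices $N$, $S_2$, and $P_{u,k}$ below are assumed random stand-ins ($N$ full column rank, $S_2$ and $P_{u,k}$ symmetric positive definite):

```python
import numpy as np

# With K_k from (7), the Joseph-form update, its collapsed form
# (I - K_k N) P_{u,k}, and the information form (8) coincide.
rng = np.random.default_rng(5)
p, q = 2, 3
N = rng.standard_normal((q, p))                 # full column rank (a.s.)
X = rng.standard_normal((q, q))
S2 = X @ X.T + q * np.eye(q)                    # symmetric positive definite
Y = rng.standard_normal((p, p))
Pu = Y @ Y.T + p * np.eye(p)                    # symmetric positive definite

K = Pu @ N.T @ np.linalg.inv(N @ Pu @ N.T + S2)            # gain (7)
ImKN = np.eye(p) - K @ N
joseph = ImKN @ Pu @ ImKN.T + K @ S2 @ K.T                 # Joseph form
short = ImKN @ Pu                                          # collapsed form
info = np.linalg.inv(np.eye(p) + Pu @ N.T @ np.linalg.inv(S2) @ N) @ Pu  # (8)
print(np.allclose(joseph, short), np.allclose(short, info))   # True True
```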
It is worthwhile noting that, by eliminating $(C - C^+A)P_{x,k}(C - C^+A)^T$, the matrix $S_{1,k}$ becomes $S_2$, and consequently the results for $P_{u,k}$, that is, $P_{u,k+1} = (I - K_kN)P_{u,k}$ and (8), are consistent with (5) and (6), respectively. Since the term $(C - C^+A)P_{x,k}(C - C^+A)^T$ is eliminated in the learning gain matrix $K_k$ and in the update of the error covariance matrix, then 1) knowledge of the state matrix is no longer needed, and 2) $P_{u,k}$ no longer represents the "true" input error covariance matrix.

In the following, the convergence characteristics of $K_k$ and $P_{u,k}$ are shown to be equivalent to those of $K^*_k$ and $P^*_{u,k}$.
Theorem 1: If $N = C(t+1)B(t)$ is full-column rank, then the learning algorithm, presented by (2), (7), and (8), guarantees the following:
1) $P_{u,k}$ is a symmetric positive-definite matrix $\forall k$ and $t \in [0, n_t]$;
2) the eigenvalues of $(I - K_kN)$ are positive and strictly less than one, i.e., $0 < \lambda(I - K_kN) < 1$ $\forall k$ and $t \in [0, n_t]$;
3) $\|P_{u,k+1}\| < \|P_{u,k}\|$ $\forall k$. In addition, $P_{u,k} \to 0$ and $K_k \to 0$ uniformly in $[0, n_t]$ as $k \to \infty$.
The proofs of Theorem 1 are identical to the proofs of their counterparts presented in [1], and are thus omitted.
In what follows, we show that, as $k \to \infty$, $P_{u,k} \to 0$ (and consequently $K_k \to 0$) uniformly in $[0, n_t]$ if and only if the "true" input error covariance matrix $\bar{P}_{u,k} \to 0$ uniformly in $[0, n_t]$. This fact underlines the main contribution of this manuscript. In addition, we show that the convergence is inversely proportional to the number of learning iterations.
Claim 1: If $N$ is full-column rank, then the learning algorithm, presented by (2), (7), and (8), assures that

$$P_{u,k} = \big[I + kP_{u,0}N^TS_2^{-1}N\big]^{-1}P_{u,0}. \qquad (9)$$
In the following, we denote by $\lambda(M)$ the eigenvalues of $M$.
Proof: The proof proceeds by induction. Using (8) for $k = 1$, we obtain

$$P_{u,1} = [I + P_{u,0}M]^{-1}P_{u,0}$$

where $M \triangleq N^TS_2^{-1}N$ is a symmetric positive-definite matrix. Since (9) is true for $k = 1$, we assume that the equality is true for $k - 1$, i.e., $P_{u,k-1} = [I + (k-1)P_{u,0}M]^{-1}P_{u,0}$. Again using (8), we get

$$P_{u,k} = \big\{I + [I + (k-1)P_{u,0}M]^{-1}P_{u,0}M\big\}^{-1}[I + (k-1)P_{u,0}M]^{-1}P_{u,0}$$
$$\quad\; = \big\{[I + (k-1)P_{u,0}M]\big[I + [I + (k-1)P_{u,0}M]^{-1}P_{u,0}M\big]\big\}^{-1}P_{u,0}$$
$$\quad\; = \{I + (k-1)P_{u,0}M + P_{u,0}M\}^{-1}P_{u,0}$$
$$\quad\; = (I + kP_{u,0}M)^{-1}P_{u,0}.$$
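The induction can be spot-checked numerically: iterating (8) from an assumed random symmetric positive-definite $P_{u,0}$ reproduces the closed form (9). The matrices below are stand-ins with $N$ full column rank:

```python
import numpy as np

# Iterate recursion (8) k times and compare with the closed form (9).
rng = np.random.default_rng(2)
p, q = 2, 3
N = rng.standard_normal((q, p))           # full column rank (a.s.)
X = rng.standard_normal((q, q))
S2 = X @ X.T + q * np.eye(q)              # symmetric positive definite
Y = rng.standard_normal((p, p))
Pu0 = Y @ Y.T + p * np.eye(p)             # symmetric positive definite

M = N.T @ np.linalg.inv(S2) @ N           # M = N^T S2^{-1} N
Pu, k = Pu0.copy(), 7
for _ in range(k):
    Pu = np.linalg.inv(np.eye(p) + Pu @ M) @ Pu          # recursion (8)
closed = np.linalg.inv(np.eye(p) + k * Pu0 @ M) @ Pu0    # closed form (9)
print(np.allclose(Pu, closed))   # True
```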
Theorem 2: If $N$ is full-column rank and $P_{u,0}$ is a symmetric positive-definite matrix, then the learning algorithm, presented by (2), (7), and (8), guarantees that

$$\|P_{u,k}\| < \frac{1}{k}\,c_1 \qquad (10)$$

where $c_1 \triangleq 1/\min[\lambda(N^TS_2^{-1}N)]$. Let $c_2 \triangleq c_1\|N\|\,\|S_2^{-1}\|$; then $\|K(t,k)\| < (1/k)c_2$.
Proof: Since $P_{u,k}$ is a symmetric positive-definite matrix, then using the results of Claim 1, we have

$$\|P_{u,k}\| = \big\|(P_{u,0}^{-1} + kN^TS_2^{-1}N)^{-1}\big\| = 1/\min\big[\lambda(P_{u,0}^{-1} + kN^TS_2^{-1}N)\big].$$

Note that, since $P_{u,0}^{-1}$ is a symmetric positive-definite matrix, $\min[\lambda(P_{u,0}^{-1} + kN^TS_2^{-1}N)] > \min[\lambda(kN^TS_2^{-1}N)] = k\min[\lambda(N^TS_2^{-1}N)]$. Therefore

$$\|P_{u,k}\| < \frac{1}{k}\,\frac{1}{\min[\lambda(N^TS_2^{-1}N)]} = \frac{1}{k}\,c_1.$$

Using (7), we have

$$\|K(t,k)\| < \frac{c_1}{k}\|N\|\,\big\|(NP_{u,k}N^T + S_2)^{-1}\big\| \le \frac{c_1}{k}\|N\|\,\|S_2^{-1}\| = \frac{1}{k}\,c_2.$$
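Both bounds can be checked over a range of iterations using the closed form (9); spectral norms are used, and the matrices are assumed random stand-ins with $N$ full column rank:

```python
import numpy as np

# Check ||P_{u,k}|| < c1/k and ||K_k|| < c2/k (spectral norm) for k = 1..50.
rng = np.random.default_rng(3)
p, q = 2, 3
N = rng.standard_normal((q, p))
X = rng.standard_normal((q, q))
S2 = X @ X.T + q * np.eye(q)              # symmetric positive definite
Y = rng.standard_normal((p, p))
Pu0 = Y @ Y.T + p * np.eye(p)             # symmetric positive definite

S2inv = np.linalg.inv(S2)
M = N.T @ S2inv @ N
c1 = 1.0 / np.linalg.eigvalsh(M)[0]       # 1 / min eigenvalue of M
c2 = c1 * np.linalg.norm(N, 2) * np.linalg.norm(S2inv, 2)

ok = True
for k in range(1, 51):
    Puk = np.linalg.inv(np.eye(p) + k * Pu0 @ M) @ Pu0   # closed form (9)
    Kk = Puk @ N.T @ np.linalg.inv(N @ Puk @ N.T + S2)   # gain (7)
    ok = ok and np.linalg.norm(Puk, 2) < c1 / k
    ok = ok and np.linalg.norm(Kk, 2) < c2 / k
print(ok)   # True
```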
Theorem 3: Assuming that $N$ is a full-column-rank matrix, the learning algorithm, presented by (2), (7), and (8), guarantees that the input error covariance matrix $\bar{P}_{u,k} \to 0$ uniformly in $[0, n_t]$ as $k \to \infty$. Furthermore, the rate of convergence of $\bar{P}_{u,k}$ is inversely proportional to $k$ for $k > 1$; that is, there exists a positive constant $c_P$ such that $\|\bar{P}_{u,k}\| < c_P/k$. Conversely, if $\bar{P}_{u,k} \to 0$ as $k \to \infty$ and $\|\bar{P}_{u,k}\| < c_P/k$, then $P_{u,k} \to 0$ as $k \to \infty$. Furthermore, if the rate of convergence of $\bar{P}_{u,k}$ is inversely proportional to $k$, then the rate of convergence of $P_{u,k}$ is also inversely proportional to $k$.
Proof: In [1], it is shown that, for any given learning gain matrix, in particular the modified gain $K_k$, the input error covariance matrix is given by

$$\bar{P}_{u,k+1} \triangleq E[\Delta u(t,k+1)\Delta u(t,k+1)^T] = (I - K_kN)\bar{P}_{u,k}(I - K_kN)^T + K_k\big((C - C^+A)\bar{P}_{x,k}(C - C^+A)^T + S_2\big)K_k^T$$

where $\bar{P}_{x,k}$ is the state error covariance matrix corresponding to $K_k$ $(= P_{u,k}N^T[NP_{u,k}N^T + S_2]^{-1})$. Define $L_k \triangleq (C - C^+A)\bar{P}_{x,k}(C - C^+A)^T$; this implies that $L_k = L_k^T$ and $L_k \ge 0$. Employing the fact that

$$P_{u,k+1} = (I - K_kN)P_{u,k}(I - K_kN)^T + K_kS_2K_k^T$$

then, by subtracting the last two equations, we get

$$\Delta\bar{P}_{u,k+1} \triangleq \bar{P}_{u,k+1} - P_{u,k+1} = (I - K_kN)\Delta\bar{P}_{u,k}(I - K_kN)^T + K_kL_kK_k^T.$$

Without loss of generality (see the subsequent remark), it is assumed that, at initialization only, we set $P_{u,0} = \bar{P}_{u,0}$, that is, $\Delta\bar{P}_{u,0} = 0$. Define $D_k \triangleq K_kL_kK_k^T$. Iterating the last equation up to $k - 1$, we have

$$\bar{P}_{u,k} = P_{u,k} + \Delta\bar{P}_{u,k}$$
Fig. 1. Top plot: $\bar{P}_{u,k}$ (solid) and $P_{u,k}$ (dashed). Bottom plot: $K_k$.
where

$$\Delta\bar{P}_{u,k} = \sum_{i=0}^{k-1}\Bigg[\prod_{j=i+1}^{k-1}(I - K_{k-j}N)\Bigg]D_i\Bigg[\prod_{j=i+1}^{k-1}(I - K_{k-j}N)\Bigg]^T \qquad (11)$$

with $\prod_{j=k}^{k-1}(\cdot) \triangleq I$. Note that, since $D_i$ is a symmetric positive-semidefinite matrix, $\Delta\bar{P}_{u,k}$ is also a symmetric positive-semidefinite matrix. Employing (8), we have
$$P_{u,k-i} = (I - K_{k-i-1}N)P_{u,k-i-1} = \Bigg[\prod_{j=i+1}^{k-1}(I - K_{k-j}N)\Bigg]P_{u,1} = [I + (k-i)G]^{-1}P_{u,0}$$

where $G \triangleq P_{u,0}N^TS_2^{-1}N$, and (9) is used to obtain the last equality. Note that, since $P_{u,0}$ and $N^TS_2^{-1}N$ are symmetric positive-definite matrices, the eigenvalues of $G$ are strictly positive, that is, $\lambda(G) > 0$. Substituting $P_{u,1} = [I + G]^{-1}P_{u,0}$ into the second equality of the last equation, we get

$$\prod_{j=i+1}^{k-1}(I - K_{k-j}N) = [I + (k-i)G]^{-1}(I + G).$$

Since $\min\{\lambda[I + (k-i)G]\} > \min\{\lambda[(k-i)G]\} = (k-i)\min[\lambda(G)]$, then

$$\big\|[I + (k-i)G]^{-1}(I + G)\big\| < \frac{1}{(k-i)}\,\frac{\|I + G\|}{\min[\lambda(G)]}.$$
From the previous results, we have $\|K_k\| < (1/k)c_2$, or $\|K_i\| < (1/i)c_2$. Since, for all $k$, $\|I - K_kN\| < 1$, the boundedness of $\bar{P}_{x,k}$ is guaranteed [1], which implies that there exists a positive constant $c_L$ such that $\|L_k\| \le c_L$. Therefore, $\|D_i\| < (1/i^2)c_2^2c_L$. Define $c_{c1} \triangleq (c_2^2c_L\|I + G\|^2)/\{\min[\lambda(G)]\}^2$ and $c_{c2} \triangleq (\|D_0\|\,\|I + G\|^2)/\{\min[\lambda(G)]\}^2$. Taking the norm on both sides of $\Delta\bar{P}_{u,k}$ defined in (11) and using the derived norm bounds, we get

$$\|\Delta\bar{P}_{u,k}\| \le \sum_{i=1}^{k-1}\Bigg\|\prod_{j=i+1}^{k-1}(I - K_{k-j}N)\Bigg\|\,\|D_i\|\,\Bigg\|\prod_{j=i+1}^{k-1}(I - K_{k-j}N)^T\Bigg\| + \Bigg\|\prod_{j=1}^{k-1}(I - K_{k-j}N)\Bigg\|\,\|D_0\|\,\Bigg\|\prod_{j=1}^{k-1}(I - K_{k-j}N)^T\Bigg\|$$
$$< c_{c1}\sum_{i=1}^{k-1}\frac{1}{i^2(k-i)^2} + c_{c2}\,\frac{1}{k^2}.$$

Note that $\sum_{i=1}^{k-1}[1/i^2(k-i)^2]$ equals 1 and 0.5 for $k = 2$ and $3$, respectively, and is less than $1/k$ for $k > 4$. This implies that, for $k > 1$, $\sum_{i=1}^{k-1}[1/i^2(k-i)^2] \le 2/k$. Define $c_{c3} \triangleq \max(2c_{c1}, c_{c2})$; then we have

$$\|\Delta\bar{P}_{u,k}\| < \frac{c_{c3}}{k}.$$
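The summation bound invoked above is easy to verify directly; a minimal sketch:

```python
# Verify sum_{i=1}^{k-1} 1/(i^2 (k-i)^2) <= 2/k for k = 2, ..., 2000
# (equality holds at k = 2).
ok = True
for k in range(2, 2001):
    s = sum(1.0 / (i * i * (k - i) ** 2) for i in range(1, k))
    ok = ok and s <= 2.0 / k
print(ok)   # True
```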
Equation (11) implies that

$$\|\bar{P}_{u,k}\| \le \|P_{u,k}\| + \|\Delta\bar{P}_{u,k}\|.$$

Therefore, by applying the learning algorithm presented by (2), (7), and (8), thus $\|P_{u,k}\| < (1/k)c_1$, we have

$$\|\bar{P}_{u,k}\| < \frac{c_P}{k}$$

where $c_P \triangleq c_1 + c_{c3}$. Consequently, as $k \to \infty$, $\bar{P}_{u,k} \to 0$. Conversely, (11) implies that $\|\bar{P}_{u,k}\| = \|P_{u,k} + \Delta\bar{P}_{u,k}\|$. Since $P_{u,k}$ is a symmetric positive-definite matrix and $\Delta\bar{P}_{u,k}$ is a symmetric positive-semidefinite matrix, then 1) if $\bar{P}_{u,k} \to 0$ as $k \to \infty$, then $P_{u,k} \to 0$ and $\Delta\bar{P}_{u,k} \to 0$ as $k \to \infty$; and 2) if $\|\bar{P}_{u,k}\| < c_P/k$, then there exist positive constants $c_{P1}$ and $c_{\Delta P1}$ such that $\|P_{u,k}\| < (1/k)c_{P1}$ and $\|\Delta\bar{P}_{u,k}\| < c_{\Delta P1}/k$ for all $k > 1$.
Remark: When applying the proposed algorithm, it is intuitive to initially set $P_{u,0}$ to the estimate of $\bar{P}_{u,0}$. However, if this is not the case, then $\Delta\bar{P}_{u,0} \ne 0$. Thus, $\Delta\bar{P}_{u,k}$ in (11) becomes

$$\Delta\bar{P}_{u,k} = \sum_{i=0}^{k-1}\Bigg[\prod_{j=i+1}^{k-1}(I - K_{k-j}N)\Bigg]D_i\Bigg[\prod_{j=i+1}^{k-1}(I - K_{k-j}N)\Bigg]^T + \Bigg[\prod_{i=0}^{k-1}(I - K_{k-i-1}N)\Bigg]\Delta\bar{P}_{u,0}\Bigg[\prod_{i=0}^{k-1}(I - K_{k-i-1}N)\Bigg]^T.$$

For instance, to guarantee that $\Delta\bar{P}_{u,k}$ is a symmetric positive-semidefinite matrix, it may be assumed that $\Delta\bar{P}_{u,0}$ is also symmetric positive semidefinite. Applying an argument similar to that of the original proof leads to the desired results.
Theorem 4: If $N$ is a full-column-rank matrix, then, in the absence of state disturbance and reinitialization errors (excluding biased measurement noise), the learning algorithm, presented by (2), (7), and (8), guarantees that the input error covariance matrix $\bar{P}_{u,k} \to 0$ and the state error covariance matrix $\bar{P}_{x,k} = E[\Delta x(t,k)\Delta x(t,k)^T] \to 0$ uniformly in $[0, n_t]$ as $k \to \infty$.
Proof: Theorem 1 implies that $P_{u,k} \to 0$, and consequently Theorem 3 implies that $\bar{P}_{u,k} \to 0$. The rest of the proof is similar to the proof of its counterpart in [1], and is thus omitted.
III. NUMERICAL EXAMPLE

The algorithm presented by (2), (7), and (8) is now applied to the same example given in [1]. The convergence characteristics of $\bar{P}_{u,k}$, $P_{u,k}$, and $K_k$ are illustrated in Fig. 1. The top and bottom plots conform with the 10 dB/decade attenuation characteristics of $\bar{P}_{u,k}$, $P_{u,k}$, and $K_k$.
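The 10 dB/decade figure is simply the decibel reading of the $1/k$ rate: since $\|P_{u,k}\| \approx c/k$ for large $k$, the quantity $10\log_{10}\|P_{u,k}\|$ drops by about 10 dB per decade of iterations. A sketch on small fixed illustrative matrices (not the example of [1]) makes this visible:

```python
import numpy as np

# 10*log10 of ||P_{u,k}|| via the closed form (9), sampled a decade apart
# in k; the decade-to-decade drop approaches -10 dB, i.e., a 1/k rate.
# N, S2, P_{u,0} are fixed illustrative values.
N = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])        # full column rank
S2 = 2.0 * np.eye(3)              # symmetric positive definite
Pu0 = np.eye(2)
M = N.T @ np.linalg.inv(S2) @ N

def norm_P(k):
    """Spectral norm of P_{u,k} from the closed form (9)."""
    return np.linalg.norm(np.linalg.inv(np.eye(2) + k * Pu0 @ M) @ Pu0, 2)

db = [10 * np.log10(norm_P(k)) for k in (10, 100, 1000, 10000)]
steps = [b - a for a, b in zip(db, db[1:])]
print([float(round(s, 2)) for s in steps])   # [-9.29, -9.92, -9.99]
```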
IV. CONCLUSION
This note presented an "upgraded," or suboptimal, version of the stochastic algorithm presented in [1]. The presented algorithm does not require knowledge of the state matrix. In the presence of uncorrelated random state disturbance, reinitialization errors, and biased measurement errors, this algorithm is shown to drive the input error covariance matrix to zero as the number of learning iterations increases. The rate of convergence is shown to be inversely proportional to the number of iterations. The state error covariance matrix is also shown to converge uniformly to zero in the presence of random measurement errors.
ACKNOWLEDGMENT
The author would like to thank one of the earlier reviewers for constructive suggestions in improving this work.
REFERENCES
[1] S. S. Saab, "A discrete-time stochastic learning control algorithm," IEEE Trans. Automat. Contr., vol. 46, pp. 877–887, June 2001.
Robust Nonlinear Integral Control
Zhong-Ping Jiang and Iven Mareels
Abstract—It is well known from linear systems theory that an integral control law is needed for asymptotic set-point regulation under parameter perturbations. This note presents a similar result for a class of nonlinear systems in the presence of an unknown equilibrium due to uncertain nonlinearities and dynamic uncertainties. Both partial-state and output feedback cases are considered. Sufficient small-gain type conditions are identified for the existence of linear and nonlinear control laws. A procedure for robust nonlinear integral controller design is presented and illustrated via a practical example of fan speed control.
Index Terms—Dynamic uncertainties, input-to-state stability, nonlinear systems, robust integral control, small-gain.
I. INTRODUCTION
It is widely recognized that an integral controller is inherently robust in the face of model and controller parameter variations. The value of integral control in achieving robust asymptotic regulation has recently been exploited for nonlinear uncertain systems—see, e.g., [1]–[4], [9], and [10], and the references therein. In [1] and [2], Freeman and Kokotovic propose a backstepping scheme for robust integral control of a class of nonlinear systems with unknown nonlinearities. Global set-point regulators with a disturbance rejection property are constructed at the price of assuming full-state information and the relative degree being equal to the system order. Both assumptions in [1], [2] are relaxed by Khalil [10] by means of his "high-gain observers" techniques complemented by the idea of saturating the controller outside a compact set of interest. Naturally, as a consequence of the "worst-case" design, the results in [10] are of regional and semiglobal types.
The purpose of this note is to propose global regulation results for a class of nonlinear systems with disturbances combining those in [1], [2], [10], i.e., we do consider unmeasured zero-dynamics and uncertain nonlinearities. Both partial-state and output feedback control cases will be investigated. The obtained results extend our previous results
Manuscript received November 7, 2000; revised March 22, 2001. Recommended by Associate Editor Z. Lin. This work was supported in part by the National Science Foundation under Grants INT-9987317 and ECS-0093176.
Z.-P. Jiang is with the Department of Electrical and Computer Engineering,Polytechnic University, Brooklyn, NY 11201 USA (e-mail: [email protected]).
I. Mareels is with the Department of Electrical and Electronic Engineering, Melbourne University, Parkville 3052 Victoria, Australia (e-mail: [email protected]).
Publisher Item Identifier S 0018-9286(01)07685-1.