IEEE TRANSACTIONS ON AUTOMATIC CONTROL 1

Iterative learning extremum seeking tracking via Lie bracket approximation

Zhixing Cao, Hans-Bernd Dürr, Christian Ebenbauer, Frank Allgöwer, Furong Gao

Abstract—In this paper, we develop an extremum seeking control method integrated with iterative learning control to track a time-varying optimizer within finite time. The behavior of the extremum seeking system is analyzed via an approximating system – the modified Lie bracket system. The modified Lie bracket system is essentially an online integral-type iterative learning control law. The paper contributes to two fields, namely, iterative learning control and extremum seeking. First, an online integral-type iterative learning control law with a forgetting factor is proposed. Its convergence is analyzed via a k-dependent (iteration-dependent) contraction mapping in a Banach space equipped with the λ-norm. Second, the iterative learning extremum seeking system can be regarded as an iterative learning control law subject to a "control input disturbance". The tracking error of its modified Lie bracket system is shown to be uniformly bounded over iterations when a sufficiently large dither frequency is selected. Furthermore, it is shown that the tracking error finally converges to a set, a λ-norm ball, whose center coincides with the limit solution of the corresponding "disturbance-free" system (the iterative learning control law) and whose radius can be controlled by the frequency.

Index Terms—Contraction mapping, extremum seeking (ES), iterative learning control (ILC), Lie bracket, λ-norm

I. INTRODUCTION

CONSIDERABLE research efforts have been devoted to extremum seeking control over the last several decades.

The mechanism of extremum seeking control is to optimize a certain system performance measure (cost function) by adaptively adjusting the system parameters merely according to output measurements of the plant. Since little knowledge about the plant dynamics is required, extremum seeking control has attracted attention from various engineering domains, e.g., bioreactors, combustion, and compressors [1]–[3]. In most of the classic extremum seeking literature, the basic assumption is that the static input-output mapping is time invariant, see, e.g., [2], [4], [5] and the references therein. However, time-varying input-output mappings are common in practice. For example, injection molding engineers are quite interested in finding an optimal ram velocity profile that minimizes the unevenness of the polymer melt front velocity (PMFV) in the mold cavity so as to yield a smooth surface on the final product. The map from ram velocity to PMFV is typically time varying, due to the complex geometry of the cavity [6].

Zhixing Cao and Furong Gao are with the Department of Chemical and Biomolecular Engineering, Hong Kong University of Science and Technology, Hong Kong SAR. Hans-Bernd Dürr, Christian Ebenbauer and Frank Allgöwer are with the Institute for Systems Theory and Automatic Control, University of Stuttgart, 70569 Stuttgart, Germany. E-mail: {zcaoab, kefgao}@ust.hk, {hans-bernd.duerr, ce, frank.allgower}@ist.uni-stuttgart.de.

In this paper, we propose a novel extremum seeking scheme based on iterative learning control to find the optimizer (optimal trajectory) of a time-varying mapping. Specifically, the contributions of the paper are threefold and summarized as follows.

First, a novel extremum seeking approach is developed to solve the optimizer tracking problem by introducing an analog memory into the extremum seeking loop, which differs from the existing methods [1], [7]–[11]. Two categories of related approaches concerning time-varying mappings have been reported in the literature. Wang and Krstić introduced a detector to minimize the amplitude of a stable limit cycle by tuning a controller parameter toward a constant optimizer [7]. Guay and his colleagues employed system flatness to parameterize all the variables by sine and cosine series; extremum seeking was used to steer the coefficients of the series to the optimizer [8]. Haring and his coworkers developed a mean-over-perturbation-period filter to produce a gradient estimate for the extremum seeking loop [9]. The underlying assumption of all these methods is that the corresponding cost functions, although time varying, admit a constant optimizer, which is different from our setting in this paper. As for the second category, Krstić introduced a compensator for time-varying mappings structured as Wiener-Hammerstein models. This method requires knowledge of the two time-varying blocks, which may restrict its applicability [1], [10]. An adaptive delay-based estimator was introduced by Sahneh and his coworkers to feed gradient estimates to the extremum seeking loop; the extremum seeking loop is a cascade feedback in nature [11]. Basically, this type of method employs a fast decay rate to suppress the tracking error. However, in some circumstances, these methods may not yield satisfying transient tracking performance. Fortunately, many time-varying systems exhibit repetitive behaviors, such as semiconductor manufacturing, pharmaceutical production, and the injection molding mentioned above [12]. Exploiting such repetitiveness offers the potential to circumvent the aforementioned drawbacks.
Actually, the works in [8], [9] have already utilized this feature, but only to a limited extent, since they merely used it to reduce the cost function to a starting-time-independent function. Iterative learning control, first proposed by Arimoto et al. [13], excels at exploiting repetitiveness to improve tracking performance from iteration to iteration [12], [14]–[16].

Second, the proposed approach is a new ILC scheme. The fundamental assumption adopted in most of the ILC literature is that the tracking trajectory is already available as a priori knowledge [17], which renders the controller knowing the

arXiv:1509.01912v1 [math.OC] 7 Sep 2015


direction to steer. Within this paper, we do not make such an assumption; only the distance to the tracking trajectory, i.e., the absolute value of the tracking error, is known. Intuitively, almost all ILC approaches will fail in this situation, since negative feedback along the iteration axis cannot be ensured. It is therefore natural to combine extremum seeking with ILC and let them collaborate with each other: ILC provides past learning experience to extremum seeking to improve the transient tracking performance, while the "direction" information needed by ILC is supplied by extremum seeking.

Third, we propose a new online infinite-dimensional integral-type (I-type) ILC law. The dynamical behavior of the ILC system is studied via a k-dependent contraction mapping, and the convergence of the ILC system is discussed. Related results are [18]–[22]. The work [18] studied proportional-type (P-type) online ILC, but only derived an index bound on the ultimate tracking performance. Wang considered the sampling effect and input saturation in offline P-type ILC and implemented it experimentally [19]. In [20], Saab investigated offline P-type and D-type (derivative-type) ILC in the stochastic scenario, where a dynamic learning gain was adopted. The forgetting factor selection for a general offline ILC algorithm was discussed in [21]. Ouyang and his colleagues developed an online PD-type ILC for a class of input-affine nonlinear systems, and also presented an ultimate bound on the tracking performance [22]. To the authors' knowledge, few papers contribute to online I-type ILC. Furthermore, we present more than an ultimate bound on the tracking performance; the limit solution and its uniqueness are studied as well. More interestingly, the nature of the proposed extremum seeking system is revealed when its behavior is analyzed via an approximating system – the modified Lie bracket system: it is an online integral-type ILC with "control input disturbance". Based on the same reasoning, we extend the results on ILC to analyze the iterative learning extremum seeking system. We show that the system is uniformly bounded in terms of k and converges to a set as k goes to infinity. This particular set is a λ-norm ball, whose center is the limit solution of the associated ILC system and whose radius can be controlled by the frequency of the sinusoid signal.

The rest of the paper is structured as follows: Section II gives the technical preliminaries about the λ-norm and the Lie bracket system; Section III formulates the problem; Section IV presents the main results; Section V provides an illustrative example for the theory; a conclusion is drawn and an outlook is given in Section VI.

Notations: N++ and N0 denote the sets of positive integers excluding and including zero, respectively. Q++ stands for the positive rational numbers. Mn denotes the set of all matrices of dimension n × n. Cn with n ∈ N0 stands for the set of n-times continuously differentiable functions and C∞ for the set of smooth functions. The gradient of a continuous function

f ∈ C1 : Rn → R is ∇xf(x) ≜ [∂f(x)/∂x1, . . . , ∂f(x)/∂xn]T. Two vector fields f, g : Rn × R → Rn are twice continuously differentiable; their Lie bracket is defined as [f, g](x, t) ≜ (∂g(x, t)/∂x) f(x, t) − (∂f(x, t)/∂x) g(x, t). For a point x ∈ X and a set S ⊂ X, x ∉ S, the distance from x to S is defined as dist(x, S) = min_{s∈S} ‖x − s‖. For two compact sets X, Y with X ⊂ Y and ∂Y the boundary of Y, the distance is defined as dist(X, Y) = min_{x∈X, y∈∂Y} ‖x − y‖. int X denotes the interior of the set X.

II. PRELIMINARIES

A. λ-norm

The λ-norm, introduced by Arimoto et al. in 1984 [13], is a topological measure widely used to analyze the convergence of ILC laws [23]. The formal definition of the λ-norm is as follows.

Definition 2.1: [23] The λ-norm of a function f : [0, L] → Rn is

‖f(•)‖λ = max_{t∈[0,L]} e^{−λt} ‖f(t)‖∞

where ‖f(t)‖∞ = max_{1≤i≤n} |fi(t)|.
From the definition, it is easy to see that

‖f(•)‖λ ≤ ‖f(•)‖C ≤ e^{λL} ‖f(•)‖λ

for positive λ, where ‖f(•)‖C = max_{t∈[0,L]} ‖f(t)‖∞. This shows that the λ-norm is equivalent to the C-norm, so convergence with respect to the λ-norm remains valid with respect to the C-norm. Its advantage is that a sequence converging non-monotonically in the C-norm can converge monotonically in the λ-norm for a properly chosen λ.
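As a quick numerical illustration of Definition 2.1 and the equivalence inequality above, the following sketch evaluates the λ-norm of a test function on a uniform grid; the grid resolution, the test function, and the constants L and λ are illustrative choices, not taken from the paper.

```python
import numpy as np

def lam_norm(f_vals, t, lam):
    """lambda-norm: max over t of exp(-lam*t) * ||f(t)||_inf."""
    return np.max(np.exp(-lam * t) * np.max(np.abs(f_vals), axis=1))

def c_norm(f_vals):
    """C-norm: max over t of ||f(t)||_inf."""
    return np.max(np.abs(f_vals))

L, lam = 1.0, 3.0
t = np.linspace(0.0, L, 1001)
f = np.column_stack([np.sin(5 * t), t ** 2])  # arbitrary R^2-valued test function

nl, nc = lam_norm(f, t, lam), c_norm(f)
# norm-equivalence inequality: ||f||_lam <= ||f||_C <= e^{lam*L} ||f||_lam
assert nl <= nc <= np.exp(lam * L) * nl
```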

B. Lie bracket system

In the classic extremum seeking literature, for example [4], the behavior of the original extremum seeking system is analyzed by averaging. However, within this paper, an emerging analysis tool based on the Lie bracket approximation is used to study the extremum seeking system. For an input-affine extremum seeking system

ẋ = b1(x)√ω u1(ωt) + b2(x)√ω u2(ωt)

with ω ∈ (0,∞), its Lie bracket system is

ż = (1/2)[b1, b2](z)

For instance, in a traditional ES system with b1(x) = 1, b2(x) = −αf(x), α > 0, and f ∈ C2 : Rn → R admitting a local minimum, the Lie bracket system is ż = −α∇f(z)/2, which clearly minimizes the cost function. One advantage of the Lie bracket approximation is that the approximating system and the original system share the same rate of convergence, i.e., exponential stability of the Lie bracket system implies practical exponential stability of the original one. For details, please refer to [24].
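The approximation above can be sketched numerically in the scalar case b1(x) = 1, b2(x) = −αf(x) with f(x) = (x − 1)², whose Lie bracket system is ż = −(z − 1); all parameter values (α, ω, step size, horizon) below are illustrative choices.

```python
import numpy as np

# forward-Euler simulation of the ES system and of its Lie bracket system
alpha, omega, dt, T = 1.0, 500.0, 1e-4, 5.0
n = int(T / dt)

x = z = 0.0  # common initial condition
for i in range(n):
    t = i * dt
    f = (x - 1.0) ** 2
    # original ES system: xdot = -alpha*f(x)*sqrt(w)*sin(wt) + sqrt(w)*cos(wt)
    x += dt * (-alpha * f * np.sqrt(omega) * np.sin(omega * t)
               + np.sqrt(omega) * np.cos(omega * t))
    # Lie bracket system: zdot = -alpha * grad f(z) / 2 = -(z - 1)
    z += dt * (-alpha * 2.0 * (z - 1.0) / 2.0)

# both trajectories end near the minimizer x* = 1, and close to each other
assert abs(z - 1.0) < 1e-2
assert abs(x - z) < 0.3
```

For larger ω the residual oscillation of x around z shrinks (the dither ripple is of order 1/√ω), which is the "practical" part of practical exponential stability.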

III. PROBLEM STATEMENT

In this paper, we study the following optimization problem: at each time t ∈ [0, L], we solve

min_{x(t)} F(x(t), t)   (1)


where x(t) ∈ Rn and F : Rn × [0, L] → R,

F(x(t), t) = f*(t) + [x(t) − x*(t)]T Q [x(t) − x*(t)]   (2)

Q ∈ Mn is a positive definite matrix, and L > 0 is the finite time duration or period. For each time t, x(t) = x*(t) is the solution to the optimization problem (1) with the corresponding optimal value f*(t). Over the whole interval [0, L], the values x*(t) form a time-varying optimizer/minimizer (also called the optimizer/minimizer trajectory) x* : [0, L] → Rn; if x*(t) ≡ c for all t, with c a constant vector, then x* is called a constant optimizer. Similarly, the optimal trajectory f* : [0, L] → R consists of all the optimal values f*(t) over the interval [0, L]. The variable x(t) in (1), or its ensemble x, is called the optimization variable or simply the input. Obviously, the study of the minimization problem does not lose any generality; it can easily be extended to the maximization problem by merely altering the sign of the extremum seeking gain. According to Ariyur and Krstić (pp. 21) [1], the above-mentioned parameterization is able to approximate any function F(x(t), t) with a quadratic minimum at x*(t). The objective of extremum seeking is to steer x to the minimizer trajectory x* over [0, L].

Fig. 1: Block diagram of iterative learning extremum seeking

A. Classical approach

In the conventional ES literature, the input x is driven to a neighborhood of the minimizer trajectory x* by solving the following ordinary differential equation (ODE):

ẋ(t) = −αF(x(t), t)√ω sin(ωt) + √ω cos(ωt)   (3)

Here α ∈ (0,∞) is a constant gain. The black part of Fig. 1 shows the closed loop of this classic approach. For a fast changing minimizer trajectory, this approach may fail to achieve satisfactory tracking within the finite duration.

B. Main idea

Owing to the repeated operation mode of repetitive processes [12], there is no need to solve the tracking problem within a single iteration; instead, it can be solved over many, even infinitely many, iterations. Since the previous input is a good approximation of the optimizer trajectory, it is quite handy and natural to modify the previous input to generate a new control input. Borrowing ideas from ILC, we propose to solve the ES tracking problem via the following ODE:

ẋk(t) = (1 − β)ẋk−1(t) − αF(xk(t), t)√ω sin(ωt) + √ω cos(ωt)   (4)

The subscript k ∈ N++ indicates the iteration index; β ∈ (0, 1] is the forgetting factor, which has been adopted in many ILC works, e.g., [18]. To keep the notation simple, xk will be used in the rest of the paper instead of xk(t) without any ambiguity, and the same for other similar variables, unless stated otherwise.

Physically, (4) introduces a memory that stores the input (xk−1) of the previous batch into the extremum seeking loop, as shown by the red part of Fig. 1. The term (1 − β)ẋk−1 is the feed-forward component, while the second and third terms on the right-hand side of (4) form the feedback component. It is intuitively understandable that the proposed method can yield better performance than the conventional one (3), since the feed-forward term facilitates the tracking. Meanwhile, the mechanism keeps feeding a new input into the system by persistently using feedback information to "polish" xk−1, a rough "guess" of the minimizer. Thus, the tracking performance can be expected to improve gradually as the rough "guess" becomes finer.
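The intended effect of (4) can be sketched numerically on a scalar quadratic cost with a time-varying minimizer x*(t) = sin(2πt); the memory starts empty, so the first iteration is just the classic scheme (3). All parameter values (β, α, ω, step size, iteration count) are illustrative choices.

```python
import numpy as np

# forward-Euler iteration of (4) on F(x,t) = (x - x*(t))^2, i.e. Q = 1
beta, alpha, omega = 0.2, 10.0, 2000.0
dt = 5e-5
t = np.arange(0.0, 1.0, dt)
xstar = np.sin(2 * np.pi * t)            # time-varying minimizer, x*(0) = 0
sw = np.sqrt(omega)

xdot_prev = np.zeros_like(t)             # stored derivative of previous input
errors = []
for k in range(12):
    x = 0.0
    traj = np.zeros_like(t)
    xdot = np.zeros_like(t)
    for i in range(t.size):
        traj[i] = x
        F = (x - xstar[i]) ** 2
        d = ((1 - beta) * xdot_prev[i]
             - alpha * F * sw * np.sin(omega * t[i])
             + sw * np.cos(omega * t[i]))
        xdot[i] = d
        x += dt * d                      # Euler step of (4)
    xdot_prev = xdot                     # the "analog memory"
    errors.append(np.max(np.abs(traj - xstar)))

# learning across iterations should shrink the worst-case tracking error
assert errors[-1] < 0.7 * errors[0]
```

The first-iteration error is that of standard ES chasing a fast minimizer; subsequent iterations reuse the learned input derivative and the error settles into a neighborhood whose size is governed by β and ω.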

Remark 3.1: It is noted that when k = 1, (4) becomes ẋ1 = (1 − β)ẋ0 − αF(x1, t)√ω sin(ωt) + √ω cos(ωt), which is exactly standard extremum seeking if x0(t) and ẋ0(t) are assumed to be zero, or equivalently if the memory is reset to zero.

Since we are only interested in the system over the finite interval, there is no need to discuss its asymptotic stability along the time direction. The questions we are more interested in are under what conditions xk approaches a small neighborhood of x* as k tends to infinity, i.e., ‖xk − x*‖C < D as k → ∞, and what determines D.

IV. MAIN RESULTS

Note that when (4) is iterated to different k, the corresponding coefficients of cos(ωt) differ. On the other hand, the Lie bracket approximation is only valid for systems with respect to t, not with respect to both k and t. Thus, we cannot employ it directly but have to tailor it accordingly. We propose to use the following modified Lie bracket (MLB) system to approximate the original one.

żk = (1 − β)ẋk−1 + (γk/2)[1, −αF](zk, t)
   = (1 − β)ẋk−1 − (αγk/2)∇zF(zk, t)   (5)

Here the only modification is the introduction of γk, a compensating parameter related only to the forgetting factor β, as defined below:

γk = [1 − (1 − β)^k]/β   (6)

It is evident that {γk} is a monotonically increasing sequence and 1 ≤ γk ≤ 1/β. The first equality holds if and only if k = 1, which implies that the MLB system (5) reduces to the traditional Lie bracket system when k = 1. The term ẋk−1 is a


feed-forward term; with respect to the current iteration k it is only a function of time and does not contain any sinusoidal term. Thus, it should not be included in the Lie bracket according to the theory in [24].
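The stated properties of the compensating sequence {γk} in (6) are easy to verify numerically; β = 0.25 is an arbitrary test value.

```python
# quick check of (6): gamma_1 = 1, monotone increase, and 1 <= gamma_k <= 1/beta
beta = 0.25
gammas = [(1 - (1 - beta) ** k) / beta for k in range(1, 60)]

assert abs(gammas[0] - 1.0) < 1e-12                        # gamma_1 = 1
assert all(b > a for a, b in zip(gammas, gammas[1:]))      # monotone increase
assert all(1.0 <= g <= 1.0 / beta for g in gammas)         # 1 <= gamma_k <= 1/beta
```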

Remark 4.1: Observing (5), the MLB system is unlike the conventional Lie bracket system because of the presence of the derivative ẋk−1; it is not an independent system of itself. Inserting (2) into (5) and denoting

Γk = αγkQ/2 = [1 − (1 − β)^k]αQ/(2β)

we can rewrite (5) as

żk = (1 − β)żk−1 − Γk(zk − x*) + (1 − β)(ẋk−1 − żk−1)   (7)

Before presenting our main results, we impose the following assumptions.
• A1) The time-varying optimizer x* ∈ C1 : [0, L] → Rn and the optimal trajectory f* ∈ C1 : [0, L] → R;
• A2) The initial condition of each iteration of (4) is identical and equal to zero, i.e., xk(0) = x(0) = 0 for all k; so is that of the MLB system (5), i.e., zk(0) = x(0) = 0 for all k;
• A3) Q is bounded and Q ≥ δI, where δ is a known positive real number.

Remark 4.2: A1 is assumed to ensure the existence of the first-order partial derivative ∂F(x, t)/∂t, which is required by integration by parts. A2 is a common assumption in the majority of the ILC literature to simplify derivations [14], called the identical initial condition (i.i.c.). The second part of A2 is assumed to keep the MLB system in accordance with the original system. A3 means that we do not require exact knowledge of Q, the Hessian matrix, but only a lower bound on Q for the minimization case, since a known Q would mean precise knowledge about the plant, which is generally unavailable in practice. δ is only required by the following conceptual analysis and does not restrict the method's applicability.

Integrating both sides of (7) yields

zk = (1 − β)zk−1 − Γk ∫₀ᵗ (zk − x*) ds + (1 − β)(xk−1 − zk−1)   (8)

Here zk can be interpreted as the control input with zk − x* as the tracking error. The rewriting above then clearly shows that (8) is in the form of an online (feedback) integral-type ILC law [14] with a "control input disturbance", i.e., (1 − β)(xk−1 − zk−1).

Note that (5) is a linear ordinary differential equation, and we can write down its explicit solution:

zk = e^{−Γk t} z(0) + ∫₀ᵗ e^{−Γk(t−s)} [Γk x* + (1 − β)ẋk−1] ds

It follows from z(0) = 0 and integration by parts that

zk = (1 − β)e^{−Γk t} [e^{Γk s} xk−1 |₀ᵗ − ∫₀ᵗ e^{Γk s} Γk xk−1 ds] + e^{−Γk t} ∫₀ᵗ e^{Γk s} Γk x* ds

Because xk−1(0) = 0, we have

zk = (1 − β)xk−1 + e^{−Γk t} ∫₀ᵗ e^{Γk s} Γk [x* − (1 − β)xk−1] ds   (9)

Now we define the tracking error of the MLB system (5) as yk ≜ zk − x*; the error system is

yk = Tk(yk−1 + xk−1 − zk−1 − (β/(1 − β)) x*)   (10)

where Tk is the following mapping:

Tk(x)(t) = (1 − β)x(t) − (1 − β)e^{−Γk t} ∫₀ᵗ e^{Γk s} Γk x(s) ds   (11)

For the sake of simple notation, and slightly abusing Tk, we will write (11) for short as Tk(x) = (1 − β)x − (1 − β)e^{−Γk t} ∫₀ᵗ e^{Γk s} Γk x ds. Note that the mapping above is k-dependent because Γk is k-dependent. We now give a contraction mapping result for the k-dependent case, which differs from pp. 655 of [25], where only k-invariant mappings are studied.

An operator T between two real linear spaces X and Y is called a linear mapping or linear operator if T(λx + µy) = λT(x) + µT(y) for all λ, µ ∈ R and x, y ∈ X. Let a norm ‖•‖ be defined on X and Y. A linear mapping is then bounded if there exists a constant M ≥ 0 such that ‖T(x)‖ ≤ M‖x‖ for all x ∈ X. The set consisting of all such bounded linear mappings T is denoted by B(X, Y); we write B(X) for short if the domain and range spaces are the same. Moreover, the mapping norm is defined as

‖T‖ = sup_{x≠0} ‖T(x)‖/‖x‖.

Definition 4.1 (Uniform convergence (pp. 109, [26])): If {Tk} is a sequence of mappings in B(X, Y) and

lim_{k→∞} ‖Tk − T‖ = 0

for some T ∈ B(X, Y), then we say that Tk converges uniformly to T.
Note that ‖Tk(x) − T(x)‖ ≤ ‖Tk − T‖ ‖x‖. Given that ‖x‖ is bounded, lim_{k→∞} ‖Tk − T‖ = 0 implies lim_{k→∞} Tk(x) = T(x). In other words, uniform convergence implies strong convergence.

Theorem 4.1 (k-dependent contraction mapping): Let S be a compact subset of a Banach space X. If a mapping sequence {Tk} satisfies
• C1) for every k, Tk ∈ B(S);
• C2) ‖Tk(x) − Tk(y)‖ ≤ ρ‖x − y‖ for all x, y ∈ S and a universal contraction ratio ρ ∈ [0, 1), denoting by T the set consisting of all such Tk;
• C3) Tk converges uniformly to T∞ as k → ∞, for some T∞ ∈ T,
then Tk is called a k-dependent contraction mapping on S. Furthermore,
• there exists a unique solution x⋆ ∈ S satisfying x⋆ = T∞(x⋆);
• x⋆ is independent of the initial value x1 ∈ S.
Proof: The proof is similar to that of Theorem B.1 in [25]. Arbitrarily select x1 ∈ S and generate a sequence {xk} according to xk+1 = Tk(xk). Every xk ∈ S, since Tk ∈ B(S). First, we show that {xk} is a Cauchy sequence.


Since S is compact, there is a constant D > 0 such that ‖x‖ ≤ D for all x ∈ S. Additionally, {Tk} is a Cauchy sequence, since Tk converges uniformly. Then, for an arbitrary ε > 0, there exists a kε such that ‖Tm − Tn‖ ≤ ε/D for any m, n ≥ kε. For k − 1 > kε, we have

‖xk+1 − xk‖ = ‖Tk(xk) − Tk−1(xk−1)‖
  ≤ ‖Tk(xk) − Tk(xk−1)‖ + ‖Tk(xk−1) − Tk−1(xk−1)‖
  ≤ ρ‖xk − xk−1‖ + ‖Tk − Tk−1‖ ‖xk−1‖
  ≤ ρ‖xk − xk−1‖ + ε

It follows that

‖xk+r − xk‖ ≤ Σ_{i=0}^{r−1} ‖xk+i+1 − xk+i‖
  ≤ Σ_{i=0}^{r−1} [ρ^{i+1}‖xk − xk−1‖ + ρ^i ε]
  ≤ ρ/(1 − ρ) ‖xk − xk−1‖ + ε/(1 − ρ)
  ≤ ρ/(1 − ρ) (ρ‖xk−1 − xk−2‖ + ε) + ε/(1 − ρ)
  ≤ ρ/(1 − ρ) (ρ^{k−kε−1}‖xkε+1 − xkε‖ + Σ_{i=0}^{k−kε−2} ρ^i ε) + ε/(1 − ρ)
  ≤ 2ρ^{k−kε}D/(1 − ρ) + ρε/(1 − ρ)² + ε/(1 − ρ)
  ≤ 2ρ^{k−kε}D/(1 − ρ) + ε/(1 − ρ)²

Since ε is chosen arbitrarily, the right-hand side goes to 0 as k → ∞. Hence, {xk} is a Cauchy sequence. Since X is complete, xk → x⋆ ∈ X as k → ∞. Furthermore, S is closed; it follows that x⋆ ∈ S.

The second step is to prove x⋆ = T∞(x⋆). For any xk+1 = Tk(xk), it can be obtained that

‖x⋆ − T∞(x⋆)‖ ≤ ‖x⋆ − xk+1‖ + ‖xk+1 − T∞(x⋆)‖
  ≤ ‖x⋆ − xk+1‖ + ‖xk+1 − T∞(xk)‖ + ‖T∞(xk) − T∞(x⋆)‖
  ≤ ‖x⋆ − xk+1‖ + ‖Tk − T∞‖ ‖xk‖ + ρ‖xk − x⋆‖

The right-hand side can be made arbitrarily small by selecting a sufficiently large k. Therefore, x⋆ = T∞(x⋆).

Finally, we show uniqueness. Suppose there is another fixed point y⋆ satisfying y⋆ = T∞(y⋆). Then we have

‖x⋆ − y⋆‖ = ‖T∞(x⋆) − T∞(y⋆)‖ ≤ ρ‖x⋆ − y⋆‖

Since ρ is strictly less than 1, this is a contradiction; hence x⋆ = y⋆. This completes the proof.
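A minimal scalar illustration of Theorem 4.1: take Tk(x) = ρx + ck with ck → c∞, so that Tk converges uniformly to T∞(x) = ρx + c∞; iterating xk+1 = Tk(xk) from different initial values reaches the unique fixed point x⋆ = c∞/(1 − ρ). The specific ρ and ck below are arbitrary choices.

```python
# scalar k-dependent contraction demo: T_k(x) = rho*x + c_k, c_k -> c_inf = 1
rho = 0.6

def c(k):
    return 1.0 + 2.0 ** (-k)       # c_k -> 1, giving uniform convergence of T_k

def iterate(x, n=80):
    for k in range(1, n + 1):
        x = rho * x + c(k)         # x_{k+1} = T_k(x_k)
    return x

x_star = 1.0 / (1 - rho)           # unique fixed point of T_inf
xa, xb = iterate(-5.0), iterate(7.0)

# the limit is independent of the initial value, as Theorem 4.1 claims
assert abs(xa - x_star) < 1e-9 and abs(xb - x_star) < 1e-9
```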

Define the Banach space X = C[0, L] and the compact set

S = {y ∈ X | ‖y‖λ ≤ D2}   (12)

This implies that zk is contained in Sz = {z ∈ X | ‖z − x*‖λ ≤ D2}. The lemma below gives conditions under which Tk fulfills C2.

Lemma 4.1: Let A1-A3 hold. For arbitrary β ∈ (0, 1), there exist λ0 and ρ ∈ (0, 1), both independent of k, such that ‖Tk(x) − Tk(y)‖λ ≤ ρ‖x − y‖λ for any x, y ∈ X and every λ ∈ (λ0, ∞).
Proof: For details of the proof, please refer to Appendix A.

Remark 4.3: Many ILC works do not use such a forgetting factor, since its presence compromises zero tracking error (perfect tracking) along the iteration axis [18]. However, the above lemma suggests otherwise in our case: it is necessary to introduce the forgetting factor β to ensure the existence of the k-dependent contraction mapping, since perfect tracking cannot be achieved anyway due to the presence of the dither signal. Without β, this undesired effect would keep accumulating and the overall performance would deteriorate rapidly.

Remark 4.4: If the C-norm is used instead of the λ-norm, or equivalently λ = 0, then according to the proof of Lemma 4.1, β has to be greater than 2 − √2. This suggests that the λ-norm enlarges the feasible set of β.
Remark 4.5: From (11), it is evident that Tk is a linear mapping. Thus, we can rewrite (10) as

yk = Tk(yk−1) + Tk(xk−1 − zk−1) + Tk(−(β/(1 − β)) x*)   (13)

Equation (13) shows that the second and third terms on the right-hand side play the role of "disturbances"; the second is caused by the approximation, while the third is caused by the forgetting factor. The two "disturbances" are nevertheless different: Tk(xk−1 − zk−1) is a persistently acting noise, while Tk[−βx*/(1 − β)] is just a constant offset.

A. I-type ILC

We are now ready to present our contribution to ILC, which studies the behavior of the disturbance-free MLB system (8), i.e.,

zk = (1 − β)zk−1 − Γk ∫₀ᵗ (zk − x*) ds   (14)

Or equivalently, its corresponding error system

yk = Tk(yk−1) + Tk(−(β/(1 − β)) x*)   (15)

will be studied. It should be mentioned that an underlying assumption is imposed in the discussion of the I-type ILC, namely, that x* is known. Now, we define another mapping:

Gk(x) ≜ Tk(x) + Tk(−(β/(1−β))x∗)

The objective is to show that Gk is a k-dependent contraction mapping. It is easy to verify that Gk satisfies C2, since Gk(x) − Gk(y) = Tk(x) − Tk(y) and Tk(x) − Tk(y) fulfills C2. The following lemma gives a sufficient condition for Gk to fulfill C1.


Lemma 4.2: Let A1–A3 be satisfied. Given λ > 0, β ∈ (0, 1), ρ ∈ (0, 1), if D2 in (12) satisfies

D2 ≥ max{D0, D∗},

then Gk maps S into S. D∗ is defined as

D∗ = (βρ/((1 − β)(1 − ρ)))‖x∗‖λ

and D0 = ‖y1‖λ, with y1 yielded by

y1 = −x∗ − Γ1 ∫₀ᵗ y1 ds

This is simply (15) executed at k = 1.

Proof: For details of the proof, please refer to Appendix B.

Lemma 4.2 not only presents a sufficient condition for Gk to satisfy C1, but also states that ‖yk‖λ is uniformly bounded. We can in fact conclude more, i.e., convergence and uniqueness of the limit solution, by invoking the k-dependent contraction mapping theorem.

Theorem 4.2: Let A1–A3 be satisfied. If D2 ≥ max{D0, D∗} as in Lemma 4.2, there exists a unique limit solution for yk in (15) as k tends to infinity. Moreover, limk→∞ yk = y∞, where y∞ is defined as the solution to the following equation:

y∞ = T∞(y∞ − (β/(1−β))x∗)   (16)

T∞ is the limit of Tk as k tends to infinity:

T∞(x) = (1 − β)x − (1 − β)e^(−Γ∞t) ∫₀ᵗ e^(Γ∞s) Γ∞ x ds   (17)

with

Γ∞ = αQ/(2β)   (18)

Proof: It is noted that (16) is equivalent to

y∞ = G∞(y∞)

Hence, we only need to check whether Gk satisfies C1–C3 or not. Since X is equipped with the λ-norm, we can define the mapping norm by the λ-norm as

‖Gk‖ = sup over ‖x‖λ = 1 of ‖Gk(x)‖λ

Expanding Gk(x), we have

‖Gk‖ ≤ sup‖x‖λ=1

‖Tk(x)‖λ +

∥∥∥∥Tk (− β

1− β x∗)∥∥∥∥

λ

On the other hand, from Lemma 4.1, it is known that‖Tk(x)‖λ = ‖Tk(x) − Tk(0)‖λ ≤ ρ‖x‖λ. It immediatelyfollows that

‖Gk‖ ≤ ρ+

∥∥∥∥Tk (− β

1− β x∗)∥∥∥∥

λ

< +∞

Combining with Lemma 4.2, Gk ∈ B(S) for arbitrary k. Also noted from Lemma 4.1, the mapping sequence {Gk} satisfies C1 and C2; it remains to show that Gk converges to G∞ uniformly. It suffices to show that Tk converges to T∞ uniformly. By definition, we have

‖Tk − T∞‖ = sup over ‖x‖λ = 1 of ‖Tk(x) − T∞(x)‖λ

From the definition of the λ-norm and (10), we further get that

‖Tk − T∞‖ ≤ (1 − β) sup over t ∈ [0, L], ‖x‖λ = 1 of e^(−λt) ‖∫₀ᵗ (e^(−Γk(t−s))Γk − e^(−Γ∞(t−s))Γ∞) x ds‖∞

By the mean value theorem, it is easy to derive that

‖Tk − T∞‖ ≤ (1 − β) sup over t ∈ [0, L], ‖x‖λ = 1 of e^(−λt) ‖∫₀ᵗ (e^(−Γk(t−s))Γk − e^(−Γ∞(t−s))Γ∞) ds‖∞ · ‖x(ξ)‖∞, (ξ ∈ [0, L])

≤ (1 − β) sup over t ∈ [0, L] of ‖∫₀ᵗ (e^(−Γk(t−s))Γk − e^(−Γ∞(t−s))Γ∞) ds‖∞

≤ (1 − β) sup over t ∈ [0, L] of ‖e^(−Γkt) − e^(−Γ∞t)‖∞

The term exp(−Γkt) can be made arbitrarily close to exp(−Γ∞t) as k → ∞. Therefore, it can be concluded that

‖Tk − T∞‖ → 0 as k → ∞,

which is the uniform convergence. Then, applying the k-dependent contraction mapping theorem, we can conclude the result.

Remark 4.6: Theorem 4.2 shows that the trajectory of the disturbance-free MLB system (14) will ultimately converge to a fixed trajectory, which is parameterized by β and x∗. From (17), it is clear that 0 is a solution to the equation x = T∞(x). From (16), one can see that y∞ will approach 0 as β → 0. But will y∞ be 0 if β = 0? From Lemma 4.1, it is known that the contraction mapping method fails when β = 0. Thus, we use another way to prove that conjecture in the following theorem.
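To make the k-dependent contraction mapping argument concrete, consider a scalar toy iteration (the maps Gk(x) = ρk·x + bk and all numbers below are illustrative stand-ins, not the paper's operators): the contraction factors and offsets converge, and the iterates converge to the fixed point of the limit map G∞, as in Theorem 4.2.

```python
# Scalar stand-in for x_k = G_k(x_{k-1}) with k-dependent contractions:
# rho_k -> 0.5 (all |rho_k| < 1), b_k -> 1.0, so G_inf(x) = 0.5 * x + 1.
def rho_k(k):
    return 0.5 + 0.3 / k

def b_k(k):
    return 1.0 + 1.0 / k**2

x = 0.0
for k in range(1, 301):
    x = rho_k(k) * x + b_k(k)

x_inf = 1.0 / (1.0 - 0.5)  # unique fixed point of G_inf: x = 0.5 * x + 1
```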

Before presenting the theorem, we have to make a remark on the particular case β = 0. In the following theorem, we abandon Γk and use a constant gain Γ (a positive definite matrix, not necessarily restricted to αQ/2) instead. The reasons for doing so are twofold. First, (6) suggests that Γk would be ill-defined, since γk → ∞. Second, as Remark 4.3 states, the motivation for introducing β is to handle the “control input disturbance”, and the gain Γk becomes k-varying because of β; here we are dealing with the “disturbance”-free control system.

Theorem 4.3: Consider β = 0, that is, zk is generated by the following formula:

żk = żk−1 − Γ(zk − x∗).   (19)

If A1–A3 hold and x∗(0) = 0, then

zk → x∗ almost everywhere as k → ∞.

Proof: From (19), within this proof we are equivalently studying

ẏk = ẏk−1 − Γyk   (20)


Define an index as follows:

Jk = ∫₀ᴸ e^(−λt) ẏkᵀẏk dt   (21)

where λ > 0. Rewriting (20) as ẏk−1 = ẏk + Γyk, we insert it into (21) and compare the difference of (21) between k and k − 1.

Jk − Jk−1 = ∫₀ᴸ e^(−λt) [ẏkᵀẏk − (ẏk + Γyk)ᵀ(ẏk + Γyk)] dt

= −∫₀ᴸ e^(−λt) ykᵀΓᵀΓyk dt − 2∫₀ᴸ e^(−λt) ẏkᵀΓyk dt

As for the second term on the right-hand side, we integrate by parts and derive that

2∫₀ᴸ e^(−λt) ẏkᵀΓyk dt = e^(−λt) ykᵀΓyk |from t=0 to t=L + λ∫₀ᴸ e^(−λt) ykᵀΓyk dt

where the boundary term at t = 0 vanishes since yk(0) = zk(0) − x∗(0) = 0.

Thus, we have

Jk − Jk−1 = −∫₀ᴸ e^(−λt) ykᵀ(ΓᵀΓ + λΓ)yk dt − e^(−λL) ykᵀ(L)Γyk(L)

≤ −ρmin ∫₀ᴸ e^(−λt) ykᵀyk dt

Here ρmin is the smallest eigenvalue of the matrix ΓᵀΓ + λΓ; it is apparent that ρmin > 0 since Γ is positive definite and λ > 0. Thus, {Jk} is a non-increasing sequence of real numbers; since Jk is bounded below by 0, Jk converges, and hence Jk − Jk−1 → 0. It follows that

lim as k → ∞ of (∫₀ᴸ ykᵀyk dt)^(1/2) = 0

This is the L2-norm of yk. Thus, we can claim that yk converges to 0 almost everywhere.

Remark 4.7: Theorem 4.3 gives a weaker result than Theorem 4.2, since convergence to 0 holds only up to a set of measure 0 and uniqueness is lost. The result is quite understandable from the perspective of the Laplace transform. Taking the Laplace transform on both sides of (20) (in the scalar case), we have

Yk(s) = (s/(s + Γ)) Yk−1(s)

The modulus of the gain is less than 1, i.e., |s/(s + Γ)| < 1, but it tends to 1 as s → ∞. It suggests that this algorithm has a weaker decaying effect on high-frequency signals, which coincides with the result.
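Remark 4.7's frequency-domain reading can be checked directly in the scalar case (the gain value Γ = 1 is illustrative): the per-iteration error gain |s/(s + Γ)| is strictly below 1 yet approaches 1 at high frequency.

```python
gamma = 1.0  # illustrative scalar gain

def error_gain(w):
    """Modulus of the per-iteration error transfer s/(s + gamma) at s = j*w."""
    s = 1j * w
    return abs(s / (s + gamma))

g_low  = error_gain(0.1)    # low-frequency error: strongly attenuated per iteration
g_high = error_gain(100.0)  # high-frequency error: barely attenuated
```

Low-frequency error content is wiped out quickly across iterations, while high-frequency content decays slowly, which matches the merely almost-everywhere convergence.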

B. Iterative learning extremum seeking (ILES)

We follow a similar idea to study the dynamics of ILES. ILES is an ILC control policy with a “control input disturbance”. Due to the “disturbance”, a unique limit solution cannot be achieved. Instead, we will show that ILES converges to a set, provided the “disturbance” is bounded.

According to (13), we define a new operator Hk as

Hk(x) ≜ Tk(x) + Tk[−βx∗/(1 − β)] + Tk(xk−1 − zk−1)

Since Tk is a bounded operator, so is Hk, provided that xk−1 − zk−1 is bounded. It is obvious that Hk(x) − Hk(y) = Tk(x) − Tk(y); hence it is easy to verify that Hk satisfies C2. Following the procedure of Section IV-A, we show how to properly design D2 to ensure Hk maps S into S.

The following lemma lays the foundation of the inductive argument towards that conclusion. It basically means that for any k we can always ensure that xk is in an invariant set, given that its MLB system zk is in the interior of the invariant set and x1, x2, . . . , xk−1 are all in that set. This can be achieved by selecting a frequency ω larger than a threshold ω0, which is independent of k and t.

Lemma 4.3: Let A1–A3 be satisfied and suppose that x1, x2, . . . , xk0−1 are uniformly in t contained within a compact set S ⊂ Rn with 0 ∈ int S. If zk0 is contained in K ⊂ int S, 0 ∈ K, then there exists an ω0 ∈ (0,+∞) such that for every ω ∈ (ω0,∞), xk0(t) is uniformly in t contained within S as well, for any β ∈ (0, 1). Moreover, the distance between xk0 and zk0 can be made arbitrarily small by selecting a sufficiently large ω.

Proof: This proof uses similar arguments to Theorem 1 in [24], but tailors them to the iterative case. For details of the proof, please refer to Appendix C.

The lemma above also indicates that the “disturbance” ‖xk − zk‖λ is uniformly bounded; it can be made arbitrarily small by selecting a sufficiently large ω.

Theorem 4.4: Let A1–A3 hold. Given λ and ρ ∈ (0, 1), there exists an ω0 ∈ (0,+∞) such that for every ω ∈ (ω0,∞), Hk maps S into S, if D2 in (12) satisfies

D2 ≥ max{D0, D⋆}

where D0 is defined in Lemma 4.2 and

D⋆ = (ρ/(1 − ρ))(D1 + (β/(1 − β))‖x∗‖λ).

D1 is the uniform bound of ‖xk − zk‖λ for an arbitrary k.

Proof: We are going to show the theorem in an inductive way.

First, at k = 1, both the extremum seeking system (4) and the MLB system (5) are in the standard form. From (9), the explicit solution of the MLB system (5) is

z1 = ∫₀ᵗ e^(−Γ1(t−s)) Γ1 x∗ ds

Because all the eigenvalues of −Γ1 lie in the open left half complex plane and x∗ is contained in a compact set, z1 is well defined for the initial condition z1(0) = 0 and stays in a compact set over the time interval [0, L]. According to the definition of D2, it is known that ‖y1‖λ < D2. Thus, by the Lie bracket approximation theorem [24], the distance between x1 and z1 can be made arbitrarily small provided that the frequency ω is sufficiently large. Hence, there exists an ω1 such that for every ω ∈ (ω1,∞), we have ‖x1 − z1‖λ < D1.

Now, we assume that ‖yi‖λ < D2 and ‖xi − zi‖λ < D1 for any i = 1, 2, . . . , k − 1 over the entire time interval. Then, we will show that these two relations are still valid for i = k. From Lemma 4.3, it is easy to conclude that there exists an ω2 such that for every ω ∈ (ω2,+∞), ‖xk − zk‖λ < D1. We only need to show ‖yk‖λ < D2.


By the linearity of Tk, we have

‖yk‖λ = ‖Hk(yk−1) − Hk(0) + Tk(xk−1 − zk−1 − (β/(1−β))x∗)‖λ

≤ ‖Tk(xk−1 − zk−1 − (β/(1−β))x∗) − Tk(0)‖λ + ‖Hk(yk−1) − Hk(0)‖λ

≤ ρ‖yk−1‖λ + ρ‖xk−1 − zk−1 − (β/(1−β))x∗‖λ

≤ ρD2 + ρ(D1 + (β/(1−β))‖x∗‖λ) ≤ D2

Therefore, by selecting ω0 = max{ω1, ω2}, we have ‖yk‖λ < D2 for all k. This completes the proof.

Theorem 4.4 suggests that the uniform bound D2 on the tracking error of the MLB system (5) can be made small by reducing the approximation error D1, i.e., by employing a sufficiently large ω.

Since xk − zk varies persistently, it is impossible to show that Hk satisfies C3. Therefore, we cannot conclude a unique limit solution; however, we can show that yk converges to a λ-norm ball.

Theorem 4.5: Let A1–A3 be satisfied. Given λ, β ∈ (0, 1) and ρ ∈ (0, 1), there exists an ω0 such that for every ω ∈ (ω0,∞), yk in (13) will converge to a set Y as k tends to infinity. Furthermore,

Y = {y ∈ S | ‖y − y∞‖λ ≤ Dy}   (22)

Here y∞ is defined in (16), Dy = ρD1/(1 − ρ), and D1 is the uniform bound of ‖xk − zk‖λ for an arbitrary k.

Proof: The proof is in Appendix D.

Remark 4.8: Theorem 4.5 suggests that we cannot achieve “perfect tracking” or a fixed limit trajectory, unlike many ILC control laws, due to the existence of the dither (sinusoid) signals. However, according to Lemma 4.3, D1 can be made arbitrarily small for a sufficiently large ω, and so can Dy, since Dy is proportional to D1; that means we can make the ultimate trajectory as close to a fixed limit trajectory as one wishes by selecting a sufficiently large frequency. In the meantime, we can also make the fixed trajectory as close to zero as desired by choosing a small enough forgetting factor β.

V. ILLUSTRATIVE EXAMPLE

A. I-type ILC

Consider the following tracking reference for the ILC control law in (14):

x∗ = −sin(πt/20)

Fig. 2 shows the trajectory evolution versus iteration for the I-type ILC with forgetting factor β = 0.5. It clearly verifies that the trajectory zk converges to a fixed trajectory, but the trajectory has a gap with the tracking reference. Fig. 3 demonstrates the evolution for the I-type ILC without the forgetting factor β. The trajectory ultimately converges to the reference, i.e., “perfect tracking” is achieved, which confirms the result of Theorem 4.3.

Fig. 2: Integral-type ILC with forgetting factor β = 0.5 (zk(t) over t ∈ [0, 100] s for iterations 1, 2, 3, 50 and the reference)

Fig. 3: Integral-type ILC without forgetting factor, β = 0 (zk(t) over t ∈ [0, 100] s for iterations 1, 2, 3, 50 and the reference)
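The two figures can be reproduced qualitatively with a direct discretization of (14) (explicit rectangular rule; the constant gain Γ = 0.5 and the step size are illustrative choices, not the paper's Γk):

```python
import math

def ilc_run(beta, gamma, n_iter, T=100.0, dt=0.01):
    """Iterate z_k(t) = (1 - beta) z_{k-1}(t) - gamma * int_0^t (z_k - x*) ds."""
    n = int(T / dt) + 1
    xstar = [-math.sin(math.pi * i * dt / 20.0) for i in range(n)]
    z_prev = [0.0] * n
    for _ in range(n_iter):
        z = [0.0] * n
        acc = 0.0                                  # running integral of (z_k - x*)
        for i in range(1, n):
            acc += (z[i - 1] - xstar[i - 1]) * dt
            z[i] = (1.0 - beta) * z_prev[i] - gamma * acc
        z_prev = z
    return max(abs(z_prev[i] - xstar[i]) for i in range(n))  # max tracking error

err_forget = ilc_run(beta=0.5, gamma=0.5, n_iter=50)  # converges, but with a gap
err_pure   = ilc_run(beta=0.0, gamma=0.5, n_iter=50)  # "perfect tracking"
```

With β = 0.5 the limit trajectory keeps a visible offset from the reference (Fig. 2), while β = 0 drives the error toward zero (Fig. 3, Theorem 4.3).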

B. ILES

We study the following static map:

y = x² + 2 sin(πt/20) x

It is evident that the minimizer trajectory is x∗ = −sin(πt/20). For the iterative learning extremum seeking control law, α is selected as 0.1 and the forgetting factor β = 0.3. Figs. 4a–4d show the evolution of the original system and the MLB system (5) in the 1st, 2nd, 3rd and 50th iterations under the frequency ω = 7. They indicate that the tracking performance of both systems improves gradually; the fluctuations grow at first but remain bounded. Figs. 5a–5d show the corresponding evolution under the frequency ω = 15. Comparing Figs. 4a–4d and Figs. 5a–5d, both the tracking error (the fluctuation of the MLB system (5)) and the approximation error (the gap between the original system and the MLB system (5)) are smaller under the larger frequency ω.
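The qualitative effect of ω can be checked with a stripped-down, single-iteration (memoryless) version of the extremum seeking loop and its Lie bracket average in the style of [24]; the horizon, the step size, and the two ω values below are illustrative, and the iteration memory of the full ILES law is omitted.

```python
import math

def es_gap(omega, alpha=0.1, T=20.0, dt=1e-4):
    """Largest gap between the ES trajectory x and its Lie bracket average z
    for the map F(x, t) = x^2 + 2 sin(pi t / 20) x (Euler integration)."""
    sw = math.sqrt(omega)
    x = z = t = 0.0
    gap = 0.0
    for _ in range(int(T / dt)):
        F = x * x + 2.0 * math.sin(math.pi * t / 20.0) * x
        dx = sw * math.cos(omega * t) - alpha * sw * math.sin(omega * t) * F
        dz = -alpha * (z + math.sin(math.pi * t / 20.0))  # z' = -(alpha/2) dF/dz
        x += dx * dt
        z += dz * dt
        t += dt
        gap = max(gap, abs(x - z))
    return gap

gap_slow = es_gap(omega=25.0)
gap_fast = es_gap(omega=400.0)  # larger omega -> smaller approximation error
```

The O(1/√ω) shrinkage of the gap between the oscillating trajectory and its average is exactly the mechanism behind Lemma 4.3 and Theorem 4.4.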

VI. CONCLUSION & OUTLOOK

This paper has proposed an iterative learning extremumseeking approach to solve the time varying optimizer tracking


Fig. 4: The evolution of the original system (ILES) and the MLB system under frequency ω = 7 (panels a–d: 1st, 2nd, 3rd and 50th iterations; xk/zk versus time, with the optimizer shown for reference)

Fig. 5: The evolution of the original system (ILES) and the MLB system under frequency ω = 15 (panels a–d: 1st, 2nd, 3rd and 50th iterations; xk/zk versus time, with the optimizer shown for reference)

problem. A modified Lie bracket system has been introduced to study the behavior of the ILES system. It has been shown that the MLB system is an online integral-type ILC control law with a bounded “control input disturbance”. The convergence of the corresponding ILC control law has been analyzed. Based on that, the convergence of the proposed ILES to a set has been shown. The size of the set can be reduced by tuning the frequency of the dither signal. The distance between the set's center and the origin is also tunable through an appropriate forgetting factor β. In the future, it would be quite interesting to investigate how to extend the method to cover a general function F (x, t). The dynamic mapping situation is also worth studying.

APPENDIX A
PROOF OF LEMMA 4.1

Proof: From the definitions of Tk and the λ-norm, we have

‖Tk(x) − Tk(y)‖λ = max over t ∈ [0, L] of e^(−λt)(1 − β)‖(x − y) − e^(−Γkt) ∫₀ᵗ e^(Γks) Γk(x − y)ds‖∞

It follows from the triangle inequality of norms and max{a + b} ≤ max{a} + max{b} that

‖Tk(x) − Tk(y)‖λ ≤ max over t ∈ [0, L] of e^(−λt)(1 − β)‖x − y‖∞ + max over t ∈ [0, L] of e^(−λt)(1 − β)‖e^(−Γkt) ∫₀ᵗ e^(Γks) Γk(x − y)ds‖∞

The first term on the right-hand side is exactly (1 − β)‖x − y‖λ according to the definition of the λ-norm. For the second term, we do the following operation:

‖Tk(x) − Tk(y)‖λ ≤ (1 − β)‖x − y‖λ + max over t ∈ [0, L] of e^(−λt)(1 − β)‖e^(−Γkt) ∫₀ᵗ e^((Γk+λI)s) e^(−λs) Γk(x − y)ds‖∞

According to the mean value theorem, it can be obtained that

‖e^(−Γkt) ∫₀ᵗ e^((Γk+λI)s) e^(−λs) Γk(x − y)ds‖∞ = ‖e^(−Γkt) ∫₀ᵗ e^((Γk+λI)s) Γk ds (e^(−λξ)(x(ξ) − y(ξ)))‖∞, [ξ ∈ (0, t)]

≤ e^(−λξ)‖x(ξ) − y(ξ)‖∞ · ‖e^(−Γkt) ∫₀ᵗ e^((Γk+λI)s) Γk ds‖∞, [ξ ∈ (0, t)]

≤ max over s ∈ [0, t] of {e^(−λs)‖x − y‖∞} · ‖e^(−Γkt) ∫₀ᵗ e^((Γk+λI)s) Γk ds‖∞

The first inequality follows from the definition of the induced matrix norm. It is also noted that

max over s ∈ [0, t] of {e^(−λs)‖x − y‖∞} ≤ ‖x − y‖λ


Hence,

‖Tk(x) − Tk(y)‖λ

≤ (1 − β)‖x − y‖λ (1 + max over t ∈ [0, L] of e^(−λt)‖e^(−Γkt) ∫₀ᵗ e^((Γk+λI)s) Γk ds‖∞)

≤ (1 − β)‖x − y‖λ (1 + max over t ∈ [0, L] of ‖∫₀ᵗ e^((Γk+λI)(s−t)) Γk ds‖∞)

≤ (1 − β)‖x − y‖λ [1 + max over t ∈ [0, L] of ‖(I − e^(−(Γk+λI)t)) Γk (Γk + λI)⁻¹‖∞]

For a matrix A ∈ Mn, (1/√n)‖A‖2 ≤ ‖A‖∞ ≤ √n‖A‖2, which is obtained from the norm equivalence theorem. Then, it follows that

‖(I − e^(−(Γk+λI)t)) Γk (Γk + λI)⁻¹‖∞ ≤ √n ‖(I − e^(−(Γk+λI)t)) Γk (Γk + λI)⁻¹‖2 ≤ √n ‖Γk (Γk + λI)⁻¹‖2

Thereafter, we have

‖Tk(x) − Tk(y)‖λ ≤ (1 − β)‖x − y‖λ (1 + √n‖Γk(Γk + λI)⁻¹‖2)

Note that ‖Γk(Γk + λI)⁻¹‖2 = ‖I − λ(Γk + λI)⁻¹‖2, which is a non-increasing function of λ. From the assumption, it is known that Γk ≥ (αδ/2)I. Denoting ρ = (1 − β)(1 + √n αδ/(αδ + 2λ)), it is easy to see that ρ < 1 always holds when λ ∈ (λ0,+∞) for arbitrary β ∈ (0, 1), with

λ0 = max{0, αδ[√n(1 − β) − β]/(2β)}

This completes the proof.

APPENDIX B
PROOF OF LEMMA 4.2

Proof: We are going to show the statement by inductive arguments.

When k = 1, from (9), the explicit solution of z1 is

z1 = ∫₀ᵗ e^(−Γ1(t−s)) Γ1 x∗ ds

Since all the eigenvalues of Γ1 are located in the open right half complex plane and x∗ is continuous over the interval [0, L], z1 is well defined and bounded over [0, L]. Hence, ‖y1‖λ is bounded. Furthermore, it is evident that ‖y1‖λ ≤ D2, equivalently, y1 ∈ S.

Assume that yk−1 ∈ S, i.e., ‖yk−1‖λ ≤ D2, holds at the (k − 1)-th iteration. It remains to show that yk ∈ S.

From (15) and Lemma 4.1, we have

‖yk‖λ = ‖Gk(yk−1)‖λ

≤ ‖Tk(yk−1) − Tk(0) + Tk(−(β/(1−β))x∗) − Tk(0)‖λ

≤ ‖Tk(yk−1) − Tk(0)‖λ + ‖Tk(−(β/(1−β))x∗) − Tk(0)‖λ

≤ ρ‖yk−1‖λ + ρ‖(β/(1−β))x∗‖λ

≤ ρD2 + ρ‖(β/(1−β))x∗‖λ

From the statement, we have D2 ≥ (βρ/((1 − β)(1 − ρ)))‖x∗‖λ. Thus,

‖yk‖λ ≤ ρD2 + ρ‖(β/(1−β))x∗‖λ ≤ ρD2 + (1 − ρ)D2 = D2

Hence, Gk(yk−1) ∈ S, and this completes the proof.

APPENDIX C
PROOF OF LEMMA 4.3

Proof: The proof is adapted from Theorem 1 in [24]. The major differences are: first, [24] deals with the more general input-affine form, while we only focus on a sinusoidal input signal; second, [24] is suited to an infinite time horizon and a single iteration, while we handle the case of a finite time horizon and multiple iterations.

Fig. 6: The illustration of the proof idea: zk0 lies in K, the earlier iterates xk<k0 lie in S, and xk0 is confined to a tube of radius E (diameter 2E) around zk0 up to the candidate exit time tE

The basic idea for showing the lemma is illustrated in Fig. 6. xk<k0 are within S and zk0 is within K. We construct a tube of radius E along the trajectory of zk0. If E ≤ dist(K, S) and xk0 is within the tube, we can conclude that xk0 ∈ S. Moreover, supposing that xk0 leaves the tube at some arbitrary time tE, if we can show that tE cannot be the time at which xk0 leaves the tube, then we can claim that xk0 never leaves the tube.


Evaluating (4), (5) at k0, subtracting (5) from (4) and integrating on both sides, we have

xk0 − zk0 = −α ∫₀ᵗ F(xk0, s)√ω sin(ωs)ds + ∫₀ᵗ √ω cos(ωs)ds + (αγk0/2) ∫₀ᵗ ∇zF(zk0, s)ds   (23)

Let Ra be the first term on the right hand side of (23); takingintegration on Ra by parts, we obtain that

Ra =α√ω

[F (xk0 , s) cos(ωs)]|t0 −α√ω

∫ t

0

cos(ωs)F (xk0 , s)ds

=α√ω

[F (xk0 , s) cos(ωs)]|t0

− α√ω

∫ t

0

cos(ωs)∂F (xk0 , s)

∂tds

− α√ω

∫ t

0

cos(ωs)∇xF (xk0 , s)xk0ds

Denote the 1st and 2nd terms on the right-hand side of the above equation as R1 and R2, respectively. Iterating (4) back to the first iteration, it follows that

ẋk0 = −Σ from i=1 to k0 of (1 − β)^(k0−i) αF(xi, t)√ω sin(ωt) + γk0 √ω cos(ωt)   (24)

Inserting (24) into Ra, it becomes

Ra = R1 + R2 + R3 − αγk0 ∫₀ᵗ cos²(ωs)∇xF(xk0, s)ds

= R1 + R2 + R3 + R4 − (αγk0/2) ∫₀ᵗ ∇xF(xk0, s)ds

Here R3 ≜ (α²/2) Σ from i=1 to k0 of (1 − β)^(k0−i) ∫₀ᵗ sin(2ωs)∇xF(xk0, s)F(xi, s)ds and R4 ≜ −(αγk0/2) ∫₀ᵗ cos(2ωs)∇xF(xk0, s)ds. The second equality stems from the identity cos(2ωs) = 2cos²(ωs) − 1. Hence, xk0 − zk0 becomes

xk0 − zk0 = Σ from i=1 to 5 of Ri − (αγk0/2) ∫₀ᵗ [∇xF(xk0, s) − ∇zF(zk0, s)]ds   (25)

where R5 = sin(ωt)/√ω. It should be noted that (25) holds universally for any t. Now we are going to show by contradiction that xk0 remains in S over [0, L]. Let E = dist(K, S). Assume that there is a time instant tE ∈ (0, L) such that ‖xk0(t) − zk0(t)‖∞ < E for any t ∈ [0, tE) and ‖xk0(tE) − zk0(tE)‖∞ = E; that is, tE is the first time at which xk0 leaves a tube with zk0 as its center and radius E. Since S is a compact set, for any x ∈ S there always exist ε1, ε2, ε3 such that

‖F(x, s)‖∞ ≤ ε1, ‖∇xF(x, s)‖∞ ≤ ε2, ‖∂F(x, s)/∂t‖∞ ≤ ε3

Note that these bounds are uniform in t. Thus, we have

‖R1‖∞ = ‖(α/√ω)[F(xk0, s)cos(ωs)] evaluated from 0 to t‖∞ ≤ 2αε1/√ω

‖R2‖∞ = ‖(α/√ω) ∫₀ᵗ cos(ωs)(∂F(xk0, s)/∂t)ds‖∞ ≤ αε3/ω^(3/2)

‖R3‖∞ = ‖(α²/2) Σ from i=1 to k0 of (1 − β)^(k0−i) ∫₀ᵗ sin(2ωs)∇xF(xk0, s)F(xi, s)ds‖∞ ≤ (α²ε1ε2/(2ω)) Σ from i=1 to k0 of (1 − β)^(k0−i) < α²ε1ε2/(2βω)

‖R4‖∞ = ‖(αγk0/2) ∫₀ᵗ cos(2ωs)∇xF(xk0, s)ds‖∞ ≤ αγk0ε2/(2ω) ≤ αε2/(4βω)

‖R5‖∞ = ‖sin(ωs)/√ω‖∞ ≤ 1/√ω

So there must exist a positive number M, independent of k, such that Σ from i=1 to 5 of ‖Ri‖∞ ≤ M/√ω.

and M is independent of k.Since F (x, t) are twice continuous on S × [0, tE ], there

exist a positive number K such that ‖∇xF (xk0 , s) −∇zF (zk0 , s)‖∞ ≤ K‖xk0 − zk0‖∞ according to Lemma 3.2(pp. 90, [25]). Thereafter, for t ∈ [0, tE ]

‖xk0 − zk0‖∞ ≤ M/√ω + (αK/(2β)) ∫₀ᵗ ‖xk0 − zk0‖∞ ds

From the Gronwall–Bellman inequality (pp. 651, [25]), we have

‖xk0 − zk0‖∞ ≤ (M/√ω) e^((αK/(2β))t), t ∈ [0, tE]   (26)

Choosing ω0 = M²e^(αKtE/β)/E², which is independent of k and t, for every ω ∈ (ω0,∞) we have ‖xk0(tE) − zk0(tE)‖∞ < E, which is a contradiction. Thus, xk0 remains in the tube around zk0. Since zk0 stays in K, xk0 stays in S. Furthermore, (26) suggests that ‖xk − zk‖λ can be made as small as desired if ω is large enough. This completes the proof.
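The Gronwall–Bellman step can be checked on a discrete worst case: saturating the integral inequality u(t) ≤ M + c ∫₀ᵗ u(s)ds reproduces, up to discretization, the exponential bound u(t) ≤ M·e^(ct). The constants M, c and the grid below are illustrative.

```python
import math

M, c, dt, steps = 0.3, 0.8, 1e-3, 5000  # horizon t_E = 5.0
u, acc = [], 0.0
for i in range(steps):
    val = M + c * acc        # saturate u(t) = M + c * int_0^t u(s) ds (worst case)
    u.append(val)
    acc += val * dt

# Gronwall bound evaluated at the end of each step
bound = [M * math.exp(c * (i + 1) * dt) for i in range(steps)]
```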

APPENDIX D
PROOF OF THEOREM 4.5

Proof: From Lemma 4.3 and Theorem 4.4, we know that there exists an ω0 such that for every ω ∈ (ω0,+∞), ‖xk − zk‖λ ≤ D1 and yk ∈ S for any k.

By similar arguments to those in the proof of Theorem 4.2 and from the definition of the limit, for an arbitrarily chosen ε > 0, select an ε′ > 0 satisfying

2ε′D2/(1 − ρ) < ε;

then there exists a kε such that ‖Gk − G∞‖ < ε′ is guaranteed as long as k ≥ kε.

From (22), Y is a λ-norm ball with y∞ as its center and Dy as its radius. Hence, we have

dist(yk, Y) = max{‖yk − y∞‖λ − Dy, 0}


If dist(yk, Y) = 0, then yk ∈ Y. For arbitrary k ≥ kε, we have the following from (10) and (16):

‖yk − y∞‖λ = ‖Gk(yk−1) − G∞(y∞) + Tk(xk−1 − zk−1)‖λ

≤ ‖Gk(yk−1) − Gk(y∞)‖λ + ‖Gk(y∞) − G∞(y∞)‖λ + ‖Tk(xk−1 − zk−1) − Tk(0)‖λ

≤ ρ‖yk−1 − y∞‖λ + 2ε′D2 + ρD1

Iterating the above inequality down to kε, we have

‖yk − y∞‖λ ≤ ρ^(k−kε)‖ykε − y∞‖λ + Σ from i=0 to k−kε−1 of ρ^i (2ε′D2 + ρD1)

≤ ρ^(k−kε)‖ykε − y∞‖λ + 2ε′D2/(1 − ρ) + ρD1/(1 − ρ)

It follows from Theorem 4.4 that ‖ykε − y∞‖λ is bounded by 2D2. Therefore, it can be guaranteed that ‖yk − y∞‖λ ≤ Dy + ε for any k satisfying

k > kε + log base ρ of [(ε − 2ε′D2/(1 − ρ))/(2D2)]

Since ε is chosen arbitrarily, we can conclude that

lim as k → ∞ of ‖yk − y∞‖λ ≤ ρD1/(1 − ρ)

It follows that

lim as k → ∞ of dist(yk, Y) = 0

That means yk will finally be within Y. This completes the proof.

ACKNOWLEDGMENT

Zhixing Cao and Furong Gao would like to thank the Academician Workstation Project of Guangdong, China (#2012B090500010), the National Natural Science Foundation of China (#61433005), and Hong Kong Research Grants Council Project #612512 for supporting this work.

REFERENCES

[1] K. B. Ariyur and M. Krstic, Real-time optimization by extremum-seeking control. John Wiley & Sons, 2003.

[2] Y. Tan, W. Moase, C. Manzie, D. Nesic, and I. Mareels, “Extremum seeking from 1922 to 2010,” in Control Conference (CCC), 2010 29th Chinese. IEEE, 2010, pp. 14–26.

[3] D. Dochain, M. Perrier, and M. Guay, “Extremum seeking control and its application to process and reaction systems: A survey,” Mathematics and Computers in Simulation, vol. 82, no. 3, pp. 369–380, 2011.

[4] M. Krstic and H.-H. Wang, “Stability of extremum seeking feedback for general nonlinear dynamic systems,” Automatica, vol. 36, no. 4, pp. 595–601, 2000.

[5] N. J. Killingsworth and M. Krstic, “PID tuning using extremum seeking: online, model-free performance optimization,” Control Systems, IEEE, vol. 26, no. 1, pp. 70–79, 2006.

[6] X. Chen, “A study on profile setting of injection molding,” Ph.D. dissertation, Hong Kong University of Science and Technology, 2002.

[7] H.-H. Wang and M. Krstic, “Extremum seeking for limit cycle minimization,” Automatic Control, IEEE Transactions on, vol. 45, no. 12, pp. 2432–2436, 2000.

[8] M. Guay, D. Dochain, M. Perrier, and N. Hudon, “Flatness-based extremum-seeking control over periodic orbits,” Automatic Control, IEEE Transactions on, vol. 52, no. 10, pp. 2005–2012, 2007.

[9] M. Haring, N. Van De Wouw, and D. Nesic, “Extremum-seeking control for nonlinear systems with periodic steady-state outputs,” Automatica, vol. 49, no. 6, pp. 1883–1891, 2013.

[10] M. Krstic, “Performance improvement and limitations in extremum seeking control,” Systems & Control Letters, vol. 39, no. 5, pp. 313–326, 2000.

[11] F. D. Sahneh, G. Hu, and L. Xie, “Extremum seeking control for systems with time-varying extremum,” in Control Conference (CCC), 2012 31st Chinese. IEEE, 2012, pp. 225–231.

[12] Y. Wang, F. Gao, and F. J. Doyle III, “Survey on iterative learning control, repetitive control, and run-to-run control,” Journal of Process Control, vol. 19, no. 10, pp. 1589–1600, 2009.

[13] S. Arimoto, S. Kawamura, and F. Miyazaki, “Bettering operation of robots by learning,” Journal of Robotic Systems, vol. 1, no. 2, pp. 123–140, 1984.

[14] D. A. Bristow, M. Tharayil, and A. G. Alleyne, “A survey of iterative learning control,” Control Systems, IEEE, vol. 26, no. 3, pp. 96–114, 2006.

[15] D. Shen and Y. Wang, “Survey on stochastic iterative learning control,” Journal of Process Control, vol. 24, no. 12, pp. 64–77, 2014.

[16] H.-S. Ahn, Y. Chen, and K. L. Moore, “Iterative learning control: brief survey and categorization,” IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 37, no. 6, p. 1099, 2007.

[17] H.-S. Ahn, K. L. Moore, and Y. Chen, Iterative learning control: robustness and monotonic convergence for interval systems. Springer Science & Business Media, 2007.

[18] C.-J. Chien and J.-S. Liu, “A P-type iterative learning controller for robust output tracking of nonlinear time-varying systems,” International Journal of Control, vol. 64, no. 2, pp. 319–334, 1996.

[19] D. Wang, “On D-type and P-type ILC designs and anticipatory approach,” International Journal of Control, vol. 73, no. 10, pp. 890–901, 2000.

[20] S. S. Saab, “Stochastic P-type/D-type iterative learning control algorithms,” International Journal of Control, vol. 76, no. 2, pp. 139–148, 2003.

[21] ——, “Optimal selection of the forgetting matrix into an iterative learning control algorithm,” Automatic Control, IEEE Transactions on, vol. 50, no. 12, pp. 2039–2043, 2005.

[22] P. R. Ouyang, W.-J. Zhang, and M. Gupta, “PD-type on-line learning control for systems with state uncertainties and measurement disturbances,” Control and Intelligent Systems, vol. 35, no. 4, p. 351, 2007.

[23] H.-S. Lee and Z. Bien, “A note on convergence property of iterative learning controller with respect to sup norm,” Automatica, vol. 33, no. 8, pp. 1591–1593, 1997.

[24] H.-B. Durr, M. S. Stankovic, C. Ebenbauer, and K. H. Johansson, “Lie bracket approximation of extremum seeking systems,” Automatica, vol. 49, no. 6, pp. 1538–1552, 2013.

[25] H. K. Khalil, Nonlinear Systems, 3rd ed. Prentice Hall, New Jersey, 2002.

[26] J. K. Hunter. Applied Analysis. [Online]. Available: https://www.math.ucdavis.edu/~hunter/book/pdfbook.html