Transcript
Page 1: Applied stochastic approximation algorithms in Hilbert space


International Journal of Control — publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/tcon20

Applied stochastic approximation algorithms in Hilbert space. C. S. KUBRUSLY — Departamento de Engenharia Elétrica, Pontifícia Universidade Católica, R. Marques de S. Vicente 209, ZC-20, Rio de Janeiro, RJ, 20.000, Brasil. Published online: 25 Apr 2007.

To cite this article: C. S. KUBRUSLY (1978) Applied stochastic approximation algorithms in Hilbert space, International Journal of Control, 28:1, 23-31.

To link to this article: http://dx.doi.org/10.1080/00207177808922433


Page 2: Applied stochastic approximation algorithms in Hilbert space

INT. J. CONTROL, 1978, VOL. 28, NO. 1, 23-31

Applied stochastic approximation algorithms in Hilbert space

C. S. KUBRUSLY†

This paper considers the theory of stochastic approximation in a Hilbert space setting for applicable purposes, with emphasis on system identification. The algorithms investigated here converge in quadratic mean and with probability 1, and are less restrictive, from the application viewpoint, than the original works on stochastic approximation theory. This approach supplies a suitable class of algorithms satisfying the convergence requirements, without compromising the system identification applicability.

1. Introduction

Stochastic approximation is a recursive scheme which can be used for parametric estimation in a stochastic environment. Its origins are the works of Robbins and Monro (1951), Kiefer and Wolfowitz (1952), Blum (1954), and the unified general approach given by Dvoretzky (1956). Presently there is a great deal of literature on this subject, both from theoretical and practical viewpoints. Some complete books on stochastic approximation have already been published (e.g. see Albert and Gardner (1967) and Wasan (1969)) and interesting surveys regarding mainly the applications are also available (e.g. see Sakrison (1966), Fabian (1971), and Ljung (1974)).

This technique has been extensively used for identification purposes in memoryless systems and lumped parameter systems (cf. Saridis et al. (1969) and Saridis (1974)) and also, but not so extensively, for distributed parameter systems identification (cf. the survey by Kubrusly (1977) and also Krtolica (1976 a, b)). Concerning the literature on system identification by stochastic approximation, it has become common practice to refer to Dvoretzky's work (1956) and then to proceed directly to applicable algorithms. But it happens that this bridge linking theory and practice is not so obvious, and the gap between them is not so narrow either.

Basically the stochastic approximation algorithms take the following recursive form:

x(n+1) = T_n x(n) + z(n)

where {T_n; n ≥ 0} is a family of operators and {z(n); n ≥ 0} is a random sequence. In formulating the convergence theorem there are two basic sets of conditions to be imposed. Set I: those concerning the operators T_n. Set II: those concerning the sequence z(n). From the application viewpoint the set II, as originally assumed by Dvoretzky (1956), is too restrictive and the set I is too general. For this reason, in an attempt to bridge that gap

Received 28 April 1977.

† Departamento de Engenharia Elétrica, Pontifícia Universidade Católica, R. Marques de S. Vicente 209, ZC-20, 20.000, Rio de Janeiro, RJ, Brasil.


Page 3: Applied stochastic approximation algorithms in Hilbert space


between theory and practice, we introduce some applicable stochastic approximation algorithms in Hilbert space by using a less restrictive set II.
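To fix ideas before the formal treatment, the following sketch (ours, not part of the paper) runs a recursion of exactly this form on R^d, taken as a crude stand-in for a separable Hilbert space; the affine contraction T_n, the noise law and all constants are illustrative assumptions only.

```python
# Illustrative only: x(n+1) = T_n x(n) + z(n) on R^d (a stand-in for H),
# with T_n x = alpha_n (x - x0) + x0, an affine contraction toward a fixed
# point x0 (a "set I"-type family), and z(n) independent zero-mean noise with
# summable variances (a simple "set II"-type sequence).  Constants are ours.
import numpy as np

rng = np.random.default_rng(0)
d = 8                                    # dimension of the truncated space
x0 = rng.normal(size=d)                  # the fixed point the scheme should find
x = np.zeros(d)                          # x(0)

for n in range(5000):
    alpha_n = (n + 1) / (n + 2)          # alpha_n > 0, prod_{i<=n} alpha_i = 1/(n+2) -> 0
    z_n = rng.normal(scale=1.0 / (n + 1), size=d)   # E||z(n)||^2 = d/(n+1)^2, summable
    x = alpha_n * (x - x0) + x0 + z_n    # x(n+1) = T_n x(n) + z(n)

print(float(np.linalg.norm(x - x0)))     # typically close to 0
```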

2. Notational preliminaries and auxiliary results

We assume throughout this paper that I is the set of all non-negative integers (i.e. I = {0, 1, 2, ...}) and H is a real separable Hilbert space. The inner product and the norm in H will be denoted by ⟨· ; ·⟩ and ‖·‖, respectively. Let (Ω, ℱ, P) be a probability space, where ℱ is a σ-algebra of subsets of a non-empty basic set Ω, and P a probability measure defined on ℱ. With ℬ denoting the σ-algebra of subsets of H generated by the open sets, a random variable in H (e.g. x) is a measurable mapping of (Ω, ℱ) into (H, ℬ). Denoting by E the expectation operator, a second-order random variable x in H is such that E‖x‖² < ∞, and a second-order random sequence {x(n); n∈I} is a family of second-order random variables in H. The random variable E[x(n)|x(0), ..., x(n−1)] in H will denote the conditional expectation of x(n) given the sub σ-algebra of ℱ induced by the random variables x(0), ..., x(n−1). For the theory of such H-valued random variables the reader is referred to Balakrishnan (1976).
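Computationally, one convenient and entirely standard way to picture such an H and its random variables is through square-summable coefficient sequences with respect to an orthonormal basis, truncated to finitely many coordinates; the toy representation below is ours and is the one used in the other illustrative sketches in this transcript.

```python
# Rough computational picture (ours, not from the paper): identify x in H with
# its coefficients w.r.t. an orthonormal basis and truncate to d coordinates,
# so inner product and norm reduce to the ordinary Euclidean ones.
import numpy as np

d = 100
x = np.array([1.0 / (k + 1) for k in range(d)])   # truncated coefficients of some x in H
y = np.ones(d) / np.sqrt(d)                        # another element, unit norm

print(float(np.dot(x, y)))           # <x; y>
print(float(np.linalg.norm(x)))      # ||x||
```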

We shall use the superscript + to denote a real-valued function on R as follows:

a⁺ = a if a ≥ 0;  a⁺ = 0 if a ≤ 0

Note that (αa + βb)⁺ ≤ αa⁺ + βb⁺ for any non-negative real numbers α, β. The following lemmas will be needed.

Lemma 1

Let {α_n; n∈I} and {σ_n; n∈I} be real sequences such that

(i) α_n > 0, ∀n;  ∏_{i=0}^{n} α_i → 0 as n → ∞

(ii) σ_n ≥ 0, ∀n;  Σ_{i=0}^{∞} σ_i ≤ σ < ∞

If {ξ_n; n∈I} is a non-negative real sequence such that

ξ_{n+1} ≤ α_n ξ_n + σ_n

then

ξ_n → 0 as n → ∞

Proof

First let us define

Φ(n, i) = ∏_{j=i}^{n-1} α_j,  n > i;  Φ(n, n) = 1


Page 4: Applied stochastic approximation algorithms in Hilbert space

On iterating the inequality for {ξ_n} we get, for any n > m > 0,

ξ_n ≤ Φ(n, 0)ξ_0 + Σ_{i=0}^{n-1} Φ(n, i+1)σ_i

    = Φ(n, 0)ξ_0 + Σ_{i=0}^{m-1} Φ(n, i+1)σ_i + Σ_{i=m}^{n-1} Φ(n, i+1)σ_i

    ≤ (ξ_0 + Σ_{i=0}^{m-1} σ_i) max_{0≤i≤m} Φ(n, i) + (Σ_{i=m}^{n-1} σ_i) max_{m+1≤i≤n} Φ(n, i)

    ≤ (ξ_0 + σ) max_{0≤i≤m} Φ(n, i) + c Σ_{i=m}^{∞} σ_i

where c is a finite positive number such that Φ(n, i) ≤ c for all n and i ≤ n (i.e. Φ(n, i) is uniformly bounded, by (i)). But, given any ε > 0,

c Σ_{i=n_0}^{∞} σ_i < ε/2

by choosing n_0 ∈ I large enough (by (ii)). Also, by (i), Φ(n, k) → 0 as n → ∞ for any finite k. So there exists n_1 ∈ I (n_1 > n_0) sufficiently large such that

(ξ_0 + σ) max_{0≤i≤n_0} Φ(n, i) < ε/2,  ∀n ≥ n_1

Now, setting m = n_0, we have

ξ_n < ε,  ∀n ≥ n_1

that is, ξ_n → 0 as n → ∞.    □
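A quick numerical check of Lemma 1 (ours; the particular sequences are arbitrary illustrative choices) is given below: even driving the recursion with the bound taken as an equality sends ξ_n to zero.

```python
# Numerical sanity check of Lemma 1 (illustrative sequences, not from the paper):
# alpha_n = (n+1)/(n+2), so the running product is 1/(n+2) -> 0 (condition (i)),
# and sigma_n = 1/(n+1)^2 is non-negative and summable (condition (ii)).
N = 20000
xi = 10.0                                  # xi_0 >= 0, arbitrary
for n in range(N):
    alpha_n = (n + 1) / (n + 2)
    sigma_n = 1.0 / (n + 1) ** 2
    xi = alpha_n * xi + sigma_n            # worst case: xi_{n+1} <= alpha_n xi_n + sigma_n with equality

print(xi)                                  # ~1e-3 after 20000 steps, still decreasing toward 0
```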

Lemma 2

Let {ξ_n; n∈I} be a real positive sequence such that

ξ_n (1 − ξ_{n+1}) ≤ (1 + ε_n) ξ_{n+1}

for all n, where

ε_n ≥ 0;  Σ_{n=0}^{∞} ε_n < ∞

Define

Φ(n, i) = ∏_{j=i}^{n-1} (1 − ξ_j),  i < n;  Φ(n, n) = 1

Then

Φ(n+1, i+1) ξ_i ≤ k ξ_n,  0 ≤ i ≤ n

for some finite positive constant k.


Page 5: Applied stochastic approximation algorithms in Hilbert space

Proof

On iterating the hypothesis,

Φ(n+1, i+1) ξ_i = ξ_i ∏_{j=i+1}^{n} (1 − ξ_j)

    ≤ ξ_n ∏_{j=i}^{n-1} (1 + ε_j) ≤ ξ_n lim_{n→∞} ∏_{j=i}^{n-1} (1 + ε_j) = ξ_n k;  0 < k < ∞

since (e.g. see Knopp (1951)) for any n_0 ∈ I:

ε_n ≥ 0 and Σ_{n=0}^{∞} ε_n < ∞  ⟺  0 < ∏_{n=n_0}^{∞} (1 + ε_n) = k < ∞    □
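As a numerical illustration of the bound in Lemma 2 as reconstructed above (the snippet and the particular parameters are ours), take ξ_n = a/(an + b) with a = 1/2, b = 2, a member of the family suggested in the footnote to Corollary 1 below; for this choice ξ_n(1 − ξ_{n+1}) = ξ_{n+1} exactly, so k can be taken equal to 1.

```python
# Check Phi(n+1, i+1) * xi_i <= k * xi_n with k = 1 for xi_n = a/(a*n + b)
# (illustrative parameters; for this family the hypothesis of Lemma 2 holds
# with eps_n = 0, since xi_n (1 - xi_{n+1}) = xi_{n+1} exactly).
a, b = 0.5, 2.0
xi = [a / (a * n + b) for n in range(500)]

def phi(n, i):
    """Phi(n, i) = prod_{j=i}^{n-1} (1 - xi_j), with Phi(n, n) = 1."""
    out = 1.0
    for j in range(i, n):
        out *= 1.0 - xi[j]
    return out

worst = max(phi(n + 1, i + 1) * xi[i] / xi[n]
            for n in range(1, 499) for i in range(n + 1))
print(worst)   # equals 1 up to floating-point rounding, i.e. the bound holds with k = 1
```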

3. The main results

Theorem 1

Assume x_0 (‖x_0‖ < ∞) a fixed point in H. Let {z(n); n∈I} be an H-valued random sequence, x(0) an H-valued second-order random variable, and consider the following algorithm (a discrete-time dynamical system) in H:

x(n+1) = T_n x(n) + z(n)   (1)

where {T_n : H → H; n∈I} is a family of operators on H. If

‖T_n x − x_0‖ ≤ α_n ‖x − x_0‖   (2)

for any x∈H, with {α_n; n∈I} a real sequence such that

α_n > 0   (3)

for all n, and

∏_{i=0}^{n} α_i → 0 as n → ∞   (4)

and if

Σ_{n=0}^{∞} E‖z(n)‖² < ∞   (5)

and

Σ_{n=0}^{∞} E⟨T_n x(n) − x_0; z(n)⟩⁺ < ∞   (6)

then x(n) converges to x_0 in quadratic mean (q.m.) and with probability 1 (w.p.1):

P{ lim_{n→∞} x(n) = x_0 } = 1

Proof

‖x(n+1) − x_0‖² = ‖T_n x(n) − x_0‖² + ‖z(n)‖² + 2⟨T_n x(n) − x_0; z(n)⟩

    ≤ α_n² ‖x(n) − x_0‖² + ‖z(n)‖² + 2⟨T_n x(n) − x_0; z(n)⟩⁺


Page 6: Applied stochastic approximation algorithms in Hilbert space

(a) Convergence in quadratic mean. Set

ξ_n = E‖x(n) − x_0‖²

σ_n = E‖z(n)‖² + 2E⟨T_n x(n) − x_0; z(n)⟩⁺

So

ξ_{n+1} ≤ α_n² ξ_n + σ_n

where ∏_{i=0}^{n} α_i² → 0 by (4) and Σ_{n=0}^{∞} σ_n < ∞ by (5) and (6). Then, by Lemma 1, we have

ξ_n = E‖x(n) − x_0‖² → 0 as n → ∞

(b) Convergence with probability 1. Conditions (5) and (6) imply that (e.g. see Loève (1963, p. 173)):

Σ_{n=0}^{∞} ( ‖z(n)‖² + ⟨T_n x(n) − x_0; z(n)⟩⁺ ) < ∞,  w.p.1

So, defining real random variables

ξ_n = ‖x(n) − x_0‖²

σ_n = ‖z(n)‖² + 2⟨T_n x(n) − x_0; z(n)⟩⁺

we still get

ξ_{n+1} ≤ α_n² ξ_n + σ_n

and, since

Σ_{n=0}^{∞} σ_n < ∞,  w.p.1

the result of Lemma 1 is ensured w.p.1, that is

ξ_n = ‖x(n) − x_0‖² → 0 as n → ∞,  w.p.1

x(n) → x_0 as n → ∞,  w.p.1    □
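An empirical illustration of Theorem 1 (ours; the operators, noise and constants are illustrative assumptions, not taken from the paper) estimates E‖x(n) − x_0‖² by averaging independent runs of the recursion:

```python
# Monte Carlo sketch for Theorem 1 (all modelling choices are ours): estimate
# E||x(n) - x0||^2 over independent runs of x(n+1) = T_n x(n) + z(n), with
# T_n x = alpha_n (x - x0) + x0 and independent noise of summable variance.
import numpy as np

rng = np.random.default_rng(1)
d, runs, steps = 4, 200, 2000
x0 = np.ones(d)
x = np.tile(5.0 * np.ones(d), (runs, 1))             # every run starts at x(0) = (5,...,5)

for n in range(steps):
    alpha_n = (n + 1) / (n + 2)
    z = rng.normal(scale=1.0 / (n + 1), size=(runs, d))
    x = alpha_n * (x - x0) + x0 + z                   # one step of the algorithm in each run

mse = float(np.mean(np.sum((x - x0) ** 2, axis=1)))   # sample estimate of E||x(n) - x0||^2
print(mse)                                            # small, and it shrinks as steps grows
```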

Conditions (2)-(4) (set I) can be viewed as a particular case of those required by Dvoretzky (1956). On the other hand, conditions (5) and (6) (set II) are less restrictive. In other words, concerning the assumptions made by Dvoretzky (1956), the set I was strengthened while the set II was weakened. But in many applications of stochastic approximation for system identification (cf. Saridis and Stein (1968) and Kubrusly and Curtain (1977)) the set I is particularized still further, allowing an even weaker version of set II. And that is the next result.

Corollary 1

Consider the following algorithm in H:

x(n+1) = (1 − ξ_n) x(n) + ξ_n y(n)   (7)

where

(i) x(0) is a second-order H-valued random variable independent of {y(n); n∈I}.


Page 7: Applied stochastic approximation algorithms in Hilbert space

(ii) {ξ_n; n∈I} is a real sequence as in Lemma 2, with the following additional conditions†:

0 < ξ_n < 1   (8)

Σ_{n=0}^{∞} ξ_n = ∞   (9)

Σ_{n=0}^{∞} ξ_n² < ∞   (10)

(iii) {y(n); n∈I} is an H-valued random sequence with constant mean

E y(n) = x_0 ∈ H   (11)

for all n (‖x_0‖ < ∞), such that

E‖y(n)‖² ≤ a² < ∞   (12)

Σ_{i=0}^{n-1} E⟨y(i) − x_0; y(n) − x_0⟩⁺ ≤ β_{n-1},  n ≥ 1   (13)

where {β_n; n∈I} is a non-negative real sequence such that

Σ_{n=1}^{∞} ξ_n² β_{n-1} < ∞   (14)

Under the above assumptions x(n) converges to x_0 in quadratic mean and with probability 1:

P{ lim_{n→∞} x(n) = x_0 } = 1

Proof

By (7),

x(n+1) − x_0 = (1 − ξ_n)[x(n) − x_0] + ξ_n[y(n) − x_0]

Set

α_n = (1 − ξ_n)

z(n) = ξ_n[y(n) − x_0]

and define operators T_n on H as follows:

T_n x = α_n(x − x_0) + x_0

for any x in H. So we get an algorithm as in (1):

(1)  x(n+1) = T_n x(n) + z(n)

where the conditions (2) to (5) are satisfied:

(2)  ‖T_n x − x_0‖ = α_n ‖x − x_0‖,  for any x∈H

† For example,

{ ξ_n = a/(an + b)^α;  0 < a ≤ 1,  b ≥ 1,  ½ < α ≤ 1 }

is a possible family of such sequences.


Page 8: Applied stochastic approximation algorithms in Hilbert space

(3)  0 < α_n = 1 − ξ_n < 1,  by (8)

(4)  ∏_{i=0}^{n} α_i → 0 as n → ∞

by (8), (9) (cf. Knopp (1951)).

(5)  Σ_{n=0}^{∞} E‖z(n)‖² = Σ_{n=0}^{∞} ξ_n² (E‖y(n)‖² − ‖x_0‖²) ≤ (a² − ‖x_0‖²) Σ_{n=0}^{∞} ξ_n² < ∞

by (10)-(12). Then, in order to complete the proof, we just need to verify condition (6) of the preceding theorem. First let us define the following quantities:

c_xy(n, m) = E⟨x(n) − x_0; y(m) − x_0⟩

c_xy⁺(n, m) = E⟨x(n) − x_0; y(m) − x_0⟩⁺

c_yy(n, m) = E⟨y(n) − x_0; y(m) − x_0⟩

c_yy⁺(n, m) = E⟨y(n) − x_0; y(m) − x_0⟩⁺

Φ(n, i) = ∏_{j=i}^{n-1} (1 − ξ_j),  i < n;  Φ(n, n) = 1

Now, rewriting the algorithm (7) as follows

x(n+1) − x_0 = (1 − ξ_n)[x(n) − x_0] + ξ_n[y(n) − x_0]

we have, for any fixed m∈I,

c_xy(n+1, m) = Φ(n+1, n) c_xy(n, m) + ξ_n c_yy(n, m)

On iterating the above equality we get, for any n > 0,

c_xy(n, m) = Φ(n, 0) c_xy(0, m) + Σ_{i=0}^{n-1} Φ(n, i+1) ξ_i c_yy(i, m)

Since x(0) is a second-order random variable independent of {y(n); n∈I}, it follows, by (11), that c_xy(0, m) = 0 for any m∈I. Thus, setting m = n ≥ 1, we have

c_xy⁺(n, n) ≤ Σ_{i=0}^{n-1} Φ(n, i+1) ξ_i c_yy⁺(i, n)

Then, by Lemma 2 and conditions (13), (14), we get

(6)  Σ_{n=0}^{∞} E⟨T_n x(n) − x_0; z(n)⟩⁺ = Σ_{n=0}^{∞} α_n ξ_n c_xy⁺(n, n)

    ≤ Σ_{n=1}^{∞} ξ_n (1 − ξ_n) Σ_{i=0}^{n-1} Φ(n, i+1) ξ_i c_yy⁺(i, n)

    = Σ_{n=1}^{∞} ξ_n Σ_{i=0}^{n-1} Φ(n+1, i+1) ξ_i c_yy⁺(i, n)

    ≤ Σ_{n=1}^{∞} k ξ_n² Σ_{i=0}^{n-1} c_yy⁺(i, n) ≤ k Σ_{n=1}^{∞} ξ_n² β_{n-1} < ∞

for some finite positive constant k.    □
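To make Corollary 1 concrete (the following sketch and all its constants are ours): with gains ξ_n = a/(an + b) and observations y(n) sharing the mean x_0, algorithm (7) is essentially a recursive averaging scheme. We do not claim these particular choices check every one of conditions (8)-(14); they are only meant to show the recursion at work.

```python
# Illustrative sketch of algorithm (7): x(n+1) = (1 - xi_n) x(n) + xi_n y(n),
# with gains xi_n = a/(a*n + b) from the footnote family (0 < xi_n < 1,
# sum xi_n = inf, sum xi_n^2 < inf) and i.i.d. observations y(n) with mean x0
# and bounded second moment.  All constants are demo assumptions.
import numpy as np

rng = np.random.default_rng(2)
d = 6
x0 = rng.normal(size=d)                          # unknown mean to be estimated
a, b = 1.0, 2.0                                  # 0 < a <= 1, b >= 1  (exponent alpha = 1)

x = np.zeros(d)                                  # x(0), chosen independently of {y(n)}
for n in range(20000):
    xi_n = a / (a * n + b)
    y_n = x0 + rng.normal(scale=2.0, size=d)     # E y(n) = x0, E||y(n)||^2 bounded
    x = (1.0 - xi_n) * x + xi_n * y_n            # one step of algorithm (7)

print(float(np.linalg.norm(x - x0)))             # close to 0: x(n) is essentially the running average
```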


Page 9: Applied stochastic approximation algorithms in Hilbert space


4. Comments and concluding remarks

Dvoretzky (1956) in his original paper dealt with real random variables, but he also formulated a generalization in normed linear spaces whose proof is a natural extension of the scalar (real) case. Schmetterer (1958) considered stochastic approximation algorithms, in particular the unified Dvoretzky approach, in Hilbert space. The previous works were generalized by Venter (1966), who proposed a wider class of algorithms in Hilbert space.

The results we deduced in § 3 present the advantage of requiring less restrictive conditions concerning the random sequence {z(n)} (or {y(n)}). In other words, conditions of the following type were assumed by Dvoretzky (1956) instead of (6):

E[z(n) | x(0), z(0), ..., z(n−1)] = 0,  ∀n   (15)

w.p.1 in the scalar case, which can also be assumed in a Hilbert space setting as shown by Venter (1966), and

(16)

in the case of linear normed spaces. Assuming H-valued random variables, both (15) and (16) imply that E⟨T_n x(n) − x_0; z(n)⟩ ≤ 0 for all n (where the equality holds in the case of (15), as shown by Kubrusly (1976)). Venter (1966) also proved that conditions weaker than (15), such as

Σ_{n=0}^{∞} ( E‖E[z(n) | x(0), z(0), ..., z(n−1)]‖² )^{1/2} < ∞   (17)

Σ_{n=0}^{∞} ‖E[z(n) | x(0), z(0), ..., z(n−1)]‖ < ∞,  w.p.1   (18)

are able to ensure the convergence for algorithms in Hilbert space, in quadratic mean and with probability 1 in the case of (17), and with probability 1 in the case of (18). Note that (15) implies (17), which implies (18) (for a detailed discussion see Venter (1966)).

In many applications concerning the system identification practice, the random sequence {y(n)} appearing in Corollary 1 comes as a non-linear transformation of observations of some dynamical system (e.g. see Saridis and Stein (1968) and Kubrusly and Curtain (1977) for illustrative examples concerning such applications of Corollary 1). Therefore, even in particular cases dealing with finite-dimensional linear dynamical systems under wide-sense stationarity assumptions, it may be difficult (if possible) to verify conditions such as (15)-(18) (now with z(n) = ξ_n[y(n) − x_0]). In a first step to meet this problem, correlated random sequences {z(n)} subject to (6) were allowed in Theorem 1. This particular condition (6) may also not be easy to verify directly in practice, since it is expressed in terms of the correlation between z(n) and x(n+1), where {x(n)} is the random sequence to be analysed. However, in Corollary 1 this particular difficulty has been overcome, since (6) is replaced by (13), which involves only correlations within {y(n)}†. On the other hand, the

† Other results on almost sure and on mean square convergence for correlated (but scalar) random sequences were given in Ljung (1974, 1975).


Page 10: Applied stochastic approximation algorithms in Hilbert space


pair of conditions (13), (14) we assumed in Corollary 1 not only avoids random analysis of conditional expectations, but also dismisses uniform boundedness requirements. In this way, these conditions are easier to verify in many situations, and more appropriate in the case of wide-sense stationary random sequences.

Finally we remark that a slightly similar version (up to (13), (14)) of Corollary 1 for the (real) scalar case was presented in Fu (1969), where the proof stating the connection with the Dvoretzky theorem (1956) is somewhat obscure in the following sense. In that formulation neither (13), (14) nor any of the sufficient conditions (15)-(18) was required. So, if one believes in Fu (1969), a simplified version of Corollary 1 for the scalar case (with direct extension to finite-dimensional spaces) could be formulated by assuming only (8)-(12), without requiring the original sufficient condition (15) assumed by Dvoretzky (1956). But it is clear that (8)-(12) do not imply (15) and, in this way, they do not comprise a sufficient set of conditions. In particular, (11) does not imply (15), although the reverse is true.

ACKNOWLEDGMENT

The author thanks Mr. A. Alcaim for his careful and critical reading of the manuscript.

REFERENCES

ALBERT, A. E., and GARDNER, L. A., Jr., 1967, Stochastic Approximation and Nonlinear Regression (MIT Press).

BALAKRISHNAN, A. V., 1976, Applied Functional Analysis (Springer-Verlag).

BLUM, J. R., 1954, Ann. math. Statist., 25, 382.

DVORETZKY, A., 1956, Proc. 3rd Berkeley Symp. on Math. Statist. and Prob., edited by J. Neyman (University of California Press), p. 39.

FABIAN, V., 1971, Optimization Methods in Statistics, edited by J. S. Rustagi (Academic Press).

FU, K. S., 1969, System Theory, edited by L. A. Zadeh and E. Polak (McGraw-Hill).

KIEFER, J., and WOLFOWITZ, J., 1952, Ann. math. Statist., 23, 462.

KNOPP, K., 1951, Theory and Application of Infinite Series, 2nd edition (Blackie).

KRTOLICA, R., 1976 a, Automn. remote Control, 37, 27; 1976 b, Ibid., 37, 170.

KUBRUSLY, C. S., 1976, Ph.D. Thesis, Control Theory Centre, University of Warwick; 1977, Int. J. Control, 26, 509.

KUBRUSLY, C. S., and CURTAIN, R. F., 1977, Int. J. Control, 25, 441.

LJUNG, L., 1974, Report No. 7403, Div. of Autom. Control, Lund Institute of Technology; 1975, Automn. remote Control, 35, 1532.

LOÈVE, M., 1963, Probability Theory, 3rd edition (Van Nostrand).

ROBBINS, H., and MONRO, S., 1951, Ann. math. Statist., 22, 400.

SAKRISON, D. J., 1966, Advances in Communication Systems, Vol. 2, edited by A. V. Balakrishnan (Academic Press).

SARIDIS, G. N., 1974, I.E.E.E. Trans. autom. Control, 19, 798.

SARIDIS, G. N., NIKOLIC, Z. J., and FU, K. S., 1969, I.E.E.E. Trans. Syst. Sci. Cybernet., 5, 8.

SARIDIS, G. N., and STEIN, G., 1968, I.E.E.E. Trans. autom. Control, 13, 515.

SCHMETTERER, L., 1958, Le Calcul des Probabilités et ses Applications, 87, 55.

VENTER, J. H., 1966, Ann. math. Statist., 37, 1534.

WASAN, M. T., 1969, Stochastic Approximation (Cambridge University Press).


