This article was downloaded by: [Monash University Library] on 26 September 2013, at 01:12. Publisher: Taylor & Francis. Informa Ltd, registered in England and Wales, Registered Number: 1072954. Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK.

International Journal of Control. Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/tcon20

Applied stochastic approximation algorithms in Hilbert space. C. S. KUBRUSLY, Departamento de Engenharia Elétrica, Pontifícia Universidade Católica, R. Marquês de S. Vicente 209, ZC-20, Rio de Janeiro, RJ, 20.000, Brasil. Published online: 25 Apr 2007.

To cite this article: C. S. KUBRUSLY (1978) Applied stochastic approximation algorithms in Hilbert space, International Journal of Control, 28:1, 23-31.

To link to this article: http://dx.doi.org/10.1080/00207177808922433
INT. J. CONTROL, 1978, VOL. 28, NO. 1, 23-31

Applied stochastic approximation algorithms in Hilbert space

C. S. KUBRUSLY†

This paper considers the theory of stochastic approximation in a Hilbert space setting for applicable purposes, with emphasis on system identification. The algorithms investigated here converge in quadratic mean and with probability 1, and are less restrictive, from the application viewpoint, than the original works on stochastic approximation theory. This approach supplies a suitable class of algorithms satisfying the convergence requirements, without compromising the system identification applicability.
1. Introduction
Stochastic approximation is a recursive scheme which can be used for parametric estimation in a stochastic environment. Its origins are the works of Robbins and Monro (1951), Kiefer and Wolfowitz (1952), Blum (1954), and the unified general approach given by Dvoretzky (1956). Presently there is a great deal of literature on this subject, both from theoretical and practical viewpoints. Some complete books on stochastic approximation have already been published (e.g., see Albert and Gardner (1967) and Wasan (1969)) and interesting surveys regarding mainly the applications are also available (e.g., see Sakrison (1966), Fabian (1971), and Ljung (1974)).

This technique has been extensively used for identification purposes in memoryless systems and lumped parameter systems (cf. Saridis et al. (1969) and Saridis (1974)) and also, but not so extensively, for distributed parameter systems identification (cf. survey by Kubrusly (1977) and also Krtolica (1976 a, b)). Concerning the literature in system identification by stochastic approximation, it has become common practice to refer to Dvoretzky's work (1956) and then to proceed directly to applicable algorithms. But it happens that this bridge linking theory and practice is not so obvious and the gap between them is not so narrow either.
Basically the stochastic approximation algorithms take the following recursive form:

x(n+1) = T_n x(n) + z(n)

where {T_n; n ≥ 0} is a family of operators and {z(n); n ≥ 0} is a random sequence. In formulating the convergence theorem there are two basic sets of conditions to be imposed. Set I: those concerning the operators T_n. Set II: those concerning the sequence z(n). From the application viewpoint the set II, as originally assumed by Dvoretzky (1956), is too restrictive and the set I is too general. For this reason, in an attempt to bridge that gap
Received 28 April 1977.
† Departamento de Engenharia Elétrica, Pontifícia Universidade Católica, R. Marquês de S. Vicente 209, ZC-20, 20.000, Rio de Janeiro, RJ, Brasil.
between theory and practice, we introduce some applicable stochastic approximation algorithms in Hilbert space by using a less restrictive set II.
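As an illustration of the recursive scheme discussed above, the following sketch iterates x(n+1) = T_n x(n) + z(n) for a simple finite-dimensional case. All names and numerical choices here are illustrative choices of ours, not from the paper: T_n is a contraction toward the fixed point with vanishing products of the gains (set I), and the noise has summable second moments (set II).

```python
import random

def stochastic_approximation(T, noise, x0, steps):
    """Iterate x(n+1) = T(n, x(n)) + z(n) and return the final iterate."""
    x = list(x0)
    for n in range(steps):
        x = [t + z for t, z in zip(T(n, x), noise(n))]
    return x

random.seed(0)
d = 3
# Set I:  T_n x = (n/(n+1)) x contracts toward the fixed point 0;
#         the product of the contraction factors is 1/(n+1) -> 0.
T = lambda n, x: [n / (n + 1) * xi for xi in x]
# Set II: z(n) ~ U(-1,1)/(n+1) componentwise, so sum_n E||z(n)||^2 < infinity.
noise = lambda n: [random.uniform(-1.0, 1.0) / (n + 1) for _ in range(d)]

x_final = stochastic_approximation(T, noise, [10.0] * d, 20000)
# x_final ends up close to the fixed point (0, 0, 0)
```

The deterministic part of the iterate is damped by the vanishing product of contraction factors, while the summable-variance noise averages out; both effects are visible in a single run.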
2. Notational preliminaries and auxiliary results
We assume throughout this paper that I is the set of all non-negative integers (i.e. I = {0, 1, 2, ...}) and H is a real separable Hilbert space. The inner product and the norm in H will be denoted by ⟨· ; ·⟩ and ‖·‖, respectively. Let (Ω, 𝒜, P) be a probability space, where 𝒜 is a σ-algebra of subsets of a non-empty basic set Ω, and P a probability measure defined on 𝒜. With ℬ denoting the σ-algebra of subsets of H generated by the open sets, a random variable in H (e.g. x) is a measurable mapping of (Ω, 𝒜) into (H, ℬ). Denoting by E the expectation operator, a second-order random variable x in H is such that E‖x‖² < ∞, and a second-order random sequence {x(n); n∈I} is a family of second-order random variables in H. The random variable E[x(n)|x(0), ..., x(n−1)] in H will denote the conditional expectation of x(n) given the sub-σ-algebra of 𝒜 induced by the random variables x(0), ..., x(n−1). For the theory of such H-valued random variables the reader is referred to Balakrishnan (1976).
We shall use the superscript + to denote a real-valued function on ℝ as follows:

a⁺ = a, if a ≥ 0
a⁺ = 0, if a < 0

Note that (αa + βb)⁺ ≤ αa⁺ + βb⁺ for any non-negative real numbers α, β. The following lemmas will be needed.
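In code, the positive-part map and the inequality just noted look as follows (a trivial sketch of ours, checking the subadditivity property on a few sample pairs):

```python
def pos(a):
    """The positive part a+ of a real number: a if a >= 0, else 0."""
    return a if a >= 0 else 0.0

# (alpha*a + beta*b)+ <= alpha*(a+) + beta*(b+) for non-negative alpha, beta:
alpha, beta = 0.7, 2.0
for a, b in [(-2.0, 3.0), (1.5, -0.5), (-1.0, -1.0), (2.0, 2.0)]:
    assert pos(alpha * a + beta * b) <= alpha * pos(a) + beta * pos(b)
```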
Lemma 1
Let {αn; n∈I} and {σn; n∈I} be real sequences such that

(i) αn > 0, ∀n ; ∏_{i=0}^{n} αi → 0 as n → ∞

(ii) σn ≥ 0, ∀n ; Σ_{i=0}^{∞} σi ≤ σ < ∞

If {ξn; n∈I} is a non-negative real sequence such that

ξ_{n+1} ≤ αn ξn + σn

then

ξn → 0 as n → ∞

Proof
First let us define

Φ(n, i) = ∏_{j=i}^{n−1} αj ; n > i
Φ(n, n) = 1
On iterating the inequality for {ξn} we get, for any n > m > 0,

ξn ≤ Φ(n, 0)ξ0 + Σ_{i=0}^{n−1} Φ(n, i+1)σi
   = Φ(n, 0)ξ0 + Σ_{i=0}^{m−1} Φ(n, i+1)σi + Σ_{i=m}^{n−1} Φ(n, i+1)σi
   ≤ (ξ0 + Σ_{i=0}^{m−1} σi) max_{0≤i≤m} Φ(n, i) + (Σ_{i=m}^{n−1} σi) max_{m+1≤i≤n} Φ(n, i)
   ≤ (ξ0 + σ) max_{0≤i≤m} Φ(n, i) + c Σ_{i=m}^{∞} σi

where c is a finite positive number such that Φ(n, i) ≤ c for all n and i ≤ n (i.e. Φ(n, i) is uniformly bounded, by (i)). But, given any ε > 0,

c Σ_{i=n0}^{∞} σi < ε/2

by choosing n0 ∈ I large enough (by (ii)). Also, by (i), Φ(n, k) → 0 as n → ∞ for any finite k. So there exists n1 ∈ I (n1 > n0) sufficiently large such that

(ξ0 + σ) max_{0≤i≤n0} Φ(n, i) < ε/2 , ∀n ≥ n1

Now, setting m = n0, we have ξn < ε for all n ≥ n1, which completes the proof. □
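A quick numerical illustration of Lemma 1 (our own sketch, with arbitrary choices of the two sequences): take αn = (n+1)/(n+2), so that αn > 0 and ∏_{i≤n} αi = 1/(n+2) → 0, and σn = 1/(n+1)², which is summable; the recursion ξ(n+1) = αn ξ(n) + σn then drives ξn to 0 as the lemma predicts.

```python
def xi_sequence(steps, xi0=5.0):
    """Iterate xi(n+1) = alpha_n * xi(n) + sigma_n with the choices above."""
    xi = xi0
    for n in range(steps):
        alpha_n = (n + 1) / (n + 2)    # alpha_n > 0, prod_{i<=n} alpha_i = 1/(n+2) -> 0
        sigma_n = 1.0 / (n + 1) ** 2   # sigma_n >= 0, summable
        xi = alpha_n * xi + sigma_n
    return xi

# xi(n) shrinks toward 0 as n grows, as Lemma 1 guarantees.
xi_late = xi_sequence(100000)
```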
Lemma 2
Let {ξn; n∈I} be a real positive sequence such that

ξn(1 − ξ_{n+1}) ≤ (1 + εn)ξ_{n+1}

for all n, where

εn ≥ 0 ; Σ_{n=0}^{∞} εn < ∞

Define

Φ(n, i) = ∏_{j=i}^{n−1} (1 − ξj) ; i < n
Φ(n, n) = 1

Then

ξi Φ(n+1, i+1) ≤ k ξn ; i ≤ n

for some finite positive constant k.
Proof
On iterating the hypothesis inequality we get

ξi Φ(n+1, i+1) = ξi ∏_{j=i+1}^{n} (1 − ξj) ≤ ξn ∏_{j=i}^{n−1} (1 + εj) ≤ ξn lim_{m→∞} ∏_{j=0}^{m} (1 + εj) = ξn k ; 0 < k < ∞

since (e.g. see Knopp (1951)) for any n0 ∈ I:

εn ≥ 0 and Σ_{n=0}^{∞} εn < ∞ ⇔ 0 < ∏_{n=n0}^{∞} (1 + εn) = k < ∞ □
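The infinite-product fact cited from Knopp — a summable non-negative sequence {εn} makes ∏(1 + εn) converge to a finite positive constant k — is easy to probe numerically (a sketch of ours; the particular εn below is an arbitrary summable choice):

```python
def partial_product(N):
    """Partial product of (1 + eps_n) with eps_n = 1/(n+1)^2, a summable choice."""
    p = 1.0
    for n in range(N):
        p *= 1.0 + 1.0 / (n + 1) ** 2
    return p

# The partial products stabilize at a finite k; for this particular eps_n the
# classical closed form is k = sinh(pi)/pi (about 3.676).
k_early, k_late = partial_product(1000), partial_product(1000000)
```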
3. The main results
Theorem 1
Let x0 (‖x0‖ < ∞) be a fixed point in H. Let {z(n); n∈I} be an H-valued random sequence, x(0) an H-valued second-order random variable, and consider the following algorithm (a discrete-time dynamical system) in H:

x(n+1) = T_n x(n) + z(n)   (1)

where {T_n : H → H; n∈I} is a family of operators on H. If

‖T_n x − x0‖ ≤ αn ‖x − x0‖   (2)

for any x ∈ H, with {αn; n∈I} a real sequence such that

αn > 0   (3)

for all n, and

∏_{i=0}^{n} αi → 0 as n → ∞   (4)

Σ_{n=0}^{∞} E‖z(n)‖² < ∞   (5)

and

Σ_{n=0}^{∞} E⟨T_n x(n) − x0 ; z(n)⟩⁺ < ∞   (6)

then x(n) converges to x0 in quadratic mean (q.m.) and with probability 1 (w.p.1):

P{lim_{n→∞} x(n) = x0} = 1
Proof

‖x(n+1) − x0‖² = ‖T_n x(n) − x0‖² + ‖z(n)‖² + 2⟨T_n x(n) − x0 ; z(n)⟩
               ≤ αn²‖x(n) − x0‖² + ‖z(n)‖² + 2⟨T_n x(n) − x0 ; z(n)⟩⁺
(a) Convergence in quadratic mean. Set

ξn = E‖x(n) − x0‖²
σn = E‖z(n)‖² + 2E⟨T_n x(n) − x0 ; z(n)⟩⁺

So, taking expectations in the inequality above,

ξ_{n+1} ≤ αn² ξn + σn

(note that ∏_{i=0}^{n} αi² → 0 as well, so Lemma 1 applies with αn² in place of αn). Then, by Lemma 1, we have

ξn = E‖x(n) − x0‖² → 0 as n → ∞

(b) Convergence with probability 1. Conditions (5) and (6) imply that (e.g. see Loève (1963, p. 173)):

Σ_{n=0}^{∞} (‖z(n)‖² + ⟨T_n x(n) − x0 ; z(n)⟩⁺) < ∞, w.p.1

So, defining real random variables

ξn = ‖x(n) − x0‖²
σn = ‖z(n)‖² + 2⟨T_n x(n) − x0 ; z(n)⟩⁺

we still get

ξ_{n+1} ≤ αn² ξn + σn

and, since

Σ_{n=0}^{∞} σn < ∞, w.p.1

the result of Lemma 1 is ensured w.p.1, that is

ξn = ‖x(n) − x0‖² → 0 as n → ∞, w.p.1

and hence

x(n) → x0 as n → ∞, w.p.1 □
Conditions (2)-(4) (set I) can be viewed as a particular case of those required by Dvoretzky (1956). On the other hand, conditions (5) and (6) (set II) are less restrictive. In other words, concerning the assumptions made by Dvoretzky (1956), the set I was strengthened while the set II was weakened. But in many applications of stochastic approximation for system identification (cf. Saridis and Stein (1968) and Kubrusly and Curtain (1977)) the set I is particularized still further, allowing an even weaker version of set II. And that is the next result.
Corollary 1
Consider the following algorithm in H:

x(n+1) = (1 − ξn)x(n) + ξn y(n)   (7)

where

(i) x(0) is a second-order H-valued random variable independent of {y(n); n∈I}.
(ii) {ξn; n∈I} is a real sequence as in Lemma 2, with the following additional conditions†:

0 < ξn < 1   (8)

Σ_{n=0}^{∞} ξn = ∞   (9)

Σ_{n=0}^{∞} ξn² < ∞   (10)

(iii) {y(n); n∈I} is an H-valued random sequence with constant mean

Ey(n) = x0 ∈ H   (11)

for all n (‖x0‖ < ∞), such that

E‖y(n)‖² ≤ a² < ∞   (12)

Σ_{i=0}^{n−1} E⟨y(i) − x0 ; y(n) − x0⟩⁺ ≤ β_{n−1} ; n ≥ 1   (13)

where {βn; n∈I} is a non-negative real sequence such that

Σ_{n=1}^{∞} ξn² β_{n−1} < ∞   (14)
Under the above assumptions x(n) converges to x0 in quadratic mean and with probability 1:

P{lim_{n→∞} x(n) = x0} = 1
Proof
By (7),

x(n+1) − x0 = (1 − ξn)[x(n) − x0] + ξn[y(n) − x0]

Set

αn = (1 − ξn)
z(n) = ξn[y(n) − x0]

and define operators T_n on H as follows:

T_n x = αn(x − x0) + x0

for any x in H. So we get an algorithm as in (1):

x(n+1) = T_n x(n) + z(n)

where the conditions (2) to (5) are satisfied:

(2): ‖T_n x − x0‖ = αn‖x − x0‖, for any x ∈ H
† For example,

ξn = [a/(n+b)]^α ; 0 < a ≤ 1, b > 1, ½ < α ≤ 1

is a possible family of such sequences.
(3): 0 < αn = 1 − ξn < 1, by (8)

(4): ∏_{i=0}^{n} αi → 0 as n → ∞

by (8), (9) (cf. Knopp (1951)).

(5): Σ_{n=0}^{∞} E‖z(n)‖² = Σ_{n=0}^{∞} ξn²(E‖y(n)‖² − ‖x0‖²) ≤ (a² − ‖x0‖²) Σ_{n=0}^{∞} ξn² < ∞
by (10)-(12). Then, in order to complete the proof, we just need to verify condition (6) of the preceding theorem. First let us define the following quantities:

c_xy(n, m) = E⟨x(n) − x0 ; y(m) − x0⟩
c_xy⁺(n, m) = E⟨x(n) − x0 ; y(m) − x0⟩⁺
c_yy(n, m) = E⟨y(n) − x0 ; y(m) − x0⟩
c_yy⁺(n, m) = E⟨y(n) − x0 ; y(m) − x0⟩⁺
Φ(n, i) = ∏_{j=i}^{n−1} (1 − ξj) ; i < n
Φ(n, n) = 1

Now, rewriting the algorithm (7) as follows

x(n+1) − x0 = (1 − ξn)[x(n) − x0] + ξn[y(n) − x0]

we have, for any fixed m ∈ I,

c_xy(n+1, m) = Φ(n+1, n)c_xy(n, m) + ξn c_yy(n, m)

On iterating the above equality we get, for any n > 0,

c_xy(n, m) = Φ(n, 0)c_xy(0, m) + Σ_{i=0}^{n−1} Φ(n, i+1)ξi c_yy(i, m)

Since x(0) is a second-order random variable independent of {y(n); n∈I} it follows, by (11), that c_xy(0, m) = 0 for any m ∈ I. Thus, setting m = n ≥ 1, we have

c_xy⁺(n, n) ≤ Σ_{i=0}^{n−1} Φ(n, i+1)ξi c_yy⁺(i, n)

Then, by Lemma 2 and conditions (13), (14), we get

(6): Σ_{n=0}^{∞} E⟨T_n x(n) − x0 ; z(n)⟩⁺ = Σ_{n=0}^{∞} αn ξn c_xy⁺(n, n)
   ≤ Σ_{n=1}^{∞} ξn(1 − ξn) Σ_{i=0}^{n−1} Φ(n, i+1)ξi c_yy⁺(i, n)
   = Σ_{n=1}^{∞} ξn Σ_{i=0}^{n−1} Φ(n+1, i+1)ξi c_yy⁺(i, n)
   ≤ Σ_{n=1}^{∞} k ξn² Σ_{i=0}^{n−1} c_yy⁺(i, n) ≤ k Σ_{n=1}^{∞} ξn² β_{n−1} < ∞

for some finite positive constant k. □
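The structure of Corollary 1 is easy to exercise numerically. In the sketch below (an illustrative construction of ours, not from the paper), we take H = ℝ and the admissible gain ξn = 1/(n+2); the algorithm x(n+1) = (1 − ξn)x(n) + ξn y(n) then reduces to a running average of the observations, and the iterates settle near the common mean x0 = Ey(n).

```python
import random

def corollary_algorithm(y, x_init, steps):
    """x(n+1) = (1 - xi_n) x(n) + xi_n y(n) with gain xi_n = 1/(n+2)."""
    x = x_init
    for n in range(steps):
        # 0 < xi_n < 1, sum xi_n diverges, sum xi_n^2 converges: (8)-(10) hold.
        xi_n = 1.0 / (n + 2)
        x = (1.0 - xi_n) * x + xi_n * y(n)
    return x

random.seed(1)
x0 = 4.0                                     # the unknown common mean Ey(n)
y = lambda n: x0 + random.gauss(0.0, 1.0)    # independent observations, E||y||^2 finite

estimate = corollary_algorithm(y, x_init=0.0, steps=50000)
# estimate lands close to x0 = 4.0
```

With independent observations the correlation conditions (13), (14) hold trivially, so this is the simplest instance of the corollary; in identification applications y(n) would instead be a (possibly correlated) transformation of system observations.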
4. Comments and concluding remarks
Dvoretzky (1956) in his original paper dealt with real random variables, but he also formulated a generalization in normed linear spaces whose proof is a natural extension of the scalar (real) case. Schmetterer (1958) considered stochastic approximation algorithms, in particular the unified Dvoretzky approach, in Hilbert space. These previous works were generalized by Venter (1966), who proposed a wider class of algorithms in Hilbert space.
The results we deduced in § 3 present the advantage of requiring less restrictive conditions concerning the random sequence {z(n)} (or {y(n)}). In other words, conditions of the following type were assumed by Dvoretzky (1956) instead of (6):

E[z(n) | x(0), z(0), ..., z(n−1)] = 0, ∀n   (15)

w.p.1 in the scalar case, which can also be assumed in a Hilbert space setting as shown by Venter (1966), and
(16)
in the case of linear normed spaces. Assuming H-valued random variables, both (15) and (16) imply that E⟨T_n x(n) − x0 ; z(n)⟩ ≤ 0 for all n (where the equality holds in the case of (15), as shown by Kubrusly (1976)). Venter (1966) also proved that conditions weaker than (15), such as

Σ_{n=0}^{∞} (E‖E[z(n) | x(0), z(0), ..., z(n−1)]‖²)^{1/2} < ∞   (17)

Σ_{n=0}^{∞} ‖E[z(n) | x(0), z(0), ..., z(n−1)]‖ < ∞, w.p.1   (18)

are able to ensure convergence for algorithms in Hilbert space, in quadratic mean and with probability 1 in the case of (17), and with probability 1 in the case of (18). Note that (15) implies (17), which implies (18) (for a detailed discussion see Venter (1966)).
In many applications concerning system identification practice, the random sequence {y(n)} appearing in Corollary 1 comes as a non-linear transformation of observations of some dynamical system (e.g. see Saridis and Stein (1968) and Kubrusly and Curtain (1977) for illustrative examples concerning such applications of Corollary 1). Therefore, even in particular cases dealing with finite-dimensional linear dynamical systems under wide-sense stationarity assumptions, it may be difficult (if possible at all) to verify conditions such as (15)-(18) (now with z(n) = ξn[y(n) − x0]). As a first step to meet this problem, correlated random sequences {z(n)} subject to (6) were allowed in Theorem 1. This particular condition (6) may also not be easy to verify directly in practice, since it is expressed in terms of the correlation between z(n) and x(n+1), where {x(n)} is the random sequence to be analysed. However, in Corollary 1 this particular difficulty has been overcome: (6) is replaced by (13), which involves only correlations within {y(n)}†. On the other hand, the

† Other results on almost sure and on mean square convergence for correlated (but scalar) random sequences were given in Ljung (1974, 1975).
pair of conditions (13), (14) we assumed in Corollary 1 not only avoids random analysis of conditional expectations, but also dismisses uniform boundedness requirements. In this way, these conditions are easier to verify in many situations, and more appropriate in the case of wide-sense stationary random sequences.
Finally we remark that a somewhat similar version (up to (13), (14)) of Corollary 1 for the (real) scalar case was presented in Fu (1969), where the proof stating the connection with the Dvoretzky theorem (1956) is somewhat obscure in the following sense. In that formulation, neither (13), (14) nor any of the sufficient conditions (15)-(18) was required. So, if one believes Fu (1969), a simplified version of Corollary 1 for the scalar case (with direct extension to finite-dimensional spaces) could be formulated by assuming only (8)-(12), without requiring the original sufficient condition (15) assumed by Dvoretzky (1956). But it is clear that (8)-(12) do not imply (15) and, in this way, they do not comprise a sufficient set of conditions. In particular, (11) does not imply (15), although the reverse is true.
ACKNOWLEDGMENT
The author thanks Mr. A. Alcaim for his careful and critical reading of the manuscript.
REFERENCES
ALBERT, A. E., and GARDNER, L. A., Jr., 1967, Stochastic Approximation and Nonlinear Regression (MIT Press).
BALAKRISHNAN, A. V., 1976, Applied Functional Analysis (Springer-Verlag).
BLUM, J. R., 1954, Ann. math. Statist., 25, 382.
DVORETZKY, A., 1956, Proc. 3rd Berkeley Symp. on Math. Statist. and Prob., edited by J. Neyman (University of California Press), p. 39.
FABIAN, V., 1971, Optimization Methods in Statistics, edited by J. S. Rustagi (Academic Press).
FU, K. S., 1969, System Theory, edited by L. A. Zadeh and E. Polak (McGraw-Hill).
KIEFER, J., and WOLFOWITZ, J., 1952, Ann. math. Statist., 23, 462.
KNOPP, K., 1951, Theory and Application of Infinite Series, 2nd edition (Blackie).
KRTOLICA, R., 1976 a, Automn. remote Control, 37, 27; 1976 b, Ibid., 37, 170.
KUBRUSLY, C. S., 1976, Ph.D. Thesis, Control Theory Centre, University of Warwick; 1977, Int. J. Control, 26, 509.
KUBRUSLY, C. S., and CURTAIN, R. F., 1977, Int. J. Control, 25, 441.
LJUNG, L., 1974, Report No. 7403, Div. of Autom. Control, Lund Institute of Technology; 1975, Automn. remote Control, 35, 1532.
LOÈVE, M., 1963, Probability Theory, 3rd edition (Van Nostrand).
ROBBINS, H., and MONRO, S., 1951, Ann. math. Statist., 22, 400.
SAKRISON, D. J., 1966, Advances in Communication Systems, Vol. 2, edited by A. V. Balakrishnan (Academic Press).
SARIDIS, G. N., 1974, I.E.E.E. Trans. autom. Control, 19, 798.
SARIDIS, G. N., NIKOLIC, Z. J., and FU, K. S., 1969, I.E.E.E. Trans. Syst. Sci. Cybernet., 5, 8.
SARIDIS, G. N., and STEIN, G., 1968, I.E.E.E. Trans. autom. Control, 13, 515.
SCHMETTERER, L., 1958, Le Calcul des Probabilités et ses Applications, 87, 55.
VENTER, J. H., 1966, Ann. math. Statist., 37, 1534.
WASAN, M. T., 1969, Stochastic Approximation (Cambridge University Press).