

ELSEVIER Systems & Control Letters 24 (1995) 385-388


Rate of convergence of the LMS method

László Gerencsér

Computer and Automation Institute of the Hungarian Academy of Sciences, P.O. Box 63, H-1518 Budapest, Hungary

Received 25 April 1993; revised 31 October 1993

Abstract

It is shown under reasonable assumptions that for any q the L_q-norm of the estimation error generated by the LMS algorithm is of the order of magnitude O(T^{-1/2}), where T is the length of the observation period.

Keywords: Identification; Least squares; Parameter estimation; Convergence

A basic method of adaptive filtering is the least mean square (LMS) method, in which a particular component of a random, second-order stationary signal is approximated by a linear combination of the remaining components. The LMS method has been used widely due to its simplicity and effectiveness; however, its properties have not been well understood until very recently. A major advance was made in [6], in which a very rich characterization of the LMS estimator process is given in the form of an invariance principle and a functional limit theorem. However, the rate of convergence of higher-order moments was not considered in that paper.

The best available results in this direction are given in [8], but the established rate is slower than what is expected. The desired rate of convergence for higher-order moments was established in [4] for the general recursive estimation scheme of [7]; however, that result is not directly applicable to the LMS algorithm, since a key assumption on the enforced boundedness of the estimator process is not satisfied. The main advance in this paper is that the method developed in [4] is adjusted to yield a rate of convergence result for the continuous-time LMS algorithm. The results can be extended without difficulty to discrete time. Moreover, the methods of this paper are also applicable to the fixed gain version of the LMS algorithm (cf. Theorem 2).

Let (X_t, a_t) be an R^{p+1}-valued, second-order stationary process, where (a_t) is a real-valued process. A basic problem of adaptive filtering is to find the best approximation of a_t in terms of X_t, in the mean square sense, i.e. we consider the minimization problem

min_g E(X_t^T g − a_t)^2,  (1)

where g is an R^p-valued weighting vector. Introducing the notations

A* = −E X_t X_t^T,   b* = E X_t a_t,

the minimization problem (1) is equivalent to the problem of solving the linear equation

A* g + b* = 0.  (2)
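The setup in (1)-(2) can be illustrated numerically. Below is a minimal sketch (our own, not from the paper): A* and b* are estimated by sample moments from simulated bounded stationary data, and the optimal weight vector is recovered by solving A* g + b* = 0, i.e. g* = −(A*)^{-1} b*. The data model and all constants are hypothetical choices for the illustration.

```python
import numpy as np

# Hypothetical data model: a_t = X_t^T g + bounded noise, with bounded inputs.
rng = np.random.default_rng(0)
n, p = 20000, 3
true_g = np.array([1.0, -0.5, 2.0])                  # assumed "true" weights
X = np.sqrt(3.0) * rng.uniform(-1.0, 1.0, (n, p))    # bounded inputs, E[X X^T] = I
a = X @ true_g + 0.1 * rng.uniform(-1.0, 1.0, n)     # bounded observation noise

A_star = -(X.T @ X) / n                   # sample estimate of -E[X_t X_t^T]
b_star = (X.T @ a) / n                    # sample estimate of  E[X_t a_t]
g_star = np.linalg.solve(A_star, -b_star) # unique solution of A* g + b* = 0
print(np.round(g_star, 2))
```

With A* nonsingular the linear system has a unique solution, and the recovered g_star is close to the weights used to generate the data.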

We shall assume that A* is nonsingular and hence negative definite, and thus the linear equation (2) has a unique solution g*. Define

A_t(ω) = −X_t X_t^T,   b_t(ω) = X_t a_t.

Then the continuous-time LMS method is defined by the differential equation

dh_t/dt = (1/t)(A_t h_t + b_t).  (3)

The initial condition h_1 is assumed to have finite moments of all orders.
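A simple Euler discretization makes (3) concrete. The sketch below is our own (the step size dt and all constants are assumed, not from the paper), with A_t = −X_t X_t^T and b_t = X_t a_t; the inputs are bounded, in the spirit of the paper's Condition 2, which excludes e.g. Gaussian data.

```python
import numpy as np

# Euler integration of dh_t/dt = (1/t)(A_t h_t + b_t) from t = 1 to t = 301.
rng = np.random.default_rng(1)
p = 2
g_true = np.array([1.0, -1.0])    # assumed weights generating the data
h = np.zeros(p)                   # initial condition h_1
t, dt = 1.0, 0.01
for _ in range(30000):
    X = np.sqrt(3.0) * rng.uniform(-1.0, 1.0, p)     # bounded, E[X X^T] = I
    a = X @ g_true + 0.1 * rng.uniform(-1.0, 1.0)
    h = h + (dt / t) * (X * (a - X @ h))             # A_t h + b_t = X_t (a_t - X_t^T h)
    t += dt
print(np.round(h, 2))
```

The iterate h drifts toward g*, with the 1/t factor playing the role of a decreasing gain.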

An important concept in the analysis of recursive estimators is the associated differential equation, which is defined as

dy_t/dt = (1/t)(A* y_t + b*).  (4)

To describe the stability condition imposed on (4) it is useful to first make a transformation of time t = e^s. Then for ỹ_s = y_{e^s} we get

dỹ_s/ds = A* ỹ_s + b*.  (5)

Condition 1. We assume that (5) is exponentially asymptotically stable with negative top Lyapunov exponent, i.e. we have with some α > 0, C_0 > 0, |ỹ_s| ≤ C_0 e^{−αs} |ỹ_0|.

This condition means that if φ_t denotes the fundamental solution of (4), then for 1 ≤ r ≤ t

||φ_t φ_r^{−1}|| ≤ C_0.  (6)

Indeed, it is easy to verify that φ_t = e^{A* log t}, and thus φ_t φ_r^{−1} = e^{A*(log t − log r)} = e^{A* v}, where v = log(t/r), from which (6) follows.
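The bound (6) is easy to check numerically: with φ_t = e^{A* log t}, the product φ_t φ_r^{−1} = e^{A* log(t/r)} has uniformly bounded norm for 1 ≤ r ≤ t when A* is negative definite. The matrix below is a hypothetical example of our own choosing.

```python
import numpy as np
from scipy.linalg import expm

# Check that ||exp(A* log(t/r))|| stays bounded over several (r, t) pairs.
A_star = -np.array([[1.0, 0.3], [0.3, 2.0]])   # symmetric negative definite
norms = [np.linalg.norm(expm(A_star * np.log(t / r)), 2)
         for r in (1.0, 2.0, 10.0) for t in (r, 5 * r, 100 * r)]
print(max(norms))
```

For a symmetric negative definite A* the spectral norm of e^{A* v} is e^{λ_max v} ≤ 1 for v ≥ 0, so here C_0 = 1 works; for non-normal A* the norm can transiently exceed 1 but remains bounded.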

For the characterization of the process (X_t, a_t) we refer to the theory of L-mixing processes that has been developed in [1]. A summary of the basic concepts, notations and properties is also given in the Appendix of [2].

Condition 2. We assume that z_t = (X_t, a_t) is an L-mixing process with respect to a pair of families of σ-algebras (F_t, F_t^+), and even the following stronger condition holds: M_∞(z) < ∞ and Γ_∞(z) < ∞. (The first inequality is equivalent to saying that (z_t) is bounded by a finite constant.)

This condition is fairly strong and excludes some mathematically interesting examples such as Gaussian processes. On the other hand we think that in most practical problems, the condition is satisfied.

Theorem 1. Under Conditions 1 and 2 we have for α > 1/2 and for any m ≥ 1

E^{1/m}|h_t − g*|^m = O(t^{−1/2}).

If α ≤ 1/2 then we have for any m ≥ 1 and ε > 0

E^{1/m}|h_t − g*|^m = O(t^{−α+ε}).
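The O(t^{−1/2}) rate of Theorem 1 can be probed by a Monte Carlo experiment. The sketch below is our own and uses the discrete-time analogue h_k = h_{k-1} + (1/k)(A_k h_{k-1} + b_k) mentioned in the introduction; increasing k by a factor of 16 should shrink the mean error by roughly sqrt(16) = 4. All constants and the data model are assumed.

```python
import numpy as np

# 200 independent runs of discrete-time LMS with gain 1/k, bounded data.
rng = np.random.default_rng(3)
p, n_runs = 2, 200
g_true = np.array([1.0, -0.5])
h = np.zeros((n_runs, p))
err = {}
for k in range(1, 16001):
    X = np.sqrt(3.0) * rng.uniform(-1.0, 1.0, (n_runs, p))   # E[X X^T] = I
    v = 0.5 * rng.uniform(-1.0, 1.0, n_runs)                 # bounded noise
    a = X @ g_true + v
    resid = a - (X * h).sum(axis=1)            # a_k - X_k^T h_{k-1}, per run
    h = h + X * (resid / k)[:, None]           # A_k h + b_k = X_k (a_k - X_k^T h)
    if k in (1000, 16000):
        err[k] = float(np.mean(np.linalg.norm(h - g_true, axis=1)))
print(err[1000] / err[16000])
```

The observed error ratio between k = 1000 and k = 16000 is close to 4, consistent with the t^{−1/2} decay of the L_q estimation error.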

In the notation of [2] we can also write the statements of the theorem as |h_t − g*| = O_M(t^{−1/2}) and |h_t − g*| = O_M(t^{−α+ε}), respectively. For the proof we need the theorem below, which follows directly from Theorem 1.1 of [3] using a linear change of time scale t' = ct. Let Ã_s, s ≥ s_0, be a (p×p)-matrix-valued stochastic process and let ψ̃_{s,r} be the solution of the matrix-valued random linear differential equation

(d/ds) ψ̃_{s,r} = Ã_s ψ̃_{s,r},   ψ̃_{r,r} = I.  (7)

Theorem A. Let Ã_s, s ≥ s_0, be an L-mixing process with respect to, say, (F̃_s, F̃_s^+), such that M_∞(Ã) = K < ∞, Γ_∞(Ã) < ∞ and EÃ_s = A*. Then for any fixed K and α' < α, and any finite m, there exists an L such that if Γ_∞(Ã) ≤ L then for any s, r ≥ s_0

||ψ̃_{s,r}|| ≤ C* e^{−α'(s−r)},

where the L_m(Ω, F, P)-norm of C* is finite and bounded in r.

Proof. First we prove that h_t is M-bounded. Let t = e^s, and define h̃_s = h_{e^s}, Ã_s = A_{e^s}, b̃_s = b_{e^s}. Then (3) can be written as

(d/ds) h̃_s = Ã_s h̃_s + b̃_s.  (8)

Let the fundamental solution of this linear differential equation be ψ̃_{s,r}. It is easy to see that the process (Ã_s) is L-mixing with respect to F̃_s = F_{e^s}, F̃_s^+ = F_{e^s}^+ (cf. Lemma 4.6 of [5]). Moreover, the quoted lemma implies that if we restrict the process to the range s ≥ s_0, and denote this process by Ã_{s_0,s}, then Γ_∞(Ã_{s_0,·}) ≤ Γ_∞(A_t)/e^{s_0}.


Thus Theorem A implies that for any α' < α and any finite m there exists an s_0 such that for s, r ≥ s_0 we have

||ψ̃_{s,r}|| ≤ C* e^{−α'(s−r)},  (9)

and here the L_m(Ω, F, P)-norm of C* is finite and bounded in r.

We have for s ≥ s_0

h̃_s = ψ̃_{s,s_0} h̃_{s_0} + ∫_{s_0}^{s} ψ̃_{s,r} b̃_r dr.

For the first term we note that for any fixed s_0 all moments of h̃_{s_0} are finite, since A_t is bounded and all moments of h_1 are finite. On the other hand, applying (9) we get that the second term is majorated by

∫_{s_0}^{s} C* e^{−α'(s−r)} K dr.

Thus the L_m(Ω, F, P)-norm of the second term is bounded with respect to s, and the M-boundedness of the process h̃_s has been proved.

The second part of the proof follows the arguments of [4] closely. Take a real number s ≥ 1 and consider the interval [s, qs), where q > 1. Let y_t be the solution of (4) with initial condition y_s = h_s. Subtracting (4) from (3) we have for s ≤ t ≤ qs:

h_t − y_t = ∫_s^t φ_t φ_r^{−1} (1/r)(A_r h_r + b_r − A* h_r − b*) dr = ∫_s^t φ_t φ_r^{−1} (1/r)(Ā_r h_r + b̄_r) dr,  (10)

where Ā_r = A_r − A* and b̄_r = b_r − b*. Write h_r = y_r + (h_r − y_r) in (10); then we get

h_t − y_t = ∫_s^t φ_t φ_r^{−1} (1/r) Ā_r (h_r − y_r) dr + ∫_s^t φ_t φ_r^{−1} (1/r)(Ā_r y_r + b̄_r) dr.  (11)

For the first term on the right-hand side we use the fact that ||φ_t φ_r^{−1}|| is bounded by C_0, hence the norm of this term is majorated by

C_0 ∫_s^t (1/r) 2K |h_r − y_r| dr.

To estimate the second term on the right-hand side of (11), write this term as

φ_t φ_{qs}^{−1} ∫_s^t φ_{qs} φ_r^{−1} (1/r)(Ā_r y_r + b̄_r) dr.  (12)

Note that φ_t φ_{qs}^{−1} = e^{A*(log t − log qs)} = e^{A* log(t/qs)}, hence for fixed q, ||φ_t φ_{qs}^{−1}|| is bounded.

Let us now consider the integral expression in (12), and let us define

I*_{s,q} = sup_{s ≤ t ≤ qs} | ∫_s^t φ_{qs} φ_r^{−1} (1/r)(Ā_r y_r + b̄_r) dr |.  (13)

Here y_r = φ_r φ_s^{−1} y_s, hence we consider the following related expressions:

I*_{s,q,1} = sup_{s ≤ t ≤ qs} || ∫_s^t φ_{qs} φ_r^{−1} Ā_r φ_r φ_s^{−1} (1/r) dr ||

and

I*_{s,q,2} = sup_{s ≤ t ≤ qs} | ∫_s^t φ_{qs} φ_r^{−1} (1/r) b̄_r dr |.

Since Ā_r, b̄_r are zero-mean L-mixing processes and φ_{qs} φ_r^{−1} and φ_r φ_s^{−1} are deterministic bounded processes, Theorem 1.1 and Theorem 5.1 of [1] imply that we have for m > 2 and i = 1, 2 the inequality

E^{1/m} (I*_{s,q,i})^m ≤ C ( ∫_s^{qs} r^{−2} dr )^{1/2} ≤ C_1 s^{−1/2},  (14)

where C depends on m and on the processes (Ā_r) and (b̄_r), but is independent of s and q. Finally, for I*_{s,q} defined under (13) we get

I*_{s,q} ≤ I*_{s,q,1} |y_s| + I*_{s,q,2} = O_M(s^{−1/2}),  (15)

since y_s = h_s = O_M(1). From (11) we get

|h_t − y_t| ≤ C_0 ∫_s^t (1/r) 2K |h_r − y_r| dr + I*_{s,q}  (16)

and applying the Bellman-Gronwall lemma in [s, t) we get

|h_t − y_t| ≤ exp( ∫_s^t (1/r) 2C_0 K dr ) I*_{s,q}.  (17)

The first term is majorated by

C_q = exp( ∫_s^{qs} (1/r) 2C_0 K dr ) = exp(2C_0 K log q) = q^{2C_0 K}.

Thus we arrive at the conclusion that

sup_{s ≤ t ≤ qs} |h_t − y_t| = O_M(s^{−1/2}).  (18)

From here the proof can be completed as in [4], using the stability condition imposed on (4).


The idea is to paste together the inequalities (18) for s = q^i, i integer. From this procedure it is seen that the cases α > 1/2 and α ≤ 1/2 should be treated separately. Among others, in the case α ≤ 1/2 the rate of convergence of the deterministic equation (4), which is O(t^{−α}), is too slow to expect that the randomized version of this equation will converge with rate O(t^{−1/2}). In the case α > 1/2 the rate of convergence of h_t is determined by the local tracking errors given by (18). □

In conclusion we mention that the above derivation applies without significant changes to the analysis of the fixed gain version of the LMS method given by

dh_t/dt = λ(A_t h_t + b_t),  (19)

where λ is a fixed, small positive number. The initial condition h_0 is assumed to have finite moments of all orders.

Theorem 2. Under Conditions 1 and 2 we have, for any 1 ≤ m ≤ Cλ^{−1} with some constant C depending only on M_∞(z) and Γ_∞(z), the inequality

E^{1/m}|h_t − g*|^m = O(λ^{1/2}).
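The λ^{1/2} scaling in Theorem 2 can be probed empirically. The sketch below is our own experiment on the discrete-time analogue h_{k+1} = h_k + λ(A_k h_k + b_k) of (19): if the steady-state error scales like λ^{1/2}, a 4x larger gain should roughly double it. The data model and constants are assumed, with bounded data in the spirit of Condition 2.

```python
import numpy as np

rng = np.random.default_rng(2)
p = 2
g_true = np.array([0.5, -1.5])   # assumed weights generating the data

def steady_state_error(lam, n=60000):
    """Average |h_k - g*| over the second half of a fixed-gain LMS run."""
    h = np.zeros(p)
    errs = []
    for k in range(n):
        X = np.sqrt(3.0) * rng.uniform(-1.0, 1.0, p)   # bounded inputs
        a = X @ g_true + 0.5 * rng.uniform(-1.0, 1.0)  # bounded noise
        h = h + lam * X * (a - X @ h)                  # A_k h + b_k = X_k (a_k - X_k^T h)
        if k > n // 2:
            errs.append(np.linalg.norm(h - g_true))
    return float(np.mean(errs))

e_big, e_small = steady_state_error(0.04), steady_state_error(0.01)
print(e_big / e_small)
```

The observed error ratio is close to sqrt(0.04/0.01) = 2, consistent with the O(λ^{1/2}) bound.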

Acknowledgements

The author would like to express his thanks to A. Heunis for inspiring discussions on the subject and to M. Kouritzin, who discovered an error in the original, 1991 manuscript of this paper.

References

[1] L. Gerencsér, On a class of mixing processes, Stochastics 26 (1989) 165-191.

[2] L. Gerencsér, On the martingale approximation of the estimation error of ARMA parameters, Systems Control Lett. 15 (1990) 417-423.

[3] L. Gerencsér, Almost sure exponential stability of random linear differential equations, Stochastics 36 (1991) 411-416.

[4] L. Gerencsér, Rate of convergence of recursive estimators, SIAM J. Control Optim. 30 (1992) 1200-1226.

[5] L. Gerencsér, On Rissanen's stochastic complexity for stationary ARMA processes, J. Statist. Planning Inference 41 (1994) 303-325.

[6] A. Heunis, Rates of convergence for an adaptive filtering algorithm driven by stationary dependent data, SIAM J. Control Optim. 32 (1994) 116-140.

[7] L. Ljung and T. Söderström, Theory and Practice of Recursive Identification (MIT Press, Cambridge, MA, 1983).

[8] M. Watanabe, The 2r-th mean convergence of adaptive filters with stationary dependent random variables, IEEE Trans. Inform. Theory IT-30 (1984) 134-140.