
Recursive density estimation under dependence


IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 35, NO. 5, SEPTEMBER 1989 1103


Recursive Density Estimation Under Dependence

LANH TAT TRAN

Abstract: Recursive estimators of the density of weakly dependent random variables are studied under certain absolute regularity and strong mixing conditions. Uniform strong consistency of the density estimators is established, and their rates of convergence are obtained.

I. INTRODUCTION

Nonparametric estimation of a probability density $f(x)$ is an interesting problem in statistical inference and has an important role in communication theory and pattern recognition [8], [19]. There is an extensive literature dealing with density estimation when the observations are independent. The reader is referred to Wegman [25] for a review. The purpose of this correspondence is to investigate recursive density estimators when the observations are dependent. For a bibliography and relevant papers, see [6], [7], [14]-[18], [22], [23], [29], [30].

Let $X_t$, $t = \ldots, -1, 0, 1, \ldots$ be a strictly stationary sequence of random variables defined on a probability space $(\Omega, \mathcal{F}, P)$. Let $M_{-\infty}^{0}$ and $M_{n}^{\infty}$ denote, respectively, the $\sigma$-fields generated by $X(t)$, $t \le 0$, and by $X(t)$, $t \ge n$. Then $X(t)$ is absolutely regular if

$$\beta(n) = E\Big\{ \sup_{A \in M_{n}^{\infty}} \big| P(A \mid M_{-\infty}^{0}) - P(A) \big| \Big\} \downarrow 0 \quad \text{as } n \to \infty.$$

Let $\alpha(n) = \sup\{ |P(A \cap B) - P(A)P(B)| : A \in M_{-\infty}^{0},\ B \in M_{n}^{\infty} \}$. If $\alpha(n) \to 0$ as $n \to \infty$, then $X(t)$ is said to satisfy the strong mixing condition. Throughout the correspondence we assume that $X(t)$ satisfies either the absolutely regular condition or the strong mixing condition. The absolutely regular condition is stronger than the strong mixing condition. Some results for absolutely regular processes do not hold for strong mixing ones, as can be seen from Berbee [2, p. 104]. A large class of stochastic processes is known to be strong mixing. Chanda [4], Gorodetskii [9], and Withers [28] have obtained various conditions for linear processes to be strong mixing. For more information on absolutely regular or strong mixing processes, see Yoshihara [31], Rosenblatt [21], or Ibragimov [13].

Manuscript received June 3, 1987; revised December 3, 1988. The author was with the Department of Statistics, University of Pennsylvania, Philadelphia, PA. He is now with the Department of Statistics, Indiana University, Bloomington, IN 47401.

IEEE Log Number 8930781.

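Let $f(x)$ be the density of $X_t$. As an estimator of $f(x)$ we shall consider

$$\hat f_n(x) = \frac{1}{n} \sum_{j=1}^{n} \frac{1}{b_j} K\Big( \frac{x - X_j}{b_j} \Big),$$

which can be computed recursively by

$$\hat f_n(x) = \frac{n-1}{n}\, \hat f_{n-1}(x) + \frac{1}{n b_n} K\Big( \frac{x - X_n}{b_n} \Big),$$

where $K$ is a kernel and $\{b_n\}$ is a sequence of bandwidths, both specified in Section II. This property is particularly useful in large sample sizes since $\hat f_n(x)$ can be updated easily with each additional observation.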
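The recursive update can be illustrated with a short sketch. It is not taken from the correspondence: the Gaussian kernel, the bandwidth choice $b_n = Cn^{-\gamma}$ with $C = 1$ and $\gamma = 0.2$, and all function and class names are assumptions made for illustration only.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used here as an illustrative kernel K."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def bandwidth(n, c=1.0, gamma=0.2):
    """Assumed bandwidth sequence b_n = c * n**(-gamma)."""
    return c * n ** (-gamma)


class RecursiveDensityEstimator:
    """Keeps the estimate on a fixed grid and updates it one observation at a time."""

    def __init__(self, grid, kernel=gaussian_kernel, c=1.0, gamma=0.2):
        self.grid = list(grid)
        self.kernel = kernel
        self.c, self.gamma = c, gamma
        self.n = 0
        self.values = [0.0] * len(self.grid)

    def update(self, x_new):
        # f_n(x) = ((n-1)/n) f_{n-1}(x) + K((x - X_n)/b_n) / (n b_n)
        self.n += 1
        n = self.n
        b_n = bandwidth(n, self.c, self.gamma)
        self.values = [
            (n - 1) / n * f_old + self.kernel((x - x_new) / b_n) / (n * b_n)
            for x, f_old in zip(self.grid, self.values)
        ]
        return self.values


def batch_estimate(samples, x, kernel=gaussian_kernel, c=1.0, gamma=0.2):
    """Direct evaluation of (1/n) * sum_j K((x - X_j)/b_j) / b_j at one point x."""
    total = 0.0
    for j, x_j in enumerate(samples, start=1):
        b_j = bandwidth(j, c, gamma)
        total += kernel((x - x_j) / b_j) / b_j
    return total / len(samples)
```

Feeding observations to `update` one at a time should reproduce `batch_estimate` at every grid point up to rounding, while storing only the current grid values rather than the whole sample.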

The estimator $\hat f_n$ does not achieve the smallest asymptotic variance within the class of recursive estimators. Deheuvels [5] introduced an alternative recursive estimator, denoted here by $\hat f_n^*(x)$. It can be shown that $\hat f_n^*(x)$ has smaller asymptotic variance than $\hat f_n(x)$ in the independent case. However, asymptotically $\hat f_n(x)$ has smaller mean square error than $\hat f_n^*(x)$ (see Wertz [27, p. 286]).

In the independent case $\hat f_n(x)$ has been thoroughly examined in Wegman and Davies [26]. In the dependent case quadratic mean convergence and asymptotic normality of these recursive estimators have been obtained by Masry [15] under various assumptions on the dependence of $X_t$. Strong pointwise consistency of $\hat f_n(x)$ has been proved in Gyorfi [10] but no rates of convergence are given. Takahata [24] and Masry and Gyorfi [17] obtained sharp almost sure rates of convergence for the estimator $\hat f_n(x)$ of $f(x)$ for the class of asymptotically uncorrelated processes, the definition of which can be found in [17]. The class of asymptotically uncorrelated processes is smaller than the class of strong mixing processes considered in Section IV. Under various conditions on the maximal correlation, the bandwidth $b_n$ and the kernel $K$, Masry and Gyorfi [17] established an almost sure rate for $\hat f_n(x) - E\hat f_n(x)$ that involves some $\delta > 0$. To compare this result with others later on, we will consider the popular case where $b_n = Cn^{-\gamma}$ for simplicity. Then

$$\hat f_n(x) - E\hat f_n(x) = o\big( \{ (\log n)^{1/2} (\log_2 n)^{(1+\delta)/2} n^{-(1-\gamma)/2} \} \big) \quad \text{a.s.} \tag{1.1}$$

Recently, Masry [16] established sharp rates of almost sure convergence for the estimator $\hat f_n(x)$ of $f(x)$ for vector-valued stationary strong mixing processes under weak assumptions on the strong mixing condition, bandwidth, and kernel $K$. In the univariate case his results show that

$$\hat f_n(x) - E\hat f_n(x) = o\big( \{ (\log n)^{1/2} (\log_2 n)^{(1+\delta)/2} n^{-1/2} b_n^{-1+1/r} \} \big) \tag{1.2}$$

for any $r > 2$ and some $\delta > 0$. He also raised the question as to whether his results can be further improved. Theorems 3.1 and 4.1 show that the rate of convergence in (1.2) can be improved under certain conditions. For $b_n = Cn^{-\gamma}$, we get from (1.2)

$$\hat f_n(x) - E\hat f_n(x) = o\big( \{ (\log n)^{1/2} (\log_2 n)^{(1+\delta)/2} n^{-1/2+\gamma-\gamma/r} \} \big). \tag{1.3}$$

0018-9448/89/0900-1103$01.00 © 1989 IEEE

This correspondence is concerned more with the almost sure uniform consistency of the sequence $\hat f_n(x)$ and its rate of convergence than with pointwise convergence. Since parameter estimation in time series analysis often is carried out under the Gaussian assumption, it is useful to check whether or not the density of a time series is Gaussian or nearly so. An estimated probability density that looks heavy-tailed suggests that Gaussian estimators are possibly not efficient. In this situation the interest is often in the overall shape of the density.

The method of proof in this correspondence is based on approximations of absolutely regular and strong mixing random variables (RV's) by independent RV's and is different from the method used in [16] and [17], where the theory of mixingales is employed.

The assumptions used herein are stated in Section II. Assumption 1 is not required in [16] or [17]. However, large classes of processes satisfy this assumption. The results can be used for density estimation in autoregressive moving average models. We also assume a Lipschitz condition on the kernel $K$ while Masry assumes that $K$ has an integrable radial majorant [16].

In situations where the observations are assumed to be independent, it is of interest to study the sensitivity of density estimators to departures from this assumption, e.g., whether the limit theorems continue to hold under departures from the independence assumption. For some motivation on this point, see Hart [11]. In the independent case and under some weak assumptions on the kernel function $K(x)$ and on the density function $f(x)$, Wegman and Davies [26] showed that $\hat f_n(x)$ converges pointwise almost surely at a rate of order $O((\log\log n)^{1/2} n^{-(1-\gamma)/2})$ if the bandwidth is $Cn^{-\gamma}$. Under similar assumptions, our results show that $\hat f_n(x)$ converges uniformly on compact sets to $f(x)$ with rates of convergence of order $O((\log n)^{1/2} n^{-(1-\gamma)/2})$ almost surely when the observations satisfy various absolutely regular or strong mixing conditions. Thus, in some sense $\hat f_n(x)$ is quite robust under certain departures from the independence assumption.

Our main results show that for $b_n = Cn^{-\gamma}$,

$$\hat f_n(x) - E\hat f_n(x) = O\big( (\log n)^{1/2} n^{-(1-\gamma)/2} \big) \quad \text{a.s.} \tag{1.4}$$

under various assumptions. The rate of convergence in (1.4) is faster than the rates in (1.1) and (1.3) by a factor of $(\log_2 n)^{(1+\delta)/2}$ and $(\log_2 n)^{(1+\delta)/2} n^{\gamma/2-\gamma/r}$, respectively, where $\delta > 0$ and $r > 2$. Note, however, that Assumption 2.1 is required here. This assumption enables us to utilize a useful result of [15, Lemma 2.1], which is crucial to the proofs of our results. The rate of convergence in (1.2) can thus be improved under certain conditions.
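To make the comparison of (1.1), (1.3), and (1.4) concrete, the following sketch evaluates the three rate sequences for given $n$, $\gamma$, $\delta$, and $r$. It is only a numerical illustration under the assumption that $\log_2 n$ denotes the iterated logarithm $\log\log n$; the function names are hypothetical, and constants hidden in the $O$ and $o$ symbols are ignored.

```python
import math


def rate_1_1(n, gamma, delta):
    """Rate sequence in (1.1): (log n)^(1/2) (log log n)^((1+delta)/2) n^(-(1-gamma)/2)."""
    return (math.sqrt(math.log(n))
            * math.log(math.log(n)) ** ((1 + delta) / 2)
            * n ** (-(1 - gamma) / 2))


def rate_1_3(n, gamma, delta, r):
    """Rate sequence in (1.3): same logarithmic factors times n^(-1/2 + gamma - gamma/r)."""
    return (math.sqrt(math.log(n))
            * math.log(math.log(n)) ** ((1 + delta) / 2)
            * n ** (-0.5 + gamma - gamma / r))


def rate_1_4(n, gamma):
    """Rate sequence in (1.4): (log n)^(1/2) n^(-(1-gamma)/2)."""
    return math.sqrt(math.log(n)) * n ** (-(1 - gamma) / 2)
```

The ratios `rate_1_1 / rate_1_4` and `rate_1_3 / rate_1_4` reduce to the two factors quoted in the text, so the gap between (1.3) and (1.4) grows polynomially in $n$ whenever $r > 2$.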

To achieve the rate of convergence in (1.4), we require that $\beta(n) = O(n^{-\nu})$ for $\nu > 5$ and $\alpha(n) = O(n^{-\nu})$ for $\nu > 11/2$. Masry and Gyorfi [17] obtained the rate in (1.1) for asymptotically uncorrelated processes requiring only that the maximal correlation coefficient $\rho(n) = O(n^{-\nu})$ for $\nu > 1/2$. Conditions (3.24) and (4.5) require that the $b_n$ tend to zero sufficiently slowly. These conditions are similar to those in [17, eq. (2.2)] and [16, eq. (1.7)].

The probability density function $f(x)$ is assumed to be univariate. However, the results of the paper can be generalized straightforwardly to higher dimensional cases. Throughout, $C$ will be used to denote constants whose values are unimportant and may vary from line to line.

II. PRELIMINARIES

The kernel $K(x)$ is assumed to satisfy

$$\sup_{-\infty < x < \infty} |K(x)| < \infty, \qquad \int_{-\infty}^{\infty} |K(x)|\,dx < \infty, \qquad \lim_{|x| \to \infty} |xK(x)| = 0. \tag{2.1}$$

The bandwidth parameter $b_n$ satisfies

$$b_n \to 0 \quad \text{and} \quad nb_n \to \infty \quad \text{as } n \to \infty. \tag{2.2}$$

Assumption 1: The joint probability density $f(x, y, k)$ of the random variables $X_t$ and $X_{t+k}$ exists and satisfies $|f(x, y, k) - f(x)f(y)| \le M < \infty$ for all $x$, $y$ and $k \ge 1$.

Assumption 1 has been used by Masry [15]. We will give an example of a process satisfying Assumption 1 later.

Assumption 2: The bandwidth parameters $\{b_n\}$ satisfy

$$\frac{1}{n} \sum_{j=1}^{n} (b_n/b_j)^r \to \theta_r \tag{2.3}$$

as $n \to \infty$ for $1 \le r < 2$.

Let $K_n(x)$ be the averaging kernel defined by $K_n(x) = (1/b_n)K(x/b_n)$. Then

$$\hat f_n(x) = \frac{1}{n} \sum_{j=1}^{n} K_j(x - X_j).$$
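As a worked instance of Assumption 2, consider the bandwidth choice $b_n = Cn^{-\gamma}$ with $0 < \gamma < 1$ used in the Introduction. The limit $\theta_r$ in (2.3) can then be computed as a Riemann sum; the derivation below is an illustration under that assumed choice and is not part of the original text.

```latex
\[
\frac{1}{n}\sum_{j=1}^{n}\Big(\frac{b_n}{b_j}\Big)^{r}
  = \frac{1}{n}\sum_{j=1}^{n}\Big(\frac{j}{n}\Big)^{\gamma r}
  \;\longrightarrow\; \int_{0}^{1} u^{\gamma r}\,du
  = \frac{1}{1+\gamma r}
  \qquad (n \to \infty),
\]
so that, for this bandwidth sequence, $\theta_r = 1/(1+\gamma r)$ and in particular
$\theta_1 = 1/(1+\gamma)$.
```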

Lemma 2.1: Assume that $X_t$ is a strictly stationary strong mixing process satisfying $\alpha(n) = O(n^{-\nu})$ for some $\nu > 2$. Let $K$ and $\{b_n\}$ satisfy (2.1), (2.2). Suppose Assumptions 1 and 2 hold and $f(x)$ is continuous on a compact set $B$. Let

$$U_n(x) = n^{-2} \sum_{j=1}^{n} \operatorname{var}\{ K_j(x - X_j) \} \tag{2.4}$$

$$V_n(x) = n^{-2} \sum_{i=1}^{n} \sum_{\substack{j=1 \\ i \ne j}}^{n} \big| \operatorname{cov}\{ K_i(x - X_i), K_j(x - X_j) \} \big|. \tag{2.5}$$

Then $nb_nU_n(x) \to \theta_1 f(x) \int_{-\infty}^{\infty} K^2(u)\,du$, and $nb_nV_n(x) \to 0$ uniformly on $B$, with $\theta_r$ as defined in (2.3).

The proof of this lemma can be obtained by a slight variation of the proof of [15, Theorem 3].

Example 2.1: We now give a specific example when Assumption 1 holds. Let $X_t$ be a stationary autoregressive process of order 1, i.e., $X_t = \theta X_{t-1} + e_t$ where $|\theta| < 1$. Assume the $e_t$ are independent identically distributed (i.i.d.) and each $e_t$ has a standard Cauchy density. Then $e_t$ has characteristic function $\varphi(u) = \exp(-|u|)$ and $X_t$ has characteristic function $\exp(-|u|/(1-\theta))$. Now, say,

$$X_{t+k} = \theta^k X_t + \theta^{k-1} e_{t+1} + \cdots + \theta e_{t+k-1} + e_{t+k} = \theta^k X_t + Z.$$

Clearly, $Z$ has a Cauchy density with characteristic function $\exp(-|u|(1-\theta^k)/(1-\theta))$. The joint density of $X_t$ and $X_{t+k}$ is $f(x_t, x_{t+k}, k) = f(x_t) f_Z(x_{t+k} - \theta^k x_t)$. A Cauchy density symmetric about zero takes on its maximum value at zero. Thus $f(x_t, x_{t+k}, k) \le (1-\theta)^2/(\pi^2(1-\theta^k))$, which is bounded uniformly in $k$ because $1 - \theta^k$ is bounded away from zero for all $k$. It is now easy to see that Assumption 1 is satisfied. One can choose $M = (1-\theta)^2/(\pi^2(1-|\theta|)) + (1-\theta)^2/\pi^2$, for example. In this case $X_t$ is strong mixing with $\alpha(n) = O(e^{-sn})$ for some $s > 0$ (see Pham and Tran [20]).

Assumption 3: $\{b_n\}$ is a monotone sequence satisfying $b_{n+1} \le b_n$.

Assumption 4: $K(x)$ is Lipschitz with order $\delta$, i.e., $|K(x) - K(y)| \le |x - y|^{\delta}$ for all $x$, $y$ and some $\delta > 0$.
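Returning to Example 2.1, the sketch below simulates the Cauchy autoregression and evaluates the constant $M$ suggested there. It is an illustration only: the function names, the burn-in length, and the inverse-transform sampler for Cauchy innovations are assumptions, and the median of $|X_t|$ is used as a simple check because a symmetric Cauchy variable with scale $1/(1-\theta)$ (for $0 < \theta < 1$) has that scale as the median of its absolute value.

```python
import math
import random


def simulate_ar1_cauchy(n, theta, burn_in=1000, seed=0):
    """Simulate X_t = theta * X_{t-1} + e_t with standard Cauchy innovations e_t."""
    rng = random.Random(seed)
    x = 0.0
    path = []
    for t in range(n + burn_in):
        # Inverse-transform sampling: tan(pi * (U - 1/2)) is standard Cauchy.
        e_t = math.tan(math.pi * (rng.random() - 0.5))
        x = theta * x + e_t
        if t >= burn_in:  # discard the start-up segment
            path.append(x)
    return path


def assumption1_bound(theta):
    """M = (1-theta)^2 / (pi^2 (1-|theta|)) + (1-theta)^2 / pi^2 from Example 2.1."""
    a = (1 - theta) ** 2 / math.pi ** 2
    return a / (1 - abs(theta)) + a


def median_abs(values):
    """Sample median of |values|."""
    s = sorted(abs(v) for v in values)
    m = len(s)
    return s[m // 2] if m % 2 else 0.5 * (s[m // 2 - 1] + s[m // 2])
```

For $\theta = 0.5$ the stationary marginal has scale $1/(1-\theta) = 2$, so `median_abs(simulate_ar1_cauchy(50000, 0.5))` should be close to 2, up to sampling error.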


Assumption 5: $K$ has a Fourier transform $K^*$ so that $K^*(u) = \int_{-\infty}^{\infty} e^{-iuy} K(y)\,dy$ and that for some $r$, $\lim_{u \to 0} \{ [1 - K^*(u)]/|u|^r \} = k_r$ is finite and that $f^{(r)}(x)$ exists and is continuous on a compact set $B$.

Assumption 6: Both $nb_n^r \to \infty$ and $(nb_n^r)^{-1} \sum_{j=1}^{n} b_j^r$ converges to $\gamma_r$.

Assumptions 5 and 6 are similar to conditions used by Wegman and Davies [26]. Note that these assumptions are only used in connection with the convergence of $E\hat f_n(x)$ to $f(x)$.

Lemma 2.2: Let $K$ and $\{b_n\}$ satisfy (2.1), (2.2). Suppose also that Assumptions 5 and 6 hold. Then $(E\hat f_n(x) - f(x))/b_n^r \to \gamma_r k_r f^{(r)}(x)$ uniformly on $B$.

Lemma 2.2 follows by an analysis of [26, Theorem 1].

III. UNIFORM CONSISTENCY OF $\hat f_n(x)$ UNDER ABSOLUTE REGULARITY

Throughout this section we assume that $\{X_t\}$ satisfies the absolute regularity condition.

Lemma 3.1: Suppose $\{X_t\}$ satisfies the conditions of Lemma 2.1 except that $X_t$ is absolutely regular with $\beta(n) = O(n^{-\nu})$ for some $\nu > 2$. Suppose Assumption 3 holds and that $nb_n(\log n)^{-1} \to \infty$. Let $\varphi(n)$ be a function increasing to $\infty$ arbitrarily slowly and

$$\varepsilon_n = \eta \big( \log n\,(nb_n)^{-1} \big)^{1/2} \tag{3.1}$$

where $\eta > 0$ is an arbitrary constant, and let

$$\lambda_n = (nb_n \log n)^{1/2}. \tag{3.2}$$

Then $\sup_{x \in B} P[\,|\hat f_n(x) - E\hat f_n(x)| > \varepsilon_n\,]$ satisfies the bound (3.3) for sufficiently large $n$. Here $[x]$ denotes the greatest integer less than or equal to $x$.

Proof: a) Let $\hat f_n(x) = (1/n) \sum_{j=1}^{n} K_j(x - X_j)$ and $\mu_j = E[K_j(x - X_j)]$. Then

$$\hat f_n(x) - E\hat f_n(x) = \frac{1}{n} \sum_{j=1}^{n} [K_j(x - X_j) - \mu_j]. \tag{3.4}$$

Let

$$p = p(n) = [nb_n/(\lambda_n \varphi(n))]. \tag{3.5}$$

Define $\eta_i(x) = K_i(x - X_i) - \mu_i$ and $S(n, x) = (1/n) \sum_{i=1}^{n} \eta_i(x)$. If $n = 2pq$ for some integer valued function $q = q(n)$, then $S(n, x)$ can be written as $S(n, x) = S(n, x, 1) + S(n, x, 2)$, where

$$S(n, x, 1) = \sum_{j=1}^{q} V(n, x, 2j-1), \qquad S(n, x, 2) = \sum_{j=1}^{q} V(n, x, 2j) \tag{3.6}$$

with

$$V(n, x, j) = \frac{1}{n} \sum_{i=(j-1)p+1}^{jp} \eta_i(x), \qquad j = 1, \ldots, 2q.$$

If it is not the case that $n = 2pq$, then the last blocks of $S(n, x, 1)$ and $S(n, x, 2)$ can be shorter than $p$ but this does not affect the proof. By [34, lemma 3.1],

$$P[\,|S(n, x, 1)| > \varepsilon_n\,] \le P\Big[\,\Big| \sum_{j=1}^{q} V_j^* \Big| > \varepsilon_n \Big] + 4q\beta(p) \tag{3.7}$$

where the $V_j^*$ are independent random variables such that $V_j^*$ has the same distribution as $V(n, x, 2j-1)$. We now follow an argument similar to Takahata [24] to obtain an upper bound for $P[\,|\sum_{j=1}^{q} V_j^*| > \varepsilon_n\,]$. Since $\sup_{-\infty < x < \infty} |K(x)| < \infty$,

$$|V(n, x, j)| \le \frac{1}{n} \sum_{i=(j-1)p+1}^{jp} (1/b_i) \big| K((x - X_i)/b_i) - EK((x - X_i)/b_i) \big| \le \frac{C}{n} \sum_{i=(j-1)p+1}^{jp} (1/b_i).$$

By Assumption 3, we have $b_n/b_i \le 1$ for $(j-1)p + 1 \le i \le jp$. Thus

$$|\lambda_n V_j^*| \le C(\lambda_n/(nb_n)) \sum_{i=(j-1)p+1}^{jp} (b_n/b_i) \le C\lambda_n p/(nb_n) \quad \text{a.s.,} \tag{3.9}$$

which tends to zero. Thus $|\lambda_n V_j^*| \le 1/2$ for large $n$. Applying Bernstein's inequality (see Hoeffding [12] or Bennett [1]) separately to the summands $\sum V_j^*$ and $\sum -V_j^*$ and following an argument as in [24, p. 15], we obtain the bound (3.10) for sufficiently large $n$. A simple computation shows that $\sum_{j=1}^{q} E(V_j^*)^2 \le U_n(x) + V_n(x)$, where $U_n(x)$ and $V_n(x)$ are as defined in (2.4), (2.5). By Lemma 2.1

$$\lim_{n \to \infty} nb_n \sum_{j=1}^{q} E(V_j^*)^2 \le C \quad \text{uniformly on } B. \tag{3.11}$$

Let $a > 0$. From (3.7), (3.10) and (3.11) it follows that for large $n$

$$P[\,|S(n, x, 1)| > \varepsilon_n\,] \le 2\exp[(-\eta + C)\log n] + Cnp^{-1}\beta(p) \tag{3.12}$$

by taking $\eta$ sufficiently large. Note that $C$ is independent of $x \in B$. The same upper bound in (3.12) holds for $P[\,|S(n, x, 2)| > \varepsilon_n\,]$ by a similar argument. Finally, (3.3) follows from (3.4), (3.6), and (3.12).
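The blocking device in the proof can be sketched in a few lines. This is an illustration rather than part of the argument: the function names are hypothetical, the guard `max(1, ...)` on the block length is an added assumption to avoid empty blocks for small $n$, and the last block is simply allowed to be shorter than $p$, as the proof permits.

```python
import math


def block_length(n, b_n, phi_n):
    """Block length p = [n b_n / (lambda_n phi(n))] with lambda_n = (n b_n log n)^(1/2)."""
    lam_n = math.sqrt(n * b_n * math.log(n))
    # max(1, ...) is an added safeguard, not part of (3.5).
    return max(1, math.floor(n * b_n / (lam_n * phi_n)))


def blocked_sums(eta, p):
    """Split (1/n) * sum(eta) into alternating block sums.

    Returns the list of block contributions V(n, x, j), j = 1, 2, ..., together
    with S1 (sum over odd-numbered blocks) and S2 (sum over even-numbered blocks).
    """
    n = len(eta)
    blocks = [sum(eta[start:start + p]) / n for start in range(0, n, p)]
    s1 = sum(blocks[0::2])  # blocks 1, 3, 5, ... in the 1-based numbering of the text
    s2 = sum(blocks[1::2])  # blocks 2, 4, 6, ...
    return blocks, s1, s2
```

By construction `s1 + s2` equals the full normalized sum, and consecutive blocks inside `s1` (or inside `s2`) are separated by a gap of length $p$, which is what allows them to be replaced by independent copies at the cost of a mixing term.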

Lemma 3.2: Let $X_t$ be an absolutely regular, strictly stationary process satisfying the conditions of Lemma 3.1. Suppose in addition that Assumption 4 holds. Let

$$\zeta(n) = n^{3(\delta^{-1}+1)/2}\, b_n^{(\delta^{-1}-1)/2}\, (\log n)^{-(3\delta^{-1}+1)/2}. \tag{3.13}$$

Assume in addition that $b_n$ tends to zero in such a way that condition (3.14) holds for some function $\varphi(n)$ increasing to $\infty$ arbitrarily slowly. Then

$$\sup_{x \in B} |\hat f_n(x) - E\hat f_n(x)| = O\big( (\log n)^{1/2} (nb_n)^{-1/2} \big) \quad \text{a.s.} \tag{3.15}$$

Proof: Let $l = l_n = n^{-(3/(2\delta))-1}\, b_n^{-1/(2\delta)}\, (\log n)^{(3/(2\delta))+1}$. Since $B$ is compact, it can be covered with, say, $U$ intervals $I_j$ having length $l$ such that

$$U \le Cn^{(3/(2\delta))+1}\, b_n^{1/(2\delta)}\, (\log n)^{-(3/(2\delta))-1}. \tag{3.16}$$

Now

$$\sup_{x \in B} |\hat f_n(x) - E\hat f_n(x)| \le \max_{1 \le j \le U} \sup_{x \in I_j} |\hat f_n(x) - \hat f_n(x_j)| + \max_{1 \le j \le U} |\hat f_n(x_j) - E\hat f_n(x_j)| + \max_{1 \le j \le U} \sup_{x \in I_j} |E\hat f_n(x_j) - E\hat f_n(x)|. \tag{3.17}$$

Since $K$ is Lipschitz with exponent $\delta$, it follows that

$$|\hat f_n(x) - \hat f_n(y)| \le (1/n) \sum_{j=1}^{n} (1/b_j)\, |(x - y)/b_j|^{\delta}. \tag{3.18}$$

Note that $b_n \ge C\log n/n$ since $nb_n(\log n)^{-1} \to \infty$. Hence

$$\sup_{x \in I_j} |\hat f_n(x) - \hat f_n(x_j)| \le \frac{1}{n} \sum_{i=1}^{n} (1/b_i)\, |(x - x_j)/b_i|^{\delta} \le Cl^{\delta} n^{1+\delta}/(\log n)^{1+\delta} = C(\log n)^{1/2} (nb_n)^{-1/2}. \tag{3.19}$$

This yields the bound (3.20) for the first and third terms on the right side of (3.17).

Let $\varepsilon_n$ be as defined in (3.1). Since $f(x)$ is continuous on $B$, by Lemma 3.1 and the value of $\lambda_n$ defined in (3.2),

$$P\Big[ \max_{1 \le j \le U} |\hat f_n(x_j) - E\hat f_n(x_j)| > \varepsilon_n \Big] \le Cn^{(3/(2\delta))+1-a}\, b_n^{1/(2\delta)}\, (\log n)^{-(3/(2\delta))-1} + C\zeta(n)\varphi(n)\beta([nb_n/(\lambda_n\varphi(n))]) \tag{3.22}$$

where $\zeta(n)$ is as defined in (3.13). The summation over $n$ of the last term of (3.22) is finite by condition (3.14). Thus for sufficiently large $a$,

$$\sum_{n=1}^{\infty} P\Big[ \max_{1 \le j \le U} |\hat f_n(x_j) - E\hat f_n(x_j)| > \varepsilon_n \Big] < \infty.$$

By the Borel-Cantelli lemma,

$$\max_{1 \le j \le U} |\hat f_n(x_j) - E\hat f_n(x_j)| = O\big( (\log n)^{1/2} (nb_n)^{-1/2} \big) \quad \text{a.s.} \tag{3.23}$$

The lemma follows from (3.17), (3.20), and (3.23).

Theorem 3.1: a) Let $X_t$ be an absolutely regular, strictly stationary process with $\beta(n) = O(n^{-\nu})$ for some $\nu > 5$. Suppose (2.1), (2.2) and Assumptions 1-3 hold and Assumption 4 holds with $\delta > 3/(\nu - 5)$. Assume $b_n$ tends to zero in such a way that

$$n^{-(3\delta^{-1}+5-\nu)/2}\, (\log n)^{(3\delta^{-1}-1-\nu)/2}\, (\log_2 n)^{-(1+\varepsilon)}\, b_n^{(-\delta^{-1}+1+\nu)/2} \to \infty \tag{3.24}$$

for some $\varepsilon > 0$. Then

$$\sup_{x \in B} |\hat f_n(x) - E\hat f_n(x)| = O\big( (\log n)^{1/2} (nb_n)^{-1/2} \big) \quad \text{a.s.} \tag{3.25}$$

b) Suppose in addition that $K$ satisfies Assumptions 5 and 6 and $b_n = Cn^{-\gamma}$ with $\gamma \ge 1/(2r+1)$. Then

$$\sup_{x \in B} |\hat f_n(x) - f(x)| = O\big( (\log n)^{1/2} n^{-(1-\gamma)/2} \big) \quad \text{a.s.} \tag{3.26}$$

Remark 3.1: Note that $-(3\delta^{-1} + 5 - \nu) < (-\delta^{-1} + 1 + \nu)$. Thus for (3.24) to lead to reasonable values of $b_n$ we need $-(3\delta^{-1} + 5 - \nu) > 0$ and subsequently $\nu > 5$ and $\delta > 3/(\nu - 5)$.

Proof: a) Condition (3.24) implies that

$$\zeta(n)\, b_n^{-\nu/2}\, (\varphi(n))^{1+\nu}\, n^{-\nu/2}\, (\log n)^{\nu/2} = o\big( (n \log n\, (\log_2 n)^{1+\varepsilon})^{-1} \big) \tag{3.28}$$

for some increasing function $\varphi(n)$. Thus the summation in (3.14) is finite by (3.28).

b) By Lemma 2.2, $\sup_{x \in B} |\hat f_n(x) - f(x)| = O((\log n)^{1/2} (nb_n)^{-1/2}) + O(b_n^r)$. The result then follows by noting that $b_n^r = O(n^{-\gamma r}) = O(n^{-(1-\gamma)/2})$.

In the important case in which $\beta(n)$ decays to zero exponentially fast, we obtain the following.

Theorem 3.2: a) Let $\{X_t\}$ be a strictly stationary absolutely regular process with $\beta(n) = O(e^{-sn})$ for some $s > 0$ and suppose $nb_n(\log n)^{-3} \to \infty$. If (2.1), (2.2), and Assumptions 1-4 hold, then (3.25) holds. b) Suppose in addition that $K$ satisfies Assumptions 5 and 6 and $b_n = Cn^{-\gamma}$ with $\gamma \ge 1/(2r+1)$. Then

$$\sup_{x \in B} |\hat f_n(x) - f(x)| = O\big( (\log n)^{1/2} n^{-(1-\gamma)/2} \big) \quad \text{a.s.}$$

Proof: We will prove a) only since the proof of b) is the same as the proof of b) of Theorem 3.1. Since $b_n > C(\log n)^3 n^{-1}$, the summation in (3.14) is bounded by

$$C \sum_{n=1}^{\infty} \zeta(n)\varphi(n) \exp\big( -s (\log n/\varphi(n)) (nb_n/(\log n)^3)^{1/2} \big). \tag{3.31}$$

Choose $\varphi(n)$ to be a function increasing to $\infty$ such that $\varphi(n) = o\{ (nb_n(\log n)^{-3})^{1/2} \}$. The summation in (3.31) is then finite.
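The role of the restriction $\gamma \ge 1/(2r+1)$ in part b) of Theorems 3.1 and 3.2 can be made explicit by comparing the bias term $O(b_n^r)$ with the stochastic term. The short derivation below is a worked illustration under the stated choice $b_n = Cn^{-\gamma}$, not a statement taken from the text.

```latex
\[
b_n^{r} = C^{r} n^{-\gamma r}
\quad\text{and}\quad
n^{-\gamma r} \le n^{-(1-\gamma)/2}
\iff \gamma r \ge \frac{1-\gamma}{2}
\iff \gamma\,(2r+1) \ge 1
\iff \gamma \ge \frac{1}{2r+1}.
\]
Hence, for $\gamma \ge 1/(2r+1)$, the bias is dominated by the stochastic term
$(\log n)^{1/2} n^{-(1-\gamma)/2}$, and at the boundary value $\gamma = 1/(2r+1)$
the overall rate becomes $(\log n)^{1/2}\, n^{-r/(2r+1)}$.
```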

IV. THE STRONG MIXING CASE

We will need the following result from Bradley [3].

Lemma 4.1: Suppose $X$ and $Y$ are random variables taking their values on $S$ and $R$ respectively where $S$ is a Borel space; suppose $U$ is a uniform-$[0,1]$ RV independent of $(X, Y)$; and suppose $\xi$ and $\nu$ are positive numbers such that $\xi \le \|Y\|_{\nu} < \infty$. Then there exists a real-valued RV $Y^* = f(X, Y, U)$, where $f$ is a measurable function from $S \times R \times [0,1]$ into $R$, such that

1) $Y^*$ is independent of $X$,
2) the probability distributions of $Y$ and $Y^*$ are identical, and
3) $P(|Y^* - Y| \ge \xi) \le 18 (\|Y\|_{\nu}/\xi)^{\nu/(2\nu+1)} [\alpha(\sigma(X), \sigma(Y))]^{2\nu/(2\nu+1)}$.

Lemma 4.2: Suppose $\{X_t\}$ is a strictly stationary strong mixing stochastic process satisfying the conditions of Lemma 2.1. Suppose Assumption 3 holds and that $nb_n(\log n)^{-1} \to \infty$. Let $\varphi(n)$ be a function increasing to $\infty$ arbitrarily slowly. Let $\varepsilon_n$ and $\lambda_n$ be as defined in (3.1) and (3.2). Let $\tau = \nu/(2\nu+1)$ where $\nu$ is as in Lemma 4.1. Then for sufficiently large $n$

$$\sup_{x \in B} P[\,|\hat f_n(x) - E\hat f_n(x)| \ge \varepsilon_n\,] \le Cn^{-a} + C\varphi(n)\, (n/b_n)^{(\tau+1)/2}\, (\log n)^{(1-\tau)/2}\, \big( \alpha[(\varphi(n))^{-1} (nb_n/\log n)^{1/2}] \big)^{2\tau}.$$

Proof: The argument here is similar to the proof of Lemma 3.1 except that we now use Lemma 4.1 to approximate the RV's $V(n, x, 2j-1)$, $1 \le j \le q$, by independent RV's. By enlarging the probability space if necessary, introduce a sequence $(U_1, U_2, \ldots)$ of independent uniform $[0,1]$ random variables independent of our given sequence $\{V(n, x, 2j-1), 1 \le j \le q\}$. By Lemma 4.1, for each $j$, there exists an RV $W(n, x, 2j-1)$ which is a measurable function of $V(n, x, 1), \ldots, V(n, x, 2j-1), U_j$ such that $W(n, x, 2j-1)$ is independent of the preceding variables $V(n, x, 1), V(n, x, 3), \ldots$, has the same distribution as $V(n, x, 2j-1)$ and satisfies the bound of part 3) of Lemma 4.1, referred to as (4.1).

Let $S(n, x, 1)$ and $\varepsilon_n$ be as defined in (3.6) and (3.1), respectively. Then

$$P[\,|S(n, x, 1)| > \varepsilon_n\,] = P\Big[\,\Big| \sum_{j=1}^{q} \big\{ W(n, x, 2j-1) + V(n, x, 2j-1) - W(n, x, 2j-1) \big\} \Big| > \varepsilon_n \Big]. \tag{4.2}$$

Let $a > 0$ be an arbitrarily large positive number. From the proof of Lemma 3.1,

$$P\Big[\,\Big| \sum_{j=1}^{q} W(n, x, 2j-1) \Big| > \varepsilon_n/2 \Big] \le n^{-a} \tag{4.3}$$

for sufficiently large $n$. By (4.1)-(4.3)

$$\begin{aligned}
P[\,|S(n, x, 1)| > \varepsilon_n\,]
&\le n^{-a} + \sum_{j=1}^{q} P\big[\,|V(n, x, 2j-1) - W(n, x, 2j-1)| > \varepsilon_n/(2q)\,\big] \\
&\le n^{-a} + 18q (2q\varepsilon_n^{-1})^{\tau} \max_{1 \le j \le q} \|V(n, x, 2j-1)\|_{\nu}^{\tau}\, [\alpha(\sigma(X), \sigma(Y))]^{2\tau} \\
&\le n^{-a} + Cq (2q\varepsilon_n^{-1})^{\tau} (p/(nb_n))^{\tau} [\alpha(p)]^{2\tau} \\
&\le n^{-a} + Cq (2q\lambda_n^{-1}\varepsilon_n^{-1}\varphi(n)^{-1})^{\tau} \big( \alpha[(\varphi(n))^{-1} (nb_n/\log n)^{1/2}] \big)^{2\tau}.
\end{aligned}$$

The lemma follows by a simple computation using $q = n/2p$ with $p$ as in (3.5).

Lemma 4.3: Suppose $X_t$ is a strictly stationary strong mixing process satisfying the conditions of Lemma 4.2. Assume that Assumption 4 holds and that $b_n$ tends to zero in such a way that

$$\sum_{n=1}^{\infty} J(n)\varphi(n) \big( \alpha[(\varphi(n))^{-1} (nb_n/\log n)^{1/2}] \big)^{2\tau} < \infty \tag{4.4}$$

for some function $\varphi(n)$ increasing to $\infty$ arbitrarily slowly, and some $0 < \tau < 1/2$. Here $J(n)$ denotes

$$J(n) = n^{(3/(2\delta))+1+(\tau+1)/2}\, b_n^{(1/(2\delta))-(\tau+1)/2}\, (\log n)^{-(3/(2\delta))-1+(1-\tau)/2}.$$

Then

$$\sup_{x \in B} |\hat f_n(x) - E\hat f_n(x)| = O\big( (\log n)^{1/2} (nb_n)^{-1/2} \big) \quad \text{a.s.}$$

Proof: The argument here is similar to the proof of Lemma 3.2. The proof of (3.20) remains the same. We now show (3.23). Using Lemma 4.2 and following the argument used in (3.22), we obtain the analog of (3.22) with $\zeta(n)\varphi(n)\beta(\cdot)$ replaced by the summand of (4.4), where $J(n)$ is as defined in the statement of the lemma. By the Borel-Cantelli lemma, (3.23) follows.

Theorem 4.1: a) Let $X_t$ be a strong mixing strictly stationary process with $\alpha(n) = O(n^{-\nu})$ for some $\nu > 11/2$. Suppose (2.1) and (2.2) hold and Assumptions 1 through 3 hold. Suppose Assumption 4 holds for $\delta > 6/(2\nu - 11)$. Let $\varepsilon > 0$ and let $r(n)$ be the factor appearing in (4.5). Suppose $b_n$ tends to zero in such a way that

$$n^{(2\nu\tau-\tau-5)/2-(3/(2\delta))}\, (r(n))^{-1}\, b_n^{-(1/(2\delta))+(\tau+1+2\nu\tau)/2} \to \infty, \tag{4.5}$$

for some $0 < \tau < 1/2$. Then

$$\sup_{x \in B} |\hat f_n(x) - E\hat f_n(x)| = O\big( (\log n)^{1/2} (nb_n)^{-1/2} \big) \quad \text{a.s.}$$

b) Suppose in addition that $K$ satisfies Assumptions 5 and 6 and $b_n = Cn^{-\gamma}$ with $\gamma \ge 1/(2r+1)$. Then

$$\sup_{x \in B} |\hat f_n(x) - f(x)| = O\big( (\log n)^{1/2} n^{-(1-\gamma)/2} \big) \quad \text{a.s.}$$

Proof: We will just prove a) since the proof of b) is the same as the proof of b) of Theorem 3.1. It is easily seen that for (4.4) to lead to reasonable values of $b_n$ we need $\nu > 11/2$ and $\delta > 6/(2\nu - 11)$. Observe that (4.5) implies that

$$n^{-(2\nu\tau-\tau-5)/2+3/(2\delta)}\, r(n)\, b_n^{(1/(2\delta))-(\tau+1+2\nu\tau)/2}\, \varphi(n)^{1+2\nu\tau} = o(1) \tag{4.6}$$

for some increasing function $\varphi(n)$ and $0 < \tau < 1/2$. Utilizing (4.6) and by a computation similar to that of (3.29) and (3.30), we obtain that the summand in (4.4) is $o\big( (n \log n\, (\log_2 n)^{1+\varepsilon})^{-1} \big)$. Hence condition (4.4) is satisfied.

Using (4.4), an analog of Theorem 3.2 can be obtained for stationary strong mixing processes. It is not difficult to see that Theorem 3.2 continues to hold if $X_t$ is a strictly stationary strong mixing process with $\alpha(n) = O(e^{-sn})$ for some $s > 0$.

ACKNOWLEDGMENT

I thank the referees for many useful suggestions and for pointing out a number of relevant references. I also thank Richard Bradley for drawing my attention to his paper. I thank Professor David K. Hildebrand and members of the Department of Statistics, University of Pennsylvania, for their hospitality.

REFERENCES

[1] G. Bennett, "Probability inequalities for the sum of independent random variables," J. Amer. Statist. Assoc., vol. 57, pp. 33-45, 1962.

[2] H. C. P. Berbee, "Random walks with stationary increments and renewal theory," Math. Center, Amsterdam, The Netherlands, Tract 112, 1979.

[3] R. C. Bradley, "Approximation theorems for strongly mixing random variables," Michigan Math. J., vol. 30, pp. 69-81, 1982.

[4] K. C. Chanda, "Strong mixing properties of linear stochastic processes," J. Appl. Prob., vol. 14, pp. 67-77, 1974.

[5] P. Deheuvels, "Sur l'estimation séquentielle de la densité," C. R. Acad. Sci. Paris Série A, vol. 276, pp. 1119-1121, 1973.

[6] L. Devroye and L. Gyorfi, Nonparametric Density Estimation: the L1 View. New York: Wiley, 1985.

[7] K. Fukunaga, Introduction to Statistical Pattern Recognition. New York: Academic, 1972.

[8] K. Fukunaga and L. D. Hostetler, "The estimation of the gradient of a density function, with applications in pattern recognition," IEEE Trans. Inform. Theory, vol. IT-21, pp. 32-40, 1975.

[9] V. V. Gorodetskii, "On the strong mixing properties for linear sequences," Theory Probab. Appl., vol. 22, pp. 411-413, 1977.

[10] L. Gyorfi, "Strong consistent density estimate from ergodic sample," J. Multivar. Anal., vol. 11, pp. 81-84, 1981.

[11] J. D. Hart, "Efficiency of a kernel density estimator under an autoregressive dependence model," J. Amer. Statist. Assoc., vol. 79, pp. 110-117, 1984.

[12] W. Hoeffding, "Probability inequalities for sums of bounded random variables," J. Amer. Statist. Assoc., vol. 58, pp. 13-30, 1963.

[13] I. A. Ibragimov, "Some limit theorems for stationary sequences," Theory Probab. Appl., vol. 4, pp. 347-382, 1962.

[14] E. Masry, "Probability density estimation from sampled data," IEEE Trans. Inform. Theory, vol. IT-29, pp. 696-709, 1983.

[15] E. Masry, "Recursive probability density estimation for weakly dependent processes," IEEE Trans. Inform. Theory, vol. IT-32, pp. 254-267, 1986.

[16] E. Masry, "Almost sure convergence of recursive density estimators for stationary mixing processes," Statist. Probab. Lett., vol. 5, pp. 249-254, 1987.

[17] E. Masry and L. Gyorfi, "Strong consistency and rates for recursive probability density estimators of stationary processes," J. Multivariate Anal., vol. 22, pp. 79-93, 1987.

[18] H. T. Nguyen and D. T. Pham, "Nonparametric estimation in diffusion model by discrete sampling," Publ. Inst. Statist. Univ. de Paris, vol. XXVI, pp. 89-109, 1981.

[19] P. Papantoni-Kazakos and D. Kazakos, Nonparametric Methods in Communications. New York: Marcel Dekker, 1977.

[20] D. T. Pham and L. T. Tran, "Some mixing properties of time series models," Stoch. Proc. Appl., vol. 19, pp. 297-303, 1985.

[21] M. Rosenblatt, "A central limit theorem and a strong mixing condition," Proc. Nat. Acad. Sci., vol. 42, pp. 43-47, 1956.

[22] M. Rosenblatt, "Density estimates and Markov sequences," in Nonparametric Techniques in Statistical Inference, M. Puri, Ed. London: Cambridge Univ. Press, 1970, pp. 199-210.

[23] G. Roussas, "Nonparametric estimation of the transition distribution function of a Markov process," Ann. Math. Statist., vol. 40, pp. 1386-1400, 1969.

[24] H. Takahata, "Almost sure convergence of density estimators for weakly dependent stationary processes," Bull. Tokyo Gakugei Univ. (IV), vol. 32, pp. 11-32, 1980.

[25] E. J. Wegman, "Nonparametric probability density estimation: I. A summary of available methods," Technometrics, vol. 14, pp. 533-546, 1972.

[26] E. J. Wegman and H. I. Davies, "Remarks on some recursive estimators of a probability density function," Ann. Statist., vol. 7, pp. 316-327, 1979.

[27] W. Wertz, "Sequential and recursive estimators of the probability density," Statistics, vol. 16, pp. 277-295, 1985.

[28] C. S. Withers, "Central limit theorems for dependent variables I," Z. Wahrsch. Verw. Gebiete, vol. 57, pp. 509-534, 1981.

[29] C. T. Wolverton and T. J. Wagner, "Asymptotically discriminant functions for pattern classification," IEEE Trans. Inform. Theory, vol. IT-15, pp. 258-265, 1969.

[30] H. Yamato, "Sequential estimation of a continuous probability density function and mode," Bull. Math. Statist., vol. 14, pp. 1-12, 1971.

[31] K. Yoshihara, "Probability inequalities for sums of absolutely regular processes and their applications," Z. Wahrsch. Verw. Gebiete, vol. 43, pp. 319-330, 1978.

[32] K. Yoshihara, "Density estimation for samples satisfying a certain absolute regularity condition," J. Statist. Planning Inference, vol. 9, pp. 19-32, 1984.