This article was downloaded by: [University of Auckland Library] On: 30 November 2014, At: 16:48. Publisher: Taylor & Francis. Informa Ltd Registered in England and Wales Registered Number: 1072954. Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK.
Sequential Analysis: Design Methods and Applications
Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/lsqa20
Weak convergence of stochastic approximation processes with random indices
Edward W. Frees
Department of Statistics and School of Business, University of Wisconsin, 1210 West Dayton Street, Madison, Wisconsin 53706
Published online: 29 Mar 2007.
To cite this article: Edward W. Frees (1985) Weak convergence of stochastic approximation processes with random indices, Sequential Analysis: Design Methods and Applications, 4:1-2, 59-82, DOI: 10.1080/07474948508836072
To link to this article: http://dx.doi.org/10.1080/07474948508836072
PLEASE SCROLL DOWN FOR ARTICLE
Taylor & Francis makes every effort to ensure the accuracy of all the information (the "Content") contained in the publications on our platform. However, Taylor & Francis, our agents, and our licensors make no representations or warranties whatsoever as to the accuracy, completeness, or suitability for any purpose of the Content. Any opinions and views expressed in this publication are the opinions and views of the authors, and are not the views of or endorsed by Taylor & Francis. The accuracy of the Content should not be relied upon and should be independently verified with primary sources of information. Taylor and Francis shall not be liable for any losses, actions, claims, proceedings, demands, costs, expenses, damages, and other liabilities whatsoever or howsoever caused arising directly or indirectly in connection with, in relation to or arising out of the use of the Content.

This article may be used for research, teaching, and private study purposes. Any substantial or systematic reproduction, redistribution, reselling, loan, sub-licensing, systematic supply, or distribution in any form to anyone is expressly forbidden. Terms & Conditions of access and use can be found at http://www.tandfonline.com/page/terms-and-conditions
SEQUENTIAL ANALYSIS, 4(1&2), 59-82 (1985)
WEAK CONVERGENCE OF STOCHASTIC APPROXIMATION PROCESSES WITH RANDOM INDICES
Edward W. Frees
Department of Statistics and School of Business 1210 West Dayton Street University of Wisconsin Madison, Wisconsin 53706
Keywords and Phrases: strong approximations, mode estimation, stopping times.
ABSTRACT
Conditions are given for weak convergence through random
indices of a general stochastic approximation process which
includes the Robbins-Monro and Kiefer-Wolfowitz processes. For a
particular index, a sequential fixed-width bounded length confidence interval for the parameter being estimated is established.
As an example, an optimal recursive estimator and confidence
interval for the mode of a distribution function is constructed.
§1. INTRODUCTION AND SUMMARY
In a stochastic approximation (SA) procedure, stopping time
variables or stopping rules are important because they give the
researcher a reasonable criterion for halting the procedure with a
certain amount of confidence (see Stroup and Braun (1982), Sielken
(1973) and Farrell (1962)). Moreover, stopping times are useful in
developing confidence regions for parameters of interest. Proof of weak convergence through a general random index gives results for stopping times as simple corollaries.

Copyright © 1985 by Marcel Dekker, Inc. 0747-4946/85/0401-0059$3.50/0
Consider a general SA algorithm, (1.1), for locating the assumed unique root of f(x) = 0, say θ, where f(·) is a measurable function such that f: R → R. In (1.1), B_n and ε_n are random variables and τ is a fixed constant such that 0 < τ < 1/2. Heuristically, one may think of ε_n as the random error at the nth stage due to the fact that f(X_n) can only be measured with error. B_n may be thought of as the bias in the measurement of f(X_n), e.g., if f(·) is a probability density function. The algorithm (1.1) is a special case of the algorithm studied by Kushner (1977) and Ljung (1978). It contains the well-known Robbins-Monro (RM) (1951) and Kiefer-Wolfowitz (KW) (1952) procedures for finding roots and maxima of a regression function, respectively.

Motivation for studying a general algorithm has been given by Ruppert (1982). Ruppert considered a similar algorithm, (1.2), to estimate the (assumed) unique θ such that f(θ) = 0, where 0 < τ < 1/4. As argued by Ruppert, an important reason for considering algorithms of the form (1.2) is that a conventional restrictive assumption in SA, that the errors {ε_n} be martingale differences, need not be made. Ruppert showed that for large n, the estimator X_n can be almost surely (strongly) approximated by a weighted average of ε_1, ..., ε_n, where X_n and the errors ε_n are k-dimensional random vectors. This representation and strong approximations for sums of dependent random variables immediately give several important results about the large sample behavior of X_n. We consider only the case k = 1. We choose a modified version of (1.2) since Ruppert's representation is not directly applicable to statistical problems that have non-zero bias terms (B_n) infinitely often and τ < 1/6, as in the example in §4. The algorithm (1.1) is
in a form more related to the easily accessible algorithm given by
Fabian (1968).
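For readers who want a concrete picture of the Robbins-Monro special case of an algorithm like (1.1), a minimal sketch follows; the regression function, the gain sequence a_n = 1/n, and the noise level are hypothetical choices for illustration, not taken from the article.

```python
import random

def robbins_monro(noisy_f, x1, n_steps):
    """Robbins-Monro iteration X_{n+1} = X_n + a_n * Y_n with gains
    a_n = 1/n, where Y_n is a noisy observation of f(X_n) and f has a
    unique root theta.  Illustrative sketch only."""
    x = x1
    for n in range(1, n_steps + 1):
        x += noisy_f(x) / n
    return x

random.seed(0)
# Toy regression function f(x) = 2*(1 - x) with root theta = 1,
# observed with additive N(0, 0.5^2) error.
theta_hat = robbins_monro(lambda x: 2.0 * (1.0 - x) + random.gauss(0.0, 0.5),
                          x1=0.0, n_steps=20000)
```

With a gain of order 1/n the iterate averages out the observation noise, so theta_hat settles near the root 1.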
After a short preliminary section on notation and assumptions,
results which describe the large sample behavior of the recursive
estimator X_n are given in §3. The main result is weak convergence
via random indices (Theorem 3.2). This result subsumes the earlier
work on stopping rules in SA (see Stroup and Braun (1982) and
references cited therein). A fixed-width bounded length confidence
interval for the parameter 0 is an easy corollary of Theorem 3.2.
This corollary is the result given by Sielken (1973) and McLeish
(1976) for the RM process. Sielken's method was to verify
Anscombe's (1952) uniform continuity in probability condition.
McLeish showed convergence of a randomized version of the RM
process in D[0,1) (the space of right-continuous real functions on
[0,1) having left-hand limits). The methods in this paper are dif-
ferent. Via strong approximation techniques, we prove convergence
of a randomized version of a process derived from (1.1) in D[0,∞),
a more convenient function space for stopping rules.
As an example, §4 shows how to develop a fixed-width bounded
length confidence interval for the mode of a distribution function.
Using SA to estimate the mode was introduced by Burkholder (1956)
under restrictive conditions and more generally by Fritz (1973).
As a preliminary result, we show that our proposed estimator when
suitably standardized converges to a nondegenerate distribution.
This preliminary result is new and should be of independent inter-
est since it gives a method of constructing a mode estimator which
has a rate of convergence similar to the optimal rate of conver-
gence of density estimators (see Müller and Gasser (1979)). The
example also demonstrates why it is important to consider a general
SA algorithm since the mode estimation problem is neither an RM nor
a KW procedure. The final section gives the results of an investi-
gation of finite sample properties of the mode estimator proposed
in §4 via a Monte-Carlo study.
§2. NOTATION AND ASSUMPTIONS

Assume that random elements are defined on a fixed probability
space (Ω, F, P) unless otherwise specified. Endow D[0,∞) with Stone's (1963) extension of Skorokhod's (1956) J₁-topology and use ⇒_W to denote convergence in this topology. Also, use =_D to denote equivalence in distribution. Let [·] be the greatest integer function. Let {X_n} and {Y_n} be sequences of random variables (r.v.). We write X_n = O(Y_n) if there exists a r.v. Z such that |X_n|/|Y_n| ≤ Z almost surely (a.s.) for all n. If |X_n|/|Y_n| → 0 a.s., we write X_n = o(Y_n). Let O_p(·) and o_p(·) be the corresponding symbols for relationships in probability.
The asymptotic coverage probability is fixed at 1 − α, where α ∈ (0, 1/2). We will define a sequence of random intervals {I_n} and a stopping rule N_d so that N_d is the first n such that length(I_n) ≤ 2d and lim_{d→0} P(θ ∈ I_{N_d}) = 1 − α. The basic assumptions about the relationships defined in (1.1) are stated below.
A1. Let η > 0 and G > 1/2 − τ. Assume that f(x) = G(x − θ) + o(|x − θ|^{1+η}).

A2. Let ρ > 0 and B ∈ R. Assume that B_n = B + o(n^{−ρ}).

A3. lim_{n→∞} X_n = θ a.s.

A4. There exists a probability space which contains a sequence {ε_n} and a standard Brownian Motion B on [0,∞) such that

and, for some σ, ε > 0 depending on {ε_n},

A5. Let N_n/m_n = N + o_p(1), where N is a positive random variable, {m_n} are integers going to infinity and {N_n} is a sequence of random variables.

A6. Let {B̂_n}, {σ̂_n} and {Ĝ_n} be sequences of random variables such that B̂_n = B + o(1), σ̂_n = σ + o(1) and Ĝ_n = G + o(1).
Remarks: In A6, we allow estimators of B that do not converge to B as quickly as those in A2. These will be useful in the construction of confidence intervals. The monograph of Philipp and Stout (1975) gives several sets of sufficient conditions so that
the sequence of random variables {ε_n} can be redefined on a richer probability space without changing its distribution so that A4 holds. Thus, our formulation includes stochastic approximation processes with certain types of dependent noise and even nonstationary noise (see Philipp and Stout, 1975, Chapter 8). For notational convenience, we follow Strassen (1967) and adopt the phrase "without loss of generality" when redefining a sequence of random variables on a possibly richer probability space containing a Brownian Motion process (see Csörgő and Révész, 1981, Remark 2.2.1). For example, assumption A4 would be written as

A4'. Without loss of generality, there exists a standard Brownian Motion process B on [0,∞) and σ, ε > 0 such that
§3. SA CONVERGENCE RESULTS

Theorem 3.2 contains the main result, weak convergence via random indices of the process {W_n} (defined in (3.1) below) derived from (1.1). To prove that result we need

Theorem 3.1.

Consider the algorithm defined in (1.1) and assume A1-A4. Then, without loss of generality, there exists a standard Brownian Motion process B and a sequence of Gaussian processes {Z_n} defined on [0,∞) such that (3.2) holds, where Z_n(t) = −Z(nt) and Z(t) is defined in (3.3).
The proof of Theorem 3.1 u s e s t h e f o l l o w i n g two lemmas.
Lemma 3.1 (Ruppert, 1982, Lemma 4.1).

Assume a > −1/2 and A1-A4. Then there exists a standard Brownian Motion B_a and an ε′ > 0 such that

If a < −1/2, then

lim_{t→∞} Σ_{k≤t} k^a ε_k exists and is finite a.s.   (3.5)

Lemma 3.2.

Consider the algorithm defined in (1.1) and assume A1-A4. Then, there exists ε > 0 such that

Proof: For the one-dimensional case, (3.5) and assumption A4 imply Ruppert's (1982) assumption A4. The remainder of the proof is similar to Ruppert's Theorem 3.1. ∎

Proof of Theorem 3.1: To prove (3.2), first let γ = G − 1/2 + τ. From Lemmas 3.1 and 3.2, with a = G − 1 + τ = γ − 1/2,

which proves (3.2). Since −B(t) =_D B(t) and B(at) =_D a^{1/2}B(t), (3.4) is immediate from (3.2). ∎

The following lemma is a modification of a result that can be
found in Csörgő and Révész (1981, Theorem 7.2.2). However, the
method of proof is different and can be extended to more general situations.
Lemma 3.3.

Suppose {S(t): t ≥ 0} is an element of D[0,∞) defined on (Ω, F, P). Assume A5, that there exists a Gaussian process G(t) = t^{−1/2}B(t^a) for some a > 0, that B(·) is a Brownian Motion process on [0,∞) and that

Then, {n^{−1/2}S(n·)} is Rényi-mixing, i.e.,

for each A, E ∈ F such that P(E) > 0. Thus,

N_n^{−1/2} S(N_n t) ⇒_W G(t).   (3.9)

Proof: Fix T > 0 and consider only 0 ≤ t ≤ T. Now, from (3.7),

P{lim_{n→∞} sup_t |n^{−1/2}S(nt) − n^{−1/2}G(nt)| = 0} = 1.

This gives, for any set E having positive probability,

P{lim_{n→∞} sup_t |n^{−1/2}S(nt) − n^{−1/2}G(nt)| = 0 | E} = 1.

This implies, for A ∈ F,

It is easy to show that n^{−1/2}G(n·) is Rényi-mixing, i.e.,

P{n^{−1/2}G(nt) ∈ A, 0 ≤ t ≤ T | E} → P(G(t) ∈ A, 0 ≤ t ≤ T).   (3.11)

See, for example, Csörgő and Révész (1981, Lemma 7.2.2 and subsequent remarks). Since T is arbitrary, (3.10) and (3.11) are sufficient for (3.8). (3.8) and Theorem 4 of Durrett and Resnick (1976) give (3.9). ∎
Theorem 3.2.
Consider the sequence of stochastic processes {W_n} defined in
(3.1) and assume A1-A5. Then,

W_{N_n}(t) ⇒_W Z(t).

Proof: A direct application of Lemma 3.3 to Theorem 3.1. Define, with γ = G − 1/2 + τ,

G(t) = −t^{1/2−γ} σ B(t^{2γ}(2γ)^{−1}).

From (3.6), (3.7) is satisfied. Thus, from Lemma 3.3,

N_n^{−1/2} S(N_n t) = t^{1/2} W_{N_n}(t) ⇒_W G(t).

Using t^{−1/2} G(t) =_D Z(t) completes the proof. ∎

Remarks: It is not very difficult to show that the algorithm (1.1) with assumptions A1-A4 contains the Robbins-Monro process. See Ruppert (1982) or Ljung (1978) for proofs under different assumptions. Thus, Theorem 3.2 contains the main result of Stroup and Braun (1982, Theorem 5).

To emphasize the importance of Theorem 3.2, we now construct a sequential confidence region for the main parameter of interest, θ. Let z_α be the (1−α)th quantile of the standard normal distribution. For random variables B̂_n, σ̂_n and Ĝ_n and for each d > 0, define

N_d = inf {n ≥ 1: d ≥ z_{α/2} σ̂_n n^{−(1/2−τ)} {2(Ĝ_n − 1/2 + τ)}^{−1/2}}   (3.12)
    = ∞ if no such n exists

and

Remark: We may define the sequence {n_d} where

n_d = inf {n ≥ 1: d ≥ z_{α/2} σ n^{−(1/2−τ)} {2(G − 1/2 + τ)}^{−1/2}}.

It is immediate that under the assumptions of Theorem 3.1 that
lim_{d→0} P(θ ∈ I_{n_d}) = 1 − α. Further, from Lemma 3.6 of McLeish (1976), we get lim_{d→0} N_d/n_d = 1 a.s.
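The deterministic index n_d in the remark above inverts in closed form. The sketch below assumes the rule reads d ≥ z_{α/2} σ n^{−(1/2−τ)} {2(G − 1/2 + τ)}^{−1/2}; the numerical inputs are hypothetical.

```python
import math

Z_975 = 1.959963984540054  # standard normal quantile z_{alpha/2} for alpha = .05

def half_width(n, sigma, G, tau, z=Z_975):
    """Half-width bound at stage n in the deterministic rule n_d:
    z_{alpha/2} * sigma * n^{-(1/2 - tau)} / sqrt(2*(G - 1/2 + tau))."""
    return z * sigma * n ** -(0.5 - tau) / math.sqrt(2.0 * (G - 0.5 + tau))

def n_d(d, sigma, G, tau, z=Z_975):
    """Smallest n with half_width(n) <= d, via closed-form inversion
    of the monotone half-width bound."""
    return math.ceil((z * sigma / (d * math.sqrt(2.0 * (G - 0.5 + tau))))
                     ** (1.0 / (0.5 - tau)))
```

In (3.12) itself, σ and G are replaced by the estimates σ̂_n and Ĝ_n, so the stopping time N_d is random; n_d is the deterministic benchmark with N_d/n_d → 1 a.s.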
Corollary 3.3.

Consider the algorithm defined in (1.1) and assume A1-A4 and A6. Then,

lim_{d→0} P(θ ∈ I_{N_d}) = 1 − α.

Proof: From (3.12) and A6, we have,

Thus, A5 is satisfied. From Theorem 3.2 and the Continuous Mapping Theorem at t = 1, we have

Remarks: The applicability of Theorem 3.2 and Corollary 3.3 for the RM process was cited in §1 and for the mode estimation problem is in §4. These results can also be used in conjunction with estimating the optimal replacement time in an age replacement policy, an important problem in resource allocation. Frees and Ruppert (1983) showed that a stochastic approximation estimator of the optimal replacement time has desirable asymptotic properties such as strong consistency and weak convergence to a limiting distribution (cf. Frees and Ruppert for definitions of an age replacement policy and the optimal replacement time). Corollary 3.3 can be used to determine at what stage of the estimation process the estimator is reasonably close to the optimal replacement time with high probability. This tool may be useful to the experimenter as a cost-saving device. Corollary 3.3 can also be proved by applying a result of Billingsley (1968, Theorem 17.1) to Theorem 3.1.
§4. MODE ESTIMATION VIA SA

Estimating the mode of a distribution function is a delicate
problem that has drawn the attention of many researchers (cf. Eddy (1980, 1982) and Hall (1982)). As pointed out by Fritz (1973), SA is a natural vehicle to use in mode estimation. In this section we give a variation of Fritz's SA mode estimator, prove its optimality and then construct sequential fixed-width bounded length confidence intervals for the mode. Proofs of these properties appear at the end of the section.

Let f(·) be the probability density function of some distribution function F(·). When it exists, use f^{(s)}(·) for the sth derivative of f, s = 0, 1, 2, .... Assume that the mode of F, θ = arg sup_x f(x), is unique and finite.

Let M₀ be the class of all Borel-measurable real-valued functions k(·), where k(·) is bounded and equals zero outside [−1, 1]. Let r be a fixed positive integer. For 0 ≤ s ≤ r, define

M_s is a class of kernel functions used to estimate f^{(s)} which is similar to a class used by Singh (1977).

Take X₁ to be an arbitrary random variable with finite second moments. Let F_n = σ(X₁, Z₁, ..., Z_{n−1}), the sigma field generated by past events at the nth stage. We assume there exist sequences of positive random variables {a_n} and {c_n} where a_n, c_n are F_n-measurable. Let {Z_n} be an i.i.d. sequence of random variables such that Z₁ has density f. Define E_{F_n} to be the conditional expectation given F_n. Use k ∈ M₁ to define the estimator of f^{(1)}(·) by

Define the estimator of θ at the nth stage, X_{n+1}, recursively by

A list of the basic assumptions is collected below.

B1. For some integer r > 1, assume f(x), f^{(r)}(x) exist for each x and are bounded on the entire real line.
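A minimal sketch of a Fritz-type recursive mode estimator in the spirit of (4.2) follows. The single-observation gradient update, the gain and bandwidth constants, and the projection step are all illustrative assumptions, not the article's exact specification.

```python
import random

def mode_sa(zs, A=2.0, C=1.0, r=3, x1=1.0, box=(-2.0, 2.0)):
    """Hypothetical sketch of a recursive mode estimator: the density
    gradient at the current iterate is estimated from the single new
    observation Z_n with the kernel k1(y) = 1.5*y of Table I (r = 3),
    and the iterate moves uphill:
        X_{n+1} = X_n + a_n * k1((Z_n - X_n)/c_n) / c_n**2,
    with a_n = A/n and c_n = C*n**(-2*tau/3), tau = 3/(2*(2r+1)).
    Projection onto `box` is an added stabilizer, not from the paper."""
    two_tau_thirds = 1.0 / (2 * r + 1)  # 2*tau/3 with tau = 3/(2(2r+1))
    x = x1
    for n, z in enumerate(zs, start=1):
        a_n, c_n = A / n, C * n ** -two_tau_thirds
        y = (z - x) / c_n
        k1 = 1.5 * y if -1.0 <= y <= 1.0 else 0.0
        x = min(max(x + a_n * k1 / c_n ** 2, box[0]), box[1])
    return x

random.seed(1)
# N(0, 1) data: the true mode is 0, so the iterate should settle near it.
x_hat = mode_sa(random.gauss(0.0, 1.0) for _ in range(50000))
```

The constant A = 2 is chosen so that G = −A f^{(2)}(θ) exceeds 1/2 − τ for the standard normal, in keeping with B5 below.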
B2. For each x ≠ θ, (x − θ) f^{(1)}(x) < 0.

B3. For some A, C > 0 and τ = 3/(2(2r+1)), a_n = An^{−1} + O(n^{−3/2}) and c_n = Cn^{−2τ/3} + o(n^{−2τ/3}).

B4. f^{(r)}(x) is continuous at θ.

B5. Assume that G > 1/2 − τ = (r−1)/(2r+1), where G = −A f^{(2)}(θ).

B6. For some δ > 0, f^{(r)}(x) = f^{(r)}(θ) + O(|x − θ|^δ) for each x.

Lemma 4.1.

Consider the algorithm defined in (4.2) and assume B1-B3. Then,

The proof is a modification of Fritz (1973, Theorem 2). See also Frees (1983, Theorem 4.1). The weak convergence of the mode estimator is now stated.

Theorem 4.1.

Consider the algorithm defined in (4.2) and assume B1-B5. Define

B = −A f^{(r)}(θ) ∫_{−1}^{1} (y^r/r!) k(y) dy,  σ² = A²C^{−1} f(θ) ∫_{−1}^{1} k²(y) dy

and {W_n} as in (3.1). Then, for Z(t) defined in (3.3), we have

Remarks: Using the Continuous Mapping Theorem at t = 1 gives the asymptotic normality of the mode estimator in (4.2) when suitably standardized. The rate of convergence, 1/2 − τ = (r−1)/(2r+1), is better than the rate recently given by Eddy (1980, Corollary 2.2) when the same number of bounded derivatives is assumed.
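The rate exponent and the B3/B5 bookkeeping can be tabulated with exact rational arithmetic; the check below only restates the identities appearing in the assumptions.

```python
from fractions import Fraction

def mode_rate(r):
    """Exponent from Theorem 4.1: n^{(r-1)/(2r+1)} (X_n - theta) has a
    nondegenerate limit, so the rate improves toward the parametric
    n^{1/2} as the assumed smoothness r grows."""
    return Fraction(r - 1, 2 * r + 1)

def tau(r):
    """tau of assumption B3: tau = 3/(2(2r+1))."""
    return Fraction(3, 2 * (2 * r + 1))

rates = {r: mode_rate(r) for r in (2, 3, 4, 10)}
```

For every r, 1/2 − τ(r) equals (r−1)/(2r+1) exactly, which is the consistency between B3 and B5 used throughout the section.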
An immediate extension is to find the θ such that f^{(p)}(θ) = a for nonnegative integer p and fixed, known constant a. For asymptotic normality of a multivariate mode estimator see Frees (1983, Theorem 4.4), in which a different method of proof (due to Fabian (1968)) is used.

To get a version of Corollary 3.3, we introduce estimators of
B̂_n, σ̂_n and Ĝ_n that satisfy A6. Let k₀ ∈ M₀, k₂ ∈ M₂, k_r ∈ M_r and define f̂_n^{(s)}(t) = k_s((Z_n − t)/c_n)/c_n^{s+1} for s = 0, 2, r. Now, define

Corollary 4.2.

Consider the algorithm defined in (4.2) and assume B1-B6. Use the estimators in (4.5)-(4.7) to construct {N_d} and {I_n} as in (3.12) and (3.13). Then
The remainder of this section contains the proofs of Theorem 4.1 and Corollary 4.2. We use the following result due to Strassen. (Recall the convention stated in the remark following the assumptions in §2.)

Lemma 4.2 (Strassen, 1967, Theorem 4.4).

Let {e_n, F_{n+1}} be a martingale difference sequence. Define Y_n = Σ_{k=1}^{n} E_{F_k} e_k², S(Y_n) = Σ_{k=1}^{n} e_k, and S(t) by linear interpolation. Let h be a nonnegative, nondecreasing function on [0,∞) such that t^{−1}h(t) is nonincreasing. If

Y_n → ∞ a.s. and   (4.8)

then, without loss of generality, there exists a Brownian Motion process B on [0,∞) such that
Proof of Theorem 4.1: We need to satisfy the assumptions of Theorem 3.1. A1 is true by B1, B4 and B5. Define

Thus, from (4.2) with f(·) = −f^{(1)}(·),

Some easy calculations show that A2 is true with B defined in the statement of the theorem. Note that some of the bias term is in a_n f*(X_n) ((a_n n − A) f*(X_n)) but is asymptotically negligible by assumption B3. Lemma 4.1 satisfies A3. The main work of the theorem is to satisfy A4, for which we will use Lemma 4.2.

Let {ε_n} be as in (4.11) and define Y_n as in Lemma 4.2. Since k ∈ M₁, we have that E_{F_n} f̂_n^{(1)}(X_n) is bounded. Further, for p > 0, by change of variables,

Thus, with p = 2, n^{−2τ} E_{F_n}(f̂_n^{(1)}(X_n))² → C^{−3} f(θ) ∫ k²(y) dy a.s. by Lemma 4.1, B1, B3 and the Dominated Convergence Theorem. Further,

This satisfies (4.8) since Y_n/n → σ² a.s.

To satisfy (4.9), use a conditional version of Hölder's inequality. Let p, q be nonnegative constants such that 2/p + 1/q = 1. Then,
Now, by B3 and (4.14), E_{F_n}(f̂_n^{(1)}(X_n))^p = O(n^{2τ(2p−1)/3}) and, with (4.11),

E_{F_n} |ε_n|^p = O(n^{τ(p−2)/3}).   (4.16)

For some 0 < ε < (1 − 2τ/3)(1 − 2/p), let h(t) = t^{1−ε}. Since Y_n = O(n), we have (h(Y_n))^{−p/2} E_{F_n} |ε_n|^p = O(n^{τ(p−2)/3 − (p/2)(1−ε)}). Requiring that 0 < ε < (1 − 2τ/3)(1 − 2/p) implies that τ(p−2)/3 − (p/2)(1−ε) < −1 and thus (4.9) is satisfied. We thus have (4.10), which is sufficient for A4. ∎

The proof of Corollary 4.2 is similar to the proof of Corollary 3.3 using Theorem 4.1, Theorem 3.2 and the following:

Lemma 4.3.

Consider the algorithm defined in (4.2) and assume B1-B6. Then, the estimators B̂_n, σ̂_n and Ĝ_n defined in (4.5)-(4.7) satisfy A6.

Proof: We prove only B̂_n = B + o(1) as the others are similar. Sufficient for this is
where |η(y) − X_n| ≤ c_n, by a Taylor-series expansion and since k_r ∈ M_r. By B6,

by Lemma 4.1 and the Dominated Convergence Theorem. By Kronecker's lemma, we have

It is not hard to find a t ∈ (0, 2] such that

(4.19) and (4.20) are sufficient for (4.17) by Theorem 5 of Chow (1965). ∎
§5. MONTE-CARLO SIMULATION

In a simple example, finite sample properties of the procedure introduced in §4 were investigated. The Gamma distribution was used with location and scale parameters 11 and 10, respectively, which produces a mean of 1.1 and standard deviation of .3317. The distribution function and parameters determine a unique θ = 1.

To estimate the derivatives of the density function, we used Legendre polynomials to calculate simple, polynomial kernel estimators. For rates r = 3 and r = 4, these kernel estimators are given in Table 1 below. By using r = 4 and Legendre polynomials, the mode estimator is asymptotically unbiased (B = 0). This fact had a great impact on the performance of the stopping rules, which is described below.

Let A and C be positive constants, and K_A and K_C be nonnegative constants. As first suggested by Dvoretzky (1956), we used
TABLE I. KERNEL FUNCTIONS

r = 3:
  k₀(y) = .375(3 − 5y²) I(−1 ≤ y ≤ 1)
  k(y) = k₁(y) = 1.5y I(−1 ≤ y ≤ 1)
  k₂(y) = 3.75(3y² − 1) I(−1 ≤ y ≤ 1)
  k₃(y) = 26.25(5y³ − 3y) I(−1 ≤ y ≤ 1)

r = 4:
  k(y) = k₁(y) = 1.875(5y − 7y³) I(−1 ≤ y ≤ 1)
  k₂(y) = 3.75(3y² − 1) I(−1 ≤ y ≤ 1)
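Under one natural reading of the class M_s (the sth moment of k_s integrates to s! and the lower-order moments vanish, in the style of Singh (1977)), the Table I kernels can be verified exactly; the fractions below are just the decimal constants of Table I rewritten.

```python
from fractions import Fraction

# Table I kernels as {power: coefficient} maps supported on [-1, 1].
K0_R3 = {0: Fraction(9, 8), 2: Fraction(-15, 8)}     # .375*(3 - 5y^2)
K1_R3 = {1: Fraction(3, 2)}                          # 1.5*y
K2 = {0: Fraction(-15, 4), 2: Fraction(45, 4)}       # 3.75*(3y^2 - 1)
K3_R3 = {1: Fraction(-315, 4), 3: Fraction(525, 4)}  # 26.25*(5y^3 - 3y)
K1_R4 = {1: Fraction(75, 8), 3: Fraction(-105, 8)}   # 1.875*(5y - 7y^3)

def moment(kernel, m):
    """Exact integral of y^m * k(y) over [-1, 1], using
    int_{-1}^{1} y^q dy = 2/(q+1) for even q and 0 for odd q."""
    return sum((c * Fraction(2, p + m + 1) for p, c in kernel.items()
                if (p + m) % 2 == 0), Fraction(0))
```

For example, the r = 4 kernel k₁ has first moment 1 and vanishing third moment, which is what makes the mode estimator asymptotically unbiased (B = 0) in that case.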
{a_n} and {c_n} of the form a_n = A(n + K_A)^{−1} and c_n = C(n + K_C)^{−2τ/3}. Taking K_A to be positive improved the convergence properties of the procedure in finite samples. Choice of the positive parameters A and C is constrained only by B5 (for the above Gamma distribution and r = 4 this implies A > (−2/7)/f^{(2)}(θ) = .024). One criterion for selection, introduced by Abdelhamid (1973), is to choose the A and C that minimize the asymptotic mean square error. For r = 3, some easy calculations show that these optimal values are A ≈ 1.08 and C ≈ .7. Unfortunately, these optimal values are usually not available to the experimenter a priori. Tables 2-4 give results for some alternative parameters A and C (and various choices of K_A, K_C and the initial estimator X₁). In these tables, the criteria for choosing among alternative estimators based on different parameters are the expected bias and mean square error standardized by the sample size.

Another important consideration in finite samples is the choice of the appropriate confidence level (α) and confidence interval width (2d). Based on the above parametrization of the
TABLE II. PERFORMANCE OF ESTIMATORS (Stage of Algorithm: 250, 500, 2000, ∞)
model, it is easy to calculate factors in the asymptotic mean and variance of n^{(r−1)/(2r+1)}(X_n − θ) as given in Theorem 4.1. For r = 4, these are B ≈ 0.0, σ² = .21885 and G = 1.00088. Assume that these values are known, that n^{1/3}(X_n − θ) is distributed normally (not only asymptotically) and that it is desired to have a 95% confidence interval for θ with error no more than .10 (which implies z_{α/2} = 1.96 and d = .10). Based on the remarks preceding Corollary 3.3, we require a sample size at least
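Under the same reading of the deterministic rule n_d used in §3 (an assumption, not a formula quoted from the article), this minimum sample size can be recomputed directly from the constants just listed.

```python
import math

# Section 5 values for r = 4, so tau = 1/6 and 1/2 - tau = 1/3, assumed known.
z, sigma, G, d = 1.96, math.sqrt(0.21885), 1.00088, 0.10
tau = 1.0 / 6.0

# Stop once z * sigma * n^{-(1/2 - tau)} / sqrt(2*(G - 1/2 + tau)) <= d;
# inverting this monotone bound gives the required sample size.
n_required = math.ceil((z * sigma / (d * math.sqrt(2.0 * (G - 0.5 + tau))))
                       ** (1.0 / (0.5 - tau)))
```

With these values the bound works out to roughly n = 500, which squares with the remark that a large sample size is required and with the n = 200 minimum-stop constraint imposed below.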
Even in this simplified version, a large sample size is required for moderately precise results. Because of the large sample nature of the problem, we imposed the somewhat arbitrary constraint that the procedure not be stopped before n = 200.

TABLE III. PERFORMANCE OF ESTIMATORS (Stage of Algorithm: 250, 500, 2000, ∞)
TABLE IV. PERFORMANCE OF ESTIMATORS (Stage of Algorithm: 250, 500, 2000, ∞)

At the 95% level of confidence, Tables 5-7 give the results of using the stopping rule in the SA algorithm for selected values of A, C, K_A, K_C, X₁ and d. The proportion of successes column (PROPN SUCC) is the number of times θ ∈ I_{N_d} divided by the number of times the algorithm was actually stopped. Similarly, averages for the stopping time, N_d, and the bias at N_d were computed only over the runs stopped.
TABLE V. PERFORMANCE OF STOPPING RULES (columns: PROPN STOPPED, PROPN SUCC, AVG N_d, AVG (X_{N_d} − θ))

TABLE VI. PERFORMANCE OF STOPPING RULES (columns: PROPN STOPPED, PROPN SUCC, AVG N_d, AVG (X_{N_d} − θ))
TABLE VII. PERFORMANCE OF STOPPING RULES (r = 3, K_A = 0, K_C = 0; columns: d, PROPN STOPPED, PROPN SUCC, AVG (X_{N_d} − θ), with d ranging over .100, .075, .050)
The following remarks summarize the results of the simulation
study, the details of which can be found in Tables 2-7. Each
simulation is based on 500 independent trials.
(1). The performance of the mode estimator can be characterized as typical of a stochastic approximation estimator. From Tables 2-4, we see that the effects of the initial estimator X1 still persist for large n. The mode estimator performed better when the initial estimator was less than the true mode than when X1 was greater than the mode, an important point in practice. The performance was dramatically enhanced by taking KA to be positive, but the introduction of positive KC only slowed the convergence of the algorithm.
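The recursion that produces these mode estimates is defined earlier in the paper and is not restated here. Purely as an illustrative sketch of the Kiefer-Wolfowitz-type behavior described in remark (1), the following implements a generic finite-difference climb toward the mode of a density, with the shifts KA and KC entering the gain sequences; the constants A, C, gamma, the batch size, and the window-count density estimator are all assumptions for illustration, not the paper's design.

```python
import random

def kw_mode_search(draw, x1, n_steps, A=2.0, C=1.0, KA=0, KC=0,
                   gamma=0.25, batch=50):
    """Kiefer-Wolfowitz-type climb toward the mode of a density.

    Each step forms window-count density estimates at x +/- c_n from a
    fresh batch of observations and moves x along the finite-difference
    estimate of the density derivative.
    """
    x = x1
    for n in range(1, n_steps + 1):
        a_n = A / (n + KA)            # gain sequence; KA shifts its start
        c_n = C / (n + KC) ** gamma   # difference width; KC likewise
        zs = [draw() for _ in range(batch)]
        # density estimates from counts in windows of width c_n
        f_up = sum(abs(z - (x + c_n)) <= c_n / 2 for z in zs) / (batch * c_n)
        f_dn = sum(abs(z - (x - c_n)) <= c_n / 2 for z in zs) / (batch * c_n)
        x += a_n * (f_up - f_dn) / (2 * c_n)   # derivative estimate
    return x

random.seed(1)
# N(2, 1) density has its mode at 2; start below it, as in remark (1)
est = kw_mode_search(lambda: random.gauss(2.0, 1.0), x1=1.0, n_steps=2000)
```

With the initial estimator below the true mode, the recursion drifts upward toward 2 and the remaining error reflects the slowly decaying gains, consistent with the persistence of X1 noted above.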
(2). For r=4, the stopping rule performed well even in cases
in which the mode estimator did poorly. Criteria for judging the
performance of the stopping rules were the proportion of successes and the average bias at the stopping time. When the convergence of the algorithm was extremely poor, the stopping rule performed slightly worse. Thus, the stopping rule should not be relied on as the sole criterion for halting the procedure when the experimenter has no knowledge of the distribution function.

(3). For r=3, the stopping rule performed well on the basis of the average bias at the stopping time. However, the performance was markedly poorer than for r=4 when judged by the proportion of successes criterion. This is due to the fact that to construct the confidence interval we needed to estimate the third derivative of the density at the mode. Even the sample sizes we used were not large enough to get accurate estimators of the bias. Thus, the confidence intervals were much poorer than in the asymptotically unbiased case (r=4).
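The stopping rule evaluated in Tables V-VII is defined earlier in the paper. Purely as a hedged sketch of the fixed-width idea behind the "proportion of successes" and "AVG Nd" criteria, the following simulates a Chow-Robbins-type rule for a sample mean: sampling stops once an asymptotic interval of half-width d is attained, and a run counts as a success when that interval covers the target. The Gaussian setting and the constants z, d, and the pilot size n0 are assumptions for illustration, not the paper's design.

```python
import random
import statistics

def fixed_width_stop(draw, d, z=1.96, n0=10, n_max=10000):
    """Stop at the first n with z * s_n / sqrt(n) <= d; return (n, mean)."""
    xs = [draw() for _ in range(n0)]
    while z * statistics.stdev(xs) / len(xs) ** 0.5 > d and len(xs) < n_max:
        xs.append(draw())
    return len(xs), statistics.fmean(xs)

random.seed(0)
theta = 2.0   # the target being estimated
runs = [fixed_width_stop(lambda: random.gauss(theta, 1.0), d=0.2)
        for _ in range(200)]
# "proportion of successes": [mean - d, mean + d] covers theta at stopping
prop_succ = sum(abs(m - theta) <= 0.2 for _, m in runs) / len(runs)
avg_nd = statistics.fmean(n for n, _ in runs)
```

In this idealized setting the average stopping time sits near (z/d)^2 and the success proportion near the nominal level; remark (3) shows how both degrade when a nuisance quantity, such as the third derivative of the density, must itself be estimated to form the interval.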
Received: October 1984.
Recommended by Jana Jurečková.