This article was downloaded by: [University of Auckland Library] On: 30 November 2014, At: 16:48. Publisher: Taylor & Francis. Informa Ltd Registered in England and Wales Registered Number: 1072954. Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK.
Sequential Analysis: Design Methods and Applications
Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/lsqa20
Weak convergence of stochastic approximation processes with random indices
Edward W. Frees
Department of Statistics and School of Business, University of Wisconsin, 1210 West Dayton Street, Madison, Wisconsin 53706
Published online: 29 Mar 2007.
To cite this article: Edward W. Frees (1985) Weak convergence of stochastic approximation processes with random indices, Sequential Analysis: Design Methods and Applications, 4:1-2, 59-82, DOI: 10.1080/07474948508836072
To link to this article: http://dx.doi.org/10.1080/07474948508836072
PLEASE SCROLL DOWN FOR ARTICLE
Taylor & Francis makes every effort to ensure the accuracy of all the information (the "Content") contained in the publications on our platform. However, Taylor & Francis, our agents, and our licensors make no representations or warranties whatsoever as to the accuracy, completeness, or suitability for any purpose of the Content. Any opinions and views expressed in this publication are the opinions and views of the authors, and are not the views of or endorsed by Taylor & Francis. The accuracy of the Content should not be relied upon and should be independently verified with primary sources of information. Taylor and Francis shall not be liable for any losses, actions, claims, proceedings, demands, costs, expenses, damages, and other liabilities whatsoever or howsoever caused arising directly or indirectly in connection with, in relation to or arising out of the use of the Content.

This article may be used for research, teaching, and private study purposes. Any substantial or systematic reproduction, redistribution, reselling, loan, sub-licensing, systematic supply, or distribution in any form to anyone is expressly forbidden. Terms & Conditions of access and use can be found at http://www.tandfonline.com/page/terms-and-conditions
SEQUENTIAL ANALYSIS, 4(1&2), 59-82 (1985)
WEAK CONVERGENCE OF STOCHASTIC APPROXIMATION PROCESSES WITH RANDOM INDICES
Edward W. Frees
Department of Statistics and School of Business 1210 West Dayton Street University of Wisconsin Madison, Wisconsin 53706
Keywords and Phrases: strong approximations, mode estimation, stopping times.
ABSTRACT
Conditions are given for weak convergence through random
indices of a general stochastic approximation process which
includes the Robbins-Monro and Kiefer-Wolfowitz processes. For a
particular index, a sequential fixed-width bounded length confidence interval for the parameter being estimated is established.
As an example, an optimal recursive estimator and confidence
interval for the mode of a distribution function is constructed.
§1. INTRODUCTION AND SUMMARY
In a stochastic approximation (SA) procedure, stopping time
variables or stopping rules are important because they give the
researcher a reasonable criterion for halting the procedure with a
certain amount of confidence (see Stroup and Braun (1982), Sielken
(1973) and Farrell (1962)). Moreover, stopping times are useful in
developing confidence regions for parameters of interest. Proof of weak convergence through a general random index gives results for stopping times as simple corollaries.

Copyright © 1985 by Marcel Dekker, Inc. 0747-4946/85/0401-0059$3.50/0
Consider a general SA algorithm, (1.1), for locating the assumed unique root of f(x) = 0, say θ, where f(·) is a measurable function such that f: R → R. In (1.1), B_n and ε_n are random variables and τ is a fixed constant such that 0 < τ < 1/2. Heuristically, one may think of ε_n as the random error at the nth stage due to the fact that f(X_n) can only be measured with error. B_n may be thought of as the bias in the measurement of f(X_n), e.g., if f(·) is a probability density function. The algorithm (1.1) is a special case of the algorithm studied by Kushner (1977) and Ljung (1978). It contains the well-known Robbins-Monro (RM) (1951) and Kiefer-Wolfowitz (KW) (1952) procedures for finding roots and maxima of a regression function, respectively.

Motivation for studying a general algorithm has been given by Ruppert (1982). Ruppert considered a similar algorithm, (1.2), to estimate the (assumed) unique θ such that f(θ) = 0, where 0 < τ < 1/4. As argued by Ruppert, an important reason for considering algorithms of the form (1.2) is that a conventional restrictive assumption in SA, that the errors {ε_n} be martingale differences, need not be made. Ruppert showed that for large n, the estimator X_n can be almost surely (strongly) approximated by a weighted average of ε_1, ..., ε_n, where X_n and the errors ε_n are k-dimensional random vectors. This representation and strong approximations for sums of dependent random variables immediately give several important results about the large sample behavior of X_n. We consider only the case k = 1. We choose a modified version of (1.2) since Ruppert's representation is not directly applicable to statistical problems that have non-zero bias terms (B_n) infinitely often and τ < 1/6, as in the example in §4. The algorithm (1.1) is
in a form more related to the easily accessible algorithm given by
Fabian (1968).
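For readers who want a concrete picture of the Robbins-Monro special case of an algorithm like (1.1), a minimal sketch follows; the regression function, the gain sequence a_n = 1/n, and the noise level are hypothetical choices for illustration, not taken from the article.

```python
import random

def robbins_monro(noisy_f, x1, n_steps):
    """Robbins-Monro iteration X_{n+1} = X_n + a_n * Y_n with gains
    a_n = 1/n, where Y_n is a noisy observation of f(X_n) and f has a
    unique root theta.  Illustrative sketch only."""
    x = x1
    for n in range(1, n_steps + 1):
        x += noisy_f(x) / n
    return x

random.seed(0)
# Toy regression function f(x) = 2*(1 - x) with root theta = 1,
# observed with additive N(0, 0.5^2) error.
theta_hat = robbins_monro(lambda x: 2.0 * (1.0 - x) + random.gauss(0.0, 0.5),
                          x1=0.0, n_steps=20000)
```

With a gain of order 1/n the iterate averages out the observation noise, so theta_hat settles near the root 1.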
After a short preliminary section on notation and assumptions,
results which describe the large sample behavior of the recursive
estimator X_n are given in §3. The main result is weak convergence
via random indices (Theorem 3.2). This result subsumes the earlier
work on stopping rules in SA (see Stroup and Braun (1982) and
references cited therein). A fixed-width bounded length confidence
interval for the parameter 0 is an easy corollary of Theorem 3.2.
This corollary is the result given by Sielken (1973) and McLeish
(1976) for the RM process. Sielken's method was to verify
Anscombe's (1952) uniform continuity in probability condition.
McLeish showed convergence of a randomized version of the RM
process in D[0,1) (the space of right-continuous real functions on
[0,1) having left-hand limits). The methods in this paper are dif-
ferent. Via strong approximation techniques, we prove convergence
of a randomized version of a process derived from (1.1) in D[0,∞),
a more convenient function space for stopping rules.
As an example, §4 shows how to develop a fixed-width bounded
length confidence interval for the mode of a distribution function.
Using SA to estimate the mode was introduced by Burkholder (1956)
under restrictive conditions and more generally by Fritz (1973).
As a preliminary result, we show that our proposed estimator when
suitably standardized converges to a nondegenerate distribution.
This preliminary result is new and should be of independent inter-
est since it gives a method of constructing a mode estimator which
has a rate of convergence similar to the optimal rate of conver-
gence of density estimators (see Müller and Gasser (1979)). The
example also demonstrates why it is important to consider a general
SA algorithm since the mode estimation problem is neither an RM nor
a KW procedure. The final section gives the results of an investi-
gation of finite sample properties of the mode estimator proposed
in §4 via a Monte-Carlo study.
§2. NOTATION AND ASSUMPTIONS

Assume that random elements are defined on a fixed probability
space (Ω, F, P) unless otherwise specified. Endow D[0,∞) with Stone's (1963) extension of Skorokhod's (1956) J₁-topology and use ⇒_W to denote convergence in this topology. Also, use =_D to denote equivalence in distribution. Let [·] be the greatest integer function. Let {X_n} and {Y_n} be sequences of random variables (r.v.). We write X_n = O(Y_n) if there exists a r.v. Z such that |X_n|/|Y_n| ≤ Z almost surely (a.s.) for all n. If |X_n|/|Y_n| → 0 a.s., we write X_n = o(Y_n). Let O_p(·) and o_p(·) be the corresponding symbols for relationships in probability.
The asymptotic coverage probability is fixed at 1 − α, where α ∈ (0, 1/2). We will define a sequence of random intervals {I_n} and a stopping rule N_d so that N_d is the first n such that length(I_n) ≤ 2d and lim_{d→0} P(θ ∈ I_{N_d}) = 1 − α. The basic assumptions about the relationships defined in (1.1) are stated below.
A1. Let η > 0 and G > 1/2 − τ. Assume that f(x) = G(x − θ) + o(|x − θ|^{1+η}).

A2. Let ρ > 0 and B ∈ R. Assume that B_n = B + o(n^{−ρ}).

A3. lim_{n→∞} X_n = θ a.s.

A4. There exists a probability space which contains a sequence {ε_n} and a standard Brownian Motion B on [0,∞) such that

and, for some σ, ε > 0 depending on {ε_n},

A5. Let N_n/m_n = N + o_p(1), where N is a positive random variable, {m_n} are integers going to infinity and {N_n} is a sequence of random variables.

A6. Let {B̂_n}, {σ̂_n} and {Ĝ_n} be sequences of random variables such that B̂_n = B + o(1), σ̂_n = σ + o(1) and Ĝ_n = G + o(1).
Remarks: In A6, we allow estimators of B that do not converge to B as quickly as those in A2. These will be useful in the construction of confidence intervals. The monograph of Philipp and Stout (1975) gives several sets of sufficient conditions so that
the sequence of random variables {ε_n} can be redefined on a richer probability space without changing its distribution so that A4 holds. Thus, our formulation includes stochastic approximation processes with certain types of dependent noise and even nonstationary noise (see Philipp and Stout, 1975, Chapter 8). For notational convenience, we follow Strassen (1967) and adopt the phrase "without loss of generality" when redefining a sequence of random variables on a possibly richer probability space containing a Brownian Motion process (see Csörgő and Révész, 1981, Remark 2.2.1). For example, assumption A4 would be written as

A4'. Without loss of generality, there exists a standard Brownian Motion process B on [0,∞) and σ, ε > 0 such that
§3. SA CONVERGENCE RESULTS

Theorem 3.2 contains the main result, weak convergence via random indices of the process {W_n} (defined in (3.1) below) derived from (1.1). To prove that result we need

Theorem 3.1.

Consider the algorithm defined in (1.1) and assume A1-A4. Then, without loss of generality, there exists a standard Brownian Motion process B and a sequence of Gaussian processes {Z_n} defined on [0,∞) such that (3.2) holds, where Z_n(t) = −Z(nt) and Z(t) is defined in (3.3).
The proof of Theorem 3.1 u s e s t h e f o l l o w i n g two lemmas.
Lemma 3.1 (Ruppert, 1982, Lemma 4.1).

Assume a > −1/2 and A1-A4. Then there exists a standard Brownian Motion B_a and an ε′ > 0 such that

If a < −1/2, then

lim_{t→∞} Σ_{k≤t} k^a ε_k exists and is finite a.s.   (3.5)

Lemma 3.2.

Consider the algorithm defined in (1.1) and assume A1-A4. Then, there exists ε > 0 such that

Proof: For the one-dimensional case, (3.5) and assumption A4 imply Ruppert's (1982) assumption A4. The remainder of the proof is similar to Ruppert's Theorem 3.1. ∎

Proof of Theorem 3.1: To prove (3.2), first let γ = G − 1/2 + τ. From Lemmas 3.1 and 3.2, with a = G − 1 + τ = γ − 1/2,

which proves (3.2). Since −B(t) =_D B(t) and B(at) =_D a^{1/2}B(t), (3.4) is immediate from (3.2). ∎

The following lemma is a modification of a result that can be
found in Csörgő and Révész (1981, Theorem 7.2.2). However, the
method of proof is different and can be extended to more general situations.
Lemma 3.3.

Suppose {S(t): t ≥ 0} is an element of D[0,∞) defined on (Ω, F, P). Assume A5, that there exists a Gaussian process G(t) = t^{−1/2}B(t^a) for some a > 0, that B(·) is a Brownian Motion process on [0,∞) and that

Then, {n^{−1/2}S(n·)} is Rényi-mixing, i.e.,

for each A, E ∈ F such that P(E) > 0. Thus,

N_n^{−1/2} S(N_n t) ⇒_W G(t).   (3.9)

Proof: Fix T > 0 and consider only 0 ≤ t ≤ T. Now, from (3.7),

P{lim_{n→∞} sup_t |n^{−1/2}S(nt) − n^{−1/2}G(nt)| = 0} = 1.

This gives, for any set E having positive probability,

P{lim_{n→∞} sup_t |n^{−1/2}S(nt) − n^{−1/2}G(nt)| = 0 | E} = 1.

This implies, for A ∈ F,

It is easy to show that n^{−1/2}G(n·) is Rényi-mixing, i.e.,

P{n^{−1/2}G(nt) ∈ A, 0 ≤ t ≤ T | E} → P(G(t) ∈ A, 0 ≤ t ≤ T).   (3.11)

See, for example, Csörgő and Révész (1981, Lemma 7.2.2 and subsequent remarks). Since T is arbitrary, (3.10) and (3.11) are sufficient for (3.8). (3.8) and Theorem 4 of Durrett and Resnick (1976) give (3.9). ∎
Theorem 3.2.
Consider the sequence of stochastic processes {W_n} defined in
(3.1) and assume A1-A5. Then,

W_{N_n}(t) ⇒_W Z(t).

Proof: A direct application of Lemma 3.3 to Theorem 3.1. Define, with γ = G − 1/2 + τ,

G(t) = −t^{1/2−γ} σ B(t^{2γ}(2γ)^{−1}).

From (3.6), (3.7) is satisfied. Thus, from Lemma 3.3,

N_n^{−1/2} S(N_n t) = t^{1/2} W_{N_n}(t) ⇒_W G(t).

Using t^{−1/2} G(t) =_D Z(t) completes the proof. ∎

Remarks: It is not very difficult to show that the algorithm (1.1) with assumptions A1-A4 contains the Robbins-Monro process. See Ruppert (1982) or Ljung (1978) for proofs under different assumptions. Thus, Theorem 3.2 contains the main result of Stroup and Braun (1982, Theorem 5).

To emphasize the importance of Theorem 3.2, we now construct a sequential confidence region for the main parameter of interest, θ. Let z_α be the (1−α)th quantile of the standard normal distribution. For random variables B̂_n, σ̂_n and Ĝ_n and for each d > 0, define

N_d = inf {n ≥ 1: d ≥ z_{α/2} σ̂_n n^{−(1/2−τ)} {2(Ĝ_n − 1/2 + τ)}^{−1/2}}   (3.12)
    = ∞ if no such n exists

and

Remark: We may define the sequence {n_d} where

n_d = inf {n ≥ 1: d ≥ z_{α/2} σ n^{−(1/2−τ)} {2(G − 1/2 + τ)}^{−1/2}}.

It is immediate that under the assumptions of Theorem 3.1 that
lim_{d→0} P(θ ∈ I_{n_d}) = 1 − α. Further, from Lemma 3.6 of McLeish (1976), we get lim_{d→0} N_d/n_d = 1 a.s.
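The deterministic index n_d in the remark above inverts in closed form. The sketch below assumes the rule reads d ≥ z_{α/2} σ n^{−(1/2−τ)} {2(G − 1/2 + τ)}^{−1/2}; the numerical inputs are hypothetical.

```python
import math

Z_975 = 1.959963984540054  # standard normal quantile z_{alpha/2} for alpha = .05

def half_width(n, sigma, G, tau, z=Z_975):
    """Half-width bound at stage n in the deterministic rule n_d:
    z_{alpha/2} * sigma * n^{-(1/2 - tau)} / sqrt(2*(G - 1/2 + tau))."""
    return z * sigma * n ** -(0.5 - tau) / math.sqrt(2.0 * (G - 0.5 + tau))

def n_d(d, sigma, G, tau, z=Z_975):
    """Smallest n with half_width(n) <= d, via closed-form inversion
    of the monotone half-width bound."""
    return math.ceil((z * sigma / (d * math.sqrt(2.0 * (G - 0.5 + tau))))
                     ** (1.0 / (0.5 - tau)))
```

In (3.12) itself, σ and G are replaced by the estimates σ̂_n and Ĝ_n, so the stopping time N_d is random; n_d is the deterministic benchmark with N_d/n_d → 1 a.s.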
Corollary 3.3.

Consider the algorithm defined in (1.1) and assume A1-A4 and A6. Then,

lim_{d→0} P(θ ∈ I_{N_d}) = 1 − α.

Proof: From (3.12) and A6, we have,

Thus, A5 is satisfied. From Theorem 3.2 and the Continuous Mapping Theorem at t = 1, we have

Remarks: The applicability of Theorem 3.2 and Corollary 3.3 for the RM process was cited in §1 and for the mode estimation problem is in §4. These results can also be used in conjunction with estimating the optimal replacement time in an age replacement policy, an important problem in resource allocation. Frees and Ruppert (1983) showed that a stochastic approximation estimator of the optimal replacement time has desirable asymptotic properties such as strong consistency and weak convergence to a limiting distribution (cf. Frees and Ruppert for definitions of an age replacement policy and the optimal replacement time). Corollary 3.3 can be used to determine at what stage of the estimation process the estimator is reasonably close to the optimal replacement time with high probability. This tool may be useful to the experimenter as a cost-saving device. Corollary 3.3 can also be proved by applying a result of Billingsley (1968, Theorem 17.1) to Theorem 3.1.
§4. MODE ESTIMATION VIA SA

Estimating the mode of a distribution function is a delicate
problem that has drawn the attention of many researchers (cf. Eddy (1980, 1982) and Hall (1982)). As pointed out by Fritz (1973), SA is a natural vehicle to use in mode estimation. In this section we give a variation of Fritz's SA mode estimator, prove its optimality and then construct sequential fixed-width bounded length confidence intervals for the mode. Proofs of these properties appear at the end of the section.

Let f(·) be the probability density function of some distribution function F(·). When it exists, use f^{(s)}(·) for the sth derivative of f, s = 0, 1, 2, .... Assume that the mode of F, θ = arg sup_x f(x), is unique and finite.

Let M₀ be the class of all Borel-measurable real-valued functions k(·), where k(·) is bounded and equals zero outside [−1, 1]. Let r be a fixed positive integer. For 0 ≤ s ≤ r, define

M_s is a class of kernel functions used to estimate f^{(s)} which is similar to a class used by Singh (1977).

Take X₁ to be an arbitrary random variable with finite second moments. Let F_n = σ(X₁, Z₁, ..., Z_{n−1}), the sigma field generated by past events at the nth stage. We assume there exist sequences of positive random variables {a_n} and {c_n} where a_n, c_n are F_n-measurable. Let {Z_n} be an i.i.d. sequence of random variables such that Z₁ has density f. Define E_{F_n} to be the conditional expectation given F_n. Use k ∈ M₁ to define the estimator of f^{(1)}(·) by

Define the estimator of θ at the nth stage, X_{n+1}, recursively by

A list of the basic assumptions is collected below.

B1. For some integer r > 1, assume f(x), f^{(r)}(x) exist for each x and are bounded on the entire real line.
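A minimal sketch of a Fritz-type recursive mode estimator in the spirit of (4.2) follows. The single-observation gradient update, the gain and bandwidth constants, and the projection step are all illustrative assumptions, not the article's exact specification.

```python
import random

def mode_sa(zs, A=2.0, C=1.0, r=3, x1=1.0, box=(-2.0, 2.0)):
    """Hypothetical sketch of a recursive mode estimator: the density
    gradient at the current iterate is estimated from the single new
    observation Z_n with the kernel k1(y) = 1.5*y of Table I (r = 3),
    and the iterate moves uphill:
        X_{n+1} = X_n + a_n * k1((Z_n - X_n)/c_n) / c_n**2,
    with a_n = A/n and c_n = C*n**(-2*tau/3), tau = 3/(2*(2r+1)).
    Projection onto `box` is an added stabilizer, not from the paper."""
    two_tau_thirds = 1.0 / (2 * r + 1)  # 2*tau/3 with tau = 3/(2(2r+1))
    x = x1
    for n, z in enumerate(zs, start=1):
        a_n, c_n = A / n, C * n ** -two_tau_thirds
        y = (z - x) / c_n
        k1 = 1.5 * y if -1.0 <= y <= 1.0 else 0.0
        x = min(max(x + a_n * k1 / c_n ** 2, box[0]), box[1])
    return x

random.seed(1)
# N(0, 1) data: the true mode is 0, so the iterate should settle near it.
x_hat = mode_sa(random.gauss(0.0, 1.0) for _ in range(50000))
```

The constant A = 2 is chosen so that G = −A f^{(2)}(θ) exceeds 1/2 − τ for the standard normal, in keeping with B5 below.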
B2. For each x ≠ θ, (x − θ) f^{(1)}(x) < 0.

B3. For some A, C > 0 and τ = 3/(2(2r+1)), a_n = An^{−1} + O(n^{−3/2}) and c_n = Cn^{−2τ/3} + o(n^{−2τ/3}).

B4. f^{(r)}(x) is continuous at θ.

B5. Assume that G > 1/2 − τ = (r−1)/(2r+1), where G = −A f^{(2)}(θ).

B6. For some δ > 0, f^{(r)}(x) = f^{(r)}(θ) + O(|x − θ|^δ) for each x.

Lemma 4.1.

Consider the algorithm defined in (4.2) and assume B1-B3. Then,

The proof is a modification of Fritz (1973, Theorem 2). See also Frees (1983, Theorem 4.1). The weak convergence of the mode estimator is now stated.

Theorem 4.1.

Consider the algorithm defined in (4.2) and assume B1-B5. Define

B = −A f^{(r)}(θ) ∫_{−1}^{1} (y^r/r!) k(y) dy,  σ² = A²C^{−1} f(θ) ∫_{−1}^{1} k²(y) dy

and {W_n} as in (3.1). Then, for Z(t) defined in (3.3), we have

Remarks: Using the Continuous Mapping Theorem at t = 1 gives the asymptotic normality of the mode estimator in (4.2) when suitably standardized. The rate of convergence, 1/2 − τ = (r−1)/(2r+1), is better than the rate recently given by Eddy (1980, Corollary 2.2) when the same number of bounded derivatives is assumed.
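The rate exponent and the B3/B5 bookkeeping can be tabulated with exact rational arithmetic; the check below only restates the identities appearing in the assumptions.

```python
from fractions import Fraction

def mode_rate(r):
    """Exponent from Theorem 4.1: n^{(r-1)/(2r+1)} (X_n - theta) has a
    nondegenerate limit, so the rate improves toward the parametric
    n^{1/2} as the assumed smoothness r grows."""
    return Fraction(r - 1, 2 * r + 1)

def tau(r):
    """tau of assumption B3: tau = 3/(2(2r+1))."""
    return Fraction(3, 2 * (2 * r + 1))

rates = {r: mode_rate(r) for r in (2, 3, 4, 10)}
```

For every r, 1/2 − τ(r) equals (r−1)/(2r+1) exactly, which is the consistency between B3 and B5 used throughout the section.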
An immediate extension is to find the θ such that f^{(p)}(θ) = a for nonnegative integer p and fixed, known constant a. For asymptotic normality of a multivariate mode estimator see Frees (1983, Theorem 4.4), in which a different method of proof (due to Fabian (1968)) is used.

To get a version of Corollary 3.3, we introduce estimators of
B̂_n, σ̂_n and Ĝ_n that satisfy A6. Let k₀ ∈ M₀, k₂ ∈ M₂, k_r ∈ M_r and define f̂_n^{(s)}(t) = k_s((Z_n − t)/c_n)/c_n^{s+1} for s = 0, 2, r. Now, define

Corollary 4.2.

Consider the algorithm defined in (4.2) and assume B1-B6. Use the estimators in (4.5)-(4.7) to construct {N_d} and {I_n} as in (3.12) and (3.13). Then
The remainder of this section contains the proofs of Theorem 4.1 and Corollary 4.2. We use the following result due to Strassen. (Recall the convention stated in the remark following the assumptions in §2.)

Lemma 4.2 (Strassen, 1967, Theorem 4.4).

Let {e_n, F_{n+1}} be a martingale difference sequence. Define Y_n = Σ_{k=1}^{n} E_{F_k} e_k², S(Y_n) = Σ_{k=1}^{n} e_k, and S(t) by linear interpolation. Let h be a nonnegative, nondecreasing function on [0,∞) such that t^{−1}h(t) is nonincreasing. If

Y_n → ∞ a.s. and   (4.8)

then, without loss of generality, there exists a Brownian Motion process B on [0,∞) such that
Proof of Theorem 4.1: We need to satisfy the assumptions of Theorem 3.1. A1 is true by B1, B4 and B5. Define

Thus, from (4.2) with f(·) = −f^{(1)}(·),

Some easy calculations show that A2 is true with B defined in the statement of the theorem. Note that some of the bias term is in a_n f*(X_n) ((a_n n − A) f*(X_n)) but is asymptotically negligible by assumption B3. Lemma 4.1 satisfies A3. The main work of the theorem is to satisfy A4, for which we will use Lemma 4.2.

Let {ε_n} be as in (4.11) and define Y_n as in Lemma 4.2. Since k ∈ M₁, we have that E_{F_n} f̂_n^{(1)}(X_n) is bounded. Further, for p > 0, by change of variables,

Thus, with p = 2, n^{−2τ} E_{F_n}(f̂_n^{(1)}(X_n))² → C^{−3} f(θ) ∫ k²(y) dy a.s. by Lemma 4.1, B1, B3 and the Dominated Convergence Theorem. Further,

This satisfies (4.8) since Y_n/n → σ² a.s.

To satisfy (4.9), use a conditional version of Hölder's inequality. Let p, q be nonnegative constants such that 2/p + 1/q = 1. Then,
Now, by B3 and (4.14), E_{F_n}(f̂_n^{(1)}(X_n))^p = O(n^{2τ(2p−1)/3}) and, with (4.11),

E_{F_n} |ε_n|^p = O(n^{τ(p−2)/3}).   (4.16)

For some 0 < ε < (1 − 2τ/3)(1 − 2/p), let h(t) = t^{1−ε}. Since Y_n = O(n), we have (h(Y_n))^{−p/2} E_{F_n} |ε_n|^p = O(n^{τ(p−2)/3 − (p/2)(1−ε)}). Requiring that 0 < ε < (1 − 2τ/3)(1 − 2/p) implies that τ(p−2)/3 − (p/2)(1−ε) < −1 and thus (4.9) is satisfied. We thus have (4.10), which is sufficient for A4. ∎

The proof of Corollary 4.2 is similar to the proof of Corollary 3.3 using Theorem 4.1, Theorem 3.2 and the following:

Lemma 4.3.

Consider the algorithm defined in (4.2) and assume B1-B6. Then, the estimators B̂_n, σ̂_n and Ĝ_n defined in (4.5)-(4.7) satisfy A6.

Proof: We prove only B̂_n = B + o(1) as the others are similar. Sufficient for this is
where |η(y) − X_n| ≤ c_n, by a Taylor-series expansion and since k_r ∈ M_r. By B6,

by Lemma 4.1 and the Dominated Convergence Theorem. By Kronecker's lemma, we have

It is not hard to find a t ∈ (0, 2] such that

(4.19) and (4.20) are sufficient for (4.17) by Theorem 5 of Chow (1965). ∎
§5. MONTE-CARLO SIMULATION

In a simple example, finite sample properties of the procedure introduced in §4 were investigated. The Gamma distribution was used with location and scale parameters 11 and 10, respectively, which produces a mean of 1.1 and standard deviation of .3317. The distribution function and parameters determine a unique θ = 1.

To estimate the derivatives of the density function, we used Legendre polynomials to calculate simple, polynomial kernel estimators. For rates r = 3 and r = 4, these kernel estimators are given in Table 1 below. By using r = 4 and Legendre polynomials, the mode estimator is asymptotically unbiased (B = 0). This fact had a great impact on the performance of the stopping rules, which is described below.

Let A and C be positive constants, and K_A and K_C be nonnegative constants. As first suggested by Dvoretzky (1956), we used
TABLE I. KERNEL FUNCTIONS

r = 3:
  k₀(y) = .375(3 − 5y²) I(−1 ≤ y ≤ 1)
  k(y) = k₁(y) = 1.5y I(−1 ≤ y ≤ 1)
  k₂(y) = 3.75(3y² − 1) I(−1 ≤ y ≤ 1)
  k₃(y) = 26.25(5y³ − 3y) I(−1 ≤ y ≤ 1)

r = 4:
  k(y) = k₁(y) = 1.875(5y − 7y³) I(−1 ≤ y ≤ 1)
  k₂(y) = 3.75(3y² − 1) I(−1 ≤ y ≤ 1)
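Under one natural reading of the class M_s (the sth moment of k_s integrates to s! and the lower-order moments vanish, in the style of Singh (1977)), the Table I kernels can be verified exactly; the fractions below are just the decimal constants of Table I rewritten.

```python
from fractions import Fraction

# Table I kernels as {power: coefficient} maps supported on [-1, 1].
K0_R3 = {0: Fraction(9, 8), 2: Fraction(-15, 8)}     # .375*(3 - 5y^2)
K1_R3 = {1: Fraction(3, 2)}                          # 1.5*y
K2 = {0: Fraction(-15, 4), 2: Fraction(45, 4)}       # 3.75*(3y^2 - 1)
K3_R3 = {1: Fraction(-315, 4), 3: Fraction(525, 4)}  # 26.25*(5y^3 - 3y)
K1_R4 = {1: Fraction(75, 8), 3: Fraction(-105, 8)}   # 1.875*(5y - 7y^3)

def moment(kernel, m):
    """Exact integral of y^m * k(y) over [-1, 1], using
    int_{-1}^{1} y^q dy = 2/(q+1) for even q and 0 for odd q."""
    return sum((c * Fraction(2, p + m + 1) for p, c in kernel.items()
                if (p + m) % 2 == 0), Fraction(0))
```

For example, the r = 4 kernel k₁ has first moment 1 and vanishing third moment, which is what makes the mode estimator asymptotically unbiased (B = 0) in that case.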
{a_n} and {c_n} of the form a_n = A(n + K_A)^{−1} and c_n = C(n + K_C)^{−2τ/3}. Taking K_A to be positive improved the convergence properties of the procedure in finite samples. Choice of the positive parameters A and C is constrained only by B5 (for the above Gamma distribution and r = 4 this implies A > (−2/7)/f^{(2)}(θ) = .024). One criterion for selection, introduced by Abdelhamid (1973), is to choose the A and C that minimize the asymptotic mean square error. For r = 3, some easy calculations show that these optimal values are A ≈ 1.08 and C ≈ .7. Unfortunately, these optimal values are usually not available to the experimenter a priori. Tables 2-4 give results for some alternative parameters A and C (and various choices of K_A, K_C and the initial estimator X₁). In these tables, the criteria for choosing among alternative estimators based on different parameters are the expected bias and mean square error standardized by the sample size.

Another important consideration in finite samples is the choice of the appropriate confidence level (α) and confidence interval width (2d). Based on the above parametrization of the
TABLE II. PERFORMANCE OF ESTIMATORS (Stage of Algorithm: 250, 500, 2000, ∞)
model, it is easy to calculate factors in the asymptotic mean and variance of n^{(r−1)/(2r+1)}(X_n − θ) as given in Theorem 4.1. For r = 4, these are B ≈ 0.0, σ² = .21885 and G = 1.00088. Assume that these values are known, that n^{1/3}(X_n − θ) is distributed normally (not only asymptotically) and that it is desired to have a 95% confidence interval for θ with error no more than .10 (which implies z_{α/2} = 1.96 and d = .10). Based on the remarks preceding Corollary 3.3, we require a sample size at least
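Under the same reading of the deterministic rule n_d used in §3 (an assumption, not a formula quoted from the article), this minimum sample size can be recomputed directly from the constants just listed.

```python
import math

# Section 5 values for r = 4, so tau = 1/6 and 1/2 - tau = 1/3, assumed known.
z, sigma, G, d = 1.96, math.sqrt(0.21885), 1.00088, 0.10
tau = 1.0 / 6.0

# Stop once z * sigma * n^{-(1/2 - tau)} / sqrt(2*(G - 1/2 + tau)) <= d;
# inverting this monotone bound gives the required sample size.
n_required = math.ceil((z * sigma / (d * math.sqrt(2.0 * (G - 0.5 + tau))))
                       ** (1.0 / (0.5 - tau)))
```

With these values the bound works out to roughly n = 500, which squares with the remark that a large sample size is required and with the n = 200 minimum-stop constraint imposed below.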
Even in this simplified version, a large sample size is required for moderately precise results. Because of the large sample nature of the problem, we imposed the somewhat arbitrary constraint that the procedure not be stopped before n = 200.

TABLE III. PERFORMANCE OF ESTIMATORS (Stage of Algorithm: 250, 500, 2000, ∞)
TABLE IV. PERFORMANCE OF ESTIMATORS (Stage of Algorithm: 250, 500, 2000, ∞)

At the 95% level of confidence, Tables 5-7 give the results of using the stopping rule in the SA algorithm for selected values of A, C, K_A, K_C, X₁ and d. The proportion of successes column (PROPN SUCC) is the number of times θ ∈ I_{N_d} divided by the number of times the algorithm was actually stopped. Similarly, averages for the stopping time, N_d, and the bias at N_d were computed only over the runs stopped.
TABLE V. PERFORMANCE OF STOPPING RULES (columns: PROPN STOPPED, PROPN SUCC, AVG N_d, AVG (X_{N_d} − θ))

TABLE VI. PERFORMANCE OF STOPPING RULES (columns: PROPN STOPPED, PROPN SUCC, AVG N_d, AVG (X_{N_d} − θ))
TABLE VII. PERFORMANCE OF STOPPING RULES (r = 3, K_A = 0, K_C = 0; columns: d, PROPN STOPPED, PROPN SUCC, AVG (X_{N_d} − θ), with d ranging over .100, .075, .050)
The following remarks summarize the results of the simulation
study, the details of which can be found in Tables 2-7. Each
simulation is based on 500 independent trials.
(1). The performance of the mode estimator can be characterized as typical of a stochastic approximation estimator. From Tables 2-4, we see that the effects of the initial estimator X1 still persist for large n. The mode estimator performed better when the initial estimator was less than the true mode than when X1 was greater than the mode, an important point in practice. The performance was dramatically enhanced by taking KA to be positive, but the introduction of positive KC only slowed the convergence of the algorithm.
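The recursion that produces these mode estimates is defined earlier in the paper and is not restated here. Purely as an illustrative sketch of the Kiefer-Wolfowitz-type behavior described in remark (1), the following implements a generic finite-difference climb toward the mode of a density, with the shifts KA and KC entering the gain sequences; the constants A, C, gamma, the batch size, and the window-count density estimator are all assumptions for illustration, not the paper's design.

```python
import random

def kw_mode_search(draw, x1, n_steps, A=2.0, C=1.0, KA=0, KC=0,
                   gamma=0.25, batch=50):
    """Kiefer-Wolfowitz-type climb toward the mode of a density.

    Each step forms window-count density estimates at x +/- c_n from a
    fresh batch of observations and moves x along the finite-difference
    estimate of the density derivative.
    """
    x = x1
    for n in range(1, n_steps + 1):
        a_n = A / (n + KA)            # gain sequence; KA shifts its start
        c_n = C / (n + KC) ** gamma   # difference width; KC likewise
        zs = [draw() for _ in range(batch)]
        # density estimates from counts in windows of width c_n
        f_up = sum(abs(z - (x + c_n)) <= c_n / 2 for z in zs) / (batch * c_n)
        f_dn = sum(abs(z - (x - c_n)) <= c_n / 2 for z in zs) / (batch * c_n)
        x += a_n * (f_up - f_dn) / (2 * c_n)   # derivative estimate
    return x

random.seed(1)
# N(2, 1) density has its mode at 2; start below it, as in remark (1)
est = kw_mode_search(lambda: random.gauss(2.0, 1.0), x1=1.0, n_steps=2000)
```

With the initial estimator below the true mode, the recursion drifts upward toward 2 and the remaining error reflects the slowly decaying gains, consistent with the persistence of X1 noted above.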
(2). For r=4, the stopping rule performed well even in cases
in which the mode estimator did poorly. Criteria for judging the
performance of the stopping rules were the proportion of successes and the average bias at the stopping time. When the convergence of the algorithm was extremely poor, the stopping rule performed slightly worse. Thus, the stopping rule should not be relied on as the sole criterion for halting the procedure when the experimenter has no knowledge of the distribution function.

(3). For r=3, the stopping rule performed well on the basis of the average bias at the stopping time. However, the performance was markedly poorer than for r=4 when judged by the proportion of successes criterion. This is due to the fact that to construct the confidence interval we needed to estimate the third derivative of the density at the mode. Even the sample sizes we used were not large enough to get accurate estimators of the bias. Thus, the confidence intervals were much poorer than in the asymptotically unbiased case (r=4).
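The stopping rule evaluated in Tables V-VII is defined earlier in the paper. Purely as a hedged sketch of the fixed-width idea behind the "proportion of successes" and "AVG Nd" criteria, the following simulates a Chow-Robbins-type rule for a sample mean: sampling stops once an asymptotic interval of half-width d is attained, and a run counts as a success when that interval covers the target. The Gaussian setting and the constants z, d, and the pilot size n0 are assumptions for illustration, not the paper's design.

```python
import random
import statistics

def fixed_width_stop(draw, d, z=1.96, n0=10, n_max=10000):
    """Stop at the first n with z * s_n / sqrt(n) <= d; return (n, mean)."""
    xs = [draw() for _ in range(n0)]
    while z * statistics.stdev(xs) / len(xs) ** 0.5 > d and len(xs) < n_max:
        xs.append(draw())
    return len(xs), statistics.fmean(xs)

random.seed(0)
theta = 2.0   # the target being estimated
runs = [fixed_width_stop(lambda: random.gauss(theta, 1.0), d=0.2)
        for _ in range(200)]
# "proportion of successes": [mean - d, mean + d] covers theta at stopping
prop_succ = sum(abs(m - theta) <= 0.2 for _, m in runs) / len(runs)
avg_nd = statistics.fmean(n for n, _ in runs)
```

In this idealized setting the average stopping time sits near (z/d)^2 and the success proportion near the nominal level; remark (3) shows how both degrade when a nuisance quantity, such as the third derivative of the density, must itself be estimated to form the interval.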
Received: October 1984.
Recommended by Jana Jurečková.