This article was downloaded by: [University North Carolina - Chapel Hill]
On: 04 November 2014, At: 19:03
Publisher: Taylor & Francis
Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK
Communications in Statistics - Theory and Methods
Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/lsta20
On the bootstrap and smoothed bootstrap
Suojin Wang
Department of Statistical Science, Southern Methodist University, Dallas, TX, 75275
Published online: 27 Jun 2007.
To cite this article: Suojin Wang (1989) On the bootstrap and smoothed bootstrap, Communications in Statistics - Theory and Methods, 18:11, 3949-3962, DOI: 10.1080/03610928908830134
To link to this article: http://dx.doi.org/10.1080/03610928908830134
COMMUN. STATIST. -THEORY METH. , 18(11), 3949-3962 (1989)
On the Bootstrap and Smoothed Bootstrap
Suojin Wang
Department of Statistical Science Southern Methodist University
Dallas, TX 75275
Key Words and Phrases: Bootstrap; Saddlepoint Approximation; Smoothed Bootstrap.
ABSTRACT
The standard bootstrap and two commonly used types of smoothed bootstrap are
investigated. The saddlepoint approximations are used to evaluate the accuracy of the
three bootstrap estimates of the density of a sample mean. The optimal choice for the
smoothing parameter is obtained when smoothing is useful in reducing the mean squared
error.
1. INTRODUCTION
Suppose that $X_1, \ldots, X_n$ are drawn independently from an unknown but continuous
distribution $F$. The standard bootstrap analysis is based on the empirical distribution $\hat F$,
which is discrete. Because sampling properties from continuous and discrete distributions
could be quite different, it is natural to consider a smoothed bootstrap, i.e., a bootstrap
based on a smoothed version of $\hat F$, which was proposed by Efron (1979); also see Efron
and Gong (1983). Two smoothed bootstraps, one a rescaled version of the other, are
defined in Section 2.
A natural question arises concerning whether the standard bootstrap or a
smoothed bootstrap is preferable. There is no global preference in general. In a recent
article, Silverman and Young (1987) used second-order properties to develop criteria for
determining whether it is advantageous to use the smoothed bootstrap rather than the
standard bootstrap, in the case of estimation of linear functionals or the approximation
of general functionals by linear functionals; see also Young (1988). The criteria
can be used when estimating parameters which are functionals of true distributions. Hall,
Diciccio and Romano (1989) have shown that appropriate smoothing can improve the
convergence rate of a variance estimator. It is often the case, however, that we need not
only the estimates of parameters but also the distributions of these estimates. In
nonparametric analysis, the bootstrap plays an important role in approximating such
distributions.
Copyright © 1989 by Marcel Dekker, Inc.
In this article, we investigate the second-order effects of the smoothed bootstrap
approximations compared with the standard bootstrap approximation. It is usually very
difficult to evaluate the accuracy of the approximations to the distributions of the
estimates partly because these distributions themselves usually involve convolutions. In
some circumstances when the saddlepoint approximations are applicable, however, it
becomes fairly easy. Davison and Hinkley (1988) and Wang (1989) recently extended
saddlepoint approximation theory and applications to resampling problems; also see Reid
(1988) for a general review of applications of the saddlepoint approximations in the
parametric framework. Using the saddlepoint approximations as a powerful tool, Section
2 shows that for the distribution of a sample mean, the ordinary smoothed bootstrap
approximation is better than the standard bootstrap in the regions in which we are
usually more interested. In such cases the optimum choice for the smoothing parameter
is obtained. However, as we will show in Section 3, the rescaled smoothed bootstrap
approximation has the same second-order accuracy as the standard bootstrap.
2. STANDARD AND SMOOTHED BOOTSTRAP DENSITIES
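Let
$$\hat F(x) = n^{-1}\sum_{j=1}^{n} I(X_j \le x), \qquad \hat F_h(x) = n^{-1}\sum_{j=1}^{n} W\{(x - X_j)/(h\hat\sigma)\},$$
and $\hat F_{h,r}(x) = \hat F_h((1 + h^2)^{1/2}x)$, where $I$ is an indicator function, $W(\cdot)$ is a symmetric
distribution function with variance 1 whose density has a continuous derivative, $\hat\sigma^2 =
n^{-1}\sum_j (X_j - \bar X)^2$, and $h$ is a smoothing parameter. Bootstrap estimates with resamples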
from $\hat F$, $\hat F_h$ and $\hat F_{h,r}$ are called the standard, the ordinary smoothed and the rescaled
smoothed bootstrap respectively. Azzalini (1981) studied the second-order effects of the
smoothing in $\hat F_h$ and gave the asymptotically optimum choice for $h$ by comparing the
mean squared error (MSE) of $\hat F_h$ with that of $\hat F$.
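The three resampling schemes can be sketched as follows. This is a minimal illustration rather than the paper's own code: it assumes that $\hat F_h$ is the kernel-smoothed empirical distribution with bandwidth $h\hat\sigma$, it takes a Gaussian kernel for $W$ (any symmetric kernel with unit variance would serve), and the function name `resample` and its arguments are hypothetical.

```python
import numpy as np

def resample(x, h=0.0, rescale=False, size=1, rng=None):
    """Draw `size` bootstrap resamples of length len(x).

    h = 0                -> standard bootstrap (draws from the empirical F-hat)
    h > 0                -> ordinary smoothed bootstrap (draws from F-hat_h)
    h > 0, rescale=True  -> rescaled smoothed bootstrap (draws from F-hat_{h,r})
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    n = x.size
    sigma_hat = x.std()  # divisor n, matching sigma-hat^2 = n^{-1} sum (X_j - X-bar)^2
    draws = rng.choice(x, size=(size, n), replace=True)
    if h > 0:
        # Assumption: Gaussian kernel noise with standard deviation h * sigma_hat.
        draws = draws + h * sigma_hat * rng.standard_normal((size, n))
        if rescale:
            # F-hat_{h,r}(x) = F-hat_h((1 + h^2)^{1/2} x): shrink about the origin.
            draws = draws / np.sqrt(1.0 + h ** 2)
    return draws
```

For data centered at zero (the paper later takes $\mu = 0$ without loss of generality), the smoothed resamples have variance close to $(1 + h^2)\hat\sigma^2$, and the rescaled ones return to roughly $\hat\sigma^2$.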
The exact density of $\sqrt{n}(\bar X - \mu)$ with $\mu = E(X)$,
$$p_n(x) = \frac{d}{dx}\Pr_F\{\sqrt{n}(\bar X - \mu) \le x\}, \qquad (1)$$
is usually difficult to obtain even for known distribution F except for some special cases.
Fortunately, when F is known, saddlepoint formulas provide very accurate
approximations; see Daniels (1954). When F is unknown, bootstrap approximations seem
to be very good alternatives. Singh (1981) provided some theoretical justification for
bootstrapping the distribution functions of the sample mean and its standardized form.
Let $\hat p(x)$, $\hat p_h(x)$ and $\hat p_{h,r}(x)$ be the standard, the ordinary smoothed and the rescaled
smoothed bootstrap estimates of $p_n(x)$, i.e., they are obtained by replacing $F$ in (1) by
$\hat F$, $\hat F_h$ and $\hat F_{h,r}$ respectively. Our goal here is to study the second-order properties of
these smoothed bootstrap estimates compared to the standard estimate. Note that,
strictly speaking, when $F$ is replaced by $\hat F$, which is discrete, the probability function in
(1) is discrete. However, when $n$ is relatively large, the discreteness is negligible and $\hat p(x)$
is defined using finite difference ratios; see Ogbonomwan and Wynn (1988) for details.
Notice that usually $h$ here should be
chosen smaller than in the case of estimating the density of a single $X$, so that the
MSEs of the estimates are not dominated by the large bias. We now compare $\hat p(x)$ and
$\hat p_h(x)$ and leave $\hat p_{h,r}(x)$ to Section 3.
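A finite difference ratio of this kind can be approximated by simulation. The sketch below is an illustration under stated assumptions, not the construction of Ogbonomwan and Wynn: it estimates the bootstrap distribution function of $\sqrt{n}(\bar X^* - \bar X)$ by Monte Carlo and differences it over a window of half-width `eps`. The function name, `eps` and `n_boot` are hypothetical choices.

```python
import numpy as np

def bootstrap_density_fd(x, point, eps=0.2, n_boot=20000, rng=None):
    """Finite-difference estimate of the standard bootstrap density at `point`."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    n = x.size
    # Resample indices with replacement; each row is one bootstrap sample.
    idx = rng.integers(0, n, size=(n_boot, n))
    # Bootstrap analogue of sqrt(n) * (X-bar - mu), centered at the sample mean.
    t = np.sqrt(n) * (x[idx].mean(axis=1) - x.mean())
    upper = np.mean(t <= point + eps)
    lower = np.mean(t <= point - eps)
    return (upper - lower) / (2.0 * eps)
```

The window `eps` trades Monte Carlo noise against finite-difference bias; it plays no role in the paper's asymptotic results.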
Without loss of generality, we assume $\mu = 0$. Furthermore assume that the
moment generating function (MGF) of the underlying distribution $F$ converges in an open
interval containing the origin. The saddlepoint series expansions for $\hat p(x)$ and $\hat p_h(x)$ are
the following:
and
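where $\hat K$ and $\hat K_h$ are the cumulant generating functions (CGF) corresponding to $\hat F$ and
$\hat F_h$, $\hat\lambda$ and $\hat\lambda_h$ are the unique solutions to $\hat K'(\lambda) = x/\sqrt{n}$ and $\hat K_h'(\lambda) = x/\sqrt{n}$ respectively,
$$\hat r_j = \hat K^{(j)}(\hat\lambda)/\{\hat K''(\hat\lambda)\}^{j/2}$$
and
$$\hat r_{h,j} = \hat K_h^{(j)}(\hat\lambda_h)/\{\hat K_h''(\hat\lambda_h)\}^{j/2}.$$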
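The quantities just defined can be computed numerically. The following is a minimal sketch of the leading term of the saddlepoint formula only, $\{2\pi \hat K''(\hat\lambda)\}^{-1/2}\exp[n\{\hat K(\hat\lambda) - \hat\lambda x/\sqrt{n}\}]$, omitting the $O(n^{-1})$ correction in $\hat r_3$ and $\hat r_4$. Several points are assumptions of the sketch and not statements from the paper: the CGF is taken about the sample mean, the kernel $W$ is Gaussian (so smoothing adds $h^2\hat\sigma^2\lambda^2/2$ to $\hat K$), the saddlepoint equation is solved by Newton's method, and all function names are hypothetical.

```python
import numpy as np

def _tilted_cumulants(lam, z, s2, h):
    """Return K(lam), K'(lam), K''(lam) for the (smoothed) empirical CGF."""
    w = np.exp(lam * z)
    m0 = w.mean()
    m1 = (z * w).mean() / m0
    m2 = (z * z * w).mean() / m0
    # Gaussian-kernel assumption: log MGF of the kernel term is h^2 s2 lam^2 / 2.
    k0 = np.log(m0) + 0.5 * h ** 2 * s2 * lam ** 2
    k1 = m1 + h ** 2 * s2 * lam
    k2 = m2 - m1 ** 2 + h ** 2 * s2
    return k0, k1, k2

def saddlepoint_density(x, point, h=0.0, tol=1e-12, max_iter=100):
    """Leading-term saddlepoint approximation to the bootstrap density at `point`.

    h = 0 gives the standard bootstrap; h > 0 the ordinary smoothed bootstrap.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    z = x - x.mean()          # center the data (assumption of this sketch)
    s2 = z.var()              # sigma-hat^2 with divisor n
    target = point / np.sqrt(n)
    lam = target / (s2 * (1.0 + h ** 2))   # normal-theory starting value
    for _ in range(max_iter):
        _, k1, k2 = _tilted_cumulants(lam, z, s2, h)
        step = (k1 - target) / k2          # Newton step for K'(lam) = x / sqrt(n)
        lam -= step
        if abs(step) < tol:
            break
    k0, _, k2 = _tilted_cumulants(lam, z, s2, h)
    return np.exp(n * (k0 - lam * target)) / np.sqrt(2.0 * np.pi * k2)
```

At `point = 0` the saddlepoint is $\hat\lambda = 0$, so the sketch returns $1/\sqrt{2\pi\hat\sigma^2(1+h^2)}$, which gives a quick sanity check on the implementation.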
Let $K(\lambda)$ be the CGF of the true $F$ and let $g_n(x)$ be the saddlepoint formula for the true
density $p_n(x)$, i.e., it is hypothetically obtained from (2) with the true $K$ rather than $\hat K$. By
the classical and bootstrap saddlepoint approximation theory (Wang, 1989) we can prove
the following lemma.
Lemma 1. Assume that the MGF of the underlying distribution of $X$ converges in an
open interval containing the origin. For each fixed $x$,
and
where $\rho_n = \rho^n$ for some $0 < \rho < 1$.
The proof is parallel to that in the case of the cumulative distribution function in
Wang (1989) and is omitted here.
It is easily seen that
and
for j = 0, 1, 2, . . . . Moreover,
Note that here we used the fact that
We now prove:
Lemma 2. Let $\hat p(x)$ be the standard bootstrap estimate of $p_n(x)$. Then
and
where
$$\nu = \frac{1}{\sigma^4}E(X_1 - \mu)^4 - 1 = \frac{n}{\sigma^4}E(\hat\sigma^2 - \sigma^2)^2 + O(n^{-1}). \qquad (5)$$
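Reading (5) as $\nu = E(X_1 - \mu)^4/\sigma^4 - 1$, the quantity is the standardized fourth moment minus one, so a normal distribution gives $\nu = 3 - 1 = 2$. The plug-in estimate below is a hypothetical illustration; the paper defines $\nu$ as a population quantity and does not prescribe an estimator.

```python
import numpy as np

def nu_hat(x):
    """Plug-in estimate of nu = E(X - mu)^4 / sigma^4 - 1 from a sample."""
    x = np.asarray(x, dtype=float)
    z = x - x.mean()
    m2 = np.mean(z ** 2)   # sigma-hat^2 with divisor n
    m4 = np.mean(z ** 4)   # fourth central sample moment
    return m4 / m2 ** 2 - 1.0
```

For a two-point sample such as `[-1, 1]` the fourth and squared second moments coincide, so the estimate is zero; for a large normal sample it should be close to 2.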
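Proof. By expanding $\hat K(\hat\lambda)$ and $K(\lambda_0)$ at 0 and by (4) and its analog for $\lambda_0$, we have
$$n\{\hat K(\hat\lambda) - \hat\lambda x/\sqrt{n}\} = n\{K(\lambda_0) - \lambda_0 x/\sqrt{n}\} + n\{\hat K(\hat\lambda) - \hat\lambda x/\sqrt{n} - K(\lambda_0) + \lambda_0 x/\sqrt{n}\}$$
$$= n\{K(\lambda_0) - \lambda_0 x/\sqrt{n}\} + \frac{x^2}{2}\Big(\frac{1}{\sigma^2} - \frac{1}{\hat\sigma^2}\Big) + E_n + O_p(n^{-3/2}),$$
where
and thus $E(E_n) = O(n^{-3/2})$. It follows that
$$\hat g(x) = g_n(x)\Big(\frac{\sigma^2}{\hat\sigma^2}\Big)^{1/2}\exp\Big\{\frac{x^2}{2}\Big(\frac{1}{\sigma^2} - \frac{1}{\hat\sigma^2}\Big)\Big\}(1 + Q_n) + O_p(n^{-3/2}), \qquad (6)$$
where $Q_n = -D_n/(2\sigma^2) + E_n = O_p(n^{-1})$ and $E(Q_n) = O(n^{-3/2})$ since $E(D_n) = O(n^{-3/2})$.
Therefore,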
and
Applying Lemma 1 to the above formulas concludes Lemma 2.
The relationship between $\hat p_h(x)$ and $\hat p(x)$ is investigated as follows.
Lemma 3. Let $\gamma_n = h^2/n^{3/2}$ and let $\hat p_h(x)$ be the ordinary smoothed bootstrap estimate
of $p_n(x)$. Then
and
Proof. For $\lambda \to 0$,
$$\hat K_h(\lambda) = \hat K(\lambda) + \log\Big\{\int_{-\infty}^{\infty} e^{h\hat\sigma\lambda y}\,w(y)\,dy\Big\}$$
Similarly, by expanding M ~ , ML and M[, we obtain
Moreover,
(3) - j - h 2 h - (- ' ,JO) j2) + op(yn) . (9)
Therefore, from (4), (8), and (9),
hzx2 h 2 ~ ( 3 ) ( ~ ) x3 = n ( K(A)-A& - ) +y- 2,
4iu6
and
i t 1 1 ,( h) = K''(X~) + h2c2 + Op
Substituting (10) and (11) into (3) and using some simple algebra, we get
where
$$\delta_n = h^4 + h^2/n, \qquad (15)$$
and $\phi(\cdot)$ is the standard normal density function. Thus, we have the following theorem.
Theorem 1. If $\hat p(x)$ and $\hat p_h(x)$ are the standard and the ordinary smoothed bootstrap
estimates of $p_n(x)$ defined in (1), then
where
l(1 x y ( . $ l y , n (z ) = 3 a 4 (a)
and $\delta_n$ and $\nu$ are defined as in (15) and (5) respectively. Furthermore, for each $x$ such
that $c_1(x) > 0$ the optimum choice for $h$ is
and in such case
$\mathrm{MSE}\{\hat p_h(x)\} = \mathrm{MSE}\{\hat p(x)\} -$
Theorem 1 says that ordinary smoothing is useful if and only if $c_1(x) > 0$ and the
smoothing parameter is properly chosen. As a symmetric function of $x$, $c_1(x)$ depends on
$\nu$ and is usually positive in the regions of primary interest, which include the quantiles for
constructing confidence intervals and the MLEs. For example, if $X$ is normally distributed
with $\sigma = 1$, then $\nu = 2$ and $c_1(x) > 0$ if $x \in R_+ = [-2.02, -1) \cup [-.75, .75] \cup (1, 2.02]$.
We have found similar results for exponential and many other distributions.
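The effect described by Theorem 1 can be explored with a rough Monte Carlo experiment. The sketch below is only an illustration under strong assumptions: it replaces the two bootstrap densities by their first-order normal approximations, $\hat p(x) \approx \phi(x/\hat\sigma)/\hat\sigma$ and $\hat p_h(x) \approx \phi(x/\hat\sigma_h)/\hat\sigma_h$ with $\hat\sigma_h^2 = (1 + h^2)\hat\sigma^2$, and it draws standard normal data so that the true density of $\sqrt{n}\bar X$ is $\phi(x)$. It does not evaluate the paper's $c_1(x)$, and the function names and default values are hypothetical.

```python
import numpy as np

def normal_approx_density(sigma_hat, point, h=0.0):
    """First-order normal approximation to the (smoothed) bootstrap density."""
    s = sigma_hat * np.sqrt(1.0 + h ** 2)
    return np.exp(-0.5 * (point / s) ** 2) / (s * np.sqrt(2.0 * np.pi))

def mse_pair(point, n=20, h=0.2, n_rep=20000, seed=0):
    """Monte Carlo MSE of the standard and smoothed approximations at `point`."""
    rng = np.random.default_rng(seed)
    samples = rng.standard_normal((n_rep, n))
    sigma_hat = samples.std(axis=1)   # divisor n, one value per simulated sample
    truth = np.exp(-0.5 * point ** 2) / np.sqrt(2.0 * np.pi)
    mse_std = np.mean((normal_approx_density(sigma_hat, point) - truth) ** 2)
    mse_smooth = np.mean((normal_approx_density(sigma_hat, point, h) - truth) ** 2)
    return mse_std, mse_smooth
```

Calling `mse_pair(0.0)` compares the two estimates at the center of the distribution, a point inside the region $R_+$ quoted above. Under this approximation the density is stationary in the scale parameter near $|x| = \sigma$, which is consistent with $\pm 1$ being excluded from $R_+$.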
To examine the size of $\mathrm{MSE}\{\hat p(x)\}$, we calculate the first-order approximation
using Lemma 2 as follows:
3. RESCALED SMOOTHED BOOTSTRAP DENSITY
Now we consider the rescaled smoothed bootstrap estimate $\hat p_{h,r}(x)$ of the density
$p_n(x)$. From the definition in the beginning of Section 2, we see that
Lemma 4. Let $\delta_n = h^4 + h^2/n$ and let $\hat p_{h,r}(x)$ be the rescaled smoothed bootstrap
estimate of $p_n(x)$. Then
and
Proof. By Lemmas 1 and 3, we see that
$$\operatorname{var}\{\hat p_{h,r}(x)\} = (1 + h^2)\operatorname{var}\{\hat g_h((1 + h^2)^{1/2}x)\} + O(\rho_n)$$
$$= (1 + h^2)\operatorname{var}\{\hat g((1 + h^2)^{1/2}x) \cdots\}$$
But, from (6) and because of the error structure, we have
Thus, by expansions similar to (7) we obtain
= rar{ij(x)) + ~ C O V
Therefore, (IS), (17) and Lemma 2 lead to
It is easily obtained from (12) and Lemma 1 that
Lemma 4 is therefore proved by using Lemma 1.
We then have the following comparison:
Theorem 2. Under the same conditions as in Lemma 4,
$$\mathrm{MSE}\{\hat p_{h,r}(x)\} = \mathrm{MSE}\{\hat p(x)\} + o(\delta_n).$$
This completes our proof. □
The above equation indicates that the rescaled smoothed bootstrap and the
standard bootstrap estimates of the density of a sample mean have the same asymptotic
accuracy to the second order.
CONCLUSIONS
In this paper we have studied the second-order properties of the ordinary smoothed
and the rescaled smoothed bootstrap density estimators. Ordinary smoothing can reduce
the MSE, but care is needed in such application. Rescaled smoothing usually has smaller
effects. In any event both smoothing methods have their value in obtaining smooth
estimators.
ACKNOWLEDGEMENTS
This research was motivated by discussions with Professor David Hinkley. It was
supported in part by DARPA/AFGL contract No. F19628-88-K-0042.
REFERENCES
Azzalini, A. (1981). A note on the estimation of a distribution function and quantiles by
a kernel method. Biometrika, 68, 326-328.
Daniels, H. E. (1954). Saddlepoint approximations in statistics. Ann. Math. Statist., 25,
631-650.
Davison, A. C. and Hinkley, D. V. (1988). Saddlepoint approximations in resampling
methods. Biometrika, 75, 417-431.
Efron, B. (1979). Bootstrap methods: another look at the jackknife. Ann. Statist., 7, 1-26.
Efron, B. and Gong, G. (1983). A leisurely look at the bootstrap, the jackknife, and
cross-validation. The American Statistician, 37, 36-48.
Hall, P., Diciccio, T. J. and Romano, J. P. (1989). On smoothing and the bootstrap.
Ann. Statist., 17, 692-705.
Ogbonomwan, S. M. and Wynn, H. P. (1988). Resampling generated likelihood.
Statistical Decision Theory and Related Topics IV, Vol. 1, 133-147, S. S. Gupta,
J. O. Berger (eds.) Springer-Verlag, New York.
Reid, N. (1988). Saddlepoint methods and statistical inference (with Discussion).
Statist. Sci., 3, 213-238.
Silverman, B. W. and Young, G. A. (1987). The Bootstrap: to smooth or not to smooth?
Biometrika, 74, 469-479.
Singh, K. (1981). On the asymptotic accuracy of Efron's bootstrap. Ann. Statist., 9,
1187-1195.
Wang, S. (1990). Saddlepoint approximations in resampling analysis. Ann. Inst. Statist.
Math., 42, to appear.
Young, G. A. (1988). A note on bootstrapping the correlation coefficient. Unpublished.
Received September 1989.
Recommended by D. B. Owen, Southern Methodist University, Dallas, TX.
Refereed Anonymously.