This article was downloaded by: [University North Carolina - Chapel Hill] On: 04 November 2014, At: 19:03 Publisher: Taylor & Francis Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK Communications in Statistics - Theory and Methods Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/lsta20 On the bootstrap and smoothed bootstrap Suojin Wang a a Department of Statistical Science , Southern Methodist University , Dallas, TX, 75275 Published online: 27 Jun 2007. To cite this article: Suojin Wang (1989) On the bootstrap and smoothed bootstrap, Communications in Statistics - Theory and Methods, 18:11, 3949-3962, DOI: 10.1080/03610928908830134 To link to this article: http://dx.doi.org/10.1080/03610928908830134 PLEASE SCROLL DOWN FOR ARTICLE Taylor & Francis makes every effort to ensure the accuracy of all the information (the “Content”) contained in the publications on our platform. However, Taylor & Francis, our agents, and our licensors make no representations or warranties whatsoever as to the accuracy, completeness, or suitability for any purpose of the Content. Any opinions and views expressed in this publication are the opinions and views of the authors, and are not the views of or endorsed by Taylor & Francis. The accuracy of the Content should not be relied upon and should be independently verified with primary sources of information. Taylor and Francis shall not be liable for any losses, actions, claims, proceedings, demands, costs, expenses, damages, and other liabilities whatsoever or howsoever caused arising directly or indirectly in connection with, in relation to or arising out of the use of the Content. This article may be used for research, teaching, and private study purposes. 
Any substantial or systematic reproduction, redistribution, reselling, loan, sub-licensing, systematic supply, or distribution in any form to anyone is expressly forbidden. Terms & Conditions of access and use can be found at http:// www.tandfonline.com/page/terms-and-conditions


COMMUN. STATIST.-THEORY METH., 18(11), 3949-3962 (1989)

On the Bootstrap and Smoothed Bootstrap

Suojin Wang

Department of Statistical Science Southern Methodist University

Dallas, TX 75275

Key Words and Phrases: Bootstrap; Saddlepoint Approximation; Smoothed Bootstrap.

ABSTRACT

The standard bootstrap and two commonly used types of smoothed bootstrap are

investigated. The saddlepoint approximations are used to evaluate the accuracy of the

three bootstrap estimates of the density of a sample mean. The optimal choice for the

smoothing parameter is obtained when smoothing is useful in reducing the mean squared

error.

1. INTRODUCTION

Suppose that $X_1, \ldots, X_n$ are drawn independently from an unknown but continuous

distribution F. The standard bootstrap analysis is based on the empirical distribution

which is discrete. Because sampling properties from continuous and discrete distributions

could be quite different, it is natural to consider a smoothed bootstrap, i.e., a bootstrap

based on a smoothed version of $\hat F$, which was proposed by Efron (1979); also see Efron

and Gong (1983). Two smoothed bootstraps, one a rescaled version of the other, are

defined in Section 2.

A natural question arises concerning whether the standard bootstrap or a

smoothed bootstrap is preferable. There is no global preference in general. In a recent

article, Silverman and Young (1987) developed criteria in the case of estimation of linear

functionals or the approximation of general functionals by linear functionals for

determining whether it is advantageous to use the smoothed bootstrap rather than the

standard bootstrap using the second-order properties; see also Young (1988). The criteria

Copyright © 1989 by Marcel Dekker, Inc.

can be used when estimating parameters which are functionals of true distributions. Hall,

Diciccio and Romano (1989) have shown that appropriate smoothing can improve the

convergence rate of a variance estimator. It is often the case, however, that we not only

need the estimates of parameters, but also the distributions of these estimates. In

nonparametric analysis, the bootstrap plays an important role in approximating such

distributions.

In this article, we investigate the second-order effects of the smoothed bootstrap

approximations compared with the standard bootstrap approximation. It is usually very

difficult to evaluate the accuracy of the approximations to the distributions of the

estimates partly because these distributions themselves usually involve convolutions. In

some circumstances when the saddlepoint approximations are applicable, however, it

becomes fairly easy. Davison and Hinkley (1988) and Wang (1989) recently extended

saddlepoint approximation theory and applications to resampling problems; also see Reid

(1988) for a general review of applications of the saddlepoint approximations in the

parametric framework. Using the saddlepoint approximations as a powerful tool, Section

2 shows that for the distribution of a sample mean, the ordinary smoothed bootstrap

approximation is better than the standard bootstrap in the regions in which we are

usually more interested. In such cases the optimum choice for the smoothing parameter

is obtained. However, as we will show in Section 3, the rescaled smoothed bootstrap

approximation has the same second-order accuracy as the standard bootstrap.

2. STANDARD AND SMOOTHED BOOTSTRAP DENSITIES

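Let

$$\hat F(x) = n^{-1}\sum_{j=1}^{n} I(X_j \le x), \qquad \hat F_h(x) = n^{-1}\sum_{j=1}^{n} W\!\left(\frac{x - X_j}{h\hat\sigma}\right),$$

and $\hat F_{h,r}(x) = \hat F_h((1 + h^2)^{1/2}x)$, where $I$ is an indicator function, $W(\cdot)$ is a symmetric

distribution function with variance 1 whose density has a continuous derivative,

$\hat\sigma^2 = n^{-1}\sum_j (X_j - \bar X)^2$, and $h$ is a smoothing parameter. Bootstrap estimates with resamples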

from $\hat F$, $\hat F_h$ and $\hat F_{h,r}$ are called the standard, the ordinary smoothed and the rescaled smoothed bootstrap, respectively. Azzalini (1981) studied the second-order effects of the

smoothing in $\hat F_h$ and gave the asymptotically optimum choice for $h$ by comparing the

mean squared error (MSE) of $\hat F_h$ with that of $\hat F$.
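
As a concrete illustration of the three resampling schemes, the sketch below draws values from the empirical distribution, from its kernel-smoothed version (a resampled observation plus $h\hat\sigma$ times kernel noise), and from the rescaled version (the smoothed draw divided by $(1 + h^2)^{1/2}$). It assumes a standard normal kernel for $W$, which is only one admissible symmetric unit-variance choice, and the function name `draw_resample` and its arguments are hypothetical, not taken from the paper.

```python
import numpy as np


def draw_resample(x, size, rng, h=0.0, rescale=False):
    """Draw bootstrap values from the empirical, smoothed, or rescaled smoothed d.f.

    h = 0                -> standard bootstrap (resample the data with replacement)
    h > 0                -> ordinary smoothed bootstrap: X_J + h * sigma_hat * eps
    h > 0, rescale=True  -> rescaled version: smoothed draw / sqrt(1 + h^2)
    The kernel W is ASSUMED to be standard normal (illustrative choice only).
    """
    x = np.asarray(x, dtype=float)
    sigma_hat = x.std()  # ddof=0, i.e. sqrt(n^{-1} * sum (x_j - xbar)^2)
    draws = x[rng.integers(0, x.size, size=size)]
    if h > 0:
        draws = draws + h * sigma_hat * rng.standard_normal(size)
        if rescale:
            draws = draws / np.sqrt(1.0 + h**2)
    return draws


rng = np.random.default_rng(0)
x = rng.standard_normal(30)
standard = draw_resample(x, (2000, x.size), rng)
smoothed = draw_resample(x, (2000, x.size), rng, h=0.3)
rescaled = draw_resample(x, (2000, x.size), rng, h=0.3, rescale=True)
# Pooled variances: roughly sigma_hat^2, sigma_hat^2 * (1 + h^2), and sigma_hat^2.
print(standard.var(), smoothed.var(), rescaled.var())
```

The printed variances show why the rescaled version exists: ordinary smoothing inflates the resampling variance by the factor $1 + h^2$, and the rescaling removes that inflation.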

The exact density, $p_n(x)$ in (1), of $\sqrt{n}(\bar X - \mu)$ with $\mu = E(X)$

is usually difficult to obtain even for known distribution F except for some special cases.

Fortunately, when F is known, saddlepoint formulas provide very accurate

approximations; see Daniels (1954). When F is unknown, bootstrap approximations seem

to be very good alternatives. Singh (1981) provided some theoretical justification for

bootstrapping the distribution functions of the sample mean and its standardized form.

Let $\hat p(x)$, $\hat p_h(x)$ and $\hat p_{h,r}(x)$ be the standard, the ordinary smoothed and the rescaled

smoothed bootstrap estimates of $p_n(x)$, i.e., they are obtained by replacing $F$ in (1) by

$\hat F$, $\hat F_h$ and $\hat F_{h,r}$, respectively. Our goal here is to study the second-order properties of

these smoothed bootstrap estimates compared to the standard estimate. Note that,

strictly speaking, when $F$ is replaced by $\hat F$, which is discrete, the probability function in

(1) is discrete. However, when $n$ is relatively large, the discreteness is negligible and $\hat p(x)$

is defined using finite difference ratios;

see Ogbonomwan and Wynn (1988) for details. Notice that usually $h$ here should be

chosen smaller than in the case of estimating the density of a single $X$, so that the

MSEs of the estimates are not dominated by a large bias. We now compare $\hat p(x)$ and

$\hat p_h(x)$ and leave $\hat p_{h,r}(x)$ to Section 3.
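
One simple way to picture a finite-difference definition of the standard bootstrap density is by simulation: generate $\sqrt{n}(\bar X^* - \bar X)$ many times and take a central difference ratio of its simulated distribution function. The sketch below does this. The window half-width `delta`, the number of resamples `B`, and the function name are assumptions made for illustration; the construction in Ogbonomwan and Wynn (1988) may differ in detail.

```python
import numpy as np


def bootstrap_density_fd(x, point, delta=0.2, B=20000, seed=0):
    """Central finite-difference estimate of the bootstrap density of
    sqrt(n) * (resample mean - sample mean) at `point` (Monte Carlo sketch)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    rng = np.random.default_rng(seed)
    resamples = x[rng.integers(0, n, size=(B, n))]
    t = np.sqrt(n) * (resamples.mean(axis=1) - x.mean())
    # [G(point + delta) - G(point - delta)] / (2 * delta), G = simulated d.f.
    upper = np.mean(t <= point + delta)
    lower = np.mean(t <= point - delta)
    return (upper - lower) / (2.0 * delta)


x = np.random.default_rng(1).standard_normal(50)
est = bootstrap_density_fd(x, point=0.0)
# First-order normal approximation to the same density at 0, for comparison.
normal_value = 1.0 / (x.std() * np.sqrt(2.0 * np.pi))
print(est, normal_value)
```

With a moderate sample size the two printed numbers should be close, which is the sense in which the discreteness of the resampling distribution is negligible for the sample mean.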

Without loss of generality, we assume $\mu = 0$. Furthermore assume that the

moment generating function (MGF) of the underlying distribution $F$ converges in an open

interval containing the origin. The saddlepoint series expansions for $\hat p(x)$ and $\hat p_h(x)$ are

the following:

and

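where $\hat K$ and $\hat K_h$ are the cumulant generating functions (CGF) corresponding to $\hat F$ and

$\hat F_h$, $\hat\lambda$ and $\hat\lambda_h$ are the unique solutions to $\hat K'(\lambda) = x/\sqrt{n}$ and $\hat K_h'(\lambda) = x/\sqrt{n}$ respectively,

$$\hat r_j = \hat K^{(j)}(\hat\lambda)/\{\hat K''(\hat\lambda)\}^{j/2}$$

and

$$\hat r_{hj} = \hat K_h^{(j)}(\hat\lambda_h)/\{\hat K_h''(\hat\lambda_h)\}^{j/2}.$$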
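
For orientation, the classical saddlepoint density expansion of Daniels (1954) for $\sqrt{n}\bar X$ can be written for a generic CGF $K$ whose saddlepoint $\lambda$ solves $K'(\lambda) = x/\sqrt{n}$, as below. This is a sketch of the standard result under the usual smoothness conditions, stated with generic symbols ($g$, $K$, $\lambda$, $r_j$); the remainder order shown is the one for the classical continuous case and is not taken from the text, and the bootstrap versions require the additional arguments of the cited references.

```latex
g(x) = \{2\pi K''(\lambda)\}^{-1/2}
       \exp\!\left[ n\left\{ K(\lambda) - \lambda x/\sqrt{n} \right\} \right]
       \left\{ 1 + \frac{1}{n}\left( \frac{r_4}{8} - \frac{5 r_3^2}{24} \right) + O(n^{-2}) \right\},
\qquad
r_j = \frac{K^{(j)}(\lambda)}{\{K''(\lambda)\}^{j/2}} .
```

Replacing $K$ by the CGF of the empirical or smoothed empirical distribution gives the corresponding bootstrap formulas in the same notation.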

Let $K(\lambda)$ be the CGF of the true $F$ and let $g_n(x)$ be the saddlepoint formula for the true

density $p_n(x)$, i.e., it is hypothetically obtained from (2) with the true $K$ rather than $\hat K$. By

the classical and bootstrap saddlepoint approximation theory (Wang, 1989) we can prove

the following lemma.

Lemma 1. Assume that the MGF of the underlying distribution of $X$ converges in an

open interval containing the origin. For each fixed $x$,

and

where $\rho_n = \rho^n$ for some $0 < \rho < 1$.

The proof is parallel to that in the case of the cumulative distribution function in

Wang (1989) and is omitted here.
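
To make the saddlepoint quantities concrete, the following sketch evaluates the leading term of the saddlepoint approximation for the bootstrap density of $\sqrt{n}(\bar X^* - \bar X)$, using the empirical CGF $\hat K(\lambda) = \log\{n^{-1}\sum_j e^{\lambda X_j}\}$ of the centred data. It is a minimal illustration under stated assumptions, not the paper's procedure: the $O(n^{-1})$ correction factor is omitted, the data are centred at $\bar X$, and for $h > 0$ a standard normal kernel is assumed so that the smoothed CGF is $\hat K(\lambda) + h^2\hat\sigma^2\lambda^2/2$. The function name is hypothetical.

```python
import numpy as np


def empirical_saddlepoint_density(x, point, h=0.0, tol=1e-12, max_iter=100):
    """Leading-term saddlepoint approximation to the bootstrap density of
    sqrt(n) * (mean* - mean) at `point`; h > 0 ASSUMES a Gaussian kernel."""
    z = np.asarray(x, dtype=float)
    z = z - z.mean()                      # centre so the resampling mean is 0
    n = z.size
    s2 = z.var()                          # sigma_hat^2 with divisor n
    target = point / np.sqrt(n)
    if h == 0.0 and not (z.min() < target < z.max()):
        raise ValueError("point / sqrt(n) must lie strictly inside the data range")

    def cgf_derivs(lam):
        a = lam * z
        m = a.max()
        w = np.exp(a - m)                 # stabilised exponential tilting weights
        s = w.sum()
        p = w / s
        k0 = m + np.log(s / n) + 0.5 * h**2 * s2 * lam**2   # K(lam)
        mean = p @ z
        k1 = mean + h**2 * s2 * lam                          # K'(lam)
        k2 = p @ (z - mean) ** 2 + h**2 * s2                 # K''(lam)
        return k0, k1, k2

    lam = 0.0
    k0, k1, k2 = cgf_derivs(lam)
    for _ in range(max_iter):             # damped Newton for K'(lam) = target
        step = (k1 - target) / k2
        if abs(step) < tol:
            break
        while True:
            # Halve the step until the convex objective K(lam) - lam*target decreases.
            new = lam - step
            n0, n1, n2 = cgf_derivs(new)
            if n0 - new * target <= k0 - lam * target or abs(step) < tol:
                break
            step *= 0.5
        lam, k0, k1, k2 = new, n0, n1, n2
    return np.exp(n * (k0 - lam * target)) / np.sqrt(2.0 * np.pi * k2)
```

At `point = 0` the saddlepoint is $\lambda = 0$ and the function returns $\{2\pi\hat\sigma^2(1 + h^2)\}^{-1/2}$, so the flattening effect of ordinary smoothing at the centre of the distribution can be read off directly.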

It is easily seen that

and

for $j = 0, 1, 2, \ldots$. Moreover,

Note that here we used the fact that

We now prove:

Lemma 2. Let $\hat p(x)$ be the standard bootstrap estimate of $p_n(x)$. Then

and

where

$$\nu = \frac{E(X_1 - \mu)^4}{\sigma^4} - 1 = \frac{n E(\hat\sigma^2 - \sigma^2)^2}{\sigma^4} + O(n^{-1}). \qquad (5)$$

Proof. By expanding $\hat K(\hat\lambda)$ and $K(\lambda_0)$ at 0 and by (4) and its analog for $\lambda_0$, we have

B ( i ) - A& - K ( A ~ ) + A ~ & )

= n K(AO) - XO& - 6'Ia - u2A; ( ) ( = .(X(, - Aoe) + f (5 - $) + En + op(i3/') ,

where

and thus $E(E_n) = O(n^{-3/2})$. It follows that

( U' )'I2 ex.(.: (3 - $) + ~ n } + ~,-,(n-'") W = an(.)

= go(') f EXP @ (5 - $)} ( I + Qn) + ~ ~ ( h ~ " ) 7 (6)

where $Q_n = -D_n/(2\sigma^2) + E_n = O_p(n^{-1})$ and $E(Q_n) = O(n^{-3/2})$ since $E(D_n) = O(n^{-3/2})$.

Therefore,

and

Applying Lemma 1 to the above formulas proves Lemma 2.

The relationship between $\hat p_h(x)$ and $\hat p(x)$ is investigated as follows.

Lemma 3. Let $\gamma_n = h^2/n^{3/2}$ and let $\hat p_h(x)$ be the ordinary smoothed bootstrap estimate

of $p_n(x)$. Then

and

Proof. For $\lambda \to 0$,

$$\hat K_h(\lambda) = \hat K(\lambda) + \log\left\{\int_{-\infty}^{\infty} e^{h\hat\sigma\lambda y} w(y)\,dy\right\}$$
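
The logarithmic term can be expanded explicitly. A minimal sketch, assuming in addition that $W$ has a moment generating function $M_W$ that is finite near the origin, with fourth cumulant $\kappa_{4,W}$ (a symbol introduced here only for illustration): because $W$ is symmetric with variance 1, its odd cumulants vanish, so

```latex
\log \int_{-\infty}^{\infty} e^{h\hat\sigma\lambda y} w(y)\,dy
  = \log M_W(h\hat\sigma\lambda)
  = \tfrac{1}{2} h^2 \hat\sigma^2 \lambda^2
    + \tfrac{1}{24}\,\kappa_{4,W}\, h^4 \hat\sigma^4 \lambda^4 + \cdots ,
\qquad\text{hence}\qquad
\hat K_h'(\lambda) \approx \hat K'(\lambda) + h^2 \hat\sigma^2 \lambda ,
\quad
\hat K_h''(\lambda) \approx \hat K''(\lambda) + h^2 \hat\sigma^2 .
```

For a standard normal kernel the series stops after the first term, and $\hat K_h(\lambda) = \hat K(\lambda) + h^2\hat\sigma^2\lambda^2/2$ holds exactly.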

Similarly, by expanding M ~ , ML and M[, we obtain

Moreover,

(3) - j - h 2 h - (- ' ,JO) j2) + op(yn) . (9)

Therefore, from (4), (8), and (9),

hzx2 h 2 ~ ( 3 ) ( ~ ) x3 = n ( K(A)-A& - ) +y- 2,

4iu6

and

i t 1 1 ,( h) = K''(X~) + h2c2 + Op

Substituting (10) and (11) into (3) and using some simple algebra, we get

where

$$\delta_n = h^4 + h^2/n, \qquad (15)$$

and $\phi(\cdot)$ is the standard normal density function. Thus, we have the following theorem.

Theorem 1. If $\hat p(x)$ and $\hat p_h(x)$ are the standard and the ordinary smoothed bootstrap

estimates of $p_n(x)$ defined in (1), then

where

l(1 x y ( . $ l y , n (z ) = 3 a 4 (a)

and $\delta_n$ and $\nu$ are defined as in (15) and (5) respectively. Furthermore, for each $x$ such

that $c_1(x) > 0$ the optimum choice for $h$ is

and in such case

$\mathrm{MSE}\{\hat p_h(x)\} = \mathrm{MSE}\{\hat p(x)\}$ -

Theorem 1 says that ordinary smoothing is useful if and only if $c_1(x) > 0$ and the

smoothing parameter is properly chosen. As a symmetric function of $x$, $c_1(x)$ depends on

$\nu$ and is usually positive in the regions of primary interest, which include the quantiles for

constructing confidence intervals and the MLEs. For example, if $X$ is normally distributed

with $\sigma = 1$, then $\nu = 2$ and $c_1(x) > 0$ if $x \in R_+ = [-2.02, -1) \cup [-0.75, 0.75] \cup (1, 2.02]$.

We have found similar results for exponential and many other distributions.
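
The value $\nu = 2$ in the normal example can be checked directly, reading (5) as $\nu = E(X_1 - \mu)^4/\sigma^4 - 1 = nE(\hat\sigma^2 - \sigma^2)^2/\sigma^4 + O(n^{-1})$. The sketch below uses only two standard facts about normal samples: the fourth central moment is $3\sigma^4$, and $n\hat\sigma^2/\sigma^2$ has a $\chi^2_{n-1}$ distribution, so that $\hat\sigma^2$ has variance $2(n-1)\sigma^4/n^2$ and bias $-\sigma^2/n$.

```latex
\nu = \frac{E(X_1-\mu)^4}{\sigma^4} - 1 = \frac{3\sigma^4}{\sigma^4} - 1 = 2,
\qquad
\frac{n\,E(\hat\sigma^2-\sigma^2)^2}{\sigma^4}
  = n\left\{ \frac{2(n-1)}{n^2} + \frac{1}{n^2} \right\}
  = 2 - \frac{1}{n}
  = \nu + O(n^{-1}).
```

Both expressions for $\nu$ therefore agree up to the stated $O(n^{-1})$ term in the normal case.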

To examine the size of MSE$\{\hat p(x)\}$, we calculate the first order approximation

using Lemma 2 as follows:

3. RESCALED SMOOTHED BOOTSTRAP DENSITY

Now we consider the rescaled smoothed bootstrap estimate $\hat p_{h,r}(x)$ of the density

$p_n(x)$. From the definition in the beginning of Section 2, we see that

Lemma 4. Let $\delta_n = h^4 + h^2/n$ and let $\hat p_{h,r}(x)$ be the rescaled smoothed bootstrap

estimate of $p_n(x)$. Then

and

Proof. By Lemmas 1 and 3, we see that

var{ph,Jx)') = (1 + h2) var {gh ((I + h?'12 Y. + O (pn)

= (1 + h2) var {g ((1 + h y J 2 x

)I

But, from (6) and because of the error structure, we have

Thus, by expansions similar to (7) we obtain

= rar{ij(x)) + ~ C O V

Therefore, (IS), (17) and Lemma 2 lead to

It is easily obtained from (12) and Lemma 1 that

Lemma 4 is therefore proved by using Lemma 1.

We then have the following comparison:

Theorem 2. Under the same conditions as in Lemma 4,

$$\mathrm{MSE}\{\hat p_{h,r}(x)\} = \mathrm{MSE}\{\hat p(x)\} + o(\delta_n).$$

This completes our proof. □

The above equation indicates that the rescaled smoothed bootstrap and the

standard bootstrap estimates of the density of a sample mean have the same asymptotic

accuracy to the second order.

CONCLUSIONS

In this paper we have studied the second-order properties of the ordinary smoothed

and the rescaled smoothed bootstrap density estimators. Ordinary smoothing can reduce

the MSE, but care is needed in such application. Rescaled smoothing usually has smaller

effects. In any event both smoothing methods have their value in obtaining smooth

estimators.

ACKNOWLEDGEMENTS

This research was motivated by discussions with Professor David Hinkley. It was

supported in part by DARPA/AFGL contract No. F19628-88-K-0042.

REFERENCES

Azzalini, A. (1981). A note on the estimation of a distribution function and quantiles by

a kernel method. Biometrika, 68, 326-328.

Daniels, H. E. (1954). Saddlepoint approximations in statistics. Ann. Math. Statist., 25,

631-650.

Davison, A. C. and Hinkley, D. V. (1988). Saddlepoint approximations in resampling

methods. Biometrika, 75, 417-431.

Efron, B. (1979). Bootstrap methods: another look at the jackknife. Ann. Statist., 7, 1-26.

Efron, B. and Gong, G. (1983). A leisurely look at the bootstrap, the jackknife, and

cross-validation. The American Statistician, 37, 36-48.

Hall, P., Diciccio, T. J. and Romano, J. P. (1989). On smoothing and the bootstrap.

Ann. Statist., 17, 692-705.

Ogbonomwan, S. M. and Wynn, H. P. (1988). Resampling generated likelihood.

Statistical Decision Theory and Related Topics IV, Vol. 1, 133-147, S. S. Gupta,

J. O. Berger (eds.) Springer-Verlag, New York.

Reid, N. (1988). Saddlepoint methods and statistical inference (with Discussion).

Statist. Sci., 3, 213-238.

Silverman, B. W. and Young, G. A. (1987). The Bootstrap: to smooth or not to smooth?

Biometrika, 74, 469-479.

Singh, K. (1981). On the asymptotic accuracy of Efron's bootstrap. Ann. Statist., 9,

1187-1195.

Wang, S. (1990). Saddlepoint approximations in resampling analysis. Ann. Inst. Statist.

Math., 42, to appear.

Young, G. A. (1988). A note on bootstrapping the correlation coefficient. Unpublished.

Received September 1989.

Recommended by D. B. Owen, Southern Methodist University, Dallas, TX.

Refereed Anonymously.
