Methodology and Theory for the Bootstrap

P. Hall



Abstract

A brief account is given of the methodology and theory for the bootstrap. Methodology is developed in the context of the equation approach, which allows attention to be focussed on specific criteria for excellence, such as coverage error of a confidence interval or expected value of a bias-corrected estimator. This approach utilizes a definition of the bootstrap in which the key component is replacing a true distribution function by its empirical estimator. Our theory is Edgeworth-expansion based, and is aimed specifically at elucidating properties of different methods for constructing bootstrap confidence intervals in a variety of settings. The reader interested in more detail than can be provided here is referred to the recent monograph of Hall (1992).

1. Introduction

A broad interpretation of bootstrap methods argues that they are defined by replacing an unknown distribution function, $F$, by its empirical estimator, $\hat F$, in a functional form for an unknown quantity of interest. From this standpoint, the individual who first suggested that a population mean,

$$\mu = \int x \, dF(x),$$

    could be estimated by the sample mean,

$$\bar X = \int x \, d\hat F(x),$$
was using the bootstrap. We tend to favour this definition, although we appreciate that there are alternative views.

Perhaps the most common alternative is to confer the name "bootstrap" on procedures that use Monte Carlo methods to effect a numerical approximation. While we see that this does have its merits, we would argue against it on two grounds. First, it is sometimes convenient to draw a distinction between the essentially statistical argument that leads to the substitution or "plug-in" method described in the previous paragraph, and the essentially numerical argument that employs a Monte Carlo approximation to calculate a functional of $\hat F$. There do exist statistical procedures which marry the numerical simulation and statistical estimation into one operation, where the simulation is regarded as primarily a statistical feature. Monte Carlo testing is one such procedure; see for example


Barnard (1963), Hope (1968) and Marriott (1979). Our definition of the bootstrap would not regard Monte Carlo testing as a bootstrap procedure. That may be seen as either an advantage or a disadvantage, depending on one's view.

A second objection that one may have to defining the bootstrap strictly in terms of whether or not Monte Carlo methods are employed is that the method of numerical computation becomes intrinsic to the definition. To cite an extreme case, one would not usually think of using Monte Carlo methods to compute a sample mean or variance, but nevertheless those quantities might reasonably be regarded as bootstrap estimators of the population mean and variance, respectively. In a less obvious instance, estimators of bootstrap distribution functions, which would usually be candidates for approximation by Monte Carlo methods, may sometimes be computed most effectively by exact, non-Monte Carlo methods. See for example Fisher and Hall (1991). In other settings, saddlepoint methods provide excellent alternatives to simulation; see Davison and Hinkley (1988) and Reid (1988). Does a technique stop being a bootstrap method as soon as non-Monte Carlo methods are employed? To argue that it does seems unnecessarily pedantic, but to deny that it does would cause some problems for a bootstrap definition based on the notion of simulation.

The name "bootstrap" was introduced by Efron (1979), and it is appropriate here to emphasize the fundamental contributions that he made. As Efron was careful to point out, bootstrap methods (in the sense of replacing $F$ by $\hat F$) had been around for many years before his seminal paper. But he was perhaps the first to perceive the enormous breadth of this class of methods. He saw too that the power of modern computing machinery could be harnessed to allow functionals of $\hat F$ to be computed in very diverse circumstances. The combination of these two observations is extremely powerful, and its ultimate effect on Statistics will be revolutionary. Necessarily, these two observations go together; the vast range of applications of bootstrap methods would not be possible without a facility for extremely rapid simulation. However, that fact does not imply that bootstrap methods are restricted to situations where simulation is employed for calculation.

Statistical scientists who thought along lines similar to Efron include Hartigan (1969, 1971), who used resampled sub-samples to construct point and interval estimators, and who stressed connections with Mahalanobis' interpenetrating samples and the jackknife of Quenouille (1949, 1956) and Tukey (1958); and Simon (1969, Chapters 23-25), who described a variety of Monte Carlo methods.

Let us accept, for the sake of argument, that bootstrap methods are defined by the "replace $F$ by $\hat F$" rule, described above. Two challenges immediately emerge in response to this definition. First, we must determine how to focus this concept, so as to make the bootstrap responsive to statistical demands. That is, how do we decide which functionals of $F$ should be estimated? This requires a principle that enables us to implement bootstrap methods in a range of circumstances. The second challenge is that of calculating the values of those functionals in a practical setting. The latter problem may be solved partly by providing simulation methods or related devices, such as saddlepoint arguments, for numerical approximation. Space limitations mean that a thorough account of these techniques is beyond the scope of this chapter. However, a detailed account of efficient methods of bootstrap simulation may be found in Appendix II of Hall (1992). A key part of the answer to the first question is the development of theory describing the relative performance of different forms of the bootstrap, and that issue will be addressed at some length here.

Our answer to the first question is provided in Section 2, where we describe an equation approach to focussing attention on specific statistical questions. This technique was discussed in more detail by Hall and Martin (1988), Martin (1989) and Hall (1992, Chapter 1). It leads naturally to bootstrap iteration, which is discussed in Section 3. Section 4 presents theory that enables comparisons to be made of different bootstrap approaches to inference about distributions. The reader is referred to Hinkley (1988) and DiCiccio and Romano (1988) for excellent reviews of bootstrap methods.

Our discussion is necessarily kept brief and is essentially an abbreviated form of an account that may be found in Hall (1992). In undertaking that abbreviation we have omitted discussion of a variety of different approaches to the bootstrap. In particular, we do not discuss various forms of bias correction, not because we do not recommend it but because space does not permit an adequate survey. We readily concede that the restricted account of bootstrap methods and theory presented here is in need of a degree of bias correction itself!

We do not address in any detail the bootstrap for dependent data, but pause here to outline the main issues. There are two main approaches to implementing the bootstrap in dependent settings. The first is to model the dependent process as one that is driven by independent and identically distributed disturbances; examples include autoregressions and moving averages. We describe briefly here a technique which may be used when no parametric assumptions are made about the distribution of the disturbances. First estimate the parameters of the model, and calculate the residuals (i.e. the estimated values of the independent disturbances). Then run the process over and over again, by Monte Carlo simulation, with parameter values set equal to their estimated values and with the bootstrapped independent disturbances obtained by resampling randomly, with replacement, from the set of residuals. Each resampled process should be of the same length as the original one, and bootstrap inference may be conducted by averaging over the independent Monte Carlo replications. Bose (1988) addresses the efficacy of this procedure in the context of autoregressive models, and derives results that may be viewed as analogues (in the case of autoregressive processes) of some of those discussed later in this chapter for independent data.
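The scheme just described is straightforward to implement. The following minimal Python sketch (the function name and the zero-mean AR(1) setting are our own illustrative choices, not from the text) fits the model by least squares, resamples the centred residuals with replacement, and regenerates series of the original length:

```python
import numpy as np

rng = np.random.default_rng(0)

def ar1_residual_bootstrap(x, n_boot=999):
    """Residual (model-based) bootstrap for a zero-mean AR(1) process.

    Illustrative sketch: estimate the coefficient, extract residuals,
    then rebuild series of the same length from resampled residuals.
    """
    x = np.asarray(x)
    n = len(x)
    # Least-squares estimate of the autoregressive coefficient.
    rho_hat = np.sum(x[1:] * x[:-1]) / np.sum(x[:-1] ** 2)
    # Estimated disturbances (residuals), centred at zero.
    resid = x[1:] - rho_hat * x[:-1]
    resid -= resid.mean()
    rho_boot = np.empty(n_boot)
    for b in range(n_boot):
        eps = rng.choice(resid, size=n, replace=True)
        x_star = np.empty(n)
        x_star[0] = x[0]            # a simple choice of starting value
        for t in range(1, n):
            x_star[t] = rho_hat * x_star[t - 1] + eps[t]
        rho_boot[b] = np.sum(x_star[1:] * x_star[:-1]) / np.sum(x_star[:-1] ** 2)
    return rho_hat, rho_boot

# Simulated example: AR(1) with true coefficient 0.5.
x = np.empty(200)
x[0] = 0.0
for t in range(1, 200):
    x[t] = 0.5 * x[t - 1] + rng.standard_normal()
rho_hat, rho_boot = ar1_residual_bootstrap(x)
print(rho_hat, rho_boot.std())      # point estimate and bootstrap standard error
```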

If the distribution of disturbances is assumed known then, rather than estimate residuals and resample with replacement from those, the parameters of the assumed distribution may be estimated. The bootstrap disturbances may now be derived by resampling from the hypothesized distribution, with parameters estimated.


The other major way of bootstrapping dependent processes is to divide the data sequence into blocks, and resample the blocks rather than individual data values. This approach has application in spatial as well as linear or time series contexts, and indeed was apparently first suggested for spatial data; see Hall (1985). Blocking methods may involve either non-overlapping blocks, as in the technique treated by Carlstein (1986), or overlapping blocks, as proposed by Künsch (1989). (Both methods were considered for spatial data by Hall (1985).) In sheer asymptotic terms Künsch's method has advantages over Carlstein's, but those advantages are not always apparent in practice. This matter has been addressed by Hall and Horowitz (1993) in the context of estimating bias or variance, and there the matter of optimal block width has been treated. The issue of distribution estimation using blocking methods has been discussed by Götze and Künsch (1990), Lahiri (1991, 1992) and Davison and Hall (1993).
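A minimal Python sketch of the blocking idea (the function and its interface are illustrative assumptions, not from the text); `overlapping=True` gives the moving-blocks scheme in the spirit of Künsch (1989), and `overlapping=False` the non-overlapping scheme treated by Carlstein (1986):

```python
import numpy as np

rng = np.random.default_rng(1)

def block_bootstrap(x, block_len, overlapping=True):
    """Return one block-bootstrap resample of the series x."""
    x = np.asarray(x)
    n = len(x)
    if overlapping:
        starts = np.arange(n - block_len + 1)            # all overlapping blocks
    else:
        starts = np.arange(0, n - block_len + 1, block_len)
    k = int(np.ceil(n / block_len))      # blocks needed to rebuild the series
    chosen = rng.choice(starts, size=k, replace=True)
    pieces = [x[s:s + block_len] for s in chosen]
    return np.concatenate(pieces)[:n]    # trim to the original length

e = rng.standard_normal(121)
x = e[1:] + 0.6 * e[:-1]                 # an MA(1) series: dependent data
x_star = block_bootstrap(x, block_len=10, overlapping=True)
```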

2. A formal definition of the bootstrap principle

Much of statistical inference involves describing the relationship between a sample and the population from which the sample was drawn. Formally, given a functional $f_t$, from a class $\{f_t : t \in \mathcal{T}\}$, we wish to determine the value $t_0$ of $t$ that solves an equation such as

$$E\{f_t(F_0, F_1) \mid F_0\} = 0, \qquad (2.1)$$
where $F = F_0$ denotes the population distribution function and $\hat F = F_1$ is the distribution function of the sample. An explicit definition of $F_1$ will be given shortly. Conditioning on $F_0$ in (2.1) serves to stress that the expectation is taken with respect to the distribution $F_0$. We call (2.1) the population equation because we need properties of the population if we are to solve this equation exactly. For example, let $\theta_0 = \theta(F_0)$ denote a true parameter value, such as the $r$th power of a mean,
$$\theta_0 = \Big(\int x \, dF_0(x)\Big)^r.$$

Let $\hat\theta = \theta(F_1)$ be our bootstrap estimator of $\theta_0$, such as the $r$th power of a sample mean,
$$\hat\theta = \Big(\int x \, dF_1(x)\Big)^r,$$

where $\hat F = F_1$ is the empirical distribution function of the sample from which $\hat\theta$ is computed. Correcting $\hat\theta$ additively for bias is equivalent to finding the value $t_0$ that


solves (2.1) when
$$f_t(F_0, F_1) = \theta(F_1) - \theta(F_0) + t. \qquad (2.2)$$

Our bias-corrected estimator would be $\hat\theta + t_0$. On the other hand, to construct a symmetric, 95% confidence interval for $\theta_0$ we would solve (2.1) when
$$f_t(F_0, F_1) = I\{\theta(F_1) - t \le \theta(F_0) \le \theta(F_1) + t\} - 0.95, \qquad (2.3)$$

where the indicator function $I(\mathcal{E})$ is defined to equal 1 if the event $\mathcal{E}$ holds and 0 otherwise. The confidence interval is $(\hat\theta - t_0, \hat\theta + t_0)$, where $\hat\theta = \theta(F_1)$.

To obtain an approximate solution of the population equation (2.1) we argue as follows. Let $F_2$ denote the distribution function of a sample drawn from $F_1$ (conditional on $F_1$). Replace the pair $(F_0, F_1)$ in (2.1) by $(F_1, F_2)$, thereby transforming (2.1) to

$$E\{f_t(F_1, F_2) \mid F_1\} = 0. \qquad (2.4)$$
We call this the sample equation because we know (or can find out) everything about it once we know the sample distribution function $F_1$. In particular, its solution $\hat t_0$ is a function of the sample values.

We call $\hat t_0$ and $E\{f_t(F_1, F_2) \mid F_1\}$ the bootstrap estimators of $t_0$ and $E\{f_t(F_0, F_1) \mid F_0\}$, respectively. They are obtained by replacing $F_0$ by $F_1$ in formulae for $t_0$ and $E\{f_t(F_0, F_1) \mid F_0\}$. In the bias correction problem, where $f_t$ is given by (2.2), the bootstrap version of our bias-corrected estimator is $\hat\theta + \hat t_0$. In the confidence interval problem, where (2.3) describes $f_t$, our bootstrap confidence interval is $(\hat\theta - \hat t_0, \hat\theta + \hat t_0)$. The latter is commonly called a (symmetric) percentile-method confidence interval for $\theta_0$. The bootstrap principle might be described in terms of this approach to estimation of a population equation.

It is appropriate now to give detailed definitions of $F_1$ and $F_2$. There are two approaches, suitable for nonparametric and parametric problems respectively. In both, inference is based on a sample $\mathcal{X}$ of $n$ random (independent and identically distributed) observations of the population. In the nonparametric case, $F_1$ is simply the empirical distribution function of $\mathcal{X}$; that is, the distribution function of the distribution that assigns mass $n^{-1}$ to each point in $\mathcal{X}$. The associated empirical probability measure assigns to a region $R$ a value equal to the proportion of the sample that lies within $R$. Similarly, $F_2$ is the empirical distribution function of a sample drawn at random from the population with distribution function $F_1$; that is, the empiric of a sample $\mathcal{X}^*$ drawn randomly, with replacement, from $\mathcal{X}$. If we denote the population by $\mathcal{X}_0$ then we have a nest of sampling operations: $\mathcal{X}$ is drawn at random from $\mathcal{X}_0$, and $\mathcal{X}^*$ is drawn at random from $\mathcal{X}$.
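In code, the nonparametric nest of sampling operations looks as follows (a minimal Python sketch; the variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(2)

# A sample X drawn from the population; F1 is its empirical distribution,
# which places mass 1/n on each observation.
X = rng.exponential(size=50)

# X* is drawn randomly, with replacement, from X; its empirical
# distribution is F2.
X_star = rng.choice(X, size=len(X), replace=True)

# The nest can be continued: a further resample drawn from X* has
# empirical distribution F3, and so on (used for bootstrap iteration).
X_star2 = rng.choice(X_star, size=len(X), replace=True)
```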


In the parametric case, $F_0$ is assumed completely known up to a finite vector $\lambda_0$ of unknown parameters. To indicate this dependence we write $F_0 = F_{(\lambda_0)}$, an element of a class $\{F_{(\lambda)}, \lambda \in \Lambda\}$ of possible distributions. Let $\hat\lambda$ be an estimator of $\lambda_0$ computed from $\mathcal{X}$, often (but not necessarily) the maximum likelihood estimator. It will be a function of sample values, so we may write it as $\hat\lambda(\mathcal{X})$. Then $F_1 = F_{(\hat\lambda)}$, the distribution function obtained on replacing true parameter values by their sample estimates. Let $\mathcal{X}^*$ denote the sample drawn at random from the distribution with distribution function $F_{(\hat\lambda)}$ (not simply drawn from $\mathcal{X}$ with replacement), and let $\hat\lambda^* = \hat\lambda(\mathcal{X}^*)$ denote the version of $\hat\lambda$ computed for $\mathcal{X}^*$ instead of $\mathcal{X}$. Then $F_2 = F_{(\hat\lambda^*)}$.

It is appropriate now to discuss two examples that illustrate the bootstrap principle.

Example 2.1. Bias reduction

Here the function $f_t$ is given by (2.2), and the sample equation (2.4) assumes the form

$$E\{\theta(F_2) - \theta(F_1) + t \mid F_1\} = 0,$$
whose solution is

$$t = \hat t_0 = \theta(F_1) - E\{\theta(F_2) \mid F_1\}.$$
The bootstrap bias-reduced estimator is thus

$$\hat\theta_1 = \hat\theta + \hat t_0 = \theta(F_1) + \hat t_0 = 2\theta(F_1) - E\{\theta(F_2) \mid F_1\}. \qquad (2.5)$$
Note that our basic estimator $\hat\theta = \theta(F_1)$ is also a bootstrap estimator, since it is obtained by substituting $F_1$ for $F_0$ in the functional formula $\theta_0 = \theta(F_0)$.

The expectation $E\{\theta(F_2) \mid F_1\}$ may always be computed (or approximated) by Monte Carlo simulation, as follows. Conditional on $F_1$, draw $B$ resamples $\{\mathcal{X}^*_b, 1 \le b \le B\}$ independently from the distribution with distribution function $F_1$. In the nonparametric case, where $F_1$ is the empirical distribution function of the sample $\mathcal{X}$, let $F_{2b}$ denote the empirical distribution function of $\mathcal{X}^*_b$. In the parametric case, let $\hat\lambda^*_b = \hat\lambda(\mathcal{X}^*_b)$ be the estimator of $\lambda_0$ computed from resample $\mathcal{X}^*_b$, and put $F_{2b} = F_{(\hat\lambda^*_b)}$. Define $\hat\theta^*_b = \theta(F_{2b})$ and $\hat\theta = \theta(F_1)$. Then in both parametric and nonparametric circumstances,

$$B^{-1} \sum_{b=1}^{B} \hat\theta^*_b$$
converges to $E\{\theta(F_2) \mid F_1\} = E(\hat\theta^* \mid \mathcal{X})$ (with probability one, conditional on $F_1$) as $B \to \infty$.
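As a concrete illustration, here is a minimal Python sketch of the bias-corrected estimator (2.5) with the conditional expectation approximated by simulation; the helper name and the choice of $\theta$ (the squared mean) are our own illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)

def bias_corrected(X, theta, B=2000):
    """Bootstrap bias-corrected estimate 2*theta(F1) - E{theta(F2)|F1},
    with the conditional expectation approximated by B nonparametric
    resamples.  `theta` maps a sample to a scalar."""
    theta_hat = theta(X)
    theta_star = np.empty(B)
    for b in range(B):
        theta_star[b] = theta(rng.choice(X, size=len(X), replace=True))
    return 2.0 * theta_hat - theta_star.mean()

# Example: theta = (mean)^2, whose plug-in estimator is biased.
X = rng.normal(loc=1.0, size=30)
theta = lambda x: np.mean(x) ** 2
print(theta(X), bias_corrected(X, theta))
```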


Example 2.2. Confidence interval

A symmetric confidence interval for $\theta_0 = \theta(F_0)$ may be constructed by applying the resampling principle using the function $f_t$ given by (2.3). The sample equation then assumes the form

$$P\{\theta(F_2) - t \le \theta(F_1) \le \theta(F_2) + t \mid F_1\} - 0.95 = 0. \qquad (2.6)$$
In a nonparametric context $\theta(F_2)$, conditional on $F_1$, has a discrete distribution, and so it would seldom be possible to solve (2.6) exactly. However, any error in the solution of (2.6) will usually be very small, since the size of even the largest atom of the distribution of $\theta(F_2)$ decreases exponentially quickly with increasing $n$. The largest atom is of size only $3.6 \times 10^{-4}$ when $n = 10$. We could remove this minor difficulty by smoothing the distribution function $F_1$. In parametric cases, (2.6) may usually be solved exactly for $t$.

The interval $(\hat\theta - \hat t_0, \hat\theta + \hat t_0)$ is a bootstrap confidence interval for $\theta_0 = \theta(F_0)$, usually called a (two-sided, symmetric) percentile interval, since $\hat t_0$ is a percentile of the distribution of $|\theta(F_2) - \theta(F_1)|$ conditional on $F_1$. Other nominal 95% percentile intervals include the two-sided, equal-tailed interval $(\hat\theta - \hat t_{01}, \hat\theta + \hat t_{02})$ and the one-sided interval $(-\infty, \hat\theta + \hat t_{03})$, where $\hat t_{01}$, $\hat t_{02}$, and $\hat t_{03}$ solve

$$P\{\theta(F_1) \le \theta(F_2) - t \mid F_1\} - 0.025 = 0,$$
$$P\{\theta(F_1) \le \theta(F_2) + t \mid F_1\} - 0.975 = 0,$$
and
$$P\{\theta(F_1) \le \theta(F_2) + t \mid F_1\} - 0.95 = 0,$$
respectively. The two-sided interval so constructed places probability approximately, but not exactly, 0.025 in each tail: $P(\theta_0 < \hat\theta - \hat t_{01}) \approx P(\theta_0 > \hat\theta + \hat t_{02}) \approx 0.025$. The ideal form of this interval, obtained by solving the population equation rather than the sample equation, does place equal probability in each tail.

Still other 95% percentile intervals are $\hat I_2 = (\hat\theta - \hat t_{02}, \hat\theta + \hat t_{01})$ and $\hat I_1 = (-\infty, \hat\theta + \hat t_{04})$, where $\hat t_{04}$ is the solution of
$$P\{\theta(F_1) \le \theta(F_2) - t \mid F_1\} - 0.05 = 0.$$

These do not fit naturally into a systematic development of bootstrap methods by frequentist arguments, and we find them a little contrived. They are sometimes motivated as follows. Define $\hat\theta^* = \theta(F_2)$, $\hat H(x) = P(\hat\theta^* \le x \mid \mathcal{X})$, and
$$\hat H^{-1}(\alpha) = \inf\{x : \hat H(x) \ge \alpha\}.$$

Then $\hat I_2 = [\hat H^{-1}(0.025), \hat H^{-1}(0.975)]$ and $\hat I_1 = (-\infty, \hat H^{-1}(0.95)]$.

All these intervals cover $\theta_0$ with probability approximately 0.95, which might be called the nominal coverage. Coverage error is defined to be true coverage minus nominal coverage; it generally converges to zero as sample size increases.
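For a sample mean, these intervals can be read off the simulated bootstrap distribution directly. A minimal Python sketch (the exponential population and all names are illustrative assumptions, not from the text):

```python
import numpy as np

rng = np.random.default_rng(4)

X = rng.exponential(size=40)
theta_hat = X.mean()
B = 4999
theta_star = np.array([rng.choice(X, size=len(X), replace=True).mean()
                       for _ in range(B)])

# Symmetric percentile interval: t0_hat is the 0.95 quantile of
# |theta(F2) - theta(F1)| conditional on F1.
t0 = np.quantile(np.abs(theta_star - theta_hat), 0.95)
symmetric = (theta_hat - t0, theta_hat + t0)

# Equal-tailed interval (theta_hat - t01, theta_hat + t02).
q025, q975 = np.quantile(theta_star, [0.025, 0.975])
t01 = q975 - theta_hat
t02 = theta_hat - q025
equal_tailed = (theta_hat - t01, theta_hat + t02)   # = (2*theta_hat - q975, 2*theta_hat - q025)

# The "other percentile" interval I2_hat = [H^{-1}(0.025), H^{-1}(0.975)],
# taken directly from the bootstrap distribution function H.
other = (q025, q975)
```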

We now treat in more detail the construction of two-sided, symmetric percentile intervals in parametric problems. There, provided the distribution functions $F_{(\lambda)}$ are continuous, equation (2.6) may be solved exactly. We focus attention on the cases where $\theta_0 = \theta(F_0)$ is a population mean and the population is normal or exponential. Our main aim is to bring out the virtues of pivoting, which usually amounts to rescaling so that the distribution of a statistic depends less on unknown parameters.

If the population is Normal $N(\mu, \sigma^2)$ and we use the maximum likelihood estimator $\hat\lambda = (\bar X, \hat\sigma^2)$ to estimate $\lambda_0 = (\mu, \sigma^2)$, then the sample equation (2.6) may be rewritten as

$$P(|n^{-1/2}\hat\sigma N| \le t \mid F_1) = 0.95, \qquad (2.7)$$
where $N$ is Normal $N(0,1)$, independent of $F_1$. Therefore the solution of (2.6) is $t = \hat t_0 = x_{0.95}\, n^{-1/2}\hat\sigma$, where $x_\alpha$ is defined by

$$P(|N| \le x_\alpha) = \alpha.$$
The resulting percentile interval is
$$(\bar X - n^{-1/2} x_{0.95}\,\hat\sigma,\ \bar X + n^{-1/2} x_{0.95}\,\hat\sigma), \qquad (2.8)$$
whose true coverage is less than 0.95, since it employs the Normal quantile $x_{0.95}$ where a quantile of the Student's $t$-like statistic $n^{1/2}(\bar X - \mu)/\hat\sigma$ is required.


To appreciate why the percentile interval has this inadequate performance, let us go back to our parametric example involving the Normal distribution. The root cause of the problem there is that $\hat\sigma$, and not $\sigma$, appears on the right-hand side in (2.8). This happens because the sample equation (2.6), equivalent here to (2.7), depends on $\hat\sigma$. Put another way, the population equation (2.1), equivalent to

$$P\{|\theta(F_1) - \theta(F_0)| \le t\} = 0.95,$$
depends on $\sigma^2$, the population variance. This occurs because the distribution of $|\theta(F_1) - \theta(F_0)|$ depends on the unknown $\sigma$. We should try to eliminate, or at least minimize, this dependence.

A function $T$ of both the data and an unknown parameter is said to be (exactly) pivotal if it has the same distribution for all values of the unknowns. It is asymptotically pivotal if, for sequences of known constants $\{a_n\}$ and $\{b_n\}$, $a_n T + b_n$ has a proper nondegenerate limiting distribution not depending on unknowns. We may convert $\theta(F_1) - \theta(F_0)$ into a pivotal statistic by correcting for scale, changing it to $T = \{\theta(F_1) - \theta(F_0)\}/\hat\tau$, where $\hat\tau = \tau(F_1)$ is an appropriate scale estimator. In our example about the mean there are usually many different choices for $\hat\tau$, e.g. the sample standard deviation $\{n^{-1}\sum (X_i - \bar X)^2\}^{1/2}$, the square root of the unbiased variance estimate, Gini's mean difference, and the interquartile range. In more complex problems, a jackknife standard deviation estimator is usually an option. Note that exactly the same confidence interval will be obtained if $\hat\tau$ is replaced by $c\hat\tau$, for any given $c \ne 0$, and so it is inessential that $\hat\tau$ be consistent for the asymptotic standard deviation of $\theta(F_1)$. What is important is pivotalness: exact pivotalness if we are to obtain a confidence interval with zero coverage error, asymptotic pivotalness if exact pivotalness is unattainable. If we change to a pivotal statistic then the function $f_t$ alters from the form given in (2.3) to

$$f_t(F_0, F_1) = I\{\theta(F_1) - t\tau(F_1) \le \theta(F_0) \le \theta(F_1) + t\tau(F_1)\} - 0.95. \qquad (2.9)$$
In the case of our parametric Normal model, any reasonable scale estimator $\hat\tau$ will

give exact pivotalness. We shall take $\hat\tau = \hat\sigma$, where $\hat\sigma^2 = \sigma^2(F_1) = n^{-1}\sum (X_i - \bar X)^2$ denotes the sample variance. Then $f_t$ becomes

$$f_t(F_0, F_1) = I\{\theta(F_1) - t\sigma(F_1) \le \theta(F_0) \le \theta(F_1) + t\sigma(F_1)\} - 0.95.$$
Using this functional in place of that at (2.3), but otherwise arguing exactly as before, equation (2.7) changes to

$$P\{(n-1)^{-1/2}|T_{n-1}| \le t \mid F_1\} = 0.95, \qquad (2.10)$$
where $T_{n-1}$ has Student's $t$ distribution with $n-1$ degrees of freedom and is stochastically independent of $F_1$. (Therefore the conditioning on $F_1$ in (2.10) is irrelevant.) Thus, the solution of the sample equation is $\hat t_0 = (n-1)^{-1/2} w_{0.95}$, where $w_\alpha = w_\alpha(n)$ is given by $P(|T_{n-1}| \le w_\alpha) = \alpha$. The bootstrap confidence interval is $(\bar X - \hat t_0\hat\sigma, \bar X + \hat t_0\hat\sigma)$, with perfect coverage accuracy,

$$P\{\bar X - (n-1)^{-1/2} w_{0.95}\,\hat\sigma \le \mu \le \bar X + (n-1)^{-1/2} w_{0.95}\,\hat\sigma\} = 0.95.$$
(Of course, the latter statement applies only to the parametric bootstrap under the assumption of a Normal model.)

Such confidence intervals are usually called percentile-t intervals, since $\hat t_0$ is a percentile of the Student's $t$-like statistic $|\theta(F_1) - \theta(F_0)|/\tau(F_1)$.

Perfect coverage accuracy of percentile-t intervals usually holds only in parametric problems where the underlying statistic is exactly pivotal. More generally, if symmetric percentile-t intervals are constructed in parametric and nonparametric problems by solving the sample equation when $f_t$ is defined by (2.9), where $\tau(F_1)$ is chosen so that $T = \{\theta(F_1) - \theta(F_0)\}/\tau(F_1)$ is asymptotically pivotal, then coverage error will usually be $O(n^{-2})$ rather than the $O(n^{-1})$ associated with ordinary percentile intervals.
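A minimal nonparametric sketch of the symmetric percentile-t interval for a mean, in Python (the setting and names are our own illustrative choices); the essential point is that the scale is re-estimated within each resample, so that the resampled statistic is Studentized:

```python
import numpy as np

rng = np.random.default_rng(5)

X = rng.exponential(size=40)
n = len(X)
theta_hat = X.mean()
sigma_hat = X.std()              # sqrt of n^{-1} * sum (X_i - mean)^2
B = 4999
t_star = np.empty(B)
for b in range(B):
    Xs = rng.choice(X, size=n, replace=True)
    # Studentized resample statistic: dividing by the resample's own
    # scale estimate is what makes it (asymptotically) pivotal.
    t_star[b] = np.sqrt(n) * (Xs.mean() - theta_hat) / Xs.std()

# Symmetric percentile-t interval: critical point from the |t*| quantile.
w = np.quantile(np.abs(t_star), 0.95)
interval = (theta_hat - w * sigma_hat / np.sqrt(n),
            theta_hat + w * sigma_hat / np.sqrt(n))
```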

We conclude this example with remarks on the computation of critical points, such as $\hat t_0$, by uniform Monte Carlo simulation. Further details, including an account of efficient Monte Carlo simulation, are given in Section 5.

Assume we wish to compute the solution $\hat v_\alpha$ of the equation
$$P[\{\theta(F_2) - \theta(F_1)\}/\tau(F_2) \le \hat v_\alpha \mid F_1] = \alpha, \qquad (2.11)$$

or, to be more precise, the value
$$\hat v_\alpha = \inf\big(x : P[\{\theta(F_2) - \theta(F_1)\}/\tau(F_2) \le x \mid F_1] \ge \alpha\big).$$

Choose integers $B \ge 1$ and $1 \le \nu \le B$ such that $\nu/(B+1) = \alpha$. For example, if $\alpha = 0.95$ then we could take $(\nu, B) = (95, 99)$ or $(950, 999)$. Conditional on $F_1$, draw $B$ resamples $\{\mathcal{X}^*_b, 1 \le b \le B\}$ …
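In code, the prescription $\nu/(B+1) = \alpha$ amounts to taking the $\nu$-th order statistic of the $B$ simulated values. A minimal Python sketch, with $\theta$ a mean and $\tau$ the sample standard deviation as illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(6)

def critical_point(X, alpha=0.95, B=999):
    """Monte Carlo estimate of the critical point of
    {theta(F2) - theta(F1)}/tau(F2), as in equation (2.11).
    B is chosen so that nu = alpha*(B+1) is an integer."""
    nu = round(alpha * (B + 1))      # e.g. (nu, B) = (950, 999) for alpha = 0.95
    n = len(X)
    stats = np.empty(B)
    for b in range(B):
        Xs = rng.choice(X, size=n, replace=True)
        stats[b] = (Xs.mean() - X.mean()) / Xs.std()
    # The nu-th order statistic of the B simulated values is the estimate.
    return np.sort(stats)[nu - 1]

X = rng.normal(size=30)
print(critical_point(X))
```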


3. Iterating the principle

Recall that in Section 2 we suggested that statistical inference often involves describing a relationship between the sample and the population. We argued that this leads to a bootstrap principle, which may be enunciated in terms of finding an empirical solution to a population equation, (2.1). The empirical solution is obtained by solving a sample version, (2.4), of the population equation. The notation employed in those equations includes taking $F_0$, $F_1$, and $F_2$ to denote the true population distribution function, the empirical distribution function, and the resample version of the empiric, respectively. The solution of the population equation is a functional of $F_0$, say $T(F_0)$, and the solution of the sample equation is the corresponding functional of the empiric, $T(F_1)$. The population equation may then be represented as

$$E\{f_{T(F_0)}(F_0, F_1) \mid F_0\} = 0,$$
with approximate solution

$$E\{f_{T(F_1)}(F_0, F_1) \mid F_0\} \approx 0. \qquad (3.1)$$
The solution of the sample equation represents an approximation to the solution of the population equation. In many instances we would like to improve on this

approximation, for example to further reduce bias in a bias correction problem, or to improve coverage accuracy in a confidence interval problem. Therefore we introduce a correction term $t$ to the functional $T$, so that $T(\cdot)$ becomes $U(\cdot, t)$ with $U(\cdot, 0) \equiv T(\cdot)$. The adjustment may be multiplicative, for example $U(\cdot, t) \equiv (1+t)\,T(\cdot)$. Or it may be an additive correction, as in $U(\cdot, t) = T(\cdot) + t$. Or $t$ might adjust some particular feature of $T$, as in the level-error correction for confidence intervals, which we shall discuss shortly. In all cases, the functional $U(\cdot, t)$ should be smooth in $t$. Our aim is to choose $t$ so as to improve on the approximation (3.1).

Ideally, we would like to solve the equation
$$E\{f_{U(F_1,t)}(F_0, F_1) \mid F_0\} = 0 \qquad (3.2)$$

for $t$. If we write $g_t(F, G) = f_{U(G,t)}(F, G)$, we see that (3.2) is equivalent to
$$E\{g_t(F_0, F_1) \mid F_0\} = 0,$$

which is of the same form as the population equation (2.1). Therefore we obtain an approximation by passing to the sample equation,
$$E\{g_t(F_1, F_2) \mid F_1\} = 0, \qquad (3.3)$$
or equivalently,
$$E\{f_{U(F_2,t)}(F_1, F_2) \mid F_1\} = 0.$$

This has solution $\hat t_0 = T_1(F_1)$, say, giving us a new approximation of the same form as the first approximation (3.1), obtained by iterating that earlier approximation,
$$E\{f_{U(F_1, T_1(F_1))}(F_0, F_1) \mid F_0\} \approx 0. \qquad (3.4)$$

Our hope is that the approximation here is better than that in (3.1), so that in a sense $U[F_1, T_1(F_1)]$ is a better estimate than $T(F_1)$ of the solution $t_0$ of equation (2.1). Of course, this does not mean that $U[F_1, T_1(F_1)]$ is closer to $t_0$ than $T(F_1)$, only that the left-hand side of (3.4) is closer to zero than the left-hand side of (3.1).

If we revise notation and call $U[F_1, T_1(F_1)]$ the new $T(F_1)$, we may run through the argument again, obtaining a third approximate solution of (2.1). In principle, these iterations may be repeated as often as desired.

We have given two explicit methods, multiplicative and additive, for modifying our original estimate $\hat t_0 = T(F_1)$ of the solution of (2.1) so as to obtain the adjustable form $U(F_1, t)$. Those modifications may be used in a wide range of circumstances. In the special case of confidence intervals, an alternative approach is to modify the nominal coverage probability of the confidence interval. To explain the argument we shall concentrate on the special case of symmetric percentile-method intervals discussed in Example 2.2. Corrections for other types of intervals may be introduced in like manner.

An $\alpha$-level symmetric percentile-method interval for $\theta_0 = \theta(F_0)$ is given by $[\theta(F_1) - \hat t_0, \theta(F_1) + \hat t_0]$, where $\hat t_0$ is chosen to solve the sample equation

$$P\{\theta(F_2) - t \le \theta(F_1) \le \theta(F_2) + t \mid F_1\} - \alpha = 0.$$
(In our earlier examples, $\alpha = 0.95$.) This $\hat t_0$ is an estimator of the solution $t_0 = T(F_0)$ of the population equation

$$P\{\theta(F_1) - t \le \theta(F_0) \le \theta(F_1) + t \mid F_0\} - \alpha = 0,$$
that is, of

$$P(|\hat\theta - \theta_0| \le t \mid F_0) = \alpha,$$
where $\hat\theta = \theta(F_1)$. Therefore $t_0$ is just the $\alpha$-level quantile, $x_\alpha$, of the distribution of $|\hat\theta - \theta_0|$.


Write $x_\alpha$ as $x(F_0)_\alpha$, the quantile when $F_0$ is the true distribution function. Then $t_0 = T(F_0)$ is just $x(F_0)_\alpha$, and we might take $U(\cdot, t)$ to be

$$U(\cdot, t) = x(\cdot)_{\alpha+t}.$$
This is an alternative to the multiplicative and additive corrections, which in the present problem are

$$U(\cdot, t) = (1+t)\,x(\cdot)_\alpha \quad \text{and} \quad U(\cdot, t) = x(\cdot)_\alpha + t,$$
respectively. In general, each will give slightly different numerical results although, as we shall prove shortly, each provides the same order of correction.

Concise definitions of $F_j$ are different in parametric and nonparametric cases. In the former we work within a class $\{F_{(\lambda)}, \lambda \in \Lambda\}$ of distributions that are completely specified up to an unknown vector $\lambda$ of parameters. The true distribution is $F_0 = F_{(\lambda_0)}$; we estimate $\lambda_0$ by $\hat\lambda = \hat\lambda(\mathcal{X})$, where $\mathcal{X} = \mathcal{X}_1$ is an $n$-sample drawn from $F_0$, and we take $F_1$ to be $F_{(\hat\lambda)}$. To define $F_j$, let $\hat\lambda_j = \hat\lambda(\mathcal{X}_j)$ denote the estimator $\hat\lambda$ computed for an $n$-sample $\mathcal{X}_j$ drawn from $F_{j-1}$, and put $F_j = F_{(\hat\lambda_j)}$. The nonparametric case is conceptually simpler. There, $F_j$ is the empirical distribution of an $n$-sample drawn randomly from $F_{j-1}$, with replacement.

To explain how high-index $F_j$'s enter into the computation of bootstrap iterations, we shall discuss calculation of the solution of equation (3.3). That requires calculation of $U(F_2, t)$, defined for example by

$$U(F_2, t) = (1+t)\,T(F_2).$$
And for this we must compute $T(F_2)$. Now, $\hat t_0 = T(F_1)$ is the solution (in $t$) of the sample equation
$$E\{f_t(F_1, F_2) \mid F_1\} = 0,$$

and so $T(F_2)$ is the solution (in $t$) of the resample equation

$$E\{f_t(F_2, F_3) \mid F_2\} = 0.$$
Thus, to find the second bootstrap iterate, the solution of (3.3), we must construct $F_1$, $F_2$, and $F_3$. Calculation of $F_2$ by simulation typically involves order $B$ sampling operations ($B$ resamples drawn from the original sample), whereas calculation of $F_3$ by simulation involves order $B^2$ sampling operations ($B$ resamples drawn from each of $B$ resamples) if the same number of operations is used at each level. Thus, $i$ bootstrap iterations could require order $B^i$ computations, and so complexity would increase rapidly with the number of iterations.
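A minimal Python sketch of this nested, order-$B^2$ computation, applied to the bias-correction setting of Example 3.1 below (the twice-iterated formula used at the end is the $j = 2$ case of Theorem 3.1 as stated below; all names and the choice of $\theta$ are our own):

```python
import numpy as np

rng = np.random.default_rng(7)

# Double (iterated) bootstrap: B first-level resamples, each spawning
# B second-level resamples, hence of order B**2 sampling operations.
X = rng.normal(loc=1.0, size=25)
theta = lambda x: np.mean(x) ** 2
B = 200                                  # kept small: work grows like B**2

theta_star = np.empty(B)                 # theta(F2) values
second_level = np.empty(B)               # estimates of E{theta(F3) | F2}
for b in range(B):
    Xs = rng.choice(X, size=len(X), replace=True)
    theta_star[b] = theta(Xs)
    inner = np.empty(B)
    for c in range(B):
        inner[c] = theta(rng.choice(Xs, size=len(X), replace=True))
    second_level[b] = inner.mean()

# Twice-iterated additive bias correction:
# 3*theta(F1) - 3*E{theta(F2)|F1} + E{theta(F3)|F1}.
theta_2 = 3 * theta(X) - 3 * theta_star.mean() + second_level.mean()
```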


In regular cases, expansions of the error in formulae such as (3.1) are usually power series in $n^{-1/2}$ or $n^{-1}$, often resulting from Edgeworth expansions of the type that we shall discuss in Section 4. Each bootstrap iteration reduces the order of magnitude of error by a factor of at least $n^{-1/2}$. However, in many problems with an element of symmetry, such as two-sided confidence intervals, expansions of error are power series in $n^{-1}$ rather than $n^{-1/2}$, and each bootstrap iteration reduces error by a factor of $n^{-1}$, not just $n^{-1/2}$.

Example 3.1. Bias reduction

In this situation, each bootstrap iteration reduces the order of magnitude of bias by the factor $n^{-1}$. (See Hall 1992, Section 1.5, for further details.) To investigate further the effect of bootstrap iteration on bias, observe that, in the case of bias reduction by an additive correction,

$$f_t(F_0, F_1) = \theta(F_1) - \theta(F_0) + t.$$
Therefore the sample equation,
$$E\{\theta(F_2) - \theta(F_1) + t \mid F_1\} = 0,$$

has solution $t = T(F_1) = \theta(F_1) - E\{\theta(F_2) \mid F_1\}$, and so the once-iterated estimate is
$$\hat\theta_1 = \hat\theta + T(F_1) = \theta(F_1) + T(F_1) = 2\theta(F_1) - E\{\theta(F_2) \mid F_1\}.$$

See also (2.5). On iteration of this formula we obtain the following formula for a general bootstrap estimator.

Theorem 3.1

If $\hat\theta_j$ denotes the $j$th iterate of $\hat\theta$, and if the adjustment at each iteration is additive, then

$$\hat\theta_j = \sum_{i=1}^{j+1} (-1)^{i+1} \binom{j+1}{i} E\{\theta(F_i) \mid F_1\}, \qquad j \ge 1,$$
where we interpret $E\{\theta(F_1) \mid F_1\} = \theta(F_1)$.

Example 3.2. Confidence interval

Here, each iteration generally reduces the order of coverage error by the factor $n^{-1}$ in the case of two-sided intervals, and by $n^{-1/2}$ for one-sided intervals. To appreciate the effect of iteration in more detail, let us consider the case of parametric, percentile confidence intervals for a mean, assuming a Normal $N(\mu, \sigma^2)$ population, as discussed in Example 2.2. Let $N$ denote a Normal $N(0,1)$ random variable. Estimate the parameter $\lambda_0 = (\mu, \sigma^2)$ by the maximum likelihood estimator $\hat\lambda = (\bar X, \hat\sigma^2) = (\mu(F_1), \sigma^2(F_1))$,

where $\bar X = n^{-1}\sum X_i$ and $\hat\sigma^2 = n^{-1}\sum (X_i - \bar X)^2$ are the sample mean and sample variance, respectively. The functional $f_t$ is, in the case of a symmetric two-sided 95% percentile confidence interval,

$$f_t(F_0, F_1) = I\{\theta(F_1) - t \le \theta(F_0) \le \theta(F_1) + t\} - 0.95,$$
and the sample equation (2.4) has solution $t = T(F_1) = n^{-1/2} x_{0.95}\,\sigma(F_1)$, where $x_{0.95}$ is given by $P(|N| \le x_{0.95}) = 0.95$. This gives the percentile interval

$$(\bar X - n^{-1/2} x_{0.95}\,\hat\sigma,\ \bar X + n^{-1/2} x_{0.95}\,\hat\sigma),$$
derived in Example 2.2. For the sake of definiteness we shall make the coverage correction in the form

$$U(F_1, t) = n^{-1/2}(x_{0.95} + t)\,\sigma(F_1),$$

although we would draw the same conclusion with other forms of correction. Thus,
$$f_{U(F_1,t)}(F_0, F_1) = I\{n^{1/2}|\theta(F_1) - \theta(F_0)|/\sigma(F_1) \le x_{0.95} + t\} - 0.95,$$

so that the sample equation (3.3) becomes
$$P\{n^{1/2}|\theta(F_2) - \theta(F_1)|/\sigma(F_2) \le x_{0.95} + t \mid F_1\} - 0.95 = 0. \qquad (3.5)$$

Observe that
$$W = n^{1/2}\{\theta(F_2) - \theta(F_1)\}/\sigma(F_2) = n^{-1/2} \sum_{i=1}^{n} (X_i^* - \bar X) \Big/ \Big\{ n^{-1} \sum_{i=1}^{n} (X_i^* - \bar X^*)^2 \Big\}^{1/2},$$

where, conditional on $\mathcal{X}$, $X_1^*, \ldots, X_n^*$ are independent and identically distributed $N(\bar X, \hat\sigma^2)$ random variables and $\bar X^* = n^{-1}\sum X_i^*$. Therefore, conditional on $\mathcal{X}$, and also unconditionally, $W$ is distributed as $\{n/(n-1)\}^{1/2}\, T_{n-1}$, where $T_{n-1}$ has Student's $t$ distribution with $n-1$ degrees of freedom. Therefore the solution $\hat t_0$ of equation (3.5) is $\hat t_0 = \{n/(n-1)\}^{1/2} w_{0.95} - x_{0.95}$, where $w_\alpha = w_\alpha(n)$ is defined by

$$P(|T_{n-1}| \le w_\alpha) = \alpha.$$


The resulting bootstrap confidence interval is
$$[\theta(F_1) - n^{-1/2}\sigma(F_1)(x_{0.95} + \hat t_0),\ \theta(F_1) + n^{-1/2}\sigma(F_1)(x_{0.95} + \hat t_0)]$$
$$= [\bar X - (n-1)^{-1/2} w_{0.95}\,\hat\sigma,\ \bar X + (n-1)^{-1/2} w_{0.95}\,\hat\sigma].$$

    This is identical to the percentile-t (not the percentile) confidence interval derivedin Example 2.2 and has perfect coverage accuracy.

    The methodology of bootstrap iteration was introduced by Efron (1983), Hall(1986), Beran (1987) and Loh (1987).

4. Asymptotic theory

4.1. Summary

We begin by describing circumstances where Edgeworth expansions, in the usual rather than the bootstrap sense, may be generated under rigorous regularity conditions; see Section 4.2. Major contributors to this theory include Chibisov (1972, 1973a, 1973b), Sargan (1975, 1976) and Bhattacharya and Ghosh (1978). Our account is based on the latter paper. Following that, in Section 4.3, we discuss bootstrap versions of those expansions and then describe the conclusions that may be drawn from those results. Our first conclusions, about the efficacy of pivotal methods, are given towards the end of Section 4.3. Sections 4.4, 4.5, 4.6 and 4.7 describe respectively a variety of different confidence intervals, properties of bootstrap estimates of critical points, properties of coverage error, and the special case of regression. The last case is of particular interest because, in the context of intervals for slope parameters, it admits bootstrap methods with unusually good coverage accuracy.

The main conclusions drawn in this section relate to the virtues of pivoting. That subject was touched on in Section 2, but there we lacked the technical devices necessary to provide a broad description of the relative performances of pivotal and non-pivotal methods. The Edgeworth expansion techniques introduced in Section 4.2 fill this gap. In particular, they enable us to show that pivotal methods generally yield greater accuracy in the estimation of critical points (Section 4.5) and smaller asymptotic order of coverage error of one-sided confidence intervals (Section 4.6). Nevertheless, it should be borne in mind that these results are asymptotic in character and that, while they provide a valuable guide, they do not tell the whole story. For example, the performance of pivotal methods with small samples depends in large part on the relative accuracy of the variance estimator, and can be very poor in cases where an accurate variance estimator is not available. Examples which feature poor accuracy include interval estimation for the correlation coefficient and for a ratio of means when the denominator mean is close to zero.


Theory for the bootstrap, along the lines of that described here, was developed by Bickel and Freedman (1980), Singh (1981), Beran (1982, 1987), Babu and Singh (1983, 1984, 1985), Hall (1986, 1988a, 1988b), Efron (1987), Liu and Singh (1987) and Robinson (1987). Further work on the bootstrap in regression models is described by Bickel and Freedman (1981, 1983), Freedman (1981), Freedman and Peters (1984) and Peters and Freedman (1984a, 1984b).

4.2. Edgeworth and Cornish-Fisher expansions

We begin by describing a general model that allows Edgeworth and Cornish-Fisher expansions to be established rigorously. Let $\Phi$, $\phi$ denote respectively the Standard Normal distribution and density functions. Let $X, X_1, X_2, \ldots$ be independent and identically distributed random column $d$-vectors with mean $\mu$, and put $\bar X = n^{-1}\sum_{i=1}^{n} X_i$. Let $A: \mathbb{R}^d \to \mathbb{R}$ be a smooth function satisfying $A(\mu) = 0$. We have in mind a function such as $A(x) = \{g(x) - g(\mu)\}/h(\mu)^{1/2}$, where $\theta_0 = g(\mu)$ is the (scalar) parameter estimated by $\hat\theta = g(\bar X)$ and $\sigma^2 = h(\mu)$ is the asymptotic variance of $n^{1/2}\hat\theta$; or $A(x) = \{g(x) - g(\mu)\}/h(x)^{1/2}$, where $\hat\sigma^2 = h(\bar X)$ is an estimator of $h(\mu)$. (Thus, we assume $h$ is a known function.)

This "smooth function model" allows us to study problems where $\theta_0$ is a mean, or a variance, or a ratio of means or variances, or a difference of means or variances, or a correlation coefficient, etc. For example, if $\{W_1, \ldots, W_n\}$ were a random sample from a univariate population with mean $m$ and variance $\beta^2$, and if we wished to estimate $\theta_0 = m$, then we would take $d = 2$, $X = (X^{(1)}, X^{(2)})^T = (W, W^2)^T$, $\mu = E(X)$,

$$g(x^{(1)}, x^{(2)}) = x^{(1)}, \qquad h(x^{(1)}, x^{(2)}) = x^{(2)} - (x^{(1)})^2.$$
This would ensure that $g(\mu) = m$, $g(\bar X) = \bar W$ (the sample mean), $h(\mu) = \beta^2$, and

$$h(\bar X) = n^{-1}\sum_{i=1}^{n} X_i^{(2)} - \Big(n^{-1}\sum_{i=1}^{n} X_i^{(1)}\Big)^2 = n^{-1}\sum_{i=1}^{n} (W_i - \bar W)^2 = \hat\beta^2$$

(the sample variance). If instead our target were $\theta_0 = \beta^2$ then we would take $d = 4$, $X = (W, W^2, W^3, W^4)^T$, $\mu = E(X)$,

$$g(x^{(1)}, \ldots, x^{(4)}) = x^{(2)} - (x^{(1)})^2,$$
$$h(x^{(1)}, \ldots, x^{(4)}) = x^{(4)} - 4x^{(1)}x^{(3)} + 6(x^{(1)})^2 x^{(2)} - 3(x^{(1)})^4 - [x^{(2)} - (x^{(1)})^2]^2.$$

In this case,
$$g(\mu) = \beta^2, \qquad g(\bar X) = \hat\beta^2, \qquad h(\mu) = E(W - m)^4 - \beta^4,$$
and
$$h(\bar X) = n^{-1}\sum_{i=1}^{n} (W_i - \bar W)^4 - \hat\beta^4.$$

(Note that $E(W - m)^4 - \beta^4$ equals the asymptotic variance of $n^{1/2}\hat\beta^2$.) The cases where $\theta_0$ is a correlation coefficient (a function of five means), or a variance ratio (a function of four means), among others, may be treated similarly.
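To make the notation concrete, here is a minimal Python sketch of the $d = 4$ representation just given, for $\theta_0 = \beta^2$ (the gamma population is an arbitrary illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(8)

# Smooth function model for a variance: the parameter is g(mu) with
# mu = E(W, W^2, W^3, W^4).
W = rng.gamma(2.0, size=100)

def g(x):   # beta^2 = E(W^2) - (E W)^2
    return x[1] - x[0] ** 2

def h(x):   # asymptotic variance of n^{1/2} g(Xbar): E(W - m)^4 - beta^4
    return (x[3] - 4 * x[0] * x[2] + 6 * x[0] ** 2 * x[1]
            - 3 * x[0] ** 4 - (x[1] - x[0] ** 2) ** 2)

Xbar = np.array([W.mean(), (W**2).mean(), (W**3).mean(), (W**4).mean()])
theta_hat = g(Xbar)      # equals the sample variance n^{-1} sum (W_i - Wbar)^2
sigma2_hat = h(Xbar)     # plug-in estimate of the asymptotic variance
```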

The following result may be established under the model described above. We first present a little notation. Put $\mu = E(X)$, and let

$$\mu^{i_1 \cdots i_j} = E\{(X - \mu)^{(i_1)} \cdots (X - \mu)^{(i_j)}\}, \quad j \ge 1,$$
$$a_{i_1 \cdots i_j} = \big(\partial^j / \partial x^{(i_1)} \cdots \partial x^{(i_j)}\big) A(x)\big|_{x=\mu},$$

and
$$\sigma^2 = \sum_{i=1}^{d} \sum_{j=1}^{d} a_i a_j\, \mu^{ij}.$$
Note that $\sigma^2$ equals the asymptotic variance of $n^{1/2} A(\bar X)$.

Theorem 4.1

Assume that the function $A$ has $j+2$ continuous derivatives in a neighbourhood of $\mu = E(X)$, that $A(\mu) = 0$, that $E(\|X\|^{j+2}) < \infty$, and that the characteristic function $\chi$ of $X$ satisfies

$$\limsup_{\|t\| \to \infty} |\chi(t)| < 1. \qquad (4.1)$$

Suppose $\sigma > 0$. Then for $j \ge 1$,
$$P\{n^{1/2} A(\bar X)/\sigma \le x\} = \Phi(x) + n^{-1/2} p_1(x)\phi(x) + \cdots + n^{-j/2} p_j(x)\phi(x) + o(n^{-j/2}) \qquad (4.2)$$
uniformly in $x$, where $p_j$ is a polynomial of degree at most $3j - 1$, odd for even $j$ and even for odd $j$, with coefficients depending on moments of $X$ up to order $j + 2$. In particular,

$$p_1(x) = -\big\{A_1\sigma^{-1} + \tfrac{1}{6} A_2\sigma^{-3}(x^2 - 1)\big\},$$
where $A_1$ and $A_2$ are constants expressible in terms of the moments $\mu^{i_1\cdots i_j}$ and derivatives $a_{i_1\cdots i_j}$ defined above. See Bhattacharya and Ghosh (1978) for a proof.

Condition (4.1) is a multivariate form of Cramér's continuity condition. It is

satisfied if the distribution of $X$ is nonsingular (i.e. has a nondegenerate absolutely continuous component) or if $X = (W, W^2, \ldots, W^d)^T$, where $W$ is a random variable with a nonsingular distribution.

Two versions of (4.2) are given by
$$P\{n^{1/2}(\hat\theta - \theta_0)/\sigma \le x\} = \Phi(x) + n^{-1/2} p_1(x)\phi(x) + \cdots + n^{-j/2} p_j(x)\phi(x) + o(n^{-j/2}) \qquad (4.3)$$
and

$$P\{n^{1/2}(\hat\theta - \theta_0)/\hat\sigma \le x\} = \Phi(x) + n^{-1/2} q_1(x)\phi(x) + \cdots + n^{-j/2} q_j(x)\phi(x) + o(n^{-j/2}), \qquad (4.4)$$

being Edgeworth expansions for non-Studentized and Studentized statistics, respectively. Here, $p_j$ and $q_j$ are polynomials of degree at most $3j - 1$ and are odd or even functions according to whether $j$ is even or odd. They are usually distinct.

The Edgeworth expansion in Theorem 4.1 is readily inverted so as to yield a Cornish-Fisher expansion of the critical point of a distribution. To appreciate how, first define $w_\alpha = w_\alpha(n)$, the $\alpha$-level quantile of the distribution of $S_n = n^{1/2} A(\bar X)/\sigma$, by

$$w_\alpha = \inf\{x : P(S_n \le x) \ge \alpha\}.$$
Let $z_\alpha$ be the $\alpha$-level Standard Normal quantile, given by $\Phi(z_\alpha) = \alpha$. We may write

$$w_\alpha = z_\alpha + n^{-1/2} p_{11}(z_\alpha) + n^{-1} p_{21}(z_\alpha) + \cdots + n^{-j/2} p_{j1}(z_\alpha) + \cdots$$
and

$$z_\alpha = w_\alpha + n^{-1/2} p_{12}(w_\alpha) + n^{-1} p_{22}(w_\alpha) + \cdots + n^{-j/2} p_{j2}(w_\alpha) + \cdots,$$
where the functions $p_{j1}$ and $p_{j2}$ are polynomials. These expansions are to be interpreted as asymptotic series, and in that sense are available uniformly in $\varepsilon \le \alpha \le 1 - \varepsilon$ for any $0 < \varepsilon < \tfrac{1}{2}$.


The polynomials $p_{j1}$ and $p_{j2}$ are of degree at most $j + 1$, odd for even $j$ and even for odd $j$, and depend on cumulants only up to order $j + 2$. They are completely determined by the $p_i$'s in (4.2). In particular, it follows that $p_{j1}$ is determined by $p_1, \ldots, p_j$. To derive formulae for $p_{11}$ and $p_{21}$, note that

$$\alpha = \Phi(z_\alpha) + \{n^{-1/2} p_{11}(z_\alpha) + n^{-1} p_{21}(z_\alpha)\}\phi(z_\alpha) - \tfrac{1}{2} n^{-1} p_{11}(z_\alpha)^2 z_\alpha \phi(z_\alpha)$$
$$\quad + n^{-1/2}\big[p_1(z_\alpha) + n^{-1/2} p_{11}(z_\alpha)\{p_1'(z_\alpha) - z_\alpha p_1(z_\alpha)\}\big]\phi(z_\alpha) + n^{-1} p_2(z_\alpha)\phi(z_\alpha) + O(n^{-3/2})$$
$$= \alpha + n^{-1/2}\{p_{11}(z_\alpha) + p_1(z_\alpha)\}\phi(z_\alpha) + n^{-1}\big[p_{21}(z_\alpha) - \tfrac{1}{2} z_\alpha p_{11}(z_\alpha)^2 + p_{11}(z_\alpha)\{p_1'(z_\alpha) - z_\alpha p_1(z_\alpha)\} + p_2(z_\alpha)\big]\phi(z_\alpha) + O(n^{-3/2}).$$

From this we may conclude that
$$p_{11}(x) = -p_1(x)$$

and
$$p_{21}(x) = p_1(x)\, p_1'(x) - \tfrac{1}{2}\, x\, p_1(x)^2 - p_2(x).$$

Formulae for the other polynomials $p_{j1}$, and for the $p_{j2}$'s, may be derived similarly; however, they will not be needed in our work.

Cornish-Fisher expansions under explicit regularity conditions may be deduced from results such as Theorem 4.1. For example, the following inversions of (4.3) and (4.4) are valid uniformly in $\varepsilon \le \alpha \le 1 - \varepsilon$, under the conditions of that theorem:

$$u_\alpha = z_\alpha + n^{-1/2} p_{11}(z_\alpha) + n^{-1} p_{21}(z_\alpha) + \cdots + n^{-j/2} p_{j1}(z_\alpha) + o(n^{-j/2}), \qquad (4.5)$$
and

$$v_\alpha = z_\alpha + n^{-1/2} q_{11}(z_\alpha) + n^{-1} q_{21}(z_\alpha) + \cdots + n^{-j/2} q_{j1}(z_\alpha) + o(n^{-j/2}). \qquad (4.6)$$
Here $z_\alpha$, $u_\alpha$, $v_\alpha$ are the solutions of the equations $\Phi(z_\alpha) = \alpha$,

$$P\{n^{1/2}(\hat\theta - \theta_0)/\sigma \le u_\alpha\} = \alpha, \qquad P\{n^{1/2}(\hat\theta - \theta_0)/\hat\sigma \le v_\alpha\} = \alpha.$$


4.3. Edgeworth and Cornish-Fisher expansions of bootstrap distributions

We are now in a position to describe Edgeworth expansions of bootstrap distributions. We shall emphasize the role played by pivotal methods, introduced in Section 2. Recall that a statistic is (asymptotically) pivotal if its limiting distribution does not depend on unknown quantities. In several respects the bootstrap does a better job of estimating the distribution of a pivotal statistic than it does for a nonpivotal statistic. The advantages of pivoting can be explained very easily by means of Edgeworth expansion, as follows. If a pivotal statistic $T$ is asymptotically Normally distributed, then in regular cases we may expand its distribution function as

$$G(x) = P(T \le x) = \Phi(x) + n^{-1/2} q(x)\phi(x) + O(n^{-1}), \qquad (4.7)$$
where $q$ is an even quadratic polynomial. See (4.2), for example. We might take $T = n^{1/2}(\hat\theta - \theta_0)/\hat\sigma$, where $\hat\theta$ is an estimator of an unknown parameter $\theta_0$, and $\hat\sigma^2$ is an estimator of the asymptotic variance $\sigma^2$ of $n^{1/2}\hat\theta$. The bootstrap estimator of $G$ admits an analogous expansion,

$$\hat G(x) = P(T^* \le x \mid \mathcal{X}) = \Phi(x) + n^{-1/2} \hat q(x)\phi(x) + O_p(n^{-1}), \qquad (4.8)$$
where $T^*$ is the bootstrap version of $T$, computed from a resample $\mathcal{X}^*$ instead of the sample $\mathcal{X}$, and the polynomial $\hat q$ is obtained from $q$ on replacing unknowns, such as skewness, by bootstrap estimates. (The notation $O_p(n^{-1})$ denotes a random variable that is of order $n^{-1}$ in probability.) The distribution of $T^*$ conditional on $\mathcal{X}$ is called the bootstrap distribution of $T^*$.

The estimators in the coefficients of $\hat q$ are typically distant $O_p(n^{-1/2})$ from their respective values in $q$, and so $\hat q - q = O_p(n^{-1/2})$. Therefore, subtracting (4.7) from (4.8), we conclude that

$$P(T^* \le x \mid \mathcal{X}) - P(T \le x) = n^{-1/2}\{\hat q(x) - q(x)\}\phi(x) + O_p(n^{-1}) = O_p(n^{-1}).$$
Thus the bootstrap approximation to the distribution of the pivotal statistic $T$ is in error by terms of size only $n^{-1}$.


By way of comparison, consider a nonpivotal statistic such as $U = n^{1/2}(\hat\theta - \theta_0)$. The distribution function of $U$ and its bootstrap estimator admit the expansions
$$H(x) = P(U \le x) = \Phi(x/\sigma) + n^{-1/2} p(x/\sigma)\phi(x/\sigma) + O(n^{-1})$$
and
$$\hat H(x) = P(U^* \le x \mid \mathcal{X}) = \Phi(x/\hat\sigma) + n^{-1/2} \hat p(x/\hat\sigma)\phi(x/\hat\sigma) + O_p(n^{-1}),$$

respectively, where $p$ is a polynomial, $\hat p$ is obtained from $p$ on replacing unknowns by their bootstrap estimators, $\sigma^2$ equals the asymptotic variance of $U$, $\hat\sigma^2$ is the bootstrap estimator of $\sigma^2$, and $U^*$ is the bootstrap version of $U$. Again, $\hat p - p = O_p(n^{-1/2})$, and also $\hat\sigma - \sigma = O_p(n^{-1/2})$, whence

$$\hat H(x) - H(x) = \Phi(x/\hat\sigma) - \Phi(x/\sigma) + O_p(n^{-1}). \qquad (4.9)$$
Now, the difference between $\hat\sigma$ and $\sigma$ is usually of precise order $n^{-1/2}$. Indeed, $n^{1/2}(\hat\sigma - \sigma)$ typically has a limiting Normal $N(0, \zeta^2)$ distribution, for some $\zeta > 0$. Thus, $\Phi(x/\hat\sigma) - \Phi(x/\sigma)$ is generally of size $n^{-1/2}$, not $n^{-1}$. Hence by (4.9), the bootstrap approximation to $H$ is in error by terms of size $n^{-1/2}$, not $n^{-1}$. This relatively poor performance is due to the presence of $\sigma$ in the limiting distribution function $\Phi(x/\sigma)$, i.e. to the fact that $U$ is not pivotal.
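The contrast can be seen in a small simulation. The following Python sketch (entirely our own illustrative construction, with a mean as the parameter) compares the bootstrap estimate of the distribution function of the pivotal statistic $T$ with that of the nonpivotal statistic $U$ at a fixed argument $x$:

```python
import numpy as np

rng = np.random.default_rng(9)

# T = n^{1/2}(thetahat - theta0)/sigmahat (pivotal),
# U = n^{1/2}(thetahat - theta0)          (nonpivotal), theta a mean.
n, B, M, x = 20, 1000, 2000, 1.0
theta0 = 1.0
true_T, true_U, boot_T, boot_U = [], [], [], []

# Approximate the true probabilities P(T <= x) and P(U <= x) by simulation.
for m in range(M):
    X = rng.exponential(theta0, size=n)
    true_T.append(np.sqrt(n) * (X.mean() - theta0) / X.std() <= x)
    true_U.append(np.sqrt(n) * (X.mean() - theta0) <= x)

# One sample's bootstrap estimates of the same probabilities.
X = rng.exponential(theta0, size=n)
for b in range(B):
    Xs = rng.choice(X, size=n, replace=True)
    boot_T.append(np.sqrt(n) * (Xs.mean() - X.mean()) / Xs.std() <= x)
    boot_U.append(np.sqrt(n) * (Xs.mean() - X.mean()) <= x)

print(np.mean(true_T), np.mean(boot_T))   # typically close: error O_p(n^{-1})
print(np.mean(true_U), np.mean(boot_U))   # typically further apart: O_p(n^{-1/2})
```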

Expansions such as (4.8) may be developed under the smooth function model, and analogues of Theorem 4.1 are available in the bootstrap case. For example, let us return to the notation introduced just prior to that theorem, and introduce additionally the definitions $\bar X^* = n^{-1}\sum X_i^*$, $\hat\theta^* = g(\bar X^*)$ and $\hat\sigma^{*2} = h(\bar X^*)$, where $\mathcal{X}^* = \{X_1^*, \ldots, X_n^*\}$ denotes a resample drawn randomly, with replacement, from $\mathcal{X} = \{X_1, \ldots, X_n\}$. Then under the same conditions as in Theorem 4.1, except that the moment condition should be strengthened a little, we have the following analogues of (4.3) and (4.4), respectively:

$$P\{n^{1/2}(\hat\theta^* - \hat\theta)/\hat\sigma \le x \mid \mathcal{X}\} = \Phi(x) + n^{-1/2} \hat p_1(x)\phi(x) + \cdots + n^{-j/2} \hat p_j(x)\phi(x) + o_p(n^{-j/2}) \qquad (4.10)$$

and
$$P\{n^{1/2}(\hat\theta^* - \hat\theta)/\hat\sigma^* \le x \mid \mathcal{X}\} = \Phi(x) + n^{-1/2} \hat q_1(x)\phi(x) + \cdots + n^{-j/2} \hat q_j(x)\phi(x) + o_p(n^{-j/2}). \qquad (4.11)$$
The corresponding Cornish-Fisher expansions of the bootstrap quantiles $\hat u_\alpha$ and $\hat v_\alpha$ (defined formally in Section 4.4) are
$$\hat u_\alpha = z_\alpha + n^{-1/2} \hat p_{11}(z_\alpha) + n^{-1} \hat p_{21}(z_\alpha) + \cdots + n^{-j/2} \hat p_{j1}(z_\alpha) + o_p(n^{-j/2}) \qquad (4.12)$$
and
$$\hat v_\alpha = z_\alpha + n^{-1/2} \hat q_{11}(z_\alpha) + n^{-1} \hat q_{21}(z_\alpha) + \cdots + n^{-j/2} \hat q_{j1}(z_\alpha) + o_p(n^{-j/2}). \qquad (4.13)$$


Here, $\hat p_{j1}$ and $\hat q_{j1}$ differ from $p_{j1}$ and $q_{j1}$, appearing in (4.5) and (4.6), only in that $F_0$ is replaced by $F_1$; that is, population moments are replaced by sample moments. Of course, Cornish-Fisher expansions are to be interpreted as asymptotic series, and apply uniformly in values of $\alpha$ bounded away from zero and one. For example,

$$n^{j/2} \sup_{\varepsilon \le \alpha \le 1-\varepsilon} \big|\hat u_\alpha - \{z_\alpha + n^{-1/2} \hat p_{11}(z_\alpha) + \cdots + n^{-j/2} \hat p_{j1}(z_\alpha)\}\big| \to 0$$
in probability, for any $0 < \varepsilon < \tfrac{1}{2}$.


4.4. Different types of confidence intervals

To construct a percentile-t confidence interval for $\theta_0$, define $\sigma^2(F_1)$ to be the asymptotic variance of $n^{1/2}\hat\theta$, and put $\hat\sigma^2 = \sigma^2(F_1)$. A theoretical $\alpha$-level percentile-t confidence interval is $J_1 = (-\infty, \hat\theta + t_0\hat\sigma)$, where on the present occasion $t_0$ is given by

$$P(\theta_0 \le \hat\theta + t_0\hat\sigma) = \alpha.$$
This is equivalent to solving the population equation (2.1) with

$$f_t(F_0, F_1) = I\{\theta(F_0) \le \theta(F_1) + t\sigma(F_1)\} - \alpha.$$
The bootstrap interval is obtained by solving the corresponding sample equation, and is $\hat J_1 = (-\infty, \hat\theta + \hat t_0\hat\sigma)$, where $\hat t_0$ is now defined by

$$P\{\hat\theta \le \theta(F_2) + \hat t_0\sigma(F_2) \mid F_1\} = \alpha.$$
To simplify notation in future sections we shall often denote $\theta(F_2)$ and $\sigma(F_2)$ by $\hat\theta^*$ and $\hat\sigma^*$, respectively.

Exposition will be clearer if we represent $t_0$ and $\hat t_0$ in terms of quantiles. Thus, we define $u_\alpha$, $v_\alpha$, $\hat u_\alpha$, and $\hat v_\alpha$ by

$$P[n^{1/2}\{\theta(F_1) - \theta(F_0)\}/\sigma(F_0) \le u_\alpha] = P[n^{1/2}\{\theta(F_2) - \theta(F_1)\}/\sigma(F_1) \le \hat u_\alpha \mid F_1] = \alpha \qquad (4.14)$$
and

$$P[n^{1/2}\{\theta(F_1) - \theta(F_0)\}/\sigma(F_1) \le v_\alpha] = P[n^{1/2}\{\theta(F_2) - \theta(F_1)\}/\sigma(F_2) \le \hat v_\alpha \mid F_1] = \alpha. \qquad (4.15)$$
Write $\sigma = \sigma(F_0)$ and $\hat\sigma = \sigma(F_1)$. Then definitions of $I_1$, $J_1$, $\hat I_1$, and $\hat J_1$ equivalent to those given earlier are

$$I_1 = (-\infty,\ \hat\theta - n^{-1/2}\sigma u_{1-\alpha}), \qquad J_1 = (-\infty,\ \hat\theta - n^{-1/2}\hat\sigma v_{1-\alpha}),$$
$$\hat I_1 = (-\infty,\ \hat\theta - n^{-1/2}\hat\sigma \hat u_{1-\alpha}), \qquad \hat J_1 = (-\infty,\ \hat\theta - n^{-1/2}\hat\sigma \hat v_{1-\alpha}).$$

All are confidence intervals for $\theta_0$, with coverage probabilities approximately equal to $\alpha$.

In the nonparametric case the statistic $\theta(F_2)$, conditional on $F_1$, has a discrete distribution. This means that equations (4.14) and (4.15) will usually not have exact solutions, although, as pointed out in Section 1.3 and Appendix I of Hall (1992), the errors due to discreteness are exponentially small functions of $n$. The reader concerned by the problem of discreteness might like to define $\hat u_\alpha$ and $\hat v_\alpha$ by

$$\hat u_\alpha = \inf\big(u : P[n^{1/2}\{\theta(F_2) - \theta(F_1)\}/\sigma(F_1) \le u \mid F_1] \ge \alpha\big)$$
and

$$\hat v_\alpha = \inf\big(u : P[n^{1/2}\{\theta(F_2) - \theta(F_1)\}/\sigma(F_2) \le u \mid F_1] \ge \alpha\big).$$
Two-sided, equal-tailed confidence intervals are constructed by forming the

intersection of two one-sided intervals. Two-sided analogues of $I_1$ and $J_1$ are
$$I_2 = (\hat\theta - n^{-1/2}\sigma u_{(1+\alpha)/2},\ \hat\theta - n^{-1/2}\sigma u_{(1-\alpha)/2})$$

and
$$J_2 = (\hat\theta - n^{-1/2}\hat\sigma v_{(1+\alpha)/2},\ \hat\theta - n^{-1/2}\hat\sigma v_{(1-\alpha)/2}),$$

    respectively, with bootstrap versions

$$\hat I_2 = (\hat\theta - n^{-1/2}\hat\sigma \hat u_{(1+\alpha)/2},\ \hat\theta - n^{-1/2}\hat\sigma \hat u_{(1-\alpha)/2})$$
and
$$\hat J_2 = (\hat\theta - n^{-1/2}\hat\sigma \hat v_{(1+\alpha)/2},\ \hat\theta - n^{-1/2}\hat\sigma \hat v_{(1-\alpha)/2}).$$

The intervals $I_2$ and $J_2$ have equal probability in each tail; for example,

$$P(\theta_0 \le \hat\theta - n^{-1/2}\sigma u_{(1+\alpha)/2}) = P(\theta_0 > \hat\theta - n^{-1/2}\sigma u_{(1-\alpha)/2}) = \tfrac{1}{2}(1 - \alpha).$$
Intervals $\hat I_2$ and $\hat J_2$ have approximately the same level of probability in each tail, and are called two-sided, equal-tailed confidence intervals. Two-sided symmetric intervals were discussed in Section 2.

All the intervals defined above have at least asymptotic coverage $\alpha$, in the sense that if $\mathcal{I}$ is any one of the intervals,
$$P(\theta_0 \in \mathcal{I}) \to \alpha$$

as $n \to \infty$. As before, we call $\alpha$ the nominal coverage of the confidence interval $\mathcal{I}$. The coverage error of $\mathcal{I}$ is the difference between true coverage and nominal coverage,

$$\text{coverage error} = P(\theta_0 \in \mathcal{I}) - \alpha.$$


4.5. Order of correctness of bootstrap approximations to critical points

The $\alpha$-level quantiles of the distributions of $S = n^{1/2}(\hat\theta - \theta_0)/\sigma$ and $T = n^{1/2}(\hat\theta - \theta_0)/\hat\sigma$ are $u_\alpha$ and $v_\alpha$, respectively, with bootstrap estimates $\hat u_\alpha$ and $\hat v_\alpha$. Subtracting expansions (4.5) and (4.6) from (4.12) and (4.13), we deduce that

$$\hat u_\alpha - u_\alpha = n^{-1/2}\{\hat p_{11}(z_\alpha) - p_{11}(z_\alpha)\} + n^{-1}\{\hat p_{21}(z_\alpha) - p_{21}(z_\alpha)\} + \cdots \qquad (4.16)$$
and

$$\hat v_\alpha - v_\alpha = n^{-1/2}\{\hat q_{11}(z_\alpha) - q_{11}(z_\alpha)\} + n^{-1}\{\hat q_{21}(z_\alpha) - q_{21}(z_\alpha)\} + \cdots.$$
Now, the polynomial $\hat p_{j1}$ is obtained from $p_{j1}$ on replacing population moments by sample moments, and the latter are distant $O_p(n^{-1/2})$ from their population counterparts. Therefore $\hat p_{j1}$ is distant $O_p(n^{-1/2})$ from $p_{j1}$. Thus, by (4.16),

$$\hat u_\alpha - u_\alpha = O_p(n^{-1/2} \cdot n^{-1/2} + n^{-1}) = O_p(n^{-1}),$$
and similarly $\hat v_\alpha - v_\alpha = O_p(n^{-1})$.

This establishes one of the important properties of bootstrap, or sample, critical points: the bootstrap estimates of $u_\alpha$ and $v_\alpha$ are in error by only order $n^{-1}$. In comparison, the traditional Normal approximation argues that $u_\alpha$ and $v_\alpha$ are both close to $z_\alpha$, and is in error by $n^{-1/2}$; for example,

$$z_\alpha - u_\alpha = z_\alpha - \{z_\alpha + n^{-1/2} p_{11}(z_\alpha) + \cdots\} = -n^{-1/2} p_{11}(z_\alpha) + O(n^{-1}).$$
Approximation by Student's $t$ distribution hardly improves on the Normal approximation, since the $\alpha$-level quantile $t_\alpha$ of Student's $t$ distribution with $n - \nu$ degrees of freedom (for any fixed $\nu$) is distant order $n^{-1}$, not order $n^{-1/2}$, away from $z_\alpha$. Thus, the bootstrap has definite advantages over traditional methods employed to approximate critical points.

This property of the bootstrap will only benefit us if we use bootstrap critical points in the right way. To appreciate the importance of this remark, go back to the definitions of the confidence intervals $I_1$, $J_1$, $\hat I_1$, and $\hat J_1$ given in Section 4.4. Since $\hat v_{1-\alpha} = v_{1-\alpha} + O_p(n^{-1})$, the upper endpoint of the interval $\hat J_1 = (-\infty, \hat\theta - n^{-1/2}\hat\sigma\hat v_{1-\alpha})$ differs from the upper endpoint of $J_1 = (-\infty, \hat\theta - n^{-1/2}\hat\sigma v_{1-\alpha})$ by only $O_p(n^{-3/2})$. We say that $\hat J_1$ is second-order correct for $J_1$, and that $\hat\theta - n^{-1/2}\hat\sigma\hat v_{1-\alpha}$ is second-order correct for $\hat\theta - n^{-1/2}\hat\sigma v_{1-\alpha}$, since the latter two quantities are in agreement up to and including terms of order $(n^{-1/2})^2 = n^{-1}$. In contrast, $\hat I_1 = (-\infty, \hat\theta - n^{-1/2}\hat\sigma\hat u_{1-\alpha})$ is generally only first-order correct for $I_1 = (-\infty, \hat\theta - n^{-1/2}\sigma u_{1-\alpha})$, since the upper endpoints agree only in terms of order $n^{-1/2}$, not $n^{-1}$:

$$(\hat\theta - n^{-1/2}\hat\sigma\hat u_{1-\alpha}) - (\hat\theta - n^{-1/2}\sigma u_{1-\alpha}) = n^{-1/2}(\sigma u_{1-\alpha} - \hat\sigma\hat u_{1-\alpha}) = n^{-1/2} u_{1-\alpha}(\sigma - \hat\sigma) + O_p(n^{-3/2}),$$
which is generally of precise order $n^{-1}$.


Coefficients of the polynomial $q_{11}$ are usually unknown quantities. In view of results such as the Cramér-Rao lower bound (e.g. Cox and Hinkley 1974, pp. 254ff), the coefficients cannot be estimated with an accuracy better than order $n^{-1/2}$. This means that $v_{1-\alpha}$ cannot be estimated with an accuracy better than order $n^{-1}$, and that the upper endpoint of the confidence interval $J_1 = (-\infty, \hat\theta - n^{-1/2}\hat\sigma v_{1-\alpha})$ cannot be estimated with an accuracy better than order $n^{-3/2}$. Therefore, except in unusual circumstances, any practical confidence interval $\hat J_1$ that tries to emulate $J_1$ will have an endpoint differing in a term of order $n^{-3/2}$ from that of $J_1$, and so will not be third-order correct. Exceptional circumstances are those where we have enough parametric information about the population to know the coefficients of $q_{11}$. For example, in the case of estimating a mean, $q_{11}$ vanishes if the underlying population is symmetric. If we know that the population is symmetric, we may construct confidence intervals that are better than second-order correct. For example, we may resample in a way that ensures that the bootstrap distribution is symmetric, by sampling with replacement from the collection $\{\pm(X_1 - \bar X), \ldots, \pm(X_n - \bar X)\}$ rather than $\{X_1 - \bar X, \ldots, X_n - \bar X\}$. But in most problems, both parametric and nonparametric, second-order correctness is the best we can hope to achieve.
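A minimal Python sketch of this symmetrized resampling scheme (names are illustrative; the scheme is appropriate only under the symmetry assumption):

```python
import numpy as np

rng = np.random.default_rng(10)

# Force a symmetric bootstrap distribution by resampling from
# {+/-(X_i - Xbar)} rather than from {X_i - Xbar}.
X = rng.standard_t(df=6, size=40)          # a symmetric population
centred = X - X.mean()
symmetrised = np.concatenate([centred, -centred])

B = 2000
mean_star = np.array([rng.choice(symmetrised, size=len(X), replace=True).mean()
                      for _ in range(B)])
# By construction, mean_star has a distribution symmetric about zero.
```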

4.6. Coverage error of confidence intervals

In this section we show how to apply the Edgeworth and Cornish-Fisher expansion formulae developed in Sections 4.2 and 4.3 to develop expressions for coverage accuracy of bootstrap confidence intervals. It is convenient to focus attention initially on the case of one-sided intervals, and to progress from there to the two-sided case.

A general one-sided confidence interval for $\theta_0$ may be expressed as $\mathcal{I}_1 = (-\infty, \hat\theta + \hat t)$, where $\hat t$ is determined from the data. In most circumstances, if $\mathcal{I}_1$ has nominal coverage $\alpha$ then $\hat t$ admits the representation

$$\hat t = n^{-1/2}\hat\sigma(z_\alpha + \hat c_\alpha), \qquad (4.17)$$
where $\hat c_\alpha$ is a random variable which converges to zero as $n \to \infty$. For example, this would typically be the case if $T = n^{1/2}(\hat\theta - \theta_0)/\hat\sigma$ had an asymptotic Standard Normal distribution. However, should the value $\sigma^2$ of the asymptotic variance be known, we would most likely use an interval in which $\hat t$ had the form

$$\hat t = n^{-1/2}\sigma(z_\alpha + \hat c_\alpha).$$
Intervals $\hat I_1$, $J_1$, and $\hat J_1$ (defined in Section 4.4) are of the former type, with $\hat c_\alpha$ in (4.17) assuming the respective values $-\hat u_{1-\alpha} - z_\alpha$, $-v_{1-\alpha} - z_\alpha$, and $-\hat v_{1-\alpha} - z_\alpha$. So also are the Normal approximation interval $(-\infty, \hat\theta + n^{-1/2}\hat\sigma z_\alpha)$ and the Student's $t$ approximation interval $(-\infty, \hat\theta + n^{-1/2}\hat\sigma t_\alpha)$, where $t_\alpha$ is the $\alpha$-level quantile of Student's $t$ distribution with $n - 1$ degrees of freedom. Interval $I_1$ is of the latter type. The main purpose of the correction term $\hat c_\alpha$ is to adjust for skewness. To a lesser extent it corrects for higher-order departures from Normality.

Suppose that $\hat t$ is of the form (4.17). Then the coverage probability is
$$\alpha_{1,n} = P(\theta_0 \in \mathcal{I}_1) = P\{\theta_0 \le \hat\theta + n^{-1/2}\hat\sigma(z_\alpha + \hat c_\alpha)\}$$

$$= 1 - P\{n^{1/2}(\hat\theta - \theta_0)\hat\sigma^{-1} + \hat c_\alpha < -z_\alpha\}. \qquad (4.18)$$
We wish to develop an expansion of this probability. For that purpose it is necessary to have an Edgeworth expansion of the distribution function of

$$n^{1/2}(\hat\theta - \theta_0)\hat\sigma^{-1} + \hat c_\alpha,$$
or at least a good approximation to it. In some circumstances, $\hat c_\alpha$ is easy to work with directly; for example, $\hat c_\alpha = 0$ in the case of the Normal approximation interval. But for bootstrap intervals, $\hat c_\alpha$ is defined only implicitly as the solution of an equation, and that makes it rather difficult to handle. So we first approximate it by a Cornish-Fisher expansion.

Suppose that
$$\hat c_\alpha = n^{-1/2}\hat s_1(z_\alpha) + n^{-1}\hat s_2(z_\alpha) + O_p(n^{-3/2}), \qquad (4.19)$$

where $s_1$ and $s_2$ are polynomials with coefficients equal to polynomials in population moments, and $\hat s_j$ is obtained from $s_j$ on replacing population moments by sample moments. Then $\hat s_j = s_j + O_p(n^{-1/2})$ and

$$P\{n^{1/2}(\hat\theta - \theta_0)\hat\sigma^{-1} + \hat c_\alpha \le x\} = P\Big[n^{1/2}(\hat\theta - \theta_0)\hat\sigma^{-1} + n^{-1/2}\{\hat s_1(z_\alpha) - s_1(z_\alpha)\} \le x - \sum_{j=1}^{2} n^{-j/2} s_j(z_\alpha)\Big] + O(n^{-3/2}). \qquad (4.20)$$
Here we have used the delta method.

Therefore, to evaluate the coverage probability $\alpha_{1,n}$ at (4.18) up to a remainder of order $n^{-3/2}$, we need only derive an Edgeworth expansion of the distribution function of

$$S_n = n^{1/2}(\hat\theta - \theta_0)\hat\sigma^{-1} + n^{-1}\Delta_n, \qquad (4.21)$$
where $\Delta_n = n^{1/2}\{\hat s_1(z_\alpha) - s_1(z_\alpha)\}$. That is usually simpler than finding an Edgeworth expansion for $n^{1/2}(\hat\theta - \theta_0)\hat\sigma^{-1} + \hat c_\alpha$.


Put $T_n = n^{1/2}(\hat\theta - \theta_0)/\hat\sigma$ and $\Delta_n = n^{1/2}\{\hat s_1(z_\alpha) - s_1(z_\alpha)\}$, and let $a_\alpha$ denote the real number such that

$$E(T_n \Delta_n) = E\big[n^{1/2}(\hat\theta - \theta_0)\hat\sigma^{-1} \cdot n^{1/2}\{\hat s_1(z_\alpha) - s_1(z_\alpha)\}\big] = a_\alpha + O(n^{-1}). \qquad (4.22)$$

If $s_1$ is an even polynomial of degree 2, which would typically be the case, then $a_\alpha = \pi(z_\alpha)$, where $\pi$ is an even polynomial of degree 2 with coefficients not depending on $\alpha$. Then it may be shown that

$$P(S_n \le x) = \Phi(x) + n^{-1/2} q_1(x)\phi(x) + n^{-1}\{q_2(x) - a_\alpha x\}\phi(x) + O(n^{-3/2}), \qquad (4.23)$$
whence, combining (4.18), (4.20) and (4.23),
$$\alpha_{1,n} = \alpha + n^{-1/2}\{s_1(z_\alpha) - q_1(z_\alpha)\}\phi(z_\alpha) + n^{-1}\big[q_2(z_\alpha) + s_2(z_\alpha) - \tfrac{1}{2} z_\alpha s_1(z_\alpha)^2 + s_1(z_\alpha)\{z_\alpha q_1(z_\alpha) - q_1'(z_\alpha)\} - a_\alpha z_\alpha\big]\phi(z_\alpha) + O(n^{-3/2}). \qquad (4.24)$$


There is no difficulty developing the expansion (4.24) to an arbitrary number of terms, obtaining a series in powers of $n^{-1/2}$ where the coefficient of $n^{-j/2}\phi(z_\alpha)$ equals an odd or even polynomial depending on whether $j$ is even or odd. The following proposition summarizes that result.

Proposition 4.1

Consider the confidence interval

$$\mathcal{J}_1 = \mathcal{J}_1(\alpha) = (-\infty,\, \hat\theta + n^{-1/2}\hat\sigma(z_\alpha + \hat c_\alpha)),$$
where

$$\hat c_\alpha \sim n^{-1/2}\hat s_1(z_\alpha) + n^{-1}\hat s_2(z_\alpha) + \cdots,$$
where the $\hat s_j$'s are obtained from polynomials $s_j$ on replacing population moments by sample moments, and odd/even indexed $s_j$'s are even/odd polynomials, respectively. Suppose

$$P\{n^{1/2}(\hat\theta - \theta_0)/\hat\sigma \le x\} = \Phi(x) + n^{-1/2}q_1(x)\phi(x) + n^{-1}q_2(x)\phi(x) + \cdots.$$
Then
$$\alpha_{1,n} = P(\theta_0 \in \mathcal{J}_1) = \Phi(z_\alpha) + n^{-1/2}r_1(z_\alpha)\phi(z_\alpha) + n^{-1}r_2(z_\alpha)\phi(z_\alpha) + \cdots, \qquad (4.25)$$
where $r_1 = s_1 - q_1$, the polynomial $r_2$ is the coefficient of $n^{-1}\phi(z_\alpha)$ in (4.24), and, more generally, $r_j$ is an odd or even polynomial according as $j$ is even or odd.

It follows that the coverage error of $\mathcal{J}_1$ is of order $n^{-1}$, rather than $n^{-1/2}$, for all $\alpha$, if and only if $s_1 = q_1$; and that is precisely the condition for $\mathcal{J}_1$ to be second-order correct relative to $J_1$.


To appreciate why, go back to the definition (4.19) of $\hat c_\alpha$, which implies that
$$\mathcal{J}_1 = (-\infty,\, \hat\theta + n^{-1/2}\hat\sigma(z_\alpha + \hat c_\alpha))$$

$$= (-\infty,\, \hat\theta + n^{-1/2}\hat\sigma\{z_\alpha + n^{-1/2}\hat s_1(z_\alpha)\} + O_p(n^{-3/2})).$$
Since $q_{11} = -q_1$ (see Section 4.2), then

$$\hat J_1 = (-\infty,\, \hat\theta - n^{-1/2}\hat\sigma\hat v_{1-\alpha})
= (-\infty,\, \hat\theta - n^{-1/2}\hat\sigma[z_{1-\alpha} + n^{-1/2}\hat q_{11}(z_{1-\alpha})] + O_p(n^{-3/2}))$$
$$= (-\infty,\, \hat\theta + n^{-1/2}\hat\sigma[z_\alpha + n^{-1/2}\hat q_1(z_\alpha)] + O_p(n^{-3/2})).$$

The upper endpoint of this interval agrees with that of $\mathcal{J}_1$ in terms of order $n^{-1}$, for all $\alpha$, if and only if $s_1 = q_1$, that is, if and only if the term of order $n^{-1/2}$ vanishes from (4.25). Therefore the second-order correct interval $\hat J_1$ has coverage error of order $n^{-1}$, but the interval $\hat I_1$, which is only first-order correct, has coverage error of size $n^{-1/2}$ except in special circumstances.
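The contrast between the $n^{-1/2}$ and $n^{-1}$ coverage errors of $\hat I_1$ and $\hat J_1$ can be seen in a small simulation. The following Python sketch estimates the coverage of both one-sided intervals for the mean of an Exponential population; the population, sample sizes, replication counts (kept modest for speed) and seed are arbitrary illustrative choices of ours.

```python
import numpy as np

def one_sided_coverage(n, alpha=0.95, n_rep=1000, n_boot=499, seed=0):
    """Monte Carlo coverage of I1_hat (percentile) and J1_hat (percentile-t)
    one-sided intervals for the mean theta_0 = 1 of an Exponential(1) sample."""
    rng = np.random.default_rng(seed)
    theta0 = 1.0
    hits_pct = hits_t = 0
    for _ in range(n_rep):
        x = rng.exponential(size=n)
        th, sd = x.mean(), x.std(ddof=0)
        xb = rng.choice(x, size=(n_boot, n), replace=True)
        mb, sb = xb.mean(axis=1), xb.std(axis=1, ddof=0)
        u_hat = np.quantile(np.sqrt(n) * (mb - th) / sd, 1 - alpha)
        v_hat = np.quantile(np.sqrt(n) * (mb - th) / sb, 1 - alpha)
        hits_pct += theta0 < th - sd * u_hat / np.sqrt(n)
        hits_t += theta0 < th - sd * v_hat / np.sqrt(n)
    return hits_pct / n_rep, hits_t / n_rep

for n in (20, 80):
    print(n, one_sided_coverage(n))  # percentile-t tends to sit closer to 0.95
```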

So far we have worked only with confidence intervals of the form $(-\infty,\, \hat\theta + \hat t\,)$, where $\hat t = n^{-1/2}\hat\sigma(z_\alpha + \hat c_\alpha)$ and $\hat c_\alpha$ is given by (4.19). Should the value $\sigma^2$ of asymptotic variance be known, then we would most likely construct confidence intervals using $\hat t = n^{-1/2}\sigma(z_\alpha + \hat c_\alpha)$, again for $\hat c_\alpha$ given by (4.19). This case may be treated by reworking the arguments above. We should change the symbol $q$ to $p$ at each appearance, because we are now working with the Edgeworth expansion (4.3) rather than (4.4). With this alteration, formula (4.24) for coverage probability continues to apply:

$$P\{\theta_0 \in (-\infty,\, \hat\theta + n^{-1/2}\sigma(z_\alpha + \hat c_\alpha))\} = \alpha + n^{-1/2}\{s_1(z_\alpha) - p_1(z_\alpha)\}\phi(z_\alpha)$$
$$+\, n^{-1}\big[p_2(z_\alpha) + s_2(z_\alpha) - \tfrac{1}{2}z_\alpha s_1(z_\alpha)^2 + s_1(z_\alpha)\{z_\alpha p_1(z_\alpha) - p_1'(z_\alpha)\} - z_\alpha a_\alpha\big]\phi(z_\alpha) + O(n^{-3/2}).$$

(Our definition of $a_\alpha$ at (4.22) is unaffected if $\hat\sigma^{-1}$ is replaced by $\sigma^{-1}$, since $\hat\sigma^{-1} = \sigma^{-1} + O_p(n^{-1/2})$.) Likewise, the analogue of Proposition 4.1 is valid; it is

necessary only to replace $\hat\sigma$ by $\sigma$ in the definition of $\mathcal{J}_1$, and $q_j$ by $p_j$ at all appearances of the former. Therefore our conclusions in the case where $\sigma$ is known are similar to those when $\sigma$ is unknown: a necessary and sufficient condition for the confidence interval $(-\infty,\, \hat\theta + n^{-1/2}\sigma(z_\alpha + \hat c_\alpha))$ to have coverage error of order $n^{-1}$ for all values of $\alpha$ is that it be second-order correct relative to $I_1$.

Similarly it may be proved that if $\mathcal{J}_1$ is $j$th-order correct relative to a one-sided confidence interval $\mathcal{J}_1'$, meaning that the upper endpoints agree in terms of size $n^{-j/2}$ or larger, then $\mathcal{J}_1$ and $\mathcal{J}_1'$ have the same coverage probability up to but not


necessarily including terms of order $n^{-j/2}$. The converse of this result is false for $j \ge 3$. Indeed, there are many important examples of confidence intervals whose coverage errors differ by $O(n^{-3/2})$ but which are not third-order correct relative to one another.

Coverage properties of two-sided confidence intervals are rather different from those in the one-sided case. For two-sided intervals, parity properties of the polynomials in expansions such as (4.25) cause terms of order $n^{-1/2}$ to cancel completely from expansions of coverage error. Therefore coverage error is always of order $n^{-1}$ or smaller, even for the most basic Normal approximation method. In the case of symmetric two-sided intervals constructed using the percentile-$t$ bootstrap, coverage error is of order $n^{-2}$. The remainder of the present section will treat two-sided equal-tailed intervals.
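A minimal sketch of the symmetric percentile-$t$ construction just mentioned, for a population mean: the half-width is the bootstrap $\alpha$-level quantile of $|T^*|$, so both endpoints move together. The names, population and resample count below are illustrative assumptions of ours.

```python
import numpy as np

def symmetric_percentile_t(x, alpha=0.90, n_boot=1999, seed=0):
    """Symmetric percentile-t interval: theta_hat +/- n^{-1/2} sigma_hat w_hat,
    where w_hat is the bootstrap alpha-quantile of |T*|."""
    rng = np.random.default_rng(seed)
    n = len(x)
    th, sd = x.mean(), x.std(ddof=0)
    xb = rng.choice(x, size=(n_boot, n), replace=True)
    t_star = np.sqrt(n) * (xb.mean(axis=1) - th) / xb.std(axis=1, ddof=0)
    w_hat = np.quantile(np.abs(t_star), alpha)
    return th - sd * w_hat / np.sqrt(n), th + sd * w_hat / np.sqrt(n)

x = np.random.default_rng(6).gamma(2.0, size=40)
print(symmetric_percentile_t(x))
```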

We begin by recalling our definition of the general one-sided interval $\mathcal{J}_1 = \mathcal{J}_1(\alpha)$ whose nominal coverage is $\alpha$:
$$\mathcal{J}_1(\alpha) = (-\infty,\, \hat\theta + n^{-1/2}\hat\sigma(z_\alpha + \hat c_\alpha)).$$

The equal-tailed interval based on this scheme and having nominal coverage $\alpha$ is

$$\mathcal{J}_2(\alpha) = \mathcal{J}_1((1+\alpha)/2) \setminus \mathcal{J}_1((1-\alpha)/2)$$

$$= \big(\hat\theta + n^{-1/2}\hat\sigma(z_{(1-\alpha)/2} + \hat c_{(1-\alpha)/2}),\ \hat\theta + n^{-1/2}\hat\sigma(z_{(1+\alpha)/2} + \hat c_{(1+\alpha)/2})\big). \qquad (4.26)$$
(Here $\mathcal{S} \setminus \mathcal{T}$ denotes the intersection of the set $\mathcal{S}$ with the complement of the set $\mathcal{T}$.) Apply Proposition 4.1 with $z = z_{(1+\alpha)/2} = -z_{(1-\alpha)/2}$, noting particularly that $r_j$ is an odd or even function according as $j$ is even or odd, to obtain an expansion of the coverage probability of $\mathcal{J}_2(\alpha)$:

$$\alpha_{2,n} = P\{\theta_0 \in \mathcal{J}_2(\alpha)\} = \Phi(z) + n^{-1/2}r_1(z)\phi(z) + n^{-1}r_2(z)\phi(z) + \cdots$$

$$-\,\{\Phi(-z) + n^{-1/2}r_1(-z)\phi(-z) + n^{-1}r_2(-z)\phi(-z) + \cdots\}$$
$$= \alpha + 2n^{-1}r_2(z)\phi(z) + 2n^{-2}r_4(z)\phi(z) + \cdots$$
$$= \alpha + 2n^{-1}\big[q_2(z) + s_2(z) - \tfrac{1}{2}z s_1(z)^2 + s_1(z)\{z q_1(z) - q_1'(z)\} - z a_{(1+\alpha)/2}\big]$$

$$\times\, \phi(z) + O(n^{-3/2}). \qquad (4.27)$$
The property of second-order correctness, which as we have seen is equivalent to $s_1 = q_1$, has relatively little effect on the coverage probability in (4.27). This contrasts with the case of one-sided confidence intervals.


For percentile confidence intervals,

$$s_1 = -p_{11} = p_1 \qquad (4.28)$$

and
$$s_2(x) = p_{21}(x) = p_1(x)p_1'(x) - \tfrac{1}{2}x p_1(x)^2 - p_2(x), \qquad (4.29)$$

while for percentile-$t$ intervals,
$$s_1 = -q_{11} = q_1 \qquad (4.30)$$

and
$$s_2(x) = q_{21}(x) = q_1(x)q_1'(x) - \tfrac{1}{2}x q_1(x)^2 - q_2(x). \qquad (4.31)$$

There is no significant simplification of (4.27) when (4.28) and (4.29) are used to express $s_1$ and $s_2$. However, in the percentile-$t$ case we see from (4.27), (4.30) and (4.31) that

$$\alpha_{2,n} = \alpha - 2n^{-1} z_{(1+\alpha)/2}\, a_{(1+\alpha)/2}\, \phi(z_{(1+\alpha)/2}) + O(n^{-3/2}),$$
which represents a substantial simplification.

When the asymptotic variance $\sigma^2$ is known, our formula for the equal-tailed, two-sided, $\alpha$-level confidence interval should be changed from that in (4.26) to

$$\big(\hat\theta + n^{-1/2}\sigma(z_{(1-\alpha)/2} + \hat c_{(1-\alpha)/2}),\ \hat\theta + n^{-1/2}\sigma(z_{(1+\alpha)/2} + \hat c_{(1+\alpha)/2})\big),$$
for a suitable random function $\hat c_\alpha$. If $\hat c_\alpha$ is given by (4.19) then the coverage probability of this interval is given by (4.27), except that $q$ should be changed to $p$ at each appearance in that formula. The value of $a_{(1+\alpha)/2}$ is unchanged.
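In the percentile-$t$ case the equal-tailed interval (4.26) reduces to $(\hat\theta - n^{-1/2}\hat\sigma\hat v_{(1+\alpha)/2},\ \hat\theta - n^{-1/2}\hat\sigma\hat v_{(1-\alpha)/2})$, since $z_\beta + \hat c_\beta = -\hat v_{1-\beta}$. The following Python sketch (for a mean, with illustrative names and resample counts of our own choosing) computes it by taking the two one-sided endpoints at nominal levels $(1 \pm \alpha)/2$.

```python
import numpy as np

def equal_tailed_percentile_t(x, alpha=0.90, n_boot=1999, seed=0):
    """Equal-tailed percentile-t interval (4.26) for a population mean."""
    rng = np.random.default_rng(seed)
    n = len(x)
    th, sd = x.mean(), x.std(ddof=0)
    xb = rng.choice(x, size=(n_boot, n), replace=True)
    t_star = np.sqrt(n) * (xb.mean(axis=1) - th) / xb.std(axis=1, ddof=0)
    hi_q = np.quantile(t_star, (1 + alpha) / 2)   # v_hat_{(1+alpha)/2}
    lo_q = np.quantile(t_star, (1 - alpha) / 2)   # v_hat_{(1-alpha)/2}
    return th - sd * hi_q / np.sqrt(n), th - sd * lo_q / np.sqrt(n)

x = np.random.default_rng(3).lognormal(size=50)
print(equal_tailed_percentile_t(x))
```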

    4.7. Simple linear regression

In previous sections we drew attention to important properties of the bootstrap in a wide range of statistical problems. We stressed the importance of pivoting. For example, the coverage error of a one-sided percentile-$t$ confidence interval is of size $n^{-1}$, but the coverage error of an uncorrected one-sided percentile interval is of size $n^{-1/2}$.

    The good performance of a percentile-t interval is available in problems wherethe variance of a parameter estimate may be estimated accurately. Many regression


problems are of this type. Thus, we might expect the endearing properties of percentile-$t$ to carry over without change to the regression case. In a sense, this is true; one-sided percentile-$t$ confidence intervals for a regression mean, intercept or slope all have coverage error at most $O(n^{-1})$, whereas their percentile method counterparts generally have coverage error of size $n^{-1/2}$. However, this generalization conceals several very important differences in the case of slope estimation. One-sided percentile-$t$ confidence intervals for slope have coverage error $O(n^{-3/2})$, not $O(n^{-1})$; and the error is only $O(n^{-2})$ in the case of two-sided intervals.

    These exceptional properties apply only to estimates of slope, not to estimates ofintercept or means. However, slope parameters are particularly important in thestudy of regression, and our interpretation of slope is quite general. For example,in the polynomial regression model

$$Y_i = c + x_i d_1 + \cdots + x_i^m d_m + \varepsilon_i, \qquad 1 \le i \le n, \qquad (4.32)$$
we regard each $d_j$ as a slope parameter. A one-sided percentile-$t$ confidence interval for $d_j$ has coverage error $O(n^{-3/2})$, although a one-sided percentile-$t$ interval for $c$ or for

$$E(Y \mid x = x_0) = c + x_0 d_1 + \cdots + x_0^m d_m$$
has coverage error of size $n^{-1}$.

The reason that slope parameters have this distinctive property is that the design points $x_i$ confer a significant amount of extra symmetry. Note that we may rewrite the model (4.32) as

$$Y_i = c' + (x_i - \xi_1)d_1 + \cdots + (x_i^m - \xi_m)d_m + \varepsilon_i,$$
where $\xi_j = n^{-1}\sum_i x_i^j$ and $c' = c + \xi_1 d_1 + \cdots + \xi_m d_m$.


The simple linear model is
$$Y_i = c + x_i d + \varepsilon_i, \qquad 1 \le i \le n,$$

where $c$, $d$, $x_i$, $Y_i$, $\varepsilon_i$ are scalars, $c$ and $d$ are unknown constants representing intercept and slope, respectively, the $\varepsilon_i$'s are independent and identically distributed random variables with zero mean and variance $\sigma^2$, and the $x_i$'s are fixed design points. Put
$$\hat\varepsilon_i = Y_i - \bar Y - (x_i - \bar x)\hat d, \qquad \bar x = n^{-1}\sum_i x_i,$$

$$\hat\sigma^2 = n^{-1}\sum_{i=1}^n \hat\varepsilon_i^2, \qquad \sigma_x^2 = n^{-1}\sum_{i=1}^n (x_i - \bar x)^2.$$

Then $\hat\sigma^2$ estimates $\sigma^2$, and in this notation,
$$\hat d = \sigma_x^{-2}\, n^{-1}\sum_{i=1}^n (x_i - \bar x)(Y_i - \bar Y), \qquad \hat c = \bar Y - \bar x \hat d,$$
are the usual least-squares estimates of $d$ and $c$, respectively. Since $\hat d$ has variance $n^{-1}\sigma_x^{-2}\sigma^2$, the statistic $n^{1/2}(\hat d - d)\sigma_x/\hat\sigma$ is (asymptotically) pivotal. Define

$$Y_i^* = \hat c + x_i \hat d + \varepsilon_i^*, \qquad 1 \le i \le n,$$
where the $\varepsilon_i^*$'s are generated by resampling randomly, with replacement, from the residuals $\hat\varepsilon_i$. Furthermore, $\hat\varepsilon_i^*$, $\hat c^*$, $\hat d^*$ and $\hat\sigma^*$ have the same formulae as $\hat\varepsilon_i$, $\hat c$, $\hat d$ and $\hat\sigma$, except that $Y_i$ is replaced by $Y_i^*$ at each appearance of the former.

Quantiles $u_\alpha$ and $v_\alpha$ of the distributions of
$$n^{1/2}(\hat d - d)\sigma_x/\sigma \qquad \text{and} \qquad n^{1/2}(\hat d - d)\sigma_x/\hat\sigma$$

may be defined by
$$P\{n^{1/2}(\hat d - d)\sigma_x/\sigma \le u_\alpha\} = P\{n^{1/2}(\hat d - d)\sigma_x/\hat\sigma \le v_\alpha\}$$

$$= \alpha,$$
and their bootstrap estimates $\hat u_\alpha$ and $\hat v_\alpha$ by

$$P\{n^{1/2}(\hat d^* - \hat d)\sigma_x/\hat\sigma \le \hat u_\alpha \mid \mathcal{X}\} = P\{n^{1/2}(\hat d^* - \hat d)\sigma_x/\hat\sigma^* \le \hat v_\alpha \mid \mathcal{X}\} = \alpha,$$

where $\mathcal{X}$ denotes the sample of pairs $\{(x_1, Y_1), \ldots, (x_n, Y_n)\}$. In this notation, one-sided percentile and percentile-$t$ bootstrap confidence intervals for $d$ are given


by
$$\hat I_1 = (-\infty,\, \hat d - n^{-1/2}\sigma_x^{-1}\hat\sigma\hat u_{1-\alpha}), \qquad \hat J_1 = (-\infty,\, \hat d - n^{-1/2}\sigma_x^{-1}\hat\sigma\hat v_{1-\alpha}),$$

respectively; compare Section 4.4. Each of these confidence intervals has nominal coverage $\alpha$. The percentile-$t$ interval $\hat J_1$ is the bootstrap version of an ideal interval
$$J_1 = (-\infty,\, \hat d - n^{-1/2}\sigma_x^{-1}\hat\sigma v_{1-\alpha}).$$
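A minimal Python sketch of the scheme just described: residual resampling in the simple linear model, followed by construction of the percentile-$t$ interval $\hat J_1$ for the slope $d$. The design, error distribution and resample count are illustrative assumptions of ours, not prescriptions from the text.

```python
import numpy as np

def slope_percentile_t(x, y, alpha=0.95, n_boot=1999, seed=0):
    """One-sided percentile-t interval J1_hat = (-inf, upper) for the slope d."""
    rng = np.random.default_rng(seed)
    n = len(x)
    xc = x - x.mean()
    sx = np.sqrt(np.mean(xc ** 2))              # sigma_x
    d_hat = np.mean(xc * (y - y.mean())) / sx ** 2
    c_hat = y.mean() - x.mean() * d_hat
    resid = y - c_hat - d_hat * x               # eps_hat_i
    sig_hat = np.sqrt(np.mean(resid ** 2))      # sigma_hat

    t_star = np.empty(n_boot)
    for b in range(n_boot):
        eps = rng.choice(resid, size=n, replace=True)
        yb = c_hat + d_hat * x + eps            # Y_i* from the fitted model
        db = np.mean(xc * (yb - yb.mean())) / sx ** 2
        cb = yb.mean() - x.mean() * db
        sb = np.sqrt(np.mean((yb - cb - db * x) ** 2))   # sigma_hat*
        t_star[b] = np.sqrt(n) * (db - d_hat) * sx / sb
    v_hat = np.quantile(t_star, 1 - alpha)
    return d_hat - sig_hat * v_hat / (np.sqrt(n) * sx)

rng = np.random.default_rng(4)
x = np.linspace(0.0, 1.0, 30)
y = 1.0 + 2.0 * x + rng.exponential(size=30) - 1.0
print(slope_percentile_t(x, y))
```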

Of course, each of the intervals has a two-sided counterpart.

Recall our definition that a one-sided confidence interval is second-order correct relative

to another if the (finite) endpoints of the intervals agree up to and including terms of order $(n^{-1/2})^2 = n^{-1}$; see Section 4.3. It comes as no surprise to find that $\hat J_1$ is second-order correct for $J_1$, given what was learned in Section 4.5 about bootstrap confidence intervals in more conventional problems. However, on the present occasion $\hat I_1$ is also second-order correct for $J_1$, and that property is quite unusual. It arises because Edgeworth expansions of the distributions of $n^{1/2}(\hat d - d)\sigma_x/\sigma$ and $n^{1/2}(\hat d - d)\sigma_x/\hat\sigma$ contain identical terms of size $n^{-1/2}$; that is, Studentizing has no effect on the first term in the expansion. This is a consequence of the extra symmetry conferred by the presence of the design points $x_i$, as we shall show in the next paragraph. The reason why second-order correctness follows from identical formulae for the $n^{-1/2}$ terms in expansions was made clear in Section 4.5.

Assume that $\sigma_x^2$ is bounded away from zero as $n \to \infty$, and that $\max_{1 \le i \le n} |x_i - \bar x|$ is bounded as $n \to \infty$. (In refined versions of the proof below, this boundedness condition may be replaced by a moment condition on the design points $x_i$, such as $\sup_n n^{-1}\sum_i (x_i - \bar x)^4 < \infty$.) Put $\bar\varepsilon = n^{-1}\sum_i \varepsilon_i$, and observe that

$$\hat\sigma^2 = n^{-1}\sum_{i=1}^n \hat\varepsilon_i^2 = n^{-1}\sum_{i=1}^n \{\varepsilon_i - \bar\varepsilon - (x_i - \bar x)(\hat d - d)\}^2$$

$$= \sigma^2 + n^{-1}\sum_{i=1}^n (\varepsilon_i^2 - \sigma^2) + O_p(n^{-1}).$$
Therefore, defining $S = n^{1/2}(\hat d - d)\sigma_x/\sigma$, $T = n^{1/2}(\hat d - d)\sigma_x/\hat\sigma$, and

$$\Delta = \tfrac{1}{2}\, n^{-1}\sigma^{-2}\sum_{i=1}^n (\varepsilon_i^2 - \sigma^2),$$

we have
$$T = S(1 - \Delta) + O_p(n^{-1}) = S + O_p(n^{-1/2}). \qquad (4.34)$$

By making use of the fact that $\sum_i (x_i - \bar x) = 0$ (this is where the extra symmetry


conferred by the design comes in) and of the representation
$$S = n^{-1/2}\sigma_x^{-1}\sigma^{-1}\sum_{i=1}^n (x_i - \bar x)\varepsilon_i,$$

we may easily prove that $E\{S^j(1 - \Delta)^j\} - E(S^j) = O(n^{-1})$ for $j = 1, 2, 3$. Therefore, the first three cumulants of $S$ and $S(1 - \Delta)$ agree up to and including terms of order $n^{-1/2}$. Higher-order cumulants are of size $n^{-1}$ or smaller. It follows that Edgeworth expansions of the distributions of $S$ and $S(1 - \Delta)$ differ only in terms of order $n^{-1}$. In view of (4.34), the same is true for $S$ and $T$:

$$P(S \le w) = P(T \le w) + O(n^{-1}).$$
(This step uses the delta method.) Therefore, Studentizing has no effect on the first term in the expansion, as had to be proved.
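The key step of this argument, the decomposition (4.34), is easy to check numerically. The Python sketch below simulates the simple linear model with known $c$, $d$ and $\sigma$, and confirms that the discrepancy $|T - S(1 - \Delta)|$ shrinks at roughly the rate $n^{-1}$; the particular design, error law and replication count are illustrative choices of ours.

```python
import numpy as np

rng = np.random.default_rng(5)
for n in (50, 200, 800):
    x = np.linspace(0.0, 1.0, n)
    xc = x - x.mean()
    sx = np.sqrt(np.mean(xc ** 2))              # sigma_x
    gaps = []
    for _ in range(500):
        eps = rng.exponential(size=n) - 1.0     # errors: mean 0, variance 1
        y = 1.0 + 2.0 * x + eps                 # true c = 1, d = 2, sigma = 1
        d_hat = np.mean(xc * (y - y.mean())) / sx ** 2
        resid = y - y.mean() - xc * d_hat       # eps_hat_i
        sig_hat = np.sqrt(np.mean(resid ** 2))
        S = np.sqrt(n) * (d_hat - 2.0) * sx     # sigma = 1 in this design
        T = np.sqrt(n) * (d_hat - 2.0) * sx / sig_hat
        Delta = 0.5 * np.mean(eps ** 2 - 1.0)
        gaps.append(abs(T - S * (1.0 - Delta)))
    print(n, np.mean(gaps))                     # shrinks roughly like 1/n
```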
