
Statistical Science
2003, Vol. 18, No. 2, 219-230
Institute of Mathematical Statistics, 2003

The Impact of Bootstrap Methods on Time Series Analysis

Dimitris N. Politis

Abstract. Sparked by Efron's seminal paper, the decade of the 1980s was a period of active research on bootstrap methods for independent data, mainly i.i.d. or regression set-ups. By contrast, in the 1990s much research was directed towards resampling dependent data, for example, time series and random fields. Consequently, the availability of valid nonparametric inference procedures based on resampling and/or subsampling has freed practitioners from the necessity of resorting to simplifying assumptions such as normality or linearity that may be misleading.

Key words and phrases: Block bootstrap, confidence intervals, linear models, resampling, large sample inference, nonparametric estimation, subsampling.

Dimitris N. Politis is Professor of Mathematics and Adjunct Professor of Economics, University of California, San Diego, La Jolla, California 92093-0112 (e-mail: [email protected]).

1. INTRODUCTION: THE SAMPLE MEAN OF A TIME SERIES

Let $X_1, \dots, X_n$ be an observed stretch from a strictly stationary time series $\{X_t, t \in \mathbb{Z}\}$; the assumption of stationarity implies that the joint probability law of $(X_t, X_{t+1}, \dots, X_{t+k})$ does not depend on $t$ for any $k \ge 0$. Assume also that the time series is weakly dependent; that is, the collection of random variables $\{X_t, t \le 0\}$ is approximately independent of $\{X_t, t \ge k\}$ when $k$ is large enough. An example of a weak dependence structure is given by $m$-dependence, under which $\{X_t, t \le 0\}$ is (exactly) independent of $\{X_t, t \ge k\}$ whenever $k > m$; independence is just the special case of 0-dependence.

Due to the dependence between the observations, even the most basic methods involved in applied statistical work suddenly become challenging; an elementary such example has to do with estimating the unknown mean $\mu = E X_t$ of the time series. The sample mean $\bar X_n = n^{-1}\sum_{t=1}^{n} X_t$ is the obvious estimator; however, and here immediately the difficulties crop up, it is not the most efficient. To see why, consider the regression $X_t = \mu + \epsilon_t$, which also serves as a definition for the $\epsilon_t$ process; it is apparent that $\bar X_n$ is the ordinary least squares estimator of $\mu$ in this model. However, because of the dependence in the errors $\epsilon_t$, the best linear unbiased estimator (BLUE) of $\mu$ is instead obtained by a generalized least squares argument; consequently, $\hat\mu_{\mathrm{BLUE}} = (\mathbf{1}'\Gamma_n^{-1}\mathbf{1})^{-1}\mathbf{1}'\Gamma_n^{-1}X$, where $X = (X_1, \dots, X_n)'$, $\mathbf{1} = (1, \dots, 1)'$ and $\Gamma_n$ is the (unknown) covariance matrix of the vector $X$ with $i,j$ element given by $\gamma(i-j) = \mathrm{Cov}(X_i, X_j)$. It is immediate that $\hat\mu_{\mathrm{BLUE}}$ is a weighted average of the $X_t$ data (i.e., $\hat\mu_{\mathrm{BLUE}} = \sum_{i=1}^{n} w_i X_i$); the weights $w_i$ are a function of the (unknown) covariances $\gamma(s)$, $s = 0, 1, \dots, n-1$. For example, under the simple AR($p$), that is, autoregressive of order $p$, model

$$(X_t - \mu) = \phi_1(X_{t-1} - \mu) + \cdots + \phi_p(X_{t-p} - \mu) + Z_t, \qquad (1)$$

where $Z_t \sim$ i.i.d. $(0, \sigma^2)$, it is easy to see that $w_j = w_{n-j+1}$ for $j = 1, \dots, p$, $w_{p+1} = w_{p+2} = \cdots = w_{n-p}$ and $\sum_{i=1}^{n} w_i = 1$; calculating $w_1, \dots, w_p$ in terms of $\gamma(\cdot)$ is feasible but cumbersome; see Fuller (1996). For instance, in the special case of $p = 1$, we can calculate that $w_1 = w_2/(1 - \rho(1))$, where $\rho(s) = \gamma(s)/\gamma(0)$ is the autocorrelation sequence.

Nevertheless, it is apparent that, at least in the AR($p$) model above, the difference of the $w_i$ weights from the constant weights in $\bar X_n$ is due mostly to the end effects, that is, the first and last $p$ weights. Thus, it may be reasonable to conjecture that, for a large sample size $n$, those end effects will have a negligible contribution. This is, in fact, true; as shown in Grenander and Rosenblatt (1957) for a large class of weakly dependent processes [not limited to the AR($p$) models], the sample mean $\bar X_n$ is asymptotically efficient. Therefore, estimating $\mu$ by our familiar, easy-to-use sample mean $\bar X_n$ is fortuitously justified.
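To see the effect numerically, the following small sketch (my own illustration, not part of the original article) assumes an AR(1)-type covariance $\gamma(s) \propto \phi^{|s|}$ and computes the GLS weights $w = \Gamma_n^{-1}\mathbf{1}/(\mathbf{1}'\Gamma_n^{-1}\mathbf{1})$; it shows that the $w_i$ deviate from the flat weights $1/n$ essentially only at the two ends of the sample, in line with the end-effects remark above.

```python
import numpy as np

# Hypothetical AR(1) setting, used only to illustrate the BLUE weights;
# gamma(s) is proportional to phi^|s| and the constant of proportionality cancels.
n, phi = 20, 0.6
lags = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
Gamma = phi ** lags                    # covariance matrix up to a constant factor

ones = np.ones(n)
w = np.linalg.solve(Gamma, ones)       # Gamma^{-1} 1
w = w / w.sum()                        # normalize so that the weights sum to 1

print("flat weight 1/n            :", 1 / n)
print("interior GLS weight        :", w[n // 2])
print("boundary GLS weight w_1    :", w[0])
print("check w_1 = w_2/(1-rho(1)) :", w[1] / (1 - phi))
```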

The problem, however, is further complicated if one wants to construct an estimate of the standard error of $\bar X_n$. To see this, let $\sigma_n^2 = \mathrm{Var}(\sqrt{n}\,\bar X_n)$ and note that $n\sigma_n^2$ is just the sum of all the elements of the matrix $\Gamma_n$, that is, $\sigma_n^2 = \sum_{s=-n}^{n}(1 - |s|/n)\,\gamma(s)$. Under the usual regularity conditions,

$$\sigma^2 := \lim_{n\to\infty} \sigma_n^2 = \sum_{s=-\infty}^{\infty}\gamma(s) = 2\pi f(0),$$

where $f(w) = (2\pi)^{-1}\sum_{s=-\infty}^{\infty} e^{-iws}\gamma(s)$, for $w \in [-\pi, \pi]$, is the spectral density function of the $\{X_t\}$ process. It is apparent now that the simple task of standard error estimation is highly nontrivial under dependence, as it amounts to nonparametric estimation of the spectral density at the origin.

One may think that estimating $\gamma(\cdot)$ and plugging into the above formula would give a consistent estimator of the infinite sum $\sum_{s=-\infty}^{\infty}\gamma(s)$. Estimators of $\gamma(s)$ for $|s| < n$ can certainly be constructed; $\hat\gamma(s) = (n-s)^{-1}\sum_{t=1}^{n-s}(X_t - \bar X_n)(X_{t+s} - \bar X_n)$ and $\tilde\gamma(s) = n^{-1}\sum_{t=1}^{n-s}(X_t - \bar X_n)(X_{t+s} - \bar X_n)$ are the usual candidates. Note that $\hat\gamma(s)$ is unbiased but becomes quite unreliable for large $s$ because of increasing variability; at the extreme case of $s = n-1$, there is no averaging at all in $\hat\gamma(s)$. Thus, $\tilde\gamma(s)$ is preferable for two reasons: (a) it trades in some bias for a huge variance reduction by using the assumption of weak dependence [which implies that $\gamma(s) \to 0$ as $s \to \infty$] to shrink the $\hat\gamma(s)$ toward 0 for large $s$; and (b) $\tilde\gamma(s)$ is a nonnegative definite sequence, that is, it guarantees that

    |s|


$\lambda_B(x) = 1 - |x|$. Although the constant lag window $\lambda_C(x) \equiv 1$ also leads to consistent estimation under (2), improved accuracy is achieved via the family of flat-top lag windows introduced in Politis and Romano (1995); the simplest representative of the flat-top family is the trapezoidal lag window $\lambda_T(x) = \min\{1, 2(1 - |x|)\}$, which can be thought of as a compromise between $\lambda_C$ and $\lambda_B$.
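As a concrete illustration (my own sketch, not from the article; the AR(1) data-generating process, the sample size and the bandwidth b are arbitrary choices), the following code computes $\tilde\gamma(s)$ and the corresponding lag-window estimates $\sum_{|s|\le b}\lambda(s/b)\,\tilde\gamma(s)$ of $\sigma^2 = 2\pi f(0)$, once with the Bartlett window $\lambda_B$ and once with the trapezoidal flat-top window $\lambda_T$.

```python
import numpy as np

def acvf_tilde(x, s):
    """The stable autocovariance estimator: (1/n) * sum (X_t - mean)(X_{t+s} - mean)."""
    x = np.asarray(x, dtype=float)
    n, xc = len(x), x - x.mean()
    return xc[: n - s] @ xc[s:] / n

def lag_window_sigma2(x, b, window):
    """Estimate sigma^2 = sum_{|s|<=b} lambda(s/b) * gamma_tilde(s) for a lag window lambda."""
    total = acvf_tilde(x, 0)
    for s in range(1, b + 1):
        total += 2.0 * window(s / b) * acvf_tilde(x, s)
    return total

bartlett    = lambda u: max(0.0, 1.0 - abs(u))            # lambda_B
trapezoidal = lambda u: min(1.0, 2.0 * (1.0 - abs(u)))    # flat-top window lambda_T

# Illustrative AR(1) series; for it the true sigma^2 equals 1/(1 - phi)^2.
rng = np.random.default_rng(0)
n, phi = 1000, 0.5
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + rng.standard_normal()

b = 20
print("Bartlett    estimate:", lag_window_sigma2(x, b, bartlett))
print("Trapezoidal estimate:", lag_window_sigma2(x, b, trapezoidal))
print("True long-run variance:", 1.0 / (1.0 - phi) ** 2)
```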

    2. BLOCK RESAMPLING AND SUBSAMPLING

As is well known, the bootstrap scheme pioneered in the seminal paper of Efron (1979) was geared toward independent data. As a matter of fact, it was quickly recognized by Singh (1981) that if one applies the i.i.d. bootstrap to data that are dependent, inconsistency follows; see also the early paper by Babu and Singh (1983). Because of the total data scrambling induced by the i.i.d. bootstrap, all dependence information is lost. Consequently, the i.i.d. bootstrap estimate of $\sigma_n^2$ typically does not converge to the correct limit given by $\sigma^2 = \sum_{s=-\infty}^{\infty}\gamma(s)$; rather, it converges to $\gamma(0)$, that is, the i.i.d. bootstrap fails to give valid standard error estimates in the dependent case.

    2.1 Subsampling

Subsampling is probably one of the most intuitive methods of valid statistical inference. To see why subsampling works in the time series case, consider the aforementioned consistent estimator of $\sigma^2$ due to Bartlett, that is, $2\pi\hat f_{b,B}(0)$. Recall that $2\pi\hat f_{b,B}(0) = \sum_{s=-b}^{b}(1 - |s|/b)\,\tilde\gamma(s)$, which is actually a simple, plug-in estimator of $\sigma_b^2 = \mathrm{Var}(\sqrt{b}\, b^{-1}\sum_{i=1}^{b} X_i) = \sum_{s=-b}^{b}(1 - |s|/b)\,\gamma(s)$. Intuitively, $2\pi\hat f_{b,B}(0)$ is estimating $\sigma_b^2$ and not $\sigma_n^2$; consistency of $2\pi\hat f_{b,B}(0)$ is achieved only because both $\sigma_b^2$ and $\sigma_n^2$ tend to $\sigma^2$ under (2), and therefore also $\sigma_b^2 - \sigma_n^2 \to 0$.

In other words, it seems that we cannot directly estimate $\sigma_n^2$ or $\sigma^2$, and we have to content ourselves with estimating $\sigma_b^2$. But an immediate, empirical estimator of $\sigma_b^2 = \mathrm{Var}(\sqrt{b}\,\bar X_b)$ can be constructed from the many size-$b$ sample means that can be extracted from our data $X_1, \dots, X_n$. To formalize this notion, let $\bar X_{i,b} = b^{-1}\sum_{t=i}^{i+b-1} X_t$ be the sample mean of the $i$th block and let $\hat\sigma^2_{b,SUB}$ denote the sample variance of the (normalized) subsample values $\sqrt{b}\,\bar X_{i,b}$ for $i = 1, \dots, q$. The simple estimator $\hat\sigma^2_{b,SUB}$ is the subsampling estimator of variance; it is consistent for $\sigma^2$ under (2), which is not surprising in view of the fact that it, too, is equivalent to the ubiquitous Bartlett estimator!
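In code, the subsampling variance estimator is nothing more than the sample variance of the normalized block means; the sketch below (my own, using $q = n - b + 1$ fully overlapping blocks as in the discussion above) spells the construction out.

```python
import numpy as np

def subsampling_variance(x, b):
    """Subsampling estimator of sigma^2: the sample variance of the normalized
    block means sqrt(b) * Xbar_{i,b} over the q = n - b + 1 overlapping blocks."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    block_means = np.array([x[i:i + b].mean() for i in range(n - b + 1)])
    return b * block_means.var()

# Usage sketch on a simulated AR(1) series (true sigma^2 = 1/(1 - phi)^2 = 4 here).
rng = np.random.default_rng(1)
n, phi = 500, 0.5
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + rng.standard_normal()
print("sigma^2_{b,SUB} with b = 25:", subsampling_variance(x, b=25))
```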

Nevertheless, the beauty of the subsampling methodology is its extreme generality: let $\hat\theta_n = \hat\theta_n(X_1, \dots, X_n)$ be an arbitrary statistic that is consistent for a general parameter $\theta$ at rate $a_n$, that is, for large $n$, $a_n(\hat\theta_n - \theta)$ tends to some well-defined asymptotic distribution $J$; the rate $a_n$ does not have to equal $\sqrt{n}$, and the distribution $J$ does not have to be normal; we do not even need to know its shape, just that it exists. Let $\hat\theta_{i,b} = \hat\theta_b(X_i, \dots, X_{i+b-1})$ be the subsample value of the statistic computed from the $i$th block. The subsampling estimator of $J$ is $\hat J_{b,SUB}$, defined as the empirical distribution of the normalized (and centered) subsample values $a_b(\hat\theta_{i,b} - \hat\theta_n)$ for $i = 1, \dots, q$.

The consistency of $\hat J_{b,SUB}$ for general statistics under minimal conditions, basically involving a weak dependence condition and (2), was shown in Politis and Romano (1992c, 1994b); consequently, confidence intervals for $\theta$ can immediately be formed using the quantiles of $\hat J_{b,SUB}$ instead of the quantiles of the (unknown) $J$. A closely related early work is the paper by Sherman and Carlstein (1996), where the subsampling distribution of the (unnormalized) subsample values is put forth as a helpful diagnostic tool.
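A minimal sketch of this confidence-interval recipe follows (my own illustration; it assumes the common case $a_n = \sqrt{n}$, and the statistic, block size and data-generating process are placeholders): the centered subsample values $a_b(\hat\theta_{i,b} - \hat\theta_n)$ stand in for draws from $J$, and their quantiles are inverted in the usual way.

```python
import numpy as np

def subsampling_ci(x, statistic, b, alpha=0.05, rate=np.sqrt):
    """Equal-tailed (1 - alpha) subsampling confidence interval for theta, assuming
    a_n (theta_hat_n - theta) converges to some limit law J, with a_n given by `rate`."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    theta_n = statistic(x)
    vals = np.array([statistic(x[i:i + b]) for i in range(n - b + 1)])
    roots = rate(b) * (vals - theta_n)          # empirical stand-in for draws from J
    lo, hi = np.quantile(roots, [alpha / 2, 1 - alpha / 2])
    return theta_n - hi / rate(n), theta_n - lo / rate(n)

# Usage sketch: interval for the mean of a weakly dependent (MA(1)) series.
rng = np.random.default_rng(2)
z = rng.standard_normal(401)
x = z[1:] + 0.6 * z[:-1]
print("95% CI for the mean:", subsampling_ci(x, np.mean, b=40))
```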

If a variance estimator for $a_n\hat\theta_n$ is sought, it can be constructed by the sample variance of the normalized subsample values $a_b\hat\theta_{i,b}$ for $i = 1, \dots, q$; consistency of the subsampling estimator of variance for general statistics was shown by Carlstein (1986) under some uniform integrability conditions. It should be pointed out that if there is no dependence present, then the ordering of the data is immaterial; the data can be permuted with no information loss. Thus, in the i.i.d. case, there are $\binom{n}{b}$ subsamples of size $b$ from which subsample statistics can be recomputed; in this case, subsampling is equivalent to the well-known delete-$d$ jackknife (with $d = n - b$) of Wu (1986) and Shao and Wu (1989); see Politis, Romano and Wolf (1999) for more details and an extensive list of references.

Subsampling in the i.i.d. case is also intimately related to the i.i.d. bootstrap with smaller resample size $b$, that is, where our statistic is recomputed from $X_1^*, \dots, X_b^*$ drawn i.i.d. from the empirical distribution of the data $X_1, \dots, X_n$; the only difference is that subsampling draws $b$ values without replacement from the dataset $X_1, \dots, X_n$ while the bootstrap draws with replacement.


The possibility of drawing an arbitrary resample size was noted as early on as in Bickel and Freedman (1981); nevertheless, the research community focused for some time on the case of resample size $n$, since that choice leads to improved performance, that is, higher-order accuracy, in the sample mean case [cf. Singh (1981)]. Nevertheless, in instances where the i.i.d. bootstrap fails, it was observed by many researchers, most notably K. Athreya, J. Bretagnolle, M. Arcones and E. Giné, that bootstrap consistency can be restored by taking a smaller resample size $b$ satisfying (2); see Politis, Romano and Wolf (1999) for references, but also note the early paper by Swanepoel (1986).

As a matter of fact, the i.i.d. bootstrap with resample size $b$ immediately inherits the general validity of subsampling in the case where $b = o(\sqrt{n})$, in which case the question with/without replacement is immaterial; see Politis and Romano (1992c, 1993) or Politis, Romano and Wolf (1999, Corollary 2.3.1). However, to relax the condition $b = o(\sqrt{n})$ to $b = o(n)$, some extra structure is required that has to be checked on a case-by-case basis; see Bickel, Götze and van Zwet (1997).

Finally, note that in the case of the sample mean of i.i.d. data (with finite variance), the bootstrap with resample size $n$ outperforms both subsampling and the bootstrap with smaller resample size, as well as the asymptotic normal approximation. Although the performance of subsampling can be boosted by the use of extrapolation and interpolation techniques (see, for example, Booth and Hall 1993; Politis and Romano 1995; Bickel, Götze and van Zwet 1997; Bertail and Politis 2001), the understanding has been that subsampling sacrifices some accuracy for its extremely general applicability; this was also the viewpoint adopted in Politis, Romano and Wolf (1999, Chapter 10). Nevertheless, some very recent results of Sakov and Bickel (2000) and Arcones (2001) indicate that this is not always the case: for the sample median, the i.i.d. bootstrap with resample size $b = o(n)$ has improved accuracy as compared to the bootstrap with resample size $n$; an analogous result is expected to hold for subsampling.

    2.2 Block Resampling

In the previous section, we saw the connection of subsampling in the i.i.d. case to the delete-$d$ jackknife. Recall that the standard delete-1 jackknife is commonly attributed to Tukey (1958) and Quenouille (1949, 1956). Interestingly, Quenouille's work was focused on blocking methods in a time series context, and thus it was the precursor of (block) subsampling; of course, blocking methods for estimation in time series go back to Bartlett (1946), as previously mentioned.

After Carlstein's (1986) subseries variance estimator [as well as Hall's (1985) retiling ideas for spatial data], the time was ripe for Künsch's (1989) introduction of the block bootstrap where, instead of recomputing the statistic $\hat\theta_b$ on the smaller blocks of the type $B_i = (X_i, \dots, X_{i+b-1})$, a new bootstrap pseudo-series $X_1^*, \dots, X_l^*$ is created (with $l = kb \approx n$) by joining together $k$ ($= [n/b]$) blocks chosen randomly (and with replacement) from the set $\{B_1, \dots, B_{n-b+1}\}$; then the statistic can be recomputed from the new, full-size pseudo-series $X_1^*, \dots, X_l^*$.
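The construction translates directly into code; the following sketch (my own minimal version, not the implementation used in the literature cited here) draws $k = \lfloor n/b \rfloor$ of the overlapping blocks with replacement and concatenates them, after which any statistic can be recomputed on the pseudo-series.

```python
import numpy as np

def block_bootstrap_series(x, b, rng):
    """One moving-block-bootstrap pseudo-series: k = n // b blocks of length b, drawn
    with replacement from the n - b + 1 overlapping blocks and concatenated."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    starts = rng.integers(0, n - b + 1, size=n // b)   # random block start indices
    return np.concatenate([x[s:s + b] for s in starts])

def block_bootstrap_replicates(x, statistic, b, n_boot=1000, seed=0):
    """Bootstrap replicates of a statistic recomputed on full-size pseudo-series."""
    rng = np.random.default_rng(seed)
    return np.array([statistic(block_bootstrap_series(x, b, rng))
                     for _ in range(n_boot)])

# Usage sketch: block-bootstrap standard error of the sample mean of an MA(1) series.
rng = np.random.default_rng(3)
z = rng.standard_normal(301)
x = z[1:] + 0.5 * z[:-1]
reps = block_bootstrap_replicates(x, np.mean, b=15)
print("block-bootstrap s.e. of the sample mean:", reps.std())
```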

Thus, 10 years after Efron's (1979) pioneering paper, Künsch's (1989) breakthrough indicated the starting point for the focus of research activity on this new area of block resampling. The closely related work by Liu and Singh (1992), which was available as a preprint in the late 1980s, was also quite influential at that point because of its transparent and easily generalizable approach.

Examples of some early work on block resampling for stationary time series include the blocks-of-blocks bootstrap of Politis and Romano (1992a), which extended the applicability of the block bootstrap to parameters associated with the whole infinite-dimensional probability law of the series; the circular bootstrap of Politis and Romano (1992b), in which the bootstrap distribution is automatically centered correctly; and the stationary bootstrap of Politis and Romano (1994a), which joins together blocks of random length (having a geometric distribution with mean $b$) and thus generates bootstrap sample paths that are stationary series themselves.

As in the i.i.d. case, the validity of the block-bootstrap methods must be checked on a case-by-case basis, and this is usually facilitated by an underlying asymptotic normality; typically, a condition such as (2) is required. For instance, the block-bootstrap estimator of the distribution of $\sqrt{n}(\bar X_n - \mu)$ is essentially consistent only if the latter tends to a Gaussian limit, a result that parallels the i.i.d. situation; see Giné and Zinn (1990) and Radulović (1996). To see why, note that in the sample mean case the block-bootstrap distribution is really a (rescaled) $k$-fold convolution of the subsampling distribution $\hat J_{b,SUB}$ with itself; since $k = [n/b] \to \infty$ under (2), it is intuitive that the block-bootstrap distribution has a strong tendency toward Gaussianity.


Interestingly, the block-bootstrap estimator $\hat\sigma^2_{b,BB}$ of $\mathrm{Var}(\sqrt{n}\,\bar X_n)$ is identical to the subsampling estimator $\hat\sigma^2_{b,SUB}$, which in turn is equivalent to the Bartlett estimator $2\pi\hat f_{b,B}(0)$; recall that the latter is suboptimal due to its rather large bias of order $O(1/b)$. In addition, the circular and stationary bootstrap estimators of $\mathrm{Var}(\sqrt{n}\,\bar X_n)$ suffer from a bias of the same order of magnitude; see Lahiri (1999). However, if variance estimation is our objective, we can construct a highly accurate estimator via a linear combination of block resampling/subsampling estimators, in the same way that the trapezoidal lag window spectral estimator $\hat f_{b,T}$ can be gotten by a linear combination of two Bartlett estimators. The simplest such proposal is to define

$$\hat\sigma_b^2 = 2\hat\sigma^2_{b,BB} - \hat\sigma^2_{b/2,BB},$$

where $b$ is now an even integer. The estimator $\hat\sigma_b^2$ has bias (and MSE) that is typically orders of magnitude less than those of $\hat\sigma^2_{b,BB}$; as a matter of fact, $\hat\sigma_b^2 \approx 2\pi\hat f_{b,T}(0)$ which, as previously alluded to, enjoys some optimality properties. For example, if the covariance $\gamma(s)$ has an exponential decay (as is the case in stationary ARMA models), then the bias of $\hat\sigma_b^2$ becomes $O(\lambda^{-b})$ for some $\lambda > 1$; see Politis and Romano (1995), Politis, Romano and Wolf (1999, Section 10.5) or Bertail and Politis (2001).
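Because $\hat\sigma^2_{b,BB}$ coincides with the subsampling/Bartlett estimator, the bias-reducing combination is a one-liner on top of any routine returning that base estimator; a short sketch (my own; the block_means_variance helper simply restates the subsampling estimator from the earlier example, since the two base estimators coincide):

```python
import numpy as np

def block_means_variance(x, b):
    """Subsampling/block-bootstrap variance estimator (equivalent to the Bartlett estimator)."""
    x = np.asarray(x, dtype=float)
    means = np.array([x[i:i + b].mean() for i in range(len(x) - b + 1)])
    return b * means.var()

def combined_variance(x, b):
    """sigma_hat_b^2 = 2 * sigma_hat_{b,BB}^2 - sigma_hat_{b/2,BB}^2, with b even."""
    assert b % 2 == 0, "b should be an even integer"
    return 2.0 * block_means_variance(x, b) - block_means_variance(x, b // 2)
```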

A theoretical disadvantage of $\hat\sigma_b^2$ is that it is not almost surely nonnegative; this comes as no surprise since the trapezoidal lag window corresponds to a higher-order (actually, infinite-order) smoothing kernel: all kernels of order higher than 2 share this problem. Notably, the fast convergence of $\hat\sigma_b^2$ to a nonnegative limit practically alleviates this problem. Nevertheless, the implication is that $\hat\sigma_b^2$ is not associated with some bootstrap distribution for $\bar X_n$, as is the case with $\hat\sigma^2_{b,BB}$; if $\hat\sigma_b^2$ could have been gotten as the variance of a probability distribution (empirical or not), then it should be nonnegative by necessity.

To come up with a method that gives improved variance estimation while at the same time being able to produce bootstrap pseudo-series and a bootstrap distribution for $\bar X_n$, the following key idea may be employed: the $B_i$ blocks may be tapered, that is, their end points are shrunk toward a target value, before being concatenated to form a bootstrap pseudo-series; this is the tapered block-bootstrap method of Paparoditis and Politis (2001a, 2002b), which indeed leads to a variance estimator $\hat\sigma^2_{b,TBB}$ that is more accurate than $\hat\sigma^2_{b,BB}$. In particular, the bias of $\hat\sigma^2_{b,TBB}$ is of order $O(1/b^2)$. Although $\hat\sigma^2_{b,TBB}$ is typically less accurate than $\hat\sigma_b^2$, the tapered block bootstrap constructs an estimator of the whole distribution of $\bar X_n$ (not just its variance), and this estimator is more accurate than its (untapered) block-bootstrap analog. Intuitively, this preferential treatment (tapering) of the block edges is analogous to the weighting of the boundary points in constructing $\hat\mu_{\mathrm{BLUE}}$ as discussed in the Introduction.

    2.3 Block Size Choice and Some Further Issues

Recall that the block size $b$ must be chosen in order to practically implement any of the block resampling/subsampling methods. By the equivalence of $\hat\sigma^2_{b,BB}$ and $\hat\sigma^2_{b,SUB}$ to the Bartlett spectral estimator $2\pi\hat f_{b,B}(0)$, it is apparent that choosing $b$ is tantamount to the difficult problem of bandwidth choice in nonparametric smoothing problems. As in the latter problem, here, too, there are two main approaches:

(a) Cross-validation. A promising subsampling/cross-validation approach for block size choice was proposed by Hall, Horowitz and Jing (1995).

(b) Plug-in methods. This approach entails working out an expression for the optimal (with respect to some criterion) value of $b$ and then estimating/plugging in all unknown parameters in that expression.

For the plug-in approach to be viable, it is necessary that the pilot estimators to be plugged in are accurate and not plagued by a difficult block/bandwidth choice themselves! Both of those issues can be successfully addressed by using pilot estimators of the trapezoidal lag window type $\hat f_{b,T}$; see Paparoditis and Politis (2001a) or Politis and White (2001). Note that choosing $b$ in the estimator $\hat f_{b,T}$ (or, equivalently, in $\hat\sigma_b^2$) is relatively straightforward: one may pick $b = 2\hat m$, where $\hat m$ is the smallest integer such that $\hat\rho(k) \approx 0$ for $k > \hat m$. Here $\hat\rho(k) \approx 0$ really means $\hat\rho(k)$ not significantly different from 0, that is, an implied hypothesis test; see Politis (2001b) for more details.
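A sketch of this empirical rule follows (my own rendering; the significance band $c\sqrt{\log_{10} n / n}$ with $c = 2$ is one illustrative choice of threshold for "not significantly different from 0", not necessarily the exact recipe of Politis, 2001b).

```python
import numpy as np

def choose_block_size(x, c=2.0, max_lag=None):
    """Pick b = 2*m_hat, where m_hat is the smallest lag beyond which all sample
    autocorrelations fall inside an (illustrative) significance band."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    max_lag = max_lag or int(10 * np.log10(n))
    xc = x - x.mean()
    gamma0 = xc @ xc / n
    rho = np.array([(xc[: n - k] @ xc[k:] / n) / gamma0 for k in range(1, max_lag + 1)])
    band = c * np.sqrt(np.log10(n) / n)
    significant = np.nonzero(np.abs(rho) > band)[0]     # indices of lags deemed nonzero
    m_hat = 0 if significant.size == 0 else significant[-1] + 1
    return max(2, 2 * m_hat)
```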

Finally, note that all of the aforementioned block resampling/subsampling methods were designed for stationary time series. What happens if the time series is not stationary? Although both subsampling and block resampling will not break down under a mild nonstationarity (see, for example, Politis, Romano and Wolf, 1999, Chapter 4), the following interesting cases may be explicitly addressed:

(i) Seasonal effects. If the time series $X_t$ can be decomposed as $X_t = Y_t + W_t$, where $Y_t$ is stationary and $W_t$ is random or deterministic of period $d$, then block resampling and subsampling will generally remain valid if $b$ is chosen to be an integer multiple of $d$. In addition, resampling/subsampling should not be done from the set $\{B_1, B_2, \dots, B_{n-b+1}\}$ but rather from a set of the type $\{B_1, B_{d+1}, B_{2d+1}, \dots\}$; that is, full overlap of the blocks is no longer recommended; see Politis (2001a).

(ii) Local stationarity. Suppose that the time series has a slowly changing stochastic structure in the sense that the joint probability law of $(X_t, X_{t+1}, \dots, X_{t+m})$ changes smoothly (and slowly) with $t$ for any $m$; see, for example, Dahlhaus (1997). Then a local block bootstrap may be advisable; here a bootstrap pseudo-series is constructed by a concatenation of $k$ blocks of size $b$ (such that $kb \approx n$), where the $j$th block of the resampled series is chosen randomly from a distribution (say, uniform) on all the size-$b$ blocks of consecutive data whose time indices are close to those in the original block. A rigorous description of the local block bootstrap is given in Paparoditis and Politis (2002c).

(iii) Unit root/integrated time series. A useful model in econometrics is a generalization of the random walk notion given by the unit root model, under which $X_t$ is not stationary but $D_t$ is, where $D_t = X_t - X_{t-1}$ is the series of first differences; see, for example, Hamilton (1994). In this case, a block bootstrap of the differences $D_t$ can be performed, yielding $D_1^*, D_2^*, \dots$, and a bootstrap pseudo-series for $X_t$ can be constructed by integrating the $D_t^*$, that is, by letting $X_t^* = \sum_{i=1}^{t} D_i^*$. This idea is effectively imposing a sample path continuity on the bootstrap realization $X_t^*$; see Paparoditis and Politis (2001c). On the other hand, if a unit root is just suspected to exist, a bootstrap test of the unit root hypothesis (vs. the alternative of stationarity) may be constructed using similar ideas; see Paparoditis and Politis (2003).
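A sketch of the difference-and-integrate construction (my own minimal version; the block-resampling helper restates the moving-block sketch given earlier so the snippet is self-contained, and starting the integration at the observed first value is just one convention):

```python
import numpy as np

def block_bootstrap_series(x, b, rng):
    """Moving-block resampling of a (stationary) series, as in the earlier sketch."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    starts = rng.integers(0, n - b + 1, size=n // b)
    return np.concatenate([x[s:s + b] for s in starts])

def unit_root_block_bootstrap(x, b, rng):
    """Difference, block-resample the stationary differences D_t = X_t - X_{t-1},
    then integrate (cumulative sum) to obtain a pseudo-series for the levels X_t."""
    x = np.asarray(x, dtype=float)
    d_star = block_bootstrap_series(np.diff(x), b, rng)
    return x[0] + np.concatenate(([0.0], np.cumsum(d_star)))   # start at the observed X_1

# Usage sketch on a simulated random walk (a basic integrated series).
rng = np.random.default_rng(4)
x = np.cumsum(rng.standard_normal(200))
print(unit_root_block_bootstrap(x, b=10, rng=rng)[:5])
```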

2.4 Why Have Bootstrap Methods Been so Popular?

Finally, it is interesting to recall the two main reasons for the immense success and popularity of the i.i.d. bootstrap of Efron (1979):

(a) The bootstrap was shown to give valid estimates of distribution and standard error in difficult situations; a prime example is the median of i.i.d. data, for which the (delete-1) jackknife was known to fail and the asymptotic distribution is quite cumbersome.

(b) In "easy" cases where there exist easy-to-construct alternative distribution estimators, for example, the regular sample mean with its normal large-sample distribution, the (Studentized) bootstrap was shown to outperform those alternatives, that is, to possess second-order accuracy; see Hall (1992) for details.

In the case of dependent stationary data, under some regularity conditions, the appropriately standardized/Studentized block bootstrap (in all its variations, including the stationary and circular bootstrap) also possesses a higher-order accuracy property when estimating the distribution of the sample mean; see Lahiri (1991, 1999) and Götze and Künsch (1996). [Note that it is not advisable to use a simple block-bootstrap estimate of standard error for the Studentization, as it does not converge sufficiently fast; however, one may use an estimator such as $\hat\sigma_b^2$; see Davison and Hall (1993) or Bertail and Politis (2001).] Nevertheless, the gains in accuracy are not as spectacular as in the i.i.d. case; in addition, estimating the standard error accurately is the dominant factor, so that it is somewhat of a luxury to think of capturing higher-order moments.

The bottom line is that practically all time series problems fall under the "difficult" category, and thus the aforementioned block resampling/subsampling methods become invaluable. To give a simple example, consider the problem of estimating the distribution of the lag-$k$ sample autocorrelation $\hat\rho(k) = \hat\gamma(k)/\hat\gamma(0)$, for some fixed $k$, with the purpose of constructing a confidence interval or hypothesis test concerning $\rho(k)$. It is quite straightforward to show, under usual moment and mixing conditions, that the large-sample distribution of $\sqrt{n}(\hat\rho(k) - \rho(k))$ is $N(0, V_k^2)$. Unfortunately, the variance $V_k^2$ is, in general, intractable, involving infinite sums of fourth-order cumulants and so on; see, for example, Romano and Thombs (1996). Nevertheless, block resampling and subsampling effortlessly yield consistent estimators of $V_k^2$ and the distribution of $\sqrt{n}(\hat\rho(k) - \rho(k))$.

One may ask how it had been possible in the pre-bootstrap days to practically address such a basic time series inference question regarding the autocorrelations; the answer is that people had to resort to some more restrictive model assumptions. For example, a celebrated formula for $V_k^2$ exists involving infinite sums of the (unknown) $\rho(\cdot)$ coefficients but no higher cumulants; this formula is due to Bartlett but is valid only under the assumption that the time series $\{X_t\}$ is linear; that is, it satisfies an equation of the type

$$X_t = \mu + \sum_{i=-\infty}^{\infty}\psi_i Z_{t-i}, \qquad (4)$$

where $Z_t \sim$ i.i.d. $(0, \sigma^2)$ and the $\psi_j$ coefficients are absolutely summable; see Brockwell and Davis (1991). Nonetheless, the class of nonlinear time series is vast; for example, the simple time series defined by $Y_t = Z_t Z_{t-1}$ for all $t$ is nonlinear and 1-dependent although uncorrelated. Other examples of nonlinear time series models include the ARCH/GARCH and bilinear models that are of particular prominence in econometric analysis; see, for example, Granger and Andersen (1978), Subba Rao and Gabr (1984), Tong (1990), Bollerslev, Chou and Kroner (1992) or Engle (1995).

To give a concrete example, the $\pm 1.96/\sqrt{n}$ bands that most statistical programs like S+ automatically overlay on the correlogram [the plot of $\hat\rho(k)$ vs. $k$] are based on Bartlett's formula. These bands are usually interpreted as giving an implied hypothesis test of the first few $\rho(k)$ being 0 or not, $\rho(1)$ in particular; however, this interpretation is true only if one is willing to assume (4). If the time series $\{X_t\}$ is not linear, then Bartlett's formula and the $\pm 1.96/\sqrt{n}$ bands are totally uninformative and may be misleading; resampling and subsampling may well give the only practical solution in this case; see, for example, Romano and Thombs (1996).

As an illustration, consider a realization $X_1, \dots, X_{200}$ from the simple ARCH(1) model

$$X_t = Z_t\sqrt{0.1 + 0.9X_{t-1}^2},$$

where $Z_t \sim$ i.i.d. $N(0, 1)$; the sample path is plotted in Figure 1(a), while the sample autocorrelation function is given in Figure 1(b).

Note that the time series $\{X_t\}$ is uncorrelated although not independent. Consequently, the $\pm 1.96/\sqrt{n}$ bands are misleading as they indicate that $\rho(1)$ is significantly different from 0. Because of the nonlinearity present, the correct 95% confidence limits should be approximately $\pm 4.75/\sqrt{n}$, that is, more than two times wider than the ones given by Bartlett's formula. Using the correct confidence limits $\pm 0.34$, the estimate $\hat\rho(1) = 0.33$ is seen to not be significantly different from 0 (if only barely).

FIG. 1. (a) Sample path of an ARCH(1) time series with $n = 200$; (b) sample autocorrelation of the series with the customary $\pm 1.96/\sqrt{n}$ bands; (c) subsampling-based estimates of the standard error of $\hat\rho(1)$ as a function of the subsampling block size $b$.

Figure 1(c) depicts the subsampling estimator of the standard error of $\hat\rho(1)$ as a function of the subsampling block size $b$. It is apparent that the block size choice discussed in the previous section is an important issue: for good results, $b$ must be large but small with respect to $n = 200$. With this caveat, the subsampling method seems to be successful in capturing/estimating the correct standard error in this nonlinear setting. Even for a wide range of block size values going from $b = 50$ to 100, the subsampling estimator is 0.14 or above, which is a reasonable approximation of the target value of 0.17.
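The experiment is straightforward to redo in outline; the sketch below (my own reconstruction, so the numbers depend on the random seed and will not exactly reproduce the values quoted above) simulates the ARCH(1) model, computes $\hat\rho(1)$ and evaluates the subsampling standard error over a grid of block sizes.

```python
import numpy as np

def simulate_arch1(n, a0=0.1, a1=0.9, seed=0):
    """X_t = Z_t * sqrt(a0 + a1 * X_{t-1}^2), with Z_t i.i.d. N(0, 1)."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = rng.standard_normal() * np.sqrt(a0 + a1 * x[t - 1] ** 2)
    return x

def rho1(x):
    """Lag-1 sample autocorrelation."""
    xc = x - x.mean()
    return (xc[:-1] @ xc[1:]) / (xc @ xc)

def subsampling_se(x, statistic, b):
    """Subsampling standard error of a root-n consistent statistic: the standard
    deviation of sqrt(b) * theta_hat_{i,b} over all blocks, divided by sqrt(n)."""
    n = len(x)
    vals = np.array([statistic(x[i:i + b]) for i in range(n - b + 1)])
    return np.sqrt(b) * vals.std() / np.sqrt(n)

x = simulate_arch1(200)
print("rho_hat(1) =", round(rho1(x), 3))
for b in (25, 50, 75, 100):
    print("b =", b, " subsampling s.e. =", round(subsampling_se(x, rho1, b), 3))
```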


3. MODEL-BASED RESAMPLING FOR TIME SERIES

    3.1 Linear Autoregressions

Historically speaking, block resampling/subsampling was not the first approach to bootstrapping non-i.i.d. data. Almost immediately following Efron's (1979) paper, the residual-based bootstrap for linear regression and autoregression was introduced and studied; cf. Freedman (1981, 1984) and Efron and Tibshirani (1986, 1993). In the context of the semiparametric AR($p$) model of (1), the residual-based bootstrap amounts to an i.i.d. bootstrap of (a centered version of) the estimated residuals $\hat Z_t$ defined by

$$\hat Z_t = (X_t - \bar X_n) - \hat\phi_1(X_{t-1} - \bar X_n) - \cdots - \hat\phi_p(X_{t-p} - \bar X_n),$$

where the $\hat\phi_j$ are some consistent estimators of $\phi_j$, for example, the Yule-Walker estimators. Having constructed bootstrap pseudo-residuals $Z_1^*, Z_2^*, \dots$, a bootstrap series $X_1^*, \dots, X_n^*$ can be generated via the recursion (1) with $\hat\phi_j$ in place of $\phi_j$; finally, our statistic of interest can be recomputed from the bootstrap series $X_1^*, \dots, X_n^*$.
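A compact sketch of this residual-based AR bootstrap (my own illustrative implementation; it solves the Yule-Walker equations directly rather than calling a packaged fitting routine, and the starting values for the recursion are a simple arbitrary choice):

```python
import numpy as np

def ar_residual_bootstrap(x, p, rng):
    """One bootstrap pseudo-series from the residual-based AR(p) bootstrap."""
    x = np.asarray(x, dtype=float)
    n, mu = len(x), x.mean()
    xc = x - mu

    # Yule-Walker estimates of phi_1, ..., phi_p from the sample autocovariances.
    gamma = np.array([xc[: n - k] @ xc[k:] / n for k in range(p + 1)])
    R = np.array([[gamma[abs(i - j)] for j in range(p)] for i in range(p)])
    phi = np.linalg.solve(R, gamma[1:])

    # Estimated residuals Z_hat_t, then center them.
    resid = np.array([xc[t] - phi @ xc[t - p:t][::-1] for t in range(p, n)])
    resid -= resid.mean()

    # Regenerate a series via recursion (1) with phi_hat and i.i.d. resampled residuals.
    x_star = np.zeros(n)
    x_star[:p] = xc[:p]                        # simple choice of starting values
    z_star = rng.choice(resid, size=n, replace=True)
    for t in range(p, n):
        x_star[t] = phi @ x_star[t - p:t][::-1] + z_star[t]
    return x_star + mu

# Usage sketch: standard error of the sample mean under a fitted AR(2).
rng = np.random.default_rng(5)
n = 400
x = np.zeros(n)
for t in range(2, n):
    x[t] = 0.5 * x[t - 1] - 0.2 * x[t - 2] + rng.standard_normal()
reps = np.array([ar_residual_bootstrap(x, p=2, rng=rng).mean() for _ in range(500)])
print("AR(2)-bootstrap s.e. of the sample mean:", reps.std())
```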

If the assumed AR($p$) model holds true, then the above residual-based bootstrap works well for the sample mean and other more complicated statistics, and even enjoys a higher-order accuracy property similar to Efron's i.i.d. bootstrap; see Bose (1988). Nevertheless, one never really knows whether the underlying stochastic structure conforms exactly to an AR($p$) model, and furthermore one would not know the value of $p$. A more realistic approach, followed by the vast majority of practitioners, is to treat (1) only as an approximation to the underlying stochastic structure and to estimate the order $p$ from the data by criteria such as the AIC; see, for example, Shibata (1976). Doing so, one is implicitly assuming that $\{X_t\}$ is a linear time series possessing an AR($\infty$) representation of the form

$$(X_t - \mu) = \sum_{i=1}^{\infty}\phi_i(X_{t-i} - \mu) + Z_t, \qquad (5)$$

where $Z_t \sim$ i.i.d. $(0, \sigma^2)$ and the $\phi_j$ coefficients are absolutely summable.

Of course, in fitting model (5) to the data, a finite order $p$ must be used which, however, is allowed to increase as the sample size $n$ increases. Thus, the residual-based bootstrap applies verbatim to the AR($\infty$) model of (5) with the understanding that $p$ increases as a function of $n$; this is the so-called AR($\infty$) sieve bootstrap employed by Swanepoel and van Wyk (1986) and rigorously investigated by Kreiss (1988, 1992), Paparoditis (1992) and Bühlmann (1997), among others. Conditions for the validity of the AR($\infty$) bootstrap typically require that

$$p \to \infty \ \text{as} \ n \to \infty \ \text{but with} \ p^k/n \to 0, \qquad (6)$$

where $k$ is a small integer, usually 3 or 4. Consistency still holds when the order $p$ is chosen via the AIC; that is, it is data dependent.

Under assumption (5), Choi and Hall (2000) proved a higher-order accuracy property of the AR($\infty$) bootstrap in the sample mean case, while Bühlmann (1997, 2002) showed that the AR($\infty$) bootstrap estimator $\hat\sigma^2_{AR(p)}$ of $\sigma_n^2 = \mathrm{Var}(\sqrt{n}\,\bar X_n)$ has a performance comparable to the aforementioned optimal estimator $\hat\sigma_b^2$. Both $\hat\sigma^2_{AR(p)}$ and $\hat\sigma_b^2$ exhibit an adaptivity to the strength of dependence present in the series $\{X_t\}$, that is, they become more accurate when the dependence is weak, and they both achieve a close to $\sqrt{n}$ rate of convergence for series with exponentially decaying covariance $\gamma(s)$, for example, in the case of stationary ARMA models. Not surprisingly, the best performance for $\hat\sigma^2_{AR(p)}$ is achieved when $\{X_t\}$ satisfies a finite-order AR model, that is, a linear Markovian structure, whereas the best performance for $\hat\sigma_b^2$ is achieved when $\{X_t\}$ has an $m$-dependent structure, for example, a finite-order MA model. Nevertheless, $\hat\sigma_b^2$ is more generally valid. Even among the class of linear time series (4), there are many series that do not satisfy (5), for example, the simple MA(1) model $X_t = Z_t + Z_{t-1}$, where $Z_t \sim$ i.i.d. $(0, \sigma^2)$.

3.2 Nonlinear Autoregressions and Markov Processes

Although the class of linear time series is quite rich, including, for example, all Gaussian time series, as mentioned before the class of nonlinear time series is vast. It is intuitive that the AR($\infty$) bootstrap will generally not give good results when applied to data $X_1, \dots, X_n$ from a nonlinear time series. [Unless, of course, it so happens that the statistic to be bootstrapped has a large-sample distribution that is totally determined by the first two moments of $\{X_t\}$, that is, by $\mu$ and $\gamma(\cdot)$; this is actually what happens in the sample mean case.] See, for example, Bühlmann (2002) for a discussion and some illustrative examples.

The linearity of model (1) is manifested in the fact that the conditional expectation $m(X_{t-1}, \dots, X_{t-p}) := E(X_t \mid X_{t-1}, \dots, X_{t-p})$ is a linear function of the variables $(X_{t-1}, \dots, X_{t-p})$, just as in the Gaussian case. If we do not know and/or assume that $m(\cdot)$ is linear, it may be of interest to estimate its functional form for the purposes of prediction of $X_{n+1}$ given the recent observed past $X_n, \dots, X_{n-p+1}$. Under some smoothness assumptions on $m(\cdot)$, this estimation can be carried out by different nonparametric methods, for example, kernel smoothing; see, for example, Masry and Tjøstheim (1995). The bootstrap is then called to provide a measure of accuracy of the resulting estimator $\hat m(\cdot)$, typically by way of constructing confidence bands. With this objective, many different resampling methods originally designed for nonparametric regression successfully carry over to this autoregressive framework. We mention some prominent examples:

A. Residual-based bootstrap. Assuming the structure of a nonlinear (and heteroscedastic) autoregression

$$X_t = m(X_{t-1}, \dots, X_{t-p}) + v(X_{t-1}, \dots, X_{t-p})\,Z_t, \qquad (7)$$

where $Z_t \sim$ i.i.d. $(0, 1)$ and $m(\cdot)$ and $v(\cdot)$ are two (unknown) smooth functions, estimates of the residuals can be computed by $\hat Z_t = (X_t - \tilde m(X_{t-1}, \dots, X_{t-p}))/\tilde v(X_{t-1}, \dots, X_{t-p})$, where $\tilde m(\cdot)$ and $\tilde v(\cdot)$ are preliminary [oversmoothed as in Härdle and Bowman (1988)] nonparametric estimators of $m(\cdot)$ and $v(\cdot)$. Then an i.i.d. bootstrap can be performed on the (centered) residuals $\hat Z_t$, and recursion (7), with $\tilde m(\cdot)$ and $\tilde v(\cdot)$ in place of $m(\cdot)$ and $v(\cdot)$, can be used to generate a bootstrap series $X_1^*, \dots, X_n^*$ from which our target statistic, for example, $\hat m(\cdot)$, can be recomputed. The consistency of this procedure under some conditions has been recently shown in a highly technical paper by Franke, Kreiss and Mammen (2002).

B. Wild and local bootstrap. Let $\mathbf{X}_{t-1} = (X_{t-1}, \dots, X_{t-p})'$ and consider the scatterplot of $X_t$ vs. $\mathbf{X}_{t-1}$; smoothing this scatterplot will give our estimator $\hat m(\cdot)$ as well as estimators for higher-order conditional moments if so desired. Interestingly, the wild bootstrap of Wu (1986) and the local bootstrap of Shi (1991) both have a direct application in this dependent context; their respective implementation can be performed exactly as in the independent (nonparametric) regression setting without even having to assume the special structure implied by the nonlinear autoregression (7), which is quite remarkable. Neumann and Kreiss (1998) show the validity of the wild bootstrap in this time series context, while Paparoditis and Politis (2000) show the validity of the local bootstrap under very weak conditions.

Comparing the above two methods, that is, the residual-based bootstrap vs. the wild/local bootstrap, we note the following: the wild/local bootstrap methods are valid under weaker assumptions, as, for example, when (7) is not assumed; on the other hand, the residual-based bootstrap is able to generate new bootstrap stationary pseudo-series $X_1^*, \dots, X_n^*$ from which a variety of statistics may be recomputed, as opposed to just generating new scatterplots to be smoothed.

In order to have our cake and eat it, too, the Markovian local bootstrap (MLB for short) was recently introduced by Paparoditis and Politis (2001b, 2002a). To describe it, assume for the sake of argument that the time series $\{X_t\}$ is Markov of order $p$, with $p = 1$ for simplicity. Given the data $X_1, \dots, X_n$, the MLB procedure constructs a bootstrap sample path $X_1^*, \dots, X_n^*$ as follows:

(a) Assign $X_1^*$ a value chosen randomly from the set $\{X_1, \dots, X_n\}$. Suppose this chosen value is equal to $X_{t_1}$ (for some $t_1$).

(b) Assign $X_2^*$ a value chosen randomly from the set $\{X_{s+1} : |X_{t_1} - X_s| \le h \text{ and } 1 \le s \le n-1\}$, where $h$ is a bandwidth-type parameter. Suppose this chosen value is equal to $X_{t_2}$.

(c) Assign $X_3^*$ a value chosen randomly from the set $\{X_{s+1} : |X_{t_2} - X_s| \le h \text{ and } 1 \le s \le n-1\}$. Repeat the above until the full bootstrap sample path $X_1^*, \dots, X_n^*$ is constructed.
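The three steps translate almost verbatim into code; here is a minimal lag-1 sketch (my own, with an ad hoc fallback, not part of the published algorithm, for the rare case where the neighborhood of the current value is empty).

```python
import numpy as np

def markovian_local_bootstrap(x, h, rng):
    """Lag-1 Markovian local bootstrap: each new value is the successor of a data point
    lying within distance h of the previously generated value."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    out = np.empty(n)
    t = rng.integers(0, n)                  # step (a): X*_1 picked uniformly from the data
    out[0] = x[t]
    for j in range(1, n):
        # steps (b)-(c): candidates are successors X_{s+1} of points X_s close to out[j-1]
        close = np.nonzero(np.abs(x[:-1] - out[j - 1]) <= h)[0]
        if close.size == 0:                 # fallback (not in the published description)
            close = np.array([np.argmin(np.abs(x[:-1] - out[j - 1]))])
        s = rng.choice(close)
        out[j] = x[s + 1]
    return out

# Usage sketch on an AR(1) series (Markov of order 1).
rng = np.random.default_rng(6)
n, phi = 300, 0.7
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + rng.standard_normal()
x_star = markovian_local_bootstrap(x, h=0.3, rng=rng)
print("lag-1 autocorrelation, data vs. MLB series:",
      round(np.corrcoef(x[:-1], x[1:])[0, 1], 2),
      round(np.corrcoef(x_star[:-1], x_star[1:])[0, 1], 2))
```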

The lag-1 (i.e., $p = 1$) MLB described above is capable of capturing the transition probability, as well as the two-dimensional marginal, of the Markov process $\{X_t\}$; thus, it can capture the whole infinite-dimensional distribution of $\{X_t\}$ since that is totally determined by the two-dimensional marginal via the Markov property. To appreciate why, consider momentarily the case where $\{X_t\}$ has a finite state space, say $\{1, 2, \dots, d\}$, and suppose that $X_{t_1}$ from part (a) equals the state 1. Then we can even take $h = 0$, and the set $\{X_{s+1} : |X_{t_1} - X_s| \le h\}$ is simply the set of all data points whose predecessor equaled 1. Choosing randomly from this set is like generating data from the empirical transition function, that is, an estimate of $P(X_{t+1} = \cdot \mid X_t = 1)$. In the general continuous-state case, consistency of the MLB requires that we let $h \to 0$ as $n \to \infty$, but at a slow enough rate (e.g., $nh \to \infty$) sufficient to ensure that the cardinality of sets of the type $\{X_{s+1} : |X_t - X_s| \le h\}$ increases with $n$.

The above MLB algorithm can be easily modified to capture the Markov($p$) case for $p \ge 1$. It can then generally be proven (cf. Paparoditis and Politis, 2001b, 2002a) that the MLB bootstrap pseudo-series $X_1^*, X_2^*, \dots$ is stationary and Markov of order $p$ and that the stochastic structure of $\{X_t^*\}$ accurately mimics the stochastic structure of the original Markov($p$) series $\{X_t\}$.

Note that the assumption of a Markov($p$) structure is more general than assuming (7). It should be stressed, however, that the applicability of the MLB is not limited to Markov processes: the lag-$p$ MLB accurately mimics the $(p+1)$-joint marginal of a general stationary $\{X_t\}$ process, and therefore could be used for inference in connection with any statistic whose large-sample distribution only depends on this $(p+1)$-joint marginal; in that sense, the MLB is actually a model-free method. Prime examples of such statistics are the aforementioned kernel-smoothed estimators of the conditional moments $m(\cdot)$ and $v(\cdot)$; see, for example, Paparoditis and Politis (2001b).

Finally, note that a closely related method to the MLB is the Markov bootstrap of Rajarshi (1990), which also possesses many favorable properties; see, for example, Horowitz (2003), who also shows a higher-order accuracy property. The Markov bootstrap proceeds by nonparametrically estimating the transition density as a ratio of two kernel-smoothed density estimators (the joint over the marginal). A bootstrap series $X_1^*, \dots, X_n^*$ is generated by starting at an arbitrary data point and then sampling from this explicitly estimated transition density.

The relation between the MLB and the Markov bootstrap of Rajarshi (1990) is analogous to the relation between Efron's (1979) i.i.d. bootstrap (that samples from the empirical) and the so-called smoothed bootstrap (that samples from a smoothed empirical); for example, the MLB only generates points $X_t^*$ that actually belong to the set of data points $\{X_1, \dots, X_n\}$, while the Markov bootstrap generates points $X_t^*$ that can be anywhere on the real line. By analogy to the i.i.d. case, it is intuitive here, too, that this extra smoothing may well be superfluous, at least in the case of statistics such as the conditional/unconditional moments and functions thereof. Nevertheless, in different situations, for instance, if the statistic under consideration has a large-sample distribution that depends on the underlying marginal densities (as opposed to distributions), this explicit smoothing step may be advisable. An example in the i.i.d. case is given by the sample median (and other nonextreme sample quantiles) where the smoothed bootstrap outperforms the standard bootstrap; see, for example, Hall, DiCiccio and Romano (1989).

ACKNOWLEDGMENTS

Supported in part by NSF Grant DMS-01-04059. Many thanks are due to Professors E. Paparoditis (Cyprus) and J. Romano (Stanford) for constructive comments on this manuscript.

    REFERENCES

ARCONES, M. A. (2001). On the asymptotic accuracy of the bootstrap under arbitrary resampling size. Ann. Inst. Statist. Math. To appear.
BABU, G. J. and SINGH, K. (1983). Inference on means using the bootstrap. Ann. Statist. 11 999-1003.
BARTLETT, M. S. (1946). On the theoretical specification and sampling properties of autocorrelated time-series. Suppl. J. Roy. Statist. Soc. 8 27-41.
BERTAIL, P. and POLITIS, D. N. (2001). Extrapolation of subsampling distribution estimators: The i.i.d. and strong mixing cases. Canad. J. Statist. 29 667-680.
BICKEL, P. and FREEDMAN, D. A. (1981). Some asymptotic theory for the bootstrap. Ann. Statist. 9 1196-1217.
BICKEL, P., GÖTZE, F. and VAN ZWET, W. R. (1997). Resampling fewer than n observations: Gains, losses, and remedies for losses. Statist. Sinica 7 1-32.
BOLLERSLEV, T., CHOU, R. and KRONER, K. (1992). ARCH modelling in finance: A review of the theory and empirical evidence. J. Econometrics 52 5-59.
BOOTH, J. G. and HALL, P. (1993). An improvement of the jackknife distribution function estimator. Ann. Statist. 21 1476-1485.
BOSE, A. (1988). Edgeworth correction by bootstrap in autoregressions. Ann. Statist. 16 1709-1722.
BROCKWELL, P. and DAVIS, R. (1991). Time Series: Theory and Methods, 2nd ed. Springer, New York.
BÜHLMANN, P. (1997). Sieve bootstrap for time series. Bernoulli 3 123-148.
BÜHLMANN, P. (2002). Bootstraps for time series. Statist. Sci. 17 52-72.
CARLSTEIN, E. (1986). The use of subseries values for estimating the variance of a general statistic from a stationary time series. Ann. Statist. 14 1171-1179.
CHOI, E. and HALL, P. (2000). Bootstrap confidence regions computed from autoregressions of arbitrary order. J. R. Stat. Soc. Ser. B Stat. Methodol. 62 461-477.
DAHLHAUS, R. (1997). Fitting time series models to nonstationary processes. Ann. Statist. 25 1-37.
DAVISON, A. C. and HALL, P. (1993). On Studentizing and blocking methods for implementing the bootstrap with dependent data. Austral. J. Statist. 35 215-224.


EFRON, B. (1979). Bootstrap methods: Another look at the jackknife. Ann. Statist. 7 1-26.
EFRON, B. and TIBSHIRANI, R. J. (1986). Bootstrap methods for standard errors, confidence intervals, and other measures of statistical accuracy (with discussion). Statist. Sci. 1 54-77.
EFRON, B. and TIBSHIRANI, R. J. (1993). An Introduction to the Bootstrap. Chapman and Hall, New York.
ENGLE, R., ed. (1995). ARCH: Selected Readings. Oxford Univ. Press.
FRANKE, J., KREISS, J.-P. and MAMMEN, E. (2002). Bootstrap of kernel smoothing in nonlinear time series. Bernoulli 8 1-37.
FREEDMAN, D. A. (1981). Bootstrapping regression models. Ann. Statist. 9 1218-1228.
FREEDMAN, D. A. (1984). On bootstrapping two-stage least-squares estimates in stationary linear models. Ann. Statist. 12 827-842.
FULLER, W. A. (1996). Introduction to Statistical Time Series, 2nd ed. Wiley, New York.
GINÉ, E. and ZINN, J. (1990). Necessary conditions for the bootstrap of the mean. Ann. Statist. 17 684-691.
GÖTZE, F. and KÜNSCH, H. (1996). Second-order correctness of the blockwise bootstrap for stationary observations. Ann. Statist. 24 1914-1933.
GRANGER, C. and ANDERSEN, A. (1978). An Introduction to Bilinear Time Series Models. Vandenhoeck und Ruprecht, Göttingen.
GRENANDER, U. and ROSENBLATT, M. (1957). Statistical Analysis of Stationary Time Series. Wiley, New York.
HALL, P. (1985). Resampling a coverage pattern. Stochastic Process. Appl. 20 231-246.
HALL, P. (1992). The Bootstrap and Edgeworth Expansion. Springer, New York.
HALL, P., DICICCIO, T. J. and ROMANO, J. P. (1989). On smoothing and the bootstrap. Ann. Statist. 17 692-704.
HALL, P., HOROWITZ, J. L. and JING, B.-Y. (1995). On blocking rules for the bootstrap with dependent data. Biometrika 82 561-574.
HAMILTON, J. D. (1994). Time Series Analysis. Princeton Univ. Press.
HÄRDLE, W. and BOWMAN, A. (1988). Bootstrapping in nonparametric regression: Local adaptive smoothing and confidence bands. J. Amer. Statist. Assoc. 83 102-110.
HOROWITZ, J. L. (2003). Bootstrap methods for Markov processes. Econometrica 71 1049-1082.
KREISS, J.-P. (1988). Asymptotic statistical inference for a class of stochastic processes. Habilitationsschrift, Faculty of Mathematics, Univ. Hamburg, Germany.
KREISS, J.-P. (1992). Bootstrap procedures for AR(infinity) processes. In Bootstrapping and Related Techniques (K. H. Jöckel, G. Rothe and W. Sendler, eds.) 107-113. Springer, Berlin.
KÜNSCH, H. R. (1989). The jackknife and the bootstrap for general stationary observations. Ann. Statist. 17 1217-1241.
LAHIRI, S. N. (1991). Second order optimality of stationary bootstrap. Statist. Probab. Lett. 11 335-341.
LAHIRI, S. N. (1999). Theoretical comparisons of block bootstrap methods. Ann. Statist. 27 386-404.
LIU, R. Y. and SINGH, K. (1992). Moving blocks jackknife and bootstrap capture weak dependence. In Exploring the Limits of Bootstrap (R. LePage and L. Billard, eds.) 225-248. Wiley, New York.
MASRY, E. and TJØSTHEIM, D. (1995). Nonparametric estimation and identification of nonlinear ARCH time series. Econometric Theory 11 258-289.
NEUMANN, M. and KREISS, J.-P. (1998). Regression-type inference in nonparametric autoregression. Ann. Statist. 26 1570-1613.
PAPARODITIS, E. (1992). Bootstrapping some statistics useful in identifying ARMA models. In Bootstrapping and Related Techniques (K. H. Jöckel, G. Rothe and W. Sendler, eds.) 115-119. Springer, Berlin.
PAPARODITIS, E. and POLITIS, D. N. (2000). The local bootstrap for kernel estimators under general dependence conditions. Ann. Inst. Statist. Math. 52 139-159.
PAPARODITIS, E. and POLITIS, D. N. (2001a). Tapered block bootstrap. Biometrika 88 1105-1119.
PAPARODITIS, E. and POLITIS, D. N. (2001b). A Markovian local resampling scheme for nonparametric estimators in time series analysis. Econometric Theory 17 540-566.
PAPARODITIS, E. and POLITIS, D. N. (2001c). The continuous-path block-bootstrap. In Asymptotics in Statistics and Probability (M. Puri, ed.) 305-320. VSP Publications, Zeist, The Netherlands.
PAPARODITIS, E. and POLITIS, D. N. (2002a). The local bootstrap for Markov processes. J. Statist. Plann. Inference 108 301-328.
PAPARODITIS, E. and POLITIS, D. N. (2002b). The tapered block bootstrap for general statistics from stationary sequences. Econom. J. 5 131-148.
PAPARODITIS, E. and POLITIS, D. N. (2002c). Local block bootstrap. C. R. Math. Acad. Sci. Paris 335 959-962.
PAPARODITIS, E. and POLITIS, D. N. (2003). Residual-based block bootstrap for unit root testing. Econometrica 71 813-855.
POLITIS, D. N. (2001a). Resampling time series with seasonal components. In Frontiers in Data Mining and Bioinformatics: Proceedings of the 33rd Symposium on the Interface of Computing Science and Statistics.
POLITIS, D. N. (2001b). Adaptive bandwidth choice. J. Nonparametr. Statist. To appear.
POLITIS, D. N. and ROMANO, J. P. (1992a). A general resampling scheme for triangular arrays of alpha-mixing random variables with application to the problem of spectral density estimation. Ann. Statist. 20 1985-2007.
POLITIS, D. N. and ROMANO, J. P. (1992b). A circular block-resampling procedure for stationary data. In Exploring the Limits of Bootstrap (R. LePage and L. Billard, eds.) 263-270. Wiley, New York.
POLITIS, D. N. and ROMANO, J. P. (1992c). A general theory for large sample confidence regions based on subsamples under minimal assumptions. Technical Report 399, Dept. Statistics, Stanford Univ.
POLITIS, D. N. and ROMANO, J. P. (1993). Estimating the distribution of a Studentized statistic by subsampling. Bull. Internat. Statist. Inst. 2 315-316.
POLITIS, D. N. and ROMANO, J. P. (1994a). The stationary bootstrap. J. Amer. Statist. Assoc. 89 1303-1313.
POLITIS, D. N. and ROMANO, J. P. (1994b). Large sample confidence regions based on subsamples under minimal assumptions. Ann. Statist. 22 2031-2050.


POLITIS, D. N. and ROMANO, J. P. (1995). Bias-corrected nonparametric spectral estimation. J. Time Ser. Anal. 16 67-103.
POLITIS, D. N., ROMANO, J. P. and WOLF, M. (1999). Subsampling. Springer, New York.
POLITIS, D. N. and WHITE, H. (2001). Automatic block-length selection for the dependent bootstrap. Econometric Rev. To appear.
QUENOUILLE, M. (1949). Approximate tests of correlation in time-series. J. Roy. Statist. Soc. Ser. B 11 68-84.
QUENOUILLE, M. (1956). Notes on bias in estimation. Biometrika 43 353-360.
RADULOVIĆ, D. (1996). The bootstrap of the mean for strong mixing sequences under minimal conditions. Statist. Probab. Lett. 28 65-72.
RAJARSHI, M. B. (1990). Bootstrap in Markov sequences based on estimates of transition density. Ann. Inst. Statist. Math. 42 253-268.
ROMANO, J. P. and THOMBS, L. (1996). Inference for autocorrelations under weak assumptions. J. Amer. Statist. Assoc. 91 590-600.
SAKOV, A. and BICKEL, P. (2000). An Edgeworth expansion for the m out of n bootstrapped median. Statist. Probab. Lett. 49 217-223.
SHAO, J. and WU, C.-F. J. (1989). A general theory for jackknife variance estimation. Ann. Statist. 17 1176-1197.
SHERMAN, M. and CARLSTEIN, E. (1996). Replicate histograms. J. Amer. Statist. Assoc. 91 566-576.
SHI, S. G. (1991). Local bootstrap. Ann. Inst. Statist. Math. 43 667-676.
SHIBATA, R. (1976). Selection of the order of an autoregressive model by Akaike's information criterion. Biometrika 63 117-126.
SINGH, K. (1981). On the asymptotic accuracy of Efron's bootstrap. Ann. Statist. 9 1187-1195.
SUBBA RAO, T. and GABR, M. (1984). An Introduction to Bispectral Analysis and Bilinear Time Series Models. Lecture Notes in Statist. 24. Springer, New York.
SWANEPOEL, J. W. H. (1986). A note on proving that the (modified) bootstrap works. Comm. Statist. Theory Methods 15 3193-3203.
SWANEPOEL, J. W. H. and VAN WYK, J. W. J. (1986). The bootstrap applied to power spectral density function estimation. Biometrika 73 135-141.
TONG, H. (1990). Non-linear Time Series: A Dynamical Systems Approach. Oxford Univ. Press.
TUKEY, J. W. (1958). Bias and confidence in not-quite large samples (abstract). Ann. Math. Statist. 29 614.
WU, C.-F. J. (1986). Jackknife, bootstrap and other resampling methods in regression analysis (with discussion). Ann. Statist. 14 1261-1350.