27
Journal of Econometrics 106 (2002) 243–269 www.elsevier.com/locate/econbase Establishing conditions for the functional central limit theorem in nonlinear and semiparametric time series processes James Davidson Cardi Business School, Cardi University, Aberconway Building, Colum Drive, Cardi CF10 3EU, UK Received 10 August 1998; revised 4 June 2001; accepted 13 July 2001 Abstract This paper considers methods of deriving sucient conditions for the central limit theorem and functional central limit theorem to hold in a broad class of time series processes, including nonlinear processes and semiparametric linear processes. The common thread linking these results is the concept of near-epoch dependence on a mixing process, since powerful limit results are available under this limited-dependence property. The particular case of near-epoch dependence on an independent process provides a convenient framework for dealing with a range of nonlinear cases, in- cluding the bilinear, GARCH, and threshold autoregressive models. It is shown in particular that even SETAR processes with a unit root regime have short memory, under the right conditions. A simulation approach is also demonstrated, applicable to cases that are analytically intractable. A new FCLT is given for semiparametric linear processes, where the forcing processes are of the NED-on-mixing type, under conditions that are evidently close to necessary. ? 2002 Elsevier Science S.A. All rights reserved. JEL classication: C10; C22 Keywords: Near-epoch dependence; FCLT; Nonlinear; Bilinear; GARCH; TAR Research supported by the ESRC under award L138251025. This paper is based on parts of a working paper circulated under the title “When is a Time Series I(0)?”. Tel.: +44-2920-874-558; fax: +44-2920-874-419. E-mail address: [email protected] (J. Davidson). 0304-4076/02/$ - see front matter ? 2002 Elsevier Science S.A. All rights reserved. PII: S0304-4076(01)00100-2

Establishing conditions for the functional central limit theorem in nonlinear and semiparametric time series processes

Embed Size (px)

Citation preview

Journal of Econometrics 106 (2002) 243–269www.elsevier.com/locate/econbase

Establishing conditions for the functional centrallimit theorem in nonlinear and semiparametric

time series processes�

James Davidson ∗

Cardi� Business School, Cardi� University, Aberconway Building, Colum Drive,Cardi� CF10 3EU, UK

Received 10 August 1998; revised 4 June 2001; accepted 13 July 2001

Abstract

This paper considers methods of deriving su*cient conditions for the central limittheorem and functional central limit theorem to hold in a broad class of time seriesprocesses, including nonlinear processes and semiparametric linear processes. Thecommon thread linking these results is the concept of near-epoch dependence on amixing process, since powerful limit results are available under this limited-dependenceproperty. The particular case of near-epoch dependence on an independent processprovides a convenient framework for dealing with a range of nonlinear cases, in-cluding the bilinear, GARCH, and threshold autoregressive models. It is shown inparticular that even SETAR processes with a unit root regime have short memory,under the right conditions. A simulation approach is also demonstrated, applicableto cases that are analytically intractable. A new FCLT is given for semiparametriclinear processes, where the forcing processes are of the NED-on-mixing type, underconditions that are evidently close to necessary. ? 2002 Elsevier Science S.A. Allrights reserved.

JEL classi$cation: C10; C22

Keywords: Near-epoch dependence; FCLT; Nonlinear; Bilinear; GARCH; TAR

� Research supported by the ESRC under award L138251025. This paper is based on partsof a working paper circulated under the title “When is a Time Series I(0)?”.

∗ Tel.: +44-2920-874-558; fax: +44-2920-874-419.E-mail address: [email protected] (J. Davidson).

0304-4076/02/$ - see front matter ? 2002 Elsevier Science S.A. All rights reserved.PII: S0304-4076(01)00100-2

244 J. Davidson / Journal of Econometrics 106 (2002) 243–269

1. Introduction

For statistical inference in time series, it is usually necessary to rely onasymptotic convergence results such as the central limit theorem (CLT) andfunctional central limit theorem (FCLT). The latter result, in particular, is cen-tral to the recent research on integrated processes and cointegrationmodels. To cite one seminal example from a large and growing literature,Phillips (1987) derives distributional results for unit-root autoregressive pro-cesses. His results rely on a FCLT due to Herrndorf (1984) which assumesa mixing condition on the increments of the process.As a benchmark, the basic result of this type is as follows. Let the stochas-

tic process Xn : [0; 1] �→ R be deGned by

Xn(�)=�−1n

[n�]∑t=1

(xt − Ext); 0¡�6 1; (1.1)

where �2n =Var(

∑nt=1 xt), and [a] denotes the largest integer not exceeding

a. Introduce the following assumptions:

1. xt is either �-mixing of size −r=(r − 2) for r¿ 2 or �-mixing of size−r=(2r − 2), for r¿ 2.1

2. supt E|xt−Ext |r ¡∞, and if r=2 then {(xt−Ext)2} is uniformly integrable.3. �2

n=n → �2¿ 0, as n → ∞.

Let d→ denote convergence in distribution, and let B denote standard Brow-nian motion on [0; 1].

Theorem 1.1. If Assumptions 1; 2 and 3 hold; then Xnd→B.

This is a special case of Theorem 1 of De Jong and Davidson (2000), butthe strong-mixing (�-mixing) case is essentially the same as the one cited byPhillips (1987). Considering the case �=1 yields a CLT, and for this result,Assumption 3 can be relaxed to permit forms of global heteroscedasticity; seeDe Jong (1997) and Davidson (1992, 1993) for details.2 Henceforth, we willrefer only to the FCLT to avoid repetition. There is eBectively no loss of gen-erality, since while the corresponding CLT may permit more heterogeneity,the short-memory requirements are always common to both results.In applications of Theorem 1.1, to determine the limiting distribution of

the Dickey and Fuller (1979) or Phillips and Perron (1988) statistics, forexample, note that the increments of the observed time series must satisfy

1 Among other references on mixing conditions see Davidson (1994, Chapter 14), for deGni-tions and examples.

2 Without Assumption 3, it may still be possible to show convergence to a Gaussian a.s.continuous limit process, but this process is not Brownian motion.

J. Davidson / Journal of Econometrics 106 (2002) 243–269 245

Assumptions 1–3. Corrections for autocorrelation may be needed to estimatethe variance consistently, as in the Phillips–Perron nonparametric correction,and this is also, in eBect, the role of the lagged increments in the aug-mented Dickey–Fuller regression. However, these corrections are irrelevantto the weak convergence of the partial sum process. To apply results such asTheorem 1.1, the prime consideration must be whether the economic timeseries having increments xt does possess the properties speciGedfor it.While the Lexibility and model-independence of mixing conditions are at-

tractive features, there are well-known practical limitations with this approachto limiting the memory of a process. Since the strong and uniform mixingconditions bind on the supremum over all possible random events involvingsequence coordinates having a given time separation, the relevant distribu-tions must be fully speciGed to establish that they hold. They are, therefore,di*cult or impossible to verify in practical applications. Moreover, they areknown to fail in some seemingly innocent cases. Stable Grst-order linear auto-regressive models with Bernoulli-distributed i.i.d. shocks are not strongmixing (Andrews, 1984). Known su*cient conditions for strong mixing spec-ify not merely a continuous distribution, but a smoothness condition onthe density (see Davidson, 1994, Theorem 14:9). Establishing that uniformmixing holds for linear models is even harder, and the best known suf-Gcient condition (e.g. Davidson, 1994, Theorem 14:14) requires the shockprocess to be bounded with probability 1, ruling out normality forexample.This paper considers alternatives to Assumption 1 that are either weaker

than mixing, or more easily veriGable in applications, or both. The mainvehicle for the analysis is the concept of near-epoch dependence (NED) ona mixing process, Grst introduced to econometricians by Gallant and White(1988). This approach to specifying conditions for the FCLT was pioneeredby McLeish (1975), although the concept originates with Billingsley (1968)and Ibragimov (1962). It is a nonparametric restriction on the memory of aprocess that nonetheless constrains only a sequence of low-order moments. Ithas the beneGts both of holding in cases where mixing fails, and, as we showbelow, of being potentially veriGable, as the property of a range of popularnonlinear time series models.Let xt(: : : ; Ct−1; Ct ; Ct+1; : : :) denote a random sequence whose coordinates are

measurable functions of another random process {Cs; −∞¡s¡∞}, whichis in general vector-valued. DeGne Ft

s =�(Cs; : : : ; Ct) for any s6 t, and alsolet Et+m

t−m(·) denote the expectation conditional on Ft+mt−m. For future reference,

note that we shall also use the standard notation Ft to denote Ft−∞. The

following deGnition is adapted from Davidson (1994, DeGnition 17:1). Wereproduce it here for convenience, since features will need to be speciGcallycited in what follows.

246 J. Davidson / Journal of Econometrics 106 (2002) 243–269

De$nition 1. xt is said to be near-epoch dependent on {Cs} in Lp-norm (orLp-NED) for p¿ 0 if 3

||xt − Et+mt−mxt ||p6dt�(m); (1.2)

where dt is a sequence of positive constants, and �(m) → 0 as m → ∞. It issaid to be Lp-NED of size −� if �(m)=O(m−�−�) for �¿ 0.

In the case where �(m)=O(e−�m) for �¿ 0, such that the size is nominallyequal to −∞, we also say the process is geometrically Lp-NED. The scaleconstants dt allow for possible nonstationarity and in many cases, includingall those examined in this paper, can be set equal to 1. More generally, itis desirable in applications to require that dt =O(||xt ||p), since otherwise theconcept can become vacuous. Note that NED is a condition on the mappingfrom {Cs; −∞¡s¡∞} to xt , and so says nothing about the amount ofdependence in the xt series itself. It becomes useful when combined with amixing condition on Cs, and in particular, independence of this series.The NED-on-mixing property, subject to suitable size and moment restric-

tions, proves su*cient for the FCLT to hold. Consider the following assump-tion:

1′: xt is L2-NED of size − 12 on a process {Cs} with respect to constants

dt6 ||xt ||r , where Cs is either �-mixing of size −r=(r − 2) for r¿ 2 or�-mixing of size −r=(2r − 2), for r¿ 2.

The following result is another case of Theorem 3.1 of De Jong and Davidson(2000).

Theorem 1.2. If Assumptions 1′; 2 and 3 hold; then Xnd→B.

The noteworthy feature of this result is that the mixing and moment condi-tions are identical to Theorem 1.1. The extension to NED functions of mixingprocesses represents a kind of free lunch, with no penalties on the permittedmixing sizes or other conditions. The CLT obtained by considering the case�=1 is similarly the most general currently available in respect of memoryproperties, although for this case Assumption 3 can be relaxed, as before.The linear process is a standard example. Consider the general MA(∞)

model

xt =∞∑k=0

�kut−k ; (1.3)

where ut (to be thought of as an element of Ct , or possibly coincident with it)is bounded in L2-norm with mean 0, but otherwise is unspeciGed. If the MA

3 || · ||p denotes the Lp-norm, (E| · |p)1=p.

J. Davidson / Journal of Econometrics 106 (2002) 243–269 247

coe*cients are absolutely summable, the process is L2-NED on {ut} withrespect to NED numbers

�(m)=∞∑

j=m+1

|�j|

and constants

dt =2 sups6t

||us||2

(see Davidson, 1994, Example 17:3.). Thus, if |�j|=O(j−1−�−�) for �¿ 0,the NED size is −�. To satisfy the L2-NED conditions of Theorem 1.2 wewould need |�j|=O(j−3=2−�). However, since the shocks are themselves per-mitted to be dependent, note that this result is not comparable with the stan-dard linear process setup, with i.i.d. shocks.So much is well-known. The present paper extends the theory in two di-

rections, distinct but having in common the application of the NED concept.First, in Section 2, the property of L2-NED on an independent process is usedto unify the treatment of a range of popular nonlinear time series models.Parametric conditions are derived for these models to satisfy the assumptionsof Theorem 1.2. The familiar case of ARMA processes is treated Grst, inSection 2.1, to establish methods and notation. We then consider in turn,bilinear models (Section 2.2), GARCH models (Section 2.3), and thresh-old autoregressive models (Section 2.4), looking in particular at the mixedstable=unit root case (Section 2.5). Section 2.6 considers an analogous, ana-lytically intractable case, the ESTAR model, in which simulation is used asan alternative means of checking the conditions. Section 2.7 then oBers somegeneralizations of the approach for smooth nonlinear autoregressive forms.Second, in Section 3, it is shown that under linear process assumptions,

the previously established su*cient conditions for the FCLT can be weakenedquite dramatically, well beyond what the discussion of the case (1.3) abovemight suggest. In particular, the result would apply, under mild summabilityassumptions, to a linear process driven by any of the sequences discussed inSection 2. This result is of particular interest, because it appears to get closeto deGning necessary memory conditions for the FCLT.Section 4 concludes the paper, and the main proofs are gathered in the

appendix.

2. Applications of the NED approach

The NED property oBers an approach to proving the FCLT for a rangeof time series models, since there is hope that the Lp norm in (1.2), for thecase p=2, can be evaluated and bounded in speciGc cases. In many cases of

248 J. Davidson / Journal of Econometrics 106 (2002) 243–269

practical interest, ut is speciGed as an independent shock process, and thenTheorem 1.2 oBers more generality than necessary. The NED numbers fullydetermine the restrictions on the memory of the observed process. In theexamples that follow we always assume independent shocks, although this isprincipally for tractability of the derivations.As in most dynamic econometric models, the observed process xt is made to

depend only on current and lagged values of ut , and hence is Ft-measurable.In particular, the operator Et+m

t−m is the same as Ett−m. Another feature of the

results that follow is that since it speciGcally bounds a second moment, As-sumption 1′ yields Assumption 2 as a by-product. We do not Gnd the formerwithout the latter. This generalizes the familiar fact, in linear processes, thata memory limitation is implicit in the property of wide-sense stationarity.These two properties are often confused, and we should emphasize that thecoincidence need not obtain in more general classes of model than the onesconsidered here.

2.1. ARMA models

This case is well-known but we describe it for completeness, and be-cause the techniques illuminate the treatment of the nonlinear cases. Let theARMA(p; q) model be

xt =p∑

j=1

jxt−j + ut +q∑

j=1

�jut−j; (2.1)

where ut ∼ i:i:d:(0; �2). Set q=p, for simplicity, and without loss of general-ity since the excess terms can be set to zero as required, and write the modelin companion form. Letting xt =(xt ; xt−1; : : : ; xt−p+1)′ and ut =(ut; ut−1; : : : ;ut−p+1)′, deGne p×p matrices

�=

1 · · · p−1 p1 · · · 0 0...

. . ....

...

0 · · · 1 0

; �=

�1 · · · �p−1 �p−1 · · · 0 0...

. . ....

...

0 · · · −1 0

(2.2)

so that

xt =�xt−1 + ut +�ut−1

=�mxt−m + ut +m−2∑j=0

�j(�+�)ut−j−1 +�m−1�ut−m (2.3)

for any m¿ 2. The process is covariance stationary subject to the usualstability condition that maxi |�i|¡ 1, where the �i are the eigenvalues of �.

J. Davidson / Journal of Econometrics 106 (2002) 243–269 249

Under the assumptions, we have

xt − Et+mt−mxt =�

m−p(xt−m+p − Et+mt−mxt−m+p): (2.4)

The object of interest is the L2-norm of the Grst element of (2.4). Let || · ||2with a vector argument represent the vector of the L2-norms of the elements,and similarly let | · | represent the vector of absolute values. Then, letting

c=(1; 0; : : : ; 0)′ (p×1) (2.5)

note that

||xt − Et+mt−mxt ||2 = ||c′�m−p(xt−m+p − Et+m

t−mxt−m+p)||26 2|c′�m−p| ||xt−m+p||2

= O(max

i|�i|m

); (2.6)

where the inequality is obtained from the Minkowski and Jensen inequalitiesand the assumption of stationarity. The conclusion is summarized as follows.

Proposition 2.1. The covariance stationary ARMA(p; q) process is geomet-rically L2-NED on the shock process with respect to constants dt =1.

A point to note is that the stated condition su*ces to satisfy Assumptions1′ and 2. To ensure condition 3 holds requires the additional stipulation ofinvertibility, excluding unit roots in the moving average component.A feature of the NED approach is the ease of generalization to allow for

heterogeneous marginal distributions. The above argument is straightforwardlymodiGed for the case where ut ∼ i:h:d:(0; �2

t ), denoting that the random vari-ables are independent but heterogeneously distributed, with time dependentvariances in particular. If for example �2

t =O(t�) for �¿ 0, it is easily ver-iGed that the foregoing proposition holds with dt =Ct�=2, for some C¿ 0.While these conditions can be shown to be compatible with a central limittheorem, they do not lead to a FCLT with regular Brownian motion as thelimit process.For the sake of simplicity, the analysis following focuses on stationary

processes by assuming the driving processes to be i.i.d. Following the cur-rent literature, the models to be considered are mainly varieties of nonlinearGnite-order diBerence equation. Since their solution involves iterated productsof random variables, it is not surprising to Gnd that they mostly share thegeometric L2-NED property with the linear ARMA form, subject to condi-tions su*cient for them to possess bounded L2 norms. These are constantunder stationarity of the driving process, but in each case that follows, notehow the assumption can be relaxed to the i.h.d. case, leading, given the ex-istence of the requisite marginal moments, to modiGed results allowing dt tobe a function of time.

250 J. Davidson / Journal of Econometrics 106 (2002) 243–269

2.2. Bilinear models

The books of Priestley (1988) and Tong (1990) are leading references onbilinear models. For a recent econometric application, see Davidson and Peel(1998). Chapter 4.1 of Priestley (1988) analyses the so-called BL(p; 0; p; 1)model, which is the case with m=p of

xt =p∑

j=1

jxt−j +m∑j=1

&jxt−jut−1 + ut: (2.7)

This in turn represents a sub-class of the class of BL(p; r; m; k) models deGnedby Subba Rao (1981), which involve terms in xt−jut−i for j=1; : : : ; m andi=2; : : : ; k, and ut−j for j=0; : : : ; r. The setting of m=p and r=0 sacriGceseBectively no generality, since the MA coe*cients are not involved in theNED analysis, similarly to the ARMA case.Following Priestley, let � be deGned as in (2.2) and let

B=

&1 · · · &p−1 &p

0 · · · 0 0...

. . ....

...

0 · · · 0 0

(p×p) (2.8)

and let c be deGned by (2.5). DeGning xt(p × 1) as before, the companionform of the model is

xt =�xt−1 + Bxt−1ut−1 + cut: (2.9)

DeGne the p×p matrices

�=E(xtx′t); �=E(utxtx′t); =E(u2t xtx′t)

and let E(u4t )=�4. While not necessarily assuming Gaussianity, let Eu3t =0.This simpliGes the expressions following, speciGcally the form of the matrixP, but does not aBect the nature of the solution and the conditions for itsexistence. Then, straightforward though still tedious manipulation shows that�=K +K ′ where

K =�4(I −�)−1Bcc′ (2.10)

and

Vec�=(I −�⊗�− �2B ⊗ B)−1(VecP + (�4 − �4)B ⊗ BVec cc′);(2.11)

Vec= (I −�⊗�− �2B ⊗ B)−1(�2 VecP

+(�4 − �4)(I −�⊗�)Vec cc′); (2.12)

J. Davidson / Journal of Econometrics 106 (2002) 243–269 251

where

P=��B′ + B��′ + �2cc′ (2.13)

subject to the stability condition represented by maxi |�i|¡ 1, where �i arethe eigenvalues of �⊗�+ �2B ⊗ B.

Letting t(j)=∏j

i=1 (�+But−i) (p×p) and wt =(�+But)xt (p×1), notethat

xt − Et+mt−mxt = t(m)(wt−m−1 − Et+m

t−mwt−m−1); (2.14)

where

E(wtw′t)=���

′ +��B′ + B��′ + BB′¡∞ (2.15)

and by the independence of the terms,

VecE[ t(j)cc′ t(j)′]= (�⊗�+ �2B ⊗ B)j Vec cc′: (2.16)

Since t(m) and wt−m−1 are also independent,

||xt − Et+mt−mxt ||2 = ||c′ t(m)(wt−m−1 − Et+m

t−mwt−m−1)||26 2 tr[E( t(m)cc′ t(m)′)E(wt−m−1w′

t−m−1)]1=2

= O(max

i|�i|m=2

)(2.17)

according to (2.16), using the Jensen inequality similarly to (2.6). The con-clusion may be stated as:

Proposition 2.2. The covariance stationary BL(p; 0; p; 1) process is geomet-rically L2-NED on the shock process with respect to constants dt =1.

Extending this result to BL(p; r; m; 1) is trivial, for reasons already re-marked. Moving to the general BL(p; r; m; k) is not trivial, however, bothbecause of the complexity of the relevant moment expressions, and becausethese involve moments of the ut of higher order, depending on k. Thus,consider the expression corresponding to (2.16) when t(j)=

∏ji=1 (� +∑k−1

l=0 Biut−i−l). However, subject to the existence of the required moments,parameter restrictions for the geometric L2-NED property could in principlebe derived by elaboration of the above arguments.

2.3. GARCH models

On GARCH models of persistent volatility see, for example, Bollerslev(1986) and Engle (1995). In this model

ut = h1=2t �t ; (2.18)

where �t ∼ i:i:d:(0; 1) and ht is an Ft−1-measurable process. Hansen (1991)has derived conditions for NED in the GARCH(1,1) case. In the GARCH(p;p)

252 J. Davidson / Journal of Econometrics 106 (2002) 243–269

model that we consider here, without loss of generality,

ht = �0 +p∑i=1

�iu2t−i +p∑i=1

&iht−i

= �0 +p∑i=1

[�i(�2t−i − 1) + �i]ht−i ; (2.19)

where �i = �i + &i. The following lemma holds for any case of (2.18) inwhich ht is bounded below by a constant �0¿ 0. (All proofs are given inthe appendix.)

Lemma 2.1. ||ut − Et+mt−mut ||26 �−1=2

0 ||ht − Et+mt−mht ||2:

It therefore su*ces to show that ht is L2-NED on {�t}, and as with theARMA and BL models the key step is to express the process as the sum of aFt+m

t−m-measurable term and a remainder. The companion form of the model is

ht = c�0 + (�+AS t−1)ht−1

= c�0 + �0∞∑j=1

j∏k=1

(AS t−k + �)c; (2.20)

where ht =(ht; : : : ; ht−p+1)′ and St =diag{�2t − 1; : : : ; �2t−p+1 − 1}. Here, c isas in (2.5) and � and A have the same structure as � in (2.2) and B in(2.8), respectively, with the �i and �i, respectively, in their top rows. Notethe condition for covariance stationarity, that � has all its eigenvalues withinthe unit circle such that

E(ht)= �0(I − �)−1c: (2.21)

Let � t(m)= �0∏m−p+1

k=1 (AS t−k + �). Noting that AS s + � is Ft+mt−m-

measurable for s= t − 1; : : : ; t −m+ p,

ht − Et+mt−mht =� t(m)(ht−m+p−1 − Et+m

t−mht−m+p−1): (2.22)

Suppose that ht is bounded in L2-norm, or equivalently, that the process xthas bounded fourth moments. Then

||ht − Et+mt−mht ||2 = ||c′� t(m)(ht−m+p−1 − Et+m

t−mht−m+p−1)||2

6p∑

j=1

||{� t(m)}1j(ht−m+p−j − Et+mt−mht−m+p−j)||2

6 2p∑

j=1

||{� t(m)}1j||2||ht−m+p−j||2; (2.23)

J. Davidson / Journal of Econometrics 106 (2002) 243–269 253

where the Grst inequality is Minkowski’s. The second uses the fact, easilyveriGed, that ht−m+p−j is independent of {� t(m − p + 1)}1j by assumption,and then Jensen’s inequality, as before.To obtain the conditions for ||ht ||2¡∞, not depending on t, let

�=E(hth′t); �=E(hth′tSt)

and note that

E(Sthth′tSt)=�4 dg�;

where �4 =E�4t − 1, the variance of �2t − 1, and dg� represents the p×pdiagonal matrix having the same diagonal as �. Using the Grst equality in(2.20) to substitute for ht and taking expectations yields

�=���′ + �4A dg�A′ + ��A′ +A�′�′ + �20F ;

�=��J ′ + �4A dg�J ′; (2.24)

where

J =

[0′ 0

Ip−1 0

](p×p):

and

F = cc′ + �(I − �)−1cc′ + cc′(I − �′)−1�′:

DeGne the p2×p2 permutation matrix P; such that Vec(�′)=PVec�, andthe p2×p2 deletion matrix D; such that Vec(dg�)=DVec�. Then the stablesolution of (2.24) for � is given by

Vec�= �20(I −M)−1 VecF ;

where

M =�⊗ �+ �4(A⊗A)D+ ((A⊗ �) + (�⊗A)P)×[I − (J ⊗ �)]−1�4(J ⊗A)D: (2.25)

Accordingly, the condition for existence of the fourth-order moments of theGARCH(p;p) (and more generally the GARCH(p; q) by setting the excessterms to 0 in formulae) is that the eigenvalues of both � and M all liestrictly inside the unit circle. Subject to this condition it is clear, in view ofthe geometric rates of convergence of the sequences of cross moments, easilydeduced from (2.20), that the process is fourth-order stationary in the senseof, e.g., Hannan (1970, p. 209). Since the eigenvalue condition is necessary aswell as su*cient, we may refer to it as the fourth-order stationarity condition.Note that to obtain this condition, we have invoked parameter restrictions plusthe Gniteness of the fourth moment of the i.i.d. driving process �t , but notGaussianity of the latter process.

254 J. Davidson / Journal of Econometrics 106 (2002) 243–269

Moreover, it is clear from consideration of the second equality of (2.20)that E(� t(m)� t(m)′) is the mth term in the expansion of � in geometricseries, and hence that ||{� t(m)}1j||2 =O(maxi |�i|m=2), where the �i are theeigenvalues of M in (2.25). The conclusions of this section are thereforesummarized as follows:

Proposition 2.3. A fourth-order stationary GARCH(p; q) process is geo-metrically L2-NED on the underlying i.i.d. process; with respect to constantsdt =1.

A further point to notice here is that the argument can be adapted to provethat xt is geometrically L1-NED, subject only to the covariance stationaritycondition. This is su*cient to show that, for example, the sample mean of theprocess converges in probability. However, it does not su*ce for the FCLT.The pure GARCH process is of course a fourth-order stationary martingale

diBerence on the assumptions, and Assumption 1′ is therefore not directlyrequired for the proof of the FCLT, although it su*ces. However, FCLTs formartingale diBerences call for a weak law of large numbers to hold in thesquares of the process (Davidson, 1994, Theorem 24:3). For this purpose, theL1-NED property of the squares can be invoked (Davidson, 1994, Theorem17:9), and Proposition 2.3 establishes fourth-order stationarity as a su*cientcondition.The result may be also extended by, for example, letting the process drive

an ARMA. Let xt in (2.1) have the MA(∞) representation

xt =∞∑j=0

)jut−j; |)j|=O(|�A|j); (2.26)

where �A is the absolutely largest eigenvalue of the companion form in (2.3).Also let ut = h1=2t �t where �t ∼ i:i:d:(0; 1) and ht is generated by (2.19) and let�G denote the absolutely largest eigenvalue of M in (2.25). Then, combiningprevious results with Minkowski’s inequality yields

||xt − Et+mt−mxt ||2 = ||c′(xt − Et+m

t−mxt)||26 ||c′(ut − Et+m

t−mut)||2

+m−p∑j=0

||c′�j(�+�)(ut−j−1 − Et+mt−mut−j−1)||2

+∞∑

j=m−p−1

||c′�j(�+�)ut−j−1||2

J. Davidson / Journal of Econometrics 106 (2002) 243–269 255

= O

m−p∑

j=0

|�A|j|�G|m−p−j +∞∑

j=m−p−1

|�A|j

= O((m− p+ 1)max{|�A|m−p; |�G|m−p}): (2.27)

Since the exponential term in the Gnal member of (2.27) dominates, theconclusion is as follows.

Proposition 2.4. A stable ARMA driven by a fourth-order stationaryGARCH process is geometrically L2-NED on the underlying i.i.d. process;with respect to constants dt =1.

However, it may be noted that a more general result is obtained by com-bining the GARCH result with Theorem 3.1 to be introduced below, sinceunder the assumptions, the driving GARCH process satisGes the conditionsimposed on xt in (3.1).

2.4. Switching and threshold autoregressions

A wide class of nonlinear autoregressive models take the general form

xt =p∑

j=1

�j; t−1xt−j + ut; (2.28)

where ut ∼ i:i:d:(0; �2) and �jt is a Ft-measurable random variable that ingeneral depends on xs for s6 t. The bilinear models already considered aremembers of this class. However, note that Ft can be larger than the naturalGltration �(xs; s6 t). As a referee points out, it is one of the strengths of theNED approach that such models can be easily treated.An important set of cases is deGned by

�jt =N∑i=1

jiI it ; (2.29)

where the ji for i=1; : : : ; N and j=1; : : : ; p are coe*cients, and the I it areFt-measurable indicator functions of which one equals unity, and the restzero. Models of this type include the self-exciting threshold autoregression(SETAR) models, in which I it is the indicator of xt itself falling in a par-ticular interval (see e.g. Priestley, 1988, Chapter 4.2; Tong, 1990, Chapter3.3; Granger and TerRasvirta, 1993, Chapter 4.1) and the Markov-switchingautoregression (Hamilton, 1994, Chapter 22.4) in which the I it constitute anindependently generated Markov chain. A third group are the smooth tran-sition autoregressive (STAR) models, in which the indicator functions arereplaced by smooth functions of the data having the unit interval as range,such as distribution functions or logistic forms.

256 J. Davidson / Journal of Econometrics 106 (2002) 243–269

The companion form for these models is

xt =t−1xt−1 + cut; (2.30)

where c was deGned in (2.5) and

t−1 =

�1; t−1 · · · �p−1; t−1 �p;t−1

1 · · · 0 0...

. . ....

...

0 · · · 1 0

:

The usual expansion yields, for m¿p,

xt =m−p∑j=1

j∏k=1

t−kcut−j +m−p+1∏k=1

t−kxt−m+p−1 + cut: (2.31)

Using the Minkowski, Jensen and HRolder inequalities, we have for somer¿ 1,4

||xt − Et+mt−mxt ||2

=

∥∥∥∥∥c′m−p+1∏k=1

t−kxt−m+p−1 − c′m−p+1∏k=1

t−kEt+mt−m(xt−m+p−1)

∥∥∥∥∥2

6 2

∥∥∥∥∥c′m−p+1∏k=1

t−k

∥∥∥∥∥2r=(r−1)

||xt−m+p−1||2r : (2.32)

In the case r=1, the Grst factor of the majorant of (2.32) is interpreted asthe sup-norm of the argument.For the case of (2.29),

t =N∑i=1

I it�i ;

where the �i are deGned as in (2.3), and∥∥∥∥∥c′m−p+1∏k=1

t−k

∥∥∥∥∥∞=O(|�∗|m−p+1);

where |�∗| denotes the largest modulus of the eigenvalues of the set �1; : : : ;�N .If |�∗|¡ 1 then the eigenvalues of t are stable with probability 1, and the

4 As previously, the L2-norm with vector argument is interpreted as the vector of L2-norms.

J. Davidson / Journal of Econometrics 106 (2002) 243–269 257

TAR is accordingly covariance stationary. Letting �∗ denote the member ofthe set to which �∗ corresponds, we have

E(x2t ) = �2 +∞∑j=1

∞∑l=1

c′Ej∏

k=1

l∏i=1

t−kcc′′t−icut−jut−i

6 �2 + �2

∞∑

j=1

c′�∗jc

2

¡∞:

Chan and Tong (1985) and Tong (1990) develop the concept of geo-metric ergodicity as a restriction on the behaviour of nonlinear stochasticdiBerence equations like the above. In stationary processes, this restrictionimplies strong mixing at the geometric rate so that the conditions of The-orem 1.1 are satisGed. It is therefore worth emphasizing the advantagesconveyed by the NED approach in these cases, in addition to the relativeease of establishing the property already demonstrated, and the powerfulasymptotic results available. The extension to nonstationary cases, featur-ing trending moments of the shock processes for example, has already beenindicated. However, another important gain in generality is that NED de-pends in no way on the distribution of the shocks, beyond the existence ofthe requisite moments. By contrast, establishing geometric ergodicity in atypical application of the above type requires the shock distribution to becontinuous.

2.5. Unit root SETAR models

A more interesting case is where maximum eigenvalues of unity are ad-missible with positive probability, so that the process behaves under certainconditions as a unit root process. Where stability with probability 1 is notavailable, the case r¿ 1 in (2.32) may be considered, requiring the processto have Gnite L2r norm. Then we must consider the behaviour of the m-foldproduct in the majorant as m increases.A simple Grst-order example of the self-exciting type will be analysed.

Let

xt =

{xt−1 + ut; |xt−1|¡a;

xt−1 + ut; otherwise;(2.33)

for 06 ¡ 1 and a¿ 0, where ut ∼ i:i:d:(0; �2), and assume further that utis continuous, the p.d.f. having Gnite second derivative at 0, and E|ut |2r ¡∞for r¿ 1: It could for example be Gaussian. Models of this type can arisein the theory of exchange rate bands, and Ss-type inventory models. Thefollowing result is proved in the appendix.

258 J. Davidson / Journal of Econometrics 106 (2002) 243–269

Fig. 1. The standard Gaussian case.

Proposition 2.5. Under the stated assumptions; xt in (2:33) is geometricallyL2-NED on ut with

�(m)=O((g(a) + (1− g(a)) 2r=(r−1))m(r−1)=2r) (2.34)

where; letting f denote the p.d.f. of ut;

g(a)= (2a)−1∫ a

−a

∫ a

−af(x − y) dx dy (2.35)

and g(a)¡ 1.

Fig. 1 plots the function g(a) for the case where f is the standard Gaus-sian. Application of (2.34) shows how the memory of the process is boundeddepending on the width of the band, measured in units of the standard devi-ation of the shock process. A large value of r is evidently necessary if therate of decline of �(m) is not to be very slow even for relatively small valuesof a; and hence formula (2.34) is most useful for cases where all momentsexist. Otherwise, the inequality in (2.32) could suBer a lack of sharpness,such that with r close to 1, the dependence might in practice decline fasterthan the derived bound indicates.

2.6. The ESTAR model

A close relative of the last example is the exponential smooth transitionautoregression (ESTAR) taking the form

xt =e−&x2t−1xt−1 + ut: (2.36)

where &¿ 0 is a parameter and, as usual, ut ∼ i:i:d:(0; �2) and E|ut |2r ¡∞for r¿ 1: This is another way to implement the idea of a process which hasrandom walk characteristics in the mid-range, and mean reversion properties

J. Davidson / Journal of Econometrics 106 (2002) 243–269 259

Table 1Numerical evaluation of bound (2.32) in the ESTAR modela

r &=1 &=0:1 &=0:01 &=0:001

1.1 0:565m×1:08 0:904m×1:40 0:981m×2:30 0:995m×4:0150 0:499m×4:09 0:876m×4:81 0:963m×6:85 0:988m×10:87

aThe factor 2 has been omitted.

at extreme values, but in this case the transition is smooth. The autoregressivecoe*cient �t−1 = e−&x2t−1 , corresponding to t−1 in (2.30) with p=1, is equalto 1 only with probability 0.Analytic solution of the diBerence equation is by no means straightfor-

ward in this type of model, and no simple way to evaluate the factors of(2.32) suggests itself. However, it is easy to compute the relevant momentsby simulation. The results of this exercise, using standard Gaussian distur-bances and a sample size of 100,000, are reported in Table 1. In the caseof ||∏m+1

k=1 �t−k ||2r=(r−1), the norms have been estimated for m=1; : : : ; 100 andthe logarithm of this series has been regressed on m.5 The R2s in these re-gressions are all greater than 0:97 and exceed 0:999 for the smaller values of&; so the exponential approximation is good. The antilogarithms of the slopecoe*cients appear as the left-hand factors in each cell of the table, raisedto power m; while the right-hand factors are the estimated L2r-norms of theprocess. In contrast to the threshold model, there is no serious penalty herein setting r close to unity. The diBerence is that in the former model �t =1with positive probability, so that taking a large power does not shrink it, butthat does not occur in this case.

2.7. Smooth nonlinear autoregressions

Finally, consider methods for tackling a general class of models whichencompasses some of those discussed above. These have the form

xt =f(ut; xt−1); (2.37)

where f is diBerentiable at least with respect to its second argument. As-sume, as previously, that ut ∼ i:i:d:(0; �2); with ||ut ||2r =B¡∞ for r¿ 1:The approach here is to consider the recursive solution

xt =f(ut; f(ut−1; f(ut−2; : : :) : : :))

= g(ut; ut−1; ut−2; : : :): (2.38)

5 The norm may be too small for the logarithm to be computed for the larger values of m;and in such cases the number of lags is truncated.

260 J. Davidson / Journal of Econometrics 106 (2002) 243–269

Denoting the latter function by gt; also deGne gmt = g(ut; : : : ; ut−m; 0; 0; : : :).Then note that

gt − gmt =∞∑

j=m+1

Gt−jut−j (2.39)

by the mean value theorem, where using the chain rule of diBerentiation and(2.38), we can write

Gt−j =(

@gt@ut−j

)∗=

m∏k=1

f2; t−k :

Here the ‘∗’ denotes that the derivative is evaluated at points u∗t−j ∈ [0; ut−j];and

f2; t−k =f2(ut−k ; g(ut−k−1; : : : ; ut−m; u∗t−m−1; : : :)) (2.40)

denotes the derivative of f with respect to its second argument, evaluated atthe points u∗t−m−k , for k¿ 0: Now use the fact that Et+m

t−m(:) is the minimummean-square Ft+m

t−m-measurable approximation of its argument, and then theHRolder inequality, to obtain, much as before,

||gt − Et+mt−mgt ||26 ||gt − gmt ||2

6 B∞∑

j=m+1

∥∥∥∥∥m∏

k=1

f2; t−k

∥∥∥∥∥2r=(r−1)

(2.41)

This case was considered by Gallant and White (1988) (also seeDavidson, 1994, Chapter 17.1) although subject to the stringent stability con-dition, |f2; t−k |6 b¡ 1 with probability 1. In this case it is obviously easyto bound (2.41), but the restriction is stronger than strictly necessary, in thelight of the earlier discussion. As an illustration of the calculations involved,one might take the ESTAR example considered above, although our approachthere is preferable since it exploited the additivity of the error term. The costof the extra generality is to introduce the inGnite series into the NED bound,but since, in general, the terms of the sum decline exponentially (if they dodecline) there is no extra penalty here. The sum from m to ∞ in (2.41)declines at the same rate as the mth term.

3. Linear forms

Let

zt =∞∑k=0

�kxt−k (3.1)

J. Davidson / Journal of Econometrics 106 (2002) 243–269 261

and deGne for zt the standardized stochastic process Zn on [0; 1], having theform of (1.1). The following theorem is proved in the appendix.

Theorem 3.1. Znd→B if the process {xs;−∞¡s¡∞} in (3:1) satis$es As-

sumptions 1′; 2 and 3; and the sequence {�j} satis$es the following conditions:|�j| is regularly varying at in$nity;6

∣∣∣∣∣∣∞∑j=0

�j

∣∣∣∣∣∣¡∞ (3.2)

and

∞∑k=0

n+k∑

j=1+k

�j

2

= o(n): (3.3)

In other words, this result says that any process satisfying the assumptionsof Theorem 1.2 can be replaced by a moving average of itself, under thespeciGed conditions, and the weak convergence is preserved. This is a corol-lary to Theorem 3.1 of Davidson and de Jong (2000), which establishes theFCLT for fractionally integrated processes. It operates by the trick due toDavydov (1970), of re-ordering the partial sums of Zn to collect terms withthe same time index, so converting a dependence problem eBectively into oneof heteroscedasticity. In the fully general version of Theorem 1.2 given by DeJong and Davidson (2000), the variances of the process are allowed to trendlike t� for any �¿−1, and exploiting this fact yields the result. However, itcannot be generalized beyond the linear case. There is no way yet known offurther improving general (model-independent) dependence conditions overthose given in Theorem 1.2.The novel condition here is (3.3). Considering the case n=1 shows that it

implies square-summability of the coe*cients. However, consideration of thefollowing numerical lemma, proved in the appendix, shows that it is weakerthan absolute summability.

Lemma 3.1. For �¿ 0;∞∑k=1

(k−� − (n+ k)−�)2 =o(n):

6 That is, |�j|= j)L(j) for real ) where L(xj)=L(j) → 1 as j → ∞ for all x¿ 0. L is calleda slowly varying function, and can be arbitrary for Gnite j. This assumption can certainlybe relaxed, by specifying suitable regularly varying bounding sequences. However, such anextension would complicate the statement of the result while adding little useful generality; inparticular, note that the Gnite-order MA case is covered directly by Theorem 1.2.

262 J. Davidson / Journal of Econometrics 106 (2002) 243–269

If the coe*cients all take the same sign, the Gniteness speciGed in (3.2)is equivalent to absolute summability of the sequence, and condition (3.3)holds in consequence. If |�j|=O(j−1−�) for �¿ 0, then∣∣∣∣∣∣

n+k∑j=1+k

�j

∣∣∣∣∣∣6n+k∑

j=1+k

|�j|

6Cn+k∑

j=1+k

j−1−�

6C�(k−� − (n+ k)−�) (3.4)

for C¿ 0:7

If the coe*cients may diBer in sign, however, condition (3.3) becomesthe binding condition. Consider an example with alternating signs, �j =(−1)jj−1=2−�, �¿0.8 This sequence is square-summable but not absolutely summable.However, deGning

�∗j =

{12 (�j + �j+1); 1 + k6 j6 n+ k and j − k odd;

�∗j−1; 1 + k6 j6 n+ k and j − k even;

note that if n is even,n+k∑

j=1+k

�j =n+k∑

j=1+k

�∗j (3.5)

and it can be veriGed that �(j + 1)−3=2−� ¡�∗j ¡ �j−3=2−�. If n is odd, thesame equality holds with �∗j so deGned for j¡n+ k and �∗n+k = �n+k . Then,similarly to (3.4),∣∣∣∣∣∣

n+k∑j=1+k

�j

∣∣∣∣∣∣6C1(k−1=2−� − (n+ k)−1=2−�) + C2(n+ k)−1=2−� (3.6)

for C1¿ 0 and C2¿ 0, with C2 = 0 if n is even. Both terms on the majorantside of (3.6) satisfy the condition that their sum of squares over k=1; 2; : : :is o(n).This example can be elaborated with more general patterns of sign change,

such that the partial signed sums grow appropriately. Consider

�j = j−1=2−� cos(24j=N )

7 For simplicity, we ignore the possible role of slowly varying components here. A versionof the argument can also be derived for the summable case |�j|=O(j−1(log j)−1−�), �¿ 0.

8 There is no suggestion that this is a realistic model of any observed process. It is takensimply as a tractable case, for purposes of illustration.

J. Davidson / Journal of Econometrics 106 (2002) 243–269 263

for positive Gnite integer N . A construction on the lines of (3.5) is clearlypossible, oB-setting pairs of positive and negative terms. A sinusoidal lagdistribution involving changes of sign will in general allow longer memorythan a monotone one. These examples give an indication of the generality ofcondition (3.3), noting that the ‘− 1

2 ’s in the exponents of the Grst majorantterm of (3.6) are actually surplus to the requirements of square-summability.However, a counter-example is provided by the case �j = j−1=2−� cos(24j5=N )for 5¡ 1: In this case, the period of the Luctuations is increasing with thelag, and the ‘oB-setting’ device fails in the tail, as k increases.An example of (3.1) in which the assumptions of Theorem 3.1 are violated

is the fractionally integrated, or I(d), model. In this case,

�j =6(j + d)

6(d)6(j + 1)=O(jd−1): (3.7)

When d=0, �j =0 for j¿ 0 and the assumptions of Theorem 3.1 reduceto those of Theorem 1.2. The long memory case, in which 0¡d¡ 1

2 , vio-lates the upper bound speciGed in condition (3.2) because the lag coe*cientsare positive and nonsummable. As the above-cited theorem of Davidson andde Jong (2000) shows, the limit of the partial sums is not B in this casebut fractional Brownian motion, a Gaussian process having positively cor-related increments. This strongly indicates that the summability condition isnecessary.The negative fractional model in which − 1

2 ¡d¡ 0 also violates (3.2), foralthough its coe*cients are summable, they sum identically to 0. The processis constructed as the simple diBerence of a nonstationary fractional process(with 1

2 ¡d¡ 1). The limit of the partial sums is in this case a Gaussianprocess with negatively correlated increments. However, note that without thesum-to-zero property a process with |�k |=O(kd−1) for d¡ 0 is one of thecases covered by Theorem 3.1, and the limit process is ordinary Brownianmotion.

4. Conclusion

The objectives of this paper have been twofold. First, it has considered theoperationalization of the near-epoch dependence assumption, for establishingthe FCLT conditions in nonlinear models. The condition is easily checked fora range of models, and indeed, most of the classes of model dealt with inthe monographs of Priestley (1988), Tong (1990) and Granger and TerRasvirta(1993) are covered by our results. Second, it has presented a new FCLT forlinear processes driven by dependent sequences of random variables, whoseconditions appear close to necessity for this class of models. It is known thatdiBerent limits obtain when the summability conditions are violated.

264 J. Davidson / Journal of Econometrics 106 (2002) 243–269

The one limiting feature of the former analysis has been the need to assumethat the shock processes driving these models are independent. It has beenpointed out that relaxing the stationarity assumption for the shock processesis generally rather trivial, but the analysis has used the independence in acrucial way to simplify the derivations. This is a limitation, in the sense thatthe theory itself allows the underlying process to exhibit unspeciGed localdependence, subject to a mixing condition. The di*culty in exploiting thisgeneralization lies simply in the greater di*culty of establishing the NEDproperty. Of course, it can be argued that modelling nonlinear dependencereduces the need to allow for unspeciGed dependence, and Proposition 2.4illustrates this point. Extending the class of switching and threshold modelsto allow for conditional heteroscedasticity represents one of several avenuesfor future research, and the conjecture that geometric L2-NED properties holdunder comparable assumptions is a plausible one.

Acknowledgements

I am grateful to Michael Jansson, Ron Gallant, an Associate Editor, andtwo anonymous referees for comments which have materially improved thepaper. I retain all responsibility for errors.

Appendix A.

Proof of Lemma 2.1. First, noting that ut and ht are independent and ut ismeasurable with respect to Ft+m

t−m for m¿ 0,

||xt − Et+mt−mxt ||2 = ||h1=2t − Et+m

t−mh1=2t ||2: (A.1)

Next, since ht and Et+mt−mht are both bounded below by �0, we have

|h1=2t − (Et+mt−mht)

1=2|6 �−1=20 |ht − Et+m

t−mht | a:s:and hence

||h1=2t − (Et+mt−mht)

1=2||26 �−1=20 ||ht − Et+m

t−mht ||2: (A.2)

Also note that

||h1=2t − Et+mt−mh

1=2t ||26 ||h1=2t − (Et+m

t−mht)1=2||2 (A.3)

in view of the fact that Et+mt−m(:) is the minimum MSE, Ft+m

t−m-measurableapproximation of its argument. Combining (A.2), (A.3) and (A.1) yields theresult.

J. Davidson / Journal of Econometrics 106 (2002) 243–269 265

Proof of Proposition 2.5. We evaluate the bound in (2.32). First verify that||xt ||2r ¡∞ for r¿ 1: Let

x1t = xtI(|xt−1|¡a)

so that ||x1t ||2r6 a: Then note that ||xt ||2r6 ||x1t ||2r + ||xt − x1t ||2r ; usingMinkowski’s inequality, where

||xt − x1t ||2r 6 ||xt−1||2r + ||ut ||2r

6m−1∑j=0

j||ut−j||2r + a m

¡∞;

where time t −m is the most recent occasion prior to t in which the processis contained in the band, and the last inequality holds for any m; Gnite orinGnite.Next, note that

∏mk=1 �t−k is a random variable whose range consists of

the values j for integers j=0; : : : ; m, depending on the number of occasionsthat the band is breached in time periods t −m to t − 1. If fn(:) denotes thep.d.f. of xt conditional on the process having stayed within the band for thepreceding n periods, there exists the recurrence relation

fn(x)=∫ a

−afn−1(y)f(x − y) dy; n=1; 2; 3; : : :

where f(:) denotes the p.d.f. of ut and f0(x)=f(x− x0); and x0 denotes thepoint at which the process is reset within the band following a breach (seeCox and Miller, 1965, Chapter 2:3). Note that |x0|¿ a. The probability ofstill being inside the band at step n is therefore∫ a

−afn(x) dx=

∫ a

−afn−1(y)

∫ a

−af(x − y) dx dy

=n∏

j=0

gj;

where

g0(a; ) =∫ a

−af(x − x0) dx

6∫ a( +1)

a( −1)f(u) du6 1

and

gn(a)=

∫ a−a fn−1(y)

∫ a−a f(x − y) dx dy∫ a

−a fn−1(y) dy¡ 1 (A.4)

266 J. Davidson / Journal of Econometrics 106 (2002) 243–269

for n¿ 1: The inequality in (A.4) is strict because∫ a−a f(x − y) dx6 1 for

−a6y6 a and the latter inequality must be strict for |y| near enough toa because the support of the distribution contains an open interval around 0by assumption. Note that g0 is increasing in a and decreasing in ; whereasgn; which is the conditional probability that the band is not breached at stepn; is increasing in a and decreasing in n, with inf a gn=0, supa gn=1, andg(a)= inf n gn is deGned in (2.35).To calculate the L2r=(r−1) norm of

∏mk=1 �t−k exactly from these formulae is

feasible given a form for f; but obviously very complicated. Instead, considerthe parallel problem in which gn= g(a) for every n: Also assume that the bandis breached at exactly time t −m: In this case, we would have, exactly,∣∣∣∣∣

∣∣∣∣∣m∏

k=1

�t−k

∣∣∣∣∣∣∣∣∣∣2r=(r−1)

=

m∑

j=0

2rj=(r−1)

(m

j

)g(a)m−j(1− g(a))j

(r−1)=2r

= (g(a) + (1− g(a)) 2r=(r−1))m(r−1)=2r : (A.5)

The second assumption lowers the probability of the Grst breach occurringsubsequent to time t−m; and so increases ||∏m

k=1 �t−k ||2r=(r−1), and is innocu-ous from the point of view of bounding this norm. Moreover, since gn → g(a);the approximation in (A.5) becomes arbitrarily close as m increases.

Proof of Theorem 3.1. This proceeds by adapting the proof of Theorem 3.1of Davidson and de Jong (2000), (henceforth, DdJ). Consider the quantities

ant(�; �′)=[n�]−t∑

j=max{0;[n�′]−t+1}�j: (A.6)

Under assumption (3.2), |ant(�; �′)|=O(1) but not o(1), as n → ∞, for06 �′¡�6 1 and −∞6 t6 [n�]. Therefore consider Eq. (B:5) of DdJ,which is

[n�]∑t=−∞

ant(�; �′)2 =[n�]∑

t=[n�′]+1

ant(�; �′)2 +[n�′]∑

t=−∞ant(�; �′)2

=M1n +M2n: (A.7)

It is immediate by assumptions (3.2) and (3.3), respectively, that M1n=O(n(�−�′)) and M2n=o(n(�−�′)). Thus, to establish the variance of the partialsum write ant for ant(0; 1) and consider Lemma 3:2 of DdJ. The argumenthas to be somewhat modiGed, because in this case the sequence of lag coe*-cients need not be monotone. Under assumption (3.3), |�j|=O(j−1=2−�) whichcorresponds to the case d= 1

2 − � in the lemma, whereas∑n

t−∞ a2nt =O(n),not O(n2−2�). However, the arguments of part (i) of the proof go through

J. Davidson / Journal of Econometrics 106 (2002) 243–269 267

unchanged, since they depend only on the properties of the ant . In part (ii)of the proof, we Gnd in place of Eq. (B:25) of DdJ that

rn−1∑i=−∞

B3ng

2ni =O

(B2n

(rn−1∑i=−∞

(rn − i)−1−2� +0∑

i=−∞|i|−1−2�

))=O(B2

n):

(A.8)

Hence, Eq. (B:26) of DdJ becomes

rn−1∑i=−∞

W ∗2ni =O(n−1B2

n): (A.9)

However, Bn may be freely chosen subject to the conditions stated, and withBn= n1=2−8 for 0¡8¡ 1

2 the conclusion stated in (B:27) continues to hold.It follows that, under the conditions of the theorem, �2

n =O(n) but noto(n). Since Theorem 3.1 of DdJ invokes the properties of the sequence bj(corresponding to �j here) only insofar as they lead to the conclusions ofLemmas 3:1 and 3:2 of DdJ, if follows that the theorem applies in this case,except that the right-hand member of Eq. (B:36) becomes simply �.

Proof of Lemma 3.1. Consider the case 0¡�¡ 1. Placing the terms over acommon denominator and applying the mean value theorem, note that

1k�

− 1(n+ k)�

= �( (n; k; �)n+ k

k(n+ k)

)� n (n; k; �)n+ k

; (A.10)

where 0¡ (n; k; �)¡ 1, by the strict concavity of the power transformation.Hence,

(1k�

− 1(n+ k)�

)2= n�2

( (n; k; �)n+ k

k(n+ k)

)2� ( n1=2k1=2+8

(n; k; �)n+ k

)2k−1−28

(A.11)

for 0¡8¡ 12 where on the right-hand side, the Grst term in parentheses is

O(1) as n → ∞ and o(1) as k → ∞, and the second term in parenthesesis o(1) in respect of both indices, assuming in each case that (n; k; �) isbounded away from 0 as k → ∞, for each n¿ 1 and each �. To show thelatter condition holds, re-write (A.10) after multiplying through by k�(n+ k)�

and letting �= n=k, as

(1 + �)� − 1= ��(1 + �)�−1: (A.12)

268 J. Davidson / Journal of Econometrics 106 (2002) 243–269

Expanding the left-hand side of this equation in Taylor series to second order,and the right-hand side to Grst order, simplifying and rearranging, yields

= 12 + O(�): (A.13)

The case �¿ 1 holds similarly. In the case �=1, (A.10) holds with =1, but the obvious cancellation in (A.11) yields the same conclusion asbefore.

References

Andrews, D.W.K., 1984. Non-strong mixing autoregressive processes. Journal of AppliedProbability 21, 930–934.

Billingsley, P., 1968. Convergence of Probability Measures. Wiley, New York.Chan, K.S., Tong, H., 1985. On the use of the deterministic Lyapunov function for the ergodicity

of stochastic diBerence equations. Advances in Applied Probability 17, 666–678.Bollerslev, T., 1986. Generalised autoregressive conditional heteroscedasticity. Journal of

Econometrics 31, 307–321.Cox, D.R., Miller, H.D., 1965. The Theory of Stochastic Processes. Methuen & Co, London.Davidson, J., 1992. A central limit theorem for globally nonstationary near-epoch dependent

functions of mixing processes. Econometric Theory 8, 313–329.Davidson, J., 1993. The central limit theorem for globally non-stationary near-epoch dependent

functions of mixing processes: the asymptotically degenerate case. Econometric Theory 9,402–412.

Davidson, J., 1994. Stochastic Limit Theory. Oxford University Press, Oxford.Davidson, J., de Jong, R.M., 2000. The functional central limit theorem and weak convergence

to stochastic integrals II: fractionally integrated processes. Econometric Theory 16 (5), 621–642.

Davidson, J., Peel, D., 1998. A nonlinear error correction mechanism based on the bilinearmodel. Economics Letters 58 (2), 165–170.

Davydov, Yu.A., 1970. The invariance principle for stationary processes. Theory of Probabilityand its Applications XV (3), 487–498.

De Jong, R.M., 1997. Central limit theorems for dependent heterogeneous random variables.Econometric Theory 13, 353–367.

De Jong, R.M., Davidson, J., 2000. The functional central limit theorem and weak convergenceto stochastic integrals I: weakly dependent processes. Econometric Theory 16 (5), 643–666.

Dickey, D.A., Fuller, W.A., 1979. Distribution of the estimators for autoregressive time serieswith a unit root. Journal of the American Statistical Association 74, 427–431.

Engle, R.F., 1995. ARCH: Selected Readings. Oxford University Press, Oxford.Gallant, A.R., White, H., 1988. A UniGed Theory of Estimation and Inference for Nonlinear

Dynamic Models. (Basil) Blackwell, Oxford.Granger, C.W.J., TerRasvirta, T., 1993. Modelling Nonlinear Economic Relationships. Oxford

University Press, Oxford.Hamilton, J.D., 1994. Time Series Analysis. Princeton University Press, Princeton, NJ.Hannan, E.J., 1970. Multiple Time Series. Wiley, New York.Hansen, B.E., 1991. GARCH(1,1) processes are near-epoch dependent. Economics Letters 36,

182–186.Herrndorf, N., 1984. A functional central limit theorem for weakly dependent sequences of

random variables. Annals of Probability 12, 141–153.

J. Davidson / Journal of Econometrics 106 (2002) 243–269 269

Ibragimov, I.A., 1962. Some limit theorems for stationary processes. Theory of Probability andits Applications 7, 349–382.

McLeish, D.L., 1975. Invariance principles for dependent variables. Z.Wahrscheinlichkeitstheorie verw. Gebiete 32, 165–178.

Phillips, P.C.B., 1987. Time series regression with a unit root. Econometrica 55, 277–301.Phillips, P.C.B., Perron, P., 1988. Testing for a unit root in time series regression. Biometrika

75, 335–346.Priestley, M.B., 1988. Non-Linear and Non-Stationary Time Series Analysis. Academic Press,

London.Subba Rao, T., 1981. On the theory of bilinear models. Journal of the Royal Statistical Society

B 43, 244–245.Tong, H., 1990. Non-Linear Time Series. A Dynamical System Approach. Clarendon Press,

Oxford.