
On Wavelet Regression with Long Memory Infinite Moving Average Errors

Linyuan Li∗

Department of Mathematics and Statistics, University of New Hampshire, USA,

Juan Liu

Department of Mathematics and Statistics, University of New Hampshire, USA,

Yimin Xiao†

Department of Statistics and Probability, Michigan State University, USA

December 21, 2006

Abstract

We consider wavelet-based estimators of the mean regression function with long memory infinite moving average errors and investigate their asymptotic rates of convergence based on thresholding of empirical wavelet coefficients. We show that these estimators achieve nearly optimal minimax convergence rates within a logarithmic term over a large class of non-smooth functions that involve many jump discontinuities, whose number may grow polynomially fast with the sample size. Therefore, in the presence of long memory moving average noise, wavelet estimators still achieve nearly optimal convergence rates and exhibit remarkable local adaptability in handling discontinuities. A key result in our development is a Bernstein-type exponential inequality for infinite weighted sums of i.i.d. random variables under a certain cumulant assumption. This large deviation inequality may be of independent interest.

Short title: Wavelet Estimator with Long Memory Data
2000 Mathematics Subject Classification: Primary: 62G07; Secondary: 62C20
Keywords: Infinite moving average processes; long range dependence data; minimax estimation; nonlinear wavelet-based estimator; rates of convergence

∗Research supported in part by the NSF grant DMS-0604499.
†Research supported in part by the NSF grant DMS-0404729.


1 Introduction

Consider the nonparametric regression model
\[ Y_i = g(x_i) + \varepsilon_i, \qquad i = 1, 2, \dots, n, \tag{1.1} \]
where $x_i = i/n \in [0, 1]$, $\varepsilon_1, \dots, \varepsilon_n$ are observational errors with mean 0, and $g$ is an unknown

function to be estimated. Common assumptions on ε1, · · · , εn are i.i.d. errors or station-

ary processes with short-range dependence such as classic ARMA processes, see, e.g., Hart

(1991), Tran, et al. (1996) and Truong and Patil (2001). However, in many fields which in-

clude agronomy, astronomy, economics, environmental sciences, geosciences, hydrology and

signal and image processing, it is unrealistic to assume that the observational errors are in-

dependent or short-range dependent. Instead, these observational errors exhibit slow decay

in correlation which is often referred to as long-range dependence or long memory. Suppose

ε1, · · · , εn, · · · is a stationary error process with mean 0 and variance 1. Then {εi, i ≥ 1} is

said to have long-range dependence or long memory, if there exists α ∈ (0, 1) such that

\[ r(j) = E(\varepsilon_1\varepsilon_{1+j}) \sim C_0 |j|^{-\alpha}, \tag{1.2} \]

where C0 > 0 is a constant and aj ∼ bj means that aj/bj → 1 when j →∞. The literature

on long-range dependence is very extensive, see, e.g., the monograph of Beran (1994) and the

references cited therein. Estimation for data with long-range dependence is quite different

from that for observations with independence or short-range dependence. For example, Hall

and Hart (1990) showed that the convergence rates of mean regression function estimators

differ from those with independence or short-range dependence.

In this paper we suppose that the errors {εi, i ∈ Z} constitute a strictly stationary

moving average sequence which is defined by

\[ \varepsilon_i = \sum_{j \le i} b_{i-j}\,\zeta_j, \qquad i \in \mathbb{Z}. \tag{1.3} \]

Here $\{\zeta_j, j \in \mathbb{Z}\}$ is a sequence of i.i.d. random variables with mean zero and variance $\sigma^2$, and $b_i$, $i \in \mathbb{Z}_+$, are nonrandom weights such that $\sum_i b_i^2 = \sigma^{-2}$ [this implies that $E(\varepsilon_i^2) = 1$ for all $i \in \mathbb{Z}$]. Furthermore, we assume that the weights decay slowly (hyperbolically):
\[ b_i \sim C_1\, i^{-(1+\alpha)/2}, \qquad 0 < \alpha < 1, \tag{1.4} \]

where $C_1$ is a constant. The equations (1.3) and (1.4) imply that (1.2) holds with $C_0 = C_1^2\,\sigma^2 \int_0^\infty (u+u^2)^{-(1+\alpha)/2}\,du$. Hence the errors $\varepsilon_i$ in (1.3) have long memory. The family of

long memory processes defined by (1.3) includes the important class of fractional ARIMA

processes. For more information on their applications in economics and other sciences, see


Robinson (1994) and Baillie (1996). For various theoretical results pertaining to the empirical

processes of long memory moving averages, see Ho and Hsing (1996, 1997), Giraitis, et al.

(1996, 1999), Koul and Surgailis (1997, 2001), among others.
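As a concrete illustration (not part of the original paper), the following Python sketch simulates errors from a truncated version of the moving average (1.3) with weights decaying as in (1.4). The truncation length K, the constant C1, and the use of standard Gaussian innovations $\zeta_j$ (which satisfy $(S_\gamma)$ with $\gamma = 0$) are illustrative assumptions, not choices prescribed by the paper.

```python
import numpy as np

def long_memory_ma_errors(n, alpha=0.3, K=20000, C1=1.0, seed=0):
    """Simulate eps_1, ..., eps_n from a truncated version of (1.3) with weights (1.4):
        eps_t = sum_{k=0}^{K-1} b_k * zeta_{t-k},  b_k ~ C1 * (k+1)^{-(1+alpha)/2}.
    K, C1 and the Gaussian innovations are illustrative choices."""
    rng = np.random.default_rng(seed)
    k = np.arange(1, K + 1)
    b = C1 * k ** (-(1.0 + alpha) / 2.0)       # hyperbolically decaying weights
    b /= np.sqrt(np.sum(b ** 2))               # normalize so that Var(eps_t) = 1
    zeta = rng.standard_normal(n + K - 1)      # i.i.d. N(0, 1) innovations
    return np.convolve(zeta, b, mode="valid")  # causal filter: length-n error sequence

# Sanity check of the slow correlation decay (1.2): r(j) is roughly C0 * j^{-alpha}.
eps = long_memory_ma_errors(n=20000, alpha=0.3)
for lag in (1, 10, 100):
    print(lag, round(np.corrcoef(eps[:-lag], eps[lag:])[0, 1], 3))
```

The printed autocorrelations decay slowly in the lag, in contrast with the geometric decay of short-range dependent ARMA errors.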

In this paper, we will consider the nonparametric regression model (1.1) with random

errors {εi} satisfying (1.3) and (1.4). Furthermore, we assume that the random variables

{ζj, j ∈ Z} satisfy the Statulevicius condition (Sγ): There exist constants γ ≥ 0 and ∆ > 0

such that

\[ |\Gamma_m(\zeta_j)| \le \frac{(m!)^{1+\gamma}}{\Delta^{m-2}} \qquad \text{for } m = 3, 4, \dots, \tag{1.5} \]

where Γm(ζj) denotes the cumulant of ζj of order m; see Section 2.2 for its definition and

some basic properties. Amosova (2002) has shown that, when γ = 0, the condition (Sγ) is

equivalent to the celebrated Cramer condition; and when γ > 0, it is equivalent to the Linnik

condition. Hence, the class of random variables satisfying (Sγ) is very large. For proving our

main theorem, we will establish a Bernstein-type exponential inequality for weighted sums

of i.i.d. random variables ζj (see Lemma 4.2 below), which may be of independent interest.

For the nonparametric model (1.1), Csorgo and Mielniczuk (1995) and Robinson (1997)

have proposed kernel estimators of mean regression functions and provided central limit

theorems when the errors are long range dependent Gaussian sequences and stationary mar-

tingale difference sequences, respectively. They all assume that the mean regression function

g is a fixed continuously differentiable function.

Our objective of the present paper is to study the wavelet-based estimator of the re-

gression function g, where g belongs to a large function class which may have many jump

discontinuities. We investigate the asymptotic convergence rates of the estimators and show

that discontinuities of the unknown curve have a negligible effect on the performance of

nonlinear wavelet curve estimators.

The wavelet method in nonparametric curve estimation has become a well-known technique.

For a systematic discussion of wavelets and their applications in statistics, see the mono-

graph by Hardle, et al. (1998). The major advantage of the wavelet method is its adaptability

to the varying degrees of smoothness of the underlying unknown curves. These wavelet es-

timators typically achieve the optimal convergence rates over exceptionally large function

spaces. For references, see Donoho, et al. (1995), Donoho and Johnstone (1995, 1998), and

Hall, et al. (1998, 1999). All of the above works are under the assumption that the errors

are independent normal variables. For correlated noise, Wang (1996) and Johnstone and

Silverman (1997) examine the asymptotic properties of wavelet-based estimators of mean

regression function with long memory Gaussian noise. Kovac and Silverman (2000) and

von Sachs and Macgibbon (2000) consider a correlated heteroscedastic and/or nonstationary

noise sequence. They show that these estimators achieve minimax rates over wide range of

function spaces. All of the above works assume that the underlying function belongs to a


large smooth function space. Li and Xiao (2006) consider block threshold wavelet estimation

of mean regression function when the errors are long memory Gaussian processes. In this

paper we consider that the mean regression function belongs to a large class of functions with

discontinuities and the observational errors follow long memory moving average processes.

We show that the wavelet-based estimators, based on simple thresholding of the empirical

wavelet coefficients, attain nearly optimal convergence rates over a large space of non-smooth

functions.

The rest of this paper is organized as follows. In the next section, we recall some

elements of wavelet transforms, provide nonlinear wavelet-based mean regression function

estimators and some large deviation estimates for weighted partial sums of the random variables

{εi, i ≥ 1} under the Statulevicius condition (Sγ). The main results are described in Section

3, while their proofs appear in Section 4.

Throughout this paper, we use C to denote positive and finite constants whose value

may change from line to line. Specific constants are denoted by C0, C1, C2, A, B, M and so

on.

2 Preliminaries

This section contains some facts about wavelets and large deviation estimates that will be

used in the sequel.

2.1 Wavelet estimators

Let φ(x) and ψ(x) be father and mother wavelets, having the following properties: φ and ψ

are bounded, compactly supported, and $\int \phi = 1$. We call a wavelet $\psi$ $r$-regular if $\psi$ has $r$ vanishing moments and $r$ continuous derivatives. Let
\[ \phi_{j_0 k}(x) = 2^{j_0/2}\phi(2^{j_0}x - k), \qquad \psi_{jk}(x) = 2^{j/2}\psi(2^j x - k), \qquad x \in \mathbb{R},\ j_0, j \in \mathbb{Z}; \]
then the collection $\{\phi_{j_0 k}, \psi_{jk},\ j \ge j_0,\ k \in \mathbb{Z}\}$ forms an orthonormal basis (ONB) of $L^2(\mathbb{R})$. Furthermore, let $V_{j_0}$ and $W_j$ be the linear subspaces of $L^2(\mathbb{R})$ with the ONBs $\{\phi_{j_0 k}, k \in \mathbb{Z}\}$ and $\{\psi_{jk}, k \in \mathbb{Z}\}$, respectively. Then we have the following decomposition:
\[ L^2(\mathbb{R}) = V_{j_0} \oplus W_{j_0} \oplus W_{j_0+1} \oplus W_{j_0+2} \oplus \cdots. \]

Therefore, for all f ∈ L2(R),

\[ f(x) = \sum_{k \in \mathbb{Z}} \alpha_{j_0 k}\,\phi_{j_0 k}(x) + \sum_{j \ge j_0}\sum_{k \in \mathbb{Z}} \beta_{jk}\,\psi_{jk}(x), \tag{2.1} \]


where the coefficients are given by

\[ \alpha_{j_0 k} = \int f(x)\,\phi_{j_0 k}(x)\, dx, \qquad \beta_{jk} = \int f(x)\,\psi_{jk}(x)\, dx, \]

and the series in (2.1) converges in L2(R).

The orthogonality properties of φ and ψ imply:

\[ \int \phi_{j_0 k_1}\phi_{j_0 k_2} = \delta_{k_1 k_2}, \qquad \int \psi_{j_1 k_1}\psi_{j_2 k_2} = \delta_{j_1 j_2}\delta_{k_1 k_2}, \qquad \int \phi_{j_0 k_1}\psi_{j k_2} = 0, \quad \forall\, j_0 \le j, \tag{2.2} \]

where δjk denotes the Kronecker delta, i.e., δjk = 1, if j = k; and δjk = 0, otherwise. For

more information on wavelets see Daubechies (1992).
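As a small numerical illustration of the orthogonality relations (2.2) (not from the paper), the sketch below checks them for the Haar pair, which is bounded and compactly supported but not $r$-regular for $r > 1$; the Riemann-sum grid is an arbitrary choice.

```python
import numpy as np

def haar_phi(x):   # Haar father wavelet on [0, 1)
    return ((0 <= x) & (x < 1)).astype(float)

def haar_psi(x):   # Haar mother wavelet on [0, 1)
    return np.where((0 <= x) & (x < 0.5), 1.0, 0.0) - np.where((0.5 <= x) & (x < 1), 1.0, 0.0)

def dilate(f, j, k):
    return lambda x: 2 ** (j / 2) * f(2 ** j * x - k)

x = np.linspace(0, 1, 200000, endpoint=False)
dx = x[1] - x[0]
ip = lambda f, g: np.sum(f(x) * g(x)) * dx     # Riemann-sum inner product on [0, 1]

print(ip(dilate(haar_psi, 2, 1), dilate(haar_psi, 2, 1)))  # ~ 1  (same j, same k)
print(ip(dilate(haar_psi, 2, 1), dilate(haar_psi, 3, 5)))  # ~ 0  (different j, k)
print(ip(dilate(haar_phi, 0, 0), dilate(haar_psi, 2, 1)))  # ~ 0  (phi against psi, j >= j0)
```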

In our regression model, the mean function g is supported on a fixed unit interval [0, 1],

thus we can select an index set Λ ⊂ Z and modify some of ψij(x), i, j ∈ Z, such that

ψij(x), i, j ∈ Λ forms a complete orthonormal basis for L2[0, 1]. We refer to Cohen, et al.

(1993) for more details on wavelets on the interval. Hence, without loss of generality, we

may and will assume that φ and ψ are compactly supported on [0, 1]. We also assume that

both $\phi$ and $\psi$ satisfy a uniform Hölder condition of exponent $1/2$, i.e.,
\[ |\psi(x) - \psi(y)| \le C|x - y|^{1/2} \qquad \text{for all } x, y \in [0, 1]. \tag{2.3} \]

Daubechies (1992, Chap.6) provides examples of wavelets satisfying these conditions.

As is common in the wavelet literature, we investigate the wavelet-based estimators' asymptotic rates of convergence over a large range of Besov function classes $B^\sigma_{p,q}$, $\sigma > 0$, $1 \le p, q \le \infty$, which form a very rich class of function spaces. They include, in particular, the well-known Sobolev and Hölder spaces of smooth functions $H^m$ and $C^\sigma$ ($B^m_{2,2}$ and $B^\sigma_{\infty,\infty}$, respectively), as well as function classes of significant spatial inhomogeneity such as the Bump Algebra and Bounded Variation classes. For a more detailed study we refer to Triebel (1992).

For a given r-regular mother wavelet ψ with r > σ, the wavelet expansion of g(x) is

\[ g(x) = \sum_{k \in \mathbb{Z}} \alpha_{j_0 k}\,\phi_{j_0 k}(x) + \sum_{j \ge j_0}\sum_{k \in \mathbb{Z}} \beta_{jk}\,\psi_{jk}(x), \qquad x \in [0, 1], \tag{2.4} \]
where
\[ \alpha_{j_0 k} = \int g(x)\,\phi_{j_0 k}(x)\, dx \qquad \text{and} \qquad \beta_{jk} = \int g(x)\,\psi_{jk}(x)\, dx. \]

Let
\[ G^\sigma_{\infty,\infty}(M, A) = \big\{ g : g \in B^\sigma_{\infty,\infty},\ \|g\|_{B^\sigma_{\infty,\infty}} \le M,\ \|g\|_\infty \le A,\ \operatorname{supp} g \subseteq [0, 1] \big\}, \]

and let PdτA be the set of piecewise polynomials of degree d ≤ r− 1, with support contained

in [0, 1], such that the number of discontinuities is less than τ and the supremum norm is


less than A. The spaces of mean regression functions we consider in this paper are defined

by

\[ V_{d\tau A}\{G^\sigma_{\infty,\infty}(M, A)\} = \big\{ g : g = g_1 + g_2;\ g_1 \in G^\sigma_{\infty,\infty}(M, A),\ g_2 \in P_{d\tau A} \big\}, \tag{2.5} \]
i.e., $V_{d\tau A}\{G^\sigma_{\infty,\infty}(M, A)\}$ is a function space in which each element is a mixture of a regular function $g_1$ from the Besov space $B^\sigma_{\infty,\infty}$ and a function $g_2$ that may possess discontinuities.

In the statement below, the notation $2^{j(n)} \simeq h(n)$ means that $j(n)$ is chosen to satisfy the inequalities $2^{j(n)} \le h(n) < 2^{j(n)+1}$.

Our proposed nonlinear wavelet estimator of $g(x)$ is
\[ \hat g(x) = \sum_{k \in \mathbb{Z}} \hat\alpha_{j_0 k}\,\phi_{j_0 k}(x) + \sum_{j=j_0}^{j_1}\sum_{k \in \mathbb{Z}} \hat\beta_{jk}\, I\big(|\hat\beta_{jk}| > \delta_j\big)\,\psi_{jk}(x), \tag{2.6} \]
where
\[ \hat\alpha_{j_0 k} = \frac{1}{n}\sum_{i=1}^{n} Y_i\,\phi_{j_0 k}(x_i), \qquad \hat\beta_{jk} = \frac{1}{n}\sum_{i=1}^{n} Y_i\,\psi_{jk}(x_i), \tag{2.7} \]
and the smoothing parameters $j_0, j_1$ are chosen to satisfy $2^{j_0} \simeq \log^2 n$ and $2^{j_1} \simeq n^{1-\pi}$ for some $\pi > 0$ (we will choose $\pi < 0.75(2r+1)^{-1}$ in our main theorem below; also, for the sake of simplicity, we always omit the dependence on $n$ of $j_0$ and $j_1$). The threshold $\delta_j$ is level dependent and satisfies $\delta_j^2 = 2^{3+\gamma} C_2\, n^{-\alpha} 2^{-j(1-\alpha)} \ln n$, where $\gamma$ is the constant in (1.5), $\alpha$ is the long memory parameter in (1.2), and $C_2 = C_0 \iint |x - y|^{-\alpha}\,\psi(x)\psi(y)\, dx\, dy$.
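To make (2.6) and (2.7) concrete, here is a schematic Python implementation (not from the paper). It uses the Haar father and mother wavelets for brevity, although the theory requires an $r$-regular wavelet with $r > \sigma$ and boundary-adapted wavelets on $[0, 1]$; the values $\pi = 0.2$ and $C_2 = 1$, and treating $\alpha$ and $\gamma$ as known, are illustrative assumptions.

```python
import numpy as np

def phi(x):   # Haar father wavelet
    return ((0 <= x) & (x < 1)).astype(float)

def psi(x):   # Haar mother wavelet
    return np.where((0 <= x) & (x < 0.5), 1.0, 0.0) - np.where((0.5 <= x) & (x < 1), 1.0, 0.0)

def wavelet_estimate(Y, alpha, gamma=0.0, C2=1.0, pi=0.2):
    """Hard-thresholded estimator (2.6) with empirical coefficients (2.7) and
    level-dependent thresholds delta_j^2 = 2^{3+gamma} C2 n^{-alpha} 2^{-j(1-alpha)} ln n."""
    n = len(Y)
    x = np.arange(1, n + 1) / n                         # design points x_i = i/n
    j0 = int(np.ceil(np.log2(np.log(n) ** 2)))          # 2^{j0} ~ log^2 n
    j1 = int(np.floor((1 - pi) * np.log2(n)))           # 2^{j1} ~ n^{1-pi}
    ghat = np.zeros(n)
    for k in range(2 ** j0):                             # coarse-level part of (2.6)
        phi_jk = 2 ** (j0 / 2) * phi(2 ** j0 * x - k)
        ghat += np.mean(Y * phi_jk) * phi_jk             # alpha_hat from (2.7)
    for j in range(j0, j1 + 1):                          # detail levels, hard thresholding
        delta_j = np.sqrt(2 ** (3 + gamma) * C2 * n ** (-alpha)
                          * 2 ** (-j * (1 - alpha)) * np.log(n))
        for k in range(2 ** j):
            psi_jk = 2 ** (j / 2) * psi(2 ** j * x - k)
            b_hat = np.mean(Y * psi_jk)                  # beta_hat from (2.7)
            if abs(b_hat) > delta_j:                     # keep only large coefficients
                ghat += b_hat * psi_jk
    return ghat                                          # fitted values g_hat(x_i)
```

For example, `wavelet_estimate(Y, alpha=0.3)` returns the fitted curve at the design points when the long memory parameter is (assumed) known to be 0.3.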

2.2 Large deviation estimates

Let ξ be a random variable with characteristic function fξ(t) = E exp(itξ) and E|ξ|m < ∞.

The cumulant of ξ of order m, denoted by Γm(ξ), is defined by

\[ \Gamma_m(\xi) = \frac{1}{i^m}\,\frac{d^m}{dt^m}\big(\log f_\xi(t)\big)\Big|_{t=0}, \tag{2.8} \]

where $\log$ denotes the principal value of the logarithm so that $\log f_\xi(0) = 0$. Note that, under the above assumptions, all cumulants of order not exceeding $m$ exist and
\[ \log f_\xi(t) = \sum_{j=1}^{m} \frac{\Gamma_j(\xi)}{j!}\,(it)^j + o(|t|^m) \qquad \text{as } t \to 0. \]

Cumulants are in general more tractable than moments. For example, if ξ1, . . . , ξn are

independent random variables and if Sn = ξ1 + · · ·+ ξn, then (2.8) implies

\[ \Gamma_m(S_n) = \sum_{j=1}^{n} \Gamma_m(\xi_j). \tag{2.9} \]


Moreover, if η = a ξ, where a ∈ R is a constant, then Γm(η) = am Γm(ξ). We refer to Petrov

(1975) and Saulis and Statulevicius (2000) for further information on cumulants and their

applications to limit theory.
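The two properties just stated, homogeneity $\Gamma_m(a\xi) = a^m\Gamma_m(\xi)$ and additivity over independent summands (2.9), are easy to check numerically. The sketch below (not from the paper) uses scipy.stats.kstat, which returns unbiased estimates of the first four cumulants, on Exp(1) samples, whose $m$-th cumulant equals $(m-1)!$.

```python
import math
import numpy as np
from scipy.stats import kstat

rng = np.random.default_rng(1)
xi = rng.exponential(size=(500_000, 2))        # two independent Exp(1) samples
a = 3.0

for m in (2, 3, 4):
    scaled = kstat(a * xi[:, 0], m)            # ~ a^m * (m-1)!   (homogeneity)
    summed = kstat(xi[:, 0] + xi[:, 1], m)     # ~ 2 * (m-1)!     (additivity, (2.9))
    print(m, round(scaled, 2), a ** m * math.factorial(m - 1),
          round(summed, 2), 2 * math.factorial(m - 1))
```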

The large tail probability estimates of ξ can be described using information on the

cumulants Γm(ξ). We will make use of the following result of Bentkus and Rudzkis (1980)

[see also Lemma 1.7 and Corollary 1.1 in Saulis and Statulevicius (2000)].

Lemma 2.1 Let ξ be a random variable with mean 0. If there exist constants γ ≥ 0, H > 0

and ∆ > 0 such that

\[ \big|\Gamma_m(\xi)\big| \le \Big(\frac{m!}{2}\Big)^{1+\gamma}\,\frac{H}{\Delta^{m-2}}, \qquad m = 2, 3, \dots, \tag{2.10} \]
then for all x > 0,
\[ P\big(|\xi| \ge x\big) \le \begin{cases} \exp\Big(-\dfrac{x^2}{4H}\Big), & \text{if } 0 \le x \le (H^{1+\gamma}\Delta)^{1/(1+\gamma)},\\[6pt] \exp\Big(-\dfrac{1}{4}(x\Delta)^{1/(1+\gamma)}\Big), & \text{if } x \ge (H^{1+\gamma}\Delta)^{1/(1+\gamma)}. \end{cases} \tag{2.11} \]
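For reference, the bound (2.11) is easy to evaluate. The Python helper below (a sketch, not from the paper) encodes its two regimes; the example previews how the threshold $\lambda = 2\sqrt{2^{1+\gamma}\ln n}$ used later in the proof of Lemma 4.2 makes the sub-Gaussian branch equal to exactly $n^{-1}$ when $H = 2^{1+\gamma}$ (the value Delta = 50 is an arbitrary placeholder).

```python
import numpy as np

def bentkus_rudzkis_bound(x, H, Delta, gamma=0.0):
    """Right-hand side of the tail bound (2.11) in Lemma 2.1."""
    x = np.asarray(x, dtype=float)
    x_star = (H ** (1 + gamma) * Delta) ** (1.0 / (1 + gamma))        # crossover point
    sub_gaussian = np.exp(-x ** 2 / (4 * H))                          # small-x regime
    heavy_tail = np.exp(-0.25 * (x * Delta) ** (1.0 / (1 + gamma)))   # large-x regime
    return np.where(x <= x_star, sub_gaussian, heavy_tail)

n, gamma = 10_000, 0.0
H = 2 ** (1 + gamma)
lam = 2 * np.sqrt(2 ** (1 + gamma) * np.log(n))
print(bentkus_rudzkis_bound(lam, H=H, Delta=50.0, gamma=gamma), 1 / n)   # both 1e-4
```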

Condition (2.10) can be regarded as a generalized Statulevicius condition. It is more

general than the celebrated Cramer and Linnik conditions. Recall that a random variable ξ

is said to satisfy the Cramer condition if there exists a positive constant a such that

\[ E\exp(a|\xi|) < \infty. \tag{2.12} \]

See Petrov (1975, p. 54) for other equivalent formulations of the Cramer condition and its

various applications.

A random variable ξ is said to satisfy the Linnik condition if there exist positive con-

stants δ and Cν such that

\[ E\exp\big(\delta\,|\xi|^{4\nu/(2\nu+1)}\big) < C_\nu \qquad \text{for all } \nu \in \Big(0, \frac{1}{2}\Big). \tag{2.13} \]

Clearly, the Linnik condition is weaker than the Cramer condition. Amosova (2002) has

proved that (i) If γ = 0, then the Statulevicius condition (Sγ) coincides with the Cramer

condition; (ii) if γ > 0, then (Sγ) coincides with the Linnik condition. See Amosova (2002)

for the precise relations among the constants γ, ∆, δ and ν in these conditions.

It is also worthwhile to mention the following result of Rudzkis, Saulis and Statulevicius

(1978) [see also Lemma 1.8 in Saulis and Statulevicius (2000)]: Let ξ be a random variable

satisfying the following conditions: E(ξ) = 0, E(ξ2) = σ2 and there exist constants γ ≥ 0

and K > 0 such that

\[ |E(\xi^m)| \le (m!)^{1+\gamma}\, K^{m-2}\sigma^2, \qquad m = 3, 4, \dots. \tag{2.14} \]


Then $\xi$ satisfies condition (2.10) with $H = 2^{1+\gamma}\sigma^2$ and $\Delta = [2(K \vee \sigma)]^{-1}$.
Condition (2.14) is a generalization of the classical Bernstein condition: $|E(\xi^m)| \le \frac{1}{2}\,m!\,K^{m-2}\sigma^2$ for all $m = 3, 4, \dots$, which has been used by many authors. For examples, see

Petrov (1975, p.55), Johnstone (1999, p.64), Picard and Tribouley (2000, p.301), Zhang and

Wong (2003, p.164), among others.

3 Main results and discussions

Recall that we consider the nonparametric regression model (1.1) with random errors $\{\varepsilon_i\}$ satisfying (1.3), (1.4) and (1.5). The following theorem shows that the wavelet-based esti-

mators defined as in (2.6), based on simple thresholding of the empirical wavelet coefficients,

attain nearly optimal convergence rates over a large class of functions with discontinuities,

with a number of discontinuities that diverges polynomially fast with sample size. These

results show that the discontinuities of the unknown curve have a negligible effect on the

performance of nonlinear wavelet curve estimators.

Theorem 3.1 Suppose the wavelet $\psi$ is $r$-regular. Our wavelet estimator $\hat g$ is defined as in (2.6) with $\pi < 0.75(2r+1)^{-1}$. Let $\tau_n$ be any sequence of positive numbers such that for all $\theta > 0$, $\tau_n = O(n^{\theta + 0.25\alpha(2r+1)^{-1}})$. Then there exists a constant $C$ such that for all $A, M \in (0, \infty)$ and $1/2 \le \sigma < r$,
\[ \sup_{d < r,\ \tau \le \tau_n}\ \sup_{g \in V_{d\tau A}\{G^\sigma_{\infty,\infty}(M, A)\}} E\int\big(\hat g - g\big)^2 \le C\, n^{-2\sigma\alpha/(2\sigma+\alpha)}\log^2 n. \]

Remark 3.1 The above wavelet estimators defined as in (2.6) do not depend on the unknown parameters $\sigma$ and $d$. However, because of the long-range dependence, our thresholds $\delta_j$ ($= \lambda\sigma_j$) must be level-dependent and our estimators depend on the unknown long memory parameter $\alpha$. Wang (1996, p. 480) and Johnstone and Silverman (1997, p. 340) provide simple methods to estimate the long memory parameter $\alpha$. So, in practice, one needs to estimate the long memory parameter before applying the wavelet method. In this paper, we treat it as known. Our thresholds $\delta_j = \lambda\sigma_j = \sqrt{2^{3+\gamma}\ln n}\,\sigma_j$ (for details, see Lemma 4.2 below) are similar to the standard term-by-term hard threshold $\delta = \sqrt{2\ln n}\,\sigma$ in the Gaussian case. However, because of the long memory non-Gaussian errors here, one needs the larger constant $2^{3+\gamma}$ instead of 2.
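Remark 3.1 treats $\alpha$ as known. As a rough illustration of how it might be estimated from the data (in the spirit of the variance-versus-level ideas in Wang (1996) and Johnstone and Silverman (1997), but not their exact procedures), note that by the calculation in the proof of Lemma 4.2, $\mathrm{Var}(\hat\beta_{jk}) \approx C_2 n^{-\alpha}2^{-j(1-\alpha)}$, so a regression of $\log_2$ of the level-wise coefficient variances on the level $j$ has slope approximately $-(1-\alpha)$. The container name `beta_hat_by_level` below is a hypothetical input holding the empirical coefficients from (2.7).

```python
import numpy as np

def estimate_alpha(beta_hat_by_level):
    """Crude long-memory estimate from empirical detail coefficients.

    beta_hat_by_level : dict {j: array of beta_hat_{jk} over k}
    Uses log2 Var_k(beta_hat_{jk}) ~ const - j * (1 - alpha)."""
    levels = np.array(sorted(beta_hat_by_level))
    logvar = np.array([np.log2(np.var(beta_hat_by_level[j])) for j in levels])
    slope, _ = np.polyfit(levels, logvar, deg=1)   # least-squares slope ~ -(1 - alpha)
    return 1.0 + slope
```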

Remark 3.2 Minimax theory indicates that the best convergence rate over the function space $G^\sigma_{\infty,\infty}(M, A)$ is $n^{-2\sigma\alpha/(2\sigma+\alpha)}$. Since $G^\sigma_{\infty,\infty}(M, A) \subseteq V_{d\tau A}\{G^\sigma_{\infty,\infty}(M, A)\}$, the above estimators achieve the optimal convergence rate up to a logarithmic term, without knowing the smoothness parameter. From Wang (1996, p. 470), traditional linear estimators, which


include kernel estimators, cannot achieve the rates stated in Theorem 3.1. Hence our non-

linear wavelet estimators achieve nearly optimal convergence rates over a large function

space.

Remark 3.3 Wang (1996) and Johnstone and Silverman (1997) consider wavelet estimators

of mean regression function in the wavelet domain or based on the so-called “sequence space

model” with Gaussian error. For details, see Johnstone and Silverman (1997). Based on

the asymptotic equivalence between “sequence space model” and “sampled data model”

(1.1), they derive the minimax optimal convergence rates of wavelet estimators in wavelet

domain. However this implication may not be true, when the underlying mean function

g is not sufficiently smooth. Therefore, for the function space with infinitely many jump

discontinuities, we consider the wavelet estimator in the time domain or directly based

on the “sampled data model” (1.1) as in Hall, et al. (1999). In the latter paper, Hall,

et al. consider block-threshold projection estimators of the mean regression function with

Gaussian error, assuming the function g belongs to a large class of functions that involve

many irregularities of a wide variety of types. Here, since our function space is relatively

simple, we consider a simple standard term-by-term hard thresholded wavelet estimator

and derive nearly optimal convergence rates with long memory infinite moving average non-

Gaussian errors. We conjecture that a block thresholded estimator similar to that in Hall,

et al. (1998, 1999) can be constructed so that it attains exact minimax convergence rates

without the logarithmic penalty. The proof would likely follow the arguments of Hall, et al.

(1998, 1999), but it would be too lengthy to discuss the details here.

4 Proofs

The overall proof of Theorem 3.1 follows along the arguments of Donoho, et al. (1996) and

Hall, et al. (1998, 1999) for the independent data case. But moving from independent data

to long range dependence data, especially non-Gaussian random errors, involves a significant

change in complexity. For nonparametric regression model with Gaussian random errors or

for the density estimation with i.i.d. random variables, one can apply standard Bernstein

inequality to obtain an exponential bound. However, these techniques are not readily ap-

plicable to infinite moving average processes with long memory. The key technical ingredient

in our proof is to establish a Bernstein-type exponential inequality for a sequence of infinite

weighted sums of i.i.d. random variables. This inequality gives us an exponential bound

just like regression model with Gaussian error or density estimation with i.i.d. variables and

may be of independent interest. We believe our assumptions on the errors for deriving such an exponential bound are minimal.


Proof of Theorem 3.1: The proof of Theorem 3.1 can be broken into several parts.

Observe that the orthogonality (2.2) of $\phi$ and $\psi$ implies
\[ E\int\big(\hat g - g\big)^2 =: I_1 + I_2 + I_3 + I_4, \]
where
\[ I_1 = \sum_k E\big(\hat\alpha_{j_0 k} - \alpha_{j_0 k}\big)^2, \qquad I_2 = \sum_{j=j_0}^{j_\sigma}\sum_k E\big(\hat\theta_{jk} - \beta_{jk}\big)^2, \]
\[ I_3 = \sum_{j=j_\sigma+1}^{j_1}\sum_k E\big(\hat\theta_{jk} - \beta_{jk}\big)^2, \qquad I_4 = \sum_{j=j_1+1}^{\infty}\sum_k \beta_{jk}^2. \]
Here $\hat\theta_{jk} = \hat\beta_{jk} I(|\hat\beta_{jk}| > \delta_j)$ and $j_\sigma = j_\sigma(n)$ is such that $2^{j_\sigma} \simeq \big(n^{-1}\log^2 n\big)^{-\alpha/(2\sigma+\alpha)}$. In order to prove Theorem 3.1, it suffices to show that $I_i \le C n^{-2\sigma\alpha/(2\sigma+\alpha)}\log^2 n$, $i = 1, \dots, 4$, for all $d, \tau, \sigma, A, M$; these bounds are established in Lemmas 4.3 to 4.6 below.

For this purpose, we need some preparation. We start by collecting and proving some

lemmas. Denote

\[ a_{j_0 k} := E(\hat\alpha_{j_0 k}) = \frac{1}{n}\sum_{i=1}^{n} g(x_i)\,\phi_{j_0 k}(x_i), \qquad b_{jk} := E(\hat\beta_{jk}) = \frac{1}{n}\sum_{i=1}^{n} g(x_i)\,\psi_{jk}(x_i). \tag{4.1} \]

Since we consider nonparametric regression with discontinuities on the sample data model,

unlike the density estimation problem as in Hall, et al. (1998), one more step of approxi-

mation between empirical wavelet coefficients and true wavelet coefficients is needed. The

following lemma which calculates the discrepancy between them will be used for proving the

other lemmas.

Lemma 4.1 Suppose the mean regression function $g$ is as in (2.5) and the wavelets $\phi$ and $\psi$ satisfy the uniform Hölder condition (2.3). Then, for $\sigma \ge 1/2$ and all $j_0$ and $j$,
\[ \sup_k |a_{j_0 k} - \alpha_{j_0 k}| = O\big(n^{-1/2} + \tau n^{-1}\big), \tag{4.2} \]
\[ \sup_k |b_{jk} - \beta_{jk}| = O\big(n^{-1/2} + \tau n^{-1}\big). \tag{4.3} \]

Proof: We only prove (4.2). The proof of (4.3) is similar and is omitted.

Let $p = 2^{j_0}$; we may write
\[ a_{j_0 k} = \frac{p^{1/2}}{n}\sum_{i=1}^{n} g\Big(\frac{i}{n}\Big)\,\phi\Big(\frac{pi}{n} - k\Big). \tag{4.4} \]


For fixed $n$, $p$ and $k$, we note that
\[ 0 \le \frac{pi}{n} - k \le 1 \qquad \text{if and only if} \qquad \frac{nk}{p} \le i \le \frac{n(k+1)}{p}. \]
Let $m_k = \lceil nk/p\rceil$, where $\lceil x\rceil$ denotes the smallest integer that is at least $x$. Since $\phi$ has its support in $[0, 1]$, the summation in (4.4) runs from $m_k$ to $m_{k+1} - 1$. However, for simplicity of the notation, we will not distinguish between $\lceil x\rceil$ and $x$. Thus

\[ \begin{aligned} a_{j_0 k} &= \frac{p^{1/2}}{n}\sum_{i=m_k}^{m_{k+1}-1} g\Big(\frac{i}{n}\Big)\,\phi\Big(\frac{pi}{n}-k\Big) \qquad (\text{let } i = m_k+\ell) \\ &= \frac{p^{1/2}}{n}\sum_{\ell=0}^{n/p-1} g\Big(\frac{\ell}{n}+\frac{k}{p}\Big)\,\phi\Big(\frac{p\ell}{n}\Big) \qquad \Big(\text{let } t_\ell = \frac{p\ell}{n}\Big) \\ &= \frac{1}{p^{1/2}}\sum_{\ell=0}^{n/p-1} g\Big(\frac{t_\ell+k}{p}\Big)\,\phi(t_\ell)\,\frac{p}{n}. \end{aligned} \tag{4.5} \]
Similarly, by a simple change of variables, we have
\[ \begin{aligned} \alpha_{j_0 k} &= \int_0^1 g(x)\,\phi_{j_0 k}(x)\, dx = p^{1/2}\int_{k/p}^{(k+1)/p} g(x)\,\phi(px-k)\, dx \qquad (\text{let } t = px-k) \\ &= \frac{1}{p^{1/2}}\int_0^1 g\Big(\frac{t+k}{p}\Big)\,\phi(t)\, dt. \end{aligned} \tag{4.6} \]

Combining (4.5) and (4.6), we have
\[ a_{j_0 k} - \alpha_{j_0 k} = \frac{1}{p^{1/2}}\sum_{\ell=0}^{n/p-1}\int_{p\ell/n}^{p(\ell+1)/n} \Big[ g\Big(\frac{t_\ell+k}{p}\Big)\phi(t_\ell) - g\Big(\frac{t+k}{p}\Big)\phi(t) \Big] dt = J_1 + J_2, \tag{4.7} \]
where
\[ J_1 = \frac{1}{p^{1/2}}\sum_{\ell=0}^{n/p-1}\int_{p\ell/n}^{p(\ell+1)/n} \Big[ g\Big(\frac{t_\ell+k}{p}\Big) - g\Big(\frac{t+k}{p}\Big) \Big] \phi(t_\ell)\, dt \]
and
\[ J_2 = \frac{1}{p^{1/2}}\sum_{\ell=0}^{n/p-1}\int_{p\ell/n}^{p(\ell+1)/n} g\Big(\frac{t+k}{p}\Big)\big[\phi(t_\ell) - \phi(t)\big]\, dt. \]

Let us consider the term $J_1$ first. Since $g = g_1 + g_2$ with $g_1 \in G^\sigma_{\infty,\infty}(M, A)$ and $g_2 \in P_{d\tau A}$, we can write $J_1 = J_{1,1} + J_{1,2}$, where
\[ J_{1,j} = \frac{1}{p^{1/2}}\sum_{\ell=0}^{n/p-1}\int_{p\ell/n}^{p(\ell+1)/n} \Big[ g_j\Big(\frac{t_\ell+k}{p}\Big) - g_j\Big(\frac{t+k}{p}\Big) \Big] \phi(t_\ell)\, dt, \qquad j = 1, 2. \]


Since $g_1 \in G^\sigma_{\infty,\infty}(M, A)$, $\sigma \ge 1/2$ and $\phi$ is bounded on $[0, 1]$, we have
\[ \big|J_{1,1}\big| \le \frac{1}{p^{1/2}}\sum_{\ell=0}^{n/p-1}\int_{p\ell/n}^{p(\ell+1)/n} C\Big(\frac{|t - t_\ell|}{p}\Big)^{1/2} dt \le \frac{1}{p^{1/2}}\cdot C\Big(\frac{1}{n}\Big)^{1/2} \le C\,n^{-1/2}. \tag{4.8} \]
Since $g_2 \in P_{d\tau A}$, it is piecewise polynomial and has at most $\tau$ discontinuities. Thus $g_2$ is bounded on $[0, 1]$ and is Lipschitz on every open subinterval of $[0, 1]$ where $g_2$ is continuous. For simplicity, we will assume that each interval $(p\ell/n,\, p(\ell+1)/n)$ contains at most one discontinuity of the function $g_2\big((\cdot+k)/p\big)$. This reduction, which brings some convenience for presenting our proof, is not essential and the same argument remains true if an interval contains more discontinuities.
If $(p\ell/n,\, p(\ell+1)/n)$ contains no discontinuity of $g_2\big((\cdot+k)/p\big)$, then by the Lipschitz condition we have
\[ \int_{p\ell/n}^{p(\ell+1)/n} \Big| g_2\Big(\frac{t_\ell+k}{p}\Big) - g_2\Big(\frac{t+k}{p}\Big) \Big|\,\big|\phi(t_\ell)\big|\, dt \le C\,\frac{p}{n^2}. \tag{4.9} \]

If $(p\ell/n,\, p(\ell+1)/n)$ contains one discontinuity, say $t_0$, of $g_2\big((\cdot+k)/p\big)$, then we will split the integral in (4.9) over $(p\ell/n, t_0)$ and $(t_0, p(\ell+1)/n)$. Since the values of the integrals remain the same if we modify the values of the function $g_2\big((\cdot+k)/p\big)$ at the end-points of the intervals, we may assume that $g_2\big((\cdot+k)/p\big)$ is a polynomial on each of the closed intervals $[p\ell/n, t_0]$ and $[t_0, p(\ell+1)/n]$. Hence

the triangle inequality and the Lipschitz condition imply that the integral in (4.9) is bounded above by a constant multiple of
\[ \begin{aligned} &\int_{p\ell/n}^{t_0} \Big| g_2\Big(\frac{t_\ell+k}{p}\Big) - g_2\Big(\frac{t+k}{p}\Big) \Big|\,dt + \int_{t_0}^{p(\ell+1)/n} \Big| g_2\Big(\frac{t_\ell+k}{p}\Big) - g_2\Big(\frac{t_0+k}{p}\Big) \Big|\,dt \\ &\qquad + \int_{t_0}^{p(\ell+1)/n} \Big| g_2\Big(\frac{t_0+k}{p}\Big) - g_2\Big(\frac{t+k}{p}\Big) \Big|\,dt \;\le\; C\,n^{-1}\Big( \int_{p\ell/n}^{t_0} dt + 2\int_{t_0}^{p(\ell+1)/n} dt \Big). \end{aligned} \tag{4.10} \]
Summing up (4.9) and (4.10) over $\ell = 0, 1, \dots, n/p-1$ and recalling that there are $\tau$ discontinuities, we obtain
\[ \big|J_{1,2}\big| \le \frac{1}{p^{1/2}}\cdot C(1+\tau)\,n^{-1} \le C(1+\tau)\,n^{-1}. \tag{4.11} \]
As to the second term $J_2$, we use the boundedness of $g$ and the uniform $1/2$-Hölder condition (2.3) for $\phi$ to derive
\[ \big|J_2\big| \le \frac{1}{p^{1/2}}\cdot C\Big(\frac{p}{n}\Big)^{1/2} = C\,n^{-1/2}. \tag{4.12} \]
It is clear that (4.2) follows from (4.7), (4.8), (4.11) and (4.12).


Remark 4.1 If we write $\alpha_{jk} = \int g\phi_{jk} = \int g_1\phi_{jk} + \int g_2\phi_{jk} = \alpha_{jk,1} + \alpha_{jk,2}$, and similarly for $a_{jk,1}$ and $a_{jk,2}$, then Lemma 4.1 shows that $\sup_k |a_{jk,1} - \alpha_{jk,1}| = O(n^{-1/2})$ and $\sup_k |a_{jk,2} - \alpha_{jk,2}| = O(n^{-1/2} + \tau n^{-1})$. Furthermore, if the number of jump discontinuities satisfies $\tau \le \tau_n = O(n^{1/2})$, then $\sup_k |a_{jk,1} - \alpha_{jk,1}| = O(n^{-1/2})$ and $\sup_k |a_{jk,2} - \alpha_{jk,2}| = O(n^{-1/2})$. Similar results hold for $\beta_{jk}$ and $b_{jk}$.

Lemma 4.2 Under the assumptions of Theorem 3.1, we have
\[ P\Big(\big|\hat\beta_{jk} - b_{jk}\big| > \delta_j\Big) \le n^{-1}, \qquad \forall\, j \in [j_0, j_1] \text{ and } k = 0, 1, \dots, 2^j - 1. \tag{4.13} \]

Proof: First let’s calculate E(βjk − bjk)2. From (2.7) and (4.1), we have

E(βjk − bjk)2 =

1

n2

n∑i1=1

n∑i2=1

E(εi1εi2

)ψjk(xi1)ψjk(xi2)

=2j

n2

n∑i1=1

n∑i2=1

r(i1 − i2)ψ(2jxi1 − k)ψ(2jxi2 − k).

For each fixed k = 0, 1, · · · , 2j − 1, similar to (4.5), we have

E(βjk − bjk)2 =

2j

n2

n2−j−1∑i1=1

n2−j−1∑i2=1

r(i1 − i2)ψ(i12

j

n

(i22j

n

)

= 2−jC0

(2jn−1

)α[ ∫∫

|x− y|−αψ(x)ψ(y) dxdy + o(1)],

where the last equality follows from (1.2) and a standard limiting argument. Recall that $\delta_j^2 = 2^{3+\gamma} C_2\, n^{-\alpha} 2^{-j(1-\alpha)} \ln n$ in (2.6). Let $\sigma_j^2 = C_2\, n^{-\alpha} 2^{-j(1-\alpha)}$ and $\lambda = 2\sqrt{2^{1+\gamma}\ln n}$; then we have $\delta_j^2 = \lambda^2\sigma_j^2$. From the above calculation, we see that $E(\hat\beta_{jk}-b_{jk})^2 \sim \sigma_j^2$. In view of (2.7), (4.1) and (1.3), we may write $\hat\beta_{jk}-b_{jk}$ as an infinite weighted sum of the independent random variables $\{\zeta_j, j \in \mathbb{Z}\}$:
\[ \hat\beta_{jk} - b_{jk} = n^{-1}\sum_{i=1}^{n} \varepsilon_i\,\psi_{jk}(x_i) =: \sum_{s \in \mathbb{Z}} d_{n,s}\,\zeta_s, \tag{4.14} \]
where
\[ d_{n,s} = \begin{cases} n^{-1}\sum_{i=1}^{n} b_{i-s}\,\psi_{jk}(x_i), & \text{if } s \le 0;\\ n^{-1}\sum_{i=s}^{n} b_{i-s}\,\psi_{jk}(x_i), & \text{if } 0 < s \le n;\\ 0, & \text{otherwise.} \end{cases} \]

Hence, we have $\sum_{s \in \mathbb{Z}} d_{n,s}^2 = E(\hat\beta_{jk}-b_{jk})^2 \sim \sigma_j^2$. Also let
\[ S_n = \sigma_j^{-1}\sum_{s \in \mathbb{Z}} d_{n,s}\,\zeta_s \qquad \text{and} \qquad S_{n,K} = \sigma_j^{-1}\sum_{|s|<K} d_{n,s}\,\zeta_s. \]


Then, as $K \to \infty$, $S_{n,K} \to S_n$ almost surely for all integers $n$. We re-write the partial sum $S_{n,K}$ as
\[ S_{n,K} = \sum_{|s|<K} \sigma_j^{-1} d_{n,s}\,\zeta_s. \]

Then $E(S_{n,K}) = 0$ and, by (2.9) and (1.5), we have that for all integers $m \ge 3$,
\[ \big|\Gamma_m(S_{n,K})\big| = \Big|\sum_{|s|<K} \Big(\frac{d_{n,s}}{\sigma_j}\Big)^m \Gamma_m(\zeta_s)\Big| \le \sum_{|s|<K} \Big|\frac{d_{n,s}}{\sigma_j}\Big|^m \frac{(m!)^{1+\gamma}}{\Delta^{m-2}}. \tag{4.15} \]
By using (1.4), the Cauchy-Schwarz inequality and the fact that $n^{-1}\sum_{i=1}^{n}\psi_{jk}^2(x_i) \to 1$, we have
\[ \sup_{s \in \mathbb{Z}} d_{n,s}^2 \le C\, n^{-1}\sum_{i=1}^{n} i^{-(1+\alpha)} \le C\, n^{-1} \]
for some finite constant $C > 0$. This implies
\[ \sup_{s \in \mathbb{Z}} \frac{d_{n,s}^2}{\sigma_j^2} \le C\,\big(n^{-1}2^j\big)^{1-\alpha}. \tag{4.16} \]

It follows from (4.16) that
\[ \sum_{|s|<K} \Big|\frac{d_{n,s}}{\sigma_j}\Big|^m \le \sup_{|s|<K}\Big(\frac{d_{n,s}^2}{\sigma_j^2}\Big)^{(m-2)/2} \cdot \sum_{|s|<K} d_{n,s}^2\,\sigma_j^{-2} \le \Big(C\big(n^{-1}2^j\big)^{(1-\alpha)/2}\Big)^{m-2}. \tag{4.17} \]
Combining (4.15) and (4.17) yields
\[ \big|\Gamma_m(S_{n,K})\big| \le \Big(\frac{m!}{2}\Big)^{1+\gamma} \frac{2^{1+\gamma}}{\big[C^{-1}\Delta\,(n2^{-j})^{(1-\alpha)/2}\big]^{m-2}}, \qquad \forall\, m = 3, 4, \dots. \tag{4.18} \]
That is, $S_{n,K}$ satisfies condition (2.10) with $H = 2^{1+\gamma}$ and with $\Delta$ replaced by $\widetilde\Delta := C^{-1}\Delta\,(n2^{-j})^{(1-\alpha)/2}$.

Since $2^{j_1} \simeq n^{1-\pi}$, we have $\widetilde\Delta \ge C^{-1}\Delta\, n^{\pi(1-\alpha)/2}$ for all integers $j \in [j_0, j_1]$. Hence $\lambda = 2\sqrt{2^{1+\gamma}\ln n} < (H^{1+\gamma}\widetilde\Delta)^{1/(1+\gamma)}$ for all integers $j \in [j_0, j_1]$, for sufficiently large $n$. It follows from Lemma 2.1 that
\[ P\big(|S_{n,K}| > \lambda\big) \le \exp\Big(-\frac{\lambda^2}{4H}\Big) = n^{-1}. \tag{4.19} \]
Letting $K \to \infty$ and using Fatou's lemma, we have
\[ P\Big(\big|\hat\beta_{jk}-b_{jk}\big| > \delta_j\Big) = P\big(|S_n| > \lambda\big) \le \liminf_{K \to \infty} P\big(|S_{n,K}| > \lambda\big) \le n^{-1}. \]
This finishes the proof of Lemma 4.2.


Remark 4.2 From the proof of Lemma 4.2, we see that by choosing λ appropriately, the

tail probability estimate (4.13) can be significantly improved.

Lemma 4.3 Under the assumptions of Theorem 3.1,
\[ I_1 := \sum_k E\big(\hat\alpha_{j_0 k} - \alpha_{j_0 k}\big)^2 = o\big(n^{-2\sigma\alpha/(2\sigma+\alpha)}\log^2 n\big). \]

Proof: Note that
\[ I_1 \le 2\Big[ \sum_k E(\hat\alpha_{j_0 k} - a_{j_0 k})^2 + \sum_k (a_{j_0 k} - \alpha_{j_0 k})^2 \Big] =: 2(I_{11} + I_{12}). \]

As to the first term, we may apply a calculation similar to that in Lemma 4.2 to derive
\[ \begin{aligned} I_{11} &= \frac{1}{n^2}\sum_k\sum_{i_1=1}^{n}\sum_{i_2=1}^{n} E\big(\varepsilon_{i_1}\varepsilon_{i_2}\big)\,\phi_{j_0 k}(x_{i_1})\,\phi_{j_0 k}(x_{i_2}) \\ &= \sum_k 2^{-j_0} C_0 \big(2^{j_0} n^{-1}\big)^\alpha \Big[ \iint |x-y|^{-\alpha}\phi(x)\phi(y)\, dx\, dy + o(1) \Big] \\ &= \sum_{k=0}^{2^{j_0}-1} 2^{-j_0} C_0 \big(2^{j_0} n^{-1}\big)^\alpha \iint |x-y|^{-\alpha}\phi(x)\phi(y)\, dx\, dy + o\big((2^{j_0} n^{-1})^\alpha\big) \\ &\le C\big(2^{j_0} n^{-1}\big)^\alpha = o\big(n^{-2\sigma\alpha/(2\sigma+\alpha)}\log^2 n\big), \end{aligned} \]
where the last equality follows from our choice of $j_0$ with $2^{j_0} \simeq \log^2 n$.
As to the second term, since $\tau \le \tau_n = O(n^{\theta + 0.25\alpha(2r+1)^{-1}}) = O(n^{1/2})$, from Lemma 4.1 and Remark 4.1, we have
\[ I_{12} = O\big(2^{j_0} n^{-1}\big) = o\big(n^{-2\sigma\alpha/(2\sigma+\alpha)}\log^2 n\big). \]

Together with term I11, this proves Lemma 4.3.

Lemma 4.4 Under the assumptions of Theorem 3.1,
\[ I_2 := \sum_{j=j_0}^{j_\sigma}\sum_k E\big(\hat\theta_{jk}-\beta_{jk}\big)^2 \le C n^{-2\sigma\alpha/(2\sigma+\alpha)}\log^2 n, \]
where $\hat\theta_{jk} = \hat\beta_{jk} I(|\hat\beta_{jk}| > \delta_j)$ and $j_\sigma = j_\sigma(n)$ is such that $2^{j_\sigma} \simeq \big(n^{-1}\log^2 n\big)^{-\alpha/(2\sigma+\alpha)}$.

Proof: Noticing that $\hat\theta_{jk} = \hat\beta_{jk} I(|\hat\beta_{jk}| > \delta_j)$, we have
\[ I_2 \le 2\sum_{j=j_0}^{j_\sigma}\sum_k E\big[\beta_{jk}^2\, I\big(|\hat\beta_{jk}| \le \delta_j\big)\big] + 2\sum_{j=j_0}^{j_\sigma}\sum_k E\big[\big(\hat\beta_{jk}-\beta_{jk}\big)^2 I\big(|\hat\beta_{jk}| > \delta_j\big)\big] =: 2(I_{21}+I_{22}). \tag{4.20} \]


Also,
\[ I_{21} \le \sum_{j=j_0}^{j_\sigma}\sum_k \beta_{jk}^2\, I\big(|\beta_{jk}| \le 2\delta_j\big) + \sum_{j=j_0}^{j_\sigma}\sum_k \beta_{jk}^2\, P\big(|\hat\beta_{jk}-\beta_{jk}| > \delta_j\big) =: I_{211} + I_{212}. \tag{4.21} \]
Since there are at most $2^j$ non-zero coefficients $\beta_{jk}$ and $\delta_j^2 = 2^{3+\gamma} C_2\, n^{-\alpha} 2^{-j(1-\alpha)} \ln n$, we have
\[ I_{211} \le \sum_{j=j_0}^{j_\sigma}\sum_k 4\delta_j^2 \le \sum_{j=j_0}^{j_\sigma}\sum_k C n^{-\alpha} 2^{-j(1-\alpha)} \ln n \le C \log^2 n \cdot n^{-\alpha}\sum_{j=j_0}^{j_\sigma} 2^{j\alpha} \le C n^{-2\sigma\alpha/(2\sigma+\alpha)}\log^2 n. \tag{4.22} \]
As to the term $I_{212}$, from (4.3) in Lemma 4.1 and our choice of $\tau$, it is easy to see that $\sup_k|b_{jk}-\beta_{jk}| < \delta_j$ for all $j \in [j_0, j_\sigma]$. Thus, $I_{212} = O\big(\sum_{j=j_0}^{j_\sigma}\sum_k \beta_{jk}^2\, P(|\hat\beta_{jk}-b_{jk}| > \delta_j)\big)$. Write $\beta_{jk} = \int g\psi_{jk} = \int g_1\psi_{jk} + \int g_2\psi_{jk} =: \beta_{jk,1} + \beta_{jk,2}$ as in Remark 4.1. Since $g_1 \in G^\sigma_{\infty,\infty}$, we have $\beta_{jk,1}^2 = O(2^{-j(1+2\sigma)})$. As to $\beta_{jk,2}$, since $g_2 \in P_{d\tau A}$ and our wavelet $\psi$ has $r$ ($r > d$) vanishing moments, there are at most $\tau$ non-zero $\beta_{jk,2}$ terms, with $\beta_{jk,2}^2 = O(2^{-j})$. Thus, applying Lemma 4.2, we have
\[ I_{212} \le C\sum_{j=j_0}^{j_\sigma} 2^j 2^{-j(1+2\sigma)} n^{-1} + C\sum_{j=j_0}^{j_\sigma} \tau 2^{-j} n^{-1} = o\big(n^{-2\sigma\alpha/(2\sigma+\alpha)}\log^2 n\big). \tag{4.23} \]

Now let’s consider the second term I22. Apply Lemma 4.1 and E(βjk − bjk)2 ∼ σ2

j as

that in Lemma 4.3, we have

I22 ≤ 2[ jσ∑

j=j0

k

E(βjk − bjk

)2+

jσ∑j=j0

k

(βjk − bjk

)2]

≤ C

jσ∑j=j0

k

n−α2−j(1−α) + C

jσ∑j=j0

2j(n−1 + τ 2n−2

)

≤ Cn−α

jσ∑j=j0

2jα + Cn−12jσ + Cτ 22jσn−2

≤ Cn−2σα/(2σ+α) log2n,

(4.24)

where the last inequality follows from our choice τ ≤ τn, σ < r and 1 ≤ r. Combining with

(4.20), (4.21), (4.22) and (4.23), this completes the proof of the lemma.

Lemma 4.5 Under the assumptions of Theorem 3.1,
\[ I_3 := \sum_{j=j_\sigma+1}^{j_1}\sum_k E\big(\hat\theta_{jk}-\beta_{jk}\big)^2 \le C n^{-2\sigma\alpha/(2\sigma+\alpha)}\log^2 n, \]


where $\hat\theta_{jk} = \hat\beta_{jk} I(|\hat\beta_{jk}| > \delta_j)$ and $j_\sigma = j_\sigma(n)$ is such that $2^{j_\sigma} \simeq \big(n^{-1}\log^2 n\big)^{-\alpha/(2\sigma+\alpha)}$.

Proof: As in Lemma 4.4, we have
\[ I_3 \le 2\sum_{j=j_\sigma+1}^{j_1}\sum_k E\big[\beta_{jk}^2\, I\big(|\hat\beta_{jk}| \le \delta_j\big)\big] + 2\sum_{j=j_\sigma+1}^{j_1}\sum_k E\big[\big(\hat\beta_{jk}-\beta_{jk}\big)^2 I\big(|\hat\beta_{jk}| > \delta_j\big)\big] =: 2(I_{31}+I_{32}). \tag{4.25} \]
Also,
\[ I_{31} \le \sum_{j=j_\sigma+1}^{j_1}\sum_k \beta_{jk}^2\, I\big(|\beta_{jk}| \le 2\delta_j\big) + \sum_{j=j_\sigma+1}^{j_1}\sum_k \beta_{jk}^2\, P\big(|\hat\beta_{jk}-\beta_{jk}| > \delta_j\big) =: I_{311} + I_{312}. \tag{4.26} \]

Let’s consider term I311 first. From Remark 4.1, we only need to prove

I311,l =

j1∑j=jσ+1

k

β2jk,l I

(|βjk,l| ≤ 2δj

) ≤ Cn−2σα/(2σ+α) log2n, l = 1, 2. (4.27)

Since β2jk,1 = O(2−j(1+2σ)), we have

I311,1 ≤ C

j1∑j=jσ+1

2j · 2−j(1+2σ) ≤ C2−2σjσ = Cn−2σα/(2σ+α) log2n.

For the second term I311,2, since g2 ∈ PdτA and our wavelet ψ has r vanish moments with

r > d, there are at most τ non-zero coefficients βjk,2. Because |βjk,2| ≤ 2δj for these τ terms,

we have

I311,2 ≤ C

j1∑j=jσ+1

τδ2j ≤ Cτn−α2−(1−α)jσ ≤ Cn−2σα/(2σ+α) log2n,

the last inequality follows from τ ≤ τn = O(nθ+0.25α(2r+1)−1). Thus we prove (4.27).

As to the term $I_{312}$, we have, for any positive numbers $\alpha_1$ and $\alpha_2$ such that $\alpha_1+\alpha_2 = 1$,
\[ I_{312} \le \sum_{j=j_\sigma+1}^{j_1}\sum_k \beta_{jk}^2\, P\big(|\hat\beta_{jk}-b_{jk}| > \alpha_1\delta_j\big) + \sum_{j=j_\sigma+1}^{j_1}\sum_k \beta_{jk}^2\, I\big(|b_{jk}-\beta_{jk}| > \alpha_2\delta_j\big). \]
Since we can choose $\alpha_1$ large enough, close to 1, from Lemma 4.2 the first term in $I_{312}$ is bounded by $C\sum_{j=j_\sigma+1}^{j_1} 2^j 2^{-j} n^{-1} = o\big(n^{-2\sigma\alpha/(2\sigma+\alpha)}\log^2 n\big)$.
As to the second term in $I_{312}$, based on Lemma 4.1 we have $|b_{jk}-\beta_{jk}| < \alpha_2\delta_j$ for all $j \in [j_0, j_1]$ and sufficiently large $n$. Therefore this term is negligible. Together with (4.26) and (4.27), this proves the bound for the term $I_{31}$.


As to the term $I_{32}$, for any $\eta \in (0, 1)$, we have
\[ \begin{aligned} I_{32} &\le \sum_{j=j_\sigma+1}^{j_1}\sum_k E\big[\big(\hat\beta_{jk}-\beta_{jk}\big)^2 I\big(|\beta_{jk}| > \eta\delta_j\big)\big] \\ &\quad + \sum_{j=j_\sigma+1}^{j_1}\sum_k E\big[\big(\hat\beta_{jk}-\beta_{jk}\big)^2 I\big(|\hat\beta_{jk}-\beta_{jk}| > (1-\eta)\delta_j\big)\big] =: I_{321} + I_{322}. \end{aligned} \tag{4.28} \]

Let’s consider I321 first. Applying the same argument as in I22, using Lemma 4.1 and

noticing there are at most τ terms that |βjk| > ηδj, we have

I321 ≤ C

j1∑j=jσ+1

k

n−α2−j(1−α)I(|βjk| > ηδj

)+ C

j1∑j=jσ+1

τ(n−1 + τ 2n−2

)

=: I3211 + I3212.

(4.29)

For the second term I3212, based on the boundness of τ ≤ τn in Theorem 3.1, we have

I3212 ≤ Cj1τn−1 + Cj1τ3n−2 = o(n−2σα/(2σ+α) log2n).

As to the first term $I_{3211}$, we consider $I_{3211,1}$ and $I_{3211,2}$, respectively. For the term $I_{3211,2}$, since there are only $\tau$ terms with $|\beta_{jk}| > \eta\delta_j$, we have $I_{3211,2} \le C\sum_{j=j_\sigma+1}^{j_1} \tau n^{-\alpha}2^{-j(1-\alpha)}$, which is the same as $I_{311,2}$. As to the term $I_{3211,1}$, since $\beta_{jk,1}^2 > \eta^2\delta_j^2$ in $I_{3211,1}$, we have, for any $t > 0$,
\[ \begin{aligned} I_{3211,1} &\le C n^{-\alpha}\sum_{j=j_\sigma+1}^{j_1}\sum_k 2^{-j(1-\alpha)}\big(\beta_{jk,1}^2\,\eta^{-2}\delta_j^{-2}\big)^t = \frac{C n^{\alpha(t-1)}}{(\log^2 n)^t}\sum_{j=j_\sigma+1}^{j_1}\sum_k \beta_{jk,1}^{2t}\, 2^{-j(1-\alpha)(1-t)} \\ &\le \frac{C n^{\alpha(t-1)}}{(\log^2 n)^t}\sum_{j=j_\sigma+1}^{j_1} 2^{-j(1+2\sigma)t}\, 2^{-j(1-\alpha)(1-t)} = o\big(n^{-2\sigma\alpha/(2\sigma+\alpha)}\log^2 n\big). \end{aligned} \]
Together with $I_{3212}$, this proves the bound for $I_{321}$. In order to prove the lemma, in view of (4.28), we need to bound the last term $I_{322}$.

As before, we may write
\[ \begin{aligned} I_{322} &\le 2\sum_{j=j_\sigma+1}^{j_1}\sum_k E\big[\big(\hat\beta_{jk}-b_{jk}\big)^2 I\big(|\hat\beta_{jk}-\beta_{jk}| > (1-\eta)\delta_j\big)\big] \\ &\quad + 2\sum_{j=j_\sigma+1}^{j_1}\sum_k E\big[\big(b_{jk}-\beta_{jk}\big)^2 I\big(|\hat\beta_{jk}-\beta_{jk}| > (1-\eta)\delta_j\big)\big] =: 2(I_{3221}+I_{3222}). \end{aligned} \tag{4.30} \]


For any positive numbers $\alpha_1$ and $\alpha_2$ such that $\alpha_1+\alpha_2 = 1$, we have
\[ \begin{aligned} I_{3221} &\le \sum_{j=j_\sigma+1}^{j_1}\sum_k E\big[\big(\hat\beta_{jk}-b_{jk}\big)^2 I\big(|\hat\beta_{jk}-b_{jk}| > \alpha_1(1-\eta)\delta_j\big)\big] \\ &\quad + \sum_{j=j_\sigma+1}^{j_1}\sum_k E\big[\big(\hat\beta_{jk}-b_{jk}\big)^2 I\big(|b_{jk}-\beta_{jk}| > \alpha_2(1-\eta)\delta_j\big)\big]. \end{aligned} \tag{4.31} \]
As to the first term, by Hölder's inequality, for any positive numbers $a$ and $b$ such that $1/a + 1/b = 1$, it is bounded by
\[ \sum_{j=j_\sigma+1}^{j_1}\sum_k \Big[ E\big(\hat\beta_{jk}-b_{jk}\big)^{2a} \Big]^{1/a} \Big[ P\big(|\hat\beta_{jk}-b_{jk}| > \alpha_1(1-\eta)\delta_j\big) \Big]^{1/b}. \]

Choosing $\alpha_1$ close to 1 and $\eta > 0$ small enough, by Lemma 4.2 we derive $\big[P(|\hat\beta_{jk}-b_{jk}| > \alpha_1(1-\eta)\delta_j)\big]^{1/b} = O(n^{-1/b})$. As to the first factor, from the proof of Lemma 4.2, $E(\hat\beta_{jk}-b_{jk})^{2a} = \sigma_j^{2a} E\big(\sum_{s \in \mathbb{Z}} \sigma_j^{-1} d_{n,s}\zeta_s\big)^{2a}$. Applying Rosenthal's inequality (Hardle, et al., p. 244) and a calculation as in Lemma 4.2 to the above expectation, we can show that it is finite for all $a$. Now, choosing $a$ sufficiently large [so that $b$ is close to 1], we can show that the first factor is bounded by $C n^{-\alpha}2^{-j(1-\alpha)}$. Therefore we obtain that the first term in $I_{3221}$ is bounded by $C\sum_{j=j_\sigma+1}^{j_1} 2^j n^{-\alpha}2^{-j(1-\alpha)} n^{-1} = o\big(n^{-2\sigma\alpha/(2\sigma+\alpha)}\log^2 n\big)$.
As to the second term in $I_{3221}$, we apply Lemma 4.1 to see that, when $n$ is sufficiently large, $|b_{jk}-\beta_{jk}| < \alpha_2(1-\eta)\delta_j$ for all $j$ and $k$. Thus the second term in $I_{3221}$ is negligible. Hence we have derived the desired bound for the term $I_{3221}$.

Similarly to $I_{3221}$, we write
\[ \begin{aligned} I_{3222} &\le \sum_{j=j_\sigma+1}^{j_1}\sum_k \big(b_{jk}-\beta_{jk}\big)^2\, P\big(|\hat\beta_{jk}-b_{jk}| > \alpha_1(1-\eta)\delta_j\big) \\ &\quad + \sum_{j=j_\sigma+1}^{j_1}\sum_k \big(b_{jk}-\beta_{jk}\big)^2\, I\big(|b_{jk}-\beta_{jk}| > \alpha_2(1-\eta)\delta_j\big). \end{aligned} \tag{4.32} \]
The bound for the first term follows from Lemma 4.1 and Lemma 4.2, while the second term is negligible too. Combining (4.32) with (4.31), we obtain the bound for $I_{322}$, which, together with (4.28) and (4.29), proves the lemma.

Lemma 4.6 Under the assumptions of Theorem 3.1,
\[ I_4 := \sum_{j=j_1+1}^{\infty}\sum_k \beta_{jk}^2 = o\big(n^{-2\sigma\alpha/(2\sigma+\alpha)}\log^2 n\big). \]


Proof: From (2.4), we may write the wavelet coefficients $\beta_{jk}$ as $\beta_{jk} = \int g\psi_{jk} = \int g_1\psi_{jk} + \int g_2\psi_{jk} =: \beta_{jk,1} + \beta_{jk,2}$. In order to prove the lemma, it suffices to show that
\[ I_{4,l} := \sum_{j=j_1+1}^{\infty}\sum_k \beta_{jk,l}^2 = o\big(n^{-2\sigma\alpha/(2\sigma+\alpha)}\log^2 n\big), \qquad l = 1, 2. \]
Let us first consider $I_{4,1}$. Because the functions $g$ and $\psi$ have compact support, i.e., $\operatorname{supp} g \subseteq [0, 1]$ and $\operatorname{supp} \psi \subseteq [0, 1]$, for any level $j$ there are at most $2^j$ non-zero coefficients $\beta_{jk,1}$. Since $g_1 \in G^\sigma_{\infty,\infty}$, we have $\beta_{jk,1}^2 = O(2^{-j(1+2\sigma)})$. Thus
\[ I_{4,1} \le C\sum_{j=j_1+1}^{\infty} 2^{-2\sigma j} = C 2^{-2\sigma j_1} = o\big(n^{-2\sigma\alpha/(2\sigma+\alpha)}\log^2 n\big), \tag{4.33} \]
where the last equality follows from our choice of $j_1$ with $2^{j_1} \simeq n^{1-\pi}$ and $\pi < 0.75(2r+1)^{-1}$.
As to the second term $I_{4,2}$, since there are at most $\tau$ discontinuities at any level $j$ and $\beta_{jk,2}^2 = O(2^{-j})$ for those at most $\tau$ coefficients, we have
\[ I_{4,2} \le C\sum_{j=j_1+1}^{\infty} 2^{-2\sigma j} + C\sum_{j=j_1+1}^{\infty} \tau 2^{-j}. \tag{4.34} \]
From the facts that $\tau \le \tau_n = O(n^{\theta+0.25\alpha(2r+1)^{-1}})$ and $2^{j_1} \simeq n^{1-\pi}$ with $\pi < 0.75(2r+1)^{-1}$, one can verify that $\sum_{j=j_1+1}^{\infty}\tau 2^{-j} = \tau 2^{-j_1} = o\big(n^{-2\sigma\alpha/(2\sigma+\alpha)}\log^2 n\big)$. Combining this with (4.33) and (4.34) completes the proof of the lemma.

REFERENCES

Amosova, N. N. (2002). Necessity of the Cramer, Linnik and Statulevicius conditions for

the probabilities of large deviations. J. Math. Sci. (New York) 109, 2031–2036

Baillie, R. T. (1996). Long memory processes and fractional integration in econometrics.

J. Econometrics 73, 5–59.

Bentkus, R. and Rudzkis, R. (1980). Exponential estimates for the distribution of random

variables. (Russian) Litovsk. Mat. Sb. 20, 15–30.

Beran, J. (1994). Statistics for Long Memory Processes. Chapman and Hall, New York.

Cohen, A., Daubechies, I. and Vial, P. (1993). Wavelets on the interval and fast wavelet

transforms. Appl. Comput. Harm. Anal. 1, 54–82.

Csorgo, S. and Mielniczuk, J. (1995). Nonparametric regression under long-range dependent

normal errors. Ann. Statist. 23, 1000–1014.

Daubechies, I. (1992). Ten Lectures on Wavelets. SIAM, Philadelphia.


Donoho, D. L. and Johnstone, I. M. (1995). Adapting to unknown smoothness via wavelet

shrinkage. J. Amer. Statist. Assoc. 90, 1200–1224.

Donoho, D. L. and Johnstone, I. M. (1998). Minimax estimation via wavelet shrinkage.

Ann. Statist. 26, 879–921.

Donoho, D. L., Johnstone, I. M., Kerkyacharian, G. and Picard, D. (1995). Wavelet shrink-

age: asymptopia? (with discussion). J. Roy. Statist. Soc. Ser. B. 57, 301–369.

Giraitis, L., Koul, H. L. and Surgailis, D. (1996). Asymptotic normality of regression

estimators with long memory errors. Statist. Probab. Lett. 29, 317–335.

Giraitis, L. and Surgailis, D. (1999). Central limit theorem for the empirical process of a

linear sequence with long memory. J. Statist. Plann. Inference 80, 81–93.

Hall, P. and Hart, J. D. (1990). Nonparametric regression with long-range dependence.

Stoch. Process. Appl. 36, 339–351.

Hall, P., Kerkyacharian, G. and Picard, D. (1998). Block threshold rules for curve estima-

tion using kernel and wavelet method. Ann. Statist. 26, 922–942.

Hall, P., Kerkyacharian, G. and Picard, D. (1999). On the minimax optimality of block

thresholded wavelet estimators. Statist. Sinica 9, 33–50.

Hardle, W., Kerkyacharian, G., Picard, D. and Tsybakov, A. (1998). Wavelets, Approx-

imation and Statistical Applications. Lecture Notes in Statistics 129, Springer, New

York.

Hart, J. D. (1991). Kernel regression estimation with time series errors. J. Roy. Statist.

Soc. Ser. B. 53, 173–187.

Ho, H. C. and Hsing, T. (1996). On the asymptotic expansion of the empirical process of

long memory moving averages. Ann. Statist. 24, 992–1024.

Ho, H. C. and Hsing, T. (1997). Limit theorems for functionals of moving averages. Ann.

Probab. 25, 1636–1669.

Johnstone, I. M. (1999). Wavelet shrinkage for correlated data and inverse problems: adap-

tivity results. Statist. Sinica 9, 51–83.

Johnstone, I. M. and Silverman, B. W. (1997). Wavelet threshold estimators for data with

correlated noise. J. Roy. Statist. Soc. Ser. B. 59, 319–351.

Koul, H. L. and Surgailis, D. (1997). Asymptotic expansion of M-estimators with long

memory errors. Ann. Statist. 25, 818–850.

Koul, H. L. and Surgailis, D. (2001) Asymptotics of the empirical process of long memory

moving averages with infinite variance. Stoch. Process. Appl. 91, 309–336.

Kovac, A. and Silverman, B. W. (2000). Extending the scope of wavelet regression methods

by coefficient-dependent thresholding. J. Amer. Statist. Assoc. 95, 172–183.


Li, L. and Xiao, Y. (2006). On the minimax optimality of block thresholded wavelet

estimators with long memory data. J. Statist. Plann. Inference (in press).

Petrov, V. V. (1975). Sums of Independent Random Variables. Springer-Verlag, New York.

Picard, D. and Tribouley, K. (2000). Adaptive confidence interval for pointwise curve

estimation. Ann. Statist. 28, 298–335.

Robinson, P. M. (1994). Semiparametric analysis of long-memory time series. Ann. Statist.

22, 515–539.

Robinson, P. M. (1997). Large-sample inference for nonparametric regression with depen-

dent errors. Ann. Statist. 25, 2054–2083.

Rudzkis, R., Saulis, L. and Statulevicius, V. (1978). A general lemma on large deviation

probabilities. Lith. Math. J. 18, 226–238.

Saulis, L. and Statulevicius, V. (2000). Limit theorems on large deviations. In: Limit

Theorems of Probability Theory. (Prokhorov, Yu. V. and Statulevicius, V., editors),

Springer, New York.

Tran, L. T., Roussas, G. G., Yakowitz, S. and Truong Van, B. (1996). Fixed-design regres-

sion for linear time series. Ann. Statist. 24, 975–991.

Truong, Y. K. and Patil, P. N. (2001). Asymptotics for wavelet based estimates of piecewise

smooth regression for stationary time series. Ann. Inst. Statist. Math. 53, 159–178.

Triebel, H. (1992). Theory of Function Spaces II. Birkhauser, Basel.

von Sachs, R. and Macgibbon, B. (2000). Non-parametric curve estimation by wavelet

thresholding with locally stationary errors. Scandinavian J. Statist. 27, 475–499.

Wang, Y. (1996). Function estimation via wavelet shrinkage for long-memory data. Ann.

Statist. 24, 466–484.

Zhang, S. and Wong, M. (2003). Wavelet threshold estimation for additive regression

models. Ann. Statist. 31, 152–173.
