


Asymptotic normality of kernel type regression estimators for random fields

Zsolt Karácsony∗

Department of Applied Mathematics, University of Miskolc, H-3515 Miskolc-Egyetemváros, Hungary

Peter Filzmoser

Department of Statistics and Probability Theory, Vienna University of Technology, Wiedner Hauptstraße 8-10, A-1040 Vienna, Austria

Abstract

The asymptotic normality of the Nadaraya-Watson regression estimator is studied for α-mixing random fields. The infill-increasing setting is considered, that is, when the locations of observations become dense in an increasing sequence of domains. This setting fills the gap between continuous and discrete models. In the infill-increasing case the asymptotic normality of the Nadaraya-Watson estimator holds, but with an unusual asymptotic covariance structure. It turns out that this covariance structure is a combination of the covariance structures that we observe in the discrete and in the continuous case.

Key words: Central limit theorem, kernel, regression estimator, α-mixing, random field, asymptotic normality of estimators, infill asymptotics, increasing domain asymptotics

∗Corresponding author
Email addresses: [email protected] (Zsolt Karácsony), [email protected] (Peter Filzmoser)

Preprint submitted to JSPI September 15, 2009


2000 MSC: 62G08, 60F05

1. Introduction

Kernel type regression estimators have been widely studied in the litera-

ture. The original results by Nadaraya (1964) and Watson (1964) have been

extended in several papers, and they are summarized for example in Bosq

(1998), Devroye and Gyorfi (1985), and Prakasa Rao (1983). One impor-

tant issue for kernel type regression estimators is their asymptotic normality,

which has been studied in several papers, like in Schuster (1972) and Cai

(2001).

In this paper we consider (Xt, Yt), t ∈ T∞, to be a strictly stationary

random field. (Here T∞ is a domain in Rd, Xt and Yt are real-valued.) We

want to estimate the regression function r(x) = E (Φ (Yt) |Xt = x) , where Φ

is a known bounded measurable function. The data set is (Xt, Yt), t ∈ Dn.

We consider the well-known kernel type regression estimator
$$r_n(x)=\frac{\sum_{\mathbf t\in D_n}\Phi(Y_{\mathbf t})\,K\!\left(\frac{x-X_{\mathbf t}}{h}\right)}{\sum_{\mathbf t\in D_n}K\!\left(\frac{x-X_{\mathbf t}}{h}\right)}\,,$$
where K is a kernel function (see Nadaraya, 1964; Watson, 1964). However,

our sampling scheme is unusual. The locations of observations become dense

in an increasing sequence of domains. It is called the infill-increasing setting,

see, for example, Lahiri et al. (1999) and Fazekas (2003). We suppose that

the observed random field is weakly dependent, more precisely, the random

field satisfies a certain α-mixing condition. The main result of this paper

is that rn(x) is asymptotically normal with an unusual covariance structure.


That is, the asymptotic covariance matrix of (rn(x1), . . . , rn(xm)) is the sum

of a diagonal matrix and a matrix containing integrals of the conditional co-

variances, see Theorem 1. Note that in the classical case for independent ob-

servations the joint asymptotic normality of rn(x1), . . . , rn(xm) is well known

(see Schuster, 1972).
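For readers who want to experiment with the estimator numerically, the following short Python sketch (our own illustration, not part of the original paper) evaluates the Nadaraya-Watson ratio above at a few points; the Gaussian kernel, the toy data and all variable names are assumptions made only for this example.

```python
import numpy as np

def nadaraya_watson(x_eval, X, phi_Y, h):
    """r_n(x) = sum_t phi(Y_t) K((x - X_t)/h) / sum_t K((x - X_t)/h)."""
    K = lambda u: np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)  # Gaussian kernel (an assumption)
    W = K((x_eval[:, None] - X[None, :]) / h)   # W[i, t] = K((x_i - X_t)/h)
    return (W * phi_Y[None, :]).sum(axis=1) / W.sum(axis=1)

# toy usage with i.i.d. data (for illustration only)
rng = np.random.default_rng(0)
X = rng.standard_normal(2400)
phi_Y = 10.0 * np.sin(X) + 100.0 + rng.standard_normal(2400)
print(nadaraya_watson(np.array([-0.5, 0.0, 0.5]), X, phi_Y, h=0.025))
```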

The infill-increasing setting can be considered as a compromise of the

continuous and the discrete case. However, an essential question is whether

the limiting behavior of the continuous model is the same as that of its dis-

crete counterpart. Estimators in the continuous case are mainly defined by

integrals, in the discrete case they are defined by sums. However, when in-

tegrals are calculated numerically, approximating sums have to be applied.

Therefore, considering an asymptotic result, not only the domain of the in-

tegration is increasing, but also the subdivision of the domain is more and

more dense. In this situation one should check if the limiting behavior is the

same as in the increasing domain setting. In this paper we show that these

are the same only in special situations, but otherwise they can be different.

Note that the infill-increasing approach is substantially different from the

pure infill setting. The infill setting means that the locations of the obser-

vations become more and more dense in a fixed domain (Cressie, 1991). In

the infill case several well-known estimators are not consistent (Lahiri, 1996).

Moreover, in the infill setup one cannot expect asymptotic normality of the

estimators (because of the lack of the appropriate central limit theorems).

The infill-increasing approach can be useful in geosciences, meteorology,

environmental studies, image processing, etc. In these sciences several pro-

cesses varying continuously in time or in space are studied. However, in


practice, we cannot observe the processes continuously in time or space. So

we have to use finite data sets and discrete approximations. Moreover, the

theoretical analysis of statistical models often requires simulation studies. In

computer simulations discrete approximations are always applied.

Concerning the motivation of our studies we have to refer to the sampling

schemes. Continuous time processes can be observed at deterministic or ran-

dom time. Most of the existing results concern the non infill case (see Masry,

1983; Bosq and Cheze, 1993). In Bosq (1998) the importance of the sampling

schemes is expressed, however no explicit result is mentioned for regression.

In Bosq (1998, p. 140) only the following hint is given: ”regression and den-

sity estimators behave alike when sampled data are available”. Actually, for

kernel type density estimators there are several results for infill-increasing

type sampling schemes. We refer to Bosq (1998, pp. 118-127), Blanke-Pumo

(2003), and Biau (2004).

This paper is organized as follows. In Section 2 the notation and the

main result are presented. Theorem 1 states the asymptotic normality of

the regression estimator rn(x). It is analogous to Theorem 1 of Fazekas and

Chuprunov (2006) who proved the asymptotic normality of the kernel type

density estimator in the same situation as in our paper. We quote it in

Theorem A. In Remark 2 we compare our result with the existing ones. We

show that the covariance structure given in Theorem 1 is a combination of

the one in the discrete case (see Schuster, 1972) and the one in the continuous

case (see Cheze, 1992).

The proof is given in Section 3. We apply the same method as in the

proof of Theorem 1 of Fazekas and Chuprunov (2006). We use the notion


of direct Riemann integrability presented in Fazekas and Chuprunov (2006).

We apply a central limit theorem for random fields (Theorem 2.1 in Fazekas

and Chuprunov, 2004, quoted in Theorem B). We also need the Rosenthal

inequality for random fields (Theorem in Fazekas et al., 2000, quoted in

Theorem C).

In Section 4 simulation results are presented. The numerical examples

show the above mentioned unusual covariance structure of the limiting dis-

tribution.

2. Notation and the main result

The following notation is used. Z is the set of all integers, Zd is the set

of d-dimensional integer lattice points, where d is a fixed positive integer.

R is the real line, $\mathbb R^d$ is the d-dimensional space with the usual Euclidean norm $\|x\|$. In $\mathbb R^d$ we shall also consider the distance corresponding to the maximum norm,
$$\varrho(\mathbf x,\mathbf y)=\max_{1\le i\le d}|x_i-y_i|,$$
where $\mathbf x=(x_1,\dots,x_d)$ and $\mathbf y=(y_1,\dots,y_d)$. The distance of two sets in $\mathbb R^d$ corresponding to the maximum norm is also denoted by $\varrho$: $\varrho(A,B)=\inf\{\varrho(\mathbf x,\mathbf y):\mathbf x\in A,\ \mathbf y\in B\}$.

For real valued sequences an and bn, an = o(bn) (resp. an = O(bn))

means that the sequence an/bn converges to 0 (resp. is bounded). We will

denote different constants with the same letter c. IA denotes the indicator

function of the set A. |D| denotes the cardinality of the finite set D and at

the same time |T | denotes the volume of the domain T.

We shall suppose the existence of an underlying probability space (Ω,F , P).


The σ-algebra generated by a set of events or by a set of random variables will be denoted by $\sigma\{\cdot\}$. The sign E stands for the expectation. The variance and the covariance are denoted by var(.) and cov(., .), respectively. The $L_p$-norm of a random (vector) variable η is defined as
$$\|\eta\|_p=\bigl\{E\|\eta\|^p\bigr\}^{1/p},\qquad 1\le p<\infty.$$
The sign ⇒ denotes convergence in distribution. $\mathcal N(m,\Sigma)$ stands for the (vector) normal distribution with mean (vector) m and covariance (matrix) Σ.

The scheme of observations is the following. For simplicity we restrict ourselves to rectangles as domains for the observations. Let Λ > 0 be fixed. By $(\mathbb Z/\Lambda)^d$ we denote the Λ-lattice points in $\mathbb R^d$, i.e. lattice points with distance $1/\Lambda$:
$$\left(\frac{\mathbb Z}{\Lambda}\right)^{\!d}=\left\{\left(\frac{k_1}{\Lambda},\dots,\frac{k_d}{\Lambda}\right):(k_1,\dots,k_d)\in\mathbb Z^d\right\}.$$
T will be a bounded, closed rectangle in $\mathbb R^d$ with edges parallel to the axes, and D will denote the Λ-lattice points belonging to T, i.e. $D=T\cap(\mathbb Z/\Lambda)^d$.

For describing the limit distribution we consider a sequence of the previous objects. I.e. let $T_1,T_2,\dots$ be bounded, closed rectangles in $\mathbb R^d$. Suppose that $T_1\subset T_2\subset T_3\subset\dots$, $\bigcup_{i=1}^{\infty}T_i=T_\infty$. We assume that the length of each edge of $T_n$ is an integer and converges to ∞, as $n\to\infty$ (e.g. $T_\infty=\mathbb R^d$ or $T_\infty=[0,\infty)^d$). Let $\Lambda_n$ be an increasing sequence of positive integers (the non-integer case is essentially the same) and let $D_n$ be the $\Lambda_n$-lattice points belonging to $T_n$.
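To make the sampling scheme concrete, here is a small Python sketch (ours, not from the paper) that lists the $\Lambda_n$-lattice points $D_n$ of a rectangle $T_n=[0,t]^d$; the endpoint convention and the parameter values are illustrative assumptions.

```python
import itertools
import numpy as np

def lattice_points(t, lam, d):
    """Lambda-lattice points (k_1/lam, ..., k_d/lam) lying in T = [0, t]^d."""
    axis = np.arange(0, t * lam + 1) / lam      # k/lam for k = 0, ..., t*lam (one convention)
    return np.array(list(itertools.product(axis, repeat=d)))

# infill-increasing: the rectangle grows and the grid gets denser, so |D_n| ~ lam^d * |T_n|
for t, lam in [(10, 2), (20, 4), (40, 8)]:
    print(t, lam, len(lattice_points(t, lam, d=1)))
```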

Let ξt = (Xt, Yt), t ∈ T∞ be a strictly stationary two-dimensional

random field. The n-th set of observations involves the values of the random


field (Xt, Yt) taken at each point k ∈ Dn. We shall construct the estimator

from the data (Xk, Yk),k ∈ Dn. Actually, each k = k(n) depends on n, but to

avoid complicated notation we omit the superscript (n). By our assumptions,

limn→∞ |Dn| = ∞.

As the locations of the observations become more and more dense in

an increasing sequence of domains, we call our setup infill-increasing (see

Cressie, 1991; Lahiri, 1996, for infill asymptotics).

We need the notion of α-mixing (see, e.g., Doukhan, 1994; Guyon, 1995). Let $\mathcal A$ and $\mathcal B$ be two σ-algebras in $\mathcal F$. The α-mixing coefficient of $\mathcal A$ and $\mathcal B$ is defined as
$$\alpha(\mathcal A,\mathcal B)=\sup\{|P(A)P(B)-P(AB)|:A\in\mathcal A,\ B\in\mathcal B\}.$$
The α-mixing coefficient of $\{\xi_{\mathbf t}:\mathbf t\in T_\infty\}$ is
$$\alpha(r)=\sup\{\alpha(\mathcal F_{I_1},\mathcal F_{I_2}):\varrho(I_1,I_2)\ge r\},$$
where $I_1$ and $I_2$ are finite subsets in $T_\infty$, $\mathcal F_{I_i}=\sigma\{\xi_{\mathbf t}:\mathbf t\in I_i\}$, $i=1,2$.

We list the conditions that will be used in our theorems.

Let
$$\int_0^\infty s^{2d-1}\,\alpha^{\frac{a-1}{a}}(s)\,ds<\infty,\quad\text{for some } 1<a<\infty.\tag{1}$$

A function $K:\mathbb R\to[0,\infty)$ will be called a kernel if K is a bounded, continuous, symmetric density function (with respect to the Lebesgue measure),
$$\lim_{|u|\to\infty}|u|K(u)=0,\qquad\int_{-\infty}^{\infty}u^2K(u)\,du<\infty.\tag{2}$$

Let g be the (unknown) marginal density function of Xt. For the sake of

simplicity we assume that g(x) is positive. Let K be a kernel and let hn > 0,


then the kernel-type (or Parzen-Rosenblatt-type) estimator of g is
$$g_n(x)=\frac{1}{|D_n|}\,\frac{1}{h_n}\sum_{\mathbf i\in D_n}K\!\left(\frac{x-X_{\mathbf i}}{h_n}\right),\qquad x\in\mathbb R.$$

Our aim is to estimate the regression function
$$r(x)=E\left(\Phi(Y_{\mathbf t})\,|\,X_{\mathbf t}=x\right),$$
where Φ is a bounded measurable function.

The usual kernel type estimator of r(x) is
$$r_n(x)=\frac{\frac{1}{|D_n|}\sum_{\mathbf t\in D_n}\Phi(Y_{\mathbf t})\,\frac{1}{h}K\!\left(\frac{x-X_{\mathbf t}}{h}\right)}{\frac{1}{|D_n|}\sum_{\mathbf t\in D_n}\frac{1}{h}K\!\left(\frac{x-X_{\mathbf t}}{h}\right)}\,,$$
where $h=h_n>0$.

Let
$$a(x)=E\left(\Phi^2(Y_{\mathbf t})\,|\,X_{\mathbf t}=x\right).$$
Denote by $\mathbb R^d_0$ the set $\mathbb R^d\setminus\{\mathbf 0\}$. Let $g_{\mathbf u}(x,y)$ be the joint density function of $X_{\mathbf 0}$ and $X_{\mathbf u}$, if $\mathbf u\in\mathbb R^d_0$ and $x,y\in\mathbb R$. Let
$$a_{\mathbf u}(x,y)=E\left\{[\Phi(Y_{\mathbf 0})-r(X_{\mathbf 0})]\,[\Phi(Y_{\mathbf u})-r(X_{\mathbf u})]\;\big|\;X_{\mathbf 0}=x,\ X_{\mathbf u}=y\right\}.$$

We shall assume that for each fixed $\mathbf u$ the functions
$$a_{\mathbf u}(\cdot,\cdot),\ g_{\mathbf u}(\cdot,\cdot),\ a(\cdot),\ r(\cdot),\ g(\cdot),\ r'(\cdot),\ g'(\cdot),\ r''(\cdot),\ g''(\cdot)\ \text{ are bounded and continuous.}\tag{3}$$

Furthermore we shall suppose that
$$\lim_{n\to\infty}\frac{1}{\Lambda^d_n h_n}=L<\infty,\qquad\lim_{n\to\infty}\Lambda_n=\infty\quad\text{and}\quad\lim_{n\to\infty}h_n=0\tag{4}$$
and
$$\lim_{n\to\infty}|T_n|\,h^4_n=0.\tag{5}$$


Throughout the paper we concentrate on the case when ξt and ξs are

dependent if t and s are close to each other.

First recall the asymptotic normality of the density estimator gn.

Let $l_{\mathbf u}(x,y)=g_{\mathbf u}(x,y)-g(x)g(y)$, $\mathbf u\in\mathbb R^d_0$ and $x,y\in\mathbb R$. Let $l_{\mathbf u}$ denote $l_{\mathbf u}(x,y)$ as a function $l:\mathbb R^d_0\to C(\mathbb R^2)$, i.e. a function with values in $C(\mathbb R^2)$, the space of continuous real-valued functions over $\mathbb R^2$. Let
$$\|l_{\mathbf u}\|=\sup_{(x,y)\in\mathbb R^2}|l_{\mathbf u}(x,y)|\tag{6}$$
be the norm of $l_{\mathbf u}$. Let $x_1,\dots,x_m$ be given distinct real numbers. Let $\Sigma_l=\bigl(\int_{\mathbb R^d_0}l_{\mathbf u}(x_i,x_j)\,d\mathbf u\bigr)_{1\le i,j\le m}$ and let $D'$ be a diagonal matrix with diagonal elements $L\,g(x_i)\int_{-\infty}^{\infty}K^2(u)\,du$, $i=1,\dots,m$. Let $\Sigma'=\Sigma_l+D'$.

Theorem A. (Theorem 1 in Fazekas and Chuprunov, 2006)
Assume that $l_{\mathbf u}$ is Riemann integrable (as a function $l:\mathbb R^d_0\to C(\mathbb R^2)$) on each bounded closed d-dimensional rectangle $R\subset\mathbb R^d_0$; moreover, $\|l_{\mathbf u}\|$ is directly Riemann integrable (as a function $\|l\|:\mathbb R^d_0\to\mathbb R$). Let $x_1,\dots,x_m$ be given distinct real numbers and assume that $\Sigma'$ is positive definite. Suppose that there exists $1<a<\infty$ such that (1) is satisfied and
$$(h_n)^{-1}\le c\,|T_n|^{\frac{a^2}{(3a-1)(2a-1)}}\quad\text{for each }n.\tag{7}$$
If (4) and (5) are satisfied then
$$\sqrt{\frac{|D_n|}{\Lambda^d_n}}\,\bigl(g_n(x_i)-g(x_i)\bigr),\quad i=1,\dots,m\ \Rightarrow\ \mathcal N(0,\Sigma')\quad\text{as }n\to\infty.\tag{8}$$

Note that in the recent paper Park, Kim, Park and Hwang (2008) a

similar phenomenon was described for a simpler dependence structure (m-

dependence) but for a more general sampling scheme.


The notion of direct Riemann integrability can be found in Fazekas and Chuprunov (2006). Let $l:\mathbb R^d_0\to[0,\infty)$ be given. For a δ > 0 consider a partition of $\mathbb R^d$ into (right closed and left open) d-dimensional cubes $\Delta_i$ with edge lengths δ such that the center of $\Delta_0$ is the origin $\mathbf 0\in\mathbb R^d$. The family $\{\Delta_i\}$ is called the subdivision corresponding to δ. If $i\ne 0$, for $x\in\Delta_i$ let $\bar l_\delta(x)=\sup\{l(y):y\in\Delta_i\}$, $\underline l_\delta(x)=\inf\{l(y):y\in\Delta_i\}$, while $\bar l_\delta(x)=\underline l_\delta(x)=0$ if $x\in\Delta_0$. If
$$\lim_{\delta\to 0}\int_{\mathbb R^d}\bar l_\delta(x)\,dx=\lim_{\delta\to 0}\int_{\mathbb R^d}\underline l_\delta(x)\,dx=I$$
and this common value is finite, then l is called directly Riemann integrable (d.R.i.) and I is its direct Riemann integral.

If l is d.R.i., then l is bounded outside each neighborhood of the origin. Moreover, l is continuous almost everywhere (with respect to the Lebesgue measure). Therefore, l is Riemann integrable on each bounded closed d-dimensional rectangle not containing the origin. Call a zone a set $M=R_1\setminus R_2$, where $R_1$ is a closed d-dimensional rectangle while $R_2$ ($\emptyset\ne R_2\subset R_1$) is an open d-dimensional rectangle, both rectangles having their centers at the origin. Then one obtains that l is Riemann integrable on each zone.

If $l\ge 0$ is d.R.i. then the improper integral $\int_{\mathbb R^d_0}l(x)\,dx$ exists and it is equal to the direct Riemann integral of l. The above statement implies: for any ε > 0 there exists a zone M such that $\int_{\mathbb R^d_0\setminus M}l(x)\,dx\le\varepsilon$.

Finally, we have the following. Let $l\ge 0$ be d.R.i. Let $\delta_n$ be positive numbers converging to zero, and let $\{\Delta^{(n)}_i\}$ be the subdivision corresponding to $\delta_n$. Then for any ε > 0 there exists a zone M such that all Riemannian approximating sums (based on the above subdivisions but not containing the term $|\Delta_0|\,l(x_0)$) of the integral $\int_{\mathbb R^d_0\setminus M}l(x)\,dx$ are less than ε.


For the definition of the Riemann integrability of a Banach space valued

function, see Hille and Phillips (1957, p. 62).

Now we can state our main result. Let $v(x)=a(x)-r^2(x)$. For a fixed positive integer m and fixed distinct real numbers $x_1,x_2,\dots,x_m$ we introduce the notation
$$\sigma(x_t,x_s)=\int_{\mathbb R^d_0}a_{\mathbf u}(x_t,x_s)\,g_{\mathbf u}(x_t,x_s)\,d\mathbf u,\qquad t,s=1,\dots,m,\tag{9}$$
$$\Sigma^{(m)}=\left(\frac{\sigma(x_t,x_s)}{g(x_t)\,g(x_s)}\right)_{1\le t,s\le m}.\tag{10}$$
We assume that
$$\lim_{|z|\to\infty}z^3|K(z)|=0.\tag{11}$$

Theorem 1. Let $\{(X_{\mathbf t},Y_{\mathbf t}),\ \mathbf t\in T_\infty\}$ be a strictly stationary two-dimensional random field and let $r(x)=E(\Phi(Y_{\mathbf t})\,|\,X_{\mathbf t}=x)$ be the regression function, where Φ is a bounded measurable function. Let K be a kernel. Assume that the conditions of Theorem A on the function $l_{\mathbf u}$ are satisfied, and that $\Sigma'$ is positive definite. Furthermore, assume that the marginal density function of $X_{\mathbf t}$ is positive, and that $a_{\mathbf u}g_{\mathbf u}$ is Riemann integrable (as a function $a\cdot g:\mathbb R^d_0\to C(\mathbb R^2)$) on each bounded closed d-dimensional rectangle $R\subset\mathbb R^d_0$. Moreover, $\|a_{\mathbf u}g_{\mathbf u}\|$ (where the norm is similar to the one in (6)) is directly Riemann integrable (as a function $\|a\cdot g\|:\mathbb R^d_0\to\mathbb R$). Suppose there exists $1<a<\infty$ such that (1) and (7) are satisfied. Assume that the matrix $\Sigma^{(m)}+D$ is positive definite, where D is a diagonal matrix with diagonal elements $L\,v(x_i)\int_{-\infty}^{\infty}K^2(t)\,dt\,/\,g(x_i)$, $i=1,\dots,m$. If the conditions (3), (4), (5) and (11) hold then
$$\sqrt{\frac{|D_n|}{\Lambda^d_n}}\,\bigl(r_n(x_i)-r(x_i)\bigr),\quad i=1,\dots,m\ \Rightarrow\ \mathcal N(0,\Sigma),\quad\text{as }n\to\infty,$$
where
$$\Sigma=\Sigma^{(m)}+D.$$

Remark 2. We show that the asymptotic covariance matrix Σ in Theorem 1

is a combination of the asymptotic covariance matrices in the discrete and

the continuous cases. In Schuster (1972) it is shown that (for independent

identically distributed observations) rn(x1), . . . , rn(xm) is asymptotically nor-

mal with diagonal covariance matrix. In particular, $\sqrt{n h_n}\,(r_n(x_i)-r(x_i))\Rightarrow\mathcal N(0,c_i)$, where $c_i=v(x_i)\int_{-\infty}^{\infty}K^2(t)\,dt\,/\,g(x_i)$. Therefore, in Theorem 1 the

diagonal part D corresponds to the limiting covariance matrix in the discrete

case.

Now calculate the elements of $\sigma(x_t,x_s)$. Denote by $f_{X_0,X_{\mathbf u},Y_0,Y_{\mathbf u}}(x_1,x_2,y_1,y_2)$ the joint density function of $X_{\mathbf 0},X_{\mathbf u},Y_{\mathbf 0},Y_{\mathbf u}$ ($\mathbf u\ne\mathbf 0$). Then we obtain
$$a_{\mathbf u}(x_1,x_2)=\frac{\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}[\Phi(y_1)-r(x_1)]\,[\Phi(y_2)-r(x_2)]\,f_{X_0,X_{\mathbf u},Y_0,Y_{\mathbf u}}(x_1,x_2,y_1,y_2)\,dy_1\,dy_2}{g_{\mathbf u}(x_1,x_2)}=\frac{\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}M_{\mathbf u}(x_1,x_2,y_1,y_2)\,dy_1\,dy_2}{g_{\mathbf u}(x_1,x_2)}.$$
Therefore (considering the case d = 1)
$$\sigma(x_t,x_s)=\int_{\mathbb R_0}\left[\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}M_u(x_t,x_s,y_1,y_2)\,dy_1\,dy_2\right]du.$$

In Cheze (1992) and Bosq (1998), p. 138, the kernel regression estimator was considered if $\{(X_t,Y_t),\ t\in[0,T]\}$ is a continuous time stochastic process (with a certain α-mixing condition). The estimator of the regression function $r(x)=E(\Phi(Y)\,|\,X=x)$ is given by
$$r_T(x)=\frac{\varphi_T(x)}{g_T(x)},\tag{12}$$
where
$$\varphi_T(x)=\frac1T\int_0^T\Phi(Y_t)\,\frac{1}{h_T}\,K\!\left(\frac{x-X_t}{h_T}\right)dt,\qquad g_T(x)=\frac1T\int_0^T\frac{1}{h_T}\,K\!\left(\frac{x-X_t}{h_T}\right)dt.$$

Under some conditions, if $T\to\infty$ and $h_T\to 0$, then $r_T$ is asymptotically normal. More precisely,
$$\frac{r_T(x)-r(x)}{\sqrt{d_T(x)}}\Rightarrow\mathcal N(0,1),$$
where
$$g^2(x)\,d_T(x)=(1,\,-r(x))\,\operatorname{var}\begin{pmatrix}\varphi_T(x)\\ g_T(x)\end{pmatrix}\begin{pmatrix}1\\ -r(x)\end{pmatrix}.$$
Using the above expression (and assuming certain analytical conditions) we can see that the limit of $T\,d_T(x)$ is $\sigma(x,x)/g^2(x)$. Thus the result in Cheze (1992) and Bosq (1998) can be formulated as
$$\sqrt{T}\,(r_T(x)-r(x))\Rightarrow\mathcal N\bigl(0,\ \sigma(x,x)/g^2(x)\bigr).$$

Therefore the diagonal elements of our matrix Σ(m) correspond to the limiting

variances in the continuous case. (In Cheze (1992) and Bosq (1998) joint

asymptotic normality of (rT (x1), . . . , rT (xm)) is not studied.)

Remark 3. If condition (5), i.e. $\lim_{n\to\infty}|T_n|h^4_n=0$, is not satisfied, we can prove that
$$\sqrt{\frac{|D_n|}{\Lambda^d_n}}\,\bigl(r_n(x_i)-\bar r_n(x_i)\bigr),\quad i=1,\dots,m\ \Rightarrow\ \mathcal N(0,\Sigma),\quad\text{as }n\to\infty,$$
where $\bar r_n(x)=\frac{1}{|D_n|}\sum_{\mathbf t\in D_n}r(X_{\mathbf t})\,\frac{1}{h}K\!\left(\frac{x-X_{\mathbf t}}{h}\right)\big/\,g_n(x)$, and $g_n(x)=\frac{1}{|D_n|}\sum_{\mathbf t\in D_n}\frac{1}{h}K\!\left(\frac{x-X_{\mathbf t}}{h}\right)\to g(x)$ in probability. It is a consequence of the proof of Theorem 1.


Remark 4. Note that in Bosq (1998), p. 140, and in Bosq (1997) the prob-

lem of sampling is also studied, i.e., the behavior of the approximation of rT

in (12) with the corresponding discrete expression, if the process is observed

at time instants δn, 2δn, . . . , nδn. However, the asymptotic normality is not

considered in the above mentioned papers.

3. Proof of the Main Theorem

To prove the main theorem we need the following central limit theorem

and a version of the Rosenthal inequality for mixing fields.

First, define the discrete parameter (vector valued) random field $Y_n(\mathbf k)$ as follows. For each $n=1,2,\dots$, and for each $\mathbf k=\mathbf k(n)\in D_n$,
$$\text{let } Y_n(\mathbf k) \text{ be a Borel measurable function of } \xi_{\mathbf k(n)},\tag{13}$$
where $\{\xi_{\mathbf t},\ \mathbf t\in T_\infty\}$ is the underlying random field.

Theorem B. (Theorem 2.1 in Fazekas and Chuprunov, 2004)
Let $\xi_{\mathbf t}$ be a random field and let $Y_n(\mathbf k)=(Y^{(1)}_n(\mathbf k),\dots,Y^{(m)}_n(\mathbf k))$ be an m-dimensional random field defined by (13). Let $S_n=\sum_{\mathbf k\in D_n}Y_n(\mathbf k)$, $n=1,2,\dots$. Suppose that for each fixed n the field $Y_n(\mathbf k)$, $\mathbf k\in D_n$, is strictly stationary with $EY_n(\mathbf k)=0$. Assume that
$$\|Y_n(\mathbf k)\|\le M_n,\tag{14}$$
where $M_n$ depends only on n;
$$\sup_{n,\mathbf k,t}E\bigl(Y^{(t)}_n(\mathbf k)\bigr)^2<\infty;\tag{15}$$
for any increasing, unbounded sequence of rectangles $G_n$ with $G_n\subseteq T_n$
$$\lim_{n\to\infty}\frac{1}{\Lambda^d_n|G_n|}\,E\left[\sum_{\mathbf k\in\widetilde G_n}Y^{(t)}_n(\mathbf k)\cdot\sum_{\mathbf l\in\widetilde G_n}Y^{(s)}_n(\mathbf l)\right]=\sigma_{ts},\qquad t,s=1,\dots,m,\tag{16}$$
where $\widetilde G_n=G_n\cap(\mathbb Z/\Lambda_n)^d$; the matrix $\Sigma=(\sigma_{ts})^m_{t,s=1}$ is positive definite; there exists $1<a<\infty$ such that (1) is satisfied; and
$$M_n\le c\,|T_n|^{\frac{a^2}{(3a-1)(2a-1)}}\quad\text{for each }n.\tag{17}$$
Then
$$\frac{1}{\sqrt{\Lambda^d_n|D_n|}}\,S_n\Rightarrow\mathcal N(0,\Sigma),\quad\text{as }n\to\infty.\tag{18}$$

In the proof of the main theorem we also use the following form of the

Rosenthal inequality for mixing fields.

Theorem C. (Theorem in Fazekas et al., 2000)
Let $1<l\le 2$ and $\tau>0$. Let $Y_{\mathbf k}$, $\mathbf k\in\mathbb Z^d$, be centered random variables with $E|Y_{\mathbf k}|^{l+\tau}<\infty$, $\mathbf k\in\mathbb Z^d$. Introduce the notation
$$L(l,\tau,D)=\sum_{\mathbf k\in D}\bigl(E|Y_{\mathbf k}|^{l+\tau}\bigr)^{\frac{l}{l+\tau}},$$
if D is a finite set in $\mathbb Z^d$. Let
$$c^{(\tau)}_{1,1}=1+\sum_{s=1}^{\infty}s^{d-1}\bigl[\alpha_Y(s,1,1)\bigr]^{\frac{\tau}{2+\tau}},$$
where $\alpha_Y(s,1,1)$ is the α-mixing coefficient of the field $\{Y_{\mathbf k}\}$, i.e. $\alpha_Y(s,1,1)=\sup\{\alpha(Y_{\mathbf u},Y_{\mathbf v}):\varrho(\mathbf u,\mathbf v)\ge s\}$. Assume that $c^{(\tau)}_{1,1}<\infty$. Then there is a constant c such that
$$E\Bigl|\sum_{\mathbf k\in D}Y_{\mathbf k}\Bigr|^{l}\le c\cdot c^{(\tau)}_{1,1}\,L(l,\tau,D),\tag{19}$$
for any finite subset D of $\mathbb Z^d$.


Details and the general form of the Rosenthal inequality can be found

e.g. in Fazekas et al. (2000).

In the proof of the main theorem we will use the next theorem several

times. This is a particular case of Theorem 2.1.1 in Prakasa Rao (1983).

Theorem D. (Theorem 2.1.1 in Prakasa Rao, 1983)
Let $K:\mathbb R\to\mathbb R$ be measurable such that
$$|K(z)|\le M,\ z\in\mathbb R,\qquad\int_{-\infty}^{\infty}|K(z)|\,dz<\infty,$$
and
$$|z|\,|K(z)|\to 0\quad\text{as }|z|\to\infty.$$
Furthermore, let $g:\mathbb R\to\mathbb R$ be measurable such that
$$\int_{-\infty}^{\infty}|g(z)|\,dz<\infty.$$
Define
$$g_n(x)=\frac{1}{h_n}\int_{-\infty}^{\infty}K\!\left(\frac{z}{h_n}\right)g(x-z)\,dz,$$
where $0<h_n\to 0$ as $n\to\infty$. Then, if g is continuous,
$$\lim_{n\to\infty}g_n(x)=g(x)\int_{-\infty}^{\infty}K(z)\,dz,\tag{20}$$
and if g is uniformly continuous, then the convergence in (20) is uniform.

Remark 5. We shall often use the next limit relations (see, e.g., Fazekas and Chuprunov, 2006). Assume that the density function g is continuous and K is a kernel; then, as $h_n\to 0$ ($h_n>0$), we have the following.
$$E\left(\frac{1}{h_n}K\!\left(\frac{x-X_{\mathbf t}}{h_n}\right)\right)=\int_{-\infty}^{\infty}\frac{1}{h_n}K\!\left(\frac{x-u}{h_n}\right)g(u)\,du\to g(x),\tag{21}$$
$$E\,\frac{1}{h_n}K^2\!\left(\frac{x-X_{\mathbf t}}{h_n}\right)=\int_{-\infty}^{\infty}\frac{1}{h_n}K^2\!\left(\frac{x-u}{h_n}\right)g(u)\,du\to g(x)\int_{-\infty}^{\infty}K^2(u)\,du,\tag{22}$$
$$E\,\frac{1}{h_n^2}K\!\left(\frac{x_r-X_{\mathbf t}}{h_n}\right)K\!\left(\frac{x_s-X_{\mathbf t}}{h_n}\right)=\int_{-\infty}^{\infty}\frac{1}{h_n^2}K\!\left(\frac{x_r-u}{h_n}\right)K\!\left(\frac{x_s-u}{h_n}\right)g(u)\,du\to 0,\tag{23}$$
if $x_r\ne x_s$.

Proof of Theorem 1. Consider the following decomposition:
$$\sqrt{\frac{|D_n|}{\Lambda^d_n}}\,(r_n(x)-r(x))=\sqrt{\frac{|D_n|}{\Lambda^d_n}}\;\frac{\frac{1}{|D_n|}\sum_{\mathbf t\in D_n}[\Phi(Y_{\mathbf t})-r(x)]\,\frac1h K\!\left(\frac{x-X_{\mathbf t}}{h}\right)}{\frac{1}{|D_n|}\sum_{\mathbf t\in D_n}\frac1h K\!\left(\frac{x-X_{\mathbf t}}{h}\right)}$$
$$=\frac{\frac{1}{\sqrt{|D_n|\Lambda^d_n}}\,\frac1h\left[\sum_{\mathbf t\in D_n}[\Phi(Y_{\mathbf t})-r(X_{\mathbf t})]\,K\!\left(\frac{x-X_{\mathbf t}}{h}\right)+\sum_{\mathbf t\in D_n}[r(X_{\mathbf t})-r(x)]\,K\!\left(\frac{x-X_{\mathbf t}}{h}\right)\right]}{\frac{1}{|D_n|}\sum_{\mathbf t\in D_n}\frac1h K\!\left(\frac{x-X_{\mathbf t}}{h}\right)}=\frac{J_1(x)+J_2(x)}{J_3(x)},$$
where
$$J_1(x)=\frac{1}{\sqrt{|D_n|\Lambda^d_n}}\sum_{\mathbf t\in D_n}\frac1h\,[\Phi(Y_{\mathbf t})-r(X_{\mathbf t})]\,K\!\left(\frac{x-X_{\mathbf t}}{h}\right),$$
$$J_2(x)=\frac{1}{\sqrt{|D_n|\Lambda^d_n}}\sum_{\mathbf t\in D_n}\frac1h\,[r(X_{\mathbf t})-r(x)]\,K\!\left(\frac{x-X_{\mathbf t}}{h}\right),$$
and
$$J_3(x)=\frac{1}{|D_n|}\sum_{\mathbf t\in D_n}\frac1h\,K\!\left(\frac{x-X_{\mathbf t}}{h}\right).$$
First we prove the asymptotic normality of $J_1$. We have to check the conditions of Theorem B.

Let $x_1,x_2,\dots,x_m$ be fixed distinct real numbers. We need to prove the joint asymptotic normality of $J_1=(J_1(x_1),J_1(x_2),\dots,J_1(x_m))^{\top}$. Define the m-dimensional random vector $Z_n(\mathbf i)$ with the following coordinates:
$$Z^{(s)}_n(\mathbf i)=\frac1h\,[\Phi(Y_{\mathbf i})-r(X_{\mathbf i})]\,K\!\left(\frac{x_s-X_{\mathbf i}}{h}\right),$$
for $s=1,\dots,m$ and $\mathbf i\in D_n$.

Divide $T_n$ into d-dimensional unit cubes (having $\Lambda^d_n$ points of $D_n$ in each of them). Denote by $D'_n$ the set of these cubes. Let $V_n(\mathbf k)=(V^{(1)}_n(\mathbf k),\dots,V^{(m)}_n(\mathbf k))$ be the arithmetical mean of the variables $Z_n(\mathbf i)$ having indices $\mathbf i$ in the $\mathbf k$-th unit cube. Then for each fixed n the field $V_n(\mathbf k)$, $\mathbf k\in D'_n$, is strictly stationary. We shall apply Theorem B to $V_n(\mathbf k)$, $\mathbf k\in D'_n$, i.e. we shall use a non-infill form of that theorem. We have
$$J_1(x_s)=\frac{1}{\sqrt{|D_n|\Lambda^d_n}}\,\Lambda^d_n\sum_{\mathbf i\in D'_n}V^{(s)}_n(\mathbf i)=\sqrt{\frac{\Lambda^d_n}{|D_n|}}\sum_{\mathbf i\in D'_n}V^{(s)}_n(\mathbf i).$$

To see that $EV_n(\mathbf k)=0$ consider
$$EZ^{(s)}_n(\mathbf i)=E\left(\frac1h\,[\Phi(Y_{\mathbf i})-r(X_{\mathbf i})]\,K\!\left(\frac{x_s-X_{\mathbf i}}{h}\right)\right)=0,$$
because
$$E\left(\Phi(Y)\,K\!\left(\frac{x-X}{h}\right)\right)=E\Bigl[\underbrace{E\{\Phi(Y)\,|\,X\}}_{r(X)}\,K\!\left(\frac{x-X}{h}\right)\Bigr]=E\left(r(X)\,K\!\left(\frac{x-X}{h}\right)\right).$$

Since Φ, r and K are bounded, equation (7) implies (14) and (17).

To prove (15), we have to consider
$$E\bigl(V^{(s)}_n(\mathbf k)\bigr)^2=E\left(\frac{1}{\Lambda^d_n}\sum_{\mathbf i}\frac1h\,[\Phi(Y_{\mathbf i})-r(X_{\mathbf i})]\,K\!\left(\frac{x_s-X_{\mathbf i}}{h}\right)\right)^{\!2},$$
where $\sum_{\mathbf i}$ means that $\mathbf i$ belongs to the $\mathbf k$-th unit cube. The boundedness of this expression can be checked similarly to the next proof (showing that condition (16) is satisfied).


To calculate the limit in (16), let $G_n$ be an increasing sequence of d-dimensional rectangles, each $G_n$ being a union of d-dimensional unit cubes. Then
$$\frac{1}{|G_n|}\,E\left[\sum_{\mathbf k\in G_n\cap\mathbb Z^d}V^{(t)}_n(\mathbf k)\cdot\sum_{\mathbf l\in G_n\cap\mathbb Z^d}V^{(s)}_n(\mathbf l)\right]=\frac{1}{\Lambda^d_n|\widetilde G_n|}\sum_{\mathbf i\in\widetilde G_n}\sum_{\mathbf j\in\widetilde G_n}\frac{1}{h^2}\,E\left[[\Phi(Y_{\mathbf i})-r(X_{\mathbf i})]\,K\!\left(\frac{x_t-X_{\mathbf i}}{h}\right)[\Phi(Y_{\mathbf j})-r(X_{\mathbf j})]\,K\!\left(\frac{x_s-X_{\mathbf j}}{h}\right)\right]=A+B,$$
where $\widetilde G_n=G_n\cap(\mathbb Z/\Lambda_n)^d$, and A denotes the part of the sum with $\mathbf i=\mathbf j$, while B denotes the part of the sum with $\mathbf i\ne\mathbf j$.

For A we have
$$A=\frac{1}{\Lambda^d_n|\widetilde G_n|}\,\frac1h\sum_{\mathbf i\in\widetilde G_n}E\left[\frac1h\,[\Phi(Y_{\mathbf i})-r(X_{\mathbf i})]^2\,K\!\left(\frac{x_t-X_{\mathbf i}}{h}\right)K\!\left(\frac{x_s-X_{\mathbf i}}{h}\right)\right].$$
If $t=s$ we obtain
$$A=\frac{1}{\Lambda^d_n|\widetilde G_n|}\,\frac1h\sum_{\mathbf i\in\widetilde G_n}E\left[\frac1h\,[\Phi(Y_{\mathbf i})-r(X_{\mathbf i})]^2\,K^2\!\left(\frac{x_s-X_{\mathbf i}}{h}\right)\right].$$
We have
$$E\left[\frac1h\,[\Phi(Y_{\mathbf i})-r(X_{\mathbf i})]^2\,K^2\!\left(\frac{x_s-X_{\mathbf i}}{h}\right)\right]=\underbrace{E\left(\frac1h\,\Phi^2(Y_{\mathbf i})\,K^2\!\left(\frac{x_s-X_{\mathbf i}}{h}\right)\right)}_{*}-\underbrace{E\left(\frac1h\,r^2(X_{\mathbf i})\,K^2\!\left(\frac{x_s-X_{\mathbf i}}{h}\right)\right)}_{**}.$$
We calculate * and ** one after another:
$$*=E\left[E\left(\frac1h\,\Phi^2(Y_{\mathbf i})\,K^2\!\left(\frac{x_s-X_{\mathbf i}}{h}\right)\Big|\,X_{\mathbf i}\right)\right]=E\left(\frac1h\,K^2\!\left(\frac{x_s-X_{\mathbf i}}{h}\right)a(X_{\mathbf i})\right),$$
where $a(x)=E(\Phi^2(Y)\,|\,X=x)$. Therefore, by (22),
$$*=\int_{-\infty}^{\infty}a(u)\,\frac1h\,K^2\!\left(\frac{x_s-u}{h}\right)g(u)\,du=\int_{-\infty}^{\infty}a(x_s-ht)\,g(x_s-ht)\,K^2(t)\,dt\to a(x_s)\,g(x_s)\int_{-\infty}^{\infty}K^2(t)\,dt,\quad\text{if }h\to 0,$$
because a and g are bounded and continuous and $K^2$ is integrable.
Similarly we get
$$**=E\left(\frac1h\,r^2(X_{\mathbf i})\,K^2\!\left(\frac{x_s-X_{\mathbf i}}{h}\right)\right)=\int_{-\infty}^{\infty}\frac1h\,r^2(u)\,K^2\!\left(\frac{x_s-u}{h}\right)g(u)\,du=\int_{-\infty}^{\infty}r^2(x_s-ht)\,K^2(t)\,g(x_s-ht)\,dt\to r^2(x_s)\,g(x_s)\int_{-\infty}^{\infty}K^2(t)\,dt,\quad\text{if }h\to 0,$$
because r and g are bounded and continuous and $K^2$ is integrable.
Applying (4), we have
$$A\simeq\frac{1}{\Lambda^d_n h_n}\,\frac{1}{|\widetilde G_n|}\sum_{\mathbf i\in\widetilde G_n}\bigl[a(x_s)-r^2(x_s)\bigr]g(x_s)\int_{-\infty}^{\infty}K^2(t)\,dt\simeq L\,v(x_s)\,g(x_s)\int_{-\infty}^{\infty}K^2(t)\,dt,$$
where $v(x_s)=a(x_s)-r^2(x_s)$. We remind that $v(x)$ is the conditional variance of $\Phi(Y)$, that is,
$$v(x)=E\bigl(\Phi^2(Y)\,|\,X=x\bigr)-\bigl[E(\Phi(Y)\,|\,X=x)\bigr]^2=E\left\{\bigl[\Phi(Y)-E(\Phi(Y)\,|\,X=x)\bigr]^2\,\Big|\,X=x\right\}.$$

If $t\ne s$ we obtain
$$A=\frac{1}{\Lambda^d_n}\,\frac{1}{h^2_n}\,E\left([\Phi(Y_{\mathbf i})-r(X_{\mathbf i})]^2\,K\!\left(\frac{x_t-X_{\mathbf i}}{h}\right)K\!\left(\frac{x_s-X_{\mathbf i}}{h}\right)\right)$$
$$=\frac{1}{\Lambda^d_n}\left[E\left(\frac{1}{h^2}\,a(X_{\mathbf i})\,K\!\left(\frac{x_t-X_{\mathbf i}}{h}\right)K\!\left(\frac{x_s-X_{\mathbf i}}{h}\right)\right)-E\left(\frac{1}{h^2}\,r^2(X_{\mathbf i})\,K\!\left(\frac{x_t-X_{\mathbf i}}{h}\right)K\!\left(\frac{x_s-X_{\mathbf i}}{h}\right)\right)\right]=\frac{1}{\Lambda^d_n}\,(A_1+A_2).$$
The boundedness of a(x) and r(x) and (23) imply that
$$|A_1|,\ |A_2|\le c\int_{-\infty}^{\infty}\frac{1}{h^2}\,K\!\left(\frac{x_t-u}{h}\right)K\!\left(\frac{x_s-u}{h}\right)g(u)\,du\to 0.$$
Thus, for $t\ne s$ we obtain that $A\to 0$, as $\Lambda^d_n\to\infty$.

Now turn to B.
$$B=\frac{1}{\Lambda^d_n}\,\frac{1}{|\widetilde G_n|}\sum_{\mathbf i\ne\mathbf j}E\left(a(X_{\mathbf i},X_{\mathbf j})\,\frac{1}{h^2}\,K\!\left(\frac{x_t-X_{\mathbf i}}{h}\right)K\!\left(\frac{x_s-X_{\mathbf j}}{h}\right)\right),$$
where
$$a(X_{\mathbf i},X_{\mathbf j})=a_{\mathbf i-\mathbf j}(X_{\mathbf i},X_{\mathbf j})=E\left\{[\Phi(Y_{\mathbf i})-r(X_{\mathbf i})]\,[\Phi(Y_{\mathbf j})-r(X_{\mathbf j})]\;\big|\;X_{\mathbf i},X_{\mathbf j}\right\}.$$

Therefore
$$B=\frac{1}{\Lambda^d_n}\,\frac{1}{|\widetilde G_n|}\sum_{\mathbf i\ne\mathbf j}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}a_{\mathbf i-\mathbf j}(u,v)\,\frac{1}{h^2}\,K\!\left(\frac{x_t-u}{h}\right)K\!\left(\frac{x_s-v}{h}\right)g_{\mathbf i-\mathbf j}(u,v)\,du\,dv,$$
where $g_{\mathbf i-\mathbf j}(u,v)$ is the joint density function of $X_{\mathbf i}$ and $X_{\mathbf j}$.
As the random field is strictly stationary, we can assume that the center of the rectangle $G_n$ is the origin. Then the set of vectors of the form $\mathbf i-\mathbf j$ with $\mathbf i,\mathbf j\in\widetilde G_n$ is $2\widetilde G_n$, where $2\widetilde G_n$ is defined as $(2G_n)\cap(\mathbb Z/\Lambda_n)^d$. If $\mathbf u\in 2\widetilde G_n$ is fixed, then denote by $|\widetilde G_{n,\mathbf u}|$ the number of pairs $(\mathbf i,\mathbf j)\in\widetilde G_n\times\widetilde G_n$ with $\mathbf i-\mathbf j=\mathbf u$.

Then
$$B=\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\frac{1}{h^2}\,K\!\left(\frac{x_t-u}{h}\right)K\!\left(\frac{x_s-v}{h}\right)\left\{\frac{1}{\Lambda^d_n}\sum_{\mathbf u\in 2\widetilde G^0_n}\frac{|\widetilde G_{n,\mathbf u}|}{|\widetilde G_n|}\,a_{\mathbf u}(u,v)\,g_{\mathbf u}(u,v)\right\}du\,dv,\tag{24}$$


where $2\widetilde G^0_n=2\widetilde G_n\setminus\{\mathbf 0\}$. Now fix an ε > 0. As $\|a_{\mathbf u}g_{\mathbf u}\|$ is directly Riemann integrable, one can find a zone $M_\varepsilon\subset\mathbb R^d$ (with center in the origin) such that
$$\int_{\mathbb R^d_0\setminus M_\varepsilon}\|a_{\mathbf u}g_{\mathbf u}\|\,d\mathbf u\le\varepsilon\tag{25}$$
and at the same time the Riemannian approximating sums of this integral do not exceed ε if the diagonals of the subdivision are small enough (see Fazekas and Chuprunov, 2006). Therefore, as $|\widetilde G_{n,\mathbf u}|/|\widetilde G_n|\le 1$,
$$\frac{1}{\Lambda^d_n}\sum_{\mathbf u\in 2\widetilde G^0_n\setminus M_\varepsilon}\frac{|\widetilde G_{n,\mathbf u}|}{|\widetilde G_n|}\,\|a_{\mathbf u}g_{\mathbf u}\|\le\varepsilon\tag{26}$$
if $1/\Lambda^d_n$ is small enough, i.e. when n is large enough: $n\ge n_\varepsilon$. Fix ε, $M_\varepsilon$ and assume that $n\ge n_\varepsilon$. Because $a_{\mathbf u}g_{\mathbf u}$ is Riemann integrable as a function $a\cdot g:\mathbb R^d_0\to C(\mathbb R^2)$ on R for each bounded closed d-dimensional rectangle R in $\mathbb R^d_0$, we have
$$\left\|\frac{1}{\Lambda^d_n}\sum_{\mathbf u\in 2\widetilde G^0_n\cap M_\varepsilon}a_{\mathbf u}g_{\mathbf u}-\int_{M_\varepsilon}a_{\mathbf u}g_{\mathbf u}\,d\mathbf u\right\|\le\varepsilon\tag{27}$$

in the space $C(\mathbb R^2)$, if n is large enough. This relation and (25) imply that $\int_{\mathbb R^d_0}a_{\mathbf u}(x,y)\,g_{\mathbf u}(x,y)\,d\mathbf u$ exists and is continuous in (x, y). As each edge of $G_n$ converges to ∞, $|\widetilde G_{n,\mathbf u}|/|\widetilde G_n|\to 1$ uniformly in $\mathbf u\in M_\varepsilon$. Therefore, using that $\|a_{\mathbf u}g_{\mathbf u}\|$ is directly Riemann integrable, we obtain that
$$\left\|\frac{1}{\Lambda^d_n}\sum_{\mathbf u\in 2\widetilde G^0_n\cap M_\varepsilon}\frac{|\widetilde G_{n,\mathbf u}|}{|\widetilde G_n|}\,a_{\mathbf u}g_{\mathbf u}-\frac{1}{\Lambda^d_n}\sum_{\mathbf u\in 2\widetilde G^0_n\cap M_\varepsilon}a_{\mathbf u}g_{\mathbf u}\right\|\le\varepsilon\tag{28}$$
if n is large enough.

Relations (25)-(28) imply that
$$\left\|\frac{1}{\Lambda^d_n}\sum_{\mathbf u\in 2\widetilde G^0_n}\frac{|\widetilde G_{n,\mathbf u}|}{|\widetilde G_n|}\,a_{\mathbf u}g_{\mathbf u}-\int_{\mathbb R^d_0}a_{\mathbf u}g_{\mathbf u}\,d\mathbf u\right\|\le 4\varepsilon\tag{29}$$
if n is large enough.

Therefore, using that $\frac1h K\!\left(\frac{x_t-u}{h}\right)$ is a density function, we have
$$\left|B-\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\frac{1}{h^2}\,K\!\left(\frac{x_t-u}{h}\right)K\!\left(\frac{x_s-v}{h}\right)\left\{\int_{\mathbb R^d_0}a_{\mathbf u}(u,v)\,g_{\mathbf u}(u,v)\,d\mathbf u\right\}du\,dv\right|\le 4\varepsilon\tag{30}$$
if n is large enough. As $\int_{\mathbb R^d_0}a_{\mathbf u}(u,v)\,g_{\mathbf u}(u,v)\,d\mathbf u$ is continuous in (u, v), the limit of the double integral in expression (30) is $\int_{\mathbb R^d_0}a_{\mathbf u}(x_t,x_s)\,g_{\mathbf u}(x_t,x_s)\,d\mathbf u=\sigma(x_t,x_s)$ (see Theorem 2.1.1 in Prakasa Rao, 1983). Therefore
$$B\to\int_{\mathbb R^d_0}a_{\mathbf u}(x_t,x_s)\,g_{\mathbf u}(x_t,x_s)\,d\mathbf u=\sigma(x_t,x_s).$$

Therefore we obtain the asymptotic covariance of $J_1$ in the following form:
$$L\int_{-\infty}^{\infty}K^2(t)\,dt\cdot\operatorname{diag}\bigl(v(x_t)\,g(x_t)\bigr)+\bigl(\sigma(x_t,x_s)\bigr)^{m}_{t,s=1}.$$

Now turn to $J_2$:
$$J_2(x)=\frac{1}{\sqrt{|D_n|\Lambda^d_n}}\sum_{\mathbf t\in D_n}\frac1h\,[r(X_{\mathbf t})-r(x)]\,K\!\left(\frac{x-X_{\mathbf t}}{h}\right).$$
Then, using Taylor's expansion ($r(u)=r(x)+r'(x)(u-x)+\frac12 r''(x)(u-x)^2$), we get
$$E\left(\frac1h\,[r(X_{\mathbf t})-r(x)]\,K\!\left(\frac{x-X_{\mathbf t}}{h}\right)\right)=\int_{-\infty}^{\infty}\frac1h\,[r(u)-r(x)]\,K\!\left(\frac{x-u}{h}\right)g(u)\,du=\int_{-\infty}^{\infty}\frac1h\left[r'(x)(u-x)+\frac12\,r''(x)(u-x)^2\right]K\!\left(\frac{x-u}{h}\right)g(u)\,du$$
$$=\int_{-\infty}^{\infty}\frac1h\left[r'(x)z-\frac12\,r''(x)z^2\right]K\!\left(\frac zh\right)g(x-z)\,dz=r'(x)\,h\int_{-\infty}^{\infty}\frac1h\,\frac zh\,K\!\left(\frac zh\right)g(x-z)\,dz-\frac12\int_{-\infty}^{\infty}\frac1h\,r''(x)\,z^2\,K\!\left(\frac zh\right)g(x-z)\,dz=A_{11}+A_{12}.$$

We show that
$$|A_{11}|,\ |A_{12}|\le h^2 C.\tag{31}$$
To obtain this relation, first consider $A_{11}$. Using the substitution $t=z/h$, Taylor's expansion $g(x-th)=g(x)+g'(x)(-th)$, the boundedness of $g''$, and the symmetry of K, we obtain
$$\int_{-\infty}^{\infty}\frac1h\,\frac zh\,K\!\left(\frac zh\right)g(x-z)\,dz=\int_{-\infty}^{\infty}t\,K(t)\,g(x-th)\,dt=\int_{-\infty}^{\infty}t\,K(t)\,g(x)\,dt+\int_{-\infty}^{\infty}t\,K(t)\,g'(x)(-th)\,dt=-h\,c\int_{-\infty}^{\infty}t^2K(t)\,dt.$$
Therefore $|A_{11}|\le h^2C$.
For $|A_{12}|$ we have $|A_{12}|\le C h^2\int_{-\infty}^{\infty}\frac1h\left(\frac zh\right)^2 K\!\left(\frac zh\right)g(x-z)\,dz$. By (11) and Theorem D we have $\int_{-\infty}^{\infty}\frac1h\left(\frac zh\right)^2K\!\left(\frac zh\right)g(x-z)\,dz\to g(x)\int_{-\infty}^{\infty}z^2K(z)\,dz$. Therefore $|A_{12}|\le h^2C$.
Now, by (31), $|E(J_2(x))|\lesssim\sqrt{\frac{|D_n|}{\Lambda^d_n}}\,h^2C=\sqrt{|T_n|}\,h^2C$. Therefore $E(J_2(x))\to 0$, because by (5), $|T_n|h^4_n\to 0$.

We want to show that $E|J_2|^l\to 0$, where $1<l<2$. As $E|J_2|^l\le C\bigl(E|J_2-E(J_2)|^l+|E(J_2)|^l\bigr)$ and $E(J_2)\to 0$, it is sufficient to show that $E|J_2-E(J_2)|^l\to 0$.
We have
$$E|J_2-E(J_2)|^l=\left(\frac{1}{\sqrt{|D_n|\Lambda^d_n}}\right)^{\!l}\left(\frac1h\right)^{\!l}E\left|\sum_{\mathbf t\in D_n}\eta_{\mathbf t}-E\eta_{\mathbf t}\right|^{l},$$
where $\eta_{\mathbf t}=(r(X_{\mathbf t})-r(x))\,K\!\left(\frac{x-X_{\mathbf t}}{h}\right)$.

Now joining the $\eta_{\mathbf t}$ into the unit cubes (denote a cube by $\mathcal K$) and applying the Rosenthal inequality (19) for these, we obtain
$$E|J_2-E(J_2)|^l\le C\left(\frac{1}{|D_n|\Lambda^d_n}\right)^{\frac l2}\left(\frac1h\right)^{\!l}\sum_{\mathcal K\in D'_n}\left\{E\left|\sum_{\mathbf t\in\mathcal K}\eta_{\mathbf t}-E\eta_{\mathbf t}\right|^{l+\varepsilon}\right\}^{\frac{l}{l+\varepsilon}}\le C\left(\frac{1}{|D_n|\Lambda^d_n}\right)^{\frac l2}\left(\frac1h\right)^{\!l}\sum_{\mathcal K\in D'_n}\left\{E\left|\sum_{\mathbf t\in\mathcal K}\eta_{\mathbf t}\right|^{l+\varepsilon}\right\}^{\frac{l}{l+\varepsilon}},\tag{32}$$
as $E|\eta-E\eta|^k\le C\,E|\eta|^k$ for $k>1$. (We see that $c^{(\varepsilon)}_{1,1}<\infty$ follows from (1).)
Applying the Jensen inequality, we get
$$\left|\sum_{\mathbf t\in\mathcal K}\eta_{\mathbf t}\right|^{l+\varepsilon}=\Lambda^{d(l+\varepsilon)}_n\left|\sum_{\mathbf t\in\mathcal K}\frac{1}{\Lambda^d_n}\,\eta_{\mathbf t}\right|^{l+\varepsilon}\le\Lambda^{d(l+\varepsilon)}_n\sum_{\mathbf t\in\mathcal K}\frac{1}{\Lambda^d_n}\,|\eta_{\mathbf t}|^{l+\varepsilon},$$
which implies that
$$E\left|\sum_{\mathbf t\in\mathcal K}\eta_{\mathbf t}\right|^{l+\varepsilon}\le\Lambda^{d(l+\varepsilon)}_n\sum_{\mathbf t\in\mathcal K}\frac{1}{\Lambda^d_n}\,E|\eta_{\mathbf t}|^{l+\varepsilon}=\Lambda^{d(l+\varepsilon)}_n\,E|\eta_{\mathbf t}|^{l+\varepsilon}.\tag{33}$$
Hence, by (32) and (33), we get
$$E|J_2-E(J_2)|^l\le C\left(\frac{1}{|D_n|\Lambda^d_n}\right)^{\frac l2}\left(\frac1h\right)^{\!l}\frac{|D_n|}{\Lambda^d_n}\,\Lambda^{d\cdot l}_n\bigl(E|\eta_{\mathbf t}|^{l+\varepsilon}\bigr)^{\frac{l}{l+\varepsilon}}.\tag{34}$$

We can calculate the limit of $E|\eta_{\mathbf t}|^{l+\varepsilon}$ in the following way:
$$E|\eta_{\mathbf t}|^{l+\varepsilon}=E\,|r(X_{\mathbf t})-r(x)|^{l+\varepsilon}\,K^{l+\varepsilon}\!\left(\frac{x-X_{\mathbf t}}{h}\right)=\int_{-\infty}^{\infty}\bigl|\underbrace{r(u)-r(x)}_{r'(\tilde x)(x-u)}\bigr|^{l+\varepsilon}\,K^{l+\varepsilon}\!\left(\frac{x-u}{h}\right)g(u)\,du\le c\,h^{1+l+\varepsilon}\int_{-\infty}^{\infty}\frac1h\left|\frac{x-u}{h}\right|^{l+\varepsilon}K^{l+\varepsilon}\!\left(\frac{x-u}{h}\right)g(u)\,du\to h^{1+l+\varepsilon}\,c\,g(x)\int_{-\infty}^{\infty}|z|^{l+\varepsilon}K^{l+\varepsilon}(z)\,dz.$$
(Here we applied Theorem D.)

Therefore, by (34), we have
$$E|J_2-E(J_2)|^l\le C\left(\frac{1}{|D_n|\Lambda^d_n}\right)^{\frac l2}\left(\frac1h\right)^{\!l}\frac{|D_n|}{\Lambda^d_n}\,\Lambda^{d\cdot l}_n\,h^{\frac{l(1+l+\varepsilon)}{l+\varepsilon}}=C\,|D_n|^{1-\frac l2}\bigl(\Lambda^d_n\bigr)^{\frac l2-1}h^{\frac{l}{l+\varepsilon}}=C\,|T_n|^{1-\frac l2}\,h^{\frac{l}{l+\varepsilon}}.$$
Choosing appropriate l and ε (e.g. l = 1.98, ε = 0.01), relation (5) implies that $|T_n|^{1-\frac l2}\,h^{\frac{l}{l+\varepsilon}}\to 0$.
Therefore $E|J_2|^l\to 0$, so $J_2\to 0$ in probability.

Finally, we deal with $J_3$:
$$J_3(x)=\frac{1}{|D_n|}\sum_{\mathbf t\in D_n}\frac1h\,K\!\left(\frac{x-X_{\mathbf t}}{h}\right).$$
By Theorem A, $\sqrt{|T_n|}\,(J_3(x)-g(x))$ is convergent in distribution, therefore $J_3(x)\to g(x)$ in probability.

Remark 6. We see that (5) and (7) can be satisfied simultaneously only if $1<a<\frac{5+\sqrt{17}}{4}$.

4. Examples

In this section we present simple examples that give numerical evidence

for the phenomena described in Theorem 1.

Let Xu,u ∈ Rd, be a stationary Gaussian random field with mean value

function zero and covariance function ρu. In the following examples we con-

sider the same random fields Xu which were studied in Fazekas and Chuprunov


(2006) and Fazekas (2007). We will choose $\Phi(Y_{\mathbf u})=10\sin(X_{\mathbf u})+100+\delta_{\mathbf u}$, where $\delta_{\mathbf u}=\widetilde X_{\mathbf u}$, and $\widetilde X_{\mathbf u}$ is a stationary random field having the same distribution as $X_{\mathbf u}$ and being independent of $X_{\mathbf u}$.

Example 1: Consider the Gaussian process X(u), u ∈ R, with mean zero and covariance function $\rho_u=e^{-|u|}$, u ∈ R. We consider this process in the 1/Λ-lattice points of the domain T = [0, t] with Λ = 40 and t = 60. That is, the sample is $z_1=X(1/40),\dots,z_s=X(2400/40)$ with s = 2400. Now the covariance matrix of this data vector is $(\rho^{|i-j|})^s_{i,j=1}$, where $\rho=e^{-1/\Lambda}$. Therefore the data generation for the simulation is easy. Let $y_1,\dots,y_s$ be i.i.d. standard normal and choose
$$z_i=\rho^{\,i-1}y_1+\sqrt{1-\rho^2}\sum_{j=2}^{i}\rho^{\,i-j}y_j,\qquad i=1,\dots,s.$$
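The published simulations were done in MATLAB; the following Python sketch (ours) reproduces the same data-generating step. The sum above is equivalent to the AR(1)-type recursion $z_1=y_1$, $z_i=\rho z_{i-1}+\sqrt{1-\rho^2}\,y_i$, which is what the code uses; function and variable names are assumptions made only for this illustration.

```python
import numpy as np

def simulate_ar1_lattice(s, lam, rng):
    """Stationary Gaussian sample with cov(z_i, z_j) = exp(-|i - j| / lam) on the 1/lam-lattice."""
    rho = np.exp(-1.0 / lam)
    y = rng.standard_normal(s)
    z = np.empty(s)
    z[0] = y[0]
    for i in range(1, s):
        z[i] = rho * z[i - 1] + np.sqrt(1.0 - rho ** 2) * y[i]
    return z

rng = np.random.default_rng(1)
X = simulate_ar1_lattice(2400, 40, rng)                                  # the field X at the lattice points
phi_Y = 10.0 * np.sin(X) + 100.0 + simulate_ar1_lattice(2400, 40, rng)   # delta_u: an independent copy
```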

Using these data, we calculated the regression estimator rn at the points

x1 = −0.5, x2 = −0.25, x3 = 0, x4 = 0.25, and x5 = 0.5. We used two values

of the bandwidth, h1 = 0.025 and h2 = 0.005, and applied the standard

normal density function as kernel K.

The simulations were performed with MATLAB; 5000 repetitions of the procedure were made. The data sets for both bandwidths h1 and h2 were the same. The theoretical values of the regression function and the average of their estimators are shown in Table 1. For both values of the bandwidth

we can see a close agreement of the theoretical and empirical values.

We calculated the empirical covariance matrices $\Sigma_1$ (corresponding to bandwidth $h_1$) and $\Sigma_2$ (corresponding to bandwidth $h_2$) for our standardized estimators $\sqrt{|D|/\Lambda}\,\bigl(r_n(x_1)-r(x_1),\dots,r_n(x_5)-r(x_5)\bigr)$ (the standardization factor is $\sqrt{|D|/\Lambda}=7.7459$):

Table 1: Theoretical values of the regression function and the average of their estimators for the data of Example 1.

x                        −0.5      −0.25     0         0.25      0.5
r(x)                     95.2057   97.5260   100.0000  102.4740  104.7943
rn(x) with h1 = 0.025    95.2039   97.5220   99.9953   102.4684  104.7929
rn(x) with h2 = 0.005    95.1970   97.5229   99.9939   102.4707  104.7976

Σ1 =

3.8773 2.6953 2.1923 1.7857 1.5073

2.6953 3.5796 2.5499 2.1007 1.7623

2.1923 2.5499 3.4399 2.4892 2.0995

1.7857 2.1007 2.4892 3.4852 2.6500

1.5073 1.7623 2.0995 2.6500 3.8147

;

Σ2 =

7.2195 2.8212 2.1822 1.8547 1.5020

2.8212 6.5902 2.5058 2.0756 1.7223

2.1822 2.5058 6.2153 2.4162 2.1099

1.8547 2.0756 2.4162 6.5192 2.6732

1.5020 1.7223 2.1099 2.6732 7.1689

.

The difference in the diagonals of Σ1 and Σ2 is clearly visible. The off-

diagonal elements are almost the same.

Now calculate the additional terms in the diagonals of the covariance matrices described in Theorem 1. In our case the elements of the diagonal matrix $D_k$ for bandwidth $h_k$ (for k = 1, 2) are
$$\frac{1}{\Lambda}\,\frac{1}{h_k}\,v(x_i)\,\frac{1}{g(x_i)}\int_{-\infty}^{\infty}K^2(u)\,du=\frac{1}{40}\,\frac{1}{h_k}\cdot 1\cdot\frac{1}{g(x_i)}\,\frac{1}{2\sqrt{\pi}}.$$
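These diagonal terms can be evaluated directly; the sketch below (ours, only restating the display above) uses the standard normal density for g, $v(x_i)=1$, and $\int K^2(u)\,du=1/(2\sqrt{\pi})$ for the Gaussian kernel, and prints diag(D2 − D1) for comparison with diag(Σ2 − Σ1).

```python
import numpy as np

def diag_D(h, x, lam=40.0):
    """Diagonal term (1/lam)(1/h) * v(x) * int K^2 / g(x) of Theorem 1, for Example 1."""
    g = np.exp(-x ** 2 / 2.0) / np.sqrt(2.0 * np.pi)   # marginal density of X (standard normal)
    int_K2 = 1.0 / (2.0 * np.sqrt(np.pi))              # integral of the squared Gaussian kernel
    return (1.0 / lam) * (1.0 / h) * 1.0 * int_K2 / g  # v(x) = 1 in this example

x = np.array([-0.5, -0.25, 0.0, 0.25, 0.5])
print(diag_D(0.005, x) - diag_D(0.025, x))  # theoretical diag(D2 - D1)
```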

Because in the infill-increasing case only the diagonals of the limit covariance matrices can be different for different values of the bandwidth, we show the ratio between the diagonals of the difference of the empirical covariance matrices, diag(Σ2 − Σ1), and of the theoretical covariance matrices, diag(D2 − D1), in Table 2.

Table 2: Ratio between the diagonals of the difference of the empirical covariance matrices and of the theoretical covariance matrices for the data of Example 1.

x                              −0.5     −0.25    0        0.25     0.5
diag(Σ2 − Σ1)/diag(D2 − D1)    1.0428   1.0316   0.9812   1.0397   1.0465

As the ratios are close to 1, the results show that the diagonal matrix D

of Theorem 1 explains well the dependence of the limit covariance matrix

on the bandwidth.

Finally, Figure 1 shows histograms with the relative frequencies of the

estimators of r(x3 = 0) for the bandwidths h1 = 0.025 (left picture) and h2 =

0.005 (right picture). The histograms are shown together with the theoretical

normal densities with mean and variance estimated from the data used for the

histograms. The approximate normal distribution of the regression estimator

stated in Theorem 1 is reflected in these figures. Different bandwidths lead

to a different spread of the normal distribution.

Example 2: In this example we consider the Gaussian process X(u, v), (u, v) ∈

R2, with mean zero and covariance function ρ(u,v) = e−(|u|+|v|), (u, v) ∈ R2.

As in the previous example, let $\Phi(Y_{\mathbf u})=10\sin(X_{\mathbf u})+100+\widetilde X_{\mathbf u}$. This process

is observed in the 1/Λ-lattice points of the domain T = [0, t]2 with Λ = 10

and t = 30, and thus the sample is z(i,j) = X(i/10,j/10), i, j = 1, . . . , 300,

with sample size (30 · 10)2 = 90000. Therefore, we generate data yk,l, for



Figure 1: Histograms with the relative frequencies of the estimators of r(x3 = 0) forthe bandwidths h1 = 0.025 (left) and h2 = 0.005 (right), together with the theoreticaldensities of the normal distribution.

$k,l=1,\dots,300$, to be i.i.d. standard normal, and choose
$$z_{(i,j)}=\rho^{\,i+j-2}y_{1,1}+\sqrt{1-\rho^2}\,\rho^{\,j-1}\sum_{k=2}^{i}\rho^{\,i-k}y_{k,1}+\sqrt{1-\rho^2}\,\rho^{\,i-1}\sum_{l=2}^{j}\rho^{\,j-l}y_{1,l}+(1-\rho^2)\sum_{k=2}^{i}\sum_{l=2}^{j}\rho^{\,i-k}\rho^{\,j-l}y_{k,l},$$
$i,j=1,\dots,300$, where $\rho=e^{-1/\Lambda}$.
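As in Example 1, here is a hedged Python sketch of this step (the original simulations used MATLAB). Because the covariance $e^{-(|u|+|v|)}$ is separable, the double sum above amounts to applying the one-dimensional AR(1)-type recursion first to the columns and then to the rows of an i.i.d. normal array, and the code exploits this shortcut; names and the seed are illustrative assumptions.

```python
import numpy as np

def ar1_filter(y, rho):
    """Apply z_1 = y_1, z_i = rho * z_{i-1} + sqrt(1 - rho^2) * y_i along the first axis."""
    z = np.empty_like(y)
    z[0] = y[0]
    for i in range(1, y.shape[0]):
        z[i] = rho * z[i - 1] + np.sqrt(1.0 - rho ** 2) * y[i]
    return z

lam, n = 10, 300
rho = np.exp(-1.0 / lam)
y = np.random.default_rng(2).standard_normal((n, n))
# filtering both axes gives cov(z_(i,j), z_(k,l)) = rho^{|i-k|} * rho^{|j-l|}
z = ar1_filter(ar1_filter(y, rho).T, rho).T
```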

As in the previous example, we calculated the regression estimator rn at

the points x1 = −0.5, x2 = −0.25, x3 = 0, x4 = 0.25, x5 = 0.5. We used the

bandwidths h1 = 0.01 and h2 = 0.002 and applied the standard normal density

function as kernel K. The data sets for both bandwidths were the same, and

5000 repetitions were performed. Table 3 shows that the theoretical values of

the regression function and the average of their estimators are very similar.

The standardized estimators (the standardization factor is $\sqrt{|D|/\Lambda^2}=30$)


Table 3: Theoretical values of the regression function and the average of their estimators for the data of Example 2.

x                        −0.5      −0.25     0         0.25      0.5
r(x)                     95.2057   97.5260   100.0000  102.4740  104.7943
rn(x) with h = 0.0100    95.2069   97.5270   99.9999   102.4735  104.7937
rn(x) with h = 0.0020    95.2074   97.5276   100.0001  102.4724  104.7955

have the empirical covariance matrices

Σ1 =

5.1226 4.1911 3.9213 3.7524 3.5575

4.1911 4.9262 4.0812 3.9838 3.8019

3.9213 4.0812 4.7751 4.0712 3.9663

3.7524 3.9838 4.0712 4.9523 4.2129

3.5575 3.8019 3.9663 4.2129 5.1573

Σ2 =

8.2768 4.2437 3.9402 3.7704 3.6560

4.2437 7.8458 4.1450 4.1074 3.8184

3.9402 4.1450 7.5220 4.0948 4.0625

3.7704 4.1074 4.0948 7.9032 4.3544

3.6560 3.8184 4.0625 4.3544 8.4931

for the bandwidths h1 and h2, respectively. Again, the agreement of the

off-diagonal elements and the difference in the diagonal becomes visible.

Similar to the previous example, we show the ratios diag(Σ2 − Σ1)/diag(D2 − D1) in Table 4. These are close to 1 as expected from Theorem 1.

Since, according to Theorem 1, the regression estimator should approach

multivariate normality for different values xi, we present in Figure 2 the

resulting estimations of r(x1 = −0.5) (horizontal axes) and r(x2 = −0.25)


Table 4: Ratio between the diagonals of the difference of the empirical covariance matrices and of the theoretical covariance matrices for the data of Example 2.

x                              −0.5     −0.25    0        0.25     0.5
diag(Σ2 − Σ1)/diag(D2 − D1)    0.9841   1.0004   0.9711   1.0112   1.0408

(vertical axes) for the bandwidths h1 = 0.01 (left picture) and h2 = 0.002

(right picture). The estimated contour lines for certain levels are drawn

with dashed lines. The solid ellipses represent the same levels, but taken

from the bivariate normal density with mean and

covariance taken from the underlying data. The close agreement is clearly

visible. Moreover, for both bandwidths the ellipses have the same orientation

but different size, which refers to the closeness of the off-diagonal elements

and to the disagreement of the diagonal elements of Σ1 and Σ2 from above.


Figure 2: Two-dimensional representations of the estimators of r(x1 = −0.5) and r(x2 =−0.25) for the bandwidths h1 = 0.01 (left) and h2 = 0.002 (right), together with contourlines (dashed) and ellipses for theoretical values of the normal densities (solid).


Acknowledgement

The authors are grateful to the referee for helpful comments and suggestions.

References

Biau, G. (2004), Spatial kernel density estimation. Mathematical Methods of

Statistics 12 (2003), no.4, 371–390 (2004).

Blanke, D., Pumo, B. (2002), Optimal sampling for density estimation in

continuous time. J. Time Ser. Anal. 24 (2003), no.1, 1–23.

Bosq, D. (1997), Parametric rates of nonparametric estimators and predictors

for continuous time processes. The Annals of Statistics, 25 (3), 982–1000.

Bosq, D. (1998), Nonparametric Statistics for Stochastic Processes. Springer,

New York - Berlin - Heidelberg.

Bosq, D., Cheze, N. (1993), Erreur quadratique asymptotique optimale

de l’estimateur non parametrique de la regression pour des observations

discretisees d’un processus stationnaire a temps continu. C. R. Acad. Sci.

Paris 317 (I), no. 9, 891–894.

Cai, Z. (2001), Weighted Nadaraya-Watson regression estimation. Statistics

& Probability Letters, 51, 307–318.

Cheze, N. (1992), Regression non parametrique pour un processus a temps

continu. C. R. Acad. Sci. Paris 315 (I), 1009–1012.

Cressie, N.A.C. (1991), Statistics for Spatial Data. Wiley, New York.


Devroye, L. and Gyorfi, L. (1985), Nonparametric density estimation. The

L1 view. Wiley, New York.

Doukhan, P. (1994), Mixing. Properties and Examples. Lecture Notes in

Statistics 85, Springer, New York.

Fazekas, I. (2003), Limit theorems for the empirical distribution function in

the spatial case. Statistics & Probability Letters, 62, 251–262.

Fazekas, I. (2007), Central limit theorems for kernel type density estimators.

Proceedings of the 7th International Conference on Applied Informatics,

Eger, (Vol. 1), 209–219.

Fazekas, I. and Chuprunov, A. (2004), A central limit theorem for random

fields. Acta Mathematica Academiae Paedagogicae Nyiregyhaziensis, 20

(1), 93–104, www.emis.de/journals/AMAPN.

Fazekas, I. and Chuprunov, A. (2006), Asymptotic normality of kernel type

density estimators for random fields. Stat. Inf. Stoch. Proc. 9, 161–178.

Fazekas, I., Kukush, A.G. and Tomacs, T. (2000), On the Rosenthal inequal-

ity for mixing fields. Ukrainian Math. J., 52(2), 266–276.

Guyon, X. (1995), Random Fields on a Network. Modeling, Statistics, and

Applications. Springer, New York.

Hille, E. and Phillips, R.S. (1957), Functional Analysis and Semi-groups.

AMS, Providence.

Lahiri, S.N. (1996), On inconsistency of estimators based on spatial data

under infill asymptotics. Sankhya, 58 Ser. A, 403–417.


Lahiri, S.N., Kaiser, M.S., Cressie, N. and Hsu, N. J. (1999), Prediction

of spatial cumulative distribution functions using subsampling. J. Amer.

Statist. Assoc. 94 (445), 86–110.

Masry, E. (1983), Probability density estimation from sampled data. IEEE

Transactions in information theory, vol. IT-29, no. 5, 696–709.

Nadaraya, E.A. (1964), On estimating regression. Theor. Probability Appl.,

9, 141–142.

Park, B., Kim, T.Y.K., Park, J-S., Hwang, S.Y. (2008), Practically appli-

cable central limit theorem for spatial statistics. Math. Geosci, Online:

http://springerlink.metapress.com/content/121014/?Content+Status=Accepted

Prakasa Rao, B.L.S. (1983), Nonparametric Functional Estimation. Aca-

demic Press, INC. London.

Schuster, E.F. (1972), Joint asymptotic distribution of the estimated regres-

sion function at a finite number of distinct points. The Annals of Mathe-

matical Statistics, 43 (1), 84–88.

Watson, G.S. (1964), Smooth regression analysis. Sankhya. Ser. A 26, 359–

372.
