
Statistica Sinica (2009): manuscript 1

Nonparametric Regression Function Estimation for

Errors-in-Variables Models with Validation Data

Lilun Du, Changliang Zou and Zhaojun Wang∗

Nankai University

Abstract: This paper develops an estimation approach for nonparametric regression analysis with measurement error in the covariable, assuming the availability of independent validation data on the covariable in addition to primary data on the response variable and surrogate covariable. Without specifying any error model structure between the surrogate and true covariables, we propose an estimator that integrates local linear regression with a Fourier transformation method. An estimator combining two local linear kernel estimators is first used to calibrate the conditional expectation of the unknown regression function given the surrogate covariate, and the final estimator is then derived via the trigonometric series approach suggested by Delaigle et al. (2006). Under mild conditions, the consistency of the proposed estimator is established and its convergence rate is obtained. Numerical examples show that it performs well in applications.

Key words and phrases: Asymptotic Normality, Local Linear Regression, Measure-

ment Error, Trigonometric Series.

1. Introduction

In the last two decades, the errors-in-variables (EV) model has drawn much attention from statisticians. An increasing number of applications of linear and non-linear EV models have appeared in recent years owing to their simple forms and wide applicability. Comprehensive reviews of the research and development of EV models can be found in Fuller (1987), Carroll et al. (2006) and the references therein. In practice, the relationship between the measured (surrogate) variables and the true variables can be considerably more complicated than the classical additive error structure usually assumed. In such cases, obtaining correct statistical analyses becomes challenging. One solution is to use the help of validation data to capture the underlying relationship between the true variables and the surrogate variables. For instance, in measuring the heart muscle damage caused by a myocardial infarction, the peak cardiac enzyme level in the bloodstream is a variable obtained easily, but it cannot accurately assess the damage to the heart muscle. Instead, arterioscintography, an invasive and expensive procedure, can be employed to produce a more accurate measure of the heart muscle damage in a small subset of subjects enrolled in the study (cf. Wittes et al. 1989). Here, the diagnostic data on heart damage given by the peak cardiac enzyme level in the bloodstream are used as the surrogate variable, and the corresponding exact data measured by arterioscintography for a small subset of subjects are used as the validation variable.

* Corresponding author. Email: [email protected]

Inference based on surrogate data and a validation sample has been the object of much attention. Carroll and Stefanski (1990), Carroll and Wand (1991), Pepe and Fleming (1991), Pepe (1992), Carroll et al. (1995), Lee and Sepanski (1995), Sepanski and Lee (1995), Wang (1999), Wang and Rao (2002) and Stute et al. (2007), among others, developed suitable methods for different models. Although there are many collective efforts on statistical inference with validation data, they mostly focus on specifying some parametric relationship between covariates and responses. While parametric methods are useful in certain applications, questions will always arise about the adequacy of the parametric model assumptions and about the potential impact of model misspecification on statistical analyses. In comparison, it is well known that the flexibility of nonparametric regression provides a useful tool, especially when the relationship is too complicated to be specified parametrically. This motivates us to consider estimating the regression function nonparametrically when the covariate is measured erroneously and some validation data are available to relate the surrogate and true variables, without specifying any error model structure.

To be specific, we assume that an independent validation dataset V = \{(\tilde{t}_j, t_j)\}_{j=N+1}^{N+n} is available, in addition to the primary (surrogate) dataset S = \{(Y_i, \tilde{t}_i)\}_{i=1}^{N}, which are generated by the following nonparametric model

Y = m(t) + \varepsilon,   (1.1)

where Y is a scalar response variable, t is a univariate explanatory variable, \tilde{t} is the surrogate variable of t, and \varepsilon is a random error with E[\varepsilon | t] = 0 and E[\varepsilon^2] < \infty. Given the t_i's, the errors \varepsilon_i's are assumed to be independent and identically distributed. Our objective is to estimate the unknown regression function

m(t) in (1.1) with the datasets V and S. Obviously, standard methodologies based on regression calibration developed for parametric inference, such as Carroll and Wand (1991), Sepanski and Lee (1995) and Stute et al. (2007), are not applicable in this situation. Recently, Wang (2006) developed an estimation approach for nonparametric regression analysis with surrogate data and validation sampling, but it cannot be applied to our problem. Besides the fundamental difference between measurement error in the response variable considered in his paper and the erroneously measured covariable considered here, the validation dataset assumed in Wang (2006) is of the form \{(t_j, Y_j, \tilde{Y}_j)\}_{j=N+1}^{N+n}, which contains the exactly matched pairs \{(t_j, Y_j)\}_{j=N+1}^{N+n}, so that the regression calibration method can be employed to estimate the function E[Y | \tilde{Y}, t]; in our problem, the validation dataset provides no such auxiliary information beyond \{(\tilde{t}_j, t_j)\}_{j=N+1}^{N+n}.

In this paper, we propose a nonparametric estimator \hat{m}(z) that integrates local linear regression with a Fourier transformation method. The estimation approach consists of two major steps. First, an estimator combining two local linear kernel estimators (LLKEs), based on V and S respectively, is proposed to calibrate the function E[Y | U(\tilde{t}) = z], where U(\tilde{t}) = E[t | \tilde{t}]. This estimator is somewhat special because the two LLKEs are nested: one kernel estimator appears inside the other. Although E[Y | U(\tilde{t}) = z] is not our objective function, the relationship between m(z) = E[Y | t = z] and E[Y | U(\tilde{t}) = z] can then be recovered through the trigonometric series approach suggested by Delaigle et al. (2006). Under mild conditions, the consistency of \hat{m}(z) is established and its convergence rate is derived.

In the next section, we start by describing our estimation approach, and then

state the main asymptotic results. In Section 3, we investigate the finite sample

properties of our proposed approach. A real-data example is used to demonstrate

the method in Section 4. Section 5 concludes this paper by suggesting some future

research issues. The proofs are given in the Appendix.

2. Methodology

Recall model (1.1) and the assumptions below it. We consider rewriting (1.1)


in the following form:

Y = m(U(\tilde{t}) + \eta) + \varepsilon,   (2.1)

t = U(\tilde{t}) + \eta,   (2.2)

where U(\tilde{t}), \varepsilon and \eta are independent. Taking U(\tilde{t}) as a new variable, this can be regarded as a Berkson EV model (Berkson 1950; Carroll et al. 2006). This enables us to apply a recent nonparametric technique for the Berkson EV problem developed by Delaigle et al. (2006), as long as we can obtain estimates of the distribution of \eta and of E[Y | U(\tilde{t}) = z], on which we elaborate next.

2.1. Estimating E[Y | U(\tilde{t}) = z]

Represent (2.1) as

Y = M(U(\tilde{t})) + \xi,   (2.3)

where M(z) \equiv E[Y | U(\tilde{t}) = z], E[\xi | \tilde{t}] = 0 and E[\xi^2] < \infty. Generally, M is not equal to m, and \xi \neq \varepsilon + \eta. Based on the validation set V, we first estimate U(\tilde{t}) by a local linear fit, that is,

\hat{U}(\tilde{t}) = \frac{\sum_{j=N+1}^{N+n} t_j L\left(\frac{\tilde{t}_j - \tilde{t}}{b_n}\right)\{S_{n2} - (\tilde{t}_j - \tilde{t}) S_{n1}\}}{\sum_{j=N+1}^{N+n} L\left(\frac{\tilde{t}_j - \tilde{t}}{b_n}\right)\{S_{n2} - (\tilde{t}_j - \tilde{t}) S_{n1}\}},

where

S_{n\gamma} = \frac{1}{n b_n} \sum_{j=N+1}^{N+n} L\left(\frac{\tilde{t}_j - \tilde{t}}{b_n}\right)(\tilde{t}_j - \tilde{t})^{\gamma}, \quad \gamma = 0, 1, 2,

L(\cdot) is a symmetric density kernel function and b_n is a bandwidth. Here we choose the LLKE rather than the Nadaraya-Watson estimator because the former possesses superior boundary behavior (cf. Fan 1992). As illustrated in what follows, this estimator is nested into another LLKE, so unsatisfactory boundary performance at the stage of the first local linear fit could be amplified in the next LLKE, because a boundary value of \hat{U}(\tilde{t}) may lie in the interior of the range of U(\tilde{t}).

So far, with the datasets V and S, we can estimate the function M(·) through


plugging \hat{U}(\tilde{t}) into the LLKE of \{(Y_i, \hat{U}(\tilde{t}_i))\}_{i=1}^{N}, that is,

\hat{M}(z) = \frac{\sum_{i=1}^{N} Y_i K\left(\frac{\hat{U}(\tilde{t}_i) - z}{h_N}\right)\{S_{N2} - (\hat{U}(\tilde{t}_i) - z) S_{N1}\}}{\sum_{i=1}^{N} K\left(\frac{\hat{U}(\tilde{t}_i) - z}{h_N}\right)\{S_{N2} - (\hat{U}(\tilde{t}_i) - z) S_{N1}\}},   (2.4)

where

S_{N\gamma} = \frac{1}{N h_N} \sum_{i=1}^{N} K\left(\frac{\hat{U}(\tilde{t}_i) - z}{h_N}\right)(\hat{U}(\tilde{t}_i) - z)^{\gamma}, \quad \gamma = 0, 1, 2, 3,

K(\cdot) is a symmetric density kernel function and h_N is a bandwidth.
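The two nested local linear fits above can be sketched as follows. This is a minimal illustration under toy assumptions, not the authors' code: the data, bandwidths and evaluation grid are invented for the example, and `local_linear` implements the standard local linear weight formula that appears in both \hat{U} and \hat{M}.

```python
import numpy as np

def local_linear(x, y, grid, h, kernel):
    """Local linear kernel estimate of E[y | x = z] at each z in `grid`."""
    est = np.empty(len(grid))
    for k, z in enumerate(grid):
        d = x - z                      # x_j - z
        w = kernel(d / h)              # kernel weights
        s1 = np.sum(w * d)             # proportional to S_{n1}
        s2 = np.sum(w * d ** 2)        # proportional to S_{n2}
        wt = w * (s2 - d * s1)         # local linear weights
        est[k] = np.sum(wt * y) / np.sum(wt)
    return est

epan = lambda u: 0.75 * (1 - u ** 2) * (np.abs(u) <= 1)  # Epanechnikov kernel

# Toy data: validation pairs (t_tilde_j, t_j) and primary pairs (Y_i, t_tilde_i).
rng = np.random.default_rng(0)
tt_v = np.linspace(-1, 1, 60)                    # validation surrogates
t_v = tt_v + rng.uniform(-0.3, 0.3, 60)          # validation true covariates
tt_p = np.linspace(-1, 1, 200)                   # primary surrogates
Y = np.sin(np.pi * tt_p) + rng.normal(0, 0.25, 200)

# Step 1: U_hat from the validation data; step 2: M_hat from (Y_i, U_hat(t_tilde_i)).
U_hat = local_linear(tt_v, t_v, tt_p, h=0.3, kernel=epan)
zgrid = np.linspace(-0.8, 0.8, 9)
M_hat = local_linear(U_hat, Y, zgrid, h=0.3, kernel=epan)
```

Plugging \hat{U}(\tilde{t}_i) into the second fit mirrors Eq. (2.4); in the actual method the bandwidths b_n and h_N would be chosen by the rules of Section 2.4 rather than fixed as here.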

Remark 1. The preceding algorithm requires the LLKE of the function U(\tilde{t}), regressed on \tilde{t} in the validation data. As discussed by Carroll and Wand (1991), Sepanski and Carroll (1993), Sepanski et al. (1994) and others, the range of \tilde{t} in the validation data is usually smaller than that observed in the primary data, which could affect the LLKE \hat{U}(\tilde{t}) studied above. If the LLKE were used blindly, this would lead to extrapolation, which is a dangerous business. Following the method for dealing with this edge-effect problem developed by these authors, the LLKE \hat{U}(\tilde{t}) used in this paper is calculated on a compact set \Theta = [\tilde{t}_{V\min}, \tilde{t}_{V\max}] interior to the support of \tilde{t}, where \tilde{t}_{V\min} = \min\{\tilde{t}_j\}_{j=N+1}^{N+n} and \tilde{t}_{V\max} = \max\{\tilde{t}_j\}_{j=N+1}^{N+n}. Sums in the primary data are taken only over those \tilde{t} \in \Theta. While this truncation causes a certain loss of efficiency, it is counterbalanced by a gain in robustness.

2.2. Estimating m(z)

In what follows, we shall assume that the densities of t, \tilde{t} and \eta, denoted by f_t, f_{\tilde{t}} and f_{\eta} respectively, are all compactly supported and bounded away from zero. Without loss of generality, we suppose their support intervals have been rescaled so that they are contained within \Omega = [-\pi, \pi]. In addition, to ensure that m is identifiable, we suppose that the support of f_t is contained within the range of U(\tilde{t}). After obtaining \hat{M}(z) from Eq. (2.4), the Fourier transformation method introduced by Delaigle et al. (2006) can be adapted to find the relationship between m(z) and M(z).

On \Omega we may write the trigonometric series for m(z) as

m(z) = m_0 + \sum_{l=1}^{\infty} [m_{1l} \cos(lz) + m_{2l} \sin(lz)],   (2.5)


where

m_0 = \frac{1}{2\pi} \int_{\Omega} m(t)\,dt, \quad m_{1l} = \frac{1}{\pi} \int_{\Omega} m(t) \cos(lt)\,dt, \quad m_{2l} = \frac{1}{\pi} \int_{\Omega} m(t) \sin(lt)\,dt.

Analogously, M(U) can be represented as

M(U) = M_0 + \sum_{l=1}^{\infty} [M_{1l} \cos(lU) + M_{2l} \sin(lU)],   (2.6)

where the constants M_0, M_{1l}, M_{2l}, l = 1, 2, \ldots, are the Fourier coefficients determined by the function M. Furthermore, the coefficients m_{1l}, m_{2l} are uniquely determined by M_{1l}, M_{2l} if U and \eta are independent, which is implied by the Berkson model (2.1)-(2.2). Hence, simple calculations yield

\begin{pmatrix} m_{1l} \\ m_{2l} \end{pmatrix} = \frac{1}{\alpha_{1l}^2 + \alpha_{2l}^2} \begin{pmatrix} \alpha_{1l} & -\alpha_{2l} \\ \alpha_{2l} & \alpha_{1l} \end{pmatrix} \begin{pmatrix} M_{1l} \\ M_{2l} \end{pmatrix},   (2.7)

where \alpha_{1l} = E\cos(l\eta) and \alpha_{2l} = E\sin(l\eta), provided that \alpha_{1l}^2 + \alpha_{2l}^2 \neq 0 for l \geq 1.

In the present problem, the distribution of \eta is not explicitly available. However, with the help of the validation data, the empirical distribution of the \hat{\eta}_j can be used instead, where

\hat{\eta}_j = t_j - \hat{U}(\tilde{t}_j), \quad j = N + 1, \ldots, N + n.

This type of estimation has been proposed and studied by Akritas and Van Keilegom (2001). Correspondingly, \alpha_{1l} and \alpha_{2l}, l = 1, 2, \ldots, can be estimated by

\hat{\alpha}_{1l} = \frac{1}{n} \sum_{j=N+1}^{N+n} \cos(l\hat{\eta}_j), \quad \hat{\alpha}_{2l} = \frac{1}{n} \sum_{j=N+1}^{N+n} \sin(l\hat{\eta}_j).   (2.8)

Thus, the estimated coefficients \hat{m}_{1l}, \hat{m}_{2l} in (2.7) can be represented as

\begin{pmatrix} \hat{m}_{1l} \\ \hat{m}_{2l} \end{pmatrix} = \frac{1}{\hat{\alpha}_{1l}^2 + \hat{\alpha}_{2l}^2} \begin{pmatrix} \hat{\alpha}_{1l} & -\hat{\alpha}_{2l} \\ \hat{\alpha}_{2l} & \hat{\alpha}_{1l} \end{pmatrix} \begin{pmatrix} \hat{M}_{1l} \\ \hat{M}_{2l} \end{pmatrix},   (2.9)

where

\hat{M}_{1l} = \frac{1}{\pi} \int_{H} \hat{M}(t) \cos(lt)\,dt, \quad \hat{M}_{2l} = \frac{1}{\pi} \int_{H} \hat{M}(t) \sin(lt)\,dt,

\hat{m}_0 = \hat{M}_0 = \frac{1}{2\pi} \int_{H} \hat{M}(t)\,dt,

\hat{M}(\cdot) is defined in (2.4), and H \subseteq \Omega contains the support of M. Combining Eqs. (2.5) and (2.9), our final estimator \hat{m}(z) is given by

\hat{m}(z) = \hat{m}_0 + \sum_{l=1}^{q} [\hat{m}_{1l} \cos(lz) + \hat{m}_{2l} \sin(lz)],   (2.10)

where q denotes the number of Fourier coefficients included in the estimator; q can be regarded as another smoothing (regularization) parameter besides b_n and h_N.
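The deconvolution steps (2.7)-(2.10) can be sketched numerically as below. This is a hedged illustration under toy assumptions: `M_hat_fn` stands in for any estimate of M from Section 2.1, `eta_hat` for the validation residuals, and the Fourier integrals are approximated by simple Riemann sums on a fine grid.

```python
import numpy as np

def deconvolve_m(M_hat_fn, eta_hat, q, grid):
    """Truncated-series estimate of m(z) from M_hat and residuals, Eqs. (2.7)-(2.10)."""
    t = np.linspace(-np.pi, np.pi, 2001)         # integration grid on H
    dt = t[1] - t[0]
    Mt = M_hat_fn(t)
    m0 = np.sum(Mt) * dt / (2 * np.pi)           # m0_hat = M0_hat
    m_est = np.full(len(grid), m0)
    for l in range(1, q + 1):
        M1 = np.sum(Mt * np.cos(l * t)) * dt / np.pi   # M_hat_{1l}
        M2 = np.sum(Mt * np.sin(l * t)) * dt / np.pi   # M_hat_{2l}
        a1 = np.mean(np.cos(l * eta_hat))              # alpha_hat_{1l}, Eq. (2.8)
        a2 = np.mean(np.sin(l * eta_hat))              # alpha_hat_{2l}
        denom = a1 ** 2 + a2 ** 2                      # must stay away from zero
        m1 = (a1 * M1 - a2 * M2) / denom               # 2x2 inversion, Eq. (2.9)
        m2 = (a2 * M1 + a1 * M2) / denom
        m_est += m1 * np.cos(l * grid) + m2 * np.sin(l * grid)
    return m_est

# Sanity check: with no Berkson error (eta = 0) the estimator reduces to the
# truncated Fourier series of M itself.
grid = np.linspace(-1, 1, 5)
m_est = deconvolve_m(np.cos, np.zeros(50), q=3, grid=grid)
```

With eta_hat identically zero, \hat{\alpha}_{1l} = 1 and \hat{\alpha}_{2l} = 0, so the 2x2 inversion is the identity, as expected from (2.7).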

Remark 2. Based on Eqs. (2.1)-(2.2), we can see that inference on the relationship between m(z) and M(z) essentially reduces to a nonparametric regression problem with an additive EV structure. Nonparametric methods for inference under the classical EV model, that is \tilde{t} = t + \delta, include kernel approaches (e.g., Fan and Truong 1993; Delaigle et al. 2009) and techniques based on simulation and extrapolation (see Cook and Stefanski 1994; Carroll et al. 1999; Staudenmayer and Ruppert 2004). These approaches may also be used in the present circumstance after certain modifications, although the Berkson model is particularly appropriate in view of (2.2). A more thorough investigation of this problem is beyond the scope of this paper but should be a subject of future research.

2.3. Main Results

In this subsection, we study the asymptotic behavior of the proposed estimator. For clarity, a set of conditions for the results stated later is presented. Let F_{t|\tilde{t}} denote the conditional distribution function of t given \tilde{t}.

Conditions

(C1) m(\cdot) has an s-order Lipschitz continuous second derivative for some real s > 0.

(C2) The functions U(\cdot) and M(\cdot) are bounded and twice continuously differentiable.

(C3) The density function of U(\tilde{t}), denoted by f_U, is Lipschitz continuous and bounded away from 0.

(C4) The kernel functions K(w) and L(w) are bounded, symmetric probability density functions satisfying \int K(w) w^2\,dw < \infty and \int L(w) w^2\,dw < \infty. In addition, K(w) is twice differentiable and satisfies \int K'(w) w\,dw < \infty, \int K'^2(w)\,dw < \infty, and \int |K''(w)|\,dw < \infty.

(C5) N, n, b_n, h_N satisfy \gamma_n^2 / h_N^2 \to 0, n b_n \to \infty and N h_N^3 \to \infty, where \gamma_n = (1/\sqrt{n b_n} + b_n^2) \log^{1/2}(1/b_n).

(C6) \lim N/n = \lambda \in [0, \infty).

(C7) The density function f_{\tilde{t}} is three times continuously differentiable.

(C8) F'_{t|\tilde{t}} is continuous in (t, \tilde{t}) and \sup_{t,\tilde{t}} |t^2 F'_{t|\tilde{t}}| < \infty, and the same holds for all other partial derivatives of F_{t|\tilde{t}} with respect to t and \tilde{t} up to order two.

Remark 3. Conditions (C1)-(C4) are standard in nonparametric regression, except that some assumptions on the derivatives of the kernel K are required; these are satisfied by most commonly used kernel functions. (C5) is the bandwidth condition used in the theorems. (C6) is a standard condition in the study of validation sampling and is reasonable in practice. (C7) and (C8) are mild conditions used in the proof of Theorem 2, which guarantee the consistency of the moment estimators in Eq. (2.8). Overall, the conditions imposed here are mild.

Due to the relationship between m(z) and M(z), the performance of the estimator \hat{M}(z) plays an important role in the convergence rate of \hat{m}(z). Moreover, the double local linear regression \hat{M}(z) is rather complicated: its theoretical properties cannot be derived by the standard techniques for LLKEs, since it involves two error terms, \xi and \eta. Hence, we begin by studying the asymptotic properties of \hat{M}(z).

Theorem 1. Suppose Conditions (C1)-(C6) hold. Then

\sqrt{N h_N}\,(\hat{M}(z) - M(z) - B(z)) \xrightarrow{L} N(0, V(z)),   (2.11)

where

V(z) = \frac{[\mathrm{Var}(\xi) + \lambda \mathrm{Var}(\eta) M'^2(z)] \int K^2(w)\,dw}{f_U(z)},

B(z) = \frac{1}{2} M''(z) h_N^2 \int K(w) w^2\,dw + O(b_n^2).   (2.12)

Furthermore, if U is invertible,

B(z) = \frac{1}{2} M''(z) h_N^2 \int K(w) w^2\,dw - \frac{1}{2} M'(z) U''(U^{-1}(z)) b_n^2 \int L(w) w^2\,dw,

where U^{-1} denotes the inverse function of U.

Remark 4. If \lambda = 0, i.e., n \gg N, the asymptotic behavior of \hat{M}(z) is the same as that of the standard LLKE (cf. Fan 1992). In practice, \lambda is usually a constant larger than 1, because the exactly measured covariates \{t_j\}_{j=N+1}^{N+n} are expensive to obtain. Even in this situation, however, \hat{M}(z) still achieves the optimal convergence rate of the standard LLKE for appropriate choices of h_N and b_n.

The next theorem establishes the convergence rate of the proposed estimator \hat{m}(z).

Theorem 2. Suppose Conditions (C1)-(C8) hold. If c_1 j^{-a_1} \leq |\alpha_{kj}| \leq c_2 j^{-a_1} and |M_{kj}| \leq c_3 j^{-a_2} for some positive constants c_1, c_2, c_3, a_1 and a_2, then

\int_{\Omega} E[\hat{m}(t) - m(t)]^2\,dt = O(N^{-1} q^{2b+1} + h_N^4 q^{2b+1} + b_n^4 q^{2b+1} + q^{-2-s}),

where b = \max(a_1, 2a_1 - a_2) and s > 0 is defined in Condition (C1).

Remark 5. The convergence rate of \hat{M}(z) in Theorem 1 implies that the mean squared error of \hat{M}_{kl} - M_{kl} is of order O_p(N^{-1} + h_N^4 + b_n^4). Similarly, the mean squared error of \hat{\alpha}_{kl} - \alpha_{kl} is of order O_p(n^{-1} + b_n^4), as shown in the proof of Theorem 2. Combining these rates yields the first three terms of the mean integrated squared error of \hat{m}(z) in Theorem 2. Condition (C5) and Theorem 2 motivate choosing h_N and b_n in the range [N^{-1/3}, N^{-1/4}], since then the mean integrated squared error of \hat{m}(t) depends mainly on N and q. In particular, undersmoothing at the stage of estimating M decreases the estimation bias at the expense of some variance, but further smoothing at the stage of the Fourier transformation can compensate for the increase in variance.

2.4. Smoothing Parameters Selection

As with any nonparametric regression procedure, an important choice to be made is the amount of local averaging performed to obtain the regression estimate. For local polynomial regression estimators, bandwidth selection rules were considered by Ruppert et al. (1995) and Fan and Gijbels (1996), among others. Delaigle et al. (2006) proposed an automatic way of choosing the smoothing parameters q and h_N, combining an existing plug-in bandwidth selector for the LLKE with a cross-validation (CV) rule for the trigonometric series. Since our estimator \hat{m}(z) involves three regularization parameters, h_N, b_n and q, we present the following modification of the leave-one-out CV selection criterion.

First, we use b_n = \hat{b}_c n^{-1/20}, where \hat{b}_c is the delete-one CV bandwidth selector for the validation sample, that is, the minimizer of

CV(b_n) = \frac{1}{n} \sum_{j=N+1}^{N+n} (t_j - \hat{U}^{(-j)}(\tilde{t}_j; b_n))^2,

where \hat{U}^{(-j)}(\cdot; b_n) denotes the leave-one-out version of \hat{U} using b_n. The multiplier n^{-1/20} is recommended partly because of Condition (C5) and because the cross-validation selector attains the optimal O_p(n^{-1/5}) rate (Härdle et al. 1988). This bandwidth yields a certain degree of undersmoothing of \hat{U}, which is appropriate in view of the intuition provided in Remark 5.

After obtaining b_n and the corresponding \hat{U} with the selected bandwidth, a cross-validation criterion for selecting h_N and q chooses

(\hat{h}_N, \hat{q}) = \arg\min_{h_N, q} \frac{1}{N} \sum_{i=1}^{N} [Y_i - \hat{m}^{(-i)}(\hat{U}(\tilde{t}_i); h_N, q)]^2,

where \hat{m}^{(-i)}(\cdot; h_N, q) denotes the version of \hat{m} constructed by omitting (\tilde{t}_i, Y_i) from the surrogate data, using bandwidth h_N and q terms in the trigonometric series. In this criterion, \hat{U}(\tilde{t}_i) is used as an empirical approximation of t_i, ignoring the error term \eta_i, because in practical applications the true regressor t_i is not observable. Note that since q is an integer, this two-dimensional CV criterion does not require extensive computational effort. Furthermore, based on our numerical results, it suffices to choose q between 1 and 5, in accordance with the empirical findings of Delaigle et al. (2006). For instance, when (n, N) = (60, 120), it takes less than 3 seconds to complete the whole curve fit on 201 grid points using a Pentium-M 2.4GHz CPU.
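A direct, if slow, sketch of the delete-one CV criterion CV(b_n) is given below. The data and the candidate bandwidth grid are toy assumptions of this illustration; a Gaussian kernel is used here so that every leave-one-out window has positive weight.

```python
import numpy as np

def cv_bandwidth(t_tilde, t, bandwidths, kernel):
    """Delete-one CV choice of b_c, then the undersmoothed b_n = b_c * n^(-1/20)."""
    n = len(t)
    scores = []
    for b in bandwidths:
        sq_err = 0.0
        for j in range(n):
            keep = np.arange(n) != j            # leave observation j out
            d = t_tilde[keep] - t_tilde[j]
            w = kernel(d / b)
            s1 = np.sum(w * d)
            s2 = np.sum(w * d ** 2)
            wt = w * (s2 - d * s1)              # local linear weights
            pred = np.sum(wt * t[keep]) / np.sum(wt)
            sq_err += (t[j] - pred) ** 2
        scores.append(sq_err / n)               # CV(b)
    b_c = bandwidths[int(np.argmin(scores))]
    return b_c * n ** (-1 / 20)                 # undersmoothing multiplier

gauss = lambda u: np.exp(-0.5 * u ** 2)          # Gaussian kernel

rng = np.random.default_rng(2)
tt = np.sort(rng.uniform(-1, 1, 50))             # validation surrogates
tv = tt ** 2 - 0.25 + rng.uniform(-0.5, 0.5, 50) # validation true covariates
b_n = cv_bandwidth(tt, tv, np.linspace(0.1, 0.6, 6), gauss)
```

The subsequent two-dimensional search over (h_N, q) would wrap the same leave-one-out idea around the full estimator \hat{m}^{(-i)}.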

3. Numerical Performance Assessment

We conduct a simulation study to evaluate the numerical properties of the proposed estimator, using the smoothing parameter selection method described in Section 2.4. The variety of regression functions and combinations of parameters is too large to allow a comprehensive, all-encompassing comparison. Our goal is to show the effectiveness and robustness of the proposed estimator \hat{m}(z), so we choose only certain representative examples for illustration.

Figure 3.1: The estimated curves of \hat{M}(z) for regression function (I) through 1,000 repetitions when (a): (n, N) = (30, 120), U given by (i); (b): (n, N) = (30, 120), U given by (iii). The solid, dashed, dotted, and two dashed-dotted curves represent M(z), the median of \hat{m}_2(z), the median of \hat{M}(z), and the quartiles of \hat{M}(z), respectively.

We consider two regression functions m(\cdot) taken from the examples of Delaigle et al. (2006):

(I) m(z) = (1 - z^2)^2 I_{z \in [-1,1]};

(II) m(z) = (1 - z^2)^2 \exp(2z) I_{z \in [-1,1]},

where I_{\cdot} is the indicator function. Three cases for U are considered: (i) t = \tilde{t} + \delta, the classical Berkson model; (ii) t = \tilde{t}^2 - 1/4 + \delta; (iii) t = \cos(2\pi\tilde{t}) + \delta, where \delta is independent of \tilde{t} and follows the uniform distribution U[-1, 1]. The \tilde{t}'s were generated from two distributions: (1) \tilde{t} \sim N(0, 0.5^2); (2) \tilde{t} \sim \exp(1) - 1, the centered standard exponential distribution. Throughout this section the \varepsilon's are assumed to follow the normal distribution N(0, 0.25^2). For each choice of m, U and the distribution of \tilde{t}, 1,000 simulated datasets were generated for each sample size combination (n, N) = (30, 120), (60, 120). The kernel functions K and L used in (2.4) are both chosen to be the Epanechnikov kernel 0.75(1 - x^2)I(-1 \leq x \leq 1), which has certain optimal properties (cf. Fan and Gijbels 1996).
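One simulation setting can be generated as in the sketch below. This is our own toy generator, following the description above for regression function (I), error model (ii) and surrogate design (1); it is not the authors' simulation code.

```python
import numpy as np

def generate(n, N, rng):
    """Toy generator: validation pairs (t_tilde_j, t_j), primary pairs (Y_i, t_tilde_i)."""
    m = lambda z: (1 - z ** 2) ** 2 * (np.abs(z) <= 1)   # regression function (I)
    tt_v = rng.normal(0, 0.5, n)                          # surrogate design (1)
    t_v = tt_v ** 2 - 0.25 + rng.uniform(-1, 1, n)        # error model (ii)
    tt_p = rng.normal(0, 0.5, N)
    t_p = tt_p ** 2 - 0.25 + rng.uniform(-1, 1, N)        # latent true covariates
    Y = m(t_p) + rng.normal(0, 0.25, N)                   # model (1.1); t_p unobserved
    return (tt_v, t_v), (Y, tt_p)

(tt_v, t_v), (Y, tt_p) = generate(30, 120, np.random.default_rng(1))
```

Note that the primary sample discards t_p: only (Y_i, \tilde{t}_i) would be passed to the estimator, exactly as in the setting of Section 1.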

Figure 3.2: Median curves of 1,000 estimators of regression function (I) with (n, N) = (60, 120) when (a): U given by (i), f_{\tilde{t}} given by (1); (b): U given by (ii), f_{\tilde{t}} given by (1). The solid, dashed, dotted, and dashed-dotted curves represent m(z), \hat{m}(z), \hat{m}_1(z) and \hat{m}_2(z), respectively.

It is challenging to compare the proposed method with alternatives, since there is no directly comparable method in the literature. Here, we consider Delaigle et al.'s (2006) method, in which a simple Berkson error structure is assumed (denoted \hat{m}_1). Another possible alternative is the naive LLKE based on the dataset \{(Y_i, \tilde{t}_i)\}_{i=1}^{N} (denoted \hat{m}_2), although it is not consistent because it ignores the measurement error. For \hat{m}_1, the parameters are determined by the method proposed in Delaigle et al. (2006). For \hat{m}_2, the leave-one-out CV approach is used to choose the bandwidth h_N.

First, as the performance of our final estimator \hat{m}(z) relies heavily on the estimator \hat{M}(z), it is interesting to see how well \hat{M}(z) works. Figure 3.1 shows the regression function M(z), the curves of the median and quartiles of 1,000 estimates \hat{M}(z), and the median of \hat{m}_2(z) for two different settings with regression function (I). In both settings, the first f_{\tilde{t}} is considered. Since a closed form of the true function M(z) is not available, we approximate it by simulation with 100,000 repetitions. In the first setting, where U is the additive Berkson error model, \hat{m}_2(z) is certainly a consistent estimator, as we can see from Figure 3.1(a). However, this is not the case for the second setting, in

Figure 3.3: Median curves of 1,000 estimators of regression function (II) with (n, N) = (60, 120) when (a): U given by (i), f_{\tilde{t}} given by (2); (b): U given by (ii), f_{\tilde{t}} given by (2). The solid, dashed, dotted, and dashed-dotted curves represent m(z), \hat{m}(z), \hat{m}_1(z) and \hat{m}_2(z), respectively.

which the nonlinear error structure (iii) is considered. In this case, \hat{m}_2(z) is far from the true M(z), as we would expect, but \hat{M}(z) still performs reasonably well, as guaranteed by Theorem 1.

Table 3.1: The estimated MISE comparison for the estimators \hat{m}(z), \hat{m}_1(z) and \hat{m}_2(z).

                                  \hat{m}(z)          \hat{m}_1(z)        \hat{m}_2(z)
(n, N)      U     f_{\tilde{t}}   (I)      (II)       (I)      (II)       (I)      (II)
(30, 120)   (i)   (1)             1.73e-2  7.32e-2    1.05e-2  2.54e-2    1.14e-1  2.65e-1
            (i)   (2)             1.53e-2  6.95e-2    1.10e-2  2.35e-2    1.11e-1  2.74e-1
            (ii)  (1)             3.69e-2  8.95e-2    9.80e-2  1.47e-1    1.33e-1  3.67e-1
            (ii)  (2)             3.40e-2  8.19e-2    9.66e-2  1.39e-1    1.28e-1  3.68e-1
(60, 120)   (i)   (1)             1.25e-2  6.22e-2    7.49e-3  1.96e-2    1.01e-1  2.60e-1
            (i)   (2)             1.07e-2  5.80e-2    7.18e-3  1.75e-2    9.85e-2  2.48e-1
            (ii)  (1)             2.55e-2  7.77e-2    9.64e-2  1.39e-1    1.23e-1  3.45e-1
            (ii)  (2)             2.22e-2  7.49e-2    1.03e-1  1.58e-1    1.16e-1  3.31e-1

Figures 3.2 and 3.3 show the regression function curve m(z) and the curves of the medians of 1,000 estimates \hat{m}(z), \hat{m}_1(z) and \hat{m}_2(z) under different settings of U and f_{\tilde{t}} for (n, N) = (60, 120), for the two examples (I) and (II) respectively. From these two figures, we observe that \hat{m}(z) captures the patterns of the true curves as well as \hat{m}_1(z) does, although \hat{m}(z) tends to have larger bias at boundaries and peaks due to its explicit dependence on the size of the validation dataset. Taking the sample sizes and noise levels into account, the proposed estimator \hat{m}(z) and the smoothing parameter selection method appear to perform very well for the test functions considered in this study. In comparison, \hat{m}_2(z) clearly fails to produce correct function curves, as expected.

Figure 3.4: Median curves of 1,000 estimators with (n, N) = (60, 120) when (a): m given by (I), U given by (iii), f_{\tilde{t}} given by (2); (b): m given by (II), U given by (iii), f_{\tilde{t}} given by (2). The solid, dashed, dotted, and dashed-dotted curves represent m(z), \hat{m}(z), \hat{m}_1(z) and \hat{m}_2(z), respectively.

Table 1 summarizes the results shown in Figures 3.2-3.3 numerically. The estimated mean integrated squared errors (MISE), evaluated on a grid of 201 equidistant values of z in [−1, 1], are presented. We can see that m̂1(z) has a certain advantage over m̂(z) when function (i) is considered, because its assumption on the error model is correct in that case. m̂(z) has much smaller MISE than m̂1(z) when the nonlinear Berkson model (ii) is used. In these cases, the performance of m̂(z) improves considerably in terms of MISE as the sample sizes increase, whereas m̂1(z) is hardly affected by those changes. It is worth pointing out that the superiority of m̂(z) over m̂1(z) may become more significant as the difference between U and the simple linear model gets more prominent. Figure 3.4 shows the median curve comparison when U is chosen as (iii) with (n, N) = (60, 120). In both examples of this figure, m̂1(z) completely


Nonparametric Regression with Validation Data 15

fails to capture the main profile of the underlying regression function. Note that in all the cases listed in Table 1, m̂2(z) is totally incorrect, which indicates that the proposed method is necessary.
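For concreteness, the grid-based MISE criterion described above can be sketched as follows; this is an illustrative reimplementation, not the authors' code, and `estimates` stands for hypothetical replicated estimates of m on the grid (the test function is our own):

```python
import numpy as np

def trapezoid(y, x):
    """Trapezoidal rule along the last axis (avoids NumPy version issues)."""
    return np.sum((y[..., 1:] + y[..., :-1]) * np.diff(x) / 2.0, axis=-1)

def mise(estimates, truth, grid):
    """Estimated MISE: average over replications of the integrated squared error."""
    return trapezoid((estimates - truth) ** 2, grid).mean()

z = np.linspace(-1.0, 1.0, 201)        # the 201 equidistant points in [-1, 1]
m_true = np.sin(np.pi * z)             # stand-in for the true regression function
reps = np.tile(m_true, (1000, 1))      # 1000 hypothetical "perfect" replications
print(mise(reps, m_true, z))           # 0.0 for error-free estimates
```

A constant vertical offset of 0.1 on every replication would give MISE ≈ 0.1² × 2 = 0.02, which is a quick sanity check on the implementation.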

4. A Real-Data Example

In this section, we apply the proposed approach to a dataset of enzyme reaction speeds collected in 1974. The reaction speed (Y) is computed from the number of radioactive particles produced by the reaction per minute at basal density t. The objective of this analysis is to relate Y to the basal density t. There are two ways to measure the basal density: a simple chemical method, which cannot assess it accurately because of measurement errors, and an expensive procedure based on a precision machine tool, which produces a more accurate measure of the basal density for a small subset of subjects enrolled in the study. The basal density obtained by the chemical method is used as the surrogate variable t̃, and the corresponding exact measure for the small subset of subjects is used as the validation variable t.

This dataset has been analyzed by Stute et al. (2007), in which a nonparametric fit is applied to relate t and t̃, and the following nonlinear model is considered as the underlying regression function m,
\[
m(t,\beta) = \frac{\beta_1 t}{\beta_2 + t}. \tag{4.1}
\]

The dataset comprises n = 10 validation observations and N = 30 surrogate observations. Here, we apply the proposed methodology to this dataset. The smoothing parameter selection method given in Section 2.4 yields the three parameters b_n = 0.453, h_N = 0.144 and q = 3. Figure 4.1 displays the corresponding estimated curve m̂ along with the curve produced by the parametric model (4.1) with β1 = 212.7 and β2 = 0.06484 obtained from Stute et al. (2007).
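Model (4.1) has the familiar Michaelis-Menten form, so the parametric fit can be evaluated directly from the reported coefficients (a small illustrative snippet, not part of the original analysis):

```python
def m_parametric(t, beta1=212.7, beta2=0.06484):
    """Michaelis-Menten regression function m(t, beta) = beta1 * t / (beta2 + t),
    with the coefficient values reported by Stute et al. (2007) as defaults."""
    return beta1 * t / (beta2 + t)

# At t = beta2 the curve reaches half of its asymptote beta1:
print(m_parametric(0.06484))   # beta1 / 2 = 106.35
print(m_parametric(1.0))       # close to, but below, the asymptote 212.7
```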

It is readily seen that the two curves have similar patterns, although the estimator we propose looks coarser, because the nonparametric smoother absorbs considerably more degrees of freedom than parametric approaches do. In addition, m̂(z) differs markedly from the parametric fit at the right boundary. This is not surprising because of the lack of data in the interval t ∈ [0.8, 1.0]. Thus, again it should be emphasized that, compared with parametric methods, the proposed method requires a relatively large sample size, especially at the


boundary. We think this has become a less significant limitation with advances in various technologies: new instruments can capture more information, and large amounts of data become available in modern statistical analysis.

[Figure about here: horizontal axis Basal Density (0.0 to 1.0); vertical axis Reaction Speed (100 to 200).]

Figure 4.1: Our estimator m̂(z) and the parametric model (4.1) fit of the regression function m, represented by the solid and dotted curves respectively.

5. Discussion

In the foregoing investigation, we have assumed without comment that in the validation dataset only the true and surrogate covariables are observed. In fact, in certain applications (e.g., Chen 2002), observations on the response variable are also available. In such situations, several naive nonparametric function estimators are obtainable. One is to directly use the matched pairs {(Y_j, t_j)}_{j=N+1}^{N+n}, which is apparently inefficient because the size of the validation dataset is usually relatively small. An alternative estimator applies the approach proposed in Section 2 except that the surrogate dataset is replaced with the extended sample obtained by combining {(Y_j, t̃_j)}_{j=N+1}^{N+n} and {(Y_i, t̃_i)}_{i=1}^{N}. However, this estimator ignores the information provided by {(Y_j, t_j)}_{j=N+1}^{N+n} and hence may not be efficient either. How to construct an estimator which fully incorporates the information given by the data in this case warrants further research.


Another extension is the estimation of a nonparametric curve in a high-dimensional space, when multiple covariates are contaminated with non-additive errors, by using validation data. As we know, the Fourier transformation is not directly applicable in this situation. To avoid the "curse of dimensionality", several dimension reduction models, such as the additive model, the single-index model, the partially linear model and the varying coefficient model, have been thoroughly studied and widely applied in practice. It is of interest to incorporate these dimension reduction methods into our proposed methodology to solve high-dimensional problems with validation data. In particular, the case in which only one covariate is measured with error in a multi-covariate model, say y = m(t, x) + ε, where t is measured with error but x is measured exactly, often occurs in real data. How to deal with this kind of problem is an important and interesting topic for future study. For instance, consider the commonly used partially linear model y = m(t) + xᵀβ + ε and the error model t = U(t̃) + η. To solve this EV problem, using an idea similar to that of Section 2, we may first rewrite the model as y − xᵀβ = M(U(t̃)) + ξ and regard y − xᵀβ as a working response Y(β), which is a convention in partially linear model inference. The next step is to estimate the coefficients β and M(z) simultaneously. Finally, the Fourier transformation method suggested in Delaigle et al. (2006) can again be employed to recover the nonparametric function m(z).
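As an illustration of this profiling idea only (with U taken as known and no measurement-error correction, so this is a caricature of the proposed procedure rather than the procedure itself), one can alternate between smoothing the working response and updating β by least squares; the smoother, bandwidth, data-generating model and function names below are our own illustrative choices:

```python
import numpy as np

def nw_smooth(x0, x, y, h):
    """Nadaraya-Watson smoother with a Gaussian kernel (a simple stand-in
    for the local linear smoother used in the paper)."""
    w = np.exp(-0.5 * ((x0[:, None] - x[None, :]) / h) ** 2)
    return (w * y).sum(axis=1) / w.sum(axis=1)

def profile_plm(y, x, u, h=0.1, n_iter=20):
    """Alternate between M (kernel smoothing of the working response
    Y(beta) = y - x @ beta) and beta (least squares on the residuals),
    for the model y = M(u) + x @ beta + noise."""
    beta = np.zeros(x.shape[1])
    for _ in range(n_iter):
        M = nw_smooth(u, u, y - x @ beta, h)            # smooth Y(beta) on u
        beta = np.linalg.lstsq(x, y - M, rcond=None)[0] # update beta
    return beta, M

rng = np.random.default_rng(0)
u = rng.uniform(-1, 1, 400)
x = rng.normal(size=(400, 2))
y = np.sin(np.pi * u) + x @ np.array([1.0, -0.5]) + 0.1 * rng.normal(size=400)
beta, _ = profile_plm(y, x, u)
print(beta)   # close to the true coefficients [1.0, -0.5]
```

In the errors-in-variables setting of interest, u would have to be replaced by Û(t̃) estimated from the validation data, followed by the Fourier-transform deconvolution step.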

Acknowledgment

The authors thank the editor, the associate editor, and two referees for many constructive comments and suggestions which greatly improved the quality of the paper. This research was supported by the NSF of Tianjin Grant 07JCYBJC04300 and the NNSF of China Grants 10771107 and 10711120448.

Appendix: Proofs of Theorems

Unless otherwise stated, the subscript i runs from 1 to N and j from N+1 to N+n, indexing the surrogate data and the validation data respectively. Let C1, C2, ... denote generic positive constants not depending on M, h_N, b_n, n or N.

As Theorem 1 and its proof play an important role in establishing the other results, we detail the steps of that proof. First, we state the following necessary lemma. In what follows, we assume the function U is invertible; otherwise, the terms involving U⁻¹ can be expressed as O(1) and the corresponding convergence rate can still be achieved.

Lemma 1. Suppose Conditions (C1)-(C6) hold. Then

(i) S_{N0} \stackrel{P}{\longrightarrow} f_U(z);

(ii) S_{N1}^2 = o_p(S_{N0}S_{N2}) and S_{N1}S_{N3} = o_p(S_{N2}^2);

(iii) S_{N1}\phi_{N1} = o_p(S_{N2}\phi_{N0}), where
\[
\phi_{Nk} = \frac{1}{Nh_N}\sum_i K\Big(\frac{\hat U(\tilde t_i)-z}{h_N}\Big)\big(\hat U(\tilde t_i)-z\big)^k\,\xi_i,\qquad k=0,1.
\]

Proof. (i) Using a second-order Taylor expansion, we have
\[
S_{N0} := \hat f_U(z)
= \frac{1}{Nh_N}\sum_i K\Big(\frac{U(\tilde t_i)-z}{h_N}\Big)
+ \frac{1}{Nh_N}\sum_i K'\Big(\frac{U(\tilde t_i)-z}{h_N}\Big)\,\frac{\hat U(\tilde t_i)-U(\tilde t_i)}{h_N}
+ \frac{1}{2Nh_N}\sum_i K''\Big(\frac{U(\tilde t_i)-z}{h_N}\Big)\,\frac{\big(\hat U(\tilde t_i)-U(\tilde t_i)\big)^2}{h_N^2} + D_4
:= D_1 + D_2 + D_3 + D_4.
\]
Standard density estimation theory (Parzen 1962) gives D_1 = f_U(z)(1+o_p(1)). Next, we deal with D_2 and D_3, respectively.

First, according to the asymptotic properties of the LLKE (Fan and Gijbels 1996), we have
\[
\hat U(\tilde t_i)-U(\tilde t_i)
= \frac{1}{nb_n f_{\tilde t}(\tilde t_i)}\sum_j \big(t_j - U(\tilde t_j)\big) L\Big(\frac{\tilde t_j-\tilde t_i}{b_n}\Big)
+ \frac{1}{2nb_n f_{\tilde t}(\tilde t_i)}\sum_j \Big[U''(\tilde t_i)\big(\tilde t_j-\tilde t_i\big)^2\Big] L\Big(\frac{\tilde t_j-\tilde t_i}{b_n}\Big)(1+o_p(1))
= (U_D + U_B)(1+o_p(1)). \tag{A.1}
\]
Following Condition (C5), the uniform convergence rate of the local linear smoother (e.g., Carroll et al. 1997) reveals that
\[
\sup_{\tilde t\in\Omega}\big|\hat U(\tilde t)-U(\tilde t)\big| = O_p(\gamma_n). \tag{A.2}
\]


Then D_2 can be decomposed into the following two terms,
\[
D_2 = \Big(\frac{1}{Nh_N^2}\sum_i K'\Big(\frac{U(\tilde t_i)-z}{h_N}\Big)\,U_D
+ \frac{1}{Nh_N^2}\sum_i K'\Big(\frac{U(\tilde t_i)-z}{h_N}\Big)\,U_B\Big)(1+o_p(1))
:= [D_{2V}+D_{2B}](1+o_p(1)).
\]

Simple calculations yield that
\[
D_{2B} = \frac{b_n^2}{2Nh_N^2}\sum_i U''(\tilde t_i)\,K'\Big(\frac{U(\tilde t_i)-z}{h_N}\Big)\int w^2L(w)\,dw
= U''\big(U^{-1}(z)\big)\,f_U(z)\int w^2L(w)\,dw \times \int K'(w)\,dw\;\frac{b_n^2}{2h_N}\,(1+o_p(1)), \tag{A.3}
\]

\[
D_{2V} = \frac{1}{nb_nh_N^2}\sum_j \big(t_j-U(\tilde t_j)\big)\,
E\left[\frac{K'\big(\frac{U(\tilde t)-z}{h_N}\big)\,L\big(\frac{\tilde t_j-\tilde t}{b_n}\big)}{f_{\tilde t}(\tilde t)}\,\middle|\,\tilde t_j\right]
+ \frac{1}{nb_nh_N^2}\sum_j \big(t_j-U(\tilde t_j)\big)\,\frac{1}{N}\sum_i\left\{
\frac{K'\big(\frac{U(\tilde t_i)-z}{h_N}\big)\,L\big(\frac{\tilde t_j-\tilde t_i}{b_n}\big)}{f_{\tilde t}(\tilde t_i)}
- E\left[\frac{K'\big(\frac{U(\tilde t)-z}{h_N}\big)\,L\big(\frac{\tilde t_j-\tilde t}{b_n}\big)}{f_{\tilde t}(\tilde t)}\,\middle|\,\tilde t_j\right]\right\}
:= D_{2V1} + \frac{1}{nb_nh_N^2}\sum_j \eta_j\,\frac{1}{N}\sum_i \zeta_{ij} := D_{2V1}+D_{2V2}.
\]

Note that, given i, the ζ_{ij}, j = N+1, ..., N+n, are independent, and likewise, given j, the ζ_{ij}, i = 1, ..., N, are independent. Thus, we have
\[
E(D_{2V1}^2) = \frac{1}{n^2h_N^4}\sum_j E\Big[\big(t_j-U(\tilde t_j)\big)^2\,K'^2\Big(\frac{U(\tilde t_j)-z}{h_N}\Big)\Big]
= \frac{\lambda\,\mathrm{Var}(\eta)\,f_U(z)}{Nh_N^3}\int K'^2(w)\,dw, \tag{A.4}
\]

and
\[
E[D_{2V2}^2] = \frac{1}{N^2n^2b_n^2h_N^4}\sum_j E\Big[\eta_j^2\Big(\sum_i \zeta_{ij}\Big)^2\Big]
= \frac{1}{N^2n^2b_n^2h_N^4}\sum_j E\Big[E\big[\eta_j^2\mid\tilde t_j\big]\sum_i E\big[\zeta_{ij}^2\mid\tilde t_j\big]\Big]
\]
\[
= \frac{1}{N^2n^2b_n^2h_N^4}\sum_j E\left\{E\big[\eta_j^2\mid\tilde t_j\big]\sum_i E\left[\Big(\frac{K'\big(\frac{U(\tilde t_i)-z}{h_N}\big)\,L\big(\frac{\tilde t_j-\tilde t_i}{b_n}\big)}{f_{\tilde t}(\tilde t_i)}\Big)^2\,\middle|\,\tilde t_j\right]\right\}(1+o(1))
= \frac{\mathrm{Var}(\eta)}{Nnb_nh_N^3}\,\frac{f_U(z)}{f_{\tilde t}\big(U^{-1}(z)\big)}\int K'^2(w)\,dw\int L^2(w)\,dw. \tag{A.5}
\]


Finally, using (A.2) we have
\[
E[|D_3|] \le \frac{1}{2Nh_N^3}\,E\Big[\sum_i \Big|K''\Big(\frac{U(\tilde t_i)-z}{h_N}\Big)\Big|\big(\hat U(\tilde t_i)-U(\tilde t_i)\big)^2\Big]
\le \sup_{\tilde t\in\Omega}\big(\hat U(\tilde t)-U(\tilde t)\big)^2\;\frac{1}{2Nh_N^3}\sum_i E\Big[\Big|K''\Big(\frac{U(\tilde t_i)-z}{h_N}\Big)\Big|\Big]
= O_p\Big(\frac{\gamma_n^2}{h_N^2}\Big),
\]
where we use the independence between η_j and t̃_j. As a consequence, Condition (C5) and the Markov inequality lead to D_3 → 0. By using Condition (C5) and similar but more cumbersome arguments, we can derive D_4 = o_p(D_3). Combining this and Eqs. (A.1)-(A.5) together completes the proof of (i).

(ii) By using arguments similar to those in (i), with tedious algebra, together with Conditions (C2) and (C5), we can obtain

\[
S_{N1} = O_p\big(h_N^2 + b_n^2 + \gamma_n^2/h_N\big),\qquad
S_{N2} = O_p\big(h_N^2 + \gamma_n h_N\big),\qquad
S_{N3} = O_p\big(h_N^4 + \gamma_n h_N^2\big),
\]
which directly lead to result (ii).

(iii) The proof follows from similar but tedious calculations, and hence is

omitted. ¤
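The local linear smoother Û fitted to the validation pairs, whose expansion (A.1) drives the proof above, can be sketched as follows; this is an illustrative implementation with an Epanechnikov kernel and a simulated error model of our own choosing, not the authors' code:

```python
import numpy as np

def local_linear(x0, x, y, b):
    """Local linear estimate of E[y | x] at the points x0, with bandwidth b."""
    out = np.empty_like(x0)
    for k, t in enumerate(x0):
        d = x - t
        w = np.maximum(1.0 - (d / b) ** 2, 0.0)          # Epanechnikov weights
        s0, s1, s2 = w.sum(), (w * d).sum(), (w * d * d).sum()
        t0, t1 = (w * y).sum(), (w * d * y).sum()
        out[k] = (s2 * t0 - s1 * t1) / (s0 * s2 - s1 ** 2)
    return out

# Simulated validation pairs (t_j, ttilde_j) with t = U(ttilde) + eta,
# where U(s) = s**3 + s (an arbitrary smooth, invertible choice):
rng = np.random.default_rng(1)
ttilde = rng.uniform(-1, 1, 500)
t = ttilde ** 3 + ttilde + 0.05 * rng.normal(size=500)
grid = np.linspace(-0.8, 0.8, 50)                 # interior points, away from boundary
Uhat = local_linear(grid, ttilde, t, b=0.15)
print(np.max(np.abs(Uhat - (grid ** 3 + grid))))  # sup-norm error, small
```

The printed sup-norm error is a finite-sample counterpart of the rate γ_n in (A.2).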

Proof of Theorem 1:

By Lemma 1, collecting the leading terms, \hat M(z)-M(z) can be represented in the following form:
\[
\hat M(z)-M(z)
= \frac{\frac{1}{Nh_N}\sum_i K\big(\frac{\hat U(\tilde t_i)-z}{h_N}\big)\big[Y_i-M(\hat U(\tilde t_i))\big]}{f_U(z)}
+ \frac{\frac{M''(z)}{2Nh_N}\sum_i K\big(\frac{\hat U(\tilde t_i)-z}{h_N}\big)\big[\hat U(\tilde t_i)-z\big]^2}{f_U(z)}\,(1+o_p(1))
:= \frac{M_V+M_B}{f_U(z)}(1+o_p(1)),
\]


where M_B and M_V can be rewritten as
\[
M_B = \frac{1}{2Nh_N}\sum_i K\Big(\frac{\hat U(\tilde t_i)-z}{h_N}\Big)\,M''(z)\big[U(\tilde t_i)-z\big]^2
+ \frac{1}{2Nh_N}\sum_i K\Big(\frac{\hat U(\tilde t_i)-z}{h_N}\Big)\,M''(z)\big[\hat U(\tilde t_i)-U(\tilde t_i)\big]^2\,(1+o_p(1))
:= M_{B1} + M_{B2}(1+o_p(1)),
\]
\[
M_V = \frac{1}{Nh_N}\sum_i K\Big(\frac{\hat U(\tilde t_i)-z}{h_N}\Big)\big[Y_i-M(U(\tilde t_i))\big]
- \frac{1}{Nh_N}\sum_i K\Big(\frac{\hat U(\tilde t_i)-z}{h_N}\Big)\big[M(\hat U(\tilde t_i))-M(U(\tilde t_i))\big]
:= M_{V1}-M_{V2}.
\]

For M_{B1}, a Taylor expansion of the kernel function yields
\[
M_{B1} = \frac{M''(z)}{2Nh_N}\sum_i K\Big(\frac{U(\tilde t_i)-z}{h_N}\Big)\big(U(\tilde t_i)-z\big)^2
+ \frac{M''(z)}{2Nh_N^2}\sum_i K'\Big(\frac{U(\tilde t_i)-z}{h_N}\Big)\big(\hat U(\tilde t_i)-U(\tilde t_i)\big)\big(U(\tilde t_i)-z\big)^2\,(1+o_p(1))
:= M_{B11} + M_{B12}(1+o_p(1)).
\]

Simple calculations yield that
\[
M_{B11} = \frac{1}{2h_N}\,E\Big[K\Big(\frac{U(\tilde t)-z}{h_N}\Big)\,M''(z)\big(U(\tilde t)-z\big)^2\Big](1+o_p(1))
= \frac{1}{2}\int K(w)w^2\,dw\;M''(z)\,f_U(z)\,h_N^2 + o_p(h_N^2),
\]
which is the same as the bias term of the LLKE. As for M_{B12}, using (A.2),
\[
M_{B12} \le \sup_{\tilde t\in\Omega}\big(\hat U(\tilde t)-U(\tilde t)\big)\,\big|M''(z)\big|\,\frac{1}{2Nh_N^2}\sum_i K'\Big(\frac{U(\tilde t_i)-z}{h_N}\Big)\big(U(\tilde t_i)-z\big)^2
= O_p(\gamma_n h_N).
\]
Hence, we conclude that
\[
M_{B1} = \frac{1}{2}\int K(w)w^2\,dw\;M''(z)\,f_U(z)\,h_N^2 + o_p(h_N^2). \tag{A.6}
\]


Similarly, we can also obtain
\[
M_{B2} = O_p(\gamma_n^2). \tag{A.7}
\]
Then (A.6), (A.7) and Condition (C5) together lead to
\[
M_B = \frac{1}{2}\int K(w)w^2\,dw\;M''(z)\,f_U(z)\,h_N^2 + o_p(h_N^2). \tag{A.8}
\]
Now we turn to the term M_V. First, by using a second-order Taylor expansion and (A.1), we can prove that
\[
M_{V1} = \Delta_1 + O_p\big(b_n^2(Nh_N^3)^{-1/2} + (nb_nNh_N)^{-1/2}\big), \tag{A.9}
\]
where \Delta_1 = \frac{1}{Nh_N}\sum_i K\big(\frac{U(\tilde t_i)-z}{h_N}\big)\xi_i. A simple calculation yields that
\[
\mathrm{Var}(M_{V1}) = \frac{\mathrm{Var}(\xi)\,f_U(z)}{Nh_N}\int K^2(w)\,dw\,(1+o(1)). \tag{A.10}
\]

Finally, let us focus on the term M_{V2}, which contributes the main term of the variance:
\[
M_{V2} = \Big\{\frac{1}{Nh_N}\sum_i K\Big(\frac{U(\tilde t_i)-z}{h_N}\Big)\,M'(U(\tilde t_i))\big(\hat U(\tilde t_i)-U(\tilde t_i)\big)
+ \frac{1}{Nh_N^2}\sum_i K'\Big(\frac{U(\tilde t_i)-z}{h_N}\Big)\,M'(U(\tilde t_i))\big(\hat U(\tilde t_i)-U(\tilde t_i)\big)^2\Big\}(1+o_p(1))
:= \{M_{V21}+M_{V22}\}(1+o_p(1)),
\]
where we can show that M_{V22} = O_p(\gamma_n^2/h_N), while

\[
M_{V21} = \Big\{\frac{1}{Nh_N}\sum_i K\Big(\frac{U(\tilde t_i)-z}{h_N}\Big)\,M'(U(\tilde t_i))\big(U_D+U_B/2\big)\Big\}(1+o_p(1))
\]
\[
= \Big\{\frac{1}{nh_N}\sum_j \eta_j\,K\Big(\frac{U(\tilde t_j)-z}{h_N}\Big)\,M'(U(\tilde t_j))
+ \frac{1}{2Nh_N}\sum_i K\Big(\frac{U(\tilde t_i)-z}{h_N}\Big)\,M'(U(\tilde t_i))\,U''(\tilde t_i)\,b_n^2\int L(w)w^2\,dw\Big\}(1+o_p(1))
\]
\[
=: \Big\{\Delta_2 + \frac{1}{2}\,M'(z)\,U''\big(U^{-1}(z)\big)\,f_U(z)\,b_n^2\int L(w)w^2\,dw\Big\}(1+o_p(1)). \tag{A.11}
\]


Note that
\[
E[\Delta_2]=0,\qquad
E[\Delta_2^2] = \frac{\mathrm{Var}(\eta)}{nh_N}\,M'^2(z)\,f_U(z)\int K^2(w)\,dw. \tag{A.12}
\]
Recalling the independence between the validation data and the surrogate data, and combining Eqs. (A.8)-(A.12), the central limit theorem leads to the result. ¤

Proof of Theorem 2:

Based on the proof of Theorem 1, we can see that
\[
\hat M(z)-M(z) = \big(\Delta_1(z)+\Delta_2(z)+B(z)\big)(1+o_p(1)), \tag{A.13}
\]
where \Delta_1, \Delta_2 and B are defined in (A.9), (A.11) and (2.12). By using (A.13) it can be seen that
\[
\big|\mathrm{Cov}[\hat M(z_1),\hat M(z_2)]\big|
\le \Big|\frac{\mathrm{Var}(\eta)}{n^2h_N^2}\sum_j K\Big(\frac{z_1-U(\tilde t_j)}{h_N}\Big) K\Big(\frac{z_2-U(\tilde t_j)}{h_N}\Big)\big[M'(U(\tilde t_j))\big]^2\Big|
+ \Big|\frac{\mathrm{Var}(\xi)}{N^2h_N^2}\sum_i K\Big(\frac{z_1-U(\tilde t_i)}{h_N}\Big) K\Big(\frac{z_2-U(\tilde t_i)}{h_N}\Big)\Big|
+ o\big(h_N^4+b_n^4+(Nh_N)^{-1}\big)
\]
\[
\le \frac{C_1}{Nh_N}\,K*K\Big(\frac{z_2-z_1}{h_N}\Big) + o\Big(h_N^4+b_n^4+\frac{1}{Nh_N}\Big). \tag{A.14}
\]

Eqs. (A.13), (A.14) and the definitions of \hat M_0 and \hat M_{kl} imply that
\[
E(\hat M_0-M_0)^2 \le C_2N^{-1}+C_3h_N^4+C_4b_n^4,\qquad
E(\hat M_{kl}-M_{kl})^2 \le C_2N^{-1}+C_3h_N^4+C_4b_n^4, \tag{A.15}
\]
uniformly in k and l. Hence, we have
\[
\int_\Omega E\big(\hat m(t)-m(t)\big)^2\,dt
= \int_\Omega E\Big(\big(\hat m_0-m_0\big)+\sum_{l=1}^{q}\big[(\hat m_{1l}-m_{1l})\cos(lt)+(\hat m_{2l}-m_{2l})\sin(lt)\big]\Big)^2 dt
+ \int_\Omega \Big(\sum_{l=q+1}^{\infty}\big[m_{1l}\cos(lt)+m_{2l}\sin(lt)\big]\Big)^2 dt
\]
\[
= 2\pi\,E(\hat m_0-m_0)^2 + \pi\sum_{l=1}^{q}\sum_{k=1}^{2} E(\hat m_{kl}-m_{kl})^2 + R_q.
\]


Under Conditions (C7) and (C8), by some modifications of the proof of Theorem 1 in Akritas and Van Keilegom (2001), we can show that
\[
E(\hat\alpha_{kj}-\alpha_{kj})^2 \le C_5 n^{-1}+C_6 b_n^4,
\]
uniformly in k and j. This property, the conditions given in Theorem 2 and Eq. (A.15) yield that
\[
\sum_{l=1}^{q}\sum_{k=1}^{2} E(\hat m_{kl}-m_{kl})^2 \le \big(C_7N^{-1}+C_8h_N^4+C_9b_n^4\big)\sum_{l=1}^{q} l^{2b}.
\]

Note that by the general theory of trigonometric series we have
\[
\sum_{l=q+1}^{\infty}\big[m_{1l}\cos(lt)+m_{2l}\sin(lt)\big] \le C_{10}\,q^{-2-s}
\]
uniformly in t. Therefore, R_q is of the order q^{-2-s}. Taking all of the above results together, we complete the proof. ¤

References

Akritas, M. G. and Van Keilegom, I. (2001), Non-parametric Estimation of the Residual Distribution.

Scand. J. Stat., 28, 549–568.

Berkson, J. (1950), Are There Two Regression Problems?. J. Am. Stat. Assoc., 45, 164–180.

Carroll, R. J., Fan, J., Gijbels, I. and Wand, M. P. (1997), Generalized Partially Linear Single-Index Models. J. Am. Stat. Assoc., 92, 477–489.

Carroll, R. J., Knickerbocker, R. K. and Wang, C. (1995), Dimension Reduction in a Semiparametric Regression Model with Errors in Covariates. Ann. Stat., 23, 161–181.

Carroll, R. J., Maca, J. D. and Ruppert, D. (1999), Nonparametric Regression in the Presence of Measurement Error. Biometrika, 86, 541–554.

Carroll, R. J., Ruppert, D., Stefanski, L. A. and Crainiceanu, C. (2006), Measurement Error in Nonlinear Models: A Modern Perspective, Second Edition. Chapman and Hall, London.

Carroll, R. J. and Stefanski, L. A. (1990), Approximate Quasi-likelihood Estimation in Models with Surrogate Predictors. J. Am. Stat. Assoc., 85, 652–663.

Carroll, R. J. and Wand, M. P. (1991), Semiparametric Estimation in Logistic Measurement Error Models. J. R. Stat. Soc. Ser. B, 53, 573–585.

Chen, Y. (2002), Cox Regression in Cohort Studies With Validation Sampling. J. R. Stat. Soc. Ser. B, 64, 51–62.

Cook, J. R. and Stefanski, L. A. (1994), Simulation-Extrapolation Estimation in Parametric Measurement Error Models. J. Am. Stat. Assoc., 89, 1314–1328.

Delaigle, A., Fan, J. and Carroll, R. J. (2009), A Design-Adaptive Local Polynomial Estimator for the Errors-in-Variables Problem. J. Am. Stat. Assoc., 104, 348–359.

Delaigle, A., Hall, P. and Qiu, P. (2006), Nonparametric Methods for Solving the Berkson Errors-in-Variables Problem. J. R. Stat. Soc. Ser. B, 68, 201–220.

Fan, J. (1992), Design-Adaptive Nonparametric Regression. J. Am. Stat. Assoc., 87, 998–1004.

Fan, J. and Gijbels, I. (1996), Local Polynomial Modelling and Its Applications. Chapman and Hall, London.

Fan, J. and Truong, Y. K. (1993), Nonparametric Regression with Errors in Variables. Ann. Stat., 21, 1900–1925.


Fuller, W. A. (1987), Measurement Error Models. John Wiley, New York.

Hardle, W., Hall, P. and Marron, J. S. (1988), How Far are Automatically Chosen Regression Smoothing Parameters from Their Optimum? (with discussion). J. Am. Stat. Assoc., 83, 86–99.

Lee, L. F. and Sepanski, J. (1995), Estimation of Linear and Nonlinear Errors-in-Variables Models Using Validation Data. J. Am. Stat. Assoc., 90, 130–140.

Parzen, E. (1962), On Estimation of a Probability Density Function and Mode. Ann. Math. Stat., 33, 1065–1076.

Pepe, M. S. (1992), Inference Using Surrogate Outcome Data and a Validation Sample. Biometrika, 79, 355–365.

Pepe, M. S. and Fleming, T. R. (1991), A General Nonparametric Method for Dealing with Errors in Missing or Surrogate Covariate Data. J. Am. Stat. Assoc., 86, 108–113.

Ruppert, D., Sheather, S. J. and Wand, M. P. (1995), An Effective Bandwidth Selector for Local Least Squares Regression. J. Am. Stat. Assoc., 90, 1257–1270.

Sepanski, J. H. and Carroll, R. J. (1993), Semiparametric Quasilikelihood and Variance Function Estimation in Measurement Error Models. J. Econometrics, 58, 223–256.

Sepanski, J. H., Knickerbocker, R. K. and Carroll, R. J. (1994), A Semiparametric Correction for Attenuation. J. Am. Stat. Assoc., 89, 1366–1373.

Sepanski, J. and Lee, L. F. (1995), Semiparametric Estimation of Nonlinear Errors-in-Variables Models with Validation Study. J. Nonparametr. Stat., 4, 365–394.

Staudenmayer, J. and Ruppert, D. (2004), Local Polynomial Regression and Simulation-Extrapolation. J. R. Stat. Soc. Ser. B, 66, 17–30.

Stute, W., Xue, L., and Zhu, L.-X. (2007), Empirical Likelihood Inference in Nonlinear Errors-in-Covariables Models with Validation Data. J. Am. Stat. Assoc., 102, 332–346.

Wang, Q. (1999), Estimation of Partial Linear Errors-in-Variables Models with Validation Data. J. Multivariate Anal., 69, 30–64.

Wang, Q. (2006), Nonparametric Regression Function Estimation with Surrogate Data and Validation Sampling. J. Multivariate Anal., 97, 1142–1161.

Wang, Q. and Rao, J. N. K. (2002), Empirical Likelihood-Based Inference in Linear Errors-in-Covariables Models with Validation Data. Biometrika, 89, 345–358.

Wittes, J., Lakatos, E. and Probstfield, J. (1989), Surrogate Endpoints in Clinical Trials: Cardiovascular Diseases. Stat. Med., 8, 415–425.

LPMC and Department of Statistics, School of Mathematical Sciences, Nankai University, Tianjin, 300071, China

E-mail: ([email protected]; [email protected]; [email protected])