
A General Nonparametric Test for Misspecification

by Ralph Bradley

Robert McClelland*

US Bureau of Labor Statistics

JEL Codes: C12, C14. Keywords: specification testing, degenerate moment functions

We establish a new consistent, uniformly most powerful test for misspecification. Unlike most previous work, we use a constrained cross-validation scheme for smoothing parameter selection, so that the test does not require arbitrary parameter selections. We address the degeneracy issue through a bootstrap procedure that also allows us to establish the asymptotic distribution of our test statistic. By allowing the use of higher-order kernels without the calculation of higher order derivatives, our test should be simpler to calculate and more powerful than similar tests. Finally, we show that our test can be applied to discrete as well as continuous regressors.

* Please address correspondence to: Robert McClelland, Room 3105, 2 Massachusetts Ave. NE, Washington, DC 20212; (202) 606-6579 x607; FAX (202) 606-7080; [email protected]. The authors are especially grateful to Anna Sanders for her extraordinarily patient editing. The views expressed in this paper are solely those of the authors and do not necessarily reflect the policy of the Bureau of Labor Statistics (BLS) or the views of other staff members of BLS.


I. Introduction

Recent literature on misspecification tests has focused on tests that are consistent against all alternatives.

For example, Wooldridge (1992) and Lee, White and Granger (1993) develop tests for neglected

nonlinearity using the same theory as Bierens (1990). While these tests have a desirable robustness, there

are serious implementation issues. Bradley and McClelland (1993, 1994), Hong and White (1993),

Stinchcombe and White (1993), Bierens and Ploberger (1994), Whang and Andrews (1993), De Jong and

Bierens (1994), Hong (1993), and Horowitz and Härdle (1994) all develop new tests based on the

underlying theory of Bierens (1990) and address these issues.1

To obtain consistency against all misspecifications, these tests necessarily use nonparametric estimators

such as kernel estimators. As a consequence, a smoothing parameter must be selected for each of these

tests. Unfortunately, selection of this parameter affects the finite sample size and power of the statistic

and only Bradley and McClelland (1993, 1994) offer any guidance in its selection in finite samples.

Most of the above tests are not only consistent against all alternatives, they are also in the class of uniformly most powerful tests (UPT). The UPT, established in Bradley and McClelland (1993) and Stinchcombe and White (1993), has as its first moment the expectation of the product of the regression

residual and the conditional expectation of the residual given the regressors. Unfortunately, because the

1 Horowitz and Härdle (1994) is mentioned for completeness. However, the test in that paper is not based on an underlying statistic that is largest under misspecification and consistent against all alternatives. To

remove problems of dimensionality when there is more than one regressor, Horowitz and Härdle use the

misspecified functional form as the argument to the kernel regression function. This feature does not

allow the test to be consistent against all alternatives and therefore is not in the class of tests that we

discuss in this paper.


conditional expectation of the residual equals zero everywhere in the domain under the null hypothesis,

tests of its correlation with other variables degenerate to a constant.

Attempts to correct this problem fall into two dissimilar categories. Wooldridge (1992) is an example of

the first category in which he places restrictions on the expansion of his series estimator to keep the

statistic from degenerating. In the second category Hong and White prevent degeneration by

standardizing the sample moment of their test at a rate greater than root-n.2 It is now widely recognized

that tests in the second category have higher power than those in the first. What has not been generally

recognized is that this occurs partly because the first solution forces the noncentrality parameter of the

tests to converge to a form that is not the UPT.

In a later paper, Hong (1993) designs a second category test that allows the smoothing parameter to

converge at an “optimal” rate so that the nonparametric estimator achieves the optimal convergence in

terms of the integrated mean square error (IMSE) criterion for second order kernel functions. Although

further reduction of the IMSE is possible through the use of higher order kernels, the use of the second

derivative of the nonparametric components in his test would require extremely messy and tedious

estimation of higher order derivatives.

A final issue that has received little attention involves the type of data sets upon which these tests may be

used. With the exception of de Jong (1991), who extends the Bierens (1990) test to time series data, these

tests are restricted by the common assumptions that all variables are independently and identically

distributed and continuous.

In this paper we establish a new consistent test for misspecification whose noncentrality parameter

converges to the UPT under misspecification. Unlike most other tests, it establishes an automatic

mechanism to select the smoothing parameter. By using a cross-validation scheme, the test makes no

2 This is the rate used for the sample moments of parametric statistical tests. See Davidson and MacKinnon (1993), pages 112-113.


demands of the researcher, such as arbitrarily selecting a smoothing parameter. Rather than use either of

the two previous approaches to the degeneracy issue, we employ a bootstrap procedure that uses variation

in the resampling to prevent degeneration of the test statistic. Finally, we show that our test can be applied

to discrete as well as continuous regressors.

II. Notation, Definitions and the Null Hypothesis

Suppose the random vector (y,x), y ∈ Y ⊂ ℜ and x ∈ X ⊂ ℜ^k on the probability space (Y×X, ℬ, F), has a joint probability distribution p(y,x) where the following holds:

$E(y|x) = f(x)$  (2.1)

$y = f(x) + \varepsilon$  (2.2)

where ε ∈ ℜ is the error of the model. It is clear that E(ε|x) = 0 for all x and E(ε) = 0. In parametric regression estimation, it is assumed that f(x) falls in a family of known real parametric functions f(x,θ) on ℜ^k×Θ, where Θ is the parameter space and is a compact subset of ℜ^q. We wish to test that a specific parameterization of f(x), denoted f(x,θ), satisfies the null hypothesis:

$H_0:\ \Pr(E(y|x) = f(x,\theta_0)) = 1\ \text{for some}\ \theta_0 \in \Theta.$  (2.3)

The alternative hypothesis is that

HA : Pr(E(y|x)=f(x,θ)) < 1 for all θ∈Θ (2.4)

In general, there are an infinite variety of ways that the alternative hypothesis in (2.4) can hold.

Given a consistent estimator for θ that varies with the sample of size n, which we denote as $\hat\theta_n$, a moment-based test, or "m-test", of the null hypothesis in (2.3) can be constructed. These m-tests are based on a sample of observations from Y×X. If the null in (2.3) is true at θ = θ_0, then using a sample of size n there is a moment function:

$m_n:\ Y{\times}X \times \Theta \times \Pi \to \Re^1$  (2.5)

such that:

$E(m_n(y_i, x_i, \theta_0, \pi)) = 0$  (2.6)

for all observations i, some parameter θ_0 in Θ, and some infinite-dimensional nuisance parameter π in Π. A statistical test is consistent against all alternatives if (2.6) does not hold whenever (2.4) is true.

Given this moment function (which we can also write as m_in(θ,π)), one may use its sample analog to construct a test of the form $M_n = n\,\hat m_n\,\hat V_n^{-1}\,\hat m_n$, where:

$\hat m_n \equiv n^{-1}\sum_{i=1}^n m_{in}(\hat\theta_n, \hat\pi_n)$  (2.7)

$\hat\pi_n$ is a consistent estimate of the true value of the nuisance parameter π_0, and $\hat V_n$ is an estimator that asymptotically converges to the variance of $a_n\hat m_n$ for some sequence $a_n = o(n^\delta)$, 0.5 < δ < 1. Under certain regularity conditions (see Newey 1994), M_n converges in distribution to a χ²(1) random variable that is invariant to the nuisance parameter π. Note, however, that these tests still generally require some nonparametric estimate of π.
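To fix ideas, the following is a minimal numerical sketch (in Python, which we use for all examples in this paper's margin notes) of a scalar m-test in the simplest parametric case, where the root-n standardization applies. The moment contributions m_i would come from whichever moment function the test uses; the function and variable names below are illustrative, not the paper's:

```python
import numpy as np
from scipy.stats import chi2

def m_test(m_i):
    """Scalar m-test: M_n = n * m_bar^2 / V_hat, asymptotically chi-square(1)
    under the null that the moment function has zero expectation."""
    n = len(m_i)
    m_bar = m_i.mean()               # sample analog, as in (2.7)
    V_hat = m_i.var(ddof=1)          # estimates the variance of sqrt(n)*m_bar
    M_n = n * m_bar ** 2 / V_hat
    return M_n, 1.0 - chi2.cdf(M_n, df=1)

# toy illustration: contributions built from a residual e and a test direction w
rng = np.random.default_rng(0)
e, w = rng.normal(size=500), rng.normal(size=500)
print(m_test(e * w))                 # a large M_n rejects E(m_in) = 0
```

The paper's own statistic, developed in Section III, replaces this root-n standardization with a bootstrap device precisely because its moment function is degenerate.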

The form of the most powerful moment tests is established in Bradley and McClelland (1993) and Stinchcombe and White (1993), where it is shown that the test must use a moment function that contains [f(x_i) − f(x_i,θ)][y_i − f(x_i,θ)]. The term in the first bracket is the conditional expectation of y_i − f(x_i,θ). To be consistent against all alternatives, the test must nonparametrically estimate this expectation. Tests of this type belong to the class of UPTs but vary in aspects of the nonparametric estimator. Letting f(⋅) be defined by (2.1) and (2.2) above and letting f(⋅,θ) be a specific parameterized version of f(⋅), some examples of degenerate moment functions that are the components of most UPTs are listed below:

$m_{in} = [f(x_i) - f(x_i,\theta)][y_i - f(x_i,\theta)]$  (2.8a)

$m_{in} = [y_i - f(x_i)]^2 - [y_i - f(x_i,\theta)]^2$  (2.8b)

The tests in Wooldridge (1992), Hong and White (1993), Horowitz and Härdle (1994), Zheng (1990), and Hong (1993) are based on (2.8a). In contrast, Yatchew (1992) and Whang and Andrews (1993) develop tests based on (2.8b).

The difficulty behind designing a test of this form is that the moment function is necessarily a degenerate moment function, i.e. m_in(θ_0, π_0) = 0 under the null. This degeneracy implies that the distribution of the test statistic converges to a point, making the statistic useless. A more formal definition is as follows:

Definition 2.1
Let m_in: Y×X×Θ×Π → ℜ¹ be generated on an i.i.d. sample from (Y×X). Suppose that plim(θ̂_n − θ) = 0 and that plim ρ(π̂_n, π_0) = 0 for some suitable metric ρ(⋅,⋅) and for π_0 ∈ Π. For each (θ,π) ∈ Θ×Π assume that m_in(θ,π) is measurable, twice Fréchet differentiable with respect to π, and twice differentiable with respect to θ. Let a_n be a nonstochastic sequence {a_n ∈ ℜ⁺: a_n n^{-1/2} → ∞, a_n n^{-1} → 0}. Denote $\bar m_n(\theta,\pi) = E[n^{-1}\sum_{i=1}^n m_{in}(\theta,\pi)]$. The moment function m_n is degenerate at (θ,π) if

(a) $a_n[\bar m_n(\hat\theta_n,\hat\pi_n) - \bar m_n(\theta,\pi)] \overset{p}{\to} 0$

(b) $a_n n^{-1/2}\,\nabla_\theta \bar m_n(\theta,\pi) \to 0$

(c) $a_n\,\delta\bar m_n(\hat\pi - \pi_0;\,\theta,\pi_0) \to 0$ for $\pi_0 \in \Pi_0 \subseteq \Pi$,

where Π_0 is some neighborhood of π_0, ∇_θ is the gradient with respect to θ, and δ is the Fréchet derivative operator.

Initially, the tests in the papers listed above attempted to impose restrictions that avoided the degeneracy. These tests are either U-statistics or von Mises statistics, and the restrictions are usually placed on an expansion of these statistics. Earlier tests such as Wooldridge (1992) are typically first order tests in that they are based on the asymptotic theory of a first order expansion of m_in. When using first order testing, one places limits on the convergence of the bias of the nonparametric estimator so that its variance converges to zero faster than the bias. This avoids the degeneracy under the null because the smoothing parameter converges at an adequately slow rate so that the bias is always present and nonzero. In these tests, we say that the bias dominates the variance. The tests in both Wooldridge (1992) and Yatchew (1992) are examples of this type: Wooldridge (1992) uses a sequence of non-nested alternatives with an appropriately slow asymptotic growth of the smoothing parameters, while Yatchew (1992) suggests a sample splitting procedure where one part of the sample is used to calculate the nonparametric estimator of the conditional expectation and this estimator is then multiplied by the estimated residuals of the other part of the sample. Although the tests in these papers do not degenerate, there are problems with these approaches. For example, non-nested testing requires slow convergence of the nonparametric estimator, sample splitting makes inefficient use of the data, and weighting requires the choice of new arbitrary parameters.

The major drawback of allowing the bias to dominate the variance is that if m_in(θ,π) is of the form in (2.8a), then the limit of the actual estimated statistic will not be in the class of UPTs. Although the limit on the convergence of the bias prevents degeneracy, by construction the presence of the bias of the nonparametric estimator does not allow the first moment of $\hat m_n$ in (2.7) to converge to the first moment of the UPT. (See Bierens 1987 for the convergence properties of the kernel regression.)

An alternative approach is that of Hong and White (1993), De Jong and Bierens (1994), and Horowitz and Härdle (1994), who use the central limit theorems (CLTs) for degenerate U-statistics (in, for example, de Jong 1987 and Hall 1984) to exploit the degeneracy of the moment function. Most often, they use a standardization that is greater than root-n and establish the required regularity conditions. Typically, these tests are second order tests in that the asymptotic theory is established on the second order expansion of m_in, which exploits the first order degeneracy of (b) and (c) in Definition 2.1. Instead of controlling the convergence of the bias of the nonparametric estimator, they control the convergence of the variance. In these tests, we say that the variance dominates the bias. Unlike the first order tests, these second order tests do converge to a statistic in the UPT class.

In a variation of this second approach, Hong (1993) imposes the same rate of convergence on both the

variance and the bias. This is a distinct advantage over previous tests because it allows him to use the

“optimal” rate of convergence (in the IMSE sense) for the window width. Unfortunately, Hong's test is

difficult to implement. Because the squared bias and the variance converge at the same rate, the

denominator of his test must contain a variance component as well as a bias component. This bias

component requires the estimation of second derivatives of the nonparametric component of the test so

that if higher order kernels are to be used in order to improve the rate of convergence, messy higher order


derivatives must then be calculated. Finally, by not allowing the bias to disappear, Hong’s test does not

converge to a UPT.

III. A Root-n Consistent Test

Our test is a second order test based on the first type of degenerate moment function in (2.8a). Instead of

using a greater than root-n standardization, we use a bootstrap procedure that forces the variance to

dominate the bias by using the sample itself as an additional source of variation. This allows the test to

achieve root-n consistency. Unlike Hong's test, we can use higher order kernels without changing the

structure of the denominator of the test. Therefore, we can readily use higher order kernels to achieve a

convergence of the window width that produces a lower integrated mean squared error than the Hong test.

Finally, by using a moment function of the form in (2.8a) and allowing the bias to disappear, our test does

converge to an element of the UPT.

We begin with definitions and assumptions about the data generating process (DGP):

Definition 3.1
Let ν be an element of the class of continuous, r-times differentiable functions on Z. For s ≤ r, the supremum Sobolev norm of ν is defined as

$\|\nu\|_{s,\infty} = \max_{|\lambda|\le s}\ \sup_{z\in Z}\,|D^\lambda \nu(z)|$,

where D^λ is the differentiation operator with respect to the argument.

Definition 3.2
A Sobolev space is defined as

$W^s_{r,\infty}(Z) = \{\nu \in C^r[Z]:\ \|\nu\|_{s,\infty} < \infty,\ s \le r\}$.

Assumption 3.1
(Y×X, ℬ, F) is a complete probability space. (a) The stochastic process {(y_i, x_i): Y×X → ℜ^{k+1} for i = 1,2,...,n; n = 1,2,...} is generated from this space, and for each n, (y_i, x_i) is i.i.d. (b) The support of x_i, X ⊂ ℜ^k, is compact. (c) $\sup_{x\in X} E(|y_i|^{2+\delta}\,|\,x_i = x) < \infty$ for some δ > 0.

The compactness of X is used to avoid boundary issues. This assumption can be bypassed by including a trimming function in our test. We can also extend our results to martingales or to α- or φ-mixing variables, but this would distract from the main ideas of this paper.

Throughout the paper, we assume that when H_0 is true we can consistently estimate the true θ_0. This standard condition is stated as the next assumption.

Assumption 3.2
(a) Under H_0, $n^{1/2}[\hat\theta_n - \theta_0] = O_p(1)$.
(b) The function $f(x,\theta) \in W^s_{2,\infty}$.

We also need to impose the following conditions on the kernel K(⋅).

Assumption 3.3
K: T → ℜ is a symmetric bounded kernel with compact support, where T ≡ [−1,1]^k and K is differentiable of order s, with the s-order derivative being Lipschitz. Finally, for u ∈ ℜ^k,

$\int_T K(u)\,du = 1$,

and there exists some t > 2k such that

$\int_T u_1^{i_1}u_2^{i_2}\cdots u_k^{i_k}\,K(u)\,du = 0 \quad\text{for}\quad 0 < |i| = \sum_{j=1}^k i_j < t,$

$\int_T u_1^{i_1}u_2^{i_2}\cdots u_k^{i_k}\,K(u)\,du \ne 0 \quad\text{for}\quad |i| = t.$

Assumption 3.3 allows us to induce the bias to converge to zero more rapidly than the variance, so that the variance of our estimator dominates the bias. We can therefore ignore the calculation of the bias in the denominator of our test statistic. Note that we have not yet mentioned the window width γ_n, which is the smoothing parameter in our test.

Below is an example of a kernel that implements Assumption 3.3. We let k: ℜ → ℜ, and the K(⋅) that satisfies (3.3) has the following form:

$K(u) = \prod_{i=1}^k k(u_i)$  (3.1)

$k(v) = \Big[\sum_{j=0}^{(t-2)/2} \alpha_j v^{2j}\Big]\,\tfrac{3}{4}(1 - v^2)\,I(|v| \le 1)$  (3.2)

The α's are derived as the solution to the following simultaneous system of equations:

$\sum_{j=0}^{(t-2)/2} \alpha_j\,E(v^{2(i+j)}) = \delta_{i0}, \qquad 0 \le i \le (t-2)/2,$  (3.3)

where the moments $E(v^{2(i+j)})$ are taken with respect to the density $\tfrac{3}{4}(1-v^2)I(|v|\le 1)$ and $\delta_{i0}$ is the Kronecker delta. The kernel in this example is built on the Epanechnikov kernel, the most efficient kernel in the sense that it minimizes the integrated mean squared error for the kernel regression estimator.
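As a concrete illustration, here is a small numerical sketch of (3.1)-(3.3) (our own code, assuming NumPy) for the fourth-order case t = 4, where the polynomial weights solve a two-equation linear system in the Epanechnikov moments:

```python
import numpy as np

def epan_moment(m):
    """E(v^{2m}) under the Epanechnikov density (3/4)(1 - v^2) on [-1, 1]."""
    return 1.5 * (1.0 / (2 * m + 1) - 1.0 / (2 * m + 3))

def higher_order_coeffs(t):
    """Solve the simultaneous system (3.3) for the alphas of the order-t kernel."""
    q = (t - 2) // 2                     # polynomial index j runs 0, ..., q
    A = np.array([[epan_moment(i + j) for j in range(q + 1)] for i in range(q + 1)])
    b = np.zeros(q + 1); b[0] = 1.0      # the Kronecker delta delta_{i0}
    return np.linalg.solve(A, b)

def k_univariate(v, alphas):
    """The univariate higher-order kernel (3.2)."""
    poly = sum(a * v ** (2 * j) for j, a in enumerate(alphas))
    return poly * 0.75 * (1.0 - v ** 2) * (np.abs(v) <= 1.0)

alphas = higher_order_coeffs(t=4)        # alpha = (15/8, -35/8)
v = np.linspace(-1.0, 1.0, 20001); dv = v[1] - v[0]
k_vals = k_univariate(v, alphas)
print(alphas)
print(np.sum(k_vals) * dv)               # ~1: zeroth moment equals one
print(np.sum(v ** 2 * k_vals) * dv)      # ~0: second moment vanishes (t = 4)
```

The product kernel (3.1) is then obtained by applying k(⋅) coordinate by coordinate.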

Assumption 3.4
Let p_0(x) be the probability density function (pdf) of the continuous variables in x. Then $p_0(x) \in W^t_{\infty,\infty}$, where t is defined in Assumption 3.3.

By assuming that the pdf of x is differentiable t times, where t > 2k, Assumptions 3.3 and 3.4 impose a convergence rate on the bias that goes to zero more rapidly than the variance. As in most of the nonparametric literature, the proofs for the reduction in bias involve taking higher order derivatives of the pdfs of the continuous variables.

As described in the introduction, we use a bootstrap procedure to prevent the degeneration of the test statistic. To do this we generate, for each observation i in a given sample (y_i, x_i), i = 1,...,n, a new random vector of size n′ by sampling with replacement from the set of integers {1,2,...,n}. We denote this random vector for the ith observation as N_i. We then define the cardinal variable S(A) as the number of occurrences of event A. Therefore, S(j ∈ N_i) is the number of times j occurs in the random vector N_i. The random vectors N_1, N_2, ..., N_n along with the operator S(⋅) are the components of the bootstrapping process.
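A minimal sketch of this resampling step (assuming NumPy; the helper name is ours) generates the N_i and collects the counts S(j ∈ N_i) in an n × n matrix:

```python
import numpy as np

def bootstrap_counts(n, n_prime, rng):
    """Draw N_1,...,N_n of size n' and return S[i, j] = S(j in N_i)."""
    S = np.zeros((n, n))
    for i in range(n):
        N_i = rng.integers(0, n, size=n_prime)   # n' draws with replacement
        S[i, :] = np.bincount(N_i, minlength=n)  # occurrences of each j in N_i
    return S

rng = np.random.default_rng(0)
n, gamma, k = 200, 0.5, 1
n_prime = int(np.ceil(n ** 1.5 * gamma ** (k / 2)))  # rate of Assumption 3.5 below
S = bootstrap_counts(n, n_prime, rng)
print(S.sum(axis=1)[:3])                 # each row sums to n'
```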

To prevent degeneration, we make the following assumption:

Assumption 3.5
$n' = O(n^{3/2}\gamma_n^{k/2})$

This assumption establishes the relationship between the cardinality n′ of N_i and the size of the sample, n. The rate of increase of n′ guarantees that our statistic is O_p(1) by offsetting the rate at which the underlying moment of our test converges to zero when based on a kernel regression using the smoothing parameter γ_n. The assumptions on the convergence of γ_n are as follows:

Assumption 3.6
The window width γ_n satisfies:

(a) $\gamma_n \to 0$

(b) $n\gamma_n^{3k} \to \infty$

(c) $n\gamma_n^{2t + k/2} \to 0$.

Assumption 3.6(a) is a standard window width assumption for kernel regressions. Assumption 3.6(b) ensures that the probability limit of the variance estimator of our underlying test is invariant to the underlying nuisance parameters; it is also important in terms of establishing the essential condition for Hall's (1984) CLT for degenerate statistics. Finally, Assumption 3.6(c), along with Assumption 3.5, allows us to construct a test such that, under H_A, the bias disappears from the asymptotic distribution. Thus Assumptions 3.5 and 3.6 are essential ingredients for our test to converge to a UPT.

Given the above assumptions, we can now describe the major components of our test. The moment function of interest is

$m_{in} = p(x_i)\,[f(x_i) - f(x_i,\theta)]\,[y_i - f(x_i,\theta)]$  (3.4)

Our test is then based on estimating (3.4) with:

$\hat m_n = n^{-1}\sum_{i=1}^n [\hat r_n(x_i) - \hat p_n(x_i)\,f(x_i,\hat\theta_n)]\,[y_i - f(x_i,\hat\theta_n)]$  (3.5)

where we use a "bootstrapped" Nadaraya-Watson kernel estimator:

$\hat f_n(x_i) = [n\hat p_n(x_i)]^{-1}\sum_{j=1}^n y_j\, S(j\in N_i)\, K_n(x_i - x_j) \quad\text{if}\quad \hat p_n(x_i) \ne 0$  (3.6)

$\hat p_n(x_i) = n^{-1}\sum_{j=1}^n S(j\in N_i)\, K_n(x_i - x_j)$

$\hat r_n(x_i) = \hat p_n(x_i)\,\hat f_n(x_i)$

The estimator $\hat p_n(x)$ is the kernel estimator for the product of the bootstrap indicator times the probability density of x. The function $K_n(x - x_j) = \gamma_n^{-k}K(\gamma_n^{-1}[x - x_j])$, where γ_n is the window width of the kernel regression estimator. The parametric estimator of the conditional expectation is denoted $f(x,\hat\theta)$, since the only component that is estimated is the parameter vector θ.
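A sketch of the estimators in (3.5)-(3.6) for a scalar regressor (k = 1), using a second-order Epanechnikov kernel for brevity and the count matrix S from the sketch above:

```python
import numpy as np

def k_epan(v):
    """Second-order Epanechnikov kernel; any K satisfying Assumption 3.3 works."""
    return 0.75 * (1.0 - v ** 2) * (np.abs(v) <= 1.0)

def nw_bootstrap(y, x, S, gamma):
    """Bootstrapped Nadaraya-Watson pieces: p_hat, f_hat, and r_hat = p_hat*f_hat."""
    n = len(y)
    u = (x[:, None] - x[None, :]) / gamma       # (x_i - x_j) / gamma_n
    Kn = k_epan(u) / gamma                      # K_n(x_i - x_j) with k = 1
    W = S * Kn                                  # bootstrap-weighted kernel weights
    p_hat = W.sum(axis=1) / n                   # second line of (3.6)
    r_hat = (W * y[None, :]).sum(axis=1) / n    # r_hat = p_hat * f_hat directly
    with np.errstate(divide="ignore", invalid="ignore"):
        f_hat = np.where(p_hat != 0, r_hat / p_hat, 0.0)   # first line of (3.6)
    return p_hat, f_hat, r_hat
```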

The only remaining detail is an assumption that establishes an automatic mechanism to select the window width.

Assumption 3.7
Let $\hat\sigma = \{\hat\sigma_{x_1},...,\hat\sigma_{x_k}\}$, where $\hat\sigma_{x_i}$ is the sample standard deviation of the ith element of the random vector x. Assume that the kernel function K_n(⋅) automatically standardizes each element in x by dividing by its counterpart in $\hat\sigma$.

Define R to be a set of int(log(n)) points equally spaced between .025 and 4. Then $\gamma_n = c^* n^\delta$, where δ satisfies the restrictions in Assumption 3.6 and

$c^* = \arg\min_{c\in R}\ \sum_{i=1}^n \big(y_i - \hat f_{-i}(x_i, c)\big)^2$,

where $\hat f_{-i}(x_i, c)$ is the kernel regression that omits the ith observation and uses the window width $\gamma_n = c\,n^\delta$.

The residuals from the cross validation of Assumption 3.7 are:

$\hat\varepsilon_i^* = y_i - \hat f_{-i}(x_i, c^*)$  (3.7)

$\hat f_{-i}(x_i, c^*) = \frac{\displaystyle\sum_{j\ne i} y_j\, K\!\Big(\frac{x_i - x_j}{c^* n^\delta}\Big)}{\displaystyle\sum_{j\ne i} K\!\Big(\frac{x_i - x_j}{c^* n^\delta}\Big)}$

The purpose of Assumption 3.7 is to allow finite sample power improvements while still ensuring that $\hat m_n$ will have a zero expectation under the null. We cannot do an unconstrained cross validation because this would force the rate of convergence of γ_n to violate Assumption 3.6. The bounds of R must be O(1) so that we can maintain the necessary convergence of the kernel. When Assumption 3.7 is combined with the bounds on T in Assumption 3.3, the kernel is allowed to move from the point where positive weight is given to differences in x_i − x_j that are 8n^δ sample standard deviations apart to the point where they are .05n^δ sample standard deviations apart. Notice that $8n^\delta - .05n^\delta = O(1)\,n^\delta$. Any differences x_i − x_j greater than 8n^δ sample standard deviations are not weighted. This addresses the problem of higher bias at the boundary discussed in Härdle (1990), pages 130-132, since no outliers of x_i − x_j will be used.
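A sketch of the constrained cross-validation (again for k = 1; grid and bounds as in Assumption 3.7, with δ < 0 so that γ_n → 0):

```python
import numpy as np

def loo_cv_bandwidth(y, x, delta):
    """Grid-search c* over R by leave-one-out cross-validation (Assumption 3.7)."""
    n = len(y)
    sd = x.std(ddof=1)                                       # automatic standardization
    grid = np.linspace(0.025, 4.0, max(int(np.log(n)), 2))   # the set R
    best_c, best_sse = None, np.inf
    for c in grid:
        gamma = c * n ** delta * sd
        u = (x[:, None] - x[None, :]) / gamma
        W = 0.75 * (1.0 - u ** 2) * (np.abs(u) <= 1.0)       # Epanechnikov weights
        np.fill_diagonal(W, 0.0)                             # omit the ith observation
        denom = W.sum(axis=1)
        f_loo = np.where(denom > 0, (W * y[None, :]).sum(axis=1) / denom, y.mean())
        sse = ((y - f_loo) ** 2).sum()
        if sse < best_sse:
            best_c, best_sse = c, sse
    return best_c, best_sse
```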

Our test statistic is:

$M_n = \hat V_n^{-1/2}\, n^{1/2}\,[\hat m_n - \hat R_n]$  (3.8)

where

$\hat R_n = K(0)\,\hat s_n^2\,\gamma_n^{-k}$  (3.9)

$\hat s_n^2 = n^{-2}\sum_{i=1}^n \hat\varepsilon_i^{*2}\, S(i\in N_i)$  (3.10)

$\hat V_n = 4\,C(K)\,n^{-2}\sum\sum_{i<j} \hat\varepsilon_i^2\,\hat\varepsilon_j^2\, K_n(x_i - x_j)$  (3.11)

$C(K) = \int K^2(u)\,du$  (3.12)

$\hat\varepsilon_i = y_i - \hat f_n(x_i)$  (3.13)

Nothing in this test is left for the researcher to decide. The simple cross validation mechanism is conducted over a compact finite set whose boundary asymptotically approaches fixed values.
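Putting the pieces together, the following sketch assembles (3.8)-(3.13) for k = 1 from the helper functions in the earlier sketches (nw_bootstrap, the count matrix S, and the cross-validated residuals from (3.7)). The constants K(0) = 3/4 and C(K) = 3/5 are those of the second-order Epanechnikov kernel; with a higher-order kernel they would change, but the structure would not:

```python
import numpy as np

def misspecification_test(y, x, f_param, S, gamma, eps_star):
    """Assemble M_n from (3.8)-(3.13); f_param holds the fitted f(x_i, theta_hat)."""
    n = len(y)
    p_hat, f_hat, r_hat = nw_bootstrap(y, x, S, gamma)
    e_param = y - f_param                                # y_i - f(x_i, theta_hat)
    m_hat = np.mean((r_hat - p_hat * f_param) * e_param)             # (3.5)
    s2 = (eps_star ** 2 * np.diag(S)).sum() / n ** 2                 # (3.10)
    R_hat = 0.75 * s2 / gamma                # (3.9): K(0) * s_n^2 * gamma^{-k}
    eps_hat = y - f_hat                                              # (3.13)
    u = (x[:, None] - x[None, :]) / gamma
    Kn = 0.75 * (1.0 - u ** 2) * (np.abs(u) <= 1.0) / gamma
    iu = np.triu_indices(n, k=1)             # pairs with i < j
    V_hat = 4 * 0.6 * (eps_hat[iu[0]] ** 2 * eps_hat[iu[1]] ** 2
                       * Kn[iu]).sum() / n ** 2          # (3.11) with C(K) = 3/5
    return np.sqrt(n) * (m_hat - R_hat) / np.sqrt(V_hat)             # (3.8)
```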


IV. Asymptotic Properties of the Test

As we noted in Section III, the test delineated in (3.8) is based on a degenerate statistic. We use the CLTs from Hall (1984) and de Jong (1987) to prove the asymptotic normality of (3.8). Unlike Hong and White (1993), we do not increase the variance by over-standardizing, but instead use a bootstrap procedure. The random variable m_in is the basis for our test and is linear in π_i = {p(x_i)f(x_i), p(x_i)}. This linearity simplifies the asymptotic proofs. The moment function in equation (3.4) is degenerate under the null of correct specification, i.e., when θ equals the true value θ_0. The sample estimate for (3.4) is (3.5), where $\hat r_n(x_i)$ and $\hat p_n(x_i)$ are used to estimate $r_0(x_i)$ and $p_0(x_i)$.

We now establish a lemma that justifies the use of the bootstrapping, and that characterizes the limiting

distribution of our statistic.

Lemma 4.1
Suppose Assumptions 3.1, 3.5, and 3.6 hold. Let ε come from a zero mean distribution with finite variance. Let

$W_n = n^{-2}\sum_{i=1}^n\sum_{j=1}^n W_{nij}$,  (4.1)

$W_{nij} = \varepsilon_i\,\varepsilon_j\, S(j\in N_i)\, K_n(x_i - x_j)$,

$\varepsilon_i = y_i - E(y_i|x_i)$.

Define

$V_0 = 2\,C(K)\,E(\sigma^4(x)\,p(x))$,  (4.2)

where

$C(K) = \int K^2(u)\,du$.

Then

$V_0^{-1/2}\, n^{1/2}\,(W_n - EW_n) \overset{d}{\to} N(0,1)$.


The bootstrap procedure adds enough additional variation that, even though W_n is based on a degenerate statistic without the bootstrap, the standardized statistic is O_p(1) with the bootstrap. We use the Hall (1984) CLT for degenerate U-statistics to prove asymptotic normality.

We next develop a heteroskedastic consistent estimator for V0.

Lemma 4.2
Suppose Assumption 3.1 holds with δ = 2, and Assumptions 3.2, 3.3, and 3.5 also hold. Then $\hat V_n - V_0 = o_p(1)$.

With Lemmas 4.1 and 4.2, we can prove that the distribution of our statistic converges to a standard normal distribution under the null hypothesis.

Theorem 4.1
Suppose Assumptions 3.1 through 3.7 hold. Then

$M_n \overset{d}{\to} N(0,1)$

under H_0.
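As a usage illustration (a sketch under the same assumptions as the helpers above, with a correctly specified linear model), one can simulate the null behavior of M_n:

```python
import numpy as np

rng = np.random.default_rng(1)
n, delta = 400, -0.2                    # delta < 0 so that gamma_n -> 0
x = rng.uniform(-1.0, 1.0, n)
y = 1.0 + 2.0 * x + rng.normal(0.0, 1.0, n)     # H0: the linear model is correct

X = np.column_stack([np.ones(n), x])
theta = np.linalg.lstsq(X, y, rcond=None)[0]    # parametric estimate of theta_0
f_param = X @ theta

c_star, _ = loo_cv_bandwidth(y, x, delta)       # Assumption 3.7
gamma = c_star * n ** delta * x.std(ddof=1)
S = bootstrap_counts(n, int(np.ceil(n ** 1.5 * gamma ** 0.5)), rng)

# cross-validated residuals (3.7), reusing the leave-one-out fit at c*
u = (x[:, None] - x[None, :]) / gamma
W = 0.75 * (1.0 - u ** 2) * (np.abs(u) <= 1.0)
np.fill_diagonal(W, 0.0)
f_loo = (W * y[None, :]).sum(axis=1) / np.maximum(W.sum(axis=1), 1e-12)
eps_star = y - f_loo

print(misspecification_test(y, x, f_param, S, gamma, eps_star))
# under H0, approximately standard normal in large samples (Theorem 4.1)
```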

We now focus on the distribution of M_n under global and local alternative hypotheses. We define a sequence H_n under the local alternative H_an:

$H_{an}:\ E(y|x) = H_n(x,\theta) = f(x,\theta_0) + n^{-1/2}\gamma_n^{-k/4}\Delta_n(x)$  (4.3)

where Δ_n(x) is a sequence of uniformly bounded functions that converges uniformly to a limit function Δ(x). This is different from the n^{-1/2} convergence between the true model and the alternative under the parametric alternatives of an m-test, as outlined in Newey (1985).

We need to make additional assumptions in deriving the local asymptotic power of our test.

Assumption 3.8
Let $\hat\theta$ be an estimator for θ_0 in (4.3). There is a $\bar\theta$ such that the following holds:

$[\hat\theta - \bar\theta] = O_p(n^{-1/2})$

and

$[\theta_0 - \bar\theta] = O(n^{-1/2}\gamma_n^{-k/4})$.

Assumption 3.9
The error ε in (2.2) has finite fourth moments.

Theorem 4.2
Let Assumptions 3.1 to 3.9 hold. Define

$\Delta^*(x) = \Delta(x) - E(\partial f(x,\theta)/\partial\theta)'\,\beta$,

$\beta = \operatorname{plim}\ n^{1/2}\gamma_n^{k/4}(\bar\theta_n - \theta_0)$.

Under the sequence of local alternative models in (4.3), M_n is asymptotically distributed as N(μ, V_0), where μ = E[Δ*(x)]. Under H_A, for any nonstochastic sequence {C_n = o(nγ_n^{k/2})},

$P[M_n > C_n] \to 1$.

From Theorem 4.2 we can conclude that M_n has power against alternatives whose distance from H_0 is $O(n^{-1/2}\gamma_n^{-k/4})$. Notice that μ equals $\operatorname{plim} E\big([y_i - f(x_i,\bar\theta_n)][f(x_i) - f(x_i,\bar\theta_n)]\big)$, which is exactly the first moment of the UPT. However, there is a curse of dimensionality: as more regressors are added, the order of local alternatives that can be detected becomes slower. This problem can be readily mitigated by using higher order kernels, and the rate of convergence can be made arbitrarily close to n^{-1/2}. Unlike in Hong (1993), using higher-order kernels here is straightforward.

In many applications there are discrete right hand side variables. We now relax our assumptions to include both discrete and continuous right hand side variables. Let x be partitioned into [x_1, x_2], where x_1 are the k_1 continuous variables and x_2 are the k_2 discrete variables, with k = k_1 + k_2. We now add the following assumptions.

Assumption 3.10
The kernel K(⋅,⋅): ℜ^{k_1}×ℜ^{k_2} → ℜ is chosen such that for (z_1, z_2) ∈ ℜ^{k_1}×ℜ^{k_2}:

(a) K(z_1, 0) satisfies all the conditions of Assumption 3.3 in terms of z_1.

(b) $n \sup_{|z_2| > \lambda/\gamma_n}\ \int K(z_1, z_2)\,dz_1 \to 0$ for all λ > 0.

Assumption 3.10 places appropriate constraints on the tails of K(z_1, ⋅) so that E(K(z_1, z_2)|z_2) converges to the indicator function I(z_2 = 0).

Assumption 3.11
x_2 has support X_2, a subset of ℜ^{k_2}, with the following additional properties:

(a) X_2 has a finite number of elements, and x_2 ∈ X_2 implies that p(x_2) > 0.

(b) $\sum_{x_2\in X_2} p(x_2) = 1$.

(c) Let p(x_1|x_2) be the density of x_1 given x_2. Then $p(x_1|x_2) \in W^s_{t,\infty}$ and $E(y|x_1,x_2)\,p(x_1|x_2) \in W^s_{t,\infty}$ with respect to the first argument x_1.

Assumption 3.1 holds for X = X_1×X_2, where X_1 is the support of the continuous variables. Assumptions 3.5 and 3.6 hold for k = k_1.

Perhaps the easiest way to describe the behavior of the test when discrete right hand side variables are used is to show some primitive results when there are no continuous variables. In this case, Assumption 3.10 requires a kernel such that K(0) = 1.

Letting $I_n(x_i = x_j) = I(x_i = x_j)/\gamma_n$, we define

$\hat{\hat r}_n(x_i) = n^{-1}\sum_{j=1}^n y_j\, I_n(x_j = x_i)\, S(j\in N_i)$  (4.4)

$\hat{\hat p}_n(x_i) = n^{-1}\sum_{j=1}^n I_n(x_j = x_i)\, S(j\in N_i)$  (4.5)

$\hat{\hat m}_n = n^{-1}\sum_{i=1}^n \big(\hat{\hat r}_n(x_i) - \hat{\hat p}_n(x_i)\,f(x_i,\hat\theta_n)\big)\big(y_i - f(x_i,\hat\theta_n)\big)$  (4.6)

Then the following holds:

$n(\hat m_n - \hat{\hat m}_n) = n^{-1}\sum_{i=1}^n\sum_{j\ne i}\big(f(x_j) - f(x_i) + \varepsilon_j - [f(x_i,\hat\theta) - f(x_i)]\big)\,\hat\varepsilon_i\, S(j\in N_i)\,\big[K_n(x_i - x_j) - I_n(x_i = x_j)\big]$

$\le n^{-2}\sum_{i=1}^n\sum_{j\ne i} |Z_{ij}|\, S(j\in N_i)\ \sup_{|z| \ge \mu(x_i)/\gamma_n} nK(z)$,

where

$\mu(x_i) = \inf_{x_j\in X,\, j\ne i} |x_i - x_j|$

and $Z_{ij} = \big(f(x_j) - f(x_i) + \varepsilon_j - [f(x_i,\hat\theta) - f(x_i)]\big)\hat\varepsilon_i$. By Assumption 3.10,

$\sup_{|z|\ge\mu(x_i)/\gamma_n} nK(z) \to 0$.

Then $\operatorname{plim}\ n(\hat m_n - \hat{\hat m}_n) = 0$.

The statistic $\hat m_n$ is based on a continuous kernel that maintains all the smoothness properties in Assumption 3.3, and therefore we can easily accommodate both discrete and continuous right hand side variables. Notice that because the support X_2 is finite, the test with purely discrete right hand side variables converges in probability to its underlying first moment rather than in distribution. The reason for this is that the kernel need only be estimated at a finite set of points in the sample space, and these points do not grow with the sample size.
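For the purely discrete case, a sketch of (4.4)-(4.6) for a single discrete regressor (S and γ_n as in the earlier sketches; function names are ours):

```python
import numpy as np

def discrete_estimators(y, x2, S, gamma):
    """Indicator-kernel estimators (4.4)-(4.5) for purely discrete regressors."""
    n = len(y)
    In = (x2[:, None] == x2[None, :]).astype(float) / gamma   # I_n(x_i = x_j)
    W = S * In
    r_hat = (W * y[None, :]).sum(axis=1) / n                  # (4.4)
    p_hat = W.sum(axis=1) / n                                 # (4.5)
    return r_hat, p_hat

def m_discrete(y, x2, f_param, S, gamma):
    """The moment estimate (4.6) built from the indicator-kernel estimators."""
    r_hat, p_hat = discrete_estimators(y, x2, S, gamma)
    return np.mean((r_hat - p_hat * f_param) * (y - f_param))
```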

Theorem 4.3
Suppose Assumptions 3.1 through 3.8 hold for x_1 and k = k_1, together with the additional Assumptions 3.10 and 3.11. Then Theorem 4.2 carries over with k replaced by k_1 and $C(K) = \int K^2(u_1, u_2)\,du_1$, where $u_1 \in \Re^{k_1}$ corresponds to the continuous variables.

We end this section by presenting an alternative estimator for V_0 in (4.2). When higher order kernels are used, the computation of $C(K) = \int K^2(u)\,du$ is difficult when there are several right hand side regressors. We offer the following lemma to compute a consistent estimator for V_0 that requires no integration.

Lemma 4.3
The estimator

$\hat J_n = \frac{4\,\gamma_n^k}{n(n-1)}\sum_{i=1}^{n-1}\sum_{j>i}\hat\varepsilon_i^{*2}\,\hat\varepsilon_j^{*2}\,K_n^2(x_i - x_j)$,

where $\hat\varepsilon^*$ is defined in (3.7), is a consistent estimator for V_0.
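A sketch of this integration-free estimator (k = 1, Epanechnikov kernel, cross-validated residuals from (3.7)):

```python
import numpy as np

def V0_no_integration(eps_star, x, gamma):
    """Lemma 4.3: estimate V_0 without computing C(K) = integral of K^2."""
    n = len(x)
    u = (x[:, None] - x[None, :]) / gamma
    Kn = 0.75 * (1.0 - u ** 2) * (np.abs(u) <= 1.0) / gamma   # K_n(x_i - x_j)
    e2 = eps_star ** 2
    iu = np.triu_indices(n, k=1)
    return 4.0 * gamma * (e2[iu[0]] * e2[iu[1]] * Kn[iu] ** 2).sum() / (n * (n - 1))
```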

V. Conclusions

We have proposed in this paper a test for misspecification that is consistent against all functional forms and has the form of the uniformly most powerful tests. By design, the researcher does not arbitrarily choose any parameter of the test. The researcher need only estimate the original parametric model and then use the estimated residuals together with the left and right hand side variables. The cross validation mechanism, along with the kernel of compact support, allows the data to determine an optimal window width while trimming out observations with density arbitrarily close to zero. The bounds of the cross validation are fixed so that the window width still has the required rate of decrease to achieve standard asymptotic results.

It is important to note that our test prevents degeneracy by using a resampling technique. Although the asymptotic distribution of our bootstrap test is identical to that of Hong and White (1993), this research opens a new avenue in which to promote additional finite sample efficiency. In order to maintain simplicity, our resampling was done so that each observation had a 1/n chance of being chosen. This automatically made the random variable S(j∈N_i) independent of the sample, allowing us to concentrate on showing how resampling can prevent the degeneracy of our statistic. However, there are other resampling strategies. One is to let $p_i = 1/\hat\sigma_n^2(x_i)$, where $\hat\sigma_n^2(x_i)$ is an estimator for the residual variance. One could then make the resampling probabilities proportional to p_i, so that sample points with smaller variance would have a higher chance of being chosen. This could increase the finite sample power of the test.

While this test is still subject to the curse of dimensionality, the curse can be offset by the use of higher order kernels. Although the test is not optimal in the sense of Hong (1993), we can achieve higher local power by using higher order kernels, which at this point cannot be done in Hong's test without estimating third and higher order derivatives of the conditional expectation function. Finally, we have shown that our test can be applied to discrete variables, allowing the test to be used on a large number of data sets.

References

Bierens, H.J., 1982, Consistent Model Specification Tests, Journal of Econometrics 20, 105-134.

—————, 1987, Kernel Estimators of Regression Functions, in: T. R. Bewley, ed., Advances in

Econometrics / Fifth World Congress, Vol. 1 (Cambridge University Press: New York), 99-144.

—————, 1990, A Consistent Conditional Moment Test of Functional Form, Econometrica 58, 1443-1458.

Bierens, H.J., and W. Ploberger, 1994, Asymptotic Power of the Integrated Conditional Moment Test

Against Large Local Alternatives.

Bradley, R. and R. McClelland, 1993, An Improved Nonparametric Test for Misspecification of Functional Form, manuscript.

Bradley, R. and R. McClelland, 1994, A Kernel Test for Neglected Nonlinearity, manuscript.

de Jong, P. 1987, A Central Limit Theorem for Generalized Quadratic Forms, Probability Theory and

Related Fields 75, 261-277.

de Jong, R.M. and H.J. Bierens, 1994, On the Limit Behavior of the Chi-Square Test if the Number of Conditional Moments Tested Approaches Infinity, Econometric Theory 10, 70-90.

de Jong, R.M., 1991, The Bierens Test Under Data Dependence, manuscript.

Hall, P., 1984, Central Limit Theorem for Integrated Square Error of Multivariate Nonparametric Density

Estimators, Journal of Multivariate Analysis 14, 1-16.

Härdle, W., 1990, Applied Nonparametric Regression, (Cambridge University Press, New York).

Hong, Y., 1993, Consistent Specification Testing Using Optimal Nonparametric Kernel Estimation, CAE

Working Paper #93-13, Center for Analytic Economics, Cornell University.

————— and H. White, 1993, M-Testing Using Finite and Infinite Dimensional Parameter Estimators, manuscript, University of California, San Diego.

Horowitz, J. and W. Härdle, 1994, Testing a Parametric Model Against a Semiparametric Alternative,

Econometric Theory 10, 821-848.

Lee, T.H., White, H. and C.W.J. Granger, 1993, Testing for Neglected Nonlinearity in Time Series Models,

Journal of Econometrics 56, 269-290.

Newey, W. K., 1985, Maximum Likelihood Specification Testing and Conditional Moment Tests,

Econometrica 53, 1047-1070.

Newey, W.K., 1994, Kernel Estimation of Partial Means and a General Variance Estimator, Econometric Theory 10, 233-253.

Powell, J.L., J.H. Stock, and T.M. Stoker, 1989, Semiparametric Estimation of Index Coefficients, Econometrica 57, 1403-1430.

Stinchcombe, M. and H. White, 1993, An Approach to Consistent Specification Testing Using Duality

and Banach Limit Theory, Discussion Paper, University of California, San Diego.

Whang, Y.J. and D. Andrews, 1993, Tests of Specification for Parametric and Semiparametric Models,

Journal of Econometrics 57, 277-318.

Wooldridge, J., 1992, A Test for Functional Form Against Nonparametric Alternatives, Econometric

Theory 8, 452-475.

Yatchew, A.J., 1992, Nonparametric Regression Tests Based on Least Squares, Econometric Theory 8, 435-451.

Zheng, X., 1990, A Consistent Test of Functional Form Via Nonparametric Estimation Techniques,

Princeton University Discussion Paper.

Proofs

We start with Lemma A.1, which is an important ingredient for the lemmas and theorems in the paper.

Lemma A.1
Given Assumptions 3.1 through 3.6, let N be continuously differentiable with E‖N²(x)‖ < ∞, and let ε_i be defined as in (4.1). Then

$n^{-2}\sum_{i=1}^n\sum_{j=1}^n \varepsilon_i\, N(x_j)\, K_n(x_i - x_j)$ is $O_p(1/\sqrt n)$.

Proof:
We rewrite the expression as a V-statistic:

$n^{-2}\sum_{i=1}^n\sum_{j=1}^n \tfrac12\{\varepsilon_i N(x_j) + \varepsilon_j N(x_i)\}\, K_n(x_i - x_j)$.

We apply the projection theorem of Powell, Stock, and Stoker (1989), and we need to verify that

$P_n = E\big[\{\varepsilon_i N(x_j) + \varepsilon_j N(x_i)\}^2 K_n^2(x_i - x_j)\big] = o(n)$.

$P_n \le 2\gamma_n^{-2k}\int\!\!\int K^2\!\Big(\frac{x_i - x_j}{\gamma_n}\Big)\,\sigma^2(x_i)\,N^2(x_j)\,p(x_i)\,p(x_j)\,dx_i\,dx_j$

$= 2\gamma_n^{-k}\int\!\!\int K^2(u)\,\sigma^2(x_i)\,N^2(x_i - \gamma_n u)\,p(x_i)\,p(x_i - \gamma_n u)\,dx_i\,du$

$= O(\gamma_n^{-k}) = O\big((n\gamma_n^k)^{-1}\,n\big) = o(n)$.

QED

Lemma 4.1
Proof:
Under H_0, E(W_nij|x_j) = E(W_nij|x_i) = 0 for i ≠ j. Given Assumption 3.1, the following holds:

$E(W_n) = n^{-2}\sum_{i=1}^n\sum_{j=1}^n E(W_{nij}) = n^{-2}\sum_{i=1}^n E[\varepsilon_i^2\, S(i\in N_i)]\,\gamma_n^{-k}K(0) = n^{-1}\gamma_n^{-k}K(0)\,(n'/n)\,E[\sigma^2(x)]$,

since only the i = j terms have nonzero expectation. Using A(3.5), $S(i\in N_i) = O_p(n'/n) = O_p(n^{1/2}\gamma_n^{k/2})$, and applying Chebyshev's inequality with δ = 2 from Assumption 3.1 to the centered diagonal terms,

$n^{1/2}(W_n - EW_n) = n^{-3/2}\sum_{i=1}^n\sum_{j\ne i} W_{nij} + o_p(1)$

given $n\gamma_n^k \to \infty$. To show that $n^{1/2}(W_n - EW_n) \to_d N(0, V_0)$, we show that

$V_n^{-1/2}U_n \overset{d}{\to} N(0,1)$, where $U_n = n^{-3/2}\sum_{i=1}^n\sum_{j\ne i} W_{nij}$ and $V_n = \operatorname{var}(U_n)$.

Under H_0, because E(W_nij|x_i) = E(W_nij|x_j) = 0, U_n is a degenerate second order U-statistic, so we require a central limit theorem (CLT). From de Jong's (1987) CLT for generalized quadratic forms we know that for $V_n^{-1/2}U_n$ to be asymptotically N(0,1) it suffices that $G_{ni}/V_n^2 = o(1)$ for i = 1, 2, 3, where:

$V_n = \operatorname{var}(U_n) = 2n^{-3}n(n-1)\,E\big[S(j\in N_i)^2\,\varepsilon_i^2\,\varepsilon_j^2\,K_n^2(x_i - x_j)\big]$

$= 2n^{-1}(n'/n)^2\,\gamma_n^{-k}\int_X\!\!\int_T K^2(u)\,\sigma^2(x_1)\,\sigma^2(x_1 + \gamma_n u)\,p(x_1)\,p(x_1 + \gamma_n u)\,dx_1\,du\,(1 + o(1))$

$= 2\,C(K)\,E\{\sigma^4(x)p(x)\} + o(1) = V_0 + o(1)$,

where we used $(n'/n)^2 = n\gamma_n^k$ from Assumption 3.5 and the substitution $x_2 = x_1 + \gamma_n u$ (T is defined in Assumption 3.3). Therefore V_n − V_0 = o(1).

For the remaining de Jong conditions, repeated use of $E[S(j\in N_i)^m] = O((n'/n)^m)$ together with the changes of variables $x_2 = x_1 + \gamma_n v$ (and, for G_n3, also $x_3 = x_2 + \gamma_n w$, $x_4 = x_3 + \gamma_n r$) gives

$G_{n1} = n^{-6}\sum_{i=1}^n\sum_{j\ne i} E(W_{nij}^4) = O(n^{-2}\gamma_n^{-k}) = o(1)$,

$G_{n2} = n^{-6}\sum_i\sum_{j\ne i}\sum_{k\ne i,j} E(W_{nij}^2 W_{nik}^2) = O(n^{-1}) = o(1)$,

$G_{n3} = n^{-6}\sum\sum\sum\sum_{i\ne j\ne k\ne h}\big\{E(W_{nij}W_{nik}W_{nhj}W_{nhk}) + E(W_{nij}W_{nih}W_{njk}W_{nkh}) + E(W_{nik}W_{nih}W_{njk}W_{njh})\big\} = O(\gamma_n^k) = o(1)$.

Since V_n is O(1), we have shown that $G_{ni}/V_n^2 = o(1)$ for i = 1, 2, 3, given $n\gamma_n^k \to \infty$ and $\gamma_n^k \to 0$.

QED

Lemma 4.2
Proof:
To simplify notation, write $\hat\varepsilon_i$ for $\hat\varepsilon_i^*$ as defined in (3.7). Let

$\hat J_n = 2n^{-2}\sum_{i=1}^{n-1}\sum_{j=i+1}^n \hat\varepsilon_i^2\,\hat\varepsilon_j^2\,K_n(x_i - x_j)$, so that $\hat V_n = 2\,C(K)\,\hat J_n$.

Since $\hat\varepsilon_i = y_i - \hat f_{-i}(x_i, c^*)$ and $y_i = f(x_i) + \varepsilon_i$, we have $\hat\varepsilon_i = \varepsilon_i + [f(x_i) - \hat f_{-i}(x_i, c^*)]$. Expanding $\hat\varepsilon_i^2\hat\varepsilon_j^2$ gives

$\hat J_n = 2n^{-2}\sum_{i=1}^{n-1}\sum_{j=i+1}^n \varepsilon_i^2\varepsilon_j^2 K_n(x_i - x_j) + A_{1n} + A_{2n} + A_{3n} + A_{4n} + A_{5n}$,

where the A_in collect the cross terms involving powers of $[f(x_i) - \hat f_{-i}(x_i, c^*)]$. From the results in Bierens (1987), $\operatorname{var}(\hat f_{-i}(x_i, c^*) - f(x_i)) = O((n\gamma_n^k)^{-1})$ and the bias $f(x_i) - E\hat f_{-i}(x_i, c^*) = O(\gamma_n^t)$; using these rates, each $A_{in} = o_p(1)$, i = 1,...,5.

We then show that

$\operatorname{plim}\ 2n^{-2}\sum_{i=1}^{n-1}\sum_{j=i+1}^n \varepsilon_i^2\varepsilon_j^2 K_n(x_i - x_j) = E(\sigma^4(x)p(x))$.

Let $\mu_4 = E(\varepsilon_j^4)$. Note that

$E\big[(\varepsilon_i^2\varepsilon_j^2 K_n(x_i - x_j))^2\big] \le \mu_4^2\int\!\!\int K_n^2(x_i - x_j)\,p(x_i)\,p(x_j)\,dx_i\,dx_j = O(\gamma_n^{-k}) = o(n)$, since $n\gamma_n^k \to \infty$.

Therefore the first term in the expansion of $\hat J_n$ satisfies the conditions of the U-statistics projection theorem in Powell, Stock, and Stoker (1989), and its probability limit equals the limit of its expectation, since

$E\big[\varepsilon_i^2\varepsilon_j^2 K_n(x_i - x_j)\big] = \int\!\!\int \sigma^2(x_1)\,\sigma^2(x_2)\,K_n(x_1 - x_2)\,p(x_1)\,p(x_2)\,dx_1\,dx_2$

$= \int\!\!\int \sigma^2(x_1)\,\sigma^2(x_1 - \gamma_n u)\,K(u)\,p(x_1)\,p(x_1 - \gamma_n u)\,dx_1\,du$

$= \int \sigma^4(x_1)\,p^2(x_1)\,dx_1 + o(1) = E(\sigma^4(x)p(x)) + o(1)$.

Hence $\hat V_n = 2C(K)\hat J_n \to_p 2C(K)E(\sigma^4(x)p(x)) = V_0$.

QED

Theorem 4.1
Proof:
We prove this theorem using Lemmas 4.1 and 4.2; it remains to show that $n^{1/2}(\hat m_n - W_n) = o_p(1)$. Under H_0, f(x) in (2.1) is equal to f(x,θ_0). Expanding (3.5), with $S_{ij} = S(j\in N_i)$,

$\hat m_n = n^{-2}\sum_{i=1}^n\sum_{j=1}^n [y_j - f(x_i,\hat\theta_n)]\,S_{ij}\,K_n(x_i - x_j)\,[y_i - f(x_i,\hat\theta_n)]$.

Writing $y_j - f(x_i,\hat\theta_n) = \varepsilon_j + [f(x_j,\theta_0) - f(x_i,\theta_0)] + [f(x_i,\theta_0) - f(x_i,\hat\theta_n)]$, and similarly for $y_i - f(x_i,\hat\theta_n)$, we obtain

$\hat m_n = U_{1n} + U_{2n} + U_{3n}$,

where $U_{1n} = n^{-2}\sum\sum \varepsilon_i\varepsilon_j S_{ij}K_n(x_i - x_j) = W_n$, $U_{2n}$ collects the terms linear in $[f(x_i,\theta_0) - f(x_i,\hat\theta_n)]\varepsilon_j$, and $U_{3n}$ collects the remaining terms.

Because $n^{1/2}U_{1n} = n^{1/2}W_n$, we only need to show that

$n^{1/2}(U_{2n} + U_{3n}) = o_p(1)$.

To show this we first show that $n^{1/2}U_{2n} = o_p(1)$:

$n^{1/2}U_{2n} = n^{-3/2}\sum_i\sum_j \varepsilon_j\, S_{ij}\, K_n(x_i - x_j)\,\frac{\partial f(x_i,\theta_n^\dagger)}{\partial\theta'}\,(\theta_0 - \hat\theta_n)$

for some $\theta_n^\dagger \in \{\theta:\ |\theta - \theta_0| \le |\hat\theta_n - \theta_0|\}$. Since $S_{ij} = O_p(n^{1/2}\gamma_n^{k/2})$ and using Lemma A.1 and Assumption 3.2, $n^{1/2}U_{2n} = o_p(1)$.

Next, decompose

$n^{1/2}U_{3n} = A_{1n} + A_{2n} + A_{3n}$,

where A_1n contains the terms in $[f(x_j,\theta_0) - f(x_i,\theta_0)]\varepsilon_i$, A_2n the terms in $[f(x_j,\theta_0) - f(x_i,\theta_0)][f(x_i,\theta_0) - f(x_i,\hat\theta_n)]$, and A_3n the quadratic terms in $[f(x_i,\theta_0) - f(x_i,\hat\theta_n)]$. By Lemma A.1 and $S_{ij} = O_p(n^{1/2}\gamma_n^{k/2})$, $A_{1n} = O_p(\gamma_n^{k/2})$; by Assumption 3.2 and Lemma A.1, $A_{2n} = O_p(n^{-1/2}\gamma_n^{k/2})$ and $A_{3n} = O_p(n^{-1/2}\gamma_n^{k/2})$. Hence $n^{1/2}(U_{2n} + U_{3n}) = o_p(1)$.

QED

Theorem 4.2
Proof:
Under the local alternative (4.3), $y_i - f(x_i,\hat\theta_n) = \varepsilon_i + n^{-1/2}\gamma_n^{-k/4}\Delta_n(x_i) - [f(x_i,\hat\theta_n) - f(x_i,\theta_0)]$. Substituting into (3.5) and collecting terms as in the proof of Theorem 4.1 gives

$n^{1/2}(\hat m_n - \hat R_n) = V_{1n} + V_{2n} + V_{3n} + V_{4n}$,

where $V_{1n} = n^{1/2}(W_n - EW_n)$, $V_{2n}$ is the quadratic term in the deviation $n^{-1/2}\gamma_n^{-k/4}\Delta_n(x_i) - [f(x_i,\hat\theta_n) - f(x_i,\theta_0)]$, and $V_{3n}$ and $V_{4n}$ are the cross-product terms between this deviation and the errors ε.

In Lemma 4.1 we have shown $V_{1n} \to_d N(0, V_0)$. We then show the following:

(i) $\operatorname{plim} V_{2n} = \mu$, where μ is defined in the theorem;

(ii) $\operatorname{plim} V_{3n} = 0$;

(iii) $\operatorname{plim} V_{4n} = 0$.

Starting with (i), a mean value expansion of $f(x_i,\hat\theta_n)$ around θ_0 at some $\tilde\theta_n \in \{\theta:\ |\theta - \theta_0| \le |\hat\theta_n - \theta_0|\}$ writes the deviation as

$n^{-1/2}\gamma_n^{-k/4}\Big[\Delta_n(x_i) - \frac{\partial f(x_i,\tilde\theta_n)}{\partial\theta'}\, n^{1/2}\gamma_n^{k/4}(\hat\theta_n - \theta_0)\Big]$.

By Assumption 3.8, the uniform convergence of Δ_n to Δ, and the kernel change of variables used in Lemma A.1, the summand of V_2n converges to Δ*(x_i), so that $\operatorname{plim} V_{2n} = E[\Delta^*(x)] = \mu$.

For (ii) and (iii), the cross-product terms have zero expectation, and a variance calculation of the same form as in Lemma A.1 shows that $V_{3n}$ and $V_{4n}$ are $O_p(n^{-1/2})$.

QED

Lemma 4.3
Proof:
The alternative estimator for V_0 is

$\hat J_n = \frac{4\,\gamma_n^k}{n(n-1)}\sum_{i=1}^{n-1}\sum_{j>i}\hat\varepsilon_i^{*2}\,\hat\varepsilon_j^{*2}\,K_n^2(x_i - x_j)$.

Let $z_i = (x_i, \varepsilon_i)$ and $\hat z_i = (x_i, \hat\varepsilon_i^*)$, where $\hat\varepsilon_i^*$ is defined in (3.7). Taking a Taylor expansion around ε_i yields:

$\hat J_n = \frac{4\gamma_n^k}{n(n-1)}\sum\sum_{j>i}\varepsilon_i^2\varepsilon_j^2 K_n^2(x_i - x_j) + \frac{8\gamma_n^k}{n(n-1)}\sum\sum_{j>i}(\hat\varepsilon_i^* - \varepsilon_i)\,\varepsilon_i\varepsilon_j^2 K_n^2(x_i - x_j) + \dots$

$= \frac{4\gamma_n^k}{n(n-1)}\sum_{i=1}^{n-1}\sum_{j>i}\varepsilon_i^2\varepsilon_j^2 K_n^2(x_i - x_j) + o_p(1)$,

where the remainder terms are negligible because, from Bierens (1987), $\hat\varepsilon_i^* - \varepsilon_i = O_p(n^{-1/2}\gamma_n^{-k/2})$.

Let $H_n(z_i, z_j) = \gamma_n^k\,\varepsilon_i^2\varepsilon_j^2\,K_n^2(x_i - x_j)$. Then $E[\|H_n(z_i, z_j)\|^2] = o(n)$, so the conditions of the projection theorem in Powell, Stock, and Stoker (1989) are satisfied. Further, letting $u = (x_i - x_j)/\gamma_n$,

$E\Big[\frac{4\gamma_n^k}{n(n-1)}\sum\sum_{j>i}\varepsilon_i^2\varepsilon_j^2 K_n^2(x_i - x_j)\Big] = 2\int\!\!\int \gamma_n^{-k}K^2\!\Big(\frac{x_i - x_j}{\gamma_n}\Big)\,\sigma^2(x_i)\,\sigma^2(x_j)\,p(x_i)\,p(x_j)\,dx_i\,dx_j$

$= 2\int\!\!\int K^2(u)\,\sigma^2(x_i)\,\sigma^2(x_i - \gamma_n u)\,p(x_i)\,p(x_i - \gamma_n u)\,dx_i\,du = V_0 + o(1)$.

Therefore $\hat J_n \to_p V_0$.

QED

Theorem 4.3
Proof:
Let K(z_1, z_2) have the properties listed in Assumption 3.10. Let

$W_{nij} = \varepsilon_i\varepsilon_j\,\gamma_n^{-k_1}\,K\!\Big(\frac{x_{1i} - x_{1j}}{\gamma_n}, \frac{x_{2i} - x_{2j}}{\gamma_n}\Big)\,S(j\in N_i)$,

$F_{nij} = \varepsilon_i\varepsilon_j\,\gamma_n^{-k_1}\,K\!\Big(\frac{x_{1i} - x_{1j}}{\gamma_n}, 0\Big)\,I(x_{2i} = x_{2j})\,S(j\in N_i)$.

We have already shown that $n^{-2}\sum_{i=1}^n\sum_{j=1}^n (W_{nij} - F_{nij}) = o_p(n^{-1/2})$. Therefore, it suffices to show that:

$2n^{-3/2}\sum_{i=1}^{n-1}\sum_{j=i+1}^n F_{nij} \overset{d}{\to} N(0, V_0)$.

It is evident that $E(F_{nij}|\varepsilon_i, x_i) = E(F_{nij}|\varepsilon_j, x_j) = 0$; therefore, this is a degenerate statistic. Given Assumption 3.11, where x_2 can take only a finite number of values, the kernel $K(z_1, 0)\,I(x_{2i} = x_{2j})$ satisfies Assumption 3.3 for k = k_1 and t > 2k_1. Assumption 3.4 is satisfied for x_1 and k_1. Therefore, we can still take the required Taylor expansions with respect to x_1, and in Assumption 3.5 we set:

$n' = O(n^{3/2}\gamma_n^{k_1/2})$.

We satisfy Assumption 3.6 with k = k_1, and the standardization in Assumption 3.7 need be done only on x_1. Therefore all the assumptions of Lemma 4.1 are satisfied. QED