TESTING EXOGENEITY IN NONPARAMETRIC INSTRUMENTAL VARIABLES MODELS IDENTIFIED BY CONDITIONAL QUANTILE RESTRICTIONS
by
Jia-Young Michael Fu
Department of Economics, Northwestern University
Evanston, IL 60201

Joel L. Horowitz
Department of Economics, Northwestern University
Evanston, IL 60201

Matthias Parey
Department of Economics, University of Surrey
Guildford GU2 7XH, United Kingdom
November 2019
Abstract
This paper presents a test for exogeneity of explanatory variables in a nonparametric instrumental variables (IV) model whose structural function is identified through a conditional quantile restriction. Quantile regression models are increasingly important in applied econometrics. As with mean-regression models, an erroneous assumption that the explanatory variables in a quantile regression model are exogenous can lead to highly misleading results. In addition, a test of exogeneity based on an incorrectly specified parametric model can produce misleading results. This paper presents a test of exogeneity that does not assume the structural function belongs to a known finite-dimensional parametric family and does not require estimation of this function. The latter property is important because nonparametric estimates of the structural function are unavoidably imprecise. The test presented here is consistent whenever the structural function differs from the conditional quantile function on a set of non-zero probability. The test has non-trivial power uniformly over a large class of structural functions that differ from the conditional quantile function by $O(n^{-1/2})$. The results of Monte Carlo experiments and an empirical application illustrate the performance of the test.

Key words: Hypothesis test, instrumental variables, quantile estimation, specification testing

JEL Listing: C12, C14

We thank Richard Blundell for helpful comments. Part of this research was carried out while Joel L. Horowitz was a visitor at the Department of Economics, University College London, and the Centre for Microdata Methods and Practice. Matthias Parey gratefully acknowledges the support of the ERC grant MicroConLab at University College London. We thank Agnes Norris Keiller for research assistance.
1. INTRODUCTION
Econometric models often contain explanatory variables that may be endogenous. For example,
in a wage equation, the observed level of education may be correlated with unobserved ability, thereby
causing education to be an endogenous explanatory variable. It is well known that estimation methods for
models in which all explanatory variables are exogenous do not yield consistent parameter estimates
when one or more explanatory variables are endogenous. For example, ordinary least squares does not
provide consistent estimates of the parameters of a linear model when one or more explanatory variables
are endogenous. Instrumental variables estimation is a standard method for obtaining consistent
estimates.
The problem of endogeneity is especially serious in nonparametric estimation. Because of the ill-
posed inverse problem, nonparametric instrumental variables estimators are typically much less precise
than nonparametric estimators in the exogenous case. Therefore, it is especially useful to have methods
for testing the hypothesis of exogeneity in nonparametric settings. This paper presents a test of the
hypothesis of exogeneity of the explanatory variable in a nonparametric quantile regression model.
Quantile models are increasingly important in applied econometrics. Koenker (2005) and
references therein describe methods for and applications of quantile regression when the explanatory
variables are exogenous. Estimators and applications of linear quantile regression models with
endogenous explanatory variables are described by Amemiya (1982), Powell (1983), Chen and Portnoy
(1996), Januszewski (2002), Chernozhukov and Hansen (2004, 2006), Ma and Koenker (2006), Blundell
and Powell (2007), Lee (2007), and Sakata (2007). Nonparametric methods for quantile regression
models are discussed by Chesher (2003, 2005, 2007); Chernozhukov and Hansen (2004, 2005, 2006);
Chernozhukov, Imbens, and Newey (2007); Horowitz and Lee (2007); and Chen and Pouzo (2009, 2012).
Chen, Chernozhukov, Lee, and Newey (2014) provide sufficient conditions for local identification of
nonparametric quantile IV models. Blundell, Horowitz, and Parey (2017) estimate a nonparametric
quantile regression model of demand under the hypothesis that price is exogenous and an instrumental
variables quantile regression model under the hypothesis that price is endogenous.
The method presented in this paper consists of testing the conditional moment restriction that
defines the null hypothesis of exogeneity in a quantile instrumental variables (IV) model. This approach
does not require estimation of the structural function. An alternative approach is to compare a
nonparametric quantile estimate of the structural function under exogeneity with an estimate obtained by
using nonparametric instrumental variables methods. However, the moment condition that identifies the
structural function in the presence of endogeneity is a nonlinear integral equation of the first kind, which
leads to an ill-posed inverse problem (O’Sullivan 1986, Kress 1999). A consequence of this is that in the
presence of one or more endogenous explanatory variables, the rate of convergence of a nonparametric
estimator of the structural function is typically very slow. Therefore, a test based on a direct comparison
of nonparametric estimates obtained with and without assuming exogeneity will have low power.
Accordingly, it is desirable to have a test of exogeneity that avoids nonparametric instrumental variables
estimation of the structural function. This paper presents such a test.¹
Breunig (2015) and Blundell and Horowitz (2007) have developed tests of exogeneity of the
explanatory variables in a nonparametric IV model that is identified through a conditional mean
restriction. The test presented here for exogeneity in quantile IV models uses ideas and has properties
similar to those of Blundell’s and Horowitz’s (2007) test. However, the non-smoothness of quantile
estimators presents technical issues that are different from and more complicated than those presented by
instrumental variables models that are identified by conditional mean restrictions. Therefore, testing
exogeneity in a quantile regression model requires a separate treatment from testing exogeneity in the
conditional mean models considered by Breunig (2015) and Blundell and Horowitz (2007). We use
empirical process methods to deal with the non-smoothness of quantile estimators. Such methods are not
needed for testing exogeneity in conditional mean models. Some concepts and mathematical details in
this paper are similar to concepts and details in Horowitz (2006) and Horowitz and Lee (2009). However,
the model and hypothesis in this paper are different from those in Horowitz (2006) and Horowitz and Lee
(2009) and require a separate treatment.
Breunig (2018) presents a test of exogeneity in nonparametric quantile IV models and derives its
asymptotic distribution under the null hypothesis of exogeneity and under local alternatives. The test
presented here is more powerful than that of Breunig (2018) against a large class of alternatives. The
theoretical reason for the power difference is explained in Section 3.5 of this paper. There are, however,
certain types of alternatives against which Breunig’s (2018) test has greater power. These are described
in Section 3.5. Section 4 and Section A.5 of the appendix present numerical illustrations of the power
differences.
This paper also presents the results of applying our test to a nonparametric quantile regression
model of the demand for gasoline using a large travel data set from the U.S. The test rejects the
hypothesis that price is exogenous for several demographic groups, indicating that endogeneity is a
potentially important concern in this setting.
1 The test presented in this paper achieves an $n^{-1/2}$ rate of testing against a large class of alternatives. A referee has pointed out that under regularity conditions stronger than those of the test presented here, this rate can be achieved by using a suitable metric to compare estimates obtained with and without the assumption of exogeneity.
Section 2 of this paper presents the model, null hypothesis to be tested, and test statistic. Section
3 describes the asymptotic properties of the test. Section 4 presents the results of a Monte Carlo
investigation of the finite-sample performance of the test, including a comparison of the powers of our
test and Breunig’s (2018) test. Section 5 presents the empirical application of the test. Section 6
concludes. Additional Monte Carlo results and technical details, including regularity conditions and the
proofs of theorems, are in the Appendix. A computer program that implements the test is available on
request.²
2. THE MODEL, NULL HYPOTHESIS, AND TEST STATISTIC
This section begins by presenting the model setting that we deal with, the null hypothesis to be
tested, and issues that are involved in testing the null hypothesis. Section 2.2 presents the test statistic.
2.1 The Model and the Null and Alternative Hypotheses
Let $Y$ be a scalar random variable, $X$ and $W$ be continuously distributed random scalars or vectors, $q$ be a constant satisfying $0 < q < 1$, and $g$ be a structural function that is identified by the relation
(2.1) $$P[Y - g(X) \le 0 \mid W = w] = q$$
for almost every $w \in \mathrm{supp}(W)$. Equivalently, $g$ is identified by
(2.2) $$Y = g(X) + U; \quad P(U \le 0 \mid W = w) = q$$
for almost every $w \in \mathrm{supp}(W)$. In (2.1) and (2.2), $Y$ is the dependent variable, $X$ is the explanatory variable, and $W$ is an instrument for $X$. The function $g$ is nonparametric; it is assumed to satisfy mild regularity conditions but is otherwise unknown.
Define the conditional $q$-quantile function $G(x) = Q_q(Y \mid X = x)$, where $Q_q$ denotes the conditional $q$-quantile. We say that $X$ is exogenous at quantile $q$ if $g(x) = G(x)$ except, possibly, if $x$
is contained in a set of zero probability. Otherwise, we say that X is endogenous at quantile q . In the
remainder of this paper, we use the abbreviated terms “exogenous” and “endogenous” without the
qualification “at quantile q ”, but the qualification is understood to be included. This paper presents a test
of the hypothesis that X is exogenous against the alternative hypothesis that X is endogenous. Under
mild conditions, the test presented here rejects a false hypothesis of exogeneity with probability
approaching 1 as the sample size increases.
2 Contact Matthias Parey, e-mail: [email protected]
One possible way of testing the null hypothesis of exogeneity, $H_0$, is to estimate $g$ and $G$, compute the difference between the two estimates in some metric, and reject $H_0$ if the difference is too large. To see why this approach is unattractive, assume that $\mathrm{supp}(X,W) \subset [0,1]^2$. This assumption entails no loss of generality if $X$ and $W$ are scalars. It can always be satisfied by, if necessary, carrying out monotone increasing transformations of $X$ and $W$. Then (2.1) is equivalent to the nonlinear integral equation
(2.3) $$\int_0^1 F_{YXW}[g(x), x, w]\,dx - q f_W(w) = 0,$$
where $f_W$ is the probability density function of $W$,
$$F_{YXW}(y,x,w) = \int_{-\infty}^{y} f_{YXW}(u,x,w)\,du,$$
and $f_{YXW}$ is the probability density function of $(Y,X,W)$. Equation (2.3) can be written as the operator equation
(2.4) $$(Tg)(w) = q f_W(w),$$
where the operator $T$ is defined by
$$(Th)(w) = \int_0^1 F_{YXW}[h(x), x, w]\,dx$$
for any function $h$ for which the integral exists. Thus, $g = T^{-1}(q f_W)$.
$T$ and $f_W$ are unknown but can be estimated consistently using standard methods. However, $T^{-1}$ is a discontinuous operator (Horowitz and Lee 2007), and (2.3) and (2.4) present an ill-posed inverse problem. Because of the ill-posed inverse problem, the rate of convergence of any estimator of $g$ is typically much slower than the usual nonparametric rates. Consequently, a test based on comparing estimates of $g$ and $G$ will have low power.³
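The ill-posedness can be seen numerically. The sketch below is a generic illustration, not part of the paper's method: it discretizes a hypothetical linear integral operator with a smooth Gaussian kernel on $[0,1]$ (a stand-in for a linearization of $T$, whose actual kernel involves unknown densities) and shows that its singular values collapse toward zero. Solving $Ah = b$ exactly therefore amplifies an error in $b$ along the $j$-th singular direction by the factor $1/s_j$. The kernel width, grid size, and error size are illustrative assumptions.

```python
import numpy as np

# Midpoint grid on [0, 1] used to discretize an integral operator.
m = 100
x = (np.arange(m) + 0.5) / m

# Hypothetical smooth kernel k(w, x) = exp(-(w - x)^2 / 0.1); the matrix A
# approximates (A h)(w) = int_0^1 k(w, x) h(x) dx by the midpoint rule.
A = np.exp(-((x[:, None] - x[None, :]) ** 2) / 0.1) / m

# Singular values, returned by NumPy in descending order.
s = np.linalg.svd(A, compute_uv=False)

# A data error of size eps along the j-th left singular vector changes the
# exact solution of A h = b by eps / s[j] along the j-th right singular vector.
eps = 1e-8
for j in (0, 2, 4, 8):
    print(f"j={j}  s_j={s[j]:.2e}  error amplification={eps / s[j]:.2e}")
```

The amplification grows rapidly with $j$, which is the finite-dimensional counterpart of the discontinuity of $T^{-1}$.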
The test developed here does not require estimation of $g$ and, therefore, has greater “precision” than an estimator of $g$. Let $n$ denote the sample size used for testing. Under mild conditions, the test rejects $H_0$ with probability approaching 1 as $n \to \infty$ whenever $g(x) \ne G(x)$ on a set of non-zero probability. Moreover, like the test of Blundell and Horowitz (2007), the test developed here can detect a large class of structural functions $g$ whose distance from the conditional quantile function $G$ in a suitable metric is $O(n^{-1/2})$. In contrast, the rate of convergence in probability of a nonparametric estimator of $g$ is always slower than $O_p(n^{-1/2})$.⁴

3 See the qualification of this statement in footnote 1.
Throughout the remaining discussion, we use an extended version of (2.1) and (2.2) that allows $g$ to be a function of a vector of endogenous explanatory variables, $X$, and a set of exogenous explanatory variables, $Z$. We write this model as
(2.4) $$Y = g(X,Z) + U; \quad P(U \le 0 \mid Z = z, W = w) = q$$
for almost every $(z,w) \in \mathrm{supp}(Z,W)$, where $Y$ and $U$ are random scalars. $X$ and $W$ are random variables whose supports are compact sets that we take to be $[-a, 1+a]^p$ ($p \ge 1$) for some arbitrarily small $a > 0$ and $[0,1]^p$, respectively. $Z$ is a random variable whose support is a compact set that we take to be $[-a, 1+a]^r$ ($r \ge 0$). Thus, $\mathrm{supp}(X,Z,W) = [-a,1+a]^{p+r} \times [0,1]^p$. The compactness assumption is not restrictive because it can be satisfied by carrying out monotone increasing transformations of any components of $X$, $W$, and $Z$ whose supports are not compact. If $r = 0$, then $Z$ is not included in (2.4) and $\mathrm{supp}(X,W) = [-a,1+a]^p \times [0,1]^p$. $W$ is an instrument for $X$.
To avoid technical complications associated with the boundary of $\mathrm{supp}(X,Z)$ and enable us to use certain results of Guerre and Sabbah (2012), we condition our analysis and null hypothesis on $(X,Z) \in [0,1]^{p+r}$. This amounts to testing the hypothesis $P[Y - g(X,Z) \le 0 \mid X, Z, \mathcal{A}] = q$, where $\mathcal{A}$ is the event $(X,Z) \in [0,1]^{p+r}$. An alternative approach is to restrict $(X,Z)$ to $[-a_n, 1+a_n]^{p+r}$, where $a_n \ge 0$ and $a_n \to a$ as $n \to \infty$. We do not do this here, however, because it complicates the mathematics but has no practical effect on the performance of the test.
The resulting inferential problem is to test the null hypothesis, $H_0$, that
(2.5) $$P(U \le 0 \mid X = x, Z = z, \mathcal{A}) = q$$
except, possibly, if $(x,z)$ belongs to a set of probability 0. $H_0$ is equivalent to testing $P[g(X,Z) = G(X,Z) \mid \mathcal{A}] = 1$ or $P[Y - G(X,Z) \le 0 \mid Z = z, W = w, \mathcal{A}] = q$, where $G(x,z)$ is the $q$ quantile of $Y$ conditional on $(X, Z, \mathcal{A})$: $P[Y \le G(x,z) \mid X = x, Z = z, \mathcal{A}] = q$. The alternative hypothesis, $H_1$, is that (2.5) does not hold on some set that has non-zero probability or, equivalently, that $P[g(X,Z) = G(X,Z) \mid \mathcal{A}] < 1$. The data, $\{Y_i, X_i, Z_i, W_i : i = 1, \ldots, n\}$, are an independent random sample of $(Y, X, Z, W)$.
4 Nonparametric estimation and testing of conditional mean and median functions is another setting in which the rate of testing is faster than the rate of estimation. See, for example, Guerre and Lavergne (2002) and Horowitz and Spokoiny (2001, 2002).
2.2 The Test Statistic
To form the test statistic, let $f_{YXZW}$, $f_{XZW}$, and $f_{ZW}$, respectively, denote the probability density functions of $(Y,X,Z,W)$, $(X,Z,W)$, and $(Z,W)$. Let $f_{YXZW|\mathcal{A}}$, $f_{XZW|\mathcal{A}}$, and $f_{ZW|\mathcal{A}}$ denote the probability density functions conditional on $\mathcal{A}$. Define
$$F_{YXZW|\mathcal{A}}(y,x,z,w \mid \mathcal{A}) = \int_{-\infty}^{y} f_{YXZW|\mathcal{A}}(u,x,z,w \mid \mathcal{A})\,du.$$
Then under $H_0$,
(2.6) $$S(z,w) \equiv \int_{[0,1]^p} F_{YXZW|\mathcal{A}}[G(x,z),x,z,w \mid \mathcal{A}]\,dx - q\int_{[0,1]^p} f_{XZW|\mathcal{A}}(x,z,w \mid \mathcal{A})\,dx = 0$$
for almost every $(z,w) \in [0,1]^{p+r}$. $H_1$ is equivalent to the statement that (2.6) does not hold on a subset of $[0,1]^{p+r}$ with non-zero Lebesgue measure. A test statistic can be based on a sample analog of $\int S^2(z,w)\,dz\,dw$, but the resulting rate of testing is slower than $n^{-1/2}$ due to the need to estimate $f_{XZW|\mathcal{A}}$ and $F_{YXZW|\mathcal{A}}$ nonparametrically. The rate $n^{-1/2}$ can be achieved by carrying out an additional smoothing step. To this end, for $\xi_1, \xi_2 \in [0,1]^p$ and $\zeta_1, \zeta_2 \in [0,1]^r$, let $\ell(\xi_1,\zeta_1;\xi_2,\zeta_2)$ denote the kernel of a nonsingular integral operator from $L_2([0,1]^{p+r})$ to itself. That is, if
(2.7) $$\int_{[0,1]^{p+r}} \ell(\xi_1,\zeta_1;\xi_2,\zeta_2)\,\psi(\xi_1,\zeta_1)\,d\xi_1\,d\zeta_1 = 0$$
for all $(\xi_2,\zeta_2) \in [0,1]^{p+r}$, where $\psi \in L_2([0,1]^{p+r})$, then $\psi(\xi_1,\zeta_1) = 0$ for almost every $(\xi_1,\zeta_1) \in [0,1]^{p+r}$. For example, let $p = r = 1$. Then set $\ell(\xi_1,\zeta_1;\xi_2,\zeta_2) = \ell(\xi_1,\xi_2)\,\ell(\zeta_1,\zeta_2)$, where
$$\ell(\xi_1,\xi_2) = \sum_{k=1}^{\infty} 2^{-k}\sin(k\pi\xi_1)\sin(k\pi\xi_2)$$
and $\ell(\zeta_1,\zeta_2)$ is obtained by replacing $(\xi_1,\xi_2)$ with $(\zeta_1,\zeta_2)$. The sums can be truncated at a finite $k$ for computational purposes. Truncation has a negligible effect on the numerical results if $k$ is large. The operator is assumed to satisfy certain additional restrictions that are stated in Assumption 5 of Section A.1 of the Appendix. $H_0$ is equivalent to
(2.8) $$S(z,w) \equiv \int_{[0,1]^{2p+r}} F_{YXZW|\mathcal{A}}[G(x,\zeta),x,\zeta,\eta \mid \mathcal{A}]\,\ell(z,w;\zeta,\eta)\,dx\,d\zeta\,d\eta - q\int_{[0,1]^{2p+r}} f_{XZW|\mathcal{A}}(x,\zeta,\eta \mid \mathcal{A})\,\ell(z,w;\zeta,\eta)\,dx\,d\zeta\,d\eta = 0$$
for almost every $(z,w) \in [0,1]^{p+r}$. $H_1$ is equivalent to the statement that (2.8) does not hold on a subset of $[0,1]^{p+r}$ with non-zero probability. The test statistic is based on a sample analog of $\int S^2(z,w)\,dz\,dw$. $S(z,w)$ does not depend on $g$, and computing the sample analog does not require estimation of $g$.⁵
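The example weight for $p = r = 1$ given earlier in this section is easy to evaluate once the series is truncated. The NumPy sketch below evaluates the truncated $\ell(\xi_1,\xi_2)$, forms the product kernel, and checks that the discretized one-dimensional operator has the expected positive eigenvalues. The function names and the truncation point `k_max` are illustrative choices, not taken from the paper. Because $\{\sqrt{2}\sin(k\pi\xi)\}$ is orthonormal on $[0,1]$, the untruncated operator has eigenvalues $2^{-(k+1)}$, $k = 1, 2, \ldots$, all nonzero, and truncating at `k_max` changes the kernel by at most $\sum_{k > k_{\max}} 2^{-k} = 2^{-k_{\max}}$.

```python
import numpy as np

def ell_1d(a, b, k_max=50):
    """Truncated ell(a, b) = sum_{k=1}^{k_max} 2^{-k} sin(k pi a) sin(k pi b).

    a and b broadcast against each other, so a column vector and a row
    vector give the matrix ell(a_i, b_j).
    """
    k = np.arange(1, k_max + 1)
    a = np.asarray(a, dtype=float)[..., None]   # trailing axis indexes k
    b = np.asarray(b, dtype=float)[..., None]
    terms = 2.0 ** (-k) * np.sin(k * np.pi * a) * np.sin(k * np.pi * b)
    return terms.sum(axis=-1)

def ell_product(xi1, zeta1, xi2, zeta2, k_max=50):
    """Product kernel ell(xi1, zeta1; xi2, zeta2) = ell(xi1, xi2) ell(zeta1, zeta2)."""
    return ell_1d(xi1, xi2, k_max) * ell_1d(zeta1, zeta2, k_max)

# Discretize the 1-d integral operator by the midpoint rule on m points.
m = 200
grid = (np.arange(m) + 0.5) / m
L = ell_1d(grid[:, None], grid[None, :]) / m
eigs = np.linalg.eigvalsh(L)[::-1]   # descending order
print(eigs[:3])                       # close to 1/4, 1/8, 1/16
```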
To form a sample analog of $S(z,w)$, let $\hat G(x,z)$ be the nonparametric estimator of $G(x,z)$ that is described at the end of this section. Let $I(\cdot)$ denote the indicator function. It follows from (2.8) that
(2.9) $$\begin{aligned} S(z,w) &= \int_{[0,1]^{2p+r}} dx\,d\zeta\,d\eta \int_{-\infty}^{\infty} dy\, I[y \le G(x,\zeta)]\, f_{YXZW|\mathcal{A}}(y,x,\zeta,\eta \mid \mathcal{A})\,\ell(z,w;\zeta,\eta) \\ &\quad - q\int_{[0,1]^{2p+r}} f_{XZW|\mathcal{A}}(x,\zeta,\eta \mid \mathcal{A})\,\ell(z,w;\zeta,\eta)\,dx\,d\zeta\,d\eta \\ &= E_{YXZW|\mathcal{A}}\{[I(Y \le G(X,Z)) - q]\,\ell(z,w;Z,W)\} \\ &= E_{YXZW}\{[I(Y \le G(X,Z)) - q]\,\ell(z,w;Z,W)\,I[(X,Z) \in [0,1]^{p+r}]\}/P(\mathcal{A}), \end{aligned}$$
where $P(\mathcal{A})$ is the probability of the event $\mathcal{A}$. The sample analog of $S(z,w)$ is obtained from (2.9) by replacing $S(z,w)$ with $S(z,w)P(\mathcal{A})$, $G$ with $\hat G$, the population expectation $E_{YXZW}$ with the sample average, and multiplying the resulting expression by $n^{1/2}$ to obtain a random variable that has a non-degenerate limiting distribution. The resulting scaled sample analog is
(2.10) $$\hat S_n(z,w) = n^{-1/2}\sum_{i=1}^{n}[I(Y_i \le \hat G(X_i,Z_i)) - q]\,\ell(z,w;Z_i,W_i)\,I[(X_i,Z_i) \in [0,1]^{p+r}].$$
The test statistic is
$$\tau_n = \int_{[0,1]^{p+r}} \hat S_n^2(z,w)\,dz\,dw.$$
Under $H_0$,
$$\int_{[0,1]^{p+r}} S^2(z,w)\,dz\,dw = 0,$$
so $\tau_n$ differs from 0 only due to random sampling errors. Therefore, $H_0$ is rejected if $\tau_n$ is larger than can be explained by random sampling errors.
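For $p = 1$ and $r = 0$ the weight reduces to $\ell(w, W_i)$, and $\tau_n$ can be approximated by evaluating $\hat S_n$ from (2.10) on a midpoint grid. The NumPy sketch below assumes the fitted values $\hat G(X_i)$ are already available and uses the sine-series example weight given earlier in this section as a hypothetical choice of $\ell$; the function names, grid size, and truncation are illustrative. It computes the statistic only, not its critical value. In the usage lines, the true conditional median stands in for $\hat G$ purely to exercise the code.

```python
import numpy as np

def ell_1d(a, b, k_max=30):
    # Hypothetical weight: truncated sum_k 2^{-k} sin(k pi a) sin(k pi b).
    k = np.arange(1, k_max + 1)
    a = np.asarray(a, dtype=float)[..., None]
    b = np.asarray(b, dtype=float)[..., None]
    return (2.0 ** (-k) * np.sin(k * np.pi * a) * np.sin(k * np.pi * b)).sum(axis=-1)

def tau_statistic(y, x, w, g_fit, q, grid_size=100):
    """tau_n for p = 1, r = 0, using the midpoint rule on [0, 1].

    g_fit holds the fitted values G_hat(X_i); computing them is a separate step.
    """
    n = len(y)
    resid = (y <= g_fit).astype(float) - q            # I(Y_i <= G_hat(X_i)) - q
    inside = ((x >= 0.0) & (x <= 1.0)).astype(float)  # I(X_i in [0, 1])
    grid = (np.arange(grid_size) + 0.5) / grid_size   # evaluation points for w
    # S_hat_n(w) = n^{-1/2} sum_i [I(...) - q] ell(w, W_i) I(X_i in [0, 1])
    s_hat = ell_1d(grid[:, None], w[None, :]) @ (resid * inside) / np.sqrt(n)
    return np.mean(s_hat ** 2)                        # approximates int_0^1 S_hat^2 dw

# Usage with simulated data in which the conditional median of Y is 0.5 X.
rng = np.random.default_rng(1)
n = 500
x, w = rng.uniform(size=n), rng.uniform(size=n)
y = 0.5 * x + rng.standard_normal(n)
tau = tau_statistic(y, x, w, g_fit=0.5 * x, q=0.5)
print(tau)
```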
5 The function $\ell$ is the kernel of a non-singular integral operator, so smoothing in (2.8)-(2.10) does not cause the test presented in this paper to be inconsistent. To the best of our knowledge, smoothing is necessary to obtain an $n^{-1/2}$ rate of testing against local alternative hypotheses.

The estimator $\hat G$ is a kernel nonparametric quantile regression estimator with kernel $K$ and bandwidth $h > 0$. We assume that $K$ is supported on $[-1,1]$ and that
(2.11) $$\int_{-1}^{1} u^j K(u)\,du = \begin{cases} 1 & \text{if } j = 0 \\ 0 & \text{if } 1 \le j \le s-1 \end{cases}$$
for some $s \ge 2$. Define $K_h(v) = K(v/h)$. Also define
(2.12) $$K_{p,h}(x) = \prod_{k=1}^{p} K_h(x^{(k)}),$$
where $x^{(k)}$ denotes the $k$'th component of the vector $x$. Define $K_{r,h}$ similarly. Let $\rho_q$ be the check function: $\rho_q(y) = y[q - I(y \le 0)]$. The estimator of $G$ is
(2.13) $$\hat G(x,z) = \arg\inf_{a} \sum_{i=1}^{n} \rho_q(Y_i - a)\,K_{p,h}(x - X_i)\,K_{r,h}(z - Z_i).$$
The test statistic $\tau_n$ is obtained by substituting (2.13) into (2.10).
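When the kernel weights in (2.13) are nonnegative, the minimizer of the weighted check-function objective is a weighted $q$-quantile of the $Y_i$, so no numerical optimizer is needed. The NumPy sketch below assumes $p = 1$, $r = 0$ and the second-order ($s = 2$) biweight kernel $K(v) = (15/16)(1-v^2)^2 I(|v| \le 1)$, the kernel used in Section 4. With a higher-order kernel ($s > 2$) some weights can be negative and this shortcut no longer applies. The function names, data, and bandwidth in the usage lines are illustrative.

```python
import numpy as np

def biweight(v):
    # K(v) = (15/16)(1 - v^2)^2 I(|v| <= 1), a second-order kernel (s = 2).
    return np.where(np.abs(v) <= 1.0, 15.0 / 16.0 * (1.0 - v ** 2) ** 2, 0.0)

def check_loss(u, q):
    # rho_q(u) = u [q - I(u <= 0)]
    return u * (q - (u <= 0))

def g_hat(x0, y, x, q, h):
    """Local-constant estimate of G(x0) = Q_q(Y | X = x0) for p = 1, r = 0.

    Minimizes sum_i rho_q(Y_i - a) K((x0 - X_i) / h) over a; with nonnegative
    weights the minimizer is a weighted q-quantile of the Y_i.
    """
    wts = biweight((x0 - x) / h)
    if wts.sum() <= 0.0:
        raise ValueError("no observations within one bandwidth of x0")
    order = np.argsort(y)
    cum = np.cumsum(wts[order])
    # First sorted Y whose cumulative weight reaches q times the total weight.
    j = np.searchsorted(cum, q * cum[-1])
    return y[order][j]

rng = np.random.default_rng(2)
x = rng.uniform(size=2000)
y = np.sin(2 * np.pi * x) + 0.2 * rng.standard_normal(2000)
print(g_hat(0.25, y, x, q=0.5, h=0.1))   # near sin(pi / 2) = 1
```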
3. ASYMPTOTIC PROPERTIES
This section presents the asymptotic properties of the test of exogeneity based on $\tau_n$. Regularity
conditions for the theorems in this section are given in Section A.1 of the Appendix.
3.1 Asymptotic Properties of the Test Statistic under $H_0$
To obtain the asymptotic distribution of $\tau_n$ under $H_0$, let $f_{YXZ}$ denote the probability density function of $(Y,X,Z)$. Define
$$\begin{aligned} B_n(\zeta,\eta) = n^{-1/2}\sum_{i=1}^{n}\Big\{ &[I(Y_i \le g(X_i,Z_i)) - q]\,I[(X_i,Z_i) \in [0,1]^{p+r}]\,\ell(\zeta,\eta;Z_i,W_i) \\ &- \frac{\int_{[0,1]^p} f_{YXZW}[G(X_i,Z_i),X_i,Z_i,w]\,\ell(\zeta,\eta;Z_i,w)\,dw}{f_{YXZ}[G(X_i,Z_i),X_i,Z_i]}\,[I(Y_i \le G(X_i,Z_i)) - q]\,I[(X_i,Z_i) \in [0,1]^{p+r}]\Big\} \end{aligned}$$
and
$$R(\zeta_1,\eta_1;\zeta_2,\eta_2) = E[B_n(\zeta_1,\eta_1)\,B_n(\zeta_2,\eta_2)].$$
Define the operator $\Omega$ on $L_2([0,1]^{p+r})$ by
(3.1) $$(\Omega\phi)(\zeta_2,\eta_2) = \int_{[0,1]^{p+r}} R(\zeta_1,\eta_1;\zeta_2,\eta_2)\,\phi(\zeta_1,\eta_1)\,d\zeta_1\,d\eta_1.$$
Let $\{\omega_j : j = 1,2,\ldots\}$ denote the eigenvalues of $\Omega$ sorted so that $\omega_1 \ge \omega_2 \ge \cdots \ge 0$.⁶ Let $\{\chi^2_{1j} : j = 1,2,\ldots\}$ denote independent random variables that are distributed as chi-square with one degree of freedom. The following theorem gives the asymptotic distribution of $\tau_n$ under $H_0$.
Theorem 1: Let $H_0$ be true. Then under assumptions 1-5 of Section A.1 of the Appendix,
$$\tau_n \xrightarrow{d} \sum_{j=1}^{\infty} \omega_j\,\chi^2_{1j}.$$
Under $H_0$, $G = g$, so knowledge or estimation of $g$ is not needed to obtain the asymptotic distribution of $\tau_n$ under $H_0$ or the asymptotic critical value of $\tau_n$. Denote the $\alpha$-level asymptotic critical value by $z_\alpha$. The statistic $\tau_n$ is not asymptotically pivotal, so its asymptotic distribution cannot be tabulated. Section A.2 of the Appendix presents a method for obtaining an approximate asymptotic critical value.
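Section A.2 describes the paper's procedure for an approximate critical value; the sketch below is not that procedure but a generic illustration of the idea behind Theorem 1. It assumes that the covariance function $R$ (or an estimate of it) has been evaluated on a midpoint grid, approximates the eigenvalues $\omega_j$ of $\Omega$ by the eigenvalues of the grid matrix scaled by the cell volume (a Nyström-type discretization), and simulates $\sum_j \omega_j\chi^2_{1j}$ using the leading $K$ eigenvalues. The function names, the truncation $K$, and the number of draws are assumptions. As a check, the toy covariance $R(s,t) = \min(s,t) - st$ on $[0,1]$ has eigenvalues $1/(j\pi)^2$, so the limit is the Cramér-von Mises distribution, whose 5% critical value is about 0.461.

```python
import numpy as np

def omega_eigenvalues(r_grid, cell_volume):
    """Approximate eigenvalues of Omega in (3.1), in decreasing order.

    r_grid[a, b] holds R(t_a; t_b) on a grid of cell midpoints t_a.
    """
    r_sym = 0.5 * (r_grid + r_grid.T)               # remove round-off asymmetry
    vals = np.linalg.eigvalsh(r_sym * cell_volume)[::-1]
    return np.clip(vals, 0.0, None)                 # Omega is nonnegative definite

def simulated_critical_value(omegas, alpha, n_draws=100_000, seed=0):
    """1 - alpha quantile of sum_j omega_j chi2_{1j}, estimated by simulation."""
    rng = np.random.default_rng(seed)
    chi2 = rng.chisquare(df=1, size=(n_draws, len(omegas)))
    return np.quantile(chi2 @ omegas, 1.0 - alpha)

# Toy check: R(s, t) = min(s, t) - s t (Brownian bridge covariance).
m = 200
t = (np.arange(m) + 0.5) / m
r_grid = np.minimum(t[:, None], t[None, :]) - t[:, None] * t[None, :]
omegas = omega_eigenvalues(r_grid, 1.0 / m)[:50]    # keep the leading K = 50
z_05 = simulated_critical_value(omegas, alpha=0.05)
print(omegas[0], 1 / np.pi ** 2)   # leading eigenvalue vs. exact 1/pi^2
print(z_05)                        # close to the Cramer-von Mises value 0.461
```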
3.2 Consistency of the Test against a Fixed Alternative Hypothesis
In this section, it is assumed that $H_0$ is false. That is, $P[g(X,Z) = G(X,Z) \mid \mathcal{A}] < 1$. Define
(3.2) $$H(\zeta,\eta) = \int_{[0,1]^{2p+r}} \{F_{YXZW|\mathcal{A}}[G(x,z),x,z,w \mid \mathcal{A}] - F_{YXZW|\mathcal{A}}[g(x,z),x,z,w \mid \mathcal{A}]\}\,\ell(\zeta,\eta;z,w)\,dx\,dw\,dz.$$
Let $z_\alpha$ denote the $1-\alpha$ quantile of the asymptotic distribution of $\tau_n$ under sampling from the null-hypothesis model $Y = G(X,Z) + V$, $P(V \le 0 \mid X, Z) = q$. The following theorem establishes consistency of the $\tau_n$ test against a fixed alternative hypothesis.
Theorem 2: Let assumptions 1-5 of Section A.1 of the Appendix hold, and suppose that
$$\int_{[0,1]^{p+r}} H^2(\zeta,\eta)\,d\zeta\,d\eta > 0.$$
Then for any $\alpha$ such that $0 < \alpha < 1$,
$$\lim_{n\to\infty} P(\tau_n > z_\alpha) = 1.$$
Because $\ell$ is the kernel of a nonsingular integral operator, the $\tau_n$ test is consistent whenever $g(x,z)$ differs from $G(x,z)$ on a set of $(x,z)$ values whose probability exceeds zero.
3.3 Asymptotic Distribution under Local Alternatives
This section obtains the asymptotic distribution of $\tau_n$ under the sequence of local alternative hypotheses
(3.3) $$P[Y \le G(X,Z) + n^{-1/2}\Delta(X,Z) \mid \mathcal{A}, W = w, Z = z] = q$$
for almost every $(w,z) \in [0,1]^{p+r}$, where $\Delta$ is a bounded function on $[0,1]^{p+r}$. Under (3.3),
(3.4) $$g(x,z) = G(x,z) + n^{-1/2}\Delta(x,z)$$
and
$$Y = g(X,Z) + U; \quad P(U \le 0 \mid \mathcal{A}, Z = z, W = w) = q$$
for almost every $(w,z) \in [0,1]^{p+r}$.

6 $R$ is a bounded function under the assumptions of Section A.1. Therefore, $\Omega$ is a compact, completely continuous operator with discrete eigenvalues.
Let $\Omega$ be the integral operator defined in (3.1), $\{\phi_j\}$ denote the orthonormal eigenfunctions of $\Omega$, and $\{\omega_j\}$ denote the eigenvalues of $\Omega$ sorted so that $\omega_1 \ge \omega_2 \ge \cdots$. Let $f_{UXZW|\mathcal{A}}$ denote the probability density function of $(U,X,Z,W)$ conditional on $\mathcal{A}$. Define
$$\mu(\zeta,\eta) = -\int_{[0,1]^{2p+r}} f_{UXZW|\mathcal{A}}(0,x,z,w \mid \mathcal{A})\,\Delta(x,z)\,\ell(\zeta,\eta;z,w)\,dx\,dz\,dw$$
and
(3.5) $$\mu_j = \int_{[0,1]^{p+r}} \mu(\zeta,\eta)\,\phi_j(\zeta,\eta)\,d\zeta\,d\eta.$$
Let $\{\chi^2_{1j}(\mu_j^2/\omega_j) : j = 1,2,\ldots\}$ denote a sequence of independent random variables distributed as non-central chi-square with one degree of freedom and non-centrality parameters $\mu_j^2/\omega_j$.
The following theorem gives the asymptotic distribution of $\tau_n$ under the sequence of local alternatives (3.3)-(3.4).
Theorem 3: Let assumptions 1-5 of Section A.1 of the Appendix hold. Under the sequence of local alternatives (3.3)-(3.4),
$$\tau_n \xrightarrow{d} \sum_{j=1}^{\infty} \omega_j\,\chi^2_{1j}(\mu_j^2/\omega_j).$$
It follows from Theorem 3 that
$$\lim_{n\to\infty} P(\tau_n > z_\alpha) > \alpha$$
if $\mu_j^2 > 0$ for at least one $j$. In addition, for any $\varepsilon > 0$,
$$\lim_{n\to\infty} P(\tau_n > z_\alpha) > 1 - \varepsilon$$
if $\mu_j^2$ is sufficiently large for at least one $j$.
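Theorem 3 can be turned into a quick numerical calculation of asymptotic local power. Because $\omega_j\chi^2_{1j}(\mu_j^2/\omega_j)$ has the same distribution as $(\omega_j^{1/2}N_j + \mu_j)^2$ with $N_j$ standard normal, both the null limit and the local-alternative limit can be simulated from normal draws. The NumPy sketch below assumes that the $\omega_j$ and $\mu_j$ are given (here hypothetical values with $\omega_j = 1/(j\pi)^2$ and a deviation loading only on $\phi_1$) and truncates both series at a finite number of terms. It illustrates that power exceeds $\alpha$ as soon as some $\mu_j \ne 0$ and increases with $|\mu_j|$.

```python
import numpy as np

def asymptotic_local_power(omegas, mus, alpha, n_draws=100_000, seed=0):
    """P(limit under (3.3)-(3.4) > z_alpha), with both limits truncated."""
    omegas = np.asarray(omegas, dtype=float)
    mus = np.asarray(mus, dtype=float)
    rng = np.random.default_rng(seed)
    # Null limit: sum_j omega_j N_j^2, i.e. sum_j omega_j chi2_{1j}.
    null = (rng.standard_normal((n_draws, omegas.size)) ** 2) @ omegas
    z_alpha = np.quantile(null, 1.0 - alpha)
    # Local-alternative limit: sum_j (sqrt(omega_j) N_j + mu_j)^2.
    draws = np.sqrt(omegas) * rng.standard_normal((n_draws, omegas.size)) + mus
    return np.mean(np.sum(draws ** 2, axis=1) > z_alpha)

j = np.arange(1, 21)
omegas = 1.0 / (j * np.pi) ** 2          # hypothetical eigenvalues
for mu1 in (0.0, 0.5, 1.0):
    mus = np.zeros(j.size)
    mus[0] = mu1                          # deviation loads on phi_1 only
    print(mu1, asymptotic_local_power(omegas, mus, alpha=0.05))
```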
3.4 Uniform Consistency
This section shows that for any $\varepsilon > 0$, the $\tau_n$ test rejects $H_0$ with probability exceeding $1-\varepsilon$ uniformly over a set of functions $g$ whose distance from $G$ is $O(n^{-1/2})$. This set contains deviations from $H_0$ that cannot be represented as sequences of local alternatives. Thus, the set is larger than the class of local alternatives against which the power of $\tau_n$ exceeds $1-\varepsilon$. No test can have high power against all alternatives to the null hypothesis (Janssen 2000). The practical consequence of the result of this section is to define a relatively large class of alternatives against which the $\tau_n$ test has high power in large samples.
The following additional notation is used. Let $\|\cdot\|_2$ denote the norm in $L_2([0,1]^{p+r})$. Define $H(\zeta,\eta)$ as in (3.2). Define the linear operator $T$ by
$$(T\psi)(\zeta,\eta) = \int_{[0,1]^{2p+r}} f_{YXZW|\mathcal{A}}[g(x,z),x,z,w \mid \mathcal{A}]\,\ell(\zeta,\eta;z,w)\,\psi(x,z)\,dx\,dw\,dz$$
and the function
$$\pi(x,z) = g(x,z) - G(x,z).$$
Let $\mathcal{C}_C^{a}([0,1]^{p+r})$ be the class of functions defined in Section A.1 of the Appendix. For some finite $C > 0$, let $\mathcal{F}_{nC}$ be the class of functions $g \in \mathcal{C}_C^{a}([0,1]^{p+r})$ with $a > p + r$ and $C < \infty$ satisfying:
(i) There is a function $G(x,z)$ such that $P[Y \le G(X,Z) \mid X = x, Z = z, \mathcal{A}] = q$ for almost every $(x,z) \in [0,1]^{p+r}$.
(ii) Assumption 3 in Section A.1 of the Appendix is satisfied with $V = Y - G(X,Z)$.
(iii) The density function $f_{YXZW}$ satisfies Assumption 1 in Section A.1.
(iv) The function $g$ satisfies Assumption 2 in Section A.1 with $U = Y - g(X,Z)$.
(v) $\|T\pi\|_2 \ge n^{-1/2}C$.
Condition (v) implies that $\mathcal{F}_{nC}$ contains alternative models $g$ such that $\|g - G\|_2 = O(n^{-1/2})$. In addition, condition (v) rules out differences between the structural functions under the null and alternative hypotheses, $\pi(x,z) = g(x,z) - G(x,z)$, that are linear combinations of eigenfunctions of $T$ associated with eigenvalues that converge to zero too rapidly. Thus, the $\tau_n$ test has low power against deviations from $H_0$ that operate through eigenfunctions of $T$ associated with eigenvalues that converge to zero very rapidly. Such deviations often correspond to highly oscillatory functions that have little relevance for economic applications. Conditions (i)-(v) define the set over which the $\tau_n$ test is uniformly consistent. They do not restrict the sampled distribution or the function $\ell$.
The following theorem states the result of this section.
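Theorem 4: Let assumptions 1-5 of Section A.1 of the Appendix hold. Then given any $\delta > 0$ and any $\alpha \in (0,1)$, there is a $C_0 < \infty$ such that for any $C \ge C_0$,
$$\lim_{n\to\infty} \inf_{g \in \mathcal{F}_{nC}} P(\tau_n > z_\alpha) \ge 1 - \delta$$
and
$$\lim_{n\to\infty} \inf_{g \in \mathcal{F}_{nC}} P(\tau_n > \hat z_{\varepsilon\alpha}) \ge 1 - 2\delta,$$
where $\hat z_{\varepsilon\alpha}$ is the estimated $1-\alpha$ critical value described in Section A.2.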
3.5 Comparison with Breunig’s (2018) Test
Breunig (2018) presents a test of exogeneity in a nonparametric quantile regression model. The $\tau_n$ statistic described in this paper and Breunig’s (2018) test statistic are both quadratic forms of sample analogs of the moment condition that holds under the null hypothesis of exogeneity. Breunig (2018) treats sequences of local alternatives for which $\pi = \delta_n\xi + o(\delta_n)$, where $\delta_n \gg n^{-1/2}$ and $\xi$ is a function that does not depend on $n$. These sequences satisfy condition (v) of the definition of $\mathcal{F}_{nC}$. Breunig’s (2018) Corollary 3.2 shows that the asymptotic power of his test under these sequences is greater than the test’s level, $\alpha$, but less than 1.⁷ In contrast, under the assumptions of Theorem 4 of this paper, $P(\tau_n > z_\alpha) \to 1$ for any $\alpha \in (0,1)$ if $\delta_n \gg n^{-1/2}$. Thus, the $\tau_n$ test has greater asymptotic power than Breunig’s test under the sequences of local alternatives that he considers and the assumptions of Theorem 4.
7 There are typographical errors in the statement of Breunig’s (2018) Corollary 3.2. In the notation of the corollary, $\delta_n^2$ should be $O(n^{-1}\sqrt{m_n})$, not $O(\sqrt{m_n})$, and $m_n$ should satisfy Breunig’s condition (2.10).

Breunig’s (2018) test can be more powerful asymptotically against sequences of alternatives that are not in $\mathcal{F}_{nC}$. As an example, let $\{\lambda_j, \phi_j : j = 1,2,\ldots\}$ denote the eigenvalues and orthonormal eigenvectors of the operator $T$ defined in Section 3.4 with $\lambda_j = j^{-5/4}$. Let the sequence of alternative hypotheses be
(3.6) $$P[Y \le G(X) + n^{-1/4}\Delta_n(X) \mid \mathcal{A}, W = w] = q,$$
where
(3.7) $$\Delta_n(x) = \phi_n(x).$$
Then $\|T\pi\|_2 = n^{-3/2}$ and $P(\tau_n > z_\alpha) \to \alpha$. Asymptotically, the $\tau_n$ test rejects the null and alternative hypotheses with the same probability. Breunig’s (2018) test, however, rejects the alternative hypothesis with probability approaching 1 as $n \to \infty$ if the series length $m_n$ in the test statistic is chosen properly. A numerical illustration is presented in Section A.5 of the appendix.
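The rate $\|T\pi\|_2 = n^{-3/2}$ in this example follows in one line. The derivation below assumes, by analogy with (3.3)-(3.4), that (3.6)-(3.7) correspond to $\pi = g - G = n^{-1/4}\phi_n$, and uses that $T$ is linear with $T\phi_n = \lambda_n\phi_n$ and $\|\phi_n\|_2 = 1$:

```latex
\|T\pi\|_2 = n^{-1/4}\,\|T\phi_n\|_2
           = n^{-1/4}\,\lambda_n\,\|\phi_n\|_2
           = n^{-1/4}\, n^{-5/4}
           = n^{-3/2},
\qquad
\frac{\|T\pi\|_2}{n^{-1/2}} = n^{-1} \to 0 .
```

Hence condition (v), $\|T\pi\|_2 \ge n^{-1/2}C$, fails for every fixed $C > 0$ once $n > 1/C$, so this sequence eventually lies outside $\mathcal{F}_{nC}$.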
To obtain a test that is consistent against sequences of alternatives in $\mathcal{F}_{nC}$ and sequences like the foregoing one, let $\tau_B$ denote Breunig’s (2018) statistic. Let $z_{\alpha_\tau}$ and $z_{\alpha_B}$, respectively, denote the $\alpha_\tau$- and $\alpha_B$-level critical values of $\tau_n$ and $\tau_B$. Define the combined statistic
$$\tau_C = \max[I(\tau_n > z_{\alpha_\tau}),\, I(\tau_B > z_{\alpha_B})].$$
Reject $H_0$ if $\tau_C = 1$. The test based on $\tau_C$ is consistent against any alternative for which the test based on $\tau_n$ or $\tau_B$ is consistent. If $H_0$ is correct, then the asymptotic level of $\tau_C$ is between $\max(\alpha_\tau, \alpha_B)$ and $\alpha_\tau + \alpha_B$.
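The level bounds for $\tau_C$ are the elementary bounds $\max\{P(A),P(B)\} \le P(A \cup B) \le P(A) + P(B)$ applied to the two rejection events. The NumPy sketch below implements the decision rule and checks the bounds by simulation under a hypothetical null model in which the two tests' standardized statistics are jointly normal with correlation `corr`. The joint null distribution of $\tau_n$ and $\tau_B$ is not specified in this section, so the normal model and the correlation values are assumptions made only for illustration.

```python
import numpy as np
from statistics import NormalDist

def combined_reject(tau_n, z_tau, tau_b, z_b):
    """tau_C = max[I(tau_n > z_tau), I(tau_B > z_B)]; H0 is rejected if tau_C = 1."""
    return int(max(tau_n > z_tau, tau_b > z_b))

def combined_level(alpha_tau, alpha_b, corr, n_draws=200_000, seed=0):
    """Rejection rate of tau_C when each test alone has its nominal level."""
    rng = np.random.default_rng(seed)
    cov = [[1.0, corr], [corr, 1.0]]
    stats = rng.multivariate_normal([0.0, 0.0], cov, size=n_draws)
    z_tau = NormalDist().inv_cdf(1.0 - alpha_tau)   # level alpha_tau on its own
    z_b = NormalDist().inv_cdf(1.0 - alpha_b)       # level alpha_b on its own
    reject = (stats[:, 0] > z_tau) | (stats[:, 1] > z_b)
    return reject.mean()

for corr in (0.0, 0.5, 0.9):
    print(corr, combined_level(0.05, 0.05, corr))
```

Higher correlation between the two statistics moves the level toward the lower bound $\max(\alpha_\tau, \alpha_B)$; independence puts it near $\alpha_\tau + \alpha_B - \alpha_\tau\alpha_B$.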
3.6 Weight Functions
This section considers the choice of the weight function $\ell(z,w;\zeta,\eta)$. We show that setting $\ell(z,w;\zeta,\eta) = \ell_1(z,\zeta)\,f_{UXZW|\mathcal{A}}(0,\eta,z,w \mid \mathcal{A})$ has certain power advantages over a weight function that does not depend on the distribution of $(U,X,Z,W)$. The function $\ell_1$ is assumed to be the kernel of a non-singular integral operator from $L_2([0,1]^r)$ to itself. Horowitz and Lee (2009) present a method for estimating $f_{UXZW}(0,x,z,w)$ and, therefore, $f_{UXZW|\mathcal{A}}(0,x,z,w \mid \mathcal{A})$. Section A.4 of the Appendix outlines the extension of Theorems 1-5 to the case of an estimated weight function.
To start, assume that $r = 0$, so $Z$ is not in the model. Let $\tau_{nf}$ denote the $\tau_n$ statistic with weight function $f_{UXW}(0,x,w)$ and $\tilde\tau_n$ denote the statistic with a fixed weight function $\ell(w,\eta)$ that does not depend on the distribution of $(U,X,W)$. The arguments of Horowitz and Lee (2009) show that there are combinations of density functions $f_{UXW}$ and local alternative models such that an $\alpha$-level test based on $\tilde\tau_n$ has local power that is arbitrarily close to $\alpha$, whereas the asymptotic local power of an $\alpha$-level test based on $\tau_{nf}$ is bounded away from and above $\alpha$. In contrast, it is not possible for the asymptotic local power of the $\alpha$-level $\tau_{nf}$ test to approach $\alpha$ while the asymptotic local power of the $\alpha$-level $\tilde\tau_n$ test remains bounded away from and above $\alpha$.
Horowitz and Lee (2009) did not investigate the case of $r \ge 1$. The following theorem extends their result to this case.
Theorem 5: Let assumptions 1-5 of Section A.1 of the Appendix hold. Let $\Delta(x,z)$ be the bounded function defined in (3.3)-(3.4). Fix the functions $\ell(z,w;\zeta,\eta)$ and $\ell_1(z,\zeta)$. Assume that these functions are bounded and that $\ell_1$ is bounded away from 0.
(a) There are combinations of density functions $f_{UXZW}$ and functions $\Delta$ such that an $\alpha$-level test based on $\tilde\tau_n$ has asymptotic local power that is arbitrarily close to $\alpha$, whereas the asymptotic local power of an $\alpha$-level test based on $\tau_{nf}$ is bounded away from and above $\alpha$.
(b) It is not possible for the asymptotic local power of the $\alpha$-level $\tau_{nf}$ test to approach $\alpha$ while the asymptotic local power of the $\alpha$-level $\tilde\tau_n$ test remains bounded away from and above $\alpha$.
Theorem 5 does not imply that the power of $\tau_{nf}$ always exceeds that of $\tilde\tau_n$. Moreover, in finite samples, random sampling errors in an estimate of $f_{UXZW}$ can reduce the power of $\tau_{nf}$ and increase the difference between the true and nominal probabilities of rejecting a correct $H_0$. Consequently, a weight function that does not depend on the sample may be attractive in applications. Section 4 provides illustrations of the finite-sample performances of $\tau_{nf}$ and $\tilde\tau_n$ with two weight functions that do not depend on the sample.
4. MONTE CARLO EXPERIMENTS
This section reports the results of a Monte Carlo investigation of the finite-sample performance of the $\tau_n$ test. In the experiments, $p = 1$ and $r = 0$, so $Z$ does not enter the model. Realizations of $(X,W,U)$ were generated by
$$W = \Phi(\zeta),$$
$$X = \Phi\big[\rho_1\zeta + (1-\rho_1^2)^{1/2}\xi\big],$$
and
$$U = \rho_2\xi + (1-\rho_2^2)^{1/2}\nu,$$
where $\Phi$ is the $N(0,1)$ distribution function; $\zeta$, $\xi$, and $\nu$ are independent random variables with $N(0,1)$ distributions; and $\rho_1$ and $\rho_2$ ($0 \le \rho_1, \rho_2 \le 1$) are constant parameters whose values vary among experiments. The parameter $\rho_1$ determines the strength of the instrument $W$, and $\rho_2$ determines the strength of the correlation between $U$ and $X$. $H_0$ is true if $\rho_2 = 0$ and false otherwise. Realizations of $Y$ were generated from
$$Y = \theta_0 + \theta_1 X + \sigma_U U, \tag{4.1}$$
where $\theta_0 = 0$, $\theta_1 = 0.5$, and $\sigma_U = 0.1$. Experiments were carried out with $\rho_1 = 0.35$ or $0.7$, and $\rho_2 = 0$, $0.1$, $0.2$, or $0.3$. The instrument is stronger when $\rho_1 = 0.7$ than when $\rho_1 = 0.35$, and the correlation between $X$ and $U$ increases as $\rho_2$ increases. The sample size was $n = 750$, $1000$, or $2000$, depending on the experiment, and the nominal probability of rejecting a correct $H_0$ was 0.05.
Conditioning on has a negligible effect on the results of the experiments and is not carried out. There
were 2000 Monte Carlo replications per experiment.
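The design above can be reproduced with a short simulation. The sketch below is ours, not the authors' code; it assumes NumPy and builds $\Phi$ and $\Phi^{-1}$ from the Python standard library. The helper `conditional_median` is our own derivation rather than a formula stated in the paper: because $\Phi^{-1}(X)$ and $U$ are jointly normal with covariance $\rho_2(1-\rho_1^2)^{1/2}$, the median of $Y$ given $X = x$ is $G(x) = \theta_0 + \theta_1 x + \sigma_U \rho_2 (1-\rho_1^2)^{1/2}\Phi^{-1}(x)$, which coincides with the structural function $\theta_0 + \theta_1 x$ only when $\rho_2 = 0$.

```python
import numpy as np
from math import erf, sqrt
from statistics import NormalDist

# Vectorized N(0,1) CDF and quantile function built from the standard library.
_norm_cdf = np.vectorize(lambda t: 0.5 * (1.0 + erf(t / sqrt(2.0))), otypes=[float])
_norm_ppf = np.vectorize(NormalDist().inv_cdf, otypes=[float])


def simulate_design(n, rho1, rho2, theta0=0.0, theta1=0.5, sigma_u=0.1, seed=None):
    """Draw (Y, X, W, U) from the Monte Carlo design of Section 4 (p = 1, r = 0)."""
    rng = np.random.default_rng(seed)
    zeta, xi, nu = rng.standard_normal((3, n))                # independent N(0, 1) draws
    w = _norm_cdf(zeta)                                       # W = Phi(zeta)
    x = _norm_cdf(rho1 * zeta + np.sqrt(1.0 - rho1**2) * xi)  # X = Phi(rho1 zeta + (1 - rho1^2)^{1/2} xi)
    u = rho2 * xi + np.sqrt(1.0 - rho2**2) * nu               # U = rho2 xi + (1 - rho2^2)^{1/2} nu
    y = theta0 + theta1 * x + sigma_u * u                     # model (4.1)
    return y, x, w, u


def conditional_median(x, rho1, rho2, theta0=0.0, theta1=0.5, sigma_u=0.1):
    """Median of Y given X = x in this design (our derivation, not from the paper)."""
    x = np.asarray(x, dtype=float)
    t = _norm_ppf(x)                                          # Phi^{-1}(X) is N(0, 1)
    return theta0 + theta1 * x + sigma_u * rho2 * np.sqrt(1.0 - rho1**2) * t
```

For example, `simulate_design(750, 0.35, 0.2, seed=0)` draws one sample for the $n = 750$, $\rho_1 = 0.35$, $\rho_2 = 0.2$ cell of Table 1.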
The kernel function $K(v) = (15/16)(1-v^2)^2 I(|v| \le 1)$ was used to compute $\hat G$ in $\tau_n$ and $\hat f_{YXW}$ in the estimated critical value of $\tau_n$ and in the data-dependent weight function. The plug-in bandwidth of Yu and Jones (1998) was used for $\hat G$ in the critical value of $\tau_n$. The multivariate version of Silverman's (1986) rule-of-thumb bandwidth was used for $\hat f_{YXW}$. The tuning parameter $K_\varepsilon$, which is defined in Section A.2 of the Appendix, was set at 15. Setting $K_\varepsilon$ equal to other values up to $K_\varepsilon = 30$ yields similar results. Four different weight functions $\ell(w,\eta)$ were used in $\tau_n$. One is the data-dependent estimated probability density function $\hat f_{YXW}[\hat g(\eta),\eta,w]$, with $\hat g$ computed using the method of Horowitz and Lee (2009). The bandwidths for $\hat f_{YXW}$ in the Horowitz-Lee estimator were $h_X = h_Y = 0.01$ for $X$ and $Y$ and $h_W = 0.3$ for $W$. The other weight functions are not data dependent. The second weight function is the infeasible true probability density function $f_{YXW}[g(\eta),\eta,w]$. The third and fourth weight functions are $\ell(w,\eta) = I(w \le \eta)$ and $\ell(w,\eta) = \exp(w\eta)$, respectively. The third weight function was used by Song (2010) and Stute and Zhu (1998). The fourth was proposed by Bierens (1990). The second weight function is not feasible in applications, but it indicates how much finite-sample performance is lost because of random sampling errors in estimating the weight function. We also computed the power of Breunig's (2018) test. As in Breunig (2018), we used a basis of B-splines of order 2. We chose the numbers of knots by Monte Carlo to maximize the test's power. This method of choosing knots is not feasible in applications and gives Breunig's test an advantage.
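A minimal sketch of the biweight kernel and of a product-kernel density estimate of the kind used for $\hat f_{YXW}$ follows. The function names are ours. The bandwidth helper is an assumption: it uses the common normal-reference form of Silverman's multivariate rule, $h_j = \hat\sigma_j[4/((d+2)n)]^{1/(d+4)}$, which is derived for a Gaussian kernel, so the constant the authors paired with the biweight kernel may differ.

```python
import numpy as np


def biweight_kernel(v):
    """K(v) = (15/16)(1 - v^2)^2 I(|v| <= 1), the kernel used in Section 4."""
    v = np.asarray(v, dtype=float)
    return np.where(np.abs(v) <= 1.0, (15.0 / 16.0) * (1.0 - v**2) ** 2, 0.0)


def _as_2d(a):
    """Return an (n, d) float array; a 1-D input is n observations of one variable."""
    a = np.asarray(a, dtype=float)
    return a.reshape(-1, 1) if a.ndim == 1 else a


def silverman_bandwidths(data):
    """Assumed normal-reference rule h_j = sd_j * [4 / ((d + 2) n)]^{1 / (d + 4)}."""
    data = _as_2d(data)
    n, d = data.shape
    return data.std(axis=0, ddof=1) * (4.0 / ((d + 2) * n)) ** (1.0 / (d + 4))


def product_kernel_density(points, data, bandwidths):
    """Product biweight kernel density estimate at each row of `points` (m x d)."""
    points, data = _as_2d(points), _as_2d(data)
    h = np.asarray(bandwidths, dtype=float)
    u = (points[:, None, :] - data[None, :, :]) / h   # scaled differences, shape (m, n, d)
    weights = np.prod(biweight_kernel(u), axis=2)     # product kernel over the d coordinates
    return weights.mean(axis=1) / np.prod(h)
```

To evaluate a trivariate density such as $\hat f_{YXW}$ at a single point, pass the point as a one-row array, for example `[[y0, x0, w0]]`, together with an $n \times 3$ data array.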
The results of the experiments are shown in Table 1 for $\rho_1 = 0.35$ and Table 2 for $\rho_1 = 0.7$. In the tables, $\tau_{nD}$, $\tau^*_{nD}$, $\tau_{nI}$, and $\tau_{nB}$, respectively, denote the $\tau_n$ tests with the data-dependent weight function, the infeasible weight function, the Song (2010) weight function, and the Bierens (1990) weight function. In what follows, the difference between the empirical and nominal probabilities of rejecting a correct $H_0$ is called the error in the rejection probability or ERP. The performance of the $\tau_{nB}$ test is poor. It has a large ERP when $n < 2000$ and low power. The $\tau_{nI}$ test has the best performance over all experiments. Its ERP is low. Its power is higher than that of the $\tau_{nB}$ test and only slightly lower than the power of the infeasible $\tau^*_{nD}$ test in experiments in which the $\tau^*_{nD}$ test has a low ERP. The $\tau_{nD}$ test has a relatively high ERP if the instrument is weak ($\rho_1 = 0.35$) or $n$ is small. The power of the $\tau_{nD}$ test is lower than that of the $\tau^*_{nD}$ test. The relatively poor performance of $\tau_{nD}$ compared to $\tau^*_{nD}$ is a consequence of random sampling errors in estimating $f_{YXW}[g(\eta),\eta,w]$ in $\tau_{nD}$. All forms of the $\tau_n$ test have higher power than Breunig's (2018) test.
We have also carried out experiments using the design of Breunig (2018, Table 3). The results are shown in Table 3 of this paper. The details of the design and the definitions of the symbols $\zeta$ and $\vartheta$ are given in Breunig (2018, Section 4). As in Tables 1 and 2, all forms of the $\tau_n$ test have higher power than Breunig's (2018) test. Section A.5 of the appendix presents additional Monte Carlo comparisons of the powers of the $\tau_n$ test and Breunig's test.
5. EMPIRICAL EXAMPLE: THE DEMAND FOR GASOLINE
This section reports the use of $\tau_n$ to test exogeneity of price in a quantile regression model of the demand for gasoline. Under the null hypothesis that price is exogenous, the model is
$$Q = G(P,Y) + U; \qquad P(U \le 0 \mid P, Y) = q, \tag{5.1}$$
where $Q$ is the quantity of gasoline purchased by a household, $P$ is the price paid, $Y$ is the household's income, and $U$ is an unobserved random variable whose $q$-quantile is zero. In a nonparametric quantile regression analysis, Blundell, Horowitz, and Parey (2017) found this function to be nonlinear and not satisfactorily approximated by standard parametric models. Blundell, Horowitz, and Parey (2012), Schmalensee and Stoker (1999), and Hausman and Newey (1995) obtained similar results for conditional mean functions. As Blundell, Horowitz, and Parey (2017) explain, under suitable restrictions, quantile regression permits demand to be recovered at a specific point in the distribution of unobservables, thereby facilitating analysis of differential effects of price changes and welfare analysis across the distribution of unobservables.
We test the exogeneity of $P$ in the demand for gasoline by households with median consumption in a range of income and demographic groups. Thus, $q = 0.5$ in (5.1). The demographic and income groups are listed in Columns 1 and 2 of Table 4. The data are from the 2009 National Household Travel Survey (NHTS). The NHTS surveys the civilian non-institutionalized population in the United States. It is a household-level survey conducted by telephone and complemented by travel diaries and odometer readings. The instrument is the distance from a major oil platform in the Gulf of Mexico to the capital of the state in which a household is located (Blundell, Horowitz, and Parey 2012). The cost of transporting gasoline from its supply source is a major determinant of its price. The Gulf Coast region accounts for the majority of production of finished gasoline in the United States. It is also the starting point for most major gasoline pipelines. Therefore, we expect the cost of transportation to increase as distance from the Gulf of Mexico increases. Blundell, Horowitz, and Parey (2012) provide evidence on the relation between gasoline price and distance from the Gulf Coast.
We use $\tau_n$ with the data-dependent weight function $\hat f_{YXW}[\hat g(\eta),\eta,w]$, which gives the $\tau_{nD}$ test statistic described in Section 4. We do not use $\tau_{nI}$ because, with the NHTS data, the weight function $I(w \le \eta)$ places high weight on regions where the data are sparse and low weight on regions where the data are dense. We set $K_\varepsilon = 50$. The other tuning parameters needed to implement $\tau_{nD}$ were selected by the methods described in Section 4. The kernel function is the same as the one used in Section 4.
The results of the test of the hypothesis that $P$ is exogenous for the various income and demographic groups are shown in Table 4. The evidence for endogeneity of $P$ is mixed. At conventional significance levels, there is no evidence of endogeneity in 5 of the 11 tabulated demographic and income groups ($p > 0.11$). The interpretation of the results for the other 6 groups depends on how one treats the outcomes of multiple tests. All of the $p$-values for these groups are below 0.05, which might be taken as evidence that $P$ is endogenous for these groups. However, the Bonferroni method for selecting the $p$-value to achieve a Type I error probability of 0.05 with 11 tests results in $p = 0.05/11 \approx 0.0045$. Using this conservative criterion, there is evidence of endogeneity for only one demographic and income group. More generally, the decision about which groups show evidence of price endogeneity in this multiple testing setting depends on how one balances the probabilities of Type I and Type II errors. See, for example, Lehmann and Romano (2005, Ch. 9). Despite this ambiguity, the results of the exogeneity test indicate that price endogeneity is potentially important in model (5.1) with the NHTS data. It would be reasonable to calculate the objects of interest in an application with and without assuming that price is exogenous in the 6 demographic and income groups for which $p < 0.05$ in Table 4.
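The multiple-testing arithmetic can be checked directly from the $p$-values in Table 4. The sketch below is illustrative; the dictionary keys and function names are ours, and Holm's step-down procedure is included only as one alternative way of treating multiple tests, not as a method used in the paper.

```python
# p-values from Table 4, keyed by (demographic group, income band in thousands of US dollars)
TABLE4_P_VALUES = {
    ("1 adult, no children at home", "0-45"): 0.012,
    ("1 adult, no children at home", "45-65"): 0.316,
    ("2 adults, no children at home", "0-45"): 0.152,
    ("2 adults, no children at home", "45-65"): 0.001,
    ("2 adults, no children at home", "65-100"): 0.118,
    ("2 adults, youngest child < 15 years", "0-45"): 0.217,
    ("2 adults, youngest child < 15 years", "45-65"): 0.009,
    ("2 adults, youngest child < 15 years", "65-100"): 0.020,
    ("Retired, no children at home", "0-45"): 0.007,
    ("Retired, no children at home", "45-65"): 0.022,
    ("Retired, no children at home", "65-100"): 0.429,
}


def unadjusted_rejections(p_values, level=0.05):
    """Groups whose individual p-value is at or below `level`."""
    return [key for key, p in p_values.items() if p <= level]


def bonferroni_rejections(p_values, level=0.05):
    """Bonferroni: reject when p <= level / m, where m is the number of tests."""
    threshold = level / len(p_values)
    return [key for key, p in p_values.items() if p <= threshold]


def holm_rejections(p_values, level=0.05):
    """Holm's step-down procedure (an alternative, not used in the paper)."""
    ordered = sorted(p_values.items(), key=lambda item: item[1])
    m = len(ordered)
    rejected = []
    for k, (key, p) in enumerate(ordered):
        if p > level / (m - k):  # (k+1)-th smallest p-value is compared with level / (m - k)
            break
        rejected.append(key)
    return rejected
```

Six groups are rejected without adjustment, while both the Bonferroni rule and, here, Holm's procedure reject only the group of two adults without children and income 45-65.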
6. CONCLUSIONS
Endogeneity of explanatory variables is an important problem in applied econometrics. Erroneously assuming that explanatory variables are exogenous can lead to highly misleading results. This paper has described a test for exogeneity in nonparametric quantile regressions. The test does not use a parametric model, thereby avoiding the possibility of obtaining misleading results due to misspecification of the model. The test also avoids the slow rate of convergence and potentially low power associated with nonparametric instrumental variables estimation of either mean- or quantile-regression models. The new test has non-trivial power against alternative hypotheses whose "distance" from the null hypothesis of exogeneity is $O(n^{-1/2})$, which is the same as the distance possible with tests based on parametric models. The results of Monte Carlo experiments and an empirical application have illustrated the performance of the test.
TABLE 1: RESULTS OF MONTE CARLO EXPERIMENTS WITH $\rho_1 = 0.35$
Empirical Probability of Rejecting $H_0$

n      $\rho_2$   $\tau_{nD}$   $\tau^*_{nD}$   $\tau_{nI}$   $\tau_{nB}$   Breunig's Test
750    0          0.093         0.073           0.056         0.074         0.050
       0.1        0.108         0.136           0.104         0.098         0.056
       0.2        0.194         0.334           0.279         0.184         0.091
       0.3        0.384         0.640           0.542         0.350         0.170
1000   0          0.072         0.067           0.053         0.072         0.049
       0.1        0.123         0.168           0.124         0.114         0.058
       0.2        0.226         0.416           0.328         0.230         0.103
       0.3        0.524         0.782           0.690         0.420         0.214
2000   0          0.062         0.056           0.046         0.054         0.044
       0.1        0.162         0.252           0.205         0.147         0.062
       0.2        0.483         0.697           0.608         0.384         0.157
       0.3        0.860         0.968           0.926         0.731         0.404
TABLE 2: RESULTS OF MONTE CARLO EXPERIMENTS WITH $\rho_1 = 0.7$
Empirical Probability of Rejecting $H_0$

n      $\rho_2$   $\tau_{nD}$   $\tau^*_{nD}$   $\tau_{nI}$   $\tau_{nB}$   Breunig's Test
750    0          0.063         0.048           0.050         0.098         0.047
       0.1        0.217         0.240           0.212         0.164         0.059
       0.2        0.640         0.757           0.660         0.410         0.147
       0.3        0.952         0.982           0.958         0.721         0.378
1000   0          0.057         0.048           0.054         0.086         0.043
       0.1        0.262         0.312           0.269         0.177         0.065
       0.2        0.610         0.861           0.794         0.511         0.195
       0.3        0.991         1.000           0.993         0.854         0.508
2000   0          0.054         0.044           0.049         0.056         0.039
       0.1        0.488         0.582           0.516         0.288         0.076
       0.2        0.980         0.996           0.985         0.840         0.313
       0.3        1.000         1.000           1.000         0.996         0.657
TABLE 3: RESULTS OF MONTE CARLO EXPERIMENTS WITH DESIGN OF BREUNIG (2018) (a)

n      $\zeta$   $\vartheta$   Breunig   $\tau_{nD}$   $\tau_{nI}$   $\tau_{nB}$
500    0.4       0             0.063     0.070         0.065         0.066
                 0.3           0.172     0.299         0.460         0.300
                 0.35          0.231     0.394         0.584         0.379
                 0.4           0.319     0.540         0.753         0.496
                 0.45          0.429     0.666         0.841         0.597
       0.7       0             0.055     0.066         0.051         0.075
                 0.3           0.273     0.806         0.826         0.572
                 0.35          0.393     0.928         0.951         0.692
                 0.4           0.571     0.977         0.980         0.813
                 0.45          0.746     0.991         0.997         0.894
1000   0.4       0             0.055     0.053         0.052         0.043
                 0.3           0.35      0.631         0.770         0.536
                 0.35          0.501     0.798         0.899         0.675
                 0.4           0.667     0.912         0.962         0.794
                 0.45          0.824     0.967         0.990         0.900
       0.7       0             0.049     0.050         0.043         0.065
                 0.3           0.664     0.986         0.994         0.881
                 0.35          0.859     0.998         1.000         0.953
                 0.4           0.97      1.000         1.000         0.992
                 0.45          0.997     1.000         1.000         0.998
(a) Results in the column labeled "Breunig" are from Breunig (2018, Table 3), using the knot choices in that table that give the highest powers.
TABLE 4: RESULTS OF TESTING EXOGENEITY OF PRICE IN A QUANTILE DEMAND MODEL (a)

Demographic Group                      Income (Thousands of US Dollars)   p-Value
1 adult, no children at home           0-45                               0.012
                                       45-65                              0.316
2 adults, no children at home          0-45                               0.152
                                       45-65                              0.001
                                       65-100                             0.118
2 adults, youngest child < 15 years    0-45                               0.217
                                       45-65                              0.009
                                       65-100                             0.020
Retired, no children at home           0-45                               0.007
                                       45-65                              0.022
                                       65-100                             0.429
(a) Results are not reported for 1 adult, no children at home, and income 65,000-100,000 US dollars because the number of observations in this cell is very small.
REFERENCES
Amemiya, T. (1982). Two stage least absolute deviations estimators. Econometrica, 50, 689-711.
Bierens, H.J. (1990). A consistent conditional moment test of functional form. Econometrica, 58, 1443-1458.
Blundell, R. and J.L. Horowitz (2007). A nonparametric test of exogeneity. Review of Economic Studies, 74, 1034-1058.
Blundell, R., J.L. Horowitz, and M. Parey (2012). Measuring the price responsiveness of gasoline demand: Economic shape restrictions and nonparametric demand estimation. Quantitative Economics, 3, 29-51.
Blundell, R., J.L. Horowitz, and M. Parey (2017). Nonparametric estimation of a heterogeneous demand function under the Slutsky inequality restriction. Review of Economics and Statistics, 99, 291-304.
Blundell, R. and J.L. Powell (2007). Censored regression quantiles with endogenous regressors. Journal of Econometrics, 141, 65-83.
Breunig, C. (2015). Goodness-of-fit tests based on series estimators in nonparametric instrumental regression. Journal of Econometrics, 184, 328-346.
Breunig, C. (2018). Specification testing in nonparametric instrumental quantile regression. Discussion paper SFB 649 2016-032, Humboldt Universität zu Berlin.
Chen, L. and S. Portnoy (1996). Two-stage regression quantiles and two-stage trimmed least squares estimators for structural equation models. Communications in Statistics, Theory and Methods, 25, 1005-1032.
Chen, X., V. Chernozhukov, S. Lee, and W.K. Newey (2014). Local identification of nonparametric and semiparametric models. Econometrica, 82, 785-809.
Chen, X. and D. Pouzo (2009). Efficient estimation of semiparametric conditional moment models with possibly nonsmooth residuals. Journal of Econometrics, 152, 46-60.
Chen, X. and D. Pouzo (2012). Estimation of nonparametric conditional moment models with possibly nonsmooth generalized residuals. Econometrica, 80, 277-321.
Chernozhukov, V. and C. Hansen (2004). The effects of 401(k) participation on the wealth distribution: an instrumental quantile regression analysis. Review of Economics and Statistics, 86, 735-751.
Chernozhukov, V. and C. Hansen (2005). An IV model of quantile treatment effects. Econometrica, 73, 245-261.
Chernozhukov, V. and C. Hansen (2006). Instrumental quantile regression inference for structural and treatment effect models. Journal of Econometrics, 132, 491-525.
Chernozhukov, V., G.W. Imbens, and W.K. Newey (2007). Instrumental variable identification and estimation of nonseparable models via quantile conditions. Journal of Econometrics, 139, 4-14.
Chesher, A. (2003). Identification in nonseparable models. Econometrica, 71, 1405-1441.
Chesher, A. (2005). Nonparametric identification under discrete variation. Econometrica, 73, 1525-1550.
Chesher, A. (2007). Instrumental values. Journal of Econometrics, 139, 15-34.
Guerre, E. and P. Lavergne (2002). Optimal Minimax Rates for Nonparametric Specification Testing in Regression Models. Econometric Theory, 18, 1139-1171.
Guerre, E. and C. Sabbah (2012). Uniform bias study and Bahadur representation for local polynomial estimators of the conditional quantile function. Econometric Theory, 28, 87-129.
Hall, P. and J.L. Horowitz (2005). Nonparametric methods for inference in the presence of instrumental variables. Annals of Statistics, 33, 2904-2929.
Hausman, J.A. and W.K. Newey (1995). Nonparametric estimation of exact consumer surplus and deadweight loss. Econometrica, 63, 1445-1476.
Horowitz, J.L. (2006). Testing a parametric model against a nonparametric alternative with identification through instrumental variables. Econometrica, 74, 521-538.
Horowitz, J.L. and S. Lee (2007). Nonparametric instrumental variables estimation of a quantile regression model. Econometrica, 75, 1191-1208.
Horowitz, J.L. and S. Lee (2009). Testing a parametric quantile-regression model with an endogenous explanatory variable against a nonparametric alternative. Journal of Econometrics, 152, 141-152.
Horowitz, J.L. and V.G. Spokoiny (2001). An Adaptive, Rate-Optimal Test of a Parametric Mean Regression Model against a Nonparametric Alternative. Econometrica, 69, 599-631.
Horowitz, J.L. and V.G. Spokoiny (2002). An Adaptive, Rate-Optimal Test of Linearity for Median Regression Models. Journal of the American Statistical Association, 97, 822-835.
Janssen, A. (2000). Global power functions of goodness of fit tests. Annals of Statistics, 28, 239-253.
Januszewski, S.I. (2002). The effect of air traffic delays on airline prices. Working paper, Department of Economics, University of California at San Diego, La Jolla, CA.
Koenker, R. (2005). Quantile Regression. Cambridge: Cambridge University Press.
Kress, R. (1999). Linear Integral Equations, 2nd ed. New York: Springer.
Lee, S. (2007). Endogeneity in quantile regression models: a control function approach. Journal of Econometrics, 141, 1131-1158.
Lehmann, E.L. and J.P. Romano (2005). Testing Statistical Hypotheses. New York: Springer.
Ma, L. and R. Koenker (2006). Quantile regression methods for recursive structural equation models. Journal of Econometrics, 134, 471-506.
O'Sullivan, F. (1986). A Statistical Perspective on Ill-Posed Problems. Statistical Science, 1, 502-527.
Pollard, D. (1984). Convergence of Stochastic Processes. New York: Springer-Verlag.
Powell, J.L. (1983). The asymptotic normality of two-stage least absolute deviations estimators. Econometrica, 51, 1569-1575.
Sakata, S. (2007). Instrumental variable estimation based on conditional median restriction. Journal of Econometrics, 141, 350-382.
Schmalensee, R. and T.M. Stoker (1999). Household gasoline demand in the United States. Econometrica, 67, 645-662.
Silverman, B.W. (1986). Density Estimation for Statistics and Data Analysis. London: Chapman & Hall.
Song, K. (2010). Testing semiparametric conditional moment restrictions using conditional martingale transforms. Journal of Econometrics, 154, 74-84.
Stute, W. and L. Zhu (2005). Nonparametric checks for single-index models. Annals of Statistics, 33, 1048-1083.
Yu, K. and M.C. Jones (1998). Local linear quantile regression. Journal of the American Statistical Association, 93, 228-237.
APPENDIX: REGULARITY CONDITIONS, COMPUTATION OF THE CRITICAL VALUE,
PROOFS OF THEOREMS, AND EXTENSION TO AN ESTIMATED WEIGHT FUNCTION
A.1 Regularity Conditions
This section states the assumptions that are used to obtain the asymptotic properties of $\tau_n$. The following notation is used. For any real $a > 0$, define $[a]$ as the largest integer less than or equal to $a$. Define $U = Y - g(X,Z)$ and $V = Y - G(X,Z)$. Let $\|\cdot\|_E$ denote the Euclidean norm and $\|\cdot\|_2$ denote the $L_2$ norm. For any vector $\mathbf{x} = (x_1,\ldots,x_d)$, function $f(\mathbf{x})$, and vector of non-negative integers $\mathbf{k} = (k_1,\ldots,k_d)$, define $|\mathbf{k}| = k_1 + \cdots + k_d$ and
$$D^{\mathbf{k}} f(\mathbf{x}) = \frac{\partial^{|\mathbf{k}|} f(\mathbf{x})}{\partial x_1^{k_1}\cdots\partial x_d^{k_d}}.$$
For a set $\mathcal{X} \subset \mathbb{R}^d$ and positive constants $a, M < \infty$, define $\mathcal{C}^a_M(\mathcal{X})$ as the class of continuous functions $f: \mathcal{X} \to \mathbb{R}$ such that $\|f\|_a \le M$, where
$$\|f\|_a = \max_{|\mathbf{k}| \le [a]} \sup_{\mathbf{x}\in\mathcal{X}} |D^{\mathbf{k}} f(\mathbf{x})| + \max_{|\mathbf{k}| = [a]} \sup_{\mathbf{x},\mathbf{x}'\in\mathcal{X}} \frac{|D^{\mathbf{k}} f(\mathbf{x}) - D^{\mathbf{k}} f(\mathbf{x}')|}{\|\mathbf{x} - \mathbf{x}'\|_E^{a-[a]}},$$
and derivatives on the boundary of $\mathcal{X}$ are one sided. Let $f_{V|XZ}(v \mid x,z)$ denote the probability density function of $V$ conditional on $(X,Z) = (x,z)$, and let $f_{XZ}$ denote the probability density function of $(X,Z)$ whenever these density functions exist.
We make the following assumptions.
Assumption 1: (i) The support of $(X,Z,W)$ is $[-a,1+a]^p \times [0,1]^{p+r}$ for some $a > 0$, where $\dim(X) = \dim(W) = p$ and $\dim(Z) = r$. (ii) $(Y,X,Z,W)$ has a probability density function $f_{YXZW}$ with respect to Lebesgue measure. (iii) There is a finite constant $C_f$ such that $|f_{YXZW}(y,x,z,w)| \le C_f$ for all $(y,x,z,w)$. Moreover, $\partial f_{YXZW}(y,x,z,w)/\partial y$ exists and is continuous and bounded for all $(y,x,z,w)$. (iv) The data $\{Y_i, X_i, Z_i, W_i: i = 1,\ldots,n\}$ are an independent random sample of $(Y,X,Z,W)$.
Assumption 2: (i) $P(U \le 0 \mid Z = z, W = w) = q$ for almost every $(z,w) \in [0,1]^{p+r}$. (ii) $P[Y \le G(x,z) \mid X = x, Z = z] = q$ for almost every $(x,z) \in [0,1]^{p+r}$. (iii) There is a finite constant $C_g$ such that $|g(x,z)| \le C_g$ for all $(x,z) \in [0,1]^{p+r}$. (iv) Equation (2.4) has a solution $g(x,z)$ that is unique except, possibly, for $(x,z)$ in a set of Lebesgue measure zero.
Assumption 3: (i) The probability density function $f_{V|XZ}(v \mid X = x, Z = z)$ exists for all $v$ and $(x,z) \in [0,1]^{p+r}$. Moreover, for all $v$ in a neighborhood of zero and $(x,z) \in [0,1]^{p+r}$, $f_{V|XZ}(v \mid x,z) \ge \delta$ for some $\delta > 0$, and $\partial f_{V|XZ}(v \mid x,z)/\partial v$ exists and is continuous. (ii) $f_{XZ}(x,z) \ge C_{XZ}$ for all $(x,z) \in [0,1]^{p+r}$ and some constant $C_{XZ} > 0$. (iii) $G \in \mathcal{C}^s_{C_g}([0,1]^{p+r})$ with $s > 3(p+r)/2$ and $C_g$ as in assumption 2. (iv) $f_{YXZW} \in \mathcal{C}^s_{C_g}(\mathrm{supp}(Y) \times [0,1]^{2p+r})$. (v) There are a neighborhood $\mathcal{N}_v$ of $v = 0$ and a constant $C_f$ such that $f_{V|XZ} \in \mathcal{C}^s_{C_f}(\mathcal{N}_v \times [0,1]^{p+r})$ and $f_{XZ}(x,z) \in \mathcal{C}^s_{C_f}([0,1]^{p+r})$.
Assumption 4: (i) The kernel $K$ is bounded, has support $[-1,1]$, and satisfies (2.11) for $s$ as in assumption 3. (ii) There is a constant $C_K < \infty$ such that $|K(u) - K(u')| \le C_K|u - u'|$ for all $u$, $u'$. (iii) The bandwidth $h$ satisfies $h = C_h n^{-b}$, where $C_h > 0$ is a finite constant and $1/(2s) < b < 1/[3(p+r)]$.
Assumption 5: (i) The function $\ell$ defined in Section 2.2 is the kernel of a nonsingular integral operator. (ii) The derivatives of $\ell$ with respect to each of its arguments are bounded uniformly over $[0,1]^{2(p+r)}$. In particular, there is a constant $C_\ell < \infty$ such that
$$\sup_{(z,w,\zeta,\eta)\in[0,1]^{2(p+r)}} |\ell(z,w;\zeta,\eta)| \le C_\ell,$$
$$\sup_{(z,w,\zeta,\eta)\in[0,1]^{2(p+r)}} |\partial\ell(z,w;\zeta,\eta)/\partial\zeta| \le C_\ell,$$
and
$$\sup_{(z,w,\zeta,\eta)\in[0,1]^{2(p+r)}} |\partial\ell(z,w;\zeta,\eta)/\partial\eta| \le C_\ell.$$
Assumptions 1 and 2 specify the model and properties of the random variables under consideration. The constant $a$ can be arbitrarily close to 0. Assumption 2(iv) requires the structural function $g$ to be identified. Assumption 3 establishes smoothness conditions. Because of the curse of dimensionality, the smoothness of $G$, $f_{V|XZ}$, and $f_{XZ}$ must increase as $p + r$ increases. Assumption 4 establishes properties of the kernel function and requires the estimator of $G$ to be undersmoothed. Undersmoothing prevents the asymptotic bias of $\hat G$ from dominating the asymptotic distribution of $\tau_n$. The kernel $K_h$ must be a higher-order kernel if $p + r \ge 2$. Assumption 5 specifies properties of the function $\ell$, which is chosen by the analyst. Assumption 5 does not restrict the distributions of $(Y,X,Z,W)$ in (2.4).
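Assumption 3(iii) and assumption 4(iii) fit together: the admissible range for the bandwidth exponent $b$ is nonempty exactly when the smoothness condition on $G$ holds. A short check of this (our own arithmetic, not stated in the original), with the simplest case $p = 1$, $r = 0$, $s = 2$ as an example:

```latex
\frac{1}{2s} < \frac{1}{3(p+r)}
\iff 3(p+r) < 2s
\iff s > \frac{3(p+r)}{2},
\qquad \text{e.g. } p = 1,\ r = 0,\ s = 2:\quad \tfrac{1}{4} < b < \tfrac{1}{3}.
```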
A.2 Obtaining the Critical Value
The statistic $\tau_n$ is not asymptotically pivotal, so its asymptotic distribution cannot be tabulated. This section presents a method for obtaining an approximate asymptotic critical value. The method is based on replacing the asymptotic distribution of $\tau_n$ with an approximate distribution. The difference between the true and approximate distributions can be made arbitrarily small under both the null hypothesis and alternatives. Moreover, the quantiles of the approximate distribution can be estimated consistently as $n \to \infty$. The approximate $1-\alpha$ critical value of the $\tau_n$ test is a consistent estimator of the $1-\alpha$ quantile of the approximate distribution.
We now describe the approximation to the asymptotic distribution of $\tau_n$. Under $H_0$, $\tau_n$ is asymptotically distributed as
$$\tau \equiv \sum_{j=1}^{\infty} \omega_j \chi^2_{1j}.$$
Given any $\varepsilon > 0$, there is an integer $K_\varepsilon < \infty$ such that
$$0 < P\Big(\sum_{j=1}^{K_\varepsilon} \omega_j \chi^2_{1j} \le t\Big) - P(\tau \le t) < \varepsilon$$
uniformly over $t$. Define
$$\tau_\varepsilon = \sum_{j=1}^{K_\varepsilon} \omega_j \chi^2_{1j}.$$
Let $z_{\varepsilon\alpha}$ denote the $1-\alpha$ quantile of the distribution of $\tau_\varepsilon$. Then $0 < P(\tau > z_{\varepsilon\alpha}) - \alpha < \varepsilon$. Thus, using $z_{\varepsilon\alpha}$ to approximate the asymptotic $1-\alpha$ critical value of $\tau_n$ creates an arbitrarily small error in the probability that a correct null hypothesis is rejected. Similarly, use of the approximation creates an arbitrarily small change in the power of the $\tau_n$ test when the null hypothesis is false. The approximate $1-\alpha$ critical value for the $\tau_n$ test is a consistent estimator of the $1-\alpha$ quantile of the distribution of $\tau_\varepsilon$. Specifically, let $\hat\omega_j$ ($j = 1, 2, \ldots, K_\varepsilon$) be a consistent estimator of $\omega_j$ under $H_0$. Then the approximate critical value of $\tau_n$, $\hat z_{\varepsilon\alpha}$, is the $1-\alpha$ quantile of the distribution of
$$\hat\tau_{n\varepsilon} = \sum_{j=1}^{K_\varepsilon} \hat\omega_j \chi^2_{1j}.$$
This quantile can be estimated with arbitrary accuracy by simulation.
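The simulation step can be sketched as follows, assuming the $\hat\omega_j$ have already been computed and treating the $\chi^2_{1j}$ as independent chi-square(1) variables. `approx_critical_value` is a hypothetical helper, not code from the paper, and the defaults for `n_draws` and `chunk` are our choices.

```python
import numpy as np


def approx_critical_value(omega_hat, alpha=0.05, n_draws=200_000, seed=None, chunk=50_000):
    """Simulated 1 - alpha quantile of sum_j omega_hat_j * chi2_{1j}, j = 1, ..., K_eps."""
    rng = np.random.default_rng(seed)
    omega = np.asarray(omega_hat, dtype=float)
    draws = np.empty(n_draws)
    for start in range(0, n_draws, chunk):        # simulate in chunks to bound memory use
        stop = min(start + chunk, n_draws)
        z = rng.standard_normal((stop - start, omega.size))
        draws[start:stop] = (z**2) @ omega        # squared N(0,1) draws are chi-square(1)
    return float(np.quantile(draws, 1.0 - alpha))
```

With the eigenvalues sorted in decreasing order, `approx_critical_value(omega_hat[:15])` would correspond to the choice $K_\varepsilon = 15$ used in Section 4; increasing `n_draws` reduces the simulation error in the quantile.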
In applications, $K_\varepsilon$ can be chosen informally by sorting the $\hat\omega_j$'s in decreasing order and plotting them as a function of $j$. They typically plot as random noise near $\hat\omega_j = 0$ when $j$ is sufficiently large. One can choose $K_\varepsilon$ to be a value of $j$ that is near the lower end of the "random noise" range. The rejection probability of the $\tau_n$ test is not highly sensitive to $K_\varepsilon$, so it is not necessary to attempt precision in making the choice.
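The remainder of this section explains how to obtain the estimated eigenvalues $\hat\omega_j$. Define
$$\lambda(X,Z;\zeta,\eta) = \int_{[0,1]^p} f_{YXZW}[G(X,Z),X,Z,w]\,\ell(Z,w;\zeta,\eta)\,dw.$$
Because $G = g$ under $H_0$, the arguments in the proof of Theorem 1 (Section A.3) give $\hat S_n(\zeta,\eta) = B_n(\zeta,\eta) + o_p(1)$ uniformly over $(\zeta,\eta)$, where
$$B_n(\zeta,\eta) = n^{-1/2}\sum_{i=1}^n \{I[Y_i \le G(X_i,Z_i)] - q\}\, I[(X_i,Z_i) \in [0,1]^{p+r}] \left[\ell(Z_i,W_i;\zeta,\eta) - \frac{\lambda(X_i,Z_i;\zeta,\eta)}{f_{YXZ}[G(X_i,Z_i),X_i,Z_i]}\right].$$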
An estimator of $R(\zeta_1,\eta_1;\zeta_2,\eta_2)$ that is consistent under $H_0$ can be obtained by replacing unknown quantities with estimators on the right-hand side of
$$\begin{aligned} R(\zeta_1,\eta_1;\zeta_2,\eta_2) = E\Big(&\{I[Y \le G(X,Z)] - q\}^2\, I[(X,Z) \in [0,1]^{p+r}] \left[\ell(Z,W;\zeta_1,\eta_1) - \frac{\lambda(X,Z;\zeta_1,\eta_1)}{f_{YXZ}[G(X,Z),X,Z]}\right] \\ &\times \left[\ell(Z,W;\zeta_2,\eta_2) - \frac{\lambda(X,Z;\zeta_2,\eta_2)}{f_{YXZ}[G(X,Z),X,Z]}\right]\Big). \end{aligned}$$
To do this, let $\hat f_{YXZW}$ and $\hat f_{YXZ}$, respectively, be kernel estimators of $f_{YXZW}$ and $f_{YXZ}$ with bandwidths that converge to 0 at the asymptotically optimal rates. As is well known, $\hat f_{YXZW}$ and $\hat f_{YXZ}$ are consistent uniformly over the ranges of their arguments. Let $\hat G^{(-i)}$ be the leave-observation-$i$-out estimator of $G$. Define
$$\hat\lambda_i(X_i,Z_i;\zeta,\eta) = \int_{[0,1]^p} \hat f_{YXZW}[\hat G^{(-i)}(X_i,Z_i),X_i,Z_i,w]\,\ell(Z_i,w;\zeta,\eta)\,dw; \qquad i = 1,\ldots,n,$$
and
$$\begin{aligned} \hat R(\zeta_1,\eta_1;\zeta_2,\eta_2) = n^{-1}\sum_{i=1}^n &\{I[Y_i \le \hat G^{(-i)}(X_i,Z_i)] - q\}^2\, I[(X_i,Z_i) \in [0,1]^{p+r}] \left[\ell(Z_i,W_i;\zeta_1,\eta_1) - \frac{\hat\lambda_i(X_i,Z_i;\zeta_1,\eta_1)}{\hat f_{YXZ}[\hat G^{(-i)}(X_i,Z_i),X_i,Z_i]}\right] \\ &\times \left[\ell(Z_i,W_i;\zeta_2,\eta_2) - \frac{\hat\lambda_i(X_i,Z_i;\zeta_2,\eta_2)}{\hat f_{YXZ}[\hat G^{(-i)}(X_i,Z_i),X_i,Z_i]}\right]. \end{aligned}$$
Let $\hat\Omega$ be the operator defined by
$$(\hat\Omega\phi)(\zeta_2,\eta_2) = \int_{[0,1]^{p+r}} \hat R(\zeta_1,\eta_1;\zeta_2,\eta_2)\,\phi(\zeta_1,\eta_1)\,d\zeta_1\,d\eta_1.$$
Denote the eigenvalues of $\hat\Omega$ by $\{\hat\omega_j: j = 1,2,\ldots\}$ and order them so that $\hat\omega_1 \ge \hat\omega_2 \ge \cdots \ge 0$. The relation between the $\hat\omega_j$'s and $\omega_j$'s is given by the following proposition.
Proposition 1: Let assumptions 1-5 hold. Then $\hat\omega_j - \omega_j = o_p(1)$ as $n \to \infty$ for each $j = 1, 2, \ldots$.
It follows from Proposition 1 and Theorem 3 that under (3.3)-(3.4),
$$\limsup_{n\to\infty} \big|P(\tau_n > \hat z_{\varepsilon\alpha}) - P(\tau_n > z_\alpha)\big| \le \varepsilon$$
for any $\varepsilon > 0$, where $\hat z_{\varepsilon\alpha}$ denotes the estimated approximate $\alpha$-level critical value.
To obtain an accurate numerical approximation to the $\hat\omega_j$'s, let $\hat F(\zeta,\eta)$ denote the $n \times 1$ vector whose $i$'th component is
$$\ell(Z_i,W_i;\zeta,\eta) - \frac{\hat\lambda_i(X_i,Z_i;\zeta,\eta)}{\hat f_{YXZ}[\hat G^{(-i)}(X_i,Z_i),X_i,Z_i]},$$
and let $\Upsilon$ denote the $n \times n$ diagonal matrix whose $(i,i)$ element is $\{I[Y_i \le \hat G^{(-i)}(X_i,Z_i)] - q\}^2\, I[(X_i,Z_i) \in [0,1]^{p+r}]$. Then
$$\hat R(\zeta_1,\eta_1;\zeta_2,\eta_2) = n^{-1}\hat F(\zeta_1,\eta_1)'\,\Upsilon\,\hat F(\zeta_2,\eta_2).$$
The computation of the eigenvalues can now be reduced to finding the eigenvalues of a finite-dimensional matrix. To this end, let $\{\phi_j: j = 1,2,\ldots\}$ be a complete, orthonormal basis for $L_2[0,1]^{p+r}$. Then
$$\ell(Z,W;\zeta,\eta) = \sum_{j=1}^{\infty}\sum_{k=1}^{\infty} d_{jk}\,\phi_j(Z,W)\,\phi_k(\zeta,\eta),$$
where
$$d_{jk} = \int_{[0,1]^{2(p+r)}} \ell(z,w;\zeta,\eta)\,\phi_j(z,w)\,\phi_k(\zeta,\eta)\,dw\,dz\,d\zeta\,d\eta,$$
and
$$\hat\lambda(X,Z;\zeta,\eta) = \sum_{j=1}^{\infty}\sum_{k=1}^{\infty} \hat a_{jk}\,\phi_j(X,Z)\,\phi_k(\zeta,\eta),$$
where
$$\hat a_{jk} = \int_{[0,1]^{2(p+r)}} \hat\lambda(x,z;\zeta,\eta)\,\phi_j(x,z)\,\phi_k(\zeta,\eta)\,dx\,dz\,d\zeta\,d\eta.$$
Approximate $\ell(Z,W;\zeta,\eta)$ and $\hat\lambda(X,Z;\zeta,\eta)$ by the finite sums
$$\Pi_\ell(Z,W;\zeta,\eta) = \sum_{j=1}^{L}\sum_{k=1}^{L} d_{jk}\,\phi_j(Z,W)\,\phi_k(\zeta,\eta)$$
and
$$\Pi_{\hat\lambda}(X,Z;\zeta,\eta) = \sum_{j=1}^{L}\sum_{k=1}^{L} \hat a_{jk}\,\phi_j(X,Z)\,\phi_k(\zeta,\eta)$$
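for some integer $L < \infty$. Since $\ell$ and $\hat\lambda$ are known functions, $L$ can be chosen to approximate them with any desired accuracy. Let $\Phi$ be the $n \times L$ matrix whose $(i,j)$ component is
$$\Phi_{ij} = n^{-1/2}\sum_{k=1}^{L}\left\{d_{kj}\,\phi_k(Z_i,W_i) - \frac{\hat a_{kj}\,\phi_k(X_i,Z_i)}{\hat f_{YXZ}[\hat G^{(-i)}(X_i,Z_i),X_i,Z_i]}\right\},$$
so that column $j$ of $\Phi$ collects the coefficients of $\phi_j(\zeta,\eta)$ in the expansions above. The eigenvalues of $\hat\Omega$ are approximated by those of the $L \times L$ matrix $\Phi'\Upsilon\Phi$.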
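The final step can be sketched in NumPy, assuming $\Phi$ has already been assembled from the coefficients $d_{jk}$ and $\hat a_{jk}$. Both helpers are hypothetical; `upsilon_diagonal` covers only the case $p = 1$, $r = 0$ and takes the leave-one-out values $\hat G^{(-i)}(X_i)$ as given.

```python
import numpy as np


def upsilon_diagonal(y, G_loo_at_x, x, q):
    """Diagonal of Upsilon for p = 1, r = 0: [I(Y_i <= hat G^{(-i)}(X_i)) - q]^2 I(X_i in [0,1])."""
    y, G_loo_at_x, x = (np.asarray(a, dtype=float) for a in (y, G_loo_at_x, x))
    resid = (y <= G_loo_at_x).astype(float) - q
    inside = ((x >= 0.0) & (x <= 1.0)).astype(float)
    return resid**2 * inside


def approx_eigenvalues(Phi, upsilon_diag):
    """Eigenvalues of the L x L matrix Phi' Upsilon Phi, in decreasing order."""
    Phi = np.asarray(Phi, dtype=float)
    ups = np.asarray(upsilon_diag, dtype=float)
    M = Phi.T @ (ups[:, None] * Phi)   # Phi' Upsilon Phi without forming the n x n diagonal matrix
    M = 0.5 * (M + M.T)                # remove rounding asymmetry before the symmetric solver
    return np.linalg.eigvalsh(M)[::-1]
```

Because $\Upsilon$ has nonnegative diagonal entries, $\Phi'\Upsilon\Phi$ is positive semidefinite, so the symmetric solver `eigvalsh` is appropriate and the returned values can be passed directly to a simulation of the weighted chi-square critical value.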
A.3 Proofs of Theorems 1-5 and Proposition 1
Assumptions 1-5 hold throughout this section. To minimize the complexity of the proofs without losing any important elements, assume that $p = 1$ and $r = 0$. The proofs with $p > 1$ and $r > 0$ are identical after replacing quantities for $p = 1$ and $r = 0$ with analogous quantities for the more general case. Let $f_{YXW}$ and $f_{YX}$, respectively, denote the probability density functions of $(Y,X,W)$ and $(Y,X)$. With $p = 1$ and $r = 0$, (2.10) becomes
$$\hat S_n(w) = n^{-1/2}\sum_{i=1}^n [I(Y_i \le \hat G(X_i)) - q]\,\ell(W_i,w)\,I(X_i \in [0,1]), \tag{A.1}$$
$\lambda(X,Z;\zeta,\eta)$ becomes
$$\lambda(X;\eta) = \int_0^1 f_{YXW}[G(X),X,w]\,\ell(w,\eta)\,dw,$$
and the test statistic is
$$\tau_n = \int_0^1 \hat S_n^2(w)\,dw.$$
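Formula (A.1) and the statistic $\tau_n$ can be sketched for $p = 1$, $r = 0$. This is a minimal illustration under stated assumptions, not the authors' implementation: the caller supplies the values $\hat G(X_i)$ (computing them, for example by local linear quantile regression, is not shown), the integral over $[0,1]$ is approximated by the trapezoid rule on a grid of `m` points chosen by us, and `ell_indicator` and `ell_exp` are the fixed weight functions $I(w \le \eta)$ and $\exp(w\eta)$ described in Section 4.

```python
import numpy as np


def ell_indicator(w, eta):
    """Fixed weight function I(w <= eta)."""
    return (w <= eta).astype(float)


def ell_exp(w, eta):
    """Fixed weight function exp(w * eta) (Bierens 1990)."""
    return np.exp(w * eta)


def S_hat(eta, y, G_at_x, x, w, q, ell):
    """hat S_n(eta) = n^{-1/2} sum_i [I(Y_i <= hat G(X_i)) - q] ell(W_i, eta) I(X_i in [0,1])."""
    resid = (y <= G_at_x).astype(float) - q           # I(Y_i <= hat G(X_i)) - q
    inside = ((x >= 0.0) & (x <= 1.0)).astype(float)  # trimming indicator I(X_i in [0,1])
    weights = ell(w[:, None], eta[None, :])           # n x m matrix of ell(W_i, eta_k)
    return (resid * inside) @ weights / np.sqrt(len(y))


def tau_n(y, G_at_x, x, w, q, ell, m=501):
    """tau_n = int_0^1 hat S_n(eta)^2 d eta, approximated by the trapezoid rule on m points."""
    eta = np.linspace(0.0, 1.0, m)
    args = (np.asarray(a, dtype=float) for a in (y, G_at_x, x, w))
    s2 = S_hat(eta, *args, q, ell) ** 2
    return float(np.sum(0.5 * (s2[1:] + s2[:-1])) * (eta[1] - eta[0]))
```

The statistic is then compared with a simulated approximate critical value obtained from the estimated eigenvalues of Section A.2.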
Define
$$S_{n1}(w) = n^{-1/2}\sum_{i=1}^n [I(Y_i \le g(X_i)) - q]\,\ell(W_i,w)\,I(X_i \in [0,1]),$$
$$S_{n2}(w) = n^{-1/2}\sum_{i=1}^n [I(Y_i \le G(X_i)) - I(Y_i \le g(X_i))]\,\ell(W_i,w)\,I(X_i \in [0,1]),$$
and
$$S_{n3}(w) = n^{-1/2}\sum_{i=1}^n [I(Y_i \le \hat G(X_i)) - I(Y_i \le G(X_i))]\,\ell(W_i,w)\,I(X_i \in [0,1]).$$
Then
$$\hat S_n(w) = \sum_{j=1}^{3} S_{nj}(w).$$
Lemma 1: As $n \to \infty$,
$$S_{n3}(w) = -n^{-1/2}\sum_{i=1}^n [I(Y_i \le G(X_i)) - q]\,I(X_i \in [0,1])\,\frac{\lambda(X_i;w)}{f_{YX}[G(X_i),X_i]} + o_p(1)$$
uniformly over $w \in [0,1]$.
Proof: Use linear functional notation. For any function $f$,
$$Pf = \int f\,dP; \qquad P_n f = \int f\,dP_n,$$
where $P$ and $P_n$, respectively, are the distribution function and empirical distribution function of observable random variables whose definitions will be clear from the context in which the notation is used. Define $\varepsilon_i = Y_i - G(X_i)$. Then
$$\begin{aligned} S_{n3}(w) &= n^{-1/2}\sum_{i=1}^n [I(\varepsilon_i \le \hat G(X_i) - G(X_i)) - I(\varepsilon_i \le 0)]\,I(X_i \in [0,1])\,\ell(W_i,w) \\ &= n^{1/2}(P_n - P)[I(\varepsilon \le \hat G(X) - G(X)) - I(\varepsilon \le 0)]\,I(X \in [0,1])\,\ell(W,w) \\ &\quad + n^{1/2}P[I(\varepsilon \le \hat G(X) - G(X)) - I(\varepsilon \le 0)]\,I(X \in [0,1])\,\ell(W,w) \\ &\equiv S_{n31}(w) + S_{n32}(w). \end{aligned}$$
In the linear functional notation used here, the integrals represented by $P_n$ and $P$ are with respect to $(\varepsilon,X,W)$. $\hat G$ is treated as fixed and not a function of $(\varepsilon,X)$.
It follows from Proposition 2 of Guerre and Sabbah (2012) that
$$\sup_{x\in[0,1]} |\hat G(x) - G(x)| = O_p\left[\left(\frac{\log n}{nh}\right)^{1/2}\right].$$
For any function $s(x)$, define
$$\begin{aligned} S_{n31}(w,s) &= n^{1/2}(P_n - P)[I(\varepsilon \le s(X) - G(X)) - I(\varepsilon \le 0)]\,I(X \in [0,1])\,\ell(W,w) \\ &= n^{1/2}(P_n - P)\,I[|\varepsilon| \le |s(X) - G(X)|]\,I(X \in [0,1])\,\ell(W,w). \end{aligned}$$
Let $C_1 < \infty$ be a positive constant. Define the class of functions
$$\mathcal{S} = \left\{s: \sup_{x\in[0,1]} |s(x)| = C_1\left(\frac{\log n}{n}\right)^{1/2}\right\}.$$
The class of functions of $(\varepsilon,X,W)$ indexed by $w$,
$$\{I[|\varepsilon| \le |s(X) - G(X)|]\,I(X \in [0,1])\,\ell(W,w): w \in [0,1]\},$$
is Euclidean (Pakes and Pollard 1989, Lemma 2.13). It follows from Theorem 2.37 of Pollard (1984) that
$$\begin{aligned} \sup_{w\in[0,1],\, s\in\mathcal{S}} |S_{n31}(w,s)| &\le C_2(\log n)\sup_{s\in\mathcal{S}} P\,I[|\varepsilon| \le |s(X) - G(X)|] \\ &\le C_2(\log n)\,P\,I\left[|\varepsilon| \le C_1\left(\frac{\log n}{n}\right)^{1/2}\right] = o(1), \end{aligned}$$
where $C_2 < \infty$ is a constant, almost surely. Therefore,
$$\sup_{w\in[0,1]} |S_{n31}(w)| = o_p(1). \tag{A.2}$$
Consequently, the lemma follows if
$$S_{n32}(w) = -n^{-1/2}\sum_{i=1}^n [I(Y_i \le G(X_i)) - q]\,I(X_i \in [0,1])\,\frac{\lambda(X_i;w)}{f_{YX}[G(X_i),X_i]} + o_p(1) \tag{A.3}$$
uniformly over $w \in [0,1]$.
To prove (A.3), let $f_{\varepsilon XW}$ and $f_{\varepsilon X}$, respectively, denote the probability density functions of $(\varepsilon, X, W)$ and $(\varepsilon, X)$. Define
$$F_{\varepsilon XW}(u,x,w) = \int_{-\infty}^{u} f_{\varepsilon XW}(\tilde u,x,w)\,d\tilde u.$$
Then
$$S_{n32}(\nu) = n^{1/2}\int_{[0,1]^2}\{F_{\varepsilon XW}[\hat G(x) - G(x), x, w] - F_{\varepsilon XW}(0,x,w)\}\,\ell(w,\nu)\,dx\,dw, \tag{A.4}$$
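and, because $I(\varepsilon_i \le 0) = I(Y_i \le G(X_i))$ and $f_{\varepsilon X}(0,x) = f_{YX}[G(x),x]$, (A.3) is equivalent to
$$S_{n32}(w) = -n^{-1/2}\sum_{i=1}^{n}[I(\varepsilon_i \le 0) - q]\,I(X_i\in[0,1])\,\frac{\lambda(X_i,w)}{f_{\varepsilon X}(0,X_i)} + o_p(1).$$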
A Taylor series expansion yields
$$\begin{aligned}
&\int_{[0,1]^2}\{F_{\varepsilon XW}[\hat G(x) - G(x), x, w] - F_{\varepsilon XW}(0,x,w)\}\,\ell(w,\nu)\,dx\,dw\\
&\quad = \int_{[0,1]^2} f_{\varepsilon XW}(0,x,w)\,[\hat G(x) - G(x)]\,\ell(w,\nu)\,dx\,dw + O\left[\sup_{x\in[0,1]}|\hat G(x) - G(x)|^2\right].
\end{aligned}$$
Recall from Proposition 2 of Guerre and Sabbah (2012) that
$$\sup_{x\in[0,1]}|\hat G(x) - G(x)| = O_p\left[\left(\frac{\log n}{nh}\right)^{1/2}\right].$$
Therefore,
$$S_{n32}(\nu) = n^{1/2}\int_{[0,1]^2} f_{\varepsilon XW}(0,x,w)\,[\hat G(x) - G(x)]\,\ell(w,\nu)\,dx\,dw + o_p(1). \tag{A.5}$$
It follows from Corollary 1 of Kong, Linton, and Xia (2010) that
$$\hat G(x) - G(x) = \frac{1}{f_{\varepsilon X}(0,x)}\,\frac{1}{nh}\sum_{i=1}^{n}[q - I(\varepsilon_i \le 0)]\,K\left(\frac{X_i - x}{h}\right) + R_n(x),$$
where
$$\sup_{x\in[0,1]}|R_n(x)| = O_{a.s.}\left[h + \left(\frac{\log n}{nh}\right)^{3/4}\right]$$
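uniformly over $i = 1, \ldots, n$. Therefore, standard calculations for kernel estimators yield
$$\begin{aligned}
&\int_{[0,1]^2}\{F_{\varepsilon XW}[\hat G(x) - G(x), x, w] - F_{\varepsilon XW}(0,x,w)\}\,\ell(w,\nu)\,dx\,dw\\
&\quad = n^{-1}\sum_{i=1}^{n}[q - I(\varepsilon_i \le 0)]\,I(X_i\in[0,1])\,\frac{\lambda(X_i,\nu)}{f_{\varepsilon X}(0,X_i)} + O_{a.s.}\left[h + \left(\frac{\log n}{nh}\right)^{3/4}\right].
\end{aligned}\tag{A.6}$$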
The lemma follows by substituting (A.6) into (A.4). Q.E.D.
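The proof treats $\hat G$ as a kernel-type estimator of the conditional $q$-quantile $G(x)$ that admits the Bahadur representation of Kong, Linton, and Xia (2010). As a minimal illustration only, the sketch below computes a local-constant (kernel-weighted) quantile, which minimizes $\sum_i K((X_i - x)/h)\,\rho_q(Y_i - a)$ over $a$. This is an assumption made for exposition, not the local polynomial estimator behind the lemma; the Epanechnikov kernel and the function names are hypothetical choices.

```python
import random

def epanechnikov(u):
    """Epanechnikov kernel K(u) = 0.75 * (1 - u^2) on [-1, 1], zero elsewhere."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0

def kernel_quantile(x, xs, ys, q, h):
    """Local-constant estimate of the conditional q-quantile of Y given X = x.

    Returns a minimizer over a of sum_i K((X_i - x) / h) * rho_q(Y_i - a), where
    rho_q is the check function; this is a kernel-weighted q-quantile of the Y_i.
    """
    pairs = sorted((y, epanechnikov((xi - x) / h)) for xi, y in zip(xs, ys))
    total = sum(wt for _, wt in pairs)
    if total == 0.0:
        raise ValueError("no observations within bandwidth h of x")
    cum = 0.0
    for y, wt in pairs:
        cum += wt
        if cum >= q * total:  # first Y whose cumulative weight reaches q * total
            return y
    return pairs[-1][0]  # guards against floating-point shortfall when q = 1

# Usage: Y = X + N(0, 1) noise, so the conditional median at x = 0.5 is 0.5.
rng = random.Random(0)
xs = [rng.random() for _ in range(2000)]
ys = [xi + rng.gauss(0.0, 1.0) for xi in xs]
g_hat_half = kernel_quantile(0.5, xs, ys, q=0.5, h=0.1)
```

The bandwidth $h$ plays the same role as in the rates above: a smaller $h$ reduces smoothing bias but uses fewer observations near $x$.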
Proof of Theorem 1: Under $H_0$, $S_{n2}(w) = 0$ and $g = G$. Therefore, it follows from Lemma 1 that
$$\begin{aligned}
\hat S_n(\eta) &= S_{n1}(\eta) + S_{n3}(\eta)\\
&= B_n(\eta) + o_p(1)
\end{aligned}$$
uniformly over $\eta\in[0,1]$, where
$$B_n(\eta) = n^{-1/2}\sum_{i=1}^{n}[I(Y_i \le G(X_i)) - q]\,I(X_i\in[0,1])\left[\ell(W_i;\eta) - \frac{\lambda(X_i;\eta)}{f_{YX}[G(X_i),X_i]}\right].$$
Therefore,
$$\tau_n = \int_0^1 B_n^2(\eta)\,d\eta + o_p(1),$$
and $\tau_n$ and $\int_0^1 B_n^2(\eta)\,d\eta$ have the same asymptotic distribution. The result follows by writing $\int_0^1[B_n^2(\eta) - EB_n^2(\eta)]\,d\eta$ as a degenerate U statistic of order two. See, for example, Serfling (1980, pp. 193-194). Q.E.D.
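The U-statistic step rests on an algebraic identity. Writing $a_i(\eta)$ for the $i$-th summand of $n^{1/2}B_n(\eta)$ (notation introduced here), $\int_0^1 B_n^2 = n^{-1}\sum_i \int a_i^2 + n^{-1}\sum_{i\ne j}\int a_i a_j$, and the second term is $(n-1)$ times an order-two U statistic with kernel $H(i,j) = \int_0^1 a_i a_j$. The sketch below checks the identity on a midpoint grid; the simulated summands (a random sign times a tent function) are hypothetical stand-ins, chosen only so that each $a_i$ has mean zero.

```python
import random

def grid_values(n, m, seed=2):
    """Hypothetical summands a_i(eta) = e_i * c_i(eta) on a midpoint grid of m points."""
    rng = random.Random(seed)
    etas = [(k + 0.5) / m for k in range(m)]
    rows = []
    for _ in range(n):
        e = rng.choice([-1.0, 1.0])  # mean-zero stand-in for I(Y_i <= G(X_i)) - q
        shift = rng.random()         # stand-in for the dependence on (X_i, W_i)
        rows.append([e * (1.0 - abs(eta - shift)) for eta in etas])
    return rows

def integral_of_bn_squared(a):
    """Midpoint rule for int_0^1 B_n(eta)^2 d eta with B_n = n^{-1/2} sum_i a_i."""
    n, m = len(a), len(a[0])
    return sum(sum(row[k] for row in a) ** 2 / n for k in range(m)) / m

def diagonal_and_u_parts(a):
    """Split the same integral into n^{-1} sum_i H(i,i) and n^{-1} sum_{i != j} H(i,j)."""
    n, m = len(a), len(a[0])
    H = [[sum(a[i][k] * a[j][k] for k in range(m)) / m for j in range(n)] for i in range(n)]
    diag = sum(H[i][i] for i in range(n)) / n
    off = sum(H[i][j] for i in range(n) for j in range(n) if i != j) / n
    return diag, off

a = grid_values(n=40, m=50)
total = integral_of_bn_squared(a)
diag, off = diagonal_and_u_parts(a)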
Proof of Theorem 2: Let $\int_0^1 H^2(\eta)\,d\eta > 0$. It suffices to show that
$$\operatorname*{plim}_{n\to\infty} n^{-1}\tau_n > 0.$$
As $n\to\infty$,
$$n^{-1/2}S_{n1}(\eta) \xrightarrow{a.s.} 0$$
and
$$n^{-1/2}S_{n2}(\eta) \xrightarrow{a.s.} \int_0^1\!\!\int_0^1\{F_{YXW}[G(x),x,w] - F_{YXW}[g(x),x,w]\}\,\ell(w,\eta)\,dx\,dw = H(\eta)$$
by the strong law of large numbers (SLLN). In addition,
$$n^{-1/2}S_{n3}(\eta) \xrightarrow{p} 0$$
by Lemma 1 and the SLLN. Therefore,
$$n^{-1}\tau_n \xrightarrow{p} \int_0^1 H^2(\eta)\,d\eta > 0.$$
Q.E.D.
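Proof of Theorem 3: By Lemma 1,
$$\begin{aligned}
\hat S_n(\eta) &= S_{n1}(\eta) + S_{n2}(\eta) + S_{n3}(\eta)\\
&= B_n(\eta) + S_{n2}(\eta) + o_p(1).
\end{aligned}$$
Some algebra shows that $E[S_{n2}(\eta)] = \mu(\eta)$ and $\mathrm{Var}[S_{n2}(\eta)] = O(n^{-1/2})$. Therefore, $S_{n2}(\eta) \xrightarrow{p} \mu(\eta)$,
$$\hat S_n(\eta) = B_n(\eta) + \mu(\eta) + o_p(1),$$
and
$$\tau_n = \int_0^1[B_n(\eta) + \mu(\eta)]^2\,d\eta + o_p(1).$$
But
$$B_n(\eta) = \sum_{j=1}^\infty b_j\,\phi_j(\eta)$$
and
$$\mu(\eta) = \sum_{j=1}^\infty \mu_j\,\phi_j(\eta),$$
where the $\phi_j$'s are the eigenfunctions of the operator $\Omega$ defined in (3.1), the $\mu_j$'s are as defined in (3.5), and
$$b_j = \int_0^1 B_n(\eta)\,\phi_j(\eta)\,d\eta.$$
Because the $\phi_j$'s are orthonormal, Parseval's identity then gives
$$\tau_n \xrightarrow{d} \sum_{j=1}^\infty (b_j + \mu_j)^2 = \sum_{j=1}^\infty \omega_j\left(\frac{b_j}{\omega_j^{1/2}} + \frac{\mu_j}{\omega_j^{1/2}}\right)^2.$$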
The random variables $b_j + \mu_j$ are asymptotically distributed as independent $N(\mu_j, \omega_j)$ variates. Now proceed as in, for example, Serfling's (1980, pp. 195-199) derivation of the asymptotic distribution of a degenerate, order two, U statistic. Q.E.D.
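The limit in Theorem 3 is a weighted sum of independent noncentral $\chi^2_1$ variables, so critical values and local power can be approximated by simulation once the $\omega_j$ and $\mu_j$ are available. The sketch below uses hypothetical values $\omega_j = j^{-2}$ and $\mu_j = j^{-1}$, truncated at 50 terms; these are illustrative assumptions, not quantities from the paper, and in applications the $\omega_j$ would have to be estimated (see Proposition 1).

```python
import random

def draw_limit(omegas, mus, rng):
    """One draw of sum_j (b_j + mu_j)^2 with independent b_j ~ N(0, omega_j)."""
    return sum((rng.gauss(0.0, w ** 0.5) + mu) ** 2 for w, mu in zip(omegas, mus))

def simulate(omegas, mus, reps=10000, seed=1):
    """Monte Carlo draws from the (truncated) limiting distribution."""
    rng = random.Random(seed)
    return [draw_limit(omegas, mus, rng) for _ in range(reps)]

J = 50
omegas = [1.0 / j ** 2 for j in range(1, J + 1)]  # hypothetical eigenvalues omega_j
null_mus = [0.0] * J                              # H_0: every mu_j = 0
alt_mus = [1.0 / j for j in range(1, J + 1)]      # hypothetical local alternative

null_draws = sorted(simulate(omegas, null_mus))
z_05 = null_draws[int(0.95 * len(null_draws))]    # simulated 5% critical value
alt_draws = simulate(omegas, alt_mus, seed=2)
power = sum(t > z_05 for t in alt_draws) / len(alt_draws)
```

Under $H_0$ the mean of the limit is $\sum_j \omega_j$; a nonzero $\mu$ shifts mass to the right, which is what produces power against the local alternatives.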
Proof of Theorem 4: The proof of Theorem 4 is similar to that of Theorem 5 of Horowitz (2006).
Therefore, we present only the steps of the proof of Theorem 4 that are different from those in Horowitz
(2006). Define
$$S_{n2}^*(\eta) = E\{[I(Y \le G(X)) - I(Y \le g(X))]\,I(X\in[0,1])\,\ell(W,\eta)\},$$
$$D_n(\eta) = S_{n3}(\eta) + n^{1/2}S_{n2}^*(\eta),$$
and
$$\tilde S_n(\eta) = \hat S_n(\eta) - D_n(\eta).$$
Then
$$\tilde S_n(\eta) = S_{n1}(\eta) + [S_{n2}(\eta) - ES_{n2}(\eta)].$$
It follows from Lemma 2.13 of Pakes and Pollard (1989) and Theorem 7.21 of Pollard (1984) that $\tilde S_n(\eta)$ and $\|\tilde S_n\|_2$ are bounded in probability uniformly over $\eta\in[0,1]$. Note that $\tau_n = \|\hat S_n\|_2^2$. Use the inequality $a^2 \ge 0.5b^2 - (b-a)^2$ with $a = \hat S_n$ and $b = D_n$ to obtain
$$P(\tau_n > z_\alpha) \ge P(0.5\|D_n\|_2^2 - \|\tilde S_n\|_2^2 > z_\alpha).$$
Arguments like those in Horowitz (2006) now show that for each $\varepsilon > 0$ there is $M_\varepsilon < \infty$ such that for all $M > M_\varepsilon$,
$$P(\tau_n > z_\alpha) \ge P(0.5\|D_n\|_2^2 > z_\alpha + M). \tag{A.7}$$
By $a^2 \ge 0.5b^2 - (b-a)^2$ with $a = D_n$ and $b = n^{1/2}S_{n2}^*$,
$$\|D_n\|_2^2 \ge 0.5n\|S_{n2}^*\|_2^2 - \|S_{n3}\|_2^2 = 0.5n\|T\pi\|_2^2 - \|S_{n3}\|_2^2 + o_p(1).$$
But $\|S_{n3}\|_2^2 = O_p(1)$ by Lemma 1. Therefore,
$$\|D_n\|_2^2 \ge 0.5n\|T\pi\|_2^2 + O_p(1). \tag{A.8}$$
Substituting (A.8) into (A.7) yields
$$P(\tau_n > z_\alpha) \ge P(0.25n\|T\pi\|_2^2 + \xi_n > z_\alpha + M)$$
for some random variable $\xi_n = O_p(1)$. Because $\xi_n$ is bounded in probability and $n\|T\pi\|_2^2 \ge C$ by the definition of $\mathcal{C}_n$, for any $\delta > 0$ there is a $C_0 < \infty$ such that $P(\tau_n > z_\alpha) \ge 1 - \delta$ for all $C \ge C_0$. Q.E.D.
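The elementary inequality $a^2 \ge 0.5b^2 - (b-a)^2$, used twice in the proof above, follows from $\|b\|^2 = \|a + (b-a)\|^2 \le 2\|a\|^2 + 2\|b-a\|^2$ and therefore holds for $L_2[0,1]$ norms as well as for scalars. The sketch below checks it numerically for functions represented by their values on a midpoint grid (a discretization assumed here for illustration) and confirms that $a = b/2$ gives equality, so the constant 0.5 cannot be improved.

```python
import random

def sq_norm(v):
    """Midpoint-rule approximation of the squared L2[0,1] norm of grid values v."""
    return sum(x * x for x in v) / len(v)

def sides(a, b):
    """Return (||a||^2, 0.5 ||b||^2 - ||b - a||^2) for grid vectors a and b."""
    diff = [bi - ai for ai, bi in zip(a, b)]
    return sq_norm(a), 0.5 * sq_norm(b) - sq_norm(diff)

rng = random.Random(3)
trials = []
for _ in range(1000):
    a = [rng.gauss(0.0, 1.0) for _ in range(20)]
    b = [rng.gauss(0.0, 2.0) for _ in range(20)]
    trials.append(sides(a, b))

b = [rng.gauss(0.0, 1.0) for _ in range(20)]
equality_case = sides([x / 2.0 for x in b], b)  # a = b / 2 attains equality
```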
The following lemma is used in the proof of Theorem 5.
Lemma 2: Let assumptions 1-5 of Section A.1 hold. Let $\Delta(x,z)$ be the bounded function defined in (3.3)-(3.4). Fix the functions $\ell(z,w;\zeta,\eta)$ and $\ell_1(z,\zeta)$, and assume that these functions are bounded and that $\ell_1$ is bounded away from 0. Define
$$\mu(\zeta,\eta) = \int_{[0,1]^{2p+r}} f_{UXZW}(0,x,z,w)\,\Delta(x,z)\,\ell(z,w;\zeta,\eta)\,dx\,dz\,dw$$
and
$$\mu_f(\zeta,\eta) = \int_{[0,1]^{2p+r}} f_{UXZW}(0,x,z,w)\,\Delta(x,z)\,\ell_1(z,\zeta)\,f_{UXZW}(0,\eta,z,w)\,dx\,dz\,dw.$$
Then
(a) For any $\varepsilon > 0$, there are functions $\Delta(x,z)$ and $f_{UXZW}$ such that
$$\sum_{j=1}^\infty \mu_j^2/\omega_j < \varepsilon \quad\text{and}\quad \sum_{j=1}^\infty \mu_{jf}^2/\omega_{jf} \ge D_1^2$$
for some $D_1^2 > 0$.
(b) There is a constant $D > 0$ such that $\|\mu\|_2^2 \le D\|\mu_f\|_2$, where $\|\cdot\|_2$ denotes the $L_2$ norm.
Proof:
Part (a): We construct an example in which $\|\mu\|_2^2 < \varepsilon$ and $\|\mu_f\|_2^2 = D_1^2$. To simplify the discussion, assume that $G$ is known and does not have to be estimated, and set $p = r = 1$. Define
$$B_{nf}(\zeta,\eta) = n^{-1/2}\sum_{i=1}^n[I(Y_i \le g(X_i,Z_i)) - q]\,I[(X_i,Z_i)\in[0,1]^2]\,\ell_1(Z_i,\zeta)\,f_{UXZW}(0,\eta,Z_i,W_i),$$
$$B_n(\zeta,\eta) = n^{-1/2}\sum_{i=1}^n[I(Y_i \le g(X_i,Z_i)) - q]\,I[(X_i,Z_i)\in[0,1]^2]\,\ell(Z_i,W_i;\zeta,\eta),$$
$R_{nf}(\zeta_1,\eta_1;\zeta_2,\eta_2) = E[B_{nf}(\zeta_1,\eta_1)B_{nf}(\zeta_2,\eta_2)]$, and $R_n(\zeta_1,\eta_1;\zeta_2,\eta_2) = E[B_n(\zeta_1,\eta_1)B_n(\zeta_2,\eta_2)]$. Also, define the operators $\Omega_f$ and $\Omega$ on $L_2([0,1]^2)$ by
$$(\Omega_f\vartheta)(\zeta_2,\eta_2) = \int_{[0,1]^2} R_f(\zeta_1,\eta_1;\zeta_2,\eta_2)\,\vartheta(\zeta_1,\eta_1)\,d\zeta_1\,d\eta_1$$
and
$$(\Omega\vartheta)(\zeta_2,\eta_2) = \int_{[0,1]^2} R(\zeta_1,\eta_1;\zeta_2,\eta_2)\,\vartheta(\zeta_1,\eta_1)\,d\zeta_1\,d\eta_1.$$
Let $\{(\omega_{jf},\psi_{jf}) : j = 1,2,\ldots\}$ and $\{(\omega_j,\psi_j) : j = 1,2,\ldots\}$ denote the eigenvalues and eigenvectors of $\Omega_f$ and $\Omega$, respectively, sorted in decreasing order of the eigenvalues. Define
$$\mu_f(\zeta,\eta) = -\int_{[0,1]^3} f_{UXZW}(0,x,z,w)\,\Delta(x,z)\,\ell_1(z,\zeta)\,f_{UXZW}(0,\eta,z,w)\,dx\,dz\,dw,$$
$$\mu(\zeta,\eta) = -\int_{[0,1]^3} f_{UXZW}(0,x,z,w)\,\Delta(x,z)\,\ell(z,w;\zeta,\eta)\,dx\,dz\,dw,$$
$$\mu_{jf} = \int_{[0,1]^2}\mu_f(\zeta,\eta)\,\psi_{jf}(\zeta,\eta)\,d\zeta\,d\eta,$$
and
$$\mu_j = \int_{[0,1]^2}\mu(\zeta,\eta)\,\psi_j(\zeta,\eta)\,d\zeta\,d\eta.$$
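Arguments identical to those used to prove Theorem 3 but with a known $G$ show that under the sequence of local alternative hypotheses (3.3)-(3.4),
$$\tau_{nf} \xrightarrow{d} \sum_{j=1}^\infty \omega_{jf}\,\chi_1^2(\mu_{jf}^2/\omega_{jf})$$
and
$$\tau_n \xrightarrow{d} \sum_{j=1}^\infty \omega_j\,\chi_1^2(\mu_j^2/\omega_j)$$
as $n\to\infty$, where $\chi_1^2(c)$ denotes a noncentral chi-square random variable with one degree of freedom and noncentrality parameter $c$, and the chi-square variables are independent across $j$.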
To establish part (a), it suffices to show that for any fixed function $\ell$, $f_{UXZW}$ and $\Delta$ can be chosen so that $\sum_{j=1}^\infty \mu_{jf}^2/\omega_{jf}$ is bounded away from 0 and $\sum_{j=1}^\infty \mu_j^2/\omega_j$ is arbitrarily close to 0.
To do this, assume that $Z$ is independent of $(U,X,W)$ so that
$$f_{UXZW}(0,x,z,w) = f_Z(z)\,f_{UXW}(0,x,w),$$
where $f_Z$ and $f_{UXW}$, respectively, are the probability density functions of $Z$ and $(U,X,W)$. For $v\in[0,1]$, define $\phi_1(v) = 1$ and $\phi_{j+1}(v) = 2^{1/2}\cos(j\pi v)$ for $j \ge 1$. Define
$$\lambda_j = \begin{cases} 1 & \text{if } j = 1 \text{ or } j = m,\\ e^{-j^2} & \text{otherwise.}\end{cases}$$
Let
$$f_{UXW}(0,x,w) = 1 + \sum_{j=1}^\infty \lambda_{j+1}^{1/2}\,\phi_{j+1}(x)\,\phi_{j+1}(w).$$
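The construction relies on two properties of the cosine basis: the $\phi_j$ are orthonormal in $L_2[0,1]$, and, when $\lambda_m = 1$, integrating $\phi_m(x)$ against $f_{UXW}(0,x,w)$ returns $\phi_m(w)$. The sketch below checks both numerically with a midpoint rule (which integrates these cosine products exactly up to rounding), taking $\lambda_j$ as defined above and truncating the series at 20 terms; the truncation and the function names are assumptions of the sketch.

```python
import math

def phi(j, v):
    """Cosine basis on [0, 1]: phi_1 = 1, phi_{j+1}(v) = sqrt(2) cos(j pi v)."""
    return 1.0 if j == 1 else math.sqrt(2.0) * math.cos((j - 1) * math.pi * v)

def lam(j, m):
    """lambda_j = 1 if j = 1 or j = m, exp(-j^2) otherwise."""
    return 1.0 if j in (1, m) else math.exp(-j * j)

def f_uxw(x, w, m, terms=20):
    """Truncated f_UXW(0, x, w) = 1 + sum_{j>=1} lambda_{j+1}^{1/2} phi_{j+1}(x) phi_{j+1}(w)."""
    return 1.0 + sum(math.sqrt(lam(j + 1, m)) * phi(j + 1, x) * phi(j + 1, w)
                     for j in range(1, terms + 1))

M = 200
grid = [(i + 0.5) / M for i in range(M)]

def inner(f, g):
    """Midpoint-rule approximation of int_0^1 f(v) g(v) dv."""
    return sum(f(v) * g(v) for v in grid) / M

# Reproducing property used for mu_f: int phi_m(x) f_UXW(0, x, w) dx = phi_m(w) when lambda_m = 1.
w0 = 0.3
reproduced = inner(lambda x: phi(3, x), lambda x: f_uxw(x, w0, m=3))
```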
Then
$$\begin{aligned}
R_f(\zeta_1,\eta_1;\zeta_2,\eta_2) &= q(1-q)\,E[\ell_1(Z,\zeta_1)\,\ell_1(Z,\zeta_2)\,f_Z(Z)^2]\;E[f_{UXW}(0,\eta_1,W)\,f_{UXW}(0,\eta_2,W)]\\
&= q(1-q)\,Q(\zeta_1,\zeta_2)\left[1 + \sum_{j=1}^\infty \lambda_{j+1}\,\phi_{j+1}(\eta_1)\,\phi_{j+1}(\eta_2)\right],
\end{aligned}$$
where
$$Q(\zeta_1,\zeta_2) = E[\ell_1(Z,\zeta_1)\,\ell_1(Z,\zeta_2)\,f_Z(Z)^2].$$
Let $\{\nu_k : k = 1,2,\ldots\}$ denote the eigenvalues of the integral operator whose kernel is $Q(\zeta_1,\zeta_2)$. Then the eigenvalues of $\Omega_f$ are $\{\lambda_j\nu_k : j,k = 1,2,\ldots\}$. Let
$$\Delta(x,z) = D_0\,\Delta_Z(z)\,\phi_m(x)$$
for some $m \ge 1$, where $D_0 > 0$ is a constant and $\Delta_Z \in L_2([0,1])$ is a bounded function. Then
$$\mu_f(\zeta,\eta) = -D_0\,\phi_m(\eta)\int_0^1 \ell_1(z,\zeta)\,f_Z(z)^2\,\Delta_Z(z)\,dz,$$
and
$$\|\mu_f\|_2^2 = D_0^2\int_0^1\left[\int_0^1 \ell_1(z,\zeta)\,f_Z(z)^2\,\Delta_Z(z)\,dz\right]^2 d\zeta \equiv D_1^2.$$
Moreover, $D_1^2 > 0$ for any $m$ because $\ell_1$ is the kernel of a non-singular integral operator.
We now show that $m$ can be chosen so that $\|\mu\|_2^2$ is arbitrarily close to 0. To do this, observe that $\ell(z,w;\zeta,\eta)$ has the Fourier representation
$$\ell(z,w;\zeta,\eta) = \sum_{j,k,s,t=1}^\infty h_{jkst}\,\phi_j(z)\,\phi_k(w)\,\phi_s(\zeta)\,\phi_t(\eta),$$
where $\{h_{jkst} : j,k,s,t = 1,2,\ldots\}$ are constants. Then
$$\mu(\zeta,\eta) = -D_0\sum_{j,s,t=1}^\infty b_j\,h_{jmst}\,\phi_s(\zeta)\,\phi_t(\eta),$$
where
$$b_j = \int_0^1 f_Z(z)\,\Delta_Z(z)\,\phi_j(z)\,dz.$$
The $b_j$'s are Fourier coefficients of $f_Z(z)\Delta_Z(z)$, so $\sum_{j=1}^\infty b_j^2 = c_b$ for some $c_b < \infty$. Therefore, by the Cauchy-Schwarz inequality
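$$\|\mu\|_2^2 = D_0^2\sum_{s,t=1}^\infty\left(\sum_{j=1}^\infty b_j\,h_{jmst}\right)^2 \le c_b\,D_0^2\sum_{j,s,t=1}^\infty h_{jmst}^2.$$
Because $\ell$ is bounded, it is square integrable, so $\sum_{j,k,s,t} h_{jkst}^2 < \infty$ and $\sum_{j,s,t} h_{jmst}^2 \to 0$ as $m \to \infty$. Hence $m$ can be chosen so that
$$\sum_{j,s,t=1}^\infty h_{jmst}^2 < \varepsilon/(c_b D_0^2)$$
for any $\varepsilon > 0$. With this $m$, $\|\mu\|_2^2 < \varepsilon$, which establishes part (a).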
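The bound on $\|\mu\|_2^2$ uses Parseval's identity for the orthonormal products $\phi_s(\zeta)\phi_t(\eta)$ and then the Cauchy-Schwarz inequality in $j$. The sketch below checks the resulting inequality $D_0^2\sum_{s,t}(\sum_j b_j h_{jmst})^2 \le c_b D_0^2 \sum_{j,s,t} h_{jmst}^2$ for finitely many random, hypothetical coefficients (with $m$ held fixed), and shows that equality holds when $h_{jmst}$ is proportional to $b_j$.

```python
import random

def mu_sq_and_bound(b, h, d0):
    """Return (||mu||_2^2 via Parseval, Cauchy-Schwarz bound c_b * D0^2 * sum h^2).

    b[j] are the coefficients b_j; h[j][s][t] are h_{jmst} for a fixed m.
    """
    J, S, T = len(h), len(h[0]), len(h[0][0])
    c_b = sum(x * x for x in b)
    mu_sq = d0 ** 2 * sum(sum(b[j] * h[j][s][t] for j in range(J)) ** 2
                          for s in range(S) for t in range(T))
    bound = c_b * d0 ** 2 * sum(h[j][s][t] ** 2
                                for j in range(J) for s in range(S) for t in range(T))
    return mu_sq, bound

rng = random.Random(4)
J, S, T, D0 = 8, 5, 5, 1.5
b = [rng.gauss(0.0, 1.0) / (j + 1) for j in range(J)]  # hypothetical b_j
h = [[[rng.gauss(0.0, 1.0) for _ in range(T)] for _ in range(S)] for _ in range(J)]
mu_sq, bound = mu_sq_and_bound(b, h, D0)

# Equality case of Cauchy-Schwarz: h_{jmst} = b_j * g_{st}.
g = [[rng.gauss(0.0, 1.0) for _ in range(T)] for _ in range(S)]
h_prop = [[[b[j] * g[s][t] for t in range(T)] for s in range(S)] for j in range(J)]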
Part (b): We have
$$\mu(\zeta,\eta) = -\int_{[0,1]^3} f_{UXZW}(0,x,z,w)\,\Delta(x,z)\,\ell(z,w;\zeta,\eta)\,dx\,dz\,dw.$$
By the Cauchy-Schwarz inequality,
$$\begin{aligned}
\|\mu\|_2^2 &\le \int_{[0,1]^2}\left[\int_{[0,1]} f_{UXZW}(0,x,z,w)\,\Delta(x,z)\,dx\right]^2 dz\,dw \times \int_{[0,1]^2}\int_{[0,1]^2}\ell(z,w;\zeta,\eta)^2\,dz\,dw\,d\zeta\,d\eta\\
&\le C\int_{[0,1]^2}\left[\int_{[0,1]} f_{UXZW}(0,x,z,w)\,\Delta(x,z)\,dx\right]^2 dz\,dw
\end{aligned}\tag{A.9}$$
for some constant $C < \infty$. Under assumption 2(iii), $\Delta(x,z)$ is bounded from below, say by $c_\Delta > -\infty$, so it can be assumed without loss of generality that $\Delta(x,z) \ge 0$ for all $(x,z)\in[0,1]^2$. (If $c_\Delta < 0$, replace $\Delta(x,z)$ by $\Delta(x,z) - c_\Delta$ and $G(x,z)$ by $G(x,z) + c_\Delta$. This is a normalization that has no effect on model (3.4) because $G$ is nonparametric.) By the boundedness of $\Delta(x,z)$ from above, and of $\ell_1(z,\zeta)$ from below,
$$\begin{aligned}
\int_{[0,1]^2}\left[\int_{[0,1]} f_{UXZW}(0,x,z,w)\,\Delta(x,z)\,dx\right]^2 dz\,dw
&= \int_{[0,1]^4} f_{UXZW}(0,x,z,w)\,\Delta(x,z)\,f_{UXZW}(0,\eta,z,w)\,\Delta(\eta,z)\,dx\,d\eta\,dz\,dw\\
&= \int_{[0,1]^5} f_{UXZW}(0,x,z,w)\,\Delta(x,z)\,f_{UXZW}(0,\eta,z,w)\,\Delta(\eta,z)\,dx\,d\eta\,dz\,dw\,d\zeta\\
&\le C_1\int_{[0,1]^5} f_{UXZW}(0,x,z,w)\,\Delta(x,z)\,f_{UXZW}(0,\eta,z,w)\,\ell_1(z,\zeta)\,dx\,d\eta\,dz\,dw\,d\zeta\\
&= C_1\int_{[0,1]^2}|\mu_f(\zeta,\eta)|\,d\zeta\,d\eta\\
&\le C_1\|\mu_f\|_2
\end{aligned}\tag{A.10}$$
for some constant $C_1 < \infty$, where the last line follows from the Cauchy-Schwarz inequality. Lemma 2(b) follows from substituting (A.10) into (A.9). Q.E.D.
Proof of Theorem 5: Parts (a) and (b) of Theorem 5 follow from parts (a) and (b), respectively, of
Lemma 2. Q.E.D.
Proof of Proposition 1: Let $\|\cdot\|_{op}$ denote the operator norm
$$\|A\|_{op} = \sup_{\|u\|_2 \le 1}\|Au\|_2,$$
where $A$ is an operator on $L_2[0,1]$. By Theorem 5.1a of Bhatia, Davis, and McIntosh (1983), it suffices to prove that $\|\hat\Omega - \Omega\|_{op} \xrightarrow{p} 0$ as $n\to\infty$. An application of the Cauchy-Schwarz inequality shows that
$$\|\hat\Omega - \Omega\|_{op}^2 \le \int_{[0,1]^2}[\hat R(\eta_1;\eta_2) - R(\eta_1;\eta_2)]^2\,d\eta_1\,d\eta_2.$$
It follows from uniform consistency of $\hat G$ for $G$, $\hat f_{YXZW}$ for $f_{YXZW}$, and $\hat f_{YXW}$ for $f_{YXW}$ that
$$\hat R(\eta_1,\eta_2) = \tilde R(\eta_1,\eta_2) + o_p(1)$$
uniformly over $(\eta_1,\eta_2)\in[0,1]^2$, where
$$\tilde R(\eta_1;\eta_2) \equiv n^{-1}\sum_{i=1}^n[I(Y_i \le \hat G(X_i)) - q]^2\,I(X_i\in[0,1])\left[\ell(W_i;\eta_1) - \frac{\lambda(X_i,\eta_1)}{f_{YX}[G(X_i),X_i]}\right]\left[\ell(W_i;\eta_2) - \frac{\lambda(X_i,\eta_2)}{f_{YX}[G(X_i),X_i]}\right].$$
Arguments like those used to prove Lemma 1 show that $\tilde R(\eta_1,\eta_2) = R(\eta_1,\eta_2) + o_p(1)$ for each $\eta_1,\eta_2$, so $\hat R(\eta_1,\eta_2) = R(\eta_1,\eta_2) + o_p(1)$ as $n\to\infty$ for each $\eta_1,\eta_2$. Therefore,
$$\int_{[0,1]^2}[\hat R(\eta_1;\eta_2) - R(\eta_1;\eta_2)]^2\,d\eta_1\,d\eta_2 = o_p(1)$$
by the dominated convergence theorem. Q.E.D.
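Proposition 1 concerns the eigenvalues of the estimated operator $\hat\Omega$. A common way to compute such eigenvalues, assumed here rather than taken from the paper, is to discretize the kernel on a grid (a Nyström approximation) and diagonalize the resulting symmetric matrix. The sketch below does this with a midpoint grid and the cyclic Jacobi method, checks it on the kernel $\min(s,t)$, whose eigenvalues are known to be $1/[(k - 1/2)^2\pi^2]$, and illustrates the bound used above: the operator norm of a perturbation never exceeds its Hilbert-Schmidt (Frobenius) norm. The perturbation kernel is hypothetical.

```python
import math

def jacobi_eigenvalues(a, tol=1e-14, max_sweeps=50):
    """Eigenvalues of a symmetric matrix via the cyclic Jacobi method, largest first."""
    n = len(a)
    a = [row[:] for row in a]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)

def nystrom_matrix(kernel, m):
    """Matrix [kernel(s_i, s_j) / m] on the midpoint grid s_i = (i + 0.5) / m."""
    pts = [(i + 0.5) / m for i in range(m)]
    return [[kernel(s, t) / m for t in pts] for s in pts]

m = 40
eig = jacobi_eigenvalues(nystrom_matrix(min, m))
exact = [1.0 / ((k - 0.5) ** 2 * math.pi ** 2) for k in (1, 2)]

def perturbation(s, t):
    """Hypothetical symmetric perturbation kernel, standing in for R_hat - R."""
    return 0.01 * math.cos(math.pi * s) * math.cos(math.pi * t) + 0.005 * s * t

E = nystrom_matrix(perturbation, m)
op_norm = max(abs(x) for x in jacobi_eigenvalues(E))
hs_norm = math.sqrt(sum(x * x for row in E for x in row))
```

For a symmetric matrix the operator norm is the largest absolute eigenvalue, while the Frobenius norm is the square root of the sum of all squared eigenvalues, which is why the first can never exceed the second.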
A.4 Extension of Theorems 1-5 to the case of an estimated weight function
Let $\hat\ell(W,\eta)$ be an estimator of the weight function $\ell(W,\eta)$. The test statistic with the estimated weight function, $p = 1$, and $r = 0$ is
$$\tau_n = \int_0^1 S_n^2(w)\,dw,$$
where
$$S_n(w) = n^{-1/2}\sum_{i=1}^n[I(Y_i \le \hat G(X_i)) - q]\,I(X_i\in[0,1])\,\hat\ell(W_i,w).$$
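For concreteness, $\tau_n$ can be computed by evaluating $S_n(w)$ on a grid of $w$ values and integrating numerically. The sketch below assumes that the quantile estimate $\hat G$ and the estimated weight function $\hat\ell$ are supplied as Python callables (the names `g_hat` and `weight` are hypothetical), and it uses the midpoint rule for the integral over $w$; neither choice is taken from the paper. The usage example plugs in the true median function and a Gaussian-bump weight purely for illustration.

```python
import math
import random

def s_n(w, data, g_hat, weight, q):
    """S_n(w) = n^{-1/2} sum_i [I(Y_i <= g_hat(X_i)) - q] I(X_i in [0,1]) weight(W_i, w)."""
    total = 0.0
    for x, y, wi in data:  # each observation is a tuple (X_i, Y_i, W_i)
        if 0.0 <= x <= 1.0:
            total += ((1.0 if y <= g_hat(x) else 0.0) - q) * weight(wi, w)
    return total / math.sqrt(len(data))

def tau_n(data, g_hat, weight, q, grid_size=100):
    """Midpoint-rule approximation of tau_n = int_0^1 S_n(w)^2 dw."""
    pts = [(k + 0.5) / grid_size for k in range(grid_size)]
    return sum(s_n(w, data, g_hat, weight, q) ** 2 for w in pts) / grid_size

def bump(wi, w):
    """Hypothetical weight function l(W_i, w)."""
    return math.exp(-((wi - w) ** 2) / 0.1)

# Usage: Y = X + N(0, 1) and W = X (an exogenous design), so G(x) = x at q = 0.5.
rng = random.Random(5)
data = []
for _ in range(500):
    x = rng.random()
    data.append((x, x + rng.gauss(0.0, 1.0), x))
stat = tau_n(data, g_hat=lambda x: x, weight=bump, q=0.5)
```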
Define
$$S_{n4}(w) = n^{-1/2}\sum_{i=1}^n[I(Y_i \le g(X_i)) - q]\,I(X_i\in[0,1])\,[\hat\ell(W_i,w) - \ell(W_i,w)],$$
$$S_{n5}(w) = n^{-1/2}\sum_{i=1}^n\{I[Y_i \le G(X_i)] - I[Y_i \le g(X_i)]\}\,I(X_i\in[0,1])\,[\hat\ell(W_i,w) - \ell(W_i,w)],$$
and
$$S_{n6}(w) = n^{-1/2}\sum_{i=1}^n\{I[Y_i \le \hat G(X_i)] - I[Y_i \le G(X_i)]\}\,I(X_i\in[0,1])\,[\hat\ell(W_i,w) - \ell(W_i,w)].$$
Then
$$S_n(w) = \sum_{j=1}^6 S_{nj}(w).$$
Under assumptions 1-5 of Section A.1 and assumptions 6-7 below, it follows from Lemma A.3 of Horowitz and Lee (2009) that $S_{n4}(w) = o_p(1)$ uniformly over $w\in[0,1]$. Methods like those used to prove Lemma 1 show that $S_{n6}(w) = o_p(1)$ uniformly over $w\in[0,1]$. Under $H_0$, $S_{n5}(w) = 0$, so the use of an estimated weight function does not affect Theorem 1 and Proposition 1. Theorem 2 is also unaffected because it is concerned with the behavior of $n^{-1}\tau_n$ as $n\to\infty$, and $n^{-1/2}S_{n5}(w) \xrightarrow{p} 0$ uniformly over $w\in[0,1]$ as $n\to\infty$. In addition, $S_{n5}(w) = o_p(1)$ uniformly over $w\in[0,1]$ under the sequence of local alternatives (3.3)-(3.4). Therefore, Theorem 3 is unaffected by estimation of $\ell$.
Now consider Theorem 4. For any function $\delta(w,\eta)$, define
$$S_{n5}^*(\delta,\eta) = E\{[I(Y \le G(X)) - I(Y \le g(X))]\,I(X\in[0,1])\,\delta(W,\eta)\}.$$
Let
$$D_n(\eta) = S_{n3}(\eta) + n^{1/2}S_{n2}^*(\eta) + n^{1/2}S_{n5}^*(\hat\ell - \ell,\eta) + S_{n6}(\eta)$$
and
$$\tilde S_n(\eta) = S_n(\eta) - D_n(\eta).$$
As before,
$$P(\tau_n > z_\alpha) \ge P(0.5\|D_n\|_2^2 - \|\tilde S_n\|_2^2 > z_\alpha).$$
Arguments like those used to prove Theorem 4 and Lemma 1 combined with $S_{n6}(\eta) = o_p(1)$ uniformly over $\eta\in[0,1]$ show that $\|\tilde S_n\|_2^2 = O_p(1)$. Therefore, as in the proof of Theorem 4,
$$P(0.5\|D_n\|_2^2 - \|\tilde S_n\|_2^2 < z_\alpha) \le P(0.5\|D_n\|_2^2 < z_\alpha + M) + \varepsilon$$
and
$$P(\tau_n > z_\alpha) \ge P(0.5\|D_n\|_2^2 > z_\alpha + M) \tag{A.11}$$
for any sufficiently large $M$. But
$$\|S_{n5}^*(\hat\ell - \ell,\cdot)\|_2 = O(\|\pi\|_2\,\|\hat\ell - \ell\|_2)$$
and, under assumption 7(ii) below, $\|\pi\|_2\,\|\hat\ell - \ell\|_2/\|T\pi\|_2 = o_p(1)$. Therefore, $\|S_{n5}^*(\hat\ell - \ell,\cdot)\|_2 = o_p(\|T\pi\|_2)$.
Now use $a^2 \ge 0.5b^2 - (b-a)^2$ with $a = D_n$ and $b = n^{1/2}S_{n2}^* + n^{1/2}S_{n5}^*$ to obtain
$$\|D_n\|_2^2 \ge 0.5n\|S_{n2}^* + S_{n5}^*\|_2^2 - \|S_{n3} + S_{n6}\|_2^2.$$
But $n^{1/2}S_{n2}^*(\eta) = n^{1/2}(T\pi)(\eta) + o_p(1)$, $\|T\pi\|_2 > 0$, $\|S_{n3}\|_2 = O_p(1)$, and $\|S_{n6}\|_2 = o_p(1)$. Therefore,
$$\|D_n\|_2^2 \ge Cn\|T\pi\|_2^2 + O_p(1) \tag{A.12}$$
for all sufficiently large $C$. The theorem follows by substituting (A.12) into (A.11). Q.E.D.
The following are the additional assumptions needed to accommodate an estimated weight
function.
Assumption 6:
(i) $\sup_{(z,w)\in[0,1]^{p+r}}|\ell(z,w;\zeta_2,\eta_2) - \ell(z,w;\zeta_1,\eta_1)| \le C\|(\zeta_2,\eta_2) - (\zeta_1,\eta_1)\|_E$ for all $(\zeta_1,\eta_1),(\zeta_2,\eta_2)\in[0,1]^{p+r}$.
(ii) $\ell(z,w;\cdot,\cdot)\in C_C^{\nu}([0,1]^{p+r})$ for each $(z,w)\in[0,1]^{p+r}$ and some $\nu > (p+r)/2$.
Assumption 7:
(i) $\sup_{(z,w,\zeta,\eta)\in[0,1]^{2(p+r)}}|\hat\ell(z,w;\zeta,\eta) - \ell(z,w;\zeta,\eta)| = o_p(1)$ as $n\to\infty$.
(ii) $\sup_{g\in\mathcal{C}_n}\|\pi\|_2\,\|\hat\ell - \ell\|_2/\|T\pi\|_2 = o_p(1)$ as $n\to\infty$.
(iii) With probability approaching 1 as $n\to\infty$, $\sup_{(z,w,\zeta,\eta)\in[0,1]^{2(p+r)}}|\hat\ell(z,w;\zeta,\eta)| \le C$,
$$\sup_{(z,w)\in[0,1]^{p+r}}|\hat\ell(z,w;\zeta_2,\eta_2) - \hat\ell(z,w;\zeta_1,\eta_1)| \le C\|(\zeta_2,\eta_2) - (\zeta_1,\eta_1)\|_E,$$
and for each $(z,w)\in[0,1]^{p+r}$, $\hat\ell(z,w;\cdot,\cdot)\in C_C^{\nu}([0,1]^{p+r})$ for some $\nu > (p+r)/2$.
Assumption 6, like assumption 5, specifies properties of the function $\ell$ and does not restrict the distribution of $(Y,X,Z,W)$. Assumption 6(i) is implied by 5(ii). Assumptions 7(i) and 7(iii) require $\hat\ell$ to have certain uniform consistency and smoothness properties. These are satisfied, for example, if $\hat\ell(z,w;\zeta,\xi) = \ell_1(z,\zeta)\,\hat f_{UXZW}(0,\xi,z,w)$, where $\ell_1$ is non-stochastic and $\hat f_{UXZW}$ is the estimator of Horowitz and Lee (2009). Assumption 7(ii) restricts the set $\mathcal{C}_n$ and, therefore, is part of the definition of that set.
A.5 Additional Monte Carlo Results
This section presents the results of Monte Carlo experiments whose designs are based on (3.6)-
(3.7). The experiments provide numerical examples of settings in which Breunig’s (2018) test is more
powerful than the $\tau_n$ test.
As in the experiments described in Section 4, $p = 1$ and $r = 0$. Realizations of $(X,W,U)$ were generated by
$$W = \Phi(\zeta),$$
$$X = \Phi\left(\rho_1\zeta + \sqrt{1-\rho_1^2}\,\xi\right),$$
and
$$U = \rho_2^*(X)\,\xi + \sqrt{1-\rho_2^2}\,\nu,$$
where
$$\rho_2^*(X) = [\rho_2\sin(10X)]/\Phi^{-1}(X);$$
$\Phi$ is the $N(0,1)$ distribution function; $\zeta$, $\xi$, and $\nu$ are independent random variables with $N(0,1)$ distributions; $\rho_1 = 0.7$; and $\rho_2 = 0.1$, 0.2, or 0.3, depending on the experiment. The term $\sin(10X)$ on the right-hand side of $\rho_2^*(X)$ makes $\Delta_n(x)$ very wiggly. As is explained in Section 3.5, the $\tau_n$ test can have low power when $\Delta_n(x)$ is very wiggly. Realizations of $Y$ were generated from (4.1). Other aspects of the designs are as described in Section 4. The sample size is $n = 2000$. There were 2000 Monte Carlo replications per experiment.
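The first two lines of the design can be simulated directly. The sketch below draws $(X, W)$ as described above and checks that $\Phi^{-1}(X)$ and $\Phi^{-1}(W)$ have correlation $\rho_1 = 0.7$, which follows from the construction and governs instrument strength. It deliberately stops short of $U$ and $Y$: $U$ can be formed from the returned $\xi$ following the display above, while $Y$ requires (4.1), which is not reproduced in this appendix. The function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

PHI = NormalDist()  # standard normal distribution

def draw_xw(n, rho1=0.7, seed=0):
    """Draw tuples (X, W, Phi^{-1}(X), zeta, xi) with W = Phi(zeta) and
    X = Phi(rho1 * zeta + sqrt(1 - rho1^2) * xi)."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        zeta, xi = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        latent = rho1 * zeta + math.sqrt(1.0 - rho1 ** 2) * xi  # equals Phi^{-1}(X)
        draws.append((PHI.cdf(latent), PHI.cdf(zeta), latent, zeta, xi))
    return draws

def corr(u, v):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    return cov / math.sqrt(sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v))

sample = draw_xw(20000)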
The results of the experiments are shown in Table A.1. We computed the power of Breunig’s
(2018) test with several different choices of the test statistic’s tuning parameters. Table A.1 shows the
test’s power with the parameter values that maximize and minimize its power. As is expected from the
discussion in Section 3.5, Breunig’s (2018) test is more powerful than any version of the $\tau_n$ test.
TABLE A.1: RESULTS OF MONTE CARLO EXPERIMENTS WITH THE DESIGN OF SECTION A.5^a
Empirical Probability of Rejecting $H_0$

$\rho_2$   $\tau_{nD}$   $\tau^*_{nD}$   $\tau_{nI}$   $\tau_{nB}$   Breunig’s Test   Breunig’s Test
                                                                   Max Power        Min Power
0.1        0.139         0.158           0.138         0.118         0.335            0.272
0.2        0.226         0.279           0.220         0.192         0.536            0.433
0.3        0.298         0.497           0.310         0.267         0.677            0.568

^a Max and min power of Breunig’s (2018) test refer to the test’s power with its tuning parameters chosen to maximize and minimize its power.
REFERENCES FOR THE APPENDIX
Bhatia, R., C. Davis, and A. McIntosh (1983). Perturbation of spectral subspaces and solution of linear operator equations. Linear Algebra and Its Applications, 52/53, 45-67.
Kong, E., O. Linton, and Y. Xia (2010). Uniform Bahadur representation for local polynomial estimates of M-regression and its application. Econometric Theory, 26, 1529-1564.
Pakes, A. and D. Pollard (1989). Simulation and the asymptotics of optimization estimators. Econometrica, 57, 1027-1057.
Pollard, D. (1984). Convergence of Stochastic Processes. New York: Springer-Verlag.
Serfling, R.J. (1980). Approximation Theorems of Mathematical Statistics. New York: John Wiley & Sons.