York University This draft: November 30, 2021 arXiv:2111

Robust Permutation Tests in Linear Instrumental Variables Regression∗

Purevdorj Tuvaandorj†

York University

This draft: November 30, 2021

Abstract

This paper develops permutation versions of identification-robust tests in linear instrumental variables (IV) regression. Unlike the existing randomization and rank-based tests, which assume independence between the instruments and the error terms, the permutation Anderson-Rubin (AR), Lagrange Multiplier (LM) and Conditional Likelihood Ratio (CLR) tests are asymptotically similar and robust to conditional heteroskedasticity under the standard exclusion restriction, i.e., the orthogonality between the instruments and the error terms. Moreover, when the instruments are independent of the structural error term, the permutation AR tests are exact and hence robust to heavy tails. As such, these tests share the strengths of the rank-based tests and the wild bootstrap AR tests. Numerical illustrations corroborate the theoretical results.

Keywords: Anderson-Rubin statistic, Asymptotic validity, Conditional likelihood ratio statistic, Exact test, Heteroskedasticity, Identification, Lagrange Multiplier statistic, Permutation test, Randomization test

1 Introduction

This paper proposes randomization (permutation) test versions of the heteroskedasticity-robust AR test of Anderson and Rubin (1949), the LM test of Kleibergen (2002) and Andrews and Guggenberger (2019a)’s

∗ First version: January 25, 2020; Second version: December 14, 2020. I thank Dmitry Arkhangelsky, Xavier D’Haultfœuille, Antoine Djogbenou, Hiroyuki Kasahara, Uros Petronijevic and the participants of the Econometric Society European Winter Meeting 2019, and the 37th Canadian Econometrics Study Group (CESG) Meeting 2021 for helpful comments. I gratefully acknowledge financial support from the LA&PS Minor Research Grant, York University. All errors are my own.

† [email protected]

arXiv:2111.13774v1 [econ.EM] 26 Nov 2021

CLR tests in IV regression. These tests are asymptotically similar (in a uniform sense) and heteroskedasticity-robust under the usual exogeneity condition, i.e., the orthogonality (or uncorrelatedness) between the error terms and the instruments. When the latter assumption is replaced by independence of the instruments and the error term in the structural equation, the permutation AR tests become exact and hence robust to heavy tails.

Permutation inference is attractive because of its exactness when the relevant assumptions hold. It also bears a natural connection to IV methods. IVs provide exogenous variation in the endogenous regressor that is independent of the unobserved confounders, and permutation inference typically exploits the independence of two sets of variables by permuting the elements of one while holding the other fixed, thereby replicating the distribution of the test statistic at hand.

Despite the link, the literature on randomization inference in IV models is scarce. Imbens and Rosenbaum (2005) develop exact permutation tests in the IV model where the instruments are assigned randomly.1 Their test statistic takes the form of an inner product between the instruments (or transformations thereof, e.g., ranks) and the structural error terms evaluated under the null hypothesis. Exact permutation tests are obtained by permuting the former while holding the latter fixed. Given the advantage of the permutation method in an IV setting as shown by Imbens and Rosenbaum (2005), we broaden its scope by proposing permutation versions of the commonly used identification-robust AR, LM and CLR test statistics.

Recently, DiCiccio and Romano (2017) propose permutation tests for correlation between a

pair of random vectors and regression coefficients. DiCiccio and Romano (2017) show that using an

appropriate studentization argument, one can obtain tests that are asymptotically pivotal under the

assumption of zero correlation between the two random variables instead of the usual independence

assumption. The results are then generalized to heteroskedastic linear regression where the error

term is conditionally mean-independent of the regressors.

We extend the result of DiCiccio and Romano (2017) to the IV setting and propose two permutation AR (PAR) statistics, denoted PAR1 and PAR2. The PAR1 statistic is based on permuting the rows of the instrument matrix that shifts the endogenous regressors, and the PAR2 statistic is based on permuting the null-restricted residuals in the structural equation. We establish the finite sample validity of the permutation AR tests under an independence assumption akin to the one used in Andrews and Marmer (2008). We then show their asymptotic validity under the

1 This leads to what is referred to as the design-based approach by Abadie et al. (2020), who make an explicit distinction between design-based and sampling-based uncertainties and propose robust standard errors to account for each uncertainty in a regression model.

usual exogeneity condition allowing for conditional heteroskedasticity. Thus, our result differs from

that of Imbens and Rosenbaum (2005) who maintain the independence between the error terms and

the instruments and do not deal with heteroskedasticity.

In addition, we consider permutation LM (PLM) and CLR (PCLR) statistics. Following Freedman and Lane (1983), we permute the residuals from the first-stage estimation of the reduced-form equation and generate permuted right-hand-side endogenous variables. The permutation LM statistic is then constructed from the latter and the permuted null-restricted residuals in the structural equation. In the PCLR tests, we condition on the nuisance parameter estimates and, as in the PAR2 test, permute only the null-restricted residuals of the structural equation to generate the conditional permutation distribution.

We compare the robust permutation tests to both the identification-robust rank-based tests

of Andrews and Marmer (2008) and Andrews and Soares (2007), and the wild bootstrap tests of

Davidson and MacKinnon (2012) through simulations. In terms of the control of type I error,

we find that the robust permutation tests outperform the rank-based tests under the standard

exclusion restriction, and perform on par with or have an edge over the wild bootstrap tests in the

heteroskedastic designs considered.

As a main technical contribution, we show the uniform asymptotic similarity of the proposed permutation tests which, to the best of our knowledge, has not been shown before in the context of bootstrap and randomization inference in the IV model. In doing so, we adapt the results of Andrews and Guggenberger (2017a,b, 2019a,b) and Andrews et al. (2020) in a nontrivial way.

Discussion of the related literature

The literature on the linear IV model is vast. Earlier surveys on this topic are provided by Stock et al. (2002), Dufour (2003), and Andrews and Stock (2007), and recent contributions include Moreira and Moreira (2019), Young (2020) and the survey of Andrews et al. (2019).

Andrews and Marmer (2008) develop a rank-based AR statistic under the independence assumption between the instruments and the structural error term. Under their assumption, the PAR1 test proposed here is exact, while the PAR2 test is exact when there are no included non-constant exogenous variables. Andrews and Soares (2007) consider a rank version of the CLR statistic of Moreira (2003). The latter test employs an asymptotic critical value after conditioning on a rank-based nuisance parameter estimate and is robust to heavy-tailed errors but not to heteroskedasticity.

A strand of literature that focuses on bootstrap inference in the linear IV model includes, among others, Davidson and MacKinnon (2008), Moreira et al. (2009), and Davidson and MacKinnon (2012), who document an improved performance of bootstrap inference over asymptotic inference. The results of this paper complement these studies.

The key to our results is the fact that the identification-robust PAR, PLM and PCLR statistics are studentized. The use of studentization to obtain an improved test is not new; a well-known example is the higher-order accuracy of the bootstrap-t confidence interval, as opposed to the percentile bootstrap, in the one-sample problem; see, for instance, Chapter 15.5 of Lehmann and Romano (2005). So and Shin (1999) develop a persistence-robust test for autoregressive models based on a Cauchy estimator and a studentized statistic. In the context of permutation tests, Neuhaus (1993) proposes a studentized permutation statistic in a two-sample problem with randomly censored data. For further extensions and applications of permutation tests based on studentized statistics, see Janssen (1997), Neubert and Brunner (2007), Pauly (2011) and Chung and Romano (2013, 2016).

For several recent papers that consider randomization inference, see Chernozhukov et al. (2009) for finite sample inference in quantile regression with and without endogeneity, Canay et al. (2017) for randomization inference under an approximate symmetry condition with applications to differences-in-differences and regression models with clustered errors, Canay and Kamat (2018) and Ganong and Jager (2018) for permutation tests in regression discontinuity and kink designs, respectively, and Dufour et al. (2019) for permutation tests for inequality measures.

The paper is organized as follows. Section 2 introduces the model and the identification-robust

test statistics and develops the permutation tests. Simulation results comparing the performance of

alternative test procedures are provided in Section 3. Section 4 presents two empirical applications.

Section 5 briefly concludes. The proofs are collected in Appendix A.

2 Robust permutation tests

2.1 Model setup

We develop permutation-based tests for a restriction on the d× 1 vector of structural coefficients

H0 : θ = θ0,

in the linear IV regression model:

y = Y θ +Xγ + u, (2.1)

Y = WΓ +XΨ + V, (2.2)

where $y = [y_1, \dots, y_n]'$ is an $n \times 1$ vector of endogenous variables, $Y = [Y_1, \dots, Y_n]'$ is an $n \times d$ matrix of right-hand side endogenous variables, $X = [X_1, \dots, X_n]'$ is an $n \times p$ matrix of right-hand side exogenous variables whose first column is the $n \times 1$ vector of ones $\iota = [1, \dots, 1]'$, $W = [W_1, \dots, W_n]'$ is an $n \times k$ ($k \geq d$) matrix of instrumental variables that includes neither $\iota$ nor the exogenous variables in $X$, $u = [u_1, \dots, u_n]'$ is an $n \times 1$ vector of structural error terms, and $V = [V_1, \dots, V_n]'$ with $V_i = [V_{i1}, \dots, V_{id}]'$ is an $n \times d$ matrix of reduced-form error terms; $\gamma$ is a $p \times 1$ parameter vector, and $\Gamma$ and $\Psi$ are $k \times d$ and $p \times d$ matrices of reduced-form coefficients, respectively.

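Let $X_i = [X_{i1}, \dots, X_{ip}]'$, $W_i = [W_{i1}, \dots, W_{ik}]'$, $Y_i = [Y_{i1}, \dots, Y_{id}]'$ and $Z = [Z_1, \dots, Z_n]' \equiv M_X W$, where $M_A \equiv I_n - P_A \equiv I_n - A(A'A)^{-1}A'$ for a matrix $A$ of full column rank. Rewrite the equations (2.1)-(2.2) as

\[ y = Y\theta + X\gamma + u, \tag{2.3} \]

\[ Y = Z\Gamma + X\xi + V, \tag{2.4} \]

where $\xi \equiv (X'X)^{-1}X'W\Gamma + \Psi$. Let $Y = [Y_{,1}, \dots, Y_{,d}]$, where $Y_{,s} = [Y_{1s}, \dots, Y_{ns}]'$ denotes the $s$-th column of $Y$, $s = 1, \dots, d$. Define $u_i(\theta) \equiv y_i - Y_i'\theta - X_i'(X'X)^{-1}X'(y - Y\theta)$, $u(\theta) = M_X(y - Y\theta) = [u_1(\theta), \dots, u_n(\theta)]'$, $\tilde{Y} = [\tilde{Y}_1, \dots, \tilde{Y}_n]' \equiv M_X Y$, and

\[ m(\theta) \equiv n^{-1} Z'u(\theta) = n^{-1} Z'(y - Y\theta), \tag{2.5} \]

\[ G_s(\theta) \equiv n^{-1} Z'Y_{,s}, \qquad C_s(\theta) \equiv n^{-1} \sum_{i=1}^{n} Z_i Z_i' Y_{is} u_i(\theta), \tag{2.6} \]

\[ \Sigma(\theta) \equiv n^{-1} \sum_{i=1}^{n} Z_i Z_i' u_i(\theta)^2, \qquad J(\theta) = [J_1(\theta), \dots, J_d(\theta)], \tag{2.7} \]

\[ J_s(\theta) \equiv G_s(\theta) - C_s(\theta)\Sigma(\theta)^{-1} m(\theta), \quad s = 1, \dots, d, \tag{2.8} \]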

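where $Y_{is}$ is the $s$-th element of $Y_i$. The heteroskedasticity-robust AR and Kleibergen (2002)’s LM statistics are

\[ \begin{aligned} \mathrm{AR} &= \mathrm{AR}(W, X, u(\theta_0)) \equiv n\, m(\theta_0)'\Sigma(\theta_0)^{-1} m(\theta_0), \\ \mathrm{LM} &= \mathrm{LM}(Z, u(\theta_0), Y) \equiv n\, m(\theta_0)'\Sigma(\theta_0)^{-1} J(\theta_0) \left( J(\theta_0)'\Sigma(\theta_0)^{-1} J(\theta_0) \right)^{-1} J(\theta_0)'\Sigma(\theta_0)^{-1} m(\theta_0). \end{aligned} \tag{2.9} \]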

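The LM statistic may also be defined as

\[ \mathrm{LM} \equiv (y - Y\theta_0)'Z\tilde{\Gamma} \left( \tilde{\Gamma}' \sum_{i=1}^{n} Z_i Z_i' u_i(\theta_0)^2\, \tilde{\Gamma} \right)^{-1} \tilde{\Gamma}'Z'(y - Y\theta_0), \tag{2.10} \]

where $\tilde{\Gamma} = [\tilde{\Gamma}_1, \dots, \tilde{\Gamma}_d]$ with $\tilde{\Gamma}_s \equiv (Z'Z)^{-1}Z'Y_{,s} - (Z'Z)^{-1} C_s(\theta_0)\Sigma(\theta_0)^{-1} Z'(y - Y\theta_0)$, $s = 1, \dots, d$. In this paper, we focus on the statistic in (2.9). Finally, we consider the CLR-type statistics. Define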

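\[ S \equiv \Sigma(\theta_0)^{-1/2} n^{1/2} m(\theta_0) \in \mathbb{R}^{k}, \tag{2.11} \]

\[ T(\Sigma(\theta_0), J(\theta_0), \mathcal{V}) \equiv \Sigma(\theta_0)^{-1/2} n^{1/2} J(\theta_0) \left( (\theta_0, I_d)\, \Omega^{\varepsilon}(\Sigma(\theta_0), \mathcal{V})^{-1} (\theta_0, I_d)' \right)^{1/2} \in \mathbb{R}^{k \times d}, \tag{2.12} \]

where $\Omega^{\varepsilon}(\cdot, \cdot) \in \mathbb{R}^{(d+1) \times (d+1)}$ is an eigenvalue-adjusted version of the matrix defined as

\[ \Omega(\Sigma(\theta), \mathcal{V}) = \{\Omega_{ij}\}_{1 \leq i,j \leq d+1} \in \mathbb{R}^{(d+1) \times (d+1)}, \qquad \Omega_{ij} \equiv \mathrm{tr}\left( K_{ij}(\mathcal{V})'\Sigma(\theta)^{-1} \right)/k \in \mathbb{R},{}^{2} \tag{2.13} \]

and the $k \times k$ matrix $K_{ij}(\mathcal{V})$ is the $(i, j)$ submatrix of $K(\mathcal{V})$ given by

\[ K(\mathcal{V}) \equiv (B(\theta_0)' \otimes I_k)\, \mathcal{V}\, (B(\theta_0) \otimes I_k) \in \mathbb{R}^{k(d+1) \times k(d+1)}, \qquad B(\theta) \equiv \begin{bmatrix} 1 & 0_{1 \times d} \\ -\theta & -I_d \end{bmatrix} \in \mathbb{R}^{(d+1) \times (d+1)}. \tag{2.14} \]

The CLR-type statistics proposed by Andrews and Guggenberger (2019a,b) are

\[ \mathrm{CLR}_a = \mathrm{CLR}(S, T_a) \equiv S'S - \lambda_{\min}\left[ (S, T_a)'(S, T_a) \right], \tag{2.15} \]

\[ \mathrm{CLR}_b = \mathrm{CLR}(S, T_b) \equiv S'S - \lambda_{\min}\left[ (S, T_b)'(S, T_b) \right], \tag{2.16} \]

where $\lambda_{\min}(\cdot)$ denotes the smallest eigenvalue of a matrix, and

\[ T_a = T(\Sigma(\theta_0), J(\theta_0), \mathcal{V}_a(\theta_0)), \qquad \mathcal{V}_a(\theta) \equiv \begin{bmatrix} \Sigma(\theta) & C(\theta) \\ C(\theta)' & \Sigma^{G} \end{bmatrix} \in \mathbb{R}^{k(d+1) \times k(d+1)}, \tag{2.17} \]

\[ \begin{aligned} C(\theta) &\equiv [C_1(\theta), \dots, C_d(\theta)] \in \mathbb{R}^{k \times kd}, \qquad \Sigma^{G} = \{\Sigma^{G}_{st}\}_{1 \leq s,t \leq d} \in \mathbb{R}^{kd \times kd}, \\ \Sigma^{G}_{st} &\equiv n^{-1} \sum_{i=1}^{n} Z_i Z_i' Y_{is} Y_{it} - \left( n^{-1} \sum_{i=1}^{n} Z_i Y_{is} \right) \left( n^{-1} \sum_{i=1}^{n} Z_i Y_{it} \right)' \in \mathbb{R}^{k \times k}, \end{aligned} \tag{2.18} \]

\[ T_b = T(\Sigma(\theta_0), J(\theta_0), \mathcal{V}_b(\theta_0)), \qquad \mathcal{V}_b(\theta) \equiv n^{-1} \sum_{i=1}^{n} \left[ (\varepsilon_i(\theta) - \varepsilon_{in}(\theta))(\varepsilon_i(\theta) - \varepsilon_{in}(\theta))' \right] \otimes (Z_i Z_i') \in \mathbb{R}^{k(d+1) \times k(d+1)}, \tag{2.19} \]

\[ \begin{aligned} \varepsilon_i(\theta) &\equiv \left[ u_i(\theta), -\tilde{Y}_i' \right]' \in \mathbb{R}^{d+1}, \qquad \tilde{Y} = [\tilde{Y}_1, \dots, \tilde{Y}_n]' = M_X Y \in \mathbb{R}^{n \times d}, \\ \varepsilon_{in}(\theta) &\equiv \varepsilon(\theta)'Z(Z'Z)^{-1}Z_i \in \mathbb{R}^{d+1}, \qquad \varepsilon(\theta) \equiv [\varepsilon_1(\theta), \dots, \varepsilon_n(\theta)]' \in \mathbb{R}^{n \times (d+1)}. \end{aligned} \tag{2.20} \]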

The CLRa and CLRb statistics are denoted as QLR and QLRP , respectively, in Andrews and

Guggenberger (2019b), and differ only in their definitions of Va(θ0) and Vb(θ0). Andrews and

2The eigenvalue-adjustment of Andrews and Guggenberger (2019b) is as follows. Let A be a nonzero positive


Guggenberger (2019b) show that the CLRb statistic is asymptotically equivalent to Moreira (2003)’s CLR statistic in the homoskedastic linear IV regression with fixed instruments and multiple endogenous variables under all strengths of identification, while the CLRa statistic has the same property under only certain strengths of identification but its general version is more broadly applicable.

2.2 Main results

We begin by recalling the basic notion of randomization tests from Chapter 15.2 of Lehmann and Romano (2005). Let $G_n$ be the set of all permutations $\pi = [\pi(1), \dots, \pi(n)]$ of $\{1, \dots, n\}$, and denote a permutation version of a generic test statistic $R$ by $PR = R^{\pi}$. The corresponding permutation distribution function is

\[ F^{PR}_n(x) \equiv \frac{1}{n!} \sum_{\pi \in G_n} 1(R^{\pi} \leq x), \]

where $1(\cdot)$ is the indicator function. Let $R^{\pi}_{(1)} \leq \dots \leq R^{\pi}_{(n!)}$ be the order statistics of $\{R^{\pi} : \pi \in G_n\}$, and for a nominal significance level $\alpha \in (0, 1)$, define

\[ r \equiv n! - I(n!\alpha), \tag{2.21} \]

\[ N^{+} \equiv |\{j = 1, \dots, n! : R^{\pi}_{(j)} > R^{\pi}_{(r)}\}|, \qquad N^{0} \equiv |\{j = 1, \dots, n! : R^{\pi}_{(j)} = R^{\pi}_{(r)}\}|, \]

\[ a^{PR} \equiv (n!\alpha - N^{+})/N^{0}, \tag{2.22} \]

where $I(\cdot)$ denotes the integer part of a number. Clearly, $N^{+}$ and $N^{0}$ are the numbers of values $R^{\pi}_{(j)}$, $j = 1, \dots, n!$, that are greater than $R^{\pi}_{(r)}$ and equal to $R^{\pi}_{(r)}$, respectively. A randomization test is then defined as

\[ \phi^{PR}_n \equiv \begin{cases} 1 & \text{if } R > R^{\pi}_{(r)}, \\ a^{PR} & \text{if } R = R^{\pi}_{(r)}, \\ 0 & \text{if } R < R^{\pi}_{(r)}. \end{cases} \tag{2.23} \]
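The decision rule in (2.21)-(2.23) can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the function names and the toy statistic (a squared inner product) are hypothetical, and all $n!$ permutations are enumerated, which is only feasible for very small $n$.

```python
import math
from itertools import permutations


def randomization_test(observed, perm_stats, alpha):
    """Rejection probability of the randomization test in (2.23).

    perm_stats holds the statistic for every permutation, so its length
    plays the role of n! in (2.21)-(2.22).
    """
    total = len(perm_stats)
    ordered = sorted(perm_stats)                 # order statistics
    r = total - math.floor(total * alpha)        # (2.21)
    cutoff = ordered[r - 1]                      # r-th order statistic (1-indexed)
    n_plus = sum(1 for v in ordered if v > cutoff)
    n_zero = sum(1 for v in ordered if v == cutoff)
    a = (total * alpha - n_plus) / n_zero        # (2.22)
    if observed > cutoff:
        return 1.0
    if observed == cutoff:
        return a
    return 0.0


def squared_inner_product(w, u):
    """Toy statistic (hypothetical): squared inner product of w and u."""
    return sum(wi * ui for wi, ui in zip(w, u)) ** 2


# Integer toy data keep ties exact; n = 4 gives 4! = 24 permutations.
w = [1, 2, 4, 8]
u = [-3, 1, 5, -2]
perm_stats = [squared_inner_product(p, u) for p in permutations(w)]
phi = randomization_test(squared_inner_product(w, u), perm_stats, alpha=0.10)

# Averaging the test over the whole permutation orbit recovers alpha.
average = sum(randomization_test(s, perm_stats, 0.10) for s in perm_stats) / len(perm_stats)
```

The last line illustrates why the randomization constant $a^{PR}$ is needed: summing the test over all permutations gives $N^{+} + N^{0} a^{PR} = n!\alpha$, so the average equals $\alpha$ exactly even when $n!\alpha$ is not an integer.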

The robust permutation test statistics are described below. Let Wπ = [Wπ(1), . . . ,Wπ(n)]′ and

uπ(θ) = [uπ(1)(θ), . . . , uπ(n)(θ)]′ be an instrument matrix and a residual vector obtained by per-

muting the rows of W and the elements of u(θ), respectively, for a uniformly chosen permutation

semi-definite matrix of dimension $p \times p$ that has a spectral decomposition $A = N\Delta N'$, where $\Delta = \mathrm{diag}(\lambda_1, \dots, \lambda_p)$, $\lambda_1 \geq \dots \geq \lambda_p \geq 0$, is the diagonal matrix that consists of the eigenvalues of $A$, and $N$ is an orthogonal matrix of the corresponding eigenvectors. Given a constant $\varepsilon > 0$, the eigenvalue-adjusted matrix is defined as $A^{\varepsilon} \equiv N\Delta^{\varepsilon}N'$, where $\Delta^{\varepsilon} \equiv \mathrm{diag}(\max\{\lambda_1, \lambda_1\varepsilon\}, \dots, \max\{\lambda_p, \lambda_1\varepsilon\})$. Andrews and Guggenberger (2019a) recommend $\varepsilon = 0.01$. We refer to Andrews and Guggenberger (2019a) for further properties of the eigenvalue-adjustment procedure.

π ∈ Gn. Set

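\[ \Sigma_u(\theta) \equiv \mathrm{diag}(u_1(\theta)^2, \dots, u_n(\theta)^2), \qquad m^{\pi}(\theta) \equiv n^{-1} Z'u_{\pi}(\theta), \qquad \Sigma^{\pi}(\theta) \equiv n^{-1} \sum_{i=1}^{n} Z_i Z_i' u_{\pi(i)}(\theta)^2. \]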

We consider two heteroskedasticity-robust permutation AR statistics defined as:

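\[ \mathrm{PAR1} \equiv \mathrm{AR}(W_{\pi}, X, u(\theta_0)) = u(\theta_0)'M_X W_{\pi} \left( W_{\pi}'M_X \Sigma_u(\theta_0) M_X W_{\pi} \right)^{-1} W_{\pi}'M_X u(\theta_0), \tag{2.24} \]

\[ \mathrm{PAR2} \equiv \mathrm{AR}(W, X, u_{\pi}(\theta_0)) = n\, m^{\pi}(\theta_0)'\Sigma^{\pi}(\theta_0)^{-1} m^{\pi}(\theta_0). \tag{2.25} \]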

The PAR1 statistic is based on the permutation of the rows of the instrument matrix $W$, and the PAR2 statistic uses the permutation of the null-restricted residuals $u(\theta_0)$. The finite sample validity of these tests is shown under the following condition.

Assumption 1. $\{(W_i', X_i', u_i)'\}_{i=1}^{n}$ are i.i.d., and $W_i$ and $[X_i', u_i]'$ are independently distributed.

Andrews and Marmer (2008) develop exact rank-based AR tests under assumptions similar to

Assumption 1. When W is independent of X and u, W ′MX u(θ0) and W ′πMX u(θ0) have the same

distribution under H0 : θ = θ0, so the PAR1 test is exact. On the other hand, the PAR2 test is

not, in general, exact because W ′MX u(θ0) and W ′MX uπ(θ0) may not have identical distributions.

However, when X = ι in Assumption 1, Z and uπ are independent, Z ′u(θ0) and Z ′uπ(θ0) are

identically distributed, and consequently, the PAR2 test is exact. The following result summarizes

the finite sample validity of the PAR tests.

Proposition 2.1 (Finite sample validity). Under Assumption 1 and $H_0 : \theta = \theta_0$, $E[\phi^{\mathrm{PAR1}}_n] = \alpha$ for $\alpha \in (0, 1)$. If $X = \iota$ in Assumption 1, then $E[\phi^{\mathrm{PAR2}}_n] = \alpha$.
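As an illustration of the PAR2 statistic, the following Python sketch treats the simplest case of a scalar instrument ($k = 1$) with $X = \iota$, where $M_X$ simply demeans a vector and the AR statistic reduces to $(\sum_i Z_i u_i)^2 / \sum_i Z_i^2 u_i^2$. The function names are hypothetical, and two simplifications are assumptions of this sketch rather than part of the definitions above: a fixed number of random permutations replaces the full set $G_n$, and a Monte Carlo p-value replaces the randomized rule in (2.23).

```python
import random


def demean(x):
    """Apply M_iota: subtract the sample mean."""
    mean = sum(x) / len(x)
    return [xi - mean for xi in x]


def ar_stat(z, u):
    """Heteroskedasticity-robust AR statistic for a scalar instrument."""
    num = sum(zi * ui for zi, ui in zip(z, u)) ** 2
    den = sum((zi * ui) ** 2 for zi, ui in zip(z, u))
    return num / den


def par2_pvalue(y, Y, W, theta0, draws=999, seed=0):
    """Monte Carlo permutation p-value based on permuting null-restricted residuals."""
    rng = random.Random(seed)
    z = demean(W)                                            # Z = M_X W
    u0 = demean([yi - Yi * theta0 for yi, Yi in zip(y, Y)])  # u(theta0)
    observed = ar_stat(z, u0)
    exceed = 0
    for _ in range(draws):
        u_perm = u0[:]
        rng.shuffle(u_perm)                                  # u_pi(theta0)
        if ar_stat(z, u_perm) >= observed:
            exceed += 1
    return observed, (1 + exceed) / (draws + 1)


# Toy data generated under H0: theta = 0 (all values hypothetical).
rng = random.Random(1)
W = [rng.gauss(0.0, 1.0) for _ in range(40)]
u = [rng.gauss(0.0, 1.0) for _ in range(40)]
Y = [0.5 * w_i + 0.5 * u_i + rng.gauss(0.0, 1.0) for w_i, u_i in zip(W, u)]
y = list(u)
observed, p_value = par2_pvalue(y, Y, W, theta0=0.0)
```

In this scalar case with $X = \iota$, demeaning commutes with permuting, so permuting the entries of $W$ (PAR1) or of $u(\theta_0)$ (PAR2) yields the same permutation distribution; the two statistics differ once non-constant exogenous variables are included.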

Even if $[X_i', u_i]'$ and $W_i$ are not distributed independently but satisfy the orthogonality condition $E[(W_i', X_i')'u_i] = 0$, we show that the permutation AR tests are asymptotically similar and heteroskedasticity-robust.

The argument for the permutation LM statistic is more involved because it is difficult to construct an estimator of the reduced-form coefficients $\Gamma$ whose (asymptotic) distribution remains invariant to a permutation of the data. The OLS estimators of the reduced-form coefficients and the corresponding residuals are

\[ \hat{\Gamma} = (Z'Z)^{-1}Z'Y, \qquad \hat{\xi} = (X'X)^{-1}X'Y, \qquad \hat{V} = Y - Z\hat{\Gamma} - X\hat{\xi} = [\hat{V}_1, \dots, \hat{V}_n]' = [\hat{V}_{,1}, \dots, \hat{V}_{,d}], \]

where $\hat{V}_{,s} = [\hat{V}_{1s}, \dots, \hat{V}_{ns}]'$ is the $s$-th column of $\hat{V}$, $s = 1, \dots, d$. Now permute the residuals, $\hat{V}_{\pi} = [\hat{V}_{\pi(1)}', \dots, \hat{V}_{\pi(n)}']'$, and let

\[ Y^{\pi} = [Y^{\pi}_{,1}, \dots, Y^{\pi}_{,d}] \equiv Z\hat{\Gamma} + X\hat{\xi} + \hat{V}_{\pi}, \]

where $Y^{\pi}_{,s} = [Y^{\pi}_{1s}, \dots, Y^{\pi}_{ns}]'$ denotes the $s$-th column of $Y^{\pi}$. The idea of permuting the residuals to obtain asymptotically valid randomization tests appears in Freedman and Lane (1983) and DiCiccio and Romano (2017).

One may also permute the null-restricted residuals $\tilde{V} = Y - Z\tilde{\Gamma} - X\tilde{\xi} = [\tilde{V}_1, \dots, \tilde{V}_n]'$, where $\tilde{\Gamma}$ and $\tilde{\xi}$ are the restricted estimators of the reduced-form parameters given $\theta = \theta_0$, and obtain $Y^{\pi} = [Y^{\pi}_{,1}, \dots, Y^{\pi}_{,d}] = Z\tilde{\Gamma} + X\tilde{\xi} + \tilde{V}_{\pi}$, where $\tilde{V}_{\pi} = [\tilde{V}_{\pi(1)}', \dots, \tilde{V}_{\pi(n)}']'$.

The Jacobian estimator used in the permutation LM statistic is

\[ J^{\pi}(\theta) = [J^{\pi}_1(\theta), \dots, J^{\pi}_d(\theta)], \tag{2.26} \]

\[ J^{\pi}_s(\theta) \equiv n^{-1} Z'Y^{\pi}_{,s} - C^{\pi}_s(\theta)\Sigma^{\pi}(\theta)^{-1} m^{\pi}(\theta), \tag{2.27} \]

\[ C^{\pi}_s(\theta) \equiv n^{-1} \sum_{i=1}^{n} Z_i Z_i' \hat{V}_{\pi(i)s} u_{\pi(i)}(\theta), \quad s = 1, \dots, d. \tag{2.28} \]
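The residual-permutation construction of $Y^{\pi}$ above can be sketched as follows for a single endogenous regressor ($d = 1$), a scalar instrument ($k = 1$) and $X = \iota$. The function name and the toy data are hypothetical; with $X = \iota$, $Z$ is the demeaned instrument and the coefficient on $X$ is the sample mean of $Y$.

```python
import random


def permuted_endogenous(Y, W, rng):
    """Return Y^pi = Z*Gamma_hat + xi_hat + permuted first-stage residuals."""
    n = len(Y)
    w_bar = sum(W) / n
    z = [w_i - w_bar for w_i in W]                       # Z = M_iota W
    gamma_hat = sum(z_i * y_i for z_i, y_i in zip(z, Y)) / sum(z_i ** 2 for z_i in z)
    xi_hat = sum(Y) / n                                  # (X'X)^{-1} X'Y with X = iota
    v_hat = [y_i - z_i * gamma_hat - xi_hat for z_i, y_i in zip(z, Y)]
    v_perm = v_hat[:]
    rng.shuffle(v_perm)                                  # permute the residuals
    y_perm = [z_i * gamma_hat + xi_hat + v_i for z_i, v_i in zip(z, v_perm)]
    return y_perm, v_hat


# Toy first-stage data (hypothetical values).
rng = random.Random(0)
W = [rng.gauss(0.0, 1.0) for _ in range(30)]
Y = [0.8 * w_i + rng.gauss(0.0, 1.0) for w_i in W]
Y_perm, V_hat = permuted_endogenous(Y, W, rng)
```

The fitted part $Z\hat{\Gamma} + X\hat{\xi}$ is held fixed across permutations; only the residuals are reshuffled, so the permuted regressor keeps the estimated first-stage signal.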

It is not difficult to see that in strongly identified models where $\Gamma$ is a fixed matrix of full rank, $J^{\pi}(\theta_0) - n^{-1}Z'Z\Gamma \xrightarrow{p^{\pi}} 0$ in probability, where $\xrightarrow{p^{\pi}}$ denotes convergence in probability with respect to the probability measure induced by the random permutation $\pi$ uniformly distributed over $G_n$. The permutation LM statistic is then defined as

\[ \mathrm{PLM} \equiv \mathrm{LM}(Z, u_{\pi}(\theta_0), Y^{\pi}) = n\, m^{\pi}(\theta_0)'\Sigma^{\pi}(\theta_0)^{-1/2} P_{\Sigma^{\pi}(\theta_0)^{-1/2} J^{\pi}(\theta_0)} \Sigma^{\pi}(\theta_0)^{-1/2} m^{\pi}(\theta_0). \tag{2.29} \]

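The permutation LM statistic that corresponds to (2.10) may be defined as

\[ \mathrm{PLM} \equiv u_{\pi}(\theta_0)'Z\tilde{\Gamma}^{\pi} \left( \tilde{\Gamma}^{\pi\prime} \sum_{i=1}^{n} Z_i Z_i' u_{\pi(i)}(\theta_0)^2\, \tilde{\Gamma}^{\pi} \right)^{-1} \tilde{\Gamma}^{\pi\prime}Z'u_{\pi}(\theta_0), \]

where $\tilde{\Gamma}^{\pi} = [\tilde{\Gamma}^{\pi}_1, \dots, \tilde{\Gamma}^{\pi}_d]$ with $\tilde{\Gamma}^{\pi}_s \equiv (Z'Z)^{-1}Z'Y^{\pi}_{,s} - (Z'Z)^{-1} C^{\pi}_s(\theta_0)\Sigma^{\pi}(\theta_0)^{-1} Z'u_{\pi}(\theta_0)$, $s = 1, \dots, d$.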

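Finally, we define the permutation CLR statistics as follows:

\[ \mathrm{PCLR}_a \equiv \mathrm{CLR}(S^{\pi}, T_a) = S^{\pi\prime}S^{\pi} - \lambda_{\min}\left[ (S^{\pi}, T_a)'(S^{\pi}, T_a) \right], \tag{2.30} \]

\[ \mathrm{PCLR}_b \equiv \mathrm{CLR}(S^{\pi}, T_b) = S^{\pi\prime}S^{\pi} - \lambda_{\min}\left[ (S^{\pi}, T_b)'(S^{\pi}, T_b) \right], \tag{2.31} \]

where $T_a$ and $T_b$ are as defined in (2.17) and (2.19), and

\[ S^{\pi} \equiv n^{1/2}\Sigma^{\pi}(\theta_0)^{-1/2} m^{\pi}(\theta_0) = \left( \sum_{i=1}^{n} Z_i Z_i' u_{\pi(i)}(\theta_0)^2 \right)^{-1/2} Z'u_{\pi}(\theta_0). \tag{2.32} \]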

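Let $F^{\mathrm{PCLR}}_n(x, T) \equiv (n!)^{-1} \sum_{\pi \in G_n} 1(\mathrm{CLR}(S^{\pi}, T) \leq x)$ denote the permutation distribution of the statistic $\mathrm{PCLR} = \mathrm{CLR}(S^{\pi}, T) \in \{\mathrm{PCLR}_a, \mathrm{PCLR}_b\}$ given $T \in \{T_a, T_b\}$, respectively. The nominal level $\alpha$ PCLR test rejects when

\[ \mathrm{CLR}(S, T) > \mathrm{PCLR}_{(r)}(T), \tag{2.33} \]

where $\mathrm{PCLR}_{(r)}(T)$ is the $r$-th order statistic of $\{\mathrm{CLR}(S^{\pi}, T) : \pi \in G_n\}$ for $r$ defined in (2.21), and randomizes when $\mathrm{CLR}(S, T) = \mathrm{PCLR}_{(r)}(T)$.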

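Let $P$ denote the distribution of the vector $(W_i', X_i', u_i, V_i')'$; we shall index the relevant quantities by $P$ in the sequel. Define

\[ e_i \equiv [u_i, V_i']', \qquad \Sigma^{e}_P \equiv \mathrm{Var}_P[e_i] = \begin{bmatrix} \sigma^2_P & \Sigma^{uV}_P \\ \Sigma^{Vu}_P & \Sigma^{V}_P \end{bmatrix} \in \mathbb{R}^{(d+1) \times (d+1)}, \tag{2.34} \]

\[ \begin{aligned} Z^{*}_i &\equiv W_i - E_P[W_i X_i'](E_P[X_i X_i'])^{-1} X_i \in \mathbb{R}^{k}, \\ \mathcal{V}_{P,a} &\equiv \mathrm{Var}_P \begin{bmatrix} Z^{*}_i u_i \\ \mathrm{vec}(Z^{*}_i (Z^{*\prime}_i \Gamma + V_i')) \end{bmatrix} = \begin{bmatrix} \Sigma_P & C_P \\ C_P' & \Sigma^{G}_P \end{bmatrix} \in \mathbb{R}^{k(d+1) \times k(d+1)}, \qquad \Sigma_P \equiv E_P[Z^{*}_i Z^{*\prime}_i u_i^2] \in \mathbb{R}^{k \times k}, \\ C_P &= [C_{P1}, \dots, C_{Pd}] \in \mathbb{R}^{k \times kd}, \qquad C_{Ps} \equiv E_P[Z^{*}_i Z^{*\prime}_i u_i (Z^{*\prime}_i \Gamma_s + V_{is})] \in \mathbb{R}^{k \times k}, \quad s = 1, \dots, d, \\ \Sigma^{G}_P &= \{\Sigma^{G}_{P,st}\}, \qquad \Sigma^{G}_{P,st} \equiv E_P[Z^{*}_i Z^{*\prime}_i (Z^{*\prime}_i \Gamma_s + V_{is})(Z^{*\prime}_i \Gamma_t + V_{it})] - E_P[Z^{*}_i Z^{*\prime}_i \Gamma_s]\, E_P[Z^{*}_i Z^{*\prime}_i \Gamma_t]', \quad s, t = 1, \dots, d, \end{aligned} \tag{2.35} \]

with $\Gamma_s$ denoting the $s$-th column of $\Gamma$, and

\[ \mathcal{V}_{P,b} \equiv E_P \left[ \begin{bmatrix} u_i^2 & -u_i V_i' \\ -u_i V_i & V_i V_i' \end{bmatrix} \otimes (Z^{*}_i Z^{*\prime}_i) \right] \in \mathbb{R}^{k(d+1) \times k(d+1)}. \tag{2.36} \]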

We maintain the following assumptions to develop the asymptotic results.

Assumption 2. For some $\delta, \delta_1 > 0$ and $M_0 < \infty$,

(a) $\{(W_i', X_i', u_i, V_i')'\}_{i=1}^{n}$ are i.i.d. with distribution $P$,

(b) $E_P[u_i (W_i', X_i')] = 0$,

(c) $E_P[\|(W_i', X_i', u_i)'\|^{4+\delta}] < M_0$,

(d) $\lambda_{\min}(A) \geq \delta_1$ for $A \in \{E_P[(W_i', X_i')'(W_i', X_i')],\ E_P[Z^{*}_i Z^{*\prime}_i],\ \Sigma_P,\ \sigma^2_P\}$,

(e) $E_P[V_i (W_i', X_i')] = 0$,

(f) $E_P[\|V_i\|^{4+\delta}] < M_0$,

(g) $\lambda_{\min}(A) \geq \delta_1$ for $A \in \{\Sigma^{V}_P - \Sigma^{Vu}_P (\sigma^2_P)^{-1} \Sigma^{uV}_P,\ \mathcal{V}_{P,a}\}$.

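Next we define the parameter spaces for the permutation tests:

\[ \mathcal{P}^{\mathrm{PAR1}}_0 = \mathcal{P}^{\mathrm{PAR2}}_0 \equiv \{P : \text{Assumption 2(a)-(d) hold}\}, \tag{2.37} \]

\[ \mathcal{P}^{\mathrm{PLM}}_0 \equiv \{P : \text{Assumption 2 holds}\}, \tag{2.38} \]

\[ \mathcal{P}^{\mathrm{PCLR}}_0 \equiv \{P : \text{Assumption 2(a)-(f) hold}\}, \quad \mathrm{PCLR} \in \{\mathrm{PCLR}_a, \mathrm{PCLR}_b\}. \tag{2.39} \]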

The distribution P in Assumption 2(a) is allowed to vary with the sample size i.e. P = Pn. For

simplicity, we suppress the dependence on n. Assumption 2(c) and (f) impose finite 4 + δ moment

on the error terms and the exogenous variables, and are slightly stronger than the cross moment

restrictions used in Andrews and Guggenberger (2019a). For pointwise asymptotic results or when

P does not vary with n, weaker restrictions would be sufficient.

The independence assumption between the instruments and the error terms, which is typically

made in the randomization test literature (see, for example, Imbens and Rosenbaum (2005)), is not

maintained here; the instruments are only assumed to satisfy the standard exogeneity conditions

in Assumption 2(b) and (e). If independence assumptions are maintained, as shown in Proposition

2.1, the PAR tests are exact and do not require the finite 4 + δ moment assumptions.

Assumption 2(d) and (g) require that the second moment matrix of the instruments and exogenous covariates and the covariance matrices be (uniformly) nonsingular; they are similar to the assumptions employed by Andrews and Guggenberger (2017a, 2019a) and Andrews et al. (2020) when developing identification-robust (but not singularity-robust) AR, LM and CLR tests. The condition $\lambda_{\min}(\mathcal{V}_{P,a}) \geq \delta_1$ in Assumption 2(g) plays an important role in showing the asymptotic similarity of the LM test, and it can be substituted by the condition $\lambda_{\min}(E_P[(e_i e_i') \otimes (Z^{*}_i Z^{*\prime}_i)]) \geq \delta_1$; see Section 3.2 of Andrews and Guggenberger (2017a). The condition $\lambda_{\min}(\Sigma^{V}_P - \Sigma^{Vu}_P (\sigma^2_P)^{-1} \Sigma^{uV}_P) \geq \delta_1$ is used in the asymptotic distribution of the PLM statistic.

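The main result of this paper is given in the following theorem, which establishes the asymptotic validity of the robust permutation tests.

Theorem 2.2 (Asymptotic similarity). Let $H_0 : \theta = \theta_0$ hold. Then, for $\alpha \in (0, 1)$ and the permutation statistics

\[ PR \in \{\mathrm{PAR1}, \mathrm{PAR2}, \mathrm{PLM}, \mathrm{PCLR}_a, \mathrm{PCLR}_b\} \]

with the corresponding parameter spaces $\mathcal{P}_0 \in \{\mathcal{P}^{\mathrm{PAR1}}_0, \mathcal{P}^{\mathrm{PAR2}}_0, \mathcal{P}^{\mathrm{PLM}}_0, \mathcal{P}^{\mathrm{PCLR}_a}_0, \mathcal{P}^{\mathrm{PCLR}_b}_0\}$, respectively,

\[ \limsup_{n \to \infty} \sup_{P \in \mathcal{P}_0} E_P[\phi^{PR}_n] = \liminf_{n \to \infty} \inf_{P \in \mathcal{P}_0} E_P[\phi^{PR}_n] = \alpha. \]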

Theorem 2.2 shows the asymptotic validity of the permutation tests. Thus, the advantage of the PAR1 and PAR2 tests relative to the existing tests is that they inherit the desirable properties of both the exact and asymptotic AR tests (including other simulation-based tests that are asymptotically valid); the PAR tests are exact under the same assumptions as the rank-based tests are, and asymptotically pivotal and heteroskedasticity-robust under the assumptions used to show the validity of the asymptotic tests.

The main technical tools for the asymptotics of the permutation statistics are Hoeffding’s combinatorial CLT (Hoeffding (1951), Motoo (1956) and Lemma A.1 in Appendix A) and a variance formula (equation (10) of Hoeffding (1951) and Lemma S.3.4 of DiCiccio and Romano (2017)) for the sums

\[ n^{-1/2} \sum_{i=1}^{n} Z_i u_{\pi(i)}(\theta_0), \qquad n^{-1/2} \sum_{i=1}^{n} Z_i V_{\pi(i)s}, \quad s = 1, \dots, d. \]

The proof of asymptotic similarity of the permutation tests uses the generic method of Andrews et al. (2020) for establishing the asymptotic size of tests; in particular, the derivation of the asymptotic distribution of the PCLR statistics relies extensively on Theorems 9.1 and 16.6 of Andrews and Guggenberger (2019a,b).

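Next, we shall derive the asymptotic distribution of the permutation statistics under the local alternatives $\theta_n = \theta_0 + h_{\theta} n^{-1/2}$ with $h_{\theta} \in \mathbb{R}^{d}$ fixed, assuming strong identification. Let $\chi^2_l(\eta^2)$ denote a noncentral chi-square random variable with $l$ degrees of freedom and noncentrality parameter $\eta^2$.

Proposition 2.3 (Local asymptotic power under strong identification). Let Assumption 2 hold, and assume that the model is strongly or semi-strongly identified in the sense that the smallest singular value $\tau_{dn}$ of $\Sigma_P^{-1/2} G_P ((\theta_0, I_d)\Omega^{\varepsilon}(\Sigma_P, \mathcal{V}_P)^{-1}(\theta_0, I_d)')^{1/2}$, where $G_P \equiv E_P[Z^{*}_i Z^{*\prime}_i]\Gamma$ and $\mathcal{V}_P \in \{\mathcal{V}_{P,a}, \mathcal{V}_{P,b}\}$, satisfies $n^{1/2}\tau_{dn} \to \infty$. Then, under $H_1 : \theta_n = \theta_0 + h_{\theta} n^{-1/2}$, $\mathrm{AR} \xrightarrow{d} \chi^2_k(\eta^2)$, $\mathrm{LM} \xrightarrow{d} \chi^2_d(\eta^2)$, $\mathrm{CLR}_a \xrightarrow{d} \chi^2_d(\eta^2)$ and $\mathrm{CLR}_b \xrightarrow{d} \chi^2_d(\eta^2)$, where $\eta^2 \equiv \lim_{n \to \infty} h_{\theta}'G'\Sigma^{-1}G h_{\theta}$, $G = \lim_{n \to \infty} G_P$ and $\Sigma = \lim_{n \to \infty} \Sigma_P$, and

\[ \sup_{x \in \mathbb{R}} |F^{\mathrm{PAR1}}_n(x) - P[\chi^2_k \leq x]| \xrightarrow{p} 0, \tag{2.40} \]

\[ \sup_{x \in \mathbb{R}} |F^{\mathrm{PAR2}}_n(x) - P[\chi^2_k \leq x]| \xrightarrow{p} 0, \tag{2.41} \]

\[ \sup_{x \in \mathbb{R}} |F^{\mathrm{PLM}}_n(x) - P[\chi^2_d \leq x]| \xrightarrow{p} 0, \tag{2.42} \]

\[ \sup_{x \in \mathbb{R}} |F^{\mathrm{PCLR}_a}_n(x, T_a) - P[\chi^2_d \leq x]| \xrightarrow{p} 0, \tag{2.43} \]

\[ \sup_{x \in \mathbb{R}} |F^{\mathrm{PCLR}_b}_n(x, T_b) - P[\chi^2_d \leq x]| \xrightarrow{p} 0. \tag{2.44} \]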

It follows that the asymptotic local power of the PAR tests is equal to that of the asymptotic AR test, given by $P[\chi^2_k(\eta^2) > r_{1-\alpha}(\chi^2_k)]$, where $r_{1-\alpha}(\chi^2_k)$ is the $1 - \alpha$ quantile of a $\chi^2_k$ random variable. Moreover, under strong identification, the PLM and PCLR tests and the asymptotic LM and CLR tests have the same asymptotic local power, equal to $P[\chi^2_d(\eta^2) > r_{1-\alpha}(\chi^2_d)]$.
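For one degree of freedom the local power expression has a closed form, because a noncentral $\chi^2_1(\eta^2)$ variable is distributed as $(N + \eta)^2$ with $N$ standard normal. The Python sketch below evaluates it with the standard library only; the function name is hypothetical, and the formula covers only the cases $d = 1$ (LM and CLR) or $k = 1$ (AR).

```python
from statistics import NormalDist


def local_power_one_df(eta, alpha=0.05):
    """P[chi2_1(eta^2) > r_{1-alpha}(chi2_1)], using chi2_1(eta^2) = (N(0,1) + eta)^2."""
    std_normal = NormalDist()
    crit = std_normal.inv_cdf(1 - alpha / 2)   # square root of the chi2_1 critical value
    return (1 - std_normal.cdf(crit - eta)) + std_normal.cdf(-crit - eta)


# Local power at a few values of eta; eta = 0 corresponds to the null.
powers = {eta: local_power_one_df(eta) for eta in (0.0, 1.0, 2.0, 3.0)}
```

At $\eta = 0$ the expression returns the nominal level $\alpha$, and it increases with $|\eta|$, which is the sense in which the permutation tests and their asymptotic counterparts share the same local power curve under strong identification.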

3 Simulations

This section presents simulation evidence on the performance of the proposed permutation tests. The data were generated according to

\[ y_i = Y_i\theta + \gamma_1 + X_{2i}'\gamma_2 + u_i, \]

\[ Y_i = W_i'\Gamma + \psi_1 + X_{2i}'\Psi_2 + V_i, \quad i = 1, \dots, n, \]

where $\theta = 0$, $d = 1$, $\Gamma = (1, \dots, 1)'\sqrt{\lambda/(nk)}$ ($k \times 1$) and $\lambda \in \{0.1, 4, 20\}$, $X_{2i}$ is a $(p - 1) \times 1$ vector of non-constant included exogenous variables with $p \in \{1, 5\}$, and $V_i = \rho u_i + \sqrt{1 - \rho^2}\,\varepsilon_i$ with $\rho = 0.5$. The parameters $\gamma_1$, $\gamma_2$, $\psi_1$ and $\Psi_2$ are set equal to 0. As specified below, $W_i$ has zero mean and unit covariance matrix except for the first part of the simulations in Section 3.1, hence $\lambda = n\Gamma'E[W_i W_i']\Gamma \approx \Gamma'W'W\Gamma$. The values $\lambda \in \{0.1, 4, 20\}$ correspond to very weak, weak and strong identification, respectively. The tested restriction is $H_0 : \theta = \theta_0 = 0$.
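A minimal Python sketch of this data-generating process is given below for the case $p = 1$ (intercept only, with $\gamma_1 = \psi_1 = 0$). The function name is hypothetical, and drawing $W_i$, $u_i$ and $\varepsilon_i$ as independent standard normals is an assumption of the sketch that corresponds to only one of the designs considered in this section.

```python
import math
import random


def simulate_design(n, k, lam, rho=0.5, theta=0.0, seed=0):
    """Draw (y, Y, W) from the design with p = 1 and standard normal inputs."""
    rng = random.Random(seed)
    gamma = math.sqrt(lam / (n * k))          # common entry of the k x 1 vector Gamma
    y, Y, W = [], [], []
    for _ in range(n):
        w_i = [rng.gauss(0.0, 1.0) for _ in range(k)]
        u_i = rng.gauss(0.0, 1.0)
        eps_i = rng.gauss(0.0, 1.0)
        v_i = rho * u_i + math.sqrt(1.0 - rho ** 2) * eps_i
        Y_i = gamma * sum(w_i) + v_i          # W_i' Gamma + V_i (psi_1 = 0)
        y.append(Y_i * theta + u_i)           # gamma_1 = 0
        Y.append(Y_i)
        W.append(w_i)
    return y, Y, W, gamma


y, Y, W, gamma = simulate_design(n=100, k=5, lam=4.0)
concentration = 100 * 5 * gamma ** 2          # n * Gamma' I_k Gamma, equals lam up to rounding
```

Scaling $\Gamma$ by $\sqrt{\lambda/(nk)}$ keeps $n\Gamma'\Gamma = \lambda$ fixed as $n$ and $k$ change, so $\lambda$ alone controls the strength of identification.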

We implement the heteroskedasticity-robust asymptotic tests denoted as AR, LM, CLRa, CLRb, their permutation versions PAR1, PAR2, PLM, PCLRa, PCLRb, the normal score and Wilcoxon score rank-based AR tests of Andrews and Marmer (2008) denoted as RARn and RARw, respectively, the normal score and Wilcoxon score rank-based CLR statistics of Andrews and Soares (2007) denoted as RCLRn and RCLRw, respectively, and the wild bootstrap AR and LM tests of Davidson and MacKinnon (2012) denoted as WAR and WLM, respectively. For both the asymptotic and permutation CLR-type tests, we do not make the eigenvalue-adjustment (i.e., ε = 0 in footnote 2). In all simulations, the rejection probabilities of the tests (including CLR, PCLR, the rank-based and the wild bootstrap tests considered below) are computed using 2000 replications with 999 simulated samples for each replication. We consider the following three cases in turn.

3.1 Heavy tails

To examine the finite sample validity, we consider heavy-tailed observations where each element of the random vector $(W_i', X_{2i}', u_i, \varepsilon_i)'$ is drawn independently from the standard Cauchy distribution, and set $k \in \{5, 10\}$, $n \in \{50, 100\}$, and $\lambda = 4$.

Table 1 presents the null rejection probabilities. The asymptotic tests all underreject. The PAR1 test has nearly correct levels in all cases, as predicted by Proposition 2.1, and the PAR2 test, which is exact only when p = 1, has more accurate rejection rates in the case p = 1 than in the case p = 5. The RARn, RARw, RCLRn and RCLRw tests are all robust against heavy-tailed errors when the IVs, the exogenous covariates and the error terms are independent, as stated in Assumption 1. The RARn and RARw tests are exact, and this is borne out in the simulation results. The RCLRn and RCLRw tests slightly underreject, which may be attributed to their asymptotic nature.

Note, however, that, under the independence assumption and heavy-tailed errors, the rank-based

tests have better power properties than the asymptotic tests, see Andrews and Soares (2007) and

Andrews and Marmer (2008). In such cases, the permutation tests are likely to be dominated by the

rank-based tests in terms of power given the local asymptotic equivalence between the permutation

tests and the asymptotic tests under strong identification established in Proposition 2.3.

3.2 Independence vs. standard exclusion restriction

The next set of simulations examines the role of studentization for obtaining valid permutation tests and highlights the difference between the independence and the standard exclusion restriction assumptions in a homoskedastic setting. Two cases are considered:

1. $(W_i', X_{2i}', u_i, \varepsilon_i)' \sim t_5[0, I_{k+p+1}]$, where $t_5[0, I_{k+p+1}]$ stands for the $(k + p + 1)$-variate t-distribution with 5 degrees of freedom and covariance matrix $I_{k+p+1}$. Under this setting, these random variables have finite fourth moments, and $(W_i', X_{2i}')'$ satisfies the standard exclusion restriction in Assumption 2: $E[(u_i, V_i')'(W_i', X_i')] = 0$, but $(u_i, \varepsilon_i)'$ and $(W_i', X_i')'$ are dependent;3

2. $(W_i', X_{2i}', u_i, \varepsilon_i)' \sim N[0, I_{k+p+1}]$. Clearly, in this case the instruments and the error terms are independent.

For the just-identified design with k = 1, we additionally consider a non-studentized randomization test, labelled as PNS, that rejects when $W'u(\theta_0)$, with $u(\theta_0) = M_{\iota}(y - Y\theta_0)$, is below the 0.025 or above the 0.975 quantile of the permutation distribution of $W'u_{\pi}(\theta_0)$.
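The non-studentized PNS test can be sketched as below for a scalar instrument. The function names are hypothetical; the quantile levels are passed as parameters, with defaults 0.025 and 0.975 chosen here as an assumption so that the test is equal-tailed at the 5% level, and a fixed number of random permutations stands in for the full permutation distribution.

```python
import math
import random


def empirical_quantile(sorted_values, q):
    """Smallest sorted value whose empirical CDF is at least q."""
    index = max(0, math.ceil(q * len(sorted_values)) - 1)
    return sorted_values[index]


def pns_test(y, Y, W, theta0, lower=0.025, upper=0.975, draws=999, seed=0):
    """Reject when W'u(theta0) falls outside the permutation quantile band."""
    rng = random.Random(seed)
    resid = [yi - Yi * theta0 for yi, Yi in zip(y, Y)]
    mean = sum(resid) / len(resid)
    resid = [e - mean for e in resid]                     # M_iota (y - Y theta0)
    observed = sum(wi * ei for wi, ei in zip(W, resid))
    perm_stats = []
    for _ in range(draws):
        shuffled = resid[:]
        rng.shuffle(shuffled)                             # u_pi(theta0)
        perm_stats.append(sum(wi * ei for wi, ei in zip(W, shuffled)))
    perm_stats.sort()
    low = empirical_quantile(perm_stats, lower)
    high = empirical_quantile(perm_stats, upper)
    return observed < low or observed > high


# Toy check (hypothetical data): residuals perfectly aligned with the instrument.
W = [float(i) for i in range(30)]
reject = pns_test(y=W, Y=[0.0] * 30, W=W, theta0=0.0)
```

Unlike the PAR statistics, the inner product here is not divided by a variance estimate, so its permutation distribution need not match the sampling distribution when the instruments and errors are merely uncorrelated rather than independent.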

Table 2 reports the empirical levels of the tests in the case with k = 1 across different sample sizes. When the error terms and the instruments satisfy the exogeneity condition but are dependent, the PNS test overrejects; somewhat unexpectedly, the overrejection becomes more severe as the sample size grows. The rank-type tests also show some size distortions, which do not improve as the sample

3 Because the distribution is fixed, i.e., does not vary with the sample size, the permutation tests remain asymptotically valid under the finite fourth moment assumption.

size increases. The right panel of Table 2 shows that when the instruments and the error terms are independent, the empirical levels of the asymptotic, permutation and rank-based CLR-type tests are close to the nominal significance level.

The result for the overidentified models with n = 100 and k = 5 is reported in Table 3. The

permutation tests perform better than the asymptotic tests regardless of whether the instruments

and the error terms are independent or not. The rank-based tests, displayed in the bottom part of

Table 3, have nearly correct level in the independent case but overreject in the dependent case.

3.3 Conditional heteroskedasticity

The next part of the simulations considers designs with conditional heteroskedasticity. The data are generated according to (2.1) and (2.2) with

$$u_i = W_{i1}\upsilon_i,$$

$$V_i = \rho u_i + \sqrt{1 - \rho^2}\,\varepsilon_i,$$

where $W_{i1}$ is the first element of $W_i$, and, similarly to the previous case,

1. $(W_i', X_{2i}', \upsilon_i, \varepsilon_i)' \sim t_5[0, I_{k+p+1}]$;

2. $(W_i', X_{2i}', \upsilon_i, \varepsilon_i)' \sim N[0, I_{k+p+1}]$.
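The error structure above can be sketched for the Gaussian case as follows. This is an illustration under stated assumptions: the value $\rho = 0.5$, the function names and the seed are hypothetical and are not taken from the paper, and only the first instrument and the two error terms are simulated.

```python
import math
import random


def simulate_errors(n, rho=0.5, seed=0):
    """Simulate (W_i1, u_i, V_i) with u_i = W_i1 * upsilon_i (Gaussian case)."""
    rng = random.Random(seed)
    w1, u, v = [], [], []
    for _ in range(n):
        w_i = rng.gauss(0.0, 1.0)
        upsilon_i = rng.gauss(0.0, 1.0)
        eps_i = rng.gauss(0.0, 1.0)
        u_i = w_i * upsilon_i                                # Var(u_i | W_i1) = W_i1^2
        v_i = rho * u_i + math.sqrt(1.0 - rho ** 2) * eps_i  # corr(u_i, V_i) = rho
        w1.append(w_i)
        u.append(u_i)
        v.append(v_i)
    return w1, u, v


def corr(a, b):
    """Sample correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)
```

In this design the instrument remains orthogonal to the structural error, since $E[W_{i1}u_i] = E[W_{i1}^2]E[\upsilon_i] = 0$, while the squared error is correlated with the squared instrument.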

The sample size is n = 100 and the number of instruments is k ∈ {2, 5, 10}. The results are displayed in Tables 4-6. The asymptotic AR test tends to underreject and performs poorly compared to the LM test. The permutation tests perform better than their asymptotic counterparts in most cases, and on par with the wild bootstrap in the independent case. However, when the instruments satisfy the standard exogeneity condition, and there are included exogenous variables, the permutation tests appear to have an edge over the wild bootstrap AR test, which overrejects. Qualitatively, the observations made for the rank-based tests in the homoskedastic case carry over to the heteroskedastic case: the rank-based tests overreject by a substantial margin in the dependent case.

4 Empirical applications

4.1 Becker and Woessmann (2010)

Becker and Woessmann (2010) study the effect of Protestantism on education before the Industrialization using counties- and towns-level data from 1816 Prussia. To unravel the direction

Table 1: Null rejection probabilities at 5% level. Cauchy distributed IV’s, controls and error terms.

λ = 4 n = 50 n = 100

p = 1 p = 5 p = 1 p = 5

k = 5 k = 10 k = 5 k = 10 k = 5 k = 10 k = 5 k = 10

AR 0.95 0.55 0.85 0.25 0.50 1.05 0.25 0.40

LM 2.35 4.25 2.05 3.10 2.10 4.45 1.85 3.45

CLRa 1.45 1.70 1.15 0.75 0.75 1.95 0.55 1.30

CLRb 2.10 3.60 1.60 2.60 1.75 3.85 1.25 3.25

PAR1 4.40 5.30 4.70 5.60 4.05 4.95 4.45 4.70

PAR2 4.40 5.45 3.40 3.50 3.95 4.75 3.25 3.05

RARn 4.25 4.55 4.10 4.15 5.10 4.05 4.90 3.75

RARw 4.10 4.85 4.30 4.55 5.15 3.95 4.55 3.60

RCLRn 3.15 3.75 4.20 5.10 4.55 3.50 3.85 4.00

RCLRw 2.35 3.25 3.35 4.25 3.50 2.80 3.25 3.45

Note: AR, LM, and CLR are the heteroskedasticity-robust asymptotic tests, PAR1 and PAR2 denote the robust

permutation AR tests, RARn and RARw denote the normal score and Wilcoxon score rank-based AR tests of Andrews

and Marmer (2008), and RCLRn and RCLRw denote the normal score and Wilcoxon score rank-based CLR statistics

of Andrews and Soares (2007) respectively. The number of replications is 2000, and the number of permutation

samples for each replication is N = 999.


Table 2: Null rejection probabilities at 5% level. Just-identified model with k = p = 1.

λ = 4 t5[0, Ik+p+1] N [0, Ik+p+1]

n 50 100 200 400 50 100 200 400

AR 3.75 5.00 4.55 4.25 4.60 5.50 4.85 5.80

LM 3.75 5.00 4.55 4.25 4.60 5.50 4.85 5.80

CLRa 3.55 4.75 4.75 4.30 4.55 5.50 4.70 5.65

CLRb 3.55 4.75 4.75 4.30 4.55 5.50 4.70 5.65

PAR1 4.50 5.05 4.75 4.40 4.75 5.70 5.05 5.85

PAR2 4.25 5.40 4.90 4.30 4.75 5.65 4.90 5.90

PLM 4.25 5.40 4.90 4.30 4.75 5.65 4.90 5.90

PCLRa 4.25 5.40 4.90 4.30 4.75 5.65 4.90 5.90

PCLRb 4.60 5.40 4.90 4.30 4.75 5.65 4.90 5.90

RARn 10.35 11.15 12.35 12.00 4.90 5.45 4.90 6.05

RARw 7.65 9.30 8.70 8.35 4.60 5.35 4.85 5.65

RCLRn 8.10 9.65 11.40 11.65 3.45 4.45 4.25 5.50

RCLRw 7.10 8.95 8.65 8.10 4.55 5.20 5.05 5.75

WAR 4.65 6.00 5.35 4.80 4.40 5.45 4.85 5.60

WLM 4.65 6.00 5.35 4.80 4.40 5.45 4.85 5.60

PNS 15.30 16.90 17.80 19.40 5.20 6.00 5.55 6.20

Note: In the left panel, (W ′i , ui, εi)′ ∼ t5[0, Ik+p+1] and the single instrument Wi and the structural error term ui are

orthogonal but dependent. In the right panel, (W ′i , ui, εi)′ ∼ N [0, Ik+p+1] and Wi and ui are independent. PLM and

PCLR denote the robust permutation LM and CLR tests, WAR and WLM are the wild bootstrap AR and LM tests

of Davidson and MacKinnon (2012), PNS denotes the non-studentized permutation test, and the remaining tests are

as in Table 1.


Table 3: Null rejection probabilities at 5% level. Over-identified homoskedastic model with k = 5,

n = 100.

λ = 0.1 λ = 4 λ = 20

n = 100 t5[0, Ik+p+1] N [0, Ik+p+1] t5[0, Ik+p+1] N [0, Ik+p+1] t5[0, Ik+p+1] N [0, Ik+p+1]

k = 5 p = 1 p = 5 p = 1 p = 5 p = 1 p = 5 p = 1 p = 5 p = 1 p = 5 p = 1 p = 5

AR 3.15 4.50 4.50 4.25 3.15 4.50 4.50 4.25 3.15 4.50 4.50 4.25

LM 2.90 4.30 4.15 4.25 3.20 4.05 3.70 3.70 3.00 4.10 3.25 3.85

CLRa 2.90 3.95 4.00 4.05 3.20 4.15 3.30 3.55 2.95 4.25 3.45 3.75

CLRb 2.80 3.95 4.05 4.05 3.20 4.10 3.35 3.45 2.55 4.00 3.30 3.95

PAR1 5.00 5.55 5.45 4.75 5.00 5.55 5.45 4.75 5.00 5.55 5.45 4.75

PAR2 5.05 6.85 5.60 5.45 5.05 6.85 5.60 5.45 5.05 6.85 5.60 5.45

PLM 5.10 5.60 5.40 5.15 5.00 6.05 5.05 5.05 4.70 6.15 4.95 5.45

PCLRa 4.80 6.25 5.20 4.85 4.30 5.50 4.20 4.30 4.25 5.20 3.70 4.15

PCLRb 6.20 7.25 5.35 5.10 6.30 6.70 4.65 4.75 5.80 7.00 4.30 5.00

RARn 18.65 15.70 5.25 4.45 18.65 15.70 5.25 4.45 18.65 15.70 5.25 4.45

RARw 12.40 10.20 5.35 4.10 12.40 10.20 5.35 4.10 12.40 10.20 5.35 4.10

RCLRn 13.85 14.55 3.35 3.70 13.15 14.50 3.15 3.75 11.10 12.65 4.00 3.50

RCLRw 10.40 12.90 4.45 4.55 10.20 11.90 4.20 4.65 9.15 10.90 4.55 4.35

WAR 5.30 7.45 5.35 4.10 5.30 7.45 5.35 4.10 5.30 7.45 5.35 4.10

WLM 4.65 5.85 4.95 4.80 4.20 6.40 4.40 4.05 4.50 5.95 4.30 4.35

Note: t5[0, Ik+p+1] and N [0, Ik+p+1] correspond to (W ′i , X′2i, ui, εi)

′ ∼ t5[0, Ik+p+1] (dependent but uncorrelated) and

(W ′i , X′2i, ui, εi)

′ ∼ N [0, Ik+p+1] (independent), respectively. The remaining part is as in Tables 1 and 2.


Table 4: Null rejection probabilities at 5% level. Conditionally heteroskedastic design with weak

identification (λ = 0.1).

n = 100 p = 1 p = 5

t5[0, Ik+p+1] N [0, Ik+p+1] t5[0, Ik+p+1] N [0, Ik+p+1]

Tests k = 2 k = 5 k = 10 k = 2 k = 5 k = 10 k = 2 k = 5 k = 10 k = 2 k = 5 k = 10

AR 2.35 1.75 0.60 4.05 2.90 1.40 4.30 2.40 1.15 5.25 3.25 2.60

LM 2.95 4.35 3.55 4.90 4.30 4.30 5.25 3.35 2.85 6.00 4.30 3.85

CLRa 2.40 2.05 0.80 4.05 3.65 2.00 4.10 2.40 1.35 5.30 3.30 3.00

CLRb 2.45 2.25 1.25 4.10 3.60 2.20 4.25 2.60 1.25 5.25 3.45 3.00

PAR1 3.60 5.15 4.35 4.85 5.60 5.20 7.40 7.55 8.85 5.95 4.70 4.70

PAR2 3.35 4.85 4.35 4.90 5.30 5.00 6.35 5.50 4.95 6.40 5.30 5.35

PLM 4.45 5.95 6.20 5.45 5.80 6.35 6.80 4.65 5.05 6.50 5.60 5.95

PCLRa 3.60 5.20 4.80 5.50 5.45 4.35 6.65 4.70 4.10 6.35 5.25 5.70

PCLRb 3.65 6.40 8.45 5.50 5.65 5.15 6.80 5.20 7.55 6.35 5.50 6.85

RARn 22.30 27.30 32.55 13.45 10.20 8.05 23.55 24.80 29.85 13.75 10.10 8.70

RARw 14.35 17.90 20.10 10.40 8.85 6.65 15.95 16.00 17.55 10.40 8.25 7.35

RCLRn 19.75 25.70 32.30 11.40 8.30 6.40 32.25 33.75 44.75 13.20 8.75 7.85

RCLRw 14.35 18.05 21.95 9.75 8.20 6.60 27.00 27.55 38.65 11.15 8.55 8.60

WAR 4.50 5.80 5.55 4.75 5.20 4.90 8.65 10.80 12.60 5.75 4.40 4.95

WLM 4.65 3.90 3.95 5.65 4.20 6.30 7.00 6.80 7.00 5.25 4.30 5.45

Note: As in Table 3.


Table 5: Null rejection probabilities at 5% level. Conditionally heteroskedastic design with weak

identification (λ = 4).

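n = 100 p = 1 p = 5

t5[0, Ik+p+1] N [0, Ik+p+1] t5[0, Ik+p+1] N [0, Ik+p+1]

Tests k = 2 k = 5 k = 10 k = 2 k = 5 k = 10 k = 2 k = 5 k = 10 k = 2 k = 5 k = 10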



AR 2.35 1.75 0.60 4.05 2.90 1.40 4.30 2.40 1.15 5.25 3.25 2.60

LM 3.10 3.95 3.70 4.55 3.85 3.80 4.50 2.80 2.50 6.00 3.60 4.00

CLRa 2.45 2.10 1.00 4.15 3.20 1.55 4.25 2.55 1.25 5.70 2.85 3.15

CLRb 2.40 2.25 1.50 4.10 3.15 1.85 4.15 2.45 1.30 5.65 3.00 3.45

PAR1 3.60 5.15 4.35 4.85 5.60 5.20 7.40 7.55 8.85 5.95 4.70 4.70

PAR2 3.35 4.85 4.35 4.90 5.30 5.00 6.35 5.50 4.95 6.40 5.30 5.35

PLM 4.20 5.35 5.80 5.15 4.90 5.05 6.20 3.90 4.60 6.30 4.45 6.10

PCLRa 3.70 5.55 4.15 4.90 4.70 4.05 6.05 4.65 3.90 6.45 4.55 5.65

PCLRb 3.80 6.20 8.75 5.05 5.20 5.40 6.20 5.80 7.85 6.85 5.05 7.10

RARn 22.30 27.30 32.55 13.45 10.20 8.05 23.55 24.80 29.85 13.75 10.10 8.70

RARw 14.35 17.90 20.10 10.40 8.85 6.65 15.95 16.00 17.55 10.40 8.25 7.35

RCLRn 19.45 22.40 29.95 11.05 7.50 6.05 29.25 31.25 41.50 11.85 8.50 7.15

RCLRw 13.70 17.50 19.55 9.40 7.45 6.35 25.00 26.20 36.00 10.10 8.25 7.55

WAR 4.50 5.80 5.55 4.75 5.20 4.90 8.65 10.80 12.60 5.75 4.40 4.95

WLM 3.20 3.50 4.15 4.00 4.05 4.90 6.60 5.85 7.00 5.55 3.90 5.30



Table 6: Null rejection probabilities at 5% level. Conditionally heteroskedastic design with strong

identification (λ = 20).

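n = 100 p = 1 p = 5

t5[0, Ik+p+1] N [0, Ik+p+1] t5[0, Ik+p+1] N [0, Ik+p+1]

Tests k = 2 k = 5 k = 10 k = 2 k = 5 k = 10 k = 2 k = 5 k = 10 k = 2 k = 5 k = 10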



AR 2.35 1.75 0.60 4.05 2.90 1.40 4.30 2.40 1.15 5.25 3.25 2.60

LM 3.05 3.20 2.40 4.90 3.00 3.15 4.55 3.10 2.10 5.65 3.95 3.90

CLRa 2.25 2.00 0.95 4.60 2.80 1.85 4.55 2.70 1.45 5.70 3.50 3.20

CLRb 2.45 2.40 1.35 4.65 2.95 2.20 4.70 2.80 1.75 5.80 3.55 3.25

PAR1 3.60 5.15 4.35 4.85 5.60 5.20 7.40 7.55 8.85 5.95 4.70 4.70

PAR2 3.35 4.85 4.35 4.90 5.30 5.00 6.35 5.50 4.95 6.40 5.30 5.35

PLM 4.35 4.35 4.50 5.45 4.20 4.75 5.95 4.40 4.20 6.45 4.95 5.90

PCLRa 4.35 4.55 3.55 4.95 3.40 3.20 5.85 4.10 3.30 6.35 4.10 4.60

PCLRb 4.50 6.00 8.25 5.00 3.90 4.65 6.00 5.35 7.25 6.50 4.90 6.65

RARn 22.30 27.30 32.55 13.45 10.20 8.05 23.55 24.80 29.85 13.75 10.10 8.70

RARw 14.35 17.90 20.10 10.40 8.85 6.65 15.95 16.00 17.55 10.40 8.25 7.35

RCLRn 16.75 17.90 23.45 9.80 7.05 5.85 25.35 24.15 33.65 11.60 6.25 6.95

RCLRw 12.80 13.80 16.75 8.85 7.05 5.95 22.05 21.15 28.75 9.85 6.25 6.90

WAR 4.50 5.80 5.55 4.75 5.20 4.90 8.65 10.80 12.60 5.75 4.40 4.95

WLM 3.60 3.85 3.70 4.90 4.75 4.55 6.15 5.25 7.00 5.80 3.65 4.95



of causation between a better education of Protestants and industrialization in the 1870s, the authors estimate IV regressions where the dependent variable is the number of primary schools per 1000 inhabitants, and the endogenous regressor is the fraction of Protestants, instrumented by the distance to Wittenberg, the city from which Protestantism spread in a roughly concentric manner in Martin Luther’s times.

The authors consider several different specifications which differ in terms of the exogenous

controls used: for the counties-level data,

• Specification 1: constant;

• Specification 2: latitude, longitude, a product of latitude and longitude, percentage of popu-

lation living in towns;

• Specification 3: latitude, longitude, a product of latitude and longitude, percentage of pop-

ulation living in towns, percentage of population younger than 15 years of age, percentage

of population older than 60 years of age, number of horses per capita, number of bulls per

capita, looms per capita, share of farm laborers in total population and tonnage of transport

ships;

and for the towns-level data,

• Specification 4: latitude, longitude, a product of latitude and longitude, percentage of pop-

ulation younger than 15 years of age, percentage of population older than 60 years of age,

looms per capita, buildings w/massive walls, businesses per capita and retailers per capita.

For further details of the study and the arguments underlying the use of the distance instrument, see Becker and Woessmann (2010).

Since there is a single instrument, all specifications are just-identified. The sample sizes are 293 for the counties-level data and 156 for the towns-level data in the sample for 1816 Prussia. We construct confidence intervals for the coefficient on the share of Protestants using 1999 simulated samples for the asymptotic CLR, wild bootstrap, permutation and rank-based confidence intervals. In all of the specifications, we also include confidence intervals from the homoskedastic two-stage least squares t test, t2sls, and from a non-studentized permutation test, PNS, based on $W'u(\theta_0)$ with $u(\theta_0) = M_X(y - Y\theta_0)$, which rejects when the statistic falls in the lower or upper tail of the permutation distribution of $W'u_\pi(\theta_0)$.
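A confidence set obtained from a permutation test collects the null values $\theta_0$ that the test does not reject. The sketch below illustrates this with the non-studentized statistic for a single instrument and a constant as the only control, so that $M_X$ reduces to demeaning. It is an illustration under simplifying assumptions, not the procedure used to produce the reported intervals: the grid search, the function names and the quantile convention are hypothetical.

```python
import random


def demean(x):
    """Project off the constant: subtract the sample mean."""
    m = sum(x) / len(x)
    return [v - m for v in x]


def pns_rejects(w, resid, n_perm=199, alpha=0.05, seed=0):
    """Two-sided non-studentized permutation test based on W'u."""
    rng = random.Random(seed)
    u = demean(resid)
    observed = sum(wi * ui for wi, ui in zip(w, u))
    u_perm = list(u)
    stats = []
    for _ in range(n_perm):
        rng.shuffle(u_perm)
        stats.append(sum(wi * ui for wi, ui in zip(w, u_perm)))
    stats.sort()
    lower = stats[max(round(alpha / 2 * (n_perm + 1)) - 1, 0)]
    upper = stats[min(round((1 - alpha / 2) * (n_perm + 1)) - 1, n_perm - 1)]
    return observed < lower or observed > upper


def confidence_set(y, Y, w, grid, rejects=pns_rejects):
    """Return the grid values theta0 at which the test does not reject."""
    kept = []
    for theta0 in grid:
        resid = [yi - theta0 * Yi for yi, Yi in zip(y, Y)]
        if not rejects(w, resid):
            kept.append(theta0)
    return kept
```

The end points of the retained grid values give an interval when the retained set is connected; with weak instruments the set can be unbounded, in which case a finite grid only reveals that no value is rejected.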

The 95% confidence intervals and their lengths based on inverting different tests are reported in Table 7. In all of the specifications, the heteroskedasticity-robust asymptotic, permutation and wild bootstrap confidence intervals are nearly identical. Interestingly, they are shorter than the homoskedastic AR, RARn and RARw confidence intervals. The RCLRn and RCLRw confidence intervals are comparable to the permutation confidence intervals for the counties-level data, and shorter for the towns-level data; however, these tests do not guard against heteroskedasticity. The Breusch-Pagan tests for heteroskedasticity in the reduced-form regressions of the dependent variable and the endogenous regressor on the instrument and the exogenous covariates have p-values 0.01366 and $2.22\times10^{-16}$ for Specification 1, $1.0916\times10^{-6}$ and 0.0033 for Specification 2, $5.3147\times10^{-7}$ and $6.8536\times10^{-6}$ for Specification 3, and 0.14162 and $7.0197\times10^{-5}$ for Specification 4. There is thus strong evidence of heteroskedasticity, and so the heteroskedasticity-robust methods should be more reliable. The non-studentized permutation test, which is exact under independence (Assumption 2.1) and does not account for conditional heteroskedasticity, yields confidence intervals considerably wider than those based on the robust permutation tests.
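For readers unfamiliar with the diagnostic, a Breusch-Pagan type p-value based on fitted values can be sketched as follows. The exact variant used for the reported p-values is not stated here, so this sketch assumes the studentized form: the squared residuals are regressed on a constant and the fitted values, and $nR^2$ is compared with a $\chi^2_1$ distribution. The function name is hypothetical.

```python
import math


def breusch_pagan_pvalue(resid, fitted):
    """Studentized Breusch-Pagan p-value using fitted values as the regressor.

    Assumes LM = n * R^2 from regressing squared residuals on a constant
    and the fitted values, with a chi-square(1) reference distribution.
    """
    n = len(resid)
    sq = [e * e for e in resid]
    mean_sq = sum(sq) / n
    mean_f = sum(fitted) / n
    sxx = sum((f - mean_f) ** 2 for f in fitted)
    syy = sum((s - mean_sq) ** 2 for s in sq)
    sxy = sum((f - mean_f) * (s - mean_sq) for f, s in zip(fitted, sq))
    if sxx == 0.0 or syy == 0.0:
        # No variation to explain: no evidence against homoskedasticity.
        return 1.0
    r_squared = sxy * sxy / (sxx * syy)
    lm = n * r_squared
    # Survival function of chi-square(1): P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(lm / 2.0))
```

A small p-value indicates that the residual variance varies with the fitted values, which is the pattern the heteroskedasticity-robust statistics are designed to accommodate.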

Using the heteroskedasticity and identification-robust confidence intervals, we confirm the find-

ings of Becker and Woessmann (2010) that there is a significant effect of the share of Protestants

on the school supply in the counties-level data.

4.2 Bazzi and Clemens (2013)

Bazzi and Clemens (2013) reexamine cross-sectional IV regression results on the effect of foreign aid on economic growth from several published studies. The variables in the “aid-growth” IV regressions estimated by Rajan and Subramanian (2008) and Bazzi and Clemens (2013) are

• yi : the average annual growth of per capita GDP;

• Yi : the foreign-aid receipts to GDP ratio (Aid/GDP);

• Wi1 : a variable constructed from aid-recipient population size, aid-donor population size,

colonial relationship, and language traits (see Appendix A of Bazzi and Clemens (2013));

• Xi : constant, initial per capita GDP, initial level of policy, initial level of life expectancy,

geography, institutional quality, initial inflation, initial M2/GDP, initial budget balance/GDP,

Revolutions, and Ethnic fractionalization.

The data cover the period 1970-2000 and the sample size is n = 78. Bazzi and Clemens (2013)

consider the following four specifications (Table 1, Columns 1-4 therein):

• Specification 1: The baseline specification of Rajan and Subramanian (2008) (Table 4, Column

2) with the variables as above;


Table 7: Confidence intervals for the coefficient on the share of Protestants in Becker and Woessmann

(2010) data.

Counties, n = 293 Towns, n = 156

Spec. 1 Spec. 2 Spec. 3 Spec. 4

Test statistics 95% CI Length 95% CI Length 95% CI Length 95% CI Length

t2sls [0.24, 1.27] 1.03 [0.94, 1.80] 0.86 [0.98, 1.94] 0.96 [−0.48, 1.03] 1.51

Hom. AR [0.21, 1.25] 1.04 [0.95, 1.80] 0.85 [0.98, 1.95] 0.97 [−0.48, 1.05] 1.53

RARn [0.35, 1.28] 0.93 [0.83, 2.04] 1.21 [0.86, 2.14] 1.28 [−0.49, 1.06] 1.55

RARw [0.46, 1.31] 0.85 [0.79, 1.81] 1.02 [0.81, 1.97] 1.16 [−0.51, 0.98] 1.49

RCLRn [0.32, 1.28] 0.96 [0.99, 1.78] 0.79 [1.07, 1.89] 0.82 [−0.29, 0.73] 0.93

RCLRw [0.45, 1.32] 0.87 [1.03, 1.68] 0.65 [1.11, 1.83] 0.72 [−0.26, 0.68] 0.94

AR [0.27, 1.22] 0.95 [1.07, 1.69] 0.62 [1.10, 1.86] 0.76 [−0.51, 0.97] 1.48

PAR1 [0.28, 1.21] 0.93 [1.07, 1.69] 0.62 [1.07, 1.87] 0.80 [−0.53, 0.98] 1.51

PAR2 [0.27, 1.23] 0.96 [1.07, 1.70] 0.63 [1.10, 1.86] 0.76 [−0.51, 0.95] 1.46

PNS [0.22, 1.25] 1.03 [0.77, 2.03] 1.26 [0.72, 2.22] 1.50 [−0.89, 1.59] 2.48

WAR [0.25, 1.23] 0.98 [1.07, 1.69] 0.62 [1.10, 1.88] 0.78 [−0.56, 0.98] 1.54

F -statistic 80.78 174.65 143.78 79.50

Breusch-Pagan p-values 0.01, 0.00 0.00, 0.00 0.00, 0.00 0.14, 0.00

Note: The number of permutation samples is N = 1999. t2sls and Hom. AR are the homoskedastic two-stage least

squares t and AR tests, PNS denotes the non-studentized permutation test, and the remaining tests are as in Tables

1 and 2. The Breusch-Pagan test p-values are computed using the fitted values in the reduced-form regressions of yi

and Yi on the instrument and exogenous covariates respectively.


• Specification 2: log population is included in the second stage of Specification 1;

• Specification 3: Wi2 = log population replaces the instrument Wi1 in Specification 1;

• Specification 4: An instrument, Wi3, constructed from the colonial ties indicators only replaces

Wi1 in Specification 1.

Specifications 1-4 are just-identified as there is a single instrument for the single endogenous regres-

sor, hence we only consider the AR and PAR test results. In addition, we consider Specifications

5-7 with IV’s (Wi1,Wi2)′, (Wi2,Wi3)′ and (Wi1,Wi2,Wi3)′, respectively, to examine whether the

over-identified specifications could have any instrumentation power. Wi2 is included in the latter

three cases because it is the strongest instrument as reported in Bazzi and Clemens (2013). We also

include the confidence interval based on the two-stage least squares t-test, denoted as t2sls.

Tables 8 and 9 report the 95% confidence intervals for the coefficient on Aid/GDP. In all specifications, the Breusch-Pagan test p-values from the two separate reduced-form regressions of the endogenous variables on the exogenous variables show evidence of heteroskedasticity.

The results for Specifications 1-4 in Table 8 qualitatively agree with the homoskedastic CLR

confidence intervals reported in Bazzi and Clemens (2013). The permutation confidence intervals

are nearly identical but slightly shorter than the asymptotic and wild bootstrap confidence intervals

in Specifications 1 and 3. The infinite confidence intervals in Specifications 2 and 4 that do not use

log population or the instruments based on it indicate that the population size instrument used in

the other specifications has indeed the most identifying power.

The above results also extend to Table 9. Some noteworthy findings are as follows. The WAR confidence intervals are wider than the AR and PAR confidence intervals. The PLM confidence intervals are shorter than the LM and WLM confidence intervals, but still wider than the other confidence intervals. Somewhat surprisingly, in Specification 6, the AR, CLRa, PAR and PCLR confidence intervals exclude 0, thereby pointing towards a borderline significant effect of foreign aid on growth despite the rather small value of the first-stage F-statistic, 17.90. Also, in Specification 7, the PCLR confidence intervals exclude 0 while the opposite holds for the asymptotic CLR confidence intervals, which underscores the utility of the proposed tests.

However, these results should be interpreted with caution because the population size might affect growth through multiple channels, as argued by Bazzi and Clemens (2013). Overall, the uncertainty regarding the effect of foreign aid in the sample data may be expressed more accurately by the permutation and the wild bootstrap confidence intervals, as the unobserved confounders in the “aid-growth” regression are more likely to be orthogonal to, rather than independent of, the population size.

Table 8: Confidence intervals for the coefficient on Aid/GDP in Rajan and Subramanian (2008)

and Bazzi and Clemens (2013) data.

Spec. 1 Spec. 2 Spec. 3 Spec. 4

Test statistics 95% CI Length 95% CI 95% CI Length 95% CI

t2sls [−0.05, 0.24] 0.29 [−5.14, 6.96] [−0.06, 0.21] 0.27 [−973.38, 941.49]

Hom. AR [−0.02, 0.28] 0.30 (−∞,∞) [−0.03, 0.25] 0.28 (−∞,∞)

RARn [−0.05, 0.29] 0.34 (−∞,∞) [−0.06, 0.28] 0.34 (−∞,∞)

RARw [−0.06, 0.20] 0.26 (−∞,∞) [−0.07, 0.16] 0.23 (−∞,∞)

RCLRn [−0.03, 0.26] 0.29 (−∞,∞) [−0.04, 0.22] 0.26 (−∞,∞)

RCLRw [−0.04, 0.18] 0.22 (−∞,∞) [−0.05, 0.16] 0.21 (−∞,∞)

AR [−0.01, 0.29] 0.30 (−∞,∞) [−0.02, 0.25] 0.27 (−∞,∞)

PAR1 [−0.01, 0.30] 0.31 (−∞,∞) [−0.03, 0.27] 0.30 (−∞,∞)

PAR2 [−0.01, 0.28] 0.29 (−∞,∞) [−0.02, 0.25] 0.27 (−∞,∞)

PNS [−0.04, 0.37] 0.41 (−∞,∞) [−0.05, 0.30] 0.35 (−∞,∞)

WAR [−0.02, 0.32] 0.34 (−∞,∞) [−0.03, 0.28] 0.31 (−∞,∞)

F -statistic 31.63 0.13 36.30 0.00

Breusch-Pagan p-values 0.03, 0.00 0.03, 0.00 0.02, 0.00 0.01, 0.00

Note: The number of permutation samples is N = 1999. The sample size is 78. Specifications 1-4 are just-identified.

t2sls and Hom. AR are the homoskedastic two-stage least squares t and AR tests, PNS denotes the non-studentized

permutation test, and the remaining tests are as in Tables 1 and 2. The Breusch-Pagan test p-values are computed

using the fitted values in the reduced-form regressions of yi and Yi on the instrument and exogenous covariates

respectively.

5 Conclusion

This paper studies randomization inference in the linear IV model. The proposed permutation tests

are analogs of the linear IV identification-robust tests, and share the strengths of the rank/permutation

tests and the wild bootstrap tests. In particular, these tests allow for conditional heteroskedastic-

ity, do not require the instruments to be independently distributed of the error terms unlike most


Table 9: Confidence intervals for the coefficient on Aid/GDP in Rajan and Subramanian (2008)

and Bazzi and Clemens (2013) data.

Spec. 5 Spec. 6 Spec. 7

Test statistics 95% CI Length 95% CI Length 95% CI Length

t2sls [−0.06, 0.22] 0.28 [−0.06, 0.21] 0.27 [−0.06, 0.21] 0.27

Hom. AR [−0.05, 0.31] 0.36 [−0.02, 0.26] 0.28 [−0.04, 0.32] 0.36

Hom. LM [−0.63, 0.26] 0.89 [−0.63, 0.28] 0.91 [−0.63, 0.28] 0.91

Hom. CLR [−0.03, 0.26] 0.29 [−0.03, 0.28] 0.31 [−0.03, 0.28] 0.31

RARn [−0.08, 0.41] 0.49 [−0.06, 0.32] 0.38 [−0.09, 0.40] 0.49

RARw [−0.08, 0.29] 0.37 [−0.06, 0.23] 0.29 [−0.09, 0.27] 0.36

RCLRn [−0.03, 0.24] 0.27 [−0.03, 0.25] 0.28 [−0.03, 0.25] 0.28

RCLRw [−0.05, 0.17] 0.22 [−0.03, 0.17] 0.20 [−0.04, 0.17] 0.21

AR [−0.04, 0.34] 0.38 [0.04, 0.32] 0.28 [0.01, 0.42] 0.41

LM [−0.43, 0.27] 0.70 [−0.61, 0.36] 0.97 [−0.69, 0.35] 1.04

CLRa [−0.03, 0.28] 0.31 [0.02, 0.36] 0.34 [0.00, 0.35] 0.35

CLRb [−0.02, 0.28] 0.30 [0.00, 0.36] 0.36 [−0.03, 0.34] 0.37

PAR1 [−0.04, 0.35] 0.39 [0.04, 0.34] 0.30 [0.01, 0.43] 0.42

PAR2 [−0.04, 0.34] 0.38 [0.04, 0.31] 0.27 [0.02, 0.38] 0.36

PLM [−0.43, 0.25] 0.68 [−0.60, 0.33] 0.93 [−0.65, 0.33] 0.98

PCLRa [−0.03, 0.27] 0.30 [0.02, 0.34] 0.32 [0.01, 0.35] 0.34

PCLRb [−0.02, 0.27] 0.29 [0.02, 0.34] 0.32 [0.01, 0.35] 0.34

PNS [−0.04, 0.39] 0.43 [−0.06, 0.32] 0.38 [−0.04, 0.35] 0.39

WAR [−0.05, 0.44] 0.49 [0.00, 0.33] 0.33 [−0.04, 0.48] 0.52

WLM [−0.64, 0.30] 0.94 [−0.66, 0.30] 0.96 [−0.66, 0.33] 0.99

F -statistic 17.98 17.90 11.86

Sargan p-value 0.42 0.08 0.21

Breusch-Pagan p-values 0.03, 0.00 0.02, 0.00 0.02, 0.00

Note: The number of permutation samples is N = 1999. The sample size is 78. Specifications 5-7 are over-identified.

t2sls, Hom. AR, Hom. LM and Hom. CLR are the homoskedastic two-stage least squares t, AR, LM and CLR

tests, PNS denotes the non-studentized permutation test, and the remaining tests are as in Tables 1 and 2. The

Breusch-Pagan test p-values are computed using the fitted values in the reduced-form regressions of yi and Yi on the

instrument and exogenous covariates respectively.


rank-based or permutation tests currently available in the literature, and could be robust to heavy

tails in certain cases.

We plan to address the important issue of cluster and identification-robust inference in IV models

in another work.


References

Abadie, A., Athey, S., Imbens, G. W. and Wooldridge, J. M. (2020). Sampling-based vs.

Design-based Uncertainty in Regression Analysis. Econometrica, 88 265–296.

Anderson, T. W. and Rubin, H. (1949). Estimation of the Parameters of a Single Equation in

a Complete System of Stochastic Equations. Annals of Mathematical Statistics, 20 46–63.

Andrews, D. W. K. and Guggenberger, P. (2019a). Identification- and Singularity-Robust Inference for Moment Condition Models. Quantitative Economics, 10 1703–1746.

Andrews, D. W. K., Cheng, X. and Guggenberger, P. (2020). Generic Results for Establish-

ing the Asymptotic Size of Confidence Sets and Tests. Journal of Econometrics, 218 496–531.

Andrews, D. W. K. and Guggenberger, P. (2017a). Asymptotic Size of Kleibergen’s LM and

Conditional LR Tests for Moment Condition Models. Econometric Theory, 33 1046–1080.

Andrews, D. W. K. and Guggenberger, P. (2017b). Supplemental material to “Asymptotic Size of Kleibergen’s LM and Conditional LR Tests for Moment Condition Models”. Econometric Theory, 33.

Andrews, D. W. K. and Guggenberger, P. (2019b). Supplemental material to “Identification- and Singularity-Robust Inference for Moment Condition Models”. Quantitative Economics, 10.

Andrews, D. W. K. and Marmer, V. (2008). Exactly Distribution-Free Inference in Instrumental

Variables Regression with Possibly Weak Instruments. Journal of Econometrics, 142 183–200.

Andrews, D. W. K. and Soares, G. (2007). Rank Tests for Instrumental Variables Regression

with Weak Instruments. Econometric Theory, 23 1033–1082.

Andrews, D. W. K. and Stock, J. H. (2007). Inference with Weak Instruments. In Advances

in Econometrics: Proceedings of the Ninth World Congress of the Econometric Society, vol. 3.

Andrews, I., Stock, J. H. and Sun, L. (2019). Weak Instruments in Instrumental Variables

Regression: Theory and Practice. Annual Review of Economics, 11 727–753.

Bazzi, S. and Clemens, M. A. (2013). Blunt Instruments: Avoiding Common Pitfalls in Identi-

fying the Causes of Economic Growth. American Economic Journal: Macroeconomics, 5 152–86.

Becker, S. O. and Woessmann, L. (2010). The Effect of Protestantism on Education before the

Industrialization: Evidence from 1816 Prussia. Economics Letters, 107 224–228.


Canay, I. A. and Kamat, V. (2018). Approximate Permutation Tests and Induced Order Statistics

in the Regression Discontinuity Design. The Review of Economic Studies, 85 1577–1608.

Canay, I. A., Romano, J. P. and Shaikh, A. M. (2017). Randomization Tests under an

Approximate Symmetry Assumption. Econometrica, 85 1013–1030.

Chernozhukov, V., Hansen, C. and Jansson, M. (2009). Finite Sample Inference for Quantile

Regression Models. Journal of Econometrics, 152 93–103.

Chung, E. and Romano, J. P. (2013). Exact and Asymptotically Robust Permutation Tests.

Annals of Statistics, 41 484–507.

Chung, E. and Romano, J. P. (2016). Multivariate and Multiple Permutation Tests. Journal of

Econometrics, 193 76–91.

Davidson, R. and MacKinnon, J. G. (2008). Bootstrap Inference in a Linear Equation Estimated

by Instrumental Variables. The Econometrics Journal, 11 443–477.

Davidson, R. and MacKinnon, J. G. (2012). Wild Bootstrap Tests for IV Regression. Journal

of Business & Economic Statistics, 28 128–144.

DiCiccio, C. J. and Romano, J. P. (2017). Robust Permutation Tests for Correlation and

Regression Coefficients. Journal of the American Statistical Association, 112 1211–1220.

Dufour, J.-M. (2003). Identification, Weak Instruments and Statistical Inference in Econometrics.

Canadian Journal of Economics, 36 767–808.

Dufour, J.-M., Flachaire, E. and Khalaf, L. (2019). Permutation Tests for Comparing

Inequality Measures. Journal of Business & Economic Statistics, 37 457–470.

Freedman, D. and Lane, D. (1983). A Nonstochastic Interpretation of Reported Significance

Levels. Journal of Business & Economic Statistics, 1 292–298.

Ganong, P. and Jager, S. (2018). A Permutation Test for the Regression Kink Design. Journal

of the American Statistical Association, 113 494–504.

Hansen, B. E. (2021). Econometrics. Manuscript.

Hoeffding, W. (1951). A Combinatorial Central Limit Theorem. Annals of Mathematical Statis-

tics, 22 558–566.


Hu, T.-C., Moricz, F. and Taylor, R. L. (1989). Strong Laws of Large Numbers for Arrays of

Rowwise Independent Random Variables. Acta Mathematica Hungarica, 54 153–162.

Imbens, G. W. and Rosenbaum, P. R. (2005). Robust, Accurate Confidence Intervals with a

Weak Instrument: Quarter of Birth and Education. Journal of the Royal Statistical Society:

Series A (Statistics in Society), 168 109–126.

Janssen, A. (1997). Studentized Permutation Tests for Non-IID Hypotheses and the Generalized

Behrens-Fisher Problem. Statistics & Probability letters, 36 9–21.

Kleibergen, F. (2002). Pivotal Statistics for Testing Structural Parameters in Instrumental Vari-

ables Regression. Econometrica, 70 1781–1803.

Lehmann, E. L. and Romano, J. P. (2005). Testing Statistical Hypotheses. 3rd ed. Springer.

Moreira, H. and Moreira, M. J. (2019). Optimal Two-Sided Tests for Instrumental Variables

Regression with Heteroskedastic and Autocorrelated Errors. Journal of Econometrics, 213 398–

433.

Moreira, M. J. (2003). A Conditional Likelihood Ratio Test for Structural Models. Econometrica,

71 1027–1048.

Moreira, M. J., Porter, J. R. and Suarez, G. A. (2009). Bootstrap Validity for the Score

Test When Instruments May be Weak. Journal of Econometrics, 149 52–64.

Motoo, M. (1956). On the Hoeffding’s Combinatorial Central Limit Theorem. Annals of the

Institute of Statistical Mathematics, 8 145–154.

Neubert, K. and Brunner, E. (2007). A Studentized Permutation Test for the Non-Parametric

Behrens–Fisher Problem. Computational Statistics & Data Analysis, 51 5192–5204.

Neuhaus, G. (1993). Conditional Rank Tests for the Two-Sample Problem under Random Cen-

sorship. Annals of Statistics, 21 1760–1779.

Pauly, M. (2011). Discussion about the Quality of F-Ratio Resampling Tests for Comparing

Variances. Test, 20 163–179.

Rajan, R. G. and Subramanian, A. (2008). Aid and Growth: What Does the Cross-Country

Evidence Really Show? The Review of Economics and Statistics, 90 643–665.


Schneller, W. (1988). A Short Proof of Motoo’s Combinatorial Central Limit Theorem using

Stein’s Method. Probability Theory and Related Fields, 78 249–252.

So, B. S. and Shin, D. W. (1999). Cauchy Estimators for Autoregressive Processes with Appli-

cations to Unit Root Tests and Confidence Intervals. Econometric Theory, 15 165–176.

Stock, J. H., Wright, J. H. and Yogo, M. (2002). A Survey of Weak Instruments and Weak

Identification in Generalized Method of Moments. Journal of Business and Economic Statistics,

20 518–529.

Young, A. (2020). Consistency without Inference: Instrumental Variables in Practical Application.

Tech. rep., London School of Economics.


A Proofs

In the sequel, $\overset{d}{=}$ denotes distributional equality. We use the abbreviations SLLN for the strong law of large numbers, WLLN for the weak law of large numbers for independent $L^{1+\delta}$-bounded random variables, CLT for the central limit theorem, CMT for the continuous mapping theorem, and SV and SVD for “singular value” and “singular value decomposition”, respectively.

Let $(\Omega_0, \mathcal{F}, P)$ be the probability triplet underlying the observations $(y, Y, W, X)$, and let $P^{\pi}$ denote the probability measure induced by a random permutation $\pi$ uniformly distributed over $G_n$. $\overset{p^{\pi}}{\longrightarrow}$ and $\overset{d^{\pi}}{\longrightarrow}$ denote convergence in probability and convergence in distribution with respect to $P^{\pi}$. Also, $o_{a.s.}(\cdot)$ and $O_{a.s.}(\cdot)$ stand for $o(\cdot)$ $P$-almost surely and $O(\cdot)$ $P$-almost surely, respectively.

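Proof of Proposition 2.1. Since $\Sigma_u(\theta_0) = \mathrm{diag}(u_1(\theta_0)^2, \dots, u_n(\theta_0)^2)$ and $u_i(\theta_0) = u_i - X_i'(X'X)^{-1}X'u$ under $H_0 : \theta = \theta_0$, we may write $\Sigma_u(\theta_0) = \Sigma_u(u, X)$. Then

$$\mathrm{PAR}_1 = u'M_XW_\pi\left(W_\pi'M_X\Sigma_u(u,X)M_XW_\pi\right)^{-1}W_\pi'M_Xu \overset{d}{=} u'M_XW\left(W'M_X\Sigma_u(u,X)M_XW\right)^{-1}W'M_Xu = \mathrm{AR}, \tag{A.1}$$

where the equality in distribution follows from Assumption 1. Then, letting $\phi_n^{\mathrm{PAR}_1} = \phi_n^{\mathrm{PAR}_1}(W_\pi, X, u(\theta_0))$,

$$E[\phi_n^{\mathrm{PAR}_1}] = \frac{1}{n!}E\left[\sum_{\pi \in G_n}\phi_n^{\mathrm{PAR}_1}(W_\pi, X, u(\theta_0))\right] = \frac{1}{n!}E\left[N^{+} + \frac{n!\alpha - N^{+}}{N^{0}}N^{0}\right] = \alpha,$$

where the first equality uses (A.1), and the second equality follows from the definitions of $\phi_n^{\mathrm{PAR}_1}(W_\pi, X, u(\theta_0))$, $N^{+}$ and $N^{0}$. The proof for the $\mathrm{PAR}_2$ test is similar and is thus omitted.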
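The averaging step in the proof can be checked by brute force on a toy example. The sketch below assumes the standard randomized permutation test construction, as in Lehmann and Romano (2005): reject with probability one above the critical order statistic and with probability $a = (n!\alpha - N^{+})/N^{0}$ at it. The statistic used, a squared cross-product of small integer vectors, is a hypothetical stand-in for the PAR statistics, and the function names are illustrative.

```python
from fractions import Fraction
from itertools import permutations
from math import floor


def randomized_test(stat_obs, perm_stats, alpha):
    """Rejection probability of the randomized permutation test.

    perm_stats holds the statistic at every permutation in the group;
    alpha should be a Fraction so that the arithmetic is exact.
    """
    m = len(perm_stats)
    ordered = sorted(perm_stats)
    r = m - floor(m * alpha)               # 1-based index of the critical value
    crit = ordered[r - 1]
    n_plus = sum(s > crit for s in perm_stats)
    n_zero = sum(s == crit for s in perm_stats)
    a = (m * alpha - n_plus) / n_zero      # randomization probability at a tie
    if stat_obs > crit:
        return Fraction(1)
    if stat_obs == crit:
        return a
    return Fraction(0)


def average_rejection(w, u, alpha):
    """Average the test function over all permutations of w."""
    perm_stats = [
        sum(wi * ui for wi, ui in zip(w_perm, u)) ** 2
        for w_perm in permutations(w)
    ]
    total = sum(randomized_test(s, perm_stats, alpha) for s in perm_stats)
    return total / len(perm_stats)
```

Since the multiset of permutation statistics is the same whichever permutation of the data is treated as observed, the sum of the test function over the group equals $N^{+} + aN^{0} = n!\alpha$, so the average is exactly $\alpha$.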

A.1 Outline of the asymptotic results

The proof of the asymptotic results is organized as follows. In Section A.2, we provide some asymptotic results

for triangular arrays of random variables which will be used later in the proof of asymptotic similarity. In

Section A.3, following Andrews and Guggenberger (2019b) we define the drifting sequences of parameters for

establishing the uniform validity of the permutation tests, and state two lemmas pertaining to the conditional

critical value of the CLR-type statistics. Section A.4 provides some asymptotic results for quantities that

depend on the sample data only. Section A.5 presents the main results about the asymptotic distribution of

key permutation quantities (Lemma A.6) and the PCLR statistics under the drifting parameter sequences.

The uniform asymptotic similarity is proved in Section A.6. The asymptotic distributions of the test statistics

under strong or semi-strong identification, and local alternatives are derived in Section A.7. Finally, Section

A.8 presents the proof of Lemma A.8 used in the derivation of the asymptotic distribution of the PCLR

statistics.


A.2 Supplementary results

We use the following auxiliary results in the proofs of the asymptotic results. The first result is a version of Hoeffding (1951)'s combinatorial CLT established by Motoo (1956) (see also Schneller (1988)), which is based on a Lindeberg-type condition.

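Lemma A.1 (Combinatorial CLT). For $n^2$, $n > 1$, real scalars $c_n(i,j)$, $i, j = 1, \dots, n$, and a random permutation $\pi$ uniformly distributed over $G_n$, consider the sum

\[
S_n \equiv \sum_{i=1}^{n} c_n(i, \pi(i))
\]

with $E_{P^\pi}[S_n] = n^{-1}\sum_{i=1}^{n}\sum_{j=1}^{n} c_n(i,j)$ and $\mathrm{Var}_{P^\pi}[S_n] = \frac{1}{n-1}\sum_{i=1}^{n}\sum_{j=1}^{n} g_n^2(i,j)$, where

\[
g_n(i,j) = c_n(i,j) - n^{-1}\sum_{i=1}^{n} c_n(i,j) - n^{-1}\sum_{j=1}^{n} c_n(i,j) + n^{-2}\sum_{i=1}^{n}\sum_{j=1}^{n} c_n(i,j).
\]

If $g_n(i,j) \neq 0$ for some $(i,j)$, so that $\mathrm{Var}_{P^\pi}[S_n] > 0$, and for any $\varepsilon > 0$

\[
\lim_{n\to\infty} \frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{n} \frac{g_n^2(i,j)}{\frac{1}{n-1}\sum_{i=1}^{n}\sum_{j=1}^{n} g_n^2(i,j)}\, 1\!\left(\frac{|g_n(i,j)|}{\big(\frac{1}{n-1}\sum_{i=1}^{n}\sum_{j=1}^{n} g_n^2(i,j)\big)^{1/2}} > \varepsilon\right) = 0,
\tag{A.2}
\]

then as $n \to \infty$

\[
\frac{S_n - E_{P^\pi}[S_n]}{\sqrt{\mathrm{Var}_{P^\pi}[S_n]}} \overset{d^\pi}{\longrightarrow} N[0, 1] \quad P\text{-almost surely.}
\]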
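The mean and variance formulas stated in Lemma A.1 can be checked by brute force for small $n$. The sketch below enumerates all $n!$ permutations of a randomly generated array and compares the exact moments of $S_n$ with the closed-form expressions; the helper names `exact_moments` and `formula_moments` are illustrative only.

```python
import itertools
import numpy as np

def exact_moments(c):
    """Mean and variance of S_n = sum_i c[i, pi(i)] over all n! permutations."""
    n = c.shape[0]
    rows = np.arange(n)
    sums = np.array([c[rows, list(p)].sum()
                     for p in itertools.permutations(range(n))])
    return sums.mean(), sums.var()  # population variance over the uniform law

def formula_moments(c):
    """Closed-form mean and variance of S_n as stated in Lemma A.1."""
    n = c.shape[0]
    # g(i, j): double-centered version of c(i, j)
    g = c - c.mean(axis=0, keepdims=True) - c.mean(axis=1, keepdims=True) + c.mean()
    return c.sum() / n, (g ** 2).sum() / (n - 1)

rng = np.random.default_rng(0)
c = rng.normal(size=(5, 5))          # arbitrary scalars c_n(i, j), n = 5
exact_mean, exact_var = exact_moments(c)
formula_mean, formula_var = formula_moments(c)
```

Enumeration is feasible only for small $n$; the point of the check is the algebra, not the asymptotic normality.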

The next result is a triangular array SLLN.

Lemma A.2 (Corollary to Theorem 1 of Hu et al. (1989)). Let $\{Z_{nk} : k = 1, \dots, n;\ n = 1, 2, \dots\}$ be an array of i.i.d. random variables such that $E[|Z_{nk}|^{2p}] < \infty$ for some $1 \le p < 2$. Then,

\[
n^{-1/p}\sum_{k=1}^{n} (Z_{nk} - E[Z_{nk}]) \overset{a.s.}{\longrightarrow} 0.
\tag{A.3}
\]

Proof of Lemma A.2. By the $c_r$ inequality, $E[(Z_{nk} - E[Z_{nk}])^{2p}] \le 2^{2p-1}(E[Z_{nk}^{2p}] + (E[Z_{nk}])^{2p}) < \infty$. Applying Theorem 1 of Hu et al. (1989) with $X_{nk} = Z_{nk} - E[Z_{nk}]$ yields that $n^{-1/p}\sum_{k=1}^{n} X_{nk}$ converges completely to 0, i.e., for any $\varepsilon > 0$

\[
\sum_{n=1}^{\infty} P\Big[\Big|n^{-1/p}\sum_{k=1}^{n} X_{nk}\Big| > \varepsilon\Big] < \infty.
\tag{A.4}
\]

By the Borel-Cantelli lemma, it follows that $n^{-1/p}\sum_{k=1}^{n} X_{nk} \overset{a.s.}{\longrightarrow} 0$.

We will also use the following fact.

Lemma A.3. If $\{X_i\}_{i=1}^{n}$ and $X$ are defined on the probability space $(\Omega_0, \mathcal{F}, P)$ and $X_n \overset{p}{\longrightarrow} X$, then $X_n \overset{p^\pi}{\longrightarrow} X$ in $P$-probability.

Proof of Lemma A.3. The proof is analogous to that of the bootstrap version of the result; see Hansen (2021), Theorem 10.1.


A.3 Drifting sequences of parameters

Following Andrews and Guggenberger (2019b), we first introduce some notation and define the DGPs.

Henceforth, we shall suppress the argument θ0 in some quantities, and write, for example, m(θ0) = m,

Σ(θ0) = Σ, J(θ0) = J , mπ(θ0) = mπ, Σπ(θ0) = Σπ, Jπ(θ0) = Jπ.

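Define $G_P \equiv E_P[Z_i^* Z_i^{*\prime}]\Gamma$, $H \equiv \Sigma^{-1/2}$, $H_P \equiv \Sigma_P^{-1/2}$, $U \equiv \big((\theta_0, I_d)\,\Omega^{\varepsilon}(\Sigma, \mathcal{V})^{-1}(\theta_0, I_d)'\big)^{1/2}$ and $U_P \equiv \big((\theta_0, I_d)\,\Omega^{\varepsilon}(\Sigma_P, \mathcal{V}_P)^{-1}(\theta_0, I_d)'\big)^{1/2}$, where $\mathcal{V} = \mathcal{V}_a(\theta_0)$, $\mathcal{V}_P = \mathcal{V}_{P,a}$ (defined in (2.17) and (2.35)) and $\mathcal{V} = \mathcal{V}_b(\theta_0)$, $\mathcal{V}_P = \mathcal{V}_{P,b}$ (defined in (2.19) and (2.36)) for the CLRa, PCLRa and CLRb, PCLRb statistics, respectively.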

The SVD of $H_P G_P U_P$ is given as follows. Let $\kappa_{1P} \ge \cdots \ge \kappa_{dP}$ and $B_P$ denote the ordered eigenvalues, and a $d \times d$ orthogonal matrix of corresponding eigenvectors, respectively, of

\[
U_P' G_P' H_P H_P' G_P U_P,
\tag{A.5}
\]

and $\kappa_{1P} \ge \cdots \ge \kappa_{kP}$ and $A_P$ denote the ordered eigenvalues, and a $k \times k$ orthogonal matrix of corresponding eigenvectors, respectively, of

\[
H_P G_P U_P U_P' G_P' H_P'.
\tag{A.6}
\]

Let $\tau_{1P} \ge \cdots \ge \tau_{dP} \ge 0$ denote the singular values of

\[
H_P G_P U_P.
\tag{A.7}
\]
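The relation between the eigenvalues in (A.5)-(A.6) and the singular values in (A.7) can be illustrated numerically. In the sketch below a random $k \times d$ matrix `M` is an assumed stand-in for $H_P G_P U_P$ (it is not computed from any data), and the array `Upsilon` is the rectangular matrix with the singular values on its leading diagonal.

```python
import numpy as np

rng = np.random.default_rng(1)
k, d = 4, 2
M = rng.normal(size=(k, d))          # stand-in for H_P G_P U_P (assumption)

# Full SVD: M = A @ Upsilon @ B', with A (k x k) and B (d x d) orthogonal.
A, tau, Bt = np.linalg.svd(M, full_matrices=True)
Upsilon = np.zeros((k, d))
Upsilon[:d, :d] = np.diag(tau)

# Ordered eigenvalues of the d x d and k x k Gram matrices.
kappa_small = np.sort(np.linalg.eigvalsh(M.T @ M))[::-1]
kappa_large = np.sort(np.linalg.eigvalsh(M @ M.T))[::-1]
```

The $d$ eigenvalues of the smaller Gram matrix are the squared singular values, and the larger Gram matrix has the same nonzero eigenvalues padded with $k - d$ zeros.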

Define

λ1,P ≡ (τ1P , . . . , τdP )′ ∈ Rd, (A.8)

λ2,P ≡ BP ∈ Rd×d, (A.9)

λ3,P ≡ AP ∈ Rk×k, (A.10)

λ4,P ≡ GP = EP [Z∗i Z∗′i ]Γ ∈ Rk×d, (A.11)

λ5,P ≡ VP,a = EP

Z∗i ui


Z∗i ui


′ ∈ R(d+1)k×(d+1)k, (A.12)

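\[
\lambda_{6,P} \equiv [\lambda_{6,1P}, \dots, \lambda_{6,(d-1)P}]' = \Big[\frac{\tau_{2P}}{\tau_{1P}}, \dots, \frac{\tau_{dP}}{\tau_{(d-1)P}}\Big]' \in [0,1]^{d-1},
\tag{A.13}
\]

\[
\lambda_{7,P} \equiv E_P[(W_i', X_i')'(W_i', X_i')] \in \mathbb{R}^{(k+p)\times(k+p)},
\tag{A.14}
\]

\[
\lambda_{8,P} \equiv \Sigma^{e} = \mathrm{Var}_P[(u_i, V_i')'] \in \mathbb{R}^{(d+1)\times(d+1)},
\tag{A.15}
\]

\[
\lambda_{9,P} = (\Sigma_P, K(\mathcal{V}_P)) \in (\mathbb{R}^{k\times k}, \mathbb{R}^{(d+1)k\times(d+1)k}), \quad \mathcal{V}_P \in \{\mathcal{V}_{P,a}, \mathcal{V}_{P,b}\},
\tag{A.16}
\]

\[
\lambda_{10,P} \equiv P,
\tag{A.17}
\]

\[
\lambda_P \equiv (\lambda_{1,P}, \dots, \lambda_{10,P})'.
\tag{A.18}
\]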


As in Andrews and Guggenberger (2019b), the drifting parameters λ1,P , . . . , λ6,P , λ9,P , λ10,P are used in the

asymptotic results for the AR, LM, CLR statistics. The parameter vector λ1,P characterizes the identifi-

cation strength. The sequences λ7,P and λ8,P are needed additionally for the asymptotic properties of the

permutation statistics.

Define

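\[
\Lambda \equiv \{\lambda_P : P \in \mathcal{P}_0^{\mathrm{PCLR}}\},
\]

\[
h_n(\lambda_P) \equiv [n^{1/2}\lambda_{1,P}, \lambda_{2,P}, \lambda_{3,P}, \lambda_{4,P}, \lambda_{5,P}, \lambda_{6,P}, \lambda_{7,P}, \lambda_{8,P}, \lambda_{9,P}],
\]

and

\[
H \equiv \big\{h \in (\mathbb{R} \cup \{\pm\infty\})^{\dim(h_n(\lambda_{P_n}))} : h_{w_n}(\lambda_{P_{w_n}}) \to h \text{ for some subsequence } \{w_n\} \text{ of } \{n\} \text{ and some sequence } \{\lambda_{P_{w_n}} \in \Lambda : n \ge 1\}\big\}.
\]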

The following assumption corresponds to Assumption B∗ of Andrews and Guggenberger (2019b) which is key

for establishing the asymptotic similarity of the tests.

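Assumption 3. For any subsequence $\{w_n\}$ of $\{n\}$ and any sequence $\{\lambda_{P_{w_n}} \in \Lambda : n \ge 1\}$ for which $h_{w_n}(\lambda_{P_{w_n}}) \to h \in H$, $E_{P_{w_n}}[\phi^{PR}_{w_n}] \to \alpha$ for some $\alpha \in (0, 1)$.

Let $\{\lambda_{P_n} \in \Lambda : n \ge 1\}$ be a sequence such that $h_n(\lambda_{P_n}) \to h \in H$, and

\[
n^{1/2}\tau_{jP_n} \to h_{1,j} \ge 0, \quad j = 1, \dots, d,
\tag{A.19}
\]

\[
\lambda_{j,P_n} \to h_j, \quad j = 2, \dots, 8,
\tag{A.20}
\]

\[
\lambda_{5,mP_n} = E_{P_n}[Z_i^* Z_i^{*\prime} u_i^2] \to h_{5,m} = \Sigma,
\tag{A.21}
\]

\[
\lambda_{6,jP_n} \to h_{6,j}, \quad j = 1, \dots, d-1,
\tag{A.22}
\]

\[
\lambda_{7,P_n} \to h_7 \equiv \begin{bmatrix} Q_{ww} & Q_{wx} \\ Q_{xw} & Q_{xx} \end{bmatrix},
\tag{A.23}
\]

\[
\lambda_{8,P_n} \to h_8 \equiv \Sigma^{e} = \begin{bmatrix} \sigma^2 & \Sigma^{uV} \\ \Sigma^{Vu} & \Sigma^{V} \end{bmatrix},
\tag{A.24}
\]

\[
\lambda_{9,P_n} \to h_9 \equiv (\Sigma, K(\mathcal{V})),
\tag{A.25}
\]

where $\lambda_{5,mP_n} \in \mathbb{R}^{k\times k}$ is the upper-left sub-matrix of $\lambda_{5,P_n}$ with the corresponding limit $h_{5,m} = \Sigma$, $h_7$ and $h_8$ are partitioned conformably with $\lambda_{7,P_n}$ and $\lambda_{8,P_n}$, respectively, and for both $(\Sigma_P, K(\mathcal{V}_{P,a}))$ and $(\Sigma_P, K(\mathcal{V}_{P,b}))$, the limit is denoted by the same variable $h_9$ with $\mathcal{V}$ in (A.25) denoting the limit of $\mathcal{V}_P \in \{\mathcal{V}_{P,a}, \mathcal{V}_{P,b}\}$ defined in (A.16).

Consider $q = q_h \in \{0, \dots, d\}$ such that

\[
h_{1,j} = \infty \ \text{ for } 1 \le j \le q, \qquad h_{1,j} < \infty \ \text{ for } q + 1 \le j \le d.
\]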


The normalization matrices used to derive the limiting distribution of the test statistics are defined as follows:

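\[
S_{P_n} \equiv \mathrm{diag}\big((n^{1/2}\tau_{1P_n})^{-1}, \dots, (n^{1/2}\tau_{qP_n})^{-1}, 1, \dots, 1\big) \in \mathbb{R}^{d\times d},
\tag{A.26}
\]

\[
T_{P_n} \equiv B_{P_n} S_{P_n} \in \mathbb{R}^{d\times d}.
\tag{A.27}
\]

In the sequel, quantities indexed by $P = P_n$ are indexed by $n$ instead; for example, we write $\Sigma_P = E_P[Z_i^* Z_i^{*\prime} u_i^2] = \Sigma_n$, $G_P = E_P[Z_i^* Z_i^{*\prime}]\Gamma = G_n$, $\lambda_{w_n} = \lambda_{P_{w_n}}$ and $\lambda_n = \lambda_{P_n}$.

Let $\kappa_1 \ge \cdots \ge \kappa_d \ge 0$ denote the eigenvalues of

\[
n U' J' H H' J U.
\]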

Partition

An = [An,q, An,k−q], An,q ∈ Rk×q, An,k−q ∈ Rk×(k−q), (A.28)

Bn = [Bn,q, Bn,d−q] , Bn,q ∈ Rd×q, Bn,d−q ∈ Rd×(d−q), (A.29)

h2 = [h2,q, h2,d−q], h2,q ∈ Rd×q, h2,d−q ∈ Rd×(d−q), (A.30)

h3 = [h3,q, h3,k−q], h3,q ∈ Rk×q, h3,k−q ∈ Rk×(k−q). (A.31)

Define

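\[
\Upsilon_n \equiv \begin{bmatrix} \Upsilon_{n,q} & 0^{q\times(d-q)} \\ 0^{(d-q)\times q} & \Upsilon_{n,d-q} \\ 0^{(k-d)\times q} & 0^{(k-d)\times(d-q)} \end{bmatrix} \in \mathbb{R}^{k\times d}, \quad \Upsilon_{n,q} \equiv \mathrm{diag}(\tau_{1n}, \dots, \tau_{qn}) \in \mathbb{R}^{q\times q}, \quad \Upsilon_{n,d-q} \equiv \mathrm{diag}(\tau_{(q+1)n}, \dots, \tau_{dn}) \in \mathbb{R}^{(d-q)\times(d-q)}.
\tag{A.32}
\]

For a non-random matrix $\mathcal{J} \in \mathbb{R}^{(k-q)\times(d-q)}$, let $r_{k,d,q}(\mathcal{J}, 1-\alpha)$ denote the $1-\alpha$ quantile of

\[
\mathcal{Z}^{*\prime}\mathcal{Z}^* - \lambda_{\min}[(\mathcal{J}, \mathcal{Z}_2^*)'(\mathcal{J}, \mathcal{Z}_2^*)], \quad \mathcal{Z}^* = [\mathcal{Z}_1^{*\prime}, \mathcal{Z}_2^{*\prime}]' \sim N[0^{k\times 1}, I_k], \quad \mathcal{Z}_1^* \in \mathbb{R}^q, \ \mathcal{Z}_2^* \in \mathbb{R}^{k-q}.
\tag{A.33}
\]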

The following results due to Andrews and Guggenberger (2019b) will be used in the proof of the asymptotic

similarity of the PCLR tests.

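Lemma A.4 (Lemma 27.3 of Andrews and Guggenberger (2019b)). Let $\mathcal{Z} = [\mathcal{Z}_1', \mathcal{Z}_2']' \sim N[0^{k\times 1}, I_k]$, $\mathcal{Z}_1 \in \mathbb{R}^q$, $\mathcal{Z}_2 \in \mathbb{R}^{k-q}$, and $\Upsilon(\tau^c) \equiv [\mathrm{diag}(\tau^c), 0^{(d-q)\times(k-d)}]' \in \mathbb{R}^{(k-q)\times(d-q)}$, where $\tau^c \equiv (\tau^c_{q+1}, \dots, \tau^c_d)' \in \mathbb{R}^{d-q}$ with $\tau^c_{q+1} \ge \cdots \ge \tau^c_d \ge 0$, and $k \ge d \ge 1$ and $0 \le q \le d$. Then, the distribution function of

\[
\mathrm{CLR}_\infty(\tau^c) \equiv \mathcal{Z}'\mathcal{Z} - \lambda_{\min}[(\Upsilon(\tau^c), \mathcal{Z}_2)'(\Upsilon(\tau^c), \mathcal{Z}_2)]
\tag{A.34}
\]

is continuous and strictly increasing at its $1-\alpha$ quantile $r_{k,d,q}(\Upsilon(\tau^c), 1-\alpha)$ for all $\alpha \in (0, 1)$.

Lemma A.5 (Lemma 16.2 of Andrews and Guggenberger (2019b)). Let $\mathcal{J}^c$ be a $(k-q)\times(d-q)$ matrix with the singular value decomposition $A^c \Upsilon^c B^{c\prime}$, where $A^c$ is a $(k-q)\times(k-q)$ orthogonal matrix of eigenvectors of $\mathcal{J}^c \mathcal{J}^{c\prime}$, $B^c$ is a $(d-q)\times(d-q)$ orthogonal matrix of eigenvectors of $\mathcal{J}^{c\prime}\mathcal{J}^c$, and $\Upsilon^c$ is the $(k-q)\times(d-q)$ matrix with the $d-q$ singular values $\tau^c_{q+1} \ge \cdots \ge \tau^c_d$ of $\mathcal{J}^c$ as its first $d-q$ diagonal elements and zeros elsewhere. Then, for $r_{k,d,q}(\cdot, 1-\alpha)$ defined in (A.33),

\[
r_{k,d,q}(\mathcal{J}^c, 1-\alpha) = r_{k,d,q}(\Upsilon^c, 1-\alpha),
\]

and the distribution of the random variable in (A.34) is identical to that of (A.33) with $\mathcal{J}^c$ in place of $\mathcal{J}$.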
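The quantile $r_{k,d,q}(\cdot, 1-\alpha)$ has no closed form in general, but it can be approximated by simulation. The following is a minimal Monte Carlo sketch of the random variable in (A.34); the function names, the choice $\tau^c = (1.5, 0.4)$, the number of draws and the level are all hypothetical and serve only to illustrate the construction.

```python
import numpy as np

def upsilon_matrix(tau_c, k, q):
    """(k-q) x (d-q) matrix with tau_c on its leading diagonal, zeros elsewhere."""
    m = len(tau_c)
    ups = np.zeros((k - q, m))
    idx = np.arange(m)
    ups[idx, idx] = tau_c
    return ups

def clr_inf(z, ups, q):
    """Z'Z - lambda_min[(Upsilon, Z2)'(Upsilon, Z2)] for one draw z in R^k."""
    z2 = z[q:]                                   # last k-q coordinates of z
    m = np.column_stack([ups, z2])               # (k-q) x (d-q+1)
    lam_min = np.linalg.eigvalsh(m.T @ m)[0]     # smallest eigenvalue
    return z @ z - lam_min

rng = np.random.default_rng(2)
k, d, q = 5, 3, 1
draws = rng.normal(size=(2000, k))               # N[0, I_k] draws

ups = upsilon_matrix(np.array([1.5, 0.4]), k, q)
values = np.array([clr_inf(z, ups, q) for z in draws])
critical_value = np.quantile(values, 0.95)       # simulated 0.95 quantile

# Case q = d: Upsilon has no columns and the statistic reduces to Z1'Z1.
ups_empty = upsilon_matrix(np.array([]), k, d)
values_strong = np.array([clr_inf(z, ups_empty, d) for z in draws])
```

When $q = d$ the simulated statistic equals the squared norm of the first $d$ coordinates of each draw, which is a $\chi^2_d$ variable, in line with the strong-identification case discussed for the PCLR statistic.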

A.4 Probability limits of some key sample quantities

The proof of the asymptotic similarity of the permutation tests involves the derivation of the asymptotic distributions of the AR, LM, CLRa and CLRb statistics under the drifting sequences of parameters given above. These can be derived following Andrews and Guggenberger (2017a,b, 2019a,b) based on the moment function $m_i^*(\theta) = Z_i^* u_i^*(\theta)$, where $u_i^*(\theta) \equiv Z_i^{*\prime}\Gamma(\theta - \theta_0) + u_i + V_i'(\theta - \theta_0)$, and the corresponding sample moment function $m^*(\theta) \equiv n^{-1}\sum_{i=1}^{n} Z_i^* u_i^*(\theta)$. Note here that the test statistics are based on $m(\theta) \equiv n^{-1}Z'(y - Y\theta) = n^{-1}Z'Z\Gamma(\theta - \theta_0) + n^{-1}Z'u + n^{-1}Z'V(\theta - \theta_0)$ rather than $m^*(\theta)$. Therefore, since

\[
m^*(\theta) = n^{-1}Z^{*\prime}Z^*\Gamma(\theta - \theta_0) + n^{-1}Z^{*\prime}u + n^{-1}Z^{*\prime}V(\theta - \theta_0),
\]

where $Z^* \equiv W - X(E_P[X_iX_i'])^{-1}E_P[X_iW_i']$, one needs to account for the randomness that results when $Z^*$ is replaced by $Z$ in order to derive the asymptotic distributions. Since the latter does not entail any substantial change in the proofs of Andrews and Guggenberger (2017a,b, 2019a,b), we do not provide them here to avoid overlap.

However, following Andrews and Guggenberger (2019b), we shall determine the probability limits of

certain sample random quantities that appear in the permutation statistics but do not depend on the random

permutation π, under the drifting parameter sequences.

We also remark that the derivation of the asymptotic distributions of the permutation test statistics is similar to that of the asymptotic test statistics. One can therefore derive the asymptotic distributions of the AR, LM, CLRa and CLRb statistics by proceeding as in the derivation for the robust permutation statistics provided below, replacing the permutation quantities by their sample counterparts.

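The limits of $\mathcal{V}_a(\theta_0)$, $\mathcal{V}_b(\theta_0)$ and $J$. By the triangular array WLLN and the CMT, for $s = 1, \dots, d$,

\[
n^{-1}\sum_{i=1}^{n} Z_iY_{is} = n^{-1}Z'Y_{,s} = n^{-1}Z'Z\Gamma_s + n^{-1}Z'V_{,s} = E_P[Z_i^* Z_i^{*\prime}]\Gamma_s + o_p(1),
\]

\[
D \equiv E_P[W_iX_i'](E_P[X_iX_i'])^{-1} - W'X(X'X)^{-1} \overset{p}{\longrightarrow} 0,
\tag{A.35}
\]

\[
(X'X)^{-1}X'V_{,s} \overset{p}{\longrightarrow} 0,
\tag{A.36}
\]

\[
n^{-1}X'u \overset{p}{\longrightarrow} 0.
\tag{A.37}
\]

We may rewrite $Z_i = W_i - W'X(X'X)^{-1}X_i = W_i - E_P[W_iX_i'](E_P[X_iX_i'])^{-1}X_i + DX_i = Z_i^* + DX_i$, and $Y_{is} = Z_i'\Gamma_s + V_{is} - X_i'(X'X)^{-1}X'V_{,s} = Z_i^{*\prime}\Gamma_s + V_{is} + X_i'(D'\Gamma_s - (X'X)^{-1}X'V_{,s})$. Note that $\|Z_i^*\|^4$, $\|Z_i^*\|^3\|X_i\|$, $\|Z_i^*\|^3\|V_i\|$, $\|Z_i^*\|^2\|X_i\|\|V_i\|$, $\|Z_i^*\|^2\|X_i\|^2$, $\|Z_i^*\|^2\|V_i\|^2$, $\|Z_i^*\|\|X_i\|^3$, $\|Z_i^*\|\|X_i\|^2\|V_i\|$, $\|Z_i^*\|\|X_i\|\|V_i\|^2$, $\|X_i\|^3\|V_i\|$, $\|X_i\|^2\|V_i\|^2$ have finite $1 + \delta/4$ moments by the Cauchy-Schwarz inequality. By (A.35), (A.36), (A.37), the triangular array WLLN and the CMT, for $s, t = 1, \dots, d$

\[
\begin{aligned}
n^{-1}\sum_{i=1}^{n} Z_iZ_i'Y_{is}Y_{it} &= n^{-1}\sum_{i=1}^{n} (Z_i^* + DX_i)(Z_i^* + DX_i)'\big(Z_i^{*\prime}\Gamma_s + V_{is} + X_i'(D'\Gamma_s - (X'X)^{-1}X'V_{,s})\big) \\
&\qquad\qquad \times \big(Z_i^{*\prime}\Gamma_t + V_{it} + X_i'(D'\Gamma_t - (X'X)^{-1}X'V_{,t})\big) \\
&= n^{-1}\sum_{i=1}^{n} Z_i^* Z_i^{*\prime}(Z_i^{*\prime}\Gamma_s + V_{is})(Z_i^{*\prime}\Gamma_t + V_{it}) + o_p(1) \\
&= E_P[Z_i^* Z_i^{*\prime}(Z_i^{*\prime}\Gamma_s + V_{is})(Z_i^{*\prime}\Gamma_t + V_{it})] + o_p(1).
\end{aligned}
\tag{A.38}
\]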

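Similarly,

\[
\begin{aligned}
C_s &= n^{-1}\sum_{i=1}^{n} Z_iZ_i'Y_{is}u_i(\theta_0) \\
&= n^{-1}\sum_{i=1}^{n} (Z_i^* + DX_i)(Z_i^* + DX_i)'\big(Z_i^{*\prime}\Gamma_s + V_{is} + X_i'(D'\Gamma_s - (X'X)^{-1}X'V_{,s})\big)\big(u_i - X_i'(X'X)^{-1}X'u\big) \\
&= n^{-1}\sum_{i=1}^{n} Z_i^* Z_i^{*\prime}(Z_i^{*\prime}\Gamma_s + V_{is})u_i + o_p(1) = E_P[Z_i^* Z_i^{*\prime}(Z_i^{*\prime}\Gamma_s + V_{is})u_i] + o_p(1) = C_{ns} + o_p(1),
\end{aligned}
\tag{A.39}
\]

and

\[
\begin{aligned}
\Sigma &= n^{-1}\sum_{i=1}^{n} Z_iZ_i'u_i(\theta_0)^2 = n^{-1}\sum_{i=1}^{n} (Z_i^* + DX_i)(Z_i^* + DX_i)'\big(u_i - X_i'(X'X)^{-1}X'u\big)^2 \\
&= E_P[Z_i^* Z_i^{*\prime}u_i^2] + o_p(1) = \Sigma_n + o_p(1) = h_{5,m} + o_p(1).
\end{aligned}
\tag{A.40}
\]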

By (A.38), (A.39), (A.40) and the CMT (see also the equations (27.73) and (27.74) of Andrews and Guggen-

berger (2019b))

Va(θ0) = Vn,a + op(1) = h5 + op(1),

K(Va(θ0)) = K(Vn,a) + op(1) = K(h5) + op(1),

(Σ,K(Va(θ0))) = (h5,m,K(h5)) + op(1),

Ωε(Σ,K(Va(θ0))) = Ωε(h5,m,K(h5)) + op(1) = Ωε(h9) + op(1), (A.41)

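where the last equality holds by the continuity of the eigenvalue-adjusted matrix; see Andrews and Guggenberger (2019a), p. 1717. To determine the limit of $\mathcal{V}_b(\theta_0)$ in (2.19), note that $\varepsilon_i(\theta_0) = [u_i(\theta_0), -Y_i' + X_i'(X'X)^{-1}X'Y]'$ and

\[
\varepsilon_i(\theta_0) - \begin{bmatrix} 0^{1\times k} \\ -\Gamma' \end{bmatrix} Z_i = \begin{bmatrix} u_i - u'X(X'X)^{-1}X_i \\ -V_i + V'X(X'X)^{-1}X_i \end{bmatrix} = v_i - U'X(X'X)^{-1}X_i,
\tag{A.42}
\]

where $v_i \equiv [u_i, -V_i']'$ and $U \equiv [u, -V]$. Letting $Q \equiv (Z'Z)^{-1}Z'U$, we have

\[
\varepsilon_{in}(\theta_0) - [0^{k\times 1}, -\Gamma]'Z_i = \varepsilon(\theta_0)'Z(Z'Z)^{-1}Z_i - [0^{k\times 1}, -\Gamma]'Z_i = Q'Z_i.
\tag{A.43}
\]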


Note here that $Q = O_p(n^{-1/2})$ because $n^{-1}Z'Z = n^{-1}W'W - n^{-1}W'X(X'X)^{-1}X'W \overset{a.s.}{\longrightarrow} Q_{zz}$, which follows from $E_P[Z_i^* Z_i^{*\prime}] \to Q_{ww} - Q_{wx}Q_{xx}^{-1}Q_{xw} \equiv Q_{zz}$, the WLLN and the CMT, and

\[
n^{-1/2}Z'U = n^{-1/2}W'U - n^{-1}W'X(n^{-1}X'X)^{-1}n^{-1/2}X'U = O_p(1),
\]

which holds due to the triangular array CLT, the WLLN and the CMT. Rewrite

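\begin{align}
\mathcal{V}_b(\theta_0) &= n^{-1}\sum_{i=1}^{n} (\varepsilon_i(\theta_0) - \varepsilon_{in}(\theta_0))(\varepsilon_i(\theta_0) - \varepsilon_{in}(\theta_0))' \otimes (Z_iZ_i') \notag \\
&= n^{-1}\sum_{i=1}^{n} (\varepsilon_i(\theta_0) - [0^{k\times 1}, -\Gamma]'Z_i)(\varepsilon_i(\theta_0) - [0^{k\times 1}, -\Gamma]'Z_i)' \otimes (Z_iZ_i') \tag{A.44} \\
&\quad + n^{-1}\sum_{i=1}^{n} (\varepsilon_i(\theta_0) - [0^{k\times 1}, -\Gamma]'Z_i)([0^{k\times 1}, -\Gamma]'Z_i - \varepsilon_{in}(\theta_0))' \otimes (Z_iZ_i') \tag{A.45} \\
&\quad + n^{-1}\sum_{i=1}^{n} ([0^{k\times 1}, -\Gamma]'Z_i - \varepsilon_{in}(\theta_0))(\varepsilon_i(\theta_0) - [0^{k\times 1}, -\Gamma]'Z_i)' \otimes (Z_iZ_i') \tag{A.46} \\
&\quad + n^{-1}\sum_{i=1}^{n} ([0^{k\times 1}, -\Gamma]'Z_i - \varepsilon_{in}(\theta_0))([0^{k\times 1}, -\Gamma]'Z_i - \varepsilon_{in}(\theta_0))' \otimes (Z_iZ_i'). \tag{A.47}
\end{align}

Using (A.42), the term (A.44) can be rewritten as

\[
\begin{aligned}
& n^{-1}\sum_{i=1}^{n} (\varepsilon_i(\theta_0) - [0^{k\times 1}, -\Gamma]'Z_i)(\varepsilon_i(\theta_0) - [0^{k\times 1}, -\Gamma]'Z_i)' \otimes (Z_iZ_i') \\
&\quad = n^{-1}\sum_{i=1}^{n} (v_i - U'X(X'X)^{-1}X_i)(v_i - U'X(X'X)^{-1}X_i)' \otimes (Z_iZ_i') \\
&\quad = n^{-1}\sum_{i=1}^{n} \big(v_iv_i' - U'X(X'X)^{-1}X_iv_i' - v_iX_i'(X'X)^{-1}X'U + U'X(X'X)^{-1}X_iX_i'(X'X)^{-1}X'U\big) \otimes (Z_iZ_i').
\end{aligned}
\]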

Let vij and vil denote the j and l-th elements of vi, j, l = 1, . . . , d + 1. By the Cauchy-Schwarz inequality

EP [‖vijvilZ∗i Z∗′i ‖1+δ/2] ≤ (EP [‖vi‖4+δ])1/2(EP [‖Z∗i ‖4+δ])1/2 < ∞. Similarly, EP [‖vijvilXiX′i‖1+δ/2] < ∞

and EP [‖vijvilZ∗iX ′i‖1+δ/4] <∞. Then, the WLLN and (A.35) yield

n−1n∑i=1

viv′i ⊗ (ZiZ

′i)− EP [viv

′i ⊗ (Z∗i Z

∗′i )]

= n−1n∑i=1

viv′i ⊗ ((Z∗i +DXi)(Z

∗i +DXi)

′)− EP [viv′i ⊗ (Z∗i Z

∗′i )]

p−→ 0. (A.48)

Using the Cauchy-Schwarz inequality again,

EP [‖XijvilZ∗i Z∗′i ‖1+δ/4] ≤ (EP [‖Z∗i ‖4+δ])1/2(EP [X4+δ

ij ])1/4(EP [‖vi‖4+δ])1/4 <∞,

for j = 1, . . . , p and l = 1, . . . , d + 1. Similarly, EP [‖XijvilXiZ∗′i ‖1+δ/4] < ∞ and EP [‖XijvilXiX

′i‖1+δ/4] <

∞. Therefore, by the WLLN and (A.35)

n−1n∑i=1

XijvilZiZ′i − EP [XijvilZ

∗i Z∗′i ]

= n−1n∑i=1

Xijvil(Z∗i +DXi)(Z

∗i +DXi)

′ − EP [XijvilZ∗i Z∗′i ]

p−→ 0,


and

n−1n∑i=1

viX′i(X

′X)−1X ′Up−→ 0. (A.49)

By the Cauchy-Schwarz inequality EP [‖XijXilZ∗i Z∗′i ‖1+δ/2] ≤ EP [‖Xi‖4+δ])1/2(EP [‖Z∗i ‖4+δ])1/2 < ∞ and

similarly

EP [‖XijXilXiZ∗′i ‖1+δ/4] <∞ and EP [‖XijXilXiX

′i‖1+δ/2] <∞.

Then, by the WLLN and (A.35)

n−1n∑i=1

U ′X(X ′X)−1XiX′i(X

′X)−1X ′U ⊗ (ZiZ′i)

= n−1n∑i=1

U ′X(X ′X)−1XiX′i(X

′X)−1X ′U ⊗ ((Z∗i +DXi)(Z∗i +DXi)

′)p−→ 0. (A.50)

Combining (A.48), (A.49) and (A.50), we obtain

n−1n∑i=1

(εi − [0k×1,−Γ ]′Zi)(εi − [0k×1,−Γ ]′Zi)

′⊗ (ZiZ

′i)− EP [(viv

′i)⊗ (Z∗i Z

∗′i )]

p−→ 0. (A.51)

We consider the term in (A.45) next. Since by the Cauchy-Schwarz inequality

EP [‖vijZ∗ilZ∗i Z∗′i ‖1+δ/4] ≤ (EP [v4+δij ])1/4(EP [Z∗4+δ

il ])1/4(EP [‖Z∗i ‖4+δ])1/2 <∞,

EP [‖XijZ∗ilZ∗i Z∗′i ‖1+δ/4] ≤ (EP [X4+δ

ij ])1/4(EP [Z∗4+δil ])1/4(EP [‖Z∗i ‖4+δ])1/2 <∞

and the above moment bounds hold when Z∗i Z∗′i is replaced by XiZ

∗′i and XiX

′i. Using Q = Op(n

−1/2),

(A.35), the WLLN and the CMT, the terms in (A.45) and (A.46) are op(1):

n−1n∑i=1

(εi − [0k×1,−Γ ]′Zi)([0k×1,−Γ ]′Zi − εin)′

⊗ (ZiZ

′i)

= −n−1n∑i=1

(vi(Z∗i +DXi)

′Q− U ′X(X ′X)−1Xi(Z∗i +DXi)

′Q)⊗ ((Z∗i +DXi)(Z∗i +DXi)

′)

p−→ 0. (A.52)

Finally, consider the last term in (A.47). Since E[‖Z∗ijZ∗ilZ∗i Z∗′i ‖1+δ/4] ≤ E[‖Z∗i ‖4+δ] <∞, E[‖Z∗ijZ∗ilXiZ∗′i ‖1+δ/4] <

∞, and E[‖Z∗ijZ∗ilXiX′i‖1+δ/4] <∞, by (A.35) and the WLLN

n−1n∑i=1

([0k×1,−Γ ]′Zi − εin)([0k×1,−Γ ]′Zi − εin)′

⊗ (ZiZ

′i)

= n−1n∑i=1

Q′ZiZ′iQ⊗ (ZiZ

′i) = n−1

n∑i=1

Q′((Z∗i +DXi)(Z∗i +DXi)

′)Q⊗ ((Z∗i +DXi)(Z∗i +DXi)

′)p−→ 0.

(A.53)


Therefore, (A.51), (A.52), (A.53) and the CMT together with the continuity of the eigenvalue-adjusted matrix

give

Vb(θ0) = Vn,b + op(1),

K(Vb(θ0)) = K(Vn,b) + op(1),

(Σ,K(Vb(θ0))) = (h5,m,K(V)) + op(1) ≡ h9 + op(1),

Ωε(Σ,K(Vb(θ0))) = Ωε(h9) + op(1). (A.54)

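Next we determine the limit of $n^{1/2}(J - G_n)$. Rewrite

\begin{align}
n^{1/2}(G - G_n) &= n^{1/2}(n^{-1}Z'Y - E_P[Z_i^* Z_i^{*\prime}]\Gamma) \notag \\
&= n^{-1/2}Z'Z\Gamma - n^{1/2}E_P[Z_i^* Z_i^{*\prime}]\Gamma + n^{-1/2}Z'V \notag \\
&= n^{-1/2}Z'V + (n^{-1/2}W'W - n^{1/2}E_P[W_iW_i'])\Gamma \notag \\
&\quad - (n^{-1/2}W'X - n^{1/2}E_P[W_iX_i'])(X'X)^{-1}X'W\Gamma \notag \\
&\quad - E_P[W_iX_i'](n^{-1}X'X)^{-1}(n^{-1/2}X'W - n^{1/2}E_P[X_iW_i'])\Gamma \notag \\
&\quad - E_P[W_iX_i']\big(n^{1/2}(n^{-1}X'X)^{-1} - n^{1/2}(E_P[X_iX_i'])^{-1}\big)E_P[X_iW_i']\Gamma. \tag{A.55}
\end{align}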

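On noting that $n^{-1/2}Z'V = n^{-1/2}W'V - n^{-1}W'X(n^{-1}X'X)^{-1}n^{-1/2}X'V = n^{-1/2}Z^{*\prime}V + o_p(1)$ and $n^{-1/2}Z'u = n^{-1/2}W'u - n^{-1}W'X(n^{-1}X'X)^{-1}n^{-1/2}X'u = n^{-1/2}Z^{*\prime}u + o_p(1)$, by the multivariate triangular array Lyapunov CLT

\[
\begin{bmatrix}
n^{1/2}\mathrm{vec}(n^{-1}W'W - E_P[W_iW_i']) \\
n^{1/2}\mathrm{vec}(n^{-1}W'X - E_P[W_iX_i']) \\
n^{1/2}\mathrm{vec}(n^{-1}X'X - E_P[X_iX_i']) \\
n^{-1/2}\mathrm{vec}(Z'V) \\
n^{-1/2}Z'u
\end{bmatrix}
=
\begin{bmatrix}
n^{1/2}\mathrm{vec}(n^{-1}W'W - E_P[W_iW_i']) \\
n^{1/2}\mathrm{vec}(n^{-1}W'X - E_P[W_iX_i']) \\
n^{1/2}\mathrm{vec}(n^{-1}X'X - E_P[X_iX_i']) \\
n^{-1/2}\mathrm{vec}(Z^{*\prime}V) \\
n^{-1/2}Z^{*\prime}u
\end{bmatrix}
+ o_p(1)
\overset{d}{\longrightarrow}
\begin{bmatrix}
\mathrm{vec}(\mathcal{Z}_{ww}) \\
\mathrm{vec}(\mathcal{Z}_{wx}) \\
\mathrm{vec}(\mathcal{Z}_{xx}) \\
\mathrm{vec}(\mathcal{Z}_{zv}) \\
\mathcal{Z}_{zu}
\end{bmatrix},
\tag{A.56}
\]

where $Z^* = W - X(E_P[X_iX_i'])^{-1}E_P[X_iW_i']$ and

\[
\begin{aligned}
\mathcal{Z}_{ww} &\equiv [\mathcal{Z}_{ww,1}, \dots, \mathcal{Z}_{ww,k}], & \mathcal{Z}_{ww,j} &\sim N[0, \lim_{n\to\infty}\mathrm{Var}_P[W_iW_{ij}]], & j &= 1, \dots, k, \\
\mathcal{Z}_{wx} &\equiv [\mathcal{Z}_{wx,1}, \dots, \mathcal{Z}_{wx,p}], & \mathcal{Z}_{wx,l} &\sim N[0, \lim_{n\to\infty}\mathrm{Var}_P[W_iX_{il}]], & l &= 1, \dots, p, \\
\mathcal{Z}_{xx} &\equiv [\mathcal{Z}_{xx,1}, \dots, \mathcal{Z}_{xx,p}], & \mathcal{Z}_{xx,s} &\sim N[0, \lim_{n\to\infty}\mathrm{Var}_P[X_iX_{is}]], & s &= 1, \dots, p, \\
\mathcal{Z}_{zv} &\equiv [\mathcal{Z}_{zv,1}, \dots, \mathcal{Z}_{zv,d}], & \mathcal{Z}_{zv,r} &\sim N[0, \lim_{n\to\infty}E_P[Z_i^* Z_i^{*\prime}V_{ir}^2]], & r &= 1, \dots, d, \\
\mathcal{Z}_{zu} &\sim N[0, \lim_{n\to\infty}E_P[Z_i^* Z_i^{*\prime}u_i^2]].
\end{aligned}
\]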

By the triangular array WLLN,

n−1W ′Wp−→ Qww, n

−1W ′Xp−→ Qwx, n

−1X ′Xp−→ Qxx. (A.57)

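Next we derive the asymptotic distribution of $n^{1/2}((n^{-1}X'X)^{-1} - (E_P[X_iX_i'])^{-1})$. Let $f : \mathrm{vec}(A) \mapsto \mathrm{vec}(A^{-1})$, where $A \in \mathbb{R}^{p\times p}$ is positive definite. The Jacobian of the mapping evaluated at $\mathrm{vec}(A) = \mathrm{vec}(Q_{xx})$ is $\dot{f} = -(A^{-1})' \otimes A^{-1}|_{A = Q_{xx}} = -Q_{xx}^{-1} \otimes Q_{xx}^{-1}$. Since $n^{1/2}(\mathrm{vec}(n^{-1}X'X) - \mathrm{vec}(E_P[X_iX_i'])) \overset{d}{\longrightarrow} \mathrm{vec}(\mathcal{Z}_{xx})$, by the delta method

\[
\begin{aligned}
n^{1/2}((n^{-1}X'X)^{-1} - (E_P[X_iX_i'])^{-1}) &\overset{d}{\longrightarrow} \mathrm{vec}^{-1}_{p,p}(\dot{f}\,\mathrm{vec}(\mathcal{Z}_{xx})) \\
&= -\mathrm{vec}^{-1}_{p,p}((Q_{xx}^{-1} \otimes Q_{xx}^{-1})\mathrm{vec}(\mathcal{Z}_{xx})) \\
&= -Q_{xx}^{-1}\mathcal{Z}_{xx}Q_{xx}^{-1},
\end{aligned}
\tag{A.58}
\]

where $\mathrm{vec}^{-1}_{p,p}$ denotes the inverse vec operator with the property that $\mathrm{vec}^{-1}_{p,p}(\mathrm{vec}(A)) = A$. Since $n^{1/2}((n^{-1}X'X)^{-1} - (E_P[X_iX_i'])^{-1}) = -Q_{xx}^{-1}n^{1/2}(n^{-1}X'X - E_P[X_iX_i'])Q_{xx}^{-1} + o_p(1)$ by (A.56) and (A.58), the convergence in (A.58) holds jointly with (A.56). Using (A.56)-(A.58) in (A.55) and invoking Slutsky's lemma,

\[
n^{1/2}(G - G_n) \overset{d}{\longrightarrow} G_h \equiv (\mathcal{Z}_{ww} - \mathcal{Z}_{wx}Q_{xx}^{-1}Q_{xw} - Q_{wx}Q_{xx}^{-1}\mathcal{Z}_{xw} + Q_{wx}Q_{xx}^{-1}\mathcal{Z}_{xx}Q_{xx}^{-1}Q_{xw})\Gamma + \mathcal{Z}_{zv}.
\tag{A.59}
\]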
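The Jacobian of the matrix-inverse map used in the delta-method step (A.58) can be verified numerically. The sketch below, with a randomly generated positive definite matrix standing in for $Q_{xx}$ and a symmetric perturbation direction standing in for $\mathcal{Z}_{xx}$ (both are assumptions made for illustration), compares the Kronecker-product form with the closed form $-Q_{xx}^{-1}\mathcal{Z}_{xx}Q_{xx}^{-1}$ and with a central finite difference.

```python
import numpy as np

def vec(a):
    """Column-stacking vec operator."""
    return a.reshape(-1, order="F")

def inverse_jacobian(a):
    """Jacobian of vec(A) -> vec(A^{-1}), i.e. -(A^{-1})' kron A^{-1}."""
    a_inv = np.linalg.inv(a)
    return -np.kron(a_inv.T, a_inv)

rng = np.random.default_rng(3)
p = 3
b = rng.normal(size=(p, p))
q_xx = b @ b.T + p * np.eye(p)       # positive definite stand-in for Q_xx
z = rng.normal(size=(p, p))
z = (z + z.T) / 2                    # symmetric direction, stand-in for Z_xx

# Directional derivative via the Jacobian, reshaped back to p x p.
via_jacobian = (inverse_jacobian(q_xx) @ vec(z)).reshape(p, p, order="F")
q_inv = np.linalg.inv(q_xx)
closed_form = -q_inv @ z @ q_inv

# Central finite difference of A -> A^{-1} in the direction z.
eps = 1e-6
finite_diff = (np.linalg.inv(q_xx + eps * z) - np.linalg.inv(q_xx - eps * z)) / (2 * eps)
```

The agreement of the three quantities reflects the identity $\mathrm{vec}(AXB) = (B' \otimes A)\mathrm{vec}(X)$ applied to $-Q_{xx}^{-1}\mathcal{Z}_{xx}Q_{xx}^{-1}$.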

Then, from (A.39), (A.40), (A.56), (A.59) and the CMT

n1/2vec(J −Gn) = n−1/2n∑i=1

vec(ZiY′i − EP [Z∗i Z

∗′i ]Γ )− [C ′1, . . . , C

′d]′Σ−1n1/2m

= vec((Zww −ZwxQ−1

xxQxw −QwxQ−1xxZxw +QwxQ

−1xxZxxQ−1

xxQxw)Γ + Zzv)

− [C ′P1, . . . , C′Pd]′Σ−1P Zzu + op(1),

= vec(Gh)− [C ′P1, . . . , C′Pd]′Σ−1P Zzu + op(1). (A.60)

A further result in Lemma 16.4 of Andrews and Guggenberger (2019b) is as follows:

n1/2[m, J −Gn, HnJUnTn]dπ−→ [mh, Jh,∆h], (A.61)

where (Jh,∆h) and mh are independent, mh

vec(Jh)

∼ N0k×(d+1),

h5,m 0k×dk

0k×dk Φh

, Φh ≡ limn→∞

(ΣGn − C ′nΣ−1n Cn)

whenever the limit exists, and

∆h ≡ [∆h,q,∆h,d−q] ∈ Rk×d, ∆h,q ≡ h3,q ∈ Rk×q, ∆h,d−q ≡ h3h1,d−q + h−1/25,m Jhh91h2,d−q ∈ Rk×(d−q),

h1,d−q ≡

0q×(d−q)

diag(h1,q+1, . . . , h1,d)

0(k−d)×(d−q)

, h91 ≡ ((θ0, Id)Ωε(h9)(θ0, Id)

′)1/2, Ω = Ω(Σ,K(V)). (A.62)

A.5 Asymptotic distributions of key permutation quantities

In this section, we derive the asymptotic distributions of key quantities that underlie the permutation test

statistics under the drifting sequence of parameters.


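Conditioning events. We first describe the events on which we condition when deriving the asymptotic results for the permutation tests. From (A.60), for any $\varepsilon > 0$ we can find a constant $U_0$ and $n_0 \in \mathbb{N}$ such that for all $n \ge n_0$,

\[
\Omega_1 \equiv \{\omega \in \Omega_0 : \|n^{1/2}(J - G_n)\| < U_0\}, \quad P[\Omega_1^c] < \varepsilon/2.
\tag{A.63}
\]

We shall establish that for $PR \in \{\mathrm{PAR}_1, \mathrm{PAR}_2, \mathrm{PLM}, \mathrm{PCLR}_a, \mathrm{PCLR}_b\}$ and all $\omega \in \Omega_1$

\[
\sup_{x\in\mathbb{R}} |P^\pi[PR \le x] - P[PR_\infty \le x]| \overset{p}{\longrightarrow} 0,
\tag{A.64}
\]

where $PR_\infty$ is the limiting random variable. When (A.64) holds, for any $\varepsilon, \varepsilon_1 > 0$, there exists $n_1 \in \mathbb{N}$ such that $P[\sup_{x\in\mathbb{R}} |P^\pi[PR \le x] - P[PR_\infty \le x]| > \varepsilon_1] \le \varepsilon/2$ for all $\omega \in \Omega_1$ and $n \ge n_1$. Then, by the Boole-Bonferroni inequality, for all $n \ge \max\{n_0, n_1\}$

\[
\begin{aligned}
P[\omega \in \Omega_0 : |P^\pi[PR \le x] - P[PR_\infty \le x]| > \varepsilon_1] &\le P[\omega \in \Omega_1 : |P^\pi[PR \le x] - P[PR_\infty \le x]| > \varepsilon_1] + P[\Omega_1^c] \\
&\le \varepsilon.
\end{aligned}
\tag{A.65}
\]

Since $\varepsilon > 0$ is arbitrary, it will follow that

\[
\sup_{x\in\mathbb{R}} |P^\pi[PR \le x] - P[PR_\infty \le x]| \overset{p}{\longrightarrow} 0.
\tag{A.66}
\]

This means that we can use distributional results such as that in (A.60) on the conditioning events to obtain the convergence in $P^\pi$-distribution in $P$-probability.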

The following lemma is a permutation analog of Lemma 16.4 (see also (A.61)) of Andrews and Guggen-

berger (2019b).

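Lemma A.6. Under all sequences $\{\lambda_n \in \Lambda : n \ge 1\}$,

\[
n^{1/2}[m^\pi, J^\pi - G, H_nJ^\pi U_nT_n] \overset{d^\pi}{\longrightarrow} [m_h^\pi, J_h^\pi, \Delta_h^\pi] \quad \text{in } P\text{-probability},
\]

where

(a)
\[
\begin{bmatrix} m_h^\pi \\ \mathrm{vec}(J_h^\pi) \end{bmatrix} \sim N\left[0^{k\times(d+1)}, \begin{bmatrix} \sigma^2 Q_{zz} & 0^{k\times dk} \\ 0^{dk\times k} & \Phi_h^\pi \end{bmatrix}\right], \quad \Phi_h^\pi \equiv (\Sigma^{V} - \Sigma^{Vu}(\sigma^2)^{-1}\Sigma^{uV}) \otimes Q_{zz},
\]

(b)
\[
\begin{aligned}
\Delta_h^\pi &\equiv [\Delta_{h,q}^\pi, \Delta_{h,d-q}^\pi] \in \mathbb{R}^{k\times d}, \quad \Delta_{h,q}^\pi \equiv h_{3,q} \in \mathbb{R}^{k\times q}, \\
\Delta_{h,d-q}^\pi &\equiv h_3h_{1,d-q} + h_{5,m}^{-1/2}(G_h + J_h^\pi)h_{91}h_{2,d-q} \in \mathbb{R}^{k\times(d-q)},
\end{aligned}
\]
where $G_h$ is defined in (A.59), and $h_{1,d-q}$ and $h_{91}$ are defined in (A.62),

(c) $(J_h^\pi, \Delta_h^\pi)$ and $m_h^\pi$ are independent,

(d) When $U_n \equiv I_d$ (as opposed to $U_P \equiv ((\theta_0, I_d)\,\Omega^{\varepsilon}(\Sigma_P, \mathcal{V}_P)^{-1}(\theta_0, I_d)')^{1/2}$ defined in Section A.3) with the eigenvalues, the orthogonal eigenvectors and the SVs of the matrices (A.5)-(A.7) defined accordingly as in Section A.3, the above results also hold with $\Delta_{h,d-q}^\pi \equiv h_3h_{1,d-q} + h_{5,m}^{-1/2}(G_h + J_h^\pi)h_{2,d-q}$, where $G_h$ is as defined in (A.59), and $h_{1,d-q}$ is as defined in (A.62),

(e) Under all subsequences $\{w_n\}$ and all sequences $\{\lambda_{w_n,n} : n \ge 1\} \in \Lambda$, the convergence results above hold with $n$ replaced with $w_n$.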

Proof of Lemma A.6. The proof is divided into three main steps. In the first step, we prove the multivariate

normality of some key permutation quantities. In the second step, we prove the consistency of the covariance

matrix estimates. The final step shows the desired result of the lemma.

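Step 1: Multivariate normality. By the triangular array SLLN (Lemma A.2), $n^{-1}W'W - E_P[W_iW_i'] \overset{a.s.}{\longrightarrow} 0$, $n^{-1}X'W - E_P[X_iW_i'] \overset{a.s.}{\longrightarrow} 0$ and $n^{-1}X'X - E_P[X_iX_i'] \overset{a.s.}{\longrightarrow} 0$, and by (A.23) $E_P[W_iW_i'] \to Q_{ww}$, $E_P[X_iW_i'] \to Q_{xw}$, and $E_P[X_iX_i'] \to Q_{xx}$. Thus,

\[
E_P[Z_i^* Z_i^{*\prime}] \to Q_{ww} - Q_{wx}Q_{xx}^{-1}Q_{xw} \equiv Q_{zz},
\tag{A.67}
\]

and by the CMT

\[
n^{-1}Z'Z = n^{-1}W'W - n^{-1}W'X(X'X)^{-1}X'W \overset{a.s.}{\longrightarrow} Q_{zz}.
\tag{A.68}
\]

Note that $n^{1/2}(J_s^\pi - n^{-1}Z'Z\Gamma_s) = n^{-1/2}Z'V_{\pi,s} - C_s^\pi(\Sigma^\pi)^{-1}n^{1/2}m^\pi$, $s = 1, \dots, d$, where $\Gamma_s = (Z'Z)^{-1}Z'Y_{,s}$ and $V_{\pi,s}$ is the $s$-th column of $V_\pi$. Next we will show that

\[
\begin{bmatrix} n^{-1/2}Z'M_\iota u_\pi(\theta_0) \\ n^{-1/2}\mathrm{vec}(Z'M_\iota V_\pi) \end{bmatrix} \overset{d^\pi}{\longrightarrow} \begin{bmatrix} m_h^\pi \\ G_h^\pi \end{bmatrix} \sim N[0, \Sigma^{e} \otimes Q_{zz}] \quad P\text{-almost surely.}
\tag{A.69}
\]

Using the Cramér-Wold device, it suffices to prove that for any $t \equiv [t_0', t_1', \dots, t_d']'$ with $t_l \in \mathbb{R}^k$, $l = 0, \dots, d$,

\[
t'\begin{bmatrix} n^{-1/2}Z'M_\iota u_\pi(\theta_0) \\ n^{-1/2}\mathrm{vec}(Z'M_\iota V_\pi) \end{bmatrix} \overset{d^\pi}{\longrightarrow} N[0, t'(\Sigma^{e} \otimes Q_{zz})t] \quad P\text{-almost surely.}
\tag{A.70}
\]

Rewrite

\[
\begin{aligned}
t'\begin{bmatrix} n^{-1/2}Z'M_\iota u_\pi(\theta_0) \\ n^{-1/2}\mathrm{vec}(Z'M_\iota V_\pi) \end{bmatrix} &= n^{-1/2}\sum_{i=1}^{n} \big(t_0'Z_iu_{\pi(i)}(\theta_0) + t_1'Z_iV_{\pi(i)1} + \cdots + t_d'Z_iV_{\pi(i)d}\big) \\
&= n^{-1/2}\sum_{i=1}^{n} Z_i'[t_0, t_1, \dots, t_d][u_{\pi(i)}(\theta_0), V_{\pi(i)1}, \dots, V_{\pi(i)d}]' \\
&= n^{-1/2}\sum_{i=1}^{n} Z_i'\zeta_{\pi(i)},
\end{aligned}
\]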


where $\zeta_i \equiv [t_0, t_1, \dots, t_d][u_i(\theta_0), V_{i1}, \dots, V_{id}]'$, $i = 1, \dots, n$. We verify the condition (A.2) of Lemma A.1 with $c_n(i,j) \equiv n^{-1/2}Z_i'\zeta_j$. Since

\[
E_{P^\pi}\Big[n^{-1/2}\sum_{i=1}^{n} t_0'Z_iu_{\pi(i)}(\theta_0) \,\Big|\, y, Y, W, X\Big] = n^{1/2}\Big(n^{-1}\sum_{i=1}^{n} t_0'Z_i\Big)\Big(n^{-1}\sum_{i=1}^{n} u_i(\theta_0)\Big) = 0,
\]

\[
E_{P^\pi}\Big[n^{-1/2}\sum_{i=1}^{n} t_l'Z_iV_{\pi(i)l} \,\Big|\, y, Y, W, X\Big] = n^{1/2}\Big(n^{-1}\sum_{i=1}^{n} t_l'Z_i\Big)\Big(n^{-1}\sum_{i=1}^{n} V_{il}\Big) = 0, \quad l = 1, \dots, d,
\]

we have $E_{P^\pi}[S_n \mid y, Y, W, X] = 0$ and $g_n(i,j) = c_n(i,j)$ in Lemma A.1. Next we will show that

n−1e′e = n−1n∑i=1

eie′i = n−1

u(θ0)′u(θ0) u(θ0)′V

V ′u(θ0) V ′V

a.s.−→ Σe, (A.71)

or equivalently

n−1V ′Va.s.−→ ΣV , (A.72)

n−1V u(θ0)a.s.−→ ΣV u, (A.73)

n−1n∑i=1

ui(θ0)2 a.s.−→ σ2. (A.74)

By the SLLN (Lemma A.2), n−1W ′V −EP [WiV′i ]

a.s.−→ 0, hence n−1W ′Va.s.−→ 0 and similarly n−1X ′V

a.s.−→ 0.

Combining these with n−1W ′Xa.s.−→ Qwx and n−1X ′X

a.s.−→ Qxx, and using the CMT,

n−1Z ′V = n−1W ′V − n−1W ′X(n−1X ′X)−1n−1X ′Va.s.−→ 0. (A.75)

Thus, on noting that V = Y − ZΓ −Xξ = (In − PZ − PX)V , by the SLLN and CMT

n−1V ′V − EP [ViV′i ] = n−1V ′V − n−1V ′PZV − n−1V ′PXV − EP [ViV

′i ]

a.s.−→ 0

and EP [ViV′i ] → ΣV , we have n−1V ′V

a.s.−→ ΣV which verifies (A.72). Analogously to (A.75), n−1Z ′ua.s.−→ 0

and n−1X ′ua.s.−→ 0. Then, (A.73) follows using (A.68) and on noting that

n−1V u(θ0)− EP [Viui] = n−1V ′u(θ0)− n−1V ′PZ u(θ0)− n−1V ′PX u(θ0)− EP [Viui]

= n−1V ′u− n−1V ′X(X ′X)−1X ′u− EP [Viui]− n−1V ′Z(Z ′Z)−1Z ′u

a.s.−→ 0,


and EP [Viui]→ ΣV u. (A.74) holds because n−1∑ni=1 ui(θ0)2−EP [u2

i ] = n−1u′u−EP [u2i ]−n−1u′X(X ′X)−1X ′u

a.s.−→

0 by the SLLN and CMT, and σ2n ≡ EP [u2

i ]→ σ2. Furthermore, note that

D ≡ 1

n− 1

n∑i=1

n∑j=1

g2n(i, j)

=1

n(n− 1)

n∑i=1

n∑j=1

ζ ′iZjZ′jζi

=1

n− 1

n∑i=1

ζ ′i

n−1n∑j=1

ZjZ′j

ζi

=1

n− 1

n∑i=1

ζ ′i(Qzz + oa.s.(1))ζi

=1

n− 1

n∑i=1

ζ ′iQzzζi +1

n− 1

n∑i=1

ζ ′iζioa.s.(1)

=1

n− 1

n∑i=1

t′(ei ⊗ Ik)Qzz(e′i ⊗ Ik)t+

1

n− 1

n∑i=1

ζ ′iζioa.s.(1)

=1

n− 1

n∑i=1

t′(eie′i ⊗Qzz)t+ oa.s.(1)

= t′(Σe ⊗Qzz)t+ oa.s.(1), (A.76)

where the third equality follows from (A.68), the second to the last equality uses (A.71) and the fact that $n^{-1}\sum_{i=1}^{n} \zeta_i'\zeta_i = n^{-1}\sum_{i=1}^{n} t'(e_ie_i' \otimes I_k)t \overset{a.s.}{\longrightarrow} t'(\Sigma^{e} \otimes I_k)t$, which follows from (A.71) and Slutsky's lemma.
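The Kronecker-product rewriting used in (A.76), namely $\zeta_i'Q_{zz}\zeta_i = t'(e_ie_i' \otimes Q_{zz})t$ with $\zeta_i = [t_0, \dots, t_d]e_i$ and $t = [t_0', \dots, t_d']'$, can be confirmed numerically. In the sketch below all inputs are randomly generated placeholders (the names `T`, `e` and `Q` are illustrative, not the paper's data).

```python
import numpy as np

rng = np.random.default_rng(4)
k, d = 3, 2

T = rng.normal(size=(k, d + 1))      # columns are t_0, ..., t_d
t = T.reshape(-1, order="F")         # stacked vector [t_0', ..., t_d']'
e = rng.normal(size=d + 1)           # placeholder for e_i = [u_i(theta_0), V_i']'
b = rng.normal(size=(k, k))
Q = b @ b.T                          # symmetric placeholder for Q_zz

zeta = T @ e                         # zeta_i = [t_0, ..., t_d] e_i
lhs = zeta @ Q @ zeta
rhs = t @ np.kron(np.outer(e, e), Q) @ t
```

The identity follows from $[t_0, \dots, t_d]e_i = (e_i' \otimes I_k)t$ and the mixed-product rule for Kronecker products.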

By the triangular array SLLN,

n−1n∑i=1

‖Vi‖2+δ = Oa.s.(1), n−1n∑i=1

‖Xi‖2+δ = Oa.s.(1), n−1n∑i=1

‖Wi‖2+δ = Oa.s.(1), (A.77)

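\[
n^{-1}\sum_{i=1}^{n} \|Z_i\|^{2+\delta} \le 2^{1+\delta}\, n^{-1}\sum_{i=1}^{n} \big(\|W_i\|^{2+\delta} + \|X_i\|^{2+\delta}\|(X'X)^{-1}X'W\|^{2+\delta}\big) = O_{a.s.}(1).
\tag{A.78}
\]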

By the cr inequality and the triangular array SLLN,

n−1n∑i=1

|ui(θ0)|2+δ ≤ 21+δn−1n∑i=1

|ui|2+δ + 21+δn−1n∑i=1

‖Xi‖2+δ‖(X ′X)−1X ′u‖2+δ

= Oa.s.(1). (A.79)

Since V = (In−PZ−PX)V , Vi = Vi−Z ′i(Z ′Z)−1Z ′V −X ′i(X ′X)−1X ′V , and by Jensen’s inequality (applied

with the function f(x) = x2+δ, x > 0), (A.77) and (A.78)

n−1n∑i=1

‖Vi‖2+δ ≤ 31+δ

(n−1

n∑i=1

‖Vi‖2+δ + n−1n∑i=1

‖Zi‖2+δ‖(n−1Z ′Z)−1n−1Z ′V ‖2+δ

+ n−1n∑i=1

‖Xi‖2+δ‖(n−1X ′X)−1n−1X ′V ‖2+δ

)= Oa.s.(1). (A.80)


By the cr inequality, (A.79) and (A.80)

n−1n∑i=1

‖ζi‖2+δ ≤ 21+δ‖t0‖2+δn−1n∑i=1

|ui(θ0)|2+δ + 21+δ‖[t1, . . . , td]‖2+δn−1n∑i=1

‖Vi‖2+δ

= Oa.s.(1). (A.81)

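We show that for $D$ defined in (A.76) and any $\varepsilon > 0$

\[
\lim_{n\to\infty} n^{-2}\sum_{i=1}^{n}\sum_{j=1}^{n} \frac{|Z_i'\zeta_j|^2}{D}\, 1\!\left(\frac{n^{-1}|Z_i'\zeta_j|^2}{D} > \varepsilon^2\right) = 0.
\tag{A.82}
\]

Note that

\[
\begin{aligned}
n^{-2}\sum_{i=1}^{n}\sum_{j=1}^{n} \frac{|Z_i'\zeta_j|^2}{D}\, 1\!\left(\frac{n^{-1}|Z_i'\zeta_j|^2}{D} > \varepsilon^2\right) &= n^{-2}D^{-1}\sum_{i=1}^{n}\sum_{j=1}^{n} \frac{|Z_i'\zeta_j|^{2+\delta}}{|Z_i'\zeta_j|^{\delta}}\, 1\!\left(|Z_i'\zeta_j|^{\delta} > \varepsilon^{\delta}D^{\delta/2}n^{\delta/2}\right) \\
&\le n^{-\delta/2}\varepsilon^{-\delta}D^{-1-\delta/2}\, n^{-2}\sum_{i=1}^{n}\sum_{j=1}^{n} |Z_i'\zeta_j|^{2+\delta} \\
&\le n^{-\delta/2}\varepsilon^{-\delta}D^{-1-\delta/2}\, n^{-1}\sum_{i=1}^{n} \|Z_i\|^{2+\delta}\, n^{-1}\sum_{i=1}^{n} \|\zeta_i\|^{2+\delta} \overset{a.s.}{\longrightarrow} 0,
\end{aligned}
\tag{A.83}
\]

where the convergence uses (A.78) and (A.81). Thus, the Lindeberg-type condition in (A.2) is satisfied. Lemma A.1 and (A.76) yield (A.70).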

Step 2: Consistency of the covariance matrix estimates. Next we determine the limits of

Cπs , s = 1, . . . , d, and Σπ. Note first that by Lemma S.3.4 of DiCiccio and Romano (2017) (see also Hoeffding

(1951)),

EPπ [Σπ|y, Y,W,X] = EPπ

[n−1

n∑i=1

ZiZ′iuπ(i)(θ0)2|y, Y,W,X

]

= (n−1Z ′Z)(n−1n∑i=1

ui(θ0)2)a.s.−→ σ2Qzz, (A.84)

where the convergence follows from the CMT, (A.68) and (A.74). On noting that n−1X ′up−→ 0 and

n−1∑ni=1 u

4i − EP [u4

i ]p−→ 0, by Minkowski’s inequality, the triangular array WLLN and the CMT,

n−1n∑i=1

‖Zi‖4 = n−1n∑i=1

‖Wi −W ′X(X ′X)−1Xi‖4

≤ 8n−1n∑i=1

‖Wi‖4 + ‖W ′X(X ′X)−1‖48n−1n∑i=1

‖Xi‖4

= Op(1), (A.85)

n−1n∑i=1

ui(θ0)4 ≤ 8n−1n∑i=1

u4i + 8n−1

n∑i=1

‖Xi‖4‖(X ′X)−1X ′u‖4 = Op(1), (A.86)

hence

n−1n∑i=1

Z2ijZ

2il ≤

(n−1

n∑i=1

Z4ij

)1/2(n−1

n∑i=1

Z4il

)1/2

= Op(1) j, l = 1, . . . , k. (A.87)


Therefore, letting Σπjl denote the (j, l), j, l = 1, . . . , k, element of Σπ, (A.86) and (A.87) give

VarPπ [Σπjl|y, Y,W,X] = VarPπ

[n−1

n∑i=1

ZijZiluπ(i)(θ0)2|y, Y,W,X

]

=1

n− 1

(n−1

n∑i=1

Z2ijZ

2il − (n−1

n∑i=1

ZijZil)2

)(n−1

n∑i=1

ui(θ0)4 − (n−1n∑i=1

ui(θ0)2)2

)

≤ 1

n− 1

(n−1

n∑i=1

Z2ijZ

2il

)(n−1

n∑i=1

ui(θ0)4

)p−→ 0, (A.88)

where the second equality follows from Lemma S.3.4 of DiCiccio and Romano (2017). Combining (A.84) and

(A.88), and using Chebyshev’s inequality, Pπ[|Σπjl−EPπ [Σjl|y, Y,W,X]| > ε] ≤ VarPπ [Σπjl|y, Y,W,X]/ε2p−→

0. Thus Σπ − EPπ [Σπ|y, Y,W,X]pπ−→ 0 in P -probability. Combining this with (A.84) and using the triangle

inequality, ‖Σπ−σ2Qzz‖ ≤ ‖EPπ [Σπ|y, Y,W,X]−σ2Qzz‖+‖Σπ−EPπ [Σπ|y, Y,W,X]‖ pπ−→ 0 in P -probability,

hence

n−1n∑i=1

ZiZ′iuπ(i)(θ0)2 pπ−→ σ2Qzz in P -probability. (A.89)
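The conditional mean and variance formulas used in (A.84) and (A.88) are instances of the moments of a permuted linear statistic $n^{-1}\sum_i a_i b_{\pi(i)}$. A minimal sketch that checks them by enumerating all permutations is given below; the vectors `a` and `b` are randomly generated placeholders playing the roles of $Z_{ij}Z_{il}$ and $u_i(\theta_0)^2$, respectively.

```python
import itertools
import numpy as np

def permutation_mean_var(a, b):
    """Exact mean and variance of n^{-1} sum_i a_i b_{pi(i)} over all permutations."""
    n = len(a)
    vals = np.array([np.mean(a * b[list(p)])
                     for p in itertools.permutations(range(n))])
    return vals.mean(), vals.var()

rng = np.random.default_rng(5)
n = 6
a = rng.normal(size=n)               # placeholder for Z_ij * Z_il
b = rng.normal(size=n) ** 2          # placeholder for u_i(theta_0)^2

exact_mean, exact_var = permutation_mean_var(a, b)

# Closed forms: product of sample means, and product of sample variances / (n - 1).
formula_mean = a.mean() * b.mean()
formula_var = ((np.mean(a ** 2) - a.mean() ** 2)
               * (np.mean(b ** 2) - b.mean() ** 2) / (n - 1))
```

The variance is of order $1/(n-1)$ times bounded sample moments, which is why Chebyshev's inequality delivers the convergence in (A.89).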

Furthermore, by (A.68), (A.73) and the CMT, for s = 1, . . . , d

EPπ [Cπs |y, Y,W,X] = EPπ

[n−1

n∑i=1

ZiZ′iVπ(i)suπ(i)(θ0)|y, Y,W,X

]= (n−1Z ′Z)(n−1

n∑i=1

Visui(θ0))a.s.−→ QzzΣ

V us ,

(A.90)

where ΣV us denotes the s-th element of ΣV u. Since Vi = Vi − Z ′i(Z ′Z)−1Z ′V − X ′i(X ′X)−1X ′V , applying

Minkowski’s inequality twice and using n−1∑ni=1 ‖Vi‖4 = Op(1) and n−1

∑ni=1 ‖Xi‖4 = Op(1) which hold

by the triangular array WLLN,

n−1n∑i=1

‖Vi‖4 ≤ 8n−1n∑i=1

‖Vi‖4 + 64n−1n∑i=1

‖Zi‖4‖(n−1Z ′Z)−1n−1Z ′V ‖4

+ 64n−1n∑i=1

‖Xi‖4‖(n−1X ′X)−1n−1X ′V ‖4 = Op(1). (A.91)

Then,

VarPπ [Cπs |y, Y,W,X]

= VarPπ

[n−1

n∑i=1

ZijZilVπ(i)suπ(i)(θ0)|y, Y,W,X

]

=1

n− 1

n−1n∑i=1

Z2ijZ

2il −

(n−1

n∑i=1

ZijZil

)2n−1

n∑i=1

V 2isui(θ0)2 −

(n−1

n∑i=1

Visui(θ0)

)2

p−→ 0, (A.92)

where the convergence follows from (A.68), (A.73), (A.87), and

n−1n∑i=1

V 2isui(θ0)2 ≤

(n−1

n∑i=1

V 4is

)1/2(n−1

n∑i=1

ui(θ0)4

)1/2

= Op(1), (A.93)


which in turn holds by (A.86) and (A.91). Therefore, from (A.90), (A.92) and Chebyshev’s inequality

Cπs = n−1n∑i=1

ZiZ′iVπ(i)uπ(i)(θ0)

pπ−→ QzzΣV us in P -probability. (A.94)

Step 3: Completing the proof. Let Gπh = [Gπh,1, . . . , Gπh,d]. Using (A.69), (A.89) and (A.94) along

with the CMT, for s = 1, . . . , d

n1/2Jπs − n−1/2Z ′ZΓs

= n−1/2Z ′MιVπ,s −

(n−1

n∑i=1

ZiZ′iVπ(i)suπ(i)(θ0)

)(n−1

n∑i=1

ZiZ′iuπ(i)(θ0)2

)−1

n−1/2Z ′uπ(θ0)

dπ−→ Gπh,s − ΣV us (σ2)−1mπh in P -probability. (A.95)

This proves the part (a). Next we determine the limit of HnJπUnTn following Andrews and Guggenberger

(2019b). Write

n1/2HnJπUnTn = n1/2HnGnUnTn + n1/2Hn(G−Gn)UnTn + n1/2Hn(Jπ − G)UnTn. (A.96)

Using (A.26), (A.27), (A.28), (A.29), (A.32) and the SVD HnGnUn = AnΥnB′n, the first term on the

right-hand side of the equation (A.96) can be expressed as

n1/2HnGnUnTn = n1/2HnGnUnBnSn

= n1/2HnGnUn(Bn,q, Bn,d−q)

Υ−1n,qn

−1/2 0q×(d−q)

0(d−q)×q Id−q

= [HnGnUnBn,qΥ

−1n,q, n

1/2HnGnUnBn,d−q]

= [AnΥnB′nBn,qΥ

−1n,q, n

1/2HnGnUnBn,d−q]

= [An,q, n1/2HnGnUnBn,d−q], (A.97)

where we used ΥnB′nBn,qΥ

−1n,q = ΥnB

′nBn[Iq, 0q×(d−q)]

′Υ−1n,q = [Iq, 0q×(d−q)]

′. For the second sub-matrix in

(A.97),

n1/2HnGnUnBn,d−q = n1/2AnΥnB′nBn,d−q

= n1/2AnΥn[0(d−q)×q, Id−q]′

= An[0(d−q)×q, n1/2Υn,d−q, 0(d−q)×(k−d)]

′

→ h3[0(d−q)×q,diag(h1,q+1, . . . , h1,d), 0(d−q)×(k−d)]′

= h3h1,d−q, (A.98)

where h1,d−q is defined in (A.62). Furthermore,

n1/2Hn(G−Gn)UnTn = n1/2Hn(G−Gn)UnBnSn

= n1/2Hn(G−Gn)Un[Bn,q, Bn,d−q]

Υ−1n,qn

−1/2 0q×(d−q)

0(d−q)×q Id−q

= [Hn(G−Gn)UnBn,qΥ

−1n,q, n

1/2Hn(G−Gn)UnBn,d−q]. (A.99)


Using Hn → Σ−1/2, (A.59), Un → h91 where h91 is defined in (A.62), Bn,q → h3,q, and n1/2τjn →∞ for all

j ≤ q,

Hn(G−Gn)UnBn,qΥ−1n,q = Hnn

1/2(G−Gn)UnBn,q(n1/2Υn,q)

−1 = op(1). (A.100)

Since Bn,d−q → h2,d−q,

n1/2Hn(G−Gn)UnBn,d−qd−→ Σ−1/2Ghh91h2,d−q. (A.101)

Using (A.100) and (A.101) in (A.99),

n1/2Hn(G−Gn)UnTnd−→ [0k×q,Σ

−1/2Ghh91h2,d−q]. (A.102)

Moreover, rewrite the third term on the right-hand side of the equation (A.96) as

n1/2Hn(Jπ − G)UnTn = n1/2Hn(Jπ − G)UnBnSn (A.103)

= n1/2Hn(Jπ − G)Un[Bn,q, Bn,d−q]

Υ−1n,qn

−1/2 0q×(d−q)

0(d−q)×q Id−q

= [Hn(Jπ − G)UnBn,qΥ

−1n,q, n

1/2Hn(Jπ − G)UnBn,d−q]. (A.104)

In addition, by the part (a) of Lemma A.6, n1/2(Jπ − G)dπ−→ Jπh in P -probability. Using an argument

analogous to those in (A.100) and (A.101) and Lemma A.3,

Hn(Jπ − G)UnBn,qΥ−1n,q = opπ (1) in P -probability, (A.105)

n1/2Hn(Jπ − G)UnBn,d−qdπ−→ Σ−1/2Jπhh91h2,d−q in P -probability. (A.106)

Using (A.105) and (A.106) in (A.104),

n1/2Hn(Jπ − G)UnTndπ−→ [0k×q,Σ

−1/2Jπhh91h2,d−q] in P -probability. (A.107)

Combining (A.97), (A.98), (A.102), (A.107), and using An,q → h3,q = ∆πh,q and an argument similar to that

in (A.66) in (A.96), we obtain

n1/2HnJπUnTn

dπ−→ (∆h,q, h3h1,d−q + Σ−1/2(Gh + Jπh )h91h2,d−q) in P -probability.

This proves part (b). Since $\Delta^\pi_{h,d-q}$ is a deterministic function of $J^\pi_h$, part (c) follows. Part (d) follows similarly to (c). Part (e) is also analogous to the parts above.

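Define

\[
J^+ \equiv [J,\ H^{-1}(\Sigma^\pi)^{-1/2}m^\pi] \in \mathbb{R}^{k\times(d+1)},
\tag{A.108}
\]

\[
U^+ \equiv \begin{bmatrix} U & 0_{d\times 1} \\ 0_{1\times d} & 1 \end{bmatrix} \in \mathbb{R}^{(d+1)\times(d+1)}, \quad
U^+_n \equiv \begin{bmatrix} U_n & 0_{d\times 1} \\ 0_{1\times d} & 1 \end{bmatrix} \in \mathbb{R}^{(d+1)\times(d+1)},
\tag{A.109}
\]

\[
h^+_{91} \equiv \begin{bmatrix} h_{91} & 0_{d\times 1} \\ 0_{1\times d} & 1 \end{bmatrix} \in \mathbb{R}^{(d+1)\times(d+1)}, \quad
B^+_n \equiv \begin{bmatrix} B_n & 0_{d\times 1} \\ 0_{1\times d} & 1 \end{bmatrix} \in \mathbb{R}^{(d+1)\times(d+1)},
\tag{A.110}
\]

where $h_{91}$ is as defined in (A.62). Also, let

\[
\begin{aligned}
&B^+_n = [B^+_{n,q}, B^+_{n,d+1-q}], \quad B^+_{n,q} \in \mathbb{R}^{(d+1)\times q}, \quad B^+_{n,d+1-q} \in \mathbb{R}^{(d+1)\times(d+1-q)}, \\
&J^+_n \equiv [G_n, 0_{k\times 1}] \in \mathbb{R}^{k\times(d+1)}, \quad \Upsilon^+_n \equiv [\Upsilon_n, 0_{k\times 1}] \in \mathbb{R}^{k\times(d+1)}, \\
&S^+_n \equiv \mathrm{diag}((n^{1/2}\tau_{1n})^{-1}, \dots, (n^{1/2}\tau_{qn})^{-1}, 1, \dots, 1) = \begin{bmatrix} S_n & 0_{d\times 1} \\ 0_{1\times d} & 1 \end{bmatrix} \in \mathbb{R}^{(d+1)\times(d+1)}.
\end{aligned}
\]

Let $\kappa^+_1 \ge \dots \ge \kappa^+_{d+1} \ge 0$ denote the eigenvalues of

\[
nU^{+\prime}J^{+\prime}H'HJ^+U^+.
\tag{A.111}
\]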

Next we state a result analogous to Theorem 16.6 of Andrews and Guggenberger (2019b).

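Theorem A.7 (Asymptotic distribution of the PCLR statistic). Under all sequences $\{\lambda_{n,h} : n \ge 1\}$ with $\lambda_{n,h} \in \Lambda$,

\[
\mathrm{PCLR} \xrightarrow{d^\pi} \mathrm{PCLR}_\infty \equiv Z^{\pi\prime}Z^\pi - \lambda_{\min}\left[(\Delta_{h,d-q}, Z^\pi)'h_{3,k-q}h_{3,k-q}'(\Delta_{h,d-q}, Z^\pi)\right] \text{ in } P\text{-probability},
\]

where $Z^\pi \equiv (\sigma^2Q_{zz})^{-1/2}m^\pi_h \sim N[0, I_k]$, and the convergence holds jointly with the convergence in Lemma A.6. If $q = d$, $\mathrm{PCLR}_\infty = Z^{\pi\prime}h_{3,d}h_{3,d}'Z^\pi \sim \chi^2_d$. Under all subsequences $\{w_n\}$ and all sequences $\{\lambda_{w_n,h} : n \ge 1\}$ with $\lambda_{w_n,h} \in \Lambda$, the above results hold with $n$ replaced with $w_n$.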

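Proof of Theorem A.7. By Lemma A.6, $n^{1/2}m^\pi \xrightarrow{d^\pi} m^\pi_h \sim N[0_{k\times 1}, \sigma^2Q_{zz}]$ in $P$-probability and by (A.89) $\Sigma^\pi \xrightarrow{p^\pi} \sigma^2Q_{zz}$ in $P$-probability. Thus, by Slutsky's lemma

\[
\mathrm{PAR}_2 \xrightarrow{d^\pi} m^{\pi\prime}_h(\sigma^2Q_{zz})^{-1}m^\pi_h = Z^{\pi\prime}Z^\pi \sim \chi^2_k \text{ in } P\text{-probability}.
\tag{A.112}
\]

By definition, $HJ^+U^+ = [HJU,\ (\Sigma^\pi)^{-1/2}m^\pi]$. As defined in (A.111), the eigenvalues of $nU^{+\prime}J^{+\prime}H'HJ^+U^+$ are $\kappa^+_1, \dots, \kappa^+_{d+1}$. Then,

\[
\kappa^+_{d+1} = \lambda_{\min}[n(HJU, (\Sigma^\pi)^{-1/2}m^\pi)'(HJU, (\Sigma^\pi)^{-1/2}m^\pi)] = \lambda_{\min}[nU^{+\prime}J^{+\prime}H'HJ^+U^+].
\]

Now premultiply the determinantal equation

\[
|nU^{+\prime}J^{+\prime}H'HJ^+U^+ - \kappa I_{d+1}| = 0
\]

by $|S^{+\prime}_nB^{+\prime}_nU^{+\prime}_n(U^+)^{-1\prime}|$ and postmultiply by $|(U^+)^{-1}U^+_nB^+_nS^+_n|$ to obtain

\[
|Q^+(\kappa)| = 0, \quad Q^+(\kappa) \equiv nS^{+\prime}_nB^{+\prime}_nU^{+\prime}_nJ^{+\prime}H'HJ^+U^+_nB^+_nS^+_n - \kappa S^{+\prime}_nB^{+\prime}_nU^{+\prime}_n(U^+)^{-1\prime}(U^+)^{-1}U^+_nB^+_nS^+_n.
\tag{A.113}
\]

Since $|S^+_n|, |B^+_n|, |U^+_n| > 0$ and $|U^+| > 0$ with $P^\pi$-probability approaching 1 (wppa 1), the eigenvalues $\kappa^+_j$, $1 \le j \le d+1$, above also solve $|Q^+(\kappa)| = 0$ wppa 1.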
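The invariance of the roots under pre- and postmultiplication by a nonsingular matrix, which is what (A.113) relies on, can be illustrated numerically. This is a minimal sketch under stated assumptions: `M` is a generic symmetric positive semidefinite matrix standing in for $nU^{+\prime}J^{+\prime}H'HJ^+U^+$, and `C` is a hypothetical nonsingular matrix standing in for $(U^+)^{-1}U^+_nB^+_nS^+_n$.

```python
import numpy as np

rng = np.random.default_rng(1)
m = 4                                    # plays the role of d + 1

X = rng.standard_normal((7, m))
M = X.T @ X                              # symmetric PSD stand-in for the matrix in (A.111)

# Nonsingular C = (orthogonal) x (positive diagonal), loosely mimicking B_n^+ S_n^+
Qmat, _ = np.linalg.qr(rng.standard_normal((m, m)))
C = Qmat @ np.diag([1.0, 0.5, 0.25, 2.0])

kappa = np.linalg.eigvalsh(M)            # roots of |M - kappa I| = 0, ascending


def transformed_det(kap):
    """Determinant of C'MC - kap * C'C, the analogue of |Q+(kappa)|."""
    return np.linalg.det(C.T @ M @ C - kap * (C.T @ C))


# Roots of the transformed equation are the eigenvalues of (C'C)^{-1} C'MC
transformed_roots = np.sort(
    np.linalg.eigvals(np.linalg.solve(C.T @ C, C.T @ M @ C)).real
)

print(np.allclose(transformed_roots, kappa))
```

Because $|C'(M - \kappa I)C| = |C|^2|M - \kappa I|$, the two determinantal equations share their roots whenever $|C| \neq 0$, which holds wppa 1 in the proof.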

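Letting $S_{n,q} \equiv \mathrm{diag}((n^{1/2}\tau_{1n})^{-1}, \dots, (n^{1/2}\tau_{qn})^{-1}) = (n^{1/2}\Upsilon_{n,q})^{-1} \in \mathbb{R}^{q\times q}$,

\[
B^+_nS^+_n = [B^+_{n,q}S_{n,q},\ B^+_{n,d+1-q}].
\tag{A.114}
\]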


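Thus, $HJ^+U^+_nB^+_nS^+_n = [HJ^+U^+_nB^+_{n,q}S_{n,q},\ HJ^+U^+_nB^+_{n,d+1-q}]$, $n^{1/2}H_nJ^+U^+_nB^+_{n,q}S_{n,q} = n^{1/2}H_nJU_nB_{n,q}S_{n,q}$, and by the SVD, $H_nJ_nU_n = A_n\Upsilon_nB_n'$, where $\Upsilon_n = [\mathrm{diag}(\tau_{1n}, \dots, \tau_{dn}), 0_{d\times(k-d)}]'$. Since $A_n = [A_{n,q}, A_{n,k-q}]$, $A_{n,q} \in \mathbb{R}^{k\times q}$, $A_{n,k-q} \in \mathbb{R}^{k\times(k-q)}$,

\[
\begin{aligned}
n^{1/2}HJ^+U^+_nB^+_{n,q}S_{n,q} &= n^{1/2}HH_n^{-1}H_nJU_nB_{n,q}S_{n,q} \\
&= n^{1/2}HH_n^{-1}H_nG_nU_nB_{n,q}S_{n,q} + n^{1/2}HH_n^{-1}H_n(J - G_n)U_nB_{n,q}S_{n,q} \\
&= HH_n^{-1}A_n\Upsilon_nB_n'B_n[I_q, 0_{q\times(d-q)}]'\Upsilon_{n,q}^{-1} + n^{1/2}HH_n^{-1}H_n(J - G_n)U_nB_{n,q}(n^{1/2}\Upsilon_{n,q})^{-1} \\
&= A_n[I_q, 0_{q\times(d-q)}]' + o_{p^\pi}(1) \\
&= A_{n,q} + o_{p^\pi}(1) \\
&\xrightarrow{p^\pi} h_{3,q} = \Delta_{h,q} \text{ in } P\text{-probability},
\end{aligned}
\tag{A.115}
\]

where the third equality holds by the definitions of $S_{n,q}$, $\Upsilon_n$ and $\Upsilon_{n,q}$, the fourth equality holds by $B_n'B_n = I_d$, $J - G_n = O_p(n^{-1/2})$ (shown in (A.60)), $HH_n^{-1} \xrightarrow{p} I_k$ which in turn holds by (A.40), $H_n \to \Sigma^{-1/2}$ and the CMT, $\|H_n\| < \infty$, $U_n = O(1)$ by definition, $B_n = O(1)$ by the orthogonality of $B_n$, and $n^{1/2}\tau_{jn} \to \infty$ for all $j \le q$, and the convergence uses the definition of $\Delta_{h,q}$ and Lemma A.3. Using (A.60), (A.98), $B_{n,d-q} \to h_{2,d-q}$, $H_n \to h_{5,m}^{-1/2}$ and $U_n \to h_{91}$,

\[
\begin{aligned}
n^{1/2}H_nJU_nB_{n,d-q} &= n^{1/2}H_nG_nU_nB_{n,d-q} + n^{1/2}H_n(J - G_n)U_nB_{n,d-q} \\
&\xrightarrow{d} h_3h_{1,d-q} + h_{5,m}^{-1/2}J_hh_{91}h_{2,d-q} = \Delta_{h,d-q}.
\end{aligned}
\tag{A.116}
\]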

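Then, under $H_0 : \theta = \theta_0$, using (A.116) and an argument analogous to that in (A.65)

\[
\begin{aligned}
n^{1/2}H_nJ^+U^+_nB^+_{n,d+1-q} &= [n^{1/2}H_nJU_n,\ n^{1/2}H_nH^{-1}(\Sigma^\pi)^{-1/2}m^\pi]B^+_{n,d+1-q} \\
&= [n^{1/2}H_nJU_nB_{n,d-q},\ n^{1/2}H_nH^{-1}(\Sigma^\pi)^{-1/2}m^\pi] \\
&\xrightarrow{d^\pi} [\Delta_{h,d-q}, Z^\pi] \text{ in } P\text{-probability},
\end{aligned}
\tag{A.117}
\]

where the convergence holds by Slutsky's lemma, $n^{1/2}(\Sigma^\pi)^{-1/2}m^\pi \xrightarrow{d^\pi} Z^\pi$ in $P$-probability, $HH_n^{-1} \xrightarrow{p^\pi} I_k$ in $P$-probability and (A.116). Define

\[
E^+ = \begin{bmatrix} E^+_1 & E^+_2 \\ E^{+\prime}_2 & E^+_3 \end{bmatrix} \equiv B^{+\prime}_nU^{+\prime}_n(U^+)^{-1\prime}(U^+)^{-1}U^+_nB^+_n - I_{d+1},
\tag{A.118}
\]

where $E^+_1 \in \mathbb{R}^{q\times q}$, $E^+_2 \in \mathbb{R}^{q\times(d+1-q)}$ and $E^+_3 \in \mathbb{R}^{(d+1-q)\times(d+1-q)}$. By the CMT, (A.41) and (A.54), $U - U_n \xrightarrow{p} 0$ with $U_n$ positive definite, hence $(U^+)^{-1}U^+_n \xrightarrow{p} I_{d+1}$. Using the CMT again in conjunction with $B^{+\prime}_nB^+_n = I_{d+1}$, we have $E^+ \xrightarrow{p} 0_{(d+1)\times(d+1)}$ and by Lemma A.3

\[
E^+ \xrightarrow{p^\pi} 0_{(d+1)\times(d+1)} \text{ in } P\text{-probability}.
\tag{A.119}
\]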

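Using (A.115), (A.117), (A.118) and the fact that $\Delta_{h,q}'\Delta_{h,q} = h_{3,q}'h_{3,q} = \lim_{n\to\infty}A_{n,q}'A_{n,q} = I_q$ in (A.113) yields

\[
\begin{aligned}
Q^+(\kappa) &=
\begin{bmatrix}
nS_{n,q}'B^{+\prime}_{n,q}U^{+\prime}_nJ^{+\prime}H'HJ^+U^+_nB^+_{n,q}S_{n,q} & nS_{n,q}'B^{+\prime}_{n,q}U^{+\prime}_nJ^{+\prime}H'HJ^+U^+_nB^+_{n,d+1-q} \\
nB^{+\prime}_{n,d+1-q}U^{+\prime}_nJ^{+\prime}H'HJ^+U^+_nB^+_{n,q}S_{n,q} & nB^{+\prime}_{n,d+1-q}U^{+\prime}_nJ^{+\prime}H'HJ^+U^+_nB^+_{n,d+1-q}
\end{bmatrix}
- \kappa S^{+\prime}_n(E^+ + I_{d+1})S^+_n \\
&=
\begin{bmatrix}
I_q + o_{p^\pi}(1) - \kappa S_{n,q}E^+_1S_{n,q} - \kappa S_{n,q}^2 & n^{1/2}A_{n,q}'H_nJ^+U^+_nB^+_{n,d+1-q} + o_{p^\pi}(1) - \kappa S_{n,q}E^+_2 \\
n^{1/2}B^{+\prime}_{n,d+1-q}U^{+\prime}_nJ^{+\prime}H_nA_{n,q} + o_{p^\pi}(1) - \kappa E^{+\prime}_2S_{n,q} & nB^{+\prime}_{n,d+1-q}U^{+\prime}_nJ^{+\prime}H_n'H_nJ^+U^+_nB^+_{n,d+1-q} - \kappa E^+_3 - \kappa I_{d+1-q}
\end{bmatrix}.
\end{aligned}
\]

If $q = 0$, $B^+_n = B^+_{n,d+1-q}$ and

\[
\begin{aligned}
nB^{+\prime}_nU^{+\prime}J^{+\prime}H'HJ^+U^+B^+_n
&= nB^{+\prime}_n((U^+_n)^{-1}U^+)'(B^+_n)^{-1\prime}B^{+\prime}_nU^{+\prime}_nJ^{+\prime}H_n'(HH_n^{-1})' \\
&\quad \times (HH_n^{-1})(H_nJ^+U^+_nB^+_n)(B^+_n)^{-1}((U^+_n)^{-1}U^+)B^+_n \\
&\xrightarrow{d^\pi} (\Delta_{h,d-q}, (\sigma^2Q_{zz})^{-1/2}m^\pi_h)'(\Delta_{h,d-q}, (\sigma^2Q_{zz})^{-1/2}m^\pi_h),
\end{aligned}
\]

where the convergence uses (A.117), $(U^+_n)^{-1}U^+ \xrightarrow{p^\pi} I_{d+1}$ and $HH_n^{-1} \xrightarrow{p^\pi} I_k$ in $P$-probability. Using the CMT in conjunction with the fact that the smallest eigenvalue of a matrix is a continuous function of the matrix, and $h_{3,k-q}h_{3,k-q}' = h_3'h_3 = I_k$, we obtain the result stated in the theorem.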

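Next, we consider the case $q \ge 1$. Using the formula for the determinant of a partitioned matrix and provided $Q^+_1(\kappa)$ is nonsingular, $|Q^+(\kappa)| = |Q^+_1(\kappa)||Q^+_2(\kappa)|$, where

\[
\begin{aligned}
Q^+_1(\kappa) &\equiv I_q + o_{p^\pi}(1) - \kappa S_{n,q}E^+_1S_{n,q} - \kappa S_{n,q}^2, \\
Q^+_2(\kappa) &\equiv nB^{+\prime}_{n,d+1-q}U^{+\prime}_nJ^{+\prime}H_n'H_nJ^+U^+_nB^+_{n,d+1-q} - \kappa E^+_3 - \kappa I_{d+1-q} + o_{p^\pi}(1) \\
&\quad - [n^{1/2}B^{+\prime}_{n,d+1-q}U^{+\prime}_nJ^{+\prime}H_nA_{n,q} + o_{p^\pi}(1) - \kappa E^{+\prime}_2S_{n,q}](I_q + o_{p^\pi}(1) - \kappa S_{n,q}E^+_1S_{n,q} - \kappa S_{n,q}^2)^{-1} \\
&\quad \times [n^{1/2}A_{n,q}'H_nJ^+U^+_nB^+_{n,d+1-q} + o_{p^\pi}(1) - \kappa S_{n,q}E^+_2].
\end{aligned}
\tag{A.120}
\]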
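The partitioned-determinant formula used to factor $|Q^+(\kappa)|$ can be checked on a generic matrix. The sketch below assumes an arbitrary symmetric positive definite matrix `Q` (a hypothetical stand-in, not the paper's $Q^+(\kappa)$), with the Schur complement playing the role of $Q^+_2(\kappa)$.

```python
import numpy as np

rng = np.random.default_rng(2)
q, r = 2, 3                                  # r plays the role of d + 1 - q

X = rng.standard_normal((10, q + r))
Q = X.T @ X + np.eye(q + r)                  # symmetric positive definite test matrix

Q11, Q12 = Q[:q, :q], Q[:q, q:]
Q21, Q22 = Q[q:, :q], Q[q:, q:]

Q1 = Q11                                     # analogue of Q_1^+(kappa)
Q2 = Q22 - Q21 @ np.linalg.solve(Q11, Q12)   # Schur complement, analogue of Q_2^+(kappa)

det_full = np.linalg.det(Q)
det_product = np.linalg.det(Q1) * np.linalg.det(Q2)

print(np.isclose(det_full, det_product))
```

The factorization requires the upper-left block to be nonsingular, which is why the proof establishes (A.122) before working with $Q^+_2$.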

Moreover, by Lemma A.8 and $E^+_1 = o_{p^\pi}(1)$,

\[
\kappa^+_jS_{n,q}^2 = o_{p^\pi}(1), \quad \kappa^+_jS_{n,q}E^+_1S_{n,q} = o_{p^\pi}(1), \quad j = q+1, \dots, d+1,
\tag{A.121}
\]

in $P$-probability. Thus,

\[
Q^+_1(\kappa^+_j) = I_q + o_{p^\pi}(1) - \kappa^+_jS_{n,q}E^+_1S_{n,q} - \kappa^+_jS_{n,q}^2 = I_q + o_{p^\pi}(1),
\]

in $P$-probability, and so

\[
|Q^+_1(\kappa^+_j)| \neq 0, \quad j = q+1, \dots, d+1, \text{ wppa 1 in } P\text{-probability}.
\tag{A.122}
\]

Furthermore,

\[
E^+_3 = o_{p^\pi}(1),
\tag{A.123}
\]

\[
n^{1/2}H_nJ^+U^+_nB^+_{n,d+1-q} = O_{p^\pi}(1),
\tag{A.124}
\]

\[
S_{n,q} = o(1),
\tag{A.125}
\]

\[
E^+_2 = o_{p^\pi}(1),
\tag{A.126}
\]

\[
\kappa^+_jE^{+\prime}_2S_{n,q}(I_q + o_{p^\pi}(1))S_{n,q}E^+_2 = \kappa^+_jE^{+\prime}_2S_{n,q}^2E^+_2 + \kappa^+_{d+1}E^{+\prime}_2S_{n,q}^2E^+_2\,o_{p^\pi}(1) = o_{p^\pi}(1),
\tag{A.127}
\]

where (A.123) and (A.126) follow from (A.119), (A.124) holds by (A.117), and (A.127) uses (A.121) and (A.126). Therefore, using the results in the previous display

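\[
\begin{aligned}
Q^+_2(\kappa^+_j) &= nB^{+\prime}_{n,d+1-q}U^{+\prime}_nJ^{+\prime}H_n'H_nJ^+U^+_nB^+_{n,d+1-q} + o_{p^\pi}(1) \\
&\quad - [n^{1/2}B^{+\prime}_{n,d+1-q}U^{+\prime}_nJ^{+\prime}H_nA_{n,q} + o_{p^\pi}(1)](I_q + o_{p^\pi}(1))[n^{1/2}A_{n,q}'H_nJ^+U^+_nB^+_{n,d+1-q} + o_{p^\pi}(1)] \\
&\quad - \kappa^+_j\Big[I_{d+1-q} + E^+_3 - (n^{1/2}B^{+\prime}_{n,d+1-q}U^{+\prime}_nJ^{+\prime}H_nA_{n,q} + o_{p^\pi}(1))(I_q + o_{p^\pi}(1))S_{n,q}E^+_2 \\
&\qquad - E^{+\prime}_2S_{n,q}'(I_q + o_{p^\pi}(1))(n^{1/2}A_{n,q}'H_nJ^+U^+_nB^+_{n,d+1-q} + o_{p^\pi}(1)) + \kappa^+_jE^{+\prime}_2S_{n,q}(I_q + o_{p^\pi}(1))S_{n,q}E^+_2\Big] \\
&= nB^{+\prime}_{n,d+1-q}U^{+\prime}_nJ^{+\prime}H_n'H_nJ^+U^+_nB^+_{n,d+1-q} + o_{p^\pi}(1) \\
&\quad - [n^{1/2}B^{+\prime}_{n,d+1-q}U^{+\prime}_nJ^{+\prime}H_nA_{n,q} + o_{p^\pi}(1)](I_q + o_{p^\pi}(1))[n^{1/2}A_{n,q}'H_nJ^+U^+_nB^+_{n,d+1-q} + o_{p^\pi}(1)] \\
&\quad - \kappa^+_j(I_{d+1-q} + o_{p^\pi}(1)) \\
&= M^+_{d+1-q} - \kappa^+_j(I_{d+1-q} + o_{p^\pi}(1)),
\end{aligned}
\tag{A.128}
\]

where

\[
M^+_{d+1-q} \equiv nB^{+\prime}_{n,d+1-q}U^{+\prime}_nJ^{+\prime}H_n'h_{3,k-q}h_{3,k-q}'H_nJ^+U^+_nB^+_{n,d+1-q} + o_{p^\pi}(1).
\tag{A.129}
\]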

From (A.113), (A.122) and $|Q^+(\kappa)| = |Q^+_1(\kappa)||Q^+_2(\kappa)|$, $|Q^+_2(\kappa^+_j)| = |M^+_{d+1-q} - \kappa^+_j(I_{d+1-q} + o_{p^\pi}(1))| = 0$, $j = q+1, \dots, d+1$, wppa 1 in $P$-probability. Therefore, $\{\kappa^+_j : j = q+1, \dots, d+1\}$ are the $d+1-q$ eigenvalues of $(I_{d+1-q} + o_{p^\pi}(1))^{-1/2}M^+_{d+1-q}(I_{d+1-q} + o_{p^\pi}(1))^{-1/2}$ wppa 1 in $P$-probability. Using (A.117) in (A.129),

\[
(I_{d+1-q} + o_{p^\pi}(1))^{-1/2}M^+_{d+1-q}(I_{d+1-q} + o_{p^\pi}(1))^{-1/2} \xrightarrow{d^\pi} (\Delta_{h,d-q}, Z^\pi)'(\Delta_{h,d-q}, Z^\pi) \text{ in } P\text{-probability}.
\tag{A.130}
\]

On noting that the vector of eigenvalues of a matrix is a continuous function of the matrix elements, invoking the CMT yields the convergence in $P^\pi$-distribution of $\{\kappa^+_j : j = q+1, \dots, d+1\}$ to the vector of eigenvalues of $(\Delta_{h,d-q}, Z^\pi)'(\Delta_{h,d-q}, Z^\pi)$ in $P$-probability, and so

\[
\lambda_{\min}[nU^{+\prime}J^{+\prime}H'HJ^+U^+] \xrightarrow{d^\pi} \lambda_{\min}[(\Delta_{h,d-q}, Z^\pi)'(\Delta_{h,d-q}, Z^\pi)] \text{ in } P\text{-probability}.
\tag{A.131}
\]

Combining (A.131) with (A.112) completes the proof of the asymptotic distribution of the PCLR statistic.

When $q = d$, $U^+_nB^+_{n,d+1-q} = [0_{1\times d}, 1]'$ and

\[
n^{1/2}H_nJ^+U^+_nB^+_{n,d+1-q} = n^{1/2}H_n[J, H^{-1}(\Sigma^\pi)^{-1/2}m^\pi][0_{1\times d}, 1]' = n^{1/2}H_nH^{-1}(\Sigma^\pi)^{-1/2}m^\pi \xrightarrow{d^\pi} Z^\pi
\tag{A.132}
\]

in $P$-probability. Therefore, using (A.132) in (A.129), (A.130) and (A.131) gives

\[
\lambda_{\min}[nU^{+\prime}J^{+\prime}H'HJ^+U^+] \xrightarrow{d^\pi} Z^{\pi\prime}h_{3,k-q}h_{3,k-q}'Z^\pi \sim \chi^2_{k-d}.
\]

Since $h_3h_3' = I_k$, $\mathrm{PCLR}_\infty = Z^{\pi\prime}Z^\pi - Z^{\pi\prime}h_{3,k-q}h_{3,k-q}'Z^\pi = Z^{\pi\prime}h_{3,d}h_{3,d}'Z^\pi \sim \chi^2_d$. The proof for subsequences follows similarly. Also, the convergence holds jointly with Lemma A.6 because we only used the convergence of the permutation quantity $S^\pi \equiv n^{1/2}(\Sigma^\pi)^{-1/2}m^\pi$ conditional on the convergence of other quantities computed from the observed data, e.g. $n^{1/2}(J - G_n)$, as shown in Lemma A.6.
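The last algebraic step for the case $q = d$ rests on $h_3h_3' = I_k$, which splits $Z^{\pi\prime}Z^\pi$ into two complementary quadratic forms. A minimal numerical sketch follows, assuming a generic orthogonal matrix `h3` and a generic vector `Z` (both hypothetical stand-ins for the limit objects in the proof).

```python
import numpy as np

rng = np.random.default_rng(3)
k, d = 6, 2

h3, _ = np.linalg.qr(rng.standard_normal((k, k)))   # orthogonal k x k matrix
h3_d, h3_rest = h3[:, :d], h3[:, d:]                 # first d and last k - d columns

Z = rng.standard_normal(k)

full = Z @ Z
removed = Z @ h3_rest @ h3_rest.T @ Z                # Z' h_{3,k-d} h_{3,k-d}' Z
kept = Z @ h3_d @ h3_d.T @ Z                         # Z' h_{3,d} h_{3,d}' Z

print(np.isclose(full - removed, kept))
```

Subtracting the projection onto the last $k - d$ columns leaves the projection onto the first $d$ columns, which is the source of the $\chi^2_d$ limit.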


The following lemma is a permutation analog of Lemma 26.1 of Andrews and Guggenberger (2019b).

Lemma A.8 (Rates of convergence of sample eigenvalues). Under all sequences $\{\lambda_{n,h} : n \ge 1\}$ with $\lambda_{n,h} \in \Lambda$ and $q \ge 1$,

(a) $\kappa^+_j \xrightarrow{p^\pi} \infty$ in $P$-probability for all $j \le q$,

(b) $\kappa^+_j = o_{p^\pi}((n^{1/2}\tau_{ln})^2)$ in $P$-probability for all $l \le q$ and $j = q+1, \dots, d+1$.

Under all subsequences $\{w_n\}$ and all sequences $\{\lambda_{w_n,h} : n \ge 1\}$ with $\lambda_{w_n,h} \in \Lambda$, the same results hold with $n$ replaced with $w_n$.

The proof of Lemma A.8 is essentially the same as the proofs of Lemma 26.1 of Andrews and Guggen-

berger (2019b) and Lemma 17.1 of Andrews and Guggenberger (2017b) with only minor (mostly notational)

modifications. For completeness, we reproduce the arguments but defer the proof to Section A.8.

A.6 Asymptotic similarity

Proof of Theorem 2.2. The (uniform) asymptotic similarity of the permutation tests is established using Proposition 16.3 of Andrews and Guggenberger (2019b).

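PAR1 test: Let $\tilde{W} = [\tilde{W}_1, \dots, \tilde{W}_n]' \equiv M_\iota W$. Since $\iota$ is a column of $X$, $M_\iota M_X = M_X$. Write

\[
\begin{aligned}
W_\pi'M_X\Sigma_uM_XW_\pi &= W_\pi'M_\iota\Sigma_uM_\iota W_\pi - W_\pi'M_\iota X(X'X)^{-1}X'\Sigma_uM_\iota W_\pi - W_\pi'M_\iota\Sigma_uX(X'X)^{-1}X'M_\iota W_\pi \\
&\quad + W_\pi'M_\iota X(X'X)^{-1}X'\Sigma_uX(X'X)^{-1}X'M_\iota W_\pi.
\end{aligned}
\tag{A.133}
\]

Consider the first term $n^{-1}W_\pi'M_\iota\Sigma_uM_\iota W_\pi = n^{-1}\tilde{W}_\pi'\Sigma_u\tilde{W}_\pi = n^{-1}\sum_{i=1}^n\tilde{W}_{\pi(i)}\tilde{W}_{\pi(i)}'u_i(\theta_0)^2$ on the right-hand side of the expression above. Proceeding similarly to (A.89) and noting that $\mathrm{Var}_P[W_i] - (Q_{ww} - Q_wQ_w') \to 0$, where $Q_w$ denotes the column of $Q_{wx}$ corresponding to the element 1 of $X_i$ for $Q_{ww}$ and $Q_{wx}$ defined in (A.23), we obtain

\[
n^{-1}\sum_{i=1}^n\tilde{W}_{\pi(i)}\tilde{W}_{\pi(i)}'u_i(\theta_0)^2 - (Q_{ww} - Q_wQ_w')\sigma^2 \xrightarrow{p^\pi} 0 \text{ in } P\text{-probability}.
\tag{A.134}
\]

Next we determine the limit of $n^{-1}W_\pi'M_\iota X$. Note that letting $[\tilde{X}_1, \dots, \tilde{X}_n]' \equiv M_\iota X$, $\tilde{W}_i = [\tilde{W}_{i1}, \dots, \tilde{W}_{ik}]'$, $\tilde{W}_{ij} = W_{ij} - \bar{W}_j$, $\bar{W}_j \equiv n^{-1}\sum_{i=1}^nW_{ij}$, and $\tilde{X}_i = [\tilde{X}_{i1}, \dots, \tilde{X}_{ip}]'$,

\[
\begin{aligned}
E_{P^\pi}[n^{-1}W_\pi'M_\iota X \mid y, Y, W, X] &= \left(n^{-1}\sum_{i=1}^n\tilde{W}_i\right)\left(n^{-1}\sum_{i=1}^n\tilde{X}_i\right)' = 0, \\
\mathrm{Var}_{P^\pi}\left[n^{-1}\sum_{i=1}^n\tilde{W}_{\pi(i)j}\tilde{X}_{il} \,\Big|\, y, Y, W, X\right] &= \frac{1}{n-1}\left(n^{-1}\sum_{i=1}^n\tilde{W}_{ij}^2\right)\left(n^{-1}\sum_{i=1}^n\tilde{X}_{il}^2\right) \to 0, \quad j = 1, \dots, k,\ l = 1, \dots, p,
\end{aligned}
\]

where the convergence uses $n^{-1}\sum_{i=1}^n\tilde{W}_{ij}^2 - (E_P[W_{ij}^2] - (E_P[W_{ij}])^2) \xrightarrow{a.s.} 0$ and $n^{-1}\sum_{i=1}^n\tilde{X}_{il}^2 - (E_P[X_{il}^2] - (E_P[X_{il}])^2) \xrightarrow{a.s.} 0$, which, in turn, hold by the triangular array SLLN and the CMT. By Chebyshev's inequality, we obtain

\[
n^{-1}W_\pi'M_\iota X \xrightarrow{p^\pi} 0 \text{ a.s.}
\tag{A.135}
\]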
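The permutation mean and variance formulas used before (A.135) can be verified exactly for a small sample by enumerating all permutations. The sketch below assumes two arbitrary scalar samples `w` and `x` (hypothetical data standing in for one column of $W$ and one column of $X$).

```python
import itertools

import numpy as np

rng = np.random.default_rng(4)
n = 5                                     # small enough to enumerate all n! permutations

w = rng.standard_normal(n)
x = rng.standard_normal(n)
w_c = w - w.mean()                        # demeaned, as after applying M_iota
x_c = x - x.mean()

# n^{-1} sum_i w_c[pi(i)] * x_c[i] for every permutation pi of {0, ..., n-1}
vals = np.array([
    np.mean(w_c[list(p)] * x_c) for p in itertools.permutations(range(n))
])

perm_mean = vals.mean()                   # exact expectation under the uniform permutation law
perm_var = vals.var()                     # exact variance (population form, ddof=0)
formula = np.mean(w_c ** 2) * np.mean(x_c ** 2) / (n - 1)

print(np.isclose(perm_mean, 0.0), np.isclose(perm_var, formula))
```

The variance is of order $1/(n-1)$ whenever the two sample second moments stay bounded, which is what delivers the convergence via Chebyshev's inequality.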


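Consider the part $n^{-1}X'\Sigma_uM_\iota W_\pi = n^{-1}\sum_{i=1}^nX_i\tilde{W}_{\pi(i)}'u_i(\theta_0)^2$ in the second term of (A.133). Using the Cauchy-Schwarz inequality twice,

\[
\|n^{-1}X'\Sigma_uM_\iota W_\pi\| \le \left(n^{-1}\sum_{i=1}^n\|X_i\|^4\right)^{1/4}\left(n^{-1}\sum_{i=1}^n\|\tilde{W}_i\|^4\right)^{1/4}\left(n^{-1}\sum_{i=1}^nu_i(\theta_0)^4\right)^{1/2} = O_p(1),
\tag{A.136}
\]

where we used $n^{-1}\sum_{i=1}^n\|X_i\|^4 = O_p(1)$ and $n^{-1}\sum_{i=1}^n\|\tilde{W}_i\|^4 \le n^{-1}\sum_{i=1}^n(8\|W_i\|^4 + 8\|\bar{W}\|^4) = O_p(1)$ which hold by the triangular array WLLN, and (A.86). Furthermore, note that $n^{-1}\sum_{i=1}^nX_{ij}X_{il}u_iX_i' - E_P[X_{ij}X_{il}u_iX_i'] \xrightarrow{p} 0$ and $n^{-1}\sum_{i=1}^nX_{ij}X_{il}X_iX_i' - E_P[X_{ij}X_{il}X_iX_i'] \xrightarrow{p} 0$, $j, l = 1, \dots, p$, by the WLLN, $E_P[\|X_{ij}X_{il}u_iX_i'\|^{1+\delta/4}] \le (E_P[\|X_i\|^{4+\delta}])^{1/2}(E_P[u_i^{4+\delta}])^{1/4}(E_P[\|X_i\|^{4+\delta}])^{1/4} < \infty$ by the Cauchy-Schwarz inequality, and $E[\|X_i\|^{4+\delta}] < \infty$. Therefore, we have

\[
\begin{aligned}
n^{-1}X'\Sigma_uX - E_P[X_iX_i'u_i^2] &= n^{-1}\sum_{i=1}^nX_iX_i'u_i(\theta_0)^2 - E_P[X_iX_i'u_i^2] \\
&= n^{-1}\sum_{i=1}^nX_iX_i'\left(u_i(\theta_0)^2 - 2u_iX_i'(X'X)^{-1}X'u + u'X(X'X)^{-1}X_iX_i'(X'X)^{-1}X'u\right) - E_P[X_iX_i'u_i^2] \\
&\xrightarrow{p} 0,
\end{aligned}
\tag{A.137}
\]

where the last line uses the WLLN, CMT, $E_P[\|X_iX_i'u_i^2\|^{1+\delta/4}] \le (E_P[\|X_i\|^{4+\delta}])^{1/2}(E_P[u_i^{4+\delta}])^{1/2}$, $n^{-1}X'u \xrightarrow{p} 0$ and $(X'X)^{-1} - (E_P[X_iX_i'])^{-1} \xrightarrow{p} 0$ which follow from the WLLN and the CMT. Using (A.134), (A.135), (A.136), (A.137), and the CMT in (A.133), we obtain

\[
n^{-1}W_\pi'M_X\Sigma_uM_XW_\pi - (Q_{ww} - Q_wQ_w')\sigma^2 \xrightarrow{p} 0 \text{ in } P\text{-probability}.
\tag{A.138}
\]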

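Next, we determine the limit of $n^{-1/2}W_\pi'u(\theta_0)$. By the $c_r$ inequality and the triangular array SLLN,

\[
n^{-1}\sum_{i=1}^nu_i(\theta_0)^{2+\delta} - 2^{1+\delta}E_P[u_i^{2+\delta}] \le 2^{1+\delta}\left(n^{-1}\sum_{i=1}^nu_i^{2+\delta} - E_P[u_i^{2+\delta}] + n^{-1}\sum_{i=1}^n\|X_i\|^{2+\delta}\|(X'X)^{-1}X'u\|^{2+\delta}\right) \xrightarrow{a.s.} 0,
\tag{A.139}
\]

and for any $t \in \mathbb{R}^k$

\[
n^{-1}\sum_{i=1}^n\|t'\tilde{W}_i\|^{2+\delta} \le \|t\|^{2+\delta}n^{-1}\sum_{i=1}^n\|\tilde{W}_i\|^{2+\delta} \le \|t\|^{2+\delta}2^{1+\delta}\left(n^{-1}\sum_{i=1}^n\|W_i\|^{2+\delta} + \Big\|n^{-1}\sum_{i=1}^nW_i\Big\|^{2+\delta}\right) = O_{a.s.}(1),
\tag{A.140}
\]

\[
\begin{aligned}
D \equiv \frac{1}{n-1}\sum_{i=1}^n\sum_{j=1}^ng_n^2(i,j) &= \frac{1}{n(n-1)}\sum_{i=1}^n\sum_{j=1}^n(t'\tilde{W}_iu_j(\theta_0))^2 = \frac{1}{n-1}\sum_{i=1}^nu_i(\theta_0)^2\,n^{-1}\sum_{i=1}^n(t'\tilde{W}_i)^2 \\
&= \sigma^2t'\mathrm{Var}_P[W_i]t + o_{a.s.}(1).
\end{aligned}
\tag{A.141}
\]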


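Note that by (A.138), (A.140) and (A.141), for any $\varepsilon > 0$

\[
\begin{aligned}
&n^{-2}\sum_{i=1}^n\sum_{j=1}^n\frac{(t'\tilde{W}_iu_j(\theta_0))^2}{D}\,1\!\left(\frac{n^{-1}(t'\tilde{W}_iu_j(\theta_0))^2}{D} > \varepsilon^2\right) \\
&\quad = n^{-2}D^{-1}\sum_{i=1}^n\sum_{j=1}^n\frac{|t'\tilde{W}_iu_j(\theta_0)|^{2+\delta}}{|t'\tilde{W}_iu_j(\theta_0)|^{\delta}}\,1\!\left(|t'\tilde{W}_iu_j(\theta_0)|^{\delta} > \varepsilon^{\delta}D^{\delta/2}n^{\delta/2}\right) \\
&\quad \le n^{-\delta/2}\varepsilon^{-\delta}D^{-1-\delta/2}\,n^{-2}\sum_{i=1}^n\sum_{j=1}^n|t'\tilde{W}_iu_j(\theta_0)|^{2+\delta} \\
&\quad \le n^{-\delta/2}\varepsilon^{-\delta}D^{-1-\delta/2}\,n^{-1}\sum_{i=1}^n\|t'\tilde{W}_i\|^{2+\delta}\,n^{-1}\sum_{i=1}^n|u_i(\theta_0)|^{2+\delta} \\
&\quad \xrightarrow{a.s.} 0.
\end{aligned}
\tag{A.142}
\]

Thus,

\[
\lim_{n\to\infty}n^{-2}\sum_{i=1}^n\sum_{j=1}^n\frac{(t'\tilde{W}_iu_j(\theta_0))^2}{D}\,1\!\left(\frac{n^{-1}(t'\tilde{W}_iu_j(\theta_0))^2}{D} > \varepsilon^2\right) = 0.
\tag{A.143}
\]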

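On noting that we may write $n^{-1/2}W_\pi'M_Xu(\theta_0) = n^{-1/2}W_\pi'u(\theta_0) = n^{-1/2}W'u_\pi(\theta_0)$ for a permutation $\pi$ uniformly distributed over $G_n$, Lemma A.1 gives

\[
n^{-1/2}t'W_\pi'u(\theta_0) \xrightarrow{d^\pi} N[0, t'(Q_{ww} - Q_wQ_w')t\sigma^2] \quad P\text{-almost surely}.
\tag{A.144}
\]

Using the Cramér-Wold device

\[
n^{-1/2}W_\pi'u(\theta_0) \xrightarrow{d^\pi} N[0, (Q_{ww} - Q_wQ_w')\sigma^2] \quad P\text{-almost surely}.
\tag{A.145}
\]

Finally, from Slutsky's lemma, (A.138), (A.145) followed by Pólya's theorem (Theorem 11.2.9 of Lehmann and Romano (2005)), we obtain $\mathrm{PAR}_1 \xrightarrow{d} \mathrm{PAR}_{1,\infty} \sim \chi^2_k$ in $P$-probability, and

\[
\sup_{x\in\mathbb{R}}\left|P^\pi[\mathrm{PAR}_1 \le x] - P[\chi^2_k \le x]\right| \xrightarrow{p} 0.
\]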

The above result holds for any subsequence $\{w_n\}$ of $\{n\}$. Let $r_{1-\alpha}(\chi^2_k)$ denote the $1-\alpha$ quantile of the $\chi^2_k$ distribution. Since the $\chi^2_k$ distribution function is continuous and strictly increasing at its $1-\alpha$ quantile, we have $\mathrm{PAR}_{1(r)} \xrightarrow{p} r_{1-\alpha}(\chi^2_k)$ by Lemma 11.2.1 of Lehmann and Romano (2005). By definition, $E_P[\phi_n^{\mathrm{PAR}_1}] = P[\mathrm{AR} > \mathrm{PAR}_{1(r)}] + a^{\mathrm{PAR}_1}P[\mathrm{AR} = \mathrm{PAR}_{1(r)}]$, hence

\[
P[\mathrm{AR} > \mathrm{PAR}_{1(r)}] \le E_P[\phi_n^{\mathrm{PAR}_1}] \le P[\mathrm{AR} \ge \mathrm{PAR}_{1(r)}].
\]

By Theorem 16.6 of Andrews and Guggenberger (2019b), $\mathrm{AR} \xrightarrow{d} \mathrm{AR}_\infty \sim \chi^2_k$. By Slutsky's lemma (Corollary 11.2.3 of Lehmann and Romano (2005)), we have $\lim_{n\to\infty}P[\mathrm{AR} > \mathrm{PAR}_{1(r)}] = \lim_{n\to\infty}P[\mathrm{AR} \ge \mathrm{PAR}_{1(r)}] = P[\mathrm{AR}_\infty > r_{1-\alpha}(\chi^2_k)]$ because of the continuity of the $\chi^2_k$ distribution function at its $1-\alpha$ quantile. Therefore,

\[
\lim_{n\to\infty}E_P[\phi_n^{\mathrm{PAR}_1}] = P[\mathrm{AR}_\infty > r_{1-\alpha}(\chi^2_k)] = \alpha.
\]

Thus, Assumption 3 is verified for the PAR1 statistic. By Proposition 16.3 of Andrews and Guggenberger (2019b), we obtain the desired result.
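The sandwich bound above involves a randomized test function with a critical value taken from the permutation distribution and a randomization constant on the boundary. The sketch below is an illustration only: it assumes the standard construction of a randomization test in Lehmann and Romano (2005, Section 15.2), and the function name `randomized_test` and its arguments are hypothetical, not the paper's notation.

```python
import numpy as np


def randomized_test(t_obs, t_perm, alpha):
    """Rejection probability of a level-alpha randomization test.

    t_obs  : observed statistic
    t_perm : statistics recomputed over the permutations (1-D array)
    """
    t_sorted = np.sort(np.asarray(t_perm, dtype=float))
    m = t_sorted.size
    k = m - int(np.floor(m * alpha))      # index of the critical order statistic
    crit = t_sorted[k - 1]                # k-th smallest permutation statistic
    m_plus = np.sum(t_sorted > crit)      # number of values strictly above crit
    m_zero = np.sum(t_sorted == crit)     # number of values tied with crit
    a = (m * alpha - m_plus) / m_zero     # randomization constant in [0, 1)
    if t_obs > crit:
        return 1.0
    if t_obs == crit:
        return float(a)
    return 0.0
```

By construction, averaging the returned value over all elements of `t_perm` gives exactly `alpha`, which is the finite-sample counterpart of the limit established in the proof.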


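PAR2 test: By (A.112), we have $\mathrm{PAR}_2 \xrightarrow{d^\pi} \chi^2_k$ in $P$-probability, and by Pólya's theorem

\[
\sup_{x\in\mathbb{R}}\left|P^\pi[\mathrm{PAR}_2 \le x] - P[\chi^2_k \le x]\right| \xrightarrow{p} 0.
\]

The remaining argument is analogous to the PAR1 test.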

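PLM test: Note first that because $\Sigma_n$ and $T_n$ are nonsingular

\[
\begin{aligned}
\mathrm{PLM} &= n\,m^{\pi\prime}(\Sigma^\pi)^{-1/2}P_{(\Sigma^\pi)^{-1/2}J^\pi}(\Sigma^\pi)^{-1/2}m^\pi \\
&= n\,m^{\pi\prime}(\Sigma^\pi)^{-1/2}P_{((\Sigma^\pi)^{-1/2}\Sigma_n^{1/2})n^{1/2}\Sigma_n^{-1/2}J^\pi T_n}(\Sigma^\pi)^{-1/2}m^\pi.
\end{aligned}
\]

By Lemma A.6(d), $n^{1/2}\Sigma_n^{-1/2}J^\pi T_n \xrightarrow{d^\pi} \Delta^\pi_h = [\Delta^\pi_{h,q}, \Delta^\pi_{h,d-q}] = [h_{3,q},\ h_3h_{1,d-q} + h_{5,m}^{-1/2}(G_h + J^\pi_h)h_{2,d-q}]$, where $h_{5,m} = \Sigma$. The only random term conditional on the data is $h_{5,m}^{-1/2}J^\pi_hh_{2,d-q}$. We apply Corollary 16.2 of Andrews and Guggenberger (2017b) with $p = d$, $q_* = q$, $\Delta_{q_*} = h_{3,q} \in \mathbb{R}^{k\times q}$, $\Delta_{p-q_*} = \Delta^\pi_{h,d-q} \in \mathbb{R}^{k\times(d-q)}$, $M = h_3' \in \mathbb{R}^{k\times k}$, $M_1 = h_{3,q}' \in \mathbb{R}^{q\times k}$, $M_2 = h_{3,k-q}' \in \mathbb{R}^{(k-q)\times k}$, $\xi_2 \in \mathbb{R}^{d-q}$ and $\Delta = \Delta^\pi_h \in \mathbb{R}^{k\times d}$. Note that for $\xi_2 \in \mathbb{R}^{d-q}$ with $\|\xi_2\| = 1$,

\[
\begin{aligned}
&\mathrm{Var}_{P^\pi}(h_{3,k-q}'\Sigma^{-1/2}J^\pi_hh_{2,d-q}\xi_2) \\
&\quad = \mathrm{Var}_{P^\pi}(\mathrm{vec}(h_{3,k-q}'\Sigma^{-1/2}J^\pi_hh_{2,d-q}\xi_2)) \\
&\quad = \mathrm{Var}_{P^\pi}\left(((h_{2,d-q}\xi_2)' \otimes (h_{3,k-q}'\Sigma^{-1/2}))\,\mathrm{vec}(J^\pi_h)\right) \\
&\quad = \left((h_{2,d-q}\xi_2)' \otimes (h_{3,k-q}'\Sigma^{-1/2})\right)\mathrm{Var}_{P^\pi}(\mathrm{vec}(J^\pi_h))\left((h_{2,d-q}\xi_2) \otimes (\Sigma^{-1/2}h_{3,k-q})\right) \\
&\quad = \left((h_{2,d-q}\xi_2)' \otimes (h_{3,k-q}'\Sigma^{-1/2})\right)\left((\Sigma^V - \Sigma^{Vu}(\sigma^2)^{-1}\Sigma^{uV}) \otimes Q_{zz}\right)\left((h_{2,d-q}\xi_2) \otimes (\Sigma^{-1/2}h_{3,k-q})\right) \\
&\quad = \left(\xi_2'h_{2,d-q}'(\Sigma^V - \Sigma^{Vu}(\sigma^2)^{-1}\Sigma^{uV})h_{2,d-q}\xi_2\right)\left(h_{3,k-q}'\Sigma^{-1/2}Q_{zz}\Sigma^{-1/2}h_{3,k-q}\right),
\end{aligned}
\]

where the second equality uses the formula $\mathrm{vec}(ABC) = (C' \otimes A)\mathrm{vec}(B)$ and the fourth equality uses Lemma A.6(a). Observe that $\xi_2'h_{2,d-q}'(\Sigma^V - \Sigma^{Vu}(\sigma^2)^{-1}\Sigma^{uV})h_{2,d-q}\xi_2$ is scalar, and

\[
\frac{\xi_2'h_{2,d-q}'(\Sigma^V - \Sigma^{Vu}(\sigma^2)^{-1}\Sigma^{uV})h_{2,d-q}\xi_2}{\xi_2'h_{2,d-q}'h_{2,d-q}\xi_2} \ge \lambda_{\min}(\Sigma^V - \Sigma^{Vu}(\sigma^2)^{-1}\Sigma^{uV}) > 0.
\]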
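The two Kronecker-product facts used in the variance calculation above, the vec formula and the mixed-product rule, can be checked numerically. The sketch assumes generic matrices `A`, `Bm`, `S`, `Qm` and a vector `c` (all hypothetical stand-ins for $h_{3,k-q}'\Sigma^{-1/2}$, $J^\pi_h$, $\Sigma^V - \Sigma^{Vu}(\sigma^2)^{-1}\Sigma^{uV}$, $Q_{zz}$ and $h_{2,d-q}\xi_2$).

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((2, 3))            # r x k
Bm = rng.standard_normal((3, 4))           # k x d
c = rng.standard_normal(4)                 # d-vector


def vec(mat):
    """Stack the columns of a matrix into one vector (column-major order)."""
    return np.asarray(mat).reshape(-1, order="F")


# vec(A B c) = (c' kron A) vec(B); A B c is already a vector here
lhs = A @ Bm @ c
rhs = np.kron(c.reshape(1, -1), A) @ vec(Bm)

# Mixed-product rule: (c' kron A)(S kron Q)(c kron A') = (c' S c) * (A Q A')
S = np.cov(rng.standard_normal((4, 50)))   # d x d symmetric PSD
Qm = np.cov(rng.standard_normal((3, 50)))  # k x k symmetric PSD
left = np.kron(c.reshape(1, -1), A) @ np.kron(S, Qm) @ np.kron(c.reshape(-1, 1), A.T)
right = (c @ S @ c) * (A @ Qm @ A.T)

print(np.allclose(lhs, rhs), np.allclose(left, right))
```

The second identity is what collapses the variance into a positive scalar times a $(k-q) \times (k-q)$ matrix, so that the rank condition reduces to the two eigenvalue bounds in the proof.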

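Moreover, using $h_{3,k-q}'h_{3,k-q} = I_{k-q}$

\[
\begin{aligned}
\lambda_{\min}(h_{3,k-q}'\Sigma^{-1/2}Q_{zz}\Sigma^{-1/2}h_{3,k-q}) &= \min_{\eta\in\mathbb{R}^{k-q}\setminus\{0\}}\frac{\eta'h_{3,k-q}'\Sigma^{-1/2}Q_{zz}\Sigma^{-1/2}h_{3,k-q}\eta}{\eta'\eta} \\
&= \min_{\eta\in\mathbb{R}^{k-q}\setminus\{0\}}\frac{\eta'h_{3,k-q}'\Sigma^{-1/2}Q_{zz}\Sigma^{-1/2}h_{3,k-q}\eta}{\eta'h_{3,k-q}'h_{3,k-q}\eta} \\
&\ge \min_{a\in\mathbb{R}^{k}\setminus\{0\}}\frac{a'\Sigma^{-1/2}Q_{zz}\Sigma^{-1/2}a}{a'a} \\
&\ge \lambda_{\min}(\Sigma^{-1})\lambda_{\min}(Q_{zz}) \\
&> 0.
\end{aligned}
\]

Therefore, $\mathrm{rank}(\mathrm{Var}_{P^\pi}(h_{3,k-q}'\Sigma^{-1/2}J^\pi_hh_{2,d-q}\xi_2)) = k - q \ge d - q$, and by Corollary 16.2 of Andrews and Guggenberger (2017b), $P^\pi[\mathrm{rank}(\Delta^\pi_h) = d] = 1$ in $P$-probability. Since $(\Sigma^\pi)^{-1/2}\Sigma^{1/2} = (\sigma^2Q_{zz})^{-1/2}\Sigma^{1/2}$ is nonsingular, $P^\pi[\mathrm{rank}((\Sigma^\pi)^{-1/2}\Sigma^{1/2}\Delta^\pi_h) = d] = 1$ in $P$-probability. As shown in the proof of Theorem 11.1 of Andrews and Guggenberger (2017b), the mapping defined as $f : \mathbb{R}^{k\times d} \to \mathbb{R}^{k\times k}$, $f(D) = D(D'D)^{-1}D'$ is continuous at $D \in \mathbb{R}^{k\times d}$ that has full column rank with probability one. Then,

\[
\begin{aligned}
\mathrm{PLM} &= n\,m^{\pi\prime}(\Sigma^\pi)^{-1/2}P_{((\Sigma^\pi)^{-1/2}\Sigma_n^{1/2})n^{1/2}\Sigma_n^{-1/2}J^\pi T_n}(\Sigma^\pi)^{-1/2}m^\pi \\
&\xrightarrow{d^\pi} m^{\pi\prime}_h(\Sigma^\pi)^{-1/2}P_{(\Sigma^\pi)^{-1/2}\Sigma^{1/2}\Delta^\pi_h}(\Sigma^\pi)^{-1/2}m^\pi_h \\
&\sim \chi^2_d \text{ in } P\text{-probability},
\end{aligned}
\]

where the convergence follows from the CMT, Lemma A.6 and the full column rank property established above, and the last line holds because using the independence between $\Delta^\pi_h$ and $m^\pi_h$, the full column rank of $\Delta^\pi_h$ and $(\Sigma^\pi)^{-1/2}m^\pi_h \sim N[0, I_k]$, the equality in distribution holds conditional on $\Delta^\pi_h$, hence also holds unconditionally. The null asymptotic distribution of the sample LM statistic follows from Theorem 4.1 of Andrews and Guggenberger (2017a) because the parameter space assumptions in the equations (3.3), (3.9) and (3.10) of Andrews and Guggenberger (2017a) are implied by Assumption 2. The asymptotic similarity of the PLM test is similar to the PAR tests, thus is omitted.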
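The $\chi^2_d$ limit of the PLM statistic comes from a quadratic form of a standard normal vector in a rank-$d$ projection matrix. The following sketch assumes a generic full-column-rank matrix `D` (a hypothetical stand-in for $(\Sigma^\pi)^{-1/2}\Sigma^{1/2}\Delta^\pi_h$) and checks the projection properties together with a Monte Carlo mean.

```python
import numpy as np

rng = np.random.default_rng(6)
k, d = 5, 3

D = rng.standard_normal((k, d))                 # full column rank with probability one
P = D @ np.linalg.solve(D.T @ D, D.T)           # f(D) = D (D'D)^{-1} D'

is_idempotent = np.allclose(P @ P, P)
is_symmetric = np.allclose(P, P.T)
rank_from_trace = np.trace(P)                   # equals d for a rank-d projection

# Monte Carlo: Z'PZ with Z ~ N(0, I_k) should average to d, the mean of chi^2_d
Z = rng.standard_normal((200_000, k))
quad_forms = np.einsum("ij,jk,ik->i", Z, P, Z)
mc_mean = quad_forms.mean()

print(is_idempotent, is_symmetric, round(float(rank_from_trace), 6))
```

Conditioning on `D` mirrors the argument in the proof: the distribution of the quadratic form depends on `D` only through its rank, so the conditional and unconditional limits coincide.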

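PCLR tests: By Theorem A.7,

\[
\mathrm{PCLR} \xrightarrow{d^\pi} \mathrm{PCLR}_\infty = Z^{\pi\prime}Z^\pi - \lambda_{\min}\left[(\Delta_{h,d-q}, Z^\pi)'h_{3,k-q}h_{3,k-q}'(\Delta_{h,d-q}, Z^\pi)\right] \text{ in } P\text{-probability}.
\]

Denote the SV's of $h_{3,q}'\Delta_{h,d-q}$ by $\tau^{(2)}_h = [\tau_{(q+1)n}, \dots, \tau_{dn}]'$. Then, $r_{k,d,q}(h_{3,k-q}'\Delta_{h,d-q}, 1-\alpha) = r_{k,d,q}(\tau^{(2)}_h, 1-\alpha)$ and the distribution of $\mathrm{PCLR}_\infty$ is continuous and strictly increasing at its $1-\alpha$ quantile by Lemma A.4 and A.5 because conditional on the data (e.g. $\omega \in \Omega_1$ with $\Omega_1$ defined in (A.63)), $h_{3,q}'\Delta_{h,d-q}$ is nonrandom. It follows from Lemma 11.2.1 of Lehmann and Romano (2005) (see also Theorem 15.2.3 therein) that

\[
r_{k,d}(T, 1-\alpha) \xrightarrow{p} r_{k,d,q}(h_{3,k-q}'\Delta_{h,d-q}, 1-\alpha).
\tag{A.146}
\]

By Lemma 16.6 of Andrews and Guggenberger (2019b),

\[
\mathrm{CLR} \xrightarrow{d} \mathrm{CLR}_\infty \equiv Z'Z - \lambda_{\min}\left[(\Delta_{h,d-q}, Z)'h_{3,k-q}h_{3,k-q}'(\Delta_{h,d-q}, Z)\right], \quad Z \sim N[0_{k\times 1}, I_k].
\tag{A.147}
\]

By the CMT, (A.146) and (A.147)

\[
\mathrm{CLR} - r_{k,d}(T, 1-\alpha) \xrightarrow{d} \mathrm{CLR}_\infty - r_{k,d,q}(h_{3,k-q}'\Delta_{h,d-q}, 1-\alpha).
\tag{A.148}
\]

As shown in Andrews and Guggenberger (2019b) p.72,

\[
P[\mathrm{CLR}_\infty = r_{k,d,q}(h_{3,k-q}'\Delta_{h,d-q}, 1-\alpha)] = 0.
\tag{A.149}
\]

Then,

\[
\begin{aligned}
P[\mathrm{CLR} > \mathrm{PCLR}_{(r)}] &\to P[\mathrm{CLR}_\infty > r_{k,d,q}(h_{3,k-q}'\Delta_{h,d-q}, 1-\alpha)] \\
&= E\left[P[\mathrm{CLR}_\infty > r_{k,d,q}(h_{3,k-q}'\Delta_{h,d-q}, 1-\alpha) \mid \Delta_{h,d-q}]\right] = \alpha,
\end{aligned}
\]

where the convergence follows from (A.148) and (A.149), the first equality holds by the law of total expectation, and the last equality holds because the conditional rejection probability is $\alpha$. (A.149) also implies that $\lim_{n\to\infty}P[\mathrm{CLR} \ge \mathrm{PCLR}_{(r)}] = \alpha$. Since $E_P[\phi_n^{\mathrm{PCLR}}] = P[\mathrm{CLR} > \mathrm{PCLR}_{(r)}] + a^{\mathrm{PCLR}}P[\mathrm{CLR} = \mathrm{PCLR}_{(r)}]$,

\[
P[\mathrm{CLR} > \mathrm{PCLR}_{(r)}] \le E_P[\phi_n^{\mathrm{PCLR}}] \le P[\mathrm{CLR} \ge \mathrm{PCLR}_{(r)}].
\]

By the sandwich rule,

\[
\lim_{n\to\infty}E_P[\phi_n^{\mathrm{PCLR}}] = \alpha.
\]

The above result holds for any subsequence $\{w_n\}$ of $\{n\}$. Thus, Assumption 3 holds for the PCLR statistic. Using Proposition 16.3 of Andrews and Guggenberger (2019b), we obtain the desired result.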

A.7 Local power under strong identification

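Proof of Proposition 2.3. Under $H_1 : \theta_n = \theta_0 + h_\theta n^{-1/2}$,

\[
u_i(\theta_0) = (W_i'\Gamma + V_i')h_\theta n^{-1/2} - X_i'(X'X)^{-1}X'(W\Gamma + V)h_\theta n^{-1/2} + u_i - X_i'(X'X)^{-1}X'u.
\]

Note that $\|Z^*_i\|^2\|W_i\|^2$, $\|Z^*_i\|^2\|X_i\|^2$, $\|Z^*_i\|^2\|V_i\|^2$, $\|Z^*_i\|^2u_i^2$, $\|Z^*_i\|^2\|W_i\|\|X_i\|$, $\|Z^*_i\|^2\|W_i\|\|V_i\|$, $\|Z^*_i\|^2\|W_i\||u_i|$, $\|Z^*_i\|^2\|X_i\|\|V_i\|$, $\|Z^*_i\|^2\|X_i\||u_i|$, $\|Z^*_i\|^2\|V_i\||u_i|$ have finite $1+\delta/2$ moments by the Cauchy-Schwarz inequality. The same moment bounds are also obtained when $\|Z^*_i\|^2$ in the previous quantities is replaced by $\|X_i\|^2$ and $\|Z^*_i\|\|X_i\|$. By the triangular array WLLN, CMT and the fact that $D \xrightarrow{p} 0$ in (A.35),

\[
\begin{aligned}
\Sigma &= n^{-1}\sum_{i=1}^nZ_iZ_i'u_i(\theta_0)^2 \\
&= n^{-1}\sum_{i=1}^nZ_iZ_i'\left((W_i'\Gamma + V_i')h_\theta n^{-1/2} - X_i'(X'X)^{-1}X'(W\Gamma + V)h_\theta n^{-1/2} + u_i - X_i'(X'X)^{-1}X'u\right)^2 \\
&= E_P[Z^*_iZ^{*\prime}_iu_i^2] + o_p(1) = \Sigma_n + o_p(1) = h_{5,m} + o_p(1).
\end{aligned}
\tag{A.150}
\]

Proceeding similarly to the argument in (A.150),

\[
\begin{aligned}
C_s &= n^{-1}\sum_{i=1}^nZ_iZ_i'Y_{is}u_i(\theta_0) \\
&= n^{-1}\sum_{i=1}^nZ_iZ_i'\left(Z^{*\prime}_i\Gamma_s + V_{is} + X_i'(D'\Gamma_s - (X'X)^{-1}X'V_{\cdot s})\right) \\
&\qquad \times \left((W_i'\Gamma + V_i')h_\theta n^{-1/2} - X_i'(X'X)^{-1}X'(W\Gamma + V)h_\theta n^{-1/2} + u_i - X_i'(X'X)^{-1}X'u\right) \\
&= n^{-1}\sum_{i=1}^nZ^*_iZ^{*\prime}_i(Z^{*\prime}_i\Gamma_s + V_{is})u_i + o_p(1) = E_P[Z^*_iZ^{*\prime}_i(Z^{*\prime}_i\Gamma_s + V_{is})u_i] + o_p(1) = C_{ns} + o_p(1).
\end{aligned}
\tag{A.151}
\]

The limit in (A.38) remains unchanged under the local alternatives. Thus, from (A.150) and (A.151), we have

\[
V_a(\theta_0) = V_{n,a} + o_p(1).
\tag{A.152}
\]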

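Next, to determine the limit of $V_b(\theta_0)$, note first that

\[
\varepsilon_i(\theta_0) - \begin{bmatrix} 0_{1\times k} \\ -\Gamma' \end{bmatrix}Z_i =
\begin{bmatrix}
(W_i'\Gamma + V_i')h_\theta n^{-1/2} - X_i'(X'X)^{-1}X'(W\Gamma + V)h_\theta n^{-1/2} + u_i - X_i'(X'X)^{-1}X'u \\
-V_i + V'X(X'X)^{-1}X_i
\end{bmatrix}
\]

and

\[
\varepsilon_{in}(\theta_0) - \begin{bmatrix} 0_{1\times k} \\ -\Gamma' \end{bmatrix}Z_i =
\begin{bmatrix}
n^{-1/2}h_\theta'\Gamma'Z_i + n^{-1/2}h_\theta'V'Z(Z'Z)^{-1}Z_i + u'Z(Z'Z)^{-1}Z_i \\
-V'Z(Z'Z)^{-1}Z_i
\end{bmatrix}.
\]

Hence,

\[
\varepsilon_i(\theta_0) - \varepsilon_{in}(\theta_0) =
\begin{bmatrix} a_i \\ 0_{d\times 1} \end{bmatrix} +
\begin{bmatrix}
u_i - X_i'(X'X)^{-1}X'u - u'Z(Z'Z)^{-1}Z_i \\
-V_i + V'X(X'X)^{-1}X_i + V'Z(Z'Z)^{-1}Z_i
\end{bmatrix},
\tag{A.153}
\]

where

\[
a_i \equiv (W_i'\Gamma + V_i')h_\theta n^{-1/2} - X_i'(X'X)^{-1}X'(W\Gamma + V)h_\theta n^{-1/2} + n^{-1/2}h_\theta'\Gamma'Z_i + n^{-1/2}h_\theta'V'Z(Z'Z)^{-1}Z_i.
\tag{A.154}
\]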

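Next we show that $n^{-1}\sum_{i=1}^na_i^2Z_iZ_i' \xrightarrow{p} 0$. For $D_i \in \{W_i, X_i, V_i\}$ and $j + l \le 4$, $j \ge 0$, $l \ge 2$, by the $c_r$ inequality, and the WLLN,

\[
n^{-1}\sum_{i=1}^n\|D_i\|^j\|Z_i\|^l \le n^{-1}\sum_{i=1}^n\|D_i\|^j2^{l-1}\left(\|W_i\|^l + \|W'X(X'X)^{-1}\|^l\|X_i\|^l\right) = O_p(1).
\tag{A.155}
\]

Therefore, using (A.155) and $n^{-1}X'X = O_p(1)$, $n^{-1}X'W = O_p(1)$, $n^{-1}Z'Z = O_p(1)$, $n^{-1}Z'V = O_p(1)$, $n^{-1}X'V = O_p(1)$ which hold by the WLLN and the CMT,

\[
\begin{aligned}
\Big\|n^{-1}\sum_{i=1}^na_i^2Z_iZ_i'\Big\| &\le \|h_\theta\|^2n^{-1}\Big(n^{-1}\sum_{i=1}^n\|W_i'\Gamma + V_i'\|^2\|Z_i\|^2 + \|(X'X)^{-1}X'(W\Gamma + V)\|^2n^{-1}\sum_{i=1}^n\|X_i\|^2\|Z_i\|^2 \\
&\quad + \|\Gamma\|^2n^{-1}\sum_{i=1}^n\|Z_i\|^3 + \|V'Z(Z'Z)^{-1}\|^2n^{-1}\sum_{i=1}^n\|Z_i\|^3 \\
&\quad + 2\|(X'X)^{-1}X'(W\Gamma + V)\|n^{-1}\sum_{i=1}^n\|X_i\|\|Z_i\|^2 + 2\|\Gamma\|n^{-1}\sum_{i=1}^n\|W_i'\Gamma + V_i'\|\|Z_i\|^3 \\
&\quad + 2\|V'Z(Z'Z)^{-1}\|n^{-1}\sum_{i=1}^n\|W_i'\Gamma + V_i'\|\|Z_i\|^3 \\
&\quad + 2\|(X'X)^{-1}X'(W\Gamma + V)\|\|\Gamma\|n^{-1}\sum_{i=1}^n\|X_i\|\|Z_i\|^3 \\
&\quad + 2\|V'Z(Z'Z)^{-1}\|\|(X'X)^{-1}X'(W\Gamma + V)\|n^{-1}\sum_{i=1}^n\|X_i\|\|Z_i\|^3 \\
&\quad + 2\|V'Z(Z'Z)^{-1}\|\|\Gamma\|n^{-1}\sum_{i=1}^n\|Z_i\|^4\Big) \\
&\xrightarrow{p} 0.
\end{aligned}
\tag{A.156}
\]

Similarly, we have

\[
n^{-1}\sum_{i=1}^na_i\left(u_i - X_i'(X'X)^{-1}X'u - u'Z(Z'Z)^{-1}Z_i\right) \otimes (Z_iZ_i') \xrightarrow{p} 0,
\tag{A.157}
\]

\[
n^{-1}\sum_{i=1}^na_i\left(-V_i + V'X(X'X)^{-1}X_i + V'Z(Z'Z)^{-1}Z_i\right) \otimes (Z_iZ_i') \xrightarrow{p} 0.
\tag{A.158}
\]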


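As shown in (A.51)

\[
\begin{aligned}
&n^{-1}\sum_{i=1}^n
\begin{bmatrix}
u_i - X_i'(X'X)^{-1}X'u - u'Z(Z'Z)^{-1}Z_i \\
-V_i + V'X(X'X)^{-1}X_i + V'Z(Z'Z)^{-1}Z_i
\end{bmatrix}
\begin{bmatrix}
u_i - X_i'(X'X)^{-1}X'u - u'Z(Z'Z)^{-1}Z_i \\
-V_i + V'X(X'X)^{-1}X_i + V'Z(Z'Z)^{-1}Z_i
\end{bmatrix}' \otimes (Z_iZ_i') \\
&\quad - E_P\left[\begin{bmatrix} u_i^2 & -u_iV_i' \\ -u_iV_i & V_iV_i' \end{bmatrix} \otimes (Z^*_iZ^{*\prime}_i)\right] \xrightarrow{p} 0.
\end{aligned}
\tag{A.159}
\]

By (A.153), (A.156), (A.157), (A.158), (A.159) and the CMT, we have

\[
V_b(\theta_0) - V_{n,b} \xrightarrow{p} 0.
\tag{A.160}
\]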

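AR and PAR statistics: Since, under $H_1 : \theta_n = \theta_0 + h_\theta n^{-1/2}$, $y - Y\theta_0 = Yh_\theta n^{-1/2} + X\gamma + u$, using $n^{-1}W'M_XW \xrightarrow{a.s.} Q_{zz}$ and $n^{-1}W'M_XV \xrightarrow{a.s.} 0$, the convergence in Lemma 16.4 of Andrews and Guggenberger (2019b) and Slutsky's lemma we obtain

\[
\begin{aligned}
n^{1/2}m(\theta_0) &= n^{-1/2}W'M_X(y - Y\theta_0) \\
&= n^{-1/2}W'M_X(Yh_\theta n^{-1/2} + X\gamma + u) \\
&= n^{-1}W'M_XW\Gamma h_\theta + n^{-1}W'M_XVh_\theta + n^{-1/2}W'M_Xu \\
&\xrightarrow{d} N[Gh_\theta, \Sigma],
\end{aligned}
\tag{A.161}
\]

where $\Sigma = \lim_{n\to\infty}E_P[Z^*_iZ^{*\prime}_iu_i^2]$ and $G = \lim_{n\to\infty}E_P[Z^*_iZ^{*\prime}_i]\Gamma = h_4$. (A.150), (A.161) and the CMT together yield $\mathrm{AR} \xrightarrow{d} \chi^2_k(\eta^2)$.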

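Next we derive the asymptotic distribution of the permutation statistics. Since $n^{-1}W'M_XW \xrightarrow{a.s.} Q_{zz}$, $n^{-1}V'M_XW \xrightarrow{a.s.} 0$, $n^{-1}W'M_Xu \xrightarrow{a.s.} 0$, $n^{-1}V'M_XV \xrightarrow{a.s.} \Sigma^V$, $n^{-1}u'M_XV \xrightarrow{a.s.} \Sigma^{uV}$ and $n^{-1}u'M_Xu \xrightarrow{a.s.} \sigma^2$, by the SLLN and CMT

\[
n^{-1}u(\theta_0)'u(\theta_0) = n^{-1}\left(M_XW\Gamma h_\theta n^{-1/2} + M_XVh_\theta n^{-1/2} + M_Xu\right)'\left(M_XW\Gamma h_\theta n^{-1/2} + M_XVh_\theta n^{-1/2} + M_Xu\right) \xrightarrow{a.s.} \sigma^2.
\tag{A.162}
\]

By Jensen's inequality, the WLLN, CMT, and (A.86)

\[
\begin{aligned}
n^{-1}\sum_{i=1}^nu_i(\theta_0)^4 - E_P[u_i^4]
&\le 27n^{-1}\sum_{i=1}^n\|W_i'\Gamma + V_i'\|^4\|h_\theta\|^4n^{-2} + 27n^{-1}\sum_{i=1}^n\|X_i\|^4\|(X'X)^{-1}X'(W\Gamma + V)\|^4\|h_\theta\|^4n^{-2} \\
&\quad + 27n^{-1}\sum_{i=1}^n(u_i - X_i'(X'X)^{-1}X'u)^4 - E_P[u_i^4] \\
&\xrightarrow{p} 0,
\end{aligned}
\tag{A.163}
\]

where for the first term we used $n^{-1}\sum_{i=1}^n\|W_i'\Gamma + V_i'\|^4 \le 8n^{-1}\sum_{i=1}^n(\|W_i\|^4\|\Gamma\|^4 + \|V_i\|^4) = O_p(1)$ which holds by the $c_r$ inequality and the WLLN. (A.134) holds by (A.162), (A.163) and an argument analogous to the consistency of $\Sigma^\pi$ in (A.89). (A.136) holds by (A.163). (A.137) holds by an argument similar to (A.150). Combining these and following an argument analogous to that in (A.138) then give

\[
n^{-1}W_\pi'M_X\Sigma_uM_XW_\pi - (Q_{ww} - Q_wQ_w')\sigma^2 \xrightarrow{p} 0 \text{ in } P\text{-probability}.
\tag{A.164}
\]

Furthermore, the convergence in (A.89) holds here using (A.162) and (A.163) in (A.84) and (A.88), respectively:

\[
\Sigma^\pi \xrightarrow{p^\pi} \sigma^2Q_{zz} \text{ in } P\text{-probability}.
\tag{A.165}
\]

(A.141) holds by (A.162). A bound analogous to (A.139) is obtained by an argument similar to (A.163). Thus, (A.142) and (A.143) hold. Therefore, by Lemma A.1

\[
\begin{aligned}
n^{-1/2}W_\pi'M_X(y - Y\theta_0) &= n^{-1/2}W'M_\iota u_\pi(\theta_0) \xrightarrow{d^\pi} N\left[0, (Q_{ww} - Q_wQ_w')\sigma^2\right] \quad P\text{-almost surely}, \\
n^{1/2}m^\pi(\theta_0) &= n^{-1/2}Z'u_\pi(\theta_0) \xrightarrow{d^\pi} N\left[0, Q_{zz}\sigma^2\right] \quad P\text{-almost surely}.
\end{aligned}
\tag{A.166}
\]

Finally, by (A.164), (A.165), (A.166) and Slutsky's lemma, we obtain that $\mathrm{PAR}_1 \xrightarrow{d^\pi} \chi^2_k$ and $\mathrm{PAR}_2 \xrightarrow{d^\pi} \chi^2_k$ in $P$-probability.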

LM and PLM statistics: By definition, $q = d$ if and only if $n^{1/2}\tau_{dn} \to \infty$. Decompose $n^{1/2}H_nJU_nT_n = n^{1/2}H_nG_nU_nT_n + n^{1/2}H_n(J - G_n)U_nT_n$. By an analogous argument to that in (A.97) (see also Lemma 16.4 and the equations (25.5)-(25.6) of Andrews and Guggenberger (2019b)),

\[
n^{1/2}H_nG_nU_nT_n \xrightarrow{d} h_{3,q}
\tag{A.167}
\]

because the second term involving $B_{n,d-q}$ in (A.97) drops out when $q = d$. The asymptotic distribution of $n^{1/2}(G - G_n)$ remains invariant under $H_1 : \theta_n = \theta_0 + h_\theta n^{-1/2}$. This combined with (A.150), (A.151) and (A.161) yields

\[
n^{1/2}(J - G_n) = O_p(1).
\tag{A.168}
\]

Therefore, $n^{1/2}H_n(J - G_n)U_nT_n = H_nn^{1/2}(J - G_n)U_nB_{n,q}n^{-1/2}\Upsilon_{n,q}^{-1} = o_p(1)$ because $n^{1/2}\tau_{dn} \to \infty$, and

\[
n^{1/2}H_nJU_nT_n \xrightarrow{d} h_{3,q}.
\tag{A.169}
\]

Using $\Sigma^{-1/2}H_n^{-1} \xrightarrow{p} I_k$ which holds by (A.150) and the CMT, (A.150), (A.169) and the CMT

\[
\mathrm{LM} = n\,m'\Sigma^{-1/2}P_{\Sigma^{-1/2}J}\Sigma^{-1/2}m = n\,m'\Sigma^{-1/2}P_{\Sigma^{-1/2}H_n^{-1}H_nJU_nT_n}\Sigma^{-1/2}m \xrightarrow{d} \chi^2_d(\eta^2),
\tag{A.170}
\]

where we used $P_{\Sigma^{-1/2}H_n^{-1}H_nJU_nT_n} \xrightarrow{p} P_{h_{3,q}}$, and the fact that $U_n$ and $T_n$ are nonsingular, and $h_{3,q} = h_{3,d}$ has full column rank $d$.

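Next consider the PLM statistic. To see that (A.79) holds under the local alternatives, note that by Jensen's inequality, the SLLN and (A.79),

\[
\begin{aligned}
n^{-1}\sum_{i=1}^n|u_i(\theta_0)|^{2+\delta/2} &\le 3^{1+\delta/2}n^{-1}\sum_{i=1}^n\|W_i'\Gamma + V_i'\|^{2+\delta/2}\|h_\theta n^{-1/2}\|^{2+\delta/2} \\
&\quad + 3^{1+\delta/2}n^{-1}\sum_{i=1}^n\|X_i\|^{2+\delta/2}\|(X'X)^{-1}X'(W\Gamma + V)\|^{2+\delta/2}\|h_\theta n^{-1/2}\|^{2+\delta/2} \\
&\quad + 3^{1+\delta/2}n^{-1}\sum_{i=1}^n|u_i - X_i'(X'X)^{-1}X'u|^{2+\delta/2} \\
&= O_{a.s.}(1),
\end{aligned}
\tag{A.171}
\]

where $n^{-1}\sum_{i=1}^n\|W_i'\Gamma + V_i'\|^{2+\delta/2} \le 2^{1+\delta/2}n^{-1}\sum_{i=1}^n(\|W_i\|^{2+\delta/2}\|\Gamma\|^{2+\delta/2} + \|V_i\|^{2+\delta/2}) = O_{a.s.}(1)$ by the $c_r$ inequality and the triangular array SLLN. (A.81) holds by (A.171), hence (A.82) holds. Also, by the SLLN and the CMT

\[
n^{-1}u(\theta_0)'V
\tag{A.172}
\]

\[
\begin{aligned}
&= n^{-1}(M_XW\Gamma h_\theta n^{-1/2} + M_XVh_\theta n^{-1/2} + M_Xu)'(I_n - P_Z - P_X)V \\
&= n^{-1/2}h_\theta'n^{-1}V'(M_X - M_XW(W'M_XW)^{-1}W'M_X)V + n^{-1}u'(M_X - M_XW(W'M_XW)^{-1}W'M_X)V \\
&\xrightarrow{a.s.} \Sigma^{uV}.
\end{aligned}
\tag{A.173}
\]

Hence, (A.76) holds. By (A.82) and Lemma A.1, we then obtain (A.70). Note that (A.92) holds because (A.93) holds by (A.163). Also, (A.90) holds by (A.173). This verifies (A.94). Combining the latter with (A.70) and (A.165), we obtain

\[
n^{1/2}(J^\pi - G) = O_{p^\pi}(1) \text{ in } P\text{-probability}.
\tag{A.174}
\]

Thus, following the proof of Lemma A.6, noting that the term involving $B_{n,d-q}$ in (A.97) drops out when $q = d$, and using (A.168) and (A.174), we obtain

\[
\begin{aligned}
n^{1/2}H_nJ^\pi U_nT_n &= n^{1/2}H_nG_nU_nT_n + n^{1/2}H_n(G - G_n)U_nT_n + n^{1/2}H_n(J^\pi - G)U_nT_n \\
&\xrightarrow{p^\pi} \Delta^\pi_h = h_{3,d} \text{ in } P\text{-probability},
\end{aligned}
\]

where $\Delta^\pi_h = h_{3,q} = h_{3,d} = \Delta_h$ has full column rank. Since $(\Sigma^\pi)^{-1/2}\Sigma^{1/2} = (\sigma^2Q_{zz})^{-1/2}\Sigma^{1/2}$ is nonsingular, $P^\pi[\mathrm{rank}((\Sigma^\pi)^{-1/2}\Sigma^{1/2}\Delta^\pi_h) = d] = 1$ with $P$-probability one. Then,

\[
\begin{aligned}
\mathrm{PLM} &= n\,m^{\pi\prime}(\Sigma^\pi)^{-1/2}P_{(\Sigma^\pi)^{-1/2}J^\pi}(\Sigma^\pi)^{-1/2}m^\pi \\
&= n\,m^{\pi\prime}(\Sigma^\pi)^{-1/2}P_{((\Sigma^\pi)^{-1/2}\Sigma_n^{1/2})n^{1/2}\Sigma_n^{-1/2}J^\pi U_nT_n}(\Sigma^\pi)^{-1/2}m^\pi \\
&\xrightarrow{d^\pi} m^{\pi\prime}_h(\Sigma^\pi)^{-1/2}P_{(\Sigma^\pi)^{-1/2}\Sigma^{1/2}\Delta^\pi_h}(\Sigma^\pi)^{-1/2}m^\pi_h \\
&\sim \chi^2_d \text{ in } P\text{-probability},
\end{aligned}
\tag{A.175}
\]

where the second equality holds because $\Sigma_n$ and $T_n$ are nonsingular, the convergence follows from the CMT, the full column rank property verified above, and the last line uses (A.166) and the fact that $P_{(\Sigma^\pi)^{-1/2}\Sigma^{1/2}\Delta^\pi_h}$ is idempotent with rank $d$.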


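CLR and PCLR statistics: The asymptotic distributions of the CLR and PCLR statistics under the sequence of local alternatives and strong or semi-strong identification with $n^{1/2}\tau_{dn} \to \infty$ are derived proceeding similarly to Theorem A.7 and Lemma 28.1 of Andrews and Guggenberger (2019b). By definition, $q = d$ if and only if $n^{1/2}\tau_{dn} \to \infty$, in which case $Q^+_2(\kappa)$ defined in (A.120) is scalar. (A.128) also holds under the local alternatives because $J - G_n = O_p(n^{-1/2})$ as verified in (A.168), and all the remaining quantities $n^{1/2}(\Sigma^\pi)^{-1/2}m^\pi$, $\Sigma$, $V \in \{V_a(\theta_0), V_b(\theta_0)\}$ have the same limits as under the null as shown in (A.150), (A.152), (A.160), (A.165), (A.166). Hence,

\[
Q^+_2(\kappa^+_j) = M^+_{d+1-q} - \kappa^+_j(I_{d+1-q} + o_{p^\pi}(1)),
\tag{A.176}
\]

where

\[
M^+_{d+1-q} \equiv nB^{+\prime}_{n,d+1-q}U^{+\prime}_nJ^{+\prime}H_n'h_{3,k-q}h_{3,k-q}'H_nJ^+U^+_nB^+_{n,d+1-q} + o_p(1).
\tag{A.177}
\]

Since $|Q^+_2(\kappa^+_{d+1})| = |M^+_{d+1-q} - \kappa^+_{d+1}(I_{d+1-q} + o_{p^\pi}(1))| = 0$ with $P^\pi$-probability approaching 1, in $P$-probability, we obtain

\[
\begin{aligned}
\kappa^+_{d+1} &= nB^{+\prime}_{n,d+1-q}U^{+\prime}_nJ^{+\prime}H_n'h_{3,k-q}h_{3,k-q}'H_nJ^+U^+_nB^+_{n,d+1-q}(1 + o_{p^\pi}(1)) + o_{p^\pi}(1) \\
&= n\,m^{\pi\prime}(\Sigma^\pi)^{-1/2}h_{3,k-q}h_{3,k-q}'(\Sigma^\pi)^{-1/2}m^\pi + o_{p^\pi}(1) \text{ in } P\text{-probability}.
\end{aligned}
\tag{A.178}
\]

When $q = d$ and under the local alternatives $H_1 : \theta_n = \theta_0 + h_\theta n^{-1/2}$, as shown in (A.169), $n^{1/2}HJ^\pi U_nB_nS_n \xrightarrow{d^\pi} \Delta^\pi_h = h_{3,q}$ as the term $\Delta^\pi_{h,d-q}$ drops out. Then,

\[
\begin{aligned}
\mathrm{PCLR} &= n\,m^{\pi\prime}(\Sigma^\pi)^{-1}m^\pi - \kappa^+_{d+1} \\
&= n\,m^{\pi\prime}(\Sigma^\pi)^{-1/2}(I_k - h_{3,k-q}h_{3,k-q}')(\Sigma^\pi)^{-1/2}m^\pi + o_{p^\pi}(1) \\
&= n\,m^{\pi\prime}(\Sigma^\pi)^{-1/2}h_{3,q}h_{3,q}'(\Sigma^\pi)^{-1/2}m^\pi + o_{p^\pi}(1) \\
&= n\,m^{\pi\prime}(\Sigma^\pi)^{-1/2}P_{h_{3,q}}(\Sigma^\pi)^{-1/2}m^\pi + o_{p^\pi}(1) \\
&= n\,m^{\pi\prime}(\Sigma^\pi)^{-1/2}P_{((\Sigma^\pi)^{-1/2}\Sigma_n^{1/2})n^{1/2}\Sigma_n^{-1/2}J^\pi U_nT_n}(\Sigma^\pi)^{-1/2}m^\pi + o_{p^\pi}(1) \\
&\xrightarrow{d^\pi} \chi^2_d \text{ in } P\text{-probability},
\end{aligned}
\tag{A.179}
\]

where the second equality uses (A.178), the third and the fourth equalities use $P_{n^{1/2}HJ^\pi U_nB_nS_n} = P_{h_{3,q}} + o_p(1) = h_{3,q}(h_{3,q}'h_{3,q})^{-1}h_{3,q}' + o_p(1) = h_{3,q}h_{3,q}' + o_p(1)$ and (A.161) since $h_{3,q}'h_{3,q} = I_d$, and the convergence follows from (A.175).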

Recall that by (A.167) (see also Lemma A.6, Lemma 10.3 of Andrews and Guggenberger (2017b) and Lemma 16.4 of Andrews and Guggenberger (2019b)), $n^{1/2}HJU_nB_nS_n \xrightarrow{d} h_{3,q}$. By replacing $(\Sigma^\pi)^{-1/2}m^\pi$ in (A.178) and (A.179) with $\Sigma^{-1/2}m$ and using (A.161), the asymptotic distribution of the CLR statistic is obtained similarly:

\[
\begin{aligned}
\mathrm{CLR} &= n\,m'\Sigma^{-1/2}h_{3,q}h_{3,q}'\Sigma^{-1/2}m + o_p(1) \\
&= n\,m'\Sigma^{-1/2}P_{\Sigma^{-1/2}H_n^{-1}H_nJU_nT_n}\Sigma^{-1/2}m \\
&= n\,m'\Sigma^{-1/2}P_{\Sigma^{-1/2}J}\Sigma^{-1/2}m + o_p(1) \\
&\xrightarrow{d} \chi^2_d(\eta^2),
\end{aligned}
\]

where the second and the third equalities use $P_{\Sigma^{-1/2}J} = P_{n^{1/2}HJU_nB_nS_n} = P_{h_{3,q}} + o_p(1) = h_{3,q}(h_{3,q}'h_{3,q})^{-1}h_{3,q}' + o_p(1) = h_{3,q}h_{3,q}' + o_p(1)$ and the convergence uses (A.170).

A.8 Proof of Lemma A.8

Proof of Lemma A.8. In view of Lemma A.3, $o_p(1)$ terms are $o_{p^\pi}(1)$ in $P$-probability. For simplicity, we suppress the qualifier "in $P$-probability". Also, whenever we use the convergence result $n^{1/2}(J - G_n) = O_p(1)$ given in (A.60), we are conditioning on the event $\omega \in \Omega_1$ as discussed in Section A.4. The result for subsequences follows similarly, thus we provide only the full sequence result.

Recall that by definition, $h_{6,j} \equiv \lim_{n\to\infty}\tau_{(j+1)n}/\tau_{jn}$, $j = 1, \dots, d$, and $h_{6,q} = 0$ if $q < d+1$. Define $h_{6,q} = 0$ if $q = d+1$. Because $\tau_{1n} \ge \dots \ge \tau_{(d+1)n} \ge 0$, $h_{6,j} \in [0, 1]$. When $h_{6,j} = \lim_{n\to\infty}\tau_{(j+1)n}/\tau_{jn} > 0$, $\{\tau_{jn} : n \ge 1\}$ and $\{\tau_{(j+1)n} : n \ge 1\}$ have the same orders of magnitude.

Next divide the first $q$ SV's $\tau_{1n} \ge \dots \ge \tau_{qn}$ into groups such that the SV's within each group have the same orders of magnitude. Denote the number of groups by $G_h$ with $q \ge G_h \ge 1$, and the first and last indices of the SV's in the $g$-th group by $r_g$ and $r^*_g$ respectively. Thus, $r_1 = 1$, $r^*_g = r_{g+1} - 1$ and $r^*_{G_h} = q$ with the definition $r_{G_h+1} \equiv q + 1$. By construction, $h_{6,j} > 0$ for all $j = r_g, \dots, r^*_g - 1$, $g = 1, \dots, G_h$. Also, $\lim_{n\to\infty}\tau_{j'n}/\tau_{jn} = 0$ for any $j \in g$ and $j' \in g'$ with $g \neq g'$. When $d = 1$, $G_h = 1$ and $r_1 = r^*_1 = 1$.

Step 1: The first group of eigenvalues. Recall that the eigenvalues of nU+′J+′H ′HJ+U+ are

denoted as κ+1 , . . . , κ

+d+1. Then,

κ+d+1 = λmin[n(HJU , Σπ−1/2mπ)′(HJU , Σπ−1/2mπ)] = λmin[nU+′J+′H ′HJ+U+].

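Now premultiply the determinantal equation
\[
|nU^{+\prime}J^{+\prime}H'HJ^+U^+ - \kappa I_{d+1}| = 0
\]
by $|\tau_{r_1n}^{-2}n^{-1}B_n^{+\prime}U_n^{+\prime}(U^+)^{-1\prime}|$ and postmultiply by $|(U^+)^{-1}U_n^+B_n^+|$ to obtain
\[
|\tau_{r_1n}^{-2}B_n^{+\prime}U_n^{+\prime}J^{+\prime}H'HJ^+U_n^+B_n^+ - (n^{1/2}\tau_{r_1n})^{-2}\kappa\,B_n^{+\prime}U_n^{+\prime}(U^+)^{-1\prime}(U^+)^{-1}U_n^+B_n^+| = 0. \tag{A.180}
\]
Since $|B_n^+|, |U_n^+| > 0$ and $|U^+| > 0$ with $P^\pi$-probability approaching 1 (wppa 1), and $\tau_{r_1n} > 0$ for $n$ large given $n^{1/2}\tau_{r_1n} \to \infty$ for $r_1 \le q$, the eigenvalues $\kappa_j^+$, $1 \le j \le d+1$, above also solve (A.180) wppa 1. Thus, $\{(n^{1/2}\tau_{r_1n})^{-2}\kappa_j^+ : 1 \le j \le d+1\}$ solve
\[
|\tau_{r_1n}^{-2}B_n^{+\prime}U_n^{+\prime}J^{+\prime}H'HJ^+U_n^+B_n^+ - \kappa(I_{d+1} + E^+)| = 0, \tag{A.181}
\]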

where
\[
E^+ = \begin{bmatrix} E_1^+ & E_2^+ \\ E_2^{+\prime} & E_3^+ \end{bmatrix} = B_n^{+\prime}U_n^{+\prime}(U^+)^{-1\prime}(U^+)^{-1}U_n^+B_n^+ - I_{d+1}, \tag{A.182}
\]

where $E_1^+ \in \mathbb{R}^{r_1^*\times r_1^*}$, $E_2^+ \in \mathbb{R}^{r_1^*\times(d+1-r_1^*)}$ and $E_3^+ \in \mathbb{R}^{(d+1-r_1^*)\times(d+1-r_1^*)}$ (footnote 4: the matrices $E_j^+$, $j = 1,2,3$, here and the $E_j^+$, $j = 1,2,3$, in (A.118) are different matrices because they have different dimensions). Note that $HJ^+U_n^+B_n^+ = [HJU_nB_n, (\Sigma^\pi)^{-1/2}m^\pi]$ and $HJ_n^+U_n^+B_n^+ = [HG_nU_nB_n, 0_{k\times1}]$, and by the SVD, $H_nG_nU_n = A_n\Upsilon_nB_n'$. Let $O(\tau_{r_2n}/\tau_{r_1n})^{d_1\times d_2}$ denote a $d_1\times d_2$ matrix with $O(\tau_{r_2n}/\tau_{r_1n})$ elements on the main diagonal and zeros elsewhere. Thus,

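\begin{align*}
\tau_{r_1n}^{-1}HJ^+U_n^+B_n^+
&= \tau_{r_1n}^{-1}HJ_n^+U_n^+B_n^+ + \tau_{r_1n}^{-1}H(J^+ - J_n^+)U_n^+B_n^+ \\
&= \tau_{r_1n}^{-1}[HG_nU_nB_n, 0_{k\times1}] + (n^{1/2}\tau_{r_1n})^{-1}[Hn^{1/2}(J - G_n)U_nB_n, n^{1/2}(\Sigma^\pi)^{-1/2}m^\pi] \\
&= \tau_{r_1n}^{-1}[HH_n^{-1}A_n\Upsilon_nB_n'B_n, 0_{k\times1}] + O_{p^\pi}((n^{1/2}\tau_{r_1n})^{-1}) \\
&= \tau_{r_1n}^{-1}(I_k + o_{p^\pi}(1))A_n\Upsilon_n^+ + O_{p^\pi}((n^{1/2}\tau_{r_1n})^{-1}) \\
&= (I_k + o_{p^\pi}(1))A_n
\begin{bmatrix}
h^*_{6,r_1^*} + o(1) & 0_{r_1^*\times(d-r_1^*)} & 0_{r_1^*\times1} \\
0_{(d-r_1^*)\times r_1^*} & O(\tau_{r_2n}/\tau_{r_1n})^{(d-r_1^*)\times(d-r_1^*)} & 0_{(d-r_1^*)\times1} \\
0_{(k-d)\times r_1^*} & 0_{(k-d)\times(d-r_1^*)} & 0_{(k-d)\times1}
\end{bmatrix}
+ O_{p^\pi}((n^{1/2}\tau_{r_1n})^{-1}) \tag{A.183} \\
&\xrightarrow{p^\pi} h_3
\begin{bmatrix}
h^*_{6,r_1^*} & 0_{r_1^*\times(d+1-r_1^*)} \\
0_{(k-r_1^*)\times r_1^*} & 0_{(k-r_1^*)\times(d+1-r_1^*)}
\end{bmatrix},
\qquad h^*_{6,r_1^*} \equiv \mathrm{diag}\Big(1, h_{6,1}, h_{6,1}h_{6,2}, \dots, \prod_{l=1}^{r_1^*-1}h_{6,l}\Big),
\end{align*}
where the third equality holds by $B_n'B_n = I_d$, $J - G_n = O_p(n^{-1/2})$ as shown in (A.60), $n^{1/2}(\Sigma^\pi)^{-1/2}m^\pi = O_{p^\pi}(1)$ as shown in Lemma A.6, $HH_n^{-1} \xrightarrow{p} I_k$, $\|H_n\| < \infty$, $U_n = O(1)$ by definition, and $B_n = O(1)$ by the orthogonality of $B_n$; the fifth equality uses $\tau_{jn}/\tau_{r_1n} = \prod_{l=1}^{j-1}(\tau_{(l+1)n}/\tau_{ln}) = \prod_{l=1}^{j-1}h_{6,l} + o(1)$, $j = 2,\dots,r_1^*$, and $\tau_{jn}/\tau_{r_1n} = O(\tau_{r_2n}/\tau_{r_1n})$, $j = r_2,\dots,d+1$, which holds by $\tau_{1n} \ge \cdots \ge \tau_{(d+1)n}$; and the convergence uses $A_n \to h_3$, $\tau_{r_2n}/\tau_{r_1n} \to 0$ and $n^{1/2}\tau_{r_1n} \to \infty$ since $r_1 \le q$. Therefore, using $h_3'h_3 = I_k$,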

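\begin{align*}
\tau_{r_1n}^{-2}B_n^{+\prime}U_n^{+\prime}J^{+\prime}H'HJ^+U_n^+B_n^+
&\xrightarrow{p^\pi}
\begin{bmatrix}
h^*_{6,r_1^*} & 0_{r_1^*\times(d+1-r_1^*)} \\
0_{(k-r_1^*)\times r_1^*} & 0_{(k-r_1^*)\times(d+1-r_1^*)}
\end{bmatrix}'
h_3'h_3
\begin{bmatrix}
h^*_{6,r_1^*} & 0_{r_1^*\times(d+1-r_1^*)} \\
0_{(k-r_1^*)\times r_1^*} & 0_{(k-r_1^*)\times(d+1-r_1^*)}
\end{bmatrix} \\
&=
\begin{bmatrix}
h^{*2}_{6,r_1^*} & 0_{r_1^*\times(d+1-r_1^*)} \\
0_{(d+1-r_1^*)\times r_1^*} & 0_{(d+1-r_1^*)\times(d+1-r_1^*)}
\end{bmatrix}.
\end{align*}
Since $U - U_n \xrightarrow{p} 0$ and $U_n \to h_{91}$ positive definite, $U^+ - U_n^+ \xrightarrow{p} 0$ and $U_n^+ \to h_{91}^+$. Thus, $(U^+)^{-1}U_n^+ \xrightarrow{p} I_{d+1}$, and using the CMT in conjunction with $B_n^{+\prime}B_n^+ = I_{d+1}$, we have
\[
E^+ = B_n^{+\prime}U_n^{+\prime}(U^+)^{-1\prime}(U^+)^{-1}U_n^+B_n^+ - I_{d+1} \xrightarrow{p^\pi} 0_{(d+1)\times(d+1)}. \tag{A.184}
\]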

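Thus, $\{(n^{1/2}\tau_{r_1n})^{-2}\kappa_j^+ : j = 1,\dots,d+1\}$ solve $|\tau_{r_1n}^{-2}B_n^{+\prime}U_n^{+\prime}J^{+\prime}H'HJ^+U_n^+B_n^+ - \kappa I_{d+1}| = 0$ wppa 1. Then, since the ordered vector of eigenvalues of a matrix is a continuous function of the matrix, using Slutsky's lemma,
\begin{align*}
\big((n^{1/2}\tau_{r_1n})^{-2}\kappa_1^+, \dots, (n^{1/2}\tau_{r_1n})^{-2}\kappa_{r_1^*}^+\big) &\xrightarrow{p^\pi} \Big(1, h_{6,1}^2, h_{6,1}^2h_{6,2}^2, \dots, \prod_{l=1}^{r_1^*-1}h_{6,l}^2\Big), \tag{A.185} \\
\kappa_j^+ &\xrightarrow{p^\pi} \infty, \quad j = 1,\dots,r_1^*, \tag{A.186}
\end{align*}
because $n^{1/2}\tau_{r_1n} \to \infty$ given $r_1 \le q$ and $h_{6,l} > 0$ for all $l = 1,\dots,r_1^*-1$. The same argument also yields that the remaining $d+1-r_1^*$ roots $\{(n^{1/2}\tau_{r_1n})^{-2}\kappa_j^+ : j = r_1^*+1,\dots,d+1\}$ of $|\tau_{r_1n}^{-2}B_n^{+\prime}U_n^{+\prime}J^{+\prime}H'HJ^+U_n^+B_n^+ - \kappa I_{d+1}| = 0$ satisfy
\[
(n^{1/2}\tau_{r_1n})^{-2}\kappa_j^+ \xrightarrow{p^\pi} 0, \quad j = r_1^*+1,\dots,d+1. \tag{A.187}
\]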
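The continuity step behind (A.185) and (A.187) can be illustrated numerically. The sketch below is only an illustration under assumed values, not part of the proof: it takes a hypothetical $2\times2$ symmetric matrix whose limit is $\mathrm{diag}(1, 0)$ (one group with $r_1^* = 1$ and $d + 1 = 2$), adds a symmetric perturbation with entries of size `eps`, and uses the closed-form eigenvalues of a $2\times2$ symmetric matrix to show that the ordered eigenvalues approach $(1, 0)$ as `eps` shrinks. The function names and the perturbation pattern are assumptions made for this example.

```python
import math

def ordered_eigenvalues_2x2(a, b, c):
    """Eigenvalues (largest first) of the symmetric matrix [[a, b], [b, c]]."""
    mean = (a + c) / 2.0
    radius = math.sqrt(((a - c) / 2.0) ** 2 + b ** 2)
    return mean + radius, mean - radius

def perturbed_eigenvalues(eps):
    # Hypothetical stand-in for the scaled matrix: its limit is diag(1, 0),
    # and every entry is shifted by a perturbation of size eps.
    return ordered_eigenvalues_2x2(1.0 + eps, eps, eps)

for eps in (1e-1, 1e-2, 1e-3):
    top, bottom = perturbed_eigenvalues(eps)
    print(f"eps={eps:g}: eigenvalues=({top:.6f}, {bottom:.6f})")
```

The largest eigenvalue stays near 1 (the first group) while the smallest drifts to 0 (the remaining roots), which is the pattern that (A.185) and (A.187) formalize.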

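Let $B^+_{n,j_1,j_2}$ denote the $(d+1)\times(j_2-j_1)$ matrix that consists of columns $j_1+1,\dots,j_2$ of $B_n^+$ for $0 \le j_1 < j_2 \le d+1$. Thus, we may write $B_n^+ = [B^+_{n,0,r_1^*}, B^+_{n,r_1^*,d+1}]$. Proceeding similarly to (A.183),
\begin{align*}
\tau_{r_1n}^{-1}HJ^+U_n^+B^+_{n,0,r_1^*} &= (I_k + o_p(1))A_n
\begin{bmatrix}
h^*_{6,r_1^*} + o(1) \\
0_{(k-r_1^*)\times r_1^*}
\end{bmatrix}
+ O_{p^\pi}((n^{1/2}\tau_{r_1n})^{-1}), \\
\tau_{r_1n}^{-1}HJ^+U_n^+B^+_{n,r_1^*,d+1} &= (I_k + o_p(1))A_n
\begin{bmatrix}
0_{r_1^*\times(d+1-r_1^*)} \\
O(\tau_{r_2n}/\tau_{r_1n})^{(k-r_1^*)\times(d+1-r_1^*)}
\end{bmatrix}
+ O_{p^\pi}((n^{1/2}\tau_{r_1n})^{-1}).
\end{align*}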

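From the last two expressions,
\begin{align*}
\varrho_n &\equiv \tau_{r_1n}^{-2}B^{+\prime}_{n,0,r_1^*}U_n^{+\prime}J^{+\prime}H'HJ^+U_n^+B^+_{n,r_1^*,d+1} \\
&=
\begin{bmatrix}
h^*_{6,r_1^*} + o(1) \\
0_{(k-r_1^*)\times r_1^*}
\end{bmatrix}'
A_n'(I_k + o_p(1))A_n
\begin{bmatrix}
0_{r_1^*\times(d+1-r_1^*)} \\
O(\tau_{r_2n}/\tau_{r_1n})^{(k-r_1^*)\times(d+1-r_1^*)}
\end{bmatrix}
+ O_{p^\pi}((n^{1/2}\tau_{r_1n})^{-1}) \\
&=
\begin{bmatrix}
h^*_{6,r_1^*} + o(1) \\
0_{(k-r_1^*)\times r_1^*}
\end{bmatrix}'
\begin{bmatrix}
I_{r_1^*} + o_{p^\pi}(1) & o_{p^\pi}(1)^{r_1^*\times(k-r_1^*)} \\
o_{p^\pi}(1)^{(k-r_1^*)\times r_1^*} & I_{k-r_1^*} + o_{p^\pi}(1)
\end{bmatrix}
\begin{bmatrix}
0_{r_1^*\times(d+1-r_1^*)} \\
O(\tau_{r_2n}/\tau_{r_1n})^{(k-r_1^*)\times(d+1-r_1^*)}
\end{bmatrix}
+ O_{p^\pi}((n^{1/2}\tau_{r_1n})^{-1}) \\
&= o_{p^\pi}(\tau_{r_2n}/\tau_{r_1n}) + O_{p^\pi}((n^{1/2}\tau_{r_1n})^{-1}). \tag{A.188}
\end{align*}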

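Define
\begin{align*}
\xi_1(\kappa) &\equiv \tau_{r_1n}^{-2}B^{+\prime}_{n,0,r_1^*}U_n^{+\prime}J^{+\prime}H'HJ^+U_n^+B^+_{n,0,r_1^*} - \kappa(I_{r_1^*} + E_1^+) \in \mathbb{R}^{r_1^*\times r_1^*}, \\
\xi_2(\kappa) &\equiv \varrho_n - \kappa E_2^+ \in \mathbb{R}^{r_1^*\times(d+1-r_1^*)}, \\
\xi_3(\kappa) &\equiv \tau_{r_1n}^{-2}B^{+\prime}_{n,r_1^*,d+1}U_n^{+\prime}J^{+\prime}H'HJ^+U_n^+B^+_{n,r_1^*,d+1} - \kappa(I_{d+1-r_1^*} + E_3^+) \in \mathbb{R}^{(d+1-r_1^*)\times(d+1-r_1^*)}.
\end{align*}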

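Next we verify that $(n^{1/2}\tau_{r_1n})^{-2}\kappa_j^+$, $j = r_1^*+1,\dots,d+1$, cannot solve the determinantal equation $|\xi_1(\kappa)| = 0$ wppa 1. First, note that
\[
\tau_{r_1n}^{-2}B^{+\prime}_{n,0,r_1^*}U_n^{+\prime}J^{+\prime}H'HJ^+U_n^+B^+_{n,0,r_1^*} \xrightarrow{p^\pi} h^{*2}_{6,r_1^*}. \tag{A.189}
\]
Then, using (A.184), (A.187) and (A.189), for $j = r_1^*+1,\dots,d+1$,
\begin{align*}
\xi_{j1} &\equiv \xi_1((n^{1/2}\tau_{r_1n})^{-2}\kappa_j^+) \\
&= \tau_{r_1n}^{-2}B^{+\prime}_{n,0,r_1^*}U_n^{+\prime}J^{+\prime}H'HJ^+U_n^+B^+_{n,0,r_1^*} - (n^{1/2}\tau_{r_1n})^{-2}\kappa_j^+(I_{r_1^*} + E_1^+) \\
&= h^{*2}_{6,r_1^*} + o_{p^\pi}(1) - o_{p^\pi}(1)(I_{r_1^*} + o_{p^\pi}(1)) \\
&= h^{*2}_{6,r_1^*} + o_{p^\pi}(1). \tag{A.190}
\end{align*}
Note that $\lambda_{\min}(h^{*2}_{6,r_1^*}) > 0$ because $h_{6,l} > 0$ for all $l = 1,\dots,r_1^*-1$. Hence,
\[
|\xi_1((n^{1/2}\tau_{r_1n})^{-2}\kappa_j^+)| \ne 0 \tag{A.191}
\]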


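wppa 1. Using the determinant formula for partitioned matrices,
\begin{align*}
0 &= \big|\tau_{r_1n}^{-2}B_n^{+\prime}U_n^{+\prime}J^{+\prime}H'HJ^+U_n^+B_n^+ - \kappa(I_{d+1} + E^+)\big| \tag{A.192} \\
&= \begin{vmatrix} \xi_1(\kappa) & \xi_2(\kappa) \\ \xi_2(\kappa)' & \xi_3(\kappa) \end{vmatrix}
= |\xi_1(\kappa)|\,|\xi_3(\kappa) - \xi_2(\kappa)'\xi_1(\kappa)^{-1}\xi_2(\kappa)| \\
&= |\xi_1(\kappa)|\,\big|\tau_{r_1n}^{-2}B^{+\prime}_{n,r_1^*,d+1}U_n^{+\prime}J^{+\prime}H'HJ^+U_n^+B^+_{n,r_1^*,d+1} - \varrho_n'\xi_1(\kappa)^{-1}\varrho_n \\
&\qquad - \kappa\big(I_{d+1-r_1^*} + E_3^+ - E_2^{+\prime}\xi_1(\kappa)^{-1}\varrho_n - \varrho_n'\xi_1(\kappa)^{-1}E_2^+ + \kappa E_2^{+\prime}\xi_1(\kappa)^{-1}E_2^+\big)\big|, \tag{A.193}
\end{align*}
for $\kappa$ equal to any solution $(n^{1/2}\tau_{r_1n})^{-2}\kappa_j^+$ to the equation (A.192). From (A.191), $(n^{1/2}\tau_{r_1n})^{-2}\kappa_j^+$, $j = r_1^*+1,\dots,d+1$, solve $|\xi_3(\kappa) - \xi_2(\kappa)'\xi_1(\kappa)^{-1}\xi_2(\kappa)| = 0$. Thus,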

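\begin{align*}
\big|&\tau_{r_1n}^{-2}B^{+\prime}_{n,r_1^*,d+1}U_n^{+\prime}J^{+\prime}H'HJ^+U_n^+B^+_{n,r_1^*,d+1} - o_{p^\pi}((\tau_{r_2n}/\tau_{r_1n})^2) + O_{p^\pi}((n^{1/2}\tau_{r_1n})^{-2}) \\
&- (n^{1/2}\tau_{r_1n})^{-2}\kappa_j^+(I_{d+1-r_1^*} + E_{j2})\big| = 0,
\end{align*}
where $E_{j2} \equiv E_3^+ - E_2^{+\prime}\xi_{j1}^{-1}\varrho_n - \varrho_n'\xi_{j1}^{-1}E_2^+ + (n^{1/2}\tau_{r_1n})^{-2}\kappa_j^+E_2^{+\prime}\xi_{j1}^{-1}E_2^+ \in \mathbb{R}^{(d+1-r_1^*)\times(d+1-r_1^*)}$. Multiplying the equation in the previous display by $\tau_{r_2n}^{-2}/\tau_{r_1n}^{-2}$ and using $O_{p^\pi}((n^{1/2}\tau_{r_2n})^{-2}) = o_{p^\pi}(1)$, which follows from $r_2 \le q$ and $n^{1/2}\tau_{jn} \to \infty$ for all $j \le q$,
\[
|\tau_{r_2n}^{-2}B^{+\prime}_{n,r_1^*,d+1}U_n^{+\prime}J^{+\prime}H'HJ^+U_n^+B^+_{n,r_1^*,d+1} + o_{p^\pi}(1) - (n^{1/2}\tau_{r_2n})^{-2}\kappa_j^+(I_{d+1-r_1^*} + E_{j2})| = 0. \tag{A.194}
\]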

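Thus, $\{(n^{1/2}\tau_{r_2n})^{-2}\kappa_j^+ : j = r_1^*+1,\dots,d+1\}$ solve
\[
|\tau_{r_2n}^{-2}B^{+\prime}_{n,r_1^*,d+1}U_n^{+\prime}J^{+\prime}H'HJ^+U_n^+B^+_{n,r_1^*,d+1} + o_{p^\pi}(1) - \kappa(I_{d+1-r_1^*} + E_{j2})| = 0. \tag{A.195}
\]
On noting that $E_2^+ = o_{p^\pi}(1)$ and $E_3^+ = o_{p^\pi}(1)$ by (A.184), $\xi_{j1}^{-1} = O_{p^\pi}(1)$ by (A.190), and $\varrho_n = o_{p^\pi}(1)$ using (A.188), $\tau_{r_2n} \le \tau_{r_1n}$, $n^{1/2}\tau_{r_1n} \to \infty$, and $(n^{1/2}\tau_{r_1n})^{-2}\kappa_j^+ = o_{p^\pi}(1)$ for $j = r_1^*+1,\dots,d+1$ by (A.187),
\[
E_{j2} = o_{p^\pi}(1). \tag{A.196}
\]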
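The partitioned-determinant identity used to pass from (A.192) to (A.193), $|M| = |\xi_1|\,|\xi_3 - \xi_2'\xi_1^{-1}\xi_2|$ for nonsingular $\xi_1$, can be checked exactly on a small case. The sketch below is an illustration only: the $3\times3$ matrix is hypothetical, the leading block is $1\times1$ so that its inverse is a scalar division, and the helper `det` is a plain cofactor expansion written for this example. Exact rational arithmetic avoids any rounding issue.

```python
from fractions import Fraction

def det(matrix):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    n = len(matrix)
    if n == 1:
        return matrix[0][0]
    total = Fraction(0)
    for col in range(n):
        minor = [row[:col] + row[col + 1:] for row in matrix[1:]]
        total += (-1) ** col * matrix[0][col] * det(minor)
    return total

# Hypothetical symmetric matrix partitioned as [[xi1, xi2], [xi2', xi3]]
# with a 1x1 leading block xi1 and a 2x2 trailing block xi3.
xi1 = Fraction(4)
xi2 = [Fraction(1), Fraction(2)]
xi3 = [[Fraction(3), Fraction(1)], [Fraction(1), Fraction(5)]]

full = [[xi1] + xi2,
        [xi2[0]] + xi3[0],
        [xi2[1]] + xi3[1]]

# Schur complement xi3 - xi2' xi1^{-1} xi2 (xi1 is a nonzero scalar here).
schur = [[xi3[i][j] - xi2[i] * xi2[j] / xi1 for j in range(2)] for i in range(2)]

lhs = det(full)            # determinant of the full matrix
rhs = xi1 * det(schur)     # |xi1| times the determinant of the Schur complement
print(lhs, rhs)
```

The identity requires the leading block to be nonsingular, which is exactly what (A.191) secures for the roots considered in the proof.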

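Now the same argument is repeated for the subsequent groups of indices. Specifically, replace (A.181) and (A.184) by (A.195) and (A.196), and replace $j = r_1^*+1,\dots,d+1$, $E^+$, $B_n^+$, $\tau_{r_1n}$, $\tau_{r_2n}$, $r_1^*$, $d+1-r_1^*$, $h^*_{6,r_1^*}$, $B^+_{n,0,r_1^*}$, and $B^+_{n,r_1^*,d+1}$ by $j = r_2^*+1,\dots,d+1$, $E_{j2}$, $B^+_{n,r_1^*,d+1}$, $\tau_{r_2n}$, $\tau_{r_3n}$, $r_2^*-r_1^*$, $d+1-r_2^*$, $h^*_{6,r_2^*} \equiv \mathrm{diag}(1, h_{6,r_1^*+1}, h_{6,r_1^*+1}h_{6,r_1^*+2}, \dots, \prod_{l=r_1^*+1}^{r_2^*-1}h_{6,l}) \in \mathbb{R}^{(r_2^*-r_1^*)\times(r_2^*-r_1^*)}$, $B^+_{n,r_1^*,r_2^*}$, and $B^+_{n,r_2^*,d+1}$, respectively. Moreover, $E_{j2}$ is partitioned further as
\[
E_{j2} = \begin{bmatrix} E_{1j2} & E_{2j2} \\ E_{2j2}' & E_{3j2} \end{bmatrix},
\quad E_{1j2} \in \mathbb{R}^{r_2^*\times r_2^*},\ E_{2j2} \in \mathbb{R}^{r_2^*\times(d+1-r_1^*-r_2^*)},\ E_{3j2} \in \mathbb{R}^{(d+1-r_1^*-r_2^*)\times(d+1-r_1^*-r_2^*)}.
\]
The analog of (A.195) is that $\{(n^{1/2}\tau_{r_3n})^{-2}\kappa_j^+ : j = r_2^*+1,\dots,d+1\}$ solve
\[
|\tau_{r_3n}^{-2}B^{+\prime}_{n,r_2^*,d+1}U_n^{+\prime}J^{+\prime}H'HJ^+U_n^+B^+_{n,r_2^*,d+1} + o_{p^\pi}(1) - \kappa(I_{d+1-r_2^*} + E_{j3})| = 0, \tag{A.197}
\]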


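where
\begin{align*}
E_{j3} &\equiv E_{3j2} - E_{2j2}'\xi_{1j2}^{-1}\varrho_{2n} - \varrho_{2n}'\xi_{1j2}^{-1}E_{2j2} + (n^{1/2}\tau_{r_2n})^{-2}\kappa_j^+E_{2j2}'\xi_{1j2}^{-1}E_{2j2}, \\
\xi_{1j2} &\equiv \xi_{1j2}((n^{1/2}\tau_{r_2n})^{-2}\kappa_j^+) \\
&= \tau_{r_2n}^{-2}B^{+\prime}_{n,r_1^*,r_2^*}U_n^{+\prime}J^{+\prime}H'HJ^+U_n^+B^+_{n,r_1^*,r_2^*} - (n^{1/2}\tau_{r_2n})^{-2}\kappa_j^+(I_{r_2^*-r_1^*} + E_{1j2}) \\
&= h^{*2}_{6,r_2^*} + o_{p^\pi}(1) - o_{p^\pi}(1)(I_{r_2^*-r_1^*} + o_{p^\pi}(1)) \\
&= h^{*2}_{6,r_2^*} + o_{p^\pi}(1), \\
\varrho_{2n} &\equiv \tau_{r_2n}^{-2}B^{+\prime}_{n,r_1^*,r_2^*}U_n^{+\prime}J^{+\prime}H'HJ^+U_n^+B^+_{n,r_2^*,d+1}.
\end{align*}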

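Analogously to (A.185), (A.186) and (A.187), it then follows that
\begin{align*}
\kappa_j^+ &\xrightarrow{p^\pi} \infty, \quad j = r_2,\dots,r_2^*, \tag{A.198} \\
(n^{1/2}\tau_{r_2n})^{-2}\kappa_j^+ &= o_{p^\pi}(1), \quad j = r_2^*+1,\dots,d+1. \tag{A.199}
\end{align*}
Then, repeating the previous steps $G_h - 2$ more times gives (as verified below)
\begin{align*}
\kappa_j^+ &\xrightarrow{p^\pi} \infty, \quad j = 1,\dots,r^*_{G_h}, \tag{A.200} \\
(n^{1/2}\tau_{r_gn})^{-2}\kappa_j^+ &= o_{p^\pi}(1), \quad j = r_g^*+1,\dots,d+1,\ g = 1,\dots,G_h. \tag{A.201}
\end{align*}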

Since $r^*_{G_h} = q$, part (a) of the lemma follows from (A.200). Setting $g = G_h$ in (A.201) and noting that $r^*_{G_h} = q$ yield
\[
(n^{1/2}\tau_{r_{G_h}n})^{-2}\kappa_j^+ = o_{p^\pi}(1), \quad j = q+1,\dots,d+1. \tag{A.202}
\]
If $r_{G_h} = r^*_{G_h} = q$, then from (A.202), $(n^{1/2}\tau_{qn})^{-2}\kappa_j^+ = o_{p^\pi}(1)$ for $j = q+1,\dots,d+1$. If $r_{G_h} < r^*_{G_h} = q$, using $h_{6,l} > 0$ for all $l = r_{G_h},\dots,r^*_{G_h}-1$,
\[
\lim_{n\to\infty}\frac{\tau_{qn}}{\tau_{r_{G_h}n}} = \lim_{n\to\infty}\frac{\tau_{r^*_{G_h}n}}{\tau_{r_{G_h}n}} = \prod_{j=r_{G_h}}^{r^*_{G_h}-1}h_{6,j} > 0. \tag{A.203}
\]
Combining (A.202) and (A.203), $(n^{1/2}\tau_{qn})^{-2}\kappa_j^+ = o_{p^\pi}(1)$ for $j = q+1,\dots,d+1$. On noting that $\tau_{ln} \ge \tau_{qn}$ for $l \le q$, part (b) of the lemma follows. The result for all subsequences $\{w_n\}$ and $\{\lambda_{w_n,h} \in \Lambda : n \ge 1\}$ holds analogously. The formal induction proof of (A.200) and (A.201) is provided next.
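The limit in (A.203) rests on a telescoping product of consecutive singular-value ratios within one group. As a minimal sketch with hypothetical numbers (a single fixed $n$, values chosen for illustration only, helper names assumed), the product of the consecutive ratios reproduces the ratio of the last to the first singular value of the group:

```python
from functools import reduce
from operator import mul

def consecutive_ratios(tau):
    """Ratios tau[j+1] / tau[j] of a nonincreasing positive sequence."""
    return [tau[j + 1] / tau[j] for j in range(len(tau) - 1)]

def product(values):
    """Product of a list of numbers (1.0 for an empty list)."""
    return reduce(mul, values, 1.0)

# Hypothetical singular values of one group at a fixed n.
tau_group = [0.8, 0.4, 0.1]

ratios = consecutive_ratios(tau_group)   # finite-n analogues of h_{6,j}
telescoped = product(ratios)             # product of the consecutive ratios
direct = tau_group[-1] / tau_group[0]    # last over first singular value
print(ratios, telescoped, direct)
```

Because every ratio inside a group has a strictly positive limit, so does the product, which is why the first and last singular values of a group share the same order of magnitude.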

Step 2: Using induction to complete the proof. Let $o_{gp}$ denote a symmetric $(d+1-r^*_{g-1})\times(d+1-r^*_{g-1})$ matrix whose $(l,m)$ element, $l,m = 1,\dots,d+1-r^*_{g-1}$, is $o_{p^\pi}(\tau_{(r^*_{g-1}+l)n}\tau_{(r^*_{g-1}+m)n}/\tau^2_{r_gn}) + O_{p^\pi}((n^{1/2}\tau_{r_gn})^{-1})$. We have $o_{gp} = o_{p^\pi}(1)$ because $r^*_{g-1}+l \ge r_g$ for $l \ge 1$, the $\tau_{jn}$'s are nonincreasing in $j$, and $n^{1/2}\tau_{r_gn} \to \infty$ for $g = 1,\dots,G_h$. The goal is to show that $\{(n^{1/2}\tau_{r_gn})^{-2}\kappa_j^+ : j = r^*_{g-1}+1,\dots,d+1\}$ solve
\[
|\tau_{r_gn}^{-2}B^{+\prime}_{n,r^*_{g-1},d+1}U_n^{+\prime}J^{+\prime}H'HJ^+U_n^+B^+_{n,r^*_{g-1},d+1} + o_{gp} - \kappa(I_{d+1-r^*_{g-1}} + E_{jg})| = 0, \tag{A.204}
\]
for some $(d+1-r^*_{g-1})\times(d+1-r^*_{g-1})$ symmetric matrices $E_{jg} = o_{p^\pi}(1)$ and $o_{gp}$. This is shown using induction over $g = 1,\dots,G_h$. For $g = 1$, the above holds upon noting that $E_{jg} = E^+$, $o_{gp} = 0$, $r^*_{g-1} = r^*_0 \equiv 0$ and $B^+_{n,r^*_{g-1},d+1} = B^+_{n,0,d+1} = B_n^+$. Now suppose that (A.204) holds for $g$ with $E_{jg} = o_{p^\pi}(1)$ and $o_{gp}$. Proceeding similarly to the argument in (A.183),

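\begin{align*}
\tau_{r_gn}^{-1}HJ^+U_n^+B^+_{n,r^*_{g-1},d+1}
&= \tau_{r_gn}^{-1}(I_k + o_{p^\pi}(1))A_n
\begin{bmatrix}
0_{r^*_{g-1}\times(d+1-r^*_{g-1})} \\
\mathrm{diag}(\tau_{r_gn},\dots,\tau_{(d+1)n}) \\
0_{(k-d-1)\times(d+1-r^*_{g-1})}
\end{bmatrix}
+ O_{p^\pi}((n^{1/2}\tau_{r_gn})^{-1}) \tag{A.205} \\
&\xrightarrow{p^\pi} h_3
\begin{bmatrix}
0_{r^*_{g-1}\times(r_g^*-r^*_{g-1})} & 0_{r^*_{g-1}\times(d+1-r_g^*)} \\
h^*_{6,r_g^*} & 0_{(r_g^*-r^*_{g-1})\times(d+1-r_g^*)} \\
0_{(k-r_g^*)\times(r_g^*-r^*_{g-1})} & 0_{(k-r_g^*)\times(d+1-r_g^*)}
\end{bmatrix},
\end{align*}
where $h^*_{6,r_g^*} \equiv \mathrm{diag}(1, h_{6,r_g}, \dots, \prod_{l=r_g}^{r_g^*-1}h_{6,l}) \in \mathbb{R}^{(r_g^*-r^*_{g-1})\times(r_g^*-r^*_{g-1})}$, and $h^*_{6,r_g^*} \equiv 1$ for $r_g^* = 1$. From (A.205) and $h_3'h_3 = I_k = \lim_{n\to\infty}A_n'A_n$,
\[
\tau_{r_gn}^{-2}B^{+\prime}_{n,r^*_{g-1},d+1}U_n^{+\prime}J^{+\prime}H'HJ^+U_n^+B^+_{n,r^*_{g-1},d+1}
\xrightarrow{p^\pi}
\begin{bmatrix}
h^{*2}_{6,r_g^*} & 0_{(r_g^*-r^*_{g-1})\times(d+1-r_g^*)} \\
0_{(d+1-r_g^*)\times(r_g^*-r^*_{g-1})} & 0_{(d+1-r_g^*)\times(d+1-r_g^*)}
\end{bmatrix}. \tag{A.206}
\]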

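Using (A.204), which is assumed to hold in the induction step, and $o_{gp} = o_{p^\pi}(1)$, wppa 1, $\{(n^{1/2}\tau_{r_gn})^{-2}\kappa_j^+ : j = r^*_{g-1}+1,\dots,d+1\}$ solve
\[
|(I_{d+1-r^*_{g-1}} + E_{jg})^{-1}\tau_{r_gn}^{-2}B^{+\prime}_{n,r^*_{g-1},d+1}U_n^{+\prime}J^{+\prime}H'HJ^+U_n^+B^+_{n,r^*_{g-1},d+1} + o_{p^\pi}(1) - \kappa I_{d+1-r^*_{g-1}}| = 0. \tag{A.207}
\]
By the same arguments that led to (A.185)-(A.187), using (A.206), (A.207) and the induction assumption that $E_{jg} = o_{p^\pi}(1)$,
\begin{align*}
\kappa_j^+ &\xrightarrow{p^\pi} \infty, \quad j = r^*_{g-1}+1,\dots,r_g^*, \tag{A.208} \\
(n^{1/2}\tau_{r_gn})^{-2}\kappa_j^+ &= o_{p^\pi}(1), \quad j = r_g^*+1,\dots,d+1. \tag{A.209}
\end{align*}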

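Let $o^*_{gp}$ denote an $(r_g^*-r^*_{g-1})\times(d+1-r_g^*)$ matrix whose elements in column $j$, $j = 1,\dots,d+1-r_g^*$, are $o_{p^\pi}(\tau_{(r_g^*+j)n}/\tau_{r_gn}) + O_{p^\pi}((n^{1/2}\tau_{r_gn})^{-1})$. Then $o^*_{gp} = o_{p^\pi}(1)$. Next, replacing $B^+_{n,r^*_{g-1},d+1}$ in (A.205) with $B^+_{n,r_g^*,d+1}$ and $B^+_{n,r^*_{g-1},r_g^*}$, and multiplying the resulting matrices give
\begin{align*}
\varrho_{gn} &\equiv \tau_{r_gn}^{-2}B^{+\prime}_{n,r^*_{g-1},r_g^*}U_n^{+\prime}J^{+\prime}H'HJ^+U_n^+B^+_{n,r_g^*,d+1} \\
&=
\begin{bmatrix}
0_{r^*_{g-1}\times(r_g^*-r^*_{g-1})} \\
\mathrm{diag}(\tau_{(r^*_{g-1}+1)n},\dots,\tau_{r_g^*n})/\tau_{r_gn} \\
0_{(k-r_g^*)\times(r_g^*-r^*_{g-1})}
\end{bmatrix}'
A_n'(I_k + o_{p^\pi}(1))A_n
\begin{bmatrix}
0_{r_g^*\times(d+1-r_g^*)} \\
\mathrm{diag}(\tau_{(r_g^*+1)n},\dots,\tau_{(d+1)n})/\tau_{r_gn} \\
0_{(k-d-1)\times(d+1-r_g^*)}
\end{bmatrix}
+ O_{p^\pi}((n^{1/2}\tau_{r_gn})^{-1}) \\
&= o^*_{gp} \in \mathbb{R}^{(r_g^*-r^*_{g-1})\times(d+1-r_g^*)}, \tag{A.210}
\end{align*}
where $\mathrm{diag}(\tau_{(r^*_{g-1}+1)n},\dots,\tau_{r_g^*n})/\tau_{r_gn} = h^*_{6,r_g^*} + o(1)$, and the last equality uses $A_n'(I_k + o_{p^\pi}(1))A_n = I_k + o_{p^\pi}(1)$ together with the fact that the nonzero rows of the two outer matrices do not overlap.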

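Partition the $(d+1-r^*_{g-1})\times(d+1-r^*_{g-1})$ matrices $o_{gp}$ and $E_{jg}$ as
\[
o_{gp} = \begin{bmatrix} o_{1gp} & o_{2gp} \\ o_{2gp}' & o_{3gp} \end{bmatrix},
\qquad
E_{jg} = \begin{bmatrix} E_{1jg} & E_{2jg} \\ E_{2jg}' & E_{3jg} \end{bmatrix},
\]
with $o_{1gp}, E_{1jg} \in \mathbb{R}^{(r_g^*-r^*_{g-1})\times(r_g^*-r^*_{g-1})}$, $o_{2gp}, E_{2jg} \in \mathbb{R}^{(r_g^*-r^*_{g-1})\times(d+1-r_g^*)}$, and $o_{3gp}, E_{3jg} \in \mathbb{R}^{(d+1-r_g^*)\times(d+1-r_g^*)}$, for $j = r^*_{g-1}+1,\dots,d+1$ and $g = 1,\dots,G_h$. Define
\begin{align*}
\xi_{1jg}(\kappa) &\equiv \tau_{r_gn}^{-2}B^{+\prime}_{n,r^*_{g-1},r_g^*}U_n^{+\prime}J^{+\prime}H'HJ^+U_n^+B^+_{n,r^*_{g-1},r_g^*} + o_{1gp} - \kappa(I_{r_g^*-r^*_{g-1}} + E_{1jg}) \in \mathbb{R}^{(r_g^*-r^*_{g-1})\times(r_g^*-r^*_{g-1})}, \\
\xi_{2jg}(\kappa) &\equiv \varrho_{gn} + o_{2gp} - \kappa E_{2jg} \in \mathbb{R}^{(r_g^*-r^*_{g-1})\times(d+1-r_g^*)}, \\
\xi_{3jg}(\kappa) &\equiv \tau_{r_gn}^{-2}B^{+\prime}_{n,r_g^*,d+1}U_n^{+\prime}J^{+\prime}H'HJ^+U_n^+B^+_{n,r_g^*,d+1} + o_{3gp} - \kappa(I_{d+1-r_g^*} + E_{3jg}) \in \mathbb{R}^{(d+1-r_g^*)\times(d+1-r_g^*)}.
\end{align*}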

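From (A.204), $\{(n^{1/2}\tau_{r_gn})^{-2}\kappa_j^+ : j = r^*_{g-1}+1,\dots,d+1\}$ solve
\begin{align*}
0 &= \big|\tau_{r_gn}^{-2}B^{+\prime}_{n,r^*_{g-1},d+1}U_n^{+\prime}J^{+\prime}H'HJ^+U_n^+B^+_{n,r^*_{g-1},d+1} + o_{gp} - \kappa(I_{d+1-r^*_{g-1}} + E_{jg})\big| \tag{A.211} \\
&= \begin{vmatrix} \xi_{1jg}(\kappa) & \xi_{2jg}(\kappa) \\ \xi_{2jg}(\kappa)' & \xi_{3jg}(\kappa) \end{vmatrix}
= |\xi_{1jg}(\kappa)|\,|\xi_{3jg}(\kappa) - \xi_{2jg}(\kappa)'\xi_{1jg}(\kappa)^{-1}\xi_{2jg}(\kappa)| \\
&= |\xi_{1jg}(\kappa)|\,\big|\tau_{r_gn}^{-2}B^{+\prime}_{n,r_g^*,d+1}U_n^{+\prime}J^{+\prime}H'HJ^+U_n^+B^+_{n,r_g^*,d+1} + o_{3gp} - (\varrho_{gn} + o_{2gp})'\xi_{1jg}(\kappa)^{-1}(\varrho_{gn} + o_{2gp}) \\
&\qquad - \kappa\big(I_{d+1-r_g^*} + E_{3jg} - E_{2jg}'\xi_{1jg}(\kappa)^{-1}(\varrho_{gn} + o_{2gp}) - (\varrho_{gn} + o_{2gp})'\xi_{1jg}(\kappa)^{-1}E_{2jg} + \kappa E_{2jg}'\xi_{1jg}(\kappa)^{-1}E_{2jg}\big)\big|, \tag{A.212}
\end{align*}
where the second equality holds by the formula for the determinant of a partitioned matrix, provided $\xi_{1jg}((n^{1/2}\tau_{r_gn})^{-2}\kappa_j^+)$, $j = r^*_{g-1}+1,\dots,d+1$, is nonsingular wppa 1, a fact that is verified below. By the same argument as in (A.190), $E_{1jg} = o_{p^\pi}(1)$, which holds by definition, and $o_{gp} = o_{p^\pi}(1)$,
\[
\xi_{1jg} \equiv \xi_{1jg}((n^{1/2}\tau_{r_gn})^{-2}\kappa_j^+) = h^{*2}_{6,r_g^*} + o_{p^\pi}(1). \tag{A.213}
\]
Since $\lambda_{\min}(h^{*2}_{6,r_g^*}) > 0$, $\xi_{1jg}((n^{1/2}\tau_{r_gn})^{-2}\kappa_j^+)$, $j = r^*_{g-1}+1,\dots,d+1$, is nonsingular wppa 1. Therefore,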

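\begin{align*}
0 = \big|&\tau_{r_gn}^{-2}B^{+\prime}_{n,r_g^*,d+1}U_n^{+\prime}J^{+\prime}H'HJ^+U_n^+B^+_{n,r_g^*,d+1} + o_{3gp} - (\varrho_{gn} + o_{2gp})'\xi_{1jg}^{-1}(\varrho_{gn} + o_{2gp}) \\
&- (n^{1/2}\tau_{r_gn})^{-2}\kappa_j^+(I_{d+1-r_g^*} + E_{j(g+1)})\big|, \tag{A.214}
\end{align*}
where
\begin{align*}
E_{j(g+1)} \equiv{}& E_{3jg} - E_{2jg}'\xi_{1jg}^{-1}(\varrho_{gn} + o_{2gp}) - (\varrho_{gn} + o_{2gp})'\xi_{1jg}^{-1}E_{2jg} \\
&+ (n^{1/2}\tau_{r_gn})^{-2}\kappa_j^+E_{2jg}'\xi_{1jg}^{-1}E_{2jg} \in \mathbb{R}^{(d+1-r_g^*)\times(d+1-r_g^*)}. \tag{A.215}
\end{align*}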

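Furthermore,
\begin{align*}
o_{3gp} - (\varrho_{gn} + o_{2gp})'\xi_{1jg}^{-1}(\varrho_{gn} + o_{2gp})
&= o_{3gp} - (o^*_{gp} + o_{2gp})'(h^{*-2}_{6,r_g^*} + o_{p^\pi}(1))(o^*_{gp} + o_{2gp}) \\
&= o_{3gp} - o^{*\prime}_{gp}o^*_{gp} \\
&= (\tau^2_{r_{g+1}n}/\tau^2_{r_gn})\,o_{(g+1)p}, \tag{A.216}
\end{align*}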

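where

1. the first equality follows from (A.210) and (A.213);

2. the second equality uses $o_{2gp} = o^*_{gp}$. The latter holds because the $(j,m)$ element of $o_{2gp}$, $j = 1,\dots,r_g^*-r^*_{g-1}$, $m = 1,\dots,d+1-r_g^*$, is
\[
o_{p^\pi}(\tau_{(r^*_{g-1}+j)n}\tau_{(r_g^*+m)n}/\tau^2_{r_gn}) + O_{p^\pi}((n^{1/2}\tau_{r_gn})^{-1}) = o_{p^\pi}(\tau_{(r_g^*+m)n}/\tau_{r_gn}) + O_{p^\pi}((n^{1/2}\tau_{r_gn})^{-1})
\]
since $r^*_{g-1}+j \ge r_g$, and $(h^{*-2}_{6,r_g^*} + o_{p^\pi}(1))o^*_{gp} = o^*_{gp}$, which in turn holds because $h^*_{6,r_g^*}$ is diagonal and $\lambda_{\min}(h^{*2}_{6,r_g^*}) > 0$;

3. the third equality uses the fact that the $(j,m)$ element, $j,m = 1,\dots,d+1-r_g^*$, of $(\tau^2_{r_gn}/\tau^2_{r_{g+1}n})o_{3gp}$ is of order
\begin{align*}
&o_{p^\pi}(\tau_{(r_g^*+j)n}\tau_{(r_g^*+m)n}/\tau^2_{r_gn})(\tau^2_{r_gn}/\tau^2_{r_{g+1}n}) + O_{p^\pi}((n^{1/2}\tau_{r_gn})^{-1})(\tau^2_{r_gn}/\tau^2_{r_{g+1}n}) \\
&\quad = o_{p^\pi}(\tau_{(r_g^*+j)n}\tau_{(r_g^*+m)n}/\tau^2_{r_{g+1}n}) + O_{p^\pi}((n^{1/2}\tau_{r_{g+1}n})^{-1})(\tau_{r_gn}/\tau_{r_{g+1}n}),
\end{align*}
which is the same order as the $(j,m)$ element of $o_{(g+1)p}$ using $\tau_{r_gn}/\tau_{r_{g+1}n} \le 1$;

4. the third equality also uses the fact that the $(j,m)$ element, $j,m = 1,\dots,d+1-r_g^*$, of $(\tau^2_{r_gn}/\tau^2_{r_{g+1}n})o^{*\prime}_{gp}o^*_{gp}$ is the sum of two terms that are of orders
\begin{align*}
o_{p^\pi}(\tau_{(r_g^*+j)n}\tau_{(r_g^*+m)n}/\tau^2_{r_gn})(\tau^2_{r_gn}/\tau^2_{r_{g+1}n}) &= o_{p^\pi}(\tau_{(r_g^*+j)n}\tau_{(r_g^*+m)n}/\tau^2_{r_{g+1}n}), \\
O_{p^\pi}((n^{1/2}\tau_{r_gn})^{-2})(\tau^2_{r_gn}/\tau^2_{r_{g+1}n}) &= O_{p^\pi}((n^{1/2}\tau_{r_{g+1}n})^{-2}),
\end{align*}
respectively. Thus, $(\tau^2_{r_gn}/\tau^2_{r_{g+1}n})o^{*\prime}_{gp}o^*_{gp}$ is $o_{(g+1)p}$.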

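For $j = r_g^*+1,\dots,d+1$, since $E_{2jg} = o_{p^\pi}(1)$ and $E_{3jg} = o_{p^\pi}(1)$ by (A.204), $\xi_{1jg}^{-1} = O_{p^\pi}(1)$ by (A.213), $\varrho_{gn} + o_{2gp} = o_{p^\pi}(1)$ by (A.210) and $o^*_{gp} = o_{p^\pi}(1)$, and $(n^{1/2}\tau_{r_gn})^{-2}\kappa_j^+ = o_{p^\pi}(1)$ by (A.209), it follows that
\[
E_{j(g+1)} = o_{p^\pi}(1). \tag{A.217}
\]
Using (A.216) and (A.217) in (A.214),
\[
|\tau_{r_{g+1}n}^{-2}B^{+\prime}_{n,r_g^*,d+1}U_n^{+\prime}J^{+\prime}H'HJ^+U_n^+B^+_{n,r_g^*,d+1} + o_{(g+1)p} - (n^{1/2}\tau_{r_{g+1}n})^{-2}\kappa_j^+(I_{d+1-r_g^*} + E_{j(g+1)})| = 0. \tag{A.218}
\]
Thus, wppa 1, $\{(n^{1/2}\tau_{r_{g+1}n})^{-2}\kappa_j^+ : j = r_{g+1},\dots,d+1\}$ solve
\[
|\tau_{r_{g+1}n}^{-2}B^{+\prime}_{n,r_g^*,d+1}U_n^{+\prime}J^{+\prime}H'HJ^+U_n^+B^+_{n,r_g^*,d+1} + o_{(g+1)p} - \kappa(I_{d+1-r_g^*} + E_{j(g+1)})| = 0. \tag{A.219}
\]
This completes the induction step, and (A.204) holds for all $g = 1,\dots,G_h$.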

