Identification from Moment Conditions with
Singular Variance
Nicky Grant∗
Faculty of Economics
University of Cambridge
November 7, 2012
Abstract
This paper studies identification robust inference from moment conditions with
a singular variance matrix at the true parameter. An equivalence between singular
variance and identification failure is established for a wide class of moment functions.
The Generalized Anderson Rubin (AR) statistic is shown not to exist at the true
parameter with singular variance. A novel asymptotic approach is devised to provide
general asymptotic theory making no assumption on the rank of the moment variance
matrix. Conditions for which the AR statistic is asymptotically locally chi-squared
with singular variance are established. The 2-Step GMM (2S) statistic is shown to
have a non-standard distribution with singular variance. Tests of the rank of the
moment variance matrix and methods to correct the size of the confidence regions
based on the 2S statistic are provided. An extensive simulation study verifies the key
results of this paper.
Keywords: GMM, Anderson Rubin Statistic, Singular Variance, Identification.
JEL Classification: C10, C13, C58
∗Author Correspondence: Robinson College, Grange Road, Cambridge, CB3 9AN. [email protected]
1 Introduction
This paper considers identification robust inference from moment conditions with a singular variance matrix at the true parameter. The AR statistic is shown not to exist at the true parameter with singular variance. A novel generalized method for deriving asymptotic theory is developed that does not require that moments have non-singular variance.
Using this method conditions are provided under which the AR statistic has a standard
chi squared limit in a small neighborhood around the true parameter. This result is of
key importance given the equivalence between singular variance and identification failure
that is established for a class of empirically relevant moment functions.
Identification robust inference has gained increasing prominence in theoretical and applied
research. Point identified inference based on the normal approximation to the distribution
of the GMM estimator is known to be poor with identification failures (Hansen, Heaton
& Yaron (1996), Staiger & Stock (1997), Stock & Wright (2000), Newey & Windmeijer
(2009)).
Results from the identification literature suggest inference be based on methods that dispense with the strong identification assumption. Inference on some parameter formed by
inverting the AR statistic based upon a chi squared approximation provides asymptotically
valid inference under the fewest assumptions (Anderson & Rubin (1949, 1950), Zivot, Startz
& Nelson (1996), Stock & Wright (2000), Newey & Windmeijer (2009)).
One of the assumptions required for the validity of the AR method of robust inference
is that moments have non-singular variance. This is also a key assumption for other
identification robust measures for example the K-Statistic of Kleibergen (2005) and the
GEL statistic considered in Guggenberger & Smith (2005). When this assumption fails
the standard asymptotic approach breaks down with identification robust statistics shown
to not exist at the true parameter with singular variance.
This is a great concern since it is shown that moments with singular variance are a common by-product of identification failures. An equivalence result is shown for a class of moment functions that encapsulates Non-Linear Least Squares and Maximum Likelihood. As such
current identification robust methods in the literature provide inference robust only to
those identification failures that do not correspond to singular variance. Consequently the widespread relevance of results derived in this literature is limited without further investigation.
In order to derive large-sample limit theory for the AR statistic in a neighborhood of the true parameter, a novel asymptotic method is derived which is of theoretical interest in its
own right. Higher order asymptotic expansions of the eigenvalues and eigenvectors of the
sample variance matrix in a region around the true parameter are established. These
expansions prove critical in deriving asymptotic theory for key identification robust statis-
tics.
For brevity we derive asymptotic theory only for the commonly used AR and 2S statistics. Since the AR statistic does not exist at the true parameter with singular variance, a generalization of a confidence set is proposed, termed a 'Local Confidence Set': a set which contains at least a subset of points within an asymptotically negligible neighborhood around the true parameter.
The AR statistic is shown to have a standard chi-squared limit under further conditions
once the assumption moments have non-singular variance is dropped. Strikingly it is
shown that the AR statistic has a non-standard limit distribution in some instances when
strong identification occurs with singular variance. The 2S statistic with singular variance
is shown to have a highly non-standard limit distribution at the true parameter, even with
strong identification.
As such a means to test for singularity with strong identification is developed. A method to remove redundant linear combinations of moments is provided to asymptotically eradicate the problem of singular variance for the 2S statistic. A bootstrap method similar to Kleibergen (2011) is shown to deliver asymptotically valid confidence regions in a simulation.
The identification literature, and the more general literature on estimation and inference, has largely ignored the problem of moments with singular variance. An exception is Penaranda & Sentana (2010), who consider a modified GMM approach where the redundant linear combinations are known a priori. A transformed moment function removing those moments that lead to singular variance is used for estimation. The strong identification assumption is made on this transformed moment function, so standard GMM asymptotics follow.
The assumption that the form of the singularity is known a priori is untenable when providing general asymptotic theory. More problematic for estimation is when the form (if any) of the singularity at the true parameter is not known a priori. We derive asymptotic theory that does not require knowledge of potentially redundant moment conditions at the true unknown parameter.
Numerous examples of singular variance for commonly used moment functions are provided. Many arise in financial econometric models, given the highly non-linear nature of the moment functions implied by many financial economic theories. Simultaneous equations models with conditional heteroscedasticity of an unknown
form and general non-linear regression models are also shown to be moment problems
where singular moment variance may arise.
Surprisingly, the link between general identification failures and singular variance does not seem to have been made in the identification literature. This paper is a first step in understanding this important link, providing asymptotically valid methods of inference robust to both identification failures and singular variance.
Two distinct simulation experiments demonstrate the main theoretical results and implications of this paper. Interestingly, when the variance is almost singular, simulation evidence demonstrates that in certain cases standard asymptotics provide a poor approximation to the distribution of the relevant statistics used to form inference. The results here are analogous to the weak identification literature, and it would be interesting to model (a subset of) the eigenvalues of the variance matrix as local to zero. This approach, termed 'weakly singular moments', is considered in a companion paper, Grant (2012).
Section 2 recaps common methods of forming inference and those methods proposed in the literature that are robust to identification failures. Section 3 considers singular variance and its relationship with identification; examples of singular variance both with and without identification failure are provided. Section 4 sets out the novel asymptotic approach to deriving limit theory with singular variance, and asymptotic theory is established for the AR and 2S statistics. Section 5 highlights methods for testing for redundant moments and methods of providing inference robust to singularity, and Section 6 contains an extensive simulation study using two different examples that demonstrate the main results of the paper. Section 7 presents conclusions. An Appendix collects the main definitions used throughout the paper and the proofs of theorems.
2 Identification Robust Inference
In order to gain meaningful inference we have to make some assumptions about the identifying nature of a set of moment restrictions. The benchmark assumption made in the
econometrics literature is the Strong Identification Condition: a set of assumptions which guarantees (along with some further technical conditions) that the GMM and related estimators are consistent, with a Gaussian limit distribution of known form. One such regularity condition is that the variance of the moments at the true parameter is full rank.
Before we state the Strong Identification Condition we introduce the basic moment setup of the paper. Suppose we have a sample of n observations wi (i = 1, ..., n) on a data vector w, where w ∈ K and K ⊆ Rs for some s ∈ N. For simplicity we assume that this sequence wi (i = 1, ..., n) is independent across i.1
Let β be a p×1 vector of parameters lying in some compact set B ⊂ Rp. Let g(w, β) be an m×1 moment function, g(·, ·) : K × B → Rm. For all i ∈ {1, ..., n}, any n and any β ∈ B define

gi(β) := g(wi, β), Gi(β) := ∂gi(β)/∂β′, G(β) := E[(1/n)∑_{i=1}^n Gi(β)], Ĝ(β) := (1/n)∑_{i=1}^n Gi(β),
Ω(β) := E[(1/n)∑_{i=1}^n gi(β)gi(β)′], Ω := Ω(β0), Ω̂(β) := (1/n)∑_{i=1}^n gi(β)gi(β)′,
G := G(β0), Gi := Gi(β0), gi := gi(β0).
Define βn a potentially stochastic sequence where βn = β0 + ∆n for some sequence ∆n.
See the Appendix for a list of definitions of notation commonly used throughout the paper.
Let W be some (possibly data-dependent) symmetric strictly positive definite weight matrix and ĝ(β) := (1/n)∑_{i=1}^n gi(β); then the GMM estimator is defined as

β̂ := arg min_{β∈B} ĝ(β)′W ĝ(β)  (1)
where 2-Step GMM uses a weight matrix consistent for Ω^{−1}, i.e. W = Ω̂(β̃)^{−1} with Ω̂(β̃) = Ω + op(1), based on some initial consistent estimator β̃.
Strong Identification Condition (SI)
(i) E[gi(β)] = 0 uniquely at β = β0 ∈ B (Global Identification)
(ii) R(G) = p (First-Order Identification)
Under the Strong Identification Condition, along with a set of regularity conditions (see for example Newey & McFadden (1994), Theorem 3.4, p. 2148), the 2-Step GMM estimator converges to a Gaussian limit at rate n^{1/2}:

n^{1/2}(β̂ − β0) →d N(0, (G′Ω^{−1}G)^{−1})  (2)
Asymptotics for the 2-Step GMM estimator and related statistics derived under SI and these regularity conditions are often referred to as 'standard asymptotics'.
1All results of this paper can be extended to allow for dependence. In this case we would have to consider more general estimators of the variance matrix (e.g. HAC estimators), which would increase the notational complexity though add little to the fundamental ideas and theorems of the paper.
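As an illustration, the two-step procedure leading to (1)-(2) can be sketched numerically. The linear IV design, the grid minimisation, and all variable names below are illustrative assumptions rather than the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear IV design (not from the paper): y = x*beta0 + e,
# with m = 2 instruments z satisfying E[z*e] = 0.
n, beta0 = 5000, 1.0
z = rng.normal(size=(n, 2))
x = z @ np.array([1.0, 0.5]) + rng.normal(size=n)
y = x * beta0 + rng.normal(size=n)

def g_bar(b):
    """Sample moment vector g(b) = (1/n) * sum_i z_i (y_i - x_i b)."""
    return z.T @ (y - x * b) / n

def gmm_step(W, grid=np.linspace(0.0, 2.0, 2001)):
    """Minimise the GMM objective n * g(b)' W g(b) over a grid of b."""
    obj = [n * g_bar(b) @ W @ g_bar(b) for b in grid]
    return grid[int(np.argmin(obj))]

b1 = gmm_step(np.eye(2))                       # step 1: identity weight
u = y - x * b1                                 # step-1 residuals
Omega_hat = (z * u[:, None]).T @ (z * u[:, None]) / n
b2 = gmm_step(np.linalg.inv(Omega_hat))        # step 2: W consistent for Omega^{-1}
```

With strong instruments the second-step estimate b2 recovers beta0 up to sampling error, in line with (2).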
Roughly speaking, SI(i) guarantees consistency of the GMM estimator and SI(ii) the n^{1/2} convergence to a Gaussian distribution. First-Order Under-identification (i.e. R(G) < p) whilst SI(i) holds was considered for non-linear IV in Sargan (1983). In this case the non-linear IV estimator is consistent though with a non-standard limit distribution.
These results were extended to general GMM estimators in Dovonon & Renault (2009). When SI(ii) fails but SI(i) holds and there is second-order identification, the GMM estimator has a non-standard limit distribution, with convergence at rate n^{1/4} in directions outside the null column space of G. In general this distribution has an implicitly defined, complicated and impractical form.
Interestingly, the full-rank assumption on Ω is not made as an identification condition in the literature. The identification literature abounds with analyses of the implications of departures from SI for standard asymptotics, yet almost universally the assumption that Ω is full rank is maintained. When Ω is full rank the results in these papers hold.
However, as we demonstrate in Section 3, identification failures and the rank of Ω are linked: for a large class of empirically relevant moment functions, departures from SI imply that Ω is not full rank. As such the scope of the results of these papers may not be as broad as initially prescribed without further investigation.
The message from the identification literature is that when there are departures from SI, inference on β0 using standard asymptotics may be poor. As such, methods of inference are proposed that provide asymptotically valid inference on β0 without the SI conditions, namely by inverting 'identification robust' statistics (Stock & Wright (2000), Moreira (2003), Kleibergen (2005), Guggenberger & Smith (2005)).
The Anderson Rubin statistic is a commonly prescribed statistic that provides asymptotically valid inference with general identification failures under the fewest assumptions. The 2-Step GMM objective function is robust to departures from SI(ii) so long as SI(i) holds. We focus on deriving asymptotics for these two statistics.
Standard methods of deriving asymptotics are no longer adequate when Ω is not full rank. Firstly, we show that the AR statistic does not exist at β0 in this case. Secondly, we cannot resort to Taylor-type expansions of the inverse of the sample variance around β0 when Ω is singular.
As such we develop a novel method for deriving asymptotics in this case, detailed in Section 4. These general results will be useful for deriving asymptotics for the GMM (and related) estimators, other identification robust statistics and the J-Test. This would be an interesting and much-needed avenue of future research though is beyond the scope of this paper.
We now recap methods of forming identification robust statistics, and propose the notion of a Local Confidence Set to generalize the notion of a confidence set. This will prove useful for studying properties of the confidence set based on the inversion of a statistic which does not exist at β0.
2.1 Identification Robust Confidence Sets
A common method of deriving confidence sets for β0 is the inversion of some pre-specified test statistic T(β), for example the Wald statistic. Suppose that we wish to form a confidence region that (asymptotically) contains β0 with probability α, and that the statistic T(β) is such that

T(β0) →d T  (3)

where T has a known (or feasibly estimable) form.
Define the set B(c) := {β ∈ B : T(β) ≤ c} for any c > 0. Then

Pr{β0 ∈ B(c)} = Pr{T(β0) ≤ c}  (4)

Hence if (3) holds then

Pr{β0 ∈ B(c)} → Pr{T ≤ c}  (5)
Common choices of T(β) include the Wald, Lagrange Multiplier or Likelihood Ratio statistic. For example the Wald statistic TW(β) based on the 2-Step GMM estimator β̂ is

TW(β) := n(β̂ − β)′Ĝ(β̂)′Ω̂(β̂)^{−1}Ĝ(β̂)(β̂ − β)  (6)
Under the strong identification assumption (along with the assumption that Ω is non-singular) and some other regularity conditions it can be shown that these statistics satisfy (3), where T is χ²_p.
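The inversion argument in (4)-(5) can be checked in a small Monte Carlo sketch. The scalar normal model and the statistic below are hypothetical stand-ins, not the paper's setting.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy scalar model (an illustrative assumption): w_i ~ N(beta0, 1) and
# T(beta) = n*(wbar - beta)^2, so T(beta0) -> chi2_1. The inverted set is
# B(c) = {beta : T(beta) <= c}; with c the 95th percentile of chi2_1
# (about 3.841), Pr{beta0 in B(c)} should be close to 0.95 as in (4)-(5).
beta0, n, c95, reps = 0.5, 200, 3.841, 2000
hits = 0
for _ in range(reps):
    w = rng.normal(beta0, 1.0, size=n)
    T0 = n * (w.mean() - beta0) ** 2   # T evaluated at the true parameter
    hits += T0 <= c95                  # beta0 lies in B(c95) iff T0 <= c95
coverage = hits / reps
print(round(coverage, 2))
```

The empirical coverage of the inverted set should be near the nominal level for any statistic satisfying (3).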
Much recent literature has focused on conditions under which SI fails. When this is the case, this commonly used trio of statistics no longer in general has a χ²_p limit distribution (Staiger & Stock (1997), Stock & Wright (2000)).
In light of this, statistics that are robust to identification failures have been considered in the literature. A common example is the Anderson Rubin statistic TAR(β):

TAR(β) := n ĝ(β)′Ω̂(β)^{−1} ĝ(β)  (7)
Under the assumptions that Ω̂(β0) →p Ω where Ω is full rank and √n ĝ(β0) →d N(0, Ω), then Ω̂(β0)^{−1} →p Ω^{−1} by the Continuous Mapping Theorem, hence by standard arguments

TAR(β0) →d χ²_m  (8)
This result makes no assumption about identification; as such, inference formed by inverting the AR statistic using the χ²_m approximation is asymptotically valid even when SI fails (Stock & Wright (2000)). However, for this result to hold Ω must be non-singular.
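The role of the non-singularity of Ω can be seen in a minimal numerical sketch (the moment designs below are hypothetical): with distinct moments TAR(β0) is well defined, while a redundant moment makes the sample variance rank deficient, so the inverse required by (7) is numerically undefined.

```python
import numpy as np

rng = np.random.default_rng(2)

def t_ar(g):
    """AR statistic n * gbar' Omega_hat^{-1} gbar from an (n x m) matrix of g_i(beta0)."""
    n = g.shape[0]
    gbar = g.mean(axis=0)
    Omega_hat = g.T @ g / n
    return n * gbar @ np.linalg.solve(Omega_hat, gbar)

n = 500
u = rng.normal(size=n)
z = rng.normal(size=n)

# Non-singular Omega: two genuinely distinct moments, TAR(beta0) approx chi2_2.
g_ok = np.column_stack([u, z * u])
tar = t_ar(g_ok)

# Singular Omega: the second moment duplicates the first, so Omega_hat is
# rank 1 and the inverse needed by (7) is numerically undefined.
g_sing = np.column_stack([u, u])
cond = np.linalg.cond(g_sing.T @ g_sing / n)
print(tar >= 0, cond > 1e10)
```

The condition number of the rank-deficient sample variance is effectively infinite, which is the numerical counterpart of the non-existence result for TAR(β0).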
Many other 'identification robust' statistics have been developed in the identification literature, for example the K-Statistic (Kleibergen (2005)). However, such statistics require further assumptions on the limit distribution of the first-order derivative matrix. As such the AR statistic holds under the weakest set of assumptions and is the one considered in this paper, though the implications of singular variance for standard asymptotics will carry over to the K-Statistic and many other identification robust statistics.
Another statistic is the 2-Step GMM objective function T2S(β), based on some initial GMM estimator β̃:

T2S(β) := n ĝ(β)′Ω̂(β̃)^{−1} ĝ(β)  (9)

Then under the same conditions as for AR (along with β̃ →p β0),

T2S(β0) →d χ²_m  (10)
A sufficient condition for consistency of β̃ is SI(i). However, no assumption is required on the first-order derivative matrix G. This is not the case for the Wald statistic, which involves the inverse of a function of G.
When Ω is not full rank we show that T2S(β0) does not have a standard chi-squared limit. This holds irrespective of whether the Strong Identification Condition holds.
When Ω is singular we show that TAR(β0) does not exist. However, under further conditions on the first-order derivative G(β) around β0, we show that TAR(β) is locally chi-squared, namely that TAR(βn) →d χ²_m where ||∆n|| = o(n^{−1/2}).
One surprising condition is that δ′Ω = 0 for δ ∈ Rm implies δ′G = 0; hence the null space of Ω lies within the null column space of G. In just-identified models this means that strongly identified moments are ruled out with singular variance. A simulation demonstrates the non-standard distribution of the AR statistic in a neighborhood around β0 in this case.
In order to establish this result we provide a novel asymptotic method discussed in Section 4. Since we show in Theorem 1 that TAR(β) does not exist at β0, the usual argument for inverting statistics to form confidence regions breaks down. As such we slightly generalize the notion of a confidence set to a Local Confidence Set, defined below. A Local Confidence Set covers a subset of points within a sufficiently small neighborhood around β0.
2.1.1 Local Confidence Sets
We think more generally of providing confidence regions around some point local to β0. Define a set of Local True Parameters as some ball around β0, namely B0(εn) := B(β0, εn) (i.e. all β ∈ B s.t. ||β − β0|| ≤ εn). A Local Confidence Set with size α is a set that contains some point (or subset of points) of the Local True Parameter Set with probability α, namely some set C(α) such that ∃β ∈ B0(εn) with

Pr{β ∈ C(α)} → α  (11)

This idea generalizes the notion of a confidence set. If εn = 0 then B0(εn) = {β0} and (11) is equivalent to the usual definition of a confidence set.
How small a region around β0 our confidence set covers asymptotically depends upon the properties of T(β).
Take a sequence βn where ∆n = O(n^{−δ}); then for TAR(βn) with strong identification and Ω non-singular it is straightforward to show

TAR(βn) = TAR(β0) + Op(n^{1/2−δ})  (12)

where TAR(β0) →d χ²_m. Define cα as the α percentile of χ²_m. For δ > 1/2, B(cα) contains all points within an O(n^{−δ}) neighborhood around β0 with probability α (asymptotically); hence B(cα) is also a Local Confidence Set for any such neighborhood. When δ < 1/2, TAR(βn) diverges and hence B(cα) does not contain βn w.p.a.1. When δ = 1/2, βn in general lies in B(cα) with probability not equal to α asymptotically.
In Section 4 we consider the AR and 2S statistics where Ω is singular. In this case Theorem 1 establishes that TAR(β0) does not exist. Hence when deriving the limit of TAR(βn) for some βn sufficiently close to β0 we cannot use standard Taylor-type expansions. Instead we provide second-order expansions of the sample eigenvectors and eigenvalues of Ω̂(βn) around β0. To our knowledge this is a new approach in the literature.
We show that in general the AR statistic in a neighborhood around β0 will have a χ²_m limit, under conditions further to those made in the literature, when Ω is not full rank.
We also consider the limit of T2S(β0). In this case Ω̂(β̃)^{−1} exists with probability 1 for all n. However, we show that in general the limit distribution is highly non-standard even with strong identification.
3 Identification & Singular Variance
The assumption that Ω is full rank is amongst the regularity conditions for various identification robust statistics to possess a χ²_m limit. This assumption is made (at least implicitly) in all papers regarding identification. Yet for a large class of moment functions we establish that departures from SI lead to Ω being singular.
This is concerning, as this literature in general advocates the use of identification robust methods that do not require the SI condition to hold. However, when either of the SI conditions fails, often so does the condition that Ω is full rank. Hence, as it stands, the results in the literature that are robust to identification are robust only to those identification failures which do not lead to singular moment variance.
Methods of testing a parametric restriction where it is known a priori that certain parameters are unidentified under the null were considered by Hansen (1996). In this case there are unidentified parameters under the null, and as a result the unrestricted GMM estimator will have singular variance.
This problem for testing a specific parameter restriction is side-stepped in Hansen (1996)
by essentially estimating those identified (linear combinations) parameters as a function of
the unidentified parameter. The Wald Statistic is evaluated across (a subset of) the whole
parameter space for the unidentified parameters. A test statistic that essentially integrates
across the unidentified parameters is devised to test the particular null hypothesis. This
removes the unidentified parameters and as such the issue of singular variance is eradicated.
Andrews (1987) provides other examples of singular variance for Wald Type tests of par-
ticular parameter restrictions. Conditions are provided in which replacing the inverse of
the sample variance with a g-inverse has a chi squared limit. In general these conditions
are not satisfied by identification robust statistics.
In spite of this knowledge, the issue of singular variance caused by identification failures has gone unmentioned in the identification literature on forming identification robust confidence sets, namely when we do not know a priori whether β0 is strongly identified or not, and/or whether there is singular variance at β0.
The general adage in the identification literature is that the fewer assumptions we make on identification, the more 'robust' our inference. However, when one assumption is dropped one must pay attention to the interplay with other assumptions: dropping one assumption may significantly (or possibly completely) compromise the validity of the remaining assumptions.
3.1 Relationship Between Ω and G
In many instances G and Ω are linked. For example, casting Maximum Likelihood as a GMM estimator, G = Ω via the information equality. In such a case singular moment variance is equivalent to first-order under-identification. The asymptotic distribution of the Maximum Likelihood estimator and Likelihood Ratio statistic in the case where R(Ω) = p − 1 is provided by Rotnitzky, Cox, Bottai and Robins (2000).
More generally, Dovonon & Renault (2009) consider the GMM estimator with first-order under-identification while SI(i) holds. The limit distribution of the GMM estimator is not provided, though it is shown to be non-standard with convergence at rate n^{1/4} in directions outside the null space of G. The limit distribution of the J-Test is derived when R(G) = p − 1, maintaining the assumption that Ω is full rank.
GMM is consistent though with a non-standard distribution in this case. The limit dis-
tributions in general are impractical and it is not clear how to perform valid inference in
this case. No general limit theory emerges and some knowledge on the form of the under-
identification would be required. More research needs to be carried out on the properties
of GMM estimators with under-identification.
In order to perform valid inference we need to resort to identification robust statistics. However, as we now show, first-order under-identification is one common cause of singular variance. As such we need general theory on identification robust statistics that does not maintain the non-singularity assumption on Ω.
We establish the link between identification failure and singular variance for moment functions derived from conditional moments. This encompasses a wide class of empirically used moment functions. More general relationships likely hold between identification and singular variance and warrant further analysis. Examples are provided in Section 3.2.
3.1.1 Conditional Moments
A large class of moment functions considered both theoretically and in practice are derived from some system of conditional expectations. For example, consider some potentially non-linear residual2 ρi(β) := v(wi, β), where v(·, ·) : K × B → R, with

E[ρi(β)|xi] = 0  (13)

uniquely at β = β0, for some p×1 vector xi.
Define σ²i(xi) := E[ρi(β0)²|xi]; then the optimal instrument f(xi) in the i.i.d. case is

f(xi) = E[∂ρi(β0)/∂β|xi]/σ²i(xi)  (14)
Take the simple case where ∂ρi(β0)/∂β is a function only of xi, i.e.

E[∂ρi(β0)/∂β|xi] = ∂ρi(β0)/∂β  (15)

For example this holds for the class of Non-Linear Least Squares models.
A simple example is Ordinary Least Squares where wi = (yi, xi) for some scalar random
variable yi where ρi(β) = yi − x′iβ so that ∂ρi(β)/∂β = xi.
In order to use the optimal instrument f(xi) we often have to make some assumptions on the form of the conditional heteroscedasticity, or use some first-stage estimate on which to base inference. Whether we use ∂ρi(β)/∂β as our instrument or ∂ρi(β)/∂β/σ²i(xi) does not alter the following relationship between Ω and G. When σ²i(xi) is constant, ∂ρi(β)/∂β is the optimal instrument, and it is often used as an instrument given that the form of the conditional heteroscedasticity is unknown.
The moment function is then

gi(β) = ρi(β)∂ρi(β)/∂β  (16)
For this class of moment functions, Ω and G are

Ω = E[ρi(β0)² ∂ρi(β0)/∂β ∂ρi(β0)/∂β′]  (17)

G = E[∂ρi(β0)/∂β ∂ρi(β0)/∂β′]  (18)
Hence for any δ ∈ Rp, δ′Ωδ = 0 implies that

E[ρi(β0)²(δ′∂ρi(β0)/∂β)²] = 0  (19)
2For simplicity we consider a scalar residual function. Similar results are easily established for the
scenario of multiple residual functions
and hence δ′∂ρi(β0)/∂β = 0 a.s.(xi). Therefore δ′Gδ = E[(δ′∂ρi(β0)/∂β)²] = 0. The reverse is also simple to establish, so that R(Ω) = R(G): first-order under-identification and singular variance are equivalent.
If we use the optimal instrument then G remains the same, though now Ω = E[∂ρi(β0)/∂β ∂ρi(β0)/∂β′] = G.
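A quick numerical check of the equivalence R(Ω) = R(G) under an assumed degenerate design; the data-generating process and all names below are illustrative, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative NLS-type moment g_i = rho_i * d_i with d_i := d rho_i / d beta.
# The two components of d_i coincide by construction, so G = E[d d'] and
# Omega = E[rho^2 d d'] lose rank together, as in (17)-(18).
n = 2000
x = rng.normal(size=n)
rho = rng.normal(size=n) * (1 + 0.5 * np.abs(x))   # heteroscedastic residual
d = np.column_stack([x, x])                        # collinear derivative vector

G_hat = d.T @ d / n                                # sample analogue of (18)
Omega_hat = (d * (rho ** 2)[:, None]).T @ d / n    # sample analogue of (17)
print(np.linalg.matrix_rank(G_hat), np.linalg.matrix_rank(Omega_hat))  # both 1
```

Both sample matrices have rank 1, matching the equivalence result: the same direction δ annihilates both Ω and G.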
The equivalence has thus been shown for classes of moment functions encompassing Non-Linear Least Squares and Maximum Likelihood. A similar link between singular variance and identification failure likely holds for a broader class of moment functions; this is an interesting area for future research.
3.2 Examples of Singular Variance
We provide some examples of moment functions with singular variance both with and without identification failures. Further examples are introduced and studied in the simulations of Section 6.
3.2.1 Singular Variance with Identification Failure
There are many examples where identification failures lead to singular variance. In essence, many identification failures are linked to some linear combination of moments being redundant at β0. Take for example a class of stochastic semi-linear parametric equations

yi = α′xi + πf(xi, γ) + εi  (20)
Define β = (α, π, γ) where α ∈ Rp, π ∈ R and γ ∈ Rl for some l > 0, yi is a scalar random variable and xi is a p×1 vector (both for simplicity assumed i.i.d. sequences of random variables) such that E[εi|xi] = 0 and E[ε²i|xi] = σ² at β = β0 for some parameter vector β0 = (α0, π0, γ0), where f(·, ·) : Rp × Rl → R is a continuously differentiable function. Define εi(β) := yi − α′xi − πf(xi, γ). Then
∂εi(β)/∂β = (x′i, f(xi, γ), π ∂f(xi, γ)/∂γ′)′  (21)
Then the moment function utilized in Non-Linear Least Squares is

gi(β) = εi(β)(x′i, f(xi, γ), π ∂f(xi, γ)/∂γ′)′  (22)
This is a special case of the class of models considered in Hansen (1996). If we wish to test for linearity, namely that π0 = 0, then γ0 is unidentified. Hence the GMM estimate of γ0 will be inconsistent with a random limit distribution, and the Wald statistic (6) will then have a non-standard distribution. Methods to form a test of π0 = 0 that in general deliver conservative inference are provided in Hansen (1996).
Another way to view the problem is not through the direct consequences for the limit distribution of the GMM estimator, but through the moment variance at the true parameter:
Ω = σ²E[hih′i], where hi := (x′i, f(xi, γ0), π0 ∂f(xi, γ0)/∂γ′)′
When π0 = 0, R(Ω) = 2, and it is simple to show that R(G) = 2, as we would expect from the results in Section 3.1.1. In this case the moment function is also globally unidentified for γ0. In this example unidentified parameters under the null coincide with singular variance, a result that is likely to hold in general. Hence when forming 'identification robust' inference by inverting for example the AR statistic, when π0 = 0 then Ω is singular. Though standard asymptotics for AR place no assumptions on G, they do on the rank of Ω. This highlights the crucial issue for forming identification robust inference: the assumption that Ω is full rank is in most realistic cases untenable with identification failures.
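A numerical sketch of this example, under illustrative parameter values (not estimates from the paper), confirms the rank deficiency of Ω when π0 = 0.

```python
import numpy as np

rng = np.random.default_rng(4)

# Semi-linear example with scalar x and f(x, gamma) = 1/(1 + gamma*x)
# (all parameter values are illustrative assumptions). At pi0 = 0 the third
# NLS moment eps_i * pi0 * df/dgamma is identically zero, so the 3x3 sample
# variance has rank 2 rather than full rank 3.
n, alpha0, pi0, gamma0 = 3000, 1.0, 0.0, 0.5
x = rng.uniform(0.1, 1.0, size=n)
eps = rng.normal(size=n)
y = alpha0 * x + pi0 / (1 + gamma0 * x) + eps

e = y - alpha0 * x - pi0 / (1 + gamma0 * x)       # residual at the true parameter
dfdg = -x / (1 + gamma0 * x) ** 2                 # df/dgamma
g = e[:, None] * np.column_stack([x, 1 / (1 + gamma0 * x), pi0 * dfdg])
Omega_hat = g.T @ g / n
print(np.linalg.matrix_rank(Omega_hat))           # 2
```

The redundant direction is exactly the one corresponding to the unidentified parameter γ0 under the null.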
Another possibility is that π0 ≠ 0 though δ′1∂f(xi, γ0)/∂γ = δ′2xi for some δ1 ∈ Rl and δ2 ∈ Rp with ||δ1|| > 0 and/or ||δ2|| > 0. In this case the moment function is globally identified though first-order under-identified. Ω will also be singular since, for Non-Linear Least Squares (NLS) moment functions, we have seen that R(Ω) = R(G).
For example consider the case where α is scalar and f(xi, γ) = 1/(1 + γxi). Then

∂εi(β0)/∂β = (xi, 1/(1 + γ0xi), π0xi/(1 + γ0xi)²)′  (23)
When γ0 = 0 the first and third elements of (23) are proportional, though the model will be second-order identified with global identification. In this case NLS is consistent though with a non-standard limit distribution.
Many examples of parametric semi-linear regression with potential identification failure are commonly used, and many arise in the financial econometrics literature, where highly non-linear moment conditions are often used to identify some true model. A classic example is testing the null of no threshold effect in Smooth Transition Autoregressive Models. Methods to test such null hypotheses have been considered in the literature where it is known a priori that some parameters are unidentified under the null.
Another common example is the Heckman Selection Model. If the selection bias term contains regressors from the first-stage equation and is linear in the regressors at the true parameter then GMM is first-order under-identified (and possibly globally unidentified). Hence the GMM estimator will have a non-standard distribution, and the moments will have singular variance.
Parametric semi-linear models are one class of models where the link between identifica-
tion failure and singular variance is straightforward to see. However this class is by no
means exhaustive: the relationship between identification failure and singular variance is
much wider. For example consider the non-linear moments derived using the test-function
approach for interest rate diffusion models. Let rt be the real interest rate where
rt − rt−1 = a(b − rt−1) + σ rt^γ εt (24)
Define β = (a, b, σ, γ) and β0 = (a0, b0, σ0, γ0), and assume εt is stationary at β = β0.
Using the test-function approach of Hansen & Scheinkman (1995), the following moment
functions are derived in Jagannathan & Wang (2002):
gt(β) = ( a(b − rt)rt^{−2γ} − γσ^2 rt^{−1},
          a(b − rt)rt^{−2γ+1} − (γ − 1/2)σ^2,
          (b − rt)rt^{−a} − (1/2)σ^2 rt^{2γ−a−1},
          a(b − rt)rt^{−σ} − (1/2)σ^3 rt^{2γ−σ−1} )′ (25)
which has expectation zero at β = β0. If σ0 = a0 the fourth moment function equals a0
times the third, so the moments are linearly dependent. It is straightforward to show that
in this case R(Ω) = R(G) = 3, and hence the model is both first order under-identified
and has singular variance.
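The linear dependence at σ0 = a0 can be checked numerically. The sketch below is our own verification (parameter values and grid of rates are arbitrary choices): once σ = a is imposed, the fourth moment function equals a times the third at every value of r.

```python
import numpy as np

rng = np.random.default_rng(1)
b, gamma = 0.08, 0.6                   # arbitrary parameter values (our choice)
a = sigma = 0.5                        # impose sigma0 = a0
r = rng.uniform(0.01, 0.2, size=50)    # hypothetical interest-rate values

# Third and fourth moment functions from the display above
g3 = (b - r) * r**(-a) - 0.5 * sigma**2 * r**(2 * gamma - a - 1)
g4 = a * (b - r) * r**(-sigma) - 0.5 * sigma**3 * r**(2 * gamma - sigma - 1)

max_gap = np.max(np.abs(g4 - a * g3))  # ~0: fourth moment = a * third moment
```

Since one moment is an exact scalar multiple of another, the moment variance matrix has rank at most 3.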
3.2.2 Singular Variance with Strong Identification
Most common causes of singular variance arise from a lack of identification. However it
is possible for singular variance to arise where the moment functions strongly identify
the true parameter, for example the asset pricing moment equations in Penaranda &
Sentana (2009).
They show that in a set of over-identifying equations one linear combination of moments
is redundant at the true parameter. They develop asymptotic theory for the case where
the redundant combinations of moments at different parameter values are known a priori.
In general we do not know whether moments have a singular variance at the unknown
true parameter. We provide an example where singular moments arise when the moment
functions strongly identify β0. One such example is from linear systems of simultaneous
equations with conditional heteroscedasticity. Consider a simple linear system of equations
ε1i(β1) = y1i − β1xi (26)
ε2i(β2) = y2i − β2xi (27)
Under the assumption that

E[(ε1i(β1), ε2i(β2))′|zi] = 0 (28)

uniquely at some β10 ∈ R, β20 ∈ R, where zi = (z1i, z2i), any function of z1i and z2i is a
valid instrument. For simplicity consider the case where the following moment function
is used to identify β0 = (β10, β20).
The conditional moment restriction (28) implies that the moment function

gi(β) = (ε1i(β1)z1i, ε2i(β2)z2i)′ (29)

satisfies E[gi(β0)] = 0. (This zero may not be unique; however it is often implicitly
assumed in applied work that it is.)
In general the residual εji(βj) interacted with z_{li}^m for j = 1, 2, l = 1, 2 and various
polynomial orders m is a valid moment condition. We consider the following simple
example of how singular moments can arise from such moment functions.
Suppose ε1i(β10) = σ1εiz2i and ε2i(β20) = σ2εiz1i for some σ1 ≠ 0 and σ2 ≠ 0, where
E[εi|zi] = 0 and E[εi^2|zi] = 1.
If (ε1i, ε2i, xi, z1i, z2i) is i.i.d then

Ω = E[z1i^2 z2i^2] ( σ1^2    σ1σ2
                     σ1σ2    σ2^2 )   (30)
Hence R(Ω) = 1 and Ω is singular. G may or may not be full rank: it is not affected by
the conditional heteroscedasticity that makes Ω singular.

G = ( −E[xiz1i]    0
      0           −E[xiz2i] )   (31)
G is full rank so long as E[xiz1i] ≠ 0 and E[xiz2i] ≠ 0; in other words, so long as the
instruments are not weak.
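A quick simulation makes this concrete. The sketch below is our own illustration (the distributions of the instruments and of the regressor are arbitrary choices consistent with the error specification above): the sample moment variance is exactly rank one, while the estimated Jacobian remains full rank.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
sigma1, sigma2 = 1.0, 2.0
z1 = rng.normal(loc=2.0, size=n)
z2 = rng.normal(loc=2.0, size=n)
eps = rng.normal(size=n)                 # single shock with E[eps|z] = 0
e1 = sigma1 * eps * z2                   # eps_{1i} = sigma_1 eps_i z_{2i}
e2 = sigma2 * eps * z1                   # eps_{2i} = sigma_2 eps_i z_{1i}
x = z1 + z2 + rng.normal(size=n)         # a regressor correlated with both instruments

g = np.column_stack([e1 * z1, e2 * z2])  # g_i = (eps_1 z_1, eps_2 z_2)'
Omega_hat = g.T @ g / n
eigvals = np.linalg.eigvalsh(Omega_hat)  # ascending order
G_hat = np.array([[-np.mean(x * z1), 0.0],
                  [0.0, -np.mean(x * z2)]])

rank_Omega = np.linalg.matrix_rank(Omega_hat)
rank_G = np.linalg.matrix_rank(G_hat)
```

Here every g_i is proportional to (σ1, σ2)′, so the sample variance is rank one in finite samples, not merely in the limit.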
Many instances of such models arise. For example the C-CAPM implies that a set of
non-linear residuals be mean independent of all variables known in the previous period
and before; examples of (26)-(27) arise for Conditional CAPM models, Cochrane (1996).
Though the example here is somewhat pathological, when large systems of equations are
interacted with polynomial functions of a potentially infinite set of instruments it is
entirely plausible that Ω is singular or near-singular.
In sum, a singular moment variance at or near the true parameter is a real possibility
both with and without identification failures. As such the assumption that Ω is
non-singular cannot be viewed as a relatively innocuous regularity condition.
4 Asymptotic Theory for Identification Robust Statistics
with Singular Variance
This section derives the asymptotic theory for the Anderson Rubin and 2-Step GMM
Statistics.
Firstly we lay out some definitions for the eigenspaces of the functional matrix Ω(β)
and its sample analogue Ω̂(β). By construction both matrices are positive semi-definite
and symmetric, hence the following decompositions exist for all β. Let P(β) be the m×m
matrix of population eigenvectors, where

Ω(β) = P(β)Λ(β)P(β)′ = P1(β)Λ1(β)P1(β)′ + P2(β)Λ2(β)P2(β)′ (32)

with P(β) := (P1(β), P2(β)), P(β)′P(β) = Im and Λ2(β) = 0. Here dim(Λ1(β)) = m∗(β)
and dim(Λ2(β)) = m̄(β), where Λ2(β) is an m̄(β)×m̄(β) matrix of zeroes and
m = m∗(β) + m̄(β). Hence Ω(β) is full rank iff m̄(β) = 0, and
Ω(β) = Σ_{j=1}^{m∗(β)} λ1j(β)P1j(β)P1j(β)′, where λ1j(β) = [Λ1(β)]jj and
P1(β) := (P11(β), .., P1m∗(β)) with P1j(β)′P1j(β) = 1. Similarly λ2j(β) = 0 for
j ∈ 1, .., m̄(β) with eigenvector P2j(β), where P2(β) := (P21(β), .., P2m̄(β)) and
P2j(β)′P2j(β) = 1 for all j = 1, .., m̄(β). When referring to the eigenvalues as a whole
we write (λ1(β), .., λm(β)), where the first m∗(β) are the non-zero eigenvalues and the
final m̄(β) equal zero.
We perform a similar decomposition for the sample analogue, writing hats for sample
quantities:

Ω̂(β) = P̂1(β)Λ̂1(β)P̂1(β)′ + P̂2(β)Λ̂2(β)P̂2(β)′ (33)

where P̂(β) = (P̂1(β), P̂2(β)), P̂(β)′P̂(β) = Im, dim(Λ̂1(β)) = m∗(β) and
dim(Λ̂2(β)) = m̄(β). Λ̂2(β) is an m̄(β)×m̄(β) diagonal matrix with the sample estimates
of the population zero eigenvalues on the diagonal, and Λ̂1(β) is an m∗(β)×m∗(β)
diagonal matrix with the sample estimates of the non-zero eigenvalues on the diagonal.
P̂1(β) = (P̂11(β), .., P̂1m∗(β)) and P̂2(β) = (P̂21(β), .., P̂2m̄(β)) are the corresponding
sample eigenvectors.
We are interested in Ω := Ω(β0). For notational simplicity we define P1 := P1(β0),
P2 := P2(β0), Λ1 := Λ1(β0) and Λ2 := Λ2(β0) (and hence m̄ := m̄(β0), m∗ := m∗(β0) as
the respective sizes of Λ2 and Λ1), where P1 := (P11, .., P1m∗), P2 := (P21, .., P2m̄),
and for any j, [Λ1]jj := λ1j and [Λ2]jj := λ2j.
We firstly show that when Ω has m̄ zero eigenvalues (i.e. has rank m − m̄) then the m̄
smallest eigenvalues of Ω̂(β0) are also equal to zero, with corresponding eigenvectors
P̂2(β0) = P2 w.p.1. In this case Ω̂(β0)^{-1} does not exist; this is the cause of the
breakdown in standard asymptotic analysis.
Theorem 1 (T1)
When R(Ω) = m − m̄ with m̄ > 0, Ω̂(β0) is singular for all n and the following hold
w.p.1.

(i) Λ̂2(β0) = 0 (34)
(ii) P̂2(β0) = P2 (35)
T1 shows that TAR(β) does not exist at β = β0.
In light of T1 we evaluate TAR(β) at a sequence βn = β0 + Δn, where n^κ Δn →p Δ for a
bounded non-stochastic p × 1 vector Δ and some κ > 1/2. We derive the limit distribution
of TAR(βn) and provide conditions under which it has a χ2_m limit. Under these further
conditions, identification robust methods based on AR considered in Stock & Wright
(2000) and others will provide valid Local Confidence Sets, containing all such sequences
βn asymptotically with the stated probability.
Using the eigen-decomposition of Ω̂(β) we can express

TAR(βn) = ng(βn)′P̂(βn)Λ̂(βn)^{-1}P̂(βn)′g(βn) (36)
= ng(βn)′P̂1(βn)Λ̂1(βn)^{-1}P̂1(βn)′g(βn) + ng(βn)′P̂2(βn)Λ̂2(βn)^{-1}P̂2(βn)′g(βn) (37)
Higher order expansions of Λ̂2(βn) and P̂2(βn) around β0 are derived. When m̄ > 0 (i.e.
Ω is singular) then Λ̂2(β0) = 0, so higher order terms in the expansion of the eigenvalues
Λ̂2(βn) around β0 enter the first-order asymptotics of TAR(βn). Remarkably, under some
further conditions on G the AR statistic retains a standard chi squared limit locally
around β0.
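The decomposition in (36)-(37) is simply the spectral form of the quadratic statistic. The sketch below (our own toy data, not the paper's design) checks that summing the per-eigenvector contributions reproduces the directly inverted statistic when the sample variance is full rank:

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 500, 3
gi = rng.normal(size=(n, m)) * np.array([1.0, 0.5, 0.1])  # toy moment draws
gbar = gi.mean(axis=0)
Omega_hat = gi.T @ gi / n

# Direct form of the AR statistic
T_direct = n * gbar @ np.linalg.inv(Omega_hat) @ gbar

# Spectral form: Omega_hat = P diag(lam) P'
lam, P = np.linalg.eigh(Omega_hat)
s = P.T @ gbar
T_eigen = n * np.sum(s**2 / lam)   # sum of per-eigenvector contributions

gap = abs(T_direct - T_eigen)
```

The spectral form remains meaningful componentwise even when some eigenvalues degenerate, which is what the higher order expansions exploit.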
Using the expansions of the sample eigenvalues/eigenvectors around β0, we show under
certain conditions that

n^{κ+1/2} P2′g(βn) →d N(0, D) (38)
n^{2κ} Λ̂2(βn) →p D (39)

for some full rank non-stochastic matrix D. Hence

ng(βn)′P̂2(βn)Λ̂2(βn)^{-1}P̂2(βn)′g(βn) →d χ2_{m̄} (40)

So TAR(β) is locally chi-squared in a δ-neighborhood around β0.
Matters are more involved for T2S(β0). We consider the case where the initial estimator
β̂ is strongly consistent; here βn = β̂ for some initial GMM estimator β̂, and
n^{1/2}Δn = n^{1/2}(β̂ − β0) has a Gaussian limit distribution. In this case it is shown
that nΛ̂2(β̂) has a random limit, which is the cause of the breakdown of standard
asymptotics for 2S-GMM. As such the limit distribution of T2S(β0) depends on the
distribution of β̂. Asymptotics for T2S(β0) under identification failure are beyond the
scope of this paper and may prove impractical in general.
4.1 Asymptotic Expansions of Perturbations of Ω(β) around β0
Singularity of Ω poses a theoretical challenge for deriving asymptotic expansions of
quadratic objective functions that depend on the inverse of a sample estimate of Ω.
Essentially we wish to expand Ω̂(βn)^{-1}; however Ω̂(β0)^{-1} does not exist, so we
cannot use the Taylor-type expansions commonly used to derive higher order expansions of
test statistics.
Instead we derive expansions of both the AR and 2S statistics by providing expansions
for the eigenvectors and eigenvalues of Ω̂(βn) around β0. This approach is to the best of
our knowledge new in the identification literature and is of theoretical interest in its
own right. We expand λ̂j(βn) and P̂j(βn) around β0 for j = 1, ..,m. These results prove
crucial in deriving the limit distributions of the 2-Step and AR objective functions
along with tests of singularity, and should provide a useful framework for deriving
asymptotic theory for moment type estimators, tests of identification and other
statistics based on moments with singular variance.
Assumption 1 (A1) (i) wi (i = 1, .., n) independent ∀ i, n; (ii) minj λ1j > c for some
c > 0; (iii) ||Ω̂(β0) − Ω|| = Op(n^{-1/2}); (iv) ||(1/n)Σ_{i=1}^n (Gi(β∗) − Gi(β))|| ≤
M||β∗ − β|| ∀β, β∗ ∈ B where M = Op(1); (v) ||Ω̂(β∗) − Ω̂(β)|| ≤ M||β∗ − β||
∀β, β∗ ∈ B where M = Op(1); (vi) m̄ < ∞; (vii) ||Ω̂(β0)|| = Op(1); (viii)
||Ĝ(β0)|| = Op(1).
A1(i) could be dropped to allow dependence, though we would then need more general
estimators of Ω, for example HAC. Extensions to this case are relatively straightforward
under further regularity conditions and would do little to change the main results of
this paper. A1(ii) is relatively innocuous and maintains that the non-zero eigenvalues
are bounded away from zero for all n. A1(iii) is stronger than necessary and assumes the
sample variance Ω̂(β0) converges at rate n^{-1/2}; the rate could be weakened without
fundamentally altering the main results. A1(iv), (v) are continuity assumptions on Ω̂(β)
and Ĝ(β). A1(vii), (viii) assume that the sample variance and first order derivative at
β0 are asymptotically bounded. For simplicity A1(vi) takes m̄ as fixed; extensions to m̄
growing with the sample size would be relatively straightforward. A strong set of
sufficient conditions for A1 is that wi is i.i.d and Gi(β) and gi(β) are continuous and
bounded in a neighborhood of β0 for all i, n.
When Ω is singular, T1 establishes that λ̂2j(β0) = 0.³ As such, higher order terms in the
expansions of λ̂2j(βn) and P̂2j(βn) around β0 enter the first order asymptotics of both
the AR and 2S statistics at β0.
Theorem 2 derives asymptotic expansions for the sample eigenvectors and eigenvalues of
Ω̂(βn) around β0. Higher order expansions are provided only for λ̂2j(βn) and P̂2j(βn);
the higher order terms in the expansions of λ̂1j(βn) and P̂1j(βn) do not enter the first
order asymptotics of AR or 2S, so for brevity only first order expansions are given for
them. Though not explicitly provided, higher order expansions could easily be derived
for λ̂1j(βn) and P̂1j(βn) using the proof of T2.
These results are of interest in themselves and may prove useful for deriving general asymp-
totic theory with singular variance in other non-standard settings.
Theorem 2a (T2a) Under A1

(i) P̂1j(βn) = P̂1j(β0) + Op(||Δn||) ∀j = 1, ..,m∗
(ii) P̂2j(βn) = P2j + P̂1(β0)Λ̂1(β0)^{-1}P̂1(β0)′(Ω̂(βn) − Ω̂(β0))P2j + Op(||Δn||^2)
∀j = 1, .., m̄

³In the following, any statement about a sample eigenvector or eigenvalue with an
arbitrary subscript j holds for all such j. So, for example, λ̂2j(β0) = 0 means the
result holds for all j = 1, .., m̄.
By standard arguments P̂1(β0) and Λ̂1(β0)^{-1} converge to P1 and Λ1^{-1} at rate
n^{1/2} under A1(ii), (iii) (Theorem 4.2, Bosq (2000)). Also by T1, Ω̂(β0)P2j = 0. It is
then straightforward to establish

Corollary 1 (C1). Under A1, T2a implies that

(i) P̂1j(βn) = P1j + Op(n^{-1/2} ∨ ||Δn||)
(ii) P̂2j(βn) = P2j + P1Λ1^{-1}P1′Ω̂(βn)P2j + Op(||Δn||(n^{-1/2} ∨ ||Δn||))
Define the following for j = 1, .., m̄:

γjn := P2j′ (1/n)Σ_{i=1}^n GiΔnΔn′Gi′ P2j,
Ψjn := Λ1^{-1/2}P1′ (1/n)Σ_{i=1}^n giΔn′Gi′P2j,
ψjn := Ψjn′Ψjn, and υjn := γjn − ψjn.
Theorem 2b (T2b) Under A1

(i) λ̂1j(βn) = λ1j + Op(n^{-1/2} ∨ ||Δn||)
(ii) λ̂2j(βn) = υjn + Op(||Δn||^2(||Δn|| ∨ n^{-1/2}))

T2a and T2b are crucial for deriving the limit distribution of TAR(βn) (and also
T2S(β0)). Section 4.2 provides asymptotic theory for the AR and 2S GMM statistics using
these higher order expansions.
4.1.1 Limit Distribution of Identification Robust Statistics
Using T2a and T2b we can derive the limit distributions of both TAR(βn) and T2S(β0). We
derive the limit of TAR(βn) and not TAR(β0) since, as noted in T1, TAR(β0) does not
exist when Ω is not full rank.
4.1.2 Anderson Rubin Statistic
We firstly derive the limit distribution of TAR(βn) for a sequence βn with n^κ Δn →p Δ,
where Δ is a non-stochastic p × 1 vector. Define for j = 1, .., m̄

Φj := Λ1^{-1/2}P1′ E[(1/n)Σ_{i=1}^n giΔ′Gi′P2j] (41)
ψj := Φj′Φj (42)
γj := E[(1/n)Σ_{i=1}^n P2j′GiΔΔ′Gi′P2j] (43)
Assumption 2 (A2) (i) n^κ Δn →p Δ where Δ is a p × 1 non-stochastic vector and κ > 1/2;
(ii) √n(Ĝ(β0) − G)′v →d N(0, Kv) where Kv := (1/n)Σ_{i=1}^n E[(Gi − G)′vv′(Gi − G)] for
any v ∈ Rp; (iii) √n g(β0) →d N(0, Ω); (iv) δ′Ω = 0 =⇒ δ′G = 0 for δ ∈ Rm;
(v) n^{2κ}γjn →p γj and n^{2κ}ψjn →p ψj.
A2(i) assumes that the sequence n^κΔn has a non-random limit. This could be extended to
allow different rates of convergence for different linear combinations of the
parameters, but would not change the result. The assumption that the limit is
non-stochastic is innocuous for AR since, when inverting to form a confidence set, βn is
fixed. This is unlike 2S below, where the weight matrix is evaluated at an estimator of
β0 and n^{1/2}Δn has a random limit under the assumption that the initial estimator is
strongly identified.
A2(ii) is a high level assumption that a Central Limit Theorem holds for the matrix
n^{1/2}(Ĝ(β0) − G), and similarly A2(iii) for n^{1/2}g(β0). The limit variance does not
allow for dependence in Ĝ(β0) − G since for simplicity we have assumed wi is independent
over i, and hence so is Gi. The final condition of A2 is a high level weak law of large
numbers assumption: given n^κΔn →p Δ, the scaled γjn and ψjn converge to γj and ψj
provided a weak law of large numbers holds.
A key assumption is A2(iv), that the null space of Ω belongs to the null space of G′. In
just-identified settings (m = p) A2(iv) rules out strongly identified moment conditions,
and it may also fail for m > p. Hence with singular variance, in order for the AR
statistic to satisfy Theorem 3 the moment function must in many cases also be first
order under-identified. This is a startling result: if Ω is singular and G is full rank
(i.e. the model is first order identified) then the AR statistic is not in general
locally chi-squared.
In fact Simulation 1 provides an example of this case, demonstrating that the AR
statistic in a 1/n neighborhood of β0 is highly non-standard. Hence the AR statistic is
not necessarily robust to singular variance under strong identification, which goes
against the commonly held wisdom of the identification literature. This paper is a first
step in understanding the complex relationship between the usual notions of
identification and regularity conditions often relegated to secondary importance.
Theorem 3 Under A1-A2

TAR(βn) →d χ2_m (44)
Hence inverting TAR(β) using the χ2_m approximation will provide asymptotically valid
confidence regions for any δ-neighborhood around β0 and hence a Local Confidence Set.
Essentially, inverting the AR statistic by the usual method will contain any β an
infinitesimally small distance from β0 with the correct probability asymptotically.
However Assumption 2 must hold: dropping the assumption that Ω is full rank, we require
a central limit theorem for Ĝ(β0) − G and that the null space of Ω lie in the null space
of G′.
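Inverting the AR statistic amounts to a grid search: keep every β whose statistic falls below the appropriate chi squared quantile. A minimal sketch on our own simulated linear IV design (the design, sample size and grid are our choices; 5.991 is the 95% critical value of χ2 with 2 degrees of freedom):

```python
import numpy as np

rng = np.random.default_rng(4)
n, beta0 = 2000, 0.7
z = rng.normal(loc=1.0, size=(n, 2))            # two instruments
x = z @ np.array([1.0, 0.5]) + rng.normal(size=n)
y = beta0 * x + rng.normal(size=n)

def T_AR(b):
    gi = (y - b * x)[:, None] * z               # m = 2 moment functions
    gbar = gi.mean(axis=0)
    Om = gi.T @ gi / n
    return n * gbar @ np.linalg.solve(Om, gbar)

grid = np.linspace(0.5, 0.9, 401)
stats = np.array([T_AR(b) for b in grid])
conf_set = grid[stats <= 5.991]                 # 95% AR confidence set
b_min = grid[np.argmin(stats)]                  # AR-minimizing value, near beta0
```

With singular or near-singular variance the same inversion applies, but the validity of the χ2 quantile is exactly what Theorem 3's extra conditions guarantee.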
4.1.3 2-Step GMM Statistic
For 2-Step GMM βn = β̂, where Δn = β̂ − β0 appropriately scaled has a random limit. We
consider the case where the initial estimator is strongly identified, so that
n^{1/2}Δn has a Gaussian limit distribution. Extensions allowing departures from strong
identification would prove extremely difficult and are beyond the scope of this paper.
For the AR statistic the sequence Δn was fixed and, as we saw, under extra conditions
the statistic retained the standard chi squared limit in a neighborhood of β0. This is
not the case for 2-Step GMM: the distribution of β̂ enters the limit distribution of
T2S(β0) even when β̂ is strongly identified, and it is shown below that this
distribution has a complicated form.
Define Υ := ΣP1Λ1^{1/2}, where Σ is defined in A3(i). For all j = 1, .., m̄ and
h = 1, ..,m∗ define

Πjn := Υ′ (1/n)Σ_{i=1}^n ( Gi′P2jP2j′Gi − (1/n)Σ_{l=1}^n Gl′P2jP2j′Gi (gi′P1Λ1^{-1}P1′gl) ) Υ (45)

Πj := Υ′ E[ (1/n)Σ_{i=1}^n ( Gi′P2jP2j′Gi − (1/n)Σ_{l=1}^n Gl′P2jP2j′Gi (gi′P1Λ1^{-1}P1′gl) ) ] Υ (46)

Θhjn := λ1h^{-1/2} (1/n)Σ_{i=1}^n gi′P1hP2j′Gi Υ (47)

Θhj := λ1h^{-1/2} E[ (1/n)Σ_{i=1}^n gi′P1hP2j′Gi ] Υ (48)

where Θhj is a 1 × m∗ vector, Θhj := (θhj1, .., θhjm∗).
Assumption 3 (A3) (i) √n(β̂ − β0) = Σ√n g(β0) + op(1) for a full rank p × m matrix Σ;
(ii) Πjn →p Πj ∀j = 1, .., m̄; (iii) Θhjn →p Θhj ∀j = 1, .., m̄, h = 1, ..,m∗

Let W be an m∗ × m∗ random matrix with a standard Wishart distribution. By A3(i) and
A2(iii)

√n(β̂ − β0) →d ΥZ (49)

where Z ~ N(0, Im∗), since P2′g(β0) = 0. Define Z := (Z1, .., Zm∗)′.
Theorem 4 derives the limit distribution of λ̂2j(β̂), which is useful both for deriving
the limit distribution of T2S(β0) and for the tests of the rank of Ω in Section 5.

Theorem 4 (T4) Under A1-A3

nλ̂2j(β̂) →d tr(ΠjW) (50)
Theorem 5 (T5) Under A1-A3

T2S(β0) →d Σ_{k=1}^{m∗} Zk^2 + Σ_{j=1}^{m̄} ( Σ_{h=1}^{m∗} Σ_{l=1}^{m∗} θhjl Zl Zh )^2 / tr(ΠjZZ′) (51)
T5 shows the highly non-standard limit distribution of T2S(β0) when m̄ > 0, i.e. when
moments have singular variance. When m̄ = 0, T5 reduces to the standard χ2_m limit for
T2S(β0). With singular variance the 2S statistic under strong identification is a
highly non-linear function of m∗ independent standard normal random variables.
T4 can be used to test the rank of Ω, though only when GMM is strongly identified. With
departures from strong identification the GMM estimator in general has non-standard
rates of convergence to a non-standard distribution, and with a lack of global
identification moment type estimators are in general inconsistent. This causes obvious
issues for testing the rank of Ω(β) at β = β0, i.e. the rank of Ω. In general, inverting
the 2S statistic using a standard χ2_m limit provides incorrect inference. Modified
methods of inference based on the 2S statistic are derived below, though they are robust
only to certain forms of identification failure.
The message which follows is that inverting the AR statistic will, under further
conditions beyond those posed in the literature, still provide valid inference with
general identification failures: not just departures from strong identification but also
singular variance, which we argue is a third identification condition that cannot be
assumed separately, given the interplay between identification and singular variance
discussed in Section 3.
One peculiar case which emerges is strong identification with singular variance. In many
cases this invalidates A2(iv), and in general AR will not have a locally chi squared
limit. As such, methods which eradicate the singularity would be preferable. This is
especially so since, in simulations, when moments have almost singular variance with
strong identification the AR statistic is poorly approximated by a χ2_m limit.
Further investigation into the link between singular variance, identification and
methods for robust asymptotically valid inference is greatly needed. Inverting the AR
statistic using a χ2_m limit is not as innocuous as implied by the identification
literature once we view the full rank assumption on the variance of the moments as an
identification condition.
5 Singularity Robust Confidence Sets
Using Theorem 4 we can derive a test for the rank of Ω in the case where β̂ is strongly
identified; deriving a test under only global or first order identification is beyond
the scope of this paper. Theorem 4 gives the limit distribution of n times the sum of
the r smallest sample eigenvalues under the null that m̄ = r, and so naturally lends
itself to testing the null hypothesis that the rank of Ω is m − r for any 1 ≤ r < m.
Section 5.2 details how to perform such tests and how to estimate the quantiles of
tr(ΠjW) in practice. Simulation experiment 1 verifies the validity of this asymptotic
approximation.
5.1 Testing for redundant moments in non-linear models
Given the invalid approximation of T2S(β0) by χ2_m when the variance of the moments is
singular (and the poor approximation when β0 is almost a point of singularity), we would
like a method of testing the rank of Ω. When Ω is not a function of β, if Ω is not full
rank then the sample variance Ω̂ is also singular for any n. Suppose we wish to test the
following null hypothesis.
H0 : R(Ω) = m − r (52)
H1 : R(Ω) > m − r (53)

where 0 < r < m. Under the null,

n Σ_{j=1}^r λ̂2j(β̂) →d tr( Σ_{j=1}^r Πj W ) (54)
We require estimates of P1P1′, P2P2′ and Υ, along with an initial estimator of β0, to
estimate each Πj. We take β̂ to be the initial GMM estimator with (possibly data
dependent) weight matrix W. When GMM is strongly identified, A3(i) is then satisfied
with Σ = (G′WG)^{-1}G′W, and a consistent estimate can be formed as
Σ̂ = (Ĝ(β̂)′W Ĝ(β̂))^{-1}Ĝ(β̂)′W.
Though P1 and P2 are not in general continuous functions of the elements of Ω, P1P1′ and
P2P2′ are, Kato (1982). Hence since P̂1 →p P1 and P̂2 →p P2 by T2, the CMT gives
P̂1P̂1′ →p P1P1′ and P̂2P̂2′ →p P2P2′. We can therefore form an estimate Π̂j of Πj based
upon Σ̂, P̂1(β̂), P̂2(β̂), Λ̂ and β̂; under A1-A3 it is straightforward to show that
Π̂j →p Πj.
We can then simulate the distribution of tr(Σ_{j=1}^r ΠjW) by taking B draws
Z∗b ~ N(0, Im∗) of m∗ uncorrelated standard normal variables for b = 1, .., B, setting
W∗b = Z∗bZ∗b′, and taking the α% quantile of tr(Σ_{j=1}^r Π̂j W∗b). Since W∗
approximates the distribution of W with arbitrary precision as B → ∞ and
Σ_{j=1}^r Π̂j →p Σ_{j=1}^r Πj under A1-A3, it is easy to establish that q̂α →p qα, where

q̂α = inf{x ∈ R : Pr{tr(Σ_{j=1}^r Π̂jW∗) ≤ x} ≥ α} (55)
qα = inf{x ∈ R : Pr{tr(Σ_{j=1}^r ΠjW) ≤ x} ≥ α} (56)
Though in simulations the test is poorly sized even for moderately large sample sizes,
its power is very strong: under the alternative hypothesis n Σ_{j=1}^r λ̂2j(β̂) diverges
at rate n.
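Estimating the critical value reduces to Monte Carlo draws of the Wishart factor. A sketch under our own assumed value for the estimated Π matrix (the matrix below is hypothetical, standing in for the sum of the Π̂j):

```python
import numpy as np

rng = np.random.default_rng(5)
m_star, B = 2, 50_000
Pi_hat = np.array([[0.8, 0.2],        # hypothetical estimate of sum_j Pi_j-hat
                   [0.2, 0.5]])

draws = np.empty(B)
for b in range(B):
    Zb = rng.normal(size=m_star)
    draws[b] = Zb @ Pi_hat @ Zb       # tr(Pi W*) with W* = Z Z'

q95 = np.quantile(draws, 0.95)        # simulated 95% critical value
mean_draw = draws.mean()              # should be close to tr(Pi_hat)
```

The test then rejects the null on rank when n times the sum of the r smallest eigenvalues exceeds q95.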
5.2 Estimates of Rank w.p.1
We can use this result to formulate a cut-off with which to select the number of
non-zero eigenvalues of Ω̂, correctly determining the rank of Ω asymptotically w.p.1.
For example, letting r̂ := #{j : λ̂j ≤ δn} for a threshold δn > 0, Theorem 3 of Bathia,
Yao & Ziegelmann (2010) shows that Pr{r̂ = m̄} → 1 as n → ∞ when δn → 0 and
(δn^2 n)^{-1} → 0.
This result is extremely useful for modifying the 2-Step GMM objective function to
remove the problem of Ω being singular. We have seen that when Ω is singular, T2S(β0)
is bounded in probability though in general no longer distributed χ2_m. Since r̂ = m̄
asymptotically w.p.1, we can delete redundant combinations of moments using the
estimated eigenvectors so that the modified set of moments satisfies standard asymptotic
theory: no estimation error from estimating m̄ feeds into the limit distribution of the
transformed objective function asymptotically. Namely, we derive an objective function
based on a set of moments that have non-singular variance with rank m∗.
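A sketch of the thresholding rule follows. The exponent in δn is our own choice; the result only requires δn → 0 with (δn²n)⁻¹ → 0, and the toy data are ours:

```python
import numpy as np

def estimate_zero_count(Omega_hat, n, c=1.0):
    """Estimate the number of zero population eigenvalues by thresholding."""
    delta_n = c * n ** (-1.0 / 3.0)   # delta_n -> 0 while n * delta_n^2 -> infinity
    lam = np.linalg.eigvalsh(Omega_hat)
    return int(np.sum(lam < delta_n))

# Toy check: one zero population eigenvalue, with O(n^{-1/2}) estimation noise.
rng = np.random.default_rng(6)
n = 1_000_000
E = rng.normal(size=(3, 3)) / np.sqrt(n)
Omega_hat = np.diag([2.0, 1.0, 0.0]) + (E + E.T) / 2.0
r_hat = estimate_zero_count(Omega_hat, n)
```

Because the noise is of smaller order than the threshold, the estimated count equals the true number of zero eigenvalues with probability approaching one.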
There are two possible ways to correct for the non-standard limit distribution of
T2S(β0). The first is to estimate m̄ as detailed above and remove redundant combinations
of moments. The second is to use bootstrap critical values with which to invert T2S(β)
to form a singularity-robust confidence region.
5.3 Removing Redundant Combinations of Moments
Given an estimate r̂ s.t. Pr{r̂ = m̄} → 1 as n → ∞, the modified two step criterion
function

T^r̂_2S(β) := Σ_{j=1}^{m−r̂} (P̂j′√n g(β))^2 λ̂j^{-1} (57)

utilizes the m − r̂ combinations of moments P̂j′g(β) with the largest eigenvalues. Given
an estimator r̂ such that Pr{r̂ = m̄} → 1, then for any c > 0

Pr{T^r̂_2S(β0) ≤ c} → Pr{T∗_2S(β0) ≤ c} (58)

as n → ∞, where T∗_2S(β0) := Σ_{j=1}^{m∗} (Pj′√n g(β0))^2 λj^{-1}. This follows since
m − r̂ = m − m̄ = m∗ with probability 1 as n → ∞, √n g(β0) = Op(1) by A2(iii), and
λ̂j = λj + Op(n^{-1/2}), P̂j = Pj + Op(n^{-1/2}) for j ∈ 1, ..,m∗ (i.e. for those j where
λj > 0) by T2. By A2(iii), √n g(β0) →d N(0, Ω) where Ω = P1Λ1P1′. Hence
P1j′√n g(β0)/λ1j^{1/2} →d N(0, 1) for j ∈ 1, ..,m∗, and therefore T∗_2S(β0) →d χ2_m∗.
This establishes that for any c > 0, as n → ∞,

Pr{T^r̂_2S(β0) ≤ c} → Pr{χ2_m∗ ≤ c} (59)

Hence we can derive confidence regions for β0 with asymptotically valid levels using the
modified 2-Step GMM function T^r̂_2S(β) and the quantiles of χ2_m∗. We do not know m∗,
but since Pr{r̂ = m̄} → 1 the quantiles of χ2_{m−r̂} are valid asymptotically. Note that
this method yields confidence regions with correct size even if m∗ < p, though the
smaller is m∗ the wider the confidence regions become; they remain correctly sized
asymptotically.
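A sketch of the modified statistic in (57), with our own toy numbers (the near-zero eigenvalue stands in for an estimated zero eigenvalue that r̂ removes):

```python
import numpy as np

def modified_2s(gbar, Omega_hat, n, r_hat):
    # Drop the r_hat principal components with the smallest eigenvalues (cf. eq. (57)).
    lam, P = np.linalg.eigh(Omega_hat)      # eigenvalues in ascending order
    keep = np.arange(r_hat, len(lam))       # indices of the m - r_hat largest
    s = P[:, keep].T @ (np.sqrt(n) * gbar)  # rotated, scaled sample moments
    return float(np.sum(s**2 / lam[keep]))

# Toy check: m = 3 moments, one (near-)zero eigenvalue removed.
n = 100
Omega_hat = np.diag([1e-12, 1.0, 2.0])
gbar = np.array([0.0, 0.1, 0.2])
T_mod = modified_2s(gbar, Omega_hat, n, r_hat=1)  # = 100*(0.01/1 + 0.04/2) = 3.0
```

The retained quadratic form is compared to χ2 quantiles with m − r̂ degrees of freedom.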
Another method is to work directly with T2S(β) using quantiles from a bootstrap, without
deleting redundant moments. The drawback is that it is difficult to prove the validity
of the bootstrap theoretically; the benefit is that we need not remove redundant
moments, which may be overly conservative in small samples. The bootstrap is shown to
work well in simulations, even when the GMM estimator is under-identified and hence
A3(i) breaks down, Dovonon & Renault (2009). Theorem 4 derives the limit distribution of
the 2S statistic and may prove useful in establishing the validity of the bootstrap, at
least in the case where GMM is strongly identified.
5.4 Bootstrap Confidence Regions
Bootstrapping robust statistics has received relatively little attention in the
literature. Kleibergen (2011) derives bootstrap critical values for the identification
robust statistics in Kleibergen (2005), for example the Continuous Updating Estimator
and the Kleibergen statistic, but does not derive higher order expansions for the 2-Step
GMM statistic. Moreover Kleibergen (2005) assumed that Ω̂ is non-singular for any n, so
that the minimum eigenvalue is well separated from zero for all n. Establishing the
validity of the bootstrap for the 2-Step GMM objective function when Ω is singular would
be difficult even if the GMM estimator were strongly identified: the limit distribution
of 2-Step GMM depends upon the limit distribution of the sample estimates of the zero
eigenvalues, which Theorem 4 shows to be highly non-standard. We do not formally
establish the validity of the bootstrap method; however we highlight a method similar to
that put forward in Kleibergen (2011) for use when Ω is singular, which is shown to work
well in simulations in Section 6.
For any β ∈ B define g̃i(β) = gi(β) − ĝ(β). Take a bootstrap sample of size n with
replacement from {g̃i(β)} to form a bootstrap sample g∗1b(β), .., g∗nb(β) for
b = 1, .., B∗, and define the bootstrapped 2-Step GMM objective function at β as
Tnb(β) := n ḡ∗b(β)′Ω̂∗b(β)^{-1}ḡ∗b(β), where Ω̂∗b(β) = (1/n)Σ_{i=1}^n g∗ib(β)g∗ib(β)′.
Define T2S(β) = ng(β)′Ω̂(β)^{-1}g(β). Take the α∗ quantile of Tnb(β) over the bootstrap
samples, denoted α̂(β), and invert T2S(β), selecting all β ∈ B such that
T2S(β) ≤ α̂(β). At β0, α̂(β0) should asymptotically approximate the quantiles of
T2S(β0) (verified in the simulations). For β ≠ β0, T2S(β) → ∞ under strong
identification. With weak identification, or in the worst case a total lack of
identification, the power properties may be poor, just as when inverting
robust statistics in the case where Ω is singular, Dufour (1997), Kleibergen (2005).
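A sketch of this bootstrap recipe follows. This is our own implementation of the steps above; the use of a pseudo-inverse to guard against exactly singular resampled variance matrices is our choice, not the paper's, and the toy moments are well behaved so the quantile should sit near the χ2 critical value:

```python
import numpy as np

rng = np.random.default_rng(7)

def boot_critical_value(gi, alpha=0.95, B=500):
    """Bootstrap the alpha-quantile of the 2-step statistic from recentred moments."""
    n = gi.shape[0]
    gc = gi - gi.mean(axis=0)               # recentre: g_i(beta) - gbar(beta)
    stats = np.empty(B)
    for b in range(B):
        gb = gc[rng.integers(0, n, size=n)]  # resample with replacement
        gbar_b = gb.mean(axis=0)
        Om_b = gb.T @ gb / n
        stats[b] = n * gbar_b @ np.linalg.pinv(Om_b) @ gbar_b
    return np.quantile(stats, alpha)

# Toy check: i.i.d. full-rank moments, so the quantile is near the chi^2_2 value 5.99.
gi = rng.normal(size=(300, 2))
q_hat = boot_critical_value(gi)
```

A confidence set is then formed by keeping every β on a grid with T2S(β) below its own bootstrapped quantile.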
6 Simulations
We consider two different ways in which singularity and near-singularity may arise.
Simulations 1 and 2 both consider instrumental variable models where the first stage is
a non-linear regression. This approach is common when the endogenous variable is binary,
for example Angrist (2001); it is not uncommon for a first stage Probit or Logit to be
run in these cases. Simulation 1 considers the case where the GMM estimator is strongly
identified, though through conditional heteroscedasticity the moment functions have
singular variance at β0.
We also consider the case where the variance of the moments is almost singular, i.e.
where the determinant of Ω is very small. In the singular case the distribution of
T2S(β0) does not converge to a chi squared limit, though it is bounded even for large n,
as expected from Theorem 5. In the almost singular case the chi squared approximation is
poor for small to medium sample sizes, with the distribution of T2S(β0) eventually well
approximated by a chi squared distribution. Since the GMM estimator is strongly
identified in this example, we can use Theorem 4 to derive the limit distribution of the
smallest eigenvalue, which is zero. The approximation by this limit distribution is
quite poor, though it eventually takes hold for large sample sizes. We also show the
bootstrap confidence regions to perform well.
Simulation 1 violates A2(iv), since Ω is singular whilst G is full column rank and hence
first-order identified. In this case we show that TAR(βn) for βn = β0 + 1/n has a
non-standard, non-chi-squared distribution for local sequences around β0. It is beyond
the scope of this paper to derive the limit distribution of TAR(βn) with singular
variance when first order identification is maintained.
Simulation 2 considers a parametric semi-linear regression model used in Bierens (1990)
to test for non-linearities. For certain parameter values there is a lack of global
identification, which causes issues for testing for non-linearities, Hansen (1996). The
moment variance matrix is also singular for these parameter values, which causes issues
even for statistics robust to identification failures. There also exist parameter values
where, though globally identified, the model is first order under-identified with
singular variance. In this case the GMM estimator is consistent though with a
non-standard distribution. This invalidates parameter inference based on the normal
approximation to the GMM estimator, and likewise confidence intervals based upon Wald,
LM or likelihood ratio statistics computed from an initial estimate of β0. The 2-step
GMM objective function does not require first order identification in order to provide
correct inference on β0, requiring only that the initial estimator be consistent;
however its distribution evaluated at the true parameter is shown to be non-standard.
In this case the distribution of the estimated smallest eigenvalue is uncertain;
however, using Theorem 2b(ii) we can show that it is Op(n^{-1/2}) with a non-standard
distribution. We consider values of β0 where the variance of the moments is singular and
near singular. In the latter case, convergence to the standard chi squared limit is
shown to be very slow.
Simulation 2 satisfies the conditions of A2, and hence Theorem 3 holds: the convergence
of TAR(βn) for βn = β0 + 1/n to a χ2_m limit is demonstrated.
6.1 Simulation 1: Non-Linear Simultaneous Equations with Conditional
Heteroscedasticity
Let (yi, xi, zi), i = 1, .., n, be an i.i.d sequence where

yi = θ0xi + εi   (60)
xi = (1 + π0zi)^{-1} + vi   (61)

where both E[εi|zi] = 0 and E[vi|zi] = 0 for all i. Define β = (θ, π)′; then the moment function is

gi(β) = ( (yi − θxi)(1 + πzi) , (xi − (1 + πzi)^{-1}) zi (1 + πzi)^2 )′   (62)
so that

E[gi(β0)] = 0   (63)

where β0 := (θ0, π0)′. The second moment condition comes from the non-linear regression (61); the first moment condition uses the optimal non-linear instrument with which to identify β0. Here π0 is a nuisance parameter and not the parameter of interest.
Ω(β0) := E[gi(β0)gi(β0)′] = E [ E[εi^2|zi](1 + π0zi)^2        E[εivi|zi] zi (1 + π0zi)^3
                                E[εivi|zi] zi (1 + π0zi)^3    E[vi^2|zi] zi^2 (1 + π0zi)^4 ]

Suppose the process (εi, vi), i = 1, .., n, satisfies

εi|zi ~iid N(0, σε^2 (1 + π0zi)^{-2})   (64)
vi|zi ~iid N(0, σv^2 zi^{-2} (1 + π0zi)^{-4})   (65)
E[εivi|zi] = ρσvσε zi^{-1} (1 + π0zi)^{-3}   (66)

where ρ = cor(εi, vi|zi). (The negative exponents ensure that the zi and (1 + π0zi) factors cancel in each entry of Ω(β0).) Substituting (64)-(66) into the matrix above,

Ω(β0) = [ σε^2     ρσvσε
          ρσεσv    σv^2 ]   (67)

ρ = 1 implies singular moments; for ρ ≈ 1, Ω is almost singular.
We simulate a sample (yi, xi, zi, εi, vi), i = 1, .., n, satisfying (60)-(61) and (64)-(66), letting zi = abs(ei) + 0.1 where ei is i.i.d N(0, 1), and setting π0 = θ0 = 0.1 and σε = σv = 1, with R = 20000 repetitions for each case.
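For concreteness, the design above can be simulated directly. The conditional scale functions below follow the reconstruction of (62)-(66), chosen so that Ω(β0) reduces to (67); they should be read as an assumption of this sketch rather than a definitive statement of the paper's specification.

```python
import numpy as np

rng = np.random.default_rng(0)
n, theta0, pi0, s_eps, s_v, rho = 100_000, 0.1, 0.1, 1.0, 1.0, 1.0

z = np.abs(rng.standard_normal(n)) + 0.1
# conditional sds: an assumption, picked so that E[g g'] reduces to (67)
sd_eps = s_eps / (1.0 + pi0 * z)
sd_v = s_v / (z * (1.0 + pi0 * z) ** 2)
u1 = rng.standard_normal(n)
u2 = rng.standard_normal(n)
eps = sd_eps * u1
v = sd_v * (rho * u1 + np.sqrt(1.0 - rho ** 2) * u2)

x = 1.0 / (1.0 + pi0 * z) + v
y = theta0 * x + eps

# moment contributions at beta0 = (theta0, pi0), as in (62)
g1 = (y - theta0 * x) * (1.0 + pi0 * z)
g2 = (x - 1.0 / (1.0 + pi0 * z)) * z * (1.0 + pi0 * z) ** 2
gmat = np.column_stack([g1, g2])
Omega_hat = gmat.T @ gmat / n               # sample moment variance

lam_min = np.linalg.eigvalsh(Omega_hat)[0]  # ~0 to machine precision when rho = 1
gbar = gmat.mean(axis=0)
# pinv rather than inv purely so the sketch still runs when Omega_hat is singular
T2S = n * gbar @ np.linalg.pinv(Omega_hat) @ gbar
```

With ρ = 1 the two moment contributions are exactly proportional draw by draw, so Ω̂ is singular in every sample, illustrating the rank failure discussed above.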
6.2 2-Step GMM
We first plot the empirical density of T2S(β0) = n g(β0)′ Ω̂(β̂)^{-1} g(β0) for n = 100, 1000, 2000 and ρ = 1, 0.999999, 0 (singular, almost-singular, non-singular). Since the model is just-identified we take the first-step GMM estimator as the initial estimator β̂, since the weight matrix is irrelevant in this case.
Figure 1a: Density of T2S(β0), n = 100
Under A2(ii), √n g(β0) →d N(0, Ω). When the moments have non-singular variance, i.e. R(Ω) = 2, Theorem 5 applies with m̄ = 0 and m∗ = 2 (i.e. standard asymptotics), so that T2S(β0) →d χ^2_2, where Pr(χ^2_2 ≤ 5.99) = 0.95 and Pr(χ^2_2 ≤ 9.21) = 0.99.
Table 1 reports the 95% and 99% quantiles of T2S(β0) for each n and ρ.
Table 1: Small Sample Quantiles of T2S(β0)

                  n = 100            n = 1000           n = 2000
               q=0.95  q=0.99     q=0.95  q=0.99     q=0.95  q=0.99
ρ = 1           77.37    4355      18.2    38.22      17.3    33.34
ρ = 0.999999     41.9    1245      7.00    14.03      6.41    11.28
ρ = 0           14.98     311      6.09     9.84      6.02     9.32
Figs 1a-c show that in the case where the moments are uncorrelated (ρ = 0) the χ^2_2 provides a reasonable approximation for small n (n = 100) and an extremely good one for n = 1000, 2000. For almost-singular moments, the χ^2_2 approximation is poor for small and medium sample sizes though reasonable for large n (n = 2000). When the moments are singular at β0 the distribution does not converge to a χ^2_2, though it is bounded, which corresponds with Theorem 2. The quantiles are roughly more than double those of a χ^2_2 when ρ = 1. To visualise this we plot the Quantile-Quantile plots of T2S(β0) against the χ^2_2 in Figs 2a-c.
Fig 2a: Quantile-Quantile Plot of T2S(β0) and χ^2_2, n = 100
Fig 2b: Quantile-Quantile Plot of T2S(β0) and χ^2_2, n = 1000
Fig 2c: Quantile-Quantile Plot of T2S(β0) and χ^2_2, n = 2000
Again we see that when ρ = 1, even for n = 2000, the χ^2_2 critical values are massively undersized. Performing the simulation for n = 20000 we found a similar result, though it is not reported here.
We now plot the density of √n(λ̂ − λ) for ρ = 0, 0.999999, 1, where λ is the minimum eigenvalue of Ω and λ̂ the minimum eigenvalue of the sample variance Ω̂.

Fig 3a: Density of √n(λ̂ − λ), n = 100
Fig 3b: Density of √n(λ̂ − λ), n = 1000
Fig 3c: Density of √n(λ̂ − λ), n = 2000
For ρ = 1, λ = 0 and we see that √n λ̂ converges to a spike at zero. This corresponds with Theorem 4, that λ̂ = Op(n^{-1}) when Ω is singular and the GMM estimator is strongly identified, as is the case here since G is full rank at β0 and the moment conditions globally identify β0. Figs 3a-c show that the normal approximation to √n(λ̂ − λ) is poor, especially in the case ρ = 0.999999, where λ is almost zero. The normal approximation to the sampling distribution of λ̂ for λ > 0 is known to be potentially poor even in large samples.
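The downward bias and non-normality of the smallest sample eigenvalue are easy to see in a stylised 2×2 Gaussian example; this is an illustrative sketch and not the paper's DGP.

```python
import numpy as np

rng = np.random.default_rng(1)
n, sims, rho = 100, 2000, 0.0
Omega = np.array([[1.0, rho], [rho, 1.0]])
lam = np.linalg.eigvalsh(Omega)[0]          # population minimum eigenvalue

C = np.linalg.cholesky(Omega)
lam_hat = np.empty(sims)
for s in range(sims):
    X = rng.standard_normal((n, 2)) @ C.T   # n draws from N(0, Omega)
    lam_hat[s] = np.linalg.eigvalsh(X.T @ X / n)[0]

# the smallest sample eigenvalue is biased downward in finite samples, so
# sqrt(n)*(lam_hat - lam) is poorly approximated by a mean-zero normal
root_n_err = np.sqrt(n) * (lam_hat - lam)
```

Even with uncorrelated moments the centred statistic has a clearly negative mean at n = 100, consistent with the poor normal approximation seen in Figs 3a-c.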
Fig 4a: Density of nλ̂ for λ = 0, n = 100
Figs 4a-c plot the density of n times the estimate of the zero eigenvalue, which appears to follow some multiple of a χ^2_1 distribution. It turns out, using Theorem 4, that the limit distribution of nλ̂ when λ = 0 is approximately 2 times a χ^2_1 in this example. We simulate the limit distribution in Theorem 4 for this example and plot the Quantile-Quantile plot of tr(ΠjW), simulated as defined in Section 5.1, against that of nλ̂. This approximation is poor, requiring extremely large sample sizes to be accurate, as shown in Figure 5 below.
Fig 5: QQ-Plot of Asymptotic Approximation to nλ
Finally we plot the rejection frequencies of the bootstrap approach to inverting T2S(β) to form confidence regions, detailed in Section 5.3. Even for small sample sizes this approach works well.
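Section 5.3's procedure is not reproduced in this excerpt; the following is a generic sketch of the kind of recentred nonparametric moment bootstrap it refers to, where the resampling scheme, the recentring at the sample mean, and the use of a pseudo-inverse are all assumptions of this sketch.

```python
import numpy as np

def bootstrap_critical_value(g, level=0.95, B=499, rng=None):
    """Bootstrap a 2S-type statistic n * gbar' pinv(Omega_hat) gbar.

    g : (n, m) array of moment contributions g_i(beta) at the candidate
        parameter; moments are recentred so the bootstrap imposes the null.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    n = g.shape[0]
    g0 = g - g.mean(axis=0)                 # recentre: impose E[g_i(beta)] = 0
    stats = np.empty(B)
    for b in range(B):
        gb = g0[rng.integers(0, n, size=n)]
        gbar = gb.mean(axis=0)
        Om = gb.T @ gb / n
        stats[b] = n * gbar @ np.linalg.pinv(Om) @ gbar
    return np.quantile(stats, level)

# toy check with singular moments: the second column is a copy of the first
rng = np.random.default_rng(2)
e = rng.standard_normal(500)
g = np.column_stack([e, e])                 # deliberately rank-1 variance matrix
crit = bootstrap_critical_value(g, rng=rng)
```

Because the critical value is resampled rather than taken from a χ^2 table, the construction remains usable when the sample moment variance is singular.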
Figure 6: Bootstrap Rejection Probabilities
6.3 Anderson Rubin
Since R(G) = 2 and R(Ω) = 1 when ρ = 1, A2(iv) is not satisfied for the moment set-up in Simulation 1. Though it is beyond the scope of this paper to derive the limit distribution of the AR statistic when A3 does not hold, this simulation shows how departures from A2 can lead to highly non-standard limit distributions for the AR statistic. We simulate TAR(βn), βn = β0 + 1/n, 20000 times for n = 100, 1000, 5000, 50000 and ρ = 0.999999, 1.
Figure 7: QQ-Plot of TAR(βn) and χ^2_m (ρ = 1)
Even for very large sample sizes, Figure 7 demonstrates that the AR statistic in a region of β0 is not χ^2_m and is highly non-standard. Figure 8 below performs a similar exercise for almost-singularity. For small sample sizes (n = 100, 1000) the AR statistic is not well approximated by a χ^2_2 distribution, though it eventually converges.

Figure 8: QQ-Plot of TAR(βn) and χ^2_m (ρ = 0.999999)
6.4 Simulation 2: Parametric Semi-Linear Regression
Consider the following semi-linear model:

yi = π0xi + δ0 exp(γ0xi) + εi   (68)

where yi, xi are scalar i.i.d variables with E[εi|xi] = 0. Define β = (π, δ, γ)′ and εi(β) = yi − πxi − δ exp(γxi). Then ∂εi(β)/∂β = −(xi, exp(γxi), δxi exp(γxi))′ and gi(β) = εi(β)(xi, exp(γxi), δxi exp(γxi))′, so that

Ω(β) = σ^2 E [ xi^2                 xi exp(γxi)          δ xi^2 exp(γxi)
               xi exp(γxi)          exp(2γxi)            δ xi exp(2γxi)
               δ xi^2 exp(γxi)      δ xi exp(2γxi)       δ^2 xi^2 exp(2γxi) ]   (69)
To test whether the conditional mean is linear we test the parameter restriction δ0 = 0, in which case γ0 is globally under-identified. Tests of δ0 = 0 that overcome this problem are given by Hansen (1996) and others. When forming confidence sets for β0 with δ0 = 0, the variance of the moment function is also singular, and hence so is the sample variance evaluated at β0. As such many identification robust statistics will not exist at the true parameter.
Another possibility is that γ0 = 0 (where δ0 ≠ 0). Then R(Ω) = 2, which also implies that R(G) = 2 by Theorem 1, and is easy to see from (69): at γ0 = 0, ∂εi(β0)/∂β = −(xi, 1, δ0xi)′, hence the third moment is a linear combination of the first. In this case the GMM estimator is consistent, though it no longer has a standard limit distribution; see Donovon & Renault (2009).
We consider the following DGP:

yi = exp(δxi) + xi + εi   (70)

where εi ~iid N(0, 1) and xi ~iid U(0, 5), for δ = 0, 0.075, 0.1. When δ = 0 we have first order under-identification and singular variance. For δ = 0.075, 0.1, though Ω and G are full rank, their smallest eigenvalues are close to zero and hence both are almost singular. Hence standard asymptotics provide a poor approximation to both the GMM estimator and the 2-Step GMM objective function in small samples.
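The singularity at δ = 0 can be verified numerically. In the sketch below the DGP (70) is read as (68) with π0 = δ0 = 1 and γ0 equal to the δ in (70); that mapping, and evaluating the moments at the true parameter, are assumptions of this illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50_000
results = {}
for delta in (0.0, 0.075, 0.1):
    x = rng.uniform(0.0, 5.0, n)
    eps = rng.standard_normal(n)
    y = np.exp(delta * x) + x + eps                      # DGP (70)
    resid = y - x - np.exp(delta * x)                    # residual at the truth (= eps)
    D = np.column_stack([x, np.exp(delta * x), x * np.exp(delta * x)])
    g = resid[:, None] * D                               # moment contributions
    results[delta] = np.linalg.eigvalsh(g.T @ g / n)[0]  # smallest eigenvalue

lam_min_singular = results[0.0]   # third column duplicates the first: exact singularity
lam_min_near = results[0.1]       # full rank but close to singular
```

At δ = 0 the derivative vector collapses to (x, 1, x)′, so the first and third moment columns coincide and the sample variance is singular in every sample, exactly as the text describes.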
We estimate β0 using the GMM estimator β̂ and simulate T2S(β0) with R = 5000 simulations. For the case δ = 0 we also estimate the smallest eigenvalue of Ω̂(β̂). In this example β0 is not strongly identified and hence √n(β̂ − β0) does not have a normal limit distribution. In fact, given first order under-identification, there exists some l ∈ R^3 such that l′(β̂ − β0) = Op(n^{-1/4}) with a non-normal limit distribution.
We now consider δ0 = 0.075, 0.1. In this case Ω and G are close to singular; as such we expect strong identification asymptotics to provide a poor approximation to the distribution of the GMM estimator and the 2-Step GMM objective function at β0.
6.5 Anderson Rubin
We consider the AR statistic for Simulation 2 for βn = β0 + 1/n, n = 5000, 10000, 50000, based on 50000 repetitions. This verifies Theorem 3: since the null space of Ω belongs to the null space of G and the other conditions of Assumption 2 hold, the AR statistic is locally chi-squared with singular variance in a vanishing neighborhood around β0.
7 Conclusion
This paper studies the hitherto untreated identification issue that arises when moment functions have singular variance at the true parameter; in other words, when some linear combination of the moments is redundant at the true parameter, though in small samples this is unknown to us. The assumption of non-singular variance is made almost universally across the identification literature. However, in many instances singular variance is a by-product of identification failure. An equivalence result between singular variance and first-order identification failure is established for a large class of empirically relevant moment functions. This encapsulates both Non-Linear Least Squares and Maximum Likelihood and, as argued in the paper, most likely more general moment conditions.
Failures of identification, and the issues they cause for deriving asymptotically valid inference from moment conditions, are well established. A common method of providing asymptotically valid inference is to form confidence regions by inverting some 'identification robust statistic', for example Kleibergen (2005). A popular method is based on the AR statistic, which has a chi squared limit under relatively few assumptions. One such assumption is that the moments have non-singular variance.
In light of the strong link between identification failure and singular variance, the non-singular variance assumption arguably cannot be maintained once the strong identification condition is dropped. As such, without further investigation, the results and methods derived in the current identification literature have a much narrower relevance than currently thought.
The standard approach to deriving asymptotic theory for the AR and related statistics is insufficient. The AR statistic is shown not to exist at the true parameter with singular variance, a result likely to hold for many other identification robust statistics. Standard Taylor-type expansions around the true parameter are inadequate to derive general asymptotic theory with singular variance. A novel asymptotic approach is therefore developed. Asymptotic expansions of the eigenvalues and eigenvectors of the sample variance matrix around the true parameter are established. First order terms in the asymptotic expansion of the sample estimates of the zero eigenvalues drop out at the true parameter with singular variance. As such, higher order terms of the sample eigenvalues and eigenvectors corresponding to zero population eigenvalues do not vanish asymptotically.
Conditions under which the AR statistic still has a standard chi squared limit within a
region around the true parameter with singular variance are established. Further condi-
tions on the first order sample derivative matrix at the true parameter are required for
this result to hold. Strikingly one such condition is that the null space of the variance lies
within that of the first order derivative matrix asymptotically. In just-identified models
this implies that the AR statistic may not have a standard chi squared limit with singular
variance and strong identification. A simulation example demonstrates the non-standard
limit distribution of the AR statistic with strong identification and singular variance.
The limit distribution of the AR statistic is derived within an (asymptotically) non-stochastic perturbation around the true parameter. For the 2S statistic this perturbation has a stochastic limit, which feeds into the limit distribution of the 2S statistic. Unlike for the AR statistic, in general there are no further conditions under which the higher order terms in the eigenvalue and eigenvector expansions do not lead to a non-standard limit. As such the 2S statistic at the true parameter in general has a highly non-standard limit distribution.
We derive asymptotic theory for the 2S statistic under the assumption that the initial estimator is strongly identified. Deriving the limit distribution under identification failure would in general prove very difficult and is beyond the scope of this paper. A simulation experiment plots the small sample distribution of the 2S statistic both with and without identification failure; in both cases the standard chi squared critical values were largely undersized.
When the variance of the moments is non-singular though near to a point of singularity, simulation evidence shows that the chi squared approximation can be poor for the 2S statistic even for extremely large sample sizes. This result is analogous to the normal approximation to the distribution of the linear IV estimator being poor for large sample sizes under weak identification. In this paper we derived asymptotics for the case where the minimum eigenvalue of the population variance matrix is zero for finite n. In light of the simulation evidence that the chi squared approximation to the small sample distribution of the 2S statistic is poor even for large sample sizes, it would be valuable to model the minimum eigenvalue as local to zero. This approach is commonly used to model weak identification, modeling the Jacobian as vanishing at some rate (Staiger & Stock (1997), Newey & Windmeijer (2009)), and is the approach taken in Grant (2012), where it is termed 'Weak Singularity'.
In light of the problems that (almost) singular moments cause for inference, we developed a method of testing for redundant moments. The limit distribution of the minimum sample eigenvalue is established, with which we show how to test the rank of the population variance matrix at the true parameter. We also provide a method to determine the rank of the matrix with probability one. Using this approach we show how a modification of the 2S statistic eradicates the problem of singular moments, yielding back a standard chi squared limit with an estimable degree of freedom. We also detail a bootstrap method which is shown to work well in two simulations, both with strong identification and when under-identified with singular moments. We term confidence sets formed utilizing such methods 'singularity robust confidence sets'. We did not show theoretical validity of the bootstrap; this is left for future research.
Numerous examples of singular variance, both with and without identification failures, are provided. These issues are especially prevalent in non-linear models; as such these results are especially useful for financial econometric models and discrete choice models. A further investigation into the more general link between identification failures and singular variance is needed, along with more general asymptotic theory which does not maintain the assumption of non-singular variance. The question tackled in this paper potentially opens up a large and empirically relevant research agenda.
Given that singular variance and identification failures are closely linked, what are the theoretical implications for commonly used estimation and inference results in the econometric literature? Which results remain valid in spite of singular variance (and under which further conditions), and which do not? Which methods, if any, overcome the problems of singular variance?

The importance of these questions can hardly be overstated if one wishes to confidently form valid inference from a set of moment conditions. This paper is a first step in providing answers to these questions and tools to study the implications of singular variance for other key inferential procedures.
8 Appendix
Definitions
For any random variable x, E[x] refers to the mathematical expectation taken with respect to (w.r.t) the density of x. →p and →d denote convergence in probability and convergence in distribution respectively. For any deterministic sequence an and constant b, an → b denotes that b is the deterministic limit of an. ~d is shorthand for 'is distributed as', 'w.p.a.1' denotes 'with probability approaching 1' and 'w.p.1' denotes 'with probability 1'. op(a) refers to a variable that converges to zero in probability when divided by a, and similarly Op(a) a variable bounded in probability when divided by a. For any matrix A, R(A) and N(A) denote the rank and null space of A, and ||A|| and tr(A) are the Euclidean norm and trace of A respectively. CMT refers to the Continuous Mapping Theorem. For any a > 0, Ia×a refers to the a × a identity matrix and 0a an a × 1 vector of zeroes.
Lemma A Let A and Â be two square matrices of dimension r, where R(A) = r − r∗ and ||Â − A|| = Op(εn). Eigen-decompose A = RDR′, where RR′ = I and RDR′ = R1D1R′1 + R2D2R′2, with D2 = 0, D1 a full rank (r − r∗) × (r − r∗) matrix, and R2 a basis of the null space of A. Similarly express Â = R̂D̂R̂′ = R̂1D̂1R̂′1 + R̂2D̂2R̂′2. Define B = (Â − A) and J = −D1^{-1}R′1BR2M, where MM′ = I_{r∗}. Then

R̂1 = R1 + R2MJ′ + Op(εn^2)   (71)
R̂2 = R2M + R1J + Op(εn^2)   (72)

Here M is a change of basis required to validate the asymptotic expansion in the more general case when A is not square; for square matrices M = I. In any case a change of basis would not alter any of the results, as we could switch bases and work with R2 := R2M and R̂2 := R̂2M, implicitly shifting the basis of Ω and Ω̂. Hence we can set M = I_{r∗}, in which case (72) reads

R̂2 = R2 + R1J + Op(εn^2)   (73)

and the results of this paper are unchanged.
Proof of Lemma A This general result is utilized in the proof of Proposition 1 of Ratsimalahelo (2001); see equations A8 and A9 therein.
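The first-order expansion in Lemma A can be checked numerically. The matrices below are arbitrary choices for illustration only; the check compares the eigenvector of Â associated with the (perturbed) zero eigenvalue against the prediction R2 + R1J from (73), whose error should be Op(εn^2).

```python
import numpy as np

# population matrix: r = 3 with one zero eigenvalue (so r* = 1)
D1 = np.diag([2.0, 1.0])
A = np.diag([2.0, 1.0, 0.0])
R1 = np.eye(3)[:, :2]                       # eigenvectors of the nonzero block
R2 = np.eye(3)[:, 2:]                       # basis of the null space (M = I)

S = np.array([[0.3, 0.1, 0.4],
              [0.1, -0.2, 0.5],
              [0.4, 0.5, 0.1]])             # arbitrary symmetric perturbation
eps = 1e-3
B = eps * S                                 # ||A_hat - A|| = O(eps)
A_hat = A + B

J = -np.linalg.inv(D1) @ R1.T @ B @ R2      # Lemma A with M = I
R2_pred = R2 + R1 @ J                       # first-order expansion (73)

w, V = np.linalg.eigh(A_hat)                # eigenvalues in ascending order
v = V[:, [0]]                               # eigenvector of the smallest eigenvalue
v = v * np.sign(v[2, 0])                    # pin down the sign convention
err = np.linalg.norm(v - R2_pred)           # should be O(eps^2)
```

The residual is of order eps^2 while the first-order correction R1J itself is of order eps, which is exactly the structure the expansions (71)-(73) assert.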
Lemma B Under Assumption 1,

P′2j Ω̂(βn) P2j = γjn + Op(||Δn||^3)   (74)

Proof of Lemma B Ω̂(βn) = (1/n) Σ_{i=1}^n gi(βn) gi(βn)′. Taylor expand gi(βn) around β0, writing gi = gi(β0):

gi(βn) = gi + Gi(β̄n)Δn   (75)

where β̄n is a vector between β0 and βn. Define Ḡi := Gi(β̄n). Then

Ω̂(βn) = (1/n) Σ_{i=1}^n gi g′i + (1/n) Σ_{i=1}^n Ḡi Δn Δ′n Ḡ′i + (1/n) Σ_{i=1}^n gi Δ′n Ḡ′i + (1/n) Σ_{i=1}^n Ḡi Δn g′i   (76)

By Theorem 1, Pr(P′2j gi(β0) = 0) = 1 for all i ∈ 1, .., n. Hence (1/n) Σ_{i=1}^n (P′2j gi(β0))^2 = 0 for j = 1, .., m̄ except on a set of measure zero, so that w.p.1

P′2j Ω̂(βn) P2j = (1/n) Σ_{i=1}^n P′2j Ḡi Δn Δ′n Ḡ′i P2j   (77)

Since by A1(iv) it is straightforward to establish that

||(1/n) Σ_{i=1}^n (Ḡi Ḡ′i − Gi G′i)|| ≤ ||M|| ||β̄n − β0|| = Op(||Δn||)   (78)

and by construction ||P2j|| = 1, then since by A1(v) ||(1/n) Σ_{i=1}^n Gi(β0)|| = Op(1), it is straightforward to show that

(1/n) Σ_{i=1}^n P′2j Ḡi Δn Δ′n Ḡ′i P2j = (1/n) Σ_{i=1}^n P′2j Gi Δn Δ′n G′i P2j + Op(||Δn||^3)   (79)
Proof of Theorem 1 R(Ω) = m − m̄, where m̄ > 0 since Ω is assumed singular. Hence P′2j Ω P2j = 0 for j = 1, .., m̄, i.e.

E[(P′2j gi(β0))^2] = 0   (80)

so that P′2j gi = 0 a.s. for i = 1, .., n and all n. Hence Ω̂(β0) = (1/n) Σ_{i=1}^n gi g′i satisfies

P′2j Ω̂(β0) P2j = (1/n) Σ_{i=1}^n (P′2j gi)^2 = 0   (81)

Hence at least m̄ eigenvalues of Ω̂(β0) equal zero, i.e. λ̂2j = 0 with corresponding eigenvector P2j for j = 1, .., m̄, so that P̂2(β0) = P2 and Λ̂2(β0) = 0.
Proof of Theorem 2a Apply Lemma A with Â = Ω̂(βn), A = Ω̂(β0) and B = (Ω̂(βn) − Ω̂(β0)); since ||Ω̂(βn) − Ω̂(β0)|| ≤ Op(||βn − β0||), take εn = Op(||βn − β0||). Set R1 = P1(β0), R2 = P2(β0), R̂1 = P̂1(βn), R̂2 = P̂2(βn) and D1 = Λ1, so that J = −Λ1^{-1} P1(β0)′ (Ω̂(β0) − Ω̂(βn)) P2(β0) (setting M = I, which does not impact any asymptotic results, as noted in Lemma A).

We now show that ||J|| = Op(||Δn||): ||J|| ≤ ||Λ1^{-1}|| ||P1(β0)|| ||P2(β0)|| ||Ω̂(βn) − Ω̂(β0)|| = O(1)Op(||Δn||), since m is fixed by A1 and hence ||P1(β0)|| and ||P2(β0)|| are bounded. Then using Lemma A (column by column) it is straightforward to establish that

P̂1j(βn) = P1j(β0) + Op(||Δn||)   (82)
P̂2j(βn) = P2j(β0) + P̂1(βn) Λ̂1(βn)^{-1} P̂1(βn)′ (Ω̂(βn) − Ω̂(β0)) P2j(β0) + Op(||Δn||^2)   (83)

and since ||J|| = Op(||Δn||),

P̂2j(βn) = P2j(β0) + Op(||Δn||)   (84)

By Theorem 1(i), P2(β0) = P2, so both (83) and (84) hold replacing P2j(β0) with P2j.
Proof of Theorem 2b

sup_j |λ̂j(βn) − λj| ≤ ||Ω̂(βn) − Ω||   (85)

by an application of Theorem 4.2 of Bosq (2000). By A1(ii) and Theorem 1, ||Ω̂(βn) − Ω|| ≤ ||Ω̂(βn) − Ω̂(β0)|| + ||Ω̂(β0) − Ω|| = Op(||Δn||) + Op(n^{-1/2}). This establishes that λ̂j = λj + Op(n^{-1/2} ∨ ||Δn||) for all j = 1, .., m and hence establishes (i). We now proceed to prove (ii), which derives a second order expansion of λ̂2j(βn) around β0 for j = 1, .., m̄.

(ii)

P′2 Ω̂(βn) P2 = P′2 P̂1(βn) Λ̂1(βn) P̂1(βn)′ P2 + P′2 P̂2(βn) Λ̂2(βn) P̂2(βn)′ P2   (86)

By T2a(ii), P′2 P̂2(βn) = I_{m̄} + Op(||Δn||^2) and P̂1(βn) = Ω̂(βn) P̂1(βn) Λ̂1(βn)^{-1}, where by Theorem 2a(i) ||P̂1(βn) − P1|| = Op(n^{-1/2} ∨ ||Δn||) and by T2b(i) ||Λ̂1(βn) − Λ1|| ≤ Op(n^{-1/2} ∨ ||Δn||). Moreover P′2 Ω̂(βn) = P′2 (Ω̂(βn) − Ω̂(β0)), since Ω̂(β0)P2 = 0 by Theorem 1, where ||Ω̂(βn) − Ω̂(β0)|| = Op(||Δn||); ||P2|| = O(1) by A1 and Λ̂2(βn) ≤ Op(n^{-1/2} ∨ ||Δn||) by (85). Straightforward manipulation then demonstrates that

P′2 Ω̂(βn) P2 = Λ̂2(βn) + P′2 Ω̂(βn) P1 Λ1^{-1} P′1 Ω̂(βn) P2 + Op(||Δn||^2 (||Δn|| ∨ n^{-1/2}))   (87)

Hence for all j = 1, .., m̄,

λ̂2j(βn) = P′2j Ω̂(βn) P2j − P′2j Ω̂(βn) P1 Λ1^{-1} P′1 Ω̂(βn) P2j + Op(||Δn||^2 (||Δn|| ∨ n^{-1/2}))   (88)

By Theorem 1 and (76),

P′2j Ω̂(βn) P1 = P′2j (1/n) Σ_{i=1}^n Ḡi Δn g′i P1 + (1/n) Σ_{i=1}^n P′2j Ḡi Δn Δ′n Ḡ′i P1   (89)

By A1(iv),(v) it is straightforward to establish (similarly to the proof of Lemma B) that

P′2j Ω̂(βn) P1 = P′2j (1/n) Σ_{i=1}^n Gi Δn g′i P1 + Op(||Δn||^2)   (90)

so that

P′2j Ω̂(βn) P1 Λ1^{-1} P′1 Ω̂(βn) P2j = P′2j (1/n) Σ_{i=1}^n Gi Δn g′i P1 Λ1^{-1} (1/n) Σ_{l=1}^n P′1 gl Δ′n G′l P2j + Op(||Δn||^3)   (91)

and, by Lemma B and A1,

λ̂2j(βn) = γjn − ψjn + Op(||Δn||^2 (||Δn|| ∨ n^{-1/2}))   (92)
Proof of Theorem 3

TAR(βn) = √n g(βn)′ Ω̂(βn)^{-1} √n g(βn)
        = Σ_{j=1}^{m∗} (P̂1j(βn)′ √n g(βn))^2 λ̂1j(βn)^{-1} + Σ_{j=1}^{m̄} (P̂2j(βn)′ √n g(βn))^2 λ̂2j(βn)^{-1}   (93)

First, second order expand g(βn) around β0:

g(βn) = g(β0) + Ĝ(β0)Δn + Op(n^{-2κ})   (94)

by A1(i),(iii). By Theorem 2a(i), P̂1j(βn) = P1j + Op(n^{-1/2}), since ||Δn|| = Op(n^{-κ}) by A2(i) where κ > 1/2, and similarly by Theorem 2b(i), λ̂1j(βn) = λ1j + Op(n^{-1/2}) for all j = 1, .., m∗. By (93) and (94),

(P̂1j(βn)′ √n g(βn))^2 / λ̂1j(βn) = (P′1j √n g(β0))^2 / λ1j + Op(n^{1/2−κ})   (95)

Then by A2(iii), √n g(β0) →d N(0, Ω), hence λ1j^{-1/2} P′1j √n g(β0) →d N(0, 1) and

Σ_{j=1}^{m∗} (P̂1j(βn)′ √n g(βn))^2 / λ̂1j(βn) →d χ^2_{m∗}   (96)

In the case m = m∗ we obtain the standard result that the AR statistic is χ^2_m.

By Theorem 2b(ii),

n^{2κ} λ̂2j(βn) = n^{2κ}(γjn − ψjn) + Op(||Δn||^2 (n^{-1/2} ∨ ||Δn||))   (97)

and by A2(iv), n^{2κ}(γjn − ψjn) →p γj − ψj =: υj, so that

n^{2κ} λ̂2j(βn) →p υj   (98)

Next consider

n^κ P̂2j(βn)′ √n g(βn)   (99)

By Theorem 2b(ii),

P̂2j(βn) = P2j + P1 Λ1^{-1} P′1 Ω̂(βn) P2j + Op(||Δn|| (n^{-1/2} ∨ ||Δn||))   (100)

and using (94),

n^{κ+1/2} g(βn) = n^{κ+1/2} g(β0) + √n Ĝ(β0) n^κ Δn + Op(n^{1/2−κ})   (101)

where by A1(i) κ > 1/2. Hence substituting (100) and (101) into (99),

n^κ P̂2j(βn)′ √n g(βn) = P′2j √n Ĝ(β0) n^κ Δn − n^κ P′2j Ω̂(βn) P1 Λ1^{-1} P′1 √n g(β0) + op(1)   (102)

By A1(i), n^κ Δn →p Δ. By (90),

n^κ P′2j Ω̂(βn) P1 Λ1^{-1/2} = P′2j (1/n) Σ_{i=1}^n Gi n^κ Δn g′i P1 Λ1^{-1/2} + Op(n^{-κ})   (103)

P′2j (1/n) Σ_{i=1}^n Gi n^κ Δn g′i P1 Λ1^{-1/2} →p P′2j E[Gi Δ g′i] P1 Λ1^{-1/2} =: Ψj   (104)

By A2, ||√n g(β0)|| = Op(1), so that

n^κ P̂2j(βn)′ √n g(βn) = P′2j √n Ĝ(β0) Δ − Ψj Λ1^{-1/2} P′1 √n g(β0) + op(1)   (105)

Λ1^{-1/2} P′1 √n g(β0) →d N(0, I_{m∗})   (106)

Ψj Λ1^{-1/2} P′1 √n g(β0) →d N(0, ψj),  ψj := Ψj Ψ′j   (107)

By A2(iii), G′P2j = 0 for all j = 1, .., m̄, hence

√n Ĝ(β0)′ P2j = √n (Ĝ(β0) − G)′ P2j →d N(0, Φj)   (108)

where Φj := E[G′i P2j P′2j Gi]. Then by Slutsky's Theorem and (108),

P′2j √n Ĝ(β0) Δ →d N(0, Δ′ Φj Δ)   (109)

and

Δ′ Φj Δ = tr(E[G′i P2j P′2j Gi] Δ Δ′) = P′2j E[Gi Δ Δ′ G′i] P2j = γj   (110)

Since the asymptotic covariance between P′2j √n Ĝ(β0) Δ and Ψj Λ1^{-1/2} P′1 √n g(β0) is ψj,   (111)

n^κ P̂2j(βn)′ √n g(βn) →d N(0, γj − ψj)   (112)

Then, by Theorem 2b(ii) and A3, n^{2κ} λ̂2j(βn) →p υj = γj − ψj, so that

Σ_{j=1}^{m̄} ( n^κ P̂2j(βn)′ √n g(βn) / (n^{2κ} λ̂2j(βn))^{1/2} )^2 →d χ^2_{m̄}   (113)

and combining (96) and (113), TAR(βn) →d χ^2_m.
Proof of Theorem 4 Since Δn = β̂ − β0 for 2-Step GMM, by A3(i) ||Δn|| = Op(n^{-1/2}), so that by T2b(ii)

λ̂2j(β̂) = υjn + Op(n^{-3/2})   (114)

where

υjn = P′2j [ (1/n) Σ_{i=1}^n Gi Δn Δ′n G′i − (1/n^2) Σ_{i=1}^n Σ_{l=1}^n Gi Δn Δ′n G′l (g′i P1 Λ1^{-1} P′1 gl) ] P2j

Hence

n λ̂2j(β̂) = tr( [ (1/n) Σ_{i=1}^n G′i P2j P′2j Gi − (1/n^2) Σ_{i=1}^n Σ_{l=1}^n G′i P2j P′2j Gl (g′i P1 Λ1^{-1} P′1 gl) ] n Δn Δ′n ) + Op(n^{-1/2})   (115)

By A3(ii),

Υ′ [ (1/n) Σ_{i=1}^n G′i P2j P′2j Gi − (1/n^2) Σ_{i=1}^n Σ_{l=1}^n G′i P2j P′2j Gl (g′i P1 Λ1^{-1} P′1 gl) ] Υ := Πjn →p Πj   (116)

Since Δn = β̂ − β0, by A3(i) and A2(ii),

n^{1/2} Δn →d Σ N(0, Ω) =d Σ P1 Λ1^{1/2} Z   (117)

where Z ~ N(0, I_{m∗}) and Υ := Σ P1 Λ1^{1/2}. Hence

n λ̂2j(β̂) →d tr(Πj Z Z′)   (118)

where Z Z′ =: W is an m∗ × m∗ matrix with a standard Wishart distribution.
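The non-standard limit tr(Πj Z Z′) in (118) is straightforward to simulate once Πj is available. In the sketch below Πj is an arbitrary positive semi-definite 2×2 stand-in for illustration; the draws confirm the known moments of a quadratic form in standard normals.

```python
import numpy as np

rng = np.random.default_rng(4)
Pi = np.array([[1.0, 0.3],
               [0.3, 0.5]])                 # stand-in for Pi_j, with m* = 2
sims = 200_000

Z = rng.standard_normal((sims, 2))          # draws of Z ~ N(0, I_{m*})
# tr(Pi Z Z') = Z' Pi Z: a weighted sum of chi-squared(1) variables
T = np.einsum('si,ij,sj->s', Z, Pi, Z)

mean_T = T.mean()                           # should approach tr(Pi)
var_T = T.var()                             # should approach 2 tr(Pi Pi)
```

Quantiles of the simulated T are exactly what is needed to compare against the small-sample distribution of nλ̂ in Figs 4-5.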
Proof of Theorem 5

T2S(β0) = n g(β0)′ Ω̂(β̂)^{-1} g(β0)
        = Σ_{j=1}^{m∗} (P̂1j(β̂)′ √n g(β0))^2 λ̂1j(β̂)^{-1} + Σ_{j=1}^{m̄} (P̂2j(β̂)′ √n g(β0))^2 λ̂2j(β̂)^{-1}   (119)

Define Zh as the standard normal variable such that λ1h^{-1/2} P′1h √n g(β0) →d Zh for h = 1, .., m∗, where Zi and Zj are independent for all i ≠ j. Since by A3(ii) β̂ − β0 = Op(n^{-1/2}), then using Theorem 2, P̂1j(β̂) →p P1j and λ̂1j(β̂) →p λ1j, and by arguments similar to the proof of Theorem 3,

Σ_{j=1}^{m∗} (P̂1j(β̂)′ √n g(β0))^2 λ̂1j(β̂)^{-1} →d Σ_{h=1}^{m∗} Z^2_h   (120)

Again, if m̄ = 0 (i.e. m = m∗) then T2S(β0) has as its limit distribution the sum of m independent chi squared random variables, and standard asymptotics follow. When m̄ > 0 the second sum in (119) in general does not have a χ^2_{m̄} limit distribution. Using Theorem 4 and the CMT,

(n λ̂2j(β̂))^{-1} →d tr(Πj Z Z′)^{-1}   (121)

We next derive the limit of n P̂2j(β̂)′ g(β0). An application of Theorem 2b(ii) and A3(i) gives

P̂2j(β̂) = P2j + P1 Λ1^{-1} P′1 Ω̂(β̂) P2j + Op(n^{-1})   (122)

where, by a similar argument to the proof of Theorem 3, now with Δn = β̂ − β0,

P′2j Ω̂(β̂) P1 = (1/n) Σ_{i=1}^n P′2j Gi Δn g′i P1 + Op(n^{-1})   (123)

Since P′2j g(β0) = 0 w.p.1 by Theorem 1(i),

n P̂2j(β̂)′ g(β0) = P′2j (1/n) Σ_{i=1}^n Gi √n(β̂ − β0) g′i P1 Λ1^{-1} P′1 √n g(β0) + Op(n^{-1/2})   (124)

By straightforward matrix manipulation we can re-express the leading term of (124) as

P′2j (1/n) Σ_{i=1}^n Gi √n(β̂ − β0) g′i P1 Λ1^{-1} P′1 √n g(β0)
= Σ_{h=1}^{m∗} [ (1/n) Σ_{i=1}^n P′2j Gi √n(β̂ − β0) g′i P1h λ1h^{-1/2} ] [ λ1h^{-1/2} P′1h √n g(β0) ]   (125)

For any h = 1, .., m∗,

(1/n) Σ_{i=1}^n P′2j Gi √n(β̂ − β0) g′i P1h λ1h^{-1/2} = tr( λ1h^{-1/2} (1/n) Σ_{i=1}^n g′i P1h P′2j Gi √n(β̂ − β0) )   (126)

where by A3(i), √n(β̂ − β0) = Σ √n g(β0) + op(1), so that, with Υ := Σ P1 Λ1^{1/2}, by A3(iii)

√n(β̂ − β0) →d Υ Z   (127)

and

λ1h^{-1/2} (1/n) Σ_{i=1}^n g′i P1h P′2j Gi Υ := Θhjn →p Θhj   (128)

where Θhj := (θhj1, .., θhjm∗). Hence

(1/n) Σ_{i=1}^n P′2j Gi √n(β̂ − β0) g′i P1h λ1h^{-1/2} →d Θhj Z = Σ_{l=1}^{m∗} θhjl Zl   (129)

and since

λ1h^{-1/2} P′1h √n g(β0) →d Zh   (130)

then by (129) and (130), which hold for all j = 1, .., m̄ and h = 1, .., m∗,

n P̂2j(β̂)′ g(β0) →d Σ_{h=1}^{m∗} Σ_{l=1}^{m∗} θhjl Zl Zh   (131)

So that by the CMT,

(n P̂2j(β̂)′ g(β0))^2 / (n λ̂2j(β̂)) →d ( Σ_{h=1}^{m∗} Σ_{l=1}^{m∗} θhjl Zl Zh )^2 / tr(Πj Z Z′)   (132)

Substituting (120), (121) and (132) into (119),

T2S(β0) →d Σ_{k=1}^{m∗} Z^2_k + Σ_{j=1}^{m̄} ( Σ_{h=1}^{m∗} Σ_{l=1}^{m∗} θhjl Zl Zh )^2 / tr(Πj Z Z′)   (133)
References
[1] Andrews, D.W.K. (1987). Asymptotic Results for Generalized Wald Tests, Econometric Theory, vol. 3(3), pages 348-358.
[2] Bathia, N. , Yao, Q. and Ziegelmann, F. (2010). Identifying the finite dimen-
sionality of curve time series, Ann. Statist. Volume 38, Number 6 (2010), 3352-3386.
[3] Bosq, D. (2000). Linear Processes in Function Spaces, New York: Springer- Verlag.
[4] Grant, N. (2012). GMM with Weakly Singular Variance (work in progress).
[5] Hansen, L.P. (1982). Large Sample Properties of Generalized Method of Moments Estimators, Econometrica, vol. 50(4), pages 1029-1054.
[6] Jagannathan, R. and Wang, G. (2002). Generalized Method of Moments: Appli-
cations in Finance, Journal of Business & Economic Statistics, American Statistical
Association, vol. 20(4), pages 470-81, October.
[7] Kato, T. (1982). A Short Introduction to Perturbation Theory for Linear Operators, New York: Springer-Verlag.
[8] Kleibergen, F. (2005). Testing Parameters in GMM Without Assuming that They
Are Identified, Econometrica, Econometric Society, vol. 73(4), pages 1103-1123, 07.
[9] Kleibergen, F. (2011). Improved accuracy of weak instrument robust GMM statistics
through bootstrap and Edgeworth approximations, working paper.
[10] Newey, W. and McFadden,D. (1994). Large Sample Estimation and Hypothesis
Testing,Handbook of Econometrics, Vol.4, 2111-2245.
[11] Newey, W.K. and Windmeijer, F. (2009). Generalized Method of Moments With Many Weak Moment Conditions, Econometrica, vol. 77(3), pages 687-719.
[12] Penaranda, F. and Sentana, E. (2010). Spanning tests in return and stochastic
discount factor mean-variance frontiers: A unifying approach, Economics Working Pa-
pers 1101, Department of Economics and Business, Universitat Pompeu Fabra, revised
Sep 2010.
[13] Renault, E. and Donovon, P. (2009). GMM Overidentification Test with First
Order Underidentification, working paper (2009).
[14] Sargan, J. D. (1983). Identification and Lack of Identification, Econometrica,
Econometric Society, vol. 51(6), pages 1605-33, November
[15] Staiger, D. and Stock, J.H. (1997). Instrumental Variables Regression with Weak
Instruments, Econometrica, Econometric Society, vol. 65(3):557-586.
[16] Stock, J.H. and Wright, J. (2000). GMM with Weak Identification,Econometrica,
Econometric Society, vol. 68(5), pages 1055-1096, September.