
Asymptotic Distributions of Quasi-Maximum Likelihood Estimators

for Spatial Econometric Models I: Spatial Autoregressive Processes

by Lung-fei Lee*

Department of Economics

The Ohio State University

Columbus, Ohio

August 1999; March 2001

Abstract

Asymptotic properties of MLEs and QMLEs of spatial autoregressive processes are investigated. The stochastic rate of convergence of the MLE and QMLE for a spatial autoregressive process may be less than the $\sqrt{n}$-rate under some circumstances even though its limiting distribution is asymptotically normal. Implications of the possible low rate of convergence of the estimators for classical statistics such as the likelihood ratio, Wald, and efficient score statistics are analyzed.

Key Words: spatial autoregressive process, spatial autoregression, maximum likelihood estimator, quasi-maximum likelihood estimator, moment estimator, rate of convergence, asymptotic distribution, test statistics.

Correspondence Address: Lung-fei Lee, Department of Economics, The Ohio State University, 410 Arps Hall, 1945 N. High St., Columbus, OH 43210-1172.

E-mail: lflee@econ.ohio-state.edu

* This paper is a revised and expanded version of the first part of a paper previously circulated under the title "Asymptotic Distributions of Maximum Likelihood Estimators for Spatial Autoregressive Models." The earlier version of the paper was presented in seminars at HKUST, NWU, OSU, Princeton U., PSU, U. of Florida, U. of Illinois, and USC. I appreciate comments from participants of those seminars.


1. Introduction

Spatial econometrics consists of econometric techniques for dealing with empirical economic problems caused by spatial autocorrelation in cross-sectional and/or panel data. Issues of spatial dependence are receiving more attention in both empirical and theoretical econometrics. Possible dependence across spatial units is a relevant issue in a range of economic fields including urban, real estate, regional, public, agricultural, and environmental economics as well as industrial organization studies (see, e.g., the recent survey by Anselin and Bera, 1998, for references). There are a few books dealing with spatial econometrics issues (Cliff and Ord, 1973; Paelinck and Klaassen, 1979; Anselin, 1988; Cressie, 1993) and several collections of spatial econometrics papers published in special issues of journals for regional and urban economics (Anselin, 1992; Anselin and Florax, 1995; Anselin and Rey, 1997).

To capture spatial dependence among cross-sectional units, the approach in spatial econometrics is to impose structure on the specification of a model. Spatial autoregressive models were first popularized by Whittle (1954), Mead (1967) and Ord (1975). While spatial autoregression extends autocorrelation in time series to spatial dimensions, the spatial aspect of a model has its own distinctive features that are not present in the time domain. The spatial autocorrelation of Whittle (1954) has the distinguishing feature of simultaneity in econometric equilibrium models.

The autoregressive model can be estimated by the method of maximum likelihood (ML). Ord (1975) discussed a computationally effective procedure to implement the ML method. As an alternative, Kelejian and Prucha (1999a) have suggested a generalized moments estimation method. In this paper, we investigate the possible rate of convergence and the asymptotic distribution of the maximum likelihood estimator (MLE) and the quasi-maximum likelihood estimator (QMLE) for the first-order spatial autoregressive model. The MLE is the estimator for the case where the disturbances are normally distributed. The QMLE is appropriate when the estimator is derived from a normal likelihood but the disturbances are not truly normally distributed. In the existing literature, the MLE of such a model is implicitly regarded as having the familiar $\sqrt{n}$-rate of convergence, like the usual MLE for a nonlinear parametric statistical model (see, e.g., the reviews by Anselin, 1988, and Anselin and Bera, 1998). Our investigation reported below provides a broader view of the asymptotic properties of the MLE and the QMLE. It shows that the rates of convergence of the MLE and QMLE may depend on some general features of the spatial weights matrix of the model. The MLE and QMLE may have a $\sqrt{n}$-rate of convergence and their limiting distributions are normal. But, under some circumstances, the estimators may have a low rate of convergence for some of the model's parameters and may even be inconsistent. Possible implications for inferences are discussed.

This paper is organized as follows. In Section 2, the spatial autoregressive model is presented and regularity conditions for the model are specified. In Section 3, identification of the parameters in the model is investigated, and consistency of the QMLE and MLE is established. In Section 4, we investigate the rates of convergence and the asymptotic distributions of the estimators. All the proofs of the results are presented in the appendices. Section 5 concludes the paper. Appendix A collects some useful basic lemmas and propositions. Proofs of theorems are in Appendix B. Appendix C provides brief descriptions of some computationally simpler estimators.

2. The Spatial Autoregressive Model

The first-order spatial autoregressive model is specified as

$$Y_n = \lambda W_{n,n} Y_n + V_n, \tag{2.1}$$

where $n$ is the total number of cross-sectional units, $Y_n$ is an $n$-dimensional vector of sample observations of the dependent variable, $W_{n,n}$ is an $n \times n$ matrix of constants known as spatial weights, $\lambda$ is a scalar unknown coefficient with $|\lambda| < 1$, and $V_n$ is an $n$-dimensional vector of i.i.d. disturbances with zero mean and a constant variance $\sigma^2$. This and related models originated from Whittle (1954), Mead (1967), Cliff and Ord (1973), and Ord (1975). For the $i$th cross-sectional unit, this model specifies that

$$y_{ni} = \lambda w_{i,n} Y_n + v_i, \tag{2.1'}$$

where $y_{ni}$ and $v_i$ are, respectively, the $i$th elements of $Y_n$ and $V_n$, and $w_{i,n}$ is the $i$th row of $W_{n,n}$. In geographic analysis, weights are functions of distance. In sociometrics, the weights reflect the social networks of individuals (Doreian, 1980). In economic applications, the weights may be based on economic distance such as income or racial composition (Case et al., 1993). As a convention, it is understood that $w_{n,ii} = 0$ for all $i$, where $w_{n,ij}$ is the $(i,j)$ entry of $W_{n,n}$.

The parameter vector $\theta = (\lambda, \sigma^2)$ in (2.1) can be estimated by various methods. The popular estimation method is the method of maximum likelihood (ML) as suggested by Ord (1975). The log likelihood function of the model under normal disturbances is

$$\ln L_n(\theta) = -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln\sigma^2 + \ln|I_n - \lambda W_{n,n}| - \frac{1}{2\sigma^2} Y_n'(I_n - \lambda W_{n,n})'(I_n - \lambda W_{n,n})Y_n. \tag{2.2}$$


Alternatively, a method of moments has been suggested by Kelejian and Prucha (1999a). In this paper, we concentrate on the maximum likelihood estimator (MLE) and investigate its asymptotic distribution.¹ The method of moments will be touched upon briefly in an appendix. When the $v_i$'s are not truly normally distributed, the MLE is a quasi-maximum likelihood estimator (QMLE). To allow for the latter possibility, we investigate the asymptotic properties of the QMLE. The QMLE reduces to the true MLE when the $v_i$'s are truly normally distributed.

To provide a rigorous analysis of the QMLE or MLE, we assume the following regularity conditions for the model.

Assumption 1. The disturbances $v_i$, $i = 1, \cdots, n$, are i.i.d. with mean zero and variance $\sigma^2$. The moment $E(|v|^{4+2\delta})$ exists for some $\delta > 0$.

Assumption 2. The elements $w_{n,ij}$ of the weights matrix $W_{n,n}$ are of order $O(\frac{1}{h_n})$ uniformly in all $i, j$, i.e., there exists a constant $c$ such that $|w_{n,ij}| \le \frac{c}{h_n}$ for all $i, j$ and $n$, where the sequence of rates $\{h_n\}$ can be bounded or divergent.

Assumption 3. The ratio $\frac{h_n}{n} \to 0$ as $n$ goes to infinity.

Assumptions 2 and 3 describe how the elements of $W_{n,n}$ are related to the sample size $n$.² Such features play important roles in the asymptotic properties of an estimator. Assumption 3 is always satisfied if $\{h_n\}$ is a bounded sequence. Assumptions 2 and 3 are motivated by the design of a row-normalized weights matrix. In empirical applications, it is a common practice to have the spatial weights matrix $W_{n,n}$ row-normalized (or row-standardized, Haining, 1990, p. 82) such that

$$w_{i,n} = [d_{i1}, d_{i2}, \dots, d_{in}] \Big/ \sum_{j=1}^{n} d_{ij}, \tag{2.3}$$

where $d_{ij}$ represents a function of the spatial distance between the $i$th and $j$th units in some (characteristic) space. The reason weights matrices are row-normalized is to ensure that all weights are between 0 and 1 and weighting operations can be interpreted as an average of neighboring values (e.g., Anselin and Rey, 1991; Kelejian and Robinson, 1993). The largest eigenvalue of a row-normalized weights matrix is always 1, which is often claimed to facilitate the interpretation of the autoregressive coefficient $\lambda$ (Ord, 1975; Anselin and Bera, 1998, p. 243). For a weights matrix row-normalized according to (2.3), if the $d_{ij}$ are nonnegative constants and uniformly bounded, and the sums $\sum_{j=1}^{n} d_{ij}$, $i = 1, \cdots, n$, are uniformly bounded away from zero at the rate $h_n$, the implied normalized weights matrix will have the property described in Assumption 2. Assumption 3 leads us to focus on the cases where $\sum_{j=1}^{n} d_{ij}$, $i = 1, \cdots, n$, do not diverge to infinity at a rate equal to or faster than the rate of the sample size $n$. Assumptions 2 and 3 are, however, more general in that they cover spatial weights matrices whose elements are not restricted to be nonnegative, as well as weights matrices which might not be row-normalized. The asymptotic properties of the QMLE established below are thus valid within a broad class of model frameworks.

¹ There are interesting computational issues that will not be addressed in this paper. Interested readers may consult Ord (1975) and find recent developments in Smirnov and Anselin (1999).

² Manski (1993) criticized the literature on spatial models for not having specified how the weights $w_{n,ij}$ should change with $n$ (footnote 7 in Manski). Our assumptions make this explicit.
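A small sketch of the row-normalization (2.3), assuming a nonnegative matrix of distance terms in which every unit has at least one neighbor (so every row sum is positive); `row_normalize` is an illustrative name.

```python
import numpy as np

def row_normalize(D):
    """Row-normalize a nonnegative matrix of distance terms d_ij as in (2.3)."""
    D = np.array(D, dtype=float)
    np.fill_diagonal(D, 0.0)                    # convention: w_{n,ii} = 0
    row_sums = D.sum(axis=1, keepdims=True)     # sum_j d_ij, assumed positive
    return D / row_sums                         # entries in [0, 1]; each row sums to one
```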

The exact rate $h_n$ in Assumption 2 will depend on the specified weights matrix. For a sample with $n$ cross-sectional units, one has to imagine how a single unit may influence the other available cross-sectional units. For example, in an empirical study of states' expenditures in Case et al. (1993), several weights matrices were tried. The simplest weights matrix specifies that $d_{ij} = 1/s_i$, where $s_i$ is the number of borders state $i$ shares, if states $i$ and $j$ share a border. With this specification, it is reasonable to argue that $h_n$ would be bounded even if more states could be available. An important case where $h_n$ might diverge to infinity while satisfying Assumptions 2 and 3 is that in Case (1991, 1992). Case (1992) studied the possible spatial effect of neighbors on a farmer's adoption of new technologies. In her study, neighbors refer to farmers who live in the same district (in rural Java, Indonesia). This characterization of the neighbors makes the $W_{n,n}$ matrix block diagonal. The only non-zero elements appear as a block of households in the same district. Suppose that there are $r$ districts and there are $m$ farmers in each district (for simplicity). The sample size is $n = mr$. Case assumed that, in a district, each neighbor of a farmer is given equal weight. In that case, $W_{n,n} = I_r \otimes B_{m,m}$, where $\otimes$ is the Kronecker product, is a symmetric block diagonal matrix with $B_{m,m} = \frac{1}{m-1}(l_m l_m' - I_m)$, where $l_m$ is an $m$-dimensional column vector of ones. For this example, $h_n = (m-1)$ and $\frac{h_n}{n} = (\frac{m-1}{m}) \cdot \frac{1}{r} = O(\frac{1}{r})$. If the increase of the sample size $n$ is generated by the increase of both $r$ and $m$, then $h_n$ goes to infinity and $\frac{h_n}{n}$ goes to zero as $n$ tends to infinity. In another study by Case (1991) on a consumer demand problem, a similar spatial weights matrix is utilized. Assumption 3, however, rules out the cases with $h_n$ equal to or larger than $n$ for the spatial autoregressive process, as the MLE would likely be inconsistent for such cases.³ Examples of the latter will be provided.
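The block-diagonal design of Case (1991, 1992) is easy to reproduce; a sketch, with `case_weights` an illustrative name:

```python
import numpy as np

def case_weights(r, m):
    """W_{n,n} = I_r kron B_{m,m}, B_{m,m} = (l_m l_m' - I_m)/(m-1): each of the
    other m-1 farmers in the same district gets equal weight, so h_n = m - 1."""
    B = (np.ones((m, m)) - np.eye(m)) / (m - 1)
    return np.kron(np.eye(r), B)

W = case_weights(r=30, m=20)                 # n = 600, h_n = 19
print(W.shape[0], (20 - 1) / W.shape[0])     # h_n/n = (m-1)/(mr) -> 0 as r grows
```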

Assumption 4. At the true parameter $\lambda_0$, $S_n = I_n - \lambda_0 W_{n,n}$, where $I_n$ is the $n \times n$ identity matrix, is nonsingular.

³ Note that the rate $h_n$ cannot be faster than $n$ when the $d_{ij}$ are uniformly bounded in (2.3).


Assumption 4 guarantees that the system (2.1) has an equilibrium and

$$Y_n = S_n^{-1} V_n, \tag{2.4}$$

which has zero mean and variance $\sigma_0^2 S_n^{-1} S_n'^{-1}$, where $\sigma_0^2$ denotes the true variance parameter of $v_i$. A sufficient condition for Assumption 4 is $\|\lambda_0 W_{n,n}\| < 1$ for some matrix norm $\|\cdot\|$ (Horn and Johnson, 1985, p. 301). If $W_{n,n}$ consists of nonnegative entries and is row-normalized, a sufficient condition for $S_n(\lambda)$ to be invertible is that $|\lambda| < 1$ (Horn and Johnson, 1985). With the elements of $W_{n,n}$ normalized as in (2.3), if $d_{ij} = d_{ji}$ for all $i, j$, Ord (1975) showed that the eigenvalues of $W_{n,n}$ are all real. For the latter, assuming that the largest eigenvalue $\omega_{\max}$ of $W_{n,n}$ is positive and the smallest eigenvalue $\omega_{\min}$ is negative, a sufficient condition for the invertibility of $S_n$ is $\frac{1}{\omega_{\min}} < \lambda_0 < \frac{1}{\omega_{\max}}$. With the $d_{ij}$ being nonnegative, it is known that $\omega_{\max} = 1$ and the sufficient condition becomes $\frac{1}{\omega_{\min}} < \lambda_0 < 1$.

Definition: Let $\{A_n\}$ be a sequence of square $n \times n$ matrices, where $A_n = (a_{n,ij})$.

1) The column sums of $A_n$ are uniformly bounded (in absolute value) if there exists a finite constant $c$ that does not depend on $n$ such that $\max_{1 \le j \le n} \sum_{i=1}^{n} |a_{n,ij}| \le c$.

2) The row sums of $A_n$ are uniformly bounded (in absolute value) if there exists a finite constant $c$ that does not depend on $n$ such that $\max_{1 \le i \le n} \sum_{j=1}^{n} |a_{n,ij}| \le c$.

The preceding notions of uniform boundedness can be defined in terms of some matrix norms. The maximum column sum matrix norm $\|\cdot\|_1$ of a square $n \times n$ matrix $A = (a_{ij})$ is defined as $\|A_n\|_1 = \max_{1 \le j \le n} \sum_{i=1}^{n} |a_{ij}|$, and the maximum row sum matrix norm $\|\cdot\|_\infty$ is $\|A_n\|_\infty = \max_{1 \le i \le n} \sum_{j=1}^{n} |a_{ij}|$ (see Horn and Johnson (1985), pp. 294-295). The uniform boundedness of $A_n$ in column (resp. row) sums is equivalent to the sequence $\{\|A_n\|_1\}$ (resp. $\{\|A_n\|_\infty\}$) being bounded.

The uniform boundedness of row and column sums plays an important role in the asymptotic properties of estimators of spatial econometric models. It limits the correlation of the $y_{ni}$ across different units to a manageable degree (Kelejian and Prucha, 1998 and 1999a). We follow Kelejian and Prucha (1998) in imposing the following assumption on the model.⁴

Assumption 5. The matrices $W_{n,n}$ and $S_n^{-1}$ are uniformly bounded in both row and column sums.

When $W_{n,n}$ is row-normalized as in (2.3) with nonnegative $d_{ij}$, $\|W_{n,n}\|_\infty = 1$ and, hence, its row sums must be uniformly bounded. Furthermore, if $|\lambda_0| < 1$, $S_n^{-1}$ can be expanded into a convergent series $\sum_{i=0}^{\infty} \lambda_0^i W_{n,n}^i$ (Horn and Johnson, 1985). From the series representation and $W_{n,n}$ being row-normalized, $S_n^{-1}$ must be uniformly bounded in row sums. For a symmetric matrix, uniform boundedness in row sums is equivalent to uniform boundedness in column sums. When $W_{n,n}$ is symmetric, $S_n$ will be symmetric. Thus, for a row-normalized symmetric $W_{n,n}$ with nonnegative elements, $W_{n,n}$ and $S_n$ with $|\lambda_0| < 1$ will satisfy Assumption 5. This is so for the spatial weights matrices in Case (1991, 1992). Effective restrictions from Assumption 5 are imposed on the column sums of $W_{n,n}$ and $S_n^{-1}$ if they are not symmetric matrices. A $W_{n,n}$ with (2.3) can be uniformly bounded in column sums under some conditions on the $d_{ij}$, as stated in Lemma A.1 of Appendix A. The uniform boundedness of the column sums of $S_n^{-1}$ may impose restrictions on $\lambda_0$ as well as on $W_{n,n}$. Lemma A.2 of Appendix A shows that, for any weights matrix, $\|\lambda_0 W_{n,n}\|_1 < 1$ and $\|\lambda_0 W_{n,n}\|_\infty < 1$ for all $n$ are sufficient conditions for $S_n^{-1}$ to be uniformly bounded in both row and column sums. Because a matrix norm $\|\cdot\|$ has the submultiplicative property that $\|A_n B_n\| \le \|A_n\| \cdot \|B_n\|$, it is immediate that if $A_n$ and $B_n$ are matrices uniformly bounded in row sums, $A_n B_n$ is uniformly bounded in row sums; and if $A_n$ and $B_n$ are uniformly bounded in column sums, $A_n B_n$ is uniformly bounded in column sums. In our subsequent analysis, products of matrices such as $W_{n,n}' W_{n,n}$ and $S_n'^{-1} S_n^{-1}$, etc., appear. Assumption 5 guarantees that those products of matrices will be uniformly bounded in row and column sums.

⁴ Related conditions have also been adopted in Pinkse (1999) in a different context.

For any $\lambda$, denote $S_n(\lambda) = I_n - \lambda W_{n,n}$. Then $S_n$ corresponds to $S_n(\lambda_0)$. Assumption 5 imposes the uniform boundedness conditions on $S_n^{-1}(\lambda)$ only at the point $\lambda = \lambda_0$. But, as shown in Lemma A.3 of Appendix A, this is sufficient to guarantee that the $S_n^{-1}(\lambda)$ are uniformly bounded in row and column sums, uniformly in a neighborhood of $\lambda_0$. Indeed, if $\|W_{n,n}\|_1 \le 1$ and $\|W_{n,n}\|_\infty \le 1$ for all $n$, Lemma A.4 shows that $S_n^{-1}(\lambda)$ will be uniformly bounded in row and column sums, uniformly in any closed subset of $(-1, 1)$. These uniform properties in $\lambda$ are useful in handling the nonlinearity of $S_n^{-1}(\lambda)$ in the analysis of the (quasi-) likelihood function of the model.

3. Identification and Consistency

The log likelihood function of this model is

$$\ln L_n(\theta) = -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln\sigma^2 + \ln|S_n(\lambda)| - \frac{1}{2\sigma^2} Y_n' S_n'(\lambda) S_n(\lambda) Y_n. \tag{3.1}$$

As is common practice, $E(\ln L_n(\theta))$ refers to the expectation of $\ln L_n(\theta)$ with respect to the true distribution of $Y_n$. When $L_n(\theta)$ in (3.1) is a quasi-likelihood, $E^*(\ln L_n(\theta))$ shall denote the expectation of $\ln L_n(\theta)$ with respect to the normal distribution with mean 0 and variance $\sigma_0^2 S_n^{-1} S_n'^{-1}$, to distinguish it from the true expectation. However, the value of $E(\ln L_n(\theta))$ is invariant with respect to any distribution of $V_n$ with $E(V_n V_n') = \sigma_0^2 I_n$. Thus, $E(\ln L_n(\theta)) = E^*(\ln L_n(\theta))$ when the variance of $V_n$ is correctly specified, and there is no need to distinguish $E^*(\ln L_n(\theta))$ from $E(\ln L_n(\theta))$ in our subsequent analysis.

Under Assumptions 1-5, the following theorem shows that $\frac{1}{n}\ln L_n(\theta) - E(\frac{1}{n}\ln L_n(\theta))$ converges to zero in probability uniformly in a compact parameter space $\Theta$ of $\theta$.

Theorem 1. Under Assumptions 1-5, $\sup_{\theta\in\Theta}\big|\frac{1}{n}\ln L_n(\theta) - E(\frac{1}{n}\ln L_n(\theta))\big| \xrightarrow{p} 0$.

The consistency of the QMLE $\hat\theta_n$ would follow from the uniform convergence if the identified uniqueness assumption (White, 1994, Theorem 3.4) were satisfied. For each $n$, Jensen's inequality guarantees that $E(\ln L_n(\theta))$ is uniquely maximized at the true parameter vector $\theta_0 = (\lambda_0, \sigma_0^2)$. For identification, a sufficient condition, called the identifiable uniqueness condition in White (1994, p. 28), is needed to guarantee the uniqueness of the maximizer in the limiting process as $n$ goes to infinity. For any $\epsilon > 0$, let $N_\epsilon(\theta_0)$ be an open ball of $\theta_0$ with radius $\epsilon$ and $\bar{N}_\epsilon(\theta_0)$ denote the complement of $N_\epsilon(\theta_0)$ contained in $\Theta$. The identifiable uniqueness assumption is

$$\limsup_{n\to\infty}\Big[\max_{\theta\in\bar{N}_\epsilon(\theta_0)} E\Big(\frac{1}{n}\ln L_n(\theta)\Big) - E\Big(\frac{1}{n}\ln L_n(\theta_0)\Big)\Big] < 0. \tag{3.2}$$

For some cases, the identifiable uniqueness condition might not be easily verifiable. For the cases with $\lim_{n\to\infty} h_n = \infty$, $E(\frac{1}{n}\ln L_n(\theta)) - E(\frac{1}{n}\ln L_n(\theta_0))$ can be very flat with respect to the component $\lambda$ for large $n$. This can be seen from the partial derivatives or from the function itself. Consider specifically the model with a row-normalized nonnegative weights matrix as in (2.3), where $S_n^{-1}(\lambda)$ will be uniformly bounded in row sums, uniformly in $\lambda$ in any closed subset of $(-1, 1)$, from Lemma A.4. As

$$E\Big(\frac{1}{n}\ln L_n(\theta)\Big) - E\Big(\frac{1}{n}\ln L_n(\theta_0)\Big) = \frac{1}{2}(1 + \ln\sigma_0^2 - \ln\sigma^2) + \frac{1}{n}(\ln|S_n(\lambda)| - \ln|S_n(\lambda_0)|) - \frac{\sigma_0^2}{2\sigma^2 n}\mathrm{tr}(S_n'^{-1} S_n'(\lambda) S_n(\lambda) S_n^{-1}) \tag{3.3}$$

and $\frac{\partial\ln|S_n(\lambda)|}{\partial\lambda} = -\mathrm{tr}(W_{n,n} S_n^{-1}(\lambda))$ (see, e.g., Dhrymes, 1978, Corollary 30), one has

$$\frac{\partial}{\partial\lambda}\Big[E\Big(\frac{1}{n}\ln L_n(\theta)\Big) - E\Big(\frac{1}{n}\ln L_n(\theta_0)\Big)\Big] = -\frac{1}{n}\mathrm{tr}(W_{n,n} S_n^{-1}(\lambda)) + \frac{\sigma_0^2}{\sigma^2 n}\mathrm{tr}(S_n'^{-1} W_{n,n}' S_n(\lambda) S_n^{-1}) = O\Big(\frac{1}{h_n}\Big)$$

uniformly in $\lambda$ in any bounded subset of $(-1, 1)$, by Lemmas A.4 and A.9. Thus, if $\lim_{n\to\infty} h_n = \infty$, this derivative at any $\lambda \in (-1, 1)$ goes to zero. Alternatively, denote $\sigma_n^2(\lambda) = \frac{\sigma_0^2}{n}\mathrm{tr}(S_n'^{-1} S_n'(\lambda) S_n(\lambda) S_n^{-1})$. The function (3.3) can be rewritten as

$$E\Big(\frac{1}{n}\ln L_n(\theta)\Big) - E\Big(\frac{1}{n}\ln L_n(\theta_0)\Big) = \frac{1}{2}(\ln\sigma_0^2 - \ln\sigma^2) + \frac{\sigma^2 - \sigma_0^2}{2\sigma^2} + \frac{1}{n}(\ln|S_n(\lambda)| - \ln|S_n(\lambda_0)|) - \frac{\sigma_n^2(\lambda) - \sigma_0^2}{2\sigma^2}. \tag{3.3'}$$


Denote $G_n = W_{n,n} S_n^{-1}$. Because $\sigma_n^2(\lambda)$ is a quadratic function of $\lambda$,

$$\sigma_n^2(\lambda) = \sigma_0^2\Big[1 - 2(\lambda - \lambda_0)\frac{\mathrm{tr}(G_n)}{n} + (\lambda - \lambda_0)^2\frac{\mathrm{tr}(G_n' G_n)}{n}\Big]. \tag{3.4}$$

As $\mathrm{tr}(G_n) = O(\frac{n}{h_n})$ and $\mathrm{tr}(G_n' G_n) = O(\frac{n}{h_n})$ by Lemma A.9, $\sigma_n^2(\lambda) - \sigma_0^2 = O(\frac{1}{h_n})$. By the mean value theorem,

$$\frac{1}{n}(\ln|S_n(\lambda)| - \ln|S_n(\lambda_0)|) = \frac{1}{n}\frac{\partial\ln|S_n(\bar\lambda_n)|}{\partial\lambda}(\lambda - \lambda_0) = -\frac{1}{n}\mathrm{tr}(W_{n,n} S_n^{-1}(\bar\lambda_n))(\lambda - \lambda_0) = O\Big(\frac{1}{h_n}\Big),$$

because $\mathrm{tr}(W_{n,n} S_n^{-1}(\lambda)) = O(\frac{n}{h_n})$ uniformly in $\lambda \in (-1, 1)$ by Lemmas A.4 and A.9. Thus, if $h_n$ goes to infinity,

$$\lim_{n\to\infty}\Big[E\Big(\frac{1}{n}\ln L_n(\theta)\Big) - E\Big(\frac{1}{n}\ln L_n(\theta_0)\Big)\Big] = \frac{1}{2}(\ln\sigma_0^2 - \ln\sigma^2) + \frac{\sigma^2 - \sigma_0^2}{2\sigma^2}.$$

This observation indicates that the identifiable uniqueness condition (3.2) might only be relevant for the identification of $\sigma^2$ for given $\lambda$, or for the cases with a bounded sequence $\{h_n\}$.

For further investigation, we consider the concentrated likelihood function of $\lambda$. Given any value of $\lambda$, the QMLE of $\sigma^2$ can be solved from the first-order condition $\frac{\partial\ln L_n(\theta)}{\partial\sigma^2} = 0$, which gives

$$\hat\sigma_n^2(\lambda) = \frac{1}{n} Y_n' S_n'(\lambda) S_n(\lambda) Y_n. \tag{3.5}$$

The concentrated log likelihood function of $\lambda$ is

$$\ln L_n(\lambda) = -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln\hat\sigma_n^2(\lambda) + \ln|S_n(\lambda)| - \frac{1}{2\hat\sigma_n^2(\lambda)} Y_n' S_n'(\lambda) S_n(\lambda) Y_n = -\frac{n}{2}(\ln(2\pi) + 1) - \frac{n}{2}\ln\hat\sigma_n^2(\lambda) + \ln|S_n(\lambda)|. \tag{3.6}$$

The QMLE $\hat\lambda_n$ of $\lambda$ is the maximizer of $\ln L_n(\lambda)$ and it satisfies the first-order condition that $\frac{\partial\ln L_n(\hat\lambda_n)}{\partial\lambda} = 0$. The corresponding QMLE of $\sigma^2$ is

$$\hat\sigma_n^2(\hat\lambda_n) = \frac{1}{n} Y_n' S_n'(\hat\lambda_n) S_n(\hat\lambda_n) Y_n. \tag{3.7}$$
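A numerical sketch of the QMLE based on (3.5)-(3.7), assuming a bounded admissible interval for $\lambda$; `concentrated_loglik`, `qmle`, and the interval bounds are illustrative choices rather than the paper's prescriptions.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def concentrated_loglik(lam, Y, W):
    """Concentrated log likelihood (3.6), with sigma^2 profiled out via (3.5)."""
    n = Y.shape[0]
    S = np.eye(n) - lam * W
    SY = S @ Y
    sigma2_hat = (SY @ SY) / n                   # (3.5)
    _, logdet = np.linalg.slogdet(S)
    return -0.5 * n * (np.log(2.0 * np.pi) + 1.0) - 0.5 * n * np.log(sigma2_hat) + logdet

def qmle(Y, W, lo=-0.99, hi=0.99):
    """Maximize (3.6) over a bounded interval to obtain (lambda_hat, sigma2_hat)."""
    res = minimize_scalar(lambda lam: -concentrated_loglik(lam, Y, W),
                          bounds=(lo, hi), method="bounded")
    lam_hat = res.x
    n = Y.shape[0]
    SY = (np.eye(n) - lam_hat * W) @ Y
    return lam_hat, (SY @ SY) / n                # (3.7)
```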

Instead of the expected log likelihood function, a better analysis may investigate some related functions of the concentrated log likelihood function in (3.6). Given $\lambda$, consider the maximization of $E(\ln L_n(\lambda, \sigma^2))$ with respect to $\sigma^2$. The maximizer is $\sigma_n^2(\lambda) = \frac{\sigma_0^2}{n}\mathrm{tr}(S_n'^{-1} S_n'(\lambda) S_n(\lambda) S_n^{-1})$. Define the function

$$Q_n(\lambda) = \max_{\sigma^2} E(\ln L_n(\lambda, \sigma^2)) = -\frac{n}{2}(\ln(2\pi) + 1) - \frac{n}{2}\ln\sigma_n^2(\lambda) + \ln|S_n(\lambda)|. \tag{3.8}$$

$Q_n(\lambda)$ is related to the concentrated log likelihood function $\ln L_n(\lambda)$ in that, as will be shown,

$$\frac{h_n}{n}(\ln L_n(\lambda) - Q_n(\lambda)) = -\frac{h_n}{2}(\ln\hat\sigma_n^2(\lambda) - \ln\sigma_n^2(\lambda)) \tag{3.9}$$

can converge to zero in probability uniformly in $\lambda$.⁵

Theorem 2. Suppose the assumed regularity conditions 1-5 are satisfied. Then

(1) there exists a neighborhood $\Lambda$ of $\lambda_0$ such that

$$\frac{h_n}{n}(\ln L_n(\lambda) - Q_n(\lambda)) \xrightarrow{p} 0 \tag{3.10}$$

uniformly in $\lambda$ in $\Lambda$;

(2) for the case $\lim_{n\to\infty} h_n = \infty$, the convergence in (3.10) can be uniform in $\lambda$ in any bounded set; and

(3) when $\{h_n\}$ is a bounded sequence, if $\limsup_{n\to\infty}\frac{\mathrm{tr}^2(G_n)}{n\cdot\mathrm{tr}(G_n' G_n)} < 1$, then the convergence in (3.10) can be uniform in $\lambda$ in any bounded set.

We note that the condition $\limsup_{n\to\infty}\frac{\mathrm{tr}^2(G_n)}{n\cdot\mathrm{tr}(G_n' G_n)} < 1$ for a bounded sequence $\{h_n\}$ in part (3) of the above theorem is a very mild condition on the spatial weights matrix $W_{n,n}$. Let $g_{n,ij}$ denote the elements of $G_n$. It follows that $\mathrm{tr}(G_n) = \sum_{l=1}^{n} g_{n,ll}$ and $\mathrm{tr}(G_n' G_n) = \sum_{l=1}^{n}\sum_{k=1}^{n} g_{n,lk}^2$ (the square of the Euclidean matrix norm of $G_n$). Thus,

$$\frac{1}{n}\mathrm{tr}(G_n' G_n) - \frac{\mathrm{tr}^2(G_n)}{n^2} = \Big[\frac{1}{n}\sum_{l=1}^{n}\sum_{k=1}^{n} g_{n,lk}^2 - \frac{1}{n}\sum_{l=1}^{n} g_{n,ll}^2\Big] + \Big[\frac{1}{n}\sum_{l=1}^{n} g_{n,ll}^2 - \Big(\frac{1}{n}\sum_{l=1}^{n} g_{n,ll}\Big)^2\Big].$$

As an empirical variance of the $g_{n,ll}$, $l = 1, \cdots, n$, $\frac{1}{n}\sum_{l=1}^{n} g_{n,ll}^2 - (\frac{1}{n}\sum_{l=1}^{n} g_{n,ll})^2 \ge 0$. The condition will be satisfied as long as either the limiting empirical distribution of the $g_{n,ll}$, $l = 1, \cdots, n$, does not degenerate, i.e., $g_{n,ll}$ does not converge to a constant for all $l = 1, \cdots, n$ as $n$ goes to infinity, or the average of the squares of the off-diagonal elements of $G_n$ does not tend to zero.
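The ratio in Theorem 2(3) is straightforward to inspect numerically for a given design; a sketch, with `theorem2_ratio` an illustrative name:

```python
import numpy as np

def theorem2_ratio(W, lam0):
    """tr^2(G_n) / (n * tr(G_n' G_n)) for G_n = W_{n,n} S_n^{-1}; Theorem 2(3)
    requires its limsup to stay below one when h_n is bounded."""
    n = W.shape[0]
    G = W @ np.linalg.inv(np.eye(n) - lam0 * W)
    return np.trace(G) ** 2 / (n * np.trace(G.T @ G))
```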

Consistency of the estimator $\hat\lambda_n$ will follow if $\lambda_0$ can be uniquely identified from the limiting process of $\frac{h_n}{n}(Q_n(\lambda) - Q_n(\lambda_0))$. The identifiable uniqueness condition can be established in a neighborhood of $\lambda_0$ under some mild regularity conditions. Let $C_n = G_n - \frac{\mathrm{tr}(G_n)}{n} I_n$.

Assumption 6. The limit of the sequence $\frac{h_n}{n}\mathrm{tr}[(C_n' + C_n)(C_n' + C_n)']$ exists and is positive.

We note that $\mathrm{tr}[(C_n' + C_n)(C_n' + C_n)'] = 2[\mathrm{tr}(C_n C_n') + \mathrm{tr}(C_n^2)] = 2[\mathrm{tr}(G_n G_n') + \mathrm{tr}(G_n^2) - \frac{2}{n}\mathrm{tr}^2(G_n)]$. By Lemma A.9, $\frac{h_n}{n}\mathrm{tr}[(C_n' + C_n)(C_n' + C_n)']$ is $O(1)$. It is, in general, positive. This can be seen as follows. For any matrix $A$, $\mathrm{tr}(AA')$ is the square of the Euclidean norm of $A$. Therefore, $\mathrm{tr}(AA') = 0$ if and only if $A = 0$.

⁵ We note that $Q_n(\lambda)$ is not the expectation of $\ln L_n(\lambda)$ for any finite $n$.


For our case, $C_n' + C_n = G_n' + G_n - 2\frac{\mathrm{tr}(G_n)}{n} I_n$ is, in general, not a zero matrix. If it were zero, it would imply that all the diagonal elements of $G_n$ equal each other and that $C_n$ is a skew-symmetric matrix, i.e., $C_n' = -C_n$. Thus, in general, $\frac{h_n}{n}\mathrm{tr}[(C_n' + C_n)(C_n' + C_n)'] > 0$ for each $n$. Assumption 6 rules out the irregular cases where the limit of the relevant sequence vanishes. For the example from Case (1992) with $W_{n,n} = I_r \otimes B_{m,m}$, where $r$ is the number of districts and $m$ is the number of households in a district,

$$\frac{h_n}{n}\mathrm{tr}[(C_n' + C_n)(C_n' + C_n)'] = \frac{4(m-1)}{m}\Big[\Big(\frac{m}{m-1+\lambda_0}\Big)^2\Big(\frac{1 - 2(1-\lambda_0)/m}{(1-\lambda_0)^2} + \frac{1}{m}\Big) - \frac{1}{m}\Big(\frac{\lambda_0}{1-\lambda_0}\Big)^2\Big(1 - \frac{1-\lambda_0}{m}\Big)^{-2}\Big] \to \frac{4}{(1-\lambda_0)^2},$$

which is finite and positive (as $m$ goes to infinity).
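This limit can be checked numerically by brute force, with the district design built inline and the displayed limit as the benchmark; a sketch:

```python
import numpy as np

lam0, r, m = 0.4, 50, 40
B = (np.ones((m, m)) - np.eye(m)) / (m - 1)        # B_{m,m}
W = np.kron(np.eye(r), B)                          # W_{n,n} = I_r kron B_{m,m}
n, h_n = r * m, m - 1
G = W @ np.linalg.inv(np.eye(n) - lam0 * W)        # G_n = W_{n,n} S_n^{-1}
C = G - (np.trace(G) / n) * np.eye(n)              # C_n
M = C.T + C
print((h_n / n) * np.trace(M @ M.T))               # approx 4/(1 - lam0)^2 = 11.11...
```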

Theorem 3. Under the assumed conditions, there exists a set $\Lambda$ with $\lambda_0$ in its interior such that, for any neighborhood $N_\epsilon(\lambda_0)$ of $\lambda_0$ with radius $\epsilon$,

$$\limsup_{n\to\infty}\Big[\max_{\lambda\in\Lambda\setminus N_\epsilon(\lambda_0)}\frac{h_n}{n} Q_n(\lambda) - \frac{h_n}{n} Q_n(\lambda_0)\Big] < 0.$$

Furthermore, the QMLE $\hat\lambda_n$ derived from the maximization of $\ln L_n(\lambda)$ with $\lambda$ in $\Lambda$ is a consistent estimator.

As $\hat\lambda_n$ is a consistent estimate of $\lambda_0$, the consistency of $\hat\sigma_n^2(\hat\lambda_n)$ in (3.7) can be shown as follows. As $\hat\sigma_n^2(\hat\lambda_n) = \frac{1}{n} Y_n' S_n'(\hat\lambda_n) S_n(\hat\lambda_n) Y_n$, by expansion,

$$\hat\sigma_n^2(\hat\lambda_n) = \frac{1}{n} V_n' V_n + 2(\lambda_0 - \hat\lambda_n)\frac{1}{n} Y_n' W_{n,n}' V_n + (\lambda_0 - \hat\lambda_n)^2\frac{1}{n} Y_n' W_{n,n}' W_{n,n} Y_n = \frac{1}{n} V_n' V_n + o_P(1),$$

because $(\hat\lambda_n - \lambda_0) = o_P(1)$, and $\frac{1}{n} Y_n' W_{n,n}' Y_n = O_P(\frac{1}{h_n})$ and $\frac{1}{n} Y_n' W_{n,n}' W_{n,n} Y_n = O_P(\frac{1}{h_n})$ by Lemma A.14 (which, with $V_n = S_n Y_n$, imply $\frac{1}{n} Y_n' W_{n,n}' V_n = O_P(\frac{1}{h_n})$). The consistency of $\hat\sigma_n^2(\hat\lambda_n)$ follows as $\frac{1}{n} V_n' V_n \xrightarrow{p} \sigma_0^2$ by the law of large numbers for i.i.d. variables.

Assumption 6 is essentially a local identification condition. For global identification, one has to impose identification conditions outside of $\Lambda$: the true parameter $\lambda_0$ needs to be the unique global maximizer in the limiting process of $\frac{h_n}{n}(Q_n(\lambda) - Q_n(\lambda_0))$ on its parameter space. For each $n$, $Q_n(\lambda)$ may have a unique maximum at $\lambda_0$ if the variance of $Y_n$ is unique at $(\lambda_0, \sigma_0^2)$, i.e., whenever $(\lambda, \sigma^2) \ne (\lambda_0, \sigma_0^2)$, $\sigma^2 S_n^{-1}(\lambda) S_n'^{-1}(\lambda) \ne \sigma_0^2 S_n^{-1} S_n'^{-1}$. By Jensen's inequality, $E(\ln L(\lambda, \sigma^2)) \le E(\ln L(\lambda_0, \sigma_0^2))$ for all $(\lambda, \sigma^2)$ and, hence, $Q_n(\lambda) \le Q_n(\lambda_0)$. When the variance of $Y_n$ is unique at $(\lambda_0, \sigma_0^2)$, as the ratio of normal densities with different variances cannot be a constant, Jensen's inequality applied to the logarithm of the ratio of normal densities becomes a strict inequality, and one has $E(\ln L(\lambda, \sigma^2)) < E(\ln L(\lambda_0, \sigma_0^2))$ for all $(\lambda, \sigma^2) \ne (\lambda_0, \sigma_0^2)$. Under such a circumstance, $Q_n(\lambda) - Q_n(\lambda_0) < 0$ whenever $\lambda \ne \lambda_0$. However, this is not sufficient to guarantee the identification uniqueness of $\lambda_0$ in the limiting process of $\frac{h_n}{n}(Q_n(\lambda) - Q_n(\lambda_0))$ as $n$ goes to infinity (see, e.g., Amemiya (1985), p. 116). Assumption 6 is not sufficient for global identification. For example, Assumption 6 does not rule out cases with $W_{n,n}$ being skew-symmetric. It is apparent that if $W_{n,n}$ were skew-symmetric, the variance matrix of $Y_n$ would be $\sigma_0^2 S_n^{-1} S_n'^{-1} = \sigma_0^2(I_n + \lambda_0^2 W_{n,n}' W_{n,n})^{-1}$. As only $\lambda_0^2$ would appear in the variance matrix, the values $\lambda = \lambda_0$ and $\lambda = -\lambda_0$ would generate the same variance matrix of $Y_n$, and the true parameter $\lambda_0$ could not be globally identified.

$\lambda_0$ would be globally identifiable if, for any $\lambda \ne \lambda_0$,

$$\lim_{n\to\infty}\Big(\frac{h_n}{n}\ln|\sigma_0^2 S_n^{-1} S_n'^{-1}| - \frac{h_n}{n}\ln|\sigma_n^2(\lambda) S_n^{-1}(\lambda) S_n'^{-1}(\lambda)|\Big) \ne 0,$$

because this implies that

$$\lim_{n\to\infty}\frac{h_n}{n}\Big[\ln|S_n(\lambda)| - \ln|S_n(\lambda_0)| - \frac{n}{2}(\ln\sigma_n^2(\lambda) - \ln\sigma_n^2(\lambda_0))\Big] \ne 0 \quad\text{for any } \lambda \ne \lambda_0.$$

A direct verification of the global identification of $\lambda_0$ would depend on the specific sequence of weights matrices $\{W_{n,n}\}$ in a model. Simple and intuitive global identification conditions are available for some important weights matrices when $\lim_{n\to\infty} h_n = \infty$. These cases include the spatial weights matrix in Case (1991, 1992), which is symmetric, and the familiar case of Ord (1975) where $W_{n,n}$ is row-normalized and the eigenvalues of $W_{n,n}$ are real, as in the following theorem.

Theorem 4. For the case with $\lim_{n\to\infty} h_n = \infty$, if the weights matrix $W_{n,n}$ is symmetric, or $W_{n,n} = \Lambda_n^{-1} D_n$ where $\Lambda_n$ is a diagonal matrix and $D_n$ is a symmetric matrix, then, for large enough $n$, $\frac{h_n}{n}(Q_n(\lambda) - Q_n(\lambda_0))$ is concave on any bounded set of $\lambda$ and $\lambda_0$ is its unique global maximizer. Hence, $\lambda_0$ is uniquely identifiable on any bounded parameter space. The estimator $\hat\lambda_n$ derived from the maximization of $\ln L_n(\lambda)$ over any bounded parameter space of $\lambda$ is consistent.

For cases where global identification for the QMLE is hard to check, an alternative strategy is to design a consistent estimation method under some regularity conditions and, hopefully, easily verifiable identification conditions (e.g., a method of moments) to obtain an initial consistent estimate $\tilde\lambda_n$ of $\lambda$. As $\tilde\lambda_n$ converges to $\lambda_0$ in probability, $\tilde\lambda_n$ will lie in any neighborhood $\Lambda$ of $\lambda_0$ with probability close to 1 for large enough $n$, and the maximization of $\ln L_n(\lambda)$ can start from the initial consistent estimate. Appendix C provides an example of a moment estimator. It is shown there that one step of a Newton-Raphson iteration starting from an initial consistent estimator $\tilde\lambda_n$ provides a second-round estimator of $\lambda$ which has the same asymptotic distribution as the QMLE.

4. Asymptotic Distributions of the MLEs


As $Y_n' W_{n,n} Y_n = Y_n' W_{n,n}' Y_n$, it follows from (3.5) that $\frac{\partial\hat\sigma_n^2(\lambda)}{\partial\lambda} = -\frac{2}{n} Y_n' W_{n,n}' S_n(\lambda) Y_n$. Therefore,

$$\frac{\partial\ln L_n(\lambda)}{\partial\lambda} = \frac{1}{\hat\sigma_n^2(\lambda)} Y_n' W_{n,n}' S_n(\lambda) Y_n - \mathrm{tr}(W_{n,n} S_n^{-1}(\lambda)). \tag{4.1}$$

At the true parameter $\lambda_0$, $\hat\sigma_n^2(\lambda_0) = \frac{1}{n} V_n' V_n$ and

$$\frac{\partial\ln L_n(\lambda_0)}{\partial\lambda} = \frac{1}{\hat\sigma_n^2(\lambda_0)} V_n' G_n' V_n - \mathrm{tr}(G_n). \tag{4.2}$$

Because $\frac{\partial S_n^{-1}(\lambda)}{\partial\lambda} = S_n^{-1}(\lambda) W_{n,n} S_n^{-1}(\lambda)$ (Dhrymes, 1978, p. 120, Corollary 38),

$$\frac{\partial^2\ln L_n(\lambda)}{\partial\lambda^2} = \frac{2}{n\hat\sigma_n^4(\lambda)}(Y_n' W_{n,n}' S_n(\lambda) Y_n)^2 - \frac{1}{\hat\sigma_n^2(\lambda)} Y_n' W_{n,n}' W_{n,n} Y_n - \mathrm{tr}((W_{n,n} S_n^{-1}(\lambda))^2). \tag{4.3}$$

At $\lambda_0$,

$$\frac{\partial^2\ln L_n(\lambda_0)}{\partial\lambda^2} = \frac{2}{n\hat\sigma_n^4(\lambda_0)}(V_n' G_n' V_n)^2 - \frac{1}{\hat\sigma_n^2(\lambda_0)} V_n' G_n' G_n V_n - \mathrm{tr}(G_n^2). \tag{4.4}$$
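For completeness, a sketch transcribing the score (4.1) and Hessian (4.3) of the concentrated log likelihood; `score_and_hessian` is an illustrative name.

```python
import numpy as np

def score_and_hessian(lam, Y, W):
    """First derivative (4.1) and second derivative (4.3) of the concentrated
    log likelihood of lambda."""
    n = Y.shape[0]
    S = np.eye(n) - lam * W
    A = W @ np.linalg.inv(S)                   # W_{n,n} S_n^{-1}(lambda)
    SY, WY = S @ Y, W @ Y
    sigma2_hat = (SY @ SY) / n                 # (3.5)
    cross = WY @ SY                            # Y' W' S(lambda) Y
    score = cross / sigma2_hat - np.trace(A)
    hessian = (2.0 * cross ** 2 / (n * sigma2_hat ** 2)
               - (WY @ WY) / sigma2_hat - np.trace(A @ A))
    return score, hessian
```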

As $\hat\lambda_n$ satisfies the first-order condition that $\frac{\partial\ln L_n(\hat\lambda_n)}{\partial\lambda} = 0$, by the mean-value theorem,

$$\hat\lambda_n - \lambda_0 = -\Big(\frac{\partial^2\ln L_n(\bar\lambda_n)}{\partial\lambda^2}\Big)^{-1}\frac{\partial\ln L_n(\lambda_0)}{\partial\lambda}, \tag{4.5}$$

where $\bar\lambda_n$ lies between $\hat\lambda_n$ and $\lambda_0$. To derive the asymptotic distribution of the QMLE $\hat\lambda_n$, we investigate the rate of convergence of the first and second derivatives of the log likelihood function in (4.2) and (4.4). For the second derivative, only its probability limit is needed. For the first derivative, we need to investigate its limiting distribution as well.

Define the function

$$P_n(\lambda_0) = \frac{2}{n\sigma_0^4}(V_n' G_n V_n)^2 - \frac{1}{\sigma_0^2} V_n' G_n' G_n V_n - \mathrm{tr}(G_n^2). \tag{4.6}$$

This function differs from $\frac{\partial^2\ln L_n(\lambda_0)}{\partial\lambda^2}$ in that $\hat\sigma_n^2(\lambda_0)$ in the latter is replaced by the true parameter $\sigma_0^2$.

Proposition 1. Under the regularity conditions of Assumptions 1-5,

$$E(P_n(\lambda_0)) = \frac{2}{n}\mathrm{tr}^2(G_n) - \mathrm{tr}(G_n G_n') - \mathrm{tr}(G_n^2) + O\Big(\frac{1}{h_n}\Big)$$

and

$$\frac{h_n}{n} P_n(\lambda_0) \xrightarrow{p} \lim_{n\to\infty}\frac{h_n}{n}\Big[\frac{2}{n}\mathrm{tr}^2(G_n) - \mathrm{tr}(G_n G_n') - \mathrm{tr}(G_n^2)\Big]. \tag{4.7}$$

In particular, if $\lim_{n\to\infty} h_n = \infty$, then

$$\frac{h_n}{n} P_n(\lambda_0) \xrightarrow{p} -\lim_{n\to\infty}\frac{h_n}{n}[\mathrm{tr}(G_n G_n') + \mathrm{tr}(G_n^2)].$$


Proposition 2. Under the regularity conditions of Assumptions 1-5,

$$\frac{h_n}{n}\Big[\frac{\partial^2\ln L_n(\bar\lambda_n)}{\partial\lambda^2} - P_n(\lambda_0)\Big] \xrightarrow{p} 0,$$

for any $\bar\lambda_n$ which converges in probability to $\lambda_0$.

An immediate consequence of Propositions 1 and 2 is the following.

Proposition 3. Under the assumed regularity conditions,

$$-\frac{h_n}{n}\frac{\partial^2\ln L_n(\bar\lambda_n)}{\partial\lambda^2} \xrightarrow{p} \lim_{n\to\infty}\frac{h_n}{n}\Big[\mathrm{tr}(G_n G_n') + \mathrm{tr}(G_n^2) - \frac{2}{n}\mathrm{tr}^2(G_n)\Big]. \tag{4.8}$$

In particular, if $\lim_{n\to\infty} h_n = \infty$, then $-\frac{h_n}{n}\frac{\partial^2\ln L_n(\bar\lambda_n)}{\partial\lambda^2} \xrightarrow{p} \lim_{n\to\infty}\frac{h_n}{n}[\mathrm{tr}(G_n G_n') + \mathrm{tr}(G_n^2)]$.

We note that $\mathrm{tr}(G_n G_n') + \mathrm{tr}(G_n^2) - \frac{2}{n}\mathrm{tr}^2(G_n) = \frac{1}{2}\mathrm{tr}[(C_n' + C_n)(C_n' + C_n)']$ is positive.

From (4.2), $\frac{\partial\ln L_n(\lambda_0)}{\partial\lambda}$ can be rewritten as $\frac{\partial\ln L_n(\lambda_0)}{\partial\lambda} = \frac{1}{\hat\sigma_n^2(\lambda_0)} V_n' C_n V_n$. Denote $Q_n = V_n' C_n V_n$. The asymptotic distribution of this first-order derivative will depend on the asymptotic distribution of the random quadratic form $Q_n$. The mean of $Q_n$ is zero because $\mathrm{tr}(C_n) = 0$, and its variance is $\sigma_{Q_n}^2 = (\mu_4 - 3\sigma_0^4)\sum_{i=1}^{n} C_{n,ii}^2 + \sigma_0^4[\mathrm{tr}(C_n C_n') + \mathrm{tr}(C_n^2)]$ by Lemma A.12. This random quadratic form $Q_n$, after being normalized, may converge in distribution to a standard normal variable. Limiting distributions of quadratic random forms have been obtained in Whittle (1964), Beran (1972), Sen (1976), Giraitis and Taqqu (1998), and Kelejian and Prucha (1999). The original theorem of Kelejian and Prucha (1999) can be slightly modified to take into account the possible lower-than-$\sqrt{n}$-rate of convergence of our $Q_n$ function. For this purpose, Assumption 3 needs to be slightly strengthened.

Assumption 3′. $\frac{h_n^{1+\eta}}{n} \to 0$ for some $\eta > 0$ as $n$ goes to infinity.

Proposition 4. Under Assumptions 1-2, 3′ and 4-6, $\frac{Q_n}{\sigma_{Q_n}} \xrightarrow{D} N(0, 1)$.

Assumption 7. The limit of $\frac{h_n}{n}\sum_{i=1}^{n} C_{n,ii}^2$ exists.

As $\mathrm{tr}(C_n C_n') + \mathrm{tr}(C_n^2) = \mathrm{tr}(G_n G_n') + \mathrm{tr}(G_n^2) - \frac{2}{n}\mathrm{tr}^2(G_n) = O(\frac{n}{h_n})$, a consequence of Proposition 4 with the added Assumption 7 is that

$$\sqrt{\frac{h_n}{n}}\,\frac{Q_n}{\sigma_0^2} \xrightarrow{D} N\Big(0,\ \lim_{n\to\infty}\frac{h_n}{n}\Big[\Big(\frac{\mu_4 - 3\sigma_0^4}{\sigma_0^4}\Big)\sum_{i=1}^{n} C_{n,ii}^2 + \mathrm{tr}(C_n C_n') + \mathrm{tr}(C_n^2)\Big]\Big).$$

Because the $v$'s are i.i.d. and $\hat\sigma_n^2(\lambda_0) = \frac{1}{n}\sum_{i=1}^{n} v_i^2$, the law of large numbers for i.i.d. random variables implies that $\hat\sigma_n^2(\lambda_0) \xrightarrow{p} \sigma_0^2$. Consequently, the Slutsky lemma implies the following result.

Proposition 5. Under the assumed regularity conditions,

$$\sqrt{\frac{h_n}{n}}\,\frac{\partial\ln L_n(\lambda_0)}{\partial\lambda} = \frac{\sigma_0^2}{\hat\sigma_n^2(\lambda_0)}\cdot\sqrt{\frac{h_n}{n}}\,\frac{Q_n}{\sigma_0^2} \xrightarrow{D} N\Big(0,\ \lim_{n\to\infty}\frac{h_n}{n}\Big[\Big(\frac{\mu_4 - 3\sigma_0^4}{\sigma_0^4}\Big)\sum_{i=1}^{n} C_{n,ii}^2 + \mathrm{tr}(C_n C_n') + \mathrm{tr}(C_n^2)\Big]\Big). \tag{4.9}$$

For the cases where the $v_i$'s are normally distributed or $\lim_{n\to\infty} h_n = \infty$,

$$\sqrt{\frac{h_n}{n}}\,\frac{\partial\ln L_n(\lambda_0)}{\partial\lambda} \xrightarrow{D} N\Big(0,\ \lim_{n\to\infty}\frac{h_n}{n}[\mathrm{tr}(C_n C_n') + \mathrm{tr}(C_n^2)]\Big). \tag{4.9'}$$

Based on the results in Propositions 3 and 5, the asymptotic distribution of $\hat\lambda_n$ can be derived from

$$\sqrt{\frac{n}{h_n}}(\hat\lambda_n - \lambda_0) = -\Big(\frac{h_n}{n}\frac{\partial^2\ln L_n(\bar\lambda_n)}{\partial\lambda^2}\Big)^{-1}\cdot\sqrt{\frac{h_n}{n}}\,\frac{\partial\ln L_n(\lambda_0)}{\partial\lambda}.$$

Theorem 5. Under the assumed regularity conditions,

$$\sqrt{\frac{n}{h_n}}(\hat\lambda_n - \lambda_0) \xrightarrow{D} N(0,\ \Sigma_{\lambda\lambda} + \Sigma_{\lambda\lambda}\Omega\Sigma_{\lambda\lambda}), \tag{4.10}$$

where $\Sigma_{\lambda\lambda} = \big(\lim_{n\to\infty}\frac{h_n}{n}[\mathrm{tr}(C_n C_n') + \mathrm{tr}(C_n^2)]\big)^{-1}$ and $\Omega = \big(\frac{\mu_4 - 3\sigma_0^4}{\sigma_0^4}\big)\lim_{n\to\infty}\frac{h_n}{n}\sum_{i=1}^{n} C_{n,ii}^2$.

For the cases where the $v_i$'s are normally distributed or $\lim_{n\to\infty} h_n = \infty$,

$$\sqrt{\frac{n}{h_n}}(\hat\lambda_n - \lambda_0) \xrightarrow{D} N(0,\ \Sigma_{\lambda\lambda}). \tag{4.10'}$$

The rate of convergence of the QMLE $\hat\lambda_n$ depends on both $n$ and $h_n$. The usual $\sqrt{n}$-rate of convergence is achievable only when the sequence $\{h_n\}$ is bounded. That is the case when each unit can be influenced by only a few neighboring units even when the total number of cross-sectional units is large. Otherwise, the rate of convergence may be slower than the $\sqrt{n}$-rate. In any event, the proper normalization rate for $\hat\lambda_n$ is the square root of $\frac{n}{h_n}$; $\frac{n}{h_n}$ might be interpreted as the effective sample size. For the example in Case (1992) (or Case, 1991), where $n = m \cdot r$ with $m$ being the number of households in a district and $r$ being the number of districts, $\frac{n}{h_n} = (\frac{m}{m-1}) r$. The effective sample size in that example is essentially the number of districts $r$.

Even though the QMLE of $\lambda$ might converge at a slower rate, the QMLE of $\sigma^2$ converges at the usual $\sqrt{n}$-rate. This is shown in the subsequent theorem. Denote $\Sigma_{\lambda\lambda}^* = \Sigma_{\lambda\lambda} + \Sigma_{\lambda\lambda}\Omega\Sigma_{\lambda\lambda}$.

Theorem 6. Under the assumed regularity conditions,

$$\begin{pmatrix}\sqrt{\frac{n}{h_n}}(\hat\lambda_n - \lambda_0)\\ \sqrt{n}(\hat\sigma_n^2(\hat\lambda_n) - \sigma_0^2)\end{pmatrix} \xrightarrow{D} N(0, \Sigma),$$

where $\Sigma = \begin{pmatrix}\Sigma_{\lambda\lambda}^* & \Sigma_{\lambda\sigma}\\ \Sigma_{\sigma\lambda} & \Sigma_{\sigma\sigma}\end{pmatrix}$ with $\Sigma_{\sigma\lambda} = -2\sigma_0^2\psi\Sigma_{\lambda\lambda}^*$, $\Sigma_{\sigma\sigma} = 4\sigma_0^4\psi^2\Sigma_{\lambda\lambda}^* + (\mu_4 - \sigma_0^4)$, $\psi = \lim_{n\to\infty}\frac{\sqrt{h_n}}{n}\mathrm{tr}(G_n)$, and $\mu_4 = E(v^4)$.

For the case $\lim_{n\to\infty} h_n = \infty$, $\Sigma_{\lambda\lambda}^* = \Sigma_{\lambda\lambda}$, $\Sigma_{\lambda\sigma} = 0$ and $\Sigma_{\sigma\sigma} = \mu_4 - \sigma_0^4$.

From Theorem 6, we see that even though the possibly slower rate of convergence of $\hat\lambda_n$ does not reduce the usual $\sqrt{n}$-rate of convergence of the QMLE $\hat\sigma_n^2(\hat\lambda_n)$ of $\sigma^2$, the QMLE $\hat\lambda_n$ may have an effect on the asymptotic variance of the latter unless $\psi$ is zero. $\psi$ will not vanish in general if $h_n$ is bounded, because $\frac{\sqrt{h_n}}{n}\mathrm{tr}(G_n) = O(\frac{1}{\sqrt{h_n}})$. In that case, the QMLEs $\hat\lambda_n$ and $\hat\sigma_n^2(\hat\lambda_n)$ (after proper normalization by their rates of convergence) can be correlated. For the case $\lim_{n\to\infty} h_n = \infty$, $\psi = 0$ and $\hat\lambda_n$ and $\hat\sigma_n^2(\hat\lambda_n)$ become asymptotically independent, as $\Sigma_{\lambda\sigma} = 0$. These properties are summarized in the following theorem.

Theorem 7. Under the assumed regularity conditions, if $\lim_{n\to\infty} h_n = \infty$, then

$$\sqrt{n}(\hat\sigma_n^2(\hat\lambda_n) - \sigma_0^2) \xrightarrow{D} N(0,\ \mu_4 - \sigma_0^4)$$

and $\sqrt{n}(\hat\sigma_n^2(\hat\lambda_n) - \sigma_0^2)$ is asymptotically independent of $\sqrt{\frac{n}{h_n}}(\hat\lambda_n - \lambda_0)$.

The QMLE $\hat\lambda_n$ can be derived by maximizing the concentrated log likelihood function (3.6). Its computation can be simplified if an optimization subroutine can start its iterations with a good initial estimator. For the latter, computationally simple estimators based on the method of moments have been suggested by Kelejian and Prucha (1999a). In Appendix C, related moment estimators are described for the autoregressive process under our framework. It is shown that those moment estimators are $\sqrt{\frac{n}{h_n}}$-consistent and asymptotically normal. The moment estimators $\hat\lambda_{n1}$ or $\hat\lambda_{n2}$ can be used as an initial consistent estimator for the derivation of the QMLE $\hat\lambda_n$. A computationally simpler second-round (one-step) estimator, which has the same asymptotic distribution as the QMLE (MLE) $\hat\lambda_n$, can also be derived. Let $\tilde\lambda_n$ be an initial consistent estimator. A second-round estimator can be defined from Newton's algorithm based on the concentrated log likelihood function in (3.6):

$$\hat\lambda_{ns} = \tilde\lambda_n - \Big(\frac{\partial^2\ln L_n(\tilde\lambda_n)}{\partial\lambda^2}\Big)^{-1}\frac{\partial\ln L_n(\tilde\lambda_n)}{\partial\lambda}.$$

As $\tilde\lambda_n$ is a $\sqrt{\frac{n}{h_n}}$-consistent estimator, the second-round estimator has the same limiting distribution as the QMLE $\hat\lambda_n$. This can be seen as follows. By a Taylor expansion,

$$\frac{\partial\ln L_n(\tilde\lambda_n)}{\partial\lambda} = \frac{\partial\ln L_n(\lambda_0)}{\partial\lambda} + \frac{\partial^2\ln L_n(\bar\lambda_n)}{\partial\lambda^2}(\tilde\lambda_n - \lambda_0),$$


where $\bar\lambda_n$ lies between $\tilde\lambda_n$ and $\lambda_0$. Hence,

$$\begin{aligned}
\sqrt{\frac{n}{h_n}}(\hat\lambda_{ns} - \lambda_0) &= \sqrt{\frac{n}{h_n}}(\tilde\lambda_n - \lambda_0) - \Big(\frac{h_n}{n}\frac{\partial^2\ln L_n(\tilde\lambda_n)}{\partial\lambda^2}\Big)^{-1}\cdot\sqrt{\frac{h_n}{n}}\,\frac{\partial\ln L_n(\tilde\lambda_n)}{\partial\lambda}\\
&= \Big[1 - \Big(\frac{h_n}{n}\frac{\partial^2\ln L_n(\tilde\lambda_n)}{\partial\lambda^2}\Big)^{-1}\Big(\frac{h_n}{n}\frac{\partial^2\ln L_n(\bar\lambda_n)}{\partial\lambda^2}\Big)\Big]\sqrt{\frac{n}{h_n}}(\tilde\lambda_n - \lambda_0) - \Big(\frac{h_n}{n}\frac{\partial^2\ln L_n(\tilde\lambda_n)}{\partial\lambda^2}\Big)^{-1}\cdot\sqrt{\frac{h_n}{n}}\,\frac{\partial\ln L_n(\lambda_0)}{\partial\lambda}\\
&= -\Big(\frac{h_n}{n}\frac{\partial^2\ln L_n(\tilde\lambda_n)}{\partial\lambda^2}\Big)^{-1}\cdot\sqrt{\frac{h_n}{n}}\,\frac{\partial\ln L_n(\lambda_0)}{\partial\lambda} + o_P(1),
\end{aligned}$$

by Propositions 1 and 2. This shows that the second-round estimator has the same asymptotic distribution as $\hat\lambda_n$ in (4.5).
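In code, the one-step update is a single line given the derivatives; a sketch reusing the hypothetical `score_and_hessian` helper defined in the earlier snippet:

```python
def one_step_qmle(lam_tilde, Y, W):
    """Second-round (one-step) estimator: one Newton-Raphson update of the
    concentrated log likelihood from an initial consistent estimate lam_tilde."""
    score, hessian = score_and_hessian(lam_tilde, Y, W)  # (4.1) and (4.3)
    return lam_tilde - score / hessian
```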

The possibly slower rate of convergence of $\hat\lambda_n$ in Theorem 5 implies that, for statistical inference, one shall take into account the factor $h_n$ in addition to the sample size $n$. However, some practical formulas for classical inference statistics are valid when the disturbances are normally distributed. For example, the t-statistic formula for testing that $\lambda$ equals a specific constant, say $\lambda_c$, is asymptotically valid. Let $\omega_{\lambda,n}^2 = -\big(\frac{\partial^2\ln L_n(\hat\lambda_n)}{\partial\lambda^2}\big)^{-1}$. As $\frac{n}{h_n}\omega_{\lambda,n}^2$ is a consistent estimate of $\Sigma_{\lambda\lambda}$ by Propositions 1 and 2 and the continuous mapping theorem, the conventional test statistic for testing $H_0: \lambda_0 = \lambda_c$ is $(\hat\lambda_n - \lambda_c)/\omega_{\lambda,n}$. This statistic is asymptotically standard normal, because

$$\frac{\hat\lambda_n - \lambda_0}{\omega_{\lambda,n}} = \Big(-\frac{h_n}{n}\frac{\partial^2\ln L_n(\hat\lambda_n)}{\partial\lambda^2}\Big)^{-1/2}\sqrt{\frac{h_n}{n}}\,\frac{\partial\ln L_n(\lambda_0)}{\partial\lambda} + o_P(1) \xrightarrow{D} N(0, 1)$$

under the null hypothesis. In addition to the Wald-type statistic, the conventional likelihood ratio test statistic is also valid for testing $\lambda_0 = \lambda_c$. This is so because

$$2[\ln L_n(\hat\lambda_n) - \ln L_n(\lambda_c)] = -\frac{\partial^2\ln L_n(\bar\lambda_n)}{\partial\lambda^2}(\hat\lambda_n - \lambda_0)^2 = \sqrt{\frac{n}{h_n}}(\hat\lambda_n - \lambda_0)\,\Sigma_{\lambda\lambda}^{-1}\,\sqrt{\frac{n}{h_n}}(\hat\lambda_n - \lambda_0) + o_P(1) \xrightarrow{D} \chi^2(1),$$

which is asymptotically chi-square distributed with one degree of freedom under the null hypothesis. For testing the null hypothesis $\lambda_0 = \lambda_c$, Rao's efficient score test statistic based on the concentrated log likelihood function is also valid. The efficient score statistic is $\frac{\partial\ln L_n(\lambda_c)}{\partial\lambda}\big(-\frac{\partial^2\ln L_n(\lambda_c)}{\partial\lambda^2}\big)^{-1}\frac{\partial\ln L_n(\lambda_c)}{\partial\lambda}$. This statistic is asymptotically chi-square distributed by Proposition 5 and because $-\frac{h_n}{n}\frac{\partial^2\ln L_n(\lambda_c)}{\partial\lambda^2}$ is a consistent estimate of the limiting variance of $\sqrt{\frac{h_n}{n}}\frac{\partial\ln L_n(\lambda_c)}{\partial\lambda}$ under the null hypothesis. From our results, we note that these statistics based on the concentrated likelihood are robust in that they remain asymptotically valid even when the $v_i$ are not normally distributed, as long as either $\lim_{n\to\infty} h_n = \infty$ or $\mu_4 = 3\sigma_0^4$ holds.
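A sketch of the three statistics, reusing the hypothetical `concentrated_loglik` and `score_and_hessian` helpers from the earlier snippets:

```python
import numpy as np

def classical_tests(lam_hat, lam_c, Y, W):
    """Wald t-ratio, likelihood ratio, and Rao efficient score statistics for
    H0: lambda_0 = lambda_c, built from the concentrated log likelihood (3.6)."""
    _, hess_hat = score_and_hessian(lam_hat, Y, W)
    omega = np.sqrt(-1.0 / hess_hat)                   # omega_{lambda,n}
    t_stat = (lam_hat - lam_c) / omega                 # asymptotically N(0, 1)
    lr = 2.0 * (concentrated_loglik(lam_hat, Y, W)
                - concentrated_loglik(lam_c, Y, W))    # asymptotically chi^2(1)
    score_c, hess_c = score_and_hessian(lam_c, Y, W)
    rao = score_c ** 2 / (-hess_c)                     # asymptotically chi^2(1)
    return t_stat, lr, rao
```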


Theorem 5 indicates that the rate of convergence of $\hat\lambda_n$ to $\lambda_0$ can be very slow if the rate $h_n$ is only slightly slower than the $n$-rate. Consequently, one may suspect that, for cases with $h_n = n$, the QMLE $\hat\lambda_n$ could be inconsistent. The following example illustrates this consequence. For simplicity, consider the process (2.1) with a known $\sigma^2 = 1$ and the spatial weights matrix $W_{n,n}$ with $w_{n,ij} = \frac{1}{n-1}$ for all $i \ne j$. The column and row sums of such weights matrices are uniformly bounded. As $W_{n,n} = \frac{1}{n-1}(l_n l_n' - I_n)$,

$$S_n^{-1}(\lambda) = \frac{1}{1 + \frac{\lambda}{n-1}}\Big(I_n + \frac{\lambda}{1-\lambda}\cdot\frac{l_n l_n'}{n-1}\Big).$$

It is apparent that the column and row sums of $S_n^{-1}(\lambda)$ are uniformly bounded as long as $\lambda$ is bounded and bounded away from 1. With this weights matrix, one has

$$W_{n,n} S_n^{-1}(\lambda) = \frac{1}{n-1+\lambda}\Big(\frac{l_n l_n'}{1-\lambda} - I_n\Big),$$

$$(W_{n,n} S_n^{-1}(\lambda))^2 = \frac{1}{(n-1+\lambda)^2}\Big(\frac{n - 2(1-\lambda)}{(1-\lambda)^2}\, l_n l_n' + I_n\Big),$$

$$\mathrm{tr}[W_{n,n} S_n^{-1}(\lambda)] = \frac{\lambda}{1-\lambda}\Big(1 - \frac{1-\lambda}{n}\Big)^{-1},$$

and

$$\mathrm{tr}[(W_{n,n} S_n^{-1}(\lambda))^2] = \Big(\frac{n}{n-1+\lambda}\Big)^2\Big(\frac{1 - 2(1-\lambda)/n}{(1-\lambda)^2} + \frac{1}{n}\Big).$$

Because $W_{n,n}$ is symmetric, the first-order derivative of the log likelihood with respect to $\lambda$ at $\lambda_0$ is

$$\frac{\partial\ln L_n(\lambda_0)}{\partial\lambda} = -\mathrm{tr}(W_{n,n} S_n^{-1}) + V_n' S_n^{-1} W_{n,n} V_n = -\frac{\lambda_0}{1-\lambda_0}\Big(1 - \frac{1-\lambda_0}{n}\Big)^{-1} + \frac{1}{n-1+\lambda_0} V_n'\Big(\frac{l_n l_n'}{1-\lambda_0} - I_n\Big)V_n.$$

As $\frac{1}{n} V_n'\big(\frac{l_n l_n'}{1-\lambda_0} - I_n\big)V_n - \frac{\lambda_0}{1-\lambda_0} = \frac{1}{1-\lambda_0}\big[\big(\frac{\sum_{i=1}^{n} v_i}{\sqrt{n}}\big)^2 - 1\big] + \big(1 - \frac{\sum_{i=1}^{n} v_i^2}{n}\big)$ converges in distribution to $\frac{\xi - 1}{1-\lambda_0}$, where $\xi$ is a $\chi^2(1)$ random variable with one degree of freedom, it follows that

$$\frac{\partial\ln L_n(\lambda_0)}{\partial\lambda} \xrightarrow{D} \frac{\xi - 1}{1-\lambda_0}.$$

The second-order derivative is

$$\frac{\partial^2\ln L_n(\lambda)}{\partial\lambda^2} = -\mathrm{tr}[(W_{n,n} S_n^{-1}(\lambda))^2] - V_n'(W_{n,n} S_n^{-1})^2 V_n = -\Big(\frac{n}{n-1+\lambda}\Big)^2\Big(\frac{1 - 2(1-\lambda)/n}{(1-\lambda)^2} + \frac{1}{n}\Big) - \frac{1}{(n-1+\lambda_0)^2} V_n'\Big(\frac{n - 2(1-\lambda_0)}{(1-\lambda_0)^2}\, l_n l_n' + I_n\Big)V_n.$$

The QMLE $\hat\lambda_n$ satisfies the first-order condition that $\frac{\partial\ln L_n(\hat\lambda_n)}{\partial\lambda} = 0$. Hence, by the mean value theorem,

$$\hat\lambda_n = \lambda_0 - \Big(\frac{\partial^2\ln L_n(\bar\lambda_n)}{\partial\lambda^2}\Big)^{-1}\frac{\partial\ln L_n(\lambda_0)}{\partial\lambda},$$


where $\bar\lambda_n$ lies between $\hat\lambda_n$ and $\lambda_0$. If $\hat\lambda_n$ were a consistent estimator, $\bar\lambda_n$ would converge in probability to $\lambda_0$. Under this situation, one would have

$$\frac{\partial^2\ln L_n(\bar\lambda_n)}{\partial\lambda^2} \xrightarrow{D} -\frac{1}{(1-\lambda_0)^2} - \frac{\xi}{(1-\lambda_0)^2} = -\frac{\xi + 1}{(1-\lambda_0)^2}.$$

Thus, for this example, if the QMLE $\hat\lambda_n$ were a consistent estimator, it would imply that $\hat\lambda_n - \lambda_0 \xrightarrow{D} (1-\lambda_0)\frac{\xi - 1}{\xi + 1}$. The latter is a contradiction, as $(1-\lambda_0)\frac{\xi - 1}{\xi + 1}$ does not have a degenerate distribution (at zero). So $\hat\lambda_n$ cannot be a consistent estimator of $\lambda_0$.
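A Monte Carlo sketch of this example, using the closed forms above for speed; the sample sizes, grid, and replication count are arbitrary choices:

```python
import numpy as np

# w_{n,ij} = 1/(n-1) for i != j (so h_n = n - 1), sigma^2 = 1 known, lambda_0 = 0.5.
# Closed forms: ln|S_n(lam)| = ln(1-lam) + (n-1) ln(1 + lam/(n-1)) and
# S_n(lam) Y = (1 + lam/(n-1)) Y - (lam/(n-1)) (l'Y) l.
rng = np.random.default_rng(1)
lam0, grid = 0.5, np.linspace(-0.9, 0.95, 400)

def qmle_equal_weights(n):
    V = rng.normal(size=n)
    sV = V.sum()
    a = 1.0 + lam0 / (n - 1)
    Y = (V + lam0 * sV / ((1.0 - lam0) * (n - 1))) / a   # Y = S_n^{-1} V
    sY, ssY = Y.sum(), Y @ Y
    best_lam, best_ll = None, -np.inf
    for lam in grid:
        b, c = 1.0 + lam / (n - 1), lam / (n - 1)
        q = b * b * ssY - (2.0 * b * c - n * c * c) * sY * sY  # ||S(lam) Y||^2
        ll = np.log(1.0 - lam) + (n - 1) * np.log(b) - 0.5 * q # constants dropped
        if ll > best_ll:
            best_lam, best_ll = lam, ll
    return best_lam

for n in (100, 400, 1600):
    draws = [qmle_equal_weights(n) for _ in range(200)]
    print(n, np.std(draws))
```

The reported spread of $\hat\lambda_n$ stays roughly constant as $n$ grows, in line with the nondegenerate limit $(1-\lambda_0)\frac{\xi-1}{\xi+1}$.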

This example is revealing in terms of the asymptotic scenarios of adding new sample observations (as $n$ increases). The above example corresponds to a sample obtained from a single district. Increasing $n$ means increasing the number of spatial units in the same district. That corresponds to the notion of infill asymptotics (Cressie, 1993, p. 101). The above example shows that the QMLE under infill asymptotics alone may not be consistent. As in Case (1991, 1992), if there are many districts from which samples are obtained, the QMLEs can be consistent if the number of districts, i.e., $\frac{n}{h_n}$, increases. The scenario of an increasing number of districts corresponds to the notion of increasing-domain asymptotics (Cressie, 1993, p. 100). Consistency of the QMLE can be achieved with increasing-domain asymptotics. From our results, the rate of convergence of the QMLE under increasing-domain asymptotics alone can be the usual $\sqrt{n}$-rate. But, when both infill and increasing-domain asymptotics are operating, the rates of convergence of the QMLEs of the various parameters can be different and some of them may have a slower rate than the usual one.

5. Conclusions

In this paper, we have investigated the asymptotic distributions of the QMLE and MLE for spatial autoregressive processes. The familiar asymptotic properties of a QMLE or MLE, such as $\sqrt{n}$-consistency, where $n$ is the sample size, and asymptotic normality, are casually stated in the existing literature. Our analysis reveals that these properties depend on some features of the spatial weights matrix. Under some situations, the estimator of the spatial effect parameter in a spatial autoregressive process might converge at a slower than $\sqrt{n}$-rate even though its limiting distribution is normal.

A spatial weights matrix is typically row-normalized to facilitate the interpretation of the spatial interaction effect parameter. The normalization factors $\sum_{j=1}^{n} d_{ij}$ for each row play an important role in the asymptotic properties of the QMLE for spatial autoregressive processes. For models with sparse spatial weights matrices, $\sum_{j=1}^{n} d_{ij}$ remains finite and bounded as $n$ goes to infinity. For those cases, the QMLE will converge at the usual $\sqrt{n}$-rate and its limiting distribution is normal. However, the rate of convergence might be different if the sums $\sum_{j=1}^{n} d_{ij}$ do not stabilize as $n$ increases. The latter implies that the influences on each unit from other spatial units are uniformly small, at a rate of order $O(\frac{1}{h_n})$. The rate of convergence of the QMLE may then depend on the rate of divergence of $h_n$. We show that, for the first-order autoregressive process, if $h_n$ tends to infinity more slowly than the $n$-rate, the stochastic rate of convergence of the QMLE of the spatial effect parameter is $\sqrt{\frac{n}{h_n}}$, which is lower than $\sqrt{n}$ by the factor $h_n^{-1/2}$. From this rate, we can see that if $h_n$ were divergent at a rate equal to $n$, the QMLE could not possibly be consistent.

For the spatial scenario considered in Case (1991, 1992), the case of $h_n = O(n)$ corresponds to the notion of infill asymptotics. Our example illustrates that the consistency and asymptotic distributions of the parameter estimates are subject to the requirement of increasing-domain asymptotics. Infill asymptotics alone might not provide consistent estimates. While increasing-domain asymptotics may provide consistent estimates with the usual $\sqrt{n}$-rate of convergence, the interaction of infill and increasing-domain asymptotics may produce low rates of convergence for some parameter estimates.

We have investigated possible implications of the low rate of convergence of the MLE for statistical inference. For statistical inference with classical statistics such as the likelihood ratio, Wald, and efficient score test statistics, the lower rate of convergence of the MLE does not invalidate the conventional formulas. The cases empirical researchers should worry about are those where $h_n$ diverges at the rate $n$. For the latter case, as the MLE would not be consistent, a conventional test statistic might not be meaningful.

This paper has analyzed asymptotic properties of estimators for spatial autoregressive processes. When explanatory variables are also incorporated into a spatial autoregressive process, the resulting model is known as the mixed regressive, spatial autoregressive model (Ord, 1975; Anselin, 1988). When explanatory variables are relevant, they play an interesting role in parameter estimation. For those models, methods of instrumental variables (IV) and, under some circumstances, even the OLS method are feasible (Kelejian and Prucha, 1998; Lee, 1999a). However, it can be shown that when all the explanatory variables are irrelevant, the IV and OLS methods are not applicable. For the latter, the estimators of the various coefficients would have various rates of convergence. We will report our findings on that model in a separate paper.


Appendix A: Some Useful Lemmas

A.1 Uniform Boundedness in Row and Column Sums

Definition: The sequences $\{b_{in}\}$ for $i = 1, \cdots, n$ are of order $h_n$ uniformly in $i$, denoted by $O(h_n)$, if there exists a finite constant $c_1$ independent of $i$ and $n$ such that $|b_{in}| \le c_1 h_n$ for all $i = 1, \dots, n$ and for large enough $n$. They are bounded away from zero uniformly in $i$ at the rate $h_n$ if there exists a positive sequence $\{h_n\}$ and a constant $c_2 > 0$ independent of $i$ and $n$ such that $c_2 h_n \le |b_{in}|$ for all $i = 1, \dots, n$ for large enough $n$. The sequences are of exact order $h_n$ uniformly in $i$, denoted by $O_e(h_n)$, if they are of order $h_n$ and are bounded away from zero at the rate $h_n$ uniformly in $i$.

Lemma A.1. Suppose that the spatial weights matrix $W_{n,n}$ is a nonnegative matrix with its $(i,j)$th element being $w_{n,ij} = \frac{d_{ij}}{\sum_{l=1}^{n} d_{il}}$ and $d_{ij} \ge 0$ for all $i, j$.

(1) If the row sums $\sum_{j=1}^{n} d_{ij}$ are uniformly bounded away from zero at the rate $h_n$ uniformly in $i$, and the column sums $\sum_{i=1}^{n} d_{ij}$ are of order $h_n$ uniformly in $j$, then $W_{n,n}$ is uniformly bounded in column sums.

(2) If $d_{ij} = d_{ji}$ for all $i$ and $j$ and the row sums $\sum_{j=1}^{n} d_{ij} = O_e(h_n)$, then $W_{n,n}$ is uniformly bounded in column sums.

Proof: (1) Let $c_1$ and $c_2$ be constants such that $c_1 h_n \le \sum_{j=1}^{n} d_{ij}$ for all $i$, and $\sum_{i=1}^{n} d_{ij} \le c_2 h_n$ for all $j$. It follows that $\sum_{i=1}^{n} w_{n,ij} = \sum_{i=1}^{n}\frac{d_{ij}}{\sum_{l=1}^{n} d_{il}} \le \frac{1}{c_1 h_n}\sum_{i=1}^{n} d_{ij} \le \frac{c_2}{c_1}$ for all $j$.

(2) This is a special case of (1) because $\sum_{l=1}^{n} d_{il} = O(h_n)$ and $\sum_{i=1}^{n} d_{ij} = \sum_{i=1}^{n} d_{ji}$ imply $\sum_{i=1}^{n} d_{ij} = O(h_n)$. Q.E.D.

Lemma A.2. Suppose that $\sup_n \|\lambda_0 W_{n,n}\|_1 < 1$ and $\sup_n \|\lambda_0 W_{n,n}\|_\infty < 1$. Then $S_n^{-1}$ is uniformly bounded in both row and column sums.

Proof: For any matrix norm $\|\cdot\|$, $\|\lambda_0 W_{n,n}\| < 1$ implies that $S_n^{-1} = \sum_{k=0}^{\infty}(\lambda_0 W_{n,n})^k$ (Horn and Johnson, 1985, p. 301). Let $c = \sup_n \|\lambda_0 W_{n,n}\|$. Then $\|S_n^{-1}\| \le \sum_{k=0}^{\infty}\|\lambda_0 W_{n,n}\|^k \le \sum_{k=0}^{\infty} c^k = \frac{1}{1-c} < \infty$ for all $n$. Q.E.D.

Lemma A.3. Suppose that $\|W_{n,n}\|$ and $\|S_n^{-1}\|$, where $\|\cdot\|$ is a matrix norm, are bounded. Then $\|S_n^{-1}(\lambda)\|$, where $S_n(\lambda) = I_n - \lambda W_{n,n}$, is uniformly bounded in a neighborhood of $\lambda_0$.

Proof: Let $c$ be a constant such that $\|W_{n,n}\| \le c$ and $\|S_n^{-1}\| \le c$ for all $n$. We note that $S_n^{-1}(\lambda) = (S_n - (\lambda - \lambda_0)W_{n,n})^{-1} = S_n^{-1}(I_n - (\lambda - \lambda_0)G_n)^{-1}$, where $G_n = W_{n,n} S_n^{-1}$. By the submultiplicative property of a matrix norm, $\|G_n\| \le \|W_{n,n}\|\cdot\|S_n^{-1}\| \le c^2$ for all $n$.


Let $B_1(\lambda_0) = \{\lambda : |\lambda - \lambda_0| < 1/c^2\}$. It follows that, for any $\lambda \in B_1(\lambda_0)$, $\|(\lambda - \lambda_0)G_n\| \le |\lambda - \lambda_0|\cdot\|G_n\| < 1$. As $\|(\lambda - \lambda_0)G_n\| < 1$, it follows that $I_n - (\lambda - \lambda_0)G_n$ is invertible and $(I_n - (\lambda - \lambda_0)G_n)^{-1} = \sum_{k=0}^{\infty}(\lambda - \lambda_0)^k G_n^k$. Therefore,

$$\|(I_n - (\lambda - \lambda_0)G_n)^{-1}\| \le \sum_{k=0}^{\infty}|\lambda - \lambda_0|^k\,\|G_n\|^k \le \sum_{k=0}^{\infty}|\lambda - \lambda_0|^k c^{2k} = \frac{1}{1 - |\lambda - \lambda_0| c^2} < \infty$$

for any $\lambda \in B_1(\lambda_0)$. The result follows by taking a closed neighborhood $B(\lambda_0)$ contained in $B_1(\lambda_0)$. In $B(\lambda_0)$, $\sup_{\lambda\in B(\lambda_0)}|\lambda - \lambda_0| c^2 < 1$ and, hence,

$$\sup_{\lambda\in B(\lambda_0)}\|S_n^{-1}(\lambda)\| \le \|S_n^{-1}\|\cdot\sup_{\lambda\in B(\lambda_0)}\|(I_n - (\lambda - \lambda_0)G_n)^{-1}\| \le \sup_{\lambda\in B(\lambda_0)}\frac{c}{1 - |\lambda - \lambda_0| c^2} < \infty.$$

Q.E.D.

Lemma A.4. Suppose that $\|W_{n,n}\| \le 1$ for all $n$, where $\|\cdot\|$ is a matrix norm. Then $\|S_n^{-1}(\lambda)\|$, where $S_n(\lambda) = I_n - \lambda W_{n,n}$, is uniformly bounded in any closed subset of $(-1, 1)$.

Proof: For any $\lambda \in (-1, 1)$, $\|\lambda W_{n,n}\| \le |\lambda|\cdot\|W_{n,n}\| < 1$ and, hence, $S_n^{-1}(\lambda) = \sum_{k=0}^{\infty}\lambda^k W_{n,n}^k$. It follows that, for any $|\lambda| < 1$, $\|S_n^{-1}(\lambda)\| \le \sum_{k=0}^{\infty}|\lambda|^k\,\|W_{n,n}\|^k \le \sum_{k=0}^{\infty}|\lambda|^k = \frac{1}{1-|\lambda|}$. Hence, for any closed subset $B$ of $(-1, 1)$, $\sup_{\lambda\in B}\|S_n^{-1}(\lambda)\| \le \sup_{\lambda\in B}\frac{1}{1-|\lambda|} < \infty$. Q.E.D.

A.2 Orders of Some Relevant Quantities

In this subsection, the assumption that the elements $w_{n,ij}$ of the weights matrix $W_{n,n}$ are of order $O(\frac{1}{h_n})$ uniformly in $i, j$ will be maintained. In general, $W_{n,n}$ need not be row-normalized.

Lemma A.5. Suppose that the elements of the sequences of vectors $P_n = (p_{n1}, \cdots, p_{nn})'$ and $Q_n = (q_{n1}, \cdots, q_{nn})'$ are uniformly bounded for all $n$.

1) If $A_n$ is uniformly bounded in either row or column sums, then $|P_n' A_n Q_n| = O(n)$.

2) If the row sums of $A_n$ and $Z_n$ are uniformly bounded, then $|e_{ni}' A_n P_n| = O(1)$ and $|z_{i,n} A_n P_n| = O(1)$ uniformly in $i$, where $e_{ni}$ is the $i$th unit column vector of dimension $n$, and $z_{i,n}$ is the $i$th row of $Z_n$.

3) If both the row and column sums of $A_n$ are uniformly bounded and $V_n = (v_1, \dots, v_n)'$, where the $v_i$'s are i.i.d. with zero mean and finite variance $\sigma^2$, then $\frac{1}{n} P_n' A_n V_n = o_P(1)$.

Proof: Let $c_1$ and $c_2$ be constants such that $|p_{ni}| \le c_1$ and $|q_{ni}| \le c_2$. For 1), there exists a constant $c_3$ such that $\frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{n}|a_{n,ij}| \le c_3$. Hence, $|P_n' A_n Q_n| = |\sum_{i=1}^{n}\sum_{j=1}^{n} a_{n,ij} p_{ni} q_{nj}| \le c_1 c_2\sum_{i=1}^{n}\sum_{j=1}^{n}|a_{n,ij}| \le n c_1 c_2 c_3$. For 2), let $c_4$ be a constant such that $\sum_{j=1}^{n}|a_{n,ij}| \le c_4$ for all $n$ and $i$. It follows that $|e_{ni}' A_n P_n| = |\sum_{j=1}^{n} a_{n,ij} p_{nj}| \le c_1\sum_{j=1}^{n}|a_{n,ij}| \le c_1 c_4$. Because $Z_n$ is uniformly bounded in row sums, $\sum_{j=1}^{n}|z_{n,ij}| \le c_z$ for some constant $c_z$. It follows that $|z_{i,n} A_n P_n| \le \sum_{j=1}^{n}|z_{n,ij}|\cdot|e_{nj}' A_n P_n| \le (\sum_{j=1}^{n}|z_{n,ij}|) c_1 c_4 \le c_z c_1 c_4$. For 3), $\frac{1}{n} P_n' A_n V_n$ has zero mean and its variance is $E(\frac{1}{n} P_n' A_n V_n)^2 = \sigma^2\frac{1}{n^2} P_n' A_n A_n' P_n$. Because $A_n A_n'$ is uniformly bounded in both row and column sums, $\frac{1}{n^2} P_n' A_n A_n' P_n = O(\frac{1}{n})$ from 1). The result 3) follows from Chebyshev's inequality. Q.E.D.

Lemma A.6 Suppose that the column sums of An are uniformly bounded. Then, |wi,nAnbn| = O( knhn )

and |w0n,iAnbn| = O( knhn ) uniformly in i for any column vector bn = (bn1, . . . , bnn)0 such that

Pni=1 |bni| =

O(kn).

Proof: Let wn,ij =cn,ijhn. Because wn,ij = O(

1hn) uniformly in i and j, there exists a constant c1 such that

cn,ij ≤ c1 for all i, j and n. Because An is uniformly bounded in column sums, max1≤j≤nPn

k=1 |an,kj | ≤ c2for all n. It follows that

|wi,nAnbn| ≤ 1

hn|(

nXk=1

cn,ikan,k1, · · · ,nXk=1

cn,ikan,kn)bn| ≤ c1c2hn

nXl=1

|bnl| ≤ O(knhn).

Similarly, |w0n,iAnbn| = O( knhn ). Q.E.D.

We note that the order of terms in Lemma A.6 is the at-most order. For some special cases, other orders

can be sharper. For example, if Wn,n is uniformly bounded in row sums and the elements of the sequence

of vectors Anbn are uniformly bounded for all n, |wi,nAnbn| = O(1) from Lemma A.5. The usefulness of the

orders of Lemma A.6 depends on the order kn of bn as in the following Corollary.

Lemma A.7 Suppose that the column sums of An are uniformly bounded.

1) wi,nAnenj = O( 1hn ) and w0n,iAnenj = O( 1hn ) uniformly in i and j, where enj is the jth unit column

vector of the n-dimensional Euclidean space.

2) wi,nAnb0j,n = O( 1hn ) and w

0n,iAnb

0j,n = O( 1hn ) uniformly in i and j, where bj,n is the jth row of Bn,

when Bn is uniformly bounded in row sums.

Proof: The result 1) follows because l0nenj = 1 where ln = (1, . . . , 1)0. The result 2) holds because

Bn being uniformly bounded in row sums is equivalent to B0n being uniformly bounded in column sums.

Q.E.D.

Lemma A.8 Suppose that Wn,n and S−1n are uniformly bounded in row sums, and An is uni-

formly bounded in column sums. Then, the matrix Hn = S0−1n W 0

n,nAn has the properties that e0niH

mn enj(=

e0njH0mn eni) = O(

1hn) uniformly in i and j, and tr(Hm

n ) = tr(H0mn ) = O( nhn ) for any integer m ≥ 1.

22

Page 24: Abstract - College of Arts and Sciences1).pdfE-mail: lßee@econ.ohio-state.edu ... The reason weights matrices are row-normalized is to ensure that all weights are between 0 and 1

Furthermore, if Wn,n, S−1n and An are uniformly bounded in both row and column sums, then

e0ni(HnH0n)menj = O(

1hn) uniformly in i and j, and tr[(HnH

0n)m] = tr[(H 0

nHn)m] = O( nhn ). In addition, for

Gn =Wn,nS−1n , e0ni(G

0nGn)

menj = O(1hn).

Proof: Because S0−1n W 0

n,n = W 0n,nS

0−1n , it follows that Hn = W 0

n,nS0−1n An. As W 0

n,n, S0−1n and

An are all uniformly bounded in column sums, Hmn for any integer m ≥ 1 are uniformly bounded

in column sums. Therefore, e0niHmn enj = w0n,iS

0−1n AnH

m−1n enj = O( 1hn ) by Lemma A.7, and tr(H

mn ) =Pn

i=1 e0niH

mn eni = O(

nhn).

When Wn,n, S−1n and An are uniformly bounded in both row and column sums, Hn, H 0n and

HnH 0n for any integer m will be uniformly bounded in column sums. Therefore,

e0ni(HnH0n)menj = e

0niHnH

0n(HnH

0n)m−1enj = w0n,iS

0−1n AnH

0n(HnH

0n)m−1enj = O(

1

hn)

by Lemma A.7 and, hence, tr[(HnH0n)m] = O( nhn ). With the trace operator,

tr[(HnH0n)m] = tr[HnH

0n(HnH

0n)m−1] = tr[H 0

n(HnH0n)m−1Hn] = tr[(H 0

nHn)m].

Finally, for Gn =Wn,nS−1n , because G0n = S

0−1n W 0

n,n =W0n,nS

0−1n ,

e0ni(G0nGn)

menj = e0niG

0nGn(G

0nGn)

m−1enj = w0n,iS0−1n Gn(G

0nGn)

m−1enj = O(1

hn).

Q.E.D.

Lemma A.9 Suppose An are uniformly bounded either in row sums or in column sums. Then,

1) elements an,ij of An are uniformly bounded in i and j,

2) tr(Amn ) = O(n) for m ≥ 1, and

3) tr(AnA0n) = O(n).

Proof: If An is uniformly bounded in row sums, let c1 be the constant such that max1≤i≤nPn

j=1 |an,ij | ≤

c1 for all n. On the other hand, if An is uniformly bounded in column sums, let c2 be the constant such

that max1≤j≤nPn

i=1 |an,ij | ≤ c2 for all n. Therefore, |an,ij | ≤Pn

l=1 |an,il| ≤ c1 if An is uniformly bounded

in row sums; otherwise, |an,ij | ≤Pn

k=1 |an,kj | ≤ c2 if An is uniformly bounded in column sums. The result

1) implies immediately that tr(An) = O(n). If An is uniformly bounded in row (column) sums, then Amn for

m ≥ 2 is uniformly bounded in row (column) sums. Therefore, 1) implies also that tr(Amn ) = O(n). Finally,

as tr(AnA0n) =

Pni=1

Pnj=1 a

2n,ij , it follows that |tr(AnA0n)| ≤

Pni=1(

Pnj=1 |an,ij |)2 ≤ nc21 if An is uniformly

bounded in row sums; otherwise |tr(AnA0n)| ≤Pn

j=1(Pn

i=1 |an,ij |)2 ≤ nc22. Q.E.D.

23

Page 25: Abstract - College of Arts and Sciences1).pdfE-mail: lßee@econ.ohio-state.edu ... The reason weights matrices are row-normalized is to ensure that all weights are between 0 and 1

A.3 First and Second Moments of Quadratic Forms and Limiting Distribution

Lemma A.10 Let An = [aij ] be an n-dimensional square matrix. Suppose that v1, . . . , vn are i.i.d. with

zero mean, variance σ2 and Þnite fourth moment µ4. Then

1) E(V 0nAnVn) = σ2tr(An),

2) E(V 0nAnVn)2 = (µ4 − 3σ4)

Pni=1 a

2ii + σ

4[tr2(An) + tr(AnA0n) + tr(A

2n)], and

3) var(V 0nAnVn) = (µ4 − 3σ4)Pni=1 a

2ii + σ

4[tr(AnA0n) + tr(A

2n)].

In particular, if vs are normally distributed, then E(V 0nAnVn)2 = σ4[tr2(An) + tr(AnA

0n) + tr(A

2n)] and

var(V 0nAnVn) = σ4[tr(AnA

0n) + tr(A

2n)].

Proof: The result in 1) is trivial. For the second moment,

E(V 0nAnVn)2 = E(

nXi=1

nXj=1

aijvivj)2 = E(

nXi=1

nXj=1

nXk=1

nXl=1

aijaklvivjvkvl).

Because vs are i.i.d. with zero mean, E(vivjvkvl) will not vanish only when i = j = k = l, (i = j) 6= (k = l),

(i = k) 6= (j = l), and (i = l) 6= (j = k). Therefore,

E(V 0nAnVn)2 =

nXi=1

a2iiE(v4i ) +

nXi=1

nXj 6=i

aiiajjE(v2i v2j ) +

nXi=1

nXj 6=i

a2ijE(v2i v2j ) +

nXi=1

nXj 6=i

aijajiE(v2i v2j )

= (µ4 − 3σ4)nXi=1

a2ii + σ4[

nXi=1

nXj=1

aiiajj +nXi=1

nXj=1

a2ij +nXi=1

nXj=1

aijaji]

= (µ4 − 3σ4)nXi=1

a2ii + σ4[tr2(An) + tr(AnA

0n) + tr(A

2n)].

The result 3) follows from var(V 0nAnVn) = E(V 0nAnVn)2 − E2(V 0nAnVn) and the results of 1) and 2). When

vs are normally distributed, µ4 = 3σ2. Q.E.D.

Lemma A.11 Under the assumptions in Lemma A.10, if An are uniformly bounded either in row

sums or in column sums, then

1) E(V 0nAnVn) = O(n),

2) E(V 0nAnVn)2 = O(n2),

3) var(V 0nAnVn) = O(n),

4) V 0nAnVn = OP (n), and

5) 1nV

0nAnVn − 1

nE(V0nAnVn) = oP (1).

Proof: The results 1)-3) follow immediately from Lemmas A.9 and A.10. For any ² > 0, by Chebyshevs

inequality and 3), P (| 1nV 0nAnVn − E( 1nV 0nAnVn)| > ²) ≤ var( 1nV0nAnVn)

²2 = O( 1n ). That is, 1nV

0nAnVn =

E( 1nV0nAnVn) + oP (1) = OP (1) by 1), and

1nV

0nAnVn − E( 1nV 0nAnVn) = oP (1). Q.E.D.

24

Page 26: Abstract - College of Arts and Sciences1).pdfE-mail: lßee@econ.ohio-state.edu ... The reason weights matrices are row-normalized is to ensure that all weights are between 0 and 1

Lemma A.12 Assume that Wn,n, S−1n and An are uniformly bounded in both row and column

sums. The v1, . . . , vn are i.i.d. with zero mean and Þnite variance σ2.

1) Let Hn = S0−1n W 0

n,nAn. Then E(V0nHnVn) = O(

nhn), var(V 0nHnVn) = O(

nhn), and V 0nHnVn = OP (

nhn).

Furthermore, if limn→∞ hnn = 0, then hn

n V0nHnVn − hn

n E(V0nHnVn) = oP (1).

2) With Yn = S−1n Vn, for any nonnegative integers r1 and r2 such that r1 + r2 ≥ 1, E(Y 0nW

0r1n,nW

r2n,nYn) =

O( nhn ), var(Y0nW

0r1n,nW

r2n,nYn) = O( nhn ), and Y

0nW

0r1n,nW

r2n,nYn = OP (

nhn). Furthermore, hn

n Y0nW

0r1n,nW

r2n,nYn −

hnn E(Y

0nW

0r1n,nW

r2n,nYn) = oP (1) if limn→∞ hn

n = 0.

Proof: Lemma A.8 implies that E(V 0nHnVn) = σ2tr(Hn) = O(

nhn). By Lemmas A.10 and A.8,

var(V 0nHnVn) = (µ4 − 3σ4)nXi=1

(e0niHneni)2 + σ4[tr(HnH

0n) + tr(H

2n)] = O(

n

hn).

As E((V 0nHnVn)2) = var(V 0nHnVn)+E

2(V 0nHnVn) = O((nhn)2), the generalized Chebyshev inequality implies

that P (hnn |V 0nHnVn| ≥ M) ≤ 1M2 (

hnn )

2E(|V 0nHnVn|2) = 1M2O(1) and, hence,

hnn V

0nHnVn = OP (1). Finally,

because var(hnn V0nHnVn) = O(hnn ), the Chebyshev inequality implies that

hnn V

0nHnVn − hn

n E(V0nHnVn) =

oP (1) when limn→∞ hnn = 0. These prove the results in 1). The results of 2) follow from 1) by letting

Hn = S0−1n W

0r1n,nW

r2n,nS

−1n , as Y 0nW

0r1n,nW

r2n,nYn = V

0nHnVn. Q.E.D.

Lemma A.13 Suppose that An is a sequence of symmetric matrices with row and column sums

uniformly bounded in absolute value. The v1, · · · , vn are i.i.d. random variables with zero mean and Þnite

variance σ2, and its moment E(|v|4+2δ) for some δ > 0 exists. Let σ2Qnbe the variance of Qn where

Qn = V0nAnVn − σ2tr(An). Assume that the variance σ2Qn

is bounded away from zero at the rate nhnand the

entries an,ij of An are of order O(1hn). If limn→∞ h

1+ 2δ

n

n = 0, then Qn

σQn

D−→ N(0, 1).

Proof: The asymptotic distribution of the quadratic random form Qn can be established via the mar-

tingale central limit theorem. Our proof of this Lemma follows closely the original arguments in Kelejian

and Prucha (1999b). In their paper, σ2Qnis assumed to be bounded away from zero with the n-rate. Our

subsequent arguments modify theirs to take into account the different rate of σ2Qn.

Qn can be expanded into Qn =Pn

i=1 an,iiv2i + 2

Pni=1

Pi−1j=1 an,ijvivj − σ2tr(An) =

Pni=1 Zni, where

Zni = an,ii(v2i −σ2)+2vi

Pi−1j=1 an,ijvj . DeÞne σ-Þelds Ji =< v1, . . . , vi > generated by v1, . . . , vi. Because vs

are i.i.d. with zero mean and Þnite variance, E(Zni|Ji−1) = an,ii(E(v2i )−σ2)+2E(vi)Pi−1

j=1 an,ijvj = 0. The

(Zni,Ji)|1 ≤ i ≤ n, 1 ≤ n forms a martingale difference double array. We note that σ2Qn=Pni=1E(Z

2ni)

as Zni are martingale differences. Alsohnn σ

2Qn

= O(1). DeÞne the normalized variables Z∗ni = Zni/σQn .

25

Page 27: Abstract - College of Arts and Sciences1).pdfE-mail: lßee@econ.ohio-state.edu ... The reason weights matrices are row-normalized is to ensure that all weights are between 0 and 1

The (Z∗ni,Ji)|1 ≤ i ≤ n, 1 ≤ n is a martingale difference double array and Qn

σQn=Pni=1 Z

∗ni. In order for

the martingale central limit theorem to be applicable, we would show that there exists a δ∗ > such thatPni=1E|Z∗ni|2+δ

∗tends to zero as n goes to inÞnity. Secondly, it will be shown that

Pni=1E(Z

∗2ni |Ji−1)

p→ 1.

For any positive constants p and q such that 1p +

1q = 1, |Zni|q ≤ (|an,ii| · |v2i − σ2|+ 2|vi|

Pi−1j=1 |an,ij | ·

|vj |)q = (|an,ii| 1p |an,ii| 1q |v2i −σ2|+Pi−1

j=1 |an,ij |1p |an,ij | 1q 2|vi| · |vj |)q. The Holder inequality for inner products

applied to the last term implies that

|Zni|q ≤

iXj=1

(|an,ij | 1p )p

1p(|an,ii| 1q |v2i − σ2|)q + i−1X

j=1

(|an,ij | 1q 2|vi| · |vj |)q

1q

q

=

iXj=1

|an,ij |

qp|an,ii| · |v2i − σ2|q + i−1X

j=1

|an,ij |2q|vi|q|vj |q .

As An are uniformly bounded in row sums, there exists a constant c1 such thatPn

j=1 |an,ij | ≤ c1 for

all i and n. Hence |Zni|q ≤ 2qcqp

1 (|an,ii| · |v2i − σ2|q + |vi|qPi−1

j=1 |an,ij ||vj|q). Take q = 2 + δ. Let

cq > 1 be a Þnite constant such that E(|v|q) ≤ cq and E(|v2 − σ2|q) ≤ cq. Such a constant exists un-

der the moment conditions of v. It follows thatPn

i=1E|Zni|q ≤ 2qcqp

1 c2q

Pni=1

Pij=1 |an,ij | = O(n). AsPn

i=1E|Z∗ni|2+δ = 1σ2+δQn

Pni=1E|Zni|2+δ and σ2+δQn

= (hnn σ2Qn)1+

δ2 ( nhn )

1+ δ2 = O(( nhn )

1+ δ2 ),

Pni=1E|Z∗ni|2+δ =

O(h1+ δ

2n

nδ2) = O(h

1+ 2δ

n

n )δ2 , which goes to zero as n tends to inÞnity.

It remains to show thatPn

i=1E(Z∗2ni |Ji−1)

p→ 0. As E(Z2ni|Ji−1) = (µ4−σ4)a2n,ii+4σ2(Pi−1

j=1 an,ijvj)2+

2µ3an,ii(Pi−1

j=1 an,ijvj), and E(Z2ni) = (µ4 − σ4)a2n,ii + 4σ4

Pi−1j=1 a

2n,ij , it follows that

E(Z2ni|Ji−1)− E(Z2ni) = 4σ2(i−1Xj=1

i−1Xk 6=j

an,ijan,ikvjvk +

i−1Xj=1

a2n,ij(v2j − σ2)) + 2µ3an,ii(

i−1Xj=1

an,ijvj)

andPn

i=1E(Z∗2ni |Ji−1)− 1 = 1

σ2Qn

Pni=1[E(Z

2ni|Ji−1)− E(Z2ni)] = 4σ2

hnn σ2

Qn

(H1n +H2n) +2µ3

hnn σ2

Qn

H3n, where

H1n =hnn

nXi=1

i−1Xj=1

i−1Xk 6=j

an,ijan,ikvjvk, H2n =hnn

nXi=1

i−1Xj=1

a2n,ij(v2j − σ2),

and H3n =hnn

Pni=1

Pi−1j=1 an,iian,ijvj . We would like to show that Hjn for j = 1, 2, 3, converge in probability

to zero. E(H3n) = 0. By exchanging summations,Pni=1

Pi−1j=1 an,iian,ijvj =

Pn−1j=1 (

Pni=j+1 an,iian,ij)vj .

Thus, E(H23n) =

h2nn2 σ

2Pn−1

j=1 (Pn

i=j+1 an,iian,ij)2 ≤ σ2 h2nn2 max1≤i≤n |an,ii|2 ·

Pn−1j=1 (

Pni=j+1 |an,ij |)2 = O( 1n )

because maxi,j |an,ij |2 = O( 1h2n ) andPn−1j=1 (

Pni=j+1 |an,ij |)2 = O(n). E(H2n) = 0 and H2n can be rewritten

into H2n =hnn

Pn−1j=1 (

Pni=j+1 a

2n,ij)(v

2j − σ2). Thus

E(H22n) = (

hnn)2(µ4 − σ4)

n−1Xj=1

(

nXi=j+1

a2n,ij)2 ≤ (hn

n)2(µ4 − σ4) max

1≤i,j≤n|an,ij |2 ·

n−1Xj=1

(

nXi=j+1

|an,ij |)2 = O( 1n).

26

Page 28: Abstract - College of Arts and Sciences1).pdfE-mail: lßee@econ.ohio-state.edu ... The reason weights matrices are row-normalized is to ensure that all weights are between 0 and 1

We conclude that H3n = oP (1) and H2n = oP (1). E(H1n) = 0 but its variance is relatively more

complex than that of H2n and H3n. By rearranging terms, H1n =hnn

Pni=1

Pi−1j=1

Pi−1k 6=j an,ijan,ikvjvk =

hnn

Pn−1j=1

Pn−1k 6=j Sn,jkvjvk, where Sn,jk =

Pni=maxj,k+1 an,ijan,ik. The variance of H1n is

E(H21n) = (

hnn)2n−1Xj=1

n−1Xk 6=j

n−1Xr=1

n−1Xs6=r

Sn,jkSn,rsE(vjvkvrvs).

As k 6= j and r 6= s, E(vjvkvrvs) 6= 0 only for the cases that (j = r) 6= (k = s) and (j = s) 6= (k = r). The

variance of H1n can be simpliÞed and

E(H21n) = 2σ

4(hnn)2n−1Xj=1

n−1Xk 6=j

S2n,jk ≤ 2σ4(hnn)2n−1Xj=1

n−1Xk 6=j(

nXi1=1

nXi2=1

|an,i1jan,i1kan,i2jan,i2k|)

≤ 2σ4(hnn)2

nXi1=1

(

nXj=1

nXk 6=j

|an,i1jan,i1k|) · max1≤j≤n

nXi2=1

|an,i2j | ·maxi2,k

|an,i2k|

≤ 2σ4(hnn)2 max1≤j≤n

nXi2=1

|an,i2j | ·maxi2,k

|an,i2k| ·nX

i1=1

(nXj=1

|an,i1j | ·nXk=1

|an,i1k|) = O(hnn),

because An is uniformly bounded in row and column sums and an,ij = O(1hn) uniformly in i and j. Thus,

H1n = oP (1) as limn→∞ hnn = 0 implied by the condition limn→∞ h

(1+2δ)

n

n = 0.

As Hjn, j = 1, 2, 3, are oP (1) and limn→∞ hnn σ

2Qn> 0,

Pni=1 E(Z

∗2ni |Ji−1) converges in probability to 1.

The central limit theorem for the martingale difference double array is thus applicable (see, Hall and Heyde,

1980; Potscher and Prucha, 1997) to establish the result. Q.E.D.

Lemma A.14 Suppose that An is a square matrix with its column sums being uniformly bounded and

elements of the n × k matrix Cn are uniformly bounded. Then, 1√nC 0nAnVn = OP (1), Furthermore, if the

limit of 1nC0nAnA

0nCn exists and it is positive deÞnite, then

1√nC0nAnVn

D→ N(0,σ20 limn→∞ 1nC

0nAnA

0nCn).

Proof: This is Lemma A.2 in Lee (1999b). These results can be established by Chebyshevs inequality

and Liapounov double array central limit theorem. Q.E.D.

27

Page 29: Abstract - College of Arts and Sciences1).pdfE-mail: lßee@econ.ohio-state.edu ... The reason weights matrices are row-normalized is to ensure that all weights are between 0 and 1

Appendix B: Proofs of Theorems

Proof of Theorem 1. We will show that 1n lnLn(θ)−E( 1n lnLn(θ)) = oP (1) at each θ and is stochas-

tically equicontinuous. Let Fn(λ) = S0−1n S0n(λ)Sn(λ)S

−1n . As

E(1

nlnLn(θ)) = −1

2ln(2π)− 1

2lnσ2 +

1

nln |Sn(λ)|− σ20

2σ2ntr(Fn(λ)),

it follows that

1

nlnLn(θ)−E( 1

nlnLn(θ)) = − 1

2σ2n[Y 0nS

0n(λ)Sn(λ)Yn − σ20tr(Fn(λ))] = −

1

2σ2n[V 0nFn(λ)Vn− σ20tr(Fn(λ))].

With the formula of Lemma A.10, var( 1n lnLn(θ)) =1

4σ4n2 (µ4 − 3σ40)Pn

i=1 F2n,ii(λ) + 2σ

40tr(Fn(λ)F

0n(λ)).

By Lemma A.9, F 2n,ii(λ) = O(1) and tr(Fn(λ)F0n(λ)) = O(n). Therefore, var( 1n lnLn(θ)) = O( 1n ). The

Chebyshev inequality implies that 1n lnL(θ)− E( 1n lnL(θ)) = oP (1) at each θ.

Let θ and θ be two values of the parameter vector, Tn1 = S0−1n W 0

n,nS−1n and Tn2 = S

0−1n W 0

n,nWn,nS−1n .

We have1

σ2Y 0nS

0n(λ)Sn(λ)Yn −

1

σ2Y 0nS

0n(λ)Sn(λ)Yn

=(σ2 − σ2)σ2σ2

Y 0nS0n(λ)Sn(λ)Yn +

(λ− λ)σ2

[2Y 0nWn,nYn − (λ+ λ)Y 0nW 0n,nWn,nYn],

and

1

σ2tr(Fn(λ))− 1

σ2tr(Fn(λ)) =

1

σ2[tr(Fn(λ))− tr(Fn(λ))] + 1

σ2σ2(σ2 − σ2)tr(Fn(λ))

=1

σ2(λ− λ)[2tr(Tn1)− (λ+ λ)tr(Tn2)] + 1

σ2σ2(σ2 − σ2)tr(Fn(λ)).

Hence,

|[ 1nlnLn(θ)− E( 1

nlnLn(θ))]− [ 1

nlnLn(θ)− E( 1

nlnLn(θ))]|

≤ | 1

2nσ2Y 0nS

0n(λ)Sn(λ)Yn −

1

2nσ2Y 0nS

0n(λ)Sn(λ)Yn|+ |

σ202nσ2

tr(Fn(λ))− σ202nσ2

tr(Fn(λ))|

≤ |σ2 − σ2|2σ2σ2

½| 1nY 0nYn|+ 2|

1

nY 0nWn,nYn|+ | 1

nY 0nW

0n,nWn,nYn|

¾+ 2

|λ− λ|σ2

½| 1nY 0nWn,nYn|+ | 1

nY 0nW

0n,nWn,nYn|

¾+ 2

|λ− λ|σ20σ2

| 1ntr(Tn1)|+ | 1

ntr(Tn2)|

+ |σ2 − σ2| σ202σ2σ2

½1

ntr[S

0−1n S−1n ] + 2| 1

ntr(Tn1)|+ | 1

ntr(Tn2)|

¾.

Lemma A.11 implies that | 1nY

0nYn| = | 1nV 0nS

0−1n S−1n Vn| = OP (1). | 1nY 0n(W 0

n,n +Wn,n)Yn| = OP (1hn) and

| 1nY 0nW 0n,nWn,nYn| = OP ( 1hn ) by Lemma A.12. Also, 1n tr[S

0−1n S−1n ] = O(1) from Lemma A.9. Lemma A.8

implies that | 1n tr(Tn1)| = O( 1hn ) and | 1n tr(Tn2)| = O( 1hn ). These imply the stochastic equicontinuity of

1n lnLn(θ)− E( 1n lnLn(θ)).

28

Page 30: Abstract - College of Arts and Sciences1).pdfE-mail: lßee@econ.ohio-state.edu ... The reason weights matrices are row-normalized is to ensure that all weights are between 0 and 1

With the pointwise probability convergence and stochastic equicontinuity, 1n lnLn(θ) − E( 1n lnLn(θ))

p→ 0 uniformly in θ ∈ Θ (Davidson, 1994, Theorem 21.9). Q.E.D.

Proof of Theorem 2. Because hnn (lnLn(λ) − Qn(λ)) = −hn

2 (ln σ2n(λ) − lnσ2n(λ)), it follows by the

mean value theorem that

hnn(lnLn(λ)−Qn(λ)) = − hn

2σ2n(λ)(σ2n(λ)− σ2n(λ)),

where σ2n(λ) lies between σ2n(λ) and σ

2n(λ). Because

hn[σ2n(λ)− σ2n(λ)] =

hnnV 0nS

0−1n S−1n Vn − σ20

hnntr(S

0−1n S−1n )

− λhnnV 0nS

0−1n (W 0

n,n +Wn,n)S−1n Vn − σ20

hnntr[S

0−1n (W 0

n,n +Wn,n)S−1n ]

+ λ2hnnV 0nG

0nGnVn − σ20

hnntr[G0nGn],

Lemma A.12 implies that supλ |hn[σ2n(λ) − σ2n(λ)]| = oP (1) on any bounded set of λ. In particular, this

implies that σ2n(λ)−σ2n(λ) converges to zero in probability uniformly in λ in any bounded set. As σ2n(λ) lies

between σ2n(λ) and σ2n(λ), it follows that

1σ2n(λ)

≤ max 1σ2n(λ)

, 1σ2n(λ)

≤ 1σ2n(λ)

+ 1σ2n(λ)

and

|hnn(lnLn(λ)−Qn(λ))| ≤ 1

2(1

σ2n(λ)+

1

σ2n(λ))|hn(σ2n(λ)− σ2n(λ))|.

One can consider the stochastic boundedness of 1σ2n(λ)

and 1σ2n(λ)

. Because Sn(λ) = Sn − (λ − λ0)Wn,n,

σ2n(λ) =σ20n tr(S

0−1n S0n(λ)Sn(λ)S−1n ) = σ20 [1− 2(λ− λ0) tr(Gn)

n + (λ− λ0)2 tr(G0nGn)n ], i.e.,

σ2n(λ)− σ20 =σ20n[2(λ0 − λ)tr(Gn) + (λ0 − λ)2tr(G0nGn)].

It follows that |σ2n(λ)− σ20 | ≤ O(1) · |λ− λ0| on any bounded set of λ. As σ20 > 0, supλ∈Λ | 1σ2n(λ)

| <∞ on a

small neighborhood Λ of λ0. Furthermore, because σ2n(λ)−σ2n(λ) = op(1) uniformly in λ in any bounded set,

one has that supλ∈Λ | 1σ2n(λ)

| = Op(1). Thus, hnn (lnLn(λ)−Qn(λ)) converges to zero in probability uniformly

in λ in Λ.

Since σ2n(λ) is a quadratic function of λ, it is easy to see that it has a minimum at λ = λ0 +tr(Gn)

tr(G0nGn)

and the minimum is σ2n,min = σ20 [1 − tr2(Gn)n·tr(G0

nGn)]. Under the assumption that lim supn→∞

tr2(Gn)n·tr(G0

nGn)< 1,

σ2n,min is bounded away from zero for large enough n. Because σ2n(λ)−σ2n(λ) converges to zero in probability

uniformly in λ, σn(λ) is bounded away from zero in probability on any bounded set B of λ. For the case

that limn→∞ hn = ∞, because tr2(Gn)n·tr(G0

nGn)= O( 1hn ), the speciÞed assumption is always satisÞed. Hence,

supλ∈B | 1σ2n(λ)

| = O(1) and supλ∈B | 1σ2n(λ)

| = Op(1). Q.E.D.

29

Page 31: Abstract - College of Arts and Sciences1).pdfE-mail: lßee@econ.ohio-state.edu ... The reason weights matrices are row-normalized is to ensure that all weights are between 0 and 1

Proof of Theorem 3. As hnn (Qn(λ)−Qn(λ0)) = −hn2 (lnσ

2n(λ)− lnσ20) + hn

n (ln |Sn(λ)|− ln |Sn(λ0)|),

by expanding this function at λ0 up to the third order terms,

hnn(Qn(λ)−Qn(λ0))

=hn2n[2tr2(Gn)

n− tr(G0nGn)− tr(G2n)](λ− λ0)2 −

8σ60σ6n(λ)

hnn3tr3(G0nSn(λ)S

−1n )

+6σ40σ4n(λ)

hnntr(G0nSn(λ)S

−1n )

tr(G0nGn)n

+2hnntr[(Wn,nS

−1n (λ))3] (λ− λ0)

3

3!.

Because tr(G0nSn(λ)S−1n ) is linear in λ, it is of order O( nhn ) uniformly in λ in any bounded set of λ. From

Lemma A.8, tr[(Wn,nS−1n (λ))3] is of order O( nhn ) uniformly in λ in a neighborhood Λ1 of λ0. Therefore,

there exists a constant c1 > 0 such that, for large enough n,

hnn(Qn(λ)−Qn(λ0)) ≤ hn

2n[2tr2(Gn)

n− tr(G0nGn)− tr(G2n)](λ− λ0)2 + c1|λ− λ0|3.

On the other hand, hn2n [

2tr2(Gn)n − tr(G0nGn) − tr(G2n)] ≤ −c2 for some c2 > 0 for large n. Thus, let

Λ2 = λ : |λ− λ0| < c22c1 ∩ Λ1 and one has

hnn(Qn(λ)−Qn(λ0)) < −c2

2(λ− λ0)2.

The identiÞcation uniqueness condition holds on Λ2. Theorem 2 implies that hnn (lnLn(λ) − lnLn(λ0)) −

hnn (Qn(λ) − Qn(λ0)) converges to zero in probability uniformly in a neighborhood Λ3 of λ0. By taking

Λ = Λ2 ∩ Λ3, the consistency of λn follows from the uniform convergence and the identiÞcation uniqueness

condition (White 1994). Q.E.D.

Proof of Theorem 4. Because Qn(λ)−Qn(λ0) = −hn2 (lnσ

2n(λ)− lnσ20)+ hn

n (ln |Sn(λ)|− ln |Sn(λ0)|),

∂2

∂λ2[hnn(Qn(λ)−Qn(λ0))] = (hn

n) 2σ

40

σ4n(λ)

tr2(G0nSn(λ)S−1n )

n− σ20σ2n(λ)

tr(G0nGn)− tr[(Wn,nS−1n (λ))2].

As σ2n(λ)−σ20 = O( 1hn ) and hnn2 tr

2(G0nSn(λ)S−1n ) = O( 1hn ) uniformly in any bounded set of λ, it follows that

as limn→∞ hn =∞,

lim supn→∞

∂2

∂λ2[hnn(Qn(λ)−Qn(λ0))] = − lim sup

n→∞hnntr(G0nGn) + tr[(Wn,nS

−1n (λ))2]

uniformly in λ in any bounded set.

Note that Wn,nSn(λ) = Sn(λ)Wn,n implies S−1n (λ)Wn,n = Wn,nS

−1n (λ). When W 0

n,n = Wn,n, one has

S0n(λ) = Sn(λ) and

[Wn,nS−1n (λ)]0 = [S−1n (λ)Wn,n]

0 =W 0n,nS

0−1n (λ) =Wn,nS

−1n (λ),

30

Page 32: Abstract - College of Arts and Sciences1).pdfE-mail: lßee@econ.ohio-state.edu ... The reason weights matrices are row-normalized is to ensure that all weights are between 0 and 1

that, in turn, implies tr[(Wn,nS−1n (λ))2] = tr[Wn,nS

−1n (λ)][Wn,nS

−1n (λ)]0 ≥ 0. We conclude that when

Wn,n is symmetric and limn→∞ hn =∞, hnn [Qn(λ)−Qn(λ0)] is a concave function in λ on any bounded set

of λ for large enough n.

When Wn,n = Λ−1n Dn, one has S

−1n (λ) = (In − λΛ−1n Dn)

−1 = (Λn − λDn)−1Λn. Therefore,tr[(Wn,nS

−1n (λ))2] = trΛ−1n Dn(Λn − λDn)−1Λn · Λ−1n Dn(Λn − λDn)−1Λn

= trDn(Λn − λDn)−1 ·Dn(Λn − λDn)−1.Also, S−1n (λ)Wn,n =Wn,nS

−1n (λ) implies that (Λn − λDn)−1Dn = Λ−1n Dn(Λn − λDn)−1Λn. It follows that

tr[(Wn,nS−1n (λ))2] = trDnΛ−1n Dn(Λn − λDn)−1Λn(Λn − λDn)−1

= trΛ−1/2n Dn(Λn − λDn)−1Λn(Λn − λDn)−1DnΛ−1/2n .

Denote En = Λ−1/2n Dn(Λn − λDn)−1Λ1/2n . As D0

n = Dn, E0n = Λ

1/2n (Λn − λD0

n)−1D0

nΛ−1/2n = Λ

1/2n (Λn −

λDn)−1DnΛ

−1/2n . Consequently, tr[(Wn,nS

−1n (λ))2] = tr(EnE

0n) ≥ 0. For large enough n, hn

n [Qn(λ) −

Qn(λ0)] is also a concave function on any bounded set of λ for this case. λ0 is the global maximizer as

Qn(λ)−Qn(λ0) < 0 for any λ 6= λ0 by Jensens inequality. Hence, the local identiÞcation of λ0 implies the

global identiÞcation of λ0.

The consistency of λn follows from the global identiÞcation and the uniform convergence in Theorem 2.

Q.E.D.

Proof of Proposition 1. By Lemma A.10, E(V 0nGnVn)2 = (µ4 − 3σ40)Pn

i=1G2n,ii + σ

40 [tr

2(Gn) +

tr(GnG0n) + tr(G

2n)]. Therefore,

E(Pn(λ0)) =2

nσ40E(V 0nGnVn)

2 − 1

σ20E(V 0nG

0nGnVn)− tr(G2n)

=2

n(µ4 − 3σ40σ40

)

nXi=1

G2n,ii + [tr2(Gn) + tr(GnG

0n) + tr(G

2n)]− tr(G0nGn)− tr(G2n)

=2

ntr2(Gn)− tr(G0nGn)− tr(G2n) +

2

n[tr(GnG

0n) + tr(G

2n)] +

2

n(µ4 − 3σ40σ40

)nXi=1

G2n,ii.

By Lemma A.8,Pn

i=1G2n,ii = O( nh2n

), and tr(G2n) and tr(GnG0n) are of order O(

nhn). These give the Þrst

part of the results.

Eq.(4.6) can be rewritten as hnn Pn(λ0) =

2σ40(h

12n

n V0nGnVn)

2 − 1σ20

hnn V

0nG

0nGnVn − hn

n tr(G2n). The proba-

bility limit of hnn Pn(λ0) will follow from those of h12n

n V0nGnVn and

hnn V

0nG

0nGnVn. The term

h12n

n (V0nGnVn −

σ20tr(Gn)) has zero mean and its variance is

var(h12n

n(V 0nGnVn)) =

hnn2(µ4 − 3σ40)

nXi=1

G2n,ii + σ40 [tr(GnG

0n) + tr(G

2n)] = O(

1

n)

31

Page 33: Abstract - College of Arts and Sciences1).pdfE-mail: lßee@econ.ohio-state.edu ... The reason weights matrices are row-normalized is to ensure that all weights are between 0 and 1

by Lemmas A.10 and A.8. By the Chebyshev inequality, h12n

n V0nGnVn − σ20 h

12n

n tr(Gn)p→ 0. By the continuity

mapping theorem (see, White 1984, p.25), (h12n

n V0nGnVn)

2− (σ20 h12n

n tr(Gn))2 p→ 0. The mean of hnn V

0nG

0nGnVn

is σ20hnn tr(G

0nGn) and its variance is

var[hnn(V 0nG

0nGnVn)] = (

hnn)2(µ4 − 3σ40)

nXi=1

(G0nGn)2ii + 2σ

40tr((G

0nGn)

2) = O(hnn).

Hence, hnn V0nG

0nGnVn − σ20 hnn tr(G0nGn)

p→ 0. The probability limit of hnn Pn(λ0) follows by combining these

terms. Because tr(Gn) = O(nhn), hnn

tr2(Gn)n = O( 1hn ) goes to zero if limn→∞ hn =∞. Q.E.D.

Proof of Proposition 2. From (4.3) and (4.6), hnn [∂2 lnLn(λn)

∂λ2 − Pn(λ0)] = 2∆n1 +∆n2 +∆n3 where

∆n1 =1

σ4n(λn)

(1

nY 0nW

0n,nSn(

λn)Yn)(hnnY 0nW

0n,nSn(

λn)Yn)− 1

σ40(1

nY 0nW

0n,nSnYn)(

hnnY 0nW

0n,nSnYn),

∆n2 = (1

σ2n(λn)

− 1σ20)hnn Y

0nW

0n,nWn,nYn, and ∆n3 =

hnn [tr(G

2n(λn)) − tr(G2n)]. The ∆n1 can be decomposed

into two terms as ∆n1 = ∆n11 +∆n12 where

∆n11 = (1

σ4n(λn)

− 1

σ40)(1

nY 0nW

0n,nSn(

λn)Yn)(hnnY 0nW

0n,nSn(

λn)Yn),

and

∆n12 =1

σ40[(1

nY 0nW

0n,nSn(

λn)Yn)(hnnY 0nW

0n,nSn(

λn)Yn)− ( 1nY 0nW

0n,nSnYn)(

hnnY 0nW

0n,nSnYn)].

We would like to show that all these terms converge in probability to zero.

As σ2n(λn) =

1nY

0nS

0n(λn)Sn(λn)Yn =

1nY

0nYn − 2λn( 1nY 0nWn,nYn) + λ

2n(

1nY

0nW

0n,nWn,nYn), Lemma A.12

and that λnp→ λ0 imply

σ2n(λn)− σ2n(λ0) = 2(λ0 − λn)(

1

nY 0nWn,nYn) + (λ

2n − λ20)(

1

nY 0nW

0n,nWn,nYn) = oP (1).

As σ2n(λ0) =1nV

0nVn

p→ σ20 by the strong law of large numbers for i.i.d. variables, we conclude that

σ2n(λn)

p→ σ20 . Similarly, 1nY

0nW

0n,nSn(

λn)Yn = 1nY

0nW

0n,nSnYn − (λn − λ0) 1nY 0nW 0

n,nWn,nYn = OP (1hn)

and hnn Y

0nW

0n,nSn(

λn)Yn =1nY

0nW

0n,nSnYn − (λ − λ0) 1nY 0nW 0

n,nWn,nYn = OP (1). It follows by the Slutsky

lemma that ∆n11p→ 0. By expanding the products in ∆n12,

σ40∆n12 =

·2(λ0 − λn)( 1

nY 0nW

0n,nYn) + (

λ2n − λ20)(1

nY 0nW

0n,nWn,nYn)

¸(hnnY 0nW

0n,nWn,nYn) = oP (1).

Thus, ∆n1p→ 0. ∆n2

p→ 0 because ∆n2 = ( 1σ2n(

λn)− 1

σ20)(hnn Y

0nW

0n,nWn,nYn) = oP (1). By the mean-value

theorem, tr(G2n(λn)) = tr(G

2n) + 2tr(G

3n(λn)) · (λn − λ0), where λn lies between λn and λ0. As S−1n (λ) is

32

Page 34: Abstract - College of Arts and Sciences1).pdfE-mail: lßee@econ.ohio-state.edu ... The reason weights matrices are row-normalized is to ensure that all weights are between 0 and 1

uniformly bounded in row sums uniformly in λ in a neighborhood of λ0, G3n(λn) is uniformly bounded in row

sums. It follows from Lemma A.8 that tr(G3n(λn)) = OP (nhn). Hence ∆n3 = 2

hnn tr(G

3n(λn)) · (λn − λ0) =

oP (1). Q.E.D.

Proof of Proposition 3. It follows from Propositions 1 and 2 that

hnn

∂2 lnLn(λn)

∂λ2=hnn

"∂2 lnLn(λn)

∂λ2− Pn(λ0)

#+hnnPn(λ0)

p−→ limn→∞

hnn[2

ntr2(Gn)− tr(GnG0n)− tr(G2n)].

Q.E.D.

Proof of Proposition 4. The asymptotic distribution of the quadratic random form follows from

Lemma A.13. The matrix Cn in Qn can be replaced by a symmetric matrix An =12 (Cn + C

0n) such that

Qn = V0nAnVn. For this case, E(Qn) = 0 because tr(An) = 0. The entries of Cn are of order O(

1hn) and so

are those of An. Alsohnn σ

2Qn= O(1) and its limit is positive. So An satisÞes the conditions of Lemma A.13.

Under Assumption 30, limn→∞h1+ηn

n = 0 for some small η > 0. As v is normally distributed, its absolute

moment of any order exists. By taking δ in Lemma A.13 large enough such that 2δ ≤ η, Assumption 30

implies that limn→∞ h1+ 2

δn

n = 0. Hence, the result follows from Lemma A.13. Q.E.D.

Proof of Proposition 5. When vis are normally distributed, µ4 = 3σ40 and the Þrst term in the

variance formula in (4.9) drops out.

For the case that limn→∞ hn =∞, the simpliÞcation depends on the order of relevant terms. As Gn,ii =

O( 1hn ) by Lemma A.8, it implies that Cn,ii = Gn,ii−tr(Gn)n = O( 1hn ) and, consequently,

Pni=1 C

2n,ii = O(

nh2n).

Hence, hnnPn

i=1 C2n,ii = O(

1hn) goes to zero when limn→∞ hn =∞. Q.E.D.

Proof of Theorem 5. The result follows from Propositions 3 and 5. For the special cases, Ω = 0.

Q.E.D.

Proof of Theorem 6. As σ2n(λ0) =1nY

0nS

0nSnYn =

1nV

0nVn, it follows that

√n(σ2n(

λn)− σ20)

=√n(σ2n(

λn)− σ2n(λ0)) +√n(σ2n(λ0)− σ20)

= ((λn + λ0)1

nY 0nW

0n,nWn,nYn − 2 1

nY 0nWn,nYn) ·

√n(λn − λ0) +

√n(1

nV 0nVn − σ20)

= [(λn + λ0)

√hnnY 0nW

0n,nWn,nYn − 2

√hnnY 0nWn,nYn] ·

rn

hn(λn − λ0) + 1√

n

nXi=1

(v2i − σ20).

33

Page 35: Abstract - College of Arts and Sciences1).pdfE-mail: lßee@econ.ohio-state.edu ... The reason weights matrices are row-normalized is to ensure that all weights are between 0 and 1

From Propositions 3 and 5,q

nhn(λn − λ0) = Σλλ ·

qhnnQn

σ20+ oP (1). For any Þnite n, we note that

qhnnQn

σ20

and 1n

Pni=1(v

2i − σ20) are uncorrelated. This is so, because tr(Cn) = 0 implies that

E[V 0nCnVn(V0nVn − nσ20)] = E[(

nXi=1

nXj=1

cn,ijvivj)(

nXk=1

v2k)]

=

nXi=1

cn,iiE(v4i ) +

nXi=1

nXk 6=i

cn,iiE(v2i v2k)

= (µ4 + (n− 1)σ40)nXi=1

cn,ii = 0.

E(h12n

n Y0nWn,nYn) =

h12n

n σ20tr(S

0−1n Gn) and E(

h12n

n Y0nW

0n,nWn,nYn) =

h12n

n σ20tr(G

0nGn). Lemma A.12 implies

that var(h12n

n Y0nWn,nYn) = O(

1n ) and var(

h12n

n Y0nW

0n,nWn,nYn) = O(

1n ). It follows by the Chebyshev inequality

that

√hnnY 0nWn,nYn − h

12n

nσ20tr(S

0−1n Gn) = oP (1) and

√hnnY 0nW

0n,nWn,nYn − h

12n

nσ20tr(G

0nGn) = oP (1).

As S0−1n − λ0G0n = In, tr(S

0−1n Gn)− λ0tr(G0nGn) = tr(Gn). These imply that

√n(σ2n(

λn)− σ20) = −2σ20h12n

ntr(Gn)Σλλ ·

rhnn

Qnσ20

+1√n

nXi=1

(v2i − σ20) + oP (1).

Asq

nhn(λn − λ0) = Σλλ ·

qhnnQn

σ20+ oP (1), the results follow from the Martingale central limit theorem

as in the proof of Proposition 4. For the case that limn→∞ hn = ∞, ψ = 0 as√hnn tr(Gn) = O(

1√hn) → 0.

Q.E.D.

34

Page 36: Abstract - College of Arts and Sciences1).pdfE-mail: lßee@econ.ohio-state.edu ... The reason weights matrices are row-normalized is to ensure that all weights are between 0 and 1

Appendix C: The Method of Moments

The purpose of the Appendix is to illustrate that initial consistent estimates can be easily obtained.

These consistent estimates can be used as initial estimates for the QML method or for the construction of

a second-round estimator.

The method of moments can be applied to the spatial autoregressive process (2.1). We investigate

a method of moments estimator related to that of Kelejian and Prucha (1999a) under our setting. The

moments considered are based on functions of second moments of Vn, namely,

E(V 0nVn) = nσ20 , E[(Wn,nVn)

0(Wn,nVn)] = σ20tr(W

0n,nWn,n), E[(Wn,nVn)

0Vn] = σ20tr(Wn,n) = 0.

Equivalently, these moment equations are

E(1

n[Yn − (Wn,nYn)λ0]

0[Yn − (Wn,nYn)λ0])

= E(1

nY 0nYn)− 2E(

1

nY 0nWn,nYn)λ0 + E(

1

n(Wn,nYn)

0(Wn,nYn))λ20 = σ

20 ,

(C.1)

E(Y 0nW0n,nWn,nYn)− 2E(Y 0nW

02n,nWn,nYn)λ0 + E(Y

0nW

02n,nW

2n,nYn)λ

20 = σ

20tr(W

0n,nWn,n), (C.2)

and

E(Y 0nWn,nYn)− [E(Y 0nW 2n,nYn) + E(Y

0nW

0n,nWn,nYn)]λ0 + E(Y

0nW

02n,nWn,nYn)λ

20 = 0. (C.3)

By concentrating these moments with respect to σ2, (C.1) and (C.2) imply that

E(Y 0nW0n,nWn,nYn)− tr(W 0

n,nWn,n)E(1

nY 0nYn)− 2[E(Y 0nW

02n,nWn,nYn)− tr(W 0

n,nWn,n)E(1

nY 0nWn,nYn)]λ0

+ [E(Y 0nW02n,nW

2n,nYn)− tr(W 0

n,nWn,n)E(1

nY 0nW

0n,nWn,nYn)]λ

20 = 0.

(C.4)

A method of moments estimator of λ can be derived from the moment equations (C.3) and (C.4). Denote

bn1 = Y0nW

0n,nWn,nYn − tr(W 0

n,nWn,n) · 1nY 0nYn, bn2 = Y

0nWn,nYn,

an11 = 2[Y0nW

02n,nWn,nYn − tr(W 0

n,nWn,n) · 1nY 0nWn,nYn],

an12 = −Y 0nW02n,nW

2n,nYn + tr(W

0n,nWn,n) · 1

nY 0nW

0n,nWn,nYn,

an21 = Y 0nW 2n,nYn + Y

0nW

0n,nWn,nYn, and an22 = −Y 0nW

02n,nWn,nYn. Let An =

µan11 an12an21 an22

¶and bn =µ

bn1bn2

¶. The estimation of λ based on (C.3) and (C.4) can be done with or without imposing the nonlinear

35

Page 37: Abstract - College of Arts and Sciences1).pdfE-mail: lßee@econ.ohio-state.edu ... The reason weights matrices are row-normalized is to ensure that all weights are between 0 and 1

constraints on the coefficients of these moment equations. For the estimation without imposing constraints,

a moment estimator λn1 of λ can be solved from the linear system: An(λn1, µn)0 = bn. That is,

λn1 = (1, 0)A−1n bn.

It follows that (λn1, µn)0− (λ0,λ20)0 = A−1n cn where cn = bn−An(λ0,λ20)0. From Lemma A.12, all the entries

of An and bn are of order OP (nhn). The moment equations (C.3) and (C.4) imply that E(cn) = 0. The

variance of cn is of order O(nhn) from Lemma A.12. Therefore, hnn cn = oP (1). Lemma A.12 implies also

that hnn An = OP (1). Under the identiÞcation condition that limn→∞ hnn An = Γ exists and Γ is nonsingular,

λn1 = λ0 + (1, 0)(hnn An)

−1 hnn cn = λ0 + oP (1), i.e.,

λn1 is consistent. Let cn1 and cn2 be, respectively, the

Þrst and second entries of cn. As Yn = S−1n Vn,

cn1 = V0nS

0−1n W 0

n,nWn,n −tr(W 0

n,nWn,n)

nIn − 2[W 02

n,nWn,n −tr(W 0

n,nWn,n)

nWn,n]λ0

+ [W02n,nW

2n,n −

tr(W 0n,nWn,n)

nW 0n,nWn,n]λ

20S−1n Vn,

and

cn2 = V0nS

0−1n Wn,n − 2(W 2

n,n +W0n,nWn,n)λ0 +W

02n,nWn,nλ

20S−1n Vn.

Let Σcn be the variance matrix of cn. The central limit theorem for the quadratic form (Lemma A.13) is

applicable to cn. One hasq

hnn cn

D→ N(0, limn→∞ hnn Σcn). It follows thatr

n

hn(λn1 − λ0) = (1, 0)(hn

nAn)

−1rhnncn

D−→ N(0, (1, 0)Γ−1( limn→∞

hnnΣcn)Γ

0−1).

Alternatively, λ can be estimated by the minimization: minλ12 (Anθ− bn)0(Anθ− bn) where θ = (λ,λ2)0.

Let λn2 be the estimator of this method. It follows from the mean-value theorem that

λn2 − λ0 = [(0, 2)A0n(Anθ − bn) + (1, 2λ)A0nAn(1, 2λ)0]−1(1, 2λ0)A0ncn,

where λ lies between λn2 and λ0 and θ = (λ, λ2)0. The consistency of λn2 can be easily shown as Γ is

invertible. Furthermore,

rn

hn(λn2 − λ0) D−→ N(0, [(1, 2λ0)Γ

0Γ(1, 2λ0)0]−1(1, 2λ0)Γ0( limn→∞

hnnΣcn)Γ(1, 2λ0)

0[(1, 2λ0)Γ0Γ(1, 2λ0)0]−1.

36

Page 38: Abstract - College of Arts and Sciences1).pdfE-mail: lßee@econ.ohio-state.edu ... The reason weights matrices are row-normalized is to ensure that all weights are between 0 and 1

References

Amemiya, T. (1985), Advanced Econometrics, Harvard U. Press, Cambridge, Massachusetts.

Anselin, L. (1988), Spatial Econometrics: Methods and Models, Kluwer Academic Publishers, The Nether-

lands.

Anselin, L. (1992), Space and Applied Econometrics, Anselin, ed. Special Issue, Regional Science and Urban

Economics 22.

Anselin, L. and A.K. Bera (1998), Spatial Dependence in Linear Regression Models with an Introduction

to Spatial Econometrics, in: Handbook of Applied Economics Statistics, A. Ullah and D.E.A. Giles, eds.,

Marcel Dekker, NY.

Anselin, L. and R. Florax (1995), New Directions in Spatial Econometrics, Springer-Verlag, Berlin.

Anselin, L. and S. Rey (1991), Properties of Tests for Spatial Dependence in Linear Regression Models,

Geographical Analysis 23, 110-131.

Anselin, L. and S. Rey (1997), Spatial Econometrics, Anselin, L. and S. Rey, ed. Special Issue, International

Regional Science Review 20.

Beran, J.R. (1972), Rank Spectral Processes and Tests for Serial Dependence, Annals of Mathematical

Statistics 43, 1749-1766.

Case, A.C. (1991), Spatial Patterns in Household Demand, Econometrica 59, 953-965.

Case, A.C. (1992), Neighborhood Inßuence and Technological Change, Regional Science and Urban Eco-

nomics 22, 491-508.

Case, A.C., H.S. Rosen, and J.R. Hines (1993), Budget Spillovers and Fiscal Policy Interdependence:

Evidence from the States, Journal of Public Economics 52, 285-307.

Cliff, A.D. and J.K. Ord (1973), Spatial Autocorrelation, Pion Ltd., London.

Cressie, N. (1993), Statistics for Spatial Data, Wiley, New York.

Davidson, J. (1994), Stochastic Limit Theory, Oxford University Press, New York.

Dhrymes, P.J. (1978), Mathematics for Econometrics, Springer-Verlag, New York.

Doreian, P. (1980), Linear Models with Spatially Distributed Data, Spatial Disturbances, or Spatial Effects,

Sociological Methods and Research 9, 29-60.

Giraitis, L. and M.S. Taqqu (1998), Central Limit Theorems for Quadratic Forms with Time-Domain

Conditions. Annals of Probability 26, 377-398.

37

Page 39: Abstract - College of Arts and Sciences1).pdfE-mail: lßee@econ.ohio-state.edu ... The reason weights matrices are row-normalized is to ensure that all weights are between 0 and 1

Haining, R. (1990), Spatial Data Analysis in the Social and Environmental Sciences, Cambridge U. Press,

Cambridge.

Hall, P. and C.C. Heyde (1980), Martingale Limit Theory and its Applications, New York: Academic Press.

Horn, R., and C. Johnson (1985), Matrix Analysis, New York: Cambridge University Press.

Kelejian, H.H. and I.R. Prucha (1998), A Generalized Spatial Two-Stage Least Squares Procedure for

Estimating a Spatial Autoregressive Model with Autoregressive Disturbances, Journal of Real Estate

Finance and Economics 17, 99-121.

Kelejian, H.H., and I.R. Prucha (1999a), A Generalized Moments Estimator for the Autoregressive Param-

eter in a Spatial Model. International Economics Review 40, 509-533.

Kelejian, H.H., and I.R. Prucha (1999b), On the Asymptotic Distribution of the Moran I Test Statistic

with Applications. Manuscript, Department of Economics, University of Maryland.

Kelejian, H.H., and D. Robinson (1993), A suggested method of estimation for spatial interdependent models

with autocorrelated errors, and an application to a county expenditure model, Papers in Regional Science

72, 297-312.

Lee, L.F. (1999a), Best Spatial Two-Stage Least Squares Estimators for a Spatial Autoregressive Model

with Autoregressive Disturbances, Manuscript, Department of Economics, HKUST, Hong Kong.

Lee, L.F. (1999b), Consistency and Efficiency of Least Squares Estimation for Mixed Regressive, Spatial

Autoregressive Models. Manuscript, forthcoming in Econometric Theory.

Manski, C.F. (1993), IdentiÞcation of Endogenous Social Effects: The Reßection Problem, The Review of

Economic Studies 60, 531-542.

Mead, R. (1967), A Mathematical Model for the Estimation of Interplant Competition, Biometrics 23,

189-205.

Ord, J.K. (1975), Estimation Methods for Models of Spatial Interaction, Journal of American Statistical

Association, 70.

Paelinck, J. and L. Klaassen (1979), Spatial Econometrics, Saxon House, Farnborough.

Pinkse, J. (1999), Asymptotic Properties of Moran and Related Tests and Testing for Spatial Correlation

in Probit Models, manuscript, U. of British Columbia.

Potscher, B.M. and I.R. Prucha (1997), Dynamic Nonlinear Econometric Models, Asymptotic Theory, New

York: Springer Verlag.

38

Page 40: Abstract - College of Arts and Sciences1).pdfE-mail: lßee@econ.ohio-state.edu ... The reason weights matrices are row-normalized is to ensure that all weights are between 0 and 1

Sen, A. (1976), A Large Sample-Size Distribution of Statistics Used in Testing for Spatial Correlation.

Geographical Analysis 9, 175-184.

Smirnov, O. and L. Anselin (1999), Fast Maximum Likelihood Estimation of Very Large Spatial Autoregres-

sive Models: A Characteristic Polynomial Approach. Manuscript, School of Social Sciences, University

of Texas at Dallas.

White, H. (1984), Asymptotic Theory for Econometricians, Academic Press, London.

White, H. (1994), Estimation, Inference and SpeciÞcation Analysis, Cambridge University Press, New York,

New York.

Whittle, P. (1954), On Stationary Processes in the Plane, Biometrika 41, 434-449.

Whittle, P. (1964), On the Convergence to Normality of Quadratic Forms of Independent Variables. Theory

of Probability and Its Applications 9, 103-108.

39