DEEQA, Ecole Doctorale MPSE

Institut national de la recherche agronomique

Academic year 2003-2004

Advanced Econometrics

Panel data econometrics and GMM estimation

Alban Thomas, MF 102, [email protected]

Purpose of the course

Present recent developments in econometrics that allow for a consistent treatment of the impact of unobserved heterogeneity on model predictions: Panel data analysis.

Present a convenient econometric framework for dealing with restrictions imposed by theory: Method of Moments estimation.

Deal with discrete-choice models with unobserved heterogeneity.

Two keywords: unobserved heterogeneity and endogeneity.

Methods:

- Fixed Effects Least Squares

- Generalized Least Squares

- Instrumental Variables

- Maximum Likelihood estimation for Panel Data models

- Generalized Method of Moments for Time Series

- Generalized Method of Moments for Panel Data

- Heteroskedasticity-consistent estimation

- Dynamic Panel Data models

- Logit and Probit models for Panel Data

- Simulation-based inference

- Nonparametric and Semiparametric estimation

Statistical software: SAS, GAUSS, STATA (?)

Contents

I Panel Data Models

1 Introduction
1.1 Gains in pooling cross section and time series
1.1.1 Discrimination between alternative models
1.1.2 Examples
1.1.3 Less collinearity between explanatory variables
1.1.4 May reduce bias due to missing or unobserved variables
1.2 Analysis of variance
1.3 Some definitions

2 The linear model
2.1 Notation
2.1.1 Model notation
2.1.2 Standard matrices and operators
2.1.3 Important properties of operators
2.2 The One-Way Fixed Effects model
2.2.1 The estimator in terms of the Frisch-Waugh-Lovell theorem
2.2.2 Interpretation as a covariance estimator
2.2.3 Comments
2.2.4 Testing for poolability and individual effects

2.3 The Random Effects model
2.3.1 Notation and assumptions
2.3.2 GLS estimation of the Random-effect model
2.3.3 Comparison between GLS, OLS and Within
2.3.4 Fixed individual effects or error components?
2.3.5 Example: Wage equation, Hausman (1978)
2.3.6 Best Quadratic Unbiased Estimators (BQU) of variances

3 Extensions
3.1 The Two-way panel data model
3.1.1 The Two-way fixed-effect model
3.1.2 Example: Production function (Hoch 1962)
3.2 More on non-spherical disturbances
3.2.1 Heteroskedasticity in individual effect
3.2.2 'Typical' heteroskedasticity
3.3 Unbalanced panel data models
3.3.1 Introduction
3.3.2 Fixed effect models for unbalanced panels

4 Augmented panel data models
4.1 Introduction
4.2 Choice between Within and GLS
4.3 An important test for endogeneity
4.4 Instrumental Variable estimation: Hausman-Taylor GLS estimator
4.4.1 Instrumental Variable estimation
4.4.2 IV in a panel-data context
4.4.3 Exogeneity assumptions and a first instrument matrix

4.4.4 More efficient procedures: Amemiya-MaCurdy and Breusch-Mizon-Schmidt
4.5 Computation of variance-covariance matrix for IV estimators
4.5.1 Full IV-GLS estimation procedure
4.6 Example: Wage equation
4.6.1 Model specification
4.7 Application: returns to education
4.7.1 Variables related to job status
4.7.2 Variables related to characteristics of household heads

5 Dynamic panel data models
5.1 Motivation
5.1.1 Dynamic formulations from dynamic programming problems
5.1.2 Euler equations and consumption
5.1.3 Long-run relationships in economics
5.2 The dynamic fixed-effect model
5.2.1 Bias in the Fixed-Effects estimator
5.2.2 Instrumental-variable estimation
5.3 The Random-effects model
5.3.1 Bias in the ML estimator
5.3.2 An equivalent representation
5.3.3 The role of initial conditions
5.3.4 Possible inconsistency of GLS
5.3.5 Example: The Balestra-Nerlove study

II Generalized Method of Moments estimation

6 The GMM estimator
6.1 Moment conditions and the method of moments
6.1.1 Moment conditions
6.1.2 Example: Linear regression model
6.1.3 Example: Gamma distribution
6.1.4 Method of moments estimation
6.1.5 Example: Poisson counting model
6.1.6 Comments
6.2 The Generalized Method of Moments (GMM)
6.2.1 Introduction
6.2.2 Example: Just-identified IV model
6.2.3 A definition
6.2.4 Example: The IV estimator again
6.3 Asymptotic properties of the GMM estimator
6.3.1 Consistency
6.3.2 Asymptotic normality
6.4 Optimal and two-step GMM
6.5 Inference with GMM
6.6 Extension: optimal instruments for GMM
6.6.1 Conditional moment restrictions
6.6.2 A first feasible estimator
6.6.3 Nearest-neighbor estimation of optimal instruments
6.6.4 Generalizing the approach: other nonparametric estimators

7 GMM estimators for time series models
7.1 GMM and Euler equation models
7.1.1 Hansen and Singleton framework

7.1.2 GMM estimation
7.2 GMM Estimation of MA models
7.2.1 A simple estimator
7.2.2 A more efficient estimator
7.2.3 Example: The Durbin estimator
7.3 GMM Estimation of ARMA models
7.3.1 The ARMA(1,1) model
7.3.2 IV estimation
7.4 Covariance matrix estimation
7.4.1 Example 1: Conditional homoskedasticity
7.4.2 Example 2: Conditional heteroskedasticity
7.4.3 Example 3: Covariance stationary process
7.4.4 The Newey-West estimator
7.4.5 Weighted autocovariance estimators
7.4.6 Weighted periodogram estimators

8 GMM estimators for dynamic panel data
8.1 Introduction
8.2 The Arellano-Bond estimator
8.2.1 Model assumptions
8.2.2 Implementation of the GMM estimator
8.3 More efficient procedures (Ahn-Schmidt)
8.3.1 Additional assumptions
8.4 The Blundell-Bond estimator
8.5 Dynamic models with multiplicative effects
8.5.1 Multiplicative individual effects
8.5.2 Mixed structure
8.6 Example: Wage equation

III Discrete choice models

9 Nonlinear panel data models
9.1 Brief review of binary discrete-choice models
9.1.1 Linear Probability model
9.1.2 Logit model
9.1.3 Probit model
9.2 Logit models for panel data
9.2.1 Sufficient statistics
9.2.2 Conditional probabilities
9.2.3 Example: T = 2
9.3 Probit models
9.4 Semiparametric estimation of discrete-choice models
9.4.1 The binary choice model
9.4.2 The IV estimator
9.5 SML estimation of selection models
9.5.1 The GHK simulator
9.5.2 Example

Appendix 1. Maximum-Likelihood estimation of the Random-effect model
Appendix 2. The two-way random effects model
Appendix 3. The one-way unbalanced random effects model
Appendix 4. ML estimation of dynamic panel models
Appendix 5. GMM estimation of static panel models

Appendix 6. A framework for simulation-based inference
Appendix 7. Example: the SAS software
Appendix 8. A crash course in Gauss
Appendix 9. Example: the Gauss software
Appendix 10. IV and GMM estimation with Gauss
Appendix 11. DPD estimation with Gauss
References

Part I

Panel Data Models


Chapter 1

Introduction

Panel data: Sequential observations on a number of units (individuals, firms).

Also called cross-sections over time, longitudinal data or pooled

cross-section time-series data.

1.1 Gains in pooling cross section and time series

1.1.1 Discrimination between alternative models

Many economic models take the form

$$F(Y, X, Z, \theta) = 0,$$

where $Y$: individual control variables (workers, firms); $X$: (public policy or principal's) variables; $Z$: (fixed) individual attributes; $\theta$: parameters.

Linear model:

$$Y = \beta_0 + \beta_x X + \beta_z Z + u.$$

Alternative views concerning this model:

Policy variables have a significant impact whatever the individual characteristics, or

Differences across individuals are due to idiosyncratic individual features, not included in $Z$.

In practice, observed differences across individuals may be due to both inter-individual differences and the impact of policy variables.

1.1.2 Examples

a) $\text{WAGE} = \beta_0 + \beta_1\,\text{EDUCATION} + \beta_2 Z$.

People with a higher education level have higher wages because firms value those people more;

People have higher education because they have higher ability (expected productivity) anyway, and firms value worker ability more.

b) $\text{SALES} = \beta_0 + \beta_1\,\text{ADVERTISEMENT} + \beta_2 Z$.

Advertisement expenditures boost sales;

More efficient firms enjoy more sales, and thus have more money for advertisement expenditures.

c) $\text{OUTPUT} = \beta_0 + \beta_1\,\text{REGULATION} + \beta_2 Z$.

Regulatory control affects firm output;

Firms with higher output are more regulated on average.

d) $\text{WAGE} = \beta_0 + \beta_1 \mathbf{1}(\text{UNION}) + \beta_2 Z$.

Belonging to a union significantly raises wages;

Firms react to higher wages imposed by unions by hiring higher-quality workers, and $\mathbf{1}(\text{UNION})$ is a proxy for worker quality.

1.1.3 Less collinearity between explanatory variables

In consumer or production economics, input, output or consumer prices are difficult to use, because:

Time-series: aggregated macro price indexes are highly correlated;

Cross-sections: not enough price variation across individuals or firms.

With panel data, variations across individuals and across time periods are accounted for.

Time-series: no information on the impact of individual characteristics (socioeconomic variables, ...);

Cross-sections: no information on adjustment dynamics. Estimates may reflect inter-individual differences inherent in comparisons of different people or firms.

1.1.4 May reduce bias due to missing or unobserved variables

With panel data, it is easy to control for unobserved heterogeneity across individuals. This is critical in practice, and explains why panel data models are now so popular in micro- and macro-econometrics. This point is related to endogeneity and omitted-variable issues.


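Example: Output supply function under perfect competition

$$\max_Q \ \pi = pQ - C(\theta, Q), \quad \text{where } C(\theta, Q) = \theta\, c(Q),$$

$$\Leftrightarrow \ p = \theta\, \frac{\partial c(Q)}{\partial Q} = \theta A Q^{\alpha - 1} \ \text{(Cobb-Douglas)} \quad \text{or} \quad \theta(\beta_0 + \beta_1 Q) \ \text{(Quadratic)}.$$

Cobb-Douglas case: $\log Q = \frac{1}{\alpha - 1}(\log p - \log A - \log \theta)$. From equilibrium condition to estimable equation: observations $(Q_{it}, p_{it})$, unobserved heterogeneity $\theta_i$, firm $i$, period $t$:

$$\log Q_{it} = \frac{1}{\alpha - 1}\left(\log p_{it} - \log \theta_i - \log A\right).$$

Identification issue: the estimable equation is

$$\tilde Q_{it} = a_0 + a_1 \tilde p_{it} + u_{it}, \quad i = 1, 2, \dots, N; \ t = 1, 2, \dots, T,$$

where $\tilde Q_{it} = \log Q_{it}$, $\tilde p_{it} = \log p_{it}$, $a_1 = 1/(\alpha - 1)$, $a_0 = -(\log A + E \log \theta_i)/(\alpha - 1)$, and $E u_{it} = 0$. The model is identified if $E \log \theta_i = 0$, i.e., $E\theta_i = 1$; otherwise the estimate of $A$ is biased if $\theta_i$ is overlooked and $E \log \theta_i \neq 0$.

Empirical issue: possible correlation between output price $p_{it}$ and efficiency term $\theta_i$.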

1.2 Analysis of variance

Consider the model

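$$y_{it} = \alpha_i + x_{it}\beta_i + \varepsilon_{it}, \quad i = 1, 2, \dots, N; \ t = 1, 2, \dots, T_i,$$

where $x_{it}$ is scalar, $\alpha_i$ and $\beta_i$ are parameters, and $T_i$ is the number of time periods available for individual $i$.

Useful first-order empirical moments are

$$\bar y_i = \frac{1}{T_i}\sum_{t=1}^{T_i} y_{it}, \qquad \bar x_i = \frac{1}{T_i}\sum_{t=1}^{T_i} x_{it},$$

$$S_{xxi} = \sum_{t=1}^{T_i} (x_{it} - \bar x_i)^2, \qquad S_{xyi} = \sum_{t=1}^{T_i} (x_{it} - \bar x_i)(y_{it} - \bar y_i),$$

and

$$S_{yyi} = \sum_{t=1}^{T_i} (y_{it} - \bar y_i)^2, \quad i = 1, 2, \dots, N.$$

Least-squares parameter estimates are computed as

$$\hat\beta_i = S_{xyi}/S_{xxi} \quad \text{and} \quad \hat\alpha_i = \bar y_i - \hat\beta_i \bar x_i,$$

and the Residual Sum of Squares (RSS) for individual $i$ is

$$RSS_i = S_{yyi} - S_{xyi}^2/S_{xxi}, \quad \text{with } (T_i - 2) \text{ degrees of freedom}.$$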

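Consider now a restricted model with constant slopes and constant intercepts:

$$y_{it} = \alpha + x_{it}\beta + \varepsilon_{it},$$

which is obtained by imposing the following restrictions:

$$\beta_1 = \beta_2 = \cdots = \beta_N (= \beta), \qquad \alpha_1 = \alpha_2 = \cdots = \alpha_N (= \alpha).$$

Under these restrictions, least-squares parameter estimates would be

$$\hat\beta = \frac{\sum_{i=1}^{N}\sum_{t=1}^{T_i} (x_{it} - \bar x)(y_{it} - \bar y)}{\sum_{i=1}^{N}\sum_{t=1}^{T_i} (x_{it} - \bar x)^2}$$

and $\hat\alpha = \bar y - \hat\beta \bar x$, where

$$\bar y = \frac{1}{\sum_{i=1}^{N} T_i}\sum_{i=1}^{N}\sum_{t=1}^{T_i} y_{it}, \qquad \bar x = \frac{1}{\sum_{i=1}^{N} T_i}\sum_{i=1}^{N}\sum_{t=1}^{T_i} x_{it}.$$

The Residual Sum of Squares is

$$RSS = \sum_{i=1}^{N}\sum_{t=1}^{T_i} (y_{it} - \bar y)^2 - \frac{\left[\sum_{i=1}^{N}\sum_{t=1}^{T_i} (y_{it} - \bar y)(x_{it} - \bar x)\right]^2}{\sum_{i=1}^{N}\sum_{t=1}^{T_i} (x_{it} - \bar x)^2},$$

with $\sum_{i=1}^{N} T_i - 2$ degrees of freedom.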

For a majority of applications, the first model is too general and estimation would require a great number of time observations. If unobserved heterogeneity is additive in the model, we might consider the following specification with constant slope and different intercepts:

$$y_{it} = \alpha_i + x_{it}\beta + \varepsilon_{it}.$$

Minimizing $\sum_i \sum_t (y_{it} - \alpha_i - x_{it}\beta)^2$ with respect to $\alpha_i$ and $\beta$, we have

$$\sum_t (y_{it} - \alpha_i - x_{it}\beta) = 0 \ \text{ for each } i, \qquad \sum_i \sum_t x_{it}(y_{it} - \alpha_i - x_{it}\beta) = 0,$$

so that

$$\hat\alpha_i = \bar y_i - \hat\beta \bar x_i \quad \text{and} \quad \hat\beta = \frac{\sum_i \sum_t x_{it}(y_{it} - \bar y_i)}{\sum_i \sum_t x_{it}(x_{it} - \bar x_i)}.$$

The Residual Sum of Squares now has $\sum_i T_i - (N + 1)$ degrees of freedom ($N + 1$ parameters are estimated).

This is the most popular model encountered in empirical applications.
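The three specifications of this section can be compared numerically. The sketch below is only an illustration under stated assumptions: the function names (`group_demean`, `simple_rss`, `anova_rss`) and the data layout (one-dimensional arrays plus a vector of unit identifiers) are hypothetical choices, not part of the course material. It computes the RSS and degrees of freedom of the fully heterogeneous model, the pooled model, and the model with individual intercepts and a common slope, for a scalar regressor and a possibly unbalanced panel.

```python
import numpy as np

def group_demean(v, ids):
    """Subtract from each observation the mean of its own unit."""
    means = {i: v[ids == i].mean() for i in np.unique(ids)}
    return v - np.array([means[i] for i in ids])

def simple_rss(xd, yd):
    """RSS of a one-regressor least-squares fit on already-demeaned data."""
    return yd @ yd - (xd @ yd) ** 2 / (xd @ xd)

def anova_rss(y, x, ids):
    """Return {model: (RSS, degrees of freedom)} for the three specifications."""
    y, x, ids = np.asarray(y, float), np.asarray(x, float), np.asarray(ids)
    units = np.unique(ids)
    n_obs, n_units = len(y), len(units)
    xw, yw = group_demean(x, ids), group_demean(y, ids)
    # (1) alpha_i and beta_i free: sum of the unit-level RSS_i
    rss_full = sum(simple_rss(xw[ids == i], yw[ids == i]) for i in units)
    # (2) common alpha and beta: pooled least squares around overall means
    rss_pooled = simple_rss(x - x.mean(), y - y.mean())
    # (3) alpha_i free, common beta: deviations from unit means
    rss_within = simple_rss(xw, yw)
    return {
        "full": (rss_full, n_obs - 2 * n_units),
        "pooled": (rss_pooled, n_obs - 2),
        "within": (rss_within, n_obs - (n_units + 1)),
    }
```

When the data are generated with different intercepts and a common slope, the third RSS is close to the first one while the pooled RSS is much larger, which is the pattern the tests of Section 2.2.4 exploit.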

1.3 Some definitions

Typical panel: when the number of units (individuals) $N$ is large, and the number of time periods ($T$) is small.

Short (long) panel: when the number of periods $T$ is small (large).

Balanced panel: same number of periods for every unit (individual).

Rotating panel: a subset of individuals is replaced every period. Rotating panels can be balanced or unbalanced.

Pseudo panel: when one is pooling cross-sections made of different individuals for every period.

Attrition: with long panels, the probability that an individual remains in the sample decreases as the number of periods increases (non-response, moving, death, etc.).

Chapter 2

The linear model

2.1 Notation

$$y_{it} = x_{it}\beta + u_{it}, \quad i = 1, 2, \dots, N; \ t = 1, 2, \dots, T,$$

where $x_{it}$ is a $K$ vector, $\beta$ is a $(K \times 1)$ vector of parameters, and $u_{it}$ is the residual term. $y_{it}$ and the components of $x_{it}$ are both time-varying and varying across individuals.

Component of the dependent variable that is unexplained by $x_{it}$:

$$u_{it} = \alpha_i + \lambda_t + \varepsilon_{it},$$

where $\alpha_i$ is the time-invariant individual effect, $\lambda_t$ is the time effect, and $\varepsilon_{it}$ is the i.i.d. component.

One-way error-component model: $u_{it} = \alpha_i + \varepsilon_{it}$.

Two-way error-component model: $u_{it} = \alpha_i + \lambda_t + \varepsilon_{it}$.

This allows several predictions of $y_{it}$ given $x_{it}$:

$E(y_{it} | x_{it}) = x_{it}\beta$ across $i$ and $t$,

$E(y_{it} | x_{it}, \alpha_i) = x_{it}\beta + \alpha_i$ for individual $i$, across periods,

$E(y_{it} | x_{it}, \lambda_t) = x_{it}\beta + \lambda_t$ for period $t$, across individuals,

$E(y_{it} | x_{it}, \alpha_i, \lambda_t) = x_{it}\beta + \alpha_i + \lambda_t$ for individual $i$ and period $t$.

2.1.1 Model notation

2.1.1.1 Model in matrix form

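$$Y = X\beta + \alpha + \lambda + \varepsilon,$$

where $Y$, $\alpha$, $\lambda$ and $\varepsilon$ are $(NT \times 1)$, and $X$ is $(NT \times K)$. Convention: index $t$ runs faster, index $i$ runs slower:

$$\begin{pmatrix} y_{11} \\ \vdots \\ y_{1T} \\ y_{21} \\ \vdots \\ y_{2T} \\ \vdots \\ y_{it} \\ \vdots \\ y_{N1} \\ \vdots \\ y_{NT} \end{pmatrix} = \begin{bmatrix} X^{(1)}_{11} & \cdots & X^{(K)}_{11} \\ \vdots & & \vdots \\ X^{(1)}_{1T} & \cdots & X^{(K)}_{1T} \\ X^{(1)}_{21} & \cdots & X^{(K)}_{21} \\ \vdots & & \vdots \\ X^{(1)}_{2T} & \cdots & X^{(K)}_{2T} \\ \vdots & & \vdots \\ X^{(1)}_{it} & \cdots & X^{(K)}_{it} \\ \vdots & & \vdots \\ X^{(1)}_{N1} & \cdots & X^{(K)}_{N1} \\ \vdots & & \vdots \\ X^{(1)}_{NT} & \cdots & X^{(K)}_{NT} \end{bmatrix} \begin{pmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_k \\ \vdots \\ \beta_K \end{pmatrix} + \alpha + \lambda + \varepsilon.$$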

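2.1.1.2 Model in vector form

$$y_i = X_i\beta + \alpha_i + \lambda + \varepsilon_i, \quad i = 1, 2, \dots, N,$$

where $y_i$ is $T \times 1$ and $X_i$ is $T \times K$. Note: $\lambda = (\lambda_1, \lambda_2, \dots, \lambda_T)'$ and $\alpha_i = (\alpha_i, \alpha_i, \dots, \alpha_i)'$ are $(T \times 1)$.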

2.1.2 Standard matrices and operators

$I_{NT}$: identity matrix with $NT$ rows and $NT$ columns; $e_T$: $T$-vector of ones;

$B = I_N \otimes (1/T) e_T e_T'$ (Between-individual operator);

$\bar B = (1/N) e_N e_N' \otimes I_T$ (Between-period operator);

$Q = I_{NT} - I_N \otimes (1/T) e_T e_T' = I_{NT} - B$ (Within-individual operator);

$\bar Q = I_{NT} - (1/N) e_N e_N' \otimes I_T = I_{NT} - \bar B$ (Within-period operator);

$B \bar B = (1/NT) e_{NT} e_{NT}'$ (computes the full population mean).

Important assumption: no intercept term in the model (otherwise, use $B \bar B$ to demean all variables).

The $B$ operators are used to compute, from $NT$ vectors and matrices, individual- or time-specific means of variables, which are stored in matrices of row dimension $NT$.

The $Q$ operators are used to compute deviations from these means.

2.1.3 Important properties of operators

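Symmetry, idempotency and orthogonality:

$$Q' = Q, \quad B' = B, \quad Q^2 = Q, \quad B^2 = B, \quad BQ = QB = 0.$$

Rank of an idempotent matrix = its trace $\Rightarrow$ $\text{rank}(Q) = N(T - 1)$ and $\text{rank}(B) = N$.

Decomposition of the $Q$ operator with $N = T = 2$:

$$Qy = \left( \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} - \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \otimes \frac{1}{2} \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} \right) y$$

$$= \begin{pmatrix} y_{11} \\ y_{12} \\ y_{21} \\ y_{22} \end{pmatrix} - \frac{1}{2} \begin{bmatrix} 1 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 1 & 1 \end{bmatrix} \begin{pmatrix} y_{11} \\ y_{12} \\ y_{21} \\ y_{22} \end{pmatrix} = \begin{pmatrix} y_{11} \\ y_{12} \\ y_{21} \\ y_{22} \end{pmatrix} - \frac{1}{2} \begin{pmatrix} y_{11} + y_{12} \\ y_{11} + y_{12} \\ y_{21} + y_{22} \\ y_{21} + y_{22} \end{pmatrix}.$$

We will also use

$B_T = (1/T) e_T e_T'$: Between operator for a single individual;

$Q_T = I_T - (1/T) e_T e_T' = I_T - B_T$: Within operator for a single individual.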
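The operators and their properties are easy to check numerically. The following is a minimal sketch, assuming a balanced panel stacked with index $t$ running faster; the function name `panel_operators` is a hypothetical choice, and only NumPy's `kron`, `eye` and `ones` are used.

```python
import numpy as np

def panel_operators(N, T):
    """Between (B) and Within (Q) operators for a balanced panel, t running faster."""
    B_T = np.ones((T, T)) / T            # (1/T) e_T e_T'
    B = np.kron(np.eye(N), B_T)          # I_N kron B_T
    Q = np.eye(N * T) - B                # I_NT - B
    return B, Q

# The N = T = 2 decomposition shown above, with y = (y11, y12, y21, y22)'
B, Q = panel_operators(2, 2)
y = np.array([1.0, 3.0, 10.0, 20.0])
between_part = B @ y                     # individual means, repeated T times
within_part = Q @ y                      # deviations from individual means
```

In this example the individual means are 2 and 15, so `between_part` holds (2, 2, 15, 15) and `within_part` holds (-1, 1, -5, 5). Building the dense $NT \times NT$ matrices is only sensible for small panels; in practice the same quantities are obtained by group-wise averaging.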

2.2 The One-Way Fixed Effects model

Terminology: the fixed-effects model does not mean that individual effects $\alpha_i$ are not random in the true model! Rather, estimation is conditional on unobserved heterogeneity: the $\alpha_i$'s are treated as parameters to be estimated.

2.2.1 The estimator in terms of the Frisch-Waugh-Lovell theorem

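Inference is conditional on individual effects: estimates are obtained by regressing $Y$ on $X$ and on individual dummies.

Let $E$ be the $NT \times N$ matrix of individual dummy variables, whose column $i$ equals 1 for the $T$ observations of individual $i$ and 0 elsewhere:

$$E = \begin{bmatrix} e_T & 0 & \cdots & 0 \\ 0 & e_T & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & e_T \end{bmatrix} \quad \text{(columns } i = 1, 2, \dots, N\text{)},$$

and consider the model

$$Y = X\beta + E\alpha + \varepsilon = W\gamma + u,$$

where $W = [X, E]$, $\gamma = (\beta', \alpha')'$, and $u$ is the error term of the stacked regression.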


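Frisch-Waugh-Lovell theorem: parameter estimates of $\beta$ are numerically identical in the two following procedures:

(i) $\hat\beta$ from OLS on the full model: $\hat\gamma = (\hat\beta', \hat\alpha')' = (W'W)^{-1} W'Y$;

(ii) $\hat\beta = (X^{*\prime} X^*)^{-1} X^{*\prime} Y^*$, where

$$X^* = [I - E(E'E)^{-1}E']X = P_E X, \qquad Y^* = [I - E(E'E)^{-1}E']Y = P_E Y$$

(residuals from least-squares regression of $X$ and $Y$ on $E$).

But $E = I_N \otimes e_T$ and $E'E = I_N \otimes e_T' e_T = T\, I_N$, so that

$$P_E = I - E(E'E)^{-1}E' = I - \frac{1}{T} E E' = I - \frac{1}{T}(I_N \otimes e_T)(I_N \otimes e_T)' = I - I_N \otimes \frac{1}{T} e_T e_T' = Q.$$

Hence

$$\hat\beta = (X^{*\prime} X^*)^{-1}(X^{*\prime} Y^*) = (X' P_E' P_E X)^{-1}(X' P_E' P_E Y) = (X'QX)^{-1}(X'QY).$$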

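Idea behind the fixed-effect estimation procedure:

Eliminate individual effects $\Leftrightarrow$ eliminate individual-specific deviations from variables.

Transformation of the linear model as follows:

$$y_{it} - \frac{1}{T}\sum_t y_{it} = \left(x_{it} - \frac{1}{T}\sum_t x_{it}\right)\beta + u_{it} - \frac{1}{T}\sum_t u_{it}$$

$$\Leftrightarrow \ Y - BY = (X - BX)\beta + u - Bu \ \Leftrightarrow \ QY = QX\beta + Qu.$$

Least-squares parameter estimate:

$$\hat\beta = [(QX)'(QX)]^{-1}(QX)'QY = [X'Q'QX]^{-1}(X'Q'QY) = (X'QX)^{-1}X'QY,$$

and $Var(\hat\beta) = \sigma_\varepsilon^2 (X'QX)^{-1}$.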


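2.2.2 Interpretation as a covariance estimator

The model is, in vector form:

$$\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_N \end{bmatrix}\beta + \begin{bmatrix} e_T \\ 0_T \\ \vdots \\ 0_T \end{bmatrix}\alpha_1 + \begin{bmatrix} 0_T \\ e_T \\ \vdots \\ 0_T \end{bmatrix}\alpha_2 + \cdots + \begin{bmatrix} 0_T \\ 0_T \\ \vdots \\ e_T \end{bmatrix}\alpha_N + \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_N \end{bmatrix},$$

with assumptions:

$$E(\varepsilon_i) = 0, \quad E(\varepsilon_i \varepsilon_i') = \sigma_\varepsilon^2 I_T, \quad E(\varepsilon_i \varepsilon_j') = 0, \ i \neq j.$$

OLS estimates of $\beta$ and $\alpha_i$ are obtained from

$$\min \sum_{i=1}^{N} \varepsilon_i'\varepsilon_i = \sum_{i=1}^{N} (y_i - e_T\alpha_i - x_i\beta)'(y_i - e_T\alpha_i - x_i\beta)$$

$$\Leftrightarrow \ \hat\alpha_i = \bar y_i - \bar x_i \hat\beta, \quad i = 1, 2, \dots, N,$$

and, substituting in the partial derivative with respect to $\beta$, we have

$$\hat\beta = \left[\sum_{i,t}^{N,T} (x_{it} - \bar x_i)(x_{it} - \bar x_i)'\right]^{-1} \left[\sum_{i,t}^{N,T} (x_{it} - \bar x_i)(y_{it} - \bar y_i)\right].$$

This is called the covariance estimator, or the LSDV (Least-Squares Dummy-Variable) estimator. $\hat\beta$ is unbiased, and is consistent when $N$ or $T$ tends to infinity. Its covariance matrix is

$$Var(\hat\beta) = \sigma_\varepsilon^2 \left[\sum_{i=1}^{N} x_i Q_T x_i'\right]^{-1},$$

where $Q_T = I_T - (1/T) e_T e_T'$. $\hat\alpha_i$ is unbiased but consistent only when $T \to \infty$.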

2.2.3 Comments

Model transformation by filtering out individual components $\Rightarrow$ coefficients associated with time-invariant regressors are not identified.

The fixed-effect procedure uses variation within periods for each unit, hence the name.

Another possibility is the Between procedure, using variation between individuals:

$$BY = BX\beta + B\alpha + B\varepsilon,$$

$$\hat\beta_B = [(BX)'(BX)]^{-1}(BX)'BY = [X'BX]^{-1}X'BY.$$

This alternative estimator uses variation between individual means for model variables.

If $X_1$ is time-varying only, $BX_1 = \{\frac{1}{T}\sum_t x_{1it}\}_{i,t} = \bar x_1 \ \forall i$, and the intercept term is not identified.

A word of caution in computing variance estimates. In the model $QY = QX\beta + Qu$, statistical software would divide the RSS by $NT - K$ (individual effects not included). But in the model with individual dummies, $Y = X\beta + E\alpha + \varepsilon$, the RSS would be divided by $N(T - 1) - K$. Parameter variance estimates in the Within regression model must therefore be multiplied by $(NT - K)/[N(T - 1) - K]$.
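The numerical equivalence between the dummy-variable regression and the Within regression, together with the degrees-of-freedom correction just described, can be illustrated as follows. This is a sketch under stated assumptions: a balanced panel stacked with $t$ running faster, dense matrices (small panels only), and a hypothetical function name `lsdv_vs_within`.

```python
import numpy as np

def lsdv_vs_within(Y, X, N, T):
    """Compare OLS on [X, E] with OLS on (QX, QY); also correct the variance."""
    NT, K = X.shape
    E = np.kron(np.eye(N), np.ones((T, 1)))        # NT x N individual dummies
    Q = np.eye(NT) - E @ E.T / T                   # Within operator, I - (1/T) E E'
    # (a) OLS of Y on X and the individual dummies (LSDV)
    gamma = np.linalg.lstsq(np.hstack([X, E]), Y, rcond=None)[0]
    beta_lsdv = gamma[:K]
    # (b) OLS of QY on QX (Within)
    XQX = X.T @ Q @ X
    beta_within = np.linalg.solve(XQX, X.T @ Q @ Y)
    resid = Q @ Y - Q @ X @ beta_within
    rss = resid @ resid
    # Variance as reported by software unaware of the N estimated effects
    var_naive = rss / (NT - K) * np.linalg.inv(XQX)
    # Corrected variance: RSS divided by N(T-1) - K instead of NT - K
    var_correct = var_naive * (NT - K) / (N * (T - 1) - K)
    return beta_lsdv, beta_within, var_naive, var_correct
```

The two slope vectors agree up to rounding error, while the corrected variance is always larger than the naive one, and markedly so when $T$ is small.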


[Figure: Within and Between regression lines in the (X, Y) plane, for three individuals (1, 2, 3).]

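2.2.4 Testing for poolability and individual effects

Poolability

As before: $y_{it} = \alpha_i + x_{it}\beta_i + \varepsilon_{it}$ versus $y_{it} = \alpha_i + x_{it}\beta + \varepsilon_{it}$, but now $x_{it}$ is a $K$ vector.

$H_0: \beta_1 = \beta_2 = \cdots = \beta_N (= \beta)$ ($K(N - 1)$ constraints). The Fisher test statistic is

$$\frac{(RRSS - URSS)/K(N - 1)}{URSS/N(T - K - 1)} \sim F\left(K(N - 1), N(T - K - 1)\right),$$

where RRSS is from the Within regression and $URSS = \sum_{i=1}^{N} RSS_i$, with $RSS_i = S_{yyi} - S_{xyi}^2/S_{xxi}$ (see 1.2).

Testing for individual effects

$H_0: \alpha_1 = \cdots = \alpha_N (= \alpha)$.

$y_{it} = \alpha + x_{it}\beta + \varepsilon_{it}$ (OLS) versus $y_{it} = \alpha_i + x_{it}\beta + \varepsilon_{it}$ (Within).

The Fisher test statistic is

$$\frac{(RRSS - URSS)/(N - 1)}{URSS/(NT - N - K)} \sim F\left(N - 1, NT - N - K\right),$$

where RRSS is from the OLS regression on pooled data and URSS is from the Within (LSDV) regression.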
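As an illustration, the test for individual effects can be computed from two residual sums of squares. The sketch below assumes a balanced panel stacked with $t$ running faster and returns only the statistic and its degrees of freedom; the function name `f_test_individual_effects` is hypothetical, and obtaining a p-value would require the F distribution from a statistics library, which is left out here.

```python
import numpy as np

def f_test_individual_effects(Y, X, N, T):
    """F statistic for H0: alpha_1 = ... = alpha_N, pooled OLS versus Within."""
    NT, K = X.shape
    # Restricted model: pooled OLS with a common intercept
    Xc = np.column_stack([np.ones(NT), X])
    res_r = Y - Xc @ np.linalg.lstsq(Xc, Y, rcond=None)[0]
    rrss = res_r @ res_r
    # Unrestricted model: Within (LSDV) regression
    Q = np.eye(NT) - np.kron(np.eye(N), np.ones((T, T)) / T)
    Yw, Xw = Q @ Y, Q @ X
    res_u = Yw - Xw @ np.linalg.lstsq(Xw, Yw, rcond=None)[0]
    urss = res_u @ res_u
    df1, df2 = N - 1, NT - N - K
    F = ((rrss - urss) / df1) / (urss / df2)
    return F, df1, df2
```

A large value of the statistic relative to the $F(N - 1, NT - N - K)$ critical value leads to rejecting the pooled specification in favour of individual effects.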

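2.3 The Random Effects model

2.3.1 Notation and assumptions

Problem with the Fixed-effect model: degrees of freedom are lost when $N \to \infty$. Different approach: assume individual effects are random, i.e., model inference is drawn marginally (unconditionally upon the $\alpha_i$'s) with respect to the population of all effects.

Assumptions:

$$\alpha_i \sim IID(0, \sigma_\alpha^2), \quad \varepsilon_{it} \sim IID(0, \sigma_\varepsilon^2), \quad E(\alpha_i \varepsilon_{it}) = E(\alpha_i x_{it}) = 0,$$

with

$$E(\alpha_i \alpha_j) = \begin{cases} \sigma_\alpha^2 & \text{if } i = j, \\ 0 & \text{otherwise}, \end{cases} \qquad E(\varepsilon_{it} \varepsilon_{js}) = \begin{cases} \sigma_\varepsilon^2 & \text{if } i = j \text{ and } t = s, \\ 0 & \text{otherwise}. \end{cases}$$

Hence $cov(u_{it}, u_{js}) = \sigma_\alpha^2 + \sigma_\varepsilon^2$ if $i = j$ and $t = s$, and $\sigma_\alpha^2$ if $i = j$ and $t \neq s$.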

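Let

$$\Omega_T = E(u_i u_i') = \begin{bmatrix} \sigma_\alpha^2 + \sigma_\varepsilon^2 & \sigma_\alpha^2 & \cdots & \sigma_\alpha^2 \\ \sigma_\alpha^2 & \sigma_\alpha^2 + \sigma_\varepsilon^2 & \cdots & \sigma_\alpha^2 \\ \vdots & & \ddots & \vdots \\ \sigma_\alpha^2 & \sigma_\alpha^2 & \cdots & \sigma_\alpha^2 + \sigma_\varepsilon^2 \end{bmatrix},$$

a $(T \times T)$ matrix, for every individual $i$, $i = 1, 2, \dots, N$. We have

$$\Omega = E(uu') = I_N \otimes \Omega_T = I_N \otimes \left[\sigma_\alpha^2 (e_T e_T') + \sigma_\varepsilon^2 I_T\right] = I_N \otimes \left[\sigma_\alpha^2 (T B_T) + \sigma_\varepsilon^2 (Q_T + B_T)\right],$$

since $Q_T = I_T - B_T$ and $B_T = (1/T) e_T e_T'$. Therefore

$$\Omega = T\sigma_\alpha^2 B + \sigma_\varepsilon^2 I_{NT},$$

or equivalently: $\Omega = \sigma_\varepsilon^2 Q + (T\sigma_\alpha^2 + \sigma_\varepsilon^2) B$.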

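2.3.2 GLS estimation of the Random-effect model

General model form: $Y = X\beta + U$, with $E(UU') = \Omega$.

Generalized Least Squares (GLS) produces efficient parameter estimates of $\beta$, $\sigma_\alpha^2$ and $\sigma_\varepsilon^2$, based on the known structure of the variance-covariance matrix $\Omega$:

$$\hat\beta_{GLS} = \left(X'\Omega^{-1}X\right)^{-1} X'\Omega^{-1}Y \quad \text{and} \quad Var(\hat\beta_{GLS}) = \left(X'\Omega^{-1}X\right)^{-1}.$$

Computation of $\Omega^{-1}$: use of the formula

$$\Omega^r = (\sigma_\varepsilon^2)^r Q + (T\sigma_\alpha^2 + \sigma_\varepsilon^2)^r B$$

for an arbitrary scalar $r$, which follows from the properties of $Q$ and $B$ (idempotency and orthogonality).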


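Hence useful matrices are

$$\Omega^{-1} = \frac{1}{\sigma_\varepsilon^2} Q + \frac{1}{T\sigma_\alpha^2 + \sigma_\varepsilon^2} B$$

and

$$\Omega^{-1/2} = \frac{1}{\sigma_\varepsilon} Q + \frac{1}{(T\sigma_\alpha^2 + \sigma_\varepsilon^2)^{1/2}} B.$$

We have

$$\hat\beta_{GLS} = \left(X'\Omega^{-1}X\right)^{-1} X'\Omega^{-1}Y = \left[X'(\sigma_\varepsilon^2\Omega^{-1})X\right]^{-1}\left[X'(\sigma_\varepsilon^2\Omega^{-1})Y\right] = \left[X'\left(Q + \theta^{-1}B\right)X\right]^{-1}\left[X'\left(Q + \theta^{-1}B\right)Y\right],$$

where $\theta = (T\sigma_\alpha^2 + \sigma_\varepsilon^2)/\sigma_\varepsilon^2 = 1 + T\sigma_\alpha^2/\sigma_\varepsilon^2$.

GLS as Weighted Least Squares. Premultiply the model by $\sigma_\varepsilon \Omega^{-1/2}$ and use OLS: $Y^* = X^*\beta + u^*$, where

$$Y^* = \sigma_\varepsilon \Omega^{-1/2} Y = \left[Q + \frac{\sigma_\varepsilon}{(\sigma_\varepsilon^2 + T\sigma_\alpha^2)^{1/2}} B\right] Y, \qquad X^* = \sigma_\varepsilon \Omega^{-1/2} X = \left[Q + \frac{\sigma_\varepsilon}{(\sigma_\varepsilon^2 + T\sigma_\alpha^2)^{1/2}} B\right] X,$$

so that $Y^* = (Q + \theta^{-1/2}B)Y$, $X^* = (Q + \theta^{-1/2}B)X$, and in scalar form:

$$y^*_{it} = (y_{it} - \bar y_i) + \theta^{-1/2}\bar y_i = y_{it} - \left(1 - \frac{1}{\sqrt\theta}\right)\bar y_i,$$

$$x^*_{it} = (x_{it} - \bar x_i) + \theta^{-1/2}\bar x_i = x_{it} - \left(1 - \frac{1}{\sqrt\theta}\right)\bar x_i.$$

See Appendix 1 for Maximum Likelihood Estimation of the random-effects model.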
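The equivalence between GLS and OLS on the transformed (quasi-demeaned) data can be checked directly. The sketch below treats the variance components as known and uses dense matrices, so it is only meant for small balanced panels; the function name `re_gls_two_ways` and its arguments `s2_alpha` and `s2_eps` (standing for $\sigma_\alpha^2$ and $\sigma_\varepsilon^2$) are hypothetical.

```python
import numpy as np

def re_gls_two_ways(Y, X, N, T, s2_alpha, s2_eps):
    """Random-effects GLS computed (a) by weighted least squares, (b) with Omega^{-1}."""
    B = np.kron(np.eye(N), np.ones((T, T)) / T)     # Between operator
    Q = np.eye(N * T) - B                           # Within operator
    theta = 1.0 + T * s2_alpha / s2_eps
    # (a) OLS on the transformed data: P = sigma_eps * Omega^{-1/2} = Q + theta^{-1/2} B
    P = Q + B / np.sqrt(theta)
    beta_wls = np.linalg.lstsq(P @ X, P @ Y, rcond=None)[0]
    # (b) GLS with Omega = s2_eps * Q + (T * s2_alpha + s2_eps) * B
    Omega = s2_eps * Q + (T * s2_alpha + s2_eps) * B
    Oinv = np.linalg.inv(Omega)
    beta_gls = np.linalg.solve(X.T @ Oinv @ X, X.T @ Oinv @ Y)
    return beta_wls, beta_gls
```

Setting `s2_alpha` to zero gives $\theta = 1$, so the transformation matrix is the identity and both computations return the pooled OLS estimate.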


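2.3.3 Comparison between GLS, OLS and Within

$$\hat\beta_{GLS} = \left[X'QX + \frac{1}{\theta} X'BX\right]^{-1}\left[X'QY + \frac{1}{\theta} X'BY\right],$$

$$\hat\beta_{Within} = (X'QX)^{-1}X'QY, \qquad \hat\beta_{Between} = (X'BX)^{-1}X'BY,$$

so that

$$\hat\beta_{GLS} = S_1 \hat\beta_{Within} + S_2 \hat\beta_{Between},$$

where $S_1 = [X'QX + \frac{1}{\theta} X'BX]^{-1} X'QX$ and $S_2 = [X'QX + \frac{1}{\theta} X'BX]^{-1} \frac{1}{\theta} X'BX$.

(i) If $\sigma_\alpha^2 = 0$, then $1/\theta = 1$ and $\hat\beta_{GLS} = \hat\beta_{OLS}$.

(ii) If $T \to \infty$, then $1/\theta \to 0$ and $\hat\beta_{GLS} \to \hat\beta_{Within}$.

(iii) If $1/\theta \to \infty$, then $\hat\beta_{GLS} \to \hat\beta_{Between}$.

(iv) $Var(\hat\beta_{Within}) - Var(\hat\beta_{GLS})$ is a positive semi-definite matrix.

(v) If $1/\theta \to 0$, then $Var(\hat\beta_{Within}) \to Var(\hat\beta_{GLS})$.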

2.3.4 Fixed individual effects or error components?

Crucial issue in panel data econometrics: how should we treat the effects $\alpha_i$? As parameters or as random variables?

$\Rightarrow$ If inference is restricted to the specific units (individuals) in the sample: conditional inference, use Fixed effects. Example: individuals are not selected at random, or all firms in a given industry are selected.

$\Rightarrow$ If inference is on the whole population: marginal (unconditional) inference, use Random effects. Example: individuals are selected randomly from a huge population (consumers).

2.3.4.1 Some practical choice criteria

Interpretation of effects in the (economic) model;

Sampling process: purely random or not;

Number of units (countries, regions, households, ...);

Interchangeability of units;

Endogeneity of $X_{it}$ (see later).

2.3.4.2 Terminology

When fixed individual effects are considered: Fixed-Effects or Within estimation procedure. When random individual effects are considered: GLS (Generalized Least Squares) estimation procedure.

2.3.5 Example: Wage equation, Hausman (1978)

629 high-school graduates, Michigan income dynamics study. 3774

observations (N = 629, T = 6).

Dependent variable: log wage

The GLS estimator is a weighted-average of the Within and Be-

tween estimators, where the weight is the inverse of the corre-

sponding variance.

The Within estimator neglects the variation between individuals,

the Between estimator neglects the

variation within individuals, and the OLS gives equal weight to

both Within and Between variations.

Note. If the model contains an intercept,

$$y_{it} = \mu + x_{it}\beta + \alpha_i + \varepsilon_{it},$$

we use $B - B\bar B$ instead of $B$ (to eliminate $\mu$) in the formulae.

Table 2.1: Within and GLS estimation results

Variable                    Within      GLS
Constant                                0.8499
Age in [20,35]              0.0557      0.0393
Age in [35,45]              0.0351      0.0092
Age in [45,55]              0.0209     -0.0007
Age in [55,65]              0.0209     -0.0097
Age 65 and over            -0.0171     -0.0423
Unemployed prev. year      -0.0042     -0.0277
Poor health prev. year     -0.0204     -0.0250
Self-employed              -0.2190     -0.2670
South                      -0.1569     -0.0324
Rural                      -0.0101     -0.1215

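2.3.6 Best Quadratic Unbiased Estimators (BQU) of variances

If errors are normal, BQU estimates of $\sigma_\alpha^2$ and $\sigma_\varepsilon^2$ are found from

$$\hat\sigma_\varepsilon^2 = u'Qu/tr(Q) = \frac{\sum_{i=1}^{N}\sum_{t=1}^{T} (u_{it} - \bar u_i)^2}{N(T - 1)}$$

and

$$\widehat{\sigma_\varepsilon^2 + T\sigma_\alpha^2} = u'Bu/tr(B) = T\sum_{i=1}^{N} \bar u_i^2 / N,$$

because $tr(Q) = N(T - 1)$ and $tr(B) = N$.

But in practice, the $u_{it}$'s are unknown and we must estimate the variances from estimated residuals $\hat u_{it}$ instead.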


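1/ Wallace and Hussain (1969): use OLS residuals in place of the true $u$'s.

2/ Amemiya (1971): use LSDV residuals. We have

$$\begin{pmatrix} \sqrt{NT}(\hat\sigma_\varepsilon^2 - \sigma_\varepsilon^2) \\ \sqrt{N}(\hat\sigma_\alpha^2 - \sigma_\alpha^2) \end{pmatrix} \sim N\left(0, \begin{pmatrix} 2\sigma_\varepsilon^4 & 0 \\ 0 & 2\sigma_\alpha^4 \end{pmatrix}\right),$$

where $\hat\sigma_\alpha^2 = \left(\widehat{\sigma_\varepsilon^2 + T\sigma_\alpha^2} - \hat\sigma_\varepsilon^2\right)/T$.

3/ Swamy and Arora (1972): use mean square errors of the Within and the Between regressions.

Mean square error from the Within regression:

$$\hat\sigma_\varepsilon^2 = \left[Y'QY - Y'QX(X'QX)^{-1}X'QY\right] / [N(T - 1) - K],$$

and from the Between regression:

$$\widehat{\sigma_\varepsilon^2 + T\sigma_\alpha^2} = \left[Y'BY - Y'BX(X'BX)^{-1}X'BY\right] / [N - K - 1].$$

Note: there is an intercept term in the Between regressors ($X$), not in the Within regression.

4/ Nerlove (1971): compute $\hat\sigma_\alpha^2 = \frac{1}{N - 1}\sum_{i=1}^{N} (\hat\alpha_i - \bar{\hat\alpha})^2$, where the $\hat\alpha_i$ are parameter estimates associated with the individual dummies from the LSDV regression, and $\bar{\hat\alpha}$ is their mean. $\sigma_\varepsilon^2$ is estimated from the Within regression.

Estimation methods above with covariance components replaced by consistent estimates: Feasible GLS.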
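As an illustration of method 3/, the Swamy-Arora variance components can be computed from the Within and Between residual sums of squares. The sketch below assumes a balanced panel stacked with $t$ running faster and applies the Between operator by group-wise averaging rather than by forming $B$ explicitly; the helper names (`unit_means`, `ols_rss`, `swamy_arora`) are hypothetical.

```python
import numpy as np

def unit_means(Z, N, T):
    """Apply the Between operator B: replace each entry by its unit mean."""
    m = Z.reshape(N, T, -1).mean(axis=1)            # (N, k) array of unit means
    return np.repeat(m, T, axis=0).reshape(Z.shape)

def ols_rss(Y, X):
    """Residual sum of squares from the least-squares regression of Y on X."""
    r = Y - X @ np.linalg.lstsq(X, Y, rcond=None)[0]
    return r @ r

def swamy_arora(Y, X, N, T):
    """Estimates of sigma_eps^2 and sigma_alpha^2 from Within and Between MSEs."""
    K = X.shape[1]
    BY, BX = unit_means(Y, N, T), unit_means(X, N, T)
    # Within mean square error, N(T-1) - K degrees of freedom
    s2_eps = ols_rss(Y - BY, X - BX) / (N * (T - 1) - K)
    # Between regression includes an intercept, N - K - 1 degrees of freedom;
    # its mean square error estimates sigma_eps^2 + T * sigma_alpha^2
    BX1 = np.column_stack([np.ones(N * T), BX])
    s2_1 = ols_rss(BY, BX1) / (N - K - 1)
    s2_alpha = (s2_1 - s2_eps) / T
    return s2_eps, s2_alpha
```

In small samples the implied estimate of $\sigma_\alpha^2$ can be negative; how to handle that case (for instance by setting it to zero) is a separate choice that this sketch does not make. Plugging the two estimates into $\theta$ gives a Feasible GLS estimator.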

Chapter 3

Extensions

3.1 The Two-way panel data model

Error component structure of the form:

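$$u_{it} = \alpha_i + \lambda_t + \varepsilon_{it}, \quad i = 1, 2, \dots, N; \ t = 1, 2, \dots, T,$$

or in matrix form

$$U = (I_N \otimes e_T)\alpha + (e_N \otimes I_T)\lambda + \varepsilon,$$

where $\alpha = (\alpha_1, \dots, \alpha_N)'$ and $\lambda = (\lambda_1, \dots, \lambda_T)'$.

3.1.1 The Two-way fixed-effect model

$\alpha_i$ and $\lambda_t$ are treated as fixed parameters: conditional inference on the $N$ individuals over the period $1 \to T$.

3.1.1.1 Notation

Fixed-effect estimates of $\beta$ are obtained by using the new operator

$$Q = I_N \otimes I_T - I_N \otimes (e_T e_T'/T) - (e_N e_N'/N) \otimes I_T,$$

so that $Qu = \{u_{it} - \bar u_{i\cdot} - \bar u_{\cdot t}\}_{it}$. Averaging over individuals, we have

$$\bar y_{\cdot t} = \bar x_{\cdot t}\beta + \lambda_t + \bar\varepsilon_{\cdot t} \quad \text{with restriction } \sum_{i=1}^{N} \alpha_i = 0,$$

and averaging over time periods:

$$\bar y_{i\cdot} = \bar x_{i\cdot}\beta + \alpha_i + \bar\varepsilon_{i\cdot} \quad \text{with restriction } \sum_{t=1}^{T} \lambda_t = 0.$$

OLS on the model in deviations yields

$$\hat\beta = (X'QX)^{-1}X'QY, \qquad \hat\alpha_i = \bar y_{i\cdot} - \bar x_{i\cdot}\hat\beta, \qquad \hat\lambda_t = \bar y_{\cdot t} - \bar x_{\cdot t}\hat\beta.$$

If the model contains an intercept, operator $Q$ becomes

$$Q = I_N \otimes I_T - I_N \otimes (e_T e_T'/T) - (e_N e_N'/N) \otimes I_T + (e_N e_N'/N) \otimes (e_T e_T'/T),$$

so that $Qu = \{u_{it} - \bar u_{i\cdot} - \bar u_{\cdot t} + \bar u\}_{it}$, and Within estimates are

$$\hat\beta = (X'QX)^{-1}X'QY, \qquad \hat\alpha_i = (\bar y_{i\cdot} - \bar y) - (\bar x_{i\cdot} - \bar x)\hat\beta, \qquad \hat\lambda_t = (\bar y_{\cdot t} - \bar y) - (\bar x_{\cdot t} - \bar x)\hat\beta.$$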

3.1.1.2 Testing for effects

1/ $H_0: \alpha_1 = \cdots = \alpha_N = \lambda_1 = \cdots = \lambda_T = 0$.

Fisher test statistic:

$$\frac{(RRSS - URSS)/(N + T - 2)}{URSS/[(N - 1)(T - 1) - K]} \sim F(k_1, k_2),$$

where $k_1 = N + T - 2$, $k_2 = (N - 1)(T - 1) - K$, and

URSS (Unrestricted RSS): from the Within model,

RRSS (Restricted RSS): from pooled OLS.

2/ $H_0: \alpha_1 = \cdots = \alpha_N = 0$ given $\lambda_t \neq 0$, $t \leq T - 1$.

$$\frac{(RRSS - URSS)/(N - 1)}{URSS/[(N - 1)(T - 1) - K]} \sim F(k_1, k_2),$$

where $k_1 = N - 1$, $k_2 = (N - 1)(T - 1) - K$, and

URSS: from the Within model,

RRSS: from the regression with time dummies only:

$$(y_{it} - \bar y_{\cdot t}) = (x_{it} - \bar x_{\cdot t})\beta + (u_{it} - \bar u_{\cdot t}).$$

3/ $H_0: \lambda_1 = \cdots = \lambda_{T-1} = 0$ given $\alpha_i \neq 0$, $i \leq N - 1$.

$$\frac{(RRSS - URSS)/(T - 1)}{URSS/[(N - 1)(T - 1) - K]} \sim F(k_1, k_2),$$

where $k_1 = T - 1$, $k_2 = (N - 1)(T - 1) - K$, and

URSS: from the Within model,

RRSS: from the Within regression as in the one-way model:

$$(y_{it} - \bar y_{i\cdot}) = (x_{it} - \bar x_{i\cdot})\beta + (u_{it} - \bar u_{i\cdot}).$$

See Appendix 2 for the two-way random effects model.
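The two-way Within transformation of Section 3.1.1 (version with an intercept) can be sketched in a few lines. The code below assumes a balanced panel stored as a stacked $NT$-vector with $t$ running faster; the function names `two_way_within` and `two_way_q` are hypothetical. The first function works by averaging, the second builds the corresponding $NT \times NT$ operator so that the two can be compared.

```python
import numpy as np

def two_way_within(z, N, T):
    """Return z_it - zbar_i. - zbar_.t + zbar for a stacked NT-vector."""
    Z = np.asarray(z, float).reshape(N, T)          # rows: individuals, columns: periods
    out = Z - Z.mean(axis=1, keepdims=True) - Z.mean(axis=0, keepdims=True) + Z.mean()
    return out.ravel()

def two_way_q(N, T):
    """Matrix form of the same operator, built from Kronecker products."""
    JT, JN = np.ones((T, T)) / T, np.ones((N, N)) / N
    return (np.eye(N * T) - np.kron(np.eye(N), JT)
            - np.kron(JN, np.eye(T)) + np.kron(JN, JT))
```

Applied to a vector of the form $\mu + \alpha_i + \lambda_t$, the transformation returns zeros, which is why both sets of effects and the intercept drop out of the transformed regression.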

3.1.2 Example: Production function (Hoch 1962)

Sample: 63 Minnesota farms over the period 1946-1951.

Estimation of a Cobb-Douglas production function:

$$\log \text{Output}_{it} = \beta_0 + \beta_1 \log \text{Labor}_{it} + \beta_2 \log \text{Real estate}_{it} + \beta_3 \log \text{Machinery}_{it} + \beta_4 \log \text{Fertilizer}_{it}.$$

Motivation for adding specific effects (into $u_{it}$):

Climatic conditions, identical across farms ($\lambda_t$);

Farm-specific factors (soil, managerial quality) ($\alpha_i$).

Table 3.1: Least-squares estimates of Cobb-Douglas production function

                                      Assumption
Estimate                  (I)                          (II)             (III)
                          $\alpha_i = \lambda_t = 0$   $\alpha_i = 0$   $\lambda_t = 0$
$\beta_1$ (Labor)         0.256                        0.166            0.043
$\beta_2$ (Real estate)   0.135                        0.230            0.199
$\beta_3$ (Machinery)     0.163                        0.261            0.194
$\beta_4$ (Fertilizer)    0.349                        0.311            0.289
Sum of $\beta$'s          0.904                        0.967            0.726
$R^2$                     0.721                        0.813            0.884

3.2 More on non-spherical disturbances

Panel data: in the random-effect context, heteroskedasticity is due to the panel data structure. But the variances $\sigma_\alpha^2$ and $\sigma_\varepsilon^2$ are assumed constant.

Heteroskedasticity and serial correlation:

$Var(\alpha_i) = \sigma_{\alpha_i}^2$: individual-specific heteroskedasticity;

$Var(\varepsilon_{it}) = \sigma_i^2$: typical heteroskedasticity;

$E(\varepsilon_{it}\varepsilon_{is}) \neq 0$, $t \neq s$: serial correlation.

We present here the first two cases only.

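3.2.1 Heteroskedasticity in individual effect

Mazodier and Trognon (1978):

$$Var(\alpha_i) = \sigma_{\alpha_i}^2, \qquad \varepsilon_{it} \sim IID(0, \sigma_\varepsilon^2), \quad i = 1, 2, \dots, N,$$

or $E(\alpha\alpha') = diag[\sigma_{\alpha_i}^2] = \Sigma_\alpha$ and $\varepsilon \sim IID(0, \sigma_\varepsilon^2)$.

$$\Omega = E(UU') = diag[\sigma_{\alpha_i}^2] \otimes (e_Te_T') + diag[\sigma_\varepsilon^2] \otimes I_T,$$

where $diag[\sigma_\varepsilon^2]$ is $N \times N$. We have

$$\Omega = diag[T\sigma_{\alpha_i}^2 + \sigma_\varepsilon^2] \otimes \frac{e_Te_T'}{T} + diag[\sigma_\varepsilon^2] \otimes \left(I_T - \frac{e_Te_T'}{T}\right),$$

$$\Omega^r = diag[(T\sigma_{\alpha_i}^2 + \sigma_\varepsilon^2)^r] \otimes \frac{e_Te_T'}{T} + diag[(\sigma_\varepsilon^2)^r] \otimes \left(I_T - \frac{e_Te_T'}{T}\right).$$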

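Transformation of the heteroskedastic model: multiply both sides by

$$\sigma_\varepsilon\Omega^{-1/2} = diag\left[\frac{\sigma_\varepsilon}{(T\sigma_{\alpha_i}^2 + \sigma_\varepsilon^2)^{1/2}}\right] \otimes \frac{e_Te_T'}{T} + I_N \otimes \left(I_T - \frac{e_Te_T'}{T}\right).$$

Transformed variables in scalar form:

$$y_{it}^* = y_{it} - \left[1 - \frac{\sigma_\varepsilon}{\sqrt{T\sigma_{\alpha_i}^2 + \sigma_\varepsilon^2}}\right]\bar{y}_{i}.$$

Same form as in the homoskedastic case, only here $\theta$ is individual-specific:

$$\theta_i = (T\sigma_{\alpha_i}^2 + \sigma_\varepsilon^2)/\sigma_\varepsilon^2 \quad \text{and} \quad y_{it}^* = y_{it} - \left(1 - \frac{1}{\sqrt{\theta_i}}\right)\bar{y}_{i}.$$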
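The equivalence between the matrix form and the scalar form of this transformation can be checked numerically. The following is a minimal sketch, assuming made-up variance values and a balanced panel stacked with t running fast within each unit; the helper name `omega_power` is hypothetical and not part of the course material.

```python
import numpy as np

def omega_power(sig2_alpha, sig2_eps, T, r):
    """Omega^r from the spectral form; sig2_alpha holds one variance per unit."""
    N = len(sig2_alpha)
    Jbar = np.full((T, T), 1.0 / T)           # e_T e_T' / T
    E = np.eye(T) - Jbar                       # I_T - e_T e_T' / T
    d1 = (T * sig2_alpha + sig2_eps) ** r      # weights on the unit-mean part
    d2 = np.full(N, sig2_eps ** r)             # weights on the deviation part
    return np.kron(np.diag(d1), Jbar) + np.kron(np.diag(d2), E)

# Hypothetical values: N = 3 units, T = 4 periods
sig2_alpha = np.array([0.5, 1.0, 2.0])
sig2_eps, T = 1.5, 4
N = len(sig2_alpha)

# Omega built two ways: spectral form (r = 1) and direct definition
Omega = omega_power(sig2_alpha, sig2_eps, T, 1.0)
Omega_direct = np.kron(np.diag(sig2_alpha), np.ones((T, T))) + sig2_eps * np.eye(N * T)

rng = np.random.default_rng(0)
y = rng.normal(size=N * T)

# Matrix form: sigma_eps * Omega^{-1/2} y
y_star_matrix = np.sqrt(sig2_eps) * omega_power(sig2_alpha, sig2_eps, T, -0.5) @ y

# Scalar form: y_it - (1 - 1/sqrt(theta_i)) * ybar_i
theta = (T * sig2_alpha + sig2_eps) / sig2_eps
Y2 = y.reshape(N, T)
y_star_scalar = (Y2 - (1 - 1 / np.sqrt(theta))[:, None] * Y2.mean(axis=1, keepdims=True)).ravel()
```

Both routes give the same transformed vector, so in practice only the scalar form is needed and the $NT \times NT$ matrix never has to be stored.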

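Feasible GLS:

Step 1. Estimate $\sigma_\varepsilon^2$ consistently from usual Within regression;

Step 2. Noting that $Var(u_{it}) = w_i^2 = \sigma_{\alpha_i}^2 + \sigma_\varepsilon^2$, estimate $w_i^2$ by $\hat{w}_i^2 = \frac{1}{T-1}\sum_{t=1}^{T}(\hat{u}_{it} - \bar{\hat{u}}_{i})^2$, where $\hat{u}_{it}$ is OLS residual;

Step 3. Compute $\hat{\sigma}_{\alpha_i}^2 = \hat{w}_i^2 - \hat{\sigma}_\varepsilon^2$;

Step 4. Form $T\hat{\sigma}_{\alpha_i}^2 + \hat{\sigma}_\varepsilon^2$, $\hat{\theta}_i$ and compute $y_{it}^*$, $x_{it}^*$;

Step 5. Regress $y_{it}^*$ on $x_{it}^*$ to get $\hat{\beta}$.

Important: consistency of variance components estimates $\hat{w}_i^2$, $i = 1, 2, \dots, N$ requires $T \gg N$.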

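3.2.2 `Typical' heteroskedasticity

Assumptions: $\alpha_i \sim IID(0, \sigma_\alpha^2)$ and $Var(\varepsilon_{it}) = \sigma_i^2$.

$$\Omega = E(UU') = diag[\sigma_\alpha^2] \otimes (e_Te_T') + diag[\sigma_i^2] \otimes I_T$$

$$= diag[T\sigma_\alpha^2 + \sigma_i^2] \otimes (e_Te_T'/T) + diag[\sigma_i^2] \otimes (I_T - e_Te_T'/T).$$

Transformed model uses

$$\Omega^{-1/2} = diag\left[\frac{1}{\sqrt{T\sigma_\alpha^2 + \sigma_i^2}}\right] \otimes (e_Te_T'/T) + diag[1/\sigma_i] \otimes (I_T - e_Te_T'/T),$$

so that $Y^* = \Omega^{-1/2}Y$ has typical element

$$y_{it}^* = \frac{y_{it} - \bar{y}_i}{\sigma_i} + \frac{\bar{y}_i}{\sqrt{T\sigma_\alpha^2 + \sigma_i^2}} = \frac{y_{it} - \theta_i\bar{y}_i}{\sigma_i}, \quad \text{where } \theta_i = 1 - \frac{\sigma_i}{\sqrt{T\sigma_\alpha^2 + \sigma_i^2}}.$$

$E(u_{it}^2) = w_i^2 = \sigma_\alpha^2 + \sigma_i^2$ $\forall i$, hence OLS residuals $\hat{u}_{it}$ can be used to estimate $w_i^2$: $\hat{w}_i^2 = \frac{1}{T-1}\sum_{t}(\hat{u}_{it} - \bar{\hat{u}}_i)^2$.

Within residuals $\tilde{u}_{it}$ are then used to compute $\hat{\sigma}_i^2 = \frac{1}{T-1}\sum_{t}(\tilde{u}_{it} - \bar{\tilde{u}}_i)^2$.

A consistent estimate of $\sigma_\alpha^2$ is $\hat{\sigma}_\alpha^2 = (1/N)\sum_{i}(\hat{w}_i^2 - \hat{\sigma}_i^2)$.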

3.3 Unbalanced panel data models

3.3.1 Introduction

Definition: number of time periods is different from one unit (individual) to another. For individual $i$, we have $T_i$ periods, and total number of observations is now $\sum_{i=1}^{N} T_i$ (instead of $NT$ previously).

Examples

Firms: may close down, or new entrants may join an industry; Consumers: may move, die or refuse to answer anymore; Workers: may become unemployed,...

Problem of attrition: probability of a unit staying in the sample decreases as the number of periods increases.


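3.3.2 Fixed effect models for unbalanced panels

3.3.2.1 The one-way unbalanced fixed-effect model

Consider the unbalanced model with $T_1 = 3$ and $T_2 = 2$:

$$\begin{pmatrix} y_{11}\\ y_{12}\\ y_{13}\\ y_{21}\\ y_{22}\end{pmatrix} = \begin{pmatrix} x_{11}\\ x_{12}\\ x_{13}\\ x_{21}\\ x_{22}\end{pmatrix}\beta + \begin{pmatrix} \alpha_1\\ \alpha_1\\ \alpha_1\\ \alpha_2\\ \alpha_2\end{pmatrix} + \begin{pmatrix} \varepsilon_{11}\\ \varepsilon_{12}\\ \varepsilon_{13}\\ \varepsilon_{21}\\ \varepsilon_{22}\end{pmatrix}.$$

To eliminate $\alpha$, we need a new Within operator

$$Q = \begin{pmatrix} I_3 - e_3e_3'/3 & 0\\ 0 & I_2 - e_2e_2'/2\end{pmatrix} = \begin{pmatrix} 2/3 & -1/3 & -1/3 & 0 & 0\\ -1/3 & 2/3 & -1/3 & 0 & 0\\ -1/3 & -1/3 & 2/3 & 0 & 0\\ 0 & 0 & 0 & 1/2 & -1/2\\ 0 & 0 & 0 & -1/2 & 1/2\end{pmatrix},$$

and the same procedure as in the balanced case is applied:

$$\hat{\beta}_{Within} = (X'QX)^{-1}X'QY$$

where $Q = diag(I_{T_i} - e_{T_i}e_{T_i}'/T_i)|_{i=1,2,\dots,N}$.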
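The block-diagonal Within operator for the $T_1 = 3$, $T_2 = 2$ case can be built and applied as follows. This is a minimal sketch with made-up data and no noise, so that the Within estimator recovers the slope exactly; the function name `within_operator` is hypothetical.

```python
import numpy as np

def within_operator(T_list):
    """Block-diagonal Q = diag(I_Ti - e_Ti e_Ti' / Ti) for an unbalanced panel."""
    n = sum(T_list)
    Q = np.zeros((n, n))
    start = 0
    for Ti in T_list:
        Q[start:start + Ti, start:start + Ti] = np.eye(Ti) - np.full((Ti, Ti), 1.0 / Ti)
        start += Ti
    return Q

Q = within_operator([3, 2])          # T_1 = 3, T_2 = 2

# Hypothetical data stacked unit by unit: (y11, y12, y13, y21, y22)
x = np.array([1.0, 2.0, 4.0, 3.0, 7.0])
alpha = np.array([5.0, 5.0, 5.0, -2.0, -2.0])
y = 0.5 * x + alpha                  # no error term: Within should return 0.5

X = x.reshape(-1, 1)
beta_within = np.linalg.solve(X.T @ Q @ X, X.T @ Q @ y)
```

With many units, subtracting unit means group by group gives the same result without forming the $n \times n$ matrix.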

3.3.2.2 The two-way unbalanced fixed-effect model

The model is

$$y_{it} = x_{it}\beta + \alpha_i + \lambda_t + \varepsilon_{it}, \quad i = 1, 2, \dots, N_t; \; t = 1, 2, \dots, T,$$

where $N_t$: number of units observed in period $t$, and $n = \sum_{t=1}^{T} N_t$.

Total number of observations is $n$.

A bit more complex to extend the Within approach here.

Important: We now assume that observations are ordered differently: $i$ runs fast and $t$ runs slowly.

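Consider an $N \times N$ identity matrix at time $t$ from which we delete rows corresponding to missing individuals at $t$.

Example: $N = 3$, $N_1 = 3$, $N_2 = 2$, $N_3 = 2$, and observations are $(y_{11}, y_{21}, y_{31})$, $(y_{12}, y_{32})$, $(y_{13}, y_{23})$.

$$I_3 = \begin{pmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1\end{pmatrix} \Rightarrow D_1 = \begin{pmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1\end{pmatrix}, \quad D_2 = \begin{pmatrix} 1 & 0 & 0\\ 0 & 0 & 1\end{pmatrix}, \quad D_3 = \begin{pmatrix} 1 & 0 & 0\\ 0 & 1 & 0\end{pmatrix}.$$

We have 3 ($N_t \times N$) matrices $D_t$, $t = 1, 2, 3$ constructed from $I_3$ above.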

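Now define a new matrix $\Delta$ as $(\Delta_1, \Delta_2)$, where $\Delta_1 = (D_1', \dots, D_T')'$, an $(n \times N)$ matrix, and $\Delta_2 = diag(D_te_N)$, an $(n \times T)$ matrix:

$$\Delta = \begin{pmatrix} D_1 & D_1e_N & 0 & \cdots & 0\\ D_2 & 0 & D_2e_N & \cdots & 0\\ \vdots & \vdots & & \ddots & \vdots\\ D_T & 0 & 0 & \cdots & D_Te_N\end{pmatrix}.$$

Each $D_te_N$ is a vector of $N_t$ ones, so these blocks record the number of units present for each period $t$ (the $N_t$'s).

Matrix $\Delta$ is $n \times (N + T)$, and corresponds to the matrix of all dummies (units and periods) present in the sample. Part $\Delta_1$ in $\Delta$ is the equivalent of matrix $E$ (containing individual dummies) before.

Note that $\Delta_1'\Delta_1 = diag(T_i)$ (number of periods in the sample for unit $i$), and $\Delta_2'\Delta_2 = diag(N_t)$ (number of individuals for period $t$).

Also, $\Delta_2'\Delta_1$ is a $T \times N$ matrix of dummy variables for the presence in the sample of unit $i$ at time $t$.

Fixed-effect estimator could be implemented by considering the model

$$y_{it} = x_{it}\beta + D_{it}\psi + \varepsilon_{it}, \quad i = 1, 2, \dots, N_t; \; t = 1, 2, \dots, T,$$

where $D_{it}$: particular row of matrix $\Delta$, and $\psi$ contains all the $\alpha_i$'s and $\lambda_t$'s.

In the balanced panel case, we would have $\Delta_1 = (e_T \otimes I_N)$ and $\Delta_2 = (I_T \otimes e_N)$, and $\Delta$ would be $NT \times (N + T)$.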


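In example above, $n = 3 + 2 + 2 = 7$ and $N = 3$:

$$\Delta = \begin{pmatrix} 1 & 0 & 0 & 1 & 0 & 0\\ 0 & 1 & 0 & 1 & 0 & 0\\ 0 & 0 & 1 & 1 & 0 & 0\\ 1 & 0 & 0 & 0 & 1 & 0\\ 0 & 0 & 1 & 0 & 1 & 0\\ 1 & 0 & 0 & 0 & 0 & 1\\ 0 & 1 & 0 & 0 & 0 & 1\end{pmatrix},$$

the parameter vector would be $(\alpha_1, \alpha_2, \alpha_3, \lambda_1, \lambda_2, \lambda_3)$, and

$$\Delta'Y = \begin{pmatrix} 1 & 0 & 0 & 1 & 0 & 1 & 0\\ 0 & 1 & 0 & 0 & 0 & 0 & 1\\ 0 & 0 & 1 & 0 & 1 & 0 & 0\\ 1 & 1 & 1 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 1 & 1 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & 1 & 1\end{pmatrix}\begin{pmatrix} y_{11}\\ y_{21}\\ y_{31}\\ y_{12}\\ y_{32}\\ y_{13}\\ y_{23}\end{pmatrix} = \begin{pmatrix} y_{11} + y_{12} + y_{13}\\ y_{21} + y_{23}\\ y_{31} + y_{32}\\ y_{11} + y_{21} + y_{31}\\ y_{12} + y_{32}\\ y_{13} + y_{23}\end{pmatrix}$$

would compute the sums of variables over periods and individuals.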

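Easier method if $N$ and $T$ are large: use deviations from individual and time means, as in the balanced two-way Within case.

Let

$$\Delta_N = \Delta_1'\Delta_1 \; (N \times N); \quad \Delta_T = \Delta_2'\Delta_2 \; (T \times T); \quad \Delta_{NT} = \Delta_2'\Delta_1 \; (T \times N);$$

$$\bar{\Delta} = \Delta_2 - \Delta_1\Delta_N^{-1}\Delta_{NT}' \; (n \times T); \quad P = \Delta_T - \Delta_{NT}\Delta_N^{-1}\Delta_{NT}' = \bar{\Delta}'\Delta_2 \; (T \times T).$$

Wansbeek and Kapteyn (1989): The required Within operator for such unbalanced two-way panel is

$$Q = I_n - \Delta_1\Delta_N^{-1}\Delta_1' - \bar{\Delta}P^{-}\bar{\Delta}',$$

where $P^{-}$: generalized inverse of $P$.

Transformed variable $QY$, say, is also written as

$$QY = Y - \Delta_1\Delta_N^{-1}\Delta_1'Y - \bar{\Delta}P^{-}\bar{\Delta}'Y = Y - \Delta_1\Delta_N^{-1}\phi_1 - \bar{\Delta}\bar{\phi},$$

where $\phi_1 = \Delta_1'Y$ and $\bar{\phi} = P^{-}\bar{\Delta}'Y$.

$\phi_1$ computes the individual sums $\sum_{t=1}^{T_i} y_{ti}$.

Typical transformed element:

$$(QY)_{ti} = y_{ti} - \frac{\phi_{1i}}{T_i} + \frac{a_i'\bar{\phi}}{T_i} - \bar{\phi}_t,$$

where $a_i$: $i$-th column of $\Delta_{NT}$.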

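Example

Let $Y = (y_{11}, y_{21}, y_{31}, y_{12}, y_{32}, y_{13}, y_{23}) = (1, 2, 3, 2, 6, 3, 4)$, $n = 7$, $N = 3$, $T = 3$.

We have

$$\Delta_N = \Delta_T = \begin{pmatrix} 3 & 0 & 0\\ 0 & 2 & 0\\ 0 & 0 & 2\end{pmatrix}; \quad \Delta_{NT} = \begin{pmatrix} 1 & 1 & 1\\ 1 & 0 & 1\\ 1 & 1 & 0\end{pmatrix}; \quad P = \begin{pmatrix} 1.6666 & -0.8333 & -0.8333\\ -0.8333 & 1.1666 & -0.3333\\ -0.8333 & -0.3333 & 1.1666\end{pmatrix}$$

$$QY = \begin{pmatrix} 0.4582\\ 0.1875\\ -0.5000\\ -0.5418\\ 0.5000\\ 0.0832\\ -0.1875\end{pmatrix}; \quad \phi_1 = \begin{pmatrix} 6\\ 6\\ 9\end{pmatrix}; \quad \bar{\phi} = \begin{pmatrix} -0.3383\\ 1.6618\\ 2.0368\end{pmatrix}$$

For example,

$$Qy_{11} = 1 - \frac{6}{3} + \left(\frac{1}{3}\right)(1 \; 1 \; 1)\begin{pmatrix} -0.3383\\ 1.6618\\ 2.0368\end{pmatrix} + 0.3383 = 0.4582.$$

$$Qy_{31} = 3 - \frac{9}{2} + \left(\frac{1}{2}\right)(1 \; 1 \; 0)\begin{pmatrix} -0.3383\\ 1.6618\\ 2.0368\end{pmatrix} + 0.3383 = -0.5.$$

See Appendix 3 for the unbalanced random-effects model.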
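The Wansbeek-Kapteyn operator for this seven-observation example can be assembled directly from the dummy matrices. The sketch below assumes the Moore-Penrose inverse (`numpy.linalg.pinv`) as the generalized inverse of the singular matrix $P$; the variable names are illustrative. Computed this way, the transformed vector is orthogonal to every unit and period dummy and equals $(6, 2, -8, -8, 8, 2, -2)/15$; the four-decimal figures quoted in the example differ somewhat from these values, presumably because of the numerical generalized inverse used there.

```python
import numpy as np

# Observations ordered with i running fast and t slowly:
# (y11, y21, y31, y12, y32, y13, y23)
units   = np.array([0, 1, 2, 0, 2, 0, 1])
periods = np.array([0, 0, 0, 1, 1, 2, 2])
Y = np.array([1.0, 2.0, 3.0, 2.0, 6.0, 3.0, 4.0])
n, N, T = len(Y), 3, 3

D1 = np.zeros((n, N)); D1[np.arange(n), units] = 1.0     # Delta_1: unit dummies
D2 = np.zeros((n, T)); D2[np.arange(n), periods] = 1.0   # Delta_2: period dummies

Delta_N = D1.T @ D1                  # diag(T_i)
Delta_T = D2.T @ D2                  # diag(N_t)
Delta_NT = D2.T @ D1                 # T x N presence indicators
Delta_N_inv = np.linalg.inv(Delta_N)

Delta_bar = D2 - D1 @ Delta_N_inv @ Delta_NT.T
P = Delta_T - Delta_NT @ Delta_N_inv @ Delta_NT.T

# Within operator; pinv is one admissible choice of generalized inverse
Q = np.eye(n) - D1 @ Delta_N_inv @ D1.T - Delta_bar @ np.linalg.pinv(P) @ Delta_bar.T
QY = Q @ Y
```

Since $Q$ annihilates all columns of $\Delta$, applying it to $Y$ and $X$ and running OLS on the transformed data gives the two-way fixed-effect estimator without estimating the $N + T$ dummy coefficients.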

Chapter 4

Augmented panel data models

What are augmented panel models ? Implication for estimation ?

Special estimation techniques when GLS are not feasible.

4.1 Introduction

Consider the model

$$y_{it} = x_{it}\beta + z_i\gamma + \alpha_i + \varepsilon_{it}, \quad i = 1, 2, \dots, N; \; t = 1, 2, \dots, T,$$

with $x_{it}$ a $1 \times K$ vector of time- and individual-varying regressors, and $z_i$ a $1 \times G$ vector of individual-specific (time-invariant) regressors.

Example:

$$\log WAGE = \beta_1 HOURS + \gamma_1 EDUC + \gamma_2 SEX + \alpha_i + \varepsilon_{it}.$$

Estimation method:

Within: $\gamma$ is not identifiable because

$$QY = QX\beta + (I - B)Z\gamma + Q\alpha + Q\varepsilon = QX\beta + Q\varepsilon,$$

since $BZ = Z$. Only $\beta$ identifiable. But two-step procedure is feasible:

1/ Run Within regression $\Rightarrow \hat{\beta}$;

2/ Run Between regression on

$$\bar{y}_i - \bar{x}_i\hat{\beta} = \alpha_i + Z_i\gamma + \bar{\varepsilon}_i, \quad i = 1, 2, \dots, N,$$

to estimate the $\gamma$'s.

GLS: Both $\beta$ and $\gamma$ are identifiable.
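The two-step Within/Between procedure can be illustrated on simulated data. This is a minimal sketch under assumed parameter values, with one time-varying regressor correlated with the individual effect and one exogenous time-invariant regressor; all names and numbers are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)
N, T = 500, 5
beta, gamma = 1.0, 2.0                          # hypothetical true values

alpha = rng.normal(size=N)                      # individual effects
z = rng.normal(size=N)                          # time-invariant, exogenous here
x = rng.normal(size=(N, T)) + alpha[:, None]    # x_it correlated with alpha_i
eps = rng.normal(size=(N, T))
y = beta * x + gamma * z[:, None] + alpha[:, None] + eps

# Step 1: Within regression (deviations from individual means) identifies beta only
xd = x - x.mean(axis=1, keepdims=True)
yd = y - y.mean(axis=1, keepdims=True)
beta_w = (xd * yd).sum() / (xd ** 2).sum()

# The Within transform wipes out z_i: Q Z = 0
Z_stacked = np.repeat(z, T).reshape(N, T)
QZ = Z_stacked - Z_stacked.mean(axis=1, keepdims=True)

# Step 2: Between regression of (ybar_i - xbar_i * beta_w) on a constant and z_i
resid_mean = y.mean(axis=1) - beta_w * x.mean(axis=1)
W = np.column_stack([np.ones(N), z])
const_b, gamma_b = np.linalg.lstsq(W, resid_mean, rcond=None)[0]
```

The second step recovers $\gamma$ here only because $z_i$ was generated independently of $\alpha_i$; with an endogenous $z_i$ the Between step would be biased.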

4.2 Choice between Within and GLS

One of the choice criteria between Within and GLS: presence of $z_i$'s in the model.

Recall: GLS is a consistent and efficient estimator provided regressors are exogenous:

$$E(\alpha_ix_{it}) = 0 \text{ and } E(\alpha_iz_i) = 0 \quad \forall i, t.$$

Consider the non-augmented model $y_{it} = x_{it}\beta + \alpha_i + \varepsilon_{it}$.

If $x_{it}$ is endogenous in the sense $E(\alpha_ix_{it}) \neq 0$, then GLS are not consistent:

$$\hat{\beta}_{GLS} = \beta + \left(X'\Omega^{-1}X\right)^{-1}X'\Omega^{-1}U = \beta + \left[X'\left(Q + \tfrac{1}{\theta}B\right)X\right]^{-1}\left[X'\left(Q + \tfrac{1}{\theta}B\right)U\right],$$

where $\theta = 1 + T\sigma_\alpha^2/\sigma_\varepsilon^2$, so that, in expectation,

$$X'\left(Q + \tfrac{1}{\theta}B\right)U = [X'Q\varepsilon + X'(B\alpha + B\varepsilon)/\theta] = 0 + X'B\alpha/\theta + 0 = X'\alpha/\theta \neq 0,$$

because $E(X'\varepsilon) = 0$ and $B\alpha = \alpha$.

Same problem with the augmented model, if $E(X'\alpha) \neq 0$ and/or $E(Z'\alpha) \neq 0$.

Important consequence in practice: If (some of the) regressors are endogenous, GLS estimates are not consistent, but Within estimates are consistent because $\alpha$ is filtered out.

Another criterion of choice between Within and GLS:

If endogenous regressors $\Rightarrow$ Choose Within estimation (but $\gamma$ not identifiable);

If all regressors are exogenous, use GLS (the most efficient).

Three problems remain:

$\gamma$ still not identified, because in the Between regression $\bar{y}_i - \bar{x}_i\hat{\beta} = z_i\gamma + \alpha_i + \bar{\varepsilon}_i$, $z_i$ still correlated with $\alpha_i$.

If one uses Within, all regressors are treated as endogenous (no distinction between exogenous and endogenous $x_{it}$'s).

Within estimates not efficient.

4.3 An important test for endogeneity

Null hypothesis: $H_0: E(X'\alpha) = E(Z'\alpha) = 0$ (exogeneity).

Comparison between two estimators:

               GLS                       Within
$H_0$          Consistent, efficient     Consistent, not efficient
Alternative    Not consistent            Consistent

Hausman (1978): Even if the $x_{it}$'s are exogenous, GLS estimates of $\beta$ are not consistent in the augmented model when the $z_i$'s are correlated with $\alpha_i$. Therefore, one can test for exogeneity using parameter estimates for $\beta$ only.

Hausman test statistic: Under $H_0$,

$$HT = \left(\hat{\beta}_{Within} - \hat{\beta}_{GLS}\right)'\left[Var(\hat{\beta}_{Within}) - Var(\hat{\beta}_{GLS})\right]^{-1}\left(\hat{\beta}_{Within} - \hat{\beta}_{GLS}\right) \sim \chi^2(K).$$

Notes

$\hat{\beta}_{GLS}$ and $\hat{\beta}_{Within}$ must have the same dimension.

Weighting matrix $\left[Var(\hat{\beta}_{Within}) - Var(\hat{\beta}_{GLS})\right]$ is positive definite: GLS more efficient than Within under the null.

Recall that $Var(\hat{\beta}_{GLS}) = \sigma_\varepsilon^2(X'QX + \theta^{-1}X'BX)^{-1}$ and $Var(\hat{\beta}_{Within}) = \sigma_\varepsilon^2(X'QX)^{-1}$.

Interpretation of number of degrees of freedom of the test:

Within estimator is based on the condition $E(X'QU) = 0$, whereas GLS is based on $E(X'\Omega^{-1}U) = 0 \Rightarrow E(X'QU) = 0$ and $E(X'BU) = 0$.

For GLS, we add $K$ additional conditions (in terms of $B$): rank of $X$. Hausman test uses these additional restrictions (see GMM later).
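Once both sets of estimates and their covariance matrices are available, the Hausman statistic is a short quadratic-form computation. The sketch below uses made-up estimates for $K = 2$ coefficients; the function name `hausman_statistic` is hypothetical.

```python
import numpy as np

def hausman_statistic(b_within, V_within, b_gls, V_gls):
    """Return (HT, degrees of freedom) for the Within-vs-GLS contrast."""
    d = np.asarray(b_within, dtype=float) - np.asarray(b_gls, dtype=float)
    V_diff = np.asarray(V_within, dtype=float) - np.asarray(V_gls, dtype=float)
    # Solve V_diff * w = d rather than forming the inverse explicitly
    stat = float(d @ np.linalg.solve(V_diff, d))
    return stat, d.size

# Hypothetical estimates for K = 2 slope coefficients
b_w, V_w = [1.0, 2.0], np.diag([0.02, 0.02])
b_g, V_g = [0.9, 2.1], np.diag([0.01, 0.01])
HT, K = hausman_statistic(b_w, V_w, b_g, V_g)
```

The statistic is then compared with a $\chi^2(K)$ critical value. In finite samples the estimated variance difference need not be positive definite, in which case the statistic is not reliable.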

4.4 Instrumental Variable estimation: Hausman-Taylor GLS estimator

4.4.1 Instrumental Variable estimation

Alternative method: Instrumental-variable estimation. In the cross-section context with $N$ observations:

$$Y = X\beta + \varepsilon, \quad E(X'\varepsilon) \neq 0, \quad E(W'\varepsilon) = 0,$$

where $W$ is an $N \times L$ matrix of instruments. If $K = L$,

$$[W'(Y - X\beta)] = 0 \Leftrightarrow (W'Y) = (W'X)\beta$$

$$\Leftrightarrow \hat{\beta} = (W'X)^{-1}W'Y \quad \text{(IV estimator)}.$$

If $L > K$,

$$[W'(Y - X\beta)] = 0 \quad (L \text{ conditions on } K \text{ parameters})$$

and construct quadratic form $(Y - X\beta)'W(W'W)^{-1}W'(Y - X\beta)$, where $P_W = W(W'W)^{-1}W'$

$$\Rightarrow \hat{\beta} = (X'P_WX)^{-1}(X'P_WY).$$

Note: in general, instruments $W$ originate from within or outside the equation.
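Both the just-identified and the over-identified IV estimators can be computed with one projection-based routine. This is a minimal sketch on simulated cross-section data with an assumed true coefficient of 2; the function name `iv_estimate` and the data-generating process are illustrative only.

```python
import numpy as np

def iv_estimate(Y, X, W):
    """(X' P_W X)^{-1} X' P_W Y with P_W = W (W'W)^{-1} W'; requires L >= K."""
    # X_hat = P_W X, obtained by regressing each column of X on W
    X_hat = W @ np.linalg.lstsq(W, X, rcond=None)[0]
    return np.linalg.solve(X_hat.T @ X, X_hat.T @ Y)

rng = np.random.default_rng(1)
n = 2000
w1, w2 = rng.normal(size=n), rng.normal(size=n)     # instruments
v = rng.normal(size=n)
eps = 0.8 * v + rng.normal(size=n)                  # error correlated with x through v
x = w1 + 0.5 * w2 + v                               # endogenous regressor
Y = 2.0 * x + eps                                   # hypothetical true beta = 2

X = x.reshape(-1, 1)
beta_ols = np.linalg.solve(X.T @ X, X.T @ Y)[0]
beta_iv_just = iv_estimate(Y, X, w1.reshape(-1, 1))[0]            # L = K = 1
beta_iv_over = iv_estimate(Y, X, np.column_stack([w1, w2]))[0]    # L = 2 > K = 1
simple_iv = (w1 @ Y) / (w1 @ x)                                   # (W'X)^{-1} W'Y
```

When $L = K$ the projection formula collapses to $(W'X)^{-1}W'Y$, which is why `beta_iv_just` and `simple_iv` coincide, while OLS is pulled away from the true value by the correlation between $x$ and the error.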

4.4.2 IV in a panel-data context

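Account for variance-covariance structure ($\Omega$); Find relevant instruments, not correlated with $\alpha$.

Consider the general, augmented model:

$$Y = X_1\beta_1 + X_2\beta_2 + Z_1\gamma_1 + Z_2\gamma_2 + \alpha + \varepsilon,$$

where

$X_1$: $N \times K_1$ exogenous, varying across $i$ and $t$;

$X_2$: $N \times K_2$ endogenous, varying across $i$ and $t$;

$Z_1$: $N \times G_1$ exogenous, varying across $i$;

$Z_2$: $N \times G_2$ endogenous, varying across $i$;

and let $\Psi = (X_1, X_2, Z_1, Z_2)$ and $\Phi = (\beta_1', \beta_2', \gamma_1', \gamma_2')'$.

General form of the Instrumental-variable estimator for panel data: Let $Y^* = \Omega^{-1/2}Y$, $X^* = \Omega^{-1/2}X$, and $\Psi^* = \Omega^{-1/2}\Psi$. We have

$$\hat{\Phi}_{IV} = \left[\Psi^{*\prime}P_W\Psi^*\right]^{-1}\left[\Psi^{*\prime}P_WY^*\right] = \left[\Psi'\Omega^{-1/2}P_W\Omega^{-1/2}\Psi\right]^{-1}\left[\Psi'\Omega^{-1/2}P_W\Omega^{-1/2}Y\right].$$

Computation of $\Omega^{-1/2}$: as in the usual GLS case.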

4.4.3 Exogeneity assumptions and a first instrument matrix

Exogeneity assumptions: $E(X_1'\alpha) = E(Z_1'\alpha) = 0$

$\Rightarrow$ Obvious instruments are $X_1$ and $Z_1$, not sufficient because $K_1 + G_1 < K_1 + K_2 + G_1 + G_2$.

Additional instruments: must not be correlated with $\alpha$.

Because $\alpha$ is the source of endogeneity, every variable not correlated with $\alpha$ is a valid instrument. Best valid instruments are highly correlated with $X_2$ and $Z_2$.

$QX_1$ and $QX_2$ are valid instruments: $E[(QX_1)'\alpha] = E[X_1'Q\alpha] = 0$ and $E[(QX_2)'\alpha] = E[X_2'Q\alpha] = 0$.

As for $X_1$, equivalent to use $BX_1$ because we need

$$E[X_1'\Omega^{-1}U] = E[X_1'(Q + \theta^{-1}B)U] = E[X_1'B(Q + \theta^{-1}B)U]$$

since $BQ = 0$ and $BB = B$.

Hausman-Taylor (1981) matrix of instruments:

$$W_{HT} = [QX_1, QX_2, BX_1, Z_1] = [QX_1, QX_2, X_1, Z_1].$$

Identification condition: We have $K_1 + K_2 + G_1 + G_2$ parameters to estimate, using $K_1 + K_1 + K_2 + G_1$ instruments ($K_1 + K_2$ instruments in $QX$). Therefore, identification condition is $K_1 \ge G_2$.
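The Hausman-Taylor instrument matrix and its order condition can be checked on a small simulated design. The sketch below assumes a balanced panel stacked with $i$ slow and $t$ fast and arbitrary dimensions $K_1 = 2$, $K_2 = 1$, $G_1 = 1$, $G_2 = 1$; names such as `between_within` are hypothetical.

```python
import numpy as np

def between_within(N, T):
    """B averages over t within each unit, Q = I - B (i slow, t fast)."""
    B = np.kron(np.eye(N), np.full((T, T), 1.0 / T))
    return B, np.eye(N * T) - B

rng = np.random.default_rng(3)
N, T = 30, 4
K1, K2, G1, G2 = 2, 1, 1, 1                           # hypothetical dimensions
B, Q = between_within(N, T)

X1 = rng.normal(size=(N * T, K1))                     # exogenous, time-varying
X2 = rng.normal(size=(N * T, K2))                     # endogenous, time-varying
Z1 = np.repeat(rng.normal(size=(N, G1)), T, axis=0)   # exogenous, time-invariant

W_HT = np.hstack([Q @ X1, Q @ X2, B @ X1, Z1])
n_instruments = np.linalg.matrix_rank(W_HT)           # K1 + K2 (from QX) + K1 + G1
n_parameters = K1 + K2 + G1 + G2
identified = K1 >= G2                                 # order condition from the text

# Replacing B X1 by X1 spans the same column space, since X1 = Q X1 + B X1
W_alt = np.hstack([Q @ X1, Q @ X2, X1, Z1])
same_span = np.linalg.matrix_rank(np.hstack([W_HT, W_alt])) == n_instruments
```

The rank check confirms that the two forms of $W_{HT}$ generate the same projection $P_W$, so the choice between $BX_1$ and $X_1$ does not change the estimator.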

4.4.4 More efficient procedures: Amemiya-MaCurdy and Breusch-Mizon-Schmidt

4.4.4.1 Amemiya and MaCurdy (1986)

Use the fact that if $x_{it}$ is exogenous, we can use the following conditions: $E(x_{it}\alpha_i) = 0 \; \forall i, \forall t$ instead of $E(\bar{x}_i'\alpha_i) = 0$.

Amemiya and MaCurdy (1986) suggest to use matrix $X_1^*$ in the list of instruments:

$$X_1^* = \begin{pmatrix} x_{11} & x_{12} & \dots & x_{1T}\\ x_{11} & x_{12} & \dots & x_{1T}\\ \vdots & \vdots & & \vdots\\ x_{21} & x_{22} & \dots & x_{2T}\\ x_{21} & x_{22} & \dots & x_{2T}\\ \vdots & \vdots & & \vdots\\ x_{N1} & x_{N2} & \dots & x_{NT}\\ x_{N1} & x_{N2} & \dots & x_{NT}\\ \vdots & \vdots & & \vdots\\ x_{N1} & x_{N2} & \dots & x_{NT}\end{pmatrix} \begin{matrix} (i = 1, t = 1)\\ (i = 1, t = 2)\\ \vdots\\ (i = 2, t = 1)\\ (i = 2, t = 2)\\ \vdots\\ (i = N, t = 1)\\ (i = N, t = 2)\\ \vdots\\ (i = N, t = T)\end{matrix}$$

such that $QX_1^* = 0$ and $BX_1^* = X_1^*$. The AM instrument matrix is $W_{AM} = [QX, X_1^*, Z_1]$, and an equivalent estimator obtains by using

$$W_{AM} = [QX, (QX_1)^*, BX_1, Z_1],$$

where $(QX_1)^*$ is constructed as $X_1^*$ above.

Amemiya and MaCurdy: their instrument matrix yields an estimator at least as efficient as with the Hausman-Taylor matrix, if $\alpha_i$ is not correlated with regressors $\forall t$.

Identification condition: We add $(QX_1)^*$ to the Hausman-Taylor list of instruments, but as $[(QX_1)^*, X_1]$ is of rank $K_1$, we only add $(T-1)K_1$ instruments. Identification condition is $TK_1 \ge G_2$.
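The starred matrices can be built by repeating each unit's full time path on every row of that unit. A minimal sketch, assuming a balanced panel stacked with $i$ slow and $t$ fast; the function name `am_star` and the dimensions are hypothetical.

```python
import numpy as np

def am_star(X, N, T):
    """Build X*: for every row (i, t), the full time path (x_i1, ..., x_iT) of unit i."""
    K = X.shape[1]
    paths = X.reshape(N, T * K)            # one row per unit: all T observations
    return np.repeat(paths, T, axis=0)     # repeat that row for t = 1, ..., T

rng = np.random.default_rng(5)
N, T, K1 = 6, 3, 2                         # hypothetical small panel
X1 = rng.normal(size=(N * T, K1))          # stacked with i slow, t fast

B = np.kron(np.eye(N), np.full((T, T), 1.0 / T))
Q = np.eye(N * T) - B

X1_star = am_star(X1, N, T)                # (N*T) x (T*K1)
QX1_star = am_star(Q @ X1, N, T)           # (Q X1)* built the same way
rank_QX1_star = np.linalg.matrix_rank(QX1_star)
```

Because every row of a unit's block is identical, $X_1^*$ is time-invariant, which gives $QX_1^* = 0$ and $BX_1^* = X_1^*$. The deviations in $(QX_1)^*$ sum to zero over $t$ for each variable, so that matrix has rank $(T-1)K_1$ rather than $TK_1$.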

4.4.4.2 Breusch, Mizon and Schmidt (1989)

Even more efficient estimator: based on conditions $E[(QX_{2it})'\alpha_i] = 0 \; \forall i, \forall t$, instead of condition $E[(Q_TX_{2i})'\alpha_i] = 0$.

For BMS, estimator is more efficient if endogeneity in $X_2$ originates from a time-invariant component. BMS instrument matrix:

$$W_{BMS} = [QX, (QX_1)^*, (QX_2)^*, BX_1, Z_1]$$

where $(QX_1)^*$ and $(QX_2)^*$ are constructed the same way as $X_1^*$ for AM.

Identification condition: For BMS, we add $(QX_2)^*$ to Amemiya-MaCurdy instruments. Condition is then $TK_1 + (T-1)K_2 \ge G_2$. As before, we only add $(T-1)K_2$ instruments, as $(QX_2)^*$ is not of full rank but of rank $(T-1)K_2$.

4.5 Computation of variance-covariance matrix for IV estimators

Problem here: endogenous regressors may yield inconsistent estimates of variance components in $\Omega$, in particular parameter $\sigma_\alpha^2$.

Method suggested by Hausman-Taylor (1981) that yields consistent estimates.

Let $M_1$ denote the individual-mean vector of the Within residual:

$$M_1 = BY - BX\hat{\beta}_W = \left[B - BX(X'QX)^{-1}X'Q\right]Y = Z\gamma + \alpha + B\varepsilon - BX(X'QX)^{-1}X'Q\varepsilon,$$

where $X = (X_1|X_2)$, $Z = (Z_1|Z_2)$, and $\gamma = (\gamma_1, \gamma_2)$. The last three terms above can be treated as centered residuals, and it suffices to find instruments for $Z_2$ in order to estimate $\gamma$.

The IV estimator of $\gamma$ is

$$\hat{\gamma}_B = (Z'P_CZ)^{-1}(Z'P_CM_1),$$

where $P_C$ is the projection matrix associated to instruments $C = (X_1, Z_1)$. Using parameter estimates $\hat{\beta}_W$ and $\hat{\gamma}_B$, we form residuals

$$\hat{u}_W = QY - QX\hat{\beta}_W \quad \text{and} \quad \hat{u}_B = BY - BX\hat{\beta}_W - Z\hat{\gamma}_B.$$

These two vectors of residuals are used to compute variance components as in standard Feasible GLS.

4.5.1 Full IV-GLS estimation procedure

Step 1. Compute individual means and deviations, $BX$, $BY$, $QX$ and $QY$.

Step 2. Estimate parameters $\beta$ associated to $X$ using Within.

Step 3. Estimate $\hat{\gamma}_B$ by the IV procedure above.

Step 4. Compute $\hat{\sigma}_\alpha^2$ and $\hat{\sigma}_\varepsilon^2$ from $\hat{u}_W$ and $\hat{u}_B$, and compute $\hat{\theta} = 1 + T\hat{\sigma}_\alpha^2/\hat{\sigma}_\varepsilon^2$.

Step 5. Transform variables by GLS scalar procedure, e.g., $(Q + \hat{\theta}^{-1/2}B)Y = y_{it} - (1 - \hat{\theta}^{-1/2})\bar{y}_i$.

Step 6. Compute projection matrix $P_W$ from instrument matrix $W$.

Step 7. Estimate parameters $\Phi$.

4.6 Example: Wage equation

4.6.1 Model specification

Theory (Human capital or signal theory):

$$\log w = F[X_1, \eta, ED],$$

where $w$: wage rate; $\eta$: worker's ability (unobserved), $X_1$: additional variables (industry, occupation status, etc.), and $ED$: educational level. Proxies for ability that can be used: number of hours worked, experience, union, etc.

Main objective: estimate marginal gain associated with $ED$: $\partial w/\partial ED$.

But problem: what if worker's ability is constant through time and conditions $ED$? True model would be

$$\log w = F[X_1, \eta, ED]; \qquad ED = G[\eta, X_2],$$

where $X_2$ are additional, individual-specific variables.

If ability is replaced by proxies $Z$, we have

$$\log w = F[X_1, Z, ED] + U; \qquad ED = G[X_2, Z] + V,$$

where $U = F[X_1, \eta, ED] - F[X_1, Z, ED]$ and $V = G[X_2, \eta] - G[X_2, Z]$.

Two problems when estimating the first equation while overlooking the second one:

If some $X_1$ and $X_2$ variables are in common, endogeneity bias (because of $ED$);

If $Z$ is correlated with omitted variables (explaining ability), measurement-error bias.


4.7 Application: returns to education

Sample used: Panel Study of Income Dynamics (PSID), University of Michigan. See Baltagi and Khanti-Akom (1990), Cornwell and Rupert (1988).

595 individuals, for years 1976 to 1982 (7 time periods): heads of

households (males and females) aged between 18 and 65 in 1976,

with a positive wage in private, nonfarm employment for the

years 1976 to 1982.

4.7.1 Variables related to job status

LWAGE : logarithm of wage earnings;

WKS : number of weeks worked in the year;

EXP : working experience in years at the date of the sample;

OCC : dummy, 1 if bluecollar occupation;

IND : dummy, 1 if working in industry;

UNION : dummy, 1 if wage is covered by a union contract.

4.7.2 Variables related to characteristics of household heads

SMSA : dummy, 1 if household resides in SMSA (Standard Metropolitan Statistical Area);

SOUTH : dummy, 1 if individual resides in the south;

MS : Marital Status dummy, 1 if head is married;

FEM : dummy, 1 if female;

BLK : dummy, 1 if head is black;

ED : number of years of education attained.

Individual-specific variables: ED, BLK and FEM.

Estimation of non-augmented models (w/o $Z_i$'s)

Variables a priori endogenous (because correlated with ability: individual effects): $X_2$: (EXPE, EXPE2, UNION, WKS, MS);

Variables a priori exogenous: $X_1$: (OCC, SOUTH, SMSA, IND).

Augmented model

$$Y_{it} = X_{1it}\beta_1 + X_{2it}\beta_2 + Z_{1i}\gamma_1 + Z_{2i}\gamma_2 + \alpha_i + \varepsilon_{it}$$

Variables a priori endogenous: $Z_2$: ED;

Variables a priori exogenous: $Z_1$: (BLK, FEM).


Table 4.1: Sample 1 1976-1982. Descriptive Statistics

Variable Mean Std. Dev. Minimum Maximum

LWAGE 6.6763 0.4615 4.6052 8.5370

EXP 19.8538 10.9664 1.0000 51.0000

WKS 46.8115 5.1291 5.0000 52.0000

OCC 0.5112 0.4999 0.0000 1.0000

IND 0.3954 0.4890 0.0000 1.0000

UNION 0.3640 0.4812 0.0000 1.0000

SOUTH 0.2903 0.4539 0.0000 1.0000

SMSA 0.6538 0.4758 0.0000 1.0000

MS 0.8144 0.3888 0.0000 1.0000

ED 12.8454 2.7880 4.0000 17.0000

FEM 0.1126 0.3161 0.0000 1.0000

BLK 0.0723 0.2590 0.0000 1.0000


Table 4.2: Dependent variable: log(wage). Exogenous regressors only.

Variable    Within                 GLS
Constant                           0.0976 (0.0040)
OCC         -0.0696 (0.02323)      -0.0701 (0.02322)
SOUTH       -0.0052 (0.05833)      -0.0072 (0.05807)
SMSA        -0.1287 (0.03295)      -0.1275 (0.03290)
IND          0.0317 (0.02626)       0.0317 (0.02624)

$\chi^2(4) = 0.551$

Notes. Standard errors are in parentheses.

Table 4.3: Dependent variable: log(wage). Endogenous regressors only.

Variable    Within                 GLS
Constant                           0.0561 (0.0024)
EXPE         0.1136 (0.002467)      0.1133 (0.002466)
EXPE2       -0.0004 (0.000054)     -0.0004 (0.000054)
WKS          0.0008 (0.0005994)     0.0008 (0.0005994)
MS          -0.0322 (0.01893)      -0.0325 (0.01892)
UNION        0.0301 (0.01480)       0.0300 (0.01479)

$\chi^2(5) = 24.94$



Table 4.4: Dependent variable: log(wage). Augmented model.

Variable    Within                 GLS
Constant                           0.1866 (0.01189)
OCC         -0.0214 (0.01378)      -0.0243 (0.01367)
SOUTH       -0.0018 (0.03429)       0.0048 (0.03188)
SMSA        -0.0424 (0.01942)      -0.0468 (0.01891)
IND          0.0192 (0.01544)       0.0148 (0.01521)
EXPE         0.1132 (0.00247)       0.1084 (0.00243)
EXPE2       -0.0004 (0.00005)      -0.0004 (0.00005)
WKS          0.0008 (0.00059)       0.0008 (0.00059)
MS          -0.0297 (0.01898)      -0.0391 (0.01884)
UNION        0.0327 (0.01492)       0.0375 (0.01472)
FEM                                -0.1666 (0.12646)
BLK                                -0.2639 (0.15413)
ED                                  0.1373 (0.01415)

$\chi^2(9) = 495.3$


Table 4.5: Dependent variable: log(wage). IV Estimation

Variable    HT                     AM                     BMS
Constant     0.1772 (0.017)         0.1781 (0.016)         0.1748 (0.016)
OCC         -0.0207 (0.013)        -0.0208 (0.013)        -0.0204 (0.013)
SOUTH        0.0074 (0.031)         0.0072 (0.031)         0.0077 (0.031)
SMSA        -0.0418 (0.018)        -0.0419 (0.018)        -0.0423 (0.018)
IND          0.0135 (0.015)         0.0136 (0.015)         0.0138 (0.015)
EXPE         0.1131 (0.002)         0.1129 (0.002)         0.1127 (0.002)
EXPE2       -0.0004 (0.005)        -0.0004 (0.000)        -0.0004 (0.000)
WKS          0.0008 (0.000)         0.0008 (0.000)         0.0008 (0.000)
MS          -0.0298 (0.018)        -0.0300 (0.018)        -0.0303 (0.018)
UNION        0.0327 (0.014)         0.0324 (0.014)         0.0326 (0.014)
FEM         -0.1309 (0.126)        -0.1320 (0.126)        -0.1337 (0.126)
BLK         -0.2857 (0.155)        -0.2859 (0.155)        -0.2793 (0.155)
ED           0.1379 (0.021)         0.1372 (0.020)         0.1417 (0.020)
Test        $\chi^2(3) = 5.23$     $\chi^2(13) = 19.29$   $\chi^2(13) = 12.23$

Notes. Standard errors are in parentheses.

Chapter 5

Dynamic panel data models

5.1 Motivation

Usefulness of dynamic panel data models:

Investigate adjustment dynamics in micro- and macro-economic variables of interest;

Estimate equations from intertemporal-framework models (life-cycle models, finance,...)

In practice: estimate long-run elasticities and structural parameters from Euler equations.

5.1.1 Dynamic formulations from dynamic programming problems

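Consider the general problem

$$\max_{q(0),\dots,q(T)} E\int e^{-rt}\pi(t)\,dt,$$

$$\pi(t) = p(t)q(t) - c[q(t), b(t)], \qquad \dot{b} = G[b(t), q(t)],$$

where $b(t)$ is the state variable (stock, capital,...), $q(t)$ is the control variable, $r$ is discount rate. $G(\cdot)$ describes the evolution path of the state variable.

Dynamic programming solves the problem in a series of steps. Switch to discrete-time framework:

$$\max_{q_0,\dots,q_T} E\left\{\sum_{t=0}^{T}(1+r)^{-t}\pi_t\right\}, \qquad b_{t+1} = f(b_t, q_t),$$

and use the Bellman equation:

$$V_t(b_t) = \max E_t\left\{\pi_t + (1+r)^{-1}V_{t+1}(b_{t+1})\right\} = \max E_t\left\{p_tq_t - c[q_t, b_t] + (1+r)^{-1}V_{t+1}(f[b_t, q_t])\right\},$$

where $V_t(b_t)$ is the value function of the problem at time $t$, and $E_t$ is the conditional expectation operator at time $t$.

We use a) the envelope theorem (evolution path at optimum depends only on state variable, as control variable is already optimized); b) First-order condition wrt. control variable.

$$\frac{\partial V_t(b_t)}{\partial b_t} = \frac{\partial \pi_t(b_t, q_t)}{\partial b_t} + \frac{1}{1+r}\frac{\partial V_{t+1}}{\partial f}\frac{\partial f(b_t, q_t)}{\partial b_t} \quad \text{(Envelope theorem)}$$

$$\frac{\partial V_t(b_t)}{\partial q_t} = \frac{\partial \pi_t(b_t, q_t)}{\partial q_t} + \frac{1}{1+r}\frac{\partial V_{t+1}}{\partial f}\frac{\partial f(b_t, q_t)}{\partial q_t} = 0 \quad \text{(FOC)}.$$

From (FOC):

$$\frac{\partial V_{t+1}}{\partial f} = -\frac{\partial \pi_t}{\partial q_t}\left[\frac{\partial f(b_t, q_t)}{\partial q_t}\right]^{-1}(1+r),$$

that we replace in first equation above:

$$\frac{\partial V_t}{\partial b_t} = \frac{\partial \pi_t}{\partial b_t} - \frac{\partial \pi_t}{\partial q_t}\left[\frac{\partial f(b_t, q_t)}{\partial q_t}\right]^{-1}\frac{\partial f(b_t, q_t)}{\partial b_t}.$$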

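Now we lag (FOC) once and replace:

$$\frac{\partial \pi_{t-1}}{\partial q_{t-1}} + \frac{1}{1+r}\left[\frac{\partial \pi_t}{\partial b_t} - \frac{\partial \pi_t}{\partial q_t}\left(\frac{\partial f}{\partial q_t}\right)^{-1}\frac{\partial f}{\partial b_t}\right]\frac{\partial f(b_{t-1}, q_{t-1})}{\partial q_{t-1}} = 0.$$

Assume $\partial f/\partial q = a_1$ and $\partial f/\partial b = a_2$. We have

$$\frac{\partial \pi_t}{\partial q_t} = \frac{1+r}{a_2}\frac{\partial \pi_{t-1}}{\partial q_{t-1}} + \frac{a_1}{a_2}\frac{\partial \pi_t}{\partial b_t}.$$

This is the Euler equation relating current and past marginal profits.

If, for instance, profit is linear-quadratic in $q_t$ and $b_t$, we have

$$b_0 + b_1q_t + b_2b_t = \frac{1+r}{a_2}(b_0 + b_1q_{t-1} + b_2b_{t-1}) + \frac{a_1}{a_2}(c_0 + c_1q_t + c_2b_t)$$

$$\Leftrightarrow q_{it} = \phi_0 + \phi_1q_{i,t-1} + \phi_2b_{i,t-1} + \phi_3b_{it} + \alpha_i + \varepsilon_{it},$$

where

$$\phi_0 = (a_2b_1 - a_1c_1)^{-1}[b_0((1+r) - a_2) + a_1c_0],$$

$$\phi_1 = (a_2b_1 - a_1c_1)^{-1}[(1+r)b_1],$$

$$\phi_2 = (a_2b_1 - a_1c_1)^{-1}[(1+r)b_2],$$

$$\phi_3 = (a_2b_1 - a_1c_1)^{-1}[a_1c_2 - a_2b_2].$$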
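The mapping from the structural parameters to the four reduced-form coefficients can be verified numerically by plugging the implied $q_t$ back into the linear-quadratic Euler equation. This is a minimal sketch with arbitrary, made-up parameter values and without the error components; the function name `reduced_form` is hypothetical.

```python
def reduced_form(a1, a2, b0, b1, b2, c0, c1, c2, r):
    """Coefficients of q_t on (1, q_{t-1}, b_{t-1}, b_t) implied by the Euler equation."""
    d = a2 * b1 - a1 * c1
    coef0 = (b0 * ((1 + r) - a2) + a1 * c0) / d
    coef1 = (1 + r) * b1 / d
    coef2 = (1 + r) * b2 / d
    coef3 = (a1 * c2 - a2 * b2) / d
    return coef0, coef1, coef2, coef3

# Hypothetical parameter values
a1, a2, r = 0.7, 0.9, 0.05
b0, b1, b2 = 1.0, -0.4, 0.3        # marginal profit wrt q: b0 + b1*q + b2*b
c0, c1, c2 = 0.2, 0.1, -0.5        # marginal profit wrt b: c0 + c1*q + c2*b

coef = reduced_form(a1, a2, b0, b1, b2, c0, c1, c2, r)

# Pick arbitrary (q_{t-1}, b_{t-1}, b_t), build q_t from the reduced form,
# and evaluate both sides of the Euler equation
q_lag, b_lag, b_now = 1.3, 2.0, 2.4
q_now = coef[0] + coef[1] * q_lag + coef[2] * b_lag + coef[3] * b_now
lhs = b0 + b1 * q_now + b2 * b_now
rhs = (1 + r) / a2 * (b0 + b1 * q_lag + b2 * b_lag) + a1 / a2 * (c0 + c1 * q_now + c2 * b_now)
```

Both sides agree up to floating-point error, which confirms the algebra behind the reduced-form coefficients for these parameter values.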

5.1.2 Euler equations and consumption


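Consider a two-period model with the following period-to-period budget constraint

$$c_t + A_t = y_t + A_{t-1}(1 + r_t), \quad t = 1, 2,$$

where $c_t$ is consumption at time $t$, $A_t$ is total assets, $y_t$ is wage income, and $r_t$ is interest rate.

Assume further, intertemporally additive preferences:

$$U = u(c_1) + \frac{1}{1+\delta}u(c_2),$$

where $u' > 0$, $u'' < 0$ and $\delta \ge 0$ is the subjective discount rate. Often-used specification: CES (Constant Elasticity of Substitution)

$$U = c_1^{-\nu} + \frac{1}{1+\delta}c_2^{-\nu},$$

where $\sigma = 1/(1+\nu)$ is the intertemporal elasticity of substitution.

At the optimum (by replacing budget constraints in utility function and optimizing wrt. $A_1$):

$$\frac{\partial U}{\partial A_1} = \frac{\partial u}{\partial c_1}\frac{\partial c_1}{\partial A_1} + \frac{1}{1+\delta}\frac{\partial u}{\partial c_2}\frac{\partial c_2}{\partial A_1} = 0 \Leftrightarrow \frac{\partial u}{\partial c_1} = \frac{1+r}{1+\delta}\frac{\partial u}{\partial c_2}.$$

This is the intertemporal efficiency condition (Hall 1978), and in the CES case we have

$$c_1^{-1/\sigma} = \frac{1+r}{1+\delta}c_2^{-1/\sigma}.$$

Stochastic framework with $u(X) = -\frac{1}{2}(\bar{c} - X)^2$:

$$\bar{c} - c_1 = \frac{1+r}{1+\delta}(\bar{c} - Ec_2) \Leftrightarrow c_1 = Ec_2 \text{ if } r = \delta.$$

Hall Euler equation with more than 2 periods reduces to

$$c_{t+1} = c_t + \varepsilon_{t+1}, \text{ where } \varepsilon_{t+1} \text{ is i.i.d.},$$

which is tested from the equation

$$\Delta c_t = \beta_0 + \beta_1\Delta y_t + \beta_2(y_{t-1} - c_{t-1}) + \varepsilon_t.$$

This is an error-correction model that can be written

$$c_t = \beta_0 + \beta_1y_t + (c_{t-1} - \beta_1y_{t-1}) + \beta_2(y_{t-1} - c_{t-1}) + \varepsilon_t.$$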

5.1.3 Long-run relationships in economics

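Long-run relationships are represented by the stationary path of the variable of interest (consumption, capital stock,...): $y_{t+1}/y_t = \rho$, and if we add variable $x_t$, $y_{t+1} = \rho y_t + \beta x_{t+1}$, stationary equilibrium path is $y = \beta x/(1-\rho)$.

5.1.3.1 Long-run elasticities

Dynamic models are helpful in computing long-run elasticities. Consider for example the dynamic consumption model

$$\tilde{C}_{i,t+j} = \rho\tilde{C}_{i,t+j-1} + \beta\tilde{P}_{i,t+j} + u_{i,t+j},$$

where $\tilde{C}_{i,t+j}$ and $\tilde{P}_{i,t+j}$ respectively denote logs of consumption and price. Lagged consumption here accounts for habits. We have

$$\tilde{C}_{i,t+j} = \rho^{j+1}\tilde{C}_{i,t-1} + \beta\left(\rho^j\tilde{P}_{it} + \rho^{j-1}\tilde{P}_{i,t+1} + \dots + \rho\tilde{P}_{i,t+j-1} + \tilde{P}_{i,t+j}\right) + \tilde{u}_{i,t+j},$$

where $\tilde{u}_{i,t+j} = \rho^ju_{it} + \rho^{j-1}u_{i,t+1} + \dots + \rho u_{i,t+j-1} + u_{i,t+j}$.

Assume we want to compute the change in consumption at time $t+j$ following a permanent change of 1% in price between $t$ and $t+j$:

$$\frac{\partial \tilde{C}_{i,t+j}}{\partial \tilde{P}_{it}} + \frac{\partial \tilde{C}_{i,t+j}}{\partial \tilde{P}_{i,t+1}} + \dots + \frac{\partial \tilde{C}_{i,t+j}}{\partial \tilde{P}_{i,t+j}} = \beta(\rho^j + \rho^{j-1} + \dots + \rho + 1).$$

When consumption is stationary (in logs), $|\rho| < 1$, and the long-run effect of price obtains by taking the limit

$$\lim_{j\to\infty}\sum_{s=0}^{j}\frac{\partial \tilde{C}_{i,t+j}}{\partial \tilde{P}_{i,t+s}} = \lim_{j\to\infty}\beta(\rho^j + \rho^{j-1} + \dots + \rho + 1) = \frac{\beta}{1-\rho}.$$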
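The convergence of the cumulated price effect to its long-run value can be traced by iterating the recursion directly. A minimal sketch, assuming made-up values for the habit coefficient (0.6) and the short-run price coefficient (-0.3); the function name is hypothetical.

```python
def cumulative_price_effect(rho, beta, j):
    """Effect on log consumption at t+j of a permanent unit change in log price from t on."""
    c = 0.0                      # deviation of log consumption from the baseline path
    for _ in range(j + 1):       # periods t, t+1, ..., t+j
        c = rho * c + beta       # the price deviation equals 1 in every period
    return c

rho, beta = 0.6, -0.3            # hypothetical habit and short-run price coefficients
effects = [cumulative_price_effect(rho, beta, j) for j in (0, 1, 5, 50)]
long_run = beta / (1 - rho)
```

With these values the impact effect is -0.3, the effect after one more period is -0.48, and the sequence approaches the long-run elasticity of -0.75.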

5.1.3.2 Dynamic representations from AR(1) errors

Consider the following Cobb-Douglas production model

$$\log Q_{it} = \beta_1\log N_{it} + \beta_2\log K_{it} + u_{it},$$

where $Q_{it}$ is output of firm $i$ at time $t$, $N_{it}$ is labor input, $K_{it}$ is capital stock, and $u_{it}$ is the residual. Assume the latter decomposes into

$$u_{it} = \lambda_t + \alpha_i + v_{it} + \varepsilon_{it},$$

where $\lambda_t$ is a year-specific intercept (industry-wide technological change), $\alpha_i$ is the unobserved firm-specific effect, $\varepsilon_{it}$ is an i.i.d. error component (measurement error), and $v_{it}$ is a productivity shock having an AR(1) representation:

$$v_{it} = \rho v_{i,t-1} + e_{it}.$$

This model has the following, dynamic representation:

$$\log Q_{it} = \beta_1\log N_{it} - \rho\beta_1\log N_{i,t-1} + \beta_2\log K_{it} - \rho\beta_2\log K_{i,t-1} + \rho\log Q_{i,t-1} + (\lambda_t - \rho\lambda_{t-1}) + [\alpha_i(1-\rho) + e_{it} + \varepsilon_{it} - \rho\varepsilon_{i,t-1}],$$

or

$$\log Q_{it} = \pi_1\log N_{it} + \pi_2\log N_{i,t-1} + \pi_3\log K_{it} + \pi_4\log K_{i,t-1} + \pi_5\log Q_{i,t-1} + \lambda_t^* + (\alpha_i^* + \omega_{it}),$$

subject to restrictions $\pi_2 = -\pi_1\pi_5$ and $\pi_4 = -\pi_3\pi_5$.

Hence, equivalence between a static (short-run) model with serially-correlated productivity shocks, and a dynamic representation of production output.

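5.2 The dynamic fixed-effect model

Simple dynamic panel-data model:

$$y_{it} = \rho y_{i,t-1} + \alpha_i + \varepsilon_{it}, \quad i = 1, 2, \dots, N; \; t = 1, 2, \dots, T,$$

where initial conditions $y_{i0}$, $i = 1, 2, \dots, N$ are assumed known.

We assume $E(\varepsilon_{it}) = 0 \; \forall i, t$, $E(\varepsilon_{it}\varepsilon_{js}) = \sigma_\varepsilon^2$ if $i = j$, $t = s$ and 0 otherwise, $E(\alpha_i\varepsilon_{it}) = 0 \; \forall i, t$. By continuous substitution:

$$y_{it} = \varepsilon_{it} + \rho\varepsilon_{i,t-1} + \rho^2\varepsilon_{i,t-2} + \dots + \rho^{t-1}\varepsilon_{i1} + \frac{1-\rho^t}{1-\rho}\alpha_i + \rho^ty_{i0}.$$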


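5.2.1 Bias in the Fixed-Effects estimator

The Within estimator is:

$$\hat{\rho} = \frac{\sum_{i=1}^{N}\sum_{t=1}^{T}(y_{it} - \bar{y}_i)(y_{i,t-1} - \bar{y}_{i,-1})}{\sum_{i=1}^{N}\sum_{t=1}^{T}(y_{i,t-1} - \bar{y}_{i,-1})^2}, \qquad \hat{\alpha}_i = \bar{y}_i - \hat{\rho}\bar{y}_{i,-1},$$

where

$$\bar{y}_i = \frac{1}{T}\sum_{t=1}^{T}y_{it}; \quad \bar{y}_{i,-1} = \frac{1}{T}\sum_{t=1}^{T}y_{i,t-1}; \quad \bar{\varepsilon}_i = \frac{1}{T}\sum_{t=1}^{T}\varepsilon_{it}.$$

Also,

$$\hat{\rho} = \rho + \frac{\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}(\varepsilon_{it} - \bar{\varepsilon}_i)(y_{i,t-1} - \bar{y}_{i,-1})}{\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}(y_{i,t-1} - \bar{y}_{i,-1})^2}.$$

This estimator exists if denominator $\neq 0$ and is consistent if numerator converges to 0.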

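Numerator:

$$\text{plim}_{N\to\infty}\frac{1}{NT}\sum_{i,t}^{N,T}(y_{i,t-1} - \bar{y}_{i,-1})(\varepsilon_{it} - \bar{\varepsilon}_i) = -\text{plim}\frac{1}{N}\sum_{i=1}^{N}\bar{y}_{i,-1}\bar{\varepsilon}_i$$

because $\varepsilon_{it}$ is serially uncorrelated and not correlated with $\alpha_i$. We use

$$\bar{y}_{i,-1} = \frac{1}{T}\sum_{t=1}^{T}y_{i,t-1} = \frac{1}{T}\left[\frac{1-\rho^T}{1-\rho}y_{i0} + \frac{(T-1) - T\rho + \rho^T}{(1-\rho)^2}\alpha_i + \frac{1-\rho^{T-1}}{1-\rho}\varepsilon_{i1} + \frac{1-\rho^{T-2}}{1-\rho}\varepsilon_{i2} + \dots + \varepsilon_{i,T-1}\right].$$

We have

$$\text{plim}\frac{1}{N}\sum_{i=1}^{N}\bar{y}_{i,-1}\bar{\varepsilon}_i = \text{plim}\left\{\frac{1}{N}\sum_{i=1}^{N}\bar{\varepsilon}_i\frac{1}{T}\left[\sum_{t=1}^{T-1}\frac{1-\rho^{T-t}}{1-\rho}\varepsilon_{it}\right]\right\}$$

$$= \text{plim}\left\{\frac{1}{N}\sum_{i=1}^{N}\left(\frac{1}{T}\sum_{t=1}^{T}\varepsilon_{it}\right)\frac{1}{T}\left[\sum_{t=1}^{T-1}\frac{1-\rho^{T-t}}{1-\rho}\varepsilon_{it}\right]\right\} = \frac{\sigma_\varepsilon^2}{T^2}\frac{(T-1) - T\rho + \rho^T}{(1-\rho)^2}.$$

In a similar manner, we show that

$$\text{plim}\frac{1}{NT}\sum_{i,t}^{N,T}(y_{i,t-1} - \bar{y}_{i,-1})^2 = \frac{\sigma_\varepsilon^2}{1-\rho^2}\left\{1 - \frac{1}{T} - \frac{2\rho}{(1-\rho)^2}\frac{(T-1) - T\rho + \rho^T}{T^2}\right\}.$$

Forming the ratio of these two terms, the asymptotic bias is

$$\text{plim}_{N\to\infty}(\hat{\rho} - \rho) = -\frac{1+\rho}{T-1}\left(1 - \frac{1}{T}\frac{1-\rho^T}{1-\rho}\right)\left\{1 - \frac{2\rho}{(1-\rho)(T-1)}\left[1 - \frac{1-\rho^T}{T(1-\rho)}\right]\right\}^{-1} = O(1/T).$$

In the transformed model

$$(y_{it} - \bar{y}_i) = \rho(y_{i,t-1} - \bar{y}_{i,-1}) + (\varepsilon_{it} - \bar{\varepsilon}_i),$$

the explanatory variable is correlated with residual, and correlation is of order $1/T$. Hence, the Fixed-Effects estimator is biased in the usual case where $N$ is large and $T$ is small.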


Table 5.1: Asymptotic bias in Fixed-Effects DPD estimator

$\rho$ T Bias Percent

0.2 6 -0.2063 -103.1693

8 -0.1539 -76.9597

10 -0.1226 -61.3139

20 -0.0607 -30.3541

40 -0.0302 -15.0913

0.5 6 -0.2756 -55.1282

8 -0.2049 -40.9769

10 -0.1622 -32.4421

20 -0.0785 -15.6977

40 -0.0384 -7.6819

0.7 6 -0.3307 -47.2392

8 -0.2479 -35.4084

10 -0.1966 -28.0912

20 -0.0938 -13.3955

40 -0.0449 -6.4114

0.9 6 -0.3939 -43.7633

8 -0.3017 -33.5179

10 -0.2432 -27.0248

20 -0.1196 -13.2934

40 -0.0563 -6.2561
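The entries of Table 5.1 follow from evaluating the asymptotic bias formula on a grid of autoregressive coefficients and panel lengths. A minimal sketch; the function name `within_bias` is hypothetical, and the percentage is the bias relative to the true coefficient.

```python
def within_bias(rho, T):
    """Asymptotic (N -> infinity) bias of the Within estimator of rho for fixed T."""
    h = 1.0 - (1.0 - rho ** T) / (T * (1.0 - rho))
    numerator = -(1.0 + rho) / (T - 1) * h
    denominator = 1.0 - 2.0 * rho / ((1.0 - rho) * (T - 1)) * h
    return numerator / denominator

for rho in (0.2, 0.5, 0.7, 0.9):
    for T in (6, 8, 10, 20, 40):
        b = within_bias(rho, T)
        print(f"rho={rho:.1f}  T={T:2d}  bias={b:8.4f}  percent={100 * b / rho:10.4f}")
```

The bias is always negative and shrinks roughly in proportion to $1/T$, so doubling the panel length about halves it.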


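5.2.2 Instrumental-variable estimation

Only way to obtain consistent estimator of $\rho$ when $T$ is fixed (small). Different procedure to eliminate individual effects: use First differencing instead of Within:

$$(y_{it} - y_{i,t-1}) = \rho(y_{i,t-1} - y_{i,t-2}) + (\varepsilon_{it} - \varepsilon_{i,t-1}), \quad \text{i.e.} \quad \Delta y_{it} = \rho\Delta y_{i,t-1} + \Delta\varepsilon_{it},$$

and in vector form:

$$\Delta y_i = \rho\Delta y_{i,-1} + \Delta\varepsilon_i, \quad i = 1, 2, \dots, N.$$

In model above, $y_{i,t-1}$ is correlated by construction with $\varepsilon_{i,t-1}$! We need instruments that are uncorrelated with $(\varepsilon_{it} - \varepsilon_{i,t-1})$ but correlated with $(y_{i,t-1} - y_{i,t-2})$. Only possibility in a single-equation framework with no other explanatory variables: use values of dependent variables.

Because of autoregressive nature of model, instruments from future values of $y_{it}$ are not feasible because $y_{it}$ is a recursive function of $\varepsilon_{it}, \varepsilon_{i,t-1}, \dots, \varepsilon_{i1}, \alpha_i, y_{i0}$.

As for lagged dependent variables, we can use either $y_{i,t-2}$ or $(y_{i,t-2} - y_{i,t-3})$:

$$E[y_{i,t-2}(\varepsilon_{it} - \varepsilon_{i,t-1})] = E(\varepsilon_{i,t-2}\varepsilon_{it}) - E(\varepsilon_{i,t-2}\varepsilon_{i,t-1}) = 0,$$

$$E[(y_{i,t-2} - y_{i,t-3})(\varepsilon_{it} - \varepsilon_{i,t-1})] = E[\varepsilon_{i,t-2}(\varepsilon_{it} - \varepsilon_{i,t-1})] - E[\varepsilon_{i,t-3}(\varepsilon_{it} - \varepsilon_{i,t-1})] = 0,$$

$$E[y_{i,t-2}(y_{i,t-1} - y_{i,t-2})] = 0 - E(\varepsilon_{i,t-2}^2) = -\sigma_\varepsilon^2,$$

$$E[(y_{i,t-2} - y_{i,t-3})(y_{i,t-1} - y_{i,t-2})] = 0 - E(\varepsilon_{i,t-2}^2) = -\sigma_\varepsilon^2.$$

Instrumental-variable estimators that are consistent when $N$ and/or $T \to \infty$:

$$\hat{\rho} = \frac{\sum_{i=1}^{N}\sum_{t=3}^{T}(y_{it} - y_{i,t-1})(y_{i,t-2} - y_{i,t-3})}{\sum_{i=1}^{N}\sum_{t=3}^{T}(y_{i,t-1} - y_{i,t-2})(y_{i,t-2} - y_{i,t-3})}$$

or

$$\hat{\rho} = \frac{\sum_{i=1}^{N}\sum_{t=3}^{T}(y_{it} - y_{i,t-1})y_{i,t-2}}{\sum_{i=1}^{N}\sum_{t=3}^{T}(y_{i,t-1} - y_{i,t-2})y_{i,t-2}}.$$

Conclusion: With Within transformation on a dynamic model, even though $\alpha_i$ is eliminated, endogeneity bias occurs for fixed $T$ because the $Q$ operator used introduces errors $\varepsilon_{is}$ correlated by construction with current explanatory variable.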
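The contrast between the biased Within estimator and the first-difference IV estimator that uses the twice-lagged level as instrument can be seen in a small simulation. This is a sketch under assumed design values (N = 2000, T = 6, true coefficient 0.5, unit variances, a burn-in so that initial values are close to stationary); since the simulated $y_{i0}$ is observed, the IV sums here run over $t = 2, \dots, T$.

```python
import numpy as np

rng = np.random.default_rng(7)
N, T, rho = 2000, 6, 0.5           # hypothetical design: large N, small T
burn = 50                          # burn-in so that y_i0 is close to stationary

alpha = rng.normal(size=N)
y = np.zeros((N, burn + T + 1))
y[:, 0] = alpha / (1 - rho)
for t in range(1, burn + T + 1):
    y[:, t] = rho * y[:, t - 1] + alpha + rng.normal(size=N)
y = y[:, burn:]                    # columns are y_i0, y_i1, ..., y_iT

# Within estimator: demean y_it and y_{i,t-1} over t = 1, ..., T
y_cur, y_lag = y[:, 1:], y[:, :-1]
yc = y_cur - y_cur.mean(axis=1, keepdims=True)
yl = y_lag - y_lag.mean(axis=1, keepdims=True)
rho_within = (yc * yl).sum() / (yl ** 2).sum()

# IV estimator on first differences, instrumenting (y_{i,t-1} - y_{i,t-2}) by y_{i,t-2}
dy = y[:, 2:] - y[:, 1:-1]         # y_it - y_{i,t-1},  t = 2, ..., T
dy_lag = y[:, 1:-1] - y[:, :-2]    # y_{i,t-1} - y_{i,t-2}
z = y[:, :-2]                      # y_{i,t-2}
rho_iv = (dy * z).sum() / (dy_lag * z).sum()
```

In this design the Within estimate falls well below 0.5, in line with the asymptotic bias for $T = 6$, while the IV estimate stays close to the true value at the cost of a larger sampling variance.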

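Consider now a more general model:

$$y_{it} = \rho y_{i,t-1} + x_{it}\beta + z_i\gamma + \alpha_i + \varepsilon_{it}.$$

IV Estimation proceeds as follows.

Step 1. First-difference the model, to get

$$(y_{it} - y_{i,t-1}) = \rho(y_{i,t-1} - y_{i,t-2}) + (x_{it} - x_{i,t-1})\beta + \varepsilon_{it} - \varepsilon_{i,t-1}.$$

Use $y_{i,t-2}$ or $(y_{i,t-2} - y_{i,t-3})$ as instrument for $(y_{i,t-1} - y_{i,t-2})$ and estimate $\rho$, $\beta$ with the IV procedure.

Step 2. Substitute $\hat{\rho}$ and $\hat{\beta}$ in the Between equation:

$$\bar{y}_i - \hat{\rho}\bar{y}_{i,-1} - \bar{x}_i\hat{\beta} = z_i\gamma + \alpha_i + \bar{\varepsilon}_i, \quad i = 1, 2, \dots, N,$$

and estimate $\gamma$ by OLS.

Step 3. Estimate variance components:

$$\hat{\sigma}_\varepsilon^2 = \frac{1}{2N(T-1)}\sum_{i=1}^{N}\sum_{t=1}^{T}\left[(y_{it} - y_{i,t-1}) - \hat{\rho}(y_{i,t-1} - y_{i,t-2}) - (x_{it} - x_{i,t-1})\hat{\beta}\right]^2,$$

$$\hat{\sigma}_\alpha^2 = \frac{1}{N}\sum_{i=1}^{N}\left[\bar{y}_i - \hat{\rho}\bar{y}_{i,-1} - z_i\hat{\gamma} - \bar{x}_i\hat{\beta}\right]^2 - \frac{1}{T}\hat{\sigma}_\varepsilon^2.$$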


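Consistency of the estimator:

IV estimators of $\rho$, $\beta$ and $\sigma_\varepsilon^2$ are consistent when $N$ or $T \to \infty$;

IV estimators of $\gamma$ and $\sigma_\alpha^2$ rely on the cross-section of $N$ individual means: they are consistent only when $N \to \infty$, and inconsistent when $N$ is fixed and $T \to \infty$.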

5.3 The Random-effects model

We now treat $\alpha_i$ as a random variable, in addition to $\varepsilon_{it}$. As for static models, $\alpha_i$ is not eliminated, but it is correlated by construction with lagged dependent variable $y_{i,t-1}$.

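5.3.1 Bias in the ML estimator

In the simple model $y_{it} = \rho y_{i,t-1} + \alpha_i + \varepsilon_{it}$, the MLE is equivalent to the OLS estimator:

$$\hat{\rho} = \frac{\sum_{i=1}^{N}\sum_{t=1}^{T}y_{it}y_{i,t-1}}{\sum_{i=1}^{N}\sum_{t=1}^{T}y_{i,t-1}^2} = \rho + \frac{\sum_{i=1}^{N}\sum_{t=1}^{T}(\alpha_i + \varepsilon_{it})y_{i,t-1}}{\sum_{i=1}^{N}\sum_{t=1}^{T}y_{i,t-1}^2}.$$

We show that

$$\text{plim}_{N\to\infty}\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}(\alpha_i + \varepsilon_{it})y_{i,t-1} = \frac{1}{T}\frac{1-\rho^T}{1-\rho}Cov(y_{i0}, \alpha_i) + \frac{1}{T}\frac{\sigma_\alpha^2}{(1-\rho)^2}\left[(T-1) - T\rho + \rho^T\right],$$

and

$$\text{plim}_{N\to\infty}\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}y_{i,t-1}^2 = \frac{1-\rho^{2T}}{T(1-\rho^2)}\cdot\frac{\sum_{i}^{N}y_{i0}^2}{N} + \frac{\sigma_\alpha^2}{(1-\rho)^2}\cdot\frac{1}{T}\left[T - 2\frac{1-\rho^T}{1-\rho} + \frac{1-\rho^{2T}}{1-\rho^2}\right]$$

$$+ \frac{2}{T(1-\rho)}\left[\frac{1-\rho^T}{1-\rho} - \frac{1-\rho^{2T}}{1-\rho^2}\right]Cov(y_{i0}, \alpha_i) + \frac{\sigma_\varepsilon^2}{T(1-\rho^2)^2}\left[(T-1) - T\rho^2 + \rho^{2T}\right].$$

The bias depends on the behavior of initial conditions $y_{i0}$ (constant or generated as $y_{it}$).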

5.3.2 An equivalent representation

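We consider a more general model

$$y_{it} = \rho y_{i,t-1} + x_{it}\beta + z_i\gamma + u_{it},$$

with the following assumptions:

$$|\rho| < 1, \quad E(\alpha_i) = E(\varepsilon_{it}) = 0,$$

$$E(\alpha_ix_{it}) = 0, \quad E(\alpha_iz_i) = 0, \quad E(\alpha_i\varepsilon_{it}) = 0,$$

$$E(\alpha_i\alpha_j) = \begin{cases}\sigma_\alpha^2 & \text{if } i = j,\\ 0 & \text{otherwise;}\end{cases} \qquad E(\varepsilon_{it}\varepsilon_{js}) = \begin{cases}\sigma_\varepsilon^2 & \text{if } i = j, t = s,\\ 0 & \text{otherwise.}\end{cases}$$

We can also write

$$w_{it} = \rho w_{i,t-1} + x_{it}\beta + z_i\gamma + \varepsilon_{it}, \qquad y_{it} = w_{it} + \eta_i,$$

where $\eta_i = \alpha_i/(1-\rho)$, $E\eta_i = 0$, $Var(\eta_i) = \sigma_\eta^2 = \sigma_\alpha^2/(1-\rho)^2$, and the dynamic process $\{w_{it}\}$ is independent from individual effect $\eta_i$.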


5.3.3 The role of initial conditions

The two equivalent specifications of the model are:

(A) $y_{it} = \rho y_{i,t-1} + x_{it}\beta + z_i\gamma + \alpha_i + \varepsilon_{it}$;

(B) $w_{it} = \rho w_{i,t-1} + x_{it}\beta + z_i\gamma + \varepsilon_{it}$, $\; y_{it} = w_{it} + \eta_i$.

In model (A), $y_{it}$ is driven by unobserved characteristics $\alpha_i$, different across units, in addition to $x_{it}$ and $z_i$.

In model (B), dynamic process $w_{it}$ is independent from individual effects $\eta_i$. Conditional on exogenous $x_{it}$ and $z_i$, $w_{it}$ are driven by identical processes with i.i.d. shocks $\varepsilon_{it}$. But observed value $y_{it}$ is shifted by individual-specific effect $\eta_i$.

Possible interpretation: $w_{it}$ is a latent variable, $y_{it}$ is observed, and $\eta_i$ is a time-invariant measurement error.

The two processes are equivalent because $w_{it}$ is unobserved. But assumptions (or knowledge) on initial conditions may help to distinguish between both processes.

Different cases:

1/ $y_{i0}$ fixed;

2/ $y_{i0}$ random;

2.a/ $y_{i0}$ independent of $\alpha_i$, with $E(y_{i0}) = \mu_{y0}$ and $Var(y_{i0}) = \sigma_{y0}^2$;

2.b/ $y_{i0}$ correlated with $\alpha_i$, with $Cov(y_{i0}, \alpha_i) = \phi\sigma_{y0}^2$;

3/ $w_{i0}$ fixed;

4/ $w_{i0}$ random;

4.a/ $w_{i0}$ random with common mean $\mu_w$ and variance $\sigma_\varepsilon^2/(1-\rho^2)$ (stationarity assumption);

4.b/ $w_{i0}$ random with common mean $\mu_w$ and arbitrary variance $\sigma_{w0}^2$;

4.c/ $w_{i0}$ random with mean $\theta_{i0}$ and variance $\sigma_\varepsilon^2/(1-\rho^2)$ (stationarity assumption);

4.d/ $w_{i0}$ random with mean $\theta_{i0}$ and arbitrary variance $\sigma_{w0}^2$.

See Appendix 4 for a derivation of Maximum Likelihood estimators in each case.

5.3.4 Possible inconsistency of GLS

In cases 1/ and 2.a/ ($y_{i0}$ fixed, or random but independent of $\eta_i$):

When $\sigma_\alpha^2$ and $\sigma_\varepsilon^2$ are known, maximizing the log-likelihood wrt. $\gamma$, $\beta$ and $\rho$ yields the GLS estimator. When $\sigma_\alpha^2$ and $\sigma_\varepsilon^2$ are unknown, feasible GLS applies by using consistent estimates of these variances in $V_T$.

Other cases:

Estimators for $\gamma$ and $\beta$ are consistent when $T \to \infty$, because GLS converges to Within. When $N \to \infty$ and $T$ is fixed, GLS is inconsistent in cases where initial values are correlated with individual effects.

5.3.5 Example: The Balestra-Nerlove study

Seminal paper on Dynamic Panel Data models (1966). Household

demand for natural gas in the US, including a/ the demand due

to replacement of gas appliances, and b/ demand due to increases

in the stock of appliances.


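Table 5.2: Properties of the MLE for dynamic panel data models

Parameters                                                        N fixed, T → ∞    T fixed, N → ∞

Case 1: $y_{i0}$ fixed
  $\gamma, \beta, \sigma_\varepsilon^2$                           Consistent        Consistent
  $\rho, \sigma_\alpha^2$                                         Inconsistent      Consistent

Case 2.a: $y_{i0}$ random, independent of $\eta_i$
  $\gamma, \beta, \sigma_\varepsilon^2$                           Consistent        Consistent
  $\mu_{y0}, \rho, \sigma_\alpha^2, \sigma_{y0}^2$                Inconsistent      Consistent

Case 2.b: $y_{i0}$ random, correlated with $\eta_i$
  $\gamma, \beta, \sigma_\varepsilon^2$                           Consistent        Consistent
  $\mu_{y0}, \rho, \sigma_\alpha^2, \sigma_{y0}^2, \phi$          Inconsistent      Consistent

Case 3: $w_{i0}$ fixed
  $\gamma, \beta, \sigma_\varepsilon^2$                           Consistent        Inconsistent
  $w_{i0}, \rho, \sigma_\eta^2$                                   Inconsistent      Inconsistent

Case 4.a: $w_{i0}$ random, mean $\mu_w$, variance $\sigma_\varepsilon^2/(1-\gamma^2)$
  $\gamma, \beta, \sigma_\varepsilon^2$                           Consistent        Consistent
  $\mu_w, \rho, \sigma_\eta^2$                                    Inconsistent      Consistent

Case 4.b: $w_{i0}$ random, mean $\mu_w$, variance $\sigma_{w0}^2$
  $\gamma, \beta, \sigma_\varepsilon^2$                           Consistent        Consistent
  $\sigma_{w0}^2, \rho, \sigma_\eta^2, \mu_w$                     Inconsistent      Consistent

Case 4.c: $w_{i0}$ random, mean $\theta_{i0}$, variance $\sigma_\varepsilon^2/(1-\gamma^2)$
  $\gamma, \beta, \sigma_\varepsilon^2$                           Consistent        Inconsistent
  $\theta_{i0}, \rho, \sigma_\eta^2$                              Inconsistent      Inconsistent

Case 4.d: $w_{i0}$ random, mean $\theta_{i0}$, variance $\sigma_{w0}^2$
  $\gamma, \beta, \sigma_\varepsilon^2$                           Consistent        Inconsistent
  $\theta_{i0}, \sigma_\eta^2, \sigma_{w0}^2$                     Inconsistent      Inconsistent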


Demand system:

$$G^*_{it} = G_{it} - (1-r)\,G_{i,t-1},$$

$$F^*_{it} = F_{it} - (1-r)\,F_{i,t-1},$$

$$F_{it} = a_0 + a_1 N_{it} + a_2 I_{it},$$

$$G^*_{it} = b_0 + b_1 P_{it} + b_2 F^*_{it},$$

where $G^*_{it}$ and $G_{it}$ are respectively the new demand and the actual demand for gas at time $t$ from unit $i$, $r$ is the appliances depreciation rate, $F^*_{it}$ and $F_{it}$ are respectively the new and actual demand for all types of fuel, $N_{it}$ is total population, $I_{it}$ is per-head income, and $P_{it}$ is the relative price of gas.

Solving the system, we have the equation to be estimated:

$$G_{it} = \beta_0 + \beta_1 P_{it} + \beta_2 \Delta N_{it} + \beta_3 N_{i,t-1} + \beta_4 \Delta I_{it} + \beta_5 I_{i,t-1} + \beta_6 G_{i,t-1},$$

where $\Delta N_{it} = N_{it} - N_{i,t-1}$, $\Delta I_{it} = I_{it} - I_{i,t-1}$, and $\beta_6 = 1 - r$.
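The substitution behind the estimating equation can be sketched as follows. The mapping from the structural coefficients $(a_0, a_1, a_2, b_0, b_1, b_2, r)$ to $(\beta_0, \ldots, \beta_6)$ below is worked out here from the four equations of the demand system; it is not quoted from the original study.

```latex
\begin{align*}
F^*_{it} &= F_{it} - (1-r)F_{i,t-1} \\
         &= r a_0 + a_1\left[N_{it} - (1-r)N_{i,t-1}\right] + a_2\left[I_{it} - (1-r)I_{i,t-1}\right] \\
         &= r a_0 + a_1\left(\Delta N_{it} + r N_{i,t-1}\right) + a_2\left(\Delta I_{it} + r I_{i,t-1}\right), \\[4pt]
G_{it}   &= G^*_{it} + (1-r)G_{i,t-1} = b_0 + b_1 P_{it} + b_2 F^*_{it} + (1-r)G_{i,t-1} \\
         &= \underbrace{(b_0 + r a_0 b_2)}_{\beta_0} + \underbrace{b_1}_{\beta_1} P_{it}
            + \underbrace{a_1 b_2}_{\beta_2} \Delta N_{it} + \underbrace{r a_1 b_2}_{\beta_3} N_{i,t-1}
            + \underbrace{a_2 b_2}_{\beta_4} \Delta I_{it} + \underbrace{r a_2 b_2}_{\beta_5} I_{i,t-1}
            + \underbrace{(1-r)}_{\beta_6} G_{i,t-1}.
\end{align*}
```

Under this mapping $\beta_3/\beta_2 = \beta_5/\beta_4 = r = 1 - \beta_6$, so the reduced form carries restrictions that the unrestricted regression does not impose.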

Estimation procedures: OLS, Within (LSDV) and GLS (with the assumption that initial conditions $G_{i0}$ are fixed, case 1/).

In accordance with the theory, the estimate of $\gamma$ (here, $\beta_6$) is biased upward for OLS and downward for Within.


Table 5.3: Parameter estimates, Balestra-Nerlove model

Parameter                        OLS             Within          GLS

$\beta_0$ (Intercept)            -3.650          -               -4.091
                                 (3.316)         -               (11.544)
$\beta_1$ ($P_{it}$)             -0.0451(*)      -0.2026         -0.0879(*)
                                 (0.027)         (0.0532)        (0.0468)
$\beta_2$ ($\Delta N_{it}$)      0.0174(*)       -0.0135         -0.00122
                                 (0.0093)        (0.0215)        (0.0190)
$\beta_3$ ($N_{i,t-1}$)          0.00111(**)     0.0327(**)      0.00360(**)
                                 (0.00041)       (0.0046)        (0.00129)
$\beta_4$ ($\Delta I_{it}$)      0.0183(**)      0.0131          0.0170(**)
                                 (0.0080)        (0.0084)        (0.0080)
$\beta_5$ ($I_{i,t-1}$)          0.00326         0.0044          0.00354
                                 (0.00197)       (0.0101)        (0.00622)
$\beta_6$ ($G_{i,t-1}$)          1.010(**)       0.6799(**)      0.9546(**)
                                 (0.014)         (0.0633)        (0.0372)

Notes. N = 36, T = 11. Standard errors are in parentheses. (*) and (**): parameter significant at the 10% and 5% level respectively.

Part II

Generalized Method of Moments estimation

Chapter 6

The GMM estimator

Generalized Method of Moments: an efficient way to obtain consistent parameter estimates under mild conditions on the model. Very popular in estimating structural economic models, as it requires far fewer conditions on model disturbances than Maximum Likelihood. Another important advantage: it is easy to obtain parameter estimates that are robust to heteroskedasticity of unknown form.

6.1 Moment conditions and the method of moments

6.1.1 Moment conditions

Consider a sample of size $N$, $\{x_i,\ i = 1, 2, \ldots, N\}$, from which one wishes to estimate a $p \times 1$ vector $\theta$ whose true value is $\theta_0$.

Note: the notation above is very general; $x_i$ will typically include dependent (endogenous) and explanatory (exogenous, endogenous) variables.

Let $f(x_i, \theta)$ denote a $q \times 1$ function whose expectation $E[f(x_i, \theta)]$ exists and is finite. Moment conditions are then defined as

$$E[f(x_i, \theta_0)] = 0.$$

6.1.2 Example: Linear regression model

Consider the linear model

$$y_i = x_i'\beta_0 + u_i, \qquad i = 1, 2, \ldots, N,$$

where $\beta_0$ is the true value of the parameter vector $\beta$, and $u_i$ is the error term.

A common assumption is $E(u_i \mid x_i) = 0 \Leftrightarrow E(y_i \mid x_i) = x_i'\beta_0$, and from the Law of Iterated Expectations:

$$E(x_i u_i) = E[E(x_i u_i \mid x_i)] = E[x_i E(u_i \mid x_i)] = 0.$$

In terms of the definition above, $\theta = \beta$ and $f((x_i, y_i), \theta) = x_i(y_i - x_i'\beta)$. Moment conditions are then

$$E(x_i u_i) = E[x_i(y_i - x_i'\beta_0)] = 0.$$

Note that here $p = q$: there are as many moment conditions as parameters to estimate.

Suppose now we do not assume $E(u_i \mid x_i) = 0$ but instead that $E(z_i u_i) = 0$. Vector $z_i$ is $q \times 1$ and would consist of instruments such that

$$E(z_i u_i) = E[z_i(y_i - x_i'\beta_0)] = 0, \quad \text{or} \quad f[(x_i, y_i, z_i), \theta] = z_i(y_i - x_i'\beta).$$

There are $q$ moment equations (as many as there are instruments) and $p$ parameters to estimate. Hence, the identification condition is $q \geq p$.

6.1.3 Example: Gamma distribution

A sample $\{x_i,\ i = 1, 2, \ldots, N\}$ is drawn from a Gamma distribution $\Gamma(a, b)$ with true values $a_0$ and $b_0$. Relationship between the parameters and the first two moments of the distribution:

$$E(x_i) = \frac{a_0}{b_0}, \qquad E[x_i - E(x_i)]^2 = \frac{a_0}{b_0^2}.$$

In the notation of the definition above, $\theta = (a, b)$ and

$$f(x_i, \theta) = \left[\, x_i - \frac{a}{b},\ \left(x_i - \frac{a}{b}\right)^2 - \frac{a}{b^2} \,\right]',$$

so that $E[f(x_i, \theta_0)] = 0$.

6.1.4 Method of moments estimation

How to estimate $\theta$ using the moment conditions given above? In the case where $p = q$ (as many conditions as parameters), we could solve $E[f(x_i, \theta_0)] = 0$ for $\theta_0$. But $E[f(\cdot)]$ is unknown, whereas function values $f(x_i, \theta)$ can be computed $\forall \theta, \forall i$. Also, sample moments of the function $f(\cdot)$ can be computed:

$$f_N(\theta) = \frac{1}{N}\sum_{i=1}^{N} f(x_i, \theta).$$

Basic idea of method of moments estimation: if $E(f)$ is close to $f_N$ (population moments close to empirical moments), then $\hat\theta_N$ is a convenient estimate for $\theta_0$, where $f_N(\hat\theta_N) = 0$:

$$0 = E[f(\theta_0)] \approx f_N(\hat\theta_N) \ \Rightarrow\ \theta_0 \approx \hat\theta_N.$$

Two important conditions need to hold for method of moments estimation to be valid: a) $E(f)$ is adequately approximated by $f_N$; b) the moment conditions can be solved for $\hat\theta_N$.
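For the Gamma example of Section 6.1.3, the two sample moment conditions can be solved in closed form: setting the sample mean equal to $a/b$ and the second central sample moment equal to $a/b^2$ gives $\hat b = \bar x / s^2$ and $\hat a = \bar x^2 / s^2$. The sketch below illustrates this on simulated data; the true values, sample size and the function name `gamma_mom` are arbitrary choices made for the illustration.

```python
import numpy as np

def gamma_mom(x):
    """Solve mean(x - a/b) = 0 and mean((x - a/b)^2 - a/b^2) = 0 for (a, b)."""
    m = x.mean()
    v = ((x - m) ** 2).mean()      # second central sample moment (divides by N)
    b_hat = m / v
    a_hat = m * b_hat              # equals m^2 / v
    return a_hat, b_hat

rng = np.random.default_rng(1)
a0, b0 = 3.0, 2.0                  # illustrative true values
# NumPy parameterises the Gamma by shape and scale, with scale = 1 / b
x = rng.gamma(shape=a0, scale=1.0 / b0, size=100_000)

a_hat, b_hat = gamma_mom(x)

# Sample moment function f_N evaluated at the estimate: zero up to rounding
f_bar = np.array([
    np.mean(x - a_hat / b_hat),
    np.mean((x - a_hat / b_hat) ** 2 - a_hat / b_hat ** 2),
])
print(a_hat, b_hat, f_bar)
```

Here $p = q = 2$, so the sample moment conditions hold exactly at the estimate, while the estimate itself only approximates $(a_0, b_0)$.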

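Example: linear regression.

Sample moment conditions are

$$\frac{1}{N}\sum_{i=1}^{N} x_i \hat u_i = \frac{1}{N}\sum_{i=1}^{N} x_i\,(y_i - x_i'\hat\beta_N) = 0,$$

and solving for $\hat\beta_N$ yields

$$\hat\beta_N = \left(\sum_{i=1}^{N} x_i x_i'\right)^{-1} \sum_{i=1}^{N} x_i y_i.$$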
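A small numerical illustration of this result: the method of moments solution coincides with the least-squares estimator, and the sample moment conditions are zero at the estimate. The data-generating values below are arbitrary assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(2)
N, p = 500, 3
X = rng.normal(size=(N, p))                  # row i of X is x_i'
beta0 = np.array([1.0, -0.5, 2.0])           # illustrative true parameter
y = X @ beta0 + rng.normal(size=N)

# Method of moments: (sum_i x_i x_i')^{-1} sum_i x_i y_i
Sxx = X.T @ X
Sxy = X.T @ y
beta_mom = np.linalg.solve(Sxx, Sxy)

# Least squares, computed independently for comparison
beta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)

# Sample moment conditions (1/N) sum_i x_i (y_i - x_i' beta) at the estimate
f_N = X.T @ (y - X @ beta_mom) / N
print(beta_mom, beta_ls, f_N)
```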

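6.1.5 Example: Poisson counting model

Poisson process: the dependent variable is discrete (number of events, etc.). Restriction: the mean of the distribution is equal to the variance.

Assumption: dependent variables $y_1, y_2, \ldots, y_N$ are distributed according to independent Poisson distributions, with parameters $\lambda_1, \lambda_2, \ldots, \lambda_N$ respectively:

$$\mathrm{Prob}[y_i = r] = \frac{\exp(-\lambda_i)\,\lambda_i^{r}}{r!}.$$

We assume the $\lambda_i$'s depend on explanatory variables through a log-linear relationship:

$$\log \lambda_i = \beta_0 + \sum_{j=1}^{p} \beta_j x_{ij}.$$

The likelihood of the Poisson model is

$$L = \prod_{i=1}^{N} \frac{\exp(-\lambda_i)\,\lambda_i^{y_i}}{y_i!} = \exp\left[ -\sum_{i=1}^{N}\lambda_i + \beta_0 \sum_{i=1}^{N} y_i + \sum_{j=1}^{p}\beta_j \sum_{i=1}^{N} x_{ij} y_i \right] \left( \prod_{i=1}^{N} y_i! \right)^{-1}.$$

Let us consider the following sample moments:

$$T_0 = \sum_{i=1}^{N} y_i, \qquad T_j = \sum_{i=1}^{N} x_{ij} y_i, \quad j = 1, \ldots, p,$$

and use the fact that

$$\frac{\partial \lambda_i}{\partial \beta_0} = \lambda_i \quad \text{and} \quad \frac{\partial \lambda_i}{\partial \beta_j} = x_{ij}\lambda_i.$$

If we set the derivatives of $\log L$ wrt. $\beta_0$ and the $\beta_j$'s to 0, we get

$$T_0 = \sum_{i=1}^{N} \hat\lambda_i, \qquad T_j = \sum_{i=1}^{N} x_{ij}\hat\lambda_i, \quad j = 1, \ldots, p,$$

where $\hat\lambda_i = \exp(\hat\beta_0 + \sum_{j=1}^{p} \hat\beta_j x_{ij})$. Hence, we match sample moments $T_0$ and $T_j$ to theoretical moments $\sum_{i=1}^{N} \exp(\beta_0 + \sum_{j=1}^{p}\beta_j x_{ij})$ and $\sum_{i=1}^{N} x_{ij}\exp(\beta_0 + \sum_{j=1}^{p}\beta_j x_{ij})$ respectively.

We have $p + 1$ such matching conditions for $p + 1$ parameters.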
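The $p + 1$ matching conditions have no closed-form solution, but they can be solved numerically. The sketch below uses Newton's method, which is one possible choice and not necessarily the one used in the course; the function name `poisson_moment_match`, the starting value and the simulated design are assumptions made for the illustration.

```python
import numpy as np

def poisson_moment_match(X, y, n_iter=50, tol=1e-10):
    """Solve T_j = sum_i x_ij * lambda_i(beta), j = 0..p, by Newton's method.
    X must contain a leading column of ones, so X.T @ y stacks (T_0, ..., T_p)."""
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())                 # starting value: matches T_0 when slopes are 0
    T_obs = X.T @ y                            # sample moments
    for _ in range(n_iter):
        lam = np.exp(X @ beta)
        gap = T_obs - X.T @ lam                # sample minus fitted moments
        if np.max(np.abs(gap)) < tol * len(y):
            break
        jac = X.T @ (X * lam[:, None])         # derivative of fitted moments wrt beta
        beta = beta + np.linalg.solve(jac, gap)
    return beta

rng = np.random.default_rng(4)
N = 5_000
X = np.column_stack([np.ones(N), rng.normal(size=(N, 2))])
beta_true = np.array([0.5, 0.3, -0.2])         # illustrative values
y = rng.poisson(np.exp(X @ beta_true))

beta_hat = poisson_moment_match(X, y)
print(beta_hat)
print(X.T @ y, X.T @ np.exp(X @ beta_hat))     # matched moments
```

Because the matching conditions are exactly the first-order conditions of the log-likelihood, the solution is also the Poisson maximum likelihood estimate.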

6.1.6 Comments

Note the difference between the Method of Moments philosophy and the usual estimation criteria. For Maximum Likelihood and Least Squares, we maximize (minimize) a criterion:

$$\hat\theta = \arg\max_\theta\ \log L(\theta) \quad \text{(MLE)};$$

$$\hat\theta = \arg\min_\theta\ \frac{1}{N}\sum_{i=1}^{N} [y_i - f(x_i, \theta)]^2 \quad \text{(LS)};$$

whereas here, we start from the First-order Conditions and solve the system for $\theta$.

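Example: Instrumental Variable estimation

We could consider minimizing the IV criterion wrt. $\beta$:

$$\hat\beta = \arg\min_\beta\ (Y - X\beta)' Z (Z'Z)^{-1} Z' (Y - X\beta),$$

where $Z$ is an $N \times q$ matrix of instruments, or start from the FOC (here with $q = p$, so that $Z'X$ is square):

$$\frac{1}{N}\sum_{i=1}^{N} z_i \hat u_i = \frac{1}{N}\sum_{i=1}^{N} z_i\,(y_i - x_i'\hat\beta) = 0$$

$$\Leftrightarrow\quad \hat\beta = \left(\sum_{i=1}^{N} z_i x_i'\right)^{-1} \sum_{i=1}^{N} z_i y_i = (Z'X)^{-1} Z'Y.$$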

Equivalently, we could maximize the log likelihood wrt. $\theta$ or start from the FOC

$$\frac{1}{N}\sum_{i=1}^{N} \left.\frac{\partial \log L(\theta)}{\partial \theta}\right|_{\theta = \hat\theta} = 0,$$

which can be regarded here as a set of sample moment conditions.
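For the Instrumental Variable example above, the two routes (minimizing the IV criterion, or solving the sample moment conditions) can be compared numerically in the just-identified case. The sketch below uses a simulated design with one endogenous regressor and one instrument; all numerical values are assumptions chosen for the illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 20_000
z = rng.normal(size=N)                     # instrument, independent of u
v = rng.normal(size=N)
u = 0.8 * v + rng.normal(size=N)           # error term, correlated with x through v
x = 0.7 * z + v                            # endogenous regressor
beta0 = 1.5                                # illustrative true value
Y = beta0 * x + u

X = x[:, None]                             # N x p with p = 1
Z = z[:, None]                             # N x q with q = 1

# Route 1: solve the sample moment conditions, (Z'X)^{-1} Z'Y
beta_foc = np.linalg.solve(Z.T @ X, Z.T @ Y)

# Route 2: minimiser of the IV criterion, (X'P_Z X)^{-1} X'P_Z Y
PzX = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)     # P_Z X with P_Z = Z(Z'Z)^{-1}Z'
beta_crit = np.linalg.solve(PzX.T @ X, PzX.T @ Y)

# OLS for comparison: it ignores the correlation between x and u
beta_ols = np.linalg.solve(X.T @ X, X.T @ Y)
print(beta_foc, beta_crit, beta_ols)
```

The two IV routes agree up to rounding, while OLS is pulled away from the true value by the correlation between $x_i$ and $u_i$.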

Problems that remain to be solved:

- Ensure that we can replace population moments by sample moments, for the Method of Moments to work.

- What if the system of moment conditions is overidentified (more conditions than parameters)?

- How to be sure our moment conditions are valid (e.g., valid choice of instruments)?


6.2 The Generalized Method of Moments (GMM)

6.2.1 Introduction

As the name indicates, GMM is an extension of the Method of Moments to the case where parameters are overidentified by the moment conditions. Equations $E[f(x_i, \theta_0)] = 0$ represent $q$ conditions for $p$ unknown parameters with $q > p$, therefore in general we cannot find a vector $\hat\theta_N$ satisfying $f_N(\theta) = 0$ exactly.

But we can look for the $\theta$ that makes $f_N(\theta)$ as close to 0 as possible, by defining

$$\hat\theta_N = \arg\min_\theta\ Q_N(\theta), \qquad Q_N(\theta) = f_N(\theta)' A_N f_N(\theta),$$

where $A_N$ is a positive definite weighting matrix of order $O(1)$.

Important note: in the just-identified case, $Q_N(\hat\theta_N) = 0$ because $f_N(\hat\theta_N) = 0$, but in the over-identified case, $Q_N(\hat\theta_N) > 0$.

This fact is important for model checking (we will come to this point later in the course).
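The difference between the just-identified and over-identified cases can be seen on a linear model with moment function $z_i(y_i - x_i'\theta)$, for which the minimizer of $Q_N$ has a closed form. The sketch below is an illustration under assumptions of its own: the weighting matrix is taken to be the identity, the helper `linear_gmm` is hypothetical, and the simulated design is arbitrary.

```python
import numpy as np

def linear_gmm(X, Z, y, A=None):
    """Minimise Q_N(b) = f_N(b)' A f_N(b), with f_N(b) = (1/N) Z'(y - X b).
    Returns the minimiser and the minimised value of Q_N."""
    N = len(y)
    if A is None:
        A = np.eye(Z.shape[1])             # assumption: identity weighting matrix
    Szx = Z.T @ X / N
    Szy = Z.T @ y / N
    b = np.linalg.solve(Szx.T @ A @ Szx, Szx.T @ A @ Szy)
    f_N = Szy - Szx @ b                    # sample moments at the estimate
    return b, float(f_N @ A @ f_N)

rng = np.random.default_rng(5)
N = 5_000
Z = rng.normal(size=(N, 3))                # three valid instruments
v = rng.normal(size=N)
u = 0.8 * v + rng.normal(size=N)
x = Z @ np.array([0.7, 0.5, 0.3]) + v      # one endogenous regressor
y = 1.5 * x + u
X = x[:, None]

b_just, Q_just = linear_gmm(X, Z[:, :1], y)   # q = p = 1
b_over, Q_over = linear_gmm(X, Z, y)          # q = 3 > p = 1
print(b_just, Q_just)
print(b_over, Q_over)
```

With one instrument the criterion is zero up to rounding error; with three instruments it is strictly positive even though all instruments are valid, because three sample moments cannot be set to zero with a single parameter.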

6.2.2 Example: Just-identified IV model

Consider $Y = X\beta + u$ with condition $E(W'u) = 0$ ($W$ are instruments), and $\mathrm{rank}(W'X) = p$. Solving for $\beta$ we have $\hat\beta = (W'X)^{-1}(W'Y)$, which we substitute into the IV criterion:

$$u(\hat\beta)' P_W\, u(\hat\beta) = \left[ Y - X(W'X)^{-1}(W'Y) \right]' W(W'W)^{-1}W \ldots$$
