DEEQA,Ecole Doctorale MPSE
Academic year 2003-2004
Advanced Econometrics
Panel data econometrics and GMM estimation
Alban Thomas
MF 102, [email protected]
Purpose of the course
Present recent developments in econometrics that allow for a consistent treatment of the impact of unobserved heterogeneity on model predictions: Panel data analysis.
Present a convenient econometric framework for dealing with restrictions imposed by theory: Method of Moments estimation.
Deal with discrete-choice models with unobserved heterogeneity.
Two keywords: unobserved heterogeneity and endogeneity.
Methods:
- Fixed Effects Least Squares
- Generalized Least Squares
- Instrumental Variables
- Maximum Likelihood estimation for Panel Data models
- Generalized Method of Moments for Time Series
- Generalized Method of Moments for Panel Data
- Heteroskedasticity-consistent estimation
- Dynamic Panel Data models
- Logit and Probit models for Panel Data
- Simulation-based inference
- Nonparametric and Semiparametric estimation
Statistical software: SAS, GAUSS, STATA (?)
Contents
I Panel Data Models
1 Introduction
1.1 Gains in pooling cross section and time series
1.1.1 Discrimination between alternative models
1.1.2 Examples
1.1.3 Less collinearity between explanatory variables
1.1.4 May reduce bias due to missing or unobserved variables
1.2 Analysis of variance
1.3 Some definitions
2 The linear model
2.1 Notation
2.1.1 Model notation
2.1.2 Standard matrices and operators
2.1.3 Important properties of operators
2.2 The One-Way Fixed Effects model
2.2.1 The estimator in terms of the Frisch-Waugh-Lovell theorem
2.2.2 Interpretation as a covariance estimator
2.2.3 Comments
2.2.4 Testing for poolability and individual effects
2.3 The Random Effects model
2.3.1 Notation and assumptions
2.3.2 GLS estimation of the Random-effect model
2.3.3 Comparison between GLS, OLS and Within
2.3.4 Fixed individual effects or error components?
2.3.5 Example: Wage equation, Hausman (1978)
2.3.6 Best Quadratic Unbiased Estimators (BQU) of variances
3 Extensions
3.1 The Two-way panel data model
3.1.1 The Two-way fixed-effect model
3.1.2 Example: Production function (Hoch 1962)
3.2 More on non-spherical disturbances
3.2.1 Heteroskedasticity in individual effect
3.2.2 "Typical" heteroskedasticity
3.3 Unbalanced panel data models
3.3.1 Introduction
3.3.2 Fixed effect models for unbalanced panels
4 Augmented panel data models
4.1 Introduction
4.2 Choice between Within and GLS
4.3 An important test for endogeneity
4.4 Instrumental Variable estimation: Hausman-Taylor GLS estimator
4.4.1 Instrumental Variable estimation
4.4.2 IV in a panel-data context
4.4.3 Exogeneity assumptions and a first instrument matrix
4.4.4 More efficient procedures: Amemiya-MaCurdy and Breusch-Mizon-Schmidt
4.5 Computation of variance-covariance matrix for IV estimators
4.5.1 Full IV-GLS estimation procedure
4.6 Example: Wage equation
4.6.1 Model specification
4.7 Application: returns to education
4.7.1 Variables related to job status
4.7.2 Variables related to characteristics of household heads
5 Dynamic panel data models
5.1 Motivation
5.1.1 Dynamic formulations from dynamic programming problems
5.1.2 Euler equations and consumption
5.1.3 Long-run relationships in economics
5.2 The dynamic fixed-effect model
5.2.1 Bias in the Fixed-Effects estimator
5.2.2 Instrumental-variable estimation
5.3 The Random-effects model
5.3.1 Bias in the ML estimator
5.3.2 An equivalent representation
5.3.3 The role of initial conditions
5.3.4 Possible inconsistency of GLS
5.3.5 Example: The Balestra-Nerlove study
II Generalized Method of Moments estimation
6 The GMM estimator
6.1 Moment conditions and the method of moments
6.1.1 Moment conditions
6.1.2 Example: Linear regression model
6.1.3 Example: Gamma distribution
6.1.4 Method of moments estimation
6.1.5 Example: Poisson counting model
6.1.6 Comments
6.2 The Generalized Method of Moments (GMM)
6.2.1 Introduction
6.2.2 Example: Just-identified IV model
6.2.3 A definition
6.2.4 Example: The IV estimator again
6.3 Asymptotic properties of the GMM estimator
6.3.1 Consistency
6.3.2 Asymptotic normality
6.4 Optimal and two-step GMM
6.5 Inference with GMM
6.6 Extension: optimal instruments for GMM
6.6.1 Conditional moment restrictions
6.6.2 A first feasible estimator
6.6.3 Nearest-neighbor estimation of optimal instruments
6.6.4 Generalizing the approach: other nonparametric estimators
7 GMM estimators for time series models
7.1 GMM and Euler equation models
7.1.1 Hansen and Singleton framework
7.1.2 GMM estimation
7.2 GMM Estimation of MA models
7.2.1 A simple estimator
7.2.2 A more efficient estimator
7.2.3 Example: The Durbin estimator
7.3 GMM Estimation of ARMA models
7.3.1 The ARMA(1,1) model
7.3.2 IV estimation
7.4 Covariance matrix estimation
7.4.1 Example 1: Conditional homoskedasticity
7.4.2 Example 2: Conditional heteroskedasticity
7.4.3 Example 3: Covariance stationary process
7.4.4 The Newey-West estimator
7.4.5 Weighted autocovariance estimators
7.4.6 Weighted periodogram estimators
8 GMM estimators for dynamic panel data
8.1 Introduction
8.2 The Arellano-Bond estimator
8.2.1 Model assumptions
8.2.2 Implementation of the GMM estimator
8.3 More efficient procedures (Ahn-Schmidt)
8.3.1 Additional assumptions
8.4 The Blundell-Bond estimator
8.5 Dynamic models with Multiplicative effects
8.5.1 Multiplicative individual effects
8.5.2 Mixed structure
8.6 Example: Wage equation
III Discrete choice models
9 Nonlinear panel data models
9.1 Brief review of binary discrete-choice models
9.1.1 Linear Probability model
9.1.2 Logit model
9.1.3 Probit model
9.2 Logit models for panel data
9.2.1 Sufficient statistics
9.2.2 Conditional probabilities
9.2.3 Example: T = 2
9.3 Probit models
9.4 Semiparametric estimation of discrete-choice models
9.4.1 The binary choice model
9.4.2 The IV estimator
9.5 SML estimation of selection models
9.5.1 The GHK simulator
9.5.2 Example
Appendix 1. Maximum-Likelihood estimation of the Random-effect model
Appendix 2. The two-way random effects model
Appendix 3. The one-way unbalanced random effects model
Appendix 4. ML estimation of dynamic panel models
Appendix 5. GMM estimation of static panel models
Appendix 6. A framework for simulation-based inference
Appendix 7. Example: the SAS software
Appendix 8. A crash course in Gauss
Appendix 9. Example: The Gauss software
Appendix 10. IV and GMM estimation with Gauss
Appendix 11. DPD estimation with Gauss
References
Part I
Panel Data Models
Chapter 1
Introduction
Panel data: Sequential observations on a number of units (individuals, firms).
Also called cross-sections over time, longitudinal data or pooled cross-section time-series data.
1.1 Gains in pooling cross section and time series
1.1.1 Discrimination between alternative models
Many economic models take the form
\[ F(Y, X, Z, \theta) = 0, \]
where $Y$: individual control variables (workers, firms); $X$: (public policy or principal's) variables; $Z$: (fixed) individual attributes; $\theta$: parameters.
Linear model:
\[ Y = \beta_0 + \beta_x X + \beta_z Z + u. \]
Alternative views concerning this model:
- Policy variables have a significant impact whatever the individual characteristics, or
- Differences across individuals are due to idiosyncratic individual features, not included in $Z$.
In practice, observed differences across individuals may be due to both inter-individual differences and the impact of policy variables.
1.1.2 Examples
a) $WAGE = \beta_0 + \beta_1 EDUCATION + \beta_2 Z$.
- People with a higher education level have higher wages because firms value those people more;
- People have higher education because they have higher ability (expected productivity) anyway, and firms value worker ability more.
b) $SALES = \beta_0 + \beta_1 ADVERTISEMENT + \beta_2 Z$.
- Advertisement expenditures boost sales;
- More efficient firms enjoy more sales, and thus have more money for advertisement expenditures.
c) $OUTPUT = \beta_0 + \beta_1 REGULATION + \beta_2 Z$.
- Regulatory control affects firm output;
- Firms with higher output are more regulated on average.
d) $WAGE = \beta_0 + \beta_1 \mathbb{1}(UNION) + \beta_2 Z$.
- Belonging to a union significantly raises wages;
- Firms react to higher wages imposed by unions by hiring higher-quality workers, and $\mathbb{1}(UNION)$ is a proxy for worker quality.
1.1.3 Less collinearity between explanatory variables
In consumer or production economics, input, output or consumer prices are difficult to use, because:
- Time-series: Aggregated macro price indexes are highly correlated;
- Cross-sections: Not enough price variation across individuals or firms.
With panel data, variations across individuals and across time periods are accounted for.
- Time-series: no information on the impact of individual characteristics (socioeconomic variables, ...);
- Cross-sections: no information on adjustment dynamics. Estimates may reflect inter-individual differences inherent in comparisons of different people or firms.
1.1.4 May reduce bias due to missing or unobserved variables
With panel data, it is easy to control for unobserved heterogeneity across individuals. This is critical in practice and explains why panel data models are now so popular in micro- and macro-econometrics.
This point is related to endogeneity and omitted-variable issues.
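Example: Output supply function under perfect competition
\[ \max_Q \ \pi = pQ - C(\theta, Q), \quad \text{where } C(\theta, Q) = \theta c(Q), \]
\[ \Leftrightarrow \ p = \theta \frac{\partial c(Q)}{\partial Q} = \theta A Q^{\alpha - 1} \ \text{(Cobb-Douglas)} \quad \text{or} \quad \theta(\beta_0 + \beta_1 Q) \ \text{(Quadratic)}. \]
Cobb-Douglas case: $\log Q = \frac{1}{\alpha - 1}(\log p - \log\theta - \log A)$. From equilibrium condition to estimable equation: observations $(Q_{it}, p_{it})$, unobserved heterogeneity $\theta_i$, firm $i$, period $t$:
\[ \log Q_{it} = \frac{1}{\alpha - 1}(\log p_{it} - \log\theta_i - \log A). \]
Identification issue: the estimable equation is
\[ \tilde Q_{it} = a_0 + a_1 \tilde p_{it} + u_{it}, \quad i = 1, 2, \ldots, N; \ t = 1, 2, \ldots, T, \]
where $\tilde Q_{it} = \log Q_{it}$, $\tilde p_{it} = \log p_{it}$, $a_1 = 1/(\alpha - 1)$, $a_0 = -(\log A + E\log\theta_i)/(\alpha - 1)$, $E u_{it} = 0$.
The model is identified if $E\log\theta_i = 0$ ($\theta_i$ normalized around 1); otherwise the estimate of $A$ is biased if $\theta_i$ is overlooked and $E\log\theta_i \neq 0$.
Empirical issue: possible correlation between output price $p_{it}$ and efficiency term $\theta_i$.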
1.2 Analysis of variance
Consider the model
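\[ y_{it} = \alpha_i + x_{it}\beta_i + \varepsilon_{it}, \quad i = 1, 2, \ldots, N; \ t = 1, 2, \ldots, T_i, \]
where $x_{it}$ is scalar, $\alpha_i$ and $\beta_i$ are parameters, and $T_i$ is the number of time periods available for individual $i$.
Useful empirical moments are
\[ \bar y_i = \frac{1}{T_i}\sum_{t=1}^{T_i} y_{it}, \quad \bar x_i = \frac{1}{T_i}\sum_{t=1}^{T_i} x_{it}, \]
\[ S_{xxi} = \sum_{t=1}^{T_i}(x_{it} - \bar x_i)^2, \quad S_{xyi} = \sum_{t=1}^{T_i}(x_{it} - \bar x_i)(y_{it} - \bar y_i), \]
and
\[ S_{yyi} = \sum_{t=1}^{T_i}(y_{it} - \bar y_i)^2, \quad i = 1, 2, \ldots, N. \]
Least-squares parameter estimates are computed as
\[ \hat\beta_i = S_{xyi}/S_{xxi} \quad \text{and} \quad \hat\alpha_i = \bar y_i - \hat\beta_i \bar x_i, \]
and the Residual Sum of Squares (RSS) for individual $i$ is
\[ RSS_i = S_{yyi} - S_{xyi}^2/S_{xxi}, \quad \text{with } (T_i - 2) \text{ degrees of freedom}. \]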
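Consider now a restricted model with constant slopes and constant intercepts:
\[ y_{it} = \alpha + x_{it}\beta + \varepsilon_{it}, \]
which obtains by imposing the following restrictions:
\[ \alpha_1 = \alpha_2 = \cdots = \alpha_N \ (= \alpha), \quad \beta_1 = \beta_2 = \cdots = \beta_N \ (= \beta). \]
Under these restrictions, least-squares parameter estimates would be
\[ \hat\beta = \frac{\sum_{i=1}^{N}\sum_{t=1}^{T_i}(x_{it} - \bar x)(y_{it} - \bar y)}{\sum_{i=1}^{N}\sum_{t=1}^{T_i}(x_{it} - \bar x)^2} \]
and $\hat\alpha = \bar y - \hat\beta \bar x$, where
\[ \bar y = \frac{1}{\sum_{i=1}^{N} T_i}\sum_{i=1}^{N}\sum_{t=1}^{T_i} y_{it}, \quad \bar x = \frac{1}{\sum_{i=1}^{N} T_i}\sum_{i=1}^{N}\sum_{t=1}^{T_i} x_{it}. \]
The Residual Sum of Squares is
\[ RSS = \sum_{i=1}^{N}\sum_{t=1}^{T_i}(y_{it} - \bar y)^2 - \frac{\left[\sum_{i=1}^{N}\sum_{t=1}^{T_i}(y_{it} - \bar y)(x_{it} - \bar x)\right]^2}{\sum_{i=1}^{N}\sum_{t=1}^{T_i}(x_{it} - \bar x)^2}, \]
with $\sum_{i=1}^{N} T_i - 2$ degrees of freedom.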
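For a majority of applications, the first model is too general and estimation would require a great number of time observations. If unobserved heterogeneity is additive in the model, we might consider the following specification with constant slope and different intercepts:
\[ y_{it} = \alpha_i + x_{it}\beta + \varepsilon_{it}. \]
Minimizing $\sum_i\sum_t (y_{it} - \alpha_i - x_{it}\beta)^2$ with respect to $\alpha_i$ and $\beta$, we have
\[ \sum_t (y_{it} - \alpha_i - x_{it}\beta) = 0 \ \ (i = 1, \ldots, N), \quad \sum_i\sum_t x_{it}(y_{it} - \alpha_i - x_{it}\beta) = 0, \]
so that
\[ \hat\alpha_i = \bar y_i - \hat\beta \bar x_i \quad \text{and} \quad \hat\beta = \frac{\sum_i\sum_t x_{it}(y_{it} - \bar y_i)}{\sum_i\sum_t x_{it}(x_{it} - \bar x_i)}. \]
The Residual Sum of Squares now has $\sum_i T_i - (N + 1)$ degrees of freedom ($N + 1$ parameters are estimated).
This is the most popular model encountered in empirical applications.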
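The three specifications of this section can be compared numerically. The sketch below is illustrative only: it assumes a scalar regressor, NumPy, and a hypothetical helper name `anova_rss` that is not part of the course material. It returns the RSS and degrees of freedom of each model.

```python
import numpy as np

def anova_rss(y, x, ids):
    """RSS and degrees of freedom for the three models of Section 1.2 (scalar x)."""
    y, x, ids = np.asarray(y, float), np.asarray(x, float), np.asarray(ids)
    units, idx, counts = np.unique(ids, return_inverse=True, return_counts=True)
    n_obs, n_units = y.size, units.size

    # Individual means, expanded back to observation level.
    x_bar_i = (np.bincount(idx, weights=x) / counts)[idx]
    y_bar_i = (np.bincount(idx, weights=y) / counts)[idx]
    xd, yd = x - x_bar_i, y - y_bar_i

    # (1) Individual intercepts and slopes: sum over i of Syy_i - Sxy_i^2 / Sxx_i.
    sxx_i = np.bincount(idx, weights=xd * xd)
    sxy_i = np.bincount(idx, weights=xd * yd)
    syy_i = np.bincount(idx, weights=yd * yd)
    rss_general = np.sum(syy_i - sxy_i**2 / sxx_i)

    # (2) Common intercept and common slope (pooled OLS).
    xp, yp = x - x.mean(), y - y.mean()
    rss_pooled = np.sum(yp**2) - np.sum(xp * yp) ** 2 / np.sum(xp**2)

    # (3) Individual intercepts, common slope.
    rss_within = np.sum(yd**2) - np.sum(xd * yd) ** 2 / np.sum(xd**2)

    return {
        "general": (rss_general, n_obs - 2 * n_units),
        "pooled": (rss_pooled, n_obs - 2),
        "within": (rss_within, n_obs - (n_units + 1)),
    }
```

Because the models are nested, the RSS of the general model cannot exceed that of the common-slope model, which in turn cannot exceed the pooled RSS.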
1.3 Some definitions
- Typical panel: the number of units (individuals) $N$ is large and the number of time periods $T$ is small.
- Short (long) panel: the number of periods $T$ is small (large).
- Balanced panel: same number of periods for every unit (individual).
- Rotating panel: a subset of individuals is replaced every period. Rotating panels can be balanced or unbalanced.
- Pseudo panel: pooling cross-sections made of different individuals for every period.
- Attrition: with long panels, the probability that an individual remains in the sample decreases as the number of periods increases (non-response, moving, death, etc.).
Chapter 2
The linear model
2.1 Notation
\[ y_{it} = x_{it}\beta + u_{it}, \quad i = 1, 2, \ldots, N; \ t = 1, 2, \ldots, T, \]
where $x_{it}$ is a $K$ vector, $\beta$ is a $(K \times 1)$ vector of parameters, and $u_{it}$ is the residual term.
$y_{it}$ and the components of $x_{it}$ vary both over time and across individuals.
Component of the dependent variable that is unexplained by $x_{it}$:
\[ u_{it} = \alpha_i + \lambda_t + \varepsilon_{it}, \]
where $\alpha_i$ is the time-invariant individual effect, $\lambda_t$ is the time effect, and $\varepsilon_{it}$ is the i.i.d. component.
One-way error-component model: $u_{it} = \alpha_i + \varepsilon_{it}$.
Two-way error-component model: $u_{it} = \alpha_i + \lambda_t + \varepsilon_{it}$.
Allows several predictions of $y_{it}$ given $x_{it}$:
- $E(y_{it}|x_{it}) = x_{it}\beta$ across $i$ and $t$,
- $E(y_{it}|x_{it}, \alpha_i) = x_{it}\beta + \alpha_i$ for individual $i$, across periods,
- $E(y_{it}|x_{it}, \lambda_t) = x_{it}\beta + \lambda_t$ for period $t$, across individuals,
- $E(y_{it}|x_{it}, \alpha_i, \lambda_t) = x_{it}\beta + \alpha_i + \lambda_t$ for individual $i$ and period $t$.
2.1.1 Model notation
2.1.1.1 Model in matrix form
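\[ Y = X\beta + \alpha + \lambda + \varepsilon, \]
where $Y$, $\alpha$, $\lambda$ and $\varepsilon$ are $(NT \times 1)$ and $X$ is $(NT \times K)$.
Convention: index $t$ runs faster, index $i$ runs slower:
\[ \begin{pmatrix} y_{11} \\ \vdots \\ y_{1T} \\ y_{21} \\ \vdots \\ y_{2T} \\ \vdots \\ y_{N1} \\ \vdots \\ y_{NT} \end{pmatrix} = \begin{bmatrix} X^{(1)}_{11} & \cdots & X^{(K)}_{11} \\ \vdots & & \vdots \\ X^{(1)}_{1T} & \cdots & X^{(K)}_{1T} \\ X^{(1)}_{21} & \cdots & X^{(K)}_{21} \\ \vdots & & \vdots \\ X^{(1)}_{2T} & \cdots & X^{(K)}_{2T} \\ \vdots & & \vdots \\ X^{(1)}_{N1} & \cdots & X^{(K)}_{N1} \\ \vdots & & \vdots \\ X^{(1)}_{NT} & \cdots & X^{(K)}_{NT} \end{bmatrix} \begin{pmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_K \end{pmatrix} + \alpha + \lambda + \varepsilon. \]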
2.1.1.2 Model in vector form
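\[ y_i = X_i\beta + \alpha_i + \lambda + \varepsilon_i, \quad i = 1, 2, \ldots, N, \]
where $y_i$ is $T \times 1$ and $X_i$ is $T \times K$. Note: $\lambda = (\lambda_1, \lambda_2, \ldots, \lambda_T)'$ and $\alpha_i = (\alpha_i, \alpha_i, \ldots, \alpha_i)'$ are $(T \times 1)$.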
2.1.2 Standard matrices and operators
- $I_{NT}$: identity matrix with $NT$ rows and $NT$ columns;
- $e_T$: $T$-vector of ones;
- $B = I_N \otimes (1/T)e_T e_T'$ (Between-individual operator);
- $\bar B = (1/N)e_N e_N' \otimes I_T$ (Between-period operator);
- $Q = I_{NT} - I_N \otimes (1/T)e_T e_T' = I_{NT} - B$ (Within-individual operator);
- $\bar Q = I_{NT} - (1/N)e_N e_N' \otimes I_T = I_{NT} - \bar B$ (Within-period operator);
- $B\bar B = (1/NT)e_{NT}e_{NT}'$ (computes the full population mean).
Important assumption: no intercept term in the model (otherwise, use $B\bar B$ to demean all variables).
The $B$ operators are used to compute, from $NT$ vectors and matrices, individual- or time-specific means of variables, which are stored in matrices of row dimension $NT$.
The $Q$ operators are used to compute deviations from these means.
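2.1.3 Important properties of operators
Symmetry, idempotency and orthogonality:
\[ Q' = Q, \quad B' = B, \quad Q^2 = Q, \quad B^2 = B, \quad BQ = QB = 0. \]
Rank of an idempotent matrix = its trace
\[ \Rightarrow \ \operatorname{rank}(Q) = N(T - 1) \quad \text{and} \quad \operatorname{rank}(B) = N. \]
Decomposition of the $Q$ operator with $N = T = 2$:
\[ Qy = \left( I_4 - I_2 \otimes \frac{1}{2}\begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} \right) y = \begin{pmatrix} y_{11} \\ y_{12} \\ y_{21} \\ y_{22} \end{pmatrix} - \frac{1}{2}\begin{bmatrix} 1 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 1 & 1 \end{bmatrix}\begin{pmatrix} y_{11} \\ y_{12} \\ y_{21} \\ y_{22} \end{pmatrix} = \begin{pmatrix} y_{11} \\ y_{12} \\ y_{21} \\ y_{22} \end{pmatrix} - \frac{1}{2}\begin{pmatrix} y_{11} + y_{12} \\ y_{11} + y_{12} \\ y_{21} + y_{22} \\ y_{21} + y_{22} \end{pmatrix}. \]
We will also use
- $B_T = (1/T)e_T e_T'$: Between operator for a single individual;
- $Q_T = I_T - (1/T)e_T e_T' = I_T - B_T$: Within operator for a single individual.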
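As an illustration, the operators can be built with Kronecker products and their properties checked numerically. This is a minimal sketch assuming NumPy and the ordering convention above (index $t$ runs faster); the function name `panel_operators` is hypothetical.

```python
import numpy as np

def panel_operators(N, T):
    """Between and Within operators for a balanced panel, t running faster than i."""
    e_T = np.ones((T, 1))
    e_N = np.ones((N, 1))
    B = np.kron(np.eye(N), e_T @ e_T.T / T)      # individual means
    B_bar = np.kron(e_N @ e_N.T / N, np.eye(T))  # period means
    Q = np.eye(N * T) - B                        # deviations from individual means
    Q_bar = np.eye(N * T) - B_bar                # deviations from period means
    return B, B_bar, Q, Q_bar

# Small example with N = 2 individuals and T = 3 periods.
B, B_bar, Q, Q_bar = panel_operators(2, 3)
y = np.arange(6.0)     # (y11, y12, y13, y21, y22, y23)
print(B @ y)           # individual means repeated over periods
print(Q @ y)           # deviations from individual means
print(np.trace(Q))     # equals rank(Q) = N(T - 1)
```

Forming the full $NT \times NT$ matrices is only practical for small panels; with real data one would compute group means directly instead.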
2.2 The One-Way Fixed Effects model
Terminology: the term "fixed-effects model" does not mean that the individual effects $\alpha_i$ are non-random in the true model! Rather, estimation is conditional on unobserved heterogeneity: the $\alpha_i$'s are treated as parameters to be estimated.
2.2.1 The estimator in terms of the Frisch-Waugh-Lovell theorem
Inference is conditional on individual effects: estimates obtain by regressing $Y$ on $X$ and on individual dummies.
Let $E$ be the $NT \times N$ matrix of individual dummy variables, whose column $i$ equals 1 for the $T$ observations of individual $i$ and 0 elsewhere:
\[ E = I_N \otimes e_T = \begin{bmatrix} e_T & 0 & \cdots & 0 \\ 0 & e_T & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & e_T \end{bmatrix}, \]
and consider the model
\[ Y = X\beta + E\alpha + \varepsilon = W\gamma + \varepsilon, \]
where $W = [X, E]$ and $\gamma = (\beta', \alpha')'$.
Frisch-Waugh-Lovell theorem: the estimates of $\beta$ are numerically identical in the 2 following procedures:
- OLS on the full model: $\hat\gamma = (\hat\beta', \hat\alpha')' = (W'W)^{-1}W'Y$;
- $\hat\beta = (X^{*\prime}X^*)^{-1}X^{*\prime}Y^*$, where $X^* = [I - E(E'E)^{-1}E']X = P_E X$ and $Y^* = [I - E(E'E)^{-1}E']Y = P_E Y$ (residuals from least-squares regression of $X$ and $Y$ on $E$).
But $E = I_N \otimes e_T$ and $E'E = I_N \otimes e_T'e_T = T I_N$, so that
\[ P_E = I - E(E'E)^{-1}E' = I - \frac{1}{T}EE' = I - \frac{1}{T}(I_N \otimes e_T)(I_N \otimes e_T)' = I - I_N \otimes \frac{1}{T}e_T e_T' = Q. \]
Hence $\hat\beta = (X^{*\prime}X^*)^{-1}(X^{*\prime}Y^*) = (X'P_E'P_E X)^{-1}(X'P_E'P_E Y) = (X'QX)^{-1}(X'QY)$.
Idea behind the fixed-effect estimation procedure: eliminating individual effects $\Leftrightarrow$ taking deviations of all variables from their individual-specific means.
Transformation of the linear model as follows:
\[ y_{it} - \frac{1}{T}\sum_t y_{it} = \left(x_{it} - \frac{1}{T}\sum_t x_{it}\right)\beta + u_{it} - \frac{1}{T}\sum_t u_{it} \]
\[ \Leftrightarrow \ Y - BY = (X - BX)\beta + u - Bu \ \Leftrightarrow \ QY = QX\beta + Qu. \]
Least-squares parameter estimate:
\[ \hat\beta = [(QX)'(QX)]^{-1}(QX)'QY = [X'Q'QX]^{-1}(X'Q'QY) = (X'QX)^{-1}X'QY \]
and $Var(\hat\beta) = \sigma^2_\varepsilon (X'QX)^{-1}$.
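The numerical equivalence stated by the Frisch-Waugh-Lovell theorem can be checked on simulated data. The following is a minimal sketch assuming NumPy and a balanced panel stacked with $t$ running faster; the function names `within_estimator` and `lsdv_estimator` are hypothetical.

```python
import numpy as np

def within_estimator(y, X, N, T):
    """beta = (X'QX)^{-1} X'QY with Q = I_NT - I_N kron (e_T e_T' / T)."""
    Q = np.eye(N * T) - np.kron(np.eye(N), np.ones((T, T)) / T)
    Xq, yq = Q @ X, Q @ y
    return np.linalg.solve(Xq.T @ Xq, Xq.T @ yq)

def lsdv_estimator(y, X, N, T):
    """OLS of Y on W = [X, E], where E = I_N kron e_T holds the individual dummies."""
    E = np.kron(np.eye(N), np.ones((T, 1)))
    W = np.hstack([X, E])
    gamma, *_ = np.linalg.lstsq(W, y, rcond=None)
    K = X.shape[1]
    return gamma[:K], gamma[K:]   # (beta_hat, alpha_hat)

# Simulated balanced panel (all values are illustrative).
rng = np.random.default_rng(1)
N, T, K = 6, 4, 2
X = rng.normal(size=(N * T, K))
alpha = rng.normal(size=N)
y = X @ np.array([1.0, -2.0]) + np.repeat(alpha, T) + rng.normal(size=N * T)

beta_within = within_estimator(y, X, N, T)
beta_lsdv, alpha_lsdv = lsdv_estimator(y, X, N, T)
print(np.allclose(beta_within, beta_lsdv))
```

The Within route avoids inverting a matrix of dimension $K + N$, which matters when $N$ is large.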
2.2.2 Interpretation as a covariance estimator
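The model is, in vector form:
\[ \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix} = \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_N \end{bmatrix}\beta + \begin{bmatrix} e_T \\ 0_T \\ \vdots \\ 0_T \end{bmatrix}\alpha_1 + \begin{bmatrix} 0_T \\ e_T \\ \vdots \\ 0_T \end{bmatrix}\alpha_2 + \cdots + \begin{bmatrix} 0_T \\ 0_T \\ \vdots \\ e_T \end{bmatrix}\alpha_N + \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_N \end{bmatrix}, \]
with assumptions:
\[ E(\varepsilon_i) = 0, \quad E(\varepsilon_i\varepsilon_i') = \sigma^2_\varepsilon I_T, \quad E(\varepsilon_i\varepsilon_j') = 0 \ \ (i \neq j). \]
OLS estimates of $\beta$ and $\alpha_i$ obtain by
\[ \min \sum_{i=1}^{N}\varepsilon_i'\varepsilon_i = \sum_{i=1}^{N}(y_i - e_T\alpha_i - X_i\beta)'(y_i - e_T\alpha_i - X_i\beta) \]
\[ \Leftrightarrow \ \hat\alpha_i = \bar y_i - \bar x_i\hat\beta, \quad i = 1, 2, \ldots, N, \]
and substituting in the partial derivative with respect to $\beta$, we have
\[ \hat\beta = \left[\sum_{i,t}^{N,T}(x_{it} - \bar x_i)'(x_{it} - \bar x_i)\right]^{-1}\left[\sum_{i,t}^{N,T}(x_{it} - \bar x_i)'(y_{it} - \bar y_i)\right], \]
where $x_{it}$ is the $(1 \times K)$ row of $X_i$ for period $t$.
This is called the covariance estimator, or the LSDV (Least-Square Dummy-Variable) estimator. $\hat\beta$ is unbiased, and is consistent when $N$ or $T$ tends to infinity. Its covariance matrix is
\[ Var(\hat\beta) = \sigma^2_\varepsilon\left[\sum_{i=1}^{N}X_i'Q_T X_i\right]^{-1}, \]
where $Q_T = I_T - (1/T)e_T e_T'$.
$\hat\alpha_i$ is unbiased but consistent only when $T \to \infty$.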
2.2.3 Comments
- Model transformation by filtering out individual components $\Rightarrow$ coefficients associated with time-invariant regressors are not identified.
- The fixed-effect procedure uses variation across periods within each unit, hence the name "Within".
- Another possibility is the Between procedure, using variation between individuals:
\[ BY = BX\beta + B\alpha + B\varepsilon, \]
\[ \hat\beta_{Between} = [(BX)'(BX)]^{-1}(BX)'BY = [X'BX]^{-1}X'BY. \]
This alternative estimator uses variation between individual means for model variables.
If $X_1$ is time-varying only, $BX_1 = \{\frac{1}{T}\sum_{t}^{T}x_{1t}\}_{i,t} = \bar x_1 \ \forall i$, so its coefficient cannot be identified separately from the intercept term.
A word of caution in computing variance estimates. In the model $QY = QX\beta + Qu$, statistical software would divide the RSS by $NT - K$ (individual effects not included). But in the model $Y = X\beta + E\alpha + \varepsilon$, the RSS would be divided by $N(T - 1) - K$.
Parameter variance estimates in the Within regression model must therefore be multiplied by $(NT - K)/[N(T - 1) - K]$.
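The degrees-of-freedom correction can be made explicit in a few lines. This is a sketch under the assumptions of a balanced panel and NumPy; `within_cov` is a hypothetical helper name, and "naive" refers to what an OLS routine applied to the demeaned data would report.

```python
import numpy as np

def within_cov(y, X, N, T):
    """Within estimate with naive and degrees-of-freedom-corrected covariance matrices."""
    K = X.shape[1]
    Q = np.eye(N * T) - np.kron(np.eye(N), np.ones((T, T)) / T)
    Xq, yq = Q @ X, Q @ y
    XtX_inv = np.linalg.inv(Xq.T @ Xq)
    beta = XtX_inv @ Xq.T @ yq
    rss = np.sum((yq - Xq @ beta) ** 2)
    # OLS on demeaned data ignores the N estimated individual effects.
    cov_naive = rss / (N * T - K) * XtX_inv
    # The correct divisor is N(T - 1) - K.
    cov_correct = rss / (N * (T - 1) - K) * XtX_inv
    return beta, cov_naive, cov_correct
```

The ratio of the two matrices is exactly $(NT - K)/[N(T - 1) - K]$, which is far from 1 when $T$ is small.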
[Figure: Between and Within regression lines in the (X, Y) plane, for individuals 1, 2 and 3.]
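2.2.4 Testing for poolability and individual effects
Poolability
As before:
\[ y_{it} = \alpha_i + x_{it}\beta_i + \varepsilon_{it} \quad \text{versus} \quad y_{it} = \alpha_i + x_{it}\beta + \varepsilon_{it}, \]
but now $x_{it}$ is a $K$ vector.
$H_0: \beta_1 = \beta_2 = \cdots = \beta_N \ (= \beta)$ ($K(N - 1)$ constraints). The Fisher test statistic is
\[ \frac{(RRSS - URSS)/K(N - 1)}{URSS/N(T - K - 1)} \sim F\left(K(N - 1), N(T - K - 1)\right), \]
where RRSS is from the Within regression and $URSS = \sum_{i=1}^{N}RSS_i$, with $RSS_i$ the residual sum of squares of the regression for individual $i$ ($RSS_i = S_{yyi} - S_{xyi}^2/S_{xxi}$ in the scalar case, see 1.2).
Testing for individual effects
$H_0: \alpha_1 = \cdots = \alpha_N \ (= \alpha)$.
\[ y_{it} = \alpha + x_{it}\beta + \varepsilon_{it} \ \text{(OLS)} \quad \text{versus} \quad y_{it} = \alpha_i + x_{it}\beta + \varepsilon_{it} \ \text{(Within)}. \]
The Fisher test statistic is
\[ \frac{(RRSS - URSS)/(N - 1)}{URSS/(NT - N - K)} \sim F\left(N - 1, NT - N - K\right), \]
where RRSS is from the OLS regression on pooled data and URSS is from the Within (LSDV) regression.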
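The test for individual effects can be sketched as follows. The code assumes a balanced panel and NumPy, and only returns the statistic with its degrees of freedom (a p-value would require an F distribution routine, which is left out here); `f_test_individual_effects` is a hypothetical name.

```python
import numpy as np

def f_test_individual_effects(y, X, N, T):
    """Fisher statistic for H0: alpha_1 = ... = alpha_N, with (df1, df2)."""
    K = X.shape[1]
    # Restricted model: pooled OLS with a common intercept.
    Z = np.column_stack([np.ones(N * T), X])
    res_r = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    rrss = res_r @ res_r
    # Unrestricted model: Within (LSDV) regression.
    Q = np.eye(N * T) - np.kron(np.eye(N), np.ones((T, T)) / T)
    Xq, yq = Q @ X, Q @ y
    res_u = yq - Xq @ np.linalg.lstsq(Xq, yq, rcond=None)[0]
    urss = res_u @ res_u
    df1, df2 = N - 1, N * T - N - K
    F = ((rrss - urss) / df1) / (urss / df2)
    return F, df1, df2
```

A large value of the statistic relative to the $F(N - 1, NT - N - K)$ critical value leads to rejecting the pooled OLS specification.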
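2.3 The Random Effects model
2.3.1 Notation and assumptions
Problem with the Fixed-effect model: degrees of freedom are lost when $N \to \infty$. Different approach: assume individual effects are random, i.e., model inference is drawn marginally (unconditionally upon the $\alpha_i$'s) with respect to the population of all effects.
Assumptions:
\[ \alpha_i \sim IID(0, \sigma^2_\alpha), \quad \varepsilon_{it} \sim IID(0, \sigma^2_\varepsilon), \quad E(\alpha_i\varepsilon_{it}) = E(\alpha_i x_{it}) = 0, \]
with
\[ E(\alpha_i\alpha_j) = \begin{cases} \sigma^2_\alpha & \text{if } i = j, \\ 0 & \text{otherwise}, \end{cases} \qquad E(\varepsilon_{it}\varepsilon_{js}) = \begin{cases} \sigma^2_\varepsilon & \text{if } i = j \text{ and } t = s, \\ 0 & \text{otherwise}. \end{cases} \]
Hence $cov(u_{it}, u_{js}) = \sigma^2_\alpha + \sigma^2_\varepsilon$ if $i = j$ and $t = s$, and $\sigma^2_\alpha$ if $i = j$ and $t \neq s$.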
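Let
\[ \Sigma_T = E(u_i u_i') = \begin{bmatrix} \sigma^2_\alpha + \sigma^2_\varepsilon & \sigma^2_\alpha & \cdots & \sigma^2_\alpha \\ \sigma^2_\alpha & \sigma^2_\alpha + \sigma^2_\varepsilon & \cdots & \sigma^2_\alpha \\ \vdots & & \ddots & \vdots \\ \sigma^2_\alpha & \sigma^2_\alpha & \cdots & \sigma^2_\alpha + \sigma^2_\varepsilon \end{bmatrix}, \]
a $(T \times T)$ matrix, for every individual $i$, $i = 1, 2, \ldots, N$. We have
\[ E(uu') = \Omega = I_N \otimes \Sigma_T = I_N \otimes \left[\sigma^2_\alpha(e_T e_T') + \sigma^2_\varepsilon I_T\right] = I_N \otimes \left[\sigma^2_\alpha(T B_T) + \sigma^2_\varepsilon(Q_T + B_T)\right], \]
since $Q_T = I_T - B_T$ and $B_T = (1/T)e_T e_T'$. Therefore
\[ \Omega = T\sigma^2_\alpha B + \sigma^2_\varepsilon I_{NT}, \]
or equivalently: $\Omega = \sigma^2_\varepsilon Q + (T\sigma^2_\alpha + \sigma^2_\varepsilon)B$.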
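2.3.2 GLS estimation of the Random-effect model
General model form: $Y = X\beta + U$, with $E(UU') = \Omega$.
Generalized Least Squares (GLS) produces efficient estimates of $\beta$, based on the known structure of the variance-covariance matrix $\Omega$ (which depends on $\sigma^2_\alpha$ and $\sigma^2_\varepsilon$):
\[ \hat\beta_{GLS} = \left(X'\Omega^{-1}X\right)^{-1}X'\Omega^{-1}Y \quad \text{and} \quad Var(\hat\beta_{GLS}) = \left(X'\Omega^{-1}X\right)^{-1}. \]
Computation of $\Omega^{-1}$: use the formula
\[ \Omega^r = (\sigma^2_\varepsilon)^r Q + (T\sigma^2_\alpha + \sigma^2_\varepsilon)^r B \]
for an arbitrary scalar $r$, which follows from the properties of $Q$ and $B$ (idempotency and orthogonality).
Hence useful matrices are
\[ \Omega^{-1} = \frac{1}{\sigma^2_\varepsilon}Q + \frac{1}{T\sigma^2_\alpha + \sigma^2_\varepsilon}B \quad \text{and} \quad \Omega^{-1/2} = \frac{1}{\sigma_\varepsilon}Q + \frac{1}{(T\sigma^2_\alpha + \sigma^2_\varepsilon)^{1/2}}B. \]
We have
\[ \hat\beta_{GLS} = \left(X'\Omega^{-1}X\right)^{-1}X'\Omega^{-1}Y = \left[X'(\sigma^2_\varepsilon\Omega^{-1})X\right]^{-1}\left[X'(\sigma^2_\varepsilon\Omega^{-1})Y\right] = \left[X'(Q + \theta^{-1}B)X\right]^{-1}\left[X'(Q + \theta^{-1}B)Y\right], \]
where $\theta = (T\sigma^2_\alpha + \sigma^2_\varepsilon)/\sigma^2_\varepsilon = 1 + T\sigma^2_\alpha/\sigma^2_\varepsilon$.
GLS as Weighted Least Squares. Premultiply the model by $\sigma_\varepsilon\Omega^{-1/2}$ and use OLS on $Y^* = X^*\beta + u^*$, where
\[ Y^* = \sigma_\varepsilon\Omega^{-1/2}Y = \left[Q + \frac{\sigma_\varepsilon}{(\sigma^2_\varepsilon + T\sigma^2_\alpha)^{1/2}}B\right]Y, \quad X^* = \sigma_\varepsilon\Omega^{-1/2}X = \left[Q + \frac{\sigma_\varepsilon}{(\sigma^2_\varepsilon + T\sigma^2_\alpha)^{1/2}}B\right]X, \]
so that $Y^* = (Q + \theta^{-1/2}B)Y$, $X^* = (Q + \theta^{-1/2}B)X$, and in scalar form:
\[ y^*_{it} = (y_{it} - \bar y_i) + \theta^{-1/2}\bar y_i = y_{it} - \left(1 - \frac{1}{\sqrt{\theta}}\right)\bar y_i, \]
\[ x^*_{it} = (x_{it} - \bar x_i) + \theta^{-1/2}\bar x_i = x_{it} - \left(1 - \frac{1}{\sqrt{\theta}}\right)\bar x_i. \]
See Appendix 1 for Maximum Likelihood Estimation of the random-effects model.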
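The equivalence between the matrix form of GLS and OLS on quasi-demeaned data can be verified numerically. This is a minimal sketch assuming NumPy, a balanced panel and known variance components; both function names are hypothetical.

```python
import numpy as np

def re_gls_matrix(y, X, N, T, s2_alpha, s2_eps):
    """GLS with the full covariance matrix Omega = I_N kron Sigma_T."""
    Sigma_T = s2_alpha * np.ones((T, T)) + s2_eps * np.eye(T)
    Omega_inv = np.linalg.inv(np.kron(np.eye(N), Sigma_T))
    return np.linalg.solve(X.T @ Omega_inv @ X, X.T @ Omega_inv @ y)

def re_gls_quasi_demeaned(y, X, N, T, s2_alpha, s2_eps):
    """OLS on y* = y - (1 - 1/sqrt(theta)) * ybar_i, and likewise for X."""
    K = X.shape[1]
    theta = 1.0 + T * s2_alpha / s2_eps
    w = 1.0 - 1.0 / np.sqrt(theta)
    y_bar = np.repeat(y.reshape(N, T).mean(axis=1), T)
    X_bar = np.repeat(X.reshape(N, T, K).mean(axis=1), T, axis=0)
    y_star, X_star = y - w * y_bar, X - w * X_bar
    return np.linalg.lstsq(X_star, y_star, rcond=None)[0]
```

The quasi-demeaning version never forms an $NT \times NT$ matrix, which is why it is the form usually implemented.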
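2.3.3 Comparison between GLS, OLS and Within
\[ \hat\beta_{GLS} = \left[X'QX + \frac{1}{\theta}X'BX\right]^{-1}\left[X'QY + \frac{1}{\theta}X'BY\right], \]
\[ \hat\beta_{Within} = (X'QX)^{-1}X'QY, \quad \hat\beta_{Between} = (X'BX)^{-1}X'BY, \]
so that
\[ \hat\beta_{GLS} = S_1\hat\beta_{Within} + S_2\hat\beta_{Between}, \]
where $S_1 = [X'QX + \frac{1}{\theta}X'BX]^{-1}X'QX$ and $S_2 = [X'QX + \frac{1}{\theta}X'BX]^{-1}\frac{1}{\theta}X'BX$.
(i) If $\sigma^2_\alpha = 0$, then $1/\theta = 1$ and $\hat\beta_{GLS} = \hat\beta_{OLS}$.
(ii) If $T \to \infty$, then $1/\theta \to 0$ and $\hat\beta_{GLS} \to \hat\beta_{Within}$.
(iii) If $1/\theta \to \infty$, then $\hat\beta_{GLS} \to \hat\beta_{Between}$.
(iv) $Var(\hat\beta_{Within}) - Var(\hat\beta_{GLS})$ is a positive semi-definite matrix.
(v) If $1/\theta \to 0$, then $Var(\hat\beta_{Within}) \to Var(\hat\beta_{GLS})$.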
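The matrix-weighted-average representation can be illustrated as follows. This sketch assumes NumPy, a balanced panel without intercept, and a given value of $\theta$; `gls_as_weighted_average` is a hypothetical name.

```python
import numpy as np

def gls_as_weighted_average(y, X, N, T, theta):
    """Return GLS, Within and Between estimates with the weight matrices S1 and S2."""
    B = np.kron(np.eye(N), np.ones((T, T)) / T)
    Q = np.eye(N * T) - B
    XQX, XBX = X.T @ Q @ X, X.T @ B @ X
    b_within = np.linalg.solve(XQX, X.T @ Q @ y)
    b_between = np.linalg.solve(XBX, X.T @ B @ y)
    A = XQX + XBX / theta
    S1 = np.linalg.solve(A, XQX)
    S2 = np.linalg.solve(A, XBX / theta)
    b_gls = np.linalg.solve(A, X.T @ Q @ y + X.T @ B @ y / theta)
    return b_gls, b_within, b_between, S1, S2
```

Note that $S_1 + S_2 = I_K$, and that setting `theta = 1` reproduces pooled OLS, as in case (i).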
2.3.4 Fixed individual effects or error components?
Crucial issue in panel data econometrics: how should we treat the effects $\alpha_i$? As parameters or as random variables?
- If inference is restricted to the specific units (individuals) in the sample: conditional inference, use Fixed effects. Example: individuals are not selected at random, or all firms in a given industry are selected.
- If inference is on the whole population: marginal (unconditional) inference, use Random effects. Example: individuals are selected randomly from a huge population (consumers).
2.3.4.1 Some practical choice criteria
- Interpretation of effects in the (economic) model;
- Sampling process: purely random or not;
- Number of units (countries, regions, households, ...);
- Interchangeability of units;
- Endogeneity of $X_{it}$ (see later).
2.3.4.2 Terminology
When fixed individual effects are considered, one speaks of the Fixed-Effects or Within estimation procedure; when individual effects are random, of the GLS (Generalized Least Squares) estimation procedure.
2.3.5 Example: Wage equation, Hausman (1978)
629 high-school graduates, Michigan income dynamics study. 3774 observations (N = 629, T = 6).
Dependent variable: log wage.
Table 2.1: Within and GLS estimation results
Variable                   Within      GLS
Constant                               0.8499
Age in [20,35]             0.0557      0.0393
Age in [35,45]             0.0351      0.0092
Age in [45,55]             0.0209     -0.0007
Age in [55,65]             0.0209     -0.0097
Age 65 and over           -0.0171     -0.0423
Unemployed prev. year     -0.0042     -0.0277
Poor health prev. year    -0.0204     -0.0250
Self-employed             -0.2190     -0.2670
South                     -0.1569     -0.0324
Rural                     -0.0101     -0.1215
The GLS estimator is a weighted average of the Within and Between estimators, where each weight is the inverse of the corresponding variance.
The Within estimator neglects the variation between individuals, the Between estimator neglects the variation within individuals, and OLS gives equal weight to both Within and Between variations.
Note. If the model contains an intercept:
\[ y_{it} = \mu + x_{it}\beta + \alpha_i + \varepsilon_{it}, \]
we use $B - B\bar B$ instead of $B$ (to eliminate $\mu$) in the formulae.
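2.3.6 Best Quadratic Unbiased Estimators (BQU) of variances
If errors are normal, BQU estimates of $\sigma^2_\alpha$ and $\sigma^2_\varepsilon$ are found from
\[ \hat\sigma^2_\varepsilon = u'Qu/\operatorname{tr}(Q) = \frac{\sum_{i=1}^{N}\sum_{t=1}^{T}(u_{it} - \bar u_i)^2}{N(T - 1)} \]
and
\[ \widehat{\sigma^2_\varepsilon + T\sigma^2_\alpha} = u'Bu/\operatorname{tr}(B) = T\sum_{i=1}^{N}\bar u_i^2/N, \]
because $\operatorname{tr}(Q) = N(T - 1)$ and $\operatorname{tr}(B) = N$.
But in practice, the $u_{it}$'s are unknown and we must estimate the variances from residuals $\hat u_{it}$ instead.
1/ Wallace and Hussain (1969): use OLS residuals in place of the true $u$'s;
2/ Amemiya (1971): use LSDV residuals. We have
\[ \begin{pmatrix} \sqrt{NT}(\hat\sigma^2_\varepsilon - \sigma^2_\varepsilon) \\ \sqrt{N}(\hat\sigma^2_\alpha - \sigma^2_\alpha) \end{pmatrix} \sim N\left(0, \begin{bmatrix} 2\sigma^4_\varepsilon & 0 \\ 0 & 2\sigma^4_\alpha \end{bmatrix}\right), \]
where $\hat\sigma^2_\alpha = \left(\widehat{\sigma^2_\varepsilon + T\sigma^2_\alpha} - \hat\sigma^2_\varepsilon\right)/T$.
3/ Swamy and Arora (1972): use mean square errors of the Within and the Between regressions.
Mean square error from the Within regression:
\[ \hat\sigma^2_\varepsilon = \left[Y'QY - Y'QX(X'QX)^{-1}X'QY\right]/[N(T - 1) - K] \]
and from the Between regression:
\[ \widehat{\sigma^2_\varepsilon + T\sigma^2_\alpha} = \left[Y'BY - Y'BX(X'BX)^{-1}X'BY\right]/[N - K - 1]. \]
Note: an intercept term is included among the Between regressors ($X$), not in the Within regression.
4/ Nerlove (1971): compute $\hat\sigma^2_\alpha = \frac{1}{N - 1}\sum_{i=1}^{N}(\hat\alpha_i - \bar{\hat\alpha})^2$, where the $\hat\alpha_i$ are parameter estimates associated with individual dummies from the LSDV regression, and $\sigma^2_\varepsilon$ is estimated from the Within regression.
Feasible GLS: the estimation methods above, with variance components replaced by consistent estimates.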
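As an illustration of method 3/, the sketch below computes Swamy-Arora-type variance component estimates from the Within and Between mean square errors. It assumes NumPy and a balanced panel, with `X` holding the $K$ non-constant regressors (the intercept is added in the Between regression only); `swamy_arora` is a hypothetical helper name.

```python
import numpy as np

def swamy_arora(y, X, N, T):
    """Return (s2_eps, s2_alpha) from Within and Between mean square errors."""
    K = X.shape[1]
    y_bar = y.reshape(N, T).mean(axis=1)        # N individual means
    X_bar = X.reshape(N, T, K).mean(axis=1)     # N x K individual means

    # Within regression: deviations from individual means, no intercept.
    yw = y - np.repeat(y_bar, T)
    Xw = X - np.repeat(X_bar, T, axis=0)
    bw = np.linalg.lstsq(Xw, yw, rcond=None)[0]
    s2_eps = np.sum((yw - Xw @ bw) ** 2) / (N * (T - 1) - K)

    # Between regression on individual means, with an intercept.
    Zb = np.column_stack([np.ones(N), X_bar])
    bb = np.linalg.lstsq(Zb, y_bar, rcond=None)[0]
    # Each mean appears T times in BY, hence the factor T in the RSS.
    s2_1 = T * np.sum((y_bar - Zb @ bb) ** 2) / (N - K - 1)

    s2_alpha = (s2_1 - s2_eps) / T
    return s2_eps, s2_alpha
```

In small samples `s2_alpha` can turn out negative; how to handle that case (for example, truncating at zero) is a separate choice not covered by this sketch.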
Chapter 3
Extensions
3.1 The Two-way panel data model
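Error component structure of the form:
\[ u_{it} = \alpha_i + \lambda_t + \varepsilon_{it}, \quad i = 1, 2, \ldots, N; \ t = 1, 2, \ldots, T, \]
or in matrix form
\[ U = (I_N \otimes e_T)\alpha + (e_N \otimes I_T)\lambda + \varepsilon, \]
where $\alpha = (\alpha_1, \ldots, \alpha_N)'$ and $\lambda = (\lambda_1, \ldots, \lambda_T)'$.
3.1.1 The Two-way fixed-effect model
$\alpha_i$ and $\lambda_t$ are treated as fixed parameters: conditional inference on the $N$ individuals over the periods 1 to $T$.
3.1.1.1 Notation
Fixed-effect estimates of $\beta$ obtain by using the new operator:
\[ Q = I_N \otimes I_T - I_N \otimes (e_T e_T'/T) - (e_N e_N'/N) \otimes I_T, \]
so that $Qu = \{u_{it} - \bar u_{i\cdot} - \bar u_{\cdot t}\}_{it}$. Averaging over individuals, we have
\[ \bar y_{\cdot t} = \bar x_{\cdot t}\beta + \lambda_t + \bar\varepsilon_{\cdot t} \quad \text{with restriction} \quad \sum_{i=1}^{N}\alpha_i = 0, \]
and averaging over time periods:
\[ \bar y_{i\cdot} = \bar x_{i\cdot}\beta + \alpha_i + \bar\varepsilon_{i\cdot} \quad \text{with restriction} \quad \sum_{t=1}^{T}\lambda_t = 0. \]
OLS on the model in deviations yields
\[ \hat\beta = (X'QX)^{-1}X'QY, \quad \hat\alpha_i = \bar y_{i\cdot} - \bar x_{i\cdot}\hat\beta, \quad \hat\lambda_t = \bar y_{\cdot t} - \bar x_{\cdot t}\hat\beta. \]
If the model contains an intercept, the operator $Q$ becomes
\[ Q = I_N \otimes I_T - I_N \otimes (e_T e_T'/T) - (e_N e_N'/N) \otimes I_T + (e_N e_N'/N) \otimes (e_T e_T'/T), \]
so that $Qu = \{u_{it} - \bar u_{i\cdot} - \bar u_{\cdot t} + \bar u_{\cdot\cdot}\}_{it}$, and the Within estimates are
\[ \hat\beta = (X'QX)^{-1}X'QY, \quad \hat\alpha_i = (\bar y_{i\cdot} - \bar y_{\cdot\cdot}) - (\bar x_{i\cdot} - \bar x_{\cdot\cdot})\hat\beta, \quad \hat\lambda_t = (\bar y_{\cdot t} - \bar y_{\cdot\cdot}) - (\bar x_{\cdot t} - \bar x_{\cdot\cdot})\hat\beta. \]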
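The two-way Within operator with intercept can be sketched with Kronecker products, and one can check that it wipes out any combination of an intercept, individual effects and time effects. The code assumes NumPy and a balanced panel; `two_way_within_operator` is a hypothetical name.

```python
import numpy as np

def two_way_within_operator(N, T):
    """Q = I - B - B_bar + B B_bar, i.e. u_it - u_i. - u_.t + u_.. in scalar form."""
    B = np.kron(np.eye(N), np.ones((T, T)) / T)      # individual means
    B_bar = np.kron(np.ones((N, N)) / N, np.eye(T))  # period means
    return np.eye(N * T) - B - B_bar + B @ B_bar

# Illustrative check: additive effects are eliminated by the transformation.
N, T = 4, 3
rng = np.random.default_rng(8)
alpha, lam = rng.normal(size=N), rng.normal(size=T)
effects = 2.0 + np.repeat(alpha, T) + np.tile(lam, N)
Q2 = two_way_within_operator(N, T)
print(np.allclose(Q2 @ effects, 0.0))
```

With noise-free data $y = X\beta + \text{effects}$, applying `Q2` to both sides and running OLS recovers $\beta$ exactly.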
3.1.1.2 Testing for effects
1/ $H_0: \alpha_1 = \cdots = \alpha_N = \lambda_1 = \cdots = \lambda_T = 0$.
Fisher test statistic:
\[ \frac{(RRSS - URSS)/(N + T - 2)}{URSS/[(N - 1)(T - 1) - K]} \sim F(k_1, k_2), \]
where $k_1 = N + T - 2$, $k_2 = (N - 1)(T - 1) - K$, and
- URSS (Unrestricted RSS): from the Within model,
- RRSS (Restricted RSS): from pooled OLS.
2/ $H_0: \alpha_1 = \cdots = \alpha_N = 0$ given $\lambda_t \neq 0$, $t \leq T - 1$.
Fisher test statistic:
\[ \frac{(RRSS - URSS)/(N - 1)}{URSS/[(N - 1)(T - 1) - K]} \sim F(k_1, k_2), \]
where $k_1 = N - 1$, $k_2 = (N - 1)(T - 1) - K$, and
- URSS: from the Within model,
- RRSS: from the regression with time dummies only: $(y_{it} - \bar y_{\cdot t}) = (x_{it} - \bar x_{\cdot t})\beta + (u_{it} - \bar u_{\cdot t})$.
3/ $H_0: \lambda_1 = \cdots = \lambda_{T-1} = 0$ given $\alpha_i \neq 0$, $i \leq N - 1$.
Fisher test statistic:
\[ \frac{(RRSS - URSS)/(T - 1)}{URSS/[(N - 1)(T - 1) - K]} \sim F(k_1, k_2), \]
where $k_1 = T - 1$, $k_2 = (N - 1)(T - 1) - K$, and
- URSS: from the Within model,
- RRSS: from the Within regression as in the one-way model: $(y_{it} - \bar y_{i\cdot}) = (x_{it} - \bar x_{i\cdot})\beta + (u_{it} - \bar u_{i\cdot})$.
See Appendix 2 for the two-way random effects model.
3.1.2 Example: Production function (Hoch 1962)
Sample: 63 Minnesota farms over the period 1946-1951.
Estimation of a Cobb-Douglas production function:
\[ \log Output_{it} = \beta_0 + \beta_1\log Labor_{it} + \beta_2\log Real\ estate_{it} + \beta_3\log Machinery_{it} + \beta_4\log Fertilizer_{it}. \]
Motivation for adding specific effects (into $u_{it}$):
- Climatic conditions, identical across farms ($\lambda_t$);
- Farm-specific factors (soil, managerial quality) ($\alpha_i$).
Table 3.1: Least-squares estimates of Cobb-Douglas production function
                                      Assumption
                            (I)                  (II)          (III)
Estimate              α_i = λ_t = 0           α_i = 0         λ_t = 0
β_1 (Labor)                0.256                0.166           0.043
β_2 (Real estate)          0.135                0.230           0.199
β_3 (Machinery)            0.163                0.261           0.194
β_4 (Fertilizer)           0.349                0.311           0.289
Sum of β's                 0.904                0.967           0.726
R²                         0.721                0.813           0.884
3.2 More on non-spherical disturbances
Panel data: in the random-effect context, heteroskedasticity arises from the panel data structure itself, but the variances $\sigma^2_\alpha$ and $\sigma^2_\varepsilon$ are assumed constant.
Heteroskedasticity and serial correlation:
- $Var(\alpha_i) = \sigma^2_{\alpha i}$: individual-specific heteroskedasticity;
- $Var(\varepsilon_{it}) = \sigma^2_i$: "typical" heteroskedasticity;
- $E(\varepsilon_{it}\varepsilon_{is}) \neq 0$, $t \neq s$: serial correlation.
We present here the first two cases only.
3.2.1 Heteroskedasticity in individual eect
Mazodier and Trognon (1978):
V ar(i) = 2i
"it v IID(0; 2"); i = 1; 2; : : : ; N;
or E(0) = diag[2i] = and " v IID(0;
2").
= E(UU 0) = diag[2i] (eTe0T ) + diag[2" ] IT ;
where diag[2"] is N N . We have
= diag[T2i + 2" ]
eTe
0T
T
+ diag[2" ]
IT
eTe0T
T
r = diag[(T2
i+2
")r]
eTe
0T
T
+diag[(2
")r]
IT
eTe0T
T
:
Transformation of the heteroskedastic model:
multiply both sides by "
1=2
= diag
"
(T2i+ 2
")1=2
eTe
0T
T
+ IN
IT
eTe0T
T
:
44 CHAPTER 3. EXTENSIONS
Transformed variables in scalar form:
yit= yit
"1
"p
T2i+ 2"
!#yi:
Same form as in the homoskedastic case, only here is individual-
specic:
i = (T2i +
2")=
2" and y
it = yit
1 1p
i
yi:
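Feasible GLS:
Step 1. Estimate $\sigma^2_\varepsilon$ consistently from the usual Within regression;
Step 2. Noting that $Var(u_{it}) = w^2_i = \sigma^2_{\alpha i} + \sigma^2_\varepsilon$, estimate $w^2_i$ by $\hat{w}^2_i = \frac{1}{T-1}\sum_{t=1}^T (\hat{u}_{it} - \bar{\hat{u}}_i)^2$, where $\hat{u}_{it}$ is the OLS residual;
Step 3. Compute $\hat\sigma^2_{\alpha i} = \hat{w}^2_i - \hat\sigma^2_\varepsilon$;
Step 4. Form $T\hat\sigma^2_{\alpha i} + \hat\sigma^2_\varepsilon$ and $\hat\theta_i$, and compute $y^*_{it}$, $x^*_{it}$;
Step 5. Regress $y^*_{it}$ on $x^*_{it}$ to get $\hat\beta$.
Important: consistency of the variance component estimates $\hat{w}^2_i$, $i = 1, 2, \dots, N$, requires $T \gg N$.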
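The scalar transformation used in Step 4 can be sketched as follows. This is a minimal illustration for a balanced panel, assuming the variance components are already available; the function name `gls_transform` and its argument names are hypothetical, not taken from the course material.

```python
import numpy as np

def gls_transform(y, sigma2_alpha_i, sigma2_eps):
    """Apply y*_it = y_it - (1 - 1/sqrt(theta_i)) * ybar_i.

    y              : (N, T) array, balanced panel
    sigma2_alpha_i : (N,) array of individual-specific variances of alpha_i
    sigma2_eps     : scalar variance of epsilon_it
    """
    y = np.asarray(y, dtype=float)
    T = y.shape[1]
    # theta_i = (T * sigma2_alpha_i + sigma2_eps) / sigma2_eps
    theta = (T * np.asarray(sigma2_alpha_i, dtype=float) + sigma2_eps) / sigma2_eps
    ybar = y.mean(axis=1, keepdims=True)          # individual means
    weight = 1.0 - 1.0 / np.sqrt(theta)           # one weight per individual
    return y - weight[:, None] * ybar

# Two individuals, three periods: no individual effect for unit 1 (theta = 1),
# theta = 4 for unit 2.
y_star = gls_transform([[1, 2, 3], [4, 6, 8]], [0.0, 1.0], 1.0)
```

With $\sigma^2_{\alpha i} = 0$ the weight is zero and the data are left untouched (pooled OLS); as $\theta_i$ grows the transformation approaches the Within deviation from individual means.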
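3.2.2 "Typical" heteroskedasticity
Assumptions: $\alpha_i \sim IID(0, \sigma^2_\alpha)$ and $Var(\varepsilon_{it}) = \sigma^2_i$.
$$\Omega = E(UU') = diag[\sigma^2_\alpha] \otimes (e_T e_T') + diag[\sigma^2_i] \otimes I_T$$
$$= diag[T\sigma^2_\alpha + \sigma^2_i] \otimes (e_T e_T'/T) + diag[\sigma^2_i] \otimes (I_T - e_T e_T'/T).$$
The transformed model uses
$$\Omega^{-1/2} = diag\left[\frac{1}{\sqrt{T\sigma^2_\alpha + \sigma^2_i}}\right] \otimes (e_T e_T'/T) + diag[1/\sigma_i] \otimes (I_T - e_T e_T'/T),$$
so that $Y^* = \Omega^{-1/2}Y$ has typical element
$$y^*_{it} = \frac{y_{it} - \bar{y}_i}{\sigma_i} + \frac{\bar{y}_i}{\sqrt{T\sigma^2_\alpha + \sigma^2_i}} = \frac{y_{it} - \theta_i \bar{y}_i}{\sigma_i}, \quad \text{where } \theta_i = 1 - \frac{\sigma_i}{\sqrt{T\sigma^2_\alpha + \sigma^2_i}}.$$
$E(u^2_{it}) = w^2_i = \sigma^2_\alpha + \sigma^2_i \ \forall i$, hence OLS residuals $\hat{u}_{it}$ can be used to estimate $w^2_i$:
$$\hat{w}^2_i = \frac{1}{T-1}\sum_{t=1}^T (\hat{u}_{it} - \bar{\hat{u}}_i)^2.$$
Within residuals $\tilde{u}_{it}$ are then used to compute
$$\hat\sigma^2_i = \frac{1}{T-1}\sum_{t=1}^T (\tilde{u}_{it} - \bar{\tilde{u}}_i)^2.$$
A consistent estimate of $\sigma^2_\alpha$ is
$$\hat\sigma^2_\alpha = \frac{1}{N}\sum_{i=1}^N (\hat{w}^2_i - \hat\sigma^2_i).$$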
3.3 Unbalanced panel data models
3.3.1 Introduction
Definition: the number of time periods differs from one unit (individual) to another. For individual $i$, we have $T_i$ periods, and the total number of observations is now $\sum_{i=1}^N T_i$ (instead of $NT$ previously).
Examples
- Firms: may close down, or new entrants may join an industry;
- Consumers: may move, die or refuse to answer anymore;
- Workers: may become unemployed, ...
Problem of attrition: the probability of a unit staying in the sample decreases as the number of periods increases.
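3.3.2 Fixed effect models for unbalanced panels
3.3.2.1 The one-way unbalanced fixed-effect model
Consider the unbalanced model with $T_1 = 3$ and $T_2 = 2$:
$$\begin{pmatrix} y_{11} \\ y_{12} \\ y_{13} \\ y_{21} \\ y_{22} \end{pmatrix} = \begin{pmatrix} x_{11} \\ x_{12} \\ x_{13} \\ x_{21} \\ x_{22} \end{pmatrix}\beta + \begin{pmatrix} \alpha_1 \\ \alpha_1 \\ \alpha_1 \\ \alpha_2 \\ \alpha_2 \end{pmatrix} + \begin{pmatrix} \varepsilon_{11} \\ \varepsilon_{12} \\ \varepsilon_{13} \\ \varepsilon_{21} \\ \varepsilon_{22} \end{pmatrix}.$$
To eliminate $\alpha$, we need a new Within operator
$$Q = \begin{pmatrix} I_3 - e_3 e_3'/3 & 0 \\ 0 & I_2 - e_2 e_2'/2 \end{pmatrix} = \begin{bmatrix} 2/3 & -1/3 & -1/3 & 0 & 0 \\ -1/3 & 2/3 & -1/3 & 0 & 0 \\ -1/3 & -1/3 & 2/3 & 0 & 0 \\ 0 & 0 & 0 & 1/2 & -1/2 \\ 0 & 0 & 0 & -1/2 & 1/2 \end{bmatrix},$$
and the same procedure as in the balanced case is applied:
$$\hat\beta_{Within} = (X'QX)^{-1}X'QY,$$
where $Q = diag(I_{T_i} - e_{T_i}e_{T_i}'/T_i)\,|_{i=1,2,\dots,N}$.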
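The block-diagonal Within operator above is easy to build numerically. The sketch below is illustrative only; the helper name `within_operator` is hypothetical, and observations are assumed to be sorted by individual, then by period.

```python
import numpy as np

def within_operator(T_list):
    """Block-diagonal Q = diag(I_Ti - e_Ti e_Ti' / T_i) for an unbalanced panel."""
    n = sum(T_list)
    Q = np.zeros((n, n))
    start = 0
    for T_i in T_list:
        # Deviation-from-individual-mean block for a unit observed T_i times
        Q[start:start + T_i, start:start + T_i] = np.eye(T_i) - np.ones((T_i, T_i)) / T_i
        start += T_i
    return Q

# Example from the text: T_1 = 3, T_2 = 2
Q = within_operator([3, 2])
alpha = np.array([1.0, 1.0, 1.0, 2.0, 2.0])   # stacked individual effects
filtered = Q @ alpha                           # individual effects are wiped out
```

In practice one would rarely form $Q$ explicitly for large $n$; subtracting group means from each variable gives the same transformed data.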
3.3.2.2 The two-way unbalanced fixed-effect model
The model is
$$y_{it} = x_{it}\beta + \alpha_i + \lambda_t + \varepsilon_{it}, \quad i = 1, 2, \dots, N_t; \ t = 1, 2, \dots, T,$$
where $N_t$ is the number of units observed in period $t$, and the total number of observations is $n = \sum_{t=1}^T N_t$.
Extending the Within approach is a bit more complex here.
Important: We now assume that observations are ordered differently: $i$ runs fast and $t$ runs slowly.
Consider an $N \times N$ identity matrix at time $t$, from which we delete the rows corresponding to individuals missing at $t$.
Example: $N = 3$, $N_1 = 3$, $N_2 = 2$, $N_3 = 2$, and observations are $(y_{11}, y_{21}, y_{31})$, $(y_{12}, y_{32})$, $(y_{13}, y_{23})$.
$$I_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \ \Rightarrow \ D_1 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad D_2 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad D_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}.$$
We have 3 $(N_t \times N)$ matrices $D_t$, $t = 1, 2, 3$, constructed from $I_3$ above.
Now define a new matrix $\Delta = (\Delta_1, \Delta_2)$, where $\Delta_1 = (D_1', \dots, D_T')'$ is an $(n \times N)$ matrix and $\Delta_2 = diag(D_t e_N)$ is an $(n \times T)$ matrix:
$$\Delta = \begin{bmatrix} D_1 & D_1 e_N & 0 & \cdots & 0 \\ D_2 & 0 & D_2 e_N & \cdots & 0 \\ \vdots & \vdots & & \ddots & \vdots \\ D_T & 0 & 0 & \cdots & D_T e_N \end{bmatrix}.$$
Each $D_t e_N$ is a vector of $N_t$ ones, so the $D_t e_N$'s record the number of units present in each period $t$ (the $N_t$'s).
Matrix $\Delta$ is $n \times (N + T)$, and corresponds to the matrix of all dummies (units and periods) present in the sample. Part $\Delta_1$ in $\Delta$ is the equivalent of matrix $E$ (containing individual dummies) before.
Note that $\Delta_1'\Delta_1 = diag(T_i)$ (number of periods in the sample for unit $i$), and $\Delta_2'\Delta_2 = diag(N_t)$ (number of individuals for period $t$).
Also, $\Delta_2'\Delta_1$ is a $T \times N$ matrix of dummy variables for the presence in the sample of unit $i$ at time $t$.
The fixed-effect estimator could be implemented by considering the model
$$y_{it} = x_{it}\beta + D_{it}(\alpha', \lambda')' + \varepsilon_{it}, \quad i = 1, 2, \dots, N_t; \ t = 1, 2, \dots, T,$$
where $D_{it}$ is the row of matrix $\Delta$ for observation $(i, t)$, and $(\alpha', \lambda')'$ contains all the $\alpha_i$'s and $\lambda_t$'s.
In the balanced panel case, we would have $\Delta_1 = (e_T \otimes I_N)$ and $\Delta_2 = (I_T \otimes e_N)$, and $\Delta$ would be $NT \times (N + T)$.
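In the example above, $n = 3 + 2 + 2 = 7$ and $N = 3$:
$$\Delta = \begin{bmatrix} 1 & 0 & 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 & 0 & 1 \end{bmatrix},$$
the vector of effects would be $(\alpha_1, \alpha_2, \alpha_3, \lambda_1, \lambda_2, \lambda_3)'$, and
$$\Delta'Y = \begin{bmatrix} 1 & 0 & 0 & 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 & 1 & 0 & 0 \\ 1 & 1 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 1 \end{bmatrix} \begin{pmatrix} y_{11} \\ y_{21} \\ y_{31} \\ y_{12} \\ y_{32} \\ y_{13} \\ y_{23} \end{pmatrix} = \begin{pmatrix} y_{11} + y_{12} + y_{13} \\ y_{21} + y_{23} \\ y_{31} + y_{32} \\ y_{11} + y_{21} + y_{31} \\ y_{12} + y_{32} \\ y_{13} + y_{23} \end{pmatrix}$$
would compute the sums of variables over periods and individuals.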
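Easier method if $N$ and $T$ are large: use deviations from individual and time means, as in the balanced two-way Within case.
Let
$$\Delta_N = \Delta_1'\Delta_1 \ (N \times N); \quad \Delta_T = \Delta_2'\Delta_2 \ (T \times T); \quad \Delta_{NT} = \Delta_2'\Delta_1 \ (T \times N);$$
$$\bar\Delta = \Delta_2 - \Delta_1\Delta_N^{-1}\Delta_{NT}' \ (n \times T); \quad P = \Delta_T - \Delta_{NT}\Delta_N^{-1}\Delta_{NT}' = \bar\Delta'\Delta_2 \ (T \times T).$$
Wansbeek and Kapteyn (1989): The required Within operator for such an unbalanced two-way panel is
$$Q = \left(I_n - \Delta_1\Delta_N^{-1}\Delta_1'\right) - \bar\Delta P^{-}\bar\Delta',$$
where $P^{-}$ is a generalized inverse of $P$.
The transformed variable $QY$, say, is also written as
$$QY = Y - \Delta_1\Delta_N^{-1}\Delta_1'Y - \bar\Delta P^{-}\bar\Delta'Y = Y - \Delta_1\Delta_N^{-1}\phi_1 - \bar\Delta\bar\phi,$$
where $\phi_1 = \Delta_1'Y$ and $\bar\phi = P^{-}\bar\Delta'Y$.
$\phi_1$ computes the individual sums $\sum_{t=1}^{T_i} y_{ti}$.
Typical transformed element:
$$(QY)_{ti} = y_{ti} - \frac{\phi_{1i}}{T_i} + \frac{a_i'\bar\phi}{T_i} - \bar\phi_t,$$
where $a_i$ is the $i$-th column of $\Delta_{NT}$.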
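Example
Let $Y = (y_{11}, y_{21}, y_{31}, y_{12}, y_{32}, y_{13}, y_{23})' = (1, 2, 3, 2, 6, 3, 4)'$, $n = 7$, $N = 3$, $T = 3$.
We have
$$\Delta_N = \Delta_T = \begin{bmatrix} 3 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{bmatrix}, \quad \Delta_{NT} = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{bmatrix}, \quad P = \begin{bmatrix} 1.6666 & -0.8333 & -0.8333 \\ -0.8333 & 1.1666 & -0.3333 \\ -0.8333 & -0.3333 & 1.1666 \end{bmatrix},$$
$$QY = \begin{pmatrix} 0.4582 \\ 0.1875 \\ -0.5000 \\ -0.5418 \\ 0.5000 \\ 0.0832 \\ -0.1875 \end{pmatrix}, \quad \phi_1 = \begin{pmatrix} 6 \\ 6 \\ 9 \end{pmatrix}, \quad \bar\phi = \begin{pmatrix} -0.3383 \\ 1.6618 \\ 2.0368 \end{pmatrix}.$$
For example,
$$Qy_{11} = 1 - \frac{6}{3} + \frac{1}{3}(1 \ 1 \ 1)\begin{pmatrix} -0.3383 \\ 1.6618 \\ 2.0368 \end{pmatrix} + 0.3383 = 0.4582,$$
$$Qy_{31} = 3 - \frac{9}{2} + \frac{1}{2}(1 \ 1 \ 0)\begin{pmatrix} -0.3383 \\ 1.6618 \\ 2.0368 \end{pmatrix} + 0.3383 = -0.5.$$
See Appendix 3 for the unbalanced random-effects model.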
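The Wansbeek-Kapteyn transformation can be checked numerically on the seven-observation example. The sketch below is an illustration, not the course's own code: it assumes the Moore-Penrose pseudo-inverse (`numpy.linalg.pinv`) as the generalized inverse of $P$, and the index arrays `unit` and `period` are hypothetical names. With an exact pseudo-inverse the first transformed element comes out as 0.4 rather than the 0.4582 printed in the example, which suggests the printed figures were obtained from a rounded $P$; the transformed vector computed here is orthogonal to every unit and period dummy, as a Within residual should be.

```python
import numpy as np

# Observation order: i runs fast, t runs slowly
unit = np.array([0, 1, 2, 0, 2, 0, 1])       # individual index of each observation
period = np.array([0, 0, 0, 1, 1, 2, 2])     # period index of each observation
Y = np.array([1.0, 2.0, 3.0, 2.0, 6.0, 3.0, 4.0])
n, N, T = len(Y), 3, 3

D1 = np.zeros((n, N)); D1[np.arange(n), unit] = 1.0     # Delta_1: unit dummies
D2 = np.zeros((n, T)); D2[np.arange(n), period] = 1.0   # Delta_2: period dummies

DN = D1.T @ D1                     # diag(T_i)
DT = D2.T @ D2                     # diag(N_t)
DNT = D2.T @ D1                    # T x N presence matrix
DN_inv = np.linalg.inv(DN)

Dbar = D2 - D1 @ DN_inv @ DNT.T    # n x T
P = DT - DNT @ DN_inv @ DNT.T      # T x T, singular (rows sum to zero)

# Within operator with a Moore-Penrose generalized inverse of P
Q = np.eye(n) - D1 @ DN_inv @ D1.T - Dbar @ np.linalg.pinv(P) @ Dbar.T
QY = Q @ Y
```

Because $QY$ does not depend on which generalized inverse of $P$ is chosen, any valid choice should reproduce the same transformed data.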
Chapter 4
Augmented panel data models
What are augmented panel models? What are the implications for estimation?
Special estimation techniques when GLS is not feasible.
4.1 Introduction
Consider the model
$$y_{it} = x_{it}\beta + z_i\gamma + \alpha_i + \varepsilon_{it}, \quad i = 1, 2, \dots, N; \ t = 1, 2, \dots, T,$$
with $x_{it}$ a $1 \times K$ vector of time- and individual-varying regressors, and $z_i$ a $1 \times G$ vector of individual-specific (time-invariant) regressors.
Example:
$$\log WAGE_{it} = \beta_1 HOURS_{it} + \gamma_1 EDUC_i + \gamma_2 SEX_i + \alpha_i + \varepsilon_{it}.$$
Estimation method:
Within: $\gamma$ is not identifiable because
$$QY = QX\beta + (I - B)Z\gamma + Q\alpha + Q\varepsilon = QX\beta + Q\varepsilon,$$
since $BZ = Z$. Only $\beta$ is identifiable. But a two-step procedure is feasible:
1/ Run the Within regression $\Rightarrow \hat\beta$;
2/ Run the Between regression on
$$\bar{y}_i - \bar{x}_i\hat\beta = \alpha_i + Z_i\gamma + \bar\varepsilon_i, \quad i = 1, 2, \dots, N,$$
to estimate the $\gamma$'s.
GLS: Both $\beta$ and $\gamma$ are identifiable.
4.2 Choice between Within and GLS
One choice criterion between Within and GLS: the presence of $z_i$'s in the model.
Recall: GLS is a consistent and efficient estimator provided regressors are exogenous:
$$E(\alpha_i x_{it}) = 0 \quad \text{and} \quad E(\alpha_i z_i) = 0 \quad \forall i, t.$$
Consider the non-augmented model $y_{it} = x_{it}\beta + \alpha_i + \varepsilon_{it}$.
If $x_{it}$ is endogenous in the sense $E(\alpha_i x_{it}) \neq 0$, then GLS is not consistent:
$$\hat\beta_{GLS} = \beta + \left(X'\Omega^{-1}X\right)^{-1} X'\Omega^{-1}U = \beta + \left[X'\left(Q + \tfrac{1}{\theta}B\right)X\right]^{-1}\left[X'\left(Q + \tfrac{1}{\theta}B\right)U\right],$$
where $\theta = 1 + T\sigma^2_\alpha/\sigma^2_\varepsilon$, so that
$$E\left[X'\left(Q + \tfrac{1}{\theta}B\right)U\right] = E\left[X'Q\varepsilon + X'(B\alpha + B\varepsilon)/\theta\right] = 0 + E(X'B\alpha)/\theta + 0 = E(X'\alpha)/\theta \neq 0,$$
because $E(X'\varepsilon) = 0$ and $B\alpha = \alpha$.
Same problem with the augmented model, if $E(X'\alpha) \neq 0$ and/or $E(Z'\alpha) \neq 0$.
Important consequence in practice: If (some of the) regressors are endogenous, GLS estimates are not consistent, but Within estimates are consistent because $\alpha$ is filtered out.
Another criterion of choice between Within and GLS:
- If there are endogenous regressors $\Rightarrow$ choose Within estimation (but $\gamma$ is not identifiable);
- If all regressors are exogenous, use GLS (the most efficient).
Three problems remain:
- $\gamma$ is still not identified, because in the Between regression $\bar{y}_i - \bar{x}_i\hat\beta = z_i\gamma + \alpha_i + \bar\varepsilon_i$, $z_i$ is still correlated with $\alpha_i$.
- If one uses Within, all regressors are treated as endogenous (no distinction between exogenous and endogenous $x_{it}$'s).
- Within estimates are not efficient.
4.3 An important test for endogeneity
Null hypothesis: $H_0 : E(X'\alpha) = E(Z'\alpha) = 0$ (exogeneity).
Comparison between two estimators:
               GLS                      Within
$H_0$          Consistent, efficient    Consistent, not efficient
Alternative    Not consistent           Consistent
Hausman (1978): Even if the $x_{it}$'s are exogenous, GLS estimates of $\beta$ are not consistent in the augmented model if the $z_i$'s are correlated with $\alpha_i$. Therefore, one can test for exogeneity using parameter estimates for $\beta$ only.
Hausman test statistic: Under $H_0$,
$$HT = \left(\hat\beta_{Within} - \hat\beta_{GLS}\right)'\left[Var(\hat\beta_{Within}) - Var(\hat\beta_{GLS})\right]^{-1}\left(\hat\beta_{Within} - \hat\beta_{GLS}\right) \sim \chi^2(K).$$
Notes
- $\hat\beta_{GLS}$ and $\hat\beta_{Within}$ must have the same dimension.
- The weighting matrix $\left[Var(\hat\beta_{Within}) - Var(\hat\beta_{GLS})\right]$ is positive definite: GLS is more efficient than Within under the null.
Recall that $Var(\hat\beta_{GLS}) = \sigma^2_\varepsilon(X'QX + \theta^{-1}X'BX)^{-1}$ and $Var(\hat\beta_W) = \sigma^2_\varepsilon(X'QX)^{-1}$.
Interpretation of the number of degrees of freedom of the test:
The Within estimator is based on the condition $E(X'QU) = 0$, whereas GLS is based on $E(X'\Omega^{-1}U) = 0 \Rightarrow E(X'QU) = 0$ and $E(X'BU) = 0$.
For GLS, we add $K$ additional conditions (in terms of $B$): the rank of $X$. The Hausman test uses these additional restrictions (see GMM later).
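The Hausman statistic is a simple quadratic form once the two sets of estimates are available. The following sketch is illustrative only: the function name `hausman_statistic` and its arguments are hypothetical, and the inputs are assumed to be the common coefficient vector $\beta$ and its estimated covariance matrices from Within and GLS.

```python
import numpy as np

def hausman_statistic(b_within, b_gls, V_within, V_gls):
    """Return (HT, degrees of freedom) for the Within-vs-GLS comparison."""
    d = np.asarray(b_within, dtype=float) - np.asarray(b_gls, dtype=float)
    V_diff = np.asarray(V_within, dtype=float) - np.asarray(V_gls, dtype=float)
    # Quadratic form d' [V_within - V_gls]^{-1} d, solved without explicit inversion
    stat = float(d @ np.linalg.solve(V_diff, d))
    return stat, d.size

# Toy numbers (not estimates from the course data)
stat, df = hausman_statistic(
    b_within=[1.0, 2.0],
    b_gls=[0.5, 2.0],
    V_within=np.diag([0.50, 0.40]),
    V_gls=np.diag([0.25, 0.20]),
)
```

The statistic is then compared with a $\chi^2(K)$ critical value. In finite samples the estimated difference of covariance matrices may fail to be positive definite, in which case the statistic can be negative or the linear solve can fail; the sketch does not handle that case.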
4.4 Instrumental Variable estimation: Hausman-Taylor GLS estimator
4.4.1 Instrumental Variable estimation
Alternative method: Instrumental-variable estimation. In the cross-section context with $N$ observations:
$$Y = X\beta + \varepsilon, \quad E(X'\varepsilon) \neq 0, \quad E(W'\varepsilon) = 0,$$
where $W$ is an $N \times L$ matrix of instruments. If $K = L$,
$$W'(Y - X\hat\beta) = 0 \ \Leftrightarrow \ W'Y = (W'X)\hat\beta \ \Leftrightarrow \ \hat\beta = (W'X)^{-1}W'Y \quad \text{(IV estimator)}.$$
If $L > K$,
$$W'(Y - X\beta) = 0 \quad (L \text{ conditions on } K \text{ parameters}),$$
and we construct the quadratic form $(Y - X\beta)'W(W'W)^{-1}W'(Y - X\beta) = (Y - X\beta)'P_W(Y - X\beta)$, where $P_W = W(W'W)^{-1}W'$,
$$\Rightarrow \ \hat\beta = (X'P_W X)^{-1}(X'P_W Y).$$
Note: in general, instruments $W$ originate from within or outside the equation.
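A minimal numerical sketch of the estimator $\hat\beta = (X'P_W X)^{-1}X'P_W Y$ follows. The function name `iv_estimator` and the small data arrays are made up for illustration; the code covers both the just-identified ($L = K$) and over-identified ($L > K$) cases.

```python
import numpy as np

def iv_estimator(Y, X, W):
    """Compute (X' P_W X)^{-1} X' P_W Y with P_W = W (W'W)^{-1} W'."""
    Y, X, W = (np.asarray(a, dtype=float) for a in (Y, X, W))
    # P_W X: fitted values from regressing each column of X on the instruments
    PW_X = W @ np.linalg.solve(W.T @ W, W.T @ X)
    # Since P_W is symmetric, X' P_W X = (P_W X)' X and X' P_W Y = (P_W X)' Y
    return np.linalg.solve(PW_X.T @ X, PW_X.T @ Y)

# Illustrative data: constant plus one regressor, constant plus one instrument
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 4.0], [1.0, 7.0]])
W = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 3.0], [1.0, 8.0]])
Y = np.array([1.0, 3.0, 2.0, 5.0])

beta_iv = iv_estimator(Y, X, W)                 # just-identified case, L = K
beta_simple = np.linalg.solve(W.T @ X, W.T @ Y) # (W'X)^{-1} W'Y
beta_ols = iv_estimator(Y, X, X)                # using X as its own instrument gives OLS
```

When $L = K$ the projection form collapses to $(W'X)^{-1}W'Y$, which is why `beta_iv` and `beta_simple` coincide.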
4.4.2 IV in a panel-data context
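- Account for the variance-covariance structure ($\Omega$);
- Find relevant instruments, not correlated with $\alpha$.
Consider the general, augmented model:
$$Y = X_1\beta_1 + X_2\beta_2 + Z_1\gamma_1 + Z_2\gamma_2 + \alpha + \varepsilon,$$
where
- $X_1$: $N \times K_1$, exogenous, varying across $i$ and $t$;
- $X_2$: $N \times K_2$, endogenous, varying across $i$ and $t$;
- $Z_1$: $N \times G_1$, exogenous, varying across $i$;
- $Z_2$: $N \times G_2$, endogenous, varying across $i$;
and let $\Lambda = (X_1, X_2, Z_1, Z_2)$ and $\delta = (\beta_1', \beta_2', \gamma_1', \gamma_2')'$.
General form of the Instrumental-variable estimator for panel data: Let $Y^* = \Omega^{-1/2}Y$, $X^* = \Omega^{-1/2}X$, and $\Lambda^* = \Omega^{-1/2}\Lambda$. We have
$$\hat\delta_{IV} = \left[\Lambda^{*\prime}P_W\Lambda^*\right]^{-1}\left[\Lambda^{*\prime}P_W Y^*\right] = \left[\Lambda'\Omega^{-1/2}P_W\Omega^{-1/2}\Lambda\right]^{-1}\left[\Lambda'\Omega^{-1/2}P_W\Omega^{-1/2}Y\right].$$
Computation of $\Omega^{-1/2}$: as in the usual GLS case.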
4.4.3 Exogeneity assumptions and a first instrument matrix
Exogeneity assumptions: $E(X_1'\alpha) = E(Z_1'\alpha) = 0$
$\Rightarrow$ Obvious instruments are $X_1$ and $Z_1$, which are not sufficient because $K_1 + G_1 < K_1 + K_2 + G_1 + G_2$.
Additional instruments: must not be correlated with $\alpha$.
Because $\alpha$ is the source of endogeneity, every variable not correlated with $\alpha$ is a valid instrument. The best valid instruments are highly correlated with $X_2$ and $Z_2$.
$QX_1$ and $QX_2$ are valid instruments: $E[(QX_1)'\alpha] = E[X_1'Q\alpha] = 0$ and $E[(QX_2)'\alpha] = E[X_2'Q\alpha] = 0$.
As for $X_1$, it is equivalent to use $BX_1$ because we need
$$E[X_1'\Omega^{-1}U] = E[X_1'(Q + \theta^{-1}B)U] = E[X_1'B(Q + \theta^{-1}B)U],$$
since $BQ = 0$ and $BB = B$.
Hausman-Taylor (1981) matrix of instruments:
$$W_{HT} = [QX_1, QX_2, BX_1, Z_1] = [QX_1, QX_2, X_1, Z_1].$$
Identification condition: We have $K_1 + K_2 + G_1 + G_2$ parameters to estimate, using $K_1 + K_1 + K_2 + G_1$ instruments ($K_1 + K_2$ instruments in $QX$). Therefore, the identification condition is $K_1 \geq G_2$.
4.4.4 More efficient procedures: Amemiya-MaCurdy and Breusch-Mizon-Schmidt
4.4.4.1 Amemiya and MaCurdy (1986)
Use the fact that if $x_{it}$ is exogenous, we can use the following conditions: $E(x_{it}\alpha_i) = 0 \ \forall i, \forall t$, instead of $E(\bar{x}_i'\alpha_i) = 0$.
Amemiya and MaCurdy (1986) suggest using matrix $X_1^*$ in the list of instruments:
$$X_1^* = \begin{bmatrix} x_{11} & x_{12} & \dots & x_{1T} \\ x_{11} & x_{12} & \dots & x_{1T} \\ \vdots & & & \vdots \\ x_{21} & x_{22} & \dots & x_{2T} \\ x_{21} & x_{22} & \dots & x_{2T} \\ \vdots & & & \vdots \\ x_{N1} & x_{N2} & \dots & x_{NT} \\ x_{N1} & x_{N2} & \dots & x_{NT} \\ \vdots & & & \vdots \\ x_{N1} & x_{N2} & \dots & x_{NT} \end{bmatrix} \begin{matrix} (i = 1, t = 1) \\ (i = 1, t = 2) \\ \vdots \\ (i = 2, t = 1) \\ (i = 2, t = 2) \\ \vdots \\ (i = N, t = 1) \\ (i = N, t = 2) \\ \vdots \\ (i = N, t = T) \end{matrix}$$
such that $QX_1^* = 0$ and $BX_1^* = X_1^*$. The AM instrument matrix is $W_{AM} = [QX, X_1^*, Z_1]$, and an equivalent estimator obtains by using
$$W_{AM} = [QX, (QX_1)^*, BX_1, Z_1],$$
where $(QX_1)^*$ is constructed as $X_1^*$ above.
Amemiya and MaCurdy: their instrument matrix yields an estimator at least as efficient as with the Hausman-Taylor matrix, if $\alpha_i$ is not correlated with the regressors $\forall t$.
Identification condition: We add $(QX_1)^*$ to the Hausman-Taylor list of instruments, but as $[(QX_1)^*, X_1]$ is of rank $K_1$, we only add $(T-1)K_1$ instruments. The identification condition is $TK_1 \geq G_2$.
4.4.4.2 Breusch, Mizon and Schmidt (1989)
Even more efficient estimator: based on the conditions
$E[(QX_{2it})'\alpha_i] = 0 \ \forall i, \forall t$, instead of the condition
$E[(Q_T X_{2i})'\alpha_i] = 0$.
For BMS, the estimator is more efficient if endogeneity in $X_2$ originates from a time-invariant component. BMS instrument matrix:
$$W_{BMS} = [QX, (QX_1)^*, (QX_2)^*, BX_1, Z_1],$$
where $(QX_1)^*$ and $(QX_2)^*$ are constructed the same way as $X_1^*$ for AM.
Identification condition: For BMS, we add $(QX_2)^*$ to the Amemiya-MaCurdy instruments. The condition is then $TK_1 + (T-1)K_2 \geq G_2$. As before, we only add $(T-1)K_2$ instruments, as $(QX_2)^*$ is not of full rank but of rank $(T-1)K_2$.
4.5 Computation of variance-covariance matrix for IV estimators
Problem here: endogenous regressors may yield inconsistent estimates of the variance components in $\Omega$, in particular of parameter $\theta$.
Hausman and Taylor (1981) suggest a method that yields consistent estimates.
Let $M_1$ denote the individual-mean vector of the Within residual:
$$M_1 = BY - BX\hat\beta_W = \left[B - BX(X'QX)^{-1}X'Q\right]Y = Z\gamma + \alpha + \left[B - BX(X'QX)^{-1}X'Q\right]\varepsilon,$$
where $X = (X_1|X_2)$, $Z = (Z_1|Z_2)$, and $\gamma = (\gamma_1, \gamma_2)$. The last three terms above can be treated as centered residuals, and it suffices to find instruments for $Z_2$ in order to estimate $\gamma$.
The IV estimator of $\gamma$ is
$$\hat\gamma_B = (Z'P_C Z)^{-1}(Z'P_C M_1),$$
where $P_C$ is the projection matrix associated with instruments $C = (X_1, Z_1)$. Using parameter estimates $\hat\beta_W$ and $\hat\gamma_B$, we form the residuals
$$\hat{u}_W = QY - QX\hat\beta_W \quad \text{and} \quad \hat{u}_B = BY - BX\hat\beta_W - Z\hat\gamma_B.$$
These two vectors of residuals are used to compute the variance components as in standard Feasible GLS.
4.5.1 Full IV-GLS estimation procedure
Step 1. Compute individual means and deviations, $BX$, $BY$, $QX$ and $QY$.
Step 2. Estimate the parameters $\beta$ associated with $X$ using Within.
Step 3. Estimate $\hat\gamma_B$ by the IV procedure above.
Step 4. Compute $\hat\sigma^2_\alpha$ and $\hat\sigma^2_\varepsilon$ from $\hat{u}_W$ and $\hat{u}_B$, and compute $\hat\theta = 1 + T\hat\sigma^2_\alpha/\hat\sigma^2_\varepsilon$.
Step 5. Transform variables by the GLS scalar procedure, e.g., $(Q + \hat\theta^{-1/2}B)Y = y_{it} - (1 - 1/\sqrt{\hat\theta})\bar{y}_i$.
Step 6. Compute the projection matrix $P_W$ from instrument matrix $W$.
Step 7. Estimate the parameters $\delta$.
4.6 Example: Wage equation
4.6.1 Model specification
Theory (Human capital or signal theory):
$$\log w = F[X_1, \alpha, ED],$$
where $w$ is the wage rate, $\alpha$ is the worker's ability (unobserved), $X_1$ are additional variables (industry, occupation status, etc.), and $ED$ is the educational level. Proxies for ability that can be used: number of hours worked, experience, union, etc.
Main objective: estimate the marginal gain associated with $ED$: $\partial w/\partial ED$.
But problem: what if the worker's ability is constant through time and conditions $ED$? The true model would be
$$\log w = F[X_1, \alpha, ED];$$
$$ED = G[\alpha, X_2],$$
where $X_2$ are additional, individual-specific variables.
If ability $\alpha$ is replaced by proxies $Z$, we have
$$\log w = F[X_1, Z, ED] + U;$$
$$ED = G[X_2, Z_2] + V,$$
where $U = F[X_1, \alpha, ED] - F[X_1, Z, ED]$ and $V = G[X_2, \alpha] - G[X_2, Z]$.
Two problems when estimating the first equation while overlooking the second one:
- If some $X_1$ and $X_2$ variables are in common, endogeneity bias (because of $ED$);
- If $Z$ is correlated with omitted variables (explaining ability), measurement-error bias.
4.7 Application: returns to education
Sample used: Panel Study of Income Dynamics (PSID), University of Michigan. See Baltagi and Khanti-Akom (1990), Cornwell and Rupert (1988).
595 individuals, for years 1976 to 1982 (7 time periods): heads of households (males and females) aged between 18 and 65 in 1976, with a positive wage in private, nonfarm employment for the years 1976 to 1982.
4.7.1 Variables related to job status
LWAGE : logarithm of wage earnings;
WKS : number of weeks worked in the year;
EXP : working experience in years at the date of the sample;
OCC : dummy, 1 if bluecollar occupation;
IND : dummy, 1 if working in industry;
UNION : dummy, 1 if wage is covered by a union contract.
4.7.2 Variables related to characteristics of household heads
SMSA : dummy, 1 if household resides in SMSA (Standard Metropolitan Statistical Area);
SOUTH : dummy, 1 if individual resides in the south;
MS : Marital Status dummy, 1 if head is married;
FEM : dummy, 1 if female;
BLK : dummy, 1 if head is black;
ED : number of years of education attained.
Individual-specific variables: ED, BLK and FEM.
Estimation of non-augmented models (w/o $Z_i$'s)
Variables a priori endogenous (because correlated with ability: individual effects): $X_2$: (EXPE, EXPE2, UNION, WKS, MS);
Variables a priori exogenous: $X_1$: (OCC, SOUTH, SMSA, IND).
Augmented model
$$Y_{it} = X_{1it}\beta_1 + X_{2it}\beta_2 + Z_{1i}\gamma_1 + Z_{2i}\gamma_2 + \alpha_i + \varepsilon_{it}$$
Variables a priori endogenous: $Z_2$: ED;
Variables a priori exogenous: $Z_1$: (BLK, FEM).
Table 4.1: Sample 1 1976-1982. Descriptive Statistics
Variable Mean Std. Dev. Minimum Maximum
LWAGE 6.6763 0.4615 4.6052 8.5370
EXP 19.8538 10.9664 1.0000 51.0000
WKS 46.8115 5.1291 5.0000 52.0000
OCC 0.5112 0.4999 0.0000 1.0000
IND 0.3954 0.4890 0.0000 1.0000
UNION 0.3640 0.4812 0.0000 1.0000
SOUTH 0.2903 0.4539 0.0000 1.0000
SMSA 0.6538 0.4758 0.0000 1.0000
MS 0.8144 0.3888 0.0000 1.0000
ED 12.8454 2.7880 4.0000 17.0000
FEM 0.1126 0.3161 0.0000 1.0000
BLK 0.0723 0.2590 0.0000 1.0000
Table 4.2: Dependent variable: log(wage). Exogenous regressors only.
            Within                 GLS
Constant                           0.0976 (0.0040)
OCC         -0.0696 (0.02323)      -0.0701 (0.02322)
SOUTH       -0.0052 (0.05833)      -0.0072 (0.05807)
SMSA        -0.1287 (0.03295)      -0.1275 (0.03290)
IND          0.0317 (0.02626)       0.0317 (0.02624)
$\chi^2(4) = 0.551$
Notes. Standard errors are in parentheses.
Table 4.3: Dependent variable: log(wage). Endogenous regressors only.
            Within                 GLS
Constant                           0.0561 (0.0024)
EXPE         0.1136 (0.002467)      0.1133 (0.002466)
EXPE2       -0.0004 (0.000054)     -0.0004 (0.000054)
WKS          0.0008 (0.0005994)     0.0008 (0.0005994)
MS          -0.0322 (0.01893)      -0.0325 (0.01892)
UNION        0.0301 (0.01480)       0.0300 (0.01479)
$\chi^2(5) = 24.94$
Notes. Standard errors are in parentheses.
Table 4.4: Dependent variable: log(wage). Augmented model.
            Within                 GLS
Constant                           0.1866 (0.01189)
OCC         -0.0214 (0.01378)      -0.0243 (0.01367)
SOUTH       -0.0018 (0.03429)       0.0048 (0.03188)
SMSA        -0.0424 (0.01942)      -0.0468 (0.01891)
IND          0.0192 (0.01544)       0.0148 (0.01521)
EXPE         0.1132 (0.00247)       0.1084 (0.00243)
EXPE2       -0.0004 (0.00005)      -0.0004 (0.00005)
WKS          0.0008 (0.00059)       0.0008 (0.00059)
MS          -0.0297 (0.01898)      -0.0391 (0.01884)
UNION        0.0327 (0.01492)       0.0375 (0.01472)
FEM                                -0.1666 (0.12646)
BLK                                -0.2639 (0.15413)
ED                                  0.1373 (0.01415)
$\chi^2(9) = 495.3$
Notes. Standard errors are in parentheses.
Table 4.5: Dependent variable: log(wage). IV Estimation
            HT                     AM                      BMS
Constant     0.1772 (0.017)         0.1781 (0.016)          0.1748 (0.016)
OCC         -0.0207 (0.013)        -0.0208 (0.013)         -0.0204 (0.013)
SOUTH        0.0074 (0.031)         0.0072 (0.031)          0.0077 (0.031)
SMSA        -0.0418 (0.018)        -0.0419 (0.018)         -0.0423 (0.018)
IND          0.0135 (0.015)         0.0136 (0.015)          0.0138 (0.015)
EXPE         0.1131 (0.002)         0.1129 (0.002)          0.1127 (0.002)
EXPE2       -0.0004 (0.005)        -0.0004 (0.000)         -0.0004 (0.000)
WKS          0.0008 (0.000)         0.0008 (0.000)          0.0008 (0.000)
MS          -0.0298 (0.018)        -0.0300 (0.018)         -0.0303 (0.018)
UNION        0.0327 (0.014)         0.0324 (0.014)          0.0326 (0.014)
FEM         -0.1309 (0.126)        -0.1320 (0.126)         -0.1337 (0.126)
BLK         -0.2857 (0.155)        -0.2859 (0.155)         -0.2793 (0.155)
ED           0.1379 (0.021)         0.1372 (0.020)          0.1417 (0.020)
Test        $\chi^2(3) = 5.23$     $\chi^2(13) = 19.29$    $\chi^2(13) = 12.23$
Notes. Standard errors are in parentheses.
Chapter 5
Dynamic panel data models
5.1 Motivation
Usefulness of dynamic panel data models:
- Investigate adjustment dynamics in micro- and macro-economic variables of interest;
- Estimate equations from intertemporal-framework models (life-cycle models, finance, ...).
In practice: estimate long-run elasticities and structural parameters from Euler equations.
5.1.1 Dynamic formulations from dynamic programming problems
Consider the general problem
$$\max_{q(0), \dots, q(T)} E\int e^{-rt}\pi(t)\,dt,$$
$$\pi(t) = p(t)q(t) - c[q(t), b(t)], \qquad \dot{b} = G[b(t), q(t)],$$
where $b(t)$ is the state variable (stock, capital, ...), $q(t)$ is the control variable, and $r$ is the discount rate. $G(.)$ describes the evolution path of the state variable.
Dynamic programming solves the problem in a series of steps.
Switch to a discrete-time framework:
$$\max_{q_0, \dots, q_T} E\left\{\sum_{t=0}^T (1 + r)^{-t}\pi_t\right\}, \qquad b_{t+1} = f(b_t, q_t),$$
and use the Bellman equation:
$$V_t(b_t) = \max E_t\left\{\pi_t + (1 + r)^{-1}V_{t+1}(b_{t+1})\right\} = \max E_t\left\{p_t q_t - c[q_t, b_t] + (1 + r)^{-1}V_{t+1}(f[b_t, q_t])\right\},$$
where $V_t(b_t)$ is the value function of the problem at time $t$, and $E_t$ is the conditional expectation operator at time $t$.
We use a) the envelope theorem (the evolution path at the optimum depends only on the state variable, as the control variable is already optimized); b) the first-order condition wrt. the control variable.
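$$\frac{\partial V_t(b_t)}{\partial b_t} = \frac{\partial \pi_t(b_t, q_t)}{\partial b_t} + \frac{1}{1 + r}\frac{\partial V_{t+1}}{\partial f}\frac{\partial f(b_t, q_t)}{\partial b_t} \quad \text{(Envelope theorem)},$$
$$\frac{\partial V_t(b_t)}{\partial q_t} = \frac{\partial \pi_t(b_t, q_t)}{\partial q_t} + \frac{1}{1 + r}\frac{\partial V_{t+1}}{\partial f}\frac{\partial f(b_t, q_t)}{\partial q_t} = 0 \quad \text{(FOC)}.$$
From (FOC):
$$\frac{\partial V_{t+1}}{\partial f} = -\frac{\partial \pi_t}{\partial q_t}\left[\frac{\partial f(b_t, q_t)}{\partial q_t}\right]^{-1}(1 + r),$$
that we replace in the first equation above:
$$\frac{\partial V_t}{\partial b_t} = \frac{\partial \pi_t}{\partial b_t} - \frac{\partial \pi_t}{\partial q_t}\left[\frac{\partial f(b_t, q_t)}{\partial q_t}\right]^{-1}\frac{\partial f(b_t, q_t)}{\partial b_t}.$$
Now we lag (FOC) once and replace:
$$\frac{\partial \pi_{t-1}}{\partial q_{t-1}} + \frac{1}{1 + r}\left[\frac{\partial \pi_t}{\partial b_t} - \frac{\partial \pi_t}{\partial q_t}\left(\frac{\partial f}{\partial q_t}\right)^{-1}\frac{\partial f}{\partial b_t}\right]\frac{\partial f(b_{t-1}, q_{t-1})}{\partial q_{t-1}} = 0.$$
Assume $\partial f/\partial q = a_1$ and $\partial f/\partial b = a_2$. We have
$$\frac{\partial \pi_t}{\partial q_t} = \frac{1 + r}{a_2}\frac{\partial \pi_{t-1}}{\partial q_{t-1}} + \frac{a_1}{a_2}\frac{\partial \pi_t}{\partial b_t}.$$
This is the Euler equation relating current and past marginal profits.
If, for instance, profit is linear-quadratic in $q_t$ and $b_t$, we have
$$b_0 + b_1 q_t + b_2 b_t = \frac{1 + r}{a_2}(b_0 + b_1 q_{t-1} + b_2 b_{t-1}) + \frac{a_1}{a_2}(c_0 + c_1 q_t + c_2 b_t)$$
$$\Leftrightarrow \ q_{it} = \delta_0 + \delta_1 q_{i,t-1} + \delta_2 b_{i,t-1} + \delta_3 b_{it} + \alpha_i + \varepsilon_{it},$$
where
$$\delta_0 = (a_2 b_1 - a_1 c_1)^{-1}\left[b_0((1 + r) - a_2) + a_1 c_0\right],$$
$$\delta_1 = (a_2 b_1 - a_1 c_1)^{-1}\left[(1 + r)b_1\right],$$
$$\delta_2 = (a_2 b_1 - a_1 c_1)^{-1}\left[(1 + r)b_2\right],$$
$$\delta_3 = (a_2 b_1 - a_1 c_1)^{-1}\left[a_1 c_2 - a_2 b_2\right].$$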
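5.1.2 Euler equations and consumption
Consider a two-period model with the following period-to-period budget constraint
$$c_t + A_t = y_t + A_{t-1}(1 + r_t), \quad t = 1, 2,$$
where $c_t$ is consumption at time $t$, $A_t$ is total assets, $y_t$ is wage income, and $r_t$ is the interest rate.
Assume further intertemporally additive preferences:
$$U = u(c_1) + \frac{1}{1 + \delta}u(c_2),$$
where $u' > 0$, $u'' < 0$ and $\delta \geq 0$ is the subjective discount rate.
Often-used specification: CES (Constant Elasticity of Substitution)
$$U = c_1^{-\nu} + \frac{1}{1 + \delta}c_2^{-\nu},$$
where $\sigma = 1/(1 + \nu)$ is the intertemporal elasticity of substitution.
At the optimum (by replacing budget constraints in the utility function and optimizing wrt. $A_1$):
$$\frac{\partial U}{\partial A_1} = \frac{\partial u}{\partial c_1}\frac{\partial c_1}{\partial A_1} + \frac{1}{1 + \delta}\frac{\partial u}{\partial c_2}\frac{\partial c_2}{\partial A_1} = 0 \ \Leftrightarrow \ \frac{\partial u}{\partial c_1} = \frac{1 + r}{1 + \delta}\frac{\partial u}{\partial c_2}.$$
This is the intertemporal efficiency condition (Hall 1978), and in the CES case we have
$$c_1^{-1/\sigma} = \frac{1 + r}{1 + \delta}c_2^{-1/\sigma}.$$
Stochastic framework with $u(X) = -\frac{1}{2}(\bar{c} - X)^2$:
$$\bar{c} - c_1 = \frac{1 + r}{1 + \delta}(\bar{c} - Ec_2) \ \Leftrightarrow \ c_1 = Ec_2 \ \text{ if } r = \delta.$$
The Hall Euler equation with more than 2 periods reduces to
$$c_{t+1} = c_t + \varepsilon_{t+1}, \quad \text{where } \varepsilon_{t+1} \text{ is i.i.d.},$$
which is tested from the equation
$$\Delta c_t = \beta_0 + \beta_1\Delta y_t + \beta_2(y_{t-1} - c_{t-1}) + \varepsilon_t.$$
This is an error-correction model that can be written
$$c_t = \beta_0 + \beta_1 y_t + (c_{t-1} - \beta_1 y_{t-1}) + \beta_2(y_{t-1} - c_{t-1}) + \varepsilon_t.$$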
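5.1.3 Long-run relationships in economics
Long-run relationships are represented by the stationary path of the variable of interest (consumption, capital stock, ...): $y_{t+1}/y_t = \rho$, and if we add variable $x_t$, $y_{t+1} = \rho y_t + \beta x_{t+1}$, the stationary equilibrium path is $y = \beta x/(1 - \rho)$.
5.1.3.1 Long-run elasticities
Dynamic models are helpful in computing long-run elasticities. Consider for example the dynamic consumption model
$$\tilde{C}_{i,t+j} = \rho\tilde{C}_{i,t+j-1} + \beta\tilde{P}_{i,t+j} + u_{i,t+j},$$
where $\tilde{C}_{i,t+j}$ and $\tilde{P}_{i,t+j}$ respectively denote logs of consumption and price. Lagged consumption here accounts for habits. We have
$$\tilde{C}_{i,t+j} = \rho^{j+1}\tilde{C}_{i,t-1} + \beta\left(\rho^j\tilde{P}_{it} + \rho^{j-1}\tilde{P}_{i,t+1} + \dots + \rho\tilde{P}_{i,t+j-1} + \tilde{P}_{i,t+j}\right) + \tilde{u}_{i,t+j},$$
where $\tilde{u}_{i,t+j} = \rho^j u_{it} + \rho^{j-1}u_{i,t+1} + \dots + \rho u_{i,t+j-1} + u_{i,t+j}$.
Assume we want to compute the change in consumption at time $t + j$ following a permanent change of 1% in price between $t$ and $t + j$:
$$\frac{\partial\tilde{C}_{i,t+j}}{\partial\tilde{P}_{it}} + \frac{\partial\tilde{C}_{i,t+j}}{\partial\tilde{P}_{i,t+1}} + \dots + \frac{\partial\tilde{C}_{i,t+j}}{\partial\tilde{P}_{i,t+j}} = \beta(\rho^j + \rho^{j-1} + \dots + \rho + 1).$$
When consumption is stationary (in logs), $|\rho| < 1$, and the long-run effect of price obtains by taking the limit
$$\lim_{j \to \infty}\sum_{s=0}^j \frac{\partial\tilde{C}_{i,t+j}}{\partial\tilde{P}_{i,t+s}} = \lim_{j \to \infty}\beta(\rho^j + \rho^{j-1} + \dots + \rho + 1) = \frac{\beta}{1 - \rho}.$$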
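The limit above is a geometric sum, which a few lines of code make concrete. This is only an illustration: `rho` stands for the coefficient on lagged consumption and `beta` for the short-run price elasticity (symbol names assumed here), and the numerical values are made up.

```python
def cumulative_price_effect(rho, beta, j):
    """Effect at t+j of a permanent unit change in log price starting at t."""
    return beta * sum(rho ** s for s in range(j + 1))

def long_run_effect(rho, beta):
    """Limit of the cumulative effect as j grows; requires |rho| < 1."""
    if abs(rho) >= 1:
        raise ValueError("the long-run effect is only defined for |rho| < 1")
    return beta / (1.0 - rho)

# Hypothetical values: habit persistence 0.5, short-run price elasticity -0.4
short_run = cumulative_price_effect(0.5, -0.4, 0)    # impact effect
after_one = cumulative_price_effect(0.5, -0.4, 1)    # one period later
long_run = long_run_effect(0.5, -0.4)                # twice the impact effect here
```

With a persistence of 0.5 the long-run elasticity is double the short-run one, which is the sense in which a static model would understate the eventual response to a permanent price change.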
5.1.3.2 Dynamic representations from AR(1) errors
Consider the following Cobb-Douglas production model
$$\log Q_{it} = \beta_1\log N_{it} + \beta_2\log K_{it} + u_{it},$$
where $Q_{it}$ is the output of firm $i$ at time $t$, $N_{it}$ is labor input, $K_{it}$ is capital stock, and $u_{it}$ is the residual. Assume the latter decomposes into
$$u_{it} = \lambda_t + \alpha_i + v_{it} + \varepsilon_{it},$$
where $\lambda_t$ is a year-specific intercept (industry-wide technological change), $\alpha_i$ is the unobserved firm-specific effect, $\varepsilon_{it}$ is an i.i.d. error component (measurement error), and $v_{it}$ is a productivity shock having an AR(1) representation:
$$v_{it} = \rho v_{i,t-1} + e_{it}.$$
This model has the following dynamic representation:
$$\log Q_{it} = \beta_1\log N_{it} - \rho\beta_1\log N_{i,t-1} + \beta_2\log K_{it} - \rho\beta_2\log K_{i,t-1} + \rho\log Q_{i,t-1} + (\lambda_t - \rho\lambda_{t-1}) + \left[\alpha_i(1 - \rho) + e_{it} + \varepsilon_{it} - \rho\varepsilon_{i,t-1}\right],$$
or
$$\log Q_{it} = \pi_1\log N_{it} + \pi_2\log N_{i,t-1} + \pi_3\log K_{it} + \pi_4\log K_{i,t-1} + \pi_5\log Q_{i,t-1} + \lambda^*_t + (\alpha^*_i + \omega_{it}),$$
subject to the restrictions $\pi_2 = -\pi_1\pi_5$ and $\pi_4 = -\pi_3\pi_5$.
Hence, there is an equivalence between a static (short-run) model with serially-correlated productivity shocks, and a dynamic representation of production output.
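5.2 The dynamic fixed-effect model
Simple dynamic panel-data model:
$$y_{it} = \rho y_{i,t-1} + \alpha_i + \varepsilon_{it}, \quad i = 1, 2, \dots, N; \ t = 1, 2, \dots, T,$$
where initial conditions $y_{i0}$, $i = 1, 2, \dots, N$, are assumed known.
We assume $E(\varepsilon_{it}) = 0 \ \forall i, t$; $E(\varepsilon_{it}\varepsilon_{js}) = \sigma^2_\varepsilon$ if $i = j$, $t = s$, and 0 otherwise; $E(\alpha_i\varepsilon_{it}) = 0 \ \forall i, t$.
By continuous substitution:
$$y_{it} = \varepsilon_{it} + \rho\varepsilon_{i,t-1} + \rho^2\varepsilon_{i,t-2} + \dots + \rho^{t-1}\varepsilon_{i1} + \frac{1 - \rho^t}{1 - \rho}\alpha_i + \rho^t y_{i0}.$$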
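5.2.1 Bias in the Fixed-Effects estimator
The Within estimator is:
$$\hat\rho = \frac{\sum_{i=1}^N\sum_{t=1}^T (y_{it} - \bar{y}_i)(y_{i,t-1} - \bar{y}_{i,-1})}{\sum_{i=1}^N\sum_{t=1}^T (y_{i,t-1} - \bar{y}_{i,-1})^2}, \qquad \hat\alpha_i = \bar{y}_i - \hat\rho\,\bar{y}_{i,-1},$$
where
$$\bar{y}_i = \frac{1}{T}\sum_{t=1}^T y_{it}, \quad \bar{y}_{i,-1} = \frac{1}{T}\sum_{t=1}^T y_{i,t-1}, \quad \bar\varepsilon_i = \frac{1}{T}\sum_{t=1}^T \varepsilon_{it}.$$
Also,
$$\hat\rho = \rho + \frac{\frac{1}{NT}\sum_{i=1}^N\sum_{t=1}^T (\varepsilon_{it} - \bar\varepsilon_i)(y_{i,t-1} - \bar{y}_{i,-1})}{\frac{1}{NT}\sum_{i=1}^N\sum_{t=1}^T (y_{i,t-1} - \bar{y}_{i,-1})^2}.$$
This estimator exists if the denominator is $\neq 0$, and is consistent if the numerator converges to 0.
Numerator:
$$\text{plim}_{N \to \infty}\frac{1}{NT}\sum_{i,t}^{N,T}(y_{i,t-1} - \bar{y}_{i,-1})(\varepsilon_{it} - \bar\varepsilon_i) = -\text{plim}\frac{1}{N}\sum_{i=1}^N \bar{y}_{i,-1}\bar\varepsilon_i,$$
because $\varepsilon_{it}$ is serially uncorrelated and not correlated with $\alpha_i$. We use
$$\bar{y}_{i,-1} = \frac{1}{T}\sum_{t=1}^T y_{i,t-1} = \frac{1}{T}\left\{\frac{1 - \rho^T}{1 - \rho}y_{i0} + \frac{(T - 1) - T\rho + \rho^T}{(1 - \rho)^2}\alpha_i + \frac{1 - \rho^{T-1}}{1 - \rho}\varepsilon_{i1} + \frac{1 - \rho^{T-2}}{1 - \rho}\varepsilon_{i2} + \dots + \varepsilon_{i,T-1}\right\}.$$
We have
$$\text{plim}\frac{1}{N}\sum_{i=1}^N \bar{y}_{i,-1}\bar\varepsilon_i = \text{plim}\left\{\frac{1}{N}\sum_{i=1}^N \bar\varepsilon_i\,\frac{1}{T}\left[\sum_{t=1}^{T-1}\frac{1 - \rho^{T-t}}{1 - \rho}\varepsilon_{it}\right]\right\}$$
$$= \text{plim}\left\{\frac{1}{N}\sum_{i=1}^N\left(\frac{1}{T}\sum_{t=1}^T \varepsilon_{it}\right)\frac{1}{T}\left[\sum_{t=1}^{T-1}\frac{1 - \rho^{T-t}}{1 - \rho}\varepsilon_{it}\right]\right\} = \frac{\sigma^2_\varepsilon}{T^2}\,\frac{(T - 1) - T\rho + \rho^T}{(1 - \rho)^2}.$$
In a similar manner, we show that
$$\text{plim}\frac{1}{NT}\sum_{i,t}^{N,T}(y_{i,t-1} - \bar{y}_{i,-1})^2 = \frac{\sigma^2_\varepsilon}{1 - \rho^2}\left\{1 - \frac{1}{T} - \frac{2\rho}{(1 - \rho)^2}\,\frac{(T - 1) - T\rho + \rho^T}{T^2}\right\}.$$
Forming the ratio of these two terms, the asymptotic bias is
$$\text{plim}_{N \to \infty}(\hat\rho - \rho) = -\frac{1 + \rho}{T - 1}\left(1 - \frac{1}{T}\frac{1 - \rho^T}{1 - \rho}\right)\left\{1 - \frac{2\rho}{(1 - \rho)(T - 1)}\left[1 - \frac{1 - \rho^T}{T(1 - \rho)}\right]\right\}^{-1} = O(1/T).$$
In the transformed model
$$(y_{it} - \bar{y}_i) = \rho(y_{i,t-1} - \bar{y}_{i,-1}) + (\varepsilon_{it} - \bar\varepsilon_i),$$
the explanatory variable is correlated with the residual, and the correlation is of order $1/T$. Hence, the Fixed-Effects estimator is biased in the usual case where $N$ is large and $T$ is small.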
Table 5.1: Asymptotic bias in Fixed-Effects DPD estimator
$\rho$    T     Bias       Percent
0.2       6     -0.2063    -103.1693
          8     -0.1539     -76.9597
          10    -0.1226     -61.3139
          20    -0.0607     -30.3541
          40    -0.0302     -15.0913
0.5       6     -0.2756     -55.1282
          8     -0.2049     -40.9769
          10    -0.1622     -32.4421
          20    -0.0785     -15.6977
          40    -0.0384      -7.6819
0.7       6     -0.3307     -47.2392
          8     -0.2479     -35.4084
          10    -0.1966     -28.0912
          20    -0.0938     -13.3955
          40    -0.0449      -6.4114
0.9       6     -0.3939     -43.7633
          8     -0.3017     -33.5179
          10    -0.2432     -27.0248
          20    -0.1196     -13.2934
          40    -0.0563      -6.2561
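The entries of Table 5.1 can be reproduced directly from the asymptotic bias formula of Section 5.2.1. The sketch below is illustrative; the function name `within_bias` is hypothetical and `rho` denotes the coefficient on the lagged dependent variable.

```python
def within_bias(rho, T):
    """Asymptotic (N -> infinity) bias of the Within estimator of rho for fixed T."""
    # a = 1 - (1/T) * (1 - rho^T) / (1 - rho), common to numerator and denominator
    a = 1.0 - (1.0 - rho ** T) / (T * (1.0 - rho))
    numerator = -(1.0 + rho) / (T - 1) * a
    denominator = 1.0 - 2.0 * rho / ((1.0 - rho) * (T - 1)) * a
    return numerator / denominator

bias = within_bias(0.5, 10)            # bias column of the table
percent = 100.0 * bias / 0.5           # percent column: bias relative to rho
```

The bias is negative for positive `rho` and shrinks roughly like $1/T$, which is why it remains sizeable even for $T = 20$.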
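5.2.2 Instrumental-variable estimation
Only way to obtain a consistent estimator of $\rho$ when $T$ is fixed (small). Different procedure to eliminate individual effects: use First differencing instead of Within:
$$(y_{it} - y_{i,t-1}) = \rho(y_{i,t-1} - y_{i,t-2}) + (\varepsilon_{it} - \varepsilon_{i,t-1}) \ \Leftrightarrow \ \Delta y_{it} = \rho\Delta y_{i,t-1} + \Delta\varepsilon_{it},$$
and in vector form:
$$\Delta y_i = \rho\Delta y_{i,-1} + \Delta\varepsilon_i, \quad i = 1, 2, \dots, N.$$
In the model above, $y_{i,t-1}$ is correlated by construction with $\varepsilon_{i,t-1}$! We need instruments that are uncorrelated with $(\varepsilon_{it} - \varepsilon_{i,t-1})$ but correlated with $(y_{i,t-1} - y_{i,t-2})$. Only possibility in a single-equation framework with no other explanatory variables: use values of the dependent variable.
Because of the autoregressive nature of the model, instruments from future values of $y_{it}$ are not feasible because $y_{it}$ is a recursive function of $\varepsilon_{it}, \varepsilon_{i,t-1}, \dots, \varepsilon_{i1}, \alpha_i, y_{i0}$.
As for lagged dependent variables, we can use either $y_{i,t-2}$ or $(y_{i,t-2} - y_{i,t-3})$:
$$E[y_{i,t-2}(\varepsilon_{it} - \varepsilon_{i,t-1})] = E(\varepsilon_{i,t-2}\varepsilon_{it}) - E(\varepsilon_{i,t-2}\varepsilon_{i,t-1}) = 0,$$
$$E[(y_{i,t-2} - y_{i,t-3})(\varepsilon_{it} - \varepsilon_{i,t-1})] = E[\varepsilon_{i,t-2}(\varepsilon_{it} - \varepsilon_{i,t-1})] - E[\varepsilon_{i,t-3}(\varepsilon_{it} - \varepsilon_{i,t-1})] = 0,$$
$$E[y_{i,t-2}(y_{i,t-1} - y_{i,t-2})] = 0 - E(\varepsilon^2_{i,t-2}) = -\sigma^2_\varepsilon,$$
$$E[(y_{i,t-2} - y_{i,t-3})(y_{i,t-1} - y_{i,t-2})] = 0 - E(\varepsilon^2_{i,t-2}) = -\sigma^2_\varepsilon.$$
Instrumental-variable estimators that are consistent when $N$ and/or $T \to \infty$:
$$\hat\rho = \frac{\sum_{i=1}^N\sum_{t=3}^T (y_{it} - y_{i,t-1})(y_{i,t-2} - y_{i,t-3})}{\sum_{i=1}^N\sum_{t=3}^T (y_{i,t-1} - y_{i,t-2})(y_{i,t-2} - y_{i,t-3})}$$
or
$$\hat\rho = \frac{\sum_{i=1}^N\sum_{t=3}^T (y_{it} - y_{i,t-1})\,y_{i,t-2}}{\sum_{i=1}^N\sum_{t=3}^T (y_{i,t-1} - y_{i,t-2})\,y_{i,t-2}}.$$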
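A small simulation can illustrate the contrast between the Within estimator and the first-difference IV estimator that uses the lagged level $y_{i,t-2}$ as instrument. Everything below is an assumption made for illustration: the data-generating process (standard normal effects and shocks, a burn-in so that the series start near their stationary path), the sample sizes, the seed, and the function names. Unlike the formula above, the sketch sums over every period for which $y_{i,t-2}$ is available, that is from $t = 2$.

```python
import numpy as np

def simulate_panel(N, T, rho, rng, burn_in=50):
    """Simulate y_it = rho * y_i,t-1 + alpha_i + eps_it; returns an (N, T+1) array."""
    alpha = rng.normal(size=N)
    y = alpha / (1.0 - rho)                  # start at the individual long-run mean
    path = []
    for t in range(burn_in + T + 1):
        y = rho * y + alpha + rng.normal(size=N)
        if t >= burn_in:
            path.append(y)
    return np.column_stack(path)             # columns y_i0, ..., y_iT

def iv_first_difference(y):
    """IV estimator of rho in first differences, instrument y_{i,t-2}."""
    dy = np.diff(y, axis=1)                  # dy[:, k] = y_{k+1} - y_k
    dep, lag, inst = dy[:, 1:], dy[:, :-1], y[:, :-2]
    return float((dep * inst).sum() / (lag * inst).sum())

def within_estimator(y):
    """Within (fixed-effects) estimator of rho."""
    cur, lag = y[:, 1:], y[:, :-1]
    cur_d = cur - cur.mean(axis=1, keepdims=True)
    lag_d = lag - lag.mean(axis=1, keepdims=True)
    return float((cur_d * lag_d).sum() / (lag_d ** 2).sum())

rng = np.random.default_rng(0)
y = simulate_panel(N=20000, T=6, rho=0.5, rng=rng)
rho_iv = iv_first_difference(y)              # should be close to 0.5
rho_within = within_estimator(y)             # should be well below 0.5
```

With $T = 6$ the Within estimate is expected to sit far below the true value, in line with Table 5.1, while the IV estimate should be close to it in a sample this large; the IV estimator is noticeably noisier, so smaller samples can give less clear-cut results.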
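Conclusion: With the Within transformation on a dynamic model, even though $\alpha_i$ is eliminated, endogeneity bias occurs for fixed $T$ because the $Q$ operator used introduces errors $\varepsilon_{is}$ correlated by construction with the current explanatory variable.
Consider now a more general model:
$$y_{it} = \rho y_{i,t-1} + x_{it}\beta + z_i\gamma + \alpha_i + \varepsilon_{it}.$$
IV estimation proceeds as follows.
Step 1. First-difference the model, to get
$$(y_{it} - y_{i,t-1}) = \rho(y_{i,t-1} - y_{i,t-2}) + (x_{it} - x_{i,t-1})\beta + \varepsilon_{it} - \varepsilon_{i,t-1}.$$
Use $y_{i,t-2}$ or $(y_{i,t-2} - y_{i,t-3})$ as instrument for $(y_{i,t-1} - y_{i,t-2})$ and estimate $\rho, \beta$ with the IV procedure.
Step 2. Substitute $\hat\rho$ and $\hat\beta$ in the Between equation:
$$\bar{y}_i - \hat\rho\,\bar{y}_{i,-1} - \bar{x}_i\hat\beta = z_i\gamma + \alpha_i + \bar\varepsilon_i, \quad i = 1, 2, \dots, N,$$
and estimate $\gamma$ by OLS.
Step 3. Estimate the variance components:
$$\hat\sigma^2_\varepsilon = \frac{1}{2N(T - 1)}\sum_{i=1}^N\sum_{t=1}^T\left[(y_{it} - y_{i,t-1}) - \hat\rho(y_{i,t-1} - y_{i,t-2}) - (x_{it} - x_{i,t-1})\hat\beta\right]^2,$$
$$\hat\sigma^2_\alpha = \frac{1}{N}\sum_{i=1}^N\left[\bar{y}_i - \hat\rho\,\bar{y}_{i,-1} - z_i\hat\gamma - \bar{x}_i\hat\beta\right]^2 - \frac{1}{T}\hat\sigma^2_\varepsilon.$$
Consistency of the estimator:
- The IV estimators of $\rho$, $\beta$ and $\sigma^2_\varepsilon$ are consistent when $N$ or $T \to \infty$;
- The IV estimators of $\gamma$ and $\sigma^2_\alpha$ rely on the $N$ cross-section observations of the Between equation: they are consistent only when $N \to \infty$, and inconsistent when $N$ is fixed and $T \to \infty$.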
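5.3 The Random-effects model
We now treat $\alpha_i$ as a random variable, in addition to $\varepsilon_{it}$. As for static models, $\alpha_i$ is not eliminated, but it is correlated by construction with the lagged dependent variable $y_{i,t-1}$.
5.3.1 Bias in the ML estimator
In the simple model $y_{it} = \rho y_{i,t-1} + \alpha_i + \varepsilon_{it}$, the MLE is equivalent to the OLS estimator:
$$\hat\rho = \frac{\sum_{i=1}^N\sum_{t=1}^T y_{it}y_{i,t-1}}{\sum_{i=1}^N\sum_{t=1}^T y^2_{i,t-1}} = \rho + \frac{\sum_{i=1}^N\sum_{t=1}^T (\alpha_i + \varepsilon_{it})y_{i,t-1}}{\sum_{i=1}^N\sum_{t=1}^T y^2_{i,t-1}}.$$
We show that
$$\text{plim}_{N \to \infty}\frac{1}{NT}\sum_{i=1}^N\sum_{t=1}^T (\alpha_i + \varepsilon_{it})y_{i,t-1} = \frac{1}{T}\frac{1 - \rho^T}{1 - \rho}Cov(y_{i0}, \alpha_i) + \frac{1}{T}\frac{\sigma^2_\alpha}{(1 - \rho)^2}\left[(T - 1) - T\rho + \rho^T\right],$$
and
$$\text{plim}_{N \to \infty}\frac{1}{NT}\sum_{i=1}^N\sum_{t=1}^T y^2_{i,t-1} = \frac{1 - \rho^{2T}}{T(1 - \rho^2)}\cdot\frac{\sum_i^N y^2_{i0}}{N} + \frac{\sigma^2_\alpha}{(1 - \rho)^2}\cdot\frac{1}{T}\left(T - 2\frac{1 - \rho^T}{1 - \rho} + \frac{1 - \rho^{2T}}{1 - \rho^2}\right)$$
$$+ \frac{2}{T(1 - \rho)}\left(\frac{1 - \rho^T}{1 - \rho} - \frac{1 - \rho^{2T}}{1 - \rho^2}\right)Cov(y_{i0}, \alpha_i) + \frac{\sigma^2_\varepsilon}{T(1 - \rho^2)^2}\left[(T - 1) - T\rho^2 + \rho^{2T}\right].$$
The bias depends on the behavior of the initial conditions $y_{i0}$ (constant or generated as $y_{it}$).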
or generated as yit).
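5.3.2 An equivalent representation
We consider a more general model
$$y_{it} = \rho y_{i,t-1} + x_{it}\beta + z_i\gamma + u_{it}, \quad u_{it} = \alpha_i + \varepsilon_{it},$$
with the following assumptions:
$$|\rho| < 1, \quad E(\alpha_i) = E(\varepsilon_{it}) = 0,$$
$$E(\alpha_i x_{it}) = 0, \quad E(\alpha_i z_i) = 0, \quad E(\alpha_i\varepsilon_{it}) = 0,$$
$$E(\alpha_i\alpha_j) = \sigma^2_\alpha \text{ if } i = j, \ 0 \text{ otherwise},$$
$$E(\varepsilon_{it}\varepsilon_{js}) = \sigma^2_\varepsilon \text{ if } i = j, t = s, \ 0 \text{ otherwise}.$$
We can also write
$$w_{it} = \rho w_{i,t-1} + x_{it}\beta + z_i\gamma + \varepsilon_{it},$$
$$y_{it} = w_{it} + \eta_i,$$
where $\eta_i = \alpha_i/(1 - \rho)$, $E\eta_i = 0$, $Var(\eta_i) = \sigma^2_\eta = \sigma^2_\alpha/(1 - \rho)^2$, and the dynamic process $\{w_{it}\}$ is independent from the individual effect $\eta_i$.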
5.3.3 The role of initial conditions
The two equivalent specifications of the model are:
(A) $y_{it} = \rho y_{i,t-1} + x_{it}\beta + z_i\gamma + \alpha_i + \varepsilon_{it}$;
(B) $w_{it} = \rho w_{i,t-1} + x_{it}\beta + z_i\gamma + \varepsilon_{it}$, $\ y_{it} = w_{it} + \eta_i$.
In model (A), $y_{it}$ is driven by unobserved characteristics $\alpha_i$, different across units, in addition to $x_{it}$ and $z_i$.
In model (B), the dynamic process $w_{it}$ is independent from individual effects $\eta_i$. Conditional on exogenous $x_{it}$ and $z_i$, the $w_{it}$ are driven by identical processes with i.i.d. shocks $\varepsilon_{it}$. But the observed value $y_{it}$ is shifted by the individual-specific effect $\eta_i$.
Possible interpretation: $w_{it}$ is a latent variable, $y_{it}$ is observed, and $\eta_i$ is a time-invariant measurement error.
The two processes are equivalent because $w_{it}$ is unobserved. But assumptions (or knowledge) on initial conditions may help to distinguish between both processes.
Different cases:
1/ $y_{i0}$ fixed;
2/ $y_{i0}$ random;
2.a/ $y_{i0}$ independent of $\alpha_i$, with $E(y_{i0}) = \mu_{y0}$ and $Var(y_{i0}) = \sigma^2_{y0}$;
2.b/ $y_{i0}$ correlated with $\alpha_i$, with $Cov(y_{i0}, \alpha_i) = \phi\sigma^2_{y0}$;
3/ $w_{i0}$ fixed;
4/ $w_{i0}$ random;
4.a/ $w_{i0}$ random with common mean $\mu_w$ and variance $\sigma^2_\varepsilon/(1 - \rho^2)$ (stationarity assumption);
4.b/ $w_{i0}$ random with common mean $\mu_w$ and arbitrary variance $\sigma^2_{w0}$;
4.c/ $w_{i0}$ random with mean $\theta_{i0}$ and variance $\sigma^2_\varepsilon/(1 - \rho^2)$ (stationarity assumption);
4.d/ $w_{i0}$ random with mean $\theta_{i0}$ and arbitrary variance $\sigma^2_{w0}$.
See Appendix 4 for a derivation of Maximum Likelihood estimators in each case.
5.3.4 Possible inconsistency of GLS
In cases 1/ and 2.a/ ($y_{i0}$ fixed, or random but independent of $\alpha_i$):
When $\sigma_\alpha^{2}$ and $\sigma_\varepsilon^{2}$ are known, maximizing the log-likelihood wrt. $\gamma$, $\beta$
and $\rho$ yields the GLS estimator. When $\sigma_\alpha^{2}$ and $\sigma_\varepsilon^{2}$ are unknown,
feasible GLS applies by using consistent estimates of these
variances in $V_T$.
Other cases:
Estimators for $\gamma$ and $\beta$ are consistent when $T \to \infty$, because GLS
converges to Within. When $N \to \infty$ and $T$ is fixed, GLS is
inconsistent in cases where initial values are correlated with individual
effects.
5.3.5 Example: The Balestra-Nerlove study
Seminal paper on Dynamic Panel Data models (1966). Household
demand for natural gas in the US, including a/ the demand due
to replacement of gas appliances, and b/ demand due to increases
in the stock of appliances.
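Table 5.2: Properties of the MLE for dynamic panel data models

Parameters                                                        N fixed, T → ∞    T fixed, N → ∞
Case 1: $y_{i0}$ fixed
  $\gamma, \beta, \sigma_\varepsilon^{2}$                         Consistent        Consistent
  $\rho, \sigma_\alpha^{2}$                                       Inconsistent      Consistent
Case 2.a: $y_{i0}$ random, $y_{i0}$ independent of $\alpha_i$
  $\gamma, \beta, \sigma_\varepsilon^{2}$                         Consistent        Consistent
  $\mu_{y_0}, \rho, \sigma_\alpha^{2}, \sigma_{y_0}^{2}$          Inconsistent      Consistent
Case 2.b: $y_{i0}$ correlated with $\alpha_i$
  $\gamma, \beta, \sigma_\varepsilon^{2}$                         Consistent        Consistent
  $\mu_{y_0}, \rho, \sigma_\alpha^{2}, \sigma_{y_0}^{2}, \phi$    Inconsistent      Consistent
Case 3: $w_{i0}$ fixed
  $\gamma, \beta, \sigma_\varepsilon^{2}$                         Consistent        Inconsistent
  $w_{i0}, \rho, \sigma_\eta^{2}$                                 Inconsistent      Inconsistent
Case 4.a: $w_{i0}$ random, mean $\mu_w$, variance $\sigma_\varepsilon^{2}/(1-\gamma^{2})$
  $\gamma, \beta, \sigma_\varepsilon^{2}$                         Consistent        Consistent
  $\mu_w, \rho, \sigma_\eta^{2}$                                  Inconsistent      Consistent
Case 4.b: $w_{i0}$ random, mean $\mu_w$, variance $\sigma_{w_0}^{2}$
  $\gamma, \beta, \sigma_\varepsilon^{2}$                         Consistent        Consistent
  $\sigma_{w_0}^{2}, \rho, \sigma_\eta^{2}, \mu_w$                Inconsistent      Consistent
Case 4.c: $w_{i0}$ random, mean $\theta_{i0}$, variance $\sigma_\varepsilon^{2}/(1-\gamma^{2})$
  $\gamma, \beta, \sigma_\varepsilon^{2}$                         Consistent        Inconsistent
  $\theta_{i0}, \rho, \sigma_\eta^{2}$                            Inconsistent      Inconsistent
Case 4.d: $w_{i0}$ random, mean $\theta_{i0}$, variance $\sigma_{w_0}^{2}$
  $\gamma, \beta, \sigma_\varepsilon^{2}$                         Consistent        Inconsistent
  $\theta_{i0}, \sigma_\eta^{2}, \sigma_{w_0}^{2}$                Inconsistent      Inconsistent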
Demand system:
$$G^{*}_{it} = G_{it} - (1-r)\,G_{i,t-1},$$
$$F^{*}_{it} = F_{it} - (1-r)\,F_{i,t-1},$$
$$F_{it} = a_0 + a_1 N_{it} + a_2 I_{it},$$
$$G^{*}_{it} = b_0 + b_1 P_{it} + b_2 F^{*}_{it},$$
where $G^{*}_{it}$ and $G_{it}$ are respectively the new demand and the actual
demand for gas at time $t$ from unit $i$, $r$ is the appliances depreciation
rate, $F^{*}_{it}$ and $F_{it}$ are respectively the new and actual demand
for all types of fuel, $N_{it}$ is total population, $I_{it}$ is per-head income,
and $P_{it}$ is the relative price of gas.
Solving the system, we have the equation to be estimated:
$$G_{it} = \beta_0 + \beta_1 P_{it} + \beta_2 \Delta N_{it} + \beta_3 N_{i,t-1} + \beta_4 \Delta I_{it} + \beta_5 I_{i,t-1} + \beta_6 G_{i,t-1},$$
where $\Delta N_{it} = N_{it} - N_{i,t-1}$, $\Delta I_{it} = I_{it} - I_{i,t-1}$, and $\beta_6 = 1 - r$.
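The substitution behind "solving the system" can be spelled out as follows. This is a sketch of the algebra, assuming (as the system is written) the same depreciation rate $r$ for gas appliances and for all fuels; the mapping from the structural coefficients $a_k$, $b_k$ to the $\beta$'s is derived here and is not stated explicitly in the notes.

```latex
\begin{align*}
F^{*}_{it} &= F_{it} - (1-r)F_{i,t-1} \\
           &= a_0 + a_1 N_{it} + a_2 I_{it} - (1-r)\,(a_0 + a_1 N_{i,t-1} + a_2 I_{i,t-1}) \\
           &= r a_0 + a_1 \Delta N_{it} + r a_1 N_{i,t-1} + a_2 \Delta I_{it} + r a_2 I_{i,t-1}, \\[4pt]
G_{it}     &= G^{*}_{it} + (1-r)\,G_{i,t-1} \\
           &= b_0 + b_1 P_{it} + b_2 F^{*}_{it} + (1-r)\,G_{i,t-1} \\
           &= \underbrace{(b_0 + r a_0 b_2)}_{\beta_0}
            + \underbrace{b_1}_{\beta_1} P_{it}
            + \underbrace{a_1 b_2}_{\beta_2} \Delta N_{it}
            + \underbrace{r a_1 b_2}_{\beta_3} N_{i,t-1}
            + \underbrace{a_2 b_2}_{\beta_4} \Delta I_{it}
            + \underbrace{r a_2 b_2}_{\beta_5} I_{i,t-1}
            + \underbrace{(1-r)}_{\beta_6} G_{i,t-1}.
\end{align*}
```

Under these assumptions the structural model implies $\beta_3 = r\beta_2$ and $\beta_5 = r\beta_4$, with $r = 1 - \beta_6$.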
Estimation procedures: OLS, Within (LSDV) and GLS (with the
assumption that initial conditions $G_{i0}$ are fixed, case 1/).
In accordance with the theory, the estimate of $\gamma$ (here, $\beta_6$) is biased upward for
OLS and downward for Within.
Table 5.3: Parameter estimates, Balestra-Nerlove model

Parameter                      OLS             Within          GLS
$\beta_0$ (Intercept)          -3.650          -               -4.091
                               (3.316)         -               (11.544)
$\beta_1$ ($P_{it}$)           -0.0451(*)      -0.2026         -0.0879(*)
                               (0.027)         (0.0532)        (0.0468)
$\beta_2$ ($\Delta N_{it}$)    0.0174(*)       -0.0135         -0.00122
                               (0.0093)        (0.0215)        (0.0190)
$\beta_3$ ($N_{i,t-1}$)        0.00111(**)     0.0327(**)      0.00360(**)
                               (0.00041)       (0.0046)        (0.00129)
$\beta_4$ ($\Delta I_{it}$)    0.0183(**)      0.0131          0.0170(**)
                               (0.0080)        (0.0084)        (0.0080)
$\beta_5$ ($I_{i,t-1}$)        0.00326         0.0044          0.00354
                               (0.00197)       (0.0101)        (0.00622)
$\beta_6$ ($G_{i,t-1}$)        1.010(**)       0.6799(**)      0.9546(**)
                               (0.014)         (0.0633)        (0.0372)

Notes. N = 36, T = 11. Standard errors are in parentheses. (*) and (**):
parameter significant at the 10% and 5% level respectively.
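The direction of the two biases (OLS upward, Within downward) can be reproduced in a small Monte Carlo experiment. The sketch below is an illustration under assumed values only: a pure autoregressive model without regressors, $\gamma = 0.5$, unit variances, and initial values drawn from the stationary distribution. It is not a replication of the Balestra-Nerlove data.

```python
import numpy as np

rng = np.random.default_rng(42)
N, T, gamma = 2000, 6, 0.5          # assumed design: large N, short T

alpha = rng.normal(size=N)          # individual effects, variance 1
y = np.zeros((N, T + 1))
# Initial values drawn from the stationary distribution given alpha_i,
# so y_i0 is correlated with the individual effect.
y[:, 0] = alpha / (1 - gamma) + rng.normal(size=N) / np.sqrt(1 - gamma**2)
for t in range(1, T + 1):
    y[:, t] = gamma * y[:, t - 1] + alpha + rng.normal(size=N)

y_lag, y_cur = y[:, :-1], y[:, 1:]

# Pooled OLS of y_it on y_i,t-1 (all variables have mean zero by construction).
gamma_ols = np.sum(y_lag * y_cur) / np.sum(y_lag**2)

# Within (LSDV): subtract individual means before running the regression.
y_lag_w = y_lag - y_lag.mean(axis=1, keepdims=True)
y_cur_w = y_cur - y_cur.mean(axis=1, keepdims=True)
gamma_within = np.sum(y_lag_w * y_cur_w) / np.sum(y_lag_w**2)

print("true gamma:", gamma)
print("OLS estimate:", gamma_ols)
print("Within estimate:", gamma_within)
```

With fixed $T$ the two estimates bracket the true value even for large $N$, which mirrors the ordering of the $\beta_6$ estimates in Table 5.3 (OLS above GLS above Within).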
Part II
Generalized Method of Moments estimation
Chapter 6
The GMM estimator
Generalized Method of Moments: an efficient way to obtain consistent
parameter estimates under mild conditions on the model.
Very popular in estimating structural economic models, as it requires
far fewer conditions on model disturbances than Maximum
Likelihood. Another important advantage: it is easy to obtain parameter
estimates that are robust to heteroskedasticity of unknown
form.
6.1 Moment conditions and the method of moments
6.1.1 Moment conditions
Consider a sample of size $N$, $\{x_i;\ i = 1, 2, \ldots, N\}$, from which one
wishes to estimate a $p \times 1$ vector $\theta$ whose true value is $\theta_0$.
Note: the notation above is very general; $x_i$ will typically include
dependent (endogenous) and explanatory (exogenous, endogenous)
variables.
Let $f(x_i, \theta)$ denote a $q \times 1$ function whose expectation $E[f(x_i, \theta)]$
exists and is finite. Moment conditions are then defined as
$$E[f(x_i, \theta_0)] = 0.$$
6.1.2 Example: Linear regression model
Consider the linear model
$$y_i = x_i'\beta_0 + u_i, \qquad i = 1, 2, \ldots, N,$$
where $\beta_0$ is the true value of the parameter vector $\beta$, $x_i$ is a $p \times 1$ vector
of explanatory variables, and $u_i$ is the error term.
A common assumption is $E(u_i \mid x_i) = 0 \Leftrightarrow E(y_i \mid x_i) = x_i'\beta_0$, and
from the Law of Iterated Expectations:
$$E(x_i u_i) = E[E(x_i u_i \mid x_i)] = E[x_i E(u_i \mid x_i)] = 0.$$
In terms of the definition above, $\theta = \beta$ and $f((x_i, y_i), \theta) = x_i(y_i - x_i'\beta)$.
Moment conditions are then
$$E(x_i u_i) = E[x_i(y_i - x_i'\beta_0)] = 0.$$
Note that here $p = q$: there are as many moment conditions as
parameters to estimate.
Suppose now we do not assume $E(u_i \mid x_i) = 0$ but instead that
$E(z_i u_i) = 0$. Vector $z_i$ is $q \times 1$ and would consist of instruments
such that
$$E(z_i u_i) = E[z_i(y_i - x_i'\beta_0)] = 0, \quad \text{or} \quad f[(x_i, y_i, z_i), \theta] = z_i(y_i - x_i'\beta).$$
There are $q$ moment equations (as many as there are instruments)
and $p$ parameters to estimate. Hence, the identification condition is
$q \geq p$.
6.1.3 Example: Gamma distribution
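A sample $\{x_i;\ i = 1, 2, \ldots, N\}$ is drawn from a Gamma distribution
$\Gamma(a, b)$ with true values $a_0$ and $b_0$. Relationship between
parameters and the first two moments of the distribution:
$$E(x_i) = \frac{a_0}{b_0}, \qquad E[x_i - E(x_i)]^{2} = \frac{a_0}{b_0^{2}}.$$
In our notation in the definition above: $\theta = (a, b)'$ and
$$f(x_i, \theta) = \left[\, x_i - \frac{a}{b},\ \ \left(x_i - \frac{a}{b}\right)^{2} - \frac{a}{b^{2}} \,\right]',$$
so that $E[f(x_i, \theta_0)] = 0$.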
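For the Gamma example the two sample moment conditions can be solved by hand: setting the sample mean equal to $a/b$ and the sample variance equal to $a/b^2$ gives $\hat b = \bar x / s^2$ and $\hat a = \bar x^2 / s^2$. The sketch below illustrates this on simulated data; the true values $a_0 = 3$, $b_0 = 2$ and the sample size are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
a0, b0 = 3.0, 2.0                        # assumed true shape a0 and rate b0
# NumPy parameterizes the Gamma by shape and scale, with scale = 1 / rate.
x = rng.gamma(shape=a0, scale=1.0 / b0, size=100_000)

m = x.mean()                             # sample first moment
v = x.var()                              # sample second central moment (divides by N)

# Solve the two sample moment conditions for (a, b).
b_hat = m / v
a_hat = m**2 / v

def f_N(a, b):
    """Sample counterpart of E[f(x_i, theta)] for the Gamma example."""
    return np.array([
        np.mean(x - a / b),
        np.mean((x - a / b) ** 2) - a / b**2,
    ])

print("a_hat, b_hat:", a_hat, b_hat)
print("sample moments at the estimate:", f_N(a_hat, b_hat))
```

The sample moment vector is zero at the estimate up to rounding error, and the estimates approach the true values as the sample grows.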
6.1.4 Method of moments estimation
How to estimate $\theta$ using the moment conditions given above? In the
case where $p = q$ (as many conditions as parameters), we could
solve $E[f(x_i, \theta_0)] = 0$ for $\theta_0$. But $E[f(\cdot)]$ is unknown, whereas
function values $f(x_i, \theta)$ can be computed $\forall \theta, \forall i$. Also, sample
moments of function $f(\cdot)$ can be computed:
$$f_N(\theta) = \frac{1}{N}\sum_{i=1}^{N} f(x_i, \theta).$$
Basic idea of method of moments estimation: if $E(f)$ is close to
$f_N$ (population moments close to empirical moments), then $\hat\theta_N$ is
a convenient estimate for $\theta_0$, where $f_N(\hat\theta_N) = 0$:
$$0 = E[f(x_i, \theta_0)] \approx f_N(\hat\theta_N) \ \Rightarrow\ \theta_0 \approx \hat\theta_N.$$
Two important conditions need to hold for method of moments
estimation to be valid: a) $E(f)$ is adequately approximated by
$f_N$; b) the moment conditions can be solved for $\hat\theta_N$.
Example: linear regression.
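Sample moment conditions are
$$\frac{1}{N}\sum_{i=1}^{N} x_i \hat u_i = \frac{1}{N}\sum_{i=1}^{N} x_i (y_i - x_i'\hat\beta_N) = 0,$$
and solving for $\hat\beta_N$ yields
$$\hat\beta_N = \left(\sum_{i=1}^{N} x_i x_i'\right)^{-1} \sum_{i=1}^{N} x_i y_i.$$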
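As a quick numerical illustration of the linear regression example, the sketch below solves the $p$ sample moment conditions directly and confirms that the solution coincides with least squares. The simulated design (three regressors, standard normal errors, the particular $\beta_0$) is an assumption made for the example.

```python
import numpy as np

rng = np.random.default_rng(2)
N, p = 500, 3
X = rng.normal(size=(N, p))              # row i holds x_i'
beta0 = np.array([1.0, -2.0, 0.5])       # assumed true parameter vector
y = X @ beta0 + rng.normal(size=N)

# Method of moments: solve sum_i x_i (y_i - x_i' beta) = 0 for beta.
Sxx = X.T @ X                            # sum_i x_i x_i'
Sxy = X.T @ y                            # sum_i x_i y_i
beta_hat = np.linalg.solve(Sxx, Sxy)

# Sample moment vector evaluated at the estimate.
f_N = X.T @ (y - X @ beta_hat) / N

# Least squares computed independently, for comparison.
beta_lstsq = np.linalg.lstsq(X, y, rcond=None)[0]

print("method of moments:", beta_hat)
print("least squares:    ", beta_lstsq)
```

Solving the linear system with `np.linalg.solve` avoids forming the inverse explicitly, which is the usual numerical practice even though the formula is written with an inverse.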
6.1.5 Example: Poisson counting model
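Poisson process: the dependent variable is discrete (number of events,
etc.). Restriction: the mean of the distribution is equal to its variance.
Assumption: dependent variables $y_1, y_2, \ldots, y_N$ are distributed
according to independent Poisson distributions, with parameters
$\lambda_1, \lambda_2, \ldots, \lambda_N$ respectively:
$$\mathrm{Prob}[y_i = r] = \frac{\exp(-\lambda_i)\,\lambda_i^{r}}{r!}.$$
We assume the $\lambda_i$'s depend on explanatory variables through a
log-linear relationship:
$$\log \lambda_i = \beta_0 + \sum_{j=1}^{p} \beta_j x_{ij}.$$
The likelihood of the Poisson model is
$$L = \prod_{i=1}^{N} \frac{\exp(-\lambda_i)\,\lambda_i^{y_i}}{y_i!} = \exp\left[-\sum_{i=1}^{N}\lambda_i + \beta_0\sum_{i=1}^{N} y_i + \sum_{j=1}^{p}\beta_j\sum_{i=1}^{N} x_{ij} y_i\right]\left(\prod_{i=1}^{N} y_i!\right)^{-1}.$$
Let us consider the following sample moments:
$$T_0 = \sum_{i=1}^{N} y_i, \qquad T_j = \sum_{i=1}^{N} x_{ij} y_i, \quad j = 1, \ldots, p,$$
and we use the fact that
$$\frac{\partial \lambda_i}{\partial \beta_0} = \lambda_i \quad \text{and} \quad \frac{\partial \lambda_i}{\partial \beta_j} = x_{ij}\lambda_i.$$
If we set the derivatives of $\log L$ wrt. $\beta_0$ and the $\beta_j$'s to 0, we get
$$T_0 = \sum_{i=1}^{N} \lambda_i, \qquad T_j = \sum_{i=1}^{N} x_{ij}\lambda_i, \quad j = 1, \ldots, p,$$
where $\lambda_i = \exp(\beta_0 + \sum_{j=1}^{p} \beta_j x_{ij})$. Hence, we match sample
moments $T_0$ and $T_j$ to theoretical moments $\sum_{i=1}^{N} \exp(\beta_0 + \sum_{j=1}^{p} \beta_j x_{ij})$
and $\sum_{i=1}^{N} x_{ij}\exp(\beta_0 + \sum_{j=1}^{p} \beta_j x_{ij})$ respectively.
We have $p + 1$ such matching conditions for $p + 1$ parameters.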
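Unlike the linear case, the $p + 1$ matching conditions of the Poisson model have no closed-form solution, but they can be solved iteratively. The sketch below uses Newton's method on simulated data; the design, the true coefficients and the choice of Newton's method as solver are assumptions made for illustration, not something prescribed in the notes.

```python
import numpy as np

rng = np.random.default_rng(3)
N, p = 1000, 2
x = rng.normal(scale=0.5, size=(N, p))
X = np.column_stack([np.ones(N), x])     # first column carries beta_0
beta_true = np.array([0.3, 0.5, -0.4])   # assumed true coefficients
y = rng.poisson(np.exp(X @ beta_true))

# Sample moments T_0, T_1, ..., T_p stacked in one vector.
T_stats = X.T @ y

# Newton iterations on the system  T_j - sum_i x_ij * lambda_i(beta) = 0.
beta = np.zeros(p + 1)
beta[0] = np.log(y.mean())               # start from the intercept-only solution
for _ in range(50):
    lam = np.exp(X @ beta)
    gap = T_stats - X.T @ lam            # sample minus theoretical moments
    jac = X.T @ (X * lam[:, None])       # derivative of X' lambda wrt. beta
    step = np.linalg.solve(jac, gap)
    beta = beta + step
    if np.max(np.abs(step)) < 1e-12:
        break

lam_hat = np.exp(X @ beta)
print("estimate:", beta)
print("remaining gap:", T_stats - X.T @ lam_hat)
```

Because the matching conditions are exactly the first-order conditions of the log-likelihood, the solution is also the Poisson maximum likelihood estimate.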
6.1.6 Comments
Note the difference between the Method of Moments philosophy
and the usual estimation criteria. For Maximum Likelihood and
Least Squares, we maximize (minimize) a criterion
$$\hat\theta = \arg\max_\theta\ \log L(\theta) \quad \text{(MLE)};$$
$$\hat\theta = \arg\min_\theta\ \frac{1}{N}\sum_{i=1}^{N} [y_i - f(x_i, \theta)]^{2} \quad \text{(LS)};$$
whereas here, we start from First-order Conditions and solve the
system for $\theta$.
Example: Instrumental Variable estimation
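We could consider minimizing the IV criterion wrt. $\beta$:
$$\hat\beta = \arg\min_\beta\ (Y - X\beta)'Z(Z'Z)^{-1}Z'(Y - X\beta),$$
where $Z$ is an $N \times q$ matrix of instruments, or start from the FOC (here with $q = p$):
$$\frac{1}{N}\sum_{i=1}^{N} z_i \hat u_i = \frac{1}{N}\sum_{i=1}^{N} z_i (y_i - x_i'\hat\beta) = 0$$
$$\Leftrightarrow\ \hat\beta = \left(\sum_{i=1}^{N} z_i x_i'\right)^{-1}\sum_{i=1}^{N} z_i y_i = (Z'X)^{-1}Z'Y.$$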
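The just-identified IV estimator obtained from the first-order conditions can be illustrated on simulated data with one endogenous regressor and one instrument. The data-generating process below (true coefficient 2, error correlated with the regressor through a common shock) is an assumption made for the example.

```python
import numpy as np

rng = np.random.default_rng(4)
N = 5000
z = rng.normal(size=N)                   # instrument, independent of the error
v = rng.normal(size=N)                   # shock shared by regressor and error
u = 0.8 * v + rng.normal(size=N)         # error term, correlated with x through v
x = z + v                                # endogenous regressor
y = 2.0 * x + u                          # assumed true coefficient: 2

X, Z = x[:, None], z[:, None]            # N x 1 matrices (p = q = 1)

# IV from the sample moment conditions Z'(Y - X beta) = 0.
beta_iv = np.linalg.solve(Z.T @ X, Z.T @ y)

# OLS for comparison: it uses the invalid moment condition E(x_i u_i) = 0.
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

# Sample moments and IV criterion evaluated at the IV estimate.
resid = y - X @ beta_iv
moments = Z.T @ resid / N
criterion = resid @ Z @ np.linalg.solve(Z.T @ Z, Z.T @ resid)

print("IV:", beta_iv[0], " OLS:", beta_ols[0])
print("sample moments at IV:", moments, " criterion:", criterion)
```

The IV estimate sets the sample moments, and therefore the IV criterion, to zero, while OLS is pulled away from the true value by the correlation between regressor and error.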
Equivalently, we could maximize the log likelihood wrt. $\theta$ or start
from the FOC
$$\frac{1}{N}\sum_{i=1}^{N} \left.\frac{\partial \log L_i(\theta)}{\partial \theta}\right|_{\theta = \hat\theta} = 0,$$
where $L_i$ is the likelihood contribution of observation $i$. These conditions
can be regarded here as a set of sample moment conditions.
Problems that remain to be solved:
- Ensure that we can replace population moments by sample moments, for the Method of Moments to work.
- What if the system of moment conditions is overidentified (more conditions than parameters)?
- How to be sure our moment conditions are valid (e.g., valid choice of instruments)?
6.2 The Generalized Method of Moments (GMM)
6.2.1 Introduction
As the name indicates, GMM is an extension of the Method of
Moments to the case where parameters are overidentified by moment
conditions. Equations $E[f(x_i, \theta_0)] = 0$ represent $q$ conditions for $p$
unknown parameters with $q > p$, therefore we cannot in general find a vector $\hat\theta_N$
satisfying $f_N(\hat\theta_N) = 0$.
But we can look for the $\theta$ that makes $f_N(\theta)$ as close to 0 as possible,
by defining
$$\hat\theta_N = \arg\min_\theta\ Q_N(\theta) = f_N(\theta)' A_N f_N(\theta),$$
where $A_N$ is a positive definite weighting matrix of order $O(1)$.
Important note: for the just-identified case, $Q_N(\hat\theta_N) = 0$ because
$f_N(\hat\theta_N) = 0$, but in the over-identified case, $Q_N(\hat\theta_N) > 0$.
This fact is important for model checking (we will come to this
point later in the course).
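The contrast between the just-identified and over-identified cases can be seen in a linear model, where the GMM criterion has a closed-form minimizer. In the sketch below, the helper `gmm_linear`, the simulated data and the weighting matrix $A_N = (Z'Z/N)^{-1}$ are all assumptions made for illustration; the notes leave $A_N$ unspecified at this stage.

```python
import numpy as np

rng = np.random.default_rng(5)
N = 2000
Z = rng.normal(size=(N, 3))                       # q = 3 instruments
v = rng.normal(size=N)
x = Z @ np.array([1.0, 0.5, -0.5]) + v            # one endogenous regressor (p = 1)
u = 0.5 * v + rng.normal(size=N)
y = 1.5 * x + u                                   # assumed true coefficient: 1.5
X = x[:, None]

def gmm_linear(X, Z, y, A):
    """Hypothetical helper: minimize f_N' A f_N with f_N(beta) = Z'(y - X beta) / N.

    For a linear model the minimizer solves (ZX' A ZX) beta = ZX' A Zy.
    Returns the estimate and the minimized criterion value.
    """
    n = len(y)
    ZX, Zy = Z.T @ X / n, Z.T @ y / n
    beta = np.linalg.solve(ZX.T @ A @ ZX, ZX.T @ A @ Zy)
    f = Zy - ZX @ beta
    return beta, float(f @ A @ f)

# Over-identified case: three moment conditions, one parameter.
A_over = np.linalg.inv(Z.T @ Z / N)               # one possible weighting matrix
beta_over, Q_over = gmm_linear(X, Z, y, A_over)

# Just-identified case: keep a single instrument, so q = p = 1.
Z1 = Z[:, :1]
beta_just, Q_just = gmm_linear(X, Z1, y, np.linalg.inv(Z1.T @ Z1 / N))

print("over-identified: beta =", beta_over[0], " Q_N =", Q_over)
print("just-identified: beta =", beta_just[0], " Q_N =", Q_just)
```

With a single instrument the criterion is zero up to rounding error, whereas with three instruments the three sample moments cannot all be set to zero and the minimized criterion stays strictly positive.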
6.2.2 Example: Just-identied IV model
Consider $Y = X\beta + u$ with condition $E(W'u) = 0$ ($W$ are instruments), and
$\mathrm{rank}(W'X) = p$. Solving for $\beta$ we have $\hat\beta = (W'X)^{-1}(W'Y)$,
which we substitute in the IV criterion:
u()0P 0Wu() =
Y X(W 0X)1(W 0Y )
0W (W 0W )1W