UvA-DARE is a service provided by the library of the University of Amsterdam (http://dare.uva.nl)
UvA-DARE (Digital Academic Repository)
Essays in panel data modelling
Juodis, A.
Citation for published version (APA): Juodis, A. (2015). Essays in panel data modelling.
General rights: It is not permitted to download or to forward/distribute the text or part of it without the consent of the author(s) and/or copyright holder(s), other than for strictly personal, individual use, unless the work is under an open content license (like Creative Commons).
Disclaimer/Complaints regulations: If you believe that digital publication of certain material infringes any of your rights or (privacy) interests, please let the Library know, stating your reasons. In case of a legitimate complaint, the Library will make the material inaccessible and/or remove it from the website. Please Ask the Library: https://uba.uva.nl/en/contact, or a letter to: Library of the University of Amsterdam, Secretariat, Singel 425, 1012 WP Amsterdam, The Netherlands. You will be contacted as soon as possible.
Download date: 20 Jul 2020
Essays in Panel Data Modelling
Arturas Juodis
This thesis analyses the properties of estimation techniques for panel data models with additive and multiplicative error structures. First, it discusses the relative merits of maximum likelihood estimators in dynamic panel data models. Second, it provides an in-depth analysis of genuine and pseudo panel data models with unobserved interactive effects.
Arturas Juodis holds a Bachelor's degree in Economics from Vilnius University and an M.Phil. in Economics from the Tinbergen Institute. In December 2011, he joined the Amsterdam School of Economics at the University of Amsterdam as a PhD student. His research mainly focuses on various aspects of panel data analysis for micro- and macroeconomic applications.
Universiteit van Amsterdam
ESSAYS IN PANEL DATA MODELLING
ISBN 978 90 5170 684 0
Cover design: Crasborn Graphic Designers bno, Valkenburg a.d. Geul
This book is no. 633 of the Tinbergen Institute Research Series, established through
cooperation between Rozenberg Publishers and the Tinbergen Institute. A list of books
which already appeared in the series can be found in the back.
ESSAYS IN PANEL DATA MODELLING
ACADEMIC DISSERTATION
to obtain the degree of doctor
at the University of Amsterdam
on the authority of the Rector Magnificus
prof. dr. D.C. van den Boom
before a committee appointed by the Doctorate Board,
to be defended in public in the Agnietenkapel
on Thursday 3 December 2015, at 10:00
by
Arturas Juodis
born in Minsk, Belarus
Doctoral committee:
Supervisor: Prof. dr. H.P. Boswijk Universiteit van Amsterdam
Co-supervisor: dr. M.J.G. Bun Universiteit van Amsterdam
Other members: dr. S.A. Broda Universiteit van Amsterdam
Prof. dr. M.A. Carree Universiteit Maastricht
Prof. dr. J.F. Kiviet Universiteit van Amsterdam
Prof. dr. F.R. Kleibergen Universiteit van Amsterdam
dr. J.C.M. van Ophem Universiteit van Amsterdam
dr. R. Okui Kyoto University
Prof. dr. T.J. Wansbeek Rijksuniversiteit Groningen
Faculty: Economics and Business
Acknowledgements
It is difficult to overestimate the amount of help and encouragement I received from
my supervisors Peter Boswijk and Maurice Bun. Starting from the beginning of my
PhD they helped me to shape my research agenda and quite importantly allowed me
to deviate from the original topic as much as I wanted. I am especially grateful to
them for always keeping their door open, irrespective of whether I wanted to talk about
vacations, submissions, referees, conferences or visits, you name it.
I would like to express my gratitude to Simon Broda, Martin Carree, Jan Kiviet, Frank
Kleibergen, Hans van Ophem, Ryo Okui and Tom Wansbeek for agreeing to be in my
doctoral committee.
My four years at UvA would not have been as productive and pleasant without good colleagues:
Kees Jan van Garderen, Noud van Giersbergen, Jan Kiviet, Frank Kleibergen, Hans
van Ophem, Simon Broda, Andrei Lalu, Xiye Yang, Yang Liu, Rutger Poldermans,
Milan Pleus and Andrew Pua. I am especially thankful to Frank Kleibergen for his
help and encouragement during the job search. Thanks to the office next to the coffee
machine, I was always happy to have a chat with other fellow PhD students at UvA:
Tomasz, Rutger T., Swaphnil, Lucy, Lin, Oana, Julien, David, Christian, Moutaz, Hao
and Stephany. Finally, the excellent work of non-academic staff Robert, Kees and
Wilma is highly appreciated.
Admission to the TI MPhil programme was a crucial step towards my PhD. I am grateful
to the Admission Board of the Tinbergen Institute and the then DGS, prof. Adriaan
Soetevent, for opening the door of academia for me. The excellent work of non-academic
staff Judith, Ester and Arianne is appreciated and not forgotten. I have met many great
people at TI, whose friendship I still enjoy. Thanks Piotr, Erkki, Sait (aka Stilian),
Lukasz, Sandor, Violeta, Grega, Ona, Lerby and many others.
During my time as a PhD student I had the opportunity to visit Monash University
in Spring 2014 and Lund University in Spring 2015. My visit to Monash University
would not have been as pleasurable as it was without my always positive and encouraging
host, Vasilis Sarafidis. He made sure my stay in Melbourne was good both inside and
outside the university. I was also very happy to meet George Athanasopoulos, Ann
Maharaj, Param and Mervyn Silvapulle, Anastasios Panagiotelis, Xueyan Zhao and
Tingting Cheng at Monash. I am grateful to Joakim Westerlund for hosting me at
Lund University and especially for his positive attitude and willingness to discuss and
share research ideas. I am also grateful to Hande Karabıyık, Simon Reese, Emre Aylar
and Milda Norkute for making my stay easy and pleasurable.
I am also happy that my fellow Lithuanians were always around to enjoy (and are still
enjoying) the hospitality of Casa Arturo in Amstelveen: Simas, Egle G., Egle J., Vytis,
Zivile, Renata and Ieva.
There are many people back in Lithuania that I would also like to thank. VuSIF
people: Ricardas, Vytautas, Tomas and Linas. Cycling and running fanatics: Darius
and Marius. I am very grateful to some of the academic staff at Vilnius University,
namely, prof. Linas Cekanavicius, prof. Alfredas Rackauskas, dr. Dmitrij Celov and
dr. Vita Karpuskiene, who helped and inspired me to continue with graduate studies.
Finally, I would like to thank my dear second half Zina, my parents and my brother
for being supportive throughout the last 5 years. A PhD can be a stressful (but also
pleasant) endeavour, and Zina had to endure all the ups and downs through all the
years we have been living in the Netherlands. Lastly, I wish all my grandparents could
have lived to see this day; I know they would be proud of me.
My research project was financed by NWO (through the MaGW grant “Likelihood-
based inference in dynamic panel data models with endogenous covariates”), Univer-
sity of Amsterdam, Tinbergen Institute and C. Willems Stichting. I would also like
to thank seminar and conference participants at University of Amsterdam, Univer-
sity of Groningen, Utrecht University, Monash University, University of Melbourne,
Deakin University, Lund University, Tinbergen Institute, NESG (Amsterdam, Tilburg,
Maastricht), IAAE annual meetings (London and Thessaloniki), Panel Data Confer-
ence (Cambridge, London and Budapest), Panel Data Workshop in Amsterdam (2013
and 2015) and NY State Camp Econometrics, for their comments, suggestions and
discussions that greatly improved my papers.
August 2015, Amsterdam.
Contents
Acknowledgements
Contents
1 Introduction
1.1 Why panel data?
1.2 Likelihood-based estimation of dynamic panel data models
1.3 Panel data models with interactive effects
2 First Difference Transformation in Panel VAR models: Robustness, Estimation and Inference
2.1 Introduction
2.2 Setup and assumptions
2.2.1 The Model
2.2.2 Assumptions and definitions
2.3 OLS in first differences
2.3.1 With exogenous regressors
2.3.2 Without exogenous regressors
2.4 Transformed MLE
2.4.1 Likelihood function with imposed covariance-stationarity
2.4.2 Cross-sectional heterogeneity
2.4.3 Misspecification of the mean parameter
2.4.4 Identification and bimodality issues for three-wave panels
2.4.5 Time-series heteroscedasticity
2.5 Simulation study
2.5.1 Setup
2.5.2 Designs
2.5.3 Technical remarks
2.5.4 Results: Estimation
2.5.5 Results: Inference
2.6 Conclusions
2.A Proofs
2.A.1 Auxiliary results
2.A.2 Log-likelihood function
2.A.3 Score vector
2.A.4 Bimodality
2.B Iterative bias correction procedure
2.C Tables
3 On Maximum Likelihood Estimation of Dynamic Panel Data Models
3.1 Introduction
3.2 ML estimation for the panel AR(1) model
3.3 Multiple solutions and constrained estimation
3.3.1 Three-wave panel and the Transformed ML estimator
3.3.2 Further asymptotic results for T > 2 and TML
3.3.3 Constrained estimation
3.4 Extension to exogenous regressors
3.5 Simulation study
3.5.1 Results: Estimation
3.5.2 Results: Inference
3.6 Empirical illustration
3.6.1 ARX(1) model
3.6.2 AR(1) model
3.7 Conclusions
3.A Proofs
3.B Tables
3.C Figures
4 Fixed T Dynamic Panel Data Estimators with Multi-Factor Errors
4.1 Introduction
4.2 Theoretical setup
4.3 Estimators
4.3.1 Quasi-differenced (QD) GMM
4.3.2 Quasi-long-differenced (QLD) GMM
4.3.3 Factor IV (FIVU and FIVR)
4.3.4 Linearized QLD GMM
4.3.5 Projection GMM
4.3.6 Linear GMM
4.3.7 Projection Quasi ML Estimator
4.4 Some general remarks on the estimators
4.4.1 (Non-)Invariance to factor loadings
4.4.2 Unbalanced samples
4.4.3 Observed factors
4.5 Simulation study
4.5.1 Setup and designs
4.5.2 Results
4.6 Conclusions
4.A Starting values for non-linear estimators
4.B Tables
5 Pseudo Panel Data Models with Cohort Interactive Effects
5.1 Introduction
5.2 The Model
5.3 Cohort Interactive Effects
5.3.1 Inconsistency of the conventional Fixed Effects estimator
5.3.2 Assumptions and estimation
5.3.3 Unbalanced samples
5.3.4 Dynamic models
5.4 Testing, model selection and identification
5.4.1 Testing and model selection
5.4.2 Identification: Local and Weak
5.4.3 Identification: Global
5.5 Simulation study
5.5.1 Setup
5.5.2 Results: Estimation
5.5.3 Results: Testing
5.5.4 Results: Model selection
5.6 Empirical illustration: ENEMDU Dataset
5.6.1 The Dataset
5.6.2 Results
5.7 Conclusions
5.A Theoretical results
5.A.1 Proofs
5.A.2 Differentials
5.A.3 Sufficient conditions for FE estimator
5.A.4 The Hausman test for fixed effects
5.B The ENEMDU dataset
5.B.1 The linear-log specification
5.C Monte Carlo results
Bibliography
Summary in Dutch (Nederlandse samenvatting)
Chapter 1
Introduction
1.1 Why panel data?
Panel data are repeated observations on the same cross-sectional units, typically
individuals or firms (in microeconomic applications), observed over several time periods.
The use of panel data has become increasingly popular in empirical macroeconomic and
(especially) microeconomic studies, and there are several reasons behind this success
story.
The key advantage of using panel data is the possibility to perform statistical inference
on quantities that can have a causal interpretation, at the same time controlling for
unobservable cross-sectional and time-series characteristics (hence avoiding omitted
variable bias). One of the most common examples of such characteristics, the so-called
individual specific effect, is the effect of talent or skills in a model of workers’ hourly
earnings. In order to consistently estimate the coefficients in that model, researchers
need to control for heterogeneity in workers’ talents or skills. Unfortunately, non-
experimental data containing information on individual workers’ talents and skills are
scarce. Without such information, it is extremely difficult to control for talent using
cross-sectional data to obtain statistical results that can be interpreted as causal. In
contrast, when panel data are available, a variety of estimation methods can be used
to control for the unobservable individual effects.
A second major advantage of panel data is increased precision in estimation. This is
the result of an increase in the number of observations owing to pooling several time
series observations of data for each individual. The possibility to pool observations over
two dimensions (individuals and time) allows estimation of the common parameters of
interest even when one of the dimensions can be of limited size.
This flexibility of panel data modelling was also instrumental in ensuring that, together
with standard courses on cross-sectional and time-series modelling, panel data methods
became an important part of the econometrics curriculum around the world. Furthermore,
the specialized textbooks of Hsiao (2002), Arellano (2003b) and Baltagi (2013)
have become standard references in the literature, and may serve readers of this thesis as
a more general introduction to the field of panel data. The applicability and importance
of (dynamic) panel data models for empirical economists is illustrated
by the fact that the seminal paper of Arellano and Bond (1991), which is referred to
extensively in this thesis, has been one of the most cited papers in the econometrics
literature since the early 1990s.¹
Let us briefly explain what the prototypical panel data model looks like and what the
key ingredients are that drive the analysis of this type of model. Panel data sets
are obtained from T repeated measurements of individual-specific quantities
$w_{i,t} = (y_{i,t}, x_{i,t}')'$. Here $y_{i,t}$ is the dependent (explained) variable, while
$x_{i,t}$ is the vector of observed explanatory variables (regressors), which can be
time-varying but also time-invariant.
Two indices in the subscript emphasize that all observations are obtained by sampling from
two dimensions (cross-sectional and time). Here the index i denotes individual units
(e.g. firms, households, etc.) and the index t denotes time periods. The number of
individual units is usually denoted by N. Hence, if the panel is “balanced” (all
individuals are observed the same number of times, T), the total number of observations
is given by N × T.
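To make the notion of a balanced panel concrete, the following minimal sketch (Python with NumPy) reshapes long-format records $(i, t, w_{i,t})$ into an N × T array and rejects unbalanced input, in which some individual is missing in some period. The helper `to_balanced_array` and the toy data are hypothetical illustrations, not part of the thesis.

```python
import numpy as np

def to_balanced_array(ids, periods, values):
    """Hypothetical helper: reshape long-format records (i, t, value)
    into an N x T array, raising ValueError if the panel is unbalanced."""
    ids, periods = np.asarray(ids), np.asarray(periods)
    values = np.asarray(values, dtype=float)
    units, i_idx = np.unique(ids, return_inverse=True)      # N distinct individuals
    times, t_idx = np.unique(periods, return_inverse=True)  # T distinct periods
    n, t = len(units), len(times)
    if len(values) != n * t:
        raise ValueError(f"unbalanced panel: expected N*T = {n * t} records, got {len(values)}")
    out = np.full((n, t), np.nan)
    out[i_idx, t_idx] = values
    if np.isnan(out).any():  # duplicated (i, t) pairs leave other cells empty
        raise ValueError("unbalanced panel: some (i, t) cells are missing")
    return out

y = to_balanced_array(ids=[1, 1, 1, 2, 2, 2],
                      periods=[2001, 2002, 2003] * 2,
                      values=[0.5, 0.7, 0.6, 1.2, 1.1, 1.3])
print(y.shape)  # (2, 3): N = 2 individuals, T = 3 periods, N x T = 6 observations
```

Rows index individuals and columns index periods, which is the layout assumed in the other sketches in this chapter.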
The analysis of this thesis is limited to linear models, i.e.
$$y_{i,t} = x_{i,t}'\beta + u_{i,t}. \qquad (1.1)$$
The main emphasis of this thesis is on models where one of the explanatory variables
in $x_{i,t}$ is the lagged value $y_{i,t-1}$. Dynamic models are of interest in a wide range of
economic applications, including Euler equations for household consumption, adjust-
ment cost models for firms’ factor demands, and empirical models of economic growth.
¹ In “On the Remarkable Success of the ARELLANO-BOND Estimator” (AENORM, No. 77, pp. 15-20), prof. T.J. Wansbeek provides a detailed analysis of the success story of this paper.
Even when the coefficients of lagged dependent variables are not of direct interest, al-
lowing for dynamics in the underlying process may be crucial for recovering consistent
estimates of other parameters.
As mentioned above, one of the key advantages of panel data models is the possibility
to control for omitted individual- and time-specific variables without any need
to rely on external instruments. To do so, one has to specify the way these
omitted variables enter the error term $u_{i,t}$. It is usually assumed that the error term
$u_{i,t}$ can be well approximated by the following additive decomposition
$$u_{i,t} = \eta_i + \tau_t + \varepsilon_{i,t}, \qquad (1.2)$$
where $\varepsilon_{i,t}$ is the idiosyncratic component, $\eta_i$ is the individual-specific effect and $\tau_t$ is
the time effect. This is the standard additive error-component model, most commonly used and
studied in the literature.
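One reason the additive specification (1.2) is convenient is that a simple linear transformation eliminates both effects. The sketch below (Python with NumPy, simulated data; the function name `two_way_demean` is chosen for illustration) checks numerically that the two-way within transformation, which subtracts individual and time means and adds back the overall mean, removes $\eta_i$ and $\tau_t$ exactly, leaving only the transformed idiosyncratic component. This is a standard property of the transformation rather than a result of this thesis; in dynamic models it does not by itself deliver consistent estimates.

```python
import numpy as np

def two_way_demean(a):
    """Two-way within transformation of an N x T array:
    a_it - (mean over t) - (mean over i) + (overall mean)."""
    return a - a.mean(axis=1, keepdims=True) - a.mean(axis=0, keepdims=True) + a.mean()

rng = np.random.default_rng(0)
N, T = 500, 4
eta = rng.normal(size=(N, 1))   # individual-specific effects eta_i
tau = rng.normal(size=(1, T))   # time effects tau_t
eps = rng.normal(size=(N, T))   # idiosyncratic components eps_it
u = eta + tau + eps             # additive error components as in (1.2)

# eta_i and tau_t are annihilated, so only the transformed eps_it remains.
print(np.allclose(two_way_demean(u), two_way_demean(eps)))  # True
```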
The error-component structure in (1.2) has some limitations, as it imposes additivity
of the individual and time effects. One can relax the additivity assumption and instead
consider a more flexible additive-multiplicative decomposition of $u_{i,t}$. In that case one
usually assumes that
$$u_{i,t} = \lambda_i' f_t + \varepsilon_{i,t}, \qquad (1.3)$$
and refers to the L-dimensional vectors $\lambda_i$ and $f_t$ as individual-specific factor loadings
and time-varying factors, respectively. This specification is commonly referred to as
“interactive effects” or “multi-factor error terms”. Note that the additive structure is
a special case of the multiplicative one, since
$$\eta_i + \tau_t = \begin{pmatrix} \eta_i & 1 \end{pmatrix} \begin{pmatrix} 1 \\ \tau_t \end{pmatrix} = \lambda_i' f_t, \qquad \lambda_i = (\eta_i, 1)', \quad f_t = (1, \tau_t)'.$$
Hence, the multiplicative structure $\lambda_i' f_t$ nests the commonly used additive model.
However, this additional flexibility also has a price in terms of estimation, as discussed
later in this thesis.
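The nesting argument can be verified numerically. The sketch below (Python with NumPy, simulated effects) builds the N × T matrix of additive effects $\eta_i + \tau_t$ and the same matrix written as $\lambda_i' f_t$ with $\lambda_i = (\eta_i, 1)'$ and $f_t = (1, \tau_t)'$. The two coincide, and the resulting matrix has rank 2, i.e. it is an interactive-effects structure with L = 2 factors.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 6, 5
eta = rng.normal(size=N)  # individual effects eta_i
tau = rng.normal(size=T)  # time effects tau_t

additive = eta[:, None] + tau[None, :]       # N x T matrix of eta_i + tau_t
Lam = np.column_stack([eta, np.ones(N)])     # rows are lambda_i' = (eta_i, 1)
F = np.column_stack([np.ones(T), tau])       # rows are f_t' = (1, tau_t)
interactive = Lam @ F.T                      # N x T matrix of lambda_i' f_t

print(np.allclose(additive, interactive))    # True
print(np.linalg.matrix_rank(interactive))    # 2: two factors (L = 2) suffice
```

The converse does not hold: a generic rank-2 matrix $\Lambda F'$ need not have one loading and one factor fixed at 1, which is the extra flexibility that (1.3) buys.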
This thesis consists of four independent chapters. The first two of these (Chapters 2
and 3) are devoted to the analysis of consistent likelihood-based estimators for univariate
and multivariate panel data models. In these two chapters we maintain the additive
error-component structure in (1.2).
The remaining two chapters, on the other hand, relax the additivity assumption and
consider the more flexible specification (1.3). That part of the thesis contributes to the
growing literature on panel data models with “interactive-effects”.
In all chapters it is assumed that the number of cross-sectional units (individuals, firms,
regions) is large (later referred to as “large N”), while the number of time-series obser-
vations is limited and can be as small as three (“fixed T”). It is therefore common to
consider the semi-asymptotic behavior of estimators (and corresponding test statistics)
by keeping T fixed and assuming only N to be large. Asymptotic approximations of
this type are mostly applicable to micro-econometric panels, where it is very costly or
impossible to track individuals (or firms) over longer time spans. Furthermore, in this
thesis we assume that the slope parameter vector β is the same for all individuals and
constant over time. Although quite restrictive, this assumption is commonly imposed
when analysing micro-econometric panels. The analysis with individual-specific slopes
is extremely challenging for panels with a fixed time-series dimension, and especially so
for dynamic panels.
1.2 Likelihood-based estimation of dynamic panel data models
A central theme in linear dynamic panel data analysis is the fact that the Fixed Effects
(FE) estimator is inconsistent for fixed T and large N. This inconsistency is referred
to as the Nickell (1981) bias, and is an example of the incidental parameters problem,
see Neyman and Scott (1948). It has therefore become common practice to estimate
the parameters of dynamic panel data models by the Generalized Method of Moments
(GMM), see Arellano and Bond (1991) and Blundell and Bond (1998). A main reason
for using GMM is that it provides asymptotically efficient inference while exploiting a
minimal set of statistical assumptions. GMM inference is not without its own problems,
however, see e.g. Bun and Kiviet (2006) and Kiviet, Pleus, and Poldermans (2015). This,
in turn, has led to interest in likelihood-based methods that implicitly or explicitly correct
for the incidental parameters problem.
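The Nickell bias can be reproduced in a small simulation. The sketch below (Python with NumPy) assumes a panel AR(1) model $y_{i,t} = \rho y_{i,t-1} + \eta_i + \varepsilon_{i,t}$ with standard normal $\eta_i$ and $\varepsilon_{i,t}$ and a stationary initial condition, and compares the FE estimate with the large-N limit of its bias derived by Nickell (1981),
$$\operatorname{plim}_{N\to\infty}(\hat\rho_{FE} - \rho) = -\frac{1+\rho}{T-1}\, a_T \left(1 - \frac{2\rho}{(1-\rho)(T-1)}\, a_T\right)^{-1}, \qquad a_T = 1 - \frac{1-\rho^T}{T(1-\rho)}.$$
As a fixed-T consistent benchmark it uses the simple Anderson-Hsiao instrumental-variables estimator in first differences, which instruments $\Delta y_{i,t-1}$ with $y_{i,t-2}$. This is a just-identified relative of the Arellano and Bond (1991) GMM estimator, swapped in here for brevity; it is not one of the estimators studied in this thesis. For ρ = 0.5 and T = 5 the formula gives a bias of about -0.33.

```python
import numpy as np

def simulate_panel_ar1(n, t, rho, rng):
    """y_it = rho*y_{i,t-1} + eta_i + eps_it for t = 1..t, with y_i0 drawn
    from the stationary distribution around eta_i / (1 - rho)."""
    eta = rng.normal(size=n)
    y = np.empty((n, t + 1))
    y[:, 0] = eta / (1 - rho) + rng.normal(size=n) / np.sqrt(1 - rho**2)
    for s in range(1, t + 1):
        y[:, s] = rho * y[:, s - 1] + eta + rng.normal(size=n)
    return y

def fe_estimator(y):
    """Fixed Effects (within) estimator: OLS on individually demeaned data."""
    y_lag, y_cur = y[:, :-1], y[:, 1:]
    x = y_lag - y_lag.mean(axis=1, keepdims=True)
    z = y_cur - y_cur.mean(axis=1, keepdims=True)
    return (x * z).sum() / (x * x).sum()

def anderson_hsiao(y):
    """Just-identified IV in first differences: instrument dy_{i,t-1} with y_{i,t-2}."""
    dy = np.diff(y, axis=1)   # dy[:, s] = y_{i,s+1} - y_{i,s}
    instr = y[:, :-2]         # y_{i,t-2}
    dy_lag = dy[:, :-1]       # dy_{i,t-1}
    dy_cur = dy[:, 1:]        # dy_{i,t}
    return (instr * dy_cur).sum() / (instr * dy_lag).sum()

def nickell_bias(rho, t):
    """Large-N limit of the FE bias with t periods used in the regression."""
    a = 1 - (1 - rho**t) / (t * (1 - rho))
    return -(1 + rho) / (t - 1) * a / (1 - 2 * rho / ((1 - rho) * (t - 1)) * a)

rng = np.random.default_rng(42)
rho, t = 0.5, 5
y = simulate_panel_ar1(20_000, t, rho, rng)
print("FE bias (simulated):", round(fe_estimator(y) - rho, 3))
print("FE bias (Nickell):  ", round(nickell_bias(rho, t), 3))
print("Anderson-Hsiao:     ", round(anderson_hsiao(y), 3))
```

The FE bias does not vanish as N grows, whereas the IV estimate settles near the true ρ; the price of the latter is a much larger variance, which is one motivation for the more efficient GMM and likelihood-based alternatives discussed in this thesis.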
Chapter 2 of this thesis, “First Difference Transformation in Panel VAR models: Ro-
bustness, Estimation and Inference”, based on Juodis (2014b), analyses the properties
of the likelihood-based estimator in first differences (also referred to as the Transformed
ML estimator) for the first-order Panel Vector Autoregressive model (or PVAR(1)).
Models of this type have recently received a substantial amount of attention from
econometricians, with recent contributions by Akashi and Kunitomo (2012), Hsiao
and Zhou (2015) and Hayakawa (2015), among others.
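The role of the first difference transformation in the PVAR(1) model can be illustrated with a short simulation. The sketch below (Python with NumPy) assumes a bivariate model $y_{i,t} = \eta_i + \Phi y_{i,t-1} + \varepsilon_{i,t}$ with a hypothetical matrix Φ, standard normal shocks ($\Sigma = I$) and an arbitrary initial condition. Differencing gives $\Delta y_{i,t} = \Phi \Delta y_{i,t-1} + \Delta\varepsilon_{i,t}$, in which the individual effects $\eta_i$ no longer appear. The transformed regressor is, however, correlated with the transformed error, since $E[\Delta y_{i,t-1}\Delta\varepsilon_{i,t}'] = -E[y_{i,t-1}\varepsilon_{i,t-1}'] = -\Sigma$; the sample analogue of this moment is printed as well. The sketch only illustrates the transformation and does not implement the Transformed ML estimator.

```python
import numpy as np

rng = np.random.default_rng(7)
N, T, m = 5000, 6, 2
Phi = np.array([[0.5, 0.1],
                [0.2, 0.3]])           # hypothetical autoregressive matrix
eta = rng.normal(size=(N, m))          # individual effects eta_i
eps = rng.normal(size=(N, T + 1, m))   # shocks eps_it with Sigma = I

y = np.empty((N, T + 1, m))
y[:, 0] = eta + eps[:, 0]              # arbitrary initial condition (illustration only)
for t in range(1, T + 1):
    # y_it = eta_i + Phi y_{i,t-1} + eps_it, written for row vectors
    y[:, t] = eta + y[:, t - 1] @ Phi.T + eps[:, t]

dy = np.diff(y, axis=1)                # dy[:, s] = y_{i,s+1} - y_{i,s}
deps = np.diff(eps, axis=1)            # deps[:, s] = eps_{i,s+1} - eps_{i,s}

# Differenced system dy_it = Phi dy_{i,t-1} + deps_it: eta_i has dropped out.
resid = dy[:, 1:] - dy[:, :-1] @ Phi.T
print(np.allclose(resid, deps[:, 1:]))  # True

# Sample analogue of E[dy_{i,t-1} deps_it'], which equals -Sigma = -I.
lagged = dy[:, :-1].reshape(-1, m)
shocks = deps[:, 1:].reshape(-1, m)
cross = lagged.T @ shocks / len(lagged)
print(np.round(cross, 2))
```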
As a starting point for our analysis in that chapter we take the setup of Binder, Hsiao,
and Pesaran (2005) and provide new results that shed some light on the distributional
properties of this maximum likelihood estimator. In particular, the focus is on situations
where the underlying assumptions used to derive the asymptotic properties of the
Transformed Maximum Likelihood estimator are violated. A simplified approach to obtain
the estimator is also provided. The chapter is also one of the few studies in the literature
that address the asymptotic bimodality of the log-likelihood function. To illustrate the
importance of the underlying assumptions and of bimodality,
an extensive Monte Carlo study is conducted. Results of the simulation study provide
some interesting insights about the finite-sample distribution of the estimator and can
be useful both for applied and for theoretical econometricians.
Chapter 3 of this thesis, “On Maximum Likelihood Estimation of Dynamic Panel Data
Models”, is based on Bun, Carree, and Juodis (2015). As a starting point of our
work, we take the bimodality analysis briefly discussed in the previous chapter. This
chapter provides an in-depth discussion of the bimodality and negative variance issues
for likelihood-based estimators in the univariate Panel AR(1) model with and without
additional exogenous variables. We concentrate our attention on the univariate AR(1)
model, leaving the more challenging multivariate setup of the previous chapter for
future research.
In this chapter we show that the first-order conditions for the likelihood estimators are
cubic in the autoregressive parameter. This means that the log-likelihood function
in finite samples can be either unimodal or bimodal, for any value of T. Furthermore, we
find that confidence intervals based on commonly used test statistics can have poor
coverage in finite samples, so that one might need to rely on other means of inference.
This problem can be especially relevant for empirical economists using these methods.
Using the dataset of Bun and Carree (2005), we show how these theoretical results can
influence empirical estimates of U.S. state-level unemployment dynamics.
1.3 Panel data models with interactive effects
In some situations the assumption that omitted unobserved variables can be well
approximated by an additive error-component structure can be too restrictive. For
example, one can consider a model of hourly wage rates as illustrated in Ahn et al.
(2001). It is reasonable to assume that the productivity of an individual’s unobserv-
able talent or skill can change over the business cycle. If so, the effect of unobservable
talent on hourly wages would vary over time because workers’ hourly wage rates de-
pend on their labor productivity. It is also likely that hourly wage rates depend on
multiple individual effects. For example, individual workers’ wages could be affected
by unexpected changes in macroeconomic variables. Panel data models that assume a
single time-invariant individual effect are inappropriate for the analysis of data with
such multiple time-varying individual effects.
In such cases econometricians usually assume that one can instead approximate the
effect of omitted variables with interactive (or multiplicative) fixed effects. Despite the
advantages of the more flexible error-component structure of the model, models with
interactive effects also introduce several practical challenges in terms of estimation.
The presence of multiplicative effects creates two major obstacles for applied
econometricians. First, their presence invalidates commonly used inferential procedures,
as the most widely used estimators are inconsistent in this case. Second, one has to rely
on non-linear estimation techniques that are less appealing than their linear counterparts
from a computational point of view. The finite-sample and asymptotic properties of
these non-linear estimators depend on high-level assumptions and non-trivial nuisance
parameters.
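Why interactive effects invalidate the usual transformations can be seen directly. The sketch below (Python with NumPy, simulated loadings and a single factor, L = 1) applies the one-way within transformation to $\lambda_i f_t$. When $f_t$ is constant, $\lambda_i f_t$ behaves like an individual effect and is removed; when $f_t$ varies over time, the term $\lambda_i (f_t - \bar f)$ survives. Even the two-way within transformation leaves $(\lambda_i - \bar\lambda)(f_t - \bar f)$ in the error, so estimators relying on these transformations remain contaminated by the omitted factor structure.

```python
import numpy as np

def within(a):
    """One-way within transformation: subtract each individual's time mean."""
    return a - a.mean(axis=1, keepdims=True)

def two_way(a):
    """Two-way within transformation: remove individual and time means."""
    return a - a.mean(axis=1, keepdims=True) - a.mean(axis=0, keepdims=True) + a.mean()

rng = np.random.default_rng(3)
N, T = 200, 5
lam = rng.normal(size=(N, 1))      # factor loadings lambda_i (one factor, L = 1)
f_const = np.ones((1, T))          # time-invariant factor: lambda_i * f_t = lambda_i
f_vary = rng.normal(size=(1, T))   # time-varying factor, e.g. a business-cycle index

# Constant factor: the term acts like an individual effect and is removed.
print(np.allclose(within(lam * f_const), 0.0))   # True
# Time-varying factor: lambda_i * (f_t - mean of f) survives.
print(np.allclose(within(lam * f_vary), 0.0))    # False
# Even two-way demeaning leaves (lambda_i - mean lambda) * (f_t - mean f).
resid = two_way(lam * f_vary)
print(np.allclose(resid, (lam - lam.mean()) * (f_vary - f_vary.mean())))  # True
```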
The remaining two chapters of this thesis are devoted to the analysis of models of
this type, and as such contribute to the growing literature by shedding some light
on the finite-sample and asymptotic properties of estimators introduced to deal with
interactive effects.
Chapter 4 of this thesis, “Fixed T Dynamic Panel Data Estimators with Multi-Factor
Errors”, is based on Juodis and Sarafidis (2014). In this paper we present a comprehensive
overview of estimators for dynamic panel data models with interactive effects.
The objective of this chapter is to serve as a useful guide for practitioners who wish
to apply methods that allow for multiplicative sources of unobserved heterogeneity in
their model. We pay particular attention to calculating the number of identifiable
parameters correctly, which is a requirement for asymptotically valid inferences and
consistent model selection procedures. This issue is often overlooked in the literature.
We investigate the finite-sample performance of the estimators under a number of
different designs using a large scale Monte Carlo study. In particular, we examine (i)
the effect of the presence of weakly exogenous covariates, (ii) the effect of changing the
magnitude of the correlation between the factor loadings of the dependent variable and
those of the covariates, (iii) the impact of the number of moment conditions on bias and
size for GMM estimators and tests, (iv) the impact of different levels of persistence in
the data, and finally (v) the effect of sample size. These are important considerations
with high empirical relevance. Nevertheless, to the best of our knowledge, they
remain largely unexplored in the literature.
The final chapter, “Pseudo Panel Data Models with Cohort Interactive Effects”, which
is based on Juodis (2015), takes a side step from the main topic of this thesis and
provides a detour to the analysis of pseudo panel data models. When genuine panel
data samples are unavailable, repeated cross-sectional surveys can be used to form
pseudo panels. I investigate the properties of linear pseudo-panel data models with a
fixed number of cohorts and time observations. The chapter extends the work of Inoue (2008)
and Verbeek (2008) to models with multiplicative fixed effects. Particular attention in that
chapter is devoted to identification issues of the proposed estimator
for potentially unbalanced samples.
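For readers unfamiliar with pseudo panels, the construction itself is simple: individuals in each repeated cross-section are grouped into cohorts (for instance by year of birth), and variables are averaged within each cohort-period cell. The sketch below (Python with NumPy; the function name `cohort_means` and the toy data are hypothetical) computes the C × T array of cell means together with the cell sizes, since small cells make the cohort means noisy measures of the underlying cohort population means.

```python
import numpy as np

def cohort_means(cohort, period, y):
    """Average y within each (cohort, period) cell of repeated cross-sections.

    Returns (means, counts): C x T arrays of cell means (NaN for empty cells)
    and cell sizes, with cohorts and periods in sorted order.
    """
    y = np.asarray(y, dtype=float)
    cohorts, c_idx = np.unique(np.asarray(cohort), return_inverse=True)
    periods, t_idx = np.unique(np.asarray(period), return_inverse=True)
    sums = np.zeros((len(cohorts), len(periods)))
    counts = np.zeros_like(sums)
    np.add.at(sums, (c_idx, t_idx), y)       # unbuffered accumulation per cell
    np.add.at(counts, (c_idx, t_idx), 1.0)
    with np.errstate(invalid="ignore"):
        means = sums / counts                # 0/0 -> NaN for empty cells
    return means, counts

# Toy repeated cross-sections: different individuals are sampled in each year.
cohort = [1960, 1960, 1970, 1970, 1960, 1970, 1970]   # e.g. decade of birth
period = [2001, 2001, 2001, 2001, 2002, 2002, 2002]   # survey year
y = [10.0, 12.0, 8.0, 6.0, 11.0, 7.0, 9.0]
means, counts = cohort_means(cohort, period, y)
print(means)   # rows: cohorts 1960, 1970; columns: years 2001, 2002
print(counts)
```

The resulting C × T array of cohort means is then treated as a (pseudo) panel with a fixed number of cohorts C and periods T.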
In addition to the theoretical results for the novel estimator, an extensive Monte Carlo
simulation study is conducted to assess the finite-sample properties. We mainly focus on
the robustness of the proposed estimator with respect to endogeneity, cohort interactive
effects, and weak identification. To the best of our knowledge, this chapter is the first
study in the literature that touches upon the issue of weak and global identification in
pseudo panels for a fixed number of cohorts and time-series observations.
Chapter 2
First Difference Transformation in Panel VAR models: Robustness, Estimation and Inference
2.1 Introduction
When the feedback and interdependency between dependent variables and covariates are of particular interest, multivariate dynamic panel data models can arise as a natural modeling strategy. For example, particular policy measures can be seen as a response to the past evolution of the target quantity, so that the reduced form of the two variables can be modeled by means of a Panel VAR (PVAR) model. In this paper we aim to provide a thorough analysis of the performance of fixed T consistent estimation techniques for the PVARX(1) model based on observations in first differences. We mainly focus on situations where the number of time periods is relatively small, while the number of cross-section units is large.
The estimation of univariate dynamic panel data models and the incidental parameter problem of the ML estimators have received a lot of attention over the last three decades, see Nickell (1981) and Kiviet (1995) among others. However, multivariate dynamic panel data models have not been investigated in comparable detail. The main exceptions are the papers by Holtz-Eakin et al. (1988), Hahn and Kuersteiner (2002), Binder, Hsiao, and Pesaran (2005, hereafter BHP) and Hayakawa (2015), which present theoretical results for linear PVAR models. For empirical examples of PVAR models for microeconomic
panels, see Arellano (2003b, pp.116-120), Ericsson and Irandoust (2004), Michaud and
van Soest (2008), Koutsomanoli-Filippaki and Mamatzakis (2009) among others.
Because of the inconsistency of the Fixed Effects (FE, ML) estimator, the estimation of Dynamic Panel Data (DPD) models has mainly been carried out within the GMM framework, using versions of the Arellano and Bond (1991) estimator and the estimators of Arellano and Bover (1995), Blundell and Bond (1998) and Ahn and Schmidt (1995; 1997). However, Monte Carlo studies have revealed that method of moments (MM) based estimators might be subject to substantial finite-sample biases, see Kiviet (1995), Alonso-Borrego and Arellano (1999) and BHP. These potentially unattractive finite-sample properties of the GMM estimators have led to recent interest in likelihood-based methods that are not subject to the incidental parameter bias. In this paper we analyze the ML estimator based on the likelihood function of the first differences of Hsiao et al. (2002), BHP and Kruiniger (2008) (hereafter TML).
Monte Carlo results presented in BHP suggest that the Transformed Maximum Likelihood (TML) based estimation procedure outperforms the GMM based methods in terms of both finite-sample bias and RMSE. However, their analysis is incomplete, particularly because they did not consider cases where the model is stable but the initial condition is not mean and/or covariance stationary. Furthermore, their Monte Carlo analysis was limited to situations where the error terms are homoscedastic both over time and in the cross-section dimension, leaving the relevant case of heteroscedastic error terms unaddressed. We address both issues in the Monte Carlo designs presented in Section 2.5.
We aim to contribute to the literature in several ways. First, we show that the multivariate analogue of the FDLS estimator of Han and Phillips (2010) is consistent only over a restricted parameter set. Second, we consider the properties of the TML estimator for models with cross-sectional heteroscedasticity and mean non-stationarity. Furthermore, we show that in the three-wave panel the log-likelihood function of the unrestricted TML estimator can violate the global identification condition. Finally, the extensive Monte Carlo study extends the finite-sample results available in the literature to cases with possibly non-stationary initial conditions and cross-sectional heteroscedasticity.
The paper is structured as follows. In Section 2.2 we present the model and the underlying assumptions. Theoretical results for the Panel First Difference estimator are presented in Section 2.3. In Section 2.4 we discuss the properties of the TML estimator under different assumptions regarding stationarity and heteroscedasticity. In
Section 2.5 we analyze the finite sample performance of estimators considered in the
paper by means of a Monte Carlo analysis. Finally, we conclude in Section 2.6.
Here we briefly discuss notation. Bold upper-case Greek letters are used to denote the original parameters, i.e. $\boldsymbol{\Phi}, \boldsymbol{\Sigma}, \boldsymbol{\Psi}$, while the bold lower-case Greek letters $\boldsymbol{\phi}, \boldsymbol{\sigma}, \boldsymbol{\psi}$ denote $\operatorname{vec}(\cdot)$ ($\operatorname{vech}(\cdot)$ for symmetric matrices) of the corresponding parameters; in the univariate setup the corresponding parameters are denoted by $\phi$, $\sigma^2$, $\psi^2$. Where necessary we use the subscript 0 to denote the true values of the aforementioned quantities. We use $\rho(A)$ to denote the spectral radius1 of a matrix $A \in \mathbb{R}^{n\times n}$. The commutation matrix $K_{a,b}$ is defined such that for any $[a\times b]$ matrix $A$, $\operatorname{vec}(A') = K_{a,b}\operatorname{vec}(A)$. The duplication matrix $D_m$ is defined such that for any symmetric $[m\times m]$ matrix $A$, $\operatorname{vec}A = D_m\operatorname{vech}A$. We define $\bar{y}_{i-} \equiv (1/T)\sum_{t=1}^{T}y_{i,t-1}$ and similarly $\bar{y}_i \equiv (1/T)\sum_{t=1}^{T}y_{i,t}$. The lag-operator matrix $L_T$ is defined such that for any $[T\times 1]$ vector $x = (x_1,\ldots,x_T)'$, $L_Tx = (0, x_1,\ldots,x_{T-1})'$. The $j$th column of the $[x\times x]$ identity matrix is denoted by $e_j$. $\tilde{x}$ is used to indicate variables after the Within Group transformation (for example $\tilde{y}_{i,t} = y_{i,t} - \bar{y}_i$), while $\breve{x}$ is used for variables after a “quasi-averaging” transformation.2 For further details regarding the notation used in this paper, see Abadir and Magnus (2002).
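The matrices $K_{a,b}$, $D_m$ and $L_T$ defined above can be constructed explicitly. The following NumPy sketch is our own illustration, not code from the paper; the function names (`vec`, `vech`, `commutation`, `duplication`, `lag_matrix`) are hypothetical and use the usual column-stacking $\operatorname{vec}$ convention.

```python
import numpy as np

def vec(A):
    """Stack the columns of A into a single vector (column-major order)."""
    return np.asarray(A).reshape(-1, order="F")

def vech(A):
    """Stack the on-and-below-diagonal elements of a symmetric A, column by column."""
    A = np.asarray(A)
    return np.concatenate([A[j:, j] for j in range(A.shape[0])])

def commutation(a, b):
    """K_{a,b}: the [ab x ab] permutation with vec(A') = K_{a,b} vec(A) for an [a x b] A."""
    K = np.zeros((a * b, a * b))
    for i in range(a):
        for j in range(b):
            # A[i, j] sits at position i + j*a in vec(A) and at j + i*b in vec(A')
            K[j + i * b, i + j * a] = 1.0
    return K

def duplication(m):
    """D_m: the [m^2 x m(m+1)/2] matrix with vec(A) = D_m vech(A) for a symmetric A."""
    D = np.zeros((m * m, m * (m + 1) // 2))
    col = 0
    for j in range(m):
        for i in range(j, m):
            D[i + j * m, col] = 1.0   # element (i, j)
            D[j + i * m, col] = 1.0   # its mirror image (j, i)
            col += 1
    return D

def lag_matrix(T):
    """L_T: maps (x_1, ..., x_T)' to (0, x_1, ..., x_{T-1})'."""
    return np.eye(T, k=-1)
```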
2.2 Setup and assumptions
2.2.1 The Model
In this paper we consider the following PVAR(1) specification
$$y_{i,t} = \eta_i + \Phi y_{i,t-1} + \varepsilon_{i,t}, \qquad i = 1,\ldots,N,\; t = 1,\ldots,T, \tag{2.1}$$
where yi,t is an [m× 1] vector, Φ is an [m×m] matrix of parameters to be estimated,
ηi is an [m × 1] vector of fixed effects and εi,t is an [m × 1] vector of innovations
independent across i, with zero mean and constant covariance matrix Σ.3 If we set
m = 1 the model reduces to the linear DPD model with AR(1) dynamics.
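1 $\rho(A) \equiv \max_i|\lambda_i|$, where the $\lambda_i$'s are the (possibly complex) eigenvalues of the matrix $A$.
2 $\breve{y}_i = \bar{y}_i - y_{i,0}$ and $\breve{y}_{i-} = \bar{y}_{i-} - y_{i,0}$.
3 Later in the paper we present a detailed analysis for the case where $\Sigma$ is $i$-specific.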
For a prototypical example of (2.1) consider the following bivariate model (see e.g. Bun
and Kiviet (2006), Akashi and Kunitomo (2012) and Hsiao and Zhou (2015))
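$$y_{i,t} = \eta^{y}_i + \gamma y_{i,t-1} + \beta x_{i,t} + u_{i,t},$$
$$x_{i,t} = \eta^{x}_i + \phi y_{i,t-1} + \rho x_{i,t-1} + v_{i,t},$$
where $E[u_{i,t}v_{i,t}] = \sigma_{uv}$. This system has the following reduced form
$$\begin{pmatrix} y_{i,t} \\ x_{i,t} \end{pmatrix} = \begin{pmatrix} \eta^{y}_i + \beta\eta^{x}_i \\ \eta^{x}_i \end{pmatrix} + \begin{pmatrix} \gamma + \beta\phi & \beta\rho \\ \phi & \rho \end{pmatrix}\begin{pmatrix} y_{i,t-1} \\ x_{i,t-1} \end{pmatrix} + \begin{pmatrix} u_{i,t} + \beta v_{i,t} \\ v_{i,t} \end{pmatrix}. \tag{2.2}$$
Depending on the parameter values, the process $\{x_{i,t}\}_{t=0}^{T}$ can be exogenous ($\phi = \sigma_{uv} = 0$), weakly exogenous ($\sigma_{uv} = 0$) or endogenous ($\sigma_{uv} \neq 0$).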
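As a quick check of (2.2), the sketch below (our own illustration; the function and argument names are hypothetical) maps the structural parameters $(\gamma, \beta, \phi, \rho)$ and the covariance matrix of $(u_{i,t}, v_{i,t})$ into the reduced-form matrix $\Phi$ and the covariance matrix $\Sigma$ of the reduced-form innovation $(u_{i,t} + \beta v_{i,t}, v_{i,t})'$.

```python
import numpy as np

def bivariate_reduced_form(gamma, beta, phi, rho, sigma_u2, sigma_v2, sigma_uv):
    """Return (Phi, Sigma) of the PVAR(1) reduced form (2.2) of the bivariate example."""
    Phi = np.array([[gamma + beta * phi, beta * rho],
                    [phi, rho]])
    # The reduced-form innovation equals A @ (u, v)' with A = [[1, beta], [0, 1]]
    A = np.array([[1.0, beta], [0.0, 1.0]])
    cov_uv = np.array([[sigma_u2, sigma_uv], [sigma_uv, sigma_v2]])
    return Phi, A @ cov_uv @ A.T
```

Substituting the $x_{i,t}$ equation into the $y_{i,t}$ equation for one arbitrary set of values reproduces the reduced form exactly, which is what the check below does.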
For many empirically relevant applications the PVAR(1) model specification might be too restrictive and incomplete. The model can then be extended by including strictly exogenous variables (the PVARX(1) model)
$$y_{i,t} = \eta_i + \Phi y_{i,t-1} + Bx_{i,t} + \varepsilon_{i,t}, \qquad i = 1,\ldots,N,\; t = 1,\ldots,T, \tag{2.3}$$
where $x_{i,t}$ is a $[k\times 1]$ vector of strictly exogenous regressors and $B$ is an $[m\times k]$ parameter matrix.4 Furthermore, some models with group-specific spatial dependence, as in e.g. Kripfganz (2015) and Verdier (2015), can also be formulated as a reduced-form PVARX(1).
2.2.2 Assumptions and definitions
First, we define several notions that are primarily used for the model without exogenous regressors.
Definition 2.1 (Effect stationary initial condition). The initial condition $y_{i,0}$ is said to be effect stationary if
$$E[y_{i,0}\mid\eta_i] = (I_m - \Phi_0)^{-1}\eta_i, \tag{2.4}$$
implying that the process $\{y_{i,t}\}_{t=0}^{T}$ generated by (2.1) is effect stationary, $E[y_{i,t}\mid\eta_i] = E[y_{i,0}\mid\eta_i]$, for $\rho(\Phi_0) < 1$.
4 Note that the model considered in Han and Phillips (2010) substantially differs from (2.3). They consider a model specification with lags of $x_{i,t}$ and restricted parameters. Their specification can be accommodated within (2.3) only if the so-called common factor restrictions on $B$ are imposed.
Note that effect non-stationarity does not imply that the process $\{y_{i,t}\}_{t=0}^{T}$ is mean non-stationary, i.e. $E[y_{i,t}] \neq E[y_{i,0}]$. The latter property of the process crucially depends on $E[\eta_i]$.
Definition 2.2 (Common dynamics). The individual heterogeneity ηi is said to satisfy
the “common dynamics” assumption if
$$\eta_i = (I_m - \Phi_0)\mu_i. \tag{2.5}$$
Under the common dynamics assumption, individual heterogeneity drops from the model in the pure unit root case $\Phi_0 = I_m$. Without this assumption the process $\{y_{i,t}\}_{t=0}^{T}$ has a discontinuity at $I_m$, as at this point the unrestricted process is a Multivariate Random Walk with drift. Combining the two notions results in $E[y_{i,0}\mid\mu_i] = \mu_i$; note that this expression is well defined for $\rho(\Phi_0) = 1$.
Definition 2.3 (Extensibility). The DGP satisfies extensibility condition if
$$\Phi_0\Sigma_0 = (\Phi_0\Sigma_0)'.$$
We call this condition “Extensibility” because in some cases it is sufficient to extend univariate conclusions to the general $m \geq 1$ case. One of the important implications of this condition is that
$$\sum_{t=0}^{\infty}\Phi_0^{t}\Sigma_0(\Phi_0^{t})' = (I_m - \Phi_0^{2})^{-1}\Sigma_0 = \Sigma_0\left(I_m - (\Phi_0^{2})'\right)^{-1}.$$
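This implication can be verified numerically. The sketch below (NumPy; the function names are hypothetical) computes $V = \sum_{t\geq 0}\Phi_0^{t}\Sigma_0(\Phi_0^{t})'$ from the identity $\operatorname{vec}V = (I_{m^2} - \Phi_0\otimes\Phi_0)^{-1}\operatorname{vec}\Sigma_0$, valid for $\rho(\Phi_0) < 1$, and compares it with the closed forms above. The pair $(\Phi_0, \Sigma_0)$ used in the check is an arbitrary illustration, constructed as $\Phi_0 = S\Sigma_0^{-1}$ with $S$ symmetric so that $\Phi_0\Sigma_0 = S$ is symmetric.

```python
import numpy as np

def stationary_covariance(Phi, Sigma):
    """Sum_{t>=0} Phi^t Sigma (Phi^t)', via vec(V) = (I - Phi kron Phi)^{-1} vec(Sigma)."""
    m = Phi.shape[0]
    vecV = np.linalg.solve(np.eye(m * m) - np.kron(Phi, Phi), Sigma.reshape(-1, order="F"))
    return vecV.reshape(m, m, order="F")

def satisfies_extensibility(Phi, Sigma, tol=1e-12):
    """Definition 2.3: Phi Sigma is a symmetric matrix."""
    PS = Phi @ Sigma
    return np.allclose(PS, PS.T, atol=tol)

# Illustrative pair: Phi0 = S Sigma0^{-1} with S symmetric, so Phi0 Sigma0 = S.
Sigma0 = np.diag([1.0, 2.0])
Phi0 = np.array([[0.3, 0.1], [0.1, 0.4]]) @ np.linalg.inv(Sigma0)
V = stationary_covariance(Phi0, Sigma0)
```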
We now summarize the assumptions regarding the DGP used in this paper, which are similar to those made by Hsiao et al. (2002) and Binder et al. (2005).
(A.1) The disturbances $\varepsilon_{i,t}$, $t \leq T$, are i.i.d. for all $i$ with finite fourth moments, with $E[\varepsilon_{i,t}] = 0_m$ and $E[\varepsilon_{i,t}\varepsilon_{i,s}'] = \mathbb{1}(s=t)\Sigma_0$, $\Sigma_0$ being a p.d. matrix.
(A.2) The initial deviation $u_{i,0} \equiv y_{i,0} - \mu_i$ is i.i.d. across cross-sectional units, with $E[u_{i,0}] = 0_m$, variance $\Psi_{u,0}$ and finite fourth moments.
(A.3) For all $i = 1,\ldots,N$ and $t = 1,\ldots,T$, the moment restrictions $E[u_{i,0}\varepsilon_{i,t}'] = O_m$ are satisfied.
(A.4) $N \to \infty$, but $T$ is fixed.
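(A.5) Regressors (if present) $x_{i,t}$ are strictly exogenous, $E[x_{i,s}\varepsilon_{i,t}'] = O_{k\times m}$ for all $t, s = 1,\ldots,T$, with finite fourth moments.
(A.6) The matrix $\Phi_0 \in \mathbb{R}^{m\times m}$ satisfies $\rho(\Phi_0) < 1$.
(A.6)* Denote by $\kappa$ a $[p\times 1]$ vector of unknown coefficients. $\kappa \in \Gamma$, where $\Gamma$ is a compact subset of $\mathbb{R}^{p}$, and $\kappa_0 \in \operatorname{interior}(\Gamma)$.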
We denote the set of Assumptions (A.1)-(A.6) by SA, and by SA* the set in which Assumption (A.6)* is satisfied in addition. The SA assumptions are used to establish results for the Panel FD estimators, while SA* are used to study the asymptotic properties of the TML estimator. Assumption (A.6) is needed to ensure that the Hessian of the TML estimator has full rank5 in the model without regressors. On the other hand, in Assumption (A.6)* we implicitly extend the parameter space for $\Phi$ to satisfy the usual compactness assumption, so that both consistency and asymptotic normality can be proved directly, assuming the model is globally identified over the parameter space. However, as we show in Section 2.4.4, the extended parameter space (beyond the stationary region) might violate the global identification condition. For now the dimension of $\kappa$ (“$p$”) is left unspecified; it depends on the particular parametrization used for estimation (with/without exogenous regressors, with/without mean term, etc.). In Section 2.4.2 we consider the situation where we allow for individual-specific $\Psi_{u,0}$ and $\Sigma_0$ matrices.
Note that Assumption (A.2) does not impose any restrictions on $y_{i,0}$ and $\mu_i$ directly, but instead on the initial deviation $u_{i,0}$ (which in principle can be a linear or non-linear function of $\mu_i$). However, it is important to note that all estimators in first differences remain invariant to the distributional characteristics of $\mu_i$ only if
$$y_{i,0} = \mu_i + \varepsilon_{i,0}$$
with the idiosyncratic component $u_{i,0} = \varepsilon_{i,0}$ independent of $\mu_i$. As emphasized in Hsiao et al. (2002) and Hayakawa and Pesaran (2012), in this case $\mu_i$ can be spatially correlated and/or depend on $\{\varepsilon_{i,t}\}_{t=1}^{T}$ without affecting the distribution of the estimator in first differences. Later in the paper we discuss situations in which this restriction might be violated and the consequences for the properties of the TML estimator.
5 See e.g. Bond et al. (2005) and Juodis (2014a) for proofs that the Hessian matrix of the TML estimator is singular at the unit root in Panel AR(1) and Panel VAR(1) models, respectively.
2.3 OLS in first differences
2.3.1 With exogenous regressors
The original model in levels contains individual effects, which we remove using the first-difference transformation. The model specification then becomes
$$\Delta y_{i,t} = \Phi\Delta y_{i,t-1} + B\Delta x_{i,t} + \Delta\varepsilon_{i,t}, \qquad i = 1,\ldots,N,\; t = 2,\ldots,T.$$
Before proceeding we define the following variables
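$$\Delta w_{i,t} \equiv \begin{pmatrix} \Delta y_{i,t-1} \\ \Delta x_{i,t} \end{pmatrix}, \qquad S_N \equiv \frac{1}{N}\sum_{i=1}^{N}\sum_{t=2}^{T}\Delta w_{i,t}\Delta w_{i,t}',$$
$$\Sigma_W \equiv \operatorname*{plim}_{N\to\infty}S_N, \qquad \Upsilon \equiv (\Phi, B).$$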
After pooling observations for all t and i, we define the pooled panel first difference
estimator (FDOLS) as
$$\hat{\Upsilon}' = S_N^{-1}\left(\frac{1}{N}\sum_{i=1}^{N}\sum_{t=2}^{T}\Delta w_{i,t}\Delta y_{i,t}'\right). \tag{2.6}$$
Similarly to the conventional Fixed Effects (FE) transformation, the FD transformation
introduces correlation between the explanatory variable ∆yi,t−1 and the modified error
term ∆εi,t. As a result this estimator is inconsistent,6 with the asymptotic bias derived
in Proposition 2.4.
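Proposition 2.4. Let $\{y_{i,t}\}_{t=1}^{T}$ be generated by (2.3) and let Assumptions SA be satisfied. Then
$$\operatorname*{plim}_{N\to\infty}(\hat{\Upsilon} - \Upsilon_0)' = -(T-1)\Sigma_W^{-1}\begin{pmatrix} \Sigma_0 \\ O_{k\times m} \end{pmatrix}. \tag{2.7}$$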
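To illustrate (2.6) and (2.7), the following sketch (NumPy; our own illustration with hypothetical names, seed and design: $m = k = 1$, $\phi_0 = 0.5$, $b_0 = 1$, standard normal regressors and errors) computes the pooled FDOLS estimator on simulated data and compares its deviation from the true values with the right-hand side of (2.7), using $S_N$ in place of $\Sigma_W$.

```python
import numpy as np

def fdols(y, x):
    """Pooled first-difference OLS (2.6).

    y: [N, T+1, m] array with y[:, 0] = y_{i,0}; x: [N, T+1, k] array of regressors.
    Returns (Upsilon_hat', S_N); Upsilon_hat' stacks Phi_hat' on top of B_hat'.
    """
    N = y.shape[0]
    dy = np.diff(y, axis=1)                                  # dy[:, s] = Delta y_{i,s+1}
    dx = np.diff(x, axis=1)
    w = np.concatenate([dy[:, :-1], dx[:, 1:]], axis=2)      # Delta w_{i,t}, t = 2..T
    S_N = np.einsum("ntp,ntq->pq", w, w) / N
    cross = np.einsum("ntp,ntq->pq", w, dy[:, 1:]) / N       # (1/N) sum Delta w_{i,t} Delta y_{i,t}'
    return np.linalg.solve(S_N, cross), S_N

def fdols_bias(S_W, Sigma0, T, k):
    """Right-hand side of (2.7) for a given [(m+k) x (m+k)] matrix S_W."""
    m = Sigma0.shape[0]
    return -(T - 1) * np.linalg.solve(S_W, np.vstack([Sigma0, np.zeros((k, m))]))

# Hypothetical design: eta_i ~ N(0, 1), x_{i,t} and errors i.i.d. N(0, 1).
rng = np.random.default_rng(0)
N, T, phi0, b0 = 20000, 4, 0.5, 1.0
eta = rng.standard_normal(N)
x = rng.standard_normal((N, T + 1, 1))
y = np.empty((N, T + 1, 1))
y[:, 0, 0] = eta / (1 - phi0) + rng.standard_normal(N)
for t in range(1, T + 1):
    y[:, t, 0] = eta + phi0 * y[:, t - 1, 0] + b0 * x[:, t, 0] + rng.standard_normal(N)

U_hat, S_N = fdols(y, x)
empirical_bias = U_hat - np.array([[phi0], [b0]])
predicted_bias = fdols_bias(S_N, np.eye(1), T, k=1)          # S_N used in place of Sigma_W
```

With $N$ large the two bias vectors should be close, and the autoregressive coefficient is biased substantially downwards.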
It is easy to see that FDOLS is numerically equal to the FE estimator with T = 2,
thus the asymptotic bias is identical as well. Furthermore, as long as T ≥ 2 the
bias correction approaches as in Kiviet (1995) and Bun and Carree (2005) are readily
available for this estimator (for more details please refer to Appendix 2.B). However, the
6 Irrespective of whether $T$ is fixed or $T \to \infty$.
consistency and asymptotic normality of any estimator based on an iterative procedure crucially depend on the existence of a unique fixed point. As a result, similarly to the estimator of Bun and Carree (2005), this estimator might fail to converge for some DGP specifications. These issues motivate us to look for other analytical bias-correction procedures that have desirable finite-sample properties irrespective of the DGP parameter values and the initialization $y_{i,0}$. Some special cases for the model without exogenous regressors are discussed in the next section.
2.3.2 Without exogenous regressors
Assume that $y_{i,0}$ is covariance stationary; as a consequence
$$\Sigma_W = (T-1)\left(\Sigma_0 + (I_m - \Phi_0)\left(\sum_{t=0}^{\infty}\Phi_0^{t}\Sigma_0(\Phi_0^{t})'\right)(I_m - \Phi_0)'\right).$$
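In the univariate case it is well known that covariance stationarity of $y_{i,0}$ is a sufficient condition to obtain an analytical bias-corrected estimator. However, it is no longer sufficient for $m > 1$ and general matrices $\Phi_0$ and $\Sigma_0$. One special case that yields an analytical bias-corrected estimator is obtained for $(\Phi_0, \Sigma_0)$ that satisfy the “extensibility” condition, so that
$$\Sigma_W = 2(T-1)\Sigma_0(I_m + \Phi_0')^{-1}.$$
Substituting this expression into (2.7) gives $\operatorname{plim}_{N\to\infty}\hat{\Phi}_{\Delta} = (\Phi_0 - I_m)/2$, where $\hat{\Phi}_{\Delta}$ denotes the FDOLS estimator of $\Phi$. The resulting fixed T consistent estimator for $\Phi$ is then given by
$$\hat{\Phi}_{FDLS} = 2\hat{\Phi}_{\Delta} + I_m.$$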
It can be similarly shown that this estimator is also fixed T consistent if Φ0 = Im
and the common dynamics assumption is satisfied. For m = 1, this estimator was
analyzed by Han and Phillips (2010), who labeled it the First Difference Least-Squares
(FDLS) estimator, and proved its consistency and asymptotic normality under various
assumptions. It should be noted that the same estimator (or the moment conditions
it is based on) has been studied earlier in the DPD literature, see Bond et al. (2005),
Ramalho (2005), Hayakawa (2007), Kruiniger (2007).
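The FDLS correction can be illustrated by simulation. In the sketch below (NumPy; function names, seed and design are hypothetical), $\Phi_0$ is symmetric and $\Sigma_0 = I_2$, so that the extensibility condition holds; the initial condition is covariance stationary and the common dynamics assumption is imposed. For large $N$ the FDOLS estimate should be close to $(\Phi_0 - I_m)/2$ and the FDLS estimate close to $\Phi_0$.

```python
import numpy as np

def simulate_pvar1(N, T, Phi, Sigma, rng):
    """Simulate (2.1) with common dynamics and a covariance-stationary initial condition."""
    m = Phi.shape[0]
    I = np.eye(m)
    vecV = np.linalg.solve(np.eye(m * m) - np.kron(Phi, Phi), Sigma.reshape(-1, order="F"))
    V = vecV.reshape(m, m, order="F")                 # stationary variance of u_{i,0}
    mu = rng.standard_normal((N, m))
    eta = mu @ (I - Phi).T                            # eta_i = (I - Phi) mu_i
    L_V, L_S = np.linalg.cholesky(V), np.linalg.cholesky(Sigma)
    y = np.empty((N, T + 1, m))
    y[:, 0] = mu + rng.standard_normal((N, m)) @ L_V.T
    for t in range(1, T + 1):
        y[:, t] = eta + y[:, t - 1] @ Phi.T + rng.standard_normal((N, m)) @ L_S.T
    return y

def fdols_pvar(y):
    """FDOLS without regressors: (sum dy_t dy_{t-1}')(sum dy_{t-1} dy_{t-1}')^{-1}."""
    dy = np.diff(y, axis=1)
    lag, cur = dy[:, :-1], dy[:, 1:]
    A = np.einsum("ntp,ntq->pq", cur, lag)
    S = np.einsum("ntp,ntq->pq", lag, lag)
    return A @ np.linalg.inv(S)

rng = np.random.default_rng(42)
Sigma0 = np.eye(2)
Phi0 = np.array([[0.5, 0.2], [0.2, 0.3]])   # symmetric Phi0 with Sigma0 = I: extensibility holds
y = simulate_pvar1(20000, 4, Phi0, Sigma0, rng)
Phi_delta = fdols_pvar(y)                   # inconsistent FDOLS estimate
Phi_fdls = 2 * Phi_delta + np.eye(2)        # FDLS correction
```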
Proposition 2.5 (Asymptotic Normality FDLS). Let the DGP for covariance stationary $y_{i,t}$ satisfy the extensibility condition together with the conditions of Proposition 2.4. Then
$$\sqrt{N}\left(\hat{\phi}_{FDLS} - \phi_0\right) \xrightarrow{d} \mathcal{N}(0_{m^2}, F), \tag{2.8}$$
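where
$$F \equiv (\Sigma_W^{-1}\otimes I_m)\,X\,(\Sigma_W^{-1}\otimes I_m), \qquad X \equiv \operatorname*{plim}_{N\to\infty}\frac{1}{N}\sum_{i=1}^{N}\operatorname{vec}O_i(\operatorname{vec}O_i)',$$
$$O_i \equiv \sum_{t=2}^{T}\left(2\Delta y_{i,t} + (I_m - \Phi_0)\Delta y_{i,t-1}\right)\Delta y_{i,t-1}'.$$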
The proof of Proposition 2.5 follows directly from the standard Lindeberg-Lévy CLT (see e.g. White (2000) for a general reference on asymptotic results).
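A plug-in version of the variance in (2.8) replaces $\Sigma_W$ by $S_N$ and $\Phi_0$ by $\hat{\Phi}_{FDLS}$ in $O_i$. The following sketch (NumPy; the function name, seed and design are hypothetical, and $m = 1$ is used for brevity, in which case extensibility holds trivially) returns the FDLS estimate, $\hat{F}$, and standard errors $\sqrt{\operatorname{diag}(\hat{F})/N}$ for $\operatorname{vec}\hat{\Phi}_{FDLS}$.

```python
import numpy as np

def fdls_with_se(y):
    """FDLS estimate of Phi and a plug-in estimate of F in (2.8); y has shape [N, T+1, m]."""
    N, _, m = y.shape
    I = np.eye(m)
    dy = np.diff(y, axis=1)
    lag, cur = dy[:, :-1], dy[:, 1:]                     # Delta y_{i,t-1}, Delta y_{i,t}, t = 2..T
    S_N = np.einsum("ntp,ntq->pq", lag, lag) / N         # sample analogue of Sigma_W
    Phi_delta = (np.einsum("ntp,ntq->pq", cur, lag) / N) @ np.linalg.inv(S_N)
    Phi_fdls = 2 * Phi_delta + I
    # O_i = sum_t (2 Delta y_t + (I - Phi) Delta y_{t-1}) Delta y_{t-1}', evaluated at Phi_fdls
    resid = 2 * cur + lag @ (I - Phi_fdls).T
    O = np.einsum("ntp,ntq->npq", resid, lag)            # [N, m, m]
    vecO = O.transpose(0, 2, 1).reshape(N, m * m)        # column-stacking vec of each O_i
    X_hat = vecO.T @ vecO / N
    A = np.kron(np.linalg.inv(S_N), I)
    F_hat = A @ X_hat @ A
    se = np.sqrt(np.diag(F_hat) / N)                     # standard errors for vec(Phi_fdls)
    return Phi_fdls, F_hat, se

# Hypothetical design: m = 1, phi0 = 0.5, sigma^2 = 1, covariance-stationary start.
rng = np.random.default_rng(7)
N, T, phi0 = 20000, 4, 0.5
mu = rng.standard_normal(N)
y = np.empty((N, T + 1, 1))
y[:, 0, 0] = mu + rng.standard_normal(N) / np.sqrt(1 - phi0**2)
for t in range(1, T + 1):
    y[:, t, 0] = (1 - phi0) * mu + phi0 * y[:, t - 1, 0] + rng.standard_normal(N)
Phi_fdls, F_hat, se = fdls_with_se(y)
```

Because $\hat{\Phi}_{FDLS}$ solves the sample moment condition exactly, the $O_i$ evaluated at the estimate sum to zero, so $\hat{X}$ is already centred.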
Note that if the extensibility condition is violated, the multivariate analogue of the FDLS estimator is not fixed T consistent. In that case moment conditions similar to those of Han and Phillips (2010) can be considered. However, for general $\Phi_0$ and $\Sigma_0$ matrices these moment conditions are non-linear in $\Phi$ and require numerical optimization. This makes the approach unattractive, because the closed-form expression is the main advantage of the FDLS estimator over the TML estimator that we describe in the next section.
2.4 Transformed MLE
Hsiao et al. (2002) and Kruiniger (2002)7 independently suggested building the quasi-likelihood for a transformation of the original data, such that after the transformation the likelihood function is free from incidental parameters. In particular, they analyzed the likelihood function of the first differences. BHP extended the univariate analysis of Hsiao et al. (2002) and Kruiniger (2002) to the multivariate case, allowing for possible cointegration between endogenous regressors.
In order to estimate (2.3) using the TML estimator of BHP we need to fully describe
the density function f(∆yi|∆Xi). The only thing that needs to be specified and
not imposed directly by (2.3) is E[∆yi,1|∆Xi], where ∆Xi is a [Tk × 1] vector of
stacked exogenous variables. The conditional mean assumption is actually stronger
than necessary for consistency and asymptotic normality of the TML estimator so we
follow the approach of Hsiao et al. (2002) and consider the following linear projection
for the first observation:
7Later appeared in Kruiniger (2008).
(TX.D) $\operatorname{Proj}[\Delta y_{i,1}\mid\Delta X_i] = \gamma + G_{\pi}\Delta X_i = B\Delta x_{i,1} + G\Delta X_i^{\dagger}$, with $\Delta X_i^{\dagger} = (1, \Delta X_i')'$,
with the projection error denoted by $v_{i,1}$. For the resulting TML estimator to be consistent and for standard inference procedures to be applicable, the population projection coefficients have to be identical for all cross-sectional units. This requirement can be violated if $u_{i,0}$ is an individual-specific function of $\mu_i$ (or $u_{i,0}$ is a function of $\mu_i$ and $\mu_i$ is deterministic).
Before proceeding we define
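$$\Delta E_i \equiv (I_{Tm} - L_T\otimes\Phi)\Delta Y_i - (I_T\otimes B)\Delta X_i - \operatorname{vec}\left(G\Delta X_i^{\dagger}e_1'\right),$$
where $\Delta Y_i = \operatorname{vec}(\Delta y_{i,1},\ldots,\Delta y_{i,T})$. Then assuming (conditional) joint normality of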
the error terms and the initial observation, the log-likelihood function (up to a constant)
is of the following form
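$$\ell(\kappa) = -\frac{N}{2}\log|\Sigma_{\Delta\tau}| - \frac{N}{2}\operatorname{tr}\left(\Sigma_{\Delta\tau}^{-1}\frac{1}{N}\sum_{i=1}^{N}\Delta E_i\Delta E_i'\right), \tag{2.9}$$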
with $\kappa = (\phi', \sigma', \psi', \operatorname{vec}B', \operatorname{vec}G')'$ and $\Psi = E[v_{i,1}v_{i,1}']$. The $\Sigma_{\Delta\tau}$ matrix has a block tridiagonal structure, with $-\Sigma$ on the lower and upper first off-diagonal blocks, and $2\Sigma$ on all but the first (1,1) diagonal blocks. The first (1,1) block is set to $\Psi$, which takes into account the fact that the variance of $v_{i,1}$ is treated as a free parameter.
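The block structure of $\Sigma_{\Delta\tau}$ can be written compactly as a Kronecker product whose first diagonal block is then overwritten by $\Psi$. The sketch below (NumPy; names hypothetical, not the paper's code) builds $\Sigma_{\Delta\tau}$ and evaluates (2.9) for given stacked residuals $\Delta E_i$, stored row-wise in time-major order. As a side check, $|\Sigma_{\Delta\tau}| = |\Sigma|^{T-1}|\Theta|$ with $\Theta = \Sigma + T(\Psi - \Sigma)$, which agrees with the log-determinant terms in Theorem 2.6 below.

```python
import numpy as np

def sigma_dtau(Sigma, Psi, T):
    """Block-tridiagonal [Tm x Tm] covariance of (Delta y_{i,1}', ..., Delta y_{i,T}')'."""
    m = Sigma.shape[0]
    tri = 2 * np.eye(T) - np.eye(T, k=1) - np.eye(T, k=-1)   # 2 on the diagonal, -1 next to it
    out = np.kron(tri, Sigma)
    out[:m, :m] = Psi                                       # free (1,1) block
    return out

def tml_loglik(dE, Sigma, Psi):
    """Log-likelihood (2.9), up to a constant; dE has shape [N, Tm] (one row per unit)."""
    N, Tm = dE.shape
    T = Tm // Sigma.shape[0]
    Om = sigma_dtau(Sigma, Psi, T)
    _, logdet = np.linalg.slogdet(Om)
    S = dE.T @ dE / N
    return -N / 2 * logdet - N / 2 * np.trace(np.linalg.solve(Om, S))
```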
Remark 2.1. As discussed in BHP, the log-likelihood function in (2.9) depends on a
fixed number of parameters, and satisfies the usual regularity conditions. Therefore
under SA* the maximizer of this (quasi) log-likelihood function is consistent with
limiting normal distribution as N → ∞. Consistency is derived assuming that the
log-likelihood function has a unique global maximum at the true value κ0. Note that
for this log-likelihood function consistency of the resulting estimator cannot be proved
based on zeros of the gradient vector, as in general more than one solution will solve
the First Order Conditions (FOC). Section 2.4.4 contains some details for AR(1) on
this issue, while the follow-up paper of Bun et al. (2015) provides a detailed analysis
for the ARX(1) model.
Remark 2.2. Note that the results for the TML estimator derived in this paper do not require the normality assumption. If the normality assumption is violated, $\ell(\kappa)$ is a (quasi) log-likelihood function. For brevity, we use the term log-likelihood rather than quasi log-likelihood even if the normality assumption is violated. In its general form, the asymptotic variance-covariance matrix of the estimator has a “sandwich”
form. This “sandwich” form allows for $\sqrt{N}$ consistent inference when the normality assumption is violated.
Next we show that conditioning on exogenous variables in first differences leads to a log-likelihood function that can be concentrated in $\phi$ only.
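Theorem 2.6. Let Assumptions SA* and (TX.D) be satisfied. Then the log-likelihood function of BHP for model (2.3) can be rewritten as
$$\begin{aligned} -\frac{2}{N}\ell(\kappa) = {}& (T-1)\log|\Sigma| + \log|\Theta| \\ & + \operatorname{tr}\left(\Sigma^{-1}\frac{1}{N}\sum_{i=1}^{N}\sum_{t=1}^{T}(\tilde{y}_{i,t} - \Phi\tilde{y}_{i,t-1} - B\tilde{x}_{i,t})(\tilde{y}_{i,t} - \Phi\tilde{y}_{i,t-1} - B\tilde{x}_{i,t})'\right) \\ & + \operatorname{tr}\left(\Theta^{-1}\frac{T}{N}\sum_{i=1}^{N}(\breve{y}_i - G\Delta X_i^{\dagger} - \Phi\breve{y}_{i-} - B\breve{x}_i)(\breve{y}_i - G\Delta X_i^{\dagger} - \Phi\breve{y}_{i-} - B\breve{x}_i)'\right), \end{aligned}$$
where $\kappa = (\phi', \sigma', \theta', \operatorname{vec}B', \operatorname{vec}G')'$, $\Theta \equiv \Sigma + T(\Psi - \Sigma)$ and $\breve{x}_i \equiv \bar{x}_i - x_{i,0}$.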
Proof. In Appendix 2.A.2.
The main conclusion of Theorem 2.6 is that in the case where Ψ is unrestricted, both the
score and the Hessian matrix of the log-likelihood function have closed form expressions,
that are easy to use. That implies that there is no need to use the involved algorithms
of BHP in order to compute the inverse and the determinant of the block tridiagonal
matrix Σ∆τ .
In order to simplify the notation, we introduce a new variable
$$\xi_i(\kappa) \equiv \breve{y}_i - G\Delta X_i^{\dagger} - \Phi\breve{y}_{i-} - B\breve{x}_i. \tag{2.10}$$
Using this definition,8 we can formulate the following result.
8 Some other variables used in this section are defined in Appendix 2.A, so we do not repeat them here.
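Proposition 2.7. Let Assumptions SA* be satisfied. Then the score vector associated with the log-likelihood function of Theorem 2.6 is given by9
$$\nabla(\kappa) = \begin{pmatrix} \operatorname{vec}\left(\Sigma^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T}(\tilde{y}_{i,t} - \Phi\tilde{y}_{i,t-1} - B\tilde{x}_{i,t})\tilde{y}_{i,t-1}' + T\Theta^{-1}\sum_{i=1}^{N}\xi_i(\kappa)\breve{y}_{i-}'\right) \\ D_m'\operatorname{vec}\left(\frac{N}{2}\Sigma^{-1}\left(Z_N(\kappa) - (T-1)\Sigma\right)\Sigma^{-1}\right) \\ D_m'\operatorname{vec}\left(\frac{N}{2}\Theta^{-1}\left(M_N(\kappa) - \Theta\right)\Theta^{-1}\right) \\ \operatorname{vec}\left(\Sigma^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T}(\tilde{y}_{i,t} - \Phi\tilde{y}_{i,t-1} - B\tilde{x}_{i,t})\tilde{x}_{i,t}' + T\Theta^{-1}\sum_{i=1}^{N}\xi_i(\kappa)\breve{x}_i'\right) \\ \operatorname{vec}\left(T\Theta^{-1}\sum_{i=1}^{N}\xi_i(\kappa)\Delta X_i^{\dagger\prime}\right) \end{pmatrix}. \tag{2.11}$$
Furthermore, the score vector satisfies the usual regularity condition
$$E[\nabla(\kappa_0)] = 0_p.$$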
Proof. In Appendix 2.A.3.
The dimension of the κ vector is substantial especially for moderate values of m and
k, hence from a numerical point of view, maximization with respect to all parameters
might not be appealing. Next we show that it is possible to construct the concentrated
log-likelihood function with respect to the φ parameter only.10 To simplify further
notation we define the following concentrated variables (assuming N > Tk)
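$$\breve{y}_i^{*} \equiv \breve{y}_i - \left(\sum_{j=1}^{N}\breve{y}_j\Delta X_j^{\dagger\prime}\right)\left(\sum_{j=1}^{N}\Delta X_j^{\dagger}\Delta X_j^{\dagger\prime}\right)^{-1}\Delta X_i^{\dagger},$$
$$\breve{y}_{i-}^{*} \equiv \breve{y}_{i-} - \left(\sum_{j=1}^{N}\breve{y}_{j-}\Delta X_j^{\dagger\prime}\right)\left(\sum_{j=1}^{N}\Delta X_j^{\dagger}\Delta X_j^{\dagger\prime}\right)^{-1}\Delta X_i^{\dagger},$$
$$\tilde{y}_{i,t}^{*} \equiv \tilde{y}_{i,t} - \left(\sum_{j=1}^{N}\sum_{s=1}^{T}\tilde{y}_{j,s}\tilde{x}_{j,s}'\right)\left(\sum_{j=1}^{N}\sum_{s=1}^{T}\tilde{x}_{j,s}\tilde{x}_{j,s}'\right)^{-1}\tilde{x}_{i,t},$$
$$\tilde{y}_{i,t-1}^{*} \equiv \tilde{y}_{i,t-1} - \left(\sum_{j=1}^{N}\sum_{s=1}^{T}\tilde{y}_{j,s-1}\tilde{x}_{j,s}'\right)\left(\sum_{j=1}^{N}\sum_{s=1}^{T}\tilde{x}_{j,s}\tilde{x}_{j,s}'\right)^{-1}\tilde{x}_{i,t}.$$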
9 See also similar derivations in Mutl (2009).
10 The key observation for this result is that, although the $B$ parameter enters both $\operatorname{tr}(\cdot)$ components, $\breve{x}_i$ belongs to the column space spanned by $\Delta X_i^{\dagger}$. Hence, after concentrating out $G$, $B$ is no longer present in the second term.
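Using the newly defined variables, the concentrated log-likelihood function for $\kappa^{c} = (\phi', \sigma', \theta')'$ is given by
$$\begin{aligned} \ell^{c}(\kappa^{c}) = {}& -\frac{N}{2}\left((T-1)\log|\Sigma| + \operatorname{tr}\left(\Sigma^{-1}\frac{1}{N}\sum_{i=1}^{N}\sum_{t=1}^{T}(\tilde{y}_{i,t}^{*} - \Phi\tilde{y}_{i,t-1}^{*})(\tilde{y}_{i,t}^{*} - \Phi\tilde{y}_{i,t-1}^{*})'\right)\right) \\ & - \frac{N}{2}\left(\log|\Theta| + \operatorname{tr}\left(\Theta^{-1}\frac{T}{N}\sum_{i=1}^{N}(\breve{y}_i^{*} - \Phi\breve{y}_{i-}^{*})(\breve{y}_i^{*} - \Phi\breve{y}_{i-}^{*})'\right)\right). \end{aligned}$$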
Continuing we can concentrate out both Σ and Θ to obtain the concentrated log-
likelihood function for the φ parameter vector only.
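$$\ell^{c}(\phi) = -\frac{N(T-1)}{2}\log\left|\frac{1}{N(T-1)}\sum_{i=1}^{N}\sum_{t=1}^{T}(\tilde{y}_{i,t}^{*} - \Phi\tilde{y}_{i,t-1}^{*})(\tilde{y}_{i,t}^{*} - \Phi\tilde{y}_{i,t-1}^{*})'\right| - \frac{N}{2}\log\left|\frac{T}{N}\sum_{i=1}^{N}(\breve{y}_i^{*} - \Phi\breve{y}_{i-}^{*})(\breve{y}_i^{*} - \Phi\breve{y}_{i-}^{*})'\right|.$$
However, as there is no closed-form solution for $\Phi$, numerical routines have to be used to maximize this concentrated likelihood function.11 The corresponding FOC can be derived from Proposition 2.7 for the unrestricted model.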
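For $m = 1$ and no exogenous regressors, $\Delta X_i^{\dagger} = 1$, so concentrating out $G$ amounts to demeaning $\breve{y}_i$ and $\breve{y}_{i-}$ across $i$, and both sums of squares in $\ell^{c}(\phi)$ are quadratics in $\phi$. The function can therefore be evaluated cheaply on a grid. The sketch below (NumPy; names, seed, grid bounds and design are hypothetical) uses a grid search rather than a derivative-based optimizer because, as noted in Remark 2.1, the first order conditions can have more than one solution.

```python
import numpy as np

def profile_loglik_grid(y, grid):
    """Concentrated log-likelihood l^c(phi) for m = 1 without regressors (Delta X^dagger = 1).

    y: [N, T+1] array with y[:, 0] = y_{i,0}; returns l^c evaluated at each point of grid.
    """
    N, T1 = y.shape
    T = T1 - 1
    ybar, ybar_lag = y[:, 1:].mean(axis=1), y[:, :-1].mean(axis=1)
    yt = y[:, 1:] - ybar[:, None]                 # within-group y_{i,t}
    yt_lag = y[:, :-1] - ybar_lag[:, None]        # within-group y_{i,t-1}
    yb, yb_lag = ybar - y[:, 0], ybar_lag - y[:, 0]           # quasi-averaged variables
    yb, yb_lag = yb - yb.mean(), yb_lag - yb_lag.mean()       # concentrate out the intercept G
    # Both sums of squares are a - 2 b phi + c phi^2
    a1, b1, c1 = (yt**2).sum(), (yt * yt_lag).sum(), (yt_lag**2).sum()
    a2, b2, c2 = (yb**2).sum(), (yb * yb_lag).sum(), (yb_lag**2).sum()
    ssr1 = a1 - 2 * b1 * grid + c1 * grid**2
    ssr2 = a2 - 2 * b2 * grid + c2 * grid**2
    return (-N * (T - 1) / 2 * np.log(ssr1 / (N * (T - 1)))
            - N / 2 * np.log(T * ssr2 / N))

# Hypothetical design: phi0 = 0.5, sigma^2 = 1, covariance-stationary start.
rng = np.random.default_rng(3)
N, T, phi0 = 100000, 5, 0.5
mu = rng.standard_normal(N)
y = np.empty((N, T + 1))
y[:, 0] = mu + rng.standard_normal(N) / np.sqrt(1 - phi0**2)
for t in range(1, T + 1):
    y[:, t] = (1 - phi0) * mu + phi0 * y[:, t - 1] + rng.standard_normal(N)
grid = np.linspace(-0.9, 1.2, 2101)
ll = profile_loglik_grid(y, grid)
phi_tml = grid[np.argmax(ll)]
```

In this design the population objective also has a shallow second local maximum near unity, which is why a global search over the grid is used instead of solving the FOC from a single starting value.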
Remark 2.3. The log-likelihood function in Theorem 2.6 can be expressed in terms of the log-likelihood function for observations in levels, $\ell^{c}_{l}(\kappa)$ (the Within group part), as
$$\ell(\kappa) = \ell^{c}_{l}(\kappa) - \frac{N}{2}\left(\log|\Theta| + \operatorname{tr}\left(\Theta^{-1}\frac{T}{N}\sum_{i=1}^{N}\xi_i(\kappa)\xi_i(\kappa)'\right)\right),$$
where $\kappa = (\phi', \sigma', \operatorname{vec}B')'$. The additional (“Between” group) term corrects for the fixed T inconsistency of the standard ML (FE) estimator. With respect to the functional form of $\ell(\kappa)$, this result generalizes the conclusions of Kruiniger (2006; 2008) and Han and Phillips (2013) to the PVARX(1) model.12
Remark 2.4. In the online appendix Juodis (2014c) we derive the exact expression for the empirical Hessian matrix $H_N(\hat{\kappa}_{TMLE})$ and show that this matrix as well as its inverse are not block-diagonal, hence the TMLE of $\Phi$ and $\Sigma$ (as well as $\Theta$) are not
11 For the PVAR(1) model with spatial dependence of autoregressive type as in Mutl (2009), both the $\Theta$ and $\Sigma$ parameters can be concentrated out, but not the spatial dependence parameter $\lambda$.
12 Grassetti (2011) also discusses a similar decomposition of the log-likelihood function for the panel ARX(1) model.
asymptotically independent.13 Non block-diagonality of the covariance matrix needs to
be taken into account, e.g. for the impulse response analysis as in Cao and Sun (2011).
In the next few sections we focus our attention on the restricted model without addi-
tional strictly exogenous regressors. In this case the quasi log-likelihood function can
be simplified and written in the following way
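$$\begin{aligned} \ell(\kappa) = {}& -\frac{N}{2}\left((T-1)\log|\Sigma| + \operatorname{tr}\left(\Sigma^{-1}\frac{1}{N}\sum_{i=1}^{N}\sum_{t=1}^{T}(\tilde{y}_{i,t} - \Phi\tilde{y}_{i,t-1})(\tilde{y}_{i,t} - \Phi\tilde{y}_{i,t-1})'\right)\right) \\ & - \frac{N}{2}\left(\log|\Theta| + \operatorname{tr}\left(\Theta^{-1}\frac{T}{N}\sum_{i=1}^{N}(\breve{y}_i - \Phi\breve{y}_{i-})(\breve{y}_i - \Phi\breve{y}_{i-})'\right)\right), \end{aligned} \tag{2.12}$$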
where $\kappa = (\phi', \sigma', \theta')'$, $\Theta \equiv \Sigma + T(\Psi - \Sigma)$ and $\Psi = \operatorname{var}\Delta y_{i,1}$. The model without exogenous regressors was considered in BHP for the TML estimator and in Alvarez and Arellano (2003) for the model in levels. Note that in this specification we assume that $E[u_{i,0}] = 0_m$ holds; later, in Section 2.4.3, we investigate the properties of the maximizer of (2.12) when this assumption is violated. Possible problems with respect to bimodality of the log-likelihood function in the AR(1) context are discussed in Section 2.4.4. In Section 2.4.1 we provide results when the covariance-stationarity assumption is imposed on $\Psi$.
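Because (2.12) rewrites (2.9) for the model without regressors (with $G = 0$, i.e. no intercept in the projection for $\Delta y_{i,1}$), the two expressions should coincide for any data and any admissible parameter values. The following NumPy sketch (names and parameter values hypothetical) evaluates both forms on arbitrary data as a numerical check; the equality rests on $\log|\Sigma_{\Delta\tau}| = (T-1)\log|\Sigma| + \log|\Theta|$ and on the decomposition of the quadratic form into Within and Between parts.

```python
import numpy as np

def loglik_fd(y, Phi, Sigma, Psi):
    """(2.9) without regressors and with G = 0, using stacked first differences."""
    N, T1, m = y.shape
    T = T1 - 1
    dy = np.diff(y, axis=1)                              # [N, T, m]: Delta y_{i,1..T}
    dE = dy.copy()
    dE[:, 1:] -= dy[:, :-1] @ Phi.T                      # Delta y_t - Phi Delta y_{t-1}, t >= 2
    dE = dE.reshape(N, T * m)                            # time-major: vec(Delta y_1, ..., Delta y_T)
    Om = np.kron(2 * np.eye(T) - np.eye(T, k=1) - np.eye(T, k=-1), Sigma)
    Om[:m, :m] = Psi
    _, logdet = np.linalg.slogdet(Om)
    return -N / 2 * logdet - 0.5 * np.trace(np.linalg.solve(Om, dE.T @ dE))

def loglik_within_between(y, Phi, Sigma, Psi):
    """(2.12): Within group part plus the Between part with Theta = Sigma + T (Psi - Sigma)."""
    N, T1, m = y.shape
    T = T1 - 1
    Theta = Sigma + T * (Psi - Sigma)
    yt = y[:, 1:] - y[:, 1:].mean(axis=1, keepdims=True)       # within-group y_{i,t}
    yt_lag = y[:, :-1] - y[:, :-1].mean(axis=1, keepdims=True) # within-group y_{i,t-1}
    e = (yt - yt_lag @ Phi.T).reshape(-1, m)
    xi = (y[:, 1:].mean(axis=1) - y[:, 0]) - (y[:, :-1].mean(axis=1) - y[:, 0]) @ Phi.T
    _, ld_S = np.linalg.slogdet(Sigma)
    _, ld_T = np.linalg.slogdet(Theta)
    within = (T - 1) * ld_S + np.trace(np.linalg.solve(Sigma, e.T @ e)) / N
    between = ld_T + T * np.trace(np.linalg.solve(Theta, xi.T @ xi)) / N
    return -N / 2 * (within + between)

# Arbitrary data and parameter values: the identity does not depend on the DGP.
rng = np.random.default_rng(5)
y = rng.standard_normal((200, 5, 2)).cumsum(axis=1)
Phi = np.array([[0.4, 0.1], [-0.2, 0.3]])
Sigma = np.array([[1.0, 0.2], [0.2, 1.5]])
Psi = np.array([[1.4, 0.1], [0.1, 2.0]])
ll_fd = loglik_fd(y, Phi, Sigma, Psi)
ll_wb = loglik_within_between(y, Phi, Sigma, Psi)
```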
2.4.1 Likelihood function with imposed covariance-stationarity
If one is willing to strengthen some of the original assumptions by assuming that $u_{i,0}$ comes from the (covariance) stationary distribution, then the log-likelihood function is a function of $\kappa^{cov} = (\phi', \sigma')'$ only. The $\Theta$ matrix in this case is no longer treated as a free parameter, but is instead restricted to be of the following form
$$\Theta = \Sigma + T(I_m - \Phi)\left(\sum_{t=0}^{\infty}\Phi^{t}\Sigma(\Phi^{t})'\right)(I_m - \Phi)'.$$
Note that if one imposes covariance stationarity of $u_{i,0}$, it is no longer possible to construct the concentrated log-likelihood for the $\phi$ parameter, and a joint optimization
13 This result is in sharp contrast to pure time series VARs, where it can be shown that the estimates are indeed asymptotically independent.
over the full parameter vector $\kappa^{cov}$ is required.14 Kruiniger (2008) presents asymptotic results for the univariate version of this estimator under a range of assumptions regarding types of convergence. Results for PVAR(1) can be proved similarly.
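Under covariance stationarity $\Theta$ is linear in $\Sigma$: since $(I_m - \Phi)\otimes(I_m - \Phi) = \Pi\otimes\Pi$ with $\Pi = \Phi - I_m$, one has $\operatorname{vec}\Theta = J_{\sigma\theta}\operatorname{vec}\Sigma$, with $J_{\sigma\theta}$ as defined in Proposition 2.8 below. The sketch below (NumPy; names and parameter values hypothetical) computes the restricted $\Theta$ and checks this identity, together with the univariate special case $\Theta = \sigma^2\left(1 + T(1-\phi)/(1+\phi)\right)$.

```python
import numpy as np

def theta_cov_stationary(Phi, Sigma, T):
    """Theta = Sigma + T (I - Phi) V (I - Phi)' with V = sum_t Phi^t Sigma (Phi^t)'."""
    m = Phi.shape[0]
    vecV = np.linalg.solve(np.eye(m * m) - np.kron(Phi, Phi), Sigma.reshape(-1, order="F"))
    V = vecV.reshape(m, m, order="F")
    I = np.eye(m)
    return Sigma + T * (I - Phi) @ V @ (I - Phi).T

def J_sigma_theta(Phi, T):
    """J_{sigma theta} = I + T (Pi kron Pi)(I - Phi kron Phi)^{-1}, with Pi = Phi - I."""
    m = Phi.shape[0]
    Pi = Phi - np.eye(m)
    Im2 = np.eye(m * m)
    return Im2 + T * np.kron(Pi, Pi) @ np.linalg.inv(Im2 - np.kron(Phi, Phi))

Phi = np.array([[0.5, 0.2], [-0.1, 0.3]])
Sigma = np.array([[1.0, 0.4], [0.4, 2.0]])
T = 4
Theta = theta_cov_stationary(Phi, Sigma, T)
```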
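Proposition 2.8. Let Assumptions SA* be satisfied. Then the score vector associated with the log-likelihood function in (2.12) under covariance stationarity is given by:15
$$\nabla(\kappa^{cov}) = \begin{pmatrix} \operatorname{vec}W_{2,N}(\kappa^{cov}) + J_{\phi\theta}'\operatorname{vec}W_{1,N}(\kappa^{cov}) \\ D_m'\left(\operatorname{vec}\left(\frac{N}{2}\Sigma^{-1}\left(Z_N(\kappa^{cov}) - (T-1)\Sigma\right)\Sigma^{-1}\right) + J_{\sigma\theta}'\operatorname{vec}W_{1,N}(\kappa^{cov})\right) \end{pmatrix}. \tag{2.13}$$
Here we define $\Pi \equiv \Phi - I_m$ and
$$W_{1,N}(\kappa) \equiv \frac{N}{2}\Theta^{-1}\left(M_N(\kappa) - \Theta\right)\Theta^{-1},$$
$$W_{2,N}(\kappa) \equiv \Sigma^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T}(\tilde{y}_{i,t} - \Phi\tilde{y}_{i,t-1})\tilde{y}_{i,t-1}' + T\Theta^{-1}\sum_{i=1}^{N}(\breve{y}_i - \Phi\breve{y}_{i-})\breve{y}_{i-}',$$
$$\begin{aligned} J_{\phi\theta} \equiv {}& -T\left(\left(\sigma'D_m'(I_{m^2} - \Phi'\otimes\Phi')^{-1}\right)\otimes I_{m^2}\right)(I_m\otimes K_{m,m}\otimes I_m)\left(-\left(I_{m^2}\otimes\operatorname{vec}(\Pi) + \operatorname{vec}(\Pi)\otimes I_{m^2}\right)\right) \\ & + T\left(\left(\sigma'D_m'(I_{m^2} - \Phi'\otimes\Phi')^{-1}\right)\otimes\left((\Pi\otimes\Pi)(I_{m^2} - \Phi\otimes\Phi)^{-1}\right)\right)(I_m\otimes K_{m,m}\otimes I_m)(I_{m^2}\otimes\phi + \phi\otimes I_{m^2}), \end{aligned}$$
$$J_{\sigma\theta} \equiv I_{m^2} + T(\Pi\otimes\Pi)(I_{m^2} - \Phi\otimes\Phi)^{-1}.$$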
Proof. In Appendix 2.A.3.
It can be seen that $E[\nabla(\kappa^{cov}_0)] \neq 0_{m^2 + m(m+1)/2}$ unless the initial condition is covariance stationary (in contrast with the conclusion of Proposition 2.7 for the unrestricted estimator). Thus a violation of covariance stationarity implies that the estimator $\hat{\kappa}^{cov}$ is inconsistent.
Remark 2.5. Han and Phillips (2013) discuss possible problems of the TML estimator
with imposed covariance stationarity near unity. They observe that the log-likelihood
function can be ill-behaved and bimodal close to φ0 = 1. In this paper, we do not
investigate this possibility of bimodality for the PVAR model as the behaviour of the
log-likelihood function close to unity is not of prime interest for us. Furthermore,
the bimodality in Han and Phillips (2013) is not related to the bimodality of the
unrestricted TML estimator as discussed in Section 2.4.4.
14 Unless the parameter space for $\Phi$ and $\Sigma$ is such that the “extensibility condition” is satisfied, see univariate results in Han and Phillips (2013).
15 Note that there is a mistake in the derivations of the $J_{\phi\theta}$ term in Mutl (2009).
2.4.2 Cross-sectional heterogeneity
In this subsection we consider a model with possible cross-sectional heterogeneity in $\{\Sigma, \Psi_u\}$. For notational simplicity we consider a model without exogenous regressors. All results presented can be extended to a model with exogenous regressors at the expense of more complicated notation.
(A.1)** The disturbances $\varepsilon_{i,t}$, $t \leq T$, are i.h.d. for all $i$ with $E[\varepsilon_{i,t}] = 0_m$ and $E[\varepsilon_{i,t}\varepsilon_{i,s}'] = \mathbb{1}(s=t)\Sigma_{0,i}$, $\Sigma_{0,i}$ being a p.d. matrix, and $\max_i E\left[\|\varepsilon_{i,t}\|^{4+\delta}\right] < \infty$ for some $\delta > 0$.
(A.2)** The initial deviations $u_{i,0}$ are i.h.d. across cross-sectional units, with $E[u_{i,0}] = 0_m$, a finite p.d. variance matrix $\Psi_{u,0,i}$ and $\max_i E\left[\|u_{i,0}\|^{4+\delta}\right] < \infty$ for some $\delta > 0$.
We denote by $\bar{\Sigma}_0$, and similarly by $\bar{\Psi}_{u,0}$, the limiting values of the corresponding sample averages, i.e. $\bar{\Sigma}_0 = \lim_{N\to\infty}(1/N)\sum_{i=1}^{N}\Sigma_{0,i}$.16 Existence of the higher-order moments as presented in Assumptions (A.1)**-(A.2)** is a standard sufficient condition for the Lindeberg-Feller CLT to apply. We denote by SA** the set of assumptions SA*, with (A.1)-(A.2) replaced by (A.1)**-(A.2)**. The univariate analogues of the results presented in this section for the TML estimator were derived by Kruiniger (2013) and Hayakawa and Pesaran (2012).
Remark 2.6. As an example of a DGP that satisfies (A.2)**, consider the equation
$$y_{i,0} = \mu_i + F(\mu_i)\varepsilon_{y0}, \tag{2.14}$$
with $\mu_i$ a non-stochastic $m$-dimensional vector, $F(\cdot): \mathbb{R}^m \to \mathbb{R}^{m\times m}$ a real function and $\varepsilon_{y0} \sim (0_m, \Sigma_{y0})$. In this example $E[u_{i,0}] = 0_m$, while $E[u_{i,0}u_{i,0}'] = F(\mu_i)\Sigma_{y0}F(\mu_i)'$.
The unrestricted log-likelihood function for $\kappa = (\phi', \sigma_1', \ldots, \sigma_N', \theta_1', \ldots, \theta_N')'$ suffers from the incidental parameter problem, as the number of parameters grows with the sample size $N$. This implies that no $\sqrt{N}$ consistent inference can be made on the $\sigma_i$ and $\theta_i$ parameters, but it does not imply that the $\phi$ parameter cannot be consistently
16 As mentioned in Kruiniger (2013), Assumptions (A.1)**-(A.2)** are actually stronger than necessary: to prove consistency and asymptotic normality it is sufficient to assume that $(1/N)\sum_{i=1}^{N}E[\varepsilon_{i,s}\varepsilon_{i,s}'] = (1/N)\sum_{i=1}^{N}E[\varepsilon_{i,t}\varepsilon_{i,t}']$ for all $s, t = 2,\ldots,T$.
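estimated. Specifically, we consider the pseudo log-likelihood function $\ell_p(\kappa)$17
$$\begin{aligned} \ell_p(\kappa) = {}& -\frac{N}{2}\left((T-1)\log|\Sigma| + \operatorname{tr}\left(\Sigma^{-1}\frac{1}{N}\sum_{i=1}^{N}\sum_{t=1}^{T}(\tilde{y}_{i,t} - \Phi\tilde{y}_{i,t-1})(\tilde{y}_{i,t} - \Phi\tilde{y}_{i,t-1})'\right)\right) \\ & - \frac{N}{2}\left(\log|\Theta| + \operatorname{tr}\left(\Theta^{-1}\frac{T}{N}\sum_{i=1}^{N}(\breve{y}_i - \Phi\breve{y}_{i-})(\breve{y}_i - \Phi\breve{y}_{i-})'\right)\right), \end{aligned}$$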
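obtained if one would mistakenly assume that the observations are i.i.d. We shall prove that the conclusions from Section 2.4 continue to hold, with $\kappa_0$ replaced by the pseudo-true values $\bar{\kappa} = (\bar{\phi}', \bar{\sigma}', \bar{\theta}')'$, where
$$\bar{\sigma} = \operatorname{vech}\bar{\Sigma}_0, \qquad \bar{\theta} = \operatorname{vech}\bar{\Theta}_0, \qquad \bar{\phi} = \phi_0,$$
$$\bar{\Theta}_0 = \bar{\Sigma}_0 + T(I_m - \Phi_0)\left(\lim_{N\to\infty}\frac{1}{N}\sum_{i=1}^{N}\Psi_{u,0,i}\right)(I_m - \Phi_0)'.$$
We assume that $\bar{\kappa}$ satisfies a compactness property similar to (A.6)*. It is not difficult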
to see that the point-wise probability limit of (1/N)`p(κ) is given by
plimN→∞
1
N`p(κ) = −1
2
((T − 1) log |Σ|+ tr
(Σ−1 plim
N→∞ZN(κ)
))− 1
2
(log |Θ|+ tr
(Θ−1 plim
N→∞MN(κ)
)),
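where

$$\operatorname*{plim}_{N\to\infty}Z_N(\kappa) = (T-1)\Sigma_0 + (\Phi_0-\Phi)\left(\operatorname*{plim}_{N\to\infty}R_N\right)(\Phi_0-\Phi)' - \frac{1}{T}\left((\Phi_0-\Phi)\Xi\Sigma_0 + \Sigma_0\Xi'(\Phi_0-\Phi)'\right),$$

$$\operatorname*{plim}_{N\to\infty}M_N(\kappa) = \Theta_0 + (\Phi_0-\Phi)\left(\operatorname*{plim}_{N\to\infty}P_N\right)(\Phi_0-\Phi)' + \frac{1}{T}\left((\Phi_0-\Phi)\Xi\Theta_0 + \Theta_0\Xi'(\Phi_0-\Phi)'\right).$$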
Note that we would obtain the same probability limit of the pseudo log-likelihood function if $u_{i,0}$ and $\{\varepsilon_{i,t}\}_{i=1,t=1}^{N,T}$ were i.i.d. Gaussian with parameters $\bar\kappa$; hence identification follows from the result for i.i.d. data. Similarly, denote $\bar\kappa_N = (\bar\phi', \bar\sigma_N', \bar\theta_N')'$, where

$$\bar\sigma_N = \frac{1}{N}\sum_{i=1}^{N}\sigma_{0,i}, \qquad \bar\theta_N = \frac{1}{N}\sum_{i=1}^{N}\theta_{0,i}, \qquad \bar\phi = \phi_0.$$

17 Here “p” stands for “pseudo” and is used to distinguish this function from the standard TMLE log-likelihood function, where inference on Σ and Θ is possible.
Consistency and asymptotic normality of $\hat\kappa$ follow from standard arguments; see e.g. Amemiya (1985).
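Proposition 2.9 (Consistency and Asymptotic normality). Under Assumptions SA** the maximizer $\hat\kappa$ of $\ell_p(\kappa)$ is consistent, $\hat\kappa \xrightarrow{p} \bar\kappa$. Furthermore, under these assumptions

$$\sqrt{N}(\hat\kappa - \bar\kappa_N) \xrightarrow{d} N(0, B_{PML}),$$

where

$$B_{PML} = H_\ell^{-1} I_\ell H_\ell^{-1}, \qquad H_\ell = \lim_{N\to\infty} E\left[-\frac{1}{N}\mathcal{H}_p^N(\bar\kappa)\right], \qquad I_\ell = \lim_{N\to\infty}\frac{1}{N}E\left[\sum_{i=1}^{N}\nabla_p^{(i)}(\kappa_{0,i})\nabla_p^{(i)}(\kappa_{0,i})'\right].$$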
In the Appendix we show that the expected value of the score of this log-likelihood function evaluated at $\bar\kappa_N$ is zero. Here $\nabla_p^{(i)}(\kappa_{0,i})$ denotes the contribution of cross-sectional unit $i$ to the score $\nabla_p(\kappa)$ of the pseudo log-likelihood function, evaluated at the true values $\phi_0, \sigma_{0,i}, \theta_{0,i}$. Note that unless cross-sectional heterogeneity disappears (at a sufficiently fast rate) as $N \to \infty$, the standard “sandwich” formula for the variance-covariance matrix evaluated at $\hat\kappa$ is not a consistent estimate of the asymptotic variance-covariance matrix in Proposition 2.9, as in general

$$\lim_{N\to\infty}\frac{1}{N}\sum_{i=1}^{N}\sigma_{0,i}\sigma_{0,i}' \neq \left(\lim_{N\to\infty}\frac{1}{N}\sum_{i=1}^{N}\sigma_{0,i}\right)\left(\lim_{N\to\infty}\frac{1}{N}\sum_{i=1}^{N}\sigma_{0,i}\right)', \qquad (2.15)$$

while $H_\ell$ and $B_{PML}$ are not block-diagonal for fixed $T$. However, under some restrictive assumptions on the higher-order moments of the initial observations and on the variance of the strictly exogenous regressors (when they are present), Hayakawa and Pesaran (2012) argue that it is possible to construct a modified consistent estimator of $I_\ell$ for the ARX(1) model. In the Monte Carlo section of this paper we use the standard “sandwich” estimator of the variance-covariance matrix without any modifications. We leave the derivation of a modified consistent estimator of $I_\ell$ for the general PVARX(1) case for future research.
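The gap in (2.15) is easy to see numerically. The sketch below assumes the multiplicative scheme $\Sigma_{0,i} = \varphi_i^2\Sigma_0$ with $\varphi_i^2 \sim \chi^2(1)$ that is used later in Design 5, takes $\Sigma_0$ from Design 2, and relies on a helper `vech` (our own) that stacks the lower triangle column by column. Because $\sigma_{0,i} = \varphi_i^2\sigma$ with $\sigma = \operatorname{vech}\Sigma_0$, the difference between the two sides of (2.15) is approximately $\operatorname{var}(\varphi_i^2)\,\sigma\sigma' = 2\sigma\sigma'$: positive semi-definite, but not zero.

```python
import numpy as np


def vech(a):
    """Stack the lower-triangular part of a symmetric matrix column by column."""
    return a.T[np.triu_indices(a.shape[0])]


rng = np.random.default_rng(1)
sigma0 = np.array([[0.07, 0.05], [0.05, 0.07]])  # Sigma_0 from Design 2
s = vech(sigma0)
n = 100_000
phi2 = rng.chisquare(df=1, size=n)               # phi_i^2 ~ chi^2(1)
sig = phi2[:, None] * s[None, :]                 # row i is sigma_{0,i}' = vech(phi_i^2 Sigma_0)'

lhs = sig.T @ sig / n                            # (1/N) sum_i sigma_{0,i} sigma_{0,i}'
mean_sig = sig.mean(axis=0)
rhs = np.outer(mean_sig, mean_sig)               # (mean sigma)(mean sigma)'
gap = lhs - rhs
print(gap)
print(2.0 * np.outer(s, s))                      # population value of the gap
```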
2.4.3 Misspecification of the mean parameter
Suppose that one ignores the fact that the data in differences are mean non-stationary (as a consequence of $E[u_{i,0}] = \gamma_{u_0} \neq 0_m$) and simply considers the log-likelihood function (2.12). Denote $\tilde\kappa = (\tilde\phi', \tilde\sigma', \tilde\theta')'$, where

$$\tilde\phi = \phi_0, \qquad \tilde\sigma = \sigma_0, \qquad \tilde\theta = \sigma_0 + T\operatorname{vech}\left[(I_m-\Phi_0)E[u_{i,0}u_{i,0}'](I_m-\Phi_0)'\right].$$
Hence $\tilde\theta$ is a function of the second moment of $u_{i,0}$ rather than of its variance. Analogously to the univariate result in Kruiniger (2002), we have the following result.
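Proposition 2.10. Let all assumptions in SA* except $E[u_{i,0}] = \gamma_{u_0} = 0_m$ be satisfied. Then the maximizer $\hat\kappa$ of (2.12) is consistent in the sense that $\hat\kappa \xrightarrow{p} \tilde\kappa$. Furthermore, under these assumptions

$$\sqrt{N}(\hat\kappa - \tilde\kappa) \xrightarrow{d} N(0, B_{ML}),$$

where

$$B_{ML} = H_\ell^{-1} I_\ell H_\ell^{-1}, \qquad H_\ell = \lim_{N\to\infty} E\left[-\frac{1}{N}\mathcal{H}_N(\tilde\kappa)\right], \qquad I_\ell = \lim_{N\to\infty} E\left[\frac{1}{N}\sum_{i=1}^{N}\nabla^{(i)}(\tilde\kappa)\nabla^{(i)}(\tilde\kappa)'\right].$$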
In the Appendix we show that the expected value of the score of this log-likelihood function evaluated at $\tilde\kappa$ is zero.
Remark 2.7. One can think of $\gamma = (\Phi_0 - I_m)\gamma_{u_0}$ as a (restricted) time effect for $\Delta y_{i,1}$. In general, omitting time effects (when they are present in the model for $y_{i,t}$, $t > 1$) renders the TML estimator inconsistent. As already discussed in BHP, including time effects is equivalent to cross-sectionally demeaning all $\Delta y_{i,t}$ beforehand. The resulting estimator $\hat\kappa$ is then consistent for $\kappa_0$. Consequently, if the cross-sectional demeaning is performed beforehand, omitting the $\gamma$ parameter is inconsequential.
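The equivalence in Remark 2.7 rests on the fact that an intercept common to all units in a given period is removed by cross-sectional demeaning. The sketch below uses random data with a hypothetical array layout $(N, T, m)$ for the first differences and checks that demeaning the data with and without an added common time effect gives the same result.

```python
import numpy as np


def demean_cross_section(dy):
    """Subtract from dy[i, t, :] the cross-sectional mean over i for each period t.

    dy has shape (N, T, m): N units, T periods of first differences, m variables.
    """
    return dy - dy.mean(axis=0, keepdims=True)


rng = np.random.default_rng(2)
n, t, m = 500, 3, 2
dy = rng.standard_normal((n, t, m))
time_effects = rng.standard_normal((1, t, m))  # common shift per period and variable
dy_shifted = dy + time_effects

demeaned = demean_cross_section(dy)
demeaned_shifted = demean_cross_section(dy_shifted)
print(np.max(np.abs(demeaned - demeaned_shifted)))  # numerically zero
```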
Remark 2.8. Combining the analysis in Propositions 2.9 and 2.10, we see that when $E[\Delta y_{i,1}] = \gamma_i$ is individual-specific, one still obtains a consistent estimate of $\Phi$ by simply maximizing $\ell_p(\kappa)$.18 As an example of this situation, consider the DGP

$$y_{i,0} = \Gamma\mu_i + \varepsilon_{y_0}, \qquad \varepsilon_{y_0} \sim (0_m, \Sigma_{y_0}),$$

with $\Gamma \neq I_m$ and $\mu_i$ non-stochastic individual-specific effects. Hence the mean $E[\Delta y_{i,1}] = (\Phi_0 - I_m)(\Gamma - I_m)\mu_i = \gamma_i$ is individual-specific.

18 Please refer to the proof of Proposition 2.9 in the Appendix.
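The mean formula in Remark 2.8 can be checked by simulation. This sketch assumes the PVAR(1) is written as $y_{i,t} = (I_m - \Phi_0)\mu_i + \Phi_0 y_{i,t-1} + \varepsilon_{i,t}$, so that $\Delta y_{i,1} = (\Phi_0 - I_m)(y_{i,0} - \mu_i) + \varepsilon_{i,1}$. It borrows $\Phi_0$ from Design 2 and uses a hypothetical $\Gamma$, a fixed $\mu_i$ and Gaussian errors; none of these values come from the remark.

```python
import numpy as np

rng = np.random.default_rng(3)
phi0 = np.array([[0.4, 0.15], [-0.1, 0.6]])   # Design 2 value, used as an example
gamma = np.array([[0.5, 0.0], [0.2, 1.5]])    # hypothetical Gamma != I_m
i2 = np.eye(2)
mu_i = np.array([1.0, -0.5])                  # fixed individual effect
reps = 400_000                                # Monte Carlo draws for this unit

eps_y0 = 0.3 * rng.standard_normal((reps, 2))
eps_1 = 0.3 * rng.standard_normal((reps, 2))
y0 = gamma @ mu_i + eps_y0                         # y_{i,0} = Gamma mu_i + eps_{y0}
y1 = (i2 - phi0) @ mu_i + y0 @ phi0.T + eps_1      # one PVAR(1) step
mc_mean = (y1 - y0).mean(axis=0)                   # simulated E[Delta y_{i,1}]
gamma_i = (phi0 - i2) @ (gamma - i2) @ mu_i        # (Phi0 - I)(Gamma - I) mu_i
print(mc_mean, gamma_i)
```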
2.4.4 Identification and bimodality issues for three-wave panels
In this section we study the behavior of the log-likelihood function for the TML estimator with an unrestricted initial condition. Consistency and asymptotic normality of any ML estimator require, among other things, that the expected log-likelihood function have a unique maximum at the true value. As we shall prove in this section, this condition may be violated for the TML estimator with an unrestricted initial condition when T = 2. For ease of exposition we consider the univariate setup as in Hsiao et al. (2002).
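Theorem 2.11. Let assumptions SA* be satisfied. Then for all $\phi_0 \in (-1, 1)$ and $T = 2$ the following holds:

$$\operatorname*{plim}_{N\to\infty}\ell_c(\phi_0) = \operatorname*{plim}_{N\to\infty}\ell_c(\phi_p) \qquad (2.16)$$

for any value of $\psi^2_{u,0} > 0$. Consequently, the expected log-likelihood function has two local maxima

$$\kappa_0 = (\phi_0, \sigma_0^2, \theta_0^2)', \qquad \kappa_p = (\phi_p, \theta_0^2, \sigma_0^2)',$$

where

$$\phi_p \equiv 2\left(\frac{x-1}{x}\right) + \phi_0, \qquad x \equiv 1 + (1-\phi_0)^2\psi^2_{u,0}/\sigma_0^2 = \frac{1}{2}\left(\frac{\theta_0^2}{\sigma_0^2} + 1\right).$$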
Recall that, based on the definition of $\Theta$ in Theorem 2.6, the true value of $\theta^2$ is given by

$$\theta_0^2 = \sigma_0^2 + T(1-\phi_0)^2\psi^2_{u,0}, \qquad \psi^2_{u,0} = E[u_{i,0}^2].$$
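Theorem 2.11 can be verified numerically in the univariate case. The sketch below is our own reconstruction, not code from this chapter. It assumes the three-wave TML setup of Hsiao et al. (2002), in which, for each unit, $(\Delta y_{i,1}, \Delta y_{i,2} - \phi\Delta y_{i,1})$ has model covariance $\sigma^2\begin{pmatrix}\omega & -1\\ -1 & 2\end{pmatrix}$. Writing $\theta^2 = \sigma^2(2\omega - 1)$, the expected log-likelihood separates into a part in $\sigma^2$ and a part in $\theta^2$, and both can be concentrated out in closed form. The function returns the probability limit of the concentrated log-likelihood (per unit, up to a constant) together with the implied $(\sigma^2, \theta^2)$.

```python
import math


def plim_lc(phi, phi0, sigma2_0, psi2_u0):
    """Per-unit plim of the concentrated log-likelihood for T = 2 (up to a constant).

    Returns (value, sigma2_hat, theta2_hat) under the parameterization described above.
    """
    d = phi0 - phi
    v = sigma2_0 + (1.0 - phi0) ** 2 * psi2_u0          # var(dy_1)
    s11 = v
    s12 = -sigma2_0 + d * v                              # cov(dy_1, dy_2 - phi dy_1)
    s22 = 2.0 * sigma2_0 * (1.0 - d) + d * d * v         # var(dy_2 - phi dy_1)
    sigma2_hat = s22 / 2.0
    theta2_hat = 2.0 * s11 + 2.0 * s12 + s22 / 2.0
    value = -0.5 * (math.log(sigma2_hat) + math.log(theta2_hat) + 2.0)
    return value, sigma2_hat, theta2_hat


def phi_p(phi0, sigma2_0, psi2_u0):
    """Second mode from Theorem 2.11: phi_p = 2 (x - 1) / x + phi0."""
    x = 1.0 + (1.0 - phi0) ** 2 * psi2_u0 / sigma2_0
    return 2.0 * (x - 1.0) / x + phi0


phi0, sigma2_0 = 0.5, 1.0
psi2 = sigma2_0 / (1.0 - phi0 ** 2)       # covariance stationary initial condition
pp = phi_p(phi0, sigma2_0, psi2)          # approximately 1 in this case
l0, s2_at_0, t2_at_0 = plim_lc(phi0, phi0, sigma2_0, psi2)
lp, s2_at_p, t2_at_p = plim_lc(pp, phi0, sigma2_0, psi2)
print(pp, l0, lp)                          # equal log-likelihood values at phi0 and phi_p
print(s2_at_p, t2_at_0)                    # sigma^2 at phi_p equals theta_0^2
print(t2_at_p, s2_at_0)                    # theta^2 at phi_p equals sigma_0^2
```

Repeating the comparison for other values of $\phi_0$ and $\psi^2_{u,0}$ shows the same swap of $\sigma^2$ and $\theta^2$ between the two modes, as stated in the theorem.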
Several remarks regarding the results in Theorem 2.11 are worth making.19 First, instead of using the concentrated log-likelihood function, the result can be proved similarly by considering the expected log-likelihood function directly. Secondly, if the parameter space is expressed in terms of $\kappa = (\phi, \sigma^2, \psi^2)'$, then the value of $\psi^2$ in both sets is equal to $\psi_0^2 = \psi_p^2 = (\sigma_0^2 + \theta_0^2)/2$.

19 We should emphasize that Theorem 2.11 is meaningful only if $\phi_p \in \Gamma$.
Remark 2.9. While deriving the result we assumed that $E[u_{i,0}] = 0$ and that $\gamma$ is not included in the parameter set. If $E[u_{i,0}] \neq 0$, two cases are possible: a) a misspecified log-likelihood function as in Section 2.4.3 is considered, and the result remains unchanged; b) the $\gamma$ parameter is included in the set of parameters and, as a result, Theorem 2.11 does not hold. For intuition, observe that in the latter case the trivial estimator $\hat\phi = (\sum_{i=1}^{N}\Delta y_{i,2})/(\sum_{i=1}^{N}\Delta y_{i,1})$ is consistent. However, the key observation for this special case is that the model does not contain time effects. If, on the other hand, the model contains time effects, $\hat\phi$ is no longer consistent, and consequently the main result of this section remains valid after cross-sectional demeaning of the data.
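Case b) can be illustrated with a small simulation. When $E[u_{i,0}] \neq 0$ and there are no time effects, $\Delta y_{i,2} - \phi_0\Delta y_{i,1} = \varepsilon_{i,2} - \varepsilon_{i,1}$ has mean zero while $E[\Delta y_{i,1}] = (\phi_0 - 1)E[u_{i,0}] \neq 0$, so the ratio of cross-sectional sums converges to $\phi_0$. The sketch assumes $y_{i,t} = (1-\phi_0)\mu_i + \phi_0 y_{i,t-1} + \varepsilon_{i,t}$ with standard normal $\mu_i$ and $\varepsilon_{i,t}$, $\phi_0 = 0.5$ and $E[u_{i,0}] = 1$; all of these values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)
n, phi0, mean_u0 = 200_000, 0.5, 1.0
mu = rng.standard_normal(n)                          # individual effects mu_i
u0 = mean_u0 + rng.standard_normal(n)                # E[u_{i,0}] = 1 != 0
y0 = mu + u0
y1 = (1.0 - phi0) * mu + phi0 * y0 + rng.standard_normal(n)
y2 = (1.0 - phi0) * mu + phi0 * y1 + rng.standard_normal(n)
dy1, dy2 = y1 - y0, y2 - y1
phi_trivial = dy2.sum() / dy1.sum()                  # (sum dy_{i,2}) / (sum dy_{i,1})
print(phi_trivial)                                   # close to phi0 = 0.5
```

Adding a common time effect to $\Delta y_{i,2}$ in this simulation would shift the numerator and break the consistency of the ratio, which is the point made at the end of the remark.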
Remark 2.10. In the covariance stationary case it can be shown that the conclusion of Theorem 2.11 extends to PVAR(1) if the extensibility condition is satisfied and, in addition, $\Phi_0$ is symmetric. In particular, this condition is satisfied by all three stationary designs in BHP, with the pseudo value equal to the identity matrix.
Without loss of generality we can rewrite $\psi^2_{u,0}$ as

$$\psi^2_{u,0} = \alpha\frac{\sigma_0^2}{1-\phi_0^2}, \qquad \alpha \geq 0.$$

To get more intuition about the problem at hand, we can rewrite $\phi_p$ as

$$\phi_p = \frac{(\phi_0^2 + \phi_0)(1-\alpha) + 2\alpha}{1 + \alpha + \phi_0(1-\alpha)}. \qquad (2.17)$$
From here it is easily seen that the pseudo-true value $\phi_p$ equals unity for covariance stationary initialization ($\alpha = 1$). Furthermore, we can consider other special cases:

$$|\phi_0| < 1,\ \alpha = 0 \;\Rightarrow\; \phi_p = \phi_0, \qquad |\phi_0| < 1,\ \alpha \in (0, 1) \;\Rightarrow\; \phi_0 < \phi_p < 1.$$
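The reparameterization leading to (2.17) can be checked numerically. Substituting $\psi^2_{u,0} = \alpha\sigma_0^2/(1-\phi_0^2)$ into the definition of $x$ in Theorem 2.11 gives $x = 1 + \alpha(1-\phi_0)/(1+\phi_0)$. The sketch below compares the resulting $\phi_p$ with (2.17) on a small grid of illustrative values and can be used to check the special cases listed above.

```python
def phi_p_via_x(phi0, alpha):
    """phi_p = 2 (x - 1) / x + phi0 with x = 1 + alpha (1 - phi0) / (1 + phi0)."""
    x = 1.0 + alpha * (1.0 - phi0) / (1.0 + phi0)
    return 2.0 * (x - 1.0) / x + phi0


def phi_p_eq_217(phi0, alpha):
    """Equation (2.17)."""
    num = (phi0 ** 2 + phi0) * (1.0 - alpha) + 2.0 * alpha
    den = 1.0 + alpha + phi0 * (1.0 - alpha)
    return num / den


for phi0 in (-0.9, -0.5, 0.0, 0.5, 0.9):
    for alpha in (0.0, 0.3, 1.0, 2.5):
        a, b = phi_p_via_x(phi0, alpha), phi_p_eq_217(phi0, alpha)
        print(f"phi0={phi0:+.1f} alpha={alpha:.1f} phi_p={a:.6f} (2.17)={b:.6f}")
```

The grid also shows that for $\alpha > 1$ the second mode lies above unity, outside the stable region.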
In Monte Carlo simulations it is common to impose some restrictions on the parameter space. In most cases $\phi$ is restricted to the stable region $(-1, 1)$, e.g. Hsiao et al. (2002). However, as is clearly seen from Figure 2.2 (and the derivations above), a stable-region restriction on $\phi$ does not solve the bimodality issue, since $\phi_p$ can lie in this interval.
By construction, the concentrated log-likelihood function is the sum of two quasi-concave functions with maxima at different points (the Within Group and Between Group parts),
so bimodality does not disappear for T > 2. By adding these two terms we end up with a function with possibly two modes, the first of order $O_P(NT)$ and the second of order $O_P(N)$. This difference in orders of magnitude explains why for larger values of T the WG mode determines the shape of the whole function. To illustrate the problem, we present several figures of $\operatorname{plim}_{N\to\infty}\ell_c(\phi)$ for stationary initial conditions.
[Figure 2.1: panels (a) φ0 = −0.5, T = 2; (b) φ0 = 0.5, T = 2; (c) φ0 = 0.5, T = 6; horizontal axis φ from −2 to 2.]
Figure 2.1: Concentrated asymptotic log-likelihood function. In all figures the first mode is at the corresponding true value φ0, while the second mode is located at φ = 1. The initial observation is drawn from the covariance stationary distribution. The dashed line represents the WG part of the log-likelihood function, and the dotted line the BG part. The solid line, which represents the log-likelihood function, is the sum of the dashed and dotted lines.
The behavior of the concentrated log-likelihood function in Figures 2.1a, 2.1b and 2.1c is in line with the theoretical results provided earlier. Note that as φ0 approaches unity, the log-likelihood function becomes increasingly flat between the two modes. We can see from Figure 2.1c that once T is substantially larger than 2, the “true value” mode starts to dominate the “pseudo value” mode. Based on all the figures presented, we can suspect that, at least for covariance stationary (or nearly stationary) initial conditions, the TML estimator is positively biased, with the magnitude of the bias diminishing in T.

[Figure 2.2: histogram (density) of the TMLE estimates; horizontal axis from 0.1 to 1.3.]
Figure 2.2: Histogram of the TMLE estimator with T = 3, φ0 = 0.5, N = 250 and 10,000 MC replications. The initial observation is drawn from the covariance stationary distribution. Starting values for all iterations are set to φ(0) = 0.0, 0.1, . . . , 1.5. No non-negativity restrictions are imposed.
The main intuition behind the result in Theorem 2.11 is quite simple. When the log-likelihood function for θ (or ψ) is considered, no restrictions on the magnitude of these terms relative to σ² are imposed. In particular, it is possible that θ² < σ², which is a rather strange result given that

$$\theta_0^2 = \sigma_0^2 + T(1-\phi_0)^2E[u_{i,0}^2].$$

But that is exactly what happens in the $\kappa_p$ set, as

$$\theta_p^2 = \sigma_0^2, \qquad \sigma_p^2 = \theta_0^2.$$

Hence the implicit estimate of $(1-\phi_0)^2E[u_{i,0}^2]$ is negative, as we do not fully exploit the implied structure of $\operatorname{var}\Delta y_{i,1}$. This is the so-called “negative variance problem” documented in panel data, among others, by Maddala (1971).20 This problem has already been encountered in some Monte Carlo studies in the literature (even for larger values of T), while other authors only mention this possibility, e.g. Alvarez and Arellano (2003) and Arellano (2003a). For instance, Kruiniger (2008) mentions that for values of φ0 close to unity the non-negativity constraint on $(1-\phi_0)^2E[u_{i,0}^2]$, if imposed, is binding in 50% of the cases. The Θ or Ψ parameter, on the other hand, is by construction p.d. (non-negative in the univariate case). That explains why in some studies (for instance Ahn and Thomas (2006)) no numerical issues with the TML estimator were encountered. In this paper we analyze the limiting case of T = 2 and quantify the exact location of the second mode. Observations made in this section provide intuition for some of the Monte Carlo results presented in Section 2.5.

20 Note that Maddala (1971) considers the Random Effects type estimator for dynamic panel data models, similar (although not identical) to the one in Alvarez and Arellano (2003).
2.4.5 Time-series heteroscedasticity
Unlike cross-sectional homoscedasticity, time-series homoscedasticity is necessary for fixed-T consistency of $\hat\Phi$. However, in this section we show that for sufficiently large T one can still estimate Φ consistently.21 We first concentrate out the Θ parameter and consider the normalized version of the log-likelihood function
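$$\ell_c(\kappa_c) = -\frac{1}{2T}\log\left|\frac{T}{N}\sum_{i=1}^{N}(y_i-\Phi y_{i-})(y_i-\Phi y_{i-})'\right| - \frac{T-1}{2T}\log|\Sigma| - \operatorname{tr}\left(\Sigma^{-1}\frac{1}{2NT}\sum_{i=1}^{N}\sum_{t=1}^{T}(y_{i,t}-\Phi y_{i,t-1})(y_{i,t}-\Phi y_{i,t-1})'\right).$$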
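As the matrix inside the first log-determinant is of order $O_P(T)$, its log-determinant is of order $O_P(\log T)$, so after division by $2T$ the first component of the log-likelihood function is of order $o_P(1)$. Thus as $N, T \to \infty$ jointly,

$$\ell_c(\kappa_c) = o_P(1) - \frac{T-1}{2T}\log|\Sigma| - \operatorname{tr}\left(\Sigma^{-1}\frac{1}{2NT}\sum_{i=1}^{N}\sum_{t=1}^{T}(y_{i,t}-\Phi y_{i,t-1})(y_{i,t}-\Phi y_{i,t-1})'\right).$$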
Clearly the remaining component is just the FE log-likelihood function, and consistency of $\hat\Sigma$ and $\hat\Phi$ follows directly. In the case with time-series heteroscedasticity in $\Sigma_t$, the log-likelihood function consistently estimates $\Sigma_\infty \equiv \lim_{T\to\infty}\frac{1}{T}\sum_{t=1}^{T}\Sigma_t$, assuming that this limit exists.
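The gradient of the log-likelihood function with respect to φ is given by

$$\nabla_\phi(\kappa) = \operatorname{vec}\left(\Sigma^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T}(y_{i,t}-\Phi y_{i,t-1})y_{i,t-1}'\right) + \operatorname{vec}\left(\left(\frac{1}{T}\Theta\right)^{-1}\sum_{i=1}^{N}(y_i-\Phi y_{i-})y_{i-}'\right).$$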
21 To show similar results for general models with exogenous regressors, one has to prove that as T → ∞ the incidental parameter matrix G does not result in an incidental parameter problem.
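As argued in the previous sections, the second (“Between”) component of the derivative w.r.t. Φ is of lower order than the first (“Within”) component. As a result, under the assumption that $N/T \to \rho$, evaluated at the true value $\Phi_0$,

$$\begin{aligned}
\frac{1}{\sqrt{NT}}\left(\frac{1}{T}\Theta\right)^{-1}\sum_{i=1}^{N}(y_i-\Phi_0 y_{i-})y_{i-}' &= \sqrt{\rho}\left(\frac{1}{T}\Theta\right)^{-1}\frac{1}{N}\sum_{i=1}^{N}(y_i-\Phi_0 y_{i-})y_{i-}' + o_P(1)\\
&= \sqrt{\rho}\left((I_m-\Phi_0)\Psi_{u,0}(I_m-\Phi_0)'\right)^{-1}\left[(I_m-\Phi_0)\Psi_{u,0}\right] + o_P(1)\\
&= \sqrt{\rho}\,(I_m-\Phi_0')^{-1} + o_P(1),
\end{aligned}$$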
where this result is valid irrespective of whether time-series heteroscedasticity is present. Now consider the bias of the score of the fixed effects estimator evaluated at $\Phi_0$ and $\Sigma = \frac{1}{T}\sum_{t=1}^{T}\Sigma_t$ (as in e.g. Juodis (2013)):
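$$\begin{aligned}
\frac{1}{\sqrt{NT}}E\left[\Sigma^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T}\varepsilon_{i,t}y_{i,t-1}'\right] &= -\sqrt{\rho}\,T\,\Sigma^{-1}E[\varepsilon_i y_i'] + o(1)\\
&= -\frac{\sqrt{\rho}}{T}\Sigma^{-1}\left(\sum_{t=0}^{T-2}\left(\sum_{l=0}^{t}\Phi_0^l\right)\Sigma_{T-1-t}\right)' + o(1)\\
&= -\frac{\sqrt{\rho}}{T}\Sigma^{-1}\left((I_m-\Phi_0)^{-1}\sum_{t=0}^{T-2}\Sigma_{T-1-t}\right)' + \frac{\sqrt{\rho}}{T}\Sigma^{-1}\left((I_m-\Phi_0)^{-1}\sum_{t=0}^{T-2}\Phi_0^{t+1}\Sigma_{T-1-t}\right)' + o(1)\\
&= -\sqrt{\rho}\,(I_m-\Phi_0')^{-1} + \frac{\sqrt{\rho}}{T}\Sigma^{-1}\left((I_m-\Phi_0)^{-1}\sum_{t=0}^{T-2}\Phi_0^{t+1}\Sigma_{T-1-t}\right)' + o(1)\\
&= -\sqrt{\rho}\,(I_m-\Phi_0')^{-1} + o(1).
\end{aligned}$$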
Here the last line follows if one assumes that the sequence $\{\Sigma_s\}$ is bounded (and $\Phi_0$ is stable), so that the sum term is of order $O(1)$. Hence, assuming that $N/T \to \rho$, the standardized score $(NT)^{-1/2}\nabla_\phi(\kappa_0)$ has an asymptotic distribution correctly centered at zero. As a result, the large-$N, T$ distribution of the TML estimator is identical to that of the bias-corrected FE estimator of Hahn and Kuersteiner (2002).
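The two limits derived above can be checked numerically for a given $\Phi_0$. The sketch assumes Design 2's $\Phi_0$ and $\Sigma_0$, a hypothetical bounded heteroscedasticity pattern $\Sigma_t = (1 + 0.5\sin t)\Sigma_0$ and a hypothetical positive definite $\Psi_{u,0}$. The function `fe_score_bias_over_sqrt_rho` (our name) evaluates the second line of the FE-score derivation divided by $\sqrt{\rho}$ for finite $T$. For large $T$ it should be close to $-(I_m - \Phi_0')^{-1}$, which is offset by the limit of the Between term.

```python
import numpy as np


def fe_score_bias_over_sqrt_rho(phi0, sigmas):
    """-(1/T) Sbar^{-1} ( sum_{t=0}^{T-2} (sum_{l=0}^{t} Phi0^l) Sigma_{T-1-t} )'.

    sigmas[k] holds Sigma_{k+1}, k = 0, ..., T-1; Sbar is their average.
    """
    t_len = len(sigmas)
    m = phi0.shape[0]
    sbar = sum(sigmas) / t_len
    partial = np.zeros((m, m))  # running sum_{l=0}^{t} Phi0^l
    power = np.eye(m)           # Phi0^t
    acc = np.zeros((m, m))
    for t in range(t_len - 1):  # t = 0, ..., T-2
        partial = partial + power
        acc = acc + partial @ sigmas[t_len - 2 - t]  # Sigma_{T-1-t}
        power = power @ phi0
    return -np.linalg.solve(sbar, acc.T) / t_len


phi0 = np.array([[0.4, 0.15], [-0.1, 0.6]])
sigma0 = np.array([[0.07, 0.05], [0.05, 0.07]])
t_len = 5_000
sigmas = [(1.0 + 0.5 * np.sin(t)) * sigma0 for t in range(1, t_len + 1)]
within_bias = fe_score_bias_over_sqrt_rho(phi0, sigmas)

# Limit of the Between term divided by sqrt(rho), for a hypothetical p.d. Psi_{u,0}.
a = np.eye(2) - phi0
psi = np.array([[0.5, 0.1], [0.1, 0.3]])
between = np.linalg.solve(a @ psi @ a.T, a @ psi)
print(within_bias)
print(between)
print(within_bias + between)  # close to zero for large T
```

The remaining discrepancy shrinks roughly like $1/T$, consistent with the $o(1)$ terms in the derivation.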
In the previous section we showed that, in the correctly specified model with time-series homoscedasticity, the score of the TML estimator fully removes the induced bias
of the FE estimator. This conclusion was established under the assumption that $N \to \infty$ for a fixed value of $T$. In this section we have extended this result by showing that, in the presence of possible time-series heteroscedasticity, the estimating equations of the TML estimator remove the leading bias of the FE estimator.
2.5 Simulation study
2.5.1 Setup
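We first present the general DGP that can be used to generate the initial conditions $y_{i,0}$:

$$y_{i,0} = a_i + E_i\mu_i + C_i\varepsilon_{i,0}, \qquad \varepsilon_{i,0} \sim IID\left(0_m, \sum_{j=0}^{\infty}\Phi_0^j\Sigma_0(\Phi_0^j)'\right), \qquad (2.18)$$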
for some parameter matrices $a_i$ [$m \times 1$], $E_i$ [$m \times m$] and $C_i$ [$m \times m$]. A special case of this setup is the (covariance) stationary model with $a_i = 0_m$ and $C_i = E_i = I_m$. We distinguish between stability and stationarity conditions. We call the process $\{y_{i,t}\}_{t=0}^{T}$ dynamically stable if $\rho(\Phi) < 1$ and (covariance) stationary if, in addition, its first two moments are constant over time ($t = 0, \dots, T$).
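The stability condition and the innovation covariance in (2.18) are straightforward to compute. The sketch below uses the standard identity $\operatorname{vec}\left(\sum_{j\geq 0}\Phi^j\Sigma(\Phi^j)'\right) = (I_{m^2} - \Phi\otimes\Phi)^{-1}\operatorname{vec}\Sigma$, valid when $\rho(\Phi) < 1$, and draws $y_{i,0}$ from (2.18) with $a$, $E$, $C$ common across units. Gaussian $\varepsilon_{i,0}$ is an assumption of ours; (2.18) only fixes the first two moments. The helper names are hypothetical.

```python
import numpy as np


def spectral_radius(phi):
    return float(np.max(np.abs(np.linalg.eigvals(phi))))


def stationary_cov(phi, sigma):
    """sum_{j>=0} Phi^j Sigma (Phi^j)' via (I - Phi kron Phi)^{-1} vec(Sigma)."""
    m = phi.shape[0]
    if spectral_radius(phi) >= 1.0:
        raise ValueError("Phi is not dynamically stable")
    vec_gamma = np.linalg.solve(np.eye(m * m) - np.kron(phi, phi), sigma.reshape(-1, order="F"))
    return vec_gamma.reshape(m, m, order="F")


def initial_conditions(a, e, c, mu, phi, sigma, rng):
    """y_{i,0} = a + E mu_i + C eps_{i,0} with Gaussian eps_{i,0}; one row per unit."""
    n, m = mu.shape
    eps0 = rng.multivariate_normal(np.zeros(m), stationary_cov(phi, sigma), size=n)
    return a + mu @ e.T + eps0 @ c.T


phi_d2 = np.array([[0.4, 0.15], [-0.1, 0.6]])     # Design 2
sigma_d2 = np.array([[0.07, 0.05], [0.05, 0.07]])
print(spectral_radius(phi_d2))                      # about 0.50498
gamma0 = stationary_cov(phi_d2, sigma_d2)

rng = np.random.default_rng(5)
n = 100_000
y0 = initial_conditions(np.zeros(2), np.eye(2), np.eye(2), np.zeros((n, 2)), phi_d2, sigma_d2, rng)
print(np.cov(y0, rowvar=False))
print(gamma0)
```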
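In what follows we set $a_i = 0_2$ for all designs.22 We generate the individual heterogeneity $\mu_i$ (rather than $\eta_i$) using a procedure similar to BHP:

$$\mu_i = \pi\left(\frac{q_i - 1}{\sqrt{2}}\right)\eta_i, \qquad q_i \overset{iid}{\sim} \chi^2(1), \qquad \eta_i \overset{iid}{\sim} N(0_2, \Sigma_\eta). \qquad (2.19)$$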
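A direct transcription of (2.19) is given below; the function name and the use of numpy's random generator are ours. Because $E[((q_i-1)/\sqrt{2})^2] = 1$ and $q_i$ is independent of $\eta_i$, the covariance of $\mu_i$ is $\pi^2\Sigma_\eta$, while its marginal kurtosis is far above the Gaussian value of 3, which is what makes the heterogeneity non-normal. The usage lines take $\Sigma_\eta$ from Design 1 and $\pi = 3$.

```python
import numpy as np


def draw_mu(n, pi, sigma_eta, rng):
    """mu_i = pi * (q_i - 1) / sqrt(2) * eta_i with q_i ~ chi2(1), eta_i ~ N(0, Sigma_eta)."""
    m = sigma_eta.shape[0]
    q = rng.chisquare(df=1, size=n)
    eta = rng.multivariate_normal(np.zeros(m), sigma_eta, size=n)
    return pi * ((q - 1.0) / np.sqrt(2.0))[:, None] * eta


rng = np.random.default_rng(6)
sigma_eta = np.array([[0.123, 0.015], [0.015, 0.123]])   # Design 1
mu = draw_mu(200_000, 3.0, sigma_eta, rng)
cov_mu = np.cov(mu, rowvar=False)
kurt_1 = np.mean(mu[:, 0] ** 4) / np.mean(mu[:, 0] ** 2) ** 2
print(cov_mu)       # close to 9 * sigma_eta
print(kurt_1)       # far above 3
```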
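Unlike BHP, we do not fix $\Sigma_\eta = \Sigma$; instead, we extend the approach of Kiviet (2007) by specifying23,24

$$\operatorname{vec}\Sigma_\eta = \left(\frac{1}{T}\sum_{t=1}^{T}\left(\Phi_0^t(E-I_m)+I_m\right)\otimes\left(\Phi_0^t(E-I_m)+I_m\right)\right)^{-1}(I_{m^2}-\Phi_0\otimes\Phi_0)^{-1}\operatorname{vec}\Sigma_0.$$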
22 In the online appendix some additional results for Design 2 are presented with $a_i = \iota_2$.
23 See the online appendix of this paper.
24 If the variance of εi,t differs between individuals, we evaluate this expression at Σn rather than at Σ.
The way we generate $\mu_i$ ensures that the individual heterogeneity is not normally distributed, but is still IID across individuals. In the effect-stationary case, the particular way the $\mu_i$ are generated does not influence the behavior of the TML log-likelihood function. However, the non-normality of $\mu_i$ in the effect non-stationary case implies non-normality of $u_{i,0}$ and, hence, a quasi-maximum likelihood interpretation of the likelihood function. With respect to the error terms, we restrict our attention to normally distributed $\varepsilon_{i,t}$ for all $i, t$.25
2.5.2 Designs
The parameter set common to all designs consists of a triplet $\{N; T; \pi\}$ with possible values

$$N \in \{100, 250\}, \qquad T \in \{3, 6\}, \qquad \pi \in \{1, 3\}.$$
In the DPD literature it is well known that in the effect-stationary case a higher value of $\pi$ leads to worse finite-sample properties of the GMM estimators; see e.g. Bun and Windmeijer (2010) and Bun and Kiviet (2006). This might also indirectly influence the TML estimator, even in the effect-stationary case, as we use GMM estimators as starting values for the numerical optimization of the log-likelihood function.
In this paper six different Monte Carlo designs are considered. The first one is adapted from the original analysis of BHP, while the other five are constructed to reveal whether the TML estimator is robust with respect to different assumptions regarding the parameter matrix $\Phi_0$, the initial conditions $y_{i,0}$, and cross-sectional and time-series heteroscedasticity. In the case where observations are covariance stationary or cointegrated, BHP calibrated the design matrices $\Phi$ and $\Sigma$ such that the population $R^2_{\Delta l}$26 remained approximately constant ($\approx 0.237$) across designs.
25 The analysis can be extended to cases where the error terms are skewed and/or have fatter tails than the Gaussian distribution. As a partial robustness check, BHP considered t- and chi-square-distributed disturbances, but the results were close to those for the Gaussian setup. The estimation output for these setups was not presented in their paper.
26 The population $R^2$ for stationary series is computed as $R^2_{\Delta l} = 1 - \Sigma_{l,l}/\Gamma_{l,l}$, $l = 1, 2$, where in the covariance stationary case $\operatorname{vec}(\Gamma) = \left(((I_m-\Phi_0)\otimes(I_m-\Phi_0))(I_{m^2}-\Phi_0\otimes\Phi_0)^{-1} + I_{m^2}\right)D_m\sigma$.
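Design 1 (Covariance Stationary PVAR with $\rho(\Phi_0) = 0.8$ from BHP).

$$\Phi_0 = \begin{pmatrix}0.6 & 0.2\\ 0.2 & 0.6\end{pmatrix}, \qquad \Sigma_0 = \begin{pmatrix}0.07 & -0.02\\ -0.02 & 0.07\end{pmatrix}, \qquad \Sigma_\eta = \begin{pmatrix}0.123 & 0.015\\ 0.015 & 0.123\end{pmatrix}.$$

The second eigenvalue is equal to 0.4, and the population $R^2_\Delta$ values are $R^2_{\Delta l} = 0.2396$, $l = 1, 2$.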
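Footnote 26's formula can be coded directly. This sketch (the function name is ours) builds $\operatorname{vec}(\Gamma)$ from $\Phi_0$ and $\Sigma_0$, using $\operatorname{vec}\Sigma_0$ in place of $D_m\sigma$ (the two coincide). Applied to Design 1 it returns eigenvalues 0.8 and 0.4 and an $R^2_{\Delta l}$ that agrees with the reported 0.2396 to the third decimal.

```python
import numpy as np


def population_r2_delta(phi0, sigma0):
    """R^2_{Delta,l} = 1 - Sigma_ll / Gamma_ll, with vec(Gamma) as in footnote 26."""
    m = phi0.shape[0]
    a = np.eye(m) - phi0
    vec_sigma = sigma0.reshape(-1, order="F")  # equals D_m vech(Sigma0)
    gamma_map = np.kron(a, a) @ np.linalg.inv(np.eye(m * m) - np.kron(phi0, phi0)) + np.eye(m * m)
    gamma = (gamma_map @ vec_sigma).reshape(m, m, order="F")
    return 1.0 - np.diag(sigma0) / np.diag(gamma)


phi_d1 = np.array([[0.6, 0.2], [0.2, 0.6]])
sigma_d1 = np.array([[0.07, -0.02], [-0.02, 0.07]])
print(np.sort(np.linalg.eigvals(phi_d1).real))  # eigenvalues 0.4 and 0.8
print(population_r2_delta(phi_d1, sigma_d1))
```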
Although the Monte Carlo designs in BHP are well chosen, they are quite limited in
scope as the analysis was mainly focused on the influence of ρ(Φ0). Furthermore, all
design matrices in the stationary designs were assumed to be symmetric and Toeplitz,27
which substantially shrinks the parameter space for Φ0 and Σ.
Design 2 (Covariance Stationary PVAR with $\rho(\Phi_0) = 0.50498$).

$$\Phi_0 = \begin{pmatrix}0.4 & 0.15\\ -0.1 & 0.6\end{pmatrix}, \qquad \Sigma_0 = \begin{pmatrix}0.07 & 0.05\\ 0.05 & 0.07\end{pmatrix}, \qquad \Sigma_\eta = \begin{pmatrix}0.079 & 0.052\\ 0.052 & 0.100\end{pmatrix}.$$

The eigenvalues of $\Phi_0$ in this design are $0.5 \pm 0.070711i$, and the population $R^2_\Delta$ values are $R^2_{\Delta 1} = 0.23434$ and $R^2_{\Delta 2} = 0.23182$. The parameter matrix $\Phi_0$ was chosen such that the population $R^2_\Delta$ values are comparable between Designs 1 and 2, but the extensibility condition is violated.
In Designs 3-4 we study the finite-sample properties of the estimators when the initial
condition is not effect-stationary.28
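Design 3 (Stable PVAR with $\rho(\Phi_0) = 0.50498$). We take $\Phi_0$ and $\Sigma_0$ from Design 2, but with

$$E_i = 0.5 \times I_2, \qquad C_i = I_2, \qquad i = 1, \dots, N,$$

$$\Sigma_{\eta,T=3} = \begin{pmatrix}0.090 & 0.059\\ 0.059 & 0.144\end{pmatrix}, \qquad \Sigma_{\eta,T=6} = \begin{pmatrix}0.083 & 0.055\\ 0.055 & 0.122\end{pmatrix}.$$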
Design 4 (Stable PVAR with $\rho(\Phi_0) = 0.50498$). We take $\Phi_0$ and $\Sigma_0$ from Design 2, but with

$$E_i = 1.5 \times I_2, \qquad C_i = I_2, \qquad i = 1, \dots, N,$$

$$\Sigma_{\eta,T=3} = \begin{pmatrix}0.069 & 0.045\\ 0.045 & 0.074\end{pmatrix}, \qquad \Sigma_{\eta,T=6} = \begin{pmatrix}0.074 & 0.049\\ 0.049 & 0.083\end{pmatrix}.$$

27 Hence they satisfied the “Extensibility” condition.
28 Note that effect non-stationarity in these designs has no impact on the first unconditional moment of the $\{y_{i,t}\}_{t=0}^{T}$ process. This is because $E[\mu_i] = 0_2$ is a sufficient condition for the $\{y_{i,t}\}_{t=0}^{T}$ process to have zero mean. Thus there is no reason to allow for mean non-stationarity by including the $\gamma$ parameter in the log-likelihood function, but it is crucial to allow for a covariance non-stationary initial condition.
In Section 2.4.2 we presented theoretical results for the TML estimator when unrestricted cross-sectional heteroscedasticity is present. The next design is used to investigate the impact of multiplicative cross-sectional heteroscedasticity on the estimators.
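Design 5 (Stable PVAR with $\rho(\Phi_0) = 0.50498$ with non-i.i.d. $\varepsilon_{i,t}$). As a basis for this design we take $\Phi_0$ and $\Sigma_0$ from Design 2, but with

$$E_i = I_2, \qquad C_i = \varphi_i I_2, \qquad \Sigma_{0,i} = \varphi_i^2\Sigma_0, \qquad \varphi_i^2 \overset{iid}{\sim} \chi^2(1), \qquad i = 1, \dots, N.$$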
The last design is dedicated to revealing the robustness properties of the TML estimator when time-series heteroscedasticity is present. From Section 2.4.5 we know that this estimator is not fixed-$T$ consistent in this case.
Design 6 (Stable PVAR with time-series heteroscedasticity). As a basis for this design we take $\Phi_0$ and $\Sigma_0$ from Design 2 and $E_i = C_i = I_2$, but with $\Sigma_{0,t}$ generated as

$$\Sigma_{0,t} = (0.95 - 0.05T + 0.1t) \times \Sigma_0, \qquad t = 1, \dots, T.$$

This particular form of time-series heteroscedasticity was chosen such that $T^{-1}\sum_{t=1}^{T}\Sigma_{0,t} = \Sigma_0$.
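The averaging property of Design 6 can be confirmed in a few lines; the helper name is ours. The scale factor $0.95 - 0.05T + 0.1t$ is smallest at $t = 1$, where it equals $1.05 - 0.05T$, so this particular scheme only produces positive definite $\Sigma_{0,t}$ for $T \leq 20$, which comfortably covers the values $T \in \{3, 6\}$ used here.

```python
import numpy as np


def design6_sigmas(sigma0, t_len):
    """Sigma_{0,t} = (0.95 - 0.05 T + 0.1 t) Sigma_0 for t = 1, ..., T."""
    return [(0.95 - 0.05 * t_len + 0.1 * t) * sigma0 for t in range(1, t_len + 1)]


sigma0 = np.array([[0.07, 0.05], [0.05, 0.07]])  # Design 2
for t_len in (3, 6):
    sig_t = design6_sigmas(sigma0, t_len)
    scales = [s[0, 0] / sigma0[0, 0] for s in sig_t]
    print(t_len, np.round(scales, 2), np.allclose(sum(sig_t) / t_len, sigma0))
```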
For convenience we have multiplied both the mean and the median bias by 100. As in BHP, we only present results for φ11 and φ12, as results for the other two parameters are similar both quantitatively and qualitatively. The number of Monte Carlo simulations is set to B = 10,000.
2.5.3 Technical remarks
As starting values for the TMLE estimation algorithm we used estimators available in closed form. Namely, we used “AB-GMM”, “Sys-GMM”, FDLS, the additive bias-corrected FE estimator as in Kiviet (1995) and the bias-corrected estimator of Hahn and Kuersteiner (2002). Here “AB-GMM” stands for the Arellano and Bond (1991) estimator; “Sys-GMM” is the System estimator of Blundell and Bond (1998), which incorporates moment conditions based on the initial condition. All aforementioned
GMM estimators are implemented in two steps, with the usual clustered weighting
matrix used in the second step.29
We denote by “TMLE” the global maximizer of the TML objective function in (2.12). By “TMLEr” we denote the estimator obtained in the same way as “TMLE”, except that instead of the global maximum, the local maximum satisfying the restriction $|\Theta - \Sigma| \geq 0$ is selected when possible30 and the global maximum otherwise. The TML estimator with imposed covariance stationarity is denoted by “TMLEc”. Finally, we denote by “TMLEs” the estimator obtained by choosing the local maximum of the TMLE objective function with the lowest spectral norm of $\hat\Phi$.31 This choice is motivated by the fact that for a univariate three-wave panel the second mode is always larger than the true mode; in a PVAR one can think of the spectral norm as a measure of distance.
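The selection rules behind TMLE, TMLEr and TMLEs can be written as simple functions of a list of local maxima. The sketch below assumes that each local maximum is stored with its $\hat\Phi$, $\hat\Theta$, $\hat\Sigma$ and log-likelihood value (the `LocalMax` container is ours). It implements the determinant check $|\Theta - \Sigma| \geq 0$ literally (see footnote 30) and, as an assumption not stated in the text, picks the highest log-likelihood when several local maxima pass that check. The two candidates in the usage lines are hypothetical.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class LocalMax:
    phi: np.ndarray
    theta: np.ndarray
    sigma: np.ndarray
    loglik: float


def select_tmle(cands):
    """Global maximum of the TML objective."""
    return max(cands, key=lambda c: c.loglik)


def select_tmler(cands):
    """Best local maximum with det(Theta - Sigma) >= 0, else the global maximum."""
    admissible = [c for c in cands if np.linalg.det(c.theta - c.sigma) >= 0.0]
    return max(admissible or cands, key=lambda c: c.loglik)


def select_tmles(cands):
    """Local maximum whose Phi has the smallest spectral norm (largest singular value)."""
    return min(cands, key=lambda c: np.linalg.norm(c.phi, 2))


# Two hypothetical local maxima: one near the true Phi, one near the pseudo value I_2.
sigma = np.array([[0.07, 0.05], [0.05, 0.07]])
near_true = LocalMax(np.array([[0.4, 0.15], [-0.1, 0.6]]), sigma + np.diag([0.2, 0.3]), sigma, -10.2)
near_pseudo = LocalMax(np.eye(2), sigma + np.diag([0.1, -0.05]), sigma, -10.0)
cands = [near_true, near_pseudo]
print(select_tmle(cands) is near_pseudo)
print(select_tmler(cands) is near_true, select_tmles(cands) is near_true)
```

In this toy case the global maximum sits at the pseudo mode, while both restricted rules recover the candidate near the true $\Phi_0$. This mirrors the motivation given in the text.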
Regarding inference, for all TML estimators we present results based on robust “sandwich”-type standard errors, labeled (r). For the GMM estimators, we provide rejection frequencies based on the commonly used Windmeijer (2005) corrected standard errors.
2.5.4 Results: Estimation
In this section we briefly summarize the main findings of the MC study as presented
in Tables 2.1 to 2.6. Inference related issues are discussed in the next section.
Design 1. For the GMM estimators the results are similar to those in BHP. Irrespective of $N$, the properties of all GMM estimators deteriorate as $T$ and/or $\pi$ increase, and these effects are substantial for both diagonal and off-diagonal elements of $\Phi$. Similarly, we can see that for small values of $T$ the performance of the TML estimator is directly related to the bias and RMSE properties of the GMM estimators.32 Hence using starting values that are biased towards the pseudo-true value helps to find the second mode, which happens to be the global maximum in that replication. On the other hand, if the resulting estimators are restricted in some way (TMLEs, TMLEr, TMLEc), the strong dependence on starting values is no longer present (especially for TMLEs). In terms of both bias and RMSE, the TMLEc estimator performs remarkably well irrespective of the design parameter values, for both diagonal and off-diagonal elements. The FDLS estimator performs marginally worse than the TMLEc estimator but still outperforms all the GMM estimators. All the TML estimators (except TMLEc) tend to have an asymmetric finite-sample distribution, which results in corresponding discrepancies between the mean and median estimates.

29 That is, it takes the form “Z′uu′Z”.
30 In principle this restriction is necessary but not sufficient for $\Theta - \Sigma$ to be p.s.d. However, for the purpose of exposition in this paper we stick to this condition rather than checking the non-negativity of the corresponding eigenvalues.
31 However, unlike the univariate studies of Hsiao et al. (2002) and Hayakawa and Pesaran (2012), where the $\phi$ parameter was restricted to lie in the stationary region, no restrictions on the parameter space of $\phi$ are imposed in our numerical routine for the TMLE.
32 This contrasts sharply with the finite-sample results presented in BHP.
In Section 2.4.4 we mentioned that the second mode of the unrestricted TML estimator is located at $\Phi = I_m$. The results in Table 2.1 show that the diagonal elements of the TML estimator are positively biased towards 1, while the off-diagonal elements are negatively biased towards 0 (at least for small $N$ and $T$). Thus the bimodality problem remains a substantial issue even for $T > 2$, and choosing the global optimum is not always the best strategy, as TMLEs clearly dominates TMLE for small values of $T$. For $T = 6$, TMLEr and TMLEs provide equivalent results and some improvement over the “global” standard TMLE.
Design 2. One implication of this setup is that the FDLS estimator is not consistent. More importantly, for this setup we do not know whether the bimodality issue is still present even for $T = 2$, so the need for the TMLEr and TMLEs estimators is less obvious. However, the motivation becomes clear once we look at the corresponding results in Table 2.2. TMLEs and TMLEr dominate TMLE in all cases, with TMLEs being the preferred choice. The bias of the TML estimator, in terms of both magnitude and sign, does not change dramatically compared to Design 1. Observe that the bias of TMLEc in the diagonal elements does not decrease with $T$ fast enough to match the performance of the TMLEr/TMLEs estimators, while for the off-diagonal elements quite a substantial bias remains even for $N = 250$, $T = 6$ (footnote 33).
Designs 3 and 4. As expected, the properties of Sys-GMM (which relies on moment conditions implied by effect stationarity) deteriorate significantly compared to Design 2. We observe that for $\pi = 1$ the AB-GMM estimator is more biased than in Design 2 (for Design 3), but is less biased if $\pi = 3$. The intuition behind these patterns is similar to that presented by Hayakawa (2009a) in the univariate setting. Unlike in the previous designs, the TML estimator exhibits lower bias for $\pi = 3$, despite the fact that the quality of the starting values deteriorated in the same way as in the effect-stationary case. The degree of effect non-stationarity of the initial conditions considered in these designs is sufficient to ensure that the restrictions imposed on the TMLEr estimator are satisfied even for small values of $N$ and $T$.

33 As will turn out later, these properties play a major role in explaining the finite-sample properties of the LR test of covariance stationarity, which is presented in the online appendix.
Design 5. Unlike in Designs 3-4, the setup of Design 5 has no impact on the consistency of the estimators (except FDLS). As can be clearly seen from Table 2.6, the same cannot be said about the variance of the estimators. The introduction of cross-sectional variation in $\Sigma_{0,i}$ affects all estimation techniques through higher RMSE/MAE values. On the other hand, the effects on bias are less clear, with improvements for some estimators and higher bias for others.
Design 6. In this setup all TML estimators are inconsistent due to the time-series heteroscedasticity, but the TMLEc estimator seems to be affected most in terms of both bias and precision. Comparing the results in Tables 2.2 and 2.6, we see that the diagonal elements ($\phi_{11}$ in this case) are affected most, while the estimation quality of the off-diagonal elements remains unaffected. Furthermore, the Sys-GMM estimator, albeit still consistent, also shows some signs of deteriorating finite-sample properties. For $T = 6$ the bias of the TMLE/TMLEs/TMLEr estimators diminishes, as can be expected given that the bias is of order $O(T^{-2})$.
2.5.5 Results: Inference
Below we briefly summarize the main findings regarding the size and power of the two-
sided t-test for φ11. Results for the other entries are available from the author upon
request.
• Except for TMLEc, for N = 100 all estimators result in substantially oversized test statistics with relatively low power. In many cases rejection frequencies for alternatives close to the unit circle are of similar magnitude to the size.
• When the estimator is consistent, inference based on TMLEc serves as a benchmark for both size and power.
• In designs with an effect-stationary initial condition (except N = 250, T = 6, discussed next), the empirical rejection frequencies based on all the TML estimators (except TMLEc) and on the AB-GMM estimator do not yield symmetric power curves, due to the substantial finite-sample bias of these estimators.
• Results for T = 6 and N = 250 suggest that the TML estimators without imposed stationarity restrictions are well sized and have good power properties in all designs, with almost perfectly symmetric power curves.
• Although all the TML estimators (without imposed stationarity restriction) are inconsistent with time-series heteroscedastic error terms, the actual rejection frequencies for N = 250 are only marginally worse than in the benchmark case. The same, however, cannot be said about the TMLEc estimator.
• In the design with cross-sectional heteroscedasticity, the TML-based test statistics become more oversized compared to the benchmark case. The only exception is the case with N = 250 and T = 6, where the actual size increases by at most 1%.
The results on bias and size presented here suggest that under the assumption of time
homoscedasticity, likelihood based techniques might serve as a viable alternative to the
GMM based methods in the simple PVAR(1) model. Particularly, the TML estimator
of BHP tends to be robust with respect to non-stationarity of the initial condition
and cross-sectional heterogeneity of parameters. Furthermore, in finite samples likeli-
hood based methods are robust even if smooth time-series heteroscedasticity is present.
However, the TML estimator might suffer from serious bimodality problems when the
number of cross-sectional units is small and the length of time series is short. In these
cases the resulting estimator heavily depends on the way the estimator is chosen. For
some designs in 30%−40% of all MC replications no local maxima satisfying |Θ−Σ|> 0
was available even for N = 250. However, this problem becomes marginal once T = 6
where such fractions drop to 1% − 10%. Based on these results we suggest that the
resulting TMLE estimator is chosen such that (when possible) local maxima should
satisfy the p.s.d. restriction |Θ − Σ|> 0 (TMLEr), and otherwise the solution with
smaller spectral norm should be chosen (TMLEs).
2.6 Conclusions
In this paper we provide a thorough analysis of the performance of fixed T consistent
estimation techniques for the PVARX(1) model based on observations in first differ-
ences. We have mostly emphasized the results and properties of the likelihood based
method. We have extended the approach of BHP with the inclusion of strictly exoge-
nous regressors and shown how to construct a concentrated likelihood function for the
autoregressive parameter only.
Chapter 2. First Difference Transformation in Panel VAR models 42
The key finding of this paper is that in the three-wave panel the expected log-likelihood
function of BHP in the univariate setting does not have a unique maximum at the
true value. This result has been shown to be robust irrespective of the initialization.
Furthermore, we have provided a sufficient condition for this result to hold for the
PVAR(1) in the three-wave panel.
Finally, we have conducted an extensive MC study with the emphasis on designs where
the set of standard assumptions about the stationarity and the cross-sectional ho-
moscedasticity were violated. Results suggest that likelihood based inference techniques
might serve as a feasible alternative to GMM based methods in a simple PVARX(1)
model. However, for small values of N and/or T the TML estimator is vulnerable to
the choice of the starting values for the numerical optimization algorithm. These finite
sample findings have been related to the bimodality results derived in this paper. We
proposed several ways of choosing the estimator among local maxima. Particularly, we
suggest that the resulting TMLE estimator be chosen such that local maxima should
satisfy the p.s.d. restriction (TMLEr), and otherwise the solution with smaller spectral
norm should be chosen (TMLEs).
2.A Proofs
First, we define a set of new auxiliary variables, that are used in the derivations
εi,t(φ) ≡ yi,t −Φyi,t−1, εi(φ) ≡ yi −Φyi−
ZN(κ) ≡ 1
N
N∑i=1
T∑t=1
εi,t(φ)εi,t(φ)′, QN(κ) ≡ 1
N
N∑i=1
T∑t=1
yi,t−1εi,t(φ)′,
MN(κ) ≡ T
N
N∑i=1
εi(φ)εi(φ)′, NN(κ) ≡ T
N
N∑i=1
yi−εi(φ)′,
RN ≡1
N
N∑i=1
T∑t=1
yi,t−1y′i,t−1, PN ≡
T
N
N∑i=1
yi−y′i−, Ξ ≡
T−2∑l=0
(T − 1− l)Φl0.
Chapter 2. First Difference Transformation in Panel VAR models 43
In the derivations we use several results concerning differentials (for more details refer
to Magnus and Neudecker (2007))
dlog |X| = tr (X−1(dX)), d(trX) = tr (dX),
d(vecX) = vec (dX), dX−1 = −X−1(dX)X−1,
dXY = (dX)Y +X(dY ), d(X ⊗X) = d(X)⊗X +X ⊗ d(X),
vec (dX ⊗X) = (Im ⊗Km⊗Im)(Im2 ⊗ vecX) vec d(X)
2.A.1 Auxiliary results
Lemma 2.12.
Υ ≡T−1∑l=0
Φl0 − TIm +
(T−2∑l=0
(T − l)Φl0 −T−2∑l=0
Φl0
)(Im −Φ0) = Om.
Proof.
Υ ≡T−1∑l=0
Φl0 − TIm +
(T−2∑l=0
(T − l)Φl0 −T−2∑l=0
Φl0
)(Im −Φ0)
= ΦT−10 +
T−2∑l=0
Φl+10 − TIm + T
(T−2∑l=0
Φl0 −T−1∑l=1
Φl0
)−
(T−2∑l=1
lΦl0 −T−1∑l=1
(l − 1)Φl0
)
= ΦT−10 +
T−1∑l=1
Φl0 − TIm + T (Im −ΦT−10 )−
(T−2∑l=1
Φl0 − (T − 2)ΦT−10
)
= ΦT−10 +
T−2∑l=0
Φl+10 − TΦT−1
0 −
(T−2∑l=1
Φl0 − (T − 2)ΦT−10
)= (1− T )ΦT−1
0 +ΦT−10 + (T − 2)ΦT−1
0 = Om.
Lemma 2.13. Under Assumptions SA* the following equality holds
E [NN(κ0)] =1
TΞΘ0.
for Θ0 = Σ0 + T (Im −Φ0)E[ui,0u′i,0](Im −Φ0)′.
Chapter 2. First Difference Transformation in Panel VAR models 44
Proof. Define Π0 = Φ0 − Im then
E [NN(κ0)′] = E
(T
N
N∑i=1
(yi −Φ0yi−)y′i−
)
= E
[(Π0ui,0 + εi)
((T−1∑s=0
Φs0 − TIm
)yi,0 +
(T−2∑l=0
(T − 1− l)Φl0
)−Π0µi+
)′]
+ E
[(Π0ui,0 + εi)
(T−1∑t=1
t−1∑s=0
Φs0εi,t−s
)′]
= E
[(Π0ui,0 + εi)
(Υyi,0 +
(T−2∑l=0
(T − 1− l)Φl0
)Π0ui,0 +
(T−1∑t=1
t−1∑s=0
Φs0εi,t−s
))′].
In Lemma 2.12 we showed that Υ = Om, thus
E
[T
N
N∑i=1
(yi −Φ0yi−)y′i−
]= E
[(Π0ui,0 + εi)
(ΞΠ0ui,0 +
(T−1∑t=1
t−1∑s=0
Φs0εi,t−s
))′]= (Im −Φ0)E[ui,0u
′i,0](Im −Φ0)′Ξ ′ +
1
TΣ0Ξ
′ =1
TΘ0Ξ
′.
2.A.2 Log-likelihood function
Theorem 2.6.
Let
∆τi =
∆yi,1
∆εi,2...
∆εi,T
, CT =
1 0 · · · 0
1 1. . .
......
. . . . . . 0
1 · · · 1 1
,LT =
0 · · · · · · · · · 0
1. . . . . . . . .
...
0. . . . . . . . .
......
. . . . . . . . . 0
0 · · · 0 1 0
and let D be a [T × T + 1] matrix which transforms a [T + 1 × 1] vector x into a
[T × 1] vector of corresponding first differences. Also define Θ ≡ T (Ψ −Σ) +Σ and
Chapter 2. First Difference Transformation in Panel VAR models 45
Ω ≡ Σ−1Θ. If we denote Γ ≡ Σ−1Ψ it then follows
Σ∆τ = (IT ⊗Σ)
Γ −Im Om · · · Om
−Im 2Im. . . . . .
...
Om. . . . . . . . . Om
.... . . . . . . . . −Im
Om · · · Om −Im 2Im
= (IT ⊗Σ) [(DD′ ⊗ Im) + (e1e
′1 ⊗ (Γ − 2Im))]
= (IT ⊗Σ)[((C ′TCT )−1 ⊗ Im) + (e1e
′1 ⊗ (Γ − Im))
].
Subsequently the determinant is given by (using the fact that |CT |= 1)
|Σ∆τ | = |Σ|T |((C ′TCT )−1 ⊗ Im) + (e1e′1 ⊗ (Γ − Im))|
= |Σ|T |Im + (e′1C′TCTe1(Γ − Im))||(C ′TCT )−1|
= |Σ|T |Im + (e′1C′TCTe1(Γ − Im))|
= |Σ|T |Im + T (Γ − Im)|= |Σ|T |Ω|= |Σ|T−1|Θ|,
where the second line follows by means of the Matrix Determinant Lemma.34 Using
the Woodbury formula we can evaluate Σ−1∆τ
Σ−1∆τ =
[((C ′TCT )−1 ⊗ Im) + (e1e
′1 ⊗ (Γ − Im))
]−1(IT ⊗Σ−1)
= ((C ′TCT )⊗ Im)− ((C ′TCTe1)⊗ Im)((Γ − Im)−1 + TIm
)× ((e′1C
′TCT )⊗ Im)(IT ⊗Σ−1)
= (C ′T ⊗ Im)U (CT ⊗ Im)(IT ⊗Σ−1)
= (C ′T ⊗ Im)U(IT ⊗Σ−1)(CT ⊗ Im),
34Alternatively |Σ∆τ | can be evaluated using the general formula for tridiagonal matrices in Moli-nari (2008).
Chapter 2. First Difference Transformation in Panel VAR models 46
where U is
U = ITm − ((CTe1)⊗ Im)((Γ − Im)Ω−1
)((e′1C
′T )⊗ Im)
= ITm − (ıT ⊗ Im)((Γ − Im)Ω−1
)(ı′T ⊗ Im) = ITm − ıT ı′T ⊗
((Γ − Im)Ω−1
)= ITm −
1
TıT ı′T ⊗
(Im −Ω−1
)= ITm −
1
TıT ı′T ⊗ Im +
1
TıT ı′T ⊗Ω−1
= WT ⊗ Im +1
TıT ı′T ⊗Ω−1,
so that
Σ−1∆τ = (C ′T ⊗ Im)
[WT ⊗Σ−1 +
1
TıT ı′T ⊗Θ−1
](CT ⊗ Im).
Now using the fact that R = (ITm −LT ⊗Φ) and defining zi = (yi,0, . . . ,yi,T )
Z ≡ (CT ⊗ Im)(ITm −LT ⊗Φ) vec (ziD′)
= vec (ziD′C ′T −ΦziD′L′TC ′T ) = vec ((CTDz
′i)′ −Φ(CTLTDz
′i)′)
= vec ((Yi − ıTyi,0)′ −Φ(Yi− − ıTyi,0)′).
Hence the log likelihood function of BHP can be rewritten in the following way (where
κ = (φ′,σ′,θ′)′)
`(κ) = c− N2
((T − 1) log |Σ|+ log |Θ|+ tr (Σ−1ZN(κ)) + tr (Θ−1MN(κ))
). (2.20)
In order to include exogenous regressors in the model we denote the following quantities
γ = G∆X†i , Xi = (xi,1, . . . ,xi,T ).
The Z term in this case is given by
Z ≡ (CT ⊗ Im) ((ITm −LT ⊗Φ) vec (ziD′)− (IT ⊗B) vec (∆Xi)− vec (γe′1))
= vec ((Yi − ıT (yi,0 + γ))′ −Φ(Yi− − ıTyi,0)′ −B(Xi − ıTxi,0)′).
Result follows directly based on derivations for PVAR(1) model by redefining ZN and
MN .
Chapter 2. First Difference Transformation in Panel VAR models 47
2.A.3 Score vector
Proposition 2.7.
Here for simplicity we derive the first differential of `(κ) without exogenous regressors
− 2
Nd`(κ) = (T − 1) tr (Σ−1(dΣ)) + tr (Θ−1(dΘ))
− tr (Σ−1(dΣ)Σ−1ZN(κ))− tr (Θ−1(dΘ)Θ−1MN(κ))
+ tr (Σ−1(dZN(κ))) + tr (Θ−1(dMN(κ)))
= tr (Σ−1((T − 1)Σ −ZN(κ))Σ−1(dΣ))
+ tr (Θ−1(Θ −MN(κ))Θ−1(dΘ))
− 2 tr(Σ−1((dΦ)QN(κ))
)− 2 tr
(Θ−1((dΦ)NN(κ))
).
Based on these derivations we conclude that the corresponding [2m2 + m × 1] score
vector is given by
∇(κ) = N
vec (Σ−1QN(κ)′ +Θ−1NN(κ)′)
D′m vec (−12(Σ−1((T − 1)Σ −ZN(κ))Σ−1))
D′m vec (−12(Θ−1(Θ −MN(κ))Θ−1))
. (2.21)
The mean zero result follows directly from Lemma 2.13 and the fact that E[Σ−10 QN(κ0)′] =
−(1/T )Ξ ′ (the “Nickell bias”).
Proposition 2.8.
We need to derive the exact expression for vec dΘ under the assumption that vec E[ui,0u′i,0] =
(Im2 − Φ ⊗ Φ)−1 vecΣ. First, we rewrite the expression for vecΘ (we prefer to work
with vec (·) rather than vech (·) to avoid excessive use of duplication matrix Dm)
vecΘ = vecΣ + T ((Im −Φ)⊗ (Im −Φ)) vec E[ui,0u′i,0]
= vecΣ + T ((Im −Φ)⊗ (Im −Φ)) (Im2 −Φ⊗Φ)−1 vecΣ = Jσθ vecΣ.
Using rules for differentials we get that
d(vecΘ) = Jσθ d(vecΣ) + d(Jσθ) vecΣ.
Chapter 2. First Difference Transformation in Panel VAR models 48
Using the product rule for differentials
1
Td(Jσθ) = − (d(Φ)⊗ (Im −Φ) + (Im −Φ)⊗ d(Φ)) (Im2 −Φ⊗Φ)−1
+ ((Im −Φ)⊗ (Im −Φ)) (Im2 −Φ⊗Φ)−1
× (d(Φ)⊗Φ+Φ⊗ d(Φ)) (Im2 −Φ⊗Φ)−1.
Recall the definition of E[ui,0u′i,0] = Ψ0 and ψ0 = vecΨ0. As d(Jσθ) vecΣ is already
a vector by taking vec (·) of this term nothing changes
1
Tvec (d(Jσθ) vecΣ) = −(ψ′0 ⊗ Im2) vec (d(Φ)⊗ (Im −Φ) + (Im −Φ)⊗ d(Φ))
+ (ψ′0 ⊗(((Im −Φ)⊗ (Im −Φ)) (Im2 −Φ⊗Φ)−1
))
× vec (d(Φ)⊗ (Φ) + (Φ)⊗ d(Φ)).
Using the formula for vec (dX ⊗X)
1
Td(Jσθ) vecΣ =− (ψ′0 ⊗ Im2)(Im ⊗Km⊗Im)(Im2 ⊗ (j − φ) + (j − φ)⊗ Im2) dφ
+ (ψ′0 ⊗(((Im −Φ)⊗ (Im −Φ)) (Im2 −Φ⊗Φ)−1
))
× (Im ⊗Km⊗Im)(Im2 ⊗ φ+ φ⊗ Im2) dφ
Recall the definition of Jφθ to conclude that
d(Jσθ) vecΣ = Jφθ dφ. (2.22)
The desired results follows by combining the differential results for dvecΘ with the
proof of Proposition 2.7.
Proposition 2.10.
Consider the score vector evaluated at κ
∇(κ) = N
vec(Σ−1
0 QN(φ0)′ + Θ−1NN(φ0)′)
D′m vec (−12(Σ−1
0 ((T − 1)Σ0 −ZN(φ0))Σ−10 ))
D′m vec (−12(Θ−1(Θ −MN(φ0))Θ−1))
. (2.23)
Now observe that the mean of E[ui,0] does not influence the “Nickell bias” E[Σ−10 QN(φ0)′] =
−(1/T )Ξ ′ and the unbiasedness of the FE estimator of Σ as E[ZN(φ0)] = (T − 1)Σ0.
On the other hand MN(φ0) and NN(φ0) are (implicitly) influenced by γ. Similarly as
Chapter 2. First Difference Transformation in Panel VAR models 49
in the proof of 2.13
E
[T
N
N∑i=1
(yi −Φ0yi−)y′i−
]= E
[(Π0ui,0 + εi)
(ΞΠ0ui,0 +
(T−1∑t=1
t−1∑s=0
Φs0εi,t−s
))′]= Π0 E[ui,0u
′i,0]Π ′0Ξ
′ +1
TΣ0Ξ
′ =1
TΘΞ ′.
Note that this term depends on the second uncentered moment of ui,0 rather than
second centered moment of ui,0. Finally
E
[T
N
N∑i=1
(yi −Φ0yi−)(yi −Φ0yi−)′
]= T E
[(Π0ui,0 + εi) (Π0ui,0 + εi)
′]= TΠ0 E[ui,0u
′i,0]Π ′0 +Σ0 = Θ.
Combining all results we conclude that E[∇(κ)] = 0.
Proposition 2.9.
To see that E[∇(κN)] = 0 we just make use of the proof of Proposition 2.10. Note
that
E
[T
N
N∑i=1
(yi −Φ0yi−)y′i−
]
=1
N
N∑i=1
E
[(Π0ui,0 + εi)
(ΞΠ0ui,0 +
(T−1∑t=1
t−1∑s=0
Φs0εi,t−s
))′]
= Π01
N
(N∑i=1
E[ui,0u′i,0]
)Π ′0Ξ
′ +1
TΣNΞ
′ =1
TΘNΞ
′
and
E
[T
N
N∑i=1
(yi −Φ0yi−)(yi −Φ0yi−)′
]=T
N
N∑i=1
E[(Π0ui,0 + εi) (Π0ui,0 + εi)
′]= TΠ0
1
N
(N∑i=1
E[ui,0u′i,0]
)Π ′0 + ΣN = ΘN .
On the other hand E[Σ−1N QN(φ0)′] = −(1/T )Ξ ′ and E[ZN(φ0)] = (T − 1)ΣN . Com-
bining these intermediate results the desired final conclusion E[∇(κN)] = 0 follows.
Note that in this case E[ui,0] is allowed to be non-zero and individual specific.
Chapter 2. First Difference Transformation in Panel VAR models 50
2.A.4 Bimodality
Proof Theorem 2.11.
Denote the true value for θ2 as θ20 that for general T is equal to
θ20 = σ2
0 + T (1− φ0)2 E[u2i,0].
Thus at T = 2 it is equal to
θ20 = σ2
0 + 2(1− φ0)2 E[u2i,0]
For some φ we denote the following variables
θ2φ = E
[2
N
N∑i=1
(yi − φyi−)2
], σ2
φ = E
[1
N
N∑i=1
2∑t=1
(yi,t − φyi,t−1)2
].
and a = φ0 − φ.
As we assume that the observations are i.i.d. it is sufficient to analyze the previous
expressions for some arbitrary individual i. First, we proceed with the expression for
σ2φ (recall the definition of x)
σ2φ = E
[1
N
N∑i=1
2∑t=1
(yi,t − φyi,t−1)2
]= 0.5 E [(∆yi,2 − φ∆yi,1)2]
= 0.5 E [(∆εi,2 + (φ0 − φ)∆yi,1)2]
= 0.5 E [(∆εi,2 + (φ0 − φ)((1− φ0)ui,0 + εi,1))2]
= 0.5 E [(εi,2 + (φ0 − φ)(1− φ0)ui,0 + (φ0 − φ− 1)εi,1)2]
= 0.5(σ20(1 + (φ0 − φ− 1)2) + (φ0 − φ)2(1− φ0)2 E[u2
i,0])
= 0.5σ20
(1− 2(φ0 − φ) + 1 + (φ0 − φ)2x
)= 0.5σ2
0
(a2x+ 2(1− a)
)Similarly we can derive expressions for θ2
0 and θ2φ in terms of x and a.
θ20 = σ2
0 + 2(1− φ0)2 E[u2i,0] = σ2
0 (2x− 1)
Chapter 2. First Difference Transformation in Panel VAR models 51
While for θ2φ it follows that
θ2φ = E
[2
N
N∑i=1
(yi − φyi−)2
]= 2 E
[(ui − ui,0 − φ(ui,− − ui,0))2
]= 2 E
[(εi + φ0ui,− − ui,0 − φ(ui,− − ui,0))2
]= 0.5 E
[(εi,2 + εi,1 + φ0(ui,1 + ui,0)− 2ui,0 − φ(ui,1 − ui,0))2
]= 0.5 E
[(εi,2 + εi,1(1 + φ0 − φ) + ui,0(φ0(1 + φ0)− 2− φ(φ0 − 1)))2
]= 0.5σ2
0
[1 + (1 + a)2 + (1− φ0)2 E[u2
i,0](a+ 2)2]
= 0.5σ20
[1 + (1 + a)2 + (1− φ0)2 E[u2
i,0](a+ 2)2/σ20
]= 0.5σ2
0
[1 + (1 + a)2 + (x− 1)(a+ 2)2
]= 0.5σ2
0
[a2x+ (a+ 1)(4x− 2)
].
Continuing,
σ2φθ
2φ = 0.25σ4
0
(a2x− 2(a− 1)
) (a2x+ (a+ 1)(4x− 2)
)= 0.25σ4
0
(a2(a2x2 + 2xa(2x− 2) + (2x− 2)2
)+ 4(2x− 1)
)= 0.25σ4
0
(a2 (ax+ 2(x− 1))2 + 4(2x− 1)
)= 0.25σ4
0
(a2 (ax+ 2(x− 1))2)+ σ2
0θ20.
The first term in brackets is obviously equal for true value φ0 (a = 0) and for
a = 21− xx⇒ φ0 − φ = 2
1− xx⇒ φ = 2
x− 1
x+ φ0.
2.B Iterative bias correction procedure
Asymptotic normality of the estimator can be proved by treating it as the solution of
the following estimating equations
N∑i=1
T∑t=2
((∆yi,t − Υ∆wi,t)∆w
′i,t +
1
2(∆yi,t − Υ∆wi,t)(∆yi,t − Υ∆wi,t)
′S
)= Om×(k+m),
(2.25)
where S = [Im Om×k].
Chapter 2. First Difference Transformation in Panel VAR models 52
Algorithm 1 Iterative Bias-correction procedure FDOLS
1. For k = 1 to kmax:
2. Given Υ (k−1) compute Υ (k) = Υ + (T − 1)Σ(Υ (k−1))S−1N ;
3. If ‖Υ (k) − Υ (k−1)‖< ε, stop. For some pre-specified matrix norm ‖·‖.
To initialize iterations we set Υ (0) = Υ , and Σ(Υ (k−1)) is defined as
Σ(Υ ) =1
2N(T − 1)
N∑i=1
(T∑t=2
(∆yi,t − Υ∆wi,t) (∆yi,t − Υ∆wi,t)′
). (2.24)
Proposition 2.14. Let Assumptions SA be satisfied and assume that the iterative
procedure in Algorithm 1 has a unique fixed point. Then
√N (υiBC − υ0)
d−→ Nm(0m2 ,F), (2.26)
where
F ≡ V −1XV −1, V = (Σ∆ ⊗ Im)− 1
2(Im(k+m) + Km,(k+m))((S
′Σ0S)⊗ Im),
X ≡ plimN→∞
1
N
N∑i=1
vecOi (vecOi)′ ,
Oi ≡T∑t=2
((∆yi,t − Υ0wi,t)w
′i,t +
1
2(∆yi,t − Υ0wi,t)(∆yi,t − Υ0wi,t)
′S
)
Note that the asymptotic distribution of the estimator depends upon the choice of
Σ(Φ). A different asymptotic distribution is obtained if instead of using the Σ esti-
mator in (2.24) we opt for the standard infeasible ML estimator
Σ(Υ ) =1
N(T − 1)
N∑i=1
(T∑t=1
(yi,t −Φyi,t−1 −Bxi,t) (yi,t −Φyi,t−1 −Bxi,t)′).
2.C Tables
Chapter 2. First Difference Transformation in Panel VAR models 53
Table
2.1
:D
esig
n1
N=
100
T=
3π
=1
N=
100
T=
3π
=3
N=
100
T=
6π
=1
N=
100
T=
6π
=3
Mea
nM
ed.
5q
95
qR
MM
AE
Mea
nM
ed.
5q
95
qR
MM
AE
Mea
nM
ed.
5q
95
qR
MM
AE
Mea
nM
ed.
5q
95
qR
MM
AE
φ11
AB
-GM
M-1
5.9
9-1
5.5
7-0
.77
0.4
50.4
30.2
5-3
6.0
0-3
5.3
2-1
.29
0.5
30.6
90.4
3-1
2.2
9-1
1.7
8-0
.34
0.0
80.1
80.1
2-1
9.7
6-1
8.9
6-0
.48
0.0
60.2
60.1
9S
ys-
GM
M2.2
02.9
4-0
.24
0.2
50.1
50.1
017.0
418.2
2-0
.15
0.4
60.2
50.2
07.0
07.1
9-0
.06
0.1
90.1
00.0
825.3
426.1
70.1
10.3
70.2
70.2
6F
DL
S0.3
30.2
1-0
.23
0.2
40.1
40.1
00.3
10.2
0-0
.23
0.2
40.1
40.1
00.1
10.0
5-0
.14
0.1
40.0
90.0
60.0
80.0
3-0
.14
0.1
40.0
90.0
6T
ML
E6.0
03.2
3-0
.25
0.4
70.2
30.1
510.1
17.1
8-0
.25
0.5
30.2
60.1
61.7
80.8
1-0
.11
0.1
80.0
90.0
63.5
42.2
3-0
.11
0.2
20.1
10.0
6T
ML
Ec
-0.5
1-0
.57
-0.2
00.1
90.1
20.0
8-0
.24
-0.4
8-0
.20
0.2
00.1
20.0
8-0
.34
-0.3
0-0
.10
0.0
90.0
60.0
4-0
.36
-0.3
1-0
.10
0.0
90.0
60.0
4T
ML
Es
1.4
1-0
.96
-0.2
60.3
70.1
90.1
31.4
6-0
.98
-0.2
60.3
70.1
90.1
30.8
80.1
9-0
.12
0.1
60.0
80.0
50.8
30.1
3-0
.12
0.1
60.0
80.0
5T
ML
Er
2.4
5-0
.44
-0.2
60.4
10.2
00.1
34.8
51.0
2-0
.26
0.4
80.2
30.1
40.8
80.1
9-0
.12
0.1
60.0
80.0
50.8
40.1
4-0
.12
0.1
60.0
80.0
5φ12
AB
-GM
M-9
.06
-8.5
2-0
.71
0.5
20.4
10.2
3-1
9.6
1-1
8.7
4-1
.18
0.7
50.6
60.3
7-6
.25
-6.1
1-0
.28
0.1
50.1
40.0
9-1
1.3
5-1
1.1
0-0
.40
0.1
60.2
00.1
4S
ys-
GM
M-1
.02
-1.0
8-0
.25
0.2
30.1
50.0
9-5
.51
-5.6
3-0
.35
0.2
40.1
90.1
2-2
.29
-2.1
5-0
.14
0.0
90.0
80.0
5-1
0.5
6-1
0.9
7-0
.22
0.0
30.1
30.1
1F
DL
S-0
.14
-0.1
3-0
.25
0.2
40.1
50.1
0-0
.19
-0.1
8-0
.25
0.2
40.1
50.1
0-0
.15
-0.1
1-0
.16
0.1
60.1
00.0
6-0
.18
-0.1
4-0
.16
0.1
60.1
00.0
6T
ML
E-2
.77
-1.8
7-0
.31
0.2
20.1
60.1
1-3
.27
-2.0
2-0
.34
0.2
30.1
80.1
21.4
91.0
6-0
.10
0.1
40.0
80.0
53.0
02.6
4-0
.09
0.1
60.0
90.0
6T
ML
Ec
-0.4
3-0
.32
-0.1
60.1
50.1
00.0
6-0
.62
-0.5
2-0
.17
0.1
50.1
00.0
6-0
.18
-0.2
1-0
.08
0.0
80.0
50.0
3-0
.19
-0.2
1-0
.08
0.0
80.0
50.0
3T
ML
Es
-3.8
8-3
.37
-0.2
80.1
90.1
50.1
0-4
.04
-3.5
4-0
.28
0.1
80.1
50.1
00.6
90.3
9-0
.10
0.1
20.0
70.0
50.6
40.3
3-0
.10
0.1
20.0
70.0
5T
ML
Er
-3.8
9-3
.43
-0.2
80.1
90.1
50.1
0-4
.77
-4.4
2-0
.29
0.1
80.1
50.1
00.6
90.3
9-0
.10
0.1
20.0
70.0
50.6
30.3
2-0
.10
0.1
20.0
70.0
5N
=250
T=
3π
=1
N=
250
T=
3π
=3
N=
250
T=
6π
=1
N=
250
T=
6π
=3
φ11
AB
-GM
M-6
.62
-7.0
1-0
.44
0.3
30.2
50.1
6-2
1.2
0-2
1.1
5-0
.92
0.4
80.5
00.3
0-5
.57
-5.3
9-0
.20
0.0
90.1
00.0
7-1
1.0
0-1
0.7
1-0
.33
0.0
90.1
70.1
2S
ys-
GM
M0.6
30.9
9-0
.16
0.1
60.1
00.0
710.0
110.7
1-0
.16
0.3
50.1
80.1
32.2
12.2
1-0
.06
0.1
00.0
60.0
415.6
115.7
40.0
10.3
00.1
80.1
6F
DL
S0.0
4-0
.02
-0.1
50.1
50.0
90.0
60.0
3-0
.02
-0.1
50.1
50.0
90.0
60.0
50.1
6-0
.09
0.0
90.0
60.0
40.0
50.1
6-0
.09
0.0
90.0
60.0
4T
ML
E3.6
01.0
5-0
.18
0.3
50.1
60.0
96.3
23.0
7-0
.17
0.4
20.1
90.1
01.0
80.3
0-0
.08
0.1
20.0
60.0
42.1
61.0
5-0
.07
0.1
60.0
70.0
4T
ML
Ec
-0.3
3-0
.38
-0.1
30.1
20.0
70.0
5-0
.30
-0.3
6-0
.13
0.1
20.0
80.0
5-0
.16
-0.1
1-0
.06
0.0
60.0
40.0
2-0
.16
-0.1
1-0
.06
0.0
60.0
40.0
2T
ML
Es
1.4
3-0
.56
-0.1
80.2
80.1
40.0
91.4
7-0
.54
-0.1
80.2
80.1
40.0
90.7
60.1
4-0
.08
0.1
10.0
60.0
30.7
60.1
4-0
.08
0.1
10.0
60.0
3T
ML
Er
1.5
6-0
.51
-0.1
80.2
90.1
40.0
92.1
9-0
.31
-0.1
80.3
30.1
50.0
90.7
60.1
4-0
.08
0.1
10.0
60.0
30.7
60.1
4-0
.08
0.1
10.0
60.0
3φ12
AB
-GM
M-3
.78
-4.2
9-0
.41
0.3
50.2
40.1
5-1
4.0
2-1
4.1
1-0
.86
0.5
70.4
90.2
8-2
.93
-2.8
9-0
.17
0.1
10.0
90.0
6-7
.12
-6.9
0-0
.29
0.1
40.1
50.1
0S
ys-
GM
M-0
.46
-0.3
5-0
.17
0.1
50.1
00.0
6-2
.00
-1.8
1-0
.28
0.2
20.1
50.0
9-0
.37
-0.3
1-0
.09
0.0
80.0
50.0
3-4
.61
-4.6
9-0
.18
0.0
90.0
90.0
7F
DL
S0.0
20.0
1-0
.15
0.1
50.0
90.0
60.0
0-0
.01
-0.1
50.1
50.0
90.0
6-0
.04
0.0
0-0
.10
0.1
00.0
60.0
4-0
.04
0.0
0-0
.10
0.1
00.0
60.0
4T
ML
E-0
.28
0.2
5-0
.19
0.1
70.1
10.0
70.1
01.0
8-0
.23
0.1
90.1
30.0
81.0
20.3
3-0
.07
0.1
10.0
50.0
32.0
70.9
9-0
.06
0.1
30.0
60.0
4T
ML
Ec
-0.1
5-0
.09
-0.1
00.1
00.0
60.0
4-0
.18
-0.1
2-0
.10
0.1
00.0
60.0
4-0
.10
-0.0
4-0
.05
0.0
50.0
30.0
2-0
.10
-0.0
4-0
.05
0.0
50.0
30.0
2T
ML
Es
-1.2
0-0
.92
-0.1
80.1
50.1
00.0
7-1
.27
-0.9
9-0
.18
0.1
50.1
00.0
70.7
10.2
0-0
.07
0.1
00.0
50.0
30.7
10.2
0-0
.07
0.1
00.0
50.0
3T
ML
Er
-1.2
1-0
.95
-0.1
80.1
50.1
00.0
7-1
.57
-1.2
5-0
.19
0.1
40.1
00.0
70.7
10.2
0-0
.07
0.1
00.0
50.0
30.7
10.2
0-0
.07
0.1
00.0
50.0
3
Chapter 2. First Difference Transformation in Panel VAR models 54
Table
2.2
:D
esig
n2
N=
100
T=
3π
=1
N=
100
T=
3π
=3
N=
100
T=
6π
=1
N=
100
T=
6π
=3
Mea
nM
ed.
5q
95
qR
MM
AE
Mea
nM
ed.
5q
95
qR
MM
AE
Mea
nM
ed.
5q
95
qR
MM
AE
Mea
nM
ed.
5q
95
qR
MM
AE
φ11
AB
-GM
M-8
.01
-8.3
9-0
.54
0.4
20.3
10.2
0-2
0.2
4-2
0.7
2-1
.01
0.6
50.5
80.3
5-7
.19
-7.0
9-0
.29
0.1
40.1
50.1
0-1
0.4
4-1
0.3
8-0
.38
0.1
70.2
00.1
3S
ys-
GM
M3.9
84.0
7-0
.28
0.3
40.1
90.1
325.4
826.4
4-0
.19
0.6
80.3
70.2
810.1
59.9
7-0
.07
0.2
80.1
50.1
137.5
538.4
50.1
50.5
70.4
00.3
8F
DL
S-4
.67
-4.8
3-0
.34
0.2
50.1
80.1
3-4
.67
-4.8
2-0
.34
0.2
50.1
80.1
3-5
.43
-5.5
2-0
.24
0.1
40.1
30.0
9-5
.43
-5.5
3-0
.24
0.1
40.1
30.0
9T
ML
E8.3
83.8
6-0
.27
0.5
90.2
70.1
614.8
89.6
2-0
.27
0.7
00.3
40.1
90.2
2-0
.13
-0.1
30.1
40.0
90.0
60.8
6-0
.06
-0.1
30.1
60.1
00.0
6T
ML
Ec
2.2
52.1
7-0
.21
0.2
60.1
40.1
02.4
72.2
3-0
.21
0.2
60.1
50.1
00.5
80.5
6-0
.11
0.1
20.0
70.0
50.5
80.5
6-0
.11
0.1
20.0
70.0
5T
ML
Es
5.3
51.6
7-0
.27
0.5
10.2
40.1
46.4
12.3
6-0
.27
0.5
50.2
50.1
40.1
8-0
.13
-0.1
30.1
40.0
80.0
60.1
8-0
.13
-0.1
30.1
40.0
80.0
6T
ML
Er
5.6
11.7
5-0
.27
0.5
20.2
40.1
48.1
13.1
4-0
.27
0.6
00.2
70.1
50.1
8-0
.13
-0.1
30.1
40.0
80.0
60.1
9-0
.13
-0.1
30.1
40.0
80.0
6φ12
AB
-GM
M-1
.74
-1.4
1-0
.50
0.4
50.2
90.1
80.0
00.0
1-0
.86
0.8
80.5
70.3
2-0
.43
-0.3
4-0
.22
0.2
00.1
30.0
90.4
20.4
8-0
.28
0.2
80.1
70.1
2S
ys-
GM
M-1
.67
-1.6
4-0
.33
0.2
90.1
90.1
3-8
.19
-8.4
7-0
.51
0.3
50.2
70.1
8-2
.94
-2.8
6-0
.20
0.1
40.1
10.0
7-9
.99
-10.0
0-0
.29
0.1
00.1
50.1
1F
DL
S4.9
84.9
8-0
.27
0.3
60.2
00.1
34.9
74.9
7-0
.27
0.3
60.2
00.1
38.3
68.3
9-0
.12
0.2
90.1
50.1
08.3
78.3
9-0
.12
0.2
90.1
50.1
0T
ML
E-0
.88
-0.5
5-0
.38
0.3
40.2
10.1
3-2
.86
-1.5
2-0
.46
0.3
70.2
50.1
60.0
50.1
0-0
.12
0.1
20.0
70.0
50.0
30.1
7-0
.13
0.1
30.0
90.0
5T
ML
Ec
-7.2
7-7
.32
-0.2
70.1
30.1
40.1
0-7
.27
-7.3
1-0
.27
0.1
30.1
40.1
0-3
.31
-3.3
5-0
.14
0.0
70.0
70.0
5-3
.31
-3.3
5-0
.14
0.0
70.0
70.0
5T
ML
Es
-0.6
9-0
.55
-0.3
20.3
00.1
90.1
2-1
.35
-1.1
5-0
.32
0.2
90.1
80.1
20.0
10.0
9-0
.12
0.1
20.0
70.0
50.0
00.0
9-0
.12
0.1
20.0
70.0
5T
ML
Er
-0.6
6-0
.53
-0.3
20.3
00.1
80.1
2-1
.91
-1.7
1-0
.32
0.2
80.1
80.1
20.0
10.0
9-0
.12
0.1
20.0
70.0
50.0
00.0
9-0
.12
0.1
20.0
70.0
5N
=250
T=
3π
=1
N=
250
T=
3π
=3
N=
250
T=
6π
=1
N=
250
T=
6π
=3
φ11
AB
-GM
M-3
.39
-4.0
5-0
.33
0.2
80.1
90.1
3-8
.81
-10.0
9-0
.62
0.4
90.3
50.2
2-3
.16
-3.2
1-0
.17
0.1
10.0
90.0
6-4
.99
-4.9
7-0
.24
0.1
40.1
30.0
8S
ys-
GM
M1.3
71.4
1-0
.19
0.2
10.1
20.0
814.3
714.1
3-0
.19
0.4
80.2
50.1
72.9
12.8
6-0
.08
0.1
40.0
70.0
522.5
922.3
40.0
20.4
40.2
60.2
2F
DL
S-5
.11
-5.1
3-0
.23
0.1
30.1
20.0
8-5
.11
-5.1
3-0
.23
0.1
30.1
20.0
8-5
.62
-5.7
3-0
.17
0.0
60.0
90.0
7-5
.62
-5.7
3-0
.17
0.0
60.0
90.0
7T
ML
E4.0
60.7
8-0
.18
0.4
20.1
80.0
98.1
02.5
5-0
.18
0.5
70.2
40.1
10.0
2-0
.09
-0.0
80.0
80.0
50.0
30.0
3-0
.09
-0.0
80.0
80.0
50.0
3T
ML
Ec
2.3
02.2
9-0
.12
0.1
70.0
90.0
62.3
22.3
0-0
.12
0.1
70.0
90.0
60.7
00.6
8-0
.07
0.0
80.0
40.0
30.7
00.6
8-0
.07
0.0
80.0
40.0
3T
ML
Es
2.8
70.4
0-0
.18
0.3
50.1
60.0
93.1
70.5
0-0
.18
0.3
70.1
70.0
90.0
2-0
.09
-0.0
80.0
80.0
50.0
30.0
2-0
.09
-0.0
80.0
80.0
50.0
3T
ML
Er
2.9
10.4
0-0
.18
0.3
50.1
60.0
93.5
50.5
6-0
.18
0.4
10.1
80.0
90.0
2-0
.09
-0.0
80.0
80.0
50.0
30.0
2-0
.09
-0.0
80.0
80.0
50.0
3φ12
AB
-GM
M-0
.66
-0.8
0-0
.30
0.2
90.1
80.1
20.3
9-0
.13
-0.5
60.5
60.3
50.2
1-0
.06
-0.0
5-0
.14
0.1
40.0
90.0
60.5
90.7
5-0
.19
0.2
00.1
20.0
8S
ys-
GM
M-0
.59
-0.6
2-0
.21
0.1
90.1
20.0
8-5
.03
-4.6
7-0
.39
0.2
80.2
10.1
4-0
.74
-0.7
5-0
.12
0.1
00.0
70.0
4-5
.98
-5.9
5-0
.26
0.1
40.1
30.0
9F
DL
S5.0
95.1
4-0
.14
0.2
50.1
30.0
95.0
95.1
4-0
.14
0.2
50.1
30.0
98.4
98.5
2-0
.05
0.2
10.1
20.0
98.4
98.5
2-0
.05
0.2
10.1
20.0
9T
ML
E0.8
50.3
2-0
.22
0.2
60.1
40.0
8-0
.27
0.1
9-0
.35
0.3
00.1
80.1
00.0
00.0
4-0
.08
0.0
70.0
50.0
30.0
10.0
4-0
.08
0.0
70.0
50.0
3T
ML
Ec
-7.2
8-7
.26
-0.2
00.0
50.1
10.0
8-7
.28
-7.2
7-0
.20
0.0
50.1
10.0
8-3
.29
-3.2
4-0
.10
0.0
30.0
50.0
4-3
.29
-3.2
4-0
.10
0.0
30.0
50.0
4T
ML
Es
0.6
70.1
3-0
.19
0.2
30.1
30.0
80.4
7-0
.04
-0.1
90.2
30.1
30.0
80.0
00.0
4-0
.08
0.0
70.0
50.0
30.0
00.0
4-0
.08
0.0
70.0
50.0
3T
ML
Er
0.6
60.1
2-0
.19
0.2
30.1
30.0
80.2
4-0
.19
-0.2
00.2
20.1
30.0
80.0
00.0
4-0
.08
0.0
70.0
50.0
30.0
00.0
4-0
.08
0.0
70.0
50.0
3
Chapter 2. First Difference Transformation in Panel VAR models 55
Table
2.3
:D
esig
n3
N=
100
T=
3π
=1
N=
100
T=
3π
=3
N=
100
T=
6π
=1
N=
100
T=
6π
=3
Mea
nM
ed.
5q
95
qR
MM
AE
Mea
nM
ed.
5q
95
qR
MM
AE
Mea
nM
ed.
5q
95
qR
MM
AE
Mea
nM
ed.
5q
95
qR
MM
AE
φ11
AB
-GM
M-2
7.5
5-2
2.6
3-1
.33
0.6
40.7
10.3
6-3
.27
-2.2
9-0
.33
0.2
30.2
00.0
9-1
1.6
5-1
1.2
6-0
.41
0.1
60.2
10.1
4-5
.37
-4.3
4-0
.26
0.1
20.1
30.0
7S
ys-
GM
M29.5
130.9
00.0
30.5
10.3
30.3
160.9
461.5
00.5
10.7
00.6
10.6
229.0
329.5
40.1
10.4
60.3
10.3
057.8
458.3
20.5
10.6
30.5
80.5
8F
DL
S10.0
39.8
9-0
.21
0.4
20.2
20.1
464.3
464.5
60.2
71.0
10.6
80.6
50.9
00.7
7-0
.18
0.2
10.1
20.0
834.6
833.5
70.0
60.6
60.3
90.3
4T
ML
E16.3
77.5
5-0
.24
0.8
10.3
60.1
77.4
70.7
3-0
.13
0.7
80.2
70.0
60.3
2-0
.09
-0.1
20.1
30.0
90.0
50.0
7-0
.14
-0.0
80.0
80.0
60.0
3T
ML
Ec
17.7
017.4
3-0
.09
0.4
60.2
40.1
852.1
351.5
50.3
40.7
90.5
40.5
27.8
17.5
4-0
.06
0.2
20.1
20.0
840.9
442.5
70.1
90.5
70.4
30.4
3T
ML
Es
4.0
00.7
2-0
.23
0.4
50.2
10.1
10.2
0-0
.21
-0.1
30.1
40.0
90.0
50.0
1-0
.11
-0.1
20.1
30.0
80.0
5-0
.14
-0.1
5-0
.08
0.0
80.0
50.0
3T
ML
Er
5.2
70.9
4-0
.23
0.5
50.2
30.1
10.4
5-0
.21
-0.1
30.1
50.1
00.0
50.0
1-0
.11
-0.1
20.1
30.0
80.0
5-0
.14
-0.1
5-0
.08
0.0
80.0
50.0
3φ12
AB
-GM
M-1
.70
-3.2
9-1
.03
1.0
30.6
70.3
30.6
10.3
9-0
.24
0.2
50.1
70.0
8-0
.74
-0.6
5-0
.30
0.2
80.1
80.1
11.1
81.0
5-0
.17
0.1
90.1
10.0
6S
ys-
GM
M-3
.10
-3.7
6-0
.23
0.2
00.1
30.0
9-1
0.4
3-1
0.6
0-0
.17
-0.0
30.1
10.1
1-4
.45
-4.6
6-0
.18
0.1
00.0
90.0
7-1
1.7
1-1
1.8
8-0
.16
-0.0
70.1
20.1
2F
DL
S5.6
55.5
0-0
.26
0.3
80.2
00.1
39.0
49.1
2-0
.22
0.4
00.2
10.1
48.5
58.5
8-0
.12
0.2
90.1
50.1
19.8
29.8
0-0
.15
0.3
50.1
80.1
2T
ML
E-0
.76
-0.0
6-0
.49
0.4
30.2
60.1
50.8
30.2
2-0
.17
0.3
00.1
70.0
50.0
60.0
9-0
.11
0.1
10.0
70.0
40.0
80.0
7-0
.08
0.0
70.0
50.0
3T
ML
Ec
-7.4
2-7
.50
-0.2
90.1
50.1
50.1
0-3
.83
-4.8
6-0
.16
0.1
20.0
90.0
7-3
.07
-3.0
3-0
.15
0.0
90.0
80.0
5-5
.02
-5.9
6-0
.18
0.1
10.1
00.0
8T
ML
Es
-0.0
7-0
.33
-0.2
50.2
60.1
60.0
9-0
.04
-0.0
4-0
.12
0.1
20.0
70.0
5-0
.05
0.0
6-0
.11
0.1
10.0
70.0
4-0
.01
0.0
6-0
.08
0.0
70.0
50.0
3T
ML
Er
-0.3
6-0
.60
-0.2
50.2
50.1
50.0
9-0
.07
-0.0
5-0
.12
0.1
20.0
70.0
5-0
.05
0.0
6-0
.11
0.1
10.0
70.0
4-0
.01
0.0
6-0
.08
0.0
70.0
50.0
3N
=250
T=
3π
=1
N=
250
T=
3π
=3
N=
250
T=
6π
=1
N=
250
T=
6π
=3
φ11
AB
-GM
M-1
7.4
4-1
3.6
8-0
.99
0.5
10.5
40.2
6-0
.78
-0.7
8-0
.14
0.1
30.0
80.0
5-5
.85
-5.6
8-0
.26
0.1
30.1
30.0
9-1
.95
-1.6
3-0
.13
0.0
80.0
70.0
4S
ys-
GM
M32.0
632.7
40.1
60.4
60.3
30.3
361.1
761.3
40.5
50.6
70.6
10.6
127.4
427.7
10.1
40.4
10.2
90.2
857.8
458.0
10.5
40.6
10.5
80.5
8F
DL
S9.8
79.7
9-0
.10
0.3
00.1
60.1
166.9
267.0
10.4
20.9
20.6
90.6
70.7
50.6
0-0
.11
0.1
30.0
70.0
535.9
235.2
60.1
70.5
70.3
80.3
5T
ML
E9.6
12.1
4-0
.15
0.6
90.2
60.0
91.1
00.0
1-0
.08
0.0
90.1
10.0
3-0
.03
-0.0
7-0
.07
0.0
80.0
50.0
3-0
.06
-0.1
1-0
.05
0.0
50.0
30.0
2T
ML
Ec
18.1
917.8
70.0
10.3
60.2
10.1
851.3
750.8
60.4
20.6
00.5
20.5
17.9
87.8
5-0
.01
0.1
70.1
00.0
844.0
044.9
70.3
00.5
50.4
50.4
5T
ML
Es
1.2
6-0
.06
-0.1
50.2
10.1
20.0
7-0
.02
-0.0
7-0
.08
0.0
80.0
50.0
3-0
.03
-0.0
7-0
.07
0.0
80.0
50.0
3-0
.06
-0.1
1-0
.05
0.0
50.0
30.0
2T
ML
Er
1.6
2-0
.02
-0.1
50.2
20.1
30.0
7-0
.02
-0.0
7-0
.08
0.0
80.0
50.0
3-0
.03
-0.0
7-0
.07
0.0
80.0
50.0
3-0
.06
-0.1
1-0
.05
0.0
50.0
30.0
2φ12
AB
-GM
M0.1
2-1
.50
-0.7
50.8
20.5
20.2
50.1
00.1
1-0
.12
0.1
20.0
80.0
5-0
.25
-0.1
3-0
.21
0.2
00.1
30.0
80.4
90.5
0-0
.10
0.1
10.0
60.0
4S
ys-
GM
M-3
.98
-4.2
6-0
.16
0.0
90.0
90.0
6-1
0.4
4-1
0.4
9-0
.15
-0.0
60.1
10.1
0-3
.81
-3.9
8-0
.14
0.0
70.0
70.0
5-1
1.7
8-1
1.8
3-0
.14
-0.0
90.1
20.1
2F
DL
S5.8
85.8
2-0
.14
0.2
60.1
30.0
99.4
59.4
7-0
.11
0.2
90.1
50.1
18.7
18.7
6-0
.04
0.2
20.1
20.0
910.0
910.2
2-0
.06
0.2
60.1
40.1
1T
ML
E0.6
00.4
1-0
.38
0.3
70.1
90.0
80.1
3-0
.01
-0.0
70.0
70.0
60.0
3-0
.01
-0.0
2-0
.07
0.0
70.0
40.0
3-0
.01
-0.0
1-0
.05
0.0
50.0
30.0
2T
ML
Ec
-7.5
9-7
.53
-0.2
20.0
60.1
10.0
8-4
.78
-5.2
4-0
.11
0.0
40.0
70.0
6-3
.01
-2.9
6-0
.11
0.0
40.0
60.0
4-6
.36
-6.8
7-0
.15
0.0
40.0
90.0
7T
ML
Es
0.3
2-0
.05
-0.1
40.1
50.0
90.0
6-0
.02
-0.0
4-0
.07
0.0
70.0
40.0
3-0
.01
-0.0
2-0
.07
0.0
70.0
40.0
3-0
.01
-0.0
1-0
.05
0.0
50.0
30.0
2T
ML
Er
0.1
9-0
.13
-0.1
40.1
50.0
90.0
6-0
.02
-0.0
4-0
.07
0.0
70.0
40.0
3-0
.01
-0.0
2-0
.07
0.0
70.0
40.0
3-0
.01
-0.0
1-0
.05
0.0
50.0
30.0
2
Chapter 2. First Difference Transformation in Panel VAR models 56
Table
2.4
:D
esig
n4
N=
100
T=
3π
=1
N=
100
T=
3π
=3
N=
100
T=
6π
=1
N=
100
T=
6π
=3
Mea
nM
ed.
5q
95
qR
MM
AE
Mea
nM
ed.
5q
95
qR
MM
AE
Mea
nM
ed.
5q
95
qR
MM
AE
Mea
nM
ed.
5q
95
qR
MM
AE
φ11
AB
-GM
M-2
.40
-2.6
4-0
.33
0.3
00.1
90.1
3-0
.60
-0.6
7-0
.19
0.1
80.1
10.0
7-3
.95
-4.0
5-0
.21
0.1
20.1
10.0
7-1
.59
-1.5
8-0
.13
0.0
90.0
70.0
4S
ys-
GM
M28.8
229.7
8-0
.11
0.6
60.3
70.3
052.0
851.3
30.4
60.6
10.5
20.5
129.1
530.1
60.0
30.5
20.3
30.3
052.8
052.7
00.5
00.5
60.5
30.5
3F
DL
S7.1
06.7
9-0
.23
0.3
90.2
00.1
356.2
456.1
30.1
70.9
60.6
10.5
60.3
30.2
2-0
.19
0.2
00.1
20.0
832.1
431.2
60.0
40.6
30.3
70.3
1T
ML
E12.1
84.8
5-0
.25
0.7
30.3
20.1
511.0
91.6
9-0
.18
0.9
60.3
30.0
80.6
2-0
.12
-0.1
20.1
40.1
00.0
50.3
2-0
.01
-0.0
90.0
90.0
70.0
4T
ML
Ec
13.5
213.2
3-0
.12
0.4
00.2
10.1
548.7
047.5
30.2
10.8
40.5
20.4
86.9
86.8
3-0
.07
0.2
10.1
10.0
836.2
636.9
50.1
50.5
50.3
80.3
7T
ML
Es
4.8
81.3
0-0
.24
0.5
10.2
20.1
20.8
00.1
4-0
.15
0.1
80.1
10.0
60.1
1-0
.14
-0.1
20.1
30.0
80.0
50.0
2-0
.02
-0.0
90.0
90.0
50.0
4T
ML
Er
5.4
21.4
4-0
.24
0.5
40.2
30.1
21.4
80.1
7-0
.15
0.1
90.1
40.0
60.1
1-0
.14
-0.1
20.1
30.0
80.0
50.0
2-0
.02
-0.0
90.0
90.0
50.0
4φ12
AB
-GM
M-0
.46
-0.2
7-0
.32
0.3
00.1
90.1
3-0
.14
0.0
1-0
.20
0.2
00.1
20.0
8-0
.25
-0.2
4-0
.17
0.1
60.1
00.0
7-0
.06
-0.0
9-0
.11
0.1
10.0
70.0
4S
ys-
GM
M-1
3.2
5-1
3.5
2-0
.43
0.1
70.2
30.1
6-1
6.0
5-1
5.4
8-0
.25
-0.0
90.1
70.1
5-9
.47
-9.7
8-0
.28
0.1
00.1
50.1
1-1
4.8
7-1
4.8
0-0
.18
-0.1
20.1
50.1
5F
DL
S5.6
95.8
0-0
.26
0.3
80.2
00.1
48.7
68.6
9-0
.25
0.4
30.2
30.1
58.5
68.6
9-0
.12
0.2
90.1
50.1
19.6
49.6
7-0
.15
0.3
40.1
80.1
2T
ML
E-0
.26
0.0
7-0
.44
0.4
00.2
40.1
41.0
70.4
0-0
.50
0.5
20.2
60.0
90.1
40.0
4-0
.12
0.1
20.0
80.0
50.1
3-0
.05
-0.0
90.0
90.0
60.0
3T
ML
Ec
-6.0
2-6
.03
-0.2
80.1
60.1
40.1
00.3
7-0
.08
-0.2
00.2
30.1
30.0
8-2
.74
-2.6
8-0
.14
0.0
90.0
80.0
5-0
.54
-1.0
8-0
.17
0.1
80.1
10.0
7T
ML
Es
-0.1
8-0
.31
-0.2
80.2
90.1
70.1
1-0
.08
0.0
1-0
.16
0.1
60.1
00.0
6-0
.03
-0.0
5-0
.11
0.1
10.0
70.0
5-0
.07
-0.0
6-0
.09
0.0
80.0
50.0
3T
ML
Er
-0.1
9-0
.36
-0.2
80.2
90.1
70.1
1-0
.13
-0.0
2-0
.16
0.1
60.1
00.0
6-0
.03
-0.0
5-0
.11
0.1
10.0
70.0
5-0
.07
-0.0
6-0
.09
0.0
80.0
50.0
3N
=250
T=
3π
=1
N=
250
T=
3, π=3 | N=250, T=6, π=1 | N=250, T=6, π=3
φ11
AB-GMM | -0.98 -1.20 -0.20 0.19 0.12 0.08 | -0.22 -0.35 -0.11 0.11 0.07 0.04 | -1.62 -1.66 -0.12 0.09 0.07 0.04 | -0.57 -0.53 -0.07 0.06 0.04 0.03
Sys-GMM | 35.13 36.30 0.01 0.66 0.40 0.36 | 52.40 52.05 0.48 0.58 0.52 0.52 | 31.43 32.22 0.08 0.53 0.34 0.32 | 53.91 53.79 0.52 0.56 0.54 0.54
FDLS | 6.86 6.73 -0.12 0.26 0.14 0.09 | 58.32 58.17 0.33 0.84 0.60 0.58 | 0.19 0.03 -0.12 0.13 0.07 0.05 | 33.21 32.54 0.15 0.54 0.35 0.33
TMLE | 9.10 2.08 -0.16 0.69 0.26 0.09 | 2.75 0.10 -0.11 0.17 0.17 0.04 | 0.05 -0.09 -0.07 0.08 0.05 0.03 | 0.01 -0.03 -0.05 0.05 0.03 0.02
TMLEc | 13.89 13.79 -0.02 0.30 0.17 0.14 | 50.21 47.74 0.32 0.79 0.52 0.48 | 7.16 7.13 -0.01 0.16 0.09 0.07 | 38.81 39.19 0.25 0.52 0.40 0.39
TMLEs | 1.88 0.30 -0.16 0.24 0.14 0.07 | 0.12 -0.01 -0.10 0.10 0.06 0.04 | 0.01 -0.09 -0.07 0.08 0.05 0.03 | 0.00 -0.03 -0.05 0.05 0.03 0.02
TMLEr | 2.04 0.35 -0.16 0.25 0.14 0.07 | 0.16 -0.01 -0.10 0.10 0.06 0.04 | 0.01 -0.09 -0.07 0.08 0.05 0.03 | 0.00 -0.03 -0.05 0.05 0.03 0.02
φ12
AB-GMM | -0.11 -0.13 -0.20 0.19 0.12 0.08 | 0.01 -0.10 -0.12 0.12 0.07 0.05 | -0.02 -0.04 -0.10 0.11 0.06 0.04 | 0.01 -0.01 -0.07 0.07 0.04 0.03
Sys-GMM | -16.32 -16.26 -0.39 0.06 0.21 0.17 | -16.06 -15.83 -0.21 -0.12 0.16 0.16 | -10.31 -10.30 -0.26 0.06 0.14 0.11 | -15.31 -15.25 -0.17 -0.13 0.15 0.15
FDLS | 5.82 5.71 -0.14 0.26 0.13 0.09 | 9.05 9.01 -0.13 0.31 0.16 0.11 | 8.68 8.70 -0.05 0.22 0.12 0.09 | 9.82 9.86 -0.06 0.26 0.14 0.10
TMLE | 0.95 0.74 -0.38 0.37 0.20 0.09 | 1.13 0.23 -0.12 0.21 0.15 0.04 | 0.03 0.06 -0.07 0.07 0.04 0.03 | 0.00 0.02 -0.05 0.05 0.03 0.02
TMLEc | -6.04 -6.08 -0.20 0.08 0.10 0.07 | -0.24 -0.46 -0.13 0.13 0.08 0.05 | -2.71 -2.71 -0.10 0.05 0.05 0.04 | -1.00 -1.28 -0.12 0.11 0.07 0.05
TMLEs | 0.63 0.12 -0.16 0.19 0.11 0.06 | 0.05 0.02 -0.10 0.10 0.06 0.04 | 0.00 0.05 -0.07 0.07 0.04 0.03 | -0.01 0.02 -0.05 0.05 0.03 0.02
TMLEr | 0.58 0.10 -0.16 0.19 0.11 0.06 | 0.05 0.02 -0.10 0.10 0.06 0.04 | 0.00 0.05 -0.07 0.07 0.04 0.03 | -0.01 0.02 -0.05 0.05 0.03 0.02
Table 2.5: Design 5
Columns in each block: Mean, Med., 5q, 95q, RMSE, MAE
N=100 | T=3, π=1 | T=3, π=3 | T=6, π=1 | T=6, π=3
φ11
AB-GMM | -14.65 -15.65 -0.77 0.51 0.43 0.27 | -29.43 -30.04 -1.31 0.75 0.74 0.45 | -12.94 -13.00 -0.40 0.14 0.21 0.15 | -15.89 -15.89 -0.49 0.17 0.26 0.18
Sys-GMM | 6.86 7.10 -0.33 0.45 0.25 0.17 | 32.33 34.14 -0.19 0.77 0.44 0.35 | 18.43 18.43 -0.05 0.41 0.23 0.19 | 46.43 47.69 0.26 0.62 0.48 0.48
FDLS | -4.67 -4.75 -0.46 0.37 0.26 0.17 | -4.69 -4.76 -0.46 0.37 0.26 0.17 | -5.52 -5.59 -0.32 0.21 0.17 0.11 | -5.53 -5.61 -0.32 0.21 0.17 0.11
TMLE | 10.61 6.59 -0.37 0.69 0.34 0.22 | 17.63 14.98 -0.36 0.78 0.40 0.27 | 1.02 -0.23 -0.19 0.24 0.14 0.08 | 3.62 0.38 -0.19 0.44 0.19 0.09
TMLEc | 1.69 1.25 -0.30 0.35 0.20 0.13 | 2.38 1.62 -0.30 0.38 0.21 0.13 | 0.23 0.35 -0.16 0.17 0.10 0.07 | 0.24 0.33 -0.16 0.17 0.10 0.07
TMLEs | 5.13 1.49 -0.36 0.58 0.29 0.19 | 6.97 3.06 -0.35 0.61 0.30 0.19 | 0.39 -0.34 -0.18 0.21 0.13 0.08 | 0.56 -0.30 -0.18 0.22 0.13 0.08
TMLEr | 5.88 1.80 -0.36 0.60 0.30 0.19 | 10.70 5.46 -0.35 0.70 0.34 0.22 | 0.41 -0.34 -0.18 0.21 0.13 0.08 | 0.86 -0.27 -0.18 0.23 0.14 0.08
φ12
AB-GMM | -2.69 -2.54 -0.67 0.60 0.42 0.24 | -0.96 0.06 -1.10 1.07 0.72 0.38 | 0.16 0.40 -0.28 0.28 0.17 0.11 | 1.25 1.36 -0.33 0.36 0.21 0.14
Sys-GMM | -2.75 -2.70 -0.42 0.36 0.24 0.16 | -9.82 -9.87 -0.56 0.38 0.30 0.19 | -5.09 -5.07 -0.26 0.16 0.14 0.09 | -12.01 -12.29 -0.28 0.05 0.16 0.13
FDLS | 5.41 5.54 -0.40 0.50 0.28 0.18 | 5.41 5.54 -0.40 0.50 0.28 0.18 | 8.65 8.72 -0.22 0.38 0.20 0.14 | 8.64 8.71 -0.22 0.38 0.20 0.14
TMLE | -1.70 -0.89 -0.48 0.42 0.27 0.18 | -3.84 -2.94 -0.53 0.43 0.29 0.20 | 0.38 0.20 -0.18 0.20 0.12 0.07 | -0.26 0.21 -0.27 0.23 0.15 0.08
TMLEc | -6.94 -6.86 -0.35 0.21 0.18 0.12 | -6.92 -6.81 -0.35 0.20 0.18 0.12 | -3.21 -3.20 -0.18 0.12 0.10 0.07 | -3.22 -3.21 -0.18 0.12 0.10 0.07
TMLEs | -1.33 -1.06 -0.40 0.36 0.23 0.15 | -2.43 -2.22 -0.39 0.34 0.22 0.15 | 0.28 0.14 -0.17 0.18 0.11 0.07 | 0.12 0.04 -0.17 0.18 0.11 0.07
TMLEr | -1.27 -0.97 -0.40 0.37 0.23 0.15 | -2.97 -2.99 -0.39 0.34 0.22 0.15 | 0.27 0.13 -0.17 0.18 0.11 0.07 | -0.04 -0.09 -0.18 0.18 0.11 0.07
N=250 | T=3, π=1 | T=3, π=3 | T=6, π=1 | T=6, π=3
φ11
AB-GMM | -6.59 -7.18 -0.47 0.37 0.27 0.18 | -15.32 -16.18 -0.88 0.62 0.50 0.31 | -5.79 -5.61 -0.25 0.13 0.13 0.09 | -7.80 -7.71 -0.32 0.16 0.16 0.11
Sys-GMM | 2.78 2.74 -0.24 0.29 0.16 0.11 | 22.13 22.68 -0.19 0.64 0.34 0.25 | 8.51 8.18 -0.07 0.25 0.13 0.09 | 37.35 38.00 0.15 0.57 0.40 0.38
FDLS | -5.19 -5.06 -0.30 0.20 0.16 0.11 | -5.19 -5.07 -0.30 0.20 0.16 0.11 | -5.67 -5.67 -0.22 0.11 0.11 0.08 | -5.67 -5.67 -0.22 0.11 0.11 0.08
TMLE | 6.35 1.63 -0.24 0.54 0.24 0.13 | 12.30 5.86 -0.24 0.67 0.31 0.17 | -0.03 -0.23 -0.12 0.12 0.07 0.05 | 0.25 -0.21 -0.12 0.13 0.08 0.05
TMLEc | 2.01 1.84 -0.18 0.23 0.12 0.08 | 2.08 1.86 -0.18 0.23 0.13 0.08 | 0.52 0.59 -0.10 0.11 0.06 0.04 | 0.52 0.59 -0.10 0.11 0.06 0.04
TMLEs | 4.03 0.40 -0.24 0.47 0.21 0.12 | 4.91 0.84 -0.24 0.51 0.22 0.13 | -0.05 -0.23 -0.12 0.12 0.07 0.05 | -0.04 -0.23 -0.12 0.12 0.07 0.05
TMLEr | 4.16 0.43 -0.24 0.48 0.22 0.12 | 6.03 1.14 -0.24 0.55 0.24 0.13 | -0.05 -0.23 -0.12 0.12 0.07 0.05 | -0.04 -0.23 -0.12 0.12 0.07 0.05
φ12
AB-GMM | -0.76 -0.86 -0.41 0.40 0.25 0.16 | 0.76 0.33 -0.75 0.79 0.49 0.29 | 0.13 0.16 -0.19 0.19 0.11 0.08 | 1.04 1.05 -0.24 0.26 0.15 0.10
Sys-GMM | -1.12 -1.07 -0.28 0.26 0.16 0.11 | -7.53 -7.62 -0.48 0.33 0.26 0.17 | -2.29 -2.21 -0.17 0.13 0.10 0.06 | -9.96 -10.34 -0.29 0.11 0.16 0.12
FDLS | 5.34 5.28 -0.22 0.33 0.17 0.12 | 5.34 5.28 -0.22 0.33 0.17 0.12 | 8.60 8.66 -0.10 0.27 0.14 0.10 | 8.60 8.66 -0.10 0.27 0.14 0.10
TMLE | 0.22 0.41 -0.33 0.32 0.19 0.12 | -1.69 -0.35 -0.44 0.35 0.23 0.14 | 0.13 0.13 -0.10 0.11 0.07 0.04 | 0.17 0.15 -0.11 0.11 0.07 0.04
TMLEc | -7.10 -7.10 -0.25 0.10 0.13 0.09 | -7.10 -7.09 -0.25 0.10 0.13 0.09 | -3.23 -3.21 -0.13 0.06 0.07 0.04 | -3.23 -3.21 -0.13 0.06 0.07 0.04
TMLEs | 0.30 0.14 -0.27 0.29 0.17 0.10 | -0.30 -0.24 -0.27 0.28 0.17 0.10 | 0.11 0.13 -0.10 0.11 0.06 0.04 | 0.11 0.13 -0.10 0.11 0.06 0.04
TMLEr | 0.31 0.15 -0.27 0.29 0.17 0.10 | -0.74 -0.67 -0.28 0.27 0.17 0.10 | 0.12 0.13 -0.10 0.11 0.06 0.04 | 0.11 0.13 -0.10 0.11 0.06 0.04
Table 2.6: Design 6
Columns in each block: Mean, Med., 5q, 95q, RMSE, MAE
N=100 | T=3, π=1 | T=3, π=3 | T=6, π=1 | T=6, π=3
φ11
AB-GMM | -8.23 -8.40 -0.55 0.41 0.31 0.19 | -25.65 -25.21 -1.19 0.67 0.66 0.39 | -8.24 -8.10 -0.31 0.14 0.16 0.11 | -12.27 -12.27 -0.43 0.17 0.22 0.15
Sys-GMM | 6.44 6.27 -0.26 0.39 0.21 0.14 | 30.62 31.82 -0.15 0.73 0.41 0.33 | 10.50 10.36 -0.07 0.28 0.15 0.11 | 37.92 38.86 0.16 0.57 0.40 0.39
FDLS | 0.30 0.13 -0.31 0.31 0.19 0.13 | 0.27 0.12 -0.31 0.31 0.19 0.13 | -6.69 -6.74 -0.26 0.13 0.14 0.10 | -6.69 -6.74 -0.26 0.13 0.14 0.10
TMLE | 12.75 9.93 -0.21 0.56 0.27 0.16 | 17.42 13.80 -0.21 0.67 0.32 0.19 | 3.01 2.77 -0.10 0.18 0.09 0.06 | 3.12 2.78 -0.10 0.18 0.09 0.06
TMLEc | 10.65 10.63 -0.13 0.35 0.18 0.13 | 10.91 10.70 -0.13 0.36 0.19 0.13 | 2.98 2.90 -0.09 0.15 0.08 0.05 | 2.98 2.90 -0.09 0.15 0.08 0.05
TMLEs | 10.66 8.50 -0.21 0.50 0.24 0.14 | 11.75 9.22 -0.21 0.53 0.25 0.15 | 3.00 2.77 -0.10 0.18 0.09 0.06 | 3.00 2.77 -0.10 0.18 0.09 0.06
TMLEr | 10.88 8.58 -0.21 0.51 0.24 0.15 | 13.12 9.76 -0.21 0.58 0.27 0.15 | 3.00 2.77 -0.10 0.18 0.09 0.06 | 3.00 2.77 -0.10 0.18 0.09 0.06
φ12
AB-GMM | -2.01 -1.59 -0.49 0.43 0.29 0.18 | -0.04 -0.51 -0.96 0.97 0.63 0.33 | -0.62 -0.48 -0.23 0.21 0.14 0.09 | 0.25 0.28 -0.31 0.31 0.19 0.12
Sys-GMM | -2.17 -2.07 -0.35 0.29 0.20 0.13 | -9.12 -9.49 -0.52 0.35 0.28 0.18 | -2.87 -2.83 -0.20 0.14 0.11 0.07 | -9.84 -9.90 -0.29 0.10 0.15 0.11
FDLS | 4.04 4.02 -0.29 0.36 0.20 0.14 | 4.05 4.05 -0.29 0.36 0.20 0.14 | 8.24 8.38 -0.13 0.30 0.15 0.11 | 8.25 8.38 -0.13 0.30 0.15 0.11
TMLE | -0.99 -0.31 -0.34 0.30 0.19 0.12 | -2.87 -1.67 -0.42 0.31 0.22 0.14 | -0.18 -0.14 -0.13 0.12 0.07 0.05 | -0.19 -0.13 -0.13 0.12 0.08 0.05
TMLEc | -8.75 -8.81 -0.29 0.12 0.15 0.11 | -8.70 -8.71 -0.29 0.12 0.15 0.11 | -3.77 -3.75 -0.14 0.07 0.08 0.05 | -3.77 -3.75 -0.14 0.07 0.08 0.05
TMLEs | -0.51 -0.16 -0.30 0.28 0.17 0.11 | -1.23 -0.74 -0.30 0.26 0.17 0.11 | -0.18 -0.14 -0.13 0.12 0.07 0.05 | -0.18 -0.14 -0.13 0.12 0.07 0.05
TMLEr | -0.56 -0.19 -0.30 0.28 0.17 0.11 | -1.89 -1.58 -0.31 0.26 0.17 0.11 | -0.18 -0.14 -0.13 0.12 0.07 0.05 | -0.19 -0.14 -0.13 0.12 0.07 0.05
N=250 | T=3, π=1 | T=3, π=3 | T=6, π=1 | T=6, π=3
φ11
AB-GMM | -3.43 -3.83 -0.33 0.27 0.19 0.12 | -12.02 -12.61 -0.73 0.51 0.41 0.25 | -3.54 -3.45 -0.18 0.11 0.10 0.07 | -5.90 -5.90 -0.27 0.15 0.14 0.09
Sys-GMM | 2.64 2.44 -0.18 0.24 0.13 0.09 | 19.39 18.87 -0.16 0.57 0.30 0.21 | 3.04 3.03 -0.07 0.14 0.07 0.05 | 22.62 22.26 0.02 0.44 0.26 0.22
FDLS | -0.12 -0.19 -0.19 0.19 0.12 0.08 | -0.12 -0.18 -0.19 0.19 0.12 0.08 | -6.87 -6.96 -0.19 0.06 0.10 0.08 | -6.87 -6.96 -0.19 0.06 0.10 0.08
TMLE | 9.57 7.79 -0.11 0.37 0.18 0.10 | 11.81 8.76 -0.11 0.48 0.21 0.11 | 2.90 2.79 -0.05 0.12 0.06 0.04 | 2.90 2.79 -0.05 0.12 0.06 0.04
TMLEc | 10.84 10.82 -0.04 0.26 0.14 0.11 | 10.85 10.83 -0.04 0.26 0.14 0.11 | 3.10 3.07 -0.05 0.11 0.06 0.04 | 3.10 3.07 -0.05 0.11 0.06 0.04
TMLEs | 9.03 7.54 -0.11 0.35 0.17 0.10 | 9.34 7.69 -0.11 0.36 0.17 0.10 | 2.90 2.79 -0.05 0.12 0.06 0.04 | 2.90 2.79 -0.05 0.12 0.06 0.04
TMLEr | 9.04 7.54 -0.11 0.35 0.17 0.10 | 9.66 7.76 -0.11 0.39 0.18 0.10 | 2.90 2.79 -0.05 0.12 0.06 0.04 | 2.90 2.79 -0.05 0.12 0.06 0.04
φ12
AB-GMM | -0.74 -0.69 -0.29 0.28 0.17 0.11 | 0.57 0.16 -0.60 0.63 0.39 0.23 | -0.19 -0.16 -0.15 0.14 0.09 0.06 | 0.50 0.69 -0.21 0.22 0.13 0.09
Sys-GMM | -0.73 -0.77 -0.22 0.20 0.13 0.08 | -6.19 -6.11 -0.43 0.30 0.23 0.16 | -0.70 -0.70 -0.12 0.10 0.07 0.04 | -5.64 -5.63 -0.25 0.14 0.13 0.09
FDLS | 4.15 4.13 -0.16 0.24 0.13 0.09 | 4.14 4.12 -0.16 0.24 0.13 0.09 | 8.38 8.48 -0.06 0.22 0.12 0.09 | 8.38 8.48 -0.06 0.22 0.12 0.09
TMLE | 0.84 0.98 -0.20 0.21 0.13 0.08 | -0.19 0.68 -0.27 0.22 0.15 0.08 | -0.16 -0.12 -0.08 0.07 0.05 0.03 | -0.16 -0.12 -0.08 0.07 0.05 0.03
TMLEc | -8.84 -8.82 -0.22 0.04 0.12 0.09 | -8.85 -8.83 -0.22 0.04 0.12 0.09 | -3.75 -3.73 -0.11 0.03 0.06 0.04 | -3.75 -3.73 -0.11 0.03 0.06 0.04
TMLEs | 0.98 0.97 -0.18 0.20 0.12 0.08 | 0.75 0.81 -0.19 0.20 0.12 0.08 | -0.16 -0.12 -0.08 0.07 0.05 0.03 | -0.16 -0.12 -0.08 0.07 0.05 0.03
TMLEr | 1.00 0.98 -0.18 0.20 0.12 0.08 | 0.51 0.63 -0.19 0.20 0.12 0.08 | -0.16 -0.12 -0.08 0.07 0.05 0.03 | -0.16 -0.12 -0.08 0.07 0.05 0.03
Table 2.7: Design 1. Rejection frequencies for two-sided t-tests for φ11. True value φ11 = 0.6.
N=100 | T=3, π=1 | T=3, π=3 | T=6, π=1 | T=6, π=3
φ11 | 0.4 0.5 0.6 0.7 0.8 | 0.4 0.5 0.6 0.7 0.8 | 0.4 0.5 0.6 0.7 0.8 | 0.4 0.5 0.6 0.7 0.8
TMLE(r) | .320 .235 .210 .233 .316 | .376 .291 .255 .259 .309 | .802 .293 .106 .305 .664 | .821 .346 .130 .270 .592
TMLEc(r) | .395 .139 .065 .172 .440 | .399 .143 .072 .178 .444 | .930 .414 .061 .465 .938 | .929 .414 .061 .465 .938
TMLEs(r) | .257 .180 .172 .222 .340 | .257 .181 .173 .224 .341 | .780 .258 .092 .316 .695 | .779 .255 .093 .318 .697
TMLEr(r) | .276 .201 .192 .236 .347 | .314 .243 .233 .271 .363 | .780 .258 .092 .316 .695 | .779 .255 .093 .318 .697
AB-GMM2(W) | .039 .052 .088 .144 .225 | .059 .087 .127 .176 .236 | .111 .050 .154 .414 .725 | .049 .081 .219 .450 .693
Sys-GMM2(W) | .399 .203 .087 .090 .231 | .627 .491 .332 .176 .084 | .931 .658 .227 .099 .462 | .988 .957 .865 .646 .309
N=250 | T=3, π=1 | T=3, π=3 | T=6, π=1 | T=6, π=3
φ11 | 0.4 0.5 0.6 0.7 0.8 | 0.4 0.5 0.6 0.7 0.8 | 0.4 0.5 0.6 0.7 0.8 | 0.4 0.5 0.6 0.7 0.8
TMLE(r) | .366 .198 .150 .234 .418 | .407 .235 .177 .233 .386 | .992 .540 .099 .510 .881 | .994 .589 .134 .468 .827
TMLEc(r) | .760 .261 .056 .301 .780 | .761 .260 .056 .300 .780 | – .789 .054 .812 – | – .789 .054 .812 –
TMLEs(r) | .330 .170 .133 .242 .449 | .330 .170 .134 .242 .449 | .987 .520 .085 .516 .894 | .987 .520 .085 .516 .894
TMLEr(r) | .333 .173 .136 .245 .451 | .343 .185 .150 .256 .456 | .987 .520 .085 .516 .894 | .987 .520 .085 .516 .894
AB-GMM2(W) | .060 .032 .066 .145 .275 | .029 .048 .086 .137 .209 | .406 .088 .094 .437 .840 | .125 .047 .126 .377 .691
Sys-GMM2(W) | .585 .257 .076 .164 .524 | .587 .397 .208 .094 .123 | .992 .689 .092 .361 .947 | .986 .881 .599 .253 .189
Table 2.8: Design 2. Rejection frequencies for two-sided t-tests for φ11. True value φ11 = 0.4.
N=100 | T=3, π=1 | T=3, π=3 | T=6, π=1 | T=6, π=3
φ11 | 0.2 0.3 0.4 0.5 0.6 | 0.2 0.3 0.4 0.5 0.6 | 0.2 0.3 0.4 0.5 0.6 | 0.2 0.3 0.4 0.5 0.6
TMLE(r) | .239 .185 .173 .205 .281 | .315 .260 .238 .247 .292 | .712 .219 .055 .273 .687 | .704 .222 .065 .279 .681
TMLEc(r) | .358 .148 .064 .104 .272 | .361 .151 .069 .108 .276 | .808 .311 .050 .258 .756 | .808 .311 .050 .258 .756
TMLEs(r) | .204 .153 .148 .193 .284 | .218 .166 .159 .201 .288 | .713 .219 .054 .272 .688 | .713 .219 .054 .272 .688
TMLEr(r) | .211 .160 .155 .198 .287 | .250 .200 .195 .232 .311 | .713 .219 .054 .272 .688 | .713 .219 .055 .273 .688
AB-GMM2(W) | .057 .046 .067 .123 .211 | .038 .053 .076 .114 .165 | .183 .062 .094 .274 .565 | .090 .049 .100 .246 .462
Sys-GMM2(W) | .278 .145 .081 .086 .151 | .531 .424 .311 .208 .135 | .835 .528 .209 .077 .200 | .971 .925 .838 .693 .494
N=250 | T=3, π=1 | T=3, π=3 | T=6, π=1 | T=6, π=3
φ11 | 0.2 0.3 0.4 0.5 0.6 | 0.2 0.3 0.4 0.5 0.6 | 0.2 0.3 0.4 0.5 0.6 | 0.2 0.3 0.4 0.5 0.6
TMLE(r) | .297 .116 .103 .215 .409 | .336 .170 .149 .240 .399 | .985 .498 .048 .513 .963 | .985 .498 .048 .513 .962
TMLEc(r) | .716 .280 .058 .150 .515 | .716 .280 .058 .150 .515 | .997 .644 .043 .522 .989 | .997 .644 .043 .522 .989
TMLEs(r) | .285 .102 .091 .209 .414 | .290 .106 .095 .213 .416 | .985 .498 .048 .513 .963 | .985 .498 .048 .513 .963
TMLEr(r) | .286 .102 .092 .210 .414 | .295 .113 .104 .221 .422 | .985 .498 .048 .513 .963 | .985 .498 .048 .513 .963
AB-GMM2(W) | .128 .051 .059 .145 .295 | .035 .034 .058 .106 .179 | .517 .138 .068 .356 .775 | .267 .077 .074 .268 .591
Sys-GMM2(W) | .434 .171 .067 .129 .350 | .465 .318 .197 .120 .101 | .949 .536 .086 .226 .765 | .950 .811 .573 .319 .167
Table 2.9: Design 3. Rejection frequencies for two-sided t-tests for φ11. True value φ11 = 0.4.
N=100 | T=3, π=1 | T=3, π=3 | T=6, π=1 | T=6, π=3
φ11 | 0.2 0.3 0.4 0.5 0.6 | 0.2 0.3 0.4 0.5 0.6 | 0.2 0.3 0.4 0.5 0.6 | 0.2 0.3 0.4 0.5 0.6
TMLE(r) | .303 .222 .210 .249 .329 | .733 .331 .141 .358 .701 | .784 .269 .061 .318 .766 | .970 .562 .079 .590 .964
TMLEc(r) | .634 .393 .192 .089 .081 | .980 .969 .950 .908 .833 | .868 .450 .082 .056 .299 | .734 .645 .529 .407 .289
TMLEs(r) | .218 .109 .108 .185 .335 | .738 .289 .081 .328 .720 | .789 .269 .057 .315 .769 | .972 .562 .076 .588 .966
TMLEr(r) | .233 .130 .132 .210 .357 | .739 .290 .083 .330 .722 | .789 .269 .057 .315 .769 | .972 .562 .076 .588 .966
AB-GMM2(W) | .039 .050 .070 .103 .152 | .332 .133 .069 .193 .432 | .087 .052 .103 .254 .477 | .426 .151 .084 .370 .715
Sys-GMM2(W) | .872 .764 .616 .422 .232 | incomplete in source; remaining values: .999 .996 .970 .855 .613 .313
N=250 | T=3, π=1 | T=3, π=3 | T=6, π=1 | T=6, π=3
φ11 | 0.2 0.3 0.4 0.5 0.6 | 0.2 0.3 0.4 0.5 0.6 | 0.2 0.3 0.4 0.5 0.6 | 0.2 0.3 0.4 0.5 0.6
TMLE(r) | .486 .180 .135 .259 .480 | .982 .579 .079 .580 .962 | .995 .577 .049 .595 .985 | – .910 .060 .907 –
TMLEc(r) | .961 .771 .392 .112 .067 | – – .999 .998 .987 | .999 .885 .194 .055 .532 | .901 .870 .812 .694 .517
TMLEs(r) | .512 .110 .060 .233 .536 | .984 .575 .067 .576 .966 | .995 .577 .049 .595 .985 | – .910 .060 .907 –
TMLEr(r) | .514 .114 .065 .239 .541 | .984 .575 .067 .576 .966 | .995 .577 .049 .595 .985 | – .910 .060 .907 –
AB-GMM2(W) | .030 .030 .047 .089 .149 | .697 .270 .063 .339 .748 | .259 .067 .080 .294 .618 | .782 .326 .065 .509 .910
Sys-GMM2(W) | .995 .972 .898 .719 .389 | incomplete in source; remaining values: .998 .939 .674 .295
Table 2.10: Design 4. Rejection frequencies for two-sided t-tests for φ11. True value φ11 = 0.4.
N=100 | T=3, π=1 | T=3, π=3 | T=6, π=1 | T=6, π=3
φ11 | 0.2 0.3 0.4 0.5 0.6 | 0.2 0.3 0.4 0.5 0.6 | 0.2 0.3 0.4 0.5 0.6 | 0.2 0.3 0.4 0.5 0.6
TMLE(r) | .260 .186 .179 .225 .309 | .553 .256 .149 .291 .549 | .776 .265 .064 .307 .741 | .951 .522 .075 .526 .941
TMLEc(r) | .570 .314 .141 .065 .093 | .922 .875 .798 .693 .561 | .873 .440 .080 .069 .342 | .680 .540 .380 .251 .152
TMLEs(r) | .198 .116 .119 .186 .311 | .594 .209 .071 .246 .583 | .784 .266 .058 .303 .745 | .958 .526 .072 .526 .948
TMLEr(r) | .207 .126 .129 .195 .318 | .595 .214 .078 .254 .590 | .784 .266 .058 .303 .745 | .958 .526 .072 .526 .948
AB-GMM2(W) | .165 .075 .071 .141 .283 | .488 .186 .082 .233 .535 | .399 .114 .084 .319 .696 | .813 .326 .086 .498 .906
Sys-GMM2(W) | .697 .569 .445 .329 .238 | incomplete in source; remaining values: .999 .977 .909 .779 .615 .475
N=250 | T=3, π=1 | T=3, π=3 | T=6, π=1 | T=6, π=3
φ11 | 0.2 0.3 0.4 0.5 0.6 | 0.2 0.3 0.4 0.5 0.6 | 0.2 0.3 0.4 0.5 0.6 | 0.2 0.3 0.4 0.5 0.6
TMLE(r) | .434 .169 .136 .257 .458 | .878 .415 .090 .439 .859 | .993 .569 .050 .574 .982 | – .870 .059 .861 –
TMLEc(r) | .930 .666 .259 .059 .109 | .996 .995 .987 .957 .861 | – .880 .179 .074 .605 | .901 .851 .719 .488 .268
TMLEs(r) | .440 .098 .066 .222 .485 | .934 .423 .061 .437 .900 | .993 .569 .049 .574 .982 | – .870 .059 .861 –
TMLEr(r) | .443 .101 .069 .224 .487 | .934 .423 .061 .438 .900 | .993 .569 .049 .574 .982 | – .870 .059 .861 –
AB-GMM2(W) | .390 .126 .060 .194 .478 | .845 .357 .070 .390 .849 | .820 .282 .062 .465 .921 | .993 .647 .064 .743 .996
Sys-GMM2(W) | .917 .804 .663 .517 .384 | incomplete in source; remaining values: .999 .985 .900 .728 .559
Table 2.11: Design 5. Rejection frequencies for two-sided t-tests for φ11. True value φ11 = 0.4.
N=100 | T=3, π=1 | T=3, π=3 | T=6, π=1 | T=6, π=3
φ11 | 0.2 0.3 0.4 0.5 0.6 | 0.2 0.3 0.4 0.5 0.6 | 0.2 0.3 0.4 0.5 0.6 | 0.2 0.3 0.4 0.5 0.6
TMLE(r) | .280 .242 .230 .246 .282 | .357 .316 .293 .289 .299 | .429 .141 .090 .234 .494 | .433 .172 .123 .254 .490
TMLEc(r) | .232 .116 .079 .116 .224 | .241 .127 .092 .128 .235 | .537 .185 .066 .202 .525 | .536 .185 .066 .203 .525
TMLEs(r) | .226 .190 .188 .221 .278 | .245 .207 .202 .230 .282 | .429 .134 .079 .228 .495 | .431 .137 .082 .230 .497
TMLEr(r) | .240 .206 .203 .234 .288 | .305 .272 .266 .285 .323 | .429 .134 .080 .229 .496 | .434 .141 .089 .237 .502
AB-GMM2(W) | .054 .061 .087 .133 .198 | .053 .071 .098 .132 .175 | .090 .065 .144 .322 .547 | .069 .072 .148 .295 .481
Sys-GMM2(W) | .268 .165 .101 .080 .106 | .579 .485 .389 .285 .189 | .823 .629 .389 .191 .117 | .986 .971 .939 .880 .765
N=250 | T=3, π=1 | T=3, π=3 | T=6, π=1 | T=6, π=3
φ11 | 0.2 0.3 0.4 0.5 0.6 | 0.2 0.3 0.4 0.5 0.6 | 0.2 0.3 0.4 0.5 0.6 | 0.2 0.3 0.4 0.5 0.6
TMLE(r) | .234 .162 .155 .218 .328 | .303 .232 .214 .255 .329 | .826 .279 .059 .345 .785 | .823 .280 .064 .348 .782
TMLEc(r) | .443 .169 .063 .124 .352 | .443 .169 .063 .125 .352 | .896 .386 .051 .336 .848 | .896 .386 .051 .336 .848
TMLEs(r) | .208 .136 .134 .206 .333 | .219 .147 .143 .214 .337 | .827 .279 .058 .346 .785 | .826 .279 .058 .346 .785
TMLEr(r) | .211 .140 .137 .209 .336 | .239 .168 .166 .236 .354 | .827 .279 .058 .346 .785 | .827 .279 .059 .346 .785
AB-GMM2(W) | .072 .048 .067 .138 .237 | .033 .045 .072 .110 .161 | .254 .069 .082 .293 .628 | .136 .054 .087 .239 .493
Sys-GMM2(W) | .316 .154 .078 .093 .206 | .504 .391 .278 .182 .115 | .882 .551 .185 .084 .297 | .972 .932 .844 .701 .501
Table 2.12: Design 6. Rejection frequencies for two-sided t-tests for φ11. True value φ11 = 0.4.
N=100 | T=3, π=1 | T=3, π=3 | T=6, π=1 | T=6, π=3
φ11 | 0.2 0.3 0.4 0.5 0.6 | 0.2 0.3 0.4 0.5 0.6 | 0.2 0.3 0.4 0.5 0.6 | 0.2 0.3 0.4 0.5 0.6
TMLE(r) | .326 .223 .171 .168 .204 | .379 .283 .227 .211 .228 | .804 .324 .059 .170 .550 | .802 .324 .061 .172 .549
TMLEc(r) | .560 .297 .126 .065 .119 | .561 .301 .132 .072 .127 | .864 .407 .070 .168 .615 | .864 .407 .070 .168 .615
TMLEs(r) | .301 .194 .146 .151 .199 | .316 .208 .159 .161 .205 | .804 .324 .059 .170 .550 | .804 .324 .059 .170 .550
TMLEr(r) | .305 .201 .154 .157 .202 | .336 .236 .191 .191 .229 | .804 .324 .059 .170 .550 | .804 .324 .059 .170 .550
AB-GMM2(W) | .063 .048 .070 .123 .213 | .045 .057 .080 .116 .163 | .156 .057 .099 .274 .548 | .069 .050 .108 .243 .440
Sys-GMM2(W) | .288 .157 .085 .082 .136 | .582 .482 .375 .267 .174 | .830 .531 .214 .079 .191 | .972 .927 .844 .696 .501
N=250 | T=3, π=1 | T=3, π=3 | T=6, π=1 | T=6, π=3
φ11 | 0.2 0.3 0.4 0.5 0.6 | 0.2 0.3 0.4 0.5 0.6 | 0.2 0.3 0.4 0.5 0.6 | 0.2 0.3 0.4 0.5 0.6
TMLE(r) | .560 .231 .110 .110 .239 | .572 .263 .144 .138 .248 | .996 .701 .073 .294 .883 | .996 .701 .073 .294 .883
TMLEc(r) | .918 .617 .217 .054 .181 | .918 .618 .217 .054 .181 | .999 .786 .092 .303 .942 | .999 .786 .092 .303 .942
TMLEs(r) | .557 .222 .102 .105 .239 | .561 .226 .106 .108 .241 | .996 .701 .073 .294 .883 | .996 .701 .073 .294 .883
TMLEr(r) | .557 .222 .102 .105 .240 | .563 .231 .114 .118 .249 | .996 .701 .073 .294 .883 | .996 .701 .073 .294 .883
AB-GMM2(W) | .142 .055 .058 .143 .299 | .033 .036 .059 .103 .166 | .467 .120 .071 .342 .747 | .207 .064 .076 .254 .540
Sys-GMM2(W) | .437 .171 .068 .120 .313 | .508 .370 .256 .168 .122 | .954 .543 .087 .223 .758 | .952 .819 .573 .317 .170
Chapter 3
On Maximum Likelihood Estimation of Dynamic Panel Data Models
3.1 Introduction
Dynamic panel data models have a prominent place in applied research and at the same
time form a challenging field in econometric theory. Many panel data applications have
a relatively small number of time periods T, whereas the cross-sectional dimension N is
sizeable. It is therefore common to consider the semi-asymptotic behavior of estimators
and corresponding test statistics with T fixed and only N tending to infinity.
A central theme in linear dynamic panel data analysis is the fact that the Fixed Effects
(FE) estimator is inconsistent for fixed T and N large. This inconsistency is referred
to as the Nickell (1981) bias, and is an example of the incidental parameters problem.
It has therefore become common practice to estimate the parameters of dynamic panel
data models by the Generalized Method of Moments (GMM), see Arellano and Bond
(1991) and Blundell and Bond (1998). A main reason for using GMM is that it provides
asymptotically efficient inference exploiting a minimal set of statistical assumptions.
GMM inference has not been without its own problems, however. These include small
sample biases in both coefficient and variance estimators, sensitivity to important nuisance parameters and choices regarding the type and number of moment conditions. A large literature has been devoted to adapting the GMM approach to limit the impact of these inherent drawbacks, see Bun and Sarafidis (2015) for a recent overview.
These drawbacks have in turn led to interest in likelihood-based methods that correct for the incidental parameters problem. Some of these methods are based on modifications of the profile likelihood, see Lancaster (2002) and Dhaene and Jochmans (2015). Other methods start from the likelihood function of the first differences, see Hsiao et al. (2002) and Binder et al. (2005). Essentially, these methods treat the incidental parameters as fixed in estimation. The alternative approach is to assume random effects, but in dynamic models it is then necessary to be explicit about the non-zero correlation between individual-specific effects and initial conditions (Anderson and Hsiao (1982), Alvarez and Arellano (2003), Moral-Benito (2012) and Hsiao and Zhang (2015)). Random effects ML estimators therefore typically exploit Chamberlain (1982)-type projections to model the dependence between individual-specific effects, initial observations and additional covariates.
In this study we consider the Transformed Maximum Likelihood (TML) approach as in Hsiao et al. (2002) and the Random effects Maximum Likelihood (RML) estimator as in Alvarez and Arellano (2003).¹ There is a close connection between TML and RML in the sense that TML can be expressed as a restricted version of RML. Under suitable regularity conditions, both ML estimators are consistent and asymptotically normally distributed. Monte Carlo evidence provided in both studies suggests that these likelihood-based approaches can serve as viable alternatives to the usual GMM estimators. Just as for GMM, however, the application of ML estimators is not without its own problems.
In this study we address two important issues that arise when implementing ML for dynamic panel data models. First, we show that in the simple setup without time-series heteroscedasticity both the TML and RML estimators give rise to a cubic first-order condition in the autoregressive parameter. We therefore have either one or three real solutions to the first-order condition. As a result, even asymptotically, the log-likelihood function can be bimodal. This result differs from Kruiniger (2008) and Han and Phillips (2013), who find a quartic equation under covariance stationarity.
Second, because both TML and RML can be seen as random effects ML estimators, we address the issue of negative variance estimates as mentioned in Maddala (1971), Alvarez and Arellano (2003) and Han and Phillips (2013). An important consequence of bimodality is that unconstrained maximization of the log-likelihood may lead to ML estimates that do not satisfy the implicit restriction of non-negative variances. Enforcing this non-negativity constraint may furthermore lead to a boundary solution (Maddala (1971), Alvarez and Arellano (2003)).
1 Because the former is derived conditional on the initial observations and individual-specific effects, it is also referred to as fixed effects ML in Kruiniger (2013).
We further investigate the impact of multiple roots and boundary solutions in finite samples in a Monte Carlo study. We consider the finite sample bias and RMSE of the coefficient estimators as well as the size and power of the corresponding t and LR statistics. We find that, despite the robustness of TML and RML to initial conditions, the finite sample properties of both estimators for small values of T depend heavily on the initial condition. A partial explanation is that the behavior of the initial condition has a direct effect on the bimodality of the log-likelihood function. We find that when there are three solutions to the first-order condition, the left solution always satisfies the non-negativity restriction, while the right solution violates it. Estimators that take the non-negativity constraints into account perform much better than their unconstrained counterparts. Furthermore, we find that inference based on the LR statistic is size correct, while t statistics show large size distortions. Using the dataset of Bun and Carree (2005), we show how these theoretical results can influence empirical estimates of U.S. state-level unemployment dynamics.
Throughout the analysis we limit ourselves to an asymptotic analysis in which T is fixed and N → ∞. When T is large, the influence of initial conditions becomes negligible, so our main results become less relevant. For results with large T, see, e.g., Bai (2013a). Furthermore, we do not analyze the unit root case, where the distribution theory is rather different; see Ahn and Thomas (2006) and Kruiniger (2013).
The plan of this study is as follows. In Section 3.2 we introduce the Maximum Likelihood estimators for the panel AR(1) model, including the cubic first-order condition for the autoregressive parameter. Section 3.3 deals with the possibility of multiple solutions and proposes bounded estimation as a solution. Section 3.4 contains the extension to dynamic models with additional covariates. Section 3.5 reports the results from the Monte Carlo study, while Section 3.6 shows the empirical results. Section 3.7 concludes.
3.2 ML estimation for the panel AR(1) model
We consider the following simple AR(1) specification without exogenous regressors:2
$$y_{i,t} = \eta_i + \phi y_{i,t-1} + \varepsilon_{i,t}, \qquad \mathrm{E}[\varepsilon_{i,t} \mid y_{i,0}, \eta_i] = 0, \tag{3.1}$$
for $i = 1, \ldots, N$ and $t = 1, \ldots, T$. We assume that the idiosyncratic errors $\varepsilon_{i,t}$ are i.i.d. with mean zero and variance $\sigma^2$,³ and that the initial conditions $y_{i,0}$ are observed. Stacking the observations over time, we can write the AR(1) model for each individual as
$$y_i = \phi y_{i-} + \iota_T \eta_i + \varepsilon_i, \qquad \varepsilon_i = (\varepsilon_{i,1}, \ldots, \varepsilon_{i,T})', \tag{3.2}$$
with $y_i = (y_{i,1}, \ldots, y_{i,T})'$ and $y_{i-} = (y_{i,0}, \ldots, y_{i,T-1})'$, and $\iota_T$ a $T$-vector of ones. We follow Kruiniger (2013) to derive the log-likelihood function(s), but the final results are identical to those in Hsiao et al. (2002), Alvarez and Arellano (2003) or Binder et al. (2005). As discussed in, e.g., Kruiniger (2013), the crucial assumptions for consistency, asymptotic normality and feasible inference of the likelihood estimators can be summarized in the following two assumptions.
Assumption TML: $\xi_{i,0} \equiv y_{i,0} - \eta_i/(1 - \phi_0)$ are i.i.d. with finite second moments, $\forall i = 1, \ldots, N$.
Assumption RML: $y_{i,0}$ and $\eta_i$ are i.i.d. with finite second moments, $\forall i = 1, \ldots, N$.
Remark 3.1. Note that Assumption TML is somewhat less restrictive than RML, because one has to impose distributional assumptions only on a particular linear combination of $y_{i,0}$ and $\eta_i$. Assumption RML, on the other hand, imposes restrictions on the whole joint distribution of $y_{i,0}$ and $\eta_i$.
To derive the corresponding estimating equations for both estimators, it is convenient to use a Chamberlain (1982)-type projection for $\eta_i$:⁴
$$\eta_i = \pi y_{i,0} + v_i, \qquad \mathrm{E}[v_i y_{i,0}] = 0, \qquad v_i \sim \text{i.i.d.}(0, \sigma_v^2). \tag{3.3}$$
2 Time-specific effects can be accommodated by taking the variables in deviations from the cross-sectional mean.
3 Assuming heteroscedasticity over i does not violate consistency of the resulting estimators and has no effect on the results of this paper.
4 For simplicity we do not include a constant term in the projection, as it would serve as a restricted time effect.
Note that this projection is only necessary for the RML estimator, while the i.i.d. assumption is imposed only for simplicity, as the discussed estimators remain consistent even if $v_i$ is heteroscedastic. When we set $\pi = 1 - \phi$, the projection corresponds exactly to the TML framework, as in this case $v_i = -(1 - \phi_0)\xi_{i,0}$. In the TML approach the variance of $v_i = \Delta y_{i,1} - \varepsilon_{i,1}$ is a parameter to be estimated.⁵ We therefore exploit the projection in (3.3) only to show the algebraic comparison between RML and TML. The model can be represented as⁶
$$R y_i = (e_1 \phi + \iota_T \pi) y_{i,0} + (\iota_T v_i + \varepsilon_i), \tag{3.4}$$
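where $R = I_T - L_T \phi$ and $e_1$ is the first column of $I_T$.⁷ Define the combined error term as
$$u_i \equiv \iota_T v_i + \varepsilon_i; \tag{3.5}$$
then it follows under our assumptions that
$$\mathrm{E}[u_i] = 0_T, \qquad \operatorname{var}[u_i] = \Sigma = \sigma_v^2 \iota_T \iota_T' + \sigma^2 I_T. \tag{3.6}$$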
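The variance-covariance structure of $\Sigma$ is of the usual "correlated" random effects form. Using the matrix inversion and determinant lemmas, we obtain
$$\Sigma^{-1} = \frac{1}{\sigma^2} I_T - \frac{1}{\sigma^2}\,\frac{\sigma_v^2}{\sigma^2 + T\sigma_v^2}\,\iota_T \iota_T', \qquad |\Sigma| = (\sigma^2)^{T-1}(\sigma^2 + T\sigma_v^2). \tag{3.7}$$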
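Denote by $W_T \equiv I_T - \frac{1}{T}\iota_T \iota_T'$ the usual fixed effects projection matrix. Then we can write
$$\Sigma^{-1} = \frac{1}{\sigma^2} W_T + \frac{1}{T\theta^2}\,\iota_T \iota_T', \qquad |\Sigma| = (\sigma^2)^{T-1}\theta^2, \tag{3.8}$$
$$\theta^2 \equiv \sigma^2 + T\sigma_v^2. \tag{3.9}$$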
Hence, instead of $\sigma^2$ and $\sigma_v^2$, we estimate $\sigma^2$ and $\theta^2$. By doing so we do not restrict $\sigma_v^2$ to be positive. This parametrization, in one form or another, has been used in Hsiao et al. (2002), Alvarez and Arellano (2003), Ahn and Thomas (2006) and Kruiniger (2008), inter alia.
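The two closed forms of $\Sigma^{-1}$ and $|\Sigma|$ in (3.7)-(3.9) can be compared with a generic matrix inverse and determinant. This is a minimal sketch that assumes NumPy; the function name and parameter values are illustrative. The second call uses a negative $\sigma_v^2$ with $\theta^2 > 0$, which is exactly the case the $\theta^2$ parametrization admits.

```python
import numpy as np

def check_sigma_identities(T, sigma2, sigma2_v):
    """Compare the closed forms (3.7)-(3.9) with NumPy's generic inverse and determinant."""
    J = np.ones((T, T))                          # iota_T iota_T'
    Sigma = sigma2_v * J + sigma2 * np.eye(T)    # (3.6)
    theta2 = sigma2 + T * sigma2_v               # (3.9)

    # (3.7): matrix inversion (Sherman-Morrison) and determinant lemmas
    inv_37 = np.eye(T) / sigma2 - (sigma2_v / (sigma2 * theta2)) * J
    det_37 = sigma2 ** (T - 1) * theta2

    # (3.8): the same inverse written with the within projection W_T
    W = np.eye(T) - J / T
    inv_38 = W / sigma2 + J / (T * theta2)

    inv_num = np.linalg.inv(Sigma)
    return (np.allclose(inv_37, inv_num),
            np.allclose(inv_38, inv_num),
            bool(np.isclose(det_37, np.linalg.det(Sigma))))

print(check_sigma_identities(5, 1.3, 0.4))    # (True, True, True)
print(check_sigma_identities(5, 1.3, -0.2))   # sigma_v^2 < 0, but theta^2 = 0.3 > 0
```

Since $W_T$ is idempotent with rank $T-1$ and annihilates $\iota_T$, (3.8) shows that the eigenvalues of $\Sigma$ are $\sigma^2$ (with multiplicity $T-1$) and $\theta^2$. Positive definiteness therefore only requires $\sigma^2 > 0$ and $\theta^2 > 0$, not $\sigma_v^2 > 0$.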
5 Alternatively, following Hsiao et al. (2002), one can estimate $\omega \equiv 1 + \sigma_v^2/\sigma^2 = \operatorname{var}(\Delta y_{i,1})/\sigma^2$.
6 Bai (2013a) considers a similar conditional maximum likelihood estimator with a possible factor structure in the error term $\varepsilon_i$.
7 The lag-operator matrix $L_T$ is defined such that for any $[T \times 1]$ vector $x = (x_1, \ldots, x_T)'$, $L_T x = (0, x_1, \ldots, x_{T-1})'$.
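Now we define the quasi log-likelihood function⁸ for individual $i$ (up to a constant):
$$-2\ell_i(\kappa) = (T-1)\log(\sigma^2) + \log(\theta^2) + (y_i - \phi y_{i-} - \iota_T \pi y_{i,0})'\,\Sigma^{-1}(y_i - \phi y_{i-} - \iota_T \pi y_{i,0}), \tag{3.10}$$
where $\kappa = (\phi, \pi, \sigma^2, \theta^2)'$. This function is the true likelihood function if the $u_i$ are jointly normal.⁹ Now, using the fact that $W_T \iota_T = 0_T$, we can write
$$-2\ell_i(\kappa) = (T-1)\log(\sigma^2) + \log(\theta^2) + \frac{1}{\sigma^2}\left[(y_i - \phi y_{i-})' W_T (y_i - \phi y_{i-})\right] + \frac{T}{\theta^2}\left(\bar{y}_i - \phi \bar{y}_{i-} - \pi y_{i,0}\right)^2, \tag{3.11}$$
where $\bar{y}_i \equiv (1/T)\sum_{t=1}^T y_{i,t}$ and $\bar{y}_{i-} \equiv (1/T)\sum_{t=1}^T y_{i,t-1}$. Furthermore, we define $\tilde{y}_{i,t} \equiv y_{i,t} - \bar{y}_i$, $\tilde{y}_{i,t-1} \equiv y_{i,t-1} - \bar{y}_{i-}$, $\breve{y}_i \equiv \bar{y}_i - y_{i,0}$, $\breve{y}_{i-} \equiv \bar{y}_{i-} - y_{i,0}$ and $\rho \equiv \pi - (1 - \phi)$. This implies the following final expression for the log-likelihood function (after summing over all individual log-likelihood functions):
$$-\frac{2}{N}\ell(\kappa) = (T-1)\log(\sigma^2) + \log(\theta^2) + \frac{1}{N\sigma^2}\sum_{i=1}^N\sum_{t=1}^T(\tilde{y}_{i,t} - \phi\tilde{y}_{i,t-1})^2 + \frac{T}{N\theta^2}\sum_{i=1}^N(\breve{y}_i - \phi\breve{y}_{i-} - \rho y_{i,0})^2. \tag{3.12}$$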
The log-likelihood function for observations in first differences (also known as the Transformed log-likelihood in Hsiao et al. (2002) and Juodis (2014b)) is obtained by setting $\rho = 0$. Thus both RML and TML estimators provide an in-built bias-correction term for the usual fixed effects log-likelihood function.
The parameters $\sigma^2$, $\theta^2$ (and $\rho$ for RML) can be concentrated out, resulting in (up to a constant)
$$\ell_c(\phi) = -\frac{N}{2}\left((T-1)\log\sigma^2(\phi) + \log\theta^2(\phi)\right), \tag{3.13}$$
where
$$\sigma^2(\phi) = \frac{1}{N(T-1)}\sum_{i=1}^N\sum_{t=1}^T(\tilde{y}_{i,t} - \phi\tilde{y}_{i,t-1})^2, \qquad \theta^2(\phi) = \frac{T}{N}\sum_{i=1}^N(\check{y}_i - \phi\check{y}_{i-})^2. \tag{3.14}$$
Here, for the random effects log-likelihood function, we define $\check{y}_i$ and $\check{y}_{i-}$ as
$$\check{y}_i \equiv \breve{y}_i - y_{i,0}\,\frac{\sum_{i=1}^N \breve{y}_i y_{i,0}}{\sum_{i=1}^N y_{i,0}^2}, \qquad \check{y}_{i-} \equiv \breve{y}_{i-} - y_{i,0}\,\frac{\sum_{i=1}^N \breve{y}_{i-} y_{i,0}}{\sum_{i=1}^N y_{i,0}^2}, \tag{3.15}$$
while for the log-likelihood in first differences $\check{y}_i \equiv \breve{y}_i$ and $\check{y}_{i-} \equiv \breve{y}_{i-}$.
8 Or, following Bai (2013a), a distance function between population and sample covariance matrices.
9 This parametrization ensures that $\theta^2 > 0$ or, equivalently, $\omega > 1 - \frac{1}{T}$ as in Hsiao et al. (2002).
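To make (3.13)-(3.15) concrete, here is a minimal sketch of the concentrated log-likelihood for both the first-difference (TML, $\rho = 0$) and the random effects (RML) versions. It assumes NumPy, a balanced panel stored as an $N \times (T+1)$ array whose first column holds $y_{i,0}$, and a hypothetical Gaussian data-generating process that satisfies Assumption TML. The function names are illustrative, and this is not the author's code.

```python
import numpy as np

def transformed_data(Y, random_effects=False):
    """Within deviations and the two terms entering theta^2(phi), cf. (3.14)-(3.15).

    Y is an (N, T+1) array whose columns are y_{i,0}, y_{i,1}, ..., y_{i,T}.
    """
    y0, y, ylag = Y[:, 0], Y[:, 1:], Y[:, :-1]
    y_til = y - y.mean(axis=1, keepdims=True)            # y_{i,t} - bar y_i
    y_til_lag = ylag - ylag.mean(axis=1, keepdims=True)  # y_{i,t-1} - bar y_{i-}
    y_chk = y.mean(axis=1) - y0                          # bar y_i - y_{i,0} (TML case)
    y_chk_lag = ylag.mean(axis=1) - y0
    if random_effects:                                   # RML: project out y_{i,0}, eq. (3.15)
        y_chk = y_chk - y0 * (y_chk @ y0) / (y0 @ y0)
        y_chk_lag = y_chk_lag - y0 * (y_chk_lag @ y0) / (y0 @ y0)
    return y_til, y_til_lag, y_chk, y_chk_lag

def concentrated_loglik(phi, data, N, T):
    """l_c(phi) of (3.13), up to a constant."""
    y_til, y_til_lag, y_chk, y_chk_lag = data
    sigma2 = np.sum((y_til - phi * y_til_lag) ** 2) / (N * (T - 1))
    theta2 = T * np.sum((y_chk - phi * y_chk_lag) ** 2) / N
    return -0.5 * N * ((T - 1) * np.log(sigma2) + np.log(theta2))

def simulate_panel(N, T, phi, rng):
    """Hypothetical DGP: eta_i, xi_{i,0}, eps_{i,t} ~ N(0, 1); y_{i,0} = eta_i/(1-phi) + xi_{i,0}."""
    eta = rng.normal(size=N)
    Y = np.empty((N, T + 1))
    Y[:, 0] = eta / (1 - phi) + rng.normal(size=N)
    for t in range(1, T + 1):
        Y[:, t] = eta + phi * Y[:, t - 1] + rng.normal(size=N)
    return Y

rng = np.random.default_rng(1)
N, T = 20_000, 3
Y = simulate_panel(N, T, phi=0.5, rng=rng)
tml_data = transformed_data(Y)
rml_data = transformed_data(Y, random_effects=True)

grid = np.linspace(-0.5, 1.5, 401)
tml = np.array([concentrated_loglik(p, tml_data, N, T) for p in grid])
rml = np.array([concentrated_loglik(p, rml_data, N, T) for p in grid])
print("TML grid maximiser:", grid[tml.argmax()])
print("RML grid maximiser:", grid[rml.argmax()])
```

Because RML concentrates out $\rho$ while TML fixes $\rho = 0$, the RML version of $\theta^2(\phi)$ can never exceed its TML counterpart. The RML concentrated log-likelihood is therefore at least as large at every $\phi$. The grid search is used here only for illustration; the cubic first-order condition makes it unnecessary.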
The likelihood function in (3.13) is defined for all values of $\phi \in \mathbb{R}$. Hence, from a theoretical and computational point of view, there is no reason to consider a restricted parameter space for estimation. Nevertheless, some studies (Hsiao et al. (2002); Hayakawa and Pesaran (2015)) restrict $\phi \in (-1, 1)$. As we shall see below, this may have consequences for the finite sample properties of the resulting estimators. Furthermore, the fact that the likelihood function is defined over the whole real line contrasts with the likelihood function in Kruiniger (2008) and Han and Phillips (2013). In those studies stationarity is assumed, so the likelihood function is naturally defined only for $-1 < \phi \le 1$.
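The FOC (first order condition) for the autoregressive parameter φ can now be expressed in the following way
$$\frac{d\ell_c(\phi)}{d\phi} = \frac{1}{\sigma^2(\phi)}\sum_{i=1}^{N}\sum_{t=1}^{T}\tilde{y}_{i,t-1}(\tilde{y}_{i,t} - \phi\tilde{y}_{i,t-1}) + \frac{T}{\theta^2(\phi)}\sum_{i=1}^{N}\ddot{y}_{i-}(\ddot{y}_i - \phi\ddot{y}_{i-}) = 0, \qquad (3.16)$$
or alternatively
$$\theta^2(\phi)\sum_{i=1}^{N}\sum_{t=1}^{T}\tilde{y}_{i,t-1}(\tilde{y}_{i,t} - \phi\tilde{y}_{i,t-1}) + \sigma^2(\phi)\,T\sum_{i=1}^{N}\ddot{y}_{i-}(\ddot{y}_i - \phi\ddot{y}_{i-}) = 0. \qquad (3.17)$$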
Given that σ²(φ) and θ²(φ) are quadratic in φ and multiply terms that are linear in φ, the FOC is cubic in φ. Thus for any value of T and any realization of $\{\mathbf{y}_i\}_{i=1}^{N}$ there will be at least one and at most three real solutions to (3.17). For a general value of T there is no simple formula for the solutions, but in the next section we obtain interesting analytical results for three-wave panels. In any case the solutions of the cubic equation can be found without any need for explicit numerical maximization: one can simply use root-finding algorithms based on the eigenvalues of the companion matrix.
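Because (3.17) is a cubic polynomial in φ, all of its roots can be obtained at once. The sketch below is our own illustration (no regressors; the panel is a hypothetical N × (T+1) array `Y` with columns y_{i,0}, …, y_{i,T}; function names are ours). It builds the cubic from sample moments after multiplying (3.17) through by N/T, and passes it to NumPy's `np.roots`, whose documentation states that it computes the eigenvalues of the companion matrix. It also returns the bounds φ_W and φ_B of Lemma 3.1.

```python
import numpy as np


def cubic_foc_roots(Y, estimator="TML", imag_tol=1e-8):
    """Real roots of the cubic FOC (3.17), the bounds phi_W and phi_B of (3.18), and the cubic."""
    T = Y.shape[1] - 1
    y0, cur, lag = Y[:, 0], Y[:, 1:], Y[:, :-1]
    y_w = cur - cur.mean(axis=1, keepdims=True)
    yl_w = lag - lag.mean(axis=1, keepdims=True)
    y_b, yl_b = cur.mean(axis=1) - y0, lag.mean(axis=1) - y0
    if estimator == "RML":  # project out y_{i,0}, cf. (3.15)
        y_b = y_b - y0 * (y_b @ y0) / (y0 @ y0)
        yl_b = yl_b - y0 * (yl_b @ y0) / (y0 @ y0)
    # Unnormalized "within" (1) and "between" (2) sample moments
    A1, B1, C1 = np.sum(yl_w**2), np.sum(y_w * yl_w), np.sum(y_w**2)
    A2, B2, C2 = yl_b @ yl_b, y_b @ yl_b, y_b @ y_b
    # (N/T) x (3.17):  (C2 - 2 B2 phi + A2 phi^2)(B1 - A1 phi)
    #                + (C1 - 2 B1 phi + A1 phi^2)(B2 - A2 phi) / (T - 1)
    poly = np.polyadd(
        np.polymul([A2, -2 * B2, C2], [-A1, B1]),
        np.polymul([A1, -2 * B1, C1], [-A2, B2]) / (T - 1),
    )
    roots = np.roots(poly)  # eigenvalues of the companion matrix
    keep = np.abs(roots.imag) <= imag_tol * (1 + np.abs(roots.real))
    return np.sort(roots[keep].real), B1 / A1, B2 / A2, poly


# Toy data (illustrative only): stationary AR(1) panel with phi0 = 0.5.
rng = np.random.default_rng(1)
N, T, phi0 = 100, 3, 0.5
mu = rng.normal(size=N)
Y = np.empty((N, T + 1))
Y[:, 0] = mu + rng.normal(size=N) / np.sqrt(1 - phi0**2)
for t in range(1, T + 1):
    Y[:, t] = phi0 * Y[:, t - 1] + (1 - phi0) * mu + rng.normal(size=N)

roots_tml, phi_w, phi_b, poly = cubic_foc_roots(Y, "TML")
roots_rml, phi_w_r, phi_b_r, _ = cubic_foc_roots(Y, "RML")
```

The real roots are sorted, so with three solutions the first and last entries are the two local maxima and the middle one is the local minimum separating them.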
For some reason the possibility of multiple solutions is mostly overlooked when discussing both likelihood estimators. An exception is Hayakawa and Pesaran (2015), who observe that the TML log-likelihood function can have more than one solution asymptotically. Here we show that this is also possible in finite samples, and that both the TML and RML log-likelihood functions have one or two local maxima. More importantly, this result is unaffected if strictly exogenous regressors are added to the model as in Section 3.4.
Given the structure of the log-likelihood function we can easily specify the interval for all solutions φ. In particular, we have
Lemma 3.1. For any N and T all solutions of (3.17) lie in the following interval
$$\phi \in (\phi_W, \phi_B) = \left(\frac{\sum_{i=1}^{N}\sum_{t=1}^{T}\tilde{y}_{i,t}\tilde{y}_{i,t-1}}{\sum_{i=1}^{N}\sum_{t=1}^{T}\tilde{y}_{i,t-1}^2},\ \frac{\sum_{i=1}^{N}\ddot{y}_i\ddot{y}_{i-}}{\sum_{i=1}^{N}\ddot{y}_{i-}^2}\right). \qquad (3.18)$$
Furthermore, this result continues to hold if N → ∞.
The proof of this lemma follows directly from the fact that the log-likelihood function is a sum of two quasi-concave functions with different maxima: (T − 1) log σ²(φ) is minimized at φ_W and log θ²(φ) at φ_B, so outside the interval spanned by these two points both components move in the same direction and the score (3.16) cannot vanish. The lower bound of this interval is the fixed effects ML estimator (also known as the Within Group or LSDV estimator). The upper bound can be interpreted as a quasi-between estimator. It is well known that $\operatorname{plim}_{N,T\to\infty}\phi_W = \phi$, but that for fixed T the within estimator has a negative bias, see Nickell (1981). Furthermore, it is straightforward to show that $\operatorname{plim}_{N,T\to\infty}\phi_B = 1$, because $\ddot{y}_i$ and $\ddot{y}_{i-}$ converge to the same value as T goes to infinity.
Next, we investigate the asymptotic behavior of the interval in Lemma 3.1. As the lower bound (φ_W) is the same for both the RML and TML estimators, we are primarily interested in the upper bound (φ_B), which differs between the estimators. The result is summarized in the following Proposition.
Proposition 3.2. The probability limits of the quasi-between estimators from Lemma 3.1 satisfy
$$\operatorname{plim}_{N\to\infty}\phi_B^{RML} \le \operatorname{plim}_{N\to\infty}\phi_B^{TML}. \qquad (3.19)$$
Proof. In Appendix 3.A.
Thus the upper bound for RML is no larger than that for TML, and the interval of possible values for φ_RML is (asymptotically) no wider than the corresponding interval for TML. This result can be expected given that the RML estimator is found to be more efficient than TML, see Kruiniger (2013).
3.3 Multiple solutions and constrained estimation
The possibility of having one or three solutions to the cubic equation (3.17) has important consequences. We first characterize the solutions for the case in which analytical results can be derived, that of three-wave panels and TML. We then proceed to the case of general T. Finally, we discuss a procedure for bounded estimation.
3.3.1 Three-wave panel and the Transformed ML estimator
For general values of T we can only specify in which interval the solutions of the cubic equation lie, as described in Lemma 3.1. For T = 2 and the Transformed log-likelihood function (i.e. ρ = 0), this result can be sharpened and a simple analytic expression for the ML estimator can be derived. Observe that for T = 2 we have, for σ²(φ) and θ²(φ) as defined in (3.14),
$$\sigma^2(\phi) = \frac{1}{2N}\sum_{i=1}^{N}(\Delta y_{i,2} - \phi\Delta y_{i,1})^2, \qquad \theta^2(\phi) = \frac{1}{2N}\sum_{i=1}^{N}(\Delta y_{i,2} - (\phi-2)\Delta y_{i,1})^2. \qquad (3.20)$$
We then have the following expression for the TML log-likelihood function.
Proposition 3.3. For T = 2 the log-likelihood function for the TML estimator is given by
$$\ell_c(\phi) = -\frac{N}{2}\log\left[\left(\sigma^2(\phi) + \frac{1}{N}\sum_{i=1}^{N}\Delta y_{i,1}\Delta y_{i,2} - \phi\frac{1}{N}\sum_{i=1}^{N}(\Delta y_{i,1})^2\right)^2 + d\right], \qquad (3.21)$$
where d does not depend on φ but only on the data.
Proof. In Appendix 3.A.
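The polynomial inside the log(·) expression in Proposition 3.3 is symmetric around the point $\bar{\phi} \equiv \phi_W + 1$. The FOC is
$$\frac{1}{2\sigma^2\theta^2}\left(\sum_{i=1}^{N}(\Delta y_{i,2} - \phi\Delta y_{i,1})(\Delta y_{i,2} - (\phi-2)\Delta y_{i,1})\right)\times\left(\sum_{i=1}^{N}(\Delta y_{i,2} - (\phi-1)\Delta y_{i,1})\Delta y_{i,1}\right) = 0.$$
The solutions are given by $\bar{\phi} = \phi_W + 1$ and, for D > 0,
$$\phi^{(l)} = \bar{\phi} - \sqrt{D}, \qquad \phi^{(r)} = \bar{\phi} + \sqrt{D},$$
where $D \equiv 1 + \phi_W^2 - \sum_{i=1}^{N}(\Delta y_{i,2})^2 / \sum_{i=1}^{N}(\Delta y_{i,1})^2$ is the discriminant of the quadratic part of the score.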
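The first derivative of the concentrated likelihood consists of a linear and a quadratic part. The latter implies either zero or two more solutions for φ on top of the intermediate case $\bar{\phi} = \phi_W + 1$. Furthermore, setting this quadratic part equal to zero can be recognized as the FOC of the bias-corrected FE estimator as in Bun and Carree (2005). Consistency¹⁰ of $\phi^{(l)}$ follows directly from $\operatorname{plim}_{N\to\infty}\phi_W = \phi_0 - \sigma_0^2/\operatorname{var}(\Delta y_{i,1})$ and $\operatorname{plim}_{N\to\infty}D = \left(1 - \sigma_0^2/\operatorname{var}(\Delta y_{i,1})\right)^2$: since $\operatorname{var}(\Delta y_{i,1}) \ge \sigma_0^2$, the limit of $\bar{\phi} - \sqrt{D}$ is $\phi_0 - \sigma_0^2/\operatorname{var}(\Delta y_{i,1}) + 1 - (1 - \sigma_0^2/\operatorname{var}(\Delta y_{i,1})) = \phi_0$. The solutions $\bar{\phi}$ and $\phi^{(r)}$ are inconsistent unless $\operatorname{var}(\Delta y_{i,1}) = \sigma_0^2$, i.e. $\sigma_v^2 = 0$.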
The relationships between the solutions to the FOC can be further summarized as follows.
Corollary 3.4. For T = 2 and TML the following holds
$$\ell_c(\phi^{(l)}) = \ell_c(\phi^{(r)}), \qquad (3.22)$$
$$\sigma_v^2(\phi^{(l)}) > 0 > \sigma_v^2(\phi^{(r)}), \qquad (3.23)$$
$$\theta^2(\bar{\phi}) = \sigma^2(\bar{\phi}), \qquad (3.24)$$
$$\phi_B = \phi_W + 2. \qquad (3.25)$$
However, this result does not hold in general for RML and/or T > 2.
Proof. In Appendix 3.A.
¹⁰ From now on, where necessary to avoid confusion, we use the subscript 0 to denote the true values of the parameters, e.g. φ_0, σ_0².
Corollary 3.4 states that, if the cubic equation has only one solution, it is a corner solution, as the estimate $\sigma_v^2(\bar{\phi}) = 0$ in this case.¹¹ The equality of the likelihood at $\phi^{(l)}$ and $\phi^{(r)}$ implies that both can be considered “maximum likelihood” estimates. However, the second is inconsistent and leads to a negative estimate of σ_v². To illustrate the relevance of the occurrence of three solutions we derive the probability of a positive discriminant under the assumption of normality.
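For T = 2 the TML solutions are available in closed form, which makes Corollary 3.4 easy to check numerically. The sketch below is our own illustration; the four-unit, three-wave array `Y` (columns y_{i,0}, y_{i,1}, y_{i,2}) is made-up toy data chosen so that D > 0, and the helper names are hypothetical.

```python
import math

import numpy as np


def tml_three_wave(Y):
    """Closed-form TML solutions for T = 2: (phi_W, phi_bar, D, phi_l, phi_r)."""
    d1 = Y[:, 1] - Y[:, 0]
    d2 = Y[:, 2] - Y[:, 1]
    m = d1 @ d1
    phi_w = (d1 @ d2) / m
    phi_bar = phi_w + 1.0
    D = 1.0 + phi_w**2 - (d2 @ d2) / m
    if D <= 0:  # only the middle solution exists
        return phi_w, phi_bar, D, None, None
    return phi_w, phi_bar, D, phi_bar - math.sqrt(D), phi_bar + math.sqrt(D)


def variances_t2(phi, Y):
    """sigma^2(phi) and theta^2(phi) for T = 2, as in (3.20)."""
    d1 = Y[:, 1] - Y[:, 0]
    d2 = Y[:, 2] - Y[:, 1]
    N = Y.shape[0]
    sigma2 = np.sum((d2 - phi * d1) ** 2) / (2 * N)
    theta2 = np.sum((d2 - (phi - 2) * d1) ** 2) / (2 * N)
    return sigma2, theta2


Y = np.array([[0.0, 1.0, 1.5],
              [0.0, -2.0, -2.5],
              [1.0, 2.0, 2.2],
              [0.5, -0.5, 0.0]])
phi_w, phi_bar, D, phi_l, phi_r = tml_three_wave(Y)
s2_l, t2_l = variances_t2(phi_l, Y)
s2_r, t2_r = variances_t2(phi_r, Y)
s2_m, t2_m = variances_t2(phi_bar, Y)
```

With T = 2 the concentrated log-likelihood equals −(N/2) log(σ²(φ)θ²(φ)), so (3.22) corresponds to `s2_l * t2_l` equalling `s2_r * t2_r`; since θ² − σ² = Tσ_v², (3.23) corresponds to the signs of `t2_l - s2_l` and `t2_r - s2_r`; and (3.24) to `t2_m` equalling `s2_m`.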
Corollary 3.5. For T = 2 and under joint normality of the data, the probability of D > 0 (two maxima) is given by
$$\Pr(D > 0) = F\left(\frac{N}{N-1}\left(2\frac{\sigma_0^2}{\operatorname{var}(\Delta y_{i,1})} - \left(\frac{\sigma_0^2}{\operatorname{var}(\Delta y_{i,1})}\right)^2\right)^{-1}\right)_{(N-1,N)}, \qquad (3.26)$$
where F(·)_{(N−1,N)} is the CDF of an F-distributed random variable with (N − 1, N) degrees of freedom.
Proof. In Appendix 3.A.
The term inside F(·)_{(N−1,N)} is always larger than 1, and consequently Pr(D > 0) ≥ 0.5, as var(Δy_{i,1}) ≥ σ_0². If the initial observation is generated from a stationary process, then σ_0²/var(Δy_{i,1}) = (1 + φ_0)/2. To allow for an unrestricted initial condition we define the following relative variance ratio α_0
$$\alpha_0 \equiv \frac{1-\phi_0^2}{\sigma_0^2}\operatorname{var}\left(y_{i,0} - \frac{\eta_i}{1-\phi_0}\right) \;\Rightarrow\; \operatorname{var}(\Delta y_{i,1}) = \sigma_0^2\left(\alpha_0\frac{1-\phi_0}{1+\phi_0} + 1\right), \qquad (3.27)$$
such that α_0 = 1 if the initial observation is covariance stationary. Pr(D > 0) then depends on N, φ_0 and α_0. It can easily be seen that Pr(D > 0) is a decreasing function of φ_0 and an increasing function of α_0. Below we provide two graphs to illustrate how this probability depends on the population parameters.
¹¹ While discussing the properties of the panel VAR estimator, Juodis (2014b) observed that for the AR(1) case with T = 2 the results of the previous corollary hold asymptotically. In this paper, we show that this result is exact if the quadratic equation has a positive discriminant. Furthermore, Juodis (2014b) investigates the location of the second mode asymptotically and shows that its location depends on the initialization of y_{i,0}.
Figure 3.1: Probability of D > 0 as a function of α_0 and φ_0, with N = 50 on the left and N = 250 on the right. φ_0 ∈ [0; 0.95] and α_0 ∈ [0.0; 1.05].
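Formula (3.26), combined with (3.27) through σ_0²/var(Δy_{i,1}) = 1/(α_0(1 − φ_0)/(1 + φ_0) + 1), can be evaluated without special libraries. The sketch below is ours: to stay dependency-free it computes the F CDF via the regularized incomplete beta function, using the standard continued-fraction (modified Lentz) evaluation. In practice a library routine such as `scipy.stats.f.cdf` could be used instead. All function names are hypothetical.

```python
import math


def _betacf(a, b, x, max_iter=500, eps=3e-14, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))          # even step
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))  # odd step
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        step = d * c
        h *= step
        if abs(step - 1.0) < eps:
            break
    return h


def betainc_reg(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_cdf(x, d1, d2):
    """CDF of an F(d1, d2) random variable."""
    if x <= 0.0:
        return 0.0
    return betainc_reg(d1 / 2.0, d2 / 2.0, d1 * x / (d1 * x + d2))


def prob_two_maxima(N, phi0, alpha0):
    """Pr(D > 0) for T = 2 under joint normality, from (3.26) and (3.27)."""
    ratio = 1.0 / (alpha0 * (1.0 - phi0) / (1.0 + phi0) + 1.0)  # sigma0^2 / var(dy_{i,1})
    x = N / (N - 1) / (2.0 * ratio - ratio**2)
    return f_cdf(x, N - 1, N)


# Probabilities on a small grid of population parameters.
prob_grid = {(n, p, a): prob_two_maxima(n, p, a)
             for n in (50, 250) for p in (0.0, 0.5, 0.9) for a in (0.25, 1.0)}
```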
3.3.2 Further asymptotic results for T > 2 and TML
In this subsection we extend the analysis of one or three solutions to the FOC to T > 2. We consider the extent to which, asymptotically, the discriminant of (3.17) for TML is positive or negative. Before proceeding we define the following quantities
$$\tilde{a}_E \equiv E\left[\sum_{t=1}^{T}\tilde{y}_{i,t-1}^2\right], \qquad \check{a}_E \equiv E\left[T\check{y}_{i-}^2\right], \qquad \xi \equiv \sum_{t=0}^{T-2}(T-t-1)\phi_0^t, \qquad x \equiv \phi_0 - \phi. \qquad (3.28)$$
Using this notation we can express the asymptotic solutions of the FOC for general T as follows.
Proposition 3.6. The two non-trivial (x ≠ 0) asymptotic solutions of (3.17) (if they exist) are implicitly defined by
$$x^2\frac{T}{T-1}\tilde{a}_E\check{a}_E + x\frac{\xi}{T}\left(\frac{2T-1}{T-1}\theta_0^2\tilde{a}_E - \frac{T+1}{T-1}\sigma_0^2\check{a}_E\right) + \left(\theta_0^2\tilde{a}_E + \sigma_0^2\check{a}_E\right) - \frac{2\xi^2}{T(T-1)}\theta_0^2\sigma_0^2 = 0. \qquad (3.29)$$
The existence of non-trivial solutions to this equation in the simple AR(1) model depends on two parameters: the autoregressive coefficient φ_0 and the relative variance parameter α_0. Under this reparametrization the solutions in Proposition 3.6 are invariant to σ_0², because all quantities of interest are multiplicative in σ_0²
$$\tilde{a}_E = \left[\sum_{t=0}^{T-1}\phi_0^{2t} - \frac{1}{T}\left(\sum_{t=0}^{T-1}\phi_0^{t}\right)^2\right]\frac{\alpha_0\sigma_0^2}{1-\phi_0^2} + \sigma_0^2\sum_{t=0}^{T-2}\sum_{j=0}^{t}\phi_0^{j}\left(\phi_0^{j} - \frac{1}{T}\sum_{k=0}^{t}\phi_0^{k}\right), \qquad (3.30)$$
$$\check{a}_E = \frac{\alpha_0\sigma_0^2\xi^2}{T}\frac{1-\phi_0}{1+\phi_0} + \frac{\sigma_0^2}{T}\sum_{t=0}^{T-2}\left(\sum_{j=0}^{t}\phi_0^{j}\right)^2, \qquad (3.31)$$
$$\theta_0^2 = T\sigma_0^2\left(\alpha_0\frac{1-\phi_0}{1+\phi_0} + \frac{1}{T}\right). \qquad (3.32)$$
To gain further insight into the quadratic equation in Proposition 3.6 for general T > 2, we investigate the sign of its discriminant numerically for different values of T. In Figure 3.2 we present two plots of the sign of the discriminant. Here 3 indicates that the discriminant is positive (thus bimodality), while 1 implies that the discriminant is negative and the log-likelihood function is asymptotically unimodal. We present results for T ∈ {3, 5}. For higher values of T the border between three solutions and one solution to the FOC approaches the α_0 = 1 line from below. There is a major change from T = 2 to T > 2 in the set of values (φ_0, α_0) for which the discriminant is positive in the limit. For T = 2 all values of (φ_0, α_0) with α_0 > 0 and φ_0 < 1 yield a positive discriminant when N → ∞. This set is already clearly smaller for T = 3.
Figure 3.2: The sign of the discriminant for T = 3 on the left and T = 5 on the right graph (horizontal axis φ, vertical axis α). 3 is for a positive discriminant and thus three solutions to the FOC, while 1 is for a negative discriminant and one solution. φ_0 ∈ [0; 0.99] and α_0 ∈ [0.8; 1.05].
Figure 3.2 shows that as T increases, the interval of values α_0 < 1 that result in a positive discriminant shrinks. It can be shown numerically for the relevant range of values for α_0 and φ_0 that for α_0 ≥ 1 the discriminant is always positive. These results show that multiple solutions are possible for T > 2 even if N becomes large.
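The numerical exercise behind Figure 3.2 amounts to evaluating (3.30)-(3.32), plugging them into the quadratic (3.29) and checking the sign of its discriminant. A minimal sketch of that computation follows. It is our own code: the function names are hypothetical, and the normalization σ_0² = 1 is our choice, which is harmless given the invariance to σ_0² noted above. The two non-trivial limits of φ are recovered as φ = φ_0 − x.

```python
import math


def asymptotic_quadratic(T, phi0, alpha0, sigma2=1.0):
    """Coefficients (c2, c1, c0) of (3.29) in x = phi0 - phi; requires T >= 2 and |phi0| < 1."""
    ratio = (1.0 - phi0) / (1.0 + phi0)
    xi = sum((T - t - 1) * phi0**t for t in range(T - 1))
    partial = [sum(phi0**j for j in range(t + 1)) for t in range(T - 1)]
    partial_sq = [sum(phi0 ** (2 * j) for j in range(t + 1)) for t in range(T - 1)]
    powers = [phi0**t for t in range(T)]
    # (3.30)
    a_w = (sum(p * p for p in powers) - sum(powers) ** 2 / T) * alpha0 * sigma2 / (1.0 - phi0**2)
    a_w += sigma2 * sum(s2 - s1 * s1 / T for s1, s2 in zip(partial, partial_sq))
    # (3.31)
    a_b = alpha0 * sigma2 * xi**2 / T * ratio + sigma2 / T * sum(s1 * s1 for s1 in partial)
    # (3.32)
    theta2 = T * sigma2 * (alpha0 * ratio + 1.0 / T)
    c2 = T / (T - 1) * a_w * a_b
    c1 = xi / T * ((2 * T - 1) / (T - 1) * theta2 * a_w - (T + 1) / (T - 1) * sigma2 * a_b)
    c0 = theta2 * a_w + sigma2 * a_b - 2 * xi**2 / (T * (T - 1)) * theta2 * sigma2
    return c2, c1, c0


def discriminant_and_limits(T, phi0, alpha0):
    """Discriminant of (3.29) and, if it is positive, the two non-trivial limits of phi."""
    c2, c1, c0 = asymptotic_quadratic(T, phi0, alpha0)
    disc = c1**2 - 4.0 * c2 * c0
    if disc <= 0:
        return disc, ()
    r = math.sqrt(disc)
    xs = ((-c1 - r) / (2.0 * c2), (-c1 + r) / (2.0 * c2))
    return disc, tuple(sorted(phi0 - x for x in xs))


# Sign of the discriminant on a coarse (phi0, alpha0) grid for T = 3 (3 = positive, 1 = negative).
signs = {(p, a): 3 if discriminant_and_limits(3, p, a)[0] > 0 else 1
         for p in (0.2, 0.5, 0.8) for a in (0.8, 0.9, 1.0)}
```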
3.3.3 Constrained estimation
Aside from the suggested “take left” procedure in case of bimodality to avoid a negative estimate of σ_v², we may also use restricted ML estimation. Consider the reparametrization δ = σ²/θ², so that in the population δ ∈ (0; 1], because by definition θ² = σ² + Tσ_v². In order to take this population restriction into account we consider the following reformulated log-likelihood function
$$\ell(\kappa) = -\frac{N}{2}\left(T\log(\sigma^2) - \log(\delta) + \frac{1}{N\sigma^2}\sum_{i=1}^{N}\sum_{t=1}^{T}(\tilde{y}_{i,t} - \phi\tilde{y}_{i,t-1})^2 + \frac{\delta T}{N\sigma^2}\sum_{i=1}^{N}(\check{y}_i - \phi\check{y}_{i-} - \rho y_{i,0})^2\right), \qquad (3.33)$$
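for κ = (φ, σ², δ, ρ)′. The corresponding concentrated log-likelihood function in terms of the δ parameter is
$$-\frac{2}{N}\ell_c(\delta) = T\log\left[\left(\tilde{c} - 2\phi(\delta)\tilde{b} + \phi^2(\delta)\tilde{a}\right) + \delta\left(\ddot{c} - 2\phi(\delta)\ddot{b} + \phi^2(\delta)\ddot{a}\right)\right] - \log\delta, \qquad (3.34)$$
where $\phi(\delta) = (\tilde{b} + \delta\ddot{b})/(\tilde{a} + \delta\ddot{a})$ and
$$\tilde{a} = \frac{1}{N}\sum_{i=1}^{N}\sum_{t=1}^{T}\tilde{y}_{i,t-1}^2, \qquad \tilde{b} = \frac{1}{N}\sum_{i=1}^{N}\sum_{t=1}^{T}\tilde{y}_{i,t}\tilde{y}_{i,t-1}, \qquad \tilde{c} = \frac{1}{N}\sum_{i=1}^{N}\sum_{t=1}^{T}\tilde{y}_{i,t}^2,$$
$$\ddot{a} = \frac{T}{N}\sum_{i=1}^{N}\ddot{y}_{i-}^2, \qquad \ddot{b} = \frac{T}{N}\sum_{i=1}^{N}\ddot{y}_i\ddot{y}_{i-}, \qquad \ddot{c} = \frac{T}{N}\sum_{i=1}^{N}\ddot{y}_i^2.$$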
The expression for the concentrated log-likelihood can be further simplified as
$$-\frac{2}{N}\ell_c(\delta) = T\log\left[(\tilde{c} + \delta\ddot{c}) - \frac{(\tilde{b} + \delta\ddot{b})^2}{\tilde{a} + \delta\ddot{a}}\right] - \log\delta. \qquad (3.35)$$
Similarly to (3.17), the FOC for the log-likelihood function in (3.35) is cubic in δ.
One can relate the likelihood function in (3.35) to equation (2.3) in Maddala (1971), who investigates the occurrence of the boundary solution δ = 1 for this type of likelihood function.
Remark 3.2. Note that Maddala (1971), and also Balestra and Nerlove (1966), assume that y_{i,0} = 0 for all i. If this restriction is indeed true, the two log-likelihood functions are identical and φ_RML = φ_TML. On the other hand, if this restriction is not satisfied, the resulting estimator is not consistent for any fixed value of T and has an asymptotic bias of order O_P(T^{−2}). This estimator is labeled a “Misspecified Random Effects Estimator” in Hahn et al. (2004), where the authors provide asymptotic results for this estimator as N, T → ∞ jointly.
One can see that the necessary and sufficient condition for δ = 1 to be a local maximum is
$$\left.\frac{d\ell_c(\delta)}{d\delta}\right|_{\delta=1} > 0. \qquad (3.36)$$
In this case one sets δ = 1 and the corresponding estimate of φ is given by
$$\phi(1) = \frac{\tilde{b} + \ddot{b}}{\tilde{a} + \ddot{a}}. \qquad (3.37)$$
We know that for T = 2 and in the case of the TML estimator this φ(1) is exactly the middle solution $\bar{\phi} = \phi_W + 1$. In all other cases this solution will differ from the unique global unconstrained maximum (in the one-solution case). For example, if y_{i,0} = 0 for all i, one can recognize φ(1) as the pooled OLS estimator of φ, which is known to be positively biased. In general, φ(δ) is a weighted sum of the “within” and “quasi-between” estimators and thus belongs to the interval of Lemma 3.1. Furthermore, the weight of the “within” estimator is monotonically decreasing in δ, because
$$\phi(\delta) = \phi_W q(\delta) + \phi_B(1 - q(\delta)), \qquad q(\delta) = \frac{\tilde{a}}{\tilde{a} + \delta\ddot{a}}, \qquad q'(\delta) < 0. \qquad (3.38)$$
Hence, if the global maximum $\hat{\phi}$ does not satisfy the non-negativity constraint (so that the corresponding $\hat{\delta}$ exceeds 1 and $q(\hat{\delta}) < q(1)$), it is never smaller than φ(1) (assuming φ_W < φ_B). In the Monte Carlo section of this paper we investigate the finite sample properties of TML and RML estimators that use φ(1) as the estimate at the boundary of the parameter space.
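The boundary estimate φ(1) and the check (3.36) only involve the six moments defined below (3.34). The following is a minimal TML sketch under our own assumptions: no regressors, ρ = 0, a hypothetical N × (T+1) array `Y` with columns y_{i,0}, …, y_{i,T}, and hypothetical function names. For the derivative it uses the envelope theorem: the bracket in (3.35) is the minimum over φ of the bracket in (3.34), so its derivative with respect to δ equals c̈ − 2φ(δ)b̈ + φ²(δ)ä.

```python
import numpy as np


def tml_moments(Y):
    """T and the moments (a~, b~, c~), (a", b", c") defined below (3.34), TML case."""
    N, T = Y.shape[0], Y.shape[1] - 1
    y0, cur, lag = Y[:, 0], Y[:, 1:], Y[:, :-1]
    y_w = cur - cur.mean(axis=1, keepdims=True)
    yl_w = lag - lag.mean(axis=1, keepdims=True)
    y_b, yl_b = cur.mean(axis=1) - y0, lag.mean(axis=1) - y0
    within = (np.sum(yl_w**2) / N, np.sum(y_w * yl_w) / N, np.sum(y_w**2) / N)
    between = (T * (yl_b @ yl_b) / N, T * (y_b @ yl_b) / N, T * (y_b @ y_b) / N)
    return T, within, between


def phi_of_delta(delta, mom):
    """phi(delta) below (3.34); delta = 1 gives the boundary estimate (3.37)."""
    _, (aw, bw, _), (ab, bb, _) = mom
    return (bw + delta * bb) / (aw + delta * ab)


def profile(delta, mom):
    """-(2/N) l_c(delta) from (3.35), up to a constant."""
    T, (aw, bw, cw), (ab, bb, cb) = mom
    bracket = (cw + delta * cb) - (bw + delta * bb) ** 2 / (aw + delta * ab)
    return T * np.log(bracket) - np.log(delta)


def profile_derivative(delta, mom):
    """Derivative of profile() with respect to delta (envelope theorem)."""
    T, (aw, bw, cw), (ab, bb, cb) = mom
    phi = phi_of_delta(delta, mom)
    bracket = (cw + delta * cb) - (bw + delta * bb) ** 2 / (aw + delta * ab)
    return T * (cb - 2 * phi * bb + phi**2 * ab) / bracket - 1.0 / delta


def boundary_is_local_max(mom):
    """(3.36): d l_c / d delta > 0 at delta = 1, i.e. profile() has a negative slope there."""
    return profile_derivative(1.0, mom) < 0


# Toy data (illustrative only).
rng = np.random.default_rng(3)
N, T, phi0 = 100, 3, 0.5
mu = rng.normal(size=N)
Y = np.empty((N, T + 1))
Y[:, 0] = mu + rng.normal(size=N) / np.sqrt(1 - phi0**2)
for t in range(1, T + 1):
    Y[:, t] = phi0 * Y[:, t - 1] + (1 - phi0) * mu + rng.normal(size=N)
mom = tml_moments(Y)
phi_1 = phi_of_delta(1.0, mom)
```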
Remark 3.3. Although not addressed in this paper, the use of the boundary solution φ(1) might lead to a non-standard inference problem in finite samples (Feldman and Cousins (1998), Ketz (2014)). Confidence intervals based on inverting the Likelihood Ratio statistic have to be handled with care, as for some values of the null hypothesis φ_0 the likelihood ratio statistic can be non-positive.
3.4 Extension to exogenous regressors
For most empirically relevant applications the AR(1) model specification is too restrictive and incomplete. In this section we therefore extend our analysis to an ARX(1) model including additional strictly exogenous regressors.¹² For ease of exposition, we consider the following simplified version with one additional regressor
$$y_{i,t} = \eta_i + \phi y_{i,t-1} + \beta x_{i,t} + \varepsilon_{i,t}, \qquad E[\varepsilon_{i,t} \mid x_{i,0},\ldots,x_{i,T}, y_{i,0}, \eta_i] = 0. \qquad (3.39)$$
Then, using stacked notation for individual i, we have
$$\mathbf{y}_i = \phi\mathbf{y}_{i-} + \beta\mathbf{x}_i + \iota_T\eta_i + \boldsymbol{\varepsilon}_i, \qquad \boldsymbol{\varepsilon}_i = (\varepsilon_{i,1},\ldots,\varepsilon_{i,T})'. \qquad (3.40)$$
We continue by using the Chamberlain (1982) type of projection for η_i as in Kruiniger (2006) and Bai (2013a), now not only on y_{i,0} but also on all the lags and leads of x_{i,t}
$$\eta_i = \pi'\mathbf{w}_i + v_i, \qquad E[v_i\mathbf{w}_i] = 0, \qquad \mathbf{w}_i = (y_{i,0}, x_{i,0}, \mathbf{x}_i')'. \qquad (3.41)$$
The main implication is that the combined error term u_i can be represented as
$$\mathbf{u}_i = R\mathbf{y}_i - \mathbf{e}_1\phi y_{i,0} - \beta\mathbf{x}_i - \iota_T\pi'\mathbf{w}_i. \qquad (3.42)$$
The variance-covariance matrix remains the same as in the pure AR(1) model without exogenous regressor. The quasi log-likelihood function over all individuals is then given by (up to a constant)
$$-\frac{2}{N}\ell(\kappa) = (T-1)\log(\sigma^2) + \log(\theta^2) + \frac{1}{N\sigma^2}\sum_{i=1}^{N}\sum_{t=1}^{T}(\tilde{y}_{i,t} - \phi\tilde{y}_{i,t-1} - \beta\tilde{x}_{i,t})^2 + \frac{T}{N\theta^2}\sum_{i=1}^{N}(\bar{y}_i - \phi\bar{y}_{i-} - \beta\bar{x}_i - \pi'\mathbf{w}_i)^2, \qquad (3.43)$$
where κ = (φ, β, σ², θ², π′)′.
¹² Inclusion of weakly exogenous regressors can be handled if the x_{i,t} vector admits a VAR representation. Some results for first order panel VAR models are discussed in Juodis (2014b).
Similarly to the model without x_{i,t}, the TML estimator can be expressed as a restricted version (in terms of the parameter restrictions) of a more general RML estimator. Without loss of generality, we can rewrite the second component of the log-likelihood function as
$$\frac{T}{N\theta^2}\sum_{i=1}^{N}(\bar{y}_i - \phi\bar{y}_{i-} - \beta\bar{x}_i - \pi'\mathbf{w}_i)^2 = \frac{T}{N\theta^2}\sum_{i=1}^{N}(\check{y}_i - \phi\check{y}_{i-} - \beta\check{x}_i - \rho'\mathbf{z}_i)^2, \qquad (3.44)$$
where $\mathbf{z}_i \equiv (y_{i,0}, x_{i,0}, \Delta x_{i,1},\ldots,\Delta x_{i,T})'$ and $\check{x}_i \equiv \bar{x}_i - x_{i,0}$. Furthermore, using e.g. Theorem 3.1 in Juodis (2014b), the second component of the TML log-likelihood function is given by
$$\frac{T}{N\theta^2}\sum_{i=1}^{N}(\check{y}_i - \phi\check{y}_{i-} - \beta\check{x}_i - \pi_\Delta'\Delta\mathbf{x}_i)^2, \qquad \Delta\mathbf{x}_i = (\Delta x_{i,1},\ldots,\Delta x_{i,T})'. \qquad (3.45)$$
Hence, by setting the first two components of the ρ vector to zero we obtain the TML estimator as the restricted version of the RML estimator.¹³
Irrespective of the estimator considered, it is not difficult to see that, because $\check{x}_i \in \operatorname{Span}(\Delta\mathbf{x}_i)$, one can concentrate out the ρ/π_Δ parameter such that the second component of the log-likelihood function in (3.45) does not contain the β parameter. Therefore, the (concentrated) log-likelihood function can be expressed as
$$-\frac{2}{N}\ell(\kappa) = (T-1)\log(\sigma^2) + \log(\theta^2) + \frac{1}{N\sigma^2}\sum_{i=1}^{N}\sum_{t=1}^{T}(\tilde{y}_{i,t} - \phi\tilde{y}_{i,t-1} - \beta\tilde{x}_{i,t})^2 + \frac{T}{N\theta^2}\sum_{i=1}^{N}(\ddot{y}_i - \phi\ddot{y}_{i-})^2, \qquad (3.46)$$
where we defined
$$\ddot{y}_i \equiv \check{y}_i - \left(\sum_{i=1}^{N}\check{y}_i\Delta\mathbf{x}_i'\right)\left(\sum_{i=1}^{N}\Delta\mathbf{x}_i\Delta\mathbf{x}_i'\right)^{-1}\Delta\mathbf{x}_i, \qquad (3.47)$$
$$\ddot{y}_i \equiv \check{y}_i - \left(\sum_{i=1}^{N}\check{y}_i\mathbf{z}_i'\right)\left(\sum_{i=1}^{N}\mathbf{z}_i\mathbf{z}_i'\right)^{-1}\mathbf{z}_i, \qquad (3.48)$$
for the TML and RML estimators respectively (similarly for $\ddot{y}_{i-}$). One can also concentrate out the β parameter from the first component of the log-likelihood function. After subsequent concentration of σ² and θ², the resulting log-likelihood function is then of the same structure as in (3.13). The cubic FOC in (3.17) follows directly from that.
¹³ Note that the interpretation of TML as a restricted version of RML is only valid if one includes x_{i,0} in w_i.
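The projections (3.47)-(3.48) are ordinary least-squares residuals from a cross-sectional regression of y̌_i (and y̌_{i−}) on Δx_i or z_i. The sketch below is our own illustration: `Y` and `X` are hypothetical N × (T+1) arrays holding y_{i,0}, …, y_{i,T} and x_{i,0}, …, x_{i,T} (one regressor, as in (3.39)), and the toy data-generating values are arbitrary. It also checks numerically that x̌_i lies in the span of Δx_i, which is why β drops out of the second component.

```python
import numpy as np


def residualize(v, Z):
    """OLS residual of v (N,) on the columns of Z (N, k), summing over i."""
    coef, *_ = np.linalg.lstsq(Z, v, rcond=None)
    return v - Z @ coef


def between_terms_arx(Y, X, estimator="TML"):
    """ddot y_i and ddot y_{i-} of (3.47) (TML) or (3.48) (RML), plus the regressor matrix."""
    y0, x0 = Y[:, 0], X[:, 0]
    y_b = Y[:, 1:].mean(axis=1) - y0      # check y_i
    yl_b = Y[:, :-1].mean(axis=1) - y0    # check y_{i-}
    dX = np.diff(X, axis=1)               # rows are (dx_{i,1}, ..., dx_{i,T})
    if estimator == "TML":
        Z = dX
    elif estimator == "RML":
        Z = np.column_stack([y0, x0, dX])  # z_i = (y_{i,0}, x_{i,0}, dx_i')'
    else:
        raise ValueError("estimator must be 'TML' or 'RML'")
    return residualize(y_b, Z), residualize(yl_b, Z), Z


# Toy ARX(1) data (illustrative only).
rng = np.random.default_rng(2)
N, T, phi0, beta0 = 300, 4, 0.5, 1.0
X = rng.normal(size=(N, T + 1))
eta = rng.normal(size=N)
Y = np.empty((N, T + 1))
Y[:, 0] = eta + rng.normal(size=N)
for t in range(1, T + 1):
    Y[:, t] = eta + phi0 * Y[:, t - 1] + beta0 * X[:, t] + rng.normal(size=N)

yb_tml, ylb_tml, Z_tml = between_terms_arx(Y, X, "TML")
x_check = X[:, 1:].mean(axis=1) - X[:, 0]
x_check_resid = residualize(x_check, np.diff(X, axis=1))  # numerically zero
```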
Summarizing, in this section we argued that in the model augmented with exogenous regressors the FOC of the TML/RML estimators is again cubic in the autoregressive parameter φ. Our derivations above rely upon the fact that the full Chamberlain (1982) projection has been used, rather than the restricted Mundlak (1978) projection. Without going into further discussion, we state that for the TML estimator the results above do not carry over if one uses the Mundlak (1978) projection instead. However, they continue to be valid for the RML estimator if one does not include x_{i,0} when exploiting the Mundlak (1978) projection.¹⁴
3.5 Simulation study
In this section we investigate the finite sample performance of the various estimators and corresponding test statistics using simulated data. In particular, we consider the following panel AR(1) model
$$y_{i,t} = \phi y_{i,t-1} + (1-\phi)\mu_i + \varepsilon_{i,t}, \qquad \varepsilon_{i,t} \sim N(0, 1), \qquad t = 1,\ldots,T, \qquad (3.49)$$
$$y_{i,0} = \gamma\mu_i + \varepsilon_{i,0}, \qquad \varepsilon_{i,0} \sim N\left(0, \frac{\zeta}{1-\phi^2}\right), \qquad \mu_i \sim N(0, \sigma_\mu^2). \qquad (3.50)$$
Mean (effect) stationarity of yi,t is achieved for designs with γ = 1, while the process
yi,t is covariance stationary if and only if both γ = ζ = 1. The actual value of σ2µ is
irrelevant for the TML estimator as long as γ = 1, but for the RML estimator this
parameter is always important. For the TML estimator the only important parameter
is α as defined in (3.27) as it measures the deviation from covariance stationarity.
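The design (3.49)-(3.50) can be simulated as in the sketch below. It is our own code: it uses NumPy's `default_rng` generator, and the function names are hypothetical. The sketch also computes the implied value of α from (3.27). Since η_i = (1 − φ)μ_i in this design, y_{i,0} − η_i/(1 − φ) = (γ − 1)μ_i + ε_{i,0}, so that α = ζ + (1 − φ²)(γ − 1)²σ_μ². This equals ζ whenever γ = 1, consistent with σ_μ² being irrelevant for TML in that case.

```python
import numpy as np


def simulate_design(N, T, phi, gamma=1.0, zeta=1.0, sigma_mu=1.0, rng=None):
    """Simulate (3.49)-(3.50); returns Y of shape (N, T+1) with columns y_{i,0..T}, and mu."""
    rng = np.random.default_rng() if rng is None else rng
    mu = rng.normal(0.0, sigma_mu, size=N)
    Y = np.empty((N, T + 1))
    Y[:, 0] = gamma * mu + rng.normal(0.0, np.sqrt(zeta / (1.0 - phi**2)), size=N)
    for t in range(1, T + 1):
        Y[:, t] = phi * Y[:, t - 1] + (1.0 - phi) * mu + rng.normal(0.0, 1.0, size=N)
    return Y, mu


def design_alpha(phi, gamma, zeta, sigma_mu):
    """alpha of (3.27) implied by this design (error variance equal to 1)."""
    return zeta + (1.0 - phi**2) * (gamma - 1.0) ** 2 * sigma_mu**2


# The parameter grid described in the text (zeta = 1 throughout).
grid = [(N, T, gamma, s_mu, phi)
        for N in (50, 250) for T in (3, 7) for gamma in (0.5, 1.0)
        for s_mu in (1.0, 3.0) for phi in (0.5, 0.8)]
rng = np.random.default_rng(123)
N, T, gamma, s_mu, phi = grid[0]
Y, mu = simulate_design(N, T, phi, gamma=gamma, sigma_mu=s_mu, rng=rng)
```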
¹⁴ Or if one does not impose that the coefficient for x_{i,0} is identical to the one for x_{i,1}, …, x_{i,T}.
Even for the simple AR(1) model the parameter space is already very large. We have tried to cover its most relevant part by considering the following parameter settings
N ∈ {50, 250}, T ∈ {3, 7}, γ ∈ {0.5, 1.0}, σ_μ ∈ {1, 3}, φ ∈ {0.5, 0.8},
while ζ = 1. We report the mean, median, IQR (interquartile range) and RMSE for the following coefficient estimators (TML/RML):¹⁵
• T(R)ML based on the global maximum (T(R)MLg), where always the global
maximum is selected.
• T(R)ML based on the “left” maximum (T(R)MLl), that takes into account the
non-negativity restriction only if there are two competing local maxima.
• T(R)ML with the imposed boundary condition φ(1) (T(R)MLb) from equation
(3.37), as in Section 3.3.3.
Note that in calculating the coefficient estimators we refrained from using numerical optimization techniques. As mentioned earlier, root-finding algorithms can be used to obtain all solutions to the cubic first-order condition.¹⁶
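The three selection rules listed above can be sketched for TML as follows. This is our own illustration, not the code behind the reported tables; the panel is a hypothetical N × (T+1) array `Y` with columns y_{i,0}, …, y_{i,T}, and the function name is ours. The sketch forms the cubic (3.17) from sample moments and solves it with `np.roots`. A real root is classified as a local maximum when the polynomial's derivative is negative there, since the polynomial has the same sign as dℓ_c/dφ. As an assumption on our part, T(R)MLb is implemented as the highest-likelihood local maximum satisfying θ²(φ) ≥ σ²(φ), falling back to φ(1) of (3.37) when no such maximum exists.

```python
import numpy as np


def tml_estimates(Y, imag_tol=1e-8):
    """TMLg, TMLl and TMLb for a panel Y of shape (N, T+1) (no regressors)."""
    N, T = Y.shape[0], Y.shape[1] - 1
    y0, cur, lag = Y[:, 0], Y[:, 1:], Y[:, :-1]
    y_w = cur - cur.mean(axis=1, keepdims=True)
    yl_w = lag - lag.mean(axis=1, keepdims=True)
    y_b, yl_b = cur.mean(axis=1) - y0, lag.mean(axis=1) - y0
    A1, B1, C1 = np.sum(yl_w**2), np.sum(y_w * yl_w), np.sum(y_w**2)
    A2, B2, C2 = yl_b @ yl_b, y_b @ yl_b, y_b @ y_b

    def sigma2(phi):  # (3.14)
        return (C1 - 2 * phi * B1 + phi**2 * A1) / (N * (T - 1))

    def theta2(phi):  # (3.14)
        return T * (C2 - 2 * phi * B2 + phi**2 * A2) / N

    def loglik(phi):  # (3.13), up to a constant
        return -N / 2 * ((T - 1) * np.log(sigma2(phi)) + np.log(theta2(phi)))

    # Cubic FOC (3.17) multiplied by N/T; it has the same sign as d l_c / d phi.
    poly = np.polyadd(np.polymul([A2, -2 * B2, C2], [-A1, B1]),
                      np.polymul([A1, -2 * B1, C1], [-A2, B2]) / (T - 1))
    roots = np.roots(poly)
    keep = np.abs(roots.imag) <= imag_tol * (1 + np.abs(roots.real))
    real = np.sort(roots[keep].real)
    slope = np.polyval(np.polyder(poly), real)
    maxima = [float(r) for r, s in zip(real, slope) if s < 0]

    phi_g = max(maxima, key=loglik)  # global maximum
    phi_l = min(maxima)              # "left" maximum (the unique one if only one)
    # Assumption (ours): TMLb = best local maximum with theta^2 >= sigma^2, else phi(1).
    feasible = [p for p in maxima if theta2(p) >= sigma2(p)]
    phi_1 = (B1 + T * B2) / (A1 + T * A2)  # (3.37) written with unnormalized moments
    phi_b = max(feasible, key=loglik) if feasible else phi_1
    return {"g": phi_g, "l": phi_l, "b": phi_b, "roots": real, "loglik": loglik}


# Made-up three-wave toy panel with three solutions to the FOC.
Y = np.array([[0.0, 1.0, 1.5],
              [0.0, -2.0, -2.5],
              [1.0, 2.0, 2.2],
              [0.5, -0.5, 0.0]])
est = tml_estimates(Y)
```

For this T = 2 example the two outer maxima have (numerically) the same likelihood, so the “global” choice is essentially arbitrary, while the left and boundary rules both return the consistent-type solution φ^(l).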
Regarding inference, we consider empirical rejection frequencies based on two-sided t- and LR statistics.¹⁷ We address both size and power. Due to the possible flatness of the profile log-likelihood functions induced by bimodality, the quality of inference based on the two classical tests might differ substantially. The t (or Wald) test critically depends on a quadratic approximation of the likelihood, which may cause problems when the likelihood is flat. The LR test is probably better behaved under the null hypothesis, but the flatness of the likelihood will affect its power. Below we summarize some general patterns that arise from the various tables with simulation results in the Appendix.
3.5.1 Results: Estimation
Regarding coefficient estimation, we find that both T(R)MLl and T(R)MLb perform substantially better than always choosing the global maximum of the likelihood function (T(R)MLg). This point is especially relevant for TML and results from not taking the non-negativity constraint into consideration. The “right” instead of the “left” solution to the FOC may sometimes provide the global maximum of the likelihood, but choosing it causes serious bias. Therefore, in terms of bias and RMSE both TMLg and RMLg are dominated by the “left” estimators or by exploiting the boundary condition.
¹⁵ We do not provide results for the T(R)ML estimator where one imposes restrictions directly on the parameter space of φ. The common restriction of this type is φ ∈ (−1; 1), as considered e.g. in Hsiao et al. (2002).
¹⁶ As implemented by e.g. the roots(·) function in Matlab.
¹⁷ For the t-test we exploit the usual “sandwich” covariance matrix estimator, e.g. as in Hayakawa and Pesaran (2015) or Juodis (2014b).
When we do not consider T(R)MLg, we find little difference between RML and TML. There is some tendency for RML to dominate TML, confirming results in Kruiniger (2013). However, unlike the aforementioned study, we do not observe any substantial problems for TML when σ_μ increases. Furthermore, in terms of RMSE, exploiting the boundary solution is almost always better than the “left” estimator. However, in some cases (for small N and/or large φ) this choice can have a negative effect on mean and median bias. This observation is in line with the theoretical discussion at the end of Section 3.3.3. As N and T increase the discrepancy becomes negligible. Also, the distributions of all estimators tend to be asymmetric, as illustrated by the discrepancy between the mean and the median.
Finally, we find that in all cases where the cubic FOC had three solutions, the “left”
solution always satisfies the non-negativity constraint, while the “right” solution never
satisfies it. Furthermore, in those cases the global constrained maximum is also achieved
at the “left” solution, and not at the boundary. Hence for replications with three
solutions only the “left” solution is natural. This is an important observation, because
it suggests that there is always at most one interior maximum in the constrained
optimization problem.
Figure 3.3 further illustrates how different ways of dealing with the boundary condition shape the finite sample distributions of the coefficient estimators. The fact that most of the studies that consider RML and/or TML estimation (e.g. Hsiao et al. (2002), Alvarez and Arellano (2003), Ahn and Thomas (2006), Kruiniger (2008), Hayakawa and Pesaran (2015)) either do not address the negative variance issue at all or only mention it without further exploring its consequences is somewhat puzzling. As we can see, for many designs substantial gains in terms of RMSE can be achieved when using RMLl/TMLl rather than RMLg/TMLg. Finally, in most cases the IQR for TML is larger than the corresponding value for RML, thus confirming the asymptotic results presented in Proposition 3.2.
3.5.2 Results: Inference
Regarding inference with the t and LR statistics, we observe that the LR test provides reasonable size control, but its power properties are poor for small N and T. Given the asymmetry of the likelihood function, the power for alternatives larger than the null hypothesis is especially negligible and comparable to size. The power of the LR test improves significantly with a large sample size, and especially for larger T. TMLb/RMLb tends to be undersized in comparison to TMLl/RMLl.
Figure 3.3: Finite sample distribution of the TML/RML estimators (density panels for TMLg, RMLg, TMLl, RMLl, TMLb and RMLb) for N = 250, T = 3, φ = 0.5 with covariance stationary initialization of y_{i,0}.
Furthermore, inference based on the t-test in samples with small N and T is unreliable
as the actual rejection frequencies are substantially higher than nominal ones. The
results for the t-statistic deteriorate for φ = 0.8, which is related to the non-standard
behavior of the TML/RML estimator when φ is local-to-unity, see Kruiniger (2013) for
related results. The LR test is much less affected by the value of φ.
Finally, TML and RML statistics are similar in terms of empirical size, with the RML statistics having higher power. In terms of empirical size we can rank the test statistics in the following order: g > l > b. These differences slowly disappear as N and/or T get larger.
3.6 Empirical illustration
In this section we study the behavior of the TML and RML estimators using data from Bun and Carree (2005), who considered the following model for unemployment at the U.S. state level
$$u_{i,t} = \phi u_{i,t-1} + \beta g_{i,t-1} + \eta_i + \tau_t + \varepsilon_{i,t}. \qquad (3.51)$$
Here u_{i,t} is the unemployment rate in state i at time t and g_{i,t−1} is the real economic growth rate at time t − 1.¹⁸ The annual panel data cover the years 1991-2000 for all U.S. states (including Washington D.C., hence N = 51). We present estimation results for the model including the growth regressor (Table 3.1) and the pure AR(1) specification (Table 3.2). In order to investigate how the behavior of the log-likelihood function for both estimators changes as T increases, we consider estimates over an increasing window. Thus results for T = 2 are obtained based on the years 1998-2000, T = 3 exploits the period 1997-2000, etc. All models are estimated based on data in deviations from cross-sectional means to filter out the time effects. To illustrate the finite sample properties of T(R)MLg, T(R)MLl and T(R)MLb we present coefficient estimates for varying T.
3.6.1 ARX(1) model
Using all time periods (T = 9), the estimation results based on TML and RML are very similar to the estimates in Bun and Carree (2005) obtained using the bias-corrected FE estimator.¹⁹ This similarity between the bias-corrected FE results and the TML/RML estimators is not surprising, given that all three estimators correct for the bias in the FE estimator using some bias adjustment procedure.
The results in Table 3.1 show that for RML estimation RMLg = RMLl = RMLb, irrespective of T. Hence, the global maximum is always achieved at the “left” solution that satisfies the non-negativity restriction, which amounts to θ² ≥ σ². The same holds for TML estimation, with the clear exception of the T = 2 case. There the global maximum is attained at φ^(r) = 1.422, which is substantially larger than φ^(l) = 0.506.
¹⁸ Empirical evidence on strict exogeneity of g_{i,t−1} is provided in Bun and Carree (2005).
¹⁹ The bias-corrected FE estimates are φ = 0.615 and β = −0.057. Furthermore, our results are in line with the results of Lokshin (2008) obtained for TML.
Table 3.1: TML and RML estimates for the ARX(1) model.
T   TMLg φ   TMLg β   TMLl φ   TMLl β   RMLg φ   RMLg β   RMLl φ   RMLl β
2    1.422    0.020    0.506    0.003    0.493    0.003    0.493    0.003
3    0.429   -0.006    0.429   -0.006    0.532   -0.008    0.532   -0.008
4    0.492   -0.026    0.492   -0.026    0.562   -0.025    0.562   -0.025
5    0.451   -0.031    0.451   -0.031    0.489   -0.030    0.489   -0.030
6    0.511   -0.036    0.511   -0.036    0.531   -0.035    0.531   -0.035
7    0.511   -0.038    0.511   -0.038    0.517   -0.038    0.517   -0.038
8    0.577   -0.041    0.577   -0.041    0.587   -0.040    0.587   -0.040
9    0.617   -0.057    0.617   -0.057    0.641   -0.055    0.641   -0.055
3.6.2 AR(1) model
The empirical results for the pure AR(1) model without gi,t−1 are reported in Table 3.2.
As can be seen, the estimation results obtained from TML are quite stable irrespective
of the time horizon under consideration. Furthermore, in all cases we find that the
global maximum of the log-likelihood function is attained at the left maximum.20
The results for the RML estimator, however, are considerably less stable. For T = 2, 5, 7, 8 all three RML estimators are identical and very close to the TML estimator. In two other cases, i.e. T = 6, 9, the global maximum is obtained at the “right” solution and not at the “left” one. The result is a large difference between RMLg and RMLl/RMLb. Because the “left” solution satisfies the non-negativity constraint, the RMLl and RMLb estimators are identical. Finally, for T = 3, 4 there exists only one solution, which does not satisfy the non-negativity constraint. RMLl and RMLb therefore produce markedly different estimates, with the latter actually being quite close to the corresponding TMLl estimates. In Figure 3.5 (see the Appendix) we provide detailed plots of the concentrated log-likelihood function of this model for different values of T. Based on these plots we can see how small increments in the length of the time series change the shape of the concentrated log-likelihood function. Furthermore, the log-likelihood functions for both estimators are relatively flat, both where the likelihoods are unimodal and where they are bimodal. Finally, we see in all cases that the second mode of RML is smaller in absolute value than that of TML.
²⁰ When T = 2 we report the “left” solution for TMLg, as both of its solutions have the same log-likelihood value.
Table 3.2: TML and RML estimates for the AR(1) model.
T   TMLg    TMLl    RMLg    RMLl    RMLb
2   0.502   0.502   0.516   0.516   0.516
3   0.514   0.514   0.750   0.750   0.613
4   0.522   0.522   0.950   0.950   0.667
5   0.461   0.461   0.514   0.514   0.514
6   0.553   0.553   1.062   0.596   0.596
7   0.545   0.545   0.565   0.565   0.565
8   0.613   0.613   0.632   0.632   0.632
9   0.671   0.671   1.054   0.695   0.695
3.7 Conclusions
We have investigated some finite sample and asymptotic properties of the TML and RML estimators for dynamic panel data models. Both estimators are consistent for fixed T and large N, but in finite samples their actual numerical implementation matters for inference. We showed that in a simple AR(1) model with homoscedastic errors the TML and RML estimators can be obtained as solutions of cubic first-order conditions. We furthermore argued that in some cases the value that maximizes the log-likelihood function is not the best possible solution, as it can violate the non-negativity constraint on the variance. Finally, we showed that these results extend to models with additional exogenous regressors.

In a Monte Carlo study we found that the non-negativity constraints cannot be ignored, as is commonly done in the literature. However, we also found that for some parameter values the use of a constrained likelihood can have detrimental effects on the finite sample bias of the corresponding ML estimator. Additionally, inference based on likelihood-based estimators can be highly misleading: for small values of N and T we found that t-tests tend to be substantially oversized. Although inference based on the LR test provides reasonable size control for small N and T, it can result in low power due to the possible flatness of the likelihood function.

Finally, we investigated the issues of local maxima and boundary solutions in an empirical analysis of U.S. state-level unemployment rates. We found that in some cases the different treatment of these issues leads to markedly different estimates of the autoregressive parameter.
3.A Proofs
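Proof of Proposition 3.2. To prove that the quasi-between estimator for RML is asymptotically smaller than that for TML, it is sufficient to show that

$$\theta^2_T a^R_\infty - \theta^2_R a^T_\infty \geq 0,$$

as $\operatorname{plim}_{N\to\infty}\hat{\phi}^{(j)}_B = \phi_0 + \frac{1}{T}\frac{\theta^2_j}{a^j_\infty}\xi$ for $j = T, R$. This follows from the fact that $\phi = \phi_0$ is always a solution to the FOC asymptotically (for more details refer to the proof of Proposition 3.6), so that the maximum likelihood estimator is consistent. Here $a^j_\infty = \operatorname{plim}_{N\to\infty}\frac{T}{N}\sum_{i=1}^{N}\bar{y}_{i,-}^2$ for $j = T, R$. Observe that for any T

$$\xi \equiv \sum_{t=0}^{T-2}(T - t - 1)\phi_0^t \geq 0,$$

$$\theta^2_T = \sigma^2_0 + T\left(\operatorname{var}(\Delta y_{i,1}) - \sigma^2_0\right) = \sigma^2_0 + T\left(\mathrm{E}\left[\left((\phi_0 - 1 + \pi_0)y_{i,0} + v_i + \varepsilon_{i,1}\right)^2\right] - \sigma^2_0\right),$$

$$\theta^2_R = \sigma^2_0 + T\,\mathrm{E}[v_i^2],$$

where, as before, $\eta_i = \pi_0 y_{i,0} + v_i$ and $\pi_0 = \mathrm{E}[y_{i,0}\eta_i]/\mathrm{E}[y_{i,0}^2]$. The difference is thus $\theta^2_T - \theta^2_R = T(\phi_0 - 1 + \pi_0)^2\,\mathrm{E}[y_{i,0}^2] \geq 0$. Regarding $a^R_\infty$ we have

$$a^R_\infty = a^T_\infty - T\frac{\left(\mathrm{E}[\bar{y}_{i,-}y_{i,0}]\right)^2}{\mathrm{E}[y_{i,0}^2]}, \qquad \mathrm{E}[\bar{y}_{i,-}y_{i,0}] = \frac{1}{T}\xi(\phi_0 - 1 + \pi_0)\,\mathrm{E}[y_{i,0}^2],$$

while for $a^T_\infty$:²¹

$$\begin{aligned}
a^T_\infty &= \frac{\xi^2}{T}(\phi_0 - 1)^2\,\mathrm{E}\left[\left(y_{i,0} - \frac{\eta_i}{1 - \phi_0}\right)^2\right] + \frac{\sigma^2_0}{T}\sum_{t=0}^{T-2}\left(\sum_{j=0}^{t}\phi_0^j\right)^2 \\
&= \frac{\xi^2}{T}(\phi_0 - 1)^2\,\mathrm{E}\left[\left(y_{i,0}\left(1 - \frac{\pi_0}{1 - \phi_0}\right) - \frac{v_i}{1 - \phi_0}\right)^2\right] + \frac{\sigma^2_0}{T}\sum_{t=0}^{T-2}\left(\sum_{j=0}^{t}\phi_0^j\right)^2 \\
&= \frac{1}{T}(\phi_0 - 1 + \pi_0)^2\xi^2\,\mathrm{E}[y_{i,0}^2] + \frac{\xi^2}{T}\mathrm{E}[v_i^2] + \frac{\sigma^2_0}{T}\sum_{t=0}^{T-2}\left(\sum_{j=0}^{t}\phi_0^j\right)^2,
\end{aligned}$$

as $\mathrm{E}[v_i y_{i,0}] = 0$, which implies that

$$a^R_\infty = \frac{\xi^2}{T}\mathrm{E}[v_i^2] + \frac{\sigma^2_0}{T}\sum_{t=0}^{T-2}\left(\sum_{j=0}^{t}\phi_0^j\right)^2 = \frac{1}{T}\left(\xi^2\,\mathrm{E}[v_i^2] + \sigma^2_0 q\right),$$

where $q \equiv \sum_{t=0}^{T-2}\left(\sum_{j=0}^{t}\phi_0^j\right)^2$. Denote $w \equiv (\phi_0 - 1 + \pi_0)^2\,\mathrm{E}[y_{i,0}^2] \geq 0$; then

$$\begin{aligned}
\theta^2_T a^R_\infty - \theta^2_R a^T_\infty &= Tw\,a^R_\infty - \theta^2_R\frac{\xi^2}{T}w \\
&= w\left(\xi^2\,\mathrm{E}[v_i^2] + \sigma^2_0 q\right) - w\xi^2\left(\mathrm{E}[v_i^2] + \frac{1}{T}\sigma^2_0\right) = w\sigma^2_0\left(q - \frac{\xi^2}{T}\right) \geq 0,
\end{aligned}$$

where the last result is an implication of Jensen's inequality: writing $s_t \equiv \sum_{j=0}^{t}\phi_0^j$, we have $\xi = \sum_{t=0}^{T-2}s_t$ and $q = \sum_{t=0}^{T-2}s_t^2$, so that $\xi^2 \leq (T-1)q$ and hence $q - \xi^2/T \geq q/T > 0$ (note that $q \geq s_0^2 = 1$). The inequality is strict whenever $w > 0$.

21 For derivations of this term, please refer to Lemma 2 in the Appendix of Chapter 2.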
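The algebra of this proof can be checked numerically. The sketch below uses arbitrary, hypothetical values for φ0, π0, E[y²_{i,0}], E[v²_i] and σ0², builds θ²_T, θ²_R, a^T_∞ and a^R_∞ from the expressions in the proof, and compares θ²_T a^R_∞ − θ²_R a^T_∞ with the closed form wσ0²(q − ξ²/T). It also confirms that the two ways of writing ξ agree and that q − ξ²/T > 0.

```python
import math

def xi_q(phi0, T):
    """xi and q from the proof, written via s_t = sum_{j=0}^{t} phi0^j."""
    s = [sum(phi0 ** j for j in range(t + 1)) for t in range(T - 1)]
    return sum(s), sum(v * v for v in s)

def gap(phi0, pi0, Ey0sq, Evsq, sigma0sq, T):
    """Return theta_T^2 a_R - theta_R^2 a_T and w sigma0^2 (q - xi^2 / T)."""
    xi, q = xi_q(phi0, T)
    w = (phi0 - 1 + pi0) ** 2 * Ey0sq
    theta2_T = sigma0sq + T * (w + Evsq)
    theta2_R = sigma0sq + T * Evsq
    a_R = (xi ** 2 * Evsq + sigma0sq * q) / T
    a_T = a_R + xi ** 2 * w / T            # a_T - a_R = xi^2 w / T
    return theta2_T * a_R - theta2_R * a_T, w * sigma0sq * (q - xi ** 2 / T)

for T in (2, 3, 5, 9):
    for phi0 in (-0.5, 0.0, 0.5, 0.9):
        xi, q = xi_q(phi0, T)
        # same xi as the definition sum_{t=0}^{T-2} (T - t - 1) phi0^t
        assert math.isclose(xi, sum((T - t - 1) * phi0 ** t for t in range(T - 1)))
        lhs, rhs = gap(phi0, pi0=0.3, Ey0sq=1.5, Evsq=0.7, sigma0sq=1.0, T=T)
        assert math.isclose(lhs, rhs, rel_tol=1e-9, abs_tol=1e-12)
        assert q - xi ** 2 / T > 0
print("identity and inequality hold on the grid")
```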
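Proof of Proposition 3.3. Using the variables defined in Section 3.3.3, we note that for T = 2 and TML

$$\bar{a} = a, \qquad \bar{b} = b + 2a, \qquad \bar{c} = c + 4(a + b),$$

and thus $\theta^2(\phi) = \sigma^2(\phi) + 4(a(1 - \phi) + b)$. Furthermore, for T = 2 we have from (3.13) that $\ell_c(\phi) \propto \log(\theta^2(\phi)\sigma^2(\phi))$ with

$$\begin{aligned}
\theta^2(\phi)\sigma^2(\phi) &= \sigma^4(\phi) + 4\sigma^2(\phi)(b - \phi a) + 4\sigma^2(\phi)a \\
&= \left(2(b - \phi a) + \sigma^2(\phi)\right)^2 - 4(b - \phi a)^2 + 4\sigma^2(\phi)a \\
&= \left(2(b - \phi a) + \sigma^2(\phi)\right)^2 - 4(b^2 - 2\phi ab + \phi^2a^2) + 4(ac - 2\phi ab + \phi^2a^2) \\
&= \left(2(b - \phi a) + \sigma^2(\phi)\right)^2 + 4(ac - b^2) \\
&= \left(2(b - \phi a) + \sigma^2(\phi)\right)^2 + d,
\end{aligned}$$

where

$$d = \left(\frac{1}{N}\sum_{i=1}^{N}(\Delta y_{i,1})^2\right)\left(\frac{1}{N}\sum_{i=1}^{N}(\Delta y_{i,2})^2\right) - \left(\frac{1}{N}\sum_{i=1}^{N}\Delta y_{i,1}\Delta y_{i,2}\right)^2. \qquad (3.52)$$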
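The completing-the-square step, and the zero of θ²(φ) − σ²(φ) used in Corollary 3.4, can be verified numerically. The sketch assumes, as in the decomposition used for general T in the proof of Corollary 3.4, that for T = 2 (so T1 = 1) σ²(φ) = c − 2φb + φ²a and θ²(φ) = σ²(φ) + 4(a(1 − φ) + b). The values of a, b and c are arbitrary draws rather than the sample moments of Section 3.3.3.

```python
import math
import random

def sigma2(phi, a, b, c):
    # sigma^2(phi) = c - 2 phi b + phi^2 a  (T = 2, so T1 = 1)
    return c - 2 * phi * b + phi ** 2 * a

def theta2(phi, a, b, c):
    # theta^2(phi) = sigma^2(phi) + 4 (a (1 - phi) + b)  (T = 2, TML)
    return sigma2(phi, a, b, c) + 4 * (a * (1 - phi) + b)

def completed_square(phi, a, b, c):
    # (2 (b - phi a) + sigma^2(phi))^2 + 4 (ac - b^2)
    return (2 * (b - phi * a) + sigma2(phi, a, b, c)) ** 2 + 4 * (a * c - b ** 2)

rng = random.Random(1)
worst = 0.0
for _ in range(1000):
    a, c = rng.uniform(0.1, 3.0), rng.uniform(0.1, 3.0)
    b = rng.uniform(-1.0, 1.0) * math.sqrt(a * c)   # keeps ac - b^2 >= 0
    phi = rng.uniform(-1.0, 2.0)
    product = theta2(phi, a, b, c) * sigma2(phi, a, b, c)
    worst = max(worst, abs(product - completed_square(phi, a, b, c)))
print(f"largest discrepancy: {worst:.1e}")

# Corollary 3.4: theta^2 and sigma^2 coincide at phi = 1 + b/a (zero up to rounding)
a, b, c = 1.0, 0.3, 2.0
print(theta2(1 + b / a, a, b, c) - sigma2(1 + b / a, a, b, c))
```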
Proof of Corollary 3.4. From Proposition 3.3 we have that $\theta^2(\phi) - \sigma^2(\phi) = 4(a(1 - \phi) + b)$. Given that $\phi = 1 + b/a$, one can easily see that $\theta^2(\phi) - \sigma^2(\phi) = 4(a(1 - \phi) + b) = 0$. The first and the third parts follow from the symmetry established in Proposition 3.3, while the last part follows directly from the definitions.

These results do not hold for T > 2 and/or the RML estimator. For example, observe that for general T we can decompose (for simplicity denote $T_1 = 1/(T - 1)$)

$$\theta^2(\phi) = \bar{c} - 2\phi\bar{b} + \phi^2\bar{a}, \qquad \sigma^2(\phi) = T_1\left(c - 2\phi b + \phi^2 a\right),$$

$$\theta^2(\phi) - \sigma^2(\phi) = \phi^2(\bar{a} - T_1 a) - 2\phi(\bar{b} - T_1 b) + (\bar{c} - T_1 c).$$

For T = 2 and TML we have $\bar{a} = a$, and the right-hand side of the last equation becomes linear in φ. Hence, setting the left-hand side of the last equation equal to zero and solving for φ, there is only one solution, given by $\phi_W + 1$. For T > 2 and/or the RML estimator, the equation $\theta^2(\phi) - \sigma^2(\phi) = 0$ has two solutions of the form

$$\phi = \frac{\bar{b} - T_1 b}{\bar{a} - T_1 a} \pm \sqrt{\frac{(\bar{b} - T_1 b)^2 - (\bar{a} - T_1 a)(\bar{c} - T_1 c)}{(\bar{a} - T_1 a)^2}}. \qquad (3.53)$$

The first-order condition in (3.17), evaluated at this value of φ, becomes proportional to $(\bar{b} + b) - \phi(\bar{a} + a) \neq 0$. Since the first-order condition is not zero at this point, the corner solution $\theta^2(\phi) = \sigma^2(\phi)$ cannot hold.
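The corner points in (3.53) can be computed directly. In the sketch below, `abar, bbar, cbar` stand for the coefficients of θ²(φ) and `a, b, c` for those of σ²(φ)/T1 in the decomposition above; all numerical values are hypothetical. The linear branch reproduces the single T = 2 TML solution φ = 1 + b/a.

```python
import math

def corner_points(abar, bbar, cbar, a, b, c, T):
    """Real solutions of theta^2(phi) - sigma^2(phi) = 0, cf. (3.53).

    With T1 = 1/(T-1) the equation reads A phi^2 - 2 B phi + C = 0, where
    A = abar - T1 a, B = bbar - T1 b and C = cbar - T1 c.
    """
    T1 = 1.0 / (T - 1)
    A, B, C = abar - T1 * a, bbar - T1 * b, cbar - T1 * c
    if abs(A) < 1e-12:
        # Linear case (T = 2 and TML, where abar = a): a single solution
        return [C / (2.0 * B)]
    disc = B * B - A * C
    if disc < 0:
        return []                      # no real point with theta^2 = sigma^2
    r = math.sqrt(disc) / abs(A)       # sqrt((B^2 - AC) / A^2)
    return [B / A - r, B / A + r]

# T = 2, TML: abar = a, bbar = b + 2a, cbar = c + 4(a + b)
a, b, c = 1.0, 0.3, 2.0
print(corner_points(a, b + 2 * a, c + 4 * (a + b), a, b, c, T=2))  # approx. [1.3] = 1 + b/a
# A hypothetical T = 4 configuration with two real corner points
print(corner_points(2.0, 1.5, 0.5, 1.0, 0.2, 0.3, T=4))
```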
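Proof of Corollary 3.5. Note that the discriminant is $D = 1 - S_{22.1}/S_{11}$, where $S_{22.1} \equiv S_{22} - S_{12}^2/S_{11}$ and $S_{ij}$ is the $(i,j)$ element of the matrix $S = \sum_{i=1}^{N}(\Delta y_{i,1}, \Delta y_{i,2})'(\Delta y_{i,1}, \Delta y_{i,2})$. Under joint normality, $S_{22.1}$ and $S_{11}$ are independent (scaled) $\chi^2(\cdot)$ random variables with $N - 1$ and $N$ degrees of freedom, respectively. The main result follows after observing that

$$\mathrm{E}[\operatorname{vech} S] = \left(\operatorname{var}(\Delta y_{i,1}),\; \phi_0\operatorname{var}(\Delta y_{i,1}) - \sigma^2_0,\; \phi_0^2\operatorname{var}(\Delta y_{i,1}) + 2\sigma^2_0(1 - \phi_0)\right)'.$$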
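For a concrete sample, D can be computed from the entries of S. The sketch below does this for a list of pairs (Δy_{i,1}, Δy_{i,2}). The simulation design (y_{i,t} = φ0 y_{i,t−1} + η_i + ε_{i,t} with a stationary start and standard normal η_i and ε_{i,t}) is a hypothetical choice made only to have data to feed in; the function names are illustrative.

```python
import random

def discriminant(pairs):
    """D = 1 - S_{22.1}/S_{11} from pairs (dy_i1, dy_i2), i = 1, ..., N."""
    S11 = sum(d1 * d1 for d1, _ in pairs)
    S12 = sum(d1 * d2 for d1, d2 in pairs)
    S22 = sum(d2 * d2 for _, d2 in pairs)
    S22_1 = S22 - S12 ** 2 / S11      # S_{22.1} = S_22 - S_12^2 / S_11
    return 1.0 - S22_1 / S11

def simulate_first_differences(N, phi0, seed=0):
    """Hypothetical design: y_it = phi0 y_i,t-1 + eta_i + eps_it, stationary start."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(N):
        eta = rng.gauss(0.0, 1.0)
        y0 = eta / (1 - phi0) + rng.gauss(0.0, (1 / (1 - phi0 ** 2)) ** 0.5)
        y1 = phi0 * y0 + eta + rng.gauss(0.0, 1.0)
        y2 = phi0 * y1 + eta + rng.gauss(0.0, 1.0)
        pairs.append((y1 - y0, y2 - y1))
    return pairs

print(discriminant(simulate_first_differences(N=50, phi0=0.5)))
```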
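Proof of Proposition 3.6. Observe that for any value of φ (see e.g. Chapter 2)

$$\sigma^2_E(x) \equiv \mathrm{E}[\sigma^2(\phi)] = \sigma^2_0 + \frac{1}{T - 1}\left(x^2 a_E - x\frac{2\xi}{T}\sigma^2_0\right),$$

$$\theta^2_E(x) \equiv \mathrm{E}[\theta^2(\phi)] = \theta^2_0 + x^2\bar{a}_E + x\frac{2\xi}{T}\theta^2_0,$$

with $a_E$, $\bar{a}_E$, $\xi$, $x$ defined in (3.28). It is not difficult to see that the asymptotic polynomial is given by

$$\theta^2_E(x)\left(a_E x - \frac{\sigma^2_0\xi}{T}\right) + \sigma^2_E(x)\left(\bar{a}_E x + \frac{\theta^2_0\xi}{T}\right) = 0.$$

Plugging the expressions for $\sigma^2_E(x)$ and $\theta^2_E(x)$ into the previous formula gives

$$x\left(\left[\theta^2_E(x)a_E + \sigma^2_E(x)\bar{a}_E\right] + \frac{x}{T}\xi\left[\frac{\theta^2_0 a_E}{T - 1} - \sigma^2_0\bar{a}_E\right] - \frac{2\xi^2}{T(T - 1)}\theta^2_0\sigma^2_0\right) = 0.$$

Note that

$$\theta^2_E(x)a_E + \sigma^2_E(x)\bar{a}_E = x^2\left(1 + \frac{1}{T - 1}\right)\bar{a}_E a_E + x\frac{2\xi}{T}\left(\theta^2_0 a_E - \frac{1}{T - 1}\sigma^2_0\bar{a}_E\right) + \left(\theta^2_0 a_E + \sigma^2_0\bar{a}_E\right).$$

Combining both expressions and removing the trivial solution $x = 0$, we get

$$x^2\frac{T}{T - 1}\bar{a}_E a_E + x\left(\frac{1}{T}\xi\left(\frac{\theta^2_0 a_E}{T - 1} - \sigma^2_0\bar{a}_E\right) + \frac{2\xi}{T}\left(\theta^2_0 a_E - \frac{1}{T - 1}\sigma^2_0\bar{a}_E\right)\right) + \left(\theta^2_0 a_E + \sigma^2_0\bar{a}_E\right) - \frac{2\xi^2}{T(T - 1)}\theta^2_0\sigma^2_0 = 0.$$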
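The algebra leading to the final quadratic can be checked numerically. The sketch evaluates the asymptotic polynomial and x times the final quadratic at random points and reports the largest discrepancy. It assumes the expressions for E[σ²(φ)] and E[θ²(φ)] given in this proof, with `a_E` the coefficient of x² in (T − 1)E[σ²(φ)] and `abar_E` the one in E[θ²(φ)]; all numerical values are arbitrary, not the moments defined in (3.28).

```python
import random

def sigma2_E(x, T, sigma0_sq, a_E, xi):
    # E[sigma^2(phi)] = sigma0^2 + (x^2 a_E - x (2 xi / T) sigma0^2) / (T - 1)
    return sigma0_sq + (x ** 2 * a_E - x * 2 * xi / T * sigma0_sq) / (T - 1)

def theta2_E(x, T, theta0_sq, abar_E, xi):
    # E[theta^2(phi)] = theta0^2 + x^2 abar_E + x (2 xi / T) theta0^2
    return theta0_sq + x ** 2 * abar_E + x * 2 * xi / T * theta0_sq

def asymptotic_foc(x, T, sigma0_sq, theta0_sq, a_E, abar_E, xi):
    """Left-hand side of the asymptotic first-order polynomial."""
    return (theta2_E(x, T, theta0_sq, abar_E, xi) * (a_E * x - sigma0_sq * xi / T)
            + sigma2_E(x, T, sigma0_sq, a_E, xi) * (abar_E * x + theta0_sq * xi / T))

def reduced_quadratic(x, T, sigma0_sq, theta0_sq, a_E, abar_E, xi):
    """Quadratic that remains after removing the trivial root x = 0."""
    linear = (xi / T * (theta0_sq * a_E / (T - 1) - sigma0_sq * abar_E)
              + 2 * xi / T * (theta0_sq * a_E - sigma0_sq * abar_E / (T - 1)))
    const = (theta0_sq * a_E + sigma0_sq * abar_E
             - 2 * xi ** 2 / (T * (T - 1)) * theta0_sq * sigma0_sq)
    return x ** 2 * T / (T - 1) * abar_E * a_E + x * linear + const

rng = random.Random(0)
worst = 0.0
for _ in range(1000):
    T = rng.randint(2, 10)
    sigma0_sq, theta0_sq, a_E, abar_E, xi = (rng.uniform(0.1, 3.0) for _ in range(5))
    x = rng.uniform(-2.0, 2.0)
    args = (T, sigma0_sq, theta0_sq, a_E, abar_E, xi)
    diff = asymptotic_foc(x, *args) - x * reduced_quadratic(x, *args)
    worst = max(worst, abs(diff))
print(f"largest discrepancy: {worst:.1e}")
```

A discrepancy at the level of floating-point rounding indicates that the asymptotic polynomial factors as x times the quadratic, so x = 0 (that is, φ = φ0) is always a root.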
3.B Tables
Table 3.3: Estimation Results for N = 50, T = 3.

      φ = 0.5, γ = 0.5, σµ = 1    φ = 0.5, γ = 0.5, σµ = 3    φ = 0.5, γ = 1.0, σµ = 1    φ = 0.5, γ = 1.0, σµ = 3
      Mean   Median IQR    RMSE   Mean   Median IQR    RMSE   Mean   Median IQR    RMSE   Mean   Median IQR    RMSE
TMLg  0.72   0.68   0.62   0.40   0.82   0.66   0.83   0.53   0.69   0.66   0.57   0.38   0.69   0.66   0.57   0.38
RMLg  0.57   0.52   0.33   0.26   0.75   0.58   0.77   0.47   0.55   0.51   0.32   0.25   0.64   0.58   0.51   0.34
TMLl  0.54   0.50   0.32   0.23   0.52   0.49   0.20   0.18   0.54   0.50   0.34   0.24   0.54   0.50   0.34   0.24
RMLl  0.53   0.50   0.28   0.22   0.53   0.50   0.20   0.19   0.53   0.49   0.29   0.22   0.54   0.49   0.33   0.25
TMLb  0.51   0.50   0.30   0.19   0.52   0.49   0.20   0.16   0.50   0.50   0.30   0.19   0.50   0.50   0.30   0.19
RMLb  0.49   0.49   0.24   0.16   0.52   0.50   0.20   0.16   0.47   0.48   0.22   0.15   0.50   0.49   0.28   0.18

      φ = 0.8, γ = 0.5, σµ = 1    φ = 0.8, γ = 0.5, σµ = 3    φ = 0.8, γ = 1.0, σµ = 1    φ = 0.8, γ = 1.0, σµ = 3
      Mean   Median IQR    RMSE   Mean   Median IQR    RMSE   Mean   Median IQR    RMSE   Mean   Median IQR    RMSE
TMLg  0.90   0.93   0.36   0.28   0.95   0.97   0.41   0.31   0.90   0.92   0.36   0.28   0.90   0.92   0.36   0.28
RMLg  0.83   0.82   0.36   0.24   0.95   0.96   0.44   0.31   0.82   0.82   0.36   0.24   0.86   0.87   0.40   0.27
TMLl  0.76   0.77   0.34   0.21   0.78   0.78   0.34   0.20   0.75   0.77   0.34   0.21   0.75   0.77   0.34   0.21
RMLl  0.78   0.77   0.33   0.22   0.79   0.78   0.35   0.22   0.77   0.77   0.33   0.22   0.77   0.77   0.34   0.23
TMLb  0.73   0.76   0.29   0.19   0.77   0.78   0.31   0.19   0.73   0.75   0.29   0.19   0.73   0.75   0.29   0.19
RMLb  0.71   0.73   0.22   0.17   0.76   0.78   0.30   0.18   0.70   0.73   0.22   0.18   0.72   0.74   0.26   0.19
Table 3.4: Estimation Results for N = 50, T = 7.

      φ = 0.5, γ = 0.5, σµ = 1    φ = 0.5, γ = 0.5, σµ = 3    φ = 0.5, γ = 1.0, σµ = 1    φ = 0.5, γ = 1.0, σµ = 3
      Mean   Median IQR    RMSE   Mean   Median IQR    RMSE   Mean   Median IQR    RMSE   Mean   Median IQR    RMSE
TMLg  0.51   0.49   0.10   0.12   0.52   0.49   0.08   0.15   0.51   0.49   0.10   0.11   0.51   0.49   0.10   0.11
RMLg  0.50   0.49   0.09   0.08   0.52   0.50   0.08   0.13   0.50   0.49   0.10   0.08   0.50   0.49   0.10   0.10
TMLl  0.50   0.49   0.10   0.07   0.49   0.49   0.08   0.06   0.50   0.49   0.10   0.07   0.50   0.49   0.10   0.07
RMLl  0.50   0.49   0.09   0.07   0.49   0.49   0.08   0.06   0.50   0.49   0.10   0.07   0.50   0.49   0.10   0.07
TMLb  0.50   0.49   0.10   0.07   0.49   0.49   0.08   0.06   0.50   0.49   0.10   0.07   0.50   0.49   0.10   0.07
RMLb  0.49   0.49   0.09   0.07   0.49   0.49   0.08   0.06   0.49   0.49   0.10   0.07   0.50   0.49   0.10   0.07

      φ = 0.8, γ = 0.5, σµ = 1    φ = 0.8, γ = 0.5, σµ = 3    φ = 0.8, γ = 1.0, σµ = 1    φ = 0.8, γ = 1.0, σµ = 3
      Mean   Median IQR    RMSE   Mean   Median IQR    RMSE   Mean   Median IQR    RMSE   Mean   Median IQR    RMSE
TMLg  0.87   0.86   0.25   0.16   0.91   0.90   0.30   0.19   0.87   0.85   0.24   0.16   0.87   0.85   0.24   0.16
RMLg  0.81   0.80   0.14   0.11   0.91   0.89   0.30   0.19   0.81   0.80   0.14   0.11   0.84   0.82   0.20   0.14
TMLl  0.81   0.79   0.15   0.11   0.81   0.79   0.13   0.10   0.81   0.79   0.16   0.11   0.81   0.79   0.16   0.11
RMLl  0.80   0.79   0.13   0.10   0.81   0.79   0.13   0.10   0.80   0.79   0.14   0.10   0.81   0.79   0.15   0.11
TMLb  0.79   0.79   0.14   0.08   0.80   0.79   0.13   0.09   0.79   0.79   0.13   0.08   0.79   0.79   0.13   0.08
RMLb  0.77   0.78   0.09   0.07   0.80   0.79   0.13   0.09   0.77   0.78   0.09   0.07   0.78   0.79   0.12   0.08
Table 3.5: Estimation Results for N = 250, T = 3.

      φ = 0.5, γ = 0.5, σµ = 1    φ = 0.5, γ = 0.5, σµ = 3    φ = 0.5, γ = 1.0, σµ = 1    φ = 0.5, γ = 1.0, σµ = 3
      Mean   Median IQR    RMSE   Mean   Median IQR    RMSE   Mean   Median IQR    RMSE   Mean   Median IQR    RMSE
TMLg  0.63   0.53   0.28   0.29   0.71   0.52   0.75   0.41   0.61   0.52   0.25   0.26   0.61   0.52   0.25   0.26
RMLg  0.51   0.49   0.12   0.11   0.58   0.50   0.10   0.26   0.51   0.49   0.13   0.11   0.56   0.50   0.16   0.20
TMLl  0.51   0.50   0.13   0.12   0.50   0.49   0.08   0.07   0.52   0.49   0.14   0.14   0.52   0.49   0.14   0.14
RMLl  0.51   0.49   0.12   0.10   0.50   0.49   0.08   0.07   0.51   0.49   0.12   0.10   0.52   0.49   0.14   0.13
TMLb  0.51   0.50   0.13   0.10   0.50   0.49   0.08   0.06   0.51   0.49   0.14   0.11   0.51   0.49   0.14   0.11
RMLb  0.50   0.49   0.12   0.09   0.50   0.49   0.08   0.06   0.50   0.49   0.12   0.08   0.51   0.49   0.14   0.10

      φ = 0.8, γ = 0.5, σµ = 1    φ = 0.8, γ = 0.5, σµ = 3    φ = 0.8, γ = 1.0, σµ = 1    φ = 0.8, γ = 1.0, σµ = 3
      Mean   Median IQR    RMSE   Mean   Median IQR    RMSE   Mean   Median IQR    RMSE   Mean   Median IQR    RMSE
TMLg  0.89   0.90   0.27   0.20   0.95   0.96   0.34   0.25   0.88   0.90   0.27   0.20   0.88   0.90   0.27   0.20
RMLg  0.82   0.80   0.20   0.14   0.94   0.94   0.35   0.24   0.81   0.80   0.20   0.14   0.85   0.83   0.26   0.18
TMLl  0.80   0.79   0.23   0.14   0.81   0.79   0.23   0.13   0.79   0.79   0.23   0.14   0.79   0.79   0.23   0.14
RMLl  0.80   0.79   0.19   0.13   0.82   0.79   0.23   0.14   0.80   0.79   0.20   0.13   0.80   0.79   0.23   0.15
TMLb  0.78   0.79   0.20   0.12   0.80   0.79   0.21   0.12   0.78   0.79   0.20   0.12   0.78   0.79   0.20   0.12
RMLb  0.76   0.78   0.12   0.09   0.80   0.79   0.21   0.12   0.76   0.78   0.12   0.09   0.77   0.79   0.17   0.11
Table 3.6: Estimation Results for N = 250, T = 7.

      φ = 0.5, γ = 0.5, σµ = 1    φ = 0.5, γ = 0.5, σµ = 3    φ = 0.5, γ = 1.0, σµ = 1    φ = 0.5, γ = 1.0, σµ = 3
      Mean   Median IQR    RMSE   Mean   Median IQR    RMSE   Mean   Median IQR    RMSE   Mean   Median IQR    RMSE
TMLg  0.49   0.49   0.04   0.03   0.49   0.49   0.03   0.02   0.49   0.49   0.04   0.03   0.49   0.49   0.04   0.03
RMLg  0.49   0.49   0.04   0.03   0.49   0.49   0.03   0.02   0.49   0.49   0.04   0.03   0.49   0.50   0.04   0.03
TMLl  0.49   0.49   0.04   0.03   0.49   0.49   0.03   0.02   0.49   0.49   0.04   0.03   0.49   0.49   0.04   0.03
RMLl  0.49   0.49   0.04   0.03   0.49   0.49   0.03   0.02   0.49   0.49   0.04   0.03   0.49   0.50   0.04   0.03
TMLb  0.49   0.49   0.04   0.03   0.49   0.49   0.03   0.02   0.49   0.49   0.04   0.03   0.49   0.49   0.04   0.03
RMLb  0.49   0.49   0.04   0.03   0.49   0.49   0.03   0.02   0.49   0.49   0.04   0.03   0.49   0.50   0.04   0.03

      φ = 0.8, γ = 0.5, σµ = 1    φ = 0.8, γ = 0.5, σµ = 3    φ = 0.8, γ = 1.0, σµ = 1    φ = 0.8, γ = 1.0, σµ = 3
      Mean   Median IQR    RMSE   Mean   Median IQR    RMSE   Mean   Median IQR    RMSE   Mean   Median IQR    RMSE
TMLg  0.84   0.80   0.10   0.11   0.87   0.81   0.26   0.15   0.83   0.80   0.10   0.10   0.83   0.80   0.10   0.10
RMLg  0.80   0.79   0.06   0.05   0.87   0.81   0.24   0.14   0.80   0.79   0.06   0.05   0.81   0.80   0.07   0.07
TMLl  0.80   0.79   0.06   0.06   0.80   0.79   0.05   0.04   0.80   0.79   0.06   0.06   0.80   0.79   0.06   0.06
RMLl  0.80   0.79   0.06   0.04   0.80   0.79   0.05   0.05   0.80   0.79   0.06   0.04   0.80   0.79   0.06   0.06
TMLb  0.80   0.79   0.06   0.05   0.80   0.79   0.05   0.04   0.80   0.79   0.06   0.05   0.80   0.79   0.06   0.05
RMLb  0.79   0.79   0.05   0.03   0.80   0.79   0.05   0.04   0.79   0.79   0.05   0.03   0.80   0.79   0.06   0.04
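The four statistics in Tables 3.3 to 3.6 (mean, median, interquartile range and RMSE of the estimates across Monte Carlo replications) can be computed as sketched below. The function name is hypothetical, and the quartile convention (`statistics.quantiles` with `method="inclusive"`) is an assumption, since the tables do not state which one was used.

```python
import math
import statistics

def mc_summary(estimates, true_value):
    """Mean, median, interquartile range and RMSE of Monte Carlo estimates."""
    q1, _, q3 = statistics.quantiles(estimates, n=4, method="inclusive")
    squared_errors = [(e - true_value) ** 2 for e in estimates]
    return {
        "Mean": statistics.fmean(estimates),
        "Median": statistics.median(estimates),
        "IQR": q3 - q1,
        "RMSE": math.sqrt(statistics.fmean(squared_errors)),
    }

# five hypothetical estimates of phi with true value 0.5
print(mc_summary([0.4, 0.5, 0.6, 0.7, 0.8], true_value=0.5))
```

The RMSE combines bias and dispersion, which is why, for example, TMLg can have a median close to the true value while its RMSE stays large.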
Table 3.7: t-test results for N = 50, T = 3.

φ = 0.5   γ = 0.5, σµ = 1            γ = 0.5, σµ = 3            γ = 1.0, σµ = 1            γ = 1.0, σµ = 3
φ − φ0    -.2  -.1  .0   .1   .2     -.2  -.1  .0   .1   .2     -.2  -.1  .0   .1   .2     -.2  -.1  .0   .1   .2
TMLg      0.47 0.44 0.44 0.46 0.46   0.49 0.44 0.46 0.54 0.63   0.46 0.43 0.42 0.42 0.43   0.46 0.43 0.42 0.42 0.43
RMLg      0.32 0.25 0.23 0.28 0.36   0.45 0.36 0.38 0.46 0.58   0.31 0.24 0.22 0.26 0.34   0.40 0.36 0.35 0.36 0.40
TMLl      0.23 0.20 0.22 0.26 0.32   0.21 0.08 0.10 0.22 0.38   0.25 0.22 0.23 0.26 0.32   0.25 0.22 0.23 0.26 0.32
RMLl      0.26 0.19 0.18 0.23 0.33   0.23 0.10 0.11 0.22 0.39   0.27 0.20 0.19 0.23 0.32   0.26 0.22 0.22 0.26 0.33
TMLb      0.17 0.13 0.14 0.20 0.28   0.20 0.07 0.09 0.20 0.37   0.17 0.14 0.14 0.19 0.27   0.17 0.14 0.14 0.19 0.27
RMLb      0.14 0.07 0.07 0.16 0.28   0.21 0.07 0.08 0.19 0.36   0.13 0.06 0.07 0.16 0.29   0.15 0.10 0.11 0.16 0.27

φ = 0.8   γ = 0.5, σµ = 1            γ = 0.5, σµ = 3            γ = 1.0, σµ = 1            γ = 1.0, σµ = 3
φ − φ0    -.2  -.1  .0   .1   .2     -.2  -.1  .0   .1   .2     -.2  -.1  .0   .1   .2     -.2  -.1  .0   .1   .2
TMLg      0.56 0.47 0.36 0.28 0.27   0.58 0.52 0.44 0.33 0.30   0.56 0.47 0.35 0.27 0.27   0.56 0.47 0.35 0.27 0.27
RMLg      0.40 0.31 0.26 0.27 0.33   0.55 0.48 0.40 0.33 0.32   0.39 0.31 0.26 0.27 0.33   0.46 0.38 0.31 0.28 0.31
TMLl      0.34 0.28 0.22 0.21 0.29   0.33 0.29 0.25 0.22 0.28   0.34 0.28 0.21 0.21 0.29   0.34 0.28 0.21 0.21 0.29
RMLl      0.32 0.25 0.21 0.25 0.33   0.32 0.27 0.22 0.22 0.29   0.32 0.24 0.21 0.24 0.33   0.33 0.25 0.21 0.23 0.32
TMLb      0.29 0.24 0.20 0.21 0.32   0.30 0.27 0.24 0.21 0.29   0.29 0.23 0.20 0.21 0.33   0.29 0.23 0.20 0.21 0.33
RMLb      0.10 0.07 0.10 0.19 0.33   0.23 0.18 0.15 0.19 0.28   0.10 0.07 0.11 0.20 0.34   0.17 0.11 0.13 0.19 0.32
Table 3.8: t-test results for N = 50, T = 7.

φ = 0.5   γ = 0.5, σµ = 1            γ = 0.5, σµ = 3            γ = 1.0, σµ = 1            γ = 1.0, σµ = 3
φ − φ0    -.2  -.1  .0   .1   .2     -.2  -.1  .0   .1   .2     -.2  -.1  .0   .1   .2     -.2  -.1  .0   .1   .2
TMLg      0.79 0.27 0.08 0.35 0.77   0.90 0.38 0.11 0.44 0.88   0.77 0.26 0.08 0.34 0.76   0.77 0.26 0.08 0.34 0.76
RMLg      0.79 0.27 0.06 0.33 0.76   0.90 0.38 0.10 0.44 0.88   0.77 0.26 0.06 0.32 0.75   0.77 0.26 0.07 0.33 0.75
TMLl      0.79 0.26 0.05 0.32 0.76   0.90 0.37 0.06 0.40 0.87   0.77 0.25 0.05 0.31 0.74   0.77 0.25 0.05 0.31 0.74
RMLl      0.79 0.26 0.06 0.32 0.76   0.90 0.38 0.06 0.40 0.87   0.77 0.25 0.06 0.32 0.74   0.77 0.25 0.05 0.31 0.74
TMLb      0.79 0.26 0.05 0.32 0.76   0.90 0.37 0.06 0.40 0.87   0.77 0.25 0.05 0.31 0.74   0.77 0.25 0.05 0.31 0.74
RMLb      0.79 0.26 0.06 0.32 0.76   0.90 0.38 0.06 0.40 0.87   0.77 0.25 0.05 0.32 0.75   0.77 0.24 0.05 0.31 0.74

φ = 0.8   γ = 0.5, σµ = 1            γ = 0.5, σµ = 3            γ = 1.0, σµ = 1            γ = 1.0, σµ = 3
φ − φ0    -.2  -.1  .0   .1   .2     -.2  -.1  .0   .1   .2     -.2  -.1  .0   .1   .2     -.2  -.1  .0   .1   .2
TMLg      0.64 0.47 0.42 0.43 0.51   0.72 0.50 0.49 0.54 0.57   0.64 0.46 0.41 0.42 0.51   0.64 0.46 0.41 0.42 0.51
RMLg      0.61 0.32 0.21 0.34 0.63   0.72 0.50 0.48 0.54 0.59   0.60 0.32 0.21 0.34 0.62   0.62 0.39 0.32 0.38 0.55
TMLl      0.51 0.27 0.24 0.32 0.55   0.59 0.21 0.19 0.35 0.58   0.50 0.27 0.24 0.31 0.56   0.50 0.27 0.24 0.31 0.56
RMLl      0.58 0.29 0.18 0.33 0.63   0.59 0.21 0.19 0.35 0.58   0.57 0.29 0.18 0.33 0.63   0.52 0.28 0.22 0.32 0.58
TMLb      0.44 0.18 0.13 0.27 0.56   0.56 0.17 0.14 0.31 0.58   0.43 0.17 0.12 0.27 0.57   0.43 0.17 0.12 0.27 0.57
RMLb      0.46 0.12 0.07 0.30 0.68   0.54 0.15 0.12 0.30 0.58   0.45 0.11 0.06 0.31 0.69   0.42 0.14 0.09 0.27 0.59
Table 3.9: t-test results for N = 250, T = 3.

φ = 0.5   γ = 0.5, σµ = 1            γ = 0.5, σµ = 3            γ = 1.0, σµ = 1            γ = 1.0, σµ = 3
φ − φ0    -.2  -.1  .0   .1   .2     -.2  -.1  .0   .1   .2     -.2  -.1  .0   .1   .2     -.2  -.1  .0   .1   .2
TMLg      0.60 0.28 0.28 0.43 0.64   0.91 0.44 0.31 0.56 0.86   0.53 0.27 0.27 0.40 0.59   0.53 0.27 0.27 0.40 0.59
RMLg      0.57 0.15 0.08 0.26 0.59   0.91 0.36 0.15 0.44 0.83   0.53 0.15 0.08 0.25 0.56   0.50 0.20 0.18 0.33 0.56
TMLl      0.49 0.09 0.09 0.28 0.54   0.90 0.29 0.05 0.36 0.79   0.40 0.11 0.11 0.27 0.51   0.40 0.11 0.11 0.27 0.51
RMLl      0.56 0.14 0.08 0.26 0.58   0.90 0.29 0.05 0.36 0.79   0.52 0.15 0.08 0.25 0.56   0.43 0.12 0.11 0.27 0.52
TMLb      0.47 0.07 0.07 0.25 0.52   0.90 0.29 0.05 0.36 0.79   0.38 0.08 0.08 0.23 0.49   0.38 0.08 0.08 0.23 0.49
RMLb      0.55 0.12 0.06 0.25 0.58   0.90 0.29 0.05 0.36 0.79   0.50 0.12 0.05 0.24 0.56   0.40 0.08 0.06 0.23 0.50

φ = 0.8   γ = 0.5, σµ = 1            γ = 0.5, σµ = 3            γ = 1.0, σµ = 1            γ = 1.0, σµ = 3
φ − φ0    -.2  -.1  .0   .1   .2     -.2  -.1  .0   .1   .2     -.2  -.1  .0   .1   .2     -.2  -.1  .0   .1   .2
TMLg      0.59 0.53 0.43 0.29 0.35   0.59 0.55 0.51 0.43 0.36   0.58 0.51 0.41 0.28 0.36   0.58 0.51 0.41 0.28 0.36
RMLg      0.45 0.28 0.20 0.26 0.46   0.57 0.52 0.47 0.40 0.38   0.44 0.28 0.19 0.26 0.46   0.50 0.39 0.29 0.29 0.43
TMLl      0.38 0.33 0.27 0.23 0.40   0.29 0.26 0.27 0.29 0.37   0.38 0.33 0.26 0.23 0.41   0.38 0.33 0.26 0.23 0.41
RMLl      0.42 0.25 0.18 0.25 0.46   0.31 0.27 0.25 0.26 0.38   0.42 0.26 0.17 0.25 0.46   0.40 0.30 0.22 0.26 0.44
TMLb      0.33 0.28 0.23 0.22 0.45   0.27 0.25 0.26 0.29 0.39   0.33 0.27 0.21 0.22 0.46   0.33 0.27 0.21 0.22 0.46
RMLb      0.25 0.06 0.07 0.22 0.50   0.26 0.21 0.20 0.24 0.38   0.24 0.05 0.07 0.23 0.51   0.27 0.14 0.09 0.21 0.45
Table 3.10: t-test results for N = 250, T = 7.

φ = 0.5   γ = 0.5, σµ = 1            γ = 0.5, σµ = 3            γ = 1.0, σµ = 1            γ = 1.0, σµ = 3
φ − φ0    -.2  -.1  .0   .1   .2     -.2  -.1  .0   .1   .2     -.2  -.1  .0   .1   .2     -.2  -.1  .0   .1   .2
TMLg      0.99 0.86 0.05 0.84 0.99   1.00 0.94 0.05 0.94 1.00   0.99 0.85 0.05 0.82 0.99   0.99 0.85 0.05 0.82 0.99
RMLg      0.99 0.87 0.05 0.85 0.99   1.00 0.94 0.05 0.94 1.00   0.99 0.85 0.05 0.83 0.99   0.99 0.85 0.05 0.82 0.99
TMLl      0.99 0.86 0.05 0.84 0.99   1.00 0.94 0.05 0.94 1.00   0.99 0.85 0.05 0.82 0.99   0.99 0.85 0.05 0.82 0.99
RMLl      0.99 0.87 0.05 0.85 0.99   1.00 0.94 0.05 0.94 1.00   0.99 0.85 0.05 0.83 0.99   0.99 0.85 0.05 0.82 0.99
TMLb      0.99 0.86 0.05 0.84 0.99   1.00 0.94 0.05 0.94 1.00   0.99 0.85 0.05 0.82 0.99   0.99 0.85 0.05 0.82 0.99
RMLb      0.99 0.87 0.05 0.85 0.99   1.00 0.94 0.05 0.94 1.00   0.99 0.85 0.05 0.83 0.99   0.99 0.85 0.05 0.82 0.99

φ = 0.8   γ = 0.5, σµ = 1            γ = 0.5, σµ = 3            γ = 1.0, σµ = 1            γ = 1.0, σµ = 3
φ − φ0    -.2  -.1  .0   .1   .2     -.2  -.1  .0   .1   .2     -.2  -.1  .0   .1   .2     -.2  -.1  .0   .1   .2
TMLg      0.99 0.57 0.24 0.62 0.80   0.99 0.75 0.32 0.75 0.90   0.99 0.54 0.24 0.60 0.80   0.99 0.54 0.24 0.60 0.80
RMLg      0.99 0.58 0.08 0.61 0.92   0.99 0.74 0.30 0.75 0.91   0.98 0.56 0.08 0.60 0.92   0.99 0.52 0.14 0.58 0.85
TMLl      0.98 0.46 0.11 0.55 0.87   0.99 0.68 0.06 0.64 0.93   0.98 0.44 0.11 0.53 0.86   0.98 0.44 0.11 0.53 0.86
RMLl      0.98 0.57 0.07 0.61 0.92   0.99 0.68 0.06 0.64 0.93   0.98 0.56 0.08 0.60 0.92   0.98 0.48 0.10 0.55 0.87
TMLb      0.96 0.43 0.06 0.52 0.87   0.99 0.68 0.05 0.63 0.93   0.95 0.40 0.07 0.51 0.87   0.95 0.40 0.07 0.51 0.87
RMLb      0.98 0.56 0.04 0.62 0.96   0.99 0.67 0.05 0.63 0.93   0.98 0.54 0.04 0.63 0.96   0.95 0.44 0.06 0.53 0.88
Table 3.11: LR test results for N = 50, T = 3.

φ = 0.5   γ = 0.5, σν = 1            γ = 0.5, σν = 3            γ = 1.0, σν = 1            γ = 1.0, σν = 3
φ − φ0    -.2  -.1  .0   .1   .2     -.2  -.1  .0   .1   .2     -.2  -.1  .0   .1   .2     -.2  -.1  .0   .1   .2
TMLg      0.25 0.09 0.05 0.09 0.13   0.42 0.17 0.08 0.13 0.24   0.22 0.08 0.05 0.08 0.12   0.22 0.08 0.05 0.08 0.12
RMLg      0.25 0.10 0.07 0.09 0.14   0.42 0.18 0.10 0.14 0.25   0.23 0.10 0.06 0.08 0.13   0.24 0.10 0.06 0.09 0.12
TMLl      0.22 0.07 0.03 0.06 0.09   0.36 0.12 0.04 0.08 0.19   0.20 0.06 0.03 0.05 0.09   0.20 0.06 0.03 0.05 0.09
RMLl      0.24 0.10 0.06 0.08 0.12   0.36 0.13 0.05 0.09 0.19   0.22 0.09 0.06 0.07 0.13   0.21 0.08 0.04 0.06 0.10
TMLb      0.20 0.06 0.03 0.06 0.09   0.36 0.12 0.04 0.08 0.19   0.18 0.05 0.03 0.05 0.09   0.18 0.05 0.03 0.05 0.09
RMLb      0.21 0.06 0.03 0.06 0.12   0.36 0.12 0.04 0.09 0.19   0.18 0.05 0.03 0.06 0.12   0.18 0.05 0.03 0.05 0.09

φ = 0.8   γ = 0.5, σν = 1            γ = 0.5, σν = 3            γ = 1.0, σν = 1            γ = 1.0, σν = 3
φ − φ0    -.2  -.1  .0   .1   .2     -.2  -.1  .0   .1   .2     -.2  -.1  .0   .1   .2     -.2  -.1  .0   .1   .2
TMLg      0.05 0.02 0.03 0.04 0.04   0.09 0.03 0.03 0.05 0.05   0.04 0.02 0.04 0.04 0.04   0.04 0.02 0.04 0.04 0.04
RMLg      0.12 0.06 0.06 0.06 0.08   0.11 0.04 0.04 0.06 0.06   0.11 0.06 0.06 0.06 0.08   0.08 0.04 0.05 0.06 0.05
TMLl      0.04 0.01 0.02 0.03 0.03   0.09 0.02 0.02 0.03 0.04   0.04 0.01 0.02 0.03 0.02   0.04 0.01 0.02 0.03 0.02
RMLl      0.11 0.05 0.04 0.05 0.07   0.10 0.02 0.02 0.04 0.04   0.10 0.05 0.04 0.05 0.07   0.06 0.02 0.03 0.04 0.04
TMLb      0.04 0.01 0.02 0.03 0.03   0.08 0.02 0.02 0.03 0.04   0.04 0.01 0.02 0.03 0.02   0.04 0.01 0.02 0.03 0.02
RMLb      0.05 0.01 0.02 0.04 0.07   0.09 0.02 0.02 0.04 0.04   0.05 0.01 0.02 0.04 0.07   0.04 0.01 0.02 0.03 0.04
Table 3.12: LR test results for N = 50, T = 7.

φ = 0.5   γ = 0.5, σν = 1            γ = 0.5, σν = 3            γ = 1.0, σν = 1            γ = 1.0, σν = 3
φ − φ0    -.2  -.1  .0   .1   .2     -.2  -.1  .0   .1   .2     -.2  -.1  .0   .1   .2     -.2  -.1  .0   .1   .2
TMLg      0.82 0.30 0.06 0.25 0.65   0.91 0.39 0.06 0.34 0.83   0.80 0.28 0.05 0.24 0.63   0.80 0.28 0.05 0.24 0.63
RMLg      0.82 0.30 0.04 0.25 0.67   0.91 0.39 0.06 0.34 0.82   0.80 0.28 0.04 0.24 0.66   0.80 0.28 0.05 0.24 0.63
TMLl      0.82 0.29 0.04 0.24 0.64   0.91 0.38 0.05 0.32 0.81   0.80 0.28 0.04 0.23 0.62   0.80 0.28 0.04 0.23 0.62
RMLl      0.82 0.30 0.04 0.24 0.66   0.91 0.38 0.05 0.32 0.81   0.80 0.28 0.04 0.24 0.66   0.80 0.28 0.04 0.23 0.63
TMLb      0.82 0.29 0.04 0.24 0.64   0.91 0.38 0.05 0.32 0.81   0.80 0.28 0.04 0.23 0.62   0.80 0.28 0.04 0.23 0.62
RMLb      0.82 0.30 0.04 0.24 0.66   0.91 0.38 0.05 0.32 0.81   0.80 0.28 0.04 0.24 0.66   0.80 0.28 0.04 0.23 0.63

φ = 0.8   γ = 0.5, σν = 1            γ = 0.5, σν = 3            γ = 1.0, σν = 1            γ = 1.0, σν = 3
φ − φ0    -.2  -.1  .0   .1   .2     -.2  -.1  .0   .1   .2     -.2  -.1  .0   .1   .2     -.2  -.1  .0   .1   .2
TMLg      0.70 0.24 0.06 0.12 0.12   0.79 0.30 0.06 0.16 0.19   0.69 0.22 0.06 0.12 0.12   0.69 0.22 0.06 0.12 0.12
RMLg      0.70 0.24 0.07 0.15 0.42   0.79 0.32 0.08 0.18 0.21   0.69 0.23 0.06 0.14 0.43   0.69 0.23 0.07 0.13 0.20
TMLl      0.66 0.21 0.04 0.09 0.10   0.76 0.25 0.04 0.12 0.15   0.65 0.20 0.04 0.09 0.10   0.65 0.20 0.04 0.09 0.10
RMLl      0.69 0.23 0.06 0.14 0.42   0.76 0.25 0.05 0.12 0.16   0.68 0.22 0.06 0.14 0.43   0.66 0.21 0.05 0.11 0.19
TMLb      0.66 0.18 0.03 0.09 0.10   0.76 0.24 0.04 0.12 0.15   0.65 0.17 0.03 0.09 0.10   0.65 0.17 0.03 0.09 0.10
RMLb      0.69 0.18 0.03 0.14 0.43   0.76 0.24 0.04 0.12 0.16   0.68 0.17 0.02 0.13 0.43   0.66 0.18 0.03 0.10 0.19
Table 3.13: LR test results for N = 250, T = 3.

φ = 0.5   γ = 0.5, σν = 1            γ = 0.5, σν = 3            γ = 1.0, σν = 1            γ = 1.0, σν = 3
φ − φ0    -.2  -.1  .0   .1   .2     -.2  -.1  .0   .1   .2     -.2  -.1  .0   .1   .2     -.2  -.1  .0   .1   .2
TMLg      0.73 0.26 0.09 0.18 0.36   0.93 0.43 0.09 0.33 0.73   0.68 0.23 0.08 0.16 0.31   0.68 0.23 0.08 0.16 0.31
RMLg      0.73 0.23 0.05 0.17 0.42   0.93 0.40 0.08 0.31 0.72   0.70 0.21 0.05 0.16 0.41   0.67 0.22 0.07 0.15 0.31
TMLl      0.70 0.21 0.06 0.14 0.32   0.93 0.37 0.05 0.28 0.69   0.65 0.19 0.06 0.13 0.27   0.65 0.19 0.06 0.13 0.27
RMLl      0.73 0.23 0.05 0.16 0.42   0.93 0.38 0.05 0.28 0.70   0.69 0.21 0.05 0.16 0.41   0.66 0.20 0.06 0.14 0.29
TMLb      0.70 0.21 0.05 0.14 0.32   0.93 0.37 0.05 0.28 0.69   0.65 0.19 0.05 0.13 0.27   0.65 0.19 0.05 0.13 0.27
RMLb      0.73 0.23 0.05 0.16 0.42   0.93 0.38 0.05 0.28 0.69   0.69 0.20 0.04 0.15 0.41   0.66 0.19 0.05 0.13 0.29

φ = 0.8   γ = 0.5, σν = 1            γ = 0.5, σν = 3            γ = 1.0, σν = 1            γ = 1.0, σν = 3
φ − φ0    -.2  -.1  .0   .1   .2     -.2  -.1  .0   .1   .2     -.2  -.1  .0   .1   .2     -.2  -.1  .0   .1   .2
TMLg      0.33 0.04 0.03 0.06 0.05   0.47 0.10 0.04 0.08 0.09   0.31 0.03 0.04 0.06 0.04   0.31 0.03 0.04 0.06 0.04
RMLg      0.43 0.12 0.05 0.09 0.23   0.49 0.11 0.04 0.09 0.09   0.41 0.12 0.05 0.09 0.23   0.37 0.09 0.05 0.07 0.09
TMLl      0.32 0.04 0.02 0.04 0.04   0.45 0.09 0.02 0.06 0.07   0.30 0.03 0.02 0.04 0.03   0.30 0.03 0.02 0.04 0.03
RMLl      0.42 0.12 0.05 0.09 0.23   0.46 0.10 0.03 0.06 0.08   0.41 0.11 0.05 0.09 0.23   0.35 0.07 0.04 0.06 0.08
TMLb      0.31 0.04 0.02 0.04 0.04   0.45 0.09 0.02 0.06 0.07   0.29 0.03 0.02 0.04 0.03   0.29 0.03 0.02 0.04 0.03
RMLb      0.39 0.06 0.02 0.08 0.23   0.45 0.09 0.03 0.06 0.08   0.37 0.05 0.02 0.08 0.24   0.32 0.04 0.02 0.05 0.08
Table 3.14: LR test results for N = 250, T = 7.

φ = 0.5   γ = 0.5, σν = 1            γ = 0.5, σν = 3            γ = 1.0, σν = 1            γ = 1.0, σν = 3
φ − φ0    -.2  -.1  .0   .1   .2     -.2  -.1  .0   .1   .2     -.2  -.1  .0   .1   .2     -.2  -.1  .0   .1   .2
TMLg      0.99 0.87 0.04 0.82 0.99   0.99 0.95 0.05 0.93 1.00   0.99 0.86 0.05 0.80 0.99   0.99 0.86 0.05 0.80 0.99
RMLg      0.99 0.88 0.04 0.83 0.99   0.99 0.95 0.04 0.93 1.00   0.99 0.86 0.04 0.81 0.99   0.99 0.86 0.05 0.80 0.99
TMLl      0.99 0.87 0.04 0.82 0.99   0.99 0.95 0.05 0.93 1.00   0.99 0.86 0.05 0.80 0.99   0.99 0.86 0.05 0.80 0.99
RMLl      0.99 0.88 0.04 0.83 0.99   0.99 0.95 0.04 0.93 1.00   0.99 0.86 0.04 0.81 0.99   0.99 0.86 0.05 0.80 0.99
TMLb      0.99 0.87 0.04 0.82 0.99   0.99 0.95 0.05 0.93 1.00   0.99 0.86 0.05 0.80 0.99   0.99 0.86 0.05 0.80 0.99
RMLb      0.99 0.88 0.04 0.83 0.99   0.99 0.95 0.04 0.93 1.00   0.99 0.86 0.04 0.81 0.99   0.99 0.86 0.05 0.80 0.99

φ = 0.8   γ = 0.5, σν = 1            γ = 0.5, σν = 3            γ = 1.0, σν = 1            γ = 1.0, σν = 3
φ − φ0    -.2  -.1  .0   .1   .2     -.2  -.1  .0   .1   .2     -.2  -.1  .0   .1   .2     -.2  -.1  .0   .1   .2
TMLg      0.99 0.69 0.08 0.33 0.42   0.99 0.80 0.09 0.49 0.60   0.99 0.67 0.08 0.32 0.43   0.99 0.67 0.08 0.32 0.43
RMLg      0.99 0.71 0.05 0.49 0.94   0.99 0.80 0.10 0.50 0.62   0.99 0.70 0.05 0.49 0.95   0.99 0.67 0.06 0.35 0.69
TMLl      0.99 0.67 0.06 0.30 0.41   0.99 0.78 0.05 0.44 0.56   0.99 0.65 0.06 0.29 0.42   0.99 0.65 0.06 0.29 0.42
RMLl      0.99 0.71 0.05 0.49 0.94   0.99 0.78 0.05 0.44 0.57   0.99 0.70 0.05 0.49 0.95   0.99 0.66 0.06 0.34 0.69
TMLb      0.99 0.67 0.04 0.30 0.41   0.99 0.78 0.05 0.44 0.56   0.99 0.65 0.04 0.29 0.42   0.99 0.65 0.04 0.29 0.42
RMLb      0.99 0.71 0.03 0.49 0.93   0.99 0.78 0.05 0.44 0.57   0.99 0.70 0.03 0.49 0.94   0.99 0.66 0.04 0.34 0.69
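The entries of Tables 3.7 to 3.14 can be read as the share of Monte Carlo replications in which H0: φ = φ0 + δ is rejected, for δ ∈ {−0.2, −0.1, 0, 0.1, 0.2}. At δ = 0 this is the empirical size, and elsewhere it is the empirical power. The sketch below computes such a curve for a two-sided t-test. It assumes a 5% nominal level (critical value 1.96), which the tables do not state, and the function name and inputs are hypothetical. An LR version would instead compare 2(ℓ(φ̂) − ℓ(φ0 + δ)) with the χ²(1) critical value 3.84.

```python
def rejection_frequencies(estimates, std_errors, phi0,
                          deltas=(-0.2, -0.1, 0.0, 0.1, 0.2), crit=1.96):
    """Share of replications rejecting H0: phi = phi0 + delta (two-sided t-test)."""
    freqs = {}
    for delta in deltas:
        null_value = phi0 + delta
        rejections = sum(abs(est - null_value) / se > crit
                         for est, se in zip(estimates, std_errors))
        freqs[delta] = rejections / len(estimates)
    return freqs

# three hypothetical replications with true phi0 = 0.5
print(rejection_frequencies([0.5, 0.6, 0.7], [0.05, 0.05, 0.05], phi0=0.5))
```

An oversized test shows up as a value well above the nominal level in the δ = 0 column, as for TMLg in Table 3.7.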
3.C Figures

Figure 3.4: Average concentrated log-likelihood function for φ in AR(1) model. [Plot: panels (a) T = 2, (b) T = 3, (c) T = 4, (d) T = 5; each panel shows the TMLE and the RMLE concentrated log-likelihood for φ from −0.50 to 2.00.]

Figure 3.5: Average concentrated log-likelihood function for φ in AR(1) model. [Plot: panels (a) T = 6, (b) T = 7, (c) T = 8, (d) T = 9; each panel shows the TMLE and the RMLE concentrated log-likelihood for φ from −0.50 to 2.00.]
Chapter 4
Fixed T Dynamic Panel Data Estimators with Multi-Factor Errors
4.1 Introduction
There is a large literature on estimating dynamic panel data models with a two-way error components structure and T fixed. Such models have been used in a wide range of economic and financial applications, e.g. Euler equations for household consumption, adjustment cost models for firms’ factor demand, and empirical models of economic growth. In all these cases the autoregressive parameter has structural significance and measures state dependence, which is due to the effect of habit formation, technological/regulatory constraints, or imperfect information and uncertainty that often underlie economic behavior and decision making in general.
Recently there has been a surge of interest in developing dynamic panel data estimators that allow for richer error structures, mainly factor residuals. In this case standard dynamic panel data estimators fail to provide consistent estimates of the parameters; see e.g. Sarafidis and Robertson (2009), and Sarafidis and Wansbeek (2012) for a recent overview. The multi-factor approach is appealing because it allows for multiple sources of multiplicative unobserved heterogeneity, as opposed to the two-way error components structure, which represents additive heterogeneity. For example, in an empirical growth model the factor component may reflect country-specific differences in the rate at which countries absorb time-varying technological advances that are potentially available to all of them. In a partial adjustment model of factor input prices, the factor component may capture common shocks that hit all producers, albeit with different intensities. In this study we provide a review of inference methods for dynamic panel data models with a multi-factor error structure.
The majority of estimators developed in the literature are based on the Generalized Method of Moments (GMM) approach. This is presumably because in microeconometric panels endogeneity of the regressors is often an issue of major importance. In particular, Ahn, Lee, and Schmidt (2013) extend Ahn, Lee, and Schmidt (2001) to the case of multiple factors, and propose a GMM estimator that relies on quasi-long-differencing to eliminate the common factor component. Nauges and Thomas (2003) utilise the quasi-differencing approach of Holtz-Eakin, Newey, and Rosen (1988), which is computationally tractable for the single factor case, and propose moment conditions similar to those of Ahn et al. (2001), mutatis mutandis. Sarafidis, Yamagata, and Robertson (2009) propose using the popular linear first-differenced and System GMM estimators with instruments based solely on strictly exogenous regressors. Robertson and Sarafidis (2015) develop a GMM approach that introduces new parameters, which represent the unobserved covariances between the factor component of the error and the instruments. Furthermore, they show that, given the model’s structure, there exist restrictions on the nuisance parameters that lead to a more efficient GMM estimator compared to quasi-differencing approaches. Hayakawa (2012) shows that the moment conditions proposed by Ahn et al. (2013) can be linearized at the expense of introducing extra parameters. Finally, Bai (2013b) and Hayakawa (2012) suggest estimators that approximate the factor loadings using a Chamberlain (1982) type projection approach, with a quasi-maximum likelihood estimator suggested in the former and a GMM estimator in the latter.
The objective of our study is to serve as a useful guide for practitioners who wish to apply methods that allow for multiplicative sources of unobserved heterogeneity in their model. All methods are analyzed using a unified notational approach, to the extent that this is possible, and their properties are discussed under deviations from a baseline set of commonly employed assumptions. We pay particular attention to calculating the number of identifiable parameters correctly, which is a requirement for asymptotically valid inference and consistent model selection procedures. This issue is often overlooked in the literature. Furthermore, we consider the extensibility of these estimators to practical situations that may frequently arise, such as their ability to accommodate unbalanced panels and to estimate models with common observed factors.
Next, we investigate the finite sample performance of the estimators under a number of different designs. In particular, we examine (i) the effect of the presence of weakly exogenous covariates, (ii) the effect of changing the magnitude of the correlation between the factor loadings of the dependent variable and those of the covariates, (iii) the impact of the number of moment conditions on bias and size for GMM estimators, (iv) the impact of different levels of persistence in the data, and finally (v) the effect of sample size. These are important considerations with high empirical relevance. Nevertheless, to the best of our knowledge they remain largely unexplored. For example, the simulation study in Robertson and Sarafidis (2015) does not consider the effect of using a different number of instruments on the finite sample properties of the estimator. In Ahn et al. (2013) the design focuses on strictly exogenous regressors, while in Bai (2013b) the reported results do not include inference. The practical issue of how to choose initial values for the non-linear algorithms is considered in the appendix. The results of our simulation study indicate that there are non-negligible differences in the finite sample performance of the estimators, depending on the parametrization considered. Naturally, no estimator dominates the remaining ones universally, although it is fair to say that some estimators are more robust than others.
The outline of the rest of the paper is as follows. The next section introduces the dy-
namic panel data model with a multi-factor error structure and discusses some under-
lying assumptions that are commonly employed in the literature. Section 4.3 presents
a large range of dynamic panel estimators developed for such models when T is small,
and discusses several technical points regarding their properties. Section 4.4 provides
some general remarks on the estimators. Section 4.5 investigates the finite sample per-
formance of the estimators. A final section concludes. The appendix analyzes in detail
the implementation of all these methods.
In what follows we briefly introduce our notation. The usual $\mathrm{vec}(\cdot)$ operator denotes the column stacking operator, while $\mathrm{vech}(\cdot)$ is the corresponding operator that stacks only the elements on and below the main diagonal. The elimination matrix $B_a$ is defined such that $\mathrm{vech}(A) = B_a\,\mathrm{vec}(A)$ for any $[a\times a]$ matrix $A$ (not necessarily symmetric). The lag-operator matrix $L_T$ is defined such that for any $[T\times 1]$ vector $x = (x_1,\dots,x_T)'$, $L_T x = (0, x_1,\dots,x_{T-1})'$. The shorthand notation $x_{i,s:k}$, $s \le k$, is used to denote vectors of the form $x_{i,s:k} = (x_{i,s},\dots,x_{i,k})'$. The $j$th column of the $[x\times x]$ identity matrix is denoted by $e_j$. Finally, $\mathbb{1}(\cdot)$ is the usual indicator function. For further details regarding the notation used in this paper see Abadir and Magnus (2002).
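The operators above are easy to check numerically. The following is a minimal NumPy sketch; the helper names `vec`, `vech`, `elimination_matrix` and `lag_matrix` are hypothetical and assume column-major stacking, so that element $A_{ij}$ sits at position $ja + i$ (0-based) of $\mathrm{vec}(A)$.

```python
import numpy as np

def vec(A):
    """Stack the columns of A into one vector (column-major order)."""
    return A.reshape(-1, order="F")

def vech(A):
    """Stack, column by column, the elements on and below the main diagonal."""
    a = A.shape[0]
    return np.concatenate([A[j:, j] for j in range(a)])

def elimination_matrix(a):
    """B_a such that vech(A) = B_a @ vec(A) for any a x a matrix A."""
    B = np.zeros((a * (a + 1) // 2, a * a))
    row = 0
    for j in range(a):              # column of A
        for i in range(j, a):       # rows on or below the diagonal
            B[row, j * a + i] = 1.0  # A[i, j] sits at position j*a + i of vec(A)
            row += 1
    return B

def lag_matrix(T):
    """L_T such that L_T @ x = (0, x_1, ..., x_{T-1})'."""
    return np.eye(T, k=-1)

A = np.arange(9.0).reshape(3, 3)
print(vech(A))                                         # [0. 3. 6. 4. 7. 8.]
print(lag_matrix(4) @ np.array([1.0, 2.0, 3.0, 4.0]))  # [0. 1. 2. 3.]
```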
4.2 Theoretical setup
We consider the following dynamic panel data model with a multi-factor error structure
$$y_{i,t} = \alpha y_{i,t-1} + \sum_{k=1}^{K}\beta_k x^{(k)}_{i,t} + \lambda_i' f_t + \varepsilon_{i,t}; \quad i = 1,\dots,N,\ t = 1,\dots,T, \tag{4.1}$$
where the dimension of the unobserved components $\lambda_i$ and $f_t$ is $[L\times 1]$. Stacking the observations over time for each individual $i$ yields
$$y_i = \alpha y_{i,-1} + \sum_{k=1}^{K}\beta_k x^{(k)}_i + F\lambda_i + \varepsilon_i,$$
where $y_i = (y_{i,1},\dots,y_{i,T})'$ and similarly for $(y_{i,-1}, x^{(k)}_i)$, while $F = (f_1,\dots,f_T)'$ is of dimension $[T\times L]$. In what follows we list some assumptions that are commonly employed in the literature, followed by some preliminary discussion. In Section 4.3 we provide further discussion with regard to which of these assumptions can be strengthened/relaxed for each estimator analyzed.
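To make the stacked notation concrete, the sketch below simulates model (4.1) with a single regressor ($K = 1$). All distributional choices (Gaussian errors and loadings, uniform factors, a strictly exogenous regressor whose loadings are correlated with $\lambda_i$, an ad hoc initial condition) are illustrative assumptions and not the Monte Carlo design used later; `simulate_panel` is a hypothetical name.

```python
import numpy as np

def simulate_panel(N, T, L, alpha, beta, seed=0):
    """Simulate y_{i,t} = alpha*y_{i,t-1} + beta*x_{i,t} + lambda_i' f_t + eps_{i,t}
    (K = 1). Returns Y, Y_{-1}, X, Lambda, F, E as in the stacked notation."""
    rng = np.random.default_rng(seed)
    F = rng.uniform(0.5, 1.5, size=(T + 1, L))       # factors for t = 0, ..., T
    Lam = rng.standard_normal((N, L))                 # factor loadings lambda_i
    Gam = 0.5 * Lam + rng.standard_normal((N, L))     # x-loadings correlated with lambda_i
    X = Gam @ F[1:].T + rng.standard_normal((N, T))   # strictly exogenous in this sketch
    E = rng.standard_normal((N, T))                   # idiosyncratic errors
    y = np.empty((N, T + 1))
    y[:, 0] = Lam @ F[0] + rng.standard_normal(N)     # ad hoc initial condition y_{i,0}
    for t in range(1, T + 1):
        y[:, t] = alpha * y[:, t - 1] + beta * X[:, t - 1] + Lam @ F[t] + E[:, t - 1]
    return y[:, 1:], y[:, :-1], X, Lam, F[1:], E

Y, Ylag, X, Lam, F, E = simulate_panel(N=500, T=5, L=2, alpha=0.5, beta=1.0)
```

By construction `Y - alpha*Ylag - beta*X - Lam @ F.T` reproduces `E`, which is the matrix form $Y = \alpha Y_{-1} + \beta X + \Lambda F' + E$ used for the GMM estimators.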
Assumption 1: $x^{(k)}_{i,t}$ has finite moments up to fourth order for all $k$;
Assumption 2: $\varepsilon_{i,t} \sim \text{i.i.d.}(0,\sigma^2_\varepsilon)$ and has finite moments up to fourth order;
Assumption 3: $\lambda_i \sim \text{i.i.d.}(0,\Sigma_\lambda)$ with finite moments up to fourth order, where $\Sigma_\lambda$ is positive definite. $F$ is non-stochastic and bounded such that $\|F\| < b < \infty$;
Assumption 4: $E\big(\varepsilon_{i,t} \mid y_{i,0:t-1}', \lambda_i', x^{(k)\prime}_{i,1:\tau}\big) = 0$ for all $t$ and $k$, where $\tau$ is a positive integer that is bounded by $T$.
Assumption 1 is a standard regularity condition. Assumptions 2-3 are employed mainly for simplicity and can be relaxed to some extent, details of which will be documented later.¹
¹ The zero-mean assumption for $\varepsilon_{i,t}$ is actually implied by Assumption 4.
Assumption 4 can be crucial for identification, depending on the estimation approach,
because it characterizes the exogeneity properties of the covariates. In particular, we
will refer to covariates that satisfy τ = T as strictly exogenous with respect to the
idiosyncratic error component, whereas covariates that satisfy only τ = t are weakly
exogenous. When τ < t the covariates are endogenous. The exogeneity properties of
the covariates play a major role in the analysis of likelihood based estimators because
the presence of weakly exogenous or endogenous regressors may lead to inconsistent
estimates of the structural parameters, α and βk.
Furthermore, Assumption 4 implies that the idiosyncratic errors are conditionally seri-
ally uncorrelated. This can be relaxed in a relatively straightforward way, particularly
for GMM estimators; for example, an MA process of order q can be accommodated by
truncating the set of instruments with respect to $y$ based on $E\big(\varepsilon_{i,t} \mid y_{i,0:s}', \lambda_i', x^{(k)\prime}_{i,1:\tau}\big) = 0$, where $s < t - q$.
Assumption 4 also implies that the idiosyncratic error is conditionally uncorrelated with
the factor loadings. This is required for identification based on internal instruments
in levels. Finally, notice that the set of our assumptions implies that yi,t has finite
fourth-order moments, but it does not imply conditional homoskedasticity for the two
error components.
Under Assumptions 1-4, the following set of population moment conditions is valid by construction
$$E[\mathrm{vech}(\varepsilon_i y_{i,-1}')] = 0_{T(T+1)/2}. \tag{4.2}$$
In addition, the following sets of moment conditions are valid, depending on whether $\tau = T$ or $\tau = t$ holds true, respectively:
$$E[\mathrm{vec}(\varepsilon_i x^{(k)\prime}_i)] = 0_{T^2}; \tag{4.3}$$
$$E[\mathrm{vech}(\varepsilon_i x^{(k)\prime}_i)] = 0_{T(T+1)/2}. \tag{4.4}$$
For all GMM estimators one can easily modify the above moment conditions to allow for endogenous $x$'s. For example, for (say) $\tau = t-1$ one may redefine $x^{(k)}_i \equiv (x_{i,0},\dots,x_{i,T-1})'$ and proceed in exactly the same way as for $\tau = t$.
From now on we will use the triangular structure of the moment conditions induced by the $\mathrm{vech}(\cdot)$ operator to construct the estimating equations for the GMM estimators. To achieve this we adopt the following matrix notation for the stacked model
$$Y = \alpha Y_{-1} + \sum_{k=1}^{K}\beta_k X_k + \Lambda F' + E,$$
where $(Y, Y_{-1}, X_k, E)$ are $[N\times T]$ matrices with typical rows $(y_i', y_{i,-1}', x^{(k)\prime}_i, \varepsilon_i')$, respectively. Similarly, a typical row of $\Lambda$ is given by $\lambda_i'$.
Remark 4.1. For notational symmetry, while describing GMM estimators we assume that the $x^{(k)}_{i,0}$ observations are not included in the set of available instruments. Otherwise, an additional $T$ or $T-1$ (depending on the estimator analyzed) moment conditions are available. The same strategy is used in the Monte Carlo section of this paper.
4.3 Estimators
4.3.1 Quasi-differenced (QD) GMM
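Replacing the expectations in (4.2) and (4.4) with sample averages yields
$$\mathrm{vech}\Big(\frac{1}{N}\Big(Y - \alpha Y_{-1} - \sum_{k=1}^{K}\beta_k X_k - \Lambda F'\Big)'Y_{-1}\Big);$$
$$\mathrm{vech}\Big(\frac{1}{N}\Big(Y - \alpha Y_{-1} - \sum_{k=1}^{K}\beta_k X_k - \Lambda F'\Big)'X_k\Big).$$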
These moment conditions depend on the unknown matrices $F$ and $\Lambda$. In the simple fixed effects model where $F = \iota_T$, the first-differencing transformation proposed by Anderson and Hsiao (1982) is the most common approach to eliminate the fixed effects from the equation of interest. Using a similar idea in the model with one unobserved time-varying factor, i.e.
$$y_{i,t} = \alpha y_{i,t-1} + \sum_{k=1}^{K}\beta_k x^{(k)}_{i,t} + \lambda_i f_t + \varepsilon_{i,t},$$
Holtz-Eakin, Newey, and Rosen (1988) suggest eliminating the unobserved factor component using the following quasi-differencing (QD) transformation
$$y_{i,t} - r_t y_{i,t-1} = \alpha(y_{i,t-1} - r_t y_{i,t-2}) + \sum_{k=1}^{K}\beta_k\big(x^{(k)}_{i,t} - r_t x^{(k)}_{i,t-1}\big) + \varepsilon_{i,t} - r_t\varepsilon_{i,t-1}, \tag{4.5}$$
for $i = 1,\dots,N$, $t = 2,\dots,T$, where $r_t \equiv f_t/f_{t-1}$. By construction, equation (4.5) is free from $\lambda_i f_t$ because
$$\lambda_i f_t - r_t\lambda_i f_{t-1} = \lambda_i f_t - \frac{f_t}{f_{t-1}}\lambda_i f_{t-1} = 0, \quad \forall t = 2,\dots,T.$$
It is easy to see that the QD approach is well defined only if all $f_t \neq 0$, $t = 1,\dots,T-1$.
Collecting all parameters involved in quasi-differencing, we can define the corresponding $[(T-1)\times T]$ QD transformation matrix by
$$D(r) = \begin{pmatrix} -r_2 & 1 & 0 & \cdots & 0\\ 0 & -r_3 & 1 & \cdots & 0\\ \vdots & & \ddots & \ddots & \vdots\\ 0 & 0 & \cdots & -r_T & 1 \end{pmatrix},$$
where $r = (r_2,\dots,r_T)'$. The first-differencing (FD) transformation matrix is a special case with $r_2 = \dots = r_T = 1$. Pre-multiplying the terms inside the $\mathrm{vech}(\cdot)$ operator in the sample analogue of the population moment conditions above by $D(r)$, and noticing that $D(r)F = 0$, we can rewrite the estimating equations for the QD GMM estimator as
$$m_\alpha = \mathrm{vech}\Big(\frac{1}{N}D(r)\Big(Y - \alpha Y_{-1} - \sum_{k=1}^{K}\beta_k X_k\Big)'Y_{-1}J^{(1)\prime}\Big);$$
$$m_k = \mathrm{vech}\Big(\frac{1}{N}D(r)\Big(Y - \alpha Y_{-1} - \sum_{k=1}^{K}\beta_k X_k\Big)'X_k J^{(1)\prime}\Big) \quad \forall k.$$
Here $J^{(L)} = (I_{T-L}, O_{(T-L)\times L})$ is a selection matrix that appropriately truncates the set of instruments to ensure that the term inside the $\mathrm{vech}(\cdot)$ operator is a square matrix. One can easily see that the total number of moment conditions and parameters under the weak exogeneity assumption for all $x$'s is given by
$$\#\text{moments} = \frac{(K+1)(T-1)T}{2}; \qquad \#\text{parameters} = (K+1) + (T-1).$$
The total number of parameters consists of two terms. The first term within the brackets corresponds to the $K+1$ parameters of interest (or structural/model parameters), while the remaining term corresponds to the $T-1$ nuisance parameters $r_2,\dots,r_T$, i.e. the ratios of the time-varying factors.
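The QD estimating equations can be sketched directly in NumPy. The helpers below (`qd_matrix`, `selection_J`, `qd_moments`, all hypothetical names) build $D(r)$ and $J^{(1)}$, check that $D(r)$ annihilates a single factor, and stack $(m_\alpha, m_1,\dots,m_K)$ for given parameter values; they only evaluate the moments and do not perform the GMM minimization.

```python
import numpy as np

def qd_matrix(r):
    """[(T-1) x T] QD matrix D(r), r = (r_2, ..., r_T)'.
    (1-based) row t maps x to x_{t+1} - r_{t+1} x_t."""
    T = len(r) + 1
    D = np.zeros((T - 1, T))
    for t in range(T - 1):
        D[t, t] = -r[t]
        D[t, t + 1] = 1.0
    return D

def selection_J(T, L):
    """J^{(L)} = (I_{T-L}, O_{(T-L) x L})."""
    return np.hstack([np.eye(T - L), np.zeros((T - L, L))])

def vech(A):
    return np.concatenate([A[j:, j] for j in range(A.shape[0])])

def qd_moments(alpha, betas, r, Y, Ylag, Xs):
    """Stacked (m_alpha, m_1, ..., m_K) for QD GMM with weakly exogenous x's.
    Y, Ylag and each X in Xs are [N x T]; betas and Xs have length K."""
    N, T = Y.shape
    U = Y - alpha * Ylag
    for b, X in zip(betas, Xs):
        U = U - b * X
    D, J = qd_matrix(r), selection_J(T, 1)
    return np.concatenate([vech(D @ U.T @ Z @ J.T / N) for Z in [Ylag, *Xs]])

f = np.array([1.0, 2.0, 0.5, 1.5])   # one factor, T = 4
r = f[1:] / f[:-1]                   # r_t = f_t / f_{t-1}
print(np.allclose(qd_matrix(r) @ f, 0.0))   # True: D(r) F = 0
```

Each $\mathrm{vech}$ block has $(T-1)T/2$ elements, so the stacked vector has the $(K+1)(T-1)T/2$ entries counted above.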
Remark 4.2. If we define $r_t \equiv f_{t-1}/f_t$, we can also consider the quasi-differencing matrix of the following type
$$D(r) = \begin{pmatrix} 1 & -r_2 & 0 & \cdots & 0\\ 0 & 1 & -r_3 & \cdots & 0\\ \vdots & & \ddots & \ddots & \vdots\\ 0 & 0 & \cdots & 1 & -r_T \end{pmatrix}.$$
This transformation approach uses forward differences rather than backward differences. However, similarly to the original transformation matrix of Holtz-Eakin et al. (1988), the estimator based on this transformation requires that all $f_t \neq 0$, $t = 2,\dots,T$. Hence the restrictions imposed by the two differencing strategies overlap for $t = 2,\dots,T-1$, but not for $t = 1$ and $t = T$. Finally, one can also consider transformation matrices based on higher-order forward differences.
The approach of Holtz-Eakin et al. (1988) as it stands is tailored for models with one unobserved factor. In principle, it can be extended to multiple factors by removing each factor consecutively based on a $D^{(l)}(r^{(l)})$ matrix, with the final transformation matrix being a product of $L$ such matrices. However, this approach soon becomes computationally very cumbersome as the estimating equations become multiplicative in $r^{(l)}$. On the other hand, if the model involves some observed factors, the corresponding $D^{(\cdot)}(\cdot)$ matrix is known, leading to a simple estimator that involves equations containing $r$ and structural parameters only. For example, Nauges and Thomas (2003) augment the model of Holtz-Eakin et al. (1988) by allowing for time-invariant individual effects
$$y_{i,t} = \eta_i + \alpha y_{i,t-1} + \sum_{k=1}^{K}\beta_k x^{(k)}_{i,t} + \lambda_i f_t + \varepsilon_{i,t}; \quad t = 1,\dots,T,$$
where $\eta_i$ is eliminated using the FD transformation matrix $D(\iota_{T-1})$, which yields
$$\Delta y_{i,t} = \alpha\Delta y_{i,t-1} + \sum_{k=1}^{K}\beta_k\Delta x^{(k)}_{i,t} + \lambda_i\Delta f_t + \Delta\varepsilon_{i,t}; \quad t = 2,\dots,T,$$
followed by the QD transformation, albeit operated based on a $[(T-2)\times(T-1)]$ matrix $D(r)$. The resulting number of parameters and moment conditions can be modified accordingly from those in Holtz-Eakin et al. (1988).
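The consecutive-removal idea for $L > 1$ factors can be illustrated numerically. In this sketch (hypothetical helper names, and assuming no zero denominators arise at any step), each step quasi-differences the next factor as it looks after the previous transformations, so $r^{(2)}$ depends on $r^{(1)}$; this is exactly why the estimating equations become multiplicative in the $r^{(l)}$.

```python
import numpy as np

def qd_matrix(r):
    """Backward QD matrix: row t has -r[t] in column t and 1 in column t+1."""
    n = len(r)
    D = np.zeros((n, n + 1))
    D[np.arange(n), np.arange(n)] = -np.asarray(r)
    D[np.arange(n), np.arange(1, n + 1)] = 1.0
    return D

def sequential_qd(F):
    """Remove the L columns of F one at a time and return the [(T-L) x T]
    product D^{(L)}(r^{(L)}) ... D^{(1)}(r^{(1)})."""
    T, L = F.shape
    D = np.eye(T)
    for l in range(L):
        g = D @ F[:, l]                       # factor l after earlier removals
        D = qd_matrix(g[1:] / g[:-1]) @ D     # r^{(l)} depends on earlier r's
    return D

F = np.array([[1.0, 1.0], [2.0, 0.5], [1.5, 2.0], [0.5, 1.0], [1.0, 3.0]])
D = sequential_qd(F)
print(D.shape, np.allclose(D @ F, 0.0))   # (3, 5) True
```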
Remark 4.3. The FD transformation is by no means the only way to eliminate the fixed effects from the model. Another commonly discussed transformation is Forward Orthogonal Deviations (FOD). If one uses FOD instead of FD, the identification of structural parameters would require that all $f^*_t \neq 0$.² Depending on the properties of the $f_t$'s, it might be safer (at the expense of efficiency) to use FOD even in the absence of $\eta_i$, since $r_t$ is defined for $f_t \neq 0$, $t = 1,\dots,T-1$ only.
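The FOD-transformed factors $f^*_t$ are easy to inspect in practice. The sketch below (hypothetical name `fod_matrix`) builds the standard $[(T-1)\times T]$ forward orthogonal deviations matrix whose row $t$ maps $x$ to $c_t\big(x_t - (x_{t+1}+\dots+x_T)/(T-t)\big)$ with $c_t^2 = (T-t)/(T-t+1)$, i.e. the $f^*_t$ definition of footnote 2. Its rows are orthonormal and annihilate a constant, which is why it removes $\eta_i$.

```python
import numpy as np

def fod_matrix(T):
    """[(T-1) x T] forward orthogonal deviations matrix."""
    A = np.zeros((T - 1, T))
    for t in range(1, T):                      # t = 1, ..., T-1 (1-based)
        c = np.sqrt((T - t) / (T - t + 1))
        A[t - 1, t - 1] = c                    # coefficient on x_t
        A[t - 1, t:] = -c / (T - t)            # minus the mean of x_{t+1}, ..., x_T
    return A

f = np.array([1.0, 0.4, -0.2, 0.9, 0.3])      # an illustrative single factor
f_star = fod_matrix(len(f)) @ f
print(np.all(np.abs(f_star) > 1e-8))          # check f*_t != 0 up to a chosen tolerance
```

The tolerance `1e-8` is an arbitrary illustrative choice for deciding numerically whether some $f^*_t$ is (close to) zero.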
Remark 4.4. Assumption 2 can be easily relaxed. For example, unconditional time-
series and cross-sectional heteroscedasticity of the idiosyncratic error component, εi,t,
is allowed in the two-step version of the estimator. Serial correlation can be accommo-
dated by choosing the set of instruments appropriately, as in the discussion provided
in Section 4.2. This is a particularly attractive feature, which is common to all GMM
estimators discussed in this paper. Unconditional heteroscedasticity in λi can also be
allowed, although this is a less interesting extension for practical purposes since there
are no repeated observations over each λi.
Finally, endogeneity of the regressors can be easily allowed. The exogeneity property
of the covariates can be tested using an overidentifying restrictions test statistic. The
same holds for all GMM estimators discussed in this paper, which is of course a desirable
property from the empirical point of view since the issue of endogeneity in panels with
T fixed, e.g. microeconometric panels, may frequently arise.
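As a sketch of the overidentifying restrictions test mentioned above, the function below computes a Hansen-type $J$ statistic from the individual moment contributions $g_i(\hat\theta)$ evaluated at an efficient two-step estimate. The name `hansen_j` and the use of the uncentred weight matrix are assumptions for illustration; the degrees of freedom should use the number of identifiable parameters discussed in this section, not the nominal one. A p-value would come from the $\chi^2$ distribution with those degrees of freedom.

```python
import numpy as np

def hansen_j(g, n_params):
    """Hansen J statistic from an (N x q) array whose rows are the moment
    contributions g_i(theta_hat). Returns (J, degrees of freedom)."""
    N, q = g.shape
    g_bar = g.mean(axis=0)
    S = g.T @ g / N                            # estimate of E[g_i g_i']
    J = N * g_bar @ np.linalg.solve(S, g_bar)
    return J, q - n_params                     # use the identifiable parameter count

g = np.array([[1.0, 2.0], [-1.0, -2.0], [2.0, -1.0], [-2.0, 1.0]])
print(hansen_j(g, n_params=1))                 # sample moments exactly zero, so J = 0
```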
4.3.2 Quasi-long-differenced (QLD) GMM
As we have mentioned before, the QD approach in Holtz-Eakin et al. (1988) is difficult to generalize to more than one unobserved factor (or one unobserved factor plus observed factors). Rather than eliminating factors using such a transformation, Ahn, Lee, and Schmidt (2013) propose using a quasi-long-differencing (QLD) transformation. The factors can be removed from the model using the following QLD transformation matrix $D(F^*)$
$$D(F^*) = (I_{T-L}, F^*) = J^{(L)} + F^*\bar{J}^{(L)},$$
where $F^*$ is a $[(T-L)\times L]$ parameter matrix and $\bar{J}^{(L)} = (O_{L\times(T-L)}, I_L)$ is an $[L\times T]$ selection matrix. Rather than using the last observation $y_{i,t-1}$ to remove factors from the model at time $t$ (one by one), the QLD approach uses long differences from the last observations $y_{i,T-L+1:T}$ to remove all $L$ factors at once.
² Here $f^*_t \equiv c_t\big(f_t - (f_{t+1} + \dots + f_T)/(T-t)\big)$ with $c_t^2 = (T-t)/(T-t+1)$.
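To see this, partition $F = (F_A', -F_B')'$, where $F_A$ and $F_B$ are of dimensions $[(T-L)\times L]$ and $[L\times L]$, respectively. Then, assuming that $F_B$ is invertible, one can redefine (or normalize) the factors and factor loadings as
$$F\lambda_i = \begin{pmatrix} F^* \\ -I_L \end{pmatrix}\lambda^*_i; \qquad F^* \equiv F_A F_B^{-1}; \qquad \lambda^*_i \equiv F_B\lambda_i.$$
Using fairly straightforward matrix algebra it then follows that
$$D(F^*)F\lambda_i = (I_{T-L}, F^*)\begin{pmatrix} F^* \\ -I_L \end{pmatrix}\lambda^*_i = 0_{T-L}.$$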
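The normalization can be verified numerically. The sketch below (hypothetical names `qld_normalize` and `qld_matrix`) follows the sign convention $F = (F_A', -F_B')'$, forms $F^* = F_A F_B^{-1}$ and checks that $D(F^*) = (I_{T-L}, F^*)$ annihilates the factor matrix, assuming the last $L$ rows of $F$ form an invertible block.

```python
import numpy as np

def qld_normalize(F):
    """F* = F_A F_B^{-1} for F = (F_A', -F_B')', with F_B the last L rows (sign flipped)."""
    T, L = F.shape
    F_A, F_B = F[: T - L], -F[T - L:]
    return F_A @ np.linalg.inv(F_B)

def qld_matrix(F_star):
    """D(F*) = (I_{T-L}, F*), a [(T-L) x T] matrix."""
    return np.hstack([np.eye(F_star.shape[0]), F_star])

F = np.array([[1.0, 0.2], [0.5, 1.0], [-0.3, 0.8],
              [1.2, -0.4], [0.7, 0.1], [0.2, 0.9]])   # T = 6, L = 2
D = qld_matrix(qld_normalize(F))
print(D.shape, np.allclose(D @ F, 0.0))               # (4, 6) True
```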
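One can express all available moment conditions for this estimator as
$$m_\alpha = \mathrm{vech}\Big(D(F^*)\frac{1}{N}\Big(Y - \alpha Y_{-1} - \sum_{k=1}^{K}\beta_k X_k\Big)'Y_{-1}J^{(L)\prime}\Big);$$
$$m_k = \mathrm{vech}\Big(D(F^*)\frac{1}{N}\Big(Y - \alpha Y_{-1} - \sum_{k=1}^{K}\beta_k X_k\Big)'X_k J^{(L)\prime}\Big) \quad \forall k.$$
Counting the number of moment conditions and resulting parameters we have
$$\#\text{moments} = \frac{(K+1)(T-L)(T-L+1)}{2}; \qquad \#\text{parameters} = K + 1 + (T-L)L.$$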
However, we will further argue that the number of identifiable parameters is smaller
than K + 1 + (T − L)L. To explain the reason for this, let K = 1 and rewrite the
transformed equation for $y_{i,1}$ as
$$y_{i,1} + \sum_{l=1}^{L} f^{*(l)}_1 y_{i,T-l} = \alpha\Big(y_{i,0} + \sum_{l=1}^{L} f^{*(l)}_1 y_{i,T-l-1}\Big) + \beta\Big(x_{i,1} + \sum_{l=1}^{L} f^{*(l)}_1 x_{i,T-l}\Big) + \Big(\varepsilon_{i,1} + \sum_{l=1}^{L} f^{*(l)}_1\varepsilon_{i,T-l}\Big). \tag{4.6}$$
This equation has $2 + L$ unknown parameters in total, while the number of moment conditions is 2 ($y_{i,0}$ and $x_{i,1}$). Thus, $L$ “nuisance parameters” are identified only up to a linear combination, unless $L \le 2$ (or $K+1$ for the general model), and the total number of identifiable parameters is
$$\#\text{parameters} = K + 1 + (T-L)L - \mathbb{1}(L \ge K+1)\frac{(L-K-1)(L-K)}{2}.$$
Notice that for $L = 1$ the number of moment conditions and the number of identifiable parameters are exactly the same as for the QD transformation. Thus, one expects the corresponding GMM estimators to be asymptotically equivalent.
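The counting formulas are easy to get wrong by hand, so a small sketch helps. The functions below (hypothetical names) implement the QD and QLD counts under weak exogeneity, including the indicator correction for unidentified nuisance parameters, and confirm that the two coincide for $L = 1$.

```python
def qd_counts(T, K):
    """(#moments, #parameters) for QD GMM, one factor, weakly exogenous x's."""
    return (K + 1) * (T - 1) * T // 2, (K + 1) + (T - 1)

def qld_counts(T, K, L):
    """(#moments, #identifiable parameters) for QLD GMM, weakly exogenous x's."""
    moments = (K + 1) * (T - L) * (T - L + 1) // 2
    lost = (L - K - 1) * (L - K) // 2 if L >= K + 1 else 0   # 1(L >= K+1) term
    return moments, K + 1 + (T - L) * L - lost

print(qld_counts(6, 1, 3))   # (12, 10): one nuisance parameter is not identified
```

The difference between the two numbers in each pair gives the degrees of freedom of the overidentifying restrictions test (here $12 - 10 = 2$).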
Remark 4.4 regarding Assumptions 2-4, as discussed in Section 4.3.1, applies identically
here as well. Ahn et al. (2013) show that under conditional homoscedasticity in εi,t
the estimation procedure simplifies considerably because it can be performed through
iterations. Furthermore, for the case where the regressors are strictly exogenous, the
resulting estimator is invariant to the chosen normalization scheme; see their Appendix
A.
Remark 4.5. One can see the quasi long-differencing transformation matrix as the lim-
iting case (in terms of the longest difference) of the forward differencing transformation
matrix in Remark 4.2.
4.3.3 Factor IV (FIVU and FIVR)
Rather than eliminating the incidental parameters $\lambda_i$, Robertson and Sarafidis (2015) propose a GMM estimator that relies on reducing these parameters to a finite set of estimable coefficients. They label the proposed estimator FIVU (Factor IV Unrestricted). Their approach makes use of centered moment conditions of the following form
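$$m_\alpha = \mathrm{vech}\Big(\frac{1}{N}\Big(Y - \alpha Y_{-1} - \sum_{k=1}^{K}\beta_k X_k\Big)'Y_{-1} - FG'\Big);$$
$$m_k = \mathrm{vech}\Big(\frac{1}{N}\Big(Y - \alpha Y_{-1} - \sum_{k=1}^{K}\beta_k X_k\Big)'X_k - FG_k'\Big) \quad \forall k,$$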
where $(G, G_k)$ are defined as
$$G = E[y_{i,-1}\lambda_i']; \qquad G_k = E[x^{(k)}_i\lambda_i'],$$
with typical row elements $g_t'$ and $g^{(k)\prime}_t$, respectively. The $(G, G_k)$ matrices represent the unobserved covariances between the instruments and the factor loadings in the error term. This approach adopts essentially a (correlated) random effects treatment of the factor loadings, which is natural because the asymptotics apply for $N$ large and $T$ fixed, and there are no repeated observations over each $\lambda_i$. Thus, it is in the spirit of Chamberlain's projection approach. Different sensitivities to the factors (i.e. differences in the factor loadings) can be generated by different values of the variance of the cross sectional distribution of $\lambda_i$. Notice that as in Holtz-Eakin et al. (1988) and Ahn, Lee, and Schmidt (2013), factors corresponding to loadings that are uncorrelated with the regressors can be accommodated through the variance-covariance matrix of the idiosyncratic error component, $\varepsilon_{i,t}$, i.e. $E(\varepsilon_i\varepsilon_i')$, since the latter can be left unrestricted.
For this estimator the total number of moment conditions is given by
$$\#\text{moments} = \frac{(K+1)T(T+1)}{2}.$$
As the model stands right now, $(G, G_k)$ (all $K+1$ matrices) and $F$ are not separately identifiable because
$$FG' = FUU^{-1}G'$$
for any invertible $[L\times L]$ matrix $U$. This rotational indeterminacy is typically eliminated in the factor literature by requiring an $[L\times L]$ submatrix of $F$ to be the identity matrix.³ These restrictions correspond to the $L^2$ term in the equation below. Furthermore, for $L > 1$ additional normalizations are required due to the fact that the moment conditions are of triangular $\mathrm{vech}(\cdot)$ type. In particular, the number of identifiable parameters is
$$\#\text{parameters} = (K+1)(1 + TL) + TL - L^2 - \frac{(K+1)L(L-1)}{2} - \mathbb{1}(L \ge K+1)\frac{(L-K-1)(L-K)}{2}.$$
The $(K+1)L(L-1)/2$ term corresponds to the unobserved “last” $g$, while the last term involving the indicator function corresponds to the unobserved “first” $f$ and is identical to the right-hand side term in the corresponding expression for the number of identifiable parameters in Ahn, Lee, and Schmidt (2013).
Nevertheless, as shown in Robertson and Sarafidis (2015), if one is only interested in the structural parameters, $\alpha$ and $\beta_k$, it is not essential to impose any identifying normalizations on $G$ and $F$; the resulting unrestricted estimator for the structural parameters is consistent and asymptotically normal, while the variance-covariance matrix can be consistently estimated using the corresponding sub-block of the generalized inverse of the unrestricted variance-covariance matrix.⁴
³ Robertson and Sarafidis (2015) discuss which submatrix of $F$ has to be invertible in order for the estimator with weakly exogenous regressors to be consistent.
Remark 4.6. Compared with the QLD estimator of Ahn et al. (2013) this estimator
utilises L(K + 1)(T − (L− 1)/2) extra moment conditions, at the expense of estimat-
ing exactly the same number of additional parameters. Hence these estimators are
asymptotically equivalent. Although in FIVU estimation one does not have to impose
any restrictions on F , for asymptotic identification the true value of FB (as defined for
QLD estimator) should still satisfy the full rank condition. Finally, the FIVU estimator
remains consistent even if the i.i.d. assumption on λi is replaced by i.h.d. (independent
and heteroscedastically distributed). However, in that situation consistent estimation
of the variance-covariance matrix is not possible. Ahn (2015) also discusses these issues. Note that all other estimators that do not difference away $\lambda_i$ will also suffer from this problem.
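The claim in Remark 4.6 can be checked with simple arithmetic. The sketch below (hypothetical function names) implements the FIVU and QLD counts from this section and verifies, over a grid of $(T, K, L)$, that FIVU adds exactly $L(K+1)(T-(L-1)/2)$ moment conditions and the same number of identifiable parameters; doubling both sides avoids fractions.

```python
def indicator_loss(K, L):
    """1(L >= K+1) (L-K-1)(L-K)/2, the unidentified 'first' f parameters."""
    return (L - K - 1) * (L - K) // 2 if L >= K + 1 else 0

def qld_counts(T, K, L):
    return (K + 1) * (T - L) * (T - L + 1) // 2, K + 1 + (T - L) * L - indicator_loss(K, L)

def fivu_counts(T, K, L):
    moments = (K + 1) * T * (T + 1) // 2
    params = ((K + 1) * (1 + T * L) + T * L - L * L
              - (K + 1) * L * (L - 1) // 2 - indicator_loss(K, L))
    return moments, params

for L in range(1, 4):
    for K in range(0, 4):
        for T in range(L + 1, 10):
            (mf, pf), (mq, pq) = fivu_counts(T, K, L), qld_counts(T, K, L)
            # extra moments = extra parameters = L(K+1)(T - (L-1)/2)
            assert 2 * (mf - mq) == 2 * (pf - pq) == L * (K + 1) * (2 * T - L + 1)
```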
The autoregressive nature of the model suggests that individual rows of the $G$ matrix also have an autoregressive structure, i.e.
$$g_t = \alpha g_{t-1} + \sum_{k=1}^{K}\beta_k g^{(k)}_t + \Sigma_\lambda f_t.$$
For identification one may impose $L(L+1)/2$ restrictions so that, without loss of generality, $\Sigma_\lambda = I_L$. Thus, one can express $F$ in terms of other parameters as follows
$$F = (L_T' - \alpha I_T)G + e_T g_T' - \sum_{k=1}^{K}\beta_k G_k.$$
Here $L_T$ is the usual lag matrix, while the additional parameter $g_T$ is introduced to take into account the fact that in the original set of moment conditions $g_T = E[\lambda_i y_{i,T}]$ does not appear as a parameter. Robertson and Sarafidis (2015) label the estimator that takes into account the restrictions imposed on $F$ as the FIVR (Factor IV Restricted) estimator.
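The FIVR restriction can be checked by simulation in the simplest case. The sketch below assumes no regressors ($K = 0$), one factor ($L = 1$) with $\mathrm{Var}(\lambda_i) = 1$, Gaussian errors and an ad hoc initial condition; `fivr_check` is a hypothetical name. It replaces $G$ and the extra parameter (written `g_last` for $E[\lambda_i y_{i,T}]$) by sample averages over a large cross-section and compares $(L_T' - \alpha I_T)G + e_T g_T'$ with the true factors.

```python
import numpy as np

def fivr_check(N=200_000, T=4, alpha=0.5, seed=1):
    """Return (true f, implied F) for y_{i,t} = alpha*y_{i,t-1} + lambda_i f_t + eps_{i,t}."""
    rng = np.random.default_rng(seed)
    f = np.array([1.0, 0.5, -0.8, 1.2])[:T]
    lam = rng.standard_normal(N)                      # Var(lambda_i) = 1 normalization
    y = np.empty((N, T + 1))
    y[:, 0] = 0.5 * lam + rng.standard_normal(N)      # ad hoc initial condition
    for t in range(1, T + 1):
        y[:, t] = alpha * y[:, t - 1] + lam * f[t - 1] + rng.standard_normal(N)
    G = y[:, :T].T @ lam / N          # rows estimate E[y_{i,t-1} lambda_i], t = 1..T
    g_last = y[:, T] @ lam / N        # estimates E[y_{i,T} lambda_i]
    L_T = np.eye(T, k=-1)
    e_T = np.zeros(T)
    e_T[-1] = 1.0
    F_implied = (L_T.T - alpha * np.eye(T)) @ G + e_T * g_last
    return f, F_implied

f, F_implied = fivr_check()
print(np.round(F_implied, 2))   # close to f = (1.0, 0.5, -0.8, 1.2) up to sampling error
```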
Robertson and Sarafidis (2015) show that FIVR is asymptotically more efficient than
FIVU and consequently than procedures involving some form of differencing. Further-
more, the restrictions imposed on a subset of the nuisance parameters appear to provide
substantial efficiency gains in finite samples.
⁴ For further details see Theorem 3 in the corresponding paper.
Counting the total number of moment conditions and parameters, we have
$$\#\text{moments} = \frac{(K+1)T(T+1)}{2};$$
$$\#\text{parameters} = (K+1)(1 + TL) + L - \frac{(K+1)L(L-1)}{2}.$$
Remark 4.7. Note that in the model without any regressors (or if regressors are strictly exogenous), the $(K+1)L(L-1)/2$ term reduces to $L(L-1)/2$. Together with the $L(L+1)/2$ restrictions imposed on $\Sigma_\lambda$, one then has $L^2$ restrictions in total (which is the standard number of restrictions usually imposed for factor models).
4.3.4 Linearized QLD GMM
Hayakawa (2012) proposes a linearized GMM version of the QLD model in Ahn et al.
(2013) under strict exogeneity, at the expense of introducing extra parameters. The
moment conditions can be written as follows
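$$m_\alpha = \mathrm{vech}\Big(\frac{1}{N}J^{(L)}\Big(Y - \alpha Y_{-1} - \sum_{k=1}^{K}\beta_k X_k\Big)'Y_{-1}J^{(L)\prime}\Big) + \mathrm{vech}\Big(\frac{1}{N}\Big(Y\bar{J}^{(L)\prime}F^{*\prime} - Y_{-1}\bar{J}^{(L)\prime}F^{*\prime}_{\alpha} - \sum_{k=1}^{K}X_k\bar{J}^{(L)\prime}F^{*\prime}_{\beta_k}\Big)'Y_{-1}J^{(L)\prime}\Big);$$
$$m_k = \mathrm{vech}\Big(\frac{1}{N}J^{(L)}\Big(Y - \alpha Y_{-1} - \sum_{k=1}^{K}\beta_k X_k\Big)'X_kJ^{(L)\prime}\Big) + \mathrm{vech}\Big(\frac{1}{N}\Big(Y\bar{J}^{(L)\prime}F^{*\prime} - Y_{-1}\bar{J}^{(L)\prime}F^{*\prime}_{\alpha} - \sum_{k=1}^{K}X_k\bar{J}^{(L)\prime}F^{*\prime}_{\beta_k}\Big)'X_kJ^{(L)\prime}\Big) \quad \forall k.$$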
The parameters $F^*_\alpha$, $F^*_{\beta_k}$ do not appear in the estimator of Ahn et al. (2013). The latter can be obtained directly by noting that
$$F^*_\alpha = \alpha F^*; \qquad F^*_{\beta_k} = \beta_k F^*.$$
The linearized estimator is linear in parameters and thus computationally easy to implement. On the other hand, this simplicity is not without a price, as this estimator is not as efficient as the estimator in Ahn et al. (2013). In total, under strict exogeneity of all $x^{(k)}_{i,t}$ we have
$$\#\text{moments} = \frac{(T-L)(T-L+1)}{2} + KT(T-L);$$
$$\#\text{parameters} = \underbrace{K + 1 + (T-L)L}_{\text{ALS}} + \underbrace{(T-L)L(K+1)}_{\text{Linearization}} - \frac{L(L-1)}{2}.$$
Notice that the last term in the equation for the total number of parameters is not present in the original study of Hayakawa (2012). To explain the necessity of this term, consider the $(T-L)$th equation (for ease of exposition we set $L = 2$) without exogenous regressors
$$y_{i,T-2} - f^{(1)}_{T-2}y_{i,T} - f^{(2)}_{T-2}y_{i,T-1} = \alpha y_{i,T-3} + f^{(1)}_{\alpha,T-2}y_{i,T-1} + f^{(2)}_{\alpha,T-2}y_{i,T-2} + \varepsilon_{i,T-2} - f^{(1)}_{T-2}\varepsilon_{i,T} - f^{(2)}_{T-2}\varepsilon_{i,T-1}.$$
Clearly only $f^{(2)}_{T-2} + f^{(1)}_{\alpha,T-2}$ can be identified, but not the individual terms separately. As a result, $L(L-1)/2$ normalizations need to be imposed. Furthermore, as can easily be seen, this term is unaltered if additional regressors are present in the model, so long as they do not contain other lags of $y_{i,t}$ or lags of exogenous regressors.
If regressors are only weakly exogenous, one has to use the Linearized QLD GMM esti-
mator with care. For simplicity consider only the case with a single weakly exogenous
regressor. Observe that we can rewrite the first equation of the transformed model as
$$y_{i,1} + \sum_{l=1}^{L} f^{(l)}_1 y_{i,T-l} = \alpha y_{i,0} + \beta x_{i,1} + \sum_{l=1}^{L} f^{(l)}_{\alpha,1}y_{i,T-l-1} + \sum_{l=1}^{L} f^{(l)}_{\beta,1}x_{i,T-l} + \dots \tag{4.7}$$
This equation contains $2 + 3L$ unknown parameters, with only two available moment conditions (assuming $x_{i,0}$ is not observed, otherwise three). Hence the full set of parameters in this equation cannot be identified without further normalizations. It then follows that the minimum value of $T$ required in order to identify the structural parameters of interest is such that (for simplicity assume $L = 1$)
$$2(T-1) = 2 + 3 \implies \min T = 1 + \lceil 2.5\rceil = 4,$$
where $\lceil x\rceil$ is the smallest integer not less than $x$ (the “ceiling” function). For more general models with $K > 1$, the condition $\min T = 4$ continues to hold as
$$(K+1)(T-1) \ge (K+2) + (K+1) \implies \min T = 1 + \Big\lceil\frac{2K+3}{K+1}\Big\rceil = 4.$$
Notice that for the non-linear estimator min T = 3 in the single-factor case. As a
result, for L = 1 under weak exogeneity the number of identifiable parameters and
moment conditions is given by
$$\#\text{moments} = \frac{(K+1)(T-L)(T-L+1)}{2} - (K+1);$$
$$\#\text{parameters} = \underbrace{K + 1 + (T-L)L}_{\text{ALS}} + \underbrace{(T-L)L(K+1)}_{\text{Linearization}} - \frac{L(L-1)}{2} - (K+2),$$
where −(K+1) and −(K+2) adjustments are made to take into account the fact that
for t = 1 there are (K + 2) nuisance parameters to be estimated with (K + 1) available
moment conditions. Both expressions can be similarly modified for L > 1.
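The minimum-$T$ arithmetic can be cross-checked against the counting formulas. The sketch below (hypothetical names, $L = 1$ only, as in the text) computes $\min T$ from the ceiling expression and, separately, the smallest $T$ for which the number of moment conditions is at least the number of identifiable parameters. The latter is only a necessary order condition, but for this estimator the two agree.

```python
import math

def min_T_ceiling(K):
    """min T = 1 + ceil((2K + 3) / (K + 1)) from the first-equation argument (L = 1)."""
    return 1 + math.ceil((2 * K + 3) / (K + 1))

def linearized_qld_weak_counts(T, K):
    """(#moments, #identifiable parameters), linearized QLD, weak exogeneity, L = 1."""
    L = 1
    moments = (K + 1) * (T - L) * (T - L + 1) // 2 - (K + 1)
    params = (K + 1) + (T - L) * L + (T - L) * L * (K + 1) - L * (L - 1) // 2 - (K + 2)
    return moments, params

def min_T_order(K):
    """Smallest T with #moments >= #parameters (order condition only)."""
    T = 2
    while True:
        moments, params = linearized_qld_weak_counts(T, K)
        if moments >= params:
            return T
        T += 1

for K in range(0, 6):
    print(K, min_T_ceiling(K), min_T_order(K))   # both columns equal 4
```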
Remark 4.8. Although not discussed in Hayakawa (2012), the same linearisation strat-
egy for the QD estimator of Holtz-Eakin, Newey, and Rosen (1988) is also feasible.
4.3.5 Projection GMM
Following Bai (2013b),⁵ Hayakawa (2012) suggests approximating $\lambda_i$ using a Mundlak (1978)-Chamberlain (1982) type projection of the following form
$$\lambda_i = \Phi z_i + \nu_i,$$
where $z_i = (1, x^{(1)\prime}_i, \dots, x^{(K)\prime}_i, y_{i,0})'$. Notice that by definition of the projection $E[\nu_i z_i'] = O_{L\times(TK+2)}$. As a result, the stacked model for individual $i$ can be written as
$$y_i = \alpha y_{i,-1} + \sum_{k=1}^{K}\beta_k x^{(k)}_i + F\Phi z_i + F\nu_i + \varepsilon_i. \tag{4.8}$$
⁵ Note that the first version of this paper dates back to 2009.
While Bai (2013b) proposes maximum likelihood estimation of the above model, Hayakawa (2012) advocates a GMM estimator; in our standard notation the total set of moment conditions used by Hayakawa (2012) is given by
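$$m_\alpha = \frac{1}{N}\Big(Y - \alpha Y_{-1} - \sum_{k=1}^{K}\beta_k X_k - Z\Phi'F'\Big)'Y_{-1}e_1;$$
$$m_\iota = \frac{1}{N}\Big(Y - \alpha Y_{-1} - \sum_{k=1}^{K}\beta_k X_k - Z\Phi'F'\Big)'\iota_N;$$
$$m_k = \mathrm{vech}\Big(\frac{1}{N}\Big(Y - \alpha Y_{-1} - \sum_{k=1}^{K}\beta_k X_k - Z\Phi'F'\Big)'X_k\Big), \quad \forall k.$$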
Assuming weak exogeneity of the covariates one has
$$\#\text{moments} = 2T + \frac{KT(T+1)}{2};$$
$$\#\text{parameters} = \underbrace{(K+1) + (T-L)L}_{\text{ALS}} + \underbrace{L(TK+2)}_{\text{Projection}}.$$
Similarly to the FIVU estimator of Robertson and Sarafidis (2015), the number of
identifiable parameters is smaller than the nominal one and depends on the projected
variables zi.
One can further relate the Projection estimator to the FIVU estimator. To better understand the connection between the two estimators, following Bond and Windmeijer (2002), we consider a more general projection specification of the following form
$$\lambda_i = \Phi z_i + \nu_i,$$
where $z_i = (x^{(1)\prime}_i, \dots, x^{(K)\prime}_i, y_{i,-1}')'$. The true value of $\Phi$ has the usual expression for the projection coefficient
$$\Phi_0 \equiv E[\lambda_i z_i']\,E[z_i z_i']^{-1}.$$
In the notation of Robertson and Sarafidis (2015), the first term is simply
$$E[\lambda_i z_i'] = (G_1', \dots, G_K', G'). \tag{4.9}$$
This estimator coincides asymptotically with the FIVU estimator of Robertson and
Sarafidis (2015), as well as with the QLD GMM estimator of Ahn et al. (2013) and the QD
estimator of Holtz-Eakin et al. (1988) (for L = 1) if all T (T + 1)(K + 1)/2 moment
conditions are used. A proof for the equivalence between FIVU, QLD and QD GMM
estimators is given in Robertson and Sarafidis (2015).
4.3.6 Linear GMM
In their discussion of the test for cross-sectional dependence, Sarafidis et al. (2009)
observe that if one can assume
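$$x_{i,t} = \Pi(x_{i,t-1},\dots,x_{i,0}) + \Gamma_{xi}f_t + \pi(\varepsilon_{i,t-1},\dots,\varepsilon_{i,0}) + \varepsilon_{xi,t}, \tag{4.10}$$
where $\Pi(\cdot)$ and $\pi(\cdot)$ are measurable functions, and the stochastic components are such that
$$E[\varepsilon_{xi,s}\varepsilon_{i,l}] = 0_K,\ \forall s,l; \qquad E[\mathrm{vec}(\Gamma_{xi})\lambda_i'] = O_{KL\times L},$$
then the following moment conditions are valid even in the presence of unobserved factors in both equations for $y_{i,t}$ and $x_{i,t}$
$$E[(y_{i,t} - \alpha y_{i,t-1} - \beta'x_{i,t})\Delta x_{i,s}] = 0, \quad \forall s \le t;$$
$$E[(\Delta y_{i,t} - \alpha\Delta y_{i,t-1} - \beta'\Delta x_{i,t})x_{i,s}] = 0, \quad \forall s \le t-1.$$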
The total number of valid (non-redundant) moment conditions is given by
$$\#\text{moments} = K\Big(\frac{(T-1)T}{2} + (T-1)\Big),$$
if one does not include xi,0 and ∆xi,1 among the instruments. Under mean stationarity
additional moment conditions become available in the equations in levels, giving rise
to a system GMM estimator.
Identification of the structural parameters crucially depends on the condition that no lagged values of $y_{i,t}$ are present in (4.10), as well as on the assumption that the factor loadings of the $y$ and $x$ processes are uncorrelated. However, it is important to stress that all exogenous regressors are allowed to be weakly exogenous due to the possibly non-zero $\pi(\cdot)$ function, or even endogenous provided that $\varepsilon_{i,t}$ is serially uncorrelated.
4.3.7 Projection Quasi ML Estimator
To control for the correlation between the strictly exogenous regressors and the initial condition with factor loadings $\lambda_i$, Bai (2013b), similarly to the GMM estimator proposed in Hayakawa (2012), considers the linear projection of the following form
$$\lambda_i = \Phi z_i + \nu_i, \qquad E[\nu_i\nu_i'] = \Sigma_\nu.$$
However, instead of relying on covariances as in the GMM framework, the Quasi ML approach makes use of the following second moment estimator
$$S(\theta) = \frac{1}{N}\Big(Y - \alpha Y_{-1} - \sum_{k=1}^{K}\beta_k X_k - Z\Phi'F'\Big)'\Big(Y - \alpha Y_{-1} - \sum_{k=1}^{K}\beta_k X_k - Z\Phi'F'\Big),$$
where $\theta = (\alpha, \beta', \sigma^2, \mathrm{vec}\,F', \mathrm{vec}\,\Phi')'$. Evaluated at the true values of the parameters, the expected value of $S(\theta_0)$ is
$$E[S(\theta_0)] = \Sigma = I_T\sigma^2 + F\Sigma_\nu F'.$$
To solve the rotational indeterminacy problem, one can, similarly to the FIVR estimator of Robertson and Sarafidis (2015), normalize $\Sigma_\nu = I_L$ and redefine $F \equiv F\Sigma_\nu^{1/2}$ and $\Phi \equiv \Sigma_\nu^{-1/2}\Phi$. To evaluate the distance between $S(\theta)$ and $\Sigma$, Bai (2013b)⁶ suggests maximising the following Quasi Maximum Likelihood (QML) objective function to obtain consistent estimates of the underlying parameters
$$\ell(\theta) = -\frac{1}{2}\Big(\log|\Sigma| + \mathrm{tr}\big(\Sigma^{-1}S\big)\Big).$$
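A minimal sketch of the QML building blocks, assuming the normalization $\Sigma_\nu = I_L$ so that $\Sigma = \sigma^2 I_T + FF'$. The helper names are hypothetical, `Z` is the $[N\times \dim(z_i)]$ matrix with rows $z_i'$, and `Phi` is $[L\times \dim(z_i)]$. Only the objective is evaluated here; a numerical optimizer over $\theta$ would still be needed to obtain the estimator.

```python
import numpy as np

def implied_sigma(sigma2, F):
    """Sigma = sigma^2 I_T + F F' under the normalization Sigma_nu = I_L."""
    return sigma2 * np.eye(F.shape[0]) + F @ F.T

def residual_second_moment(Y, Ylag, Xs, Z, alpha, betas, F, Phi):
    """S(theta) = U'U / N with U = Y - alpha*Y_{-1} - sum_k beta_k X_k - Z Phi' F'."""
    U = Y - alpha * Ylag - Z @ Phi.T @ F.T
    for beta_k, X_k in zip(betas, Xs):
        U = U - beta_k * X_k
    return U.T @ U / Y.shape[0]

def qml_objective(Sigma, S):
    """l = -0.5 * (log|Sigma| + tr(Sigma^{-1} S))."""
    sign, logdet = np.linalg.slogdet(Sigma)
    if sign <= 0:
        raise ValueError("Sigma must be positive definite")
    return -0.5 * (logdet + np.trace(np.linalg.solve(Sigma, S)))

F = np.array([[1.0], [0.5], [-0.3], [0.8]])
Sigma = implied_sigma(1.0, F)
# for a fixed S, the objective is largest when Sigma(theta) matches S
print(qml_objective(Sigma, Sigma) > qml_objective(implied_sigma(2.0, F), Sigma))  # True
```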
Under standard regularity conditions for M-estimators the estimator obtained as the
maximizer of the objective function `(θ) is consistent and asymptotically normal for
fixed T , with asymptotic variance-covariance matrix of “sandwich” form irrespective of
the distributional assumptions imposed on the combined error term $\varepsilon_{i,t} + \nu_i'f_t$. If one can
replace the projection assumption by the assumption of conditional expectations, the
resulting estimator can be seen as a Quasi Maximum Likelihood estimator conditional
on exogenous regressors Xk and initial observation yi,0.
⁶ Strictly speaking, in the aforementioned paper the author solely describes the approach in terms of the likelihood function, while in Bai (2013a) the author describes a QML objective function as just one possibility.
The theoretical and finite sample properties of this estimator without factors are dis-
cussed in Alvarez and Arellano (2003), Kruiniger (2013) and Bun et al. (2015) among
others, while Westerlund and Norkute (2014) discuss the properties of this estimator
for possibly non-stationary data with large T .
The above version of the estimator requires time series homoscedasticity in εi,t for
consistency. If this condition holds true and all covariates are strictly exogenous, the
estimator provides efficiency gains over the GMM estimators analyzed before since the
latter do not make use of moment conditions that exploit homoscedasticity (see e.g.
Ahn et al. (2001)). The estimator can be modified in a straightforward manner under
time series heteroscedasticity to estimate all $\sigma^2_t$. On the other hand, cross-sectional
heteroscedasticity cannot be allowed without additional restrictions.
Furthermore, the estimator generally requires τ = T in Assumption 4, i.e. strict
exogeneity of the regressors. An exception to this is discussed in the following remark.
Remark 4.9. If it is plausible to assume that all exogenous regressors have the following dynamic specification
$$x^{(k)}_{i,t} = \beta_x x^{(k)}_{i,t-1} + \alpha_x y_{i,t-1} + f_t'\lambda^{x(k)}_i + \varepsilon_{xi,t}, \tag{4.11}$$
so that $x^{(k)}_{i,t}$ is possibly weakly exogenous, then according to Bai (2013b) it is sufficient to project on $(1, x^{(1)}_{i,0}, \dots, x^{(K)}_{i,0}, y_{i,0})$ only, resulting in a more efficient estimator. A necessary condition for this approach to be valid is that the factor loadings $(\lambda^{x(k)}_i, \lambda_i)$ are independent, once conditioned on the initial observations $(1, x^{(1)}_{i,0}, \dots, x^{(K)}_{i,0}, y_{i,0})$.
4.4 Some general remarks on the estimators
4.4.1 (Non-)Invariance to factor loadings
In situations where the model contains fixed effects only, i.e. $\lambda_i' f_t = \lambda_i$, some of the
classical panel data estimators can be location invariant with respect to the individual
effects. For example, under mean stationarity of the initial condition, the GMM estimators
of Anderson and Hsiao (1982) (with instruments in first differences) and Hayakawa
(2009b), or the Transformed ML estimators as in Hsiao et al. (2002), Kruiniger (2013)
and Juodis (2014b), are invariant to the distribution of the fixed effects $\lambda_i$. In general,
irrespective of the properties of $y_{i,0}$, none of the estimators presented in this paper is
invariant to $\lambda_i' f_t$ for fixed T. For GMM estimators, invariance would require knowledge
of the whole history $\{f_t\}_{t=-\infty}^{T}$ in order to construct instruments that are invariant to
$\lambda_i$. This conclusion holds both for estimators that involve some form of differencing
(QD, QLD) and for those based on projection (FIVU, Projection GMM).
4.4.2 Unbalanced samples
As mentioned in, e.g., Juodis (2015), for the quasi-long-differencing transformation
of Ahn et al. (2013) in the model with weakly exogenous regressors it is necessary
that the last L observations are available to the researcher for all individuals.
Otherwise the $D(F^*)$ transformation matrix may become group-specific, if observations
can be grouped by availability.
To see this in more detail, consider Equation (4.6). As it stands, the quasi-long-
differencing transformation that removes the incidental parameters from the error is
feasible for individual i only if the last L periods are available. Otherwise, these
individuals may either be dropped altogether or be grouped such that it becomes
possible to normalize on different T − L periods. Either way, the estimator may suffer
a substantial loss in efficiency as a result of removing observations or splitting
the sample. On the other hand, if it is plausible to assume that the model contains
only strictly exogenous regressors, then it is sufficient that there exist L common time
indices t(1), . . . , t(L) where observations for all individuals are available.
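The two availability requirements above can be checked mechanically. The Python sketch below is purely illustrative. It assumes a hypothetical N × T boolean availability matrix `observed` (rows are individuals, columns are periods) and checks (i) which individuals have their last L periods observed, as needed for quasi-long-differencing with weakly exogenous regressors, and (ii) whether at least L time indices are observed for every individual, which is sufficient under strict exogeneity. The helper names are our own.

```python
import numpy as np


def last_L_available(observed: np.ndarray, L: int) -> np.ndarray:
    """True for each individual whose last L periods are all observed.

    `observed` is a hypothetical N x T boolean matrix: rows are individuals,
    columns are the periods 1, ..., T.
    """
    return observed[:, -L:].all(axis=1)


def common_time_indices(observed: np.ndarray, L: int):
    """Return L (0-based) period indices observed for every individual, or None."""
    common = np.flatnonzero(observed.all(axis=0))
    return common[:L] if common.size >= L else None


# Three individuals observed over T = 4 periods (True = available).
observed = np.array([[1, 1, 1, 1],
                     [1, 1, 1, 0],
                     [0, 1, 1, 1]], dtype=bool)
keep = last_L_available(observed, L=1)      # individual 2 lacks period T
print(keep)
print(common_time_indices(observed, L=2))   # 0-based indices 1, 2 (periods 2 and 3)
```

The first check keeps only individuals with complete trailing data. The second is the weaker requirement that suffices when all regressors are strictly exogenous.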
The extension of FIVU and FIVR to unbalanced samples is straightforward: one simply
introduces indicators for whether a particular moment condition is available
for individual i (as for the standard fixed effects estimator).
The QD GMM estimator of Nauges and Thomas (2003) can be modified just as easily,
as in the standard Arellano and Bond (1991) procedure. However, as with
that procedure, this transformation may result in dropping a considerable number of observations.
The projection estimator of Hayakawa (2012) requires further modifications in order to
take into account that projection variables zi are not fully observed for each individual.
We conjecture that the modification could be performed in a similar way as in the model
without a factor structure, as discussed by Abrevaya (2013). For estimators based on
maximum likelihood, such an extension appears to be more challenging.
Remark 4.10. The above discussion relies on the existence of a sufficiently large number
of consecutive time periods for each individual in the sample. For example, FIVU
requires at least two consecutive periods and quasi-differencing type procedures require
at least three. Under these circumstances, we note that the estimators in their existing
form may not be fully efficient. For example, if one observes only $y_{i,T}$ and $y_{i,T-2}$ for
a substantial group of individuals, and exogenous covariates are available in all
time periods, then one could use backward substitution and consider moment conditions
within the FIVU framework that are quadratic in the autoregressive parameter and
result in efficiency gains. For projection type methodologies, however, such substantial
unbalancedness may affect the consistency of the estimators, as one cannot substitute
zeros for unobserved quantities in the projection term. This issue is discussed in detail
by Abrevaya (2013).
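To illustrate the backward-substitution argument in Remark 4.10, take K = 1 as in the simulation design of Section 4.5, so that $y_{i,t} = \alpha y_{i,t-1} + \beta x_{i,t} + u_{i,t}$. Suppose $y_{i,T-1}$ is missing but $y_{i,T}$, $y_{i,T-2}$ and the covariates are observed. Substituting the equation for $y_{i,T-1}$ gives the following. This is a sketch of the algebra only, not the exact set of moment conditions an estimator would use.

```latex
\begin{aligned}
y_{i,T} &= \alpha y_{i,T-1} + \beta x_{i,T} + u_{i,T} \\
        &= \alpha\left(\alpha y_{i,T-2} + \beta x_{i,T-1} + u_{i,T-1}\right) + \beta x_{i,T} + u_{i,T} \\
        &= \alpha^{2} y_{i,T-2} + \beta\left(x_{i,T} + \alpha x_{i,T-1}\right) + \left(u_{i,T} + \alpha u_{i,T-1}\right).
\end{aligned}
```

Moment conditions built on the composite error $u_{i,T} + \alpha u_{i,T-1}$ are therefore quadratic in α, as noted in the remark.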
4.4.3 Observed factors
In some situations one might want to estimate models with both observed and un-
observed factors at the same time. Taking the structure of observed factors into ac-
count may improve the efficiency of the estimators, although one can still consistently
estimate the model by treating the observed factors as unobserved. One such possi-
bility has been already discussed in Nauges and Thomas (2003) for models with an
individual-specific, time-invariant effect. In this section we briefly summarize
implementation issues for all estimators when observed factors are present in the model
alongside their unobserved counterparts.7
For the GMM estimators that involve some form of differencing, e.g. Holtz-Eakin et al.
(1988) and Ahn et al. (2013), one can deal with observed factors using a similar proce-
dure as in Nauges and Thomas (2003), that is, by removing the observed factors first
(one-by-one) and then proceeding to remove the unobserved factors from the model.
The first step can be most easily implemented using a quasi-differencing matrix D(r)
with known weights.
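As a minimal sketch of this first step, assume a single observed factor $r_t$ that is non-zero in every period (an assumption made for illustration). One possible quasi-differencing matrix with known weights maps a T-vector $v$ into $v_t - (r_t/r_{t-1})v_{t-1}$ for $t = 2, \ldots, T$. Then $D(r)r = 0$, so any term $\lambda_i r_t$ is removed. The helper name `quasi_diff_matrix` is hypothetical. With several observed factors, this step would be repeated one by one, as described above.

```python
import numpy as np


def quasi_diff_matrix(r: np.ndarray) -> np.ndarray:
    """(T-1) x T matrix; row t-1 maps v to v_t - (r_t / r_{t-1}) v_{t-1}.

    Assumes a single observed factor r that is non-zero in every period.
    """
    r = np.asarray(r, dtype=float)
    T = r.size
    D = np.zeros((T - 1, T))
    for t in range(1, T):
        D[t - 1, t - 1] = -r[t] / r[t - 1]  # known weight on the lagged period
        D[t - 1, t] = 1.0
    return D


rng = np.random.default_rng(0)
r = np.array([1.0, 0.5, -2.0, 1.5])         # hypothetical observed factor
noise = rng.normal(size=r.size)
u = 0.7 * r + noise                         # loading 0.7 times r_t, plus noise
D = quasi_diff_matrix(r)
print(np.allclose(D @ r, 0.0))              # the observed-factor term is removed
print(np.allclose(D @ u, D @ noise))        # only the transformed noise remains
```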
For the GMM estimators of Robertson and Sarafidis (2015) (FIVU) and Hayakawa
(2012), since the unobserved factors are not removed from the model, the treatment
of the observed factors is somewhat easier. One merely needs to split the FG′ terms
into two parts, observed and unobserved factors, and then proceed as in the case of
unobserved factors. In this case the number of identified parameters will be smaller
7Under the assumption that appropriate regularity conditions hold, which rule out asymptotic collinearity between the observed and unobserved factors.
than in the case where one treats the observed factors as unobserved. As a result, one
gains efficiency, at the expense, however, of robustness.
For FIVR one needs to take care when solving for F in terms of the remaining parame-
ters, because in the model with observed factors one estimates the variance-covariance
matrix of the factor loadings for the observed factors, while for those which are unob-
served their variance-covariance matrix is normalized.
The extension of the likelihood estimator of Bai (2013b) to observed factors can be
implemented in a similar way to the projection GMM estimator. As in FIVR, one
would have to estimate the variance-covariance matrix of the factor loadings for the
observed factors, while the covariances of unobserved factors can be w.l.o.g. normalized
as before.
4.5 Simulation study
This section investigates the finite sample performance of the estimators analyzed above
using simulated data. Our focus lies on examining the effect of the presence of weakly
exogenous covariates, the effect of changing the magnitude of the correlation between
the factor loadings of the dependent variable and those of the covariates, as well as
the impact of changing the number of moment conditions on bias and size for GMM
estimators. We also investigate the effect of changing the level of persistence in the
data, as well as the sample size in terms of both N and T .
4.5.1 Setup and designs
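We consider model (4.1) with K = 1, i.e.
$$y_{i,t} = \alpha y_{i,t-1} + \beta x_{i,t} + u_{i,t}; \qquad u_{i,t} = \sum_{\ell=1}^{L} \lambda_{\ell,i} f_{\ell,t} + \varepsilon^y_{i,t}.$$
The processes for $x_{i,t}$ and for $f_t$ are given, respectively, by
$$x_{i,t} = \delta y_{i,t-1} + \alpha_x x_{i,t-1} + \sum_{\ell=1}^{L} \gamma_{\ell,i} f_{\ell,t} + \varepsilon^x_{i,t};$$
$$f_{\ell,t} = \alpha_f f_{\ell,t-1} + \sqrt{1-\alpha_f^2}\,\varepsilon^f_{\ell,t}; \qquad \varepsilon^f_{\ell,t} \sim N(0,1), \ \forall \ell.$$
The factor loadings are generated by $\lambda_{\ell,i} \sim N(0,1)$ and
$$\gamma_{\ell,i} = \rho\lambda_{\ell,i} + \sqrt{1-\rho^2}\,\upsilon^f_{\ell,i}; \qquad \upsilon^f_{\ell,i} \sim N(0,1), \ \forall \ell,$$
where ρ denotes the correlation between the factor loadings of the y and x processes.
Furthermore, the idiosyncratic errors are generated as8
$$\varepsilon^y_{i,t} \sim N(0,1); \qquad \varepsilon^x_{i,t} \sim N(0, \sigma^2_x).$$
The starting period for the model is t = −S and the initial observations are generated
as
$$y_{i,-S} = \sum_{\ell=1}^{L} \lambda_{\ell,i} f_{\ell,-S} + \varepsilon^y_{i,-S}; \qquad x_{i,-S} = \sum_{\ell=1}^{L} \gamma_{\ell,i} f_{\ell,-S} + \varepsilon^x_{i,-S}; \qquad f_{-S} \sim N(0,1).$$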
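The data-generating process above can be sketched in Python for the single-factor case L = 1. This is our own minimal sketch, not the code used to produce the tables. The function name `simulate_panel` and the choice to return only periods t = 0, ..., T (column 0 holding $y_{i,0}$, $x_{i,0}$) are assumptions. $\sigma^2_x$ is taken as an input; its calibration is described in the text that follows.

```python
import numpy as np


def simulate_panel(N, T, alpha, rho, delta, sigma2_x,
                   alpha_x=0.6, alpha_f=0.5, S=5, rng=None):
    """Simulate (y, x, f) for the single-factor (L = 1) design.

    Periods run from t = -S to t = T; the arrays returned cover t = 0, ..., T.
    """
    rng = np.random.default_rng(rng)
    beta = 1.0 - alpha                       # long-run parameter equal to 1
    n_per = T + S + 1                        # periods -S, ..., T
    # Loadings with corr(lambda_i, gamma_i) = rho.
    lam = rng.normal(size=N)
    gam = rho * lam + np.sqrt(1.0 - rho**2) * rng.normal(size=N)
    # Unit-variance AR(1) factor started at f_{-S} ~ N(0, 1).
    f = np.empty(n_per)
    f[0] = rng.normal()
    for s in range(1, n_per):
        f[s] = alpha_f * f[s - 1] + np.sqrt(1.0 - alpha_f**2) * rng.normal()
    sd_x = np.sqrt(sigma2_x)
    y = np.empty((N, n_per))
    x = np.empty((N, n_per))
    y[:, 0] = lam * f[0] + rng.normal(size=N)
    x[:, 0] = gam * f[0] + sd_x * rng.normal(size=N)
    for s in range(1, n_per):
        x[:, s] = (delta * y[:, s - 1] + alpha_x * x[:, s - 1]
                   + gam * f[s] + sd_x * rng.normal(size=N))
        y[:, s] = (alpha * y[:, s - 1] + beta * x[:, s]
                   + lam * f[s] + rng.normal(size=N))
    return y[:, S:], x[:, S:], f[S:]


# sigma2_x = 1.0 is a placeholder here, not the calibrated value.
y, x, f = simulate_panel(N=200, T=4, alpha=0.4, rho=0.6, delta=0.3,
                         sigma2_x=1.0, rng=1)
print(y.shape)  # (200, 5)
```

Array index s corresponds to period t = s − S, so slicing from S keeps the initial observation plus the T estimation periods.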
The signal-to-noise ratio of the model is defined as follows
$$\mathrm{SNR} \equiv \frac{1}{T}\sum_{t=1}^{T}\frac{\operatorname{var}\left(y_{i,t} \mid \lambda_{\ell,i}, \gamma_{\ell,i}, \{f_{\ell,s}\}_{s=-S}^{t}\right)}{\operatorname{var}\,\varepsilon^y_{i,t}} - 1.$$
$\sigma^2_x$ is set such that the signal-to-noise ratio is equal to SNR = 5 in all designs.9 This
particular value of SNR is chosen so that it is possible to hold this measure fixed across
all designs. Lower values of SNR (e.g. 3, as in Bun and Kiviet (2006)) would, ceteris paribus,
require $\sigma^2_x < 0$ in order to satisfy the desired equality in all designs.
We set β = 1 − α so that the long-run parameter β/(1 − α) is equal to 1, $\alpha_x = 0.6$, $\alpha_f = 0.5$ and
L = 1.10 We consider N ∈ {200, 800} and T ∈ {4, 8}. Furthermore, α ∈ {0.4, 0.8}, ρ ∈ {0, 0.6} and δ ∈ {0, 0.3}. The minimum number of replications performed equals
2,000 for each design, and the factors are drawn anew in each replication. The choice of
the initial values of the parameters for the nonlinear algorithms is discussed in Appendix 4.A.
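Conditional on the loadings and factors, $z_{i,t} = (y_{i,t}, x_{i,t})'$ follows a bivariate VAR(1) driven by $(\varepsilon^y_{i,t}, \varepsilon^x_{i,t})$. The conditional variance in the SNR definition can therefore be computed exactly by propagating a 2 × 2 covariance matrix forward from t = −S. Substituting the x-equation into the y-equation gives $y_{i,t} = (\alpha + \beta\delta)y_{i,t-1} + \beta\alpha_x x_{i,t-1} + \varepsilon^y_{i,t} + \beta\varepsilon^x_{i,t}$ plus terms that are fixed given the conditioning set, which yields the matrices A and B below.

This sketch reflects our own derivation, not the authors' code. The function names are hypothetical, and the defaults $\alpha_x = 0.6$ and S = 5 follow the text and footnote 9. Because the conditional variance is affine in $\sigma^2_x$, the calibration reduces to solving one linear equation. A negative solution corresponds to the situation described above for lower target values.

```python
import numpy as np


def snr(sigma2_x, alpha, delta, T, alpha_x=0.6, S=5):
    """SNR of the simulation design, conditional on loadings and factors.

    Given loadings and factors, z_t = (y_t, x_t)' obeys
    z_t = A z_{t-1} + B e_t + (constant), with e_t = (eps^y_t, eps^x_t)'.
    """
    beta = 1.0 - alpha
    A = np.array([[alpha + beta * delta, beta * alpha_x],
                  [delta, alpha_x]])
    B = np.array([[1.0, beta],
                  [0.0, 1.0]])
    Sigma = np.diag([1.0, sigma2_x])   # var(eps^y) = 1, var(eps^x) = sigma2_x
    V = Sigma.copy()                   # t = -S: only the idiosyncratic errors vary
    total = 0.0
    for t in range(-S + 1, T + 1):
        V = A @ V @ A.T + B @ Sigma @ B.T
        if t >= 1:
            total += V[0, 0]           # var(y_t | loadings, factors)
    return total / T - 1.0             # divided by var(eps^y) = 1


def calibrate_sigma2_x(target, alpha, delta, T, **kw):
    """Solve SNR(sigma2_x) = target, using that SNR is affine in sigma2_x."""
    a = snr(0.0, alpha, delta, T, **kw)
    b = snr(1.0, alpha, delta, T, **kw) - a
    return (target - a) / b


s2 = calibrate_sigma2_x(5.0, alpha=0.4, delta=0.0, T=4)
print(s2, snr(s2, alpha=0.4, delta=0.0, T=4))
```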
8We have also explored the effect of non-normal errors based on the chi-squared distribution (centered and normalized). The results were almost identical and therefore, to save space, we refrain from reporting them.
9To ensure this, we also set S = 5.
10Similar results have been obtained for L = 2. To avoid repeating similar conclusions we refrain from reporting these results. We note that the number of factors can be estimated for all GMM estimators based on the information criteria developed by Ahn et al. (2013). The performance of these procedures appears to be more than satisfactory; the interested reader may refer to the aforementioned paper, as well as to the Monte Carlo study in Robertson, Sarafidis, and Westerlund (2014). The number of factors L is treated as known in this paper because there is currently no equivalent methodology for testing the number of factors within the likelihood framework.
When at least one of the estimators fails to converge in a particular replication, that
replication is discarded.11
Note that for the QML estimator we use standard errors based on a “sandwich”
variance-covariance matrix, rather than on the simple inverse of the Hessian.
First order conditions as well as Hessian matrices for likelihood estimators are
obtained using analytical derivatives to speed up the computations.12
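A generic sketch of the sandwich form follows. With per-individual score contributions $s_i$ and Hessian $H$ of the (quasi) log-likelihood, $\hat V = H^{-1}\left(\sum_i s_i s_i'\right)H^{-1}$, whereas the simple alternative is $-H^{-1}$. In a panel, the score contributions are assumed here to be summed over t within each individual before forming the outer products. This is not the paper's code. To keep it checkable, we apply it to a Gaussian linear regression, where it reduces to the familiar heteroskedasticity-robust (HC0) covariance.

```python
import numpy as np


def sandwich_cov(scores: np.ndarray, hessian: np.ndarray) -> np.ndarray:
    """H^{-1} (sum_i s_i s_i') H^{-1} for an N x p score matrix and p x p Hessian."""
    H_inv = np.linalg.inv(hessian)
    meat = scores.T @ scores            # sum of outer products of the scores
    return H_inv @ meat @ H_inv.T


# Check on OLS: the log-likelihood -0.5 * sum (y_i - x_i'b)^2 has
# score contributions x_i * e_i and Hessian -X'X.
rng = np.random.default_rng(42)
N = 500
X = np.column_stack([np.ones(N), rng.normal(size=N)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=N) * (1.0 + np.abs(X[:, 1]))
b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b
V = sandwich_cov(X * e[:, None], -X.T @ X)
XtX_inv = np.linalg.inv(X.T @ X)
hc0 = XtX_inv @ (X.T @ (X * e[:, None] ** 2)) @ XtX_inv
print(np.allclose(V, hc0))
```

Unlike $-H^{-1}$, the sandwich form remains valid when the likelihood is misspecified in ways that leave the first-order conditions correct, which is the motivation for using it with a QML estimator.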
Although feasible, in this paper we do not implement the linearized GMM estimator
of Hayakawa (2012) adapted to weakly exogenous regressors. This is mainly because
this estimator merely provides an easy way to obtain starting values for
the remaining estimators, which involve non-linear optimization algorithms. Based on
our theoretical discussion of the estimators considered in this paper, some
implications of our Monte Carlo design can be stated a priori.
1. When δ ≠ 0, likelihood based estimators are inconsistent, with the exception of
the modified estimator of Bai (2013b) conditional on $(y_{i,0}, x_{i,0})$.
2. For ρ ≠ 0 the likelihood estimator conditional on $(y_{i,0}, x_{i,0})$ is inconsistent because
the conditional independence assumption is violated.
3. For α = 0.8, ρ = 0, δ = 0 the projection GMM estimator might suffer from weak
instruments because $y_{i,0}$ remains the only relevant instrument.
Remark 4.11. Although discussed in Section 4.3.1, the QD estimator
of Holtz-Eakin et al. (1988) is not included in the Monte Carlo study. As discussed
in Robertson et al. (2014), this estimator is asymptotically equivalent to the QLD
estimator of Ahn et al. (2013) (if $f_t \neq 0$ for all t = 1, ..., T), and thus its finite
sample results can be expected to be similar. Furthermore, given that the QD estimator
requires more stringent normalization restrictions than the QLD estimator, its finite
sample results may be even more sensitive to the DGP of $f_t$. Unreported
preliminary results, available from the authors upon request, confirm this conjecture.
11For the numerical maximization we used the BFGS method as implemented in the OxMetrics statistical software. Convergence is achieved when the difference in the value of the given objective function between two consecutive iterations is less than $10^{-4}$. Other values of this criterion were considered in the preliminary study with similar qualitative conclusions, although the number of times particular estimators fail to converge varies. For further details on OxMetrics see Doornik (2009).
12In the preliminary study, results based on analytical and numerical derivatives were compared. Since the results were quantitatively and qualitatively almost identical (for designs where the estimators were consistent), we prefer analytical derivatives solely for practical reasons.
4.5.2 Results
The results are reported in the Appendix in terms of median bias and root median
square error, which is defined as
$$\mathrm{RMSE} = \sqrt{\operatorname{med}\left[(\alpha_r - \alpha)^2\right]},$$
where $\alpha_r$ denotes the value of α obtained in the rth replication using a particular
estimator (and similarly for β). As an additional measure of dispersion we report the radius
of the interval centered on the median containing 80% of the observations, divided by
1.28 (for a normal distribution, the central 80% interval spans ±1.28 standard deviations).
This statistic, which we refer to as the “quasi-standard deviation” (denoted
qStd), provides an estimate of the population standard deviation if the distribution
were normal, with the advantage that it is more robust to outliers than the usual
sample standard deviation. We report this statistic because, on the one hand, the root
mean square error is extremely sensitive to outliers, while on the other hand the root
median square error is almost entirely insensitive to them. The former could therefore be
unduly misleading given that, in principle, for any given data set one could estimate the model
using a large set of different initial values in an attempt to avoid local minima or lack
of convergence (which we deal with in our experiments by discarding
the affected replications). In a large-scale simulation experiment such as ours, however,
the set of initial values naturally needs to be restricted in some sensible and feasible way.
The quasi-standard deviation lies in between: while it provides a measure of
dispersion that is less sensitive to outliers compared to the root mean square error,
it is still more informative about the variability of the estimators relative to the root
median square error. Finally, we report size, where nominal size is set at 5%. For
the GMM estimators we also report size of the overidentifying restrictions (“J”) test
statistic.
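The summary statistics just described can be computed from the R replications as follows. This sketch uses our own conventions. `estimates` and `std_errors` are hypothetical length-R arrays for one parameter, and size is taken as the rejection frequency of a two-sided t-test at the 5% level with critical value 1.96. qStd uses the 80% quantile of absolute deviations from the median, which is the radius of the median-centred interval containing 80% of the replications.

```python
import numpy as np


def mc_summary(estimates, std_errors, true_value, z_crit=1.96):
    """Median bias, root median square error, qStd and size for one parameter."""
    est = np.asarray(estimates, dtype=float)
    se = np.asarray(std_errors, dtype=float)
    med = np.median(est)
    bias = med - true_value                              # median bias
    rmse = np.sqrt(np.median((est - true_value) ** 2))   # root *median* square error
    radius = np.quantile(np.abs(est - med), 0.80)        # half-width of 80% interval
    qstd = radius / 1.28                                 # normal: +/- 1.28 sd covers 80%
    size = np.mean(np.abs((est - true_value) / se) > z_crit)  # rejection frequency
    return {"Bias": bias, "RMSE": rmse, "qStd": qstd, "Size": size}


rng = np.random.default_rng(0)
est = 0.4 + 0.05 * rng.normal(size=10_000)   # hypothetical replications
print(mc_summary(est, np.full(est.size, 0.05), true_value=0.4))
```

For normally distributed estimates, qStd approximately recovers the standard deviation, while the root median square error is about 0.67 times it. This illustrates why the two dispersion measures are not directly comparable.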
Initially we discuss results for the OLS estimator, the GMM estimator proposed by
Sarafidis, Yamagata, and Robertson (2009) and the linearized GMM estimator of
Hayakawa (2012); these estimators have been used to obtain initial values for the
parameters for the non-linear estimators, among other (random) choices. As we can
see in Table 4.1, in many circumstances the OLS estimator exhibits large median bias,
while the size of the estimator is most often not far from unity. On the other hand, the
linear GMM estimator proposed by Sarafidis, Yamagata, and Robertson (2009) does
fairly well both in terms of bias and RMSE when δ = 0 and ρ = 0, i.e. when the
covariate is strictly exogenous with respect to the total error term, ui,t. The size of
the estimator appears to be somewhat upwardly distorted, especially for T large, but
one expects that this would substantially improve if one made use of the finite-sample
correction proposed by Windmeijer (2005). On the other hand, the estimator is not
consistent for the remaining parameterizations of our design and this is well reflected
in its finite sample performance. Notably, the “J” statistic appears to have high power
to detect violations of the null, even if N is small.
With regard to the linearized GMM estimator of Hayakawa (2012), both median bias
and RMSE are reasonably small, even for N = 200, so long as δ = 0, i.e. under strict
exogeneity of x with respect to the idiosyncratic error. However, the estimator appears
to be quite sensitive to high values of α, both in terms of bias and qStd, an outcome
that may be partially related to the fact that the value of β is small in this case,
which implies that a many-weak instruments type problem might arise. Naturally, the
performance of the estimator deteriorates for δ = 0.3 as the moment conditions are
invalidated in this case. While the size of the “J” statistic appears to be distorted
upwards when the estimator is consistent, it has in general quite large power to detect
violations of strict exogeneity, and for high values of α this holds true even with a
relatively small N .
Tables 4.3 and 4.4 report results for the quasi-long-differenced GMM estimator pro-
posed by Ahn, Lee, and Schmidt (2013). The only difference between the two tables is
that Table 4.3 is based on the “pseudo-full” set of moment conditions, i.e. T (T − 1),
obtained by always treating xi,t as weakly exogenous, while Table 4.4 is based on the
4 most recent lags of the variables. In the latter case the number of instruments is of
order O(T ). This strategy is possible to implement only for T = 8, as for T = 4 there
are not enough degrees of freedom to identify the model when truncating the moment
conditions to such extent.13 The estimator appears to have small median bias under
all designs. This is expected given that the estimator is consistent. The qStd results
indicate that the estimator has large dispersion in some designs, especially when T is
small. We have explored further the underlying reason for this result. We found that
this is often the case when the value of the factor at the last time period, i.e. fT ,
is close to zero. Thus, the estimator appears to be potentially sensitive to this issue,
because the normalization scheme sets fT = 1.14 The two-step version improves on
these results. On the other hand, inferences based on one-step estimates seem to be
13To be more precise, the total number of moment conditions for the subset estimator is $q(2(T-1)+1-q)$, where in our case q = 4.
14Notice that imposing a different normalization, e.g. $f_{T-1} = 1$, would result in losing T moment conditions, as explained in the main text.
relatively more reliable. This outcome may be attributed to the standard argument
provided for linear GMM estimators, namely that two-step estimators rely on an
estimate of the variance-covariance matrix of the moment conditions, which, in samples
where N is small, can lead to downward-biased standard errors. Notice that a Windmeijer
(2005) type correction is not trivial here because the proposed expression applies to
linear estimators only. Truncating the moment conditions for T = 8 seems to have a
negligible effect on the size properties of the one-step estimator but improves size
for the two-step estimator quite substantially. This result in fact appears to hold for
all overidentified GMM estimators. The “J” statistic exhibits small upward size
distortions.
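The two instrument counts quoted for the quasi-long-differenced estimator can be tabulated to make the orders of magnitude concrete: T(T − 1) for the pseudo-full set, and q(2(T − 1) + 1 − q) for the subset with q = 4 lags (footnote 13). The sketch below simply evaluates these two formulas. It does not derive them, and the function names are our own.

```python
def n_moments_full(T: int) -> int:
    """Pseudo-full set of moment conditions: T(T - 1), i.e. O(T^2)."""
    return T * (T - 1)


def n_moments_subset(T: int, q: int = 4) -> int:
    """Subset using the q most recent lags: q(2(T - 1) + 1 - q), i.e. O(T)."""
    return q * (2 * (T - 1) + 1 - q)


for T in (8, 16, 32):
    print(T, n_moments_full(T), n_moments_subset(T))
```

For q = 4 the subset count simplifies to 8T − 20, so at T = 8 the two sets differ only modestly (56 versus 44). The gap widens quickly as T grows.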
Tables 4.5 to 4.8 report results for FIVU and FIVR based on either the full or the
truncated sets of moment conditions, proposed by Robertson and Sarafidis (2015).
Similarly to Ahn et al. (2013), both estimators have very small median bias in all
circumstances. Furthermore, they perform well in terms of qStd. Especially the two-
step versions have small dispersion regardless of the design. Naturally, the dispersion
decreases further with high values of T because the degree of overidentification of
the model increases. As expected, RMSE appears to go down roughly at the rate of $\sqrt{N}$. FIVR dominates FIVU, which is not surprising given that the former imposes
overidentifying restrictions arising from the structure of the model and thus it estimates
a smaller number of parameters. The size of one-step FIVU and FIVR estimators is
close to its nominal value in all circumstances. On the other hand, the two-step versions
appear to be size distorted when T is large, although the distortion decreases when
only a subset of the moment conditions is used. Thus, one may conclude that using the
full set of moment conditions and relying on inferences based on first-step estimates is a
sensible strategy. From an empirical point of view this is appealing because it simplifies
the question of how many instruments to use, an important issue that often
arises in two-way error components models estimated using linear GMM estimators.
Finally, the size of the “J” statistic is often slightly distorted when N is small, but
improves rapidly as N increases.
The projection GMM estimator proposed by Hayakawa (2012) has small bias and per-
forms well in general in terms of qStd unless α is close to unity, in which case outliers
seem to occur relatively more frequently. One could suspect that this design is the worst
case scenario for the estimator because only yi,0 is included in the set of instruments,
while lagged values of xi,t are only weakly correlated with yi,t−1. Inferences based on
the first-step estimator are reasonably accurate, certainly more so compared to the
two-step version, although the latter improves for the truncated set of moment condi-
tions. The “J” statistic seems to be size-distorted downwards but it slowly improves
for larger values of N .
Finally, Table 4.11 reports results for the conditional maximum likelihood estimator
proposed by Bai (2013b). The left panel corresponds to the estimator that treats xi,t as
strictly exogenous with respect to the idiosyncratic error, while the panel on the right-
hand side corresponds to the estimator that is consistent under weak exogeneity of a
first-order form15, which is satisfied in our design, assuming that ρ = 0. Interestingly,
the former appears to exhibit negligible median bias in all cases, even when both δ and
ρ take non-zero values. The dispersion of the estimator is small as well, unless T = 4
and δ = 0.3. Likewise the size of the estimator is distorted upwards when δ = 0.3
and gets worse with higher values of N , which is natural given that the estimator
is not consistent in this case. However, for cases where this estimator is consistent
(δ = 0 and ρ = 0), it may serve as a benchmark because it has negligible bias and
excellent size. This can be expected given the asymptotic optimality of this estimator
under conditional homoscedasticity of $\varepsilon_{i,t}$. This conclusion is largely invariant to
different values of N, T or ρ. The second estimator, in designs with ρ = 0.6 where it
is not consistent, tends to have substantial bias for both α and β. On the other hand,
when it is supposed to be consistent (δ = 0.3, ρ = 0.0), it is more size-distorted than
the first estimator, which is inconsistent in that design. This is a somewhat puzzling finding.
Remark 4.12. Monte Carlo evidence in Juodis and Sarafidis (2015) suggests that the
standard error correction as in Windmeijer (2005) can substantially improve the empir-
ical size of the two-step FIVU estimator. We suspect that the same is also applicable
to the estimators of Ahn, Lee, and Schmidt (2013) and Hayakawa (2012). However,
extensive analysis of this issue is beyond the scope of this paper.
4.6 Conclusions
In this paper we have analyzed a group of fixed T dynamic panel data estimators with
a multi-factor error structure. All currently available estimators have been presented
using a unified notation. Both their theoretical properties and their possible limitations
are discussed. We have considered a model with a lagged dependent variable and
additional regressors, possibly weakly exogenous or endogenous. We found that the
15That is, when $x_{i,t}$ follows an AR(1) process.
number of identifiable parameters for the GMM estimators can be smaller than what
is reported in the literature. This result is of major importance for practitioners
when performing model selection based on overidentifying test statistics. Theoretical
discussions in this paper were complemented by a finite sample study based on Monte
Carlo simulation.
We designed our Monte Carlo exercise to shed some light on the relative merits of
the various estimation approaches. It was found that the likelihood estimator of Bai
(2013b), when consistent, can serve as a benchmark in that it has negligible bias and
good size control, irrespective of the sample size. Under such circumstances, the FIVR
estimator proposed by Robertson and Sarafidis (2015) performs almost as well. However,
FIVR is more robust to violations of strict exogeneity, as well as of the condition of no
conditional correlation between the factor loadings. The latter also applies to
other GMM estimators, at least provided that the cross-sectional dimension is
large enough.
This paper assumes that the time-series dimension is fixed. Bai (2013b) shows that
the presence of factors does not result in an incidental parameters problem for the
conditional maximum likelihood estimator as far as the structural parameters are
concerned. A natural question to ask is whether GMM estimators in models where the
number of parameters and the number of moment conditions grow with T suffer from an
incidental parameters problem. Furthermore, in this paper we assume that none of the
method of moments estimators suffers from weak instruments. The analysis of the
estimation procedures when this assumption might be violated is already non-trivial for
models without factors, see e.g. Bun and Kleibergen (2014) and
Bun and Poldermans (2015). We leave a detailed analysis of these questions for future
research.
4.A Starting values for non-linear estimators
This appendix discusses the choice of starting values used for the non-linear optimiza-
tion algorithms.
Ahn et al. (2013). Under conditional homoscedasticity in εi,t, this estimator can be
implemented through an iterative procedure. Iterations start given some set of initial
values for the structural parameters, α, β. For this purpose, we use both the one- and
two-step linearized GMM estimator as proposed by Hayakawa (2012), as well as the
OLS estimator. The two-step estimator is implemented in exactly the same way, except
that the set of initial values for the structural parameters also includes the one-step
estimator. Once final estimates of α, β and F are obtained, these are used as initial values
in the non-linear optimization algorithm, which optimizes over all parameters at once. This
final step is intended to help ensure that we indeed find the global minimum of the
objective function.
FIVU. Similarly to the previous estimator, FIVU can also be implemented in steps.
Iterations start given a set of starting values for the factors F . This set is obtained
using the linearized GMM estimator, estimates of the principal components extracted
from OLS residuals, and one set of uniform random variables on [−1; 1]. Unlike for
Ahn et al. (2013), joint non-linear optimization is not used as a final step in order to
save computational time.
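The two kinds of starting values for F mentioned above can be sketched as follows. We assume the OLS residuals are arranged in an N × T matrix `resid` (a hypothetical name). The principal-components start takes the L eigenvectors of $\hat E'\hat E$ with the largest eigenvalues and scales them by $\sqrt{T}$, as in standard principal-components factor estimation. The random start draws entries uniformly on [−1, 1]. Factors obtained this way are identified only up to rotation, so any normalization the estimator requires must be imposed afterwards.

```python
import numpy as np


def pc_factors(resid: np.ndarray, L: int) -> np.ndarray:
    """T x L principal-component factor estimates from an N x T residual matrix."""
    T = resid.shape[1]
    eigval, eigvec = np.linalg.eigh(resid.T @ resid)   # eigenvalues in ascending order
    top = eigvec[:, np.argsort(eigval)[::-1][:L]]      # the L largest
    return np.sqrt(T) * top                            # normalization F'F / T = I_L


def uniform_start(T: int, L: int, rng=None) -> np.ndarray:
    """One random T x L starting value with entries drawn from U[-1, 1]."""
    return np.random.default_rng(rng).uniform(-1.0, 1.0, size=(T, L))


# Hypothetical residuals with one strong factor plus small noise.
rng = np.random.default_rng(5)
N, T = 500, 8
f_true = rng.normal(size=T)
lam = rng.normal(size=N)
resid = np.outer(lam, f_true) + 0.1 * rng.normal(size=(N, T))
F = pc_factors(resid, L=1)
print(abs(np.corrcoef(F[:, 0], f_true)[0, 1]))
```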
FIVR. For this estimator the main source of starting values is obtained from FIVU
with the starting value of gT implied in terms of other parameters. Other starting
values include those based on the OLS estimator and the one- and two-step linearized
GMM estimator. In this case starting values for the nuisance parameters G are simply
drawn from uniform [−1; 1].
Projection GMM. This estimator is implemented in exactly the same way as Ahn
et al. (2013), i.e. first an iterative procedure is used, followed by a non-linear one.
Starting values for the factors are obtained from the principal components extracted
from OLS residuals, from the estimate of f delivered by the linearized GMM estimator,
and from two sets of uniform random variables on [−1; 1]. In order to uniquely identify
all parameters up to rotation, we impose fT = 1 in estimation. We suspect that in
principle, similarly to FIVU, one can estimate the model without normalizations and
perform a degrees of freedom correction at the end. We leave this question open for
future research.
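The normalization $f_T = 1$ can be illustrated with a single factor. Only the product $\lambda_i f_t$ is identified, so rescaling $f_t \mapsto f_t/f_T$ and $\lambda_i \mapsto \lambda_i f_T$ leaves the model unchanged while imposing $f_T = 1$. The sketch below is our own illustration with the hypothetical helper `normalize_last`. It also shows why a true $f_T$ close to zero is problematic, a point raised in the discussion of Tables 4.3 and 4.4: the normalized factor values then become very large.

```python
import numpy as np


def normalize_last(f: np.ndarray, lam: np.ndarray):
    """Rescale a single factor so that f_T = 1, keeping lam_i * f_t unchanged."""
    scale = f[-1]
    if scale == 0.0:
        raise ValueError("f_T = 0: the normalization f_T = 1 is not attainable")
    return f / scale, lam * scale


rng = np.random.default_rng(3)
lam = rng.normal(size=5)
for f_T in (1.0, 0.01):
    f = np.array([0.8, -0.5, 1.2, f_T])
    f_n, lam_n = normalize_last(f, lam)
    # The common component lam_i * f_t is unaffected by the rescaling.
    assert np.allclose(np.outer(lam_n, f_n), np.outer(lam, f))
    print(f_T, np.abs(f_n).max())   # largest normalized factor value
```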
Projection MLE. Starting values for the structural parameters are obtained using the
linearized GMM estimator, OLS, and two sets of uniform random variables on [−1; 1].
The remaining parameters (including log(σ2)) are drawn as uniform random variables
on [0; 1]. In the preliminary study we also tried [−1; 1], but the results were identical.
Alternatively, one could also use the principal component estimates of F obtained from
OLS residuals, as suggested by Bai (2013b).
Subset GMM estimators. For T = 8, when both the subset and full-set GMM estimators
are available, we first estimate the subset estimators using the algorithms described
above and then use the subset estimates as starting values for the estimators
that make use of the full set of moment conditions.
4.B Tables
Table 4.1: OLS estimator and System GMM estimator by Sarafidis, Yamagata, and Robertson (2009)

Designs              OLS                                                     Sub-System
                     α                           β                           α                           β                           J
N    T  α   ρ   δ    Bias   RMSE   qStd   Size   Bias   RMSE   qStd   Size   Bias   RMSE   qStd   Size   Bias   RMSE   qStd   Size   Size
200  4  .4  .0  .0   .022   .048   .135   .609  -.008   .025   .069   .247  -.002   .029   .089   .060  -.002   .021   .065   .060   .041
200  4  .4  .0  .3   .005   .051   .146   .438  -.048   .062   .146   .485  -.080   .094   .228   .351   .037   .069   .204   .310   .707
200  4  .4  .6  .0  -.035   .051   .139   .633   .088   .088   .092   .851  -.035   .056   .152   .405   .086   .087   .130   .638   .720
200  4  .4  .6  .3  -.170   .170   .162   .921   .141   .141   .162   .817  -.320   .320   .237   .907   .289   .289   .320   .878   .866
200  4  .8  .0  .0  -.048   .050   .097   .662   .009   .013   .035   .139  -.038   .091   .299   .105  -.012   .032   .108   .082   .044
200  4  .8  .0  .3  -.066   .066   .102   .647  -.031   .045   .114   .412  -.301   .305   .684   .649  -.007   .096   .299   .413   .823
200  4  .8  .6  .0  -.083   .083   .102   .835   .064   .064   .059   .893  -.113   .147   .397   .488   .029   .054   .158   .360   .587
200  4  .8  .6  .3  -.181   .181   .137   .964   .109   .109   .131   .799  -.445   .445   .403   .907   .246   .246   .319   .808   .818
200  8  .4  .0  .0   .037   .045   .110   .691  -.018   .024   .061   .355  -.003   .014   .044   .148  -.001   .012   .036   .135   .029
200  8  .4  .0  .3   .032   .051   .129   .519  -.060   .065   .118   .567  -.122   .122   .160   .772   .090   .095   .165   .653   .901
200  8  .4  .6  .0  -.013   .041   .116   .667   .077   .077   .067   .934  -.045   .047   .089   .669   .087   .087   .079   .933   .768
200  8  .4  .6  .3  -.149   .149   .122   .971   .154   .154   .126   .952  -.362   .362   .148      1   .393   .393   .207   .999   .988
200  8  .8  .0  .0  -.016   .031   .084   .641   .003   .010   .029   .103  -.033   .041   .115   .248  -.007   .013   .039   .179   .039
200  8  .8  .0  .3  -.023   .036   .101   .444  -.059   .063   .111   .564  -.404   .404   .465   .960   .095   .139   .396   .692   .990
200  8  .8  .6  .0  -.045   .046   .082   .760   .062   .062   .040   .980  -.097   .099   .204   .766   .038   .040   .073   .653   .680
200  8  .8  .6  .3  -.177   .177   .108   .999   .165   .165   .135   .952  -.570   .570   .211      1   .513   .513   .299      1   .972
800  4  .4  .0  .0   .031   .079   .221   .846  -.012   .031   .089   .437  -.001   .018   .056   .053   .000   .016   .049   .048   .053
800  4  .4  .0  .3   .004   .054   .152   .714  -.057   .069   .155   .719  -.075   .088   .193   .565   .050   .071   .190   .544   .949
800  4  .4  .6  .0  -.064   .085   .202   .867   .217   .217   .166   .987  -.122   .127   .177   .857   .267   .267   .171   .957   .986
800  4  .4  .6  .3  -.181   .181   .168   .970   .154   .154   .182   .928  -.364   .364   .212   .960   .366   .366   .325   .968   .985
800  4  .8  .0  .0  -.069   .071   .137   .858   .005   .013   .038   .086  -.014   .045   .148   .069  -.002   .017   .057   .053   .048
800  4  .8  .0  .3  -.061   .061   .106   .805  -.048   .057   .136   .703  -.295   .305   .630   .807   .015   .104   .323   .645   .978
800  4  .8  .6  .0  -.110   .110   .134   .929   .208   .208   .135   .989  -.209   .220   .359   .878   .232   .233   .183   .933   .979
800  4  .8  .6  .3  -.199   .199   .148   .993   .136   .136   .171   .919  -.515   .515   .305   .965   .399   .399   .331   .967   .971
800  8  .4  .0  .0   .063   .074   .162   .876  -.029   .034   .086   .549  -.001   .010   .030   .074   .000   .010   .030   .059   .042
800  8  .4  .0  .3   .035   .051   .124   .740  -.067   .069   .106   .791  -.104   .104   .123   .871   .081   .082   .117   .773      1
800  8  .4  .6  .0  -.036   .057   .148   .841   .205   .205   .118      1  -.129   .129   .086   .974   .236   .236   .088      1      1
800  8  .4  .6  .3  -.158   .158   .116   .998   .168   .168   .125   .992  -.362   .362   .111      1   .403   .403   .163      1      1
800  8  .8  .0  .0  -.023   .040   .111   .863   .002   .010   .032   .083  -.006   .019   .061   .096   .000   .009   .026   .057   .046
800  8  .8  .0  .3  -.023   .034   .092   .704  -.057   .058   .091   .769  -.365   .365   .453   .974   .069   .109   .319   .772      1
800  8  .8  .6  .0  -.068   .068   .095   .925   .209   .209   .081      1  -.200   .200   .182   .982   .210   .210   .084   .999      1
800  8  .8  .6  .3  -.169   .169   .095      1   .157   .157   .118   .993  -.530   .530   .180      1   .462   .462   .240   .998      1
Chapter 4. Fixed T Dynamic Panel Data Estimators with Multi-Factor Errors 132
Table 4.2: Linearized GMM Estimator of Hayakawa (2012) with strict exogeneity assumption
Designs: N T α ρ δ | GMM 1step, α: Bias RMSE qStd Size | GMM 1step, β: Bias RMSE qStd Size | GMM 2step, α: Bias RMSE qStd Size | GMM 2step, β: Bias RMSE qStd Size | J: Size
200 4 .4 .0 .0 | -.004 .030 .097 .060 | -.005 .030 .099 .057 | -.003 .025 .077 .120 | -.008 .024 .076 .111 | .125
200 4 .4 .0 .3 | -.059 .065 .160 .214 | -.160 .168 .247 .504 | -.032 .051 .142 .306 | -.189 .190 .215 .812 | .239
200 4 .4 .6 .0 | -.012 .031 .109 .079 | .000 .029 .103 .068 | -.007 .026 .085 .147 | -.005 .025 .080 .128 | .133
200 4 .4 .6 .3 | -.085 .086 .181 .291 | -.160 .174 .262 .503 | -.059 .065 .150 .404 | -.195 .196 .228 .831 | .216
200 4 .8 .0 .0 | -.060 .077 .216 .193 | -.010 .025 .084 .085 | -.060 .074 .209 .281 | -.014 .023 .077 .194 | .179
200 4 .8 .0 .3 | -.322 .322 .301 .768 | -.125 .127 .134 .643 | -.348 .348 .320 .930 | -.157 .157 .096 .905 | .098
200 4 .8 .6 .0 | -.075 .090 .242 .236 | -.008 .025 .084 .095 | -.072 .088 .243 .345 | -.017 .026 .076 .207 | .193
200 4 .8 .6 .3 | -.347 .347 .305 .761 | -.126 .130 .134 .627 | -.380 .380 .334 .938 | -.157 .157 .089 .905 | .082
200 8 .4 .0 .0 | -.003 .022 .075 .094 | .000 .023 .078 .092 | .000 .015 .048 .339 | -.003 .015 .047 .333 | .108
200 8 .4 .0 .3 | -.064 .070 .168 .279 | -.015 .063 .219 .157 | -.029 .039 .102 .525 | -.058 .068 .142 .695 | .642
200 8 .4 .6 .0 | -.012 .024 .092 .117 | .010 .022 .091 .113 | -.006 .017 .056 .372 | .004 .015 .051 .331 | .114
200 8 .4 .6 .3 | -.080 .080 .200 .374 | -.007 .063 .267 .186 | -.042 .044 .117 .584 | -.057 .073 .164 .707 | .583
200 8 .8 .0 .0 | -.024 .029 .092 .165 | -.003 .015 .051 .080 | -.020 .024 .071 .433 | -.005 .011 .036 .311 | .118
200 8 .8 .0 .3 | -.201 .201 .179 .820 | -.048 .074 .215 .317 | -.193 .193 .149 .991 | -.086 .095 .126 .852 | .600
200 8 .8 .6 .0 | -.029 .033 .106 .216 | .004 .015 .063 .111 | -.025 .027 .079 .476 | -.002 .011 .038 .319 | .104
200 8 .8 .6 .3 | -.208 .208 .185 .884 | -.048 .077 .252 .340 | -.200 .200 .137 .996 | -.089 .097 .137 .869 | .508
800 4 .4 .0 .0 | -.005 .028 .102 .081 | -.007 .032 .117 .078 | -.002 .023 .074 .143 | -.006 .023 .076 .117 | .149
800 4 .4 .0 .3 | -.066 .069 .122 .478 | -.192 .194 .227 .726 | -.037 .055 .128 .603 | -.215 .215 .178 .979 | .818
800 4 .4 .6 .0 | -.008 .028 .108 .093 | -.003 .033 .114 .087 | -.004 .023 .083 .160 | -.005 .024 .084 .142 | .160
800 4 .4 .6 .3 | -.078 .078 .125 .549 | -.200 .203 .194 .773 | -.054 .057 .118 .605 | -.229 .229 .175 .980 | .732
800 4 .8 .0 .0 | -.082 .098 .302 .255 | -.020 .035 .123 .144 | -.073 .087 .292 .339 | -.021 .031 .122 .266 | .203
800 4 .8 .0 .3 | -.389 .389 .307 .892 | -.148 .149 .121 .806 | -.436 .436 .321 .981 | -.178 .178 .067 .995 | .549
800 4 .8 .6 .0 | -.106 .118 .316 .307 | -.022 .037 .120 .156 | -.099 .112 .341 .422 | -.028 .036 .118 .312 | .233
800 4 .8 .6 .3 | -.409 .409 .311 .887 | -.151 .152 .107 .824 | -.458 .458 .308 .985 | -.182 .182 .051 .991 | .436
800 8 .4 .0 .0 | -.003 .019 .079 .088 | -.002 .024 .099 .112 | .000 .011 .035 .208 | -.004 .012 .039 .199 | .167
800 8 .4 .0 .3 | -.066 .069 .117 .515 | -.019 .052 .157 .290 | -.013 .025 .066 .528 | -.085 .087 .089 .915 | 1
800 8 .4 .6 .0 | -.007 .020 .077 .106 | .002 .022 .092 .113 | -.003 .012 .036 .209 | -.002 .012 .037 .173 | .163
800 8 .4 .6 .3 | -.072 .073 .117 .585 | -.027 .053 .166 .314 | -.019 .024 .057 .511 | -.094 .096 .083 .952 | 1
800 8 .8 .0 .0 | -.027 .029 .107 .242 | -.004 .019 .071 .091 | -.023 .024 .075 .415 | -.008 .012 .040 .250 | .193
800 8 .8 .0 .3 | -.185 .185 .141 .884 | -.057 .067 .125 .531 | -.182 .182 .140 .984 | -.103 .104 .089 .974 | 1
800 8 .8 .6 .0 | -.031 .033 .112 .275 | -.003 .019 .071 .091 | -.025 .026 .079 .459 | -.008 .013 .039 .259 | .192
800 8 .8 .6 .3 | -.192 .192 .136 .926 | -.062 .073 .133 .572 | -.188 .188 .124 .993 | -.109 .110 .076 .977 | 1
Table 4.3: GMM estimator of Ahn, Lee, and Schmidt (2013)
Designs: N T α ρ δ | GMM 1step, α: Bias RMSE qStd Size | GMM 1step, β: Bias RMSE qStd Size | GMM 2step, α: Bias RMSE qStd Size | GMM 2step, β: Bias RMSE qStd Size | J: Size
200 4 .4 .0 .0 | .001 .028 .087 .075 | -.002 .026 .085 .056 | -.001 .022 .067 .137 | .000 .021 .065 .102 | .097
200 4 .4 .0 .3 | -.001 .055 .200 .109 | -.005 .057 .199 .111 | -.007 .038 .134 .148 | .000 .041 .137 .158 | .085
200 4 .4 .6 .0 | -.005 .029 .097 .094 | .004 .025 .083 .063 | -.004 .023 .074 .150 | .002 .020 .063 .091 | .094
200 4 .4 .6 .3 | -.020 .048 .211 .134 | .013 .049 .217 .117 | -.013 .037 .134 .141 | .005 .037 .127 .138 | .081
200 4 .8 .0 .0 | -.004 .029 .107 .096 | -.001 .016 .056 .058 | -.005 .022 .083 .146 | .000 .013 .045 .099 | .102
200 4 .8 .0 .3 | -.014 .043 .424 .182 | -.004 .038 .292 .166 | -.013 .034 .373 .197 | -.003 .029 .270 .198 | .122
200 4 .8 .6 .0 | -.007 .032 .117 .110 | .003 .016 .053 .067 | -.007 .022 .086 .142 | .002 .013 .044 .092 | .106
200 4 .8 .6 .3 | -.016 .039 .323 .168 | .006 .034 .125 .098 | -.013 .032 .273 .193 | .001 .027 .103 .151 | .107
200 8 .4 .0 .0 | -.001 .022 .077 .109 | .000 .022 .081 .100 | -.001 .015 .049 .315 | .000 .014 .045 .257 | .106
200 8 .4 .0 .3 | .008 .054 .205 .133 | -.011 .057 .219 .128 | .001 .029 .105 .341 | -.002 .029 .102 .332 | .078
200 8 .4 .6 .0 | -.006 .024 .092 .142 | .004 .020 .076 .100 | -.004 .017 .058 .356 | .002 .013 .043 .239 | .085
200 8 .4 .6 .3 | -.014 .046 .235 .144 | .010 .047 .246 .141 | -.007 .027 .116 .323 | .006 .027 .110 .296 | .091
200 8 .8 .0 .0 | -.005 .021 .072 .104 | .001 .013 .044 .063 | -.002 .015 .050 .288 | .001 .009 .028 .197 | .095
200 8 .8 .0 .3 | -.005 .035 .133 .099 | .003 .037 .133 .096 | -.004 .022 .079 .280 | .002 .023 .076 .263 | .074
200 8 .8 .6 .0 | -.006 .021 .080 .113 | .002 .012 .045 .076 | -.003 .015 .054 .295 | .001 .008 .027 .195 | .093
200 8 .8 .6 .3 | -.010 .033 .134 .118 | .010 .036 .146 .113 | -.005 .021 .075 .264 | .006 .023 .076 .241 | .075
800 4 .4 .0 .0 | -.002 .025 .085 .090 | .002 .029 .105 .092 | -.001 .018 .057 .123 | .001 .021 .068 .120 | .096
800 4 .4 .0 .3 | -.002 .033 .124 .106 | -.001 .033 .126 .119 | -.003 .021 .070 .122 | .000 .022 .072 .124 | .105
800 4 .4 .6 .0 | -.005 .024 .086 .102 | .005 .025 .097 .086 | -.003 .019 .060 .136 | .002 .019 .064 .091 | .096
800 4 .4 .6 .3 | -.008 .028 .115 .111 | .005 .027 .121 .111 | -.005 .019 .063 .110 | .002 .019 .066 .109 | .100
800 4 .8 .0 .0 | -.004 .020 .076 .096 | .000 .018 .059 .078 | -.004 .017 .058 .136 | .000 .015 .048 .093 | .088
800 4 .8 .0 .3 | -.005 .022 .094 .127 | -.002 .021 .079 .124 | -.004 .017 .067 .132 | -.001 .016 .059 .130 | .111
800 4 .8 .6 .0 | -.006 .019 .073 .101 | .001 .019 .063 .064 | -.005 .016 .065 .143 | .000 .016 .052 .085 | .090
800 4 .8 .6 .3 | -.006 .021 .089 .127 | .002 .021 .074 .098 | -.005 .017 .070 .138 | .000 .017 .054 .115 | .106
800 8 .4 .0 .0 | .001 .022 .083 .136 | -.001 .027 .111 .123 | -.001 .010 .035 .220 | .000 .013 .041 .186 | .141
800 8 .4 .0 .3 | .003 .029 .115 .109 | -.004 .030 .118 .119 | -.001 .012 .040 .176 | .001 .012 .040 .173 | .123
800 8 .4 .6 .0 | -.004 .019 .079 .143 | .003 .021 .088 .113 | -.001 .012 .038 .237 | .001 .012 .037 .154 | .139
800 8 .4 .6 .3 | -.005 .023 .117 .133 | .004 .024 .120 .126 | -.002 .012 .039 .170 | .002 .012 .038 .150 | .114
800 8 .8 .0 .0 | -.002 .013 .045 .083 | .000 .015 .051 .076 | -.001 .008 .027 .175 | .000 .009 .027 .125 | .110
800 8 .8 .0 .3 | -.002 .017 .063 .083 | .001 .017 .063 .083 | -.001 .009 .029 .137 | .001 .010 .030 .134 | .097
800 8 .8 .6 .0 | -.003 .013 .046 .087 | .000 .015 .052 .083 | -.001 .008 .030 .183 | .000 .009 .027 .115 | .116
800 8 .8 .6 .3 | -.003 .015 .056 .093 | .002 .016 .063 .088 | -.001 .008 .027 .117 | .001 .009 .028 .108 | .095
Table 4.4: Subset GMM estimator of Ahn, Lee, and Schmidt (2013)
Designs: N T α ρ δ | GMM 1step, α: Bias RMSE qStd Size | GMM 1step, β: Bias RMSE qStd Size | GMM 2step, α: Bias RMSE qStd Size | GMM 2step, β: Bias RMSE qStd Size | J: Size
200 8 .4 .0 .0 | .000 .022 .072 .102 | -.001 .021 .074 .094 | -.001 .014 .046 .262 | -.001 .013 .041 .193 | .128
200 8 .4 .0 .3 | .008 .050 .185 .125 | -.012 .052 .188 .125 | .001 .025 .090 .263 | -.002 .026 .088 .258 | .087
200 8 .4 .6 .0 | -.006 .023 .085 .134 | .004 .019 .067 .090 | -.003 .016 .053 .299 | .002 .013 .041 .189 | .098
200 8 .4 .6 .3 | -.012 .044 .205 .131 | .009 .042 .208 .124 | -.006 .025 .094 .256 | .005 .024 .089 .225 | .087
200 8 .8 .0 .0 | -.004 .021 .071 .094 | .000 .013 .041 .060 | -.002 .014 .047 .261 | .000 .008 .027 .168 | .116
200 8 .8 .0 .3 | -.005 .034 .127 .092 | .003 .035 .126 .090 | -.004 .021 .073 .235 | .002 .022 .070 .214 | .088
200 8 .8 .6 .0 | -.006 .022 .079 .115 | .002 .012 .042 .072 | -.003 .015 .054 .273 | .001 .008 .026 .152 | .097
200 8 .8 .6 .3 | -.010 .032 .121 .109 | .009 .034 .132 .101 | -.006 .020 .071 .213 | .006 .022 .071 .190 | .088
800 8 .4 .0 .0 | .001 .020 .079 .119 | -.002 .026 .101 .116 | .000 .011 .034 .189 | .000 .012 .039 .166 | .135
800 8 .4 .0 .3 | .004 .027 .109 .105 | -.004 .029 .111 .113 | -.001 .012 .039 .166 | .001 .012 .039 .152 | .129
800 8 .4 .6 .0 | -.004 .018 .076 .127 | .002 .020 .081 .115 | -.001 .012 .037 .208 | .000 .011 .036 .130 | .131
800 8 .4 .6 .3 | -.004 .021 .099 .124 | .004 .021 .100 .123 | -.002 .012 .038 .151 | .001 .012 .037 .130 | .110
800 8 .8 .0 .0 | -.002 .013 .046 .084 | .000 .014 .051 .077 | -.001 .009 .028 .162 | .000 .009 .028 .121 | .103
800 8 .8 .0 .3 | -.003 .017 .060 .082 | .001 .017 .061 .082 | -.001 .009 .029 .132 | .001 .009 .030 .131 | .101
800 8 .8 .6 .0 | -.003 .013 .047 .092 | -.001 .014 .050 .078 | -.001 .009 .030 .170 | .000 .009 .027 .105 | .108
800 8 .8 .6 .3 | -.003 .014 .053 .089 | .001 .015 .058 .082 | -.001 .008 .027 .120 | .001 .009 .028 .102 | .094
Table 4.5: FIVU estimator of Robertson and Sarafidis (2015)
Designs: N T α ρ δ | GMM 1step, α: Bias RMSE qStd Size | GMM 1step, β: Bias RMSE qStd Size | GMM 2step, α: Bias RMSE qStd Size | GMM 2step, β: Bias RMSE qStd Size | J: Size
200 4 .4 .0 .0 | .001 .023 .068 .064 | -.002 .022 .065 .048 | .000 .021 .061 .073 | -.001 .021 .060 .061 | .031
200 4 .4 .0 .3 | .008 .045 .132 .072 | -.004 .043 .136 .068 | -.003 .036 .111 .085 | .001 .038 .113 .085 | .031
200 4 .4 .6 .0 | .000 .023 .069 .063 | .001 .020 .060 .041 | .000 .022 .064 .079 | .000 .019 .057 .064 | .029
200 4 .4 .6 .3 | -.008 .036 .107 .064 | .006 .036 .116 .064 | -.006 .033 .100 .068 | .003 .034 .102 .079 | .031
200 4 .8 .0 .0 | .000 .024 .075 .063 | .000 .014 .042 .053 | -.001 .020 .061 .070 | .001 .012 .040 .069 | .035
200 4 .8 .0 .3 | -.003 .030 .099 .060 | .003 .026 .088 .065 | -.003 .028 .089 .076 | .002 .024 .079 .080 | .038
200 4 .8 .6 .0 | -.002 .025 .079 .063 | .001 .013 .041 .043 | -.002 .020 .066 .071 | .002 .012 .038 .066 | .033
200 4 .8 .6 .3 | -.006 .029 .093 .068 | .004 .026 .082 .069 | -.004 .028 .089 .084 | .002 .025 .079 .085 | .035
200 8 .4 .0 .0 | .002 .014 .042 .072 | -.002 .013 .041 .071 | .001 .012 .036 .182 | .000 .011 .034 .160 | .032
200 8 .4 .0 .3 | .012 .034 .097 .080 | -.014 .034 .099 .085 | .004 .021 .063 .173 | -.004 .022 .065 .180 | .035
200 8 .4 .6 .0 | .000 .014 .042 .065 | .000 .012 .035 .061 | .000 .013 .037 .179 | .000 .011 .033 .135 | .032
200 8 .4 .6 .3 | -.004 .025 .080 .056 | .003 .026 .079 .054 | -.002 .020 .060 .174 | .002 .020 .061 .158 | .034
200 8 .8 .0 .0 | -.001 .013 .038 .053 | .000 .008 .025 .050 | .000 .011 .034 .168 | .000 .007 .023 .143 | .037
200 8 .8 .0 .3 | -.001 .022 .066 .051 | .001 .023 .068 .048 | -.001 .018 .054 .163 | .001 .018 .057 .155 | .036
200 8 .8 .6 .0 | -.001 .014 .039 .051 | .000 .008 .023 .055 | .000 .012 .035 .164 | .001 .007 .022 .140 | .037
200 8 .8 .6 .3 | -.004 .020 .060 .048 | .005 .023 .066 .048 | -.003 .018 .053 .156 | .002 .019 .057 .153 | .030
800 4 .4 .0 .0 | .000 .020 .061 .060 | .000 .022 .073 .066 | .000 .017 .051 .069 | -.001 .020 .060 .069 | .052
800 4 .4 .0 .3 | .002 .024 .078 .072 | -.001 .024 .081 .068 | -.001 .020 .059 .059 | .000 .020 .061 .063 | .055
800 4 .4 .6 .0 | -.002 .019 .055 .068 | .002 .019 .058 .056 | -.001 .017 .053 .074 | .002 .018 .057 .066 | .050
800 4 .4 .6 .3 | -.004 .021 .063 .064 | .002 .020 .067 .059 | -.002 .018 .054 .060 | .001 .018 .055 .065 | .046
800 4 .8 .0 .0 | -.002 .016 .053 .058 | .000 .015 .047 .050 | -.001 .016 .048 .067 | .000 .013 .042 .056 | .050
800 4 .8 .0 .3 | -.002 .017 .055 .056 | .001 .017 .053 .053 | -.002 .015 .048 .058 | .001 .015 .047 .052 | .051
800 4 .8 .6 .0 | -.004 .015 .051 .071 | .000 .016 .049 .058 | -.003 .014 .047 .077 | .001 .015 .046 .059 | .048
800 4 .8 .6 .3 | -.004 .016 .052 .069 | .002 .016 .050 .059 | -.002 .015 .047 .066 | .000 .015 .046 .058 | .049
800 8 .4 .0 .0 | .002 .013 .038 .056 | -.003 .017 .050 .066 | .000 .008 .025 .079 | .000 .010 .031 .081 | .050
800 8 .4 .0 .3 | .005 .018 .055 .063 | -.007 .019 .055 .064 | .000 .010 .030 .080 | .000 .010 .031 .083 | .047
800 8 .4 .6 .0 | -.001 .011 .031 .054 | .000 .012 .035 .055 | -.001 .009 .026 .078 | .001 .010 .030 .080 | .055
800 8 .4 .6 .3 | -.001 .013 .039 .054 | .000 .013 .038 .052 | -.001 .010 .029 .078 | .001 .010 .030 .077 | .051
800 8 .8 .0 .0 | -.001 .008 .026 .049 | .000 .010 .030 .059 | .000 .007 .021 .077 | .000 .008 .024 .080 | .050
800 8 .8 .0 .3 | .000 .011 .034 .050 | .001 .011 .034 .056 | .000 .008 .024 .065 | .000 .008 .025 .079 | .052
800 8 .8 .6 .0 | -.001 .008 .025 .050 | -.001 .009 .028 .057 | -.001 .007 .021 .084 | .000 .008 .024 .079 | .051
800 8 .8 .6 .3 | -.001 .009 .029 .056 | .000 .010 .031 .053 | .000 .007 .024 .076 | .000 .008 .025 .073 | .059
Table 4.6: Subset FIVU estimator of Robertson and Sarafidis (2015)
Designs: N T α ρ δ | GMM 1step, α: Bias RMSE qStd Size | GMM 1step, β: Bias RMSE qStd Size | GMM 2step, α: Bias RMSE qStd Size | GMM 2step, β: Bias RMSE qStd Size | J: Size
200 8 .4 .0 .0 | .002 .013 .042 .076 | -.003 .013 .041 .069 | .000 .012 .035 .126 | -.001 .011 .035 .116 | .029
200 8 .4 .0 .3 | .011 .032 .094 .088 | -.012 .033 .094 .087 | .001 .021 .064 .125 | -.002 .021 .065 .119 | .030
200 8 .4 .6 .0 | .000 .014 .042 .068 | .000 .012 .037 .049 | .000 .013 .037 .127 | -.001 .011 .034 .104 | .032
200 8 .4 .6 .3 | -.005 .025 .075 .057 | .005 .024 .074 .059 | -.002 .020 .060 .121 | .002 .020 .060 .109 | .030
200 8 .8 .0 .0 | .000 .014 .042 .066 | .000 .008 .025 .057 | -.001 .012 .037 .136 | .000 .008 .023 .115 | .031
200 8 .8 .0 .3 | -.002 .023 .068 .057 | .001 .023 .068 .047 | -.003 .018 .057 .125 | .002 .019 .056 .116 | .035
200 8 .8 .6 .0 | -.002 .014 .044 .069 | .001 .008 .024 .058 | -.001 .012 .038 .134 | .000 .008 .023 .101 | .028
200 8 .8 .6 .3 | -.005 .020 .061 .052 | .005 .022 .067 .044 | -.004 .018 .054 .122 | .003 .019 .058 .103 | .039
800 8 .4 .0 .0 | .002 .013 .038 .059 | -.003 .016 .047 .060 | .000 .009 .026 .072 | .000 .011 .033 .063 | .044
800 8 .4 .0 .3 | .004 .017 .051 .069 | -.005 .017 .051 .072 | .000 .010 .032 .076 | .000 .011 .033 .074 | .045
800 8 .4 .6 .0 | .000 .011 .032 .060 | .000 .012 .035 .058 | .000 .009 .028 .077 | .000 .010 .032 .071 | .048
800 8 .4 .6 .3 | -.001 .012 .038 .055 | .001 .012 .038 .069 | -.001 .010 .030 .079 | .000 .010 .031 .071 | .044
800 8 .8 .0 .0 | .000 .010 .029 .059 | .000 .010 .031 .055 | .000 .008 .024 .068 | .000 .008 .025 .072 | .041
800 8 .8 .0 .3 | -.001 .011 .034 .059 | .001 .011 .032 .056 | -.001 .008 .026 .068 | .001 .008 .026 .068 | .047
800 8 .8 .6 .0 | -.001 .009 .029 .059 | .000 .009 .029 .061 | .000 .008 .025 .072 | .000 .008 .025 .073 | .046
800 8 .8 .6 .3 | -.002 .010 .030 .049 | .001 .010 .031 .056 | .000 .008 .025 .078 | .000 .009 .025 .067 | .050
Table 4.7: FIVR estimator of Robertson and Sarafidis (2015)
Designs: N T α ρ δ | GMM 1step, α: Bias RMSE qStd Size | GMM 1step, β: Bias RMSE qStd Size | GMM 2step, α: Bias RMSE qStd Size | GMM 2step, β: Bias RMSE qStd Size | J: Size
200 4 .4 .0 .0 | .001 .019 .058 .068 | -.002 .020 .060 .058 | .000 .016 .047 .081 | -.001 .018 .052 .081 | .035
200 4 .4 .0 .3 | .008 .037 .113 .081 | -.006 .038 .122 .071 | -.002 .027 .083 .081 | -.001 .030 .090 .080 | .033
200 4 .4 .6 .0 | .000 .019 .057 .061 | .000 .019 .055 .046 | .000 .016 .048 .081 | .000 .017 .051 .073 | .031
200 4 .4 .6 .3 | -.002 .031 .095 .062 | .003 .034 .106 .065 | -.001 .026 .079 .068 | .000 .029 .088 .077 | .032
200 4 .8 .0 .0 | .001 .017 .055 .066 | .000 .012 .038 .063 | .000 .014 .044 .072 | .000 .011 .035 .085 | .035
200 4 .8 .0 .3 | .000 .023 .073 .061 | .002 .024 .076 .057 | .000 .021 .061 .067 | .000 .022 .067 .082 | .039
200 4 .8 .6 .0 | -.001 .018 .054 .059 | .000 .012 .037 .060 | .000 .014 .044 .068 | .000 .011 .035 .086 | .038
200 4 .8 .6 .3 | -.001 .023 .071 .062 | .002 .024 .076 .066 | .000 .021 .062 .071 | .000 .022 .072 .084 | .041
200 8 .4 .0 .0 | .001 .012 .037 .069 | -.002 .013 .039 .068 | .001 .011 .031 .181 | -.001 .011 .033 .172 | .043
200 8 .4 .0 .3 | .015 .034 .095 .086 | -.017 .036 .099 .087 | .005 .020 .057 .214 | -.006 .021 .061 .215 | .043
200 8 .4 .6 .0 | .000 .012 .036 .067 | -.001 .011 .033 .062 | .001 .011 .032 .189 | .000 .011 .032 .163 | .040
200 8 .4 .6 .3 | -.002 .025 .077 .054 | .001 .027 .080 .051 | -.001 .018 .055 .197 | .001 .020 .060 .186 | .038
200 8 .8 .0 .0 | .000 .011 .032 .054 | .000 .008 .023 .051 | .001 .009 .028 .179 | .000 .007 .022 .155 | .037
200 8 .8 .0 .3 | .000 .019 .057 .047 | .000 .022 .066 .045 | .001 .015 .046 .183 | .000 .018 .054 .174 | .037
200 8 .8 .6 .0 | .000 .011 .031 .054 | .000 .007 .022 .051 | .001 .009 .028 .181 | .000 .007 .022 .159 | .036
200 8 .8 .6 .3 | -.003 .018 .055 .051 | .004 .022 .066 .046 | -.001 .016 .047 .176 | .002 .018 .056 .177 | .038
800 4 .4 .0 .0 | -.001 .015 .045 .059 | .000 .019 .061 .063 | -.001 .012 .036 .066 | .001 .016 .049 .066 | .051
800 4 .4 .0 .3 | .000 .021 .064 .068 | -.001 .022 .070 .066 | -.001 .015 .044 .068 | .000 .016 .049 .060 | .048
800 4 .4 .6 .0 | -.001 .013 .041 .059 | .002 .017 .052 .051 | -.001 .012 .037 .062 | .001 .015 .048 .056 | .051
800 4 .4 .6 .3 | -.002 .017 .051 .062 | .002 .018 .058 .059 | -.001 .014 .043 .059 | .001 .016 .050 .058 | .048
800 4 .8 .0 .0 | -.001 .011 .034 .061 | .000 .014 .043 .056 | .000 .010 .030 .075 | .000 .012 .038 .060 | .045
800 4 .8 .0 .3 | .000 .014 .042 .051 | .000 .015 .046 .052 | -.001 .011 .035 .062 | .000 .013 .040 .059 | .047
800 4 .8 .6 .0 | -.001 .011 .033 .069 | .000 .015 .044 .056 | .000 .010 .029 .075 | .000 .014 .042 .062 | .042
800 4 .8 .6 .3 | -.001 .013 .041 .064 | .002 .015 .048 .056 | .000 .011 .035 .057 | .000 .014 .042 .059 | .044
800 8 .4 .0 .0 | .001 .011 .033 .050 | -.002 .015 .047 .064 | .000 .007 .020 .093 | .000 .010 .028 .082 | .054
800 8 .4 .0 .3 | .005 .017 .053 .070 | -.006 .018 .056 .073 | .000 .008 .026 .082 | .000 .010 .028 .082 | .054
800 8 .4 .6 .0 | .000 .009 .026 .051 | -.001 .011 .033 .054 | .000 .007 .021 .079 | .000 .009 .028 .077 | .053
800 8 .4 .6 .3 | -.001 .012 .037 .052 | .000 .013 .038 .056 | .000 .009 .026 .078 | .000 .009 .029 .079 | .053
800 8 .8 .0 .0 | .000 .006 .019 .053 | .000 .010 .028 .061 | .000 .005 .016 .082 | .000 .007 .023 .080 | .053
800 8 .8 .0 .3 | .000 .010 .029 .055 | .000 .011 .032 .054 | .000 .007 .020 .078 | .000 .008 .023 .081 | .053
800 8 .8 .6 .0 | .000 .006 .018 .051 | .000 .009 .028 .057 | .000 .005 .016 .078 | .000 .008 .024 .080 | .050
800 8 .8 .6 .3 | .000 .009 .027 .055 | .000 .010 .031 .055 | .000 .007 .021 .079 | .000 .008 .024 .079 | .049
Table 4.8: Subset FIVR estimator of Robertson and Sarafidis (2015)
Designs: N T α ρ δ | GMM 1step, α: Bias RMSE qStd Size | GMM 1step, β: Bias RMSE qStd Size | GMM 2step, α: Bias RMSE qStd Size | GMM 2step, β: Bias RMSE qStd Size | J: Size
200 8 .4 .0 .0 | .002 .012 .038 .072 | -.002 .012 .039 .071 | .000 .010 .031 .131 | -.001 .010 .032 .124 | .035
200 8 .4 .0 .3 | .012 .032 .092 .093 | -.012 .034 .097 .089 | .002 .018 .056 .142 | -.003 .019 .059 .140 | .038
200 8 .4 .6 .0 | .001 .012 .037 .067 | .000 .011 .034 .054 | .000 .011 .032 .139 | -.001 .010 .032 .124 | .033
200 8 .4 .6 .3 | -.002 .023 .070 .063 | .003 .025 .074 .060 | -.001 .018 .053 .133 | .000 .020 .057 .122 | .035
200 8 .8 .0 .0 | .000 .011 .032 .066 | .000 .008 .023 .061 | .000 .010 .029 .142 | .000 .008 .022 .116 | .035
200 8 .8 .0 .3 | -.001 .019 .058 .053 | .001 .022 .065 .052 | .000 .015 .046 .136 | .001 .018 .052 .126 | .039
200 8 .8 .6 .0 | .000 .011 .032 .064 | .000 .007 .023 .054 | .000 .009 .029 .143 | .000 .007 .022 .118 | .038
200 8 .8 .6 .3 | -.003 .018 .054 .056 | .004 .022 .065 .049 | -.001 .015 .047 .131 | .001 .018 .055 .121 | .042
800 8 .4 .0 .0 | .001 .010 .032 .061 | -.002 .014 .043 .064 | .000 .007 .022 .077 | .000 .010 .029 .074 | .053
800 8 .4 .0 .3 | .004 .015 .048 .071 | -.005 .017 .051 .073 | .000 .009 .027 .073 | .000 .010 .029 .076 | .048
800 8 .4 .6 .0 | .000 .009 .026 .052 | .000 .011 .033 .055 | .000 .007 .023 .077 | .000 .010 .029 .073 | .055
800 8 .4 .6 .3 | .000 .011 .035 .053 | .000 .012 .036 .058 | .000 .009 .026 .066 | .000 .010 .029 .074 | .049
800 8 .8 .0 .0 | .000 .006 .020 .055 | .000 .009 .028 .064 | .000 .006 .017 .074 | .000 .007 .023 .069 | .044
800 8 .8 .0 .3 | .000 .009 .028 .061 | .000 .010 .030 .060 | .000 .007 .020 .070 | .000 .008 .023 .070 | .052
800 8 .8 .6 .0 | .000 .006 .019 .056 | .000 .009 .028 .057 | .000 .006 .017 .073 | .000 .008 .024 .076 | .046
800 8 .8 .6 .3 | .000 .009 .026 .059 | .000 .010 .030 .059 | .000 .007 .021 .072 | .000 .008 .024 .070 | .050
Table 4.9: Projection GMM estimator of Hayakawa (2012) with weak exogeneity
Designs: N T α ρ δ | GMM 1step, α: Bias RMSE qStd Size | GMM 1step, β: Bias RMSE qStd Size | GMM 2step, α: Bias RMSE qStd Size | GMM 2step, β: Bias RMSE qStd Size | J: Size
200 4 .4 .0 .0 | .000 .025 .076 .058 | -.001 .023 .075 .053 | -.002 .023 .072 .087 | .002 .023 .072 .074 | .020
200 4 .4 .0 .3 | .003 .056 .181 .078 | -.003 .054 .172 .083 | -.011 .055 .171 .113 | .007 .050 .166 .113 | .026
200 4 .4 .6 .0 | -.001 .026 .081 .077 | .003 .021 .070 .062 | -.003 .026 .081 .106 | .005 .022 .073 .097 | .028
200 4 .4 .6 .3 | -.016 .055 .191 .106 | .015 .052 .191 .097 | -.019 .056 .206 .153 | .016 .050 .199 .141 | .021
200 4 .8 .0 .0 | -.001 .033 .107 .073 | .001 .016 .050 .047 | -.001 .031 .101 .092 | .002 .015 .046 .062 | .020
200 4 .8 .0 .3 | -.009 .050 .179 .088 | .002 .034 .116 .079 | -.013 .052 .179 .132 | .004 .035 .117 .111 | .033
200 4 .8 .6 .0 | -.003 .032 .104 .069 | .003 .015 .050 .053 | -.005 .032 .108 .108 | .004 .015 .051 .088 | .021
200 4 .8 .6 .3 | -.013 .056 .212 .106 | .009 .041 .142 .084 | -.018 .059 .253 .167 | .010 .041 .150 .122 | .025
200 8 .4 .0 .0 | .001 .015 .046 .075 | -.001 .014 .046 .075 | .000 .013 .039 .143 | .000 .012 .038 .131 | .018
200 8 .4 .0 .3 | .015 .045 .134 .089 | -.015 .044 .135 .099 | .002 .031 .093 .147 | -.002 .031 .092 .145 | .021
200 8 .4 .6 .0 | .000 .014 .044 .063 | .001 .012 .037 .056 | .000 .014 .042 .144 | .001 .012 .035 .118 | .028
200 8 .4 .6 .3 | -.008 .038 .120 .066 | .008 .038 .118 .051 | -.006 .031 .089 .136 | .006 .030 .089 .128 | .029
200 8 .8 .0 .0 | -.001 .016 .050 .059 | .001 .009 .028 .068 | -.001 .016 .046 .140 | .001 .008 .026 .129 | .021
200 8 .8 .0 .3 | -.001 .033 .104 .052 | .002 .030 .094 .056 | -.004 .031 .090 .128 | .004 .027 .081 .123 | .019
200 8 .8 .6 .0 | -.001 .015 .046 .045 | .000 .009 .025 .058 | -.002 .015 .047 .136 | .001 .008 .025 .118 | .026
200 8 .8 .6 .3 | -.010 .041 .125 .059 | .007 .038 .121 .059 | -.009 .035 .106 .135 | .008 .033 .100 .138 | .026
800 4 .4 .0 .0 | -.001 .026 .074 .065 | .000 .026 .082 .079 | -.001 .021 .064 .072 | .001 .023 .069 .073 | .035
800 4 .4 .0 .3 | .000 .032 .105 .072 | -.001 .031 .102 .075 | -.003 .028 .090 .079 | .003 .027 .088 .074 | .037
800 4 .4 .6 .0 | -.004 .024 .079 .086 | .004 .022 .072 .069 | -.004 .024 .077 .103 | .004 .022 .074 .095 | .038
800 4 .4 .6 .3 | -.006 .031 .107 .081 | .006 .030 .108 .072 | -.005 .027 .089 .078 | .004 .025 .089 .075 | .042
800 4 .8 .0 .0 | -.006 .037 .113 .066 | .001 .019 .057 .054 | -.007 .036 .108 .098 | .002 .017 .052 .059 | .036
800 4 .8 .0 .3 | -.003 .028 .094 .067 | .001 .021 .067 .060 | -.004 .025 .088 .074 | .001 .020 .063 .066 | .044
800 4 .8 .6 .0 | -.007 .028 .105 .081 | .004 .021 .069 .071 | -.008 .028 .105 .096 | .003 .020 .067 .084 | .045
800 4 .8 .6 .3 | -.007 .031 .115 .077 | .005 .026 .091 .064 | -.007 .030 .106 .085 | .005 .024 .083 .072 | .043
800 8 .4 .0 .0 | .003 .018 .053 .105 | -.003 .022 .065 .118 | .000 .011 .032 .082 | .000 .012 .036 .077 | .030
800 8 .4 .0 .3 | .008 .025 .073 .072 | -.007 .025 .072 .074 | .000 .016 .048 .076 | .000 .016 .047 .073 | .027
800 8 .4 .6 .0 | -.001 .014 .041 .060 | .000 .013 .040 .055 | -.001 .012 .037 .084 | .001 .012 .035 .079 | .042
800 8 .4 .6 .3 | -.003 .022 .068 .054 | .003 .022 .068 .056 | -.001 .017 .047 .067 | .001 .016 .047 .073 | .035
800 8 .8 .0 .0 | -.001 .019 .056 .068 | .000 .013 .039 .079 | -.002 .015 .046 .087 | .001 .010 .029 .075 | .031
800 8 .8 .0 .3 | .000 .018 .055 .057 | .000 .016 .048 .057 | -.001 .014 .042 .080 | .001 .012 .036 .079 | .037
800 8 .8 .6 .0 | .000 .015 .048 .046 | .001 .011 .033 .055 | -.001 .014 .046 .083 | .001 .010 .030 .073 | .040
800 8 .8 .6 .3 | -.002 .020 .062 .053 | .001 .019 .058 .055 | -.001 .015 .047 .071 | .000 .015 .043 .069 | .041
Table 4.10: Subset Projection GMM estimator of Hayakawa (2012) with weak exogeneity
Designs: N T α ρ δ | GMM 1step, α: Bias RMSE qStd Size | GMM 1step, β: Bias RMSE qStd Size | GMM 2step, α: Bias RMSE qStd Size | GMM 2step, β: Bias RMSE qStd Size | J: Size
200 8 .4 .0 .0 | .000 .014 .045 .074 | .000 .014 .044 .064 | .000 .013 .041 .123 | -.001 .013 .039 .108 | .013
200 8 .4 .0 .3 | .009 .043 .133 .083 | -.009 .043 .134 .088 | .000 .033 .099 .120 | .000 .032 .098 .128 | .026
200 8 .4 .6 .0 | .000 .014 .045 .071 | .001 .012 .037 .054 | .000 .015 .046 .140 | .001 .012 .037 .111 | .029
200 8 .4 .6 .3 | -.009 .038 .118 .076 | .009 .038 .118 .067 | -.006 .031 .093 .110 | .005 .030 .091 .109 | .035
200 8 .8 .0 .0 | -.001 .017 .055 .071 | .001 .009 .029 .068 | -.002 .017 .052 .138 | .000 .009 .027 .115 | .020
200 8 .8 .0 .3 | -.004 .038 .117 .068 | .004 .033 .105 .063 | -.006 .033 .102 .116 | .004 .029 .087 .109 | .031
200 8 .8 .6 .0 | -.002 .016 .050 .059 | .000 .009 .026 .058 | -.002 .017 .052 .122 | .001 .009 .026 .104 | .025
200 8 .8 .6 .3 | -.011 .044 .139 .079 | .009 .042 .130 .070 | -.007 .037 .116 .131 | .007 .035 .108 .120 | .032
800 8 .4 .0 .0 | .003 .017 .049 .083 | -.003 .020 .057 .097 | -.001 .011 .034 .072 | .000 .013 .038 .072 | .033
800 8 .4 .0 .3 | .006 .024 .072 .067 | -.005 .024 .072 .070 | .000 .018 .051 .072 | .001 .018 .051 .072 | .034
800 8 .4 .6 .0 | -.002 .014 .042 .053 | .001 .014 .040 .051 | -.001 .013 .039 .075 | .001 .012 .036 .070 | .044
800 8 .4 .6 .3 | -.002 .022 .066 .058 | .003 .022 .067 .054 | -.001 .017 .050 .072 | .000 .017 .048 .072 | .044
800 8 .8 .0 .0 | .000 .020 .064 .068 | .000 .013 .039 .070 | -.002 .017 .052 .092 | .001 .010 .030 .076 | .035
800 8 .8 .0 .3 | -.001 .019 .057 .058 | .000 .015 .048 .061 | -.001 .015 .047 .076 | .001 .013 .039 .069 | .036
800 8 .8 .6 .0 | -.001 .016 .052 .054 | .001 .011 .034 .055 | -.002 .015 .052 .079 | .001 .011 .031 .065 | .044
800 8 .8 .6 .3 | -.003 .021 .065 .059 | .002 .019 .061 .062 | -.001 .017 .051 .068 | .000 .016 .046 .068 | .042
Table 4.11: Conditional likelihood estimator of Bai (2013b)
Designs: N T α ρ δ | Strict, α: Bias RMSE qStd Size | Strict, β: Bias RMSE qStd Size | Weak, α: Bias RMSE qStd Size | Weak, β: Bias RMSE qStd Size
200 4 .4 .0 .0 | .001 .013 .040 .052 | -.001 .013 .038 .050 | -.001 .013 .039 .059 | .002 .013 .036 .066
200 4 .4 .0 .3 | .003 .027 .081 .150 | -.015 .031 .103 .207 | -.001 .025 .074 .127 | .000 .027 .078 .161
200 4 .4 .6 .0 | .000 .014 .040 .053 | .000 .013 .038 .052 | -.011 .017 .043 .129 | .024 .024 .039 .302
200 4 .4 .6 .3 | .000 .025 .074 .109 | -.006 .029 .090 .167 | -.040 .042 .081 .350 | .050 .051 .078 .445
200 4 .8 .0 .0 | .000 .013 .040 .054 | .000 .009 .026 .059 | .000 .013 .039 .052 | -.002 .009 .025 .069
200 4 .8 .0 .3 | -.005 .026 .225 .234 | -.016 .030 .134 .313 | .000 .019 .058 .093 | .000 .020 .060 .142
200 4 .8 .6 .0 | .000 .013 .039 .048 | .000 .009 .027 .059 | -.005 .014 .041 .066 | .011 .012 .026 .166
200 4 .8 .6 .3 | -.003 .022 .075 .162 | -.002 .026 .082 .194 | -.025 .028 .060 .205 | .035 .035 .055 .347
200 8 .4 .0 .0 | .000 .008 .024 .056 | .000 .008 .024 .051 | -.001 .008 .024 .053 | .001 .008 .024 .064
200 8 .4 .0 .3 | .005 .015 .045 .086 | -.005 .016 .047 .096 | .001 .016 .049 .120 | -.003 .018 .053 .144
200 8 .4 .6 .0 | .000 .009 .025 .059 | .000 .008 .024 .057 | -.006 .010 .025 .088 | .012 .013 .025 .190
200 8 .4 .6 .3 | .003 .015 .044 .076 | -.003 .016 .047 .090 | -.023 .025 .054 .290 | .027 .028 .057 .328
200 8 .8 .0 .0 | .000 .008 .024 .053 | .000 .006 .017 .062 | .000 .008 .024 .050 | -.001 .006 .018 .064
200 8 .8 .0 .3 | -.008 .015 .044 .131 | .007 .018 .054 .155 | .000 .015 .047 .148 | -.001 .019 .057 .179
200 8 .8 .6 .0 | .000 .008 .024 .052 | .000 .006 .017 .059 | -.003 .008 .024 .065 | .006 .007 .017 .122
200 8 .8 .6 .3 | -.009 .015 .042 .128 | .011 .019 .052 .150 | -.021 .022 .050 .256 | .027 .029 .057 .332
800 4 .4 .0 .0 | .000 .010 .031 .060 | .001 .012 .035 .051 | -.003 .011 .033 .095 | .004 .015 .043 .172
800 4 .4 .0 .3 | .002 .022 .072 .339 | -.014 .028 .116 .438 | .001 .020 .061 .301 | -.002 .025 .078 .415
800 4 .4 .6 .0 | .000 .010 .031 .057 | .001 .012 .035 .052 | -.025 .026 .051 .449 | .064 .064 .076 .798
800 4 .4 .6 .3 | -.002 .021 .063 .297 | -.003 .028 .099 .409 | -.044 .044 .073 .642 | .056 .056 .074 .741
800 4 .8 .0 .0 | -.001 .009 .027 .057 | .000 .010 .030 .049 | .000 .009 .027 .058 | -.006 .012 .038 .182
800 4 .8 .0 .3 | -.008 .024 .250 .448 | -.019 .035 .170 .578 | -.002 .016 .049 .263 | .001 .022 .067 .411
800 4 .8 .6 .0 | .000 .009 .027 .055 | .000 .010 .032 .052 | -.007 .011 .030 .134 | .040 .040 .045 .722
800 4 .8 .6 .3 | -.008 .022 .077 .388 | .005 .031 .110 .516 | -.034 .034 .049 .616 | .049 .049 .055 .779
800 8 .4 .0 .0 | .000 .006 .018 .058 | .000 .008 .023 .056 | -.001 .007 .020 .081 | .002 .010 .029 .154
800 8 .4 .0 .3 | .005 .011 .030 .211 | -.006 .012 .035 .241 | .002 .013 .039 .314 | -.004 .016 .048 .385
800 8 .4 .6 .0 | -.001 .006 .019 .054 | .000 .008 .023 .050 | -.014 .015 .025 .403 | .035 .035 .044 .708
800 8 .4 .6 .3 | .003 .010 .029 .175 | -.003 .012 .035 .224 | -.026 .026 .047 .586 | .030 .031 .054 .629
800 8 .8 .0 .0 | .000 .005 .015 .050 | .000 .007 .019 .058 | .000 .005 .015 .052 | -.004 .008 .024 .163
800 8 .8 .0 .3 | -.006 .009 .024 .212 | .007 .012 .035 .301 | .000 .010 .031 .270 | -.001 .014 .041 .369
800 8 .8 .6 .0 | .000 .005 .015 .046 | .000 .007 .020 .057 | -.004 .006 .016 .118 | .021 .021 .031 .553
800 8 .8 .6 .3 | -.007 .010 .023 .214 | .010 .013 .033 .323 | -.020 .020 .032 .568 | .026 .027 .040 .655
Chapter 5
Pseudo Panel Data Models with Cohort Interactive Effects
5.1 Introduction
Over the last three decades, panel data techniques have proved to be of great value to both micro- and macroeconomists. Nevertheless, genuine microeconomic panel data can still be difficult and costly to obtain and administer. The lack of genuine panel datasets can be especially problematic for developing countries, which have only limited administrative data that track individuals over time. In such cases, repeated cross-section surveys can be used to form so-called pseudo panels.
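To make the construction concrete, here is a minimal sketch assuming the usual approach in which individual observations from each cross-section are averaged within cohort-by-period cells. The function name `build_pseudo_panel`, the record layout `(cohort, period, y, x)` and the sample values are hypothetical and purely illustrative. Note that no individual is followed over time: the cohort label is the only link between periods.

```python
from collections import defaultdict


def build_pseudo_panel(records):
    """Collapse repeated cross-section records into cohort-period means.

    records: iterable of (cohort, period, y, x) tuples (hypothetical layout).
    Different individuals may be sampled in each period; only the cohort
    label ties observations together over time.
    Returns a dict mapping (cohort, period) -> (mean_y, mean_x, n_cell).
    """
    cells = defaultdict(lambda: [0.0, 0.0, 0])  # running sums of y, x and a count
    for cohort, period, y, x in records:
        cell = cells[(cohort, period)]
        cell[0] += y
        cell[1] += x
        cell[2] += 1
    return {key: (sy / n, sx / n, n) for key, (sy, sx, n) in cells.items()}


# Hypothetical survey records: (cohort, year, hours worked, log wage)
records = [
    ("A", 2007, 40.0, 1.0),
    ("A", 2007, 44.0, 1.5),
    ("A", 2008, 42.0, 1.25),
    ("B", 2007, 38.0, 0.75),
]
panel = build_pseudo_panel(records)
print(panel[("A", 2007)])
```

Each cell mean then acts as a single observation for that cohort in that period; the cell size `n` indicates how much sampling noise the cohort mean is likely to carry.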
Models for this type of data in economics were introduced by Deaton (1985), with early contributions by Verbeek and Nijman (1992) and Moffitt (1993), among others. Although pseudo panel data models have not been analysed as extensively as their genuine counterparts, the literature on them is growing. For some recent theoretical papers, see McKenzie (2004), Verbeek and Vella (2005) and Inoue (2008) [hereafter I2008], inter alia, while Verbeek (2008) provides an excellent overview of the literature.
Existing estimation methods for linear pseudo panel data models assume that unobserved individual heterogeneity can be adequately captured by the standard additive error component structure. In some cases, however, this assumption may be too restrictive to describe the data at hand. For genuine panel data, a substantial literature considers models with a multiplicative error component structure of the factor type, which captures unobserved individual characteristics more flexibly; see e.g. Pesaran (2006), Bai (2009), Sarafidis et al. (2009) and the survey of Sarafidis and Wansbeek (2012).
Chapter 5. Pseudo Panel Data Models with Cohort Interactive Effects 144
The key component of pseudo panel analysis is the use of cohort information in estimation. We use the generic term “cohorts” to describe any grouping structure based on variables such as gender, race or age. In this paper, we introduce a factor structure into linear pseudo panel data models with a fixed number of time periods and cohorts, and we make several theoretical contributions to the existing pseudo panel data literature. First, if the common mean assumption is violated when all cohorts are pooled but can be maintained within each cohort, a Generalized Method of Moments (GMM) estimator based on the quasi-differencing approach of Ahn, Lee, and Schmidt (2013) is consistent. Second, we discuss identification, estimation and inference for this estimator in potentially unbalanced samples.
In addition to the theoretical results for the novel estimator, we conduct an extensive Monte Carlo simulation study to assess its finite sample properties. We mainly focus on the robustness of the proposed estimator with respect to endogenous variables, cohort interactive effects, and weak identification.
We also apply our estimator in an empirical analysis of the labour supply elasticity in Ecuador over the period 2007-2013. We use annual survey data to construct ten cohorts of household heads who work full time. To account for possible general non-linear trends in labour supply, we allow for a non-additive factor structure using the newly developed estimator. We find a statistically significant negative wage effect on hours worked.
Here we briefly introduce our notation. The usual $\operatorname{vec}(\cdot)$ operator denotes the column-stacking operator. The commutation matrix $K_{a,b}$ is defined such that $\operatorname{vec}(A') = K_{a,b}\operatorname{vec}(A)$ for any $[a\times b]$ matrix $A$. $\otimes$ denotes the Kronecker product, which satisfies $\operatorname{vec}(ABC) = (C'\otimes A)\operatorname{vec}(B)$, and $\imath_T$ is a $[T\times 1]$ vector of ones. For a set $\mathcal{A}$, we denote its cardinality by $|\mathcal{A}|$. Finally, $1(\cdot)$ is the usual indicator function. For further details regarding the notation used in this paper, see Abadir and Magnus (2002).
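Both identities can be checked numerically. The following is a minimal sketch in Python/NumPy (our choice of language; the chapter itself contains no code), where `vec` and `commutation_matrix` are hypothetical helper names introduced only for illustration:

```python
import numpy as np


def vec(A):
    """Column-stacking operator vec(A)."""
    return np.asarray(A).reshape(-1, order="F")


def commutation_matrix(a, b):
    """K_{a,b} such that vec(A') = K_{a,b} vec(A) for any [a x b] matrix A."""
    K = np.zeros((a * b, a * b))
    for i in range(a):
        for j in range(b):
            # A[i, j] is element j*a + i of vec(A) and element i*b + j of vec(A')
            K[i * b + j, j * a + i] = 1.0
    return K


rng = np.random.default_rng(0)
A = rng.normal(size=(3, 2))
B = rng.normal(size=(2, 4))
C = rng.normal(size=(4, 5))

# vec(A') = K_{3,2} vec(A)
print(np.allclose(vec(A.T), commutation_matrix(3, 2) @ vec(A)))
# vec(ABC) = (C' kron A) vec(B)
print(np.allclose(vec(A @ B @ C), np.kron(C.T, A) @ vec(B)))
```

Both checks print `True`; the commutation matrix is a permutation matrix, so it is orthogonal.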
5.2 The Model
In this paper, we consider a linear panel data model with group-specific membership
$$y_{i,t} = \beta' w_{s,t} + \zeta' z_{i,t} + \nu_{i,t}, \qquad \nu_{i,t} = \lambda_i' f_t + \varepsilon_{i,t}, \qquad E[\varepsilon_{i,t}] = 0, \qquad i \in I_{s,t}, \tag{5.1}$$
where $I_{s,t}$ is the set of all individuals (in total $N_{s,t}$) that are in group $s = 1,\dots,S$ at time $t = 1,\dots,T$, $w_{s,t}$ is a $K_w$-dimensional vector of group-time-specific covariates, and $z_{i,t}$ is a $K_z$-dimensional vector of individual-specific explanatory variables. Thus, in total there are $K_z + K_w = K$ parameters of interest for the observed explanatory variables. We denote the combined parameter vector by $\theta = (\beta', \zeta')'$. For ease of exposition, we assume at this stage that $z_{i,t}$ does not contain any lags of $y_{i,t}$. Extensions to dynamic models are formally discussed in Section 5.3.4.
We use the generic term cohorts to describe any grouping structure based on some selection variable. In the literature, variables such as gender, race, region of residence and, most popularly, age are used to define group membership, see Verbeek (2008) and McKenzie (2004).
To allow for individual-specific unobserved characteristics, $\nu_{i,t}$ contains the multifactor error term $\lambda_i' f_t = \sum_{l=1}^{L}\lambda_i^{(l)} f_t^{(l)}$. The $L$-dimensional vectors $\lambda_i$ and $f_t$ are individual-specific factor loadings and time-specific factors, respectively. The standard two-component (Fixed Effects (FE)) model (as in e.g. McKenzie (2004), I2008 and Verbeek (2008)) is obtained by setting $f_t$ to some constant $c$, such that $\lambda_i' f_t = \delta_i$ for all $t$.
Estimation of the model in (5.1) is straightforward if $E[\nu_{i,t} \mid z_{i,t}] = 0$: it can then be performed by pooled cross-sectional OLS. However, in most cases of empirical interest this condition is likely to be violated, because the unobserved individual characteristics $\lambda_i$ are correlated with the observed individual characteristics $z_{i,t}$. If this correlation is non-zero, we have to rely either on general “external” instruments or on pseudo panel techniques that use the cohort structure of the dataset as instruments, as originally suggested by Deaton (1985). This paper deals with the latter type of estimators.
Before defining the estimators considered in this paper, we first introduce some notation. All estimators discussed in this paper can be expressed solely in terms of matrices/vectors containing cross-sectional averages. Observations at the individual level $i$, on the other hand, are only used to estimate the asymptotic variance-covariance matrices. By taking the cross-sectional average for some group $s$ at time $t$ we obtain the following aggregated equation
$$\bar y_{s,t} = \theta' \bar x_{s,t} + \bar\nu_{s,t}, \qquad s = 1,\dots,S, \quad t = 1,\dots,T. \tag{5.2}$$
Here we denote
$$\bar y_{s,t} = \frac{1}{N_{s,t}}\sum_{i\in I_{s,t}} y_{i,t}, \qquad \bar x_{s,t} = \frac{1}{N_{s,t}}\sum_{i\in I_{s,t}} x_{i,s,t}, \qquad \bar\lambda_{s,t} = \frac{1}{N_{s,t}}\sum_{i\in I_{s,t}} \lambda_i, \qquad \bar\varepsilon_{s,t} = \frac{1}{N_{s,t}}\sum_{i\in I_{s,t}} \varepsilon_{i,t},$$
$$\bar\nu_{s,t} = \bar\lambda_{s,t}' f_t + \bar\varepsilon_{s,t}, \qquad x_{i,s,t} = (w_{s,t}', z_{i,t}')'.$$
After cross-sectional averaging, we stack all observations over the time dimension for a given cohort $s$:
$$y_s = X_s\theta + \nu_s, \tag{5.3}$$
where $X_s$, a $[T\times K]$ matrix, is defined as
$$X_s = (\bar x_{s,1},\dots,\bar x_{s,T})', \tag{5.4}$$
and similarly for the $T$-dimensional vectors $y_s$ and $\nu_s$. Finally, we stack the observations for all cohorts,
$$y = X\theta + \nu, \tag{5.5}$$
where the corresponding $s$-specific vectors/matrices are stacked on top of each other, e.g.
$$y = (y_1',\dots,y_S')', \qquad X = (X_1',\dots,X_S')'. \tag{5.6}$$
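The aggregation in (5.2) and the stacking in (5.3)-(5.6) can be sketched as follows. This is an illustrative Python/NumPy sketch, not the author's implementation; the function names `cohort_averages` and `stack` are hypothetical, cohorts and periods are coded from 0, and every cell $(s,t)$ is assumed to contain at least one individual.

```python
import numpy as np


def cohort_averages(values, cohort, period, S, T):
    """Cross-sectional averages over i in I_{s,t}.

    values : (n,) or (n, K) array of individual observations
    cohort : integer array in {0, ..., S-1}; period : integer array in {0, ..., T-1}
    Returns the (S, T[, K]) array of averages and the (S, T) counts N_{s,t}.
    """
    values = np.asarray(values, dtype=float)
    extra = values.shape[1:]
    sums = np.zeros((S, T) + extra)
    counts = np.zeros((S, T))
    np.add.at(sums, (cohort, period), values)   # unbuffered sum per (s, t) cell
    np.add.at(counts, (cohort, period), 1.0)
    return sums / counts.reshape((S, T) + (1,) * len(extra)), counts


def stack(avg):
    """Stack cohort by cohort, so that row s*T + t holds cell (s, t), as in (5.6)."""
    S, T = avg.shape[:2]
    return avg.reshape((S * T,) + avg.shape[2:])


cohort = np.array([0, 0, 0, 1, 1, 1])
period = np.array([0, 0, 1, 0, 1, 1])
y_i = np.array([1.0, 3.0, 5.0, 2.0, 4.0, 6.0])
ybar, N_st = cohort_averages(y_i, cohort, period, S=2, T=2)
y = stack(ybar)   # (ybar_{1,1}, ybar_{1,2}, ybar_{2,1}, ybar_{2,2})
print(y)
```

Here `y` equals `[2, 5, 2, 5]`. Passing an `(n, K)` array of $x_{i,s,t}$ rows to the same function and stacking yields the $[ST\times K]$ matrix $X$.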
It is important to discuss at this point the asymptotic setup used to derive the theoretical results. Using the terminology of Verbeek (2008), we formulate the commonly used asymptotic schemes:
Type I: $N_{s,t}\to\infty$; $T$ and $S$ fixed;
Type II: $N_{s,t}$ and $T$ fixed, but $S\to\infty$;
Type III: $N_{s,t}\to\infty$ and $T\to\infty$, but $S$ fixed.
In this paper, we assume that the dataset at hand is such that the Type I asymptotic scheme provides a reasonable description of the finite sample properties of the estimator considered. Hence, unless stated otherwise, $\xrightarrow{p}$ and $\xrightarrow{d}$ denote convergence in probability and in distribution, respectively, as all $N_{s,t}\to\infty$.
A well-known implication of Type I asymptotics (see e.g. I2008) is that estimators based on cross-sectional averages are robust to the presence of endogenous explanatory variables. However, as discussed later in this paper, this robustness to endogeneity is only achieved under the assumption of strong identification. Another implication for our analysis is that, under Type I (unlike Type II) asymptotics, the estimator discussed in this paper does not suffer from the “many instruments” bias of Bekker (1994) and Bekker and van der Ploeg (2005). The intuition behind these properties is discussed later in the paper.
5.3 Cohort Interactive Effects
5.3.1 Inconsistency of the conventional Fixed Effects estimator
In this section we show that the conventional Fixed Effects type estimator for pseudo panel data models is inconsistent if $\nu_{i,t}$ has a factor structure. The conventional estimator can be motivated by assuming that
$$E[\lambda_i' f_t \mid i\in I_{s,t}] = \delta_s. \tag{5.7}$$
Here we refer to $\delta_s$ as the cohort fixed effect. This condition is satisfied if, for example, $f_t = c$ is a time-invariant vector and $E[\lambda_i \mid i\in I_{s,t}] = \lambda_s$, or if the factor loadings $\lambda_i$ have zero mean, i.e. $E[\lambda_i \mid i\in I_{s,t}] = 0_L$.
Assuming (5.7), one can then rewrite (5.5) as
$$y = X\theta + \operatorname{vec}(\imath_T\delta') + \left(\nu - \operatorname{vec}(\imath_T\delta')\right) = \operatorname{vec}(\imath_T\delta') + X\theta + u, \tag{5.8}$$
where $\delta = (\delta_1,\dots,\delta_S)'$ and $u \equiv \nu - \operatorname{vec}(\imath_T\delta')$. The cohort fixed effects vector $\delta$ can then be eliminated from (5.8) using the within-group transformation matrix $M = I_S\otimes(I_T - (1/T)\imath_T\imath_T')$. Using the terminology of I2008 (or alternatively of Dargay (2007)), the GMM (or Fixed Effects (FE)) estimator is given by
$$\hat\theta_{GMMl} = (X'M\Omega MX)^{-1}X'M\Omega My, \tag{5.9}$$
where the subscript $l$ stands for “linear” and $\Omega$ is some pre-specified $[ST\times ST]$ positive-definite weighting matrix. The asymptotically efficient version of this estimator can be found in Appendix 5.A.3, together with the underlying assumptions.
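A direct implementation of (5.9) is short. The sketch below is illustrative Python/NumPy (hypothetical function names) that applies the estimator to a noise-free version of (5.8) with cohort fixed effects only, where it recovers $\theta_0$ exactly because $M$ annihilates $\operatorname{vec}(\imath_T\delta')$.

```python
import numpy as np


def within_matrix(S, T):
    """M = I_S kron (I_T - (1/T) iota_T iota_T'): demeans each cohort's T averages."""
    return np.kron(np.eye(S), np.eye(T) - np.ones((T, T)) / T)


def theta_gmm_linear(y, X, S, T, Omega=None):
    """Linear GMM/FE estimator (5.9): (X'M Omega M X)^{-1} X'M Omega M y.

    y : stacked cohort averages of length S*T (cohort by cohort)
    X : [S*T x K] stacked averaged regressors
    Omega : [S*T x S*T] positive-definite weighting matrix (identity if None)
    """
    M = within_matrix(S, T)
    if Omega is None:
        Omega = np.eye(S * T)
    A = M @ Omega @ M                      # M is symmetric, so M' = M
    return np.linalg.solve(X.T @ A @ X, X.T @ A @ y)


rng = np.random.default_rng(1)
S, T, K = 4, 5, 2
X = rng.normal(size=(S * T, K))
theta0 = np.array([1.0, -0.5])
delta = rng.normal(size=S)                 # hypothetical cohort fixed effects
y = X @ theta0 + np.repeat(delta, T)       # vec(iota_T delta') stacks delta_s * iota_T
print(theta_gmm_linear(y, X, S, T))        # equals theta0 up to rounding
```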
This estimator remains consistent provided that (5.7) holds, because in this case $Mu \xrightarrow{p} 0_{S(T-1)}$. In some cases this condition can be too restrictive, as it imposes that all cohorts respond similarly to common shocks (on average). However, it can still be reasonable to maintain the less restrictive assumption that $E[\lambda_i \mid i\in I_{s,t}] = \lambda_s$, such that
$$E[\lambda_i' f_t \mid i\in I_{s,t}] = \lambda_s' f_t. \tag{5.10}$$
Under this assumption all individuals $i$ in cohort $s$ have an error-component structure with a common time-varying mean or, in other words, a cohort interactive effects structure.
Before formally characterizing the asymptotic properties of the $\hat\theta_{GMMl}$ estimator under the cohort interactive effects structure in (5.10), we define
$$F = (f_1,\dots,f_T)', \qquad \Lambda = (\lambda_1,\dots,\lambda_S)'.$$
Here $F$ and $\Lambda$ are $[T\times L]$ and $[S\times L]$ matrices of factors and cohort factor loadings, respectively. As a special case, in the fixed effects model, $F = c$ and $\Lambda = \delta$ are $T$- and $S$-dimensional vectors, respectively. For the more general model we define $u_{i,t}$ as
$$u_{i,t} \equiv \nu_{i,t} - \lambda_s' f_t, \qquad i\in I_{s,t},$$
such that the newly combined error term has mean zero, i.e. $E[u_{i,t}] = 0$ for $i\in I_{s,t}$. Using this notation, we formally state the assumptions we impose on the error terms $u_{i,t}$.
(A.1) $N_{s,t}\to\infty$ for all $s,t$; there exists $N\to\infty$ such that $N_{s,t}/N\to\pi_{s,t}$, with $0 < \min\pi_{s,t}\le\max\pi_{s,t} < \infty$. $T$ and $S$ are fixed (Type I asymptotics).
(A.2) $u_{i,t}$ are i.h.d. with finite $2+\delta$ moment, for $\delta > 0$, such that $\sqrt{N_{s,t}}\,\bar u_{s,t}\xrightarrow{d} N(0,\sigma^2_{s,t})$ jointly for all $s,t$, with $0 < \min\sigma^2_{s,t}\le\max\sigma^2_{s,t} < \infty$.
Assumption (A.1) states that the number of individuals per cohort at any time $t$ should be large and asymptotically non-negligible compared to $N$, while the number of cohorts and time periods is fixed.
Remark 5.1. Note that in (A.1), instead of explicitly assuming that $N = \sum_{t=1}^{T}\sum_{s=1}^{S} N_{s,t}$, we allow for some generic $N$. The estimators and test statistics considered in this paper are invariant to the particular choice of $N$. In general, one can think of $N$ as the sum (as in Inoue (2008)), the average, or even any particular value of $N_{s,t}$ (as in McKenzie (2004)).
In (A.2), unlike I2008, we do not impose the i.i.d. assumption, but allow for heteroscedasticity across individuals and over time. Furthermore, this assumption can be relaxed by allowing a certain degree of spatial dependence between individuals of the same cohort. In that case, the consistency (or inconsistency) properties of all estimators discussed in this paper are not altered, but knowledge of the exact structure of the spatial dependence is required for correct inference.
Similarly to the model with cohort fixed effects, one can rewrite the stacked equation for $y$ as
$$y = X\theta + \nu = \operatorname{vec}(F\Lambda') + X\theta + u.$$
In this case, under the high-level Assumptions (A.1)-(A.2), the following result applies to the $\hat\theta_{GMMl}$ estimator.
Proposition 5.1. If (5.10) holds, and $\operatorname{plim}_{N\to\infty} X = X_\infty$ and $F\Lambda'$ are deterministic, then under Assumptions (A.1)-(A.2)
$$\hat\theta_{GMMl} - \theta_0 \xrightarrow{p} (X_\infty' M\Omega M X_\infty)^{-1} X_\infty' M\Omega M \operatorname{vec}(F\Lambda').$$
Proof. In Appendix 5.A.1.
Thus, the GMM/FE estimator converges in probability to a value that depends on the unobserved factors ($F$) and on the cohort factor loadings ($\Lambda$). Note that, in principle, if one treats $F\Lambda'$ as stochastic, both limiting quantities can have zero mean, in which case $\hat\theta_{GMMl}$ can be asymptotically unbiased. The technical reasons for assuming that some of these quantities are deterministic are discussed in more detail later in this paper.
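The limit in Proposition 5.1 can be evaluated for given deterministic $X_\infty$, $F$ and $\Lambda$. The sketch below (Python/NumPy; the function name, the trending factor and the regressor design are hypothetical choices for illustration only) shows that the limit is zero when the factor is constant, so that (5.7) holds, and non-zero when the cohort means of the regressor share a trending factor with the error term.

```python
import numpy as np


def fe_pseudo_limit_bias(X_inf, F, Lam, Omega=None):
    """Right-hand side of Proposition 5.1: plim of theta_GMMl - theta_0.

    X_inf : [S*T x K] deterministic limit of the stacked regressors
    F     : [T x L] factors; Lam : [S x L] cohort factor loadings
    """
    T, S = F.shape[0], Lam.shape[0]
    M = np.kron(np.eye(S), np.eye(T) - np.ones((T, T)) / T)
    if Omega is None:
        Omega = np.eye(S * T)
    A = M @ Omega @ M
    vec_FL = (F @ Lam.T).reshape(-1, order="F")   # column s of F Lam' is F lambda_s
    return np.linalg.solve(X_inf.T @ A @ X_inf, X_inf.T @ A @ vec_FL)


rng = np.random.default_rng(2)
S, T = 5, 6
Lam = rng.uniform(0.5, 1.5, size=(S, 1))
F_const = np.ones((T, 1))                         # fixed effects special case
F_trend = np.arange(1.0, T + 1).reshape(-1, 1)    # hypothetical trending factor
# regressor whose cohort means load on the trending factor, plus unrelated variation
X_inf = (F_trend @ Lam.T).reshape(-1, 1, order="F") + rng.normal(size=(S * T, 1))
print(fe_pseudo_limit_bias(X_inf, F_const, Lam))  # zero up to rounding
print(fe_pseudo_limit_bias(X_inf, F_trend, Lam))  # clearly non-zero
```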
5.3.2 Assumptions and estimation
Given that the $\hat\theta_{GMMl}$ estimator is in general inconsistent in the presence of the multifactor error structure, another estimation strategy is needed to obtain consistent estimates of $\theta$. For this purpose, we adopt the quasi-differencing (QD) approach of Ahn et al. (2001) and Ahn et al. (2013), which is tailored to genuine panel data models with fixed $T$. Their approach uses a transformation matrix $M_s(\phi)$ that depends on an unknown parameter vector $\phi$ such that (for $T > L$)
$$M_s(\phi)F = O_{(T-L)\times L}.$$
In other words, one has to introduce the additional parameter vector $\phi$ in order to remove the unobserved factors $F$ from the model. Unlike the standard setup with fixed effects $\delta_s$ only, where the factors, and consequently the corresponding transformation matrix, are known (up to a constant), the matrix $M_s(\phi)$ is unknown and depends on $\phi$, which has to be estimated jointly with $\theta$.
Observe that for each $[L\times L]$ invertible matrix $A$ we have
$$F\lambda_s = (FA)(A^{-1}\lambda_s) = F^*\lambda_s^*.$$
In order to avoid this rotational indeterminacy (in other words, non-uniqueness up to multiplication by $A$), we can normalise $F^* = (\Phi', -I_L)'$ (assuming that the lower $[L\times L]$ block $F_{L\times L}$ of the matrix $F$ is of full rank). One can then set $M_s(\phi)$ to be
$$M_s(\phi) = (I_{T-L}, \Phi), \tag{5.11}$$
where $\phi = \operatorname{vec}(\Phi)$. Analogously to the fixed effects transformation matrix, we define the stacked version of this matrix using the Kronecker product, i.e. $M(\phi) = I_S\otimes M_s(\phi)$.
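The normalisation can be made concrete: choosing $A = -F_{L\times L}^{-1}$ gives $FA = (\Phi', -I_L)'$ with $\Phi = -F_{\text{upper}}F_{L\times L}^{-1}$, where $F_{\text{upper}}$ collects the first $T-L$ rows of $F$. The Python/NumPy sketch below (hypothetical helper names) verifies that $M_s(\phi)$ then annihilates $F$ and that $M(\phi)$ annihilates $\operatorname{vec}(F\Lambda')$. With a single constant factor, all elements of $\Phi$ equal $-1$.

```python
import numpy as np


def qd_matrix(Phi):
    """M_s(phi) = (I_{T-L}, Phi) for a [(T-L) x L] parameter matrix Phi, as in (5.11)."""
    return np.hstack([np.eye(Phi.shape[0]), Phi])


def phi_from_factors(F):
    """Phi implied by rotating F to F* = (Phi', -I_L)'.

    Uses A = -F_{LxL}^{-1}, where F_{LxL} is the (assumed invertible) lower
    [L x L] block of F, so that Phi = -F_upper F_{LxL}^{-1}.
    """
    L = F.shape[1]
    return -F[:-L] @ np.linalg.inv(F[-L:])


def stacked_qd_matrix(Phi, S):
    """M(phi) = I_S kron M_s(phi)."""
    return np.kron(np.eye(S), qd_matrix(Phi))


rng = np.random.default_rng(3)
T, L, S = 6, 2, 4
F = rng.normal(size=(T, L))
Lam = rng.normal(size=(S, L))
Phi0 = phi_from_factors(F)
vec_FL = (F @ Lam.T).reshape(-1, order="F")
print(np.abs(qd_matrix(Phi0) @ F).max())                  # zero up to rounding
print(np.abs(stacked_qd_matrix(Phi0, S) @ vec_FL).max())  # zero up to rounding
print(phi_from_factors(np.full((T, 1), 2.0)).ravel())     # constant factor: all -1
```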
Given the transformation matrix $M(\phi)$, we define the non-linear GMM estimator $\hat\gamma_{GMMn} = (\hat\theta_{GMMn}', \hat\phi_{GMMn}')'$ as the global minimiser of the objective function
$$f(\gamma) = \frac{1}{2}\left[(y - X\theta)'M(\phi)'\Omega M(\phi)(y - X\theta)\right], \tag{5.12}$$
for some pre-specified $[S(T-L)\times S(T-L)]$ positive definite weighting matrix $\Omega$. The corresponding gradient of the objective function in (5.12) is given by
$$\nabla f(\gamma) = \begin{pmatrix} -X'M(\phi)' \\ Q'\left((y - X\theta)\otimes I_{S(T-L)}\right)\end{pmatrix}\Omega M(\phi)(y - X\theta) = \begin{pmatrix} D_\theta(\gamma)' \\ D_\phi(\gamma)'\end{pmatrix}\Omega M(\phi)(y - X\theta).$$
Here $D_\gamma(\gamma) = (D_\theta(\gamma), D_\phi(\gamma))$ is the Jacobian matrix of the moment conditions $M(\phi)(y - X\theta)$ evaluated at some $\gamma$ (when it is evaluated at the true value $\gamma_0$ we suppress the dependence on $\gamma$). Finally, $Q$ is an $[S^2T(T-L)\times(T-L)L]$ selection matrix, with zeros and ones as elements, of the form
$$Q = \left(\left((I_S\otimes K_{T,S})(\operatorname{vec} I_S\otimes I_T)\right)\otimes I_{T-L}\right)V, \qquad V = \begin{pmatrix} O_{(T-L)\times L} \\ I_L\end{pmatrix}\otimes I_{T-L}.$$
Before proceeding, we extend the set of high-level assumptions that are sufficient to prove the asymptotic results for $\hat\gamma_{GMMn}$.
(A.3) $\gamma = (\theta',\phi')'$; $\gamma\in\Gamma\subset\mathbb{R}^{K+(T-L)L}$ and $\gamma_0\in\operatorname{interior}(\Gamma)$. The parameter space $\Gamma$ is compact.
(A.4) $\operatorname{rk}\left[\operatorname{plim}_{N\to\infty} D_\gamma\right] = K + (T-L)L$. The matrix $X_\infty\equiv\operatorname{plim}_{N\to\infty} X$ is deterministic.
(A.5) a) $L = L_0 < \min\{S,T\}$, while $(S-L)(T-L) > K$, with $L_0$ being the true number of factors with non-zero mean factor loadings. b) $\operatorname{rk}(\Lambda) = L_0$. c) $F_{L\times L}^{-1}$ exists. d) $F$ and $\Lambda$ are deterministic matrices.
(A.6) The model is asymptotically identified: $\operatorname{plim}_{N\to\infty} M(\phi)(y - X\theta) = 0_{S(T-L)}$ implies $\gamma = \gamma_0$.
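The probability limit of the (transposed) Jacobian $D_\gamma'$ in (A.4) can be expressed as
$$\operatorname{plim}_{N\to\infty} D_\gamma' = \begin{pmatrix} -X_\infty' M(\phi_0)' \\ Q'\left(\operatorname{vec}(F\Lambda')\otimes I_{S(T-L)}\right)\end{pmatrix}.$$
Here the regressor matrix $X_\infty = (W, Z_\infty)$ has a typical $st$'th row given by $\left(w_{s,t}', \lim_{N\to\infty}(1/N_{s,t})\sum_{i=1}^{N_{s,t}} E[z_{i,t}' \mid i\in I_{s,t}]\right)$. If the $z_{i,t}$ are i.i.d. for all $i\in I_{s,t}$, the $st$'th row is simply given by $\left(w_{s,t}', E[z_{i,t}' \mid i\in I_{s,t}]\right)$.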
(A.4) is the strong identification assumption commonly used in the standard GMM setting. This assumption is quite restrictive even when $\phi$ is known, as noted by Verbeek (2008): “While it is not obvious that this requirement will be satisfied in empirical applications, it is also not easy to check, because estimation error in the reduced form parameters may hide collinearity problems. That is, sample cohort averages may exhibit time-variation while the unobserved population cohort averages do not.” The importance of this problem is illustrated in Section 5.4.2 as well as in the Monte Carlo section of this paper. However, we leave the properties of identification-robust inference procedures, such as that of Kleibergen (2005), for future research.
For genuine panel data models, Ahn et al. (2013) formulate (A.5) slightly differently: $L$ “...denotes the number of the individual-specific effects that are correlated with the regressors...”. In our case, only factors with non-zero mean factor loadings are of interest for estimation, as factors with zero mean factor loadings cannot be identified from cross-sectional averages. Hence, genuine panel data estimators may identify more factors than the pseudo panel data estimator, even when both are applied to the same dataset.
Assumption (A.6) imposes asymptotic global identification for the objective function. In Section 5.4.3 we provide detailed examples in which this condition can be violated. Note that Assumptions (A.1)-(A.6) do not impose any exogeneity restriction on $z_{i,t}$ with respect to $u_{i,t}$; thus elements of $z_{i,t}$ are allowed to be endogenous, as noted at the end of Section 5.2.
Remark 5.2. In this paper, we treat both the cohort-specific variables $w_{s,t}$ and the cohort interactive effects component $F\Lambda'$ as deterministic. Equivalently, Assumptions (A.1)-(A.6) can be formulated conditionally on these observed and unobserved quantities, but in that case one has to rely on the limit theory developed in Kuersteiner and Prucha (2013; 2015) to obtain the limiting distribution. Our treatment of the unobserved quantities is similar to that in genuine panel data models with fixed $T$, where $F$ is usually treated as deterministic (as in Ahn et al. (2013) and Robertson and Sarafidis (2015)). The deterministic treatment is only needed to avoid technicalities and has no effect on the way estimation and inference are performed (as emphasized by Kuersteiner and Prucha (2013; 2015)).
Assumptions (A.1)-(A.6) are sufficient to obtain the following asymptotic representation of $\hat\gamma_{GMMn}$.
Proposition 5.2. Suppose that Assumptions (A.1)-(A.6) are satisfied. Then $\hat\gamma_{GMMn}$ has the following asymptotic representation:
$$\sqrt{N}(\hat\gamma_{GMMn} - \gamma_0)\xrightarrow{d}\operatorname{plim}_{N\to\infty}\left(\left(D_\gamma'\Omega D_\gamma\right)^{-1}D_\gamma'\right)\Omega M(\phi_0)\Sigma^{1/2}\xi, \tag{5.13}$$
where $\xi\sim N(0_{ST}, I_{ST})$ and $\Sigma$ is an $[ST\times ST]$ diagonal matrix whose typical $((s-1)T+t)$'th diagonal element is $\sigma^2_{s,t}/\pi_{s,t}$.
These results can be proved using standard arguments, e.g. as in Newey and McFadden (1994).
Remark 5.3. Similar to the original setup of Ahn et al. (2013) for genuine panels, the asymptotic distribution of $\hat\gamma_{GMMn}$ is well defined only for the true value $L = L_0$, as imposed by Assumption (A.5).
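The asymptotic variance-covariance matrix (treating $X_\infty$, $F$ and $\Lambda$ as deterministic) is minimised at $\Omega_{opt} = \left(M(\phi_0)\Sigma M(\phi_0)'\right)^{-1}$. In that case, the asymptotic variance-covariance matrix of $\hat\theta_{GMMn}$ is given by
$$\operatorname{plim}_{N\to\infty}\left(D_\theta'\,\Omega_{opt}^{1/2}\, M_{\Omega_{opt}^{1/2}D_\phi}\,\Omega_{opt}^{1/2} D_\theta\right)^{-1},$$
where $M_{\Omega_{opt}^{1/2}D_\phi} = I_{S(T-L)} - \Omega_{opt}^{1/2}D_\phi\left(D_\phi'\Omega_{opt}D_\phi\right)^{-1}D_\phi'\Omega_{opt}^{1/2}$ is the usual “residual maker” projection matrix that projects off the column space of $\Omega_{opt}^{1/2}D_\phi$.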
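The residual-maker expression is the $\theta$ block of $(D_\gamma'\Omega D_\gamma)^{-1}$, by the partitioned-inverse (Frisch-Waugh) argument. The Python/NumPy sketch below checks this numerically for arbitrary full-rank Jacobians; the dimensions and the names `sym_sqrt` and `avar_theta` are hypothetical and serve only as an illustration.

```python
import numpy as np


def sym_sqrt(A):
    """Symmetric square root of a positive-definite matrix."""
    w, U = np.linalg.eigh(A)
    return (U * np.sqrt(w)) @ U.T


def avar_theta(D_theta, D_phi, Omega):
    """(D_theta' W M_{W D_phi} W D_theta)^{-1} with W = Omega^{1/2}."""
    W = sym_sqrt(Omega)
    B_phi = W @ D_phi
    # residual maker: projects off the column space of Omega^{1/2} D_phi
    M_res = np.eye(Omega.shape[0]) - B_phi @ np.linalg.solve(B_phi.T @ B_phi, B_phi.T)
    B_theta = W @ D_theta
    return np.linalg.inv(B_theta.T @ M_res @ B_theta)


rng = np.random.default_rng(4)
m, K, P = 12, 2, 3          # hypothetical: S(T-L) moments, K slopes, (T-L)L factor parameters
D_theta = rng.normal(size=(m, K))
D_phi = rng.normal(size=(m, P))
G = rng.normal(size=(m, m))
Omega = G @ G.T + m * np.eye(m)
D = np.hstack([D_theta, D_phi])
full = np.linalg.inv(D.T @ Omega @ D)
print(np.allclose(avar_theta(D_theta, D_phi, Omega), full[:K, :K]))
```

The check prints `True`. Projecting off $\Omega^{1/2}D_\phi$ avoids inverting the full $[K+(T-L)L]$-dimensional matrix when only the slope variance is needed.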
Remark 5.4. Depending on the assumptions made for the error terms (heteroscedasticity or homoscedasticity over time and cohorts) and the regressors (static or dynamic model), the formulae of I2008 for consistent estimation of $\Sigma$ can be used without modification. The typical $((s-1)T+t)$'th diagonal element of $\Sigma$ is estimated by $\hat q^2_{s,t} = (N/N_{s,t})\hat\sigma^2_{s,t}$, where
$$\hat\sigma^2_{s,t} = \frac{1}{N_{s,t}}\sum_{i\in I_{s,t}}\left(y_{i,t} - x_{i,s,t}'\hat\theta_1\right)^2 - \left(\frac{1}{N_{s,t}}\sum_{i\in I_{s,t}}\left(y_{i,t} - x_{i,s,t}'\hat\theta_1\right)\right)^2 \tag{5.14}$$
for some consistent initial estimator $\hat\theta_1$ (e.g. the estimator that uses the identity matrix for $\Omega$).
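Formula (5.14) is the within-cell (population-style) variance of the first-step residuals. The following is a minimal Python/NumPy sketch for the static, diagonal case, assuming every cell is non-empty; the function name and the simulated design are hypothetical. The resulting $\hat\Sigma$ can then be plugged into $\hat\Omega = (M(\hat\phi_1)\hat\Sigma M(\hat\phi_1)')^{-1}$.

```python
import numpy as np


def sigma_hat(y_i, X_i, theta1, cohort, period, S, T, N):
    """Diagonal Sigma_hat whose (s*T + t)'th element is q2_{s,t} = (N / N_{s,t}) sigma2_{s,t}.

    y_i, X_i : individual outcomes (n,) and regressors x_{i,s,t} (n, K)
    theta1   : consistent first-step estimate
    N        : the generic normalising constant of (A.1)
    """
    resid = y_i - X_i @ theta1
    q2 = np.empty((S, T))
    for s in range(S):
        for t in range(T):
            r = resid[(cohort == s) & (period == t)]
            sigma2 = np.mean(r ** 2) - np.mean(r) ** 2      # eq. (5.14)
            q2[s, t] = N / r.size * sigma2
    return np.diag(q2.reshape(-1))                       # cohort-by-cohort ordering


rng = np.random.default_rng(5)
S, T, n_cell = 2, 3, 50
cohort = np.repeat(np.arange(S), T * n_cell)
period = np.tile(np.repeat(np.arange(T), n_cell), S)
X_i = rng.normal(size=(S * T * n_cell, 2))
theta1 = np.array([1.0, 2.0])
y_i = X_i @ theta1 + rng.normal(size=S * T * n_cell)
Sig = sigma_hat(y_i, X_i, theta1, cohort, period, S, T, N=S * T * n_cell)
```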
For fixed values of $S$ and $T$, the unconditional distribution of $\sqrt{N}(\hat\gamma_{GMMn} - \gamma_0)$ (treating the deterministic quantities as stochastic) is not multivariate normal and depends in the limit on the factors, the cohort-specific factor loadings and the cohort-specific regressors $w_{s,t}$. Note that the limiting distribution of the linear GMM estimator $\hat\theta_{GMMl}$ (as in I2008) is also normal only conditionally on the cohort-specific regressors $w_{s,t}$, while unconditionally it is not. Hence the conditioning argument is not unique to the non-linear estimator.
Note that the number of rows in the $Q$ matrix is quadratic in both $S$ and $T$. Thus, even for moderate dimensions, numerical computations might become cumbersome. Given that under Assumptions (A.1)-(A.6) the $\Sigma$ matrix is diagonal (or block-diagonal if dynamics are allowed), we can also limit our attention to weighting matrices $\Omega$ that are block-diagonal over cohorts. In this case, the objective function simplifies substantially to
$$f(\gamma) = \frac{1}{2}\sum_{s=1}^{S}(y_s - X_s\theta)'M_s(\phi)'\Omega_s M_s(\phi)(y_s - X_s\theta).$$
Here, as before, $M_s(\phi) = (I_{T-L},\Phi)$, while $\Omega_s$ is the $s$'th block of the block-diagonal matrix $\Omega$. The gradient can also be expressed as a sum,
$$\nabla f(\gamma) = \sum_{s=1}^{S}\begin{pmatrix} -X_s'M_s(\phi)' \\ V'\left((y_s - X_s\theta)\otimes I_{T-L}\right)\end{pmatrix}\Omega_s M_s(\phi)(y_s - X_s\theta),$$
with $V$ as defined previously. This simplification of the objective function is used in Section 5.5 for the Monte Carlo study, as well as in the empirical exercise.
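The block-diagonal objective and its gradient can be coded cohort by cohort. In this Python/NumPy sketch (hypothetical function name and simulated design), the $\phi$-part of the gradient uses the fact that $V'(e\otimes I_{T-L}) = e_{\text{low}}\otimes I_{T-L}$, where $e_{\text{low}}$ holds the last $L$ elements of $e = y_s - X_s\theta$, and $\phi = \operatorname{vec}(\Phi)$ is column-major. In a noise-free design, both the objective and the gradient vanish at the true parameters. The sketch only evaluates $f$ and $\nabla f$; any numerical optimiser can then be used to minimise them.

```python
import numpy as np


def objective_and_gradient(theta, Phi, ys, Xs, Omegas):
    """Block-diagonal NLGMM objective f(gamma) and its gradient.

    theta  : (K,) slopes; Phi : [(T-L) x L], with phi = vec(Phi) (column-major)
    ys, Xs : lists over cohorts of (T,) vectors y_s and [T x K] matrices X_s
    Omegas : list of symmetric [(T-L) x (T-L)] weighting blocks Omega_s
    Returns f and the gradient with respect to (theta', vec(Phi)')'.
    """
    TmL, L = Phi.shape
    Ms = np.hstack([np.eye(TmL), Phi])            # M_s(phi) = (I_{T-L}, Phi)
    f, g_theta, g_phi = 0.0, np.zeros(theta.size), np.zeros(TmL * L)
    for y, X, Om in zip(ys, Xs, Omegas):
        e = y - X @ theta                         # y_s - X_s theta
        m = Ms @ e                                # quasi-differenced moments
        w = Om @ m                                # Omega_s M_s(phi)(y_s - X_s theta)
        f += 0.5 * m @ w
        g_theta -= (Ms @ X).T @ w                 # -X_s' M_s(phi)' Omega_s (...)
        # V'((y_s - X_s theta) kron I_{T-L}) = e_low kron I_{T-L}
        g_phi += np.kron(e[TmL:].reshape(-1, 1), np.eye(TmL)) @ w
    return f, np.concatenate([g_theta, g_phi])


rng = np.random.default_rng(6)
S, T, L, K = 5, 6, 1, 2
F = rng.normal(size=(T, L))
Lam = rng.normal(size=(S, L))
theta0 = np.array([0.5, -1.0])
Xs = [rng.normal(size=(T, K)) for _ in range(S)]
ys = [X @ theta0 + F @ lam for X, lam in zip(Xs, Lam)]  # noise-free limit of the averages
Omegas = [np.eye(T - L)] * S
Phi0 = -F[:-L] @ np.linalg.inv(F[-L:])                  # normalisation behind (5.11)
f0, g0 = objective_and_gradient(theta0, Phi0, ys, Xs, Omegas)
print(f0, np.abs(g0).max())                             # both zero up to rounding
```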
As an alternative to quasi-differencing with respect to the factors, one can also construct a similar estimator by quasi-differencing over cohorts. In this case, one has to look for a redefined $\phi$ such that
$$M(\phi)(\Lambda\otimes I_T) = 0_{T(S-L)}. \tag{5.15}$$
In this case the vector $\phi$ is of dimension $(S-L)L$ rather than $(T-L)L$. This alternative QD transformation might be of particular interest if $T \gg S$ and it is reasonable to consider a large-$N,T$ asymptotic framework as in McKenzie (2004) (Type III). As a result, the number of parameters does not grow as both $N$ and $T$ increase.
Remark 5.5. The possibility to perform quasi-differencing with respect to either Λ or
F is similar in spirit to the estimation procedure of Robertson and Sarafidis (2015) for
genuine panel data models. For the model studied in Robertson and Sarafidis (2015),
quasi-differencing can be performed either with respect to the F or G matrices (where
G depends on the covariance between the factor loadings and the instruments).
Remark 5.6. For the sake of brevity, in the remaining sections we focus on the original setup, but all results can be modified accordingly by redefining the $M(\phi)$ matrix. Note that if we quasi-difference with respect to the factor loadings instead of the factors themselves, the weighting matrix $\Omega$ is in some cases not block-diagonal over the time dimension. Furthermore, if the $\Sigma$ matrix is not diagonal, the optimal $\Omega$ in the second step is not even block-diagonal in the $S$ dimension. As a result, one cannot use the simplified objective function for estimation.
5.3.3 Unbalanced samples
The quasi-differencing approach can be extended to allow the number of time-series observations $T_s$ to be cohort-specific, such that cohort $s$ is observed only at time periods $t\in\mathcal{T}_s$, with $T_s = |\mathcal{T}_s|$. This extension is important from an empirical point of view, as some cohorts can disappear if the time span is sufficiently long, or if irregularly spaced surveys from different countries are used, as in McKenzie (2001). The total number of cross-sectional averages is $TS' = \sum_{s=1}^{S}T_s$, which in the case of a balanced panel equals $ST$. We first define
$$P = \operatorname{diag}(P_1,\dots,P_S), \qquad T = \left|\bigcup_{s=1}^{S}\mathcal{T}_s\right|,$$
where $T$ is the total length of the time series, while each $P_s$ is a $[T_s\times T]$ group-specific “selection matrix” of ones and zeros that equals $I_T$ if the panel is balanced. Using this notation, the DGP in stacked notation can be expressed as
$$y = P\operatorname{vec}(F\Lambda') + X\theta + u, \tag{5.16}$$
where $X$ is now of dimension $[TS'\times K]$. Assuming that the last $L$ periods are observed for all cohorts, the QD matrix $M(\phi)$ can in this case be written as
$$M(\phi) = R\left(I_S\otimes(I_{T-L},\Phi)\right)P', \tag{5.17}$$
where $R$ is a $[(TS' - SL)\times(T-L)S]$ block-diagonal selection matrix that selects only those rows of $\left(I_S\otimes(I_{T-L},\Phi)\right)P'$ that are available to the researcher.
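The selection matrices $P_s$ are simply rows of $I_T$. The Python/NumPy sketch below builds the block-diagonal $P$ for a hypothetical observation pattern (periods coded from 0; the helper name is ours) and confirms that $P\operatorname{vec}(F\Lambda')$ keeps $f_t'\lambda_s$ only for the observed pairs $(s,t)$.

```python
import numpy as np


def selection_matrices(periods, T):
    """P = diag(P_1, ..., P_S), where P_s is the [T_s x T] 0/1 matrix of rows of I_T
    for the periods in periods[s] (taken in increasing order)."""
    blocks = [np.eye(T)[sorted(Ts)] for Ts in periods]
    P = np.zeros((sum(b.shape[0] for b in blocks), T * len(blocks)))
    r = 0
    for s, B in enumerate(blocks):
        P[r:r + B.shape[0], s * T:(s + 1) * T] = B
        r += B.shape[0]
    return P


T = 4
periods = [{0, 1, 2, 3}, {1, 2, 3}, {0, 2, 3}]   # hypothetical observation pattern
F = np.arange(1.0, T + 1).reshape(-1, 1)
Lam = np.array([[1.0], [2.0], [3.0]])
P = selection_matrices(periods, T)
print(P.shape)                                   # (10, 12): TS' = 10 rows, ST = 12 columns
print(P @ (F @ Lam.T).reshape(-1, order="F"))    # f_t * lambda_s for observed (s, t) only
```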
In general, observations other than the last $L$ can be used for normalisation. The general necessary condition for the normalisation to be feasible is
$$T_{min} = \left|\bigcap_{s=1}^{S}\mathcal{T}_s\right|\ge L, \tag{5.18}$$
which is obviously satisfied if the last $L$ periods are observed for all cohorts. Furthermore, we define the set
$$\mathcal{S}^* = \left\{\{i,j,\dots,k\}\subseteq\{1,\dots,S\} : \operatorname{rk}(\lambda_i,\lambda_j,\dots,\lambda_k) = L\right\}.$$
In other words, the set $\mathcal{S}^*$ contains all subsets of cohorts whose factor loadings span an $L$-dimensional space. Using this definition, we can formulate the identifying restriction as follows:
$$\forall t\in\bigcup_{s=1}^{S}\mathcal{T}_s,\ \exists\,\mathcal{S}_t\in\mathcal{S}^*\ \text{s.t.}\ \forall s\in\mathcal{S}_t: t\in\mathcal{T}_s,\ |\mathcal{S}_t|\ge L. \tag{5.19}$$
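Both conditions are easy to check for a given observation pattern and given loadings. The Python/NumPy sketch below uses our reading of (5.19): for every observed period, the loadings of all cohorts observed in that period have rank $L$ (adding rows cannot lower the rank, so this is equivalent to the existence of a spanning subset). The helper names and the example pattern are hypothetical.

```python
import numpy as np


def normalisation_feasible(periods, L):
    """Condition (5.18): at least L periods are observed for every cohort."""
    return len(set.intersection(*map(set, periods))) >= L


def identifying_restriction(periods, Lam):
    """Condition (5.19), read as: for every observed period t, the loadings of the
    cohorts observed at t span an L-dimensional space."""
    L = Lam.shape[1]
    for t in set.union(*map(set, periods)):
        observed = [s for s, Ts in enumerate(periods) if t in Ts]
        if np.linalg.matrix_rank(Lam[observed]) < L:
            return False
    return True


periods = [{0, 1, 2, 3}, {1, 2, 3}, {1, 2, 3}]   # period 0 observed for the first cohort only
Lam = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
print(normalisation_feasible(periods, L=2))            # True: periods 1, 2, 3 are common
print(identifying_restriction(periods, Lam))           # False: one loading vector at t = 0
print(identifying_restriction(periods, np.ones((3, 1))))  # True with a single factor
```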
Combining these assumptions, we can rewrite Assumption (A.5) as:
(A.5) a) $\sum_{s=1}^{S}T_s - (T-L+S)L > K$, with $L_0$ being the true number of factors with non-zero mean factor loadings. b) $\forall t\in\bigcup_{s=1}^{S}\mathcal{T}_s$, $\exists\,\mathcal{S}_t\in\mathcal{S}^*$ s.t. $\forall s\in\mathcal{S}_t: t\in\mathcal{T}_s$, $|\mathcal{S}_t|\ge L_0$. c) $\left|\bigcap_{s=1}^{S}\mathcal{T}_s\right|\ge L_0$ and $F_{L\times L}^{-1}$ exists. d) $F$ and $\Lambda$ are deterministic matrices.
In this case, $F_{L\times L}$ denotes some $[L\times L]$ block of $F$ (not necessarily the last block) that is used for normalisation.
5.3.4 Dynamic models
So far we have assumed that the vector of individual-specific regressors $z_{i,t}$ does not contain any lags of $y_{i,t}$. If $p$ lags of $y_{i,t}$ are included among the regressors, the previous results need to be adjusted. Note that, unlike in genuine panel data models, the mere presence of $y_{i,t-1},\dots,y_{i,t-p}$ does not violate the consistency of the non-linear GMM estimator for fixed $T$; consequently, one does not have to use lagged values of $y_{i,t}$ as instruments. The fact that under Type I asymptotics dynamic pseudo panel data estimators have no “Nickell” bias (Nickell (1981)) is well documented in the literature (e.g. McKenzie (2004) and Verbeek and Vella (2005)).
For most pseudo panel datasets no information regarding the history of any individual $i$ is observed; hence the historical averages $(\bar y^*_{s,t-1},\dots,\bar y^*_{s,t-p})$ are not observed either.¹ The following modified Assumption (A.2) is sufficient to allow for the use of the observed quantities $\tilde y_{s,t} = (\bar y_{s,t-1},\dots,\bar y_{s,t-p})'$ instead of their unobserved counterparts $\tilde y^*_{s,t} = (\bar y^*_{s,t-1},\dots,\bar y^*_{s,t-p})'$:
(A.2) $u_{i,t}$ are i.h.d. with finite $2+\delta$ moment, for $\delta>0$, such that $\sqrt{N_{s,t}}\left(\bar u_{s,t} + (\tilde y^*_{s,t} - \tilde y_{s,t})'\alpha_0\right)\xrightarrow{d} N(0,\sigma^2_{s,t})$ jointly for all $s$ and $t\ge p$, with $0<\min\sigma^2_{s,t}\le\max\sigma^2_{s,t}<\infty$.
¹Here $\bar y^*_{s,t-p} = (1/N_{s,t})\sum_{i\in I_{s,t}} y_{i,t-p}$, and we use $*$ to distinguish between averages of the lagged dependent variables and lagged averages of the dependent variables themselves, i.e. $(1/N_{s,t})\sum_{i\in I_{s,t}} y_{i,t-p}$ vs. $(1/N_{s,t-p})\sum_{i\in I_{s,t-p}} y_{i,t-p}$.
Here $\alpha_0$ is a $[p\times 1]$ vector of the corresponding true coefficients of the lagged dependent variables. Due to the non-zero covariance between $\bar u_{s,t-1}$ and $\bar y_{s,t-1}$ (for $p = 1$, and analogously for $p > 1$), the variance-covariance matrix $\Sigma$ has a block tri-diagonal structure (for further details, see Assumption (2c) of I2008 or Theorem 2 of McKenzie (2004)).
Before considering the dynamic model for potentially unbalanced datasets in greater detail, we define the extended sets
$$\mathcal{T}_s^* = \mathcal{T}_s\cup\mathcal{T}_s^+, \qquad s = 1,\dots,S,$$
where $\mathcal{T}_s^+$ is the set of indices of observed “pre-sample” periods. For balanced samples, this definition implies that in total $T+p$ periods are observed (so that timing starts at $-p+1$). Using this definition, Condition (5.18) can be extended to allow for dynamics in the model:
$$\exists\,\mathcal{A}^{(l)}\subset\left(\bigcap_{s=1}^{S}\mathcal{T}_s^*\right)\ \text{s.t.}\ \left|\mathcal{A}^{(l)}\right| > p, \qquad l = 1,\dots,L, \tag{5.20}$$
where each set $\mathcal{A}^{(l)}$ is distinct and connected. In other words, it is possible to find $L$ distinct time periods such that for all $S$ cohorts the vector $(\bar y_{s,t}, \tilde y_{s,t}', \bar x_{s,t}')'$ is observed. Although only $L$ common subsets $\mathcal{A}^{(l)}$ are needed for normalisation, it is still necessary that for each cohort $s$ the total number of complete observations is at least $L+1$.
Making use of (5.20), we can reformulate Assumption (A.5) as follows:
(A.5) a) $\sum_{s=1}^{S}T_s - (T-L+S)L > K+p$, with $L_0$ being the true number of factors with non-zero mean factor loadings. b) $\forall t\in\bigcup_{s=1}^{S}\mathcal{T}_s$, $\exists\,\mathcal{S}_t\in\mathcal{S}^*$ s.t. $\forall s\in\mathcal{S}_t: t\in\mathcal{T}_s$, $|\mathcal{S}_t|\ge L_0$. c) $\exists\,\mathcal{A}^{(l)}\subset\left(\bigcap_{s=1}^{S}\mathcal{T}_s^*\right)$ s.t. $\left|\mathcal{A}^{(l)}\right| > p$, $l = 1,\dots,L$, and $F_{L\times L}^{-1}$ exists. d) $F$ and $\Lambda$ are deterministic matrices.
Note that for datasets with highly disconnected $\mathcal{T}_s^*$ this condition might be violated, in which case observations for some cohorts need to be discarded for the estimation procedure to be feasible (for any given $L$). Assumption (A.6), on the other hand, can be difficult to satisfy for some dynamic models, as we discuss in Section 5.4.3.
5.4 Testing, model selection and identification
In this section we briefly discuss how hypothesis testing and selection of the number of factors can be performed under the conditions of Proposition 5.2. We then discuss some examples in which one or more of these conditions can be violated; in particular, we discuss the issues of local and global identification.
5.4.1 Testing and model selection
Given that the estimator derived under Assumptions (A.1)-(A.6) has a well-defined asymptotically normal limit, hypothesis testing is conducted in the usual way. First, $t$- and Wald statistics can be used to test parameter restrictions. Second, we can consider a Wald test of the fixed effects model, $H_0: f_t = c\ne 0$ for all $t = 1,\dots,T$ (hence $\phi_t = -1$):
$$W = N\left(\hat\phi_{GMMn} + \imath_{T-1}\right)'\left(\widehat{\operatorname{Avar}}\,\hat\phi_{GMMn}\right)^{-1}\left(\hat\phi_{GMMn} + \imath_{T-1}\right)\xrightarrow{d}\chi^2(T-1). \tag{5.21}$$
Here, the consistent estimator of the variance-covariance matrix of $\hat\phi_{GMMn}$ is given by
$$\widehat{\operatorname{Avar}}\,\hat\phi_{GMMn} = \left(D_\phi(\hat\gamma)'\,\hat\Omega^{1/2}M_{\hat\Omega^{1/2}D_\theta(\hat\gamma)}\hat\Omega^{1/2}D_\phi(\hat\gamma)\right)^{-1}, \tag{5.22}$$
with $\hat\Omega = \left(M(\hat\phi_1)\hat\Sigma(\hat\theta_1)M(\hat\phi_1)'\right)^{-1}$ evaluated at a consistent one-step estimator $\hat\gamma_1$ (e.g. one using the identity weighting matrix $\Omega$). Given that the number of degrees of freedom grows linearly with $T$, some loss of power might occur for moderate values of $T$. Furthermore, in Appendix 5.A.4 we discuss how a Hausman-type test can be performed to test the Fixed Effects assumption.
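For the one-factor case, (5.21) is a quadratic form in $\hat\phi_{GMMn} + \imath_{T-1}$. The Python/NumPy sketch below is illustrative only: the function name is hypothetical, the numbers are placeholders rather than estimates from this chapter, and the resulting statistic would be compared with a $\chi^2(T-1)$ critical value obtained elsewhere.

```python
import numpy as np


def wald_fixed_effects(phi_hat, avar_phi, N):
    """Wald statistic (5.21) for H0: f_t = c for all t (i.e. all phi_t = -1), L = 1.

    phi_hat  : (T-1,) estimate of phi
    avar_phi : [(T-1) x (T-1)] estimate of Avar(phi_hat), e.g. from (5.22)
    """
    d = phi_hat + 1.0                        # phi_hat + iota_{T-1}
    return N * d @ np.linalg.solve(avar_phi, d)


# placeholder inputs for illustration only
W0 = wald_fixed_effects(-np.ones(2), np.diag([2.0, 4.0]), N=100)
W1 = wald_fixed_effects(np.array([-0.9, -1.2]), np.diag([2.0, 4.0]), N=100)
print(W0, W1)   # W0 = 0 exactly at the null value; W1 = 100 * (0.01/2 + 0.04/4)
```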
Similar to any standard GMM estimation problem, it can be shown that under (A.1)-(A.6) the criterion function has a limiting chi-square distribution (provided that $(S-L)(T-L)-K>0$, and modified accordingly for unbalanced and/or dynamic models):
$$J_N(L) = N(y - X\hat\theta)'M(\hat\phi)'\hat\Omega_{opt}M(\hat\phi)(y - X\hat\theta)\xrightarrow{d}\chi^2_{(S-L)(T-L)-K},$$
if $L = L_0$. Here $J_N(L)$ denotes the corresponding “J” statistic for the model with $L$ factors. Testing for the number of unobserved factors can be performed sequentially as in Ahn et al. (2013) or by using a BIC model selection criterion. One starts with $H_0: L_0 = 0$ and, if the null hypothesis is rejected, proceeds with $H_0: L_0 = 1$, and so on. The sequential procedure is motivated by the fact that for $L < L_0$ (for any positive definite $\Omega$)
$$J_N(L) = N(y - X\hat\theta)'M(\hat\phi)'\Omega M(\hat\phi)(y - X\hat\theta)\to\infty.$$
Alternatively, for model selection we use the Schwarz Information Criterion (BIC) of the form
$$S_N(L) = J_N(L) - a\ln(N)\left((T-L)(S-L)-K\right).$$
For further details, see Propositions 2 and 3 of Ahn et al. (2013). However, as we discuss further in the section on global identification, the sequential procedure and BIC can fail to consistently estimate the true number of factors if the global identification assumption is violated, since for some DGPs in this case $J_N(L=0)\xrightarrow{d}\chi^2_{ST-K}$.
Remark 5.7. While $J_N(L)$ is invariant to the particular choice of “$N$”, this is not the case for $S_N(L)$. In the empirical section we consider two BIC criteria, based on $N = \sum_{s=1}^{S}\sum_{t=1}^{T}N_{s,t}$, as e.g. in I2008, and on $N = (1/ST)\sum_{s=1}^{S}\sum_{t=1}^{T}N_{s,t}$.
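Both selection rules can be written in a few lines. In this Python/NumPy sketch (hypothetical function name), the chi-square critical values are passed in by the user, the BIC constant `a` defaults to 1 as an assumption, and the $J$ values in the usage example are placeholders, not results from the chapter. As Remark 5.7 notes, changing `N` affects only the BIC choice.

```python
import numpy as np


def select_number_of_factors(J, crit, S, T, K, N, a=1.0):
    """Sequential J-test and BIC choice of L.

    J    : dict {L: J_N(L)} for candidate L = 0, 1, ...
    crit : dict {L: critical value of a chi2 with (S-L)(T-L)-K degrees of freedom}
    a    : scaling constant in S_N(L) (hypothetical default of 1)
    """
    # sequential rule: the first L whose null hypothesis L_0 = L is not rejected
    L_seq = next((L for L in sorted(J) if J[L] <= crit[L]), None)
    bic = {L: J[L] - a * np.log(N) * ((T - L) * (S - L) - K) for L in J}
    L_bic = min(bic, key=bic.get)
    return L_seq, L_bic


# placeholder J statistics and critical values, for illustration only
J = {0: 500.0, 1: 45.0, 2: 30.0}
crit = {0: 100.0, 1: 100.0, 2: 100.0}
print(select_number_of_factors(J, crit, S=10, T=7, K=2, N=1000))
```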
5.4.2 Identification: Local and Weak
First, we consider the local asymptotic identification condition summarised in (A.4). For the asymptotic distribution to be properly defined, the matrix $M(\phi_0)X_\infty$ should have full column rank $K$. This condition is more restrictive than the analogous condition for the model with only fixed effects, where it is necessary that $MX_\infty$ (with $M = I_S\otimes(I_T - (1/T)\imath_T\imath_T')$) has full column rank. For example, in a model with only one individual-specific regressor, the rank condition for the non-linear GMM estimator is not satisfied if the DGP of the regressor is of the form
$$z_{i,t} = f_t'\lambda_i^z + \varepsilon^z_{i,t}, \qquad \varepsilon^z_{i,t}\sim(0,\sigma^2_{i,t}), \tag{5.23}$$
since as a result
$$\operatorname{plim}_{N\to\infty} M(\phi_0)z = 0_{S(T-L)}.$$
In this example the cross-sectional averages of $z_{i,t}$ asymptotically lie in the space spanned by $F$. In the fixed effects model (i.e. $f_t'\lambda^z_s = \delta_s$) this condition is violated if the mean of $z_{i,t}$ does not vary sufficiently over time. On the other hand, if the factors in the equation for $z_{i,t}$ differ from the corresponding factors $f_t$ in the equation for $y_{i,t}$, then the asymptotic identification condition is satisfied.
Remark 5.8. In the genuine panel data literature it is sometimes assumed that the factor loadings in the equation for $z_{i,t}$ have a different non-zero mean than the corresponding factor loadings in the equation for $y_{i,t}$ (in our case this implies $E[\lambda^{z}_i] \neq \lambda_s \neq 0_L$). In such a case it might be tempting to use the quasi-differencing approach with respect to the cohort factor loadings (as in (5.15)) rather than the factors, to circumvent the problem with local identification. However, in Section 5.4.3 we show that for this setup the global (rather than local) identification assumption is violated.
The full rank Jacobian condition plays an important role in deriving consistency and
asymptotic normality of the GMM estimator. To further illustrate the importance of
this assumption for pseudo panel models, we consider a simplified example with one endogenous regressor.
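Example 5.1 (One Regressor).
$$y_{i,t} = \zeta z_{i,t} + u_{i,t}, \qquad u_{i,t} = \rho\left(z_{i,t} - \frac{\mu_{s,t}}{N_{s,t}^{\gamma}}\right) + \varepsilon_{i,t}, \qquad i \in I_{s,t},$$
$$z_{i,t} = \frac{\mu_{s,t}}{N_{s,t}^{\gamma}} + \varepsilon^{(z)}_{i,t}, \qquad \varepsilon_{i,t},\ \varepsilon^{(z)}_{i,t} \sim \text{iid}(0,1).$$
For simplicity, we assume that $\lambda_s = 0_L$ for all $s$ and that the $\mu_{s,t}$ are deterministic, so that the first-step estimator of $\zeta$ with $\Omega = I_{ST}$ is simply given by
$$\hat\zeta = \frac{\sum_{s=1}^{S}\sum_{t=1}^{T} z_{s,t}y_{s,t}}{\sum_{s=1}^{S}\sum_{t=1}^{T} z_{s,t}^{2}}.$$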
Then we can state the following result:
Proposition 5.3. Let the assumptions in the One Regressor example be satisfied. Then
$$(\hat\zeta - \zeta_0) - \rho \xrightarrow{d} \frac{\sum_{s=1}^{S}\sum_{t=1}^{T}\pi_{s,t}^{-1}\,\tilde z_{s,t}\,\tilde\varepsilon_{s,t}}{\sum_{s=1}^{S}\sum_{t=1}^{T}\pi_{s,t}^{-1}\,\tilde z_{s,t}^{2}}, \qquad \text{if } \gamma \ge 1/2,$$
$$N^{1/2-\gamma}(\hat\zeta - \zeta_0) \xrightarrow{d} \frac{\sum_{s=1}^{S}\sum_{t=1}^{T}\pi_{s,t}^{-(1/2+\gamma)}\,\mu_{s,t}\,\tilde u_{s,t}}{\sum_{s=1}^{S}\sum_{t=1}^{T}\pi_{s,t}^{-2\gamma}\,\mu_{s,t}^{2}}, \qquad \text{if } 0 \le \gamma < 1/2.$$
Proof. In Appendix 5.A.1.
Here $\pi_{s,t} = \lim_{N\to\infty} N_{s,t}/N$, and the limiting random variables with subscript $s,t$ are defined as
$$\tilde z_{s,t} \sim \mathbf{1}(\gamma = 1/2)\,\mu_{s,t} + \mathcal{N}(0,1), \qquad \tilde\varepsilon_{s,t} \sim -\mathbf{1}(\gamma = 1/2)\,\rho\,\mu_{s,t} + \mathcal{N}(0,1), \qquad \tilde u_{s,t} \sim \mathcal{N}(0, 1+\rho^2),$$
where all Gaussian random variables are mutually independent. As a result, under weak-instrument asymptotics (for $\gamma > 1/2$, or for $\gamma = 1/2$ if all $\mu_{s,t} = 0$) the non-scaled estimator of the structural parameter $\zeta$ converges in distribution to a random variable centered at $\zeta_0 + \rho$. On the other hand, for semi-weak (or semi-strong) identification ($0 \le \gamma < 1/2$), the estimator retains an asymptotically (conditionally) normal limit centered at the true value $\zeta_0$, but with the slower rate of convergence $N^{1/2-\gamma}$.
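To see the two regimes of Proposition 5.3 numerically, the following Monte Carlo sketch draws the first-step estimator $\hat\zeta$ repeatedly. Assumptions not in the text: the errors are taken to be standard normal (the example only requires iid(0,1)), which allows each cohort-cell mean to be drawn directly as $\mathcal{N}(0, 1/N_{s,t})$; the cell sizes, the $\mu_{s,t}$ values, the seed and the number of replications are arbitrary choices, and `first_step_zeta` is a hypothetical helper. With all $\mu_{s,t} = 0$ the draws should centre near $\zeta_0 + \rho$; with $\gamma = 0$ and non-zero $\mu_{s,t}$ they should centre near $\zeta_0$.

```python
import numpy as np


def first_step_zeta(zeta0, rho, mu, gamma, n_st, rng):
    """One draw of the first-step estimator of zeta in the One Regressor example.

    mu and n_st are (S, T) arrays of mean shifts mu_st and cell sizes N_st.
    Errors are assumed iid N(0, 1), so each cell mean is N(0, 1/N_st) and is
    drawn directly instead of simulating individuals.
    """
    sd = 1.0 / np.sqrt(n_st)
    e_z = rng.normal(0.0, sd)             # cell mean of eps^(z)
    e_y = rng.normal(0.0, sd)             # cell mean of eps
    z = mu / n_st**gamma + e_z            # cell mean of z_{i,t}
    y = zeta0 * z + rho * e_z + e_y       # u = rho * (z - mu / N^gamma) + eps
    return np.sum(z * y) / np.sum(z**2)   # zeta_hat with Omega = I_ST


rng = np.random.default_rng(1)
S, T, zeta0, rho = 10, 5, 1.0, 0.3
n_st = np.floor(rng.uniform(0.2, 1.0, size=(S, T)) * 1000)  # hypothetical N_st

# Weak identification: all mu_st = 0, draws centred at zeta0 + rho.
weak = [first_step_zeta(zeta0, rho, np.zeros((S, T)), 0.5, n_st, rng)
        for _ in range(500)]
# Strong identification: gamma = 0 and non-zero mu_st, draws centred at zeta0.
mu = 1.0 + rng.normal(size=(S, T))
strong = [first_step_zeta(zeta0, rho, mu, 0.0, n_st, rng) for _ in range(500)]
print(round(np.mean(weak), 2), round(np.mean(strong), 2))
```

The averages only illustrate the centering. In the weak case the spread of the draws does not shrink as the cell sizes grow proportionally, in line with the non-degenerate limit in the first part of the proposition.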
Although the aforementioned example may seem quite restrictive, it is of particular importance if one considers the AR(1) model
$$y_{i,t} = \alpha y_{i,t-1} + u_{i,t}, \qquad i \in I_{s,t}. \tag{5.24}$$
In this case, the only source of variation ($\mu_{s,t}$) comes from the possible mean (effect) non-stationarity of the initial condition
$$y_{i,0} = \frac{\mu_s}{N_{s,t}^{\gamma}} + u_{i,0}. \tag{5.25}$$
Furthermore, as by construction the equation of interest contains an endogenous regressor (as discussed in Section 5.3.4) with coefficient $\rho = -\alpha_0$, the estimator $\hat\alpha$ converges to a random limit centered at zero ($= \alpha_0 + \rho$) for $\gamma > 1/2$.
The important lesson from this example is that in cases where the rank condition can potentially be locally violated, endogeneity starts to play an important role. This is in sharp contrast to the full-rank case. In other words, endogenous regressors matter even under the Type I asymptotic scheme. We investigate the implications of this example for a more detailed model in the Monte Carlo section of this paper.
5.4.3 Identification: Global
In addition to the full rank condition of the Jacobian matrix, the model has to be globally identified, as formally summarised in Assumption (A.6). We start this section with the simplest case, considering the setup as in (5.23) but with
$$E[\lambda^{z}_i] = \kappa\lambda_s, \qquad \forall i \in I_{s,t}, \qquad \kappa \neq 0. \tag{5.26}$$
Then the required global identification condition for the model with one regressor takes the very simple form
$$M(\phi)\left((I_S \otimes F)\operatorname{vec}(\Lambda')\,\left(\kappa(\zeta_0 - \zeta) + 1\right)\right) = 0_{S(T-L)}.$$
This condition is satisfied for $\zeta = \zeta_0 + 1/\kappa$, irrespective of the value of $\phi$, since the scalar $\kappa(\zeta_0 - \zeta) + 1$ is then zero. In this case, if one performs sequential selection of factors, the model with $L = 0$ can be selected, as the $J_N(L = 0)$ statistic for this model has a non-degenerate chi-squared limit. The corresponding (inconsistent) estimator satisfies $\hat\zeta \xrightarrow{p} \zeta_0 + 1/\kappa$.
Assumption (A.6) is particularly difficult to satisfy for linear dynamic models without any additional regressors. As the next two examples show, a necessary condition for global identification of dynamic models is the presence of regressors (or an initial condition $y_{i,0}$) that cannot be well approximated by the factor structure present in the model itself. To illustrate this point, consider a linear AR(1) model
$$y_{i,t} = \lambda_s'f_t + \alpha y_{i,t-1} + u_{i,t}, \qquad i \in I_{s,t}.$$
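If the model only contains fixed effects, i.e. $f_t = c$ for all $t$ (so that $\lambda_s'c = \delta_s$), and the initial condition is mean-stationary, then
$$y_{i,t-1} = \frac{\delta_s}{1-\alpha_0} + \sum_{j=0}^{\infty}\alpha_0^{j}u_{i,t-1-j}, \qquad i \in I_{s,t}.$$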
Coming back to equation (5.26), we have $\kappa = 1/(1-\alpha_0)$, and thus, irrespective of the value of $M(\phi)$,
$$\operatorname*{plim}_{N\to\infty} M(\phi)(y - X\alpha) = 0_{S(T-L)} \qquad \text{at } \alpha = 1.$$
As a result, whereas for the linear GMM estimator of I2008 the stationary initial condition violates the local identification assumption (A.4) (as in Example 5.1), when the unobserved factors are estimated it also leads to a violation of the global identification assumption (A.6).
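The step from mean-stationarity to $\alpha = 1$ can be made explicit. Taking probability limits of the cohort averages in the fixed effects case above (so that the averaged $u$ terms vanish), with $\lambda_s'c = \delta_s$:

```latex
\begin{align*}
\operatorname*{plim}_{N\to\infty} y_{s,t-1} &= \frac{\delta_s}{1-\alpha_0}, \qquad
\operatorname*{plim}_{N\to\infty} y_{s,t} = \delta_s + \alpha_0\,\frac{\delta_s}{1-\alpha_0} = \frac{\delta_s}{1-\alpha_0},\\
\operatorname*{plim}_{N\to\infty}\left(y_{s,t} - \alpha\,y_{s,t-1}\right)
  &= \frac{(1-\alpha)\,\delta_s}{1-\alpha_0} = 0
  \quad \text{for all } s,t \text{ when } \alpha = 1 .
\end{align*}
```

Since $\zeta_0 + 1/\kappa = \alpha_0 + (1-\alpha_0) = 1$, this is exactly the root $\zeta = \zeta_0 + 1/\kappa$ of the condition derived from (5.26), now in the dynamic setting.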
The presence of unobserved factors is in general not sufficient for global identification. To illustrate this, we consider the process for $y_{i,t}$ with “infinite” initialisation at “$-\infty$”:
$$y_{i,t-1} = \lambda_s'\left(\sum_{j=0}^{\infty}\alpha_0^{j}f_{t-1-j}\right) + \sum_{j=0}^{\infty}\alpha_0^{j}u_{i,t-1-j} = \lambda_s'f^{*}_{t-1} + \sum_{j=0}^{\infty}\alpha_0^{j}u_{i,t-1-j}.$$
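Defining $F^{*} = (f^{*}_0, \ldots, f^{*}_{T-1})'$, the global identification condition can then be formulated in the following way:
$$\operatorname*{plim}_{N\to\infty} M(\phi)(y - X\alpha) = M(\phi)\operatorname{vec}\left(((\alpha_0 - \alpha)F^{*} + F)\Lambda'\right) = 0_{S(T-L)}.$$
If we can further assume that the appropriate $[L\times L]$ block of the matrix $\tilde F = (\alpha_0 - \alpha)F^{*} + F$ is invertible, then each parameter value from the set
$$\Gamma = \left\{\gamma = (\alpha, \phi')' \in \mathbb{R}^{(T-L)L+1} : \alpha \in \mathbb{R},\ \Phi = -\tilde F_{[(T-L)\times L]}\,\tilde F_{[L\times L]}^{-1}\right\}$$
satisfies the moment conditions.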
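The mechanism behind $\Gamma$ can be checked numerically. The sketch below builds a quasi-differencing matrix from $\tilde F$ for several values of $\alpha$ and verifies that the moment vector $M(\phi)\operatorname{vec}(\tilde F\Lambda')$ is zero each time. Assumptions: the invertible $[L\times L]$ block is taken to be the first $L$ rows of $\tilde F$, $M(\phi)$ is taken as $I_S \otimes [\Phi, I_{T-L}]$, vec stacks the columns of the $[T\times S]$ matrix, and the factors, loadings and $\alpha_0$ are random placeholders. These conventions are illustrative and not necessarily the exact normalisation used earlier in the paper; `quasi_difference_matrix` is a hypothetical helper.

```python
import numpy as np


def quasi_difference_matrix(F_tilde):
    """Psi = [Phi, I_{T-L}] with Phi = -F2 F1^{-1}, so that Psi @ F_tilde = 0.

    Assumed normalisation: the first L rows of F_tilde form the invertible
    [L x L] block F1 and the last T - L rows form F2.
    """
    T, L = F_tilde.shape
    F1, F2 = F_tilde[:L], F_tilde[L:]
    Phi = -F2 @ np.linalg.inv(F1)
    return np.hstack([Phi, np.eye(T - L)])


rng = np.random.default_rng(0)
S, T, L, alpha0, burn = 10, 5, 2, 0.5, 200
f = 1.0 + rng.normal(size=(burn + T + 1, L))  # f_t with a long pre-sample
f_star = np.zeros_like(f)                     # f*_t = sum_j alpha0^j f_{t-j}
for t in range(len(f)):
    f_star[t] = f[t] + (alpha0 * f_star[t - 1] if t > 0 else 0.0)
F = f[-T:]                   # rows f_1, ..., f_T
F_star = f_star[-T - 1:-1]   # rows f*_0, ..., f*_{T-1}
Lam = rng.uniform(size=(S, L))                # cohort loadings, [S x L]

for alpha in (-0.5, 0.0, 0.9, 1.0):
    F_tilde = (alpha0 - alpha) * F_star + F
    M = np.kron(np.eye(S), quasi_difference_matrix(F_tilde))  # I_S kron Psi
    moments = M @ (F_tilde @ Lam.T).flatten(order="F")        # vec(F~ Lam')
    print(alpha, np.abs(moments).max())
```

Every value of $\alpha$ paired with its own $\Phi$ sets the moments to (numerical) zero, which is precisely the failure of global identification described above.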
Based on these two examples, we can see that a necessary condition for global identi-
fication of dynamic models is the presence of regressors (or initial condition yi,0) that
cannot be well approximated by the factor structure present in the model itself. Note
that the non-linear dynamic pseudo panel data models as studied by e.g. Antman
and McKenzie (2007b) or more general models with regressors can still be globally
identified.
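Finally, we can obtain a similar result if we consider the example in (5.26) with quasi-differencing with respect to factor loadings rather than factors:
$$M(\phi)\operatorname{vec}\left(F\left(\Lambda + (\zeta_0 - \zeta)\Lambda^{(z)}\right)'\right) = 0_{T(S-L)}, \quad \forall \gamma \in \Gamma_\Lambda, \qquad \tilde\Lambda = (\zeta_0 - \zeta)\Lambda^{(z)} + \Lambda,$$
$$\Gamma_\Lambda = \left\{\gamma = (\zeta, \phi')' \in \mathbb{R}^{(S-L)L+1} : \zeta \in \mathbb{R},\ \Phi = -\tilde\Lambda_{[(S-L)\times L]}\,\tilde\Lambda_{[L\times L]}^{-1}\right\},$$
even if $E[\lambda^{(z)}_i] \neq \lambda_s \neq 0_L$. Although locally this transformation provides identification, the global identification condition is violated.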
5.5 Simulation study
5.5.1 Setup
The main goal of this Monte Carlo study is to investigate the effects of possibly endogenous regressors in nearly singular designs with factors. By doing so, we extend the literature to models with unobserved factors and to cases where the asymptotic identification assumption might be (locally) violated. As the main focus of this paper is on static pseudo panel data models, we do not investigate the finite sample properties of estimators in dynamic specifications.
The Monte Carlo setup is summarised below.
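$$\begin{aligned}
y_{i,t} &= f_t\lambda_s + \beta w_{s,t} + \zeta z_{i,t} + \varepsilon_{i,t}, \qquad i \in I_{s,t},\\
z_{i,t} &= \mu_{s,t} + \sqrt{1-\sigma^2_\mu}\,\tilde z_{i,t}, \qquad \tilde z_{i,t} \sim \mathcal{N}(0,1), \qquad \mu_{s,t} = 1 + \sigma_\mu\mathcal{N}(0,1),\\
w_{s,t} &\sim \mathcal{N}(0,1), \qquad \bar\lambda_s \sim U(0,1), \qquad \lambda_s \sim \mathcal{N}(\bar\lambda_s, \sigma^2_{\lambda_s}),\\
f_t &= 1 + \sigma_f u^{(f)}_t, \qquad u^{(f)}_t = \alpha_f u^{(f)}_{t-1} + \varepsilon^{(f)}_t, \qquad \varepsilon^{(f)}_t \sim \mathcal{N}(0, 1-\alpha_f^2), \qquad u^{(f)}_0 \sim \mathcal{N}(0,1),\\
\varepsilon_{i,t} &= \rho(z_{i,t} - \mu_{s,t}) + \sqrt{1-\rho^2}\,\sqrt{1-\sigma^2_{f\lambda}}\,\eta_{i,t}, \qquad \eta_{i,t} \sim \mathcal{N}(0,1).
\end{aligned}$$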
Several quantities were normalised to obtain a parameter space that is more “orthogonal” (following suggestions of Kiviet (2007; 2012)):
$$\operatorname{var}(z_{i,t}) = 1, \qquad \operatorname{var}(f_t\lambda_s + \varepsilon_{i,t}) = 1.$$
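The following parameter space is considered:²
$$N \in \{150, 300\}, \quad T = 5, \quad S = 10, \quad \sigma^2_{f\lambda} \in \{0.1, 0.5\}, \quad \rho \in \{0, 0.3\}, \quad \theta_0 = (1, 1)', \quad \sigma^2_\mu \in \{0, 0.05, 0.3\}, \quad \sigma^2_f \in \{0, 0.1\}, \quad \alpha_f = 0.5.$$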
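To ensure that $\operatorname{var}(f_t\lambda_s) = \sigma^2_{f\lambda}$, the variance $\sigma^2_{\lambda_s}$ is set to
$$\sigma^2_{\lambda_s} = \frac{\sigma^2_{f\lambda} - \sigma^2_f\bar\lambda_s^{2}}{\sigma^2_f + 1},$$
which follows from $\operatorname{var}(f_t\lambda_s) = (1+\sigma^2_f)(\bar\lambda_s^{2} + \sigma^2_{\lambda_s}) - \bar\lambda_s^{2}$ for independent $f_t$ and $\lambda_s$ with $E[f_t] = 1$ and $\operatorname{var}(f_t) = \sigma^2_f$.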
²In the preliminary version of this paper, designs with T = 10 were also considered, but given their similarity to the results for T = 5 we present only the latter case.
Note that all $s,t$-specific variables $\lambda_s$, $f_t$, $w_{s,t}$, $\mu_{s,t}$ are simulated in every Monte Carlo replication, so that the limiting distribution of the estimator is only conditionally normal. This is done to emphasize that in Assumptions (A.1)-(A.6) these quantities are assumed to be deterministic only for technical reasons. On the other hand, the $\bar\lambda_s$'s are generated only once in each design to make sure that the GMMl0 estimator (the two-step estimator without any factors, i.e. $L = 0$) is biased in finite samples. However, the results for the other estimators do not change quantitatively or qualitatively if all $\bar\lambda_s = 0$. The $\sigma^2_\mu$ parameter is introduced to control the degree of asymptotic singularity of the GMM Jacobian matrix. Similar to I2008, other distributional assumptions for $\lambda_s$, $\mu_{s,t}$, $w_{s,t}$ can be considered, but the given setup is sufficient for our purposes. $N_{s,t}$ is set to $\lfloor \pi_{s,t}NST \rfloor$, where $\pi_{s,t} \sim U(0,1)$. Note that by generating $\lambda_s$, $f_t$ in each replication we deviate from the theoretical discussion in this paper, but are more in line with the genuine panel data literature and the setup of I2008.
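A compact way to read the design is to simulate one replication of it. The sketch below follows the equations above for a single factor and returns cohort-cell averages; it does not apply any estimator. Assumptions: `simulate_replication` is a hypothetical helper, not code from this study; cell sizes follow the rule $\lfloor \pi_{s,t}NST \rfloor$ as stated in the text, with an added guard against empty cells; $\theta_0 = (\beta_0, \zeta_0)' = (1, 1)'$; and the error term is generated exactly as written in the design.

```python
import numpy as np


def simulate_replication(N, S, T, sigma2_fl, rho, sigma2_mu, sigma2_f, alpha_f,
                         lam_bar, beta=1.0, zeta=1.0, rng=None):
    """One replication of the Monte Carlo design (sketch, one factor).

    lam_bar holds the S cohort means of the loadings, drawn once per design.
    Returns cohort-cell averages y, w, z (each [S x T]) and the cell sizes.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Factor f_t = 1 + sigma_f * u_t with u_t a unit-variance stationary AR(1).
    u = np.empty(T + 1)
    u[0] = rng.normal()
    for t in range(1, T + 1):
        u[t] = alpha_f * u[t - 1] + rng.normal(0.0, np.sqrt(1.0 - alpha_f**2))
    f = 1.0 + np.sqrt(sigma2_f) * u[1:]
    # Loading variance chosen so that var(f_t * lambda_s) = sigma2_fl.
    sigma2_lam = (sigma2_fl - sigma2_f * lam_bar**2) / (sigma2_f + 1.0)
    assert np.all(sigma2_lam >= 0), "needs sigma2_f * lam_bar**2 <= sigma2_fl"
    lam = rng.normal(lam_bar, np.sqrt(sigma2_lam))
    w = rng.normal(size=(S, T))
    mu = 1.0 + np.sqrt(sigma2_mu) * rng.normal(size=(S, T))
    # Cell sizes floor(pi_st * N * S * T); the max(., 1) guard is an addition.
    pi = rng.uniform(size=(S, T))
    n_st = np.maximum(np.floor(pi * N * S * T), 1).astype(int)
    y_bar, z_bar = np.empty((S, T)), np.empty((S, T))
    for s in range(S):
        for t in range(T):
            n = n_st[s, t]
            z = mu[s, t] + np.sqrt(1.0 - sigma2_mu) * rng.normal(size=n)
            eps = (rho * (z - mu[s, t])
                   + np.sqrt(1.0 - rho**2) * np.sqrt(1.0 - sigma2_fl)
                   * rng.normal(size=n))
            y = f[t] * lam[s] + beta * w[s, t] + zeta * z + eps
            y_bar[s, t], z_bar[s, t] = y.mean(), z.mean()
    return y_bar, w, z_bar, n_st


rng = np.random.default_rng(42)
lam_bar = rng.uniform(size=10)  # drawn once per design
y_c, w_c, z_c, n_st = simulate_replication(
    N=150, S=10, T=5, sigma2_fl=0.5, rho=0.3, sigma2_mu=0.05,
    sigma2_f=0.1, alpha_f=0.5, lam_bar=lam_bar, rng=rng)
```

Passing `lam_bar` in from outside mirrors the choice of drawing the loading means once per design while redrawing $\lambda_s$, $f_t$, $w_{s,t}$ and $\mu_{s,t}$ in every replication.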
We consider 48 different Monte Carlo designs in total and, for convenience of further discussion, we divide them into four groups of 12 designs each. We denote the groups by the letters A, B, C and D in Tables 5.5 and 5.6. We denote the two linear GMM estimators that do not allow for any cohort effects ($M = I_{ST}$) or allow only for time-invariant cohort effects ($M = I_S \otimes (I_T - (1/T)\iota_T\iota_T')$) by GMMl0 and GMMl1, respectively (they are obtained as in (5.9), but with the optimal weighting matrix). Furthermore, we use the abbreviations GMMn1, GMMn2 and GMMo to denote the two-step non-linear GMM estimator with $L = L_0 = 1$, the non-linear GMM estimator with $L = 2$ (both solutions to (5.12)) and the GMM estimator based on the BIC selection criterion, respectively. All results are presented for the two-step estimators (where necessary) with the asymptotically optimal weighting matrix $\Omega$ under the assumption that $\sigma^2_{s,t} = \sigma^2$.³,⁴ In this case, as an estimator of $\sigma^2$ we use
$$\hat\sigma^2 = \frac{1}{ST}\sum_{t=1}^{T}\sum_{s=1}^{S}\hat\sigma^2_{s,t},$$
with $\hat\sigma^2_{s,t}$ defined in (5.14). In this section, we discuss the results for the $\zeta$ parameter only; results for $\beta$ are available from the author upon request.
Remark 5.9. Note that, for the given setup, the GMMl0 estimator is always inconsistent and biased, where the latter is due to $\bar\lambda_s$ being non-zero. The GMMl1 estimator, on the other hand, is always unbiased but inconsistent if $\sigma^2_f \neq 0$. In this paper, we do not consider any designs where, for $\sigma^2_f \neq 0$ and $\rho = 0$, the GMMl1 estimator is both inconsistent and biased. This is mainly due to the impossibility of controlling the relative variance ratio, as in that case $\operatorname{var}(f_t\lambda_s)$ would have to vary over time and/or cohorts. As this possibility is not desirable for a fair comparison, we leave the analysis of this case for future research.
³Note that under these assumptions all linear GMM estimators are one-step efficient and we use this fact in estimation, whereas to obtain the non-linear GMM estimators we perform estimation in two steps, as the optimal weighting matrix depends on unknown parameters.
⁴For the GMM estimators with factors, we use the GMMl1 estimator as the starting value for θ and a vector of zeros for the factors. Based on preliminary simulations, the results were not found to be sensitive to the starting values.
5.5.2 Results: Estimation
In this section we summarise the bias and RMSE properties of the estimators as presented in Table 5.5. Firstly, we discuss the results for the GMMl0 estimator. As argued in the setup of this Monte Carlo study, the designs considered ensure that GMMl0 is biased in finite samples. Unbiasedness of the estimator can easily be obtained by fixing $\bar\lambda_s = 0$ (or any other constant if an intercept is included). The value of the RMSE is mostly driven by the squared bias, as can be expected given the minimum variance properties of this estimator.
We now turn our attention to the properties of other estimators in each of the four
subgroups.
(A) In all 12 designs, the GMMl1 estimator does not exhibit any visible bias, while the non-linear estimators tend to be slightly positively biased in the first four designs. In general, the results are not surprising, despite the fact that none of the estimators is consistent for $\sigma^2_\mu = 0$. As there is then no variation in $\mu_{s,t}$ and the regressor is exogenous, the estimators are still centered around the true value (recall Example 5.1). The effects of asymptotic non-identification, on the other hand, show up clearly once we consider the corresponding values of the RMSE. Even the slightest increase in $\sigma^2_\mu$ reduces the RMSE substantially. Furthermore, observe that for $\sigma^2_\mu = 0$ the size of $N$ has no effect on the RMSE, while in the other cases a larger $N$ substantially reduces the RMSE, as can be expected given consistency. Finally, higher values of $\sigma^2_\mu$ lower the RMSE of all estimators, as $\sigma^2_\mu$ has a direct impact on the variance of all GMM estimators.
(B) In this subset of designs, the $z_{i,t}$ regressor is endogenous, with a non-zero correlation coefficient $\rho$. In the first four designs, where all GMM estimators are asymptotically (locally) unidentified, the biases are roughly proportional to $\rho$. The bias is somewhat smaller when a time-varying factor is present, but it remains substantial. The inflated bias translates directly into larger values of the RMSE. As can be expected given the consistency of the GMM estimators, the bias quickly diminishes once $\sigma^2_\mu = 0.05$. However, its effect is still present, as the corresponding values of the RMSE are larger in (B) than in (A).
(C) Compared to the designs in group (A), the designs with $\sigma^2_{f\lambda} = 0.5$ show substantial improvements in terms of both bias and RMSE. While some bias was visible in (A), this is no longer the case here. For $\sigma^2_{f\lambda} = 0.5$ with time-varying factors, there are no designs left in which the RMSE of GMMl1 is lower than the corresponding value for GMMn1. However, in one case the RMSE of GMMo is still lower than that of GMMn1. Furthermore, GMMn2 always has a higher RMSE than the correctly specified GMMn1.
(D) Finally, the results of (D) designs are very similar to those of (B), with conclusions
analogous to the (A)-(C) comparison.
5.5.3 Results: Testing
In this section, inferential properties of the estimators are considered and discussed.
The Wald statistic is used to test the fixed effects assumption, as discussed in Section
5.4.1. To simplify matters, we denote the two Hausman tests by h0 and h1, where h0 is
the test statistic for GMMl0 vs. GMMl1, and h1 tests for GMMl1 vs. GMMn1. Given
that in terms of size and power the Wald test dominates the h1 test (see Table 5.6),
we do not discuss the behaviour of the latter test in greater detail. The h0 test rejects in close to 100% of the Monte Carlo replications in almost all designs.
All test statistics have a nominal size of 5% (with the exception of GMMn2 estimator,
as no asymptotic results for that estimator are available).
(W) In Section 5.4.1 we have mentioned that the Wald test can be used as an alter-
native to the h1 test for the null hypothesis of no time-varying fixed effects. A
quick look at Table 5.6 suggests that for any given Monte Carlo setup the Wald
test is superior in terms of both size and power. Although for low values of $\sigma^2_{f\lambda}$ and $N$ the test is slightly size-distorted (with a maximum of 12%), the distortions disappear quickly, and in all cases its size is closer to the nominal level than that of the h1 test. As for the h1 test, a larger $\sigma^2_\mu$ does not seem to influence the results much, while an increase in $\sigma^2_{f\lambda}$ and/or $N$ brings the empirical size closer to the nominal size. Conclusions regarding power are analogous.
(J-) Results for the “J” test are discussed from left to right in Table 5.6. As can
be expected, based on the “J” test in all cases we reject the model without
any fixed effects. Patterns for empirical rejection frequencies of the “J” test for
GMMl1 estimator are very similar to those of the Wald test as described in the
previous paragraph, though the power of the “J” test is somewhat lower. As the GMMn1 and GMMo estimators are consistent in all designs with $\sigma^2_\mu > 0$, their rejection frequencies should be close to 0.05. However, this is not always exactly the case, as some size distortions (up to 12%) are often visible. In general, the distortions diminish with larger values of $N$, $\sigma^2_{f\lambda}$ and $\sigma^2_\mu$. Note
that asymptotic (local) non-identification has no major influence on inference
and that can be partially explained using the results in Dovonon and Renault
(2009). Observe that for the GMMn2 estimator the “J” test statistic is always
undersized, which is driven by the fact that we fit more than the true number of
factors.
(t-) First of all, we note that results for GMMn1, GMMo and GMMn2 are very similar
to each other and can be ranked in terms of size distortions in the aforementioned
order. Secondly, as can be expected from the asymptotic efficiency perspective
when GMMl1 is consistent, it has better size properties than GMMn1. Moreover,
unlike the non-linear estimator, the linear estimator is obtained in one step, and
is not affected by possible estimation bias of the optimal weighting matrix. On
the other hand, when it is inconsistent, the rejection frequencies slowly approach 1 as $N$, $\sigma^2_\mu$ and/or $\sigma^2_{f\lambda}$ increase. Similarly, the empirical rejection frequencies of GMMn1, GMMo and GMMn2 converge to the nominal size of 5% for larger values of the aforementioned design parameters. As can be expected given the bias results, when none of the GMM estimators is (locally) asymptotically identified, the properties of the t-statistic depend directly on the value of the correlation coefficient $\rho$, so that for $\rho = 0.3$ the empirical rejection frequencies are close to 1. As mentioned previously, these size distortions disappear once $\sigma^2_\mu$ increases, but even for $\sigma^2_\mu = 0.3$ all test statistics are slightly oversized.
5.5.4 Results: Model selection
The results of this section can be found in the last column of Table 5.6 (#L = 1),
where each number indicates the fraction of Monte Carlo replications in which the
correct number of factors was selected ($L_0 = 1$ in this case). Here we adopt a procedure similar to Ahn et al. (2013) and set $a = 0.75/\ln(5)$, while “N” is defined as $N = (1/ST)\sum_{s=1}^{S}\sum_{t=1}^{T}N_{s,t}$. The results based on $N = \sum_{s=1}^{S}\sum_{t=1}^{T}N_{s,t}$ are similar and available from the author upon request. Below, we briefly summarise the findings.
We can see that the BIC based model selection procedure performs well in general.
In 10 out of 48 designs, the proportion of correctly specified L is marginally lower
than 95% while in the majority of cases this is above 98%. The results are not highly
sensitive to the choice of the design parameters, but a few clear trends are still visible.
Firstly, a higher relative weight of the unobserved factor component, as represented by the $\sigma^2_{f\lambda}$ parameter, has a clear positive effect on the model selection procedure. This can be explained by the fact that for smaller values of $\sigma^2_{f\lambda}$ the DGP is quite close to the model without factors, and thus GMMl0 is the preferred estimation procedure. We find it surprising that there are no substantial effects on model selection when $\rho > 0$ and $\sigma^2_\mu = 0$. This contrasts with the findings of the previous two sections, where clear distortions in terms of the bias and the size of the test statistics were visible.
5.6 Empirical illustration: ENEMDU Dataset
5.6.1 The Dataset
Ecuador has experienced a period of rapid growth for the last 10 years. The average real
GDP growth was above 4% with a small positive real GDP growth of 0.5% observed
even in the midst of the global recession in 2009. Furthermore, the country went
through a period of substantial shift in terms of alleviating economic inequality and
poverty. According to World Bank data, the percentage of the population living below the poverty line steadily declined from over 44% in 2004 to 25.6% in 2013. This decline reflects substantial changes in the socioeconomic environment in the country. Finally, the decline in the unemployment rate over the last decade is also clearly visible, see Figure 5.1. Furthermore, even during the global economic downturn the national unemployment rate did not exceed 9%, which is low compared with some developed countries during the same period. The dramatic changes of these main macro
indicators suggest that substantial changes at the micro level also occurred. In this
section, we estimate the labour supply elasticity for working males that are also the
heads of the household. To accommodate possible common shocks, we estimate the
model with cohort interactive effects.
Figure 5.1: National unemployment rate dynamics over 2003-2013. Source: Banco Central del Ecuador.
We use annual data from the National Employment and Unemployment Survey (ENEMDU) collected by the National Institute of Statistics and Census of Ecuador. The dataset contains information at the household level, covering all individuals aged five and above. Because the survey methodology was updated in 2007, and because our estimation method is consistent for a small number of time series observations, we limit our sample to the period 2007-2013. Furthermore, we consider only surveys from the fourth quarter of each year, as it contains the largest number of observations and is also representative of annual observations. This is partially done to ensure that each cohort contains at least 100 individuals.
Like some related studies (e.g. Antman and McKenzie (2007a) and Gonzalez and
Sala (2015)), we study the labour market participation of prime-aged males (26-55)
occupying a single job. We restrict our sample to males who work for at least 30 hours
but no more than 60 hours per week to minimise the number of potential outliers.
Moreover, as we are only interested in the intensive margin (the number of hours
worked) of the labour supply, the observations with a lower number of hours worked
(corresponding to part-time workers) are not of prime interest. A joint study of the
extensive (decision to work full/part time) and intensive margin (the number of hours
worked) is complicated due to the scarcity of the available explanatory variables. To
obtain real rather than nominal income, we deflated individual income using the annual
Consumer Price Index (CPI) at the national level.
Before proceeding with estimation and model specification, we discuss how we define cohorts in our study. Similar to Gonzalez and Sala (2015), we define cohorts solely based on the age of the individual. In total, we construct 10 cohorts of equal age intervals based on individuals born in 1952-1981; therefore, each cohort represents a three-year interval. Alternatively, one could define cohorts based on five-year intervals and/or geographical location. That strategy, on the other hand, would substantially reduce the average number of observations per cohort or the total number of cohorts. This is a common tradeoff faced by applied econometricians when dealing with pseudo panel datasets, see e.g. Verbeek and Vella (2005) and Verbeek (2008). Hence, for simplicity and ease of exposition we consider only results based on three-year intervals. As discussed by I2008, the adequacy of the model as well as the definition of cohorts can be investigated by means of the “J” statistic.
Figure 5.2: (a) Average log hours worked and (b) average log wage, by age. Age of each cohort is defined as the middle point in the interval.
5.6.2 Results
As a basic setup, we are interested in a model of the following form:
$$\log \text{hours}_{i,t} = \gamma\log \text{wage}_{i,t} + \beta'z_{i,t} + \theta'q_{i,t} + u_{i,t}, \qquad E[u_{i,t}] = 0, \qquad i \in I_{s,t}. \tag{5.27}$$
Here $\log \text{hours}_{i,t}$ is the logarithm of the weekly hours worked by individual $i$, while $\log \text{wage}_{i,t}$ is the logarithm of the real hourly wage. Models of similar form have been extensively estimated using genuine panel data methods, see e.g. Ziliak (1997). Furthermore, we assume that the regressors in $z_{i,t}$ are observed by the econometrician, whereas $q_{i,t}$ are unobserved but can be well approximated by
$$q_{i,t} = \Lambda^{(q)}_s f_t + \varepsilon_{i,t}, \qquad \varepsilon_{s,t} \xrightarrow{p} 0_{K_q}.$$
We stress that we do not assume that $E[z_{i,t}q_{i,t}'] = O$ or $E[q_{i,t}\log \text{wage}_{i,t}] = 0_{K_q}$; hence our framework allows for endogeneity. For example, due to the non-availability of consumption data in this dataset, we assume that this endogenous variable is part of $q_{i,t}$ rather than of $z_{i,t}$.
Combining both equations, we obtain a simple model to study the labour supply elasticity at the intensive margin, summarised by the following equation:
$$\log \text{hours}_{i,t} = \lambda_s'f_t + \gamma\log \text{wage}_{i,t} + \beta z_{i,t} + v_{i,t}, \qquad v_{i,t} = u_{i,t} + \theta'\varepsilon_{i,t}, \qquad i \in I_{s,t}, \tag{5.28}$$
where $\lambda_s' = \theta'\Lambda^{(q)}_s$. The only control variable that we include in our model, which is of particular interest on its own (and is available in the dataset), is the total number of individuals under the age of 16 in a given household. By including this variable, we follow some other studies, particularly Peterman (2014), and control for the household composition. The averages of $\log \text{hours}_{i,t}$ and $\log \text{wage}_{i,t}$ are presented in Figure 5.2, while the average values of $z_{i,t}$ and the number of observations per cohort are summarised in Appendix 5.B.
As discussed in Sections 5.4.1 and 5.4.3, the “J” statistic evaluated at the GMMl0
estimator can be used as a possible indication for global non-identification. For this
model specification, the “J” statistic is equal to $J_N(L = 0) = 698.91$, which indicates a clear rejection based on the critical values from $\chi^2(68)$ at any conventional significance level. This provides some indication that global identification failure is not very likely for this dataset.
Variable       GMMl1        GMMn1        GMMn2
log wage      -0.130***    -0.075***    -0.076**
# kids         0.003       -0.008       -0.011
df             58           52           38
J              88.05***     56.97        44.92
BIC1                       -187.35      -133.62
BIC2                        -84.41       -58.39
Wald(FE): 33.874***

Table 5.1: T = 7, S = 10. Results are based on 2-step estimates using the optimal weighting matrix in the second step. Based only on heads of the household. * indicates statistical significance at the 10% level, ** at the 5% level, and *** at the 1% level (tests of γ0 = 0 and β0 = 0). J(GMMl0) = 577.84. BIC1 and BIC2 use $N = \sum_{s=1}^{S}\sum_{t=1}^{T}N_{s,t}$ and $N = (1/ST)\sum_{s=1}^{S}\sum_{t=1}^{T}N_{s,t}$, respectively.
As we can see from the estimation results in Table 5.1, the GMMl1 model specification is rejected based on the “J” statistic.⁵ This conclusion is also confirmed by the Wald test of the null hypothesis that the factor is constant over time. Hence, the assumption that cohorts were not affected by any internal or external shocks is difficult to justify based on the statistical procedures we consider. For GMMl1, the estimated elasticity coefficient of log wage is negative and significant at any conventional significance level. On the other hand, if we allow for cohort interactive effects,⁶ the estimated elasticity coefficient is smaller in magnitude and, for the GMMn2 estimator, does not differ significantly from zero at the 1% significance level. Based on the BIC model selection criteria, the model specification with one factor is preferred. Turning our attention to the estimated coefficient of # kids, we can see that the results differ slightly between estimators, but in all cases the estimates are not significant at any conventional significance level. As a robustness check, in Appendix 5.B.1 we also provide results based on a linear-log specification, with qualitatively similar conclusions. Overall, our results indicate that by not taking into account the possible presence of time-varying factors, researchers can overestimate the magnitude of the labour supply elasticity using pseudo panel data.
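The significance stars attached to the “J” statistics can be cross-checked from χ² tail probabilities. All degrees of freedom involved (68 for $J_N(L = 0)$, and 58, 52 and 38 in Table 5.1) are even, so the survival function has the closed form $P(\chi^2_{2m} > x) = e^{-x/2}\sum_{k=0}^{m-1}(x/2)^k/k!$, and a standard-library sketch suffices; any statistics library's χ² survival function would serve equally well. The J values are those reported in the text and in Table 5.1; nothing is re-estimated, and `chi2_sf_even_df` is a hypothetical helper.

```python
import math


def chi2_sf_even_df(x, df):
    """P(chi2_df > x) for even df via the Poisson-tail identity
    exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!."""
    if df % 2:
        raise ValueError("this closed form requires an even df")
    h = x / 2.0
    term = math.exp(-h)   # k = 0 term
    total = term
    for k in range(1, df // 2):
        term *= h / k     # builds exp(-h) * h^k / k! recursively
        total += term
    return total


# J statistics and degrees of freedom as reported for the empirical model.
reported = {"J_N(L=0)": (698.91, 68), "GMMl1": (88.05, 58),
            "GMMn1": (56.97, 52), "GMMn2": (44.92, 38)}
for name, (J, df) in reported.items():
    print(f"{name}: J = {J}, df = {df}, p-value = {chi2_sf_even_df(J, df):.3g}")
```

The resulting p-values fall below 1% for the no-factor and GMMl1 specifications and well above 10% for GMMn1 and GMMn2, in line with the stars shown in the table.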
Although methodologically our study is simpler and spans a shorter (and different)
time period, we can compare our results with those in Gonzalez and Sala (2015). They
found that for some Latin American countries, particularly Paraguay, the estimate
of the labour supply elasticity is strongly negative, while for others, it was found to
be positive (Argentina, and after sample restriction, Uruguay).7 Our estimate of the
elasticity coefficient places Ecuador closer to countries like Paraguay than to Argentina
or Chile. This conclusion can also be partially supported by some relatively similar
macroeconomic indicators (e.g. GDP per capita) in both countries and relatively long
average hours worked.
⁵The results based on the one-step estimator that assumes σ²_{s,t} = σ² are qualitatively and quantitatively similar.
⁶Unlike in the Monte Carlo study, where only one set of starting values based on the FE estimator was used, in this section we use up to 100 random starting values, uniformly distributed on [−10; 10], for the non-linear estimators to make sure that the global minimum of the objective function is selected.
⁷This result holds irrespective of whether a log-log or linear-log specification is used.
5.7 Conclusions
In this paper, we have studied the properties of available estimation techniques for
linear pseudo panel data models. We have extended the pseudo panel data literature
to models with possible cohort interactive effects. To overcome inconsistency of the
usual FE estimator, we have introduced the approach of Ahn et al. (2013) to pseudo
panel data models. The consistency and conditional asymptotic normality of the new estimator were proved for pseudo panels with a fixed number of time series observations
and cohorts. Furthermore, we have discussed the estimation and identification for
datasets with a cohort-specific number of time observations.
Results from the extensive Monte Carlo study suggest that the estimator that accounts
for the multiplicative structure of the cohort effects has good finite sample properties
for small values of S and T . The results, however, can be sensitive to the relative
importance of the unobserved factors in the total error component structure.
As an empirical illustration, we have studied labour supply elasticity based on data
from Ecuador. In our analysis we have found that the model with multiplicative error
structure provides a better fit to the data than its counterpart with fixed effects. More
importantly, we have found that using our estimation technique, the estimated labour
supply elasticity measure is smaller in absolute value in comparison to the conventional
cohort fixed effects estimation strategy.
As thoroughly discussed by McKenzie (2004) and Verbeek (2008), different types of asymptotic approximations are available for pseudo panel data models, depending on their dimensions. In this paper, we have mainly investigated the effects of error terms with a multifactor structure, assuming that the number of cohorts and the time dimension are fixed. This assumption is only sensible for models with a limited number of cohorts but a large number of observations per cohort. Given the limited scope of this paper, we leave a rigorous analysis of other asymptotic schemes for models with a multifactor error structure for future research.
5.A Theoretical results
5.A.1 Proofs
Proof of Proposition 5.1.
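The result of this proposition follows directly, given that the DGP is
$$y = \operatorname{vec}(F\Lambda') + X\theta_0 + u.$$
Plugging this expression for $y$ into the formula for $\hat\theta_{GMMl}$ gives
$$\begin{aligned}
\hat\theta_{GMMl} - \theta_0 &= (X'M\Omega MX)^{-1}X'M\Omega M\left(\operatorname{vec}(F\Lambda') + u\right)\\
&= (X'M\Omega MX)^{-1}X'M\Omega M\operatorname{vec}(F\Lambda') + o_P(1)\\
&= (X_\infty'M\Omega MX_\infty)^{-1}X_\infty'M\Omega M\operatorname{vec}(F\Lambda') + o_P(1).
\end{aligned}$$
Here the second line follows because, by Assumption (A.2), $\operatorname*{plim}_{N\to\infty} u = 0_{ST}$. Finally, using the notation $\operatorname*{plim}_{N\to\infty} X = X_\infty$, we obtain the final result.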
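A minimal numerical sketch of Proposition 5.1, under assumptions that are not part of the chapter: one regressor, a single factor, the cohort fixed-effects transformation M = I_S ⊗ (I_T − (1/T)ı_Tı_T′) used in Section 5.A.4, the weighting matrix Ω = I, and cohort-mean noise of order N^{−1/2}. When the regressor is correlated with the factor component, the estimation error of the linear (FE) estimator should settle at the non-zero limit (X∞′MΩMX∞)^{−1}X∞′MΩM vec(FΛ′) instead of vanishing. All names and numerical values are hypothetical.

```python
import numpy as np

# Hypothetical design: S cohorts, T periods, L factors, one regressor (K = 1).
S, T, L = 10, 5, 1
theta0 = 1.0
rng = np.random.default_rng(42)

F = rng.normal(size=(T, L))                          # factors f_t
Lam = rng.normal(size=(S, L))                        # cohort loadings lambda_s
common = (F @ Lam.T).reshape(-1, order="F")          # vec(F Lambda'), ordered (s-1)T + t
X_inf = (common + rng.normal(size=S * T))[:, None]   # regressor correlated with the factor part

M = np.kron(np.eye(S), np.eye(T) - np.ones((T, T)) / T)  # cohort fixed-effects transformation
Omega = np.eye(S * T)                                # identity weighting, for simplicity


def gmm_linear(y, X):
    """Linear GMM (FE) estimator (X' M Omega M X)^{-1} X' M Omega M y."""
    A = X.T @ M @ Omega @ M
    return np.linalg.solve(A @ X, A @ y)


def plim_bias(X):
    """Limit of theta_hat - theta0 implied by Proposition 5.1, evaluated at X."""
    A = X.T @ M @ Omega @ M
    return np.linalg.solve(A @ X, A @ common)


# Cohort means based on N observations per cell: u and X - X_inf are O_P(N^{-1/2}).
N = 1_000_000
u = rng.normal(scale=1 / np.sqrt(N), size=S * T)
X = X_inf + rng.normal(scale=1 / np.sqrt(N), size=(S * T, 1))
y = common + X[:, 0] * theta0 + u

theta_hat = gmm_linear(y, X)
print("theta_hat - theta0:", theta_hat[0] - theta0)
print("predicted limit:   ", plim_bias(X_inf)[0])
```

Increasing N only shrinks the gap between the two printed numbers; it does not move the estimation error towards zero, which is the inconsistency the proposition describes.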
Proof of Proposition 5.3.
The proof of this proposition is a straightforward modification of the proof for a simple
IV estimator.
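CASE I: $\gamma \geq 0.5$:
$$\begin{aligned}
\hat\zeta - \zeta_0 - \rho
&= \frac{\sum_{s=1}^{S}\sum_{t=1}^{T} z_{s,t}\left(\varepsilon_{s,t} - \rho\,\mu_{s,t}/N_{s,t}^{\gamma}\right)}{\sum_{s=1}^{S}\sum_{t=1}^{T} z_{s,t}^{2}}
= \frac{N}{N}\,\frac{\sum_{s=1}^{S}\sum_{t=1}^{T} z_{s,t}\left(\varepsilon_{s,t} - \rho\,\mu_{s,t}/N_{s,t}^{\gamma}\right)}{\sum_{s=1}^{S}\sum_{t=1}^{T} z_{s,t}^{2}}\\
&= \frac{\sum_{s=1}^{S}\sum_{t=1}^{T} (N/N_{s,t})\left(\sqrt{N_{s,t}}\,z_{s,t}\right)\left(\sqrt{N_{s,t}}\,\varepsilon_{s,t} - \rho\,\mu_{s,t}/N_{s,t}^{\gamma-0.5}\right)}{\sum_{s=1}^{S}\sum_{t=1}^{T} (N/N_{s,t})\left(\sqrt{N_{s,t}}\,z_{s,t}\right)^{2}}.
\end{aligned}$$
From here the desired result follows given that
$$N/N_{s,t} \to \pi_{s,t}^{-1},\qquad \sqrt{N_{s,t}}\,z_{s,t} \xrightarrow{d} z_{s,t},\qquad \sqrt{N_{s,t}}\,\varepsilon_{s,t} - \rho\,\mu_{s,t}/N_{s,t}^{\gamma-0.5} \xrightarrow{d} \varepsilon_{s,t},$$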
as we assume that all idiosyncratic components are i.i.d. and hence the usual CLT
applies.
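CASE II: $\gamma \in [0; 0.5)$:
$$\begin{aligned}
N^{1/2-\gamma}(\hat\zeta - \zeta_0) &= \frac{N^{1/2+\gamma}}{N^{2\gamma}}\,\frac{\sum_{s=1}^{S}\sum_{t=1}^{T} z_{s,t}u_{s,t}}{\sum_{s=1}^{S}\sum_{t=1}^{T} z_{s,t}^{2}}\\
&= \frac{\sum_{s=1}^{S}\sum_{t=1}^{T} (N/N_{s,t})^{1/2+\gamma}\left(N_{s,t}^{\gamma}z_{s,t}\right)\left(\sqrt{N_{s,t}}\,u_{s,t}\right)}{\sum_{s=1}^{S}\sum_{t=1}^{T} (N/N_{s,t})^{2\gamma}\left(N_{s,t}^{\gamma}z_{s,t}\right)^{2}}.
\end{aligned}$$
For the denominator, by Slutsky's theorem,
$$N_{s,t}^{\gamma}z_{s,t} = \mu_{s,t} + N_{s,t}^{\gamma-1/2}\left(\sqrt{N_{s,t}}\,\varepsilon^{(z)}_{s,t}\right) \xrightarrow{p} \mu_{s,t}.$$
For the numerator, the standard CLT for i.i.d. data applies:
$$\sqrt{N_{s,t}}\,u_{s,t} = \rho\sqrt{N_{s,t}}\,\varepsilon^{(z)}_{s,t} + \sqrt{N_{s,t}}\,\varepsilon_{s,t} \xrightarrow{d} \rho\,\mathcal{N}(0,1) + \mathcal{N}(0,1).$$
The desired result follows by combining the results for the numerator and the denominator:
$$N^{1/2-\gamma}(\hat\zeta - \zeta_0) \xrightarrow{d} \frac{\sum_{s=1}^{S}\sum_{t=1}^{T} \pi_{s,t}^{-(0.5+\gamma)}\mu_{s,t}u_{s,t}}{\sum_{s=1}^{S}\sum_{t=1}^{T} \pi_{s,t}^{-2\gamma}\mu_{s,t}^{2}},$$
with $u_{s,t} \sim \mathcal{N}(0, 1+\rho^2)$.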
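The two cases can be visualised with a small Monte Carlo experiment. The data-generating process below is read off the proof rather than copied from the statement of Proposition 5.3, so treat it as an assumption: $z_{s,t} = \mu_{s,t}/N_{s,t}^{\gamma} + \varepsilon^{(z)}_{s,t}$, $u_{s,t} = \rho\,\varepsilon^{(z)}_{s,t} + \varepsilon_{s,t}$, $y_{s,t} = \zeta_0 z_{s,t} + u_{s,t}$ and $\hat\zeta = \sum z_{s,t}y_{s,t}/\sum z_{s,t}^2$, where the cell means $\varepsilon^{(z)}_{s,t}$ and $\varepsilon_{s,t}$ are averages of $N_{s,t}$ independent standard normal draws. Equal cell sizes (so all $\pi_{s,t}$ coincide) and the values of $\mu_{s,t}$ are hypothetical choices. For $\gamma$ well above 0.5 the average of $\hat\zeta - \zeta_0$ should stay close to $\rho$, while for $\gamma < 0.5$ it should shrink towards zero.

```python
import numpy as np


def zeta_errors(gamma, n_cell, rho=0.5, zeta0=1.0, S=10, T=5, reps=2000, seed=0):
    """Monte Carlo draws of zeta_hat - zeta0 under the assumed weak-instrument DGP."""
    rng = np.random.default_rng(seed)
    mu = rng.uniform(0.5, 1.5, size=(S, T))            # hypothetical mu_{s,t}, fixed across draws
    sd = 1.0 / np.sqrt(n_cell)                         # std. dev. of a mean of n_cell N(0, 1) draws
    eps_z = rng.normal(0.0, sd, size=(reps, S, T))     # epsilon^{(z)}_{s,t}
    eps = rng.normal(0.0, sd, size=(reps, S, T))       # epsilon_{s,t}, independent of eps_z
    z = mu / n_cell**gamma + eps_z                     # signal shrinks at rate N^{-gamma}
    u = rho * eps_z + eps                              # endogeneity enters through rho
    y = zeta0 * z + u
    zeta_hat = (z * y).sum(axis=(1, 2)) / (z * z).sum(axis=(1, 2))
    return zeta_hat - zeta0


for g in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"gamma = {g:4.2f}: mean of zeta_hat - zeta0 = {zeta_errors(g, 10_000).mean():+.3f}")
```

Running the loop for several cell sizes, rather than a single `n_cell`, is a simple way to see the $N^{1/2-\gamma}$ rate of Case II at work.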
5.A.2 Differentials
If we denote the transformed equations by $u = u(\theta,\phi) = M(\phi)(y - X\theta)$, the objective function can be formulated in the following way:
$$f(\theta,\phi) = \tfrac{1}{2}\,u'\Omega u.$$
Using the product rule for differentials and the symmetry of $\Omega$,
$$df(\theta,\phi) = u'\Omega\,du. \tag{5.29}$$
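Here the first differential of $u$ can be compactly written as
$$\begin{aligned}
du &= (dM(\phi))(y - X\theta) - M(\phi)X\,d\theta\\
&= \left(I_S \otimes (O_{T-L}, d\Phi)\right)(y - X\theta) - M(\phi)X\,d\theta\\
&= \left((y - X\theta)' \otimes I_{S(T-L)}\right)\operatorname{vec}\left(I_S \otimes (O_{T-L}, d\Phi)\right) - M(\phi)X\,d\theta\\
&= \left((y - X\theta)' \otimes I_{S(T-L)}\right)Q\,d\phi - M(\phi)X\,d\theta\\
&= D_\phi\,d\phi + D_\theta\,d\theta,
\end{aligned}$$
with $d\phi = \operatorname{vec}(d\Phi)$, $D_\phi = \left((y - X\theta)' \otimes I_{S(T-L)}\right)Q$ and $D_\theta = -M(\phi)X$.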
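Here the selection matrix $Q$ is of the following form:
$$Q = \left(\left((I_S \otimes K_{TS})(\operatorname{vec} I_S \otimes I_T)\right) \otimes I_{T-L}\right)\begin{pmatrix} O_{[(T-L)^2 \times (T-L)L]} \\ I_{(T-L)L} \end{pmatrix}.$$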
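The form of $Q$ can be checked numerically. The sketch below (helper names are our own) builds the commutation matrix with the convention $K_{m,n}\operatorname{vec}(A) = \operatorname{vec}(A')$ for an $m \times n$ matrix $A$, assembles $Q$ exactly as displayed above, and verifies both $\operatorname{vec}(I_S \otimes (O_{T-L}, d\Phi)) = Q\operatorname{vec}(d\Phi)$ and the step $(dM(\phi))(y - X\theta) = ((y - X\theta)' \otimes I_{S(T-L)})Q\,d\phi$ for small, arbitrary dimensions and a random stand-in for $y - X\theta$.

```python
import numpy as np


def commutation(m, n):
    """Commutation matrix K_{m,n}: K_{m,n} @ vec(A) == vec(A.T) for any m x n matrix A."""
    K = np.zeros((m * n, m * n))
    for i in range(m):
        for j in range(n):
            K[i * n + j, j * m + i] = 1.0
    return K


def vec(A):
    """Column-stacking vec operator."""
    return A.reshape(-1, order="F")


def selection_Q(S, T, L):
    """Q such that vec(I_S kron (O_{T-L}, dPhi)) = Q @ vec(dPhi)."""
    left = np.kron(np.eye(S), commutation(T, S)) @ np.kron(vec(np.eye(S))[:, None], np.eye(T))
    pick = np.vstack([np.zeros(((T - L) ** 2, (T - L) * L)), np.eye((T - L) * L)])
    return np.kron(left, np.eye(T - L)) @ pick


S, T, L = 3, 5, 2                                      # hypothetical small dimensions
rng = np.random.default_rng(0)
dPhi = rng.normal(size=(T - L, L))
dM = np.kron(np.eye(S), np.hstack([np.zeros((T - L, T - L)), dPhi]))
Q = selection_Q(S, T, L)
print(np.allclose(vec(dM), Q @ vec(dPhi)))

e = rng.normal(size=S * T)                             # stands in for y - X theta
D_phi = np.kron(e[None, :], np.eye(S * (T - L))) @ Q   # ((y - X theta)' kron I) Q
print(np.allclose(dM @ e, D_phi @ vec(dPhi)))
```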
For more detailed derivations, see Magnus and Neudecker (2007, p. 56). The second differential of the objective function $f(\theta,\phi)$ is
$$d^2f(\theta,\phi) = (du)'\Omega\,du + u'\Omega\,d^2u. \tag{5.30}$$
Note that the second term is asymptotically negligible ($o_P(1)$) if evaluated at any consistent estimator $\hat\gamma$ of $\gamma = (\theta', \phi')'$. Hence
$$d^2f(\theta,\phi) = (du)'\Omega\,du + o_P(1). \tag{5.31}$$
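Plugging in the value of $du$ (and ignoring the $o_P(1)$ term),
$$\begin{aligned}
d^2f(\theta,\phi) &= (du)'\Omega\,du\\
&= (d\phi)'D_\phi(\gamma)'\Omega D_\phi(\gamma)(d\phi) + (d\theta)'D_\theta(\gamma)'\Omega D_\theta(\gamma)(d\theta)\\
&\quad + 2(d\phi)'D_\phi(\gamma)'\Omega D_\theta(\gamma)(d\theta).
\end{aligned}$$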
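Combining all results, we obtain the formulas for the score $\nabla(\gamma)$ and the Hessian $H(\gamma)$:
$$\nabla(\gamma) = \begin{pmatrix} D_\theta(\gamma)' \\ D_\phi(\gamma)' \end{pmatrix}\Omega u,$$
$$H(\gamma) = \begin{pmatrix} D_\theta(\gamma)'\Omega D_\theta(\gamma) & D_\theta(\gamma)'\Omega D_\phi(\gamma)\\ D_\phi(\gamma)'\Omega D_\theta(\gamma) & D_\phi(\gamma)'\Omega D_\phi(\gamma) \end{pmatrix} = \begin{pmatrix} D_\theta(\gamma)' \\ D_\phi(\gamma)' \end{pmatrix}\Omega\begin{pmatrix} D_\theta(\gamma)' \\ D_\phi(\gamma)' \end{pmatrix}'.$$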
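The score and Hessian above are exactly the ingredients of a Gauss-Newton iteration $\gamma \leftarrow \gamma - H(\gamma)^{-1}\nabla(\gamma)$, since $H(\gamma)$ already drops the $u'\Omega\,d^2u$ term. The sketch below is an illustration under strong assumptions, not the chapter's estimation code: it takes the hypothetical transformation $M(\phi) = I_S \otimes (I_{T-L}, \Phi)$, which has the differential $I_S \otimes (O_{T-L}, d\Phi)$ used above, normalises the factors as $F = (\Phi_0', -I_L)'$ so that $M(\phi_0)$ annihilates $\operatorname{vec}(F\Lambda')$, uses noiseless simulated data and $\Omega = I$, and starts close to the truth. With the $(s-1)T + t$ ordering, $((y - X\theta)' \otimes I_{S(T-L)})Q$ reduces to $E_2' \otimes I_{T-L}$, where $E_2$ holds the last $L$ rows of the $T \times S$ matrix whose columns are the cohort blocks of $y - X\theta$.

```python
import numpy as np

S, T, L, K = 10, 5, 1, 1                    # hypothetical dimensions


def unpack(gamma):
    """Split gamma = (theta', vec(Phi)')' into theta and the (T-L) x L matrix Phi."""
    return gamma[:K], gamma[K:].reshape(T - L, L, order="F")


def residuals(gamma, y, X):
    """u = M(phi)(y - X theta) with the hypothetical M(phi) = I_S kron (I_{T-L}, Phi)."""
    theta, Phi = unpack(gamma)
    E = (y - X @ theta).reshape(T, S, order="F")          # column s holds cohort s
    return (E[: T - L] + Phi @ E[T - L:]).reshape(-1, order="F")


def jacobian(gamma, y, X):
    """Stacked (D_theta, D_phi) from the first differential du."""
    theta, Phi = unpack(gamma)
    E = (y - X @ theta).reshape(T, S, order="F")
    D_phi = np.kron(E[T - L:].T, np.eye(T - L))           # equals ((y - X theta)' kron I) Q here
    M = np.kron(np.eye(S), np.hstack([np.eye(T - L), Phi]))
    D_theta = -M @ X
    return np.hstack([D_theta, D_phi])


def gauss_newton(gamma, y, X, Omega, max_iter=100, tol=1e-12):
    for _ in range(max_iter):
        u = residuals(gamma, y, X)
        D = jacobian(gamma, y, X)
        score = D.T @ Omega @ u                 # nabla(gamma)
        hessian = D.T @ Omega @ D               # H(gamma), u' Omega d^2u term dropped
        step = np.linalg.solve(hessian, score)
        gamma = gamma - step
        if np.max(np.abs(step)) < tol:
            break
    return gamma


rng = np.random.default_rng(1)
theta0 = np.array([1.0])
F1 = rng.normal(size=(T - L, L))
F = np.vstack([F1, -np.eye(L)])                 # normalisation under which Phi0 = F1
Lam = rng.normal(size=(S, L))
X = rng.normal(size=(S * T, K))
y = (F @ Lam.T).reshape(-1, order="F") + X @ theta0   # noiseless data, illustration only
gamma0 = np.concatenate([theta0, F1.reshape(-1, order="F")])

gamma_hat = gauss_newton(gamma0 + 0.1, y, X, np.eye(S * (T - L)))
print(np.round(gamma_hat - gamma0, 8))
```

The objective is not convex in $(\theta, \phi)$, so in applications a single start near the truth is not available; that is why the empirical section relies on many random starting values.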
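Under Assumptions (A.1)-(A.6) the asymptotic distribution is
$$\sqrt{N}(\hat\gamma - \gamma_0) \xrightarrow{d} -\operatorname*{plim}_{N\to\infty}\begin{pmatrix} D_\theta'\Omega D_\theta & D_\theta'\Omega D_\phi\\ D_\phi'\Omega D_\theta & D_\phi'\Omega D_\phi \end{pmatrix}^{-1}\begin{pmatrix} D_\theta' \\ D_\phi' \end{pmatrix}\Omega M(\phi_0)\Sigma^{1/2}\xi.$$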
Furthermore, as in any standard GMM problem, the (conditional) variance of $\sqrt{N}(\hat\gamma - \gamma_0)$ is minimized at
$$\Omega_{opt} = \left(M(\phi_0)\Sigma M(\phi_0)'\right)^{-1}.$$
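Therefore, the asymptotic variance with $\Omega_{opt}$ as the weighting matrix equals
$$\operatorname{Avar}\hat\gamma = \left[\operatorname*{plim}_{N\to\infty}\begin{pmatrix} D_\theta' \\ D_\phi' \end{pmatrix}\left(M(\phi_0)\Sigma M(\phi_0)'\right)^{-1}\begin{pmatrix} D_\theta' \\ D_\phi' \end{pmatrix}'\right]^{-1}.$$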
Finally, the same result holds if evaluated at any weighting matrix $\hat\Omega$ such that $\hat\Omega \xrightarrow{p} \Omega_{opt}$.
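The claim that $\Omega_{opt}$ minimises the (conditional) variance can be illustrated with random matrices. In this hypothetical sketch `G` stands in for the stacked derivative matrix $(D_\theta, D_\phi)$ and `A` for $M(\phi_0)\Sigma M(\phi_0)'$ (taken positive definite here); the code evaluates the sandwich form $H^{-1}G'\Omega A\Omega GH^{-1}$, with $H = G'\Omega G$, at $\Omega = A^{-1}$ and at two other weighting matrices.

```python
import numpy as np


def sandwich(G, A, Omega):
    """Asymptotic variance H^{-1} G' Omega A Omega G H^{-1} with H = G' Omega G."""
    H_inv = np.linalg.inv(G.T @ Omega @ G)
    return H_inv @ G.T @ Omega @ A @ Omega @ G @ H_inv


rng = np.random.default_rng(0)
m, p = 12, 3                               # hypothetical numbers of moments and parameters
G = rng.normal(size=(m, p))                # plays the role of (D_theta, D_phi)
B = rng.normal(size=(m, m))
A = B @ B.T + m * np.eye(m)                # plays the role of M(phi0) Sigma M(phi0)'
C = rng.normal(size=(m, m))
W = C @ C.T + np.eye(m)                    # an arbitrary positive definite weighting matrix

V_opt = sandwich(G, A, np.linalg.inv(A))   # Omega = Omega_opt
V_id = sandwich(G, A, np.eye(m))           # Omega = identity
V_w = sandwich(G, A, W)

# At the optimum the sandwich collapses to (G' A^{-1} G)^{-1} ...
print(np.allclose(V_opt, np.linalg.inv(G.T @ np.linalg.solve(A, G))))
# ... and the other choices differ from it by a positive semi-definite matrix.
for V in (V_id, V_w):
    D = V - V_opt
    print(np.linalg.eigvalsh((D + D.T) / 2).min() >= -1e-10)
```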
5.A.3 Sufficient conditions for FE estimator
For the model with cohort fixed effects we define
$$u_{i,t} \equiv \nu_{i,t} - \delta_s,\qquad i \in I_{s,t}.$$
The following high-level assumptions are sufficient to prove that the linear GMM estimator $\hat\theta_{GMMl}$ is consistent and asymptotically normally distributed.
(FE.1) $N_{s,t} \to \infty$ for all $s,t$; there exists $N \to \infty$ such that $N_{s,t}/N \to \pi_{s,t}$ and $0 < \min \pi_{s,t} < \max \pi_{s,t} < \infty$. $T$ and $S$ are fixed (Type I asymptotics).
(FE.2) $u_{i,t}$ are i.h.d. with finite $2+\delta$ moment, for $\delta > 0$, such that $\sqrt{N_{s,t}}\,u_{s,t} \xrightarrow{d} \mathcal{N}(0, \sigma^2_{s,t})$ jointly for all $s,t$, with $0 < \min \sigma^2_{s,t} \le \max \sigma^2_{s,t} < \infty$.
(FE.3) There exists a unique true value $\theta_0$.
(FE.4) $\operatorname{rk}(MX_\infty) = K$. The matrix $X_\infty \equiv \operatorname*{plim}_{N\to\infty} X$ is deterministic.
(FE.5) $S(T-1) > K$.
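Under these assumptions the asymptotically efficient weighting matrix can be consistently estimated by
$$\hat\Omega_{opt} = \left(M\hat\Sigma M\right)^{+}.$$
Here we use the Moore-Penrose pseudoinverse of $(M\hat\Sigma M)$ because this matrix is singular: its rank equals $\operatorname{rk}(M) = (T-1)S < ST$. The typical $((s-1)T+t)$-th diagonal element of $\hat\Sigma$ is equal to $\hat q^2_{s,t} = (N/N_{s,t})\hat\sigma^2_{s,t}$, where
$$\hat\sigma^2_{s,t} = \frac{1}{N_{s,t}}\sum_{i\in I_{s,t}}\left(y_{i,t} - x_{i,s,t}'\hat\theta_1\right)^2 - \left(\frac{1}{N_{s,t}}\sum_{i\in I_{s,t}}\left(y_{i,t} - x_{i,s,t}'\hat\theta_1\right)\right)^2, \tag{5.32}$$
which is evaluated at some consistent initial estimator $\hat\theta_1$ (e.g. the estimator that replaces $\Omega$ by an identity matrix). Therefore (under Type I asymptotics), if $\hat\theta_1 \xrightarrow{p} \theta_0$, one has $\hat q^2_{s,t} \xrightarrow{p} q^2_{s,t} \equiv \sigma^2_{s,t}/\pi_{s,t}$, where $N_{s,t}/N \to \pi_{s,t}$.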
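A sketch of this feasible weighting matrix, assuming the first-step residuals $y_{i,t} - x_{i,s,t}'\hat\theta_1$ are already available for every cohort-period cell (stored in a hypothetical nested list `resid[s][t]`) and taking $N = \sum_s\sum_t N_{s,t}$ as the normalising sample size, a choice the text leaves open. Cells are ordered as $(s-1)T + t$, matching $M = I_S \otimes (I_T - (1/T)\imath_T\imath_T')$.

```python
import numpy as np


def fe_weighting_matrix(resid, S, T):
    """Return (Omega_hat, M, Sigma_hat) with Omega_hat = (M Sigma_hat M)^+."""
    n = np.array([[len(resid[s][t]) for t in range(T)] for s in range(S)])
    N = n.sum()                                     # assumption: N = total number of observations
    q2 = np.empty(S * T)
    for s in range(S):
        for t in range(T):
            r = np.asarray(resid[s][t], dtype=float)
            sigma2 = np.mean(r**2) - np.mean(r) ** 2    # equation (5.32)
            q2[s * T + t] = (N / n[s, t]) * sigma2      # diagonal element (s-1)T + t, 0-based
    M = np.kron(np.eye(S), np.eye(T) - np.ones((T, T)) / T)
    Sigma = np.diag(q2)
    return np.linalg.pinv(M @ Sigma @ M), M, Sigma


# Usage with simulated residuals and unequal (hypothetical) cell sizes.
rng = np.random.default_rng(0)
S, T = 4, 3
resid = [[rng.normal(size=rng.integers(150, 300)) for _ in range(T)] for _ in range(S)]
Omega_hat, M, Sigma = fe_weighting_matrix(resid, S, T)
print(np.linalg.matrix_rank(M @ Sigma @ M), S * (T - 1))
```

The pseudoinverse is needed precisely because $M\hat\Sigma M$ has rank $S(T-1)$, which the printed line confirms for the simulated cells.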
5.A.4 The Hausman test for fixed effects
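The Hausman test can be used to test for the presence of unobserved factors (as also suggested by Bai (2009) in the context of genuine panels with large N, T). In particular, if $L = L_0 = 1$, then the result
$$H = N(\Delta\hat\theta)'\left(\operatorname{Avar}\hat\theta_{GMMn} - \operatorname{Avar}\hat\theta_{GMMl}\right)^{-1}\Delta\hat\theta \xrightarrow{d} \chi^2(K) \tag{5.33}$$
holds under the null hypothesis that the fixed effects model is correct. Here $\Delta\hat\theta = \hat\theta_{GMMn} - \hat\theta_{GMMl}$, while to estimate the variance-covariance matrices of the estimators we use
$$\widehat{\operatorname{Avar}}\,\hat\theta_{GMMn} = \left(D_\theta(\hat\gamma)'\hat\Omega^{1/2}M_{\hat\Omega^{1/2}D_\phi(\hat\gamma)}\hat\Omega^{1/2}D_\theta(\hat\gamma)\right)^{-1}, \tag{5.34}$$
where $M_A = I - A(A'A)^{-1}A'$ for a matrix $A$ of full column rank, and $\hat\Omega = \left(M(\hat\phi_1)\Sigma(\hat\theta_1)M(\hat\phi_1)'\right)^{-1}$ is evaluated at a consistent first-step estimator $\hat\gamma_1 = (\hat\theta_1', \hat\phi_1')'$ (under the alternative hypothesis). For the fixed effects estimator the variance-covariance matrix is analogously given by
$$\widehat{\operatorname{Avar}}\,\hat\theta_{GMMl} = \left(X'M\left(M\Sigma(\hat\theta_{GMMl,1})M\right)^{+}MX\right)^{-1}, \tag{5.35}$$
for $M = I_S \otimes (I_T - (1/T)\imath_T\imath_T')$. Here $\hat\theta_{GMMl,1}$ is a first-step consistent estimator (under the null hypothesis). In both cases $\Sigma(\cdot)$ can be estimated using the formula in (5.32).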
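Once both estimates and their variance matrices are available, (5.33) is a one-line computation. The sketch below also implements (5.35). The function names and all inputs are hypothetical; note that in finite samples the difference of the two variance estimates need not be positive definite, in which case the statistic can be negative or the linear solve can fail.

```python
import numpy as np


def avar_fe(X, M, Sigma):
    """Equation (5.35): (X' M (M Sigma M)^+ M X)^{-1}."""
    W = np.linalg.pinv(M @ Sigma @ M)
    return np.linalg.inv(X.T @ M @ W @ M @ X)


def hausman(theta_n, theta_l, avar_n, avar_l, N):
    """Equation (5.33): N * d' (Avar_n - Avar_l)^{-1} d with d = theta_n - theta_l."""
    d = np.asarray(theta_n, dtype=float) - np.asarray(theta_l, dtype=float)
    V = np.asarray(avar_n, dtype=float) - np.asarray(avar_l, dtype=float)
    return float(N * d @ np.linalg.solve(V, d))


# FE variance for a small random design (hypothetical dimensions and Sigma).
rng = np.random.default_rng(0)
S, T, K = 4, 3, 2
X = rng.normal(size=(S * T, K))
M = np.kron(np.eye(S), np.eye(T) - np.ones((T, T)) / T)
Sigma = np.diag(rng.uniform(1.0, 2.0, size=S * T))
V_fe = avar_fe(X, M, Sigma)

# Hausman statistic for made-up estimates with K = 2.
H = hausman([1.1, 0.8], [1.0, 1.0], np.diag([2.0, 3.0]), np.eye(2), N=100)
print(H)
```

With $K = 2$, the statistic would be compared with the 5% critical value of the $\chi^2(2)$ distribution, about 5.99.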
5.B The ENEMDU dataset
Table 5.2: Observations per cohort in each year. Here cohort 1 is the oldest and cohort 10 is the youngest.
Year/Cohort    1    2    3    4    5    6    7    8    9   10
2007         264  283  276  308  300  351  322  277  235  191
2008         288  329  383  383  366  353  361  298  267  211
2009         260  317  320  340  335  338  280  241  229  185
2010         318  341  390  417  396  398  390  272  285  202
2011         338  361  339  420  372  356  412  369  321  297
2012         338  359  411  429  420  402  441  387  342  301
2013         240  304  363  437  432  447  525  518  473  467
Table 5.3: The average number of individuals under 16 years of age per household. Here cohort 1 is the oldest and cohort 10 is the youngest.
Year/Cohort     1     2     3     4     5     6     7     8     9    10
2007         1.08  1.30  1.57  1.94  2.32  2.21  2.37  2.27  2.08  1.57
2008         0.92  1.47  1.41  1.87  2.18  2.09  2.46  2.18  2.12  1.70
2009         0.93  1.15  1.42  1.60  1.98  1.99  2.39  2.27  2.18  1.60
2010         0.90  1.08  1.30  1.54  1.86  2.05  2.17  2.18  2.37  1.79
2011         0.78  0.82  1.06  1.25  1.55  1.79  1.94  2.17  2.20  1.93
2012         0.80  0.97  0.97  1.21  1.40  1.86  1.92  2.21  2.21  1.94
2013         0.81  0.86  1.04  1.14  1.28  1.77  1.85  2.23  2.27  2.01
5.B.1 The linear-log specification
$$\text{hours}_{i,t} = \lambda_s'f_t + \gamma\,\text{logwage}_{i,t} + \beta z_{i,t} + u_{i,t},\qquad i \in I_{s,t}. \tag{5.36}$$
In this specification the parameter $\gamma$ no longer has an elasticity interpretation. We summarize the results for this specification in Table 5.4. The results are both quantitatively and qualitatively similar to the ones based on the log-log specification. The estimated coefficient of logwage based on the fixed effects specification is substantially larger in magnitude than its counterpart estimated allowing for time-varying factors.
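Although $\gamma$ in (5.36) is not an elasticity, an implied elasticity can be computed at a chosen reference level of hours, for instance the sample mean $\overline{\text{hours}}$. This standard conversion is shown only for illustration and is not reported in the chapter:
$$\frac{\partial \log \text{hours}_{i,t}}{\partial \log \text{wage}_{i,t}} = \frac{1}{\text{hours}_{i,t}}\,\frac{\partial\,\text{hours}_{i,t}}{\partial\,\text{logwage}_{i,t}} = \frac{\gamma}{\text{hours}_{i,t}} \approx \frac{\gamma}{\overline{\text{hours}}}.$$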
Variable         GMMl1       GMMn1       GMMn2
logwage       -6.21***     -3.45**     -3.37**
# kids            0.18       -0.41       -0.49
df                  58          52          38
J             88.29***       56.25       43.45
BIC1                       -188.07     -135.10
BIC2                        -85.12      -59.86
Wald(FE)                  33.68***
Table 5.4: T = 7, S = 10. Results are based on 2-step estimates using the optimal weighting matrix in the second step, for heads of household only. *, ** and *** indicate statistical significance at the 10%, 5% and 1% level, respectively (for the coefficients, under the null hypotheses γ0 = 0 and β0 = 0). J(GMMl0) = 580.90. BIC1 and BIC2 use $N = \sum_{s=1}^{S}\sum_{t=1}^{T} N_{s,t}$ and $N = (1/ST)\sum_{s=1}^{S}\sum_{t=1}^{T} N_{s,t}$, respectively.
5.C Monte Carlo results
Table 5.5: Estimation results for T = 5, S = 10 and ζ0 = 1; 10000 Monte Carlo replications.
                                         Bias                          RMSE
σ²_fλ ; ρ ; σ²_μ ; N ; σ²_f   L0 LFE1   L1   L2    L      L0 LFE1   L1   L2    L
(A)
.1 ; .00 ; .00 ; 150 ; .0    .53  .00  .04  .02  .03     .54  .16  .21  .26  .22
.1 ; .00 ; .00 ; 150 ; .1    .43  .00  .02  .01  .01     .44  .18  .18  .22  .18
.1 ; .00 ; .00 ; 300 ; .0    .39  .00  .02  .02  .02     .40  .16  .20  .25  .20
.1 ; .00 ; .00 ; 300 ; .1    .48  .00  .01  .00  .01     .48  .21  .18  .21  .17
.1 ; .00 ; .05 ; 150 ; .0    .35  .00  .00  .00  .00     .36  .06  .07  .09  .07
.1 ; .00 ; .05 ; 150 ; .1    .51  .00  .01  .00  .00     .51  .07  .10  .09  .07
.1 ; .00 ; .05 ; 300 ; .0    .47  .00  .00  .00  .00     .48  .04  .05  .06  .05
.1 ; .00 ; .05 ; 300 ; .1    .53  .00  .00  .00  .00     .53  .06  .07  .06  .05
.1 ; .00 ; .30 ; 150 ; .0    .45  .00  .00  .00  .00     .46  .02  .03  .04  .03
.1 ; .00 ; .30 ; 150 ; .1    .37  .00  .00  .00  .00     .38  .03  .03  .04  .03
.1 ; .00 ; .30 ; 300 ; .0    .39  .00  .00  .00  .00     .40  .02  .03  .03  .02
.1 ; .00 ; .30 ; 300 ; .1    .35  .00  .00  .00  .00     .36  .02  .02  .03  .02
(B)
.1 ; .30 ; .00 ; 150 ; .0    .66  .30  .34  .33  .33     .67  .34  .39  .41  .39
.1 ; .30 ; .00 ; 150 ; .1    .47  .30  .24  .24  .23     .48  .35  .31  .34  .31
.1 ; .30 ; .00 ; 300 ; .0    .61  .30  .32  .32  .32     .61  .34  .37  .40  .37
.1 ; .30 ; .00 ; 300 ; .1    .47  .30  .18  .19  .18     .48  .36  .27  .29  .26
.1 ; .30 ; .05 ; 150 ; .0    .52  .04  .04  .04  .04     .53  .07  .08  .10  .08
.1 ; .30 ; .05 ; 150 ; .1    .44  .04  .04  .04  .04     .45  .08  .08  .09  .07
.1 ; .30 ; .05 ; 300 ; .0    .49  .02  .02  .02  .02     .50  .05  .05  .07  .05
.1 ; .30 ; .05 ; 300 ; .1    .42  .02  .02  .02  .02     .43  .06  .05  .06  .05
.1 ; .30 ; .30 ; 150 ; .0    .33  .00  .01  .01  .01     .34  .02  .03  .04  .03
.1 ; .30 ; .30 ; 150 ; .1    .42  .01  .01  .01  .01     .43  .03  .04  .04  .03
.1 ; .30 ; .30 ; 300 ; .0    .44  .00  .00  .00  .00     .45  .02  .03  .03  .02
.1 ; .30 ; .30 ; 300 ; .1    .31  .00  .00  .00  .00     .32  .02  .02  .03  .02
(C)
.5 ; .00 ; .00 ; 150 ; .0    .58  .00  .00  .00  .00     .62  .12  .13  .17  .13
.5 ; .00 ; .00 ; 150 ; .1    .47  .00  .00  .00  .00     .52  .18  .10  .14  .11
.5 ; .00 ; .00 ; 300 ; .0    .59  .00  .00  .00  .00     .64  .12  .13  .17  .13
.5 ; .00 ; .00 ; 300 ; .1    .51  .00  .00  .00  .00     .55  .23  .10  .13  .10
.5 ; .00 ; .05 ; 150 ; .0    .46  .00  .00  .00  .00     .51  .04  .05  .06  .05
.5 ; .00 ; .05 ; 150 ; .1    .51  .00  .00  .00  .00     .55  .07  .05  .06  .05
.5 ; .00 ; .05 ; 300 ; .0    .44  .00  .00  .00  .00     .49  .03  .03  .05  .03
.5 ; .00 ; .05 ; 300 ; .1    .42  .00  .00  .00  .00     .47  .06  .03  .04  .03
.5 ; .00 ; .30 ; 150 ; .0    .38  .00  .00  .00  .00     .42  .02  .02  .03  .02
.5 ; .00 ; .30 ; 150 ; .1    .33  .00  .00  .00  .00     .37  .03  .02  .03  .02
.5 ; .00 ; .30 ; 300 ; .0    .44  .00  .00  .00  .00     .48  .01  .01  .02  .01
.5 ; .00 ; .30 ; 300 ; .1    .49  .00  .00  .00  .00     .52  .03  .02  .02  .01
(D)
.5 ; .30 ; .00 ; 150 ; .0    .51  .30  .30  .30  .30     .56  .32  .33  .35  .33
.5 ; .30 ; .00 ; 150 ; .1    .35  .30  .18  .20  .18     .41  .35  .21  .25  .22
.5 ; .30 ; .00 ; 300 ; .0    .56  .30  .30  .30  .30     .61  .32  .33  .35  .33
.5 ; .30 ; .00 ; 300 ; .1    .47  .30  .14  .16  .14     .52  .37  .18  .23  .17
.5 ; .30 ; .05 ; 150 ; .0    .57  .04  .04  .04  .04     .61  .06  .06  .08  .06
.5 ; .30 ; .05 ; 150 ; .1    .41  .04  .03  .04  .03     .46  .08  .06  .07  .06
.5 ; .30 ; .05 ; 300 ; .0    .47  .02  .02  .02  .02     .52  .04  .04  .05  .04
.5 ; .30 ; .05 ; 300 ; .1    .44  .02  .02  .02  .02     .49  .07  .04  .05  .04
.5 ; .30 ; .30 ; 150 ; .0    .45  .01  .01  .01  .01     .49  .02  .02  .03  .02
.5 ; .30 ; .30 ; 150 ; .1    .49  .01  .01  .01  .01     .52  .03  .03  .03  .02
.5 ; .30 ; .30 ; 300 ; .0    .33  .00  .00  .00  .00     .38  .01  .02  .02  .01
.5 ; .30 ; .30 ; 300 ; .1    .38  .00  .00  .00  .00     .42  .03  .02  .02  .01
Here “L0” is the “GMMl0” estimator; “LFE1” is the “GMMl1” fixed effects estimator; “L1” and “L2” are the non-linear “GMMn1” and “GMMn2” estimators, respectively; “L” is the “GMMo” estimator with the optimal number of factors based on BIC.
Table 5.6: Testing results for T = 5, S = 10 and ζ0 = 1; 10000 Monte Carlo replications.
                                   L0      LFE1        L1        L2         L
σ²_fλ ; ρ ; σ²_μ ; N ; σ²_f    J    t    J    t    J    t    J    t    J    t      W   h0   h1 #L=1
(A)
.1 ; .00 ; .00 ; 150 ; .0      1    1  .05  .04  .08  .16  .02  .16  .04  .16    .09  .98  .31  .94
.1 ; .00 ; .00 ; 150 ; .1      1    1  .36  .08  .09  .14  .02  .14  .04  .13    .59  .96  .47  .94
.1 ; .00 ; .00 ; 300 ; .0      1    1  .05  .04  .06  .15  .02  .16  .04  .15    .08  .97  .27  .98
.1 ; .00 ; .00 ; 300 ; .1      1    1  .62  .12  .08  .11  .02  .13  .05  .11    .78  .97  .56  .97
.1 ; .00 ; .05 ; 150 ; .0      1    1  .06  .05  .10  .08  .03  .09  .06  .07    .12    1  .27  .94
.1 ; .00 ; .05 ; 150 ; .1      1    1  .45  .11  .13  .09  .03  .08  .06  .07    .72    1  .59  .91
.1 ; .00 ; .05 ; 300 ; .0      1    1  .06  .06  .07  .06  .02  .07  .05  .06    .08    1  .18  .98
.1 ; .00 ; .05 ; 300 ; .1      1    1  .71  .18  .09  .07  .03  .08  .06  .06    .89    1  .72  .97
.1 ; .00 ; .30 ; 150 ; .0      1    1  .06  .05  .09  .06  .03  .08  .06  .06    .08    1  .17  .96
.1 ; .00 ; .30 ; 150 ; .1      1    1  .46  .12  .10  .06  .03  .07  .06  .06    .73    1  .57  .95
.1 ; .00 ; .30 ; 300 ; .0      1    1  .06  .05  .08  .06  .03  .07  .06  .06    .08    1  .15  .98
.1 ; .00 ; .30 ; 300 ; .1      1    1  .67  .16  .08  .05  .03  .07  .06  .05    .87    1  .69  .99
(B)
.1 ; .30 ; .00 ; 150 ; .0      1    1  .04  .55  .08  .61  .02  .49  .03  .60    .13  .94  .29  .94
.1 ; .30 ; .00 ; 150 ; .1      1    1  .40  .55  .13  .51  .03  .43  .06  .49    .62  .87  .49  .91
.1 ; .30 ; .00 ; 300 ; .0      1    1  .05  .55  .05  .59  .01  .49  .04  .59    .08  .96  .27  .98
.1 ; .30 ; .00 ; 300 ; .1      1    1  .65  .54  .10  .40  .03  .35  .07  .39    .81  .92  .59  .96
.1 ; .30 ; .05 ; 150 ; .0      1    1  .07  .12  .11  .14  .04  .13  .06  .13    .10    1  .24  .94
.1 ; .30 ; .05 ; 150 ; .1      1    1  .45  .17  .12  .15  .04  .13  .07  .13    .71    1  .57  .94
.1 ; .30 ; .05 ; 300 ; .0      1    1  .06  .09  .08  .09  .03  .10  .06  .09    .07    1  .19  .98
.1 ; .30 ; .05 ; 300 ; .1      1    1  .63  .19  .09  .10  .03  .10  .06  .09    .85    1  .66  .97
.1 ; .30 ; .30 ; 150 ; .0      1    1  .06  .06  .10  .07  .04  .08  .06  .07    .11    1  .21  .95
.1 ; .30 ; .30 ; 150 ; .1      1    1  .50  .13  .10  .07  .03  .08  .06  .06    .76    1  .60  .95
.1 ; .30 ; .30 ; 300 ; .0      1    1  .06  .05  .07  .06  .03  .07  .06  .06    .07    1  .12  .99
.1 ; .30 ; .30 ; 300 ; .1      1    1  .60  .15  .08  .06  .03  .07  .06  .06    .84    1  .66  .98
(C)
.5 ; .00 ; .00 ; 150 ; .0      1    1  .05  .04  .05  .06  .02  .08  .04  .07    .05  .99  .25  .97
.5 ; .00 ; .00 ; 150 ; .1      1    1  .78  .17  .07  .06  .02  .07  .05  .06    .92  .99  .67  .97
.5 ; .00 ; .00 ; 300 ; .0      1    1  .05  .04  .05  .06  .02  .07  .04  .06    .05    1  .23  .99
.5 ; .00 ; .00 ; 300 ; .1      1    1  .90  .26  .06  .06  .02  .07  .05  .06    .97  .99  .75  .99
.5 ; .00 ; .05 ; 150 ; .0      1  .99  .06  .05  .07  .06  .03  .07  .05  .06    .07    1  .15  .97
.5 ; .00 ; .05 ; 150 ; .1      1    1  .82  .23  .07  .06  .03  .07  .05  .06    .93    1  .78  .97
.5 ; .00 ; .05 ; 300 ; .0      1    1  .06  .05  .07  .06  .02  .07  .06  .06    .06    1  .13  .99
.5 ; .00 ; .05 ; 300 ; .1      1  .99  .91  .31  .06  .06  .02  .07  .05  .06    .97    1  .85  .99
.5 ; .00 ; .30 ; 150 ; .0      1  .99  .06  .05  .07  .05  .02  .07  .05  .06    .06    1  .12  .98
.5 ; .00 ; .30 ; 150 ; .1      1  .99  .78  .23  .07  .06  .03  .07  .05  .06    .92    1  .76  .97
.5 ; .00 ; .30 ; 300 ; .0      1    1  .05  .05  .06  .05  .02  .07  .05  .05    .06    1  .10  .99
.5 ; .00 ; .30 ; 300 ; .1      1    1  .93  .36  .06  .05  .02  .06  .05  .05    .98    1  .89  .99
(D)
.5 ; .30 ; .00 ; 150 ; .0      1    1  .05  .78  .05  .75  .02  .60  .04  .75    .07  .97  .27  .97
.5 ; .30 ; .00 ; 150 ; .1      1  .98  .75  .71  .09  .53  .03  .44  .06  .53    .91  .95  .68  .95
.5 ; .30 ; .00 ; 300 ; .0      1    1  .05  .78  .05  .75  .02  .59  .04  .75    .06  .98  .26  .99
.5 ; .30 ; .00 ; 300 ; .1      1    1  .91  .67  .09  .42  .03  .37  .07  .42    .97  .97  .78  .98
.5 ; .30 ; .05 ; 150 ; .0      1    1  .08  .16  .09  .16  .03  .15  .06  .16    .07    1  .17  .97
.5 ; .30 ; .05 ; 150 ; .1      1  .99  .78  .28  .08  .15  .03  .14  .06  .15    .92    1  .75  .97
.5 ; .30 ; .05 ; 300 ; .0      1    1  .07  .11  .07  .11  .03  .11  .06  .11    .07    1  .14  .99
.5 ; .30 ; .05 ; 300 ; .1      1    1  .91  .33  .07  .11  .03  .11  .06  .10    .97    1  .85  .99
.5 ; .30 ; .30 ; 150 ; .0      1    1  .06  .07  .07  .07  .03  .08  .06  .07    .07    1  .12  .97
.5 ; .30 ; .30 ; 150 ; .1      1    1  .83  .26  .07  .07  .03  .08  .06  .07    .95    1  .80  .97
.5 ; .30 ; .30 ; 300 ; .0      1  .99  .06  .06  .06  .06  .03  .07  .05  .06    .06    1  .11  .99
.5 ; .30 ; .30 ; 300 ; .1      1    1  .92  .32  .06  .06  .02  .07  .06  .06    .97    1  .86  .99
Here “L0” is the “GMMl0” estimator; “LFE1” is the “GMMl1” fixed effects estimator; “L1” and “L2” are the non-linear “GMMn1” and “GMMn2” estimators, respectively; “L” is the “GMMo” estimator with the optimal number of factors based on BIC.
Bibliography
Abadir, K. M. and J. R. Magnus (2002): “Notation in Econometrics: A Proposal
for a Standard,” Econometrics Journal, 5, 76–90.
Abrevaya, J. (2013): “The Projection Approach for Unbalanced Panel Data,” The
Econometrics Journal, 16, 161–178.
Ahn, S. C. (2015): “Comment on ‘IV Estimation of Panels with Factor Residuals’ by D. Robertson and V. Sarafidis,” Journal of Econometrics, 185, 542–544.
Ahn, S. C., Y. H. Lee, and P. Schmidt (2001): “GMM Estimation of Linear
Panel Data Models with Time-varying Individual Effects,” Journal of Econometrics,
101, 219–255.
——— (2013): “Panel Data Models with Multiple Time-varying Individual Effects,”
Journal of Econometrics, 174, 1–14.
Ahn, S. C. and P. Schmidt (1995): “Efficient Estimation of Models for Dynamic
Panel Data,” Journal of Econometrics, 68, 5–27.
——— (1997): “Efficient Estimation of Dynamic Panel Data Models: Alternative
Assumptions and Simplified Estimation,” Journal of Econometrics, 76, 309–321.
Ahn, S. C. and G. M. Thomas (2006): “Likelihood Based Inference for Dynamic
Panel Data Models,” Unpublished Manuscript.
Akashi, K. and N. Kunitomo (2012): “Some Properties of the LIML Estimator in
a Dynamic Panel Structural Equation,” Journal of Econometrics, 166, 167–183.
Alonso-Borrego, C. and M. Arellano (1999): “Symmetrically Normalized
Instrumental-Variable Estimation using Panel Data,” Journal of Business & Economic Statistics, 17, 36–49.
Bibliography 186
Alvarez, J. and M. Arellano (2003): “The Time Series and Cross-Section Asymptotics of Dynamic Panel Data Estimators,” Econometrica, 71(4), 1121–1159.
Amemiya, T. (1985): Advanced Econometrics, Harvard University Press.
Anderson, T. W. and C. Hsiao (1982): “Formulation and Estimation of Dynamic
Models Using Panel Data,” Journal of Econometrics, 18, 47–82.
Antman, F. and D. J. McKenzie (2007a): “Earnings Mobility and Measurement
Error: A Pseudo Panel Approach,” Economic Development and Cultural Change, 56,
125–161.
——— (2007b): “Poverty Traps and Nonlinear Income Dynamics with Measurement
Error and Individual Heterogeneity,” Journal of Development Studies, 43, 1057–1083.
Arellano, M. (2003a): “Modeling Optimal Instrumental Variables for Dynamic
Panel Data Models,” Unpublished manuscript.
——— (2003b): Panel Data Econometrics, Advanced Texts in Econometrics, Oxford
University Press.
Arellano, M. and S. Bond (1991): “Some Tests of Specification for Panel Data:
Monte Carlo Evidence and an Application to Employment Equations,” Review of
Economic Studies, 58, 277–297.
Arellano, M. and O. Bover (1995): “Another Look at the Instrumental Variable
Estimation of Error-components Models,” Journal of Econometrics, 68, 29–51.
Bai, J. (2009): “Panel Data Models With Interactive Fixed Effects,” Econometrica,
77, 1229–1279.
——— (2013a): “Fixed-Effects Dynamic Panel Models, a Factor Analytical Method,”
Econometrica, 81, 285–314.
——— (2013b): “Likelihood Approach to Dynamic Panel Models with Interactive
Effects,” Working Paper.
Balestra, P. and M. Nerlove (1966): “Pooling Cross Section and Time Series
Data in the Estimation of a Dynamic Model: The Demand for Natural Gas,” Econometrica, 34, 585–612.
Baltagi, B. H. (2013): Econometric Analysis of Panel Data, Wiley.
Bekker, P. A. (1994): “Alternative Approximations to the Distributions of Instrumental Variable Estimators,” Econometrica, 62, 657–681.
Bekker, P. A. and J. van der Ploeg (2005): “Instrumental Variable Estimation
Based on Grouped Data,” Statistica Neerlandica, 59, 239–267.
Binder, M., C. Hsiao, and M. H. Pesaran (2005): “Estimation and Inference in
Short Panel Vector Autoregressions with Unit Root and Cointegration,” Econometric
Theory, 21, 795–837.
Blundell, R. W. and S. Bond (1998): “Initial Conditions and Moment Restrictions
in Dynamic Panel Data Models,” Journal of Econometrics, 87, 115–143.
Bond, S., C. Nauges, and F. Windmeijer (2005): “Unit Roots: Identification
and Testing in Micro Panels,” Working paper.
Bond, S. and F. Windmeijer (2002): “Projection Estimators for Autoregressive
Panel Data Models,” The Econometrics Journal, 5, 457–479.
Bun, M. J. G. and M. A. Carree (2005): “Bias-Corrected Estimation in Dynamic
Panel Data Models,” Journal of Business & Economic Statistics, 23(2), 200–210.
Bun, M. J. G., M. A. Carree, and A. Juodis (2015): “On Maximum Likelihood
Estimation of Dynamic Panel Data Models,” UvA-Econometrics Working Paper Series.
Bun, M. J. G. and J. F. Kiviet (2006): “The Effects of Dynamic Feedbacks on LS
and MM Estimator Accuracy in Panel Data Models,” Journal of Econometrics, 132,
409–444.
Bun, M. J. G. and F. R. Kleibergen (2014): “Identification in Linear Dynamic
Panel Data Models,” UvA-Econometrics Working Paper Series.
Bun, M. J. G. and R. W. Poldermans (2015): “Weak Identification Robust
Inference in Dynamic Panel Data Models,” Mimeo.
Bun, M. J. G. and V. Sarafidis (2015): “Dynamic Panel Data Models,” in The
Oxford Handbook of Panel Data, ed. by B. H. Baltagi, Oxford University Press,
chap. 3.
Bun, M. J. G. and F. Windmeijer (2010): “The Weak Instrument Problem of
the System GMM Estimator in Dynamic Panel Data Models,” The Econometrics
Journal, 13, 95–126.
Cao, B. and Y. Sun (2011): “Asymptotic Distributions of Impulse Response Functions in Short Panel Vector Autoregressions,” Journal of Econometrics, 163, 127–143.
Chamberlain, G. (1982): “Multivariate Regression Models for Panel Data,” Journal
of Econometrics, 18, 5–46.
Dargay, J. (2007): “The Effect of Prices and Income on Car Travel in the UK,”
Transportation Research Part A, 41, 949–960.
Deaton, A. (1985): “Panel Data From Time Series of Cross-sections,” Journal of
Econometrics, 30, 109–126.
Dhaene, G. and K. Jochmans (2015): “Likelihood Inference in an Autoregression
with Fixed Effects,” Econometric Theory, (forthcoming).
Doornik, J. (2009): An Object-Oriented Matrix Language Ox 6, London: Timberlake
Consultants Press.
Dovonon, P. and E. Renault (2009): “GMM Overidentification Test with First
Order Underidentification,” Working Paper.
Ericsson, J. and M. Irandoust (2004): “The Productivity-bias Hypothesis and
the PPP Theorem: New Evidence from Panel Vector Autoregressive Models,” Japan
and the World Economy, 16, 121–138.
Feldman, G. J. and R. D. Cousins (1998): “Unified Approach to the Classical
Statistical Analysis of Small Signals,” Phys. Rev. D, 57, 3873–3889.
Gonzalez, R. and H. Sala (2015): “The Frisch Elasticity in the Mercosur Coun-
tries: A Pseudo-Panel Approach,” Development Policy Review, 33, 107–131.
Grassetti, L. (2011): “A Note on Transformed Likelihood Approach in Linear Dynamic Panel Models,” Statistical Methods & Applications, 20, 221–240.
Hahn, J. and G. Kuersteiner (2002): “Asymptotically Unbiased Inference for a
Dynamic Panel Model with Fixed Effects When Both N and T are Large,” Econometrica, 70(4), 1639–1657.
Hahn, J., G. Kuersteiner, and M. H. Cho (2004): “Asymptotic Distribution
of Misspecified Random Effects Estimator for a Dynamic Panel Model with Fixed
Effects when Both n and T are Large,” Economics Letters, 84, 117–125.
Han, C. and P. C. B. Phillips (2010): “GMM Estimation for Dynamic Panels with
Fixed Effects and Strong Instruments at Unity,” Econometric Theory, 26, 119–151.
——— (2013): “First Difference Maximum Likelihood and Dynamic Panel Estimation,” Journal of Econometrics, 175, 35–45.
Hayakawa, K. (2007): “Consistent OLS Estimation of AR(1) Dynamic Panel Data
Models with Short Time Series,” Applied Economics Letters, 14:15, 1141–1145.
——— (2009a): “On the Effect of Mean-Nonstationarity in Dynamic Panel Data Models,” Journal of Econometrics, 153, 133–135.
——— (2009b): “A Simple Efficient Instrumental Variable Estimator for Panel AR(p)
Models when Both N and T are Large,” Econometric Theory, 25, 873–890.
——— (2012): “GMM Estimation of Short Dynamic Panel Data Model with Interactive
Fixed Effects,” Journal of the Japan Statistical Society, 42, 109–123.
——— (2015): “An Improved GMM Estimation of Panel VAR Models,” Computational
Statistics and Data Analysis, (forthcoming).
Hayakawa, K. and M. H. Pesaran (2012): “Robust Standard Errors in Transformed Likelihood Estimation of Dynamic Panel Data Models,” Working Paper.
——— (2015): “Robust Standard Errors in Transformed Likelihood Estimation of
Dynamic Panel Data Models,” Journal of Econometrics, 188, 111–134.
Holtz-Eakin, D., W. K. Newey, and H. S. Rosen (1988): “Estimating Vector
Autoregressions with Panel Data,” Econometrica, 56, 1371–1395.
Hsiao, C. (2002): Analysis of Panel Data, Econometric Society Monographs, Cambridge University Press, 2 ed.
Hsiao, C., M. H. Pesaran, and A. K. Tahmiscioglu (2002): “Maximum Likelihood Estimation of Fixed Effects Dynamic Panel Data Models Covering Short Time
Periods,” Journal of Econometrics, 109, 107–150.
Hsiao, C. and J. Zhang (2015): “IV, GMM or Likelihood Approach to Estimate
Dynamic Panel Models when Either N or T or Both are Large,” Journal of Econometrics, 187, 312–322.
Hsiao, C. and Q. Zhou (2015): “Statistical Inference for Panel Dynamic Simultaneous Equations Models,” Journal of Econometrics, (forthcoming).
Inoue, A. (2008): “Efficient Estimation and Inference in Linear Pseudo-panel Data
Models,” Journal of Econometrics, 142, 449–466.
Juodis, A. (2013): “A Note on Bias-corrected Estimation in Dynamic Panel Data
Models,” Economics Letters, 118, 435–438.
——— (2014a): “Cointegration Testing in Panel VAR Models Under Partial Identification and Spatial Dependence,” UvA-Econometrics working paper 2014/08.
——— (2014b): “First Difference Transformation in Panel VAR models: Robustness,
Estimation and Inference,” UvA-Econometrics working paper 2013/06.
——— (2014c): “Supplement to ‘First Difference Transformation in Panel VAR models: Robustness, Estimation and Inference’,” http://arturas.economists.lt/FD_online.pdf.
——— (2015): “Pseudo Panel Data Models with Cohort Interactive Effects,” Working
Paper.
Juodis, A. and V. Sarafidis (2014): “Fixed T Dynamic Panel Data Estimators
with Multi-Factor Errors,” UvA-Econometrics working paper 2014/07.
——— (2015): “Simplified Estimators for Dynamic Panels with a Multifactor Error
Structure,” Mimeo.
Ketz, P. (2014): “Testing Near or at the Boundary of the Parameter Space,” Mimeo.
Kiviet, J. F. (1995): “On Bias, Inconsistency, and Efficiency of Various Estimators
in Dynamic Panel Data Models,” Journal of Econometrics, 68, 53–78.
——— (2007): “Judging Contending Estimators by Simulation: Tournaments in Dynamic Panel Data Models,” in The Refinement of Econometric Estimation and Test
Procedures, ed. by G. Phillips and E. Tzavalis, Cambridge University Press, chap. 11,
282–318.
——— (2012): “Monte Carlo Simulation for Econometricians,” in Foundations and
Trends in Econometrics, ed. by W. H. Greene, NOW the essence of knowledge,
vol. 5.
Kiviet, J. F., M. Pleus, and R. Poldermans (2015): “Accuracy and Efficiency of
Various GMM Inference Techniques in Dynamic Micro Panel Data Models,” Working
Paper.
Kleibergen, F. R. (2005): “Testing Parameters in GMM without Assuming that
They are Identified,” Econometrica, 73, 1103–1123.
Koutsomanoli-Filippaki, A. and E. Mamatzakis (2009): “Performance and
Merton-type Default Risk of Listed Banks in the EU: A Panel VAR Approach,”
Journal of Banking and Finance, 33, 2050–2061.
Kripfganz, S. (2015): “Unconditional Transformed Likelihood Estimation of Time-
Space Dynamic Panel Data Models,” Working Paper.
Kruiniger, H. (2002): “On the Estimation of Panel Regression Models with Fixed
Effects,” Working paper 450, Queen Mary, University of London.
——— (2006): “Quasi ML Estimation of the Panel AR(1) Model with Arbitrary Initial
Condition,” Working paper 582, Queen Mary, University of London.
——— (2007): “An Efficient Linear GMM Estimator for the Covariance Stationary
AR(1)/Unit Root Model for Panel Data,” Econometric Theory, 23, 519–535.
——— (2008): “Maximum Likelihood Estimation and Inference Methods for the Covariance Stationary Panel AR(1)/Unit Root Model,” Journal of Econometrics, 144,
447–464.
——— (2013): “Quasi ML Estimation of the Panel AR(1) Model with Arbitrary Initial
Conditions,” Journal of Econometrics, 173, 175–188.
Kuersteiner, G. and I. R. Prucha (2013): “Limit Theory for Panel Data Models
with Cross Sectional Dependence and Sequential Exogeneity,” Journal of Econometrics, 174, 107–126.
——— (2015): “Dynamic Spatial Panel Models: Networks, Common Shocks, and
Sequential Exogeneity,” Working Paper.
Lancaster, T. (2002): “Orthogonal Parameters and Panel Data,” Review of Economic Studies, 69, 647–666.
Lokshin, B. (2008): “A Monte Carlo Comparison of Alternative Estimators for Dynamic Panel Data Models,” Applied Economics Letters, 15, 15–18.
Maddala, G. S. (1971): “The Use of Variance Components Models in Pooling Cross
Section and Time Series Data,” Econometrica, 39, 341–358.
Magnus, J. R. and H. Neudecker (2007): Matrix Differential Calculus with Applications in Statistics and Econometrics, John Wiley & Sons.
McKenzie, D. J. (2001): “Estimation of AR(1) Models with Unequally Spaced
Pseudo-panels,” Econometrics Journal, 4, 89–108.
——— (2004): “Asymptotic Theory for Heterogeneous Dynamic Pseudo-panels,” Journal of Econometrics, 120, 235–262.
Michaud, P.-C. and A. van Soest (2008): “Health and Wealth of Elderly Couples:
Causality Tests Using Dynamic Panel Data Models,” Journal of Health Economics,
27, 1312–1325.
Moffitt, R. (1993): “Identification and Estimation of Dynamic Models with a Time
Series of Repeated Cross-sections,” Journal of Econometrics, 59, 99–123.
Molinari, L. G. (2008): “Determinants of Block Tridiagonal Matrices,” Linear Algebra and its Applications, 429, 2221–2226.
Moral-Benito, E. (2012): “Determinants of Economic Growth: A Bayesian Panel
Data Approach,” The Review of Economics and Statistics, 94, 566–579.
Mundlak, Y. (1978): “On The Pooling of Time Series and Cross Section Data,”
Econometrica, 46, 69–85.
Mutl, J. (2009): “Panel VAR Models with Spatial Dependence,” Working Paper.
Nauges, C. and A. Thomas (2003): “Consistent Estimation of Dynamic Panel Data
Models with Time-varying Individual Effects,” Annales d’Economie et de Statistique,
70, 54–75.
Newey, W. K. and D. McFadden (1994): “Large Sample Estimation and Hypothesis Testing,” in Handbook of Econometrics, ed. by R. F. Engle and D. L. McFadden,
Elsevier, vol. 4, chap. 36, 2111–2245.
Neyman, J. and E. L. Scott (1948): “Consistent Estimation from Partially Consistent Observations,” Econometrica, 16, 1–32.
Nickell, S. (1981): “Biases in Dynamic Models with Fixed Effects,” Econometrica,
49, 1417–1426.
Pesaran, M. H. (2006): “Estimation and Inference in Large Heterogeneous Panels
with a Multifactor Error Structure,” Econometrica, 74, 967–1012.
Peterman, W. B. (2014): “Reconciling Micro and Macro Estimates of the Frisch Labor Supply Elasticity: A Sensitivity Analysis,” Working Paper.
Ramalho, J. J. S. (2005): “Feasible Bias-corrected OLS, Within-groups, and First-differences Estimators for Typical Micro and Macro AR(1) Panel Data Models,” Empirical Economics, 30, 735–748.
Robertson, D. and V. Sarafidis (2015): “IV Estimation of Panels with Factor Residuals,” Journal of Econometrics, 185, 526–541.
Robertson, D., V. Sarafidis, and J. Westerlund (2014): “GMM Unit Root Inference in Generally Trending and Cross-Correlated Dynamic Panels,” Working Paper.
Sarafidis, V. and D. Robertson (2009): “On the Impact of Error Cross-Sectional Dependence in Short Dynamic Panel Estimation,” Econometrics Journal, 12, 62–81.
Sarafidis, V. and T. J. Wansbeek (2012): “Cross-sectional Dependence in Panel Data Analysis,” Econometric Reviews, 31, 483–531.
Sarafidis, V., T. Yamagata, and D. Robertson (2009): “A Test of Cross Section Dependence for a Linear Dynamic Panel Model with Regressors,” Journal of Econometrics, 148, 149–161.
Verbeek, M. (2008): “Pseudo-Panels and Repeated Cross-Sections,” in The Econometrics of Panel Data, ed. by L. Matyas, P. Sevestre, J. Marquez, A. Spanos, F. Adams, P. Balestra, M. Dagenais, D. Kendrick, J. Paelinck, R. Pindyck, and W. Welfe, Springer Verlag, vol. 46 of Advanced Studies in Theoretical and Applied Econometrics.
Verbeek, M. and T. Nijman (1992): “Can Cohort Data Be Treated As Genuine Panel Data?” Empirical Economics, 17, 9–23.
Verbeek, M. and F. Vella (2005): “Estimating Dynamic Models from Repeated Cross-sections,” Journal of Econometrics, 127, 83–102.
Verdier, V. (2015): “Estimation of Dynamic Panel Data Models with Cross-Sectional Dependence: Using Cluster Dependence for Efficiency,” Journal of Applied Econometrics, (forthcoming).
Westerlund, J. and M. Norkute (2014): “A Factor Analytical Method to Interactive Effects Dynamic Panel Models with or without Unit Root,” Working Paper 2014:12.
White, H. (2000): Asymptotic Theory for Econometricians, Economic Theory, Econometrics, and Mathematical Economics, Academic Press, 2 ed.
Windmeijer, F. (2005): “A Finite Sample Correction for the Variance of Linear Efficient Two-Step GMM Estimators,” Journal of Econometrics, 126, 25–51.
Ziliak, J. P. (1997): “Efficient Estimation with Panel Data When Instruments Are Predetermined: An Empirical Comparison of Moment-Condition Estimators,” Journal of Business & Economic Statistics, 15, 419–431.
Nederlandse Samenvatting (Summary in Dutch)
Panel data are repeated observations, over several time periods, on different cross-sectional units such as individuals or firms. Panel data are increasingly used in empirical macroeconomic and (especially) microeconomic analyses, and this growth has several causes. The main advantage of panel data is that they allow a statistical analysis of causal effects that corrects for unobserved characteristics. A second advantage is that pooling the time series of different cross-sectional units in a single model can yield relatively precise estimates of unknown parameters, even when the number of cross-sectional units (N) or the number of time periods (T) is relatively small.
A central theme in the analysis of linear dynamic panel data models is the inconsistency of the fixed-effects estimator when N grows while T stays fixed. This inconsistency is known as the Nickell bias and is an instance of the incidental parameter problem. Because of these problems with the fixed-effects estimator, the parameters of dynamic panel data models are commonly estimated by the generalized method of moments (GMM). This method yields a consistent and asymptotically efficient estimator, but it also has limitations in finite samples: using too many or too weak instrumental variables can bias GMM estimators and distort GMM-based tests. This has led to renewed interest in likelihood-based estimation methods that correct for the incidental parameter problem.
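The Nickell bias can be made concrete with a small simulation. The sketch below is a minimal illustration that assumes a pure panel AR(1) with i.i.d. standard normal individual effects and errors and a stationary initial observation (choices made here for illustration, not taken from the thesis). It compares the within-group (fixed-effects) estimate for large N and small T with Nickell's (1981) closed-form large-N limit; the function names are hypothetical.

```python
import random


def nickell_plim_bias(rho: float, T: int) -> float:
    """Large-N limit of (rho_hat - rho) for the within-group estimator
    (Nickell, 1981), assuming a stationary y_i0, |rho| < 1 and T >= 2
    observations per unit used in the regression (t = 1..T)."""
    a = (1.0 - rho**T) / (T * (1.0 - rho))
    num = -(1.0 + rho) / (T - 1) * (1.0 - a)
    den = 1.0 - 2.0 * rho / ((1.0 - rho) * (T - 1)) * (1.0 - a)
    return num / den


def within_group_estimate(rho: float, N: int, T: int, seed: int = 0) -> float:
    """Simulate y_it = rho*y_{i,t-1} + eta_i + eps_it (standard normal eta, eps)
    for t = 1..T with a stationary y_i0, then regress y_it on y_{i,t-1}
    after removing each unit's time means (the within transformation)."""
    rng = random.Random(seed)
    sd0 = (1.0 - rho**2) ** -0.5  # stationary s.d. of the AR(1) deviation
    sxy = sxx = 0.0
    for _ in range(N):
        eta = rng.gauss(0.0, 1.0)
        y = [eta / (1.0 - rho) + sd0 * rng.gauss(0.0, 1.0)]
        for _ in range(T):
            y.append(rho * y[-1] + eta + rng.gauss(0.0, 1.0))
        cur, lag = y[1:], y[:-1]
        m_cur, m_lag = sum(cur) / T, sum(lag) / T
        for c, lg in zip(cur, lag):
            sxy += (lg - m_lag) * (c - m_cur)
            sxx += (lg - m_lag) ** 2
    return sxy / sxx


rho, T = 0.5, 5
print(rho + nickell_plim_bias(rho, T))           # large-N limit, about 0.169
print(within_group_estimate(rho, N=5000, T=T))  # near that limit, far from 0.5
```

With rho = 0.5 and T = 5 the formula gives a bias of about -0.33, and increasing N does not shrink it; only a longer time dimension does.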
Chapter 2 of this thesis analyses the properties of the estimator based on the likelihood of the first-differenced data in a first-order panel vector autoregressive model. New results are obtained on the distributional properties of this estimator, with emphasis on situations in which the assumptions under which its asymptotic properties were derived are not satisfied. In addition, a simplified approach to computing this estimator is derived. The chapter also analyses the asymptotic bimodality of the likelihood, a topic that has received little attention in the literature. By way of illustration, an extensive Monte Carlo simulation study is carried out; its results offer important insights into the finite-sample properties of the estimator that are relevant to both theoretical and applied econometricians.
Chapter 3 gives a detailed analysis of the possible bimodality of the likelihood and the possible negativity of variance estimators in the univariate panel AR(1) model, optionally extended with exogenous explanatory variables. A key result is that the first-order condition for the ML estimator is a third-degree polynomial in the autoregressive parameter. This suggests that in finite samples (for any T) the log-likelihood function can be either unimodal or bimodal. The chapter further shows that conventional t-tests can have severely distorted size. This problem can be particularly relevant in empirical applications of these methods; in an empirical illustration it turns out to affect the estimated dynamics of state-level unemployment in the United States.
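Why a cubic first-order condition allows two modes can be checked numerically. If the score (the derivative of the log-likelihood with respect to the autoregressive parameter) is a cubic that is positive for small values and negative for large ones, it has either one real root, giving a single maximum, or three, giving two maxima separated by a minimum. The sketch below locates the local maxima of a log-likelihood from its score by scanning a grid for sign changes and refining each by bisection. The two cubics are hypothetical, chosen only to produce each case; they are not the expressions derived in Chapter 3.

```python
def loglik_local_maxima(score, lo, hi, steps=10_000, tol=1e-12):
    """Approximate local maxima of a log-likelihood on [lo, hi] from its score
    (first derivative): a maximum is where the score changes sign from positive
    to non-positive. Each bracket found on the grid is refined by bisection."""
    h = (hi - lo) / steps
    maxima = []
    a, sa = lo, score(lo)
    for k in range(1, steps + 1):
        b = lo + k * h
        sb = score(b)
        if sa > 0 >= sb:
            x0, x1 = a, b  # invariant: score(x0) > 0 >= score(x1)
            while x1 - x0 > tol:
                mid = 0.5 * (x0 + x1)
                if score(mid) > 0:
                    x0 = mid
                else:
                    x1 = mid
            maxima.append(0.5 * (x0 + x1))
        a, sa = b, sb
    return maxima


def three_roots(r):
    """Hypothetical cubic score with roots 0.2, 0.6, 0.9: max, min, max."""
    return -(r - 0.2) * (r - 0.6) * (r - 0.9)


def one_root(r):
    """Hypothetical cubic score with a single real root at 0.3: one max."""
    return -(r - 0.3) * (r * r + 0.5)


print(loglik_local_maxima(three_roots, -1.0, 1.0))  # two maxima, near 0.2 and 0.9
print(loglik_local_maxima(one_root, -1.0, 1.0))     # one maximum, near 0.3
```

In the bimodal case the global maximum depends on the relative heights of the two modes, so a numerical optimizer started from a single point can stop at the wrong one.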
The assumption that the effect of omitted variables can be captured by an additive error-components structure is sometimes considered too restrictive. An unobserved trait such as talent or ability may, for example, have a time-varying effect on productivity and thereby on earnings. In such cases it is common in the econometric literature to include interactive (that is, multiplicative) effects. The advantages of such a more flexible structure come with two important practical limitations. First, standard inference methods become inconsistent; second, one has to use nonlinear methods, which can lead to computational complications and whose asymptotic properties depend strongly on specific assumptions and nuisance parameters. The two remaining chapters analyse this type of model and derive new results on the finite-sample and asymptotic properties of estimators that account for interactive effects.
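Why standard fixed-effects methods break down under interactive effects can be seen in one step of algebra: the within transformation removes an individual effect eta_i because it is constant over time, but it leaves lambda_i (f_t - mean of f) whenever the factor f_t varies. The sketch below checks this for a single factor; the loadings and factor values are hypothetical, and setting f_t = 1 for all t reduces the interactive term to an ordinary additive individual effect.

```python
def within_demean(panel):
    """Within (fixed-effects) transformation: subtract each unit's time mean."""
    return [[x - sum(row) / len(row) for x in row] for row in panel]


def factor_component(loadings, factors):
    """Single-factor interactive term lambda_i * f_t, one row per unit i."""
    return [[lam * f for f in factors] for lam in loadings]


def max_abs(panel):
    """Largest absolute entry, used to check whether anything is left over."""
    return max(abs(x) for row in panel for x in row)


loadings = [0.5, 1.0, 2.0]         # hypothetical loadings lambda_i
constant_f = [1.0, 1.0, 1.0, 1.0]  # f_t = 1: an ordinary additive effect
varying_f = [1.0, 0.5, 2.0, 1.5]   # hypothetical time-varying factor f_t

print(max_abs(within_demean(factor_component(loadings, constant_f))))  # 0.0
print(max_abs(within_demean(factor_component(loadings, varying_f))))   # 1.5
```

In the second case the leftover term is proportional to each unit's loading and shares the same time pattern across units, so it induces cross-sectional dependence in the transformed errors that the within transformation cannot remove.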
Chapter 4 gives an extensive overview of estimators for dynamic panel data models with interactive effects. The aim of this chapter is to offer empirical researchers a practical guide to applying methods that account for different forms of unobserved heterogeneity. Particular attention is paid to calculating the number of identifiable parameters, a requirement that is often overlooked in the literature but is needed to derive asymptotically accurate inference methods and consistent model selection procedures. The finite-sample properties of the estimators are studied for a number of different parameter configurations in a large-scale Monte Carlo study, which considers (i) the effect of the presence of weakly exogenous regressors, (ii) the effect of varying the correlation between the factor loadings of the endogenous and the explanatory variables, (iii) the influence of the number of moment conditions on the bias and size of GMM estimators and tests, (iv) the effect of differences in time-series persistence in the data, and finally (v) the effect of sample size.
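Point (iii) concerns instrument proliferation. As a hedged illustration that ignores the factor structure entirely, the sketch below counts the standard Arellano-Bond moment conditions for a panel AR(1) observed at t = 0, ..., T: each first-differenced equation for t = 2, ..., T uses the levels y_i0, ..., y_i,t-2 as instruments, so the count grows quadratically in T, whereas a common "collapsed" variant (not taken from this chapter) grows linearly. The degrees of freedom of an overidentification test are the number of moments minus the number of identifiable parameters, here a single autoregressive parameter; the function names are hypothetical.

```python
def ab_moment_count(T: int) -> int:
    """Arellano-Bond moment conditions for a panel AR(1) observed at t = 0..T:
    the differenced equation for period t uses t - 1 lagged levels."""
    return sum(t - 1 for t in range(2, T + 1))  # equals T * (T - 1) / 2


def collapsed_moment_count(T: int) -> int:
    """One moment condition per lag depth 2..T when instruments are collapsed."""
    return T - 1


def overid_df(n_moments: int, n_params: int) -> int:
    """Degrees of freedom of a Sargan/Hansen overidentification test."""
    return n_moments - n_params


for T in (4, 8, 16):
    m = ab_moment_count(T)
    print(T, m, collapsed_moment_count(T), overid_df(m, 1))
# prints:
# 4 6 3 5
# 8 28 7 27
# 16 120 15 119
```

Once interactive effects are added, the parameter count in the last column changes, which is one reason the number of identifiable parameters has to be worked out with care before such tests can be trusted.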
Chapter 5 shifts the analysis to pseudo-panel data models. When a genuine panel is not available, a pseudo-panel can be constructed from repeated cross-sections. This final chapter studies the properties of estimators in linear pseudo-panel data models with a fixed number of cohorts and time-series observations and, in particular, with multiplicative effects. Particular attention is paid to identification aspects of the proposed estimator in the case of unbalanced samples. In addition to theoretical results, the chapter presents an extensive Monte Carlo simulation study, with emphasis on the robustness of the proposed estimator to endogeneity, cohort interactive effects and weak identification. To the best of our knowledge, this is the first time in the literature that weak and global identification issues are examined in pseudo-panels with a fixed number of cohorts and time-series observations.
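Constructing a pseudo-panel amounts to grouping individuals into cohorts by a time-invariant characteristic, such as year of birth, and averaging the variables of interest within each cohort and period. The sketch below does this for a toy set of repeated cross-sections; the data, the 10-year cohort rule and the function name are hypothetical, and the identification analysis of the chapter is not reproduced. The returned cell sizes differ across cohorts and periods, as they typically do when each period samples different individuals.

```python
from collections import defaultdict


def build_pseudo_panel(cross_sections):
    """Average an outcome by (cohort, period) across repeated cross-sections.

    cross_sections maps a period t to a list of (birth_year, outcome) pairs;
    each period is a separate sample of (possibly) different individuals.
    Returns {(cohort, t): (cohort mean, cell size)}.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for t, sample in cross_sections.items():
        for birth_year, y in sample:
            cohort = birth_year // 10 * 10  # hypothetical rule: 10-year cohorts
            sums[(cohort, t)] += y
            counts[(cohort, t)] += 1
    return {key: (sums[key] / counts[key], counts[key]) for key in sums}


surveys = {  # hypothetical repeated cross-sections
    2000: [(1955, 10.0), (1958, 12.0), (1963, 20.0)],
    2001: [(1951, 11.0), (1966, 22.0), (1969, 24.0)],
}
for key, value in sorted(build_pseudo_panel(surveys).items()):
    print(key, value)
```

The cohort means then play the role of the unit-level observations in a genuine panel, with the cohort as the cross-sectional unit.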
The Tinbergen Institute is the Institute for Economic Research, which was founded in 1987 by the Faculties of Economics and Econometrics of the Erasmus University Rotterdam, University of Amsterdam and VU University Amsterdam. The Institute is named after the late Professor Jan Tinbergen, Dutch Nobel Prize laureate in economics in 1969. The Tinbergen Institute is located in Amsterdam and Rotterdam. The following books recently appeared in the Tinbergen Institute Research Series:
583. L.T. GATAREK, Econometric Contributions to Financial Trading, Hedging and Risk Measurement
584. X. LI, Temporary Price Deviation, Limited Attention and Information Acquisition in the Stock Market
585. Y. DAI, Efficiency in Corporate Takeovers
586. S.L. VAN DER STER, Approximate feasibility in real-time scheduling: Speeding up in order to meet deadlines
587. A. SELIM, An Examination of Uncertainty from a Psychological and Economic Viewpoint
588. B.Z. YUESHEN, Frictions in Modern Financial Markets and the Implications for Market Quality
589. D. VAN DOLDER, Game Shows, Gambles, and Economic Behavior
590. S.P. CEYHAN, Essays on Bayesian Analysis of Time Varying Economic Patterns
591. S. RENES, Never the Single Measure
592. D.L. IN ’T VELD, Complex Systems in Financial Economics: Applications to Interbank and Stock Markets
593. Y. YANG, Laboratory Tests of Theories of Strategic Interaction
594. M.P. WOJTOWICZ, Pricing Credits Derivatives and Credit Securitization
595. R.S. SAYAG, Communication and Learning in Decision Making
596. S.L. BLAUW, Well-to-do or doing well? Empirical studies of wellbeing and development
597. T.A. MAKAREWICZ, Learning to Forecast: Genetic Algorithms and Experiments
598. P. ROBALO, Understanding Political Behavior: Essays in Experimental Political Economy
599. R. ZOUTENBIER, Work Motivation and Incentives in the Public Sector
600. M.B.W. KOBUS, Economic Studies on Public Facility use
601. R.J.D. POTTER VAN LOON, Modeling non-standard financial decision making
602. G. MESTERS, Essays on Nonlinear Panel Time Series Models
603. S. GUBINS, Information Technologies and Travel
604. D. KOPANYI, Bounded Rationality and Learning in Market Competition
605. N. MARTYNOVA, Incentives and Regulation in Banking
606. D. KARSTANJE, Unraveling Dimensions: Commodity Futures Curves and Equity Liquidity
607. T.C.A.P. GOSENS, The Value of Recreational Areas in Urban Regions
608. L.M. MARC, The Impact of Aid on Total Government Expenditures
609. C. LI, Hitchhiking on the Road of Decision Making under Uncertainty
610. L. ROSENDAHL HUBER, Entrepreneurship, Teams and Sustainability: a Series of Field Experiments
611. X. YANG, Essays on High Frequency Financial Econometrics
612. A.H. VAN DER WEIJDE, The Industrial Organization of Transport Markets: Modeling pricing, Investment and Regulation in Rail and Road Networks
613. H.E. SILVA MONTALVA, Airport Pricing Policies: Airline Conduct, Price Discrimination, Dynamic Congestion and Network Effects
614. C. DIETZ, Hierarchies, Communication and Restricted Cooperation in Cooperative Games
615. M.A. ZOICAN, Financial System Architecture and Intermediation Quality
616. G. ZHU, Three Essays in Empirical Corporate Finance
617. M. PLEUS, Implementations of Tests on the Exogeneity of Selected Variables and their Performance in Practice
618. B. VAN LEEUWEN, Cooperation, Networks and Emotions: Three Essays in Behavioral Economics
619. A.G. KOPANYI-PEUKER, Endogeneity Matters: Essays on Cooperation and Coordination
620. X. WANG, Time Varying Risk Premium and Limited Participation in Financial Markets
621. L.A. GORNICKA, Regulating Financial Markets: Costs and Trade-offs
622. A. KAMM, Political Actors playing games: Theory and Experiments
623. S. VAN DEN HAUWE, Topics in Applied Macroeconometrics
624. F.U. BRAUNING, Interbank Lending Relationships, Financial Crises and Monetary Policy
625. J.J. DE VRIES, Estimation of Alonso’s Theory of Movements for Commuting
626. M. POPŁAWSKA, Essays on Insurance and Health Economics
627. X. CAI, Essays in Labor and Product Market Search
628. L. ZHAO, Making Real Options Credible: Incomplete Markets, Dynamics, and Model Ambiguity
629. K. BEL, Multivariate Extensions to Discrete Choice Modeling
630. Y. ZENG, Topics in Trans-boundary River sharing Problems and Economic Theory
631. M.G. WEBER, Behavioral Economics and the Public Sector
632. E. CZIBOR, Heterogeneity in Response to Incentives: Evidence from Field Data