UvA-DARE is a service provided by the library of the University of Amsterdam (http://dare.uva.nl)

UvA-DARE (Digital Academic Repository)

Essays in panel data modelling

Juodis, A.

Link to publication

Citation for published version (APA): Juodis, A. (2015). Essays in panel data modelling.

General rights
It is not permitted to download or to forward/distribute the text or part of it without the consent of the author(s) and/or copyright holder(s), other than for strictly personal, individual use, unless the work is under an open content license (like Creative Commons).

Disclaimer/Complaints regulations
If you believe that digital publication of certain material infringes any of your rights or (privacy) interests, please let the Library know, stating your reasons. In case of a legitimate complaint, the Library will make the material inaccessible and/or remove it from the website. Please Ask the Library: https://uba.uva.nl/en/contact, or a letter to: Library of the University of Amsterdam, Secretariat, Singel 425, 1012 WP Amsterdam, The Netherlands. You will be contacted as soon as possible.

Download date: 20 Jul 2020

https://dare.uva.nl/personal/pure/en/publications/essays-in-panel-data-modelling(1eb54731-6663-4c2b-9f8e-00a85468fa01).html

Essays in Panel Data Modelling

Arturas Juodis

This thesis analyses the properties of the estimation techniques for panel data models with additive and multiplicative error structures. First, this thesis discusses the relative merits of the maximum likelihood estimators in dynamic panel data models. Second, it provides an in-depth analysis of genuine and pseudo panel data models with unobserved interactive effects.

Arturas Juodis holds a Bachelor's degree in Economics from Vilnius University and an M.Phil. in Economics from the Tinbergen Institute. In December 2011, he joined the Amsterdam School of Economics at the University of Amsterdam as a PhD student. His research mainly focuses on various aspects of panel data analysis for micro- and macro-economic applications.

ESSAYS IN PANEL DATA MODELLING

ISBN 978 90 5170 684 0

Cover design: Crasborn Graphic Designers bno, Valkenburg a.d. Geul

This book is no. 633 of the Tinbergen Institute Research Series, established through

cooperation between Rozenberg Publishers and the Tinbergen Institute. A list of books

which already appeared in the series can be found in the back.

ESSAYS IN PANEL DATA MODELLING

ACADEMIC DISSERTATION

to obtain the degree of doctor

at the Universiteit van Amsterdam

by authority of the Rector Magnificus

prof. dr. D.C. van den Boom

to be defended in public, before a committee appointed by the

College voor Promoties (Doctorate Board), in the Agnietenkapel

on Thursday 3 December 2015, at 10:00

by

Arturas Juodis

born in Minsk, Belarus

Doctoral committee:

Supervisor: Prof. dr. H.P. Boswijk, Universiteit van Amsterdam

Co-supervisor: dr. M.J.G. Bun, Universiteit van Amsterdam

Other members: dr. S.A. Broda, Universiteit van Amsterdam

Prof. dr. M.A. Carree, Universiteit Maastricht

Prof. dr. J.F. Kiviet, Universiteit van Amsterdam

Prof. dr. F.R. Kleibergen, Universiteit van Amsterdam

dr. J.C.M. van Ophem, Universiteit van Amsterdam

dr. R. Okui, Kyoto University

Prof. dr. T.J. Wansbeek, Rijksuniversiteit Groningen

Faculty: Economics and Business

Acknowledgements

It is difficult to overestimate the amount of help and encouragement I received from

my supervisors Peter Boswijk and Maurice Bun. Starting from the beginning of my

PhD they helped me to shape my research agenda and quite importantly allowed me

to deviate from the original topic as much as I wanted. I am especially grateful to

them for always keeping their door open, irrespective of whether I wanted to talk about

vacations, submissions, referees, conferences or visits, you name it.

I would like to express my gratitude to Simon Broda, Martin Carree, Jan Kiviet, Frank

Kleibergen, Hans van Ophem, Ryo Okui and Tom Wansbeek for agreeing to be on my

doctoral committee.

My four years at UvA would not have been as productive and pleasant without good colleagues:

Kees Jan van Garderen, Noud van Giersbergen, Jan Kiviet, Frank Kleibergen, Hans

van Ophem, Simon Broda, Andrei Lalu, Xiye Yang, Yang Liu, Rutger Poldermans,

Milan Pleus and Andrew Pua. I am especially thankful to Frank Kleibergen for his

help and encouragement during the job search. Thanks to the office next to the coffee

machine, I was always happy to have a chat with other fellow PhD students at UvA:

Tomasz, Rutger T., Swaphnil, Lucy, Lin, Oana, Julien, David, Christian, Moutaz, Hao

and Stephany. Finally, the excellent work of non-academic staff Robert, Kees and

Wilma is highly appreciated.

Admission to the TI MPhil programme was a crucial step for my PhD. I am grateful

to the Admission Board of the Tinbergen Institute and the DGS at the time, prof. Adriaan

Soetevent, for opening the door of academia for me. The excellent work of non-academic

staff Judith, Ester and Arianne is appreciated and not forgotten. I have met many great

people at TI, whose friendship I still enjoy. Thanks Piotr, Erkki, Sait (aka Stilian),

Lukasz, Sandor, Violeta, Grega, Ona, Lerby and many others.

During my time as a PhD student I had an opportunity to visit Monash University

in Spring 2014 and Lund University in Spring 2015. My visit to Monash University

would not have been as pleasurable as it was without my always positive and encouraging

host, Vasilis Sarafidis. He made sure my stay in Melbourne was good both inside and

outside the university. I was also very happy to meet George Athanasopoulos, Ann

Maharaj, Param and Mervyn Silvapulle, Anastasios Panagiotelis, Xueyan Zhao and

Tingting Cheng at Monash. I am grateful to Joakim Westerlund for hosting me at

Lund University and especially for his positive attitude and willingness to discuss and

share research ideas. I am also grateful to Hande Karabıyık, Simon Reese, Emre Aylar

and Milda Norkute for making my stay easy and pleasurable.

I am also happy that my fellow Lithuanians were always around to enjoy (and are still

enjoying) the hospitality of Casa Arturo in Amstelveen: Simas, Egle G., Egle J., Vytis,

Zivile, Renata and Ieva.

There are many people back in Lithuania that I would also like to thank. VuSIF

people: Ricardas, Vytautas, Tomas and Linas. Cycling and running fanatics: Darius

and Marius. I am very grateful to some of the academic staff at Vilnius University,

namely, prof. Linas Cekanavicius, prof. Alfredas Rackauskas, dr. Dmitrij Celov and

dr. Vita Karpuskiene, who helped and inspired me to continue with graduate studies.

Finally, I would like to thank my dear second half Zina, my parents and my brother

for being supportive throughout the last 5 years. A PhD can be a stressful (but also

pleasant) endeavour and Zina had to suffer all the ups and downs through all those

years we have been living in the Netherlands. I only wish all my grandparents could

have lived to see this day; I know they would be proud of me.

My research project was financed by NWO (through the MaGW grant “Likelihood-

based inference in dynamic panel data models with endogenous covariates”), Univer-

sity of Amsterdam, Tinbergen Institute and C. Willems Stichting. I would also like

to thank seminar and conference participants at University of Amsterdam, Univer-

sity of Groningen, Utrecht University, Monash University, University of Melbourne,

Deakin University, Lund University, Tinbergen Institute, NESG (Amsterdam, Tilburg,

Maastricht), IAAE annual meetings (London and Thessaloniki), Panel Data Confer-

ence (Cambridge, London and Budapest), Panel Data Workshop in Amsterdam (2013

and 2015) and NY State Camp Econometrics, for their comments, suggestions and

discussions that greatly improved my papers.

August 2015, Amsterdam.

Contents

Acknowledgements
Contents

1 Introduction
1.1 Why panel data?
1.2 Likelihood-based estimation of dynamic panel data models
1.3 Panel data models with interactive effects

2 First Difference Transformation in Panel VAR models: Robustness, Estimation and Inference
2.1 Introduction
2.2 Setup and assumptions
2.2.1 The Model
2.2.2 Assumptions and definitions
2.3 OLS in first differences
2.3.1 With exogenous regressors
2.3.2 Without exogenous regressors
2.4 Transformed MLE
2.4.1 Likelihood function with imposed covariance-stationarity
2.4.2 Cross-sectional heterogeneity
2.4.3 Misspecification of the mean parameter
2.4.4 Identification and bimodality issues for three-wave panels
2.4.5 Time-series heteroscedasticity
2.5 Simulation study
2.5.1 Setup
2.5.2 Designs
2.5.3 Technical remarks
2.5.4 Results: Estimation
2.5.5 Results: Inference
2.6 Conclusions
2.A Proofs
2.A.1 Auxiliary results
2.A.2 Log-likelihood function
2.A.3 Score vector
2.A.4 Bimodality
2.B Iterative bias correction procedure
2.C Tables

3 On Maximum Likelihood Estimation of Dynamic Panel Data Models
3.1 Introduction
3.2 ML estimation for the panel AR(1) model
3.3 Multiple solutions and constrained estimation
3.3.1 Three-wave panel and the Transformed ML estimator
3.3.2 Further asymptotic results for T > 2 and TML
3.3.3 Constrained estimation
3.4 Extension to exogenous regressors
3.5.2 Results: Inference
3.6 Empirical illustration
3.6.1 ARX(1) model
3.6.2 AR(1) model
3.7 Conclusions
3.A Proofs
3.B Tables
3.C Figures

4 Fixed T Dynamic Panel Data Estimators with Multi-Factor Errors
4.1 Introduction
4.2 Theoretical setup
4.3 Estimators
4.3.1 Quasi-differenced (QD) GMM
4.3.2 Quasi-long-differenced (QLD) GMM
4.3.3 Factor IV (FIVU and FIVR)
4.3.4 Linearized QLD GMM
4.3.5 Projection GMM
4.3.6 Linear GMM
4.3.7 Projection Quasi ML Estimator
4.4 Some general remarks on the estimators
4.4.1 (Non-)Invariance to factor loadings
4.4.2 Unbalanced samples
4.4.3 Observed factors
4.5.1 Setup and designs
4.5.2 Results
4.6 Conclusions
4.A Starting values for non-linear estimators
4.B Tables

5 Pseudo Panel Data Models with Cohort Interactive Effects
5.1 Introduction
5.2 The Model
5.3 Cohort Interactive Effects
5.3.1 Inconsistency of the conventional Fixed Effects estimator
5.3.2 Assumptions and estimation
5.3.3 Unbalanced samples
5.3.4 Dynamic models
5.4 Testing, model selection and identification
5.4.1 Testing and model selection
5.4.2 Identification: Local and Weak
5.4.3 Identification: Global
5.5.1 Setup
5.5.3 Results: Testing
5.5.4 Results: Model selection
5.6 Empirical illustration: ENEMDU Dataset
5.6.1 The Dataset
5.6.2 Results
5.7 Conclusions
5.A Theoretical results
5.A.1 Proofs
5.A.2 Differentials
5.A.3 Sufficient conditions for FE estimator
5.A.4 The Hausman test for fixed effects
5.B The ENEMDU dataset
5.B.1 The linear-log specification
5.C Monte Carlo results

Bibliography

Nederlandse samenvatting

Chapter 1

Introduction

1.1 Why panel data?

Panel data are repeated observations on the same cross-sectional units, typically individuals or firms (in microeconomic applications), observed over several time periods. The use of panel data has become increasingly popular in empirical macroeconomic and (especially) microeconomic studies, and there are several reasons behind this success story.

The key advantage of using panel data is the possibility to perform statistical inference

on quantities that can have a causal interpretation, at the same time controlling for

unobservable cross-sectional and time-series characteristics (hence avoiding omitted

variable bias). One of the most common examples of such characteristics, the so-called

individual specific effect, is the effect of talent or skills in a model of workers’ hourly

earnings. In order to consistently estimate the coefficients in that model, researchers

need to control for heterogeneity in workers’ talents or skills. Unfortunately, non-

experimental data containing information on individual workers’ talents and skills are

scarce. Without such information, it is extremely difficult to control for talent using

cross-sectional data to obtain statistical results that can be interpreted as causal. In

contrast, when panel data are available, a variety of estimation methods can be used

to control for the unobservable individual effects.

A second major advantage of panel data is increased precision in estimation. This is

the result of an increase in the number of observations owing to pooling several time

series observations of data for each individual. The possibility to pool observations over

two dimensions (individuals and time) allows estimation of the common parameters of

interest even when one of the dimensions can be of limited size.

This flexibility of panel data modelling was also instrumental in ensuring that, together with standard courses on cross-sectional and time-series modelling, panel data methods became an important part of the econometrics curriculum around the world. Furthermore, the more specialized textbooks by Hsiao (2002), Arellano (2003b) and Baltagi (2013) have become standard in the literature, and might serve the readers of this thesis as a more general introduction to the field of panel data. The applicability and importance of (dynamic) panel data models for empirical economists can be illustrated by the fact that the seminal paper of Arellano and Bond (1991) (which is extensively referred to in this thesis) is one of the most cited papers in the econometrics literature since the early 1990s.¹

Let us briefly explain what the prototypical panel data model looks like and what the key ingredients are that drive the analysis of this type of model. Panel data sets are obtained using $T$ repeated measurements of individual-specific quantities $w_{i,t} = (y_{i,t}, x_{i,t}')'$. Here $y_{i,t}$ is the dependent or explained variable, while $x_{i,t}$ is the vector of observed explanatory variables (or regressors), which can be time-varying or time-invariant.

The two indices in the subscript emphasize that all observations are obtained by sampling from two dimensions (cross-sectional and time). Here the index $i$ denotes individual units (e.g. firms, households, etc.) and the index $t$ denotes time periods. The total number of individual units is usually denoted by $N$. Hence, if the panel is “balanced” (all individuals are observed the same number of times, $T$), the total number of observations is given by $N \times T$.

The analysis of this thesis is limited to linear models, i.e.

$$y_{i,t} = x_{i,t}'\beta + u_{i,t}. \tag{1.1}$$

The main emphasis of this thesis is on models where one of the explanatory variables in $x_{i,t}$ is the lagged value of $y_{i,t}$. Dynamic models are of interest in a wide range of economic applications, including Euler equations for household consumption, adjustment cost models for firms’ factor demands, and empirical models of economic growth.

¹ In “On the Remarkable Success of the ARELLANO-BOND Estimator”, AENORM, No. 77, pp. 15-20, prof. T.J. Wansbeek provides a detailed analysis of the success story of this paper.


Even when the coefficients of lagged dependent variables are not of direct interest, al-

lowing for dynamics in the underlying process may be crucial for recovering consistent

estimates of other parameters.

As was mentioned above, one of the key advantages of panel data models is the possibility to control for omitted individual and time-specific variables, without any need to rely on external instruments. In order to do that, one has to specify the way these omitted variables enter the error term $u_{i,t}$. It is usually assumed that the error term $u_{i,t}$ can be well approximated by the following additive decomposition

$$u_{i,t} = \eta_i + \tau_t + \varepsilon_{i,t}, \tag{1.2}$$

where $\varepsilon_{i,t}$ is the idiosyncratic component, $\eta_i$ is the individual-specific effect and $\tau_t$ is the time effect. This is the standard additive error component model, the one most commonly used and studied in the literature.
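
To illustrate why controlling for the individual effects matters, the following minimal sketch simulates a static version of (1.1) with the additive error structure (1.2) and compares pooled OLS with a two-way within (fixed effects) estimator. The data-generating process, the parameter values and all function names are illustrative assumptions and are not taken from this thesis; in particular, the regressor is made correlated with the individual effect by construction.

```python
import numpy as np


def simulate_static_panel(N=2000, T=4, beta=1.0, seed=0):
    """Simulate y_it = beta * x_it + eta_i + tau_t + eps_it (hypothetical design)."""
    rng = np.random.default_rng(seed)
    eta = rng.normal(size=(N, 1))        # individual-specific effect eta_i
    tau = rng.normal(size=(1, T))        # time effect tau_t
    x = eta + rng.normal(size=(N, T))    # regressor correlated with eta_i
    eps = rng.normal(size=(N, T))        # idiosyncratic component
    y = beta * x + eta + tau + eps
    return y, x


def pooled_ols(y, x):
    """Slope from pooled OLS with a common intercept; ignores eta_i and tau_t."""
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc * yc).sum() / (xc ** 2).sum())


def two_way_within(a):
    """Remove individual and time means from a balanced N x T array."""
    return (a - a.mean(axis=1, keepdims=True)
              - a.mean(axis=0, keepdims=True) + a.mean())


def within_ols(y, x):
    """Slope from OLS on two-way demeaned data (fixed effects estimator)."""
    yw, xw = two_way_within(y), two_way_within(x)
    return float((xw * yw).sum() / (xw ** 2).sum())


y, x = simulate_static_panel()
print("pooled OLS:", pooled_ols(y, x))
print("within    :", within_ols(y, x))
```

Under these assumptions pooled OLS is pulled away from the true coefficient by the correlation between the regressor and the individual effect, whereas the within estimator, which removes both effects by demeaning, should be close to it.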

The error component structure in (1.2) has some limitations as it imposes additivity in individual and time effects. One can relax the additivity assumption, and instead consider a more flexible additive-multiplicative decomposition of $u_{i,t}$. In that case one usually assumes that

$$u_{i,t} = \lambda_i' f_t + \varepsilon_{i,t}, \tag{1.3}$$

and refers to the $L$-dimensional vectors $\lambda_i$ and $f_t$ as individual-specific factor loadings and time-varying factors. This specification is commonly referred to as “interactive-effects” or “multi-factor error terms”. Note how the additive structure can be seen as a special case of the multiplicative one, observing that

$$\eta_i + \tau_t = \begin{pmatrix} \eta_i & 1 \end{pmatrix} \begin{pmatrix} 1 \\ \tau_t \end{pmatrix} = \lambda_i' f_t.$$

Hence, the multiplicative nature of $\lambda_i' f_t$ nests the commonly used additive model. However, this additional flexibility also has a price in terms of estimation, as discussed later in this thesis.
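
The nesting argument can be checked numerically. The short sketch below stacks the loadings $\lambda_i = (\eta_i, 1)'$ and the factors $f_t = (1, \tau_t)'$ into matrices and verifies that their product reproduces $\eta_i + \tau_t$ for every pair $(i, t)$. The function name and the random inputs are hypothetical and serve only as an illustration with $L = 2$.

```python
import numpy as np


def additive_as_factor_structure(eta, tau):
    """Write eta_i + tau_t as lambda_i' f_t with L = 2 factors."""
    N, T = len(eta), len(tau)
    loadings = np.column_stack([eta, np.ones(N)])   # rows are lambda_i' = (eta_i, 1)
    factors = np.column_stack([np.ones(T), tau])    # rows are f_t' = (1, tau_t)
    return loadings @ factors.T                     # N x T matrix of lambda_i' f_t


rng = np.random.default_rng(0)
eta = rng.normal(size=5)   # individual effects
tau = rng.normal(size=4)   # time effects

interactive = additive_as_factor_structure(eta, tau)
additive = eta[:, None] + tau[None, :]
print(np.allclose(interactive, additive))
```

The converse does not hold in general: an arbitrary product $\lambda_i' f_t$ cannot be written as a sum of an individual and a time effect, which is why the additive model is the special case.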

This thesis consists of four independent chapters. The first two chapters are devoted

to the analysis of consistent likelihood-based estimators for univariate and multivariate

panel data models. In those two chapters we will maintain the additive error-component

structure as in (1.2).


The remaining two chapters, on the other hand, relax the additivity assumption and

consider the more flexible specification (1.3). That part of the thesis contributes to the

growing literature on panel data models with “interactive-effects”.

In all chapters it is assumed that the number of cross-sectional units (individuals, firms,

regions) is large (later referred to as “large N”), while the number of time-series obser-

vations is limited and can be as small as three (“fixed T”). It is therefore common to

consider the semi-asymptotic behavior of estimators (and corresponding test statistics)

by keeping T fixed and assuming only N to be large. Asymptotic approximations of

this type are mostly applicable to micro-econometric panels, where it is very costly or

impossible to track individuals (or firms) over longer time spans. Furthermore, in this

thesis, we assume that the slope parameter vector β is the same for all individuals and

is constant over time. Although quite restrictive, this assumption is commonly imposed

when analysing micro-econometric panels. The analysis with individual specific slopes

is extremely challenging for panels with fixed time-series dimensions and especially so

for dynamic panels.

1.2 Likelihood-based estimation of dynamic panel data models

A central theme in linear dynamic panel data analysis is the fact that the Fixed Effects (FE) estimator is inconsistent for fixed $T$ and large $N$. This inconsistency is referred to as the Nickell (1981) bias, and is an example of the incidental parameters problem, see Neyman and Scott (1948). It has therefore become common practice to estimate the parameters of dynamic panel data models by the Generalized Method of Moments (GMM), see Arellano and Bond (1991) and Blundell and Bond (1998). A main reason for using GMM is that it provides asymptotically efficient inference exploiting a minimal set of statistical assumptions. GMM inference is not without its own problems, however, see e.g. Bun and Kiviet (2006) and Kiviet, Pleus, and Poldermans (2015). This, in turn, has led to an interest in likelihood-based methods that implicitly or explicitly correct for the incidental parameters problem.
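
The inconsistency of the FE estimator for fixed $T$ can be seen in a small simulation. The sketch below generates a panel AR(1) process with individual effects, computes the FE (within) estimator, and, for comparison, a simple instrumental variable estimator in first differences that uses the second lag of the level as an instrument (the estimator usually attributed to Anderson and Hsiao). This IV estimator is used here only as a simple stand-in for moment-based estimation; it is not the GMM estimator of Arellano and Bond (1991) nor any estimator studied in this thesis. The design, parameter values and function names are illustrative assumptions.

```python
import numpy as np


def simulate_ar1_panel(N, T, rho, burn=50, seed=0):
    """Simulate y_it = rho * y_i,t-1 + eta_i + eps_it; returns N x (T + 1) levels."""
    rng = np.random.default_rng(seed)
    eta = rng.normal(size=N)
    y = eta / (1.0 - rho)                 # start at the individual long-run mean
    columns = []
    for t in range(burn + T + 1):
        y = rho * y + eta + rng.normal(size=N)
        if t >= burn:                     # keep the last T + 1 periods only
            columns.append(y)
    return np.column_stack(columns)


def fe_estimator(Y):
    """Within (fixed effects) estimator of rho; biased for fixed T."""
    y, ylag = Y[:, 1:], Y[:, :-1]
    yd = y - y.mean(axis=1, keepdims=True)
    ld = ylag - ylag.mean(axis=1, keepdims=True)
    return float((ld * yd).sum() / (ld ** 2).sum())


def iv_first_difference(Y):
    """IV in first differences: instrument (y_t-1 - y_t-2) with y_t-2."""
    dy = Y[:, 2:] - Y[:, 1:-1]            # first difference of y_t
    dylag = Y[:, 1:-1] - Y[:, :-2]        # first difference of y_t-1
    z = Y[:, :-2]                         # instrument: level y_t-2
    return float((z * dy).sum() / (z * dylag).sum())


Y = simulate_ar1_panel(N=20000, T=4, rho=0.5, seed=1)
print("FE estimate:", fe_estimator(Y))
print("IV estimate:", iv_first_difference(Y))
```

With only a handful of time periods the FE estimate should lie well below the true value of 0.5 even though $N$ is large, while the IV estimate in first differences should be close to it.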

Chapter 2 of this thesis, “First Difference Transformation in Panel VAR models: Ro-

bustness, Estimation and Inference”, based on Juodis (2014b), analyses the properties

of the likelihood-based estimator in first differences (also referred to as the Transformed

ML estimator) for the first-order Panel Vector Autoregressive model (or PVAR(1)).


Models of this type have received a substantial amount of attention from

econometricians, with recent contributions by Akashi and Kunitomo (2012), Hsiao

and Zhou (2015) and Hayakawa (2015), among others.

As a starting point for our analysis in that chapter we take the setup of Binder, Hsiao,

and Pesaran (2005) and provide new results that shed some light on the distributional

properties of this maximum likelihood estimator. In particular, the focus is on situations

where the underlying assumptions used to derive asymptotic properties of the Transformed

Maximum Likelihood estimator are violated. A simplified approach to obtain

the estimator is also provided. This chapter is also one of the few papers in the litera-

ture that touches upon the topic of asymptotic bimodality issues for the log-likelihood

function. To illustrate the importance of the underlying assumptions and bimodality,

an extensive Monte Carlo study is conducted. Results of the simulation study provide

some interesting insights about the finite-sample distribution of the estimator and can

be useful both for applied and for theoretical econometricians.

Chapter 3 of this thesis, “On Maximum Likelihood Estimation of Dynamic Panel Data

Models” is based on Bun, Carree, and Juodis (2015). As a starting point of our

work, we take the bimodality analysis briefly discussed in the previous chapter. This

chapter provides an in-depth discussion of the bimodality and negative variance issues

for likelihood-based estimators in the univariate Panel AR(1) model with and without

additional exogenous variables. We concentrate our attention on the univariate AR(1)

model, leaving the more challenging multivariate setup of the previous chapter for

future research.

In this paper we show that the first-order conditions for the likelihood estimators are cubic in the autoregressive parameter. This suggests that the log-likelihood function in finite samples can be bimodal or unimodal, for any value of $T$. Furthermore, we find that confidence intervals based on commonly used test statistics can have poor coverage in finite samples, so that one might need to rely on other means of inference. This problem can be especially relevant for empirical economists using these methods. Using the dataset in Bun and Carree (2005), we show how these theoretical results can influence empirical estimates of U.S. state level unemployment dynamics.
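
The link between a cubic first-order condition and bimodality rests on a simple fact: a cubic polynomial with real coefficients has either one or three real roots (counting multiplicity). When there are three distinct real roots, the corresponding stationary points alternate between local maxima and local minima, which is how two local maxima can arise. The sketch below counts the real roots of a generic cubic; the coefficients are hypothetical and are not the first-order conditions derived in Chapter 3.

```python
import numpy as np


def real_roots_of_cubic(a, b, c, d, tol=1e-8):
    """Return the sorted real roots of a*x^3 + b*x^2 + c*x + d = 0."""
    roots = np.roots([a, b, c, d])
    real = roots[np.abs(roots.imag) < tol].real   # drop complex-conjugate pairs
    return np.sort(real)


# Hypothetical score functions: one with three stationary points, one with a single one.
three = real_roots_of_cubic(1.0, 0.0, -3.0, 1.0)
one = real_roots_of_cubic(1.0, 0.0, 3.0, 1.0)
print(len(three), "real roots:", three)
print(len(one), "real root:", one)
```

In practice this means that a numerical optimizer started from a single point may stop at either of two local maxima, so it can be worthwhile to solve the first-order condition for all real roots and compare the objective function at each of them.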


1.3 Panel data models with interactive effects

In some situations the assumption that omitted unobserved variables can be well ap-

proximated by an additive error component structure, can be too restrictive. For

example, one can consider a model of hourly wage rates as illustrated in Ahn et al.

(2001). It is reasonable to assume that the productivity of an individual’s unobserv-

able talent or skill can change over the business cycle. If so, the effect of unobservable

talent on hourly wages would vary over time because workers’ hourly wage rates de-

pend on their labor productivity. It is also likely that hourly wage rates depend on

multiple individual effects. For example, individual workers’ wages could be affected

by unexpected changes in macroeconomic variables. Panel data models that assume a

single time-invariant individual effect are inappropriate for the analysis of data with

such multiple time-varying individual effects.

In such cases econometricians usually assume that one can instead approximate the

effect of omitted variables with interactive (or multiplicative) fixed effects. Despite the

advantages of the more flexible error-component structure of the model, models with

interactive effects also introduce several practical challenges in terms of estimation.

The presence of multiplicative effects creates two major obstacles for applied econometricians. First of all, their presence invalidates commonly used inferential procedures, as the most commonly used estimators are inconsistent in this case. Secondly, one has to rely on non-linear estimation techniques that are less appealing than their linear counterparts from a computational point of view. Finite-sample and asymptotic properties of these non-linear estimators depend on high-level assumptions and non-trivial nuisance parameters.

The remaining two chapters of this thesis are devoted to the analysis of models of

this type, and as such contribute to the growing literature by shedding some light

on the finite-sample and asymptotic properties of estimators introduced to deal with

interactive effects.

Chapter 4 of this thesis, “Fixed T Dynamic Panel Data Estimators with Multi-Factor

Errors” is based on Juodis and Sarafidis (2014). In this paper we present a compre-

hensive overview of estimators for dynamic panel data models with interactive effects.

The objective of this chapter is to serve as a useful guide for practitioners who wish

to apply methods that allow for multiplicative sources of unobserved heterogeneity in

their model. We pay particular attention to calculating the number of identifiable


parameters correctly, which is a requirement for asymptotically valid inferences and

consistent model selection procedures. This issue is often overlooked in the literature.

We investigate the finite-sample performance of the estimators under a number of

different designs using a large scale Monte Carlo study. In particular, we examine (i)

the effect of the presence of weakly exogenous covariates, (ii) the effect of changing the

magnitude of the correlation between the factor loadings of the dependent variable and

those of the covariates, (iii) the impact of the number of moment conditions on bias and

size for GMM estimators and tests, (iv) the impact of different levels of persistence in

the data, and finally (v) the effect of sample size. These are important considerations

with high empirical relevance. Nevertheless, to the best of our knowledge, they

remain largely unexplored in the literature.

The final chapter, “Pseudo Panel Data Models with Cohort Interactive Effects”, which

is based on Juodis (2015), takes a side step from the main topic of this thesis and

provides a detour to the analysis of pseudo panel data models. When genuine panel

data samples are unavailable, repeated cross-sectional surveys can be used to form

pseudo panels. I investigate the properties of linear pseudo-panel data models with a

fixed number of cohorts and time observations. The chapter extends the work of Inoue (2008)

and Verbeek (2008) to models with multiplicative fixed effects. A special role in that

chapter is devoted to the discussion of identification issues of the proposed estimator

for potentially unbalanced samples.

In addition to the theoretical results for the novel estimator, an extensive Monte Carlo

simulation study is conducted to assess the finite-sample properties. We mainly focus on

the robustness of the proposed estimator with respect to endogeneity, cohort interactive

effects, and weak identification. To the best of our knowledge, this chapter is the first

study in the literature that touches upon the issue of weak and global identification in

pseudo panels for a fixed number of cohorts and time-series observations.

Chapter 2

First Difference Transformation in Panel VAR models: Robustness, Estimation and Inference

2.1 Introduction

When the feedback and interdependency between dependent variables and covariates is

of particular interest, multivariate dynamic panel data models might arise as a natural

modeling strategy. For example, particular policy measures can be seen as a response

to the past evolution of the target quantity, meaning that the reduced form of two

variables can be modeled by means of a Panel VAR (PVAR) model. In this paper we

aim at providing a thorough analysis of the performance of fixed T consistent estimation

techniques for the PVARX(1) model based on observations in first differences. We mainly

focus on situations when the number of time periods is assumed to be relatively small,

while the number of cross-section units is large.

The estimation of univariate dynamic panel data models and the incidental parameter

problem of the ML estimators have received a lot of attention in the last three decades,

see Nickell (1981) and Kiviet (1995) among others. However, a similar analysis for

multivariate panel data models has not been carried out in detail. The main exceptions

are papers by Holtz-Eakin et al. (1988), Hahn and Kuersteiner (2002), Binder, Hsiao,

and Pesaran (2005, hereafter BHP) and Hayakawa (2015) presenting theoretical results

for linear PVAR models. For empirical examples of PVAR models for microeconomic

panels, see Arellano (2003b, pp.116-120), Ericsson and Irandoust (2004), Michaud and

van Soest (2008), Koutsomanoli-Filippaki and Mamatzakis (2009) among others.

Because of the inconsistency of the Fixed Effects (FE, ML) estimator, the estimation

of Dynamic Panel Data (DPD) models has been mainly concentrated within the GMM

framework, with the version of the Arellano and Bond (1991) estimator and estimators

of Arellano and Bover (1995), Blundell and Bond (1998) and Ahn and Schmidt (1995;

1997). However, Monte Carlo studies have revealed that the method of moments (MM)

based estimators might be subject to substantial finite-sample biases, see Kiviet (1995),

Alonso-Borrego and Arellano (1999) and BHP. These potentially unattractive finite

sample properties of the GMM estimators have led to the recent interest in likelihood-

based methods, that are not subject to the incidental parameter bias. In this paper

the ML estimator based on the likelihood function of the first differences of Hsiao et al.

(2002), BHP and Kruiniger (2008) is analyzed (hereafter TML).

Monte Carlo results presented in BHP suggest that the Transformed Maximum Like-

lihood (TML) based estimation procedure outperforms the GMM based methods in

terms of both finite sample bias and RMSE. However, their analysis is incomplete,

particularly because they did not consider cases where the models are stable but the

initial condition is not mean and/or covariance stationary. Furthermore, the Monte

Carlo analysis was limited to situations where error terms are homoscedastic both in

time and in the cross-section dimension, leaving relevant cases of heteroscedastic error

terms unaddressed. We address both issues in the Monte Carlo designs presented in

Section 2.5.

We aim to contribute to the literature in multiple ways. First of all, we show that the

multivariate analogue of the FDOLS estimator of Han and Phillips (2010) is consistent

only over a restricted parameter set. Secondly, we consider properties of the TML

estimator for models with cross-sectional heteroscedasticity and mean non-stationarity.

Furthermore, we show that in the three wave panel the log-likelihood function of the

unrestricted TML estimator can violate the global identification condition. Finally,

the extensive Monte Carlo study expands the finite sample results available in the

literature to cases with possible non-stationary initial conditions and cross-sectional

heteroscedasticity.

The paper is structured as follows. In Section 2.2 we present the model and underlying

assumptions. Theoretical results for the Panel First Difference estimator are presented

in Section 2.3. We continue in Section 2.4 discussing the properties of the TML es-

timator under different assumptions regarding stationarity and heteroscedasticity. In


Section 2.5 we analyze the finite sample performance of estimators considered in the

paper by means of a Monte Carlo analysis. Finally, we conclude in Section 2.6.

Here we briefly discuss notation. Bold upper-case Greek letters are used to denote the original parameters, i.e. $\Phi, \Sigma, \Psi$, while the lower-case Greek letters $\phi, \sigma, \psi$ denote $\operatorname{vec}(\cdot)$ ($\operatorname{vech}(\cdot)$ for symmetric matrices) of the corresponding parameters; in the univariate setup the corresponding parameters are denoted by $\phi, \sigma^2, \psi^2$. Where necessary we use subscript 0 to denote the true values of the aforementioned quantities. We use $\rho(A)$ to denote the spectral radius1 of a matrix $A \in \mathbb{R}^{n \times n}$. The commutation matrix $K_{a,b}$ is defined such that for any $[a \times b]$ matrix $A$, $\operatorname{vec}(A') = K_{a,b}\operatorname{vec}(A)$. The duplication matrix $D_m$ is defined such that for a symmetric $[m \times m]$ matrix $A$, $\operatorname{vec}A = D_m \operatorname{vech}A$. We define $\bar{y}_{i-} \equiv (1/T)\sum_{t=1}^{T} y_{i,t-1}$ and similarly $\bar{y}_{i} \equiv (1/T)\sum_{t=1}^{T} y_{i,t}$. The lag-operator matrix $L_T$ is defined such that for any $[T \times 1]$ vector $x = (x_1, \ldots, x_T)'$, $L_T x = (0, x_1, \ldots, x_{T-1})'$. The $j$th column of the $[x \times x]$ identity matrix is denoted by $e_j$. $\tilde{x}$ is used to indicate variables after the Within Group transformation (for example $\tilde{y}_{i,t} = y_{i,t} - \bar{y}_i$), while $\ddot{x}$ is used for variables after a “quasi-averaging” transformation.2 For further details regarding the notation used in this paper, see Abadir and Magnus (2002).
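The three matrices defined above are easy to build explicitly, which can help when checking vectorized formulas numerically. The following is a minimal sketch, assuming NumPy and column-major stacking for vec; the function names are hypothetical and not part of the original text.

```python
import numpy as np

def vec(A):
    """Stack the columns of A into one vector (column-major order)."""
    return A.reshape(-1, order="F")

def vech(A):
    """Stack the lower-triangular part of a square matrix column by column."""
    m = A.shape[0]
    return np.concatenate([A[j:, j] for j in range(m)])

def commutation_matrix(a, b):
    """K_{a,b} such that vec(A') = K_{a,b} vec(A) for any a x b matrix A."""
    K = np.zeros((a * b, a * b))
    for i in range(a):
        for j in range(b):
            # A[i, j] is entry j*a + i of vec(A) and entry i*b + j of vec(A').
            K[i * b + j, j * a + i] = 1.0
    return K

def duplication_matrix(m):
    """D_m such that vec(A) = D_m vech(A) for any symmetric m x m matrix A."""
    D = np.zeros((m * m, m * (m + 1) // 2))
    col = 0
    for j in range(m):
        for i in range(j, m):
            D[j * m + i, col] = 1.0  # element (i, j)
            D[i * m + j, col] = 1.0  # element (j, i), equal by symmetry
            col += 1
    return D

def lag_matrix(T):
    """L_T such that L_T x = (0, x_1, ..., x_{T-1})'."""
    return np.eye(T, k=-1)
```

Each function returns a dense array, which is adequate for the small dimensions (m, T) considered in fixed T panels.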

2.2 Setup and assumptions

2.2.1 The Model

In this paper we consider the following PVAR(1) specification

yi,t = ηi +Φyi,t−1 + εi,t, i = 1, . . . , N, t = 1, . . . , T, (2.1)

where yi,t is an [m× 1] vector, Φ is an [m×m] matrix of parameters to be estimated,

ηi is an [m × 1] vector of fixed effects and εi,t is an [m × 1] vector of innovations

independent across i, with zero mean and constant covariance matrix Σ.3 If we set

m = 1 the model reduces to the linear DPD model with AR(1) dynamics.
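To make the data structure concrete, the sketch below simulates a panel from (2.1). Several choices are illustrative assumptions rather than part of the model: Gaussian errors, fixed effects of the form ηi = (Im − Φ)µi with standard normal µi, and a covariance-stationary initial observation. The function name and arguments are hypothetical.

```python
import numpy as np

def simulate_pvar1(N, T, Phi, Sigma, rng):
    """Simulate y_{i,t} = eta_i + Phi y_{i,t-1} + eps_{i,t}, t = 1..T.

    Assumptions made for illustration only: Gaussian errors,
    eta_i = (I - Phi) mu_i with mu_i ~ N(0, I), and a stationary start
    y_{i,0} = mu_i + u_{i,0}. Requires spectral radius of Phi below one.
    Returns an array of shape (N, T + 1, m) holding y_{i,0}, ..., y_{i,T}.
    """
    m = Phi.shape[0]
    I = np.eye(m)
    mu = rng.standard_normal((N, m))
    eta = mu @ (I - Phi).T
    # Stationary variance V of u_{i,0}: vec(V) = (I - Phi kron Phi)^{-1} vec(Sigma).
    vec_V = np.linalg.solve(np.eye(m * m) - np.kron(Phi, Phi),
                            Sigma.reshape(-1, order="F"))
    V = vec_V.reshape(m, m, order="F")
    V = (V + V.T) / 2  # remove rounding asymmetry
    y = np.empty((N, T + 1, m))
    y[:, 0] = mu + rng.multivariate_normal(np.zeros(m), V, size=N)
    eps = rng.multivariate_normal(np.zeros(m), Sigma, size=(N, T))
    for t in range(1, T + 1):
        y[:, t] = eta + y[:, t - 1] @ Phi.T + eps[:, t - 1]
    return y
```

For example, `simulate_pvar1(500, 4, Phi, Sigma, np.random.default_rng(0))` returns five observations (t = 0, ..., 4) for each of 500 units.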

1 $\rho(A) \equiv \max_i(|\lambda_i|)$, where the $\lambda_i$'s are (possibly complex) eigenvalues of a matrix $A$.

2 $\ddot{y}_i = \bar{y}_i - y_{i,0}$ and $\ddot{y}_{i-} = \bar{y}_{i-} - y_{i,0}$.

3 Later in the paper we present the detailed analysis when $\Sigma$ is $i$ specific.


For a prototypical example of (2.1) consider the following bivariate model (see e.g. Bun

and Kiviet (2006), Akashi and Kunitomo (2012) and Hsiao and Zhou (2015))

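$$
\begin{aligned}
y_{i,t} &= \eta_{yi} + \gamma y_{i,t-1} + \beta x_{i,t} + u_{i,t},\\
x_{i,t} &= \eta_{xi} + \phi y_{i,t-1} + \rho x_{i,t-1} + v_{i,t},
\end{aligned}
$$

where $E[u_{i,t}v_{i,t}] = \sigma_{uv}$. This system has the following reduced form

$$
\begin{pmatrix} y_{i,t} \\ x_{i,t} \end{pmatrix}
=
\begin{pmatrix} \eta_{yi} + \beta\eta_{xi} \\ \eta_{xi} \end{pmatrix}
+
\begin{pmatrix} \gamma + \beta\phi & \beta\rho \\ \phi & \rho \end{pmatrix}
\begin{pmatrix} y_{i,t-1} \\ x_{i,t-1} \end{pmatrix}
+
\begin{pmatrix} u_{i,t} + \beta v_{i,t} \\ v_{i,t} \end{pmatrix}. \tag{2.2}
$$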

Depending on the parameter values, the process $\{x_{i,t}\}_{t=0}^{T}$ can be either exogenous ($\phi = \sigma_{uv} = 0$), weakly exogenous ($\sigma_{uv} = 0$) or endogenous ($\sigma_{uv} \neq 0$).
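The mapping from the structural parameters of the bivariate example to the reduced-form matrices of (2.1) can be sketched as follows. The variance arguments `sigma_u2` and `sigma_v2` and the numerical values are hypothetical and chosen only for illustration.

```python
import numpy as np

def reduced_form(gamma, beta, phi, rho, sigma_u2, sigma_v2, sigma_uv):
    """Map the structural bivariate parameters into (Phi, Sigma) of the PVAR(1)."""
    Phi = np.array([[gamma + beta * phi, beta * rho],
                    [phi,                rho]])
    # Reduced-form errors are (u + beta * v, v)' = A (u, v)'.
    A = np.array([[1.0, beta],
                  [0.0, 1.0]])
    Omega = np.array([[sigma_u2, sigma_uv],
                      [sigma_uv, sigma_v2]])
    Sigma = A @ Omega @ A.T
    return Phi, Sigma

# Weakly exogenous x_{i,t}: feedback from y (phi != 0) but sigma_uv = 0.
Phi, Sigma = reduced_form(gamma=0.4, beta=0.5, phi=0.2, rho=0.6,
                          sigma_u2=1.0, sigma_v2=1.0, sigma_uv=0.0)
spectral_radius = np.max(np.abs(np.linalg.eigvals(Phi)))
```

Note that the reduced-form covariance matrix is not diagonal even when σuv = 0, because βvi,t enters the first equation.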

For many empirically relevant applications the PVAR(1) model specification might be

too restrictive and incomplete. The original model then can be extended by including

strictly exogenous variables (the PVARX(1) model)

yi,t = ηi +Φyi,t−1 +Bxi,t + εi,t, i = 1, . . . , N, t = 1, . . . , T, (2.3)

where xi,t is a [k × 1] vector of strictly exogenous regressors and B is an [m × k]

parameter matrix.4 Furthermore, some models with group specific spatial dependence,

as in e.g. Kripfganz (2015) and Verdier (2015), can be also formulated as a reduced

form PVARX(1).

2.2.2 Assumptions and definitions

At first we define several notions that are primarily used for the model without exoge-

nous regressors.

Definition 2.1 (Effect stationary initial condition). The initial condition $y_{i,0}$ is said to be effect stationary if

$$E[y_{i,0} \mid \eta_i] = (I_m - \Phi_0)^{-1}\eta_i, \tag{2.4}$$

implying that the process $\{y_{i,t}\}_{t=0}^{T}$ generated by (2.1) is effect stationary, $E[y_{i,t} \mid \eta_i] = E[y_{i,0} \mid \eta_i]$, for $\rho(\Phi_0) < 1$.
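The implication in Definition 2.1 can be verified in one step. The following worked equation is a sketch that additionally assumes $E[\varepsilon_{i,t} \mid \eta_i] = 0_m$, which is not stated explicitly in the definition:

$$
E[y_{i,1} \mid \eta_i] = \eta_i + \Phi_0 (I_m - \Phi_0)^{-1}\eta_i
= \left[(I_m - \Phi_0) + \Phi_0\right](I_m - \Phi_0)^{-1}\eta_i
= (I_m - \Phi_0)^{-1}\eta_i .
$$

Repeating the argument for $t = 2, \ldots, T$ gives $E[y_{i,t} \mid \eta_i] = E[y_{i,0} \mid \eta_i]$ for all $t$. The condition $\rho(\Phi_0) < 1$ guarantees that $I_m - \Phi_0$ is invertible.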

4 Note that the model considered in Han and Phillips (2010) substantially differs from (2.3). They consider a model specification with lags of $x_{i,t}$ and restricted parameters. Their specification can be accommodated within (2.3) only if the so-called common factor restrictions on $B$ are imposed.


Note that effect non-stationarity does not imply that the process $\{y_{i,t}\}_{t=0}^{T}$ is mean non-stationary, i.e. $E[y_{i,t}] \neq E[y_{i,0}]$. The latter property of the process crucially depends on $E[\eta_i]$.

Definition 2.2 (Common dynamics). The individual heterogeneity ηi is said to satisfy

the “common dynamics” assumption if

ηi = (Im −Φ0)µi. (2.5)

Under the common dynamics assumption, individual heterogeneity drops from the model in the pure unit root case $\Phi_0 = I_m$. Without this assumption the process $\{y_{i,t}\}_{t=0}^{T}$ has a discontinuity at $I_m$, as at this point the unrestricted process is a Multivariate Random Walk with drift. Combining the two notions results in $E[y_{i,0} \mid \mu_i] = \mu_i$; note that this term is well defined for $\rho(\Phi_0) = 1$.

Definition 2.3 (Extensibility). The DGP satisfies the extensibility condition if

$$\Phi_0\Sigma_0 = (\Phi_0\Sigma_0)'.$$

We call this condition “Extensibility” as in some cases this condition is sufficient to

extend univariate conclusions to general m ≥ 1 situations. One of the important

implications of this condition is that

$$\sum_{t=0}^{\infty} \Phi_0^{t}\Sigma_0(\Phi_0^{t})' = (I_m - \Phi_0^{2})^{-1}\Sigma_0 = \Sigma_0(I_m - \Phi_0^{2\prime})^{-1}.$$
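The implication above can be checked numerically. The sketch below constructs a hypothetical pair (Φ, Σ) that satisfies the extensibility condition by setting Φ = SΣ⁻¹ for a symmetric matrix S, and compares a truncated version of the infinite sum with the two closed forms. The numerical values are illustrative assumptions.

```python
import numpy as np

Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
S_sym = np.array([[0.6, 0.1],
                  [0.1, 0.3]])
Phi = S_sym @ np.linalg.inv(Sigma)   # then Phi Sigma = S_sym is symmetric

# Truncated sum_{t >= 0} Phi^t Sigma (Phi^t)'; converges since rho(Phi) < 1 here.
total = np.zeros((2, 2))
power = np.eye(2)
for _ in range(500):
    total += power @ Sigma @ power.T
    power = power @ Phi

I = np.eye(2)
closed_left = np.linalg.solve(I - Phi @ Phi, Sigma)        # (I - Phi^2)^{-1} Sigma
closed_right = Sigma @ np.linalg.inv(I - (Phi @ Phi).T)    # Sigma (I - Phi^2')^{-1}
```

The identity holds because Φ₀Σ₀ = Σ₀Φ₀′ implies Φ₀ᵗΣ₀(Φ₀ᵗ)′ = Φ₀²ᵗΣ₀, so the sum is a matrix geometric series in Φ₀².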

At first we summarize the assumptions regarding the DGP used in this paper, that are

similar to those made by Hsiao et al. (2002) and Binder et al. (2005).

(A.1) The disturbances εi,t, t ≤ T , are i.i.d. for all i with finite fourth moment, with

E[εi,t] = 0m and E[εi,tε′i,s] = 1(s=t)Σ0, Σ0 being a p.d. matrix.

(A.2) The initial deviation ui,0 ≡ yi,0 − µi is i.i.d. across cross-sectional units, with

E[ui,0] = 0m with variance Ψu,0 and a finite fourth moment.

(A.3) For all i = 1, . . . , N and t = 1, . . . , T , the moment restrictions E[ui,0ε′i,t] = Om

are satisfied.

(A.4) N →∞, but T is fixed.


(A.5) Regressors (if present) xi,t are strictly exogenous: E[xi,sε′i,t] = Ok×m, ∀t, s =

1, . . . , T , with a finite fourth moment.

(A.6) Matrix Φ0 ∈ Rm×m satisfies ρ(Φ0) < 1.

(A.6)* Denote by κ a [p × 1] vector of unknown coefficients. κ ∈ Γ , where Γ is a

compact subset of Rp and κ0 ∈ interior(Γ ).

We denote the set of Assumptions (A.1)-(A.6) by SA and by SA* the set when in

addition the (A.6)* assumption is satisfied. The SA assumptions are used to establish

results for the Panel FD estimators, while SA* are used to study asymptotic proper-

ties of the TML estimator. Assumption (A.6) is needed to ensure that the Hessian

of the TML estimator has a full rank5 in the model without regressors. On the other

hand, in Assumption (A.6)* we implicitly extend the parameter space for Φ to satisfy

the usual compactness assumption so that both consistency and asymptotic normality

can be proved directly, assuming the model is globally identified over the parameter

space. However, as we show in Section 2.4.4, the extended parameter space (beyond the

stationary region) might violate the global identification condition. For now the dimen-

sion of κ (“p”), is left unspecified and depends on a particular parametrization used

for estimation (with/without exogenous regressors, with/without mean term, etc.). In

Section 2.4.2 we consider the situation where we allow for individual specific Ψu,0 and

Σ0 matrices.

Note that Assumption (A.2) does not impose any restrictions on yi,0 and µi directly,

but instead on the initial deviation ui,0 (that in principle can be a linear or non-linear

function of µi). However, it is important to note that all estimators in first differences

remain invariant to the distributional characteristics of µi only if

yi,0 = µi + εi,0

with the idiosyncratic component ui,0 = εi,0 independent of µi. As emphasized in

Hsiao et al. (2002) and Hayakawa and Pesaran (2012), in this case µi can be spatially

correlated and/or depend on εi,t, t = 1, . . . , T without affecting the distribution of

the estimator in First Differences. Later in the paper we discuss situations when

this restriction might be violated and the consequences for the properties of the TML

estimator.

5 See e.g. Bond et al. (2005) and Juodis (2014a) for proofs that the Hessian matrix of the TML estimator is singular at the unit root in Panel AR(1) and Panel VAR(1) models, respectively.


2.3 OLS in first differences

2.3.1 With exogenous regressors

The original model in levels contains individual effects that we remove using the first-

difference transformation. In that case the model specification is given by

∆yi,t = Φ∆yi,t−1 +B∆xi,t + ∆εi,t, i = 1, . . . , N, t = 2, . . . , T.

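Before proceeding we define the following variables

$$
\Delta w_{i,t} \equiv \begin{pmatrix} \Delta y_{i,t-1} \\ \Delta x_{i,t} \end{pmatrix}, \quad
S_N \equiv \left(\frac{1}{N}\sum_{i=1}^{N}\sum_{t=2}^{T} \Delta w_{i,t}\Delta w_{i,t}'\right), \quad
\Sigma_W \equiv \operatorname*{plim}_{N\to\infty} S_N, \quad
\Upsilon \equiv (\Phi, B).
$$

After pooling observations for all t and i, we define the pooled panel first difference estimator (FDOLS) as

$$
\hat{\Upsilon}' = S_N^{-1}\left(\frac{1}{N}\sum_{i=1}^{N}\sum_{t=2}^{T} \Delta w_{i,t}\Delta y_{i,t}'\right). \tag{2.6}
$$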

Similarly to the conventional Fixed Effects (FE) transformation, the FD transformation

introduces correlation between the explanatory variable ∆yi,t−1 and the modified error

term ∆εi,t. As a result this estimator is inconsistent,6 with the asymptotic bias derived

in Proposition 2.4.

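Proposition 2.4. Let $\{y_{i,t}\}_{t=1}^{T}$ be generated by (2.3) and Assumptions SA be satisfied. Then

$$
\operatorname*{plim}_{N\to\infty}\,(\hat{\Upsilon} - \Upsilon_0)' = -(T-1)\,\Sigma_W^{-1}\begin{pmatrix} \Sigma_0 \\ O_{k\times m} \end{pmatrix}. \tag{2.7}
$$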

It is easy to see that FDOLS is numerically equal to the FE estimator with T = 2,

thus the asymptotic bias is identical as well. Furthermore, as long as T ≥ 2 the

bias correction approaches as in Kiviet (1995) and Bun and Carree (2005) are readily

available for this estimator (for more details please refer to Appendix 2.B). However, the

6 Irrespective of whether T is fixed or T → ∞.


consistency and asymptotic normality of any estimator based on an iterative procedure

crucially depends on the existence of a unique fixed point. As a result, similarly to

the estimator of Bun and Carree (2005), this estimator might fail to converge for

some DGP specifications. These issues motivate us to look for other analytical bias-

correction procedures that have desirable finite sample properties irrespective of the

DGP parameter values and initialization yi,0. Some special cases for the model without

exogenous regressors are discussed in the next section.

2.3.2 Without exogenous regressors

Assume that $y_{i,0}$ is covariance stationary and as a consequence

$$
\Sigma_W = (T-1)\left(\Sigma_0 + (I_m - \Phi_0)\left(\sum_{t=0}^{\infty}\Phi_0^{t}\Sigma_0(\Phi_0^{t})'\right)(I_m - \Phi_0)'\right).
$$

In the univariate case it is well known that covariance stationarity of yi,0 is a sufficient

condition to obtain an analytical bias-corrected estimator. However, it is no longer

sufficient for m > 1 and general matrices Φ0 and Σ0. One special case for analyt-

ical bias-corrected estimator is obtained for (Φ0,Σ0) that satisfy the “extensibility”

condition, so that

$$\Sigma_W = 2(T-1)\,\Sigma_0\,(I_m + \Phi_0')^{-1}.$$
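The simplification of $\Sigma_W$ under the extensibility condition can be traced step by step. The following derivation is a sketch that uses only $\Phi_0\Sigma_0 = \Sigma_0\Phi_0'$, the implication stated after Definition 2.3, and the fact that $I_m - \Phi_0$ and $I_m + \Phi_0$ commute:

$$
\begin{aligned}
\frac{\Sigma_W}{T-1}
&= \Sigma_0 + (I_m - \Phi_0)(I_m - \Phi_0^{2})^{-1}\Sigma_0(I_m - \Phi_0)' \\
&= \Sigma_0 + (I_m + \Phi_0)^{-1}(I_m - \Phi_0)\Sigma_0 \\
&= (I_m + \Phi_0)^{-1}\left[(I_m + \Phi_0) + (I_m - \Phi_0)\right]\Sigma_0 \\
&= 2(I_m + \Phi_0)^{-1}\Sigma_0 = 2\Sigma_0(I_m + \Phi_0')^{-1}.
\end{aligned}
$$

Substituting this expression into (2.7) for the model without regressors gives $-(T-1)\Sigma_W^{-1}\Sigma_0 = -\tfrac{1}{2}(I_m + \Phi_0')$, so that the probability limit of the FDOLS estimator of $\Phi$ is $(\Phi_0 - I_m)/2$.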

The resulting fixed T consistent estimator for Φ is then given by

$$\hat{\Phi}_{FDLS} = 2\hat{\Phi}_{\Delta} + I_m.$$

It can be similarly shown that this estimator is also fixed T consistent if Φ0 = Im

and the common dynamics assumption is satisfied. For m = 1, this estimator was

analyzed by Han and Phillips (2010), who labeled it the First Difference Least-Squares

(FDLS) estimator, and proved its consistency and asymptotic normality under various

assumptions. It should be noted that the same estimator (or the moment conditions

it is based on) has been studied earlier in the DPD literature, see Bond et al. (2005),

Ramalho (2005), Hayakawa (2007), Kruiniger (2007).
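A small simulation can illustrate both the inconsistency of FDOLS and the bias correction behind the FDLS estimator. The sketch below is an illustration under stated assumptions, not a replication of any design in this paper: Gaussian errors, Σ0 = Im with a symmetric Φ0 (so that the extensibility condition holds), a covariance-stationary initial condition, and hypothetical values for N, T and Φ0.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, m = 20000, 4, 2
Phi0 = np.array([[0.5, 0.2],
                 [0.2, 0.3]])   # symmetric, so Phi0 Sigma0 is symmetric for Sigma0 = I
Sigma0 = np.eye(m)
I = np.eye(m)

# Stationary variance of the initial deviation: vec(V) = (I - Phi0 kron Phi0)^{-1} vec(Sigma0).
V = np.linalg.solve(np.eye(m * m) - np.kron(Phi0, Phi0),
                    Sigma0.reshape(-1, order="F")).reshape(m, m, order="F")
V = (V + V.T) / 2

mu = rng.standard_normal((N, m))
y = np.empty((N, T + 1, m))
y[:, 0] = mu + rng.multivariate_normal(np.zeros(m), V, size=N)
for t in range(1, T + 1):
    eps = rng.multivariate_normal(np.zeros(m), Sigma0, size=N)
    y[:, t] = mu @ (I - Phi0).T + y[:, t - 1] @ Phi0.T + eps

dy = np.diff(y, axis=1)            # dy[:, t-1] holds Delta y_{i,t}, t = 1..T
lhs = dy[:, 1:].reshape(-1, m)     # Delta y_{i,t},   t = 2..T
rhs = dy[:, :-1].reshape(-1, m)    # Delta y_{i,t-1}, t = 2..T

# Pooled OLS in first differences, then the analytical correction.
Phi_fdols = np.linalg.solve(rhs.T @ rhs, rhs.T @ lhs).T
Phi_fdls = 2 * Phi_fdols + I
```

In this design `Phi_fdols` should be close to (Φ0 − Im)/2 rather than Φ0, while `Phi_fdls` should be close to Φ0.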

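Proposition 2.5 (Asymptotic Normality FDLS). Let the DGP for covariance stationary $y_{i,t}$ satisfy the extensibility condition together with the conditions of Proposition 2.4. Then

$$\sqrt{N}\left(\hat{\phi}_{FDLS} - \phi_0\right) \xrightarrow{d} \mathrm{N}_m(0_{m^2}, F), \tag{2.8}$$

where

$$
F \equiv (\Sigma_W^{-1} \otimes I_m)\,X\,(\Sigma_W^{-1} \otimes I_m), \quad
X \equiv \operatorname*{plim}_{N\to\infty}\frac{1}{N}\sum_{i=1}^{N}\operatorname{vec}O_i\,(\operatorname{vec}O_i)',
$$

$$
O_i \equiv \left(\sum_{t=2}^{T}\left(2\Delta y_{i,t} + (I_m - \Phi_0)\Delta y_{i,t-1}\right)\Delta y_{i,t-1}'\right).
$$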

The proof of Proposition 2.5 follows directly as an application of the standard Lindeberg-

Levy CLT (see e.g. White (2000) for a general reference on asymptotic results).

Note that if the extensibility condition is violated the multivariate analogue of the

FDLS estimator is not fixed T consistent. In that case the moment conditions similar

to Han and Phillips (2010) can be considered. However, for general Φ0 and Σ0 matrices

these moment conditions are non-linear in Φ and require numerical optimization, making

this approach undesirable, because the closed-form estimator is the main advantage of

the FDLS estimator as compared to the TML estimator that we describe in the next

section.

2.4 Transformed MLE

Independently, Hsiao et al. (2002) and Kruiniger (2002)7 suggested building the quasi-

likelihood for a transformation of the original data, such that after the transformation

the likelihood function is free from incidental parameters. In particular, the likelihood

function for the first differences was analyzed. BHP extended the univariate analysis of

Hsiao et al. (2002) and Kruiniger (2002) to the multivariate case, allowing for possible

cointegration between endogenous regressors.

In order to estimate (2.3) using the TML estimator of BHP we need to fully describe

the density function f(∆yi|∆Xi). The only thing that needs to be specified and

not imposed directly by (2.3) is E[∆yi,1|∆Xi], where ∆Xi is a [Tk × 1] vector of

stacked exogenous variables. The conditional mean assumption is actually stronger

than necessary for consistency and asymptotic normality of the TML estimator so we

follow the approach of Hsiao et al. (2002) and consider the following linear projection

for the first observation:

7Later appeared in Kruiniger (2008).


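(TX.D) $\operatorname{Proj}[\Delta y_{i,1} \mid \Delta X_i] = \gamma + G_{\pi}\Delta X_i = B\Delta x_{i,1} + G\Delta X_i^{\dagger}$, $\quad \Delta X_i^{\dagger} = (1, \Delta X_i')'$,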

with the projection error denoted by vi,1. For the resulting TML estimator to be con-

sistent and standard inference procedures to be applicable, the population projection

coefficients have to be identical for all cross-sectional units. This requirement can be

violated if ui,0 is an individual specific function of µi (or ui,0 is a function of µi and

µi is deterministic).

Before proceeding we define

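$$\Delta E_i \equiv (I_{Tm} - L_T \otimes \Phi)\Delta Y_i - (I_T \otimes B)\Delta X_i - \operatorname{vec}(G\Delta X_i^{\dagger}e_1'),$$

where $\Delta Y_i = \operatorname{vec}(\Delta y_{i,1}, \ldots, \Delta y_{i,T})$. Then assuming (conditional) joint normality of the error terms and the initial observation, the log-likelihood function (up to a constant) is of the following form

$$
\ell(\kappa) = -\frac{N}{2}\log|\Sigma_{\Delta\tau}| - \frac{N}{2}\operatorname{tr}\left(\Sigma_{\Delta\tau}^{-1}\,\frac{1}{N}\sum_{i=1}^{N}\Delta E_i\Delta E_i'\right), \tag{2.9}
$$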

with κ = (φ′,σ′,ψ′, vecB′, vecG′)′ and Ψ = E[vi,1v′i,1]. The Σ∆τ matrix has a block

tridiagonal structure, with −Σ on the lower and upper first off-diagonal blocks, and

2Σ on all but the first (1,1) diagonal blocks. The first (1,1) block is set to Ψ , which

takes into account the fact that the variance of vi,1 is treated as a free parameter.
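The block tridiagonal structure described above can be written down directly with a Kronecker product. The following is a minimal sketch with a hypothetical function name; it builds the [Tm × Tm] matrix and leaves the inversion to a generic solver.

```python
import numpy as np

def sigma_delta_tau(Sigma, Psi, T):
    """Block tridiagonal [Tm x Tm] matrix: 2*Sigma on the diagonal blocks,
    -Sigma on the first off-diagonal blocks, and Psi in the (1,1) block."""
    m = Sigma.shape[0]
    pattern = 2 * np.eye(T) - np.eye(T, k=1) - np.eye(T, k=-1)
    out = np.kron(pattern, Sigma)
    out[:m, :m] = Psi   # variance of the first observation is a free parameter
    return out

# Hypothetical parameter values for illustration.
Sigma = np.array([[1.0, 0.3],
                  [0.3, 2.0]])
Psi = np.array([[1.5, 0.2],
                [0.2, 2.5]])
T = 4
S = sigma_delta_tau(Sigma, Psi, T)
```

One useful sanity check is the determinant: with Θ = Σ + T(Ψ − Σ), the determinant of this matrix equals |Σ|^(T−1)|Θ|, so no [Tm × Tm] determinant is actually needed.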

Remark 2.1. As discussed in BHP, the log-likelihood function in (2.9) depends on a

fixed number of parameters, and satisfies the usual regularity conditions. Therefore

under SA* the maximizer of this (quasi) log-likelihood function is consistent with

limiting normal distribution as N → ∞. Consistency is derived assuming that the

log-likelihood function has a unique global maximum at the true value κ0. Note that

for this log-likelihood function consistency of the resulting estimator cannot be proved

based on zeros of the gradient vector, as in general more than one solution will solve

the First Order Conditions (FOC). Section 2.4.4 contains some details for AR(1) on

this issue, while the follow-up paper of Bun et al. (2015) provides a detailed analysis

for the ARX(1) model.

Remark 2.2. Note that the results for the TML estimator derived in this paper do not require the normality assumption. If the normality assumption is violated, $\ell(\kappa)$ is a (quasi) log-likelihood function. For brevity, we use the term log-likelihood rather than quasi log-likelihood even if the normality assumption is violated. In its general form, the asymptotic variance-covariance matrix of the estimator has a “sandwich” form. This “sandwich” form allows for $\sqrt{N}$ consistent inference when the normality assumption is violated.

Next we show that conditioning on exogenous variables in first differences leads to a

concentrated log-likelihood function in φ only.

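Theorem 2.6. Let Assumptions SA* and (TX.D) be satisfied. Then the log-likelihood function of BHP for model (2.3) can be rewritten as

$$
\begin{aligned}
-\frac{2}{N}\ell(\kappa) ={}& (T-1)\log|\Sigma| + \log|\Theta| \\
&+ \operatorname{tr}\left(\Sigma^{-1}\frac{1}{N}\sum_{i=1}^{N}\sum_{t=1}^{T}(\tilde{y}_{i,t} - \Phi\tilde{y}_{i,t-1} - B\tilde{x}_{i,t})(\tilde{y}_{i,t} - \Phi\tilde{y}_{i,t-1} - B\tilde{x}_{i,t})'\right) \\
&+ \operatorname{tr}\left(\Theta^{-1}\frac{T}{N}\sum_{i=1}^{N}(\ddot{y}_{i} - G\Delta X_i^{\dagger} - \Phi\ddot{y}_{i-} - B\ddot{x}_{i})(\ddot{y}_{i} - G\Delta X_i^{\dagger} - \Phi\ddot{y}_{i-} - B\ddot{x}_{i})'\right),
\end{aligned}
$$

where $\kappa = (\phi', \sigma', \theta', \operatorname{vec}B', \operatorname{vec}G')'$, $\Theta \equiv \Sigma + T(\Psi - \Sigma)$ and $\ddot{x}_i \equiv \bar{x}_i - x_{i,0}$.

Proof. In Appendix 2.A.2.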

Proof. In Appendix 2.A.2.

The main conclusion of Theorem 2.6 is that in the case where Ψ is unrestricted, both the

score and the Hessian matrix of the log-likelihood function have closed form expressions,

that are easy to use. That implies that there is no need to use the involved algorithms

of BHP in order to compute the inverse and the determinant of the block tridiagonal

matrix Σ∆τ .
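The equivalence of the two representations can be checked numerically. The sketch below does this for the special case without exogenous regressors and without a mean term in the first observation (B = 0, G = 0), evaluating −(2/N) times the log-likelihood once through the full block tridiagonal matrix and once through the Within/Between decomposition. Function names, data and parameter values are hypothetical.

```python
import numpy as np

def neg2_loglik_direct(y, Phi, Sigma, Psi):
    """-(2/N) * log-likelihood via the full [Tm x Tm] block tridiagonal matrix."""
    N, T1, m = y.shape
    T = T1 - 1
    dy = np.diff(y, axis=1)                         # (N, T, m)
    e = dy.copy()
    e[:, 1:] -= dy[:, :-1] @ Phi.T                  # Delta y_t - Phi Delta y_{t-1}, t >= 2
    E = e.reshape(N, T * m)                         # stacked Delta E_i, time-major
    pattern = 2 * np.eye(T) - np.eye(T, k=1) - np.eye(T, k=-1)
    S = np.kron(pattern, Sigma)
    S[:m, :m] = Psi
    return np.linalg.slogdet(S)[1] + np.trace(np.linalg.solve(S, E.T @ E / N))

def neg2_loglik_decomposed(y, Phi, Sigma, Psi):
    """Same quantity using only [m x m] matrices (Within and Between parts)."""
    N, T1, m = y.shape
    T = T1 - 1
    Theta = Sigma + T * (Psi - Sigma)
    r = y[:, 1:] - y[:, :-1] @ Phi.T                # y_t - Phi y_{t-1}, t = 1..T
    r_bar = r.mean(axis=1)
    within = (r - r_bar[:, None, :]).reshape(-1, m)
    between = r_bar - y[:, 0] @ (np.eye(m) - Phi).T # quasi-averaged residual
    Z = within.T @ within / N
    M = T * between.T @ between / N
    return ((T - 1) * np.linalg.slogdet(Sigma)[1] + np.linalg.slogdet(Theta)[1]
            + np.trace(np.linalg.solve(Sigma, Z)) + np.trace(np.linalg.solve(Theta, M)))

rng = np.random.default_rng(0)
y = rng.standard_normal((30, 4, 2))                 # arbitrary data, T = 3, m = 2
Phi = np.array([[0.4, 0.1], [-0.2, 0.3]])
Sigma = np.array([[1.0, 0.3], [0.3, 2.0]])
Psi = np.array([[1.5, 0.2], [0.2, 2.5]])
direct = neg2_loglik_direct(y, Phi, Sigma, Psi)
decomposed = neg2_loglik_decomposed(y, Phi, Sigma, Psi)
```

The identity is algebraic, so the two values agree for any data set and any admissible parameter values, not only at the truth.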

In order to simplify the notation, we introduce a new variable

$$\xi_i(\kappa) \equiv \ddot{y}_i - G\Delta X_i^{\dagger} - \Phi\ddot{y}_{i-} - B\ddot{x}_i. \tag{2.10}$$

Using this definition,8 we can formulate the following result.

8 Some other variables used in this section are defined in Appendix 2.A, so we do not repeat them here.


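Proposition 2.7. Let Assumptions SA* be satisfied. Then the score vector associated with the log-likelihood function of Theorem 2.6 is given by9

$$
\nabla(\kappa) =
\begin{pmatrix}
\operatorname{vec}\left(\Sigma^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T}(\tilde{y}_{i,t} - \Phi\tilde{y}_{i,t-1} - B\tilde{x}_{i,t})\tilde{y}_{i,t-1}' + T\Theta^{-1}\sum_{i=1}^{N}\xi_i(\kappa)\ddot{y}_{i-}'\right) \\
D_m'\operatorname{vec}\left(\frac{N}{2}\left(\Sigma^{-1}(Z_N(\kappa) - (T-1)\Sigma)\Sigma^{-1}\right)\right) \\
D_m'\operatorname{vec}\left(\frac{N}{2}\left(\Theta^{-1}(M_N(\kappa) - \Theta)\Theta^{-1}\right)\right) \\
\operatorname{vec}\left(\Sigma^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T}(\tilde{y}_{i,t} - \Phi\tilde{y}_{i,t-1} - B\tilde{x}_{i,t})\tilde{x}_{i,t}' + T\Theta^{-1}\sum_{i=1}^{N}\xi_i(\kappa)\ddot{x}_{i}'\right) \\
\operatorname{vec}\left(T\Theta^{-1}\sum_{i=1}^{N}\xi_i(\kappa)\Delta X_i^{\dagger\prime}\right)
\end{pmatrix}. \tag{2.11}
$$

Furthermore, the score vector satisfies the usual regularity condition

$$E[\nabla(\kappa_0)] = 0_p.$$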


The dimension of the κ vector is substantial especially for moderate values of m and

k, hence from a numerical point of view, maximization with respect to all parameters

might not be appealing. Next we show that it is possible to construct the concentrated

log-likelihood function with respect to the φ parameter only.10 To simplify further

notation we define the following concentrated variables (assuming N > Tk)

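$$
\begin{aligned}
\breve{y}_{i} &\equiv \ddot{y}_{i} - \left(\sum_{i=1}^{N}\ddot{y}_{i}\Delta X_i^{\dagger\prime}\right)\left(\sum_{i=1}^{N}\Delta X_i^{\dagger}\Delta X_i^{\dagger\prime}\right)^{-1}\Delta X_i^{\dagger}, \\
\breve{y}_{i-} &\equiv \ddot{y}_{i-} - \left(\sum_{i=1}^{N}\ddot{y}_{i-}\Delta X_i^{\dagger\prime}\right)\left(\sum_{i=1}^{N}\Delta X_i^{\dagger}\Delta X_i^{\dagger\prime}\right)^{-1}\Delta X_i^{\dagger}, \\
\breve{y}_{i,t} &\equiv \tilde{y}_{i,t} - \left(\sum_{i=1}^{N}\sum_{t=1}^{T}\tilde{y}_{i,t}\tilde{x}_{i,t}'\right)\left(\sum_{i=1}^{N}\sum_{t=1}^{T}\tilde{x}_{i,t}\tilde{x}_{i,t}'\right)^{-1}\tilde{x}_{i,t}, \\
\breve{y}_{i,t-1} &\equiv \tilde{y}_{i,t-1} - \left(\sum_{i=1}^{N}\sum_{t=1}^{T}\tilde{y}_{i,t-1}\tilde{x}_{i,t}'\right)\left(\sum_{i=1}^{N}\sum_{t=1}^{T}\tilde{x}_{i,t}\tilde{x}_{i,t}'\right)^{-1}\tilde{x}_{i,t}.
\end{aligned}
$$

9 See also similar derivations in Mutl (2009).

10 The key observation for this result is that, although the $B$ parameter enters both $\operatorname{tr}(\cdot)$ components, $\ddot{x}_i$ belongs to the column space spanned by $\Delta X_i^{\dagger}$. Hence after concentrating out $G$, $B$ is no longer present in the second term.

Using the newly defined variables the concentrated log-likelihood function for $\kappa^{c} = (\phi', \sigma', \theta')'$ is given by

$$
\begin{aligned}
\ell^{c}(\kappa^{c}) ={}& -\frac{N}{2}\left((T-1)\log|\Sigma| + \operatorname{tr}\left(\Sigma^{-1}\frac{1}{N}\sum_{i=1}^{N}\sum_{t=1}^{T}(\breve{y}_{i,t} - \Phi\breve{y}_{i,t-1})(\breve{y}_{i,t} - \Phi\breve{y}_{i,t-1})'\right)\right) \\
& -\frac{N}{2}\left(\log|\Theta| + \operatorname{tr}\left(\Theta^{-1}\frac{T}{N}\sum_{i=1}^{N}(\breve{y}_{i} - \Phi\breve{y}_{i-})(\breve{y}_{i} - \Phi\breve{y}_{i-})'\right)\right).
\end{aligned}
$$

Continuing we can concentrate out both Σ and Θ to obtain the concentrated log-likelihood function for the φ parameter vector only:

$$
\begin{aligned}
\ell^{c}(\phi) ={}& -\frac{N(T-1)}{2}\log\left|\frac{1}{N(T-1)}\sum_{i=1}^{N}\sum_{t=1}^{T}(\breve{y}_{i,t} - \Phi\breve{y}_{i,t-1})(\breve{y}_{i,t} - \Phi\breve{y}_{i,t-1})'\right| \\
& -\frac{N}{2}\log\left|\frac{T}{N}\sum_{i=1}^{N}(\breve{y}_{i} - \Phi\breve{y}_{i-})(\breve{y}_{i} - \Phi\breve{y}_{i-})'\right|.
\end{aligned}
$$

However, as there is no closed-form solution for Φ, numerical routines should be used to maximize this concentrated likelihood function.11 The corresponding FOC can be derived from Proposition 2.7 for the unrestricted model.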
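For the univariate model without regressors the concentrated log-likelihood depends on a single parameter, so its numerical maximization is easy to sketch. The code below is an illustration under assumptions: a panel AR(1) with Gaussian errors, a covariance-stationary start, hypothetical sample sizes, and a plain grid search over a stationary interval standing in for a proper numerical optimizer.

```python
import numpy as np

def concentrated_loglik(phi, y):
    """Concentrated log-likelihood divided by N (up to a constant), panel AR(1)."""
    N, T1 = y.shape
    T = T1 - 1
    r = y[:, 1:] - phi * y[:, :-1]                  # y_t - phi * y_{t-1}, t = 1..T
    r_bar = r.mean(axis=1)
    Z = np.sum((r - r_bar[:, None]) ** 2) / N       # Within part
    M = T * np.sum((r_bar - (1 - phi) * y[:, 0]) ** 2) / N   # Between part
    return -0.5 * (T - 1) * np.log(Z / (T - 1)) - 0.5 * np.log(M)

rng = np.random.default_rng(1)
N, T, phi0 = 20000, 5, 0.5
mu = rng.standard_normal(N)
y = np.empty((N, T + 1))
y[:, 0] = mu + rng.standard_normal(N) / np.sqrt(1 - phi0 ** 2)   # stationary start
for t in range(1, T + 1):
    y[:, t] = (1 - phi0) * mu + phi0 * y[:, t - 1] + rng.standard_normal(N)

grid = np.linspace(-0.9, 0.9, 361)                  # step of 0.005
values = np.array([concentrated_loglik(p, y) for p in grid])
phi_hat = grid[np.argmax(values)]
```

Restricting the grid to the stationary region is a design choice of this sketch; it sidesteps the possibility of a second mode outside that region, which is discussed for the AR(1) case in Section 2.4.4.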

Remark 2.3. The log-likelihood function in Theorem 2.6 can be expressed in terms of the log-likelihood function for observations in levels $\ell_{l}^{c}(\cdot)$ (Within group part), as

$$
\ell(\kappa) = \ell_{l}^{c}(\phi, \sigma, \operatorname{vec}B) - \frac{N}{2}\left(\log|\Theta| + \operatorname{tr}\left(\Theta^{-1}\frac{T}{N}\sum_{i=1}^{N}\xi_i(\kappa)\xi_i(\kappa)'\right)\right),
$$

where the levels part depends on the parameters $(\phi', \sigma', \operatorname{vec}B')'$ only. The additional (“Between” group) term corrects for the fixed T inconsistency of the standard ML (FE) estimator. This result is just a generalization of the conclusions of Kruiniger (2006; 2008) and Han and Phillips (2013) to PVARX(1) with respect to the functional form of $\ell(\kappa)$.12

Remark 2.4. In the online appendix Juodis (2014c) we derive the exact expression for the empirical Hessian matrix $H_N(\hat{\kappa}_{TMLE})$ and show that this matrix as well as its inverse are not block-diagonal, hence the TMLE of Φ and Σ (as well as Θ) are not asymptotically independent.13 Non block-diagonality of the covariance matrix needs to be taken into account, e.g. for the impulse response analysis as in Cao and Sun (2011).

11 For the PVAR(1) model with spatial dependence of autoregressive type as in Mutl (2009), both the Θ and Σ parameters can be concentrated out but not the spatial dependence parameter λ.

12 Grassetti (2011) also discusses a similar decomposition of the log-likelihood function for the panel ARX(1) model.

In the next few sections we focus our attention on the restricted model without addi-

tional strictly exogenous regressors. In this case the quasi log-likelihood function can

be simplified and written in the following way

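$$
\begin{aligned}
\ell(\kappa) ={}& -\frac{N}{2}\left((T-1)\log|\Sigma| + \operatorname{tr}\left(\Sigma^{-1}\frac{1}{N}\sum_{i=1}^{N}\sum_{t=1}^{T}(\tilde{y}_{i,t} - \Phi\tilde{y}_{i,t-1})(\tilde{y}_{i,t} - \Phi\tilde{y}_{i,t-1})'\right)\right) \\
& -\frac{N}{2}\left(\log|\Theta| + \operatorname{tr}\left(\Theta^{-1}\frac{T}{N}\sum_{i=1}^{N}(\ddot{y}_{i} - \Phi\ddot{y}_{i-})(\ddot{y}_{i} - \Phi\ddot{y}_{i-})'\right)\right),
\end{aligned} \tag{2.12}
$$

where $\kappa = (\phi', \sigma', \theta')'$, $\Theta \equiv \Sigma + T(\Psi - \Sigma)$ and $\Psi = \operatorname{var}\Delta y_{i,1}$. The model without exogenous regressors was considered in BHP for the TML estimator and in Alvarez and Arellano (2003) for the model in levels. Note that in this specification we assume that $E[u_{i,0}] = 0_m$ holds; later in Section 2.4.3 we investigate properties of the maximizer of (2.12) when this assumption is violated. Possible problems with respect to bimodality of the log-likelihood function in the AR(1) context are discussed in Section 2.4.4. In Section 2.4.1 we provide results when the covariance-stationarity assumption is imposed on Ψ.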

2.4.1 Likelihood function with imposed covariance-stationarity

If one is willing to strengthen some of the original assumptions by assuming that $u_{i,0}$ comes from the (covariance) stationary distribution, then the log-likelihood function is a function of $\kappa^{cov} = (\phi', \sigma')'$ only. The Θ matrix in this case is no longer treated as a free parameter but instead is restricted to be of the following form

$$
\Theta = \Sigma + T(I_m - \Phi)\left(\sum_{t=0}^{\infty}\Phi^{t}\Sigma(\Phi^{t})'\right)(I_m - \Phi)'.
$$
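The infinite sum in the restricted Θ does not need to be truncated in practice, since it solves a linear (discrete Lyapunov) equation. The following is a minimal sketch with a hypothetical function name; it assumes that the spectral radius of Φ is below one.

```python
import numpy as np

def theta_stationary(Phi, Sigma, T):
    """Theta = Sigma + T (I - Phi) S (I - Phi)', where S = sum_t Phi^t Sigma (Phi^t)'.

    S solves S = Phi S Phi' + Sigma, i.e. vec(S) = (I - Phi kron Phi)^{-1} vec(Sigma).
    """
    m = Phi.shape[0]
    I = np.eye(m)
    vec_S = np.linalg.solve(np.eye(m * m) - np.kron(Phi, Phi),
                            Sigma.reshape(-1, order="F"))
    S = vec_S.reshape(m, m, order="F")
    return Sigma + T * (I - Phi) @ S @ (I - Phi).T

# Univariate check: phi = 0.5, sigma^2 = 1, T = 5 gives 1 + 5 * 0.25 * (4/3) = 8/3.
theta_ar1 = theta_stationary(np.array([[0.5]]), np.array([[1.0]]), 5)
```

The same vectorized expression, (I − Φ ⊗ Φ)⁻¹ vec Σ, is what appears inside the Jacobian terms of the score under covariance stationarity.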

Note that if one imposes covariance stationarity of $u_{i,0}$, it is no longer possible to construct the concentrated log-likelihood for the φ parameter and a joint optimization over the full parameter vector $\kappa^{cov}$ is required.14 Kruiniger (2008) presents asymptotic results for the univariate version of this estimator under a range of assumptions regarding types of convergence. Results for PVAR(1) can be proved similarly.

13 This result is in sharp contrast to the pure time series VARs where it can be shown that estimates are indeed asymptotically independent.

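Proposition 2.8. Let Assumptions SA* be satisfied. Then the score vector associated with the log-likelihood function in (2.12) under covariance stationarity is given by:15

$$
\nabla(\kappa^{cov}) =
\begin{pmatrix}
\operatorname{vec}\left(W_{2,N}(\kappa^{cov})\right) + J_{\phi\theta}'\operatorname{vec}W_{1,N}(\kappa^{cov}) \\
D_m'\left(\operatorname{vec}\left(\frac{N}{2}\left(\Sigma^{-1}(Z_N(\kappa^{cov}) - (T-1)\Sigma)\Sigma^{-1}\right)\right) + J_{\sigma\theta}'\operatorname{vec}W_{1,N}(\kappa^{cov})\right)
\end{pmatrix}. \tag{2.13}
$$

Here we define $\Pi \equiv \Phi - I_m$ and

$$
W_{1,N}(\kappa) \equiv \frac{N}{2}\left(\Theta^{-1}(M_N(\kappa) - \Theta)\Theta^{-1}\right),
$$

$$
W_{2,N}(\kappa) \equiv \Sigma^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T}(\tilde{y}_{i,t} - \Phi\tilde{y}_{i,t-1})\tilde{y}_{i,t-1}' + T\Theta^{-1}\sum_{i=1}^{N}(\ddot{y}_{i} - \Phi\ddot{y}_{i-})\ddot{y}_{i-}',
$$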

Jφθ ≡ −T((σ′D′m(Im2 −Φ′ ⊗Φ′)−1

)⊗ Im2

)× (Im ⊗Km⊗Im)− (Im2 ⊗ vec (Π) + vec (Π)⊗ Im2)

+ T((σ′D′m(Im2 −Φ′ ⊗Φ′)−1

)⊗((Π ⊗Π) (Im2 −Φ⊗Φ)−1

))× (Im ⊗Km⊗Im) (Im2 ⊗ φ+ φ⊗ Im2) ,

$$J_{\sigma\theta} \equiv I_{m^2} + T(\Pi \otimes \Pi)(I_{m^2} - \Phi \otimes \Phi)^{-1}.$$


It can be seen that $E[\nabla(\kappa_0^{cov})] \neq 0_{m^2 + (1/2)(m+1)m}$, unless the initial condition is covariance stationary (that is in contrast with the conclusion of Proposition 2.7 for the unrestricted estimator). Thus violation of the covariance stationarity implies that the estimator $\hat{\kappa}^{cov}$ is inconsistent.

Remark 2.5. Han and Phillips (2013) discuss possible problems of the TML estimator

with imposed covariance stationarity near unity. They observe that the log-likelihood

function can be ill-behaved and bimodal close to φ0 = 1. In this paper, we do not

investigate this possibility of bimodality for the PVAR model as the behaviour of the

log-likelihood function close to unity is not of prime interest for us. Furthermore,

the bimodality in Han and Phillips (2013) is not related to the bimodality of the unrestricted TML estimator as discussed in Section 2.4.4.

14 Unless the parameter space for Φ and Σ is such that the “extensibility condition” is satisfied, see univariate results in Han and Phillips (2013).

15 Note that there is a mistake in the derivations of the $J_{\phi\theta}$ term in Mutl (2009).


2.4.2 Cross-sectional heterogeneity

In this subsection we consider a model with possible cross-sectional heterogeneity in

Σ,Ψu. For notational simplicity we consider a model without exogenous regressors.

All results presented can be extended to a model with exogenous regressors at the

expense of more complicated notation.

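(A.1)** The disturbances $\varepsilon_{i,t}$, $t \leq T$, are i.h.d. for all $i$ with $E[\varepsilon_{i,t}] = 0_m$ and $E[\varepsilon_{i,t}\varepsilon_{i,s}'] = 1_{(s=t)}\Sigma_{0,i}$, $\Sigma_{0,i}$ being a p.d. matrix and $\max_i E\left[\|\varepsilon_{i,t}\|^{4+\delta}\right] < \infty$ for some $\delta > 0$.

(A.2)** The initial deviations $u_{i,0}$ are i.h.d. across cross-sectional units, with $E[u_{i,0}] = 0_m$ and finite p.d. variance matrix $\Psi_{u,0,i}$ and $\max_i E\left[\|u_{i,0}\|^{4+\delta}\right] < \infty$, for some $\delta > 0$.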

We denote by $\Sigma_0$ and similarly by $\Psi_{u,0}$ the limiting values of the corresponding sample averages, i.e. $\Sigma_0 = \lim_{N\to\infty}(1/N)\sum_{i=1}^{N}\Sigma_{0,i}$.16 Existence of the higher-order moments

as presented in Assumptions (A.1)**-(A.2)** is a standard sufficient condition for

the Lindeberg-Feller CLT to apply. We denote by SA** the set of assumptions SA*,

with (A.1)-(A.2) replaced by (A.1)**-(A.2)**. The univariate analogues of the

results presented in this section for the TMLE estimator, were derived by Kruiniger

(2013) and Hayakawa and Pesaran (2012).

Remark 2.6. As an example of a DGP that satisfies (A.2)**, consider the following equation

$$y_{i,0} = \mu_i + F(\mu_i)\varepsilon_{y0}, \tag{2.14}$$

with $\mu_i$ being a non-stochastic $m$ dimensional vector, $F(\cdot): \mathbb{R}^{m} \to \mathbb{R}^{m\times m}$ a real function and $\varepsilon_{y0} \sim (0_m, \Sigma_{y0})$. In this example $E[u_{i,0}] = 0_m$, while $E[u_{i,0}u_{i,0}'] = F(\mu_i)\Sigma_{y0}F(\mu_i)'$.

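16 As it was mentioned in Kruiniger (2013), Assumptions (A.1)**-(A.2)** are actually stronger than necessary, as it is sufficient to assume that $(1/N)\sum_{i=1}^{N}E[\varepsilon_{i,s}\varepsilon_{i,s}'] = (1/N)\sum_{i=1}^{N}E[\varepsilon_{i,t}\varepsilon_{i,t}']$ for all $s, t = 2, \ldots, T$ to prove consistency and asymptotic normality.

The unrestricted log-likelihood function for $\kappa = (\phi', \sigma_1', \ldots, \sigma_N', \theta_1', \ldots, \theta_N')'$ suffers from the incidental parameter problem, as the number of parameters grows with the sample size, N. This implies that no $\sqrt{N}$ consistent inference can be made on the $\sigma_i$ and $\theta_i$ parameters, but that does not imply that the φ parameter cannot be consistently estimated. Notably, we consider the pseudo log-likelihood function $\ell_p(\kappa)$17

$$
\begin{aligned}
\ell_p(\kappa) ={}& -\frac{N}{2}\left((T-1)\log|\Sigma| + \operatorname{tr}\left(\Sigma^{-1}\frac{1}{N}\sum_{i=1}^{N}\sum_{t=1}^{T}(\tilde{y}_{i,t} - \Phi\tilde{y}_{i,t-1})(\tilde{y}_{i,t} - \Phi\tilde{y}_{i,t-1})'\right)\right) \\
& -\frac{N}{2}\left(\log|\Theta| + \operatorname{tr}\left(\Theta^{-1}\frac{T}{N}\sum_{i=1}^{N}(\ddot{y}_{i} - \Phi\ddot{y}_{i-})(\ddot{y}_{i} - \Phi\ddot{y}_{i-})'\right)\right),
\end{aligned}
$$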

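obtained if one would mistakenly assume that observations are i.i.d. We shall prove that the conclusions from Section 2.4 continue to hold, with $\kappa_0$ replaced by pseudo-true values $\bar{\kappa} = (\bar{\phi}', \bar{\sigma}', \bar{\theta}')'$, where

$$
\bar{\sigma} = \operatorname{vech}\Sigma_0, \quad \bar{\theta} = \operatorname{vech}\Theta_0, \quad \bar{\phi} = \phi_0,
$$

$$
\Theta_0 = \Sigma_0 + T(I_m - \Phi_0)\left(\lim_{N\to\infty}\frac{1}{N}\sum_{i=1}^{N}\Psi_{u,0,i}\right)(I_m - \Phi_0)'.
$$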

We assume that the pseudo-true values satisfy a compactness property similar to (A.6)*. It is not difficult

to see that the point-wise probability limit of (1/N)`p(κ) is given by

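$$
\operatorname*{plim}_{N\to\infty}\frac{1}{N}\ell_p(\kappa) = -\frac{1}{2}\left((T-1)\log|\Sigma| + \operatorname{tr}\left(\Sigma^{-1}\operatorname*{plim}_{N\to\infty}Z_N(\kappa)\right)\right) - \frac{1}{2}\left(\log|\Theta| + \operatorname{tr}\left(\Theta^{-1}\operatorname*{plim}_{N\to\infty}M_N(\kappa)\right)\right),
$$

where

$$
\operatorname*{plim}_{N\to\infty}Z_N(\kappa) = (T-1)\Sigma_0 + (\Phi_0 - \Phi)\left(\operatorname*{plim}_{N\to\infty}R_N\right)(\Phi_0 - \Phi)' - \frac{1}{T}\left((\Phi_0 - \Phi)\Xi\Sigma_0 + \Sigma_0\Xi'(\Phi_0 - \Phi)'\right),
$$

$$
\operatorname*{plim}_{N\to\infty}M_N(\kappa) = \Theta_0 + (\Phi_0 - \Phi)\left(\operatorname*{plim}_{N\to\infty}P_N\right)(\Phi_0 - \Phi)' + \frac{1}{T}\left((\Phi_0 - \Phi)\Xi\Theta_0 + \Theta_0\Xi'(\Phi_0 - \Phi)'\right).
$$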

Note that we would obtain the same probability limit of the pseudo log-likelihood function if $u_{i,0}$ and $\{\varepsilon_{i,t}\}_{i=1,t=1}^{N,T}$ were i.i.d. Gaussian with parameters κ, hence identification follows from the result for i.i.d. data. Similarly denote $\kappa_N = (\sigma_N', \theta_N', \phi')'$, where

$$
\sigma_N = \frac{1}{N}\sum_{i=1}^{N}\sigma_{0,i}, \qquad \theta_N = \frac{1}{N}\sum_{i=1}^{N}\theta_{0,i}, \qquad \phi = \phi_0.
$$

17Here “p” stands for pseudo and is used to distinguish from the standard TMLE log-likelihood function where inference on Σ and Θ is possible.

Consistency and asymptotic normality of the maximizer of ℓp(κ) follow from standard arguments, see e.g. Amemiya (1985).

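Proposition 2.9 (Consistency and Asymptotic normality). Under Assumptions SA** the maximizer $\hat\kappa$ of $\ell_p(\kappa)$ is consistent for the pseudo-true value, $\hat\kappa \xrightarrow{p} \kappa$. Furthermore, under these assumptions

$$
\sqrt{N}(\hat\kappa - \kappa_N) \xrightarrow{d} N(0, B_{PML}),
$$

where

$$
B_{PML} = H_\ell^{-1} I_\ell H_\ell^{-1}, \qquad
H_\ell = \lim_{N\to\infty} E\left[-\frac{1}{N}H_p^N(\kappa)\right], \qquad
I_\ell = \lim_{N\to\infty}\frac{1}{N}E\left[\sum_{i=1}^{N}\nabla_p^{(i)}(\kappa_{0,i})\nabla_p^{(i)}(\kappa_{0,i})'\right].
$$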

In the Appendix we show that the expected value of the score of this pseudo log-likelihood function evaluated at κN is zero. Here ∇(i)p(κ0,i) denotes the contribution of one cross-sectional unit i to the score of the pseudo log-likelihood function ∇p(κ), evaluated at the true values φ0, σ0,i, θ0,i. Note that unless cross-sectional heterogeneity disappears (at a sufficiently fast rate) as N → ∞, the standard “sandwich” formula of the variance-covariance matrix evaluated at κ is not a consistent estimate of the asymptotic variance-covariance matrix in Proposition 2.9, as in general

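$$
\lim_{N\to\infty}\frac{1}{N}\sum_{i=1}^{N}\sigma_{0,i}\sigma_{0,i}' \neq \left(\lim_{N\to\infty}\frac{1}{N}\sum_{i=1}^{N}\sigma_{0,i}\right)\left(\lim_{N\to\infty}\frac{1}{N}\sum_{i=1}^{N}\sigma_{0,i}\right)', \tag{2.15}
$$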

while Hℓ and BPML are not block-diagonal for fixed T. However, under some restrictive assumptions on higher order moments of the initial observations and the variance of strictly exogenous regressors (when they are present), Hayakawa and Pesaran (2012) argue that it is possible to construct a modified consistent estimator of Iℓ for the ARX(1) model. In the Monte Carlo section of this paper we use the standard “sandwich” estimator for the variance-covariance matrix without any modifications. We leave the derivation of a modified consistent estimator of Iℓ for the general PVARX(1) case for future research.
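The inequality in (2.15) can be illustrated with a small scalar sketch. The variances below are hypothetical values chosen only for illustration (they do not come from the paper), and `heterogeneity_gap` is a name introduced here; the point is that the cross-sectional average of squares exceeds the square of the average whenever the σ0,i differ across units.

```python
def heterogeneity_gap(sigmas):
    """Average of squares minus square of the average for scalar variances.

    The first term mimics the left-hand side of (2.15), the second the
    right-hand side; their difference is the cross-sectional variance.
    """
    n = len(sigmas)
    mean_of_squares = sum(s * s for s in sigmas) / n
    square_of_mean = (sum(sigmas) / n) ** 2
    return mean_of_squares - square_of_mean


# Hypothetical heterogeneous variances: the gap is strictly positive.
gap_heterogeneous = heterogeneity_gap([0.5, 1.0, 1.5, 3.0])

# Homogeneous variances: both sides of (2.15) coincide.
gap_homogeneous = heterogeneity_gap([1.5, 1.5, 1.5, 1.5])
```

In the homogeneous case the gap vanishes, which is the situation in which the standard “sandwich” formula would be appropriate.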


2.4.3 Misspecification of the mean parameter

Let us assume that one does not acknowledge the fact that the data in differences are mean non-stationary (as a consequence of $E[u_{i,0}] = \gamma_{u0} \neq 0_m$) and just considers the log-likelihood function as in (2.12). Denote κ = (φ′, σ′, θ′)′, where

$$
\phi = \phi_0, \qquad \sigma = \sigma_0, \qquad \theta = \sigma_0 + T\operatorname{vech}\left[(I_m-\Phi_0)E[u_{i,0}u_{i,0}'](I_m-\Phi_0)'\right].
$$

Hence θ is a function of the second moment of ui,0, rather than the variance of ui,0.

Analogously to the univariate result in Kruiniger (2002), we have the following result.

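Proposition 2.10. Let Assumptions SA* be satisfied, except for the condition $E[u_{i,0}] = \gamma_{u0} = 0_m$. Then the maximizer $\hat\kappa$ of (2.12) is consistent in the sense that $\hat\kappa \xrightarrow{p} \kappa$. Furthermore, under these assumptions

$$
\sqrt{N}(\hat\kappa - \kappa) \xrightarrow{d} N(0, B_{ML}),
$$

where

$$
B_{ML} = H_\ell^{-1} I_\ell H_\ell^{-1}, \qquad
H_\ell = \lim_{N\to\infty} E\left[-\frac{1}{N}H^N(\kappa)\right], \qquad
I_\ell = \lim_{N\to\infty} E\left[\frac{1}{N}\sum_{i=1}^{N}\nabla^{(i)}(\kappa)\nabla^{(i)}(\kappa)'\right].
$$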

In the Appendix we show that the expected value of the score of this log-likelihood function evaluated at κ is zero.

Remark 2.7. One can think of γ = (Φ0 − Im)γu0 as a (restricted) time effect for

∆yi,1. In general, the non-inclusion of time effects (when they are present in the

model for yi,t, t > 1) results in inconsistency of the TML estimator. As already discussed in BHP, inclusion of time effects is equivalent to cross-sectional demeaning of all ∆yi,t beforehand. The resulting estimator is then consistent for κ0. As a result,

if the cross-sectional demeaning is performed beforehand, the non-inclusion of the γ

parameter is inconsequential.

Remark 2.8. By combining the analysis in Propositions 2.9 and 2.10 we can see that in cases where E[∆yi,1] = γi is individual specific, one still obtains a consistent estimate of Φ by simply maximizing ℓp(κ).18 As an example of this situation we consider the DGP

$$
y_{i,0} = \Gamma\mu_i + \varepsilon_{y0}, \qquad \varepsilon_{y0} \sim (0_m, \Sigma_{y0}),
$$

with Γ ≠ Im and µi being non-stochastic individual specific effects. Hence, the mean E[∆yi,1] = (Φ0 − Im)(Γ − Im)µi = γi is individual specific.

18Please refer to the proof of Proposition 2.9 in Appendix.

2.4.4 Identification and bimodality issues for three-wave panels

In this section we study the behavior of the log-likelihood function for the TML es-

timator with an unrestricted initial condition. Consistency and asymptotic normality

of any ML estimator, among others, requires the assumption that the expected log-

likelihood function has the unique maximum at the true value. As we shall prove in

this section, this condition is possibly violated for the TML estimator with unrestricted

initial condition for T = 2. For ease of exposition we consider the univariate setup as

in Hsiao et al. (2002).

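Theorem 2.11. Let Assumptions SA* be satisfied. Then for all φ0 ∈ (−1; 1) and T = 2 the following holds

$$
\operatorname*{plim}_{N\to\infty}\ell_c(\phi_0) = \operatorname*{plim}_{N\to\infty}\ell_c(\phi_p) \tag{2.16}
$$

for any value of $\psi_{u,0}^2 > 0$. Consequently the expected log-likelihood function has two local maxima

$$
\kappa_0 = (\phi_0, \sigma_0^2, \theta_0^2)', \qquad \kappa_p = (\phi_p, \theta_0^2, \sigma_0^2)',
$$

where

$$
\phi_p \equiv 2\left(\frac{x-1}{x}\right) + \phi_0, \qquad x \equiv 1 + (1-\phi_0)^2\psi_{u,0}^2/\sigma_0^2 = \frac{1}{2}\left(\frac{\theta_0^2}{\sigma_0^2} + 1\right).
$$

Recall that based on the definition of Θ in Theorem 2.6, the true value of θ² is given by

$$
\theta_0^2 = \sigma_0^2 + T(1-\phi_0)^2\psi_{u,0}^2, \qquad \psi_{u,0}^2 = E[u_{i,0}^2].
$$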

Several remarks regarding the results in Theorem 2.11 are worth mentioning.19 First of all, instead of proving the result using the concentrated log-likelihood function, it can be proved similarly by considering the expected log-likelihood function directly. Secondly, if the parameter space is expressed in terms of κ = (φ, σ², ψ²)′, then the value of ψ² in both sets is equal to $\psi_0^2 = \psi_p^2 = (\sigma_0^2 + \theta_0^2)/2$.

19We should emphasize that Theorem 2.11 has theoretical meaning only if φp ∈ Γ.

Remark 2.9. While deriving the result we assumed that E[ui,0] = 0 and that γ is not included in the parameter set. If E[ui,0] ≠ 0 then two cases are possible: a) a misspecified log-likelihood function as in Section 2.4.3 is considered and the result remains unchanged; b) the γ parameter is included in the set of parameters and as a result Theorem 2.11 does not hold. For intuition observe that in the latter case the trivial estimator $\hat\phi = \left(\sum_{i=1}^{N}\Delta y_{i,2}\right)/\left(\sum_{i=1}^{N}\Delta y_{i,1}\right)$ is consistent. However, the key observation for this special case is that the model does not contain time effects. If, on the other hand, the model contains time effects, $\hat\phi$ is no longer consistent and consequently the main result of this section is still valid after cross-sectional demeaning of the data.

Remark 2.10. In the covariance stationary case it can be shown that the conclusion

of Theorem 2.11 extends to PVAR(1) if the extensibility condition is satisfied and

in addition Φ0 is symmetric. In particular, this condition is satisfied by all three

stationary designs in BHP with the pseudo value equal to the identity matrix.

Without loss of generality we can rewrite $\psi_{u,0}^2$ as

$$
\psi_{u,0}^2 = \alpha\,\frac{\sigma_0^2}{1-\phi_0^2}, \qquad \alpha \geq 0.
$$

To get more intuition about the problem at hand we can rewrite φp in the following way

$$
\phi_p = \frac{(\phi_0^2+\phi_0)(1-\alpha) + 2\alpha}{1 + \alpha + \phi_0(1-\alpha)}. \tag{2.17}
$$

From here it can easily be seen that the pseudo-true value φp is equal to unity for covariance stationary initialization (α = 1). Furthermore, we can consider other special cases

$$
|\phi_0| \leq 1,\ \alpha = 0 \;\to\; \phi_p = \phi_0,
$$

$$
|\phi_0| \leq 1,\ \alpha \in (0,1) \;\to\; \phi_0 < \phi_p < 1.
$$
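The equivalence between the definition of φp in Theorem 2.11 and the rewritten form (2.17) can be checked numerically. The sketch below is only a sanity check under the stated parametrization; the function names are introduced here for illustration and σ0² is normalized to one without loss of generality.

```python
def phi_p_from_x(phi0, psi2, sigma2):
    """Second mode as defined in Theorem 2.11 (T = 2): 2(x - 1)/x + phi0."""
    x = 1.0 + (1.0 - phi0) ** 2 * psi2 / sigma2
    return 2.0 * (x - 1.0) / x + phi0


def phi_p_from_alpha(phi0, alpha):
    """Second mode as in (2.17), where psi2 = alpha * sigma2 / (1 - phi0**2)."""
    numerator = (phi0 ** 2 + phi0) * (1.0 - alpha) + 2.0 * alpha
    denominator = 1.0 + alpha + phi0 * (1.0 - alpha)
    return numerator / denominator


def both_forms(phi0, alpha, sigma2=1.0):
    """Return the second mode computed from both expressions."""
    psi2 = alpha * sigma2 / (1.0 - phi0 ** 2)
    return phi_p_from_x(phi0, psi2, sigma2), phi_p_from_alpha(phi0, alpha)


# Covariance stationary initialization (alpha = 1): second mode at unity.
stationary = both_forms(0.5, 1.0)
# alpha = 0: the two modes coincide at phi0.
degenerate = both_forms(0.5, 0.0)
# Intermediate case: phi0 < phi_p < 1.
intermediate = both_forms(-0.5, 0.5)
```

For φ0 = −0.5 and α = 0.5 both expressions give φp = 0.7, which lies strictly between φ0 and one, in line with the special cases listed above.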

In Monte Carlo simulations it is common to impose some restrictions on the parameter space. In most cases φ is restricted to the stable region (−1; 1), e.g. Hsiao et al. (2002). However, as is clearly seen from Figure 2.2 (and the derivations above), a stable region restriction on φ does not solve the bimodality issue, since φp can lie in this interval. By construction the concentrated log-likelihood function is a sum of two quasi-concave functions with maxima at different points (the Within Group and Between Group parts), thus bimodality does not disappear for T > 2. By adding these two terms we end up with a function with possibly two modes, with the first one being of order OP(NT) and the second one of order OP(N). This difference in the order of magnitude explains why for larger values of T the WG mode determines the shape of the whole function. To illustrate the problem described we present several figures of plimN→∞ ℓc(φ) for stationary initial conditions.

[Figure 2.1, panels: (a) φ0 = −0.5, T = 2; (b) φ0 = 0.5, T = 2; (c) φ0 = 0.5, T = 6]

Figure 2.1: Concentrated asymptotic log-likelihood function. In all figures the first mode is at the corresponding true value φ0, while the second mode is located at φ = 1. The initial observation is from the covariance stationary distribution. The dashed line represents the WG part of the log-likelihood function, while the dotted line the BG part. The solid line, which stands for the log-likelihood function, is a sum of the dashed and dotted lines.

The behavior of the concentrated log-likelihood function in Figures 2.1a, 2.1b and 2.1c is in line with the theoretical results provided earlier. Note that as φ0 approaches unity the log-likelihood function becomes flatter and flatter between the two points. We can see from Figure 2.1c that once T is substantially bigger than 2, the “true value” mode starts to dominate the “pseudo value” mode. Based on all figures presented we can suspect that at least for covariance stationary initial conditions (or close to them) the TML estimator is biased positively, with the magnitude diminishing in T.

[Figure 2.2: histogram; vertical axis: Density; series: TMLE]

Figure 2.2: Histogram for the TMLE estimator with T = 3, φ0 = 0.5, N = 250 and 10,000 MC replications. The initial observation is from the covariance stationary distribution. Starting values for all iterations are set to φ(0) = 0.0, 0.1, . . . , 1.5. No non-negativity restrictions imposed.

The main intuition behind the result in Theorem 2.11 is quite simple. When the log-likelihood function for θ (or ψ) is considered, no restrictions on the relative magnitude of those terms compared to σ² are imposed. In particular, it is possible that θ² < σ², but that is a rather strange result given that

$$
\theta_0^2 = \sigma_0^2 + T(1-\phi_0)^2 E[u_{i,0}^2].
$$

But that is exactly what happens in the κp set, as

$$
\theta_p^2 = \sigma_0^2, \qquad \sigma_p^2 = \theta_0^2.
$$

Hence the implicit estimate of $(1-\phi_0)^2E[u_{i,0}^2]$ is negative as we do not fully exploit the implied structure of var ∆yi,1, which is the so-called “negative variance problem” documented in panel data, among others, by Maddala (1971).20 This problem was already encountered in some Monte Carlo studies performed in the literature (even for larger values of T), while some other authors only mention this possibility, e.g. Alvarez and Arellano (2003) and Arellano (2003a). For instance, Kruiniger (2008) mentions that for values of φ0 close to unity the non-negativity constraint on $(1-\phi_0)^2E[u_{i,0}^2]$, if imposed, is binding in 50% of the cases. The Θ or Ψ parameter, on the other hand, is by construction p.d. (or non-negative in the univariate case). That explains why in some studies (for instance Ahn and Thomas (2006)) no numerical issues with the TML estimator were encountered. In this paper we analyze the limiting case of T = 2 and quantify the exact location of the second mode. Observations made in this section provide intuition for some of the Monte Carlo results presented in Section 2.5.

20Note that Maddala (1971) considers the Random Effects type estimator for Dynamic Panel Data models, similar (although not identical) to the one in Alvarez and Arellano (2003).

2.4.5 Time-series heteroscedasticity

Unlike the case with cross-sectional homoscedasticity, time-series homoscedasticity is

necessary for fixed T consistency of Φ. However, in this section we show that for T

sufficiently large one can still consistently estimate Φ.21 At first we concentrate out

the Θ parameter and consider the normalized version of the log-likelihood function

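$$
\ell_c(\kappa_c) = -\frac{1}{2T}\log\left|\frac{T}{N}\sum_{i=1}^{N}(\ddot{y}_{i}-\Phi\ddot{y}_{i-})(\ddot{y}_{i}-\Phi\ddot{y}_{i-})'\right| - \frac{T-1}{2T}\log|\Sigma| - \operatorname{tr}\left(\Sigma^{-1}\frac{1}{2NT}\sum_{i=1}^{N}\sum_{t=1}^{T}(\tilde{y}_{i,t}-\Phi\tilde{y}_{i,t-1})(\tilde{y}_{i,t}-\Phi\tilde{y}_{i,t-1})'\right).
$$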

As the term inside the first log-determinant is of order OP(T), the first component of the log-likelihood function is of order oP(1). Thus as N, T → ∞ (jointly)

$$
\ell_c(\kappa_c) = o_P(1) - \frac{T-1}{2T}\log|\Sigma| - \operatorname{tr}\left(\Sigma^{-1}\frac{1}{2NT}\sum_{i=1}^{N}\sum_{t=1}^{T}(\tilde{y}_{i,t}-\Phi\tilde{y}_{i,t-1})(\tilde{y}_{i,t}-\Phi\tilde{y}_{i,t-1})'\right).
$$

Clearly the remaining component is just the FE log-likelihood function and consistency of Σ and Φ follows directly. For the case with time-series heteroscedasticity in Σt the log-likelihood function consistently estimates $\Sigma_\infty \equiv \lim_{T\to\infty}\frac{1}{T}\sum_{t=1}^{T}\Sigma_t$, assuming that this limit exists.

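The gradient of the log-likelihood function with respect to φ is given by

$$
\nabla_\phi(\kappa) = \operatorname{vec}\left(\Sigma^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T}(\tilde{y}_{i,t}-\Phi\tilde{y}_{i,t-1})\tilde{y}_{i,t-1}'\right) + \operatorname{vec}\left(\left(\frac{1}{T}\Theta\right)^{-1}\sum_{i=1}^{N}(\ddot{y}_{i}-\Phi\ddot{y}_{i-})\ddot{y}_{i-}'\right).
$$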

21In order to show similar results for general models with exogenous regressors one has to prove that as T → ∞ the incidental parameter matrix G does not result in an incidental parameter problem.


As it was argued in the previous sections, the second (“Between”) component of the

derivative w.r.t. Φ is of lower order than the first (“Within”) component. As a result,

under the assumption that N/T → ρ, evaluated at the true value of Φ0

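$$
\begin{aligned}
\frac{1}{\sqrt{NT}}\left(\frac{1}{T}\Theta\right)^{-1}\sum_{i=1}^{N}(\ddot{y}_{i}-\Phi_0\ddot{y}_{i-})\ddot{y}_{i-}'
&= \sqrt{\rho}\left(\frac{1}{T}\Theta\right)^{-1}\frac{1}{N}\sum_{i=1}^{N}(\ddot{y}_{i}-\Phi_0\ddot{y}_{i-})\ddot{y}_{i-}' + o_p(1) \\
&= \sqrt{\rho}\left((I_m-\Phi_0)\Psi_{u,0}(I_m-\Phi_0)'\right)^{-1}\left[(I_m-\Phi_0)\Psi_{u,0}\right] + o_p(1) \\
&= \sqrt{\rho}\,(I_m-\Phi_0')^{-1} + o_p(1),
\end{aligned}
$$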

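where the corresponding result is valid irrespective of whether time-series heteroscedasticity is present or not. Now consider the bias for the score of the fixed effects estimator evaluated at Φ0 and $\Sigma = \frac{1}{T}\sum_{t=1}^{T}\Sigma_t$ (as in e.g. Juodis (2013))

$$
\begin{aligned}
\frac{1}{\sqrt{NT}}E\left[\Sigma^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T}\varepsilon_{i,t}\tilde{y}_{i,t-1}'\right]
&= -\sqrt{\rho}\,T\,\Sigma^{-1}E[\bar{\varepsilon}_{i}\bar{y}_{i-}'] + o(1) \\
&= -\frac{\sqrt{\rho}}{T}\Sigma^{-1}\left(\sum_{t=0}^{T-2}\left(\sum_{l=0}^{t}\Phi_0^{l}\right)\Sigma_{T-1-t}\right)' + o(1) \\
&= -\frac{\sqrt{\rho}}{T}\Sigma^{-1}\left((I_m-\Phi_0)^{-1}\sum_{t=0}^{T-2}\Sigma_{T-1-t}\right)'
 + \frac{\sqrt{\rho}}{T}\Sigma^{-1}\left((I_m-\Phi_0)^{-1}\sum_{t=0}^{T-2}\Phi_0^{t+1}\Sigma_{T-1-t}\right)' + o(1) \\
&= -\sqrt{\rho}\,(I_m-\Phi_0')^{-1}
 + \frac{\sqrt{\rho}}{T}\Sigma^{-1}\left((I_m-\Phi_0)^{-1}\sum_{t=0}^{T-2}\Phi_0^{t+1}\Sigma_{T-1-t}\right)' + o(1) \\
&= -\sqrt{\rho}\,(I_m-\Phi_0')^{-1} + o(1).
\end{aligned}
$$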

Here the last line follows if one assumes that the Σs sequence is bounded, so that

the sum term is of order O(1). Hence assuming that N/T → ρ, the standardized

score (NT )−1/2∇φ(κ0) has an asymptotic distribution correctly centered at zero. As

a result the large N, T distribution of the TML estimator is identical to that of the

bias-corrected FE estimator of Hahn and Kuersteiner (2002).
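The cancellation behind this conclusion can be written out explicitly. The display below only restates the two limits discussed in this section, assuming that Ψu,0 and Im − Φ0 are invertible; it adds no new result.

```latex
\begin{align*}
\big((I_m-\Phi_0)\Psi_{u,0}(I_m-\Phi_0)'\big)^{-1}(I_m-\Phi_0)\Psi_{u,0}
  &= (I_m-\Phi_0')^{-1}\Psi_{u,0}^{-1}(I_m-\Phi_0)^{-1}(I_m-\Phi_0)\Psi_{u,0}
   = (I_m-\Phi_0')^{-1}, \\
\underbrace{-\sqrt{\rho}\,(I_m-\Phi_0')^{-1}}_{\text{``Within'' component}}
  + \underbrace{\sqrt{\rho}\,(I_m-\Phi_0')^{-1}}_{\text{``Between'' component}}
  &= 0_{m\times m}.
\end{align*}
```

The first line explains the last step in the limit of the “Between” component; the second line shows that the two leading terms of the standardized score offset each other, up to the o(1) and op(1) remainders.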

In the previous section we have shown that in the correctly specified model with time-

series homoscedasticity the score of the TML estimator fully removes the induced bias


of the FE estimator. This conclusion was established based on the assumption that

N →∞ for a fixed value of T . In this section we have extended this result by showing

that in the presence of possible time-series heteroscedasticity the estimating equations

of the TML estimator remove the leading bias of the FE estimator.

2.5 Simulation study

2.5.1 Setup

At first we present the general DGP that can be used to generate the initial conditions yi,0:

$$
y_{i,0} = a_i + E_i\mu_i + C_i\varepsilon_{i,0}, \qquad \varepsilon_{i,0} \sim IID\left(0_m, \sum_{j=0}^{\infty}\Phi_0^{j}\Sigma_0(\Phi_0^{j})'\right), \tag{2.18}
$$

for some parameter matrices ai [m × 1], Ei [m × m] and Ci [m × m]. A special case of this setup is the (covariance) stationary model, obtained if ai = 0m and Ci = Ei = Im. We distinguish between stability and stationarity conditions. We call the process $\{y_{i,t}\}_{t=0}^{T}$ dynamically stable if ρ(Φ) < 1 and (covariance) stationary if in addition the first two moments are constant over time (t = 0, . . . , T).

In what follows we set ai = 02 for all designs.22 We generate the individual heterogeneity µi (rather than ηi) using a procedure similar to BHP

$$
\mu_i = \pi\left(\frac{q_i - 1}{\sqrt{2}}\right)\eta_i, \qquad q_i \overset{iid}{\sim} \chi^2(1), \qquad \eta_i \overset{iid}{\sim} N(0_2, \Sigma_\eta). \tag{2.19}
$$

Unlike in the paper of BHP we do not fix Ση = Σ; instead we extend the approach of Kiviet (2007) by specifying23,24

$$
\operatorname{vec}\Sigma_\eta = \left(\frac{1}{T}\sum_{t=1}^{T}\left(\Phi_0^{t}(E-I_m)+I_m\right)\otimes\left(\Phi_0^{t}(E-I_m)+I_m\right)\right)^{-1}\left(I_{m^2}-\Phi_0\otimes\Phi_0\right)^{-1}\operatorname{vec}\Sigma_0.
$$

22In the online appendix some additional results for Design 2 are presented with ai = ı2.
23See the online appendix of this paper.
24If the variance of εi,t differs between individuals then we evaluate this expression at Σn rather than at Σ.


The way we generate µi ensures that the individual heterogeneity is not normally

distributed, but still IID across individuals. In the effect stationary case the particular

way the µi are generated does not influence the behavior of the TML log-likelihood

function. However, the non-normality of µi in the effect non-stationary case implies

non-normality of ui,0 and, hence, a quasi maximum likelihood interpretation of the

likelihood function. With respect to the error terms we restrict our attention to εi,t

being normally distributed ∀i, t.25
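The scalar factor in (2.19) can be simulated with the standard library alone. The sketch below is illustrative only: the function names, the seed and the number of draws are assumptions made here, not part of the original design. Since a χ²(1) variable has mean 1 and variance 2, the factor π(qi − 1)/√2 has mean zero and variance π², but it is strongly skewed, which is what makes µi non-normal.

```python
import math
import random


def draw_scale(rng, pi):
    """One draw of pi * (q - 1) / sqrt(2) with q ~ chi-square(1), as in (2.19)."""
    q = rng.gauss(0.0, 1.0) ** 2  # squared standard normal is chi-square(1)
    return pi * (q - 1.0) / math.sqrt(2.0)


def simulate_scales(n, pi=1.0, seed=0):
    """Draw n scale factors with a fixed (hypothetical) seed for reproducibility."""
    rng = random.Random(seed)
    return [draw_scale(rng, pi) for _ in range(n)]


def sample_moments(xs):
    """Sample mean, variance and skewness of a list of floats."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    skew = sum((x - mean) ** 3 for x in xs) / n / var ** 1.5
    return mean, var, skew


mean, var, skew = sample_moments(simulate_scales(200_000, pi=1.0))
```

Multiplying each draw by an independent ηi ~ N(02, Ση) then gives µi with mean zero and variance π²Ση under these assumptions.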

2.5.2 Designs

The parameter set that is common to all designs consists of a triplet {N; T; π} with possible values

$$
N \in \{100, 250\}, \qquad T \in \{3, 6\}, \qquad \pi \in \{1, 3\}.
$$

In the DPD literature it is well known that in the effect stationary case a higher value

π leads to worse finite sample properties of the GMM estimators, see e.g. Bun and

Windmeijer (2010) and Bun and Kiviet (2006). That might also have indirect influence

on the TML estimator even in the effect stationary case, as we use GMM estimators

as starting values for numerical optimization of the log-likelihood function.

In this paper six different Monte Carlo designs are considered. The first one is adapted from the original analysis of BHP, while the other five are constructed to reveal whether the TML estimator is robust with respect to different assumptions regarding the parameter matrix Φ0, the initial conditions yi,0, and cross-sectional heteroscedasticity. In the case where observations are covariance stationary or cointegrated, BHP calibrated the design matrices Φ and Σ such that the population $R^2_{\Delta l}$ 26 remained approximately constant (≈ 0.237) between designs.

25The analysis can be extended to the cases where the error terms are skewed and/or have fatter tails as compared to the Gaussian distribution. As a partial robustness check of their results BHP considered t- and chi-square distributed disturbances, but the results were close to the Gaussian setup. The estimation output for these setups was not presented in their paper.

26The population R² for stationary series is computed as $R^2_{\Delta l} = 1 - \Sigma_{l,l}/\Gamma_{l,l}$, $l = 1, \ldots, m$, where vec(Γ) in the covariance stationary case is given by $\operatorname{vec}(\Gamma) = \left(\left((I_m-\Phi_0)\otimes(I_m-\Phi_0)\right)\left(I_{m^2}-\Phi_0\otimes\Phi_0\right)^{-1} + I_{m^2}\right)D_m\sigma$.


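Design 1 (Covariance Stationary PVAR with ρ(Φ0) = 0.8 from BHP).

$$
\Phi_0 = \begin{pmatrix} 0.6 & 0.2 \\ 0.2 & 0.6 \end{pmatrix}, \qquad
\Sigma_0 = \begin{pmatrix} 0.07 & -0.02 \\ -0.02 & 0.07 \end{pmatrix}, \qquad
\Sigma_\eta = \begin{pmatrix} 0.123 & 0.015 \\ 0.015 & 0.123 \end{pmatrix}.
$$

The second eigenvalue is equal to 0.4 and the population $R^2_{\Delta}$ values are given by $R^2_{\Delta l} = 0.2396$, $l = 1, 2$.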
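The population R² reported for Design 1 can be reproduced with a short computation. The sketch below uses the matrix form of the formula in footnote 26, Γ = (Im − Φ0)V(Im − Φ0)′ + Σ0 with V the stationary variance ∑j Φ0^j Σ0 (Φ0^j)′, which is an equivalent restatement of the vec expression; the helper names and the truncation of the infinite sum at 500 terms are choices made here for illustration.

```python
def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def transpose(a):
    return [[a[j][i] for j in range(2)] for i in range(2)]


def add(a, b):
    return [[a[i][j] + b[i][j] for j in range(2)] for i in range(2)]


def stationary_variance(phi, sigma, n_terms=500):
    """Truncated sum over j of Phi^j Sigma (Phi^j)'."""
    total = [[0.0, 0.0], [0.0, 0.0]]
    term = [row[:] for row in sigma]
    for _ in range(n_terms):
        total = add(total, term)
        term = matmul(matmul(phi, term), transpose(phi))
    return total


def population_r2(phi, sigma):
    """R^2 of the first-differenced series, one value per equation."""
    i_minus_phi = [[(1.0 if i == j else 0.0) - phi[i][j] for j in range(2)]
                   for i in range(2)]
    v = stationary_variance(phi, sigma)
    gamma = add(matmul(matmul(i_minus_phi, v), transpose(i_minus_phi)), sigma)
    return [1.0 - sigma[l][l] / gamma[l][l] for l in range(2)]


PHI_DESIGN_1 = [[0.6, 0.2], [0.2, 0.6]]
SIGMA_DESIGN_1 = [[0.07, -0.02], [-0.02, 0.07]]
r2_design_1 = population_r2(PHI_DESIGN_1, SIGMA_DESIGN_1)
```

For Design 1 both entries come out at roughly 0.24, in line with the reported value of 0.2396.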

Although the Monte Carlo designs in BHP are well chosen, they are quite limited in

scope as the analysis was mainly focused on the influence of ρ(Φ0). Furthermore, all

design matrices in the stationary designs were assumed to be symmetric and Toeplitz,27

which substantially shrinks the parameter space for Φ0 and Σ.

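Design 2 (Covariance Stationary PVAR with ρ(Φ0) = 0.50498).

$$
\Phi_0 = \begin{pmatrix} 0.4 & 0.15 \\ -0.1 & 0.6 \end{pmatrix}, \qquad
\Sigma_0 = \begin{pmatrix} 0.07 & 0.05 \\ 0.05 & 0.07 \end{pmatrix}, \qquad
\Sigma_\eta = \begin{pmatrix} 0.079 & 0.052 \\ 0.052 & 0.100 \end{pmatrix}.
$$

The eigenvalues of Φ0 in this design are given by 0.5 ± 0.070711i and the population $R^2_{\Delta}$ values are given by $R^2_{\Delta 1} = 0.23434$ and $R^2_{\Delta 2} = 0.23182$.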

The parameter matrix Φ0 was chosen such that the population R2∆ are comparable

between Designs 1 and 2, but the extensibility condition is violated.
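The eigenvalues and spectral radii quoted for Designs 1 and 2 are easy to verify for 2 × 2 matrices through the characteristic polynomial. This is a minimal sketch with helper names chosen here; it is not part of the original simulation code.

```python
import cmath


def eigenvalues_2x2(a):
    """Roots of the characteristic polynomial of a 2x2 matrix."""
    trace = a[0][0] + a[1][1]
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    disc = cmath.sqrt(trace * trace - 4.0 * det)  # complex-safe square root
    return (trace + disc) / 2.0, (trace - disc) / 2.0


def spectral_radius(a):
    """Largest modulus among the eigenvalues."""
    return max(abs(z) for z in eigenvalues_2x2(a))


PHI_DESIGN_1 = [[0.6, 0.2], [0.2, 0.6]]
PHI_DESIGN_2 = [[0.4, 0.15], [-0.1, 0.6]]

eig_1 = eigenvalues_2x2(PHI_DESIGN_1)   # real eigenvalues 0.8 and 0.4
eig_2 = eigenvalues_2x2(PHI_DESIGN_2)   # complex pair 0.5 +/- 0.070711i
rho_1 = spectral_radius(PHI_DESIGN_1)
rho_2 = spectral_radius(PHI_DESIGN_2)
```

The complex pair in Design 2 has modulus √0.255 ≈ 0.50498, which is the value of ρ(Φ0) stated in the design heading.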

In Designs 3-4 we study the finite-sample properties of the estimators when the initial

condition is not effect-stationary.28

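Design 3 (Stable PVAR with ρ(Φ0) = 0.50498). We take Φ0 and Σ0 from Design 2, but with

$$
E_i = 0.5 \times I_2, \qquad C_i = I_2, \qquad i = 1, \ldots, N.
$$

$$
\Sigma_{\eta,T=3} = \begin{pmatrix} 0.090 & 0.059 \\ 0.059 & 0.144 \end{pmatrix}, \qquad
\Sigma_{\eta,T=6} = \begin{pmatrix} 0.083 & 0.055 \\ 0.055 & 0.122 \end{pmatrix}.
$$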

Design 4 (Stable PVAR with ρ(Φ0) = 0.50498). We take Φ0 and Σ0 from Design 2, but with

$$
E_i = 1.5 \times I_2, \qquad C_i = I_2, \qquad i = 1, \ldots, N.
$$

$$
\Sigma_{\eta,T=3} = \begin{pmatrix} 0.069 & 0.045 \\ 0.045 & 0.074 \end{pmatrix}, \qquad
\Sigma_{\eta,T=6} = \begin{pmatrix} 0.074 & 0.049 \\ 0.049 & 0.083 \end{pmatrix}.
$$

27Hence they satisfied the “Extensibility” condition.

28Note that effect non-stationarity in these designs has no impact on the first unconditional moment of the $\{y_{i,t}\}_{t=0}^{T}$ process. It can be explained by the fact that E[µi] = 02 is a sufficient condition for the $\{y_{i,t}\}_{t=0}^{T}$ process to have a zero mean. Thus there is no reason to allow for mean non-stationarity by including the γ parameter in the log-likelihood function, but it is crucial to allow for a covariance non-stationary initial condition.

In Section 2.4.2 we presented theoretical results for the TML estimator when unrestricted cross-sectional heteroscedasticity is present. The following design is used to investigate the impact of multiplicative cross-sectional heteroscedasticity on the estimators.

Design 5 (Stable PVAR with ρ(Φ0) = 0.50498 with non-i.i.d. εi,t). As a basis for this design we take Φ0 and Σ0 from Design 2, but with

$$
E_i = I_2, \qquad C_i = \varphi_i I_2, \qquad \Sigma_{0,i} = \varphi_i^2\Sigma_0, \qquad \varphi_i^2 \overset{iid}{\sim} \chi^2(1), \qquad i = 1, \ldots, N.
$$

The last design is intended to reveal the robustness properties of the TML estimator when time-series heteroscedasticity is present. From Section 2.4.5 we know that this estimator is not fixed T consistent in this case.

Design 6 (Stable PVAR with time-series heteroscedasticity). As a basis for this design we take Φ0 and Σ0 from Design 2 with Ei = Ci = I2, but with Σ0,t generated as

$$
\Sigma_{0,t} = (0.95 - 0.05T + 0.1t) \times \Sigma_0, \qquad t = 1, \ldots, T.
$$

This particular form of time-series heteroscedasticity was chosen such that $T^{-1}\sum_{t=1}^{T}\Sigma_{0,t} = \Sigma_0$.
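The averaging property of the Design 6 scaling factors can be checked directly. The function name below is introduced here for illustration; the formula for the weights is the one stated in the design.

```python
def design6_weights(T):
    """Scaling factors 0.95 - 0.05*T + 0.1*t for t = 1, ..., T."""
    return [0.95 - 0.05 * T + 0.1 * t for t in range(1, T + 1)]


weights_T3 = design6_weights(3)   # approximately [0.9, 1.0, 1.1]
weights_T6 = design6_weights(6)   # approximately [0.75, 0.85, ..., 1.25]
average_T3 = sum(weights_T3) / 3
average_T6 = sum(weights_T6) / 6
```

Since the average of t over t = 1, . . . , T is (T + 1)/2, the weights average to 0.95 − 0.05T + 0.05(T + 1) = 1 for any T, so the variance increases linearly over time around Σ0.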

For convenience we have multiplied both the mean and the median bias by 100. Similarly to BHP we only present results for φ11 and φ12, as results for the other two parameters are similar both quantitatively and qualitatively. The number of Monte Carlo simulations is set to B = 10,000.

2.5.3 Technical remarks

As starting values for the TMLE estimation algorithm we used estimators available in closed form. Namely, we used “AB-GMM”, “Sys-GMM” and FDLS, the additive bias-corrected FE estimator as in Kiviet (1995) and the bias-corrected estimator of Hahn and Kuersteiner (2002). Here “AB-GMM” stands for the Arellano and Bond (1991) estimator; “Sys-GMM” is the System estimator of Blundell and Bond (1998), which incorporates moment conditions based on the initial condition. All aforementioned GMM estimators are implemented in two steps, with the usual clustered weighting matrix used in the second step.29

We denote by “TMLE” the global maximizer of the TML objective in (2.12). By

“TMLEr”, we denote the estimator which is obtained similarly as “TMLE”, but instead

of selecting the global maximum, the local maximum that satisfies the restrictions

|Θ− Σ|≥ 0 is selected when possible30 and the global maximum otherwise. The TML

estimator with imposed covariance stationarity is denoted by “TMLEc”. Finally, we

denote by “TMLEs” the estimator that is obtained by choosing the local maximum of

the TMLE objective function with the lowest spectral norm.31 This choice is motivated

by the fact that for a univariate three-wave panel the second mode is always larger than

the true mode; in a PVAR one can think of the spectral norm as a measure of distance.
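The selection rules just described can be sketched for the bivariate case as follows. This is an illustration under stated assumptions rather than the routine used in the study: the `LocalMax` container and all function names are hypothetical, the spectral norm is assumed to be taken of the estimate of Φ, ties among several admissible local maxima are assumed to be broken by the log-likelihood value, and the two candidate modes carry made-up numbers.

```python
import math
from dataclasses import dataclass


@dataclass
class LocalMax:
    """One local maximum of the TML objective (hypothetical container, 2x2 case)."""
    loglik: float
    phi: list     # estimate of Phi
    theta: list   # estimate of Theta
    sigma: list   # estimate of Sigma


def det2(a):
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]


def spectral_norm2(a):
    """Largest singular value of a 2x2 matrix via the eigenvalues of a'a."""
    g11 = a[0][0] ** 2 + a[1][0] ** 2
    g22 = a[0][1] ** 2 + a[1][1] ** 2
    g12 = a[0][0] * a[0][1] + a[1][0] * a[1][1]
    trace, det = g11 + g22, g11 * g22 - g12 ** 2
    return math.sqrt((trace + math.sqrt(max(trace * trace - 4.0 * det, 0.0))) / 2.0)


def satisfies_restriction(c):
    """Determinant condition |Theta - Sigma| >= 0 (necessary, not sufficient for p.s.d.)."""
    diff = [[c.theta[i][j] - c.sigma[i][j] for j in range(2)] for i in range(2)]
    return det2(diff) >= 0.0


def tmle(candidates):
    """Global maximizer among the local maxima found."""
    return max(candidates, key=lambda c: c.loglik)


def tmle_r(candidates):
    """Best local maximum satisfying the restriction, else the global maximum."""
    admissible = [c for c in candidates if satisfies_restriction(c)]
    return tmle(admissible) if admissible else tmle(candidates)


def tmle_s(candidates):
    """Local maximum whose Phi estimate has the lowest spectral norm."""
    return min(candidates, key=lambda c: spectral_norm2(c.phi))


# Two made-up local maxima: one near a true value, one near the identity matrix.
true_mode = LocalMax(-1.00, [[0.6, 0.2], [0.2, 0.6]],
                     [[0.10, 0.0], [0.0, 0.10]], [[0.07, 0.0], [0.0, 0.07]])
pseudo_mode = LocalMax(-0.99, [[1.0, 0.0], [0.0, 1.0]],
                       [[0.06, 0.0], [0.0, 0.09]], [[0.07, 0.0], [0.0, 0.07]])
candidates = [true_mode, pseudo_mode]
```

With these illustrative numbers the unrestricted rule picks the mode at the identity matrix, because it has the slightly higher log-likelihood, whereas both restricted rules pick the other mode.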

Regarding inference, for all the TML estimators we present results based on robust

“sandwich” type standard errors labeled (r). In the case of GMM estimators, we

provide rejection frequencies based on the commonly used Windmeijer (2005) corrected

S.E.

2.5.4 Results: Estimation

In this section we briefly summarize the main findings of the MC study as presented

in Tables 2.1 to 2.6. Inference related issues are discussed in the next section.

Design 1. For GMM estimators results are similar to those in BHP. Irrespective of

N , the properties of all GMM estimators deteriorate as T and/or π increase and these

effects are substantial both for diagonal and off-diagonal elements of Φ. Similarly, we

can see that for small values of T , the performance of the TML estimator is directly

related to the corresponding bias and the RMSE properties of the GMM estimators.32

Hence using the estimators that are biased towards the pseudo-true value, helps to find

the second mode that happens to be the global maximum in that replication. On the

other hand, if the resulting estimators are restricted in some way (TMLEs, TMLEr, TMLEc), the strong dependence on starting values is no longer present (especially for TMLEs). In terms of both the bias and the RMSE we can see that the TMLEc estimator performs remarkably well irrespective of design parameter values for both diagonal and off-diagonal elements. The FDLS estimator performs marginally worse than the TMLEc estimator but still outperforms all the GMM estimators. All the TML estimators (except for TMLEc) tend to have an asymmetric finite sample distribution that results in corresponding discrepancies between estimates of mean and median.

29That takes the form “Z′uu′Z”.

30In principle this restriction is necessary but not sufficient for Θ − Σ to be p.s.d. However, for the purpose of exposition in this paper we stick to this condition rather than checking non-negativity of the corresponding eigenvalues.

31However, unlike the univariate studies of Hsiao et al. (2002) and Hayakawa and Pesaran (2012), where the φ parameter was restricted to lie in the stationary region, in the numerical routine for the TMLE no restrictions on the parameter space of φ are imposed.

32This contrasts sharply with the finite-sample results presented in BHP.

In Section 2.4.4 we have mentioned that the second mode of the unrestricted TML

estimator is located at Φ = Im. Based on the results in Table 2.1 we can see that the

diagonal elements for the TML estimator are positively biased towards 1, while the off-

diagonal elements are negatively biased in direction of 0 (at least for small N and T ).

Thus the bimodality problem remains a substantial issue even for T > 2 and choosing

the global optimum is not always the best strategy as TMLEs clearly dominates TMLE

for small values of T . For T = 6 the TMLEr and TMLEs provide equivalent results

and some improvements over the “global” standard TMLE.

Design 2. One of the implications of this setup is that the FDLS estimator is not

consistent. More importantly, for this setup we do not know whether the bimodality

issue even for T = 2 is still present, thus the need for the TMLEr and TMLEs esti-

mators is less obvious. However, the motivation becomes clear once we look at the

corresponding results in Table 2.2. TMLEs and TMLEr dominate TMLE in all cases,

with TMLEs being the preferred choice. We can observe that the bias of the TML

estimator in terms of both the magnitude and the sign does not change dramatically as

compared to Design 1. Observe that the bias of the TMLEc in the diagonal elements does not decrease with T fast enough to match the performance of the TMLEr/TMLEs estimators, while for the off-diagonal elements quite a substantial bias remains even for N = 250, T = 6.33

Designs 3 and 4. As expected, the properties of Sys-GMM (which relies on the moment conditions implied by effect stationarity) deteriorate significantly compared to Design 2. We observe that for π = 1 the AB-GMM estimator is more biased in comparison to Design 2 (for Design 3), but is less biased if π = 3. The intuition for these patterns is similar to the one presented by Hayakawa (2009a) within the univariate setting. Unlike in the previous designs, the TML estimator exhibits lower bias for π = 3 despite the fact that the quality of the starting values diminished in the same way as in the effect-stationary case. The magnitudes of the effect non-stationary initial conditions considered in these designs are sufficient to ensure that the restrictions imposed on the TMLEr estimator are satisfied even for small values of N and T.

33As it will turn out later, these properties play a major role in explaining the finite sample properties of the LR test of covariance stationarity, which is presented in the online appendix.

Design 5. Unlike in Designs 3-4, the setup of Design 5 has no impact on the consistency

of estimators (except FDLS). As can be clearly seen from Table 2.6, the same can not be

said about the variance of the estimators. The introduction of cross-sectional variation

in Σ0,i affects all estimation techniques by means of higher RMSE/MAE values. On

the other hand, effects are less clear for bias with improvements for some estimators

and higher bias for others.

Design 6. In this setup all TML estimators are inconsistent due to the time-series

heteroscedasticity, but the TMLEc estimator seems to be affected the most in terms

of both the bias and precision. By comparing the results in Tables 2.2 and 2.6, we see

that diagonal elements (φ11 in this case) are mostly affected as the estimation quality of

the off-diagonal elements remains unaffected. Furthermore, the Sys-GMM estimator,

albeit still consistent, also shows some signs of deteriorating finite sample properties.

For T = 6 the bias of the TMLE/TMLEs/TMLEr estimators diminishes, as can be

expected given that the bias is of order O(T−2).

2.5.5 Results: Inference

Below we briefly summarize the main findings regarding the size and power of the two-

sided t-test for φ11. Results for the other entries are available from the author upon

request.

• Except for TMLEc, for N = 100 all estimators result in substantially oversized

test statistics with relatively low power. In many cases rejection frequencies for

alternatives close to the unit circle are of similar magnitude to size.

• When the estimator is consistent, the inference based on TMLEc serves as a

benchmark both for size and power.

• In designs with the effect stationary initial condition (except N = 250, T = 6, to be discussed next), the empirical rejection frequencies based on all the TML (except for TMLEc) as well as the AB-GMM estimators do not result in symmetric power curves, due to the substantial finite sample bias of the estimators.


• Results for T = 6 and N = 250 suggest that the TML estimators without imposed

stationarity restrictions are well sized and have good power properties in all

designs with almost perfectly symmetric power curves.

• Although all the TML estimators (without imposed stationarity restriction) are

inconsistent with time-series heteroscedastic error terms, the actual rejection fre-

quencies for N = 250 are only marginally worse in comparison to the benchmark

case. The same, however, can not be said about the TMLEc estimator.

• In the design with cross-sectional heteroscedasticity, the TML based test statistics

become more oversized, compared to the benchmark case. The only exception is

the case with N = 250 and T = 6 where the actual size increases by at most 1%.

The results on bias and size presented here suggest that, under the assumption of time homoscedasticity, likelihood based techniques might serve as a viable alternative to the GMM based methods in the simple PVAR(1) model. In particular, the TML estimator of BHP tends to be robust with respect to non-stationarity of the initial condition and cross-sectional heterogeneity of parameters. Furthermore, in finite samples likelihood based methods are robust even if smooth time-series heteroscedasticity is present. However, the TML estimator might suffer from serious bimodality problems when the number of cross-sectional units is small and the time series is short. In these cases the resulting estimate depends heavily on how it is selected among the local maxima. For some designs, no local maximum satisfying |Θ − Σ| > 0 was available in 30%–40% of all MC replications, even for N = 250. However, this problem becomes marginal once T = 6, where such fractions drop to 1%–10%. Based on these results we suggest choosing the TML estimate such that, when possible, the local maximum satisfies the p.s.d. restriction |Θ − Σ| > 0 (TMLEr), and otherwise choosing the solution with the smaller spectral norm (TMLEs).
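The selection rule above can be sketched in code. The following is a minimal illustration for the bivariate case, not the procedure used in the MC study: the dictionary layout of a candidate, the function names, the use of the log-likelihood value to break ties among admissible maxima, and the reading of "spectral norm" as the spectral norm of the estimated Φ are all assumptions made here for the sake of the example.

```python
import math


def det2(a):
    """Determinant of a 2x2 matrix given as nested lists."""
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]


def sub2(a, b):
    """Element-wise difference a - b of two 2x2 matrices."""
    return [[a[i][j] - b[i][j] for j in range(2)] for i in range(2)]


def spectral_norm2(phi):
    """Spectral norm of a 2x2 matrix: sqrt of the largest eigenvalue of phi'phi."""
    p = phi[0][0] ** 2 + phi[1][0] ** 2              # (phi'phi)[0][0]
    r = phi[0][1] ** 2 + phi[1][1] ** 2              # (phi'phi)[1][1]
    q = phi[0][0] * phi[0][1] + phi[1][0] * phi[1][1]  # (phi'phi)[0][1]
    lam_max = 0.5 * (p + r) + math.sqrt((0.5 * (p - r)) ** 2 + q * q)
    return math.sqrt(lam_max)


def select_tmle(candidates):
    """Pick one local maximum from a list of candidate solutions.

    Each candidate is a dict with keys "Phi", "Sigma", "Theta" (2x2 lists)
    and "loglik" (float); this layout is hypothetical.
    """
    # TMLEr-type step: keep local maxima with |Theta - Sigma| > 0.
    admissible = [c for c in candidates
                  if det2(sub2(c["Theta"], c["Sigma"])) > 0]
    if admissible:
        # Assumption: ties among admissible maxima are broken by log-likelihood.
        return max(admissible, key=lambda c: c["loglik"])
    # TMLEs-type fallback: smallest spectral norm of Phi (assumed target).
    return min(candidates, key=lambda c: spectral_norm2(c["Phi"]))
```

With two candidates of which only one satisfies the determinant restriction, the rule returns that candidate even if the other has the higher log-likelihood; if neither does, the candidate with the smaller spectral norm is returned.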

2.6 Conclusions

In this paper we provide a thorough analysis of the performance of fixed T consistent estimation techniques for the PVARX(1) model based on observations in first differences. We have mostly emphasized the results and properties of the likelihood based method. We have extended the approach of BHP with the inclusion of strictly exogenous regressors and shown how to construct a concentrated likelihood function for the autoregressive parameter only.


The key finding of this paper is that in the three-wave panel the expected log-likelihood

function of BHP in the univariate setting does not have a unique maximum at the

true value. This result has been shown to be robust irrespective of the initialization.

Furthermore, we have provided a sufficient condition for this result to hold for the

PVAR(1) in the three-wave panel.

Finally, we have conducted an extensive MC study with the emphasis on designs where the set of standard assumptions about stationarity and cross-sectional homoscedasticity is violated. Results suggest that likelihood based inference techniques might serve as a feasible alternative to GMM based methods in a simple PVARX(1) model. However, for small values of N and/or T the TML estimator is vulnerable to the choice of the starting values for the numerical optimization algorithm. These finite sample findings have been related to the bimodality results derived in this paper. We have proposed several ways of choosing the estimator among local maxima. In particular, we suggest that the TML estimate be chosen such that the local maximum satisfies the p.s.d. restriction (TMLEr) when possible, and that otherwise the solution with the smaller spectral norm be chosen (TMLEs).

2.A Proofs

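First, we define a set of auxiliary variables that are used in the derivations:

\[
\begin{aligned}
\varepsilon_{i,t}(\phi) &\equiv y_{i,t} - \Phi y_{i,t-1}, &
\varepsilon_{i}(\phi) &\equiv y_{i} - \Phi y_{i-}, \\
Z_N(\kappa) &\equiv \frac{1}{N}\sum_{i=1}^{N}\sum_{t=1}^{T} \varepsilon_{i,t}(\phi)\,\varepsilon_{i,t}(\phi)', &
Q_N(\kappa) &\equiv \frac{1}{N}\sum_{i=1}^{N}\sum_{t=1}^{T} y_{i,t-1}\,\varepsilon_{i,t}(\phi)', \\
M_N(\kappa) &\equiv \frac{T}{N}\sum_{i=1}^{N} \varepsilon_{i}(\phi)\,\varepsilon_{i}(\phi)', &
N_N(\kappa) &\equiv \frac{T}{N}\sum_{i=1}^{N} y_{i-}\,\varepsilon_{i}(\phi)', \\
R_N &\equiv \frac{1}{N}\sum_{i=1}^{N}\sum_{t=1}^{T} y_{i,t-1}\,y_{i,t-1}', &
P_N &\equiv \frac{T}{N}\sum_{i=1}^{N} y_{i-}\,y_{i-}', \\
\Xi &\equiv \sum_{l=0}^{T-2} (T-1-l)\,\Phi_0^{l}.
\end{aligned}
\]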


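In the derivations we use several results concerning differentials (for more details refer to Magnus and Neudecker (2007)):

\[
\begin{aligned}
d\log|X| &= \operatorname{tr}\!\left(X^{-1}(dX)\right), &
d(\operatorname{tr} X) &= \operatorname{tr}(dX), \\
d(\operatorname{vec} X) &= \operatorname{vec}(dX), &
dX^{-1} &= -X^{-1}(dX)X^{-1}, \\
d(XY) &= (dX)Y + X(dY), &
d(X \otimes X) &= d(X) \otimes X + X \otimes d(X), \\
\operatorname{vec}(dX \otimes X) &= (I_m \otimes K_m \otimes I_m)(I_{m^2} \otimes \operatorname{vec} X)\operatorname{vec} d(X).
\end{aligned}
\]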

2.A.1 Auxiliary results

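Lemma 2.12.

\[
\Upsilon \equiv \sum_{l=0}^{T-1}\Phi_0^{l} - T I_m + \left(\sum_{l=0}^{T-2}(T-l)\Phi_0^{l} - \sum_{l=0}^{T-2}\Phi_0^{l}\right)(I_m - \Phi_0) = O_m.
\]

Proof.

\[
\begin{aligned}
\Upsilon &\equiv \sum_{l=0}^{T-1}\Phi_0^{l} - T I_m + \left(\sum_{l=0}^{T-2}(T-l)\Phi_0^{l} - \sum_{l=0}^{T-2}\Phi_0^{l}\right)(I_m - \Phi_0) \\
&= \Phi_0^{T-1} + \sum_{l=0}^{T-2}\Phi_0^{l+1} - T I_m + T\left(\sum_{l=0}^{T-2}\Phi_0^{l} - \sum_{l=1}^{T-1}\Phi_0^{l}\right) - \left(\sum_{l=1}^{T-2} l\,\Phi_0^{l} - \sum_{l=1}^{T-1}(l-1)\Phi_0^{l}\right) \\
&= \Phi_0^{T-1} + \sum_{l=1}^{T-1}\Phi_0^{l} - T I_m + T\left(I_m - \Phi_0^{T-1}\right) - \left(\sum_{l=1}^{T-2}\Phi_0^{l} - (T-2)\Phi_0^{T-1}\right) \\
&= \Phi_0^{T-1} + \sum_{l=0}^{T-2}\Phi_0^{l+1} - T\Phi_0^{T-1} - \left(\sum_{l=1}^{T-2}\Phi_0^{l} - (T-2)\Phi_0^{T-1}\right) \\
&= (1-T)\Phi_0^{T-1} + \Phi_0^{T-1} + (T-2)\Phi_0^{T-1} = O_m.
\end{aligned}
\]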
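The identity in Lemma 2.12 can also be checked numerically. The sketch below is only a sanity check under assumptions chosen here (an arbitrary 2×2 matrix Φ0 with rational entries and a few values of T); the helper names are hypothetical. It evaluates Υ exactly with rational arithmetic, writing the bracketed sum as Ξ = Σ_{l=0}^{T−2} (T − 1 − l) Φ0^l.

```python
from fractions import Fraction


def identity(m):
    """m x m identity matrix with exact rational entries."""
    return [[Fraction(int(i == j)) for j in range(m)] for i in range(m)]


def mat_mul(a, b):
    """Matrix product of two nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def mat_add(a, b, scale=1):
    """Return a + scale * b."""
    return [[a[i][j] + scale * b[i][j] for j in range(len(a[0]))]
            for i in range(len(a))]


def upsilon(phi, T):
    """Evaluate Upsilon from Lemma 2.12 for a given Phi_0 and T."""
    m = len(phi)
    eye = identity(m)
    powers = [eye]                      # powers[l] = Phi_0^l for l = 0..T-1
    for _ in range(T - 1):
        powers.append(mat_mul(powers[-1], phi))
    zero = [[Fraction(0)] * m for _ in range(m)]
    geometric = zero                    # sum_{l=0}^{T-1} Phi_0^l
    for l in range(T):
        geometric = mat_add(geometric, powers[l])
    xi = zero                           # sum_{l=0}^{T-2} (T-1-l) Phi_0^l
    for l in range(T - 1):
        xi = mat_add(xi, powers[l], scale=T - 1 - l)
    first = mat_add(geometric, eye, scale=-T)
    second = mat_mul(xi, mat_add(eye, phi, scale=-1))
    return mat_add(first, second)
```

For any square Φ0 and any T ≥ 2 the function should return the zero matrix, since the identity is purely algebraic and does not require stationarity of Φ0.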

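Lemma 2.13. Under Assumptions SA* the following equality holds:

\[
E\left[N_N(\kappa_0)\right] = \frac{1}{T}\,\Xi\,\Theta_0,
\]

for $\Theta_0 = \Sigma_0 + T (I_m - \Phi_0)\,E[u_{i,0}u_{i,0}']\,(I_m - \Phi_0)'$.

Proof. Define $\Pi_0 = \Phi_0 - I_m$; then

\[
\begin{aligned}
E\left[N_N(\kappa_0)'\right]
&= E\left(\frac{T}{N}\sum_{i=1}^{N}(y_i - \Phi_0 y_{i-})\,y_{i-}'\right) \\
&= E\left[(\Pi_0 u_{i,0} + \varepsilon_i)\left(\left(\sum_{s=0}^{T-1}\Phi_0^{s} - T I_m\right)y_{i,0} + \left(\sum_{l=0}^{T-2}(T-1-l)\Phi_0^{l}\right)(-\Pi_0\mu_i)\right)'\right] \\
&\quad + E\left[(\Pi_0 u_{i,0} + \varepsilon_i)\left(\sum_{t=1}^{T-1}\sum_{s=0}^{t-1}\Phi_0^{s}\varepsilon_{i,t-s}\right)'\right] \\
&= E\left[(\Pi_0 u_{i,0} + \varepsilon_i)\left(\Upsilon y_{i,0} + \left(\sum_{l=0}^{T-2}(T-1-l)\Phi_0^{l}\right)\Pi_0 u_{i,0} + \left(\sum_{t=1}^{T-1}\sum_{s=0}^{t-1}\Phi_0^{s}\varepsilon_{i,t-s}\right)\right)'\right].
\end{aligned}
\]

In Lemma 2.12 we showed that $\Upsilon = O_m$, thus

\[
\begin{aligned}
E\left[\frac{T}{N}\sum_{i=1}^{N}(y_i - \Phi_0 y_{i-})\,y_{i-}'\right]
&= E\left[(\Pi_0 u_{i,0} + \varepsilon_i)\left(\Xi\,\Pi_0 u_{i,0} + \left(\sum_{t=1}^{T-1}\sum_{s=0}^{t-1}\Phi_0^{s}\varepsilon_{i,t-s}\right)\right)'\right] \\
&= (I_m - \Phi_0)\,E[u_{i,0}u_{i,0}']\,(I_m - \Phi_0)'\,\Xi' + \frac{1}{T}\Sigma_0\Xi' = \frac{1}{T}\Theta_0\Xi'.
\end{aligned}
\]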

2.A.2 Log-likelihood function

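Theorem 2.6.

Let

\[
\Delta\tau_i = \begin{pmatrix} \Delta y_{i,1} \\ \Delta\varepsilon_{i,2} \\ \vdots \\ \Delta\varepsilon_{i,T} \end{pmatrix}, \qquad
C_T = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 1 & 1 & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 1 & \cdots & 1 & 1 \end{pmatrix}, \qquad
L_T = \begin{pmatrix} 0 & \cdots & \cdots & \cdots & 0 \\ 1 & \ddots & & & \vdots \\ 0 & \ddots & \ddots & & \vdots \\ \vdots & \ddots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & 1 & 0 \end{pmatrix},
\]

and let $D$ be a $[T \times T+1]$ matrix which transforms a $[T+1 \times 1]$ vector $x$ into a $[T \times 1]$ vector of corresponding first differences. Also define $\Theta \equiv T(\Psi - \Sigma) + \Sigma$ and $\Omega \equiv \Sigma^{-1}\Theta$. If we denote $\Gamma \equiv \Sigma^{-1}\Psi$ it then follows that

\[
\begin{aligned}
\Sigma_{\Delta\tau} &= (I_T \otimes \Sigma)
\begin{pmatrix}
\Gamma & -I_m & O_m & \cdots & O_m \\
-I_m & 2I_m & \ddots & \ddots & \vdots \\
O_m & \ddots & \ddots & \ddots & O_m \\
\vdots & \ddots & \ddots & \ddots & -I_m \\
O_m & \cdots & O_m & -I_m & 2I_m
\end{pmatrix} \\
&= (I_T \otimes \Sigma)\left[(DD' \otimes I_m) + (e_1e_1' \otimes (\Gamma - 2I_m))\right] \\
&= (I_T \otimes \Sigma)\left[((C_T'C_T)^{-1} \otimes I_m) + (e_1e_1' \otimes (\Gamma - I_m))\right].
\end{aligned}
\]

Subsequently the determinant is given by (using the fact that $|C_T| = 1$)

\[
\begin{aligned}
|\Sigma_{\Delta\tau}| &= |\Sigma|^{T}\,\left|((C_T'C_T)^{-1} \otimes I_m) + (e_1e_1' \otimes (\Gamma - I_m))\right| \\
&= |\Sigma|^{T}\,\left|I_m + (e_1'C_T'C_Te_1(\Gamma - I_m))\right|\,\left|(C_T'C_T)^{-1}\right| \\
&= |\Sigma|^{T}\,\left|I_m + (e_1'C_T'C_Te_1(\Gamma - I_m))\right| \\
&= |\Sigma|^{T}\,\left|I_m + T(\Gamma - I_m)\right| = |\Sigma|^{T}|\Omega| = |\Sigma|^{T-1}|\Theta|,
\end{aligned}
\]

where the second line follows by means of the Matrix Determinant Lemma.³⁴ Using the Woodbury formula we can evaluate $\Sigma_{\Delta\tau}^{-1}$:

\[
\begin{aligned}
\Sigma_{\Delta\tau}^{-1} &= \left[((C_T'C_T)^{-1} \otimes I_m) + (e_1e_1' \otimes (\Gamma - I_m))\right]^{-1}(I_T \otimes \Sigma^{-1}) \\
&= \left[((C_T'C_T) \otimes I_m) - ((C_T'C_Te_1) \otimes I_m)\left((\Gamma - I_m)^{-1} + TI_m\right)^{-1}((e_1'C_T'C_T) \otimes I_m)\right](I_T \otimes \Sigma^{-1}) \\
&= (C_T' \otimes I_m)\,U\,(C_T \otimes I_m)(I_T \otimes \Sigma^{-1}) \\
&= (C_T' \otimes I_m)\,U\,(I_T \otimes \Sigma^{-1})(C_T \otimes I_m),
\end{aligned}
\]

³⁴ Alternatively, $|\Sigma_{\Delta\tau}|$ can be evaluated using the general formula for tridiagonal matrices in Molinari (2008).

where $U$ is

\[
\begin{aligned}
U &= I_{Tm} - ((C_Te_1) \otimes I_m)\left((\Gamma - I_m)\Omega^{-1}\right)((e_1'C_T') \otimes I_m) \\
&= I_{Tm} - (\imath_T \otimes I_m)\left((\Gamma - I_m)\Omega^{-1}\right)(\imath_T' \otimes I_m)
 = I_{Tm} - \imath_T\imath_T' \otimes \left((\Gamma - I_m)\Omega^{-1}\right) \\
&= I_{Tm} - \frac{1}{T}\imath_T\imath_T' \otimes \left(I_m - \Omega^{-1}\right)
 = I_{Tm} - \frac{1}{T}\imath_T\imath_T' \otimes I_m + \frac{1}{T}\imath_T\imath_T' \otimes \Omega^{-1} \\
&= W_T \otimes I_m + \frac{1}{T}\imath_T\imath_T' \otimes \Omega^{-1},
\end{aligned}
\]

so that

\[
\Sigma_{\Delta\tau}^{-1} = (C_T' \otimes I_m)\left[W_T \otimes \Sigma^{-1} + \frac{1}{T}\imath_T\imath_T' \otimes \Theta^{-1}\right](C_T \otimes I_m).
\]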
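The determinant and inverse formulas above can be verified numerically. The sketch below is restricted to the univariate case m = 1 (an assumption made here to keep the example dependency-free), so that Σ = σ², Ψ = ψ and θ² = T(ψ − σ²) + σ² are scalars; all function names are hypothetical. It builds Σ_Δτ directly as a tridiagonal matrix, builds the claimed inverse C_T′[W_T/σ² + (1/T)ıı′/θ²]C_T, and uses exact rational arithmetic.

```python
from fractions import Fraction


def matmul(a, b):
    """Matrix product of two nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    """Transpose of a nested-list matrix."""
    return [list(col) for col in zip(*a)]


def det(a):
    """Determinant by Gaussian elimination with exact fractions."""
    a = [row[:] for row in a]
    n, d = len(a), Fraction(1)
    for c in range(n):
        p = next((r for r in range(c, n) if a[r][c] != 0), None)
        if p is None:
            return Fraction(0)
        if p != c:                       # row swap flips the sign
            a[c], a[p] = a[p], a[c]
            d = -d
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return d


def sigma_dtau(T, sigma2, psi):
    """Scalar-case covariance of (dy_1, d eps_2, ..., d eps_T)."""
    s = [[Fraction(0)] * T for _ in range(T)]
    for i in range(T):
        s[i][i] = 2 * sigma2
        if i + 1 < T:
            s[i][i + 1] = s[i + 1][i] = -sigma2
    s[0][0] = psi                        # sigma2 * gamma with gamma = psi / sigma2
    return s


def sigma_dtau_inverse(T, sigma2, psi):
    """Claimed inverse C_T' [W_T / sigma2 + (1/T) ii' / theta2] C_T."""
    theta2 = T * (psi - sigma2) + sigma2
    c = [[Fraction(int(j <= i)) for j in range(T)] for i in range(T)]
    inner = [[(Fraction(int(i == j)) - Fraction(1, T)) / sigma2
              + Fraction(1, T) / theta2
              for j in range(T)] for i in range(T)]
    return matmul(matmul(transpose(c), inner), c)
```

Multiplying `sigma_dtau(T, sigma2, psi)` by `sigma_dtau_inverse(T, sigma2, psi)` should give the identity matrix, and `det(sigma_dtau(...))` should equal σ^{2(T−1)}θ², which is the scalar version of |Σ|^{T−1}|Θ|.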

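Now, using the fact that $R = (I_{Tm} - L_T \otimes \Phi)$ and defining $z_i = (y_{i,0}, \ldots, y_{i,T})$,

\[
\begin{aligned}
Z &\equiv (C_T \otimes I_m)(I_{Tm} - L_T \otimes \Phi)\operatorname{vec}(z_iD') \\
&= \operatorname{vec}(z_iD'C_T' - \Phi z_iD'L_T'C_T') = \operatorname{vec}\left((C_TDz_i')' - \Phi(C_TL_TDz_i')'\right) \\
&= \operatorname{vec}\left((Y_i - \imath_Ty_{i,0})' - \Phi(Y_{i-} - \imath_Ty_{i,0})'\right).
\end{aligned}
\]

Hence the log-likelihood function of BHP can be rewritten in the following way (where $\kappa = (\phi', \sigma', \theta')'$):

\[
\ell(\kappa) = c - \frac{N}{2}\left((T-1)\log|\Sigma| + \log|\Theta| + \operatorname{tr}\left(\Sigma^{-1}Z_N(\kappa)\right) + \operatorname{tr}\left(\Theta^{-1}M_N(\kappa)\right)\right). \tag{2.20}
\]

In order to include exogenous regressors in the model we define the following quantities:

\[
\gamma = G\Delta X_i^{\dagger}, \qquad X_i = (x_{i,1}, \ldots, x_{i,T}).
\]

The $Z$ term in this case is given by

\[
\begin{aligned}
Z &\equiv (C_T \otimes I_m)\left((I_{Tm} - L_T \otimes \Phi)\operatorname{vec}(z_iD') - (I_T \otimes B)\operatorname{vec}(\Delta X_i) - \operatorname{vec}(\gamma e_1')\right) \\
&= \operatorname{vec}\left((Y_i - \imath_T(y_{i,0} + \gamma))' - \Phi(Y_{i-} - \imath_Ty_{i,0})' - B(X_i - \imath_Tx_{i,0})'\right).
\end{aligned}
\]

The result follows directly from the derivations for the PVAR(1) model by redefining $Z_N$ and $M_N$.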


2.A.3 Score vector

Proposition 2.7.

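Here, for simplicity, we derive the first differential of $\ell(\kappa)$ without exogenous regressors:

\[
\begin{aligned}
-\frac{2}{N}\,d\ell(\kappa) &= (T-1)\operatorname{tr}\left(\Sigma^{-1}(d\Sigma)\right) + \operatorname{tr}\left(\Theta^{-1}(d\Theta)\right) \\
&\quad - \operatorname{tr}\left(\Sigma^{-1}(d\Sigma)\Sigma^{-1}Z_N(\kappa)\right) - \operatorname{tr}\left(\Theta^{-1}(d\Theta)\Theta^{-1}M_N(\kappa)\right) \\
&\quad + \operatorname{tr}\left(\Sigma^{-1}(dZ_N(\kappa))\right) + \operatorname{tr}\left(\Theta^{-1}(dM_N(\kappa))\right) \\
&= \operatorname{tr}\left(\Sigma^{-1}((T-1)\Sigma - Z_N(\kappa))\Sigma^{-1}(d\Sigma)\right) \\
&\quad + \operatorname{tr}\left(\Theta^{-1}(\Theta - M_N(\kappa))\Theta^{-1}(d\Theta)\right) \\
&\quad - 2\operatorname{tr}\left(\Sigma^{-1}((d\Phi)Q_N(\kappa))\right) - 2\operatorname{tr}\left(\Theta^{-1}((d\Phi)N_N(\kappa))\right).
\end{aligned}
\]

Based on these derivations we conclude that the corresponding $[2m^2 + m \times 1]$ score vector is given by

\[
\nabla(\kappa) = N
\begin{pmatrix}
\operatorname{vec}\left(\Sigma^{-1}Q_N(\kappa)' + \Theta^{-1}N_N(\kappa)'\right) \\
D_m'\operatorname{vec}\left(-\tfrac{1}{2}\left(\Sigma^{-1}((T-1)\Sigma - Z_N(\kappa))\Sigma^{-1}\right)\right) \\
D_m'\operatorname{vec}\left(-\tfrac{1}{2}\left(\Theta^{-1}(\Theta - M_N(\kappa))\Theta^{-1}\right)\right)
\end{pmatrix}. \tag{2.21}
\]

The mean zero result follows directly from Lemma 2.13 and the fact that $E[\Sigma_0^{-1}Q_N(\kappa_0)'] = -(1/T)\,\Xi'$ (the "Nickell bias").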

Proposition 2.8.

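We need to derive the exact expression for $\operatorname{vec} d\Theta$ under the assumption that $\operatorname{vec} E[u_{i,0}u_{i,0}'] = (I_{m^2} - \Phi \otimes \Phi)^{-1}\operatorname{vec}\Sigma$. First, we rewrite the expression for $\operatorname{vec}\Theta$ (we prefer to work with $\operatorname{vec}(\cdot)$ rather than $\operatorname{vech}(\cdot)$ to avoid excessive use of the duplication matrix $D_m$):

\[
\begin{aligned}
\operatorname{vec}\Theta &= \operatorname{vec}\Sigma + T\left((I_m - \Phi) \otimes (I_m - \Phi)\right)\operatorname{vec}E[u_{i,0}u_{i,0}'] \\
&= \operatorname{vec}\Sigma + T\left((I_m - \Phi) \otimes (I_m - \Phi)\right)(I_{m^2} - \Phi \otimes \Phi)^{-1}\operatorname{vec}\Sigma = J_{\sigma\theta}\operatorname{vec}\Sigma.
\end{aligned}
\]

Using rules for differentials we get that

\[
d(\operatorname{vec}\Theta) = J_{\sigma\theta}\,d(\operatorname{vec}\Sigma) + d(J_{\sigma\theta})\operatorname{vec}\Sigma.
\]

Using the product rule for differentials,

\[
\begin{aligned}
\frac{1}{T}\,d(J_{\sigma\theta}) &= -\left(d(\Phi) \otimes (I_m - \Phi) + (I_m - \Phi) \otimes d(\Phi)\right)(I_{m^2} - \Phi \otimes \Phi)^{-1} \\
&\quad + \left((I_m - \Phi) \otimes (I_m - \Phi)\right)(I_{m^2} - \Phi \otimes \Phi)^{-1} \\
&\qquad \times \left(d(\Phi) \otimes \Phi + \Phi \otimes d(\Phi)\right)(I_{m^2} - \Phi \otimes \Phi)^{-1}.
\end{aligned}
\]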
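As an illustration only, consider the univariate case $m = 1$ with $|\phi| < 1$ (an assumption made here so that $\Phi = \phi$, $\Sigma = \sigma^2$ and all Kronecker products become ordinary products). The expressions for $\operatorname{vec}\Theta$ and $d(J_{\sigma\theta})$ then reduce to scalars:

```latex
\[
\begin{aligned}
\theta^2 &= \sigma^2 + T\,\frac{(1-\phi)^2}{1-\phi^2}\,\sigma^2
          = \left(1 + T\,\frac{1-\phi}{1+\phi}\right)\sigma^2 = J_{\sigma\theta}\,\sigma^2, \\
\frac{1}{T}\,dJ_{\sigma\theta} &= \left(-\frac{2(1-\phi)}{1-\phi^2}
          + \frac{(1-\phi)^2}{1-\phi^2}\cdot\frac{2\phi}{1-\phi^2}\right)d\phi
          = \left(-\frac{2}{1+\phi} + \frac{2\phi}{(1+\phi)^2}\right)d\phi
          = -\frac{2}{(1+\phi)^2}\,d\phi .
\end{aligned}
\]
```

The last expression agrees with differentiating $(1-\phi)/(1+\phi)$ directly, which gives a quick check on the signs of the two terms in the matrix formula.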

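Recall the definition of $E[u_{i,0}u_{i,0}'] = \Psi_0$ and $\psi_0 = \operatorname{vec}\Psi_0$. As $d(J_{\sigma\theta})\operatorname{vec}\Sigma$ is already a vector, taking $\operatorname{vec}(\cdot)$ of this term changes nothing:

\[
\begin{aligned}
\frac{1}{T}\operatorname{vec}\left(d(J_{\sigma\theta})\operatorname{vec}\Sigma\right) &= -(\psi_0' \otimes I_{m^2})\operatorname{vec}\left(d(\Phi) \otimes (I_m - \Phi) + (I_m - \Phi) \otimes d(\Phi)\right) \\
&\quad + \left(\psi_0' \otimes \left(\left((I_m - \Phi) \otimes (I_m - \Phi)\right)(I_{m^2} - \Phi \otimes \Phi)^{-1}\right)\right) \\
&\qquad \times \operatorname{vec}\left(d(\Phi) \otimes \Phi + \Phi \otimes d(\Phi)\right).
\end{aligned}
\]

Using the formula for $\operatorname{vec}(dX \otimes X)$,

\[
\begin{aligned}
\frac{1}{T}\,d(J_{\sigma\theta})\operatorname{vec}\Sigma &= -(\psi_0' \otimes I_{m^2})(I_m \otimes K_m \otimes I_m)\left(I_{m^2} \otimes (j - \phi) + (j - \phi) \otimes I_{m^2}\right)d\phi \\
&\quad + \left(\psi_0' \otimes \left(\left((I_m - \Phi) \otimes (I_m - \Phi)\right)(I_{m^2} - \Phi \otimes \Phi)^{-1}\right)\right) \\
&\qquad \times (I_m \otimes K_m \otimes I_m)\left(I_{m^2} \otimes \phi + \phi \otimes I_{m^2}\right)d\phi.
\end{aligned}
\]

Recall the definition of $J_{\phi\theta}$ to conclude that

\[
d(J_{\sigma\theta})\operatorname{vec}\Sigma = J_{\phi\theta}\,d\phi. \tag{2.22}
\]

The desired result follows by combining the differential results for $d\operatorname{vec}\Theta$ with the proof of Proposition 2.7.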

Proposition 2.10.

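Consider the score vector evaluated at $\kappa$:

\[
\nabla(\kappa) = N
\begin{pmatrix}
\operatorname{vec}\left(\Sigma_0^{-1}Q_N(\phi_0)' + \Theta^{-1}N_N(\phi_0)'\right) \\
D_m'\operatorname{vec}\left(-\tfrac{1}{2}\left(\Sigma_0^{-1}((T-1)\Sigma_0 - Z_N(\phi_0))\Sigma_0^{-1}\right)\right) \\
D_m'\operatorname{vec}\left(-\tfrac{1}{2}\left(\Theta^{-1}(\Theta - M_N(\phi_0))\Theta^{-1}\right)\right)
\end{pmatrix}. \tag{2.23}
\]

Now observe that the mean $E[u_{i,0}]$ does not influence the "Nickell bias" $E[\Sigma_0^{-1}Q_N(\phi_0)'] = -(1/T)\,\Xi'$ or the unbiasedness of the FE estimator of $\Sigma$, as $E[Z_N(\phi_0)] = (T-1)\Sigma_0$. On the other hand, $M_N(\phi_0)$ and $N_N(\phi_0)$ are (implicitly) influenced by $\gamma$. Similarly to the proof of Lemma 2.13,

\[
\begin{aligned}
E\left[\frac{T}{N}\sum_{i=1}^{N}(y_i - \Phi_0 y_{i-})\,y_{i-}'\right]
&= E\left[(\Pi_0 u_{i,0} + \varepsilon_i)\left(\Xi\,\Pi_0 u_{i,0} + \left(\sum_{t=1}^{T-1}\sum_{s=0}^{t-1}\Phi_0^{s}\varepsilon_{i,t-s}\right)\right)'\right] \\
&= \Pi_0\,E[u_{i,0}u_{i,0}']\,\Pi_0'\,\Xi' + \frac{1}{T}\Sigma_0\Xi' = \frac{1}{T}\Theta\,\Xi'.
\end{aligned}
\]

Note that this term depends on the second uncentered moment of $u_{i,0}$ rather than the second centered moment of $u_{i,0}$. Finally,

\[
\begin{aligned}
E\left[\frac{T}{N}\sum_{i=1}^{N}(y_i - \Phi_0 y_{i-})(y_i - \Phi_0 y_{i-})'\right]
&= T\,E\left[(\Pi_0 u_{i,0} + \varepsilon_i)(\Pi_0 u_{i,0} + \varepsilon_i)'\right] \\
&= T\,\Pi_0\,E[u_{i,0}u_{i,0}']\,\Pi_0' + \Sigma_0 = \Theta.
\end{aligned}
\]

Combining all results we conclude that $E[\nabla(\kappa)] = 0$.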

Proposition 2.9.

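To see that $E[\nabla(\kappa_N)] = 0$ we just make use of the proof of Proposition 2.10. Note that

\[
\begin{aligned}
E\left[\frac{T}{N}\sum_{i=1}^{N}(y_i - \Phi_0 y_{i-})\,y_{i-}'\right]
&= \frac{1}{N}\sum_{i=1}^{N}E\left[(\Pi_0 u_{i,0} + \varepsilon_i)\left(\Xi\,\Pi_0 u_{i,0} + \left(\sum_{t=1}^{T-1}\sum_{s=0}^{t-1}\Phi_0^{s}\varepsilon_{i,t-s}\right)\right)'\right] \\
&= \Pi_0\,\frac{1}{N}\left(\sum_{i=1}^{N}E[u_{i,0}u_{i,0}']\right)\Pi_0'\,\Xi' + \frac{1}{T}\Sigma_N\Xi' = \frac{1}{T}\Theta_N\Xi'
\end{aligned}
\]

and

\[
\begin{aligned}
E\left[\frac{T}{N}\sum_{i=1}^{N}(y_i - \Phi_0 y_{i-})(y_i - \Phi_0 y_{i-})'\right]
&= \frac{T}{N}\sum_{i=1}^{N}E\left[(\Pi_0 u_{i,0} + \varepsilon_i)(\Pi_0 u_{i,0} + \varepsilon_i)'\right] \\
&= T\,\Pi_0\,\frac{1}{N}\left(\sum_{i=1}^{N}E[u_{i,0}u_{i,0}']\right)\Pi_0' + \Sigma_N = \Theta_N.
\end{aligned}
\]

On the other hand, $E[\Sigma_N^{-1}Q_N(\phi_0)'] = -(1/T)\,\Xi'$ and $E[Z_N(\phi_0)] = (T-1)\Sigma_N$. Combining these intermediate results, the desired conclusion $E[\nabla(\kappa_N)] = 0$ follows. Note that in this case $E[u_{i,0}]$ is allowed to be non-zero and individual specific.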


2.A.4 Bimodality

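Proof of Theorem 2.11.

Denote the true value of $\theta^2$ by $\theta_0^2$, which for general $T$ is equal to

\[
\theta_0^2 = \sigma_0^2 + T(1-\phi_0)^2\,E[u_{i,0}^2].
\]

Thus at $T = 2$ it is equal to

\[
\theta_0^2 = \sigma_0^2 + 2(1-\phi_0)^2\,E[u_{i,0}^2].
\]

For some $\phi$ we denote the following variables:

\[
\theta_\phi^2 = E\left[\frac{2}{N}\sum_{i=1}^{N}(y_i - \phi y_{i-})^2\right], \qquad
\sigma_\phi^2 = E\left[\frac{1}{N}\sum_{i=1}^{N}\sum_{t=1}^{2}(y_{i,t} - \phi y_{i,t-1})^2\right],
\]

and $a = \phi_0 - \phi$.

As we assume that the observations are i.i.d., it is sufficient to analyze the previous expressions for some arbitrary individual $i$. First, we proceed with the expression for $\sigma_\phi^2$ (recall the definition of $x$):

\[
\begin{aligned}
\sigma_\phi^2 &= E\left[\frac{1}{N}\sum_{i=1}^{N}\sum_{t=1}^{2}(y_{i,t} - \phi y_{i,t-1})^2\right] = 0.5\,E\left[(\Delta y_{i,2} - \phi\Delta y_{i,1})^2\right] \\
&= 0.5\,E\left[(\Delta\varepsilon_{i,2} + (\phi_0 - \phi)\Delta y_{i,1})^2\right] \\
&= 0.5\,E\left[(\Delta\varepsilon_{i,2} + (\phi_0 - \phi)((1-\phi_0)u_{i,0} + \varepsilon_{i,1}))^2\right] \\
&= 0.5\,E\left[(\varepsilon_{i,2} + (\phi_0 - \phi)(1-\phi_0)u_{i,0} + (\phi_0 - \phi - 1)\varepsilon_{i,1})^2\right] \\
&= 0.5\left(\sigma_0^2\left(1 + (\phi_0 - \phi - 1)^2\right) + (\phi_0 - \phi)^2(1-\phi_0)^2\,E[u_{i,0}^2]\right) \\
&= 0.5\,\sigma_0^2\left(1 - 2(\phi_0 - \phi) + 1 + (\phi_0 - \phi)^2x\right) = 0.5\,\sigma_0^2\left(a^2x + 2(1-a)\right).
\end{aligned}
\]

Similarly, we can derive expressions for $\theta_0^2$ and $\theta_\phi^2$ in terms of $x$ and $a$:

\[
\theta_0^2 = \sigma_0^2 + 2(1-\phi_0)^2\,E[u_{i,0}^2] = \sigma_0^2(2x - 1),
\]

while for $\theta_\phi^2$ it follows that

\[
\begin{aligned}
\theta_\phi^2 &= E\left[\frac{2}{N}\sum_{i=1}^{N}(y_i - \phi y_{i-})^2\right] = 2\,E\left[(u_i - u_{i,0} - \phi(u_{i,-} - u_{i,0}))^2\right] \\
&= 2\,E\left[(\varepsilon_i + \phi_0u_{i,-} - u_{i,0} - \phi(u_{i,-} - u_{i,0}))^2\right] \\
&= 0.5\,E\left[(\varepsilon_{i,2} + \varepsilon_{i,1} + \phi_0(u_{i,1} + u_{i,0}) - 2u_{i,0} - \phi(u_{i,1} - u_{i,0}))^2\right] \\
&= 0.5\,E\left[(\varepsilon_{i,2} + \varepsilon_{i,1}(1 + \phi_0 - \phi) + u_{i,0}(\phi_0(1+\phi_0) - 2 - \phi(\phi_0 - 1)))^2\right] \\
&= 0.5\left[\sigma_0^2\left(1 + (1+a)^2\right) + (1-\phi_0)^2\,E[u_{i,0}^2]\,(a+2)^2\right] \\
&= 0.5\,\sigma_0^2\left[1 + (1+a)^2 + (1-\phi_0)^2\,E[u_{i,0}^2]\,(a+2)^2/\sigma_0^2\right] \\
&= 0.5\,\sigma_0^2\left[1 + (1+a)^2 + (x-1)(a+2)^2\right] \\
&= 0.5\,\sigma_0^2\left[a^2x + (a+1)(4x-2)\right].
\end{aligned}
\]

Continuing,

\[
\begin{aligned}
\sigma_\phi^2\theta_\phi^2 &= 0.25\,\sigma_0^4\left(a^2x - 2(a-1)\right)\left(a^2x + (a+1)(4x-2)\right) \\
&= 0.25\,\sigma_0^4\left(a^2\left(a^2x^2 + 2xa(2x-2) + (2x-2)^2\right) + 4(2x-1)\right) \\
&= 0.25\,\sigma_0^4\left(a^2\left(ax + 2(x-1)\right)^2 + 4(2x-1)\right) \\
&= 0.25\,\sigma_0^4\left(a^2\left(ax + 2(x-1)\right)^2\right) + \sigma_0^2\theta_0^2.
\end{aligned}
\]

The first term in brackets is zero, and hence takes the same value, both at the true value $\phi_0$ ($a = 0$) and at

\[
a = 2\,\frac{1-x}{x} \;\Rightarrow\; \phi_0 - \phi = 2\,\frac{1-x}{x} \;\Rightarrow\; \phi = 2\,\frac{x-1}{x} + \phi_0.
\]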
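The algebra above can be checked with exact arithmetic. The sketch below is only an illustration under assumptions made here: σ0² is normalized to one by default, the function names are hypothetical, and x is treated as a given number greater than one. It evaluates the closed-form expressions for σ²_φ and θ²_φ and compares their product at a = 0 and at a = 2(1 − x)/x.

```python
from fractions import Fraction


def sigma2_phi(a, x, sigma2_0=1):
    """sigma^2_phi = 0.5 * sigma_0^2 * (a^2 x + 2 (1 - a)), with a = phi_0 - phi."""
    return Fraction(1, 2) * sigma2_0 * (a * a * x + 2 * (1 - a))


def theta2_phi(a, x, sigma2_0=1):
    """theta^2_phi = 0.5 * sigma_0^2 * (a^2 x + (a + 1)(4 x - 2))."""
    return Fraction(1, 2) * sigma2_0 * (a * a * x + (a + 1) * (4 * x - 2))


def second_mode_offset(x):
    """Value of a = phi_0 - phi at the second solution, a = 2 (1 - x) / x."""
    return 2 * (1 - x) / x


def product(a, x, sigma2_0=1):
    """Product sigma^2_phi * theta^2_phi that appears in the derivation."""
    return sigma2_phi(a, x, sigma2_0) * theta2_phi(a, x, sigma2_0)
```

For example, with x = 3/2 the second solution is a = −2/3, that is φ = φ0 + 2/3, and the product equals σ0²θ0² = 2 at both points. With T = 2 the factor (T − 1) in (2.20) is one, so equal products correspond to equal values of log σ²_φ + log θ²_φ.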

2.B Iterative bias correction procedure

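Asymptotic normality of the estimator can be proved by treating it as the solution of the following estimating equations:

\[
\sum_{i=1}^{N}\sum_{t=2}^{T}\left((\Delta y_{i,t} - \Upsilon\Delta w_{i,t})\Delta w_{i,t}' + \frac{1}{2}(\Delta y_{i,t} - \Upsilon\Delta w_{i,t})(\Delta y_{i,t} - \Upsilon\Delta w_{i,t})'S\right) = O_{m \times (k+m)}, \tag{2.25}
\]

where $S = [I_m \;\; O_{m \times k}]$.

Algorithm 1 Iterative bias-correction procedure FDOLS

1. For $k = 1$ to $k_{\max}$:

2. Given $\Upsilon^{(k-1)}$ compute $\Upsilon^{(k)} = \hat{\Upsilon} + (T-1)\,\hat{\Sigma}(\Upsilon^{(k-1)})\,S_N^{-1}$;

3. If $\|\Upsilon^{(k)} - \Upsilon^{(k-1)}\| < \epsilon$, stop, for some pre-specified matrix norm $\|\cdot\|$.

To initialize the iterations we set $\Upsilon^{(0)} = \hat{\Upsilon}$, and $\hat{\Sigma}(\Upsilon^{(k-1)})$ is defined as

\[
\hat{\Sigma}(\Upsilon) = \frac{1}{2N(T-1)}\sum_{i=1}^{N}\left(\sum_{t=2}^{T}(\Delta y_{i,t} - \Upsilon\Delta w_{i,t})(\Delta y_{i,t} - \Upsilon\Delta w_{i,t})'\right). \tag{2.24}
\]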
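The iteration in Algorithm 1 can be sketched for the simplest special case. The code below is a minimal illustration, not the implementation used for the reported results: it assumes a univariate AR(1) panel without exogenous regressors (m = 1, k = 0, so Δw_{i,t} = Δy_{i,t−1}), it assumes that S_N denotes (1/N) Σ_i Σ_t Δw_{i,t}Δw′_{i,t}, and the simulation design and all function names are hypothetical.

```python
import math
import random


def simulate_panel(n, t, phi, seed=0):
    """Simulate y_{i,0..t} from a stationary AR(1) with individual effects."""
    rng = random.Random(seed)
    panel = []
    for _ in range(n):
        mu = rng.gauss(0.0, 1.0)                               # individual effect
        u = rng.gauss(0.0, 1.0 / math.sqrt(1.0 - phi ** 2))    # stationary start
        ys = [mu + u]
        for _ in range(t):
            u = phi * u + rng.gauss(0.0, 1.0)
            ys.append(mu + u)
        panel.append(ys)
    return panel


def fdols_bias_corrected(panel, tol=1e-10, k_max=500):
    """Return (FDOLS estimate, iteratively bias-corrected estimate)."""
    a = b = c = 0.0
    for ys in panel:
        d = [ys[s] - ys[s - 1] for s in range(1, len(ys))]     # dy_1, ..., dy_T
        for s in range(1, len(d)):                             # regress dy_t on dy_{t-1}
            a += d[s] ** 2
            b += d[s] * d[s - 1]
            c += d[s - 1] ** 2
    phi_hat = b / c                                            # uncorrected FDOLS
    phi_k = phi_hat                                            # initialization
    for _ in range(k_max):
        # (T-1) * Sigma_hat(phi_k) / S_N simplifies to rss / (2 c) in this case
        rss = a - 2.0 * phi_k * b + phi_k ** 2 * c
        phi_next = phi_hat + rss / (2.0 * c)
        if abs(phi_next - phi_k) < tol:
            return phi_hat, phi_next
        phi_k = phi_next
    return phi_hat, phi_k
```

A usage example would be `phi_fd, phi_bc = fdols_bias_corrected(simulate_panel(5000, 5, 0.5))`. In this design the uncorrected estimate is expected to be strongly biased downwards, while the iterated estimate should lie much closer to the true value; convergence is linear, so the number of iterations grows as φ approaches one.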

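Proposition 2.14. Let Assumptions SA be satisfied and assume that the iterative procedure in Algorithm 1 has a unique fixed point. Then

\[
\sqrt{N}\,(\upsilon_{iBC} - \upsilon_0) \xrightarrow{d} N_m(0_{m^2}, F), \tag{2.26}
\]

where

\[
\begin{aligned}
F &\equiv V^{-1}XV^{-1}, \qquad V = (\Sigma_\Delta \otimes I_m) - \frac{1}{2}\left(I_{m(k+m)} + K_{m,(k+m)}\right)\left((S'\Sigma_0S) \otimes I_m\right), \\
X &\equiv \operatorname*{plim}_{N \to \infty}\frac{1}{N}\sum_{i=1}^{N}\operatorname{vec}O_i\,(\operatorname{vec}O_i)', \\
O_i &\equiv \sum_{t=2}^{T}\left((\Delta y_{i,t} - \Upsilon_0\Delta w_{i,t})\Delta w_{i,t}' + \frac{1}{2}(\Delta y_{i,t} - \Upsilon_0\Delta w_{i,t})(\Delta y_{i,t} - \Upsilon_0\Delta w_{i,t})'S\right).
\end{aligned}
\]

Note that the asymptotic distribution of the estimator depends upon the choice of $\Sigma(\Phi)$. A different asymptotic distribution is obtained if, instead of using the $\Sigma$ estimator in (2.24), we opt for the standard infeasible ML estimator

\[
\Sigma(\Upsilon) = \frac{1}{N(T-1)}\sum_{i=1}^{N}\left(\sum_{t=1}^{T}(y_{i,t} - \Phi y_{i,t-1} - Bx_{i,t})(y_{i,t} - \Phi y_{i,t-1} - Bx_{i,t})'\right).
\]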

2.C Tables


Table 2.1: Design 1

Each block of six columns reports: Mean, Med., 5q, 95q, RMSE, MAE.

N = 100. Blocks: T = 3, π = 1 | T = 3, π = 3 | T = 6, π = 1 | T = 6, π = 3

φ11
AB-GMM   -15.99 -15.57 -0.77 0.45 0.43 0.25 | -36.00 -35.32 -1.29 0.53 0.69 0.43 | -12.29 -11.78 -0.34 0.08 0.18 0.12 | -19.76 -18.96 -0.48 0.06 0.26 0.19
Sys-GMM  2.20 2.94 -0.24 0.25 0.15 0.10 | 17.04 18.22 -0.15 0.46 0.25 0.20 | 7.00 7.19 -0.06 0.19 0.10 0.08 | 25.34 26.17 0.11 0.37 0.27 0.26
FDLS     0.33 0.21 -0.23 0.24 0.14 0.10 | 0.31 0.20 -0.23 0.24 0.14 0.10 | 0.11 0.05 -0.14 0.14 0.09 0.06 | 0.08 0.03 -0.14 0.14 0.09 0.06
TMLE     6.00 3.23 -0.25 0.47 0.23 0.15 | 10.11 7.18 -0.25 0.53 0.26 0.16 | 1.78 0.81 -0.11 0.18 0.09 0.06 | 3.54 2.23 -0.11 0.22 0.11 0.06
TMLEc    -0.51 -0.57 -0.20 0.19 0.12 0.08 | -0.24 -0.48 -0.20 0.20 0.12 0.08 | -0.34 -0.30 -0.10 0.09 0.06 0.04 | -0.36 -0.31 -0.10 0.09 0.06 0.04
TMLEs    1.41 -0.96 -0.26 0.37 0.19 0.13 | 1.46 -0.98 -0.26 0.37 0.19 0.13 | 0.88 0.19 -0.12 0.16 0.08 0.05 | 0.83 0.13 -0.12 0.16 0.08 0.05
TMLEr    2.45 -0.44 -0.26 0.41 0.20 0.13 | 4.85 1.02 -0.26 0.48 0.23 0.14 | 0.88 0.19 -0.12 0.16 0.08 0.05 | 0.84 0.14 -0.12 0.16 0.08 0.05

φ12
AB-GMM   -9.06 -8.52 -0.71 0.52 0.41 0.23 | -19.61 -18.74 -1.18 0.75 0.66 0.37 | -6.25 -6.11 -0.28 0.15 0.14 0.09 | -11.35 -11.10 -0.40 0.16 0.20 0.14
Sys-GMM  -1.02 -1.08 -0.25 0.23 0.15 0.09 | -5.51 -5.63 -0.35 0.24 0.19 0.12 | -2.29 -2.15 -0.14 0.09 0.08 0.05 | -10.56 -10.97 -0.22 0.03 0.13 0.11
FDLS     -0.14 -0.13 -0.25 0.24 0.15 0.10 | -0.19 -0.18 -0.25 0.24 0.15 0.10 | -0.15 -0.11 -0.16 0.16 0.10 0.06 | -0.18 -0.14 -0.16 0.16 0.10 0.06
TMLE     -2.77 -1.87 -0.31 0.22 0.16 0.11 | -3.27 -2.02 -0.34 0.23 0.18 0.12 | 1.49 1.06 -0.10 0.14 0.08 0.05 | 3.00 2.64 -0.09 0.16 0.09 0.06
TMLEc    -0.43 -0.32 -0.16 0.15 0.10 0.06 | -0.62 -0.52 -0.17 0.15 0.10 0.06 | -0.18 -0.21 -0.08 0.08 0.05 0.03 | -0.19 -0.21 -0.08 0.08 0.05 0.03
TMLEs    -3.88 -3.37 -0.28 0.19 0.15 0.10 | -4.04 -3.54 -0.28 0.18 0.15 0.10 | 0.69 0.39 -0.10 0.12 0.07 0.05 | 0.64 0.33 -0.10 0.12 0.07 0.05
TMLEr    -3.89 -3.43 -0.28 0.19 0.15 0.10 | -4.77 -4.42 -0.29 0.18 0.15 0.10 | 0.69 0.39 -0.10 0.12 0.07 0.05 | 0.63 0.32 -0.10 0.12 0.07 0.05

N = 250. Blocks: T = 3, π = 1 | T = 3, π = 3 | T = 6, π = 1 | T = 6, π = 3

φ11
AB-GMM   -6.62 -7.01 -0.44 0.33 0.25 0.16 | -21.20 -21.15 -0.92 0.48 0.50 0.30 | -5.57 -5.39 -0.20 0.09 0.10 0.07 | -11.00 -10.71 -0.33 0.09 0.17 0.12
Sys-GMM  0.63 0.99 -0.16 0.16 0.10 0.07 | 10.01 10.71 -0.16 0.35 0.18 0.13 | 2.21 2.21 -0.06 0.10 0.06 0.04 | 15.61 15.74 0.01 0.30 0.18 0.16
FDLS     0.04 -0.02 -0.15 0.15 0.09 0.06 | 0.03 -0.02 -0.15 0.15 0.09 0.06 | 0.05 0.16 -0.09 0.09 0.06 0.04 | 0.05 0.16 -0.09 0.09 0.06 0.04
TMLE     3.60 1.05 -0.18 0.35 0.16 0.09 | 6.32 3.07 -0.17 0.42 0.19 0.10 | 1.08 0.30 -0.08 0.12 0.06 0.04 | 2.16 1.05 -0.07 0.16 0.07 0.04
TMLEc    -0.33 -0.38 -0.13 0.12 0.07 0.05 | -0.30 -0.36 -0.13 0.12 0.08 0.05 | -0.16 -0.11 -0.06 0.06 0.04 0.02 | -0.16 -0.11 -0.06 0.06 0.04 0.02
TMLEs    1.43 -0.56 -0.18 0.28 0.14 0.09 | 1.47 -0.54 -0.18 0.28 0.14 0.09 | 0.76 0.14 -0.08 0.11 0.06 0.03 | 0.76 0.14 -0.08 0.11 0.06 0.03
TMLEr    1.56 -0.51 -0.18 0.29 0.14 0.09 | 2.19 -0.31 -0.18 0.33 0.15 0.09 | 0.76 0.14 -0.08 0.11 0.06 0.03 | 0.76 0.14 -0.08 0.11 0.06 0.03

φ12
AB-GMM   -3.78 -4.29 -0.41 0.35 0.24 0.15 | -14.02 -14.11 -0.86 0.57 0.49 0.28 | -2.93 -2.89 -0.17 0.11 0.09 0.06 | -7.12 -6.90 -0.29 0.14 0.15 0.10
Sys-GMM  -0.46 -0.35 -0.17 0.15 0.10 0.06 | -2.00 -1.81 -0.28 0.22 0.15 0.09 | -0.37 -0.31 -0.09 0.08 0.05 0.03 | -4.61 -4.69 -0.18 0.09 0.09 0.07
FDLS     0.02 0.01 -0.15 0.15 0.09 0.06 | 0.00 -0.01 -0.15 0.15 0.09 0.06 | -0.04 0.00 -0.10 0.10 0.06 0.04 | -0.04 0.00 -0.10 0.10 0.06 0.04
TMLE     -0.28 0.25 -0.19 0.17 0.11 0.07 | 0.10 1.08 -0.23 0.19 0.13 0.08 | 1.02 0.33 -0.07 0.11 0.05 0.03 | 2.07 0.99 -0.06 0.13 0.06 0.04
TMLEc    -0.15 -0.09 -0.10 0.10 0.06 0.04 | -0.18 -0.12 -0.10 0.10 0.06 0.04 | -0.10 -0.04 -0.05 0.05 0.03 0.02 | -0.10 -0.04 -0.05 0.05 0.03 0.02
TMLEs    -1.20 -0.92 -0.18 0.15 0.10 0.07 | -1.27 -0.99 -0.18 0.15 0.10 0.07 | 0.71 0.20 -0.07 0.10 0.05 0.03 | 0.71 0.20 -0.07 0.10 0.05 0.03
TMLEr    -1.21 -0.95 -0.18 0.15 0.10 0.07 | -1.57 -1.25 -0.19 0.14 0.10 0.07 | 0.71 0.20 -0.07 0.10 0.05 0.03 | 0.71 0.20 -0.07 0.10 0.05 0.03


Table 2.2: Design 2

Each block of six columns reports: Mean, Med., 5q, 95q, RMSE, MAE.

N = 100. Blocks: T = 3, π = 1 | T = 3, π = 3 | T = 6, π = 1 | T = 6, π = 3

φ11
AB-GMM   -8.01 -8.39 -0.54 0.42 0.31 0.20 | -20.24 -20.72 -1.01 0.65 0.58 0.35 | -7.19 -7.09 -0.29 0.14 0.15 0.10 | -10.44 -10.38 -0.38 0.17 0.20 0.13
Sys-GMM  3.98 4.07 -0.28 0.34 0.19 0.13 | 25.48 26.44 -0.19 0.68 0.37 0.28 | 10.15 9.97 -0.07 0.28 0.15 0.11 | 37.55 38.45 0.15 0.57 0.40 0.38
FDLS     -4.67 -4.83 -0.34 0.25 0.18 0.13 | -4.67 -4.82 -0.34 0.25 0.18 0.13 | -5.43 -5.52 -0.24 0.14 0.13 0.09 | -5.43 -5.53 -0.24 0.14 0.13 0.09
TMLE     8.38 3.86 -0.27 0.59 0.27 0.16 | 14.88 9.62 -0.27 0.70 0.34 0.19 | 0.22 -0.13 -0.13 0.14 0.09 0.06 | 0.86 -0.06 -0.13 0.16 0.10 0.06
TMLEc    2.25 2.17 -0.21 0.26 0.14 0.10 | 2.47 2.23 -0.21 0.26 0.15 0.10 | 0.58 0.56 -0.11 0.12 0.07 0.05 | 0.58 0.56 -0.11 0.12 0.07 0.05
TMLEs    5.35 1.67 -0.27 0.51 0.24 0.14 | 6.41 2.36 -0.27 0.55 0.25 0.14 | 0.18 -0.13 -0.13 0.14 0.08 0.06 | 0.18 -0.13 -0.13 0.14 0.08 0.06
TMLEr    5.61 1.75 -0.27 0.52 0.24 0.14 | 8.11 3.14 -0.27 0.60 0.27 0.15 | 0.18 -0.13 -0.13 0.14 0.08 0.06 | 0.19 -0.13 -0.13 0.14 0.08 0.06

φ12
AB-GMM   -1.74 -1.41 -0.50 0.45 0.29 0.18 | 0.00 0.01 -0.86 0.88 0.57 0.32 | -0.43 -0.34 -0.22 0.20 0.13 0.09 | 0.42 0.48 -0.28 0.28 0.17 0.12
Sys-GMM  -1.67 -1.64 -0.33 0.29 0.19 0.13 | -8.19 -8.47 -0.51 0.35 0.27 0.18 | -2.94 -2.86 -0.20 0.14 0.11 0.07 | -9.99 -10.00 -0.29 0.10 0.15 0.11
FDLS     4.98 4.98 -0.27 0.36 0.20 0.13 | 4.97 4.97 -0.27 0.36 0.20 0.13 | 8.36 8.39 -0.12 0.29 0.15 0.10 | 8.37 8.39 -0.12 0.29 0.15 0.10
TMLE     -0.88 -0.55 -0.38 0.34 0.21 0.13 | -2.86 -1.52 -0.46 0.37 0.25 0.16 | 0.05 0.10 -0.12 0.12 0.07 0.05 | 0.03 0.17 -0.13 0.13 0.09 0.05
TMLEc    -7.27 -7.32 -0.27 0.13 0.14 0.10 | -7.27 -7.31 -0.27 0.13 0.14 0.10 | -3.31 -3.35 -0.14 0.07 0.07 0.05 | -3.31 -3.35 -0.14 0.07 0.07 0.05
TMLEs    -0.69 -0.55 -0.32 0.30 0.19 0.12 | -1.35 -1.15 -0.32 0.29 0.18 0.12 | 0.01 0.09 -0.12 0.12 0.07 0.05 | 0.00 0.09 -0.12 0.12 0.07 0.05
TMLEr    -0.66 -0.53 -0.32 0.30 0.18 0.12 | -1.91 -1.71 -0.32 0.28 0.18 0.12 | 0.01 0.09 -0.12 0.12 0.07 0.05 | 0.00 0.09 -0.12 0.12 0.07 0.05

N = 250. Blocks: T = 3, π = 1 | T = 3, π = 3 | T = 6, π = 1 | T = 6, π = 3

φ11
AB-GMM   -3.39 -4.05 -0.33 0.28 0.19 0.13 | -8.81 -10.09 -0.62 0.49 0.35 0.22 | -3.16 -3.21 -0.17 0.11 0.09 0.06 | -4.99 -4.97 -0.24 0.14 0.13 0.08
Sys-GMM  1.37 1.41 -0.19 0.21 0.12 0.08 | 14.37 14.13 -0.19 0.48 0.25 0.17 | 2.91 2.86 -0.08 0.14 0.07 0.05 | 22.59 22.34 0.02 0.44 0.26 0.22
FDLS     -5.11 -5.13 -0.23 0.13 0.12 0.08 | -5.11 -5.13 -0.23 0.13 0.12 0.08 | -5.62 -5.73 -0.17 0.06 0.09 0.07 | -5.62 -5.73 -0.17 0.06 0.09 0.07
TMLE     4.06 0.78 -0.18 0.42 0.18 0.09 | 8.10 2.55 -0.18 0.57 0.24 0.11 | 0.02 -0.09 -0.08 0.08 0.05 0.03 | 0.03 -0.09 -0.08 0.08 0.05 0.03
TMLEc    2.30 2.29 -0.12 0.17 0.09 0.06 | 2.32 2.30 -0.12 0.17 0.09 0.06 | 0.70 0.68 -0.07 0.08 0.04 0.03 | 0.70 0.68 -0.07 0.08 0.04 0.03
TMLEs    2.87 0.40 -0.18 0.35 0.16 0.09 | 3.17 0.50 -0.18 0.37 0.17 0.09 | 0.02 -0.09 -0.08 0.08 0.05 0.03 | 0.02 -0.09 -0.08 0.08 0.05 0.03
TMLEr    2.91 0.40 -0.18 0.35 0.16 0.09 | 3.55 0.56 -0.18 0.41 0.18 0.09 | 0.02 -0.09 -0.08 0.08 0.05 0.03 | 0.02 -0.09 -0.08 0.08 0.05 0.03

φ12
AB-GMM   -0.66 -0.80 -0.30 0.29 0.18 0.12 | 0.39 -0.13 -0.56 0.56 0.35 0.21 | -0.06 -0.05 -0.14 0.14 0.09 0.06 | 0.59 0.75 -0.19 0.20 0.12 0.08
Sys-GMM  -0.59 -0.62 -0.21 0.19 0.12 0.08 | -5.03 -4.67 -0.39 0.28 0.21 0.14 | -0.74 -0.75 -0.12 0.10 0.07 0.04 | -5.98 -5.95 -0.26 0.14 0.13 0.09
FDLS     5.09 5.14 -0.14 0.25 0.13 0.09 | 5.09 5.14 -0.14 0.25 0.13 0.09 | 8.49 8.52 -0.05 0.21 0.12 0.09 | 8.49 8.52 -0.05 0.21 0.12 0.09
TMLE     0.85 0.32 -0.22 0.26 0.14 0.08 | -0.27 0.19 -0.35 0.30 0.18 0.10 | 0.00 0.04 -0.08 0.07 0.05 0.03 | 0.01 0.04 -0.08 0.07 0.05 0.03
TMLEc    -7.28 -7.26 -0.20 0.05 0.11 0.08 | -7.28 -7.27 -0.20 0.05 0.11 0.08 | -3.29 -3.24 -0.10 0.03 0.05 0.04 | -3.29 -3.24 -0.10 0.03 0.05 0.04
TMLEs    0.67 0.13 -0.19 0.23 0.13 0.08 | 0.47 -0.04 -0.19 0.23 0.13 0.08 | 0.00 0.04 -0.08 0.07 0.05 0.03 | 0.00 0.04 -0.08 0.07 0.05 0.03
TMLEr    0.66 0.12 -0.19 0.23 0.13 0.08 | 0.24 -0.19 -0.20 0.22 0.13 0.08 | 0.00 0.04 -0.08 0.07 0.05 0.03 | 0.00 0.04 -0.08 0.07 0.05 0.03


Table 2.3: Design 3

Each block of six columns reports: Mean, Med., 5q, 95q, RMSE, MAE.

N = 100. Blocks: T = 3, π = 1 | T = 3, π = 3 | T = 6, π = 1 | T = 6, π = 3

φ11
AB-GMM   -27.55 -22.63 -1.33 0.64 0.71 0.36 | -3.27 -2.29 -0.33 0.23 0.20 0.09 | -11.65 -11.26 -0.41 0.16 0.21 0.14 | -5.37 -4.34 -0.26 0.12 0.13 0.07
Sys-GMM  29.51 30.90 0.03 0.51 0.33 0.31 | 60.94 61.50 0.51 0.70 0.61 0.62 | 29.03 29.54 0.11 0.46 0.31 0.30 | 57.84 58.32 0.51 0.63 0.58 0.58
FDLS     10.03 9.89 -0.21 0.42 0.22 0.14 | 64.34 64.56 0.27 1.01 0.68 0.65 | 0.90 0.77 -0.18 0.21 0.12 0.08 | 34.68 33.57 0.06 0.66 0.39 0.34
TMLE     16.37 7.55 -0.24 0.81 0.36 0.17 | 7.47 0.73 -0.13 0.78 0.27 0.06 | 0.32 -0.09 -0.12 0.13 0.09 0.05 | 0.07 -0.14 -0.08 0.08 0.06 0.03
TMLEc    17.70 17.43 -0.09 0.46 0.24 0.18 | 52.13 51.55 0.34 0.79 0.54 0.52 | 7.81 7.54 -0.06 0.22 0.12 0.08 | 40.94 42.57 0.19 0.57 0.43 0.43
TMLEs    4.00 0.72 -0.23 0.45 0.21 0.11 | 0.20 -0.21 -0.13 0.14 0.09 0.05 | 0.01 -0.11 -0.12 0.13 0.08 0.05 | -0.14 -0.15 -0.08 0.08 0.05 0.03
TMLEr    5.27 0.94 -0.23 0.55 0.23 0.11 | 0.45 -0.21 -0.13 0.15 0.10 0.05 | 0.01 -0.11 -0.12 0.13 0.08 0.05 | -0.14 -0.15 -0.08 0.08 0.05 0.03

φ12
AB-GMM   -1.70 -3.29 -1.03 1.03 0.67 0.33 | 0.61 0.39 -0.24 0.25 0.17 0.08 | -0.74 -0.65 -0.30 0.28 0.18 0.11 | 1.18 1.05 -0.17 0.19 0.11 0.06
Sys-GMM  -3.10 -3.76 -0.23 0.20 0.13 0.09 | -10.43 -10.60 -0.17 -0.03 0.11 0.11 | -4.45 -4.66 -0.18 0.10 0.09 0.07 | -11.71 -11.88 -0.16 -0.07 0.12 0.12
FDLS     5.65 5.50 -0.26 0.38 0.20 0.13 | 9.04 9.12 -0.22 0.40 0.21 0.14 | 8.55 8.58 -0.12 0.29 0.15 0.11 | 9.82 9.80 -0.15 0.35 0.18 0.12
TMLE     -0.76 -0.06 -0.49 0.43 0.26 0.15 | 0.83 0.22 -0.17 0.30 0.17 0.05 | 0.06 0.09 -0.11 0.11 0.07 0.04 | 0.08 0.07 -0.08 0.07 0.05 0.03
TMLEc    -7.42 -7.50 -0.29 0.15 0.15 0.10 | -3.83 -4.86 -0.16 0.12 0.09 0.07 | -3.07 -3.03 -0.15 0.09 0.08 0.05 | -5.02 -5.96 -0.18 0.11 0.10 0.08
TMLEs    -0.07 -0.33 -0.25 0.26 0.16 0.09 | -0.04 -0.04 -0.12 0.12 0.07 0.05 | -0.05 0.06 -0.11 0.11 0.07 0.04 | -0.01 0.06 -0.08 0.07 0.05 0.03
TMLEr    -0.36 -0.60 -0.25 0.25 0.15 0.09 | -0.07 -0.05 -0.12 0.12 0.07 0.05 | -0.05 0.06 -0.11 0.11 0.07 0.04 | -0.01 0.06 -0.08 0.07 0.05 0.03

N = 250. Blocks: T = 3, π = 1 | T = 3, π = 3 | T = 6, π = 1 | T = 6, π = 3

φ11
AB-GMM   -17.44 -13.68 -0.99 0.51 0.54 0.26 | -0.78 -0.78 -0.14 0.13 0.08 0.05 | -5.85 -5.68 -0.26 0.13 0.13 0.09 | -1.95 -1.63 -0.13 0.08 0.07 0.04
Sys-GMM  32.06 32.74 0.16 0.46 0.33 0.33 | 61.17 61.34 0.55 0.67 0.61 0.61 | 27.44 27.71 0.14 0.41 0.29 0.28 | 57.84 58.01 0.54 0.61 0.58 0.58
FDLS     9.87 9.79 -0.10 0.30 0.16 0.11 | 66.92 67.01 0.42 0.92 0.69 0.67 | 0.75 0.60 -0.11 0.13 0.07 0.05 | 35.92 35.26 0.17 0.57 0.38 0.35
TMLE     9.61 2.14 -0.15 0.69 0.26 0.09 | 1.10 0.01 -0.08 0.09 0.11 0.03 | -0.03 -0.07 -0.07 0.08 0.05 0.03 | -0.06 -0.11 -0.05 0.05 0.03 0.02
TMLEc    18.19 17.87 0.01 0.36 0.21 0.18 | 51.37 50.86 0.42 0.60 0.52 0.51 | 7.98 7.85 -0.01 0.17 0.10 0.08 | 44.00 44.97 0.30 0.55 0.45 0.45
TMLEs    1.26 -0.06 -0.15 0.21 0.12 0.07 | -0.02 -0.07 -0.08 0.08 0.05 0.03 | -0.03 -0.07 -0.07 0.08 0.05 0.03 | -0.06 -0.11 -0.05 0.05 0.03 0.02
TMLEr    1.62 -0.02 -0.15 0.22 0.13 0.07 | -0.02 -0.07 -0.08 0.08 0.05 0.03 | -0.03 -0.07 -0.07 0.08 0.05 0.03 | -0.06 -0.11 -0.05 0.05 0.03 0.02

φ12
AB-GMM   0.12 -1.50 -0.75 0.82 0.52 0.25 | 0.10 0.11 -0.12 0.12 0.08 0.05 | -0.25 -0.13 -0.21 0.20 0.13 0.08 | 0.49 0.50 -0.10 0.11 0.06 0.04
Sys-GMM  -3.98 -4.26 -0.16 0.09 0.09 0.06 | -10.44 -10.49 -0.15 -0.06 0.11 0.10 | -3.81 -3.98 -0.14 0.07 0.07 0.05 | -11.78 -11.83 -0.14 -0.09 0.12 0.12
FDLS     5.88 5.82 -0.14 0.26 0.13 0.09 | 9.45 9.47 -0.11 0.29 0.15 0.11 | 8.71 8.76 -0.04 0.22 0.12 0.09 | 10.09 10.22 -0.06 0.26 0.14 0.11
TMLE     0.60 0.41 -0.38 0.37 0.19 0.08 | 0.13 -0.01 -0.07 0.07 0.06 0.03 | -0.01 -0.02 -0.07 0.07 0.04 0.03 | -0.01 -0.01 -0.05 0.05 0.03 0.02
TMLEc    -7.59 -7.53 -0.22 0.06 0.11 0.08 | -4.78 -5.24 -0.11 0.04 0.07 0.06 | -3.01 -2.96 -0.11 0.04 0.06 0.04 | -6.36 -6.87 -0.15 0.04 0.09 0.07
TMLEs    0.32 -0.05 -0.14 0.15 0.09 0.06 | -0.02 -0.04 -0.07 0.07 0.04 0.03 | -0.01 -0.02 -0.07 0.07 0.04 0.03 | -0.01 -0.01 -0.05 0.05 0.03 0.02
TMLEr    0.19 -0.13 -0.14 0.15 0.09 0.06 | -0.02 -0.04 -0.07 0.07 0.04 0.03 | -0.01 -0.02 -0.07 0.07 0.04 0.03 | -0.01 -0.01 -0.05 0.05 0.03 0.02


Table

2.4

:D

esig

n4

N=

100

T=

3π

=1

N=

100

T=

3π

=3

N=

100

T=

6π

=1

N=

100

T=

6π

=3

Mea

nM

ed.

5q

95

qR

MM

AE

Mea

nM

ed.

5q

95

qR

MM

AE

Mea

nM

ed.

5q

95

qR

MM

AE

Mea

nM

ed.

5q

95

qR

MM

AE

φ11

AB

-GM

M-2

.40

-2.6

4-0

.33

0.3

00.1

90.1

3-0

.60

-0.6

7-0

.19

0.1

80.1

10.0

7-3

.95

-4.0

5-0

.21

0.1

20.1

10.0

7-1

.59

-1.5

8-0

.13

0.0

90.0

70.0

4S

ys-

GM

M28.8

229.7

8-0

.11

0.6

60.3

70.3

052.0

851.3

30.4

60.6

10.5

20.5

129.1

530.1

60.0

30.5

20.3

30.3

052.8

052.7

00.5

00.5

60.5

30.5

3F

DL

S7.1

06.7

9-0

.23

0.3

90.2

00.1

356.2

456.1

30.1

70.9

60.6

10.5

60.3

30.2

2-0

.19

0.2

00.1

20.0

832.1

431.2

60.0

40.6

30.3

70.3

1T

ML

E12.1

84.8

5-0

.25

0.7

30.3

20.1

511.0

91.6

9-0

.18

[Table continued: remaining rows of the N = 100 panel, followed by the N = 250 panel. Rows: AB-GMM, Sys-GMM, FDLS, TMLE, TMLEc, TMLEs and TMLEr estimators of φ11 and φ12. Column blocks: T = 3, π = 1; T = 3, π = 3; T = 6, π = 1; T = 6, π = 3.]


Table 2.5: Design 5

[Table body: Mean, Med., 5q, 95q, RMSE and MAE of the AB-GMM, Sys-GMM, FDLS, TMLE, TMLEc, TMLEs and TMLEr estimators of φ11 and φ12. Panels: N = 100 and N = 250. Column blocks: T = 3, π = 1; T = 3, π = 3; T = 6, π = 1; T = 6, π = 3.]


Table 2.6: Design 6

[Table body: Mean, Med., 5q, 95q, RMSE and MAE of the AB-GMM, Sys-GMM, FDLS, TMLE, TMLEc, TMLEs and TMLEr estimators of φ11 and φ12. Panels: N = 100 and N = 250. Column blocks: T = 3, π = 1; T = 3, π = 3; T = 6, π = 1; T = 6, π = 3.]


Table 2.7: Design 1. Rejection frequencies for two-sided t-tests for φ11. True value φ11 = 0.6.

[Table body: rows TMLE(r), TMLEc(r), TMLEs(r), TMLEr(r), AB-GMM2(W) and Sys-GMM2(W). Panels: N = 100 and N = 250. Column blocks: T = 3, π = 1; T = 3, π = 3; T = 6, π = 1; T = 6, π = 3, each with columns 0.4, 0.5, 0.6, 0.7, 0.8.]

Table 2.8: Design 2. Rejection frequencies for two-sided t-tests for φ11. True value φ11 = 0.4.

[Table body: rows TMLE(r), TMLEc(r), TMLEs(r), TMLEr(r), AB-GMM2(W) and Sys-GMM2(W). Panels: N = 100 and N = 250. Column blocks: T = 3, π = 1; T = 3, π = 3; T = 6, π = 1; T = 6, π = 3, each with columns 0.2, 0.3, 0.4, 0.5, 0.6.]


Table 2.9: Design 3. Rejection frequencies for two-sided t-tests for φ11. True value φ11 = 0.4.

[Table body: rows TMLE(r), TMLEc(r), TMLEs(r), TMLEr(r), AB-GMM2(W) and Sys-GMM2(W). Panels: N = 100 and N = 250. Column blocks: T = 3, π = 1; T = 3, π = 3; T = 6, π = 1; T = 6, π = 3, each with columns 0.2, 0.3, 0.4, 0.5, 0.6.]

Table 2.10: Design 4. Rejection frequencies for two-sided t-tests for φ11. True value φ11 = 0.4.

[Table body: rows TMLE(r), TMLEc(r), TMLEs(r), TMLEr(r), AB-GMM2(W) and Sys-GMM2(W). Panels: N = 100 and N = 250. Column blocks: T = 3, π = 1; T = 3, π = 3; T = 6, π = 1; T = 6, π = 3, each with columns 0.2, 0.3, 0.4, 0.5, 0.6.]


Table 2.11: Design 5. Rejection frequencies for two-sided t-tests for φ11. True value φ11 = 0.4.

[Table body: rows TMLE(r), TMLEc(r), TMLEs(r), TMLEr(r), AB-GMM2(W) and Sys-GMM2(W). Panels: N = 100 and N = 250. Column blocks: T = 3, π = 1; T = 3, π = 3; T = 6, π = 1; T = 6, π = 3, each with columns 0.2, 0.3, 0.4, 0.5, 0.6.]

Table 2.12: Design 6. Rejection frequencies for two-sided t-tests for φ11. True value φ11 = 0.4.

[Table body: rows TMLE(r), TMLEc(r), TMLEs(r), TMLEr(r), AB-GMM2(W) and Sys-GMM2(W). Panels: N = 100 and N = 250. Column blocks: T = 3, π = 1; T = 3, π = 3; T = 6, π = 1; T = 6, π = 3, each with columns 0.2, 0.3, 0.4, 0.5, 0.6.]

Chapter 3

On Maximum Likelihood Estimation of Dynamic Panel Data Models

3.1 Introduction

Dynamic panel data models have a prominent place in applied research and at the same time form a challenging field in econometric theory. Many panel data applications have a relatively small number of time periods T, whereas the cross-sectional dimension N is sizeable. It is therefore common to consider the semi-asymptotic behavior of estimators and corresponding test statistics with T fixed and only N tending to infinity.

A central theme in linear dynamic panel data analysis is the fact that the Fixed Effects (FE) estimator is inconsistent for fixed T and large N. This inconsistency is referred to as the Nickell (1981) bias, and is an example of the incidental parameters problem. It has therefore become common practice to estimate the parameters of dynamic panel data models by the Generalized Method of Moments (GMM), see Arellano and Bond (1991) and Blundell and Bond (1998). A main reason for using GMM is that it provides asymptotically efficient inference exploiting a minimal set of statistical assumptions.
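The inconsistency of the FE estimator for fixed T can be illustrated with a small simulation. The sketch below is only an illustration under assumptions that are not taken from the text: standard normal effects and errors, a stationary start obtained through a burn-in, and hypothetical function names (`simulate_panel_ar1`, `fixed_effects_estimate`). With a large N and T = 3 the within estimate stays far below the true autoregressive parameter.

```python
import numpy as np


def simulate_panel_ar1(n, t, phi, rng, burn_in=50):
    """Simulate y_{i,t} = eta_i + phi * y_{i,t-1} + eps_{i,t}.

    Returns an (n, t + 1) array whose first column holds y_{i,0}.
    Effects and errors are standard normal (an illustrative assumption).
    """
    eta = rng.standard_normal(n)
    y = eta / (1.0 - phi)  # start at the individual-specific long-run mean
    for _ in range(burn_in):  # burn-in so that y_{i,0} is close to stationary
        y = eta + phi * y + rng.standard_normal(n)
    out = np.empty((n, t + 1))
    out[:, 0] = y
    for s in range(1, t + 1):
        out[:, s] = eta + phi * out[:, s - 1] + rng.standard_normal(n)
    return out


def fixed_effects_estimate(y):
    """Within (FE) estimator of phi: OLS after removing individual means."""
    y_lag, y_cur = y[:, :-1], y[:, 1:]
    x = y_lag - y_lag.mean(axis=1, keepdims=True)
    z = y_cur - y_cur.mean(axis=1, keepdims=True)
    return float((x * z).sum() / (x * x).sum())


rng = np.random.default_rng(0)
phi_true = 0.5
phi_hat = fixed_effects_estimate(simulate_panel_ar1(20000, 3, phi_true, rng))
print(f"true phi = {phi_true}, FE estimate with T = 3: {phi_hat:.3f}")
```

Increasing `n` does not move the estimate towards the true value, whereas increasing `t` does, which is the fixed-T, large-N setting described above.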

GMM inference has not been without its own problems, however. These include small sample biases in both coefficient and variance estimators, sensitivity to important nuisance parameters and choices regarding the type and number of moment conditions. A large literature has been devoted to adapting the GMM approach to limit the impact of these inherent drawbacks, see Bun and Sarafidis (2015) for a recent overview.

This in turn has led to an interest in likelihood-based methods that correct for the incidental parameters problem. Some of these methods are based on modifications of the profile likelihood, see Lancaster (2002) and Dhaene and Jochmans (2015). Other methods start from the likelihood function of the first differences, see Hsiao et al. (2002) and Binder et al. (2005). Essentially these methods treat the incidental parameters as fixed in estimation. The alternative approach is to assume random effects, but in dynamic models it is then necessary to be explicit about the non-zero correlation between individual specific effects and initial conditions (Anderson and Hsiao (1982), Alvarez and Arellano (2003), Moral-Benito (2012) and Hsiao and Zhang (2015)). Random effects type ML estimators therefore typically exploit Chamberlain (1982) type of projections to model the dependence between individual specific effects, initial observations and additional covariates.

In this study we consider the Transformed Maximum Likelihood approach (TML) as in Hsiao et al. (2002) and the Random effects Maximum Likelihood estimator (RML) as in Alvarez and Arellano (2003).¹ There is a close connection between TML and RML in the sense that TML can be expressed as a restricted version of RML. Under suitable regularity conditions ML estimators are consistent and asymptotically normally distributed. Monte Carlo evidence provided in both studies suggests that these likelihood-based approaches can serve as viable alternatives to the usual GMM estimators. Just like for GMM, however, the application of ML estimators is not without its own problems.

In this study we address two important issues when implementing ML for dynamic panel data models. First, we show that in the simple setup without time-series heteroscedasticity both the TML and RML estimators give rise to a cubic first-order condition in the autoregressive parameter. We therefore have either one or three solutions to the first-order conditions. As a result even asymptotically the log-likelihood function can be bimodal. This result is different from Kruiniger (2008) and Han and Phillips (2013), who find a quartic equation assuming covariance stationarity.
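The statement that a cubic equation has either one or three real solutions can be checked numerically. The sketch below works with a generic cubic a·x³ + b·x² + c·x + d = 0; the coefficients in the example are hypothetical and are not the coefficients of the first-order condition derived in this chapter. The sign of the discriminant separates the two cases (repeated roots at a zero discriminant are ignored here), and `numpy.roots` returns the roots themselves.

```python
import numpy as np


def cubic_discriminant(a, b, c, d):
    """Discriminant of a*x**3 + b*x**2 + c*x + d.

    Positive: three distinct real roots. Negative: one real root.
    """
    return (18 * a * b * c * d - 4 * b**3 * d + b**2 * c**2
            - 4 * a * c**3 - 27 * a**2 * d**2)


def real_roots_of_cubic(a, b, c, d, tol=1e-9):
    """Real roots of the cubic, sorted in ascending order."""
    roots = np.roots([a, b, c, d])
    real = roots[np.abs(roots.imag) < tol].real
    return np.sort(real)


# Hypothetical examples: x^3 - x has three real roots, x^3 + x has one.
for coeffs in [(1, 0, -1, 0), (1, 0, 1, 0)]:
    print(coeffs, cubic_discriminant(*coeffs), real_roots_of_cubic(*coeffs))
```

When three real roots exist, the smallest and largest correspond to the "left" and "right" solutions in the terminology used later in this chapter; which of them is the global maximizer has to be decided by evaluating the log-likelihood itself.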

Second, because both TML and RML can be seen as random effects ML estimators, we address the issue of negative variance estimates as mentioned in Maddala (1971), Alvarez and Arellano (2003) and Han and Phillips (2013). An important consequence of bimodality is that unconstrained maximization of the log-likelihood may lead to ML estimates which do not satisfy the implicit restriction of non-negative variances. Enforcing this non-negativity constraint may furthermore lead to a boundary solution (Maddala (1971), Alvarez and Arellano (2003)).

¹ Because the former is derived conditional on the initial observations and individual specific effects, it is also referred to as fixed effects ML in Kruiniger (2013).

We further investigate the impact of multiple roots and boundary conditions in finite samples in a Monte Carlo study. We consider finite sample bias and RMSE of coefficient estimators as well as size and power of corresponding t and LR statistics. We find that, despite the robustness of TML and RML to initial conditions, the finite sample properties of both estimators for small values of T depend heavily on the initial condition. A partial explanation is that the behavior of the initial condition has a direct effect on the bimodality of the log-likelihood function. We find that when there are three solutions to the first-order condition, the left solution always satisfies the non-negativity restriction, while the right solution violates it. Estimators taking into account non-negativity constraints perform much better than their unconstrained counterparts. Furthermore, we find that inference based on the LR statistic is size correct, while t statistics show large size distortions. Using the dataset in Bun and Carree (2005) we show how these theoretical results can influence empirical estimates of U.S. state level unemployment dynamics.
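The performance measures mentioned above (bias, RMSE and rejection frequencies of t-tests) are simple functions of the Monte Carlo replications. The following sketch shows one way to compute them; the function name `monte_carlo_summary`, its arguments and the normal critical value 1.96 are illustrative assumptions and not the code used for the study.

```python
import numpy as np


def monte_carlo_summary(estimates, std_errors, true_value, null_value,
                        critical_value=1.96):
    """Bias, RMSE and two-sided t-test rejection frequency over replications.

    With null_value equal to true_value the rejection frequency estimates
    the size of the test; with any other null_value it estimates power.
    """
    estimates = np.asarray(estimates, dtype=float)
    std_errors = np.asarray(std_errors, dtype=float)
    errors = estimates - true_value
    t_stats = (estimates - null_value) / std_errors
    return {
        "bias": float(errors.mean()),
        "rmse": float(np.sqrt((errors**2).mean())),
        "rejection_frequency": float((np.abs(t_stats) > critical_value).mean()),
    }


# Toy input with two hypothetical replications.
summary = monte_carlo_summary([0.4, 0.6], [0.05, 0.05],
                              true_value=0.5, null_value=0.5)
print(summary)
```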

Throughout the analysis we limit ourselves to an asymptotic analysis in which T is fixed and N → ∞. When T is large the influence of initial conditions becomes negligible, hence our main results become less relevant. For results with T large, see e.g. Bai (2013a). Furthermore, we do not analyze the unit root case. Distribution theory becomes rather different in this case, see Ahn and Thomas (2006) and Kruiniger (2013).

The plan of this study is as follows. In Section 3.2 we introduce the Maximum Likelihood estimators for the panel AR(1) model including the cubic first-order condition for the autoregressive parameter. Section 3.3 deals with the possibility of multiple solutions and proposes bounded estimation as a solution. Section 3.4 contains the extension to dynamic models with additional covariates. Section 3.5 reports the results from the Monte Carlo study, while Section 3.6 shows the empirical results. Section 3.7 concludes.


3.2 ML estimation for the panel AR(1) model

We consider the following simple AR(1) specification without exogenous regressors:²

$$ y_{i,t} = \eta_i + \phi y_{i,t-1} + \varepsilon_{i,t}, \qquad E[\varepsilon_{i,t} \mid y_{i,0}, \eta_i] = 0, \tag{3.1} $$

for $i = 1, \dots, N$, $t = 1, \dots, T$. We assume that the idiosyncratic errors $\varepsilon_{i,t}$ are i.i.d. $(0, \sigma^2)$³ and that the initial conditions $y_{i,0}$ are observed. Stacking the observations over time, we can write the AR(1) model for each individual as

$$ y_i = \phi y_{i-} + \imath_T \eta_i + \varepsilon_i, \qquad \varepsilon_i = (\varepsilon_{i,1}, \dots, \varepsilon_{i,T})', \tag{3.2} $$

with $y_i$ and $y_{i-}$ defined accordingly and $\imath_T$ a vector of ones. We follow Kruiniger (2013) to derive the log-likelihood function(s), but the final results are identical to those in Hsiao et al. (2002), Alvarez and Arellano (2003) or Binder et al. (2005). As discussed in, e.g., Kruiniger (2013), the crucial assumptions for consistency, asymptotic normality and feasible inference of the likelihood estimation can be summarized in the following two assumptions.

Assumption TML: $\xi_{i,0} \equiv y_{i,0} - \eta_i/(1 - \phi_0)$ are i.i.d. with finite second moments, $\forall i = 1, \dots, N$.

Assumption RML: $y_{i,0}$ and $\eta_i$ are i.i.d. with finite second moments, $\forall i = 1, \dots, N$.

Remark 3.1. Note that Assumption TML is somewhat less restrictive than RML,

because one has to impose distributional assumptions only on a particular linear com-

bination of yi,0 and ηi. Assumption RML, on the other hand, imposes restrictions on

the whole joint distribution of yi,0 and ηi.

To derive the corresponding estimating equations for both estimators, it is convenient to use the Chamberlain (1982) type of projection for $\eta_i$:⁴

$$ \eta_i = \pi y_{i,0} + v_i, \qquad E[v_i y_{i,0}] = 0, \qquad v_i \sim \text{i.i.d.}(0, \sigma_v^2). \tag{3.3} $$

² Time-specific effects can be accommodated by taking the variables in deviations from the cross-sectional mean.

³ Assuming heteroscedasticity over i does not violate consistency of the resulting estimators and has no effect on the results of this paper.

⁴ For simplicity we do not include a constant term in the projection as it would serve as a restricted time effect.


Note that this projection is only necessary for the RML estimator, while the i.i.d. assumption is only imposed for simplicity as the discussed estimators remain consistent even if $v_i$ is heteroscedastic. When we set $\pi = 1 - \phi$ the projection corresponds exactly to the TML framework, as in this case $v_i = -(1 - \phi_0)\xi_{i,0}$. In the TML approach the variance of $v_i = \Delta y_{i,1} - \varepsilon_{i,1}$ is a parameter to be estimated.⁵ We therefore only exploit the projection in (3.3) to show the algebraic comparison between RML and TML. The model can be represented as:⁶

$$ R y_i = (e_1 \phi + \imath_T \pi) y_{i,0} + (\imath_T v_i + \varepsilon_i), \tag{3.4} $$

where $R = I_T - L_T \phi$ and $e_1$ is the first column of the $I_T$ matrix.⁷ Define the combined error term as

$$ u_i \equiv \imath_T v_i + \varepsilon_i, \tag{3.5} $$

then it follows under our assumptions that

$$ E[u_i] = 0_T, \qquad \operatorname{var}[u_i] = \Sigma = \sigma_v^2 \imath_T \imath_T' + \sigma^2 I_T. \tag{3.6} $$

The variance-covariance structure of $\Sigma$ is of the usual “correlated” random effects form. Using the matrix inversion and determinant lemmas, we obtain

$$ \Sigma^{-1} = \frac{1}{\sigma^2} I_T - \frac{1}{\sigma^2} \frac{\sigma_v^2}{\sigma^2 + T\sigma_v^2} \imath_T \imath_T', \qquad |\Sigma| = (\sigma^2)^{T-1} (\sigma^2 + T\sigma_v^2). \tag{3.7} $$

Denote by $W_T \equiv I_T - \frac{1}{T} \imath_T \imath_T'$ the usual fixed effects projection matrix, then we can write

$$ \Sigma^{-1} = \frac{1}{\sigma^2} W_T + \frac{1}{T\theta^2} \imath_T \imath_T', \qquad |\Sigma| = (\sigma^2)^{T-1} \theta^2, \tag{3.8} $$

$$ \theta^2 \equiv \sigma^2 + T\sigma_v^2. \tag{3.9} $$

Hence, instead of estimating $\sigma^2$ and $\sigma_v^2$ we rather estimate $\sigma^2$ and $\theta^2$. By doing so we do not restrict $\sigma_v^2$ to be positive. This parametrization, in one form or the other, has been used in Hsiao et al. (2002), Alvarez and Arellano (2003), Ahn and Thomas (2006), and Kruiniger (2008) inter alia.
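The closed-form inverse in (3.8) can be checked numerically. The following is a minimal sketch in plain Python; the function names and the parameter values are illustrative assumptions, not part of the original derivation. It deliberately uses a negative value of $\sigma_v^2$ to show that the formula only requires $\theta^2 > 0$.

```python
def sigma_matrix(T, sigma2, sigma2_v):
    """Sigma = sigma2_v * (ones ones') + sigma2 * I_T, as in (3.6)."""
    return [[sigma2_v + (sigma2 if r == c else 0.0) for c in range(T)]
            for r in range(T)]

def sigma_inverse(T, sigma2, sigma2_v):
    """Closed-form inverse from (3.8): W_T / sigma2 + (ones ones') / (T * theta2)."""
    theta2 = sigma2 + T * sigma2_v
    inv = []
    for r in range(T):
        row = []
        for c in range(T):
            w_rc = (1.0 if r == c else 0.0) - 1.0 / T  # element of W_T
            row.append(w_rc / sigma2 + 1.0 / (T * theta2))
        inv.append(row)
    return inv

def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[r][k] * b[k][c] for k in range(len(b)))
             for c in range(len(b[0]))] for r in range(len(a))]

# Illustrative values (assumed): theta2 = 1.5 - 4 * 0.2 = 0.7 > 0.
T, s2, s2v = 4, 1.5, -0.2
product = matmul(sigma_matrix(T, s2, s2v), sigma_inverse(T, s2, s2v))
max_err = max(abs(product[r][c] - (1.0 if r == c else 0.0))
              for r in range(T) for c in range(T))
```

Here `max_err` measures how far the product is from the identity matrix; it should be at the level of floating-point rounding.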

⁵ Alternatively, following Hsiao et al. (2002) one can estimate $\omega \equiv 1 + \sigma_v^2/\sigma^2 = \operatorname{var}(\Delta y_{i,1})/\sigma^2$.

⁶ Bai (2013a) considers a similar conditional maximum likelihood estimator with a possible factor structure in the error term $\varepsilon_i$.

⁷ The lag-operator matrix $L_T$ is defined such that for any $[T \times 1]$ vector $x = (x_1, \dots, x_T)'$, $L_T x = (0, x_1, \dots, x_{T-1})'$.


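Now we define the quasi log-likelihood function⁸ for some individual $i$ (up to a constant)

$$ -2\ell_i(\kappa) = (T-1)\log(\sigma^2) + \log(\theta^2) + (y_i - \phi y_{i-} - \imath_T \pi y_{i,0})' \Sigma^{-1} (y_i - \phi y_{i-} - \imath_T \pi y_{i,0}), \tag{3.10} $$

where $\kappa = (\phi, \pi, \sigma^2, \theta^2)'$. This function is the true likelihood function if $u_i$ are jointly normal.⁹ Now using the fact that $W_T \imath_T = 0_T$, we can write

$$ -2\ell_i(\kappa) = (T-1)\log(\sigma^2) + \log(\theta^2) + \frac{1}{\sigma^2}\left[(y_i - \phi y_{i-})' W_T (y_i - \phi y_{i-})\right] + \frac{T}{\theta^2}(\bar{y}_i - \phi \bar{y}_{i-} - \pi y_{i,0})^2, \tag{3.11} $$

where $\bar{y}_i \equiv (1/T)\sum_{t=1}^{T} y_{i,t}$ and $\bar{y}_{i-} \equiv (1/T)\sum_{t=1}^{T} y_{i,t-1}$. Furthermore, we define $\tilde{y}_{i,t} \equiv y_{i,t} - \bar{y}_i$, $\tilde{y}_{i,t-1} \equiv y_{i,t-1} - \bar{y}_{i-}$, $\ddot{y}_i \equiv \bar{y}_i - y_{i,0}$, $\ddot{y}_{i-} \equiv \bar{y}_{i-} - y_{i,0}$ and $\rho \equiv \pi - (1 - \phi)$. This implies the following final expression for the log-likelihood function (after summing over all individual log-likelihood functions)

$$ -\frac{2}{N}\ell(\kappa) = (T-1)\log(\sigma^2) + \log(\theta^2) + \frac{1}{N\sigma^2}\sum_{i=1}^{N}\sum_{t=1}^{T}(\tilde{y}_{i,t} - \phi \tilde{y}_{i,t-1})^2 + \frac{T}{N\theta^2}\sum_{i=1}^{N}(\ddot{y}_i - \phi \ddot{y}_{i-} - \rho y_{i,0})^2. \tag{3.12} $$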

The log-likelihood function for observations in first-differences (also known as Trans-

formed log-likelihood in Hsiao et al. (2002) and Juodis (2014b)) is obtained by setting

ρ = 0. Thus both RML and TML estimators provide an in-built bias-correction term

for the usual fixed effects log-likelihood function.

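The parameters $\sigma^2$, $\theta^2$ (and $\rho$ for RML) can be concentrated out, resulting in (up to a constant)

$$ \ell_c(\phi) = -\frac{N}{2}\left((T-1)\log\hat{\sigma}^2(\phi) + \log\hat{\theta}^2(\phi)\right), \tag{3.13} $$

where

$$ \hat{\sigma}^2(\phi) = \frac{1}{N(T-1)}\sum_{i=1}^{N}\sum_{t=1}^{T}(\tilde{y}_{i,t} - \phi \tilde{y}_{i,t-1})^2, \qquad \hat{\theta}^2(\phi) = \frac{T}{N}\sum_{i=1}^{N}(\dddot{y}_i - \phi \dddot{y}_{i-})^2. \tag{3.14} $$

⁸ Or, following Bai (2013a), a distance function between population and sample covariance matrices.

⁹ This parametrization ensures that $\theta^2 > 0$ or equivalently $\omega > 1 - \frac{1}{T}$ as in Hsiao et al. (2002).

Here for the random effects log-likelihood function we defined $\dddot{y}_i$ and $\dddot{y}_{i-}$ as

$$ \dddot{y}_i \equiv \ddot{y}_i - y_{i,0}\frac{\sum_{i=1}^{N}\ddot{y}_i y_{i,0}}{\sum_{i=1}^{N} y_{i,0}^2}, \qquad \dddot{y}_{i-} \equiv \ddot{y}_{i-} - y_{i,0}\frac{\sum_{i=1}^{N}\ddot{y}_{i-} y_{i,0}}{\sum_{i=1}^{N} y_{i,0}^2}, \tag{3.15} $$

while for the log-likelihood in first differences $\dddot{y}_i \equiv \ddot{y}_i$ and $\dddot{y}_{i-} \equiv \ddot{y}_{i-}$.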
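As an illustration of (3.13) and (3.14) in the first-difference (TML) case, the concentrated log-likelihood can be evaluated directly from the data. The sketch below is an assumption-laden example in plain Python: the function name and the data layout (one list per individual, starting with the initial observation) are hypothetical choices made for this illustration.

```python
import math

def tml_concentrated_loglik(y, phi):
    """Evaluate (3.13)-(3.14) for TML (rho = 0).

    y[i] = [y_i0, y_i1, ..., y_iT] for individual i; requires T >= 2.
    Returns (loglik, sigma2_hat, theta2_hat), with loglik up to a constant.
    """
    n, T = len(y), len(y[0]) - 1
    within = 0.0   # sum of squared within-transformed residuals
    between = 0.0  # sum of squared residuals in deviations from y_i0
    for yi in y:
        ybar = sum(yi[1:]) / T    # mean of y_i1..y_iT
        ylag = sum(yi[:-1]) / T   # mean of y_i0..y_i,T-1
        for t in range(1, T + 1):
            within += ((yi[t] - ybar) - phi * (yi[t - 1] - ylag)) ** 2
        between += ((ybar - yi[0]) - phi * (ylag - yi[0])) ** 2
    sigma2 = within / (n * (T - 1))
    theta2 = T * between / n
    loglik = -0.5 * n * ((T - 1) * math.log(sigma2) + math.log(theta2))
    return loglik, sigma2, theta2

# Small made-up three-wave panel (T = 2, N = 3).
panel = [[0.0, 1.0, 1.5], [0.0, 2.0, 3.0], [0.0, -1.0, -0.5]]
ll, s2, th2 = tml_concentrated_loglik(panel, 0.5)
```

For the RML version one would first replace the deviations from $y_{i,0}$ by the projection residuals in (3.15); that step is omitted here.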

The likelihood function in (3.13) is defined for all values of φ ∈ R, hence from a theoretical

and computational point of view there is no reason to consider a restricted parameter

space for estimation. Nevertheless, some studies (Hsiao et al. (2002); Hayakawa and

Pesaran (2015)) restrict φ ∈ (−1; 1). This may have consequences for the finite sample

properties of the resulting estimators, as we shall see below. Furthermore, the fact

that the likelihood function is defined over the whole real line, is in contrast with the

likelihood function in Kruiniger (2008) and Han and Phillips (2013). In these studies

stationarity has been assumed, hence the likelihood function is naturally defined only

for −1 < φ ≤ 1.

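The FOC (first order condition) for the autoregressive parameter $\phi$ can now be expressed in the following way

$$ \frac{d\ell_c(\phi)}{d\phi} = \frac{1}{\hat{\sigma}^2(\phi)}\sum_{i=1}^{N}\sum_{t=1}^{T}\tilde{y}_{i,t-1}(\tilde{y}_{i,t} - \phi \tilde{y}_{i,t-1}) + \frac{T}{\hat{\theta}^2(\phi)}\sum_{i=1}^{N}\dddot{y}_{i-}(\dddot{y}_i - \phi \dddot{y}_{i-}) = 0, \tag{3.16} $$

or alternatively

$$ \hat{\theta}^2(\phi)\sum_{i=1}^{N}\sum_{t=1}^{T}\tilde{y}_{i,t-1}(\tilde{y}_{i,t} - \phi \tilde{y}_{i,t-1}) + \hat{\sigma}^2(\phi)\, T\sum_{i=1}^{N}\dddot{y}_{i-}(\dddot{y}_i - \phi \dddot{y}_{i-}) = 0. \tag{3.17} $$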

Given that $\hat{\sigma}^2(\phi)$ and $\hat{\theta}^2(\phi)$ are quadratic in $\phi$ it is not difficult to see that the FOC is cubic in $\phi$. Thus for any value of $T$ and any realization of $\{y_i\}_{i=1}^{N}$ there will be at least one and at most three solutions to (3.17). For a general value of $T$ there is no easy formula for the solutions, but in the next section we will obtain interesting analytical results for three-wave panels. In any case the solutions of the cubic equation can be found without any need for explicit numerical maximization. One can simply use root-finding algorithms based on the eigenvalues of the companion matrix.
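To make the cubic structure of (3.17) concrete, the sketch below builds the four polynomial coefficients from sample moments and returns the real roots for the TML case. It is only an illustration under stated assumptions: all function names and the data layout are hypothetical, and, to stay free of external dependencies, the roots are obtained with the closed-form (Cardano and trigonometric) solution of the cubic rather than with the companion-matrix eigenvalue approach mentioned in the text.

```python
import math

def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists (lowest degree first)."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out

def real_cubic_roots(c0, c1, c2, c3):
    """Sorted real roots of c3*x^3 + c2*x^2 + c1*x + c0 = 0, assuming c3 != 0."""
    a, b, c = c2 / c3, c1 / c3, c0 / c3
    p = b - a * a / 3.0                        # depressed cubic t^3 + p*t + q = 0
    q = 2.0 * a ** 3 / 27.0 - a * b / 3.0 + c
    shift = -a / 3.0                           # x = t + shift
    disc = (q / 2.0) ** 2 + (p / 3.0) ** 3
    if disc > 0.0:                             # exactly one real root (Cardano)
        s = math.sqrt(disc)
        cbrt = lambda v: math.copysign(abs(v) ** (1.0 / 3.0), v)
        return [cbrt(-q / 2.0 + s) + cbrt(-q / 2.0 - s) + shift]
    if p == 0.0:                               # triple root
        return [shift]
    r = 2.0 * math.sqrt(-p / 3.0)              # three real roots (trigonometric form)
    angle = math.acos(max(-1.0, min(1.0, 3.0 * q / (p * r)))) / 3.0
    return sorted(r * math.cos(angle - 2.0 * math.pi * k / 3.0) + shift
                  for k in range(3))

def tml_foc_roots(y):
    """Real solutions of the cubic FOC (3.17) for TML; y[i] = [y_i0, ..., y_iT]."""
    n, T = len(y), len(y[0]) - 1
    a = b = c = 0.0      # within moments
    ad = bd = cd = 0.0   # quasi-between moments (deviations from y_i0)
    for yi in y:
        ybar, ylag = sum(yi[1:]) / T, sum(yi[:-1]) / T
        for t in range(1, T + 1):
            u, l = yi[t] - ybar, yi[t - 1] - ylag
            a += l * l / n; b += u * l / n; c += u * u / n
        u, l = ybar - yi[0], ylag - yi[0]
        ad += T * l * l / n; bd += T * u * l / n; cd += T * u * u / n
    sigma2 = [c / (T - 1), -2.0 * b / (T - 1), a / (T - 1)]  # sigma2_hat(phi)
    theta2 = [cd, -2.0 * bd, ad]                             # theta2_hat(phi)
    first = poly_mul(theta2, [b, -a])     # theta2_hat * within score
    second = poly_mul(sigma2, [bd, -ad])  # sigma2_hat * quasi-between score
    return real_cubic_roots(*[f + s for f, s in zip(first, second)])

panel = [[0.0, 1.0, 1.5], [0.0, 2.0, 3.0], [0.0, -1.0, -0.5]]
roots = tml_foc_roots(panel)
```

For this made-up three-wave panel the FOC has three real solutions; which of them is a local maximum still has to be checked by evaluating the concentrated log-likelihood.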

The possibility of multiple solutions is mostly overlooked when discussing both likelihood estimators. An exception is Hayakawa and Pesaran (2015), who observe that the first-order condition of the TML log-likelihood function can have more than one solution asymptotically. Here we show that this is also possible in finite samples, and that


both the TML and RML log-likelihood functions have one or two local maxima. More

importantly, this result is unaffected if strictly exogenous regressors are added to the

model as in Section 3.4.

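Given the structure of the log-likelihood function we can easily specify the interval for all solutions $\hat{\phi}$. In particular, we have

Lemma 3.1. For any $N$ and $T$ all solutions of (3.17) lie in the following interval

$$ \hat{\phi} \in (\hat{\phi}_W, \hat{\phi}_B) = \left(\frac{\sum_{i=1}^{N}\sum_{t=1}^{T}\tilde{y}_{i,t}\tilde{y}_{i,t-1}}{\sum_{i=1}^{N}\sum_{t=1}^{T}\tilde{y}_{i,t-1}^2}, \; \frac{\sum_{i=1}^{N}\dddot{y}_i \dddot{y}_{i-}}{\sum_{i=1}^{N}\dddot{y}_{i-}^2}\right). \tag{3.18} $$

Furthermore, this result continues to hold if $N \to \infty$.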

The proof of this lemma follows directly from the fact that the log-likelihood function

is a sum of two quasi-concave functions with different maxima. The lower bound of

this interval is the fixed effects ML estimator (also known as Within Group or LSDV

estimator). The upper bound can be interpreted as a quasi-between estimator. It is

well known that $\operatorname{plim}_{N,T\to\infty}\hat{\phi}_W = \phi$, but that for fixed $T$ the within estimator has a negative bias, see Nickell (1981). Furthermore, it is straightforward to show that $\operatorname{plim}_{N,T\to\infty}\hat{\phi}_B = 1$ because $\dddot{y}_i$ and $\dddot{y}_{i-}$ converge to the same value as $T$ goes to infinity.

Next, we investigate the asymptotic behavior of the interval in Lemma 3.1. As the

lower bound (φW ) is the same for both RML and TML estimators we are primarily

interested in the upper bound (φB), which is different between estimators. The result

is summarized in the following Proposition.

Proposition 3.2. The probability limits of the quasi-between estimators from Lemma 3.1 satisfy

$$ \operatorname{plim}_{N\to\infty}\hat{\phi}_B^{RML} \le \operatorname{plim}_{N\to\infty}\hat{\phi}_B^{TML}. \tag{3.19} $$

Proof. In Appendix 3.A.

Thus the upper bound for RML is no larger than for TML. The interval for possible

values for φRML is narrower than the corresponding interval for TML. This result can

be expected given that the RML estimator is found to be more efficient than TML, see

Kruiniger (2013).


3.3 Multiple solutions and constrained estimation

The possibility of having one or three solutions to the cubic equation (3.17) has impor-

tant consequences. We first characterize the solutions for the case in which analytical

results can be derived, that of three-wave panels and TML. We then proceed to the

case of general T . Finally, we discuss a procedure of bounded estimation.

3.3.1 Three-wave panel and the Transformed ML estimator

For general values of T we can only specify in which interval the solutions of the cubic

equation lie, as described in Lemma 3.1. For T = 2 and the Transformed log-likelihood

function (i.e. ρ = 0), this result can be sharpened and a simple analytic expression for

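the ML estimator can be derived. Observe that for $T = 2$ we have for $\hat{\sigma}^2(\phi)$ and $\hat{\theta}^2(\phi)$ as defined in (3.14)

$$ \hat{\sigma}^2(\phi) = \frac{1}{2N}\sum_{i=1}^{N}(\Delta y_{i,2} - \phi\Delta y_{i,1})^2, \qquad \hat{\theta}^2(\phi) = \frac{1}{2N}\sum_{i=1}^{N}(\Delta y_{i,2} - (\phi - 2)\Delta y_{i,1})^2. \tag{3.20} $$

We then have the following expression for the TML log-likelihood function.

Proposition 3.3. For $T = 2$ the log-likelihood function for the TML estimator is given by

$$ \ell_c(\phi) = -\frac{N}{2}\log\left(\left(\hat{\sigma}^2(\phi) + \left(\frac{1}{N}\sum_{i=1}^{N}\Delta y_{i,1}\Delta y_{i,2} - \phi\frac{1}{N}\sum_{i=1}^{N}(\Delta y_{i,1})^2\right)\right)^2 + d\right), \tag{3.21} $$

where $d$ does not depend on $\phi$ but only on data.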



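The polynomial inside the log(·) expression in Proposition 3.3 is symmetric around the point $\phi = \hat{\phi}_W + 1$. The FOC is

$$ \frac{1}{2\hat{\sigma}^2\hat{\theta}^2}\left(\sum_{i=1}^{N}(\Delta y_{i,2} - \phi\Delta y_{i,1})(\Delta y_{i,2} - (\phi - 2)\Delta y_{i,1})\right) \times \left(\sum_{i=1}^{N}(\Delta y_{i,2} - (\phi - 1)\Delta y_{i,1})\Delta y_{i,1}\right) = 0. $$

The solutions are given by $\bar{\phi} = \hat{\phi}_W + 1$ and, for $D > 0$,

$$ \hat{\phi}^{(l)} = \bar{\phi} - \sqrt{D}, \qquad \hat{\phi}^{(r)} = \bar{\phi} + \sqrt{D}, $$

where $D \equiv 1 + \hat{\phi}_W^2 - \frac{\sum_{i=1}^{N}(\Delta y_{i,2})^2}{\sum_{i=1}^{N}(\Delta y_{i,1})^2}$ is the discriminant of the quadratic part of the score.

The first derivative of the concentrated likelihood consists of a linear and a quadratic part. The latter implies either zero or two more solutions for $\phi$ on top of the intermediate case $\bar{\phi} = \hat{\phi}_W + 1$. Furthermore, setting this quadratic part equal to zero can be recognized as the FOC of the bias corrected FE estimator as in Bun and Carree (2005). Consistency¹⁰ of $\hat{\phi}^{(l)}$ follows directly from $\operatorname{plim}_{N\to\infty}\hat{\phi}_W = \phi_0 - \frac{\sigma_0^2}{\operatorname{var}(\Delta y_{i,1})}$ and $\operatorname{plim}_{N\to\infty} D = \left(1 - \frac{\sigma_0^2}{\operatorname{var}(\Delta y_{i,1})}\right)^2$. The solutions $\bar{\phi}$ and $\hat{\phi}^{(r)}$ are inconsistent unless $\operatorname{var}(\Delta y_{i,1}) = \sigma_0^2$, i.e. $\sigma_v^2 = 0$.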

The relationships between the solutions to the FOC can be further summarized as

follows

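Corollary 3.4. For $T = 2$ and TML the following holds

$$ \ell_c(\hat{\phi}^{(l)}) = \ell_c(\hat{\phi}^{(r)}), \tag{3.22} $$

$$ \hat{\sigma}_v^2(\hat{\phi}^{(l)}) > 0 > \hat{\sigma}_v^2(\hat{\phi}^{(r)}), \tag{3.23} $$

$$ \hat{\theta}^2(\bar{\phi}) = \hat{\sigma}^2(\bar{\phi}), \tag{3.24} $$

$$ \hat{\phi}_B = \hat{\phi}_W + 2, \tag{3.25} $$

where $\bar{\phi} = \hat{\phi}_W + 1$ is the middle solution. However, this result does not hold in general for RML and/or $T > 2$.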
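The three-wave results can be verified on a small data set. The following plain-Python sketch computes the closed-form TML solutions for $T = 2$ from first differences and checks the first three statements of Corollary 3.4 numerically. The function names and the data are illustrative assumptions; the variance estimate $\hat{\sigma}_v^2$ is recovered from $\theta^2 = \sigma^2 + T\sigma_v^2$ with $T = 2$.

```python
import math

def t2_moments(dy1, dy2):
    """Sample second moments of the first differences (dy1 = Δy_i1, dy2 = Δy_i2)."""
    n = len(dy1)
    m11 = sum(a * a for a in dy1) / n
    m12 = sum(a * b for a, b in zip(dy1, dy2)) / n
    m22 = sum(b * b for b in dy2) / n
    return m11, m12, m22

def t2_tml_solutions(dy1, dy2):
    """Solutions of the FOC for T = 2: [middle] or [left, middle, right]."""
    m11, m12, m22 = t2_moments(dy1, dy2)
    phi_w = m12 / m11                        # within (FE) estimator
    middle = phi_w + 1.0
    disc = 1.0 + phi_w ** 2 - m22 / m11      # discriminant D
    if disc <= 0.0:
        return [middle]
    root = math.sqrt(disc)
    return [middle - root, middle, middle + root]

def t2_variances(dy1, dy2, phi):
    """sigma2_hat, theta2_hat from (3.20) and the implied sigma2_v_hat."""
    m11, m12, m22 = t2_moments(dy1, dy2)
    sigma2 = (m22 - 2.0 * phi * m12 + phi ** 2 * m11) / 2.0
    theta2 = (m22 - 2.0 * (phi - 2.0) * m12 + (phi - 2.0) ** 2 * m11) / 2.0
    return sigma2, theta2, (theta2 - sigma2) / 2.0

# Made-up first differences for N = 3 individuals.
dy1, dy2 = [1.0, 2.0, -1.0], [0.5, 1.0, 0.5]
left, middle, right = t2_tml_solutions(dy1, dy2)
s_l, th_l, v_l = t2_variances(dy1, dy2, left)
s_m, th_m, v_m = t2_variances(dy1, dy2, middle)
s_r, th_r, v_r = t2_variances(dy1, dy2, right)
```

Because the concentrated log-likelihood for $T = 2$ depends on the data only through the product $\hat{\sigma}^2\hat{\theta}^2$, equality of that product at `left` and `right` is equivalent to (3.22).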


¹⁰ From now on, where necessary to avoid confusion, we will use the subscript 0 to denote the true values of the parameters, e.g. $\phi_0$, $\sigma_0^2$.


Corollary 3.4 states that, if the cubic equation has only one solution, it is a corner solution as the estimate $\hat{\sigma}_v^2(\bar{\phi}) = 0$ in this case.¹¹ The equality of the likelihood for $\hat{\phi}^{(l)}$ and $\hat{\phi}^{(r)}$ would imply that both can be considered “maximum likelihood”. However, the second is inconsistent and leads to a negative estimate of $\sigma_v^2$. To illustrate the relevance of the occurrence of three solutions we derive the probability of a positive discriminant under normality.

Corollary 3.5. For $T = 2$ and under joint normality of the data, the probability of $D > 0$ (two maxima) is given by

$$ \Pr(D > 0) = F_{(N-1,N)}\left(\frac{N}{N-1}\left(2\frac{\sigma_0^2}{\operatorname{var}(\Delta y_{i,1})} - \left(\frac{\sigma_0^2}{\operatorname{var}(\Delta y_{i,1})}\right)^2\right)^{-1}\right), \tag{3.26} $$

where $F_{(N-1,N)}(\cdot)$ is the CDF of an F distributed random variable with $(N-1, N)$ degrees of freedom.


The term inside $F_{(N-1,N)}(\cdot)$ is always larger than 1 and consequently $\Pr(D > 0) \ge 0.5$, as $\operatorname{var}(\Delta y_{i,1}) \ge \sigma_0^2$. If the initial observation is generated from a stationary process then $\frac{\sigma_0^2}{\operatorname{var}(\Delta y_{i,1})} = (1 + \phi_0)/2$. To allow for an unrestricted initial condition we define the following relative variance ratio $\alpha_0$

$$ \alpha_0 \equiv \frac{1 - \phi_0^2}{\sigma_0^2}\operatorname{var}\left(y_{i,0} - \frac{\eta_i}{1 - \phi_0}\right) \;\Rightarrow\; \operatorname{var}(\Delta y_{i,1}) = \sigma_0^2\left(\alpha_0\frac{1 - \phi_0}{1 + \phi_0} + 1\right), \tag{3.27} $$

such that $\alpha_0 = 1$ if the initial observation is covariance stationary. $\Pr(D > 0)$ then depends on $N$, $\phi_0$ and $\alpha_0$. It can be easily seen that $\Pr(D > 0)$ is a decreasing function of $\phi_0$ and an increasing function of $\alpha_0$. Below we provide two graphs to illustrate how this probability depends on the population parameters.

¹¹ While discussing the properties of the panel VAR estimator, Juodis (2014b) observed that for the AR(1) with T = 2 case the results of the previous corollary hold asymptotically. In this paper, we show that this result is exact if the quadratic equation has a positive discriminant. Furthermore, Juodis (2014b) investigates the location of the second mode asymptotically and shows that its location depends on initialization of $y_{i,0}$.


Figure 3.1: Probability of D > 0 with N = 50 on the left and N = 250 on the right. φ0 ∈ [0, 0.95] and α0 ∈ [0.0, 1.05].
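The inputs to (3.26) follow from (3.27) by simple arithmetic, which the sketch below spells out. The function names are hypothetical, and the code only computes the variance ratio and the argument of the F distribution; evaluating the probability itself would additionally require an F-distribution CDF routine, which is not included here.

```python
def variance_ratio(phi0, alpha0):
    """sigma_0^2 / var(Δy_i1) implied by (3.27)."""
    return 1.0 / (alpha0 * (1.0 - phi0) / (1.0 + phi0) + 1.0)

def f_cdf_argument(n, phi0, alpha0):
    """Argument of the F(N-1, N) CDF in (3.26)."""
    r = variance_ratio(phi0, alpha0)
    return n / (n - 1.0) / (2.0 * r - r * r)

# Covariance stationary initial condition (alpha0 = 1) with phi0 = 0.5:
ratio = variance_ratio(0.5, 1.0)        # equals (1 + phi0) / 2
argument = f_cdf_argument(50, 0.5, 1.0)
```

Since the ratio lies in (0, 1], the term $2r - r^2$ never exceeds 1, which is why the argument is always larger than 1.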

3.3.2 Further asymptotic results for T > 2 and TML

In this subsection we extend the analysis of one or three solutions to the FOC to T > 2.

We consider the extent to which asymptotically the discriminant of (3.17) of TML is

positive or negative. Before proceeding we define the following quantities

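$$ \tilde{a}_E \equiv E\left[\sum_{t=1}^{T}\tilde{y}_{i,t-1}^2\right], \qquad \ddot{a}_E \equiv E\left[T\ddot{y}_{i-}^2\right], \qquad \xi \equiv \sum_{t=0}^{T-2}(T - t - 1)\phi_0^t, \qquad x \equiv \phi_0 - \phi. \tag{3.28} $$

Using this notation we can express the asymptotic solutions of the FOC for general $T$ as follows.

Proposition 3.6. The two non-trivial ($x \neq 0$) asymptotic solutions of (3.17) (if they exist) are implicitly defined by

$$ x^2\frac{T}{T-1}(\tilde{a}_E\ddot{a}_E) + x\frac{\xi}{T}\left(\left(\frac{2T-1}{T-1}\theta_0^2\tilde{a}_E\right) - \left(\frac{T+1}{T-1}\sigma_0^2\ddot{a}_E\right)\right) + (\theta_0^2\tilde{a}_E + \sigma_0^2\ddot{a}_E) - \frac{2\xi^2}{T(T-1)}\theta_0^2\sigma_0^2 = 0. \tag{3.29} $$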

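The existence of non-trivial solutions to this equation in the simple AR(1) model depends on two parameters: the autoregressive coefficient $\phi_0$ and the relative variance parameter $\alpha_0$. Under this reparametrization the solutions in Proposition 3.6 are invariant to $\sigma_0^2$, because all quantities of interest are multiplicative in $\sigma_0^2$:

$$ \tilde{a}_E = \left[\sum_{t=0}^{T-1}\phi_0^{2t} - \frac{1}{T}\left(\sum_{t=0}^{T-1}\phi_0^t\right)^2\right]\frac{\alpha_0\sigma_0^2}{1 - \phi_0^2} + \sigma_0^2\sum_{t=0}^{T-2}\left(\sum_{j=0}^{t}\phi_0^j\left(\phi_0^j - \frac{1}{T}\left(\sum_{k=0}^{t}\phi_0^k\right)\right)\right), \tag{3.30} $$

$$ \ddot{a}_E = \frac{\alpha_0\sigma_0^2\xi^2}{T}\,\frac{1 - \phi_0}{1 + \phi_0} + \frac{\sigma_0^2}{T}\sum_{t=0}^{T-2}\left(\sum_{j=0}^{t}\phi_0^j\right)^2, \tag{3.31} $$

$$ \theta_0^2 = T\sigma_0^2\left(\alpha_0\frac{1 - \phi_0}{1 + \phi_0} + \frac{1}{T}\right). \tag{3.32} $$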

To gain further insight into the quadratic equation in Proposition 3.6 for general T > 2

we investigate the sign of the discriminant numerically for different values of T . In

Figure 3.2 we present two plots of the sign of the discriminant. Here 3 indicates that

the discriminant is positive (thus bimodality), while 1 implies that the discriminant is

negative and the log-likelihood function is asymptotically unimodal. We present results

for T ∈ {3, 5}. For higher values of T the border between three and one solution to the

FOC approaches the α0 = 1 line from below. There is a major change from T = 2 to

T > 2 in the set of values (φ0, α0) for which in the limit there is a positive discriminant

value. For T = 2 all values of (φ0, α0) for which α0 > 0 and φ0 < 1 have a positive

discriminant when N →∞. This set is obviously smaller already for T = 3.

Figure 3.2: The sign of the discriminant for T = 3 on the left and T = 5 on the right graph (horizontal axis φ, vertical axis α). 3 is for positive discriminant and thus three solutions to the FOC, while 1 is for negative discriminant and one solution. φ0 ∈ [0, 0.99] and α0 ∈ [0.8, 1.05].

Figure 3.2 shows that as T increases the interval of α0 < 1, that results in a positive

discriminant, shrinks. It can be shown numerically for the relevant range of values for


α0 and φ0 that for α0 ≥ 1 the discriminant is always positive. These results show that

multiple solutions are possible for T > 2 even if N becomes large.
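The sign of the discriminant can be computed directly from the population quantities in (3.28) to (3.32). The sketch below is one way to do so in plain Python; the function name is hypothetical and it assumes $|\phi_0| < 1$ and $T \ge 2$. It returns the discriminant of the quadratic in $x$ from Proposition 3.6.

```python
def asymptotic_discriminant(T, phi0, alpha0, sigma2=1.0):
    """Discriminant of the quadratic (3.29); assumes |phi0| < 1 and T >= 2."""
    g = alpha0 * (1.0 - phi0) / (1.0 + phi0)
    xi = sum((T - t - 1) * phi0 ** t for t in range(T - 1))
    pow_sum = sum(phi0 ** t for t in range(T))
    sq_sum = sum(phi0 ** (2 * t) for t in range(T))
    # Partial sums over j = 0..t for t = 0..T-2.
    partial = [sum(phi0 ** j for j in range(t + 1)) for t in range(T - 1)]
    partial_sq = [sum(phi0 ** (2 * j) for j in range(t + 1)) for t in range(T - 1)]
    # (3.30): expected within sum of squares of the lagged dependent variable.
    a_within = ((sq_sum - pow_sum ** 2 / T) * alpha0 * sigma2 / (1.0 - phi0 ** 2)
                + sigma2 * sum(ps - p * p / T for ps, p in zip(partial_sq, partial)))
    # (3.31): expected quasi-between sum of squares.
    a_between = (alpha0 * sigma2 * xi ** 2 / T * (1.0 - phi0) / (1.0 + phi0)
                 + sigma2 / T * sum(p * p for p in partial))
    theta2 = T * sigma2 * (g + 1.0 / T)                      # (3.32)
    quad = T / (T - 1) * a_within * a_between
    lin = xi / T * ((2 * T - 1) / (T - 1) * theta2 * a_within
                    - (T + 1) / (T - 1) * sigma2 * a_between)
    const = (theta2 * a_within + sigma2 * a_between
             - 2.0 * xi ** 2 / (T * (T - 1)) * theta2 * sigma2)
    return lin * lin - 4.0 * quad * const

disc_t2 = asymptotic_discriminant(2, 0.5, 1.0)
disc_t3 = asymptotic_discriminant(3, 0.5, 1.0)
```

For $T = 2$ the expression collapses to $g^2(g+1)^2/4$ with $g = \alpha_0(1-\phi_0)/(1+\phi_0)$ when $\sigma_0^2 = 1$, which is positive whenever $\alpha_0 > 0$ and $\phi_0 < 1$, in line with the statement above for three-wave panels.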

3.3.3 Constrained estimation

Aside from the suggested “take left” procedure in case of bimodality to avoid a negative estimate of $\sigma_v^2$, we may also use restricted ML estimation. Consider the following reparametrization $\delta = \sigma^2/\theta^2$, so that in the population $\delta \in (0, 1]$ because by definition $\theta^2 = \sigma^2 + T\sigma_v^2$. In order to take this population restriction into account we consider the following reformulated log-likelihood function

$$ \ell(\kappa) = -\frac{N}{2}\left(T\log(\sigma^2) - \log(\delta) + \frac{1}{N\sigma^2}\sum_{i=1}^{N}\sum_{t=1}^{T}(\tilde{y}_{i,t} - \phi\tilde{y}_{i,t-1})^2 + \delta\frac{T}{N\sigma^2}\sum_{i=1}^{N}(\ddot{y}_i - \phi\ddot{y}_{i-} - \rho y_{i,0})^2\right), \tag{3.33} $$

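for $\kappa = (\phi, \sigma^2, \delta, \rho)'$. The corresponding concentrated log-likelihood function in terms of the $\delta$ parameter is

$$ -\frac{2}{N}\ell_c(\delta) = T\log\left[\left(\tilde{c} - 2\hat{\phi}(\delta)\tilde{b} + \hat{\phi}^2(\delta)\tilde{a}\right) + \delta\left(\dddot{c} - 2\hat{\phi}(\delta)\dddot{b} + \hat{\phi}^2(\delta)\dddot{a}\right)\right] - \log\delta, \tag{3.34} $$

where $\hat{\phi}(\delta) = \frac{\tilde{b} + \delta\dddot{b}}{\tilde{a} + \delta\dddot{a}}$ and

$$ \tilde{a} = \frac{1}{N}\sum_{i=1}^{N}\sum_{t=1}^{T}\tilde{y}_{i,t-1}^2, \qquad \tilde{b} = \frac{1}{N}\sum_{i=1}^{N}\sum_{t=1}^{T}\tilde{y}_{i,t}\tilde{y}_{i,t-1}, \qquad \tilde{c} = \frac{1}{N}\sum_{i=1}^{N}\sum_{t=1}^{T}\tilde{y}_{i,t}^2, $$

$$ \dddot{a} = \frac{T}{N}\sum_{i=1}^{N}\dddot{y}_{i-}^2, \qquad \dddot{b} = \frac{T}{N}\sum_{i=1}^{N}\dddot{y}_i\dddot{y}_{i-}, \qquad \dddot{c} = \frac{T}{N}\sum_{i=1}^{N}\dddot{y}_i^2. $$

The expression for the concentrated log-likelihood can be further simplified as

$$ -\frac{2}{N}\ell_c(\delta) = T\log\left[(\tilde{c} + \delta\dddot{c}) - \frac{(\tilde{b} + \delta\dddot{b})^2}{\tilde{a} + \delta\dddot{a}}\right] - \log\delta. \tag{3.35} $$

Similarly to (3.17), the FOC for the log-likelihood function in (3.35) is cubic in $\delta$.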


One can relate the likelihood function in (3.35) to equation (2.3) in Maddala (1971),

who investigates the occurrence of the boundary solution δ = 1 for this type of likeli-

hood function.

Remark 3.2. Note that Maddala (1971) and also Balestra and Nerlove (1966), assume

that for all i one has yi,0 = 0. If this restriction is indeed true, the two log-likelihood

functions are identical and $\hat{\phi}_{RML} = \hat{\phi}_{TML}$. On the other hand, if this restriction is not satisfied, the resulting estimator is not consistent for any fixed value of $T$ and has an asymptotic bias of order $O_P(T^{-2})$. This estimator is labeled a “Misspecified Random

Effects Estimator” in Hahn et al. (2004), where the authors provide asymptotic results

for this estimator as N, T →∞ (jointly).

One can see that the necessary and sufficient condition for $\delta = 1$ to be a local maximum is

$$ \left.\frac{d\ell_c(\delta)}{d\delta}\right|_{\delta = 1} > 0. \tag{3.36} $$

In this case one sets $\hat{\delta} = 1$ and the corresponding estimate of $\phi$ is given by

$$ \hat{\phi}(1) = \frac{\tilde{b} + \dddot{b}}{\tilde{a} + \dddot{a}}. \tag{3.37} $$

We know that for $T = 2$ and in the case of the TML estimator this $\hat{\phi}(1)$ is exactly the middle solution $\bar{\phi} = \hat{\phi}_W + 1$. In all other cases this solution will differ from the unique global unconstrained maximum (in the one solution case). For example, if $y_{i,0} = 0$ for all $i$, one can recognize $\hat{\phi}(1)$ as the pooled OLS estimator of $\phi$, which is known to be positively biased. In general, $\hat{\phi}(\delta)$ is a weighted sum of “within” and “quasi-between” estimators and thus belongs to the interval of Lemma 3.1. Furthermore, the weight of the “within” estimator is monotonically decreasing in $\delta$, because

$$ \hat{\phi}(\delta) = \hat{\phi}_W\, q(\delta) + \hat{\phi}_B (1 - q(\delta)), \qquad q(\delta) = \frac{\tilde{a}}{\tilde{a} + \delta\dddot{a}}, \qquad q'(\delta) < 0. \tag{3.38} $$

Hence, if the global maximum $\hat{\phi}$ does not satisfy the non-negativity constraint, it is never smaller than $\hat{\phi}(1)$ (assuming $\hat{\phi}_W < \hat{\phi}_B$). In the Monte Carlo section of this paper we will investigate the finite sample properties of TML and RML estimators that use $\hat{\phi}(1)$ as the estimate at the boundary of the parameter space.
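The weighted-average representation in (3.38) is easy to confirm numerically. The sketch below uses hypothetical function names and made-up moment values (chosen to correspond to a three-wave TML example in which the within estimator equals 1/3 and the quasi-between estimator equals 7/3); it is an illustration, not the estimation routine used in the study.

```python
def phi_of_delta(delta, a_w, b_w, a_b, b_b):
    """phi_hat(delta) = (b_w + delta * b_b) / (a_w + delta * a_b), as in (3.34)."""
    return (b_w + delta * b_b) / (a_w + delta * a_b)

def within_weight(delta, a_w, a_b):
    """q(delta) in (3.38): weight placed on the within estimator."""
    return a_w / (a_w + delta * a_b)

# Made-up moments: within (a_w, b_w) and quasi-between (a_b, b_b).
a_w, b_w, a_b, b_b = 1.0, 1.0 / 3.0, 1.0, 7.0 / 3.0
phi_within, phi_between = b_w / a_w, b_b / a_b

boundary = phi_of_delta(1.0, a_w, b_w, a_b, b_b)   # boundary estimate from (3.37)
weights = [within_weight(d, a_w, a_b) for d in (0.25, 0.5, 1.0, 2.0)]
```

With these numbers `boundary` equals the within estimate plus one, matching the statement that for $T = 2$ and TML the boundary estimate coincides with the middle solution.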

Remark 3.3. Although not addressed in this paper, the use of the boundary solution

φ(1) might lead to a non-standard inference problem in finite samples (Feldman and

Cousins (1998), Ketz (2014)). Confidence intervals based on inverting the Likelihood


Ratio statistic have to be handled with care as for some values of the null hypothesis

φ0 the likelihood ratio statistic can be non-positive.

3.4 Extension to exogenous regressors

For most empirically relevant applications the AR(1) model specification is too restrictive and incomplete. In this section we therefore extend our analysis to an ARX(1) model including additional strictly exogenous regressors.¹² For ease of exposition, we consider the following simplified version with one additional regressor

$$ y_{i,t} = \eta_i + \phi y_{i,t-1} + \beta x_{i,t} + \varepsilon_{i,t}, \qquad E[\varepsilon_{i,t} \mid x_{i,0}, \dots, x_{i,T}, y_{i,0}, \eta_i] = 0. \tag{3.39} $$

Then using stacked notation for individual $i$ we have

$$ y_i = \phi y_{i-} + \beta x_i + \imath_T\eta_i + \varepsilon_i, \qquad \varepsilon_i = (\varepsilon_{i,1}, \dots, \varepsilon_{i,T})'. \tag{3.40} $$

We continue by using the Chamberlain (1982) type of projection for $\eta_i$ as in Kruiniger (2006) and Bai (2013a), now not only on $y_{i,0}$ but also on all the lags and leads of $x_{i,t}$:

$$ \eta_i = \pi' w_i + v_i, \qquad E[v_i w_i] = 0, \qquad w_i = (y_{i,0}, x_{i,0}, x_i')'. \tag{3.41} $$

The main implication is that the combined error term $u_i$ can be represented as

$$ u_i = R y_i - e_1\phi y_{i,0} - \beta x_i - \imath_T\pi' w_i. \tag{3.42} $$

The variance-covariance matrix remains the same as in the pure AR(1) model without exogenous regressor. The quasi log-likelihood function over all individuals is then given by (up to a constant)

$$ -\frac{2}{N}\ell(\kappa) = (T-1)\log(\sigma^2) + \log(\theta^2) + \frac{1}{N\sigma^2}\sum_{i=1}^{N}\sum_{t=1}^{T}(\tilde{y}_{i,t} - \phi\tilde{y}_{i,t-1} - \beta\tilde{x}_{i,t})^2 + \frac{T}{N\theta^2}\sum_{i=1}^{N}(\bar{y}_i - \phi\bar{y}_{i-} - \beta\bar{x}_i - \pi' w_i)^2, \tag{3.43} $$

where $\kappa = (\phi, \beta, \sigma^2, \theta^2, \pi')'$.

¹² Inclusion of weakly exogenous regressors can be handled if the $x_{i,t}$ vector admits a VAR representation. Some results for first order panel VAR models are discussed in Juodis (2014b).


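Similarly to the model without $x_{i,t}$ the TML estimator can be expressed as a restricted version (in terms of the parameter restrictions) of a more general RML estimator. Without loss of generality, we can rewrite the second component of the log-likelihood function as

$$ \frac{T}{N\theta^2}\sum_{i=1}^{N}(\bar{y}_i - \phi\bar{y}_{i-} - \beta\bar{x}_i - \pi' w_i)^2 = \frac{T}{N\theta^2}\sum_{i=1}^{N}(\ddot{y}_i - \phi\ddot{y}_{i-} - \beta\ddot{x}_i - \rho' z_i)^2, \tag{3.44} $$

where $z_i \equiv (y_{i,0}, x_{i,0}, \Delta x_{i,1}, \dots, \Delta x_{i,T})'$ and $\ddot{x}_i \equiv \bar{x}_i - x_{i,0}$. Furthermore, using e.g. Theorem 3.1 in Juodis (2014b) the second component of the TML log-likelihood function is given by

$$ \frac{T}{N\theta^2}\sum_{i=1}^{N}(\ddot{y}_i - \phi\ddot{y}_{i-} - \beta\ddot{x}_i - \pi_\Delta'\Delta x_i)^2, \qquad \Delta x_i = (\Delta x_{i,1}, \dots, \Delta x_{i,T})'. \tag{3.45} $$

Hence, by setting the first two components of the $\rho$ vector to zero we obtain the TML estimator as the restricted version of the RML estimator.¹³

Irrespective of the estimator considered it is not difficult to see that, because $\ddot{x}_i \in \operatorname{Span}(\Delta x_i)$, one can concentrate out the $\rho$/$\pi_\Delta$ parameter such that the second component of the log-likelihood function in (3.45) does not contain the $\beta$ parameter. Therefore, the (concentrated) log-likelihood function can be expressed as

$$ -\frac{2}{N}\ell(\kappa) = (T-1)\log(\sigma^2) + \log(\theta^2) + \frac{1}{N\sigma^2}\sum_{i=1}^{N}\sum_{t=1}^{T}(\tilde{y}_{i,t} - \phi\tilde{y}_{i,t-1} - \beta\tilde{x}_{i,t})^2 + \frac{T}{N\theta^2}\sum_{i=1}^{N}(\dddot{y}_i - \phi\dddot{y}_{i-})^2, \tag{3.46} $$

where we defined

$$ \dddot{y}_i \equiv \ddot{y}_i - \left(\sum_{i=1}^{N}\ddot{y}_i\Delta x_i'\right)\left(\sum_{i=1}^{N}\Delta x_i\Delta x_i'\right)^{-1}\Delta x_i, \tag{3.47} $$

$$ \dddot{y}_i \equiv \ddot{y}_i - \left(\sum_{i=1}^{N}\ddot{y}_i z_i'\right)\left(\sum_{i=1}^{N} z_i z_i'\right)^{-1} z_i, \tag{3.48} $$

¹³ Note that the interpretation of TML as restricted version of RML is only valid if one includes $x_{i,0}$ in $w_i$.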


for TML and RML estimators respectively (similarly for $\dddot{y}_{i-}$). One can also concentrate out the $\beta$ parameter from the first component of the log-likelihood function. After subsequent concentration of $\sigma^2$ and $\theta^2$, the resulting log-likelihood function is then of the same structure as in (3.13). The cubic FOC in (3.17) follows directly from that.

Summarizing, in this section we argued that in the model augmented with exogenous

regressors the FOC of the TML/RML estimators again is cubic in the autoregressive

parameter φ. Our derivations above rely upon the fact that the full Chamberlain

(1982) projection has been used, rather than the restricted Mundlak (1978) projection.

Without going into further discussion, we state that for the TML estimator the results

above do not carry over if one uses the Mundlak (1978) projection instead. However,

they continue to be valid for the RML estimator if one does not include $x_{i,0}$ when exploiting the Mundlak (1978) projection.¹⁴


In this section we investigate the finite sample performance of the various estimators and corresponding test statistics using simulated data. In particular, we consider the following panel AR(1) model

$$ y_{i,t} = \phi y_{i,t-1} + (1 - \phi)\mu_i + \varepsilon_{i,t}, \qquad \varepsilon_{i,t} \sim N(0, 1), \qquad t = 1, \dots, T, \tag{3.49} $$

$$ y_{i,0} = \gamma\mu_i + \varepsilon_{i,0}, \qquad \varepsilon_{i,0} \sim N\left(0, \frac{\zeta}{1 - \phi^2}\right), \qquad \mu_i \sim N(0, \sigma_\mu^2). \tag{3.50} $$

Mean (effect) stationarity of $y_{i,t}$ is achieved for designs with $\gamma = 1$, while the process $y_{i,t}$ is covariance stationary if and only if both $\gamma = \zeta = 1$. The actual value of $\sigma_\mu^2$ is irrelevant for the TML estimator as long as $\gamma = 1$, but for the RML estimator this parameter is always important. For the TML estimator the only important parameter is $\alpha$ as defined in (3.27), as it measures the deviation from covariance stationarity.
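The design in (3.49) and (3.50) can be simulated in a few lines. The sketch below uses only the Python standard library; the function names, the seed handling and the default arguments are illustrative assumptions, and the second argument of each normal distribution is read as a variance. The helper `implied_alpha` is not taken from the text: it follows from substituting (3.50) into the definition (3.27) with $\eta_i = (1 - \phi)\mu_i$ and $\sigma_0^2 = 1$.

```python
import math
import random

def simulate_panel(n, T, phi, gamma=1.0, zeta=1.0, sigma_mu=1.0, seed=0):
    """Draw n series [y_i0, ..., y_iT] from the design (3.49)-(3.50); needs |phi| < 1."""
    rng = random.Random(seed)
    panel = []
    for _ in range(n):
        mu = rng.gauss(0.0, sigma_mu)                  # mu_i ~ N(0, sigma_mu^2)
        sd0 = math.sqrt(zeta / (1.0 - phi ** 2))       # std. dev. of eps_i0
        y = [gamma * mu + rng.gauss(0.0, sd0)]         # initial observation y_i0
        for _ in range(T):
            y.append(phi * y[-1] + (1.0 - phi) * mu + rng.gauss(0.0, 1.0))
        panel.append(y)
    return panel

def implied_alpha(phi, gamma, zeta, sigma_mu):
    """alpha_0 from (3.27): (1 - phi^2) * var(y_i0 - mu_i) with sigma_0^2 = 1."""
    return (1.0 - phi ** 2) * (gamma - 1.0) ** 2 * sigma_mu ** 2 + zeta

data = simulate_panel(n=50, T=3, phi=0.5, gamma=0.5, sigma_mu=3.0, seed=123)
alpha_stationary = implied_alpha(0.5, 1.0, 1.0, 3.0)
alpha_nonstationary = implied_alpha(0.5, 0.5, 1.0, 3.0)
```

Under this reading, $\gamma = \zeta = 1$ gives $\alpha_0 = 1$ regardless of $\sigma_\mu$, while $\gamma \neq 1$ makes $\alpha_0$ depend on $\sigma_\mu$.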

¹⁴ Or if one does not impose that the coefficient for $x_{i,0}$ is identical to the one for $x_{i,1}, \dots, x_{i,T}$.

Even for the simple AR(1) model the parameter space is already very large. We have tried to cover its most relevant part by considering the following parameter settings

$$ N \in \{50, 250\}, \quad T \in \{3, 7\}, \quad \gamma \in \{0.5, 1.0\}, \quad \sigma_\mu \in \{1, 3\}, \quad \phi \in \{0.5, 0.8\}, $$

while $\zeta = 1$. We report mean, median, IQR (Interquartile Range) and RMSE for the following coefficient estimators (TML/RML):¹⁵

• T(R)ML based on the global maximum (T(R)MLg), where always the global

maximum is selected.

• T(R)ML based on the “left” maximum (T(R)MLl), that takes into account the

non-negativity restriction only if there are two competing local maxima.

• T(R)ML with the imposed boundary condition φ(1) (T(R)MLb) from equation

(3.37), as in Section 3.3.3.

Note that in calculating coefficient estimators we refrained from using numerical optimization techniques. As mentioned earlier, one can find the solutions to the cubic first-order condition by exploiting root-finding algorithms.¹⁶

Regarding inference we consider empirical rejection frequencies based on two-sided t and LR statistics.¹⁷ We address both size and power. Due to the possible flatness of the

profile log-likelihood functions induced by the bimodality, inference quality based on

the two classical tests might differ substantially. The t or Wald test critically depends

on a quadratic approximation of the likelihood, which may cause problems when the

likelihood is flat. The LR test is probably better behaved under the null hypothesis, but

the flatness of the likelihood will influence its power. Below we summarize some general

patterns that arise from the various Tables with simulation results in the Appendix.


Regarding coefficient estimation we find that both T(R)MLl and T(R)MLb perform substantially better than always choosing the global maximum of the likelihood function (T(R)MLg). This point is especially relevant for TML and results from not taking the non-negativity constraint into consideration. The “right” instead of “left” solution to the FOC may sometimes provide the global maximum of the likelihood, but choosing it causes serious bias. Therefore, in terms of bias and of RMSE both TMLg and RMLg are dominated by “left” estimators or by exploiting the boundary condition.

¹⁵ We do not provide results for the T(R)ML estimator where one imposes restrictions directly on the parameter space of φ. The common restriction of this type is φ ∈ (−1, 1) as considered e.g. in Hsiao et al. (2002).

¹⁶ As implemented by e.g. the roots(·) function in Matlab.

¹⁷ For the t-test we exploit the usual “sandwich” covariance matrix estimator, e.g. as in Hayakawa and Pesaran (2015) or Juodis (2014b).

When we do not consider T(R)MLg, we find little difference between RML and TML.

There is some tendency of RML to dominate TML, confirming results in Kruiniger

(2013). However, we do not observe any substantial problems for TML when σµ in-

creases, unlike the aforementioned study. Furthermore, in terms of RMSE exploiting

the boundary solution is almost always better than the “left” estimator. However, in

some cases (for small N and/or large φ) this choice can have a negative effect on mean

and median bias. This observation is in line with the theoretical discussion at the end

of Section 3.3.3. As N and T increase the discrepancy becomes negligible. Also the

distributions of all estimators tend to be asymmetric as illustrated by the discrepancy

between mean and the median.

Finally, we find that in all cases where the cubic FOC had three solutions, the “left”

solution always satisfies the non-negativity constraint, while the “right” solution never

satisfies it. Furthermore, in those cases the global constrained maximum is also achieved

at the “left” solution, and not at the boundary. Hence for replications with three

solutions only the “left” solution is natural. This is an important observation, because

it suggests that there is always at most one interior maximum in the constrained

optimization problem.

Figure 3.3 further illustrates how different ways of dealing with the boundary condition

shape the finite sample distributions of the coefficient estimators. The fact that most of

the studies that consider RML and/or TML estimation (e.g. Hsiao et al. (2002),

Alvarez and Arellano (2003), Ahn and Thomas (2006), Kruiniger (2008), Hayakawa

and Pesaran (2015)) either do not address the negative variance issue at all or only

mention it without further exploring its consequences is somewhat puzzling. As we

can see for many designs substantial gains in terms of RMSE can be achieved when

using RMLl/TMLl rather than RMLg/TMLg. Finally, in most cases IQR for TML is

larger than the corresponding value for RML, thus confirming the asymptotic results

presented in Proposition 3.2.

3.5.2 Results: Inference

Regarding inference with the t and LR statistics, we observe that the LR test provides reasonable size control, but the power properties are poor for small N and T. Given the asymmetry of the likelihood function, especially the power for alternatives larger than the null hypothesis is negligible and comparable to size. The power of the LR test improves significantly with a larger sample size, and especially for larger T. TMLb/RMLb tends to be undersized in comparison to TMLl/RMLl.

Figure 3.3: Finite sample distribution of the TML/RML estimators (panels TMLg, RMLg, TMLl, RMLl, TMLb and RMLb) for N = 250, T = 3, φ = 0.5 with covariance stationary initialization of yi,0.

Furthermore, inference based on the t-test in samples with small N and T is unreliable

as the actual rejection frequencies are substantially higher than nominal ones. The

results for the t-statistic deteriorate for φ = 0.8, which is related to the non-standard

behavior of the TML/RML estimator when φ is local-to-unity, see Kruiniger (2013) for

related results. The LR test is much less affected by the value of φ.

Finally, TML and RML statistics are similar in terms of empirical size with RML

statistics having higher power. In terms of empirical size we can rank test statistics in

the following order: g > l > b. These differences slowly disappear as N and/or T get

larger.


3.6 Empirical illustration

In this section we study the behavior of TML and RML estimators exploiting data

from Bun and Carree (2005), who considered the following model for unemployment

at the U.S. state level

\[ u_{i,t} = \phi u_{i,t-1} + \beta g_{i,t-1} + \eta_i + \tau_t + \varepsilon_{i,t}. \tag{3.51} \]

Here $u_{i,t}$ is the unemployment rate in state i at time t and $g_{i,t-1}$ is the real economic

growth rate at time t − 1.¹⁸ The annual panel data cover the years 1991-2000 for

all U.S. states (including Washington, D.C., hence N = 51). We present estimation

results for the model including the growth regressor (Table 3.1) and the pure AR(1)

specification (Table 3.2). In order to investigate how the behavior of the log-likelihood

function for both estimators changes as T increases we consider estimates over an

increasing window. Thus results for T = 2 are obtained based on years 1998 − 2000,

T = 3 exploits the period 1997 − 2000, etc. All models are estimated based on data

in deviations from cross-sectional means to filter out the time effects. To illustrate the

finite sample properties of T(R)MLg, T(R)MLl and T(R)MLb we present coefficient

estimates for varying T .
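The data preparation described above (deviations from cross-sectional means and an increasing estimation window) can be sketched as follows. This is only an illustrative sketch: the dictionary layout `panel` (unit name mapped to a chronologically ordered list of values) and both function names are assumptions made for the example, not part of the original study.

```python
from statistics import mean


def demean_cross_section(panel):
    """Subtract the cross-sectional mean in every period (filters out time effects).

    `panel` is a hypothetical layout: {unit: [value_year1, value_year2, ...]}.
    """
    units = list(panel)
    n_periods = len(panel[units[0]])
    period_means = [mean(panel[u][t] for u in units) for t in range(n_periods)]
    return {u: [panel[u][t] - period_means[t] for t in range(n_periods)] for u in units}


def increasing_window(panel, T):
    """Keep the last T + 1 observations per unit (initial observation plus T periods).

    Assumption: with yearly data ending in 2000, T = 2 keeps 1998-2000, as in the text.
    """
    return {u: series[-(T + 1):] for u, series in panel.items()}


# Toy example with two units and four periods (made-up numbers).
toy = {"A": [1.0, 2.0, 3.0, 4.0], "B": [3.0, 2.0, 1.0, 0.0]}
toy_demeaned = demean_cross_section(toy)
toy_window = increasing_window(toy_demeaned, T=2)
```

Because the demeaning is done period by period, applying it before or after selecting the window gives the same data.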

3.6.1 ARX(1) model

Using all time periods (T = 9) the estimation results based on TML and RML are very

similar to the estimates in Bun and Carree (2005) obtained using the bias-corrected

FE estimator.¹⁹ The similarity between the bias-corrected FE results found by

Bun and Carree (2005) and TML/RML estimators is not surprising, given that all

three estimators correct for the bias in the FE estimator using some bias adjustment

procedure.

The results in Table 3.1 show that for RML estimation RMLg = RMLl = RMLb,

irrespective of T . Hence, the global maximum is always achieved at the “left” solution

that satisfies the non-negativity restriction, which amounts to $\theta^2 \geq \sigma^2$. The same

holds for TML estimation, with the clear exception of the T = 2 case. There the global

maximum is attained at $\phi^{(r)} = 1.422$, which is substantially larger than $\phi^{(l)} = 0.506$.

¹⁸ Empirical evidence on strict exogeneity of $g_{i,t-1}$ is provided in Bun and Carree (2005).

¹⁹ The bias-corrected FE estimates are φ = 0.615 and β = −0.057. Furthermore, our results are in line with the results of Lokshin (2008) obtained for TML.


Table 3.1: TML and RML estimates for the ARX(1) model.

         TMLg              TMLl              RMLg              RMLl
T      φ       β         φ       β         φ       β         φ       β
2    1.422   0.020     0.506   0.003     0.493   0.003     0.493   0.003
3    0.429  -0.006     0.429  -0.006     0.532  -0.008     0.532  -0.008
4    0.492  -0.026     0.492  -0.026     0.562  -0.025     0.562  -0.025
5    0.451  -0.031     0.451  -0.031     0.489  -0.030     0.489  -0.030
6    0.511  -0.036     0.511  -0.036     0.531  -0.035     0.531  -0.035
7    0.511  -0.038     0.511  -0.038     0.517  -0.038     0.517  -0.038
8    0.577  -0.041     0.577  -0.041     0.587  -0.040     0.587  -0.040
9    0.617  -0.057     0.617  -0.057     0.641  -0.055     0.641  -0.055

3.6.2 AR(1) model

The empirical results for the pure AR(1) model without $g_{i,t-1}$ are reported in Table 3.2. As can be seen, the estimation results obtained from TML are quite stable irrespective of the time horizon under consideration. Furthermore, in all cases we find that the global maximum of the log-likelihood function is attained at the left maximum.²⁰

The results for the RML estimator, however, are considerably less stable. For T = 2, 5, 7, 8 all three RML estimators are identical and very close to the TML estimator. In two other cases, T = 6 and T = 9, the global maximum is obtained at the “right” solution and not at the “left” one. The result is a large difference between RMLg and RMLl/RMLb. Because the “left” solution satisfies the non-negativity constraint, the RMLl and RMLb estimators are identical. Finally, for T = 3, 4 there exists only one solution, which does not satisfy the non-negativity constraint. RMLl and RMLb therefore produce markedly different estimates, with the latter being quite close to the corresponding TMLl estimates. In Figures 3.4 and 3.5 (see Appendix 3.C) we provide detailed plots of the concentrated log-likelihood function of this model for different values of T. Based on these plots we can see how small increments in the length of the time series change the shape of the concentrated log-likelihood function. Furthermore, the log-likelihood functions for both estimators are relatively flat, whether the likelihood is unimodal or bimodal. Finally, we see in all cases that the second mode of RML is smaller in absolute value than that of TML.

²⁰ When T = 2 we report the “left” solution for TMLg, as both of its solutions have the same log-likelihood value.


Table 3.2: TML and RML estimates for the AR(1) model.

T    TMLg    TMLl    RMLg    RMLl    RMLb
2    0.502   0.502   0.516   0.516   0.516
3    0.514   0.514   0.750   0.750   0.613
4    0.522   0.522   0.950   0.950   0.667
5    0.461   0.461   0.514   0.514   0.514
6    0.553   0.553   1.062   0.596   0.596
7    0.545   0.545   0.565   0.565   0.565
8    0.613   0.613   0.632   0.632   0.632
9    0.671   0.671   1.054   0.695   0.695

3.7 Conclusions

We have investigated some finite sample and asymptotic properties of the TML and RML estimators for dynamic panel data models. Both estimators are consistent for fixed T and large N, but in finite samples their actual numerical implementation matters for inference. We showed that in a simple AR(1) model with homoscedastic errors the TML and RML estimators can be obtained as solutions of cubic first-order conditions. We furthermore argued that in some cases the value that maximizes the log-likelihood function is not the best possible solution, as it can violate the non-negativity constraint on the variance. Finally, we showed that these results extend to models with additional exogenous regressors.

In a Monte Carlo study we found that the issue of non-negativity constraints cannot be ignored, as is commonly done in the literature. However, we also found that for some parameter values the use of a constrained likelihood can have detrimental effects on the finite sample bias of the corresponding ML estimator. Additionally, inference based on likelihood-based estimators can be highly misleading, as for small values of N and T we found that t-statistics tend to be substantially oversized. Although inference based on the LR test provides reasonable size control for small N and T, it can result in low power due to possible flatness of the likelihood function.

Finally, we investigated the issues of local maxima and boundary solutions in an empirical analysis of U.S. state level unemployment rates. We found that in some cases the different treatment of these issues leads to markedly different estimates of the autoregressive parameter.


3.A Proofs

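Proof of Proposition 3.2. To prove that the quasi-between estimator for RML is asymptotically smaller than for TML, it is sufficient to show that

\[ \theta_T^2 a_\infty^R - \theta_R^2 a_\infty^T \geq 0, \]

as

\[ \operatorname{plim}_{N\to\infty} \phi_B^{(j)} = \phi_0 + \frac{1}{T}\,\frac{\theta_j^2}{a_\infty^j}\,\xi, \qquad j = T, R, \]

which follows from the fact that $\phi = \phi_0$ is always a solution to the FOC asymptotically (for more details refer to the proof of Proposition 3.6), so that the maximum likelihood estimator is consistent. Here $a_\infty^{(j)} = \operatorname{plim}_{N\to\infty} \frac{T}{N}\sum_{i=1}^{N} y_{i-}^2$ for $j = T, R$. Observe that for any T

\begin{align*}
\xi &\equiv \sum_{t=0}^{T-2} (T - t - 1)\,\phi_0^t \geq 0, \\
\theta_T^2 &= \sigma_0^2 + T\left(\operatorname{var}(\Delta y_{i,1}) - \sigma_0^2\right) = \sigma_0^2 + T\left(E\left((\phi_0 - 1 + \pi_0) y_{i,0} + v_i + \varepsilon_{i,1}\right)^2 - \sigma_0^2\right), \\
\theta_R^2 &= \sigma_0^2 + T\,E[v_i^2],
\end{align*}

where as before $\eta_i = \pi_0 y_{i,0} + v_i$ and $\pi_0 = E[y_{i,0}\eta_i] / E[y_{i,0}^2]$. The difference is thus $\theta_T^2 - \theta_R^2 = T\left((\phi_0 - 1 + \pi_0)^2 E[y_{i,0}^2]\right) \geq 0$. Regarding $a_\infty^R$ we have

\begin{align*}
a_\infty^R &= a_\infty^T - T\,\frac{\left(E[y_{i-} y_{i,0}]\right)^2}{E[y_{i,0}^2]}, \\
E[y_{i-} y_{i,0}] &= \frac{1}{T}\,\xi\,(\phi_0 - 1 + \pi_0)\,E[y_{i,0}^2],
\end{align*}

while for $a_\infty^T$:²¹

\begin{align*}
a_\infty^T &= \left(\frac{\xi^2}{T}(\phi_0 - 1)^2\right) E\left(y_{i,0} - \frac{\eta_i}{1 - \phi_0}\right)^2 + \frac{\sigma_0^2}{T}\sum_{t=0}^{T-2}\left(\sum_{j=0}^{t}\phi_0^j\right)^2 \\
&= \left(\frac{\xi^2}{T}(\phi_0 - 1)^2\right) E\left(y_{i,0}\left(1 - \frac{\pi_0}{1 - \phi_0}\right) - \frac{v_i}{1 - \phi_0}\right)^2 + \frac{\sigma_0^2}{T}\sum_{t=0}^{T-2}\left(\sum_{j=0}^{t}\phi_0^j\right)^2 \\
&= \left(\frac{1}{T}(\phi_0 - 1 + \pi_0)^2 \xi^2\right) E[y_{i,0}^2] + \frac{\xi^2}{T} E[v_i^2] + \frac{\sigma_0^2}{T}\sum_{t=0}^{T-2}\left(\sum_{j=0}^{t}\phi_0^j\right)^2,
\end{align*}

as $E[v_i y_{i,0}] = 0$, which implies that

\[ a_\infty^R = \frac{\xi^2}{T} E[v_i^2] + \frac{\sigma_0^2}{T}\sum_{t=0}^{T-2}\left(\sum_{j=0}^{t}\phi_0^j\right)^2 = \frac{1}{T}\left(\xi^2 E[v_i^2] + \sigma_0^2 q\right), \]

where q is implicitly defined. Denote $w \equiv (\phi_0 - 1 + \pi_0)^2 E[y_{i,0}^2] \geq 0$, then

\begin{align*}
\theta_T^2 a_\infty^R - \theta_R^2 a_\infty^T &= T w\, a_\infty^R - \theta_R^2\,\frac{\xi^2}{T}\,w \\
&= w\left(\xi^2 E[v_i^2] + \sigma_0^2 q\right) - w\,\xi^2\left(E[v_i^2] + \frac{1}{T}\sigma_0^2\right) = w\,\sigma_0^2\left(q - \frac{\xi^2}{T}\right) > 0,
\end{align*}

where the last result follows as an implication of Jensen’s inequality.

²¹ For derivations of this term please refer to Lemma 2 in the Appendix of Chapter 2.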
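The final inequality of this proof can be checked numerically. The sketch below computes ξ and q from their definitions and evaluates q − ξ²/T on a small grid. Since ξ is the sum of the T − 1 partial sums whose squares add up to q, the Cauchy-Schwarz (or Jensen) inequality gives ξ² ≤ (T − 1)q, which is the step the proof relies on. The function names and the grid of values are illustrative assumptions.

```python
def xi(phi0, T):
    """xi = sum_{t=0}^{T-2} (T - t - 1) * phi0**t."""
    return sum((T - t - 1) * phi0**t for t in range(T - 1))


def q(phi0, T):
    """q = sum_{t=0}^{T-2} (sum_{j=0}^{t} phi0**j)**2."""
    return sum(sum(phi0**j for j in range(t + 1)) ** 2 for t in range(T - 1))


def gap(phi0, T):
    """The term q - xi**2 / T whose sign is used in the last step of the proof."""
    return q(phi0, T) - xi(phi0, T) ** 2 / T


# Evaluate the gap on an illustrative grid of phi0 and T values.
grid = [(phi0, T, gap(phi0, T)) for phi0 in (0.0, 0.5, 0.8, 0.99) for T in (2, 3, 7, 10)]
```

For example, with φ0 = 0.5 and T = 2 both ξ and q equal 1, so the gap is 1 − 1/2 = 0.5.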

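Proof of Proposition 3.3. Using the variables defined in Section 3.3.3, we note that for T = 2 and TML

\[ \tilde a = a, \qquad \tilde b = b + 2a, \qquad \tilde c = c + 4(a + b), \]

and thus $\theta^2(\phi) = \sigma^2(\phi) + 4(a(1 - \phi) + b)$. Furthermore, for T = 2 we have from (3.13) that $\ell_c(\phi) \propto \log(\theta^2(\phi)\sigma^2(\phi))$ with

\begin{align*}
\theta^2(\phi)\sigma^2(\phi) &= \sigma^4(\phi) + 4\sigma^2(\phi)(b - \phi a) + 4\sigma^2(\phi)a \\
&= \left(2(b - \phi a) + \sigma^2(\phi)\right)^2 - 4(b - \phi a)^2 + 4\sigma^2(\phi)a \\
&= \left(2(b - \phi a) + \sigma^2(\phi)\right)^2 - 4(b^2 - 2\phi ab + \phi^2 a^2) + 4(ac - 2\phi ab + \phi^2 a^2) \\
&= \left(2(b - \phi a) + \sigma^2(\phi)\right)^2 + 4(ac - b^2) \\
&= \left(2(b - \phi a) + \sigma^2(\phi)\right)^2 + d,
\end{align*}

where

\[ d = \left(\frac{1}{N}\sum_{i=1}^{N}(\Delta y_{i,1})^2\right)\left(\frac{1}{N}\sum_{i=1}^{N}(\Delta y_{i,2})^2\right) - \left(\frac{1}{N}\sum_{i=1}^{N}\Delta y_{i,1}\Delta y_{i,2}\right)^2. \tag{3.52} \]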
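The algebra of this proof can be verified numerically for arbitrary coefficients. The sketch below takes σ²(φ) = c − 2φb + φ²a (the T = 2 case) and checks that θ²(φ)σ²(φ) equals the completed square plus 4(ac − b²). It also illustrates that the product is symmetric around φ = 1 + b/a, because the squared term is a quadratic in φ with vertex at (a + b)/a. The numerical values of a, b, c are made up for illustration.

```python
def sigma2(phi, a, b, c):
    """sigma^2(phi) = c - 2*phi*b + phi^2*a (T = 2 case)."""
    return c - 2 * phi * b + phi**2 * a


def theta2(phi, a, b, c):
    """theta^2(phi) = sigma^2(phi) + 4*(a*(1 - phi) + b) for T = 2 and TML."""
    return sigma2(phi, a, b, c) + 4 * (a * (1 - phi) + b)


def product_direct(phi, a, b, c):
    """Direct product theta^2(phi) * sigma^2(phi)."""
    return theta2(phi, a, b, c) * sigma2(phi, a, b, c)


def product_via_square(phi, a, b, c):
    """Completed-square form: (2(b - phi a) + sigma^2(phi))^2 + 4(ac - b^2)."""
    return (2 * (b - phi * a) + sigma2(phi, a, b, c)) ** 2 + 4 * (a * c - b**2)


# Illustrative (made-up) coefficients with a*c - b**2 > 0.
a, b, c = 2.0, 0.5, 1.5
center = 1 + b / a  # symmetry point of the product
```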

Proof of Corollary 3.4. From Proposition 3.3 we have that $\theta^2(\phi) - \sigma^2(\phi) = 4(a(1 - \phi) + b)$. Given that $\phi = 1 + b/a$, one can easily see that $\theta^2(\phi) - \sigma^2(\phi) = 4(a(1 - \phi) + b) = 0$. The first and the third parts follow from the symmetry established in Proposition 3.3, while the last part follows directly from definitions.

These results do not hold for T > 2 and/or the RML estimator. For example, observe that for general T we can decompose (for simplicity denote $T_1 = 1/(T - 1)$)

\begin{align*}
\theta^2(\phi) &= \tilde c - 2\phi\tilde b + \phi^2\tilde a, \qquad \sigma^2(\phi) = T_1\left(c - 2\phi b + \phi^2 a\right), \\
\theta^2(\phi) - \sigma^2(\phi) &= \phi^2(\tilde a - T_1 a) - 2\phi(\tilde b - T_1 b) + (\tilde c - T_1 c).
\end{align*}

For T = 2 and TML we have $\tilde a = a$ and the right hand side of the last equation becomes linear in φ. Hence, setting the left hand side of the last equation equal to zero and solving for φ, there is only one solution given by $\phi_W + 1$. For T > 2 and/or the RML estimator, the equation $\theta^2(\phi) - \sigma^2(\phi) = 0$ has two solutions of the form

\[ \phi = \frac{\tilde b - T_1 b}{\tilde a - T_1 a} \pm \sqrt{\frac{(\tilde b - T_1 b)^2 - (\tilde a - T_1 a)(\tilde c - T_1 c)}{(\tilde a - T_1 a)^2}}. \tag{3.53} \]

The first order condition in (3.17), evaluated at the value φ, becomes proportional to $(\tilde b + b) - \phi(\tilde a + a) \neq 0$. At the point φ the first order condition is not zero in this case, hence the corner solution $\theta^2(\phi) = \sigma^2(\phi)$ cannot hold.
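The roots in (3.53) can be computed with a few lines of code. The sketch below solves θ²(φ) − σ²(φ) = 0 given the coefficients of the two quadratics, falling back to the single linear solution when the φ² term vanishes (the T = 2 TML case discussed above). The function name, the argument names (`a_t`, `b_t`, `c_t` for the coefficients of θ²(φ); `a`, `b`, `c` for those of σ²(φ)) and the numerical tolerance are assumptions made for this illustration.

```python
import math


def corner_candidates(a_t, b_t, c_t, a, b, c, T, tol=1e-12):
    """Real solutions of theta^2(phi) - sigma^2(phi) = 0.

    theta^2(phi) = c_t - 2*phi*b_t + phi^2*a_t
    sigma^2(phi) = (c - 2*phi*b + phi^2*a) / (T - 1)
    """
    T1 = 1.0 / (T - 1)
    A, B, C = a_t - T1 * a, b_t - T1 * b, c_t - T1 * c  # A*phi^2 - 2*B*phi + C = 0
    if abs(A) < tol:
        # Linear case: -2*B*phi + C = 0 has a single solution.
        return [C / (2 * B)]
    disc = B * B - A * C
    if disc < 0:
        return []  # no real corner solution
    root = math.sqrt(disc)
    return sorted([(B - root) / A, (B + root) / A])


# T = 2 TML case: a_t = a, b_t = b + 2a, c_t = c + 4(a + b) (made-up a, b, c).
a, b, c = 2.0, 0.5, 1.5
single = corner_candidates(a, b + 2 * a, c + 4 * (a + b), a, b, c, T=2)
```

In the T = 2 TML case the single candidate reduces to 1 + b/a, in line with the proof of Corollary 3.4.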

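Proof of Corollary 3.5. Note that the discriminant $D = 1 - S_{22.1}/S_{11}$, where $S_{22.1} \equiv S_{22} - S_{12}^2/S_{11}$ and $S_{ij}$ is the (i, j) element of the matrix $S = \sum_{i=1}^{N} (\Delta y_{i,1}, \Delta y_{i,2})'(\Delta y_{i,1}, \Delta y_{i,2})$. Under joint normality the elements $S_{22.1}$ and $S_{11}$ are independent $\chi^2(\cdot)$ random variables with respectively N − 1 and N degrees of freedom. The main result follows after observing that

\[ E[\operatorname{vech} S] = \left(\operatorname{var}(\Delta y_{i,1}),\; \phi_0 \operatorname{var}(\Delta y_{i,1}) - \sigma_0^2,\; \phi_0^2 \operatorname{var}(\Delta y_{i,1}) + 2\sigma_0^2(1 - \phi_0)\right)'. \]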
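The moment relations used in this proof can be illustrated with a small simulation. The sketch below generates an AR(1) panel with individual effects for three time points and compares the sample second moments of (Δy_{i,1}, Δy_{i,2}), averaged over units, with the stated expressions. The distributions of y_{i,0} and η_i, the parameter values, the sample size and the function name are all assumptions chosen for illustration, and the moments are reported per unit (that is, S divided by N).

```python
import random


def simulate_moments(phi0=0.5, sigma0=1.0, n=100_000, seed=0):
    """Average second moments of (dy1, dy2) in a simulated AR(1) panel."""
    rng = random.Random(seed)
    s11 = s12 = s22 = 0.0
    for _ in range(n):
        y0 = rng.gauss(0.0, 1.0)    # initial observation (illustrative choice)
        eta = rng.gauss(0.0, 1.0)   # individual effect (illustrative choice)
        e1 = rng.gauss(0.0, sigma0)
        e2 = rng.gauss(0.0, sigma0)
        y1 = phi0 * y0 + eta + e1
        y2 = phi0 * y1 + eta + e2
        d1, d2 = y1 - y0, y2 - y1   # d2 = phi0 * d1 + e2 - e1
        s11 += d1 * d1
        s12 += d1 * d2
        s22 += d2 * d2
    return s11 / n, s12 / n, s22 / n


phi0, sigma0 = 0.5, 1.0
m11, m12, m22 = simulate_moments(phi0, sigma0)
# Implied values given the simulated variance of dy1:
implied_12 = phi0 * m11 - sigma0**2
implied_22 = phi0**2 * m11 + 2 * sigma0**2 * (1 - phi0)
```

Up to simulation noise, `m12` and `m22` should be close to `implied_12` and `implied_22`.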

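Proof of Proposition 3.6. Observe that for any value of φ (see e.g. Chapter 2)

\begin{align*}
\sigma_E^2(x) &\equiv E[\sigma^2(\phi)] = \sigma_0^2 + \frac{1}{T - 1}\left(x^2 a_E - x\,\frac{2\xi}{T}\,\sigma_0^2\right), \\
\theta_E^2(x) &\equiv E[\theta^2(\phi)] = \theta_0^2 + x^2 \tilde a_E + x\,\frac{2\xi}{T}\,\theta_0^2,
\end{align*}

with $a_E$, $\tilde a_E$, ξ, x defined in (3.28). It is not difficult to see that the asymptotic polynomial is given by

\[ \theta_E^2(x)\left(a_E x - \frac{\sigma_0^2 \xi}{T}\right) + \sigma_E^2(x)\left(\tilde a_E x + \frac{\theta_0^2 \xi}{T}\right) = 0. \]

Plugging the expressions for $\sigma_E^2(x)$ and $\theta_E^2(x)$ into the previous formula gives

\[ x\left(\left[\theta_E^2(x) a_E + \sigma_E^2(x)\tilde a_E\right] + \frac{x}{T}\,\xi\left[\theta_0^2\,\frac{a_E}{T - 1} - \sigma_0^2\,\tilde a_E\right] - \frac{2\xi^2}{T(T - 1)}\,\theta_0^2\sigma_0^2\right) = 0. \]

Note that

\[ \theta_E^2(x) a_E + \sigma_E^2(x)\tilde a_E = x^2\left(1 + \frac{1}{T - 1}\right)(a_E \tilde a_E) + x\,\frac{2\xi}{T}\left(\theta_0^2 a_E - \frac{1}{T - 1}\,\sigma_0^2 \tilde a_E\right) + \left(\theta_0^2 a_E + \sigma_0^2 \tilde a_E\right). \]

Combining both expressions and removing the trivial solution x = 0 we get

\[ x^2\,\frac{T}{T - 1}\,(a_E \tilde a_E) + x\left(\frac{1}{T}\,\xi\left(\theta_0^2\,\frac{a_E}{T - 1} - \sigma_0^2\,\tilde a_E\right) + \frac{2\xi}{T}\left(\theta_0^2 a_E - \frac{1}{T - 1}\,\sigma_0^2 \tilde a_E\right)\right) + \left(\theta_0^2 a_E + \sigma_0^2 \tilde a_E\right) - \frac{2\xi^2}{T(T - 1)}\,\theta_0^2\sigma_0^2 = 0. \]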
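The factorization in this proof can be checked numerically. The sketch below evaluates the asymptotic polynomial in its original form and in the factored form (x times a quadratic in x) for arbitrary inputs. The argument names (`a` for the coefficient entering σ²_E, `a_t` for the one entering θ²_E, `s0` for σ₀², `th0` for θ₀²) and the test values are assumptions for this illustration only.

```python
def sigma2_E(x, a, xi, T, s0):
    """E[sigma^2(phi)] as a function of x."""
    return s0 + (x * x * a - x * (2 * xi / T) * s0) / (T - 1)


def theta2_E(x, a_t, xi, T, th0):
    """E[theta^2(phi)] as a function of x."""
    return th0 + x * x * a_t + x * (2 * xi / T) * th0


def foc_polynomial(x, a, a_t, xi, T, s0, th0):
    """Asymptotic polynomial in its original (unexpanded) form."""
    return (theta2_E(x, a_t, xi, T, th0) * (a * x - s0 * xi / T)
            + sigma2_E(x, a, xi, T, s0) * (a_t * x + th0 * xi / T))


def factored_polynomial(x, a, a_t, xi, T, s0, th0):
    """x times the quadratic obtained after removing the trivial root x = 0."""
    quad = (x * x * T / (T - 1) * a * a_t
            + x * ((xi / T) * (th0 * a / (T - 1) - s0 * a_t)
                   + (2 * xi / T) * (th0 * a - s0 * a_t / (T - 1)))
            + (th0 * a + s0 * a_t)
            - 2 * xi**2 / (T * (T - 1)) * th0 * s0)
    return x * quad


# Made-up inputs used only to compare the two forms.
params = dict(a=1.3, a_t=0.8, xi=2.5, T=3, s0=1.0, th0=4.0)
differences = [foc_polynomial(x, **params) - factored_polynomial(x, **params)
               for x in (-0.7, 0.0, 0.3, 1.2)]
```

Both forms agree up to floating-point error, and x = 0 is a root of the original polynomial, which is the trivial solution removed in the last step.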

3.B Tables


Table 3.3: Estimation Results for N = 50, T = 3.

        γ = 0.5, σµ = 1              | γ = 0.5, σµ = 3              | γ = 1.0, σµ = 1              | γ = 1.0, σµ = 3
           Mean Median    IQR   RMSE |    Mean Median    IQR   RMSE |    Mean Median    IQR   RMSE |    Mean Median    IQR   RMSE
φ = 0.5
TMLg       0.72   0.68   0.62   0.40 |    0.82   0.66   0.83   0.53 |    0.69   0.66   0.57   0.38 |    0.69   0.66   0.57   0.38
RMLg       0.57   0.52   0.33   0.26 |    0.75   0.58   0.77   0.47 |    0.55   0.51   0.32   0.25 |    0.64   0.58   0.51   0.34
TMLl       0.54   0.50   0.32   0.23 |    0.52   0.49   0.20   0.18 |    0.54   0.50   0.34   0.24 |    0.54   0.50   0.34   0.24
RMLl       0.53   0.50   0.28   0.22 |    0.53   0.50   0.20   0.19 |    0.53   0.49   0.29   0.22 |    0.54   0.49   0.33   0.25
TMLb       0.51   0.50   0.30   0.19 |    0.52   0.49   0.20   0.16 |    0.50   0.50   0.30   0.19 |    0.50   0.50   0.30   0.19
RMLb       0.49   0.49   0.24   0.16 |    0.52   0.50   0.20   0.16 |    0.47   0.48   0.22   0.15 |    0.50   0.49   0.28   0.18
φ = 0.8
TMLg       0.90   0.93   0.36   0.28 |    0.95   0.97   0.41   0.31 |    0.90   0.92   0.36   0.28 |    0.90   0.92   0.36   0.28
RMLg       0.83   0.82   0.36   0.24 |    0.95   0.96   0.44   0.31 |    0.82   0.82   0.36   0.24 |    0.86   0.87   0.40   0.27
TMLl       0.76   0.77   0.34   0.21 |    0.78   0.78   0.34   0.20 |    0.75   0.77   0.34   0.21 |    0.75   0.77   0.34   0.21
RMLl       0.78   0.77   0.33   0.22 |    0.79   0.78   0.35   0.22 |    0.77   0.77   0.33   0.22 |    0.77   0.77   0.34   0.23
TMLb       0.73   0.76   0.29   0.19 |    0.77   0.78   0.31   0.19 |    0.73   0.75   0.29   0.19 |    0.73   0.75   0.29   0.19
RMLb       0.71   0.73   0.22   0.17 |    0.76   0.78   0.30   0.18 |    0.70   0.73   0.22   0.18 |    0.72   0.74   0.26   0.19
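The four summary statistics reported in this table (mean, median, IQR and RMSE) can be computed from a set of Monte Carlo estimates as sketched below. This is a hedged illustration: the function name and the toy input are made up, and the quartile convention (`method="inclusive"`) is an assumption, since the text does not state which convention was used for the IQR.

```python
import math
from statistics import mean, median, quantiles


def summarize(estimates, true_value):
    """Mean, median, interquartile range and RMSE of Monte Carlo estimates."""
    q1, _, q3 = quantiles(estimates, n=4, method="inclusive")
    rmse = math.sqrt(mean((e - true_value) ** 2 for e in estimates))
    return {
        "mean": mean(estimates),
        "median": median(estimates),
        "iqr": q3 - q1,
        "rmse": rmse,
    }


# Toy input (made-up draws around a true value of 0.5).
toy_summary = summarize([0.4, 0.5, 0.6, 0.7, 0.8], true_value=0.5)
```

The RMSE is measured around the true parameter value, so it combines the bias and the dispersion of the estimator, whereas the IQR captures dispersion only.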














Table 3.7: t-test results for N = 50, T = 3.

        γ = 0.5, σµ = 1          | γ = 0.5, σµ = 3          | γ = 1.0, σµ = 1          | γ = 1.0, σµ = 3
φ − φ0   -.2  -.1   .0   .1   .2 |  -.2  -.1   .0   .1   .2 |  -.2  -.1   .0   .1   .2 |  -.2  -.1   .0   .1   .2
φ = 0.5
TMLg    0.47 0.44 0.44 0.46 0.46 | 0.49 0.44 0.46 0.54 0.63 | 0.46 0.43 0.42 0.42 0.43 | 0.46 0.43 0.42 0.42 0.43
RMLg    0.32 0.25 0.23 0.28 0.36 | 0.45 0.36 0.38 0.46 0.58 | 0.31 0.24 0.22 0.26 0.34 | 0.40 0.36 0.35 0.36 0.40
TMLl    0.23 0.20 0.22 0.26 0.32 | 0.21 0.08 0.10 0.22 0.38 | 0.25 0.22 0.23 0.26 0.32 | 0.25 0.22 0.23 0.26 0.32
RMLl    0.26 0.19 0.18 0.23 0.33 | 0.23 0.10 0.11 0.22 0.39 | 0.27 0.20 0.19 0.23 0.32 | 0.26 0.22 0.22 0.26 0.33
TMLb    0.17 0.13 0.14 0.20 0.28 | 0.20 0.07 0.09 0.20 0.37 | 0.17 0.14 0.14 0.19 0.27 | 0.17 0.14 0.14 0.19 0.27
RMLb    0.14 0.07 0.07 0.16 0.28 | 0.21 0.07 0.08 0.19 0.36 | 0.13 0.06 0.07 0.16 0.29 | 0.15 0.10 0.11 0.16 0.27
φ = 0.8
TMLg    0.56 0.47 0.36 0.28 0.27 | 0.58 0.52 0.44 0.33 0.30 | 0.56 0.47 0.35 0.27 0.27 | 0.56 0.47 0.35 0.27 0.27
RMLg    0.40 0.31 0.26 0.27 0.33 | 0.55 0.48 0.40 0.33 0.32 | 0.39 0.31 0.26 0.27 0.33 | 0.46 0.38 0.31 0.28 0.31
TMLl    0.34 0.28 0.22 0.21 0.29 | 0.33 0.29 0.25 0.22 0.28 | 0.34 0.28 0.21 0.21 0.29 | 0.34 0.28 0.21 0.21 0.29
RMLl    0.32 0.25 0.21 0.25 0.33 | 0.32 0.27 0.22 0.22 0.29 | 0.32 0.24 0.21 0.24 0.33 | 0.33 0.25 0.21 0.23 0.32
TMLb    0.29 0.24 0.20 0.21 0.32 | 0.30 0.27 0.24 0.21 0.29 | 0.29 0.23 0.20 0.21 0.33 | 0.29 0.23 0.20 0.21 0.33
RMLb    0.10 0.07 0.10 0.19 0.33 | 0.23 0.18 0.15 0.19 0.28 | 0.10 0.07 0.11 0.20 0.34 | 0.17 0.11 0.13 0.19 0.32

Table 3.8: t-test results for N = 50, T = 7.

        γ = 0.5, σµ = 1          | γ = 0.5, σµ = 3          | γ = 1.0, σµ = 1          | γ = 1.0, σµ = 3
φ − φ0   -.2  -.1   .0   .1   .2 |  -.2  -.1   .0   .1   .2 |  -.2  -.1   .0   .1   .2 |  -.2  -.1   .0   .1   .2
φ = 0.5
TMLg    0.79 0.27 0.08 0.35 0.77 | 0.90 0.38 0.11 0.44 0.88 | 0.77 0.26 0.08 0.34 0.76 | 0.77 0.26 0.08 0.34 0.76
RMLg    0.79 0.27 0.06 0.33 0.76 | 0.90 0.38 0.10 0.44 0.88 | 0.77 0.26 0.06 0.32 0.75 | 0.77 0.26 0.07 0.33 0.75
TMLl    0.79 0.26 0.05 0.32 0.76 | 0.90 0.37 0.06 0.40 0.87 | 0.77 0.25 0.05 0.31 0.74 | 0.77 0.25 0.05 0.31 0.74
RMLl    0.79 0.26 0.06 0.32 0.76 | 0.90 0.38 0.06 0.40 0.87 | 0.77 0.25 0.06 0.32 0.74 | 0.77 0.25 0.05 0.31 0.74
TMLb    0.79 0.26 0.05 0.32 0.76 | 0.90 0.37 0.06 0.40 0.87 | 0.77 0.25 0.05 0.31 0.74 | 0.77 0.25 0.05 0.31 0.74
RMLb    0.79 0.26 0.06 0.32 0.76 | 0.90 0.38 0.06 0.40 0.87 | 0.77 0.25 0.05 0.32 0.75 | 0.77 0.24 0.05 0.31 0.74
φ = 0.8
TMLg    0.64 0.47 0.42 0.43 0.51 | 0.72 0.50 0.49 0.54 0.57 | 0.64 0.46 0.41 0.42 0.51 | 0.64 0.46 0.41 0.42 0.51
RMLg    0.61 0.32 0.21 0.34 0.63 | 0.72 0.50 0.48 0.54 0.59 | 0.60 0.32 0.21 0.34 0.62 | 0.62 0.39 0.32 0.38 0.55
TMLl    0.51 0.27 0.24 0.32 0.55 | 0.59 0.21 0.19 0.35 0.58 | 0.50 0.27 0.24 0.31 0.56 | 0.50 0.27 0.24 0.31 0.56
RMLl    0.58 0.29 0.18 0.33 0.63 | 0.59 0.21 0.19 0.35 0.58 | 0.57 0.29 0.18 0.33 0.63 | 0.52 0.28 0.22 0.32 0.58
TMLb    0.44 0.18 0.13 0.27 0.56 | 0.56 0.17 0.14 0.31 0.58 | 0.43 0.17 0.12 0.27 0.57 | 0.43 0.17 0.12 0.27 0.57
RMLb    0.46 0.12 0.07 0.30 0.68 | 0.54 0.15 0.12 0.30 0.58 | 0.45 0.11 0.06 0.31 0.69 | 0.42 0.14 0.09 0.27 0.59


Table 3.9: t-test results for N = 250, T = 3.

        γ = 0.5, σµ = 1          | γ = 0.5, σµ = 3          | γ = 1.0, σµ = 1          | γ = 1.0, σµ = 3
φ − φ0   -.2  -.1   .0   .1   .2 |  -.2  -.1   .0   .1   .2 |  -.2  -.1   .0   .1   .2 |  -.2  -.1   .0   .1   .2
φ = 0.5
TMLg    0.60 0.28 0.28 0.43 0.64 | 0.91 0.44 0.31 0.56 0.86 | 0.53 0.27 0.27 0.40 0.59 | 0.53 0.27 0.27 0.40 0.59
RMLg    0.57 0.15 0.08 0.26 0.59 | 0.91 0.36 0.15 0.44 0.83 | 0.53 0.15 0.08 0.25 0.56 | 0.50 0.20 0.18 0.33 0.56
TMLl    0.49 0.09 0.09 0.28 0.54 | 0.90 0.29 0.05 0.36 0.79 | 0.40 0.11 0.11 0.27 0.51 | 0.40 0.11 0.11 0.27 0.51
RMLl    0.56 0.14 0.08 0.26 0.58 | 0.90 0.29 0.05 0.36 0.79 | 0.52 0.15 0.08 0.25 0.56 | 0.43 0.12 0.11 0.27 0.52
TMLb    0.47 0.07 0.07 0.25 0.52 | 0.90 0.29 0.05 0.36 0.79 | 0.38 0.08 0.08 0.23 0.49 | 0.38 0.08 0.08 0.23 0.49
RMLb    0.55 0.12 0.06 0.25 0.58 | 0.90 0.29 0.05 0.36 0.79 | 0.50 0.12 0.05 0.24 0.56 | 0.40 0.08 0.06 0.23 0.50
φ = 0.8
TMLg    0.59 0.53 0.43 0.29 0.35 | 0.59 0.55 0.51 0.43 0.36 | 0.58 0.51 0.41 0.28 0.36 | 0.58 0.51 0.41 0.28 0.36
RMLg    0.45 0.28 0.20 0.26 0.46 | 0.57 0.52 0.47 0.40 0.38 | 0.44 0.28 0.19 0.26 0.46 | 0.50 0.39 0.29 0.29 0.43
TMLl    0.38 0.33 0.27 0.23 0.40 | 0.29 0.26 0.27 0.29 0.37 | 0.38 0.33 0.26 0.23 0.41 | 0.38 0.33 0.26 0.23 0.41
RMLl    0.42 0.25 0.18 0.25 0.46 | 0.31 0.27 0.25 0.26 0.38 | 0.42 0.26 0.17 0.25 0.46 | 0.40 0.30 0.22 0.26 0.44
TMLb    0.33 0.28 0.23 0.22 0.45 | 0.27 0.25 0.26 0.29 0.39 | 0.33 0.27 0.21 0.22 0.46 | 0.33 0.27 0.21 0.22 0.46
RMLb    0.25 0.06 0.07 0.22 0.50 | 0.26 0.21 0.20 0.24 0.38 | 0.24 0.05 0.07 0.23 0.51 | 0.27 0.14 0.09 0.21 0.45

Table 3.10: t-test results for N = 250, T = 7.

        γ = 0.5, σµ = 1          | γ = 0.5, σµ = 3          | γ = 1.0, σµ = 1          | γ = 1.0, σµ = 3
φ − φ0   -.2  -.1   .0   .1   .2 |  -.2  -.1   .0   .1   .2 |  -.2  -.1   .0   .1   .2 |  -.2  -.1   .0   .1   .2
φ = 0.5
TMLg    0.99 0.86 0.05 0.84 0.99 | 1.00 0.94 0.05 0.94 1.00 | 0.99 0.85 0.05 0.82 0.99 | 0.99 0.85 0.05 0.82 0.99
RMLg    0.99 0.87 0.05 0.85 0.99 | 1.00 0.94 0.05 0.94 1.00 | 0.99 0.85 0.05 0.83 0.99 | 0.99 0.85 0.05 0.82 0.99
TMLl    0.99 0.86 0.05 0.84 0.99 | 1.00 0.94 0.05 0.94 1.00 | 0.99 0.85 0.05 0.82 0.99 | 0.99 0.85 0.05 0.82 0.99
RMLl    0.99 0.87 0.05 0.85 0.99 | 1.00 0.94 0.05 0.94 1.00 | 0.99 0.85 0.05 0.83 0.99 | 0.99 0.85 0.05 0.82 0.99
TMLb    0.99 0.86 0.05 0.84 0.99 | 1.00 0.94 0.05 0.94 1.00 | 0.99 0.85 0.05 0.82 0.99 | 0.99 0.85 0.05 0.82 0.99
RMLb    0.99 0.87 0.05 0.85 0.99 | 1.00 0.94 0.05 0.94 1.00 | 0.99 0.85 0.05 0.83 0.99 | 0.99 0.85 0.05 0.82 0.99
φ = 0.8
TMLg    0.99 0.57 0.24 0.62 0.80 | 0.99 0.75 0.32 0.75 0.90 | 0.99 0.54 0.24 0.60 0.80 | 0.99 0.54 0.24 0.60 0.80
RMLg    0.99 0.58 0.08 0.61 0.92 | 0.99 0.74 0.30 0.75 0.91 | 0.98 0.56 0.08 0.60 0.92 | 0.99 0.52 0.14 0.58 0.85
TMLl    0.98 0.46 0.11 0.55 0.87 | 0.99 0.68 0.06 0.64 0.93 | 0.98 0.44 0.11 0.53 0.86 | 0.98 0.44 0.11 0.53 0.86
RMLl    0.98 0.57 0.07 0.61 0.92 | 0.99 0.68 0.06 0.64 0.93 | 0.98 0.56 0.08 0.60 0.92 | 0.98 0.48 0.10 0.55 0.87
TMLb    0.96 0.43 0.06 0.52 0.87 | 0.99 0.68 0.05 0.63 0.93 | 0.95 0.40 0.07 0.51 0.87 | 0.95 0.40 0.07 0.51 0.87
RMLb    0.98 0.56 0.04 0.62 0.96 | 0.99 0.67 0.05 0.63 0.93 | 0.98 0.54 0.04 0.63 0.96 | 0.95 0.44 0.06 0.53 0.88


Table 3.11: LR test results for N = 50, T = 3.

        γ = 0.5, σν = 1          | γ = 0.5, σν = 3          | γ = 1.0, σν = 1          | γ = 1.0, σν = 3
φ − φ0   -.2  -.1   .0   .1   .2 |  -.2  -.1   .0   .1   .2 |  -.2  -.1   .0   .1   .2 |  -.2  -.1   .0   .1   .2
φ = 0.5
TMLg    0.25 0.09 0.05 0.09 0.13 | 0.42 0.17 0.08 0.13 0.24 | 0.22 0.08 0.05 0.08 0.12 | 0.22 0.08 0.05 0.08 0.12
RMLg    0.25 0.10 0.07 0.09 0.14 | 0.42 0.18 0.10 0.14 0.25 | 0.23 0.10 0.06 0.08 0.13 | 0.24 0.10 0.06 0.09 0.12
TMLl    0.22 0.07 0.03 0.06 0.09 | 0.36 0.12 0.04 0.08 0.19 | 0.20 0.06 0.03 0.05 0.09 | 0.20 0.06 0.03 0.05 0.09
RMLl    0.24 0.10 0.06 0.08 0.12 | 0.36 0.13 0.05 0.09 0.19 | 0.22 0.09 0.06 0.07 0.13 | 0.21 0.08 0.04 0.06 0.10
TMLb    0.20 0.06 0.03 0.06 0.09 | 0.36 0.12 0.04 0.08 0.19 | 0.18 0.05 0.03 0.05 0.09 | 0.18 0.05 0.03 0.05 0.09
RMLb    0.21 0.06 0.03 0.06 0.12 | 0.36 0.12 0.04 0.09 0.19 | 0.18 0.05 0.03 0.06 0.12 | 0.18 0.05 0.03 0.05 0.09
φ = 0.8
TMLg    0.05 0.02 0.03 0.04 0.04 | 0.09 0.03 0.03 0.05 0.05 | 0.04 0.02 0.04 0.04 0.04 | 0.04 0.02 0.04 0.04 0.04
RMLg    0.12 0.06 0.06 0.06 0.08 | 0.11 0.04 0.04 0.06 0.06 | 0.11 0.06 0.06 0.06 0.08 | 0.08 0.04 0.05 0.06 0.05
TMLl    0.04 0.01 0.02 0.03 0.03 | 0.09 0.02 0.02 0.03 0.04 | 0.04 0.01 0.02 0.03 0.02 | 0.04 0.01 0.02 0.03 0.02
RMLl    0.11 0.05 0.04 0.05 0.07 | 0.10 0.02 0.02 0.04 0.04 | 0.10 0.05 0.04 0.05 0.07 | 0.06 0.02 0.03 0.04 0.04
TMLb    0.04 0.01 0.02 0.03 0.03 | 0.08 0.02 0.02 0.03 0.04 | 0.04 0.01 0.02 0.03 0.02 | 0.04 0.01 0.02 0.03 0.02
RMLb    0.05 0.01 0.02 0.04 0.07 | 0.09 0.02 0.02 0.04 0.04 | 0.05 0.01 0.02 0.04 0.07 | 0.04 0.01 0.02 0.03 0.04

Table 3.12: LR test results for N = 50, T = 7.

        γ = 0.5, σν = 1          | γ = 0.5, σν = 3          | γ = 1.0, σν = 1          | γ = 1.0, σν = 3
φ − φ0   -.2  -.1   .0   .1   .2 |  -.2  -.1   .0   .1   .2 |  -.2  -.1   .0   .1   .2 |  -.2  -.1   .0   .1   .2
φ = 0.5
TMLg    0.82 0.30 0.06 0.25 0.65 | 0.91 0.39 0.06 0.34 0.83 | 0.80 0.28 0.05 0.24 0.63 | 0.80 0.28 0.05 0.24 0.63
RMLg    0.82 0.30 0.04 0.25 0.67 | 0.91 0.39 0.06 0.34 0.82 | 0.80 0.28 0.04 0.24 0.66 | 0.80 0.28 0.05 0.24 0.63
TMLl    0.82 0.29 0.04 0.24 0.64 | 0.91 0.38 0.05 0.32 0.81 | 0.80 0.28 0.04 0.23 0.62 | 0.80 0.28 0.04 0.23 0.62
RMLl    0.82 0.30 0.04 0.24 0.66 | 0.91 0.38 0.05 0.32 0.81 | 0.80 0.28 0.04 0.24 0.66 | 0.80 0.28 0.04 0.23 0.63
TMLb    0.82 0.29 0.04 0.24 0.64 | 0.91 0.38 0.05 0.32 0.81 | 0.80 0.28 0.04 0.23 0.62 | 0.80 0.28 0.04 0.23 0.62
RMLb    0.82 0.30 0.04 0.24 0.66 | 0.91 0.38 0.05 0.32 0.81 | 0.80 0.28 0.04 0.24 0.66 | 0.80 0.28 0.04 0.23 0.63
φ = 0.8
TMLg    0.70 0.24 0.06 0.12 0.12 | 0.79 0.30 0.06 0.16 0.19 | 0.69 0.22 0.06 0.12 0.12 | 0.69 0.22 0.06 0.12 0.12
RMLg    0.70 0.24 0.07 0.15 0.42 | 0.79 0.32 0.08 0.18 0.21 | 0.69 0.23 0.06 0.14 0.43 | 0.69 0.23 0.07 0.13 0.20
TMLl    0.66 0.21 0.04 0.09 0.10 | 0.76 0.25 0.04 0.12 0.15 | 0.65 0.20 0.04 0.09 0.10 | 0.65 0.20 0.04 0.09 0.10
RMLl    0.69 0.23 0.06 0.14 0.42 | 0.76 0.25 0.05 0.12 0.16 | 0.68 0.22 0.06 0.14 0.43 | 0.66 0.21 0.05 0.11 0.19
TMLb    0.66 0.18 0.03 0.09 0.10 | 0.76 0.24 0.04 0.12 0.15 | 0.65 0.17 0.03 0.09 0.10 | 0.65 0.17 0.03 0.09 0.10
RMLb    0.69 0.18 0.03 0.14 0.43 | 0.76 0.24 0.04 0.12 0.16 | 0.68 0.17 0.02 0.13 0.43 | 0.66 0.18 0.03 0.10 0.19


Table 3.13: LR test results for N = 250, T = 3.

        γ = 0.5, σν = 1          | γ = 0.5, σν = 3          | γ = 1.0, σν = 1          | γ = 1.0, σν = 3
φ − φ0   -.2  -.1   .0   .1   .2 |  -.2  -.1   .0   .1   .2 |  -.2  -.1   .0   .1   .2 |  -.2  -.1   .0   .1   .2
φ = 0.5
TMLg    0.73 0.26 0.09 0.18 0.36 | 0.93 0.43 0.09 0.33 0.73 | 0.68 0.23 0.08 0.16 0.31 | 0.68 0.23 0.08 0.16 0.31
RMLg    0.73 0.23 0.05 0.17 0.42 | 0.93 0.40 0.08 0.31 0.72 | 0.70 0.21 0.05 0.16 0.41 | 0.67 0.22 0.07 0.15 0.31
TMLl    0.70 0.21 0.06 0.14 0.32 | 0.93 0.37 0.05 0.28 0.69 | 0.65 0.19 0.06 0.13 0.27 | 0.65 0.19 0.06 0.13 0.27
RMLl    0.73 0.23 0.05 0.16 0.42 | 0.93 0.38 0.05 0.28 0.70 | 0.69 0.21 0.05 0.16 0.41 | 0.66 0.20 0.06 0.14 0.29
TMLb    0.70 0.21 0.05 0.14 0.32 | 0.93 0.37 0.05 0.28 0.69 | 0.65 0.19 0.05 0.13 0.27 | 0.65 0.19 0.05 0.13 0.27
RMLb    0.73 0.23 0.05 0.16 0.42 | 0.93 0.38 0.05 0.28 0.69 | 0.69 0.20 0.04 0.15 0.41 | 0.66 0.19 0.05 0.13 0.29
φ = 0.8
TMLg    0.33 0.04 0.03 0.06 0.05 | 0.47 0.10 0.04 0.08 0.09 | 0.31 0.03 0.04 0.06 0.04 | 0.31 0.03 0.04 0.06 0.04
RMLg    0.43 0.12 0.05 0.09 0.23 | 0.49 0.11 0.04 0.09 0.09 | 0.41 0.12 0.05 0.09 0.23 | 0.37 0.09 0.05 0.07 0.09
TMLl    0.32 0.04 0.02 0.04 0.04 | 0.45 0.09 0.02 0.06 0.07 | 0.30 0.03 0.02 0.04 0.03 | 0.30 0.03 0.02 0.04 0.03
RMLl    0.42 0.12 0.05 0.09 0.23 | 0.46 0.10 0.03 0.06 0.08 | 0.41 0.11 0.05 0.09 0.23 | 0.35 0.07 0.04 0.06 0.08
TMLb    0.31 0.04 0.02 0.04 0.04 | 0.45 0.09 0.02 0.06 0.07 | 0.29 0.03 0.02 0.04 0.03 | 0.29 0.03 0.02 0.04 0.03
RMLb    0.39 0.06 0.02 0.08 0.23 | 0.45 0.09 0.03 0.06 0.08 | 0.37 0.05 0.02 0.08 0.24 | 0.32 0.04 0.02 0.05 0.08

Table 3.14: LR test results for N = 250, T = 7.

        γ = 0.5, σν = 1          | γ = 0.5, σν = 3          | γ = 1.0, σν = 1          | γ = 1.0, σν = 3
φ − φ0   -.2  -.1   .0   .1   .2 |  -.2  -.1   .0   .1   .2 |  -.2  -.1   .0   .1   .2 |  -.2  -.1   .0   .1   .2
φ = 0.5
TMLg    0.99 0.87 0.04 0.82 0.99 | 0.99 0.95 0.05 0.93 1.00 | 0.99 0.86 0.05 0.80 0.99 | 0.99 0.86 0.05 0.80 0.99
RMLg    0.99 0.88 0.04 0.83 0.99 | 0.99 0.95 0.04 0.93 1.00 | 0.99 0.86 0.04 0.81 0.99 | 0.99 0.86 0.05 0.80 0.99
TMLl    0.99 0.87 0.04 0.82 0.99 | 0.99 0.95 0.05 0.93 1.00 | 0.99 0.86 0.05 0.80 0.99 | 0.99 0.86 0.05 0.80 0.99
RMLl    0.99 0.88 0.04 0.83 0.99 | 0.99 0.95 0.04 0.93 1.00 | 0.99 0.86 0.04 0.81 0.99 | 0.99 0.86 0.05 0.80 0.99
TMLb    0.99 0.87 0.04 0.82 0.99 | 0.99 0.95 0.05 0.93 1.00 | 0.99 0.86 0.05 0.80 0.99 | 0.99 0.86 0.05 0.80 0.99
RMLb    0.99 0.88 0.04 0.83 0.99 | 0.99 0.95 0.04 0.93 1.00 | 0.99 0.86 0.04 0.81 0.99 | 0.99 0.86 0.05 0.80 0.99
φ = 0.8
TMLg    0.99 0.69 0.08 0.33 0.42 | 0.99 0.80 0.09 0.49 0.60 | 0.99 0.67 0.08 0.32 0.43 | 0.99 0.67 0.08 0.32 0.43
RMLg    0.99 0.71 0.05 0.49 0.94 | 0.99 0.80 0.10 0.50 0.62 | 0.99 0.70 0.05 0.49 0.95 | 0.99 0.67 0.06 0.35 0.69
TMLl    0.99 0.67 0.06 0.30 0.41 | 0.99 0.78 0.05 0.44 0.56 | 0.99 0.65 0.06 0.29 0.42 | 0.99 0.65 0.06 0.29 0.42
RMLl    0.99 0.71 0.05 0.49 0.94 | 0.99 0.78 0.05 0.44 0.57 | 0.99 0.70 0.05 0.49 0.95 | 0.99 0.66 0.06 0.34 0.69
TMLb    0.99 0.67 0.04 0.30 0.41 | 0.99 0.78 0.05 0.44 0.56 | 0.99 0.65 0.04 0.29 0.42 | 0.99 0.65 0.04 0.29 0.42
RMLb    0.99 0.71 0.03 0.49 0.93 | 0.99 0.78 0.05 0.44 0.57 | 0.99 0.70 0.03 0.49 0.94 | 0.99 0.66 0.04 0.34 0.69


3.C Figures

Figure 3.4: Average concentrated log-likelihood function for φ in AR(1) model. [Panels (a) T = 2, (b) T = 3, (c) T = 4, (d) T = 5, each showing the TMLE and RMLE functions; horizontal axis: φ from −0.50 to 2.00.]

Figure 3.5: Average concentrated log-likelihood function for φ in AR(1) model. [Panels (a) T = 6, (b) T = 7, (c) T = 8, (d) T = 9, each showing the TMLE and RMLE functions; horizontal axis: φ from −0.50 to 2.00.]

Chapter 4

Fixed T Dynamic Panel Data Estimators with Multi-Factor Errors

4.1 Introduction

There is a large literature on estimating dynamic panel data models with a two-way

error components structure and T fixed. Such models have been used in a wide range of

economic and financial applications; e.g. Euler equations for household consumption,

adjustment cost models for firms’ factor demand, and empirical models of economic

growth. In all these cases the autoregressive parameter has structural significance and

measures state dependence, which is due to the effect of habit formation,

technological/regulatory constraints, or imperfect information and uncertainty that often underlie

economic behavior and decision making in general.

Recently there has been a surge of interest in developing dynamic panel data estimators

that allow for richer error structures − mainly factor residuals. In this case standard

dynamic panel data estimators fail to provide consistent estimates of the parameters;

see e.g. Sarafidis and Robertson (2009), and Sarafidis and Wansbeek (2012) for a recent

overview. The multi-factor approach is appealing because it allows for multiple sources

of multiplicative unobserved heterogeneity, as opposed to the two-way error components

structure that represents additive heterogeneity. For example, in an empirical growth

model the factor component may reflect country-specific differences in the rate at which

countries absorb time-varying technological advances that are potentially available to

all of them. In a partial adjustment model of factor input prices, the factor component

may capture common shocks that hit all producers, albeit with different intensities. In

this study we provide a review of inference methods for dynamic panel data models

with multi-factor error structure.

The majority of estimators developed in the literature are based on the Generalized Method of Moments (GMM) approach. This is presumably because in microeconometric panels endogeneity of the regressors is often an issue of major importance. In particular, Ahn, Lee, and Schmidt (2013) extend Ahn, Lee, and Schmidt (2001) to the case of multiple factors, and propose a GMM estimator that relies on quasi-long-differencing to eliminate the common factor component. Nauges and Thomas (2003) utilise the quasi-differencing approach of Holtz-Eakin, Newey, and Rosen (1988), which is computationally tractable for the single factor case, and propose moment conditions similar to those of Ahn et al. (2001), mutatis mutandis. Sarafidis, Yamagata, and Robertson (2009) propose using the popular linear first-differenced and System GMM estimators with instruments based solely on strictly exogenous regressors. Robertson and Sarafidis (2015) develop a GMM approach that introduces new parameters which represent the unobserved covariances between the factor component of the error and the instruments. Furthermore, they show that given the model’s structure there exist restrictions on the nuisance parameters that lead to a more efficient GMM estimator compared to quasi-differencing approaches. Hayakawa (2012) shows that the moment conditions proposed by Ahn et al. (2013) can be linearized at the expense of introducing extra parameters. Finally, Bai (2013b) and Hayakawa (2012) suggest estimators that approximate the factor loadings using a Chamberlain (1982) type projection approach, with Quasi Maximum Likelihood estimators suggested in the former and a GMM estimator in the latter.

The objective of our study is to serve as a useful guide for practitioners who wish to

apply methods that allow for multiplicative sources of unobserved heterogeneity in their

model. All methods are analyzed using a unified notational approach, to the extent

that this is possible of course, and their properties are discussed under deviations from

a baseline set of assumptions commonly employed. We pay particular attention to

calculating the number of identifiable parameters correctly, which is a requirement for

asymptotically valid inferences and consistent model selection procedures. This issue

is often overlooked in the literature. Furthermore, we consider the extensibility of

these estimators to practical situations that may frequently arise, such as their ability


to accommodate unbalanced panels, and to estimate models with common observed

factors.

Next, we investigate the finite sample performance of the estimators under a number

of different designs. In particular, we examine (i) the effect of the presence of weakly

exogenous covariates, (ii) the effect of changing the magnitude of the correlation be-

tween the factor loadings of the dependent variable and those of the covariates, (iii) the

impact of the number of moment conditions on bias and size for GMM estimators, (iv)
the impact of different levels of persistence in the data, and (v) the effect of sample
size. These are important considerations with high empirical relevance. Nevertheless,
to the best of our knowledge they remain largely unexplored. For example, the

simulation study in Robertson and Sarafidis (2015) does not consider the effect of us-

ing a different number of instruments on the finite sample properties of the estimator.

In Ahn et al. (2013) the design focuses on strictly exogenous regressors, while in Bai

(2013b) the results reported do not include inference. The practical issue of how to

choose initial values for the non-linear algorithms is considered in the appendix. The

results of our simulation study indicate that there are non-negligible differences in the

finite sample performance of the estimators, depending on the parametrization consid-

ered. Naturally, no estimator dominates the remaining ones universally, although it is

fair to say that some estimators are more robust than others.

The outline of the rest of the paper is as follows. The next section introduces the dy-

namic panel data model with a multi-factor error structure and discusses some under-

lying assumptions that are commonly employed in the literature. Section 4.3 presents

a large range of dynamic panel estimators developed for such models when T is small,

and discusses several technical points regarding their properties. Section 4.4 provides

some general remarks on the estimators. Section 4.5 investigates the finite sample per-

formance of the estimators. A final section concludes. The appendix analyzes in detail

the implementation of all these methods.

In what follows we briefly introduce our notation. The usual vec(·) operator denotes the

column stacking operator, while vech(·) is the corresponding operator that stacks only

the elements on and below the main diagonal. The elimination matrix $B_a$ is defined
such that for any $[a \times a]$ matrix (not necessarily symmetric) $\operatorname{vech}(\cdot) = B_a \operatorname{vec}(\cdot)$. The
lag-operator matrix $L_T$ is defined such that for any $[T \times 1]$ vector $x = (x_1, \ldots, x_T)'$,
$L_T x = (0, x_1, \ldots, x_{T-1})'$. The shorthand notation $x_{i,s:k}$, $s \le k$, is used to denote vectors
of the form $x_{i,s:k} = (x_{i,s}, \ldots, x_{i,k})'$. The $j$th column of the $[x \times x]$ identity matrix is
denoted by $e_j$. Finally, $1(\cdot)$ is the usual indicator function. For further details regarding
the notation used in this paper see Abadir and Magnus (2002).
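The operators introduced above can be illustrated numerically. The following is a minimal sketch, assuming NumPy; the function names (`vec`, `vech`, `elimination_matrix`, `lag_matrix`) are illustrative choices and do not come from the text.

```python
import numpy as np

def vec(A):
    """Stack the columns of A into one long vector."""
    return np.asarray(A).reshape(-1, order="F")

def vech(A):
    """Stack the elements on and below the main diagonal, column by column."""
    A = np.asarray(A)
    rows, cols = np.tril_indices(A.shape[0])
    order = np.lexsort((rows, cols))  # primary key: column, secondary key: row
    return A[rows[order], cols[order]]

def elimination_matrix(a):
    """Return B_a such that vech(A) = B_a @ vec(A) for any a x a matrix A."""
    rows, cols = np.tril_indices(a)
    order = np.lexsort((rows, cols))
    rows, cols = rows[order], cols[order]
    B = np.zeros((a * (a + 1) // 2, a * a))
    # Element A[r, c] sits at position c * a + r of vec(A).
    B[np.arange(len(rows)), cols * a + rows] = 1.0
    return B

def lag_matrix(T):
    """Return L_T such that L_T @ x = (0, x_1, ..., x_{T-1})'."""
    return np.eye(T, k=-1)
```

Note that `elimination_matrix` does not require the input matrix to be symmetric, in line with the definition above.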

4.2 Theoretical setup

We consider the following dynamic panel data model with a multi-factor error structure

\[
y_{i,t} = \alpha y_{i,t-1} + \sum_{k=1}^{K} \beta_k x^{(k)}_{i,t} + \lambda_i' f_t + \varepsilon_{i,t}; \quad i = 1, \ldots, N, \; t = 1, \ldots, T, \tag{4.1}
\]

where the dimension of the unobserved components λi and ft is [L× 1]. Stacking the

observations over time for each individual i yields

\[
y_i = \alpha y_{i,-1} + \sum_{k=1}^{K} \beta_k x^{(k)}_i + F \lambda_i + \varepsilon_i,
\]

where $y_i = (y_{i,1}, \ldots, y_{i,T})'$ and similarly for $(y_{i,-1}, x^{(k)}_i)$, while $F = (f_1, \ldots, f_T)'$ is

of dimension [T × L]. In what follows we list some assumptions that are commonly

employed in the literature, followed by some preliminary discussion. In Section 4.3 we

provide further discussion with regards to which of these assumptions can be strength-

ened/relaxed for each estimator analyzed.

Assumption 1: $x^{(k)}_{i,t}$ has finite moments up to fourth order for all $k$;

Assumption 2: $\varepsilon_{i,t} \sim$ i.i.d. $(0, \sigma^2_\varepsilon)$ and has finite moments up to fourth order;

Assumption 3: $\lambda_i \sim$ i.i.d. $(0, \Sigma_\lambda)$ with finite moments up to fourth order, where $\Sigma_\lambda$
is positive definite. $F$ is non-stochastic and bounded such that $\|F\| < b < \infty$;

Assumption 4: $E(\varepsilon_{i,t} \mid y_{i,0:t-1}', \lambda_i', x^{(k)\prime}_{i,1:\tau}) = 0$ for all $t$ and $k$, where $\tau$ is a positive
integer that is bounded by $T$.

Assumption 1 is a standard regularity condition. Assumptions 2-3 are employed mainly

for simplicity and can be relaxed to some extent, details of which will be documented

later.¹

¹ The zero-mean assumption for $\varepsilon_{i,t}$ is actually implied by Assumption 4.


Assumption 4 can be crucial for identification, depending on the estimation approach,

because it characterizes the exogeneity properties of the covariates. In particular, we

will refer to covariates that satisfy τ = T as strictly exogenous with respect to the

idiosyncratic error component, whereas covariates that satisfy only τ = t are weakly

exogenous. When τ < t the covariates are endogenous. The exogeneity properties of

the covariates play a major role in the analysis of likelihood based estimators because

the presence of weakly exogenous or endogenous regressors may lead to inconsistent

estimates of the structural parameters, α and βk.

Furthermore, Assumption 4 implies that the idiosyncratic errors are conditionally seri-

ally uncorrelated. This can be relaxed in a relatively straightforward way, particularly

for GMM estimators; for example, an MA process of order $q$ can be accommodated by
truncating the set of instruments with respect to $y$ based on
$E(\varepsilon_{i,t} \mid y_{i,0:s}', \lambda_i', x^{(k)\prime}_{i,1:\tau}) = 0$, where $s < t - q$.

Assumption 4 also implies that the idiosyncratic error is conditionally uncorrelated with

the factor loadings. This is required for identification based on internal instruments

in levels. Finally, notice that the set of our assumptions implies that yi,t has finite

fourth-order moments, but it does not imply conditional homoskedasticity for the two

error components.

Under Assumptions 1-4, the following set of population moment conditions is valid by

construction

\[
E[\operatorname{vech}(\varepsilon_i y_{i,-1}')] = 0_{T(T+1)/2}. \tag{4.2}
\]

In addition, the following sets of moment conditions are valid, depending on whether

τ = T or τ = t hold true, respectively

\[
E[\operatorname{vec}(\varepsilon_i x^{(k)\prime}_i)] = 0_{T^2}; \tag{4.3}
\]

\[
E[\operatorname{vech}(\varepsilon_i x^{(k)\prime}_i)] = 0_{T(T+1)/2}. \tag{4.4}
\]

For all GMM estimators one can easily modify the above moment conditions to allow
for endogenous $x$'s. For example, for (say) $\tau = t - 1$ one may redefine
$x^{(k)}_i \equiv (x^{(k)}_{i,0}, \ldots, x^{(k)}_{i,T-1})'$ and proceed in exactly the same way as in $\tau = t$.

From now on we will use the triangular structure of the moment conditions induced

by the vech(·) operator to construct the estimating equations for the GMM estimators.


To achieve this we adopt the following matrix notation for the stacked model

\[
Y = \alpha Y_{-1} + \sum_{k=1}^{K} \beta_k X_k + \Lambda F' + E; \quad i = 1, \ldots, N,
\]

where $(Y, Y_{-1}, X_k, E)$ are $[N \times T]$ matrices with typical rows $(y_i', y_{i,-1}', x^{(k)\prime}_i, \varepsilon_i')$
respectively. Similarly a typical row element of $\Lambda$ is given by $\lambda_i'$.

Remark 4.1. For notational symmetry, while describing GMM estimators we assume

that x(k)i,0 observations are not included in the set of available instruments. Otherwise

additional T or T − 1 (depending on the estimator analyzed) moment conditions are

available. The same strategy is used in the Monte Carlo section of this paper.

4.3 Estimators

4.3.1 Quasi-differenced (QD) GMM

Replacing the expectations in (4.2) and (4.4) with sample averages yields

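\[
\operatorname{vech}\left( \frac{1}{N} \Big( Y - \alpha Y_{-1} - \sum_{k=1}^{K} \beta_k X_k - \Lambda F' \Big)' Y_{-1} \right);
\]

\[
\operatorname{vech}\left( \frac{1}{N} \Big( Y - \alpha Y_{-1} - \sum_{k=1}^{K} \beta_k X_k - \Lambda F' \Big)' X_k \right).
\]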

These moment conditions depend on the unknown matrices F and Λ. In the simple

fixed effects model where $F = \iota_T$, the first-differencing transformation proposed by

Anderson and Hsiao (1982) is the most common approach to eliminate the fixed effects

from the equation of interest. Using a similar idea in the model with one unobserved

time varying factor, i.e.

\[
y_{i,t} = \alpha y_{i,t-1} + \sum_{k=1}^{K} \beta_k x^{(k)}_{i,t} + \lambda_i f_t + \varepsilon_{i,t},
\]


Holtz-Eakin, Newey, and Rosen (1988) suggest eliminating the unobserved factor com-

ponent using the following quasi-differencing (QD) transformation

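\[
y_{i,t} - r_t y_{i,t-1} = \alpha (y_{i,t-1} - r_t y_{i,t-2}) + \sum_{k=1}^{K} \beta_k \big( x^{(k)}_{i,t} - r_t x^{(k)}_{i,t-1} \big) + \varepsilon_{i,t} - r_t \varepsilon_{i,t-1}, \tag{4.5}
\]

for $i = 1, \ldots, N$, $t = 2, \ldots, T$, where $r_t \equiv f_t / f_{t-1}$. By construction equation (4.5) is free
from $\lambda_i f_t$ because

\[
\lambda_i f_t - r_t \lambda_i f_{t-1} = \lambda_i f_t - \frac{f_t}{f_{t-1}} \lambda_i f_{t-1} = 0, \quad \forall t = 2, \ldots, T.
\]

It is easy to see that the QD approach is well defined only if all $f_t \neq 0$, $t = 1, \ldots, T-1$.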

Collecting all parameters involved in quasi-differencing we can define the corresponding

[(T − 1)× T ] QD transformation matrix by

\[
D(r) = \begin{pmatrix}
-r_2 & 1 & 0 & \cdots & 0 \\
0 & -r_3 & 1 & \cdots & 0 \\
\vdots & & \ddots & \ddots & \vdots \\
0 & 0 & \cdots & -r_T & 1
\end{pmatrix},
\]

where r = (r2, . . . , rT )′. The first-differencing (FD) transformation matrix is a special

case with r2 = . . . = rT = 1. Pre-multiplying the terms inside the vech(·) operator in

the sample analogue of the population moment conditions above by D(r), and noticing

that D(r)F = 0, we can rewrite the estimating equations for the QD GMM estimator

as

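\[
m_\alpha = \operatorname{vech}\left( \frac{1}{N} D(r) \Big( Y - \alpha Y_{-1} - \sum_{k=1}^{K} \beta_k X_k \Big)' Y_{-1} J(1)' \right);
\]

\[
m_k = \operatorname{vech}\left( \frac{1}{N} D(r) \Big( Y - \alpha Y_{-1} - \sum_{k=1}^{K} \beta_k X_k \Big)' X_k J(1)' \right) \quad \forall k.
\]

Here $J(L) = (I_{T-L}, O_{(T-L) \times L})$ is a selection matrix that appropriately truncates the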

set of instruments to ensure that the term inside the vech(·) operator is a square matrix.

One can easily see that the total number of moment conditions and parameters under

the weak exogeneity assumption for all x’s is given by

\[
\#\text{moments} = \frac{(K+1)(T-1)T}{2}; \quad \#\text{parameters} = (K+1) + (T-1).
\]


The total number of parameters consists of two terms. The first term within the

brackets corresponds to K+1 parameters of interest (or structural/model parameters),

while the remaining term corresponds to T − 1 nuisance parameters, the time-varying

factors.
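As an illustration of the quasi-differencing step, the sketch below builds $D(r)$ for a single factor and computes the moment and parameter counts given above. It assumes NumPy; the helper names `qd_matrix` and `qd_counts` and the numerical factor path are hypothetical and serve only as an example.

```python
import numpy as np

def qd_matrix(r):
    """[(T-1) x T] quasi-differencing matrix with rows (..., -r_t, 1, ...)."""
    r = np.asarray(r, dtype=float)
    T = r.size + 1
    D = np.zeros((T - 1, T))
    idx = np.arange(T - 1)
    D[idx, idx] = -r        # -r_t multiplies the observation dated t-1
    D[idx, idx + 1] = 1.0   # 1 multiplies the observation dated t
    return D

def qd_counts(K, T):
    """Moment and parameter counts under weak exogeneity of all x's."""
    moments = (K + 1) * (T - 1) * T // 2
    parameters = (K + 1) + (T - 1)
    return moments, parameters

f = np.array([1.0, 0.5, 2.0, -1.5, 0.8])   # hypothetical factor path, T = 5
r = f[1:] / f[:-1]                         # r_t = f_t / f_{t-1}, t = 2, ..., T
D = qd_matrix(r)                           # D @ f is a vector of zeros
```

Setting all $r_t = 1$ in `qd_matrix` reproduces the first-differencing matrix, and the construction breaks down (division by zero) as soon as some $f_t = 0$ for $t \le T-1$.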

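Remark 4.2. If we define $r_t \equiv f_{t-1}/f_t$, we can also consider the quasi-differencing matrix
of the following type

\[
D(r) = \begin{pmatrix}
1 & -r_2 & 0 & \cdots & 0 \\
0 & 1 & -r_3 & \cdots & 0 \\
\vdots & & \ddots & \ddots & \vdots \\
0 & 0 & \cdots & 1 & -r_T
\end{pmatrix}.
\]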

This transformation approach uses forward differences rather than backward differ-

ences. However, similarly to the original transformation matrix of Holtz-Eakin et al.

(1988), the estimator based on this transformation requires that all $f_t \neq 0$, $t = 2, \ldots, T$.

Hence the restrictions imposed by two differencing strategies overlap for t = 2, . . . , T−1,

but not for t = 1 and t = T . Finally, one can also consider transformation matrices

based on higher order forward differences.

The approach of Holtz-Eakin et al. (1988) as it stands is tailored for models with one

unobserved factor. In principle, it can be extended to multiple factors by removing

each factor consecutively based on a D(l)(r(l)) matrix, with the final transformation

matrix being a product of L such matrices. However, this approach soon becomes

computationally very cumbersome as the estimating equations become multiplicative in

r(l). On the other hand, if the model involves some observed factors, the corresponding

D(·)(·) matrix is known, leading to a simple estimator that involves equations containing

r and structural parameters only. For example, Nauges and Thomas (2003) augment

the model of Holtz-Eakin et al. (1988) by allowing for time-invariant individual effects

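\[
y_{i,t} = \eta_i + \alpha y_{i,t-1} + \sum_{k=1}^{K} \beta_k x^{(k)}_{i,t} + \lambda_i f_t + \varepsilon_{i,t}; \quad t = 1, \ldots, T,
\]

where $\eta_i$ is eliminated using the FD transformation matrix $D(\iota_{T-1})$, which yields

\[
\Delta y_{i,t} = \alpha \Delta y_{i,t-1} + \sum_{k=1}^{K} \beta_k \Delta x^{(k)}_{i,t} + \lambda_i \Delta f_t + \Delta \varepsilon_{i,t}; \quad t = 2, \ldots, T,
\]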


followed by the QD transformation, albeit operated based on a [(T−2)×(T−1)] matrix

D(r). The resulting number of parameters and moment conditions can be modified

accordingly from those in Holtz-Eakin et al. (1988).

Remark 4.3. The FD transformation is by no means the only way to eliminate the

fixed effects from the model. Another commonly discussed transformation is Forward

Orthogonal Deviations (FOD). If one uses FOD instead of FD, the identification of
structural parameters would require that all $f^*_t \neq 0$.² Depending on the properties of
the $f_t$'s, it might be safer (at the expense of efficiency) to use FOD even in the absence
of $\eta_i$ since $r_t$ is defined for $f_t \neq 0$, $t = 1, \ldots, T-1$ only.
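For concreteness, the FOD transformation mentioned in Remark 4.3 can be written as a $[(T-1) \times T]$ matrix whose $t$-th row produces $c_t (f_t - (f_{t+1} + \ldots + f_T)/(T-t))$ with $c_t^2 = (T-t)/(T-t+1)$. The following is a minimal sketch, assuming NumPy; the function name `fod_matrix` is an illustrative choice.

```python
import numpy as np

def fod_matrix(T):
    """[(T-1) x T] forward orthogonal deviations matrix.

    Row t (t = 1, ..., T-1) maps f to c_t * (f_t - mean(f_{t+1}, ..., f_T)),
    with c_t^2 = (T - t) / (T - t + 1).
    """
    A = np.zeros((T - 1, T))
    for t in range(1, T):
        c_t = np.sqrt((T - t) / (T - t + 1))
        A[t - 1, t - 1] = c_t              # weight on the current observation
        A[t - 1, t:] = -c_t / (T - t)      # minus the mean of all future observations
    return A
```

The rows of this matrix sum to zero, so a time-invariant effect is removed, and they are orthonormal, which is the property that distinguishes FOD from first differences.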

Remark 4.4. Assumption 2 can be easily relaxed. For example, unconditional time-

series and cross-sectional heteroscedasticity of the idiosyncratic error component, εi,t,

is allowed in the two-step version of the estimator. Serial correlation can be accommo-

dated by choosing the set of instruments appropriately, as in the discussion provided

in Section 4.2. This is a particularly attractive feature, which is common to all GMM

estimators discussed in this paper. Unconditional heteroscedasticity in λi can also be

allowed, although this is a less interesting extension for practical purposes since there

are no repeated observations over each λi.

Finally, endogeneity of the regressors can be easily allowed. The exogeneity property

of the covariates can be tested using an overidentifying restrictions test statistic. The

same holds for all GMM estimators discussed in this paper, which is of course a desirable

property from the empirical point of view since the issue of endogeneity in panels with

T fixed, e.g. microeconometric panels, may frequently arise.

4.3.2 Quasi-long-differenced (QLD) GMM

As we have mentioned before, the QD approach in Holtz-Eakin et al. (1988) is difficult

to generalize to more than one unobserved factor (or one unobserved factor plus ob-

served factors). Rather than eliminating factors using such transformation, Ahn, Lee,

and Schmidt (2013) propose using a quasi-long-differencing (QLD) transformation. The

factors can be removed from the model using the following QLD transformation matrix

$D(F^*)$

\[
D(F^*) = (I_{T-L}, F^*) = J(L) + F^* \bar{J}(L),
\]

where $F^*$ is a $[(T-L) \times L]$ parameter matrix and $\bar{J}(L) = (O_{L \times (T-L)}, I_L)$ is an $[L \times T]$
selection matrix (the bar distinguishes it from $J(L)$ of Section 4.3.1). Rather than using
the last observation $y_{i,t-1}$ to remove factors from the model at time $t$ (one-by-one), the
QLD approach uses long-differences from the last observations $y_{i,T-L+1:T}$ to remove all
$L$ factors at once.

² Here $f^*_t \equiv c_t (f_t - (f_{t+1} + \ldots + f_T)/(T-t))$ with $c_t^2 = (T-t)/(T-t+1)$.

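To see this, partition $F = (F_A', -F_B')'$ where $F_A$ and $F_B$ are of dimensions $[(T-L) \times L]$
and $[L \times L]$ respectively. Then assuming that $F_B$ is invertible, one can redefine (or
normalize) the factors and factor loadings as

\[
F \lambda_i = \begin{pmatrix} F^* \\ -I_L \end{pmatrix} \lambda^*_i; \quad F^* \equiv F_A F_B^{-1}; \quad \lambda^*_i \equiv F_B \lambda_i.
\]

Using fairly straightforward matrix algebra it then follows

\[
D(F^*) F \lambda_i = (I_{T-L}, F^*) \begin{pmatrix} F^* \\ -I_L \end{pmatrix} \lambda^*_i = 0_{T-L}.
\]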
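The annihilation property $D(F^*) F = O$ is easy to verify numerically. The sketch below, assuming NumPy, uses an arbitrary (hypothetical) factor matrix with $T = 6$ and $L = 2$; the particular numbers carry no meaning.

```python
import numpy as np

rng = np.random.default_rng(0)
T, L = 6, 2

# Hypothetical factor matrix partitioned as F = (F_A', -F_B')'.
F_A = rng.normal(size=(T - L, L))
F_B = np.array([[2.0, 0.5],
                [-1.0, 1.5]])                 # invertible by construction (det = 3.5)
F = np.vstack([F_A, -F_B])

F_star = F_A @ np.linalg.inv(F_B)             # F* = F_A F_B^{-1}
D = np.hstack([np.eye(T - L), F_star])        # D(F*) = (I_{T-L}, F*)

lam = rng.normal(size=L)                      # loadings of one hypothetical unit
residual = D @ F @ lam                        # zero vector of length T - L, for any lam
```

The check works for any loading vector because $D(F^*) F$ itself is a zero matrix; invertibility of $F_B$ is the only requirement.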

One can express all available moment conditions for this estimator as

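\[
m_\alpha = \operatorname{vech}\left( D(F^*) \frac{1}{N} \Big( Y - \alpha Y_{-1} - \sum_{k=1}^{K} \beta_k X_k \Big)' Y_{-1} J(L)' \right);
\]

\[
m_k = \operatorname{vech}\left( D(F^*) \frac{1}{N} \Big( Y - \alpha Y_{-1} - \sum_{k=1}^{K} \beta_k X_k \Big)' X_k J(L)' \right) \quad \forall k.
\]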

Counting the number of moment conditions and resulting parameters we have

\[
\#\text{moments} = \frac{(K+1)(T-L)(T-L+1)}{2}; \quad \#\text{parameters} = K + 1 + (T-L)L.
\]

However, we will further argue that the number of identifiable parameters is smaller

than K + 1 + (T − L)L. To explain the reason for this, let K = 1 and rewrite the

transformed equation for yi,1 as

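\[
y_{i,1} + \sum_{l=1}^{L} f^{*(l)}_1 y_{i,T-l} = \alpha \Big( y_{i,0} + \sum_{l=1}^{L} f^{*(l)}_1 y_{i,T-l-1} \Big) + \beta \Big( x_{i,1} + \sum_{l=1}^{L} f^{*(l)}_1 x_{i,T-l} \Big) + \Big( \varepsilon_{i,1} + \sum_{l=1}^{L} f^{*(l)}_1 \varepsilon_{i,T-l} \Big). \tag{4.6}
\]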

This equation has 2 + L unknown parameters in total, while the number of moment

conditions is 2 (yi,0 and xi,1). Thus, L “nuisance parameters” are identified only up

to a linear combination, unless L ≤ 2 (or K + 1 for the general model), and the total


number of identifiable parameters is

\[
\#\text{parameters} = K + 1 + (T-L)L - 1(L \ge K+1) \frac{(L-K-1)(L-K)}{2}.
\]

Notice that for L = 1 the number of moment conditions and the number of identifiable

parameters is exactly the same as in the QD transformation. Thus, one expects that

the corresponding GMM estimators are asymptotically equivalent.
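The counting rules for the QLD estimator can be collected in a small helper. This is a sketch under the weak exogeneity assumption used above; the function name `qld_counts` is hypothetical.

```python
def qld_counts(K, T, L):
    """Moments and identifiable parameters for QLD GMM under weak exogeneity."""
    moments = (K + 1) * (T - L) * (T - L + 1) // 2
    nominal = (K + 1) + (T - L) * L
    # Nuisance parameters identified only up to linear combinations when L >= K + 1.
    unidentified = (L - K - 1) * (L - K) // 2 if L >= K + 1 else 0
    return moments, nominal - unidentified
```

For example, with $K = 1$, $T = 5$ and $L = 1$ the function returns 20 moment conditions and 6 parameters, the same numbers as implied by the QD formulas of Section 4.3.1.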

Remark 4.4 regarding Assumptions 2-4, as discussed in Section 4.3.1, applies identically

here as well. Ahn et al. (2013) show that under conditional homoscedasticity in εi,t

the estimation procedure simplifies considerably because it can be performed through

iterations. Furthermore, for the case where the regressors are strictly exogenous, the

resulting estimator is invariant to the chosen normalization scheme; see their Appendix

A.

Remark 4.5. One can see the quasi long-differencing transformation matrix as the lim-

iting case (in terms of the longest difference) of the forward differencing transformation

matrix in Remark 4.2.

4.3.3 Factor IV (FIVU and FIVR)

Rather than eliminating the incidental parameters λi, Robertson and Sarafidis (2015)

propose a GMM estimator that relies on reducing these parameters to a finite set

of estimable coefficients. They label the proposed estimator as FIVU (Factor IV Un-

restricted). Their approach makes use of centered moment conditions of the following

form

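\[
m_\alpha = \operatorname{vech}\left( \frac{1}{N} \Big( Y - \alpha Y_{-1} - \sum_{k=1}^{K} \beta_k X_k \Big)' Y_{-1} - F G' \right);
\]

\[
m_k = \operatorname{vech}\left( \frac{1}{N} \Big( Y - \alpha Y_{-1} - \sum_{k=1}^{K} \beta_k X_k \Big)' X_k - F G_k' \right) \quad \forall k,
\]

where $(G, G_k)$ are defined as

\[
G = E[y_{i,-1} \lambda_i']; \quad G_k = E[x^{(k)}_i \lambda_i'],
\]

with typical row elements $g_t'$ and $g^{(k)\prime}_t$ respectively. The $(G, G_k)$ matrices represent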

the unobserved covariances between the instruments and the factor loadings in the


error term. This approach adopts essentially a (correlated) random effects treatment

of the factor loadings, which is natural because the asymptotics apply for N large

and T fixed, and there are no repeated observations over each λi. Thus, it is in the

spirit of Chamberlain’s projection approach. Different sensitivities to the factors (i.e.

differences in the factor loadings) can be generated by different values of the variance

of the cross sectional distribution of λi. Notice that as in Holtz-Eakin et al. (1988) and

Ahn, Lee, and Schmidt (2013), factors corresponding to loadings that are uncorrelated

with the regressors can be accommodated through the variance-covariance matrix of the

idiosyncratic error component, $\varepsilon_{i,t}$, i.e. $E(\varepsilon_i \varepsilon_i')$, since the latter can be left unrestricted.

For this estimator the total number of moment conditions is given by

\[
\#\text{moments} = \frac{(K+1)T(T+1)}{2}.
\]

As the model stands right now, Gk (all K + 1) and F are not separately identifiable

because

\[
F G' = F U U^{-1} G'
\]

for any invertible [L × L] matrix U . This rotational indeterminacy is typically elimi-

nated in the factor literature by requiring an [L×L] submatrix of F to be the identity

matrix.³ These restrictions correspond to the $L^2$ term in the equation below. Furthermore,
for $L > 1$ additional normalizations are required due to the fact that the moment

conditions are of triangular vech(·) type. In particular, the number of identifiable pa-

rameters is

\[
\#\text{parameters} = (K+1)(1 + TL) + TL - L^2 - (K+1) \frac{L(L-1)}{2} - 1(L \ge K+1) \frac{(L-K-1)(L-K)}{2}.
\]

The (K + 1)L(L − 1)/2 term corresponds to the unobserved “last” g, while the last

term involving the indicator function corresponds to the unobserved “first” f and is

identical to the right-hand side term in the corresponding expression for the number

of identifiable parameters for the QLD estimator of Ahn, Lee, and Schmidt (2013).

Notwithstanding, as shown in Robertson and Sarafidis (2015) if one is only interested

in the structural parameters, $\alpha$ and $\beta_k$, it is not essential to impose any identifying
normalizations on $G$ and $F$; the resulting unrestricted estimator for structural parameters
is consistent and asymptotically normal, while the variance-covariance matrix can be
consistently estimated using the corresponding sub-block of the generalized inverse of
the unrestricted variance-covariance matrix.⁴

³ Robertson and Sarafidis (2015) discuss which submatrix of $F$ has to be invertible in order for the estimator with weakly exogenous regressors to be consistent.

Remark 4.6. Compared with the QLD estimator of Ahn et al. (2013) this estimator

utilises L(K + 1)(T − (L− 1)/2) extra moment conditions, at the expense of estimat-

ing exactly the same number of additional parameters. Hence these estimators are

asymptotically equivalent. Although in FIVU estimation one does not have to impose

any restrictions on F , for asymptotic identification the true value of FB (as defined for

QLD estimator) should still satisfy the full rank condition. Finally, the FIVU estimator

remains consistent even if the i.i.d. assumption on λi is replaced by i.h.d. (independent

and heteroscedastically distributed). However, in that situation consistent estimation

of the variance-covariance matrix is not possible. Ahn (2015) also discusses these is-

sues. Note that all other estimators that do not difference away λi, will also suffer from

this problem.
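The counting argument in Remark 4.6 can be checked mechanically from the formulas stated in Sections 4.3.2 and 4.3.3. The sketch below uses hypothetical helper names and integer arithmetic only.

```python
def unidentified(K, L):
    """Indicator term 1(L >= K+1) * (L-K-1)(L-K)/2."""
    return (L - K - 1) * (L - K) // 2 if L >= K + 1 else 0

def qld_counts(K, T, L):
    """Moments and identifiable parameters of the QLD estimator."""
    moments = (K + 1) * (T - L) * (T - L + 1) // 2
    params = (K + 1) + (T - L) * L - unidentified(K, L)
    return moments, params

def fivu_counts(K, T, L):
    """Moments and identifiable parameters of the FIVU estimator."""
    moments = (K + 1) * T * (T + 1) // 2
    params = ((K + 1) * (1 + T * L) + T * L - L**2
              - (K + 1) * L * (L - 1) // 2 - unidentified(K, L))
    return moments, params

def extra_from_remark(K, T, L):
    """L(K+1)(T - (L-1)/2), rearranged so that it stays an integer."""
    return (K + 1) * L * (2 * T - (L - 1)) // 2
```

For any admissible $(K, T, L)$ the differences in both moments and parameters between `fivu_counts` and `qld_counts` equal `extra_from_remark`, so the number of overidentifying restrictions is the same for the two estimators.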

The autoregressive nature of the model suggests that individual rows of the G matrix

also have an autoregressive structure, i.e.

\[
g_t = \alpha g_{t-1} + \sum_{k=1}^{K} \beta_k g^{(k)}_t + \Sigma_\lambda f_t.
\]

For identification one may impose L(L + 1)/2 restrictions so that w.l.o.g. Σλ = IL.

Thus, one can express F in terms of other parameters as follows

\[
F = (L_T' - \alpha I_T) G + e_T g_T' - \sum_{k=1}^{K} \beta_k G_k.
\]

Here LT is the usual lag matrix, while the additional parameter gT is introduced to

take into account the fact that in the original set of moment conditions gT = E[λiyi,T ]

does not appear as a parameter. Robertson and Sarafidis (2015) label the estimator

that takes into account restrictions imposed on F as FIVR (Factor IV Restricted)

estimator.
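The restriction that links $F$ to $G$ can be verified numerically in the simplest case without regressors and with $\Sigma_\lambda = I_L$. The sketch below, assuming NumPy, generates the rows $g_t$ from the recursion $g_t = \alpha g_{t-1} + f_t$ and then recovers $F$ from the matrix expression; all numerical values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
T, L, alpha = 5, 2, 0.4

F = rng.normal(size=(T, L))      # rows f_1', ..., f_T'
g0 = rng.normal(size=L)          # g_0 = E[lambda_i y_{i,0}], left unrestricted

# Recursion g_t = alpha * g_{t-1} + f_t (no regressors, Sigma_lambda = I_L).
g = [g0]
for t in range(T):
    g.append(alpha * g[-1] + F[t])
g = np.array(g)                  # rows g_0', ..., g_T'

G = g[:-1]                       # G = E[y_{i,-1} lambda_i'] has rows g_0', ..., g_{T-1}'
g_T = g[-1]                      # the extra parameter that does not appear in G
L_T = np.eye(T, k=-1)            # lag matrix
e_T = np.eye(T)[:, -1]           # last column of I_T

F_implied = (L_T.T - alpha * np.eye(T)) @ G + np.outer(e_T, g_T)
```

Here `F_implied` coincides with `F`, which shows why $g_T$ has to be added as a parameter: without the term $e_T g_T'$ the last row of $F$ could not be expressed in terms of $G$.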

Robertson and Sarafidis (2015) show that FIVR is asymptotically more efficient than

FIVU and consequently than procedures involving some form of differencing. Further-

more, the restrictions imposed on a subset of the nuisance parameters appear to provide

substantial efficiency gains in finite samples.

⁴ For further details see Theorem 3 in Robertson and Sarafidis (2015).


Counting the total number of moment conditions and parameters, we have

\[
\#\text{moments} = \frac{(K+1)T(T+1)}{2}; \quad \#\text{parameters} = (K+1)(1 + TL) + L - (K+1) \frac{L(L-1)}{2}.
\]

Remark 4.7. Note that in the model without any regressors (or if regressors are strictly

exogenous), the (K + 1)L(L − 1)/2 term reduces to L(L − 1)/2. Together with the

$L(L+1)/2$ restrictions imposed on $\Sigma_\lambda$, one then has in total $L^2$ restrictions (which is

a standard number of restrictions usually imposed for factor models).

4.3.4 Linearized QLD GMM

Hayakawa (2012) proposes a linearized GMM version of the QLD model in Ahn et al.

(2013) under strict exogeneity, at the expense of introducing extra parameters. The

moment conditions can be written as follows

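\[
m_\alpha = \operatorname{vech}\left( \frac{1}{N} J \Big( Y - \alpha Y_{-1} - \sum_{k=1}^{K} \beta_k X_k \Big)' Y_{-1} J' \right)
+ \operatorname{vech}\left( \frac{1}{N} J(L) \Big( Y F^* - Y_{-1} F^*_\alpha - \sum_{k=1}^{K} X_k F^*_{\beta_k} \Big)' Y_{-1} J' \right);
\]

\[
m_k = \operatorname{vech}\left( \frac{1}{N} J \Big( Y - \alpha Y_{-1} - \sum_{k=1}^{K} \beta_k X_k \Big)' Y_{-1} J' \right)
+ \operatorname{vech}\left( \frac{1}{N} J(L) \Big( Y F^* - Y_{-1} F^*_\alpha - \sum_{k=1}^{K} X_k F^*_{\beta_k} \Big)' X_k J' \right); \quad \forall k.
\]

The parameters $F^*_\alpha$, $F^*_{\beta_k}$ do not appear in the estimator of Ahn et al. (2013). The
latter can be obtained directly by noting that

\[
F^*_\alpha = \alpha F^*; \quad F^*_{\beta_k} = \beta_k F^*.
\]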

The linearized estimator is linear in parameters and is thus computationally easy to

implement. On the other hand, this simplicity is not without price, as this estimator is

not as efficient as the estimator in Ahn et al. (2013). In total, under strict exogeneity


of all $x^{(k)}_{i,t}$ we have

\[
\#\text{moments} = \frac{(T-L)(T-L+1)}{2} + K T (T-L);
\]

\[
\#\text{parameters} = \underbrace{K + 1 + (T-L)L}_{\text{ALS}} + \underbrace{(T-L)L(K+1)}_{\text{Linearization}} - \frac{L(L-1)}{2}.
\]

Notice that the last term in the equation for the total number of parameters is not

present in the original study of Hayakawa (2012). To explain the necessity of this

term consider the (T − L)’th equation (for ease of exposition we set L = 2) without

exogenous regressors

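\[
y_{i,T-2} - f^{(1)}_{T-2} y_{i,T} - f^{(2)}_{T-2} y_{i,T-1} = \alpha y_{i,T-3} + f^{(1)}_{\alpha,T-2} y_{i,T-1} + f^{(2)}_{\alpha,T-2} y_{i,T-2}
+ \varepsilon_{i,T-2} - f^{(1)}_{T-2} \varepsilon_{i,T} - f^{(2)}_{T-2} \varepsilon_{i,T-1}.
\]

Clearly only $f^{(2)}_{T-2} + f^{(1)}_{\alpha,T-2}$ can be identified but not the individual terms separately.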

As a result L(L− 1)/2 normalizations need to be imposed. Furthermore, as it can be

easily seen this term is unaltered if additional regressors are present in the model so

long as they do not contain other lags of yi,t or lags of exogenous regressors.

If regressors are only weakly exogenous, one has to use the Linearized QLD GMM esti-

mator with care. For simplicity consider only the case with a single weakly exogenous

regressor. Observe that we can rewrite the first equation of the transformed model as

\[
y_{i,1} + \sum_{l=1}^{L} f^{(l)}_1 y_{i,T-l} = \alpha y_{i,0} + \beta x_{i,1} + \sum_{l=1}^{L} f^{(l)}_{\alpha 1} y_{i,T-l-1} + \sum_{l=1}^{L} f^{(l)}_{\beta 1} x_{i,T-l} + \ldots \tag{4.7}
\]

This equation contains 2 + 3L unknown parameters, with only two available moment

conditions (assuming xi,0 is not observed, otherwise 3). Hence the full set of parameters

in this equation cannot be identified without further normalizations. It then follows

that the minimum value of T required in order to identify the structural parameters of

interest is such that (for simplicity assume L = 1)

\[
2(T-1) \ge 2 + 3 \implies \min T = 1 + \lceil 2.5 \rceil = 4,
\]

where $\lceil x \rceil$ is the smallest integer not less than $x$ (the "ceiling" function). For more
general models with $K > 1$, the condition $\min T = 4$ continues to hold as

\[
(K+1)(T-1) \ge (K+2) + (K+1) \implies \min T = 1 + \left\lceil \frac{2K+3}{K+1} \right\rceil = 4.
\]
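The order condition above is simple enough to evaluate directly. The following sketch (the function name `min_T_linearized` is a hypothetical choice) computes the smallest admissible $T$ for the single-factor case using integer ceiling division.

```python
def min_T_linearized(K):
    """Smallest T with (K+1)(T-1) >= (K+2) + (K+1), for L = 1."""
    needed = (K + 2) + (K + 1)            # right-hand side of the order condition
    per_period = K + 1                    # left-hand side grows by K+1 per unit of T-1
    return 1 + -(-needed // per_period)   # 1 + ceil(needed / per_period)
```

Since $(2K+3)/(K+1) = 2 + 1/(K+1)$ lies strictly between 2 and 3 for every $K \ge 1$, the ceiling is always 3 and the function returns 4 irrespective of the number of regressors.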


Notice that for the non-linear estimator min T = 3 in the single-factor case. As a

result, for L = 1 under weak exogeneity the number of identifiable parameters and

moment conditions is given by

\[
\#\text{moments} = (K+1) \frac{(T-L)(T-L+1)}{2} - (K+1);
\]

\[
\#\text{parameters} = \underbrace{K + 1 + (T-L)L}_{\text{ALS}} + \underbrace{(T-L)L(K+1)}_{\text{Linearization}} - \frac{L(L-1)}{2} - (K+2),
\]

where −(K+1) and −(K+2) adjustments are made to take into account the fact that

for t = 1 there are (K + 2) nuisance parameters to be estimated with (K + 1) available

moment conditions. Both expressions can be similarly modified for L > 1.

Remark 4.8. Although not discussed in Hayakawa (2012), the same linearisation strat-

egy for the QD estimator of Holtz-Eakin, Newey, and Rosen (1988) is also feasible.

4.3.5 Projection GMM

Following Bai (2013b),⁵ Hayakawa (2012) suggests approximating $\lambda_i$ using a Mundlak
(1978)-Chamberlain (1982) type projection of the following form

\[
\lambda_i = \Phi z_i + \nu_i,
\]

where $z_i = (1, x^{(1)\prime}_i, \ldots, x^{(K)\prime}_i, y_{i,0})'$. Notice that by definition of the projection
$E[\nu_i z_i'] = O_{L \times (TK+2)}$. As a result, the stacked model for individual $i$ can be written as

\[
y_i = \alpha y_{i,-1} + \sum_{k=1}^{K} \beta_k x^{(k)}_i + F \Phi z_i + F \nu_i + \varepsilon_i. \tag{4.8}
\]

While Bai (2013b) proposes maximum likelihood estimation of the above model, Hayakawa

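(2012) advocates a GMM estimator; in our standard notation the total set of moment
conditions used by Hayakawa (2012) is given by

\[
m_\alpha = \frac{1}{N} \Big( Y - \alpha Y_{-1} - \sum_{k=1}^{K} \beta_k X_k - Z \Phi' F' \Big)' Y_{-1} e_1;
\]

\[
m_\iota = \frac{1}{N} \Big( Y - \alpha Y_{-1} - \sum_{k=1}^{K} \beta_k X_k - Z \Phi' F' \Big)' \iota_N;
\]

\[
m_k = \operatorname{vech}\left( \frac{1}{N} \Big( Y - \alpha Y_{-1} - \sum_{k=1}^{K} \beta_k X_k - Z \Phi' F' \Big)' X_k \right), \quad \forall k.
\]

⁵ Note that the first version of Bai (2013b) dates back to 2009.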

Assuming weak exogeneity of the covariates one has

\[
\#\text{moments} = 2T + K \frac{T(T+1)}{2};
\]

\[
\#\text{parameters} = \underbrace{(K+1) + (T-L)L}_{\text{ALS}} + \underbrace{L(TK+2)}_{\text{Projection}}.
\]

Similarly to the FIVU estimator of Robertson and Sarafidis (2015) the number of

identifiable parameters is smaller than the nominal one and depends on the projected

variables zi.

One can further relate the Projection estimator to the FIVU estimator. To understand

the connection between the two estimators better, following Bond and Windmeijer (2002),

we consider a more general projection specification of the following form

\[
\lambda_i = \Phi z_i + \nu_i,
\]

where $z_i = (x^{(1)\prime}_i, \ldots, x^{(K)\prime}_i, y_{i,-1}')'$. The true value of $\Phi$ has the usual expression for
the projection estimator

\[
\Phi_0 \equiv E[\lambda_i z_i'] \, E[z_i z_i']^{-1}.
\]

The first term in the notation of Robertson and Sarafidis (2015) is simply

\[
E[\lambda_i z_i'] = (G_1', \ldots, G_K', G'). \tag{4.9}
\]

This estimator coincides asymptotically with the FIVU estimator of Robertson and

Sarafidis (2015), as well as with the QLD GMM estimator of Ahn et al. (2013) and QD

estimator of Holtz-Eakin et al. (1988) (for L = 1) if all T (T + 1)(K + 1)/2 moment

conditions are used. A proof for the equivalence between FIVU, QLD and QD GMM

estimators is given in Robertson and Sarafidis (2015).
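The projection coefficient $\Phi_0 = E[\lambda_i z_i'] E[z_i z_i']^{-1}$ used in this section has a familiar sample analogue. The sketch below, assuming NumPy, is purely illustrative: in practice $\lambda_i$ is unobserved, so the simulated loadings and all variable names are hypothetical and serve only to show that the projection error is orthogonal to $z_i$ by construction.

```python
import numpy as np

rng = np.random.default_rng(2)
N, L, q = 500, 2, 4

Z = rng.normal(size=(N, q))                        # rows z_i' (hypothetical projection variables)
Phi_true = rng.normal(size=(L, q))
Lam = Z @ Phi_true.T + rng.normal(size=(N, L))     # rows lambda_i' (simulated, unobserved in practice)

# Sample analogue of Phi_0 = E[lambda z'] E[z z']^{-1}.
Phi_hat = (Lam.T @ Z / N) @ np.linalg.inv(Z.T @ Z / N)
V = Lam - Z @ Phi_hat.T                            # rows nu_i' (projection errors)
cross_moment = V.T @ Z / N                         # sample analogue of E[nu z'], numerically zero
```

The orthogonality holds regardless of the distribution of the loadings, which is why the projection argument does not require a correctly specified conditional mean.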


4.3.6 Linear GMM

In their discussion of the test for cross-sectional dependence, Sarafidis et al. (2009)

observe that if one can assume

\[
x_{i,t} = \Pi(x_{i,t-1}, \ldots, x_{i,0}) + \Gamma_{xi} f_t + \pi(\varepsilon_{i,t-1}, \ldots, \varepsilon_{i,0}) + \varepsilon^x_{i,t} \tag{4.10}
\]

where Π(·) and π(·) are measurable functions, and the stochastic components are such

that

\[
E[\varepsilon^x_{i,s} \varepsilon_{i,l}] = 0_K, \; \forall s, l; \quad E[\operatorname{vec}(\Gamma_{xi}) \lambda_i'] = O_{KL \times L},
\]

then the following moment conditions are valid even in the presence of unobserved

factors in both equations for yi,t and xi,t

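\[
E[(y_{i,t} - \alpha y_{i,t-1} - \beta' x_{i,t}) \Delta x_{i,s}] = 0, \quad \forall s \le t;
\]

\[
E[(\Delta y_{i,t} - \alpha \Delta y_{i,t-1} - \beta' \Delta x_{i,t}) x_{i,s}] = 0, \quad \forall s \le t-1.
\]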

The total number of valid (non-redundant) moment conditions is given by

\[
\#\text{moments} = K \left( \frac{(T-1)T}{2} + (T-1) \right),
\]

if one does not include xi,0 and ∆xi,1 among the instruments. Under mean stationarity

additional moment conditions become available in the equations in levels, giving rise

to a system GMM estimator.

Identification of the structural parameters crucially depends on the condition that no

lagged values of $y_{i,t}$ are present in (4.10), as well as on the assumption that the factor
loadings of the $y$ and $x$ processes are uncorrelated. However, it is

important to stress that all exogenous regressors are allowed to be weakly exogenous

due to the possible non-zero π(·) function, or even endogenous provided that εi,t is

serially uncorrelated.


4.3.7 Projection Quasi ML Estimator

To control for the correlation between the strictly exogenous regressors and the ini-

tial condition with factor loadings λi, Bai (2013b), similarly to the GMM estimator

proposed in Hayakawa (2012), considers the linear projection of the following form

\[
\lambda_i = \Phi z_i + \nu_i, \quad E[\nu_i \nu_i'] = \Sigma_\nu.
\]

However, instead of relying on covariances as in the GMM framework, the Quasi ML

approach makes use of the following second moment estimator

$$S(\theta) = \frac{1}{N}\Big(Y - \alpha Y_{-1} - \sum_{k=1}^{K}\beta_k X_k - Z\Phi'F'\Big)'\Big(Y - \alpha Y_{-1} - \sum_{k=1}^{K}\beta_k X_k - Z\Phi'F'\Big),$$

where θ = (α, β′, σ², vec F′, vec Φ′)′. Evaluated at the true values of the parameters the expected value of S(θ0) is

$$E[S(\theta_0)] = \Sigma = I_T\sigma^2 + F\Sigma_\nu F'.$$

To solve the rotational indeterminacy problem, one can similarly to the FIVR estimator

of Robertson and Sarafidis (2015) normalize Σ_ν = I_L and redefine F ≡ FΣ_ν^{1/2} and Φ ≡ ΦΣ_ν^{−1/2}. To evaluate the distance between S(θ) and Σ, Bai (2013b)6 suggests

maximising the following Quasi Maximum Likelihood (QML) objective function to

obtain consistent estimates of the underlying parameters

$$\ell(\theta) = -\frac{1}{2}\Big(\log|\Sigma| + \mathrm{tr}\big(\Sigma^{-1}S\big)\Big).$$

Under standard regularity conditions for M-estimators the estimator obtained as the

maximizer of the objective function ℓ(θ) is consistent and asymptotically normal for fixed T, with asymptotic variance-covariance matrix of “sandwich” form irrespective of the distributional assumptions imposed on the combined error term εi,t + ν′i ft. If one can

replace the projection assumption by the assumption of conditional expectations, the

resulting estimator can be seen as a Quasi Maximum Likelihood estimator conditional

on exogenous regressors Xk and initial observation yi,0.
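To make the objective concrete, here is a minimal NumPy sketch that evaluates S(θ), Σ = σ²I_T + FF′ (with the normalization Σ_ν = I_L) and the QML criterion for given parameter values. The array layout (Y and Y_lag as N×T, X as K×N×T, Z as N×q, F as T×L, Phi as L×q) and the function name are assumptions made for illustration. This is not the implementation used in the study, which relies on OxMetrics and analytical derivatives.

```python
import numpy as np


def qml_objective(alpha, beta, sigma2, F, Phi, Y, Y_lag, X, Z):
    """Evaluate l(theta) = -0.5 * (log|Sigma| + tr(Sigma^{-1} S(theta))).

    Assumed shapes (illustrative): Y, Y_lag (N, T); X (K, N, T); Z (N, q);
    F (T, L); Phi (L, q); beta (K,). Sigma_nu is normalized to I_L.
    """
    N, T = Y.shape
    # Residual matrix Y - alpha*Y_{-1} - sum_k beta_k X_k - Z Phi' F'
    resid = Y - alpha * Y_lag - np.tensordot(beta, X, axes=1) - Z @ Phi.T @ F.T
    S = resid.T @ resid / N                      # T x T second moment matrix
    Sigma = sigma2 * np.eye(T) + F @ F.T         # implied covariance matrix
    _, logdet = np.linalg.slogdet(Sigma)         # Sigma is positive definite
    return -0.5 * (logdet + np.trace(np.linalg.solve(Sigma, S)))


# Illustrative call on arbitrary random inputs (not the thesis design).
rng = np.random.default_rng(0)
N, T, K, q, L = 50, 4, 1, 2, 1
value = qml_objective(
    alpha=0.4, beta=np.array([0.6]), sigma2=1.0,
    F=rng.standard_normal((T, L)), Phi=rng.standard_normal((L, q)),
    Y=rng.standard_normal((N, T)), Y_lag=rng.standard_normal((N, T)),
    X=rng.standard_normal((K, N, T)), Z=rng.standard_normal((N, q)),
)
print(value)
```

In practice this function would be passed to a numerical optimizer over all elements of θ. Using `slogdet` and `solve` avoids forming Σ⁻¹ explicitly.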

6 Strictly speaking, in the aforementioned paper the author solely describes the approach in terms of the likelihood function, while in Bai (2013a) the author describes a QML objective function as just one possibility.


The theoretical and finite sample properties of this estimator without factors are dis-

cussed in Alvarez and Arellano (2003), Kruiniger (2013) and Bun et al. (2015) among

others, while Westerlund and Norkute (2014) discuss the properties of this estimator

for possibly non-stationary data with large T .

The above version of the estimator requires time series homoscedasticity in εi,t for

consistency. If this condition holds true and all covariates are strictly exogenous, the

estimator provides efficiency gains over the GMM estimators analyzed before since the

latter do not make use of moment conditions that exploit homoscedasticity (see e.g.

Ahn et al. (2001)). The estimator can be modified in a straightforward manner under

time series heteroscedasticity to estimate all σ_t². On the other hand, cross-sectional

heteroscedasticity cannot be allowed without additional restrictions.

Furthermore, the estimator generally requires τ = T in Assumption 4, i.e. strict

exogeneity of the regressors. An exception to this is discussed in the following remark.

Remark 4.9. If it is plausible to assume that all exogenous regressors have the following

dynamic specification

$$x^{(k)}_{i,t} = \beta_x x^{(k)}_{i,t-1} + \alpha_x y_{i,t-1} + f_t'\lambda^{x(k)}_i + \varepsilon^{x}_{i,t}, \qquad (4.11)$$

so that x^{(k)}_{i,t} is possibly weakly exogenous, then according to Bai (2013b) it is sufficient to project on (1, x^{(1)}_{i,0}, . . . , x^{(K)}_{i,0}, y_{i,0}) only, resulting in a more efficient estimator. A necessary condition for this approach to be valid is that the factor loadings (λ^{x(k)}_i, λ_i) are independent, once conditioned on the initial observations (1, x^{(1)}_{i,0}, . . . , x^{(K)}_{i,0}, y_{i,0}).

4.4 Some general remarks on the estimators

4.4.1 (Non-)Invariance to factor loadings

In situations where the model contains fixed effects only, i.e. λ′ift = λi, some of the

classical panel data estimators can be location invariant with respect to individual

effects. For example, under mean stationarity of the initial condition the GMM estimators of Anderson and Hsiao (1982) (with instruments in first differences), Hayakawa (2009b), or the Transformed ML estimators as in Hsiao et al. (2002), Kruiniger (2013)

and Juodis (2014b) are invariant to the distribution of the fixed effects λi. In general,

irrespective of the properties of yi,0, none of the estimators presented in this paper are


invariant to λ′ift for fixed T. For GMM estimators, invariance would require knowledge

of the whole history {f_t}_{t=−∞}^{T} in order to construct instruments that are invariant to

λi. This conclusion is true both for estimators that involve some sort of differencing

(QD, QLD) and projection (FIVU, Projection GMM).

4.4.2 Unbalanced samples

As mentioned in e.g. Juodis (2015), for the quasi-long-differencing transformation

of Ahn et al. (2013) in the model with weakly exogenous regressors it is necessary

that for all individuals the last L observations are available to the researcher. Other-

wise the D(F ∗) transformation matrix might become group-specific, if one can group

observations based on availability.

To see this in more detail, consider Equation (4.6). As it stands, the quasi-long-

differencing transformation that removes the incidental parameters from the error is

feasible for individual i only if the last L periods are available. Otherwise, these

individuals may either be dropped out altogether, or be grouped such that it becomes

possible to normalize on different T − L periods. Either way, the estimator may suffer

from a substantial loss in efficiency, as a result of removing observations or splitting

the sample. On the other hand, if it is plausible to assume that the model contains

only strictly exogenous regressors, then it is sufficient that there exist L common time

indices t(1), . . . , t(L) where observations for all individuals are available.

The extension of FIVU and FIVR to unbalanced samples follows trivially by simply in-

troducing indicators, depending on whether a particular moment condition is available

for individual i or not (as for the standard fixed effects estimator).

The QD GMM estimator of Nauges and Thomas (2003) can be trivially modified as

well, as in the standard Arellano and Bond (1991) procedure. However, similarly to

that procedure, this transformation might result in dropping quite a lot of observations.

The projection estimator of Hayakawa (2012) requires further modifications in order to

take into account that projection variables zi are not fully observed for each individual.

We conjecture that the modification could be performed in a similar way as in the model

without a factor structure, as discussed by Abrevaya (2013). For maximum likelihood

based estimators, such extendability appears to be a more challenging task.

Remark 4.10. The above discussion relies on the existence of a large enough number

of consecutive time periods for each individual in the sample. For example, FIVU


requires at least two consecutive periods and quasi-differencing type procedures require

at least three. Under these circumstances, we note that estimators in their existing

form may not be fully efficient. For example, if one observes only yi,T and yi,T−2 for

a substantial group of individuals, assuming exogenous covariates are available at all

time periods, then one could use backward substitution and consider moment conditions

within the FIVU framework, which are quadratic in the autoregressive parameter and

result in efficiency gains. For projection type methodologies, however, such substantial

unbalancedness may affect the consistency of the estimators as one cannot substitute

unobserved quantities for zeros in the projection term. This issue is discussed in detail

by Abrevaya (2013).

4.4.3 Observed factors

In some situations one might want to estimate models with both observed and un-

observed factors at the same time. Taking the structure of observed factors into ac-

count may improve the efficiency of the estimators, although one can still consistently

estimate the model by treating the observed factors as unobserved. One such possi-

bility has been already discussed in Nauges and Thomas (2003) for models with an

individual-specific, time-invariant effect. In this section we will briefly summarize im-

plementability issues for all estimators when observed factors are present in the model

alongside their unobserved counterparts.7

For the GMM estimators that involve some form of differencing, e.g. Holtz-Eakin et al.

(1988) and Ahn et al. (2013), one can deal with observed factors using a similar proce-

dure as in Nauges and Thomas (2003), that is, by removing the observed factors first

(one-by-one) and then proceeding to remove the unobserved factors from the model.

The first step can be most easily implemented using a quasi-differencing matrix D(r)

with known weights.

For the GMM estimators of Robertson and Sarafidis (2015) (FIVU) and Hayakawa

(2012), since the unobserved factors are not removed from the model, the treatment

of the observed factors is somewhat easier. One merely needs to split the FG′ terms

into two parts, observed and unobserved factors, and then proceed as in the case of

unobserved factors. In this case the number of identified parameters will be smaller

7 Under the assumption that appropriate regularity conditions hold, which prohibit asymptotic collinearity between the observed and unobserved factors.


than in the case where one treats the observed factors as unobserved. As a result, one

gains efficiency, at the expense, however, of robustness.

For FIVR one needs to take care when solving for F in terms of the remaining parame-

ters, because in the model with observed factors one estimates the variance-covariance

matrix of the factor loadings for the observed factors, while for those which are unob-

served their variance-covariance matrix is normalized.

The extension of the likelihood estimator of Bai (2013b) to observed factors can be

implemented in a similar way to the projection GMM estimator. As in FIVR, one

would have to estimate the variance-covariance matrix of the factor loadings for the

observed factors, while the covariances of unobserved factors can be w.l.o.g. normalized

as before.


This section investigates the finite sample performance of the estimators analyzed above

using simulated data. Our focus lies on examining the effect of the presence of weakly

exogenous covariates, the effect of changing the magnitude of the correlation between

the factor loadings of the dependent variable and those of the covariates, as well as

the impact of changing the number of moment conditions on bias and size for GMM

estimators. We also investigate the effect of changing the level of persistence in the

data, as well as the sample size in terms of both N and T .

4.5.1 Setup and designs

We consider model (4.1) with K = 1, i.e.

$$y_{i,t} = \alpha y_{i,t-1} + \beta x_{i,t} + u_{i,t}; \qquad u_{i,t} = \sum_{\ell=1}^{L}\lambda_{\ell,i} f_{\ell,t} + \varepsilon^{y}_{i,t}.$$

The process for xi,t and for ft is given, respectively, by

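$$x_{i,t} = \delta y_{i,t-1} + \alpha_x x_{i,t-1} + \sum_{\ell=1}^{L}\gamma_{\ell,i} f_{\ell,t} + \varepsilon^{x}_{i,t};$$

$$f_{\ell,t} = \alpha_f f_{\ell,t-1} + \sqrt{1-\alpha_f^2}\,\varepsilon^{f}_{\ell,t}; \qquad \varepsilon^{f}_{\ell,t} \sim N(0,1), \ \forall \ell.$$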


The factor loadings are generated by λ_{ℓ,i} ∼ N(0, 1) and

$$\gamma_{\ell,i} = \rho\lambda_{\ell,i} + \sqrt{1-\rho^2}\,\upsilon^{f}_{\ell,i}; \qquad \upsilon^{f}_{\ell,i} \sim N(0,1), \ \forall \ell,$$

where ρ denotes the correlation between the factor loadings of the y and x processes.

Furthermore, the idiosyncratic errors are generated as8

$$\varepsilon^{y}_{i,t} \sim N(0,1); \qquad \varepsilon^{x}_{i,t} \sim N(0,\sigma_x^2).$$

The starting period for the model is t = −S and the initial observations are generated

as

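$$y_{i,-S} = \sum_{\ell=1}^{L}\lambda_{\ell,i} f_{\ell,-S} + \varepsilon^{y}_{i,-S}; \qquad x_{i,-S} = \sum_{\ell=1}^{L}\gamma_{\ell,i} f_{\ell,-S} + \varepsilon^{x}_{i,-S}; \qquad f_{-S} \sim N(0,1).$$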

The signal-to-noise ratio of the model is defined as follows

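$$\mathrm{SNR} \equiv \frac{1}{T}\sum_{t=1}^{T}\frac{\mathrm{var}\big(y_{i,t} \mid \lambda_{\ell,i}, \gamma_{\ell,i}, \{f_{\ell,s}\}_{s=-S}^{t}\big)}{\mathrm{var}\,\varepsilon^{y}_{i,t}} - 1.$$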

σ_x² is set such that the signal-to-noise ratio is equal to SNR = 5 in all designs.9 This particular value of SNR is chosen so that it is possible to control this measure across all designs. Lower values of SNR (e.g. 3 as in Bun and Kiviet (2006)) would require σ_x² < 0 ceteris paribus in order to satisfy the desired equality for all designs.

We set β = 1 − α such that the long run parameter is equal to 1, αx = 0.6, αf = 0.5 and L = 1.10 We consider N = 200; 800 and T = 4; 8. Furthermore, α = 0.4; 0.8, ρ = 0; 0.6 and δ = 0; 0.3. The minimum number of replications performed equals 2,000 for each design and the factors are drawn in each replication. The choice of the initial values of the parameters for the nonlinear algorithms is discussed in Appendix 4.A.
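The data generating process described above can be sketched in a few lines of NumPy for the single-factor case L = 1. This is an illustrative reconstruction rather than the authors' OxMetrics code: the function name, the seed handling and, in particular, the treatment of `sigma_x` as a user-supplied input are assumptions, since the text calibrates σ_x² to achieve SNR = 5 without giving the resulting values.

```python
import numpy as np


def simulate_panel(N, T, alpha, rho, delta, sigma_x,
                   alpha_x=0.6, alpha_f=0.5, S=5, seed=0):
    """Simulate (y, x) for t = 0, ..., T from the design with L = 1.

    sigma_x is an assumed input here; the study calibrates it so that SNR = 5.
    """
    rng = np.random.default_rng(seed)
    beta = 1.0 - alpha                      # long-run parameter equal to 1
    n_periods = S + T + 1                   # t = -S, ..., 0, ..., T

    # Factor: f_{-S} ~ N(0, 1), stationary AR(1) afterwards.
    f = np.empty(n_periods)
    f[0] = rng.standard_normal()
    for t in range(1, n_periods):
        f[t] = alpha_f * f[t - 1] + np.sqrt(1 - alpha_f**2) * rng.standard_normal()

    # Factor loadings with correlation rho between the y and x loadings.
    lam = rng.standard_normal(N)
    gam = rho * lam + np.sqrt(1 - rho**2) * rng.standard_normal(N)

    eps_y = rng.standard_normal((N, n_periods))
    eps_x = sigma_x * rng.standard_normal((N, n_periods))

    y = np.empty((N, n_periods))
    x = np.empty((N, n_periods))
    y[:, 0] = lam * f[0] + eps_y[:, 0]      # initial observations at t = -S
    x[:, 0] = gam * f[0] + eps_x[:, 0]
    for t in range(1, n_periods):
        x[:, t] = delta * y[:, t - 1] + alpha_x * x[:, t - 1] + gam * f[t] + eps_x[:, t]
        y[:, t] = alpha * y[:, t - 1] + beta * x[:, t] + lam * f[t] + eps_y[:, t]

    return y[:, S:], x[:, S:]               # drop the pre-sample periods t < 0


y, x = simulate_panel(N=200, T=4, alpha=0.4, rho=0.6, delta=0.3, sigma_x=1.0)
print(y.shape, x.shape)
```

Setting `delta=0.0` makes the covariate strictly exogenous with respect to the idiosyncratic error, while `rho=0.0` removes the correlation between the two sets of loadings.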

8 We have also explored the effect of non-normal errors based on the chi-squared distribution (centered and normalized). The results were almost identical and therefore, to save space, we refrain from reporting them.

9 To ensure this, we also set S = 5.

10 Similar results have been obtained for L = 2. To avoid repeating similar conclusions we refrain from reporting these results. We note that the number of factors can be estimated for all GMM estimators based on the model information criteria developed by Ahn et al. (2013). The performance of these procedures appears to be more than satisfactory; the interested reader may refer to the aforementioned paper, as well as to the Monte Carlo study in Robertson, Sarafidis, and Westerlund (2014). The size of L is treated as known in this paper because there is currently no equivalent methodology proposed for testing the number of factors within the likelihood framework.


When at least one of the estimators fails to converge in a particular replication, that

replication is discarded.11

Note that for the QML estimator we use standard errors based on a “sandwich”

variance-covariance matrix, as opposed to the simple inverse of the Hessian variance

matrix. First order conditions as well as Hessian matrices for likelihood estimators are

obtained using analytical derivatives to speed-up the computations.12

Although feasible, in this paper we do not implement the linearized GMM estimator

of Hayakawa (2012) adapted to weakly exogenous regressors. This is mainly due to

the fact that this estimator merely provides an easy way to obtain starting values for

the remaining estimators, which involve non-linear optimization algorithms. Motivated by our theoretical discussion regarding the estimators considered in this paper, some

implications can be discussed a priori, based on our Monte Carlo design.

1. When δ ≠ 0, likelihood based estimators are inconsistent, with the exception of the modified estimator of Bai (2013b) conditional on (yi,0, xi,0).

2. For ρ ≠ 0 the likelihood estimator conditional on (yi,0, xi,0) is inconsistent because the conditional independence assumption is violated.

3. For α = 0.8, ρ = 0, δ = 0 the projection GMM estimator might suffer from weak

instruments because yi,0 remains the only relevant instrument.

Remark 4.11. Please note that although discussed in Section 4.3.1, the QD estimator

of Holtz-Eakin et al. (1988) is not included in the Monte Carlo study. As discussed

in Robertson et al. (2014), this estimator is asymptotically equivalent to the QLD

estimator of Ahn et al. (2013) (if all ft ≠ 0, t = 1, . . . , T) and thus it can be expected

that finite sample results should be similar. Furthermore, given that the QD estimator

requires more stringent normalization restrictions than the QLD estimator, one can

suspect that finite sample results can be even somewhat more sensitive to the DGP of

ft. Unreported preliminary results, which can be obtained from the authors upon request,

confirm this observation.

11 For the numerical maximization we used the BFGS method as implemented in the OxMetrics statistical software. Convergence is achieved when the difference in the value of the given objective function between two consecutive iterations is less than 10^{−4}. Other values of this criterion were considered in the preliminary study with similar qualitative conclusions, although the number of times particular estimators fail to converge varies. For further details on OxMetrics see Doornik (2009).

12 In the preliminary study, results based on analytical and numerical derivatives were compared. Since the results were quantitatively and qualitatively almost identical (for designs where estimators were consistent), we prefer the use of analytical derivatives solely for practical reasons.


4.5.2 Results

The results are reported in the Appendix in terms of median bias and root median

square error, which is defined as

$$\mathrm{RMSE} = \sqrt{\mathrm{med}\big[(\alpha_r - \alpha)^2\big]},$$

where αr denotes the value of α obtained in the rth replication using a particular estima-

tor (and similarly for β). As an additional measure of dispersion we report the radius

of the interval centered on the median containing 80% of the observations, divided by

1.28. This statistic, which we shall refer to as “quasi-standard deviation” (denoted

qStd) provides an estimate of the population standard deviation if the distribution

were normal, with the advantage that it is more robust to the occurrence of outliers

compared to the usual expression for the standard deviation. We report this statistic because, on the one hand, the root mean square error is extremely sensitive to outliers, while on the other hand the root median square error is almost entirely insensitive to them. Therefore, the former could be unduly

misleading given that in principle, for any given data set, one could estimate the model

using a large set of different initial values in an attempt to avoid local minima, or lack

of convergence in some cases (which we deal with in our experiments by discarding

those particular replications). In a large-scale simulation experiment such as ours, however,

the set of initial values naturally needs to be restricted in some sensible/feasible way.

The quasi-standard deviation lies in-between because while it provides a measure of

dispersion that is less sensitive to outliers compared to the root mean square error,

it is still more informative about the variability of the estimators relative to the root

median square error. Finally, we report size, where nominal size is set at 5%. For

the GMM estimators we also report size of the overidentifying restrictions (“J”) test

statistic.
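The three summary statistics described above can be computed from a vector of replication estimates as in the following minimal sketch. The function names are illustrative, and the quantile-based radius is one straightforward reading of "the radius of the interval centered on the median containing 80% of the observations"; the authors' exact computation is not shown in the text.

```python
import numpy as np


def median_bias(estimates, true_value):
    """Median of the estimates minus the true parameter value."""
    return np.median(estimates) - true_value


def root_median_square_error(estimates, true_value):
    """Square root of the median squared deviation from the true value."""
    estimates = np.asarray(estimates, dtype=float)
    return np.sqrt(np.median((estimates - true_value) ** 2))


def quasi_std(estimates):
    """Radius of the median-centered interval holding 80% of the estimates,
    divided by 1.28 (approximately the 90th percentile of N(0, 1))."""
    estimates = np.asarray(estimates, dtype=float)
    radius = np.quantile(np.abs(estimates - np.median(estimates)), 0.80)
    return radius / 1.28


# Illustration on artificial draws (not results from the study).
rng = np.random.default_rng(1)
draws = 0.4 + 0.05 * rng.standard_normal(2000)
print(median_bias(draws, 0.4), root_median_square_error(draws, 0.4), quasi_std(draws))
```

For normally distributed estimates `quasi_std` is close to the usual standard deviation, while a handful of extreme replications moves it far less than it would move the root mean square error.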

Initially we discuss results for the OLS estimator, the GMM estimator proposed by

Sarafidis, Yamagata, and Robertson (2009) and the linearized GMM estimator of

Hayakawa (2012); these estimators have been used to obtain initial values for the

parameters for the non-linear estimators, among other (random) choices. As we can

see in Table 4.1, in many circumstances the OLS estimator exhibits large median bias,

while the size of the estimator is most often not far from unity. On the other hand, the

linear GMM estimator proposed by Sarafidis, Yamagata, and Robertson (2009) does

fairly well both in terms of bias and RMSE when δ = 0 and ρ = 0, i.e. when the

covariate is strictly exogenous with respect to the total error term, ui,t. The size of


the estimator appears to be somewhat upwardly distorted, especially for T large, but

one expects that this would substantially improve if one made use of the finite-sample

correction proposed by Windmeijer (2005). On the other hand, the estimator is not

consistent for the remaining parameterizations of our design and this is well reflected

in its finite sample performance. Notably, the “J” statistic appears to have high power

to detect violations of the null, even if N is small.

With regard to the linearized GMM estimator of Hayakawa (2012), both median bias

and RMSE are reasonably small, even for N = 200, so long as δ = 0, i.e. under strict

exogeneity of x with respect to the idiosyncratic error. However, the estimator appears

to be quite sensitive to high values of α, both in terms of bias and qStd, an outcome

that may be partially related to the fact that the value of β is small in this case,

which implies that a many-weak instruments type problem might arise. Naturally, the

performance of the estimator deteriorates for δ = 0.3 as the moment conditions are

invalidated in this case. While the size of the “J” statistic appears to be distorted

upwards when the estimator is consistent, it has in general quite large power to detect

violations of strict exogeneity, and for high values of α this holds true even with a

relatively small N .

Tables 4.3 and 4.4 report results for the quasi-long-differenced GMM estimator pro-

posed by Ahn, Lee, and Schmidt (2013). The only difference between the two tables is

that Table 4.3 is based on the “pseudo-full” set of moment conditions, i.e. T (T − 1),

obtained by always treating xi,t as weakly exogenous, while Table 4.4 is based on the

4 most recent lags of the variables. In the latter case the number of instruments is of

order O(T ). This strategy is possible to implement only for T = 8, as for T = 4 there

are not enough degrees of freedom to identify the model when truncating the moment

conditions to such extent.13 The estimator appears to have small median bias under

all designs. This is expected given that the estimator is consistent. The qStd results

indicate that the estimator has large dispersion in some designs, especially when T is

small. We have explored further the underlying reason for this result. We found that

this is often the case when the value of the factor at the last time period, i.e. fT ,

is close to zero. Thus, the estimator appears to be potentially sensitive to this issue,

because the normalization scheme sets fT = 1.14 The two-step version improves on

these results. On the other hand, inferences based on one-step estimates seem to be

13 To be more precise, the total number of moment conditions for the subset estimator is q(2(T − 1) + 1 − q), where in our case q = 4.

14 Notice that imposing a different normalization, e.g. fT−1 = 1, would result in losing T moment conditions, as explained in the main text.


relatively more reliable. This outcome may be attributed to the standard argument

provided for linear GMM estimators, which is that two-step estimators rely on an es-

timate of the variance-covariance matrix of the moment conditions, which, in samples

where N is small, can lead to conservative standard errors. Notice that a Windmeijer

(2005) type correction is not trivial here because the proposed expression applies to

linear estimators only. Truncating the moment conditions for T = 8 seems to have a

negligible effect on the size properties of the one-step estimator but does improve size

for the two-step estimator quite substantially. This result seems to apply for all overi-

dentified GMM estimators actually. The “J” statistic exhibits small size distortions

upwards.

Tables 4.5 and 4.8 report results for FIVU and FIVR based on either the full or the

truncated sets of moment conditions, proposed by Robertson and Sarafidis (2015).

Similarly to Ahn et al. (2013), both estimators have very small median bias in all

circumstances. Furthermore, they perform well in terms of qStd. Especially the two-

step versions have small dispersion regardless of the design. Naturally, the dispersion

decreases further with high values of T because the degree of overidentification of

the model increases. As expected, RMSE appears to go down roughly at the rate of √N. FIVR dominates FIVU, which is not surprising given that the former imposes

overidentifying restrictions arising from the structure of the model and thus it estimates

a smaller number of parameters. The size of one-step FIVU and FIVR estimators is

close to its nominal value in all circumstances. On the other hand, the two-step versions

appear to be size distorted when T is large, although the distortion decreases when

only a subset of the moment conditions is used. Thus, one may conclude that using the

full set of moment conditions and relying on inferences based on first-step estimates is a

sensible strategy. From an empirical point of view this is appealing because it simplifies

matters regarding how many instruments to use − an important question that often

arises in two-way error components models estimated using linear GMM estimators.

Finally, the size of the “J” statistic is often slightly distorted when N is small, but

improves rapidly as N increases.

The projection GMM estimator proposed by Hayakawa (2012) has small bias and per-

forms well in general in terms of qStd unless α is close to unity, in which case outliers

seem to occur relatively more frequently. One could suspect that this design is the worst

case scenario for the estimator because only yi,0 is included in the set of instruments,

while lagged values of xi,t are only weakly correlated with yi,t−1. Inferences based on

the first-step estimator are reasonably accurate, certainly more so compared to the


two-step version, although the latter improves for the truncated set of moment condi-

tions. The “J” statistic seems to be size-distorted downwards but it slowly improves

for larger values of N .

Finally, Table 4.11 reports results for the conditional maximum likelihood estimator

proposed by Bai (2013b). The left panel corresponds to the estimator that treats xi,t as

strictly exogenous with respect to the idiosyncratic error, while the panel on the right-

hand side corresponds to the estimator that is consistent under weak exogeneity of a

first-order form15, which is satisfied in our design, assuming that ρ = 0. Interestingly,

the former appears to exhibit negligible median bias in all cases, even when both δ and

ρ take non-zero values. The dispersion of the estimator is small as well, unless T = 4

and δ = 0.3. Likewise the size of the estimator is distorted upwards when δ = 0.3

and gets worse with higher values of N , which is natural given that the estimator

is not consistent in this case. However, for cases where this estimator is consistent

(δ = 0 and ρ = 0), it may serve as a benchmark because it has negligible bias and

excellent size. This can be expected given the asymptotic optimality of this estimator

under conditional homoscedasticity of εi,t. The conclusion is pretty much invariant to

different values of N, T or ρ. The second estimator, in designs with ρ = 0.6 where it

is not consistent, tends to have substantial bias for both α and β. On the other hand,

when it is supposed to be consistent (δ = 0.3, ρ = 0.0) it is more size distorted than

the first estimator that is inconsistent. This is a somewhat puzzling finding.

Remark 4.12. Monte Carlo evidence in Juodis and Sarafidis (2015) suggests that the

standard error correction as in Windmeijer (2005) can substantially improve the empir-

ical size of the two-step FIVU estimator. We suspect that the same is also applicable

to the estimators of Ahn, Lee, and Schmidt (2013) and Hayakawa (2012). However,

extensive analysis of this issue is beyond the scope of this paper.

4.6 Conclusions

In this paper we have analyzed a group of fixed T dynamic panel data estimators with

a multi-factor error structure. All currently available estimators have been presented

using a unified notation. Both their theoretical properties as well as possible limitations

are discussed. We have considered a model with a lagged dependent variable and

additional regressors, possibly weakly exogenous or endogenous. We found that the

15That is, when xi,t follows an AR(1) process.


number of identifiable parameters for the GMM estimators can be smaller than what

can be found in the literature. This result is of major importance for practitioners

when performing model selection based on overidentifying test statistics. Theoretical

discussions in this paper were complemented by a finite sample study based on Monte

Carlo simulation.

We designed our Monte Carlo exercise to shed some light on the relative merits of

the various estimation approaches. It was found that the likelihood estimator of Bai

(2013b), when consistent, can serve as a benchmark in that it has negligible bias and

good size control, irrespective of the sample size. Under such circumstances, the FIVR

estimator proposed by Robertson and Sarafidis (2015) performs almost as well. However, FIVR is more robust to violations of strict exogeneity, as well as of the no conditional correlation condition between the factor loadings. The latter applies to

other GMM estimators as well, at least provided that the cross-sectional dimension is

large enough.

This paper assumes that the time-series dimension is fixed. Bai (2013b) shows that

the presence of factors does not result in an incidental parameters problem for the

conditional maximum likelihood estimator as far as the structural parameters are con-

cerned. A natural question to ask is whether GMM estimators in models where the

number of parameters and number of moment conditions grows with T suffer from an

incidental parameters problem. Furthermore, in this paper we assume that all method

of moments estimators do not suffer from the presence of weak instruments. The anal-

ysis of the estimating procedures when this assumption might be violated is already

non-trivial for the models without factors, see e.g. Bun and Kleibergen (2014) and

Bun and Poldermans (2015). We leave detailed analysis of these questions for future

research.

4.A Starting values for non-linear estimators

This appendix discusses the choice of starting values used for the non-linear optimiza-

tion algorithms.

Ahn et al. (2013). Under conditional homoscedasticity in εi,t, this estimator can be

implemented through an iterative procedure. Iterations start given some set of initial

values for the structural parameters, α, β. For this purpose, we use both the one- and

two-step linearized GMM estimator as proposed by Hayakawa (2012), as well as the


OLS estimator. The two-step estimator is implemented in exactly the same way except

that the set of initial values for the structural parameters includes the one-step esti-

mator. Once final estimates of α, β and F are obtained, these are used as initial values

in the non-linear optimization algorithm, which optimises all parameters at once. This

is implemented in order to make sure that we indeed find the global minimum of the

objective function.

FIVU. Similarly to the previous estimator, FIVU can also be implemented in steps.

Iterations start given a set of starting values for the factors F . This set is obtained

using the linearized GMM estimator, estimates of the principal components extracted

from OLS residuals, and one set of uniform random variables on [−1; 1]. Unlike for

Ahn et al. (2013), joint non-linear optimization is not used as a final step in order to

save computational time.

FIVR. For this estimator the main source of starting values is obtained from FIVU

with the starting value of gT implied in terms of other parameters. Other starting

values include those based on the OLS estimator and the one- and two-step linearized

GMM estimator. In this case starting values for the nuisance parameters G are simply

drawn from uniform [−1; 1].

Projection GMM. This estimator is implemented in exactly the same way as Ahn

et al. (2013), i.e. first an iterative procedure is used, followed by a non-linear one.

Starting values for the factors are obtained using the principal components extracted

from OLS residuals, the estimate of f obtained from the linearized GMM estimator,

and two sets of uniform random variables on [−1; 1]. In order to uniquely identify

all parameters up to rotation, we impose fT = 1 in estimation. We suspect that in

principle, similarly to FIVU, one can estimate the model without normalizations and

perform a degrees of freedom correction at the end. We leave this question open for

future research.

Projection MLE. Starting values for the structural parameters are obtained using the

linearized GMM estimator, OLS, and two sets of uniform random variables on [−1; 1].

The remaining parameters (including log(σ2)) are drawn as uniform random variables

on [0; 1]. In the preliminary study we also tried [−1; 1], but the results were identical.

Alternatively, one could also use the principal component estimates of F obtained from

OLS residuals, as suggested by Bai (2013b).

Subset GMM estimators. For T = 8 when both the subset and full-set GMM es-

timators are available, we estimate the subset estimators first using the algorithms as

described above and then use the subset estimator as starting values for the estimators

that make use of the full set of moment conditions.
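
A minimal sketch of this two-stage, multi-start idea is given below. It is not the thesis implementation: the two objective functions are hypothetical quadratic stand-ins for the subset and full-set GMM criteria, and a simple derivative-free compass search replaces whatever optimizer was actually used.

```python
def compass_search(objective, start, step=0.5, tol=1e-8, max_iter=10_000):
    """Derivative-free local search: try +/- step along each coordinate,
    halve the step whenever no move improves the objective."""
    x = list(start)
    fx = objective(x)
    it = 0
    while step > tol and it < max_iter:
        improved = False
        for i in range(len(x)):
            for delta in (step, -step):
                cand = list(x)
                cand[i] += delta
                fc = objective(cand)
                if fc < fx:
                    x, fx, improved = cand, fc, True
        if not improved:
            step /= 2.0
        it += 1
    return x, fx


def best_of(objective, starts):
    """Run the local search from every start and keep the lowest criterion."""
    return min((compass_search(objective, s) for s in starts), key=lambda r: r[1])


# Hypothetical criteria: the "full" one adds a term to the "subset" one.
def subset_objective(theta):
    return (theta[0] - 0.4) ** 2 + (theta[1] - 1.0) ** 2


def full_objective(theta):
    return subset_objective(theta) + 0.5 * (theta[0] + theta[1] - 1.4) ** 2


generic_starts = [[0.0, 0.0], [-1.0, 1.0]]
subset_est, _ = best_of(subset_objective, generic_starts)
# The subset estimate is added to the list of starts for the full criterion.
full_est, full_val = best_of(full_objective, [subset_est] + generic_starts)
```

Because the subset estimate is usually already close to the full-set optimum, the search from that start tends to terminate quickly; the other starts are kept as a safeguard against local minima.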


4.B Tables

Table 4.1: OLS estimator and System GMM estimator by Sarafidis, Yamagata, and Robertson (2009)

Designs          OLS α                    OLS β                    Sub-System α             Sub-System β                J
  N T  α  ρ  δ   Bias  RMSE  qStd  Size   Bias  RMSE  qStd  Size   Bias  RMSE  qStd  Size   Bias  RMSE  qStd  Size   Size
200 4 .4 .0 .0   .022  .048  .135  .609  -.008  .025  .069  .247  -.002  .029  .089  .060  -.002  .021  .065  .060   .041
200 4 .4 .0 .3   .005  .051  .146  .438  -.048  .062  .146  .485  -.080  .094  .228  .351   .037  .069  .204  .310   .707
200 4 .4 .6 .0  -.035  .051  .139  .633   .088  .088  .092  .851  -.035  .056  .152  .405   .086  .087  .130  .638   .720
200 4 .4 .6 .3  -.170  .170  .162  .921   .141  .141  .162  .817  -.320  .320  .237  .907   .289  .289  .320  .878   .866
200 4 .8 .0 .0  -.048  .050  .097  .662   .009  .013  .035  .139  -.038  .091  .299  .105  -.012  .032  .108  .082   .044
200 4 .8 .0 .3  -.066  .066  .102  .647  -.031  .045  .114  .412  -.301  .305  .684  .649  -.007  .096  .299  .413   .823
200 4 .8 .6 .0  -.083  .083  .102  .835   .064  .064  .059  .893  -.113  .147  .397  .488   .029  .054  .158  .360   .587
200 4 .8 .6 .3  -.181  .181  .137  .964   .109  .109  .131  .799  -.445  .445  .403  .907   .246  .246  .319  .808   .818
200 8 .4 .0 .0   .037  .045  .110  .691  -.018  .024  .061  .355  -.003  .014  .044  .148  -.001  .012  .036  .135   .029
200 8 .4 .0 .3   .032  .051  .129  .519  -.060  .065  .118  .567  -.122  .122  .160  .772   .090  .095  .165  .653   .901
200 8 .4 .6 .0  -.013  .041  .116  .667   .077  .077  .067  .934  -.045  .047  .089  .669   .087  .087  .079  .933   .768
200 8 .4 .6 .3  -.149  .149  .122  .971   .154  .154  .126  .952  -.362  .362  .148     1   .393  .393  .207  .999   .988
200 8 .8 .0 .0  -.016  .031  .084  .641   .003  .010  .029  .103  -.033  .041  .115  .248  -.007  .013  .039  .179   .039
200 8 .8 .0 .3  -.023  .036  .101  .444  -.059  .063  .111  .564  -.404  .404  .465  .960   .095  .139  .396  .692   .990
200 8 .8 .6 .0  -.045  .046  .082  .760   .062  .062  .040  .980  -.097  .099  .204  .766   .038  .040  .073  .653   .680
200 8 .8 .6 .3  -.177  .177  .108  .999   .165  .165  .135  .952  -.570  .570  .211     1   .513  .513  .299     1   .972
800 4 .4 .0 .0   .031  .079  .221  .846  -.012  .031  .089  .437  -.001  .018  .056  .053   .000  .016  .049  .048   .053
800 4 .4 .0 .3   .004  .054  .152  .714  -.057  .069  .155  .719  -.075  .088  .193  .565   .050  .071  .190  .544   .949
800 4 .4 .6 .0  -.064  .085  .202  .867   .217  .217  .166  .987  -.122  .127  .177  .857   .267  .267  .171  .957   .986
800 4 .4 .6 .3  -.181  .181  .168  .970   .154  .154  .182  .928  -.364  .364  .212  .960   .366  .366  .325  .968   .985
800 4 .8 .0 .0  -.069  .071  .137  .858   .005  .013  .038  .086  -.014  .045  .148  .069  -.002  .017  .057  .053   .048
800 4 .8 .0 .3  -.061  .061  .106  .805  -.048  .057  .136  .703  -.295  .305  .630  .807   .015  .104  .323  .645   .978
800 4 .8 .6 .0  -.110  .110  .134  .929   .208  .208  .135  .989  -.209  .220  .359  .878   .232  .233  .183  .933   .979
800 4 .8 .6 .3  -.199  .199  .148  .993   .136  .136  .171  .919  -.515  .515  .305  .965   .399  .399  .331  .967   .971
800 8 .4 .0 .0   .063  .074  .162  .876  -.029  .034  .086  .549  -.001  .010  .030  .074   .000  .010  .030  .059   .042
800 8 .4 .0 .3   .035  .051  .124  .740  -.067  .069  .106  .791  -.104  .104  .123  .871   .081  .082  .117  .773      1
800 8 .4 .6 .0  -.036  .057  .148  .841   .205  .205  .118     1  -.129  .129  .086  .974   .236  .236  .088     1      1
800 8 .4 .6 .3  -.158  .158  .116  .998   .168  .168  .125  .992  -.362  .362  .111     1   .403  .403  .163     1      1
800 8 .8 .0 .0  -.023  .040  .111  .863   .002  .010  .032  .083  -.006  .019  .061  .096   .000  .009  .026  .057   .046
800 8 .8 .0 .3  -.023  .034  .092  .704  -.057  .058  .091  .769  -.365  .365  .453  .974   .069  .109  .319  .772      1
800 8 .8 .6 .0  -.068  .068  .095  .925   .209  .209  .081     1  -.200  .200  .182  .982   .210  .210  .084  .999      1
800 8 .8 .6 .3  -.169  .169  .095     1   .157  .157  .118  .993  -.530  .530  .180     1   .462  .462  .240  .998      1


Table 4.2: Linearized GMM Estimator of Hayakawa (2012) with strict exogeneity assumption

Designs          GMM 1step α              GMM 1step β              GMM 2step α              GMM 2step β                 J
  N T  α  ρ  δ   Bias  RMSE  qStd  Size   Bias  RMSE  qStd  Size   Bias  RMSE  qStd  Size   Bias  RMSE  qStd  Size   Size
200 4 .4 .0 .0  -.004  .030  .097  .060  -.005  .030  .099  .057  -.003  .025  .077  .120  -.008  .024  .076  .111   .125
200 4 .4 .0 .3  -.059  .065  .160  .214  -.160  .168  .247  .504  -.032  .051  .142  .306  -.189  .190  .215  .812   .239
200 4 .4 .6 .0  -.012  .031  .109  .079   .000  .029  .103  .068  -.007  .026  .085  .147  -.005  .025  .080  .128   .133
200 4 .4 .6 .3  -.085  .086  .181  .291  -.160  .174  .262  .503  -.059  .065  .150  .404  -.195  .196  .228  .831   .216
200 4 .8 .0 .0  -.060  .077  .216  .193  -.010  .025  .084  .085  -.060  .074  .209  .281  -.014  .023  .077  .194   .179
200 4 .8 .0 .3  -.322  .322  .301  .768  -.125  .127  .134  .643  -.348  .348  .320  .930  -.157  .157  .096  .905   .098
200 4 .8 .6 .0  -.075  .090  .242  .236  -.008  .025  .084  .095  -.072  .088  .243  .345  -.017  .026  .076  .207   .193
200 4 .8 .6 .3  -.347  .347  .305  .761  -.126  .130  .134  .627  -.380  .380  .334  .938  -.157  .157  .089  .905   .082
200 8 .4 .0 .0  -.003  .022  .075  .094   .000  .023  .078  .092   .000  .015  .048  .339  -.003  .015  .047  .333   .108
200 8 .4 .0 .3  -.064  .070  .168  .279  -.015  .063  .219  .157  -.029  .039  .102  .525  -.058  .068  .142  .695   .642
200 8 .4 .6 .0  -.012  .024  .092  .117   .010  .022  .091  .113  -.006  .017  .056  .372   .004  .015  .051  .331   .114
200 8 .4 .6 .3  -.080  .080  .200  .374  -.007  .063  .267  .186  -.042  .044  .117  .584  -.057  .073  .164  .707   .583
200 8 .8 .0 .0  -.024  .029  .092  .165  -.003  .015  .051  .080  -.020  .024  .071  .433  -.005  .011  .036  .311   .118
200 8 .8 .0 .3  -.201  .201  .179  .820  -.048  .074  .215  .317  -.193  .193  .149  .991  -.086  .095  .126  .852   .600
200 8 .8 .6 .0  -.029  .033  .106  .216   .004  .015  .063  .111  -.025  .027  .079  .476  -.002  .011  .038  .319   .104
200 8 .8 .6 .3  -.208  .208  .185  .884  -.048  .077  .252  .340  -.200  .200  .137  .996  -.089  .097  .137  .869   .508
800 4 .4 .0 .0  -.005  .028  .102  .081  -.007  .032  .117  .078  -.002  .023  .074  .143  -.006  .023  .076  .117   .149
800 4 .4 .0 .3  -.066  .069  .122  .478  -.192  .194  .227  .726  -.037  .055  .128  .603  -.215  .215  .178  .979   .818
800 4 .4 .6 .0  -.008  .028  .108  .093  -.003  .033  .114  .087  -.004  .023  .083  .160  -.005  .024  .084  .142   .160
800 4 .4 .6 .3  -.078  .078  .125  .549  -.200  .203  .194  .773  -.054  .057  .118  .605  -.229  .229  .175  .980   .732
800 4 .8 .0 .0  -.082  .098  .302  .255  -.020  .035  .123  .144  -.073  .087  .292  .339  -.021  .031  .122  .266   .203
800 4 .8 .0 .3  -.389  .389  .307  .892  -.148  .149  .121  .806  -.436  .436  .321  .981  -.178  .178  .067  .995   .549
800 4 .8 .6 .0  -.106  .118  .316  .307  -.022  .037  .120  .156  -.099  .112  .341  .422  -.028  .036  .118  .312   .233
800 4 .8 .6 .3  -.409  .409  .311  .887  -.151  .152  .107  .824  -.458  .458  .308  .985  -.182  .182  .051  .991   .436
800 8 .4 .0 .0  -.003  .019  .079  .088  -.002  .024  .099  .112   .000  .011  .035  .208  -.004  .012  .039  .199   .167
800 8 .4 .0 .3  -.066  .069  .117  .515  -.019  .052  .157  .290  -.013  .025  .066  .528  -.085  .087  .089  .915      1
800 8 .4 .6 .0  -.007  .020  .077  .106   .002  .022  .092  .113  -.003  .012  .036  .209  -.002  .012  .037  .173   .163
800 8 .4 .6 .3  -.072  .073  .117  .585  -.027  .053  .166  .314  -.019  .024  .057  .511  -.094  .096  .083  .952      1
800 8 .8 .0 .0  -.027  .029  .107  .242  -.004  .019  .071  .091  -.023  .024  .075  .415  -.008  .012  .040  .250   .193
800 8 .8 .0 .3  -.185  .185  .141  .884  -.057  .067  .125  .531  -.182  .182  .140  .984  -.103  .104  .089  .974      1
800 8 .8 .6 .0  -.031  .033  .112  .275  -.003  .019  .071  .091  -.025  .026  .079  .459  -.008  .013  .039  .259   .192
800 8 .8 .6 .3  -.192  .192  .136  .926  -.062  .073  .133  .572  -.188  .188  .124  .993  -.109  .110  .076  .977      1


Table 4.3: GMM estimator of Ahn, Lee, and Schmidt (2013)

Designs          GMM 1step α              GMM 1step β              GMM 2step α              GMM 2step β                 J
  N T  α  ρ  δ   Bias  RMSE  qStd  Size   Bias  RMSE  qStd  Size   Bias  RMSE  qStd  Size   Bias  RMSE  qStd  Size   Size
200 4 .4 .0 .0   .001  .028  .087  .075  -.002  .026  .085  .056  -.001  .022  .067  .137   .000  .021  .065  .102   .097
200 4 .4 .0 .3  -.001  .055  .200  .109  -.005  .057  .199  .111  -.007  .038  .134  .148   .000  .041  .137  .158   .085
200 4 .4 .6 .0  -.005  .029  .097  .094   .004  .025  .083  .063  -.004  .023  .074  .150   .002  .020  .063  .091   .094
200 4 .4 .6 .3  -.020  .048  .211  .134   .013  .049  .217  .117  -.013  .037  .134  .141   .005  .037  .127  .138   .081
200 4 .8 .0 .0  -.004  .029  .107  .096  -.001  .016  .056  .058  -.005  .022  .083  .146   .000  .013  .045  .099   .102
200 4 .8 .0 .3  -.014  .043  .424  .182  -.004  .038  .292  .166  -.013  .034  .373  .197  -.003  .029  .270  .198   .122
200 4 .8 .6 .0  -.007  .032  .117  .110   .003  .016  .053  .067  -.007  .022  .086  .142   .002  .013  .044  .092   .106
200 4 .8 .6 .3  -.016  .039  .323  .168   .006  .034  .125  .098  -.013  .032  .273  .193   .001  .027  .103  .151   .107
200 8 .4 .0 .0  -.001  .022  .077  .109   .000  .022  .081  .100  -.001  .015  .049  .315   .000  .014  .045  .257   .106
200 8 .4 .0 .3   .008  .054  .205  .133  -.011  .057  .219  .128   .001  .029  .105  .341  -.002  .029  .102  .332   .078
200 8 .4 .6 .0  -.006  .024  .092  .142   .004  .020  .076  .100  -.004  .017  .058  .356   .002  .013  .043  .239   .085
200 8 .4 .6 .3  -.014  .046  .235  .144   .010  .047  .246  .141  -.007  .027  .116  .323   .006  .027  .110  .296   .091
200 8 .8 .0 .0  -.005  .021  .072  .104   .001  .013  .044  .063  -.002  .015  .050  .288   .001  .009  .028  .197   .095
200 8 .8 .0 .3  -.005  .035  .133  .099   .003  .037  .133  .096  -.004  .022  .079  .280   .002  .023  .076  .263   .074
200 8 .8 .6 .0  -.006  .021  .080  .113   .002  .012  .045  .076  -.003  .015  .054  .295   .001  .008  .027  .195   .093
200 8 .8 .6 .3  -.010  .033  .134  .118   .010  .036  .146  .113  -.005  .021  .075  .264   .006  .023  .076  .241   .075
800 4 .4 .0 .0  -.002  .025  .085  .090   .002  .029  .105  .092  -.001  .018  .057  .123   .001  .021  .068  .120   .096
800 4 .4 .0 .3  -.002  .033  .124  .106  -.001  .033  .126  .119  -.003  .021  .070  .122   .000  .022  .072  .124   .105
800 4 .4 .6 .0  -.005  .024  .086  .102   .005  .025  .097  .086  -.003  .019  .060  .136   .002  .019  .064  .091   .096
800 4 .4 .6 .3  -.008  .028  .115  .111   .005  .027  .121  .111  -.005  .019  .063  .110   .002  .019  .066  .109   .100
800 4 .8 .0 .0  -.004  .020  .076  .096   .000  .018  .059  .078  -.004  .017  .058  .136   .000  .015  .048  .093   .088
800 4 .8 .0 .3  -.005  .022  .094  .127  -.002  .021  .079  .124  -.004  .017  .067  .132  -.001  .016  .059  .130   .111
800 4 .8 .6 .0  -.006  .019  .073  .101   .001  .019  .063  .064  -.005  .016  .065  .143   .000  .016  .052  .085   .090
800 4 .8 .6 .3  -.006  .021  .089  .127   .002  .021  .074  .098  -.005  .017  .070  .138   .000  .017  .054  .115   .106
800 8 .4 .0 .0   .001  .022  .083  .136  -.001  .027  .111  .123  -.001  .010  .035  .220   .000  .013  .041  .186   .141
800 8 .4 .0 .3   .003  .029  .115  .109  -.004  .030  .118  .119  -.001  .012  .040  .176   .001  .012  .040  .173   .123
800 8 .4 .6 .0  -.004  .019  .079  .143   .003  .021  .088  .113  -.001  .012  .038  .237   .001  .012  .037  .154   .139
800 8 .4 .6 .3  -.005  .023  .117  .133   .004  .024  .120  .126  -.002  .012  .039  .170   .002  .012  .038  .150   .114
800 8 .8 .0 .0  -.002  .013  .045  .083   .000  .015  .051  .076  -.001  .008  .027  .175   .000  .009  .027  .125   .110
800 8 .8 .0 .3  -.002  .017  .063  .083   .001  .017  .063  .083  -.001  .009  .029  .137   .001  .010  .030  .134   .097
800 8 .8 .6 .0  -.003  .013  .046  .087   .000  .015  .052  .083  -.001  .008  .030  .183   .000  .009  .027  .115   .116
800 8 .8 .6 .3  -.003  .015  .056  .093   .002  .016  .063  .088  -.001  .008  .027  .117   .001  .009  .028  .108   .095


Table 4.4: Subset GMM estimator of Ahn, Lee, and Schmidt (2013)

Designs          GMM 1step α              GMM 1step β              GMM 2step α              GMM 2step β                 J
  N T  α  ρ  δ   Bias  RMSE  qStd  Size   Bias  RMSE  qStd  Size   Bias  RMSE  qStd  Size   Bias  RMSE  qStd  Size   Size
200 8 .4 .0 .0   .000  .022  .072  .102  -.001  .021  .074  .094  -.001  .014  .046  .262  -.001  .013  .041  .193   .128
200 8 .4 .0 .3   .008  .050  .185  .125  -.012  .052  .188  .125   .001  .025  .090  .263  -.002  .026  .088  .258   .087
200 8 .4 .6 .0  -.006  .023  .085  .134   .004  .019  .067  .090  -.003  .016  .053  .299   .002  .013  .041  .189   .098
200 8 .4 .6 .3  -.012  .044  .205  .131   .009  .042  .208  .124  -.006  .025  .094  .256   .005  .024  .089  .225   .087
200 8 .8 .0 .0  -.004  .021  .071  .094   .000  .013  .041  .060  -.002  .014  .047  .261   .000  .008  .027  .168   .116
200 8 .8 .0 .3  -.005  .034  .127  .092   .003  .035  .126  .090  -.004  .021  .073  .235   .002  .022  .070  .214   .088
200 8 .8 .6 .0  -.006  .022  .079  .115   .002  .012  .042  .072  -.003  .015  .054  .273   .001  .008  .026  .152   .097
200 8 .8 .6 .3  -.010  .032  .121  .109   .009  .034  .132  .101  -.006  .020  .071  .213   .006  .022  .071  .190   .088
800 8 .4 .0 .0   .001  .020  .079  .119  -.002  .026  .101  .116   .000  .011  .034  .189   .000  .012  .039  .166   .135
800 8 .4 .0 .3   .004  .027  .109  .105  -.004  .029  .111  .113  -.001  .012  .039  .166   .001  .012  .039  .152   .129
800 8 .4 .6 .0  -.004  .018  .076  .127   .002  .020  .081  .115  -.001  .012  .037  .208   .000  .011  .036  .130   .131
800 8 .4 .6 .3  -.004  .021  .099  .124   .004  .021  .100  .123  -.002  .012  .038  .151   .001  .012  .037  .130   .110
800 8 .8 .0 .0  -.002  .013  .046  .084   .000  .014  .051  .077  -.001  .009  .028  .162   .000  .009  .028  .121   .103
800 8 .8 .0 .3  -.003  .017  .060  .082   .001  .017  .061  .082  -.001  .009  .029  .132   .001  .009  .030  .131   .101
800 8 .8 .6 .0  -.003  .013  .047  .092  -.001  .014  .050  .078  -.001  .009  .030  .170   .000  .009  .027  .105   .108
800 8 .8 .6 .3  -.003  .014  .053  .089   .001  .015  .058  .082  -.001  .008  .027  .120   .001  .009  .028  .102   .094


Table 4.5: FIVU estimator of Robertson and Sarafidis (2015)

Designs          GMM 1step α              GMM 1step β              GMM 2step α              GMM 2step β                 J
  N T  α  ρ  δ   Bias  RMSE  qStd  Size   Bias  RMSE  qStd  Size   Bias  RMSE  qStd  Size   Bias  RMSE  qStd  Size   Size
200 4 .4 .0 .0   .001  .023  .068  .064  -.002  .022  .065  .048   .000  .021  .061  .073  -.001  .021  .060  .061   .031
200 4 .4 .0 .3   .008  .045  .132  .072  -.004  .043  .136  .068  -.003  .036  .111  .085   .001  .038  .113  .085   .031
200 4 .4 .6 .0   .000  .023  .069  .063   .001  .020  .060  .041   .000  .022  .064  .079   .000  .019  .057  .064   .029
200 4 .4 .6 .3  -.008  .036  .107  .064   .006  .036  .116  .064  -.006  .033  .100  .068   .003  .034  .102  .079   .031
200 4 .8 .0 .0   .000  .024  .075  .063   .000  .014  .042  .053  -.001  .020  .061  .070   .001  .012  .040  .069   .035
200 4 .8 .0 .3  -.003  .030  .099  .060   .003  .026  .088  .065  -.003  .028  .089  .076   .002  .024  .079  .080   .038
200 4 .8 .6 .0  -.002  .025  .079  .063   .001  .013  .041  .043  -.002  .020  .066  .071   .002  .012  .038  .066   .033
200 4 .8 .6 .3  -.006  .029  .093  .068   .004  .026  .082  .069  -.004  .028  .089  .084   .002  .025  .079  .085   .035
200 8 .4 .0 .0   .002  .014  .042  .072  -.002  .013  .041  .071   .001  .012  .036  .182   .000  .011  .034  .160   .032
200 8 .4 .0 .3   .012  .034  .097  .080  -.014  .034  .099  .085   .004  .021  .063  .173  -.004  .022  .065  .180   .035
200 8 .4 .6 .0   .000  .014  .042  .065   .000  .012  .035  .061   .000  .013  .037  .179   .000  .011  .033  .135   .032
200 8 .4 .6 .3  -.004  .025  .080  .056   .003  .026  .079  .054  -.002  .020  .060  .174   .002  .020  .061  .158   .034
200 8 .8 .0 .0  -.001  .013  .038  .053   .000  .008  .025  .050   .000  .011  .034  .168   .000  .007  .023  .143   .037
200 8 .8 .0 .3  -.001  .022  .066  .051   .001  .023  .068  .048  -.001  .018  .054  .163   .001  .018  .057  .155   .036
200 8 .8 .6 .0  -.001  .014  .039  .051   .000  .008  .023  .055   .000  .012  .035  .164   .001  .007  .022  .140   .037
200 8 .8 .6 .3  -.004  .020  .060  .048   .005  .023  .066  .048  -.003  .018  .053  .156   .002  .019  .057  .153   .030
800 4 .4 .0 .0   .000  .020  .061  .060   .000  .022  .073  .066   .000  .017  .051  .069  -.001  .020  .060  .069   .052
800 4 .4 .0 .3   .002  .024  .078  .072  -.001  .024  .081  .068  -.001  .020  .059  .059   .000  .020  .061  .063   .055
800 4 .4 .6 .0  -.002  .019  .055  .068   .002  .019  .058  .056  -.001  .017  .053  .074   .002  .018  .057  .066   .050
800 4 .4 .6 .3  -.004  .021  .063  .064   .002  .020  .067  .059  -.002  .018  .054  .060   .001  .018  .055  .065   .046
800 4 .8 .0 .0  -.002  .016  .053  .058   .000  .015  .047  .050  -.001  .016  .048  .067   .000  .013  .042  .056   .050
800 4 .8 .0 .3  -.002  .017  .055  .056   .001  .017  .053  .053  -.002  .015  .048  .058   .001  .015  .047  .052   .051
800 4 .8 .6 .0  -.004  .015  .051  .071   .000  .016  .049  .058  -.003  .014  .047  .077   .001  .015  .046  .059   .048
800 4 .8 .6 .3  -.004  .016  .052  .069   .002  .016  .050  .059  -.002  .015  .047  .066   .000  .015  .046  .058   .049
800 8 .4 .0 .0   .002  .013  .038  .056  -.003  .017  .050  .066   .000  .008  .025  .079   .000  .010  .031  .081   .050
800 8 .4 .0 .3   .005  .018  .055  .063  -.007  .019  .055  .064   .000  .010  .030  .080   .000  .010  .031  .083   .047
800 8 .4 .6 .0  -.001  .011  .031  .054   .000  .012  .035  .055  -.001  .009  .026  .078   .001  .010  .030  .080   .055
800 8 .4 .6 .3  -.001  .013  .039  .054   .000  .013  .038  .052  -.001  .010  .029  .078   .001  .010  .030  .077   .051
800 8 .8 .0 .0  -.001  .008  .026  .049   .000  .010  .030  .059   .000  .007  .021  .077   .000  .008  .024  .080   .050
800 8 .8 .0 .3   .000  .011  .034  .050   .001  .011  .034  .056   .000  .008  .024  .065   .000  .008  .025  .079   .052
800 8 .8 .6 .0  -.001  .008  .025  .050  -.001  .009  .028  .057  -.001  .007  .021  .084   .000  .008  .024  .079   .051
800 8 .8 .6 .3  -.001  .009  .029  .056   .000  .010  .031  .053   .000  .007  .024  .076   .000  .008  .025  .073   .059


Table 4.6: Subset FIVU estimator of Robertson and Sarafidis (2015)

Designs          GMM 1step α              GMM 1step β              GMM 2step α              GMM 2step β                 J
  N T  α  ρ  δ   Bias  RMSE  qStd  Size   Bias  RMSE  qStd  Size   Bias  RMSE  qStd  Size   Bias  RMSE  qStd  Size   Size
200 8 .4 .0 .0   .002  .013  .042  .076  -.003  .013  .041  .069   .000  .012  .035  .126  -.001  .011  .035  .116   .029
200 8 .4 .0 .3   .011  .032  .094  .088  -.012  .033  .094  .087   .001  .021  .064  .125  -.002  .021  .065  .119   .030
200 8 .4 .6 .0   .000  .014  .042  .068   .000  .012  .037  .049   .000  .013  .037  .127  -.001  .011  .034  .104   .032
200 8 .4 .6 .3  -.005  .025  .075  .057   .005  .024  .074  .059  -.002  .020  .060  .121   .002  .020  .060  .109   .030
200 8 .8 .0 .0   .000  .014  .042  .066   .000  .008  .025  .057  -.001  .012  .037  .136   .000  .008  .023  .115   .031
200 8 .8 .0 .3  -.002  .023  .068  .057   .001  .023  .068  .047  -.003  .018  .057  .125   .002  .019  .056  .116   .035
200 8 .8 .6 .0  -.002  .014  .044  .069   .001  .008  .024  .058  -.001  .012  .038  .134   .000  .008  .023  .101   .028
200 8 .8 .6 .3  -.005  .020  .061  .052   .005  .022  .067  .044  -.004  .018  .054  .122   .003  .019  .058  .103   .039
800 8 .4 .0 .0   .002  .013  .038  .059  -.003  .016  .047  .060   .000  .009  .026  .072   .000  .011  .033  .063   .044
800 8 .4 .0 .3   .004  .017  .051  .069  -.005  .017  .051  .072   .000  .010  .032  .076   .000  .011  .033  .074   .045
800 8 .4 .6 .0   .000  .011  .032  .060   .000  .012  .035  .058   .000  .009  .028  .077   .000  .010  .032  .071   .048
800 8 .4 .6 .3  -.001  .012  .038  .055   .001  .012  .038  .069  -.001  .010  .030  .079   .000  .010  .031  .071   .044
800 8 .8 .0 .0   .000  .010  .029  .059   .000  .010  .031  .055   .000  .008  .024  .068   .000  .008  .025  .072   .041
800 8 .8 .0 .3  -.001  .011  .034  .059   .001  .011  .032  .056  -.001  .008  .026  .068   .001  .008  .026  .068   .047
800 8 .8 .6 .0  -.001  .009  .029  .059   .000  .009  .029  .061   .000  .008  .025  .072   .000  .008  .025  .073   .046
800 8 .8 .6 .3  -.002  .010  .030  .049   .001  .010  .031  .056   .000  .008  .025  .078   .000  .009  .025  .067   .050


Table 4.7: FIVR estimator of Robertson and Sarafidis (2015)

Designs          GMM 1step α              GMM 1step β              GMM 2step α              GMM 2step β                 J
  N T  α  ρ  δ   Bias  RMSE  qStd  Size   Bias  RMSE  qStd  Size   Bias  RMSE  qStd  Size   Bias  RMSE  qStd  Size   Size
200 4 .4 .0 .0   .001  .019  .058  .068  -.002  .020  .060  .058   .000  .016  .047  .081  -.001  .018  .052  .081   .035
200 4 .4 .0 .3   .008  .037  .113  .081  -.006  .038  .122  .071  -.002  .027  .083  .081  -.001  .030  .090  .080   .033
200 4 .4 .6 .0   .000  .019  .057  .061   .000  .019  .055  .046   .000  .016  .048  .081   .000  .017  .051  .073   .031
200 4 .4 .6 .3  -.002  .031  .095  .062   .003  .034  .106  .065  -.001  .026  .079  .068   .000  .029  .088  .077   .032
200 4 .8 .0 .0   .001  .017  .055  .066   .000  .012  .038  .063   .000  .014  .044  .072   .000  .011  .035  .085   .035
200 4 .8 .0 .3   .000  .023  .073  .061   .002  .024  .076  .057   .000  .021  .061  .067   .000  .022  .067  .082   .039
200 4 .8 .6 .0  -.001  .018  .054  .059   .000  .012  .037  .060   .000  .014  .044  .068   .000  .011  .035  .086   .038
200 4 .8 .6 .3  -.001  .023  .071  .062   .002  .024  .076  .066   .000  .021  .062  .071   .000  .022  .072  .084   .041
200 8 .4 .0 .0   .001  .012  .037  .069  -.002  .013  .039  .068   .001  .011  .031  .181  -.001  .011  .033  .172   .043
200 8 .4 .0 .3   .015  .034  .095  .086  -.017  .036  .099  .087   .005  .020  .057  .214  -.006  .021  .061  .215   .043
200 8 .4 .6 .0   .000  .012  .036  .067  -.001  .011  .033  .062   .001  .011  .032  .189   .000  .011  .032  .163   .040
200 8 .4 .6 .3  -.002  .025  .077  .054   .001  .027  .080  .051  -.001  .018  .055  .197   .001  .020  .060  .186   .038
200 8 .8 .0 .0   .000  .011  .032  .054   .000  .008  .023  .051   .001  .009  .028  .179   .000  .007  .022  .155   .037
200 8 .8 .0 .3   .000  .019  .057  .047   .000  .022  .066  .045   .001  .015  .046  .183   .000  .018  .054  .174   .037
200 8 .8 .6 .0   .000  .011  .031  .054   .000  .007  .022  .051   .001  .009  .028  .181   .000  .007  .022  .159   .036
200 8 .8 .6 .3  -.003  .018  .055  .051   .004  .022  .066  .046  -.001  .016  .047  .176   .002  .018  .056  .177   .038
800 4 .4 .0 .0  -.001  .015  .045  .059   .000  .019  .061  .063  -.001  .012  .036  .066   .001  .016  .049  .066   .051
800 4 .4 .0 .3   .000  .021  .064  .068  -.001  .022  .070  .066  -.001  .015  .044  .068   .000  .016  .049  .060   .048
800 4 .4 .6 .0  -.001  .013  .041  .059   .002  .017  .052  .051  -.001  .012  .037  .062   .001  .015  .048  .056   .051
800 4 .4 .6 .3  -.002  .017  .051  .062   .002  .018  .058  .059  -.001  .014  .043  .059   .001  .016  .050  .058   .048
800 4 .8 .0 .0  -.001  .011  .034  .061   .000  .014  .043  .056   .000  .010  .030  .075   .000  .012  .038  .060   .045
800 4 .8 .0 .3   .000  .014  .042  .051   .000  .015  .046  .052  -.001  .011  .035  .062   .000  .013  .040  .059   .047
800 4 .8 .6 .0  -.001  .011  .033  .069   .000  .015  .044  .056   .000  .010  .029  .075   .000  .014  .042  .062   .042
800 4 .8 .6 .3  -.001  .013  .041  .064   .002  .015  .048  .056   .000  .011  .035  .057   .000  .014  .042  .059   .044
800 8 .4 .0 .0   .001  .011  .033  .050  -.002  .015  .047  .064   .000  .007  .020  .093   .000  .010  .028  .082   .054
800 8 .4 .0 .3   .005  .017  .053  .070  -.006  .018  .056  .073   .000  .008  .026  .082   .000  .010  .028  .082   .054
800 8 .4 .6 .0   .000  .009  .026  .051  -.001  .011  .033  .054   .000  .007  .021  .079   .000  .009  .028  .077   .053
800 8 .4 .6 .3  -.001  .012  .037  .052   .000  .013  .038  .056   .000  .009  .026  .078   .000  .009  .029  .079   .053
800 8 .8 .0 .0   .000  .006  .019  .053   .000  .010  .028  .061   .000  .005  .016  .082   .000  .007  .023  .080   .053
800 8 .8 .0 .3   .000  .010  .029  .055   .000  .011  .032  .054   .000  .007  .020  .078   .000  .008  .023  .081   .053
800 8 .8 .6 .0   .000  .006  .018  .051   .000  .009  .028  .057   .000  .005  .016  .078   .000  .008  .024  .080   .050
800 8 .8 .6 .3   .000  .009  .027  .055   .000  .010  .031  .055   .000  .007  .021  .079   .000  .008  .024  .079   .049


Table 4.8: Subset FIVR estimator of Robertson and Sarafidis (2015)

Designs          GMM 1step α              GMM 1step β              GMM 2step α              GMM 2step β                 J
  N T  α  ρ  δ   Bias  RMSE  qStd  Size   Bias  RMSE  qStd  Size   Bias  RMSE  qStd  Size   Bias  RMSE  qStd  Size   Size
200 8 .4 .0 .0   .002  .012  .038  .072  -.002  .012  .039  .071   .000  .010  .031  .131  -.001  .010  .032  .124   .035
200 8 .4 .0 .3   .012  .032  .092  .093  -.012  .034  .097  .089   .002  .018  .056  .142  -.003  .019  .059  .140   .038
200 8 .4 .6 .0   .001  .012  .037  .067   .000  .011  .034  .054   .000  .011  .032  .139  -.001  .010  .032  .124   .033
200 8 .4 .6 .3  -.002  .023  .070  .063   .003  .025  .074  .060  -.001  .018  .053  .133   .000  .020  .057  .122   .035
200 8 .8 .0 .0   .000  .011  .032  .066   .000  .008  .023  .061   .000  .010  .029  .142   .000  .008  .022  .116   .035
200 8 .8 .0 .3  -.001  .019  .058  .053   .001  .022  .065  .052   .000  .015  .046  .136   .001  .018  .052  .126   .039
200 8 .8 .6 .0   .000  .011  .032  .064   .000  .007  .023  .054   .000  .009  .029  .143   .000  .007  .022  .118   .038
200 8 .8 .6 .3  -.003  .018  .054  .056   .004  .022  .065  .049  -.001  .015  .047  .131   .001  .018  .055  .121   .042
800 8 .4 .0 .0   .001  .010  .032  .061  -.002  .014  .043  .064   .000  .007  .022  .077   .000  .010  .029  .074   .053
800 8 .4 .0 .3   .004  .015  .048  .071  -.005  .017  .051  .073   .000  .009  .027  .073   .000  .010  .029  .076   .048
800 8 .4 .6 .0   .000  .009  .026  .052   .000  .011  .033  .055   .000  .007  .023  .077   .000  .010  .029  .073   .055
800 8 .4 .6 .3   .000  .011  .035  .053   .000  .012  .036  .058   .000  .009  .026  .066   .000  .010  .029  .074   .049
800 8 .8 .0 .0   .000  .006  .020  .055   .000  .009  .028  .064   .000  .006  .017  .074   .000  .007  .023  .069   .044
800 8 .8 .0 .3   .000  .009  .028  .061   .000  .010  .030  .060   .000  .007  .020  .070   .000  .008  .023  .070   .052
800 8 .8 .6 .0   .000  .006  .019  .056   .000  .009  .028  .057   .000  .006  .017  .073   .000  .008  .024  .076   .046
800 8 .8 .6 .3   .000  .009  .026  .059   .000  .010  .030  .059   .000  .007  .021  .072   .000  .008  .024  .070   .050


Table 4.9: Projection GMM estimator of Hayakawa (2012) with weak exogeneity

Designs          GMM 1step α              GMM 1step β              GMM 2step α              GMM 2step β                 J
  N T  α  ρ  δ   Bias  RMSE  qStd  Size   Bias  RMSE  qStd  Size   Bias  RMSE  qStd  Size   Bias  RMSE  qStd  Size   Size
200 4 .4 .0 .0   .000  .025  .076  .058  -.001  .023  .075  .053  -.002  .023  .072  .087   .002  .023  .072  .074   .020
200 4 .4 .0 .3   .003  .056  .181  .078  -.003  .054  .172  .083  -.011  .055  .171  .113   .007  .050  .166  .113   .026
200 4 .4 .6 .0  -.001  .026  .081  .077   .003  .021  .070  .062  -.003  .026  .081  .106   .005  .022  .073  .097   .028
200 4 .4 .6 .3  -.016  .055  .191  .106   .015  .052  .191  .097  -.019  .056  .206  .153   .016  .050  .199  .141   .021
200 4 .8 .0 .0  -.001  .033  .107  .073   .001  .016  .050  .047  -.001  .031  .101  .092   .002  .015  .046  .062   .020
200 4 .8 .0 .3  -.009  .050  .179  .088   .002  .034  .116  .079  -.013  .052  .179  .132   .004  .035  .117  .111   .033
200 4 .8 .6 .0  -.003  .032  .104  .069   .003  .015  .050  .053  -.005  .032  .108  .108   .004  .015  .051  .088   .021
200 4 .8 .6 .3  -.013  .056  .212  .106   .009  .041  .142  .084  -.018  .059  .253  .167   .010  .041  .150  .122   .025
200 8 .4 .0 .0   .001  .015  .046  .075  -.001  .014  .046  .075   .000  .013  .039  .143   .000  .012  .038  .131   .018
200 8 .4 .0 .3   .015  .045  .134  .089  -.015  .044  .135  .099   .002  .031  .093  .147  -.002  .031  .092  .145   .021
200 8 .4 .6 .0   .000  .014  .044  .063   .001  .012  .037  .056   .000  .014  .042  .144   .001  .012  .035  .118   .028
200 8 .4 .6 .3  -.008  .038  .120  .066   .008  .038  .118  .051  -.006  .031  .089  .136   .006  .030  .089  .128   .029
200 8 .8 .0 .0  -.001  .016  .050  .059   .001  .009  .028  .068  -.001  .016  .046  .140   .001  .008  .026  .129   .021
200 8 .8 .0 .3  -.001  .033  .104  .052   .002  .030  .094  .056  -.004  .031  .090  .128   .004  .027  .081  .123   .019
200 8 .8 .6 .0  -.001  .015  .046  .045   .000  .009  .025  .058  -.002  .015  .047  .136   .001  .008  .025  .118   .026
200 8 .8 .6 .3  -.010  .041  .125  .059   .007  .038  .121  .059  -.009  .035  .106  .135   .008  .033  .100  .138   .026
800 4 .4 .0 .0  -.001  .026  .074  .065   .000  .026  .082  .079  -.001  .021  .064  .072   .001  .023  .069  .073   .035
800 4 .4 .0 .3   .000  .032  .105  .072  -.001  .031  .102  .075  -.003  .028  .090  .079   .003  .027  .088  .074   .037
800 4 .4 .6 .0  -.004  .024  .079  .086   .004  .022  .072  .069  -.004  .024  .077  .103   .004  .022  .074  .095   .038
800 4 .4 .6 .3  -.006  .031  .107  .081   .006  .030  .108  .072  -.005  .027  .089  .078   .004  .025  .089  .075   .042
800 4 .8 .0 .0  -.006  .037  .113  .066   .001  .019  .057  .054  -.007  .036  .108  .098   .002  .017  .052  .059   .036
800 4 .8 .0 .3  -.003  .028  .094  .067   .001  .021  .067  .060  -.004  .025  .088  .074   .001  .020  .063  .066   .044
800 4 .8 .6 .0  -.007  .028  .105  .081   .004  .021  .069  .071  -.008  .028  .105  .096   .003  .020  .067  .084   .045
800 4 .8 .6 .3  -.007  .031  .115  .077   .005  .026  .091  .064  -.007  .030  .106  .085   .005  .024  .083  .072   .043
800 8 .4 .0 .0   .003  .018  .053  .105  -.003  .022  .065  .118   .000  .011  .032  .082   .000  .012  .036  .077   .030
800 8 .4 .0 .3   .008  .025  .073  .072  -.007  .025  .072  .074   .000  .016  .048  .076   .000  .016  .047  .073   .027
800 8 .4 .6 .0  -.001  .014  .041  .060   .000  .013  .040  .055  -.001  .012  .037  .084   .001  .012  .035  .079   .042
800 8 .4 .6 .3  -.003  .022  .068  .054   .003  .022  .068  .056  -.001  .017  .047  .067   .001  .016  .047  .073   .035
800 8 .8 .0 .0  -.001  .019  .056  .068   .000  .013  .039  .079  -.002  .015  .046  .087   .001  .010  .029  .075   .031
800 8 .8 .0 .3   .000  .018  .055  .057   .000  .016  .048  .057  -.001  .014  .042  .080   .001  .012  .036  .079   .037
800 8 .8 .6 .0   .000  .015  .048  .046   .001  .011  .033  .055  -.001  .014  .046  .083   .001  .010  .030  .073   .040
800 8 .8 .6 .3  -.002  .020  .062  .053   .001  .019  .058  .055  -.001  .015  .047  .071   .000  .015  .043  .069   .041


Table 4.10: Subset Projection GMM estimator of Hayakawa (2012) with weak exogeneity

Designs               GMM 1-step, α                 GMM 1-step, β                 GMM 2-step, α                 GMM 2-step, β                 J
N    T  α   ρ   δ      Bias   RMSE   qStd   Size     Bias   RMSE   qStd   Size     Bias   RMSE   qStd   Size     Bias   RMSE   qStd   Size     Size
200  8  .4  .0  .0     .000   .014   .045   .074     .000   .014   .044   .064     .000   .013   .041   .123    -.001   .013   .039   .108     .013
200  8  .4  .0  .3     .009   .043   .133   .083    -.009   .043   .134   .088     .000   .033   .099   .120     .000   .032   .098   .128     .026
200  8  .4  .6  .0     .000   .014   .045   .071     .001   .012   .037   .054     .000   .015   .046   .140     .001   .012   .037   .111     .029
200  8  .4  .6  .3    -.009   .038   .118   .076     .009   .038   .118   .067    -.006   .031   .093   .110     .005   .030   .091   .109     .035
200  8  .8  .0  .0    -.001   .017   .055   .071     .001   .009   .029   .068    -.002   .017   .052   .138     .000   .009   .027   .115     .020
200  8  .8  .0  .3    -.004   .038   .117   .068     .004   .033   .105   .063    -.006   .033   .102   .116     .004   .029   .087   .109     .031
200  8  .8  .6  .0    -.002   .016   .050   .059     .000   .009   .026   .058    -.002   .017   .052   .122     .001   .009   .026   .104     .025
200  8  .8  .6  .3    -.011   .044   .139   .079     .009   .042   .130   .070    -.007   .037   .116   .131     .007   .035   .108   .120     .032
800  8  .4  .0  .0     .003   .017   .049   .083    -.003   .020   .057   .097    -.001   .011   .034   .072     .000   .013   .038   .072     .033
800  8  .4  .0  .3     .006   .024   .072   .067    -.005   .024   .072   .070     .000   .018   .051   .072     .001   .018   .051   .072     .034
800  8  .4  .6  .0    -.002   .014   .042   .053     .001   .014   .040   .051    -.001   .013   .039   .075     .001   .012   .036   .070     .044
800  8  .4  .6  .3    -.002   .022   .066   .058     .003   .022   .067   .054    -.001   .017   .050   .072     .000   .017   .048   .072     .044
800  8  .8  .0  .0     .000   .020   .064   .068     .000   .013   .039   .070    -.002   .017   .052   .092     .001   .010   .030   .076     .035
800  8  .8  .0  .3    -.001   .019   .057   .058     .000   .015   .048   .061    -.001   .015   .047   .076     .001   .013   .039   .069     .036
800  8  .8  .6  .0    -.001   .016   .052   .054     .001   .011   .034   .055    -.002   .015   .052   .079     .001   .011   .031   .065     .044
800  8  .8  .6  .3    -.003   .021   .065   .059     .002   .019   .061   .062    -.001   .017   .051   .068     .000   .016   .046   .068     .042


Table 4.11: Conditional likelihood estimator of Bai (2013b)

Designs               Strict, α                     Strict, β                     Weak, α                       Weak, β
N    T  α   ρ   δ      Bias   RMSE   qStd   Size     Bias   RMSE   qStd   Size     Bias   RMSE   qStd   Size     Bias   RMSE   qStd   Size
200  4  .4  .0  .0     .001   .013   .040   .052    -.001   .013   .038   .050    -.001   .013   .039   .059     .002   .013   .036   .066
200  4  .4  .0  .3     .003   .027   .081   .150    -.015   .031   .103   .207    -.001   .025   .074   .127     .000   .027   .078   .161
200  4  .4  .6  .0     .000   .014   .040   .053     .000   .013   .038   .052    -.011   .017   .043   .129     .024   .024   .039   .302
200  4  .4  .6  .3     .000   .025   .074   .109    -.006   .029   .090   .167    -.040   .042   .081   .350     .050   .051   .078   .445
200  4  .8  .0  .0     .000   .013   .040   .054     .000   .009   .026   .059     .000   .013   .039   .052    -.002   .009   .025   .069
200  4  .8  .0  .3    -.005   .026   .225   .234    -.016   .030   .134   .313     .000   .019   .058   .093     .000   .020   .060   .142
200  4  .8  .6  .0     .000   .013   .039   .048     .000   .009   .027   .059    -.005   .014   .041   .066     .011   .012   .026   .166
200  4  .8  .6  .3    -.003   .022   .075   .162    -.002   .026   .082   .194    -.025   .028   .060   .205     .035   .035   .055   .347
200  8  .4  .0  .0     .000   .008   .024   .056     .000   .008   .024   .051    -.001   .008   .024   .053     .001   .008   .024   .064
200  8  .4  .0  .3     .005   .015   .045   .086    -.005   .016   .047   .096     .001   .016   .049   .120    -.003   .018   .053   .144
200  8  .4  .6  .0     .000   .009   .025   .059     .000   .008   .024   .057    -.006   .010   .025   .088     .012   .013   .025   .190
200  8  .4  .6  .3     .003   .015   .044   .076    -.003   .016   .047   .090    -.023   .025   .054   .290     .027   .028   .057   .328
200  8  .8  .0  .0     .000   .008   .024   .053     .000   .006   .017   .062     .000   .008   .024   .050    -.001   .006   .018   .064
200  8  .8  .0  .3    -.008   .015   .044   .131     .007   .018   .054   .155     .000   .015   .047   .148    -.001   .019   .057   .179
200  8  .8  .6  .0     .000   .008   .024   .052     .000   .006   .017   .059    -.003   .008   .024   .065     .006   .007   .017   .122
200  8  .8  .6  .3    -.009   .015   .042   .128     .011   .019   .052   .150    -.021   .022   .050   .256     .027   .029   .057   .332
800  4  .4  .0  .0     .000   .010   .031   .060     .001   .012   .035   .051    -.003   .011   .033   .095     .004   .015   .043   .172
800  4  .4  .0  .3     .002   .022   .072   .339    -.014   .028   .116   .438     .001   .020   .061   .301    -.002   .025   .078   .415
800  4  .4  .6  .0     .000   .010   .031   .057     .001   .012   .035   .052    -.025   .026   .051   .449     .064   .064   .076   .798
800  4  .4  .6  .3    -.002   .021   .063   .297    -.003   .028   .099   .409    -.044   .044   .073   .642     .056   .056   .074   .741
800  4  .8  .0  .0    -.001   .009   .027   .057     .000   .010   .030   .049     .000   .009   .027   .058    -.006   .012   .038   .182
800  4  .8  .0  .3    -.008   .024   .250   .448    -.019   .035   .170   .578    -.002   .016   .049   .263     .001   .022   .067   .411
800  4  .8  .6  .0     .000   .009   .027   .055     .000   .010   .032   .052    -.007   .011   .030   .134     .040   .040   .045   .722
800  4  .8  .6  .3    -.008   .022   .077   .388     .005   .031   .110   .516    -.034   .034   .049   .616     .049   .049   .055   .779
800  8  .4  .0  .0     .000   .006   .018   .058     .000   .008   .023   .056    -.001   .007   .020   .081     .002   .010   .029   .154
800  8  .4  .0  .3     .005   .011   .030   .211    -.006   .012   .035   .241     .002   .013   .039   .314    -.004   .016   .048   .385
800  8  .4  .6  .0    -.001   .006   .019   .054     .000   .008   .023   .050    -.014   .015   .025   .403     .035   .035   .044   .708
800  8  .4  .6  .3     .003   .010   .029   .175    -.003   .012   .035   .224    -.026   .026   .047   .586     .030   .031   .054   .629
800  8  .8  .0  .0     .000   .005   .015   .050     .000   .007   .019   .058     .000   .005   .015   .052    -.004   .008   .024   .163
800  8  .8  .0  .3    -.006   .009   .024   .212     .007   .012   .035   .301     .000   .010   .031   .270    -.001   .014   .041   .369
800  8  .8  .6  .0     .000   .005   .015   .046     .000   .007   .020   .057    -.004   .006   .016   .118     .021   .021   .031   .553
800  8  .8  .6  .3    -.007   .010   .023   .214     .010   .013   .033   .323    -.020   .020   .032   .568     .026   .027   .040   .655

Chapter 5

Pseudo Panel Data Models with

Cohort Interactive Effects

5.1 Introduction

Over the last three decades panel data techniques have proved to be of high value for both

micro and macro economists. Nevertheless, genuine microeconomic panel data can still

be difficult and costly to obtain and administer. The non-availability of genuine panel

datasets can be especially problematic for developing countries with a limited amount

of administrative data that tracks individuals over time. In such cases, repeated cross-

section surveys can be used to form so-called pseudo panels.

Models for this type of data in economics were introduced by Deaton (1985), with

early contributions by Verbeek and Nijman (1992) and Moffitt (1993) among others.

Although pseudo panel data models have not been analysed as extensively as their genuine counterparts, the volume of literature on these types of models is increasing. For some recent theoretical papers, readers may be referred to McKenzie (2004), Verbeek and Vella (2005) and Inoue (2008) [hereafter I2008] inter alia, while Verbeek (2008)

provides an excellent overview of the literature.

Existing estimation methods for linear pseudo panel data models assume that the un-

observed individual heterogeneity can be properly captured using the standard additive

error component structure. However, in some cases this assumption might be too re-

strictive to properly describe the data at hand. For genuine panel data models, there is

a substantial literature available on models that use a multiplicative error component


structure of the factor type to capture the unobserved individual characteristics in a

more flexible way, see e.g. Pesaran (2006), Bai (2009), Sarafidis et al. (2009) and the

survey of Sarafidis and Wansbeek (2012).

The key component of pseudo panel analysis is the use of cohort information in esti-

mation. We use the generic term “cohorts” to describe any grouping structure based

on variables like gender, race, or age. In this paper, we introduce a factor structure to

linear pseudo panel data models with a fixed number of time periods and cohorts. We

provide several theoretical contributions to the existing pseudo panel data literature.

Firstly, if the common mean assumption is violated for all cohorts, but can be main-

tained within cohorts, a Generalized Method of Moments (GMM) estimator based on

the quasi-differencing approach of Ahn, Lee, and Schmidt (2013) is consistent. Sec-

ondly, we discuss identification, estimation and inference properties of this estimator

for potentially unbalanced samples.

In addition to the theoretical results of the novel estimator, an extensive Monte Carlo

simulation study is conducted to assess the finite sample properties. We mainly focus

on the robustness of the proposed estimator with respect to endogenous variables,

cohort interactive effects, and weak identification.

We also apply our estimator in an empirical analysis of the labour supply elasticity

in Ecuador over the period of 2007-2013. We use annual survey data to construct ten

cohorts based on the corresponding heads of the household that work full time. To

account for possible general non-linear trends in labour supply we allow for a non-

additive factor structure using the newly developed estimator. We find a statistically

significant negative wage effect on hours worked.

Here we briefly introduce our notation. The usual $\operatorname{vec}(\cdot)$ operator denotes the column stacking operator. The commutation matrix $K_{a,b}$ is defined in such a way that for any $[a \times b]$ matrix $A$, $\operatorname{vec}(A') = K_{a,b}\operatorname{vec}(A)$. $\otimes$ denotes the Kronecker product satisfying the property $\operatorname{vec}(ABC) = (C' \otimes A)\operatorname{vec}(B)$ and $\imath_T$ is a $[T \times 1]$ vector of ones. For some set $A$, we denote its cardinality by $|A|$. Finally, $1(\cdot)$ is the usual indicator function. For further details regarding the notation used in this paper see Abadir and Magnus (2002).
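The two identities in the notation paragraph can be checked numerically. The following is only an illustrative sketch in Python with NumPy; the helper names `vec` and `commutation_matrix` are assumptions introduced here and are not part of the original text.

```python
import numpy as np

def vec(A):
    """Column-stacking operator: stack the columns of A into one vector."""
    return np.asarray(A).reshape(-1, order="F")

def commutation_matrix(a, b):
    """Return K_{a,b} with vec(A.T) == K_{a,b} @ vec(A) for any (a x b) matrix A."""
    K = np.zeros((a * b, a * b))
    for i in range(a):
        for j in range(b):
            # A[i, j] is entry j*a + i of vec(A) and entry i*b + j of vec(A.T)
            K[i * b + j, j * a + i] = 1.0
    return K

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 4))
B = rng.normal(size=(4, 2))
C = rng.normal(size=(2, 5))
K34 = commutation_matrix(3, 4)

# vec(A') = K_{a,b} vec(A)
print(np.allclose(K34 @ vec(A), vec(A.T)))
# vec(ABC) = (C' kron A) vec(B)
print(np.allclose(vec(A @ B @ C), np.kron(C.T, A) @ vec(B)))
```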


5.2 The Model

In this paper, we consider a linear panel data model with group specific membership

$$y_{i,t} = \beta' w_{s,t} + \zeta' z_{i,t} + \nu_{i,t}, \quad \nu_{i,t} = \lambda_i' f_t + \varepsilon_{i,t}, \quad E[\varepsilon_{i,t}] = 0, \quad i \in I_{s,t}, \tag{5.1}$$

where $I_{s,t}$ is the set of all individuals (in total $N_{s,t}$) that are in group $s = 1, \dots, S$ at time $t = 1, \dots, T$, $w_{s,t}$ is a $K_w$-dimensional vector of group-time-specific covariates, and $z_{i,t}$ is a $K_z$-dimensional vector of individual specific explanatory variables. Thus, in total there are $K_z + K_w = K$ parameters of interest for observed explanatory variables. We denote the combined parameter vector by $\theta = (\beta', \zeta')'$. For ease of exposition we shall assume at this stage that $z_{i,t}$ does not contain any lags of $y_{i,t}$. Extensions to dynamic models are formally discussed in Section 5.3.4.

We use the generic term cohorts to describe any grouping structure based on some

selection variable. In the literature variables like gender, race, region of residence and

most popularly age are used to define the group participation, see Verbeek (2008) and

McKenzie (2004).

To allow for individual specific unobserved characteristics, $\nu_{i,t}$ contains the multifactor error term $\lambda_i' f_t = \sum_{l=1}^{L} \lambda_i^{(l)} f_t^{(l)}$. The $L$-dimensional vectors $\lambda_i$ and $f_t$ are individual specific factor loadings and time specific factors, respectively. The standard two component (Fixed Effects (FE)) model (as in e.g. McKenzie (2004), I2008 and Verbeek (2008)) can be obtained by setting $f_t$ to some constant $c$, such that $\lambda_i' f_t = \delta_i$, $\forall t$.

Estimation of the model in (5.1) is straightforward if E[νi,t|zi,t] = 0 and can be per-

formed by using pooled cross-sectional OLS. However, in most cases of empirical in-

terest these conditions can be violated as the unobserved individual characteristics λi

are correlated with observed individual characteristics zi,t. Hence, if the correlation

is non-zero we have to rely on either general “external” instruments or pseudo panel

techniques that use the cohort structure of the dataset as instruments, as originally

suggested by Deaton (1985). This paper deals with the latter type of estimators.

Before defining the estimators considered in this paper, we first introduce some notation. All estimators discussed in this paper can be expressed solely in terms of matrices and vectors containing cross-sectional averages. Observations at the individual level $i$, on the other hand, are only used for the estimation of the asymptotic variance-covariance matrices. By taking the cross-sectional average for some group $s$ at time $t$ we obtain the following aggregated equation

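$$y_{s,t} = \theta' x_{s,t} + \nu_{s,t}, \quad s = 1, \dots, S, \quad t = 1, \dots, T. \tag{5.2}$$

Here we denote

$$y_{s,t} = \frac{1}{N_{s,t}} \sum_{i \in I_{s,t}} y_{i,t}, \quad x_{s,t} = \frac{1}{N_{s,t}} \sum_{i \in I_{s,t}} x_{i,s,t}, \quad \lambda_{s,t} = \frac{1}{N_{s,t}} \sum_{i \in I_{s,t}} \lambda_i, \quad \varepsilon_{s,t} = \frac{1}{N_{s,t}} \sum_{i \in I_{s,t}} \varepsilon_{i,t},$$

$$\nu_{s,t} = \lambda_{s,t}' f_t + \varepsilon_{s,t}, \quad x_{i,s,t} = (w_{s,t}', z_{i,t}')'.$$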
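As a rough sketch of how the cross-sectional averages in (5.2) could be computed from pooled repeated cross-sections, consider the following. The function name `cohort_time_averages`, the integer cohort and period labels starting at zero, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def cohort_time_averages(y, x, cohort, period, S, T):
    """Average y (n,) and x (n, K) within every (cohort, period) cell.

    cohort and period hold integer labels in 0..S-1 and 0..T-1.
    Returns y_bar (S, T), x_bar (S, T, K) and the cell sizes N_st (S, T).
    """
    cohort = np.asarray(cohort)
    period = np.asarray(period)
    K = x.shape[1]
    counts = np.zeros((S, T))
    y_sum = np.zeros((S, T))
    x_sum = np.zeros((S, T, K))
    # np.add.at accumulates correctly when the same cell index repeats
    np.add.at(counts, (cohort, period), 1.0)
    np.add.at(y_sum, (cohort, period), y)
    np.add.at(x_sum, (cohort, period), x)
    if np.any(counts == 0):
        raise ValueError("at least one cohort-period cell is empty")
    return y_sum / counts, x_sum / counts[:, :, None], counts

# Toy data: five individuals, two cohorts, one period
y = np.array([1.0, 3.0, 10.0, 20.0, 30.0])
x = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
cohort = np.array([0, 0, 1, 1, 1])
period = np.array([0, 0, 0, 0, 0])
y_bar, x_bar, N_st = cohort_time_averages(y, x, cohort, period, S=2, T=1)

# Stack cohort by cohort, all periods of cohort 1 first, then cohort 2, and so on
y_stacked = y_bar.reshape(-1)
X_stacked = x_bar.reshape(-1, x.shape[1])
```

The row-major reshape in the last two lines yields the cohort-by-cohort stacking order used for the stacked vectors and matrices in the text.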

After performing cross-sectional averaging we stack all observations over the time-dimension for some cohort $s$

$$y_s = X_s \theta + \nu_s, \tag{5.3}$$

where $X_s$, a $[T \times K]$ dimensional matrix, is defined as

$$X_s = (x_{s,1}, \dots, x_{s,T})' \tag{5.4}$$

and similarly for the $T$ dimensional vectors $y_s$ and $\nu_s$. Finally, we can stack observations for all cohorts

$$y = X\theta + \nu, \tag{5.5}$$

where the corresponding $s$ specific vectors/matrices are stacked on top of each other, e.g.

$$y = (y_1', \dots, y_S')', \quad X = (X_1', \dots, X_S')'. \tag{5.6}$$

It is important that already at this point we discuss the asymptotic setup that one

can use to derive the theoretical results. Using the terminology of Verbeek (2008) we

formulate commonly used asymptotic schemes:

Type I: $N_{s,t} \to \infty$. $T$ and $S$ are fixed;

Type II: $N_{s,t}$ and $T$ fixed but $S \to \infty$;

Type III: $N_{s,t} \to \infty$ and $T \to \infty$ but $S$ fixed.

In this paper, we assume that one possesses a dataset such that the Type I asymptotic scheme is reasonable for describing the finite sample properties of the estimator considered. Hence, unless stated otherwise, $\xrightarrow{p}$ / $\xrightarrow{d}$ are used to denote convergence in probability/distribution as all $N_{s,t} \to \infty$.


A well known implication (see e.g. I2008) of the Type I asymptotics is the robustness of the estimator based on cross-sectional averages to the presence of endogenous explanatory variables. However, as discussed later in this paper, robustness to endogeneity is only achieved under the assumption of strong identification. Another implication for our analysis is that under Type I (unlike Type II) asymptotics, the estimator that is discussed in this paper does not suffer from the “many instrument” bias as in Bekker (1994) and Bekker and van der Ploeg (2005). The intuition behind these properties is discussed later in the paper.

5.3 Cohort Interactive Effects

5.3.1 Inconsistency of the conventional Fixed Effects estimator

In this section we show that the conventional Fixed Effects type estimator for pseudo panel data models is inconsistent if $\nu_{i,t}$ has a factor structure. The conventional estimator can be motivated assuming that

$$E[\lambda_i' f_t \mid i \in I_{s,t}] = \delta_s. \tag{5.7}$$

Here we refer to $\delta_s$ as the cohort fixed effect. This condition is satisfied if, for example, $f_t = c$ is a time-invariant vector and $E[\lambda_i \mid i \in I_{s,t}] = \lambda_s$, or the factor loadings $\lambda_i$ have zero mean, i.e. $E[\lambda_i \mid i \in I_{s,t}] = 0_L$.

Assuming (5.7), one can then rewrite (5.5) as

$$y = X\theta + \operatorname{vec}(\imath_T \delta') + (\nu - \operatorname{vec}(\imath_T \delta')) = \operatorname{vec}(\imath_T \delta') + X\theta + u, \tag{5.8}$$

where $\delta = (\delta_1, \dots, \delta_S)'$ and $u \equiv \nu - \operatorname{vec}(\imath_T \delta')$. The cohort fixed effects vector $\delta$ can then be eliminated from (5.8) using the within group transformation matrix of the form $M = I_S \otimes (I_T - (1/T)\imath_T \imath_T')$. Using the terminology of I2008 (or alternatively of Dargay (2007)) the GMM (or the Fixed Effects (FE)) estimator is given by

$$\hat{\theta}_{GMMl} = (X' M \Omega M X)^{-1} X' M \Omega M y, \tag{5.9}$$

where the subscript $l$ stands for “linear” and $\Omega$ is some pre-specified $[ST \times ST]$ positive-definite weighting matrix. The asymptotically efficient version of this estimator can be found in Appendix 5.A.3 together with the underlying assumptions.
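A minimal numerical sketch of the estimator in (5.9), assuming the stacked cohort averages are already available as NumPy arrays. The function names `within_matrix` and `gmm_linear`, the identity default for the weighting matrix, and the simulated noise-free data are assumptions for illustration only.

```python
import numpy as np

def within_matrix(S, T):
    """M = I_S kron (I_T - (1/T) iota_T iota_T')."""
    iota = np.ones((T, 1))
    return np.kron(np.eye(S), np.eye(T) - iota @ iota.T / T)

def gmm_linear(y, X, S, T, Omega=None):
    """(X'M Omega M X)^{-1} X'M Omega M y for cohort-stacked y (ST,) and X (ST, K)."""
    M = within_matrix(S, T)
    if Omega is None:
        Omega = np.eye(S * T)  # identity weighting as a simple default
    XM = X.T @ M
    return np.linalg.solve(XM @ Omega @ M @ X, XM @ Omega @ M @ y)

# Noise-free illustration with purely additive cohort effects
rng = np.random.default_rng(1)
S, T, K = 4, 5, 2
theta = np.array([0.5, -1.0])
X = rng.normal(size=(S * T, K))
delta = rng.normal(size=S)
# vec(iota_T delta') repeats each cohort effect T times
y = X @ theta + np.kron(delta, np.ones(T))
theta_hat = gmm_linear(y, X, S, T)
print(theta_hat)
```

Because the within transformation removes `np.kron(delta, np.ones(T))` exactly, the estimate coincides with `theta` up to rounding in this noise-free design.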

This estimator remains consistent provided that (5.7) holds, because in this case $Mu \xrightarrow{p} 0_{S(T-1)}$. This condition in some cases can be too restrictive as it imposes that all cohorts respond similarly to common shocks (on average). However, it can still be reasonable to maintain the less restrictive assumption that $E[\lambda_i \mid i \in I_{s,t}] = \lambda_s$, such that

$$E[\lambda_i' f_t \mid i \in I_{s,t}] = \lambda_s' f_t. \tag{5.10}$$

Under this assumption all individuals $i$ in cohort $s$ have an error-component structure with a common time-varying mean, or in other words, a cohort interactive effects structure.

Before formally characterizing the asymptotic properties of the $\hat{\theta}_{GMMl}$ estimator under the cohort interactive effects structure in (5.10), we define

$$F = (f_1, \dots, f_T)', \quad \Lambda = (\lambda_1, \dots, \lambda_S)'.$$

Here $F$ and $\Lambda$ are $[T \times L]$ and $[S \times L]$ matrices of factors and cohort factor loadings, respectively. As a special case, in the fixed effects model, both $F = c$ and $\Lambda = \delta$ are $T$ and $S$ dimensional vectors. For the more general model we define $u_{i,t}$ to be

$$u_{i,t} \equiv \nu_{i,t} - \lambda_s' f_t, \quad i \in I_{s,t},$$

such that the newly combined error term has mean zero, i.e. $E[u_{i,t}] = 0$, $i \in I_{s,t}$. Using this notation we state formally the assumptions we impose on the error terms $u_{i,t}$.

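(A.1) $N_{s,t} \to \infty$, $\forall s, t$; $\exists N \to \infty$ s.t. $N_{s,t}/N \to \pi_{s,t}$ and $0 < \min \pi_{s,t} < \max \pi_{s,t} < \infty$. $T$ and $S$ are fixed (Type I asymptotics).

(A.2) $u_{i,t}$ are i.h.d. with finite $2 + \delta$ moment, for $\delta > 0$, such that $\sqrt{N_{s,t}}\, u_{s,t} \xrightarrow{d} N(0, \sigma^2_{s,t})$ jointly $\forall s, t$ with $0 < \min \sigma^2_{s,t} \le \max \sigma^2_{s,t} < \infty$.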

Assumption (A.1) states that the number of individuals per cohort at any time t

should be large and asymptotically non-negligible as compared to N , while the number

of cohorts and time periods is fixed.


Remark 5.1. Note that in (A.1) instead of explicitly assuming that $N = \sum_{t=1}^{T} \sum_{s=1}^{S} N_{s,t}$ we allow for some generic $N$. The estimators and the test statistics considered in this paper are invariant to a particular choice of $N$. In general, one can think of $N$ as the sum (as in Inoue (2008)), the average or even any particular value of $N_{s,t}$ (as in McKenzie (2004)).

In (A.2) we do not impose the i.i.d. assumption unlike I2008, but allow for het-

eroscedasticity between individuals and over time. Furthermore, this assumption can

be relaxed by allowing a certain degree of spatial dependence between individuals of

the same cohort. In that case consistency (or inconsistency) properties of all estimators

discussed in this paper are not altered, but the knowledge about the exact structure of

the spatial dependence is required for correct inference.

Similarly to the model with cohort fixed effects one can rewrite the stacked equation

for y as

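$$y = X\theta + \nu = \operatorname{vec}(F\Lambda') + X\theta + u.$$

In this case under high-level Assumptions (A.1)-(A.2) the following result applies to the $\hat{\theta}_{GMMl}$ estimator.

Proposition 5.1. If (5.10) holds, $\operatorname{plim}_{N \to \infty} X = X_\infty$ and $F\Lambda'$ are deterministic, then under Assumptions (A.1)-(A.2)

$$\hat{\theta}_{GMMl} - \theta_0 \xrightarrow{p} (X_\infty' M \Omega M X_\infty)^{-1} X_\infty' M \Omega M \operatorname{vec}(F\Lambda').$$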


Thus, the GMM/FE estimator converges in probability to a value that depends on

unobserved factors (F ) and on cohort factor loadings (Λ). Note that, in principle, it

is possible that both limiting quantities have zero mean (hence θGMMl can be asymp-

totically unbiased), if one assumes FΛ′ to be stochastic. Technical reasons behind the

assumption that some of the quantities have to be deterministic are discussed later in

this paper in more detail.
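The limit in Proposition 5.1 can be evaluated directly once values for $X_\infty$, $F$ and $\Lambda$ are fixed. The sketch below is illustrative only: the function name `fe_bias_limit`, the identity default weighting and the randomly drawn inputs are assumptions, chosen to contrast a constant factor with a trending one.

```python
import numpy as np

def fe_bias_limit(X_inf, F, Lam, Omega=None):
    """Evaluate (X'M Omega M X)^{-1} X'M Omega M vec(F Lam') for cohort-stacked X_inf."""
    T, S = F.shape[0], Lam.shape[0]
    M = np.kron(np.eye(S), np.eye(T) - np.ones((T, T)) / T)
    if Omega is None:
        Omega = np.eye(S * T)
    # vec(F Lam'): column s of the (T x S) matrix F Lam' is the block of cohort s
    v = (F @ Lam.T).reshape(-1, order="F")
    XM = X_inf.T @ M
    return np.linalg.solve(XM @ Omega @ M @ X_inf, XM @ Omega @ M @ v)

rng = np.random.default_rng(2)
S, T, K = 4, 5, 2
X_inf = rng.normal(size=(S * T, K))
Lam = rng.normal(size=(S, 1))

F_const = np.ones((T, 1))                          # additive cohort effects only
F_trend = np.linspace(0.0, 1.0, T).reshape(T, 1)   # time-varying factor

bias_const = fe_bias_limit(X_inf, F_const, Lam)
bias_trend = fe_bias_limit(X_inf, F_trend, Lam)
print(bias_const, bias_trend)
```

With a constant factor the within transformation annihilates $\operatorname{vec}(F\Lambda')$ and the limit is zero; with a trending factor and heterogeneous loadings it is generally non-zero.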

5.3.2 Assumptions and estimation

Given that the $\hat{\theta}_{GMMl}$ estimator is in general inconsistent in the presence of the multifactor error structure, another estimation strategy is needed to obtain consistent estimates of $\theta$. For this purpose, we adopt the quasi-differencing (QD) approach of Ahn et al. (2001) and Ahn et al. (2013), which is tailored for genuine panel data models with fixed $T$. Their approach suggests the use of the transformation matrix $M_s(\phi)$ that depends on an unknown parameter vector $\phi$ so that (for $T > L$)

$$M_s(\phi) F = O_{(T-L) \times L}.$$

In other words, one has to introduce the additional parameter vector $\phi$ in order to remove the unobserved factors $F$ from the model. Unlike the standard setup with fixed effects $\delta_s$ only, where the factors and consequently the corresponding transformation matrix are known (up to a constant), the $M_s(\phi)$ matrix is unknown and depends on $\phi$ which has to be estimated jointly with $\theta$.

Observe that for each $[L \times L]$ invertible matrix $A$ we have

$$F\lambda_s = (FA)(A^{-1}\lambda_s) = F^* \lambda_s^*.$$

In order to avoid this rotational indeterminacy (or in other words, non-uniqueness to multiplication), we can normalise $F^* = (\Phi', -I_L)'$ (assuming that the lower $[L \times L]$ block ($F_{L \times L}$) of the $F$ matrix is of full rank). One can then set $M_s(\phi)$ to be

$$M_s(\phi) = (I_{T-L}, \Phi), \tag{5.11}$$

where $\phi = \operatorname{vec}(\Phi)$. Analogously to the fixed effects transformation matrix, we define the stacked version of this matrix using the Kronecker product, i.e. $M(\phi) = I_S \otimes M_s(\phi)$.
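The normalisation and the annihilation property of the quasi-differencing matrix can be illustrated numerically. In the sketch below the helper names `normalise_factors` and `qd_matrix` and the randomly drawn $F$ are assumptions; the rotation $A = -F_{L \times L}^{-1}$ is simply one choice that delivers the stated normalisation.

```python
import numpy as np

def normalise_factors(F):
    """Rotate F (T x L) into F* = (Phi', -I_L)'.

    Requires the last L x L block of F to be invertible.
    Returns F* and the rotation matrix A with F* = F A.
    """
    T, L = F.shape
    A = -np.linalg.inv(F[T - L:, :])
    return F @ A, A

def qd_matrix(Phi):
    """Quasi-differencing matrix M_s(phi) = (I_{T-L}, Phi)."""
    return np.hstack([np.eye(Phi.shape[0]), Phi])

rng = np.random.default_rng(3)
T, L = 6, 2
F = rng.normal(size=(T, L))
lam = rng.normal(size=L)

F_star, A = normalise_factors(F)
Phi = F_star[:T - L, :]          # (T-L) x L upper block of F*
M_s = qd_matrix(Phi)

print(np.allclose(M_s @ F, 0.0))                                # factors removed
print(np.allclose(F @ lam, F_star @ np.linalg.solve(A, lam)))   # F lam = F* lam*
```

Since $M_s(\phi) F^* = \Phi - \Phi = O$ and $F = F^* A^{-1}$, the same matrix also removes the unrotated factors.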

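Given the transformation matrix $M(\phi)$ we define the non-linear GMM estimator $\hat{\gamma}_{GMMn} = (\hat{\theta}_{GMMn}', \hat{\phi}_{GMMn}')'$ as the global minimiser of the following objective function

$$f(\gamma) = \frac{1}{2}\left[(y - X\theta)' M(\phi)' \Omega M(\phi)(y - X\theta)\right], \tag{5.12}$$

for some pre-specified $[S(T-L) \times S(T-L)]$ positive definite weighting matrix $\Omega$. The corresponding gradient of the objective function in (5.12) is given by

$$\nabla f(\gamma) = \begin{pmatrix} -X' M(\phi)' \\ Q'\left((y - X\theta) \otimes I_{S(T-L)}\right) \end{pmatrix} \Omega M(\phi)(y - X\theta) = \begin{pmatrix} D_\theta(\gamma)' \\ D_\phi(\gamma)' \end{pmatrix} \Omega M(\phi)(y - X\theta).$$

Here $D_\gamma(\gamma) = (D_\theta(\gamma), D_\phi(\gamma))$ is the Jacobian matrix of the moment conditions $M(\phi)(y - X\theta)$, evaluated at some $\gamma$ (when evaluated at the true value $\gamma_0$ we suppress the dependence on $\gamma$). Finally, $Q$ is an $[S^2 T(T-L) \times (T-L)L]$ selection matrix of the following form

$$Q = \left(\left((I_S \otimes K_{T,S})(\operatorname{vec} I_S \otimes I_T)\right) \otimes I_{T-L}\right) V, \quad V = \left(\begin{pmatrix} O_{(T-L) \times L} \\ I_L \end{pmatrix} \otimes I_{T-L}\right),$$

with zeros and ones as elements.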

Before proceeding, we extend the set of the high-level assumptions that are sufficient

to prove the asymptotic results for γGMMn.

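(A.3) $\gamma = (\theta', \phi')'$; $\gamma \in \Gamma \subset \mathbb{R}^{K + (T-L)L}$ and $\gamma_0 \in \operatorname{interior}(\Gamma)$. The parameter space $\Gamma$ is compact.

(A.4) $\operatorname{rk}\left[\operatorname{plim}_{N \to \infty} D_\gamma\right] = K + (T-L)L$. The $X_\infty \equiv \operatorname{plim}_{N \to \infty} X$ matrix is deterministic.

(A.5) a) $L = L_0 < \min\{S, T\}$, while $(S-L)(T-L) > K$ with $L_0$ being the true number of factors with non-zero mean factor loadings. b) $\operatorname{rk}(\Lambda) = L_0$. c) $F_{L \times L}^{-1}$ exists. d) $F$ and $\Lambda$ are deterministic matrices.

(A.6) The model is asymptotically identified: $\operatorname{plim}_{N \to \infty} M(\phi)(y - X\theta) = 0_{S(T-L)}$ implies $\gamma = \gamma_0$.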

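The probability limit of the (transposed) Jacobian $D_\gamma'$ in (A.4) can be expressed in the following way

$$\operatorname{plim}_{N \to \infty} D_\gamma' = \begin{pmatrix} -X_\infty' M(\phi_0)' \\ Q'\left(\operatorname{vec}(F\Lambda') \otimes I_{S(T-L)}\right) \end{pmatrix}.$$

Here, the regressor matrix $X_\infty = (W, Z_\infty)$ has a typical $st$'th row element given by $\left(w_{s,t}', \lim_{N \to \infty} (1/N_{s,t}) \sum_{i=1}^{N_{s,t}} E[z_{i,t}' \mid i \in I_{s,t}]\right)$. If the $z_{i,t}$ are i.i.d. for all $i \in I_{s,t}$ the $st$'th row is simply given by $(w_{s,t}', E[z_{i,t}' \mid i \in I_{s,t}])$.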

(A.4) is the strong identification assumption commonly used in the standard GMM

setting. This assumption is quite restrictive even when φ is known as noted by Ver-

beek (2008): “While it is not obvious that this requirement will be satisfied in empirical

applications, it is also not easy to check, because estimation error in the reduced form

parameters may hide collinearity problems. That is, sample cohort averages may exhibit


time-variation while the unobserved population cohort averages do not.”. The impor-

tance of this problem is illustrated in Section 5.4.2 as well as in the Monte Carlo section

of this paper. However, we leave the properties of the identification-robust inference

procedures of e.g. Kleibergen (2005) for future research.

For genuine panel data models Ahn et al. (2013) formulate (A.5) slightly differently:

“...denotes the number of the individual-specific effects that are correlated with the re-

gressors...”. In our case, only factors with non-zero mean factor loadings are of interest

for estimation, as factors with zero mean factor loadings cannot be identified from cross-

sectional averages. Hence, it is possible that the genuine panel data estimators would

identify more factors than the pseudo panel data estimator even if applied to the same

dataset.

Assumption (A.6) imposes asymptotic global identification for the objective function.

In Section 5.4.3 we provide detailed examples when this condition can be violated.

Note that Assumptions (A.1)-(A.6) do not impose any exogeneity restriction of zi,t

with respect to ui,t, thus elements of zi,t are allowed to be endogenous, as noted at the

end of Section 5.2.

Remark 5.2. In this paper, we treat both cohort specific variables ws,t and the cohort-

interactive effects component FΛ′ as deterministic. Equivalently, Assumptions (A.1)-

(A.6) can be formulated conditional on these observed and unobserved quantities, but

in that case one has to rely on limit theory developed in Kuersteiner and Prucha (2013;

2015) to obtain the limiting distribution. Our treatment of the unobserved quantities

is similar to the genuine panel data models for fixed T , where F is usually treated

as deterministic (as in Ahn et al. (2013) and Robertson and Sarafidis (2015)). The

deterministic treatment of variables is only needed in order to avoid technicalities,

without any effect on the way estimation and inference are performed (as emphasized

by Kuersteiner and Prucha (2013; 2015)).

Assumptions (A.1)-(A.6) are sufficient to obtain the following asymptotic represen-

tation of γGMMn.

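Proposition 5.2. Suppose that Assumptions (A.1)-(A.6) are satisfied. Then $\hat{\gamma}_{GMMn}$ has the following asymptotic representation:

$$\sqrt{N}(\hat{\gamma}_{GMMn} - \gamma_0) \xrightarrow{d} \operatorname{plim}_{N \to \infty}\left(\left(D_\gamma' \Omega D_\gamma\right)^{-1} D_\gamma'\right) \Omega M(\phi_0) \Sigma^{1/2} \xi, \tag{5.13}$$

where $\xi \sim N(0_{ST}, I_{ST})$ and $\Sigma$ is an $[ST \times ST]$ diagonal matrix with the typical $((s-1)T + t)$'th diagonal element given by $\sigma^2_{s,t}/\pi_{s,t}$.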


These results can be proved using standard arguments, e.g. as in Newey and McFadden

(1994).

Remark 5.3. Similar to the original setup of Ahn et al. (2013) for genuine panels, the

asymptotic distribution of γGMMn is well-defined only in the case of the true value of

L = L0 as imposed by Assumption (A.5).

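The asymptotic variance-covariance matrix (treating $X_\infty$, $F$ and $\Lambda$ as deterministic) is minimised at $\Omega_{opt} = (M(\phi_0) \Sigma M(\phi_0)')^{-1}$. In that case, the asymptotic variance-covariance matrix of $\hat{\theta}_{GMMn}$ is given by

$$\operatorname{plim}_{N \to \infty} \left(D_\theta' \Omega_{opt}^{1/2} M_{\Omega_{opt}^{1/2} D_\phi} \Omega_{opt}^{1/2} D_\theta\right)^{-1},$$

where $M_{\Omega_{opt}^{1/2} D_\phi} = I_{S(T-L)} - \Omega_{opt}^{1/2} D_\phi \left(D_\phi' \Omega_{opt} D_\phi\right)^{-1} D_\phi' \Omega_{opt}^{1/2}$ is the usual “residual maker” projection matrix that projects off the column space of $\Omega_{opt}^{1/2} D_\phi$.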

Remark 5.4. Depending on the assumptions made for the error terms (heteroscedasticity or homoscedasticity over time and cohorts) and regressors (static or dynamic model), the formulae of I2008 for consistent estimation of $\Sigma$ can be used without modification. The typical $((s-1)T + t)$'th diagonal element of the estimated $\Sigma$ matrix is equal to $\hat{q}^2_{s,t} = (N/N_{s,t})\hat{\sigma}^2_{s,t}$ where

$$\hat{\sigma}^2_{s,t} = \frac{1}{N_{s,t}} \sum_{i \in I_{s,t}} (y_{i,t} - x_{i,s,t}' \hat{\theta}_1)^2 - \left(\frac{1}{N_{s,t}} \sum_{i \in I_{s,t}} (y_{i,t} - x_{i,s,t}' \hat{\theta}_1)\right)^2 \tag{5.14}$$

for some consistent initial estimator $\hat{\theta}_1$ (e.g. the estimator that uses the identity matrix for $\Omega$).
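The cell-level variance estimate in (5.14) is the within-cell mean of squared residuals minus the squared within-cell mean. A hedged sketch follows; the function names `sigma2_hat` and `sigma_matrix`, the zero-based integer labels, the default choice of $N$ as the total sample size and the toy residuals are all assumptions made for illustration.

```python
import numpy as np

def sigma2_hat(y, x, cohort, period, theta1, S, T):
    """Cell-wise variance of the residuals y - x @ theta1, as in (5.14).

    Returns the (S, T) array of variance estimates and the cell sizes N_st.
    """
    cohort = np.asarray(cohort)
    period = np.asarray(period)
    resid = y - x @ theta1
    counts = np.zeros((S, T))
    r_sum = np.zeros((S, T))
    r2_sum = np.zeros((S, T))
    np.add.at(counts, (cohort, period), 1.0)
    np.add.at(r_sum, (cohort, period), resid)
    np.add.at(r2_sum, (cohort, period), resid ** 2)
    return r2_sum / counts - (r_sum / counts) ** 2, counts

def sigma_matrix(sig2, counts, N=None):
    """Diagonal matrix with entries (N / N_st) * sigma2_st, ordered cohort by cohort."""
    if N is None:
        N = counts.sum()  # one admissible choice of the generic N
    return np.diag(((N / counts) * sig2).reshape(-1))

# Toy example: theta1 = 0, so the residuals equal y
y = np.array([1.0, 2.0, 4.0, 0.0, 6.0])
x = np.zeros((5, 1))
cohort = [0, 0, 0, 1, 1]
period = [0, 0, 0, 0, 0]
sig2, N_st = sigma2_hat(y, x, cohort, period, np.zeros(1), S=2, T=1)
Sigma_hat = sigma_matrix(sig2, N_st)
print(sig2.ravel(), np.diag(Sigma_hat))
```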

For a fixed value of $S$ and $T$, the unconditional (treating deterministic quantities as stochastic) distribution of $\sqrt{N}(\hat{\gamma}_{GMMn} - \gamma_0)$ is not multivariate normal and depends on the factors, cohort-specific factor loadings and cohort specific regressors $w_{s,t}$ in the limit. Note that the limiting distribution of the linear GMM estimator $\hat{\theta}_{GMMl}$ (as in I2008) is also normal only conditionally on cohort specific regressors $w_{s,t}$, while unconditionally it is not. Hence the conditioning argument is not unique to the non-linear estimator.

Note that the number of rows in the $Q$ matrix is quadratic in both $S$ and $T$. Thus, even for moderate dimensions, numerical computations might become cumbersome. Given that under Assumptions (A.1)-(A.6) the $\Sigma$ matrix is diagonal (or block-diagonal if dynamics are allowed), we can also limit our attention to the block diagonal (over cohorts) $\Omega$ weighting matrix. In this case, the objective function can be substantially simplified to

$$f(\gamma) = \frac{1}{2} \sum_{s=1}^{S} (y_s - X_s\theta)' M_s(\phi)' \Omega_s M_s(\phi)(y_s - X_s\theta).$$

Here as before $M_s(\phi) = (I_{T-L}, \Phi)$, while $\Omega_s$ is the $s$'th block of the block-diagonal matrix $\Omega$. The gradient can also be expressed as a sum

$$\nabla f(\gamma) = \sum_{s=1}^{S} \begin{pmatrix} -X_s' M_s(\phi)' \\ V'\left((y_s - X_s\theta) \otimes I_{T-L}\right) \end{pmatrix} \Omega_s M_s(\phi)(y_s - X_s\theta),$$

with $V$ as defined previously. This simplification of the objective function is used in Section 5.5 while conducting the Monte Carlo study as well as in the empirical exercise.
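The simplified objective and its gradient translate into a short loop over cohorts. The sketch below is an illustration under stated assumptions, not the implementation used in the thesis: the function name `qd_objective_and_gradient`, the array layout (`ys` of shape $S \times T$, `Xs` of shape $S \times T \times K$), the identity default for each $\Omega_s$ and the simulated noise-free data are all hypothetical. It relies on $V'(e_s \otimes I_{T-L}) = e_{s,(2)} \otimes I_{T-L}$, where $e_{s,(2)}$ collects the last $L$ residuals of cohort $s$, and on each $\Omega_s$ being symmetric.

```python
import numpy as np

def qd_objective_and_gradient(gamma, ys, Xs, L, Omegas=None):
    """Block-diagonal QD objective and gradient; gamma = (theta, vec(Phi))."""
    S, T, K = Xs.shape
    theta = gamma[:K]
    Phi = gamma[K:].reshape(T - L, L, order="F")   # phi = vec(Phi), column stacking
    M = np.hstack([np.eye(T - L), Phi])            # M_s(phi) = (I_{T-L}, Phi)
    f = 0.0
    g_theta = np.zeros(K)
    g_Phi = np.zeros((T - L, L))
    for s in range(S):
        Om = np.eye(T - L) if Omegas is None else Omegas[s]
        e = ys[s] - Xs[s] @ theta                  # y_s - X_s theta
        m = M @ e                                  # moment vector of cohort s
        Om_m = Om @ m
        f += 0.5 * (m @ Om_m)
        g_theta -= Xs[s].T @ (M.T @ Om_m)          # -X_s' M_s' Omega_s m_s
        g_Phi += np.outer(Om_m, e[T - L:])         # matrix form of (e_2 kron I) Omega_s m_s
    return f, np.concatenate([g_theta, g_Phi.reshape(-1, order="F")])

# Noise-free data generated from the normalised factor structure F* = (Phi0', -I_L)'
rng = np.random.default_rng(4)
S, T, K, L = 6, 5, 2, 1
theta0 = np.array([1.0, -0.5])
Phi0 = rng.normal(size=(T - L, L))
F_star = np.vstack([Phi0, -np.eye(L)])
Lam = rng.normal(size=(S, L))
Xs = rng.normal(size=(S, T, K))
ys = Xs @ theta0 + Lam @ F_star.T
gamma0 = np.concatenate([theta0, Phi0.reshape(-1, order="F")])
f0, g0 = qd_objective_and_gradient(gamma0, ys, Xs, L)

# Central finite-difference check of the analytic gradient away from the truth
gamma1 = gamma0 + 0.1 * rng.normal(size=gamma0.size)
f1, g1 = qd_objective_and_gradient(gamma1, ys, Xs, L)
g_num = np.zeros_like(gamma1)
for j in range(gamma1.size):
    step = np.zeros_like(gamma1)
    step[j] = 1e-6
    g_num[j] = (qd_objective_and_gradient(gamma1 + step, ys, Xs, L)[0]
                - qd_objective_and_gradient(gamma1 - step, ys, Xs, L)[0]) / 2e-6
print(f0, np.max(np.abs(g1 - g_num)))
```

A function returning the pair (value, gradient) in this way could be passed to a generic optimiser such as `scipy.optimize.minimize` with `jac=True`; since the objective is not globally convex, several starting values would presumably be advisable.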

As an alternative to the quasi-differencing approach with respect to factors, one can also construct a similar estimator by quasi-differencing over cohorts. In this case, one has to look for a redefined $\phi$ s.t.

$$M(\phi)(\Lambda \otimes I_T) = 0_{T(S-L)}. \tag{5.15}$$

In this case the $\phi$ vector is of dimension $(S-L)L$ rather than $(T-L)L$. This alternative QD transformation might be of particular interest if $T \gg S$ and it is reasonable to consider a large $N, T$ asymptotic framework as in McKenzie (2004) (Type III). As a result, the number of parameters does not grow as both $N$ and $T$ increase.

Remark 5.5. The possibility to perform quasi-differencing with respect to either Λ or

F is similar in spirit to the estimation procedure of Robertson and Sarafidis (2015) for

genuine panel data models. For the model studied in Robertson and Sarafidis (2015),

quasi-differencing can be performed either with respect to the F or G matrices (where

G depends on the covariance between the factor loadings and the instruments).

Remark 5.6. For the sake of brevity, in the remaining sections we focus on the original

setup, but all results can be modified accordingly by redefining theM(φ) matrix. Note

that if we estimate the factor loadings instead of factors themselves, the weighting

matrix Ω in some cases is not block diagonal over the time dimension. Furthermore,

if the Σ matrix is not diagonal, the optimal Ω in the second step is not even block

diagonal in the S dimension. As a result, one cannot use the simplified objective

function for estimation.


5.3.3 Unbalanced samples

The quasi-differencing approach can be extended to allow for the number of time-series observations to be cohort specific ($T_s$), such that observations for cohort $s$ are observed only at time periods $t \in \mathcal{T}_s$. This extension is important from an empirical point of view as some cohorts can disappear if the time span is sufficiently long, or alternatively, if irregularly spaced surveys from different countries are used, as in McKenzie (2001). The total number of cross-sectional averages is $TS' = \sum_{s=1}^{S} T_s$, which in the case of a balanced panel is equal to $ST$. At first we define

$$P = \operatorname{diag}(P_1, \dots, P_S), \quad T = \left| \bigcup_{s=1}^{S} \mathcal{T}_s \right|,$$

where $T$ is the total length of time series, while each $P_s$ is a $[T_s \times T]$ group specific “selection matrix” of ones and zeros that is equal to $I_T$ if the panel is balanced. Using this notation, the DGP in the stacked notation can be expressed as

$$y = P \operatorname{vec}(F\Lambda') + X\theta + u, \tag{5.16}$$

where $X$ now is of dimensions $[TS' \times K]$. Assuming that the last $L$ observations are observed for all cohorts, the QD matrix $M(\phi)$ in this case can be written in the following way

$$M(\phi) = R(I_S \otimes (I_{T-L}, \Phi))P', \tag{5.17}$$

where $R$ is a $[(TS' - SL) \times (T-L)S]$ block-diagonal selection matrix that selects only those rows of $(I_S \otimes (I_{T-L}, \Phi))P'$ that are available to the researcher.

In general, observations other than the last L can be used for normalization. The

general necessary condition for normalisation to be feasible is

Tmin =

∣∣∣∣∣S⋂s=1

Ts

∣∣∣∣∣ ≥ L, (5.18)

which is obviously satisfied if the last L observations are observed. Furthermore, we

define the following set as

S∗ = i, j, . . . , k ∈ 1, . . . , S : rk (λi,λj, . . . ,λk) = L.

In other words, the set S∗ contains all subsets of cohorts whose factor loadings span an

L dimensional space. Using this definition we can formulate the identifying restriction


as follows

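$$\forall t \in \bigcup_{s=1}^{S} \mathcal{T}_s,\ \exists\, \mathcal{S}_t \subseteq \mathcal{S}^*, \text{ s.t. } \forall s \in \mathcal{S}_t : t \in \mathcal{T}_s,\ |\mathcal{S}_t| \geq L. \tag{5.19}$$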
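The two conditions above can be checked mechanically for a given pattern of observed periods. The following sketch, with hypothetical function names and a made-up example, verifies the necessary condition (5.18) and only the counting part of (5.19), namely that every period is observed for at least L cohorts. The rank requirement on the factor loadings is not checked, as the loadings are unobserved.

```python
def common_periods(periods_by_cohort):
    """Periods observed for every cohort (the intersection in (5.18))."""
    sets = [set(p) for p in periods_by_cohort.values()]
    return set.intersection(*sets)


def normalisation_feasible(periods_by_cohort, L):
    """Necessary condition (5.18): at least L periods common to all cohorts."""
    return len(common_periods(periods_by_cohort)) >= L


def cohorts_per_period(periods_by_cohort):
    """Number of cohorts observed in each period of the union."""
    counts = {}
    for observed in periods_by_cohort.values():
        for t in observed:
            counts[t] = counts.get(t, 0) + 1
    return counts


def counting_part_of_5_19(periods_by_cohort, L):
    """Counting requirement only; the rank of the loadings is NOT checked."""
    return all(c >= L for c in cohorts_per_period(periods_by_cohort).values())


# Hypothetical unbalanced design with three cohorts and four periods.
example = {1: {1, 2, 3, 4}, 2: {1, 3, 4}, 3: {2, 3, 4}}
```

In this example two periods are common to all cohorts, so normalisation is feasible for L up to 2 but not for L = 3.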

Combining these assumptions we can rewrite Assumption (A.5) as:

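(A.5) a) $\sum_{s=1}^{S} T_s - (T - L + S)L > K$ with $L_0$ being the true number of factors with non-zero mean factor loadings. b) $\forall t \in \bigcup_{s=1}^{S} \mathcal{T}_s$, $\exists\, \mathcal{S}_t \subseteq \mathcal{S}^*$, s.t. $\forall s \in \mathcal{S}_t : t \in \mathcal{T}_s$, $|\mathcal{S}_t| \geq L_0$. c) $\left| \bigcap_{s=1}^{S} \mathcal{T}_s \right| \geq L_0$ and $F^{-1}_{L \times L}$ exists. d) $F$ and $\Lambda$ are deterministic matrices.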

In this case, FL×L denotes some [L × L] dimensional block of the F (not necessarily

the last block), that is used for normalisation.

5.3.4 Dynamic models

So far we have assumed that the vector of individual specific regressors zi,t does not

contain any lags of yi,t. If p lags of yi,t are included among the regressors, then the

previous results need to be adjusted. Note that unlike in genuine panel data models,

the mere presence of yi,t−1, . . . , yi,t−p does not violate the consistency of the non-linear

GMM estimator for fixed T , and consequently one does not have to use lagged values of

yi,t as instruments. The fact that under Type I asymptotics dynamic pseudo panel data

estimators have no “Nickell” bias (Nickell (1981)) is well documented in the literature

(e.g. McKenzie (2004) and Verbeek and Vella (2005)).

For most pseudo panel datasets no information regarding the history of any individual

i is observed, hence the historical averages (y∗s,t−1, . . . , y∗s,t−p) are not observed either.1

The following modified Assumption (A.2) is sufficient to allow for the use of observed quantities ($\boldsymbol{y}_{s,t} = (y_{s,t-1}, \ldots, y_{s,t-p})'$) instead of their unobserved counterparts ($\boldsymbol{y}^*_{s,t} = (y^*_{s,t-1}, \ldots, y^*_{s,t-p})'$):

(A.2) $u_{i,t}$ are i.h.d. with finite $2 + \delta$ moment, for $\delta > 0$, such that $\sqrt{N_{s,t}}\,(u_{s,t} + (\boldsymbol{y}^*_{s,t} - \boldsymbol{y}_{s,t})'\alpha_0) \xrightarrow{d} N(0, \sigma^2_{s,t})$ jointly $\forall s, t \geq p$ with $0 < \min \sigma^2_{s,t} \leq \max \sigma^2_{s,t} < \infty$.

1 Here $y^*_{s,t-p} = (1/N_{s,t}) \sum_{i \in I_{s,t}} y_{i,t-p}$ and we use $*$ to distinguish between averages of the lagged dependent variables and lagged averages of dependent variables themselves ($(1/N_{s,t}) \sum_{i \in I_{s,t}} y_{i,t-p}$ vs. $(1/N_{s,t-p}) \sum_{i \in I_{s,t-p}} y_{i,t-p}$).


Here α0 is a [p × 1] vector of corresponding true coefficients for lagged dependent

variables. Due to the non-zero covariance between us,t−1 and ys,t−1 (for p = 1 and

analogously for p > 1), the variance-covariance matrix Σ has a block tri-diagonal

structure (for further details please refer to Assumption (2c) of I2008 or Theorem 2 of

McKenzie (2004)).

Before considering the dynamic model for potentially unbalanced datasets in greater

detail we define the following extended sets

$$\mathcal{T}^*_s = \mathcal{T}_s \cup \mathcal{T}^+_s, \qquad s = 1, \ldots, S,$$

where T +s is the set of indices for observed “pre-sample” observations. For balanced

samples, this definition implies that in total T + p observations are observed (so that

timing starts at −p + 1). Using this definition, Condition (5.18) can be extended to

allow dynamics in the model

$$\exists A^{(l)} \subset \left( \bigcap_{s=1}^{S} \mathcal{T}^*_s \right), \text{ s.t. } \left| A^{(l)} \right| > p, \quad l = 1, \ldots, L, \tag{5.20}$$

and each set $A^{(l)}$ is distinct and connected. In other words, it is possible to find L distinct time periods so that for all S cohorts the vector $(y_{s,t}, \boldsymbol{y}_{s,t}', x_{s,t}')'$ is observed.

Although only L common subsets A(l) are needed for normalisation, it is still necessary

that for each cohort s the total number of complete observations is at least L + 1.

Making use of (5.20) we can reformulate Assumption (A.5) in the following way

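(A.5) a) $\sum_{s=1}^{S} T_s - (T - L + S)L > K + p$ with $L_0$ being the true number of factors with non-zero mean factor loadings. b) $\forall t \in \bigcup_{s=1}^{S} \mathcal{T}_s$, $\exists\, \mathcal{S}_t \subseteq \mathcal{S}^*$, s.t. $\forall s \in \mathcal{S}_t : t \in \mathcal{T}_s$, $|\mathcal{S}_t| \geq L_0$. c) $\exists A^{(l)} \subset \left( \bigcap_{s=1}^{S} \mathcal{T}^*_s \right)$, s.t. $\left| A^{(l)} \right| > p$, $l = 1, \ldots, L$ and $F^{-1}_{L \times L}$ exists. d) $F$ and $\Lambda$ are deterministic matrices.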

Note that for datasets with highly disconnected T ∗s this condition might be violated,

thus observations of some cohorts need to be discarded for the estimation procedure to

be feasible (for any given L). Assumption (A.6), on the other hand, can be difficult

to satisfy for some dynamic models, as we discuss in Section 5.4.3.


5.4 Testing, model selection and identification

In this section we briefly discuss how hypothesis testing and the selection of the number

of factors can be performed under the conditions of Proposition 5.2. Later we discuss

some examples, where one or more of these conditions can be potentially violated.

Particularly, we discuss the issues of local and global identification.

5.4.1 Testing and model selection

Given that the estimator derived under Assumptions (A.1)-(A.6) has a well-defined

asymptotic normal limit, hypothesis testing is conducted in the usual way. First of all,

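the t- and Wald statistics can be used to test parameter restrictions. Secondly, we can consider the Wald test for $H_0 : f_t = c \neq 0, \forall t = 1, \ldots, T$ (hence $\phi_t = -1$) of the fixed effects model:

$$W = N \left( \hat\phi_{GMMn} + \iota_{T-1} \right)' \left( \widehat{\operatorname{Avar}}\, \hat\phi_{GMMn} \right)^{-1} \left( \hat\phi_{GMMn} + \iota_{T-1} \right) \xrightarrow{d} \chi^2(T-1). \tag{5.21}$$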

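Here, the consistent estimator of the variance-covariance matrix of $\hat\phi_{GMMn}$ is given by

$$\widehat{\operatorname{Avar}}\, \hat\phi_{GMMn} = \left( D_\phi(\gamma)' \Omega^{1/2} M_{\Omega^{1/2} D_\theta(\gamma)} \Omega^{1/2} D_\phi(\gamma) \right)^{-1}, \tag{5.22}$$

with $\Omega = \left( M(\hat\phi_1) \Sigma(\hat\theta_1) M(\hat\phi_1)' \right)^{-1}$ evaluated at a consistent one-step estimator $\hat\gamma_1$ (e.g. using the identity weighting matrix $\Omega$). Given that the number of degrees of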

freedom grows linearly with T , one can suspect that some loss of power for moderate

values of T might occur. Furthermore, in Appendix 5.A.4 we discuss how a Hausman

type test can be performed in order to test the Fixed Effects assumption.

Similar to any standard GMM estimation problem it can be shown that under (A.1)-

(A.6) the criterion function has a limiting chi-square distribution (provided that (S −L)(T −L)−K > 0, and accordingly modified for unbalanced and/or dynamic models):

$$J_N(L) = N (y - X\theta)' M(\phi)' \Omega_{opt} M(\phi) (y - X\theta) \xrightarrow{d} \chi^2_{(S-L)(T-L)-K},$$

if L = L0. Here JN(L) denotes the corresponding “J” statistic for the model with L

factors. Testing for the number of unobserved factors can be performed sequentially

as in Ahn et al. (2013) or using a BIC model selection criterion. One starts with


H0 : L0 = 0 and if the null hypothesis is rejected proceeds with H0 : L0 = 1. The

sequential procedure can be motivated by the fact that for L < L0 (for any positive

definite Ω)

JN(L) = N(y −Xθ)′M(φ)′ΩM (φ)(y −Xθ)→∞.

Alternatively, for model selection we use the Schwarz Information criterion (BIC) of

the following form

SN(L) = JN(L)− a ln (N)((T − L)(S − L)−K).

For further details please refer to Propositions 2 and 3 of Ahn et al. (2013). However,

as we further discuss in the section on global identification, the sequential procedure

and BIC can fail to consistently estimate the true number of factors if the global

identification assumption is violated, as for some DGP’s in this case $J_N(L = 0) \xrightarrow{d} \chi^2_{ST-K}$.

Remark 5.7. While $J_N(L)$ is invariant to a particular choice of “N”, this is not the case for $S_N(L)$. In the empirical section we consider two BIC criteria based on $N = \sum_{s=1}^{S} \sum_{t=1}^{T} N_{s,t}$ as e.g. in I2008 and $N = (1/ST) \sum_{s=1}^{S} \sum_{t=1}^{T} N_{s,t}$.
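The two selection rules can be sketched in a few lines. The code below is only an illustration under stated assumptions: the function names are hypothetical, the chi-square critical value uses the Wilson-Hilferty approximation (to stay within the standard library), the BIC rule is assumed to select the value of L that minimises $S_N(L)$, and the J statistics in the example are made-up numbers rather than results from this chapter.

```python
import math
from statistics import NormalDist


def chi2_critical(df, level=0.05):
    """Approximate upper critical value of chi-square(df) (Wilson-Hilferty)."""
    z = NormalDist().inv_cdf(1.0 - level)
    c = 2.0 / (9.0 * df)
    return df * (1.0 - c + z * math.sqrt(c)) ** 3


def degrees_of_freedom(L, S, T, K):
    """Degrees of freedom (S - L)(T - L) - K for the balanced static model."""
    return (S - L) * (T - L) - K


def select_sequential(J, S, T, K, level=0.05):
    """Return the first L whose J statistic is not rejected, else None."""
    for L, j_stat in enumerate(J):
        df = degrees_of_freedom(L, S, T, K)
        if df <= 0:
            break
        if j_stat <= chi2_critical(df, level):
            return L
    return None


def select_bic(J, S, T, K, N, a):
    """Return the L minimising S_N(L) = J_N(L) - a * ln(N) * df(L)."""
    best_L, best_val = None, math.inf
    for L, j_stat in enumerate(J):
        df = degrees_of_freedom(L, S, T, K)
        if df <= 0:
            break
        val = j_stat - a * math.log(N) * df
        if val < best_val:
            best_L, best_val = L, val
    return best_L


# Hypothetical J statistics for L = 0, 1, 2 (not results from the thesis).
J_values = [500.0, 30.0, 20.0]
L_seq = select_sequential(J_values, S=10, T=5, K=2)
L_bic = select_bic(J_values, S=10, T=5, K=2, N=300, a=0.75 / math.log(5))
```

Passing N explicitly makes the point of Remark 5.7 visible: the sequential rule does not depend on the chosen “N”, whereas the BIC penalty does.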

5.4.2 Identification: Local and Weak

At first, we consider the local asymptotic identification condition summarised in (A.4).

For the asymptotic distribution to be properly defined, the matrix M(φ0)X∞ should

have a full column rank K. This condition is more restrictive than the analogous

condition for the model with only fixed effects, where it is necessary that $MX_\infty$ (where $M = I_S \otimes (I_T - (1/T)\imath_T \imath_T')$) is of full column rank. For example, the rank condition for the non-linear GMM estimator in a model with only one individual specific regressor is not satisfied if the DGP of the regressor is of the following form

$$z_{i,t} = f_t' \lambda^z_i + \varepsilon^z_{i,t}, \qquad \varepsilon^z_{i,t} \sim (0, \sigma^2_{i,t}), \tag{5.23}$$

and as a result one has

$$\operatorname*{plim}_{N \to \infty} M(\phi_0) z = 0_{S(T-L)}.$$

In this example the cross-sectional averages of zi,t asymptotically lie in the space

spanned by F . In the fixed effects model (i.e. f ′tλzs = δs) this condition is violated if

the mean of zi,t does not sufficiently vary over time. On the other hand, if the factors ft


in the equation for zi,t differ from the corresponding factors in yi,t then the asymptotic

identification condition is satisfied.

Remark 5.8. In the genuine panel data literature sometimes it is assumed that factor

loadings in the equation for zi,t have a different non-zero mean than the corresponding

factor loadings in the equation for yi,t (in our case that implies E[λzi ] ≠ λs ≠ 0L). In

such a case it might be tempting to use the Quasi-differencing approach with respect

to cohort factor loadings (as in (5.15)) rather than factors to circumvent the problem

with local identification. However, in Section 5.4.3 we show that for this setup the

global (rather than local) identification assumption is violated.

The full rank Jacobian condition plays an important role in deriving consistency and

asymptotic normality of the GMM estimator. To further illustrate the importance of

this assumption for pseudo panel models we consider the simplified example with one

endogenous regressor.

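Example 5.1 (One Regressor).

$$\begin{aligned}
y_{i,t} &= \zeta z_{i,t} + u_{i,t}, \qquad u_{i,t} = \rho \left( z_{i,t} - \frac{\mu_{s,t}}{N^{\gamma}_{s,t}} \right) + \varepsilon_{i,t}, \qquad i \in I_{s,t}, \\
z_{i,t} &= \frac{\mu_{s,t}}{N^{\gamma}_{s,t}} + \varepsilon^{(z)}_{i,t}, \qquad \varepsilon_{i,t}, \varepsilon^{(z)}_{i,t} \sim iid(0, 1).
\end{aligned}$$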

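For simplicity, we assume that $\lambda_s = 0_L, \forall s$ and $\mu_{s,t}$ are deterministic, so that the first step estimator of $\zeta$ with $\Omega = I_{ST}$ is simply given by

$$\hat\zeta = \frac{\sum_{s=1}^{S} \sum_{t=1}^{T} z_{s,t} y_{s,t}}{\sum_{s=1}^{S} \sum_{t=1}^{T} z^2_{s,t}}.$$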

Then we can state the following result:

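Proposition 5.3. Let the assumptions in the One Regressor example be satisfied, then

$$(\hat\zeta - \zeta_0) - \rho \xrightarrow{d} \frac{\sum_{s=1}^{S} \sum_{t=1}^{T} \pi^{-1}_{s,t} z_{s,t} \varepsilon_{s,t}}{\sum_{s=1}^{S} \sum_{t=1}^{T} \pi^{-1}_{s,t} z^2_{s,t}}, \quad \text{if } \gamma \geq 1/2,$$

$$N^{1/2 - \gamma} (\hat\zeta - \zeta_0) \xrightarrow{d} \frac{\sum_{s=1}^{S} \sum_{t=1}^{T} \pi^{-(0.5 + \gamma)}_{s,t} \mu_{s,t} u_{s,t}}{\sum_{s=1}^{S} \sum_{t=1}^{T} \pi^{-2\gamma}_{s,t} \mu^2_{s,t}}, \quad \text{if } 0 \leq \gamma < 1/2.$$

Here the limiting random variables with subscript $s, t$ are defined as

$$\begin{aligned}
z_{s,t} &\sim 1_{(\gamma = 1/2)} \mu_{s,t} + N(0, 1), \\
\varepsilon_{s,t} &\sim -1_{(\gamma = 1/2)} \rho \mu_{s,t} + N(0, 1), \\
u_{s,t} &\sim N(0, 1 + \rho^2),
\end{aligned}$$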

where all Gaussian random variables are mutually independent. As a result, the non-

scaled estimator of the structural parameter ζ converges in distribution to a random

variable centered at ζ0 + ρ under weak-instrument asymptotics (if all µs,t = 0 for γ =

1/2). On the other hand, for semi-weak (or semi-strong) identification (0 ≤ γ < 1/2),

the estimator retains the asymptotic (conditionally) normal limit centered at the true

value ζ0 but with slower rate of convergence N1/2−γ.
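A small simulation can make the two regimes of Proposition 5.3 concrete. The sketch below is an illustration under simplifying assumptions that are not part of the example itself: errors are Gaussian, all cells have the same size $N_{s,t} = N$, and cohort averages are drawn directly from their exact $N(0, 1/N)$ distribution. The function names and the design values are hypothetical.

```python
import math
import random
from statistics import median


def simulate_zeta_hat(N, gamma, mu, rho, zeta0, S=10, T=5, rng=random):
    """One draw of the first-step estimator based on cohort averages.

    Assumes Gaussian errors and equal cell sizes N_{s,t} = N, so that cell
    averages of the iid(0, 1) errors are exactly N(0, 1/N).
    """
    num = den = 0.0
    sd = 1.0 / math.sqrt(N)
    for _ in range(S * T):
        eps_z = rng.gauss(0.0, sd)        # cell average of eps^(z)
        eps = rng.gauss(0.0, sd)          # cell average of eps
        z_bar = mu / N ** gamma + eps_z   # cell average of z
        u_bar = rho * eps_z + eps         # cell average of u
        y_bar = zeta0 * z_bar + u_bar
        num += z_bar * y_bar
        den += z_bar * z_bar
    return num / den


def median_estimate(reps, seed, **design):
    """Median of the estimator over independent replications."""
    rng = random.Random(seed)
    return median(simulate_zeta_hat(rng=rng, **design) for _ in range(reps))


# Hypothetical design values chosen only for illustration.
weak = median_estimate(500, seed=1, N=200, gamma=1.0, mu=0.0, rho=0.3, zeta0=1.0)
strong = median_estimate(500, seed=1, N=1000, gamma=0.0, mu=1.0, rho=0.3, zeta0=1.0)
```

With all $\mu_{s,t} = 0$ the median of the estimator sits near $\zeta_0 + \rho$, whereas with a non-vanishing mean ($\gamma = 0$) it sits near $\zeta_0$.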

Although one can think that the aforementioned example is quite restrictive, it is of

particular importance if one considers the AR(1) model

yi,t = αyi,t−1 + ui,t, i ∈ Is,t. (5.24)

In this case, the only source of variation (µs,t) is coming from the possible mean (effect)

non-stationarity of the initial condition

$$y_{i,0} = \frac{\mu_s}{N^{\gamma}_{s,t}} + u_{i,0}. \tag{5.25}$$

Furthermore, as by construction the equation of interest contains an endogenous regres-

sor (as discussed in Section 5.3.4) with coefficient ρ = −α0, the estimator α converges

to a random limit centered at zero for γ > 1/2.

The important lesson that we learn from this example is that in cases where the rank

condition can be potentially locally violated, endogeneity starts to play an important

role. This is in sharp contrast to the full-rank assumption. In other words, endogenous

regressors play a role even if one considers the Type I asymptotic scheme. We inves-

tigate implications of this example for the more detailed model in the Monte Carlo

section of this paper.

5.4.3 Identification: Global

In addition to the full rank condition of the Jacobian matrix, the model has to be

globally identified, as formally summarised in Assumption (A.6). We start this section


with the most trivial case by considering the setup as in (5.23), but with

$$E[\lambda^z_i] = \kappa \lambda_s, \quad \forall i \in I_{s,t}, \quad \kappa \neq 0. \tag{5.26}$$

Then the required global identification condition for the model with one regressor is of

the very simple form

M (φ) ((IS ⊗ F ) vec(Λ′) (κ(ζ0 − ζ) + 1)) = 0S(T−L).

This condition is satisfied for ζ = ζ0 + 1/κ, irrespective of the value of φ. In this case,

if one performs sequential selection of factors, the model with L = 0 can be selected,

as the JN(L = 0) statistic for this model has a non-degenerate chi-squared limit. The

corresponding (inconsistent) estimator satisfies $\hat\zeta \xrightarrow{p} \zeta_0 + 1/\kappa$.
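The step behind this result can be spelled out. The display below is a short derivation under the setup of (5.23) and (5.26), with the additional assumption (made here only for the illustration) that the error term in the equation for $y_{i,t}$ has zero mean:

```latex
\begin{align*}
\mathrm{E}[y_{i,t} - \zeta z_{i,t}]
  &= f_t'\lambda_s + (\zeta_0 - \zeta)\,\mathrm{E}[z_{i,t}] \\
  &= f_t'\lambda_s + (\zeta_0 - \zeta)\,\kappa f_t'\lambda_s \\
  &= \bigl(\kappa(\zeta_0 - \zeta) + 1\bigr) f_t'\lambda_s .
\end{align*}
```

The last line is zero for every $s$ and $t$ at $\zeta = \zeta_0 + 1/\kappa$, so the factor component disappears before any quasi-differencing is applied, which is why the value of $\phi$ plays no role.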

Assumption (A.6) is particularly difficult to satisfy for linear dynamic models without

any additional regressors. As we will see in the next two examples, a necessary condition

for global identification of dynamic models is the presence of regressors (or initial

condition yi,0), that cannot be well approximated by the factor structure present in the

model itself. To illustrate this point, consider a linear AR(1) model

yi,t = λ′sft + αyi,t−1 + ui,t, i ∈ Is,t.

If the model only contains fixed effects, i.e. ft = c for all t, and the initial condition is

mean-stationary, then

$$y_{i,t-1} = \frac{\delta_s}{1 - \alpha_0} + \sum_{j=0}^{\infty} (\alpha_0)^j u_{i,t-1-j}, \qquad i \in I_{s,t}.$$

Coming back to equation (5.26) we have κ = 1/(1 − α0), and thus irrespective of the

value for M (φ)

$$\operatorname*{plim}_{N \to \infty} M(\phi)(y - X\alpha) = 0_{S(T-L)},$$

at α = 1. As a result, unlike the linear GMM estimator of I2008, where the stationary

initial condition violates the local identification assumption (A.4), (as in Example 5.1),

estimation of unobserved factors in this case also leads to the violation of the global

identification assumption (A.6).
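The failure at $\alpha = 1$ can be illustrated numerically. The sketch below is an assumption-laden illustration rather than part of the formal argument: it simulates a genuine panel (instead of repeated cross-sections) for a single cohort, uses Gaussian innovations with unit variance, and the function name is hypothetical. It compares the average of $y_{i,t} - \alpha y_{i,t-1}$ at the true value and at $\alpha = 1$.

```python
import random


def mean_quasi_residual(alpha, alpha0, delta, N, T, seed):
    """Average of y_t - alpha * y_{t-1} over N individuals and T periods.

    Assumes a mean-stationary Gaussian AR(1) with fixed effect delta and
    unit innovation variance; a genuine panel is simulated for simplicity.
    """
    rng = random.Random(seed)
    sd0 = (1.0 / (1.0 - alpha0 ** 2)) ** 0.5  # stationary standard deviation
    total, count = 0.0, 0
    for _ in range(N):
        # Mean-stationary initial condition.
        y_prev = delta / (1.0 - alpha0) + rng.gauss(0.0, sd0)
        for _ in range(T):
            y = delta + alpha0 * y_prev + rng.gauss(0.0, 1.0)
            total += y - alpha * y_prev
            count += 1
            y_prev = y
    return total / count


# Hypothetical values: alpha0 = 0.5 and a cohort effect delta = 1.
at_true = mean_quasi_residual(alpha=0.5, alpha0=0.5, delta=1.0, N=2000, T=5, seed=0)
at_one = mean_quasi_residual(alpha=1.0, alpha0=0.5, delta=1.0, N=2000, T=5, seed=0)
```

At the true value the average is close to the cohort effect, which still has to be removed by the transformation, while at $\alpha = 1$ the average is already close to zero without any transformation.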


The presence of unobserved factors in general is not sufficient for global identification.

To illustrate that, we consider the process for yi,t with “infinite” initialisation at “−∞”

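$$y_{i,t-1} = \lambda_s' \left( \sum_{j=0}^{\infty} (\alpha_0)^j f_{t-1-j} \right) + \sum_{j=0}^{\infty} (\alpha_0)^j u_{i,t-1-j} = \lambda_s' f^*_{t-1} + \sum_{j=0}^{\infty} (\alpha_0)^j u_{i,t-1-j}.$$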

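Defining $F^* = (f^*_0, \ldots, f^*_{T-1})'$ the global identification condition can then be formulated in the following way

$$\operatorname*{plim}_{N \to \infty} M(\phi)(y - X\alpha) = M(\phi) \operatorname{vec}(((\alpha_0 - \alpha) F^* + F) \Lambda') = 0_{S(T-L)}.$$

If we can further assume that the appropriate $[L \times L]$ block of the $\tilde{F} = (\alpha_0 - \alpha) F^* + F$ matrix is invertible, then each parameter value from the set

$$\Gamma = \left\{ \gamma = (\alpha, \phi')' \in \mathbb{R}^{(T-L)L+1} : \alpha \in \mathbb{R},\ \Phi = -\tilde{F}_{[(T-L) \times L]} \tilde{F}^{-1}_{[L \times L]} \right\},$$

satisfies the moment conditions.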

Based on these two examples, we can see that a necessary condition for global identi-

fication of dynamic models is the presence of regressors (or initial condition yi,0) that

cannot be well approximated by the factor structure present in the model itself. Note

that the non-linear dynamic pseudo panel data models as studied by e.g. Antman

and McKenzie (2007b) or more general models with regressors can still be globally

identified.

Finally, we can obtain a similar result if we consider the example in (5.26) with quasi-

differencing with respect to factor loadings rather than factors

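$$M(\phi) \operatorname{vec}(F (\Lambda + (\zeta_0 - \zeta) \Lambda^{(z)})') = 0_{T(S-L)}, \quad \forall \gamma \in \Gamma_\Lambda, \quad \tilde{\Lambda} = (\zeta_0 - \zeta) \Lambda^{(z)} + \Lambda,$$

$$\Gamma_\Lambda = \left\{ \gamma = (\zeta, \phi')' \in \mathbb{R}^{(S-L)L+1} : \zeta \in \mathbb{R},\ \Phi = -\tilde{\Lambda}_{[(S-L) \times L]} \tilde{\Lambda}^{-1}_{[L \times L]} \right\}$$

even if $E[\lambda^{(z)}_i] \neq \lambda_s \neq 0_L$. Although locally this transformation provides identification,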

the global identification condition is violated.



5.5.1 Setup

The main goal of this Monte Carlo study is to investigate the effects of possibly en-

dogenous regressors in the nearly singular designs with factors. By doing so, we expand

the literature to models with unobserved factors and cases where asymptotic identifi-

cation assumption might be (locally) violated. As the main focus of this paper is on

static pseudo panel data models, we do not investigate the finite sample properties of

estimators in dynamic specifications.

The summary of the Monte Carlo setup is provided below.

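$$\begin{aligned}
y_{i,t} &= f_t \lambda_s + \beta w_{s,t} + \zeta z_{i,t} + \varepsilon_{i,t}, \qquad i \in I_{s,t}, \\
z_{i,t} &= \mu_{s,t} + \sqrt{1 - \sigma^2_\mu}\, \tilde{z}_{i,t}, \qquad \tilde{z}_{i,t} \sim N(0, 1), \qquad \mu_{s,t} = 1 + \sigma_\mu N(0, 1), \\
w_{s,t} &\sim N(0, 1), \qquad \bar\lambda_s \sim U(0, 1), \\
\lambda_s &\sim N(\bar\lambda_s, \sigma^2_{\lambda_s}), \qquad f_t = 1 + \sigma_f u^{(f)}_t, \qquad u^{(f)}_t = \alpha_f u^{(f)}_{t-1} + \varepsilon^{(f)}_t, \\
\varepsilon^{(f)}_t &\sim N(0, 1 - \alpha^2_f), \qquad u^{(f)}_0 \sim N(0, 1), \\
\varepsilon_{i,t} &= \rho (z_{i,t} - \mu_{s,t}) + \sqrt{1 - \rho^2} \sqrt{1 - \sigma^2_{f\lambda}}\, \eta_{i,t}, \qquad \eta_{i,t} \sim N(0, 1).
\end{aligned}$$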

Several quantities were normalised to obtain a parameter space that is more “orthog-

onal” (following suggestions of Kiviet (2007; 2012)):

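$$\operatorname{var} z_{i,t} = 1, \qquad \operatorname{var}(f_t \lambda_s + \varepsilon_{i,t}) = 1.$$

The following parameter space is considered:$^{2}$

$$N = \{150; 300\}, \quad T = 5, \quad S = 10,$$

$$\sigma^2_{f\lambda} = \{0.1; 0.5\}, \quad \rho = \{0; 0.3\}, \quad \theta_0 = (1; 1)', \quad \sigma^2_\mu = \{0; 0.05; 0.3\}, \quad \sigma^2_f = \{0; 0.1\}, \quad \alpha_f = 0.5.$$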

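To ensure that $\operatorname{var}(f_t \lambda_s) = \sigma^2_{f\lambda}$ the $\sigma^2_{\lambda_s}$ is set to

$$\sigma^2_{\lambda_s} = \frac{\sigma^2_{f\lambda} - \sigma^2_f \bar\lambda^2_s}{\sigma^2_f + 1}.$$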
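The calibration of the loading variance can be verified with a few lines of code. The sketch below assumes that $f_t$ and $\lambda_s$ are independent, with $f_t$ having mean one and variance $\sigma^2_f$; the function names and the value of the loading mean are hypothetical and serve only as an illustration of the normalisation.

```python
import random


def loading_variance(sigma2_flambda, sigma2_f, lambda_bar):
    """Variance of lambda_s that delivers var(f_t * lambda_s) = sigma2_flambda."""
    return (sigma2_flambda - sigma2_f * lambda_bar ** 2) / (sigma2_f + 1.0)


def implied_product_variance(sigma2_f, lambda_bar, sigma2_lambda):
    """var(f * lambda) for independent f (mean 1) and lambda (mean lambda_bar)."""
    second_moment = (1.0 + sigma2_f) * (lambda_bar ** 2 + sigma2_lambda)
    return second_moment - lambda_bar ** 2


def draw_factors(T, sigma2_f, alpha_f, rng):
    """f_t = 1 + sigma_f * u_t with a stationary AR(1) u_t of unit variance."""
    u = rng.gauss(0.0, 1.0)  # u_0
    factors = []
    for _ in range(T):
        u = alpha_f * u + rng.gauss(0.0, (1.0 - alpha_f ** 2) ** 0.5)
        factors.append(1.0 + sigma2_f ** 0.5 * u)
    return factors


lam_bar = 0.6  # hypothetical value of the U(0, 1) draw for the loading mean
s2_lam = loading_variance(0.5, 0.1, lam_bar)
check = implied_product_variance(0.1, lam_bar, s2_lam)
f = draw_factors(5, 0.1, 0.5, random.Random(0))
```

The value of `check` reproduces the target $\sigma^2_{f\lambda} = 0.5$, which confirms that the formula follows from the second moments of the product of two independent variables.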

2 In the preliminary version of this paper, designs with T = 10 were also considered. But given their similarity to results for T = 5 we decided to present only the latter case.


Note that all s, t specific variables λs, ft, ws,t, µs,t are simulated in every Monte Carlo

replication so that the limiting distribution of the estimator is only conditionally nor-

mal. This is done to emphasize that in Assumptions (A.1)-(A.6) these quantities are

assumed to be deterministic only for technical reasons. On the other hand, the loading means λ̄s (the U(0, 1) draws above) are generated only once in each design to make sure that the GMMl0 estimator (the two-step estimator without any factors, i.e. L = 0) is biased in finite samples. However, the results for other estimators do not change quantitatively or qualitatively if all λ̄s = 0. The σ2µ parameter is introduced to control the degree of asymptotic singularity of the GMM Jacobian matrix. Similar to I2008, other distributional assumptions for λs, µs,t, ws,t can be considered, but the given setup is sufficient for our purposes. Ns,t is set to be ⌊πs,tNST⌋, where πs,t ∼ U(0, 1). Note, by generating λs, ft in each replication we deviate from the theoretical discussions in this paper, but are more in line with genuine panel data literature and the setup of I2008.

We consider 48 different Monte Carlo designs in total and for convenience of further

discussion we divide them in four different groups with 12 designs each. We denote the

groups by letters A, B, C and D in Tables 5.5 and 5.6. We denote two linear GMM estimators that do not allow any cohort effects (M = IST ) or only time-invariant cohort effects (M = IS ⊗ (IT − (1/T )ıT ı′T )), by GMMl0 and GMMl1 (that are obtained

as in (5.9) but with the optimal weighting matrix). Furthermore, we use abbrevi-

ations GMMn1, GMMn2 and GMMo to denote the two-step non-linear GMM with

L = L0 = 1, non-linear GMM with L = 2 (both solutions to (5.12)) and GMM based

on BIC selection criteria, respectively. All results are presented for the two-step esti-

mators (where necessary) with the asymptotically optimal weighting matrix Ω under

the assumption that $\sigma^2_{s,t} = \sigma^2$.$^{3,4}$ In this case for an estimator of $\sigma^2$ we use

$$\hat\sigma^2 = \frac{1}{ST} \sum_{t=1}^{T} \sum_{s=1}^{S} \hat\sigma^2_{s,t}$$

with $\hat\sigma^2_{s,t}$ defined in (5.14). In this section, we discuss the results for the ζ parameter

only; results for β are available from the author upon request.

Remark 5.9. Note that, for the given setup the GMMl0 estimator is always inconsistent

and biased, where the second result is due to λs being non-zero. The GMMl1 estimator,

3 Note that under these assumptions all linear GMM estimators are one-step efficient and we use this fact in estimation, whereas to obtain non-linear GMM estimators we perform estimation in two steps as the optimal weighting matrix depends on unknown parameters.

4 Note that for GMM estimators with factors as starting values we use “GMMl1” estimator for θ and a vector of zeros for factors. Based on preliminary simulations, results were not found to be sensitive to starting values.


on the other hand, is always unbiased but inconsistent if σ2f ≠ 0. In this paper, we do

not consider any designs where for σ2f ≠ 0 and ρ = 0 the GMMl1 estimator is both

inconsistent and biased. This is mainly due to the impossibility to control the relative

variance ratio as in that case var (ftλs) would have to vary over time and/or cohorts.

As this possibility is not desirable for a fair comparison, we leave analysis of this case

for future research.


In this section we summarise the bias and RMSE properties of the estimators as presented in Table 5.5. Firstly, we discuss the results for the GMMl0 estimator. As

argued in the setup of this Monte Carlo study, the designs considered ensure that

GMMl0 is biased in finite samples. Unbiasedness of the estimator can be easily ob-

tained by fixing λs = 0 (or any other constant if an intercept is included). The value

of the RMSE is mostly driven by squared bias, as can be expected given the minimum

variance properties of this estimator.

We now turn our attention to the properties of other estimators in each of the four

subgroups.

(A) In all 12 designs it can be seen that the GMMl1 estimator does not exhibit

any visible bias, while non-linear estimators tend to be slightly positively biased

in the first four designs. In general, the results are not surprising despite the

fact that none of the estimators is consistent for σ2µ = 0. As all µs,t = 0, the

estimators are still centered around the true value (recall Example 5.1). The

effects of asymptotic non-identification, on the other hand, show up clearly once

we consider the corresponding values of the RMSE. Even the slightest increase

in σ2µ reduces RMSE substantially. Furthermore, observe that for σ2µ = 0 the size

of N does not have any effect on RMSE, while in other cases it has a substantial

negative effect, as can be expected given the consistency. Finally, higher values

of σ2µ have a positive effect (in terms of lower value) on RMSE of all estimators,

as it has a direct impact on the variance of all GMM estimators.

(B) In this subset of designs, the zi,t regressor is endogenous and has a non-zero

correlation coefficient ρ. In the first four designs when all GMM estimators are

asymptotically (locally) unidentifiable, the biases are roughly proportional to ρ.

Bias is somewhat smaller when a time-varying factor is present but still remains


substantial. Inflation in bias directly translates into larger values of RMSE. As

can be expected given the consistency of the GMM estimators, the bias quickly

disappears once σ2µ = 0.05. However, it still remains present as the corresponding

values of RMSE are larger in (B) as compared to (A).

(C) As compared to designs in group (A), designs with σ2fλ = 0.5 show substantial

improvements in terms of both bias and RMSE. While some bias was visible in

(A), it is no longer the case here. We can see that for σ2fλ = 0.5 with time-

varying factors there are no designs left with RMSE of GMMl1 being lower than

the corresponding value of GMMn1. However, it is still true in one case that the RMSE of GMMo is lower than that of GMMn1. Furthermore, GMMn2 always

has higher RMSE than the correctly specified GMMn1.

(D) Finally, the results of (D) designs are very similar to those of (B), with conclusions

analogous to the (A)-(C) comparison.

5.5.3 Results: Testing

In this section, inferential properties of the estimators are considered and discussed.

The Wald statistic is used to test the fixed effects assumption, as discussed in Section

5.4.1. To simplify matters, we denote the two Hausman tests by h0 and h1, where h0 is

the test statistic for GMMl0 vs. GMMl1, and h1 tests for GMMl1 vs. GMMn1. Given

that in terms of size and power the Wald test dominates the h1 test (see Table 5.6),

we do not discuss the behaviour of the latter test in greater detail. The h0 test rejects in close to 100% of all Monte Carlo replications in almost all designs.

All test statistics have a nominal size of 5% (with the exception of GMMn2 estimator,

as no asymptotic results for that estimator are available).

(W) In Section 5.4.1 we have mentioned that the Wald test can be used as an alter-

native to the h1 test for the null hypothesis of no time-varying fixed effects. A

quick look at Table 5.6 suggests that for any given Monte Carlo setup the Wald

test is superior in terms of both size and power. Although we can see that for

low values of σ2fλ and N the test is slightly size-distorted (with a maximum of 12%), the distortions tend to disappear quickly, and the size of the Wald test is closer to the nominal level than that of the h1 test in all cases. Similarly to the h1 test, a larger σ2µ does not seem to influence the results much, while an increase in σ2fλ and/or N has a positive effect on size

(closer to the nominal size). Conclusions regarding power are very similar to the

ones applicable to size but with accordingly adjusted implications.


(J-) Results for the “J” test are discussed from left to right in Table 5.6. As can

be expected, based on the “J” test in all cases we reject the model without

any fixed effects. Patterns for empirical rejection frequencies of the “J” test for

GMMl1 estimator are very similar to those of the Wald test as described in the

previous paragraph, though the power is somewhat smaller for the “J” test. As

in all designs with σ2µ > 0, the GMMn1 and GMMo estimators are consistent, the rejection frequencies for those estimators should be close to 0.05. However, it is not always exactly the case, as often some size distortions are visible (up to 12%). In general, distortions diminish with larger values of N , σ2fλ and σ2µ. Note

that asymptotic (local) non-identification has no major influence on inference

and that can be partially explained using the results in Dovonon and Renault

(2009). Observe that for the GMMn2 estimator the “J” test statistic is always

undersized, which is driven by the fact that we fit more than the true number of

factors.

(t-) First of all, we note that results for GMMn1, GMMo and GMMn2 are very similar

to each other and can be ranked in terms of size distortions in the aforementioned

order. Secondly, as can be expected from the asymptotic efficiency perspective

when GMMl1 is consistent, it has better size properties than GMMn1. Moreover,

unlike the non-linear estimator, the linear estimator is obtained in one step, and

is not affected by possible estimation bias of the optimal weighting matrix. On

the other hand, when it is inconsistent, the rejection frequencies slowly approach 1 as N , σ2µ, and/or σ2fλ increase. Similarly, the empirical rejection frequencies of

GMMn1, GMMo and GMMn2 converge to the nominal size of 5% for larger values

of the aforementioned design parameters. As can be expected given the bias

results, when all GMM estimators are not (locally) asymptotically identified, the

properties of the t− statistic are directly dependent on the value of the correlation

coefficient ρ, so that for ρ = 0.3 empirical rejection frequencies are close to 1.

As mentioned previously, these size distortions disappear once σ2µ increases, but

even for σ2µ = 0.3 all test statistics are slightly oversized.

5.5.4 Results: Model selection

The results of this section can be found in the last column of Table 5.6 (#L = 1),

where each number indicates the fraction of Monte Carlo replications in which the

correct number of factors was selected (L0 = 1 in this case). Here we adopt a procedure similar to Ahn et al. (2013) and set a = 0.75/ln(5), while “N” is defined as $N = (1/ST) \sum_{s=1}^{S} \sum_{t=1}^{T} N_{s,t}$. The results based on $N = \sum_{s=1}^{S} \sum_{t=1}^{T} N_{s,t}$ are similar and

available from the author upon request. Below, we briefly summarise the findings.

We can see that the BIC based model selection procedure performs well in general.

In 10 out of 48 designs, the proportion of correctly specified L is marginally lower

than 95% while in the majority of cases this is above 98%. The results are not highly

sensitive to the choice of the design parameters, but a few clear trends are still visible.

Firstly, a higher relative weight of unobserved factor components as represented by the

σ2fλ parameter has a clear positive effect on the model selection procedure. It can be

easily explained by the fact that for smaller values of σ2fλ the DGP is quite close to

the model without factors and thus GMMl0 is the preferred estimation procedure. We

find it surprising that there are no substantial effects on model selection when ρ > 0

and σ2µ = 0. This contradicts the findings of the previous two sections where clear

distortions in terms of the bias and the size of the test statistics were visible.

5.6 Empirical illustration: ENEMDU Dataset

5.6.1 The Dataset

Ecuador has experienced a period of rapid growth for the last 10 years. The average real

GDP growth was above 4% with a small positive real GDP growth of 0.5% observed

even in the midst of the global recession in 2009. Furthermore, the country went

through a period of substantial shift in terms of alleviating economic inequality and

poverty. According to data of the World Bank, the percentage of the population living

below the poverty line steadily declined from over 44% in 2004 to 25.6% in 2013. This

decline reflects substantial changes in the socioeconomic environment in the country.

Finally, the decline in the unemployment rate over the last decade is also clearly visible,

see Figure 5.1. Furthermore, even during the global economic downturn the national

unemployment rate did not exceed 9% which is a small number as compared to some

developed countries during the same period. The dramatic changes of these main macro

indicators suggest that substantial changes at the micro level also occurred. In this

section, we estimate the labour supply elasticity for working males that are also the

heads of the household. To accommodate possible common shocks, we estimate the

model with cohort interactive effects.


Figure 5.1: National unemployment rate dynamics over 2003-2013. Source: Banco Central Del Ecuador.

We use annual data from the National Employment and Unemployment Survey (EN-

EMDU) collected by the National Institute of Statistics and Census of Ecuador. The

dataset contains information at household level, with information provided over all in-

dividuals of age five and above. Due to the fact that in 2007 the survey methodology

was updated, and the fact that our estimation method is consistent for a small number

of time series observations we limit our sample to the period of 2007-2013. Further-

more, we consider only surveys from the fourth quarter of each year as it contains the

largest number of observations, which is also representative for annual observations.

This is partially done to ensure that each cohort contains at least 100 individuals.

Like some related studies (e.g. Antman and McKenzie (2007a) and Gonzalez and

Sala (2015)), we study the labour market participation of prime aged males (26-55)

occupying a single job. We restrict our sample to males who work for at least 30 hours

but no more than 60 hours per week to minimise the number of potential outliers.

Moreover, as we are only interested in the intensive margin (the number of hours

worked) of the labour supply, the observations with a lower number of hours worked

(corresponding to part-time workers) are not of prime interest. A joint study of the

extensive (decision to work full/part time) and intensive margin (the number of hours

worked) is complicated due to the scarcity of the available explanatory variables. To

obtain real rather than nominal income, we deflated individual income using the annual

Consumer Price Index (CPI) at the national level.

[Figure 5.2, panel (a): Average log hours worked, plotted against age. Panel (b): Average log wage, plotted against age.]

Figure 5.2: Age of each cohort is defined as the middle point in the interval.

Before proceeding with estimation and model specification, we discuss how we define cohorts in our study. Similar to Gonzalez and Sala (2015) we define cohorts solely based on the age of the individual. In total, we construct 10 cohorts of equal age intervals based on individuals born in 1952-1981, therefore each cohort represents a

three year interval. Alternatively, one could define cohorts based on five year intervals

and/or geographical location. That strategy, on the other hand, would substantially

reduce the average number of observations per cohort or the total number of cohorts.

This is a common tradeoff faced by applied econometricians when dealing with pseudo

panel datasets, see e.g. Verbeek and Vella (2005) and Verbeek (2008). Hence, for

simplicity and ease of exposition we consider only results based on three year intervals.

As discussed by I2008 the adequacy of the model as well as the definition of cohorts

can be investigated by means of the “J” statistic.
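The cohort definition above can be sketched in a few lines of code. This is only an illustration: the function names are hypothetical, and it assumes that a birth year is available for each individual (the dataset may instead record age, in which case birth year would have to be derived from the survey year). The mid-point age convention follows the caption of Figure 5.2, and the numbering (Cohort 1 is the oldest) follows Table 5.2.

```python
FIRST_BIRTH_YEAR = 1952   # oldest birth year in the sample (from the text)
LAST_BIRTH_YEAR = 1981    # youngest birth year in the sample (from the text)
INTERVAL = 3              # width of each cohort in years


def cohort_of(birth_year):
    """Return the cohort index (1 = oldest, 10 = youngest), or None if out of range."""
    if not FIRST_BIRTH_YEAR <= birth_year <= LAST_BIRTH_YEAR:
        return None
    return (birth_year - FIRST_BIRTH_YEAR) // INTERVAL + 1


def cohort_age(cohort, survey_year):
    """Age of a cohort in a survey year, using the middle birth year of its interval."""
    middle_birth_year = FIRST_BIRTH_YEAR + (cohort - 1) * INTERVAL + INTERVAL // 2
    return survey_year - middle_birth_year


if __name__ == "__main__":
    for year in (1952, 1954, 1955, 1981, 1982):
        print(year, cohort_of(year))
    # Youngest cohort in the first survey year, oldest cohort in the last one.
    print(cohort_age(10, 2007), cohort_age(1, 2013))
```

With survey years 2007-2013, these mid-point ages span 27 to 60, which matches the horizontal axis of Figure 5.2.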

5.6.2 Results

As a basic setup, we are interested in the model of the following form

$$\log \text{hours}_{i,t} = \gamma\,\text{logwage}_{i,t} + \beta' z_{i,t} + \theta' q_{i,t} + u_{i,t}, \qquad \mathrm{E}[u_{i,t}] = 0, \quad i \in I_{s,t}. \tag{5.27}$$

Here $\log \text{hours}_{i,t}$ is the logarithm of the weekly hours worked by individual $i$, while
$\text{logwage}_{i,t}$ is the logarithm of the real hourly wage. Models of similar form were extensively
estimated using genuine panel data methods, see e.g. Ziliak (1997). Furthermore, we assume that
the regressors in $z_{i,t}$ are observed by the econometrician, whereas $q_{i,t}$ are unobserved
but can be well approximated by

$$q_{i,t} = \Lambda^{(q)}_s f_t + \varepsilon_{i,t}, \qquad \varepsilon_{s,t} \xrightarrow{p} 0_{K_q}.$$


We would like to stress that we do not assume that $\mathrm{E}[z_{i,t}q_{i,t}'] = O$ or
$\mathrm{E}[q_{i,t}\,\text{logwage}_{i,t}] = 0_{K_q}$; hence we can allow for endogeneity in our framework.
For example, due to the non-availability of consumption data in this dataset, we assume that
this endogenous variable is a part of the $q_{i,t}$ variables rather than of $z_{i,t}$.

Combining both equations we obtain a simple model to study the labour supply elasticity
in the intensive margin that can be summarised by the following equation

$$\log \text{hours}_{i,t} = \lambda_s' f_t + \gamma\,\text{logwage}_{i,t} + \beta' z_{i,t} + v_{i,t}, \qquad v_{i,t} = u_{i,t} + \theta'\varepsilon_{i,t}, \quad i \in I_{s,t}, \tag{5.28}$$

where $\lambda_s' = \theta'\Lambda^{(q)}_s$. The only control variable that we include in our model that is of

particular interest on its own (and is available in the dataset) is the total number of

individuals under the age of 16 in a given household. By including this variable, we

follow some other studies, particularly Peterman (2014), and control for the house-

hold composition. The averages of log hoursi,t worked and logwagei,t are presented in

Figure 5.2, while average values of zi,t and the number of observations per cohort are

summarised in Appendix 5.B.
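For completeness, the substitution step that leads from (5.27) to (5.28) can be written out as follows. The sketch uses only the symbols already defined above and adds no new assumptions.

```latex
\begin{aligned}
\theta' q_{i,t}
  &= \theta'\bigl(\Lambda^{(q)}_s f_t + \varepsilon_{i,t}\bigr)
   = \underbrace{\theta'\Lambda^{(q)}_s}_{\lambda_s'} f_t + \theta'\varepsilon_{i,t}, \\
\log \text{hours}_{i,t}
  &= \gamma\,\text{logwage}_{i,t} + \beta' z_{i,t} + \lambda_s' f_t
     + \underbrace{u_{i,t} + \theta'\varepsilon_{i,t}}_{v_{i,t}} .
\end{aligned}
```

The unobserved regressors therefore enter the estimating equation only through the cohort-specific loadings $\lambda_s$ and the composite error $v_{i,t}$.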

As discussed in Sections 5.4.1 and 5.4.3, the “J” statistic evaluated at the GMMl0

estimator can be used as a possible indication for global non-identification. For this

model specification, the “J” statistic is equal to JN(L = 0) = 698.91, which indicates a

clear rejection based on the critical values from χ2(68) at any conventional significance

level. That provides some indication that global identification failure for this dataset

is not very likely.

Variable     GMMl1        GMMn1        GMMn2
logwage      -0.130***    -0.075***    -0.076**
# kids        0.003       -0.008       -0.011
df            58           52           38
J             88.05***     56.97        44.92
BIC1                      -187.35      -133.62
BIC2                       -84.41       -58.39

Wald(FE): 33.874***

Table 5.1: T = 7, S = 10. Results are based on 2-step estimates using the optimal weighting
matrix in the second step. Based only on heads of the household. * indicates statistical
significance at the 10% level, ** at the 5% level, and *** at the 1% level for
$H_0\colon \gamma_0, \beta_0 = 0$. J(GMMl0) = 577.84. BIC1 and BIC2 use
$N = \sum_{s=1}^{S}\sum_{t=1}^{T} N_{s,t}$ and $N = (1/ST)\sum_{s=1}^{S}\sum_{t=1}^{T} N_{s,t}$, respectively.


As we can see from the estimation results in Table 5.1, the GMMl1 model specification

is rejected based on the “J” statistic.5 This conclusion is also confirmed using the

Wald test for testing the null hypothesis that the factor is constant over time. Hence,

the assumption that cohorts were not affected by any internal or external shocks is

difficult to justify based on the statistical procedures we consider. For GMMl1, the es-

timated elasticity coefficient of logwage is negative and significant at any conventional

significance level. On the other hand, if we allow for cohort interactive effects6 the

estimated elasticity coefficient is smaller in magnitude and does not significantly differ

from zero at the 1% significance level for the GMMn2 estimator. Based on the BIC

model selection criteria, the model specification with one factor is preferred. Turning

our attention to the estimated coefficient of #kids we can see that the results differ

slightly between estimators. In all cases, we find that estimates are not significant at

any conventional significance level. As a robustness check in Appendix 5.B.1 we also

provide results based on a linear-log specification with qualitatively similar conclusions.

Overall, our results indicate that by not taking into account the possible presence of

time-varying factors researchers can overestimate the elasticity of the labour supply

using pseudo-panel data.
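The rejection decisions based on the “J” statistics can be checked against the χ² distribution with a few lines of code. The sketch below is a generic, standard-library implementation of the χ² survival function (via the regularised incomplete gamma function); it is not the software used for the estimates in this chapter, and the function names are hypothetical. The statistics and degrees of freedom are those reported in the text and in Table 5.1.

```python
import math


def _log_prefactor(a, x):
    # log of x^a * exp(-x) / Gamma(a), shared by both expansions below
    return a * math.log(x) - x - math.lgamma(a)


def _lower_series(a, x):
    """Regularised lower incomplete gamma P(a, x) via its power series (x < a + 1)."""
    term = 1.0 / a
    total = term
    denom = a
    for _ in range(10000):
        denom += 1.0
        term *= x / denom
        total += term
        if abs(term) < abs(total) * 1e-14:
            break
    return total * math.exp(_log_prefactor(a, x))


def _upper_continued_fraction(a, x):
    """Regularised upper incomplete gamma Q(a, x) via a continued fraction (x >= a + 1)."""
    tiny = 1e-300
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 10000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-14:
            break
    return h * math.exp(_log_prefactor(a, x))


def chi2_sf(stat, df):
    """P(chi2(df) > stat), i.e. the p-value of an overidentification test."""
    if stat <= 0.0:
        return 1.0
    a, x = 0.5 * df, 0.5 * stat
    if x < a + 1.0:
        return 1.0 - _lower_series(a, x)
    return _upper_continued_fraction(a, x)


if __name__ == "__main__":
    reported = {"GMMl0": (698.91, 68), "GMMl1": (88.05, 58),
                "GMMn1": (56.97, 52), "GMMn2": (44.92, 38)}
    for name, (j_stat, df) in reported.items():
        print(name, j_stat, df, chi2_sf(j_stat, df))
```

Under these inputs the p-value for GMMl1 falls below 1%, while those for GMMn1 and GMMn2 exceed 10%, in line with the significance stars in Table 5.1.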

Although methodologically our study is simpler and spans a shorter (and different)

time period, we can compare our results with those in Gonzalez and Sala (2015). They

found that for some Latin American countries, particularly Paraguay, the estimate

of the labour supply elasticity is strongly negative, while for others, it was found to

be positive (Argentina, and after sample restriction, Uruguay).7 Our estimate of the

elasticity coefficient places Ecuador closer to countries like Paraguay than to Argentina

or Chile. This conclusion can also be partially supported by some relatively similar

macroeconomic indicators (e.g. GDP per capita) in both countries and relatively long

average hours worked.

5 The results based on the one-step estimator that assumes $\sigma^2_{s,t} = \sigma^2$ are qualitatively and
quantitatively similar.

6 Unlike the Monte Carlo study, where only one set of starting values based on the FE estimator was
used, in this section we use up to 100 random starting values that are uniformly distributed on
[−10; 10]a for the non-linear estimator to make sure that the global minimum of the objective function is selected.

7 This result holds irrespective of whether a log-log or linear-log specification is used.
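The multi-start strategy described in footnote 6 can be illustrated with a toy example. Everything below is a hypothetical sketch: the local optimiser is a simple derivative-free compass search standing in for whatever routine was actually used, and `toy_objective` is an artificial one-dimensional function with two local minima, not the GMM objective of this chapter. Only the number of starts (100) and the sampling interval [−10; 10] are taken from the footnote.

```python
import random


def compass_search(objective, start, step=1.0, tol=1e-8, max_iter=100000):
    """Derivative-free local minimiser: probe +/- step along each coordinate."""
    x = list(start)
    fx = objective(x)
    for _ in range(max_iter):
        if step < tol:
            break
        improved = False
        for k in range(len(x)):
            for direction in (1.0, -1.0):
                trial = list(x)
                trial[k] += direction * step
                f_trial = objective(trial)
                if f_trial < fx:
                    x, fx, improved = trial, f_trial, True
        if not improved:
            step *= 0.5  # no probe improved the objective: refine the mesh
    return x, fx


def multistart(objective, dim, n_starts=100, low=-10.0, high=10.0, seed=0):
    """Run the local search from uniformly drawn starting values; keep the best."""
    rng = random.Random(seed)
    best_x, best_f = None, float("inf")
    for _ in range(n_starts):
        start = [rng.uniform(low, high) for _ in range(dim)]
        x, fx = compass_search(objective, start)
        if fx < best_f:
            best_x, best_f = x, fx
    return best_x, best_f


def toy_objective(x):
    # Two local minima (near -2 and +2); the left one is the global minimum.
    return (x[0] ** 2 - 4.0) ** 2 + x[0]


if __name__ == "__main__":
    print("single start :", compass_search(toy_objective, [3.0]))
    print("100 starts   :", multistart(toy_objective, dim=1))
```

A single start at 3.0 ends in the local minimum near +2, whereas the multi-start run recovers the global minimum near −2, which is the failure mode the footnote guards against.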


5.7 Conclusions

In this paper, we have studied the properties of available estimation techniques for

linear pseudo panel data models. We have extended the pseudo panel data literature

to models with possible cohort interactive effects. To overcome inconsistency of the

usual FE estimator, we have introduced the approach of Ahn et al. (2013) to pseudo

panel data models. The consistency and conditional asymptotic normality of the new

estimator were proved for pseudo panels with a fixed number of time series observations

and cohorts. Furthermore, we have discussed the estimation and identification for

datasets with a cohort-specific number of time observations.

Results from the extensive Monte Carlo study suggest that the estimator that accounts

for the multiplicative structure of the cohort effects has good finite sample properties

for small values of S and T . The results, however, can be sensitive to the relative

importance of the unobserved factors in the total error component structure.

As an empirical illustration, we have studied labour supply elasticity based on data

from Ecuador. In our analysis we have found that the model with multiplicative error

structure provides a better fit to the data than its counterpart with fixed effects. More

importantly, we have found that using our estimation technique, the estimated labour

supply elasticity measure is smaller in absolute value in comparison to the conventional

cohort fixed effects estimation strategy.

As thoroughly discussed by McKenzie (2004) and Verbeek (2008), different types of

asymptotic approximations are available for pseudo panel data models, depending on

their dimensions. In this paper, we have mainly investigated the effects of the error

terms with multifactor structure assuming that the number of cohorts and the time

dimension are fixed. This assumption is only sensible for models with a limited number

of cohorts but a large number of observations per cohort. However, given the limited

scope of this paper we leave the rigorous analysis of other asymptotic schemes for

models with multifactor error structure for future research.


5.A Theoretical results

5.A.1 Proofs

Proof of Proposition 5.1.

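The result of this proposition follows directly, given that the DGP is

$$y = \operatorname{vec}(F\Lambda') + X\theta_0 + u.$$

Plugging the expression for $y$ into the formula for $\hat\theta_{GMMl}$ gives

$$\begin{aligned}
\hat\theta_{GMMl} - \theta_0 &= (X'M\Omega MX)^{-1}X'M\Omega M\bigl(\operatorname{vec}(F\Lambda') + u\bigr) \\
&= (X'M\Omega MX)^{-1}X'M\Omega M\operatorname{vec}(F\Lambda') + o_P(1) \\
&= (X_\infty'M\Omega MX_\infty)^{-1}X_\infty'M\Omega M\operatorname{vec}(F\Lambda') + o_P(1).
\end{aligned}$$

Here the second line follows, as by Assumption (A.2) $\operatorname{plim}_{N\to\infty} u = 0_{ST}$. Finally,
using the notation $\operatorname{plim}_{N\to\infty} X = X_\infty$ we obtain the final result.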
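A special case may help to see when the leading term in this expansion vanishes. The sketch below assumes, for illustration only, that $F$ is $T \times L$ and $\Lambda$ is $S \times L$ (so that $\operatorname{vec}(F\Lambda')$ stacks $T$ observations per cohort), that $M$ is the within-cohort transformation $I_S \otimes (I_T - T^{-1}\imath_T\imath_T')$ used for the fixed effects estimator in Section 5.A.4, and that the factors are constant over time, $F = \imath_T f'$ with $f$ a hypothetical $L \times 1$ vector.

```latex
\begin{aligned}
M \operatorname{vec}(F\Lambda')
  &= \bigl(I_S \otimes (I_T - T^{-1}\imath_T\imath_T')\bigr)\operatorname{vec}(F\Lambda') \\
  &= \operatorname{vec}\bigl((I_T - T^{-1}\imath_T\imath_T')\,\imath_T f'\Lambda'\bigr)
     && \text{since } (B' \otimes A)\operatorname{vec}(C) = \operatorname{vec}(ACB) \\
  &= \operatorname{vec}\bigl((\imath_T - \imath_T)\, f'\Lambda'\bigr) = 0_{ST}
     && \text{since } \imath_T'\imath_T = T .
\end{aligned}
```

Under these assumptions the term involving $\operatorname{vec}(F\Lambda')$ is zero, which is consistent with the fixed effects transformation being adequate when the factor does not vary over time.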

Proof of Proposition 5.3.

The proof of this proposition is a straightforward modification of the proof for a simple

IV estimator.

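CASE I: $\gamma \ge 0.5$:

$$\begin{aligned}
\hat\zeta - \zeta_0 - \rho
&= \frac{\sum_{s=1}^{S}\sum_{t=1}^{T} z_{s,t}\bigl(\varepsilon_{s,t} - \rho(\mu_{s,t}/N^{\gamma}_{s,t})\bigr)}{\sum_{s=1}^{S}\sum_{t=1}^{T} z^{2}_{s,t}} \\
&= \frac{N}{N}\,\frac{\sum_{s=1}^{S}\sum_{t=1}^{T} z_{s,t}\bigl(\varepsilon_{s,t} - \rho(\mu_{s,t}/N^{\gamma}_{s,t})\bigr)}{\sum_{s=1}^{S}\sum_{t=1}^{T} z^{2}_{s,t}} \\
&= \frac{\sum_{s=1}^{S}\sum_{t=1}^{T}(N/N_{s,t})\bigl(\sqrt{N_{s,t}}\,z_{s,t}\bigr)\bigl(\sqrt{N_{s,t}}\,\varepsilon_{s,t} - \rho(\mu_{s,t}/N^{\gamma+0.5}_{s,t})\bigr)}{\sum_{s=1}^{S}\sum_{t=1}^{T}(N/N_{s,t})\bigl(\sqrt{N_{s,t}}\,z_{s,t}\bigr)^{2}} .
\end{aligned}$$

From here the desired result follows given that

$$N/N_{s,t} \to \pi^{-1}_{s,t}, \qquad \sqrt{N_{s,t}}\,z_{s,t} \xrightarrow{d} z_{s,t}, \qquad \sqrt{N_{s,t}}\,\varepsilon_{s,t} - \rho(\mu_{s,t}/N^{\gamma+0.5}_{s,t}) \xrightarrow{d} \varepsilon_{s,t},$$

as we assume that all idiosyncratic components are i.i.d. and hence the usual CLT
applies.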

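CASE II: $\gamma \in [0; 0.5)$:

$$\begin{aligned}
N^{1/2-\gamma}(\hat\zeta - \zeta_0)
&= \frac{N^{1/2+\gamma}}{N^{2\gamma}}\,\frac{\sum_{s=1}^{S}\sum_{t=1}^{T} z_{s,t}\,u_{s,t}}{\sum_{s=1}^{S}\sum_{t=1}^{T} z^{2}_{s,t}} \\
&= \frac{\sum_{s=1}^{S}\sum_{t=1}^{T}(N/N_{s,t})^{1/2+\gamma}\bigl(N^{\gamma}_{s,t}z_{s,t}\bigr)\bigl(\sqrt{N_{s,t}}\,u_{s,t}\bigr)}{\sum_{s=1}^{S}\sum_{t=1}^{T}(N/N_{s,t})^{2\gamma}\bigl(N^{\gamma}_{s,t}z_{s,t}\bigr)^{2}} .
\end{aligned}$$

By means of Slutsky's Theorem, for the denominator

$$N^{\gamma}_{s,t}z_{s,t} = \mu_{s,t} + N^{\gamma-1/2}_{s,t}\bigl(\sqrt{N_{s,t}}\,\varepsilon^{(z)}_{s,t}\bigr) \xrightarrow{p} \mu_{s,t}.$$

For the numerator a simple CLT for i.i.d. data applies:

$$\sqrt{N_{s,t}}\,u_{s,t} = \rho\sqrt{N_{s,t}}\,\varepsilon^{(z)}_{s,t} + \sqrt{N_{s,t}}\,\varepsilon_{s,t} \xrightarrow{d} \rho\,\mathcal{N}(0,1) + \mathcal{N}(0,1).$$

The desired result follows by combining the results for the numerator and denominator

$$N^{1/2-\gamma}(\hat\zeta - \zeta_0) \xrightarrow{d} \frac{\sum_{s=1}^{S}\sum_{t=1}^{T}\pi^{-(0.5+\gamma)}_{s,t}\mu_{s,t}u_{s,t}}{\sum_{s=1}^{S}\sum_{t=1}^{T}\pi^{-2\gamma}_{s,t}\mu^{2}_{s,t}},$$

with $u_{s,t} \sim \mathcal{N}(0, 1 + \rho^{2})$.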

5.A.2 Differentials

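If we denote transformed equations by $u = u(\theta,\phi) = M(\phi)(y - X\theta)$, the objective
function can be formulated in the following way

$$f(\theta,\phi) = \tfrac{1}{2}\,u'\Omega u.$$

Using the product rule for differentials

$$\mathrm{d}f(\theta,\phi) = u'\Omega\,\mathrm{d}u. \tag{5.29}$$

Here the first differential of the $u$ term can be compactly written

$$\begin{aligned}
\mathrm{d}u &= (\mathrm{d}M(\phi))(y - X\theta) - M(\phi)X\,\mathrm{d}\theta \\
&= \bigl(I_S \otimes (O_{T-L}, \mathrm{d}\Phi)\bigr)(y - X\theta) - M(\phi)X\,\mathrm{d}\theta \\
&= \bigl((y - X\theta)' \otimes I_{S(T-L)}\bigr)\operatorname{vec}\bigl(I_S \otimes (O_{T-L}, \mathrm{d}\Phi)\bigr) - M(\phi)X\,\mathrm{d}\theta \\
&= \bigl((y - X\theta)' \otimes I_{S(T-L)}\bigr)Q\,\mathrm{d}\phi - M(\phi)X\,\mathrm{d}\theta \\
&= D_\phi\,\mathrm{d}\phi + D_\theta\,\mathrm{d}\theta.
\end{aligned}$$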

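Here the selection matrix $Q$ is of the following form

$$Q = \Bigl(\bigl((I_S \otimes K_{TS})(\operatorname{vec} I_S \otimes I_T)\bigr) \otimes I_{T-L}\Bigr)\begin{pmatrix} O_{[(T-L)^2 \times (T-L)L]} \\ I_{(T-L)L} \end{pmatrix}.$$

For more detailed derivations, see Magnus and Neudecker (2007)[p. 56]. The second
differential of the objective function $f(\theta,\phi)$ is

$$\mathrm{d}^2 f(\theta,\phi) = (\mathrm{d}u)'\Omega\,\mathrm{d}u + u'\Omega\,\mathrm{d}^2 u. \tag{5.30}$$

Note that the second term is asymptotically negligible ($o_P(1)$) if evaluated at any
consistent estimator $\hat\gamma$. Hence

$$\mathrm{d}^2 f(\theta,\phi) = (\mathrm{d}u)'\Omega\,\mathrm{d}u + o_P(1). \tag{5.31}$$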

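Plugging in the value of $\mathrm{d}u$ (ignoring the $o_P(1)$ term),

$$\begin{aligned}
\mathrm{d}^2 f(\theta,\phi) &= (\mathrm{d}u)'\Omega\,\mathrm{d}u \\
&= (\mathrm{d}\phi)'D_\phi(\hat\gamma)'\Omega D_\phi(\hat\gamma)(\mathrm{d}\phi) + (\mathrm{d}\theta)'D_\theta(\hat\gamma)'\Omega D_\theta(\hat\gamma)(\mathrm{d}\theta) \\
&\quad + 2(\mathrm{d}\phi)'D_\phi(\hat\gamma)'\Omega D_\theta(\hat\gamma)(\mathrm{d}\theta).
\end{aligned}$$

Combining all results we obtain formulas for the score ($\nabla(\hat\gamma)$) and the Hessian ($H(\hat\gamma)$):

$$\nabla(\hat\gamma) = \begin{pmatrix} D_\theta(\hat\gamma)' \\ D_\phi(\hat\gamma)' \end{pmatrix}\Omega u,$$

$$H(\hat\gamma) = \begin{pmatrix} D_\theta(\hat\gamma)'\Omega D_\theta(\hat\gamma) & D_\theta(\hat\gamma)'\Omega D_\phi(\hat\gamma) \\ D_\phi(\hat\gamma)'\Omega D_\theta(\hat\gamma) & D_\phi(\hat\gamma)'\Omega D_\phi(\hat\gamma) \end{pmatrix} = \begin{pmatrix} D_\theta(\hat\gamma)' \\ D_\phi(\hat\gamma)' \end{pmatrix}\Omega\begin{pmatrix} D_\theta(\hat\gamma)' \\ D_\phi(\hat\gamma)' \end{pmatrix}'.$$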
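The score formula can be checked numerically in a simplified setting. The sketch below assumes the fixed effects case only, in which $M$ does not depend on $\phi$ and the score reduces to $D_\theta'\Omega u$ with $D_\theta = -MX$; it uses one cohort with $T = 3$ and made-up numbers for $y$, $X$ and $\Omega$. All function names are hypothetical, and the finite-difference comparison is a generic device rather than part of the derivation above.

```python
def mat_vec(a, v):
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]


def mat_mul(a, b):
    cols = list(zip(*b))
    return [[sum(p * q for p, q in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def within_projection(T):
    """I_T - (1/T) * ones * ones', the demeaning matrix for one cohort."""
    return [[(1.0 if r == c else 0.0) - 1.0 / T for c in range(T)] for r in range(T)]


def transformed_residual(theta, y, X, M):
    resid = [y_i - fit_i for y_i, fit_i in zip(y, mat_vec(X, theta))]
    return mat_vec(M, resid)  # u = M (y - X theta)


def objective(theta, y, X, M, Omega):
    u = transformed_residual(theta, y, X, M)
    return 0.5 * sum(u_i * w_i for u_i, w_i in zip(u, mat_vec(Omega, u)))


def score_theta(theta, y, X, M, Omega):
    u = transformed_residual(theta, y, X, M)
    d_theta = [[-v for v in row] for row in mat_mul(M, X)]  # D_theta = -M X
    return mat_vec(transpose(d_theta), mat_vec(Omega, u))   # D_theta' Omega u


def numerical_gradient(theta, y, X, M, Omega, h=1e-6):
    grad = []
    for k in range(len(theta)):
        up, down = list(theta), list(theta)
        up[k] += h
        down[k] -= h
        grad.append((objective(up, y, X, M, Omega)
                     - objective(down, y, X, M, Omega)) / (2.0 * h))
    return grad


# Made-up illustration data: one cohort, T = 3 periods, K = 2 regressors.
y = [1.0, 2.5, 2.0]
X = [[1.0, 0.5], [2.0, 1.5], [4.0, 1.0]]
Omega = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0]]
M = within_projection(3)
theta = [0.3, -0.2]

if __name__ == "__main__":
    print(score_theta(theta, y, X, M, Omega))
    print(numerical_gradient(theta, y, X, M, Omega))
```

The two printed vectors should agree up to rounding error, since the objective is quadratic in $\theta$ when $M$ is fixed.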


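Under Assumptions (A.1)-(A.6) the asymptotic distribution is

$$\sqrt{N}(\hat\gamma - \gamma_0) \xrightarrow{d} \operatorname*{plim}_{N\to\infty}\begin{pmatrix} D_\theta'\Omega D_\theta & D_\theta'\Omega D_\phi \\ D_\phi'\Omega D_\theta & D_\phi'\Omega D_\phi \end{pmatrix}^{-1}\begin{pmatrix} D_\theta' \\ -D_\phi' \end{pmatrix}\Omega M(\phi_0)\Sigma^{1/2}\xi.$$

Furthermore, similarly to any standard GMM problem the (conditional) variance of
$\sqrt{N}(\hat\gamma - \gamma_0)$ is minimized at

$$\Omega_{opt} = \bigl(M(\phi_0)\Sigma M(\phi_0)'\bigr)^{-1}.$$

Therefore, the asymptotic variance with $\Omega_{opt}$ as a weighting matrix equals

$$\operatorname{Avar}\hat\gamma = \left[\operatorname*{plim}_{N\to\infty}\begin{pmatrix} D_\theta' \\ D_\phi' \end{pmatrix}\bigl(M(\phi_0)\Sigma M(\phi_0)'\bigr)^{-1}\begin{pmatrix} D_\theta' \\ D_\phi' \end{pmatrix}'\right]^{-1}.$$

Finally, the same result holds if evaluated at any weighting matrix $\hat\Omega$ such that
$\hat\Omega \xrightarrow{p} \Omega_{opt}$.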

5.A.3 Sufficient conditions for FE estimator

For the model with cohort fixed effects we define

$$u_{i,t} \equiv \nu_{i,t} - \delta_s, \qquad i \in I_{s,t}.$$

The following high-level assumptions are sufficient to prove that the linear GMM estimator
$\hat\theta_{GMMl}$ is consistent and asymptotically normally distributed.

(FE.1) $N_{s,t} \to \infty$, $\forall s,t$; $\exists N \to \infty$ s.t. $N_{s,t}/N \to \pi_{s,t}$ and
$0 < \min \pi_{s,t} < \max \pi_{s,t} < \infty$. $T$ and $S$ are fixed (Type I asymptotics).

(FE.2) $u_{i,t}$ are i.h.d. with finite $2 + \delta$ moment, for $\delta > 0$, such that
$\sqrt{N_{s,t}}\,u_{s,t} \xrightarrow{d} \mathcal{N}(0, \sigma^2_{s,t})$ jointly $\forall s,t$ with
$0 < \min \sigma^2_{s,t} \le \max \sigma^2_{s,t} < \infty$.

(FE.3) There exists a unique true value $\theta_0$.

(FE.4) $\operatorname{rk}(MX_\infty) = K$. The $X_\infty \equiv \operatorname{plim}_{N\to\infty} X$ matrix is deterministic.

(FE.5) $S(T - 1) > K$.


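Under these assumptions the asymptotically efficient weighting matrix can be consistently
estimated by

$$\hat\Omega_{opt} = \bigl(M\hat\Sigma M\bigr)^{+}.$$

Here we use the Moore-Penrose pseudoinverse of $(M\hat\Sigma M)$ because the rank of this
matrix is given by $\operatorname{rk}(M) = (T - 1)S$. The typical $(s - 1)T + t$ diagonal element of
the $\hat\Sigma$ matrix is equal to $\hat q^2_{s,t} = (N/N_{s,t})\hat\sigma^2_{s,t}$, where

$$\hat\sigma^2_{s,t} = \frac{1}{N_{s,t}}\sum_{i \in I_{s,t}}\bigl(y_{i,t} - x_{i,s,t}'\hat\theta_1\bigr)^2 - \Biggl(\frac{1}{N_{s,t}}\sum_{i \in I_{s,t}}\bigl(y_{i,t} - x_{i,s,t}'\hat\theta_1\bigr)\Biggr)^{2}, \tag{5.32}$$

which is evaluated at some consistent initial estimator $\hat\theta_1$ (e.g. the estimator that
replaces $\Omega$ by an identity matrix). Therefore (under Type I asymptotics) if $\hat\theta_1 \xrightarrow{p} \theta_0$,
one has $\hat q^2_{s,t} \xrightarrow{p} q^2_{s,t} \equiv \sigma^2_{s,t}/\pi_{s,t}$, where $N_{s,t}/N \to \pi_{s,t}$.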
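Equation (5.32) is the usual within-cell variance of the first-step residuals, rescaled by $N/N_{s,t}$. A minimal sketch of the computation for a single cohort-period cell is given below; the function names are hypothetical, and the residuals $y_{i,t} - x_{i,s,t}'\hat\theta_1$ are assumed to have been computed beforehand (the numbers used here are made up).

```python
def cell_variance(residuals):
    """sigma^2_{s,t}: mean of squared residuals minus the squared mean, as in (5.32)."""
    n = len(residuals)
    mean = sum(residuals) / n
    mean_of_squares = sum(r * r for r in residuals) / n
    return mean_of_squares - mean ** 2


def q_squared(residuals, n_total):
    """q^2_{s,t} = (N / N_{s,t}) * sigma^2_{s,t}, the diagonal element of Sigma."""
    return (n_total / len(residuals)) * cell_variance(residuals)


if __name__ == "__main__":
    toy_residuals = [1.0, 2.0, 3.0, 4.0]   # made-up first-step residuals, N_{s,t} = 4
    print(cell_variance(toy_residuals))     # within-cell variance
    print(q_squared(toy_residuals, n_total=8))
```

Note that the divisor is $N_{s,t}$ rather than $N_{s,t} - 1$, following (5.32) as written.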

5.A.4 The Hausman test for fixed effects

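The Hausman test can be used to test for the presence of unobserved factors (also
suggested by Bai (2009) in the context of genuine panels with large $N, T$). In particular,
if $L = L_0 = 1$ then the following result

$$H = N(\Delta\hat\theta)'\bigl(\operatorname{Avar}\hat\theta_{GMMn} - \operatorname{Avar}\hat\theta_{GMMl}\bigr)^{-1}\Delta\hat\theta \xrightarrow{d} \chi^2(K) \tag{5.33}$$

holds under the null hypothesis of the fixed effects model being correct. Here
$\Delta\hat\theta = \hat\theta_{GMMn} - \hat\theta_{GMMl}$, while to estimate variance-covariance matrices of estimators we use

$$\operatorname{Avar}\hat\phi_{GMMn} = \Bigl(D_\phi(\hat\gamma)'\,\Omega^{1/2}M_{\Omega^{1/2}D_\theta(\hat\gamma)}\Omega^{1/2}D_\phi(\hat\gamma)\Bigr)^{-1}, \tag{5.34}$$

with $\Omega = \bigl(M(\hat\phi_1)\hat\Sigma(\hat\theta_1)M(\hat\phi_1)'\bigr)^{-1}$ evaluated at a consistent first-step estimator
$\hat\gamma_1 = (\hat\theta_1', \hat\phi_1')'$ (under the alternative hypothesis). For the fixed effects estimator the
variance-covariance matrix is analogously given by

$$\operatorname{Avar}\hat\theta_{GMMl} = \Bigl(X'M\bigl(M\hat\Sigma(\hat\theta_{GMMl,1})M\bigr)^{+}MX\Bigr)^{-1}, \tag{5.35}$$

for $M = I_S \otimes (I_T - (1/T)\imath_T\imath_T')$. Here $\hat\theta_{GMMl,1}$ is a first-step consistent estimator
(under the null hypothesis). In both cases $\hat\Sigma(\cdot)$ can be estimated using the formula in
(5.32).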
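Once the two estimates and their variance matrices are available, the statistic in (5.33) is a plain quadratic form. The sketch below evaluates it for $K = 2$; every number in it is made up purely for illustration and none is an estimate from this chapter, and the function names are hypothetical.

```python
def inverse_2x2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0.0:
        raise ValueError("variance difference is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def hausman_statistic(n, theta_n, theta_l, avar_n, avar_l):
    """H = N * delta' (Avar_n - Avar_l)^{-1} delta for two parameters (K = 2)."""
    delta = [p - q for p, q in zip(theta_n, theta_l)]
    diff = [[avar_n[r][c] - avar_l[r][c] for c in range(2)] for r in range(2)]
    inv = inverse_2x2(diff)
    quad = sum(delta[r] * inv[r][c] * delta[c] for r in range(2) for c in range(2))
    return n * quad


if __name__ == "__main__":
    # Hypothetical inputs (not taken from Table 5.1).
    h_stat = hausman_statistic(
        n=1000,
        theta_n=[0.25, -0.02],
        theta_l=[0.20, -0.01],
        avar_n=[[4.0, 0.5], [0.5, 3.0]],
        avar_l=[[2.0, 0.2], [0.2, 1.5]],
    )
    print(h_stat)  # to be compared with a chi-squared(2) critical value
```

In practice the difference of the two variance estimates need not be positive definite in finite samples, in which case the quadratic form can be negative; the sketch does not guard against that.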

5.B The ENEMDU dataset

Table 5.2: Observations per cohort in a particular year. Here Cohort 1 is the oldest
and Cohort 10 is the youngest.

Year      1     2     3     4     5     6     7     8     9    10
2007    264   283   276   308   300   351   322   277   235   191
2008    288   329   383   383   366   353   361   298   267   211
2009    260   317   320   340   335   338   280   241   229   185
2010    318   341   390   417   396   398   390   272   285   202
2011    338   361   339   420   372   356   412   369   321   297
2012    338   359   411   429   420   402   441   387   342   301
2013    240   304   363   437   432   447   525   518   473   467

Table 5.3: The average number of individuals of age under 16 in a particular
household. Here Cohort 1 is the oldest and Cohort 10 is the youngest.

Year      1     2     3     4     5     6     7     8     9    10
2007   1.08  1.30  1.57  1.94  2.32  2.21  2.37  2.27  2.08  1.57
2008   0.92  1.47  1.41  1.87  2.18  2.09  2.46  2.18  2.12  1.70
2009   0.93  1.15  1.42  1.60  1.98  1.99  2.39  2.27  2.18  1.60
2010   0.90  1.08  1.30  1.54  1.86  2.05  2.17  2.18  2.37  1.79
2011   0.78  0.82  1.06  1.25  1.55  1.79  1.94  2.17  2.20  1.93
2012   0.80  0.97  0.97  1.21  1.40  1.86  1.92  2.21  2.21  1.94
2013   0.81  0.86  1.04  1.14  1.28  1.77  1.85  2.23  2.27  2.01
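The two sample-size measures that enter BIC1 and BIC2 (the total and the average number of observations per cohort-period cell, as defined in the note to Table 5.1) can be recomputed directly from Table 5.2. The snippet below is a small sketch of that arithmetic; the variable names are hypothetical and the BIC formula itself is not reproduced here.

```python
# Rows: survey years 2007-2013; columns: cohorts 1 (oldest) to 10 (youngest).
OBSERVATIONS = {
    2007: [264, 283, 276, 308, 300, 351, 322, 277, 235, 191],
    2008: [288, 329, 383, 383, 366, 353, 361, 298, 267, 211],
    2009: [260, 317, 320, 340, 335, 338, 280, 241, 229, 185],
    2010: [318, 341, 390, 417, 396, 398, 390, 272, 285, 202],
    2011: [338, 361, 339, 420, 372, 356, 412, 369, 321, 297],
    2012: [338, 359, 411, 429, 420, 402, 441, 387, 342, 301],
    2013: [240, 304, 363, 437, 432, 447, 525, 518, 473, 467],
}

T = len(OBSERVATIONS)                         # number of time periods
S = len(next(iter(OBSERVATIONS.values())))    # number of cohorts

n_total = sum(sum(row) for row in OBSERVATIONS.values())   # N used by BIC1
n_average = n_total / (S * T)                               # N used by BIC2
n_smallest = min(min(row) for row in OBSERVATIONS.values())

if __name__ == "__main__":
    print(T, S, n_total, round(n_average, 2), n_smallest)
```

The smallest cell is also a quick check on the requirement that each cohort contains at least 100 individuals.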


5.B.1 The linear-log specification

$$\text{hours}_{i,t} = \lambda_s' f_t + \gamma\,\text{logwage}_{i,t} + \beta' z_{i,t} + u_{i,t}, \qquad i \in I_{s,t}. \tag{5.36}$$

In this specification the γ parameter no longer has the elasticity interpretation. We

summarize the results for this specification in Table 5.4. The results are both quantita-

tively and qualitatively similar to the ones based on log-log specification. The estimated

coefficient of logwage based on the Fixed Effects specification is substantially larger in

magnitude as compared to the counterpart estimated allowing for time varying factors.
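To relate the linear-log coefficients to the elasticities of the log-log specification, a rough back-of-the-envelope conversion can be used. The sketch below takes the coefficients from Table 5.4 and assumes, only for illustration, a typical value of log hours of about 3.80 (the middle of the vertical axis range in Figure 5.2(a)), so that typical weekly hours are roughly $e^{3.80} \approx 44.7$.

```latex
\begin{aligned}
\Delta \text{hours} &\approx \gamma \,\Delta \text{logwage}
   = -6.21 \times \ln(1.01) \approx -0.062 \ \text{hours per week for a 1\% wage increase}, \\
\text{implied elasticity} &\approx \frac{\gamma}{\overline{\text{hours}}}
   \approx \frac{-6.21}{44.7} \approx -0.139 \quad (\text{GMMl1}), \qquad
   \frac{-3.45}{44.7} \approx -0.077 \quad (\text{GMMn1}).
\end{aligned}
```

These approximate values are close to the log-log estimates of -0.130 and -0.075 in Table 5.1, which is in line with the statement that the two specifications give similar results.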

Variable     GMMl1        GMMn1        GMMn2
logwage      -6.21***     -3.45**      -3.37**
# kids        0.18        -0.41        -0.49
df            58           52           38
J             88.29***     56.25        43.45
BIC1                      -188.07      -135.10
BIC2                       -85.12       -59.86

Wald(FE): 33.68***

Table 5.4: T = 7, S = 10. Results are based on 2-step estimates using the optimal weighting
matrix in the second step. Based only on heads of the household. * indicates statistical
significance at the 10% level, ** at the 5% level, and *** at the 1% level for
$H_0\colon \gamma_0, \beta_0 = 0$. J(GMMl0) = 580.90. BIC1 and BIC2 use
$N = \sum_{s=1}^{S}\sum_{t=1}^{T} N_{s,t}$ and $N = (1/ST)\sum_{s=1}^{S}\sum_{t=1}^{T} N_{s,t}$, respectively.

5.C Monte Carlo results


Table 5.5: Estimation results for T = 5, S = 10. 10000 MC replications. For ζ0 = 1.

                                 Bias                            RMSE
     σ²_fλ; ρ; σ²_µ; N; σ²_f     L0    L1^FE L1    L2    L       L0    L1^FE L1    L2    L
(A)  .1 ; .00 ; .00 ; 150 ; .0   .53   .00   .04   .02   .03     .54   .16   .21   .26   .22
     .1 ; .00 ; .00 ; 150 ; .1   .43   .00   .02   .01   .01     .44   .18   .18   .22   .18
     .1 ; .00 ; .00 ; 300 ; .0   .39   .00   .02   .02   .02     .40   .16   .20   .25   .20
     .1 ; .00 ; .00 ; 300 ; .1   .48   .00   .01   .00   .01     .48   .21   .18   .21   .17
     .1 ; .00 ; .05 ; 150 ; .0   .35   .00   .00   .00   .00     .36   .06   .07   .09   .07
     .1 ; .00 ; .05 ; 150 ; .1   .51   .00   .01   .00   .00     .51   .07   .10   .09   .07
     .1 ; .00 ; .05 ; 300 ; .0   .47   .00   .00   .00   .00     .48   .04   .05   .06   .05
     .1 ; .00 ; .05 ; 300 ; .1   .53   .00   .00   .00   .00     .53   .06   .07   .06   .05
     .1 ; .00 ; .30 ; 150 ; .0   .45   .00   .00   .00   .00     .46   .02   .03   .04   .03
     .1 ; .00 ; .30 ; 150 ; .1   .37   .00   .00   .00   .00     .38   .03   .03   .04   .03
     .1 ; .00 ; .30 ; 300 ; .0   .39   .00   .00   .00   .00     .40   .02   .03   .03   .02
     .1 ; .00 ; .30 ; 300 ; .1   .35   .00   .00   .00   .00     .36   .02   .02   .03   .02

(B)  .1 ; .30 ; .00 ; 150 ; .0   .66   .30   .34   .33   .33     .67   .34   .39   .41   .39
     .1 ; .30 ; .00 ; 150 ; .1   .47   .30   .24   .24   .23     .48   .35   .31   .34   .31
     .1 ; .30 ; .00 ; 300 ; .0   .61   .30   .32   .32   .32     .61   .34   .37   .40   .37
     .1 ; .30 ; .00 ; 300 ; .1   .47   .30   .18   .19   .18     .48   .36   .27   .29   .26
     .1 ; .30 ; .05 ; 150 ; .0   .52   .04   .04   .04   .04     .53   .07   .08   .10   .08
     .1 ; .30 ; .05 ; 150 ; .1   .44   .04   .04   .04   .04     .45   .08   .08   .09   .07
     .1 ; .30 ; .05 ; 300 ; .0   .49   .02   .02   .02   .02     .50   .05   .05   .07   .05
     .1 ; .30 ; .05 ; 300 ; .1   .42   .02   .02   .02   .02     .43   .06   .05   .06   .05
     .1 ; .30 ; .30 ; 150 ; .0   .33   .00   .01   .01   .01     .34   .02   .03   .04   .03
     .1 ; .30 ; .30 ; 150 ; .1   .42   .01   .01   .01   .01     .43   .03   .04   .04   .03
     .1 ; .30 ; .30 ; 300 ; .0   .44   .00   .00   .00   .00     .45   .02   .03   .03   .02
     .1 ; .30 ; .30 ; 300 ; .1   .31   .00   .00   .00   .00     .32   .02   .02   .03   .02

(C)  .5 ; .00 ; .00 ; 150 ; .0   .58   .00   .00   .00   .00     .62   .12   .13   .17   .13
     .5 ; .00 ; .00 ; 150 ; .1   .47   .00   .00   .00   .00     .52   .18   .10   .14   .11
     .5 ; .00 ; .00 ; 300 ; .0   .59   .00   .00   .00   .00     .64   .12   .13   .17   .13
     .5 ; .00 ; .00 ; 300 ; .1   .51   .00   .00   .00   .00     .55   .23   .10   .13   .10
     .5 ; .00 ; .05 ; 150 ; .0   .46   .00   .00   .00   .00     .51   .04   .05   .06   .05
     .5 ; .00 ; .05 ; 150 ; .1   .51   .00   .00   .00   .00     .55   .07   .05   .06   .05
     .5 ; .00 ; .05 ; 300 ; .0   .44   .00   .00   .00   .00     .49   .03   .03   .05   .03
     .5 ; .00 ; .05 ; 300 ; .1   .42   .00   .00   .00   .00     .47   .06   .03   .04   .03
     .5 ; .00 ; .30 ; 150 ; .0   .38   .00   .00   .00   .00     .42   .02   .02   .03   .02
     .5 ; .00 ; .30 ; 150 ; .1   .33   .00   .00   .00   .00     .37   .03   .02   .03   .02
     .5 ; .00 ; .30 ; 300 ; .0   .44   .00   .00   .00   .00     .48   .01   .01   .02   .01
     .5 ; .00 ; .30 ; 300 ; .1   .49   .00   .00   .00   .00     .52   .03   .02   .02   .01

(D)  .5 ; .30 ; .00 ; 150 ; .0   .51   .30   .30   .30   .30     .56   .32   .33   .35   .33
     .5 ; .30 ; .00 ; 150 ; .1   .35   .30   .18   .20   .18     .41   .35   .21   .25   .22
     .5 ; .30 ; .00 ; 300 ; .0   .56   .30   .30   .30   .30     .61   .32   .33   .35   .33
     .5 ; .30 ; .00 ; 300 ; .1   .47   .30   .14   .16   .14     .52   .37   .18   .23   .17
     .5 ; .30 ; .05 ; 150 ; .0   .57   .04   .04   .04   .04     .61   .06   .06   .08   .06
     .5 ; .30 ; .05 ; 150 ; .1   .41   .04   .03   .04   .03     .46   .08   .06   .07   .06
     .5 ; .30 ; .05 ; 300 ; .0   .47   .02   .02   .02   .02     .52   .04   .04   .05   .04
     .5 ; .30 ; .05 ; 300 ; .1   .44   .02   .02   .02   .02     .49   .07   .04   .05   .04
     .5 ; .30 ; .30 ; 150 ; .0   .45   .01   .01   .01   .01     .49   .02   .02   .03   .02
     .5 ; .30 ; .30 ; 150 ; .1   .49   .01   .01   .01   .01     .52   .03   .03   .03   .02
     .5 ; .30 ; .30 ; 300 ; .0   .33   .00   .00   .00   .00     .38   .01   .02   .02   .01
     .5 ; .30 ; .30 ; 300 ; .1   .38   .00   .00   .00   .00     .42   .03   .02   .02   .01

Here “L0” is the “GMMl0” estimator; “L1^FE” is the “GMMl1” fixed effects estimator; “L1”
and “L2” are the non-linear “GMMn1” and “GMMn2” estimators, respectively; “L” is the
“GMMo” estimator with optimal number of factors based on BIC.
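When reading Table 5.5 it can help to separate bias from sampling variability using the identity RMSE² = bias² + variance. The helper below is a hypothetical convenience function, not part of the simulation code; because the table entries are rounded to two decimals, the implied standard deviations are only approximate.

```python
import math


def implied_std(bias, rmse):
    """Standard deviation implied by RMSE^2 = bias^2 + variance (floored at zero)."""
    return math.sqrt(max(rmse ** 2 - bias ** 2, 0.0))


if __name__ == "__main__":
    # Fixed effects estimator, design (.1; .30; .00; 150; .0) in Table 5.5:
    # bias .30 and RMSE .34.
    print(implied_std(0.30, 0.34))
    # Same estimator in the corresponding design with rho = .00: bias .00, RMSE .16.
    print(implied_std(0.00, 0.16))
```

In this example both designs imply a standard deviation of about 0.16, so the larger RMSE under ρ = .30 is attributable to bias rather than to additional variability.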


Table 5.6: Testing results for T = 5, S = 10. 10000 MC replications. For ζ0 = 1.

                                L0          L1^FE       L1          L2          L
     σ²_fλ; ρ; σ²_µ; N; σ²_f    J    t      J    t      J    t      J    t      J    t      W    h0   h1   #L=1
(A)  .1 ; .00 ; .00 ; 150 ; .0  1    1      .05  .04    .08  .16    .02  .16    .04  .16    .09  .98  .31  .94
     .1 ; .00 ; .00 ; 150 ; .1  1    1      .36  .08    .09  .14    .02  .14    .04  .13    .59  .96  .47  .94
     .1 ; .00 ; .00 ; 300 ; .0  1    1      .05  .04    .06  .15    .02  .16    .04  .15    .08  .97  .27  .98
     .1 ; .00 ; .00 ; 300 ; .1  1    1      .62  .12    .08  .11    .02  .13    .05  .11    .78  .97  .56  .97
     .1 ; .00 ; .05 ; 150 ; .0  1    1      .06  .05    .10  .08    .03  .09    .06  .07    .12  1    .27  .94
     .1 ; .00 ; .05 ; 150 ; .1  1    1      .45  .11    .13  .09    .03  .08    .06  .07    .72  1    .59  .91
     .1 ; .00 ; .05 ; 300 ; .0  1    1      .06  .06    .07  .06    .02  .07    .05  .06    .08  1    .18  .98
     .1 ; .00 ; .05 ; 300 ; .1  1    1      .71  .18    .09  .07    .03  .08    .06  .06    .89  1    .72  .97
     .1 ; .00 ; .30 ; 150 ; .0  1    1      .06  .05    .09  .06    .03  .08    .06  .06    .08  1    .17  .96
     .1 ; .00 ; .30 ; 150 ; .1  1    1      .46  .12    .10  .06    .03  .07    .06  .06    .73  1    .57  .95
     .1 ; .00 ; .30 ; 300 ; .0  1    1      .06  .05    .08  .06    .03  .07    .06  .06    .08  1    .15  .98
     .1 ; .00 ; .30 ; 300 ; .1  1    1      .67  .16    .08  .05    .03  .07    .06  .05    .87  1    .69  .99

(B)  .1 ; .30 ; .00 ; 150 ; .0  1    1      .04  .55    .08  .61    .02  .49    .03  .60    .13  .94  .29  .94
     .1 ; .30 ; .00 ; 150 ; .1  1    1      .40  .55    .13  .51    .03  .43    .06  .49    .62  .87  .49  .91
     .1 ; .30 ; .00 ; 300 ; .0  1    1      .05  .55    .05  .59    .01  .49    .04  .59    .08  .96  .27  .98
     .1 ; .30 ; .00 ; 300 ; .1  1    1      .65  .54    .10  .40    .03  .35    .07  .39    .81  .92  .59  .96
     .1 ; .30 ; .05 ; 150 ; .0  1    1      .07  .12    .11  .14    .04  .13    .06  .13    .10  1    .24  .94
     .1 ; .30 ; .05 ; 150 ; .1  1    1      .45  .17    .12  .15    .04  .13    .07  .13    .71  1    .57  .94
     .1 ; .30 ; .05 ; 300 ; .0  1    1      .06  .09    .08  .09    .03  .10    .06  .09    .07  1    .19  .98
     .1 ; .30 ; .05 ; 300 ; .1  1    1      .63  .19    .09  .10    .03  .10    .06  .09    .85  1    .66  .97
     .1 ; .30 ; .30 ; 150 ; .0  1    1      .06  .06    .10  .07    .04  .08    .06  .07    .11  1    .21  .95
     .1 ; .30 ; .30 ; 150 ; .1  1    1      .50  .13    .10  .07    .03  .08    .06  .06    .76  1    .60  .95
     .1 ; .30 ; .30 ; 300 ; .0  1    1      .06  .05    .07  .06    .03  .07    .06  .06    .07  1    .12  .99
     .1 ; .30 ; .30 ; 300 ; .1  1    1      .60  .15    .08  .06    .03  .07    .06  .06    .84  1    .66  .98

(C)  .5 ; .00 ; .00 ; 150 ; .0  1    1      .05  .04    .05  .06    .02  .08    .04  .07    .05  .99  .25  .97
     .5 ; .00 ; .00 ; 150 ; .1  1    1      .78  .17    .07  .06    .02  .07    .05  .06    .92  .99  .67  .97
     .5 ; .00 ; .00 ; 300 ; .0  1    1      .05  .04    .05  .06    .02  .07    .04  .06    .05  1    .23  .99
     .5 ; .00 ; .00 ; 300 ; .1  1    1      .90  .26    .06  .06    .02  .07    .05  .06    .97  .99  .75  .99
     .5 ; .00 ; .05 ; 150 ; .0  1    .99    .06  .05    .07  .06    .03  .07    .05  .06    .07  1    .15  .97
     .5 ; .00 ; .05 ; 150 ; .1  1    1      .82  .23    .07  .06    .03  .07    .05  .06    .93  1    .78  .97
     .5 ; .00 ; .05 ; 300 ; .0  1    1      .06  .05    .07  .06    .02  .07    .06  .06    .06  1    .13  .99
     .5 ; .00 ; .05 ; 300 ; .1  1    .99    .91  .31    .06  .06    .02  .07    .05  .06    .97  1    .85  .99
     .5 ; .00 ; .30 ; 150 ; .0  1    .99    .06  .05    .07  .05    .02  .07    .05  .06    .06  1    .12  .98
     .5 ; .00 ; .30 ; 150 ; .1  1    .99    .78  .23    .07  .06    .03  .07    .05  .06    .92  1    .76  .97
     .5 ; .00 ; .30 ; 300 ; .0  1    1      .05  .05    .06  .05    .02  .07    .05  .05    .06  1    .10  .99
     .5 ; .00 ; .30 ; 300 ; .1  1    1      .93  .36    .06  .05    .02  .06    .05  .05    .98  1    .89  .99

(D)  .5 ; .30 ; .00 ; 150 ; .0  1    1      .05  .78    .05  .75    .02  .60    .04  .75    .07  .97  .27  .97
     .5 ; .30 ; .00 ; 150 ; .1  1    .98    .75  .71    .09  .53    .03  .44    .06  .53    .91  .95  .68  .95
     .5 ; .30 ; .00 ; 300 ; .0  1    1      .05  .78    .05  .75    .02  .59    .04  .75    .06  .98  .26  .99
     .5 ; .30 ; .00 ; 300 ; .1  1    1      .91  .67    .09  .42    .03  .37    .07  .42    .97  .97  .78  .98
     .5 ; .30 ; .05 ; 150 ; .0  1    1      .08  .16    .09  .16    .03  .15    .06  .16    .07  1    .17  .97
     .5 ; .30 ; .05 ; 150 ; .1  1    .99    .78  .28    .08  .15    .03  .14    .06  .15    .92  1    .75  .97
     .5 ; .30 ; .05 ; 300 ; .0  1    1      .07  .11    .07  .11    .03  .11    .06  .11    .07  1    .14  .99
     .5 ; .30 ; .05 ; 300 ; .1  1    1      .91  .33    .07  .11    .03  .11    .06  .10    .97  1    .85  .99
     .5 ; .30 ; .30 ; 150 ; .0  1    1      .06  .07    .07  .07    .03  .08    .06  .07    .07  1    .12  .97
     .5 ; .30 ; .30 ; 150 ; .1  1    1      .83  .26    .07  .07    .03  .08    .06  .07    .95  1    .80  .97
     .5 ; .30 ; .30 ; 300 ; .0  1    .99    .06  .06    .06  .06    .03  .07    .05  .06    .06  1    .11  .99
     .5 ; .30 ; .30 ; 300 ; .1  1    1      .92  .32    .06  .06    .02  .07    .06  .06    .97  1    .86  .99

Here “L0” is the “GMMl0” estimator; “L1^FE” is the “GMMl1” fixed effects estimator; “L1”
and “L2” are the non-linear “GMMn1” and “GMMn2” estimators, respectively; “L” is the
“GMMo” estimator with optimal number of factors based on BIC.

Bibliography

Abadir, K. M. and J. R. Magnus (2002): “Notation in Econometrics: A Proposal

for a Standard,” Econometrics Journal, 5, 76–90.

Abrevaya, J. (2013): “The Projection Approach for Unbalanced Panel Data,” The

Econometrics Journal, 16, 161–178.

Ahn, S. C. (2015): “Comment on “IV Estimation of Panels with Factor Residuals”

by D. Robertson and V. Sarafidis,” Journal of Econometrics, 185, 542 – 544.

Ahn, S. C., Y. H. Lee, and P. Schmidt (2001): “GMM Estimation of Linear

Panel Data Models with Time-varying Individual Effects,” Journal of Econometrics,

101, 219–255.

——— (2013): “Panel Data Models with Multiple Time-varying Individual Effects,”

Journal of Econometrics, 174, 1–14.

Ahn, S. C. and P. Schmidt (1995): “Efficient Estimation of Models for Dynamic

Panel Data,” Journal of Econometrics, 68, 5–27.

——— (1997): “Efficient Estimation of Dynamic Panel Data Models: Alternative

Assumptions and Simplified Estimation,” Journal of Econometrics, 76, 309–321.

Ahn, S. C. and G. M. Thomas (2006): “Likelihood Based Inference for Dynamic

Panel Data Models,” Unpublished Manuscript.

Akashi, K. and N. Kunitomo (2012): “Some Properties of the LIML Estimator in

a Dynamic Panel Structural Equation,” Journal of Econometrics, 166, 167 – 183.

Alonso-Borrego, C. and M. Arellano (1999): “Symmetrically Normalized

Instrumental-Variable Estimation using Panel Data,” Journal of Business & Eco-

nomic Statistics, 17, 36–49.


Alvarez, J. and M. Arellano (2003): “The Time Series and Cross-Section Asymp-

totics of Dynamic Panel Data Estimators,” Econometrica, 71(4), 1121–1159.

Amemiya, T. (1985): Advanced Econometrics, Harvard University Press.

Anderson, T. W. and C. Hsiao (1982): “Formulation and Estimation of Dynamic

Models Using Panel Data,” Journal of Econometrics, 18, 47–82.

Antman, F. and D. J. McKenzie (2007a): “Earnings Mobility and Measurement

Error: A Pseudo Panel Approach,” Economic Development and Cultural Change, 56,

125–161.

——— (2007b): “Poverty Traps and Nonlinear Income Dynamics with Measurement

Error and Individual Heterogeneity,” Journal of Development Studies, 43, 1057–1083.

Arellano, M. (2003a): “Modeling Optimal Instrumental Variables for Dynamic

Panel Data Models,” Unpublished manuscript.

——— (2003b): Panel Data Econometrics, Advanced Texts in Econometrics, Oxford

University Press.

Arellano, M. and S. Bond (1991): “Some Tests of Specification for Panel Data:

Monte Carlo Evidence and an Application to Employment Equations,” Review of

Economic Studies, 58, 277–297.

Arellano, M. and O. Bover (1995): “Another Look at the Instrumental Variable

Estimation of Error-components Models,” Journal of Econometrics, 68, 29–51.

Bai, J. (2009): “Panel Data Models With Interactive Fixed Effects,” Econometrica,

77, 1229–1279.

——— (2013a): “Fixed-Effects Dynamic Panel Models, a Factor Analytical Method,”

Econometrica, 81, 285–314.

——— (2013b): “Likelihood Approach to Dynamic Panel Models with Interactive

Effects,” Working Paper.

Balestra, P. and M. Nerlove (1966): “Pooling Cross Section and Time Series

Data in the Estimation of a Dynamic Model: The Demand for Natural Gas,” Econo-

metrica, 34, pp. 585–612.

Baltagi, B. H. (2013): Econometric Analysis of Panel Data, Wiley.


Bekker, P. A. (1994): “Alternative Approximations to the Distributions of Instru-

mental Variable Estimators,” Econometrica, 62, pp. 657–681.

Bekker, P. A. and J. van der Ploeg (2005): “Instrumental Variable Estimation

Based on Grouped Data,” Statistica Neerlandica, 59, 239–267.

Binder, M., C. Hsiao, and M. H. Pesaran (2005): “Estimation and Inference in

Short Panel Vector Autoregressions with Unit Root and Cointegration,” Econometric

Theory, 21, 795–837.

Blundell, R. W. and S. Bond (1998): “Initial Conditions and Moment Restrictions

in Dynamic Panel Data Models,” Journal of Econometrics, 87, 115–143.

Bond, S., C. Nauges, and F. Windmeijer (2005): “Unit Roots: Identification

and Testing in Micro Panels,” Working paper.

Bond, S. and F. Windmeijer (2002): “Projection Estimators for Autoregressive

Panel Data Models,” The Econometrics Journal, 5, 457–479.

Bun, M. J. G. and M. A. Carree (2005): “Bias-Corrected Estimation in Dynamic

Panel Data Models,” Journal of Business & Economic Statistics, 23(2), 200–210.

Bun, M. J. G., M. A. Carree, and A. Juodis (2015): “On Maximum Likelihood

Estimation of Dynamic Panel Data Models,” UvA-Econometrics Working Paper Se-

ries.

Bun, M. J. G. and J. F. Kiviet (2006): “The Effects of Dynamic Feedbacks on LS

and MM Estimator Accuracy in Panel Data Models,” Journal of Econometrics, 132,

409–444.

Bun, M. J. G. and F. R. Kleibergen (2014): “Identification in Linear Dynamic

Panel Data Models,” UvA-Econometrics Working Paper Series.

Bun, M. J. G. and R. W. Poldermans (2015): “Weak Identification Robust

Inference in Dynamic Panel Data Models,” Mimeo.

Bun, M. J. G. and V. Sarafidis (2015): “Dynamic Panel Data Models,” in The

Oxford Handbook of Panel Data, ed. by B. H. Baltagi, Oxford University Press,

chap. 3.

Bun, M. J. G. and F. Windmeijer (2010): “The Weak Instrument Problem of

the System GMM Estimator in Dynamic Panel Data Models,” The Econometrics

Journal, 13, 95–126.


Cao, B. and Y. Sun (2011): “Asymptotic Distributions of Impulse Response Func-

tions in Short Panel Vector Autoregressions,” Journal of Econometrics, 163, 127–143.

Chamberlain, G. (1982): “Multivariate Regression Models for Panel Data,” Journal

of Econometrics, 18, 5–46.

Dargay, J. (2007): “The Effect of Prices and Income on Car Travel in the UK,”

Transportation Research Part A, 41, 949–960.

Deaton, A. (1985): “Panel Data From Time Series of Cross-sections,” Journal of

Econometrics, 30, 109–126.

Dhaene, G. and K. Jochmans (2015): “Likelihood Inference in an Autoregression

with Fixed Effects,” Econometric Theory, (forthcoming).

Doornik, J. (2009): An Object-Oriented Matrix Language Ox 6, London: Timberlake

Consultants Press.

Dovonon, P. and E. Renault (2009): “GMM Overidentification Test with First

Order Underidentification,” Working Paper.

Ericsson, J. and M. Irandoust (2004): “The Productivity-bias Hypothesis and

the PPP Theorem: New Evidence from Panel Vector Autoregressive Models,” Japan

and the World Economy, 16, 121–138.

Feldman, G. J. and R. D. Cousins (1998): “Unified Approach to the Classical

Statistical Analysis of Small Signals,” Phys. Rev. D, 57, 3873–3889.

Gonzalez, R. and H. Sala (2015): “The Frisch Elasticity in the Mercosur Coun-

tries: A Pseudo-Panel Approach,” Development Policy Review, 33, 107–131.

Grassetti, L. (2011): “A Note on Transformed Likelihood Approach in Linear Dy-

namic Panel Models,” Statistical Methods & Applications, 20, 221–240.

Hahn, J. and G. Kuersteiner (2002): “Asymptotically Unbiased Inference for a

Dynamic Panel Model with Fixed Effects When Both N and T are Large,” Econo-

metrica, 70(4), 1639–1657.

Hahn, J., G. Kuersteiner, and M. H. Cho (2004): “Asymptotic Distribution

of Misspecified Random Effects Estimator for a Dynamic Panel Model with Fixed

Effects when Both n and T are Large,” Economics Letters, 84, 117 – 125.


Han, C. and P. C. B. Phillips (2010): “GMM Estimation for Dynamic Panels with

Fixed Effects and Strong Instruments at Unity,” Econometric Theory, 26, 119–151.

——— (2013): “First Difference Maximum Likelihood and Dynamic Panel Estima-

tion,” Journal of Econometrics, 175, 35–45.

Hayakawa, K. (2007): “Consistent OLS Estimation of AR(1) Dynamic Panel Data

Models with Short Time Series,” Applied Economics Letters, 14:15, 1141–1145.

——— (2009a): “On the Effect of Mean-Nonstationarity in Dynamic Panel Data Mod-

els,” Journal of Econometrics, 153, 133–135.

——— (2009b): “A Simple Efficient Instrumental Variable Estimator for Panel AR(p)

Models when Both N and T are Large,” Econometric Theory, 25, 873–890.

——— (2012): “GMM Estimation of Short Dynamic Panel Data Model with Interactive

Fixed Effects,” Journal of the Japan Statistical Society, 42, 109–123.

——— (2015): “An Improved GMM Estimation of Panel VAR Models,” Computational

Statistics and Data Analysis, (forthcoming).

Hayakawa, K. and M. H. Pesaran (2012): “Robust Standard Errors in Trans-

formed Likelihood Estimation of Dynamic Panel Data Models,” Working Paper.

——— (2015): “Robust Standard Errors in Transformed Likelihood Estimation of

Dynamic Panel Data Models,” Journal of Econometrics, 188, 111–134.

Holtz-Eakin, D., W. K. Newey, and H. S. Rosen (1988): “Estimating Vector

Autoregressions with Panel Data,” Econometrica, 56, 1371–1395.

Hsiao, C. (2002): Analysis of Panel Data, Econometric Society Monographs, Cam-

bridge University Press, 2 ed.

Hsiao, C., M. H. Pesaran, and A. K. Tahmiscioglu (2002): “Maximum Likeli-

hood Estimation of Fixed Effects Dynamic Panel Data Models Covering Short Time

Periods,” Journal of Econometrics, 109, 107–150.

Hsiao, C. and J. Zhang (2015): “IV, GMM or Likelihood Approach to Estimate

Dynamic Panel Models when Either N or T or Both are Large,” Journal of

Econometrics, 187, 312–322.

Hsiao, C. and Q. Zhou (2015): “Statistical Inference for Panel Dynamic Simulta-

neous Equations Models,” Journal of Econometrics, (forthcoming).

Inoue, A. (2008): “Efficient Estimation and Inference in Linear Pseudo-panel Data

Models,” Journal of Econometrics, 142, 449–466.

Juodis, A. (2013): “A Note on Bias-corrected Estimation in Dynamic Panel Data

Models,” Economics Letters, 118, 435–438.

——— (2014a): “Cointegration Testing in Panel VAR Models Under Partial Identifi-

cation and Spatial Dependence,” UvA-Econometrics working paper 2014/08.

——— (2014b): “First Difference Transformation in Panel VAR models: Robustness,

Estimation and Inference,” UvA-Econometrics working paper 2013/06.

——— (2014c): “Supplement to “First Difference Transformation in Panel VAR models:

Robustness, Estimation and Inference”.” http://arturas.economists.lt/FD_online.pdf.

——— (2015): “Pseudo Panel Data Models with Cohort Interactive Effects,” Working

Paper.

Juodis, A. and V. Sarafidis (2014): “Fixed T Dynamic Panel Data Estimators

with Multi-Factor Errors,” UvA-Econometrics working paper 2014/07.

——— (2015): “Simplified Estimators for Dynamic Panels with a Multifactor Error

Structure,” Mimeo.

Ketz, P. (2014): “Testing Near or at the Boundary of the Parameter Space,” Mimeo.

Kiviet, J. F. (1995): “On Bias, Inconsistency, and Efficiency of Various Estimators

in Dynamic Panel Data Models,” Journal of Econometrics, 68, 53–78.

——— (2007): “Judging Contending Estimators by Simulation: Tournaments in Dy-

namic Panel Data Models,” in The Refinement of Econometric Estimation and Test

Procedures, ed. by G. Phillips and E. Tzavalis, Cambridge University Press, chap. 11,

282–318.

——— (2012): “Monte Carlo Simulation for Econometricians,” in Foundations and

Trends in Econometrics, ed. by W. H. Greene, NOW the essence of knowledge,

vol. 5.

Kiviet, J. F., M. Pleus, and R. Poldermans (2015): “Accuracy and Efficiency of

Various GMM Inference Techniques in Dynamic Micro Panel Data Models,” Working

Paper.

Kleibergen, F. R. (2005): “Testing Parameters in GMM without Assuming that

They are Identified,” Econometrica, 73, 1103–1123.

Koutsomanoli-Filippaki, A. and E. Mamatzakis (2009): “Performance and

Merton-type Default Risk of Listed Banks in the EU: A Panel VAR Approach,”

Journal of Banking and Finance, 33, 2050–2061.

Kripfganz, S. (2015): “Unconditional Transformed Likelihood Estimation of Time-

Space Dynamic Panel Data Models,” Working Paper.

Kruiniger, H. (2002): “On the Estimation of Panel Regression Models with Fixed

Effects,” Working paper 450, Queen Mary, University of London.

——— (2006): “Quasi ML Estimation of the Panel AR(1) Model with Arbitrary Initial

Condition,” Working paper 582, Queen Mary, University of London.

——— (2007): “An Efficient Linear GMM Estimator for the Covariance Stationary

AR(1)/Unit Root Model for Panel Data,” Econometric Theory, 23, 519–535.

——— (2008): “Maximum Likelihood Estimation and Inference Methods for the Co-

variance Stationary Panel AR(1)/Unit Root Model,” Journal of Econometrics, 144,

447–464.

——— (2013): “Quasi ML Estimation of the Panel AR(1) Model with Arbitrary Initial

Conditions,” Journal of Econometrics, 173, 175–188.

Kuersteiner, G. and I. R. Prucha (2013): “Limit Theory for Panel Data Models

with Cross Sectional Dependence and Sequential Exogeneity,” Journal of Economet-

rics, 174, 107–126.

——— (2015): “Dynamic Spatial Panel Models: Networks, Common Shocks, and

Sequential Exogeneity,” Working Paper.

Lancaster, T. (2002): “Orthogonal Parameters and Panel Data,” Review of Eco-

nomic Studies, 69, 647–666.

Lokshin, B. (2008): “A Monte Carlo Comparison of Alternative Estimators for Dy-

namic Panel Data Models,” Applied Economics Letters, 15, 15–18.

Maddala, G. S. (1971): “The Use of Variance Components Models in Pooling Cross

Section and Time Series Data,” Econometrica, 39, 341–358.

Magnus, J. R. and H. Neudecker (2007): Matrix Differential Calculus with Ap-

plications in Statistics and Econometrics, John Wiley & Sons.

McKenzie, D. J. (2001): “Estimation of AR(1) Models with Unequally Spaced

Pseudo-panels,” Econometrics Journal, 4, 89–108.

——— (2004): “Asymptotic Theory for Heterogeneous Dynamic Pseudo-panels,” Jour-

nal of Econometrics, 120, 235–262.

Michaud, P.-C. and A. van Soest (2008): “Health and Wealth of Elderly Couples:

Causality Tests Using Dynamic Panel Data Models,” Journal of Health Economics,

27, 1312–1325.

Moffitt, R. (1993): “Identification and Estimation of Dynamic Models with a Time

Series of Repeated Cross-sections,” Journal of Econometrics, 59, 99–123.

Molinari, L. G. (2008): “Determinants of Block Tridiagonal Matrices,” Linear Al-

gebra and its Applications, 429, 2221–2226.

Moral-Benito, E. (2012): “Determinants of Economic Growth: A Bayesian Panel

Data Approach,” The Review of Economics and Statistics, 94, 566–579.

Mundlak, Y. (1978): “On The Pooling of Time Series and Cross Section Data,”

Econometrica, 46, 69–85.

Mutl, J. (2009): “Panel VAR Models with Spatial Dependence,” Working Paper.

Nauges, C. and A. Thomas (2003): “Consistent Estimation of Dynamic Panel Data

Models with Time-varying Individual Effects,” Annales d’Economie et de Statistique,

70, 54–75.

Newey, W. K. and D. McFadden (1994): “Large Sample Estimation and Hy-

pothesis Testing,” in Handbook of Econometrics, ed. by J. Heckman and E. Leamer,

Elsevier, vol. 4, chap. 36, 2111–2245.

Neyman, J. and E. L. Scott (1948): “Consistent Estimation from Partially Con-

sistent Observations,” Econometrica, 16, 1–32.

Nickell, S. (1981): “Biases in Dynamic Models with Fixed Effects,” Econometrica,

49, 1417–1426.

Pesaran, M. H. (2006): “Estimation and Inference in Large Heterogeneous Panels

with a Multifactor Error Structure,” Econometrica, 74, 967–1012.

Peterman, W. B. (2014): “Reconciling Micro and Macro Estimates of the Frisch

Labor Supply Elasticity: A Sensitivity Analysis,” Working Paper.

Ramalho, J. J. S. (2005): “Feasible Bias-corrected OLS, Within-groups, and First-

differences Estimators for Typical Micro and Macro AR(1) Panel Data Models,”

Empirical Economics, 30, 735–748.

Robertson, D. and V. Sarafidis (2015): “IV Estimation of Panels with Factor

Residuals,” Journal of Econometrics, 185, 526–541.

Robertson, D., V. Sarafidis, and J. Westerlund (2014): “GMM Unit Root

Inference in Generally Trending and Cross-Correlated Dynamic Panels,” Working

Paper.

Sarafidis, V. and D. Robertson (2009): “On the Impact of Error Cross-Sectional

Dependence in Short Dynamic Panel Estimation,” Econometrics Journal, 12, 62–81.

Sarafidis, V. and T. J. Wansbeek (2012): “Cross-sectional Dependence in Panel

Data Analysis,” Econometric Reviews, 31, 483–531.

Sarafidis, V., T. Yamagata, and D. Robertson (2009): “A Test of Cross Sec-

tion Dependence for a Linear Dynamic Panel Model with Regressors,” Journal of

Econometrics, 148, 149–161.

Verbeek, M. (2008): “Pseudo-Panels and Repeated Cross-Sections,” in The Econo-

metrics of Panel Data, ed. by L. Matyas, P. Sevestre, J. Marquez, A. Spanos,

F. Adams, P. Balestra, M. Dagenais, D. Kendrick, J. Paelinck, R. Pindyck, and

W. Welfe, Springer Verlag, vol. 46 of Advanced Studies in Theoretical and Applied

Econometrics.

Verbeek, M. and T. Nijman (1992): “Can Cohort Data Be Treated As Genuine

Panel Data?” Empirical Economics, 17, 9–23.

Verbeek, M. and F. Vella (2005): “Estimating Dynamic Models from Repeated

Cross-sections,” Journal of Econometrics, 127, 83–102.

Verdier, V. (2015): “Estimation of Dynamic Panel Data Models with Cross-Sectional

Dependence: Using Cluster Dependence for Efficiency,” Journal of Applied Econo-

metrics, (forthcoming).

Westerlund, J. and M. Norkute (2014): “A Factor Analytical Method to Inter-

active Effects Dynamic Panel Models with or without Unit Root,” Working Paper

2014:12.

White, H. (2000): Asymptotic Theory for Econometricians, Economic Theory, Econo-

metrics, and Mathematical Economics, Academic Press, 2 ed.

Windmeijer, F. (2005): “A Finite Sample Correction for the Variance of Linear

Efficient Two-Step GMM Estimators,” Journal of Econometrics, 126, 25–51.

Ziliak, J. P. (1997): “Efficient Estimation with Panel Data When Instruments Are

Predetermined: An Empirical Comparison of Moment-Condition Estimators,” Jour-

nal of Business & Economic Statistics, 15, 419–431.

Summary (translated from the Dutch “Nederlandse Samenvatting”)

Panel data are repeated observations, over several time periods, of different cross-sectional units such as individuals or firms. Panel data are increasingly used in empirical macro- and (especially) micro-economic analyses, and this increase has several causes. The main advantage of using panel data is that it permits a statistical analysis of causal effects in which unobserved characteristics can be controlled for. A second advantage of panel data is that pooling time series over different cross-sectional units in a single model can lead to a relatively precise estimate of unknown parameters, even when the number of cross-sectional units (N) or the number of time periods (T) is relatively small.

A central theme in the analysis of linear dynamic panel data models is the inconsistency of the fixed-effects estimator when N increases but T remains finite. This inconsistency is known as the Nickell bias, and is an example of the incidental parameters problem. Because of these problems with the fixed-effects estimator, it is customary to estimate the parameters of dynamic panel data models with the generalized method of moments (GMM). This method leads to a consistent and asymptotically efficient estimator, but it also has limitations in finite samples: using too many or too weak instrumental variables can distort GMM estimators and tests. This has led to renewed interest in likelihood-based estimation methods that correct for the incidental parameters problem.
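As a rough illustration of the Nickell bias mentioned above, the following Python sketch simulates a panel AR(1) model with individual effects and computes the within (fixed-effects) estimator of the autoregressive parameter. The data-generating process, the parameter values and the function name `within_estimate` are illustrative assumptions and are not taken from the thesis.

```python
import random


def within_estimate(n_units, n_periods, rho, seed=0):
    """Within (fixed-effects) estimate of rho in y_it = alpha_i + rho*y_i,t-1 + eps_it.

    Assumed design (hypothetical, for illustration only): standard normal
    effects and errors, with each unit started from its stationary distribution.
    """
    rng = random.Random(seed)
    numerator = 0.0
    denominator = 0.0
    for _ in range(n_units):
        alpha = rng.gauss(0.0, 1.0)
        # Initial observation drawn from the stationary distribution of the unit.
        y = [alpha / (1.0 - rho) + rng.gauss(0.0, 1.0) / (1.0 - rho**2) ** 0.5]
        for _ in range(n_periods):
            y.append(alpha + rho * y[-1] + rng.gauss(0.0, 1.0))
        lagged, current = y[:-1], y[1:]
        lag_mean = sum(lagged) / n_periods
        cur_mean = sum(current) / n_periods
        # Demeaning within each unit removes alpha_i but correlates the
        # transformed regressor with the transformed error when T is small.
        for y_lag, y_cur in zip(lagged, current):
            numerator += (y_lag - lag_mean) * (y_cur - cur_mean)
            denominator += (y_lag - lag_mean) ** 2
    return numerator / denominator


if __name__ == "__main__":
    true_rho = 0.5
    for periods in (5, 50):
        estimate = within_estimate(n_units=2000, n_periods=periods, rho=true_rho)
        print(f"T={periods}: within estimate = {estimate:.3f} (true rho = {true_rho})")
```

With many units and only a few time periods the estimate should sit noticeably below the true value, and the gap should shrink as T grows, which is the pattern the summary refers to.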

Chapter 2 of this thesis analyses properties of the estimator based on the likelihood in first differences in a first-order panel vector autoregressive model. New results are obtained on the distributional properties of this estimator. The emphasis is on situations in which the assumptions under which the asymptotic properties were derived do not hold. In addition, a simplified approach for computing this estimator is derived. Furthermore, this chapter analyses the asymptotic bimodality of the likelihood, a topic that has received little attention in the literature. By way of illustration, an extensive Monte Carlo simulation study is carried out. Its results offer interesting insights into the finite-sample properties of the estimator, relevant to both theoretical and applied econometricians.

Chapter 3 gives a detailed analysis of the possible bimodality of the likelihood and the negativity of variance estimators in the univariate panel AR(1) model, possibly extended with exogenous explanatory variables. An important result is that the first-order condition for the ML estimator forms a third-degree polynomial in the autoregressive parameter. This suggests that the log-likelihood function in finite samples (for arbitrary T) can be either unimodal or bimodal. The chapter further shows that conventional t-tests can have a severely distorted significance level. This problem can be particularly relevant in empirical applications of these methods; in an empirical illustration it turns out to affect the estimated dynamics of state-level unemployment in the United States.

Sometimes the assumption that the effect of omitted variables can be approximated by an additive error-components structure is considered too restrictive. An unobserved characteristic such as talent or ability may, for example, have a time-varying effect on productivity and hence on income. In such cases it is customary in the econometric literature to include interactive (that is, multiplicative) effects. The advantages of such a more flexible structure are offset by two important practical limitations. First, standard inferential methods are inconsistent; second, one has to use non-linear methods, which can lead to computational complications and whose asymptotic properties depend strongly on specific assumptions and nuisance parameters. The two remaining chapters analyse this type of model and derive new results on the finite-sample and asymptotic properties of estimators that account for interactive effects.

Chapter 4 gives an extensive overview of estimators for dynamic panel data models with interactive effects. The aim of this chapter is to offer empirical researchers a practical guide to applying methods that account for various forms of unobserved heterogeneity. Special attention is paid to the calculation of the number of identifiable parameters, a requirement, often neglected in the literature, for deriving asymptotically accurate inference methods and consistent model selection procedures. The finite-sample properties of estimators are studied for a number of different parameter configurations in a large-scale Monte Carlo study. Attention is given to (i) the effect of the presence of weakly exogenous explanatory variables, (ii) the effect of a changing correlation between the factor loadings for the endogenous and the explanatory variables, (iii) the influence of the number of moment conditions on the bias and size of GMM estimators and tests, (iv) the effect of differences in time-series persistence in the data, and finally (v) the effect of the sample size.

Chapter 5 shifts the analysis to pseudo-panel data models. If a genuine panel is not available, a pseudo-panel can be constructed from repeated cross sections. This final chapter studies properties of estimators in linear pseudo-panel data models with a fixed number of cohorts and time-series observations and, in particular, multiplicative effects. Special attention is given to identification aspects of the proposed estimator in the case of unbalanced samples. Besides theoretical results, the chapter presents an extensive Monte Carlo simulation study. There the emphasis is on the robustness of the proposed estimator with respect to endogeneity, cohort interactive effects and weak identification. As far as is known, this is the first time in the literature that weak and global identification aspects are investigated in pseudo-panels with a fixed number of cohorts and time-series observations.
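To make the construction of a pseudo-panel from repeated cross sections concrete, the Python sketch below averages individual observations within cohort-by-period cells. Defining cohorts by ten-year birth bands, the record layout and the function name `cohort_means` are assumptions made for illustration only; the summary does not specify how cohorts are formed.

```python
from collections import defaultdict


def cohort_means(records, band_width=10):
    """Collapse repeated cross sections into cohort-by-period cell means.

    records: iterable of (birth_year, survey_year, value) tuples, where each
    survey year may contain a different set of individuals.
    Cohorts are (hypothetically) defined as birth-year bands of `band_width` years.
    Returns a dict mapping (cohort_start_year, survey_year) to the cell mean.
    """
    cells = defaultdict(lambda: [0.0, 0])  # running [sum, count] per cell
    for birth_year, survey_year, value in records:
        cohort = birth_year // band_width * band_width
        cell = cells[(cohort, survey_year)]
        cell[0] += value
        cell[1] += 1
    return {key: total / count for key, (total, count) in cells.items()}


if __name__ == "__main__":
    sample = [
        (1961, 2000, 10.0),
        (1968, 2000, 20.0),
        (1972, 2000, 30.0),
        (1963, 2001, 12.0),
    ]
    for (cohort, year), mean in sorted(cohort_means(sample).items()):
        print(f"cohort {cohort}s, survey {year}: mean = {mean}")
```

The resulting cohort means can then be tracked over survey years as if they were units of a panel. Cells may contain different numbers of individuals, or be missing altogether, which is one way the unbalanced samples mentioned above can arise.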

The Tinbergen Institute is the Institute for Economic Research, which was founded

in 1987 by the Faculties of Economics and Econometrics of the Erasmus University

Rotterdam, University of Amsterdam and VU University Amsterdam. The Institute

is named after the late Professor Jan Tinbergen, Dutch Nobel Prize laureate in eco-

nomics in 1969. The Tinbergen Institute is located in Amsterdam and Rotterdam. The

following books recently appeared in the Tinbergen Institute Research Series:

583. L.T. GATAREK, Econometric Contributions to Financial Trading, Hedging and

Risk Measurement

584. X. LI, Temporary Price Deviation, Limited Attention and Information Acquisition

in the Stock Market

585. Y. DAI, Efficiency in Corporate Takeovers

586. S.L. VAN DER STER, Approximate feasibility in real-time scheduling: Speeding

up in order to meet deadlines

587. A. SELIM, An Examination of Uncertainty from a Psychological and Economic

Viewpoint

588. B.Z. YUESHEN, Frictions in Modern Financial Markets and the Implications for

Market Quality

589. D. VAN DOLDER, Game Shows, Gambles, and Economic Behavior

590. S.P. CEYHAN, Essays on Bayesian Analysis of Time Varying Economic Patterns

591. S. RENES, Never the Single Measure

592. D.L. IN ’T VELD, Complex Systems in Financial Economics: Applications to

Interbank and Stock Markets

593. Y. YANG, Laboratory Tests of Theories of Strategic Interaction

594. M.P. WOJTOWICZ, Pricing Credits Derivatives and Credit Securitization

595. R.S. SAYAG, Communication and Learning in Decision Making

596. S.L. BLAUW, Well-to-do or doing well? Empirical studies of wellbeing and de-

velopment

597. T.A. MAKAREWICZ, Learning to Forecast: Genetic Algorithms and Experi-

ments

598. P. ROBALO, Understanding Political Behavior: Essays in Experimental Political

Economy

599. R. ZOUTENBIER, Work Motivation and Incentives in the Public Sector

600. M.B.W. KOBUS, Economic Studies on Public Facility use

601. R.J.D. POTTER VAN LOON, Modeling non-standard financial decision making

602. G. MESTERS, Essays on Nonlinear Panel Time Series Models

603. S. GUBINS, Information Technologies and Travel

604. D. KOPANYI, Bounded Rationality and Learning in Market Competition

605. N. MARTYNOVA, Incentives and Regulation in Banking

606. D. KARSTANJE, Unraveling Dimensions: Commodity Futures Curves and Eq-

uity Liquidity

607. T.C.A.P. GOSENS, The Value of Recreational Areas in Urban Regions

608. L.M. MARC, The Impact of Aid on Total Government Expenditures

609. C. LI, Hitchhiking on the Road of Decision Making under Uncertainty

610. L. ROSENDAHL HUBER, Entrepreneurship, Teams and Sustainability: a Series

of Field Experiments

611. X. YANG, Essays on High Frequency Financial Econometrics

612. A.H. VAN DER WEIJDE, The Industrial Organization of Transport Markets:

Modeling pricing, Investment and Regulation in Rail and Road Networks

613. H.E. SILVA MONTALVA, Airport Pricing Policies: Airline Conduct, Price Dis-

crimination, Dynamic Congestion and Network Effects

614. C. DIETZ, Hierarchies, Communication and Restricted Cooperation in Coopera-

tive Games

615. M.A. ZOICAN, Financial System Architecture and Intermediation Quality

616. G. ZHU, Three Essays in Empirical Corporate Finance

617. M. PLEUS, Implementations of Tests on the Exogeneity of Selected Variables and

their Performance in Practice

618. B. VAN LEEUWEN, Cooperation, Networks and Emotions: Three Essays in

Behavioral Economics

619. A.G. KOPANYI-PEUKER, Endogeneity Matters: Essays on Cooperation and

Coordination

620. X. WANG, Time Varying Risk Premium and Limited Participation in Financial

Markets

621. L.A. GORNICKA, Regulating Financial Markets: Costs and Trade-offs

622. A. KAMM, Political Actors playing games: Theory and Experiments

623. S. VAN DEN HAUWE, Topics in Applied Macroeconometrics

624. F.U. BRAUNING, Interbank Lending Relationships, Financial Crises and Mon-

etary Policy

625. J.J. DE VRIES, Estimation of Alonso’s Theory of Movements for Commuting

626. M. POPŁAWSKA, Essays on Insurance and Health Economics

627. X. CAI, Essays in Labor and Product Market Search

628. L. ZHAO, Making Real Options Credible: Incomplete Markets, Dynamics, and

Model Ambiguity

629. K. BEL, Multivariate Extensions to Discrete Choice Modeling

630. Y. ZENG, Topics in Trans-boundary River sharing Problems and Economic The-

ory

631. M.G. WEBER, Behavioral Economics and the Public Sector

632. E. CZIBOR, Heterogeneity in Response to Incentives: Evidence from Field Data
