Factor Analysis of High-Dimensional Time Series
Chris Heaton
A thesis submitted in fulfilment
of the requirements for the degree of
Doctor of Philosophy
November 30, 2007
Acknowledgements
I gratefully acknowledge the significant contribution made to the work presented in this thesis by my supervisor, Professor Victor Solo.
The results on identification presented in Section 2.2 were accepted for publication before the preparation of this thesis, and appeared as Heaton, C. and Solo, V. (2004) "Identification of causal models of stationary time series", The Econometrics Journal 7, pp. 618-627. The editor and referees provided us with many useful comments. This work was also presented at the 57th European Meeting of the Econometric Society in 2002. Early drafts of some of the material in Chapter 3 were presented at the 2003 North American Winter Meeting of the Econometric Society, the 9th International Conference on Computing in Economics and Finance, and the 2006 North American Summer Meeting of the Econometric Society. The many useful comments and encouragement of participants at these sessions are gratefully acknowledged.
Abstract
This thesis presents the results of research into the use of factor models for stationary economic time series. Two basic scenarios are considered. The first is a situation where a large number of observations are available on a relatively small number of variables, and a dynamic factor model is specified. It is shown that a dynamic factor model may be derived as a representation of a VARMA model of reduced spectral rank observed subject to measurement error. In some cases the resulting factor model corresponds to a minimal state-space representation of the VARMA plus noise model. Identification is discussed and proved for a fairly general class of dynamic factor model, and a frequency domain estimation procedure is proposed which has the advantage of generalising easily to models with rich dynamic structures. The second scenario is one where both the number of variables and the number of observations jointly diverge to infinity. The principal components estimator is considered in this case, and consistency is proved under assumptions which allow for much more error cross-correlation than the previously published theorems. Ancillary results include finite sample/variables bounds linking population principal components to population factors, and consistency results for principal components in a dual limit framework under a `gap' condition on the eigenvalues. A new factor model, named the Grouped Variable Approximate Factor Model, is introduced. This factor model allows for arbitrarily strong correlation between some of the errors, provided that the variables corresponding to the strongly correlated errors may be arranged into groups. An approximate instrumental variables estimator is proposed for the model and consistency is proved.
Contents

1 Introduction
  1.1 Classical Factor Analysis
    1.1.1 Introduction
    1.1.2 Identification
    1.1.3 Estimation
    1.1.4 Errors in Variables
  1.2 Static Principal Component Analysis
    1.2.1 Introduction
    1.2.2 Identification
    1.2.3 Classical Asymptotics
    1.2.4 Random Matrix Theory
    1.2.5 Principal component regression
    1.2.6 Independent Component Analysis
    1.2.7 Canonical Correlation Analysis and Reduced Rank Regression
  1.3 Time series factor analysis
    1.3.1 Models Based on a 2-sided Filter
    1.3.2 Models Based on a 1-sided Filter
    1.3.3 Dynamic Errors in Variables
  1.4 Time Series Principal Component Analysis
    1.4.1 Models Based on a 2-sided Filter
    1.4.2 Models Based on a 1-sided Filter
  1.5 Factor Analysis and Principal Component Analysis of High-Dimensional Vectors
    1.5.1 Population Results
    1.5.2 Models Based on a 2-sided Filter
    1.5.3 Models Based on a 1-sided Filter
    1.5.4 The Choice of Factor Order
    1.5.5 Applications
    1.5.6 Using Factors for GMM Estimation
  1.6 Evaluation and Contributions
    1.6.1 Evaluation of the Literature
    1.6.2 Contributions Made in this Thesis
2 Dynamic Factor Analysis with a Finite Number of Variables
  2.1 Dynamic factor models in macroeconomics
  2.2 Identification
  2.3 Estimation
  2.4 A Comparison of the Time Domain and Frequency Domain Algorithms
  2.5 An Empirical Example
  2.6 Concluding Comments
3 Principal Components Estimation of Large-Scale Factor Models
  3.1 Theory
    3.1.1 Population Principal Components and Population Factors
    3.1.2 N and the Noise to Signal Ratio
    3.1.3 Sample Principal Components and Population Principal Components
    3.1.4 Sample Principal Components and Population Factors
  3.2 Measuring the noise-to-signal ratio
  3.3 The noise-to-signal ratio for a US macroeconomic data set
  3.4 Summary and concluding comments
4 The Grouped Variable Approximate Factor Model
  4.1 The grouped variable approximate factor model
  4.2 The approximate instrumental variables estimator
    4.2.1 Estimating Bi
    4.2.2 Estimating δ = (β′ α′)′
    4.2.3 Estimating Σf and Ψ
    4.2.4 Estimating ft
    4.2.5 Estimation with approximate factors
  4.3 Some Dual-Limit Theory
  4.4 An experimental application to US macroeconomic data
  4.5 Concluding Comments
5 Conclusions
  5.1 The motivation for the research
  5.2 The findings of the research
    5.2.1 Dynamic factor analysis
    5.2.2 Approximate factor models
    5.2.3 The grouped variable approximate factor model
  5.3 Future research
    5.3.1 Dynamic factor analysis
    5.3.2 Approximate factor models
    5.3.3 The grouped variable approximate factor model
List of Figures
2.1 Monthly industrial production growth for G7 countries
2.2 Estimated ARMA(1,1) spectra of the errors
2.3 Estimated ARMA(1,1) spectra of factor
2.4 Estimated ARMA(1,1) factor
3.1 Eigenvalues of Stock and Watson's data
3.2 (1/(N − k)) Σ_{j=k+1}^{N} λj/λk for Stock and Watson's data
4.1 Single factor estimated using approximate instrumental variables method and principal components method
4.2 Difference between factor estimated by approximate instrumental variables and factor estimated by principal components
List of Tables
2.1 Estimation by Time Domain and Frequency Domain Scoring Algorithms
2.2 SBCs and log-likelihoods (q = the number of lags of factors)
2.3 Estimates of factor loadings and error variances
2.4 Estimates of ARMA parameters of the factor and errors
3.1 Empirical and Theoretical Distributions of the Test Statistic (k=2, T=100, 5000 simulations)
3.2 Test results for Stock and Watson data
4.1 Approximate Instruments
4.2 Forecast MSEs for PC and AIV forecasts for IP
4.3 Forecast MSEs for PC and AIV forecasts for PUNEW
4.4 Group 1 variables
4.5 Group 2 variables
4.6 Group 3 variables
4.7 Group 4 variables
4.8 Group 5 variables
4.9 Group 6 variables
4.10 Group 7 variables
4.11 Group 8 variables
4.12 Group 9 variables
4.13 Group 10 variables
4.14 Group 11 variables
4.15 Group 12 variables
4.16 Group 13 variables
4.17 Group 14 variables
Chapter 1
Introduction
Recent years have seen rapid growth in the availability of economic data. Economists in the industrialised countries now have easy access to data on many hundreds of variables that provide information about the state of the economy. Coinciding with the growth in available data has been an unprecedented improvement in access to sophisticated econometric techniques. Software packages such as Eviews, Oxmetrics, JMulti, Microfit, and many others, greatly simplify the application of modern econometric methods. Modelling tasks which might otherwise require many hours of work by a specialist econometrician can now be completed in seconds by general economists with basic econometric training and a familiarity with their software environment.
Unfortunately, modern econometrics presents the time series analyst with a Hobson's choice of techniques. While a wide range of methods exist for analysing independently sampled random vectors, the majority of the commonly used techniques for analysing time series economic data are variations on a basic dynamic regression, or vector autoregression, approach. While an impressive range of problems have been investigated and resolved in this framework, its limitation is that, given the length of time series usually available in economics, it is unable to deal with more than a handful of variables in a single model. Large-scale structural macroeconometric models, while once popular, are far less commonly employed following the criticisms of Sims (1980). Panel data methods, while capable of handling large numbers of variables, are only applicable in cases where panel structure exists. Consequently, for macroeconomists who believe that the wide range of time series data that are now available contain useful information that is not spanned by any small subset of variables, contemporary econometrics does not have much to offer.
A challenge for econometricians, then, is to devise formal modelling strategies that are suited to time series data sets which are large in the sense that the number of variables is greater than that which could reasonably be analysed using traditional regression-related techniques. Of particular interest are techniques which work in cases for which the number of variables is of the same order of magnitude as, and possibly greater than, the number of observations available, since this describes many of the data sets that are of interest. The fact that economists who design macroeconomic policy pay close attention to such a wide range of published data1 implies a belief that econometric techniques which utilise large numbers of variables might reveal information that cannot be gleaned from small econometric models. At the very least, such a research program may be justified by a `suck it and see' argument: if we don't develop techniques that can deal with large time series data sets, then we will never know whether these data sets are useful in economics.

1 See, for example, the introduction to Bernanke and Boivin (2003).
One approach to modelling data sets which are too large for traditional econometric techniques is to specify a factor model2. While factor analysis is a commonly employed technique in the analysis of IID vectors, in the past less interest has been shown in estimating factor models for time series. However, there is now a rapidly growing literature of applications of factor analysis techniques to economic time series. Examples of applications include the construction of economic indicators3, business cycle analysis4, forecasting5, the analysis of monetary policy6, unemployment7, stock market returns8, interest rates9, real estate market efficiency10 and lending risk11. Importantly, there exists recent theoretical work in which factor estimators are considered in a framework in which both the number of observations and the number of variables jointly go to infinity12.

This thesis presents several original contributions to the field of time series factor analysis. These include some extensions of established approaches, some new approaches, and new theory. The remainder of this chapter provides an introduction to classical factor analysis, principal component analysis and related methods, and to the more recent work in time series factor analysis and principal components. An overview of the relevant literature is presented, and the original contributions made in the thesis are briefly described.

2 Other approaches, which are not considered in this thesis, include combining forecasts, Bayesian model averaging and empirical Bayes methods. Stock and Watson (2006) provide a good survey of alternative approaches.
3 Altissimo et al. (2006).
4 Gregory et al. (1997).
5 Stock and Watson (2002b).
6 Bernanke and Boivin (2003).
7 Heaton and Oslington (2002).
8 Ludvigson and Ng (2007).
9 Lippi and Thornton (2004).
10 Guntermann and Norrbin (1991).
11 Melvin and Schlagenhauf (1986).
12 Stock and Watson (2002a), Bai and Ng (2002), Bai (2003), Forni et al. (2000), Forni et al. (2004) and Forni et al. (2005).
1.1 Classical Factor Analysis
1.1.1 Introduction
Factor analysis has its origins in early attempts by psychologists to measure intelligence. Spearman (1904) gathered data on schoolchildren's performance in a range of tests and proposed that the positive correlation between the test scores for randomly sampled children was due to a single factor, which he referred to as general intelligence, and which he estimated from the data. Later researchers proposed multiple types of intelligence, leading to multiple factor models. While the psychological theories that inspired the development of factor analysis have been superseded, factor analysis as a general multivariate statistical technique is now commonly used in a range of empirical disciplines.
Consider observations on an N × 1 random vector xt for t = 1, .., T. The factor model assumes that xt is a linear function of a k × 1 vector of random factors ft and an N × 1 vector of random errors εt, so that

xt = Bft + εt    (1.1)

where B is an N × k matrix of non-random coefficients referred to as the factor loading matrix, and k < N. In the classical setting N is assumed to be fixed. Following the literature, it will be assumed that E(ft) = 0 and E(εt) = 0. If xt for t = 1, .., T is a set of IID observations and k > 1 then equation (1.1) is the classical multiple factor model of Thurstone (1947).
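The covariance structure implied by equation (1.1) can be checked numerically. The following sketch simulates a small factor model and compares the sample covariance of xt with BΣffB′ + Ψ; all dimensions and variable names are illustrative choices, not taken from the thesis.

```python
import numpy as np

# Simulate x_t = B f_t + e_t (equation 1.1) with k = 2 factors and
# N = 6 variables; Sigma_ff = I_k and Psi diagonal (classical assumptions).
rng = np.random.default_rng(0)
N, k, T = 6, 2, 200_000

B = rng.normal(size=(N, k))           # factor loading matrix
psi = rng.uniform(0.5, 1.5, size=N)   # diagonal error variances

f = rng.normal(size=(T, k))                    # factors, Sigma_ff = I_k
eps = rng.normal(size=(T, N)) * np.sqrt(psi)   # errors, Psi = diag(psi)
x = f @ B.T + eps                              # T x N data matrix

# The model implies Sigma_xx = B Sigma_ff B' + Psi.
sigma_implied = B @ B.T + np.diag(psi)
sigma_sample = x.T @ x / T
print(np.max(np.abs(sigma_sample - sigma_implied)))  # small for large T
```

With T large, the sample covariance matrix approaches the implied low-rank-plus-diagonal structure, which is the moment restriction exploited by all the estimators discussed below.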
1.1.2 Identification

Denote Ψ = E(εtε′t) and Σff = E(ftf′t). The covariance matrix of the observable vector xt is then

Σxx = E(xtx′t) = BΣffB′ + Ψ    (1.2)

From a consideration of the first two moments, the parameters of the factor model are unidentified in the absence of restrictions. Firstly, note that the factor structure implies that the covariance matrix of xt may be written as the sum of a rank-k matrix and a full-rank symmetric positive-definite matrix. If we define B∗ = √α B and Ψ∗ = Ψ + (1 − α)BΣffB′ where 0 < α < 1, then Σxx = BΣffB′ + Ψ = B∗ΣffB′∗ + Ψ∗, where Ψ∗ is full-rank, symmetric and positive-definite. Thus, in the general case, Ψ is unidentified. In classical factor analysis, this problem is circumvented by assuming that Ψ is diagonal. Anderson and Rubin (1956) prove that Ψ is identified under this restriction13.

Even with Ψ restricted to be diagonal, B is not uniquely determined. If we now let B∗ = BM and f∗t = M−1ft where M is any non-singular matrix, then Σxx = BΣffB′ + Ψ = B∗Σf∗f∗B′∗ + Ψ. Therefore, B and ft are identified up to a non-singular transformation only. It is often assumed that Σff = Ik. In this case, M must be an orthogonal matrix, and B is then identified up to an orthogonal transformation. In order to achieve identification up to a sign change, restrictions must be placed on B that are equivalent to choosing a particular orthogonal value for M. Anderson and Rubin (1956) and Reiersøl (1950) discuss a number of restrictions that are sufficient to achieve identification up to a sign change. Perhaps the most familiar of these, for an economist, is that the observable variables can be ordered such that there exists a k × k submatrix of B that is lower triangular, i.e. it is possible to order the elements of xt such that

        [ B11   0    ···   0   ]
        [ B21  B22   ···   0   ]
    B = [ B31  B32   ⋱     0   ]
        [  ⋮    ⋮           ⋮   ]
        [ BN1  BN2   ···  BNk  ]

13 In addition to the diagonality of Ψ, some other assumptions are made to prove the uniqueness of Ψ.
In the applied literature identification is often achieved by the analyst simply choosing a particular value of M that provides a set of factors and loadings that are regarded as being easily interpretable. A popular method is the varimax rotation14, which chooses M to maximise the variance of the factor loadings. This produces a large number of small factor loadings and a small number of large factor loadings. Another popular approach has been simply to plot factor loadings and choose an identification scheme which corresponds to a plausible interpretation15. An alternative approach to identification which has been utilised in the signal processing literature16 is to impose the restriction that the factors are statistically independent (rather than the weaker condition of uncorrelatedness that has been imposed above).

14 Kaiser (1958).
15 See, for example, Gorsuch (1983).
16 Attias (1999).
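The rotational indeterminacy described above is easy to verify numerically. In this sketch (mine, not code from the thesis), replacing B by B∗ = BM for an orthogonal M leaves the implied covariance matrix unchanged, so no second-moment information can distinguish the two parameterisations.

```python
import numpy as np

# With Sigma_ff = I_k, the implied covariance is B B' + Psi.
# An orthogonal rotation M satisfies M M' = I, so (BM)(BM)' = B B'.
rng = np.random.default_rng(1)
N, k = 5, 2
B = rng.normal(size=(N, k))
psi = np.diag(rng.uniform(0.5, 1.5, size=N))

theta = 0.7  # an arbitrary rotation angle (illustrative)
M = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
B_star = B @ M

sigma = B @ B.T + psi
sigma_star = B_star @ B_star.T + psi
print(np.allclose(sigma, sigma_star))  # the two models are observationally equivalent
```

This is why applications that only require the space spanned by the factors (forecasting, variance decomposition) are unaffected by the choice of rotation.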
6
The apparent arbitrariness of many of the identification schemes that have been employed in the applied behavioural sciences literature has earned factor analysis something of a controversial reputation. However, two points should be borne in mind. Firstly, the statistical theory of the classical factor model, including the identification issue, is sound and well understood. The charge that it has been badly applied in some cases should not be interpreted as evidence that the general approach is flawed. Secondly, there exist many interesting applications (e.g. forecasting, variance decomposition, index construction, etc.) for which only the space spanned by the factors needs to be estimated. In such applications, the results are independent of any `rotation' applied to the factors.
1.1.3 Estimation
Maximisation of the Gaussian likelihood by a (quasi-)Newton algorithm requires identifying restrictions to be imposed on the model in order to ensure a non-singular information matrix. In principle, any condition sufficient for identification could be employed, but the conditional Fletcher-Powell procedure proposed by Jöreskog (1967)17 produces a neat algorithm which is easy to code. Once the model has been estimated under a particular restriction, the parameters of any equivalent model may be computed by applying the appropriate transformation to the maximum likelihood estimates.

Rubin and Thayer (1982) propose that maximum likelihood estimation of the factor model be carried out using the EM algorithm of Dempster et al. (1977). This approach has the advantage of being extremely simple to code.

17 Jöreskog (1967) restricts B′Ψ−1B to be a diagonal matrix.
While convergence of the EM algorithm generally requires many more iterations than a (quasi-)Newton algorithm, each iteration can be computed relatively quickly, particularly when the number of parameters is large. An interesting feature of the EM algorithm in this context is that it estimates the factor model without any explicit identifying restriction being imposed on the parameters.
A practical problem which often occurs in the maximum likelihood estimation of the factor model is that the algorithm will converge to a solution for which one of the diagonal elements of Ψ is zero. These solutions are referred to in the literature as `improper' or `Heywood' solutions18. Lawley and Maxwell (1971) and Jöreskog (1967) propose an algorithm for model estimation in the presence of improper solutions, which involves respecifying the factor model once an improper solution has been detected.
Anderson and Rubin (1956) prove asymptotic Gaussianity of the maximum likelihood estimator in cases in which the parameters of the model are identified and √T(Sxx − Ω) is asymptotically Gaussian, where Sxx = (1/T) Σ_{t=1}^T xtx′t. Gill (1977) proves consistency under more general conditions, which include cases in which a parameter lies on the boundary of the coefficient space.
A number of alternatives to likelihood estimation have been proposed. Jöreskog and Goldberger (1972) propose a GLS procedure that may be implemented using a Newton-Raphson algorithm. Ihara and Kano (1986) develop a method of moments estimator for individual elements of Ψ. In a paper that is particularly relevant for Chapter 4 of this thesis, Madansky (1964) notes that the factor model implies an errors-in-variables structure. Equation (1.1) may be

18 See Lawley and Maxwell (1971) or Jöreskog (1967).
partitioned into three blocks.

    [x0t]   [B0]        [ε0t]
    [x1t] = [B1] ft  +  [ε1t]     t = 1, .., T        (1.3)
    [x2t]   [B2]        [ε2t]

where x0t is a k × 1 vector and it is assumed that the k × k matrix B0 is non-singular and that Ψ = E(εtε′t) is diagonal. Exploiting the fact that the factor loadings and factors are identified up to a non-singular transformation only, the model may be written equivalently as

    [x0t]   [Ik ]         [ε0t]
    [x1t] = [B∗1] f∗t  +  [ε1t]     t = 1, .., T        (1.4)
    [x2t]   [B∗2]         [ε2t]

where B∗j = BjB0−1 for j = 1, 2 and f∗t = B0ft. The equations

    xjt = B∗j f∗t + εjt
    f∗t = x0t − ε0t

then form an errors-in-variables model. Madansky (1964) proposed an instrumental variables (IV) estimator based on this transformation of the factor model. Hägglund (1982) extended this to a two-stage least squares (2SLS) estimator.
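The idea behind this transformation can be illustrated with a one-factor, scalar-block sketch. This is my own minimal numerical example, not the thesis's estimator: normalising B0 = 1 makes x0t a noisy proxy for f∗t, OLS of x1t on x0t is attenuated by the measurement error ε0t, and 2SLS using the third block x2t as instruments is consistent because the diagonality of Ψ makes the block errors mutually uncorrelated.

```python
import numpy as np

# One-factor errors-in-variables sketch of the Madansky (1964) / 2SLS idea.
rng = np.random.default_rng(3)
T = 100_000
f = rng.normal(size=T)                 # the (rescaled) factor f*_t
b1, b2 = 0.8, np.array([1.5, -0.6])    # illustrative loadings

x0 = f + rng.normal(size=T)                     # proxy block: f*_t + e0t
x1 = b1 * f + rng.normal(size=T)                # equation of interest
x2 = np.outer(f, b2) + rng.normal(size=(T, 2))  # instrument block

# OLS of x1 on x0 is inconsistent: plim = b1 Var(f)/(Var(f)+Var(e0)) = 0.4 here.
b_ols = (x0 @ x1) / (x0 @ x0)
# 2SLS: project x0 on the instruments, then regress x1 on the projection.
x0_hat = x2 @ np.linalg.lstsq(x2, x0, rcond=None)[0]
b_2sls = (x0_hat @ x1) / (x0_hat @ x0)
print(b_ols, b_2sls)
```

As noted below, such estimators are non-iterative, which is part of their computational appeal relative to likelihood methods.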
An important feature of the estimators of Ihara and Kano (1986), Madansky (1964) and Hägglund (1982) is that they are non-iterative and may be computed in cases in which the number of variables is larger than the number of observations. From a computational perspective they are attractive alternatives to likelihood methods, particularly in cases for which improper solutions cause a problem and in cases for which the number of variables is large. Jennrich (1986) proposes a Gauss-Newton algorithm which achieves efficiency in a single step from consistent starting values. He suggests that it be used in conjunction with the procedures of Ihara and Kano (1986) and Hägglund (1982) to produce an efficient two-step estimator.
1.1.4 Errors in Variables
Consider the linear model

yt = β′x∗t + εt

where yt is a scalar and x∗t is an N × 1 vector. Suppose that x∗t is observed subject to measurement error, so that

xt = x∗t + ηt

is the observable variable, where ηt is the random measurement error. A simple count of the moment conditions and parameters reveals that this model is unidentified when all random variables are assumed to be Gaussian. Two main approaches have been proposed to resolve this problem. One approach is to use instrumental variables. Early discussions of this approach include Geary (1942) and Reiersøl (1945). Carter and Fuller (1980) derive maximum likelihood estimators using instruments. The other approach to estimation is to assume non-Gaussianity and to use higher-order moments to resolve the identification problem. Reiersøl (1941) proposed estimators constructed from third-order moments. Geary (1942) discussed estimation using cumulants of any order higher than two. Pal (1980) used third moments to construct an estimator. Cragg (1997) and Dagenais and Dagenais (1997) derive instrumental variables estimators using third and fourth moments. Erickson and Whited (2002) construct a GMM estimator using higher-order moments.
1.2 Static Principal Component Analysis
1.2.1 Introduction
The principal components method may be viewed as a linear transformation of multivariate data to a new orthogonal coordinate system such that the direction of the greatest variation of the data is given by the first axis, the direction of the second greatest variation of the data is given by the second axis, and so on. The method is due originally to Pearson (1901) and was developed by Hotelling (1933).
Consider observations on an N × 1 random vector xt for t = 1, .., T. Let R1 be the set of N × 1 vectors that have been normalised so that r′1r1 = 1, ∀r1 ∈ R1. Now consider the problem of choosing the value of r1 which maximises the sample variance of r′1xt. Denote

r∗1 = argmax_{r1∈R1} (1/T) Σ_{t=1}^T r′1xtx′tr1

Standard calculus arguments establish that r∗1 = q1, where q1 is the normalised eigenvector corresponding to the largest eigenvalue of the sample covariance matrix Sxx = (1/T) Σ_{t=1}^T xtx′t. The constructed variable s1t = λ1^{−1/2} r′∗1 xt = λ1^{−1/2} q′1xt, where λ1 is the largest eigenvalue of the sample covariance matrix, is referred to as the first sample principal component of xt. Now define R2 to be the set of N × 1 vectors for which r′2r2 = 1 and r′2r∗1 = 0, ∀r2 ∈ R2. Consider the problem of choosing a value for r2 so that the sample variance of r′2xt is maximised. Denoting

r∗2 = argmax_{r2∈R2} (1/T) Σ_{t=1}^T r′2xtx′tr2

it is easy to show that r∗2 = q2, where q2 is the normalised eigenvector corresponding to the second largest eigenvalue, λ2. The constructed variable s2t = λ2^{−1/2} r′∗2 xt = λ2^{−1/2} q′2xt is referred to as the second sample principal component of xt. In a similar fashion a complete set of N orthonormal sample principal components may be defined.
An alternative, but equivalent, way to define the first k sample principal components of T observations on an N × 1 vector xt is to define the matrix X = (x1, · · · , xT)′ and to choose values for the T × k matrix Sk and the N × k matrix B to minimise

(1/T) ‖X − SkB′‖²_F

such that (1/T) S′kSk = Ik. As is well known from regression analysis, the optimal value of B is given by the OLS estimator B̂ = (1/T) X′Sk. Substituting this into the objective, the remaining problem is to choose a value for Sk to maximise

(1/T) tr(S′kXX′Sk)

such that (1/T) S′kSk = Ik. Therefore, the optimal value of Sk is Ŝk, where Ŝk is the matrix containing the eigenvectors corresponding to the first k eigenvalues of (1/T) XX′.

It may be noted that the eigenvalues, eigenvectors and principal components from the two definitions above are given simply by the singular value decomposition

(1/√T) X = SΛ^{1/2} Q′

where S is a T × N matrix for which the jth column is the jth principal component.
In applied work in many disciplines it is often found that a large proportion of the variance of xt is accounted for by the first few principal components. Since working with the first few principal components might entail a significant reduction in dimension, in some applications analysts may prefer this to working with the original data.
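The equivalence of the eigendecomposition and SVD definitions above is easy to confirm numerically. The following sketch (illustrative dimensions, my own variable names) computes the first sample principal component both ways:

```python
import numpy as np

# Sample PCs two ways: eigenvectors of Sxx = X'X/T, and the SVD of X/sqrt(T).
rng = np.random.default_rng(4)
T, N = 500, 4
X = rng.normal(size=(T, N)) @ rng.normal(size=(N, N))  # correlated data

Sxx = X.T @ X / T
eigvals, Q = np.linalg.eigh(Sxx)           # eigh returns ascending order
eigvals, Q = eigvals[::-1], Q[:, ::-1]     # sort descending

# First sample principal component: s_1t = lambda_1^{-1/2} q_1' x_t
s1 = X @ Q[:, 0] / np.sqrt(eigvals[0])

# SVD route: X/sqrt(T) = S Lambda^{1/2} Q', so squared singular values are
# the eigenvalues and sqrt(T) times the left singular vectors are the PCs.
U, sv, Vt = np.linalg.svd(X / np.sqrt(T), full_matrices=False)
print(np.allclose(sv**2, eigvals))
print(np.allclose(np.abs(U[:, 0]) * np.sqrt(T), np.abs(s1)))
```

The absolute values in the final comparison reflect the sign indeterminacy of eigenvectors, the one-dimensional analogue of the rotation problem discussed below.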
1.2.2 Identification

While often viewed simply as a technique for reducing sample dimension, it is possible to interpret sample principal components as estimates of analogous population quantities. Let Ω = E(Sxx), and let λj and qj be the jth eigenvalue of Ω and its corresponding normalised eigenvector. Consider the problem of choosing non-random orthonormal vectors rj for j = 1, .., N to maximise the population variance of the constructed variables r′jxt for j = 1, .., N. The same calculus arguments that were used above show that this problem is solved by choosing rj so that the constructed variables are equal to the population principal components, defined as sjt = λj^{−1/2} q′jxt for j = 1, .., N.

It should be noted that, as defined above, population principal components are identified up to an orthogonal transformation only. That is to say, the optimisation problem described above, which is solved by the population principal components st = (s1t ... skt)′, is also solved by Mst, where M is any k × k orthogonal matrix. Consequently, the interpretation of principal components as representing some underlying economic structure is problematic.
1.2.3 Classical Asymptotics
For the principal components model, if there are no repeated eigenvalues then λj and qj are continuous functions of Ω19. Therefore, under conditions sufficient for √T(Sxx − Ω) to be asymptotically Gaussian as T → ∞, √T(ŝt − Mst), where ŝt denotes the vector of sample principal components, is asymptotically Gaussian as T → ∞ for some orthogonal M, where N is a fixed constant. Anderson (1963) derives the asymptotic distribution of the eigenvalues and eigenvectors for the more general case which allows for arbitrary multiplicity of the eigenvalues in a setting where N is fixed and T → ∞.

19 See Magnus and Neudecker (1991).
1.2.4 Random Matrix Theory
In applications for which the number of variables is of the same order of mag-
nitude as the number of observations, the `xed-N ' asymptotics of Subsection
1.2.3 may be inappropriate. A signicant body of work, known as Random
Matrix Theory, exists which examines the distribution of the eigenvalues of
sequences of covariance matrices of T observations on N variables as T and
N jointly approach innity at the same rate. Random Matrix Theory has its
origins in theoretical physics. In quantum mechanics the discrete energy levels
of atomic nuclei may be found by computing the eigenvalues of a Schrödinger
operator. For light atoms, solution procedures are well-known, but for heavy
atoms with large numbers of energy levels, the required analysis becomes un-
feasibly complicated. Physicists often circumvent this problem by replacing
the Schrödinger operator with a Hermitian random matrix and conducting
analyses of energy levels by considering the distribution of eigenvalues. The
most famous result in this eld is Wigner's semicircle law20. Wigner con-
sidered a N × N Hermitian matrix with IID real random diagonal elements,
and o-diagonal elements that are IID complex random variables with a com-
mon variance σ2, and derived the asymptotic distribution of the eigenvalues.
Marčenko and Pastur (1967) developed a similar theory for the eigenvalues of the covariance matrix of serially and cross-sectionally uncorrelated Gaussian random vectors. Assume that xt, t = 1, ..., T is an IIDN(0, σ²IN) sequence of N × 1 random vectors, and denote the sample covariance matrix Sxx = (1/T) ∑_{t=1}^{T} xtx′t. Let
λj be the jth eigenvalue of Sxx. The spectral distribution of Sxx is defined as

F^Sxx(m) = #{j : λj ≤ m} / N

where # denotes the number of elements in the set indicated. That is, the spectral distribution at the point m gives the proportion of sample eigenvalues that are not larger than m. Marčenko and Pastur (1967) showed that as N/T → c < ∞, F^Sxx(m) → F(m), where F(m) has density

f(m) = √((a+ − m)(m − a−)) / (2πmcσ²) for a− ≤ m ≤ a+, and 0 elsewhere,

with a− = σ²(1 − √c)² and a+ = σ²(1 + √c)². Consequently, the sample eigenvalues are more spread out than the population eigenvalues. A considerable amount
of theoretical work has followed. A complete review of this literature is not
necessary here, but it is noted that an implication of much of this work is that
sample eigenvalues may not be consistent estimators of population eigenvalues
in a setting in which N and T grow jointly. For example, Geman (1980) considers the covariance matrix formed from a T × N matrix of IIDN(0, 1) random variables, in a setting in which (T, N) → (∞,∞) and N/T → c ∈ (0, 1], and shows that

λ1 →a.s. (1 + √c)²
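Both the Marčenko-Pastur spread and Geman's limit for the largest eigenvalue are easy to reproduce numerically (an illustration added here, not from the thesis; the sample sizes are invented):

```python
import numpy as np

# Illustration: sample eigenvalues of a pure-noise covariance matrix spread
# over the Marcenko-Pastur support [(1-sqrt(c))^2, (1+sqrt(c))^2] even though
# every population eigenvalue equals 1, and the largest sample eigenvalue
# approaches (1+sqrt(c))^2 as in Geman (1980).
rng = np.random.default_rng(1)
T, N = 2000, 400                      # c = N/T = 0.2
c = N / T

X = rng.standard_normal((T, N))       # IID N(0,1) data, population covariance I_N
Sxx = X.T @ X / T
eigs = np.sort(np.linalg.eigvalsh(Sxx))[::-1]

lower, upper = (1 - np.sqrt(c))**2, (1 + np.sqrt(c))**2
print(eigs[0], upper)                 # largest sample eigenvalue vs (1+sqrt(c))^2
```

Even at this moderate sample size, the largest sample eigenvalue is roughly twice the (unit) population eigenvalue.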
Johnstone (2001) derives the asymptotic distribution of the largest sample eigenvalue. He shows that

(λ1 − µTN)/σTN →d W1 ∼ F1

where µTN = (√(T − 1) + √N)², σTN = (√(T − 1) + √N)(1/√(T − 1) + 1/√N)^{1/3}, and the distribution function is given by

F1(s) = exp(−(1/2) ∫_s^∞ [q(x) + (x − s)q²(x)] dx), s ∈ ℝ

where q solves the Painlevé II differential equation

q″(x) = xq(x) + 2q³(x), q(x) ∼ Ai(x) as x → ∞

where Ai(x) denotes the Airy function. Johnstone (2001) also proposes a
`spiked' model in which a finite number of the population eigenvalues have relatively large values. Baik and Silverstein (2006) derive almost sure limits for the sample eigenvalues in a spiked model in which all but a finite number of the population eigenvalues are equal to one. They find that if all of the population eigenvalues lie in the interval [1 − √c, 1 + √c], then the Marčenko and Pastur (1967) result holds, despite the presence of the large population eigenvalues. If some of the population eigenvalues lie outside [1 − √c, 1 + √c], then the same number of sample eigenvalues will lie outside the support of the Marčenko-Pastur density ([(1 − √c)², (1 + √c)²]), and the rest will conform to the Marčenko-Pastur limit.
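A sketch of the spiked-model phenomenon (an illustration added here, not the thesis's own computation; the spike size and sample sizes are invented): a single population eigenvalue ℓ well outside the bulk produces one sample eigenvalue near ℓ(1 + c/(ℓ − 1)), outside the Marčenko-Pastur support, while the remaining sample eigenvalues stay inside it.

```python
import numpy as np

# Illustration of the Baik-Silverstein spiked-model result: one spiked
# population eigenvalue ell produces one sample eigenvalue near
# ell*(1 + c/(ell-1)); the rest stay inside the Marcenko-Pastur support.
rng = np.random.default_rng(2)
T, N = 2000, 400
c = N / T
ell = 5.0                                   # the single spiked population eigenvalue

# Population covariance: identity with one spiked direction.
Z = rng.standard_normal((T, N))
Z[:, 0] *= np.sqrt(ell)
eigs = np.sort(np.linalg.eigvalsh(Z.T @ Z / T))[::-1]

mp_edge = (1 + np.sqrt(c))**2               # upper edge of the MP support
spike_limit = ell * (1 + c / (ell - 1))     # almost sure limit of the top eigenvalue
print(eigs[0], spike_limit, eigs[1], mp_edge)
```

Note that the top sample eigenvalue is biased away from its population value of 5, in line with the inconsistency results discussed below.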
Ledoit and Wolf (2002) use results from the Random Matrix Theory literature to consider the behaviour of test statistics for hypotheses about covariance matrices. They find that an existing test for sphericity is robust to high dimensionality, and develop a modified statistic for testing that the covariance is equal to a specified matrix. There also exists some work in which Random Matrix Theory is used to consider the behaviour of sample covariance matrices of variables generated by factor models in a setting in which (T,N) → (∞,∞) jointly. Kapetanios (2005) considers a static factor model in which the eigenvalues of B′B grow at a rate of N and the eigenvalues of the error covariance are bounded. If kmax > k then λi − λkmax+1 will diverge as N → ∞ for i = 1, ..., k and remain bounded for i = k + 1, ..., kmax. A test of the null hypothesis H0 : k = k0 against H1 : k > k0 may then be based on the test statistic λk0+1 − λkmax+1. Kapetanios (2005) proposes a subsampling technique to approximate the distribution of this test statistic. He proves that this procedure consistently estimates the distribution of the test statistic, and that a sequence of tests using this approach consistently estimates the true factor order in a setting in which N/T → c < ∞. Onatski (2007) proposes an alternative test statistic and derives its distribution. He assumes that the errors are Gaussian and temporally independent, that the eigenvalues of the error covariance are bounded, that the eigenvalues of B′B grow at a rate faster than N^(2/3), and that N/T remains in a compact subset of (0,∞) as (T, N) → (∞,∞).
Onatski (2006a) considers a factor model in which the eigenvalues of B′B are bounded above and the elements of the error vector εt are IIDN(0, σ²). Since the proportion of the total variance that is accounted for by the factors declines as N grows, he refers to the factors as being relatively weak. Using arguments from Random Matrix Theory, he shows that for sequences for which (T,N) → (∞,∞) and N/T → c < ∞, the principal components are inconsistent estimators of the factors but are asymptotically Gaussian. Onatski (2006b) considers a static factor model with errors that are either IID across time or IID across the cross-section (but not both). He assumes that the eigenvalues of B′B are growing, but does not require the growth to be as fast as N. The eigenvalues of the error covariance are assumed to be bounded. He uses ideas from Random Matrix Theory to derive an estimate of an upper bound for the eigenvalues of the error covariance matrix. The number of factors is then estimated by counting the number of eigenvalues of the observable covariance matrix that are above this bound. He proves consistency of this estimator in a setting in which N → ∞ and T → ∞ simultaneously at the same rate.
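A stylized version of this eigenvalue-counting idea can be sketched as follows (an illustration of the general idea added here, not Onatski's actual estimator; the design, noise bound and slack factor are invented): with strong factors, exactly k sample eigenvalues escape far above the largest value that pure-noise eigenvalues can attain, σ²(1 + √c)².

```python
import numpy as np

# Stylized eigenvalue-counting estimate of the number of factors: with strong
# factors, exactly k sample eigenvalues sit far above the pure-noise edge
# sigma^2*(1+sqrt(c))^2, so counting exceedances recovers k.
rng = np.random.default_rng(3)
T, N, k = 500, 100, 3
c = N / T

B = rng.standard_normal((N, k))            # loadings: eigenvalues of B'B grow like N
f = rng.standard_normal((T, k))            # factors
eps = rng.standard_normal((T, N))          # IID N(0,1) errors, sigma^2 = 1
x = f @ B.T + eps

eigs = np.sort(np.linalg.eigvalsh(x.T @ x / T))[::-1]
noise_edge = (1 + np.sqrt(c))**2           # sigma^2*(1+sqrt(c))^2 with sigma^2 = 1
k_hat = int(np.sum(eigs > 1.5 * noise_edge))  # slack factor for finite-sample wobble
print(k_hat)
```

With loadings of this strength the factor eigenvalues are an order of magnitude above the noise edge, so the count is unambiguous.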
From the perspective of economic analysis, a weakness of Random Matrix
Theory, in its current form, is that it does not apply to serially dependent data.
An extension to a correlated time series setting would be a major advance
that would lead to the development of many new techniques in econometrics.
For the time being, however, the applicability of Random Matrix Theory to economic problems is somewhat limited.
1.2.5 Principal component regression
One possible application of principal components techniques which is at least superficially attractive is the reduction of dimension in regression analysis. This has been proposed both for reducing a large set of regressors to a size manageable by standard regression techniques (see, for example, Pidot (1969)), and for relatively small reductions in dimension to eliminate multicollinearity problems (see, for example, Mittelhammer and Baritelle (1977), Cheng and Iglarsh (1976), and McCallum (1970)). It is often argued that principal components corresponding to small eigenvalues may be omitted from the analysis since they account for only a small proportion of the total variation of the regressors (Mittelhammer and Baritelle (1977) and Pidot (1969) both present this argument). However, this is not a sound argument. The principal components of a set of predictor variables xt are the combinations of the elements of xt which have the maximum possible variance. What matters for regression analysis, however, is not the variance of the right-hand-side variables, but their correlation with the dependent variable yt. It is easy to construct examples in which all of the correlation between xt and yt is accounted for by the principal component of xt corresponding to the smallest eigenvalue. Therefore, in the absence of an argument which explains why the correlation of xt with yt should be due to the components which also explain most of the variance of xt, a regression technique based on excluding principal components that correspond to relatively small eigenvalues is suspect. In order to introduce ideas which will be developed in much greater depth and generality in Chapters 3 and 4, two such arguments will now be introduced.

Firstly, consider the case in which the explanatory variables are determined by a vector of k unobservable factors ft and a vector of N errors εt, i.e.

xt = Bft + εt    (1.5)

where B is an N × k matrix of unknown non-random coefficients. It is assumed
that yt is correlated with xt purely due to its correlation with the factors, i.e.

yt = β′ft + ηt    (1.6)

where β is a k × 1 vector of regression coefficients and ηt is a regression error term that is uncorrelated with ft. For simplicity, it is assumed that E(εtε′t) = IN and E(ftf′t) = Ik. Let Λk be a k × k diagonal matrix containing the first k eigenvalues of Ω = E(xtx′t) in descending order, and let Qk be an N × k matrix containing the corresponding eigenvectors as columns. The eigenvectors of BB′ are then Qk and the eigenvalues are Λk − Ik. Therefore, there exists a k × k orthogonal matrix M such that B = Qk(Λk − Ik)^{1/2}M. It follows that the first k principal components of xt may be written as

sjt = √((λj − 1)/λj) fjt + λj^{−1/2} qj′εt, j = 1, .., k

Therefore, in this particular case, the first k principal components are noisy scaled measures of the first k factors. Consequently it is the first k principal components that account for the correlation between xt and yt. OLS regression of yt on the principal components produces an inconsistent estimate of β due to the measurement error introduced by using the principal components as proxies for the factors. However, standard errors-in-variables arguments may be used to construct a method of moments estimator of β for given values of λj, j = 1, .., k. Consistent estimation of the eigenvalues then permits consistent estimation of β.
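For the one-factor case (k = 1), the correction can be sketched numerically (an illustration added here, not the thesis's own estimator; the loadings, sample sizes and β are invented): the OLS slope of yt on the unit-variance first principal component converges to √((λ1 − 1)/λ1)·β, so rescaling by √(λ1/(λ1 − 1)) undoes the attenuation.

```python
import numpy as np

# Sketch of the errors-in-variables correction for principal component
# regression in the one-factor case with E(eps eps') = I_N and E(f^2) = 1.
# The OLS slope on the unit-variance first principal component is attenuated
# by sqrt((lam1-1)/lam1); rescaling recovers beta.
rng = np.random.default_rng(4)
N, T, beta = 50, 20000, 2.0

b = rng.uniform(0.5, 1.5, N)               # one column of loadings (invented)
f = rng.standard_normal(T)                 # scalar factor
x = np.outer(f, b) + rng.standard_normal((T, N))
y = beta * f + rng.standard_normal(T)

lam, Q = np.linalg.eigh(x.T @ x / T)
lam1, q1 = lam[-1], Q[:, -1]
s = (x @ q1) / np.sqrt(lam1)               # unit-variance first principal component

ols = (s @ y) / (s @ s)                    # attenuated slope
beta_hat = ols * np.sqrt(lam1 / (lam1 - 1))  # method-of-moments correction
print(abs(beta_hat))                       # close to |beta| = 2 (sign of q1 is arbitrary)
```

The absolute value is reported because the eigenvector, and hence the principal component, is identified only up to sign.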
Now consider a slightly different example. For equations (1.5) and (1.6) assume once again that E(ftf′t) = Ik. This time, however, the error variance is assumed to be E(εtε′t) = σ²IN where σ² is a scalar. The first k principal components of xt are now

sjt = √((λj − σ²)/λj) fjt + λj^{−1/2} qj′εt, j = 1, .., k

As above, it is clear that the correlation between xt and yt is accounted for by the first k principal components of xt, but that an OLS regression of yt on the principal components of xt will have an errors-in-variables bias. Note however that var(√((λj − σ²)/λj) fjt) = (λj − σ²)/λj and var(λj^{−1/2} qj′εt) = σ²/λj. Consequently, if σ²/λj is sufficiently small, the errors-in-variables bias will be negligible. In such cases, OLS regression of yt on the first k principal components of xt is a reasonable approach to take.
1.2.6 Independent Component Analysis
An extension of principal component analysis, which resolves the identification issue, is to assume that the components are statistically independent, rather than merely uncorrelated. The independent component analysis (ICA) model is

xt = Ast

where xt is an N × 1 observable vector, st is an m × 1 vector of unobservable unit-variance independent signals, and A is an N × m non-random mixing matrix. Comon (1994) shows that st is identified up to a sign matrix (that is, each element of st is identified up to a sign change) if, with the possible exception of one component, all the components are non-Gaussian, N ≥ m, and A is of full column rank. Cardoso (1998) and Hyvärinen et al. (2001) provide good overviews of ICA and the many procedures that have been proposed for estimation of the ICA model. Recently, Chen and Bickel (2006) have proposed an asymptotically efficient semi-parametric estimator.

Applications of the ICA model have included the processing of magnetoencephalographic and electroencephalographic data (Vigário et al. (1998); Flexer et al. (2005)), functional magnetic resonance imaging (McKeown et al. (1998)), and telecommunications (Ristaniemi and Joutsensalo (1999)). However, little work has been done in economics using ICA. This is most likely due to the absence of additive noise in the ICA model, which renders it unsuitable for many economic applications.
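The flavour of kurtosis-based estimation procedures can be sketched as follows (an illustration in the spirit of the FastICA family added here, not any specific published algorithm; the mixing matrix and sample size are invented): the data are whitened, and unmixing directions are found by a fixed-point iteration on a kurtosis contrast.

```python
import numpy as np

# Minimal kurtosis-based ICA sketch for x_t = A s_t: whiten the data, then
# find unmixing directions by the fixed-point step w <- E[z (w'z)^3] - 3w,
# deflating against directions already found.
rng = np.random.default_rng(5)
T, m = 20000, 2
S = rng.uniform(-np.sqrt(3), np.sqrt(3), (T, m))   # unit-variance independent signals
A = np.array([[1.0, 0.6], [0.4, 1.0]])             # invented mixing matrix
X = S @ A.T

# Whiten: z_t = D^{-1/2} E' x_t has identity covariance.
vals, vecs = np.linalg.eigh(np.cov(X.T))
Z = X @ vecs @ np.diag(vals**-0.5)

W = np.zeros((m, m))
for i in range(m):
    w = rng.standard_normal(m)
    w /= np.linalg.norm(w)
    for _ in range(200):
        # Fixed-point step for the kurtosis contrast
        w_new = (Z * (Z @ w)[:, None] ** 3).mean(axis=0) - 3 * w
        w_new -= W[:i].T @ (W[:i] @ w_new)         # deflate against earlier rows
        w_new /= np.linalg.norm(w_new)
        converged = abs(abs(w_new @ w) - 1) < 1e-10
        w = w_new
        if converged:
            break
    W[i] = w

S_hat = Z @ W.T        # recovered signals, up to sign and permutation
```

As the identification discussion above suggests, the recovered signals match the true sources only up to sign and ordering, which is why any accuracy check must search over both.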
1.2.7 Canonical Correlation Analysis and Reduced Rank
Regression
Canonical correlation analysis was introduced by Hotelling (1936). Consider two random vectors xt and yt of dimensions N × 1 and m × 1 where m ≤ N. Define the weighted sums ut = β′xt and vt = α′yt, where α and β are vectors of conformable dimension. The first canonical vectors are derived by finding the values of α and β which maximise the correlation between ut and vt subject to the arbitrary normalisations α′α = 1 and β′β = 1. The second canonical vectors are derived by similarly choosing vectors to maximise the correlation, subject to the additional restriction that the second canonical
vectors are orthogonal to the first. A complete set of m canonical vectors may be defined this way. Some calculus shows that the solution to this maximisation problem is computed from the singular value decomposition of Σyy^{−1/2} Σ′xy Σxx^{−1/2}, where Σxx = E(xtx′t), Σyy = E(yty′t) and Σxy = E(xty′t).

Consider the regression model

yt = ABxt + εt

where A is an m × r matrix of coefficients and B is an r × N matrix of coefficients. If r < m then the regression coefficient AB is of rank r < m and the model is referred to as a reduced rank regression. It may be shown (see Reinsel and Velu (1998)) that the Gaussian maximum likelihood estimators of A and B are equivalent to the weighting matrices from a canonical correlation analysis of yt and xt.
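The SVD computation above can be sketched directly (an illustration added here, not from the thesis; the dimensions and the shared component in the data-generating design are invented): y shares one component with x, so the first canonical correlation is large and the second is near zero.

```python
import numpy as np

# Sketch of canonical correlation analysis: the canonical correlations are
# the singular values of Syy^{-1/2} Sxy' Sxx^{-1/2}, computed from sample
# moment matrices.
rng = np.random.default_rng(6)
T = 5000
x = rng.standard_normal((T, 3))
y = np.column_stack([x[:, 0] + 0.3 * rng.standard_normal(T),
                     rng.standard_normal(T)])

def inv_sqrt(S):
    """Inverse symmetric square root of a positive definite matrix."""
    vals, vecs = np.linalg.eigh(S)
    return vecs @ np.diag(vals**-0.5) @ vecs.T

Sxx = x.T @ x / T
Syy = y.T @ y / T
Sxy = x.T @ y / T                      # sample analogue of E(x_t y_t')

K = inv_sqrt(Syy) @ Sxy.T @ inv_sqrt(Sxx)
r = np.linalg.svd(K, compute_uv=False) # canonical correlations, descending
print(r)
```

The left and right singular vectors of K, rotated back through the inverse square roots, give the canonical weight vectors α and β.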
1.3 Time series factor analysis
The classical factor model was explicitly derived in an IID framework and most of the problems to which it has been applied have been defined in this setting. However, it should be remembered that the asymptotic Gaussianity proof for the maximum likelihood estimator provided by Anderson and Rubin (1956) requires only identification of the parameters and that √T (Sxx − Ω) is asymptotically Gaussian; this does not necessarily rule out serial correlation of ft or εt. Similar conditions are used to prove consistency and/or asymptotic Gaussianity for the other factor model estimators discussed in Section 1.1, and for the principal components estimator discussed in Section 1.2. Furthermore, the factor model

xt = ∑_{j=0}^{q} Bj ft−j + εt

may be written as

xt = B*f*t + εt

where B* = (B0 · · · Bq) and f*t = (f′t f′t−1 · · · f′t−q)′. Therefore, a model with q lags of k factors may be rewritten as a model with (q + 1)k factors and no lags. Consequently, the traditional techniques of factor analysis and principal component analysis are also applicable in some time series settings, in particular under the commonly assumed conditions of covariance stationarity and weak dependence. Nonetheless, it is often of interest to explicitly specify and estimate the dynamic structure of a factor-driven time series process. In this thesis, the adjective `dynamic' will be used to indicate a factor model which has an explicitly written lag structure. The adjective `static' will indicate a model in which only contemporaneously-dated variables are explicitly included, some or all of which may have interesting time series structure. Importantly, the term `static' is not intended to indicate that a variable is IID.

Two distinct approaches have been taken to the specification of dynamic factor models. The earliest models were based on a two-sided filter, such that the observable variables are related to past and future values of the factors. The alternative approach, which was subsequently taken, is to specify the observable variables as functions of the current and past values of the factors only.
1.3.1 Models Based on a 2-sided Filter
Geweke (1977), Sargent and Sims (1977) and Geweke and Singleton (1981) considered the dynamic factor model

xt = ∑_{j=−∞}^{∞} Bj ft−j + εt

for t = 1, .., T. The factor and the error terms are assumed to be zero-mean, mutually independent and covariance stationary. Taking the Fourier transform of the autocovariance function yields the spectral density matrix of xt

Sx(ω) = B(ω)Sf(ω)B(ω)^H + Sε(ω)

where |ω| ≤ π, Sf(ω) is the spectral density matrix of ft, Sε(ω) is the spectral density matrix of εt, B(ω) is the Fourier transform of the sequence Bj, and B(ω)^H is the complex conjugate transpose of B(ω). Geweke (1977) assumed that the factor is scalar, unit-variance and serially uncorrelated, so that Sf(ω) = 1. He proposes dividing the periodogram ordinates into frequency bands and fitting a model to each band using maximum likelihood methods assuming complex Gaussianity. The algorithm that he uses is similar to that employed by Jöreskog (1967) for the static model. Sargent and Sims (1977) propose the same approach allowing for multiple factors. Geweke and Singleton (1981) provide an identification theorem for the multiple factor model based on zero-restrictions similar to that discussed for the static model in Subsection 1.1.2, and present an identification theorem which allows for correlated factors. They also discuss maximum likelihood estimation in the correlated factor case. Applications of this approach include business cycle modelling (Geweke and Singleton (1981); Sargent and Sims (1977)), a model of interest rates (Singleton (1980)), and an analysis of sectoral unemployment (Heaton and Oslington (2002)). A disadvantage of this approach is that the frequency bands must be specified in advance and that the spectrum must be assumed to be flat within each frequency band. The more bands that are used, the more likely this assumption is to be approximately true, but the fewer the periodogram ordinates available for estimating each band's model. A further disadvantage is that the observable variables are determined by a two-sided filter of the factor vector. Consequently, Geweke's model is not well-suited to forecasting.
1.3.2 Models Based on a 1-sided Filter
The alternative to specifying a model based on a two-sided filter is to allow the observable variables to be related to the current and past values of the factors only. Engle and Watson (1981) propose the following autoregressive one-factor model

xt = Bft + Γzt + εt

ft = αft−1 + δzt + ηt

where xt is the N × 1 observable vector, ft is the scalar factor and zt is a vector of exogenous or lagged dependent variables. They propose that the model be treated as a state space model for the purposes of estimation. For the model written above, the factor is treated as the state variable and the equations are the measurement and state equations respectively. While the above specification is for a scalar AR(1) factor with only a contemporaneous relationship between the factor and the observable variables, models with a one-sided filter of multiple higher-order autoregressive factors can be considered by appropriately stacking the lags of the multiple factors to create the state vector, and restricting the system matrices appropriately. If appropriate, autoregressive errors may be included in the state vector (see, for example, Watson and Engle (1983), Watson and Kraft (1984), and Stock and Watson (1990)).
Relatively little attention has been paid to identification of the one-sided dynamic factor model. Engle et al. (1985) briefly consider the possibility that their 2-factor model is not identified, but point out that lack of identifiability does not cause problems with the EM algorithm that they employ, and do not consider the matter further. Camba-Mendez et al. (2001) prove identification for a k-factor model of the form xt = Bft + εt under the following conditions:

1. ft = C(L)^{−1}ηt where C(L) is a diagonal k × k finite-order polynomial in the lag operator;

2. ηt and εt are mutually and serially uncorrelated and conditionally homoscedastic;

3. the covariance matrix of εt is diagonal and the covariance matrix of ηt is the identity;

4. the elements of B are such that Bii = 1 for i = 1, .., k.

It should be noted that these conditions are also sufficient for identifiability of the static model and are similar to those employed by Geweke and Singleton (1981) for the two-sided dynamic factor model.
Two approaches exist for the estimation of dynamic factor models written
in a state space form.
Likelihood Approaches
Engle and Watson (1981) propose that the likelihood of the state space form be computed using the Kalman filter and the model estimated using a scoring algorithm, with numerical differentiation employed to more rapidly compute the gradient and information matrix. While this is a neat approach, in practice the computational load can be quite high, particularly for models with large numbers of variables and for models with an extensive lag structure. A good set of starting values can be a valuable asset.

Shumway and Stoffer (1982) and Watson and Engle (1983) independently proposed the EM algorithm to estimate the above model. Each iteration of the EM algorithm requires a run of the Kalman filter plus a smoothing algorithm. However, the rest of the iteration is a least squares computation, so it is not necessary for the information matrix to be computed and inverted at each iteration. Practical experience suggests that the EM algorithm is far more robust to poor starting values than the scoring algorithm. However, convergence of the EM algorithm can be quite slow near the solution. A sensible strategy, which is often employed in applied work, is to estimate the model using the EM algorithm with an extremely coarse convergence criterion and then to use the solutions as starting values for a scoring algorithm. Using this strategy, models with over one hundred parameters can sometimes be estimated without too much difficulty, provided that the dynamic structure of the model is kept simple (see Lebow (1993) for an example of a model with 145 variables estimated using the EM algorithm). However, the EM approach is not well-suited to the estimation of models with autoregressive errors, since these are generally included in the state vector, resulting in a `noiseless' measurement equation. Furthermore, for models with multiple lags of multiple factors, and with autoregressive errors, the computational load of the scoring algorithm can be heavy, even for models of only a few variables. This can be a strong disincentive against the use of dynamic factor models, particularly when vector autoregressions can be estimated so easily. Applications of the likelihood approach include the construction of coincident and leading indicators (Stock and Watson (1990)) and analyses of wages (Engle and Watson (1981); Watson and Engle (1983)), productivity (Lebow (1993)) and aggregate demand (Watson and Kraft (1984)). Giannone et al. (2006) advocate the use of dynamic factor models for business cycle analysis.
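The prediction-error-decomposition likelihood that the scoring and EM approaches both rely on can be sketched for the scalar-factor model above (an illustration added here, not the thesis's own code; the numerical design is invented, and the exogenous variables zt are omitted):

```python
import numpy as np

# Sketch of Kalman-filter likelihood evaluation for the one-factor model
# x_t = B f_t + eps_t, f_t = alpha f_{t-1} + eta_t (z_t omitted).
rng = np.random.default_rng(7)
N, T, alpha = 4, 500, 0.8
B = rng.uniform(0.5, 1.5, N)

f = np.zeros(T)
for t in range(1, T):
    f[t] = alpha * f[t - 1] + rng.standard_normal()
x = np.outer(f, B) + rng.standard_normal((T, N))

def loglik(alpha, B, sig_eps, sig_eta, x):
    """Gaussian log-likelihood of the scalar-factor state space model."""
    N = x.shape[1]
    a, P = 0.0, sig_eta / (1 - alpha**2)      # stationary initial state
    ll = 0.0
    for xt in x:
        # Prediction error decomposition: v_t ~ N(0, F_t)
        v = xt - B * a
        F = np.outer(B, B) * P + sig_eps * np.eye(N)
        Finv = np.linalg.inv(F)
        ll += -0.5 * (N * np.log(2 * np.pi) + np.log(np.linalg.det(F)) + v @ Finv @ v)
        # Measurement update, then time update
        K = P * (B @ Finv)                     # Kalman gain
        a, P = a + K @ v, P - K @ (B * P)
        a, P = alpha * a, alpha**2 * P + sig_eta
    return ll

ll_true = loglik(0.8, B, 1.0, 1.0, x)
ll_bad = loglik(-0.8, B, 1.0, 1.0, x)
print(ll_true > ll_bad)
```

A scoring or EM routine would wrap a numerical optimiser around `loglik`; the O(N³) inversion inside the loop is one source of the computational burden noted above.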
Subspace Algorithms
Since likelihood methods may involve a heavy computational burden, alternative methods of estimation should be considered. One approach which is particularly attractive is to represent the model in forward innovations state-space form and to employ a subspace algorithm. Since subspace algorithms are not well-known in economics, they will be briefly described. Much more detail is available from Kapetanios and Marcellino (2004), Bauer (1998), or from the extensive literature in engineering in which subspace algorithms were developed.
Consider the dynamic factor model

xt = Bft + εt    (1.7)

ft = Aft−1 + ηt    (1.8)

where xt is the N × 1 observable vector and ft is the k × 1 factor. Viewing this as a state space model, it may be rewritten in forward innovations form (see Hannan and Deistler (1986)) as

xt = Bft + Cut

ft = Aft−1 + Dut−1

Define the infinite-dimensional vectors

xf_t = (x′t x′t+1 x′t+2 · · ·)′ and xp_t = (x′t−1 x′t−2 x′t−3 · · ·)′

Note that xf_t contains the current and future values of xt, and xp_t contains past values. Some elementary matrix algebra shows that it is possible to write

xf_t = B1ft + ηt

ft = A1xp_t

where ηt = ((D1ut)′ (D1ut+1)′ (D1ut+2)′ · · ·)′ and A1, B1 and D1 are functions of the system matrices A, B, C and D. Substituting the second equation into the first yields a regression equation linking the past and future values of xt

xf_t = Γxp_t + ηt

where Γ = B1A1.
In practice, the infinite-dimensional vectors xf_t and xp_t cannot be constructed, and so they are replaced with their truncated analogues xf_{s,t} = (x′t x′t+1 · · · x′t+s)′ and xp_{q,t} = (x′t−1 x′t−2 · · · x′t−q)′, where s and q are chosen values. Ordinary least squares is then used to estimate Γ. An estimate of A1 is then computed from a singular value decomposition of Γ̂ (often a weighted version of Γ̂ is used; see Larimore (1983)). The factors are then estimated as f̂t = Â1xp_{q,t}, and the parameter matrices of the original factor model are estimated by ordinary least squares with f̂t used in place of ft. Consistency (Deistler et al. (1995)) requires Ns > k and q to grow slower than T^(1/3) but faster than (ln T)^δ, where δ is a parameter that depends on the largest eigenvalue of A. However, the rate of convergence may be slow in practice. In fact, Deistler et al. (1995) only prove the existence of a sequence of non-singular uniformly bounded matrices MT such that ‖B̂ − MT B MT^{−1}‖ →a.s. 0, ‖Ĉ − MT C‖ →a.s. 0 and ‖Â − MT A‖ →a.s. 0. They also find that ((log log T / T)^{1/2} (log T)^α)^{−1} ‖Γ̂ − Γ‖ →a.s. 0. Given the sample sizes typically available in economics, rates of convergence such as this are of some concern.
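The core of the procedure can be sketched as follows (an illustration added here, without the weighting matrices or the parameter-estimation step; the design with N = 8, k = 1 and truncation lags s = q = 4 is invented): regress stacked future values on stacked past values, take the SVD of the coefficient matrix, and form a factor estimate from the leading right singular vector.

```python
import numpy as np

# Sketch of the subspace idea: OLS of stacked future on stacked past, SVD of
# the coefficient matrix, and a rank-one factor estimate from the leading
# right singular vector.
rng = np.random.default_rng(8)
N, T, alpha = 8, 3000, 0.8
B = rng.uniform(0.5, 1.5, N)

f = np.zeros(T)
for t in range(1, T):
    f[t] = alpha * f[t - 1] + rng.standard_normal()
x = np.outer(f, B) + 0.3 * rng.standard_normal((T, N))

s = q = 4
rows = list(range(q, T - s))
Xf = np.column_stack([x[t : t + s + 1].ravel() for t in rows]).T   # stacked future
Xp = np.column_stack([x[t - q : t][::-1].ravel() for t in rows]).T # stacked past

Gamma = np.linalg.lstsq(Xp, Xf, rcond=None)[0].T   # OLS of future on past
U, sv, Vt = np.linalg.svd(Gamma)
f_hat = Xp @ Vt[0]                                  # factor estimate (up to sign/scale)

corr = np.corrcoef(f_hat, f[rows])[0, 1]
print(abs(corr))
```

Because the factor estimate is built from past observations only, its correlation with the contemporaneous factor is bounded by the factor's own predictability (roughly α here), which illustrates why convergence can be slow in practice.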
It should also be noted that the dynamic factor model given by equations
(1.7) and (1.8) has quite simple dynamics, which might be considered too
simple in many applications. Lagged factors may be incorporated into equation
(1.7) quite easily by stacking the factor vector. However, it is less clear how the
accompanying restrictions on the system matrices can then be incorporated
in the estimation procedure. Similarly, it is not immediately obvious how
autoregressive dynamics for the error vector can be estimated using a subspace
approach. Consequently, while subspace methods show considerable promise
as an estimation approach for dynamic factor models, it is not yet clear how
models with rich dynamics may be handled.
Alternative Specifications for the Factor Processes
Subsequent research has proposed estimation algorithms for dynamic factor models with different factor specifications. In particular, Kim (1994) and Kim and Yoo (1995) propose an approximate maximum likelihood procedure for the estimation of a factor model with a factor which follows the Markov-switching process of Hamilton (1989). Chauvet (1998) uses a modification of their method. Kim and Nelson (1998) use Gibbs sampling to estimate a factor model with regime-switching factors. Diebold and Nerlove (1989) propose a Factor ARCH model in which the observable vector is related to a factor which follows an ARCH process, and propose an approximate likelihood estimation procedure based on the Kalman filter. Dungey et al. (2000) estimate a model in which the factor is autoregressive with GARCH disturbances. They employ the indirect estimation procedure of Gourieroux et al. (1993).
1.3.3 Dynamic Errors in Variables
The linear dynamic errors-in-variables model is usually written as

K(L)ut = 0

xt = ut + εt

where ut and εt are zero-mean, stationary N × 1 vector processes which are often assumed to be mutually uncorrelated, and K(L) is an (N − k) × N full-rank polynomial matrix. Only xt is observable. The model may be partitioned in the following way:

(K1(L) −K2(L)) (u′1t u′2t)′ = 0

(x′1t x′2t)′ = (u′1t u′2t)′ + (ε′1t ε′2t)′    (1.9)

where x1t, u1t and ε1t are k × 1 vectors, x2t, u2t and ε2t are (N − k) × 1 vectors, and K1(L) is (N − k) × k and K2(L) is (N − k) × (N − k) and of full rank. We may then write

x2t = K2(L)^{−1}K1(L)u1t + ε2t

x1t = u1t + ε1t

and it becomes clear that the model is a dynamic generalisation of the static errors-in-variables model reviewed in Subsection 1.1.4.
Identification of the parameters of K(L) is a non-trivial issue which has been the subject of some interest in the literature. Results have been established for many special cases. Deistler and Anderson (1989) consider the single-input-single-output case, the three-variable case, and cases where the number of inputs is equal to the number of outputs, and prove several results. They also consider using higher-order cumulant spectra for identification. Nowak (1992) discusses several subclasses of the dynamic errors-in-variables model which are identifiable from their second-order moments. Nowak (1993) assumes that K(L) is a rational transfer function and uses a partial fraction expansion representation to prove local identifiability. Bloch (1989) considers the case where K(L) is a two-sided filter and shows that the model may be written as a dynamic factor model. This representation may be seen by stacking the variables from the partitioning in Equation (1.9) to yield

xt = (I′k (K2(L)^{−1}K1(L))′)′ u1t + εt

He uses this representation to investigate identifiability and proposes estimation using a maximum likelihood approach in the frequency domain similar to the method for non-causal factor models proposed by Geweke (1977).
1.4 Time Series Principal Component Analysis
Principal components techniques may also be adapted to suit a dynamic framework. Like dynamic factor analysis, dynamic principal component analysis has been approached in two ways: through the use of a two-sided filter, and through the use of a one-sided filter.
1.4.1 Models Based on a 2-sided Filter
Brillinger (1975) proposes the estimation of principal components in the frequency domain, based on a two-sided filter. He defines dynamic principal components as in the static case, but with the spectral density matrix used in place of the covariance matrix. In the time domain, the relationship between the dynamic principal components and the observable variables may be written as

xt = ∑_{j=−∞}^{∞} Aj st−j

where xt is an N × 1 vector, Aj is an N × N matrix of coefficients, and st is an N × 1 vector of serially correlated variables that are mutually uncorrelated at all leads and lags. Sample estimates of the dynamic principal components are constructed by using a smoothing technique to consistently estimate the spectral density matrix at a set of frequencies, and then computing the principal components of the estimated spectral density matrix at each frequency.
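The estimation step can be sketched as follows (an illustration added here, not Brillinger's own procedure; the smoothing window and the data-generating process, one strong filtered common signal plus noise, are invented): estimate the spectral density matrix by averaging the periodogram over neighbouring Fourier frequencies, then eigendecompose it frequency by frequency.

```python
import numpy as np

# Sketch of frequency-domain principal components: smooth the periodogram
# across neighbouring Fourier frequencies to estimate the spectral density
# matrix, then take its eigendecomposition at each frequency.
rng = np.random.default_rng(9)
N, T = 4, 2048
load = rng.uniform(0.8, 1.2, N)

sig = np.zeros(T)
for t in range(1, T):
    sig[t] = 0.9 * sig[t - 1] + rng.standard_normal()
x = np.outer(sig, load) + 0.1 * rng.standard_normal((T, N))

d = np.fft.rfft(x, axis=0)                                   # DFT of each series
I = d[:, :, None] * d.conj()[:, None, :] / (2 * np.pi * T)   # periodogram matrices

h = 8                                            # half-width of the smoothing window
lead_eigs = []
for j in range(h, I.shape[0] - h):
    S_hat = I[j - h : j + h + 1].mean(axis=0)    # smoothed spectral density at w_j
    lead_eigs.append(np.linalg.eigvalsh(S_hat)[::-1])  # real, S_hat is Hermitian
lead_eigs = np.array(lead_eigs)

ratio = (lead_eigs[:, 0] / lead_eigs[:, 1]).min()
print(ratio)    # the common component dominates at every frequency
```

The eigenvectors at each frequency, transformed back to the time domain, give the two-sided filters Aj of the representation above.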
Brillinger (1975) also proposes that canonical correlation analysis of pairs of time series vectors be carried out in the frequency domain, based on a two-sided filter.
1.4.2 Models Based on a 1-sided Filter
Kariya (1993) proposes a multivariate time series variance component (MTV) model

xt = Ast

where xt is an N × 1 vector, A is an N × N matrix of coefficients for which A′A = IN, and st is an N × 1 vector of serially correlated variables that are mutually uncorrelated at all leads and lags. It should be noted that, unlike the dynamic principal component representation of Brillinger (1975), the MTV model is not general. Taniguchi et al. (2006) propose a test statistic for the hypothesis that an observable time series was generated by an MTV model.

Hosseini et al. (2003) construct a generalisation of the ICA model to a setting in which the elements of the source vector st are mutually independent but serially correlated. They propose a maximum likelihood procedure for the estimation of the unmixing matrix A^{−1}.
Box and Tiao (1977) consider a canonical analysis for a vector autoregression

yt = Φ(L)yt−1 + εt

where yt is an N × 1 vector. Consider a linear combination of the elements of yt, wt = r′yt, where r is an arbitrary N × 1 vector for which r′r = 1. The variance of wt is given by

σ²w = r′Σ^Φ_y r + r′Σε r

where Σ^Φ_y is the covariance matrix of Φ(L)yt−1 and Σε is the covariance matrix of εt. The predictability of the series may then be measured by the signal-to-noise ratio

τ = (r′Σ^Φ_y r) / (r′Σε r)

Some calculus shows that the most predictable linear combination of yt is constructed by setting r equal to the eigenvector corresponding to the largest eigenvalue of Σε^{−1}Σ^Φ_y. Box and Tiao (1977) consider the case where Φ(L) has roots close to the unit circle. They show that in this case, the canonical variables may be divided into two groups: some which follow stationary autoregressions, and some which are approaching non-stationarity. They suggest that the second of these groups could serve as useful composite indicators of the overall dynamic growth of yt.
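The canonical analysis above can be sketched numerically (an illustration added here, not from the thesis; the bivariate design, mixing one persistent and one white-noise series, is invented): fit a VAR(1) by OLS, then find the most predictable linear combination from the eigenproblem for Σε^{−1}Σ^Φ_y, computed here in a symmetrised form.

```python
import numpy as np

# Sketch of Box-Tiao canonical analysis: OLS VAR(1) fit, then the most
# predictable combination from the top eigenvector of Sig_eps^{-1} Sig_phi,
# computed via the symmetric form Sig_eps^{-1/2} Sig_phi Sig_eps^{-1/2}.
rng = np.random.default_rng(10)
T = 4000
p = np.zeros(T)
for t in range(1, T):
    p[t] = 0.95 * p[t - 1] + rng.standard_normal()     # highly predictable series
w = rng.standard_normal(T)                             # unpredictable series
y = np.column_stack([p + w, p - w])                    # observed mixtures

# OLS fit of the VAR(1) y_t = Phi y_{t-1} + eps_t
Y, Ylag = y[1:], y[:-1]
Phi = np.linalg.lstsq(Ylag, Y, rcond=None)[0].T
fitted = Ylag @ Phi.T
Sig_eps = (Y - fitted).T @ (Y - fitted) / (T - 1)
Sig_phi = fitted.T @ fitted / (T - 1)                  # covariance of Phi(L)y_{t-1}

vals, vecs = np.linalg.eigh(Sig_eps)
iroot = vecs @ np.diag(vals**-0.5) @ vecs.T            # Sig_eps^{-1/2}
tau, V = np.linalg.eigh(iroot @ Sig_phi @ iroot)       # predictability ratios
r = iroot @ V[:, -1]                                   # most predictable combination
combo = y @ r

corr = np.corrcoef(combo, p)[0, 1]
print(abs(corr))    # the combination tracks the persistent series
```

In this design the most predictable combination is proportional to the sum of the two observed series, which cancels the white-noise component and isolates the near-nonstationary one.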
Consider the dynamic reduced rank regression model

yt = A(L)B(L)xt + εt    (1.10)

In the case where A(L) and B(L) are two-sided polynomials, the analysis of the model corresponds to the dynamic canonical correlation analysis of Brillinger (1975). Velu et al. (1986) considered the case where A(L) and B(L) are one-sided polynomials. When xt = yt−1, Equation (1.10) is a reduced rank vector autoregression. Velu et al. (1986) consider two special cases of this model. If deg(A(L)) = 0 then the model corresponds to the canonical analysis of Box and Tiao (1977). If instead deg(B(L)) = 0, then the model becomes the index model of Sims (1981). Velu et al. (1986) discuss estimation and develop some asymptotic theory for these models. By far the most common use of reduced rank regression and canonical correlation analysis in economics is the analysis of cointegrated systems (see Johansen (1988) and Reinsel and Ahn (1992)).
1.5 Factor Analysis and Principal Component
Analysis of High-Dimensional Vectors
In addition to dealing with low dimensional vectors which might alternatively
be analysed using more traditional techniques, the factor analysis techniques
outlined in the previous sections may be applied to larger vectors. However, in
cases in which the number of variables is of a similar magnitude to, or possibly
even larger than the number of observations, three problems are encountered.
Firstly, the assumption that the error terms are not correlated with each other
in a factor model will generally become harder to believe as the number of
variables increases. Secondly, the computational work required to estimate
the factor model often becomes prohibitive. Thirdly, and most importantly,
the asymptotic arguments that are used to justify the factor estimators outlined in the previous sections assume that the number of variables is fixed and the number of observations goes to infinity. In a setting in which the number of
variables is of a similar order of magnitude to the number of observations, such
an approach may provide a poor approximation to the actual behaviour of the
estimators. Quah and Sargent (1992) propose that the EM algorithm might
be used in a large-N context. They argue (p.10) that "...increasing the cross-section dimension N can only help to estimate more precisely (the conditional expectations of) the (unobserved part of the) state and its cross moments". However, their argument is based on intuition and is not entirely convincing.
Given the findings in the random matrix theory literature, where sample eigen-
values do not consistently estimate population eigenvalues in a setting in which
(N, T ) → (∞,∞) jointly, insistence on proof, rather than informal arguments, would seem wise.
In recent years, there has been a great deal of interest in estimating factor
models of high-dimensional time series using principal component methods.
The computational attraction of this approach is clear, since eigenvalues and
eigenvectors can be computed even for very large matrices easily and quickly.
Furthermore, there exists a growing body of formal theory which shows that
sample principal component quantities can consistently estimate their anal-
ogous population factor quantities, under certain conditions, in a setting in
which (N, T ) → (∞,∞) jointly. Recently, there has been rapid growth in the
number of applications of these techniques, although many of these papers are
yet to be published. In this section, the literature on principal component estimation of infinite-dimensional factor models is reviewed.
1.5.1 Population Results
The earliest work on infinite-dimensional factor analysis considered the theoretical problem of using population principal components as estimators of
population factors. While this work does not produce an estimator which can
be implemented in practice, it provides a lot of insight into the relationship
between principal components and factors and establishes some of the ground-
work necessary for a consideration of the more practically relevant issue of
using sample principal components to estimate population factors. Chamber-
lain and Rothschild (1983) considered a generalisation of the arbitrage pricing
model of Ross (1976). Whereas Ross had assumed that asset returns followed a
strict factor model in which the errors are mutually uncorrelated (the term `strict' factor model was introduced by Chamberlain and Rothschild (1983)), Chamberlain and Rothschild (1983) considered an `approximate' factor model in which
the errors are allowed to be weakly correlated in the sense that the largest
eigenvalue of the error covariance matrix is bounded. They show that, in a
framework in which N −→ ∞, principal components and factor loadings are
equivalent. While they don't explicitly consider the problem of estimation,
Chamberlain and Rothschild (1983) suggest that an implication of their results is that financial analysis might be undertaken by a consideration of the principal components of the covariance matrices of returns, rather than requiring the consideration of a factor model. Bentler and Kano (1990) consider a
static single factor model
xt = Bft + εt
They assume that Ψ_ii ≤ σ² < ∞, where Ψ_ii is the ith diagonal element of Ψ = E(εtε′t), and that B′B −→ ∞ as N −→ ∞. They prove that under these conditions the correlation coefficient between the factor and the first population principal component of xt converges to 1, and the principal component loading vector converges to the factor loading vector.
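The Bentler and Kano (1990) result is easy to illustrate by simulation (the sketch below is illustrative and not from the thesis): as N grows with B′B −→ ∞ and bounded error variances, the first principal component of xt becomes almost perfectly correlated with the factor.

```python
import numpy as np

def pc1_factor_correlation(N, T=2000, seed=0):
    """Simulate a single-factor model x_t = B f_t + eps_t and return the
    absolute correlation between the factor and the first principal
    component of x_t. Loadings are bounded away from zero so that B'B
    grows linearly with N."""
    rng = np.random.default_rng(seed)
    B = 1.0 + rng.random(N)               # loadings in [1, 2)
    f = rng.standard_normal(T)
    eps = rng.standard_normal((T, N))     # mutually uncorrelated errors
    x = np.outer(f, B) + eps
    # First principal component: projection on the top eigenvector of cov(x).
    cov = np.cov(x, rowvar=False)
    w = np.linalg.eigh(cov)[1][:, -1]
    return abs(np.corrcoef(x @ w, f)[0, 1])
```

In this design the correlation typically exceeds 0.99 by N = 200, while remaining visibly below that for very small N.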
Schneeweiss and Mathes (1995) consider a k-factor static model and analyse the sum of the canonical correlation coefficients between the population factors and the population principal components. They show that this sum approaches k as σ²/d_k −→ 0, where σ² is the largest eigenvalue of Ψ and d_k is the smallest eigenvalue of B′B. They also prove similar results for the factor loadings and the principal component loadings. Under similar conditions, Schneeweiss (1997) proves that ∥BD^{−1/2} − Q_f L∥_F −→ 0 and ∥f_t − L s^f_t∥ −→ 0, where D is a diagonal k×k matrix containing the ordered eigenvalues of B′B,
Q_f is the N×k matrix containing the eigenvectors of Ω = E(xtx′t) corresponding to the first k eigenvalues, s^f_t is a vector containing the first k principal components of xt, L is a k × k sign matrix (i.e. its diagonal elements are all ±1) and ∥.∥_F denotes the Frobenius norm. Like Chamberlain and Rothschild (1983), Schneeweiss and Mathes
(1995), and Schneeweiss (1997) consider population quantities only. However,
their work provides considerable insight into the conditions under which prin-
cipal component quantities will be similar to corresponding factor quantities.
In the 75 years that principal component analysis and factor analysis have coexisted, their papers are the first to provide a detailed account of the relationships between the two concepts. An understanding of this relationship
provides the foundation upon which theories relating sample principal compo-
nents to population factors may be constructed.
1.5.2 Models Based on a 2-sided Filter
Forni et al. (2000) considered the dynamic factor model
xt = B(L)ft + εt
where ft is white noise, εt is zero mean, stationary and orthogonal to ft at
all leads and lags. Forni et al. (2000) state (p.541) that B(L) is a one-sided
square-summable lag polynomial; however, their proposed estimator (p.546) is based on a two-sided filtering. Transforming the model to the frequency
domain, they assume that the diagonal elements of the spectral density matrix
of xt are bounded, that the eigenvalues of the spectral density of the common
component B(L)ft diverge as N −→ ∞, and the eigenvalues of the spectral
density of εt are uniformly bounded. As such, the model may be considered to
be a dynamic generalisation of the approximate factor model of Chamberlain
and Rothschild (1983). Using dynamic principal component techniques to
estimate the factors and factor loadings, they prove that the sample estimate
of the common component of the model converges in probability to the true
common component as (N, T ) −→ (∞,∞). Forni et al. (2004) show that the rate of convergence is min(√N, √T). For the dynamic principal component estimator of the factors, they find a rate of convergence of √(T/N). The best rate that can be achieved is √N, but this requires that T grow at a rate at least as fast as N². They have no result for cases in which T grows slower than
N . This is not a particularly encouraging result for analysts who wish to use
dynamic principal component techniques to estimate factors, rather than just
the common component of the model, and there is a role for future research
to investigate whether faster rates of convergence may be established for the
factor estimator.
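The first step of this approach, estimating the spectral density matrix and its "dynamic eigenvalues" on a grid of frequencies, can be sketched as follows (an illustrative smoothed-periodogram implementation; the flat kernel, grid and bandwidth are arbitrary choices made here, not those of Forni et al.):

```python
import numpy as np

def dynamic_eigenvalues(x, n_freq=32, bandwidth=10):
    """Estimate the spectral density matrix of x by a smoothed periodogram
    on a grid of frequencies in [0, pi] and return its eigenvalues at each
    frequency, largest first. Divergence of the leading 'dynamic
    eigenvalues' as N grows signals a common component."""
    T, N = x.shape
    d = np.fft.fft(x - x.mean(0), axis=0)
    # Raw periodogram ordinates I(w_j) = d_j d_j^* / (2 pi T), an (T,N,N) array.
    I = np.einsum('ja,jb->jab', d, d.conj()) / (2 * np.pi * T)
    freqs = np.linspace(0, np.pi, n_freq)
    idx = np.round(freqs / (2 * np.pi) * T).astype(int)
    eig = np.empty((n_freq, N))
    for m, j in enumerate(idx):
        # Flat kernel: average 2*bandwidth+1 neighbouring ordinates.
        window = [(j + h) % T for h in range(-bandwidth, bandwidth + 1)]
        S = I[window].mean(axis=0)            # Hermitian spectral estimate
        eig[m] = np.linalg.eigvalsh(S)[::-1]
    return freqs, eig
```

For a one-factor dynamic model the first dynamic eigenvalue is of order N at every frequency while the remainder stay bounded, which is the separation the estimator exploits.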
1.5.3 Models Based on a 1-sided Filter
In parallel to Forni, Hallin, Lippi and Reichlin's work on the dynamic factor
model, interesting results have been derived in a dual limit setting for static
principal component estimates of static factor quantities. Stock and Watson
(2002a) considered a static factor model for an N × 1 vector of time series xt, and considered the problem of forecasting a scalar time series variable yt, h time periods ahead, i.e.
xt = Bft + εt
yt+h = β′ft + γ′zt + ηt
where zt is a vector of predetermined variables (which may include lags of
the dependent variable) and β and γ are vectors of regression coecients.
Stock and Watson (2002a) prove that, under certain conditions as (T,N) −→
(∞,∞), the sample principal component vector at each period in time con-
verges in probability to the population factor, up to a sign matrix, and that the OLS estimator of the regression coefficients in the forecasting equation, computed by substituting the sample principal components for the unobservable population factors, converges in probability to the population parameters, up
to a sign matrix. They also show that the sample forecasts computed using
these sample quantities converge in probability to the corresponding infeasible
forecasts computed using the unknown population quantities.
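The two-step procedure can be sketched as follows (illustrative code, not the authors' implementation: the standardisation is a simple choice made here, and the predetermined variables zt are omitted):

```python
import numpy as np

def diffusion_index_forecast(x, y, k, h=1):
    """Two-step forecast in the spirit of Stock and Watson (2002a):
    extract the first k principal components of x as factor estimates,
    then regress y_{t+h} on them by OLS. Returns the forecast of y_{T+h}
    made with data through period T."""
    T, N = x.shape
    xs = (x - x.mean(0)) / x.std(0)             # standardise each series
    # Sample principal components via the eigenvectors of X'X / (T N).
    evals, evecs = np.linalg.eigh(xs.T @ xs / (T * N))
    F = xs @ evecs[:, -k:]                      # estimated factors (T x k)
    # OLS of y_{t+h} on an intercept and F_t over the overlapping sample.
    Z = np.column_stack([np.ones(T - h), F[:-h]])
    beta, *_ = np.linalg.lstsq(Z, y[h:], rcond=None)
    return np.concatenate([[1.0], F[-1]]) @ beta
```

Because the second-stage regression absorbs the arbitrary sign and scale of the principal components, the forecast is invariant to the normalisation of the estimated factors.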
Under slightly different conditions, Bai and Ng (2002) prove that min(N, T )∥s^f_t − H_{N,T} f_t∥² = O_p(1) for each t, where s^f_t is a vector containing the first k sample principal components of xt and H_{N,T} is a sequence of non-singular matrices. Bai (2003) shows that under similar conditions, as (T,N) −→ (∞,∞), if √N/T −→ 0 then √N(s^f_t − H_{N,T} f_t) converges to a Gaussian distribution, and that if √N/T ≥ τ > 0 then T (s^f_t − H_{N,T} f_t) = O_p(1). When √T/N −→ 0 he shows that √T(λ_i^{1/2} q_i − H_{NT}^{−1} b_i) converges to a Gaussian distribution, where λ_i is the ith eigenvalue of S_XX = (1/TN )X′X, q_i is the corresponding eigenvector, and b_i is the ith row of B. If √T/N ≥ τ > 0 then N(λ_i^{1/2} q_i − H_{NT}^{−1} b_i) = O_p(1). Bai (2003) also proves asymptotic Gaussianity
for the estimator of the common component and provides a uniform bound on the factors of max_{1≤t≤T} ∥s^f_t − H_{N,T} f_t∥² = O_p[max(T^{−1/2}, √T/N)]. Denoting δ = (β′ γ′)′, Bai and Ng (2006) construct an OLS estimator δ̂ using the sample principal components in place of the unobservable population factors, and prove that as (T, N ) −→ (∞,∞), if √T/N −→ 0 then √T(δ̂ − δ) converges to a Gaussian distribution, and that if √N/T −→ 0 then (ŷ_{T+h} − y_{T+h})/√(var(ŷ_{T+h})) converges in distribution to N(0, 1), where ŷ_{T+h} is a forecast of y_{T+h} computed using the OLS estimates and the principal components estimator of the factor.
Kapetanios and Marcellino (2004) have suggested that subspace algorithms could be used to estimate factor models with large dimensions. They provide a modification to standard subspace algorithms which allows the estimator to be computed when N is large relative to T. Currently, their asymptotic theory restricts N to grow at a rate less than T^{1/3}, and so the rationale for using subspace algorithms in cases where N is of the same order of magnitude as T is not yet established. Given the computational ease with which subspace estimators may be computed for models small or large, an extension of this theory which relaxed the restriction on the growth rate of N would be particularly interesting, since it would mean that a single approach to estimation could be used irrespective of the size of the factor model, making questions of whether N is `large enough' redundant.
Bernanke et al. (2005) have proposed a factor-augmented vector autoregression (FAVAR) for which

(f′_t y′_t)′ = Φ(L)(f′_{t−1} y′_{t−1})′ + ϕ_t
where yt is now permitted to be a vector of observable variables. They assume
that a large vector of observable variables xt has factor structure, estimate the
factors using the principal components of xt, and then estimate the FAVAR
equation using the first k principal components in place of the unobservable
factors. Stock and Watson (2005) consider a similar model and provide a
detailed discussion of identication schemes for conducting a structural FAVAR
analysis.
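A minimal two-step sketch, using a VAR(1) with the unobserved factors replaced by principal components (illustrative code, not the authors' implementation):

```python
import numpy as np

def favar_var1(x, y, k):
    """Two-step FAVAR in the spirit of Bernanke et al. (2005): take the
    first k principal components of x as factor estimates, stack them
    with the observed variables y, and fit a VAR(1) by least squares.
    Returns the (k+m) x (k+m) autoregressive coefficient matrix."""
    xs = (x - x.mean(0)) / x.std(0)
    F = xs @ np.linalg.eigh(np.cov(xs, rowvar=False))[1][:, -k:]
    W = np.column_stack([F, y])          # state: factors plus observables
    # OLS of the VAR(1): W_t = Phi W_{t-1} + error.
    Phi, *_ = np.linalg.lstsq(W[:-1], W[1:], rcond=None)
    return Phi.T
```

For stationary simulated data the eigenvalues of the estimated transition matrix lie inside the unit circle, as the model requires.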
Forni et al. (2005) show how the two-sided dynamic principal components
estimator of the one-sided dynamic factor model of Forni et al. (2000) may
be extended to create a one-sided estimator of the common component and
forecasts based on a one-sided filter. For the dynamic factor model
xt = B(L)ft + εt
the linear combination of the elements of xt which is closest to the space
spanned by the factors may be found by choosing the vector α1 to maximise
var(α′1B(L)ft) such that var(α′1εt) = 1. Denote this vector α∗1. A second vector α∗2 may be defined by maximising var(α′2B(L)ft) such that var(α′2εt) = 1 and α′2α∗1 = 0. In this way, k orthogonal components α∗1, ..., α∗k may be defined which are the closest orthogonal factors to the common factor space.
Forni et al. (2005) show that these vectors are the generalised eigenvectors
corresponding to the first k generalised eigenvalues of the covariance matrix of
B(L)ft and the covariance matrix of εt. They propose that these covariances
be estimated from the inverse Fourier transforms of the dynamic principal
component estimators of the common and idiosyncratic components of the dynamic factor model. They argue that this estimator is superior to the static
principal component estimator of Stock and Watson (2002a) since it incorpo-
rates information about the dynamic structure of the model in the estimation
technique. Simulation results provided by D'Agostino and Giannone (2006)
support this claim.
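The generalised eigenvectors α∗1, ..., α∗k solve S_χ a = λ S_ε a, where S_χ and S_ε are the covariance matrices of the common and idiosyncratic components. A sketch of this step, assuming the two covariance matrices have already been estimated (the Cholesky reduction is one standard way to solve the problem with numpy alone):

```python
import numpy as np

def generalized_pc_weights(S_chi, S_eps, k):
    """Generalised eigenvectors in the spirit of Forni et al. (2005):
    vectors a maximising var(a' chi_t) subject to var(a' eps_t) = 1.
    Solved by reducing S_chi a = lambda S_eps a to an ordinary symmetric
    eigenproblem via a Cholesky factor of S_eps."""
    L = np.linalg.cholesky(S_eps)
    Linv = np.linalg.inv(L)
    M = Linv @ S_chi @ Linv.T            # symmetric transformed problem
    evals, U = np.linalg.eigh(M)
    A = Linv.T @ U[:, ::-1]              # back-transform, largest first
    return evals[::-1][:k], A[:, :k]
```

Each returned column a automatically satisfies the side condition a′S_ε a = 1, because the transformed eigenvectors are orthonormal.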
1.5.4 The Choice of Factor Order
Bai and Ng (2002) consider the estimation of the number of factors in a static
factor model. They derive modications of well-known model selection pro-
cedures, such as the Schwarz-Bayes criterion, and prove their consistency as
(T,N) −→ (∞,∞). As explained at the start of Section 1.3, a k-factor dy-
namic model with q lags of the factor may be written as a kq-factor static
model. Therefore, in a dynamic setting, the Bai and Ng (2002) procedure will
estimate kq, rather than k. Amengual and Watson (2007) devise an estimator
for k by rewriting the dynamic factor model in a static form with k factors
and then applying the Bai and Ng (2002) procedure to the transformed model.
They prove consistency for this procedure. Bai and Ng (2007a) take a different
approach to the estimation of the number of factors. They note that in the
static specication of the model, the spectral density matrix of the kq factors
will have a rank of k. They estimate k by specifying a VAR model for the
factors and estimating the rank of the covariance matrix of the VAR errors.
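For concreteness, one of the Bai and Ng (2002) criteria, IC_p2(k) = log V(k) + k((N+T)/NT) log min(N,T), where V(k) is the mean squared residual from a k-factor principal components fit, can be sketched as follows (illustrative code, not from the thesis):

```python
import numpy as np

def bai_ng_ICp2(x, kmax):
    """Estimate the number of static factors by minimising the IC_p2
    criterion of Bai and Ng (2002) over k = 1, ..., kmax."""
    T, N = x.shape
    xs = (x - x.mean(0)) / x.std(0)
    evals, evecs = np.linalg.eigh(xs.T @ xs / (T * N))
    penalty = (N + T) / (N * T) * np.log(min(N, T))
    best_k, best_ic = 0, np.inf
    for k in range(1, kmax + 1):
        F = xs @ evecs[:, -k:]                   # k principal components
        resid = xs - F @ np.linalg.lstsq(F, xs, rcond=None)[0]
        ic = np.log((resid ** 2).mean()) + k * penalty
        if ic < best_ic:
            best_k, best_ic = k, ic
    return best_k
```

On simulated data with a clear factor structure the criterion recovers the true number of factors; as the text notes, in a dynamic setting it estimates the number of static factors kq rather than k.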
As discussed in Section 1.2.4, Kapetanios (2005) and Onatski (2007) have
proposed test statistics for the hypothesis that the factor order is equal to a
predetermined number. The reader is referred back to that section for a brief
discussion of these procedures.
1.5.5 Applications
There now exist many applications of the techniques described above. Factor models have been estimated using large macroeconomic data sets from Australia (Gillitzer et al. (2005) and Gillitzer and Kearns (2007)), Austria (Schneider and Spitzer (2004)), Belgium (Nieuwenhuyze (2006)), Brazil (Ferreira et al. (2005)), Canada (Brisson et al. (2003)), the Euro area (Forni et al. (2003)), France (Bandt et al. (2007)), Germany (Schumacher (2005)), Italy (Favero et al. (2004)), the Netherlands (den Reijer (2005)), New Zealand (Giannone and Matheson (2006) and Matheson (2006)), Spain (Camacho and Sancho (2003)), the United Kingdom (Artis et al. (2005)) and the United States (Stock and Watson (2002b) and Gavin and Kliesen (2006)). The number of variables in the analyses has ranged from as few as 29 (Gillitzer et al. (2005)) to almost 1000 (e.g. Altissimo et al. (2001)). A range of practical issues have been considered. Bernanke and Boivin (2003) estimate policy reaction functions for the Federal Reserve that include factors estimated by the principal components of a large macroeconomic data set. Bernanke et al. (2005) construct a factor-augmented vector autoregression (FAVAR) model and find that the inclusion of factors from a large macro data set helps to explain the monetary policy "price puzzle" described by Sims (1992). Favero et al. (2005) use principal component techniques to estimate factors from large
data sets for the United States and the Euro area. They include the factors
as regressors in structural VAR models which they use to evaluate the effects
of monetary policy. Sala (2003) uses a large dynamic factor model to study
the transmission of monetary shocks in the Euro area. Mansour (2003) uses
a dynamic factor model to estimate a world business cycle from GDP growth
data of 113 countries. Helbling and Bayoumi (2003) use a factor model to
estimate common business cycle components for the G7 countries.
A popular application of large-scale factor techniques is to estimate the
factors from a broad collection of macroeconomic variables and to interpret
them as coincident economic indicators. Perhaps the best known of these are the Chicago Fed National Activity Index (CFNAI; http://www.chicagofed.org/economic_research_and_data/cfnai.cfm), which is the first sample principal component of 85 monthly indicators of economic activity, and EuroCOIN (http://www.cepr.org.uk/data/eurocoin/), developed by Altissimo et al. (2001) using dynamic factor techniques, which is the cyclical component extracted from the common factor obtained
from a large set of European macroeconomic data. Other examples are Gillitzer
et al. (2005) who construct a coincident indicator for Australia, and Nieuwen-
huyze (2006) who constructs a business cycle indicator for Belgium. Recently,
Altissimo et al. (2006) have proposed a new version of EuroCOIN. Cristadoro
et al. (2005) use large-scale dynamic factor techniques to construct a measure
of core inflation for the Euro area. Giannone and Matheson (2006) construct
a similar measure for New Zealand. Kapetanios (2004) uses subspace methods
to construct an estimate of core inflation for the United Kingdom.
The most stringent test to which large-scale factor analysis techniques have
been subjected is forecasting. If estimated factors contain useful information
that is not spanned by any small subset of economic variables, then it might be
expected that the inclusion of estimated factors in forecasting models would
improve forecasting performance. The last few years have seen a consider-
able number of studies which perform forecasting simulations using historical
macroeconomic data to compare the performance of large-scale factor models to
benchmark models such as scalar autoregressions and small vector autoregres-
sions. Stock and Watson (2006) and Breitung and Eickmeier (2005) provide
good surveys which cover some of this literature. Eickmeier and Ziegler (2006)
conduct a meta-analysis of 46 different studies which assess the forecasting performance of different large-scale factor analysis procedures using various
data sets.
1.5.6 Using Factors for GMM Estimation
An interesting application of large-scale factor techniques, which is currently
under development, is to use estimated factors as instruments in a Generalised
Method of Moments (GMM) regression. Consider the regression model
yt = β′xt + εt

where xt is a vector of m observable variables, β is an m×1 vector of regression coefficients, and εt is a scalar regression error term for which E(xtεt) ≠ 0.
Suppose that a vector of N instruments zt is available. Under assumptions similar to those used by Bai (2003), Bai and Ng (2007b) prove that, if xt and zt are driven by a set of k common factors, then a GMM estimator of β in which the first k principal components of zt are used as instruments is √T-consistent and asymptotically Gaussian if √T/N −→ 0. Furthermore, they show that the k-factor GMM estimator is more efficient than a GMM estimator
constructed using any subset of k of the elements of zt. Kapetanios and Mar-
cellino (2006) consider some more general relationships between the regressor,
the observable instruments and the factors, which allow for elements of zt to
be weak instruments. They prove several asymptotic results which support
the use of GMM estimation with the first k principal components of zt used
as instruments. Favero et al. (2005) and Beyer et al. (2005) are examples of
applications of this technique to macroeconomic data.
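A sketch of the idea in its simplest two-stage least squares form (illustrative code; Bai and Ng (2007b) work with a general GMM objective, and all names here are my own):

```python
import numpy as np

def pc_iv_estimator(y, x, z, k):
    """Instrumental variables estimation with the first k principal
    components of a large instrument set z used as instruments, in the
    spirit of Bai and Ng (2007b). Implemented as 2SLS: project x on the
    instrument PCs, then regress y on the fitted values."""
    T = z.shape[0]
    zs = (z - z.mean(0)) / z.std(0)
    W = zs @ np.linalg.eigh(zs.T @ zs / T)[1][:, -k:]   # instrument PCs
    xhat = W @ np.linalg.lstsq(W, x, rcond=None)[0]     # first stage
    beta, *_ = np.linalg.lstsq(xhat, y, rcond=None)     # second stage
    return beta
```

In a simulation with a common factor driving both the regressor and the instruments, and an error correlated with the regressor, this estimator is close to the true coefficient while OLS is visibly biased.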
1.6 Evaluation and Contributions
1.6.1 Evaluation of the Literature
Time Series Factor Analysis
As Giannone et al. (2006) point out, business cycle models, with variables
measured with error, imply dynamic factor structure. This, and the fact that
dynamic factor models were developed in the late 1970s and early 1980s, make
it remarkable that so little work has been done using dynamic factor models
since this time. However, to those with experience estimating dynamic factor
models, this is perhaps not much of a surprise. Simple models with a sin-
gle AR(1) factor, white noise errors, and no lagged factors are relatively easy
to estimate. The state space representation is the same as the factor model,
and the scoring algorithm usually converges quite rapidly. In cases where it doesn't, it is reasonably straightforward to write an EM algorithm, which will
usually converge to a coarse convergence criterion fairly quickly. However, the
dynamic structure of such a model is restrictive and is likely to be unsatisfac-
tory in many applications. Generalising the dynamic structure of the model
complicates the estimation. Multiple factors may be accommodated by writing
state-space models for each factor, and then stacking the state vectors and sys-
tem matrices to create a state space representation for the entire factor model.
Lags of factors may be incorporated by stacking lags of the factors into the
state vector. Autoregressive errors may be accommodated by writing state space
representations for each error and then stacking the state vectors for the errors
with the state vector for the factors. However, the state space representation
of the factor model then has a `noise-free' measurement equation, which does
not lend itself to estimation by the EM algorithm. Furthermore, the scoring algorithm can be quite slow in such cases, with the high state dimension and relatively large number of parameters greatly increasing the computation
required by the Kalman lter. Also, practical experience suggests that con-
vergence often requires a large number of iterations, and may not occur at all.
Good starting values are essential. In principle, the dynamic factor model is
an attractive alternative to the vector autoregression for empirical macroeco-
nomics. However, given the relative ease with which vector autoregressions
may be estimated, it is not surprising that dynamic factor analysis has had
such a limited impact on the applied economics literature.
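The stacking described above is mechanical. As a small illustration (not from the thesis), the joint transition matrix for several independent autoregressive factors is block-diagonal in the companion matrices of the individual factors:

```python
import numpy as np

def stack_ar_factors(ar_coeffs):
    """Build the state-space transition matrix for several independent
    AR(p) factors: each factor gets a companion-form block, and the
    blocks are placed on the diagonal of the joint transition matrix."""
    blocks = []
    for phi in ar_coeffs:            # phi = (phi_1, ..., phi_p) for one factor
        p = len(phi)
        C = np.zeros((p, p))
        C[0, :] = phi                # first row holds the AR coefficients
        if p > 1:
            C[1:, :-1] = np.eye(p - 1)   # shift the lagged states down
        blocks.append(C)
    dims = [b.shape[0] for b in blocks]
    A = np.zeros((sum(dims), sum(dims)))
    pos = 0
    for b, d in zip(blocks, dims):
        A[pos:pos + d, pos:pos + d] = b
        pos += d
    return A
```

Lagged factors and autoregressive errors enlarge the state in exactly the same way, which is why the state dimension, and with it the cost of the Kalman filter, grows so quickly.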
Most of the dynamic factor models that have been estimated in the lit-
erature have autoregressive factors. There also exist some applications with
factors which follow the Markov-switching process of Hamilton (1989). However, the obvious generalisation to factors which follow ARMA processes has
not yet been pursued. Since an infinite impulse response function may be approximated to arbitrary accuracy by a rational transfer function, this is a worthwhile generalisation. An obvious difficulty here is that the construction of a state space representation of the factor model will be further complicated by the addition of ARMA dynamics for the factors and errors, and so estimation is likely to be difficult.
Another issue that requires further investigation is identification of dynamic factor analysis. It is well known that in classical static factor analysis, the factors are identified only up to an orthogonal transformation. Therefore, linear sums of a set of factors can produce an alternative factor representation which is equally valid. In the case of dynamic factor models, this is complicated by the presence of lags. Geweke and Singleton (1981) and Camba-Mendez et al. (2001) have shown that restrictions on the factor loadings, similar to those used in the case of static factor analysis, are sufficient to identify the factors in the two-sided and one-sided factor model respectively. Therefore, identification is no more of a problem in the dynamic factor model than it is in the static model. However, the question of whether the extra structure implied by the dynamics helps to identify the parameters remains unexplored. If dynamic factor models are to be used more widely as models of economic processes, then the issue of identification warrants further attention.
Factor Analysis and Principal Component Analysis of High-Dimensional
Vectors
Given the relative computational ease with which principal components may
be computed, it is not surprising that there has been so much recent work
done with principal components estimators of high-dimensional factor models.
Considering the sheer volume of these applications to macroeconomic data,
one might hope that an empirical consensus had emerged concerning the types
of economies for which the techniques work well, the types of variables that
they can successfully forecast, the number of variables and observations re-
quired for good performance, etc. Unfortunately, no such consensus is yet obvious. Indeed, it is not yet entirely clear that large-scale factor models necessarily produce superior forecasts to standard forecasting approaches. Some
studies (e.g. Stock and Watson (2002b), Brisson et al. (2003), Schneider and Spitzer (2004) and Camacho and Sancho (2003)) find large improvements in forecasting performance from the use of factors, with mean squared forecasting errors reduced by over 40% compared to scalar autoregressions. Others (e.g. Angelini et al. (2001), Giacomini and White (2003), Eklund and Karlsson (2007), Schumacher (2005), and Banerjee and Marcellino (2006)) find little evidence of factor-based forecasts providing a significant benefit over benchmark models.
Of particular interest in this thesis is the wide range in the number of
variables used to estimate factor models in the literature. Based on a naïve
reading of the theoretical literature, one might expect that studies that es-
timate the factors from the largest number of variables available would tend
to return the best results. However, the empirical evidence does not support
this proposition. Boivin and Ng (2006) find that 40 carefully chosen vari-
ables can yield better results than 147 variables when forecasting 8 measures
of economic activity and inflation for the US. They also perform Monte Carlo simulations with different degrees of error cross-correlation and demonstrate
that increasing the number of variables in the factor model might worsen fore-
casting performance. They suggest the use of weighting schemes to improve
the performance of the principal components estimator. Inklaar et al. (2003)
consider the construction of a coincident indicator for the Euro area and find
that a factor model estimated using 38 carefully chosen macroeconomic vari-
ables produces an indicator that is at least as good as that produced by a factor model estimated using their entire database of 246 variables.
Schneider and Spitzer (2004) consider forecasting Austrian GDP using a dy-
namic factor model estimated by dynamic principal components. They find that models that include only 5 to 11 variables perform significantly better
than a model with 143 variables. den Reijer (2005) considers using a dynamic
factor model of 370 variables to forecast Dutch GDP, but finds that models of
147 and 223 carefully chosen variables perform better. These results are not
easily understood from the existing theory of factor model estimation which,
with one exception, shows consistency as (T,N) −→ (∞,∞). Onatski (2006a)
proves inconsistency in a case with weak factors and temporally independent
Gaussian errors. Furthermore, in this case the factors explain a negligible proportion of the total variance and it is not clear that factor-based forecasts would necessarily perform well, even if the true factors were known. However,
it is a little difficult to believe that some industrialised economies have a strong
factor structure and others don't. Also, it is not clear how the common obser-
vation that the estimator may be worse when N is large, can be explained in
Onatski's framework.
An alternative explanation for the mixed empirical performance of the prin-
cipal components estimator of factor forecasts concerns the behaviour of the
error covariance matrix as N grows. The published dual-limit consistency
proofs for principal components estimation of large-scale factor models place
restrictions on the cross-correlation structure of the error covariance matrix.
In particular, for the dynamic factor techniques introduced by Forni et al.
(2000), the dynamic eigenvalues of the spectral density of the errors are as-
sumed to be uniformly bounded. This assumption is the dynamic analogue
to the `approximate factor' restriction introduced by Chamberlain and Roth-
schild (1983). Stock and Watson (2002a) and Bai and Ng (2002) assume that
the mean absolute row sum of the error covariance matrix is bounded. Bai
(2003) and Bai and Ng (2006) assume that the absolute row sums are uni-
formly bounded. A bound on the maximum absolute row sum is also a bound
on the maximum eigenvalue, so these assumptions imply an approximate fac-
tor structure. While clearly less restrictive than the traditional `strict factor
model' assumption of a diagonal error covariance, these assumptions should
not be taken for granted. In spatio-temporal applications of the factor model,
it might be reasonable to assume that the correlation between errors decays as
the geographical distance between variables increases, so that the absolute row
sums of the error covariance matrix remain bounded as the number of variables
grows. In general however, there is no reason why the `approximate factor' re-
striction should necessarily be expected to hold. One possible description of
the data, which might be relevant in many applications, is that the variables
belong to a set of natural groups. For example, the groups could correspond
to geographical boundaries (e.g. different countries in the Euro area), or to functional categories (e.g. real, nominal and financial variables). It might be
reasonable to assume that pairs of errors corresponding to different groups are
weakly correlated in the sense of uniform boundedness of the absolute row
sums of the covariances as the number of variables grows, but that pairs of
errors from the same group are strongly correlated. If the number of natural
groups is finite and relatively small, then the number of variables might be
increased only by increasing the number of variables used from each natural
group. In such a situation, the absolute row sums of the covariance matrix
are unlikely to be uniformly bounded by a constant and could in fact grow at
any rate up to N . The existing theory for principal components estimation of
large-scale factor models does not cover such cases. In fact, with the exception
of a brief consideration of a single-factor model with identical factor loadings
by Boivin and Ng (2006), the implications of stronger cross-correlation in the errors have not been given explicit consideration in the theoretical literature.
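The grouped-variable point can be made concrete with a small numerical check (illustrative code, using an equicorrelated block to stand in for strong within-group correlation): with a fixed number of groups, the largest eigenvalue of the error covariance matrix grows linearly in N, so the bounded-eigenvalue condition of the approximate factor model fails.

```python
import numpy as np

def max_error_eigenvalue(N, n_groups=4, rho=0.6):
    """Largest eigenvalue of a block-diagonal error covariance in which
    the N variables fall into a fixed number of equal-sized groups with
    within-group correlation rho and zero between-group correlation.
    Each block's top eigenvalue is 1 + (m - 1) rho for group size m, so
    the bound grows linearly in N as N increases with n_groups fixed."""
    size = N // n_groups
    block = rho * np.ones((size, size)) + (1 - rho) * np.eye(size)
    Sigma = np.kron(np.eye(n_groups), block)
    return np.linalg.eigvalsh(Sigma)[-1]
```

Multiplying N by ten (holding the number of groups fixed) multiplies the largest error eigenvalue by roughly ten, whereas the approximate factor assumptions require it to stay bounded.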
1.6.2 Contributions Made in this Thesis
Chapter 2
Chapter 2 considers dynamic factor analysis with a one-sided factor filter in a setting in which the number of variables N is assumed to be fixed. Three
contributions are made.
(i) The dynamic factor model with mutually uncorrelated autoregressive
factors is derived as a particular realisation of a VARMA model with re-
duced spectral rank observed subject to noise. As Giannone et al. (2006)
have pointed out, many theoretical macroeconomic models suggest that
the number of structural shocks that drive macroeconomic fluctuations
is less than the number of observable variables, and macroeconomic vari-
ables are usually observed subject to measurement error. Consequently,
the reduced spectral rank VARMA plus noise model is an attractive specifi-
cation for macroeconomic analysis. In Section 2.1, it is shown that the
dynamic factor model with mutually uncorrelated autoregressive factors
corresponds to a minimal dimension state space representation of the
reduced spectral rank VARMA plus noise model in cases in which the
autoregressive polynomials of the factors do not have any common poly-
nomial factors. In cases where common polynomial factors exist, the
dynamic factor model does not correspond to a minimal state dimension
representation (Proposition 1).
(ii) In Section 2.2 the issue of identification is considered for a fairly gen-
eral class of weakly stationary dynamic factor model with uncorrelated
factors. For the model
xt = β(L)ft + εt
in cases where the removal of any row of β(L) leaves rows from which
it is possible to construct two k × k matrices of full rank and another
matrix with at least one row, it is shown that the error spectrum is
identified (Theorem 2.2.1) and that the number of dynamic factors is
identified (Theorem 2.2.2). It is also shown that, under these conditions,
zero-restrictions similar to those used to identify the static factor model
are also identifying for the dynamic factor model (Theorem 2.2.3). Of
most interest is Theorem 2.2.4 which shows that, under the above rank
conditions, if β(L) is irreducible and the spectra of the factors are lin-
early independent, then β(L), the factor spectra, and the error spectra
are identified up to sign changes, reordering, and rescaling of the fac-
tors. Consequently, zero-restrictions are not necessary for identification
in many forms of dynamic factor model, including those with autoregres-
sive factors.
(iii) In Section 2.3, a frequency domain approach is proposed for the estima-
tion of dynamic factor models. A simulation exercise (in Section 2.4)
suggests that this method has some computational advantage over the
state space scoring algorithm which is usually used for dynamic factor
model estimation. However, the main attraction of the frequency domain
approach is the relative ease with which a general algorithm can be coded.
The existing time domain algorithms for the estimation of dynamic fac-
tor models require the construction of a state space representation of the
model. For factor models with few lags, this is trivial. However for more
complicated lag structures, and particularly for ARMA dynamics, this
task becomes more complex, and the construction of a general algorithm,
which can handle any specification of model orders, is complicated. As
shown in Section 2.3, in the frequency domain a general expression for the
covariance matrix can be written (Equation (2.5)) which makes the eval-
uation of the likelihood relatively easy to code. As an illustration, this
approach is used to estimate a dynamic factor model with an ARMA(1,1)
factor and ARMA(1,1) errors using data on industrial production growth
in the G7 countries (Section 2.5).
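To give a concrete sense of why the frequency domain likelihood is easy to code, here is a minimal, hypothetical sketch for a one-factor model with an AR(1) factor and AR(1) errors. The function names, parameterisation and Whittle-style approximation are our illustrative choices, not the thesis's Equation (2.5).

```python
import numpy as np

def spectral_density(omega, beta, phi_f, sig2_f, phi_e, sig2_e):
    # Model spectral density S_x(w) = b S_f(w) b^H + S_e(w) for a one-factor
    # model with an AR(1) factor and AR(1) idiosyncratic errors.
    z = np.exp(-1j * omega)
    s_f = sig2_f / (2 * np.pi * np.abs(1 - phi_f * z) ** 2)
    b = beta.reshape(-1, 1).astype(complex)          # static loadings, no lags
    S_e = np.diag(sig2_e / (2 * np.pi * np.abs(1 - phi_e * z) ** 2))
    return s_f * (b @ b.conj().T) + S_e

def whittle_loglik(X, beta, phi_f, sig2_f, phi_e, sig2_e):
    # Gaussian log-likelihood approximated by a sum over Fourier frequencies.
    T, N = X.shape
    d = np.fft.fft(X, axis=0) / np.sqrt(2 * np.pi * T)   # DFT, one frequency per row
    ll = 0.0
    for j in range(1, T // 2):                            # skip frequency zero
        omega = 2 * np.pi * j / T
        S = spectral_density(omega, beta, phi_f, sig2_f, phi_e, sig2_e)
        dj = d[j].reshape(-1, 1)
        ll -= np.log(np.abs(np.linalg.det(S)))
        ll -= (dj.conj().T @ np.linalg.inv(S) @ dj).real.item()
    return ll
```

Maximising `whittle_loglik` over the parameters would give a frequency domain estimator; moving to ARMA factors or errors only changes the scalar spectra inside `spectral_density`, which is the ease of generalisation referred to above.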
Chapter 3
In Chapter 3 a theoretical investigation into the asymptotic behaviour of the
principal components estimator is presented. The asymptotic results that have
previously been published in the literature assume that the mean of the row
sums of the absolute value of the covariance matrix of the errors is bounded (the
approximate factor assumption). It is argued in Chapter 3 that this assumption
will often be violated. For many of the applications in the literature, the
variables are chosen from a relatively small number of categories. For example,
large factor models will often have a large number of price indexes, a large
number of interest variables, a large number of measures of industrial output,
etc. It is easy to believe that the similarity of many of the variables that belong
to the same category is such that many of the error terms corresponding to
those variables will have non-negligible correlation. If the `large-N' conditon
is achieved by increasing the number of variables in each category, instead of
increasing the number of categories, then it is likely that the absolute row sums
of the error covariance matrix will grow without bound, a situation that is
not covered by the published theory.
The main result in this chapter (Theorem 3.1.4) is that the principal compo-
nents estimator is consistent under conditions where the absolute row sums of
the error covariance matrix grow without bound. Therefore consistency holds
for a class of model which is more general than the approximate factor model
that has been investigated in the literature. However, the rate of convergence
achieved by the estimator is slower, the faster is the rate of growth of the
error cross-correlation. Consequently, it is possible for the performance of the
principal components estimator to be poor even in applications with a very
large number of variables, which may explain the patchy performance record of
the principal components estimator in forecasting applications. The proof of
this result makes use of a number of preliminary results, which are interesting
in their own right.
Theorem 3.1.3 proves the consistency of sample eigenvalues (scaled by 1/N)
for population eigenvalues in a framework in which (N, T ) → (∞,∞) jointly.
The key assumption in this theorem is the so-called `gap' condition, which
requires that the absolute difference between each of the first k eigenvalues
and any other eigenvalue grows at a rate of N.
Theorem 3.1.1 presents a set of finite-sample/variables bounds linking pop-
ulation principal components to population factors. By avoiding sampling
issues and asymptotic arguments, these bounds give a clear view of the con-
ditions under which population factors and population principal components
are likely to be `close'. In particular, they suggest that what matters for prin-
cipal components to estimate factors well is not the number of variables per
se, but rather the magnitude of the noise-to-signal ratio, which is defined as
ρ = σ²/λ_k, where σ² is the largest eigenvalue of the error covariance matrix Ψ,
and λ_k is the kth eigenvalue of Ω = E(X′X/T). When the noise-to-signal ratio
is small, population principal component and population factor quantities will
be similar. Estimation of a lower bound on the noise-to-signal ratio is consid-
ered in Section 3.2, and some empirical work is conducted which suggests that
the noise-to-signal ratio of the US macroeconomic data set used by Stock and
Watson (2002b) is not particularly small (Section 3.3).
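The noise-to-signal ratio described above can be computed directly for a toy population model; the loadings, error variances and dimensions below are illustrative assumptions of ours, not quantities from the thesis.

```python
import numpy as np

# Illustrative computation of the noise-to-signal ratio rho = sigma^2 / lambda_k
# for a toy static factor structure Omega = B B' + Psi (population quantities,
# not estimates; all values are made up for illustration).
rng = np.random.default_rng(0)
N, k = 50, 2
B = rng.standard_normal((N, k))                  # factor loadings
Psi = np.diag(rng.uniform(0.5, 1.5, N))          # diagonal error covariance
Omega = B @ B.T + Psi                            # covariance of the observables

lam = np.sort(np.linalg.eigvalsh(Omega))[::-1]   # eigenvalues, descending
sigma2 = np.linalg.eigvalsh(Psi).max()           # largest error eigenvalue
rho = sigma2 / lam[k - 1]                        # noise-to-signal ratio
```

With loadings of this size the ratio comes out small, the regime in which population principal components and population factors are close; shrinking B or inflating Psi drives rho up.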
Chapter 4
In Chapter 4 a new factor model, named the grouped variable approximate fac-
tor model, is proposed. The grouped variable approximate factor model is mo-
tivated by the idea that large sets of economic variables will often have a group
structure such that most of the cross-correlation between the errors in a factor
model occurs between variables that belong to the same group. For example,
in the data appendix of Stock and Watson (2002b), the 215 variables used in
their model are listed under headings such as "Real output and income", "Em-
ployment and hours", "Stock prices", etc. Many of the variables listed under
these headings are very similar to each other. For example, under "Real out-
put and income" are listed variables such as Industrial production: total index;
Industrial production: products, total; Industrial production: final products;
Industrial production: consumer goods; Industrial production: durable consumer
goods; Industrial production: nondurable consumer goods; Industrial produc-
tion: business equipment; Industrial production: intermediate products; Indus-
trial production: materials; and so on. Under "Employment and hours" are
listed variables such as Employees on nonagricultural payrolls: goods produc-
ing; Employees on nonagricultural payrolls: contract construction; Employ-
ees on nonagricultural payrolls: manufacturing; Employees on nonagricultural
payrolls: durable goods; Employees on nonagricultural payrolls: nondurable
goods; Employees on nonagricultural payrolls: service producing; Employees
on nonagricultural payrolls: wholesale and retail trade; and so on. It is sug-
gested in Chapter 4 that, if xt is constructed by entering these variables in
the order given by the variables listed under their headings, then most of the
error cross-correlation in the factor model will exist in blocks which lie on the
diagonal of the error covariance matrix and which correspond to the groups
identied by these headings.
The grouped variable approximate factor model formalises this idea by
assuming that the error covariance of the factor model has a block structure,
where the blocks correspond to the variable groups. The off-diagonal blocks
are subject to a weak correlation restriction: specifically, the largest of the
singular values of the off-diagonal blocks must grow at a rate strictly less than
N^{1/2}. No restriction is placed on the correlation structure of the blocks that
lie on the diagonal.
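A hypothetical numerical illustration of this block structure (the group count, block construction and the 0.01 scale are our choices): strong, arbitrary correlation inside the diagonal blocks, and off-diagonal blocks that are small in spectral norm.

```python
import numpy as np

# Toy error covariance with the grouped structure described above.
rng = np.random.default_rng(1)
G, m = 4, 10                                     # 4 groups of 10 variables each
N = G * m
Psi = np.zeros((N, N))
for g in range(G):
    A = rng.standard_normal((m, m))
    # arbitrary strong within-group correlation on the diagonal blocks
    Psi[g*m:(g+1)*m, g*m:(g+1)*m] = A @ A.T + m * np.eye(m)
weak = 0.01 * rng.standard_normal((N, N))        # weak between-group correlation
Psi = Psi + (weak + weak.T) / 2

# largest singular value of each off-diagonal block
off = [np.linalg.svd(Psi[g*m:(g+1)*m, h*m:(h+1)*m], compute_uv=False)[0]
       for g in range(G) for h in range(G) if g != h]
```

Here the largest off-diagonal-block singular value is orders of magnitude below the within-block singular values, which is the regime the weak-correlation restriction is intended to capture.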
In Section 4.2 an approximate instrumental variables estimator is proposed
for the grouped variable factor model. This estimator is simple to compute, re-
quiring only matrix multiplication and the inversion of a k×k matrix, where k is
the number of factors. In Section 4.3 consistency is proved for the approximate
instrumental variables estimator in a framework in which (N, T ) → (∞,∞)
jointly (Theorem 4.3.1). A brief empirical experiment, which compares the
approximate instrumental variables estimator to the principal components es-
timator, is presented in Section 4.4.
Chapter 2
Dynamic Factor Analysis with a
Finite Number of Variables
This chapter considers the dynamic factor model
xt = β(L)ft + εt
where xt is a N × 1 vector of observable variables, β(L) is a N × k finite-order
one-sided polynomial in which L is the backshift operator, ft is a k× 1 vector
of unobservable mutually uncorrelated, but serially correlated, factors and εt
is a N × 1 vector of unobservable mutually uncorrelated disturbances, which
also may be serially correlated. It is assumed that N is fixed at a value that
is small relative to the number of observations T and that the model of the
factor process, and the model of the error process, are of a known parametric
form.
Most commonly, the factor process is specied to be a vector of mutually
uncorrelated autoregressions. Engle and Watson (1981) proposed a model
with a single autoregressive factor which they write in state space form and
estimate by a scoring algorithm. Watson and Engle (1983) and Shumway and
Stoffer (1982) independently proposed an EM algorithm for estimating the
autoregressive factor model. Applications of this dynamic factor model have
included the construction of coincident and leading indicators1 and analyses
of wages2, productivity3 and aggregate demand4. Following Altug (1989) and
Sargent (1989), Giannone et al. (2006) advocate the use of dynamic factor
models for business cycle analysis.
Other research has proposed estimation algorithms for dynamic factor mod-
els with different factor specifications. In particular, Kim (1994), Kim and Yoo
(1995), Chauvet (1998), Kim and Nelson (1998) and Harris and Martin (1998)
have proposed models in which the factor follows the Markov-switching process
of Hamilton (1989), and have applied them to the modelling of business cycles,
and Dungey et al. (2000) have estimated a model of bond yields in which the
factor is autoregressive with GARCH disturbances.
This chapter makes three contributions to the dynamic factor analysis liter-
ature. Firstly, in Section 2.1 the dynamic factor model with mutually uncorre-
lated autoregressive factors is derived as a particular realisation of a VARMA
model with reduced spectral rank and additive noise. It is shown that in some
cases, this dynamic factor model corresponds to a minimum dimension state
space representation of the VARMA plus noise model. Since business cycle
1 Stock and Watson (1990).
2 Engle and Watson (1981) and Watson and Engle (1983).
3 Lebow (1993).
4 Watson and Kraft (1984).
models generally have fewer stochastic shocks than observable variables and,
since macroeconomic variables are measured with noise, it is argued that the
dynamic factor model is useful as a general model for empirical macroeco-
nomics and should be viewed as an attractive alternative to the almost uni-
versally used vector autoregression (VAR)5. Secondly, in Section 2.2 the identi-
fication issue is considered for a fairly general class of dynamic factor model
which includes autoregressive factors, ARMA factors, Markov-switching fac-
tors, and many other specications that may be useful. Of particular interest
is the finding that under reasonably general conditions, dynamic factor models
are identified without the need for strong restrictions of the type necessary in
static factor analysis. Thirdly, in Section 2.3 a frequency domain approach
to the estimation of dynamic factor models is proposed. A simulation study
(presented in Section 2.4) shows that, for models with simple dynamics, the
frequency domain approach has some computational advantages over the tra-
ditional state space approach to estimation. However, its main attraction is
the ease with which it generalises to models with more complicated dynamics,
in particular models with ARMA factors and ARMA errors. This is in contrast
to the traditional approach where the construction of the state space repre-
sentation becomes more complicated as the dynamic structure of the model
becomes richer. A brief empirical example is presented in Section 2.5.
2.1 Dynamic factor models in macroeconomics
Giannone et al. (2006) consider the estimation of business cycle models. Fol-
lowing Altug (1989) and Sargent (1989), they note that when observable vari-
ables are subject to measurement error, business cycle models imply that vec-
tors of observable variables have dynamic factor structure. Using this obser-
vation as a starting point, in this section a rationale is presented for using
dynamic factor models with mutually uncorrelated autoregressive factors as a
general class of model for empirical macroeconomics. The argument presented
is more general than those provided by the above authors and pays particular
attention to the definition of the factors, and to issues of generality, uniqueness
and parsimony.

5 See Giannone et al. (2006) for a similar argument.
In a world characterised by measurement error, it is reasonable to assume
that a N × 1 vector of observed economic variables xt has two components
xt = ξt + εt (2.1)
where εt is a N × 1 vector of measurement errors, which are assumed to
be mutually uncorrelated at all leads and lags, and ξt is a N × 1 vector of
`measurement-error-free' variables for which a theoretical economic model ex-
ists. ξt is assumed to be uncorrelated with εt at all leads and lags. It is assumed
that the spectral densities of ξt and εt are uniformly bounded. The rank of the
spectral density matrix of ξt is denoted k. In general we could have k = N but,
as pointed out by Giannone et al. (2006), equilibrium business cycle models
usually have fewer stochastic shock variables than observable variables. Con-
sequently, our main interest is in cases where k < N . A fairly general model
would have (with L the lag or backshift operator)
ξt = T (L)ηt
where T (L) is a N × k matrix of rational transfer functions and ηt is a k × 1
vector of white noises. It is assumed that ηt has a covariance matrix of Ik.
In business cycle models, the underlying shocks are often considered to be
variables such as demand shocks, monetary policy shocks, technology shocks,
etc. The (i, j)th element of T (L) may be written as
Tij(L) = bij(L)/aij(L)
where aij(L) and bij(L) are coprime polynomial operators in L. In the control
theory literature it is often assumed that rational transfer functions are strictly
proper so that deg(bij(L)) < deg(aij(L)). This assumption will be employed
later in this section for a discussion of minimal dimensionality, however for
the discussion of identication and estimation in subsequent sections, no re-
strictions need be placed on the degrees of the polynomials other than that
the numerator polynomial and the denominator polynomial must both be of
finite degree. Since an infinite impulse response function may be approximated
to arbitrary accuracy by a ratio of finite degree polynomials, this assumption
does not sacrice generality. The complete model may be written as
xt = T (L)ηt + εt (2.2)
Interest centres on the estimation of T (L) since it largely determines the re-
sponses of observable variables to impulses in the underlying shock variables.
In cases where measurement error does not exist, these impulse responses may
be estimated in a VAR framework, provided that T (L) satises the `fundamen-
talness condition' that there exists a k×N matrix polynomial S(L) such that
S(L)T (L) = Ik (see Hansen and Sargent (1990) for a discussion). As pointed
out by Giannone et al. (2006) however, in the presence of measurement error,
the identication of the impulse responses in a VAR framework is problem-
atic. Giannone et al. (2006) propose instead that the impulse responses be
estimated in a dynamic factor analysis framework and they conduct Monte
Carlo simulations which demonstrate the superiority of this approach. What
follows is a more detailed rationale of the dynamic factor model than that
given by Giannone et al. (2006). In particular, care is taken to precisely define
the factors, and issues of uniqueness, generality and parsimony are considered.
Since the first component of Equation (2.2) is a VARMA model with a
reduced spectral rank, it is known that even in the absence of the measurement
error vector εt, the parameters of T(L) are not uniquely identified6. In VARMA
modelling, the first step in dealing with identifiability is to find a minimal
dimensional state space model. This minimal dimension is called the McMillan
degree7 and it has a number of other characterisations. In this case however,
the VARMA system is observed subject to noise, and so the identification issue
is non-standard. In this section, the identification issue is handled by choosing
a form of the `VARMA plus noise' model that corresponds to a dynamic factor
model. In the next section, it will be shown that, under certain conditions,
6 See, for example, Lütkepohl (1991).
7 See Solo (1986).
the dynamic factor representation is unique.
Consider a single column j of T(L), with elements Tij(L) = bij(L)/aij(L) for
i = 1, ..., N. Let dj(L) be the lowest common multiple of aij(L) for i = 1, ..., N.
The elements of column j may then be written as

Tij(L) = cij(L)/dj(L)

where cij(L) and dj(L), which is monic, are coprime polynomial operators.
Therefore, the first term on the right hand side of Equation (2.1) may be
written as

\xi_t = \sum_{j=1}^{k} \frac{c_j(L)}{d_j(L)} \eta_{jt}
      = \left( \frac{c_1(L)}{d_1(L)} \cdots \frac{c_k(L)}{d_k(L)} \right) \eta_t
      = \left( c_1(L) \cdots c_k(L) \right)
        \begin{pmatrix} \frac{1}{d_1(L)} & & 0 \\ & \ddots & \\ 0 & & \frac{1}{d_k(L)} \end{pmatrix} \eta_t

where c_j(L) = (c_{1j}(L) \cdots c_{Nj}(L))'. Denoting

A(L) = \begin{pmatrix} d_1(L) & & 0 \\ & \ddots & \\ 0 & & d_k(L) \end{pmatrix}
\quad \text{and} \quad
\beta(L) = \begin{pmatrix} c_{11}(L) & \cdots & c_{1k}(L) \\ \vdots & \ddots & \vdots \\ c_{N1}(L) & \cdots & c_{Nk}(L) \end{pmatrix}

we may define the factor vector

f_t = A(L)^{-1} \eta_t \qquad (2.3)
and write Equation (2.1) as
xt = β(L)ft + εt (2.4)
where ft is a k × 1 vector of uncorrelated scalar autoregressions. Thus, the
`VARMA plus noise' model has a dynamic factor representation.
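As an illustration of the representation just derived, the following sketch simulates x_t = β(L)f_t + ε_t with two mutually uncorrelated AR(1) factors and a one-lag loading polynomial β(L) = B0 + B1 L; all orders and parameter values here are illustrative assumptions of ours.

```python
import numpy as np

rng = np.random.default_rng(2)
T, N, k = 500, 5, 2
phi = np.array([0.8, -0.5])                 # AR(1) coefficient, one per factor
f = np.zeros((T, k))
eta = rng.standard_normal((T, k))
for t in range(1, T):
    f[t] = phi * f[t-1] + eta[t]            # f_jt = phi_j f_{j,t-1} + eta_jt
B0 = rng.standard_normal((N, k))            # beta(L) = B0 + B1 L, chosen at random
B1 = rng.standard_normal((N, k))
eps = 0.5 * rng.standard_normal((T, N))     # serially uncorrelated errors here
x = np.zeros((T, N))
for t in range(1, T):
    x[t] = B0 @ f[t] + B1 @ f[t-1] + eps[t]
```

The factors are mutually uncorrelated by construction, while each is serially correlated through its own autoregression, exactly the structure the representation requires.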
Continuing, we now assemble a state space model. To do this we write ξt in
terms of its k components

\xi_t = \sum_{j=1}^{k} \xi_{jt}, \qquad
\xi_{jt} = \begin{pmatrix} c_{1j}(L) \\ \vdots \\ c_{Nj}(L) \end{pmatrix} d_j(L)^{-1} \eta_{jt}

Consider the single input multiple output (SIMO) transfer function given by a
single element of this sum, (c_{1j}(L) \cdots c_{Nj}(L))' d_j(L)^{-1}. The construction of a
minimal state space representation of this transfer function is straightforward,
well-known8 and is as follows.
8 See, for example, Kailath (1980) or Barnett (1980).
\xi_{jt} = \begin{pmatrix} c^j_{11} & c^j_{12} & \cdots & c^j_{1m_j} \\ c^j_{21} & c^j_{22} & \cdots & c^j_{2m_j} \\ \vdots & & \ddots & \vdots \\ c^j_{N1} & \cdots & \cdots & c^j_{Nm_j} \end{pmatrix} \nu_{jt}

\nu_{jt} = \begin{pmatrix} -d^j_1 & -d^j_2 & \cdots & -d^j_{m_j} \\ 1 & 0 & \cdots & 0 \\ & \ddots & & \vdots \\ 0 & \cdots & 1 & 0 \end{pmatrix} \nu_{j,t-1} + \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix} \delta_{jt}

where δjt is a scalar white noise and νjt is a mj × 1 state vector, with mj =
deg(dj(L)). The minimal state space model for ξt is then constructed by
stacking the models for ξjt for j = 1, ..., k. The state dimension of the factor
model is then \sum_{j=1}^{k} m_j. Now we need to see under what conditions this is the
minimal state dimension, i.e. the McMillan degree.
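The companion-form realisation above is easy to verify numerically. The sketch below (the function name and coefficient conventions are ours) builds the (A, B, C) triple for a single denominator d_j(L); its impulse responses C A^t B reproduce the power series expansion of each numerator divided by d_j(L).

```python
import numpy as np

def companion_realisation(c, d):
    # Controller-canonical state space for the SIMO transfer function
    # c_i(L)/d(L), i = 1..N, where d(L) = 1 + d_1 L + ... + d_m L^m and the
    # i-th numerator is c[i, 0] + c[i, 1] L + ... + c[i, m-1] L^(m-1)
    # (strictly proper, as in the text above).
    N, m = c.shape
    A = np.zeros((m, m))
    A[0, :] = -np.asarray(d, dtype=float)    # first row: -d_1, ..., -d_m
    if m > 1:
        A[1:, :-1] = np.eye(m - 1)           # shift structure
    B = np.zeros((m, 1)); B[0, 0] = 1.0
    C = np.asarray(c, dtype=float)
    return A, B, C   # xi_t = C nu_t,  nu_t = A nu_{t-1} + B delta_t
```

For example, with d(L) = 1 − 1.1L + 0.3L² and numerators (1 + 0.5L, 0.3 − 0.2L)′, the first impulse responses are (1, 0.3), (1.6, 0.13), (1.46, 0.053), matching long division of each numerator by d(L).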
To determine the McMillan degree of T (L), an easy approach is to construct
Gilbert's minimal state space representation9 (see Kailath (1980) or Barnett
(1980)). Gilbert's representation is constructed by taking a partial fraction
expansion of T (L)
T(L) = \sum_{j=1}^{k} \frac{R_j}{1 - \lambda_j L}

where the N × k matrices R_j, j = 1, ..., k are of rank \varrho_j respectively. A singular
9 It should be noted that Gilbert's representation assumes that the roots of the
denominator polynomials are distinct. However, since the set of models for which these
polynomials have repeated roots has measure zero, this is not a matter of great practical
concern.
value decomposition of R_j yields the full-rank factorisation

R_j = \underset{N \times \varrho_j}{C_j} \; \underset{\varrho_j \times k}{B_j}

The Gilbert state space model is then

\xi_t = \begin{pmatrix} C_1 & \cdots & C_k \end{pmatrix} \omega_t

\omega_t = \begin{pmatrix} \lambda_1 I_{\varrho_1} & 0 & \cdots & 0 \\ 0 & \lambda_2 I_{\varrho_2} & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_k I_{\varrho_k} \end{pmatrix} \omega_{t-1} + \begin{pmatrix} B_1 \\ B_2 \\ \vdots \\ B_k \end{pmatrix} \varphi_t

where ω_t is a (\sum_{j=1}^{k} \varrho_j) × 1 state vector and φ_t is a k × 1 error term. Since the
Gilbert form is known to be of minimal dimension, the McMillan degree of
ξt = T(L)ηt is \sum_{j=1}^{k} \varrho_j. The key point here is that, if the denominator polyno-
mials dj(L), j = 1, ..., k, used to construct A(L) in the factor model, have no
common polynomial factors, then the Rj matrices in the Gilbert representation
will have full rank. Consequently, \sum_{j=1}^{k} \varrho_j = \sum_{j=1}^{k} m_j. If the denominator polyno-
mials do have common polynomial factors, then the Rj matrices in the Gilbert
representation will have reduced rank, resulting in \sum_{j=1}^{k} \varrho_j < \sum_{j=1}^{k} m_j. This yields
the following proposition.
Proposition 1. The dynamic factor model given by Equations (2.3) and (2.4)
creates a minimal dimension state space representation of T (L) if and only
if the polynomials in the diagonal matrix A(L) have no common polynomial
factors.
2.2 Identification
In the previous section, a dynamic factor model with mutually uncorrelated au-
toregressive factors was derived as a particular realisation of a VARMA model
of reduced spectral rank observed subject to measurement error. It was shown
that this realisation corresponds to a minimal dimension state space represen-
tation in some cases. However, the issue of uniqueness was not considered. In
the case of the classical static factor analysis model, it is well-known10 that the
factors and factor loadings are identied only up to an orthogonal transforma-
tion. Identication of the factors requires the imposition of restrictions on the
factor loading matrix such that the only orthogonal transformation for which
the restrictions are invariant is the identity. Camba-Mendez et al. (2001) show
that similar restrictions are also identifying in the case of a factor model with
mutually uncorrelated autoregressive factors and with no lagged factors affect-
ing the observed vector. Whether such restrictions are identifying in a more
general setting where the observed variable is related to a finite distributed lag
of factors is an open question. Of further interest is whether restrictions such
as these are needed at all. The identification problem exists in classical static
factor analysis because it is possible to construct mutually and serially uncor-
related weighted sums of mutually and serially uncorrelated factors. Therefore,
given any valid vector of factors, a dierent, but equally valid, factor vector
can be constructed by an orthogonal transformation. This is not the case if
the factors are mutually uncorrelated autoregressions. Weighted sums of au-
toregressions are not autoregressions. Consequently, it might be hoped that
10 See, for example, Lawley and Maxwell (1971).
the dynamic structure of the factors in the dynamic factor model eliminates
at least some of the orthogonal transformations of the factors that would be
permissible in the static factor case.
In this section, an investigation of the identification issue for dynamic fac-
tor models is presented. The results are derived for a fairly general class of
dynamic factor model which includes, but is not restricted to, the models with
mutually uncorrelated autoregressive factors derived in the previous section.
The class of models considered comprises those with factors which are uncorrelated with
the errors and for which the spectral density matrices of the factors and the
errors are diagonal. Consequently, in addition to being of interest to macroe-
conomists wishing to estimate models of reduced spectral rank that are subject
to measurement error, the results in this section are relevant for the estimation
of multiple factor models with Markov-switching factors, GARCH factors, etc.,
provided that the factors are mutually uncorrelated.
Consider the dynamic factor model
M : xt = β(L)ft + εt
where xt is a N × 1 vector of observable variables, β(L) is a N × k finite-order
one-sided polynomial in which L is the backshift operator, ft is a k× 1 vector
of unobservable factors and εt is a N × 1 vector of unobservable disturbances.
The following assumptions define the class of factor model under consideration.
Assumptions 1.
1.1 β(L) is a one-sided finite matrix polynomial operator.
1.2 The spectrum of ft is diagonal and uniformly bounded. The spectrum of
εt is diagonal and uniformly bounded.
1.3 E(ftε′t−j) = 0, j ∈ Z.
The spectrum of the observable variables may be written as

S^x_\omega = \beta_\omega S^f_\omega \beta^H_\omega + S^\varepsilon_\omega

where H denotes the complex conjugate transpose and S^f_ω and S^ε_ω are the
spectra of ft and εt respectively, which again are diagonal. The subscript
ω denotes the frequency. The identication theorems that follow are based
on a consideration of the rst two moments only. Accordingly, the following
definition is made.
Definition 2.2.1. We define two factor models M and M∗ to be observa-
tionally equivalent if the spectral density matrices of the observable vectors xt
and x∗t, S^x_ω and S^{x∗}_ω, are equal for all frequencies ω, where −π ≤ ω ≤ π.
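The need for further assumptions can be seen in a small numerical example at a single frequency (all values below are illustrative): rotating a static-loading model by an orthogonal Q leaves the observable spectrum unchanged while destroying the diagonality of the factor spectrum.

```python
import numpy as np

omega = 0.7
z = np.exp(-1j * omega)
beta = np.array([[1.0, 0.2], [0.5, 1.0], [0.3, 0.7]])
Sf = np.diag([1 / abs(1 - 0.9 * z) ** 2,          # two distinct AR(1) factor spectra
              1 / abs(1 - 0.2 * z) ** 2])
Se = 0.1 * np.eye(3)                              # diagonal error spectrum
Sx = beta @ Sf @ beta.T + Se

th = 0.3
Q = np.array([[np.cos(th), -np.sin(th)], [np.sin(th), np.cos(th)]])
beta_star, Sf_star = beta @ Q.T, Q @ Sf @ Q.T     # rotated model
Sx_star = beta_star @ Sf_star @ beta_star.T + Se
# Sx_star equals Sx, but Sf_star is no longer diagonal: the rotated model
# produces the same observable spectrum yet violates Assumption 1.2
```

Theorems 2.2.3 and 2.2.4 rule out such rotations by imposing either zero-restrictions or linear independence of the (diagonal) factor spectra together with irreducibility of β(L).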
In order to prove identification, the following assumption, which is the
dynamic analogue of a standard assumption in static factor analysis, is made.
Assumptions 2.
2.4 If any row of β(L) is deleted, from the remaining rows it is possible to
construct two k×k full-rank polynomial matrices and a polynomial matrix
with (N − 2k − 1) > 0 rows.
The theorems and proofs below make use of some properties of polynomial
matrices which are not often discussed in the economics literature. These terms
are defined in Appendix 1. The first result, a dynamic extension of a static
result in Anderson and Rubin (1956), gives conditions under which the error
process spectrum is identified. The proofs of all theorems appear in Appendix
2.
Theorem 2.2.1. For the set of models M , under assumptions 1.1, 1.2, 1.3
and 2.4 the disturbance spectrum S^ε_ω is identified.
Note that the theorem requires that N > 2k + 1, providing a lower bound
on the number of observable variables if the identification results are to apply.
However, it is not sufficient to simply have a large number of variables relative
to the number of factors. There also exist restrictions on the linear dependence
of the filtering that need to be satisfied. For example, if it was the case that the
data set consisted of time series observations of a panel of firms or individuals
who have identical characteristics, then it may be the case that they all react
to changes in the common factors in the same way, in which case the rank of
β(L) may be insufficient for the theorem to apply, even if there are a large
number of variables. Therefore, loosely speaking, Theorem 2.2.1 says that it is
insufficient to have a large number of variables; they must also be sufficiently
diverse.
The next result shows that the number of factors is uniquely determined
under the conditions of Theorem 2.2.1.
Theorem 2.2.2. For the set of models M , under assumptions 1.1, 1.2, 1.3
and 2.4 the dimension of the factor vector (k) is identified.
With the disturbance spectrum and factor dimension identied, all that
remains is to determine conditions under which β_ω and S^f_ω are uniquely de-
termined by β_ω S^f_ω β^H_ω. The following lemma, which is an extension of a static
result by Reiersøl (1950), provides a useful representation of the set of obser-
vationally equivalent models which will subsequently be used.
Lemma 1. Under assumptions 1.1, 1.2, 1.3 and 2.4, the set of observationally
equivalent dynamic factor models has the spectral representation

S^x_\omega = \beta^*_\omega S^{f*}_\omega \beta^{*H}_\omega + S^\varepsilon_\omega

where β∗_ω = β_ω M^{-1}_ω and S^{f∗}_ω = M_ω S^f_ω M^H_ω, and M_ω is a k × k non-singular
polynomial operator in e^{−iω}. Furthermore, if β(L) is irreducible, then M_ω is
unimodular.
We now show that the factor spectrum and the filter β(L) are identified
under a particular pattern of zero-restrictions on the factor loading matrix
β(L).
Theorem 2.2.3. If
a) Assumptions 1.1, 1.2 and 1.3 hold.
b) β(L) is irreducible.
c) Following the deletion of any row of β(L), from the remaining rows it is
possible to construct a k×k lower-triangular polynomial matrix, a k×k
full-rank polynomial matrix and a polynomial matrix with (N−2k−1) > 0
rows
then β(L) is identified and the factor spectrum S^f_ω is identified up to a rescaling
of the factors, and a sign change on each factor.
Theorem 2.2.3 generalizes to the time series context the well-known result
that an appropriate pattern of zero-restrictions identifies a static factor model.
Thus, it tells us that identification is no more of a problem in the time-series
setting than it is in the static case. It states that a k-factor model is identified
if k − j variables are independent of j of the factors for j = 1, ..., k − 1. While
useful, the factor-exclusion assumptions that the theorem requires are strong
and may not be satisfied in many applications. In Theorem 2.2.4 it is shown
that under fairly general conditions, the results of Theorem 2.2.3 hold without
zero-restrictions.
Theorem 2.2.4. If
a) Assumptions 1.1, 1.2, 1.3 and 2.4 hold.
b) β(L) is irreducible.
c) The factor spectra are linearly independent functions, i.e. λ diag(S^f_ω) = 0
for all ω ∈ [0, π] implies λ = 0, for a 1 × k vector λ,
then β(L) is identified and the factor spectrum S^f_ω is identified up to a re-
ordering of the factors, a rescaling of the factors, and sign changes of the
factors.
Since the class of finite order autoregressions is not closed under addition,
the spectra of autoregressive factors are linearly independent provided that
they are all different11. Accordingly, the factor filter matrix β(L), the factor
spectra, and the disturbance spectra of autoregressive multiple factor models
are identified under the rank and irreducibility assumptions of Theorem 2.2.4.
11 That is to say, we exclude cases such as f1t = f2t.
Since autoregressions are identified from their unconditional second moments,
all the parameters of the model are identified. Zero restrictions, or the unit
restrictions of Camba-Mendez et al. (2001), are redundant in this case, provided
that the irreducibility and rank assumptions on β(L) are satisfied. This is a
remarkable result since it implies that the factor estimates from such models
can be interpreted far more readily than is the case for static factor estimates.
For the model
yt = T (L)ηt where E(ηtη′t) = Ik
it is well-known that the impulse response functions are identified only up to
an orthogonal transformation. Consequently, for example, the estimation of
impulse response functions in a structural VAR model requires restrictions to
be imposed on the model. This rotational indeterminacy is eliminated in the
construction of the factor model by writing T (L) as the product of a polynomial
matrix and a diagonal matrix containing the lowest common denominators
of the columns of T (L). The only class of orthogonal transformation which
preserves this diagonal matrix, and maintains an identity covariance for ηt, is
dened by the set of permutation matrices. Consequently, this particular form
of the VARMA model is identied subject to the rank conditions of Theorem
2.2.4. It should be noted however, that the dynamic factor model is only one
particular realisation of the VARMA plus noise model. If the object of the
analysis is to conduct an impulse response analysis for the VARMA plus noise
model
xt = T (L)ηt + εt
then dynamic factor techniques provide a convenient way to estimate the parameters
of the model. However, in this form of the model, the impulse response
function that relates the structural shocks ηt to the observable variables
xt is still subject to a rotational indeterminacy.
The theorems in this section have wider applicability than models with
autoregressive factors. For the case of a model with ARMA factors, things are
not as clear cut as for autoregressive factors. Since ARMA processes contemporaneously
aggregate to ARMA processes (see e.g. Lütkepohl (1991)), it is
possible to construct multiple ARMA-factor models for which Assumption c)
of Theorem 2.2.4 does not hold. However, such models are somewhat contrived
in the sense that the factors are able to cancel each other out to some
extent. If we are prepared to assume that no such cancellation is possible, so
that the factor spectra are linearly independent, and that the polynomial numerators
and denominators in any of the ARMA processes in the model are
coprime, then all the parameters in the model are identified under the rank
and irreducibility assumptions on β(L).
For models with factors which follow unit root processes, such as the
Fernández-Macho (1997) model, the observable variable yt must be differenced
in order to satisfy Assumption 1.2. The Fernández-Macho (1997) model will
then be equivalent to a static model, and zero restrictions may be used for
identification when there is more than one factor in the model. For a generalisation
of the model with factors which follow random walks with autoregressive
shocks, the differenced model will be similar to the autoregressive-factor
model and the parameters of the model will be identified under the rank and
irreducibility conditions of Theorem 2.2.4. Indeed the results may be applied
to any factor model of an integrated vector xt which may be written as
Δ^d xt = β(L)ft + εt where ft and εt satisfy Assumptions 1.2 and 1.3, and β(L)
satisfies Assumptions 1.1 and 2.4.
Poskitt and Chung (1996) have shown that an r-state scalar Markov-switching
model with a non-singular transition matrix generates the autocovariance
function of an ARMA(r − 1, r − 1) process. It follows that an
autoregressive filter of a 2-state Markov-switching variable plus uncorrelated
white noise must have the spectrum of an ARMA process. Consequently, the
above comments about identifiability of ARMA factors also apply to multiple
factor versions of the Markov-switching model of Chauvet (1998). Similarly,
Karlsen (1990) shows that a process which switches between r AR(1) processes
has the autocovariance of an ARMA process, indicating that a similar result
holds for multiple factor generalisations of the model used by Chauvet et al.
(2002). Zhang and Stine (1999) similarly find ARMA structure for a more general
Markov-switching vector autoregression model. Thus, we can state that
for a fairly general class of Markov-switching factor model, it is possible to
write down examples for which Assumption c) of Theorem 2.2.4 does not hold,
but that, with the exception of contrived cases where polynomial terms cancel
out, the models are identified under the rank and irreducibility assumptions
of Theorem 2.2.4.
The results for factor-GARCH type models are less pleasing. Since GARCH
variables are unconditionally homoscedastic and serially uncorrelated, the results
mirror those for static factor models; that is, Theorem 2.2.4 does not
apply and we need to impose zero restrictions to guarantee identification of the
factor spectra, disturbance spectra and factor filter matrix. Of course, in the
1-factor case, Assumption c) of Theorem 2.2.4 is satisfied and zero restrictions
are unnecessary. However, even in this case, only the spectra of the factor and
disturbances are identified. Since GARCH processes are not identified by unconditional
second order moments, the parameters driving these processes are
not identified by our theorems. For the AR/GARCH factor model of Dungey
et al. (2000), Assumption c) of Theorem 2.2.4 is satisfied and the factor filter
and the AR parameters of the factors and disturbances are identified. However,
the GARCH parameters are still not identified by our theorems since
they only consider unconditional second order moments.
2.3 Estimation
In Section 2.1, the dynamic factor model with uncorrelated autoregressive
factors was derived as a particular realisation of a `VARMA plus noise' model
where the VARMA component has reduced spectral rank. This model was motivated
by business cycle models in which the variables of interest are driven
by a relatively small number of mutually and serially uncorrelated structural
shocks, and are assumed to be observed subject to measurement error. Engle
and Watson (1981) proposed that the dynamic factor model with autoregressive
factors be estimated using a scoring algorithm with the model written in
state space form. Shumway and Stoffer (1982) and Watson and Engle (1983)
independently proposed that the EM algorithm be used to estimate the model
in state space form. While these algorithms work reasonably well for small
models with simple dynamics, their application to models with extensive lag
structures can be more troublesome, with the algorithms generally taking a
long time to converge, and often failing to converge. These issues can usually
be resolved with some skilled human intervention, but the fact remains that
dynamic factor models with rich lag structures are difficult to estimate. Furthermore,
in the derivation of the model in Section 2.1, the lag polynomials
of the autoregressive factors are constructed as the least common multiples of
the denominator polynomials in each column of the transfer function in the
original VARMA plus noise model. Consequently, the orders of the autoregressive
factors could be high in practice, making estimation difficult. Since a
high-order polynomial may be approximated by a ratio of polynomials of much
lower order, a practical solution to this problem is to replace the high-order
autoregressive factors with low order ARMA processes. Furthermore, the state
space construction leading to Proposition 1 does not preclude common factors
in the numerator polynomials. If this occurs then the representation of the
common polynomial factors as moving average components of the model factors
can lead to a substantial reduction in the parameter count. This motivates
a dynamic factor model with ARMA factors.
In this section, a frequency domain approach to the estimation of dynamic
factor models with ARMA factors is proposed. In the derivation of the dynamic
factor model in Section 2.1, the errors were considered to be measurement
errors. As such, one might expect them to be serially uncorrelated. However,
in the interests of generality, the approach described below is able to estimate
dynamic factor models with errors which follow ARMA processes.

The published estimation approaches taken to factor models based on one-sided
filtering have all been based on a state space representation of the model.
In the case of a single factor model with an AR(1) factor and no lagged factors,
the state space representation coincides with the factor model. Higher orders
can be accommodated by expanding the system matrices, and multiple factors
can be represented by stacking state vectors. Serially correlated errors can be
incorporated by their inclusion in the state vector. By this stage, however, the
state space representation of the model is becoming somewhat complicated.
The addition of ARMA dynamics increases the complexity further. An alternative
approach, which generalises easily from a simple model to one with
complex dynamics, is to carry out estimation in the frequency domain.
In the time domain, the model with ARMA factors and ARMA errors may
be written as

xt = β(L)ft + εt
ft = A(L)−1B(L)ηt
εt = H(L)−1G(L)δt

where A(L), B(L), H(L) and G(L) are diagonal polynomial matrices of degrees
m, d, n and v respectively. The degree of β(L) is denoted q, xt is an N × 1
vector, and ft is a k × 1 vector. The elements of ηt and δt are all mutually and
serially uncorrelated at all leads and lags, it is assumed that E(ηtη′t) = Ik, and
we denote R = E(δtδ′t), where R is an N × N matrix.
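To make the notation concrete, the model above can be simulated directly in the time domain. The following sketch uses k = 1 factor, an ARMA(1,1) factor, AR(1) errors and a contemporaneous filter (q = 0); all numeric parameter values are illustrative assumptions, not estimates from the text.

```python
import numpy as np

def simulate_dfm(T=500, N=5, seed=0):
    """Simulate x_t = beta(L) f_t + eps_t with illustrative parameter values."""
    rng = np.random.default_rng(seed)
    beta = rng.normal(size=N)        # loadings: the lag-0 column of beta(L)
    a, b = 0.8, 0.3                  # factor AR and MA coefficients (assumed)
    h = 0.4                          # common error AR coefficient (assumed)
    eta = rng.normal(size=T)         # factor shocks with E(eta_t^2) = 1
    delta = rng.normal(size=(T, N))  # error shocks with R = I_N
    f = np.zeros(T)
    eps = np.zeros((T, N))
    for t in range(1, T):
        f[t] = a * f[t - 1] + eta[t] + b * eta[t - 1]  # f_t = A(L)^{-1} B(L) eta_t
        eps[t] = h * eps[t - 1] + delta[t]             # eps_t = H(L)^{-1} G(L) delta_t
    x = np.outer(f, beta) + eps                        # x_t = beta(L) f_t + eps_t
    return x, f

x, f = simulate_dfm()
```

Simulated data of this kind is also what underlies the comparison exercise in Section 2.4.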
The discrete Fourier transform of xt is defined as

xω = Σ_{t=1}^{T} xt e^(−2πiωt/T)

for harmonic frequencies ω. Taking Fourier transforms, the above dynamic
factor model may be written in the frequency domain approximately12 as

xω = β(e^(−2πiω/T)) fω + εω
fω = A(e^(−2πiω/T))^(−1) B(e^(−2πiω/T)) ηω
εω = H(e^(−2πiω/T))^(−1) G(e^(−2πiω/T)) δω

where fω, εω and δω are the Fourier transforms of ft, εt and δt respectively.
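The DFT as defined here sums over t = 1, …, T, whereas most FFT libraries sum over t = 0, …, T − 1; the two differ only by a per-frequency phase factor. A minimal sketch:

```python
import numpy as np

def harmonic_dft(x):
    """x_omega = sum_{t=1}^{T} x_t exp(-2*pi*i*omega*t/T), omega = 0..T-1.

    x is a T x N array; returns a T x N complex array of Fourier ordinates."""
    T = x.shape[0]
    t = np.arange(1, T + 1)
    omegas = np.arange(T)
    W = np.exp(-2j * np.pi * np.outer(omegas, t) / T)
    return W @ x

def harmonic_dft_fft(x):
    """Equivalent fast version: np.fft.fft sums over t = 0..T-1, so multiply
    by the phase exp(-2*pi*i*omega/T) to match the t = 1..T convention."""
    T = x.shape[0]
    phase = np.exp(-2j * np.pi * np.arange(T) / T)
    return phase[:, None] * np.fft.fft(x, axis=0)
```

For the second-order quantities used below, the phase factor cancels, so in practice either convention may be used.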
Since E(ft) = 0 and E(δt) = 0, it follows from the linearity of the discrete
Fourier transform that E(xω) = 0. Also, because of the linearity of the discrete
Fourier transform, it follows that the covariance matrices of δω, ηω, εω, fω, and
xω are the discrete Fourier transforms of the autocovariance matrix sequences
of δt, ηt, εt, ft, and xt respectively. Therefore, the covariance matrices of xω,
fω and εω are respectively
Sxω = β(e^(−2πiω/T)) Sfω β(e^(−2πiω/T))^H + Sεω

Sfω = A(e^(−2πiω/T))^(−1) A(e^(2πiω/T))^(−1) B(e^(−2πiω/T)) B(e^(2πiω/T))

Sεω = H(e^(−2πiω/T))^(−1) H(e^(2πiω/T))^(−1) G(e^(−2πiω/T)) G(e^(2πiω/T)) R     (2.5)
where H denotes the complex conjugate transpose. The complex Gaussian
likelihood13 is

L = Σω Lω where Lω = −(1/2) ln|Sxω| − (1/2) tr(xω xω^H Sxω^(−1))

where the sum is over harmonic frequencies.

12 Because of end-point issues, these equations do not hold exactly.
13 Whittle (1961).
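The per-frequency term can be evaluated without forming the outer product, since tr(xω xω^H S^(−1)) = xω^H S^(−1) xω. A sketch, assuming the spectral matrices Sxω have already been computed at each harmonic frequency:

```python
import numpy as np

def whittle_loglik(x_omega, Sx):
    """Whittle log-likelihood L = sum_omega L_omega.

    x_omega: F x N complex array of Fourier ordinates.
    Sx:      F x N x N array of spectral density matrices (Hermitian PD).
    """
    L = 0.0
    for xo, S in zip(x_omega, Sx):
        _, logdet = np.linalg.slogdet(S)                    # ln|Sx_omega|
        quad = np.real(np.conj(xo) @ np.linalg.solve(S, xo))  # x^H S^{-1} x
        L += -0.5 * logdet - 0.5 * quad
    return L
```

`np.linalg.solve` is used in preference to an explicit inverse for numerical stability.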
Some calculus provides the gradient vector and information matrix:

∂Lω/∂θ′ = (1/2) vec[ Sxω^(−1) (xω xω^H − Sxω) Sxω^(−1) ]^H ∂vec(Sxω)/∂θ′

E[ ∂²Lω/(∂θ′∂θ) ] = −(1/2) [ ∂vec(Sxω)/∂θ′ ]^H ( S̄xω^(−1) ⊗ Sxω^(−1) ) [ ∂vec(Sxω)/∂θ ]

where θ is a vector containing the model parameters and S̄xω is the complex
conjugate of Sxω.
Estimation of the dynamic factor model in the time domain requires the
construction of a state space representation. For models with few lags, this
is elementary. For more complicated lag structures, however, it is less simple;
and the derivation of an algorithm to construct a minimal dimension state
space representation for a general model with ARMA factors and errors of
arbitrary orders is a non-trivial task. In the frequency domain, however, a
general procedure for computing the likelihood is relatively easy to implement.
Define the following coefficient matrices:

• A is a k × (m + 1) matrix of factor AR coefficients, with the first column
a vector of ones,

• B is a k × (d + 1) matrix of factor MA coefficients, with the first column
a vector of ones,

• H is an N × (n + 1) matrix of error AR coefficients, with the first column
a vector of ones,

• G is an N × (v + 1) matrix of error MA coefficients, with the first column
a vector of ones,

• R is an N × N diagonal matrix containing the variances of δt,

• β = (β1 · · · βk) is an N × k(q + 1) matrix of factor filter coefficients, where
βi is an N × (q + 1) matrix containing the coefficients linking the q lags
of the ith factor to xt.
Also define

γpω = (1, e^(−2πiω/T), e^(−2πiω2/T), …, e^(−2πiωp/T))′,  φpω = γpω γpω^H,

and Θω, the k(q + 1) × k block-diagonal matrix with k copies of γqω on the
diagonal and zeros elsewhere.
We then have β(e^(−2πiω/T)) = βΘω, and the covariance of the observable Fourier
ordinates may be written as

Sxω = βΘω Sfω Θω^H β′ + Sεω

where

Sfω = (B φdω B′ ∘ Ik)(A φmω A′ ∘ Ik)^(−1)
Sεω = (G φvω G′ ∘ IN)(H φnω H′ ∘ IN)^(−1) R

and ∘ denotes the Hadamard product. This is a general expression which is
correct for all possible model orders and has the advantage of being relatively
easy to code. The value of the covariance at each frequency may be used to
compute the likelihood. The derivatives in the expressions for the gradient
vector and the information matrix can be evaluated numerically and a scoring
algorithm implemented to maximise the likelihood.
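The claim that this expression is easy to code can be illustrated directly. The sketch below assembles Sxω from the coefficient matrices for one frequency; shapes follow the definitions above, and the Hadamard product with the identity (written `* I` here) simply zeroes the off-diagonal terms.

```python
import numpy as np

def gamma_vec(p, omega, T):
    """gamma_p = (1, e^{-2 pi i omega/T}, ..., e^{-2 pi i omega p/T})'."""
    return np.exp(-2j * np.pi * omega * np.arange(p + 1) / T)

def spec_x(beta, A, B, H, G, R, omega, T):
    """Sx_omega = beta Theta Sf Theta^H beta' + Se, one frequency at a time.

    beta: N x k(q+1); A: k x (m+1); B: k x (d+1); H: N x (n+1);
    G: N x (v+1); R: N x N diagonal. Illustrative, not production code."""
    k, N = A.shape[0], H.shape[0]
    q = beta.shape[1] // k - 1
    def phi(p):
        g = gamma_vec(p, omega, T)
        return np.outer(g, g.conj())                      # phi_p = gamma gamma^H
    I_k, I_N = np.eye(k), np.eye(N)
    # Sf = (B phi_d B' o I_k)(A phi_m A' o I_k)^{-1}
    Sf = (B @ phi(B.shape[1] - 1) @ B.T * I_k) @ \
         np.linalg.inv(A @ phi(A.shape[1] - 1) @ A.T * I_k)
    # Se = (G phi_v G' o I_N)(H phi_n H' o I_N)^{-1} R
    Se = (G @ phi(G.shape[1] - 1) @ G.T * I_N) @ \
         np.linalg.inv(H @ phi(H.shape[1] - 1) @ H.T * I_N) @ R
    Theta = np.kron(np.eye(k), gamma_vec(q, omega, T).reshape(-1, 1))
    bTheta = beta @ Theta                                 # beta(e^{-2 pi i omega/T})
    return bTheta @ Sf @ bTheta.conj().T + Se
```

Note that no case distinctions on the model orders are needed: the same few lines cover any (q, m, d, n, v).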
2.4 A Comparison of the Time Domain and Fre-
quency Domain Algorithms
In this section the relative computational efficiency of the time domain and
frequency domain approaches to estimation is investigated. Attention is restricted
to models with only autoregressive dynamics, since these are relatively
easy to estimate in the time domain. The figures reported in Table 2.1 are
the results of a simulation exercise in which 600 models were estimated. Six
different sets of model dimensions were chosen and 100 random data sets were
generated for models with those dimensions. For each of the 100 models the
true parameter values were chosen randomly. All autoregressive parameters
(including variances) were sampled from uniform [0,1] distributions, and the
elements of the filter matrix β(L) were sampled from N(0,1) distributions. For
a given set of randomly chosen parameter values a single random data set
was generated. A model with autoregressive factors was fitted using the time
domain state space scoring algorithm proposed by Engle and Watson (1981)
and the frequency domain algorithm proposed in Section 2.3. The number
of iterations and the execution time (in seconds) were recorded for each algorithm.
All starting values were set equal to 0.5. If an algorithm failed to
converge after 200 iterations it was stopped and recorded as failing to converge.
It should be noted that the models in Table 2.1 all have low polynomial
degrees. This was done as a matter of practicality. Dynamic factor models
with complex dynamics are difficult to estimate and often require a degree of
human intervention in order to get the algorithm to converge. Consequently, they
do not lend themselves to simulation studies. Table 2.1 contains the average
number of iterations to convergence, the average execution time in seconds,
and the failure rate, all rounded to the nearest whole number. Maximization
was carried out using the fminunc procedure from the Optimization Toolbox
in Matlab and was run using Matlab 6.5 on a Pentium M 1.6GHz notebook
with 1GB of RAM. All the default options of the toolbox were used. It should
be borne in mind that different software packages use different optimization algorithms
and may produce different results. Accordingly, the results reported
here are intended to provide an indication of the computational performance
which might be expected, rather than a claim of what will necessarily occur
for any particular data set using any software on any computer.
As a general observation, the new frequency domain algorithm appears
to converge in roughly the same number of iterations as the traditional state
space algorithm in most situations. The exceptions are when there are lags
of factors, in which case the frequency domain approach took fewer iterations,
and the multiple-factor case, in which it took more. In all cases, though, the
frequency domain approach takes less time to converge than the state space
algorithm. On average, the increase in computational speed was 385%. Given
this faster computation, the relative ease with which it can be programmed,
and the fact that it generalizes easily to the ARMA-factor ARMA-disturbance
case, the frequency domain approach to estimation is an attractive alternative
to the traditional state space scoring algorithm.
Table 2.1: Estimation by Time Domain and Frequency Domain Scoring Algorithms

                                    Iterations   Execution Time   Failure Rate
N=5, q=0, k=1, m=1, n=0, T=100
  Time Domain                           11              9              5%
  Frequency Domain                      13              3              6%
N=5, q=0, k=1, m=1, n=0, T=1000
  Time Domain                           11             85              0%
  Frequency Domain                      11             26              0%
N=7, q=0, k=3, m=1, n=0, T=100
  Time Domain                           28            108              3%
  Frequency Domain                      42             28             15%
N=5, q=0, k=1, m=1, n=1, T=100
  Time Domain                           12             10              2%
  Frequency Domain                      12              3              0%
N=20, q=0, k=1, m=1, n=0, T=100
  Time Domain                           11            136              0%
  Frequency Domain                      16             35              0%
N=5, q=2, k=1, m=1, n=0, T=100
  Time Domain                           23             46              3%
  Frequency Domain                      16              8              2%

N = number of observable variables; q = number of lags of factors; k = number
of factors; m = AR order of factors; n = AR order of disturbances; T = number
of observations.
2.5 An Empirical Example
In this section, a brief empirical example of a dynamic factor model with
an ARMA factor and ARMA errors is presented. The variables measure the
monthly growth rates in industrial production in the G7 countries. The data
are from May 1969 to March 2007 and are seasonally adjusted. They are
taken from the OECD Main Economic Indicators database. Plots of the data
are presented in Figure 2.1. The objective in this section is to fit a model with
ARMA factors and ARMA errors as an illustration of the technique discussed
in the previous two sections.
Discussions of order selection are conspicuously rare in the applied dynamic
factor analysis literature14. Presumably the reason for this is the heavy computational
cost involved in estimating dynamic factor models, particularly for
models with rich lag structures. The factor filter matrix β(L) contains N × k
different polynomial filters, each of which could have a different degree. Each
factor and each error has a numerator polynomial and a denominator polynomial
with degrees which need to be chosen. Even for quite small models with modestly
set maximum possible polynomial degrees, the number of combinations
of polynomial degrees is extremely large; so much so that an extensive search
which involved estimating all possible models would be infeasible, even if each
model could be estimated very quickly. In practice it is only computationally
feasible to estimate a small number of models15. Given this, and bearing in
mind that the following is intended to be an illustration of a technique, rather

14 Camba-Mendez et al. (2001) fix the order of the factor filter β(L) at zero, assume
that the error process is white noise, and use the Schwarz-Bayes criterion to choose the
autoregressive order between values of 1 and 4. They state that this was done "for reasons
of computational feasibility." They also consider the forecasting performance of models with
between one and four factors. More commonly however, the model orders are simply fixed
and there is no discussion of their choice, e.g. Lebow (1993).

15 As an indication of a rough order of magnitude, it might be feasible to estimate a few
dozen models in a day, provided that the data set did not have any `problematic' features
and that an experienced analyst was on hand to manage the process, choose starting values,
etc.
Figure 2.1: Monthly industrial production growth for G7 countries

[Seven panels plotting monthly industrial production growth (IP) against time,
1970-2005, for Canada, France, Germany, Italy, Japan, the UK and the US.]
than a serious piece of economic analysis, the following restrictions will be
placed on the model orders. Firstly, all stochastic processes will be modelled
as ARMA(1,1). The ARMA(1,1) model has a good record of performance as a
model for observable stationary economic variables. While it cannot produce
oscillatory behaviour, this is unlikely to be a disadvantage in this application
since all the variables are seasonally adjusted16. Only single factor models
were considered. With seven variables, the identification theorems in Section
2.2 allow for the estimation of up to three factors. However, in this application
we are seeking a single variable which summarises the joint dynamic behaviour
of industrial production across the G7 countries. It will also be assumed that
all the polynomials in the factor filter matrix β(L) are of the same degree q.
The degree q is chosen by minimising the Schwarz-Bayes Criterion over model
orders from zero to six.

As is the case with the traditional time domain state space scoring algorithm,
the performance of the frequency domain algorithm of Section 2.3 is
quite sensitive to the choice of starting values. Particularly for models with
rich lag structures, the use of arbitrary starting values often results in the
algorithm failing to converge. Consequently, the following method was used.
Firstly, a model with white noise errors, an AR(1) factor and no lagged factors
was estimated using a starting value of 0.5 for all parameters. The estimates
from this procedure were then used as starting values for a model with the
same structure but with an ARMA(1,1) factor. The estimates of this model

16 The exercise was also executed assuming that all the errors and the factor follow AR(2)
processes. The results were not markedly dissimilar to those of the ARMA(1,1) model and,
since models with autoregressive dynamics have appeared in the literature, the results are
not reported.
were used as starting values to estimate a model with the same structure but
with AR(1) errors. These estimates were then used as starting values to estimate
a model with an ARMA(1,1) factor and ARMA(1,1) errors. This is the
first model for which the Schwarz-Bayes Criterion is recorded. Lagged factors
are then successively added to the model by using the estimates of the previous
model as starting values for the parameters that remain in the model,
and zeros as starting values for the newly introduced parameters. In this way,
all seven models under consideration were estimated without any instances of
the algorithm failing to converge, or taking an undue number of iterations to
converge. The total computational time was 22 minutes using the fminunc procedure
from the Optimization toolbox in Matlab 6.5 on a Quad-Core Pentium
computer with 4GB of RAM.
The values of the Schwarz-Bayes Criterion (SBC) and the likelihood for
factor filter degrees (q) from zero to six are presented in Table 2.2. Surprisingly,
the SBC-minimisation procedure chooses a degree of zero. It is possible that
this is due to the restriction that all the polynomials in β(L) should have the
same order. The consequence of this is that each time q is increased by 1,
seven extra parameters are added to the model.
Table 2.2: SBCs and log-likelihoods (q = the number of lags of factors)

q      0        1        2        3        4        5        6
SBC    9.9998   10.0677  10.1236  10.1751  10.2143  10.2666  10.3354
L     -2183.1  -2177.2  -2168.5  -2158.8  -2146.3  -2136.7  -2131.0
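The relationship between the two rows of the table can be checked with one common normalisation of the criterion, SBC = (−2L + p ln T)/T. The sample size T = 455 (May 1969 to March 2007) and the parameter count p = 30 + 7q (seven loadings per lag of the factor, two ARMA parameters for the factor, and two ARMA parameters plus one variance per error) are assumptions on our part, but they approximately reproduce the tabulated values:

```python
import math

def sbc(loglik, q, T=455):
    """SBC = (-2*L + p*ln(T))/T with an assumed parameter count p = 30 + 7q."""
    p = 30 + 7 * q
    return (-2.0 * loglik + p * math.log(T)) / T

# e.g. sbc(-2183.1, 0) is approximately 10.00 and sbc(-2177.2, 1) approximately 10.07
```

This also makes explicit why the criterion penalises each increment of q: seven extra parameters per additional lag.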
The estimates of the factor loadings and the variances of the shocks of
the error processes (i.e. R as defined in Equation 2.3) are presented in Table
2.3. Standard errors are in brackets. For each country, the factor coefficient is
significantly different from zero at standard significance levels.
Table 2.3: Estimates of factor loadings and error variances

      Canada     US        Japan     France    Germany   Italy     UK
β     0.2794     0.2595    0.3378    0.2475    0.228     0.2962    0.176
     (0.0532)   (0.0483)  (0.0627)  (0.0457)  (0.046)   (0.0601)  (0.0433)
R     0.9687     0.349     1.3149    1.1373    1.8631    4.0346    1.8647
     (0.0698)   (0.0294)  (0.0967)  (0.0829)  (0.1285)  (0.2759)  (0.1257)
The estimates of the parameters of the ARMA processes used to model
the factor and the errors are presented in Table 2.4, with standard errors in
brackets. Interestingly, the MA parameter for the factor is not significantly
different from zero at the 5% significance level, suggesting that an AR factor
might have been adequate in this case. For the error terms, the results are
mixed. For France and Germany, the MA coefficients are not statistically
significantly different from zero, but the AR coefficients are; for Japan it is the
MA coefficient only which is significant, and for Canada neither the MA nor
the AR coefficients are significantly different from zero. For the US, the UK and
Italy, both coefficients are significant.
Table 2.4: Estimates of ARMA parameters of the factor and errors

      ft        εCan      εUS       εJap      εFra      εGer      εIt       εUK
AR    0.8632    0.066     0.4505    0.0618    0.4835    0.4627    0.7165    0.5658
     (0.043)   (0.2038)  (0.1536)  (0.128)   (0.0891)  (0.0922)  (0.0882)  (0.1437)
MA    0.2572   -0.1747    0.642    -0.3108   -0.0971   -0.0764    0.3013    0.3015
     (0.1875)  (0.2111)  (0.1795)  (0.1407)  (0.0885)  (0.0864)  (0.0693)  (0.1264)
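Given ARMA(1,1) parameter estimates, the implied spectra are straightforward to evaluate. The sketch below uses the usual recursion form yt = φ yt−1 + ut + θ ut−1; this sign convention is an assumption (the thesis parameterises the polynomials through the coefficient matrices A and B), and the 2π normalisation constant is omitted.

```python
import numpy as np

def arma11_spectrum(phi, theta, sigma2, omegas):
    """Spectrum of y_t = phi*y_{t-1} + u_t + theta*u_{t-1}, var(u_t) = sigma2,
    evaluated at the frequencies in omegas (up to the 2*pi constant)."""
    z = np.exp(-1j * omegas)
    return sigma2 * np.abs(1 + theta * z) ** 2 / np.abs(1 - phi * z) ** 2

# With the factor estimates of Table 2.4 (phi = 0.8632, theta = 0.2572), the
# spectrum is largest near frequency zero, consistent with Figure 2.3.
```

Evaluating this function on a grid over [0, π] is all that is needed to reproduce plots of the kind shown in Figures 2.2 and 2.3.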
Figure 2.2 shows the estimated ARMA(1,1) spectra of the errors. With
the exception of the US, the variance is concentrated in the higher frequencies.
The estimated ARMA(1,1) spectrum of the factor is shown in Figure 2.3.
Figure 2.2: Estimated ARMA(1,1) spectra of the errors

[Seven panels plotting amplitude against frequency (× π) for Canada, France,
Germany, Italy, Japan, the UK and the US.]
In contrast to most of the errors, the variance of the factor appears to be
concentrated in the lower frequencies.
Figure 2.3: Estimated ARMA(1,1) spectrum of the factor

[A single panel plotting amplitude against frequency (× π), with the mass of
the spectrum concentrated near frequency zero.]
Using a standard property of the complex Gaussian distribution, the expected
value of the Fourier ordinates of the factor is given by

E(fω|yω) = Sfω (βΘω)^H Syω^(−1) yω

Taking the inverse discrete Fourier transform of the series generated by evaluating
this expression at all harmonic frequencies provides an estimate of the
factor, which is presented in Figure 2.4. The relatively high content of low
frequency fluctuation is apparent in this plot.
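This extraction step can be sketched directly, assuming the per-frequency arrays have already been computed; for simplicity the sketch uses NumPy's DFT convention for both the forward and inverse transforms.

```python
import numpy as np

def extract_factor(y_omega, Sf, bTheta, Sx):
    """Evaluate E(f_omega | y_omega) = Sf (beta Theta)^H Sx^{-1} y_omega at
    each harmonic frequency, then invert the DFT to get the factor estimate.

    y_omega: T x N Fourier ordinates of the observables.
    Sf:      T x k x k factor spectra.
    bTheta:  T x N x k values of beta*Theta at each frequency.
    Sx:      T x N x N observable spectra.
    Returns a real T x k time series."""
    f_omega = np.stack([Sf[j] @ bTheta[j].conj().T @ np.linalg.solve(Sx[j], y_omega[j])
                        for j in range(len(y_omega))])
    return np.fft.ifft(f_omega, axis=0).real  # inverse DFT gives the factor estimate
```

In effect this is a frequency-domain Wiener smoother: each Fourier ordinate of the observables is weighted by the signal-to-total spectral ratio before transforming back.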
Figure 2.4: Estimated ARMA(1,1) factor

[A single panel plotting the estimated factor against time, 1970-2005.]
2.6 Concluding Comments
This chapter has made three contributions to the literature on dynamic factor
analysis.

(i) The dynamic factor model was derived as a realisation of a `VARMA plus
noise' model where the VARMA component has reduced spectral rank.
It was shown that mutually uncorrelated autoregressive factors may be
derived from the lowest common denominators of the scalar transfer functions
in each column of the transfer function matrix. It was also shown
that, in cases where the factor polynomials have no common polynomial
factors, the dynamic factor model corresponds to a minimal dimension
state space representation of the VARMA plus noise model (Proposition
1). Since business cycle models often have fewer structural shocks than
observable variables, and macroeconomic variables are measured with
noise, the dynamic factor model with uncorrelated autoregressive factors
is a good choice of model for business cycle analysis and should be viewed
as a competitor to the almost universally used VAR.
(ii) The identification issue was considered for a fairly general class of weakly
stationary dynamic factor model with uncorrelated factors. In cases
where the removal of any row of β(L) leaves rows from which it is possible
to construct two k × k matrices of full rank and another matrix with
at least one row, it was shown that the error spectrum is identified (Theorem
2.2.1) and that the number of dynamic factors is identified (Theorem
2.2.2). It was also shown that, under these conditions, zero-restrictions
similar to those used to identify the static factor model are also identifying
for the dynamic factor model (Theorem 2.2.3). Of most interest
is Theorem 2.2.4, which shows that, under the above rank conditions, if
β(L) is irreducible and the spectra of the factors are linearly independent,
then β(L), the factor spectra, and the error spectra are identified up
to sign changes, reordering, and rescaling of the factors. Consequently,
zero-restrictions are not necessary for identification in many forms of
dynamic factor model, including those with autoregressive factors.
(iii) A frequency domain approach was proposed for the estimation of dynamic
factor models. A simulation exercise suggested that this method
has some computational advantage over the state space scoring algorithms
which are usually used for dynamic factor model estimation.
However, the main attraction of the frequency domain approach is the
relative ease with which a general algorithm can be coded. The existing
time domain algorithms for the estimation of dynamic factor models require
the construction of a state space representation of the model. For
factor models with few lags, this is trivial. However, for more complicated
lag structures, and particularly for ARMA dynamics, this task becomes
more complex, and the construction of a general algorithm which can
handle any specification of model orders is complicated. As shown in
Section 2.3, in the frequency domain a general expression for the covariance
matrix can be written (Equation (2.5)) which makes the evaluation
of the likelihood relatively easy to code.
There exists scope for future research in this area. Since the dynamic
factor model with mutually uncorrelated autoregressive factors is not always
a minimal dimension representation of the `VARMA plus noise' model, it is of
interest to derive a realisation which is. Perhaps the most natural way to do
this is to choose a representation of the VARMA component of the model which
is known to be of minimal dimension. If the representation of the VARMA
model is identified in the absence of additive noise, then Theorems 2.2.1 and
2.2.2 could easily be extended to provide identification of the VARMA plus
noise model. This is possible because the proofs of Theorems 2.2.1 and 2.2.2
do not require the existence of factor structure, but rather are based on the
ranks of submatrices of the spectral density matrix of the common component
β(L)ft. Consequently, these theorems could be rewritten to apply directly to
any representation of the VARMA plus noise model. A greater challenge is
offered by the estimation of the VARMA plus noise model. Even in the absence
of additive noise, VARMA estimation is a non-trivial problem. The addition
of the noise is unlikely to make things easier.
The identification theorems presented in Section 2.2 provide identification
of the factor spectra. In order for the parameters of the factor processes to
be identified, it must be possible for the factor parameters to be uniquely
determined from the second order moments of the factors. One particularly
interesting case where this is not possible is when the factors follow GARCH
processes. In this case, because the factors are serially uncorrelated, their spectra
are not linearly independent, and so Theorem 2.2.4 does not apply. Theorem
2.2.3 provides identification for these models, but requires zero-restrictions
to be imposed, and these may be difficult to justify in practice. It would be
of interest to try to derive a theorem similar to Theorem 2.2.4 that applies to
models with GARCH factors. A possible approach would be to focus on conditional
second moments, rather than unconditional second moments. Given
the importance of multiple factor models in empirical finance, this would be a
worthwhile research project.
The simulations reported in Section 2.4 suggest that the frequency domain
approach to dynamic factor model estimation that was proposed in Section
2.3 has some computational advantage over the traditional state space methods.
However, it should be stressed that, as is the case with the traditional
state space methods, using the frequency domain method to estimate models
with complex lag structures is not a pleasant task. As the number of parameters
grows, the speed with which each iteration is computed slows down
markedly. Practical experience suggests that the number of iterations is also
often larger for models with rich dynamic structures. Furthermore, with large
models, the algorithm often fails to converge, and skilled human intervention
is often needed to achieve convergence. An important task for future research,
then, is to introduce quicker, more robust estimation methods. Currently, subspace
methods look like the most promising approach to explore. It has been
suggested in this chapter that the dynamic factor model is an attractive model
for empirical macroeconomics, which compares favourably with the commonly
used VAR model. However, until better estimation techniques are derived for
dynamic factor models, it is unlikely that they will see wide use.
Appendix 1 Definitions
• A scalar polynomial is said to be monic if and only if its leading term has a coefficient of unity.

• A matrix polynomial is said to be unimodular if and only if its inverse is also a polynomial matrix. It follows that a polynomial matrix is unimodular if and only if its determinant is a non-zero constant.

• If $P(L)$ is an $N \times k$ rational transfer function matrix (i.e. a matrix of rational transfer functions), and if $P(L) = \bar{P}(L) D_R(L)^{-1}$ where $\bar{P}(L)$ and $D_R(L)$ are $N \times k$ and $k \times k$ polynomial matrices respectively, then $D_R(L)$ is a right divisor of $P(L)$. If $P(L) = D_L(L)^{-1}\bar{P}(L)$ where $D_L(L)$ is an $N \times N$ polynomial matrix, then $D_L(L)$ is a left divisor of $P(L)$.

• Let $P$ and $Q$ be a $p \times k$ and a $q \times k$ matrix polynomial respectively. If $P = \bar{P}W$ and $Q = \bar{Q}W$, where $\bar{P}$, $\bar{Q}$ and $W$ are matrix polynomials, then $W$ is a right divisor of $P$ and $Q$. If $P = W\bar{P}$ and $Q = W\bar{Q}$, where $\bar{P}$, $\bar{Q}$ and $W$ are matrix polynomials, then $W$ is a left divisor of $P$ and $Q$.

• Two matrix polynomials $P$ and $Q$, of dimensions $N \times k$ and $q \times k$ respectively, are said to be (left or right) coprime if and only if they have only unimodular common (left or right) divisors.

• A rational transfer function matrix $P(L)$ is said to be proper if $\lim_{L \to \infty} P(L) < \infty$, and strictly proper if $\lim_{L \to \infty} P(L) = 0$.

• A polynomial matrix of full column rank is said to be irreducible if and only if its rows are right coprime (Kailath (1980)).
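The unimodularity condition (a non-zero constant determinant) is easy to check numerically for small cases. The following sketch is my own illustration, not part of the thesis: polynomials are represented as coefficient lists in the lag operator $L$, lowest order first, and a $2 \times 2$ polynomial matrix is unimodular exactly when its determinant has a non-zero constant term and zero coefficients at every positive power of $L$.

```python
import numpy as np

def polymul(a, b):
    # Multiply two polynomials given as coefficient lists (lowest order first).
    return np.convolve(a, b)

def polysub(a, b):
    # Subtract polynomials, padding the shorter coefficient list with zeros.
    n = max(len(a), len(b))
    a = np.pad(a, (0, n - len(a)))
    b = np.pad(b, (0, n - len(b)))
    return a - b

def det2(P):
    # Determinant a*d - b*c of a 2x2 polynomial matrix.
    (a, b), (c, d) = P
    return polysub(polymul(a, d), polymul(b, c))

def is_unimodular2(P, tol=1e-12):
    # Unimodular <=> the determinant is a non-zero constant.
    det = det2(P)
    return abs(det[0]) > tol and np.all(np.abs(det[1:]) < tol)

# [[1, L], [0, 1]] has determinant 1, so it is unimodular
# (its inverse, [[1, -L], [0, 1]], is again polynomial).
U = [[[1.0], [0.0, 1.0]],
     [[0.0], [1.0]]]

# [[1, L], [L, 1]] has determinant 1 - L**2, which depends on L,
# so it is not unimodular.
V = [[[1.0], [0.0, 1.0]],
     [[0.0, 1.0], [1.0]]]
```

The same test extends to larger matrices via cofactor expansion, but the $2 \times 2$ case already illustrates the definition.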
Appendix 2 Proofs of Theorems
Proof of Theorem 2.2.1: Consider the spectral representation of two dynamic factor models $M$ and $M^*$. Without loss of generality, assume that $k^* \leq k$. Under the stated conditions, for any $j = 1, ..., k$ we can order the variables such that

$$\beta_\omega = \begin{pmatrix} \beta_1 \\ \beta_j \\ \beta_2 \\ \beta_3 \end{pmatrix} \quad \text{and} \quad \beta^*_\omega = \begin{pmatrix} \beta^*_1 \\ \beta^*_j \\ \beta^*_2 \\ \beta^*_3 \end{pmatrix}$$

for each value of $\omega$, where $\beta_1$ and $\beta_2$ are full-rank $k \times k$ matrices, $\beta_j$ is a $1 \times k$ vector, $\beta^*_1$ and $\beta^*_2$ are $k \times k^*$ matrices and $\beta^*_j$ is a $1 \times k^*$ vector. The subscript $\omega$ is suppressed in order to simplify the notation.

For each value of $\omega$ we may write

$$\beta_\omega S^f_\omega \beta^H_\omega = \begin{pmatrix} \beta_1 S^f \beta_1^H & \beta_1 S^f \beta_j^H & \beta_1 S^f \beta_2^H & \beta_1 S^f \beta_3^H \\ \beta_j S^f \beta_1^H & \beta_j S^f \beta_j^H & \beta_j S^f \beta_2^H & \beta_j S^f \beta_3^H \\ \beta_2 S^f \beta_1^H & \beta_2 S^f \beta_j^H & \beta_2 S^f \beta_2^H & \beta_2 S^f \beta_3^H \\ \beta_3 S^f \beta_1^H & \beta_3 S^f \beta_j^H & \beta_3 S^f \beta_2^H & \beta_3 S^f \beta_3^H \end{pmatrix}$$

where $\omega$ is again suppressed, and we may write $\beta^*_\omega S^{f*}_\omega \beta^{*H}_\omega$ in a similar fashion.

Consider the $(k+1) \times (k+1)$ submatrix of $\beta_\omega S^f_\omega \beta^H_\omega$ occupying block rows 1 and 2 and block columns 2 and 3,

$$V = \begin{pmatrix} \beta_1 S^f \beta_j^H & \beta_1 S^f \beta_2^H \\ \beta_j S^f \beta_j^H & \beta_j S^f \beta_2^H \end{pmatrix}$$

and the corresponding $(k+1) \times (k+1)$ submatrix of $\beta^*_\omega S^{f*}_\omega \beta^{*H}_\omega$,

$$V^* = \begin{pmatrix} \beta^*_1 S^{f*} \beta_j^{*H} & \beta^*_1 S^{f*} \beta_2^{*H} \\ \beta^*_j S^{f*} \beta_j^{*H} & \beta^*_j S^{f*} \beta_2^{*H} \end{pmatrix}$$

Since $\beta_\omega S^f_\omega \beta^H_\omega$ is of rank $k$, $V$, being $(k+1) \times (k+1)$, must be singular. Hence,

$$|V| = (-1)^k \beta_j S^f \beta_j^H \left|\beta_1 S^f \beta_2^H\right| + f(\beta, S^f) = 0 \quad (2.6)$$

where $f(\cdot, \cdot)$ is a bounded, real-valued function of the elements of its matrix arguments.

If $M$ and $M^*$ are observationally equivalent then $S^x_\omega = S^{x*}_\omega$. Since $S^\varepsilon_\omega$ and $S^{\varepsilon*}_\omega$ are diagonal, it follows that the off-diagonal elements of $\beta^*_\omega S^{f*}_\omega \beta^{*H}_\omega$ are equal to the corresponding elements of $\beta_\omega S^f_\omega \beta^H_\omega$. Thus, $\beta^*_1 S^{f*} \beta_j^{*H} = \beta_1 S^f \beta_j^H$, $\beta^*_1 S^{f*} \beta_2^{*H} = \beta_1 S^f \beta_2^H$ and $\beta^*_j S^{f*} \beta_2^{*H} = \beta_j S^f \beta_2^H$. It follows that

$$V^* = \begin{pmatrix} \beta_1 S^f \beta_j^H & \beta_1 S^f \beta_2^H \\ \beta^*_j S^{f*} \beta_j^{*H} & \beta_j S^f \beta_2^H \end{pmatrix}$$

Similarly, since $V^*$ is a $(k+1) \times (k+1)$ submatrix of a matrix of rank $k^* \leq k$, we have

$$|V^*| = (-1)^k \beta^*_j S^{f*} \beta_j^{*H} \left|\beta_1 S^f \beta_2^H\right| + f(\beta, S^f) = 0 \quad (2.7)$$

Since $\beta_1$, $S^f$ and $\beta_2$ are of full rank, $\left|\beta_1 S^f \beta_2^H\right| \neq 0$. Thus, equations (2.6) and (2.7) yield $\beta^*_j S^{f*} \beta_j^{*H} = \beta_j S^f \beta_j^H$.

Similarly, under Assumption 2.4, it may be shown that the other diagonal elements of $\beta^*_\omega S^{f*}_\omega \beta^{*H}_\omega$ are equal to the corresponding elements of $\beta_\omega S^f_\omega \beta^H_\omega$ for all $\omega$. Thus, $S^\varepsilon_\omega = S^{\varepsilon*}_\omega$ for all $\omega$.
Proof of Theorem 2.2.2: Consider two observationally equivalent factor models $M$ and $M^*$. Without loss of generality, assume that $k^* < k$. From Theorem 2.2.1 we have $\beta_\omega S^f_\omega \beta^H_\omega = \beta^*_\omega S^{f*}_\omega \beta^{*H}_\omega$. Let $\beta_{1\omega}$ be a $k \times k$ full-rank sub-matrix of $\beta_\omega$ and $\beta^*_{1\omega}$ the $k \times k^*$ matrix of corresponding rows from $\beta^*_\omega$. Then we have $\beta_{1\omega} S^f_\omega \beta^H_{1\omega} = \beta^*_{1\omega} S^{f*}_\omega \beta^{*H}_{1\omega}$. Since $\beta_{1\omega}$ is of full rank, $\left|\beta^*_{1\omega} S^{f*}_\omega \beta^{*H}_{1\omega}\right| = \left|\beta_{1\omega} S^f_\omega \beta^H_{1\omega}\right| > 0 \Rightarrow k^* \geq k$, a contradiction.
Proof of Lemma 1: From Theorems 2.2.1 and 2.2.2, $S^\varepsilon_\omega$ and $k$ are identified. Therefore, if $M$ and $M^*$ are observationally equivalent, it must be true that

$$\beta_\omega S^f_\omega \beta^H_\omega = \beta^*_\omega S^{f*}_\omega \beta^{*H}_\omega$$

Under the assumptions we may partition $\beta_\omega$ into a full-rank $k \times k$ matrix $\beta_{1\omega}$ and a $(N-k) \times k$ matrix $\beta_{2\omega}$, and partition $\beta^*_\omega$ similarly. Thus, we have

$$\begin{pmatrix} \beta_{1\omega} \\ \beta_{2\omega} \end{pmatrix} S^f_\omega \begin{pmatrix} \beta^H_{1\omega} & \beta^H_{2\omega} \end{pmatrix} = \begin{pmatrix} \beta^*_{1\omega} \\ \beta^*_{2\omega} \end{pmatrix} S^{f*}_\omega \begin{pmatrix} \beta^{*H}_{1\omega} & \beta^{*H}_{2\omega} \end{pmatrix}$$

a set of four matrix equations, two of which are

$$\beta_{1\omega} S^f_\omega \beta^H_{1\omega} = \beta^*_{1\omega} S^{f*}_\omega \beta^{*H}_{1\omega} \quad (2.8)$$

$$\beta_{2\omega} S^f_\omega \beta^H_{1\omega} = \beta^*_{2\omega} S^{f*}_\omega \beta^{*H}_{1\omega} \quad (2.9)$$

From (2.8) we can write

$$S^{f*}_\omega = M_\omega S^f_\omega M^H_\omega \quad (2.10)$$

where

$$M_\omega = \beta^{*-1}_{1\omega} \beta_{1\omega} \quad (2.11)$$

Since

$$\beta^*_{1\omega} = \beta_{1\omega} M^{-1}_\omega \quad (2.12)$$

and

$$S^f_\omega = M^{-1}_\omega S^{f*}_\omega M^{-H}_\omega$$

from (2.9) we have

$$\beta_{2\omega} M^{-1}_\omega S^{f*}_\omega M^{-H}_\omega \beta^H_{1\omega} = \beta^*_{2\omega} S^{f*}_\omega M^{-H}_\omega \beta^H_{1\omega}$$

so

$$\beta^*_{2\omega} = \beta_{2\omega} M^{-1}_\omega \quad (2.13)$$

Stacking (2.12) and (2.13) gives

$$\beta^*_\omega = \beta_\omega M^{-1}_\omega \quad (2.14)$$

If $\beta$ and $\beta^*$ are irreducible then, since $\beta_1$ and $\beta^*_1$ are constructed from rows of $\beta$ and $\beta^*$ respectively, it follows that $\beta$ and $\beta_1$ are right coprime and that $\beta^*$ and $\beta^*_1$ are right coprime. Thus, from the Simple Bezout Identity (Kailath (1980), section 6.3) there exist polynomial matrices $X_{1\omega}$, $X_{2\omega}$, $X_{3\omega}$ and $X_{4\omega}$ such that

$$X_{1\omega} \beta_\omega + X_{2\omega} \beta_{1\omega} = I \quad (2.15)$$

$$X_{3\omega} \beta^*_\omega + X_{4\omega} \beta^*_{1\omega} = I \quad (2.16)$$

Substituting (2.14) and (2.11) into (2.15) and (2.16) yields $M^{-1}_\omega = X_{1\omega} \beta^*_\omega + X_{2\omega} \beta^*_{1\omega}$ and $M_\omega = X_{3\omega} \beta_\omega + X_{4\omega} \beta_{1\omega}$. Since $X_{1\omega}$, $X_{2\omega}$, $X_{3\omega}$, $X_{4\omega}$, $\beta_\omega$, $\beta_{1\omega}$, $\beta^*_\omega$ and $\beta^*_{1\omega}$ are polynomial matrices, $M_\omega$ and $M^{-1}_\omega$ are polynomial matrices. It follows that $M_\omega$ is unimodular.
Proof of Theorem 2.2.3: A permutation matrix is defined as a square matrix in which each column and each row has exactly one element with a value of unity; all other elements are zero.

From Theorem 2.2.2 and Lemma 1 we need only consider $k$-factor models of the form

$$S^x_\omega = \beta^*_\omega S^{f*}_\omega \beta^{*H}_\omega + S^\varepsilon_\omega$$

where $\beta^*_\omega = \beta_\omega M^{-1}_\omega$, $S^{f*}_\omega = M_\omega S^f_\omega M^H_\omega$ and $M_\omega$ is a $k \times k$ unimodular operator.

Define $\beta_{1\omega}$ and $\beta^*_{1\omega}$ as $k \times k$ lower triangular matrices constructed from rows of $\beta_\omega$ and $\beta^*_\omega$ respectively. We may then write $M_\omega = \beta^{*-1}_{1\omega} \beta_{1\omega}$. Since $\beta^*_{1\omega}$ is lower triangular, so is $\beta^{*-1}_{1\omega}$; it follows that $M_\omega$ is also lower triangular. Now consider $S^{f*}_\omega = M_\omega S^f_\omega M^H_\omega$. Since $M_\omega$ is lower triangular and $S^f_\omega$ and $S^{f*}_\omega$ are diagonal, we have $M^{-1}_\omega S^{f*}_\omega = S^f_\omega M^H_\omega$, where the left-hand side is lower triangular and the right-hand side is upper triangular. It follows that $M_\omega$ is diagonal. Since, from Lemma 1, $M_\omega$ is also unimodular, it must be a constant. Thus, $S^{f*}_\omega$ is a rescaling of $S^f_\omega$.
Proof of Theorem 2.2.4: From Lemma 1,

$$\sum_{j=0}^{\max_{i,j} q^*_{ij}} \beta^*_j e^{-ij\omega} = \sum_{l=0}^{s} \sum_{j=0}^{\max_{i,j} q_{ij}} \beta_j M_l e^{-i(j+l)\omega}$$

for finite, non-negative $s$. Matching terms, it is clear that the above equality can hold only if $\max_{i,j} q^*_{ij} \geq \max_{i,j} q_{ij}$. The unimodularity of $M_\omega$ under the assumed conditions allows us to argue similarly that $\max_{i,j} q_{ij} \geq \max_{i,j} q^*_{ij}$. Therefore $\max_{i,j} q^*_{ij} = \max_{i,j} q_{ij}$, and we may match the terms in the above equation and solve to find that $M_\omega = M$, a real constant. The factor spectra of the set of observationally equivalent models are therefore $S^{f*}_\omega = M S^f_\omega M'$. For this equality to hold, it is required that $(M_i \circ M_j)\,\mathrm{diag}(S^f_\omega) = 0$ for $i \neq j$, where $M_l$ is the $l$th row of $M$ and $\circ$ denotes the Hadamard product. Therefore, under Assumption 1.2, $(M_i \circ M_j) = 0$ for $i \neq j$. Thus, $MM'$ is diagonal. This allows us to write $M = M_d^{-\frac{1}{2}} M^o$, where $M_d$ is a $k \times k$ non-singular diagonal matrix and $M^o$ is a $k \times k$ orthogonal matrix. We then have $M^o S^f_\omega M^{oH} = M_d S^{f*}_\omega$. The diagonality of the right-hand side implies that $M_d S^{f*}_\omega$ is the matrix of eigenvalues of $S^f_\omega$. Since $S^f_\omega$ is diagonal, $M_d = I_k$ and $S^{f*}_\omega$ is the spectrum of a permutation of the factor vector $f_t$.
Chapter 3
Principal Components Estimation
of Large-Scale Factor Models
The dynamic factor model described in Chapter 2 is sufficiently parsimonious to be used in cases in which the number of variables is somewhat larger than what might normally be used to estimate a vector autoregression. However, this does not mean that it is suitable no matter how large the number of variables. In cases in which the number of variables is of the same order of magnitude as the number of observations, three problems exist. Firstly, the computational cost of the estimation procedure in Chapter 2 becomes prohibitive. With the exception of models with extremely crude dynamics, maximum likelihood estimation of dynamic factor models becomes computationally infeasible quite quickly as the number of variables in the model grows. Secondly, the assumption that the spectrum of the error process is diagonal becomes harder to believe the larger the number of variables in the model. The magnitude of the bias caused by the presence of cross-correlation between the errors is apparently unknown. While it might be expected that this bias would be small in cases where there was only a small amount of error cross-correlation, there is no good general reason to expect the amount of correlation to remain small as the number of variables grows. The third problem is that the standard asymptotic theory for likelihood estimation assumes that the number of variables is fixed while the number of observations goes to infinity. A more appropriate asymptotic model, in the case in which the number of variables is of the same order of magnitude as the number of observations, is one in which the number of variables and the number of observations diverge jointly.
In recent years there has been much interest in estimating the parameters of large-scale factor models using principal components techniques. An obvious advantage of this approach is the relative ease with which the eigenvalues and eigenvectors of large symmetric matrices can be computed. In particular, when computing the first $k$ sample principal components of a sequence of $T$ $N \times 1$ random vectors, it is not required that $T > N$. There now exists a substantial body of asymptotic theory that considers the behaviour of principal component based estimators as $(T, N) \longrightarrow (\infty, \infty)$. Consider the static factor model

$$x_t = B f_t + \varepsilon_t$$

$$y_{t+h} = \beta' f_t + \gamma' z_t + \eta_t$$

where $x_t$ is a $N \times 1$ vector of observable variables, $y_t$ is a scalar observable variable, $z_t$ is a $m \times 1$ vector of predetermined variables (which may include lags of the dependent variable), $f_t$ is a $k \times 1$ vector of unobservable factors, $\varepsilon_t$ is a $N \times 1$ vector of unobservable errors, $\eta_t$ is a scalar unobservable error, $B$ is a $N \times k$ matrix of non-random factor loadings, and $\beta$ and $\gamma$ are vectors of regression coefficients. Let $\hat{s}_{ft}$ be a vector containing the sample principal components corresponding to the first $k$ sample eigenvalues of $S_{xx} = \frac{1}{T}\sum_{t=1}^{T} x_t x_t'$, let $\hat{\delta} = \left(\hat{\beta}' \; \hat{\gamma}'\right)'$ be the OLS estimator of $\delta = (\beta' \; \gamma')'$ computed using the sample principal components in place of the unobservable factors, and let $\hat{y}^s_{T+h} = \hat{\delta}'\left(\hat{s}'_{fT} \; z'_T\right)'$ be an estimator of the infeasible population forecast $\delta'(f'_T \; z'_T)'$. Stock and Watson (2002a) prove that, under certain conditions, as $(T, N) \longrightarrow (\infty, \infty)$, $\hat{s}_{ft} \stackrel{p}{\longrightarrow} L f_t$,

$$\hat{\delta} \stackrel{p}{\longrightarrow} \begin{pmatrix} L & 0 \\ 0 & I_m \end{pmatrix} \delta$$

and $\hat{y}^s_{T+h} \stackrel{p}{\longrightarrow} \delta'(f'_T \; z'_T)'$, where $L$ is a $k \times k$ sign matrix¹. Under slightly
different conditions, Bai and Ng (2002) prove that, for a fixed value of $t$, $\min(N, T)\,\|\hat{s}_{ft} - H_{N,T} f_t\|^2 = O_p(1)$, where $H_{N,T}$ is a sequence of non-singular matrices. Bai (2003) shows that under similar conditions, as $(T, N) \longrightarrow (\infty, \infty)$, if $\frac{\sqrt{N}}{T} \longrightarrow 0$ then $\sqrt{N}\left(\hat{s}_{ft} - H_{N,T} f_t\right)$ converges to a Gaussian distribution, and that if $\frac{\sqrt{N}}{T} \geq \tau > 0$ then $T\left(\hat{s}_{ft} - H_{N,T} f_t\right) = O_p(1)$, where the value of $t$ is fixed. When $\frac{\sqrt{T}}{N} \longrightarrow 0$ he shows that $\sqrt{T}\left(\hat{\lambda}_i^{\frac{1}{2}} \hat{q}_i - H^{-1}_{NT} b_i\right)$ converges to a Gaussian distribution, where $\hat{\lambda}_i$ is the $i$th eigenvalue of $S_{XX} = \frac{1}{TN} X'X$, $\hat{q}_i$ is the corresponding eigenvector, and $b_i$ is the $i$th row of $B$. If $\frac{\sqrt{T}}{N} \geq \tau > 0$ then $N\left(\hat{\lambda}_i^{\frac{1}{2}} \hat{q}_i - H^{-1}_{NT} b_i\right) = O_p(1)$. Bai (2003) also proves asymptotic Gaussianity for the estimator of the common component and provides a uniform bound on the factors of $\max_{1 \leq t \leq T} \left\|\hat{s}_{ft} - H_{N,T} f_t\right\|^2 = O_p\left[\max\left(T^{-\frac{1}{2}}, \frac{\sqrt{T}}{N}\right)\right]$. Bai and Ng (2006) also prove that, as $(T, N) \longrightarrow (\infty, \infty)$, if $\frac{\sqrt{T}}{N} \longrightarrow 0$ then $\sqrt{T}\left(\hat{\delta} - \delta\right)$ converges to a Gaussian distribution, and that if $\frac{\sqrt{N}}{T} \longrightarrow 0$ then $\frac{y_{T+h} - \hat{y}_{T+h}}{\sqrt{\mathrm{var}(\hat{y}_{T+h})}} \stackrel{d}{\longrightarrow} N(0, 1)$, where $\hat{y}_{T+h}$ is a forecast of $y_{T+h}$ computed using the OLS estimates and the principal components estimator of the factor.

¹ That is, a $k \times k$ diagonal matrix with each diagonal element either 1 or −1.
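As a concrete illustration of the estimator described above, the following numpy sketch computes the first $k$ sample principal components from the eigen-decomposition of $S_{xx}$ and uses them in the forecasting regression. It is a minimal sketch under a simulated design of my own (no $z_t$ predictors and arbitrary parameter values), not code from any of the cited papers.

```python
import numpy as np

rng = np.random.default_rng(0)
T, N, k = 200, 50, 2

# Simulate a static factor model x_t = B f_t + eps_t and a target
# series y (conceptually y_{t+h}, aligned with f_t).
f = rng.standard_normal((T, k))
B = rng.standard_normal((N, k))
x = f @ B.T + rng.standard_normal((T, N))
beta = np.array([1.0, -0.5])
y = f @ beta + 0.1 * rng.standard_normal(T)

# Principal components from the eigen-decomposition of S_xx = (1/T) X'X.
Sxx = x.T @ x / T
eigval, eigvec = np.linalg.eigh(Sxx)       # eigh returns ascending order
order = np.argsort(eigval)[::-1]
lam_f = eigval[order[:k]]                  # first k sample eigenvalues
Q_f = eigvec[:, order[:k]]                 # corresponding eigenvectors
s_f = x @ Q_f / np.sqrt(lam_f)             # sample PCs: Lambda^{-1/2} Q' x_t

# OLS of y on the estimated factors, then the period-T forecast.
delta_hat, *_ = np.linalg.lstsq(s_f, y, rcond=None)
y_forecast = delta_hat @ s_f[-1]
```

The estimated factors are identified only up to the sign matrix $L$ discussed in the text, which is why any comparison with the simulated $f_t$ has to allow for sign flips.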
Forni et al. (2000) and Forni et al. (2004) consider the use of dynamic principal components techniques to estimate the dynamic factor model

$$x_t = B(L) f_t + \varepsilon_t$$

They prove consistency as $(T, N) \longrightarrow (\infty, \infty)$. However, since this chapter does not consider the dynamic model, further details of their findings will not be provided here.
An important feature of the results outlined above is that they apply to the `approximate' factor model, which allows for a degree of cross-correlation between the errors. Specifically, if $\Psi = E(\varepsilon_t \varepsilon_t')$ and $\Psi_{ij}$ is the $i,j$th element of $\Psi$, then Stock and Watson (2002a) and Bai and Ng (2002) assume that there exists a finite bound $M$ such that $\frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{N} |\Psi_{ij}| < M$. Bai (2003) and Bai and Ng (2006) make the slightly stronger assumption that $\sum_{j=1}^{N} |\Psi_{ij}| < M$ for all $i$, i.e. the absolute row sums of the error covariance matrix are uniformly bounded. Forni et al. (2000) and Forni et al. (2004) assume that the dynamic eigenvalues of the error spectrum are uniformly bounded, which is implied by a similar absolute summability condition on the rows of the error spectral density matrix. Clearly these are weaker assumptions than the diagonal error covariance that is assumed for the `strict' factor model. However, it is possible that the `approximate' factor model could still be too restrictive for the type of macroeconomic analysis that has appeared in the literature. It is not hard to believe that increasing the number of variables in a factor model would cause the absolute row sums of the error covariance to increase without a fixed bound, rendering the existing theory inapplicable.
Empirical support for principal components estimation of macroeconomic factor models is encouraging, but not uniformly so. Some studies, e.g. Stock and Watson (2002b), Brisson et al. (2003), Schneider and Spitzer (2004) and Camacho and Sancho (2003), find large improvements in macroeconomic forecasting performance from the use of factors, with mean squared forecasting errors reduced by over 40% compared to scalar autoregressions in some cases. Others, e.g. Angelini et al. (2001), Giacomini and White (2003), Eklund and Karlsson (2007), Schumacher (2005) and Banerjee and Marcellino (2006), find little benefit from the use of principal component estimates of factors in forecasting models. Interestingly, there also exists some evidence that it is possible for the number of variables in the factor model to be too large. Boivin and Ng (2006) find that 40 carefully chosen variables can yield better results than 147 variables when forecasting 8 measures of economic activity and inflation for the US. Inklaar et al. (2003) consider the construction of a coincident indicator for the Euro area and find that a factor model estimated using 38 carefully chosen macroeconomic variables produces an indicator that is at least as good as that produced by a factor model estimated using their entire database of 246 variables. Schneider and Spitzer (2004) consider forecasting Austrian GDP using a dynamic factor model estimated by dynamic principal components. They find that models that include only 5 to 11 variables perform significantly better than a model with 143 variables. den Reijer (2005) considers using a dynamic factor model of 370 variables to forecast Dutch GDP, but finds that models of 147 and 223 carefully chosen variables perform better.
Boivin and Ng (2006) also perform Monte Carlo simulations with different degrees of error cross-correlation and with varying factor loadings, and demonstrate that increasing the number of variables in the factor model can worsen the performance of forecasts and factor estimators in some cases in which the degree of error cross-correlation grows with $N$. Despite the importance of these simulation results, to my knowledge there is yet to appear any theory that deals with cases in which the absolute row sums of the error covariance matrix grow without a fixed bound as the number of variables increases. Furthermore, it is not clear how applied researchers might measure the degree of error cross-correlation in order to make a judgement about the suitability of the principal components estimator in any particular application. Given the patchy empirical performance of large-scale factor models, this is an important area for theoretical research.
Section 3.1 of this chapter presents some new theory for principal component estimators of static factor quantities in a setting in which $(T, N) \longrightarrow (\infty, \infty)$. Consistency is proved under conditions which allow the absolute row sums of the error covariance matrix to grow at a rate of $O(N^{1-\alpha})$ where $0 < \alpha \leq 1$. However, the faster the growth in error cross-correlation, the slower is the rate of convergence of the estimators. Furthermore, it is shown that what really matters for estimation is not the number of variables per se, but rather the magnitude of the noise-to-signal ratio, which is measured as $\frac{\sigma^2}{\lambda_k}$, where $\sigma^2$ is the largest eigenvalue of the error covariance matrix $\Psi$, and $\lambda_k$ is the $k$th eigenvalue of $\Omega = E\left(\frac{1}{T}X'X\right)$. When the noise-to-signal ratio is small, population principal component and population factor quantities will be similar. While the noise-to-signal ratio is not generally identified, it is possible to construct lower bounds for it which are. In Section 3.2 a hypothesis test for the magnitude of the noise-to-signal ratio is proposed. This test is developed in a framework in which $N$ is fixed and $T \longrightarrow \infty$, rather than one in which $(N, T) \longrightarrow (\infty, \infty)$ jointly. However, simulations show that it performs reasonably well in some cases in which $N$ is large compared to the number of observations. In Section 3.3 this test is used to consider whether the factor model estimated by Stock and Watson (2002b) has a small noise-to-signal ratio.
3.1 Theory

The approach taken in this chapter is to consider the decomposition

$$\hat{s}_{ft} - L f_t = (\hat{s}_{ft} - L_1 s_{ft}) + (L_1 s_{ft} - L f_t) \quad (3.1)$$

where $\hat{s}_{ft}$ is a vector containing the first $k$ sample principal components of the observable vector $x_t$, $s_{ft}$ is the vector containing the corresponding population principal components, $f_t$ is the $k \times 1$ factor vector, and $L$ and $L_1$ are $k \times k$ sign matrices. That is, the difference between sample principal components and population factors will be considered by separate consideration of the difference between sample principal components and population principal components, and the difference between population principal components and population factors. This section presents four sets of theoretical results which are relevant to this decomposition.
(i) In Subsection 3.1.1 the work of Schneeweiss (1997) is extended to provide a set of bounds on the `distance' between population principal component quantities and their analogous factor quantities (Theorem 3.1.1). In particular, an upper bound is placed on $E\|s_{ft} - Lf_t\|_F^2$ (Theorem 3.1.1(c)). The key parameter in these bounds is the noise-to-signal ratio ($\rho$), which is measured as

$$\rho = \frac{\sigma^2}{\lambda_k}$$

where $\sigma^2$ is the largest eigenvalue of the error covariance matrix $\Psi$, and $\lambda_k$ is the $k$th eigenvalue of $\Omega = E\left(\frac{1}{T}X'X\right)$. When the noise-to-signal ratio is small, population principal component and population factor quantities will be similar. It is well-known that for a matrix $A$, $\|A\|_2^2 \leq \|A\|_1 \|A\|_\infty$. Since $\Psi$, the covariance matrix of the errors, is symmetric, the following lemma is true.

Lemma 2. $\sigma^2 \leq \max_i \sum_{j=1}^{N} |\Psi_{ij}|$

This lemma will be used subsequently to provide a link between the noise-to-signal ratio and bounding conditions on the absolute row sums of the error covariance similar to those which have been employed in the literature.

(ii) In Subsection 3.1.2 some conditions are presented under which the noise-to-signal ratio shrinks as $N$ grows (Theorem 3.1.2).
(iii) In Subsection 3.1.3 the relationship between sample principal components and population principal components is considered. Under conditions whereby the distance between each of the first $k$ eigenvalues and all other eigenvalues grows at a rate of $N$, it is shown (Theorem 3.1.3) that sample principal component quantities are $\sqrt{T}$-consistent estimators of population principal component quantities in a framework in which $(N, T) \longrightarrow (\infty, \infty)$ jointly.

(iv) In Subsection 3.1.4 the results in Subsections 3.1.1, 3.1.2 and 3.1.3 are used with Equation (3.1) to develop results (Theorem 3.1.4) which give conditions under which sample principal component quantities are consistent estimators of population factor quantities.
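Lemma 2 is straightforward to check numerically: for a symmetric matrix the largest eigenvalue never exceeds the maximum absolute row sum. The following small sketch is my own illustration, with an arbitrary simulated error covariance:

```python
import numpy as np

rng = np.random.default_rng(1)

# Build a random symmetric positive semi-definite "error covariance",
# playing the role of Psi in the text.
A = rng.standard_normal((8, 8))
Psi = A @ A.T

sigma2 = np.linalg.eigvalsh(Psi).max()     # largest eigenvalue of Psi
row_bound = np.abs(Psi).sum(axis=1).max()  # max_i sum_j |Psi_ij|

# Lemma 2: sigma2 <= row_bound for symmetric Psi.
print(sigma2 <= row_bound)  # True
```

The bound holds for every symmetric matrix (the spectral radius is dominated by the matrix infinity-norm), which is why it can tie the noise-to-signal ratio to the row-sum conditions used in the literature.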
The most commonly used notation is defined below. Less frequently used notation will be defined when it is used.

The factor model, and the factor regression equation, are written as

$$x_t = B f_t + \varepsilon_t \quad (3.2)$$

$$y_t = \beta' f_t + \eta_t \quad (3.3)$$

where $x_t$ is a $N \times 1$ vector of observed variables and $y_t$ is a scalar observed variable. $\varepsilon_t$ is a $N \times 1$ error vector, and $\eta_t$ is a scalar error term. $f_t$ is a $k \times 1$ factor vector, $B$ is a $N \times k$ matrix of factor loadings, and $\beta$ is a $k \times 1$ regression vector. All random variables are assumed to have expected values of zero. Without loss of generality, it is assumed that $E(f_t f_t') = I_k$. The covariance matrix of $x_t$ is denoted $\Omega$ and the covariance matrix of $\varepsilon_t$ is denoted $\Psi$. Therefore,

$$\Omega = BB' + \Psi$$

$\Lambda$ is a $N \times N$ diagonal matrix containing the eigenvalues of $\Omega$ in descending order, $\lambda_1, \lambda_2, ..., \lambda_N$. $Q$ is the $N \times N$ matrix whose columns are the eigenvectors corresponding to the diagonal elements of $\Lambda$. $\Lambda$ is partitioned so that the top left $k \times k$ block $\Lambda_f$ contains the first $k$ eigenvalues. The remaining $N - k$ eigenvalues are contained in the lower right block $\Lambda_\perp$. $Q$ is similarly partitioned into the $N \times k$ matrix $Q_f$ and the $N \times (N-k)$ matrix $Q_\perp$. Therefore

$$\Lambda = \begin{pmatrix} \Lambda_f & 0 \\ 0 & \Lambda_\perp \end{pmatrix} \quad \text{and} \quad Q = \begin{pmatrix} Q_f & Q_\perp \end{pmatrix}$$

The population principal component vector $s_t$ is partitioned into the first $k$ principal components $s_{ft}$ and the remaining $N - k$ principal components $s_{\perp t}$. Therefore we have

$$s_t = \begin{pmatrix} s_{ft} \\ s_{\perp t} \end{pmatrix} = \Lambda^{-\frac{1}{2}} Q' x_t$$

An OLS estimate of the regression coefficient in Equation (3.3) computed using the population factors is

$$\beta_f = \frac{1}{T} \sum_{t=1}^{T} f_t y_t$$

The regression coefficient computed using the population principal components in place of the factors is

$$\beta_s = \frac{1}{T} \sum_{t=1}^{T} s_{ft} y_t$$

The population forecast for period $T + h$ computed at time $T$ using the factors is defined as

$$y^f_{T+h} = \beta_f' f_{T+h}$$

The forecast for period $T + h$ computed at time $T$ using the population principal components instead of the factors is defined as

$$y^s_{T+h} = \beta_s' s_{fT+h}$$

The $k \times k$ diagonal matrix containing the eigenvalues of $BB'$ in descending order, $d_1, d_2, ..., d_k$, is denoted $D$. The $N \times k$ matrix containing as columns the corresponding eigenvectors is denoted $U$. Therefore

$$BB' = UDU'$$

The largest eigenvalue of the error covariance matrix $\Psi$ is denoted $\sigma^2$. The noise-to-signal ratio ($\rho$) is defined as

$$\rho = \frac{\sigma^2}{\lambda_k} = \frac{\mathrm{maxeig}(\Psi)}{\lambda_k}$$

The sample covariance matrix of $x_t$ is denoted

$$S_{xx} = \frac{1}{T} \sum_{t=1}^{T} x_t x_t'$$

Sample quantities that are derived from $S_{xx}$ are denoted by the same notation used for their population counterparts, but with a `hat' to indicate that they are sample estimates. Therefore, the eigenvalues and eigenvectors of $S_{xx}$ are given by

$$\hat{\Lambda} = \begin{pmatrix} \hat{\Lambda}_f & 0 \\ 0 & \hat{\Lambda}_\perp \end{pmatrix} \quad \text{and} \quad \hat{Q} = \begin{pmatrix} \hat{Q}_f & \hat{Q}_\perp \end{pmatrix}$$

the sample principal components are given by

$$\hat{s}_t = \begin{pmatrix} \hat{s}_{ft} \\ \hat{s}_{\perp t} \end{pmatrix} = \hat{\Lambda}^{-\frac{1}{2}} \hat{Q}' x_t$$

the sample principal component regression estimator is

$$\hat{\beta}_s = \frac{1}{T} \sum_{t=1}^{T} \hat{s}_{ft} y_t$$

and the sample forecast is

$$\hat{y}^s_{T+h} = \hat{\beta}_s' \hat{s}_{fT+h}$$
3.1.1 Population Principal Components and Population Factors

In this subsection the relationship between population principal component quantities and population factor quantities is considered. It answers a fundamental question which has not received direct attention in the econometric literature on large factor models; that is: under what conditions are factors and principal components similar? While the assumptions made by Bai and Ng (2002), Bai (2003), Bai and Ng (2006) and Stock and Watson (2002a) constitute an answer to this question, their theoretical setup creates the need to deal with sampling issues in a dual asymptotic framework, which somewhat obscures the fundamental issue at hand. A consideration of population quantities alone simplifies the problem and allows us to provide a concise and explicit answer to this question.
Bearing in mind that principal component analysis and factor analysis are widely used techniques which have co-existed for around 75 years, it is surprising that this question has received so little attention in the literature. Nonetheless, some researchers have considered this issue in the past. In the context of a generalisation of the results on arbitrage pricing and factor structure of Ross (1976), Chamberlain and Rothschild (1983) show that for an approximate factor model, the eigenvectors of the population covariance matrix of $x_t$ are asymptotically equivalent to the population factor loadings as $N$ gets large. Bentler and Kano (1990) consider a single factor model and show that as $N \to \infty$ the correlation between the first population principal component and the population factor converges to one, and the principal component loading vector converges to the factor loading vector. In a significant paper, Schneeweiss and Mathes (1995) consider a $k$-factor static model and analyse the sum of the canonical correlation coefficients between the population factors and the population principal components. They show that this sum approaches $k$ as $\frac{\sigma^2}{d_k} \longrightarrow 0$, where $\sigma^2$ is the largest eigenvalue of $\Psi$ and $d_k$ is the smallest eigenvalue of $B'B$. They also prove similar results for the factor loadings and the principal component loadings. Under similar conditions, Schneeweiss (1997) proves that $\left\|BD^{-\frac{1}{2}} - Q_f L\right\|_F \longrightarrow 0$ and $E\|s_{ft} - Lf_t\|_F \longrightarrow 0$, where $D$ is a diagonal $k \times k$ matrix containing the ordered eigenvalues of $B'B$, $Q_f$ is the $N \times k$ matrix containing the eigenvectors of $\Omega = E(x_t x_t')$ corresponding to the first $k$ eigenvalues, $s_{ft}$ is a vector containing the first $k$ principal components of $x_t$, $L$ is a $k \times k$ sign matrix² and $\|.\|_F$ denotes the Frobenius norm. The work of Schneeweiss (1997) is significant since, in the long history of the principal component analysis and factor analysis literatures, it is the first paper to provide a detailed account of the `distance' between principal components and factors. Since it deals only with population quantities it has nothing explicit to say about sampling issues. However, it provides substantial insight into the structure of the relationship between principal components and factors, which is subsequently used in this chapter to develop a sampling theory.

² i.e. the diagonal elements of $L$ are all ±1.
The remainder of this subsection presents some new theory linking population principal component quantities to their analogous population factor quantities. In contrast to the asymptotic theorems of Schneeweiss (1997), the results produced in this subsection are bounds on the distances between principal component and factor quantities that hold for any number of variables.

Consider the factor model given by equations (3.2) and (3.3). Let

$$r^2 = \frac{\|\beta\|^2}{\|\beta\|^2 + \sigma^2_\eta}$$

where $\sigma^2_\eta = E(\eta_t^2)$. Note that $r^2$ is the proportion of the variance of $y_t$ that is explained by the factors. Therefore, it may be interpreted as the population analogue of the $R^2$ statistic from regression analysis. Denote

$$\delta = \sum_{j=1}^{\infty} \left| \frac{E(y_t y_{t-j})}{E(y_t^2)} \right| + \sup_i \sum_{j=1}^{\infty} \left| \frac{E(y_t s_{i,t-j})}{\sqrt{E(y_t^2)\,E(s_{it}^2)}} \right|$$

where $s_{it}$ is the $i$th principal component measured at time $t$. This variable appears in one of the bounds that is subsequently derived. Also denote

$$c = \max_{\substack{1 \leq i \leq k,\; 1 \leq j \leq N \\ i \neq j}} \frac{\lambda_i}{|\lambda_j - \lambda_i|}$$

for $k > 1$, and $c = 0$ for $k = 1$. Note that $c$ provides a measure of the relative closeness of adjacent eigenvalues. We define the forecast deviation as $e_{T+h} = y^s_{T+h} - y^f_{T+h} = \beta_s' s_{fT+h} - \beta_f' f_{T+h}$.

Theorem 3.1.1 presents a set of bounds for the differences between principal component quantities and their corresponding factor model quantities. In each case the bound is a function of the noise-to-signal ratio $\rho = \frac{\sigma^2}{\lambda_k}$. Consequently, provided that the noise-to-signal ratio is sufficiently small, an analysis of population factor quantities may be undertaken by considering the analogous principal component quantities. All proofs are in the appendix.
Theorem 3.1.1. For the factor model described above:

(a) $1 - \rho \leq \frac{d_i}{\lambda_i} \leq 1$ for $i = 1, .., k$.

(b) if $k = 1$ or $c \leq \frac{1}{2\rho}\sqrt{\frac{1-\rho}{k-1}}$, then there exists a sign matrix $L$ such that $\|Q_f - UL\|_F^2 \leq k\left(\rho + 4c^2\rho^2(k-1)\right)$.

(c) if $k = 1$ or $c \leq \frac{1-\rho}{2\rho\sqrt{(k-1)(1-\rho)}}$, then there exists a sign matrix $L$ such that $E\|s_{ft} - Lf_t\|_F^2 \leq k\left(2\rho + \rho^2\left(4c^2(k-1)(1-\rho) - 1\right)\right)$.

(d) if $k = 1$ or $c \leq \frac{1-\rho}{2\rho\sqrt{(k-1)(1-\rho)}}$, then there exists a sign matrix $L$ such that $E\|\beta_s - L\beta_f\|_F^2 \leq \sigma_y^2\, k\left(2\rho + \rho^2\left(4c^2(k-1)(1-\rho) - 1\right)\right)$.

(e) if $f_t$, $\varepsilon_t$ and $\eta_t$ are Gaussian and $\gamma < \infty$, then $\frac{E|e_{T+h}|}{\sqrt{\beta'\beta}} \leq \sqrt{k}\left(\rho^2 + \frac{2k\delta}{r^2 T}\right) + \sqrt{k}\,\rho\left(1 + \frac{2\gamma}{r^2 T}\right)$.
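Bound (a) is easy to check numerically, since $d_i \leq \lambda_i \leq d_i + \sigma^2$ follows from Weyl's eigenvalue inequalities. The following sketch is my own illustration with an arbitrary simulated $B$ and $\Psi$:

```python
import numpy as np

rng = np.random.default_rng(2)
N, k = 30, 3

# Omega = B B' + Psi, with Psi symmetric positive semi-definite.
B = rng.standard_normal((N, k)) * 3.0
A = rng.standard_normal((N, N)) * 0.3
Psi = A @ A.T
Omega = B @ B.T + Psi

d = np.sort(np.linalg.eigvalsh(B @ B.T))[::-1][:k]  # d_1 >= ... >= d_k
lam = np.sort(np.linalg.eigvalsh(Omega))[::-1]      # lambda_1 >= ...
sigma2 = np.linalg.eigvalsh(Psi).max()              # largest eigenvalue of Psi
rho = sigma2 / lam[k - 1]                           # noise-to-signal ratio

# Theorem 3.1.1(a): 1 - rho <= d_i / lambda_i <= 1 for i = 1..k.
ratios = d / lam[:k]
ok_upper = np.all(ratios <= 1 + 1e-9)
ok_lower = np.all(ratios >= 1 - rho - 1e-9)
print(ok_upper, ok_lower)  # True True
```

Because both inequalities in (a) are consequences of Weyl's inequalities, the check passes for any choice of loadings and positive semi-definite error covariance, not just this simulated design.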
The following asymptotic results follow trivially from these theorems:

Corollary 1.

(a) $\frac{d_i}{\lambda_i} \to 1$ as $\rho \to 0$;

(b) If $c < \bar{c} < \infty$ then there exists a sign matrix $L$ such that $\|Q_f - UL\|_F \to 0$ as $\rho \to 0$;

(c) If $c < \bar{c} < \infty$ then there exists a sign matrix $L$ such that $\|s_{ft} - Lf_t\|_F \stackrel{p}{\longrightarrow} 0$ as $\rho \to 0$;

(d) If $c < \bar{c} < \infty$ then there exists a sign matrix $L$ such that $\left\|Q_f \Lambda_f^{\frac{1}{2}} - BL\right\|_F \to 0$ as $\rho \to 0$;

(e) If $c < \bar{c} < \infty$ then there exists a sign matrix $L$ such that $\|\beta_s - L\beta_f\|_F \stackrel{p}{\longrightarrow} 0$ as $\rho \to 0$;

(f) If $\delta < \bar{\delta} < \infty$ then $e_{T+h} \stackrel{p}{\longrightarrow} 0$ as $\left(\frac{1}{T}, \rho\right) \to (0, 0)$.
Corollaries 1(a), 1(b) and 1(c) were previously proved by Schneeweiss (1997). Corollaries 1(d), 1(e) and 1(f) are new. Importantly, Theorem 3.1.1 is new, and provides rates of convergence which are necessary for subsequent theorems.

Note that, in order to be non-trivial, Theorems 3.1.1(b), 3.1.1(c) and 3.1.1(d) require the first $k$ eigenvalues of $\Omega$ to be distinct so that $c$ is bounded. The distance between the relevant quantities in these theorems depends on the closeness of adjacent eigenvalues and on the noise-to-signal ratio. Theorem 3.1.1(e) assumes Gaussianity. It is quite likely that this assumption could be replaced by the assumption of an upper bound on sums of fourth moments. However, Gaussianity produces a result which is more easily interpretable. In any case, in the asymptotic arguments that follow, the assumption of Gaussianity will not be needed. In order for the principal component forecast and the theoretically optimal forecast to be close, we need the noise-to-signal ratio to be fairly small and the sample size to be reasonably large. Precisely how large the sample size needs to be will depend on the magnitude of the autocovariances of the forecast variable and the proportion of the variance of the forecast variable that is determined by the factors.
It should be noted that computation of the noise-to-signal ratio requires
knowledge of the eigenvalues of the error covariance matrix. In cases in which
this information is unavailable, it may be useful to have a lower bound for ρ.
It is shown in the proof of Theorem 3.1.1(a) that d_j + σ² ≥ λ_j for all
j = 1, .., N. Since d_{k+1} = 0, λ_{k+1} ≤ σ². We therefore have that
λ_{k+1}/λ_k ≤ ρ, where λ_j is the jth eigenvalue of the covariance matrix of
x_t. This expression makes it clear that, in order for the noise-to-signal ratio
to be small, implying that population principal components are close to factors,
there must exist a large relative gap between the kth and (k+1)th eigenvalues
of the covariance matrix of the observable variables. This is an interesting
observation, since practitioners of principal component analysis will often plot
sample eigenvalues, look for a large gap between two adjacent eigenvalues, and
include in their subsequent analysis only those principal components that
correspond to the set of larger eigenvalues. As the above analysis shows, these
are the principal components which are likely to be the most strongly related
to the factors in a factor representation of the variables. Consequently, in
addition to accounting for a large proportion of the variance, these principal
components are also likely to explain a large proportion of the correlation
between the observable variables.
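The bound λ_{k+1}/λ_k ≤ ρ can be checked numerically. The sketch below (an illustrative toy example using NumPy, not from the thesis; the loadings, error variances and dimensions are all assumptions) builds Ω = BB′ + Ψ for a two-factor model and compares the computable eigenvalue ratio with the unobservable noise-to-signal ratio ρ = σ²/λ_k.

```python
import numpy as np

# Illustrative sketch: Omega = B B' + Psi with k = 2 factors.
rng = np.random.default_rng(0)
N, k = 50, 2
B = rng.normal(size=(N, k))                   # hypothetical factor loadings
Psi = np.diag(rng.uniform(0.5, 1.5, size=N))  # diagonal error covariance
Omega = B @ B.T + Psi

lam = np.sort(np.linalg.eigvalsh(Omega))[::-1]    # eigenvalues, descending
rho = np.linalg.eigvalsh(Psi).max() / lam[k - 1]  # noise-to-signal ratio
lower_bound = lam[k] / lam[k - 1]                 # computable from Omega alone
```

Since λ_{k+1} ≤ σ², `lower_bound` never exceeds `rho`: the ratio of adjacent eigenvalues of the observable covariance matrix bounds the noise-to-signal ratio from below without requiring knowledge of Ψ.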
3.1.2 N and the Noise to Signal Ratio
The theory in Section 3.1.1 links the difference between population principal
component and population factor quantities to the noise-to-signal ratio. This
raises the question: under what conditions might the noise-to-signal ratio be
expected to be small? Bai (2003) and Bai and Ng (2006) assume that (1/N)B′B →
Σ_B > 0 as N → ∞. In the case where all factors are `strong' in the sense of
Onatski (2006a), this implies that all k eigenvalues of B′B, and consequently
the first k eigenvalues of Ω, λ_1, ..., λ_k, grow at a rate of N. Bai (2003) and
Bai and Ng (2006) also assume that Σ_{j=1}^N |Ψ_ij| < M < ∞ for all i. Since Ψ is
symmetric, it follows from Lemma 2 that σ² ≤ √M. Since λ_k grows at a rate of
N and σ² has a fixed upper bound, under these restrictions ρ = σ²/λ_k = O(N^{−1}).
However, it is clear that these restrictions on the absolute row sums are stronger
than is necessary for ρ → 0 as N → ∞, and some interesting cases do not
satisfy these restrictions.
One particular case of interest is where the eigenvalues of B′B grow at a rate
strictly less than N, so that the factors are `weak' in the sense that, as N → ∞,
the proportion of tr(Ω) that is explained by the factors converges to zero. In
such cases, provided that the rate of growth of the kth eigenvalue of B′B is
greater than the rate of growth of the largest eigenvalue of Ψ, ρ will shrink
as N → ∞. Consequently, population principal components and population
factors will be close. If techniques for estimating principal components could
be developed for this case, then the estimated principal components could be
used as factor estimates. At present, however, it is not known how this can be
done.
Another interesting case, which is the one explored in detail in this chapter,
is when the eigenvalues of B′B grow at a rate of N and the largest eigenvalue of
Ψ also grows. This is the case which is relevant when the `approximate' factor
restriction does not hold, so that the error cross-correlation grows without
bound as N → ∞. It is easy to see that, provided that the largest eigenvalue
grows at a rate strictly less than N, the noise-to-signal ratio will shrink as N
gets large. For subsequent use, this is now stated as a theorem.

Theorem 3.1.2. For the factor model described above, if

1. 0 < d_L < d_j/N < d_U < ∞ for j = 1, .., k, where d_j = eig_j(B′B) and k is a
fixed scalar;

2. σ² = O(N^{1−α}), where 0 < α ≤ 1 and σ² = maxeig(Ψ);

then ρ = σ²/λ_k = O(N^{−α}). Furthermore, ρ̄ = σ²/d_k = O(N^{−α}).

Notice that, from Lemma 2, Assumption 2 is satisfied whenever Σ_{j=1}^N |Ψ_ij| =
O(N^{1−α}). Therefore, with the eigenvalues of B′B growing at a rate of N,
population principal component quantities can consistently estimate their
population factor counterparts as N → ∞ even with the absolute row sums of the
error covariance matrix diverging. Theorem 3.1.2 also makes a claim about
the magnitude of an alternative noise-to-signal measure, ρ̄ = σ²/d_k. This result
will be used in the proof of a subsequent theorem.
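The rate in Theorem 3.1.2 can be illustrated numerically. The sketch below (an assumption-laden toy construction, not from the thesis) makes eig_k(B′B) grow like N, and gives Ψ a largest eigenvalue of order N^{1−α} by placing an equicorrelated block of size N^{1−α} inside an otherwise diagonal error covariance; ρ then falls as N grows.

```python
import numpy as np

# Illustrative check of the rate rho = O(N^{-alpha}) with alpha = 0.5.
rng = np.random.default_rng(1)
alpha, k = 0.5, 2
rhos = {}
for N in (100, 400):
    B = rng.normal(size=(N, k))        # eig_j(B'B) grows like N for j <= k
    m = int(N ** (1 - alpha))          # size of the strongly correlated block
    Psi = np.eye(N)
    Psi[:m, :m] = 0.9                  # equicorrelated block of errors
    np.fill_diagonal(Psi, 1.0)
    Omega = B @ B.T + Psi
    lam = np.sort(np.linalg.eigvalsh(Omega))[::-1]
    sigma2 = np.linalg.eigvalsh(Psi).max()   # maxeig(Psi) ~ 0.9*m = O(N^{1-alpha})
    rhos[N] = sigma2 / lam[k - 1]
```

With these choices, `rhos[400]` comes out smaller than `rhos[100]`, consistent with ρ shrinking at the N^{−α} rate even though the largest error eigenvalue is itself diverging.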
With a connection established between population principal component quantities
and the corresponding population factor quantities, what is now required is
some theory linking sample principal components to population principal
components in a setting in which (N, T) → (∞, ∞).
3.1.3 Sample Principal Components and Population Principal Components
This subsection presents some new consistency results for the sample eigenvalues,
sample principal components, estimates of coefficients from sample
principal component models, and forecasts conditional on sample principal
components constructed from a sequence of T N × 1 random vectors, in
a setting in which (T, N) → (∞, ∞) and a `gap' assumption is satisfied
such that the distance between each of the first k eigenvalues and any other
eigenvalue grows at a rate of at least N. This gap assumption is satisfied by
the factor models considered in the following subsection. The setting here is
different from the case covered by the classical asymptotic analysis of Anderson
(1963), since N is assumed to be growing with T rather than remaining fixed.
It is also different from the Random Matrix Theory framework, since the gap
assumption forces the first k eigenvalues to grow at a rate of N. The formal
statement of the assumptions is as follows.
Assumptions 3 (Theorem 3.1.3).

3.1 E(x_t) = 0 for t = 1, .., T;

3.2 E(x_t x_t′) = Ω for t = 1, .., T, and (1/N)tr(Ω) = O(1);

3.3 sup_t sup_N max_{1≤i≤N, 1≤j≤N} Σ_{r=0}^∞ |cov(x_it x_jt, x_{it−r} x_{jt−r})| < γ < ∞;

3.4 Gap Assumption: ∃Δ > 0 such that ΔN < |λ_j − λ_i| for all 1 ≤ i ≤ k, 1 ≤ j ≤ N, i ≠ j;

3.5 E(y_t) = 0; E(y_t²) = σ_y².
Assumptions 3.1, 3.2 and 3.3 are fairly standard assumptions for time series
and are made to ensure that S_xx − Ω →p 0 on an element-by-element basis.
Consider Assumption 3.3. In the Gaussian case

cov(x_it x_jt, x_{it−r} x_{jt−r}) = E(x_it x_{it−r})E(x_jt x_{jt−r}) + E(x_it x_{jt−r})E(x_jt x_{it−r}).

Suppose that x_t = w z_t, where w is an N × 1 vector of ones and z_t is a scalar
stationary Gaussian AR(1). Then

sup_t sup_N max_{1≤i≤N, 1≤j≤N} Σ_{r=0}^∞ |cov(x_it x_jt, x_{it−r} x_{jt−r})| = 2σ_z⁴/(1 − θ²),

where σ_z² is the variance of z_t and θ is the autoregressive parameter, so
Assumption 3.3 is satisfied. Assumption 3.5 is unremarkable. Assumption 3.4,
the `gap' condition, is worthy of more detailed comment. It requires that each
of the first k eigenvalues of the covariance matrix diverges from each of the
last N − k eigenvalues at a rate of at least N. Since Assumption 3.2 limits the
rate of growth of the sum of all the eigenvalues to N, this means that, at most,
a finite number of eigenvalues can grow with N; the rest must be bounded.
Note that this does not imply that λ_{k+1} is bounded. In fact, λ_{k+1} could grow
at a rate as high as N without violating the assumptions made above. As a
simple example to illustrate this point, suppose that λ_k = ℓ_k N and λ_{k+1} = ℓ_{k+1} N,
where ℓ_k − ℓ_{k+1} > ℓ̄ > 0. Then ℓ̄N < |λ_k − λ_{k+1}|, so Assumption 3.4 is satisfied.
Similarly, the first k eigenvalues can all grow at a rate of N and also diverge
from each other at a rate of N. What is required is that only a finite number
of eigenvalues can grow, and that those eigenvalues must be distinct. In
Subsection 3.1.1, Corollaries 1(a) to (e) require that c = max_{1≤i≤k, 1≤j≤N, i≠j} λ_i/|λ_j − λ_i| < c̄ < ∞.
It is easy to show that this condition is satisfied under Assumption 3.4 whenever
the first k eigenvalues grow at a rate of N.
In the Appendix, the following results are proved.
Theorem 3.1.3.

(a) Under Assumptions 3.1, 3.2 and 3.3, max_{1≤j≤N} (1/N)|λ̂_j − λ_j| = O_p(T^{−1/2}).

(b) Under Assumptions 3.1 to 3.4, there exists a k × k sign matrix L such
that ‖ŝ^f_t − L s^f_t‖₂ = O_p(T^{−1/2}).

(c) Under Assumptions 3.1 to 3.5, there exists a k × k sign matrix L such
that ‖β̂_s − L β_s‖₂ = O_p(T^{−1/2}).

(d) Under Assumptions 3.1 to 3.5, |ŷ^s_{T+h} − y^s_{T+h}| = O_p(T^{−1/2}).
Therefore, the scaled sample eigenvalues, the first k sample principal
components, regression coefficients computed using the first k sample principal
components, and forecasts computed using those regression coefficients, are all
√T-consistent estimators of their population counterparts under the stated
assumptions. Importantly, Theorem 3.1.3 does not require N to be fixed or small
relative to T. Rather, it provides consistency results which hold for sequences
in which T and N grow simultaneously, without placing any restrictions on
the relationship between their growth rates.
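Theorem 3.1.3(a) can be illustrated by simulation. The sketch below (a toy example with assumed dimensions and a diagonal Ψ = I_N, not from the thesis) simulates x_t = B f_t + ε_t, so the population eigenvalues of Ω = BB′ + I_N are known exactly, and compares the scaled eigenvalue error at two points along a sequence in which N and T grow together.

```python
import numpy as np

# Scaled eigenvalue error max_j (1/N)|hat-lambda_j - lambda_j| as T grows with N.
rng = np.random.default_rng(2)

def max_scaled_eig_error(N, T, k=2, reps=3):
    errs = []
    for _ in range(reps):
        B = rng.normal(size=(N, k))
        X = rng.normal(size=(T, k)) @ B.T + rng.normal(size=(T, N))
        lam = np.sort(np.linalg.eigvalsh(B @ B.T + np.eye(N)))[::-1]  # population
        lam_hat = np.sort(np.linalg.eigvalsh(X.T @ X / T))[::-1]      # sample
        errs.append(np.abs(lam_hat - lam).max() / N)
    return float(np.mean(errs))

err_small = max_scaled_eig_error(N=100, T=200)
err_large = max_scaled_eig_error(N=400, T=3200)  # N and T both grow
```

With T increased sixteen-fold and N four-fold, `err_large` comes out well below `err_small`, consistent with the O_p(T^{−1/2}) rate holding along a sequence in which both dimensions diverge.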
3.1.4 Sample Principal Components and Population Factors
The previous subsections give conditions under which population factors are
close to population principal components, and conditions under which pop-
ulation principal components are close to sample principal components. In
this subsection, these ideas are combined to produce theorems linking sample
principal component quantities to population factor quantities. Specically,
conditions are presented under which sample principal component quantities
are consistent estimators of analogous population factor quantities.
For the factor model given by equations (3.2) and (3.3), the following
assumptions are made.

Assumptions 4 (Theorem 3.1.4).

4.1 (a) 0 < d_L < d_j/N < d_U < ∞ for j = 1, .., k, where d_j = eig_j(B′B) and k
is a fixed scalar.

(b) If k > 1 then c = max_{1≤i≤k, 1≤j≤N, i≠j} d_i/|d_j − d_i| < c̄ < ∞.

4.2 (a) (1/N)tr(Ψ) = O(1), where Ψ = E(ε_t ε_t′).

(b) σ² = O(N^{1−α}), where 0 < α ≤ 1 and σ² is the largest eigenvalue of
Ψ.

4.3 (a) E(f_t) = 0, E(ε_t) = 0, E(f_t f_t′) = I_k, E(f_t ε_t′) = 0 for t = 1, .., T;

(b) E(η_t) = 0, E(η_t²) = σ_η², E(f_t η_t) = 0 for t = 1, .., T;

4.4 sup_t sup_Ñ max_{1≤i≤Ñ, 1≤j≤Ñ} Σ_{r=0}^∞ |cov(v_it v_jt, v_{it−r} v_{jt−r})| < γ < ∞, where
v_t = (f_t′ ε_t′ η_t)′ and Ñ = N + k + 1;

4.5 E(y_t) = 0; E(y_t²) = σ_y².
Assumption 4.1(b) places a bound on the smallest possible gap between
the first k eigenvalues of B′B. In Theorem 3.1.1, a similar expression for the
eigenvalues of Ω appears in the bounds that link population principal component
quantities to population factor quantities. It is shown in the Appendix
that similar bounds may be derived in terms of the expression that appears in
Assumption 4.1(b). Assumption 4.2(b) controls the growth rate of the largest
eigenvalue of the error covariance matrix. In combination with Assumption
4.1(a), this ensures that a modified noise-to-signal ratio goes to zero as N
gets large. Consequently, the differences between population principal component
quantities and their population factor counterparts converge to zero as
N gets large. Assumptions 4.1(a) and 4.1(b) ensure that the `gap' condition
is met. In conjunction with the moment conditions in Assumptions 4.3(a),
4.3(b), 4.4, and 4.5, this ensures that sample principal component quantities
converge to population principal component quantities as T gets large. The fact
that these convergence results occur as (N, T) → (∞, ∞) simultaneously is
stated in the following theorem.
Theorem 3.1.4. Under Assumptions 4.1 to 4.4,

(a) max_{1≤j≤k} |λ̂_j/N − d_j/N| = O_p[max(T^{−1/2}, N^{−α})];

(b) ‖ŝ^f_t − L f_t‖₂ = O_p[max(T^{−1/2}, N^{−α/2})];

(c) ‖β̂_s − L β_f‖₂ = O_p[max(T^{−1/2}, N^{−α/2})];

(d) if Assumption 4.5 also holds, |ŷ^s_{T+h} − y^f_{T+h}| = O_p[max(T^{−1/2}, N^{−α/2})].
As noted earlier in this chapter, consistency proofs for the principal
components estimator of a factor model already exist. The contribution of Theorem
3.1.4 is threefold.

(i) Stock and Watson (2002a), Bai and Ng (2002), Bai (2003) and Bai and
Ng (2006) all make assumptions equivalent to (1/N)B′B → Σ_B, where Σ_B
is a non-random k × k matrix. No such limit is assumed in Theorem
3.1.4. Rather than requiring that N is large enough for (1/N)B′B to be
sufficiently close to some limiting value, all that is required is that the
eigenvalues of (1/N)B′B are distinct and lie between uniform upper and
lower bounds. Therefore, the `largeness' of N is not important for this
part of the theory.
(ii) Theorem 3.1.4 allows for much more cross-correlation between the error
terms than is allowed for in the previously published theory. Stock
and Watson (2002a) and Bai and Ng (2002) assume that there exists a
finite bound M such that (1/N) Σ_{i=1}^N Σ_{j=1}^N |Ψ_ij| < M, where Ψ = E(ε_t ε_t′) and
Ψ_ij is the (i, j)th element of Ψ. Bai (2003) and Bai and Ng (2006) make
the slightly stronger assumption that Σ_{j=1}^N |Ψ_ij| < M for all i, i.e. the
rows of the error covariance matrix are uniformly bounded. While these
assumptions are more general than the diagonal error covariance matrix
that is assumed for the classical `strict' factor model, they might still be
too restrictive for the type of macroeconomic applications which appear
in the literature. Indeed, it is entirely plausible that, as the number of
variables is increased, the sums of the absolute values of the rows of the
error covariance matrix increase without a fixed bound. In most applications
in the literature, the variables are chosen from a relatively small
number of categories (e.g. real output and income, housing starts and
sales, interest rates, price indexes). One can imagine a factor model
being constructed by choosing a single variable from each category. A
sequence of factor models with increasing N could then be constructed
by successively adding another variable from each of the categories, with
the number of categories held fixed. One might suppose that the errors
corresponding to variables from different categories might be largely
uncorrelated. However, some of the variables within a category will be
very similar to other variables within the same category (e.g. in Stock
and Watson (2002b), the "Price indexes" category includes the producer
price index for finished goods and the producer price index for finished
consumer goods as two separate series), and consequently it might be
expected that their errors could be correlated. Therefore, as the number
of variables from each category is increased, the sums of the absolute
values of the coefficients across each row of the error covariance matrix
will grow at a rate of anything up to and including N. Such cases are not
covered by the theory of Stock and Watson (2002a), Bai and Ng (2002),
Bai (2003) and Bai and Ng (2006). Boivin and Ng (2006) consider a
similar situation but, with the exception of a brief informal consideration
of a single-factor model for which the factor loadings are identical,
their analysis is by Monte Carlo simulation. Theorem 3.1.4 is the first
general theory for principal components estimation of large factor models
which applies under these conditions. What it shows is that the principal
components approach is consistent for a more general class of model
than the approximate factor model. Consistency holds provided that
the absolute row sums of the error covariance matrix grow at a rate
strictly less than N. However, the faster the growth of the absolute row
sums, the slower the rate of convergence of the estimator. Consequently,
in applied work, it is not sufficient simply to have a very large number
of variables. The correlation properties of the errors of the sequence of
models are critical to the performance of the estimator as N grows. This
provides a possible explanation for the lack of empirical evidence for very
large factor models having superior performance to smaller models.
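The situation described in (ii) can be made concrete with a small construction (an illustrative sketch, not from the thesis; the group size, correlation level and α are all assumptions). A group of strongly correlated errors whose size grows like N^{1−α} makes the absolute row sums of Ψ diverge, so no fixed bound M of the kind assumed by Stock and Watson (2002a) or Bai and Ng (2002) exists, yet maxeig(Ψ)/N still shrinks, which is all that Theorem 3.1.4 requires.

```python
import numpy as np

# Grouped errors: row sums of Psi diverge while maxeig(Psi) = O(N^{1-alpha}).
def grouped_psi(N, alpha=0.5, corr=0.8):
    m = int(N ** (1 - alpha))   # size of one strongly correlated category
    Psi = np.eye(N)
    Psi[:m, :m] = corr          # arbitrarily strong within-group correlation
    np.fill_diagonal(Psi, 1.0)
    return Psi

max_row_sum, sigma2 = {}, {}
for N in (100, 900):
    Psi = grouped_psi(N)
    max_row_sum[N] = np.abs(Psi).sum(axis=1).max()
    sigma2[N] = np.linalg.eigvalsh(Psi).max()
```

For these dimensions `max_row_sum` grows with N (the bounded-row-sum assumption fails), while `sigma2[N] / N` falls, so the construction sits inside the class of models covered by Theorem 3.1.4 but outside the previously published theory.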
(iii) The proof of Theorem 3.1.4 makes it clear that what really matters for
the quality of the estimator is not the number of variables in the model
per se, but rather the smallness of the noise-to-signal ratio. This simple
statistic provides an appropriate measure of the degree of correlatedness
and variance of the error terms in the model. Rather than concerning
themselves with finding large numbers of variables to include in their
models, practitioners should concentrate their attention on the relative
magnitudes of the eigenvalues of the covariance matrix. Similar to the
methodology used in traditional `small-N' principal components analysis,
economists wishing to estimate large factor models using principal
component methods should be wary of proceeding unless they are satisfied
that a large gap exists between the magnitudes of two groups of
eigenvalues.
3.2 Measuring the noise-to-signal ratio
Given the above theory, measurement of the noise-to-signal ratio is a concern
of some practical importance. Since Ψ is not identified, the eigenvalue σ²,
and accordingly the noise-to-signal ratio ρ, are not identified. Thus, direct
estimation of the noise-to-signal ratio is not possible. However, it is possible
to consistently estimate a lower bound on the noise-to-signal ratio. Let Φ =
σ²I_N − Ψ. Then Φ + Ω = BB′ + σ²I_N. Note that eig_j(Φ) = σ² − σ_j², where
eig_j(.) denotes the jth ordered eigenvalue of its matrix argument, so eig_j(Φ) ≥
0 for all j = 1, .., N. Thus, Φ is positive semi-definite. It follows from Magnus and
Neudecker (1991, p.208, Theorem 9) that eig_j(BB′ + σ²I_N) ≥ eig_j(Ω), i.e.
d_j + σ² ≥ λ_j for all j = 1, .., N. Since d_{k+1} = 0, λ_{k+1} ≤ σ². We therefore have that

λ_{k+1}/λ_k ≤ ρ.

This expression makes it clear that, in order for the noise-to-signal ratio to be
small, implying that the k-principal component forecast is close to the
theoretically ideal forecast, there must exist a large relative gap between the kth and
(k+1)th eigenvalues of the covariance matrix of predictor variables. This links
asymptotic principal component techniques to the traditional principal component
literature, where analysts will often rank the eigenvalues and search for a
point at which the difference between successive eigenvalues is large. Theorem
3.1.3(a) may be used to show that this ratio of population eigenvalues may be
consistently estimated by the corresponding ratio of sample eigenvalues.
In this section, a statistic is constructed for testing the hypothesis that the
noise-to-signal ratio is small in magnitude. Ultimately, what is needed is a test
statistic with an asymptotic distribution which is established in a framework
in which (N, T) → (∞, ∞) jointly. However, it is not yet clear how such a
statistic may be constructed. What is presented below is a testing framework
in which N is fixed and T → ∞, with a brief investigation of the robustness
of the test to large values of N. While not providing the result that is really
required, this approach provides a candidate test statistic which may be the
subject of a more thorough investigation at a later time. In any case, the test
procedure appears to work reasonably well in a setting in which N is larger
than T.

An obvious approach is to consider the distribution of

√T (λ̂_{k+1}/λ̂_k − λ_{k+1}/λ_k).

However, despite the fact that the ratio λ_{k+1}/λ_k is consistently estimated by its
sample counterpart, Monte Carlo simulations suggest that the distribution of
the above statistic is highly sensitive to the magnitude of N. Consequently,
an alternative approach, based on a slightly different bounding argument, is
presented below.
It was shown previously that λ_j ≤ σ² + d_j for j = 1, .., N. Since d_j = 0
for j ≥ k + 1, it follows that (1/(N−k)) Σ_{j=k+1}^N λ_j ≤ λ_{k+1} ≤ σ². It follows that

υ = (1/(N−k)) Σ_{j=k+1}^N λ_j/λ_k ≤ ρ.

We therefore construct a test of H0: (1/(N−k)) Σ_{j=k+1}^N λ_j/λ_k = υ_α against
H1: (1/(N−k)) Σ_{j=k+1}^N λ_j/λ_k > υ_α. Note that rejection of the null implies ρ > υ_α.
Consider the statistic θ̂ = Σ_{j=k+1}^N λ̂_j − υ_α(N−k)λ̂_k. What is really required
is the asymptotic distribution of θ̂ in a framework in which (N, T) → (∞, ∞)
jointly. For serially independent variables, it is possible that something could
be derived using ideas from Random Matrix Theory. However, in the correlated
time series setting, this problem remains unresolved. For this reason, as an
interim measure, the distribution of the statistic will be derived in a setting in
which N is fixed and T → ∞.

Under the assumption that

c = max_{i≠j} λ_j/|λ_i − λ_j| < c̄ < ∞,

the eigenvalues λ̂_j are continuous functions of S_xx (see Magnus and Neudecker
(1991)). Assuming conditions sufficient for √T(S_xx − Ω) to be asymptotically
Gaussian, it follows that √T(λ̂_j − λ_j) is asymptotically Gaussian for j = 1, .., N,
where N is fixed and T → ∞. Consequently, θ̂ is also asymptotically Gaussian.
Furthermore, Lawley (1956) provides the following expressions for the first two
moments of the eigenvalues:

E(λ̂_j) = λ_j + (λ_j/T) Σ_{i≠j} λ_i/(λ_j − λ_i) + O(T^{−2})

var(λ̂_j) = (2λ_j²/T) (1 − (1/T) Σ_{i≠j} (λ_i/(λ_j − λ_i))²) + O(T^{−3})

cov(λ̂_i, λ̂_j) = (2/T²) (λ_i λ_j/(λ_j − λ_i))² + O(T^{−3})
Given the finite upper bound on c, it follows that

√T E(θ̂) = O(T^{−1/2})

and

var(√T θ̂) = T (Σ_{j=k+1}^N Σ_{i=k}^N cov(λ̂_i, λ̂_j) + υ_α²(N−k)² var(λ̂_k))
           = 2υ_α²(N−k)²λ_k² + 2 Σ_{j=k+1}^N λ_j² + O(T^{−1}).

Dividing √T θ̂ by its standard deviation yields the test statistic

φ = √(T/2) · ((1/(N−k)) Σ_{j=k+1}^N λ̂_j/λ̂_k − υ_α) / √(υ_α² + (1/(N−k)²) Σ_{j=k+1}^N λ̂_j²/λ̂_k²) →d N(0, 1).
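The statistic φ is a simple function of the sample eigenvalues, and can be sketched in a few lines. The implementation below (an illustrative sketch based on the formula above; the function name and interface are my own) takes the sample eigenvalues of S_xx, the number of factors k, the hypothesised bound υ_α, and the sample size T.

```python
import numpy as np

# Sketch of the test statistic phi derived above, from sample eigenvalues.
def phi_stat(lam_hat, k, upsilon_alpha, T):
    lam_hat = np.sort(np.asarray(lam_hat, dtype=float))[::-1]  # descending
    N = lam_hat.size
    # (1/(N-k)) * sum_{j>k} lam_hat_j / lam_hat_k
    ratio = lam_hat[k:].sum() / ((N - k) * lam_hat[k - 1])
    # sqrt(upsilon_alpha^2 + (1/(N-k)^2) * sum_{j>k} lam_hat_j^2 / lam_hat_k^2)
    scale = np.sqrt(upsilon_alpha ** 2
                    + (lam_hat[k:] ** 2).sum() / ((N - k) ** 2 * lam_hat[k - 1] ** 2))
    return np.sqrt(T / 2.0) * (ratio - upsilon_alpha) / scale
```

Under the null the statistic is asymptotically standard Gaussian, so the test rejects for large positive values of φ; when the empirical eigenvalue ratio exceeds υ_α the statistic is positive, and negative otherwise.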
The table below shows the results of a small Monte Carlo simulation of this
test. We initially set the number of observations to 100, the number of variables
to 5, and the number of factors to 2. We then raise the number of variables to
50, then to 200. We choose the first three population eigenvalues to be 100, 75 and
5. The remaining eigenvalues decay linearly to 0.001. Thus, λ_{k+1}/λ_k = 0.0667.
We conduct 5000 simulations of the test statistic for each model.
The elements in the table are the proportions of the empirical probability
mass that lie above the critical value corresponding to α. Thus, for example,
the 5% critical value for a standard Gaussian distribution is 1.645, and the table
shows that, for a model with 5 observable variables, 0.0696 of the empirical
probability mass lies above 1.645.

Table 3.1: Empirical and Theoretical Distributions of the Test Statistic (k=2,
T=100, 5000 simulations)

                  Empirical Percentiles
   α         N = 5     N = 50    N = 200
   10.00%    0.1282    0.1264    0.0566
   5.00%     0.0696    0.0728    0.0286
   1.00%     0.0178    0.0256    0.0076

While in no way being a substitute for a more thorough investigation, the
data in Table 3.1 suggest that the test statistic is able to perform reasonably
well in some cases where N is large relative to T.
3.3 The noise-to-signal ratio for a US macroeconomic data set
Stock and Watson (2002b) have collected a large data set of variables describing
the US macroeconomy, which they employ in a forecasting experiment using a
factor model. The interested reader is directed to their paper for a description
of the data. The data set used here was downloaded from Professor Watson's
website. We follow Stock and Watson in taking logs and/or differences or
double-differences for some variables. Following appropriate transformations,
the balanced panel contains 149 variables measured monthly from March 1959
to December 1998. These variables are rescaled to a zero mean and unit
variance.

The plots below show the eigenvalues of the Stock and Watson data, and
the ratios (1/(N−k)) Σ_{j=k+1}^N λ̂_j/λ̂_k for k = 1, .., N − 1.
Figure 3.1: Eigenvalues of Stock and Watson's data
Note that the first few sample eigenvalues drop sharply, but the plot levels
out after that. With the exception of the first couple of values, none of the
ratios in Figure 3.2 are particularly small.

Consider the theoretical problem of producing a forecast for a scalar variable
y_t using a regression with known population factors, y_t = β′f_t + ε_t. Theorem
3.1.1(e) gives an upper bound on the scaled expected forecast error for
a regression on population factors. As T → ∞, this bound becomes

E|e_{T+h}|/√(β′β) ≤ √(kρ) (√ρ + 1),

where e_{T+h} is the forecast error and, since E(f_t f_t′) = I_k, √(β′β) is the standard
deviation of the `signal' component of the regression. Applying Markov's
Lemma yields a bound on the probability that the forecast error is larger than
Figure 3.2: (1/(N−k)) Σ_{j=k+1}^N λ̂_j/λ̂_k for Stock and Watson's data
the standard deviation of the signal:

P(|e_{T+h}|/√(β′β) > 1) ≤ √(kρ) (√ρ + 1).

If we can choose a desired numerical bound for the above probability, which
we denote as α, then we may solve the equation α = √(kρ)(√ρ + 1) to find a
corresponding bound for ρ, which we denote ρ_α.
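Solving α = √(kρ)(√ρ + 1) for ρ_α has no tidy closed form, but since the left-hand side is strictly increasing in ρ the root is unique and easy to bracket. The sketch below (an illustrative numerical inversion, not taken from the thesis) uses plain bisection.

```python
import math

# Invert alpha = sqrt(k*rho)*(sqrt(rho) + 1) for rho_alpha by bisection.
def rho_alpha(alpha, k):
    f = lambda r: math.sqrt(k * r) * (math.sqrt(r) + 1.0) - alpha
    lo, hi = 0.0, 1.0
    while f(hi) < 0.0:          # expand upper bracket until the root is enclosed
        hi *= 2.0
    for _ in range(200):        # bisection: f(lo) < 0 <= f(hi) throughout
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f(mid) < 0.0 else (lo, mid)
    return 0.5 * (lo + hi)
```

For instance, `rho_alpha(0.25, 1)` returns the ρ at which √ρ(√ρ + 1) = 0.25, and larger probability bounds α map to larger ρ_α.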
Since υ = (1/(N−k)) Σ_{j=k+1}^N λ_j/λ_k ≤ ρ, we may test H0: υ = ρ_α against
H1: υ > ρ_α. Note that rejection of the null implies ρ > ρ_α. Table 3.2 presents
the results of the hypothesis test derived above, conducted for factor models of
orders 1 to 6. We choose values for ρ_α corresponding to probability bounds of
5%, 10%, 25% and 50%.
Table 3.2: Test results for Stock and Watson data

                            k
   α        1        2       3      4      5      6
   0.05     72.92    88.15   92.2   94.71  96.35  98.58
   0.1      38.82    75.54   88.07  92.02  94.57  96.56
   0.25     -2.74    19.16   49.82  63.27  74.46  79.33
   0.5      -11.38   -4.37   8.7    17.1   27.53  32.31

Note that for a 1-factor model we cannot reject probability bounds of 0.25
and 0.5, and for a 2-factor model we cannot reject a probability bound of 0.5,
using significance levels of 5 per cent. For all other factor models and probability
bounds we can strongly reject the hypothesis about the probability bound.
Overall, these results do not provide strong support for the proposition that
the US macroeconomic data set used by Stock and Watson (2002b) satisfies
the condition of having a small noise-to-signal ratio, although the preliminary
nature of these results must be stressed.
3.4 Summary and concluding comments
It has been argued in this chapter that the approximate factor model that
has been investigated in the theoretical literature imposes restrictions on the
cross-correlation of the errors that are unlikely to be satisfied in the types
of applications of the factor model which have appeared in the empirical
literature. Some new theory was presented which proves consistency under
assumptions which allow for much greater error cross-correlation. However, the
rates of convergence that are achieved depend on the rate of growth of the error
cross-correlation. Consequently, it is possible for models with a very large
cross-sectional dimension to perform poorly.
An important conclusion of the theoretical approach taken in this chapter
is that what matters for the quality of the principal component estimator is
not the number of variables in the model per se, but rather the noise-to-signal
ratio of the model. Rather than concerning themselves with collecting data on
every available variable, so that N may be made as large as possible, practitioners
should be giving thought to the likely noise-to-signal ratio. Clearly, what
is required is a trade-off between having a large number of variables and having
a low amount of error cross-correlation. Unfortunately, the noise-to-signal
ratio is not generally identified, and so bounding arguments must be employed
to investigate its magnitude.
Clearly, there exists plenty of scope for the work in this chapter to be
extended. First-order convergence is interesting, but some results on second-order
convergence analogous to those of Bai (2003) and Bai and Ng (2006)
would also be useful. Of particular interest would be an investigation of the
asymptotic distribution of the sample eigenvalues of the covariance matrix as
(N, T) → (∞, ∞), since knowledge of this distribution may lead to a better
testing methodology for the noise-to-signal ratio than what has been proposed
here. While the `fixed-N' test that is derived in Section 3.2 is a vast improvement
on simply hoping that the noise-to-signal ratio is small, it is clear that
what is really needed is a test statistic which converges to a known distribution
as (N, T) → (∞, ∞).
Another issue worthy of investigation is the estimation of principal
components in a framework in which (N, T) → (∞, ∞) and the first k eigenvalues
grow at a rate slower than N. The work in Subsection 3.1.1 shows that
population principal components are consistent estimators of population factors as
the noise-to-signal ratio gets small. If the eigenvalues of the error covariance
matrix are assumed to be bounded, then all that is required of the eigenvalues
of B′B is that they grow with N; growth at a rate exactly equal to N is not
required. Consequently, these theorems cover certain cases where the factors
are `weak' in the sense that they account for a declining proportion of the total
variance of x_t as N grows. What is needed is a theory for the estimation of
principal components in such cases. Theorem 3.1.3 does not cover this case,
since it requires the first k eigenvalues to grow at a rate of N in order
for the `gap' condition to be satisfied. The current versions of Random Matrix
Theory also do not cover this case, since they assume the eigenvalues to be
bounded as N grows.
Finally, the work presented in this chapter applies to the static model only.
An interesting extension would be to develop analogous results for the dynamic
factor model of Forni et al. (2000).
Appendices: Proofs of Theorems
These appendices contain proofs of the four theorems stated in this chapter.
Theorems 3.1.1 and 3.1.2 concern the properties of population quantities and
are proved in Appendix A. Theorems 3.1.3 and 3.1.4 describe the properties of
sample quantities and are proved in Appendix B. In each appendix, the proofs
of the theorems are given first, then the lemmas used to prove the theorems are
stated and, finally, the lemmas are proved.
Appendix A Proofs of Theorems 3.1.1 and 3.1.2
Proofs of Theorems
Proof of Theorem 3.1.1(a): Let χ = σ²I_N − Ψ. Then χ + Ω = BB′ + σ²I_N.
Note that eig_j(χ) = σ² − σ_j², where eig_j(.) denotes the jth ordered eigenvalue
of its matrix argument, so eig_j(χ) ≥ 0 for all j = 1, .., N. Thus, χ is positive
semi-definite. It follows from Magnus and Neudecker (1991, p.208, Theorem 9) that
eig_j(BB′ + σ²I_N) ≥ eig_j(Ω), i.e. d_j + σ² ≥ λ_j for all j = 1, .., N. It also follows
from Magnus and Neudecker (1991, p.208, Theorem 9) that λ_i ≥ d_i for all
i = 1, ..., k. The result follows.
Proof of Theorem 3.1.1(b):

‖Q_f − UL‖²_F = tr[(Q_f′ − LU′)(Q_f − UL)] = 2tr(I − LQ_f′U)    (3.4)

From Lemma 10, 1 − ρ − Σ_{j≠i} (q_i′u_j)² ≤ (q_i′u_i)², where q_i is the ith column of
Q_f and u_i is the ith column of U. If k = 1 then c = 0 and the result holds
from Equation (3.4) with sign(L_ii) = sign(q_i′u_i). If k > 1 then, using Lemma
9, 1 − ρ − 4c²ρ²(k−1) ≤ (q_i′u_i)². With c ≤ (1/(2ρ))√((1−ρ)/(k−1)) the left hand
side is non-negative and

√(1 − ρ − 4c²ρ²(k−1)) ≤ |q_i′u_i|    (3.5)

If we choose L so that sign(L_ii) = sign(q_i′u_i), then from equations (3.4) and
(3.5) we get ‖Q_f − UL‖²_F ≤ k − k√((1−ρ) − 4c²ρ²(k−1)). Multiplying this
by (1 + √((1−ρ) − 4c²ρ²(k−1)))/(1 + √((1−ρ) − 4c²ρ²(k−1))) yields the result.
Proof of Theorem 3.1.1(c):

E‖s_t − Lf_t‖²_F = tr[(Λ_f^{−1/2}Q_f′UD^{1/2} − L)(D^{1/2}U′Q_fΛ_f^{−1/2} − L) + Λ_f^{−1/2}Q_f′ΨQ_fΛ_f^{−1/2}]
                 = 2tr(I − LΛ_f^{−1/2}Q_f′UD^{1/2})

As in the proof of Theorem 3.1.1(b), 1 − ρ − 4c²ρ²(k−1) ≤ (q_i′u_i)². From
Theorem 3.1.1(a), 1 − ρ ≤ d_i/λ_i, so

(1−ρ)² − 4c²ρ²(k−1)(1−ρ) ≤ (d_i/λ_i)(q_i′u_i)².

If c ≤ (1−ρ)/(2ρ√((k−1)(1−ρ))) then the left hand side is non-negative and
√((1−ρ)² − 4c²ρ²(k−1)(1−ρ)) ≤ √(d_i/λ_i)|q_i′u_i|. If we choose L so that
sign(L_ii) = sign(q_i′u_i), we get E‖s_t − Lf_t‖²_F ≤ k − k√((1−ρ)² − 4c²ρ²(k−1)(1−ρ)).
Multiplying this by (1 + √((1−ρ)² − 4c²ρ²(k−1)(1−ρ)))/(1 + √((1−ρ)² − 4c²ρ²(k−1)(1−ρ)))
yields the result.
Proof of Theorem 3.1.1(d): Using the triangle inequality,

‖β_s − Lβ_f‖_F = ‖(1/T) Σ_{t=1}^T s^f_t y_t − (1/T) Σ_{t=1}^T L f_t y_t‖_F ≤ (1/T) Σ_{t=1}^T ‖y_t‖_F ‖s^f_t − Lf_t‖_F.

Therefore, by the Cauchy–Schwarz inequality,

E‖β_s − Lβ_f‖_F ≤ (1/T) Σ_{t=1}^T √(E(‖y_t‖²_F) E(‖s^f_t − Lf_t‖²_F)) = √(σ_y² k (2ρ + ρ²(4c²(k−1)(1−ρ) − 1)))

by Theorem 3.1.1(c).
Proof of Theorem 3.1.1(e): Defining S_xy = (1/T) Σ_{t=1}^T x_t y_t, the forecast
deviation is

e_{T+h} = β_s′ s^f_{T+h} − β′f_{T+h} = (Λ_f^{−1/2}Q_f′S_xy)′ Λ_f^{−1/2}Q_f′ x_{T+h} − β′f_{T+h}
        = S_xy′ Q_f Λ_f^{−1} Q_f′ (Bf_{T+h} + ε_{T+h}) − β′f_{T+h}
        = e_a + e_b

where e_a = (S_xy′ Q_f Λ_f^{−1} Q_f′ B − β′) f_{T+h} and e_b = S_xy′ Q_f Λ_f^{−1} Q_f′ ε_{T+h}.

First consider e_b. From the Cauchy–Schwarz inequality we have
|e_b| ≤ ‖S_xy′Q_fΛ_f^{−1}‖_F ‖Q_f′ε_{T+h}‖_F and

(E|e_b|)² ≤ E‖S_xy′Q_fΛ_f^{−1}‖²_F E‖Q_f′ε_{T+h}‖²_F    (3.6)

We have that

E‖Q_f′ε_{T+h}‖²_F = tr(Q_f′ΨQ_f) ≤ σ²k    (3.7)

Also, letting ω_i be a vector of zeros with a 1 in the ith element, using Lemma 5,

E‖S_xy′Q_fΛ_f^{−1}‖²_F = E(S_xy′Q_fΛ_f^{−2}Q_f′S_xy) = Σ_{i=1}^k E[(ω_i′Λ_f^{−1}Q_f′S_xy)²]
  = Σ_{i=1}^k [var(ω_i′Λ_f^{−1}Q_f′S_xy) + E(ω_i′Λ_f^{−1}Q_f′S_xy)²]
  = Σ_{i=1}^k [var(ω_i′Λ_f^{−1}Q_f′S_xy) + (ω_i′Λ_f^{−1}Q_f′Bβ)²]    (3.8)
Using Lemma 8 and Lemma 6,

Σ_{i=1}^k (ω_i′Λ_f^{−1}Q_f′Bβ)² = β′B′Q_fΛ_f^{−2}Q_f′Bβ = β′D^{1/2}U′Q_fΛ_f^{−2}Q_f′UD^{1/2}β
  = β′D^{1/2}RΛ_f^{−2}R′D^{1/2}β ≤ β′β maxeig(Λ_f^{−1}R′DRΛ_f^{−1})
  ≤ β′β maxeig(Λ_f^{−1}Λ_fΛ_f^{−1}) ≤ β′β λ_k^{−1}    (3.9)

where R = U′Q_f. Also, from Lemma 4, Σ_{i=1}^k var(ω_i′Λ_f^{−1}Q_f′S_xy) ≤ Υ_1 + Υ_2 + Υ_3,
where

Υ_1 = (2/T) Σ_{i=1}^k ω_i′Λ_f^{−1}Q_f′ΩQ_fΛ_f^{−1}ω_i σ_y^{(0)2}

Υ_2 = (2/T) Σ_{i=1}^k Σ_{j=1}^{T−1} |ω_i′Λ_f^{−1}Q_f′E(x_t x_{t−j}′)Q_fΛ_f^{−1}ω_i σ_y^{(j)2}|

Υ_3 = (2/T) Σ_{i=1}^k Σ_{j=1}^{T−1} |ω_i′Λ_f^{−1}Q_f′E(x_t y_{t−j})E(y_t x_{t−j}′)Q_fΛ_f^{−1}ω_i|

where σ_y^{(j)2} = E(y_t y_{t−j}).

We have that

Υ_1 = (2/T) σ_y^{(0)2} tr(Λ_f^{−1}Λ_fΛ_f^{−1}) ≤ (2/T) σ_y^{(0)2} λ_k^{−1}.

Also, Υ_2 = (2/T) Σ_{i=1}^k Σ_{j=1}^{T−1} |ω_i′Λ_f^{−1/2}E(s_t s_{t−j}′)Λ_f^{−1/2}ω_i σ_y^{(j)2}|, where
s_t = Λ_f^{−1/2}Q_f′x_t is the principal component vector of x_t, so

Υ_2 = (2/T) Σ_{i=1}^k Σ_{j=1}^{T−1} |(E(s_it s_{it−j})/λ_i) σ_y^{(j)2}|
    ≤ (2/T) Σ_{i=1}^k Σ_{j=1}^{T−1} (1/λ_i)|E(s_it s_{it−j})||σ_y^{(j)2}|
    ≤ (2/T) λ_k^{−1} Σ_{j=1}^{T−1} |σ_y^{(j)2}|.
For the third term,
$$\Upsilon_3 = \frac{2}{T}\sum_{i=1}^k\sum_{j=1}^{T-1}\frac{1}{\lambda_i}\left|E(s_{it}y_{t-j})E(y_ts_{i,t-j})\right| \le \frac{2}{T}\sum_{i=1}^k\sum_{j=1}^{T-1}\frac{1}{\lambda_i}\left|E(s_{it}y_{t-j})\right|\left|E(y_ts_{i,t-j})\right|$$
$$\le \frac{2\sigma_y^{(0)}}{T}\sum_{i=1}^k\sum_{j=1}^{T-1}\frac{1}{\lambda_i}\left|E(y_ts_{i,t-j})\right| \le \frac{2\sigma_y^{(0)}}{T}\lambda_k^{-1}\sup_i\sum_{j=1}^{T-1}\left|E(y_ts_{i,t-j})\right|$$
so
$$\sum_{i=1}^k\operatorname{var}\left(\omega_i'\Lambda_f^{-1}Q_f'S_{xy}\right) \le \frac{2}{T\lambda_k}\left(\sum_{j=0}^{T-1}\left|\sigma_y^{(j)2}\right| + \sigma_y^{(0)}\sup_i\sum_{j=1}^{T-1}\left|E(y_ts_{i,t-j})\right|\right) = \frac{2}{T\lambda_k}\sigma_y^{(0)2}\delta \quad (3.10)$$
where $\delta = \sum_{j=1}^{T-1}\left|\frac{E(y_ty_{t-j})}{E(y_t^2)}\right| + \sup_i\sum_{j=1}^{T-1}\left|\frac{E(y_ts_{i,t-j})}{\sqrt{E(y_t^2)E(s_{it}^2)}}\right|$.

Equations (3.8), (3.9), and (3.10) yield
$$E\left\|S_{xy}'Q_f\Lambda_f^{-1}\right\|^2 \le \lambda_k^{-1}\left(\frac{2\sigma_y^{(0)2}\gamma}{T} + \beta'\beta\right)$$
which, when combined with equations (3.6) and (3.7), yields
$$\frac{(E|e_b|)^2}{\|\beta\|_F^2} \le \frac{\sigma^2k}{\lambda_k}\left(\frac{2}{T}\frac{\sigma_y^{(0)2}\gamma}{\|\beta\|_F^2} + 1\right).$$
Now consider $e_a$. By the Cauchy-Schwarz inequality we have
$$(E|e_a|)^2 \le E\left\|S_{xy}'Q_f\Lambda_f^{-1}Q_f'B - \beta'\right\|^2E\|f_{T+h}\|^2 = kE\left\|S_{xy}'Q_f\Lambda_f^{-1}Q_f'B - \beta'\right\|^2 \quad (3.11)$$
Now, from Lemma 5,
$$E\left\|S_{xy}'Q_f\Lambda_f^{-1}Q_f'B - \beta'\right\|^2 = \sum_{i=1}^kE\left[\omega_i'\left(B'Q_f\Lambda_f^{-1}Q_f'S_{xy} - \beta\right)\right]^2$$
$$= \sum_{i=1}^k\operatorname{var}\left[\omega_i'\left(B'Q_f\Lambda_f^{-1}Q_f'S_{xy} - \beta\right)\right] + \sum_{i=1}^k\left\{E\left[\omega_i'\left(B'Q_f\Lambda_f^{-1}Q_f'S_{xy} - \beta\right)\right]\right\}^2 \quad (3.12)$$
but from Lemma 4,
$$\sum_{i=1}^k\operatorname{var}\left[\omega_i'\left(B'Q_f\Lambda_f^{-1}Q_f'S_{xy} - \beta\right)\right] \le \Delta_1 + \Delta_2 + \Delta_3$$
where
$$\Delta_1 = \frac{2}{T}\sum_{i=1}^k\omega_i'B'Q_f\Lambda_f^{-1}Q_f'\Omega Q_f\Lambda_f^{-1}Q_f'B\omega_i\,\sigma_y^{(0)2}$$
$$\Delta_2 = \frac{2}{T}\sum_{i=1}^k\sum_{j=1}^{T-1}\left|\omega_i'B'Q_f\Lambda_f^{-1}Q_f'E(x_tx_{t-j}')Q_f\Lambda_f^{-1}Q_f'B\omega_i\,\sigma_y^{(j)2}\right|$$
$$\Delta_3 = \frac{2}{T}\sum_{i=1}^k\sum_{j=1}^{T-1}\left|\omega_i'B'Q_f\Lambda_f^{-1}Q_f'E(x_ty_{t-j})E(y_tx_{t-j}')Q_f\Lambda_f^{-1}Q_f'B\omega_i\right|$$
We have that
$$\Delta_1 = \frac{2\sigma_y^{(0)2}}{T}\operatorname{tr}\left(B'Q_f\Lambda_f^{-1}Q_f'B\right) \le \frac{2\sigma_y^{(0)2}}{T}\operatorname{tr}\left(B'\Omega^{-1}B\right) \le \frac{2k\sigma_y^{(0)2}}{T}$$
Also,
$$\Delta_2 = \frac{2}{T}\sum_{i=1}^k\sum_{j=1}^{T-1}\left|\omega_i'B'Q_f\Lambda_f^{-\frac12}E(s_ts_{t-j}')\Lambda_f^{-\frac12}Q_f'B\omega_i\,\sigma_y^{(j)2}\right|$$
Let $v_i = \omega_i'B'Q_f\Lambda_f^{-\frac12} = \begin{pmatrix}0 & \dots & 0 & \nu_i & 0 & \dots & 0\end{pmatrix}$; then
$$\Delta_2 = \frac{2}{T}\sum_{i=1}^k\sum_{j=1}^{T-1}\left|v_iE(s_ts_{t-j}')v_i'\,\sigma_y^{(j)2}\right| = \frac{2}{T}\sum_{i=1}^k\sum_{j=1}^{T-1}\left|\nu_i^2E(s_{it}s_{i,t-j})\,\sigma_y^{(j)2}\right|$$
$$\le \frac{2}{T}\sum_{i=1}^k\sum_{j=1}^{T-1}\nu_i^2\left|\sigma_y^{(j)2}\right| \le \frac{2}{T}\sum_{i=1}^k\omega_i'B'Q_f\Lambda_f^{-1}Q_f'B\omega_i\sum_{j=1}^{T-1}\left|\sigma_y^{(j)2}\right|$$
$$\le \frac{2}{T}\operatorname{tr}\left(B'\Omega^{-1}B\right)\sum_{j=1}^{T-1}\left|\sigma_y^{(j)2}\right| \le \frac{2k}{T}\sum_{j=1}^{T-1}\left|\sigma_y^{(j)2}\right|$$
For the third term,
$$\Delta_3 = \frac{2}{T}\sum_{i=1}^k\sum_{j=1}^{T-1}\left|\omega_i'B'Q_f\Lambda_f^{-\frac12}E(s_ty_{t-j})E(y_ts_{t-j}')\Lambda_f^{-\frac12}Q_f'B\omega_i\right|$$
$$= \frac{2}{T}\sum_{i=1}^k\omega_i'B'Q_f\Lambda_f^{-1}Q_f'B\omega_i\sum_{j=1}^{T-1}\left|E(s_{it}y_{t-j})E(y_ts_{i,t-j})\right|$$
$$\le \frac{2}{T}\operatorname{tr}\left(B'\Omega^{-1}B\right)\sigma_y^{(0)}\sup_i\sum_{j=1}^{T-1}\left|E(y_ts_{i,t-j})\right| \le \frac{2k\sigma_y^{(0)}}{T}\sup_i\sum_{j=1}^{T-1}\left|E(y_ts_{i,t-j})\right|$$
So combining the three terms yields
$$\sum_{i=1}^k\operatorname{var}\left[\omega_i'\left(B'Q_f\Lambda_f^{-1}Q_f'S_{xy} - \beta\right)\right] \le \frac{2k}{T}\sum_{j=1}^{T-1}\left|\sigma_y^{(j)2}\right| + \frac{2k\sigma_y^{(0)}}{T}\sup_i\sum_{j=1}^{T-1}\left|E(y_ts_{i,t-j})\right| = \frac{2k}{T}\sigma_y^{(0)2}\gamma \quad (3.13)$$
Also,
$$E\left[\omega_i'\left(B'Q_f\Lambda_f^{-1}Q_f'S_{xy} - \beta\right)\right] = \omega_i'(M - I)\beta$$
where $M = D^{\frac12}R\Lambda_f^{-1}R'D^{\frac12}$ and $R = U'Q_f$, so
$$\sum_{i=1}^k\left\{E\left[\omega_i'\left(B'Q_f\Lambda_f^{-1}Q_f'S_{xy} - \beta\right)\right]\right\}^2 = \beta'(M - I)^2\beta.$$
However, $M^2 \le B'Q_f\Lambda_f^{-1}Q_f'\Omega Q_f\Lambda_f^{-1}Q_f'B = M$, so $M \le I_k$ and hence $(M - I)^2 \le I - M$. Therefore, using Lemma 7,
$$\sum_{i=1}^k\left\{E\left[\omega_i'\left(B'Q_f\Lambda_f^{-1}Q_f'S_{xy} - \beta\right)\right]\right\}^2 \le \beta'(M - I)^2\beta \le \beta'\beta\left[\operatorname{maxeig}(I - M)\right]^2$$
$$= \beta'\beta\left[\operatorname{maxeig}\left(\Lambda_f^{-\frac12}Q_f'\Psi Q_f\Lambda_f^{-\frac12}\right)\right]^2 \le \beta'\beta\left(\frac{\sigma^2}{\lambda_k}\right)^2 \quad (3.14)$$
Combining equations (3.11) to (3.14) yields
$$\frac{(E|e_a|)^2}{\|\beta\|_F^2} \le k\left(\left(\frac{\sigma^2}{\lambda_k}\right)^2 + \frac{2k}{T}\frac{\sigma_y^{(0)2}\gamma}{\|\beta\|_F^2}\right)$$
Noting that $\rho = \frac{\sigma^2}{\lambda_k}$ and $\frac{1}{r^2} = \frac{\sigma_y^2}{\|\beta\|_F^2}$, combining the above results yields the result of the theorem.
Proof of Theorem 3.1.2: From Magnus and Neudecker (1991, p.208, Theorem 9), $\lambda_k \ge d_k \ge Nd_L$, so $\rho = \frac{\sigma^2}{\lambda_k} \le \frac{\sigma^2}{d_k} \le \frac{\sigma^2}{Nd_L} = O\left(N^{-\alpha}\right)$.
Lemmas Used in Theorems

Lemma 3. If $\omega \sim N(0,\Gamma)$ and $\alpha$ and $\beta$ are vectors of conformable dimension, then $E(\alpha'\omega)^2(\beta'\omega)^2 = \alpha'\Gamma\alpha\,\beta'\Gamma\beta + 2(\alpha'\Gamma\beta)^2$. This is a standard property of Gaussian distributions. See, e.g., Johnson and Kotz (1972).

Corollary 2. $\operatorname{var}(\alpha'\omega\,\beta'\omega) = \alpha'\Gamma\alpha\,\beta'\Gamma\beta + (\alpha'\Gamma\beta)^2$. The proof is elementary.
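The identity in Lemma 3 (and hence Corollary 2) is easy to confirm by simulation. The following sketch is illustrative only; the dimension, covariance matrix, and tolerance are arbitrary choices, not taken from the text:

```python
import numpy as np

# Monte Carlo check of Lemma 3: for w ~ N(0, Gamma),
# E[(a'w)^2 (b'w)^2] = a'Γa · b'Γb + 2(a'Γb)^2.
rng = np.random.default_rng(0)
n = 3
A = rng.standard_normal((n, n))
Gamma = A @ A.T                       # an arbitrary positive definite covariance
a = rng.standard_normal(n)
b = rng.standard_normal(n)

w = rng.multivariate_normal(np.zeros(n), Gamma, size=1_000_000)
lhs = np.mean((w @ a) ** 2 * (w @ b) ** 2)
rhs = (a @ Gamma @ a) * (b @ Gamma @ b) + 2 * (a @ Gamma @ b) ** 2
assert abs(lhs - rhs) / rhs < 0.05    # agreement up to Monte Carlo error
```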
Lemma 4. If $z_t = \begin{pmatrix} w_t \\ u_t \end{pmatrix}$ is Gaussian and $E(z_tz_{t-j}') = \Gamma^{(j)} = \begin{pmatrix} \Gamma_w^{(j)} & \Gamma_{wu}^{(j)} \\ \Gamma_{uw}^{(j)} & \Gamma_u^{(j)} \end{pmatrix}$, then, using Lemma 3 and the Cauchy-Schwarz inequality,
$$\operatorname{var}(a'S_{wu}b) = \operatorname{var}\left(\frac{1}{T}\sum_{t=1}^Ta'w_t\,b'u_t\right) \le \frac{2}{T}\left(a'\Gamma_w^{(0)}a\,b'\Gamma_u^{(0)}b + \sum_{j=1}^{T-1}a'\Gamma_w^{(j)}a\,b'\Gamma_u^{(j)}b + \sum_{j=1}^{T-1}a'\Gamma_{wu}^{(j)}b\,a'\Gamma_{wu}^{(-j)}b\right)$$
where $a$ and $b$ are vectors of conformable dimension.

Corollary 3. $E(\alpha'u)^2(\beta'v)^2 = \alpha'\Gamma_u\alpha\,\beta'\Gamma_v\beta + 2(\alpha'\Gamma_{uv}\beta)^2 \le 3\alpha'\Gamma_u\alpha\,\beta'\Gamma_v\beta$. The proof is elementary.
Lemma 5. If $Z$ is a random vector, $e_i$ is a $k \times 1$ vector of zeros but with a 1 in position $i$, and $M$ is a $k \times k$ constant matrix, then
$$E(Z'M'MZ) = E\left(Z'M'\sum_{i=1}^ke_ie_i'MZ\right) = \sum_{i=1}^kE(Z'M'e_i)(e_i'MZ) = \sum_{i=1}^kE(e_i'MZ)^2$$
Lemma 6. $\Lambda_f = Q_f'(UDU' + \Psi)Q_f$

Lemma 7. If $M = D^{\frac12}R\Lambda_f^{-1}R'D^{\frac12}$, where $R = U'Q_f$, then the eigenvalues of $I - M$ are equal to the eigenvalues of $\Lambda_f^{-\frac12}Q_f'\Psi Q_f\Lambda_f^{-\frac12}$.

Lemma 8. The eigenvalues of $D^{\frac12}R\Lambda_f^{-2}R'D^{\frac12}$ are equal to the eigenvalues of $\Lambda_f^{-1}R'DR\Lambda_f^{-1}$, where $R = U'Q_f$.

Lemma 9. $|q_i'u_j| \le 2c\rho$ for $i \ne j$, where $q_i$ is the $i$th column of $Q_f$, $u_j$ is the $j$th column of $U$, and $c = \max_{\substack{1\le i\le k,\ 1\le j\le N \\ i\ne j}}\frac{\lambda_i}{|\lambda_j - \lambda_i|}$.

Lemma 10. $1 - \rho \le \sum_{j=1}^k(q_i'u_j)^2$
Proofs of Lemmas

Proof of Lemma 6: $\Omega Q_f = Q_f\Lambda_f \Rightarrow (UDU' + \Psi)Q_f = Q_f\Lambda_f$. Premultiplying by $Q_f'$ gives the result.

Proof of Lemma 7: Using Lemma 6, the eigenvalues of $I - M$ are the solutions of
$$0 = |\lambda I - (I - M)| = |(\lambda - 1)I + M| = \left|(\lambda - 1)I + D^{\frac12}R\Lambda_f^{-1}R'D^{\frac12}\right| = \left|(\lambda - 1)I + R\Lambda_f^{-1}R'D\right|$$
$$= \left|(\lambda - 1)I + I - Q_f'\Psi Q_f\Lambda_f^{-1}\right| = \left|\lambda I - Q_f'\Psi Q_f\Lambda_f^{-1}\right| = \left|\lambda I - \Lambda_f^{-\frac12}Q_f'\Psi Q_f\Lambda_f^{-\frac12}\right|.$$
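Lemma 7 can be spot-checked numerically. The sketch below builds a toy $\Omega = UDU' + \Psi$ (all values are arbitrary assumptions for illustration), extracts $Q_f$ and $\Lambda_f$ as the top-$k$ eigenpairs of $\Omega$, and compares the two spectra:

```python
import numpy as np

# Numerical check of Lemma 7 on a toy model: with Ω = U D U' + Ψ,
# Q_f/Λ_f the top-k eigenpairs of Ω, R = U'Q_f and
# M = D^{1/2} R Λ_f^{-1} R' D^{1/2}, the eigenvalues of I - M equal
# those of Λ_f^{-1/2} Q_f' Ψ Q_f Λ_f^{-1/2}.
rng = np.random.default_rng(1)
N, k = 20, 3
U, _ = np.linalg.qr(rng.standard_normal((N, k)))   # orthonormal N x k
D = np.diag([30.0, 20.0, 10.0])
Psi = np.diag(rng.uniform(0.1, 0.5, N))            # diagonal error covariance
Omega = U @ D @ U.T + Psi

evals, evecs = np.linalg.eigh(Omega)               # ascending order
Qf = evecs[:, ::-1][:, :k]                         # top-k eigenvectors
Lf = np.diag(evals[::-1][:k])                      # top-k eigenvalues
R = U.T @ Qf
M = np.sqrt(D) @ R @ np.linalg.inv(Lf) @ R.T @ np.sqrt(D)
Lh = np.diag(1 / np.sqrt(np.diag(Lf)))
e1 = np.sort(np.linalg.eigvalsh(np.eye(k) - M))
e2 = np.sort(np.linalg.eigvalsh(Lh @ Qf.T @ Psi @ Qf @ Lh))
assert np.allclose(e1, e2)
```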
Proof of Lemma 8: The eigenvalues of $D^{\frac12}R\Lambda_f^{-2}R'D^{\frac12}$ are the solutions for $\lambda$ of
$$0 = \left|\lambda I - D^{\frac12}R\Lambda_f^{-2}R'D^{\frac12}\right| = \left|\lambda I - R\Lambda_f^{-2}R'D\right| = \left|\lambda I - \Lambda_f^{-1}R'DR\Lambda_f^{-1}\right|$$
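Lemma 8 is an instance of the general fact that $AB$ and $BA$ share eigenvalues, here with $A = D^{\frac12}R\Lambda_f^{-1}$. A quick numerical sketch with arbitrary toy matrices (the dimensions and values are assumptions for illustration):

```python
import numpy as np

# Check of Lemma 8 with arbitrary k x k matrices: the eigenvalues of
# D^{1/2} R Λ_f^{-2} R' D^{1/2} equal those of Λ_f^{-1} R' D R Λ_f^{-1},
# since both have the form AB and BA with A = D^{1/2} R Λ_f^{-1}.
rng = np.random.default_rng(2)
k = 4
R = rng.standard_normal((k, k))
D = np.diag(rng.uniform(1.0, 2.0, k))
Lf = np.diag(rng.uniform(1.0, 3.0, k))
Li = np.linalg.inv(Lf)
M1 = np.sqrt(D) @ R @ Li @ Li @ R.T @ np.sqrt(D)
M2 = Li @ R.T @ D @ R @ Li
e1 = np.sort(np.linalg.eigvalsh(M1))
e2 = np.sort(np.linalg.eigvalsh(M2))
assert np.allclose(e1, e2)
```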
Proof of Lemma 9: $Q_f\Lambda_fQ_f' + Q_\perp\Lambda_\perp Q_\perp' = UDU' + \Psi$. Premultiplying by $Q_f'$, postmultiplying by $U\Lambda_f^{-1}$, and subtracting $Q_f'U$ yields
$$\Lambda_fQ_f'U\Lambda_f^{-1} - Q_f'U = Q_f'\Psi U\Lambda_f^{-1} - Q_f'U(I - D\Lambda_f^{-1}) \quad (3.15)$$
We now consider each of the right-hand-side terms. Let $e_i$ be a vector of zeros with a 1 in the $i$th element only. We have
$$\left(e_i'Q_f'\Psi U\Lambda_f^{-1}e_j\right)^2 \le \operatorname{tr}\left(U'\Psi Q_fe_ie_i'Q_f'\Psi U\Lambda_f^{-2}\right) \le \frac{1}{\lambda_k^2}\operatorname{tr}\left(e_i'Q_f'\Psi UU'\Psi Q_fe_i\right) \le \frac{1}{\lambda_k^2}e_i'Q_f'\Psi^2Q_fe_i \le \frac{\sigma^4}{\lambda_k^2}$$
$$\therefore\ \left|e_i'Q_f'\Psi U\Lambda_f^{-1}e_j\right| \le \rho \quad (3.16)$$
For the other right-hand-side term we have $\left|e_i'Q_f'U(I - D\Lambda_f^{-1})e_j\right| = \left|q_i'u_j\left(1 - \frac{d_j}{\lambda_j}\right)\right| = |q_i'u_j|\left(1 - \frac{d_j}{\lambda_j}\right)$ since $\lambda_j \ge d_j$. But $1 - \frac{d_j}{\lambda_j} \le \rho$ from Theorem 3.1.1(a), so $|q_i'u_j|\left(1 - \frac{d_j}{\lambda_j}\right) \le \rho|q_i'u_j|$, and $|q_i'u_j| \le 1$ by the Cauchy-Schwarz inequality, so
$$\left|e_i'Q_f'U(I - D\Lambda_f^{-1})e_j\right| \le \rho \quad (3.17)$$
Combining equations (3.15), (3.16), and (3.17),
$$\left|e_i'\left(\Lambda_fQ_f'U\Lambda_f^{-1} - Q_f'U\right)e_j\right| = \left|e_i'Q_f'\Psi U\Lambda_f^{-1}e_j - e_i'Q_f'U(I - D\Lambda_f^{-1})e_j\right| \le \left|e_i'Q_f'\Psi U\Lambda_f^{-1}e_j\right| + \left|e_i'Q_f'U(I - D\Lambda_f^{-1})e_j\right| \le 2\rho$$
i.e. $\left|\left(\frac{\lambda_i}{\lambda_j} - 1\right)q_i'u_j\right| \le 2\rho \Rightarrow |q_i'u_j| \le 2c\rho$ for $i \ne j$.
Proof of Lemma 10: $Q_f\Lambda_fQ_f' + Q_\perp\Lambda_\perp Q_\perp' = UDU' + \Psi$. Premultiply by $Q_f'UU'$, postmultiply by $Q_f$, and substitute $R = U'Q_f$ to get
$$R'R\Lambda_f = R'DR + R'U'\Psi Q_f \quad (3.18)$$
Also,
$$\Lambda_f = Q_f'\Omega Q_f \Rightarrow \Lambda_f = Q_f'(UDU' + \Psi)Q_f \Rightarrow \Lambda_f = R'DR + Q_f'\Psi Q_f \quad (3.19)$$
Subtract Equation (3.19) from Equation (3.18) and postmultiply by $\Lambda_f^{-1}$ to get
$$R'R - I = R'U'\Psi Q_f\Lambda_f^{-1} - Q_f'\Psi Q_f\Lambda_f^{-1} = Q_f'UU'\Psi Q_f\Lambda_f^{-1} - Q_f'\Psi Q_f\Lambda_f^{-1} = Q_f'(UU' - I)\Psi Q_f\Lambda_f^{-1} = -Q_f'U_\perp U_\perp'\Psi Q_f\Lambda_f^{-1}$$
so
$$e_i'(I - R'R)e_i = e_i'\left(Q_f'U_\perp U_\perp'\Psi Q_f\Lambda_f^{-1}\right)e_i \le \frac{\sigma^2}{\lambda_k} = \rho,$$
i.e. $1 - \rho \le \sum_{j=1}^k(q_i'u_j)^2$.
Appendix B Proofs of Theorems 3.1.3 and 3.1.4

Proofs of Theorems

Proof of Theorem 3.1.3(a):
$$\max_{1\le j\le N}\frac{1}{N^2}\left(\hat\lambda_j - \lambda_j\right)^2 \le \frac{1}{N^2}\sum_{j=1}^N\left(\hat\lambda_j - \lambda_j\right)^2 \le \frac{1}{N^2}\|S_{xx} - \Omega\|_F^2 = O_p\left(T^{-1}\right)$$
from Lemmas 11 and 15.
Proof of Theorem 3.1.3(b):
$$\hat s_{ft} - Ls_{ft} = \hat\Lambda_f^{-\frac12}\hat Q_f'x_t - L\Lambda_f^{-\frac12}Q_f'x_t$$
$$= \hat\Lambda_f^{-\frac12}\hat Q_f'\left(Q_fQ_f' + Q_\perp Q_\perp'\right)x_t - L\Lambda_f^{-\frac12}Q_f'x_t$$
$$= \hat\Lambda_f^{-\frac12}\hat Q_f'Q_fQ_f'x_t + \hat\Lambda_f^{-\frac12}\hat Q_f'Q_\perp Q_\perp'x_t - L\Lambda_f^{-\frac12}Q_f'x_t$$
$$= \hat\Lambda_f^{-\frac12}\hat Q_f'Q_fQ_f'x_t - \hat\Lambda_f^{-\frac12}LQ_f'x_t + \hat\Lambda_f^{-\frac12}LQ_f'x_t - \Lambda_f^{-\frac12}LQ_f'x_t + \hat\Lambda_f^{-\frac12}\hat Q_f'Q_\perp Q_\perp'x_t$$
$$= \hat\Lambda_f^{-\frac12}\left(\hat Q_f'Q_f - L\right)Q_f'x_t + \left(\hat\Lambda_f^{-\frac12} - \Lambda_f^{-\frac12}\right)LQ_f'x_t + \hat\Lambda_f^{-\frac12}\hat Q_f'Q_\perp Q_\perp'x_t$$
Therefore
$$\left\|\hat s_{ft} - Ls_{ft}\right\|_2 \le \sqrt N\hat\lambda_k^{-\frac12}\left\|\hat Q_f'Q_f - L\right\|_2\left\|\tfrac{1}{\sqrt N}Q_f'x_t\right\|_2 + \sqrt N\left\|\hat\Lambda_f^{-\frac12} - \Lambda_f^{-\frac12}\right\|_2\left\|\tfrac{1}{\sqrt N}Q_f'x_t\right\|_2 + \sqrt N\hat\lambda_k^{-\frac12}\left\|\hat Q_f'Q_\perp\right\|_2\left\|\tfrac{1}{\sqrt N}Q_\perp'x_t\right\|_2 \quad (3.20)$$
The following bounds apply to the right hand side terms in Equation (3.20).
• Since $\lambda_k > 0$, it follows from Theorem 3.1.3(a) that
$$\sqrt N\hat\lambda_k^{-\frac12} = \sqrt N\lambda_k^{-\frac12} + O_p\left(T^{-\frac12}\right) = O(1)$$
• $\sqrt N\left\|\hat\Lambda_f^{-\frac12} - \Lambda_f^{-\frac12}\right\|_2 = \max_{1\le j\le k}\left(\sqrt N\hat\lambda_j^{-\frac12} - \sqrt N\lambda_j^{-\frac12}\right) = O_p\left(T^{-\frac12}\right)$ from the above bound.
• $\left\|\hat Q_f'Q_\perp\right\|_2 = O_p\left(T^{-\frac12}\right)$ from Lemmas 14 and 15.
•
$$\left\|\hat Q_f'Q_f - L\right\|_2 = \left\|\hat R_f - L\right\|_2 \le \left\|\hat R_f - L\right\|_F = \sqrt{\sum_{i=1}^k\sum_{j=1}^k\left(\hat R_{ij} - L_{ij}\right)^2} \le \sqrt{k\max_{1\le i\le k}\sum_{j=1}^k\left(\hat R_{ij} - L_{ij}\right)^2}$$
$$\le \sqrt k\max_{1\le i\le k}\sum_{j=1}^k\left|\hat R_{ij} - L_{ij}\right| = \sqrt k\max_{1\le i\le k}\left(\sum_{\substack{j=1 \\ j\ne i}}^k\left|\hat R_{ij}\right| + \left|\hat R_{ii} - L_{ii}\right|\right)$$
But for $i \ne j$, $\hat R_{ij}^2 = O_p(T^{-1})$ from Lemmas 12 and 15 and the Markov inequality. Therefore $\left|\hat R_{ij}\right| = O_p\left(T^{-\frac12}\right)$ for $i \ne j$. Also, from Lemmas 13 and 15 and the Markov inequality, $1 - \hat R_{ii}^2 = O_p\left(T^{-\frac12}\right)$. Therefore $\exists L_{ii} \in \{-1,+1\}$ such that $\left|\hat R_{ii} - L_{ii}\right| = O_p\left(T^{-\frac12}\right)$. Consequently $\left\|\hat Q_f'Q_f - L\right\|_2 = O_p\left(T^{-\frac12}\right)$.
•
$$E\left(\left\|\tfrac{1}{\sqrt N}Q_\perp'x_t\right\|_2\right) \le E\left(\left\|\tfrac{1}{\sqrt N}Q_\perp'x_t\right\|_F\right) = E\sqrt{\left\|\tfrac{1}{\sqrt N}Q_\perp'x_t\right\|_F^2} \le \sqrt{E\left(\left\|\tfrac{1}{\sqrt N}Q_\perp'x_t\right\|_F^2\right)} \le \sqrt{\tfrac{1}{N}\operatorname{tr}\left(Q_\perp'\Omega Q_\perp\right)}$$
$$\le \sqrt{\tfrac{1}{N}\operatorname{tr}(\Omega) - \tfrac{1}{N}\operatorname{tr}\left(Q_f'\Omega Q_f\right)} \le \sqrt{\tfrac{1}{N}\operatorname{tr}(\Omega)} = O(1)$$
from Assumption 3.2. Therefore $\left\|\tfrac{1}{\sqrt N}Q_\perp'x_t\right\|_2 = O_p(1)$.
•
$$E\left(\left\|\tfrac{1}{\sqrt N}Q_f'x_t\right\|_2\right) \le E\left(\left\|\tfrac{1}{\sqrt N}Q_f'x_t\right\|_F\right) = E\sqrt{\left\|\tfrac{1}{\sqrt N}Q_f'x_t\right\|_F^2} \le \sqrt{E\left(\left\|\tfrac{1}{\sqrt N}Q_f'x_t\right\|_F^2\right)} \le \sqrt{\tfrac{1}{N}\operatorname{tr}\left(Q_f'\Omega Q_f\right)} = \sqrt{\tfrac{1}{N}\sum_{j=1}^k\lambda_j} = O(1)$$
from Assumption 3.2. Therefore $\left\|\tfrac{1}{\sqrt N}Q_f'x_t\right\|_2 = O_p(1)$.

The above bounds and Equation (3.20) prove the theorem.
Proof of Theorem 3.1.3(c):
$$\left\|\hat\beta_s - L\beta_s\right\|_2 \le \left\|\frac{1}{T}\sum_{t=1}^T\left(\hat s_{ft} - Ls_{ft}\right)y_t\right\|_2 \le \frac{1}{T}\sum_{t=1}^T\left\|\left(\hat s_{ft} - Ls_{ft}\right)y_t\right\|_2$$
As in the proof of Theorem 3.1.3(b),
$$\hat s_{ft} - Ls_{ft} = \hat\Lambda_f^{-\frac12}\left(\hat Q_f'Q_f - L\right)Q_f'x_t + \left(\hat\Lambda_f^{-\frac12} - \Lambda_f^{-\frac12}\right)LQ_f'x_t + \hat\Lambda_f^{-\frac12}\hat Q_f'Q_\perp Q_\perp'x_t$$
Therefore
$$\frac{1}{T}\sum_{t=1}^T\left\|\left(\hat s_{ft} - Ls_{ft}\right)y_t\right\|_2 \le \sqrt N\hat\lambda_k^{-\frac12}\left\|\hat Q_f'Q_f - L\right\|_2\frac{1}{T}\sum_{t=1}^T\left\|\tfrac{1}{\sqrt N}Q_f'x_ty_t\right\|_2 + \sqrt N\left\|\hat\Lambda_f^{-\frac12} - \Lambda_f^{-\frac12}\right\|_2\frac{1}{T}\sum_{t=1}^T\left\|\tfrac{1}{\sqrt N}Q_f'x_ty_t\right\|_2 + \sqrt N\hat\lambda_k^{-\frac12}\left\|\hat Q_f'Q_\perp\right\|_2\frac{1}{T}\sum_{t=1}^T\left\|\tfrac{1}{\sqrt N}Q_\perp'x_ty_t\right\|_2 \quad (3.21)$$
As shown in the proof of Theorem 3.1.3(b), the following bounds apply:
• $\sqrt N\hat\lambda_k^{-\frac12} = \sqrt N\lambda_k^{-\frac12} + O_p\left(T^{-\frac12}\right) = O(1)$, since $\lambda_k > 0$ and Theorem 3.1.3(a) holds.
• $\sqrt N\left\|\hat\Lambda_f^{-\frac12} - \Lambda_f^{-\frac12}\right\|_2 = O_p\left(T^{-\frac12}\right)$ from the above bound.
• $\left\|\hat Q_f'Q_\perp\right\|_2 = O_p\left(T^{-\frac12}\right)$ from Lemmas 14 and 15.
• $\left\|\hat Q_f'Q_f - L\right\|_2 = O_p\left(T^{-\frac12}\right)$, with the sign matrix $L$ chosen as in the proof of Theorem 3.1.3(b).
Also:
•
$$E\left(\frac{1}{T}\sum_{t=1}^T\left\|\tfrac{1}{\sqrt N}Q_\perp'x_ty_t\right\|_2\right) = \frac{1}{T}\sum_{t=1}^TE\left(\left\|\tfrac{1}{\sqrt N}Q_\perp'x_t\right\|_2|y_t|\right) \le \frac{1}{T}\sum_{t=1}^T\sqrt{E\left(\left\|\tfrac{1}{\sqrt N}Q_\perp'x_t\right\|_F^2\right)E\left(y_t^2\right)}$$
$$\le \frac{1}{T}\sum_{t=1}^T\sqrt{\tfrac{1}{N}\operatorname{tr}\left(Q_\perp'\Omega Q_\perp\right)\sigma_y^2} \le \sqrt{\tfrac{1}{N}\operatorname{tr}\left(\Omega - Q_f'\Omega Q_f\right)\sigma_y^2} \le \frac{1}{T}\sum_{t=1}^T\sqrt{\tfrac{1}{N}\operatorname{tr}(\Omega)\sigma_y^2} = O(1)$$
from assumptions 3.2 and 3.5. Therefore $\frac{1}{T}\sum_{t=1}^T\left\|\tfrac{1}{\sqrt N}Q_\perp'x_ty_t\right\|_2 = O_p(1)$.
•
$$E\left(\frac{1}{T}\sum_{t=1}^T\left\|\tfrac{1}{\sqrt N}Q_f'x_ty_t\right\|_2\right) = \frac{1}{T}\sum_{t=1}^TE\left(\left\|\tfrac{1}{\sqrt N}Q_f'x_t\right\|_2|y_t|\right) \le \frac{1}{T}\sum_{t=1}^T\sqrt{E\left(\left\|\tfrac{1}{\sqrt N}Q_f'x_t\right\|_F^2\right)E\left(y_t^2\right)}$$
$$\le \frac{1}{T}\sum_{t=1}^T\sqrt{\tfrac{1}{N}\operatorname{tr}\left(Q_f'\Omega Q_f\right)\sigma_y^2} \le \frac{1}{T}\sum_{t=1}^T\sqrt{\tfrac{1}{N}\sum_{j=1}^k\lambda_j\sigma_y^2} = O(1)$$
from assumptions 3.2 and 3.5.

The above bounds and Equation (3.21) prove the theorem.
Proof of Theorem 3.1.3(d):
$$\hat y_{sT+h} - y_{sT+h} = \hat\beta_s'\hat s_{fT+h} - \beta_s's_{fT+h} = \hat\beta_s'\left(\hat s_{fT+h} - Ls_{fT+h}\right) + \left(\hat\beta_s'L - \beta_s'\right)s_{fT+h}$$
so
$$\left|\hat y_{sT+h} - y_{sT+h}\right| \le \left\|\hat\beta_s\right\|_2\left\|\hat s_{fT+h} - Ls_{fT+h}\right\|_2 + \left\|\hat\beta_s - L\beta_s\right\|_2\left\|s_{fT+h}\right\|_2$$
since $L$ is orthogonal. The following bounds hold:
• $\left\|\hat s_{fT+h} - Ls_{fT+h}\right\|_2 = O_p\left(T^{-\frac12}\right)$ from Theorem 3.1.3(b).
• $\left\|\hat\beta_s - L\beta_s\right\|_2 = O_p\left(T^{-\frac12}\right)$ from Theorem 3.1.3(c).
• $E\left\|s_{fT+h}\right\|_2^2 \le E\left\|s_{fT+h}\right\|_F^2 = \operatorname{tr}\left(\Lambda_f^{-\frac12}Q_f'\Omega Q_f\Lambda_f^{-\frac12}\right) = k$. Therefore $\left\|s_{fT+h}\right\|_2 = O_p(1)$.
• $\hat\beta_s = L\beta_s + O_p\left(T^{-\frac12}\right)$ from Theorem 3.1.3(c). Therefore $\left\|\hat\beta_s\right\|_2 = O_p(1)$.
and the result follows.
Proof of Theorem 3.1.4: The proof of Theorem 3.1.4 is based on the following inequalities:
(a) $\left|\frac{1}{N}\hat\lambda_j - \frac{1}{N}d_j\right| \le \left|\frac{1}{N}\hat\lambda_j - \frac{1}{N}\lambda_j\right| + \left|\frac{1}{N}\lambda_j - \frac{1}{N}d_j\right|$
(b) $\left\|\hat s_{ft} - Lf_t\right\|_2 \le \left\|\hat s_{ft} - L_1s_{ft}\right\|_2 + \left\|L_1s_{ft} - Lf_t\right\|_2$
(c) $\left\|\hat\beta_s - L\beta_f\right\|_2 \le \left\|\hat\beta_s - L_1\beta_s\right\|_2 + \left\|L_1\beta_s - L\beta_f\right\|_2$
(d) $\left|\hat y_{sT+h} - y_{fT+h}\right| \le \left|\hat y_{sT+h} - y_{sT+h}\right| + \left|y_{sT+h} - y_{fT+h}\right|$
where $L_1$ is a sign matrix. Theorem 3.1.1 gives bounds linking population principal component quantities to population factor quantities, which are relevant for the second term in each of the above inequalities. However, these bounds are written in terms of the noise-to-signal ratio $\rho = \frac{\sigma^2}{\lambda_k}$. The assumptions of Theorem 3.1.4 are written in terms of the eigenvalues of $B'B$ ($d_j$) rather than the eigenvalues of $\Omega$ ($\lambda_j$). In order to utilise the results of Theorem 3.1.1, the bounds are re-derived in terms of the modified noise-to-signal ratio $\bar\rho = \frac{\sigma^2}{d_k}$. These results are given as Lemmas 18, 21 and 22. Under assumptions 4.1(a) and 4.2(b), from Theorem 3.1.2 we have $\bar\rho = O\left(N^{-\alpha}\right)$. Therefore, under Assumption 3.4, Lemma 21 yields
$$\left\|L_1s_{ft} - Lf_t\right\|_2 = O_p\left(N^{-\frac\alpha2}\right)$$
and Lemma 22 yields
$$\left\|L_1\beta_s - L\beta_f\right\|_2 = O_p\left(N^{-\frac\alpha2}\right)$$
Since $y_{sT+h} - y_{fT+h} = \beta_s's_{fT+h} - \beta_f'f_{T+h}$, these two results yield
$$y_{sT+h} - y_{fT+h} = O_p\left(N^{-\frac\alpha2}\right)$$
Lemma 18 states that $1 - \bar\rho \le \frac{d_j}{\lambda_j} \le 1$. It follows that
$$\left|\frac{1}{N}\lambda_j - \frac{1}{N}d_j\right| \le \frac{\lambda_j}{N}\bar\rho = O\left(N^{-\alpha}\right)$$
since $\bar\rho = O(N^{-\alpha})$ from Theorem 3.1.2 and, as shown in the proof of Theorem 3.1.1(a), under assumptions 4.1(a) and 4.2(b), $d_j \le \lambda_j \le d_j + \sigma^2$, which implies that $\frac{\lambda_j}{N} = O(1)$ under Assumption 4.1(a). Thus, we have bounds for the second terms on the right-hand sides of inequalities (a) to (d).

Bounds for the first terms on the right-hand sides of inequalities (a) to (d) are provided by Theorem 3.1.3. The following points prove that the assumptions of Theorem 3.1.3 are satisfied by the assumptions of Theorem 3.1.4.
• Assumption 3.1 of Theorem 3.1.3 is satisfied by Assumption 4.3(a) of Theorem 3.1.4.
• Assumption 3.2 of Theorem 3.1.3 is satisfied by assumptions 4.1(a), 4.2(a), 4.2(b) and 4.3(a) of Theorem 3.1.4.
• Assumption 3.3 of Theorem 3.1.3 is satisfied by Assumption 4.4 of Theorem 3.1.4 using Lemma 17.
• Assumption 3.4 of Theorem 3.1.3 is satisfied by assumptions 4.1(a), 4.1(b) and 4.2(b) of Theorem 3.1.4 using Lemma 16.
• Assumption 3.5 of Theorem 3.1.3 is the same as Assumption 4.5 of Theorem 3.1.4.
Thus, under the assumptions of Theorem 3.1.4, the results of Theorem 3.1.3 hold and the first terms on the right-hand side of inequalities (a) to (d) are all $O_p\left(T^{-\frac12}\right)$.
Lemmas Used in Theorems

Lemma 11. $\sum_{j=1}^N\left(\hat\lambda_j - \lambda_j\right)^2 \le \|S_{xx} - \Omega\|_F^2$

Lemma 12. Under Assumption 3.4,
$$\sum_{j=1}^k\sum_{\substack{i=1 \\ i\ne j}}^N\hat R_{ij}^2 \le \frac{4}{\Delta^2N^2}\|S_{xx} - \Omega\|_F^2$$

Lemma 13. Under Assumption 3.4,
$$1 - \hat R_{jj}^2 \le \frac{4}{\Delta^2N^2}\|S_{xx} - \Omega\|_F^2$$

Lemma 14. Under Assumption 3.4,
$$\left\|\hat Q_\perp'Q_f\right\|_2^2 \le \frac{4k}{\Delta^2N^2}\|S_{xx} - \Omega\|_F^2$$

Lemma 15. Under assumptions 3.1, 3.2 and 3.3,
$$E\|S_{xx} - \Omega\|_F^2 \le \frac{2\gamma N^2}{T}$$

Lemma 16. Under assumptions 4.1(a), 4.1(b) and 4.2(b), $\exists\theta, N_0 > 0$ such that
$$N > N_0 \implies \theta N < |\lambda_j - \lambda_i|$$
for $i = 1,\dots,k$, $j = 1,\dots,N$ and $i \ne j$.

Lemma 17. Under Assumption 4.4,
$$\sup_t\sup_N\max_{\substack{1\le i\le N \\ 1\le j\le N}}\sum_{r=0}^\infty\left|\operatorname{cov}\left(x_{it}x_{jt}, x_{it-r}x_{jt-r}\right)\right| < \gamma < \infty$$

Lemma 18. $1 - \bar\rho \le \frac{d_i}{\lambda_i} \le 1$ for $i = 1,\dots,k$.

Lemma 19. $|q_i'u_j| \le 2c\bar\rho$ for $i \ne j$, where $q_i$ is the $i$th column of $Q_f$ and $u_j$ is the $j$th column of $U$.

Lemma 20. $1 - \bar\rho \le \sum_{j=1}^k(q_i'u_j)^2$

Lemma 21. If $k = 1$ or $c \le \frac{1-\bar\rho}{2\bar\rho\sqrt{(k-1)(1-\bar\rho)}}$, then there exists a sign matrix $L$ such that $E\|s_{ft} - Lf_t\|_F^2 \le k\left(2\bar\rho + \bar\rho^2\left(4c^2(k-1)(1-\bar\rho) - 1\right)\right)$

Lemma 22. If $k = 1$ or $c \le \frac{1-\bar\rho}{2\bar\rho\sqrt{(k-1)(1-\bar\rho)}}$, then there exists a sign matrix $L$ such that $E\left\|\beta_s - L\beta_f\right\|_F \le \sqrt{\sigma_y^2k\left(2\bar\rho + \bar\rho^2\left(4c^2(k-1)(1-\bar\rho) - 1\right)\right)}$.
Proofs of Lemmas

Proof of Lemma 11:
$$\sum_{j=1}^N\left(\hat\lambda_j - \lambda_j\right)^2 = \sum_{j=1}^N\hat\lambda_j^2 + \sum_{j=1}^N\lambda_j^2 - 2\sum_{j=1}^N\hat\lambda_j\lambda_j$$
Since $S_{xx}$ and $\Omega$ are positive definite and symmetric, it follows from Marcus (1956) that $\operatorname{tr}(S_{xx}\Omega) \le \sum_{j=1}^N\hat\lambda_j\lambda_j$. Therefore
$$\sum_{j=1}^N\left(\hat\lambda_j - \lambda_j\right)^2 \le \operatorname{tr}(S_{xx}S_{xx}) + \operatorname{tr}(\Omega\Omega) - 2\operatorname{tr}(S_{xx}\Omega) = \|S_{xx} - \Omega\|_F^2$$
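Lemma 11 is a Hoffman–Wielandt-type inequality and can be spot-checked numerically; the matrices below are arbitrary symmetric positive definite examples, not taken from the text:

```python
import numpy as np

# Spot check of Lemma 11: for symmetric positive definite S_xx and Ω,
# the sum of squared differences of sorted eigenvalues is bounded by
# the squared Frobenius norm of S_xx - Ω.
rng = np.random.default_rng(3)
N = 12
A = rng.standard_normal((N, N))
B = rng.standard_normal((N, N))
Sxx = A @ A.T
Omega = B @ B.T
lam_hat = np.sort(np.linalg.eigvalsh(Sxx))
lam = np.sort(np.linalg.eigvalsh(Omega))
lhs = np.sum((lam_hat - lam) ** 2)
rhs = np.sum((Sxx - Omega) ** 2)      # ‖S_xx − Ω‖_F²
assert lhs <= rhs + 1e-8
```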
Proof of Lemma 12:
$$\hat q_j'(S_{xx} - \Omega)q_i = \hat q_j'S_{xx}q_i - \hat q_j'\Omega q_i = \hat q_j'\hat Q\hat\Lambda\hat Q'q_i - \hat q_j'Q\Lambda Q'q_i$$
Note that $\hat q_j'\hat Q\hat\Lambda = (0\ \dots\ \hat\lambda_j\ \dots\ 0)$ and $\Lambda Q'q_i = (0\ \dots\ \lambda_i\ \dots\ 0)'$. It follows that $\hat q_j'\hat Q\hat\Lambda\hat Q'q_i = \hat\lambda_j\hat q_j'q_i = \hat\lambda_j\hat R_{ij}$ and $\hat q_j'Q\Lambda Q'q_i = \lambda_i\hat q_j'q_i = \lambda_i\hat R_{ij}$. Therefore, the above equation may be written as
$$\hat q_j'(S_{xx} - \Omega)q_i = \left(\hat\lambda_j - \lambda_i\right)\hat R_{ij} \quad (3.22)$$
We may also write
$$(\lambda_j - \lambda_i)\hat R_{ij} - \left(\lambda_j - \hat\lambda_j\right)\hat R_{ij} = \left(\hat\lambda_j - \lambda_i\right)\hat R_{ij} \quad (3.23)$$
so from equations (3.22) and (3.23)
$$(\lambda_j - \lambda_i)\hat R_{ij} = \left(\lambda_j - \hat\lambda_j\right)\hat R_{ij} + \hat q_j'(S_{xx} - \Omega)q_i$$
Therefore
$$(\lambda_j - \lambda_i)^2\hat R_{ij}^2 \le 2\left(\lambda_j - \hat\lambda_j\right)^2\hat R_{ij}^2 + 2\left(\hat q_j'(S_{xx} - \Omega)q_i\right)^2$$
Since $\sum_{i=1}^N\hat R_{ij}^2 = \sum_{i=1}^N\hat q_j'q_iq_i'\hat q_j = 1$, summing over $i$ yields
$$\sum_{i=1}^N(\lambda_j - \lambda_i)^2\hat R_{ij}^2 \le 2\left(\lambda_j - \hat\lambda_j\right)^2 + 2\hat q_j'(S_{xx} - \Omega)(S_{xx} - \Omega)\hat q_j$$
Summing over $j$ yields
$$\sum_{j=1}^N\sum_{i=1}^N(\lambda_j - \lambda_i)^2\hat R_{ij}^2 \le 2\sum_{j=1}^N\left(\hat\lambda_j - \lambda_j\right)^2 + 2\operatorname{tr}\left((S_{xx} - \Omega)(S_{xx} - \Omega)\right) \le 4\|S_{xx} - \Omega\|_F^2$$
from Lemma 11. Under Assumption 3.4, $N^2\Delta^2 < (\lambda_j - \lambda_i)^2$ for $i = 1,\dots,k$ and $j = 1,\dots,N$, $i \ne j$. Therefore
$$N^2\Delta^2\sum_{j=1}^k\sum_{\substack{i=1 \\ i\ne j}}^N\hat R_{ij}^2 \le \sum_{j=1}^k\sum_{\substack{i=1 \\ i\ne j}}^N(\lambda_j - \lambda_i)^2\hat R_{ij}^2 \le \sum_{j=1}^N\sum_{i=1}^N(\lambda_j - \lambda_i)^2\hat R_{ij}^2 \le 4\|S_{xx} - \Omega\|_F^2$$
which yields
$$\sum_{j=1}^k\sum_{\substack{i=1 \\ i\ne j}}^N\hat R_{ij}^2 \le \frac{4}{N^2\Delta^2}\|S_{xx} - \Omega\|_F^2$$
Proof of Lemma 13:
$$\sum_{i=1}^N\hat R_{ij}^2 = \sum_{i=1}^N\hat q_j'q_iq_i'\hat q_j = 1$$
Also
$$\sum_{i=1}^N\hat R_{ij}^2 = \sum_{\substack{i=1 \\ i\ne j}}^N\hat R_{ij}^2 + \hat R_{jj}^2$$
Therefore
$$1 - \hat R_{jj}^2 = \sum_{\substack{i=1 \\ i\ne j}}^N\hat R_{ij}^2 \le \sum_{j=1}^k\sum_{\substack{i=1 \\ i\ne j}}^N\hat R_{ij}^2 \le \frac{4}{N^2\Delta^2}\|S_{xx} - \Omega\|_F^2$$
from Lemma 12.
Proof of Lemma 14: $\hat Q_f\hat Q_f' + \hat Q_\perp\hat Q_\perp' = I$ so, denoting $\hat R_f = \hat Q_f'Q_f$,
$$\left\|\hat Q_\perp'Q_f\right\|_2^2 \le \left\|\hat Q_\perp'Q_f\right\|_F^2 = \operatorname{tr}\left(Q_f'\hat Q_\perp\hat Q_\perp'Q_f\right) = k - \operatorname{tr}\left(\hat R_f'\hat R_f\right)$$
$$= k - \sum_{i=1}^k\sum_{j=1}^k\hat R_{ij}^2 \le k - \sum_{j=1}^k\hat R_{jj}^2 = \sum_{j=1}^k\left(1 - \hat R_{jj}^2\right) \le \frac{4k}{N^2\Delta^2}\|S_{xx} - \Omega\|_F^2$$
from Lemma 13.
Proof of Lemma 15: $E\|S_{xx} - \Omega\|_F^2 = E\operatorname{tr}\left(\tilde\Omega\tilde\Omega\right) = \sum_{i=1}^N\sum_{j=1}^NE\left(\left[\tilde\Omega\right]_{ij}^2\right)$, where $\left[\tilde\Omega\right]_{ij}$ is the $i,j$th element of $\tilde\Omega = S_{xx} - \Omega$. Denoting the $i,j$th element of $\Omega$ as $\sigma_{ij}$, note that for all $i = 1,\dots,N$ and $j = 1,\dots,N$,
$$E\left(\left[\tilde\Omega\right]_{ij}^2\right) = E\left(\frac{1}{T}\sum_{t=1}^Tx_{it}x_{jt} - \sigma_{ij}\right)^2 = \frac{1}{T^2}\operatorname{var}\left(\sum_{t=1}^Tx_{it}x_{jt}\right) = \frac{1}{T^2}\sum_{t=1}^T\sum_{r=1}^T\operatorname{cov}(x_{it}x_{jt}, x_{ir}x_{jr})$$
$$\le \frac{2}{T^2}\sum_{t=1}^T\sum_{r=0}^t\left|\operatorname{cov}(x_{it}x_{jt}, x_{it-r}x_{jt-r})\right| \le \frac{2}{T}\sup_t\sum_{r=0}^\infty\left|\operatorname{cov}(x_{it}x_{jt}, x_{it-r}x_{jt-r})\right|$$
$$\le \frac{2}{T}\sup_t\sup_N\max_{\substack{1\le i\le N \\ 1\le j\le N}}\sum_{r=0}^\infty\left|\operatorname{cov}(x_{it}x_{jt}, x_{it-r}x_{jt-r})\right| \le \frac{2\gamma}{T}$$
Therefore
$$E\|S_{xx} - \Omega\|_F^2 \le \sum_{i=1}^N\sum_{j=1}^N\frac{2\gamma}{T} = \frac{2N^2\gamma}{T}$$
Proof of Lemma 16: Consider the case where $i < j$ and $j \le k$, so that $\lambda_i > \lambda_j$ and $d_i > d_j$. From the proof of Theorem 3.1.1(a) we have $\lambda_i \ge d_i$ and $\lambda_j \le d_j + \sigma^2$. It follows that
$$d_i - d_j - \sigma^2 \le \lambda_i - \lambda_j \quad (3.24)$$
From assumptions 4.1(a) and 4.1(b),
$$\frac{Nd_L}{c} < |d_i - d_j| \quad (3.25)$$
Also, from Assumption 4.2(b), $\exists M < \infty$ such that
$$\sigma^2 < MN^{1-\alpha} \quad (3.26)$$
Combining equations (3.24), (3.25) and (3.26) yields
$$\frac{Nd_L}{c} - MN^{1-\alpha} < \lambda_i - \lambda_j$$
Define $N_0 = \left(\frac{cM}{d_L}\right)^{\frac1\alpha}$. Then
$$N > N_0 \implies \exists\theta > 0 \text{ such that } \theta N < \frac{Nd_L}{c} - MN^{1-\alpha} < \lambda_i - \lambda_j$$
proving the lemma for cases in which $i < j$ and $j \le k$.

For cases where $i \le k$ and $k+1 \le j \le N$, define $d_j \equiv 0$ for $j = k+1,\dots,N$ and the above argument still applies with $c$ set equal to 1.

For cases where $i \le k$ and $j < i$, the above argument holds with the indices $i$ and $j$ interchanged.
Proof of Lemma 17:
$$x_{it}x_{jt} = \sum_{p=1}^k\sum_{q=1}^k[B]_{ip}[B]_{jq}f_{pt}f_{qt} + \sum_{p=1}^k[B]_{ip}f_{pt}\varepsilon_{jt} + \sum_{q=1}^k[B]_{jq}f_{qt}\varepsilon_{it} + \varepsilon_{it}\varepsilon_{jt}.$$
Using the fact that for random variables $a_1,\dots,a_m$ and $b_1,\dots,b_n$, $\operatorname{cov}\left(\sum_{i=1}^ma_i, \sum_{j=1}^nb_j\right) = \sum_{i=1}^m\sum_{j=1}^n\operatorname{cov}(a_i, b_j)$, it is straightforward, but tedious, to show that under Assumption 4.4 there exists a constant $\gamma$ such that
$$\sup_t\sup_N\max_{\substack{1\le i\le N \\ 1\le j\le N}}\sum_{r=0}^\infty\left|\operatorname{cov}\left(x_{it}x_{jt}, x_{it-r}x_{jt-r}\right)\right| < \gamma < \infty$$
Proof of Lemma 18: As shown in the proof of Theorem 3.1.1(a), $\lambda_i \ge d_i$ for all $i = 1,\dots,k$. Therefore $\bar\rho = \frac{\sigma^2}{d_k} \ge \frac{\sigma^2}{\lambda_k} = \rho \Rightarrow 1 - \bar\rho \le 1 - \rho$. The result then follows from Theorem 3.1.1(a).
Proof of Lemma 19: $Q_f\Lambda_fQ_f' + Q_\perp\Lambda_\perp Q_\perp' = UDU' + \Psi$. Premultiplying by $D^{-1}Q_f'$, postmultiplying by $U$, and subtracting $Q_f'U$ yields
$$D^{-1}Q_f'UD - Q_f'U = \left(D^{-1}\Lambda_f - I_k\right)Q_f'U - D^{-1}Q_f'\Psi U \quad (3.27)$$
We now consider each of the right-hand-side terms. Let $e_i$ be a vector of zeros with a 1 in the $i$th element only. We have
$$\left(e_i'D^{-1}Q_f'\Psi Ue_j\right)^2 \le \operatorname{tr}\left(U'\Psi Q_fD^{-1}e_ie_i'D^{-1}Q_f'\Psi U\right) \le \frac{1}{d_k^2}e_i'Q_f'\Psi UU'\Psi Q_fe_i \le \frac{1}{d_k^2}e_i'Q_f'\Psi^2Q_fe_i \le \frac{\sigma^4}{d_k^2}$$
$$\therefore\ \left|e_i'D^{-1}Q_f'\Psi Ue_j\right| \le \bar\rho \quad (3.28)$$
For the other right-hand-side term we have $\left|e_i'\left(D^{-1}\Lambda_f - I_k\right)Q_f'Ue_j\right| = \left|\left(\frac{\lambda_i}{d_i} - 1\right)q_i'u_j\right| = |q_i'u_j|\left(\frac{\lambda_i}{d_i} - 1\right)$. From the proof of Theorem 3.1.1(a) we have $\lambda_i \le d_i + \sigma^2$ for $i = 1,\dots,k$. Dividing by $d_i$ yields $\frac{\lambda_i}{d_i} \le 1 + \frac{\sigma^2}{d_i} \le 1 + \bar\rho$. Also $|q_i'u_j| \le 1$ by the Cauchy-Schwarz inequality, so
$$\left|e_i'\left(D^{-1}\Lambda_f - I_k\right)Q_f'Ue_j\right| \le \bar\rho \quad (3.29)$$
Combining equations (3.27), (3.28), and (3.29),
$$\left|e_i'\left(D^{-1}Q_f'UD - Q_f'U\right)e_j\right| = \left|e_i'\left(D^{-1}\Lambda_f - I_k\right)Q_f'Ue_j - e_i'D^{-1}Q_f'\Psi Ue_j\right| \le \left|e_i'\left(D^{-1}\Lambda_f - I_k\right)Q_f'Ue_j\right| + \left|e_i'D^{-1}Q_f'\Psi Ue_j\right| \le 2\bar\rho$$
i.e. $\left|\left(\frac{d_j}{d_i} - 1\right)q_i'u_j\right| \le 2\bar\rho \Rightarrow |q_i'u_j| \le 2c\bar\rho$ for $i \ne j$.
Proof of Lemma 20: As shown in the proof of Lemma 18, $1 - \bar\rho \le 1 - \rho$, so the result follows from Lemma 10.
Proof of Lemma 21:
$$E\|s_{ft} - Lf_t\|_F^2 = \operatorname{tr}\left[\left(\Lambda_f^{-\frac12}Q_f'UD^{\frac12} - L\right)\left(D^{\frac12}U'Q_f\Lambda_f^{-\frac12} - L\right) + \Lambda_f^{-\frac12}Q_f'\Psi Q_f\Lambda_f^{-\frac12}\right]$$
$$= 2\operatorname{tr}\left(I - L\Lambda_f^{-\frac12}Q_f'UD^{\frac12}\right) = 2\sum_{i=1}^k\left(1 - L_{ii}\left(\frac{d_i}{\lambda_i}\right)^{\frac12}q_i'u_i\right)$$
Consider the terms $\left(\frac{d_i}{\lambda_i}\right)^{\frac12}q_i'u_i$, $i = 1,\dots,k$.

From Lemma 18, $1 - \bar\rho \le \frac{d_i}{\lambda_i}$. If $k = 1$, then from Lemma 20, $1 - \bar\rho \le (q_1'u_1)^2$. Combining these two results yields $1 - \bar\rho \le \left(\frac{d_1}{\lambda_1}\right)^{\frac12}|q_1'u_1|$, which produces the required result.

If $k > 1$, then from Lemma 20, $1 - \bar\rho \le \sum_{j\ne i}(q_i'u_j)^2 + (q_i'u_i)^2$, and from Lemma 19, $(q_i'u_j)^2 \le 4c^2\bar\rho^2$ when $i \ne j$. Combining these two results with Lemma 18 yields
$$(1 - \bar\rho)^2 - 4c^2\bar\rho^2(k-1)(1 - \bar\rho) \le \frac{d_i}{\lambda_i}(q_i'u_i)^2.$$
If $c \le \frac{1-\bar\rho}{2\bar\rho\sqrt{(k-1)(1-\bar\rho)}}$, then the left-hand side is non-negative and
$$\sqrt{(1-\bar\rho)^2 - 4c^2\bar\rho^2(k-1)(1-\bar\rho)} \le \sqrt{\frac{d_i}{\lambda_i}}\,|q_i'u_i|.$$
If we choose $L$ so that $\operatorname{sign}(L_{ii}) = \operatorname{sign}(q_i'u_i)$, we get
$$E\|s_{ft} - Lf_t\|_F^2 \le k - k\sqrt{(1-\bar\rho)^2 - 4c^2\bar\rho^2(k-1)(1-\bar\rho)}$$
Multiplying this by $\frac{1+\sqrt{(1-\bar\rho)^2-4c^2\bar\rho^2(k-1)(1-\bar\rho)}}{1+\sqrt{(1-\bar\rho)^2-4c^2\bar\rho^2(k-1)(1-\bar\rho)}}$ yields the result.
Proof of Lemma 22: Using the triangle inequality,
$$\|\beta_s - L\beta_f\|_F = \left\|\frac{1}{T}\sum_{t=1}^Ts_{ft}y_t - \frac{1}{T}\sum_{t=1}^TLf_ty_t\right\|_F \le \frac{1}{T}\sum_{t=1}^T\|y_t\|_F\,\|s_{ft} - Lf_t\|_F$$
Therefore, by the Cauchy-Schwarz inequality,
$$E\|\beta_s - L\beta_f\|_F \le \frac{1}{T}\sum_{t=1}^T\sqrt{E\left(\|y_t\|_F^2\right)E\left(\|s_{ft} - Lf_t\|_F^2\right)} = \sqrt{\sigma_y^2k\left(2\bar\rho + \bar\rho^2\left(4c^2(k-1)(1-\bar\rho) - 1\right)\right)}$$
by Lemma 21.
Chapter 4
The Grouped Variable
Approximate Factor Model
Since the publication of the theoretical papers on approximate factor models by Forni et al. (2000), Forni et al. (2004), Stock and Watson (2002a), Bai and Ng (2002), Bai (2003) and Bai and Ng (2006), there have been many applications of the principal component factor estimator to large macroeconomic datasets. However, there has been relatively little discussion about the appropriateness of the assumptions employed in the relevant theorems for the variables used. It is well-known that the theorems in these papers apply to the `approximate' factor model, which allows for a degree of cross-sectional correlation between the error terms. Specifically, cross-correlation is permitted provided that the sums of the absolute values of the rows of the error correlation matrix are uniformly bounded¹. It was argued in Chapter 3 of this thesis that this assumption might be too restrictive for the macroeconomic applications of the principal component factor estimator that have appeared in the literature. It is easy to imagine that, as the number of variables in the model is increased, many of the new variables that are added to the model will have errors which are correlated with the errors of variables already in the model, so that the absolute row sums of the covariance matrix grow without a fixed bound as $N \to \infty$. Consider an `economy' consisting of $N$ variables. Assume that each of these variables belongs to a single group and that there exist $m$ groups of size $N_j$, so that $N = \sum_{j=1}^mN_j$, where $m$ is a fixed, finite scalar. These groups could correspond to geographical or industrial sectors; they could be groups based on the type of economic activity being measured (e.g. "stock prices", "housing starts and sales", "average hourly earnings", etc., as used in the appendix of Stock and Watson (2002b)); or they might simply be broad functional groupings such as `real variables', `price variables', and `financial variables'. It is assumed that a factor structure exists for all $N$ variables in the economy. Since pairs of variables that belong to the same group tend to be quite similar, it is reasonable to expect that there would be stronger cross-correlation between the errors of these variables than there would be between pairs of errors that correspond to variables from two different groups. The group structure of the variables implies the existence of an ordering such that a block structure for the error covariance matrix exists, and the above argument suggests that the diagonal blocks are likely to exhibit stronger cross-correlation than the off-diagonal blocks. As an illustration, assume that the off-diagonal blocks of the error covariance matrix, which represent error correlation between different groups,

¹In fact Stock and Watson (2002a) and Bai and Ng (2002) make the slightly weaker assumption that the mean of the absolute row sums is bounded, and Forni et al. (2000) and Forni et al. (2004) place a uniform upper bound on the largest dynamic eigenvalue of the spectrum of the error process, which is a slightly stronger assumption.
are subject to the same weak correlation assumption that is employed by Bai (2003) and Bai and Ng (2006) for the entire covariance matrix. That is, the row sums of the absolute value of the off-diagonal blocks of the covariance matrix have a fixed uniform upper bound of $M$. In contrast, the diagonal blocks of the error covariance matrix, which represent error correlation between variables that belong to the same group, have absolute row sums that are $O(N^{1-\alpha})$ where $0 < \alpha < 1$, so that error cross-correlation within groups grows as $N$ grows. Suppose that $N$ is increased by increasing the number of variables in each group at a rate of $N$. The absolute row sums of the entire error covariance matrix will grow at a rate of $N^{1-\alpha}$. Consequently, the theorems of Forni et al. (2000), Forni et al. (2004), Stock and Watson (2002a), Bai and Ng (2002), Bai (2003) and Bai and Ng (2006) do not apply, but consistency of the principal components estimator is proved by Theorem 3.1.4 in Chapter 3 of this thesis. However, the rate of convergence is $\min\left(T^{\frac12}, N^{\frac\alpha2}\right)$. If the rate of growth in the error correlation within groups is high ($\alpha$ is low), then this rate of convergence might be quite slow. Boivin and Ng (2006) have conducted Monte Carlo simulations for the principal component estimator of a factor model similar in some ways to that described above, and found the performance of the estimator to be poor.
The work presented in this chapter is motivated by the proposition that the poor empirical performance of the principal components estimator in some applications might be due to relatively strong error cross-correlation between variables that belong to the same group. A new factor model, named the grouped variable approximate factor model, is proposed. This model is a formalisation of the model that is loosely described above. It places a weak correlation restriction on the off-diagonal blocks of the error covariance matrix, but permits arbitrarily strong correlation between the errors of variables that belong to the same group. An approximate instrumental variables estimator is proposed for the model. This estimator is computationally straightforward even for very large models, since it is non-iterative and requires only the computation of matrix products and the inversion of $k \times k$ matrices, where $k$ is the number of factors in the model. It is not required that $T > N$. Consistency is proved for this estimator, and rates of convergence are derived. The key result in the chapter is that the rates of convergence depend on the rate of growth of cross-correlation in the off-diagonal blocks of the error covariance matrix only. The degree of error cross-correlation between variables within the same group does not affect the first order properties of the estimator. Consequently, provided that practitioners are able to identify groups of variables which are likely to contain the strongest error cross-correlation, the approximate instrumental variables estimator should provide rates of convergence superior to those available from the principal components estimator.

Since the entire error covariance matrix of the standard approximate factor model satisfies the restrictions of the grouped variable model, the grouped variable approximate factor model is a generalisation of the approximate factor model, and the techniques and results presented in this chapter also apply to the extension of the approximate factor model presented in Chapter 3, to the approximate factor model considered by Stock and Watson (2002a), Bai and Ng (2002), Bai (2003) and Bai and Ng (2006), and to the classical static factor model. Consequently, the literature on the approximate factor model is of relevance. However, since this literature is reviewed in Chapters 1 and 3, it will not be reviewed again here. The construction of the approximate instrumental variables estimator exploits an errors-in-variables representation of the static factor model that was first proposed by Madansky (1964) and was utilised by Hägglund (1982) to construct an instrumental variables estimator for the classical `strict' static factor model. The results in this chapter might be considered as an extension of this work to a dual-limit time series setting with a grouped variable approximate factor structure.
Another part of the (as yet unpublished) literature that is relevant is the recent work on GMM estimation of regression models in which there exist a large number of instruments which have an approximate factor structure. Consider the regression model
$$y_t = \theta'x_t + \eta_t$$
where $x_t$ is a vector of $n$ observable variables, $\theta$ is a $n \times 1$ vector of regression coefficients, and $\eta_t$ is a scalar regression error term for which $E(x_t\eta_t) \ne 0$. Suppose that a vector of $N$ instruments $z_t$ is available. Under assumptions similar to those used by Bai (2003), Bai and Ng (2007b) prove that, if $x_t$ and $z_t$ are driven by a set of $k$ common factors, then a GMM estimator of $\theta$, in which the first $k$ principal components of $z_t$ are used as instruments, is $\sqrt T$-consistent and asymptotically Gaussian if $\frac{\sqrt T}{N} \to 0$. Furthermore, they show that the $k$-factor GMM estimator is more efficient than a GMM estimator constructed using any subset of $k$ of the elements of $z_t$. Kapetanios and Marcellino (2006) consider some more general relationships between the regressor, the observable instruments and the factors, which allow for elements of $z_t$ to be weak instruments. They prove several asymptotic results which support the use of GMM estimation with the first $k$ principal components of $z_t$ used as instruments. Favero et al. (2005) and Beyer et al. (2005) are examples of applications of this technique to macroeconomic data. The approximate instrumental variables estimator proposed in this chapter, and the theorems that are proved, may be applied directly to regression models of this type. Consequently, this chapter contains an alternative estimation procedure to that proposed by Bai and Ng (2007b) and Kapetanios and Marcellino (2006), which is consistent under the more general grouped variable approximate factor model restriction.
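The principal-components-as-instruments idea described above can be sketched in a few lines. This is an illustrative demo, not the thesis's estimator; the data-generating process, dimensions, and tolerances are all assumptions:

```python
import numpy as np

# Sketch of IV estimation with the first k principal components of a large
# instrument vector z_t used as instruments for an endogenous regressor x_t.
rng = np.random.default_rng(4)
T, N, k = 500, 50, 1
f = rng.standard_normal((T, k))                 # common factor driving x and z
loadings = rng.standard_normal((k, N))
z = f @ loadings + 0.3 * rng.standard_normal((T, N))
eta = rng.standard_normal(T)                    # regression error
x = f[:, 0] + 0.5 * eta                         # E(x_t η_t) ≠ 0: x endogenous
y = 2.0 * x + eta                               # true θ = 2; OLS is biased

# instruments: first k principal components of z_t
evals, evecs = np.linalg.eigh(z.T @ z / T)
pc = z @ evecs[:, ::-1][:, :k]

theta_iv = (pc[:, 0] @ y) / (pc[:, 0] @ x)      # just-identified IV estimate
theta_ols = (x @ y) / (x @ x)
assert abs(theta_iv - 2.0) < 0.3                # IV is close to the truth
assert abs(theta_ols - 2.0) > 0.2               # the OLS bias is visible
```

The factor driving both $x_t$ and $z_t$ makes the principal components strong instruments while remaining uncorrelated with $\eta_t$, which is the essential mechanism exploited by the estimators discussed above.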
The remainder of this chapter presents the work on the grouped variable approximate factor model. In Section 4.1, the grouped variable approximate factor model is described and some notation is established. In Section 4.2, the approximate instrumental variables estimator is presented. For the sake of clarity, the estimators are initially derived under the assumption that the error covariance of the model is block diagonal. In this case, the estimator is simply a standard instrumental variables estimator and it is relatively easy to understand the rationale of the estimators and the choices of instruments. In fact, the assumption that the off-diagonal blocks of the error covariance matrix are zero is not necessary. All that is required is that they satisfy a weak correlation assumption similar to that applied to the entire error covariance matrix in Chapter 3. Since, under this condition, the moment conditions used to derive the instrumental variables estimator hold only in an approximate sense as $N$ gets large, the instruments are referred to as approximate instruments, and the estimator is referred to as an approximate instrumental variables estimator. Section 4.3 presents some consistency results for the approximate instrumental variables estimator. In particular, it is shown that the estimator is consistent in a framework in which $(T,N) \to (\infty,\infty)$ jointly. Rates of convergence depend on the rate of growth of cross-correlation in the off-diagonal blocks of the error covariance matrix (i.e. on growth in cross-correlation between the errors of variables belonging to different groups), but do not depend on the rate of growth of error cross-correlation between variables that belong to the same block. Consequently, useful rates of convergence may be achieved in cases in which strong cross-correlation exists between errors, provided that sufficient a priori information exists to allow for the arrangement of variables into groups with relatively strongly cross-correlated errors. Section 4.4 contains an application of the approximate instrumental variables estimator to a US macroeconomic data set, and Section 4.5 contains some concluding comments. The proofs of all theorems are in Appendix 1.
4.1 The grouped variable approximate factor model
It is assumed that an N × 1 vector of observable variables xt may be represented
by a static factor model
xt = Bft + εt t = 1, ..., T (4.1)
where ft is a k × 1 vector of unobservable factors, εt is an N × 1 vector of unobservable errors which are assumed to be uncorrelated with the factors, and B is an N × k matrix of non-random coefficients referred to as the factor
loadings. It is assumed that the observable variables may be ordered so that xt may be partitioned into three subvectors xat, x2t and x3t, of sizes Na, N2 and N3 respectively, such that N = Na + N2 + N3. The covariance matrix of the errors, Ψ = E(εtε′t), may be partitioned into corresponding blocks:

    Ψ = [ Ψaa  Ψa2  Ψa3 ]
        [ Ψ2a  Ψ22  Ψ23 ]        (4.2)
        [ Ψ3a  Ψ32  Ψ33 ]
where Ψji for i, j = a, 2, 3 is an Nj × Ni matrix. In the approximate factor model, the errors are assumed to be weakly correlated in the sense that the row sums of the absolute values of the error covariance Ψ are uniformly bounded. In contrast, in the grouped variable approximate factor model, a weak correlation assumption will be applied to the off-diagonal blocks of Ψ only. Specifically, in Section 4.3 an upper bound will be placed on the growth rate of the maximum of the largest singular values of the off-diagonal blocks of Ψ. The blocks on the diagonal, corresponding to variables belonging to the same group, are permitted to display arbitrarily strong correlation. Of course, if weak correlation applies to the entire covariance matrix, then it also applies to the off-diagonal blocks, so the formulation above is a generalisation of the approximate factor model. As in the approximate factor model, it will be assumed that the eigenvalues of BB′ grow at a rate of N.
The extension to higher numbers of groups is trivial. Furthermore, groups may be aggregated, since any two sets of variables which constitute groups as defined above may be combined to form a superset of variables which also satisfies the above definition of a group. However, the approximate instrumental variables estimator, which is described in the next section, requires that there exist at least three blocks.
In this chapter, we also consider the estimation of the parameters in a
factor-augmented regression equation
yt = β′ft + α′wt + εyt (4.3)
where yt is a scalar random variable, εyt is a scalar error term, and wt is an m × 1 vector of exogenous variables which may include lags of yt.
4.2 The approximate instrumental variables estimator
The grouped variable approximate factor model may be written as

    ( xat )   ( Ba )        ( εat )
    ( x2t ) = ( B2 ) ft  +  ( ε2t )        t = 1, ..., T
    ( x3t )   ( B3 )        ( ε3t )
Following ideas established by Madansky (1964) and developed by Hägglund (1982) in the context of a classical `exact' static factor model, the grouped variable approximate factor model will now be interpreted as an errors-in-variables model. To this end, xat is partitioned into a k × 1 vector x0t and an (Na − k) × 1 vector x1t, and the vectors ε0t and ε1t and matrices B0 and B1 are created similarly. The model may then be written as
    ( x0t )   ( B0 )        ( ε0t )
    ( x1t ) = ( B1 ) ft  +  ( ε1t )        t = 1, ..., T        (4.4)
    ( x2t )   ( B2 )        ( ε2t )
    ( x3t )   ( B3 )        ( ε3t )
In what follows, the k × 1 vector x0t will be used as a proxy for the unobservable k × 1 vector of factors. The key feature of the partitioning is that ε0t is only weakly correlated with ε2t and ε3t. This fact will be exploited to construct an
approximate instrumental variables estimator.
In what follows, it will be assumed that rank(B0) = k. Since B and ft are identified only up to a non-singular transformation, it is without loss of generality that we define

    B̃ = B B0⁻¹    and    f̃t = B0 ft
and write the model as
    ( x0t )   ( Ik )         ( ε0t )
    ( x1t ) = ( B̃1 ) f̃t  +  ( ε1t )        t = 1, ..., T        (4.5)
    ( x2t )   ( B̃2 )         ( ε2t )
    ( x3t )   ( B̃3 )         ( ε3t )
where xjt and εjt are Nj × 1 vectors, B̃j = Bj B0⁻¹ is Nj × k, and N0 = k. Using obvious definitions, we will write the model in this form as

    xt = B ft + εt

The partitioning in Equation (4.5) suggests an errors-in-variables interpretation of the factor model. We may write

    xit = Bi ft + εit        i = 1, 2, 3
    x0t = ft + ε0t                                (4.6)
In order to establish ideas, it will be assumed initially that the off-diagonal blocks of (4.2) are known to all be zero. It will subsequently be shown that this assumption is not necessary.
4.2.1 Estimating Bi
Suppose, for example, that we wished to estimate B2. From Equation (4.6)
we may write
    x2t − B2 x0t = ε2t − B2 ε0t

z2t = x3t will be used as a vector of instruments for estimating B2. Postmultiplying the above equation by z′2t yields

    x2t z′2t − B2 x0t z′2t = ε2t f′t B′3 − B2 ε0t f′t B′3 − B2 ε0t ε′3t + ε2t ε′3t        (4.7)
Bearing in mind that the errors are assumed to be uncorrelated with the factors
and that, for the sake of clarity, we are assuming for the time being that the
error covariance (4.2) is block diagonal, taking expectations of Equation (4.7)
yields the moment condition
    Ω2 − Ω20 B′2 = 0

where Ω2 = E(z2t x′2t) and Ω20 = E(z2t x′0t). Replacing Ω2 and Ω20 by their sample estimators yields an instrumental variables estimator for B2

    B̂′2 = (S′20 S20)⁻¹ S′20 S2

where S2 = (1/T) ∑_{t=1}^{T} z2t x′2t and S20 = (1/T) ∑_{t=1}^{T} z2t x′0t. Similar arguments may be used to construct instrumental variables estimators for B1 and B3, using z1t = (x′2t x′3t)′ and z3t = x2t as instrument vectors, and the (temporary) assumption that the error covariance (4.2) is block diagonal.
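To fix ideas, the estimator B̂′2 = (S′20 S20)⁻¹ S′20 S2 can be checked on simulated data. The sketch below is not from the thesis: the dimensions, error scales and exactly block-diagonal error design are illustrative assumptions, and B0 is taken to be already normalised to Ik so that x0t = ft + ε0t.

```python
import numpy as np

rng = np.random.default_rng(0)
T, k = 20000, 2
N2, N3 = 40, 40                      # illustrative group sizes

B2 = rng.normal(size=(N2, k))        # true loadings for group 2
B3 = rng.normal(size=(N3, k))

f = rng.normal(size=(T, k))          # factors (Sigma_f = I here)
e0 = 0.1 * rng.normal(size=(T, k))   # errors: independent across groups
e2 = 0.1 * rng.normal(size=(T, N2))
e3 = 0.1 * rng.normal(size=(T, N3))

x0 = f + e0                          # proxy block, B0 = I_k
x2 = f @ B2.T + e2
x3 = f @ B3.T + e3

z2 = x3                              # instruments for B2: the third group
S2 = z2.T @ x2 / T                   # S_2  = (1/T) sum_t z_2t x_2t'
S20 = z2.T @ x0 / T                  # S_20 = (1/T) sum_t z_2t x_0t'

# B2_hat' = (S20' S20)^{-1} S20' S2
B2_hat = np.linalg.solve(S20.T @ S20, S20.T @ S2).T

print(np.max(np.abs(B2_hat - B2)))   # close to zero for large T
```

The same routine with z1t = (x′2t x′3t)′ or z3t = x2t in place of z2t gives the estimators of B1 and B3.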
4.2.2 Estimating δ = (β′ α′)′
The above approach may be extended to estimate the regression parameters in Equation (4.3). Let δ = (β′ α′)′ and xwt = (x′0t w′t)′. Let zwt be a vector of valid instruments. The line of argument used above then yields the moment condition

    Ωw − Ωw0 δ = 0

where Ωw = E(zwt yt) and Ωw0 = E(zwt x′wt). Replacing Ωw and Ωw0 with their sample estimators yields an instrumental variables estimator for δ

    δ̂ = (S′w0 Sw0)⁻¹ S′w0 Sw

where Sw = (1/T) ∑_{t=1}^{T} zwt yt and Sw0 = (1/T) ∑_{t=1}^{T} zwt x′wt. The appropriate choice of elements for the instrument vector depends on which group yt belongs to. If yt belongs to the same group of variables as x2t, then zwt = x3t. If yt belongs to the same group of variables as x3t, then zwt = x2t.
4.2.3 Estimating Σf and Ψ
To estimate Σf, construct the instrument vector zft = (x′2t x′3t)′. Recall that

    x0t = ft + ε0t

Postmultiplying by z′ft and taking expectations (noting that the errors are assumed to be uncorrelated with the factors and, for the time being, we are assuming the error covariance (4.2) to be block diagonal) yields the moment condition

    Ωf0 = Bf Σf

where Ωf0 = E(zft x′0t) and Bf = (B′2 B′3)′. Replacing Ωf0 with its sample analogue, and Bf by the estimator B̂f = (B̂′2 B̂′3)′ described above, yields an estimator of Σf

    Σ̃f = (B̂′f B̂f)⁻¹ B̂′f Sf0

where Sf0 = (1/T) ∑_{t=1}^{T} zft x′0t. It should be noted that this estimate is not constrained to be symmetric. A preferable estimator is therefore

    Σ̂f = (1/2) Σ̃f + (1/2) Σ̃′f

Given consistent estimates of Σf and B, the error covariance Ψ may be consistently estimated by

    Ψ̂ = Sxx − B̂ Σ̂f B̂′

where Sxx = (1/T) ∑_{t=1}^{T} xt x′t.
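The moment condition Ωf0 = Bf Σf and the symmetrised estimator can be verified numerically. The following sketch is illustrative rather than taken from the thesis: group a is taken to contain only the k proxies, the true Bf is plugged in where the estimator B̂f would be used in practice, and all dimensions and scales are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
T, k, N2, N3 = 20000, 2, 30, 30

A = rng.normal(size=(k, k))
Sigma_f = A @ A.T / k + np.eye(k)            # true factor covariance
f = rng.normal(size=(T, k)) @ np.linalg.cholesky(Sigma_f).T

B2 = 0.5 * rng.normal(size=(N2, k))
B3 = 0.5 * rng.normal(size=(N3, k))
e0 = 0.1 * rng.normal(size=(T, k))
e2 = 0.1 * rng.normal(size=(T, N2))
e3 = 0.1 * rng.normal(size=(T, N3))
x0, x2, x3 = f + e0, f @ B2.T + e2, f @ B3.T + e3

zf = np.hstack([x2, x3])                     # z_ft = (x_2t' x_3t')'
Bf = np.vstack([B2, B3])                     # true B_f stands in for its estimate
Sf0 = zf.T @ x0 / T                          # (1/T) sum_t z_ft x_0t'

Sig_tilde = np.linalg.solve(Bf.T @ Bf, Bf.T @ Sf0)
Sig_hat = 0.5 * (Sig_tilde + Sig_tilde.T)    # symmetrised estimator of Sigma_f

X = np.hstack([x0, x2, x3])                  # group a holds only the proxies here
B_full = np.vstack([np.eye(k), B2, B3])
Sxx = X.T @ X / T
Psi_hat = Sxx - B_full @ Sig_hat @ B_full.T  # estimator of the error covariance

print(np.max(np.abs(Sig_hat - Sigma_f)))
```

The diagonal of Psi_hat should sit near the true idiosyncratic variance (0.01 in this design), and Sig_hat is symmetric by construction.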
Once the model parameters have been estimated, transformations may be applied to the estimated factor loadings to achieve any particular orientation that might be of interest. For example, it might be deemed worthwhile to transform the estimated model to have orthonormal factors so that a direct comparison can be made with the usual principal components estimator of the factor model.
4.2.4 Estimating ft
Now consider the estimation of the factor vector at a particular point in time. If the true factor loadings were known, then an unbiased estimator of the factor² is given by

    f∗t = (B′B)⁻¹ B′ xt        (4.8)

²Corresponding to the rotation given by Equation (4.5).
The covariance matrix of this estimate is

    cov(f∗t) = (B′B)⁻¹ B′ΨB (B′B)⁻¹        (4.9)

Note that ‖cov(f∗t)‖₂ = ‖(B′B)⁻¹ B′ΨB (B′B)⁻¹‖₂ ≤ d1σ²/dk², where dj is the jth eigenvalue of B′B and σ² is the largest eigenvalue of the entire error covariance matrix Ψ. Theorem 3.1.2 in Chapter 3 presents conditions under which the right hand side of this inequality goes to zero as N → ∞. Under such conditions Equation (4.8) is a consistent estimator of the factor. However, the block structure of the error covariance matrix in the grouped variable factor model suggests that this convergence to zero might occur quite slowly, so that the value of the covariance given by Equation (4.9) would be of interest in applications. Consistent estimators of the `population' factor estimator and its covariance (Equations (4.8) and (4.9)) are constructed by replacing the population parameters with their sample estimators to yield

    f̂t = (B̂′B̂)⁻¹ B̂′ xt        (4.10)

and

    Ĉ = est.cov(f∗t) = (B̂′B̂)⁻¹ B̂′Ψ̂B̂ (B̂′B̂)⁻¹        (4.11)
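A small simulation can illustrate both the population factor estimator and the covariance bound quoted above. This is a sketch under illustrative assumptions, not thesis code: the dimensions and the error-covariance design are arbitrary, and the true loadings are used in place of their estimates.

```python
import numpy as np

rng = np.random.default_rng(2)
T, N, k = 2000, 60, 2
B = np.vstack([np.eye(k), rng.normal(size=(N - k, k))])   # loadings, B0 = I_k

Psi_half = 0.3 * rng.normal(size=(N, N)) / np.sqrt(N)
Psi = Psi_half @ Psi_half.T + 0.1 * np.eye(N)             # cross-correlated errors

f = rng.normal(size=(T, k))
eps = rng.normal(size=(T, N)) @ np.linalg.cholesky(Psi).T
x = f @ B.T + eps

# analogue of Equation (4.8): f*_t = (B'B)^{-1} B' x_t with true loadings
G = np.linalg.inv(B.T @ B)
f_star = x @ B @ G

# analogue of Equation (4.9): cov(f*_t) = (B'B)^{-1} B' Psi B (B'B)^{-1}
C = G @ B.T @ Psi @ B @ G
emp = np.cov((f_star - f).T)          # empirical covariance of the estimation error

# the bound ||cov(f*_t)||_2 <= d_1 sigma^2 / d_k^2 from the text
d = np.linalg.eigvalsh(B.T @ B)       # ascending: d[0] = d_k, d[-1] = d_1
sigma2 = np.linalg.eigvalsh(Psi)[-1]
print(np.linalg.norm(C, 2), d[-1] * sigma2 / d[0] ** 2)
```

The empirical covariance of f∗t − ft matches the formula closely, and the spectral norm of the covariance never exceeds the stated bound.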
4.2.5 Estimation with approximate factors
The instrumental variables estimators described above are based on moment conditions that were derived from the assumption that the error covariance matrix (4.2) is block diagonal. It turns out that this assumption is not necessary for consistency. As will be claimed more precisely in Section 4.3 (and proved in Appendix 1), in a framework in which (N, T) → (∞,∞) jointly, consistency holds if the off-diagonal blocks of the error covariance satisfy a weak correlation assumption. Since the moment conditions do not hold exactly in such a framework, the instruments will be referred to as approximate instruments, and the estimators listed above will be referred to as approximate instrumental variables estimators. To summarise, the approximate instruments for each parameter matrix are listed in Table 4.1, and the approximate instrumental variables estimators are listed below.
Table 4.1: Approximate Instruments

    Parameter    Approximate Instrument
    B1           z1t = (x′2t x′3t)′
    B2           z2t = x3t
    B3           z3t = x2t
    δ            zwt = x2t if yt belongs to group 3; zwt = x3t if yt belongs to group 2
    Σf           zft = (x′2t x′3t)′
The approximate instrumental variables estimators are

    B̂′i = (S′i0 Si0)⁻¹ S′i0 Si

for i = 1, ..., 3, where Si = (1/T) ∑_{t=1}^{T} zit x′it and Si0 = (1/T) ∑_{t=1}^{T} zit x′0t;

    δ̂ = (S′w0 Sw0)⁻¹ S′w0 Sw

where Sw = (1/T) ∑_{t=1}^{T} zwt yt and Sw0 = (1/T) ∑_{t=1}^{T} zwt x′wt;

    Σ̂f = (1/2) Σ̃f + (1/2) Σ̃′f

where Σ̃f = (B̂′f B̂f)⁻¹ B̂′f Sf0, Sf0 = (1/T) ∑_{t=1}^{T} zft x′0t and B̂f = (B̂′2 B̂′3)′;

    Ψ̂ = Sxx − B̂ Σ̂f B̂′

where Sxx = (1/T) ∑_{t=1}^{T} xt x′t and B̂ = (Ik B̂′1 B̂′2 B̂′3)′;

    f̂t = (B̂′B̂)⁻¹ B̂′ xt

and

    Ĉ = est.cov(f∗t) = (B̂′B̂)⁻¹ B̂′Ψ̂B̂ (B̂′B̂)⁻¹
4.3 Some Dual-Limit Theory
In the case where the off-diagonal blocks of the error covariance matrix are zero and the number of variables N is fixed, standard arguments may be used to prove consistency of the estimators defined in Section 4.2. Furthermore, in this case more efficient GMM estimators may easily be derived, and the well-known testing procedures of GMM applied to the model. In a setting in which the off-diagonal blocks are non-zero, and N and T approach infinity jointly, consistency is less straightforward to establish.
Let

    vt = (f′t  ε′t  w′t  yt)′

The following assumptions are made.
Assumptions 5.
5.1 E(vt) = 0 for t = 1, ..., T.

5.2 Denote Σv = (1/T) E(∑_{t=1}^{T} vt v′t). Then Σv is a full-rank matrix and max_{1≤i,j≤Ñ} |[Σv]ij| < c < ∞, where Ñ = N + k + m + 1.

5.3 sup_t sup_N max_{1≤i,j≤Ñ} ∑_{r=0}^{∞} |cov(vit vjt, vit−r vjt−r)| < γ < ∞, where Ñ = N + k + m + 1.

5.4 ϕ = sup_N max_{0≤i,j≤3, i≠j} ‖Ψij‖₂ = O(N^{(1−α)/2}), α > 0.

5.5 Denoting dji = eig_i(B′j Bj), 0 < dmin ≤ dji/Nj ≤ dmax < ∞, where Nj is the number of rows in Bj and i = 1, ..., k.

5.6 0 < dmin < eig_i(B′0 B0) < dmax < ∞, i = 1, ..., k.

5.7 E(ft ε′t) = 0, E(ft εyt) = 0.

5.8 Ωw0 = E((1/T) ∑_{t=1}^{T} zwt x′wt) is of full column rank, where xwt = (x′0t w′t)′ and zwt is the vector of instruments used to estimate δ. Also, δ = O(1).

5.9 0 < mmin ≤ Nj/N ≤ mmax < 1, j = 1, 2, 3.
Assumptions 5.1, 5.2 and 5.3 are made to ensure that sample second moments converge in probability to their corresponding population second moments. Assumption 5.4 places the weak correlation restriction on the off-diagonal blocks of the error covariance matrix. It allows the maximum of the largest of the singular values of the off-diagonal blocks of the error covariance to grow at a rate strictly less than N^{1/2}. Since ‖Ψij‖₂ ≤ √(‖Ψij‖₁ ‖Ψij‖∞), this assumption could be satisfied by making the stronger assumption that ‖Ψij‖₁ ‖Ψij‖∞ = O(N^{1−α}), that is, that the product of the maximum absolute row sum and the maximum absolute column sum of the off-diagonal blocks of the error covariance matrix grows at a rate strictly less than N. Assumptions 5.5 and 5.6 require that all k eigenvalues of the common component grow at a rate of N. If they grew more slowly than this, then the proportion of the variance of xt accounted for by the factors would go to zero as N → ∞; any faster and it would go to one. Assumption 5.7 requires that the errors are uncorrelated with the factors. It is likely that this assumption could be relaxed to one of weak correlation, but this is not attempted here. Assumption 5.8 is required in order for the moment condition used to derive the estimator for δ to have a unique uniformly bounded solution. Lastly, Assumption 5.9 requires that N is increased by increasing the number of variables in all the groups at the same rate.

Under these assumptions, the following theorems hold. Proofs are presented in Appendix 1.
Theorem 4.3.1 (Consistency of AIV Regression Estimator). Under assumptions 5.1 to 5.9, ‖δ̂ − δ‖₂ = Op[max(T^{−1/2}, N^{−α/2})].

Theorem 4.3.2 (Consistency of the AIV Factor Loading Estimator). Under assumptions 5.1 to 5.7 and 5.9, max_{1≤j≤Ni} ‖B̂i(j) − Bi(j)‖₂ = Op[max(T^{−1/2}, N^{−α/2})], where Bi(j) is the jth row of Bi and B̂i(j) is the corresponding row of B̂i.

Theorem 4.3.3 (Consistency of the AIV Estimate of the Covariance Matrix of the Factors). Under assumptions 5.1 to 5.7 and 5.9, ‖Σ̂f − Σf‖₂ = Op[max(T^{−1/2}, N^{−α/2})].

Theorem 4.3.4 (Consistency of the AIV Factor Estimator). Under assumptions 5.1 to 5.7 and 5.9, ‖f̂t − f∗t‖₂ = Op[max(T^{−1/2}, N^{−α/2})].

Theorem 4.3.5 (Consistency of the Sample Covariance Estimator for the Population Covariance Estimator). Under assumptions 5.1 to 5.7 and 5.9, ‖Ĉ − cov(f∗t)‖₂ = Op[max(T^{−1/2}, N^{−α/2})].
As is the case for the principal components estimator in Chapter 3, the rate of convergence of the estimator depends on the growth of N and the rate at which cross-correlation grows as N grows (which is determined by the parameter α). However, for the approximate instrumental variables estimator, it is only the rate of growth of cross-correlation in the off-diagonal blocks which matters. The rate of convergence is not affected by the growth in cross-correlation in the diagonal blocks. As long as the variables with the highest concentration of correlated errors are appropriately arranged into groups, significant error cross-correlation which would result in poor rates of convergence for the principal components estimator does not affect the performance of the approximate instrumental variables estimator.
Care should be taken in interpreting Theorem 4.3.4. For the approximate factor model in Chapter 3 it was shown that the sample principal components converge in probability to the population factors. A different approach is taken for the grouped variable approximate factor model in this chapter. Firstly, an estimator is written (Equation (4.8)) which assumes that the population factor loading matrix is known. The covariance matrix for this estimator is then given in terms of population parameters (Equation (4.9)). Sample estimators of these quantities are constructed by replacing the unknown population coefficients with sample estimators of those coefficients (Equations (4.10) and (4.11)). Explicitly, it is the population estimator of the factor f∗t (and its covariance) that is being estimated by f̂t, rather than the factor ft itself. As was explained in Section 4.2, ‖cov(f∗t)‖₂ = ‖(B′B)⁻¹ B′ΨB (B′B)⁻¹‖₂ ≤ d1σ²/dk², where dj is the jth eigenvalue of B′B and σ² is the largest eigenvalue of the entire error covariance matrix Ψ. Theorem 3.1.2 in Chapter 3 presents conditions under which the right hand side of this inequality goes to zero as N → ∞. Consequently, we might expect the approximate instrumental variables estimator of the factors to be consistent under fairly general conditions of error cross-correlation. However, the situations in which the approximate instrumental variables estimator is of particular interest are precisely those in which the convergence is likely to be slow. For this reason, it will generally be of interest to estimate both the factor vector and the covariance matrix of the factor estimator.
Finally, Bai and Ng (2007b) and Kapetanios and Marcellino (2006) have considered the problem of estimating a regression equation

    yt = θ′xt + ηt

where xt is a vector of n observable variables, θ is an n × 1 vector of regression coefficients, and ηt is a scalar regression error term for which E(xt ηt) ≠ 0. They assume that there exists an N × 1 vector of instruments zt which has a k-factor structure

    zt = B ft + εt

They consider estimating the factors using principal components of zt and using these estimated factors as instruments in a GMM estimator for θ. They prove consistency and asymptotic Gaussianity as (N, T) → (∞,∞), under assumptions similar to those made in Bai (2003) and Bai and Ng (2006). Equations (4.3) and (4.6) provide us with yt = β′ft + α′wt + εyt and x0t = ft + ε0t. Combining these equations yields

    yt = β′x0t + α′wt + δyt        (4.12)

where δyt = εyt − β′ε0t. Note that E(δyt x0t) ≠ 0. Furthermore, x0t has a factor structure and there exists a large set of instrument variables zwt that have a similar factor structure. Consequently, Equation (4.12), which is the same as that considered by Bai and Ng (2007b) and Kapetanios and Marcellino (2006)³, may be estimated by the approximate instrumental variables estimator, and the theorems of this section apply.

³Note that Kapetanios and Marcellino (2006) consider three possible relationships between the regressor and the factors. It is the second of these relationships that is being considered here. This is also the relationship that is considered by Bai and Ng (2007b).
4.4 An experimental application to US macroeconomic data
As an experiment and illustration, the approximate instrumental variables es-
timator will now be used to estimate a grouped variable approximate factor
model for a US macroeconomic data set, and a comparison made to the princi-
pal components estimator of the approximate factor model. The data are the
same as those used by Stock and Watson (2002b) and were downloaded from
Professor Watson's web site. The reader is referred to their paper for a more
complete discussion of the data and an extensive simulation which compares
the out-of-sample forecasting performance of large factor models to a range of
more standard forecasting models.
In the appendix of Stock and Watson (2002b) they list the variables used in their analysis under fourteen different headings. In the following analysis it will be assumed that these headings define a set of groups. Variables are included in the following analysis only if data exist from 1959:01 to 1998:12. Following Stock and Watson (2002b), variables were excluded if they had any observations lying more than 10 times the interquartile range from the median. This gave a set of 150 variables, 149 of which will be used as predictors. The list of variables in each group is given in Appendix 2. Also listed are codes indicating any transformation that was applied to the variable. The transformations used are those used by Stock and Watson (2002b). The codes are: 1 = no transformation, 2 = first difference, 4 = logarithm, 5 = first difference of logarithms, 6 = second difference of logarithms.
At the current stage of theoretical development, we have little guidance for
the best choice of variables to use as proxies for the factors. Arbitrarily, the
following variables were used:
1. Personal consumption expenditure (chained) - total durables (GMCDQ),
2. Personal consumption expenditure (chained) - nondurables (GMCNQ),
3. Personal consumption expenditure (chained) - services (GMCSQ),
4. Personal consumption expenditure (chained) - new cars (GMCANQ).
Grouped variable approximate factor models with one to four factors were estimated by using these variables, in order, as factor proxies.
Denoting the factor loading matrix corresponding to group j as Bj, for j = 2, ..., 14, Bj may be estimated using as instruments the variables in all groups except Group a and Group j. B1 may be estimated using as instruments all variables except those in Group a. The approximate instrumental variables estimator for B is then constructed by stacking B0 = Ik with the estimates of B1, ..., B14. Σf is estimated using as instruments all variables except those in Group a. The other parameters, and the factors, are estimated exactly as for the three-group model detailed in Section 4.2.
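The instrument-selection rule just described (for each group j, use every variable outside Group a and Group j) can be sketched schematically. The code below is not the thesis implementation: it uses four synthetic groups rather than the fourteen Stock and Watson headings, and all sizes, scales and names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
T, k = 20000, 2
groups = [1, 2, 3, 4]                               # stand-in for the 14 headings
sizes = {"a": k, 1: 10, 2: 10, 3: 10, 4: 10}
B = {j: rng.normal(size=(sizes[j], k)) for j in groups}
B["a"] = np.eye(k)                                  # normalisation B0 = I_k

f = rng.normal(size=(T, k))
x = {j: f @ B[j].T + 0.1 * rng.normal(size=(T, sizes[j])) for j in sizes}

def iv_loadings(xj, x0, z):
    """B_j' = (S_j0' S_j0)^{-1} S_j0' S_j with S_j = z'x_j/T, S_j0 = z'x_0/T."""
    Sj, Sj0 = z.T @ xj / len(z), z.T @ x0 / len(z)
    return np.linalg.solve(Sj0.T @ Sj0, Sj0.T @ Sj).T

B_hat = {"a": np.eye(k)}
for j in groups:
    # instruments: every variable outside Group a and Group j
    z = np.hstack([x[i] for i in groups if i != j])
    B_hat[j] = iv_loadings(x[j], x["a"], z)

B_stack = np.vstack([B_hat[g] for g in ["a"] + groups])   # stacked AIV estimate of B
B_true = np.vstack([B[g] for g in ["a"] + groups])
print(np.max(np.abs(B_stack - B_true)))
```

Stacking the per-group estimates beneath B0 = Ik reproduces the construction of the full loading matrix described above.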
The first task was to estimate a one-factor model using the principal components method and the approximate instrumental variables estimator. Since the one-factor model is identified, this allows a simple comparison to be made between the output of the two estimators. Figure 4.1 plots the single factor estimated using the two methods using all of the 150 variables listed above.
Figure 4.1: Single factor estimated using the approximate instrumental variables method and the principal components method. [Two time-series panels, 1960–1995: "Approximate Instrumental Variables Method" and "Principal Components Method".]
Apart from the scaling, it takes a sharp eye to spot the differences between the two factor estimates. However, Figure 4.2, which shows the difference between the two standardised series, confirms that they are not identical. The sample correlation coefficient for the two series is 0.97. Importantly, both estimates have the appearance that we would expect of a macroeconomic factor. In particular, the downturns in the mid-1970s, early 1980s and early 1990s are apparent.
The second task undertaken was to conduct an out-of-sample forecasting
simulation using the approximate instrumental variables estimator and the
principal components estimator in order to make a direct comparison of their
forecasting performance. Two variables were forecast in this simulation: indus-
trial production (IP) and CPI ination (PUNEW). For each of these variables,
factor models were estimated using the two estimation procedures with all of
the 149 other variables used as predictors. No lags were included in any fore-
casting model and, in each case the factor order was pre-determined. This was
done primarily because we do not yet have a model selection procedure for
the approximate instrumental variables estimator. Stock and Watson (2002b)
found that "...good forecasts can be made with only one or two factors...".
Figure 4.2: Difference between the factor estimated by approximate instrumental variables and the factor estimated by principal components. [Single time-series panel, 1960–1995.]
They also found little benefit from lagged variables when forecasting IP, but considerable benefit when forecasting PUNEW. As part of their simulation exercise they considered forecasts generated from models with fixed factor orders from 1 to 4. The same fixed orders are considered here.
All variables used to estimate the factors were transformed as indicated in
the above list of variables. Left hand side variables were also adjusted to have
zero means. The models used are as follows.
    (1200/h) ln(IP_{t+h}/IP_t) = β′_I ft + η_{It}

and

    (1200/h) ln(PUNEW_{t+h}/PUNEW_t) − 1200 ln(PUNEW_t/PUNEW_{t−1}) = β′_P ft + η_{Pt}
Forecast horizons considered are 6 months, 12 months, 18 months and 24 months. The first 6-month-ahead forecast was computed by estimating the above models using data from 1959:04 to 1968:07, and then using the parameter estimates and the observations from 1968:08 to compute forecasts of the dependent variables in 1969:02. An extra observation was then added to the data set, the models were re-estimated using data up to 1968:08, and a forecast computed for 1969:03 using data from 1968:09. Continuing this procedure produced a series of 359 6-month-ahead forecasts for each of the variables. In a similar fashion, 359 12-, 18- and 24-month-ahead forecasts were also generated. The forecasts were then compared to the actual values which occurred, and mean squared forecast errors were calculated. These are displayed in Tables 4.2 and 4.3.
Table 4.2: Forecast MSEs for PC and AIV forecasts for IP

          k=1               k=2               k=3               k=4
    h     PC      AIV       PC      AIV       PC      AIV       PC      AIV
    6     0.7559  0.7095    0.5705  0.7081    0.5615  0.7316    0.5703  0.7382
    12    0.9337  0.9047    0.548   0.8788    0.5261  0.9103    0.5306  0.8771
    18    1.0123  1.0079    0.5718  0.9577    0.525   0.9851    0.5085  0.9111
    24    1.0227  1.0276    0.6164  0.9683    0.5197  0.9675    0.4859  0.8928
Table 4.3: Forecast MSEs for PC and AIV forecasts for PUNEW

          k=1               k=2               k=3               k=4
    h     PC      AIV       PC      AIV       PC      AIV       PC      AIV
    6     1.1978  1.1877    1.1998  1.1863    1.2002  1.1849    1.1371  1.1751
    12    1.1545  1.1367    1.1524  1.1328    1.1171  1.1401    1.0331  1.1403
    18    1.1937  1.1654    1.1793  1.1602    1.1192  1.1666    1.0302  1.1723
    24    1.2138  1.175     1.1704  1.1716    1.1002  1.1699    1.0239  1.186
For the one-factor model, the approximate instrumental variables estimator
generally produces a slightly lower mean squared forecast error for both vari-
ables at the shorter forecast horizons. Interestingly however, while increasing
the number of factors leads to a significant improvement in the performance of the principal components procedure in forecasting IP, it makes little difference to the performance of the approximate instrumental variables forecast. For the PUNEW forecast, increasing the factor order does not make a large difference to the MSE of either procedure. It is not easy to explain
why this might be happening, given the current state of the relevant theory.
One possibility is that only the first factor is useful in forecasting PUNEW,
and this is being estimated reasonably well by both procedures. In contrast,
IP might require two factors to produce an efficient forecast and, for some
reason, the approximate instrumental variables procedure is doing a poor job
of estimating multiple factors. One possible cause of this might be the choice
of proxies. In this particular case, all the proxies were (arbitrarily) chosen
to be consumption variables. It might be the case that all the consumption
variables respond to changes in the factors in a similar way, so that the matrix
B0 is poorly conditioned, resulting in the chosen variables being poor proxies
when used together, but satisfactory when used individually. Whatever the
cause, this application illustrates the need for further theoretical research on
the behaviour of the approximate instrumental variables estimator. Of particular interest is the distribution of the estimator and how it is affected by the
choice of proxies and instruments.
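The expanding-window exercise described in this section can be mimicked on synthetic data. The sketch below is not the thesis code: it uses a simulated one-factor panel, a plain principal components factor estimate, and illustrative sample sizes, but it reproduces the recursive estimate–forecast–advance structure used above.

```python
import numpy as np

rng = np.random.default_rng(4)
T, N, h = 400, 30, 6            # sample length, predictors, horizon (illustrative)
f = rng.normal(size=T)          # single factor
X = np.outer(f, rng.normal(1.0, 0.2, size=N)) + 0.5 * rng.normal(size=(T, N))
y = np.zeros(T)
y[h:] = f[:-h] + 0.5 * rng.normal(size=T - h)   # y_{t+h} = beta' f_t + eta

def pc_factor(X):
    """First principal component of X (columns standardised), unit variance."""
    Z = (X - X.mean(0)) / X.std(0)
    vals, vecs = np.linalg.eigh(Z.T @ Z / len(Z))
    g = Z @ vecs[:, -1]         # eigenvalues ascending, so take the last vector
    return g / g.std()

first, errs = 200, []
for t0 in range(first, T - h):              # expanding window ending at t0
    fhat = pc_factor(X[: t0 + 1])
    # regress y_{t+h} on fhat_t within the window, then forecast y_{t0+h}
    F = np.column_stack([np.ones(t0 + 1 - h), fhat[: t0 + 1 - h]])
    coef, *_ = np.linalg.lstsq(F, y[h : t0 + 1], rcond=None)
    yhat = coef[0] + coef[1] * fhat[t0]
    errs.append(y[t0 + h] - yhat)

mse = np.mean(np.square(errs))
print(round(mse, 3))
```

Because the regression coefficient is re-estimated in every window, the sign and scale indeterminacy of the principal components factor does not affect the forecasts.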
4.5 Concluding Comments
The theoretical work in this chapter provides a new method for estimating
large factor models. Since the rate of convergence of the estimator does not
depend on the degree of correlation in the diagonal blocks of the error covari-
ance matrix, it is of particular interest in applications in which there is a high
degree of error cross-correlation between some variables, since the results pre-
sented in Chapter 3 suggest that the principal components estimator is likely
to perform poorly in such situations. While a small empirical application was
carried out, what is really needed is an extensive set of empirical applica-
tions which compare the approximate instrumental variables estimator to the
usual principal components estimator. It would be of particular interest to
see whether the approximate instrumental variables estimator provides a bet-
ter performance in cases where the principal components estimator has been
shown to perform relatively poorly. Results such as this would indicate that
the poor performance of the principal components estimator in those cases is
due to excessive error cross-correlation. Of course, it would also be of inter-
est to compare the two approaches to estimation in cases where the principal
components estimator works well.
Much theoretical work remains to be done. The theorems in this chapter prove first order convergence only. Second order convergence results would also be of interest. This would allow for a comparison of the asymptotic variance with that derived for the principal components estimator of the approximate factor model by Bai (2003) and Bai and Ng (2006), which would permit a more thorough theoretical comparison between the two procedures.
When constructing the approximate instrumental variables estimator there ex-
ist multiple choices for the proxy variables x0t. As yet, it is not clear what the
best choice is. The derivation of the asymptotic variance might provide some
guidance in this choice. A procedure for choosing the factor order would also
be useful. This might follow from distribution theory, or it might be based on
modifications to traditional model selection procedures, as done by Bai and Ng
(2002) for the approximate factor model. Finally, in this chapter it is assumed
that the group structure of the variables is known so that valid approximate
instruments may be used for estimation. In some applications, it might not be
clear what the group structure is. Since the instrumental variables are chosen
based on a set of approximate moment conditions which are overidentifying,
the development of tests of the validity of the overidentifying approximate con-
ditions, similar to the well-known J-test in the GMM literature, would be a useful
line of research.
Appendix 1 Proofs
This appendix contains proofs of all the theorems stated in this chapter.
Firstly, the proofs of the theorems are given. Then the lemmas used to prove
the theorems are stated. Finally the lemmas are proved.
Proofs of Theorems
Proof of Theorem 4.3.1. Define Ωw0 = E(zwt x′wt), Ωw = E(zwt yt), Ψw = E(εwt εyt), and Ψw0 = E(εwt ε′0t), where εwt is the error vector for the vector of instruments zwt. We have yt = β′ft + α′wt + εyt and ft = x0t − ε0t, so we may write yt = δ′xwt + εyt − β′ε0t, where δ = (β′ α′)′ and xwt = (x′0t w′t)′. Post-multiplying by zwt, taking the expected value, and solving yields an expression for the population parameter

    δ = (Ω′w0 Ωw0)⁻¹ Ω′w0 Ωw − (Ω′w0 Ωw0)⁻¹ Ω′w0 (Ψw − Ψw0 β)        (4.13)

where the non-singularity of Ω′w0 Ωw0 is a consequence of Assumption 5.8. The sample estimator is

    δ̂ = (S′w0 Sw0)⁻¹ S′w0 Sw

where Sw = (1/T) ∑_{t=1}^{T} zwt yt and Sw0 = (1/T) ∑_{t=1}^{T} zwt x′wt. From Lemma 27, (1/Nw) S′w0 Sw0 = (1/Nw) Ω′w0 Ωw0 + Op(T^{−1/2}) and (1/Nw) S′w0 Sw = (1/Nw) Ω′w0 Ωw + Op(T^{−1/2}), where Nw is the number of elements in zwt. Also ((1/Nw) Ω′w0 Ωw0)⁻¹ = O(1) as a consequence of Assumption 5.8. It follows that

    δ̂ − δ = (Ω′w0 Ωw0)⁻¹ Ω′w0 (Ψw − Ψw0 β) + Op(T^{−1/2})

Since

    ‖(Ω′w0 Ωw0)⁻¹ Ω′w0 (Ψw − Ψw0 β)‖₂ ≤ ‖((1/Nw) Ω′w0 Ωw0)⁻¹ (1/√Nw) Ω′w0‖₂ ‖(1/√Nw)(Ψw − Ψw0 β)‖₂ = O(N^{−α/2})        (4.14)

from Lemmas 24 and 25 and Assumption 5.9, it follows that

    ‖δ̂ − δ‖₂ = O(N^{−α/2}) + Op(T^{−1/2})
Proof of Theorem 4.3.2. The proof of Theorem 4.3.2 follows a similar argument to the proof of Theorem 4.3.1. Define Ωi0 = E(zit x′0t), Ωi = E(zit x′it), Ψzi = E(εzit ε′it), and Ψzi0 = E(εzit ε′0t), where εzit is the error vector for the vector of instruments zit used to estimate Bi. We have xit = Bi ft + εit and ft = x0t − ε0t, so we may write xit = Bi x0t + εit − Bi ε0t. Post-multiplying by z′it, taking the expected value, and solving yields an expression for the population parameter

    B′i = (Ω′i0 Ωi0)⁻¹ Ω′i0 Ωi − (Ω′i0 Ωi0)⁻¹ Ω′i0 (Ψzi − Ψzi0 B′i)        (4.15)

where the non-singularity of Ω′i0 Ωi0 is a consequence of Lemma 24. The sample estimator is

    B̂′i = (S′i0 Si0)⁻¹ S′i0 Si

where Si = (1/T) ∑_{t=1}^{T} zit x′it and Si0 = (1/T) ∑_{t=1}^{T} zit x′0t. From Lemma 27, ‖(1/Nzi) S′i0 Si0 − (1/Nzi) Ω′i0 Ωi0‖₂ = Op(T^{−1/2}) and ‖(1/Nzi) S′i0 Si − (1/Nzi) Ω′i0 Ωi‖₂ = Op(T^{−1/2}), where Nzi is the number of elements in zit. Also ((1/Nzi) Ω′i0 Ωi0)⁻¹ = O(1) from Lemma 24. It follows that

    max_{1≤j≤Ni} ‖B̂i(j) − Bi(j)‖₂ = max_{1≤j≤Ni} ‖(Ω′i0 Ωi0)⁻¹ Ω′i0 (Ψzi(j) − Ψzi0 B′i(j))‖₂ + Op(T^{−1/2})

Since

    max_{1≤j≤Ni} ‖(Ω′i0 Ωi0)⁻¹ Ω′i0 (Ψzi(j) − Ψzi0 B′i(j))‖₂ ≤ ‖((1/Nzi) Ω′i0 Ωi0)⁻¹ (1/√Nzi) Ω′i0‖₂ × max_{1≤j≤Ni} ‖(1/√Nzi)(Ψzi(j) − Ψzi0 B′i(j))‖₂ = O(Nzi^{−α/2})        (4.16)

from Lemmas 24 and 25, it follows from Assumption 5.9 that

    max_{1≤j≤Ni} ‖B̂i(j) − Bi(j)‖₂ = O(N^{−α/2}) + Op(T^{−1/2})
Proof of Theorem 4.3.3. Define $\Omega_{f0} = E(z_{ft}x_{0t}')$ and $\Psi_{f0} = E(\varepsilon_{ft}\varepsilon_{0t}')$, where $\varepsilon_{ft} = (\varepsilon_{2t}'\ \varepsilon_{3t}')'$. Also define $\tilde{B}_f = \hat{B}_f - B_f$ and $\tilde{\Omega}_{f0} = S_{f0} - \Omega_{f0}$. We have $x_{0t} = f_t + \varepsilon_{0t}$ and $z_{ft} = B_ff_t + \varepsilon_{ft}$, where $z_{ft} = (x_{2t}'\ x_{3t}')'$ and $B_f = (B_2'\ B_3')'$. Taking the expected value of $z_{ft}x_{0t}'$ and solving for $\Sigma_f = E(f_tf_t')$ yields

$$\Sigma_f = \left(B_f'B_f\right)^{-1}B_f'\Omega_{f0} - \left(B_f'B_f\right)^{-1}B_f'\Psi_{f0}$$

The second term on the right hand side of the parameter equation is bounded by

$$\left\|\left(\frac{1}{N_f}B_f'B_f\right)^{-1}\frac{1}{N_f}B_f'\Psi_{f0}\right\|_2 \leq \left\|\left(\frac{1}{N_f}B_f'B_f\right)^{-1}\frac{1}{\sqrt{N_f}}B_f'\right\|_2\left\|\frac{1}{\sqrt{N_f}}\Psi_{f0}\right\|_2 = O\left(N_f^{-1}\right)$$

since $\sqrt{\operatorname{maxeig}\left[\left(\frac{1}{N_f}B_f'B_f\right)^{-1}\right]} = \frac{1}{\sqrt{d_{\min}}}$ and $\left\|\frac{1}{\sqrt{N_f}}\Psi_{f0}\right\|_2 = O\left(N_f^{-1}\right)$, where $N_f = N_2 + N_3$. Therefore we have

$$\Sigma_f = \left(B_f'B_f\right)^{-1}B_f'\Omega_{f0} + O\left(N_f^{-1}\right)$$

The sample estimator is

$$\hat{\Sigma}_f = \left(\hat{B}_f'\hat{B}_f\right)^{-1}\hat{B}_f'S_{f0}$$

From Lemma 31 and Assumption 5.5, $\left(\frac{1}{N_f}\hat{B}_f'\hat{B}_f\right)^{-1} = \left(\frac{1}{N_f}B_f'B_f\right)^{-1} + O_p\left[\max\left(T^{-\frac{1}{2}}, N_f^{-\frac{\alpha}{2}}\right)\right]$, so to prove the theorem we need to show that $\frac{1}{N_f}S_{f0}'\hat{B}_f = \frac{1}{N_f}\Omega_{f0}'B_f + O_p\left[\max\left(T^{-\frac{1}{2}}, N_f^{-\frac{\alpha}{2}}\right)\right]$. We write $\hat{B}_f = B_f + \tilde{B}_f$ and $S_{f0} = \Omega_{f0} + \tilde{\Omega}_{f0}$. Then

$$\frac{1}{N_f}S_{f0}'\hat{B}_f = \frac{1}{N_f}\Omega_{f0}'B_f + \frac{1}{N_f}\Omega_{f0}'\tilde{B}_f + \frac{1}{N_f}\tilde{\Omega}_{f0}'B_f + \frac{1}{N_f}\tilde{\Omega}_{f0}'\tilde{B}_f \qquad (4.17)$$

Bounds will now be given for the terms on the right hand side of Equation (4.17).

• $\left\|\frac{1}{N_f}\Omega_{f0}'\tilde{B}_f\right\|_2 \leq \left\|\frac{1}{\sqrt{N_f}}\Omega_{f0}\right\|_2\left\|\frac{1}{\sqrt{N_f}}\tilde{B}_f\right\|_2 = O_p\left[\max\left(T^{-\frac{1}{2}}, N_f^{-\frac{\alpha}{2}}\right)\right]$ from Lemma 30, Assumption 5.5 and Assumption 5.2.

• $\left\|\frac{1}{N_f}\tilde{\Omega}_{f0}'B_f\right\|_2 \leq \left\|\frac{1}{\sqrt{N_f}}\tilde{\Omega}_{f0}\right\|_2\left\|\frac{1}{\sqrt{N_f}}B_f\right\|_2$. It follows from Lemma 26 and Markov's Inequality that $\left\|\frac{1}{\sqrt{N_f}}\tilde{\Omega}_{f0}\right\|_2^2 \leq \left\|\frac{1}{\sqrt{N_f}}\tilde{\Omega}_{f0}\right\|_F^2 = O_p\left(T^{-1}\right)$. Also $\left\|\frac{1}{\sqrt{N_f}}B_f\right\|_2^2 = \operatorname{maxeig}\left(\frac{1}{N_f}B_f'B_f\right) = d_{\max}$. It follows that $\left\|\frac{1}{N_f}\tilde{\Omega}_{f0}'B_f\right\|_2 = O_p\left(T^{-\frac{1}{2}}\right)$.

• $\left\|\frac{1}{N_f}\tilde{\Omega}_{f0}'\tilde{B}_f\right\|_2 \leq \left\|\frac{1}{\sqrt{N_f}}\tilde{\Omega}_{f0}\right\|_2\left\|\frac{1}{\sqrt{N_f}}\tilde{B}_f\right\|_2 = O_p\left[T^{-\frac{1}{2}}\max\left(T^{-\frac{1}{2}}, N_f^{-\frac{\alpha}{2}}\right)\right]$ from arguments presented in the above two points.

From these three results and Equation (4.17), $\frac{1}{N_f}S_{f0}'\hat{B}_f = \frac{1}{N_f}\Omega_{f0}'B_f + O_p\left[\max\left(T^{-\frac{1}{2}}, N_f^{-\frac{\alpha}{2}}\right)\right]$, and the required result follows from Assumption 5.9.
Proof of Theorem 4.3.4. The population factor estimator is $f_t^* = \left(B'B\right)^{-1}B'x_t$ and the sample factor estimator is $\hat{f}_t = \left(\hat{B}'\hat{B}\right)^{-1}\hat{B}'x_t$, where $\hat{B} = \left(I_k\ \hat{B}_1'\ \hat{B}_2'\ \hat{B}_3'\right)'$. It follows that $\left\|\hat{f}_t - f_t^*\right\|_2 = O_p\left[\max\left(T^{-\frac{1}{2}}, N^{-\frac{\alpha}{2}}\right)\right]$ if $\left\|\frac{1}{N}\hat{B}'\hat{B} - \frac{1}{N}B'B\right\|_2 = O_p\left[\max\left(T^{-\frac{1}{2}}, N^{-\frac{\alpha}{2}}\right)\right]$ and $\left\|\frac{1}{N}\hat{B}'x_t - \frac{1}{N}B'x_t\right\|_2 = O_p\left[\max\left(T^{-\frac{1}{2}}, N^{-\frac{\alpha}{2}}\right)\right]$. Since $B'B = I_k + \sum_{i=1}^{3}B_i'B_i$, the first of these conditions is proved by Lemma 31. To prove the second, note that $S_i = \frac{1}{T}\sum_{t=1}^{T}z_{it}x_{it}'$ and $x_{it} = B_ix_{0t} + \delta_{it}$, where $\delta_{it} = \varepsilon_{it} - B_i\varepsilon_{0t}$. Therefore $S_i = S_{i0}B_i' + S_{i\delta}$, where $S_{i\delta} = \frac{1}{T}\sum_{t=1}^{T}z_{it}\delta_{it}'$. Consequently

$$\hat{B}_i'x_{it} - B_i'x_{it} = \left(S_{i0}'S_{i0}\right)^{-1}S_{i0}'S_{i\delta}x_{it}$$

We have

$$\left\|\frac{1}{N_i}\hat{B}_i'x_{it} - \frac{1}{N_i}B_i'x_{it}\right\|_2 \leq \left\|\left(\frac{1}{N_{zi}}S_{i0}'S_{i0}\right)^{-1}\frac{1}{\sqrt{N_{zi}}}S_{i0}'\right\|_2\left\|\frac{1}{\sqrt{N_iN_{zi}}}S_{i\delta}\right\|_2\left\|\frac{1}{\sqrt{N_i}}x_{it}\right\|_2 \qquad (4.18)$$

where $N_{zi}$ is the number of elements in the vector $z_{it}$. The following bounds apply:

• $\left\|\left(\frac{1}{N_{zi}}S_{i0}'S_{i0}\right)^{-1}\frac{1}{\sqrt{N_{zi}}}S_{i0}'\right\|_2 = O_p(1)$ from Lemma 28.

• $\left\|\frac{1}{\sqrt{N_iN_{zi}}}S_{i\delta}\right\|_2 = O_p\left[\max\left(T^{-\frac{1}{2}}, N^{-\frac{\alpha}{2}}\right)\right]$ from Lemma 29.

• $\left\|\frac{1}{\sqrt{N_i}}x_{it}\right\|_2 = O(1)$ under Assumptions 5.1, 5.2 and 5.5.

It follows that $\left\|\frac{1}{N_i}\hat{B}_i'x_{it} - \frac{1}{N_i}B_i'x_{it}\right\|_2 = O_p\left[\max\left(T^{-\frac{1}{2}}, N^{-\frac{\alpha}{2}}\right)\right]$. The result then follows from the fact that

$$\left\|\frac{1}{N}\hat{B}'x_t - \frac{1}{N}B'x_t\right\|_2 \leq \sum_{i=1}^{3}\left\|\frac{1}{N}\hat{B}_i'x_{it} - \frac{1}{N}B_i'x_{it}\right\|_2 \leq \sum_{i=1}^{3}\left\|\frac{1}{N_i}\hat{B}_i'x_{it} - \frac{1}{N_i}B_i'x_{it}\right\|_2$$
Proof of Theorem 4.3.5. From Lemma 31 and Assumption 5.5, $\left\|\left(\frac{1}{N}\hat{B}'\hat{B}\right)^{-1} - \left(\frac{1}{N}B'B\right)^{-1}\right\|_2 = O_p\left[\max\left(T^{-\frac{1}{2}}, N^{-\frac{\alpha}{2}}\right)\right]$, so to prove the theorem we need to show that $\left\|\frac{1}{N^2}\hat{B}'\hat{\Psi}\hat{B} - \frac{1}{N^2}B'\Psi B\right\|_2 = O_p\left[\max\left(T^{-\frac{1}{2}}, N^{-\frac{\alpha}{2}}\right)\right]$. We have

$$\frac{1}{N^2}\hat{B}'\hat{\Psi}\hat{B} = \frac{1}{N^2}\hat{B}'\left(S_{xx} - \hat{B}\hat{\Sigma}_f\hat{B}'\right)\hat{B} = \frac{1}{N^2}\hat{B}'S_{xx}\hat{B} - \frac{1}{N^2}\hat{B}'\hat{B}\hat{\Sigma}_f\hat{B}'\hat{B} = \frac{1}{N^2}\hat{B}'\Omega\hat{B} + \frac{1}{N^2}\hat{B}'\tilde{\Omega}\hat{B} - \frac{1}{N^2}\hat{B}'\hat{B}\hat{\Sigma}_f\hat{B}'\hat{B} \qquad (4.19)$$

where $\tilde{\Omega} = S_{xx} - \Omega$ and $\Omega = E(S_{xx})$. We now consider each of the three terms on the right hand side of Equation (4.19).

• For the first right hand side term in Equation (4.19), $\frac{1}{N^2}\hat{B}'\Omega\hat{B} - \frac{1}{N^2}B'\Omega B = \frac{1}{N^2}\tilde{B}'\Omega B + \frac{1}{N^2}B'\Omega\tilde{B} + \frac{1}{N^2}\tilde{B}'\Omega\tilde{B}$, where $\tilde{B} = \hat{B} - B$. The terms on the right hand side of this expression may be bounded as follows. Firstly, $\left\|\frac{1}{N^2}\tilde{B}'\Omega B\right\|_2 \leq \left\|\frac{1}{\sqrt{N}}\tilde{B}\right\|_2\left\|\frac{1}{N}\Omega\right\|_2\left\|\frac{1}{\sqrt{N}}B\right\|_2$, where:

  – $\left\|\frac{1}{\sqrt{N}}B\right\|_2 = O(1)$ under Assumption 5.5.

  – $\left\|\frac{1}{N}\Omega\right\|_2^2 = \frac{1}{N^2}\operatorname{maxeig}\left(\Omega^2\right) = \frac{1}{N^2}\left(\operatorname{maxeig}\left(\Omega\right)\right)^2 = O(1)$ under Assumptions 5.2 and 5.5.

  – Since $\tilde{B}' = \left(0\ \tilde{B}_1'\ \tilde{B}_2'\ \tilde{B}_3'\right)$, it follows that $\left\|\frac{1}{\sqrt{N}}\tilde{B}\right\|_2 \leq \sum_{i=1}^{3}\left\|\frac{1}{\sqrt{N}}\tilde{B}_i\right\|_2 = O_p\left[\max\left(T^{-\frac{1}{2}}, N^{-\frac{\alpha}{2}}\right)\right]$ from Lemma 30.

Secondly, $\left\|\frac{1}{N^2}\tilde{B}'\Omega\tilde{B}\right\|_2 \leq \left\|\frac{1}{\sqrt{N}}\tilde{B}\right\|_2^2\left\|\frac{1}{N}\Omega\right\|_2 = O_p\left[\max\left(T^{-1}, N^{-\alpha}\right)\right]$ from the above arguments. Combining results we have

$$\left\|\frac{1}{N^2}\hat{B}'\Omega\hat{B} - \frac{1}{N^2}B'\Omega B\right\|_2 = O_p\left[\max\left(T^{-\frac{1}{2}}, N^{-\frac{\alpha}{2}}\right)\right]$$

• For the second right hand side term in Equation (4.19),

$$\frac{1}{N^2}\hat{B}'\tilde{\Omega}\hat{B} = \frac{1}{N^2}\left(B' + \tilde{B}'\right)\tilde{\Omega}\left(B + \tilde{B}\right) = \frac{1}{N^2}B'\tilde{\Omega}B + \frac{1}{N^2}B'\tilde{\Omega}\tilde{B} + \frac{1}{N^2}\tilde{B}'\tilde{\Omega}B + \frac{1}{N^2}\tilde{B}'\tilde{\Omega}\tilde{B}$$

Bounds are now written for each of these terms.

  – $\left\|\frac{1}{N^2}B'\tilde{\Omega}B\right\|_2 \leq \left\|\frac{1}{\sqrt{N}}B\right\|_2^2\left\|\frac{1}{N}\tilde{\Omega}\right\|_2 \leq d_{\max}\left\|\frac{1}{N}\tilde{\Omega}\right\|_F = O_p\left(T^{-\frac{1}{2}}\right)$ from Lemma 26 under Assumptions 5.1 to 5.3.

  – $\left\|\frac{1}{N^2}B'\tilde{\Omega}\tilde{B}\right\|_2 \leq \left\|\frac{1}{\sqrt{N}}B\right\|_2\left\|\frac{1}{N}\tilde{\Omega}\right\|_2\left\|\frac{1}{\sqrt{N}}\tilde{B}\right\|_2$. $\left\|\frac{1}{\sqrt{N}}B\right\|_2 \leq \sqrt{d_{\max}}$ under Assumption 5.5, $\left\|\frac{1}{\sqrt{N}}\tilde{B}\right\|_2 = O_p\left[\max\left(T^{-\frac{1}{2}}, N^{-\frac{\alpha}{2}}\right)\right]$ from Lemma 30, and $\left\|\frac{1}{N}\tilde{\Omega}\right\|_2 = O_p\left(T^{-\frac{1}{2}}\right)$ from Lemma 26. Therefore $\left\|\frac{1}{N^2}B'\tilde{\Omega}\tilde{B}\right\|_2 = O_p\left[T^{-\frac{1}{2}}\max\left(T^{-\frac{1}{2}}, N^{-\frac{\alpha}{2}}\right)\right]$, and the same bound applies to $\frac{1}{N^2}\tilde{B}'\tilde{\Omega}B$.

  – $\left\|\frac{1}{N^2}\tilde{B}'\tilde{\Omega}\tilde{B}\right\|_2 \leq \left\|\frac{1}{\sqrt{N}}\tilde{B}\right\|_2^2\left\|\frac{1}{N}\tilde{\Omega}\right\|_2 = O_p\left[T^{-\frac{1}{2}}\max\left(T^{-1}, N^{-\alpha}\right)\right]$ from Lemma 26 and Lemma 30.

Combining results we have $\left\|\frac{1}{N^2}\hat{B}'\tilde{\Omega}\hat{B}\right\|_2 = O_p\left[\max\left(T^{-\frac{1}{2}}, N^{-\frac{\alpha}{2}}\right)\right]$.

• For the third right hand side term in Equation (4.19), $\left\|\frac{1}{N^2}\hat{B}'\hat{B}\hat{\Sigma}_f\hat{B}'\hat{B} - \frac{1}{N^2}B'B\Sigma_fB'B\right\|_2 = O_p\left[\max\left(T^{-\frac{1}{2}}, N^{-\frac{\alpha}{2}}\right)\right]$, since $\left\|\frac{1}{N}\hat{B}'\hat{B} - \frac{1}{N}B'B\right\|_2 = O_p\left[\max\left(T^{-\frac{1}{2}}, N^{-\frac{\alpha}{2}}\right)\right]$ from Lemma 31, and $\left\|\hat{\Sigma}_f - \Sigma_f\right\|_2 = O_p\left[\max\left(T^{-\frac{1}{2}}, N^{-\frac{\alpha}{2}}\right)\right]$ from Theorem 4.3.3.

Combining these three results with Equation (4.19) we have

$$\left\|\frac{1}{N^2}\hat{B}'\hat{\Psi}\hat{B} - \frac{1}{N^2}B'\Omega B + \frac{1}{N^2}B'B\Sigma_fB'B\right\|_2 = O_p\left[\max\left(T^{-\frac{1}{2}}, N^{-\frac{\alpha}{2}}\right)\right]$$

$$\left\|\frac{1}{N^2}\hat{B}'\hat{\Psi}\hat{B} - \frac{1}{N^2}B'\left(\Omega - B\Sigma_fB'\right)B\right\|_2 = O_p\left[\max\left(T^{-\frac{1}{2}}, N^{-\frac{\alpha}{2}}\right)\right]$$

$$\left\|\frac{1}{N^2}\hat{B}'\hat{\Psi}\hat{B} - \frac{1}{N^2}B'\Psi B\right\|_2 = O_p\left[\max\left(T^{-\frac{1}{2}}, N^{-\frac{\alpha}{2}}\right)\right]$$
Lemmas Used in the Proofs of the Theorems
Lemma 23. Let $G$ and $H$ be positive definite symmetric $J \times J$ matrices with eigenvalues $g_1 \geq g_2 \geq \ldots \geq g_J$ and $h_1 \geq h_2 \geq \ldots \geq h_J$. Then

$$\sum_{i=1}^{J}\left(g_i - h_i\right)^2 \leq \left\|G - H\right\|_F^2$$
Lemma 24. Under Assumptions 5.1 to 5.7, $\operatorname{tr}\left[\left(\frac{1}{N_{zi}}\Omega_{i0}'\Omega_{i0}\right)^{-1}\right] = O(1)$, where $\Omega_{i0} = E(z_{it}x_{0t}')$.
Lemma 25. Under Assumptions 5.2 and 5.4, $\left\|\frac{1}{\sqrt{N_w}}\left(\Psi_w - \Psi_{w0}\beta\right)\right\|_2 = O\left(N_w^{-\frac{\alpha}{2}}\right)$ and $\max_{1\leq j\leq N_i}\left\|\frac{1}{\sqrt{N_{zi}}}\left(\Psi_{zi(j)} - \Psi_{zi0}B_{i(j)}'\right)\right\|_2 = O\left(N_{zi}^{-\frac{\alpha}{2}}\right)$, where $B_{i(j)}$ is the $j$th row of $B_i$, $\Psi_w = E(\varepsilon_{wt}\varepsilon_{yt}')$, $\Psi_{w0} = E(\varepsilon_{wt}\varepsilon_{0t}')$, $\Psi_{zi} = E(\varepsilon_{zit}\varepsilon_{it}')$, $\Psi_{zi0} = E(\varepsilon_{zit}\varepsilon_{0t}')$, and $\Psi_{zi(j)}$ is the $j$th column of $\Psi_{zi}$. $\varepsilon_{zit}$ is the $N_{zi} \times 1$ error vector from the factor model of the instrument vector used to estimate $B_i$, and $\varepsilon_{wt}$ is the $N_w \times 1$ error vector from the factor model of the instrument vector $z_{wt}$. $N_i$ is the number of rows in $B_i$.
Lemma 26. Define $u_t = (x_t'\ w_t'\ y_t)'$. Let $u_{pt}$ be an $N_p \times 1$ vector containing a subset of the elements of $u_t$, and let $u_{qt}$ be an $N_q \times 1$ vector defined similarly. Define $S_{pq} = \frac{1}{T}\sum_{t=1}^{T}u_{pt}u_{qt}'$, $\Omega_{pq} = E(S_{pq})$ and $\tilde{\Omega}_{pq} = S_{pq} - \Omega_{pq}$. Then, under Assumptions 5.1, 5.2 and 5.3,

$$E\left\|\tilde{\Omega}_{pq}\right\|_F^2 \leq \frac{N_pN_q\gamma}{T}$$

where $0 < \gamma < \infty$ and $\gamma$ is a uniform bound applying to all vectors $u_{pt}$ and $u_{qt}$ as defined above.
Lemma 27. Define $u_t = (x_t'\ w_t'\ y_t)'$. Let $u_{pt}$ be an $N_p \times 1$ vector containing a subset of the elements of $u_t$. Also let $u_{qt}$ be an $N_q \times 1$ vector, and $u_{rt}$ an $N_r \times 1$ vector, defined similarly. Define $S_{pq} = \frac{1}{T}\sum_{t=1}^{T}u_{pt}u_{qt}'$, $S_{pr} = \frac{1}{T}\sum_{t=1}^{T}u_{pt}u_{rt}'$, $\Omega_{pq} = E(S_{pq})$ and $\Omega_{pr} = E(S_{pr})$. Then, under Assumptions 5.1, 5.2 and 5.3,

$$\left\|S_{pq}'S_{pr} - \Omega_{pq}'\Omega_{pr}\right\|_2 = O_p\left(T^{-\frac{1}{2}}N_pN_q^{\frac{1}{2}}N_r^{\frac{1}{2}}\right)$$

where the bound applies uniformly to all matrices $S_{pq}$ and $S_{pr}$ as defined above.
Lemma 28. Under Assumptions 5.1 to 5.7,

$$\left\|\left(\frac{1}{N_{zi}}S_{i0}'S_{i0}\right)^{-1}\frac{1}{\sqrt{N_{zi}}}S_{i0}'\right\|_2^2 = O_p(1)$$
Lemma 29. Let $\delta_{it} = x_{it} - B_ix_{0t}$ and $S_{i\delta} = \frac{1}{T}\sum_{t=1}^{T}z_{it}\delta_{it}'$. Under Assumptions 5.1, 5.2, 5.3, 5.5 and 5.9, $\left\|\frac{1}{N}S_{i\delta}\right\|_2 = O_p\left[\max\left(T^{-\frac{1}{2}}, N^{-\frac{\alpha}{2}}\right)\right]$.

Lemma 30. Denote $\tilde{B}_i = \hat{B}_i - B_i$. Under Assumptions 5.1 to 5.7 and 5.9, $\left\|\frac{1}{\sqrt{N_i}}\tilde{B}_i\right\|_2 = O_p\left[\max\left(T^{-\frac{1}{2}}, N^{-\frac{\alpha}{2}}\right)\right]$, where $N_i$ is the number of rows in $B_i$.
Lemma 31. Under Assumptions 5.1 to 5.7 and 5.9, $\left\|\frac{1}{N_f}\hat{B}_f'\hat{B}_f - \frac{1}{N_f}B_f'B_f\right\|_2 = O_p\left[\max\left(T^{-\frac{1}{2}}, N^{-\frac{\alpha}{2}}\right)\right]$.
Proofs of Lemmas
Proof of Lemma 23.

$$\sum_{i=1}^{J}\left(g_i - h_i\right)^2 = \sum_{i=1}^{J}g_i^2 + \sum_{i=1}^{J}h_i^2 - 2\sum_{i=1}^{J}g_ih_i = \operatorname{tr}(G'G) + \operatorname{tr}(H'H) - 2\sum_{i=1}^{J}g_ih_i$$

but $\operatorname{tr}(GH) \leq \sum_{i=1}^{J}g_ih_i$ from Marcus (1956), so

$$\sum_{i=1}^{J}\left(g_i - h_i\right)^2 \leq \operatorname{tr}(G'G) + \operatorname{tr}(H'H) - 2\operatorname{tr}(GH) = \left\|G - H\right\|_F^2$$
Proof of Lemma 24. $\Omega_{i0} = B_{zi}\Sigma_f + \Psi_{zi0}$, where $\Psi_{zi0} = E(\varepsilon_{zit}\varepsilon_{0t}')$ and $\varepsilon_{zit}$ is the error vector corresponding to the instrument vector $z_{it}$. Therefore

$$\left\|\frac{1}{N_{zi}}\Omega_{i0}'\Omega_{i0} - \frac{1}{N_{zi}}\Sigma_fB_{zi}'B_{zi}\Sigma_f\right\|_2 = \left\|\frac{1}{N_{zi}}\Sigma_fB_{zi}'\Psi_{zi0} + \frac{1}{N_{zi}}\Psi_{zi0}'B_{zi}\Sigma_f + \frac{1}{N_{zi}}\Psi_{zi0}'\Psi_{zi0}\right\|_2$$

where $N_{zi}$ is the number of elements in $z_{it}$. However,

$$\left\|\frac{1}{N_{zi}}\Sigma_fB_{zi}'\Psi_{zi0} + \frac{1}{N_{zi}}\Psi_{zi0}'B_{zi}\Sigma_f + \frac{1}{N_{zi}}\Psi_{zi0}'\Psi_{zi0}\right\|_2 \leq 2\left\|\Sigma_f\right\|_2\left\|\frac{1}{\sqrt{N_{zi}}}B_{zi}\right\|_2\left\|\frac{1}{\sqrt{N_{zi}}}\Psi_{zi0}\right\|_2 + \left\|\frac{1}{\sqrt{N_{zi}}}\Psi_{zi0}\right\|_2^2 = O\left(N_{zi}^{-\frac{\alpha}{2}}\right)$$

from Assumptions 5.4, 5.5 and 5.6. Therefore

$$\frac{1}{N_{zi}}\Omega_{i0}'\Omega_{i0} = \frac{1}{N_{zi}}\Sigma_fB_{zi}'B_{zi}\Sigma_f + O\left(N_{zi}^{-\frac{\alpha}{2}}\right)$$

Under Assumptions 5.5 and 5.6, $\left(\frac{1}{N_{zi}}\Sigma_fB_{zi}'B_{zi}\Sigma_f\right)^{-1} = O(1)$. It follows that

$$\left(\frac{1}{N_{zi}}\Omega_{i0}'\Omega_{i0}\right)^{-1} = \left(\frac{1}{N_{zi}}\Sigma_fB_{zi}'B_{zi}\Sigma_f\right)^{-1} + O\left(N_{zi}^{-\frac{\alpha}{2}}\right) = O(1)$$
Proof of Lemma 25.

$$\left\|\frac{1}{\sqrt{N_w}}\left(\Psi_w - \Psi_{w0}\beta\right)\right\|_2 \leq \left\|\frac{1}{\sqrt{N_w}}\Psi_w\right\|_2 + \left\|\frac{1}{\sqrt{N_w}}\Psi_{w0}\right\|_2\left\|\beta\right\|_2 = O\left(N_w^{-\frac{\alpha}{2}}\right)$$

under Assumption 5.4. Similarly,

$$\max_{1\leq j\leq N_i}\left\|\frac{1}{\sqrt{N_{zi}}}\left(\Psi_{zi(j)} - \Psi_{zi0}B_{i(j)}'\right)\right\|_2 \leq \left\|\frac{1}{\sqrt{N_{zi}}}\Psi_{zi}\right\|_2 + \left\|\frac{1}{\sqrt{N_{zi}}}\Psi_{zi0}\right\|_2\max_{1\leq j\leq N_i}\left\|B_{i(j)}\right\|_2 = O\left(N_{zi}^{-\frac{\alpha}{2}}\right)$$

under Assumption 5.4 and since $\max_{1\leq j\leq N_i}\left\|B_{i(j)}\right\|_2 = O(1)$.
Proof of Lemma 26. Using the fact that for random variables $a_1, \ldots, a_m$ and $b_1, \ldots, b_n$, $\operatorname{cov}\left(\sum_{i=1}^{m}a_i, \sum_{j=1}^{n}b_j\right) = \sum_{i=1}^{m}\sum_{j=1}^{n}\operatorname{cov}(a_i, b_j)$, it is straightforward, but tedious, to show that under Assumption 5.3 there exists a constant $\gamma$ such that

$$\sup_t\sup_N\max_{1\leq i,j\leq N_u}\sum_{r=0}^{\infty}\left|\operatorname{cov}\left(u_{it}u_{jt}, u_{it-r}u_{jt-r}\right)\right| < \frac{\gamma}{2} < \infty \qquad (4.20)$$

where $N_u = N + m + 1$. Writing $\sigma_{ij} = E(u_{it}u_{jt})$,

$$E\left(\left[\tilde{\Omega}_{pq}\right]_{ij}^2\right) = E\left[\frac{1}{T}\sum_{t=1}^{T}u_{it}u_{jt} - \sigma_{ij}\right]^2 = \frac{1}{T^2}\operatorname{var}\left(\sum_{t=1}^{T}u_{it}u_{jt}\right)$$

$$= \frac{1}{T^2}\sum_{t=1}^{T}\sum_{s=1}^{T}\operatorname{cov}\left(u_{it}u_{jt}, u_{is}u_{js}\right) \leq \frac{2}{T^2}\sum_{t=1}^{T}\sum_{r=0}^{t}\left|\operatorname{cov}\left(u_{it}u_{jt}, u_{it-r}u_{jt-r}\right)\right|$$

$$\leq \frac{2}{T}\sup_t\sum_{r=0}^{\infty}\left|\operatorname{cov}\left(u_{it}u_{jt}, u_{it-r}u_{jt-r}\right)\right| \leq \frac{2}{T}\sup_t\sup_N\max_{1\leq i,j\leq N_u}\sum_{r=0}^{\infty}\left|\operatorname{cov}\left(u_{it}u_{jt}, u_{it-r}u_{jt-r}\right)\right| \leq \frac{\gamma}{T}$$

Therefore

$$E\left\|\tilde{\Omega}_{pq}\right\|_F^2 = \sum_{i=1}^{N_p}\sum_{j=1}^{N_q}E\left(\left[\tilde{\Omega}_{pq}\right]_{ij}^2\right) \leq \frac{N_pN_q\gamma}{T}$$
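The rate in Lemma 26 can be illustrated by simulation. In an i.i.d. special case of the assumptions (all details of the design below are hypothetical), $u_{pt}$ and $u_{qt}$ are disjoint standard normal vectors, so each entry of $\tilde{\Omega}_{pq}$ has variance exactly $1/T$ and $E\|\tilde{\Omega}_{pq}\|_F^2 = N_pN_q/T$:

```python
import numpy as np

# Monte Carlo illustration of Lemma 26 in an i.i.d. special case:
# with u_pt and u_qt independent standard normal vectors, Omega_pq = 0 and
# each entry of the sample cross-moment matrix has variance 1/T, so the
# expected squared Frobenius norm of S_pq - Omega_pq equals Np*Nq/T.
rng = np.random.default_rng(2)
Np, Nq, T, reps = 5, 4, 100, 400

vals = []
for _ in range(reps):
    up = rng.standard_normal((T, Np))
    uq = rng.standard_normal((T, Nq))
    S_pq = up.T @ uq / T                 # sample cross-moment matrix
    vals.append(np.sum(S_pq ** 2))       # ||S_pq - Omega_pq||_F^2, Omega_pq = 0
mc = np.mean(vals)
print(mc, Np * Nq / T)  # Monte Carlo average near the exact value 0.2
```

The Monte Carlo average matches $N_pN_q/T$ closely, and halving $T$ roughly doubles it, which is the $T^{-1}$ rate the lemma delivers.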
Proof of Lemma 27. Define $\tilde{\Omega}_{pq} = S_{pq} - \Omega_{pq}$ and $\tilde{\Omega}_{pr} = S_{pr} - \Omega_{pr}$, so that $E(\tilde{\Omega}_{pq}) = 0$ and $E(\tilde{\Omega}_{pr}) = 0$. Define $\theta_{pq} = \sup_{i,j}\left|\left[\Omega_{pq}\right]_{ij}\right|$ and $\theta_{pr} = \sup_{i,j}\left|\left[\Omega_{pr}\right]_{ij}\right|$. The existence of the suprema is given by Assumption 5.2. We have

$$\left\|S_{pq}'S_{pr} - \Omega_{pq}'\Omega_{pr}\right\|_2 = \left\|\tilde{\Omega}_{pq}'\tilde{\Omega}_{pr} + \Omega_{pq}'\tilde{\Omega}_{pr} + \tilde{\Omega}_{pq}'\Omega_{pr}\right\|_2 \leq \left\|\Omega_{pq}\right\|_2\left\|\tilde{\Omega}_{pr}\right\|_2 + \left\|\Omega_{pr}\right\|_2\left\|\tilde{\Omega}_{pq}\right\|_2 + \left\|\tilde{\Omega}_{pr}\right\|_2\left\|\tilde{\Omega}_{pq}\right\|_2$$

The terms on the right hand side of this inequality satisfy the following bounds:

• $\left\|\tilde{\Omega}_{pq}\right\|_2 \leq \left\|\tilde{\Omega}_{pq}\right\|_F = O_p\left(\sqrt{N_pN_q}\,T^{-\frac{1}{2}}\right)$ from Lemma 26 and Markov's Inequality.

• $\left\|\tilde{\Omega}_{pr}\right\|_2 \leq \left\|\tilde{\Omega}_{pr}\right\|_F = O_p\left(\sqrt{N_pN_r}\,T^{-\frac{1}{2}}\right)$ from Lemma 26 and Markov's Inequality.

• $\left\|\Omega_{pq}\right\|_2 \leq \left\|\Omega_{pq}\right\|_F \leq \sqrt{N_pN_q}\,\theta_{pq}$.

• $\left\|\Omega_{pr}\right\|_2 \leq \left\|\Omega_{pr}\right\|_F \leq \sqrt{N_pN_r}\,\theta_{pr}$.

Combining these bounds proves the lemma.
Proof of Lemma 28.

$$\left\|\left(\frac{1}{N_{zi}}S_{i0}'S_{i0}\right)^{-1}\frac{1}{\sqrt{N_{zi}}}S_{i0}'\right\|_2^2 = \operatorname{maxeig}\left(\left(\frac{1}{N_{zi}}S_{i0}'S_{i0}\right)^{-1}\right) = \left(\operatorname{mineig}\left(\frac{1}{N_{zi}}S_{i0}'S_{i0}\right)\right)^{-1}$$

From Lemma 27,

$$\frac{1}{N_{zi}}S_{i0}'S_{i0} = \frac{1}{N_{zi}}\Omega_{i0}'\Omega_{i0} + O_p\left(T^{-\frac{1}{2}}\right)$$

so from Lemma 23,

$$\operatorname{mineig}\left(\frac{1}{N_{zi}}S_{i0}'S_{i0}\right) = \operatorname{mineig}\left(\frac{1}{N_{zi}}\Omega_{i0}'\Omega_{i0}\right) + O_p\left(T^{-\frac{1}{2}}\right)$$

It follows from Lemma 24 that $\left(\operatorname{mineig}\left(\frac{1}{N_{zi}}\Omega_{i0}'\Omega_{i0}\right)\right)^{-1} = \operatorname{maxeig}\left(\left(\frac{1}{N_{zi}}\Omega_{i0}'\Omega_{i0}\right)^{-1}\right) \leq \operatorname{tr}\left(\left(\frac{1}{N_{zi}}\Omega_{i0}'\Omega_{i0}\right)^{-1}\right) = O(1)$. Therefore

$$\left\|\left(\frac{1}{N_{zi}}S_{i0}'S_{i0}\right)^{-1}\frac{1}{\sqrt{N_{zi}}}S_{i0}'\right\|_2^2 = \left(\operatorname{mineig}\left(\frac{1}{N_{zi}}\Omega_{i0}'\Omega_{i0}\right) + O_p\left(T^{-\frac{1}{2}}\right)\right)^{-1} = O_p(1)$$
Proof of Lemma 29. We have

$$S_{i\delta} = \frac{1}{T}\sum_{t=1}^{T}z_{it}x_{it}' - \frac{1}{T}\sum_{t=1}^{T}z_{it}x_{0t}'B_i' = S_i - S_{i0}B_i' = \Omega_i - \Omega_{i0}B_i' + \tilde{\Omega}_i - \tilde{\Omega}_{i0}B_i'$$

where $\tilde{\Omega}_i = S_i - \Omega_i$ and $\tilde{\Omega}_{i0} = S_{i0} - \Omega_{i0}$. However, the moment condition that is satisfied by the model is

$$\Omega_i - \Omega_{i0}B_i' = \Psi_{zi} - \Psi_{zi0}B_i'$$

where $\Psi_{zi} = E(\varepsilon_{zit}\varepsilon_{it}')$ and $\Psi_{zi0} = E(\varepsilon_{zit}\varepsilon_{0t}')$, and $\varepsilon_{zit}$ is the $N_{zi} \times 1$ error vector from the factor structure of the instrument vector $z_{it}$. Therefore

$$\frac{1}{N}S_{i\delta} = \frac{1}{N}\Psi_{zi} - \frac{1}{N}\Psi_{zi0}B_i' + \frac{1}{N}\tilde{\Omega}_i - \frac{1}{N}\tilde{\Omega}_{i0}B_i'$$

and consequently

$$\left\|\frac{1}{N}S_{i\delta}\right\|_2 \leq \left\|\frac{1}{N}\Psi_{zi}\right\|_2 + \left\|\frac{1}{\sqrt{N}}\Psi_{zi0}\right\|_2\left\|\frac{1}{\sqrt{N}}B_i\right\|_2 + \left\|\frac{1}{N}\tilde{\Omega}_i\right\|_2 + \left\|\frac{1}{\sqrt{N}}\tilde{\Omega}_{i0}\right\|_2\left\|\frac{1}{\sqrt{N}}B_i\right\|_2$$

Bounds are now given for each of the terms on the right hand side of this inequality.

• $\left\|\frac{1}{N}\Psi_{zi}\right\|_2 = O\left(N^{-\frac{\alpha}{2}}\right)$ by Assumption 5.4 and Assumption 5.9.

• $\left\|\frac{1}{\sqrt{N}}\Psi_{zi0}\right\|_2 = O\left(N^{-\frac{\alpha}{2}}\right)$ by Assumption 5.4 and Assumption 5.9.

• $\left\|\frac{1}{\sqrt{N}}B_i\right\|_2 = O(1)$ by Assumption 5.5 and Assumption 5.9.

• $\left\|\frac{1}{N}\tilde{\Omega}_i\right\|_2 \leq \left\|\frac{1}{N}\tilde{\Omega}_i\right\|_F = O_p\left(T^{-\frac{1}{2}}\right)$ from Lemma 26 and Assumption 5.9.

• $\left\|\frac{1}{\sqrt{N}}\tilde{\Omega}_{i0}\right\|_2 \leq \left\|\frac{1}{\sqrt{N}}\tilde{\Omega}_{i0}\right\|_F = O_p\left(T^{-\frac{1}{2}}\right)$ from Lemma 26 and Assumption 5.9.

Therefore $\left\|\frac{1}{N}S_{i\delta}\right\|_2 = O\left(N^{-\frac{\alpha}{2}}\right) + O_p\left(T^{-\frac{1}{2}}\right)$.
Proof of Lemma 30. Note that $S_i = S_{i0}B_i' + S_{i\delta}$, where $S_{i\delta} = \frac{1}{T}\sum_{t=1}^{T}z_{it}\delta_{it}'$. Since $\hat{B}_i' = \left(S_{i0}'S_{i0}\right)^{-1}S_{i0}'S_i$, we have

$$\tilde{B}_i' = \hat{B}_i' - B_i' = \left(S_{i0}'S_{i0}\right)^{-1}S_{i0}'S_{i\delta}$$

Therefore

$$\left\|\frac{1}{\sqrt{N_i}}\tilde{B}_i\right\|_2 \leq \left\|\left(\frac{1}{N_{zi}}S_{i0}'S_{i0}\right)^{-1}\frac{1}{\sqrt{N_{zi}}}S_{i0}'\right\|_2\left\|\frac{1}{\sqrt{N_iN_{zi}}}S_{i\delta}\right\|_2 = O_p\left[\max\left(T^{-\frac{1}{2}}, N^{-\frac{\alpha}{2}}\right)\right]$$

from Lemmas 28 and 29 and Assumption 5.9.
Proof of Lemma 31. Since $\hat{B}_i = B_i + \tilde{B}_i$, we have

$$\frac{1}{N_i}\hat{B}_i'\hat{B}_i - \frac{1}{N_i}B_i'B_i = \frac{1}{N_i}B_i'\tilde{B}_i + \frac{1}{N_i}\tilde{B}_i'B_i + \frac{1}{N_i}\tilde{B}_i'\tilde{B}_i$$

Therefore

$$\left\|\frac{1}{N_i}\hat{B}_i'\hat{B}_i - \frac{1}{N_i}B_i'B_i\right\|_2 \leq 2\left\|\frac{1}{\sqrt{N_i}}B_i\right\|_2\left\|\frac{1}{\sqrt{N_i}}\tilde{B}_i\right\|_2 + \left\|\frac{1}{\sqrt{N_i}}\tilde{B}_i\right\|_2^2 = O_p\left[\max\left(T^{-\frac{1}{2}}, N^{-\frac{\alpha}{2}}\right)\right]$$

from Lemma 30, Assumption 5.5 and Assumption 5.9.
Appendix 2 Data
Table 4.4: Group 1 variables

GMCQ 5 Personal consumption expend (chained)-total (bil 92$, saar)
GMCDQ 5 Personal consumption expend (chained)-total durables (bil 92$, saar)
GMCNQ 5 Personal consumption expend (chained)-nondurables (bil 92$, saar)
GMCSQ 5 Personal consumption expend (chained)-services (bil 92$, saar)
GMCANQ 5 Personal consumption expend (chained)-new cars (bil 92$, saar)
Table 4.5: Group 2 variables

IP 5 Industrial production: total index (1992=100, sa)
IPP 5 Industrial production: products, total (1992=100, sa)
IPF 5 Industrial production: final products (1992=100, sa)
IPC 5 Industrial production: consumer goods (1992=100, sa)
IPCD 5 Industrial production: durable consumer goods (1992=100, sa)
IPCN 5 Industrial production: nondurable consumer goods (1992=100, sa)
IPE 5 Industrial production: business equipment (1992=100, sa)
IPI 5 Industrial production: intermediate products (1992=100, sa)
IPM 5 Industrial production: materials (1992=100, sa)
IPMD 5 Industrial production: durable goods materials (1992=100, sa)
IPMND 5 Industrial production: nondurable goods materials (1992=100, sa)
IPMFG 5 Industrial production: manufacturing (1992=100, sa)
IPD 5 Industrial production: durable manufacturing (1992=100, sa)
IPN 5 Industrial production: nondurable manufacturing (1992=100, sa)
IPMIN 5 Industrial production: mining (1992=100, sa)
IPUT 5 Industrial production: utilities (1992=100, sa)
IPXMCA 1 Capacity utilization rate: manufacturing, total (% of capacity, sa) (frb)
PMI 5 Purchasing managers' index (sa)
PMP 5 NAPM production index (percent)
GMPYQ 5 Personal income (chained) (series #52) (bil 92$, saar)
GMYXPQ 5 Personal income less transfer payments (chained) (#51) (bil 92$, saar)
Table 4.6: Group 3 variables

LHEL 5 Index of help wanted advertising in newspapers (1967=100, sa)
LHELX 4 Employment: ratio; help wanted ads: no. unemployed clf
LHEM 5 Civilian labor force: employed, total (thousands, sa)
LHNAG 5 Civilian labor force: employed, nonagricultural industries (thsnd, sa)
LHUR 1 Unemployment rate: all workers, 16 years and over (%, sa)
LHU680 1 Unemployed by duration: average (mean) duration in weeks (sa)
LHU5 1 Unemployed by duration: persons unemployed less than 5 wks (thousands, sa)
LHU14 1 Unemployed by duration: persons unemployed 5 to 14 wks (thousands, sa)
LHU15 1 Unemployed by duration: persons unemployed 15 wks (thousands, sa)
LHU26 1 Unemployed by duration: persons unemployed 15 to 26 weeks (thousands, sa)
LPNAG 5 Employees on nonagricultural payrolls: total (thousands, sa)
LP 5 Employees on nonagricultural payrolls: total, private (thousands, sa)
LPGD 5 Employees on nonagricultural payrolls: goods producing (thousands, sa)
LPCC 5 Employees on nonagricultural payrolls: contract construction (thousands, sa)
LPEM 5 Employees on nonagricultural payrolls: manufacturing (thousands, sa)
LPED 5 Employees on nonagricultural payrolls: durable goods (thousands, sa)
LPEN 5 Employees on nonagricultural payrolls: nondurable goods (thousands, sa)
LPSP 5 Employees on nonagricultural payrolls: service producing (thousands, sa)
LPT 5 Employees on nonagricultural payrolls: wholesale and retail trade (thousands, sa)
LPFR 5 Employees on nonagricultural payrolls: fin., ins. and real estate (thsnds, sa)
LPS 5 Employees on nonagricultural payrolls: services (thousands, sa)
LPGOV 5 Employees on nonagricultural payrolls: government (thousands, sa)
LPHRM 1 Average weekly hours of production workers: manufacturing (sa)
LPMOSA 1 Average weekly hours of production workers: manufacturing, overtime hours (sa)
PMEMP 1 NAPM employment index (%)
Table 4.7: Group 4 variables

MSMTQ 5 Manufacturing & trade: total (millions of chained 1992 dollars)(sa)
MSMQ 5 Manufacturing & trade: manufacturing; total (millions of chained 1992 dollars)(sa)
MSDQ 5 Manufacturing & trade: durable goods (millions of chained 1992 dollars)(sa)
MSNQ 5 Manufacturing & trade: nondurable goods (millions of chained 1992 dollars)(sa)
WTQ 5 Merchant wholesalers: total (millions of chained 1992 dollars)(sa)
WTDQ 5 Merchant wholesalers: durable goods (millions of chained 1992 dollars)(sa)
WTNQ 5 Merchant wholesalers: nondurable goods (millions of chained 1992 dollars)(sa)
RTQ 5 Retail trade: total (millions of chained 1992 dollars)(sa)
RTNQ 5 Retail trade: nondurable goods (millions of 1992 dollars)(sa)
Table 4.8: Group 5 variables

HSFR 4 Housing starts: nonfarm (1947-58); total farm and nonfarm (1959-) (thousands, sa)
HSNE 4 Housing starts: northeast (thousands, sa)
HSMW 4 Housing starts: midwest (thousands, sa)
HSSOU 4 Housing starts: south (thousands, sa)
HSWST 4 Housing starts: west (thousands, sa)
HSBR 4 Housing authorized: total new private housing units (thousands, saar)
HMOB 4 Mobile homes: manufacturers' shipments (thousands of units, saar)
Table 4.9: Group 6 variables

IVMTQ 5 Manufacturing and trade inventories: total (millions of chained 1992)(sa)
IVMFGQ 5 Inventories, business, mfg (millions of chained 1992)(sa)
IVMFDQ 5 Inventories, business, durables (millions of chained 1992)(sa)
IVMFNQ 5 Inventories, business, nondurables (millions of chained 1992)(sa)
IVWRQ 5 Manufacturing & trade inventories: merchant wholesalers (m$1992 chained)(sa)
IVRRQ 5 Manufacturing & trade inventories: retail trade (millions of chained 1992 dollars)(sa)
IVSRQ 2 Ratio for manufacturing and trade: inventory/sales (chained 1992 dollars, sa)
IVSRMQ 2 Ratio for manufacturing and trade: man. inventory/sales (chained m$1992, sa)
IVSRWQ 2 Ratio for manufacturing and trade: wholesaler inventory/sales (chained 1992 dollars, sa)
IVSRRQ 2 Ratio for manufacturing and trade: retail trade inventory/sales (chained 1992 dollars, sa)
PMNV 1 NAPM inventories index (percent)
Table 4.10: Group 7 variables

PMNO 1 NAPM new orders index (percent)
PMDEL 1 NAPM vendor deliveries index (percent)
MOCMQ 5 New orders, (net)-consumer goods and materials, 1992 dollars (bci)
MDOQ 5 New orders, durable goods industries, 1992 (bci)
MSONDQ 5 New orders, nondefense capital goods, in 1992 dollars (bci)
MO 5 Manufacturing new orders: all manufacturing industries, total (mil$, sa)
MOWU 5 Manufacturing new orders: manufacturing industries with unfilled orders (mil$, sa)
MDO 5 Manufacturing new orders: durable goods industries, total (mil$, sa)
MDUWU 5 Manufacturing new orders: durable goods industries with unfilled orders (mil$, sa)
MNO 5 Manufacturing new orders: nondurable goods industries, total (mil$, sa)
MNOU 5 Manufacturing new orders: nondurable goods ind. with unfilled orders (m$, sa)
MU 5 Manufacturing unfilled orders: all manufacturing industries (m$, sa)
MDU 5 Manufacturing unfilled orders: durable goods industries, total (mil$, sa)
MNU 5 Manufacturing unfilled orders: nondurable goods industries, total (mil$, sa)
MPCON 5 Contracts and orders for plant and equipment (bil$, sa)
MPCONQ 5 Contracts and orders for plant and equipment in 1992 dollars (bci)
Table 4.11: Group 8 variables

FSNCOM 5 NYSE common stock price index: composite (12/31/65 = 50)
FSPCOM 5 S&P's common stock price index: composite (1941-43 = 10)
FSPIN 5 S&P's common stock price index: industrials (1941-43 = 10)
FSPCAP 5 S&P's common stock price index: capital goods (1941-43 = 10)
FSPUT 5 S&P's common stock price index: utilities (1941-43 = 10)
FSDXP 1 S&P's common stock price index: dividend yield (% per annum)
FSPXE 1 S&P's common stock price index: price earnings ratio (%, nsa)
Table 4.12: Group 9 variables

EXRUS 5 United States effective exchange rate (merm) (index no.)
EXRGER 5 Foreign exchange rate: Germany (deutsche mark per US$)
EXRSW 5 Foreign exchange rate: Switzerland (swiss franc per US$)
EXRJAN 5 Foreign exchange rate: Japan (yen per US$)
EXRCAN 5 Foreign exchange rate: Canada (Canadian $ per US$)
Table 4.13: Group 10 variables

FYGT1 2 Interest rate: US treasury const maturities, sec mkt, 1-yr. (% per ann, nsa)
FYGT5 2 Interest rate: US treasury const maturities, sec mkt, 5-yr. (% per ann, nsa)
FYGT10 2 Interest rate: US treasury const maturities, sec mkt, 10-yr. (% per ann, nsa)
FYAAAC 2 Bond yield: Moody's AAA corporate (% per annum)
FYBAAC 2 Bond yield: Moody's BAA corporate (% per annum)
FYFHA 2 Secondary market yield on FHA mortgages (% per annum)
SFYCP90 1 Spread 90 day commercial paper minus federal funds
SFYGM3 1 Spread 3mo treasury bills minus federal funds
SFYGM6 1 Spread 6mo treasury bills minus federal funds
SFYGT1 1 Spread FYGT1 minus federal funds
SFYGT5 1 Spread FYGT5 minus federal funds
SFYGT10 1 Spread FYGT10 minus federal funds
SFYAAAC 1 Spread FYAAAC minus federal funds
SFYBAAC 1 Spread FYBAAC minus federal funds
SFYFHA 1 Spread FYFHA minus federal funds
Table 4.14: Group 11 variables

FM1 6 Money stock: M1 (bil$, sa)
FM2 6 Money stock: M2 (bil$, sa)
FM3 6 Money stock: M3 (bil$, sa)
FM2DQ 5 Money supply: M2 in (1992$) (bci)
FMFBA 6 Monetary base, adj for reserve requirement changes (mil$, sa)
FMRRA 6 Depository institution reserves: total, adj for reserve requirement changes (mil$, sa)
FMRNBC 6 Depository institution reserves: nonborrow+ext cr, adj res req changes (mil$, sa)
Table 4.15: Group 12 variables

PMCP 6 NAPM commodity price index (percent)
PWFSA 6 Producer price index: finished goods (82=100, sa)
PWFCSA 6 Producer price index: finished consumer goods (82=100, sa)
PSM99Q 6 Index of sensitive materials prices (1990=100) (bci-99a)
PUNEW 6 CPI-U: all items (82-84=100, sa)
PU83 6 CPI-U: apparel and upkeep (82-84=100, sa)
PU84 6 CPI-U: transportation (82-84=100, sa)
PU85 6 CPI-U: medical care (82-84=100, sa)
PUC 6 CPI-U: commodities (82-84=100, sa)
PUCD 6 CPI-U: durables (82-84=100, sa)
PUS 6 CPI-U: services (82-84=100, sa)
PUXF 6 CPI-U: all items less food (82-84=100, sa)
PUXHS 6 CPI-U: all items less shelter (82-84=100, sa)
PUXM 6 CPI-U: all items less medical care (82-84=100, sa)
PUXX 6 CPI-U: all items less food and energy (82-84=100, sa)
GMDC 6 PCE, impl pr defl: PCE (1987=100)
GMDCD 6 PCE, impl pr defl: durables (1987=100)
GMDCN 6 PCE, impl pr defl: nondurables (1987=100)
GMDCS 6 PCE, impl pr defl: services (1987=100)
Table 4.16: Group 13 variables

LEHCC 6 Average hourly earnings of construction workers: construction ($, sa)
LEHM 6 Average hourly earnings of production workers: manufacturing ($, sa)
Table 4.17: Group 14 variables

HHSNTN 1 U. Michigan index of consumer expectations (bcd-83)
Chapter 5
Conclusions
5.1 The motivation for the research
For the most part, economics is an observational science. Researchers in ex-
perimental disciplines often generate their own data. In contrast, economists
usually have to make do with what is available. The standard techniques for
analysing macroeconomic time series are mostly variations on the basic vector
autoregression model. Most of these techniques are not suitable for modelling
more than a handful of variables at a time. Consequently, in macroeconomics,
`gathering more data' usually means waiting for the passage of time. Since the
period of the business cycle is probably a few years, it can be a long wait. For
the industrialised economies, data on hundreds of economic variables are now
regularly collected and published by the statistical agencies. In contrast to the
restrictions imposed by contemporary econometric methods, policy macroe-
conomists often informally analyse a wide range of available data in order to
make judgements about the state of the economy. This practice implies a belief
that the broad range of economic variables, most of which must be omitted
from formal analyses, contains useful information about the state of the economy.
If this is true, then the development of formal econometric techniques
which are capable of modelling more variables than is feasible with vector
autoregressions should be a fruitful area of research.
The research presented in this thesis focuses on the development of tech-
niques for estimating factor models of economic time series. Factor models
are attractive in economics since the notion that a wide range of economic
variables are aected by a small number of possibly unobservable factors is
usually uncontroversial, particularly in elds such as business cycle theory and
nance. Furthermore, the fact that factor analysis has been such a successful
tool in the analysis of independently sampled multivariate data, encourages
the belief that factor models of time series will prove to be a useful empirical
tool for applied economists.
5.2 The findings of the research
A number of theoretical and methodological contributions are made in this
thesis.
5.2.1 Dynamic factor analysis
In Chapter 2 the dynamic factor model with mutually uncorrelated autoregres-
sive factors is derived as a particular realisation of a VARMA model of reduced
spectral rank observed subject to noise. It is shown (Proposition 1) that this
representation corresponds to a minimal dimension state space representation
of the VARMA plus noise model in cases where the lowest common denominator
polynomials of each of the columns of the VARMA filter have no
common polynomial factors. When common polynomial factors exist, then
the dynamic factor model is not equivalent to a minimal dimension state space
representation.
Identification is also considered for a fairly general class of dynamic factor
model. Theorem 2.2.1 shows that the error spectrum of the dynamic factor
model is identified under some rank assumptions on the factor filter matrix
β(L). Theorem 2.2.2 shows that the number of factors is identified under
the same condition. These theorems are written for the dynamic factor model,
however the proofs do not rely on the existence of factor structure. Rather, the
essential requirement is that the observable variable is the sum of a component
with spectral rank k and a component with a diagonal spectrum. The proof is
based on the rank of submatrices of this first component so, provided that the
appropriate rank conditions hold, the result applies generally. Theorem 2.2.3
shows that β(L) and the factor spectra are identified, up to rescaling and sign
changes of factors, if it is possible to order the variables in xt such that the
first k rows of β(L) are a lower triangular polynomial matrix. This generalises
a well-known condition for identification of static factor models to a dynamic
setting. Finally, Theorem 2.2.4 shows that under fairly general conditions,
zero restrictions of this type are not necessary for identification. The key
assumption here is that the factor spectra are linearly independent functions.
This assumption is satisfied by factors which follow autoregressive processes
(provided that no pair of factors follows the same autoregressive process). It
is also true for ARMA factors provided certain `no cancelling out' conditions
are satisfied. Since it holds for ARMA factors, it also holds for models which
have factors which follow the Markov-switching process of Hamilton (1989).
In Section 2.3, a frequency domain approach was proposed for the estima-
tion of dynamic factor models. A simulation exercise (in Section 2.4) suggests
that this method has some computational advantage over the state space scor-
ing algorithm which is usually used for dynamic factor model estimation. How-
ever, it is unfortunately the case that large models with rich dynamic structures
are difficult to estimate, as they are with the traditional state space scoring
algorithm. However, the main attraction of the frequency domain approach is
the relative ease with which a general algorithm can be coded. The existing
time domain algorithms for the estimation of dynamic factor models require
the construction of a state space representation of the model. For factor mod-
els with few lags, this is trivial. However for more complicated lag structures,
and particularly for ARMA dynamics, this task becomes more complex, and
the construction of a general algorithm which can handle any specification of
model orders is complicated. As was shown in Section 2.3, in the frequency do-
main a general expression for the covariance matrix can be written (Equation
(2.5)) which makes the evaluation of the likelihood relatively easy to code.
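The ease of coding can be illustrated with a small sketch. The following is not the estimator of Chapter 2 itself but a hypothetical one-factor example of the same idea (all parameter values and the AR(1) factor design are assumptions made for illustration): the model's spectral density has a simple closed form at each frequency, and a Whittle-type likelihood is just a sum of the same expression over the Fourier frequencies.

```python
import numpy as np

# Whittle-type log-likelihood sketch for a hypothetical one-factor model
# x_t = b f_t + e_t with an AR(1) factor. At frequency w the spectral density is
# Sigma(w) = s_f(w) b b' + (psi/(2*pi)) I, with s_f(w) = sf2/(2*pi*|1 - phi e^{-iw}|^2).
rng = np.random.default_rng(3)
T, N, phi, sf2, psi = 256, 4, 0.7, 1.0, 0.25
b = rng.uniform(0.5, 1.5, N)

# simulate data from the model
f = np.zeros(T)
for t in range(1, T):
    f[t] = phi * f[t - 1] + np.sqrt(sf2) * rng.standard_normal()
X = np.outer(f, b) + np.sqrt(psi) * rng.standard_normal((T, N))

# periodogram matrices I(w_j) = w_j w_j^* / (2 pi T) at the Fourier frequencies
W = np.fft.fft(X, axis=0)                  # discrete Fourier transform per series
freqs = 2 * np.pi * np.arange(1, T) / T    # skip frequency zero

loglik = 0.0
for j, w in enumerate(freqs, start=1):
    I_j = np.outer(W[j], W[j].conj()) / (2 * np.pi * T)
    s_f = sf2 / (2 * np.pi * abs(1 - phi * np.exp(-1j * w)) ** 2)
    Sigma = s_f * np.outer(b, b) + psi * np.eye(N) / (2 * np.pi)
    sign, logdet = np.linalg.slogdet(Sigma)
    loglik -= logdet + np.trace(np.linalg.solve(Sigma, I_j)).real
print(np.isfinite(loglik))  # True
```

The point of the sketch is structural: changing the factor dynamics only changes the scalar expression for s_f(w), while the likelihood loop is untouched, which is the coding advantage described above.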
5.2.2 Approximate factor models
Most of the recent research in time series factor models has been concerned
with the use of principal component methods to estimate factor models of
economic time series in a setting in which the number of variables in the
model, and the number of observations, go to infinity jointly. An important
feature of this literature is that it does not assume that the errors of the model
have a diagonal covariance matrix, but rather assumes an `approximate' factor
structure in which the row sums of the absolute value of the error covariance
matrix are subject to a fixed bound as the number of variables increases. It has
been argued (e.g. by Bai (2003)) that this is a much more realistic assumption
in applications with a large number of variables. It was argued in Chapter 3 of
this thesis that the `approximate' factor model might still be too restrictive for
many economic applications. In most of the applications of this technique that
have appeared in the literature, the variables in the model belong to a relatively
small set of groups. Since variables within a group tend to be quite similar
(e.g. they might be price indexes for different classes of consumer goods), it
is possible that a non-trivial amount of cross-correlation exists between the
errors of variables that belong to the same group. Applications that have
particularly large numbers of variables tend to still have a small number of
groups. Therefore, it might be argued that the number of variables is being
increased by increasing the number of variables in a nite set of groups. In such
cases, it is quite possible that the absolute row sums of the error covariance
would grow without bound as the number of variables grows, violating the
assumptions of the approximate factor literature.
In Chapter 3 it is shown that the principal components estimator is still con-
sistent for the factor model when the absolute row sums of the error covariance
matrix are unbounded as N grows (Theorem 3.1.4). However, the rates of con-
vergence achieved depend on the rate of growth of cross-correlation in the error
covariance matrix. Consequently, it is possible that the principal components
estimator could perform poorly in some situations. It is also shown that sample
principal components are consistent estimators of population principal compo-
nents in a setting in which (N, T ) → (∞,∞) provided that a `gap' condition is
satisfied whereby the first k eigenvalues of the covariance matrix of xt diverge
from all other eigenvalues at a rate of N (Theorem 3.1.3). Consequently, even
in cases where the cross-correlation of the errors is growing rapidly, the sample
principal components may still be good estimates of the population principal
components. Theorem 3.1.1 presents a set of finite-sample/variables bounds
linking population principal components to population factors. By avoiding
sampling issues and asymptotic arguments, these bounds give a clear view of
the conditions under which population factors and population principal com-
ponents are likely to be `close'. In particular, they suggest that what matters
for principal components to estimate factors well is not the number of variables
per se, but rather the magnitude of the noise-to-signal ratio, which is defined
as ρ = σ²/λ_k, where σ² is the largest eigenvalue of the error covariance matrix Ψ,
and λ_k is the kth eigenvalue of Ω = E((1/T)X′X). When the noise-to-signal ratio
is small, population principal component and population factor quantities will
be similar. In Section 3.2 a `fixed-N' hypothesis test for the magnitude of the
noise-to-signal ratio is proposed. While the asymptotic framework in which
this test is developed is not entirely satisfactory, it represents a first attempt
to make inferences about the noise-to-signal ratio in large factor models, and
provides some ideas which may form the basis of future research.
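The role of the noise-to-signal ratio can be illustrated with a small simulation. The sketch below is not taken from the thesis: the one-factor design, the dimensions, and the noise variance σ² = 0.5 are invented for illustration. It computes ρ = σ²/λ_k for the simulated design and checks that, when ρ is small, the first sample principal component tracks the factor closely.

```python
import numpy as np

rng = np.random.default_rng(0)
T, N, k = 500, 50, 1

# Simulate a static one-factor model x_t = Lambda f_t + e_t (illustrative design).
f = rng.standard_normal((T, k))                  # factors
Lam = rng.standard_normal((N, k))                # loadings
sigma2 = 0.5                                     # error variance (invented)
e = np.sqrt(sigma2) * rng.standard_normal((T, N))
X = f @ Lam.T + e

# Population quantities: Psi = sigma2 * I, Omega = Lam Lam' + Psi.
Omega = Lam @ Lam.T + sigma2 * np.eye(N)
eigvals = np.sort(np.linalg.eigvalsh(Omega))[::-1]
rho = sigma2 / eigvals[k - 1]                    # noise-to-signal ratio

# First sample principal component: leading eigenvector of X'X/T.
S = X.T @ X / T
w = np.linalg.eigh(S)[1][:, -1]
pc = X @ w

# With rho small, the principal component is close to the factor.
corr = abs(np.corrcoef(pc, f[:, 0])[0, 1])
print(round(rho, 3), round(corr, 3))
```

With this design ρ is on the order of 0.01, and the absolute correlation between the first sample principal component and the factor is well above 0.9, consistent with the bounds described above.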
5.2.3 The grouped variable approximate factor model
In Chapter 4 a new factor model, named the grouped variable approximate
factor model, was proposed. In the grouped variable approximate factor model
the error covariance has a block structure, where the blocks correspond to the
variable groups. The off-diagonal blocks are subject to a weak correlation
restriction: specifically, the largest of the singular values of the off-diagonal
blocks must grow at a rate strictly less than N^{1/2}. No restriction is placed on
the correlation structure of the blocks that lie on the diagonal. In Section 4.2
an approximate instrumental variables estimator is proposed for the grouped
variable factor model. This estimator is simple to compute, requiring only
matrix multiplication and the inversion of a k × k matrix, where k is the
number of factors. In Section 4.3 consistency is proved for the approximate
instrumental variables estimator in a framework in which (N, T ) → (∞,∞)
jointly (Theorem 4.3.1). Importantly, the rates of convergence do not depend
on the correlation in the diagonal blocks (i.e. the correlation between the errors
of variables that belong to the same group). What matters is the rate of growth
of correlation in the off-diagonal blocks. Consequently, if it is possible to
arrange the variables into groups such that most of the error cross-correlation
occurs between variables in the same group, then the approximate instrumental
variables estimator will provide better rates of convergence than the principal
components estimator.
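The block structure of the error covariance can be sketched directly. The group sizes and correlation values below are invented for illustration; the point is that the largest singular value of an off-diagonal block stays small even though the diagonal blocks are strongly correlated, which is the pattern the weak-correlation restriction formalises.

```python
import numpy as np

def block_covariance(group_sizes, within=0.8, between=0.01):
    """Build an error covariance with strongly correlated diagonal blocks
    and weakly correlated off-diagonal blocks (illustrative values)."""
    N = sum(group_sizes)
    Psi = between * np.ones((N, N))
    start = 0
    for n in group_sizes:
        # Arbitrary strong correlation within each group's diagonal block.
        Psi[start:start + n, start:start + n] = within
        start += n
    np.fill_diagonal(Psi, 1.0)
    return Psi

sizes = [10, 15, 20]                     # three groups (invented sizes)
Psi = block_covariance(sizes)

# Largest singular value of the off-diagonal block linking groups 1 and 2:
# the model requires such singular values to grow slower than sqrt(N).
b01 = Psi[:10, 10:25]
s_max = np.linalg.svd(b01, compute_uv=False)[0]
print(s_max)
```

Here the within-group entries are 0.8 while the off-diagonal block's top singular value is only about 0.12, so the groups are strongly correlated internally but only weakly across groups.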
5.3 Future research
5.3.1 Dynamic factor analysis
Since there exist conditions under which the dynamic factor model with mu-
tually uncorrelated autoregressive factors does not correspond to a minimal
dimension state space representation of the reduced spectral rank VARMA
plus noise model, a useful task for future research is to devise models which
do. Minimal dimension state space theory would clearly be an important
component of the analysis of any such model. Modifications of Theorems 2.2.1
and 2.2.2 could be used to identify the spectra of the errors and the spectral
rank of the VARMA component of the process. Estimation, however, is likely
to be a challenge.
Bloch (1989) provides a relationship between the dynamic errors-in-variables
model and the dynamic factor model. It would be interesting to see whether,
using this relationship, the findings in Chapter 2 could be used to gain any
new insights into the dynamic errors-in-variables model.
Perhaps the most pressing need in the field of dynamic factor analysis is
for computationally efficient estimation algorithms. Dynamic factor models
with rich lag structures may be written in state space form by stacking fac-
tors and their lags into the state vector, and augmenting it by the vector of
state variables from the error processes. Unfortunately, this usually results
in a noise-free measurement equation, which does not lend itself to the EM
algorithm. While the scoring algorithm will work with such models, it tends
to be very slow and often fails to converge. Subspace algorithms offer some
hope here. However, in their current form it is not clear how the restrictions
implied by the stacking of the state vector can be reflected in the estimation
procedure. Resolving this issue would be an important contribution to the
field of dynamic factor analysis.
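The stacking construction described above can be written down explicitly. The sketch below assumes, purely for illustration, a k-dimensional factor following a VAR(p) with loadings on the current factor only; the state vector collects the factor and its lags in companion form, and the resulting measurement equation x_t = H s_t then carries no noise term, which is the difficulty noted for the EM algorithm.

```python
import numpy as np

def stacked_state_space(A_list, Lam):
    """Companion-form state space for a dynamic factor model whose factors
    follow a VAR(p): state s_t = (f_t', ..., f_{t-p+1}')'.
    A_list: list of k x k VAR coefficient matrices [A_1, ..., A_p].
    Lam:    N x k loading matrix on the current factor."""
    p = len(A_list)
    k = A_list[0].shape[0]
    # Transition matrix in companion form: first block row holds the VAR
    # coefficients, the identity blocks below shift the lags along.
    F = np.zeros((k * p, k * p))
    F[:k, :] = np.hstack(A_list)
    F[k:, :-k] = np.eye(k * (p - 1))
    # Measurement matrix loads on the current-factor block only, so the
    # measurement equation x_t = H s_t has no noise term.
    H = np.hstack([Lam, np.zeros((Lam.shape[0], k * (p - 1)))])
    return F, H

# Invented example: two factors following a VAR(2), five observed variables.
A1 = np.array([[0.5, 0.1], [0.0, 0.4]])
A2 = np.array([[0.2, 0.0], [0.1, 0.1]])
Lam = np.ones((5, 2))
F, H = stacked_state_space([A1, A2], Lam)
print(F.shape, H.shape)
```

The restrictions a subspace method would need to respect are visible in the construction: the zero and identity blocks of F, and the zero blocks of H, are fixed by the stacking rather than free parameters.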
5.3.2 Approximate factor models
The theorems about the principal components estimator presented in Chapter
3 deal with first order convergence only. Second order results which gave
convergence in distribution would also be useful. In particular, an investigation
of the asymptotic distribution of the eigenvalues would be useful since it may
lead to the development of hypothesis test procedures for the noise-to-signal
ratio. One obvious line of inquiry here would be to develop results in the
framework of Random Matrix Theory. However, currently Random Matrix
Theory applies only to serially independent vectors, and so is not applicable to
problems in time series econometrics. Consequently, it is unlikely that research
in this direction will be straightforward.
5.3.3 The grouped variable approximate factor model
Chapter 4 presents theorems which describe first order convergence only. Re-
sults on convergence in distribution would also be useful. These might allow
the construction of test procedures for the number of factors in the model,
and may also provide some guidance about the appropriate choice of prox-
ies and factors in the construction of the approximate instrumental variables
estimator.
Another contribution that could be made by future research is empirical.
The applied literature which uses the principal components technique provides
mixed results. In some cases, the principal components estimator of the fac-
tor model is shown to perform well in forecasting exercises. In other cases,
the use of the estimated factors is of little benefit compared with standard
univariate forecasting methods. It would be of interest to estimate the lower
bound on the noise-to-signal ratio in each of these cases to see whether it is
higher in cases where the principal components approach performs relatively
poorly. It would also be of interest to re-estimate many of the models that
appear in the literature using the approximate instrumental variables estima-
tor. Unfortunately, while it is now possible to download from the authors'
websites working papers detailing the estimation of factor models for many
of the industrialised economies, the data sets used in these studies are not so
easily available. Constructing large data sets is time-consuming work even
when all the data are publicly available. While perhaps lacking the challenge of
theoretical and empirical research, a great contribution could be made to this
field by the electronic publication of a collection of large macroeconomic data
sets, as used in the published literature on large factor models, with consistent
formatting and with the usual transformations, so that the existing empirical
work may be easily replicated and extended.
Bibliography
Altissimo, F., Bassanetti, A., Cristadoro, R., Forni, M., Hallin, M., Lippi, M., Reichlin, L. and Veronese, G. (2001), `Eurocoin: A real time coincident indicator of the euro area business cycle', CEPR Discussion Papers: 3108.

Altissimo, F., Cristadoro, R., Forni, M., Lippi, M. and Veronese, G. (2006), `New eurocoin: Tracking economic growth in real time', CEPR Discussion Papers: 5633.

Altug, S. (1989), `Time-to-build and aggregate fluctuations: Some new evidence', International Economic Review 30, 889–920.

Amengual, D. and Watson, M. W. (2007), `Consistent estimation of the number of dynamic factors in a large n and t panel', Journal of Business and Economic Statistics 25(1), 91–96.

Anderson, T. W. (1963), `Asymptotic theory for principal component analysis', Annals of Mathematical Statistics 34, 122–148.

Anderson, T. W. and Rubin, H. (1956), `Statistical inference in factor analysis', Third Berkeley Symposium on Mathematical Statistics and Probability 5, 111–150.
Angelini, E., Henry, J. and Mestre, R. (2001), `Diffusion index-based inflation forecasts for the euro area', Working Paper Series: European Central Bank (061).

Artis, M. J., Banerjee, A. and Marcellino, M. (2005), `Factor forecasts for the uk', Journal of Forecasting 24(4), 279–298.

Attias, H. (1999), `Independent factor analysis', Neural Computation 11(4), 803–851.

Bai, J. (2003), `Inferential theory for factor models of large dimensions', Econometrica 71(1), 135–171.

Bai, J. and Ng, S. (2002), `Determining the number of factors in approximate factor models', Econometrica 70(1), 191–221.

Bai, J. and Ng, S. (2006), `Confidence intervals for diffusion index forecasts and inference for factor-augmented regressions', Econometrica 74(4), 1133–1150.

Bai, J. and Ng, S. (2007a), `Determining the number of primitive shocks in factor models', Journal of Business and Economic Statistics 25(1), 52–60.

Bai, J. and Ng, S. (2007b), `Instrumental variable estimation in a data rich environment', mimeo.

Baik, J. and Silverstein, J. W. (2006), `Eigenvalues of large sample covariance matrices of spiked population models', J. Multivar. Anal. 97(6), 1382–1408.

Bandt, O. D., Michaux, E., Bruneau, C. and Flageollet, A. (2007), `Forecasting inflation using economic indicators: the case of france', Journal of Forecasting 26(1), 1–22.
Banerjee, A. and Marcellino, M. (2006), `Are there any reliable leading indicators for us inflation and gdp growth?', International Journal of Forecasting 22(1), 137–151.

Barnett, S. (1980), Matrices in Control Theory, Van Nostrand Reinhold Company, New York.

Bauer, D. (1998), Some asymptotic theory for the estimation of linear systems using maximum likelihood methods or subspace algorithms, PhD thesis, TU Wien.

Bentler, P. M. and Kano, Y. (1990), `On the equivalence of factors and components', Multivariate Behavioral Research 25(1), 67–74.

Bernanke, B. S. and Boivin, J. (2003), `Monetary policy in a data-rich environment', Journal of Monetary Economics 50, 525–546.

Bernanke, B. S., Boivin, J. and Eliasz, P. (2005), `Measuring the effects of monetary policy: A factor-augmented vector autoregressive (favar) approach', Quarterly Journal of Economics 120, 387–422.

Beyer, A., Farmer, R. E. A., Henry, J. and Marcellino, M. (2005), Factor analysis in a new-keynesian model, Working Paper Series 510, European Central Bank.

Bloch, A. M. (1989), `Identification and estimation of dynamic errors-in-variables models', Journal of Econometrics 41, 145–158.

Boivin, J. and Ng, S. (2006), `Are more data always better for factor analysis?', Journal of Econometrics 132, 169–194.
Box, G. E. P. and Tiao, G. C. (1977), `A canonical analysis of multiple time series', Biometrika 64, 355–365.

Breitung, J. and Eickmeier, S. (2005), Dynamic factor models, Discussion Paper Series 1: Economic Studies 2005,38, Deutsche Bundesbank, Research Centre.

Brillinger, D. (1975), Time Series: Data Analysis and Theory, Holt, Rinehart and Winston.

Brisson, M., Campbell, B. and Galbraith, J. W. (2003), `Forecasting some low-predictability time series using diffusion indices', Journal of Forecasting 22, 515–531.

Camacho, M. and Sancho, I. (2003), `Spanish diffusion indexes', Spanish Economic Review 5, 173–203.

Camba-Mendez, G., Kapetanios, G., Smith, R. J. and Weale, M. R. (2001), `An automatic leading indicator of economic activity: Forecasting gdp growth for european countries', The Econometrics Journal 4(1), S56–S90.

Cardoso, J.-F. (1998), `Blind signal separation: statistical principles', Proceedings of the IEEE 9, 2009–2025.

Carter, R. L. and Fuller, W. A. (1980), `Instrumental variable estimation of the simple errors-in-variables model', Journal of the American Statistical Association 75, 687–692.

Chamberlain, G. and Rothschild, M. (1983), `Arbitrage, factor structure, and mean-variance analysis on large asset markets', Econometrica 51(5), 1281–1304.
Chauvet, M. (1998), `An econometric characterization of business cycle dynamics with factor structure and regime switching', International Economic Review 39(4), 969–996.

Chauvet, M., Juhn, C. and Potter, S. (2002), `Markov switching in disaggregate unemployment rates', Empirical Economics 27, 205–232.

Chen, A. and Bickel, P. J. (2006), `Efficient independent component analysis', Annals of Statistics 34, 2825–2855.

Cheng, D. C. and Iglarsh, H. J. (1976), `Principal component estimators in regression analysis', The Review of Economics and Statistics 58, 229–234.

Comon, P. (1994), `Independent component analysis – a new concept?', Signal Processing 36, 287–314.

Cragg, J. G. (1997), `Using higher moments to estimate the simple errors-in-variables model', RAND Journal of Economics 28, S71–91.

Cristadoro, R., Forni, M., Reichlin, L. and Veronese, G. (2005), `A core inflation indicator for the euro area', Journal of Money, Credit and Banking 37, 539–560.

Dagenais, M. G. and Dagenais, D. L. (1997), `Higher moment estimators for linear regression models with errors in the variables', Journal of Econometrics 76, 193.
D'Agostino, A. and Giannone, D. (2006), Comparing alternative predictors based on large-panel factor models, Working Paper Series 680, European Central Bank.

Deistler, M. and Anderson, B. D. O. (1989), `Linear dynamic errors-in-variables models: Some structure theory', Journal of Econometrics 41, 39–63.

Deistler, M., Peternell, K. and Scherrer, W. (1995), `Consistency and relative efficiency of subspace methods', Automatica 31, 1865–1875.

Dempster, A. P., Laird, N. M. and Rubin, D. B. (1977), `Maximum likelihood from incomplete data via the em algorithm', Journal of the Royal Statistical Society. Series B (Methodological) 39, 1–38.

den Reijer, A. (2005), Forecasting dutch gdp using large scale factor models, DNB Working Papers 028, Netherlands Central Bank, Research Department.

Diebold, F. X. and Nerlove, M. (1989), `The dynamics of exchange rate volatility: A multivariate latent factor arch model', Journal of Applied Econometrics 4(1), 1–21.

Dungey, M., Martin, V. L. and Pagan, A. R. (2000), `A multivariate latent factor decomposition of international bond yield spreads', Journal of Applied Econometrics 15, 697–715.

Eickmeier, S. and Ziegler, C. (2006), How good are dynamic factor models at forecasting output and inflation? a meta-analytic approach, Discussion Paper Series 1: Economic Studies 2006,42, Deutsche Bundesbank, Research Centre.
Eklund, J. and Karlsson, S. (2007), An embarrassment of riches: Forecasting using large panels, Working Papers 2007:1, Örebro University, Department of Business, Economics, Statistics and Informatics.

Engle, R. F., Lilien, D. M. and Watson, M. (1985), `A dymimic model of housing price determination', Journal of Econometrics 28, 307–326.

Engle, R. F. and Watson, M. W. (1981), `A one-factor multivariate time series model of metropolitan wage rates', Journal of the American Statistical Association 76(376), 774–781.

Erickson, T. and Whited, T. M. (2002), `Two-step gmm estimation of the errors-in-variables model using high order moments', Econometric Theory 18, 776–799.

Favero, C. A., Marcellino, M. and Neglia, F. (2005), `Principal components at work: The empirical analysis of monetary policy with large data sets', Journal of Applied Econometrics 20, 602–620.

Favero, C. A., Ricchi, O. and Tegami, C. (2004), `Forecasting italian inflation with large datasets and many models', IGIER Working Paper No. 269.

Fernández-Macho, F. J. (1997), `A dynamic factor model for economic time series', Kybernetika 33(6), 583–606.

Ferreira, R. T., Bierens, H. and Castelar, I. (2005), `Forecasting quarterly brazilian gdp growth rate with linear and nonlinear diffusion index models', Economia 6(3), 261–292.
Flexer, A., Bauer, H., Pripfl, J. and Dorffner, G. (2005), `Using ica for removal of ocular artifacts in eeg recorded from blind subjects', Neural Networks 18(7), 998–1005.

Forni, M., Hallin, M., Lippi, M. and Reichlin, L. (2000), `The generalized dynamic-factor model: Identification and estimation', Review of Economics and Statistics 82(4), 540–554.

Forni, M., Hallin, M., Lippi, M. and Reichlin, L. (2003), `Do financial variables help forecasting inflation and real activity in the euro area?', Journal of Monetary Economics 50, 1243–1255.

Forni, M., Hallin, M., Lippi, M. and Reichlin, L. (2004), `The generalized dynamic factor model: consistency and rates', Journal of Econometrics 119, 231–255.

Forni, M., Hallin, M., Lippi, M. and Reichlin, L. (2005), `The generalized dynamic factor model: One-sided estimation and forecasting', Journal of the American Statistical Association 100, 830–840.

Gavin, W. T. and Kliesen, K. L. (2006), `Forecasting inflation and output: comparing data-rich models with simple rules', Federal Reserve Bank of St. Louis, Working Paper 54A.

Geary, R. (1942), `Inherent relations between random variables', Proceedings of the Royal Irish Academy 47, 6.
Geman, S. (1980), `A limit theorem for the norm of random matrices', Annals of Probability 8, 252–261.

Geweke, J. F. (1977), The dynamic factor analysis of economic time-series models, in D. J. Aigner and A. S. Goldberger, eds, `Latent Variables in Socioeconomic Models', North Holland, Amsterdam.

Geweke, J. F. and Singleton, K. J. (1981), `Maximum likelihood 'confirmatory' factor analysis of economic time series', International Economic Review 22(1), 37–54.

Giacomini, R. and White, H. (2003), `Tests of conditional predictive ability', University of California, San Diego.

Giannone, D. and Matheson, T. (2006), A new core inflation indicator for new zealand, Reserve Bank of New Zealand Discussion Paper Series DP2006/02, Reserve Bank of New Zealand.

Giannone, D., Reichlin, L. and Sala, L. (2006), `Vars, common factors and the empirical validation of equilibrium business cycle models', Journal of Econometrics 132, 257–279.

Gill, R. D. (1977), Consistency of maximum likelihood estimators of the factor analysis model, when the observations are not multivariate normally distributed, in J. B. Barra, F. Brodeau, G. Romier and B. Van Cutsem, eds, `Recent Developments in Statistics', North-Holland, Amsterdam.

Gillitzer, C. and Kearns, J. (2007), Forecasting with factors: The accuracy of timeliness, RBA Research Discussion Papers rdp2007-03, Reserve Bank of Australia.
Gillitzer, C., Kearns, J. and Richards, A. (2005), The australian business cycle: A coincident indicator approach, RBA Research Discussion Papers rdp2005-07, Reserve Bank of Australia.

Gorsuch, R. L. (1983), Factor Analysis, Hillsdale, N. J.: L. Erlbaum Associates.

Gourieroux, C., Renault, E. and Touzi, N. (1993), `Indirect inference', Journal of Applied Econometrics 8, S85–S118.

Gregory, A., Head, A. and Raynauld, J. (1997), `Measuring world business cycles', International Economic Review 38(3), 677–701.

Guntermann, K. L. and Norrbin, S. C. (1991), `Empirical tests of real estate market efficiency', Journal of Real Estate Finance and Economics 4(3), 297–313.

Hamilton, J. D. (1989), `A new approach to the economic analysis of nonstationary time series and the business cycle', Econometrica 57(2), 357–384.

Hannan, E. J. and Deistler, M. (1986), The Statistical Theory of Linear Systems, Wiley, New York.

Hansen, L. and Sargent, T. (1990), Two difficulties in interpreting vector autoregressions, in L. Hansen and T. Sargent, eds, `Rational Expectations Econometrics', Westview Press: London.
Harris, D. and Martin, V. L. (1998), `Indirect estimation of dynamic factor models of the business cycle', mimeo.

Heaton, C. and Oslington, P. (2002), `The contribution of structural shocks to australian unemployment', Economic Record 78(243), 433–442.

Helbling, T. and Bayoumi, T. (2003), `Are they all in the same boat? the 2000-2001 growth slowdown and the g-7 business cycle linkages', IMF Working Paper No. 03/46.

Hägglund, G. (1982), `Factor analysis by instrumental variables methods', Psychometrika 47(2), 209–222.

Hosseini, S., Jutten, C. and Pham, D. T. (2003), `Markovian source separation', IEEE Transactions on Signal Processing 51, 3009–3019.

Hotelling, H. (1933), `Analysis of a complex of statistical variables into principal components', Journal of Educational Psychology 24, 417–441, 498–520.

Hotelling, H. (1936), `Relations between two sets of variates', Biometrika 28, 321–377.

Hyvärinen, A., Karhunen, J. and Oja, E. (2001), Independent Component Analysis, Wiley, New York.

Ihara, M. and Kano, Y. (1986), `A new estimator of the uniqueness in factor analysis', Psychometrika 51(4), 563–566.

Inklaar, R. J., Jacobs, J. and Romp, W. (2003), `Business cycle indexes: does a heap of data help?', University of Groningen.
Jennrich, R. I. (1986), `A gauss-newton algorithm for exploratory factor analysis', Psychometrika 51(2), 277–284.

Johansen, S. (1988), `Statistical analysis of cointegration vectors', Journal of Economic Dynamics and Control 12(2/3), 231–254.

Johnson, N. L. and Kotz, S. (1972), Distributions in Statistics: Continuous Multivariate Distributions, Wiley, New York.

Johnstone, I. M. (2001), `On the distribution of the largest eigenvalue in principal components analysis', Annals of Statistics 29, 295–327.

Jöreskog, K. G. (1967), `Some contributions to maximum likelihood factor analysis', Psychometrika 32, 443–482.

Jöreskog, K. and Goldberger, A. (1972), `Factor analysis by generalized least squares', Psychometrika 37(3), 243–260.

Kailath, T. (1980), Linear Systems, Prentice-Hall, New Jersey.

Kaiser, H. F. (1958), `The varimax criterion for analytic rotation in factor analysis', Psychometrika 23, 187–200.

Kapetanios, G. (2004), `A note on modelling core inflation for the uk using a new dynamic factor estimation method and a large disaggregated price index dataset', Economics Letters 85, 63–69.

Kapetanios, G. (2005), `A testing procedure for determining the number of factors in approximate factor models with large datasets', Working paper No.551, Queen Mary.
Kapetanios, G. and Marcellino, M. (2004), `A parametric estimation method for dynamic factor models of large dimensions', Working Paper 489, Queen Mary, University of London.

Kapetanios, G. and Marcellino, M. (2006), `Factor-gmm estimation with large sets of possibly weak instruments', Working paper No.577, Queen Mary, University of London.

Kariya, T. (1993), Quantitative Methods for Portfolio Analysis: MTV Model Approach, Theory and decision library, Kluwer Academic Publishers, Dordrecht.

Karlsen, H. A. (1990), `Doubly stochastic vector ar(1) processes', Dept. of Mathematics, University of Bergen, Norway.

Kim, C. J. (1994), `Dynamic linear models with markov-switching', Journal of Econometrics 60(1-2), 1–22.

Kim, C. J. and Nelson, C. R. (1998), `Business cycle turning points, a new coincident index, and tests of duration dependence based on a dynamic factor model with regime switching', Review of Economics and Statistics 80(2), 188–201.

Kim, M. J. and Yoo, J. S. (1995), `New index of coincident indicators: A multivariate markov switching factor model approach', Journal of Monetary Economics 36(3), 607–630.

Larimore, W. (1983), System identification, reduced-order filtering and modeling via canonical variate analysis, in `Proc. 1983 American Control Conference'.
Lawley, D. (1956), `Tests of significance for the latent roots of covariance and correlation matrices', Biometrika 43, 128–136.

Lawley, D. N. and Maxwell, A. E. (1971), Factor Analysis as a Statistical Method, 2nd edn, Butterworths.

Lebow, D. E. (1993), `The covariability of productivity shocks across industries', Journal of Macroeconomics 15, 483–510.

Ledoit, O. and Wolf, M. (2002), `Some hypothesis tests for the covariance matrix when the dimension is large compared to the sample size', Annals of Statistics 30, 1081–1102.

Lippi, M. and Thornton, D. L. (2004), `A dynamic factor analysis of the response of u. s. interest rates to news', Federal Reserve Bank of St Louis Working Paper.

Ludvigson, S. C. and Ng, S. (2007), `The empirical risk-return relation: A factor analysis approach', Journal of Financial Economics 83(1), 171–222.

Lütkepohl, H. (1991), Introduction to Multiple Time Series Analysis, Springer-Verlag: Berlin.

Madansky, A. (1964), `Instrumental variables in factor analysis', Psychometrika 29(2), 105–113.
Magnus, J. R. and Neudecker, H. (1991), Matrix Differential Calculus with Applications in Statistics and Econometrics, Wiley, Chichester.

Mansour, J. M. (2003), `Do national business cycles have an international origin?', Empirical Economics 28, 223–247.

Marcus, M. (1956), `An eigenvalue inequality for the product of normal matrices', American Mathematical Monthly 63, 173–174.

Marčenko, V. A. and Pastur, L. A. (1967), `Distribution of eigenvalues for some sets of random matrices', Mathematics of the USSR Sbornik 72, 457–483.

Matheson, T. D. (2006), `Factor model forecasts for new zealand', International Journal of Central Banking 2(2), 169–237.

McCallum, B. T. (1970), `Artificial orthogonalization in regression analysis', The Review of Economics and Statistics 52, 110–113.

McKeown, M., Makeig, S., Brown, S., Jung, T.-P., Kindermann, S., Bell, A., Iragui, V. and Sejnowski, T. (1998), `Blind separation of functional magnetic resonance imaging (fmri) data', Human Brain Mapping 6, 368–372.

Melvin, M. and Schlagenhauf, D. (1986), `Risk in international lending: A dynamic factor analysis applied to france and mexico', Journal of International Money and Finance 5, S31–48.

Mittelhammer, R. C. and Baritelle, J. L. (1977), `On two strategies for choosing principal components in regression analysis', American Journal of Agricultural Economics 59, 336–343.
Nieuwenhuyze, C. V. (2006), A generalised dynamic factor model for the belgian economy - useful business cycle indicators and gdp growth forecasts, Research series 200603-2, National Bank of Belgium.

Nowak, E. (1992), `Identifiability in multivariate dynamic linear errors-in-variables models', Journal of the American Statistical Association 87, 714–723.

Nowak, E. (1993), `The identification of multivariate linear dynamic errors-in-variables models', Journal of Econometrics 59, 213–227.

Onatski, A. (2006a), `Asymptotic distribution of the principal components estimator of large factor models when the factors are relatively weak', mimeo.

Onatski, A. (2006b), `Determining the number of factors from empirical distribution of eigenvalues', mimeo.

Onatski, A. (2007), `A formal statistical test for the number of factors in the approximate factor models', mimeo.

Pal, M. (1980), `Consistent moment estimators of regression coefficients in the presence of errors in variables', Journal of Econometrics 14, 349–364.

Pearson, K. (1901), `On lines and planes of closest fit to systems of points in space', Philosophical Magazine 2, 559–572.

Pidot, G. B. (1969), `A principal components analysis of the determinants of local government fiscal patterns', The Review of Economics and Statistics 51, 176–188.
Poskitt, D. S. and Chung, S. H. (1996), `Markov chain models, time series analysis and extreme value theory', Advances in Applied Probability 28, 405–425.

Quah, D. and Sargent, T. J. (1992), A dynamic index model for large cross sections, Discussion Paper / Institute for Empirical Macroeconomics 77, Federal Reserve Bank of Minneapolis. available at http://ideas.repec.org/p/p/fedmem/77.html.

Reiersøl, O. (1941), `Confluence analysis by means of lag moments and other methods of confluence analysis', Econometrica 9, 1–24.

Reiersøl, O. (1945), `Confluence analysis by means of instrumental sets of variables', Arkiv för Matematik, Astronomi och Fysik 32A, 4.

Reiersøl, O. (1950), `On the identifiability of parameters in thurstone's multiple factor analysis', Psychometrika 15, 121–149.

Reinsel, G. and Ahn, S. (1992), `Vector autoregressive models with unit roots and reduced rank structure: estimation, likelihood ratio test, and forecasting', Journal of Time Series Analysis 13, 352–375.

Reinsel, G. C. and Velu, R. P. (1998), Multivariate Reduced-rank Regression: Theory and Applications, Springer, New York.

Ristaniemi, T. and Joutsensalo, J. (1999), On the performance of blind source separation in cdma downlink, in `Proc. Int. Workshop on Independent Component Analysis and Signal Separation (ICA'99), Aussois, France'.
Ross, S. A. (1976), `The arbitrage theory of capital asset pricing', Journal of Economic Theory 13, 341–360.

Rubin, D. and Thayer, D. (1982), `Em algorithms for factor analysis', Psychometrika 47, 69–76.

Sala, L. (2003), `Monetary transmission in the euro area: A factor model approach', mimeo.

Sargent, T. J. (1989), `Two models of measurements and the investment accelerator', Journal of Political Economy 97, 251–287.

Sargent, T. J. and Sims, C. A. (1977), Business cycle modeling without pretending to have too much a priori economic theory, in C. A. Sims, ed., `New Methods in Business Cycle Research', Minneapolis: Federal Reserve Bank of Minneapolis, pp. 45–109.

Schneeweiss, H. (1997), `Factors and principal components in the near spherical case', Multivariate Behavioral Research 32(4), 375–401.

Schneeweiss, H. and Mathes, H. (1995), `Factor analysis and principal components', Journal of Multivariate Analysis 55, 105–124.

Schneider, M. and Spitzer, M. (2004), Forecasting austrian gdp using the generalized dynamic factor model, Working Papers 89, Oesterreichische Nationalbank (Austrian Central Bank).

Schumacher, C. (2005), `Forecasting german gdp using alternative factor models based on large datasets', Deutsche Bundesbank, Research Centre.
Shumway, R. H. and Stoffer, D. S. (1982), `An approach to time series smoothing and forecasting using the em algorithm', Journal of Time Series Analysis 3, 253–264.

Sims, C. (1981), An autoregressive index model for the u.s., 1948-1975, in J. Kmenta and J. Ramsey, eds, `Large-Scale Macroeconometric Models', Amsterdam: North Holland, pp. 283–327.

Sims, C. (1992), `Interpreting the macroeconomic time series facts: The effects of monetary policy', European Economic Review 36, 975–1000.

Sims, C. A. (1980), `Macroeconomics and reality', Econometrica 48(1), 1–48.

Singleton, K. (1980), `A latent time series model of the cyclical behavior of interest rates', International Economic Review 21, 559–575.

Solo, V. (1986), Topics in advanced time series analysis, Vol. 1215 of Lecture Notes in Math., Springer, Berlin, pp. 165–328.

Spearman, C. (1904), `General intelligence objectively determined and measured', American Journal of Psychology 15, 201–293.

Stock, J. H. and Watson, M. W. (1990), New indexes of coincident and leading economic indicators, NBER Reprints 1380, National Bureau of Economic Research, Inc.

Stock, J. H. and Watson, M. W. (2002a), `Forecasting using principal components from a large number of predictors', Journal of the American Statistical Association 97(460), 1167–1179.
Stock, J. H. and Watson, M. W. (2002b), `Macroeconomic forecasting using diffusion indexes', Journal of Business and Economic Statistics 20(2), 147–162.

Stock, J. H. and Watson, M. W. (2005), `Implications of dynamic factor models for var analysis', mimeo.

Stock, J. and Watson, M. (2006), Forecasting with many predictors, in G. Elliott, C. W. J. Granger and A. Timmermann, eds, `Handbook of Economic Forecasting'.

Taniguchi, M., Maeda, K. and Puri, M. (2006), `Statistical analysis of a class of factor time series models', Journal of Statistical Planning and Inference 136, 2367–2380.

Thurstone, L. L. (1947), Multiple Factor Analysis, University of Chicago Press: Chicago.

Velu, R., Reinsel, G. and Wichern, D. (1986), `Reduced rank models for multivariate time series', Biometrika 73, 105–118.

Vigário, R., Jousmäki, V., Hämäläinen, M., Hari, R. and Oja, E. (1998), Independent component analysis for identification of artifacts in magnetoencephalographic recordings, in `NIPS '97: Proceedings of the 1997 conference on Advances in neural information processing systems 10', MIT Press, Cambridge, MA, USA, pp. 229–235.

Watson, M. W. and Engle, R. F. (1983), `Alternative algorithms for the estimation of dynamic factor, mimic and varying coefficient regression models', Journal of Econometrics 23, 385–400.
Watson, M. W. and Kraft, D. F. (1984), `Testing the interpretation of indices in a macroeconomic index model', Journal of Monetary Economics 13, 165–181.

Whittle, P. (1961), `Gaussian estimation in stationary time series', Bulletin de L'Institut International de Statistique 39, 105–130.

Wigner, E. (1955), `Characteristic vectors of bordered matrices with infinite dimensions', Annals of Mathematics 62, 548–564.

Wigner, E. (1958), `On the distribution of the roots of certain symmetric matrices', Annals of Mathematics 67, 325–328.

Zhang, J. and Stine, R. A. (1999), Autocovariance structure of markov regime models and model selection, Technical report, Department of Statistics, The Wharton School of the University of Pennsylvania.