Factor Models for Conditional Asset Pricing∗
PAOLO ZAFFARONI
Imperial College Business School
This Draft
November 12, 2019
Abstract
This paper develops a methodology for inference on conditional asset pricing models linear in latent riskfactors, valid when the number of assets diverges but the time series dimension is fixed, possibly very small.We show that the no-arbitrage condition permits to identify the risk premia as the expectation of the latentrisk factors. This result paves the way to an inferential procedure for the factors’ risk premia and for thestochastic discount factor, spanned by the latent risk factors. In our set up every feature of the asset pricingmodel is allowed to be time-varying including loadings, risk premia, idiosyncratic risk and the number ofrisk factors. Monte Carlo experiments corroborate our theoretical findings. Several empirical applicationsbased on individual asset returns data demonstrate the power of the methodology, allowing to tease out theempirical content of the time-variation stemming from asset pricing theory.
Keywords: Conditional Asset Pricing Models; No-Arbitrage; Latent Factor Model; Risk Premia; Stochastic
Discount Factor; Principal Component Analysis.
∗I thank Claudia Custodio, Victor DeMiguel, Gabaix Xavier, Marcin Kacperczyk, Frank Kleibergen, Ralph Koijen, OliverLinton, Stefan Nagel, Olivier Scaillet, Raman Uppal, Dacheng Xiu, Guofu Zhou, audiences at Cambridge, Imperial College London,Luxembourg School of Finance, Yale and Tinbergen Institute (Amsterdam) for comments and suggestions. I am very grateful toMassimo Dello Preite for developing the simulation and empirical exercises of this paper.
1
1
1 Introduction
This paper develops a formal methodology to conduct inference on no-arbitrage factor conditional asset pricing
models. Our method does not require to specify in any way how the conditional distribution of asset returns
changes over time, allowing to leave the form of time-variation of loadings, risk premia and even of the number
of risk factors completely unspecified. Moreover, it does not require to pre-select any candidate, observed,
risk factor and instead it treats all the risk factors as unobserved. These two features imply that our method
mitigates completely the risk of model misspecification arising from either postulating the incorrect dynamics
and from relying on the wrong or incomplete set of risk factors. The salient feature of our methodology, that
enables to achieve these features, is that it works when the time-series dimension of the data T is fixed, in
practice (almost) arbitrarily small, whereas it demands the number of assets N to diverge.
Linear factor models represent the workhorse of asset pricing, whereby the stochastic discount factor (SDF)
is assumed to be a linear function of a set of systematic risk factors, common across assets. From a theoret-
ical perspective, this approach has been legitimized by the CAPM of Sharpe (1964) and Lintner (1965), the
Intertemporal CAPM of Merton (1973) and the Arbitrage Pricing Theory (APT) of Ross (1976). Empirically,
it has proven much more arduous to identify the complete set of risk factors that span the SDF. Although
the Fama & French (1993) three-factor model and, especially, the Fama & French (2015) five-factor model
appear successful, with respect to several metrics, there is no consensus about which, among the hundreds of
candidate risk factors documented in the empirical asset pricing literature (see Harvey et al. (2016)), are really
“...important and which are....subsumed by others” (Cochrane (2011)).
At the same time, tens of thousands of financial assets are traded every day in financial markets. However,
most of the econometric methods developed to test and estimate asset pricing models are designed for when
the time-series dimension T diverges and the number of assets N is fixed. There are many instances when
allowing a large T is not beneficial or just not feasible. For instance, economic and financial phenomena are
plagued by structural breaks (geopolitical crises and wars, energy crises, financial crises) and, sometimes, a
long time series of data simply is not available, such as when considering financial data on emerging markets
or data on new financial instruments. An arbitrarily large number of assets N is also dictated by asset pricing
theory, in particular when the idiosyncratic component of asset returns is assumed correlated across assets. For
example the CAPM hinges on the notion of full diversifiability of idiosyncratic risk, achieved when N diverges.1
1More in general, exact pricing (i.e. when expected returns are exactly linear in betas with null pricing errors) requires existenceof a well-diversified portfolio on the mean-variance frontier (see Chamberlain (1983)[Corollary 1]). On the other hand, when exact
2
Moreover, dynamic asset pricing models characterize expected returns, prices and risk premia as conditional
(hence time-varying) moments and such time-variation has to be explicitly parameterized, in a tractable way,
when dealing with any large-T estimation procedure. However, with tractability comes the risk of model
misspecification, rendering any large-T inferential procedure potentially invalid. More in general, estimating
the first moment of asset returns by means of large-T techniques is notoriously challenging, as elegantly elicited
by Merton (1980).
Therefore, empirical and theoretical considerations strongly suggest the need for an inferential methods for
conditional asset pricing models that work when N is large and T is small. This is the achievement of the
paper. More formally, this paper provides the empirical asset pricing methodology that mirrors the theoretical
asset pricing contributions of Chamberlain (1983), Chamberlain & Rothschild (1983a) and Hansen & Richard
(1987). In particular, whereas the first two seminal works combine no-arbitrage factor asset pricing models with
population Principal Component Analysis (PCA) in a large-N environment, ours develops the inferential theory
of the sample PCA for no-arbitrage factor asset pricing models, also in a large-N environment. This implies
that our inferential procedure is particularly suited for conditional asset pricing models, formalized upon the
conditional no-arbitrage condition of Hansen & Richard (1987), because parameterizing time-variation becomes
much less critical, if any at all, when T can be taken sufficiently small, as the assumption of mild or no time-
variation represents a plausible approximation over a short time interval.
Our novel PCA asymptotic results are instrumental for developing an inferential procedure for our objects
of interest, namely the time-varying risk premia and the time-varying SDF. Our results are first developed for
the case when a risk free asset is traded, and then generalized to the case when the zero-beta rate is unknown
and needs to be estimated. The usefulness of the PCA estimator of the risk factors, from an asset pricing
perspective, naturally stems from its close analogy with the mimicking portfolio estimator: we demonstrate
that these two estimators, although not identical, are asymptotically equivalent, implying that they allow to
conduct the same inference on the set of true latent risk factors. Our methodology for conditional asset pricing
models represents a unique tool to address important empirical questions in empirical asset pricing, and beyond,
unthinkable with the usual PCA methodology, valid under double-asymptotic (i.e. when both N and T diverge).
More specifically, we show how to estimate consistently the number of (latent) risk factors and the associated
pricing does not hold, such as for the APT, an arbitrarily large N is required for all the theoretical predictions to hold. Morespecifically, Chamberlain & Rothschild (1983a) state that “... Ross (1976) has argued that the apparent empirical success of theCAPM is due to three assumptions which are more plausible than the assumptions needed to derive the CAPM. These assumptionsare first, that there are many assets; second, that the market permits no arbitrage opportunities; and third, that asset returns havea factor structure with a small number of factors...”
3
factors’ risk premia, and how to conduct inference on them, such as e.g. constructing asymptotically valid
confidence intervals. Interestingly, as a by-product of the no-arbitrage condition, our risk premia estimator
exhibits two, equivalent, representations: first, as the time-series sample mean of the PCA estimated risk
factors, and, second, as the traditional OLS two-pass estimator of Fama & MacBeth (1973), obtained by
projecting sample average returns on the PCA estimated loadings.2 It turns out that our risk premia estimator
exhibits the same rate of convergence characterizing risk premia estimators for the case when the risk factors
are observed. In other words, somewhat surprising, no loss of information arises from the latent nature of the
risk factors within our large-N fixed-T sampling scheme, the only difference being that risk premia are now
identified up to an (unknown) rotation. Notice that our focus on the large-N and fixed-T scheme necessarily
implies that we can identify the so-called ex-post risk premia, which equals the (ex-ante) risk premia plus the
risk factors’ innovation (difference between the risk factors’ sample and population means).3 For the case of
observed risk factors, Gagliardini et al. (2016) and Raponi et al. (2018) demonstrate how valid inference on
correct specification of the asset pricing model (i.e. on the asset pricing restriction) can be conducted by relying
on the ex-post risk premia, without any loss of power, when N,T are both diverging and when N diverges but
with fixed-T , respectively.
We then study estimation of the SDF spanned by the latent risk factors. Unlike the risk premia, the SDF
is unaffected by the rotation that plagues the PCA estimator of factors and risk premia. We derive the rate of
convergence and the asymptotic distribution of our SDF estimator, and show how to construct asymptotically
valid confidence intervals. This is important because the limiting statistical properties of the SDF, spanned
by latent risk factors, were hitherto unknown, when taking into account the sampling variability of the risk
factors’ estimates. In analogy to risk premia, as a consequence of our sampling scheme, note that the object
of inference is necessarily the so-called ex-post SDF, which is a (linear) function of the ex-post risk premia. To
justify the notion of the ex-post SDF, we formally show that the pricing errors associated with it are almost
undistinguishable from the ones associated with the conventional SDF, based on the ex-ante risk premia, even
for a moderate T .
Finally, we show how to consistently estimate expected returns of large portfolios, based on our risk premia
estimator. Just like for the SDF, portfolios expected returns are immune to the rotation affecting the risk
factors and their risk premia. Consistent estimation follows because expected portfolio returns depend on the
2Recall that equivalence between the sample mean and the two-pass estimator is not warranted for the case of observed tradeablerisk factors.
3The notion of ex-post risk premia was by coined by Shanken (1992).
4
(weighted) first moment of the loadings, which can be recovered despite the noisiness of the PCA-estimated
loadings themselves.
We develop several empirical applications, based on a data set of individual U.S. monthly stock returns, that
demonstrate the strength of our large-N methodology. First, we show empirically how the estimated number
of risk factors varies across time, ranging from one to 10 and increasing during booms and sharply decreasing
during financial crises: the estimated correlation between the S&P 500 index and our estimated number of
factors is above 60% in our sample. This is a marked rejection of the conventional approach of constancy of the
number of latent factors. Second, we found robust evidence according to which the dominant estimated factors
(measured by the size of the corresponding eigenvalues, sorted in descending order) are strongly related to the
Fama & French (2015) five factors. In particular, the market factor, the small-minus-big and high-minus-low
factors appear paired with the first three PCA-estimated factors, respectively, whereas the profitability and the
investment factors appear jointly related to the fourth and fifth PCA-estimated factors. The strength of such
relationships varies with time and seems to weaken after the year 2000. Third, we demonstrate how the time-
variation of the estimated risk factors’ loadings appear driven by the interest rate spread variables, dividend
yield and earnings variables, especially during financial crises. Fourth, we evaluate the economic value of the
estimated asset pricing model in terms of pricing performance, i.e. magnitude of the associated pricing errors,
and in terms of out-of-sample Sharpe ratios associated with mean-variance portfolios built using the model’s
estimated parameters. It emerges that the factor model, based on the time-varying estimated number of risk
factors, provides the best performance across both metrics. Noticeably, in terms of out of sample Sharpe ratios,
our estimated model, with a time-varying number of risk factors, not only dominates the performance of a
model with a constant number of factors but dominates also the celebrated equally-weighted portfolio strategy.
Fifth, the time series of the estimated SDF exhibits a prominent, and statistically significant, anti-cyclical
behavior, in particular with respect to the NBER recession indicator. Moreover, the estimated SDF appears
markedly spanned by the Fama & French (2015) factors until the late 1990s, and rather tenuously after that,
except in the aftermath of the subsequent financial crises.
The paper proceeds as follows. Section 2 describes a literature review. Section 3 presents the factor
conditional asset pricing model, sets the notation and describes the regularity assumptions. Section 4 presents
our methodology: we show how to conduct inference on the latent factors, including consistent estimation of
the true number of latent factors; we formalize how to conduct inference on risk premia and the SDF, and how
to generalize our results to the case when the zero-beta rate has to be estimated; we establish the asymptotic
5
equivalence between PCA estimator of the risk factors and the mimicking portfolio estimator. Monte Carlo
experiments are presented in Section 5. Section 6 contains the five empirical applications based on the data set
of individual US stock returns. Section 7 concludes. Formal proofs are reported in the final appendixes.
2 Literature Review
This paper advances the empirical asset pricing literatures that exploits the available large cross-sections of
individual assets returns and the one on estimation of conditional asset pricing models and the econometrics
literature on PCA estimation.
Giglio & Xiu (2017) recognize that, by extracting the complete set of systematic risk factors, PCA solves the
problems arising from estimating risk premia associated with observed factors, when, in particular, some of these
factors are omitted or even mismeasured, leading to the classical omitted variable bias. Gagliardini, Ossola and
Scaillet (2018) propose a methodology to detect if the residuals of a potentially misspecified beta-pricing model,
with time-varying coefficients, exhibit a factor structure. Although PCA is not directly employed, their method
is based on the asymptotic behaviour of the maximum eigenvalue of the residuals’ sample covariance matrix.
Lettau & Pelger (2018a) study a version of the PCA estimator designed to also minimize the mispricing in terms
of expected return, with the noticeably feature of estimating consistently weak factors (i.e. factors associated
with non-diverging eigenvalues), and Lettau & Pelger (2018b) provide several applications of this methodology
to characteristics portfolios and individual stock returns. Kozak et al. (2018) demonstrate empirically that
a small number of the dominant PCA estimates satisfactorily quantifies the SDF associated with anomalies
portfolios returns, regardless of whether the underlying asset pricing model is deemed as behavioural (i.e. risk
driven by say stock characteristics) or rational (i.e. risk driven by loadings). Kim & Korajczyk (2019) study
estimation of the SDF by minimizing the average mispricing in terms of prices, where the latent factors, when
present, have been pre-estimated by PCA, building on the insight of Pukthuanthong & Roll (2017).
Conditional linear factor models have been typically specified by assuming that loadings and risk premia
are linear functions of (lagged) observed state variables.4 Gagliardini et al. (2016) provides a formal statistical
analysis for an inferential method of this type of conditional linear factor models, in particular characterized by
observed risk factors. In contrast, Connor & Linton (2007), Connor et al. (2012), Fan et al. (2016) , Kelly et al.
(2017) and Kelly et al. (2018) study, and apply, estimation procedures for conditional models with latent risk
4See Shanken (1990), Ferson & Harvey (1991), Jagannathan & Wang (1996) and Lettau & Ludvigson (2001) among many others.The possible shortfalls of this approach have been discussed in Ghysels (1998) and Harvey (2001).
6
factors models whereby the loadings and, when applicable, the mispricing (i.e. the intercept) are instrumented
by observed state variables.5 An alternative approach, closer to the perspective of this paper, is to leave the
dynamics of the conditional asset pricing model unspecified, and use nonparametric techniques for estimation.
Building on the insight of French et al. (1987), Andersen et al. (2006), Lewellen & Nagel (2006) and Ang &
Kristensen (2012) develop several variations of this approach. Whereas the asymptotic analysis of the time-
varying loading’s estimator pose no particular problems, reliable nonparametric estimation of the alphas, and
thus of the time-varying risk premia, is particularly challenging with nonparametric techniques, as detailed by
Ang & Kristensen (2012) who build on the arguments of Merton (1980).
Seminal, econometrics, results for inference on factor models are Bai & Ng (2002), Stock & Watson (2002a),
Stock & Watson (2002b) and Bai (2003), whereby the observable variable is a static function of a finite set
of latent common factors. However, note that this model is not static, in the sense that both the common
factors and the idiosyncratic component are allowed to be time-dependent. In this strand of the literature,
time-domain inferential methods and assumptions are have been adopted. In contrast, Forni et al. (2000) and
Forni & Lippi (2001) postulate that the observable is a dynamic function of a finite set of latent factors, their
approach being denominated as the generalized dynamic factor model. Frequency domain techniques have been
adopted here to analyze the statistical properties of the dynamic PCA estimator.6 Although a more general
form of time-dependence is allowed for, this frequency-domain approach is less suitable for forecasting because
it necessarily leads to the observable being a function of past and future innovations. Forni et al. (2015) and
Forni et al. (2017) show how to reconcile the advantages of both approaches. All the aforementioned work
requires double asymptotics, namely that both N and T diverge to infinity. Moreover, with the exception
of Bai (2003), none of these works developed inferential methods, namely the asymptotic distribution and the
associated standard errors for the PCA estimates of loadings and factors, but focused on estimating the number
of factors and deriving consistency (sometimes with rates of convergence) of the PCA estimator of loadings and
common factors.
A large literature focused on procedures for estimating the number of latent common factors, since the
seminal work of Connor & Korajczyk (1993). Bai & Ng (2002) provide the theory, under double asymptotics,
for consistent estimation of the number of factors in a static model. Amengual & Watson (2007) show how their
approach can be extended to the case of finite-dimensional vector autoregression latent factors. Onatski (2010)
5Although we denote these models as conditional, in the sense that the corresponding parameters are function of observables,only Kelly et al. (2017) and Kelly et al. (2018) allow explicitly for dynamics.
6Brillinger (2001) introduce the dynamic PCA estimator, based on the eigenvalues and eigenvectors of a consistent estimator ofthe observables’ spectral density matrix.
7
allows for an arbitrary speed of divergence of the dominant eigenvalues and Ahn & Horenstein (2013) relax the
dependence from a finite upper bound for the true number of latent factors. Ait-Sahalia & Xiu (2017) establish
consistent estimation of the number of latent factors (and estimation of the factor structure itself) within an
in-fill asymptotics framework when high-frequency data are available.7 For the case of dynamic factor models
a la Forni et al. (2000), Hallin & Liska (2007) provide a consistent estimator for the number of factors and
Onatski (2009) establishes the asymptotic distribution of a test for the number of factors. As for estimation,
these advances have been developed in a double asymptotics setting.
Building on Dahlhaus (1997) and Dahlhaus (2000), factor models with time-varying loadings of unspecified
form, are considered by Motta et al. (2011), Brandon et al. (2013), Su & Wang (2017) and Barigozzi et al.
(2019), among others. All these developments are based on double asymptotics.
Very little work has been established within the large-N fixed-T sampling scheme, especially when com-
pared with the double asymptotics setting cited above, and moreover the few existing results are scattered
across various papers, and depend on different set of regularity conditions. In particular, Connor & Korajczyk
(1986) establish consistency of the PCA estimator for the latent factors when T is fixed. Subsequently Bai
(2003)[Theorem 4] establish the necessary and sufficient conditions for the Connor & Korajczyk (1986) large-
N consistency result, namely that the cross-sectional averages of the idiosyncratic components’ variances and
autocovariances are, respectively, constant and null. A related, statistical literature, focused on the large-N ,
small-T environment, denominated as the high-dimension-low- sample-size (HDLSS) situation, where rates of
convergence of the PCA estimator of the latent factors, and variations of, are studied under various regularity
conditions; see Jung & Marron (2009) and Shen et al. (2013) among others.8
Our work complements these empirical asset pricing and econometrics literatures in an important way
because we specifically focus on the case when N is large and T is fixed, developing tools that allow to perform
inference, such as confidence intervals and hypothesis testing for the estimated factors, as well establishing
consistent estimation of the true number of latent factors. Moreover, we show how our econometrics results are
instrumental for developing a formal methodology for inference on risk prema and the SDF associated with the
linear asset pricing model, which are our ultimate objects of interest.
Noticeably, individual asset returns, in particular equity returns, exhibit a very limited time-dependence but,
7Ait-Sahalia & Xiu (2018) establish the asymptotic distribution theory of the PC estimator in a continuous time setting butwithout assuming necessarily a factor structure, hence N is assumed fixed.
8Fan et al. (2016), cited above, reports consistency results for a variation of the PCA, both in the HDLSS environment andunder double asymptotics.
8
at the same time, are characterized by strong cross-sectional dependence and heterogeneity. As our interest
lies predominantly in factor models for asset returns, these empirical features motivate our large-N fixed-T
sampling scheme and explain our focusing on static factor models, in particular in light of Bai (2003) findings,
as opposed to dynamic factor models a la Forni et al. (2000), although our approach could in principle be
applied to the latter too.
3 Factor Conditional Asset Pricing Model: Set-Up and Assumptions
We assume that the i-th individual asset return, xit, in excess of the risk free rate, satisfies the finite-dimensional
conditional factor model, in the sense that each xit is a linear function of a finite number rt−1 of latent common
factors, and of a (latent) idiosyncratic component:
xit = αit−1 + λ′it−1ft + eit for every i = 1, · · · , N and t ∈ Z, (1)
where ft defines the rt−1×1 vector of latent risk factors, λit−1 the rt−1×1 vector of latent time-varying loadings,
eit is the idiosyncratic component and Z = · · · ,−1, 0, 1, · · · . In representation (1) the parameter αit−1 is just
a time-varying intercept, not necessarily equal to the conditional expected excess return Et−1(xit) (i.e. the risk
premium associated with asset i). Therefore, (1) does not imply that Et−1(ft) = 0rt , although we will assume
Et−1(eit) = 0 in our regularity assumptions. Finally, we assume that a risk-free asset is tradedable. The case
when any risk-free asset is not tradeable is described subsequently.
It is important to make sure that model (1) does not allow for a ‘free lunch’, formalized as follows
(see Chamberlain (1983)[Condition A] and the generalization of Hansen & Richard (1987)[Definition 2.4]
to a conditional setting). This requires to define an arbitrary portfolio strategy p with weights wpNt−1 =
(wp1t−1, wp2t−1, · · · , w
pNt−1)′ of the N risky assets’ excess returns xt. ≡ (x1t, · · · , xNt)′, leading to the portfolio
p excess return rpN,t ≡ wp′Nt−1xt.. As we explained below, the no-arbitrage assumption turns out to be instru-
mental for the interpretation of the latent factors, and for derivation of the statistical properties of the PCA
estimator.
Assumption 1 (conditional no-arbitrage) There is no sequence of portfolios along some subsequence N ′
for which, for an arbitrary positive scalar C:
vart−1(raN ′,t)→ 0 a.s., when N ′ →∞, and Et−1(raN ′,t) ≥ C > 0 a.s for every t ∈ Z.
9
Ingersoll (1984) clarifies that the above definition of no-arbitrage is equivalent to assume that Et−1(raN ′,t)→∞
(a.s.) by suitably leveraging, i.e. scaling up, the portfolio weights. Under the above no-arbitrage condition, a
limited degree of cross-sectional dependence and a sufficient degree of smoothness of the loadings (see Proposi-
tion 4 in Appendix B), there exists a rt−1× 1 vector γt−1, namely the time-varying risk premia associated with
the risk factors ft, such that
µit−1 ≡ Et−1(xit) = ai−1 + λ′it−1γt−1 a.s. for every i = 1, · · · , N and t ∈ Z, (2)
where the pricing errors ait−1 must satisfy, as a consequence of no-arbitrage (see Chamberlain (1983)[Theorem
1] and Stambaugh (1983)[Theorems 1 and 2] for the static and conditional settings, respectively), 9
∞∑i=1
a2it−1 <∞ a.s. for every t ∈ Z. (3)
Combining (1) and (2):
xit = ait−1 + λ′it−1Ft + eit, (4)
setting
Ft ≡ ft + γt−1 − Et−1(ft). (5)
A re-writing of (4), such as xit = µit−1 + λ′it−1(ft − Et−1(f)) + eit, leads to the conditional version of the APT
formulation in Ross (1976)[Eq (15)], Chamberlain (1983)[Eq. (4.1)] and Chamberlain & Rothschild (1983a)[Eq.
(1.1)], among others. However, (4) provides the ultimate specification that PCA will estimate, in particular
obtaining (consistent) estimates of the risk factors Ft (that is, the ft adjusted by γt−1 − Et−1(ft)). Note that
Et−1(Ft) = γt−1, that is the no-arbitrage condition leads to a change-of-measure of the latent risk factors ft,
so that the Ft are centered precisely around the risk premia γt−1. This result does not follow if the xit are
de-meaned beforehand, suggesting the importance of applying PCA directly to the xit from (4) without any
prior de-meaning of the data.
9Stambaugh (1983)[Theorems 1 and 2] extends Chamberlain & Rothschild (1983b)[Theorem 1] to the conditional setting, withpossibly time-varying risk premia but constant loadings. We show how Stambaugh (1983) results can be further generalized tothe case of time-varying loadings under a suitable smoothness assumption (see Proposition 4). The appeal of Stambaugh (1983)approach is that it applies to conditioning information set of any size, including the case of total absence of conditioning information,namely the one-period static case. This implies that Stambaugh (1983) result is robust to asset repackaging, which is relevant whenagents construct portfolios based on different information sets. As discussed in Hansen & Richard (1987)[Section 5], absence of thisproperty would imply that existence of a conditional factor structure does not lead to existence of a corresponding unconditionalfactor structure or, more in general, to existence under a different information set. For an alternative proof (based on a differentset of assumptions) of the conditional APT with a countably number of assets see Reisman (1992). Building on Al-Najjar (1999),Gagliardini et al. (2016) establish robustness to asset repackaging of the conditional APT with a continuum of assets.
10
Although the emphasis of this paper in on asset returns, if one wishes to apply model (1) to data other
than asset returns, then the restriction (2) does not necessarily hold, and one simply re-arranges (1) as
xit = (αit−1,λit−1)′(1, f ′t)′ + eit = (αit−1,λit−1)′Gt + eit, (6)
where Gt denotes the (rt−1 + 1)× 1 vector (1, f ′t)′. When αit−1 = αi, λit−1 = λi and rt−1 = r, representation
(6) is the static latent factor formulation of (1) adopted, among others, by Stock & Watson (2002a) and Bai &
Ng (2002). It follows that by applying PCA to model (6), with constant parameters, one captures r+ 1 factors
in total, one of which being the constant factor (equal to 1 for all periods), as long as such (constant) factor is
pervasive, namely N−1∑N
i=1 α2i → C > 0, for some constant C.10 Alternatively, focusing again on the constant
loadings case of Eq. (6), one can subtract the (time series) sample mean from both sides of (6), and apply the
PCA to the xit − xi = λ′i(ft − f) + eit − ei, denoting zi ≡ T−1∑T
t=1 zit for a generic sequence zit, In this case
only the r (de-meaned) factors, ft − f , will be captured by PCA.
Things become more complicated for the case of time-varying parameters, as for instance de-meaning is
not even meaningful because it will not eliminate the time-varying intercept term anyway. This is where the
no-arbitrage assumption becomes extremely useful, because PCA will detect the rt−1 factors without the need
to de-mean the asset returns data, because the constant factor is not pervasive. In other words, for asset returns
one can safely estimate the risk factors ignoring completely the intercept ait−1 (i.e. the pricing errors) without
inducing any bias in the estimated factors and loadings, an especially advantageous feature when considering
conditional factor models.
Summarizing, throughout this paper, we will always implement PCA, without allowing for any intercept
term, regardless of whether one envisages to fit the factor model to asset returns or not, as this will always lead
to accurate (i.e. consistent) estimates of the common latent factors, under our regularity assumptions. This
is methodologically relevant because, as explained in details below, allowing explicitly for an intercept term
or, alternatively, de-meaning the data, induces time-dependence in the idiosyncratic component of the fitted
model, violating one of the regularity assumptions needed in our large-N fixed-T environment, as clarified by
Bai (2003).
10Obviously, in practice, such constant factor will be masked by the rotation that affects loadings and factors. It is well-knownthat this does not lead to any relevant consequence for practical use of the model.
3.1 Econometric Quantities: Factors and Loadings 11
3.1 Econometric Quantities: Factors and Loadings
To define the PCA estimator, it is useful to adopt a matrix formulation, setting the T×N matrix of observed ex-
cess asset returns X ≡ (x.1 · · ·x.i · · ·x.N ) ≡ (x1. · · ·xt. · · ·xT.)′, where x.i ≡ (xi1 · · ·xiT )′ and xt. = (x1t · · ·xNt)′,
the T×rt−1 matrix of factors F ≡ (F1 · · ·FT )′, the N×rt−1 matrix of loadings Λt−1 ≡ (λ1t−1 · · ·λNt−1)′ and the
T ×N matrix of idiosyncratic components e ≡ (e.1 · · · e.i · · · e.N ) = (e1. · · · et. · · · eT.)′, setting e.i ≡ (ei1 · · · eiT )′
and et. ≡ (e1t · · · eNt)′. Note that factors, loadings and residuals are always latent.
We assume that the T observations used in the PCA are extracted from a longer time-series sample of size
T0 > T and in particular, without loss of generality, that they are made by the observations from t−T + 1 to t,
for any t satisfying T−1 < t ≤ T0. Therefore, whereas X and F contains exclusively T rows (referring to T time
series observations), without loss of generality one can always assume that these matrices are extracted from
the larger matrixes X0 and F0 with T0 rows. Although the allowed degree of time-variation will be formalized
below, for derivation of the PCA it suffices for now to assume that the loadings Λt−1 and the number of factors
rt−1 remain constant within the time interval from t−T +1 to t, but can vary freely across the T0 observations.
We present two, equivalent, ways to derive the PCA estimator of factors, loadings and number of factors
correspondingly to the time-varying model (1), namely as the nonparametric Nadaraya-Watson estimator, in
addition to the classical PCA formulation. For t ≡ [uT0] , corresponding to some u satisfying T/T0 ≤ u ≤ 1,
implying T ≤ t ≤ T0, set Kh(·) ≡ h−1K(·/h), with K(·) ≡ 1(·), the indicator function, and bandwidth
parameter h ≡ T/T0. The Kh(s/T0 − u), for every 1 ≤ s ≤ T0, make the diagonal T0 × T0 matrix Ku ≡
diag(Kh(1/T0 − u), · · · ,Kh(1− u)).
Without loss of generality for N , assume that N > T ≥ rt−1, for every T −1 < t ≤ T0, as we will only allow
N to diverge. As it will be explained, one needs T ≥ rt−1 to rule out existence, and multiplicity, of the zero
eigenvalue of certain relevant matrixes. The PCA estimators for the loadings and the factors are the solution
of the least squares problem:
F(Tt), Λ(Tt) = argminF,Λt−1
1
NT
t∑s=t−T+1
(xs. −Λt−1Fs)′(xs. −Λt−1Fs)
= argminF,Λt−1
1
NT0trace
((X0 − F0Λ
′t−1)Ku(X0 − F0Λ
′t−1)
),
where the PCA estimators, corresponding to rt−1 factors, are denoted by
F(Tt) ≡ (Ft−T+1,t, · · · , Fs,t, · · · , Ft,t)′ and Λ(Tt) ≡ (λ1t−1, · · · , λNt−1)′,
3.2 Analogies between PCA and Mimicking Portfolios 12
to indicate dependence on the time interval Tt ≡ (t− T + 1, t).
Indicating F(Tt) = F and Λ(Tt) = Λt−1, that is omitting reference to the interval Tt for the sake of simplicity,
F denotes the set of rt−1 eigenvectors, multiplied by√T , corresponding to the largest rt−1 eigenvalues of
XX′/NT , and correspondingly Λt−1 ≡ X′F(F′F)−1 = X′F/T ;11 see Magnus & Neudecker (2007), Theorem 7
of Chapter 17, for an elegant proof.
In particular, setting Vt−1 equal to the diagonal rt−1 × rt−1 matrix with the largest rt−1 eigenvalues of
XX′/NT , one obtains:
XX′
NTF = FVt−1 where
F′F
T= Irt−1 . (7)
Crucially, we do not explicitly allow for an intercept term when implementing PCA, for the reasons discussed
above. In the next sections, we will establish the asymptotic properties of F and Λt−1 as N →∞ and, more in
general, how to conduct inference on model (1), including estimating the true (but unknown) number of factors
rt−1.
3.2 Analogies between PCA and Mimicking Portfolios
The estimated factors F have the interpretation of portfolio (excess) returns, corresponding to the rt−1 vectors
of portfolio weights wpcaNt−1 ≡ Λt−1(Λ′t−1Λt−1)−1. In fact, by (7) and recalling that N−1Λ′t−1Λt−1 = Vt−1,
F =XX′
NTFV−1
t−1 = XX′F
T(NVt−1)−1 = XΛt−1(Λ′t−1Λt−1)−1 = Xwpca
Nt−1.
The population counterpart of the PCA portfolio weights wpcaNt−1 is clearly wpca
Nt−1 ≡ Λt−1(Λ′t−1Λt−1)−1.
However, it is more relevant to compare the PCA estimated risk factors F with the notion of mimick-
ing portfolios, coined by Huberman et al. (1987) and Breeden et al. (1989). In particular, setting Fmpt ≡
Et−1(Ftx′t.)(Et−1(xt.xt.))
−1xt. = wmp′Nt−1xt. as the mimicking portfolios corresponding the latent factors Ft, we
will study the limiting behaviour of both Fmpt and of wpca′
Nt−1xt., as N →∞, where by direct calculations:
wmpNt−1 =(
(at−1 + Λt−1γt−1)(at−1 + Λt−1γt−1)′ + Λt−1Ωt−1Λ′t−1 + Σet−1
)−1 (at−1γ
′t−1 + Λt−1γt−1γ
′t−1 + Λt−1Ωt−1
),
11It is well-known (see the discussion in Bai & Ng (2002)) that an asymptotically equivalent PCA estimator of factors andloadings can be obtained by setting Λt−1 equal to the eigenvectors of X′X/NT , multiplied by
√N , corresponding to the rt largest
eigenvalues, and F = XΛt−1/N . As N is much larger than T , the latter approach is computationally less convenient than the onedescribed above. Recall that the non-zero eigenvalues of X′X and XX′ coincide.
3.3 Asset Pricing Quantities: Risk Premia and SDF 13
setting Ωt−1 ≡ vart−1(Ft), Σet−1 ≡ vart−1(et.) and at−1 ≡ (a1t−1, · · · , aNt−1)′. More importantly, we establish
the limiting behaviour of the portfolio excess returns based on the feasible PCA-based estimator of wmpNt−1, and
show it close analogies with the PCA estimator F.
3.3 Asset Pricing Quantities: Risk Premia and SDF
Conditional expected excess returns µit−1 and risk premia γt−1 in particular, as from equation (2), are of
fundamental importance for many asset pricing problems, such as for identification of the SDF implied by (1)
and for portfolio construction. In particular, under the conditional no-arbitrage assumption, one can define the
(conditional or time-varying) risk premia, when a riskless asset with time-varying (gross) rate rft−1, is traded,
as:
γt−1 ≡ −rft−1
(Pt−1(ft)−
Et−1(ft)
rft−1
)for every t ∈ Z, (8)
where Pt(·) denotes the conditional pricing functional, formally defined as Pt(.) ≡ Et(mt,t+1·) where mt,t+1 is
the (conditional) SDF, whose existence is ensured by no-arbitrage (see Chamberlain (1983)[Theorem 1] and the
generalization to a conditional setting of Hansen & Richard (1987)[Theorem 2.1]).12 In our context, the price
Pt−1(ft) is only notional because the ft are latent, and hence non-tradeable. However, the formula above applies
to any risk factors so, for instance, when ft represent observed portfolio (gross) returns, then Pt−1(ft) = 1rt−1 ,
and the risk premia coincides with the expected value of the factors in excess of the risk free rate, as (8)
simplifies to γt−1 = Et−1(ft − rft−11rt−1). More in general, expression (8) defines the risk premia γt−1 as the
price of the factors ft less their risk-neutral valuation.
When T is fixed, an important identification issue related to the risk premia γt−1 arise. To simplify
arguments, consider the static version of the asset pricing model (1), that is assume for now ait−1 = ai,λit−1 =
λi, rt−1 = r. Then, following Shanken (1992), by taking the average of (4) across the time interval Tt, yields:
xi =1
T
∑s∈Tt
xis = ai + λ′iF + ei = ai + λ′iγP + ei, (9)
setting F ≡ F′1T /T and ei ≡ e′.i1T /T . Equation (9) shows how the (sample) average excess returns are linear
in the so-called ex-post risk premia, coined by Shanken (1992), namely
γP ≡ F = γ + f − E(ft).
12Expression (8) follows by generalizing Chamberlain (1983)[Theorem 1] to the conditional setting. An alternative, equivalent,expression for the risk premia vector γt−1 is based on the limit, as N diverges, of the linear projection with respect to (2), namely(Λ′t−1Λt−1)−1Λ′t−1µt−1, setting µt−1 ≡ (µ1t−1, · · · , µNt−1)′; see Ingersoll (1984)[Theorem 3].
3.3 Asset Pricing Quantities: Risk Premia and SDF 14
Interestingly, the ex-post risk premia vector γP coincides with the sample mean of the (population) Ft, although
the latter are not observed, and thus not tradeable. The ex-post risk premia exhibit many attractive properties,
providing essentially the same informations on the asset pricing model (1) as the ex-ante risk premia γ. For
instance, γP is centered around (i.e. is unbiased for) γ and, as T increases, γP →p γ. Moreover, the asset
pricing model (9) is linear γP . More importantly, γP allows to conduct inference on γ −E(ft) which is exactly
as informative as γ on the asset pricing implications of the model, a feature exploited by Gagliardini et al.
(2016).13
Note that the discrepancy between the ex-ante and ex-post risk premia is not due to the latency of the
risk factors in (1) but is a necessary consequence of keeping T fixed. Raponi et al. (2018) develop a complete
methodology for inference on linear asset pricing models, such as (1), but when the ft are (tradeable or non-
tradeable) observed factors. Their object of inference consists necessarily the ex-post risk premia and, as here,
they assume that only the number of assets, N , diverges.
Therefore, by exploiting the restriction (2) arising from the no-arbitrage assumption, a natural estimator
for the ex-post risk premia, γP , emerges, namely:
γ ≡ T−1∑s∈Tt
Fs,t, (10)
as the PCA estimator for the Fs,t accurately quantifies the Fs. Representation (9) suggests a different, perhaps
more traditional, way to estimate the ex-post risk premia γP , namely the two-pass estimator:14
γtwopass ≡ (Λ′Λ)−1Λ′xN , (11)
setting xN ≡ (x1, · · · , xN )′ = X′1T /T . We formally show that (see Proposition 3 in Appendix B)15
γtwopass = γ. (12)
13For the case of observed risk factors, this readily follows by simply noticing that as f is observed, then an estimator for γ−E(ft)can be immediately obtained by netting out f from an estimator for γP . Some care is required when using such estimators for testingthe asset pricing restriction as, for instance, when the factors are excess returns from tradeable assets, then the null hypothesis ofcorrect model specification corresponds to a zero value for γ − E(ft) in population.
14The two-pass estimator, popularized by Black et al. (1972) and Fama & MacBeth (1973), provides the most popular estimatorfor the exante risk premia γ when the risk factors are observed in (1) and exact pricing holds, i.e. ai = 0 in (2). Shanken (1992)provides the large-T asymptotics for the two-pass estimator, further generalized by Jagannathan & Wang (1998), among others.More recent work on the two-pass methodology, allowing for both N and T to diverge, has been developed by Bai & Zhou (2015)and Gagliardini et al. (2016).
15The expression of estimator γtwopass is very similar to the one studied in Giglio & Xiu (2017), the main difference being thatGiglio & Xiu (2017) estimate the loadings by PCA after de-meaning model (1), that is by applying PCA to the xit− xi. This is notfeasible in our fixed-T context as de-meaning induces dynamic serial dependence in the idiosyncratic component, and thus violatingour regularity assumptions. However, de-meaning does not cause any bias (asymptotically) in Giglio & Xiu (2017) because they letboth N and T to diverge. For the same reason, no (asymptotic) difference arises in their setup between the ex-ante and ex-postrisk premia, as γP − γ →p 0 as T →∞.
3.3 Asset Pricing Quantities: Risk Premia and SDF 15
This result differs starkly from the observed risk factors case, where the sample mean of the realized return of
a traded factor is not necessarily identical to the two-pass CSR OLS estimator, unless exact pricing is imposed
(see Shanken (1992, Section 2)). The above equivalence holds under Assumption 1, much milder than exact
pricing. Another way to understand our equality result is that, unlike the observed factors case, in our latent
factor cases, both estimators use the same information, namely a panel of excess returns. In contrast, for
the observed factor case, the sample mean of the factor realized return ignores completely all the information
stemming from the panel of returns on the test assets. This feature has been often be advocated as a limitation
of the sample mean estimator as its use is valid only under the (strong) assumption of exact pricing. This
criticism does not apply to our framework, as we allow for pricing errors of unspecified form in (2).
Turning now to the general, time-varying, case, one defines the ex-post risk premia as:
γPt−1 ≡ Ft = γt−1 + ft − Et−1(ft) for every t ∈ Z. (13)
Raponi et al. (2018) introduce the notion of time-varying ex-post risk premia (13) and develop a large-N
inferential methodology for the case of observed risk factors. Here we complement these results to the case of
latent risk factors. In light of (13), we will provide the limiting distribution theory for the estimator of γPs−1:
γs−1 ≡ Fs,t for every s ∈ Tt and t ∈ Z. (14)
As for the static case discussed above, an alternative, numerically equivalent, estimator for the time-varying
risk premia is:16
γtwopasss−1 ≡ (Λ′t−1Λt−1)−1Λ′t−1xs. for every s ∈ Tt and t ∈ Z, (15)
whose limiting properties immediately follow from the ones of γs−1 in light of the equivalence γtwopasss−1 = γs−1.
However, when time-variation holds, (10) remains a meaningful quantity, especially when T is small, as γ
will estimate the average ex-post risk premia
γP ≡ F =1
T
∑s∈Tt
γs−1 + f − 1
T
∑s∈Tt
Es−1(fs). (16)
When T is small, γP represents a local average of the time-varying γPs−1 which, on one hand, can accurately
capture the dynamics of the true risk premia (depending on their degree of time-variation) and, on the other,
16Estimator γtwopasss−1 is essentially identical to the one proposed by Raponi et al. (2018), the main difference being that they onlyconsider observed factors
3.3 Asset Pricing Quantities: Risk Premia and SDF 16
can be more accurately estimated than γPs−1, due to the averaging. These considerations prompt us to study
also the asymptotic properties of (10).17
We now illustrate how to estimate the SDF associated with the asset pricing model (1). It turns out that
the PCA estimates provide exactly the necessary ingredients for consistent estimation of the SDF. Uppal et al.
(2018)[Theorem 1 and 2] establish that the SDF implied by the linear factor asset pricing models when a
risk-free asset is traded and all risk factors ft are observed (tradeable or non-tradeable), is given by:
mt,t+1 ≡1
rft− 1
rftγ ′tΩ
−1t (ft+1 − Et(ft+1))− 1
rfta′tΣ
−1t (xt+1. − Et(xt+1.)) for every t ∈ Z, (17)
setting the asset excess returns’s conditional covariance matrix Σt ≡ ΛtΩtΛ′t + Σet. In fact,
Pt(1) = Et(mt,t+1) =1
rftand Pt(xit+1) = Et(mt,t+1xit+1) = 0 for every i = 1, · · · , N and t ∈ Z, (18)
as the xit are excess returns.
We depart from Uppal et al. (2018) in several ways. First, the risk factors ft are never observed in our
framework. Second, the true pricing errors ait cannot be identified, and thus cannot be estimated, by the PCA
procedure, leading us to ignore the term in the aits in (17). Third, and foremost, as discussed above, in a fixed-T
environment one cannot identify the risk premia γt−1 but only γPt−1 and its sample average γP . Along the same
lines, when T remains fixed, we cannot identify the population quantities Ωt and ft+1−Et(ft+1) = Ft+1−Et(Ft+1)
but only their sample counterparts Ω ≡ T−1F′M1TF and Ft+1 − γP .
This implies that, within our fixed-T environment, the object of inference consists necessarily of the ex-post
SDF:
mPt,t+1 ≡
1
rft− 1
rftγP ′Ω−1(Ft+1 − γP ) for every t ∈ Z. (19)
Note that mPt,t+1 is not feasible and needs to be estimated as the Ft are not observed or, in other words, mP
t,t+1
represents the population ex-post SDF, from a fixed-T perspective, even though mPt,t+1 is a function of the
time-series sample moments of the Ft.
Clearly, mt,t+1 and mPt,t+1 differ but, under mild assumptions on the latent risk factors, the ex-post SDF
mPt,t+1 satisfies the pricing conditions (18) with a (pricing) error of order O(T−1), namely (see Proposition 2
for a formal proof):
Et(mPt,t+1xit+1) = O(T−1) for every i = 1, · · · , N and t ∈ Z.
17Although this is not considered in our large-N and fixed-T setting, if one considers the nonparametric sampling scheme bywhich T−1
0 + T/T0 → 0, that is when the length of the sample becomes arbitrarily small as T0 increases, then γP would identifythe time-varying risk premia γt−1. See Raponi et al. (2018), Internet Appendix, for details on nonparametric estimation of riskpremia. This provides a further justification for studying γ as an estimator of γP .
3.4 Assumptions 17
This fast rate is achieved because, although mt,t+1 and mPt,t+1 differ by an order of magnitude of Op(T
− 12 ),
due to the difference between Ωt,Ft+1 − Et(Ft+1) and Ω,Ft+1 − γP , the averaging embedded in expectation
operator involved when using the SDFs for pricing (e.g. Eq. (18), leads to the faster O(T−1) rate.18
As the PCA procedure provides the estimator Fs,t of the Fs, the natural estimator for mPs,s+1 is:
ms,s+1 ≡1
rfs− 1
rfsγ ′Ω−1(Fs+1,t − γ) for every s ∈ Tt and t ∈ Z, (20)
setting Ω ≡ T−1F′M1T F = Irt − γγ ′. We will illustrate below the limiting properties, as N →∞, for our SDF
estimator, which will take into account the sampling variability stemming from the PCA estimates of the risk
factors.
3.4 Assumptions
We now present our regularity assumptions needed to deriving the limiting properties of the PCA estimator
for the number of factors and for the factors themselves. Note that all convergences hold for N → ∞ and
conditionally on F = (F1, · · · ,FT )′ unless stated otherwise. Without loss of generality, we can follow Bai &
Ng (2002) and assume that the loadings λi are non random but, given our large-N approach, it seems justified
to allow them to be random. We report our notation in Appendix A.
Assumption 2 (factors) For any T such that rt ≤ T <∞,
ΣF =F′F
T> 0 a.s.
If rt−1 > T then ΣF will be singular, with at least rt−1 − T zero eigenvalues, a case that we rule out as it
induces multiple (and, moreover, zero) eigenvalues.
Assumption 3 (loadings time-varying) For every t ∈ Z:
Λ′t−1Λt−1
N→p ΣΛt−1 > 0.
Assumption 4 (eigenvalues) For every t ∈ Z, the eigenvalues of the rt−1 × rt−1 matrix (ΣΛt−1ΣF) are
distinct a.s.
18Proposition 2 represents the only result of this paper where we allow T to diverge, throughout this paper, and its only purposeis to establish that using the ex-post SDF mP
t,t+1 leads to pricing errors of almost identical magnitude to mt,t+1.
3.4 Assumptions 18
Assumption 5 (idiosyncratic component - temporal and cross-sectional dependence) Set σij,uv =
E(eiuejv) and κh,i1...ih,t1...th = cumulant(ei1t1 , ei2t2 , . . . , eihth), with Es(eiuejv) = E(eiuejv) for every s < u, v.
For for every 1 ≤ i, j ≤ N , every s, v ∈ Tt and t ∈ Z:
E(eis) = 0, E(eiteisejv) = 0, |σij,sv| ≤ C, |κiijj,ttvs| ≤ C
1
N
N∑i=1
(eiseiv − σii,sv)→p 0,
1
N
N∑i=1
(e4is − κ4,iiii,ssss − 3σ2
ii,ss)→p 0,
with
1
N
N∑i=1
(σii,sv − 1(s=v)σ2t−1) = o(
1√N
),1
N
N∑i=1
N∑j=1j 6=i
|σij,sv| → 0 and supt∈Z
g1(Σet−1) = o(N12 ) a.s.,
1
N
N∑i=1
(σ2ii,ss − σ4t−1) = o(
1√N
),
1
N
N∑i=1
(κ4,iiii,ttsv − 1(t=v=s)κ4t−1) = o(1√N
) and1
Nsupi1
N∑i2,...,ih=1
|κh,i1...ih,t1...th | → 0
for every 3 ≤ h ≤ 8, every tj ∈ Tt (1 ≤ j ≤ h) and for at least one ij (2 ≤ j ≤ h) different from i1, for some
σ2t , σ4t, κ4t satisfying
0 < inft∈Z
σ2t ≤ sup
t∈Zσ2t <∞, 0 < inf
t∈Zσ4t ≤ sup
t∈Zσ4t <∞, 0 ≤ inf
t∈Z|κ4t| ≤ sup
t∈Z|κ4t| <∞.
Our assumptions of (local in time but asymptotic in N) dynamic homoskedasticity and uncorrelatedness
of the eits are suggested by Bai (2003), who establishes that these conditions are necessary and sufficient for
consistency of the PCA factors when T is fixed and the factor model is static, strengthening Connor & Korajczyk
(1986) sufficiency result. We are able to somewhat relax these in the sense to impose them locally within any
given time-interval of length T . At any rate, asset returns, especially at the level of individual assets, appear to
be empirically only mildly autocorrelated across time. Our assumption imposes equality between conditional
and unconditional moments of the eits. However, this is not as restrictive as it seems given that we are not
imposing constancy but are instead allowing for (local) time-variation of the unconditional moments, a form of
local stationarity along the lines of Dahlhaus (1997). As N diverges, the number of parameters say in Σet−1
increases, a manifestation of the curse of dimensionality and it becomes necessary to limit the cross-sectional
heterogeneity of the moments of the idiosyncratic component eit. The rate on the maximum eigenvalue of
Σet−1 does not follow from the previous condition on the behaviour of the sum of the cross-covariances, and is
3.4 Assumptions 19
necessary in order to deal with the effect of the pricing errors ait in the aymptotics. This eigenvalue condition
is not needed when ait−1 = 0 for every i and t.
Our assumptions are extremely mild in terms of the permitted degree of cross-sectional dependence. For
instance assume that the eit satisfy a factor structure, imposing constant moments for simplicity, such as:
ei,t = δiut + ηi,t, (21)
where, for some positive finite constants C1 ≤ C2 and 0 < ε1 ≤ 2ε2 < 1/2, for any N ,
C1Nε1 ≤
N∑i=1
δ2i and
N∑i=1
|δi| ≤ C2Nε2 , (22)
with ut i.i.d. (0, 1) and ηi,t i.i.d. (0, σ2η) over time and across units, where the ut and the ηi,s are mutually
independent for every i, s, t. It follows that the second and fourth condition (involving the second moments) of
Assumption 5 hold and, more in general all the other conditions will follow by making suitable higher-moments
assumptions on the ηit. At the same time, the maximum eigenvalue of the covariance matrix Σe = var(et.),
given by∑N
i=1 δ2i + σ2
η, diverges at a rate not slower than O(N ε1), for some 0 < ε1 < 1/2. This implies, for
example, that the the row- and column-norm of Σe, namely supj∑N
i=1 |σij,tt| also necessarily diverges with N ,
manifesting a substantial degree of cross-sectional dependence.
The zero third-moment assumption is only made to simplify exposition and can be relaxed. The cu-
mulants’ assumption implies that the fourth-order cumulant κ4,iijj,tstv = cumulant(eit, eis, ejt, ejv) satisfies
N−1∑N
i=1
∑Nj=1j 6=i
κ4,iijj,ttsv → 0.
Assumption 6 (mixed moments) For every t ∈ Z:
1
N
N∑i=1
λit−1e′.i →p 0rt−1×T , (23)
1
N
N∑i=1
(λit−1λ′it−1 ⊗ E(e.ie
′.j))→p (ΣΛt−1 ⊗ σ2
t−1IT ), (24)
Condition (24) can be generalized to
1
N
N∑i=1
(λit−1λ′it−1 ⊗ E(e.ie
′.j))→p Γt−1,
as in Bai (2003), without imposing Γt−1 = (ΣΛt−1⊗σ2t−1IT ).19 In contrast, under our assumptions, it is possible
to estimate the asymptotic covariance matrix of 1/√N∑N
i=1(e.ieis − E(e.ieis)), as explained below.
19Consistent estimation of Γt−1 is possible although with a more elaborate proof.
3.4 Assumptions 20
Assumption 7 (joint convergence in distribution) For every s ∈ Tt and t ∈ Z:
1√N
N∑i=1
[(e.ieis − E(e.ieis))
(λit−1 ⊗ e.i)
]→d N
(0T (rt−1+1),
[(κ4t−1 + σ4t−1)ιsι
′s + σ4t−1IT 0T×Trt−1
0Trt−1×T ΣΛt−1 ⊗ σ2t−1IT
]). (25)
Our theorems require the joint distribution of
1√N
N∑i=1
(e.ieis − E(e.ieis))λit−1eis
(λit−1 ⊗ e.i).
,which can be derived by premultiplying the right hand side of (25) by the (T + rt−1 + rt−1T ) × (T + rt−1T )
matrix:
As,t ≡
IT 0T×rt−1T
0rt−1×T+rt−1(s−1) Irt−1 0rt−1×rt−1(T−s)0rt−1T×T Irt−1T×rt−1T
, (26)
one for every s ∈ Tt and t ∈ Z.
Primitive conditions for Assumption 7 can be derived but at the cost of raising the level of complexity of
our proofs. For instance, when (21) with (22) hold,
1√N
N∑i=1
e.i →d N(0T , σ2ηIT ) as N →∞,
by Theorem 2 of Kuersteiner & Prucha (2013) when the ηit satisfy their martingale difference assumption (see
their Assumptions 1 and 2.) This result extends easily to (25) under suitable additional assumptions. (Details
are available upon request.)
The implicit assumption that justifies this new angle is that the time-variation affecting model (1), in
particular regarding the loadings Λt−1 and the idiosyncratic variances σii,tt, is sufficiently smooth within the
time window of size T . The strength and generality afforded by such argument relies on the possibility of taking
T sufficiently small, which the asymptotic theory of the preceding sections allows, as long as rt−1 < T for every
T ≤ t ≤ T0. In particular, the following three smoothness assumptions suffice.
Assumption 8 (number of factors - smoothness) For every s ∈ Tt and t ∈ Z,
rs = rt−1.
Our fixed-T large-N approach permits time-variation of virtually any parameter of interest. Even if rt−1 is
allowed to change over time, we are ruling out that such change occurs within the given interval Tt.
21
Assumption 9 (loadings - smoothness) For every s ∈ Tt and t ∈ Z, the Λt−1 are differentiable functions
of t such that their first-derivative, Λ(1)t−1, satisfy
Λ′t−1Λ(1)s = op(N), e′s.Λ
(1)t−1 = op(N).
As an example of Assumption 9, one can have that only a fraction N0/N of the units has a time-varying loadings,
with N0/N → 0. An important, special, case is when the loading matrix Λt−1 is a differentiable function of a
set of observed state variables, say represented by the finite-dimensional vector zt−1, that is Λt−1 = Λ(zt−1).
Assumption 10 (idiosyncratic component - smoothness) The e.i satisfy
e.i = Σ12eiη.i with Σei ≡ E(e.ie
′.i) = [σii,uv] for i = 1, · · · , N and u, v ∈ Tt, t ∈ Z,
and η.i = (ηi1, · · · , ηiT )′ satisfying E(η.i) = 0, E(η.iη′.i) = IT with η.i and Σej mutually independent for any
i, j. Then, for every s ∈ Tt and t ∈ Z, assume that the σii,s and κ4,iiiissss are differentiable functions of s such
that they first-derivative, σ(1)ii,ss and κ
(1)4,iiiis, satisfy, for every a, b, c, d ∈ Tt and a ≤ a∗ ≤ t, b ≤ b∗ ≤ t, c ≤ c∗ ≤
t, d ≤ d∗ ≤ t:
N∑i=1
|ηiaηibσ(1)ii,a∗σ
(1)ii,b∗ | = op(N),
N∑i=1
|ηiaηibσ(1)ii,a∗σii,tt| = op(N),
N∑i=1
|σ(1)ii,a∗ | = o(N),
N∑i=1
(|σ(1)ii,a∗ |σii,tt) = o(N),
N∑i=1
(|σ(1)ii,a∗ |σii,a∗) = o(N),
N∑i=1
|σ(1)ii,a∗σ
(1)ii,b∗ | = o(N),
N∑i=1
(λ′it−1λit−1)|σ(1)ii,a∗ | = op(N),
N∑i=1
|κ(1)4,iiiia∗ | = o(N).
4 Inference
4.1 Inference on Risk Factors
In this section we present the limiting distribution theory for the PCA-estimated risk factors.
Theorem 1 (rate of convergence and limiting distribution of the risk factors estimator) Under As-
sumptions 1-10, as N →∞, F(Tt) = (Ft−T+1,t, · · · , Fs,t, · · · , Ft,t)′ and Λ(Tt−1) = (λ1t−1, · · · , λNt−1)′ satisfy:
(i)
‖ Fs,t − H′t−1Fs ‖= Op(N− 1
2 ) for every s ∈ Tt and t ∈ Z.
4.1 Inference on Risk Factors 22
(ii)√N(Fs,t − H′t−1Fs
)→d N(0rt−1 ,A
′t−1Bs,t−1At−1) for every s ∈ Tt and t ∈ Z,
setting
At ≡ (Irt , Irt , Irt)′,
Bs,t ≡
σ4tT U−2
t + (κ4t+σ4t)T 2 U−1
t H′tFsF′sHtU
−1t 0rt×rt 0rt×rt
0rt×rt σ2tU−1t
σ2tT U−1
t QtΣΛtFsF′sHtU
−1t
0rt×rtσ2tT U−1
t H′tFsF′sΣΛtQ
′tU−1t
σ2tT (F′sΣΛtFs)U
−2t
,where Qt and Ut are defined in Lemma 3.
Proof. See Appendix D.
When T is also allowed to diverge, the asymptotic distribution of the estimated factors simplifies significantly.
In particular, the result of Bai (2003), Theorem 5, is re-obtained for our static case, except that we obtain a
much simplified expression for the asymptotic covariance matrix: thank to our Assumption 6, adopting Bai
(2003) notation, V−1t−1Qt−1Γt−1Q
′t−1V
−1t−1 = σ2
t−1U−1t−1. The latter formula carries a very neat interpretation:
it states that the (asymptotic) precision of the estimated factors increases proportionally with the magnitude
of the associated (normalized) eigenvalue. At the same time, the precision diminishes, equally across the rt−1
factors, when the idiosyncratic average variance σ2t−1 increases.
In order to construct confidence intervals and to perform hypothesis testing, for the estimated factors,
one needs to consistently estimate the asymptotic covariance matrix reported in Theorem 1-(ii). This entails
estimating σ2t−1 and the matrices ΣΛt−1 ,Qt−1,Ut−1, among others. For the static case Bai (2003) shows that
this is feasible when both N and T diverge, despite the rotation indeterminacy of the estimated factors and
loadings associated with the rotation matrix Ht−1. However, in our sampling framework, estimation of ΣΛt−1
appears particularly problematic because the λit−1 are not consistently estimated when T is fixed, regardless of
the rotation. In particular, following Bai (2003)[proof of Theorem 2] and using by Theorem 1-(i), the following
decomposition and limit hold:
λit−1 = H−1t−1λit−1 +
F′e.iT
+1
TF′(F− FH−1
t−1)λit−1 →p H−1t−1λit−1 +
H′t−1F′e.i
T. (27)
Note that the lack of consistency of the loadings is unrelated to latency of the factors: it will still hold when the
Ft are observed. However, by a careful analysis, we are still able to estimate consistently the second moment
matrix of the λit−1 for T fixed. This is accomplished in the next theorem.
4.1 Inference on Risk Factors 23
Theorem 2 (consistent estimation of asymptotic covariance matrix) Under Assumptions 1-10 and the
identification condition κ4t−1 = 0, as N →∞,
‖ Bs,t −Bs,t ‖→p 0 for every s ∈ Tt and t ∈ Z,
where
Bs,t ≡
σ4tT−1U∗−1
t
(Irt + T−1Fs,tF
′s,t
)U∗−1t 0rt×rt 0rt×rt
0rt×rt σ2t U∗−1t σ2
t T−1U∗−1
t ΣΛtFs,tF′s,tU
∗−1t
0rt×rt σ2t T−1U∗−1
t Fs,tF′s,tΣΛtU
∗−1t σ2
t T−1(F′s,tΣΛtFs,t)U
∗−2t
,setting
σ2t ≡ T
T − rtVTt(rt) with VTt(rt) ≡
1
NTtrace
((X− FΛ′)(X− FΛ′)′
),
σ4t ≡(
3 +27
T(∑s∈Tt
((F′s,tFs,t)
T)2) +
18rtT
)−1( 1
NT
N∑i=1
∑s∈Tt
e4is
),
U∗t = Vt −σ2t
TIrt = ΣΛt .
Proof. See Appendix D.
When estimating the asymptotic covariance matrix, one needs necessarily to set κ4t−1 = 0 for identification
purposes, as evinced from Lemma 4 in Appendix C. However, higher-order cumulants are not constrained to
be zero, implying that κ4t−1 = 0 is not equivalent to Gaussianity. Moreover, heterogeneity of the σii,tt across
assets induces (potentially) a very large (average) volatility of the idiosyncratic component eit, quantified by
the difference between σ4t−1 and σ4t−1. Note that ΣΛt−1 is identical to U∗t−1 as Vt−1 = Λ′t−1Λt−1/N , but,
for expositional clarity, we prefer to refer to them with a different notation, in particular to facilitate the
construction of the standard errors for the PCA-estimated risk factors.
We now illustrate estimation of the number of latent factors. Hence, assume that a k factor model is
estimated, where k can be smaller, equal or larger than the true number of factors rt−1. In particular, Fk
denotes the T × k matrix of the k eigenvectors associated with Vkt−1, the diagonal matrix containing the k
largest eigenvalues of XX′/NT and, similarly, Λkt−1 = T−1X′Fk denotes the N × k matrix of loadings Set, for
some function g(N),
PCTt(k) ≡ V ∗Tt(k) + kg(N)
with V ∗Tt(k) ≡ ( TT−k )VTt(k) and
VTt(k) ≡ 1
NTtrace
((X− FkΛk
t−1′)(X− FkΛk
t−1′)′).
4.2 Inference on Risk Premia and the SDF 24
Theorem 3 (consistent estimation of the number of factors) Suppose that Assumptions 1-10 hold and
that g(N)→ 0 together with√Ng(N)→∞. For every t ∈ Z, set
kt−1 ≡ argmin0≤k≤TPCTt(k).
Then
Prob(kt−1 = rt−1)→ 1 as N →∞.
Proof. See Appendix D.
As for the double asymptotic result of Bai & Ng (2002), the result follows by showing that V ∗Tt(k) is (a.s.)
up-ward biased when k < rt−1 although it is not (asymptotically) biased whenever k ≥ rt−1. The penalization
through the g(N) function, on the other hand, is asymptotically negligible precisely when is not needed, that
is when k < rt−1, and instead it induces a bias, as desired, whenever k > rt−1, the bigger the slower is g(N)
converging towards zero, leading to consistency of kt−1 for rt−1.
The strength of Theorem 3 is that it represents the only existing method to consistently estimate the number
of latent factors for model (1) when T is fixed and N diverges. All other existing methods surveyed in Section 2,
require double asymptotics.
Theorem 3 relies on studying the behaviour of Fk including the case k > rt−1 (see Lemma 5). In particular,
we show that when an erroneously larger number of risk factors are considered, that is k > rt−1, the corre-
sponding risk premia consist of a linear combination of the risk premia associated with the correct number (viz.
rt−1) of risk factors. This suggests that the use of PCA for risk premia estimation could potentially avoid the
problems arising from including too many factors, such as when useless, or spurious, factors (i.e. factors that
are uncorrelated with the test assets) are included.20
4.2 Inference on Risk Premia and the SDF
The previous results permit to derive the limiting properties of the risk premia and SDF estimators. In
particular, in view of (14), Theorem 1 provides directly the limiting distribution theory for the estimator of
20Jagannathan & Wang (1998), Kan & Zhang (1999a), Kan & Zhang (1999b) and Gospodinov et al. (2017) pointed out howinclusion of useless factors leads to serious problems when conducting inference on beta-pricing models, and various methods havebeen proposed to tackle them; see Kleibergen (2009), Gospodinov et al. (2014), Bryzgalova (2014), Burnside (2016), Anatolyev &Mikusheva (2018) and Raponi & Zaffaroni (2018) among others.
4.2 Inference on Risk Premia and the SDF 25
γPs−1, namely:
γs−1 ≡ Fs,t for every s ∈ Tt and t ∈ Z,
and for the numerically equivalent estimator γtwopasss−1 .
It is important to recognize that no gain appears with respect to estimation of the time-varying risk premia
γs−1 from taking T large. This represents a key motivation for using our large-N results. More specifically,
γs−1 − H′t−1γs−1 = H′t−1(fs − Es−1(fs)) + Γs,NT = Op(1) +Op(N− 1
2 ),
for a random quantity satisfying Γs,NT = Op(N− 1
2 ), by Theorem 1.21 The only effect of allowing also a large
T is that the asymptotic covariance matrix of γs−1 will be smaller (in the matrix sense) without affecting the
rate of convergence, which is still Op(N− 1
2 ).
Summarizing, when time-variation is allowed for, the ex-ante risk premia H′t−1γs−1 cannot be consistently
estimated even when N and T both diverge. Instead, the ex-post risk premia H′t−1γPs−1 can be consistently
estimated only when N diverges, with T affecting the asymptotic precision. The latter effect could be relevant
in finite, small, samples.
Considering now the average risk premia estimator γ in (10), it follows that
γ = T−1∑s∈Tt
Fs,t →p T−1∑s∈Tt
H′t−1γPs−1 = H′t−1
(γ + f − T−1
∑s∈Tt
Es−1(fs)
)≡ H′t−1γ
P ,
setting γ ≡ T−1∑
s∈Tt γs−1 and f ≡ T−1∑
s∈Tt fs, that is γ now captures accurately the local average of the
time-varying risk premia γP , for which inference can be made by generalizing our PCA results as follows.
Theorem 4 (rate of convergence and limiting distribution of the risk premia estimator) Under As-
sumptions 1-10, as N →∞,
(i) The risk premia estimator γ identifies the population risk premia γP multiplied by the rotation matrix Ht−1.
(ii)
‖ γ − H′t−1γP ‖= Op(N
− 12 ) for every t ∈ Z. (29)
21In particular:
γs−1 = H′t−1γPs−1 + U−1
t−1
F′
T(ees.N− E(
ees.N
)) + U−1t−1
F′F
T
Λ′t−1es.N
+ U−1t−1
F′
T
eΛt−1
NFs
= H′t−1γs−1 + H′t−1(fs − Es−1(fs)) + Γs,NT , , (28)
where Γs,NT ≡ T−1U−1t−1F
′( ees.N− ( ees.
N)) + T−1U−1
s F′FΛ′t−1es.
N+ T−1U−1
s F′eΛt−1
NFs.
4.2 Inference on Risk Premia and the SDF 26
(iii)√N(γ − H′t−1γ
P )→d N(0rt−1 ,A′t−1Ct−1At−1) for every t ∈ Z,
setting
Ct ≡
(κ4t+σ4t)
T 2 U−2t + σ4t
T 2 U−1t H′tFF′HtU
−1t 0rt×rt 0rt×rt
0rt×rtσ2tT U−1
tσ2tT U−1
t QtΣΛtFF′HtU−1t
0rt×rtσ2tT U−1
t H′tFF′ΣΛtQ′tU−1t
σ2tT (F′ΣΛtF)U−2
t
,where Qt and Ut are defined in Lemma 3.
(iv) When, in addition, κ4t = 0,
‖ Ct −Ct ‖→p 0 for every t ∈ Z,
setting
Ct ≡
σ4tT 2 U∗−2
t + σ4tT 2 U∗−1
t¯F¯F′U∗−1
t 0rt×rt 0rt×rt
0rt×rtσ2tT U∗−1
tσ2tT U∗−1
t ΣΛt
¯F¯F′U∗−1t
0rt×rtσ2tT U∗−1
t¯F¯F′ΣΛtU
∗−1t
σ2tT ( ¯F′ΣΛt
¯F)U∗−2t
,where σ2
t , σ4t, U∗t and ΣΛt are defined in Theorem 2.
Proof. See Appendix D.
Theorem 4 is in agreement with Raponi et al. (2018), who noticed that their large-N fixed-T estimation
procedure, designed for constant risk premia, would still lead to a meaningful estimator for local averages of
the time-varying risk premia. Recall that T is fixed and possibly very small so such local averages might still
display a pronounced time-variation when evaluated over a sequence of rolling samples. In contrast, if one lets
T to diverge, then γ will capture the integrated risk premia over the entire time line, that is it will converge to
(net of a rotation)∫∞−∞ γsds, given that f − T−1
∑ts=t−T+1 Es−1(fs) = op(1) under mild conditions. Therefore,
no information on time-variation can be recovered from γ if one lets T to diverge.
As the vector of estimated risk premia is a sample average of the latent factors, its precision increases with
T , in particular at rate O(T−12 ). Indeed, although this is deliberately not explored in this paper, our result
shows that γ is consistent at rate O((NT )12 ) when both N and T diverge. A closer analysis explains the reasons
behind such fast rate of convergence. In fact, averaging (28) across Tt gives
γ = H′t−1(1
T
∑s∈Tt
γs−1) + H′t−1(f − 1
T
∑s∈Tt
Es−1(fs)) + ΓNT ,
where we set ΓNT ≡ T−1∑
s∈Tt Γs,NT . By Theorem 4, it follows that ΓNT = Op((NT )−12 ) and, in fact,
Dello Preite & Zaffaroni (2019) establish asymptotic normality of√NT ΓNT , and thus of
√NT (γ − H′t−1γ
P ),
4.2 Inference on Risk Premia and the SDF 27
as both N,T diverge. This fast rate seems at odds with the usual rate of convergence for estimated risk premia,
in the presence of observed risk factors. For instance, the sample mean of a traded factor (i.e. a portfolio excess
return) converges at rate O√T ), and the same rate applies to the traditional two-pass OLS estimator. The
reason for this apparent contradiction is that γ − H′t−1γP is, implicitly, capturing the randomness (i.e. the
asymptotic distribution) around T−1∑
s∈Tt νs−1, where νs−1 ≡ γs−1−Es−1(fs), namely the average risk premia
netted out by the factor average conditional expectation or, in other words, the portion of the risk premia that
is not linearly dependent on the factors’ expectation. Besides the rotation by the matrix H′t−1, which is just a
by-product of having unobserved risk factors, such fast rate is identical to the one found by Gagliardini et al.
(2016), in a double asymptotic setting, and when the risk factors are assumed to be observed.22
In contrast, if inference on the average (ex-ante) average risk premia T−1∑
s∈Tt γs−1 is aimed for, then
γ − H′t−1
1
T
∑s∈Tt
γs−1 = H′t−1
1
T
∑s∈Tt
(fs − Es−1(fs)) + ΓNT = Op(T− 1
2 ) +Op((NT )−12 ),
and the slower, traditional,√T -rate of convergence emerges.
Regarding the two-pass estimator γtwopasst−1 , Theorem 4 applies, in view of (12), but one can also use direct
arguments to show√N -rate of convergence:
γtwopasst−1 =(Λ′t−1Λt−1
)−1Λ′t−1xN
→p
(H−1t−1ΣΛt−1H
−1′t−1 +
σ2t−1
TIrt−1
)−1(H−1t−1ΣΛt−1H
−1′t−1H
′t−1γ
P +σ2t−1
TH′t−1F
)= H′t−1γ
P .
Interestingly, no-bias adjustment is required for γtwopasst−1 , unlike the case when the risk factors are observed.23
Given (12), inference on γtwopasst−1 can be readily conducted by using Theorem 4.
Giglio & Xiu (2017) illustrate a generalization of their method to time-varying loadings whereby the loadings
are function of observed state variables. Similarly, but in the context of observed risk factors, Gagliardini
et al. (2016) allow for observed state variables in loadings and risk premia. Our approach differs from both
contributions because we do not need to make any assumption on the form of time-variation of risk premia,
such as dependence from some state-variables.
Our asymptotic distribution theory delivers the limiting properties of the SDF estimator mt,t+1 of Eq. (20).
22Obviously, with tradeable, and thus observed, factors νs−1 = 0rt−1 under correct model specification.23See Raponi et al. (2018) for the analysis of the two-pass risk premia estimation when only N diverges.
4.2 Inference on Risk Premia and the SDF 28
Theorem 5 (rate of convergence and limiting distribution of the SDF estimator) Under Assumptions
1-10, as N →∞,
(i)
‖ ms,s+1 −mPs,s+1 ‖= Op(N
− 12 ) for every s ∈ Tt and t ∈ Z.
(ii)√N(ms,s+1 −mP
s,s+1
)→d N(0,D′s,t−1Et−1Ds,t−1) for every s ∈ Tt and t ∈ Z,
setting
Ds,t ≡ (IT ⊗H−1t )
1
rfs
(−((
1TT
)⊗ Irt)Ω−1(F′(ιs+1 −
1TT
))− ((ιs+1 −1TT
)⊗ Irt)Ω−1(F′
1TT
)
+(M1TFΩ−1F′ ⊗ Ω−1F′)
[(1TT⊗ (ιs+1 −
1TT
)) + ((ιs+1 −1TT
)⊗ 1TT
)
]),
Et ≡ (IT ⊗U−1t
H′tF′
T)Uet(IT ⊗
FHt
TU−1t ) + (σ2
t IT ⊗U−1t ) + (FΣΛtF
′ ⊗ σ2t
TU−2t )
+(σ2t
FHt
TU−1t ⊗H′tF
′)KrtT +KTrt(σ2tU−1t
H′tF′
T⊗ FHt).
where Uet is defined in Appendix 8.5 and Kab defines the commutation matrix of order (a, b) (see Magnus &
Neudecker (2007)[Chapter 3, Section 7] )
(iii)
‖ Ds,t −Ds,t ‖→p 0, ‖ Et −Et ‖→p 0 for every s ∈ Tt and t ∈ Z,
setting
Ds,t ≡ 1
rfs
(−((
1TT
)⊗ Irt)Ω−1(F′(ιs+1 −
1TT
))− ((ιs+1 −1TT
)⊗ Irt)Ω−1(F′
1TT
)
+(M1T FΩ−1F′ ⊗ Ω−1F′)
[(1TT⊗ (ιs+1 −
1TT
)) + ((ιs+1 −1TT
)⊗ 1TT
)
]),
Et ≡ (IT ⊗ U∗−1t
F′
T)Uet(IT ⊗
F
TU∗−1t ) + (σ2
t IT ⊗ U∗−1t ) + (FΣΛtF
′ ⊗ σ2t
TU∗−2t )
+(σ2t
F
TU∗−1t ⊗ F′)KrtT +KTrt(σ2
t U∗−1t
F′
T⊗ F),
where σ2t , σ4t, U
∗t are defined in Theorem 2, and Uet is defined in Appendix E.
It is interesting to compare the limiting behaviour of the SDF estimator ms,s+1 as N → ∞, established
above, with its behaviour under double asymptotics, that is when both N,T → ∞. It turns out that in the
double asymptotics setting, ms,s+1 converges to the (ex-ante) SDF ms,s+1 in Eq. (17) although at the slower
4.3 Inference with Mimicking Portfolios 29
rate δNT ≡ min[√T ,√N ]. Although different techniques, from the one developed in this paper, have to be
used, as the latent factors F, and their PCA estimates, become high-dimensional as T diverges, it can be
shown that δNT (ms,s+1 −ms,s+1) is asymptotically normal and that asymptotically valid standard errors can
be derived (see Dello Preite & Zaffaroni (2019) for details).
The time-varying risk premia result leads immediately to the limiting behaviour for (conditional) portfolio
expected returns, yielding
µat−1 ≡ wa′Nt−1Λt−1γ →p µ
at−1 ≡ µa′Λt−1
H−1′t−1H
′t−1γ
P = µa′Λt−1γP ,
whenever Λ′t−1waNt−1 →p H−1
t−1µaΛt−1
. This condition easily follows under mild conditions on the weights waNt−1.
The estimated conditional portfolio risk premium (i.e. the expected excess portfolio return) is asymptotically
independent of the rotation matrix Ht−1, hence it is perfectly identified.
4.3 Inference with Mimicking Portfolios
Here we examine the limiting behaviour of the mimicking portfolios excess returns, both in population and using
its PCA-based estimator. It turns out that the population mimicking portfolio excess returns Fmpt = wmp′
Nt−1xt.
reproduce accurately the true (latent) risk factors Ft as N diverges. This result does not formally depend on
whether Ft is observed or not, demonstrating that the notion of mimicking portfolios is useful also in an APT
framework, namely in the presence of pricing errors. However, precisely due to the presence of pricing errors,
the mimicking portfolios require N large in order to be effective. It follows that, in a APT framework, one
cannot estimate the mimicking portfolios by means of the sample moments, as T−1∑T
s=1 xs.x′s. is singular for
N < T (precisely at most of rank N − T ). However, a consistent estimator for Fmpt , valid when N → ∞ for
every t ∈ Z, can be derived, using our PCA procedure, yielding the mimicking portfolio estimator:
Fmpt ≡
(γγ ′ + Ω
)Λ′t−1
((Λt−1γ)(Λt−1γ)′ + Λt−1ΩΛ′t−1 + σ2
t−1IN
)−1xt. = wmp′
Nt−1xt. for every t ∈ Z. (30)
Moreover, by means of some algebraic manipulations, one can show that the mimicking portfolio estimator Fmpt
satisfies:
Fmpt =
(σ2t−1Irt−1 + Λ′t−1Λ
′t−1
)−1Λ′t−1xt. for every t ∈ Z.
It follows that, although not identical to the PCA-estimator, the mimicking portfolio estimator has the same
limiting properties of Ft,t as N diverges, and thus it will converge to a suitable rotation of the true (latent)
risk factors, as enunciated below.
4.4 Inference in the Absence of the Risk Free Asset 30
Theorem 6 (mimicking portfolios) As N →∞,
(i) Under Assumptions 1,3 and gN (Σet−1) ≥ C > 0 a.s.,
‖ Fmpt − Ft ‖= op(1) for every t ∈ Z,
(ii) Under Assumptions 1-10:
‖ Fmpt − Ft,t ‖= Op(N
−1) for every t ∈ Z. (31)
By part (i), the unfeasible (population) mimicking portfolio returns converge to the true (latent) factors. A
rate of convergence can be derived by strengthening Assumption 2, in particular is by defining the rate of
convergence of N−1Λ′t−1Σ−1et−1Λt−1. The closeness between the mimicking portfolio estimator Fmp
t and the
PCA estimator Ft,t, stated in part (ii), implies that they have the same rate of convergence and asymptotic
distribution, centered around H′t−1Ft, as spelled out in Theorem 1. In fact
Fmpt − H′t−1Ft = (Fmp
t − Ft,t) + (Ft,t − H′t−1Ft) = Op(N−1) + (Ft,t − H′t−1Ft) = (Ft,t − H′t−1Ft) + op(1),
implying√N(Fmp
t − H′t−1Ft) = Op(N− 1
2 ) +√N(Ft,t − H′t−1Ft).
4.4 Inference in the Absence of the Risk Free Asset
We now describe how our results need to be modified when a risk free asset is not tradable. All our results still
applies, except that one now needs to estimate an additional quantity, namely the zero-beta rate, which in our
large-N environment requires some further details. Throughout this section xit denotes the gross return for
asset i, still assumed to satisfy (1). However, the conditional asset pricing restriction (2) is now replaced by
µit−1 ≡ Et−1(xit) = ait−1 + γ0t−1 + λ′it−1γt−1 a.s. for every i = 1, · · · , N and t ∈ Z, (32)
where γ0t−1 indicates the (unknown) zero-beta rate, implying
xit = ait−1 + γ0t−1 + λ′it−1Ft + eit for every i = 1, · · · , N and t ∈ Z, (33)
where as before Ft = ft + γt−1 − Et1(ft). In vector form one obtains:
xt. = at−1 + γ0t−11N + Λt−1Ft + et. for every t ∈ Z. (34)
4.4 Inference in the Absence of the Risk Free Asset 31
However, unlike the previous case when γ0t−1 = rft−1 + 1, one cannot apply PCA to the row data X, even in
the static case γ0t−1 = γ0, because γ0 is the risk premium associated with the constant (unit) factor and its
presence is not asymptotically negligible.
Giglio & Xiu (2017), who focus on the static case, circumvent this difficulty by applying PCA to the
time-series de-meaned data xit − xi, and then estimate γ0 in a second stage. This avenue is ruled out in our
framework, because it induces serial correlation in the eit, but one can still eliminate the influence of γ0 by
de-meaning with respect to the cross-sectional sample averages:
x∗t. ≡M1Nxt. = M1Nat−1 + M1NΛt−1Ft + M1Net.
= a∗t−1 + Λ∗t−1Ft + e∗t.,
as γ0t−1M1N1N = 0. In other words, one applies the PCA to the x∗it = xit − xt = xit − (N−1∑N
j=1 xjt),
where the same definition applies for the other starred quantities defined above. A crucial feature of such
cross-sectional de-meaning is that it does not alter the degree of serial (temporal) dependence.
Thus, application of PCA to the X∗ yields
F∗(Tt), Λ∗(Tt) = argminF,Λ∗t−1
1
NTtrace
((X∗ − FΛ∗′t−1)(X∗ − FΛ∗′t−1)′
)where, as before, F∗ = F∗(Tt) is given by the set of rt−1 eigenvectors, multiplied by
√T , corresponding to
the largest rt−1 eigenvalues of X∗X∗′/NT , and correspondingly Λ∗(Tt) = Λ∗t−1 = X∗′F∗(F∗′F∗)−1 = X∗′F∗/T .
Notice that the true risk factors F are not affected by pre-multiplication with the matrix M1N but we denote
the PCA estimator as F∗ to emphasize that these are obtained from the cross-sectionally de-meaned data.
It turns out that all our results apply once λit−1 and ΣΛt−1 are replaced by λ∗it−1 and ΣΛ∗t−1throughout
our assumptions. Notice that, by construction,∑N
i=1 λ∗it−1 = 0rt−1 and that ΣΛ∗t−1
is the covariance matrix of
the λit (and not the second moment matrix as before), given that:
Λ∗′t−1Λ∗t−1
N=
Λ′t−1M1NΛt
N=
Λ′t−1Λt−1
N− µΛt−1µ
′Λt−1
→p ΣΛ∗t−1> 0,
setting µΛt−1 ≡ N−1Λ′t−11N .
As the true risk factors are not affected by the (cross-sectional) de-meaning, their estimator and thus the
estimators for the risk premia and the SDF, will not be affected either, except of course for the rotation that
will necessarily differ from the case when the risk free asset is tradable. However, estimation of portfolios risk
premia will be affected by the de-meaning as one cannot recover the (weighted) first-moment of the loadings
4.4 Inference in the Absence of the Risk Free Asset 32
from the Λ∗t−1, as these are now centered around zero. Moreover, one still needs to estimate the zero-beta rates
γ0t−1. For the static case, Giglio & Xiu (2017) derive the asymptotics for the estimator
γGX0 ≡ (1′NMΛ∗1N )−11′NMΛ∗ xN .
However, in our setting, γGX0 →p γ0 + µ′ΛγP as N → ∞, because 1′N Λ∗ = 0′r, implying that γGX0 is not
consistent for γ0.24 Further complications arise when the zero-beta rate is time varying, as in our framework.
Instead, one can construct a consistent estimator for both γ0t−1 and (a rotation of) µΛt−1 jointly, as follows.
By taking the cross-sectional average of both sides in (34), and stacking across t, one obtains
xT = aT + γ0 + FµΛt−1 + eT = aT + M1T γ0 + D
(γ0
µΛt−1
)+ eT , (35)
setting xT ≡ X1N/N , eT ≡ e1N/N , aT ≡ (a′01N/N, · · · ,a′T−11N/N)′, γ0 ≡ (γ00, · · · , γ0T−1)′, γ0 ≡ γ ′01T /T
and D ≡ (1T ,F). Representation (35) naturally suggests the two-pass OLS estimator for (γ0, µ′Λt−1
)′:(γ0
µΛ
)≡ (D′D)−1D′xT , (36)
based on the (first-pass) estimator of the risk factors F∗, where we set D ≡ (1T , F∗). Noticeably, the two-pass
estimator (γ0, µ′Λ)′ is exactly the mirror image of the traditional two-pass estimator. Whereas the latter is based
on a single cross-sectional regression of N time-series averaged returns on the estimated loadings, evaluated
as the T diverges (see Black et al. (1972), Fama & MacBeth (1973) and Shanken (1992), among others), our
two-pass estimator is obtained from a single time-series regression of T cross-sectional averaged returns on the
estimated factors, evaluated as N diverges.
The estimators (γ0, µ′Λ)′ are clealry meaningful if both aT and M1T γ0 are (asymptotically) negligible as
N diverges. Indeed, these conditions hold as a consequence of no-arbitrage (for the vector of pricing errors
aT ) and by suitable smoothing assumptions (for the γ0), as specified below. The next theorem provides an
inferential procedure for (γ0, µ′Λ)′.
Theorem 7 (rate of convergence, limiting distribution, consistent estimation of the asymptotic
covariance matrix of γ0 and µΛ)
Under Assumptions 1-10, as N →∞,
24Giglio & Xiu (2017) do not encounter this problem because, as they consider a double asymptotic setting, they apply PCA tothe time-series de-meaned data as already discussed above, for which their estimator γGX0 is valid.
4.4 Inference in the Absence of the Risk Free Asset 33
(i) ((γ0
µΛ)− (
γ0
H−1t−1µΛt−1
)
)= (
¯F∗′
Irt−1
)Ω∗−1Cov(F∗′,γ0) +Op(N− 1
2 ), (37)
setting
Cov(F∗′,γ0) ≡ 1
TF∗′M1T γ0. (38)
and ¯F∗≡ T−1F∗′1T , Ω
∗ ≡ T−1F∗′M1T F∗.
When, in addition, µΛt−1 →p µΛt−1, aT = o(N−12 ) and γ0s = γ0t−1 for every s ∈ Tt and t ∈ Z:
(ii) ((γ0
µΛ)− (
γ0t−1
H−1t−1µΛt−1
)
)= Op(N
− 12 ). (39)
(iii)√N
((γ0
µΛ)− (
γ0t−1
H−1t−1µΛt−1
)
)→d N(0rt−1+1,G
′t−1Lt−1Gt−1), (40)
setting
Gt ≡ 1
T
(1T ,FHt)(1 γP ′Ht
H′tγP Irt
)−1
−((1T ,FHt)(1 γP ′Ht
H′tγP Irt
)−1 ⊗H−1t µΛt)
,Lt ≡(
L11t L′21t
L21t L22t
),
with
L11t ≡ σ2t IT , L21t ≡ σ2
t (IT ⊗U−1t H−1
t µΛt) + σ2t (FµΛt ⊗U−1
t
H′tF′
T),
L22t ≡ (IT ⊗U−1t
H′tF′
T)Uet(IT ⊗
FHt
TU−1t ) + (σ2
t IT ⊗U−1t ) + (FΣΛ∗t
F′ ⊗ σ2t
TU−2t )
+(σ2t
FHt
TU−1t ⊗H′tF
′)KrtT +KTrt(σ2tU−1t
H′tF′
T⊗ FHt).
(iii) When, in addition, κ4t = 0,
‖ Gt −Gt ‖→p 0, ‖ Lt − Lt ‖→p 0,
setting
Gt1
T≡
(1T ,F∗
T )(1 γ ′
γ Irt)−1
−((1T ,F∗
T )(1 γ ′
γ Irt)−1 ⊗ µΛ)
, Lt ≡(
L11t L′21t
L21t L22t
),
4.4 Inference in the Absence of the Risk Free Asset 34
with
L11t ≡ σ2t IT , L21t ≡ σ2
t (IT ⊗ U∗−1t µΛ) + σ2
t (F∗µΛ ⊗ U∗−1
t
F∗′
T),
L22t ≡ (IT ⊗ U∗−1t
F∗′
T)Uet(IT ⊗
F∗
TU∗−1t ) + (σ2
t IT ⊗ U∗−1t ) + (F∗ΣΛ∗t
F∗′ ⊗ σ2t
TU∗−2t )
+(σ2t
F∗
TU∗−1t ⊗ F∗′)KrtT +KTrt(σ2
t U∗−1t
F∗′
T⊗ F∗),
where σ2t , σ4t, U
∗t , ΣΛ∗t
are defined in Theorem 2, replacing F, Λt with F∗, Λ∗t , Kab is defined in Theorem 5 and
Uet is defined in Appendix E.
(iv) When, in addition, µΛt−1 − µΛt−1 = op(N− 1
2 ), then (i)-(ii)-(iii) also apply replacing µΛt−1 with µΛt−1.
Although our focus is on the large-N finite-T case, by analyzing the asymptotic covariance matrix it follows
that our estimators for γ0t−1 and H−1t−1µΛt−1 are, in fact, converging at the fast rate Op((NT )−
12 ) if one allows
both N and T to diverge. This rate resembles the rate that Gagliardini et al. (2016) obtain for their estimator
of the (nonlinear component of the) risk premia associated with the risk factors. The analysis of the zero-beta
rate is excluded from their analysis but we conjecture that the same rate as ours, when the factors are observed,
applies.
Our result shows that, when the risk-free rate is not sufficiently smooth, our risk premia estimator would
quantify the (local) average of the time-varying zero beta rates, namely γ0, although a bias emerges, that
depends on the sample covariance between the risk factors and the time-varying zero-beta rates. A similar
bias affects our estimator of the loadings’ first moment µΛt−1 . However, as T can be arbitrarily small, the
smoothness assumption on the γ0t−1 is extremely mild in practice.
In the absence of the risk free asset, the population ex-post SDF becomes:
mPt,t+1 =
1
γ0t− 1
γ0tγP ′Ω−1(Ft+1 − γP ) for every t ∈ Z, (41)
which satisfies the pricing conditions Et(mPt,t+1xit+1) = 1+O(T−1), again with a (pricing) error of order O(T−1)
by an easy extension of Proposition 2. Our PCA-based estimator for mPt,t+1 is then:
ms,s+1 =1
γ0− 1
γ0γ∗′Ω∗−1(F∗s+1,t − γ∗) for every s ∈ Tt and t ∈ Z,
setting γ∗ = F∗′1T /T , and the analogue of Theorem 5 applies.25
25Details are available upon request.
35
Having obtained a consistent estimator for H−1t−1µΛt−1 , we can re-construct the estimator for the load-
ings as Λ†t−1 ≡ Λ∗t−1 + 1N µ′Λt−1
. In turn, this allows to estimate consistently the expected portfolio returns
corresponding to any generic portfolio weights waN , namely
µat−1 ≡ wa′Nt−1Λ
†t−1γ →p µ
at ≡ µa′Λt−1
H−1′t−1H
′t−1γ
P = µa′Λt−1γP ,
under mild assumptions on the portfolio weights waNt−1 such as Λ†′t−1w
aNt−1 →p H−1
t−1µaΛt−1
.
5 Monte Carlo Experiments
We illustrate the finite-sample performance of the PCA estimators for the number of factors and for the factors
themselves when N is allowed to diverge but T is fixed and, moreover, assumed to be very small. We compare
the results from using our methodology with the ones that are valid when N and T both diverge. Given the
scope of the Monte Carlo exercise, we focus on the static model. We consider cases when T is very small, such
as T = 2, 5 and also the cases when T = 50, 100. We combine these values of T with N ranging from 10 up to
3, 000. Data are generating according to model (1), imposing zero pricing errors,
xit = λ′iFt + eit, i = 1, · · · , N, t = 1, · · · , T,
where the common factors Ft are assumed iid (independent and identically distributed) across t and N(0, Ir),
corresponding to a given number of factors r, the loadings λi are assumed iid across i and N(0, Ir) and the
the idiosyncratic components eit are assumed iid across both i and t standard normal, unless we say otherwise.
Moreover Ft,λi and ejs are assumed mutually independent for every i, j, t, s.26 All the results reported below
are based on 1, 000 Monte Carlo iterations where the common factors Ft are not re-sampled at every iteration
but only sampled once at the outset of every Monte Carlo exercise.
5.1 Estimation of the Number of Factors
Our estimator is based on the criterion PC(k) = V ∗(k)+kg(N) where V ∗(k) = ( TT−k )V (k) and the penalization
function is constructed:
g(N) =(log(N))ε1N ε2
√N
for ε1 ≥ 0 and 0 < ε2 < 1/2,
which only depends on N .27 We compare our results with some of the Bai & Ng (2002)[Section 5] criteria,
asymptotically valid when N and T both diverge, namely the PCp and ICp criteria. We also consider, as a
26Results for the case of cross-sectional dependent ejs are available upon request.27Variations of our criterion were tried. A similar, good, performance to ours has been obtained also for V (k) + kg(N).
5.2 Estimation of the Asymptotic Distribution of Factors 36
comparison, the AICp and BICp criteria, which Bai & Ng (2002) showed not to be always asymptotically valid.
We consider the cases when the true number of latent factors equals r = 1, 3.
Tables I and II focus on case r = 1 and show how our criterion estimates r very accurately, especially for
larger values of ε2: these values imply that the penalization decreases very slowly, keeping k below 2 in most
cases. In particular, Table I show how our method outperforms, for all value of N , when T is extremely small,
namely T = 2 (top part of the table) and T = 3 (bottom part of the table). In contrast, the large-N and
-T criteria IPp1 and ICp1 work well when both N,T are large enough, as for the BIC1 criterion. The AIC1
criterion, instead, always overstates the true number of factors. These features have already been documented
in Bai & Ng (2002).
TABLE I HERE
TABLE II HERE
Table III considers case r = 3 and, again, our criterion appears useful for very small values of T whereas
the IPp1, ICp1 and BIC1 criteria work well for larger values of T , whereas the AIC1 criterion always overstates
the number of factors.
TABLE III HERE
5.2 Estimation of the Asymptotic Distribution of Factors
In this section we always assume r = 1 for simplicity.28 We follow Bai (2003)[Section 6] in terms of the various
analysis developed to demonstrate the accuracy of the asymptotic distribution of the PCA estimated factors,
illustrated in our Theorems 1 and 2 respectively. Table IV reports the correlation, averaged across Monte Carlo
iterations, between true and estimated factors for T = 2, 5, 50 and N = 10, 100, 1000, 3000. We also report the
correlation corresponding to the loadings, as a comparison, in Table V. The results show that the factors are
accurately estimated, even when T is very small, once N is sufficiently large. In contrast, the loadings require
a sufficiently large T in order to obtain a similar accuracy, appearing poorly estimated when T = 2, 5 and,
instead, well estimated when T = 50. Moreover, the quality of the estimates for the loadings does not change
very much as one varies N from 10 to 3000. We also report in Figure I and II the histogram of the estimated
correlations for the factors and the loadings, respectively, when T = 10, N = 100 and when T = 100, N = 100.
Whereas the correlations are all concentrated around unity, for factors, they are instead concentrated around
28Further results are available upon request.
5.2 Estimation of the Asymptotic Distribution of Factors 37
0.95 for the loadings, markedly away from unity, when T is much smaller than N . Instead, the histograms have
the same shape when T and N are both equal to 100, as indicated in Figure II.
TABLE IV and V HERE
FIGURE I HERE
FIGURE II HERE
We then consider the the histograms of factors’ estimates, across the Monte Carlo iterations. In particular
we report the histograms associated with the studentized PCA estimates, satisfying by Theorem 1, for every
1 ≤ s ≤ T ,
f small−Ts ≡ (A′BsA)−12N
12 (Fs − H′Fs)→d N(0r, Ir) as N →∞,
where A and Bs are defined in Theorem 2, setting Bs = Bs,t in view of the static formulation adotped here.
We only display case t = 1 for simplicity in all the figures. For comparison, we also display the histograms
corresponding to the studentized PCA estimates but using the (estimated) asymptotic covariance matrix that
is valid when both N and T diverge, namely σ2U∗−1. The latter formula can be easily obtained by evaluating
the limit of Bs as T →∞, yielding
f large−Ts ≡ (σ2U∗−1)−12N
12 (Fs − H′Fs),
where we expect tails larger than the standard normal as the incorrect standard errors, used for f large−Ts , are
always smaller than the asymptotically correct ones, used for f small−Ts . In the last two graphs we also report
the histogram corresponding to the studentized PCA estimates based on the (estimated) asymptotic covariance
matrix that is valid when both N and T diverge and robust to heteroskedasticity (see Bai (2003)[Eq.(7), Section
5]).
The results are illustrated in Figures III to X. The finite-sample distribution of the f small−Ts is remarkably
close to the standard normal density when N is large, regardless of T , for instance even when T = 2. More
in details, we always report the histogram for the f small−Ts in green and for the f large−Ts in blue. Figure III
considers the case T = 2, N = 10: although N is very moderate, the f small−Ts ’s empirical distribution does
not appear too far from the shape of the standard normal although it is wrongly centered around a negative
value. Instead, using the distribution of the f large−Ts appears much more spread out than the standard normal,
the result of adopting excessively small standard errors. Figure IV still considers the very small T = 2 case
but now with N = 1, 000: now the factors appear extremely well estimated, the empirical distribution of the
f small−Ts is very close to the standard normal whereas, again, the f large−Ts exhibit much fatter tails.
5.2 Estimation of the Asymptotic Distribution of Factors 38
FIGURE III HERE
FIGURE IV HERE
As expected, as one increases T , the discrepancy between the distribution of the f small−Ts and f large−Ts
diminishes, and it is evident from Figure V and VI (T = 5) and Figures VII and VIII (T = 50). In particular,
when T = 50 there is almost no difference between the two histograms, in turn very close to the standard
normal density.
FIGURE V HERE
FIGURE VI HERE
FIGURE VII HERE
FIGURE VIII HERE
Finally, in Figures IX and X we report, together with the histograms of f small−Ts and f large−Ts , also the
histograms of the (studentized) factors’ estimated using the robust standard errors (in red) for the cases
T = 50, N = 1, 000 and T = 100, N = 1, 000, respectively. In the latter case, the three histograms are
essentially identical, and close to the standard normal density, whereas even for T = 50 the robust standard
errors are still underestimating the true asymptotic covariance matrix.29
FIGURE IX HERE
FIGURE X HERE
Next, we evaluate the 95% (asymptotic) confidence intervals for the true factors Ft. Again we consider the
average intervals across the 1000 Monte Carlo iterations. We follow Bai (2003) and construct the confidence
intervals as: (βFs − 1.96
β
N12 (A′BsA)
12
, βFs + 1.96β
N12 (A′BsA)
12
), s = 1, · · · , T,
where β ≡ (F′F)−1F′F, that is the OLS estimator from projecting the estimated factors F on the true factors F
without intercept.This provides an accurate estimation of the (inverse of the) rotation matrix H of Theorem 1.
Figures XI and XII report the times series of these confidence intervals for cases T = 5, N = 10 and T = 5, N =
1, 000 respectively.
FIGURE XI HERE
29We do not report the values corresponding to the robust standard errros for the other combinations of N and T because thecorresponding histograms is too far off from the ones of fsmall−Ts and f large−Ts .
39
FIGURE XII HERE
The confidence intervals, corresponding to T = 5, N = 10 are sufficiently accurate to include the true
factor’s realization, where the interval becomes much narrower as N increases, essentially nailing down the
time series of the true factor. Note that we set T = 5 in both figures.
6 Empirical Application
In this section we assess the performance of our method using an (unbalanced) panel of monthly individual assets
traded in the NYSE from January 1960 to December 2013 (source CRSP). Our method appears ideal to address
important empirical questions related to the possible time-variation characterizing the factor asset pricing
model. Our empirical results are all based on adopting a short rolling window of T = 12 time observations. We
have repeated all the exercises for T = 24 and T = 36 without noticeably differences of the empirical results.
6.1 Time-Variation of Number of Factors
We investigate the extent to which the number of risk factors is varying across time over long horizons, for
example whether they have been increasing along the increased sophistication of financial markets. Moreover,
we investigate whether the number of factors increases rapidly over economic expansions and financial market
booms and decrease otherwise, thus charactering the time-variation over short horizons.
Figure XIII reports the estimated number of factors from January 1960 to January 2013 using rolling
windows of T = 12 months, leading to a number of individual assets varying from 800 to 5, 000 across the
rolling samples.
FIGURE XIII HERE
The results clearly shows that the number of risk factors was small, between one and two, until the beginning
of the 1980s, possibly a sympthom of under-developed financial markets and of financial segmentation. From
then on, the number has been steadily increasing until the dot-com bubble of the 2000, reaching 10 risk factors,
then decreasing sharply to two risk factors as the financial crises unfolds. More in general, from January
2000 onward, the estimated number of factors tracks the dynamics of the financial market remarkably well,
increasing during periods of booms and decreasing during crashes: the correlation between the two series is
above 80% between March 2000 and June 2009, with the number of factors ranging from two to 10. Our method
6.2 Identifying Risk Factors 40
corroborates the observation according to which the correlation across financial markets increases dramatically
during crashes, as one obtains when the number of risk factors reduces to one or two, The correlation across
the whole period (January 1996 until December 2013) is 58%.
Figure XIV shows the same times series of the estimated number of factors, with the NBER business cycle
indicator and various other macroeconomic and financial crises indicators: it is, again, remarkable how the
estimated number of factors co-vary negatively with almost every negative macroeconomic event.
FIGURE XIV HERE
Our finding might explain the disagreement emerging in the empirical asset pricing literature on the correct
number of risk factors for the universe of stock US equity returns, ranging from the small number of Fama and
French factors (three or 5), to the recent advances that suggest up to 10 risk factors. Once time-variation is
allowed for, it is clear that the notion of assuming a fixed number of risk factors appears uncorrect. Moreover,
our maximum estimated number of factors appears aligned with previous findings.
6.2 Identifying Risk Factors
We describe the dynamic behaviour of the time series of the five dominant estimated factors and, exploiting
the time variation of our approach, show how one can identify the extent to which the estimated latent risk
factors are related to some of the observed factors proposed in the empirical asset pricing literature. We focus
here on the five observed factors associated with the Fama & French (2015) asset pricing model.
Figure XV reports the first five estimated factors using rolling windows of size T = 12, where the number
of stocks varies from 800 to 5, 000 across the rolling samples.
Table VI reports the correlation, across the full sample, between the five estimated factors and the observed
factors of the Fama & French (2015) asset pricing model. It emerges that the first factor has the highest
correlation with the market excess return (mkt), the second factor with the profitability return (rmw) and
with the high-minus-low return (hml), and the third factor with the small-minus-large return (smb) and, in
part, with the investment return (cma). The fourth and fifth factors appear mildly correlated with all the five
observed returns. It is interesting to compare these correlations with the ones obtained during the sub-prime
financial crises crash (identified as Aug 2007 to Feb 2009), reported in Table VII: the first factors is even more
prominently related to mkt, the second factor to hml, the third factor to both smb and hml, the fourth factor
especially to cma and, to a lesser extent, to hml and rmw, and, finally, the fifth factor to cma.
6.3 Detecting the State Variables for Loadings 41
Therefore, the links of the first three estimated risk factors with the Fama & French (2015) factors appear
robust, although varying with time. This can be better appreciated by looking Figures XVI, XVII and XVIII,
where we report the time series of the estimated correlations between the PCA estimates and the Fama &
French (2015) five factors, over rolling windows of 60 observations. In particular, Figure XVI shows how the
market excess return is clearly related to the first PCA, the smb return to the second PCA and the hml return
to the third PCA. Although the magnitude of these estimated correlations vary through time, it is evident
that in general the link becomes more tenuous from the late 1990s. Figures XVII and XVIII show the link of
the rmw and cma returns with the fourth and fifth PCA: it appears that rmw and cma are related to a linear
combination of those PCAs. Again the strength of the relationship appears to fade away over the last part
of the sample. Our evidence on the larger number of estimated PCA factors explaining the cross-section of
returns from the 1990s onwards appears aligned with this attenuation of the estimated correlations. A more
economically meaningful measure of the (dynamic) importance of the Fama & French (2015) five factors is
obtained by relating them to the estimated SDF, as ilustrated below. There, we confirm that, indeed, the
relevance of the Fama and French factors appears strong in the first half of the our sample period, until the
early 1990s, and much weaker thereafter, except during the most volatile periods of the financial crises. This
agrees with our empirical finding, above, suggesting that only a small number of risk factors is relevant during
such periods.
FIGURE XV HERE
TABLES VI and VII HERE
FIGURES XVI, XVII, XVIII HERE
6.3 Detecting the State Variables for Loadings
We examine the time variation of the estimated equally-weighted portfolio return:
µewt−1 ≡ wew′N Λt−1γ,
setting wewN = N−11N . Under our assumptions, µewt−1 →p µ
ewt−1 ≡ µew′Λt−1
γP . As already indicated, µewt−1 is
rotation-free, and thus its time-variation must be attributed to the dynamics of the loadings, and not induced
by the rotation matrix. In fact, our identification strategy is based on making the expected portfolio return
above function of the average risk premia over the T data points, namely γP , as opposed to considering the
time-varying risk premia γPt−1.
6.4 Pricing Performance of Risk Premia 42
Figure XIX reports the time series of µewt−1 together with four state-variables often advocated to drive the
dynamics of the loadings to the risk factors, namely the market dividend yield (ratio of aggregate dividends to
the S&P 500 index; source Robert Shiller’s website), the default spread (Moody’s Seasoned Baa Corporate Bond
Yield Relative to Yield on 10-Year Treasury Constant Maturity, Not Seasonally Adjusted; source FRED), the
term spread (10-Year Treasury Constant Maturity Minus 3-Month Treasury Constant Maturity, Not Seasonally
Adjusted; source FRED) and the CAPE ratio (Cyclically Adjusted Price Earnings Ratio P/E10; source Robert
Shiller website). Several papers suggest the explanatory power of these state-variables (see Gagliardini et al.
(2016) and Kelly et al. (2018) among others). Data are monthly from Dec 1960 until Dec 2013.
FIGURE XIX HERE
The prominent time-variation of the portfolio estimated expected return is evident. We now assess more
accurately the dynamic relationships between these variables, in particular trying to quantify the extent to
which the portfolio estimated expected return moved together with the four state variables. Figure XX reports
the time series of the adjusted-R2 of the regression, over rolling windows of 60 observations, of µewt−1 over the
four above described state-variables, both jointly (black line) and also taken one by one (red line for CAPE,
green line for term spread, blue line for default spread, light blue for dividend yield).
The results from the regressions reported in figure XX show that the time-variation of the risk factors’
loadings appears strongly driven by the four state-variables when taken jointly, except between the late 1990s
and the mid 2000s. The CAPE appears to be more relevant in the first part of the sample. Interestingly,
the dividend yield seems to explain time-variation of loadings in the aftermath of financial and economic crisis
whereas the spread variables appear more relevant during booms and expansions.
FIGURE XX HERE
6.4 Pricing Performance of Risk Premia
We report in Figures XXI,XXII and XXIII the time series of the estimated (ex-post) risk premia for risk factor
one, two and three, respectively (black line). When these are statistically significant, positive or negative, at
5% significance value, we report them in green and red, respectively. Notice that, for risk premia two and three,
sometimes the lines are missing, corresponding to the rolling windows for which the estimated number of factor
is smaller, respectively, than two and three. The grey bars indicates financial crises and recessions. Notice that
the confidence bands have a time-varying width because are based on a time-varying number of stocks (from
6.4 Pricing Performance of Risk Premia 43
800 to 5, 000). It emerges that the estimated risk premia have some tendency for being anti-cyclical and, more
importantly, their variability increases substantially in the aftermath of crises. Moreover, as illustrated by our
theory, the risk premium associated with the highest-ranked factor (in terms of the corresponding eigenvalues)
is more precisely estimated, and the precision progressively diminishes as one considers the second, the third,
and so forth, risk premium.
FIGURES XXI,XXII,XXIII HERE.
To quantify the pricing performance of the estimated risk premia, we consider the empirical pricing errors
stemming from our PCA procedure:
δit−1 ≡ xi − λ′it−1γ. (42)
Note that these are almost identical to the estimated intercept from the time series regression of the i-th excess
returns xis on the PCA estimates Fs,t, namely αit−1 ≡ xi − β′it−1γ, with the estimated regression coefficient
given by βit−1 ≡ (F′M1T F)−1F′M1Tx.i. The difference between the δit−1 and the αit−1 is that the former rely
on the estimated loadings λit−1 = (F′F)−1F′x.i whereas the αit−1 rely on the βit−1. The popular GRS test of
correct specification of Gibbons et al. (1989) is based on the αit−1.30
We evaluate the following statistic N−1∑N
i=1 δ2it−1, corresponding to any given rolling window Tt of size T .
However, under the null hypothesis of correct model specification, namely Eq. (2), it follows that δit−1 →p
ai + ei − (e′.iF/T )H′t−1F, where ai = (ai0, · · · , aiT−1)1T /T and ei = e′.i1T /T , implying
1
N
N∑i=1
δ2it−1 →p
σ2t−1
T(1− γP ′Ht−1H
′t−1γ
P ) as N →∞.
The above finding, namely that the population pricing errors are not negligible even under the hypothesis of
correct pricing when T is fixed, has been discovered by Raponi et al. (2018) in the context of testing beta-pricing
models with observed risk factors. Here we derive the analogue result the latent risk factors case. In view of
this, we plot the time series of the centered average squared (sample) pricing errors:
∆t−1 ≡1
N
N∑i=1
δ2it−1 −
1
Tσ2t−1(1− γ ′γ),
in order to quantify the pricing performance of the PCA-estimated asset pricing model. Figure XXIV plots the
time series of the ∆t−1 over rolling windows of size T = 12 corresponding to various cases: when the number
of factors equal kt−1 (as reported in Figure XIII) which is presumably the correct model (black line), and also
30Their result, namely that a suitably standartized quadratic form of the αit−1 is exactly distributed like a chi-square, holds fora fixed N and T .
6.4 Pricing Performance of Risk Premia 44
when the number of factors is arbitrarily kept constant, such as k = 1, (red line), k = 2 (green line), k = 3 (blue
line) and k = 5 (light blue). It is evident how the one-factor model leads to the poorest performance whereas
the model associated with kt−1 gives the best performance, yet close to the three- and five-factor model.
FIGURE XXIV HERE.
Another, alternative, way to assess whether the conditional factor asset pricing model adds economic value
is to construct mean-variance efficient portfolios and to assess its out-of-sample performance, for instance in
terms of Sharpe ratios. Specifically, assuming that the universe of assets is driven by the factor time-varying
asset pricing model (1) with rt−1 risk factors, we construct time series of (tangency) portfolio excess returns as:
rmv,kt−1
t ≡ wmv′t−1xt.,
setting
wmvt−1 ≡
Σ−1t−1Λt−1γ
1′NΣ−1t−1Λt−1γ
with Σt−1 ≡(Λt−1ΩΛ′t−1 + σ2
t−1IN
).
The results are reported in Table VIII. We have evaluated the portfolio weights over rolling time-windows of size
T = 36 where the (time-varying) number of assumed risk factors is either taken from Section 6.1 above (based on
our selection criterion), namely kt−1, or assumed fixed throughout the exercise (k = 1, k = 5 and k = 10). The
portfolio returns rmv,ktt have been carefully constructed in real time, meaning that the corresponding portfolio
weights only use past information, in particular data from period t − 35 until t − 1. We have also evaluated
the t-ratios to compare each of the Sharpe ratios with the one associated with the equally weighted portfolios.
Such t-ratios are asymptotically standard normal under the null hypothesis of equality (see Lo (2003)).
TABLE VIII
In summary, the conditional factor asset pricing model (1), corresponding to the estimated number of risk
factors obtained using our criteria (namely kt−1), appears to perform well in terms of out-of-sample Sharpe
ratios. In particular, the corresponding optimal portfolio statistically beats the equally weighted portfolio, which
is usually a hard benchmark to achieve for any model-based portfolio strategy (see DeMiguel et al. (2009)).
Our approach, based on the estimated time-varying number of risk factors kt−1, also beats the cases based on
assuming a constant number of factors: for instance, the results are particularly weak when assuming k = 10
risk factors, a symptom of the over-fitting arising from selecting too many risk factors.
6.5 Dynamic Spanning of the SDF 45
6.5 Dynamic Spanning of the SDF
We examine the dynamic properties of the estimated SDF mt,t+1 (black line), whose times series is reported in
Figure XXV, together with its 95% confidence band (red lines) where, as before, the grey bands correspond to
recessions and financial crises. The results are markedly indicating the anti-cyclicality of the SDF, confirming
its interpretation as an index of bad times. We consider here rolling windows of size T = 24 (two years).
In fact, when when T = 12, one obtains both large swings in the estimated SDF and, correspondingly,
confidence intervals with a large width, especially towards the second half of the time period, due to the
proximity of the chosen short time window (T = 12) and the relatively large number of estimated factors, with
kt−1 often taking values up to 10. This causes quasi-singularity of Ω and thus excessively large estimates of
the SDF and their standard errors.31
FIGURE XXV HERE.
More importantly, we examine the extent to which the Fama & French (2015) five factors span, in part
the SDF. The results are reported in Figure XXVI, where we report the adjusted-R2 from regressing the SDF,
respectively, on the market return, the three Fama & French (1993) factors and the five Fama & French (2015)
factors, using rolling windows of 60 months. It emerges that all three models explained a sizeable amount of
time-variation of the SDF until the 1990s. Thereafter, the CAPM poorly spans the SDF whereas the three-
and five-factor models seem more relevant, although the magnitude of explained R2 diminishes, with positive
spikes observed in the aftermath of financial and economics crises, for instance especially after the 2007 financial
crises. This fact squares with our finding regarding the small number of risk factors determined during market
crashes, as opposed to the large number of risk factors that appear relevant during market booms.
FIGURE XXVI HERE.
7 Conclusion
This paper develops a methodology for inference on conditional asset pricing models linear in latent risk
factors, valid when the number of assets diverges but the time series dimension is fixed, possibly very small.
We show that the no-arbitrage condition implies that the PCA estimator of the risk factors is asymptotically
equivalent to the mimicking portfolios estimator. Moreover, no-arbitrage permits to identify the risk premia as
31Note that when kt−1 = T , which is its largest possible value, then the T × T matrix Ω has rank T − 1, and hence becomessingular. Thus, for accurate estimation of the SDF, it is advisable to allow for a T much larger than kt−1.
REFERENCES 46
the expectation of the latent risk factors. This result paves the way to an inferential procedure, based on the
PCA approach, for the factors’ risk premia and for the stochastic discount factor, spanned by the latent risk
factors. The strength of our set up is that it naturally handles time-varying factor models, where every feature is
allowed to be time-varying including loadings, idiosyncratic risk and the number of risk factors. Several Monte
Carlo experiments corroborate our theoretical findings. Our results represent a unique tool to address several
empirical questions in empirical asset pricing, and beyond, unthinkable with the usual PCA methodologies valid
when under double-asymptotic. For instance, our empirical analysis demonstrates how the estimated number
of risk factors varies across time, increasing during booms and sharply decreasing during financial crises. We
also show how the time-variation of the risk factors’ loadings appear driven by the interest rate spread variables
and dividend and earnings variables, especially during financial crises. The estimates of the SDF implied by
our cross-section of individual returns exhibit a substantial anti-cyclicality and appears strongly spanned by
the Fama & French (2015) factors during market crashes, than otherwise.
References
Ahn, S. C. & Horenstein, A. R. (2013), ‘Eigenvalue ratio test for the number of factors’, Econometrica 81, 1203–
1227.
Ait-Sahalia, Y. & Xiu, D. (2017), ‘Using principal component analysis to estimate a high dimensional factor
model with high-frequency data’, Journal of Econometrics 201, 384–399.
Ait-Sahalia, Y. & Xiu, D. (2018), ‘Principal component analysis of high-frequency data’, Journal of the Amer-
ican Statistical Association 0, 1–17.
Al-Najjar, N. (1999), ‘On the robustness of factor structures to asset repackaging’, Journal of Mathematical
Economics 31, 309–320.
Amengual, D. & Watson, M. (2007), ‘Consistent estimation of the number of dynamic factors in large n and t
panel’, Journal of Business and Economic Statistics 25, 91?96.
Anatolyev, S. & Mikusheva, A. (2018), ‘Factor models with many assets: strong factors, weak factors, and the
two-pass procedure’, Working Paper, MIT .
Andersen, T., Bollerselv, T., Diebold, F. & Wu, J. (2006), Realized beta: persistence and predictability, in
T. Fomby & D. Terrell, eds, ‘Advances in Econometrics: Econometric Analysis of Economic and Financial
Time Series in Honor of R.F. Engle and C.W.J. Granger, vol. B’, Elsevier, Amsterdam, Netherlands,
pp. 1–40.
Ang, A. & Kristensen, D. (2012), ‘Testing conditional factor models’, Journal of Financial Economics 106, 132–
156.
Bai, J. (2003), ‘Inferential theory for factor models of large dimensions’, Econometrica 71, 135–171.
REFERENCES 47
Bai, J. & Ng, S. (2002), ‘Determining the number of factors in approximate factor models’, Econometrica
70, 191–221.
Bai, J. & Zhou, G. (2015), ‘Fama-MacBeth two-pass regressions: Improving risk premia estimates’, Finance
Research Letters 15, 31–40.
Barigozzi, M., Hallin, M. & Soccorsi, S. (2019), ‘Time-varying general dynamic factor models and the measure-
ment of financial connectedness’, Working Paper, https://papers.ssrn.com/sol3/abstract=3329445 .
Black, F., C., J. M. & Scholes, M. (1972), The Capital Asset Pricing Model: Some empirical tests, in ‘Studies
in the Theory of Capital Markets’, Praeger, New York.
Brandon, J. B., Plagborg-Møller, M., Stock, J. & Watson, M. (2013), ‘Consistent factor estimation in dynamic
factor models with structural instability’, Journal of Econometrics 177, 289–304.
Breeden, D. T., Gibbons, M. R. & Litzenberger, R. H. (1989), ‘Empirical test of the consumption-oriented
c a p m’, The Journal of Finance 44, 231–262.
Brillinger, D. R. (2001), Time series: data analysis and theory, Society for Industrial and Applied Mathematics,
Philadelphia.
Bryzgalova, S. (2014), ‘Spurious factors in linear asset pricing models’, Working Paper, Stanford University .
Burnside, C. (2016), ‘Identification and inference in linear stochastic discount factor models with excess returns’,
Journal of Financial Econometrics 14, 295–330.
Chamberlain, G. (1983), ‘Funds, factors, and diversification in arbitrage pricing models’, Econometrica
51, 1305–1323.
Chamberlain, G. & Rothschild, M. (1983a), ‘Arbitrage, factor structure, and mean-variance analysis on large
asset markets’, Econometrica 51, 1281–1304.
Chamberlain, G. & Rothschild, M. (1983b), ‘Arbitrage, factor structure, and mean-variance analysis on large
asset markets’, Econometrica 51, 1281–1304.
Cochrane, J. H. (2011), ‘Presidential address: Discount rates’, The Journal of Finance 66, 1047–1108.
Connor, G., Hagmann, M. & Linton, O. (2012), ‘Efficient estimation of a semiparametric characteristic-based
factor model of security returns’, Econometrica 80, 713–754.
Connor, G. & Korajczyk, R. (1993), ‘A test for the number of factors in an approximate factor model’, Journal
of Finance 48, 1263?1291.
Connor, G. & Korajczyk, R. A. (1986), ‘Performance measurement with the arbitrage pricing theory: A new
framework for analysis’, Journal of Financial Economics 15, 373 – 394.
Connor, G. & Linton, O. (2007), ‘Semiparametric estimation of a characteristic-based factor model of common
stock returns’, Journal of Empirical Finance 14, 694 – 717.
Dahlhaus, R. (1997), ‘Fitting time series models to nonstationary processes’, Annals of Statistics 25, 1–37.
Dahlhaus, R. (2000), ‘A likelihood approximation for locally stationary processes’, Annals of Statistics 28, 1762–
1794.
REFERENCES 48
Dello Preite, M. & Zaffaroni, P. (2019), ‘Inference on the stochastic discount factor with large data’, Working
Paper, Imperial College Business School .
DeMiguel, V., Garlappi, L. & Uppal, R. (2009), ‘Optimal versus naive diversification: How inefficient is the
1/n portfolio strategy?’, The Review of Financial Studies 22, 1915–1953.
Fama, E. F. & French, K. R. (1993), ‘Common risk factors in the returns on stocks and bonds’, Journal of
Financial Economics 33, 3–56.
Fama, E. F. & French, K. R. (2015), ‘A five-factor asset pricing model’, Journal of Financial Economics
116, 1–22.
Fama, E. F. & MacBeth, J. D. (1973), ‘Risk, return, and equilibrium: Empirical tests’, Journal of Political
Economy 81, 607–636.
Fan, J., Liao, Y. & Wang, W. (2016), ‘Projected principal component analysis in factor models’, The Annals
of Statistics 44, 219–254.
Ferson, W. E. & Harvey, C. R. (1991), ‘The variation of economic risk premiums’, Journal of Political Economy
99, 385–415.
Forni, M., Hallin, M., Lippi, M. & Zaffaroni, P. (2015), ‘Dynamic factor models with infinite-dimensional factor
spaces: One-sided representations’, Journal of Econometrics 185, 359 – 371.
Forni, M., Hallin, M., Lippi, M. & Zaffaroni, P. (2017), ‘Dynamic factor models with infinite-dimensional factor
space: Asymptotic analysis’, Journal of Econometrics 199, 74 – 92.
Forni, M. & Lippi, M. (2001), ‘The generalized dynamic factor model: Representation theory’, Econometric
Theory 17, 1113–1141.
Forni, Mario, F., Hallin, M., Lippi, M. & Reichlin, L. (2000), ‘The generalized dynamic-factor model: Identifi-
cation and estimation’, The Review of Economics and Statistics 82, 540–554.
French, K., Schwert, G. & Stambaugh, R. (1987), ‘Expected stock returns and volatility’, Journal of Financial
Economics 19, 3–29.
Gagliardini, P., Ossola, E. & Scaillet, O. (2016), ‘Time-varying risk premium in large cross-sectional equity
data sets’, Econometrica 84, 985–1046.
Ghysels, E. (1998), ‘On stable factor structures in the pricing of risk: do time-varying betas help or hurt?’,
Journal of Finance 53, 549–573.
Gibbons, M. R., Ross, S. A. & Shanken, J. (1989), ‘A test of the efficiency of a given portfolio’, Econometrica
57, 1121–1152.
Giglio, S. & Xiu, D. (2017), ‘Inference on risk premia in the presence of omitted factors’, Working Paper 23527,
National Bureau of Economic Research .
Gospodinov, N., Kan, R. & Robotti, C. (2014), ‘Misspecification-robust inference in linear asset-pricing models
with irrelevant risk factors’, The Review of Financial Studies 27, 2139–2170.
Gospodinov, N., Kan, R. & Robotti, C. (2017), ‘Spurious inference in reduced-rank asset-pricing models’,
Econometrica 85, 1613–1628.
REFERENCES 49
Hallin, M. & Liska, R. (2007), ‘Determining the number of factors in the general dynamic factor model’, Journal
of the American Statistical Association 102, 603?617.
Hansen, L. P. & Richard, S. F. (1987), ‘The role of conditioning information in deducing testable restrictions
implied by dynamic asset pricing models’, Econometrica 55, 587–613.
Harvey, C. R. (2001), ‘The specification of conditional expectations’, Journal of Empirical Finance 8, 573–637.
Harvey, C. R., Liu, Y. & Zhu, H. (2016), ‘. . . and the cross-section of expected returns’, The Review of Financial
Studies 29, 5–68.
Huberman, G., Kandel, S. & Stambaugh, R. F. (1987), ‘Mimicking portfolios and exact arbitrage pricing’, The
Journal of Finance 42, 1–0.
Ingersoll, J. E. (1984), ‘Some results in the theory of arbitrage pricing’, The Journal of Finance 39, 1021–1039.
Jagannathan, R. & Wang, Z. (1996), ‘The conditional capm and the cross-section of expected returns’, The
Journal of finance 51, 3–53.
Jagannathan, R. & Wang, Z. (1998), ‘An asymptotic theory for estimating beta-pricing models using cross-
sectional regression’, The Journal of Finance 53, 1285?1309.
Jung, S. & Marron, J. S. (2009), ‘PCA consistency in high dimension, low sample size context’, The Annals of
Statistics 37, 4104–4130.
Kan, R. & Zhang, C. (1999a), ‘GMM tests of stochastic discount factor models with useless factors’, Journal
of Financial Economics 54, 103?127.
Kan, R. & Zhang, C. (1999b), ‘Two-pass tests of asset pricing models’, Journal of Finance 54, 203?235.
Kelly, B. T., Pruitt, S. & Su, Y. (2017), ‘Instrumented principal component analysis’, Working Paper, SSRN:
https://ssrn.com/abstract=2983919 or http://dx.doi.org/10.2139/ssrn.2983919 .
Kelly, B. T., Pruitt, S. & Su, Y. (2018), ‘Characteristics are covariances: A unified model of risk and return.’,
Working Paper, Yale University .
Kim, S. & Korajczyk, R. A. (2019), ‘Large sample estimators of the stochastic discount factor’, Working Paper,
SSRN: https://papers.ssrn.com/abstract=3131274 .
Kleibergen, F. (2009), ‘Tests of risk premia in linear factor models’, Journal of Econometrics 149, 149–173.
Kozak, S., Nagel, S. & Santosh, S. (2018), ‘Interpreting factor models’, The Journal of Finance 73, 1183–1223.
Kuersteiner, G. M. & Prucha, I. R. (2013), ‘Limit theory for panel data models with cross–sectional dependence
and sequential exogeneity’, Journal of Econometrics 174, 107–126.
Lettau, M. & Ludvigson, S. (2001), ‘Resurrecting the (c) capm: A cross-sectional test when risk premia are
time-varying’, Journal of Political Economy 109, 1238–1287.
Lettau, M. & Pelger, M. (2018a), ‘Estimating latent asset-pricing factors’, Working Paper, SSRN:
https://ssrn.com/abstract=3175556 .
Lettau, M. & Pelger, M. (2018b), ‘Factors that fit the time-series and cross-section of stock returns’, Working
Paper, SSRN: https://ssrn.com/abstract=3211106 .
REFERENCES 50
Lewellen, J. & Nagel, S. (2006), ‘The conditional capm does not explain asset-pricing anomalies’, Journal of
Financial Economics 82, 289–314.
Lintner, J. (1965), ‘The valuation of risk assets and the selection of risky investments in stock portfolios and
capital budgets’, The Review of Economics and Statistics 47, 13–37.
Lo, A. (2003), ‘The statistics of Sharpe ratios’, Financial Analysts Journal 58, 36–52.
Magnus, J. & Neudecker, R. H. (2007), Matrix Differential Calculus with Applications in Statistics and Econo-
metrics., J. Wiley & Sons, Revised Edition, Chicester.
Merton, R. (1973), ‘An Intertemporal Capital Asset Pricing Model’, Econometrica 41, 867–87.
Merton, R. (1980), ‘On estimating the expected return on the market: An exploratory investigation’, Journal
of Financial Economics 8, 323–361.
Motta, G., Hafner, C. M. & von Sachs, R. (2011), ‘Locally stationary factor models: identification and non-
parametric estimation’, Econometric Theory 27, 1279–1319.
Onatski, A. (2009), ‘Testing hypotheses about the number of factors in large factor models’, Econometrica
77, 1447–1479.
Onatski, A. (2010), ‘Determining the number of factors from empirical distribution of eigenvalues’, The Review
of Economics and Statistics 92, 1004–1016.
Pukthuanthong, K. & Roll, R. (2017), ‘An agnostic and practically useful estimator of the stochastic discount
factor’, Working Paper, Caltech .
Raponi, V., Robotti, C. & Zaffaroni, P. (2018), ‘Testing beta-pricing models using large cross-section’, Working
Paper, Imperial College London . Forthcoming, Review of Financial Studies.
Raponi, V. & Zaffaroni, P. (2018), ‘Dissecting spurious factors with cross-sectional regressions’, Working Paper,
Imperial College London .
Reisman, H. (1992), ‘Intertemporal arbitrage pricing theory’, The Review of Financial Studies 5, 105–122.
Ross, S. A. (1976), ‘The Arbitrage Theory of Capital Asset Pricing’, Journal of Economic Theory 13, 341–360.
Shanken, J. (1990), ‘Intertemporal asset pricing: an empirical investigation’, Journal of Econometrics 45, 99–
120.
Shanken, J. (1992), ‘On the estimation of beta-pricing models’, The Review of Financial Studies 5, 1–33.
Sharpe, W. F. (1964), ‘Capital asset prices: a theory of market equilibrium under conditions of risk’, The
Journal of Finance 19, 425–442.
Shen, D., Shen, H. & Marron, J. S. (2013), ‘Consistency of sparse PCA in high dimension, low sample size
contexts’, Journal of Multivariate Analysis 115, 317 – 333.
Stambaugh, R. F. (1983), ‘Arbitrage pricing with information’, Journal of Financial Economics 12, 357–369.
Stock, J. H. & Watson, M. W. (2002a), ‘Forecasting using principal components from a large number of
predictors’, Journal of the American Statistical Association 97, 1167–1179.
51
Stock, J. H. & Watson, M. W. (2002b), ‘Macroeconomic forecasting using diffusion indexes’, Journal of Business
& Economic Statistics 20, 147–162.
Su, L. & Wang, X. (2017), ‘On time-varying factor models: Estimation and testing’, Journal of Econometrics
198, 84–101.
Uppal, R., Zaffaroni, P. & Zviadadze, I. (2018), ‘Correcting misspecified stochastic discount factors’, Working
Paper, Edhec-Imperial College London-Stockholm School of Economics .
8 Appendixes
This section contains five appendixes: Appendix A sets the mathematical notation adopted in the paper;
Appendix B presents four ancillary propositions ; Appendix C presents the technical lemmas necessary for the
proofs of the main theorems, which are then reported in Appendix D; Appendix E reports the form of the
covariance matrix of the (e.i ⊗ e.i).
To simplify arguments, we report the proofs for the static case, that is when ait−1 = ai,λit−1 = λi, σ2t−1 =
σ2, rt−1 = r. Generalization to the time-varying case is obtained without further difficulties in most cases.32
Likewise,∑
s∈Tt will be replaced by∑T
s=1 throughout all the proofs.
8.1 Appendix A: Notation
The following notation is adopted throughout the paper: a.s. means almost surely; 1A denotes the indicator
function equal to one when event A holds; ιt denotes the t−th column of the T×T identity matrix IT ; C denotes
a finite positive constant, not always the same; ab and Ab×c denote a generic vector and matrix, respectively, of
size b×1 and b×c implying that 0a and 0a×b are the a×1 vector and the a×b matrix of zeros matrix, respectively;
diag(A) is the diagonal matrix with the diagonal elements of the matrix A; for any full column rank matrix A
of size T × a, set the projection matrixes PA ≡ A(A′A)−1A′ and MA ≡ IT −A(A′A)−1A′; gj(A) defines the
j − th eigenvalue in decreasing order of any symmetric a× a matrix A implying g1(A) ≥ ga(A); ⊗ denotes the
Kroenecker product; vec(.) and trace(.) denote the vec and trace operators; Z = · · · ,−1, 0, 1, · · · denotes
the set of relative numbers; the o(·), O(·) and op(·), Op(·) notation is adopted for scalars and finite-dimensional
vectors and matrixes (whose number of rows and columns are not a function of N);→p,→d denote convergence
in probability and distribution, respectively; E(.),Var(.),Cov(.) and Et(.),Vart(.),Covt(.) indicate, respectively,
the unconditional mean, variance, covariance and the conditional mean, variance, covariance with respect to
all the available information up to time t.
32Further details for the time-varying case are provided for the proofs of the main theorem, when not straightforward.
8.2 Appendix B: Propositions 52
8.2 Appendix B: Propositions
Proposition 1 (alternative representations of the objective function)
(i) For any arbitrary T ×k matrix A = (a1, · · · ,aT )′ of rank k, and arbitrary N ×k matrix B = (b1, · · · ,bN )′,
V (k,A) ≡ minB
1
NT
N∑i=1
T∑t=1
(xit −B′iAt)2
=1
NTminB
trace(
(X′ −BA′)(X′ −BA′)′)
=1
NTminB
trace(
(X−AB′)(X′ −BA′))
=1
NTtrace
(MAXX′
).
(ii) Setting A equal to the T × k matrix of eigenvectors of XX′/NT , times√T , associated with its k largest
eigenvalues,
V (k) ≡ minAV (k,A) =
1
NT
(trace(XX′)− trace(PAXX′)
)=
1
NT
(trace(XX′)− trace(A′XX′A
T))
= (T∑t=1
gt(XX′
NT)−
k∑t=1
gt(XX′
NT)) = (trace(VT )− trace(Vk)),
setting Va ≡ diag(g1(XX′/NT ) · · · ga(XX′/NT )), for every 1 ≤ a ≤ T .
(iii) V (k) is unchanged when replacing A by AC for any non-singular k × k matrix C.
Proposition 2 (pricing errors associated with the ex-post SDF mPt−1,t) Let
Ω ≡ cov(ft) be a positive definite matrix,
gt ≡ Ω−12 (ft − E(ft)),
such that:
sup1≤t≤T
T∑s=1
‖ E(g′tgs
)‖= O(1),
sup1≤t,v≤T
T∑s=1
‖ E(gtg′sgv) ‖= O(1),
sup1≤t≤T
T∑s=1
‖ E(gtg′tgsg
′s
)− Ir ‖= O(1),
g1(T−1T∑t=1
gtg′t) ≥ C > 0 a.s. when T large enough.
8.2 Appendix B: Propositions 53
Then, when (1) holds with ai = 0 for every i, the ex-post SDF mPt−1,t, defined in (19), satisfies:
E(mPt−1,txit) = O(T−1) for every i. (43)
Proof.
E(mPt−1,txit) = E
[(
1
rf− 1
rfγP′Ω−1(ft − f))(λ′iγ + λ′i(ft − E(ft)) + eit)
]=λ′iγ
rf− λ
′i
rfE(
(ft − E(ft))(ft − f)′Ω−1γP)− λ
′iγ
rfE(
(ft − f)′Ω−1γP)
=λ′iγ
rf− λ
′i
rfE(
(ft − E(ft))(ft − f)′Ω−1(γ + f − E(ft)))− λ
′iγ
rfE(
(ft − f)′Ω−1(γ + f − E(ft)))
=λ′iγ
rf+ I + II.
For I
I = −λ′i
rfE((ft − E(ft))(ft − f)′Ω−1γP
)+−λ
′i
rfE(
(ft − E(ft))(ft − f)′(Ω−1 −Ω−1)γP)
= −λ′iγ
rf(1− 1
T)− λ
′i
rfE((ft − E(ft))(ft − f)′Ω−1(f − E(f)
)−λ
′i
rfE(
(ft − E(ft))(ft − f)′(Ω−1 −Ω−1)γ)− λ
′i
rfE(
(ft − E(ft))(ft − f)′(Ω−1 −Ω−1)(f − E(f)))
= −λ′iγ
rf(1− 1
T)− λ
′i
rfE((ft − E(ft))(ft − f)′Ω−1(f − E(f)
)− λ
′i
rfE(
(ft − E(ft))(ft − E(ft))′(Ω−1 −Ω−1)γ
)−λ
′i
rfE(
(ft − E(ft))(E(ft)− f)′(Ω−1 −Ω−1)γ)− λ
′i
rfE(
(ft − E(ft))(ft − f)′(Ω−1 −Ω−1)(f − E(f))
= −λ′iγ
rf(1− 1
T) + I2 + I3 + I4 + I5.
Term I2 satisfies:
E((ft − E(ft))(ft − f)′Ω−1(f − E(f))
)= E
((ft − E(ft))(ft − E(ft)
′Ω−1(f − E(f)))− E
((ft − E(ft))(f − E(ft)
′Ω−1(f − E(f)))
= O(T−1).
The two term on the right hand side of the last expression are made by a finite number of terms like
T−1∑T
s=1 E((fat−E(fat))(fbt−E(fbt))(fcs−E(fcs))) and T−2∑T
s,v=1 E((fat−E(fat))(fbv−E(fbv))(fcs−E(fcs))),
respectively, all of which are O(T−1) in absolute value, whereas for the second term
Term I3 is clearly o(1) as, when T is large enough,
‖ E(
(ft − E(ft))(ft − E(ft))′(Ω−1 −Ω−1)
)‖=‖ E
((ft − E(ft))(ft − E(ft))
′Ω−1(Ω−Ω)Ω−1)‖
= O(C−1 ‖ E
((ft − E(ft))(ft − E(ft))
′Ω−1(Ω−Ω) ‖))
= O(‖ E(
(ft − E(ft))(ft − E(ft))′Ω−1Ω
)−Ω ‖
)= O
(‖ Ω
12E(gtg′tΩ− 1
2 ΩΩ−12 − Ir
)Ω
12 ‖)
= O
(‖ Ω
12E
(gtg′t(
1
T
T∑s=1
gsg′s)− Ir
)Ω
12 ‖
)= O(T−1).
8.2 Appendix B: Propositions 54
Terms I4 and I5 are also O(T−1) by Holder’s inequality, as they each contain two terms that are of order
O(T−12 ), namely E ‖ Ω−1 −Ω−1 ‖= O(T−
12 ), E ‖ f − E(f) ‖= O(T−
12 ).
Consider now part II.
E(
(ft − f)′Ω−1(γ + f − E(ft)))
= E((ft − f)′Ω−1(γ + f − E(ft))
)+ E
((ft − f)′(Ω−1 −Ω−1)(γ + f − E(ft))
)Then
E((ft − f)′Ω−1(γ + f − E(ft))
)= E
((ft − E(ft))
′Ω−1(f − E(ft)))
+ E((E(ft)− f)′Ω−1(f − E(ft))
)= E(g′tg)− E(g′g) = O(T−1),
and
E(
(ft − f)′(Ω−1 −Ω−1)(γ + f − E(ft)))
= E(
(ft − f)′(Ω−1 −Ω−1))γ +O(T−1)
= E(
(ft − f)′Ω−1(Ω− Ω)Ω−1)γ +O(T−1) = O(T−1),
as
‖ E(
(ft − f)′Ω−1(Ω− Ω)Ω−1)‖= O(C−1 ‖ E
((ft − f)′Ω−1(Ω− Ω)
)‖)
= O(‖ E(
(ft − f)′Ω−12 (Ir −Ω−
12 ΩΩ−
12 )Ω
12
)‖)
= O
(‖ E
(g′t(Ir −
1
T
T∑s=1
gsg′s)
)Ω
12 ‖
)
= O
(1
T
T∑s=1
‖ E(g′tgsg′s) ‖
)= O(T−1),
where take into account that the terms involving g are of smaller order. Collecting terms
E(mPt−1,txit) =
λ′iγ
rf− λ
′iγ
rf(1− 1
T) +O(T−1) +O(T−1) = O(T−1).
QED
Remark. Our assumptions on the latent factors ft are weaker than (temporal) iid and zero mixed-third moments.
Remark. Proposition 2 can be extended to the case ai 6= 0. However, we only present case ai = 0 as our PCA
procedure, valid when N becomes large, does not allow to identify the pricing errors ai.
Proposition 3 (equivalence between risk premia estimators) Under Assumption 1:
γtwopass = γ.
Proof. The result follows from the identifies N−1Λ′Λ = V and (NT )−1F′XX′ = VF′, as:
γtwopass = (Λ′Λ
N)−1 Λ′xN
N= V−1 Λ′X′
N
1TT
= V−1 F′XX′
NT
1TT
= V−1VF′1TT
= γ.
QED
8.2 Appendix B: Propositions 55
Proposition 4 (conditional APT) Under Assumptions 1, 2 and 5:
(i) there exists a K1t−1 <∞ a.s such that
supN
a′t−1Σ−1et−1at−1 ≤ K1t−1 a.s. for every t ∈ Z. (44)
(ii) When, in addition, with supN g1(Σet−1) ≤ C <∞, there exists exists a K2t−1 <∞ a.s such that
supN
a′t−1at−1 ≤ K2t−1 a.s. for every t ∈ Z. (45)
(iii) When, in addition, E(γ ′t−1γt−1) ≤ C <∞, E(K2t−1) ≤ C <∞ and the smoothness condition∑∞
i=1 (E([λit−1 − E(λit−1|It−1)]′[λit−1 − E(λit−1|It−1)]))12 ≤
C <∞ holds, then for any information set It−1 coarser than all the available information at time t− 1, there
exists K3t−1 <∞ a.s. such that:
supN
N∑i=1
(E(xit|It−1)− E(λit−1|It−1)′E(γt−1|It−1)
)2 ≤ K3t−1 a.s. for every t ∈ Z. (46)
Proof. Part (i) follows along the lines of Ingersoll (1984)[Theorem 1] and part (ii) follows along the lines of
Ingersoll (1984)[Theorem 3], applied to the conditional factor model (1).
For part (iii) we generalize Stambaugh (1983)[Theorem 2] to the case of time-varying loadings. From part
(ii)
E(K2t−1|It−1) ≥ E
(N∑i=1
(µit−1 − λ′it−1γt−1)2|It−1
)≥
N∑i=1
(E(µit−1|It−1)− E(λ′it−1γt−1|It−1)
)2=
N∑i=1
(E(µit−1|It−1)− E(λ′it−1|It−1)E(γt−1|It−1)− E([λit−1 − E(λit−1|It−1)]′γt−1|It−1)
)2.
By our assumption it follows that E(K2t−1|It−1) < ∞ a.s. Therefore, the factor asset pricing model assumed
by the investor with the coarser information set It−1 satisfies the pricing errors bound
N∑i=1
(E(µit−1|It−1)− E(λ′it−1|It−1)E(γt−1|It−1)
)2 ≤ K3t−1 <∞ a.s.
for some K3t−1 if |∑∞
i=1 E([λit−1 − E(λit−1|It−1)]′γt−1|It−1)| <∞ a.s.. In turn, this follows by
E|∞∑i=1
E([λit−1 − E(λit−1|It−1)]′γt−1|It−1)|
≤∞∑i=1
E(
([λit−1 − E(λit−1|It−1)]′[λit−1 − E(λit−1|It−1)])12 (γ ′t−1γt−1)
12
)≤(E(γ ′t−1γt−1)
) 12
∞∑i=1
(E([λit−1 − E(λit−1|It−1)]′[λit−1 − E(λit−1|It−1)])
) 12 ≤ C <∞.
QED
Remark. The smoothness assumption on the loadings is trivially satisfied for the static case, namely λit−1 = λi,
which is the case examined by Stambaugh (1983). However, it will be more generally satisfied in a variety of
8.3 Appendix C: Lemmas 56
dynamic set-ups. For instance, when λis = λi(s) for some differentiable function λi(·), as in Assumption 9, and
assuming for simplicity that the coarser information set simply means that information is acquired with some
delay, so that E(λit−1|It−1) = λis−1 for some s < t, then λit−1−E(λit−1|It−1) = (t−s)λ(1)is∗ and the smoothness
assumption can be expressed as
E(tr(Λ
(1)′t−1Λ
(1)t−1)
)≤ C <∞ for every t ∈ Z.
Remark. As indicated by Stambaugh (1983)[Theorem 3], a special case of a coarser information set consists of
no information, leading to a restriction on the unconditional pricing errors E(ait).
Remark. Stambaugh (1983)[Lemma 1] derives condition E(K2t−1|It−1) < ∞ under distributional assumptions
and static loadings.
8.3 Appendix C: Lemmas
Lemma 1 Under Assumption 1 and 3, as N →∞,
‖ Λ′a ‖= Op(N12 ).
Proof.
‖ Λ′a ‖≤N∑i=1
|ai| ‖ λi ‖≤ (N∑i=1
a2i )
12 (
N∑i=1
‖ λi ‖2)12 = Op(N
12 ).
QED
Lemma 2 Under Assumption 1 and 5, as N →∞,
‖ e′a ‖= Op(g121 (Σe)).
Proof.
e′a = Op((N∑
i,j=1
aiajE(e.ie′.j))
12 ) = Op((
N∑i,j=1
aiajσij)12 ) = Op((a
′a)12 g
121 (Σe)) = Op(g
121 (Σe)).
QED
Lemma 3 (central lemma) Under Assumptions 1-6, as N →∞,
(i)1
TF′( 1
NTXX′ − 1
Tσ2IT
)F = U ≡ (V − 1
Tσ2Ir)→p U > 0.
F′F
T
(Λ′Λ
N
)F′F
T→p U,
where recall that U denotes the diagonal r × r matrix of eigenvalues of(
1NT XX′ − 1
T σ2IT
)corresponding to
F, and U denotes the diagonal r × r matrix of distinct eigenvalues of Σ12ΛΣFΣ
12Λ.
8.3 Appendix C: Lemmas 57
(ii)F′F
T→p Q ≡ U
12 ΥΣ
− 12
Λ ,
for a non-singular Q, where Υ is the r × r eigenvectors matrix of Σ12ΛΣFΣ
12Λ.
Proof. Consider case ai = 0 first. We will then show that the no-arbitrage condition implies that the ais will
be washed out as N diverges.
(i) The result follows given that
1
NTXX′ =
1
TF(
Λ′Λ
N)F′ +
ee′
NT+ op(1)→p
1
TFΣΛF′ +
σ2
TIT .
Note that, as highlighted in Bai (2003), Lemma D.1, 1NT XX′ and
(1NT XX′ − 1
T σ2IT
)have the same set
of eigenvectors, but different eigenvalues. In particular, the eigenvalues corresponding to F equal V and
U, respectively, for the two matrixes. Continuity of the eigenvalue function together with Slutzky theorem
concludes, recalling that the the non-zero eigenvalues of T−1FΣΛF′ coincide with the entire set of r eigenvalues
of Σ12ΛΣFΣ
12Λ. The second part of (i) follows directly.
Now consider the general case when the ai 6= 0. Then, from X = 1Ta′ + FΛ′ + e,
1
NTXX′ =
1
TF(
Λ′Λ
N)F′ +
ee′
NT+ 1T
a′a
NT1′T +
1Ta′e′
NT+
ea1′TNT
+1Ta′ΛF′
NT+
FΛ′a1′TNT
+ op(1)
→p1
TFΣΛF′ +
σ2
TIT ,
by Lemmas 1 and 2. The rest of the proof follows along the same lines.
(ii) Consider case ai = 0 first. We adapt the proof of Bai (2003), Proposition 1. In fact, one needs to replace
XX′/NT with (XX′/NT −σ2IT /T ) to take into account the fact that the term ee′/NT is non-negligible when
T is fixed, and it would contribute to the non-zero eigenvalues (as N →∞) even though it is not related to the
common component.
In particular, given (XX′
NT− σ2
TIT
)F = F
(Vr −
σ2
TIr) = FU,
pre-multiplying both sides by T−1(Λ′Λ/N)12 F′, and re-arranging terms, yields
(Λ′Λ
N)12F′
T
(XX′
NT− σ2
TIT
)F = (
Λ′Λ
N)12F′F
T
Λ′Λ
N
F′F
T+ CN = (
Λ′Λ
N)12F′F
TU
re-written as
(BN + CND−1N )EN = ENU,
setting EN = DN (diag(D′NDN ))−12 and
BN = (Λ′Λ
N)12F′F
T(Λ′Λ
N)12 ,
CN = (Λ′Λ
N)12
(F′F
T
Λ′e′F
NT+
F′e
TN
Λ′F′F
T+
1
T 2F′(
ee′
N− σ2IT )F
)= op(1),
DN = (Λ′Λ
N)12F′F
T,
8.3 Appendix C: Lemmas 58
as DN is a.s. invertible for N large enough by part (i). One then follows the steps in Bai (2003), but replacing
ΣF with ΣF , obtaining BN →p Σ12ΛΣFΣ
12Λ and EN →p Υ′, and thus
F′F
T= (
Λ′Λ
N)−
12 EN (diag(D′NDN ))
12 →p Σ
− 12
Λ Υ′U12 .
The same result is obtained when ai 6= 0, the only difference being that CN is now given by
CN = (Λ′Λ
N)12
(F′F
T
Λ′e′F
NT+
F′e
TN
Λ′F′F
T+
1
T 2F′(
ee′
N− σ2IT )F
+F′
T(1T
a′a
NT1′T +
1Ta′e′
NT+
ea1′TNT
+1Ta′ΛF′
NT+
FΛ′a1′TNT
)F
)= op(1),
which is still op(1) by Lemmas 1 and 2. QED
Lemma 4 (quantities for asymptotic covariance matrix) Under Assumptions 1-6, as N →∞,
(i)
σ2 ≡ T
T − rV (r)→p σ
2.
(ii)
σ4 ≡(
3 +27
T(
T∑t=1
((F′tFt)
T)2) +
18r
T
)−1( 1
NT
N∑i=1
T∑t=1
e4it
)→p σ4 + Cκκ4,
for a constant Cκ (function of r, T , F and H) defined in (50) below.
(iii)
ΣΛ ≡Λ′Λ
N− σ2
TIr →p H−1ΣΛH−1′.
Proof. Consider case ai = 0 first. We will then show that the no-arbitrage condition implies that the ais will
8.3 Appendix C: Lemmas 59
be washed out as N diverges. (i) From
eit = xit − λ′iFt = xit −(λ′iH
−1′ + (e′.iF
T)− λ′i(H−1′F′ − F′)
F
T
)(H′Ft + (Ft − H′Ft)
)= xit − λ′iFt − λ′iH−1′(Ft − H′Ft)−
((e′.iF
T)− λ′i(H−1′F′ − F′)
F
T
)Ft
= eit − λ′iH−1′(Ft − H′Ft)− (e′.iF
T)Ft + λ′i(H
−1′F′ − F′)F
TFt
yielding
V (r) =1
NT
N∑i=1
T∑t=1
e2it =
1
NT
N∑i=1
T∑t=1
e2it +
1
T 3
T∑t=1
F′tF′(
1
N
N∑i=1
e.ie′.i)FFt
+1
T
T∑t=1
(Ft − H′Ft)′H−1(
1
N
N∑i=1
λiλ′i)H
−1′(Ft − H′Ft)
+1
T 3
T∑t=1
F′tF′(FH−1 − F)(
1
N
N∑i=1
λiλ′i)(H
−1′F′ − F′)FFt
−21
NT
N∑i=1
T∑t=1
eit
(λ′iH
−1′(Ft − H′Ft) + (e′.iF
T)Ft − λ′i(H−1′F′ − F′)
FFt
T
)+2
1
NT
N∑i=1
T∑t=1
(λ′iH−1′(Ft − H′Ft))
((e′.iF
T)Ft − λ′i(H−1′F′ − F′)
FFt
T
)−2
1
NT
N∑i=1
T∑t=1
((e′.iF
T)Ft)(λ
′i(H
−1′F′ − F′)FFt
T)
= σ2 +σ2
Ttr(
F′F
T
F′F
T)− 2
σ2
T
T∑t=1
ι′t,TFFt
T+Op(N
− 12 )
= σ2 +σ2
Ttr(
F′F
T
F′F
T)− 2
σ2
Ttr(
F′F
T) +Op(N
− 12 ) = σ2(1− r
T) +Op(N
− 12 ).
Case ai 6= 0 now easily follows by Lemmas 1 and 2. To prove this, simply replace eit in the above expressions
with e∗it ≡ eit + ai, and notice that all the parts involving the ai vanish asymptotically. For instance, the first
of these terms satisfies
1
NT
N∑i=1
T∑t=1
(e∗it)2 =
1
NT
N∑i=1
T∑t=1
(eit + ai)2
1
NT
N∑i=1
T∑t=1
e2it +
1
NT
N∑i=1
T∑t=1
a2i +
2
NT
N∑i=1
T∑t=1
eitai =1
NT
N∑i=1
T∑t=1
e2it +O(N−1) +Op(N
− 12 ).
(ii) Set ai = 0. By developing N−1∑N
i=1 e4it, along the same lines as part (i), one obtains
1
NT
T∑t=1
N∑i=1
e4it =
1
NT
T∑t=1
N∑i=1
e4it +
1
NT
T∑t=1
N∑i=1
((e′.iF
T)Ft)
4 +6
NT
T∑t=1
N∑i=1
e2it((
e′.iF
T)Ft)
2
+4
NT
T∑t=1
N∑i=1
e3it((
e′.iF
T)Ft) +
4
NT
T∑t=1
N∑i=1
eit((e′.iF
T)Ft)
3 +Op(N− 1
2 ). (47)
8.3 Appendix C: Lemmas 60
Set
ct = (c1t · · · cTt)′ =1
TFHH′Ft =
1
T
((F′1HH′Ft) · · · (F′tHH′Ft) · · · (F′THH′Ft)
)′.
Then, the second term on the right hand side of (47) can be re-written as (NT )−1∑T
t=1
∑Ni=1((e′.iF/T )Ft)
4 =
(NT )−1∑T
t=1
∑Ni=1(
∑Ts=1 eiscst)
4 + op(1), with mean satisfying:
1
NT
T∑t=1
N∑i=1
E(T∑s=1
eiscst)4 =
1
NT
T∑t=1
N∑i=1
T∑s1,s2,s3,s4=1
κ4,iiii,s1,s2,s3,s4cs1tcs2tcs3tcs4t
+1
NT
T∑t=1
N∑i=1
T∑s1,s2,s3,s4=1
(σii,s1s2σii,s3s4 + σii,s1s3σii,s2s4 + σii,s1s4σii,s2s3)cs1tcs2tcs3tcs4t
→ κ41
T
T∑t=1
T∑s=1
c4st +
3σ4
T
T∑t=1
(
T∑s=1
c2st)
2,
and variance
Var
(1
NT
N∑i=1
T∑t=1
(
T∑u=1
eiucut)4
)=
1
N2T 2
N∑i,j=1
T∑t,s=1
Cov
((
T∑u=1
eiucut)4, (
T∑u=1
ejucus)4
)
=1
N2T 2
N∑i,j=1
T∑t,s=1
T∑u1,u2,u3,u4=1
T∑v1,v2,v3,v4=1
cu1tcu2tcu3tcu4tcv1scv2scv3scv4s
×Cov (eiu1eiu2eiu3eiu4 , ejv1ejv2ejv3ejv4)
=1
N2T 2
N∑i,j=1
T∑t,s=1
T∑u1,u2,u3,u4=1
T∑v1,v2,v3,v4=1
cu1tcu2tcu3tcu4tcv1scv2scv3scv4s
×
(κ8 (eiu1 , eiu2 , eiu3 , eiu4 , ejv1 , ejv2 , ejv3 , ejv4)
+
(6,2)∑κ6 (eiu1 , eiu2 , eiu3 , eiu4 , ejv1 , ejv2) Cov (ejv3 , ejv4)
+
(4,4)∑κ4 (eiu1 , eiu2 , ejv1 , ejv2)κ4 (eiu3 , eiu4 , ejv3 , ejv4)
+
(4,2,2)∑κ4 (eiu1 , eiu2 , ejv1 , ejv2) Cov (eiu3 , eiu4) Cov (ejv3 , ejv4)
+
(2,2,2,2)∑Cov (eiu1 , eiu2) Cov (eiu3 , ejv1) Cov (eiu4 , ejv2) Cov (ejv3 , ejv4)
),
(48)
where κ4 (·), κ6 (·), and κ8 (·) denote the fourth-, sixth-, and eighth-order mixed cumulants, respectively. Ex-
pression (48) is a consequence of the cumulants’ theorem (see Brillinger (2001)), whereby∑(ν1,ν2,...,νk) denotes
the sum over all possible partitions of a group of K random variables into k subgroups of size ν1, ν2, . . . , νk,
respectively. As an example,∑(6,2) defines the sum over all possible partitions of the group of eight random
variables eiu1 , eiu2 , eiu3 , eiu4 , ejv1 , ejv2 , ejv3 , ejv4 into two subgroups of size six and two, respectively. Moreover,
8.3 Appendix C: Lemmas 61
since E (eit) = E(e3it
)= 0, we do not need to consider further partitions in the above relation.33 Under our
assumptions it follows that the number of indecomposable partitions, in (48), is of order O(N), implying
Var
(1
NT
N∑i=1
T∑t=1
(T∑s=1
eiscst)4
)= O
(1
N
). (49)
Therefore, it follows that
1
NT
T∑t=1
N∑i=1
((e′iF
T)Ft)
4 →p κ41
T
T∑t=1
T∑s=1
c4st +
3σ4
T
T∑t=1
(T∑s=1
c2st)
2.
For the third term of (47) one obtains 6(NT )−1∑T
t=1
∑Ni=1 e
2it((e
′.iF/T )Ft)
2 = 6(NT )−1∑T
t=1
∑Ni=1 e
2it(∑T
s=1 eiscst)2+
op(1), and, along the same lines,
6
NT
T∑t=1
N∑i=1
E(e2it(
T∑s=1
eiscst)2) → 6
T(κ4 + 2σ4)(
T∑t=1
c2tt) +
6r
Tσ4,
where by easy calculationsT∑t=1
T∑s=1
c2st =
T∑t=1
ctt = r,T∑t=1
(T∑s=1
c2st)
2 =T∑t=1
c2tt.
Using again the cumulants’ theorem, it follows that
Var
(1
NT
N∑i=1
T∑t=1
e2it(
T∑s=1
eisccst)2
)=
1
N2T 2
N∑i,j=1
T∑t,s=1
Cov
(e2it(
T∑u=1
eiucut)2, e2
js(T∑u=1
ejucus)2
)
=1
N2T 2
N∑i,j=1
T∑t,s=1
T∑u1,u2=1
T∑v1,v2=1
cu1tcu2tcv1scv2sCov(e2iteiu1eiu2 , e
2jsejv1ejv2
)= O
(1
N
),
implying6
NT
T∑t=1
N∑i=1
e2it((
e′.iF
T)Ft)
2 →p6
T(κ4 + 2σ4)(
T∑t=1
c2tt) +
6r
Tσ4.
Along the same lines, for the fourth and fifth terms on the right hand side of (47) one obtains
4
NT
T∑t=1
N∑i=1
e3it((
e′.iF
T)Ft)→p
4
T(κ4 + 3σ4)(
T∑t=1
ctt) =4r
T(κ4 + 3σ4)
4
NT
T∑t=1
N∑i=1
eit((e′.iF
T)Ft)
3 →p4
T(κ4(
T∑t=1
c3tt) + 3σ4(
T∑t=1
c2tt)).
Collecting terms concludes the proof where
Cκ ≡(
1 +
∑Tt=1
∑Ts=1 c
4st
T+
6∑T
t=1 c2t,t
T+
4r
T+
4∑T
t=1 c3t,t
T
)(3 +
27
T(T∑t=1
c2t,t) +
18r
T
)−1. (50)
Case ai 6= 0 results in exactly the same expressions, by Lemmas 1 and 2, and as before the proof follows
replacing the eit with the e∗it = eit + ai.
33By the cumulants’ theorem (Brillinger (2001)), evaluation of Cov (eiu1eiu2eiu3eiu4 , ejv1ejv2ejv3ejv4) requires considering theindecomposable partitions of the two sets eiu1 , eiu2 , eiu3 , eiu4, ejv1 , ejv2 , ejv3 , ejv4, where indecomposable means that there mustbe at least one subset that includes an element of both sets.
8.3 Appendix C: Lemmas 62
(iii) Set ai = 0. By (27)
1
N
N∑i=1
λiλ′i =
1
N
N∑i=1
(H−1λi +
F′e.iT
+1
TF′(F− FH−1)λi
)(H−1λi +
F′e.iT
+1
TF′(F− FH−1)λi
)′= H−1
( 1
N
N∑i=1
λiλ′i
)H−1′ +
1
T 2F′(
1
N
N∑i=1
e.ie′.i)F +
1
N
N∑i=1
H−1λi
( F′e.iT
+1
TF′(F− FH−1)λi
)′+
1
N
N∑i=1
F′e.iT
(H−1λi +
1
TF′(F− FH−1)λi
)′+
1
N
N∑i=1
1
TF′(F− FH−1)λi
(H−1λi +
F′e.iT
+1
TF′(F− FH−1)λi
)′= H−1
( 1
N
N∑i=1
λiλ′i
)H−1′ +
1
T 2F′(
1
N
N∑i=1
e.ie′.i)F +Op(N
− 12 ),
implying1
N
N∑i=1
λiλ′i →p H−1ΣΛH−1′ +
σ2
TIr.
Case ai 6= 0 follows by replacing the eit with the e∗it = eit + ai, implying e∗.i = e.i + ai1T , and using Lemmas 1
and 2. For instance,
1
N
N∑i=1
e∗.ie∗′.i =
1
N
N∑i=1
(e.i + ai1T )(e.i + ai1T )′ =1
N
N∑i=1
e.ie′.i +
1T1′TN
N∑i=1
a2i +
1
N
N∑i=1
aie.i1′T +
1
N
N∑i=1
ai1Te′.i
=1
N
N∑i=1
e.ie′.i +O(N−1) +Op(N
− 12 ).
QED
Remark. The following identity holds:
ΣΛ = U∗.
Remark. Based on Lemma 4 one can construct a plug-in consistent estimator of the asymptotic covariance
matrix.
Lemma 5 (generalization of consistency with rates for every k) Under Assumptions 1-6, as N →∞,
‖ Fkt − H′kFt ‖= Op(N
− 12 ), for every 1 ≤ k, t ≤ T,
setting Fkt ≡ UkF
kt , with
Ukt ≡ Vk −
σ2
TIk, and Hk ≡ HkUk = (
Λ′Λ
N)(
F′Fk
T)being a full-rank r × k matrix.
Proof. This follows from the proof to Theorem 1-(i). Pre-multiplying (52) by Uk, where we express all in
terms of a generic 1 ≤ k ≤ T ,
Fkt − H′kFt = T−1
T∑s=1
Fksζst + T−1
T∑s=1
Fksηst + T−1
T∑s=1
Fksηts, (51)
and then apply Lemma 3, given that ‖ F′F/T ‖= Op(1) and ‖ F′kFk/T ‖= O(1). QED
8.3 Appendix C: Lemmas 63
Lemma 6 (behaviour of V (k)) Under Assumptions 1-6,
(i) When k ≤ r:V (k) = σ2 − σ2 k
T+
r∑s=k+1
us +Op(N− 1
2 ).
where the us are the random (diagonal) elements of U, satisfying u1 ≥ u2 ≥ · · · ≥ ur > 0 a.s..
(ii) When k > r:
V (k) = σ2 − σ2 k
T+Op(N
− 12 ).
Proof. Consider case ai = 0, as the proof follows along the same lines when ai 6= 0 by Lemmas 1 and 2.
(i) Case k ≤ r:
Given Λk = X′Fk(Fk′Fk)−1 = X′Fk/T , one gets:
λk′i =e′iF
k
T+ λ′i(
F ′F k
T), setting the r × k matrix Hk ≡ (Λ′Λ/N)(F′Fk/T )U−1
k of rank k,
yielding
ekit ≡ xit − λk′i Fkt = eit + λ′iFt − (
e′.iFk
T+ λ′i(
F′Fk
T))(H′kFt + (Fk
t − H′kFt))
= eit + λ′iFt −e′.iF
kFkt
T− λ′i(
F′Fk
T)(H′kFt +Op(N
−1/2)).
By Lemma 3 above, recalling that Fr = F with r ≥ k,
F′Fk
T=
F′F
T[Ik,0k×r−k]
′ →p Q′[Ik,0k×r−k]′,
recalling that Q = U12 ΥΣ
− 12
Λ . Then
(F′Fk
T)H′k →p Q′[Ik,0k×r−k]
′U−1k [Ik,0k×r−k]QΣΛ = Σ
− 12
Λ Υ′(Ik 0k×r−k
0r−k×k 0r−k×r−k)ΥΣ
12Λ,
obtaining
ekit = eit + λ′i(Ir −Σ− 1
2Λ Υ′(
Ik 0k×r−k0r−k×k 0r−k×r−k
)ΥΣ12Λ)Ft −
e′.iFkFk
t
T+Op(N
−1/2),
and implying
1
NT
T∑t=1
N∑i=1
(ekit)2 = V (k) =
1
NT
T∑t=1
N∑i=1
e2it +
1
NT
T∑t=1
N∑i=1
Fk′t Fk′e.iT
e′.iFkFk
t
T− 2
1
NT
T∑t=1
N∑i=1
eite′.iF
kFkt
T
+1
NT
T∑t=1
N∑i=1
F′t(Ir −Σ12ΛΥ′(
Ik 00 0
)ΥΣ− 1
2Λ )λiλ
′i(Ir −Σ
− 12
Λ Υ′(Ik 00 0
)ΥΣ12Λ)Ft +Op(N
−1/2)
→p σ2 + σ2 k
T− 2σ2 k
T+ trace
((Ir −Υ′(
Ik 0k×r−k0r−k×k 0r−k×r−k
)Υ)Σ12ΛΣFΣ
12Λ
)= σ2 − σ2 k
T+(trace[(
0k×k 0k×r−k0r−k×k Ir−k
)U])
= σ2 − σ2 k
T+
r∑s=k+1
us.
8.4 Appendix D: Proofs of Theorems 64
(ii) Case k > r:
In this case Hk = (Λ′ΛN )(F′Fk
T ) has full-row rank matrix, implying that its Moore-Penrose satisfies HkH+k =
Ir and (H+k )′H′k = Ir. By Proposition 1 and Lemma 3, V (k) = (NT )−1
∑Tt=1
∑Ni=1(ekit)
2 = (NT )−1∑T
t=1
∑Ni=1(ekit)
2
for
ekit ≡ xit − λk′i Fkt setting Λ =
X′Fk
T.
Then
ekit = eit −e′.iF
k
TFkt + λ′i(H
+k )′H′kFt − λ′i
F′Fk
TFkt
= eit −e′.iF
k
TFkt + λ′i(H
+k )′(H′kFt − Fk
t + Fkt )− λ′i
F′Fk
TFkt
= eit −e′.iF
k
TFkt + λ′i(H
+k )′(H′kFt − Fk
t ) +
(λ′i(H
+k )′Fk
t − λ′iF′Fk
TFkt
)
= eit −e′.iF
k
TFkt + λ′i(H
+k )′(H′kFt − Fk
t )− λ′i(H+k )′
((H′kF
′ − Fk′)Fk
T
)Fkt
= eit −e′.iF
k
TFkt + λi ×Op(N−
12 ),
where the Op(N− 1
2 ) term does not depend of i. Thus
V (k) =1
NT
N∑i=1
T∑t=1
e2it +
1
T 3
T∑t=1
Fk′t Fk′(
1
N
N∑i=1
e.ie′.i)F
kFkt −
2
T 2
T∑t=1
Fk′t Fk′(
1
N
N∑i=1
e.ieit) +Op(N− 1
2 )
= σ2 +σ2
Ttr
(Fk′Fk
T
Fk′Fk
T
)− 2σ2
T 2
T∑t=1
Fk′t Fk′ιt,T +Op(N
− 12 )
→p σ2 +
σ2k
T− 2
σ2k
T= σ2(1− k
T).
QED
8.4 Appendix D: Proofs of Theorems
To simplify arguments, we first report the proofs for the static case, that is when ais = ai,λis = λi, σ2t = σ2, rt =
r, and the illustrate their generalization to the time-varying case only when this adds further complexity to the
arguments.
Proof of Theorem 1. Consider case ai = 0, as the proof follows along the same lines when ai 6= 0 by
Lemmas 1 and 2.
The following identity holds (see Bai (2003), equation (D.1)):
Ft − H′Ft = U−1T−1T∑s=1
Fsζst + U−1T−1T∑s=1
Fsηst + U−1T−1T∑s=1
Fsηts. (52)
where, by Assumptions 6 and 7, for every given s, t = 1, · · · , T ,
ζst ≡e′s.et.N− E(
e′s.et.N
) = Op(N− 1
2 ), ηst ≡F′sΛ
′et.N
= Op(N− 1
2 ). (53)
8.4 Appendix D: Proofs of Theorems 65
The result follows noting that, by Lemma 3, ‖ U−1 ‖= Op(1), ‖ F′F/T ‖= Op(1) for N large enough, and
‖ F′F/T ‖= O(1) by construction.
(ii) By part (i), the first term on the right hand side of (52) satisfies
U−1
∑Ts=1 FsζstT
= U−1 1
T
T∑s=1
Fs(e′s.et.N− E(
e′s.et.N
))
= U−1 1
TF′(
eet.
N− E(
eet.
N)) = U−1 1
TH′F′(
eet.
N− E(
eet.
N)) + op(1),
where, by Lemma 3,
H→p ΣΛQ′U−1 = H = Σ12ΛΥ′U−
12 = Q−1. (54)
Noticing that
H′ΣFH = Ir,
one obtains √NU−1 1
T
T∑s=1
Fsζst →d N(0r, σ4U−2
T+
(µ4 − 2σ4)
T 2U−1H′FtF
′tHU−1),
as, by having limited the degree of cross-sectional heterogeneity of the moments of the eit as formalized in
Assumption 5, an identical limiting distribution is obtained when centering with respect to E(e′s.et.N )) or to its
large-N limit. The same reasoning applies to all other limiting distributions that follow.
In particular, to derive the asymptotic covariance matrix of√NU−1 1
T
∑Ts=1 Fsζst, consider that the (s, v)th
element of E(e.ieite′.jejt)− E(e.ieit)E(e′.jejt), for every 1 ≤ s, v ≤ T , satisfies
Cov(eiseit, ejvejt) = κ4,iijj,stvt + E(eisejv)E(eitejt) + E(eisejt)E(eitejv)
= κ4,iijj,stvt + σij,svσij,tt + σij,stσij,tv,
yielding, in matrix form,
Cov(e.ieit, e′.jejt) =
κ4,iijj,1t1t · · · κ4,iijj,1tT t
κ4,iijj,2t1t · · · κ4,iijj,2tT t...
......
κ4,iijj,T t1t · · · κ4,iijj,T tT t
+ σij,tt
σij,11 · · · σij,1Tσij,21 · · · σij,2T
......
...σij,T1 · · · σij,TT
+
σij,1t...
σij,T t
(σij,1t . . . σij,T t) .
Therefore,
1
N
N∑i=1
N∑j=1
Cov(e.ieit, e′.jejt) =
1
N
N∑i=1
κ4,iiii,ttttιtι′t + σii,tt
σii,11 · · · 0
0 σii,22 · · ·...
......
0 · · · σii,TT
+σ2ii,ttιtι
′t
+op(1)→ (κ4+σ4)ιtι′t+σ4IT .
For the second term of (52):
√NU−1 1
T
T∑s=1
Fsηst = U−1 1
T
T∑s=1
FsF′s
1√N
N∑i=1
λieit = U−1 F′F
T
1√N
N∑i=1
λieit →d N(0r, σ2U−1QΣΛQ′U−1),
8.4 Appendix D: Proofs of Theorems 66
where QΣΛQ′ = U. For the third term of (52):
√NU−1 1
T
T∑s=1
Fsηts =√NU−1 1
T
T∑s=1
FsF′t
1
N
N∑i=1
λieis = U−1 1
T
1√N
N∑i=1
T∑s=1
Fseisλ′iFt
= U−1 1
TF′(
1√N
N∑i=1
e.iλ′i)Ft = U−1 1
T(F′t ⊗ F′)
1√N
N∑i=1
(λi ⊗ e.i)
→d N(0r, σ2(
F′tΣΛFt
T)U−2),
where the asymptotic covariance matrix of T−1(F′t ⊗ F′) 1√N
∑Ni=1(λi ⊗ e.i) is obtained noticing that
T−1(F′t ⊗H′F′)(ΣΛ ⊗ σ2IT )(Ft ⊗ FH) = σ2(F′tΣΛFt)IT .
Given that the three terms are potentially correlated, we need the joint convergence of Assumption 7, as it
emerges by re-writing the right hand side of (52) in matrix form:
Ft − H′Ft = T−1U−1[F′, (F′F), (F′t ⊗ F′)
] 1
N
N∑i=1
(e.ieit − E(e.ieit))λieit
(λi ⊗ e.i)
(55)
= T−1U−1[F′, (F′F), (F′t ⊗ F′)
]At
1
N
N∑i=1
[(e.ieit − E(e.ieit))
(λi ⊗ e.i)
], (56)
where At is defined in (26). By Assumption 5 the eit have zero third-moments, implying that the covariance
matrix of ((e.ieit − E(e.ieit))′, (λi ⊗ e.i)
′)′ is block-diagonal. The diagonal elements have been established
above. When considering (55), one needs the asymptotic (cross) covariance matrix between λieit = (λi ⊗ eit)and (λ′j ⊗ e′.j), which automatically emerges by pre- and post-multiplying the asymptotic covariance matrix in
(25) by At and A′t, respectively, as indicated by (56) However, it can also be derived by direct calculations as
follows: by our assumptions1
N
N∑i,j=1
(λiλ′j ⊗ E(eite
′.j))→p σ
2(ΣΛ ⊗ ι′t,T ),
implying
1
TU−1(
F′F
T)
1
N
N∑i,j=1
(λiλ′j⊗E(eite
′.j))(Ft⊗F)U−1 →p
σ2
TU−1Q(ΣΛ⊗ι′t)(Ft⊗FH)U−1 =
σ2
TU−1QΣΛFtF
′tHU−1.
Let us now extend the proof to the time-varying case. We first show that
1
N
T0∑s=1
[(xs −ΛtFs)
′(xs −ΛtFs)− (xs −ΛsFs)′(xs −ΛsFs)
]ws(t) = op(1).
This implies that in terms of loadings we are in fact estimating Λt, thus constant within the interval Tt. In
fact,[(xs −ΛtFs)
′(xs −ΛtFs)− (xs −ΛsFs)′(xs −ΛsFs)
]=[−F′sΛ
′txs + F′sΛ
′sxs − x′sΛtFs + x′sΛsFs + F′sΛ
′tΛtFs − F′sΛ
′sΛsFs
]=[−F′s(Λt −Λs)
′xs − x′s(Λt −Λs)Fs + F′s(Λt −Λs)′ΛsFs + F′sΛ
′s(Λt −Λt)Fs + F′s(Λt −Λs)
′(Λt −Λs)Fs
].
8.4 Appendix D: Proofs of Theorems 67
By the mean value theorem (Λt −Λs) = Λ(1)s∗ (t− s) and, by summing across i, one gets the op(N) rate.
We now show how, within the interval Tt, we can replace σ2, σ4, κ4 by σ2t−1, σ4t−1, κ4t−1 throughout our
results. To simplify the exposition, it is convenient to assume that the set Tt equals 1, · · · , T (instead of
t− T + 1, · · · , t) without loss of generality. Starting with Lemma 3, we just need to consider ee′/N − σ2t−1IT
where now
ee′
N=
1
N
N∑i=1
σii,11 0 . . .
0 σii,22 . . .. . . . . . . . .. . . 0 σii,TT
12
η.iη′.i
σii,11 0 . . .
0 σii,22 . . .. . . . . . . . .. . . 0 σii,TT
12
+ op(1)
=1
N
N∑i=1
σii,tt + σii,1∗(1− t) 0 . . .
0 σii,tt + σii,2∗(2− t) . . .. . . . . . . . .. . . 0 σii,tt + σii,T ∗(T − t)
12
η.iη′.i
×
σii,tt + σii,1∗(1− t) 0 . . .
0 σii,tt + σii,2∗(2− t) . . .. . . . . . . . .. . . 0 σii,tt + σii,T ∗(T − t)
12
+ op(1)
= σ2t−1IT + op(1).
Let us first generalize Theorem 1. Regarding the first term of the corresponding asymptotic covariance matrix:
1
N
N∑i=1
N∑j=1
Cov(e.ieis, e′.jejs) =
1
N
N∑i=1
κ4,iiiissssιsι′s + σii,ss
σii,11 · · · 0
0 σii,22 · · ·...
......
0 · · · σii,TT
+ σ2ii,ssιsι
′s
+ op(1)
=1
N
N∑i=1
((κ4,iiiitttt + κ
(1)4,iiii,s∗(s− t))ιsι
′s + (σii,tt + σ
(1)ii,s∗(s− t))
×
σii,tt + σ
(1)ii,1∗(1− t) · · · 0
0 σii,tt + σ(1)ii,2∗(2− t) · · ·
......
...
0 · · · σii,tt + σ(1)ii,T ∗(T − t)
+ (σ2ii,tt + 2σii,s∗σ
(1)ii,s∗(t− s))ιsι
′s
)+ op(1)
→ (κ4t−1 + σ4t−1)ιsι′s + σ4t−1IT .
8.4 Appendix D: Proofs of Theorems 68
Regarding the second term of the asymptotic covariance matrix:
1
N
N∑i=1
N∑j=1
(λit−1λ′jt−1 ⊗ E(e.ie
′.j)) =
1
N
N∑i=1
(λit−1λ′it−1 ⊗ E(e.ie
′.i)) + op(1)
=1
N
N∑i=1
λit−1λ′it−1 ⊗
σii,11 0 . . .
0 σii,22 . . .. . . . . . . . .. . . 0 σii,TT
+ op(1)
=1
N
N∑i=1
λit−1λ′it−1 ⊗
σii,tt + σii,1∗(1− t) 0 . . .
0 σii,tt + σii,2∗(2− t) . . .. . . . . . . . .. . . 0 σii,tt + σii,T ∗(T − t)
+ op(1)
= (ΣΛt−1 ⊗ σ2t−1IT ) + op(1).
Finally, regarding the non-zero covariance term:
1
N
N∑i,j=1
(λit−1λ′jt−1 ⊗ E(eise
′.j)) = σ2
t−1(ΣΛt−1 ⊗ ι′s).
QED
Proof of Theorem 2. The result immediately follows from Lemma 4, noting that F′tΣΛQ′ = F′tHH−1ΣΛH−1′
and F′tΣΛFt = F′tHH−1ΣΛH−1′H′Ft.
The time-varying case requires to generalize Lemma 4. Then, part (i) of Lemma 2 follows immediately by
the above arguments. For part (ii), one needs to consider terms such as
1
NT
T∑h=1
N∑i=1
E(
T∑s=1
eiscsh)4 =1
NT
T∑h=1
N∑i=1
T∑s1,s2,s3,s4=1
κ4,iiii,s1,s2,s3,s4cs1hcs2hcs3hcs4h
+1
NT
T∑h=1
N∑i=1
T∑s1,s2,s3,s4=1
(σii,s1s2σii,s3s4 + σii,s1s3σii,s2s4 + σii,s1s4σii,s2s3)cs1hcs2hcs3hcs4h
=1
NT
T∑h=1
N∑i=1
T∑s=1
κ4,iiii,ssssc4sh +
T∑s1,s2=1
3σii,s1s1σii,s2s2c2s1hc
2s2h
+ o(1)
=1
NT
T∑h=1
N∑i=1
T∑s=1
(κ4,iiii,tttt + κ(1)4,iiii,s∗(s− t))c
4sh +
T∑s1,s2=1
3((σii,tt + (σ(1)ii,s∗1
(s1 − t))(σii,tt + (σ(1)ii,s∗2
(s2 − t)))c2s1hc
2s2h
+ o(1)
= κ4t−11
T
T∑h=1
T∑s=1
c4sh +
3σ4t−1
T
T∑t=1
(T∑s=1
c2st)
2 + o(1).
The remainder four terms of part (ii) follow along the same lines. Finally, by the same arguments, part (iii)
easily follow yielding1
N
N∑i=1
λit−1λ′it−1 →p H−1
t−1ΣΛt−1H−1′t−1 +
σ2t−1
TIrt−1 .
The proof is concluded setting κ4t−1 = 0. QED
8.4 Appendix D: Proofs of Theorems 69
Proof of Theorem 3. To show that Prob(PC(k) < PC(r)) → 0 for any k 6= r, consider at first case r < k,
where PC(k)− PC(r) = V ∗(k)− V ∗(r)− (r − k)g(N). Then
Prob(V ∗(k)− V ∗(r) < (r − k)g(N))→ 0,
because g(N) ↓ 0 whereas V ∗(k)− V ∗(r) has a strictly (a.s.) positive limit. Consider now case k > r. Then
Prob(PC(k) < PC(r)) = Prob(V ∗(r)− V ∗(k) > (k − r)g(N))→ 0,
because V ∗(r)− V ∗(k) = Op(N− 1
2 ) whereas N−12 /g(N) = o(1).
In the time-varying case one needs to evaluate the limit of terms such as N−1∑N
i=1 e2is, N
−1∑N
i=1 e.ie′.i and
N−1∑N
i=1 e.ieit, which follow by our previous arguments, with their limits being function of σ2t−1 and σ4t−1.
QED
Proof of Theorem 4. Consider case ai = 0, as the proof follows along the same lines when ai 6= 0 by
Lemmas 1 and 2.
(ii) Consistency of γ follows by Slutzky theorem and Theorem 1 as:
γ =1
T
T∑t=1
Ft →p1
T
T∑t=1
H′Ft = H′γP .
(iii) To derive the limiting distribution of the risk premia estimator, by the same steps adopted for the proof
of Theorem 1,
γ − H′γP =F′1TT− H′F′1T
T= U−1 F′
T(ee′
N− σ2IT )
1TT
+ U−1 F′F
T
Λ′e′1TNT
+ U−1 F′
T
eΛ
NF.
Regarding the first term on the right hand side:
√NU−1 F′
T(ee′
N− σ2IT )
1TT→d N(0r,
1
T 2U−1H′((κ4 + σ4)ΣF + σ4FF′)HU−1),
where, to derive the corresponding asymptotic covariance matrix, one needs to evaluate, for every 1 ≤ s, v ≤ T ,
Cov(eiseit, ejvejr) = κ4,iijj,stvr + E(eisejv)E(eitejr) + E(eisejr)E(eitejv)
= κ4,iijj,stvr + σij,svσij,tr + σij,srσij,tv,
yielding, in matrix form,
Cov(e.ieit, e′.jejr) =
κ4,iijj,1t1r · · · κ4,iijj,1tTr
κ4,iijj,2t1r · · · κ4,iijj,2tTr...
......
κ4,iijj,T t1r · · · κ4,iijj,T tTr
+ σij,tr
σij,11 · · · σij,1Tσij,21 · · · σij,2T
......
...σij,T1 · · · σij,TT
+
σij,1t...
σij,T t
(σij,1r . . . σij,T r) .
8.4 Appendix D: Proofs of Theorems 70
Therefore,
T−2T∑t=1
T∑r=1
1
N
N∑i=1
N∑j=1
Cov(e.ieit, e′.jejr)
= T−2T∑t=1
T∑r=1
1
N
N∑i=1
κ4,iiii,ttrrιtι′r + σii,tr
σii,11 · · · 0
0 σii,22 · · ·...
......
0 · · · σii,TT
+ σii,ttσii,rrιtι′r
+ op(1)
→ (κ4 + σ4)
TIT + σ4
1T1′TT 2
.
For the second term:√NU−1 F′F
T
Λ′e′1TNT
→d N(0r,σ2
TU−1QΣΛQ′U−1),
as
1
NT 2
N∑i=1
N∑j=1
λiλ′jCov(1′Te.i, e
′.j1T ) =
1
NT 2
T∑t=1
T∑r=1
N∑i=1
N∑j=1
λiλ′jCov(eit, ejr)
=1
NT 2
T∑t=1
T∑r=1
N∑i=1
N∑j=1
λiλ′jσij,tr =
1
NT 2
T∑t=1
T∑r=1
N∑i=1
λiλ′iσii,tr + op(1)→p
σ2
TΣΛ,
and for the third term:√NU−1 F′
T
eΛ
NF→d N(0r,
σ2F′ΣΛF
TU−2),
noticing that√NU−1 F′
T
eΛ
NF = U−1(F′ ⊗ F′
T)
1√N
N∑i=1
(λi ⊗ e.i).
It remains to obtain the covariance terms, with the only non-zero term being
1
T
T∑t=1
U−1 F′F
T
1
N
N∑i=1
N∑j=1
(λiλ′j ⊗ E(eite
′.j))(F⊗
F
T)U−1 →p
σ2
TU−1QΣΛFF′HU−1.
Finally, the expression for the asymptotic covariance matrix of γ simplifies recalling the identities QΣΛQ′ = U
and H′ΣFH′ = Ir. The above four quantities make the matrix CT .
(iv) Consistency of CT easily follows along the lines of the proof to Theorem 2, recalling the identity QH = Ir,
replacing H′F by ¯F and recalling that ΣΛ →p H−1ΣΛH−1′. QED
Proof of Theorem 5. Define the following function of an arbitrary, full column rank, T × r matrix G
m(G; t) ≡ 1
rf− 1
rf(1′TT
G)(G′M1TG
T)−1(G′(ιt −
1TT
)).
Then mt−1,t = m(F; t) and mPt−1,t = m(F; t). Notice that mP
t−1,t = m(FC; t) for any non-singular r× r matrix
C, including the special case C = H.
8.4 Appendix D: Proofs of Theorems 71
Given that T is fixed, we can view vec(F′) as a vector of parameters, and use the delta method, in particular
expanding m(F; t) around m(H′F; t), to derive the limiting distribution of mt. Considering the first-order
differential of m(G; t) with respect to the arbitrary argument (matrix) G:
dm(G; t) = − 1
rf(1′TTdG)(
G′M1TG
T)−1(G′(ιt −
1TT
))
− 1
rf(1′TT
G)d(G′M1TG
T)−1(G′(ιt −
1TT
))− 1
rf(1′TT
G)(G′M1TG
T)−1(dG′(ιt −
1TT
))
= − 1
rf(G′(ιt −
1TT
))′(G′M1TG
T)−1((
1′TT
)⊗ Ir)dvecG′ − 1
rf(1′TT
G)(G′M1TG
T)−1((ι′t −
1′TT
)⊗ Ir)dvecG′
+1
rf(1′TT
G(G′M1TG
T)−1 ⊗ (ιt −
1TT
)′G(G′M1TG
T)−1)(G′M1T ⊗ Ir)dvecG
′
+1
rf((ιt −
1TT
)′G(G′M1TG
T)−1 ⊗
1′TT
F(G′M1TG
T)−1)(G′M1T ⊗ Ir)dvecG
′.
Therefore
∂mt(F; t)
∂vecG′=
1
rf
(−((
1TT
)⊗ Ir)Ω−1(F′(ιt −
1TT
))− ((ιt −1TT
)⊗ Ir)Ω−1(F′
1TT
)
+(M1T F⊗ Ir)(Ω−1F′
1TT⊗ Ω−1F′(ιt −
1TT
)) + (M1T F⊗ Ir)(Ω−1F′(ιt −
1TT
)⊗ Ω−1F′1TT
)
)=
1
rf
(−((
1TT
)⊗ Ir)Ω−1(F′(ιt −
1TT
))− ((ιt −1TT
)⊗ Ir)Ω−1(F′
1TT
)
+(M1T FΩ−1F′ ⊗ Ω−1F′)
[(1TT⊗ (ιt −
1TT
)) + ((ιt −1TT
)⊗ 1TT
)
]),
satisfying
∂m(F; t)
∂vecG′→p (IT ⊗H−1)
∂m(F; t)
∂vecG′.
We now evaluate the asymptotic covariance matrix of F′ − H′F′, where
F′ − H′F′ = U−1 F′
T(ee′
N− σ2IT ) + U−1 F′F
T
Λ′e′
N+ U−1 F′
T
eΛ
NF′.
implying
vec(F′ − H′F′) =
vec(U−1 F′
T(ee′
N− σ2IT ) + U−1 F′F
T
Λ′e′
N+ U−1 F′
T
eΛ
NF′)
= (IT ⊗ U−1 F′
T)
1
N
N∑i=1
((e.i ⊗ e.i)− σ2vec(IT ))
+(IT ⊗ U−1 F′F
T)
1
N
N∑i=1
(e.i ⊗ λi) + (F⊗ U−1 F′
T)
1
N
N∑i=1
(λi ⊗ e.i).
8.4 Appendix D: Proofs of Theorems 72
Then
Var(√Nvec(F′ − H′F′)) =
+(IT ⊗ U−1 F′F
T)
1
N
N∑i=1
E [(e.i ⊗ λi)(e′.i ⊗ λ′i)](IT ⊗F′F
TU−1) + (F⊗ U−1 F′
T)
1
N
N∑i=1
E [(λi ⊗ e.i)(λ′i ⊗ e′.i)](F
′ ⊗ F
TU−1)
+(IT ⊗ U−1 F′F
T)
1
N
N∑i=1
E [(e.i ⊗ λi)(λ′i ⊗ e′.i)](F′ ⊗ F
TU−1) + (F⊗ U−1 F′
T)
1
N
N∑i=1
E [(λi ⊗ e.i)(e′.i ⊗ λ′i)](IT ⊗
F′F
TU−1)
+(IT ⊗ U−1 F′
T)
1
N
N∑i=1
E [((e.i ⊗ e.i)− σ2vec(IT ))((e′.i ⊗ e′.i)− σ2vec′(IT ))](IT ⊗F
TU−1) + op(1)
→p (IT ⊗U−1 H′F′
T)Ue(IT ⊗
FH
TU−1) + (σ2IT ⊗U−1) + (FΣΛF′ ⊗ σ2
TU−2)
+(IT ⊗U−1 H′F′F
T)(σ2IT ⊗ΣΛ)KTr(F′ ⊗
FH
TU−1) + (F⊗U−1 H′F′
T)(ΣΛ ⊗ σ2IT )KrT (IT ⊗
F′FH
TU−1).
Re-arranging terms, one obtains
Var(√Nvec(F′ − H′F′)) →
= (IT ⊗U−1 H′F′
T)Ue(IT ⊗
FH
TU−1) + (σ2IT ⊗U−1) + (FΣΛF′ ⊗ σ2
TU−2)
+(σ2 FH
TU−1 ⊗H′F′)KrT +KTr(σ2U−1 H′F′
T⊗ FH),
using the property Kpm(A ⊗ B) = (B ⊗ A)Kqn for every m × n matrix A and p × q matrix B, and K′ab =
K−1ab = Kba (see Magnus & Neudecker (2007)[Chapter 3, Section 7, Theorem 9]), and using the identities
H′ΣFH = Ir, U−1H−1ΣΛH−1′U−1 = U−1, in particular in second variance term above, and U−1H′ΣFΣΛ =
U−1H′ΣFHH−1ΣΛ = U−1H−1ΣΛ = H′ in last two covariance terms. QED
Proof of Theorem 6. For part (i), it is convenient to re-write Fmpt as
Fmpt = (γ,Ω)
(a′ + γΛ′
Λ′
)([(a + Λγ),Λ]
(1 0′r0r Ω
)[a′ + γ ′Λ′
Λ′
]+ Σe
)−1
xt.
= (γ,Ω)Λ′†(Λ†Ω†Λ
′† + Σe
)−1xt.
= (γ,Ω)Λ′†(Λ†Ω†Λ
′† + Σe
)−1(
Λ†(1
ft − E(ft)) + et.
),
setting
Λ† ≡ (a + Λγ,Λ) , Ω† ≡(
1 0′r0r Ω
).
8.4 Appendix D: Proofs of Theorems 73
Then
Fmpt = (γ,Ω)
(Ir+1 −Λ′†Σ
−1e Λ†(Ω
−1† + Λ′†Σ
−1e Λ†)
−1)
Λ′†Σ−1e
(Λ†(
1ft − E(ft)
) + et.
)= (γ,Ω)Ω−1
† (Ω−1† + Λ′†Σ
−1e Λ†)
−1Λ′†Σ−1e
(Λ†(
1ft − E(ft)
) + et.
)= (γ,Ω)Ω−1
†
(1 + (a′ + γ ′Λ′)Σ−1
e (a + Λγ) (a′ + γ ′Λ′)Σ−1e Λ
Λ′Σ−1e (a + Λγ) Ω−1 + Λ′Σ−1
e Λ
)−1
Λ′†Σ−1e
(Λ†(
1ft − E(ft)
) + et.
)= I + II.
Consider first term I. Setting for simplicity a ≡ 1 + (a′ + γ ′Λ′)Σ−1e (a + Λγ), b ≡ Λ′Σ−1
e (a + Λγ) and
B ≡ Ω−1 + Λ′Σ−1e Λ, and using the block-wise formula for the inverse of a matrix, gives:(
1 + (a′ + γ ′Λ′)Σ−1e (a + Λγ) (a′ + γ ′Λ′)Σ−1
e ΛΛ′Σ−1
e (a + Λγ) Ω−1 + Λ′Σ−1e Λ
)−1
= (a b′
b B)−1 =
(a−1 + a−2b′(B− bb′
a )−1b −ba
′(B− bb′
a )−1
−(B− bb′
a )−1 ba (B− bb′
a )−1
).
Post multiplying the latter expression by Λ′†Σ−1e Λ† = (
a− 1 b′
b B−Ω−1 ) yields(a−1 + a−2b′(B − bb′
a)−1b −b
a′(B − bb′
a)−1
−(B − bb′a
)−1 ba
(B − bb′a
)−1
)(
a − 1 b′
b B − Ω−1) =(
(a − 1)(a−1 + a−2b′(B − bb′a
)−1b) − ba′(B − bb′
a)−1b (a−1 + a−2b′(B − bb′
a)−1b)b′ − b
a′(B − bb′
a)−1(B − Ω−1)
−(a − 1)(B − bb′a
)−1 ba
+ (B − bb′a
)−1b −(B − bb′a
)−1 ba
b′ + (B − bb′a
)−1(B − Ω−1)
)=
((a−1)
a− 1
a2 b′(bfB − bb′a
)−1b ba′(B − bb′
a)−1Ω−1
(B − bb′a
)−1 ba
Ir − (B − bb′a
)−1Ω−1
)We now evaluate the behaviour of (B− bb′
a )−1 and b/a as N →∞. One needs first to express B− bb′
a suitably,
as:
B− bb′
a= Ω−1 + Λ′Σ−1
e Λ− Λ′Σ−1e (a + Λγ)(a + Λγ)′Σ−1
e Λ′
(1 + (a + Λγ)′Σ−1e (a + Λγ))
= Ω−1 + Λ′Σ−1/2e [IN −
Σ−1/2e (a + Λγ)(a + Λγ)′Σ
−1/2e
(1 + (a + Λγ)′Σ−1e (a + Λγ))
]Σ−1/2e Λ
= Ω−1 + Λ′Σ−1/2e [IN + Σ−1/2
e (a + Λγ)(a + Λγ)′Σ−1/2e ]−1Σ−1/2
e Λ
given the identity
[IN −Σ−1/2e (a + Λγ)(a + Λγ)′Σ
−1/2e
(1 + (a + Λγ)′Σ−1e (a + Λγ))
] = [IN + Σ−1/2e (a + Λγ)(a + Λγ)′Σ−1/2
e ].
Setting the non-singular matrix, where non-singularity holds for N large enough as the last two terms of C
only affect its maximum eigenvalue,
C ≡ Σ−1e + aa′ + Λγa′ + aγ ′Λ′,
8.4 Appendix D: Proofs of Theorems 74
and using it within the inverse of the last expression above, gives
(B− bb′
a)−1 =
Ω−ΩΛ′Σ−1/2e (IN + Σ−1/2
e (a + Λγ)(a + Λγ)′Σ−1/2e + Σ−1/2
e ΛΩΛ′Σ−1/2e )−1Σ−1/2
e ΛΩ
= Ω−ΩΛ′(Σ−1e + (a + Λγ)(a + Λγ)′ + ΛΩΛ′)−1ΛΩ
= Ω−ΩΛ′(C + Λ(γγ ′ + Ω)Λ′)−1ΛΩ
= Ω−Ω(Λ′C−1Λ)((γγ ′ + Ω)−1 + Λ′C−1Λ)−1(γγ ′ + Ω)−1Ω
implying, as N →∞, in particular by the divergence of Λ′C−1Λ,
(B− bb′
a)−1 →p Ω−Ω(γγ ′ + Ω)−1Ω =
γγ ′
(1 + γ ′Ω−1γ).
In fact, divergence of Λ′C−1Λ follows as it is bounded from below by Λ′ΛgN (C−1). In turn gN (C−1) = g−11 (C)
and thus one needs the maximum eigenvalue of C not to diverge too fast. However, the maximum eigenvalue of
C diverges at most as a′Λγ = Op(N12 ), given that a′a = O(1) by no arbitrage and Σe has minimum eigenvalue
bounded away from zero, whereas Λ′Λ = Op(N). It follows that
b
a=
Λ′Σ−1e (a + Λγ)
1 + (a′ + γ ′Λ′)Σ−1e (a + Λγ)
→pDγ
γ ′Dγ.
setting D to be probability limit of Λ′Σ−1e Λ/N . Therefore, putting terms together,(
1 + (a′ + γ ′Λ′)Σ−1e (a + Λγ) (a′ + γ ′Λ′)Σ−1
e ΛΛ′Σ−1
e (a + Λγ) Ω−1 + Λ′Σ−1e Λ
)−1
Λ′†Σ−1e Λ† =
((a−1)a − 1
a2b′(B− bb′
a )−1b ba
′(B− bb′
a )−1Ω−1
(B− bb′
a )−1 ba Ir − (B− bb′
a )−1Ω−1
)
→p
(1− γ′Dγγ′Dγ
(γ′Dγ)2(1+γ′Ω−1γ)(γ′Dγ)γ′Ω−1
(γ′Dγ)(1+γ′Ω−1γ)γγ′Dγ
(γ′Dγ)(1+γ′Ω−1γ)Ir − γγ′Ω−1
(1+γ′Ω−1γ)
)=
(1− 1
(1+γ′Ω−1γ)γ′Ω−1
(1+γ′Ω−1γ)γ
(1+γ′Ω−1γ)Ir − γγ′Ω−1
(1+γ′Ω−1γ)
).
It follows that, as N diverges, term I satisfies:
I →p (γ, Ir)
(1− 1
(1+γ′Ω−1γ)γ′Ω−1
(1+γ′Ω−1γ)γ
(1+γ′Ω−1γ)Ir − γγ′Ω−1
(1+γ′Ω−1γ)
)(
1ft − E(ft)
) = (γ, Ir)(1
ft − E(ft)) = Ft.
Next, considering the variance of term II (conditioning on the Λ), we will show that it converges to zero as N
diverges. In fact,
var(
(γ,Ω)(Ir+1 −Λ′†Σ
−1e Λ†(Ω
−1† + Λ′†Σ
−1e Λ†)
−1)
Λ′†Σ−1e et.
)= (γ, Ir)(
a b′
b B)−1(
a− 1 b′
b B−Ω−1 )(a b′
b B)−1(γ, Ir)
′
→p (γ, Ir)
(1− 1
(1+γ′Ω−1γ)γ′Ω−1
(1+γ′Ω−1γ)γ
(1+γ′Ω−1γ)Ir − γγ′Ω−1
(1+γ′Ω−1γ)
)1
(1 + γ ′Ω−1γ)(
1 −γ ′−γ γγ ′
)(γ, Ir)′ = 0r×r,
using
(a b′
b B)−1 →p
1
(1 + γ ′Ω−1γ)(
1 −γ ′−γ γγ ′
).
Therefore II →p 0r establishing part (i).
8.4 Appendix D: Proofs of Theorems 75
Part (ii) follows from the identity
Fmpt − F = −σ2(σ2Ir + Λ′Λ)−1F,
and recalling that, under our assumptions, Λ′Λ = Op(N−1). QED
Proof of Theorem 7. Consider at first cases (ii), (iii) and (iv) for the static model, where aT = a1T as
ait = ai with a ≡ a′1T /N . By simple algebric steps:
(γ0
µΛ)− (
γ0 + a
H−1µΛ) = (D′D)−1D′xT − (
γ0 + a
H−1µΛ)
= (D′D)−1D′(D(γ0 + aµΛ
) + eT )− (γ0 + a
H−1µΛ)
= (D′D)−1D′(D(1 0′r0r H
)(γ0 + a
H−1µΛ) + eT )− (
γ0 + a
H−1µΛ)
= (D′D
T)−1 D′eT
T+ (
D′D
T)−1[
D′D
T(
1 0′r0r H
)− (D′D
T)](
γ0 + a
H−1µΛ) = I + II.
Consider now
[D′D
T(
1 0′r0r H
)− (D′D
T)] = (
0 γP ′H− γ ′
0rF∗′F∗
T H− Ir) = (
0 γP ′H− γ ′
0rF∗′FT H− F∗′F∗
T
)
= (0
1′TT (FH− F∗)
0rF∗′
T (FH− F∗)) = (0r+1,
D′
T(FH− F∗)),
implying that term II satisfies
(D′D
T)−1[
D′D
T(
1 0′r0r H
)− (D′D
T)](
γ0 + a
H−1µΛ) = (
D′D
T)−1 D′
T(FH− F∗)H−1µΛ
= vec((D′D
T)−1 D′
T(FH− F∗)H−1µΛ) = −(µ′ΛH−1′ ⊗ (
D′D
T)−1 D′
T)vec(F∗ − FH)
= −(µ′ΛH−1′ ⊗ (D′D
T)−1 D′
T)KrT vec(F
∗′ − H′F′).
Asymptotic normality of√Nvec(F∗′ − H′F′), and the corresponding asymptotic covariance matrix, follows
from the proof to Theorem 5 whereas asymptotic normality of√N eT follows from our assumptions. Thus, to
derive the asymptotic distribution of (γ0, µ′Λ)′, we need to further derive the asymptotic covariance between√
Nvec(F(∗′ − H′F′) and√N eT , given by:
NCov(vec(F∗′ − H′F′), e′N ) = NCov(vec((IT ⊗ U−1 F∗′
T)
1
N
N∑i=1
((e.i ⊗ e.i)− σ2vec(IT )), e′T )
+NCov((IT ⊗ U−1 F∗′F
T)
1
N
N∑i=1
(e.i ⊗ λi), e′T ) +NCov((F⊗ U−1 F∗′
T)
1
N
N∑i=1
(λi ⊗ e.i), e′T )
= N(IT ⊗U−1Q)1
N2
N∑i=1
(E(e.ie′.i)⊗ λi) +N(F⊗U−1 H′F′
T)
1
N2
N∑i=1
(λi ⊗ E(e.ie′.i)) + o(1)
→ σ2(IT ⊗U−1Q)(IT ⊗ µΛ) + σ2(F⊗U−1 H′F′
T)(µΛ ⊗ IT ) = σ2(IT ⊗U−1H−1µΛ) + σ2(FµΛ ⊗U−1 H′F′
T),
8.5 Appendix E: The matrixes Ue,Uet−1 and Ue, Uet−1. 76
recalling that Q = H−1. Putting terms together, parts (i),(ii) and (iv) follow, where the matrix G is obtained
using the property Ka1 = Ia. Part (iii) follows precisely along the same steps of Theorem 2.
Consider now case (i), which is only relevant in the dynamic case. Following the previous steps, but
recognizing time-variation of the ait−1, γ0t−1 and λit−1, as ¯a ≡ a′T1T /T = Op(N− 1
2 ), eT = Op(N− 1
2 ) and
‖ F∗ − FHt−1 ‖= Op(N− 1
2 ),
(γ0
µΛ) = (D′D)−1D′
(aT + γ0 + FµΛt−1 + eT
)= (
D′D
T)−1 D
T
′ (aT + γ0 + F∗H−1
t−1µΛt−1
)+Op(N
− 12 )
=
(1 + ¯F∗′Ω∗−1 ¯F∗ − ¯F∗′Ω∗−1
−Ω∗−1 ¯F∗ Ω∗−1
)(γ0 + ˜F∗′H−1
t−1µΛt−1
F∗′γ0T + F∗′F∗
T H−1t−1µΛt−1
)+Op(N
− 12 )
=
(γ0 − ˜F∗′Ω∗−1Cov(F∗′,γ0)
H−1t−1µΛt−1 − Ω∗−1Cov(F∗′,γ0)
)+Op(N
− 12 )
=
(γ0
H−1t−1µΛt−1
)−
(˜F∗′
Irt−1
)Ω∗−1Cov(F∗′,γ0) +Op(N
− 12 )
QED
8.5 Appendix E: The matrixes Ue,Uet−1 and Ue, Uet−1.
We now present the closed-form expression for Ue, which is defined as the limit:
1
N
N∑i=1
E(((e.i ⊗ e.i)− σ2vec(IT ))((e′.i ⊗ e′.i)− σ2vec′(IT ))
)→ Ue.
In particular, Raponi et al. (2018) established that the T 2 × T 2 matrix Ue has the form:
Uε =
U11 · · · U1t · · · U1T
.... . .
......
...
Ut1 · · · Utt · · · UtT
......
.... . .
...
UT1 · · · UTt · · · UTT
. (56)
Each block of Uε is a T × T matrix. The blocks along the main diagonal, denoted by Utt, t = 1, 2, . . . , T , are
themselves diagonal matrixes with (κ4 + 2σ4) in the (t, t)-th position and σ4 in the (s, s) position for every
8.6 Tables 77
s 6= t, that is,
↓t-th column
Utt = →t-th row
σ4 · · · 0 · · · · · · · · · 0...
. . ....
......
......
0 · · · σ4 0 · · · · · · 00 · · · 0 (κ4 + 2σ4) 0 · · · 00 · · · · · · 0 σ4 · · · 0...
......
......
. . ....
0 · · · · · · · · · · · · 0 σ4
. (57)
The blocks outside the main diagonal, denoted by Uts, s, t = 1, 2, . . . , T with s 6= t, are all made of zeros except
for the (s, t)-th position that contains σ4, that is,
↓t-th column
Uts = →s-th row
0 · · · 0 · · · · · · · · · 0...
. . ....
......
......
0 · · · 0 0 · · · · · · 00 · · · 0 σ4 0 · · · 00 · · · · · · 0 0 · · · 0...
......
......
. . ....
0 · · · · · · · · · · · · 0 0
. (58)
The time-varying matrix Uet is defined by replacing κ4, σ4 with κ4t, σ4t in the expressions above.
Finally, noticing that Ue = Ue(κ4, σ4) and Uet−1 = Ue(κ4t−1, σ4t−1), consistent plug-in estimators can be
easily obtained as Ue = Ue(0, σ4) and Uet−1 = Ue(0, σ4t−1) when κ4 = κ4t−1 = 0 and σ4, σ4t−1 denote the
consistent estimators adopted in the paper.
8.6 Tables
8.6 Tables 78
N T PCp1 ICp1 AIC1 BIC1 PC(0.10) PC(0.25) PC(0.45)
5 2 2 2 2 2 1 1 115 2 2 2 2 2 1 1 130 2 2 2 2 2 1 1 1100 2 2 2 2 2 1 1 1200 2 2 2 2 2 1 1 1500 2 2 2 2 2 1 1 11000 2 2 2 2 2 1 1 13000 2 2 2 2 2 1 1 15 3 3 3 3 3 1.28 1.132 1.05215 3 3 3 3 3 1.316 1.08 130 3 3 3 3 3 1.256 1.024 1100 3 3 3 3 3 1.248 1.008 1200 3 3 3 3 3 1.196 1 1500 3 3 3 3 3 1.184 1 11000 3 3 3 3 3 1.136 1 13000 3 3 3 3 3 1.072 1 1
Table I: Estimated number of factors, k (average across Monte Carlo iterations). The true number of factorsis r = 1 and we search for 0 ≤ k ≤ T = 2, 3. The eit are assumed iid N(0, 1) across i and t. Columns 3 to6 correspond to the competing criteria IPp1, ICp1, AIC1, BIC1 (see Bai & Ng (2002)[Section 5] for details).
Columns 7 to 9 correspond to our large-N criterion PC(ε2) = ( TT−k )V (k) + kg(N) with g(N) = (log(N))ε1Nε2√
Nand ε1 = 0, ε2 = 0.1, 0.25, 0.45.
8.6 Tables 79
N T PCp1 ICp1 AIC1 BIC1 PC(0.10) PC(0.25) PC(0.45)
10 5 5.16 5.11 5.13 5.19 1.11 1.01 130 5 5.03 5.03 5.03 5.04 1.06 1 1100 5 5.05 5.03 5.03 5.05 6 1 1200 5 5.07 5.06 5.06 5.07 6 1 1500 5 5.07 5.07 5.07 5.07 6 1.2 11000 5 5.05 5.05 5.05 5.05 6 6 13000 5 5.01 5.01 5.01 5.01 6 6 110 10 8 8 8 8 1.02 1 130 10 8 8 8 8 1 1 1100 10 8 8 8 8 1 1 1200 10 8 8 8 8 1 1 1500 10 8 8 8 8 1 1 11000 10 8 7.51 8 8 1 1 13000 10 8 1 8 8 1 1 110 30 8 8 8 8 1 1 130 30 6.15 1 8 8 1 1 1100 30 1.5 1 8 3.6 1 1 1200 30 1 1 8 1 1 1 1500 30 1 1 4.84 1 1 1 11000 30 1 1 1.29 1 1 1 13000 30 1 1 1 1 1 1 110 60 8 8 8 8 1 1 130 60 3.54 1 8 8 1 1 1100 60 1 1 8 2.18 1 1 1200 60 1 1 8 1 1 1 1500 60 1 1 4.78 1 1 1 11000 60 1 1 1 1 1 1 13000 60 1 1 1 1 1 1 110 120 8 8 8 8 1 1 130 120 1.13 1 8 8 1 1 1100 120 1 1 8 4.2 1 1 1200 120 1 1 8 1 1 1 1500 120 1 1 8 1 1 1 11000 120 1 1 1.43 1 1 1 13000 120 1 1 1 1 1 1 1
Table II: Estimated number of factors, k (average across Monte Carlo iterations). The true number of factors isr = 1 and we search for 0 ≤ k ≤ T . The eit are assumed iid N(0, 1) across i and t. Columns 3 to 6 correspondto the competing criteria IPp1, ICp1, AIC1, BIC1 (see Bai & Ng (2002)[Section 5] for details). Columns 7 to
9 correspond to our large-N criterion PC(ε2) = ( TT−k )V (k) + kg(N) with g(N) = (log(N))ε1Nε2√
Nand ε1 = 0,
ε2 = 0.1, 0.25, 0.45.
8.6 Tables 80
N T PCp1 ICp1 AIC1 BIC1 PC(0.10) PC(0.25) PC(0.45)
10 5 5.05 5.05 5.05 5.05 4.38 3.57 2.2330 5 5.01 5 5 5.01 6 5.98 2.43100 5 5.01 5 5 5.01 6 6 3.6200 5 5.05 5.05 5.05 5.05 6 6 4.22500 5 5.05 5.02 5.01 5.05 6 6 5.111000 5 5.03 5.02 5 5.03 6 6 5.73000 5 5.08 5.07 5.06 5.08 6 6 5.8510 10 8 8 8 8 4.4 3.39 2.0430 10 8 8 8 8 6.19 3.23 1.8100 10 8 8 8 8 6.76 2.87 1.76200 10 8 8 8 8 6.5 2.9 1.87500 10 8 8 8 8 5.77 2.91 1.861000 10 8 8 8 8 5.29 2.95 1.953000 10 8 8 8 8 4.39 2.99 1.9410 30 8 8 8 8 4.22 2.88 1.8830 30 6.34 2.93 8 8 3.61 2.91 1.94100 30 3.09 3 8 4.46 3.04 3 2.07200 30 3 3 8 3 3 3 2.15500 30 3 3 5.52 3 3 3 2.261000 30 3 3 3.03 3 3 3 2.383000 30 3 3 3 3 3 3 2.5510 60 8 8 8 8 4.15 2.79 1.9530 60 4.18 2.98 8 8 3.02 2.95 2.03100 60 3 3 8 3.41 3 3 2.23200 60 3 3 8 3 3 3 2.32500 60 3 3 5.49 3 3 3 2.621000 60 3 3 3 3 3 3 2.73000 60 3 3 3 3 3 3 2.7610 120 8 8 8 8 3.53 2.78 1.9830 120 3.01 2.99 8 8 3 2.97 2.07100 120 3 3 8 4.91 3 3 2.37200 120 3 3 8 3 3 3 2.45500 120 3 3 8 3 3 3 2.811000 120 3 3 3.12 3 3 3 2.93000 120 3 3 3 3 3 3 2.96
Table III: Estimated number of factors, k (average across Monte Carlo iterations). The true number of factorsis r = 3 and we search for 0 ≤ k ≤ T . The eit are assumed iid N(0, 1) across i and t. Columns 3 to 6 correspondto the competing criteria IPp1, ICp1, AIC1, BIC1 (see Bai & Ng (2002)[Section 5] for details). Columns 7 to
9 correspond to our large-N criterion PC(ε2) = ( TT−k )V (k) + kg(N) with g(N) = (log(N))ε1Nε2√
Nand ε1 = 0,
ε2 = 0.1, 0.25, 0.45.
8.6 Tables 81
N=10 N=100 N=1000 N=3000
T=2 - - - -T=5 0.942 0.996 0.999 0.999T=50 0.955 0.996 0.999 0.999
Table IV: Sample correlations between Ft and Ft for 1 ≤ t ≤ T (average across Monte Carlo iterations).
N=10 N=100 N=1000 N=3000
T=2 0.610 0.667 0.674 0.675T=5 0.875 0.905 0.907 0.907T=50 0.989 0.991 0.992 0.992
Table V: Sample correlations between λi and λi for 1 ≤ i ≤ N (average across Monte Carlo iterations).
8.6 Tables 82
mkt smb hml rmw cma
factor 1 -0.163 -0.013 0.024 0.040 0.040
factor 2 0.013 0.007 -0.071 -0.091 -0.030
factor 3 -0.032 -0.113 -0.004 0.015 0.070
factor 4 0.005 0.004 -0.002 0.025 -0.043
factor 5 -0.043 -0.020 0.027 0.011 0.041
Table VI: Identifying risk factors: entire period (Dec 1960 to Dec 2013). We report the correlation betweenmktt (market excess return), smbt (small-minus-big portfolio return), hmlt(high-minus-low portfolio return),rmwt (profitability portfolio return), camt (investment portfolio return) and the first five estimated risk factors.Rolling windows of T = 12 observations from Dec 1960 to Dec 2013. Data source: Kenneth French’s website.
8.6 Tables 83
mkt smb hml rmw cma
factor 1 0.367 0.083 0.378 0.173 -0.064
factor 2 -0.194 -0.037 -0.311 -0.160 0.109
factor 3 0.161 0.433 0.439 -0.125 -0.046
factor 4 0.071 0.085 -0.342 -0.361 -0.462
factor 5 0.007 0.191 -0.232 -0.155 -0.412
Table VII: Identifying risk factors: from the burst of the sub-prime bubble to recovery (Aug 2007 to Feb2009). We report the correlation between mktt (market excess return), smbt (small-minus-big portfolio return),hmlt(high-minus-low portfolio return), rmwt (profitability portfolio return), camt (investment portfolio return)and the first five estimated risk factors. Rolling windows of T = 12 observations from Dec 1960 to Dec 2013.Data source: Kenneth French’s website.
8.6 Tables 84
1/N k=k k=1 k=5 k=10
0.58 0.88 0.65 0.18 0.14- (2.13) (0.49) (- 2.86) (-3.15)
Table VIII: Column one reports the out-of-sample Sharpe ratio (annualized) corresponding to equally-weightedportfolio excess return. Columns two to five report the out-of-sample Sharpe ratios (annualized) correspondingto the tangency portfolio excess returns:
rmv,kt+1 = wmv′t xt+1.,
corresponding to an estimated k factor models, with portfolio weights:
wmvt =
Σ−1t Λtγ
1′NΣ−1t Λtγ
where Σt =(ΛtΩΛ′t + σ2
t IN
).
Rolling windows of T = 36 observations from Dec 1960 to Dec 2013. Data source: individual equity returnsfrom CRSP. The second row report the t-ratios for testing equality between the Sharpe ratios of the portfoliosassociated with the factor model and the Sharpe ratio of the equally weighted portfolio (see Lo (2003) fordetails).
8.7 Figures 85
8.7 Figures
Figure I: T = 10, N = 100. The figure presents the histograms(across Monte Carlo iterations) of the sample correlations betweenFt and Ft (dark blue) and of the sample correlations between λi andλi (red). The correlations have been taken in absolute value.
Histogram of abs(corre)
abs(corre)
Den
sity
0.93 0.94 0.95 0.96 0.97 0.98 0.99 1.00
050
100
150
8.7 Figures 86
Figure II: T = 100, N = 100. The figure presents the histograms (across Monte Carlo iterations)of the sample correlations between Ft and Ft (dark blue) and of the sample correlations betweenλi and λi (red). The correlations have been taken in absolute value.
Histogram of abs(corre)
abs(corre)
Den
sity
0.994 0.995 0.996 0.997 0.998
010
020
030
040
050
060
0
8.7 Figures 87
Figure III: T = 2, N = 10. The figure presents the histogram of (studentized) estimated factors
(across Monte Carlo iterations), namely f small−Ts = (A′BsA)−12N
12 (Fs − HFs) (green area) and
f large−Ts = (σ2U∗−1)−12N
12 (Fs− HFs) (blue area) for s = 1. The standard normal density function
is superimposed on the histograms (dark blue line).
Histogram of fstn
fstn
Den
sity
−6 −4 −2 0 2 4
0.0
0.1
0.2
0.3
0.4
8.7 Figures 88
Figure IV: T = 2, N = 1000. The figure presents the histogram of (studentized) estimated factors
(across Monte Carlo iterations), namely f small−Ts = (A′BsA)−12N
12 (Fs − HFs) (green area) and
f large−Ts = (σ2U∗−1)−12N
12 (Fs− HFs) (blue area) for s = 1. The standard normal density function
is superimposed on the histograms (dark blue line).
Histogram of fstn
fstn
Den
sity
−4 −2 0 2 4
0.0
0.1
0.2
0.3
0.4
8.7 Figures 89
Figure V: T = 5, N = 10. The figure presents the histogram of (studentized) estimated factors
(across Monte Carlo iterations), namely f small−Ts = (A′BsA)−12N
12 (Fs − HFs) (green area) and
f large−Ts = (σ2U∗−1)−12N
12 (Fs− HFs) (blue area) for s = 1. The standard normal density function
is superimposed on the histograms (dark blue line).
Histogram of fstn
fstn
Den
sity
−4 −2 0 2
0.0
0.1
0.2
0.3
0.4
8.7 Figures 90
Figure VI: T = 5, N = 1000. The figure presents the histogram of (studentized) estimated factors
(across Monte Carlo iterations), namely f small−Ts = (A′BsA)−12N
12 (Fs − HFs) (green area) and
f large−Ts = (σ2U∗−1)−12N
12 (Fs− HFs) (blue area) for s = 1. The standard normal density function
is superimposed on the histograms (dark blue line).
Histogram of fstn
fstn
Den
sity
−2 0 2 4
0.0
0.1
0.2
0.3
0.4
8.7 Figures 91
Figure VII: T = 50, N = 10. The figure presents the histogram of (studentized) estimated factors
(across Monte Carlo iterations), namely f small−Ts = (A′BsA)−12N
12 (Fs − HFs) (green area) and
f large−Ts = (σ2U∗−1)−12N
12 (Fs− HFs) (blue area) for s = 1. The standard normal density function
is superimposed on the histograms (dark blue line).
Histogram of fstn
fstn
Den
sity
−4 −3 −2 −1 0 1 2 3
0.0
0.1
0.2
0.3
0.4
8.7 Figures 92
Figure VIII: T = 50, N = 1000. The figure presents the histogram of (studentized) estimated
factors (across Monte Carlo iterations), namely f small−Ts = (A′BsA)−12N
12 (Fs− HFs) (green area)
and f large−Ts = (σ2U∗−1)−12N
12 (Fs − HFs) (blue area) for s = 1. The standard normal density
function is superimposed on the histograms (dark blue line).
Histogram of fstn
fstn
Den
sity
−3 −2 −1 0 1 2 3
0.0
0.1
0.2
0.3
0.4
8.7 Figures 93
Figure IX: T = 50, N = 1000. The figure presents the histogram of (studentized) estimated factors
(across Monte Carlo iterations), namely f small−Ts = (A′BsA)−12N
12 (Fs − HFs) (green area) and
f large−Ts = (σ2U∗−1)−12N
12 (Fs−HFs) (blue area) for s = 1. We also report (red area) the histogram
of (studentized) estimated factors using Bai (2003) robust standard errors. The standard normaldensity function is superimposed on the histograms (dark blue line).
Histogram of fstn
fstn
Den
sity
−3 −2 −1 0 1 2 3
0.0
0.1
0.2
0.3
0.4
8.7 Figures 94
Figure X: T = 100, N = 1000. The figure presents the histogram of (studentized) estimated factors
(across Monte Carlo iterations), namely f small−Ts = (A′BsA)−12N
12 (Fs − HFs) (green area) and
f large−Ts = (σ2U∗−1)−12N
12 (Fs−HFs) (blue area) for s = 1. We also report (red area) the histogram
of (studentized) estimated factors using Bai (2003) robust standard errors. The standard normaldensity function is superimposed on the histograms (dark blue line).
Histogram of fstn
fstn
Den
sity
−3 −2 −1 0 1 2 3 4
0.0
0.1
0.2
0.3
0.4
0.5
8.7 Figures 95
Figure XI: T = 5, N = 10. The figure presents the time series (average across Monte Carloiterations) of the 95% confidence intervals (red lines) for the true factor (black line):(
βFt − 1.96β
N12 (A′BtA)
12
, βFt + 1.96β
N12 (A′BtA)
12
), t = 1, · · · , T,
where β is the OLS estimator from projecting F on F (without intercept).
1 2 3 4 5
−1.
0−
0.5
0.0
0.5
1.0
1.5
95% Confidence Interval for Factors:
T=5, N=10t
t(ftr
ue)
8.7 Figures 96
Figure XII: T = 5, N = 1000. The figure presents the time series (average across Monte Carloiterations) of the 95% confidence intervals (red lines) for the true factor (black line):(
βFt − 1.96β
N12 (A′BtA)
12
, βFt + 1.96β
N12 (A′BtA)
12
), t = 1, · · · , T,
where β is the OLS estimator from projecting F on F (without intercept).
1 2 3 4 5
−1.
0−
0.5
0.0
0.5
1.0
1.5
95% Confidence Interval for Factors:
T=5, N=1000t
t(ftr
ue)
8.7 Figures 97
Figure XIII: The figure presents the time series of the estimated number of factors kt−1 (blackline), using the criterion of Theorem t-numfac, together with the S& P 500 Index (green line) fromJanuary 1966 to December 2013, with rolling windows of size T = 12. The grey bands indicatingthe NBER recession indicators and various economic and financial crises, and are numbered asfollows: [1]: 1969.10 - 1970.11, [2]: 1973.11 - 1975.3, [3]: 1987.8-1977.11, [4]: 1980.1-1980.7, [5]:1981.7-1982.11, [6]: 1986.10-1986.12, [7] 1987.9 - 1987.11, [8] 1989.9 - 1989.12, [9] 1990.7 - 1991.3,[10] 1991.8 -01992.12, [11] 1994.7-1994.10, [12] 1997.5-1997.9, [13]1998.8-1998.10, [14]2000.2-2000.4,[15]2001.3-2001.11, [16]2005.8-2005.11, [17]2007.12-2009.6, [18]2010.8-2010.10, [19]2012.5-2012.7.Data are monthly from Dec 1960 until Dec 2013.
8.7 Figures 98
Figure XIV: The figure presents the time series of the estimated number of factors kt−1, from Jan-uary 1966 to December 2013, using the criterion of Theorem t-numfac, with rolling windows of sizeT = 12, together with grey bands indicating the NBER recession indicators and various economicand financial crises, and are numbered as follows: [1]: 1969.10 - 1970.11, [2]: 1973.11 - 1975.3, [3]:1987.8-1977.11, [4]: 1980.1-1980.7, [5]: 1981.7-1982.11, [6]: 1986.10-1986.12, [7] 1987.9 - 1987.11,[8] 1989.9 - 1989.12, [9] 1990.7 - 1991.3, [10] 1991.8 -01992.12, [11] 1994.7-1994.10, [12] 1997.5-1997.9, [13]1998.8-1998.10, [14]2000.2-2000.4, [15]2001.3-2001.11, [16]2005.8-2005.11, [17]2007.12-2009.6, [18]2010.8-2010.10, [19]2012.5-2012.7.Data are monthly from Dec 1960 until Dec 2013.
8.7 Figures 99
Figure XV: The figure presents the time series of the first five estimated risk risk factors, namelythe first five entries of Ft,t, based on rolling windows of T = 12 observations. The number of stocks,corresponding to each rolling window, varies from 800 to 5, 000 circa. The grey bands indicatingthe NBER recession indicators and various economic and financial crises, and are numbered asfollows: [1]: 1969.10 - 1970.11, [2]: 1973.11 - 1975.3, [3]: 1987.8-1977.11, [4]: 1980.1-1980.7, [5]:1981.7-1982.11, [6]: 1986.10-1986.12, [7] 1987.9 - 1987.11, [8] 1989.9 - 1989.12, [9] 1990.7 - 1991.3,[10] 1991.8 -01992.12, [11] 1994.7-1994.10, [12] 1997.5-1997.9, [13]1998.8-1998.10, [14]2000.2-2000.4,[15]2001.3-2001.11, [16]2005.8-2005.11, [17]2007.12-2009.6, [18]2010.8-2010.10, [19]2012.5-2012.7.Data are monthly from Dec 1960 until Dec 2013.
8.7 Figures 100
Figure XVI: The figure presents the time series of the estimated correlation, over rolling windowsof 60 months, of the market return with the first PCA, of the smb with the second PCA andof the hml with the third PCA, where the PCA is implemented over rolling windows of T = 12observations. The number of stocks, corresponding to each rolling window, varies from 800 to5, 000 circa. The grey bands indicating the NBER recession indicators and various economic andfinancial crises, and are numbered as follows: [1]: 1969.10 - 1970.11, [2]: 1973.11 - 1975.3, [3]:1987.8-1977.11, [4]: 1980.1-1980.7, [5]: 1981.7-1982.11, [6]: 1986.10-1986.12, [7] 1987.9 - 1987.11,[8] 1989.9 - 1989.12, [9] 1990.7 - 1991.3, [10] 1991.8 -01992.12, [11] 1994.7-1994.10, [12] 1997.5-1997.9, [13]1998.8-1998.10, [14]2000.2-2000.4, [15]2001.3-2001.11, [16]2005.8-2005.11, [17]2007.12-2009.6, [18]2010.8-2010.10, [19]2012.5-2012.7.Data are monthly from Dec 1960 until Dec 2013.
8.7 Figures 101
Figure XVII: The figure presents the time series of the estimated correlation, over rolling windowsof 60 months, of the rmw and cma returns with the fourth PCA, where the PCA is implementedover rolling windows of T = 12 observations. The number of stocks, corresponding to each rollingwindow, varies from 800 to 5, 000 circa. The grey bands indicating the NBER recession indicatorsand various economic and financial crises, and are numbered as follows:The figure presents the time series of the first five estimated risk risk factors, based on rollingwindows of T = 12 observations. The number of stocks, corresponding to each rolling window,varies from 800 to 5, 000 circa. The grey bands indicating the NBER recession indicators andvarious economic and financial crises, and are numbered as follows: [1]: 1969.10 - 1970.11, [2]:1973.11 - 1975.3, [3]: 1987.8-1977.11, [4]: 1980.1-1980.7, [5]: 1981.7-1982.11, [6]: 1986.10-1986.12,[7] 1987.9 - 1987.11, [8] 1989.9 - 1989.12, [9] 1990.7 - 1991.3, [10] 1991.8 -01992.12, [11] 1994.7-1994.10, [12] 1997.5-1997.9, [13]1998.8-1998.10, [14]2000.2-2000.4, [15]2001.3-2001.11, [16]2005.8-2005.11, [17]2007.12-2009.6, [18]2010.8-2010.10, [19]2012.5-2012.7.Data are monthly from Dec 1960 until Dec 2013.
8.7 Figures 102
Figure XVIII: The figure presents the time series of the estimated correlation, over rolling windowsof 60 months, of the rmw and cma returns with the fifth PCA, where the PCA is implementedover rolling windows of T = 12 observations. The number of stocks, corresponding to each rollingwindow, varies from 800 to 5, 000 circa. The grey bands indicating the NBER recession indicatorsand various economic and financial crises, and are numbered as follows: [1]: 1969.10 - 1970.11, [2]:1973.11 - 1975.3, [3]: 1987.8-1977.11, [4]: 1980.1-1980.7, [5]: 1981.7-1982.11, [6]: 1986.10-1986.12,[7] 1987.9 - 1987.11, [8] 1989.9 - 1989.12, [9] 1990.7 - 1991.3, [10] 1991.8 -01992.12, [11] 1994.7-1994.10, [12] 1997.5-1997.9, [13]1998.8-1998.10, [14]2000.2-2000.4, [15]2001.3-2001.11, [16]2005.8-2005.11, [17]2007.12-2009.6, [18]2010.8-2010.10, [19]2012.5-2012.7.Data are monthly from Dec 1960 until Dec 2013.
8.7 Figures 103
Figure XIX: The figure presents the time series of the estimated expected equally-weighted portfolioexcess return (black line) µewt = wew′
N Λtγ, where wewN = N−11N , based on the estimated number of
factors reported in Figure XIV, with rolling windows of size T = 12. We also report (red line) themarket dividend yield (ratio of aggregate dividends to the S&P 500 index; source Robert Shiller’swebsite), (blue line) the default spread (Moody’s Seasoned Baa Corporate Bond Yield Relative toYield on 10-Year Treasury Constant Maturity, Not Seasonally Adjusted; source FRED), (light blueline) the term spread (10-Year Treasury Constant Maturity Minus 3-Month Treasury ConstantMaturity, Not Seasonally Adjusted; source FRED) and (green line) the CAPE ratio (CyclicallyAdjusted Price Earnings Ratio P/E10; source Robert Shiller website). The grey bands indicatingthe NBER recession indicators and various economic and financial crises, and are numbered asfollows: [1]: 1969.10 - 1970.11, [2]: 1973.11 - 1975.3, [3]: 1987.8-1977.11, [4]: 1980.1-1980.7, [5]:1981.7-1982.11, [6]: 1986.10-1986.12, [7] 1987.9 - 1987.11, [8] 1989.9 - 1989.12, [9] 1990.7 - 1991.3,[10] 1991.8 -01992.12, [11] 1994.7-1994.10, [12] 1997.5-1997.9, [13]1998.8-1998.10, [14]2000.2-2000.4,[15]2001.3-2001.11, [16]2005.8-2005.11, [17]2007.12-2009.6, [18]2010.8-2010.10, [19]2012.5-2012.7.Data are monthly from Dec 1960 until Dec 2013.
8.7 Figures 104
Figure XX: The figure presents the time series of the adjusted-R2 of the regression, over rolling win-dows of 60 observations, of µewt = wew′
N Λtγ, where wewN = N−11N , based on the estimated number
of factors reported in Figure XIV, over four state variables, namely the market dividend yield (ratioof aggregate dividends to the S&P 500 index; source Robert Shiller’s website), the default spread(Moody’s Seasoned Baa Corporate Bond Yield Relative to Yield on 10-Year Treasury ConstantMaturity, Not Seasonally Adjusted; source FRED), the term spread (10-Year Treasury ConstantMaturity Minus 3-Month Treasury Constant Maturity, Not Seasonally Adjusted; source FRED) andthe CAPE ratio (Cyclically Adjusted Price Earnings Ratio P/E10; source Robert Shiller website).The black line corresponds to the R2 when the four variables are considered jointly, and the colouredlines when each state variable is considered alone in the regression (red line for CAPE, green linefor term spread, blue line for default spread, light blue for dividend yield). The grey bands indi-cating the NBER recession indicators and various economic and financial crises, and are numberedas follows: [1]: 1969.10 - 1970.11, [2]: 1973.11 - 1975.3, [3]: 1987.8-1977.11, [4]: 1980.1-1980.7, [5]:1981.7-1982.11, [6]: 1986.10-1986.12, [7] 1987.9 - 1987.11, [8] 1989.9 - 1989.12, [9] 1990.7 - 1991.3,[10] 1991.8 -01992.12, [11] 1994.7-1994.10, [12] 1997.5-1997.9, [13]1998.8-1998.10, [14]2000.2-2000.4,[15]2001.3-2001.11, [16]2005.8-2005.11, [17]2007.12-2009.6, [18]2010.8-2010.10, [19]2012.5-2012.7.Data are monthly from Dec 1960 until Dec 2013.
8.7 Figures 105
Figure XXI: The figure presents the time series of the estimated risk premium (black line), as-sociated with the first risk factor, with rolling windows of size T = 12. When it is statisticallysignificantly positive and negative at 95%, we report it in green and red respectively, based on the re-sults of Theorem 4. The grey bands indicating the NBER recession indicators and various economicand financial crises, and are numbered as follows: [1]: 1969.10 - 1970.11, [2]: 1973.11 - 1975.3, [3]:1987.8-1977.11, [4]: 1980.1-1980.7, [5]: 1981.7-1982.11, [6]: 1986.10-1986.12, [7] 1987.9 - 1987.11,[8] 1989.9 - 1989.12, [9] 1990.7 - 1991.3, [10] 1991.8 -01992.12, [11] 1994.7-1994.10, [12] 1997.5-1997.9, [13]1998.8-1998.10, [14]2000.2-2000.4, [15]2001.3-2001.11, [16]2005.8-2005.11, [17]2007.12-2009.6, [18]2010.8-2010.10, [19]2012.5-2012.7.Data are monthly from Dec 1960 until Dec 2013.
8.7 Figures 106
Figure XXII: The figure presents the time series of the estimated risk premium (black line), as-sociated with the second risk factors, with rolling windows of size T = 12. When it is statisti-cally significantly positive and negative at 5% significance value, we report it in green and redrespectively, based on the results of Theorem 4. The grey bands indicating the NBER recessionindicators and various economic and financial crises, and are numbered as follows: [1]: 1969.10- 1970.11, [2]: 1973.11 - 1975.3, [3]: 1987.8-1977.11, [4]: 1980.1-1980.7, [5]: 1981.7-1982.11, [6]:1986.10-1986.12, [7] 1987.9 - 1987.11, [8] 1989.9 - 1989.12, [9] 1990.7 - 1991.3, [10] 1991.8 -01992.12,[11] 1994.7-1994.10, [12] 1997.5-1997.9, [13]1998.8-1998.10, [14]2000.2-2000.4, [15]2001.3-2001.11,[16]2005.8-2005.11, [17]2007.12-2009.6, [18]2010.8-2010.10, [19]2012.5-2012.7.Data are monthly from Dec 1960 until Dec 2013.
8.7 Figures 107
Figure XXIII: The figure presents the time series of the estimated risk premium (black line),associated with the third risk factors, with rolling windows of size T = 12. When it is statisticallysignificantly positive and negative at 5% significance value, we report it in green and red respectively,based on the results of Theorem 4. The grey bands indicate the NBER recession indicators andvarious economic and financial crises, and are numbered as follows: [1]: 1969.10 - 1970.11, [2]:1973.11 - 1975.3, [3]: 1987.8-1977.11, [4]: 1980.1-1980.7, [5]: 1981.7-1982.11, [6]: 1986.10-1986.12,[7] 1987.9 - 1987.11, [8] 1989.9 - 1989.12, [9] 1990.7 - 1991.3, [10] 1991.8 -01992.12, [11] 1994.7-1994.10, [12] 1997.5-1997.9, [13]1998.8-1998.10, [14]2000.2-2000.4, [15]2001.3-2001.11, [16]2005.8-2005.11, [17]2007.12-2009.6, [18]2010.8-2010.10, [19]2012.5-2012.7.Data are monthly from Dec 1960 until Dec 2013.
8.7 Figures 108
Figure XXIV: The figure presents the time series (black line), estimated over rolling windows ofT = 12 observations, of the pricing performance statistic:
∆t−1 =1
N
N∑i=1
δ2it−1 −
σ2t−1
T(1− γ ′γ),
where δit−1 = xi− λ′it−1γ. The black line corresponds to the estimated number of factors kt−1 withour criterion. We report cases k = 1, (red line), k = 2 (green line), k = 3 (blue line) and k = 5(light blue).The grey bands indicate the NBER recession indicators and various economic and financial crises,and are numbered as follows: [1]: 1969.10 - 1970.11, [2]: 1973.11 - 1975.3, [3]: 1987.8-1977.11, [4]:1980.1-1980.7, [5]: 1981.7-1982.11, [6]: 1986.10-1986.12, [7] 1987.9 - 1987.11, [8] 1989.9 - 1989.12, [9]1990.7 - 1991.3, [10] 1991.8 -01992.12, [11] 1994.7-1994.10, [12] 1997.5-1997.9, [13]1998.8-1998.10,[14]2000.2-2000.4, [15]2001.3-2001.11, [16]2005.8-2005.11, [17]2007.12-2009.6, [18]2010.8-2010.10,[19]2012.5-2012.7.Data are monthly from Dec 1960 until Dec 2013.
8.7 Figures 109
Figure XXV: The figure presents the time series (black line) of the estimated SDF:
mt,t+1 ≡1
rft− 1
rftγ ′Ω−1(Ft+1 − γ).
together with its large-N 95% confidence interval (red lines), based on Theorem 5, with rolling win-dows of size T = 24. The grey bands indicate the NBER recession indicators and various economicand financial crises, and are numbered as follows: [1]: 1969.10 - 1970.11, [2]: 1973.11 - 1975.3, [3]:1987.8-1977.11, [4]: 1980.1-1980.7, [5]: 1981.7-1982.11, [6]: 1986.10-1986.12, [7] 1987.9 - 1987.11,[8] 1989.9 - 1989.12, [9] 1990.7 - 1991.3, [10] 1991.8 -01992.12, [11] 1994.7-1994.10, [12] 1997.5-1997.9, [13]1998.8-1998.10, [14]2000.2-2000.4, [15]2001.3-2001.11, [16]2005.8-2005.11, [17]2007.12-2009.6, [18]2010.8-2010.10, [19]2012.5-2012.7.Data are monthly from Dec 1960 until Dec 2013.
8.7 Figures 110
Figure XXVI: We estimate the three linear regressions, using rolling windows of 60 months,
mt−1,t = α+ β1mktt + ut,
mt−1,t = α+ β1mktt + β2smbt + β3hmlt + ut,
mt−1,t = α+ β1mktt + β2smbt + β3hmlt + β4rmwt + β5cmat + ut,
and report the corresponding adjusted-R2. Here mktt denotes the market excess return, smbtdenotes the small-minus-big portfolio return, hmlt denotes the high-minus-low portfolio return,rmwt denotes the profitability portfolio return and camt denotes the investment portfolio return(source: Kenneth French’s website); t-ratios in parenthesis. The CAPM, the Fama & French(1993) and Fama & French (2015) models are reported with the black, red and green lines, re-spectively. The grey bands indicate the NBER recession indicators and various economic andfinancial crises, and are numbered as follows: [1]: 1969.10 - 1970.11, [2]: 1973.11 - 1975.3, [3]:1987.8-1977.11, [4]: 1980.1-1980.7, [5]: 1981.7-1982.11, [6]: 1986.10-1986.12, [7] 1987.9 - 1987.11,[8] 1989.9 - 1989.12, [9] 1990.7 - 1991.3, [10] 1991.8 -01992.12, [11] 1994.7-1994.10, [12] 1997.5-1997.9, [13]1998.8-1998.10, [14]2000.2-2000.4, [15]2001.3-2001.11, [16]2005.8-2005.11, [17]2007.12-2009.6, [18]2010.8-2010.10, [19]2012.5-2012.7.Data are monthly from Dec 1960 until Dec 2013.