
Factor Models for Conditional Asset Pricing∗

PAOLO ZAFFARONI

Imperial College Business School

This Draft

November 12, 2019

Abstract

This paper develops a methodology for inference on conditional asset pricing models linear in latent risk factors, valid when the number of assets diverges but the time series dimension is fixed, possibly very small. We show that the no-arbitrage condition permits identification of the risk premia as the expectation of the latent risk factors. This result paves the way to an inferential procedure for the factors’ risk premia and for the stochastic discount factor, spanned by the latent risk factors. In our set-up every feature of the asset pricing model is allowed to be time-varying, including loadings, risk premia, idiosyncratic risk and the number of risk factors. Monte Carlo experiments corroborate our theoretical findings. Several empirical applications based on individual asset returns data demonstrate the power of the methodology, making it possible to tease out the empirical content of the time-variation stemming from asset pricing theory.

Keywords: Conditional Asset Pricing Models; No-Arbitrage; Latent Factor Model; Risk Premia; Stochastic Discount Factor; Principal Component Analysis.

∗I thank Claudia Custodio, Victor DeMiguel, Gabaix Xavier, Marcin Kacperczyk, Frank Kleibergen, Ralph Koijen, Oliver Linton, Stefan Nagel, Olivier Scaillet, Raman Uppal, Dacheng Xiu, Guofu Zhou, audiences at Cambridge, Imperial College London, Luxembourg School of Finance, Yale and Tinbergen Institute (Amsterdam) for comments and suggestions. I am very grateful to Massimo Dello Preite for developing the simulation and empirical exercises of this paper.


1 Introduction

This paper develops a formal methodology to conduct inference on no-arbitrage factor conditional asset pricing models. Our method does not require specifying in any way how the conditional distribution of asset returns changes over time, so that the form of time-variation of loadings, risk premia and even of the number of risk factors can be left completely unspecified. Moreover, it does not require pre-selecting any candidate observed risk factor; instead, it treats all the risk factors as unobserved. These two features imply that our method completely mitigates the risk of model misspecification arising either from postulating incorrect dynamics or from relying on a wrong or incomplete set of risk factors. The salient feature of our methodology, which makes these properties attainable, is that it works when the time-series dimension of the data T is fixed, in practice (almost) arbitrarily small, whereas it requires the number of assets N to diverge.

Linear factor models represent the workhorse of asset pricing, whereby the stochastic discount factor (SDF)

is assumed to be a linear function of a set of systematic risk factors, common across assets. From a theoretical perspective, this approach has been legitimized by the CAPM of Sharpe (1964) and Lintner (1965), the

Intertemporal CAPM of Merton (1973) and the Arbitrage Pricing Theory (APT) of Ross (1976). Empirically,

it has proven much more arduous to identify the complete set of risk factors that span the SDF. Although

the Fama & French (1993) three-factor model and, especially, the Fama & French (2015) five-factor model

appear successful, with respect to several metrics, there is no consensus about which, among the hundreds of

candidate risk factors documented in the empirical asset pricing literature (see Harvey et al. (2016)), are really

“...important and which are....subsumed by others” (Cochrane (2011)).

At the same time, tens of thousands of financial assets are traded every day in financial markets. However,

most of the econometric methods developed to test and estimate asset pricing models are designed for when

the time-series dimension T diverges and the number of assets N is fixed. There are many instances when

allowing a large T is not beneficial or just not feasible. For instance, economic and financial phenomena are

plagued by structural breaks (geopolitical crises and wars, energy crises, financial crises) and, sometimes, a

long time series of data simply is not available, such as when considering financial data on emerging markets

or data on new financial instruments. An arbitrarily large number of assets N is also dictated by asset pricing

theory, in particular when the idiosyncratic component of asset returns is assumed correlated across assets. For

example the CAPM hinges on the notion of full diversifiability of idiosyncratic risk, achieved when N diverges.1

1 More generally, exact pricing (i.e. when expected returns are exactly linear in betas with null pricing errors) requires existence of a well-diversified portfolio on the mean-variance frontier (see Chamberlain (1983)[Corollary 1]). On the other hand, when exact


Moreover, dynamic asset pricing models characterize expected returns, prices and risk premia as conditional

(hence time-varying) moments and such time-variation has to be explicitly parameterized, in a tractable way,

when dealing with any large-T estimation procedure. However, with tractability comes the risk of model

misspecification, rendering any large-T inferential procedure potentially invalid. More generally, estimating the first moment of asset returns by means of large-T techniques is notoriously challenging, as elegantly shown by Merton (1980).

Therefore, empirical and theoretical considerations strongly suggest the need for inferential methods for

conditional asset pricing models that work when N is large and T is small. This is the achievement of the

paper. More formally, this paper provides the empirical asset pricing methodology that mirrors the theoretical

asset pricing contributions of Chamberlain (1983), Chamberlain & Rothschild (1983a) and Hansen & Richard

(1987). In particular, whereas the first two seminal works combine no-arbitrage factor asset pricing models with

population Principal Component Analysis (PCA) in a large-N environment, ours develops the inferential theory

of the sample PCA for no-arbitrage factor asset pricing models, also in a large-N environment. This implies

that our inferential procedure is particularly suited for conditional asset pricing models, formalized upon the

conditional no-arbitrage condition of Hansen & Richard (1987), because parameterizing time-variation becomes

much less critical, if any at all, when T can be taken sufficiently small, as the assumption of mild or no time-

variation represents a plausible approximation over a short time interval.

Our novel PCA asymptotic results are instrumental for developing an inferential procedure for our objects

of interest, namely the time-varying risk premia and the time-varying SDF. Our results are first developed for

the case when a risk free asset is traded, and then generalized to the case when the zero-beta rate is unknown

and needs to be estimated. The usefulness of the PCA estimator of the risk factors, from an asset pricing

perspective, naturally stems from its close analogy with the mimicking portfolio estimator: we demonstrate

that these two estimators, although not identical, are asymptotically equivalent, implying that they deliver the same inference on the set of true latent risk factors. Our methodology for conditional asset pricing models represents a unique tool to address important questions in empirical asset pricing, and beyond, that cannot be addressed with the usual PCA methodology, which is valid under double asymptotics (i.e. when both N and T diverge).

More specifically, we show how to estimate consistently the number of (latent) risk factors and the associated

pricing does not hold, such as for the APT, an arbitrarily large N is required for all the theoretical predictions to hold. More specifically, Chamberlain & Rothschild (1983a) state that “... Ross (1976) has argued that the apparent empirical success of the CAPM is due to three assumptions which are more plausible than the assumptions needed to derive the CAPM. These assumptions are first, that there are many assets; second, that the market permits no arbitrage opportunities; and third, that asset returns have a factor structure with a small number of factors...”


factors’ risk premia, and how to conduct inference on them, such as e.g. constructing asymptotically valid

confidence intervals. Interestingly, as a by-product of the no-arbitrage condition, our risk premia estimator

exhibits two, equivalent, representations: first, as the time-series sample mean of the PCA estimated risk

factors, and, second, as the traditional OLS two-pass estimator of Fama & MacBeth (1973), obtained by

projecting sample average returns on the PCA estimated loadings.2 It turns out that our risk premia estimator

exhibits the same rate of convergence characterizing risk premia estimators for the case when the risk factors

are observed. In other words, somewhat surprisingly, no loss of information arises from the latent nature of the risk factors within our large-N fixed-T sampling scheme, the only difference being that risk premia are now identified up to an (unknown) rotation. Notice that our focus on the large-N and fixed-T scheme necessarily implies that we can identify the so-called ex-post risk premia, which equal the (ex-ante) risk premia plus the risk factors’ innovation (difference between the risk factors’ sample and population means).3 For the case of

observed risk factors, Gagliardini et al. (2016) and Raponi et al. (2018) demonstrate how valid inference on

correct specification of the asset pricing model (i.e. on the asset pricing restriction) can be conducted by relying

on the ex-post risk premia, without any loss of power, when N and T both diverge and when N diverges with T fixed, respectively.
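The equivalence between the two representations of the risk premia estimator can be checked numerically. The following is a minimal sketch, not the paper's implementation: it assumes PCA is applied to the raw T x N panel without de-meaning, with the normalisation F'F/T = I (an assumption of this sketch), and all simulated values (`gamma`, the loadings, the noise scale) are hypothetical.

```python
import numpy as np

def pca_factors(X, r):
    """PCA on a T x N panel without de-meaning.

    Normalisation (an assumption of this sketch): F'F / T = I_r.
    Returns the T x r factor estimates and the N x r loading estimates.
    """
    T, N = X.shape
    # Eigen-decomposition of the symmetric T x T matrix XX'/(NT).
    eigval, eigvec = np.linalg.eigh(X @ X.T / (N * T))
    top = np.argsort(eigval)[::-1][:r]      # indices of the r largest eigenvalues
    F_hat = np.sqrt(T) * eigvec[:, top]     # T x r
    L_hat = X.T @ F_hat / T                 # N x r
    return F_hat, L_hat

rng = np.random.default_rng(0)
T, N, r = 12, 3000, 2
gamma = np.array([0.5, 0.3])                        # hypothetical risk premia
F = gamma + rng.standard_normal((T, r))             # factors centred at gamma
Lam = rng.standard_normal((N, r))                   # hypothetical loadings
X = F @ Lam.T + 0.5 * rng.standard_normal((T, N))   # excess returns, no intercept

F_hat, L_hat = pca_factors(X, r)

# Representation 1: time-series sample mean of the estimated factors.
gamma_mean = F_hat.mean(axis=0)

# Representation 2: OLS of sample average returns on the estimated loadings.
x_bar = X.mean(axis=0)
gamma_two_pass = np.linalg.lstsq(L_hat, x_bar, rcond=None)[0]

print(gamma_mean, gamma_two_pass)
```

Under this normalisation the two vectors coincide up to floating-point error; both estimate the risk premia only up to the rotation discussed in the text, so they need not be close to `gamma` itself.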

We then study estimation of the SDF spanned by the latent risk factors. Unlike the risk premia, the SDF

is unaffected by the rotation that plagues the PCA estimator of factors and risk premia. We derive the rate of

convergence and the asymptotic distribution of our SDF estimator, and show how to construct asymptotically

valid confidence intervals. This is important because the limiting statistical properties of the SDF, spanned

by latent risk factors, were hitherto unknown, when taking into account the sampling variability of the risk

factors’ estimates. In analogy to risk premia, as a consequence of our sampling scheme, note that the object

of inference is necessarily the so-called ex-post SDF, which is a (linear) function of the ex-post risk premia. To

justify the notion of the ex-post SDF, we formally show that the pricing errors associated with it are almost indistinguishable from the ones associated with the conventional SDF, based on the ex-ante risk premia, even for a moderate T.

Finally, we show how to consistently estimate expected returns of large portfolios, based on our risk premia

estimator. Just as for the SDF, portfolios’ expected returns are immune to the rotation affecting the risk

factors and their risk premia. Consistent estimation follows because expected portfolio returns depend on the

2 Recall that equivalence between the sample mean and the two-pass estimator is not warranted for the case of observed tradeable risk factors.

3 The notion of ex-post risk premia was coined by Shanken (1992).


(weighted) first moment of the loadings, which can be recovered despite the noisiness of the PCA-estimated

loadings themselves.
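The immunity of expected portfolio returns to the rotation can be illustrated with a short numerical sketch. It assumes, for illustration only, that a rotation acts as an invertible r x r matrix `H` mapping the loadings to `Lam @ inv(H)` and the risk premia to `H @ gamma`; the weights, loadings and premia below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
N, r = 500, 3
Lam = rng.standard_normal((N, r))        # hypothetical loadings
gamma = np.array([0.6, 0.2, -0.1])       # hypothetical risk premia
w = np.full(N, 1.0 / N)                  # equally weighted portfolio

# An invertible "rotation": orthogonal matrix times a positive diagonal scaling.
Q, _ = np.linalg.qr(rng.standard_normal((r, r)))
H = Q @ np.diag([2.0, 0.5, 1.5])

Lam_rot = Lam @ np.linalg.inv(H)         # rotated loadings
gamma_rot = H @ gamma                    # rotated risk premia

# Expected portfolio return w' Lam gamma, before and after rotation.
mu_p = w @ Lam @ gamma
mu_p_rot = w @ Lam_rot @ gamma_rot
print(mu_p, mu_p_rot)
```

The two numbers agree because the rotation cancels in the product of loadings and risk premia, even though the loadings and premia individually change.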

We develop several empirical applications, based on a data set of individual U.S. monthly stock returns, that

demonstrate the strength of our large-N methodology. First, we show empirically how the estimated number

of risk factors varies across time, ranging from one to 10 and increasing during booms and sharply decreasing

during financial crises: the estimated correlation between the S&P 500 index and our estimated number of

factors is above 60% in our sample. This is a marked rejection of the conventional approach of constancy of the

number of latent factors. Second, we find robust evidence that the dominant estimated factors

(measured by the size of the corresponding eigenvalues, sorted in descending order) are strongly related to the

Fama & French (2015) five factors. In particular, the market factor, the small-minus-big and high-minus-low

factors appear paired with the first three PCA-estimated factors, respectively, whereas the profitability and the

investment factors appear jointly related to the fourth and fifth PCA-estimated factors. The strength of such

relationships varies with time and seems to weaken after the year 2000. Third, we demonstrate how the time-variation of the estimated risk factors’ loadings appears driven by interest rate spread, dividend yield and earnings variables, especially during financial crises. Fourth, we evaluate the economic value of the

estimated asset pricing model in terms of pricing performance, i.e. magnitude of the associated pricing errors,

and in terms of out-of-sample Sharpe ratios associated with mean-variance portfolios built using the model’s

estimated parameters. It emerges that the factor model, based on the time-varying estimated number of risk

factors, provides the best performance across both metrics. Notably, in terms of out-of-sample Sharpe ratios,

our estimated model, with a time-varying number of risk factors, not only dominates the performance of a

model with a constant number of factors but dominates also the celebrated equally-weighted portfolio strategy.

Fifth, the time series of the estimated SDF exhibits a prominent, and statistically significant, anti-cyclical

behavior, in particular with respect to the NBER recession indicator. Moreover, the estimated SDF appears

markedly spanned by the Fama & French (2015) factors until the late 1990s, and rather tenuously after that,

except in the aftermath of the subsequent financial crises.

The paper proceeds as follows. Section 2 reviews the related literature. Section 3 presents the factor

conditional asset pricing model, sets the notation and describes the regularity assumptions. Section 4 presents

our methodology: we show how to conduct inference on the latent factors, including consistent estimation of

the true number of latent factors; we formalize how to conduct inference on risk premia and the SDF, and how

to generalize our results to the case when the zero-beta rate has to be estimated; we establish the asymptotic


equivalence between the PCA estimator of the risk factors and the mimicking portfolio estimator. Monte Carlo

experiments are presented in Section 5. Section 6 contains the five empirical applications based on the data set

of individual US stock returns. Section 7 concludes. Formal proofs are reported in the final appendixes.

2 Literature Review

This paper advances the empirical asset pricing literatures that exploit the available large cross-sections of individual asset returns and that study estimation of conditional asset pricing models, as well as the econometrics literature on PCA estimation.

Giglio & Xiu (2017) recognize that, by extracting the complete set of systematic risk factors, PCA solves the

problems arising from estimating risk premia associated with observed factors, when, in particular, some of these

factors are omitted or even mismeasured, leading to the classical omitted variable bias. Gagliardini, Ossola and

Scaillet (2018) propose a methodology to detect if the residuals of a potentially misspecified beta-pricing model,

with time-varying coefficients, exhibit a factor structure. Although PCA is not directly employed, their method

is based on the asymptotic behaviour of the maximum eigenvalue of the residuals’ sample covariance matrix.

Lettau & Pelger (2018a) study a version of the PCA estimator designed to also minimize the mispricing in terms

of expected return, with the notable feature of estimating consistently weak factors (i.e. factors associated

with non-diverging eigenvalues), and Lettau & Pelger (2018b) provide several applications of this methodology

to characteristics portfolios and individual stock returns. Kozak et al. (2018) demonstrate empirically that

a small number of the dominant PCA estimates satisfactorily quantifies the SDF associated with anomalies

portfolios returns, regardless of whether the underlying asset pricing model is deemed as behavioural (i.e. risk

driven by say stock characteristics) or rational (i.e. risk driven by loadings). Kim & Korajczyk (2019) study

estimation of the SDF by minimizing the average mispricing in terms of prices, where the latent factors, when

present, have been pre-estimated by PCA, building on the insight of Pukthuanthong & Roll (2017).

Conditional linear factor models have been typically specified by assuming that loadings and risk premia

are linear functions of (lagged) observed state variables.4 Gagliardini et al. (2016) provide a formal statistical analysis of an inferential method for this type of conditional linear factor model, in particular characterized by observed risk factors. In contrast, Connor & Linton (2007), Connor et al. (2012), Fan et al. (2016), Kelly et al.

(2017) and Kelly et al. (2018) study, and apply, estimation procedures for conditional models with latent risk

4 See Shanken (1990), Ferson & Harvey (1991), Jagannathan & Wang (1996) and Lettau & Ludvigson (2001) among many others. The possible shortfalls of this approach have been discussed in Ghysels (1998) and Harvey (2001).


factors, whereby the loadings and, when applicable, the mispricing (i.e. the intercept) are instrumented

by observed state variables.5 An alternative approach, closer to the perspective of this paper, is to leave the

dynamics of the conditional asset pricing model unspecified, and use nonparametric techniques for estimation.

Building on the insight of French et al. (1987), Andersen et al. (2006), Lewellen & Nagel (2006) and Ang &

Kristensen (2012) develop several variations of this approach. Whereas the asymptotic analysis of the time-varying loadings’ estimator poses no particular problems, reliable estimation of the alphas, and thus of the time-varying risk premia, is particularly challenging with nonparametric techniques, as detailed by

Ang & Kristensen (2012) who build on the arguments of Merton (1980).

Seminal econometric results for inference on factor models are Bai & Ng (2002), Stock & Watson (2002a),

Stock & Watson (2002b) and Bai (2003), whereby the observable variable is a static function of a finite set

of latent common factors. However, note that this model is not static, in the sense that both the common

factors and the idiosyncratic component are allowed to be time-dependent. In this strand of the literature,

time-domain inferential methods and assumptions have been adopted. In contrast, Forni et al. (2000) and

Forni & Lippi (2001) postulate that the observable is a dynamic function of a finite set of latent factors, their

approach being denominated as the generalized dynamic factor model. Frequency domain techniques have been

adopted here to analyze the statistical properties of the dynamic PCA estimator.6 Although a more general

form of time-dependence is allowed for, this frequency-domain approach is less suitable for forecasting because

it necessarily leads to the observable being a function of past and future innovations. Forni et al. (2015) and

Forni et al. (2017) show how to reconcile the advantages of both approaches. All the aforementioned work

requires double asymptotics, namely that both N and T diverge to infinity. Moreover, with the exception

of Bai (2003), none of these works developed inferential methods, namely the asymptotic distribution and the

associated standard errors for the PCA estimates of loadings and factors, but focused on estimating the number

of factors and deriving consistency (sometimes with rates of convergence) of the PCA estimator of loadings and

common factors.

A large literature has focused on procedures for estimating the number of latent common factors, since the

seminal work of Connor & Korajczyk (1993). Bai & Ng (2002) provide the theory, under double asymptotics,

for consistent estimation of the number of factors in a static model. Amengual & Watson (2007) show how their

approach can be extended to the case of finite-dimensional vector autoregression latent factors. Onatski (2010)

5 Although we denote these models as conditional, in the sense that the corresponding parameters are functions of observables, only Kelly et al. (2017) and Kelly et al. (2018) allow explicitly for dynamics.

6 Brillinger (2001) introduces the dynamic PCA estimator, based on the eigenvalues and eigenvectors of a consistent estimator of the observables’ spectral density matrix.


allows for an arbitrary speed of divergence of the dominant eigenvalues and Ahn & Horenstein (2013) relax the

dependence from a finite upper bound for the true number of latent factors. Ait-Sahalia & Xiu (2017) establish

consistent estimation of the number of latent factors (and estimation of the factor structure itself) within an

in-fill asymptotics framework when high-frequency data are available.7 For the case of dynamic factor models

a la Forni et al. (2000), Hallin & Liska (2007) provide a consistent estimator for the number of factors and

Onatski (2009) establishes the asymptotic distribution of a test for the number of factors. As for estimation,

these advances have been developed in a double asymptotics setting.
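To make the idea of eigenvalue-based selection of the number of factors concrete, here is a minimal sketch of an eigenvalue-ratio rule in the spirit of the approach associated with Ahn & Horenstein (2013). It is not the estimator developed in this paper; the function name, the upper bound `k_max` and the simulated design are assumptions made purely for illustration.

```python
import numpy as np

def eigenvalue_ratio_estimate(X, k_max):
    """Pick the k in 1..k_max maximising the ratio of adjacent eigenvalues.

    X is a T x N panel; the eigenvalues are those of XX'/(NT), sorted in
    descending order. k_max must be smaller than T.
    """
    T, N = X.shape
    eigval = np.sort(np.linalg.eigvalsh(X @ X.T / (N * T)))[::-1]
    ratios = eigval[:k_max] / eigval[1:k_max + 1]
    return int(np.argmax(ratios)) + 1

# Simulated panel with three strong factors (all values are hypothetical).
rng = np.random.default_rng(3)
T, N, r = 20, 2000, 3
F = rng.standard_normal((T, r))
Lam = rng.standard_normal((N, r))
X = F @ Lam.T + 0.3 * rng.standard_normal((T, N))

r_hat = eigenvalue_ratio_estimate(X, k_max=8)
print(r_hat)
```

The rule exploits the gap between the eigenvalues driven by the common factors and those driven by idiosyncratic noise; with strong factors and low noise, as simulated here, the largest ratio sits at the true number of factors.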

Building on Dahlhaus (1997) and Dahlhaus (2000), factor models with time-varying loadings of unspecified form are considered by Motta et al. (2011), Brandon et al. (2013), Su & Wang (2017) and Barigozzi et al.

(2019), among others. All these developments are based on double asymptotics.

Very little work has been established within the large-N fixed-T sampling scheme, especially when compared with the double asymptotics setting cited above, and moreover the few existing results are scattered across various papers, and depend on different sets of regularity conditions. In particular, Connor & Korajczyk (1986) establish consistency of the PCA estimator for the latent factors when T is fixed. Subsequently, Bai (2003)[Theorem 4] establishes the necessary and sufficient conditions for the Connor & Korajczyk (1986) large-N consistency result, namely that the cross-sectional averages of the idiosyncratic components’ variances and autocovariances are, respectively, constant and null. A related statistical literature focuses on the large-N, small-T environment, referred to as the high-dimension-low-sample-size (HDLSS) situation, and studies rates of convergence of the PCA estimator of the latent factors, and variations thereof, under various regularity conditions; see Jung & Marron (2009) and Shen et al. (2013) among others.8

Our work complements these empirical asset pricing and econometrics literatures in an important way because we specifically focus on the case when N is large and T is fixed, developing tools that allow one to perform inference, such as confidence intervals and hypothesis testing for the estimated factors, as well as establishing consistent estimation of the true number of latent factors. Moreover, we show how our econometric results are instrumental for developing a formal methodology for inference on risk premia and the SDF associated with the linear asset pricing model, which are our ultimate objects of interest.

Notably, individual asset returns, in particular equity returns, exhibit a very limited time-dependence but,

7 Ait-Sahalia & Xiu (2018) establish the asymptotic distribution theory of the PC estimator in a continuous time setting but without assuming necessarily a factor structure, hence N is assumed fixed.

8 Fan et al. (2016), cited above, report consistency results for a variation of the PCA, both in the HDLSS environment and under double asymptotics.


at the same time, are characterized by strong cross-sectional dependence and heterogeneity. As our interest

lies predominantly in factor models for asset returns, these empirical features motivate our large-N fixed-T

sampling scheme and explain our focus on static factor models, in particular in light of the findings of Bai (2003),

as opposed to dynamic factor models a la Forni et al. (2000), although our approach could in principle be

applied to the latter too.

3 Factor Conditional Asset Pricing Model: Set-Up and Assumptions

We assume that the i-th individual asset return, $x_{it}$, in excess of the risk-free rate, satisfies the finite-dimensional conditional factor model, in the sense that each $x_{it}$ is a linear function of a finite number $r_{t-1}$ of latent common factors, and of a (latent) idiosyncratic component:

$$x_{it} = \alpha_{it-1} + \lambda_{it-1}' f_t + e_{it} \quad \text{for every } i = 1, \dots, N \text{ and } t \in \mathbb{Z}, \tag{1}$$

where $f_t$ defines the $r_{t-1} \times 1$ vector of latent risk factors, $\lambda_{it-1}$ the $r_{t-1} \times 1$ vector of latent time-varying loadings, $e_{it}$ is the idiosyncratic component and $\mathbb{Z} = \{\dots, -1, 0, 1, \dots\}$. In representation (1) the parameter $\alpha_{it-1}$ is just a time-varying intercept, not necessarily equal to the conditional expected excess return $E_{t-1}(x_{it})$ (i.e. the risk premium associated with asset i). Therefore, (1) does not imply that $E_{t-1}(f_t) = 0_{r_t}$, although we will assume $E_{t-1}(e_{it}) = 0$ in our regularity assumptions. Finally, we assume that a risk-free asset is tradeable. The case when no risk-free asset is tradeable is described subsequently.

It is important to make sure that model (1) does not allow for a ‘free lunch’, formalized as follows

(see Chamberlain (1983)[Condition A] and the generalization of Hansen & Richard (1987)[Definition 2.4]

to a conditional setting). This requires defining an arbitrary portfolio strategy $p$ with the $N \times 1$ vector of weights $w^p_{Nt-1} = (w^p_{1t-1}, w^p_{2t-1}, \dots, w^p_{Nt-1})'$ on the $N$ risky assets’ excess returns $x_{t\cdot} \equiv (x_{1t}, \dots, x_{Nt})'$, leading to the portfolio $p$ excess return $r^p_{N,t} \equiv w^{p\prime}_{Nt-1} x_{t\cdot}$. As we explain below, the no-arbitrage assumption turns out to be instrumental for the interpretation of the latent factors, and for the derivation of the statistical properties of the PCA estimator.

Assumption 1 (conditional no-arbitrage) There is no sequence of portfolios along some subsequence $N'$ for which, for an arbitrary positive scalar $C$:

$$\mathrm{var}_{t-1}(r^a_{N',t}) \to 0 \ \text{a.s. when } N' \to \infty, \quad \text{and} \quad E_{t-1}(r^a_{N',t}) \ge C > 0 \ \text{a.s. for every } t \in \mathbb{Z}.$$


Ingersoll (1984) clarifies that the above definition of no-arbitrage is equivalent to assuming that $E_{t-1}(r^a_{N',t}) \to \infty$ (a.s.) by suitably leveraging, i.e. scaling up, the portfolio weights. Under the above no-arbitrage condition, a limited degree of cross-sectional dependence and a sufficient degree of smoothness of the loadings (see Proposition 4 in Appendix B), there exists an $r_{t-1} \times 1$ vector $\gamma_{t-1}$, namely the time-varying risk premia associated with the risk factors $f_t$, such that

$$\mu_{it-1} \equiv E_{t-1}(x_{it}) = a_{it-1} + \lambda_{it-1}' \gamma_{t-1} \quad \text{a.s. for every } i = 1, \dots, N \text{ and } t \in \mathbb{Z}, \tag{2}$$

where the pricing errors $a_{it-1}$ must satisfy, as a consequence of no-arbitrage (see Chamberlain (1983)[Theorem 1] and Stambaugh (1983)[Theorems 1 and 2] for the static and conditional settings, respectively),9

$$\sum_{i=1}^{\infty} a_{it-1}^2 < \infty \quad \text{a.s. for every } t \in \mathbb{Z}. \tag{3}$$

Combining (1) and (2):

$$x_{it} = a_{it-1} + \lambda_{it-1}' F_t + e_{it}, \tag{4}$$

setting

$$F_t \equiv f_t + \gamma_{t-1} - E_{t-1}(f_t). \tag{5}$$

A re-writing of (4), such as $x_{it} = \mu_{it-1} + \lambda_{it-1}'(f_t - E_{t-1}(f_t)) + e_{it}$, leads to the conditional version of the APT formulation in Ross (1976)[Eq (15)], Chamberlain (1983)[Eq. (4.1)] and Chamberlain & Rothschild (1983a)[Eq. (1.1)], among others. However, (4) provides the ultimate specification that PCA will estimate, in particular obtaining (consistent) estimates of the risk factors $F_t$ (that is, the $f_t$ adjusted by $\gamma_{t-1} - E_{t-1}(f_t)$). Note that $E_{t-1}(F_t) = \gamma_{t-1}$, that is, the no-arbitrage condition leads to a change of measure of the latent risk factors $f_t$, so that the $F_t$ are centered precisely around the risk premia $\gamma_{t-1}$. This result does not follow if the $x_{it}$ are de-meaned beforehand, suggesting the importance of applying PCA directly to the $x_{it}$ from (4) without any prior de-meaning of the data.
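The step from (1) and (2) to (4) and (5) can be spelled out as follows. This is a sketch of the algebra only; it assumes, as the time index suggests, that $\alpha_{it-1}$ and $\lambda_{it-1}$ belong to the information set at time $t-1$, and it uses $E_{t-1}(e_{it}) = 0$.

```latex
\begin{align*}
% Conditional expectation of (1):
E_{t-1}(x_{it}) &= \alpha_{it-1} + \lambda_{it-1}' E_{t-1}(f_t), \\
% Equating with the pricing restriction (2):
\alpha_{it-1} &= a_{it-1} + \lambda_{it-1}' \bigl( \gamma_{t-1} - E_{t-1}(f_t) \bigr), \\
% Substituting back into (1):
x_{it} &= a_{it-1} + \lambda_{it-1}' \bigl( f_t + \gamma_{t-1} - E_{t-1}(f_t) \bigr) + e_{it}
        = a_{it-1} + \lambda_{it-1}' F_t + e_{it}, \\
% Conditional mean of the re-centred factors:
E_{t-1}(F_t) &= E_{t-1}(f_t) + \gamma_{t-1} - E_{t-1}(f_t) = \gamma_{t-1}.
\end{align*}
```

The last line shows why the conditional mean of the re-centred factors equals the risk premia regardless of the mean of the original factors.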

9 Stambaugh (1983)[Theorems 1 and 2] extends Chamberlain & Rothschild (1983b)[Theorem 1] to the conditional setting, with possibly time-varying risk premia but constant loadings. We show how the results of Stambaugh (1983) can be further generalized to the case of time-varying loadings under a suitable smoothness assumption (see Proposition 4). The appeal of the Stambaugh (1983) approach is that it applies to a conditioning information set of any size, including the case of total absence of conditioning information, namely the one-period static case. This implies that the Stambaugh (1983) result is robust to asset repackaging, which is relevant when agents construct portfolios based on different information sets. As discussed in Hansen & Richard (1987)[Section 5], absence of this property would imply that existence of a conditional factor structure does not lead to existence of a corresponding unconditional factor structure or, more generally, to existence under a different information set. For an alternative proof (based on a different set of assumptions) of the conditional APT with a countable number of assets see Reisman (1992). Building on Al-Najjar (1999), Gagliardini et al. (2016) establish robustness to asset repackaging of the conditional APT with a continuum of assets.


Although the emphasis of this paper is on asset returns, if one wishes to apply model (1) to data other than asset returns, then the restriction (2) does not necessarily hold, and one simply re-arranges (1) as

$$x_{it} = (\alpha_{it-1}, \lambda_{it-1}')(1, f_t')' + e_{it} = (\alpha_{it-1}, \lambda_{it-1}') G_t + e_{it}, \tag{6}$$

where $G_t$ denotes the $(r_{t-1} + 1) \times 1$ vector $(1, f_t')'$. When $\alpha_{it-1} = \alpha_i$, $\lambda_{it-1} = \lambda_i$ and $r_{t-1} = r$, representation (6) is the static latent factor formulation of (1) adopted, among others, by Stock & Watson (2002a) and Bai & Ng (2002). It follows that by applying PCA to model (6), with constant parameters, one captures $r + 1$ factors in total, one of which is the constant factor (equal to 1 for all periods), as long as such (constant) factor is pervasive, namely $N^{-1} \sum_{i=1}^{N} \alpha_i^2 \to C > 0$, for some constant $C$.10 Alternatively, focusing again on the constant loadings case of Eq. (6), one can subtract the (time series) sample mean from both sides of (6), and apply PCA to $x_{it} - \bar{x}_i = \lambda_i'(f_t - \bar{f}) + e_{it} - \bar{e}_i$, denoting $\bar{z}_i \equiv T^{-1} \sum_{t=1}^{T} z_{it}$ for a generic sequence $z_{it}$. In this case only the $r$ (de-meaned) factors, $f_t - \bar{f}$, will be captured by PCA.
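The contrast between the two treatments of a pervasive constant intercept can be seen in a small simulation. This is an illustrative sketch for the constant-parameter case only: the design (intercept distribution, noise scale `sigma`, and the threshold of ten times the noise variance) is hypothetical and chosen so that the eigenvalue gap is unambiguous.

```python
import numpy as np

rng = np.random.default_rng(2)
T, N, r, sigma = 12, 5000, 2, 0.1

alpha = 1.0 + 0.5 * rng.standard_normal(N)    # pervasive intercepts (hypothetical)
Lam = rng.standard_normal((N, r))             # constant loadings
F = rng.standard_normal((T, r))               # latent factors
X = alpha + F @ Lam.T + sigma * rng.standard_normal((T, N))   # T x N panel

def n_large_eigenvalues(Y, threshold):
    """Count the eigenvalues of the T x T matrix YY'/N above a threshold."""
    eigval = np.linalg.eigvalsh(Y @ Y.T / Y.shape[1])
    return int(np.sum(eigval > threshold))

threshold = 10 * sigma**2                     # well above the noise eigenvalues
n_raw = n_large_eigenvalues(X, threshold)                        # raw data
n_demeaned = n_large_eigenvalues(X - X.mean(axis=0), threshold)  # de-meaned data
print(n_raw, n_demeaned)
```

On the raw panel the constant factor shows up as an additional dominant eigenvalue, giving r + 1 in total, whereas subtracting each asset's time-series mean removes it and leaves r.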

Things become more complicated for the case of time-varying parameters, as for instance de-meaning is

not even meaningful because it will not eliminate the time-varying intercept term anyway. This is where the

no-arbitrage assumption becomes extremely useful, because PCA will detect the rt−1 factors without the need

to de-mean the asset returns data, because the constant factor is not pervasive. In other words, for asset returns

one can safely estimate the risk factors ignoring completely the intercept ait−1 (i.e. the pricing errors) without

inducing any bias in the estimated factors and loadings, an especially advantageous feature when considering

conditional factor models.

Summarizing, throughout this paper, we will always implement PCA, without allowing for any intercept

term, regardless of whether one envisages to fit the factor model to asset returns or not, as this will always lead

to accurate (i.e. consistent) estimates of the common latent factors, under our regularity assumptions. This

is methodologically relevant because, as explained in detail below, allowing explicitly for an intercept term

or, alternatively, de-meaning the data, induces time-dependence in the idiosyncratic component of the fitted

model, violating one of the regularity assumptions needed in our large-N fixed-T environment, as clarified by

Bai (2003).

10Obviously, in practice, such a constant factor will be masked by the rotation that affects loadings and factors. It is well-known that this does not lead to any relevant consequence for practical use of the model.


3.1 Econometric Quantities: Factors and Loadings

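To define the PCA estimator, it is useful to adopt a matrix formulation, setting the $T \times N$ matrix of observed excess asset returns $X \equiv (x_{\cdot 1} \cdots x_{\cdot i} \cdots x_{\cdot N}) \equiv (x_{1\cdot} \cdots x_{t\cdot} \cdots x_{T\cdot})'$, where $x_{\cdot i} \equiv (x_{i1} \cdots x_{iT})'$ and $x_{t\cdot} \equiv (x_{1t} \cdots x_{Nt})'$, the $T \times r_{t-1}$ matrix of factors $F \equiv (F_1 \cdots F_T)'$, the $N \times r_{t-1}$ matrix of loadings $\Lambda_{t-1} \equiv (\lambda_{1t-1} \cdots \lambda_{Nt-1})'$ and the $T \times N$ matrix of idiosyncratic components $e \equiv (e_{\cdot 1} \cdots e_{\cdot i} \cdots e_{\cdot N}) = (e_{1\cdot} \cdots e_{t\cdot} \cdots e_{T\cdot})'$, setting $e_{\cdot i} \equiv (e_{i1} \cdots e_{iT})'$ and $e_{t\cdot} \equiv (e_{1t} \cdots e_{Nt})'$. Note that factors, loadings and residuals are always latent.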

We assume that the T observations used in the PCA are extracted from a longer time-series sample of size T0 > T and in particular, without loss of generality, that they consist of the observations from t−T+1 to t, for any t satisfying T−1 < t ≤ T0. Therefore, whereas X and F contain exclusively T rows (referring to T time series observations), without loss of generality one can always assume that these matrices are extracted from the larger matrices X0 and F0 with T0 rows. Although the allowed degree of time-variation will be formalized

below, for derivation of the PCA it suffices for now to assume that the loadings Λt−1 and the number of factors

rt−1 remain constant within the time interval from t−T +1 to t, but can vary freely across the T0 observations.

We present two equivalent ways to derive the PCA estimator of factors, loadings and number of factors corresponding to the time-varying model (1), namely as the nonparametric Nadaraya-Watson estimator, in addition to the classical PCA formulation. For $t \equiv [uT_0]$, corresponding to some $u$ satisfying $T/T_0 \le u \le 1$, implying $T \le t \le T_0$, set $K_h(\cdot) \equiv h^{-1}K(\cdot/h)$, with $K(\cdot) \equiv 1(\cdot)$, the indicator function, and bandwidth parameter $h \equiv T/T_0$. The $K_h(s/T_0 - u)$, for every $1 \le s \le T_0$, make up the diagonal $T_0 \times T_0$ matrix $K_u \equiv \mathrm{diag}(K_h(1/T_0 - u), \cdots, K_h(1 - u))$.

Without loss of generality for N, assume that $N > T \ge r_{t-1}$, for every $T - 1 < t \le T_0$, as we will only allow N to diverge. As will be explained, one needs $T \ge r_{t-1}$ to rule out existence, and multiplicity, of the zero eigenvalue of certain relevant matrices. The PCA estimators for the loadings and the factors are the solution of the least squares problem:

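$$
\begin{aligned}
(\hat F(T_t), \hat\Lambda(T_t)) &= \arg\min_{F,\Lambda_{t-1}} \frac{1}{NT}\sum_{s=t-T+1}^{t}(x_{s\cdot}-\Lambda_{t-1}F_s)'(x_{s\cdot}-\Lambda_{t-1}F_s) \\
&= \arg\min_{F,\Lambda_{t-1}} \frac{1}{NT_0}\,\mathrm{trace}\Big((X_0-F_0\Lambda_{t-1}')'K_u(X_0-F_0\Lambda_{t-1}')\Big),
\end{aligned}
$$

where the PCA estimators, corresponding to $r_{t-1}$ factors, are denoted by

$$\hat F(T_t) \equiv (\hat F_{t-T+1,t}, \cdots, \hat F_{s,t}, \cdots, \hat F_{t,t})' \quad\text{and}\quad \hat\Lambda(T_t) \equiv (\hat\lambda_{1t-1}, \cdots, \hat\lambda_{Nt-1})',$$

to indicate dependence on the time interval $T_t \equiv (t-T+1, t)$.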

Writing $\hat F(T_t) = \hat F$ and $\hat\Lambda(T_t) = \hat\Lambda_{t-1}$, that is omitting reference to the interval $T_t$ for the sake of simplicity, $\hat F$ denotes the set of $r_{t-1}$ eigenvectors, multiplied by $\sqrt{T}$, corresponding to the largest $r_{t-1}$ eigenvalues of $XX'/NT$, and correspondingly $\hat\Lambda_{t-1} \equiv X'\hat F(\hat F'\hat F)^{-1} = X'\hat F/T$;11 see Magnus & Neudecker (2007), Theorem 7 of Chapter 17, for an elegant proof.

In particular, setting $\hat V_{t-1}$ equal to the diagonal $r_{t-1} \times r_{t-1}$ matrix with the largest $r_{t-1}$ eigenvalues of $XX'/NT$, one obtains:

$$\frac{XX'}{NT}\hat F = \hat F\hat V_{t-1} \quad\text{where}\quad \frac{\hat F'\hat F}{T} = I_{r_{t-1}}. \tag{7}$$

Crucially, we do not explicitly allow for an intercept term when implementing PCA, for the reasons discussed above. In the next sections, we will establish the asymptotic properties of $\hat F$ and $\hat\Lambda_{t-1}$ as $N \to \infty$ and, more generally, how to conduct inference on model (1), including estimating the true (but unknown) number of factors $r_{t-1}$.
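As an illustration only, the PCA estimator described above can be sketched numerically as follows. The simulated panel, the dimensions and the function name `pca_factors` are hypothetical choices made for this example and are not part of the paper; the code simply extracts the scaled eigenvectors of $XX'/NT$ without any intercept or de-meaning and lets one verify the normalization in (7).

```python
import numpy as np

def pca_factors(X, r):
    """PCA without intercept. X is the T x N panel, r the number of factors."""
    T, N = X.shape
    # Eigen-decomposition of the T x T matrix XX'/(NT); eigh returns ascending eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(X @ X.T / (N * T))
    order = np.argsort(eigvals)[::-1][:r]      # indices of the r largest eigenvalues
    V = np.diag(eigvals[order])                # diagonal matrix of the r largest eigenvalues
    F = np.sqrt(T) * eigvecs[:, order]         # eigenvectors scaled so that F'F/T = I_r
    Lam = X.T @ F / T                          # N x r loadings, X'F/T
    return F, Lam, V

# Hypothetical simulated data: small T, large N, factors with non-zero mean.
rng = np.random.default_rng(0)
T, N, r = 6, 500, 2
F_true = rng.normal(loc=0.5, scale=1.0, size=(T, r))
Lam_true = rng.normal(size=(N, r))
X = F_true @ Lam_true.T + rng.normal(size=(T, N))

F_hat, Lam_hat, V_hat = pca_factors(X, r)
```

In this sketch the data are deliberately not de-meaned, in line with the choice made in the text; working with the $T \times T$ matrix keeps the eigenvalue problem small when $N$ is much larger than $T$.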

3.2 Analogies between PCA and Mimicking Portfolios

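The estimated factors $\hat F$ have the interpretation of portfolio (excess) returns, corresponding to the $r_{t-1}$ vectors of portfolio weights $\hat w^{pca}_{Nt-1} \equiv \hat\Lambda_{t-1}(\hat\Lambda_{t-1}'\hat\Lambda_{t-1})^{-1}$. In fact, by (7) and recalling that $N^{-1}\hat\Lambda_{t-1}'\hat\Lambda_{t-1} = \hat V_{t-1}$,

$$\hat F = \frac{XX'}{NT}\hat F\hat V_{t-1}^{-1} = \frac{XX'\hat F}{T}(N\hat V_{t-1})^{-1} = X\hat\Lambda_{t-1}(\hat\Lambda_{t-1}'\hat\Lambda_{t-1})^{-1} = X\hat w^{pca}_{Nt-1}.$$

The population counterpart of the PCA portfolio weights $\hat w^{pca}_{Nt-1}$ is clearly $w^{pca}_{Nt-1} \equiv \Lambda_{t-1}(\Lambda_{t-1}'\Lambda_{t-1})^{-1}$.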
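The portfolio interpretation can be checked numerically. The following is a minimal sketch on hypothetical simulated data (all names and dimensions are assumptions of the example): it forms the estimated PCA weights from the estimated loadings and confirms that the implied portfolio returns reproduce the estimated factors.

```python
import numpy as np

# Hypothetical simulated T x N panel of excess returns with r latent factors.
rng = np.random.default_rng(1)
T, N, r = 5, 300, 2
X = rng.normal(size=(T, r)) @ rng.normal(size=(r, N)) + 0.5 * rng.normal(size=(T, N))

# PCA factors and loadings, no intercept: scaled eigenvectors of XX'/(NT).
eigvals, eigvecs = np.linalg.eigh(X @ X.T / (N * T))
top = np.argsort(eigvals)[::-1][:r]
F_hat = np.sqrt(T) * eigvecs[:, top]
Lam_hat = X.T @ F_hat / T

# Estimated PCA portfolio weights (N x r) and the returns on those portfolios (T x r).
w_pca = Lam_hat @ np.linalg.inv(Lam_hat.T @ Lam_hat)
portfolio_returns = X @ w_pca
```

Each column of `w_pca` is one portfolio; the equality between `portfolio_returns` and `F_hat` is an algebraic identity, so it holds for any panel and not only for data generated by a factor model.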

However, it is more relevant to compare the PCA estimated risk factors $\hat F$ with the notion of mimicking portfolios, coined by Huberman et al. (1987) and Breeden et al. (1989). In particular, setting $F^{mp}_t \equiv E_{t-1}(F_t x_{t\cdot}')(E_{t-1}(x_{t\cdot}x_{t\cdot}'))^{-1}x_{t\cdot} = w^{mp\prime}_{Nt-1}x_{t\cdot}$ as the mimicking portfolios corresponding to the latent factors $F_t$, we will study the limiting behaviour of both $F^{mp}_t$ and of $w^{pca\prime}_{Nt-1}x_{t\cdot}$, as $N \to \infty$, where by direct calculations:

$$w^{mp}_{Nt-1} = \Big((a_{t-1}+\Lambda_{t-1}\gamma_{t-1})(a_{t-1}+\Lambda_{t-1}\gamma_{t-1})' + \Lambda_{t-1}\Omega_{t-1}\Lambda_{t-1}' + \Sigma_{et-1}\Big)^{-1}\Big(a_{t-1}\gamma_{t-1}' + \Lambda_{t-1}\gamma_{t-1}\gamma_{t-1}' + \Lambda_{t-1}\Omega_{t-1}\Big),$$

11It is well-known (see the discussion in Bai & Ng (2002)) that an asymptotically equivalent PCA estimator of factors and loadings can be obtained by setting $\hat\Lambda_{t-1}$ equal to the eigenvectors of $X'X/NT$, multiplied by $\sqrt{N}$, corresponding to the $r_t$ largest eigenvalues, and $\hat F = X\hat\Lambda_{t-1}/N$. As N is much larger than T, the latter approach is computationally less convenient than the one described above. Recall that the non-zero eigenvalues of $X'X$ and $XX'$ coincide.


setting $\Omega_{t-1} \equiv \mathrm{var}_{t-1}(F_t)$, $\Sigma_{et-1} \equiv \mathrm{var}_{t-1}(e_{t\cdot})$ and $a_{t-1} \equiv (a_{1t-1}, \cdots, a_{Nt-1})'$. More importantly, we establish the limiting behaviour of the portfolio excess returns based on the feasible PCA-based estimator of $w^{mp}_{Nt-1}$, and show its close analogies with the PCA estimator $\hat F$.

3.3 Asset Pricing Quantities: Risk Premia and SDF

Conditional expected excess returns µit−1 and risk premia γt−1 in particular, as from equation (2), are of

fundamental importance for many asset pricing problems, such as for identification of the SDF implied by (1)

and for portfolio construction. In particular, under the conditional no-arbitrage assumption, one can define the

(conditional or time-varying) risk premia, when a riskless asset with time-varying (gross) rate rft−1, is traded,

as:

$$\gamma_{t-1} \equiv -r_{ft-1}\Big(P_{t-1}(f_t) - \frac{E_{t-1}(f_t)}{r_{ft-1}}\Big) \quad\text{for every } t \in \mathbb{Z}, \tag{8}$$

where Pt(·) denotes the conditional pricing functional, formally defined as Pt(.) ≡ Et(mt,t+1·) where mt,t+1 is

the (conditional) SDF, whose existence is ensured by no-arbitrage (see Chamberlain (1983)[Theorem 1] and the

generalization to a conditional setting of Hansen & Richard (1987)[Theorem 2.1]).12 In our context, the price

$P_{t-1}(f_t)$ is only notional because the $f_t$ are latent, and hence non-tradeable. However, the formula above applies to any risk factors so, for instance, when the $f_t$ represent observed portfolio (gross) returns, then $P_{t-1}(f_t) = 1_{r_{t-1}}$, the $r_{t-1} \times 1$ vector of ones, and the risk premia coincide with the expected value of the factors in excess of the risk-free rate, as (8) simplifies to $\gamma_{t-1} = E_{t-1}(f_t - r_{ft-1}1_{r_{t-1}})$. More generally, expression (8) defines the risk premia $\gamma_{t-1}$ as the price of the factors $f_t$ less their risk-neutral valuation, scaled by $-r_{ft-1}$.
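To make the simplification for traded factors explicit, the following worked step substitutes a unit price for each factor into (8). It uses only the definitions above, together with the fact that $r_{ft-1}$ is known at time $t-1$, and is meant as a reading aid rather than an additional result.

```latex
\begin{aligned}
\gamma_{t-1}
  &= -r_{ft-1}\Big(1_{r_{t-1}} - \frac{E_{t-1}(f_t)}{r_{ft-1}}\Big) \\
  &= E_{t-1}(f_t) - r_{ft-1}\,1_{r_{t-1}} \\
  &= E_{t-1}\big(f_t - r_{ft-1}\,1_{r_{t-1}}\big).
\end{aligned}
```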

When T is fixed, an important identification issue related to the risk premia $\gamma_{t-1}$ arises. To simplify arguments, consider the static version of the asset pricing model (1), that is assume for now $a_{it-1} = a_i$, $\lambda_{it-1} = \lambda_i$, $r_{t-1} = r$. Then, following Shanken (1992), taking the average of (4) across the time interval $T_t$ yields:

$$\bar x_i = \frac{1}{T}\sum_{s\in T_t} x_{is} = a_i + \lambda_i'\bar F + \bar e_i = a_i + \lambda_i'\gamma^P + \bar e_i, \tag{9}$$

setting $\bar F \equiv F'1_T/T$ and $\bar e_i \equiv e_{\cdot i}'1_T/T$. Equation (9) shows how the (sample) average excess returns are linear in the so-called ex-post risk premia, coined by Shanken (1992), namely

$$\gamma^P \equiv \bar F = \gamma + \bar f - E(f_t).$$

12Expression (8) follows by generalizing Chamberlain (1983)[Theorem 1] to the conditional setting. An alternative, equivalent, expression for the risk premia vector $\gamma_{t-1}$ is based on the limit, as N diverges, of the linear projection with respect to (2), namely $(\Lambda_{t-1}'\Lambda_{t-1})^{-1}\Lambda_{t-1}'\mu_{t-1}$, setting $\mu_{t-1} \equiv (\mu_{1t-1}, \cdots, \mu_{Nt-1})'$; see Ingersoll (1984)[Theorem 3].


Interestingly, the ex-post risk premia vector $\gamma^P$ coincides with the sample mean of the (population) $F_t$, although the latter are not observed, and thus not tradeable. The ex-post risk premia exhibit many attractive properties, providing essentially the same information on the asset pricing model (1) as the ex-ante risk premia $\gamma$. For instance, $\gamma^P$ is centered around (i.e. is unbiased for) $\gamma$ and, as T increases, $\gamma^P \to_p \gamma$. Moreover, the asset pricing model (9) is linear in $\gamma^P$. More importantly, $\gamma^P$ allows one to conduct inference on $\gamma - E(f_t)$, which is exactly as informative as $\gamma$ on the asset pricing implications of the model, a feature exploited by Gagliardini et al. (2016).13
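The centering claim can be read off the definition of the ex-post risk premia in one line. The sketch below assumes, as in the static case considered here, that $E(f_s) = E(f_t)$ for every $s$ in the window; the rate in the second line additionally assumes that a central limit theorem applies to the sample mean $\bar f$, which is an illustrative assumption rather than a condition stated in the text.

```latex
\begin{aligned}
E(\gamma^P) &= \gamma + \frac{1}{T}\sum_{s\in T_t} E(f_s) - E(f_t) = \gamma, \\
\gamma^P - \gamma &= \bar f - E(f_t) = O_p(T^{-1/2}) \quad \text{as } T \to \infty .
\end{aligned}
```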

Note that the discrepancy between the ex-ante and ex-post risk premia is not due to the latency of the

risk factors in (1) but is a necessary consequence of keeping T fixed. Raponi et al. (2018) develop a complete

methodology for inference on linear asset pricing models, such as (1), but when the $f_t$ are (tradeable or non-tradeable) observed factors. Their object of inference is necessarily the ex-post risk premia and, as here, they assume that only the number of assets, N, diverges.

Therefore, by exploiting the restriction (2) arising from the no-arbitrage assumption, a natural estimator for the ex-post risk premia, $\gamma^P$, emerges, namely:

$$\hat\gamma \equiv T^{-1}\sum_{s\in T_t}\hat F_{s,t}, \tag{10}$$

as the PCA estimator $\hat F_{s,t}$ accurately quantifies the $F_s$. Representation (9) suggests a different, perhaps more traditional, way to estimate the ex-post risk premia $\gamma^P$, namely the two-pass estimator:14

$$\hat\gamma^{twopass} \equiv (\hat\Lambda'\hat\Lambda)^{-1}\hat\Lambda'\bar x_N, \tag{11}$$

setting $\bar x_N \equiv (\bar x_1, \cdots, \bar x_N)' = X'1_T/T$. We formally show that (see Proposition 3 in Appendix B)15

$$\hat\gamma^{twopass} = \hat\gamma. \tag{12}$$
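The numerical equivalence in (12) can be illustrated with a small simulation. The sketch below is hypothetical in every detail of its data generating process (dimensions, factor means, pricing errors `a`); it computes the sample mean of the PCA factors and the cross-sectional regression of average excess returns on the PCA loadings, and the two coincide up to floating-point error.

```python
import numpy as np

# Hypothetical panel with non-zero pricing errors a_i, so exact pricing does not hold.
rng = np.random.default_rng(2)
T, N, r = 8, 400, 3
F_true = 0.3 + rng.normal(size=(T, r))
Lam_true = rng.normal(size=(N, r))
a = 0.1 * rng.normal(size=N)
X = a + F_true @ Lam_true.T + rng.normal(size=(T, N))

# PCA factors and loadings (no intercept, no de-meaning).
eigvals, eigvecs = np.linalg.eigh(X @ X.T / (N * T))
top = np.argsort(eigvals)[::-1][:r]
F_hat = np.sqrt(T) * eigvecs[:, top]
Lam_hat = X.T @ F_hat / T

# Estimator (10): time-series average of the estimated factors.
gamma_hat = F_hat.mean(axis=0)

# Estimator (11): cross-sectional OLS of average excess returns on the loadings.
x_bar = X.mean(axis=0)
gamma_twopass = np.linalg.solve(Lam_hat.T @ Lam_hat, Lam_hat.T @ x_bar)
```

The equality does not depend on the simulated model: it follows from the eigenvector identity (7), which is why both estimators use exactly the same information from the panel.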

13For the case of observed risk factors, this readily follows by simply noticing that as $\bar f$ is observed, then an estimator for $\gamma - E(f_t)$ can be immediately obtained by netting out $\bar f$ from an estimator for $\gamma^P$. Some care is required when using such estimators for testing the asset pricing restriction as, for instance, when the factors are excess returns from tradeable assets, then the null hypothesis of correct model specification corresponds to a zero value for $\gamma - E(f_t)$ in population.

14The two-pass estimator, popularized by Black et al. (1972) and Fama & MacBeth (1973), provides the most popular estimator for the ex-ante risk premia $\gamma$ when the risk factors are observed in (1) and exact pricing holds, i.e. $a_i = 0$ in (2). Shanken (1992) provides the large-T asymptotics for the two-pass estimator, further generalized by Jagannathan & Wang (1998), among others. More recent work on the two-pass methodology, allowing for both N and T to diverge, has been developed by Bai & Zhou (2015) and Gagliardini et al. (2016).

15The expression of estimator $\hat\gamma^{twopass}$ is very similar to the one studied in Giglio & Xiu (2017), the main difference being that Giglio & Xiu (2017) estimate the loadings by PCA after de-meaning model (1), that is by applying PCA to the $x_{it} - \bar x_i$. This is not feasible in our fixed-T context as de-meaning induces dynamic serial dependence in the idiosyncratic component, thus violating our regularity assumptions. However, de-meaning does not cause any bias (asymptotically) in Giglio & Xiu (2017) because they let both N and T diverge. For the same reason, no (asymptotic) difference arises in their setup between the ex-ante and ex-post risk premia, as $\gamma^P - \gamma \to_p 0$ as $T \to \infty$.


This result differs starkly from the observed risk factors case, where the sample mean of the realized return of

a traded factor is not necessarily identical to the two-pass CSR OLS estimator, unless exact pricing is imposed

(see Shanken (1992, Section 2)). The above equivalence holds under Assumption 1, much milder than exact

pricing. Another way to understand our equality result is that, unlike the observed factors case, in our latent factor case both estimators use the same information, namely a panel of excess returns. In contrast, for the observed factor case, the sample mean of the factor realized return ignores completely all the information stemming from the panel of returns on the test assets. This feature has often been advocated as a limitation of the sample mean estimator, as its use is valid only under the (strong) assumption of exact pricing. This criticism does not apply to our framework, as we allow for pricing errors of unspecified form in (2).

Turning now to the general, time-varying, case, one defines the ex-post risk premia as:

$$\gamma^P_{t-1} \equiv F_t = \gamma_{t-1} + f_t - E_{t-1}(f_t) \quad\text{for every } t \in \mathbb{Z}. \tag{13}$$

Raponi et al. (2018) introduce the notion of time-varying ex-post risk premia (13) and develop a large-N inferential methodology for the case of observed risk factors. Here we extend these results to the case of latent risk factors. In light of (13), we will provide the limiting distribution theory for the estimator of $\gamma^P_{s-1}$:

$$\hat\gamma_{s-1} \equiv \hat F_{s,t} \quad\text{for every } s \in T_t \text{ and } t \in \mathbb{Z}. \tag{14}$$

As for the static case discussed above, an alternative, numerically equivalent, estimator for the time-varying risk premia is:16

$$\hat\gamma^{twopass}_{s-1} \equiv (\hat\Lambda_{t-1}'\hat\Lambda_{t-1})^{-1}\hat\Lambda_{t-1}'x_{s\cdot} \quad\text{for every } s \in T_t \text{ and } t \in \mathbb{Z}, \tag{15}$$

whose limiting properties immediately follow from the ones of $\hat\gamma_{s-1}$ in light of the equivalence $\hat\gamma^{twopass}_{s-1} = \hat\gamma_{s-1}$.

However, when time-variation holds, (10) remains a meaningful quantity, especially when T is small, as $\hat\gamma$ will estimate the average ex-post risk premia

$$\gamma^P \equiv \bar F = \frac{1}{T}\sum_{s\in T_t}\gamma_{s-1} + \bar f - \frac{1}{T}\sum_{s\in T_t}E_{s-1}(f_s). \tag{16}$$

When T is small, γP represents a local average of the time-varying γPs−1 which, on one hand, can accurately

capture the dynamics of the true risk premia (depending on their degree of time-variation) and, on the other,

16Estimator $\hat\gamma^{twopass}_{s-1}$ is essentially identical to the one proposed by Raponi et al. (2018), the main difference being that they only consider observed factors.


can be more accurately estimated than γPs−1, due to the averaging. These considerations prompt us to study

also the asymptotic properties of (10).17

We now illustrate how to estimate the SDF associated with the asset pricing model (1). It turns out that the PCA estimates provide exactly the necessary ingredients for consistent estimation of the SDF. Uppal et al. (2018)[Theorem 1 and 2] establish that the SDF implied by the linear factor asset pricing model, when a risk-free asset is traded and all risk factors $f_t$ are observed (tradeable or non-tradeable), is given by:

$$m_{t,t+1} \equiv \frac{1}{r_{ft}} - \frac{1}{r_{ft}}\gamma_t'\Omega_t^{-1}(f_{t+1} - E_t(f_{t+1})) - \frac{1}{r_{ft}}a_t'\Sigma_t^{-1}(x_{t+1\cdot} - E_t(x_{t+1\cdot})) \quad\text{for every } t \in \mathbb{Z}, \tag{17}$$

setting the conditional covariance matrix of the asset excess returns $\Sigma_t \equiv \Lambda_t\Omega_t\Lambda_t' + \Sigma_{et}$. In fact,

$$P_t(1) = E_t(m_{t,t+1}) = \frac{1}{r_{ft}} \quad\text{and}\quad P_t(x_{it+1}) = E_t(m_{t,t+1}x_{it+1}) = 0 \quad\text{for every } i = 1, \cdots, N \text{ and } t \in \mathbb{Z}, \tag{18}$$

as the $x_{it}$ are excess returns.

We depart from Uppal et al. (2018) in several ways. First, the risk factors ft are never observed in our

framework. Second, the true pricing errors ait cannot be identified, and thus cannot be estimated, by the PCA

procedure, leading us to ignore the term in the aits in (17). Third, and foremost, as discussed above, in a fixed-T

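environment one cannot identify the risk premia $\gamma_{t-1}$ but only $\gamma^P_{t-1}$ and its sample average $\gamma^P$. Along the same lines, when T remains fixed, we cannot identify the population quantities $\Omega_t$ and $f_{t+1} - E_t(f_{t+1}) = F_{t+1} - E_t(F_{t+1})$ but only their sample counterparts $\Omega \equiv T^{-1}F'M_{1_T}F$ and $F_{t+1} - \gamma^P$, where $M_{1_T} \equiv I_T - 1_T1_T'/T$ is the de-meaning matrix.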

This implies that, within our fixed-T environment, the object of inference consists necessarily of the ex-post SDF:

$$m^P_{t,t+1} \equiv \frac{1}{r_{ft}} - \frac{1}{r_{ft}}\gamma^{P\prime}\Omega^{-1}(F_{t+1} - \gamma^P) \quad\text{for every } t \in \mathbb{Z}. \tag{19}$$

Note that $m^P_{t,t+1}$ is not feasible and needs to be estimated as the $F_t$ are not observed or, in other words, $m^P_{t,t+1}$ represents the population ex-post SDF, from a fixed-T perspective, even though $m^P_{t,t+1}$ is a function of the time-series sample moments of the $F_t$.

Clearly, $m_{t,t+1}$ and $m^P_{t,t+1}$ differ but, under mild assumptions on the latent risk factors, the ex-post SDF $m^P_{t,t+1}$ satisfies the pricing conditions (18) with a (pricing) error of order $O(T^{-1})$, namely (see Proposition 2 for a formal proof):

$$E_t(m^P_{t,t+1}x_{it+1}) = O(T^{-1}) \quad\text{for every } i = 1, \cdots, N \text{ and } t \in \mathbb{Z}.$$

17Although this is not considered in our large-N and fixed-T setting, if one considers the nonparametric sampling scheme by which $T_0^{-1} + T/T_0 \to 0$, that is when the length of the sample becomes arbitrarily small as $T_0$ increases, then $\gamma^P$ would identify the time-varying risk premia $\gamma_{t-1}$. See Raponi et al. (2018), Internet Appendix, for details on nonparametric estimation of risk premia. This provides a further justification for studying $\hat\gamma$ as an estimator of $\gamma^P$.


This fast rate is achieved because, although $m_{t,t+1}$ and $m^P_{t,t+1}$ differ by an order of magnitude of $O_p(T^{-1/2})$, due to the difference between $\Omega_t, F_{t+1} - E_t(F_{t+1})$ and $\Omega, F_{t+1} - \gamma^P$, the averaging embedded in the expectation operator involved when using the SDFs for pricing (e.g. Eq. (18)) leads to the faster $O(T^{-1})$ rate.18

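As the PCA procedure provides the estimator $\hat F_{s,t}$ of the $F_s$, the natural estimator for $m^P_{s,s+1}$ is:

$$\hat m_{s,s+1} \equiv \frac{1}{r_{fs}} - \frac{1}{r_{fs}}\hat\gamma'\hat\Omega^{-1}(\hat F_{s+1,t} - \hat\gamma) \quad\text{for every } s \in T_t \text{ and } t \in \mathbb{Z}, \tag{20}$$

setting $\hat\Omega \equiv T^{-1}\hat F'M_{1_T}\hat F = I_{r_t} - \hat\gamma\hat\gamma'$. We will illustrate below the limiting properties, as $N \to \infty$, for our SDF estimator, which will take into account the sampling variability stemming from the PCA estimates of the risk factors.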
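A minimal numerical sketch of the estimator in (20) follows. The function name `ex_post_sdf`, the simulated panel and the constant gross risk-free rate are hypothetical choices for illustration; the vector `rf_lagged` is assumed to hold, for each row of the estimated factors, the gross risk-free rate set one period earlier. The de-meaning matrix is taken to be $I_T - 1_T1_T'/T$, which is what delivers the identity $\hat\Omega = I - \hat\gamma\hat\gamma'$.

```python
import numpy as np

def ex_post_sdf(F_hat, rf_lagged):
    """Estimated ex-post SDF from T x r PCA factors (normalized so F'F/T = I)."""
    T, r = F_hat.shape
    gamma_hat = F_hat.mean(axis=0)                     # average ex-post risk premia
    M = np.eye(T) - np.ones((T, T)) / T                # de-meaning matrix
    Omega_hat = F_hat.T @ M @ F_hat / T                # sample covariance of the factors
    weights = np.linalg.solve(Omega_hat, gamma_hat)    # Omega^{-1} gamma
    m_hat = (1.0 - (F_hat - gamma_hat) @ weights) / rf_lagged
    return m_hat, gamma_hat, Omega_hat

# Hypothetical simulated panel and PCA factors (no intercept).
rng = np.random.default_rng(3)
T, N, r = 10, 500, 2
X = (0.4 + rng.normal(size=(T, r))) @ rng.normal(size=(r, N)) + rng.normal(size=(T, N))
eigvals, eigvecs = np.linalg.eigh(X @ X.T / (N * T))
F_hat = np.sqrt(T) * eigvecs[:, np.argsort(eigvals)[::-1][:r]]

rf_lagged = np.full(T, 1.002)                          # assumed constant gross risk-free rate
m_hat, gamma_hat, Omega_hat = ex_post_sdf(F_hat, rf_lagged)
```

By construction the estimated SDF prices the risk-free asset and the estimated factors exactly in sample, which is the sample analogue of the pricing conditions (18).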

3.4 Assumptions

We now present the regularity assumptions needed to derive the limiting properties of the PCA estimator

for the number of factors and for the factors themselves. Note that all convergences hold for N → ∞ and

conditionally on F = (F1, · · · ,FT )′ unless stated otherwise. Without loss of generality, we can follow Bai &

Ng (2002) and assume that the loadings λi are non random but, given our large-N approach, it seems justified

to allow them to be random. We report our notation in Appendix A.

Assumption 2 (factors) For any T such that $r_t \le T < \infty$,

$$\Sigma_F = \frac{F'F}{T} > 0 \quad\text{a.s.}$$

If rt−1 > T then ΣF will be singular, with at least rt−1 − T zero eigenvalues, a case that we rule out as it

induces multiple (and, moreover, zero) eigenvalues.

Assumption 3 (loadings time-varying) For every $t \in \mathbb{Z}$:

$$\frac{\Lambda_{t-1}'\Lambda_{t-1}}{N} \to_p \Sigma_{\Lambda t-1} > 0.$$

Assumption 4 (eigenvalues) For every t ∈ Z, the eigenvalues of the rt−1 × rt−1 matrix (ΣΛt−1ΣF) are

distinct a.s.

18Proposition 2 represents the only result of this paper where we allow T to diverge, and its only purpose is to establish that using the ex-post SDF $m^P_{t,t+1}$ leads to pricing errors of almost identical magnitude to $m_{t,t+1}$.


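Assumption 5 (idiosyncratic component - temporal and cross-sectional dependence) Set $\sigma_{ij,uv} = E(e_{iu}e_{jv})$ and $\kappa_{h,i_1\ldots i_h,t_1\ldots t_h} = \mathrm{cumulant}(e_{i_1t_1}, e_{i_2t_2}, \ldots, e_{i_ht_h})$, with $E_s(e_{iu}e_{jv}) = E(e_{iu}e_{jv})$ for every $s < u, v$. For every $1 \le i, j \le N$, every $s, v \in T_t$ and $t \in \mathbb{Z}$:

$$E(e_{is}) = 0, \quad E(e_{it}e_{is}e_{jv}) = 0, \quad |\sigma_{ij,sv}| \le C, \quad |\kappa_{4,iijj,ttvs}| \le C,$$

$$\frac{1}{N}\sum_{i=1}^{N}(e_{is}e_{iv} - \sigma_{ii,sv}) \to_p 0, \quad \frac{1}{N}\sum_{i=1}^{N}(e_{is}^4 - \kappa_{4,iiii,ssss} - 3\sigma_{ii,ss}^2) \to_p 0,$$

with

$$\frac{1}{N}\sum_{i=1}^{N}(\sigma_{ii,sv} - 1(s=v)\sigma^2_{t-1}) = o\Big(\frac{1}{\sqrt{N}}\Big), \quad \frac{1}{N}\sum_{i=1}^{N}\sum_{\substack{j=1 \\ j\ne i}}^{N}|\sigma_{ij,sv}| \to 0 \quad\text{and}\quad \sup_{t\in\mathbb{Z}} g_1(\Sigma_{et-1}) = o(N^{1/2}) \ \text{a.s.},$$

$$\frac{1}{N}\sum_{i=1}^{N}(\sigma_{ii,ss}^2 - \sigma_{4t-1}) = o\Big(\frac{1}{\sqrt{N}}\Big),$$

$$\frac{1}{N}\sum_{i=1}^{N}(\kappa_{4,iiii,ttsv} - 1(t=v=s)\kappa_{4t-1}) = o\Big(\frac{1}{\sqrt{N}}\Big) \quad\text{and}\quad \frac{1}{N}\sup_{i_1}\sum_{i_2,\ldots,i_h=1}^{N}|\kappa_{h,i_1\ldots i_h,t_1\ldots t_h}| \to 0$$

for every $3 \le h \le 8$, every $t_j \in T_t$ ($1 \le j \le h$) and for at least one $i_j$ ($2 \le j \le h$) different from $i_1$, for some $\sigma^2_t, \sigma_{4t}, \kappa_{4t}$ satisfying

$$0 < \inf_{t\in\mathbb{Z}}\sigma^2_t \le \sup_{t\in\mathbb{Z}}\sigma^2_t < \infty, \quad 0 < \inf_{t\in\mathbb{Z}}\sigma_{4t} \le \sup_{t\in\mathbb{Z}}\sigma_{4t} < \infty, \quad 0 \le \inf_{t\in\mathbb{Z}}|\kappa_{4t}| \le \sup_{t\in\mathbb{Z}}|\kappa_{4t}| < \infty.$$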

Our assumptions of (local in time but asymptotic in N) dynamic homoskedasticity and uncorrelatedness

of the eits are suggested by Bai (2003), who establishes that these conditions are necessary and sufficient for

consistency of the PCA factors when T is fixed and the factor model is static, strengthening Connor & Korajczyk

(1986) sufficiency result. We are able to relax these conditions somewhat, in the sense that we impose them only locally within any

given time-interval of length T . At any rate, asset returns, especially at the level of individual assets, appear to

be empirically only mildly autocorrelated across time. Our assumption imposes equality between conditional

and unconditional moments of the eits. However, this is not as restrictive as it seems given that we are not

imposing constancy but are instead allowing for (local) time-variation of the unconditional moments, a form of

local stationarity along the lines of Dahlhaus (1997). As N diverges, the number of parameters say in Σet−1

increases, a manifestation of the curse of dimensionality and it becomes necessary to limit the cross-sectional

heterogeneity of the moments of the idiosyncratic component eit. The rate on the maximum eigenvalue of

Σet−1 does not follow from the previous condition on the behaviour of the sum of the cross-covariances, and is


necessary in order to deal with the effect of the pricing errors ait in the asymptotics. This eigenvalue condition

is not needed when ait−1 = 0 for every i and t.

Our assumptions are extremely mild in terms of the permitted degree of cross-sectional dependence. For

instance assume that the eit satisfy a factor structure, imposing constant moments for simplicity, such as:

ei,t = δiut + ηi,t, (21)

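where, for some positive finite constants $C_1 \le C_2$ and $0 < \epsilon_1 \le 2\epsilon_2 < 1/2$, for any N,

$$C_1N^{\epsilon_1} \le \sum_{i=1}^{N}\delta_i^2 \quad\text{and}\quad \sum_{i=1}^{N}|\delta_i| \le C_2N^{\epsilon_2}, \tag{22}$$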

with $u_t$ i.i.d. $(0, 1)$ and $\eta_{i,t}$ i.i.d. $(0, \sigma^2_\eta)$ over time and across units, where the $u_t$ and the $\eta_{i,s}$ are mutually independent for every $i, s, t$. It follows that the second and fourth conditions (involving the second moments) of Assumption 5 hold and, more generally, all the other conditions will follow by making suitable higher-moment assumptions on the $\eta_{it}$. At the same time, the maximum eigenvalue of the covariance matrix $\Sigma_e = \mathrm{var}(e_{t\cdot})$, given by $\sum_{i=1}^{N}\delta_i^2 + \sigma^2_\eta$, diverges at a rate not slower than $O(N^{\epsilon_1})$, for some $0 < \epsilon_1 < 1/2$. This implies, for example, that the row- and column-norm of $\Sigma_e$, namely $\sup_j\sum_{i=1}^{N}|\sigma_{ij,tt}|$, also necessarily diverges with N, manifesting a substantial degree of cross-sectional dependence.
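The eigenvalue claim in the example (21)-(22) can be verified numerically. In the sketch below the loadings $\delta_i$ are a hypothetical design, chosen only so that (22) holds with $\epsilon_1 = \epsilon_2 = 0.2$: roughly $N^{0.2}$ units load with unit weight on the common shock $u_t$ and all remaining units have zero weight.

```python
import numpy as np

# Hypothetical design: about N**eps units are exposed to the common shock u_t.
N, sigma_eta, eps = 1000, 0.7, 0.2
k = int(round(N ** eps))
delta = np.zeros(N)
delta[:k] = 1.0

# Covariance matrix of e_t under (21): delta delta' + sigma_eta^2 I_N.
Sigma_e = np.outer(delta, delta) + sigma_eta**2 * np.eye(N)

max_eig = np.linalg.eigvalsh(Sigma_e)[-1]              # eigvalsh returns ascending eigenvalues
row_norm = np.abs(Sigma_e).sum(axis=0).max()           # sup_j sum_i |sigma_ij|
avg_offdiag = (np.abs(Sigma_e).sum() - np.trace(Sigma_e)) / N
```

Here `max_eig` equals the sum of squared loadings plus $\sigma^2_\eta$ and grows with `k`, while `avg_offdiag`, the average absolute cross-covariance per unit, equals $k(k-1)/N$ and remains small, which is the sense in which a diverging maximum eigenvalue is compatible with the summability condition on the cross-covariances.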

The zero third-moment assumption is only made to simplify exposition and can be relaxed. The cumulants' assumption implies that the fourth-order cumulant $\kappa_{4,iijj,tstv} = \mathrm{cumulant}(e_{it}, e_{is}, e_{jt}, e_{jv})$ satisfies $N^{-1}\sum_{i=1}^{N}\sum_{j=1, j\ne i}^{N}\kappa_{4,iijj,ttsv} \to 0$.

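Assumption 6 (mixed moments) For every $t \in \mathbb{Z}$:

$$\frac{1}{N}\sum_{i=1}^{N}\lambda_{it-1}e_{\cdot i}' \to_p 0_{r_{t-1}\times T}, \tag{23}$$

$$\frac{1}{N}\sum_{i=1}^{N}\big(\lambda_{it-1}\lambda_{it-1}' \otimes E(e_{\cdot i}e_{\cdot j}')\big) \to_p (\Sigma_{\Lambda t-1} \otimes \sigma^2_{t-1}I_T). \tag{24}$$

Condition (24) can be generalized to

$$\frac{1}{N}\sum_{i=1}^{N}\big(\lambda_{it-1}\lambda_{it-1}' \otimes E(e_{\cdot i}e_{\cdot j}')\big) \to_p \Gamma_{t-1},$$

as in Bai (2003), without imposing $\Gamma_{t-1} = (\Sigma_{\Lambda t-1} \otimes \sigma^2_{t-1}I_T)$.19 In contrast, under our assumptions, it is possible to estimate the asymptotic covariance matrix of $N^{-1/2}\sum_{i=1}^{N}(e_{\cdot i}e_{is} - E(e_{\cdot i}e_{is}))$, as explained below.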

19Consistent estimation of Γt−1 is possible although with a more elaborate proof.


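Assumption 7 (joint convergence in distribution) For every $s \in T_t$ and $t \in \mathbb{Z}$:

$$\frac{1}{\sqrt{N}}\sum_{i=1}^{N}\begin{bmatrix} e_{\cdot i}e_{is} - E(e_{\cdot i}e_{is}) \\ \lambda_{it-1} \otimes e_{\cdot i} \end{bmatrix} \to_d N\left(0_{T(r_{t-1}+1)}, \begin{bmatrix} (\kappa_{4t-1} + \sigma_{4t-1})\iota_s\iota_s' + \sigma_{4t-1}I_T & 0_{T\times Tr_{t-1}} \\ 0_{Tr_{t-1}\times T} & \Sigma_{\Lambda t-1} \otimes \sigma^2_{t-1}I_T \end{bmatrix}\right). \tag{25}$$

Our theorems require the joint distribution of

$$\frac{1}{\sqrt{N}}\sum_{i=1}^{N}\begin{bmatrix} e_{\cdot i}e_{is} - E(e_{\cdot i}e_{is}) \\ \lambda_{it-1}e_{is} \\ \lambda_{it-1} \otimes e_{\cdot i} \end{bmatrix},$$

which can be derived by premultiplying the right hand side of (25) by the $(T + r_{t-1} + r_{t-1}T) \times (T + r_{t-1}T)$ matrix:

$$A_{s,t} \equiv \begin{bmatrix} \begin{matrix} I_T & 0_{T\times r_{t-1}T} \end{matrix} \\ \begin{matrix} 0_{r_{t-1}\times (T + r_{t-1}(s-1))} & I_{r_{t-1}} & 0_{r_{t-1}\times r_{t-1}(T-s)} \end{matrix} \\ \begin{matrix} 0_{r_{t-1}T\times T} & I_{r_{t-1}T} \end{matrix} \end{bmatrix}, \tag{26}$$

one for every $s \in T_t$ and $t \in \mathbb{Z}$.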

Primitive conditions for Assumption 7 can be derived but at the cost of raising the level of complexity of our proofs. For instance, when (21) with (22) hold,

$$\frac{1}{\sqrt{N}}\sum_{i=1}^{N}e_{\cdot i} \to_d N(0_T, \sigma^2_\eta I_T) \quad\text{as } N \to \infty,$$

by Theorem 2 of Kuersteiner & Prucha (2013) when the $\eta_{it}$ satisfy their martingale difference assumption (see their Assumptions 1 and 2). This result extends easily to (25) under suitable additional assumptions. (Details are available upon request.)

The implicit assumption that justifies this new angle is that the time-variation affecting model (1), in

particular regarding the loadings Λt−1 and the idiosyncratic variances σii,tt, is sufficiently smooth within the

time window of size T . The strength and generality afforded by such argument relies on the possibility of taking

T sufficiently small, which the asymptotic theory of the preceding sections allows, as long as rt−1 < T for every

T ≤ t ≤ T0. In particular, the following three smoothness assumptions suffice.

Assumption 8 (number of factors - smoothness) For every s ∈ Tt and t ∈ Z,

rs = rt−1.

Our fixed-T large-N approach permits time-variation of virtually any parameter of interest. Even if rt−1 is

allowed to change over time, we are ruling out that such change occurs within the given interval Tt.


Assumption 9 (loadings - smoothness) For every $s \in T_t$ and $t \in \mathbb{Z}$, the $\Lambda_{t-1}$ are differentiable functions of t such that their first derivatives, $\Lambda^{(1)}_{t-1}$, satisfy

$$\Lambda_{t-1}'\Lambda^{(1)}_s = o_p(N), \quad e_{s\cdot}'\Lambda^{(1)}_{t-1} = o_p(N).$$

As an example of Assumption 9, one can have that only a fraction $N_0/N$ of the units has time-varying loadings, with $N_0/N \to 0$. An important special case is when the loading matrix $\Lambda_{t-1}$ is a differentiable function of a set of observed state variables, say represented by the finite-dimensional vector $z_{t-1}$, that is $\Lambda_{t-1} = \Lambda(z_{t-1})$.

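Assumption 10 (idiosyncratic component - smoothness) The $e_{\cdot i}$ satisfy

$$e_{\cdot i} = \Sigma_{ei}^{1/2}\eta_{\cdot i} \quad\text{with}\quad \Sigma_{ei} \equiv E(e_{\cdot i}e_{\cdot i}') = [\sigma_{ii,uv}] \quad\text{for } i = 1, \cdots, N \text{ and } u, v \in T_t,\ t \in \mathbb{Z},$$

and $\eta_{\cdot i} = (\eta_{i1}, \cdots, \eta_{iT})'$ satisfying $E(\eta_{\cdot i}) = 0$, $E(\eta_{\cdot i}\eta_{\cdot i}') = I_T$ with $\eta_{\cdot i}$ and $\Sigma_{ej}$ mutually independent for any $i, j$. Then, for every $s \in T_t$ and $t \in \mathbb{Z}$, assume that the $\sigma_{ii,ss}$ and $\kappa_{4,iiii,ssss}$ are differentiable functions of s such that their first derivatives, $\sigma^{(1)}_{ii,ss}$ and $\kappa^{(1)}_{4,iiii,s}$, satisfy, for every $a, b, c, d \in T_t$ and $a \le a^* \le t$, $b \le b^* \le t$, $c \le c^* \le t$, $d \le d^* \le t$:

$$\sum_{i=1}^{N}|\eta_{ia}\eta_{ib}\sigma^{(1)}_{ii,a^*}\sigma^{(1)}_{ii,b^*}| = o_p(N), \quad \sum_{i=1}^{N}|\eta_{ia}\eta_{ib}\sigma^{(1)}_{ii,a^*}\sigma_{ii,tt}| = o_p(N), \quad \sum_{i=1}^{N}|\sigma^{(1)}_{ii,a^*}| = o(N),$$

$$\sum_{i=1}^{N}(|\sigma^{(1)}_{ii,a^*}|\sigma_{ii,tt}) = o(N), \quad \sum_{i=1}^{N}(|\sigma^{(1)}_{ii,a^*}|\sigma_{ii,a^*}) = o(N), \quad \sum_{i=1}^{N}|\sigma^{(1)}_{ii,a^*}\sigma^{(1)}_{ii,b^*}| = o(N),$$

$$\sum_{i=1}^{N}(\lambda_{it-1}'\lambda_{it-1})|\sigma^{(1)}_{ii,a^*}| = o_p(N), \quad \sum_{i=1}^{N}|\kappa^{(1)}_{4,iiii,a^*}| = o(N).$$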

4 Inference

4.1 Inference on Risk Factors

In this section we present the limiting distribution theory for the PCA-estimated risk factors.

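Theorem 1 (rate of convergence and limiting distribution of the risk factors estimator) Under Assumptions 1-10, as $N \to \infty$, $\hat F(T_t) = (\hat F_{t-T+1,t}, \cdots, \hat F_{s,t}, \cdots, \hat F_{t,t})'$ and $\hat\Lambda(T_{t-1}) = (\hat\lambda_{1t-1}, \cdots, \hat\lambda_{Nt-1})'$ satisfy:

(i)

$$\|\hat F_{s,t} - H_{t-1}'F_s\| = O_p(N^{-1/2}) \quad\text{for every } s \in T_t \text{ and } t \in \mathbb{Z}.$$

(ii)

$$\sqrt{N}\big(\hat F_{s,t} - H_{t-1}'F_s\big) \to_d N(0_{r_{t-1}}, A_{t-1}'B_{s,t-1}A_{t-1}) \quad\text{for every } s \in T_t \text{ and } t \in \mathbb{Z},$$

setting

$$A_t \equiv (I_{r_t}, I_{r_t}, I_{r_t})',$$

$$B_{s,t} \equiv \begin{bmatrix} \frac{\sigma_{4t}}{T}U_t^{-2} + \frac{(\kappa_{4t}+\sigma_{4t})}{T^2}U_t^{-1}H_t'F_sF_s'H_tU_t^{-1} & 0_{r_t\times r_t} & 0_{r_t\times r_t} \\ 0_{r_t\times r_t} & \sigma^2_tU_t^{-1} & \frac{\sigma^2_t}{T}U_t^{-1}Q_t\Sigma_{\Lambda t}F_sF_s'H_tU_t^{-1} \\ 0_{r_t\times r_t} & \frac{\sigma^2_t}{T}U_t^{-1}H_t'F_sF_s'\Sigma_{\Lambda t}Q_t'U_t^{-1} & \frac{\sigma^2_t}{T}(F_s'\Sigma_{\Lambda t}F_s)U_t^{-2} \end{bmatrix},$$

where $Q_t$ and $U_t$ are defined in Lemma 3.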

Proof. See Appendix D.

When T is also allowed to diverge, the asymptotic distribution of the estimated factors simplifies significantly.

In particular, the result of Bai (2003), Theorem 5, is re-obtained for our static case, except that we obtain a

much simplified expression for the asymptotic covariance matrix: thanks to our Assumption 6, adopting the notation of Bai (2003), $V_{t-1}^{-1}Q_{t-1}\Gamma_{t-1}Q_{t-1}'V_{t-1}^{-1} = \sigma^2_{t-1}U_{t-1}^{-1}$. The latter formula carries a very neat interpretation:

it states that the (asymptotic) precision of the estimated factors increases proportionally with the magnitude

of the associated (normalized) eigenvalue. At the same time, the precision diminishes, equally across the rt−1

factors, when the idiosyncratic average variance σ2t−1 increases.

In order to construct confidence intervals and to perform hypothesis testing, for the estimated factors,

one needs to consistently estimate the asymptotic covariance matrix reported in Theorem 1-(ii). This entails

estimating σ2t−1 and the matrices ΣΛt−1 ,Qt−1,Ut−1, among others. For the static case Bai (2003) shows that

this is feasible when both N and T diverge, despite the rotation indeterminacy of the estimated factors and

loadings associated with the rotation matrix Ht−1. However, in our sampling framework, estimation of ΣΛt−1

appears particularly problematic because the λit−1 are not consistently estimated when T is fixed, regardless of

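the rotation. In particular, following Bai (2003)[proof of Theorem 2] and using Theorem 1-(i), the following decomposition and limit hold:

$$\hat\lambda_{it-1} = H_{t-1}^{-1}\lambda_{it-1} + \frac{\hat F'e_{\cdot i}}{T} + \frac{1}{T}\hat F'(F - \hat FH_{t-1}^{-1})\lambda_{it-1} \to_p H_{t-1}^{-1}\lambda_{it-1} + \frac{H_{t-1}'F'e_{\cdot i}}{T}. \tag{27}$$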

Note that the lack of consistency of the loadings is unrelated to latency of the factors: it will still hold when the

Ft are observed. However, by a careful analysis, we are still able to estimate consistently the second moment

matrix of the λit−1 for T fixed. This is accomplished in the next theorem.


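Theorem 2 (consistent estimation of asymptotic covariance matrix) Under Assumptions 1-10 and the identification condition $\kappa_{4t-1} = 0$, as $N \to \infty$,

$$\|\hat B_{s,t} - B_{s,t}\| \to_p 0 \quad\text{for every } s \in T_t \text{ and } t \in \mathbb{Z},$$

where

$$\hat B_{s,t} \equiv \begin{bmatrix} \hat\sigma_{4t}T^{-1}\hat U_t^{*-1}\big(I_{r_t} + T^{-1}\hat F_{s,t}\hat F_{s,t}'\big)\hat U_t^{*-1} & 0_{r_t\times r_t} & 0_{r_t\times r_t} \\ 0_{r_t\times r_t} & \hat\sigma^2_t\hat U_t^{*-1} & \hat\sigma^2_tT^{-1}\hat U_t^{*-1}\hat\Sigma_{\Lambda t}\hat F_{s,t}\hat F_{s,t}'\hat U_t^{*-1} \\ 0_{r_t\times r_t} & \hat\sigma^2_tT^{-1}\hat U_t^{*-1}\hat F_{s,t}\hat F_{s,t}'\hat\Sigma_{\Lambda t}\hat U_t^{*-1} & \hat\sigma^2_tT^{-1}(\hat F_{s,t}'\hat\Sigma_{\Lambda t}\hat F_{s,t})\hat U_t^{*-2} \end{bmatrix},$$

setting

$$\hat\sigma^2_t \equiv \frac{T}{T - r_t}V_{Tt}(r_t) \quad\text{with}\quad V_{Tt}(r_t) \equiv \frac{1}{NT}\mathrm{trace}\big((X - \hat F\hat\Lambda')(X - \hat F\hat\Lambda')'\big),$$

$$\hat\sigma_{4t} \equiv \left(3 + \frac{27}{T}\Big(\sum_{s\in T_t}\Big(\frac{\hat F_{s,t}'\hat F_{s,t}}{T}\Big)^2\Big) + \frac{18r_t}{T}\right)^{-1}\left(\frac{1}{NT}\sum_{i=1}^{N}\sum_{s\in T_t}\hat e_{is}^4\right),$$

$$\hat U_t^* = \hat V_t - \frac{\hat\sigma^2_t}{T}I_{r_t} = \hat\Sigma_{\Lambda t}.$$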
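The two simplest ingredients of the covariance estimator, the average idiosyncratic variance and the eigenvalue matrix corrected for idiosyncratic noise, can be sketched as follows. The function name and the simulated design (unit idiosyncratic variance, homoskedastic errors) are hypothetical; the fourth-moment estimator is not reproduced here.

```python
import numpy as np

def sigma2_and_ustar(X, r):
    """Degrees-of-freedom corrected residual variance and U* = V - (sigma2/T) I."""
    T, N = X.shape
    eigvals, eigvecs = np.linalg.eigh(X @ X.T / (N * T))
    order = np.argsort(eigvals)[::-1]
    F_hat = np.sqrt(T) * eigvecs[:, order[:r]]
    Lam_hat = X.T @ F_hat / T
    resid = X - F_hat @ Lam_hat.T                      # fitted idiosyncratic component
    V_r = np.trace(resid @ resid.T) / (N * T)          # average squared residual
    sigma2_hat = T / (T - r) * V_r                     # correction for the r fitted factors
    U_star = np.diag(eigvals[order[:r]]) - sigma2_hat / T * np.eye(r)
    return sigma2_hat, U_star, V_r, eigvals[order]

# Hypothetical simulated panel with idiosyncratic variance equal to one.
rng = np.random.default_rng(4)
T, N, r = 8, 5000, 2
F_true = rng.normal(size=(T, r))
Lam_true = rng.normal(size=(N, r))
X = F_true @ Lam_true.T + rng.normal(size=(T, N))

sigma2_hat, U_star, V_r, eig_desc = sigma2_and_ustar(X, r)
```

The average squared residual coincides with the sum of the $T - r$ smallest eigenvalues of $XX'/NT$, so no residual matrix needs to be formed in practice; the factor $T/(T-r)$ matters precisely because T is small.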


When estimating the asymptotic covariance matrix, one necessarily needs to set $\kappa_{4t-1} = 0$ for identification purposes, as evinced from Lemma 4 in Appendix C. However, higher-order cumulants are not constrained to be zero, implying that $\kappa_{4t-1} = 0$ is not equivalent to Gaussianity. Moreover, heterogeneity of the $\sigma_{ii,tt}$ across assets induces (potentially) a very large (average) volatility of the idiosyncratic component $e_{it}$, quantified by the difference between $\sigma_{4t-1}$ and $\sigma^4_{t-1}$. Note that $\hat\Sigma_{\Lambda t-1}$ is identical to $\hat U^*_{t-1}$ as $\hat V_{t-1} = \hat\Lambda_{t-1}'\hat\Lambda_{t-1}/N$, but, for expositional clarity, we prefer to refer to them with a different notation, in particular to facilitate the construction of the standard errors for the PCA-estimated risk factors.

We now illustrate estimation of the number of latent factors. Hence, assume that a k factor model is estimated, where k can be smaller than, equal to or larger than the true number of factors $r_{t-1}$. In particular, $\hat F^k$ denotes the $T \times k$ matrix of the k eigenvectors associated with $\hat V^k_{t-1}$, the diagonal matrix containing the k largest eigenvalues of $XX'/NT$ and, similarly, $\hat\Lambda^k_{t-1} = T^{-1}X'\hat F^k$ denotes the $N \times k$ matrix of loadings. Set, for some function $g(N)$,

$$PC_{Tt}(k) \equiv V^*_{Tt}(k) + kg(N)$$

with $V^*_{Tt}(k) \equiv \big(\frac{T}{T-k}\big)V_{Tt}(k)$ and

$$V_{Tt}(k) \equiv \frac{1}{NT}\mathrm{trace}\big((X - \hat F^k\hat\Lambda^{k\prime}_{t-1})(X - \hat F^k\hat\Lambda^{k\prime}_{t-1})'\big).$$


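Theorem 3 (consistent estimation of the number of factors) Suppose that Assumptions 1-10 hold and that $g(N) \to 0$ together with $\sqrt{N}g(N) \to \infty$. For every $t \in \mathbb{Z}$, set

$$\hat k_{t-1} \equiv \arg\min_{0\le k\le T}PC_{Tt}(k).$$

Then

$$\mathrm{Prob}(\hat k_{t-1} = r_{t-1}) \to 1 \quad\text{as } N \to \infty.$$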
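The selection rule of Theorem 3 can be sketched in a few lines. The penalty $g(N) = \log(N)/\sqrt{N}$ used below is only one hypothetical choice satisfying $g(N) \to 0$ and $\sqrt{N}g(N) \to \infty$; it is not taken from the paper. The search is restricted to `k_max < T` so that the factor $T/(T-k)$ is well defined, and the simulated factors are deliberately strong.

```python
import numpy as np

def estimate_num_factors(X, k_max, g):
    """Minimize PC(k) = T/(T-k) * V(k) + k*g over k = 0, ..., k_max (k_max < T)."""
    T, N = X.shape
    eig_desc = np.sort(np.linalg.eigvalsh(X @ X.T / (N * T)))[::-1]
    # V(k) equals the sum of the eigenvalues left after removing the k largest ones.
    pc = np.array([T / (T - k) * eig_desc[k:].sum() + k * g for k in range(k_max + 1)])
    return int(np.argmin(pc)), pc

# Hypothetical simulated panel with r = 2 strong factors and unit idiosyncratic variance.
rng = np.random.default_rng(5)
T, N, r = 12, 2000, 2
F_true = 3.0 * rng.normal(size=(T, r))
Lam_true = rng.normal(size=(N, r))
X = F_true @ Lam_true.T + rng.normal(size=(T, N))

g_N = np.log(N) / np.sqrt(N)          # illustrative penalty, an assumption of this sketch
k_hat, pc = estimate_num_factors(X, k_max=8, g=g_N)
```

In finite samples the outcome depends on the penalty and on the strength of the factors, so the choice of `g` here should be read as a placeholder for whatever function is adopted in practice.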


As for the double asymptotic result of Bai & Ng (2002), the result follows by showing that $V^*_{Tt}(k)$ is (a.s.) upward biased when $k < r_{t-1}$ although it is not (asymptotically) biased whenever $k \ge r_{t-1}$. The penalization through the $g(N)$ function, on the other hand, is asymptotically negligible precisely when it is not needed, that is when $k < r_{t-1}$, and instead it induces a bias, as desired, whenever $k > r_{t-1}$, the bigger the slower $g(N)$ converges towards zero, leading to consistency of $\hat k_{t-1}$ for $r_{t-1}$.

The strength of Theorem 3 is that it represents the only existing method to consistently estimate the number

of latent factors for model (1) when T is fixed and N diverges. All other existing methods surveyed in Section 2

require double asymptotics.
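The criterion of Theorem 3 can be sketched numerically as follows. This is a minimal illustration, not the paper's implementation: the function names `penalty` and `estimate_num_factors` are hypothetical, the penalty takes the form $g(N) = (\log N)^{\varepsilon_1}N^{\varepsilon_2}/\sqrt{N}$ used later in the Monte Carlo section, and the search is restricted to $k \le T-1$ (an assumption made here because the factor $T/(T-k)$ is undefined at $k = T$).

```python
import numpy as np

def penalty(n_assets, eps1=0.0, eps2=0.25):
    """g(N) = (log N)^eps1 * N^eps2 / sqrt(N), with eps1 >= 0 and 0 < eps2 < 1/2."""
    return np.log(n_assets) ** eps1 * n_assets ** eps2 / np.sqrt(n_assets)

def estimate_num_factors(X, eps1=0.0, eps2=0.25):
    """Return (k_hat, criterion values) for a T x N panel X (hypothetical helper)."""
    T, N = X.shape
    k_max = T - 1  # assumption: T/(T-k) is undefined at k = T
    # Eigenvectors of the T x T matrix XX'/(NT), sorted by decreasing eigenvalue.
    eigvals, eigvecs = np.linalg.eigh(X @ X.T / (N * T))
    eigvecs = eigvecs[:, np.argsort(eigvals)[::-1]]
    crit = np.empty(k_max + 1)
    for k in range(k_max + 1):
        F_k = np.sqrt(T) * eigvecs[:, :k]   # T x k, normalised so that F'F/T = I_k
        L_k = X.T @ F_k / T                 # N x k loadings
        resid = X - F_k @ L_k.T
        V_k = np.sum(resid ** 2) / (N * T)  # trace(resid resid') / (NT)
        crit[k] = T / (T - k) * V_k + k * penalty(N, eps1, eps2)
    return int(np.argmin(crit)), crit

# Illustrative data: one factor, T = 5 dates, N = 3000 assets (all values hypothetical).
rng = np.random.default_rng(0)
F = np.array([[1.5], [-1.0], [0.5], [-2.0], [1.0]])
lam = rng.standard_normal((3000, 1))
X = F @ lam.T + rng.standard_normal((5, 3000))
k_hat, crit = estimate_num_factors(X)
print(k_hat)
```

The residual sum of squares is computed explicitly to mirror the definition of $V_{Tt}(k)$; it could equally be obtained as the sum of the $T-k$ smallest eigenvalues of $XX'/NT$.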

Theorem 3 relies on studying the behaviour of $\hat F^k$, including the case $k > r_{t-1}$ (see Lemma 5). In particular, we show that when an erroneously large number of risk factors is considered, that is $k > r_{t-1}$, the corresponding risk premia consist of a linear combination of the risk premia associated with the correct number (viz. $r_{t-1}$) of risk factors. This suggests that the use of PCA for risk premia estimation could potentially avoid the problems arising from including too many factors, such as when useless, or spurious, factors (i.e. factors that are uncorrelated with the test assets) are included.20

4.2 Inference on Risk Premia and the SDF

The previous results permit us to derive the limiting properties of the risk premia and SDF estimators. In particular, in view of (14), Theorem 1 directly provides the limiting distribution theory for the estimator of $\gamma^P_{s-1}$, namely:

\[ \hat\gamma_{s-1} \equiv \hat F_{s,t} \quad \text{for every } s \in \mathcal{T}_t \text{ and } t \in \mathbb{Z}, \]

and for the numerically equivalent estimator $\hat\gamma^{twopass}_{s-1}$.

20 Jagannathan & Wang (1998), Kan & Zhang (1999a), Kan & Zhang (1999b) and Gospodinov et al. (2017) pointed out how inclusion of useless factors leads to serious problems when conducting inference on beta-pricing models, and various methods have been proposed to tackle them; see Kleibergen (2009), Gospodinov et al. (2014), Bryzgalova (2014), Burnside (2016), Anatolyev & Mikusheva (2018) and Raponi & Zaffaroni (2018) among others.

It is important to recognize that no gain appears with respect to estimation of the time-varying risk premia $\gamma_{s-1}$ from taking $T$ large. This represents a key motivation for using our large-$N$ results. More specifically,

\[ \hat\gamma_{s-1} - H'_{t-1}\gamma_{s-1} = H'_{t-1}\big(f_s - E_{s-1}(f_s)\big) + \Gamma_{s,NT} = O_p(1) + O_p(N^{-1/2}), \]

for a random quantity satisfying $\Gamma_{s,NT} = O_p(N^{-1/2})$, by Theorem 1.21 The only effect of allowing also a large $T$ is that the asymptotic covariance matrix of $\hat\gamma_{s-1}$ will be smaller (in the matrix sense) without affecting the rate of convergence, which is still $O_p(N^{-1/2})$.

Summarizing, when time-variation is allowed for, the ex-ante risk premia H′t−1γs−1 cannot be consistently

estimated even when N and T both diverge. Instead, the ex-post risk premia H′t−1γPs−1 can be consistently

estimated only when N diverges, with T affecting the asymptotic precision. The latter effect could be relevant

in finite, small, samples.

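Considering now the average risk premia estimator $\hat\gamma$ in (10), it follows that

\[ \hat\gamma = T^{-1}\sum_{s\in\mathcal{T}_t}\hat F_{s,t} \to_p T^{-1}\sum_{s\in\mathcal{T}_t}H'_{t-1}\gamma^P_{s-1} = H'_{t-1}\Big(\bar\gamma + \bar f - T^{-1}\sum_{s\in\mathcal{T}_t}E_{s-1}(f_s)\Big) \equiv H'_{t-1}\gamma^P, \]

setting $\bar\gamma \equiv T^{-1}\sum_{s\in\mathcal{T}_t}\gamma_{s-1}$ and $\bar f \equiv T^{-1}\sum_{s\in\mathcal{T}_t}f_s$; that is, $\hat\gamma$ now captures accurately the local average of the time-varying risk premia $\gamma^P$, for which inference can be made by generalizing our PCA results as follows.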

Theorem 4 (rate of convergence and limiting distribution of the risk premia estimator) Under Assumptions 1-10, as $N \to \infty$,

(i) The risk premia estimator $\hat\gamma$ identifies the population risk premia $\gamma^P$ multiplied by the rotation matrix $H_{t-1}$.

(ii)

\[ \|\hat\gamma - H'_{t-1}\gamma^P\| = O_p(N^{-1/2}) \quad \text{for every } t \in \mathbb{Z}. \tag{29} \]

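21 In particular:

\[ \hat\gamma_{s-1} = H'_{t-1}\gamma^P_{s-1} + U^{-1}_{t-1}\frac{\hat F'}{T}\Big(\frac{e\,e_{s\cdot}}{N} - E\Big(\frac{e\,e_{s\cdot}}{N}\Big)\Big) + U^{-1}_{t-1}\frac{\hat F'F}{T}\frac{\Lambda'_{t-1}e_{s\cdot}}{N} + U^{-1}_{t-1}\frac{\hat F'}{T}\frac{e\Lambda_{t-1}}{N}F_s \]

\[ = H'_{t-1}\gamma_{s-1} + H'_{t-1}\big(f_s - E_{s-1}(f_s)\big) + \Gamma_{s,NT}, \tag{28} \]

where

\[ \Gamma_{s,NT} \equiv T^{-1}U^{-1}_{t-1}\hat F'\Big(\frac{e\,e_{s\cdot}}{N} - E\Big(\frac{e\,e_{s\cdot}}{N}\Big)\Big) + T^{-1}U^{-1}_{t-1}\hat F'F\,\frac{\Lambda'_{t-1}e_{s\cdot}}{N} + T^{-1}U^{-1}_{t-1}\hat F'\frac{e\Lambda_{t-1}}{N}F_s. \]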


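(iii)

\[ \sqrt{N}\big(\hat\gamma - H'_{t-1}\gamma^P\big) \to_d N\big(0_{r_{t-1}},\, A'_{t-1}C_{t-1}A_{t-1}\big) \quad \text{for every } t \in \mathbb{Z}, \]

setting

\[ C_t \equiv \begin{pmatrix}
\frac{(\kappa_{4t}+\sigma_{4t})}{T^2}U_t^{-2} + \frac{\sigma_{4t}}{T^2}U_t^{-1}H'_t\bar F\bar F'H_tU_t^{-1} & 0_{r_t\times r_t} & 0_{r_t\times r_t} \\
0_{r_t\times r_t} & \frac{\sigma^2_t}{T}U_t^{-1} & \frac{\sigma^2_t}{T}U_t^{-1}Q_t\Sigma_{\Lambda t}\bar F\bar F'H_tU_t^{-1} \\
0_{r_t\times r_t} & \frac{\sigma^2_t}{T}U_t^{-1}H'_t\bar F\bar F'\Sigma_{\Lambda t}Q'_tU_t^{-1} & \frac{\sigma^2_t}{T}\big(\bar F'\Sigma_{\Lambda t}\bar F\big)U_t^{-2}
\end{pmatrix}, \]

where $Q_t$ and $U_t$ are defined in Lemma 3.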

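(iv) When, in addition, $\kappa_{4t} = 0$,

\[ \|\hat C_t - C_t\| \to_p 0 \quad \text{for every } t \in \mathbb{Z}, \]

setting

\[ \hat C_t \equiv \begin{pmatrix}
\frac{\hat\sigma_{4t}}{T^2}\hat U_t^{*-2} + \frac{\hat\sigma_{4t}}{T^2}\hat U_t^{*-1}\bar{\hat F}\bar{\hat F}'\hat U_t^{*-1} & 0_{r_t\times r_t} & 0_{r_t\times r_t} \\
0_{r_t\times r_t} & \frac{\hat\sigma^2_t}{T}\hat U_t^{*-1} & \frac{\hat\sigma^2_t}{T}\hat U_t^{*-1}\hat\Sigma_{\Lambda t}\bar{\hat F}\bar{\hat F}'\hat U_t^{*-1} \\
0_{r_t\times r_t} & \frac{\hat\sigma^2_t}{T}\hat U_t^{*-1}\bar{\hat F}\bar{\hat F}'\hat\Sigma_{\Lambda t}\hat U_t^{*-1} & \frac{\hat\sigma^2_t}{T}\big(\bar{\hat F}'\hat\Sigma_{\Lambda t}\bar{\hat F}\big)\hat U_t^{*-2}
\end{pmatrix}, \]

where $\hat\sigma^2_t$, $\hat\sigma_{4t}$, $\hat U^*_t$ and $\hat\Sigma_{\Lambda t}$ are defined in Theorem 2.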


Theorem 4 is in agreement with Raponi et al. (2018), who noticed that their large-N fixed-T estimation procedure, designed for constant risk premia, would still lead to a meaningful estimator for local averages of the time-varying risk premia. Recall that T is fixed and possibly very small, so such local averages might still display a pronounced time-variation when evaluated over a sequence of rolling samples. In contrast, if one lets T diverge, then $\hat\gamma$ will capture the integrated risk premia over the entire time line, that is, it will converge to (net of a rotation) $\int_{-\infty}^{\infty}\gamma_s\,ds$, given that $\bar f - T^{-1}\sum_{s=t-T+1}^{t}E_{s-1}(f_s) = o_p(1)$ under mild conditions. Therefore, no information on time-variation can be recovered from $\hat\gamma$ if one lets T diverge.

As the vector of estimated risk premia is a sample average of the latent factors, its precision increases with $T$, in particular at rate $O(T^{-1/2})$. Indeed, although this is deliberately not explored in this paper, our result shows that $\hat\gamma$ is consistent at rate $O((NT)^{1/2})$ when both $N$ and $T$ diverge. A closer analysis explains the reasons behind such a fast rate of convergence. In fact, averaging (28) across $\mathcal{T}_t$ gives

\[ \hat\gamma = H'_{t-1}\Big(\frac{1}{T}\sum_{s\in\mathcal{T}_t}\gamma_{s-1}\Big) + H'_{t-1}\Big(\bar f - \frac{1}{T}\sum_{s\in\mathcal{T}_t}E_{s-1}(f_s)\Big) + \Gamma_{NT}, \]

where we set $\Gamma_{NT} \equiv T^{-1}\sum_{s\in\mathcal{T}_t}\Gamma_{s,NT}$. By Theorem 4, it follows that $\Gamma_{NT} = O_p((NT)^{-1/2})$ and, in fact, Dello Preite & Zaffaroni (2019) establish asymptotic normality of $\sqrt{NT}\,\Gamma_{NT}$, and thus of

\[ \sqrt{NT}\big(\hat\gamma - H'_{t-1}\gamma^P\big), \]


as both $N, T$ diverge. This fast rate seems at odds with the usual rate of convergence for estimated risk premia, in the presence of observed risk factors. For instance, the sample mean of a traded factor (i.e. a portfolio excess return) converges at rate $O(\sqrt{T})$, and the same rate applies to the traditional two-pass OLS estimator. The reason for this apparent contradiction is that $\hat\gamma - H'_{t-1}\gamma^P$ is, implicitly, capturing the randomness (i.e. the asymptotic distribution) around $T^{-1}\sum_{s\in\mathcal{T}_t}\nu_{s-1}$, where $\nu_{s-1} \equiv \gamma_{s-1} - E_{s-1}(f_s)$, namely the average risk premia netted out by the factor average conditional expectation or, in other words, the portion of the risk premia that is not linearly dependent on the factors' expectation. Besides the rotation by the matrix $H'_{t-1}$, which is just a by-product of having unobserved risk factors, such a fast rate is identical to the one found by Gagliardini et al. (2016), in a double asymptotic setting, and when the risk factors are assumed to be observed.22

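In contrast, if inference on the average (ex-ante) risk premia $T^{-1}\sum_{s\in\mathcal{T}_t}\gamma_{s-1}$ is aimed for, then

\[ \hat\gamma - H'_{t-1}\frac{1}{T}\sum_{s\in\mathcal{T}_t}\gamma_{s-1} = H'_{t-1}\frac{1}{T}\sum_{s\in\mathcal{T}_t}\big(f_s - E_{s-1}(f_s)\big) + \Gamma_{NT} = O_p(T^{-1/2}) + O_p((NT)^{-1/2}), \]

and the slower, traditional, $\sqrt{T}$-rate of convergence emerges.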

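Regarding the two-pass estimator $\hat\gamma^{twopass}_{t-1}$, Theorem 4 applies, in view of (12), but one can also use direct arguments to show the $\sqrt{N}$-rate of convergence:

\[ \hat\gamma^{twopass}_{t-1} = \big(\hat\Lambda'_{t-1}\hat\Lambda_{t-1}\big)^{-1}\hat\Lambda'_{t-1}\bar x_N \to_p \Big(H^{-1}_{t-1}\Sigma_{\Lambda t-1}H^{-1\prime}_{t-1} + \frac{\sigma^2_{t-1}}{T}I_{r_{t-1}}\Big)^{-1}\Big(H^{-1}_{t-1}\Sigma_{\Lambda t-1}H^{-1\prime}_{t-1}H'_{t-1}\gamma^P + \frac{\sigma^2_{t-1}}{T}H'_{t-1}\bar F\Big) = H'_{t-1}\gamma^P, \]

where the last equality uses $\bar F = \gamma^P$. Interestingly, no bias adjustment is required for $\hat\gamma^{twopass}_{t-1}$, unlike the case when the risk factors are observed.23 Given (12), inference on $\hat\gamma^{twopass}_{t-1}$ can be readily conducted by using Theorem 4.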

Giglio & Xiu (2017) illustrate a generalization of their method to time-varying loadings whereby the loadings are functions of observed state variables. Similarly, but in the context of observed risk factors, Gagliardini et al. (2016) allow for observed state variables in loadings and risk premia. Our approach differs from both contributions because we do not need to make any assumption on the form of time-variation of risk premia, such as dependence on some state variables.

Our asymptotic distribution theory delivers the limiting properties of the SDF estimator mt,t+1 of Eq. (20).

22 Obviously, with tradeable, and thus observed, factors $\nu_{s-1} = 0_{r_{t-1}}$ under correct model specification.

23 See Raponi et al. (2018) for the analysis of the two-pass risk premia estimation when only N diverges.


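Theorem 5 (rate of convergence and limiting distribution of the SDF estimator) Under Assumptions 1-10, as $N \to \infty$,

(i)

\[ \|\hat m_{s,s+1} - m^P_{s,s+1}\| = O_p(N^{-1/2}) \quad \text{for every } s \in \mathcal{T}_t \text{ and } t \in \mathbb{Z}. \]

(ii)

\[ \sqrt{N}\big(\hat m_{s,s+1} - m^P_{s,s+1}\big) \to_d N\big(0,\, D'_{s,t-1}E_{t-1}D_{s,t-1}\big) \quad \text{for every } s \in \mathcal{T}_t \text{ and } t \in \mathbb{Z}, \]

setting

\[ D_{s,t} \equiv (I_T \otimes H_t^{-1})\frac{1}{r_{fs}}\Big( -\Big(\frac{1_T}{T} \otimes I_{r_t}\Big)\Omega^{-1}\Big(F'\Big(\iota_{s+1} - \frac{1_T}{T}\Big)\Big) - \Big(\Big(\iota_{s+1} - \frac{1_T}{T}\Big) \otimes I_{r_t}\Big)\Omega^{-1}\Big(F'\frac{1_T}{T}\Big) \]
\[ \qquad + \big(M_{1_T}F\Omega^{-1}F' \otimes \Omega^{-1}F'\big)\Big[\Big(\frac{1_T}{T} \otimes \Big(\iota_{s+1} - \frac{1_T}{T}\Big)\Big) + \Big(\Big(\iota_{s+1} - \frac{1_T}{T}\Big) \otimes \frac{1_T}{T}\Big)\Big]\Big), \]

\[ E_t \equiv \Big(I_T \otimes U_t^{-1}\frac{H'_tF'}{T}\Big)U_{et}\Big(I_T \otimes \frac{FH_t}{T}U_t^{-1}\Big) + \big(\sigma^2_tI_T \otimes U_t^{-1}\big) + \Big(F\Sigma_{\Lambda t}F' \otimes \frac{\sigma^2_t}{T}U_t^{-2}\Big) \]
\[ \qquad + \Big(\sigma^2_t\frac{FH_t}{T}U_t^{-1} \otimes H'_tF'\Big)K_{r_tT} + K_{Tr_t}\Big(\sigma^2_tU_t^{-1}\frac{H'_tF'}{T} \otimes FH_t\Big), \]

where $U_{et}$ is defined in Appendix 8.5 and $K_{ab}$ denotes the commutation matrix of order $(a, b)$ (see Magnus & Neudecker (2007)[Chapter 3, Section 7]).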

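(iii)

\[ \|\hat D_{s,t} - D_{s,t}\| \to_p 0, \quad \|\hat E_t - E_t\| \to_p 0 \quad \text{for every } s \in \mathcal{T}_t \text{ and } t \in \mathbb{Z}, \]

setting

\[ \hat D_{s,t} \equiv \frac{1}{r_{fs}}\Big( -\Big(\frac{1_T}{T} \otimes I_{r_t}\Big)\hat\Omega^{-1}\Big(\hat F'\Big(\iota_{s+1} - \frac{1_T}{T}\Big)\Big) - \Big(\Big(\iota_{s+1} - \frac{1_T}{T}\Big) \otimes I_{r_t}\Big)\hat\Omega^{-1}\Big(\hat F'\frac{1_T}{T}\Big) \]
\[ \qquad + \big(M_{1_T}\hat F\hat\Omega^{-1}\hat F' \otimes \hat\Omega^{-1}\hat F'\big)\Big[\Big(\frac{1_T}{T} \otimes \Big(\iota_{s+1} - \frac{1_T}{T}\Big)\Big) + \Big(\Big(\iota_{s+1} - \frac{1_T}{T}\Big) \otimes \frac{1_T}{T}\Big)\Big]\Big), \]

\[ \hat E_t \equiv \Big(I_T \otimes \hat U_t^{*-1}\frac{\hat F'}{T}\Big)\hat U_{et}\Big(I_T \otimes \frac{\hat F}{T}\hat U_t^{*-1}\Big) + \big(\hat\sigma^2_tI_T \otimes \hat U_t^{*-1}\big) + \Big(\hat F\hat\Sigma_{\Lambda t}\hat F' \otimes \frac{\hat\sigma^2_t}{T}\hat U_t^{*-2}\Big) \]
\[ \qquad + \Big(\hat\sigma^2_t\frac{\hat F}{T}\hat U_t^{*-1} \otimes \hat F'\Big)K_{r_tT} + K_{Tr_t}\Big(\hat\sigma^2_t\hat U_t^{*-1}\frac{\hat F'}{T} \otimes \hat F\Big), \]

where $\hat\sigma^2_t$, $\hat\sigma_{4t}$, $\hat U^*_t$ are defined in Theorem 2, and $\hat U_{et}$ is defined in Appendix E.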

It is interesting to compare the limiting behaviour of the SDF estimator $\hat m_{s,s+1}$ as $N \to \infty$, established above, with its behaviour under double asymptotics, that is when both $N, T \to \infty$. It turns out that in the double asymptotics setting, $\hat m_{s,s+1}$ converges to the (ex-ante) SDF $m_{s,s+1}$ in Eq. (17), although at the slower rate $\delta_{NT} \equiv \min[\sqrt{T}, \sqrt{N}]$. Although techniques different from the ones developed in this paper have to be used, as the latent factors $F$, and their PCA estimates, become high-dimensional as $T$ diverges, it can be shown that $\delta_{NT}(\hat m_{s,s+1} - m_{s,s+1})$ is asymptotically normal and that asymptotically valid standard errors can be derived (see Dello Preite & Zaffaroni (2019) for details).

The time-varying risk premia result leads immediately to the limiting behaviour for (conditional) portfolio expected returns, yielding

\[ \hat\mu^a_{t-1} \equiv w^{a\prime}_{Nt-1}\hat\Lambda_{t-1}\hat\gamma \to_p \mu^a_{t-1} \equiv \mu^{a\prime}_{\Lambda t-1}H^{-1\prime}_{t-1}H'_{t-1}\gamma^P = \mu^{a\prime}_{\Lambda t-1}\gamma^P, \]

whenever $\hat\Lambda'_{t-1}w^a_{Nt-1} \to_p H^{-1}_{t-1}\mu^a_{\Lambda t-1}$. This condition easily follows under mild conditions on the weights $w^a_{Nt-1}$. The estimated conditional portfolio risk premium (i.e. the expected excess portfolio return) is asymptotically independent of the rotation matrix $H_{t-1}$, hence it is perfectly identified.

4.3 Inference with Mimicking Portfolios

Here we examine the limiting behaviour of the mimicking portfolios' excess returns, both in population and using their PCA-based estimator. It turns out that the population mimicking portfolio excess returns $F^{mp}_t = w^{mp\prime}_{Nt-1}x_{t\cdot}$ reproduce accurately the true (latent) risk factors $F_t$ as $N$ diverges. This result does not formally depend on whether $F_t$ is observed or not, demonstrating that the notion of mimicking portfolios is useful also in an APT framework, namely in the presence of pricing errors. However, precisely due to the presence of pricing errors, the mimicking portfolios require $N$ large in order to be effective. It follows that, in an APT framework, one cannot estimate the mimicking portfolios by means of the sample moments, as the $N \times N$ matrix $T^{-1}\sum_{s=1}^{T}x_{s\cdot}x'_{s\cdot}$ is singular for $N > T$ (its rank is at most $T$). However, a consistent estimator for $F^{mp}_t$, valid when $N \to \infty$ for every $t \in \mathbb{Z}$, can be derived, using our PCA procedure, yielding the mimicking portfolio estimator:

\[ \hat F^{mp}_t \equiv \big(\hat\gamma\hat\gamma' + \hat\Omega\big)\hat\Lambda'_{t-1}\Big((\hat\Lambda_{t-1}\hat\gamma)(\hat\Lambda_{t-1}\hat\gamma)' + \hat\Lambda_{t-1}\hat\Omega\hat\Lambda'_{t-1} + \hat\sigma^2_{t-1}I_N\Big)^{-1}x_{t\cdot} = \hat w^{mp\prime}_{Nt-1}x_{t\cdot} \quad \text{for every } t \in \mathbb{Z}. \tag{30} \]

Moreover, by means of some algebraic manipulations, one can show that the mimicking portfolio estimator $\hat F^{mp}_t$ satisfies:

\[ \hat F^{mp}_t = \big(\hat\sigma^2_{t-1}I_{r_{t-1}} + \hat\Lambda'_{t-1}\hat\Lambda_{t-1}\big)^{-1}\hat\Lambda'_{t-1}x_{t\cdot} \quad \text{for every } t \in \mathbb{Z}. \]

It follows that, although not identical to the PCA estimator, the mimicking portfolio estimator has the same limiting properties as $\hat F_{t,t}$ as $N$ diverges, and thus it will converge to a suitable rotation of the true (latent) risk factors, as enunciated below.


Theorem 6 (mimicking portfolios) As $N \to \infty$,

(i) Under Assumptions 1, 3 and $g_N(\Sigma_{e\,t-1}) \ge C > 0$ a.s.,

\[ \|F^{mp}_t - F_t\| = o_p(1) \quad \text{for every } t \in \mathbb{Z}, \]

(ii) Under Assumptions 1-10:

\[ \|\hat F^{mp}_t - \hat F_{t,t}\| = O_p(N^{-1}) \quad \text{for every } t \in \mathbb{Z}. \tag{31} \]

By part (i), the unfeasible (population) mimicking portfolio returns converge to the true (latent) factors. A rate of convergence can be derived by strengthening Assumption 2, in particular by specifying the rate of convergence of $N^{-1}\Lambda'_{t-1}\Sigma^{-1}_{e\,t-1}\Lambda_{t-1}$. The closeness between the mimicking portfolio estimator $\hat F^{mp}_t$ and the PCA estimator $\hat F_{t,t}$, stated in part (ii), implies that they have the same rate of convergence and asymptotic distribution, centered around $H'_{t-1}F_t$, as spelled out in Theorem 1. In fact

\[ \hat F^{mp}_t - H'_{t-1}F_t = (\hat F^{mp}_t - \hat F_{t,t}) + (\hat F_{t,t} - H'_{t-1}F_t) = O_p(N^{-1}) + (\hat F_{t,t} - H'_{t-1}F_t) = (\hat F_{t,t} - H'_{t-1}F_t) + o_p(1), \]

implying $\sqrt{N}(\hat F^{mp}_t - H'_{t-1}F_t) = O_p(N^{-1/2}) + \sqrt{N}(\hat F_{t,t} - H'_{t-1}F_t)$.
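The equivalence between (30) and the shorter ridge-type expression can be checked numerically. The sketch below is only an illustration under stated assumptions: the data are random placeholders, `sigma2` is an arbitrary positive value standing in for the idiosyncratic variance estimate, and the factors follow the PCA normalization $\hat F'\hat F/T = I_r$, under which $\hat\gamma\hat\gamma' + \hat\Omega = I_r$ so that the two formulas coincide by a standard matrix inversion identity.

```python
import numpy as np

rng = np.random.default_rng(1)
T, N, r = 6, 40, 2
X = rng.standard_normal((T, N))  # placeholder T x N return panel

# PCA estimates: F_hat = sqrt(T) * top-r eigenvectors of XX'/(NT), Lam_hat = X'F_hat/T.
eigvals, eigvecs = np.linalg.eigh(X @ X.T / (N * T))
F_hat = np.sqrt(T) * eigvecs[:, np.argsort(eigvals)[::-1][:r]]
Lam_hat = X.T @ F_hat / T

gamma_hat = F_hat.mean(axis=0)                               # average of the factors
Omega_hat = (F_hat - gamma_hat).T @ (F_hat - gamma_hat) / T  # factor covariance
sigma2 = 0.7                                                 # hypothetical variance value

x_t = X[-1]                                                  # returns at one date
S = np.outer(gamma_hat, gamma_hat) + Omega_hat               # equals I_r under PCA
mu = Lam_hat @ gamma_hat
cov = np.outer(mu, mu) + Lam_hat @ Omega_hat @ Lam_hat.T + sigma2 * np.eye(N)

f_mp_long = S @ Lam_hat.T @ np.linalg.solve(cov, x_t)        # expression (30)
f_mp_short = np.linalg.solve(sigma2 * np.eye(r) + Lam_hat.T @ Lam_hat,
                             Lam_hat.T @ x_t)                # ridge-type expression
print(np.max(np.abs(f_mp_long - f_mp_short)))
```

The short form only requires inverting an $r \times r$ matrix, which is convenient when $N$ is in the thousands.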

4.4 Inference in the Absence of the Risk Free Asset

We now describe how our results need to be modified when a risk free asset is not tradable. All our results still

apply, except that one now needs to estimate an additional quantity, namely the zero-beta rate, which in our

large-N environment requires some further details. Throughout this section xit denotes the gross return for

asset i, still assumed to satisfy (1). However, the conditional asset pricing restriction (2) is now replaced by

µit−1 ≡ Et−1(xit) = ait−1 + γ0t−1 + λ′it−1γt−1 a.s. for every i = 1, · · · , N and t ∈ Z, (32)

where γ0t−1 indicates the (unknown) zero-beta rate, implying

xit = ait−1 + γ0t−1 + λ′it−1Ft + eit for every i = 1, · · · , N and t ∈ Z, (33)

where as before Ft = ft + γt−1 − Et−1(ft). In vector form one obtains:

xt. = at−1 + γ0t−11N + Λt−1Ft + et. for every t ∈ Z. (34)


However, unlike the previous case when γ0t−1 = rft−1 + 1, one cannot apply PCA to the raw data X, even in

the static case γ0t−1 = γ0, because γ0 is the risk premium associated with the constant (unit) factor and its

presence is not asymptotically negligible.

Giglio & Xiu (2017), who focus on the static case, circumvent this difficulty by applying PCA to the time-series de-meaned data $x_{it} - \bar x_i$, and then estimate $\gamma_0$ in a second stage. This avenue is ruled out in our framework, because it induces serial correlation in the $e_{it}$, but one can still eliminate the influence of $\gamma_0$ by de-meaning with respect to the cross-sectional sample averages:

\[ x^*_{t\cdot} \equiv M_{1_N}x_{t\cdot} = M_{1_N}a_{t-1} + M_{1_N}\Lambda_{t-1}F_t + M_{1_N}e_{t\cdot} = a^*_{t-1} + \Lambda^*_{t-1}F_t + e^*_{t\cdot}, \]

as $\gamma_{0t-1}M_{1_N}1_N = 0$. In other words, one applies PCA to the $x^*_{it} = x_{it} - \bar x_t = x_{it} - (N^{-1}\sum_{j=1}^{N}x_{jt})$, where the same definition applies for the other starred quantities defined above. A crucial feature of such cross-sectional de-meaning is that it does not alter the degree of serial (temporal) dependence.

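Thus, application of PCA to $X^*$ yields

\[ \hat F^*(\mathcal{T}_t),\ \hat\Lambda^*(\mathcal{T}_t) = \arg\min_{F,\Lambda^*_{t-1}}\frac{1}{NT}\,\mathrm{trace}\Big((X^* - F\Lambda^{*\prime}_{t-1})(X^* - F\Lambda^{*\prime}_{t-1})'\Big), \]

where, as before, $\hat F^* = \hat F^*(\mathcal{T}_t)$ is given by the set of $r_{t-1}$ eigenvectors, multiplied by $\sqrt{T}$, corresponding to the largest $r_{t-1}$ eigenvalues of $X^*X^{*\prime}/NT$, and correspondingly $\hat\Lambda^*(\mathcal{T}_t) = \hat\Lambda^*_{t-1} = X^{*\prime}\hat F^*(\hat F^{*\prime}\hat F^*)^{-1} = X^{*\prime}\hat F^*/T$. Notice that the true risk factors $F$ are not affected by pre-multiplication with the matrix $M_{1_N}$, but we denote the PCA estimator as $\hat F^*$ to emphasize that these are obtained from the cross-sectionally de-meaned data.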
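A small numerical sketch may make the cross-sectional de-meaning concrete. The helper names `cross_sectional_demean` and `pca_factors`, and the simulated gross returns, are illustrative assumptions. The example shows two facts stated in the text: adding any time-varying constant (a zero-beta rate path) to all assets leaves the de-meaned panel unchanged, and the resulting loadings sum to zero across assets.

```python
import numpy as np

def cross_sectional_demean(X):
    """Subtract, at each date (row), the average return across assets (columns)."""
    return X - X.mean(axis=1, keepdims=True)

def pca_factors(X, r):
    """PCA with the normalisation F'F/T = I_r; returns (T x r factors, N x r loadings)."""
    T, N = X.shape
    eigvals, eigvecs = np.linalg.eigh(X @ X.T / (N * T))
    F = np.sqrt(T) * eigvecs[:, np.argsort(eigvals)[::-1][:r]]
    return F, X.T @ F / T

rng = np.random.default_rng(3)
X = 1.0 + 0.05 * rng.standard_normal((6, 30))   # hypothetical gross returns, T = 6, N = 30
gamma0_path = np.linspace(0.0, 0.02, 6)         # hypothetical time-varying zero-beta shift
X_shifted = X + gamma0_path[:, None]

X_star = cross_sectional_demean(X_shifted)
F_star, Lam_star = pca_factors(X_star, r=2)
print(np.abs(Lam_star.sum(axis=0)).max())
```

Because the de-meaning acts date by date, it involves no averaging over time, which is why it leaves the serial dependence of the idiosyncratic terms untouched.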

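It turns out that all our results apply once $\lambda_{it-1}$ and $\Sigma_{\Lambda t-1}$ are replaced by $\lambda^*_{it-1}$ and $\Sigma_{\Lambda^* t-1}$ throughout our assumptions. Notice that, by construction, $\sum_{i=1}^{N}\lambda^*_{it-1} = 0_{r_{t-1}}$ and that $\Sigma_{\Lambda^* t-1}$ is the covariance matrix of the $\lambda_{it-1}$ (and not the second moment matrix as before), given that:

\[ \frac{\Lambda^{*\prime}_{t-1}\Lambda^*_{t-1}}{N} = \frac{\Lambda'_{t-1}M_{1_N}\Lambda_{t-1}}{N} = \frac{\Lambda'_{t-1}\Lambda_{t-1}}{N} - \mu_{\Lambda t-1}\mu'_{\Lambda t-1} \to_p \Sigma_{\Lambda^* t-1} > 0, \]

setting $\mu_{\Lambda t-1} \equiv N^{-1}\Lambda'_{t-1}1_N$.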

As the true risk factors are not affected by the (cross-sectional) de-meaning, their estimator and thus the

estimators for the risk premia and the SDF, will not be affected either, except of course for the rotation that

will necessarily differ from the case when the risk free asset is tradable. However, estimation of portfolios risk

premia will be affected by the de-meaning as one cannot recover the (weighted) first-moment of the loadings


from the $\hat\Lambda^*_{t-1}$, as these are now centered around zero. Moreover, one still needs to estimate the zero-beta rates $\gamma_{0t-1}$. For the static case, Giglio & Xiu (2017) derive the asymptotics for the estimator

\[ \hat\gamma^{GX}_0 \equiv \big(1'_NM_{\hat\Lambda^*}1_N\big)^{-1}1'_NM_{\hat\Lambda^*}\bar x_N. \]

However, in our setting, $\hat\gamma^{GX}_0 \to_p \gamma_0 + \mu'_\Lambda\gamma^P$ as $N \to \infty$, because $1'_N\hat\Lambda^* = 0'_r$, implying that $\hat\gamma^{GX}_0$ is not consistent for $\gamma_0$.24 Further complications arise when the zero-beta rate is time varying, as in our framework.
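The limit of the Giglio & Xiu estimator can be traced in two steps. The derivation below is a sketch for the static case, net of the rotation; the symbols $\bar a_N$ and $\bar e_N$ (time averages of the pricing errors and of the idiosyncratic terms) are introduced here for illustration, and it is assumed that their cross-sectional averages vanish as $N$ diverges.

```latex
% Step 1: the loadings from cross-sectionally de-meaned data are orthogonal to 1_N.
M_{\hat\Lambda^*} \equiv I_N - \hat\Lambda^*(\hat\Lambda^{*\prime}\hat\Lambda^*)^{-1}\hat\Lambda^{*\prime},
\qquad 1'_N\hat\Lambda^* = 0'_r
\;\Longrightarrow\; M_{\hat\Lambda^*}1_N = 1_N ,
\qquad
\hat\gamma^{GX}_0 = (1'_N 1_N)^{-1} 1'_N \bar x_N = \frac{1}{N}\sum_{i=1}^{N}\bar x_i .

% Step 2: time-average (34) in the static case and use \bar F = \gamma^P.
\bar x_N = \bar a_N + \gamma_0 1_N + \Lambda \bar F + \bar e_N
\;\Longrightarrow\;
\frac{1}{N} 1'_N \bar x_N
 = \frac{1'_N \bar a_N}{N} + \gamma_0 + \mu'_\Lambda \gamma^P + \frac{1'_N \bar e_N}{N}
 \;\to_p\; \gamma_0 + \mu'_\Lambda \gamma^P .
```

In words, once the loadings are centered around zero, the estimator collapses to the grand average of returns, which mixes the zero-beta rate with the average loading times the risk premia.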

Instead, one can construct a consistent estimator for both $\gamma_{0t-1}$ and (a rotation of) $\mu_{\Lambda t-1}$ jointly, as follows. By taking the cross-sectional average of both sides in (34), and stacking across $t$, one obtains

\[ \bar x_T = \bar a_T + \gamma_0 + F\mu_{\Lambda t-1} + \bar e_T = \bar a_T + M_{1_T}\gamma_0 + D\begin{pmatrix}\bar\gamma_0 \\ \mu_{\Lambda t-1}\end{pmatrix} + \bar e_T, \tag{35} \]

setting $\bar x_T \equiv X1_N/N$, $\bar e_T \equiv e1_N/N$, $\bar a_T \equiv (a'_01_N/N, \cdots, a'_{T-1}1_N/N)'$, $\gamma_0 \equiv (\gamma_{00}, \cdots, \gamma_{0T-1})'$, $\bar\gamma_0 \equiv \gamma'_01_T/T$ and $D \equiv (1_T, F)$. Representation (35) naturally suggests the two-pass OLS estimator for $(\bar\gamma_0, \mu'_{\Lambda t-1})'$:

\[ \begin{pmatrix}\hat\gamma_0 \\ \hat\mu_\Lambda\end{pmatrix} \equiv (\hat D'\hat D)^{-1}\hat D'\bar x_T, \tag{36} \]

based on the (first-pass) estimator of the risk factors $\hat F^*$, where we set $\hat D \equiv (1_T, \hat F^*)$. Noticeably, the two-pass estimator $(\hat\gamma_0, \hat\mu'_\Lambda)'$ is exactly the mirror image of the traditional two-pass estimator. Whereas the latter is based on a single cross-sectional regression of $N$ time-series averaged returns on the estimated loadings, evaluated as $T$ diverges (see Black et al. (1972), Fama & MacBeth (1973) and Shanken (1992), among others), our two-pass estimator is obtained from a single time-series regression of $T$ cross-sectional averaged returns on the estimated factors, evaluated as $N$ diverges.
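The time-series regression in (36) can be sketched in a few lines. The function name `zero_beta_two_pass` is hypothetical, and the factors passed to it stand in for the first-pass PCA estimates; the sketch simply regresses the $T$ cross-sectional average returns on a constant and the factors by OLS.

```python
import numpy as np

def zero_beta_two_pass(X, F_star):
    """OLS of cross-sectional average returns on (1_T, F_star).

    X is the T x N panel of gross returns, F_star the T x r factor estimates.
    Returns (gamma0_hat, mu_Lambda_hat)."""
    T = X.shape[0]
    x_bar = X.mean(axis=1)                      # T-vector of cross-sectional averages
    D = np.column_stack([np.ones(T), F_star])   # T x (r + 1) regressor matrix
    coef, *_ = np.linalg.lstsq(D, x_bar, rcond=None)
    return coef[0], coef[1:]
```

The regression has only $T$ observations and $r + 1$ coefficients, so it requires $T > r$; its precision comes from the cross-sectional averaging over $N$ assets rather than from a long time series.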

The estimators $(\hat\gamma_0, \hat\mu'_\Lambda)'$ are clearly meaningful if both $\bar a_T$ and $M_{1_T}\gamma_0$ are (asymptotically) negligible as $N$ diverges. Indeed, these conditions hold as a consequence of no-arbitrage (for the vector of pricing errors $\bar a_T$) and by suitable smoothing assumptions (for the $\gamma_0$), as specified below. The next theorem provides an inferential procedure for $(\hat\gamma_0, \hat\mu'_\Lambda)'$.

Theorem 7 (rate of convergence, limiting distribution, consistent estimation of the asymptotic

covariance matrix of γ0 and µΛ)

Under Assumptions 1-10, as N →∞,

24 Giglio & Xiu (2017) do not encounter this problem because, as they consider a double asymptotic setting, they apply PCA to the time-series de-meaned data as already discussed above, for which their estimator $\hat\gamma^{GX}_0$ is valid.


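(i)

\[ \begin{pmatrix}\hat\gamma_0 \\ \hat\mu_\Lambda\end{pmatrix} - \begin{pmatrix}\bar\gamma_0 \\ H^{-1}_{t-1}\mu_{\Lambda t-1}\end{pmatrix} = \begin{pmatrix}\bar{\hat F}^{*\prime} \\ I_{r_{t-1}}\end{pmatrix}\hat\Omega^{*-1}\,\widehat{\mathrm{Cov}}(\hat F^{*\prime}, \gamma_0) + O_p(N^{-1/2}), \tag{37} \]

setting

\[ \widehat{\mathrm{Cov}}(\hat F^{*\prime}, \gamma_0) \equiv \frac{1}{T}\hat F^{*\prime}M_{1_T}\gamma_0, \tag{38} \]

and $\bar{\hat F}^* \equiv T^{-1}\hat F^{*\prime}1_T$, $\hat\Omega^* \equiv T^{-1}\hat F^{*\prime}M_{1_T}\hat F^*$.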

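When, in addition, $\mu_{\Lambda t-1}$ converges in probability to a limit, $\bar a_T = o(N^{-1/2})$ and $\gamma_{0s} = \gamma_{0t-1}$ for every $s \in \mathcal{T}_t$ and $t \in \mathbb{Z}$:

(ii)

\[ \begin{pmatrix}\hat\gamma_0 \\ \hat\mu_\Lambda\end{pmatrix} - \begin{pmatrix}\gamma_{0t-1} \\ H^{-1}_{t-1}\mu_{\Lambda t-1}\end{pmatrix} = O_p(N^{-1/2}). \tag{39} \]

(iii)

\[ \sqrt{N}\left(\begin{pmatrix}\hat\gamma_0 \\ \hat\mu_\Lambda\end{pmatrix} - \begin{pmatrix}\gamma_{0t-1} \\ H^{-1}_{t-1}\mu_{\Lambda t-1}\end{pmatrix}\right) \to_d N\big(0_{r_{t-1}+1},\, G'_{t-1}L_{t-1}G_{t-1}\big), \tag{40} \]

setting

\[ G_t \equiv \frac{1}{T}\begin{pmatrix}
(1_T, FH_t)\begin{pmatrix}1 & \gamma^{P\prime}H_t \\ H'_t\gamma^P & I_{r_t}\end{pmatrix}^{-1} \\[1ex]
-\left((1_T, FH_t)\begin{pmatrix}1 & \gamma^{P\prime}H_t \\ H'_t\gamma^P & I_{r_t}\end{pmatrix}^{-1} \otimes H_t^{-1}\mu_{\Lambda t}\right)
\end{pmatrix},
\qquad
L_t \equiv \begin{pmatrix}L_{11t} & L'_{21t} \\ L_{21t} & L_{22t}\end{pmatrix}, \]

with

\[ L_{11t} \equiv \sigma^2_tI_T, \qquad L_{21t} \equiv \sigma^2_t\big(I_T \otimes U_t^{-1}H_t^{-1}\mu_{\Lambda t}\big) + \sigma^2_t\Big(F\mu_{\Lambda t} \otimes U_t^{-1}\frac{H'_tF'}{T}\Big), \]

\[ L_{22t} \equiv \Big(I_T \otimes U_t^{-1}\frac{H'_tF'}{T}\Big)U_{et}\Big(I_T \otimes \frac{FH_t}{T}U_t^{-1}\Big) + \big(\sigma^2_tI_T \otimes U_t^{-1}\big) + \Big(F\Sigma_{\Lambda^* t}F' \otimes \frac{\sigma^2_t}{T}U_t^{-2}\Big) \]
\[ \qquad + \Big(\sigma^2_t\frac{FH_t}{T}U_t^{-1} \otimes H'_tF'\Big)K_{r_tT} + K_{Tr_t}\Big(\sigma^2_tU_t^{-1}\frac{H'_tF'}{T} \otimes FH_t\Big). \]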

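(iv) When, in addition, $\kappa_{4t} = 0$,

\[ \|\hat G_t - G_t\| \to_p 0, \quad \|\hat L_t - L_t\| \to_p 0, \]

setting

\[ \hat G_t \equiv \frac{1}{T}\begin{pmatrix}
(1_T, \hat F^*)\begin{pmatrix}1 & \hat\gamma' \\ \hat\gamma & I_{r_t}\end{pmatrix}^{-1} \\[1ex]
-\left((1_T, \hat F^*)\begin{pmatrix}1 & \hat\gamma' \\ \hat\gamma & I_{r_t}\end{pmatrix}^{-1} \otimes \hat\mu_\Lambda\right)
\end{pmatrix},
\qquad
\hat L_t \equiv \begin{pmatrix}\hat L_{11t} & \hat L'_{21t} \\ \hat L_{21t} & \hat L_{22t}\end{pmatrix}, \]

with

\[ \hat L_{11t} \equiv \hat\sigma^2_tI_T, \qquad \hat L_{21t} \equiv \hat\sigma^2_t\big(I_T \otimes \hat U_t^{*-1}\hat\mu_\Lambda\big) + \hat\sigma^2_t\Big(\hat F^*\hat\mu_\Lambda \otimes \hat U_t^{*-1}\frac{\hat F^{*\prime}}{T}\Big), \]

\[ \hat L_{22t} \equiv \Big(I_T \otimes \hat U_t^{*-1}\frac{\hat F^{*\prime}}{T}\Big)\hat U_{et}\Big(I_T \otimes \frac{\hat F^*}{T}\hat U_t^{*-1}\Big) + \big(\hat\sigma^2_tI_T \otimes \hat U_t^{*-1}\big) + \Big(\hat F^*\hat\Sigma_{\Lambda^* t}\hat F^{*\prime} \otimes \frac{\hat\sigma^2_t}{T}\hat U_t^{*-2}\Big) \]
\[ \qquad + \Big(\hat\sigma^2_t\frac{\hat F^*}{T}\hat U_t^{*-1} \otimes \hat F^{*\prime}\Big)K_{r_tT} + K_{Tr_t}\Big(\hat\sigma^2_t\hat U_t^{*-1}\frac{\hat F^{*\prime}}{T} \otimes \hat F^*\Big), \]

where $\hat\sigma^2_t$, $\hat\sigma_{4t}$, $\hat U^*_t$, $\hat\Sigma_{\Lambda^* t}$ are defined in Theorem 2, replacing $\hat F$, $\hat\Lambda_t$ with $\hat F^*$, $\hat\Lambda^*_t$, $K_{ab}$ is defined in Theorem 5 and $\hat U_{et}$ is defined in Appendix E.

(v) When, in addition, $\mu_{\Lambda t-1}$ differs from its probability limit by $o_p(N^{-1/2})$, then (i)-(ii)-(iii) also apply with $\mu_{\Lambda t-1}$ replaced by that limit.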

Although our focus is on the large-N finite-T case, by analyzing the asymptotic covariance matrix it follows that our estimators for $\gamma_{0t-1}$ and $H^{-1}_{t-1}\mu_{\Lambda t-1}$ are, in fact, converging at the fast rate $O_p((NT)^{-1/2})$ if one allows both N and T to diverge. This rate resembles the rate that Gagliardini et al. (2016) obtain for their estimator

of the (nonlinear component of the) risk premia associated with the risk factors. The analysis of the zero-beta

rate is excluded from their analysis but we conjecture that the same rate as ours, when the factors are observed,

applies.

Our result shows that, when the risk-free rate is not sufficiently smooth, our risk premia estimator would

quantify the (local) average of the time-varying zero beta rates, namely γ0, although a bias emerges, that

depends on the sample covariance between the risk factors and the time-varying zero-beta rates. A similar

bias affects our estimator of the loadings’ first moment µΛt−1 . However, as T can be arbitrarily small, the

smoothness assumption on the γ0t−1 is extremely mild in practice.

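In the absence of the risk free asset, the population ex-post SDF becomes:

\[ m^P_{t,t+1} = \frac{1}{\gamma_{0t}} - \frac{1}{\gamma_{0t}}\gamma^{P\prime}\Omega^{-1}(F_{t+1} - \gamma^P) \quad \text{for every } t \in \mathbb{Z}, \tag{41} \]

which satisfies the pricing conditions $E_t(m^P_{t,t+1}x_{it+1}) = 1 + O(T^{-1})$, again with a (pricing) error of order $O(T^{-1})$ by an easy extension of Proposition 2. Our PCA-based estimator for $m^P_{t,t+1}$ is then:

\[ \hat m_{s,s+1} = \frac{1}{\hat\gamma_0} - \frac{1}{\hat\gamma_0}\hat\gamma^{*\prime}\hat\Omega^{*-1}(\hat F^*_{s+1,t} - \hat\gamma^*) \quad \text{for every } s \in \mathcal{T}_t \text{ and } t \in \mathbb{Z}, \]

setting $\hat\gamma^* = \hat F^{*\prime}1_T/T$, and the analogue of Theorem 5 applies.25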
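The structure of the estimator can be illustrated with a short sketch. The function name `sdf_no_riskfree` is hypothetical, the factor matrix and the zero-beta value are arbitrary inputs, and the dating is simplified so that each output value is aligned with the date of the factor it uses. The sketch also shows two in-sample properties that follow from the formula: the SDF averages to $1/\hat\gamma_0$, and the sample average of the SDF times the factors is zero.

```python
import numpy as np

def sdf_no_riskfree(F, gamma0):
    """m = 1/gamma0 - (1/gamma0) * gamma' Omega^{-1} (F_s - gamma), one value per row of F.

    gamma is the sample mean of the factors and Omega their sample covariance
    (divisor T); requires more dates than factors so that Omega is invertible."""
    T = F.shape[0]
    gamma = F.mean(axis=0)
    dev = F - gamma
    Omega = dev.T @ dev / T
    return (1.0 - dev @ np.linalg.solve(Omega, gamma)) / gamma0

rng = np.random.default_rng(6)
F_demo = 0.3 + rng.standard_normal((10, 2))   # hypothetical factor realisations
m = sdf_no_riskfree(F_demo, gamma0=1.02)
print(m.mean(), (m[:, None] * F_demo).mean(axis=0))
```

Together, these two properties imply that any return of the form $\gamma_0 + \lambda'F_s$ has a sample average of SDF times return equal to one.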

25Details are available upon request.


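Having obtained a consistent estimator for $H^{-1}_{t-1}\mu_{\Lambda t-1}$, we can re-construct the estimator for the loadings as $\hat\Lambda^\dagger_{t-1} \equiv \hat\Lambda^*_{t-1} + 1_N\hat\mu'_{\Lambda t-1}$. In turn, this allows us to estimate consistently the expected portfolio returns corresponding to any generic portfolio weights $w^a_N$, namely

\[ \hat\mu^a_{t-1} \equiv w^{a\prime}_{Nt-1}\hat\Lambda^\dagger_{t-1}\hat\gamma \to_p \mu^a_{t-1} \equiv \mu^{a\prime}_{\Lambda t-1}H^{-1\prime}_{t-1}H'_{t-1}\gamma^P = \mu^{a\prime}_{\Lambda t-1}\gamma^P, \]

under mild assumptions on the portfolio weights $w^a_{Nt-1}$ such as $\hat\Lambda^{\dagger\prime}_{t-1}w^a_{Nt-1} \to_p H^{-1}_{t-1}\mu^a_{\Lambda t-1}$.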

5 Monte Carlo Experiments

We illustrate the finite-sample performance of the PCA estimators for the number of factors and for the factors

themselves when N is allowed to diverge but T is fixed and, moreover, assumed to be very small. We compare

the results from using our methodology with the ones that are valid when N and T both diverge. Given the

scope of the Monte Carlo exercise, we focus on the static model. We consider cases when T is very small, such

as T = 2, 5 and also the cases when T = 50, 100. We combine these values of T with N ranging from 10 up to 3,000. Data are generated according to model (1), imposing zero pricing errors,

xit = λ′iFt + eit, i = 1, · · · , N, t = 1, · · · , T,

where the common factors Ft are assumed iid (independent and identically distributed) across t and N(0, Ir), corresponding to a given number of factors r, the loadings λi are assumed iid across i and N(0, Ir) and the idiosyncratic components eit are assumed iid standard normal across both i and t, unless we say otherwise. Moreover Ft, λi and ejs are assumed mutually independent for every i, j, t, s.26 All the results reported below are based on 1,000 Monte Carlo iterations where the common factors Ft are not re-sampled at every iteration but only sampled once at the outset of every Monte Carlo exercise.
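The design just described can be sketched as follows for the one-factor case. This is an illustrative reduction, not the code behind the reported tables: the helper names, the number of replications (200 rather than 1,000), the sample sizes and the particular factor path (held fixed across replications, as in the text) are all assumptions made for the example.

```python
import numpy as np

def simulate_panel(F, N, rng):
    """x_it = lambda_i' F_t + e_it with N(0, I_r) loadings and standard normal noise."""
    T, r = F.shape
    lam = rng.standard_normal((N, r))
    e = rng.standard_normal((T, N))
    return F @ lam.T + e, lam

def pca_one_factor(X):
    """First principal component, normalised so that F_hat'F_hat/T = 1."""
    T, N = X.shape
    eigvals, eigvecs = np.linalg.eigh(X @ X.T / (N * T))
    F_hat = np.sqrt(T) * eigvecs[:, [-1]]   # eigh sorts ascending: last column is the top one
    return F_hat, X.T @ F_hat / T

rng = np.random.default_rng(2)
F = np.array([[1.5], [-1.0], [0.5], [-2.0], [1.0]])  # hypothetical factor path, T = 5
corrs = []
for _ in range(200):
    X, lam = simulate_panel(F, 1000, rng)
    F_hat, _ = pca_one_factor(X)
    # The sign of an eigenvector is arbitrary, so the absolute correlation is recorded.
    corrs.append(abs(np.corrcoef(F[:, 0], F_hat[:, 0])[0, 1]))
mean_corr = float(np.mean(corrs))
print(mean_corr)
```

Only the loadings and the idiosyncratic terms are redrawn in each replication, which mirrors the conditioning on the factor realizations that underlies the fixed-T analysis.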

5.1 Estimation of the Number of Factors

Our estimator is based on the criterion $PC(k) = V^*(k) + k\,g(N)$ where $V^*(k) = \left(\frac{T}{T-k}\right)V(k)$ and the penalization function is constructed as

\[ g(N) = \frac{(\log(N))^{\varepsilon_1}N^{\varepsilon_2}}{\sqrt{N}} \quad \text{for } \varepsilon_1 \ge 0 \text{ and } 0 < \varepsilon_2 < 1/2, \]

which only depends on N.27 We compare our results with some of the Bai & Ng (2002)[Section 5] criteria, asymptotically valid when N and T both diverge, namely the PCp and ICp criteria. We also consider, as a comparison, the AICp and BICp criteria, which Bai & Ng (2002) showed not to be always asymptotically valid. We consider the cases when the true number of latent factors equals r = 1, 3.

26 Results for the case of cross-sectionally dependent ejs are available upon request.

27 Variations of our criterion were tried. A similarly good performance has also been obtained for V(k) + kg(N).

Tables I and II focus on the case r = 1 and show how our criterion estimates r very accurately, especially for larger values of ε2: these values imply that the penalization decreases very slowly, keeping the estimated k below 2 in most cases. In particular, Table I shows how our method outperforms the alternatives, for all values of N, when T is extremely small, namely T = 2 (top part of the table) and T = 3 (bottom part of the table). In contrast, the large-N and large-T criteria PCp1 and ICp1 work well when both N and T are large enough, as does the BIC1 criterion. The AIC1 criterion, instead, always overstates the true number of factors. These features have already been documented in Bai & Ng (2002).

TABLE I HERE

TABLE II HERE

Table III considers the case r = 3 and, again, our criterion appears useful for very small values of T, whereas the PCp1, ICp1 and BIC1 criteria work well for larger values of T, and the AIC1 criterion always overstates the number of factors.

TABLE III HERE

5.2 Estimation of the Asymptotic Distribution of Factors

In this section we always assume r = 1 for simplicity.28 We follow Bai (2003)[Section 6] in terms of the various analyses developed to demonstrate the accuracy of the asymptotic distribution of the PCA estimated factors, illustrated in our Theorems 1 and 2 respectively. Table IV reports the correlation, averaged across Monte Carlo iterations, between true and estimated factors for T = 2, 5, 50 and N = 10, 100, 1000, 3000. We also report the correlation corresponding to the loadings, as a comparison, in Table V. The results show that the factors are accurately estimated, even when T is very small, once N is sufficiently large. In contrast, the loadings require a sufficiently large T in order to obtain a similar accuracy, appearing poorly estimated when T = 2, 5 and, instead, well estimated when T = 50. Moreover, the quality of the estimates for the loadings does not change very much as one varies N from 10 to 3000. We also report in Figures I and II the histograms of the estimated correlations for the factors and the loadings, respectively, when T = 10, N = 100 and when T = 100, N = 100. Whereas the correlations are all concentrated around unity for the factors, they are instead concentrated around 0.95 for the loadings, markedly away from unity, when T is much smaller than N. Instead, the histograms have the same shape when T and N are both equal to 100, as indicated in Figure II.

28 Further results are available upon request.

TABLE IV and V HERE

FIGURE I HERE

FIGURE II HERE

We then consider the histograms of the factors' estimates, across the Monte Carlo iterations. In particular we report the histograms associated with the studentized PCA estimates, satisfying by Theorem 1, for every $1 \le s \le T$,

\[ f^{small-T}_s \equiv (A'B_sA)^{-1/2}N^{1/2}(\hat F_s - H'F_s) \to_d N(0_r, I_r) \quad \text{as } N \to \infty, \]

where $A$ and $B_s$ are defined in Theorem 2, setting $B_s = B_{s,t}$ in view of the static formulation adopted here. We only display the case t = 1 for simplicity in all the figures. For comparison, we also display the histograms corresponding to the studentized PCA estimates but using the (estimated) asymptotic covariance matrix that is valid when both N and T diverge, namely $\hat\sigma^2\hat U^{*-1}$. The latter formula can be easily obtained by evaluating the limit of $B_s$ as $T \to \infty$, yielding

\[ f^{large-T}_s \equiv (\hat\sigma^2\hat U^{*-1})^{-1/2}N^{1/2}(\hat F_s - H'F_s), \]

where we expect tails larger than the standard normal as the incorrect standard errors, used for $f^{large-T}_s$, are always smaller than the asymptotically correct ones, used for $f^{small-T}_s$. In the last two graphs we also report the histogram corresponding to the studentized PCA estimates based on the (estimated) asymptotic covariance matrix that is valid when both N and T diverge and robust to heteroskedasticity (see Bai (2003)[Eq.(7), Section 5]).

The results are illustrated in Figures III to X. The finite-sample distribution of the $f^{small-T}_s$ is remarkably close to the standard normal density when N is large, regardless of T, for instance even when T = 2. In more detail, we always report the histogram for the $f^{small-T}_s$ in green and for the $f^{large-T}_s$ in blue. Figure III considers the case T = 2, N = 10: although N is very moderate, the empirical distribution of the $f^{small-T}_s$ does not appear too far from the shape of the standard normal, although it is wrongly centered around a negative value. Instead, the distribution of the $f^{large-T}_s$ appears much more spread out than the standard normal, the result of adopting excessively small standard errors. Figure IV still considers the very small T = 2 case but now with N = 1,000: now the factors appear extremely well estimated, and the empirical distribution of the $f^{small-T}_s$ is very close to the standard normal whereas, again, the $f^{large-T}_s$ exhibit much fatter tails.


FIGURE III HERE

FIGURE IV HERE

As expected, as one increases T, the discrepancy between the distributions of the $f^{small-T}_s$ and $f^{large-T}_s$ diminishes, as is evident from Figures V and VI (T = 5) and Figures VII and VIII (T = 50). In particular, when T = 50 there is almost no difference between the two histograms, which are in turn very close to the standard normal density.

FIGURE V HERE

FIGURE VI HERE

FIGURE VII HERE

FIGURE VIII HERE

Finally, in Figures IX and X we report, together with the histograms of $f^{small-T}_s$ and $f^{large-T}_s$, also the histograms of the (studentized) factor estimates obtained using the robust standard errors (in red) for the cases T = 50, N = 1,000 and T = 100, N = 1,000, respectively. In the latter case, the three histograms are essentially identical, and close to the standard normal density, whereas even for T = 50 the robust standard errors are still underestimating the true asymptotic covariance matrix.29

FIGURE IX HERE

FIGURE X HERE

Next, we evaluate the 95% (asymptotic) confidence intervals for the true factors $F_t$. Again we consider the average intervals across the 1,000 Monte Carlo iterations. We follow Bai (2003) and construct the confidence intervals as:

\[ \Big(\hat\beta\hat F_s - 1.96\,\frac{\hat\beta}{N^{1/2}}(A'B_sA)^{1/2},\ \ \hat\beta\hat F_s + 1.96\,\frac{\hat\beta}{N^{1/2}}(A'B_sA)^{1/2}\Big), \quad s = 1, \cdots, T, \]

where $\hat\beta \equiv (F'F)^{-1}F'\hat F$, that is the OLS estimator from projecting the estimated factors $\hat F$ on the true factors $F$ without intercept. This provides an accurate estimation of the (inverse of the) rotation matrix $H$ of Theorem 1. Figures XI and XII report the time series of these confidence intervals for the cases T = 5, N = 10 and T = 5, N = 1,000 respectively.

FIGURE XI HERE

29 We do not report the values corresponding to the robust standard errors for the other combinations of N and T because the corresponding histograms are too far off from the ones of $f^{small-T}_s$ and $f^{large-T}_s$.


FIGURE XII HERE

The confidence intervals corresponding to T = 5, N = 10 are sufficiently accurate to include the true factor's realization, and the interval becomes much narrower as N increases, essentially nailing down the time series of the true factor. Note that we set T = 5 in both figures.

6 Empirical Application

In this section we assess the performance of our method using an (unbalanced) panel of monthly individual assets

traded in the NYSE from January 1960 to December 2013 (source CRSP). Our method appears ideal to address

important empirical questions related to the possible time-variation characterizing the factor asset pricing

model. Our empirical results are all based on adopting a short rolling window of T = 12 time observations. We

have repeated all the exercises for T = 24 and T = 36 without noticeable differences in the empirical results.

6.1 Time-Variation of Number of Factors

We investigate the extent to which the number of risk factors varies over time over long horizons, for
example whether it has increased along with the increased sophistication of financial markets. Moreover,
we investigate whether the number of factors increases rapidly over economic expansions and financial market
booms and decreases otherwise, thus characterizing the time-variation over short horizons.

Figure XIII reports the estimated number of factors from January 1960 to January 2013 using rolling
windows of T = 12 months, leading to a number of individual assets varying from 800 to 5,000 across the
rolling samples.
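To make the rolling-window exercise concrete, the sketch below estimates the number of factors window by window. It is only an illustration: it uses an eigenvalue-ratio rule in the spirit of Ahn & Horenstein (2013) as a stand-in, which is an assumption and not necessarily the selection criterion adopted in this paper, and the function names and the `k_max` bound are hypothetical.

```python
import numpy as np


def eigenvalue_ratio_k(X, k_max):
    """Pick k maximizing the ratio of adjacent eigenvalues of X X' / N.

    X is a (T, N) window of excess returns with N much larger than T;
    k_max must be smaller than T.
    """
    n_assets = X.shape[1]
    eig = np.linalg.eigvalsh(X @ X.T / n_assets)[::-1]  # descending order
    ratios = eig[:k_max] / eig[1:k_max + 1]
    return int(np.argmax(ratios)) + 1


def rolling_number_of_factors(returns, window=12, k_max=6):
    """Apply the rule to every rolling window of `window` periods.

    returns is (n_periods, N); assets with a NaN inside a window are
    dropped from that window, mimicking an unbalanced panel.
    """
    counts = []
    for end in range(window, returns.shape[0] + 1):
        block = returns[end - window:end]
        block = block[:, ~np.isnan(block).any(axis=0)]
        counts.append(eigenvalue_ratio_k(block, k_max))
    return counts
```

Working with the T × T matrix rather than the N × N one keeps each window cheap to process even when N is in the thousands.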

FIGURE XIII HERE

The results clearly show that the number of risk factors was small, between one and two, until the beginning
of the 1980s, possibly a symptom of under-developed financial markets and of financial segmentation. From
then on, the number steadily increased until the dot-com bubble of 2000, reaching 10 risk factors,
then decreased sharply to two risk factors as the financial crisis unfolded. More generally, from January
2000 onward, the estimated number of factors tracks the dynamics of the financial market remarkably well,
increasing during periods of booms and decreasing during crashes: the correlation between the two series is
above 80% between March 2000 and June 2009, with the number of factors ranging from two to 10. Our method


corroborates the observation that the correlation across financial markets increases dramatically
during crashes, as one obtains when the number of risk factors reduces to one or two. The correlation across
the whole period (January 1996 until December 2013) is 58%.

Figure XIV shows the same time series of the estimated number of factors, together with the NBER business cycle
indicator and various other macroeconomic and financial crisis indicators: it is, again, remarkable how the
estimated number of factors co-varies negatively with almost every negative macroeconomic event.

FIGURE XIV HERE

Our finding might explain the disagreement emerging in the empirical asset pricing literature on the correct
number of risk factors for the universe of US equity returns, ranging from the small number of Fama and
French factors (three or five) to the recent advances that suggest up to 10 risk factors. Once time-variation is
allowed for, assuming a fixed number of risk factors appears incorrect. Moreover,
our maximum estimated number of factors appears aligned with previous findings.

6.2 Identifying Risk Factors

We describe the dynamic behaviour of the time series of the five dominant estimated factors and, exploiting

the time variation of our approach, show how one can identify the extent to which the estimated latent risk

factors are related to some of the observed factors proposed in the empirical asset pricing literature. We focus

here on the five observed factors associated with the Fama & French (2015) asset pricing model.

Figure XV reports the first five estimated factors using rolling windows of size T = 12, where the number

of stocks varies from 800 to 5,000 across the rolling samples.

Table VI reports the correlation, across the full sample, between the five estimated factors and the observed

factors of the Fama & French (2015) asset pricing model. It emerges that the first factor has the highest

correlation with the market excess return (mkt), the second factor with the profitability return (rmw) and

with the high-minus-low return (hml), and the third factor with the small-minus-large return (smb) and, in

part, with the investment return (cma). The fourth and fifth factors appear mildly correlated with all the five

observed returns. It is interesting to compare these correlations with the ones obtained during the sub-prime
financial crisis (identified as Aug 2007 to Feb 2009), reported in Table VII: the first factor is even more
prominently related to mkt, the second factor to hml, the third factor to both smb and hml, the fourth factor
especially to cma and, to a lesser extent, to hml and rmw, and, finally, the fifth factor to cma.


Therefore, the links of the first three estimated risk factors with the Fama & French (2015) factors appear
robust, although varying with time. This can be better appreciated by looking at Figures XVI, XVII and XVIII,
where we report the time series of the estimated correlations between the PCA estimates and the Fama &
French (2015) five factors, over rolling windows of 60 observations. In particular, Figure XVI shows how the
market excess return is clearly related to the first PCA, the smb return to the second PCA and the hml return
to the third PCA. Although the magnitude of these estimated correlations varies through time, it is evident
that in general the link becomes more tenuous from the late 1990s. Figures XVII and XVIII show the link of
the rmw and cma returns with the fourth and fifth PCA: it appears that rmw and cma are related to a linear
combination of those PCAs. Again, the strength of the relationship appears to fade away over the last part
of the sample. Our evidence on the larger number of estimated PCA factors explaining the cross-section of
returns from the 1990s onwards appears aligned with this attenuation of the estimated correlations. A more
economically meaningful measure of the (dynamic) importance of the Fama & French (2015) five factors is
obtained by relating them to the estimated SDF, as illustrated below. There, we confirm that, indeed, the
relevance of the Fama and French factors appears strong in the first half of our sample period, until the
early 1990s, and much weaker thereafter, except during the most volatile periods of the financial crises. This
agrees with our empirical finding, above, suggesting that only a small number of risk factors is relevant during
such periods.

FIGURE XV HERE

TABLES VI and VII HERE

FIGURES XVI, XVII, XVIII HERE

6.3 Detecting the State Variables for Loadings

We examine the time variation of the estimated equally-weighted portfolio return:

$$\hat\mu^{ew}_{t-1} \equiv w^{ew\prime}_N \hat\Lambda_{t-1}\hat\gamma,$$

setting $w^{ew}_N = N^{-1}1_N$. Under our assumptions, $\hat\mu^{ew}_{t-1} \to_p \mu^{ew}_{t-1} \equiv \mu^{ew\prime}_{\Lambda_{t-1}}\gamma^P$. As already indicated, $\hat\mu^{ew}_{t-1}$ is
rotation-free, and thus its time-variation must be attributed to the dynamics of the loadings, and not induced
by the rotation matrix. In fact, our identification strategy is based on making the expected portfolio return
above a function of the average risk premia over the T data points, namely $\gamma^P$, as opposed to considering the
time-varying risk premia $\gamma^P_{t-1}$.
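The rotation-free property can be checked numerically. The sketch below, with a hypothetical function name, computes the equally-weighted expected return from a loadings matrix and a vector of risk premia; rotating the loadings by any invertible matrix R and the premia by its inverse leaves the value unchanged, since the product of loadings and premia is what enters the formula.

```python
import numpy as np


def equally_weighted_expected_return(loadings, gamma):
    """Compute N^{-1} 1_N' Lambda gamma.

    loadings is an (N, k) array and gamma a (k,) vector of risk premia.
    """
    return float(loadings.mean(axis=0) @ gamma)
```

For instance, calling the function on `loadings @ R` and `np.linalg.solve(R, gamma)` should return the same number as calling it on `loadings` and `gamma`, up to floating-point error.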


Figure XIX reports the time series of $\hat\mu^{ew}_{t-1}$ together with four state-variables often advocated to drive the

dynamics of the loadings to the risk factors, namely the market dividend yield (ratio of aggregate dividends to

the S&P 500 index; source Robert Shiller’s website), the default spread (Moody’s Seasoned Baa Corporate Bond

Yield Relative to Yield on 10-Year Treasury Constant Maturity, Not Seasonally Adjusted; source FRED), the

term spread (10-Year Treasury Constant Maturity Minus 3-Month Treasury Constant Maturity, Not Seasonally

Adjusted; source FRED) and the CAPE ratio (Cyclically Adjusted Price Earnings Ratio P/E10; source Robert
Shiller's website). Several papers suggest the explanatory power of these state-variables (see Gagliardini et al.

(2016) and Kelly et al. (2018) among others). Data are monthly from Dec 1960 until Dec 2013.

FIGURE XIX HERE

The prominent time-variation of the portfolio estimated expected return is evident. We now assess more

accurately the dynamic relationships between these variables, in particular trying to quantify the extent to

which the portfolio estimated expected return moved together with the four state variables. Figure XX reports
the time series of the adjusted $R^2$ of the regression, over rolling windows of 60 observations, of $\hat\mu^{ew}_{t-1}$ on the
four state-variables described above, both jointly (black line) and taken one by one (red line for CAPE,
green line for term spread, blue line for default spread, light blue line for dividend yield).
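The rolling regressions just described can be sketched as follows. This is a minimal illustration under stated assumptions: `y` stands for the series of estimated equally-weighted expected returns, `X` for a matrix whose columns are the state variables (one column for the one-by-one regressions, four for the joint one), and both function names are hypothetical.

```python
import numpy as np


def adjusted_r2(y, X):
    """Adjusted R^2 from an OLS regression of y on X plus an intercept."""
    n, p = X.shape
    Z = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ coef
    r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def rolling_adjusted_r2(y, X, window=60):
    """Adjusted R^2 over every rolling window of `window` observations."""
    return np.array([adjusted_r2(y[end - window:end], X[end - window:end])
                     for end in range(window, len(y) + 1)])
```

Passing a single column, for example `X[:, [0]]`, gives the one-variable version of the same statistic.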

The results from the regressions reported in Figure XX show that the time-variation of the risk factors'
loadings appears strongly driven by the four state-variables when taken jointly, except between the late 1990s
and the mid 2000s. The CAPE appears to be more relevant in the first part of the sample. Interestingly,
the dividend yield seems to explain the time-variation of loadings in the aftermath of financial and economic crises
whereas the spread variables appear more relevant during booms and expansions.

FIGURE XX HERE

6.4 Pricing Performance of Risk Premia

We report in Figures XXI, XXII and XXIII the time series of the estimated (ex-post) risk premia for risk factors
one, two and three, respectively (black line). When these are statistically significant at the
5% significance level, we report them in green if positive and in red if negative. Notice that, for risk premia two and three,
the lines are sometimes missing, corresponding to the rolling windows for which the estimated number of factors
is smaller than two and three, respectively. The grey bars indicate financial crises and recessions. Notice that
the confidence bands have a time-varying width because they are based on a time-varying number of stocks (from
800 to 5,000). It emerges that the estimated risk premia have some tendency to be anti-cyclical and, more
importantly, their variability increases substantially in the aftermath of crises. Moreover, as illustrated by our
theory, the risk premium associated with the highest-ranked factor (in terms of the corresponding eigenvalues)
is more precisely estimated, and the precision progressively diminishes as one considers the second, the third,
and so forth, risk premium.

FIGURES XXI, XXII, XXIII HERE.

To quantify the pricing performance of the estimated risk premia, we consider the empirical pricing errors

stemming from our PCA procedure:

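$$\hat\delta_{it-1} \equiv \bar x_i - \hat\lambda'_{it-1}\hat\gamma. \qquad (42)$$

Note that these are almost identical to the estimated intercept from the time series regression of the i-th excess
returns $x_{is}$ on the PCA estimates $\hat F_{s,t}$, namely $\hat\alpha_{it-1} \equiv \bar x_i - \hat\beta'_{it-1}\hat\gamma$, with the estimated regression coefficient
given by $\hat\beta_{it-1} \equiv (\hat F' M_{1_T}\hat F)^{-1}\hat F' M_{1_T} x_{\cdot i}$. The difference between the $\hat\delta_{it-1}$ and the $\hat\alpha_{it-1}$ is that the former rely
on the estimated loadings $\hat\lambda_{it-1} = (\hat F'\hat F)^{-1}\hat F' x_{\cdot i}$ whereas the $\hat\alpha_{it-1}$ rely on the $\hat\beta_{it-1}$. The popular GRS test of
correct specification of Gibbons et al. (1989) is based on the $\hat\alpha_{it-1}$.30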
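The distinction between the two sets of pricing errors can be illustrated with a short sketch. The function name is hypothetical, and the risk premia vector `gamma_hat` is passed in by the caller; setting it equal to the time average of the estimated factors, as in the check described afterwards, is an assumption made here for illustration.

```python
import numpy as np


def pricing_errors(X, F_hat, gamma_hat):
    """Return (delta, alpha) for a (T, N) window X and (T, k) factors F_hat."""
    x_bar = X.mean(axis=0)
    # Loadings without intercept: (F'F)^{-1} F' x_i for every asset i.
    lam = np.linalg.solve(F_hat.T @ F_hat, F_hat.T @ X)   # (k, N)
    # Slopes after demeaning the factors, i.e. applying M_{1_T}.
    Fc = F_hat - F_hat.mean(axis=0)
    beta = np.linalg.solve(Fc.T @ Fc, Fc.T @ X)           # (k, N)
    delta = x_bar - lam.T @ gamma_hat
    alpha = x_bar - beta.T @ gamma_hat
    return delta, alpha
```

When `gamma_hat` equals `F_hat.mean(axis=0)`, the returned `alpha` coincides with the intercepts of asset-by-asset OLS regressions that include a constant, which is the quantity entering the GRS statistic.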

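We evaluate the following statistic, $N^{-1}\sum_{i=1}^N \hat\delta^2_{it-1}$, corresponding to any given rolling window $T_t$ of size T.
However, under the null hypothesis of correct model specification, namely Eq. (2), it follows that $\hat\delta_{it-1} \to_p \bar a_i + \bar e_i - (e'_{\cdot i}F/T)H'_{t-1}F$, where $\bar a_i = (a_{i0}, \cdots, a_{iT-1})1_T/T$ and $\bar e_i = e'_{\cdot i}1_T/T$, implying

$$\frac{1}{N}\sum_{i=1}^N \hat\delta^2_{it-1} \to_p \frac{\sigma^2_{t-1}}{T}\left(1 - \gamma^{P\prime}H_{t-1}H'_{t-1}\gamma^P\right) \quad \text{as } N \to \infty.$$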

The above finding, namely that the population pricing errors are not negligible even under the hypothesis of
correct pricing when T is fixed, was discovered by Raponi et al. (2018) in the context of testing beta-pricing
models with observed risk factors. Here we derive the analogous result for the latent risk factors case. In view of
this, we plot the time series of the centered average squared (sample) pricing errors:

$$\hat\Delta_{t-1} \equiv \frac{1}{N}\sum_{i=1}^N \hat\delta^2_{it-1} - \frac{1}{T}\hat\sigma^2_{t-1}(1 - \hat\gamma'\hat\gamma),$$

in order to quantify the pricing performance of the PCA-estimated asset pricing model. Figure XXIV plots the
time series of the $\hat\Delta_{t-1}$ over rolling windows of size T = 12 corresponding to various cases: when the number
of factors equals $k_{t-1}$ (as reported in Figure XIII), which is presumably the correct model (black line), and also
when the number of factors is arbitrarily kept constant, such as k = 1 (red line), k = 2 (green line), k = 3 (blue
line) and k = 5 (light blue line). It is evident how the one-factor model leads to the poorest performance whereas
the model associated with $k_{t-1}$ gives the best performance, yet close to the three- and five-factor models.

30 Their result, namely that a suitably standardized quadratic form of the $\hat\alpha_{it-1}$ is exactly distributed like a chi-square, holds for a fixed N and T.

FIGURE XXIV HERE.
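The centered statistic plotted in Figure XXIV is a direct function of the pricing errors, the idiosyncratic variance estimate and the risk premia estimate within each window. A minimal sketch, with a hypothetical function name and with all three inputs assumed to be already available from the estimation step, is:

```python
import numpy as np


def centered_pricing_error(delta, sigma2_hat, gamma_hat, T):
    """Average squared pricing error minus its fixed-T centering term.

    delta is the (N,) vector of pricing errors for one window,
    sigma2_hat the idiosyncratic variance estimate for that window,
    gamma_hat the (k,) vector of estimated risk premia, and T the
    window length.
    """
    delta = np.asarray(delta, dtype=float)
    gamma_hat = np.asarray(gamma_hat, dtype=float)
    centering = sigma2_hat / T * (1.0 - gamma_hat @ gamma_hat)
    return float(np.mean(delta ** 2) - centering)
```

Repeating the call for each rolling window and for each choice of k would produce one line of the figure per model.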

An alternative way to assess whether the conditional factor asset pricing model adds economic value
is to construct mean-variance efficient portfolios and to assess their out-of-sample performance, for instance in
terms of Sharpe ratios. Specifically, assuming that the universe of assets is driven by the time-varying factor
asset pricing model (1) with $r_{t-1}$ risk factors, we construct time series of (tangency) portfolio excess returns as:

$$r^{mv,k_{t-1}}_t \equiv \hat w^{mv\prime}_{t-1} x_{t\cdot},$$

setting

$$\hat w^{mv}_{t-1} \equiv \frac{\hat\Sigma^{-1}_{t-1}\hat\Lambda_{t-1}\hat\gamma}{1'_N\hat\Sigma^{-1}_{t-1}\hat\Lambda_{t-1}\hat\gamma} \quad \text{with} \quad \hat\Sigma_{t-1} \equiv \left(\hat\Lambda_{t-1}\hat\Omega\hat\Lambda'_{t-1} + \hat\sigma^2_{t-1}I_N\right).$$

The results are reported in Table VIII. We have evaluated the portfolio weights over rolling time-windows of size

T = 36 where the (time-varying) number of assumed risk factors is either taken from Section 6.1 above (based on

our selection criterion), namely kt−1, or assumed fixed throughout the exercise (k = 1, k = 5 and k = 10). The

portfolio returns $r^{mv,k_{t-1}}_t$ have been carefully constructed in real time, meaning that the corresponding portfolio

weights only use past information, in particular data from period t − 35 until t − 1. We have also evaluated

the t-ratios to compare each of the Sharpe ratios with the one associated with the equally weighted portfolios.

Such t-ratios are asymptotically standard normal under the null hypothesis of equality (see Lo (2003)).
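The tangency weights defined above can be sketched in a few lines. The function name is hypothetical and the inputs (loadings, factor covariance, idiosyncratic variance and risk premia) are assumed to be estimates computed from the past window only, so that the weights are available in real time.

```python
import numpy as np


def tangency_weights(loadings, omega, sigma2, gamma):
    """Weights proportional to Sigma^{-1} Lambda gamma, normalized to sum to one.

    loadings is (N, k), omega is (k, k), sigma2 is a scalar and gamma is (k,).
    Sigma = Lambda Omega Lambda' + sigma2 * I_N.
    """
    n_assets = loadings.shape[0]
    sigma = loadings @ omega @ loadings.T + sigma2 * np.eye(n_assets)
    raw = np.linalg.solve(sigma, loadings @ gamma)
    return raw / raw.sum()
```

This version forms the full N × N covariance matrix for clarity; with several thousand assets one may prefer to exploit the low-rank-plus-diagonal structure, a design choice left aside here.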

TABLE VIII HERE

In summary, the conditional factor asset pricing model (1), corresponding to the estimated number of risk

factors obtained using our criteria (namely kt−1), appears to perform well in terms of out-of-sample Sharpe

ratios. In particular, the corresponding optimal portfolio statistically beats the equally weighted portfolio, which
is usually a hard benchmark to beat for any model-based portfolio strategy (see DeMiguel et al. (2009)).

Our approach, based on the estimated time-varying number of risk factors kt−1, also beats the cases based on

assuming a constant number of factors: for instance, the results are particularly weak when assuming k = 10

risk factors, a symptom of the over-fitting arising from selecting too many risk factors.


6.5 Dynamic Spanning of the SDF

We examine the dynamic properties of the estimated SDF $\hat m_{t,t+1}$ (black line), whose time series is reported in
Figure XXV, together with its 95% confidence band (red lines) where, as before, the grey bands correspond to
recessions and financial crises. The results markedly indicate the anti-cyclicality of the SDF, confirming
its interpretation as an index of bad times. We consider here rolling windows of size T = 24 (two years).
In fact, when T = 12, one obtains both large swings in the estimated SDF and, correspondingly,
confidence intervals with a large width, especially towards the second half of the time period, due to the
proximity between the chosen short time window (T = 12) and the relatively large number of estimated factors, with
$k_{t-1}$ often taking values up to 10. This causes quasi-singularity of $\hat\Omega$ and thus excessively large estimates of
the SDF and their standard errors.31
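The singularity issue can be reproduced numerically. The sketch below assumes, purely for illustration, that the factor covariance matrix is the sample covariance of the estimated factors within the window; the function name is hypothetical. Demeaning T observations removes one degree of freedom, so the rank cannot exceed T − 1.

```python
import numpy as np


def factor_covariance(F_hat):
    """Sample covariance of the (T, k) matrix of estimated factors."""
    Fc = F_hat - F_hat.mean(axis=0)
    return Fc.T @ Fc / F_hat.shape[0]
```

With as many factors as time periods the resulting matrix is rank deficient, whereas a window that is long relative to the number of factors generically delivers a full-rank, well-conditioned estimate.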

FIGURE XXV HERE.

More importantly, we examine the extent to which the Fama & French (2015) five factors span, in part,
the SDF. The results are reported in Figure XXVI, where we report the adjusted $R^2$ from regressing the SDF,
respectively, on the market return, the three Fama & French (1993) factors and the five Fama & French (2015)
factors, using rolling windows of 60 months. It emerges that all three models explained a sizeable amount of
the time-variation of the SDF until the 1990s. Thereafter, the CAPM poorly spans the SDF whereas the three-
and five-factor models seem more relevant, although the magnitude of the explained $R^2$ diminishes, with positive
spikes observed in the aftermath of financial and economic crises, especially after the 2007 financial
crisis. This fact squares with our finding regarding the small number of risk factors determined during market
crashes, as opposed to the large number of risk factors that appear relevant during market booms.

FIGURE XXVI HERE.

7 Conclusion

This paper develops a methodology for inference on conditional asset pricing models linear in latent risk

factors, valid when the number of assets diverges but the time series dimension is fixed, possibly very small.

We show that the no-arbitrage condition implies that the PCA estimator of the risk factors is asymptotically
equivalent to the mimicking portfolios estimator. Moreover, no-arbitrage allows one to identify the risk premia as
the expectation of the latent risk factors. This result paves the way to an inferential procedure, based on the
PCA approach, for the factors' risk premia and for the stochastic discount factor, spanned by the latent risk
factors. The strength of our set up is that it naturally handles time-varying factor models, where every feature is
allowed to be time-varying including loadings, idiosyncratic risk and the number of risk factors. Several Monte
Carlo experiments corroborate our theoretical findings. Our results represent a unique tool to address several
questions in empirical asset pricing, and beyond, that are out of reach of the usual PCA methodologies, which are valid
only under double asymptotics. For instance, our empirical analysis demonstrates how the estimated number
of risk factors varies across time, increasing during booms and sharply decreasing during financial crises. We
also show how the time-variation of the risk factors' loadings appears driven by the interest rate spread variables
and dividend and earnings variables, especially during financial crises. The estimates of the SDF implied by
our cross-section of individual returns exhibit a substantial anti-cyclicality and appear more strongly spanned by
the Fama & French (2015) factors during market crashes than otherwise.

31 Note that when $k_{t-1} = T$, which is its largest possible value, then the T × T matrix $\hat\Omega$ has rank T − 1, and hence becomes singular. Thus, for accurate estimation of the SDF, it is advisable to allow for a T much larger than $k_{t-1}$.

References

Ahn, S. C. & Horenstein, A. R. (2013), ‘Eigenvalue ratio test for the number of factors’, Econometrica 81, 1203–

1227.

Ait-Sahalia, Y. & Xiu, D. (2017), ‘Using principal component analysis to estimate a high dimensional factor

model with high-frequency data’, Journal of Econometrics 201, 384–399.

Ait-Sahalia, Y. & Xiu, D. (2018), ‘Principal component analysis of high-frequency data’, Journal of the American
Statistical Association 0, 1–17.

Al-Najjar, N. (1999), ‘On the robustness of factor structures to asset repackaging’, Journal of Mathematical

Economics 31, 309–320.

Amengual, D. & Watson, M. (2007), ‘Consistent estimation of the number of dynamic factors in large n and t
panel’, Journal of Business and Economic Statistics 25, 91–96.

Anatolyev, S. & Mikusheva, A. (2018), ‘Factor models with many assets: strong factors, weak factors, and the

two-pass procedure’, Working Paper, MIT .

Andersen, T., Bollerslev, T., Diebold, F. & Wu, J. (2006), Realized beta: persistence and predictability, in

T. Fomby & D. Terrell, eds, ‘Advances in Econometrics: Econometric Analysis of Economic and Financial

Time Series in Honor of R.F. Engle and C.W.J. Granger, vol. B’, Elsevier, Amsterdam, Netherlands,

pp. 1–40.

Ang, A. & Kristensen, D. (2012), ‘Testing conditional factor models’, Journal of Financial Economics 106, 132–

156.

Bai, J. (2003), ‘Inferential theory for factor models of large dimensions’, Econometrica 71, 135–171.


Bai, J. & Ng, S. (2002), ‘Determining the number of factors in approximate factor models’, Econometrica

70, 191–221.

Bai, J. & Zhou, G. (2015), ‘Fama-MacBeth two-pass regressions: Improving risk premia estimates’, Finance

Research Letters 15, 31–40.

Barigozzi, M., Hallin, M. & Soccorsi, S. (2019), ‘Time-varying general dynamic factor models and the measurement
of financial connectedness’, Working Paper, https://papers.ssrn.com/sol3/abstract=3329445 .

Black, F., Jensen, M. C. & Scholes, M. (1972), The Capital Asset Pricing Model: Some empirical tests, in ‘Studies

in the Theory of Capital Markets’, Praeger, New York.

Bates, B. J., Plagborg-Møller, M., Stock, J. & Watson, M. (2013), ‘Consistent factor estimation in dynamic

factor models with structural instability’, Journal of Econometrics 177, 289–304.

Breeden, D. T., Gibbons, M. R. & Litzenberger, R. H. (1989), ‘Empirical test of the consumption-oriented
CAPM’, The Journal of Finance 44, 231–262.

Brillinger, D. R. (2001), Time series: data analysis and theory, Society for Industrial and Applied Mathematics,

Philadelphia.

Bryzgalova, S. (2014), ‘Spurious factors in linear asset pricing models’, Working Paper, Stanford University .

Burnside, C. (2016), ‘Identification and inference in linear stochastic discount factor models with excess returns’,

Journal of Financial Econometrics 14, 295–330.

Chamberlain, G. (1983), ‘Funds, factors, and diversification in arbitrage pricing models’, Econometrica

51, 1305–1323.

Chamberlain, G. & Rothschild, M. (1983a), ‘Arbitrage, factor structure, and mean-variance analysis on large

asset markets’, Econometrica 51, 1281–1304.

Chamberlain, G. & Rothschild, M. (1983b), ‘Arbitrage, factor structure, and mean-variance analysis on large

asset markets’, Econometrica 51, 1281–1304.

Cochrane, J. H. (2011), ‘Presidential address: Discount rates’, The Journal of Finance 66, 1047–1108.

Connor, G., Hagmann, M. & Linton, O. (2012), ‘Efficient estimation of a semiparametric characteristic-based

factor model of security returns’, Econometrica 80, 713–754.

Connor, G. & Korajczyk, R. (1993), ‘A test for the number of factors in an approximate factor model’, Journal
of Finance 48, 1263–1291.

Connor, G. & Korajczyk, R. A. (1986), ‘Performance measurement with the arbitrage pricing theory: A new

framework for analysis’, Journal of Financial Economics 15, 373–394.

Connor, G. & Linton, O. (2007), ‘Semiparametric estimation of a characteristic-based factor model of common

stock returns’, Journal of Empirical Finance 14, 694–717.

Dahlhaus, R. (1997), ‘Fitting time series models to nonstationary processes’, Annals of Statistics 25, 1–37.

Dahlhaus, R. (2000), ‘A likelihood approximation for locally stationary processes’, Annals of Statistics 28, 1762–

1794.


Dello Preite, M. & Zaffaroni, P. (2019), ‘Inference on the stochastic discount factor with large data’, Working

Paper, Imperial College Business School .

DeMiguel, V., Garlappi, L. & Uppal, R. (2009), ‘Optimal versus naive diversification: How inefficient is the

1/n portfolio strategy?’, The Review of Financial Studies 22, 1915–1953.

Fama, E. F. & French, K. R. (1993), ‘Common risk factors in the returns on stocks and bonds’, Journal of

Financial Economics 33, 3–56.

Fama, E. F. & French, K. R. (2015), ‘A five-factor asset pricing model’, Journal of Financial Economics

116, 1–22.

Fama, E. F. & MacBeth, J. D. (1973), ‘Risk, return, and equilibrium: Empirical tests’, Journal of Political

Economy 81, 607–636.

Fan, J., Liao, Y. & Wang, W. (2016), ‘Projected principal component analysis in factor models’, The Annals

of Statistics 44, 219–254.

Ferson, W. E. & Harvey, C. R. (1991), ‘The variation of economic risk premiums’, Journal of Political Economy

99, 385–415.

Forni, M., Hallin, M., Lippi, M. & Zaffaroni, P. (2015), ‘Dynamic factor models with infinite-dimensional factor

spaces: One-sided representations’, Journal of Econometrics 185, 359–371.

Forni, M., Hallin, M., Lippi, M. & Zaffaroni, P. (2017), ‘Dynamic factor models with infinite-dimensional factor

space: Asymptotic analysis’, Journal of Econometrics 199, 74–92.

Forni, M. & Lippi, M. (2001), ‘The generalized dynamic factor model: Representation theory’, Econometric

Theory 17, 1113–1141.

Forni, M., Hallin, M., Lippi, M. & Reichlin, L. (2000), ‘The generalized dynamic-factor model: Identification
and estimation’, The Review of Economics and Statistics 82, 540–554.

French, K., Schwert, G. & Stambaugh, R. (1987), ‘Expected stock returns and volatility’, Journal of Financial

Economics 19, 3–29.

Gagliardini, P., Ossola, E. & Scaillet, O. (2016), ‘Time-varying risk premium in large cross-sectional equity

data sets’, Econometrica 84, 985–1046.

Ghysels, E. (1998), ‘On stable factor structures in the pricing of risk: do time-varying betas help or hurt?’,

Journal of Finance 53, 549–573.

Gibbons, M. R., Ross, S. A. & Shanken, J. (1989), ‘A test of the efficiency of a given portfolio’, Econometrica

57, 1121–1152.

Giglio, S. & Xiu, D. (2017), ‘Inference on risk premia in the presence of omitted factors’, Working Paper 23527,

National Bureau of Economic Research .

Gospodinov, N., Kan, R. & Robotti, C. (2014), ‘Misspecification-robust inference in linear asset-pricing models

with irrelevant risk factors’, The Review of Financial Studies 27, 2139–2170.

Gospodinov, N., Kan, R. & Robotti, C. (2017), ‘Spurious inference in reduced-rank asset-pricing models’,

Econometrica 85, 1613–1628.


Hallin, M. & Liska, R. (2007), ‘Determining the number of factors in the general dynamic factor model’, Journal
of the American Statistical Association 102, 603–617.

Hansen, L. P. & Richard, S. F. (1987), ‘The role of conditioning information in deducing testable restrictions

implied by dynamic asset pricing models’, Econometrica 55, 587–613.

Harvey, C. R. (2001), ‘The specification of conditional expectations’, Journal of Empirical Finance 8, 573–637.

Harvey, C. R., Liu, Y. & Zhu, H. (2016), ‘. . . and the cross-section of expected returns’, The Review of Financial

Studies 29, 5–68.

Huberman, G., Kandel, S. & Stambaugh, R. F. (1987), ‘Mimicking portfolios and exact arbitrage pricing’, The


Ingersoll, J. E. (1984), ‘Some results in the theory of arbitrage pricing’, The Journal of Finance 39, 1021–1039.

Jagannathan, R. & Wang, Z. (1996), ‘The conditional CAPM and the cross-section of expected returns’, The
Journal of Finance 51, 3–53.

Jagannathan, R. & Wang, Z. (1998), ‘An asymptotic theory for estimating beta-pricing models using
cross-sectional regression’, The Journal of Finance 53, 1285–1309.

Jung, S. & Marron, J. S. (2009), ‘PCA consistency in high dimension, low sample size context’, The Annals of

Statistics 37, 4104–4130.

Kan, R. & Zhang, C. (1999a), ‘GMM tests of stochastic discount factor models with useless factors’, Journal
of Financial Economics 54, 103–127.

Kan, R. & Zhang, C. (1999b), ‘Two-pass tests of asset pricing models’, Journal of Finance 54, 203–235.

Kelly, B. T., Pruitt, S. & Su, Y. (2017), ‘Instrumented principal component analysis’, Working Paper, SSRN:

https://ssrn.com/abstract=2983919 or http://dx.doi.org/10.2139/ssrn.2983919 .

Kelly, B. T., Pruitt, S. & Su, Y. (2018), ‘Characteristics are covariances: A unified model of risk and return.’,

Working Paper, Yale University .

Kim, S. & Korajczyk, R. A. (2019), ‘Large sample estimators of the stochastic discount factor’, Working Paper,

SSRN: https://papers.ssrn.com/abstract=3131274 .

Kleibergen, F. (2009), ‘Tests of risk premia in linear factor models’, Journal of Econometrics 149, 149–173.

Kozak, S., Nagel, S. & Santosh, S. (2018), ‘Interpreting factor models’, The Journal of Finance 73, 1183–1223.

Kuersteiner, G. M. & Prucha, I. R. (2013), ‘Limit theory for panel data models with cross–sectional dependence

and sequential exogeneity’, Journal of Econometrics 174, 107–126.

Lettau, M. & Ludvigson, S. (2001), ‘Resurrecting the (C)CAPM: A cross-sectional test when risk premia are

time-varying’, Journal of Political Economy 109, 1238–1287.

Lettau, M. & Pelger, M. (2018a), ‘Estimating latent asset-pricing factors’, Working Paper, SSRN:

https://ssrn.com/abstract=3175556 .

Lettau, M. & Pelger, M. (2018b), ‘Factors that fit the time-series and cross-section of stock returns’, Working

Paper, SSRN: https://ssrn.com/abstract=3211106 .


Lewellen, J. & Nagel, S. (2006), ‘The conditional CAPM does not explain asset-pricing anomalies’, Journal of

Financial Economics 82, 289–314.

Lintner, J. (1965), ‘The valuation of risk assets and the selection of risky investments in stock portfolios and

capital budgets’, The Review of Economics and Statistics 47, 13–37.

Lo, A. (2003), ‘The statistics of Sharpe ratios’, Financial Analysts Journal 58, 36–52.

Magnus, J. & Neudecker, R. H. (2007), Matrix Differential Calculus with Applications in Statistics and
Econometrics, J. Wiley & Sons, Revised Edition, Chichester.

Merton, R. (1973), ‘An Intertemporal Capital Asset Pricing Model’, Econometrica 41, 867–87.

Merton, R. (1980), ‘On estimating the expected return on the market: An exploratory investigation’, Journal

of Financial Economics 8, 323–361.

Motta, G., Hafner, C. M. & von Sachs, R. (2011), ‘Locally stationary factor models: identification and non-

parametric estimation’, Econometric Theory 27, 1279–1319.

Onatski, A. (2009), ‘Testing hypotheses about the number of factors in large factor models’, Econometrica

77, 1447–1479.

Onatski, A. (2010), ‘Determining the number of factors from empirical distribution of eigenvalues’, The Review

of Economics and Statistics 92, 1004–1016.

Pukthuanthong, K. & Roll, R. (2017), ‘An agnostic and practically useful estimator of the stochastic discount

factor’, Working Paper, Caltech .

Raponi, V., Robotti, C. & Zaffaroni, P. (2018), ‘Testing beta-pricing models using large cross-section’, Working

Paper, Imperial College London . Forthcoming, Review of Financial Studies.

Raponi, V. & Zaffaroni, P. (2018), ‘Dissecting spurious factors with cross-sectional regressions’, Working Paper,

Imperial College London .

Reisman, H. (1992), ‘Intertemporal arbitrage pricing theory’, The Review of Financial Studies 5, 105–122.

Ross, S. A. (1976), ‘The Arbitrage Theory of Capital Asset Pricing’, Journal of Economic Theory 13, 341–360.

Shanken, J. (1990), ‘Intertemporal asset pricing: an empirical investigation’, Journal of Econometrics 45, 99–

120.

Shanken, J. (1992), ‘On the estimation of beta-pricing models’, The Review of Financial Studies 5, 1–33.

Sharpe, W. F. (1964), ‘Capital asset prices: a theory of market equilibrium under conditions of risk’, The


Shen, D., Shen, H. & Marron, J. S. (2013), ‘Consistency of sparse PCA in high dimension, low sample size

contexts’, Journal of Multivariate Analysis 115, 317–333.

Stambaugh, R. F. (1983), ‘Arbitrage pricing with information’, Journal of Financial Economics 12, 357–369.

Stock, J. H. & Watson, M. W. (2002a), ‘Forecasting using principal components from a large number of

predictors’, Journal of the American Statistical Association 97, 1167–1179.


Stock, J. H. & Watson, M. W. (2002b), ‘Macroeconomic forecasting using diffusion indexes’, Journal of Business

& Economic Statistics 20, 147–162.

Su, L. & Wang, X. (2017), ‘On time-varying factor models: Estimation and testing’, Journal of Econometrics

198, 84–101.

Uppal, R., Zaffaroni, P. & Zviadadze, I. (2018), ‘Correcting misspecified stochastic discount factors’, Working

Paper, Edhec-Imperial College London-Stockholm School of Economics .

8 Appendixes

This section contains five appendixes: Appendix A sets the mathematical notation adopted in the paper;
Appendix B presents four ancillary propositions; Appendix C presents the technical lemmas necessary for the
proofs of the main theorems, which are then reported in Appendix D; Appendix E reports the form of the
covariance matrix of the $(e_{\cdot i} \otimes e_{\cdot i})$.

To simplify arguments, we report the proofs for the static case, that is when $a_{it-1} = a_i$, $\lambda_{it-1} = \lambda_i$, $\sigma^2_{t-1} = \sigma^2$, $r_{t-1} = r$.
Generalization to the time-varying case is obtained without further difficulties in most cases.32
Likewise, $\sum_{s \in T_t}$ will be replaced by $\sum_{s=1}^{T}$ throughout all the proofs.

8.1 Appendix A: Notation

The following notation is adopted throughout the paper: a.s. means almost surely; $1_A$ denotes the indicator function equal to one when event $A$ holds; $\iota_t$ denotes the $t$-th column of the $T \times T$ identity matrix $I_T$; $C$ denotes a finite positive constant, not always the same; $a_b$ and $A_{b \times c}$ denote a generic vector and matrix, respectively, of size $b \times 1$ and $b \times c$, implying that $0_a$ and $0_{a \times b}$ are the $a \times 1$ vector and the $a \times b$ matrix of zeros, respectively; $\mathrm{diag}(A)$ is the diagonal matrix with the diagonal elements of the matrix $A$; for any full column rank matrix $A$ of size $T \times a$, set the projection matrixes $P_A \equiv A(A'A)^{-1}A'$ and $M_A \equiv I_T - A(A'A)^{-1}A'$; $g_j(A)$ defines the $j$-th eigenvalue in decreasing order of any symmetric $a \times a$ matrix $A$, implying $g_1(A) \ge g_a(A)$; $\otimes$ denotes the Kronecker product; $\mathrm{vec}(\cdot)$ and $\mathrm{trace}(\cdot)$ denote the vec and trace operators; $Z = \{\cdots, -1, 0, 1, \cdots\}$ denotes the set of integers; the $o(\cdot)$, $O(\cdot)$ and $o_p(\cdot)$, $O_p(\cdot)$ notation is adopted for scalars and finite-dimensional vectors and matrixes (whose number of rows and columns are not a function of $N$); $\to_p$, $\to_d$ denote convergence in probability and distribution, respectively; $E(\cdot)$, $\mathrm{Var}(\cdot)$, $\mathrm{Cov}(\cdot)$ and $E_t(\cdot)$, $\mathrm{Var}_t(\cdot)$, $\mathrm{Cov}_t(\cdot)$ indicate, respectively, the unconditional mean, variance, covariance and the conditional mean, variance, covariance with respect to all the available information up to time $t$.

32Further details for the time-varying case are provided for the proofs of the main theorem, when not straightforward.


8.2 Appendix B: Propositions

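Proposition 1 (alternative representations of the objective function)

(i) For any arbitrary $T \times k$ matrix $A = (a_1, \cdots, a_T)'$ of rank $k$, and arbitrary $N \times k$ matrix $B = (b_1, \cdots, b_N)'$,

\[
\begin{aligned}
V(k, A) &\equiv \min_B \frac{1}{NT} \sum_{i=1}^{N} \sum_{t=1}^{T} (x_{it} - b_i' a_t)^2
= \frac{1}{NT} \min_B \mathrm{trace}\big( (X' - BA')(X' - BA')' \big) \\
&= \frac{1}{NT} \min_B \mathrm{trace}\big( (X - AB')(X' - BA') \big)
= \frac{1}{NT} \mathrm{trace}\big( M_A XX' \big).
\end{aligned}
\]

(ii) Setting $A$ equal to the $T \times k$ matrix of eigenvectors of $XX'/NT$, times $\sqrt{T}$, associated with its $k$ largest eigenvalues,

\[
\begin{aligned}
V(k) &\equiv \min_A V(k, A) = \frac{1}{NT} \big( \mathrm{trace}(XX') - \mathrm{trace}(P_A XX') \big)
= \frac{1}{NT} \Big( \mathrm{trace}(XX') - \mathrm{trace}\Big( \frac{A'XX'A}{T} \Big) \Big) \\
&= \Big( \sum_{t=1}^{T} g_t\Big( \frac{XX'}{NT} \Big) - \sum_{t=1}^{k} g_t\Big( \frac{XX'}{NT} \Big) \Big)
= \big( \mathrm{trace}(V_T) - \mathrm{trace}(V_k) \big),
\end{aligned}
\]

setting $V_a \equiv \mathrm{diag}(g_1(XX'/NT) \cdots g_a(XX'/NT))$, for every $1 \le a \le T$.

(iii) $V(k)$ is unchanged when replacing $A$ by $AC$ for any non-singular $k \times k$ matrix $C$.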
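The three parts of Proposition 1 can be checked numerically. The following Python sketch is an illustration only, not part of the original derivation: the function names `pca_objective` and `objective_given_A`, the simulated panel and the matrix `C` are all hypothetical choices. It computes $V(k)$ from the eigenvalues of $XX'/NT$, recomputes it as a least-squares fit of $X$ on $A$, and verifies the invariance in part (iii).

```python
import numpy as np

def pca_objective(X, k):
    """Return V(k) and the matrix A of Proposition 1 (ii) for a T x N panel X."""
    T, N = X.shape
    S = X @ X.T / (N * T)                      # T x T matrix XX'/NT
    eigval, eigvec = np.linalg.eigh(S)         # eigh returns ascending eigenvalues
    order = np.argsort(eigval)[::-1]           # re-sort in decreasing order
    eigval, eigvec = eigval[order], eigvec[:, order]
    A = np.sqrt(T) * eigvec[:, :k]             # eigenvectors times sqrt(T)
    V_k = eigval[k:].sum()                     # trace(V_T) - trace(V_k)
    return V_k, A

def objective_given_A(X, A):
    """V(k, A): mean squared residual of X on A, minimised over B."""
    T, N = X.shape
    B = np.linalg.lstsq(A, X, rcond=None)[0].T  # N x k least-squares loadings
    resid = X - A @ B.T
    return (resid ** 2).sum() / (N * T)

rng = np.random.default_rng(0)
T, N, k = 8, 200, 2
X = rng.standard_normal((T, N))                # hypothetical data, T x N

V_k, A = pca_objective(X, k)
V_direct = objective_given_A(X, A)             # part (i) evaluated at the PCA choice of A
C = np.array([[2.0, 1.0], [0.5, 3.0]])         # an arbitrary non-singular k x k matrix
V_rotated = objective_given_A(X, A @ C)        # part (iii): same value after rotation
```

The eigenvalue route and the regression route agree up to floating-point error, which is why the objective only needs one eigendecomposition of a $T \times T$ matrix even when $N$ is very large.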

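Proposition 2 (pricing errors associated with the ex-post SDF $m^P_{t-1,t}$) Let $\Omega \equiv \mathrm{cov}(f_t)$ be a positive definite matrix, $g_t \equiv \Omega^{-\frac{1}{2}}(f_t - E(f_t))$, such that:

\[
\sup_{1 \le t \le T} \sum_{s=1}^{T} \| E(g_t' g_s) \| = O(1), \qquad
\sup_{1 \le t, v \le T} \sum_{s=1}^{T} \| E(g_t g_s' g_v) \| = O(1),
\]
\[
\sup_{1 \le t \le T} \sum_{s=1}^{T} \| E(g_t g_t' g_s g_s') - I_r \| = O(1), \qquad
g_1\Big( T^{-1} \sum_{t=1}^{T} g_t g_t' \Big) \ge C > 0 \ \text{a.s. when } T \text{ large enough}.
\]

Then, when (1) holds with $a_i = 0$ for every $i$, the ex-post SDF $m^P_{t-1,t}$, defined in (19), satisfies:

\[
E(m^P_{t-1,t} x_{it}) = O(T^{-1}) \ \text{for every } i. \tag{43}
\]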

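Proof.

\[
\begin{aligned}
E(m^P_{t-1,t} x_{it}) &= E\Big[ \Big( \frac{1}{r_f} - \frac{1}{r_f} \gamma^{P\prime} \hat\Omega^{-1} (f_t - \bar f) \Big) \big( \lambda_i'\gamma + \lambda_i'(f_t - E(f_t)) + e_{it} \big) \Big] \\
&= \frac{\lambda_i'\gamma}{r_f} - \frac{\lambda_i'}{r_f} E\big( (f_t - E(f_t))(f_t - \bar f)' \hat\Omega^{-1} \gamma^P \big) - \frac{\lambda_i'\gamma}{r_f} E\big( (f_t - \bar f)' \hat\Omega^{-1} \gamma^P \big) \\
&= \frac{\lambda_i'\gamma}{r_f} - \frac{\lambda_i'}{r_f} E\big( (f_t - E(f_t))(f_t - \bar f)' \hat\Omega^{-1} (\gamma + \bar f - E(f_t)) \big) - \frac{\lambda_i'\gamma}{r_f} E\big( (f_t - \bar f)' \hat\Omega^{-1} (\gamma + \bar f - E(f_t)) \big) \\
&= \frac{\lambda_i'\gamma}{r_f} + I + II.
\end{aligned}
\]

For $I$

\[
\begin{aligned}
I &= -\frac{\lambda_i'}{r_f} E\big( (f_t - E(f_t))(f_t - \bar f)' \Omega^{-1} \gamma^P \big) - \frac{\lambda_i'}{r_f} E\big( (f_t - E(f_t))(f_t - \bar f)' (\hat\Omega^{-1} - \Omega^{-1}) \gamma^P \big) \\
&= -\frac{\lambda_i'\gamma}{r_f}\Big(1 - \frac{1}{T}\Big) - \frac{\lambda_i'}{r_f} E\big( (f_t - E(f_t))(f_t - \bar f)' \Omega^{-1} (\bar f - E(\bar f)) \big) \\
&\quad - \frac{\lambda_i'}{r_f} E\big( (f_t - E(f_t))(f_t - \bar f)' (\hat\Omega^{-1} - \Omega^{-1}) \gamma \big) - \frac{\lambda_i'}{r_f} E\big( (f_t - E(f_t))(f_t - \bar f)' (\hat\Omega^{-1} - \Omega^{-1}) (\bar f - E(\bar f)) \big) \\
&= -\frac{\lambda_i'\gamma}{r_f}\Big(1 - \frac{1}{T}\Big) - \frac{\lambda_i'}{r_f} E\big( (f_t - E(f_t))(f_t - \bar f)' \Omega^{-1} (\bar f - E(\bar f)) \big) - \frac{\lambda_i'}{r_f} E\big( (f_t - E(f_t))(f_t - E(f_t))' (\hat\Omega^{-1} - \Omega^{-1}) \gamma \big) \\
&\quad - \frac{\lambda_i'}{r_f} E\big( (f_t - E(f_t))(E(f_t) - \bar f)' (\hat\Omega^{-1} - \Omega^{-1}) \gamma \big) - \frac{\lambda_i'}{r_f} E\big( (f_t - E(f_t))(f_t - \bar f)' (\hat\Omega^{-1} - \Omega^{-1}) (\bar f - E(\bar f)) \big) \\
&= -\frac{\lambda_i'\gamma}{r_f}\Big(1 - \frac{1}{T}\Big) + I_2 + I_3 + I_4 + I_5.
\end{aligned}
\]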

Term I2 satisfies:

E((ft − E(ft))(ft − f)′Ω−1(f − E(f))

)= E

((ft − E(ft))(ft − E(ft)

′Ω−1(f − E(f)))− E

((ft − E(ft))(f − E(ft)

′Ω−1(f − E(f)))

= O(T−1).

The two term on the right hand side of the last expression are made by a finite number of terms like

T−1∑T

s=1 E((fat−E(fat))(fbt−E(fbt))(fcs−E(fcs))) and T−2∑T

s,v=1 E((fat−E(fat))(fbv−E(fbv))(fcs−E(fcs))),

respectively, all of which are O(T−1) in absolute value, whereas for the second term

Term I3 is clearly o(1) as, when T is large enough,

‖ E(

(ft − E(ft))(ft − E(ft))′(Ω−1 −Ω−1)

)‖=‖ E

((ft − E(ft))(ft − E(ft))

′Ω−1(Ω−Ω)Ω−1)‖

= O(C−1 ‖ E

((ft − E(ft))(ft − E(ft))

′Ω−1(Ω−Ω) ‖))

= O(‖ E(

(ft − E(ft))(ft − E(ft))′Ω−1Ω

)−Ω ‖

)= O

(‖ Ω

12E(gtg′tΩ− 1

2 ΩΩ−12 − Ir

)Ω

12 ‖)

= O

(‖ Ω

12E

(gtg′t(

1

T

T∑s=1

gsg′s)− Ir

)Ω

12 ‖

)= O(T−1).


Terms $I_4$ and $I_5$ are also $O(T^{-1})$ by Hölder’s inequality, as they each contain two terms that are of order $O(T^{-\frac{1}{2}})$, namely $E \| \hat\Omega^{-1} - \Omega^{-1} \| = O(T^{-\frac{1}{2}})$, $E \| \bar f - E(\bar f) \| = O(T^{-\frac{1}{2}})$.

Consider now part II.

E(

(ft − f)′Ω−1(γ + f − E(ft)))

= E((ft − f)′Ω−1(γ + f − E(ft))

)+ E

((ft − f)′(Ω−1 −Ω−1)(γ + f − E(ft))

)Then

E((ft − f)′Ω−1(γ + f − E(ft))

)= E

((ft − E(ft))

′Ω−1(f − E(ft)))

+ E((E(ft)− f)′Ω−1(f − E(ft))

)= E(g′tg)− E(g′g) = O(T−1),

and

E(

(ft − f)′(Ω−1 −Ω−1)(γ + f − E(ft)))

= E(

(ft − f)′(Ω−1 −Ω−1))γ +O(T−1)

= E(

(ft − f)′Ω−1(Ω− Ω)Ω−1)γ +O(T−1) = O(T−1),

as

‖ E(

(ft − f)′Ω−1(Ω− Ω)Ω−1)‖= O(C−1 ‖ E

((ft − f)′Ω−1(Ω− Ω)

)‖)

= O(‖ E(

(ft − f)′Ω−12 (Ir −Ω−

12 ΩΩ−

12 )Ω

12

)‖)

= O

(‖ E

(g′t(Ir −

1

T

T∑s=1

gsg′s)

)Ω

12 ‖

)

= O

(1

T

T∑s=1

‖ E(g′tgsg′s) ‖

)= O(T−1),

where we take into account that the terms involving $\bar g$ are of smaller order. Collecting terms

\[
E(m^P_{t-1,t} x_{it}) = \frac{\lambda_i'\gamma}{r_f} - \frac{\lambda_i'\gamma}{r_f}\Big(1 - \frac{1}{T}\Big) + O(T^{-1}) + O(T^{-1}) = O(T^{-1}).
\]

QED

Remark. Our assumptions on the latent factors $f_t$ are weaker than (temporal) iid and zero mixed-third moments.

Remark. Proposition 2 can be extended to the case $a_i \neq 0$. However, we only present case $a_i = 0$ as our PCA procedure, valid when $N$ becomes large, does not allow one to identify the pricing errors $a_i$.

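Proposition 3 (equivalence between risk premia estimators) Under Assumption 1:

\[
\hat\gamma_{twopass} = \hat\gamma.
\]

Proof. The result follows from the identities $N^{-1} \hat\Lambda'\hat\Lambda = \hat V$ and $(NT)^{-1} \hat F' XX' = \hat V \hat F'$, as:

\[
\hat\gamma_{twopass} = \Big( \frac{\hat\Lambda'\hat\Lambda}{N} \Big)^{-1} \frac{\hat\Lambda' \bar x}{N}
= \hat V^{-1} \frac{\hat\Lambda' X'}{N} \frac{1_T}{T}
= \hat V^{-1} \frac{\hat F' XX'}{NT} \frac{1_T}{T}
= \hat V^{-1} \hat V \frac{\hat F' 1_T}{T}
= \hat\gamma.
\]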

QED
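The equivalence in Proposition 3 is an exact algebraic identity, so it can be verified on any simulated panel. The Python sketch below is a hypothetical illustration: the data-generating process and all variable names are assumptions, and the estimators are taken to be the PCA ones used in the proof, namely $\sqrt{T}$ times the leading eigenvectors of $XX'/NT$ for the factors and $X'\hat F/T$ for the loadings.

```python
import numpy as np

rng = np.random.default_rng(1)
T, N, r = 6, 300, 2
F = rng.standard_normal((T, r)) + 0.5          # hypothetical factors with non-zero mean
Lam = rng.standard_normal((N, r))              # hypothetical loadings
X = F @ Lam.T + rng.standard_normal((T, N))    # T x N panel of excess returns

S = X @ X.T / (N * T)
eigval, eigvec = np.linalg.eigh(S)
idx = np.argsort(eigval)[::-1][:r]             # positions of the r largest eigenvalues
V_hat = np.diag(eigval[idx])
F_hat = np.sqrt(T) * eigvec[:, idx]            # normalisation F_hat'F_hat / T = I_r
Lam_hat = X.T @ F_hat / T                      # estimated loadings, N x r

# Second pass: cross-sectional OLS of average returns on estimated loadings.
x_bar = X.mean(axis=0)                         # time-series average return of each asset
gamma_twopass = np.linalg.solve(Lam_hat.T @ Lam_hat / N, Lam_hat.T @ x_bar / N)

# PCA-based estimator: sample mean of the estimated factors.
gamma_pca = F_hat.mean(axis=0)
```

Both vectors coincide up to rounding, and `Lam_hat.T @ Lam_hat / N` reproduces the diagonal matrix `V_hat`, which is the first identity used in the proof.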


Proposition 4 (conditional APT) Under Assumptions 1, 2 and 5:

(i) there exists a $K_{1t-1} < \infty$ a.s. such that

\[
\sup_N a_{t-1}' \Sigma_{et-1}^{-1} a_{t-1} \le K_{1t-1} \ \text{a.s. for every } t \in Z. \tag{44}
\]

(ii) When, in addition, $\sup_N g_1(\Sigma_{et-1}) \le C < \infty$, there exists a $K_{2t-1} < \infty$ a.s. such that

\[
\sup_N a_{t-1}' a_{t-1} \le K_{2t-1} \ \text{a.s. for every } t \in Z. \tag{45}
\]

(iii) When, in addition, $E(\gamma_{t-1}'\gamma_{t-1}) \le C < \infty$, $E(K_{2t-1}) \le C < \infty$ and the smoothness condition

\[
\sum_{i=1}^{\infty} \big( E([\lambda_{it-1} - E(\lambda_{it-1}|I_{t-1})]'[\lambda_{it-1} - E(\lambda_{it-1}|I_{t-1})]) \big)^{\frac{1}{2}} \le C < \infty
\]

holds, then for any information set $I_{t-1}$ coarser than all the available information at time $t-1$, there exists $K_{3t-1} < \infty$ a.s. such that:

\[
\sup_N \sum_{i=1}^{N} \big( E(x_{it}|I_{t-1}) - E(\lambda_{it-1}|I_{t-1})' E(\gamma_{t-1}|I_{t-1}) \big)^2 \le K_{3t-1} \ \text{a.s. for every } t \in Z. \tag{46}
\]

Proof. Part (i) follows along the lines of Ingersoll (1984)[Theorem 1] and part (ii) follows along the lines of

Ingersoll (1984)[Theorem 3], applied to the conditional factor model (1).

For part (iii) we generalize Stambaugh (1983)[Theorem 2] to the case of time-varying loadings. From part

(ii)

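\[
\begin{aligned}
E(K_{2t-1}|I_{t-1}) &\ge E\Big( \sum_{i=1}^{N} (\mu_{it-1} - \lambda_{it-1}'\gamma_{t-1})^2 \Big| I_{t-1} \Big)
\ge \sum_{i=1}^{N} \big( E(\mu_{it-1}|I_{t-1}) - E(\lambda_{it-1}'\gamma_{t-1}|I_{t-1}) \big)^2 \\
&= \sum_{i=1}^{N} \big( E(\mu_{it-1}|I_{t-1}) - E(\lambda_{it-1}'|I_{t-1}) E(\gamma_{t-1}|I_{t-1}) - E([\lambda_{it-1} - E(\lambda_{it-1}|I_{t-1})]'\gamma_{t-1}|I_{t-1}) \big)^2.
\end{aligned}
\]

By our assumption it follows that $E(K_{2t-1}|I_{t-1}) < \infty$ a.s. Therefore, the factor asset pricing model assumed by the investor with the coarser information set $I_{t-1}$ satisfies the pricing errors bound

\[
\sum_{i=1}^{N} \big( E(\mu_{it-1}|I_{t-1}) - E(\lambda_{it-1}'|I_{t-1}) E(\gamma_{t-1}|I_{t-1}) \big)^2 \le K_{3t-1} < \infty \ \text{a.s.}
\]

for some $K_{3t-1}$ if $| \sum_{i=1}^{\infty} E([\lambda_{it-1} - E(\lambda_{it-1}|I_{t-1})]'\gamma_{t-1}|I_{t-1}) | < \infty$ a.s. In turn, this follows by

\[
\begin{aligned}
E \Big| \sum_{i=1}^{\infty} E([\lambda_{it-1} - E(\lambda_{it-1}|I_{t-1})]'\gamma_{t-1}|I_{t-1}) \Big|
&\le \sum_{i=1}^{\infty} E\Big( \big( [\lambda_{it-1} - E(\lambda_{it-1}|I_{t-1})]'[\lambda_{it-1} - E(\lambda_{it-1}|I_{t-1})] \big)^{\frac{1}{2}} (\gamma_{t-1}'\gamma_{t-1})^{\frac{1}{2}} \Big) \\
&\le \big( E(\gamma_{t-1}'\gamma_{t-1}) \big)^{\frac{1}{2}} \sum_{i=1}^{\infty} \big( E([\lambda_{it-1} - E(\lambda_{it-1}|I_{t-1})]'[\lambda_{it-1} - E(\lambda_{it-1}|I_{t-1})]) \big)^{\frac{1}{2}} \le C < \infty.
\end{aligned}
\]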

QED

Remark. The smoothness assumption on the loadings is trivially satisfied for the static case, namely $\lambda_{it-1} = \lambda_i$, which is the case examined by Stambaugh (1983). However, it will be more generally satisfied in a variety of dynamic set-ups. For instance, when $\lambda_{is} = \lambda_i(s)$ for some differentiable function $\lambda_i(\cdot)$, as in Assumption 9, and assuming for simplicity that the coarser information set simply means that information is acquired with some delay, so that $E(\lambda_{it-1}|I_{t-1}) = \lambda_{is-1}$ for some $s < t$, then $\lambda_{it-1} - E(\lambda_{it-1}|I_{t-1}) = (t-s)\lambda^{(1)}_{is^*}$ and the smoothness assumption can be expressed as

\[
E\big( \mathrm{tr}( \Lambda^{(1)\prime}_{t-1} \Lambda^{(1)}_{t-1} ) \big) \le C < \infty \ \text{for every } t \in Z.
\]

Remark. As indicated by Stambaugh (1983)[Theorem 3], a special case of a coarser information set consists of

no information, leading to a restriction on the unconditional pricing errors E(ait).

Remark. Stambaugh (1983)[Lemma 1] derives condition E(K2t−1|It−1) < ∞ under distributional assumptions

and static loadings.

8.3 Appendix C: Lemmas

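Lemma 1 Under Assumptions 1 and 3, as $N \to \infty$,

\[
\| \Lambda' a \| = O_p(N^{\frac{1}{2}}).
\]

Proof.

\[
\| \Lambda' a \| \le \sum_{i=1}^{N} |a_i| \, \| \lambda_i \| \le \Big( \sum_{i=1}^{N} a_i^2 \Big)^{\frac{1}{2}} \Big( \sum_{i=1}^{N} \| \lambda_i \|^2 \Big)^{\frac{1}{2}} = O_p(N^{\frac{1}{2}}).
\]

QED

Lemma 2 Under Assumptions 1 and 5, as $N \to \infty$,

\[
\| e' a \| = O_p(g_1^{\frac{1}{2}}(\Sigma_e)).
\]

Proof.

\[
e' a = O_p\Big( \Big( \sum_{i,j=1}^{N} a_i a_j E(e_{\cdot i} e_{\cdot j}') \Big)^{\frac{1}{2}} \Big)
= O_p\Big( \Big( \sum_{i,j=1}^{N} a_i a_j \sigma_{ij} \Big)^{\frac{1}{2}} \Big)
= O_p\big( (a'a)^{\frac{1}{2}} g_1^{\frac{1}{2}}(\Sigma_e) \big)
= O_p\big( g_1^{\frac{1}{2}}(\Sigma_e) \big).
\]

QED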

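Lemma 3 (central lemma) Under Assumptions 1-6, as $N \to \infty$,

(i)

\[
\frac{1}{T} \hat F' \Big( \frac{1}{NT} XX' - \frac{1}{T} \sigma^2 I_T \Big) \hat F = \hat U \equiv \Big( \hat V - \frac{1}{T} \sigma^2 I_r \Big) \to_p U > 0,
\]
\[
\frac{\hat F' F}{T} \Big( \frac{\Lambda'\Lambda}{N} \Big) \frac{F' \hat F}{T} \to_p U,
\]

where recall that $\hat U$ denotes the diagonal $r \times r$ matrix of eigenvalues of $\big( \frac{1}{NT} XX' - \frac{1}{T} \sigma^2 I_T \big)$ corresponding to $\hat F$, and $U$ denotes the diagonal $r \times r$ matrix of distinct eigenvalues of $\Sigma_\Lambda^{\frac{1}{2}} \Sigma_F \Sigma_\Lambda^{\frac{1}{2}}$.

(ii)

\[
\frac{\hat F' F}{T} \to_p Q \equiv U^{\frac{1}{2}} \Upsilon \Sigma_\Lambda^{-\frac{1}{2}},
\]

for a non-singular $Q$, where $\Upsilon$ is the $r \times r$ eigenvectors matrix of $\Sigma_\Lambda^{\frac{1}{2}} \Sigma_F \Sigma_\Lambda^{\frac{1}{2}}$.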

Proof. Consider case ai = 0 first. We will then show that the no-arbitrage condition implies that the ais will

be washed out as N diverges.

(i) The result follows given that

1

NTXX′ =

1

TF(

Λ′Λ

N)F′ +

ee′

NT+ op(1)→p

1

TFΣΛF′ +

σ2

TIT .

Note that, as highlighted in Bai (2003), Lemma D.1, $\frac{1}{NT} XX'$ and $\big( \frac{1}{NT} XX' - \frac{1}{T} \sigma^2 I_T \big)$ have the same set of eigenvectors, but different eigenvalues. In particular, the eigenvalues corresponding to $\hat F$ equal $\hat V$ and $\hat U$, respectively, for the two matrixes. Continuity of the eigenvalue function together with Slutzky theorem concludes, recalling that the non-zero eigenvalues of $T^{-1} F \Sigma_\Lambda F'$ coincide with the entire set of $r$ eigenvalues of $\Sigma_\Lambda^{\frac{1}{2}} \Sigma_F \Sigma_\Lambda^{\frac{1}{2}}$. The second part of (i) follows directly.
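The statement that subtracting a multiple of the identity leaves the eigenvectors unchanged and shifts every eigenvalue by the same amount can be checked directly. The short Python sketch below is a hypothetical illustration with arbitrary simulated data and an arbitrary value of `sigma2`; it is not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)
T, N, sigma2 = 5, 50, 1.3                      # illustrative sizes and noise variance
X = rng.standard_normal((T, N))

S = X @ X.T / (N * T)                          # XX'/NT
S_shifted = S - (sigma2 / T) * np.eye(T)       # XX'/NT - (sigma^2/T) I_T

val, vec = np.linalg.eigh(S)                   # eigenpairs of the unshifted matrix
shifted_val = val - sigma2 / T                 # claimed eigenvalues of the shifted matrix

# Each column of vec should satisfy S_shifted v = (g - sigma^2/T) v.
residual = S_shifted @ vec - vec * shifted_val
```

The residual is zero up to rounding, so the eigenvectors of the first matrix are also eigenvectors of the second, with eigenvalues lowered by $\sigma^2/T$.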

Now consider the general case when the ai 6= 0. Then, from X = 1Ta′ + FΛ′ + e,

1

NTXX′ =

1

TF(

Λ′Λ

N)F′ +

ee′

NT+ 1T

a′a

NT1′T +

1Ta′e′

NT+

ea1′TNT

+1Ta′ΛF′

NT+

FΛ′a1′TNT

+ op(1)

→p1

TFΣΛF′ +

σ2

TIT ,

by Lemmas 1 and 2. The rest of the proof follows along the same lines.

(ii) Consider case ai = 0 first. We adapt the proof of Bai (2003), Proposition 1. In fact, one needs to replace

XX′/NT with (XX′/NT −σ2IT /T ) to take into account the fact that the term ee′/NT is non-negligible when

T is fixed, and it would contribute to the non-zero eigenvalues (as N →∞) even though it is not related to the

common component.

In particular, given (XX′

NT− σ2

TIT

)F = F

(Vr −

σ2

TIr) = FU,

pre-multiplying both sides by T−1(Λ′Λ/N)12 F′, and re-arranging terms, yields

(Λ′Λ

N)12F′

T

(XX′

NT− σ2

TIT

)F = (

Λ′Λ

N)12F′F

T

Λ′Λ

N

F′F

T+ CN = (

Λ′Λ

N)12F′F

TU

re-written as

(BN + CND−1N )EN = ENU,

setting EN = DN (diag(D′NDN ))−12 and

BN = (Λ′Λ

N)12F′F

T(Λ′Λ

N)12 ,

CN = (Λ′Λ

N)12

(F′F

T

Λ′e′F

NT+

F′e

TN

Λ′F′F

T+

1

T 2F′(

ee′

N− σ2IT )F

)= op(1),

DN = (Λ′Λ

N)12F′F

T,


as DN is a.s. invertible for N large enough by part (i). One then follows the steps in Bai (2003), but replacing

ΣF with ΣF , obtaining BN →p Σ12ΛΣFΣ

12Λ and EN →p Υ′, and thus

F′F

T= (

Λ′Λ

N)−

12 EN (diag(D′NDN ))

12 →p Σ

− 12

Λ Υ′U12 .

The same result is obtained when ai 6= 0, the only difference being that CN is now given by

CN = (Λ′Λ

N)12

(F′F

T

Λ′e′F

NT+

F′e

TN

Λ′F′F

T+

1

T 2F′(

ee′

N− σ2IT )F

+F′

T(1T

a′a

NT1′T +

1Ta′e′

NT+

ea1′TNT

+1Ta′ΛF′

NT+

FΛ′a1′TNT

)F

)= op(1),

which is still op(1) by Lemmas 1 and 2. QED

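Lemma 4 (quantities for asymptotic covariance matrix) Under Assumptions 1-6, as $N \to \infty$,

(i)

\[
\hat\sigma^2 \equiv \frac{T}{T - r} V(r) \to_p \sigma^2.
\]

(ii)

\[
\hat\sigma^4 \equiv \Big( 3 + \frac{27}{T} \Big( \sum_{t=1}^{T} \Big( \frac{\hat F_t' \hat F_t}{T} \Big)^2 \Big) + \frac{18 r}{T} \Big)^{-1} \Big( \frac{1}{NT} \sum_{i=1}^{N} \sum_{t=1}^{T} \hat e_{it}^4 \Big) \to_p \sigma^4 + C_\kappa \kappa_4,
\]

for a constant $C_\kappa$ (function of $r$, $T$, $F$ and $H$) defined in (50) below.

(iii)

\[
\hat\Sigma_\Lambda \equiv \frac{\hat\Lambda'\hat\Lambda}{N} - \frac{\hat\sigma^2}{T} I_r \to_p H^{-1} \Sigma_\Lambda H^{-1\prime}.
\]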

Proof. Consider case ai = 0 first. We will then show that the no-arbitrage condition implies that the ais will


be washed out as N diverges. (i) From

eit = xit − λ′iFt = xit −(λ′iH

−1′ + (e′.iF

T)− λ′i(H−1′F′ − F′)

F

T

)(H′Ft + (Ft − H′Ft)

)= xit − λ′iFt − λ′iH−1′(Ft − H′Ft)−

((e′.iF

T)− λ′i(H−1′F′ − F′)

F

T

)Ft

= eit − λ′iH−1′(Ft − H′Ft)− (e′.iF

T)Ft + λ′i(H

−1′F′ − F′)F

TFt

yielding

V (r) =1

NT

N∑i=1

T∑t=1

e2it =

1

NT

N∑i=1

T∑t=1

e2it +

1

T 3

T∑t=1

F′tF′(

1

N

N∑i=1

e.ie′.i)FFt

+1

T

T∑t=1

(Ft − H′Ft)′H−1(

1

N

N∑i=1

λiλ′i)H

−1′(Ft − H′Ft)

+1

T 3

T∑t=1

F′tF′(FH−1 − F)(

1

N

N∑i=1

λiλ′i)(H

−1′F′ − F′)FFt

−21

NT

N∑i=1

T∑t=1

eit

(λ′iH

−1′(Ft − H′Ft) + (e′.iF

T)Ft − λ′i(H−1′F′ − F′)

FFt

T

)+2

1

NT

N∑i=1

T∑t=1

(λ′iH−1′(Ft − H′Ft))

((e′.iF

T)Ft − λ′i(H−1′F′ − F′)

FFt

T

)−2

1

NT

N∑i=1

T∑t=1

((e′.iF

T)Ft)(λ

′i(H

−1′F′ − F′)FFt

T)

= σ2 +σ2

Ttr(

F′F

T

F′F

T)− 2

σ2

T

T∑t=1

ι′t,TFFt

T+Op(N

− 12 )

= σ2 +σ2

Ttr(

F′F

T

F′F

T)− 2

σ2

Ttr(

F′F

T) +Op(N

− 12 ) = σ2(1− r

T) +Op(N

− 12 ).

Case $a_i \neq 0$ now easily follows by Lemmas 1 and 2. To prove this, simply replace $e_{it}$ in the above expressions with $e^*_{it} \equiv e_{it} + a_i$, and notice that all the parts involving the $a_i$ vanish asymptotically. For instance, the first of these terms satisfies

1

NT

N∑i=1

T∑t=1

(e∗it)2 =

1

NT

N∑i=1

T∑t=1

(eit + ai)2

1

NT

N∑i=1

T∑t=1

e2it +

1

NT

N∑i=1

T∑t=1

a2i +

2

NT

N∑i=1

T∑t=1

eitai =1

NT

N∑i=1

T∑t=1

e2it +O(N−1) +Op(N

− 12 ).

(ii) Set ai = 0. By developing N−1∑N

i=1 e4it, along the same lines as part (i), one obtains

1

NT

T∑t=1

N∑i=1

e4it =

1

NT

T∑t=1

N∑i=1

e4it +

1

NT

T∑t=1

N∑i=1

((e′.iF

T)Ft)

4 +6

NT

T∑t=1

N∑i=1

e2it((

e′.iF

T)Ft)

2

+4

NT

T∑t=1

N∑i=1

e3it((

e′.iF

T)Ft) +

4

NT

T∑t=1

N∑i=1

eit((e′.iF

T)Ft)

3 +Op(N− 1

2 ). (47)


Set

ct = (c1t · · · cTt)′ =1

TFHH′Ft =

1

T

((F′1HH′Ft) · · · (F′tHH′Ft) · · · (F′THH′Ft)

)′.

Then, the second term on the right hand side of (47) can be re-written as (NT )−1∑T

t=1

∑Ni=1((e′.iF/T )Ft)

4 =

(NT )−1∑T

t=1

∑Ni=1(

∑Ts=1 eiscst)

4 + op(1), with mean satisfying:

1

NT

T∑t=1

N∑i=1

E(T∑s=1

eiscst)4 =

1

NT

T∑t=1

N∑i=1

T∑s1,s2,s3,s4=1

κ4,iiii,s1,s2,s3,s4cs1tcs2tcs3tcs4t

+1

NT

T∑t=1

N∑i=1

T∑s1,s2,s3,s4=1

(σii,s1s2σii,s3s4 + σii,s1s3σii,s2s4 + σii,s1s4σii,s2s3)cs1tcs2tcs3tcs4t

→ κ41

T

T∑t=1

T∑s=1

c4st +

3σ4

T

T∑t=1

(

T∑s=1

c2st)

2,

and variance

Var

(1

NT

N∑i=1

T∑t=1

(

T∑u=1

eiucut)4

)=

1

N2T 2

N∑i,j=1

T∑t,s=1

Cov

((

T∑u=1

eiucut)4, (

T∑u=1

ejucus)4

)

=1

N2T 2

N∑i,j=1

T∑t,s=1

T∑u1,u2,u3,u4=1

T∑v1,v2,v3,v4=1

cu1tcu2tcu3tcu4tcv1scv2scv3scv4s

×Cov (eiu1eiu2eiu3eiu4 , ejv1ejv2ejv3ejv4)

=1

N2T 2

N∑i,j=1

T∑t,s=1

T∑u1,u2,u3,u4=1

T∑v1,v2,v3,v4=1

cu1tcu2tcu3tcu4tcv1scv2scv3scv4s

×

(κ8 (eiu1 , eiu2 , eiu3 , eiu4 , ejv1 , ejv2 , ejv3 , ejv4)

+

(6,2)∑κ6 (eiu1 , eiu2 , eiu3 , eiu4 , ejv1 , ejv2) Cov (ejv3 , ejv4)

+

(4,4)∑κ4 (eiu1 , eiu2 , ejv1 , ejv2)κ4 (eiu3 , eiu4 , ejv3 , ejv4)

+

(4,2,2)∑κ4 (eiu1 , eiu2 , ejv1 , ejv2) Cov (eiu3 , eiu4) Cov (ejv3 , ejv4)

+

(2,2,2,2)∑Cov (eiu1 , eiu2) Cov (eiu3 , ejv1) Cov (eiu4 , ejv2) Cov (ejv3 , ejv4)

),

(48)

where κ4 (·), κ6 (·), and κ8 (·) denote the fourth-, sixth-, and eighth-order mixed cumulants, respectively. Ex-

pression (48) is a consequence of the cumulants’ theorem (see Brillinger (2001)), whereby∑(ν1,ν2,...,νk) denotes

the sum over all possible partitions of a group of K random variables into k subgroups of size ν1, ν2, . . . , νk,

respectively. As an example,∑(6,2) defines the sum over all possible partitions of the group of eight random

variables eiu1 , eiu2 , eiu3 , eiu4 , ejv1 , ejv2 , ejv3 , ejv4 into two subgroups of size six and two, respectively. Moreover,


since E (eit) = E(e3it

)= 0, we do not need to consider further partitions in the above relation.33 Under our

assumptions it follows that the number of indecomposable partitions, in (48), is of order O(N), implying

Var

(1

NT

N∑i=1

T∑t=1

(T∑s=1

eiscst)4

)= O

(1

N

). (49)

Therefore, it follows that

1

NT

T∑t=1

N∑i=1

((e′iF

T)Ft)

4 →p κ41

T

T∑t=1

T∑s=1

c4st +

3σ4

T

T∑t=1

(T∑s=1

c2st)

2.

For the third term of (47) one obtains 6(NT )−1∑T

t=1

∑Ni=1 e

2it((e

′.iF/T )Ft)

2 = 6(NT )−1∑T

t=1

∑Ni=1 e

2it(∑T

s=1 eiscst)2+

op(1), and, along the same lines,

6

NT

T∑t=1

N∑i=1

E(e2it(

T∑s=1

eiscst)2) → 6

T(κ4 + 2σ4)(

T∑t=1

c2tt) +

6r

Tσ4,

where by easy calculationsT∑t=1

T∑s=1

c2st =

T∑t=1

ctt = r,T∑t=1

(T∑s=1

c2st)

2 =T∑t=1

c2tt.
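The "easy calculations" above rest on the fact that the $T \times T$ matrix with entries $c_{st}$ is a symmetric, idempotent matrix of rank $r$. The Python sketch below illustrates this under the assumption that $H$ is normalised so that $H'(F'F/T)H = I_r$, in which case the matrix of the $c_{st}$ equals the projection $F(F'F)^{-1}F'$; the simulated `F` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
T, r = 7, 3
F = rng.standard_normal((T, r))                # hypothetical T x r factor matrix

# Matrix with entries c_st, assuming H'(F'F/T)H = I_r so that F H H' F'/T = P_F.
C = F @ np.linalg.solve(F.T @ F, F.T)

sum_sq = (C ** 2).sum()                        # sum_t sum_s c_st^2
trace_c = np.trace(C)                          # sum_t c_tt
lhs = ((C ** 2).sum(axis=0) ** 2).sum()        # sum_t (sum_s c_st^2)^2
rhs = (np.diag(C) ** 2).sum()                  # sum_t c_tt^2
```

Idempotency gives $\sum_s c_{st}^2 = c_{tt}$ for every $t$, from which both identities follow: `sum_sq` and `trace_c` both equal $r$, and `lhs` equals `rhs`.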

Using again the cumulants’ theorem, it follows that

Var

(1

NT

N∑i=1

T∑t=1

e2it(

T∑s=1

eisccst)2

)=

1

N2T 2

N∑i,j=1

T∑t,s=1

Cov

(e2it(

T∑u=1

eiucut)2, e2

js(T∑u=1

ejucus)2

)

=1

N2T 2

N∑i,j=1

T∑t,s=1

T∑u1,u2=1

T∑v1,v2=1

cu1tcu2tcv1scv2sCov(e2iteiu1eiu2 , e

2jsejv1ejv2

)= O

(1

N

),

implying6

NT

T∑t=1

N∑i=1

e2it((

e′.iF

T)Ft)

2 →p6

T(κ4 + 2σ4)(

T∑t=1

c2tt) +

6r

Tσ4.

Along the same lines, for the fourth and fifth terms on the right hand side of (47) one obtains

4

NT

T∑t=1

N∑i=1

e3it((

e′.iF

T)Ft)→p

4

T(κ4 + 3σ4)(

T∑t=1

ctt) =4r

T(κ4 + 3σ4)

4

NT

T∑t=1

N∑i=1

eit((e′.iF

T)Ft)

3 →p4

T(κ4(

T∑t=1

c3tt) + 3σ4(

T∑t=1

c2tt)).

Collecting terms concludes the proof where

Cκ ≡(

1 +

∑Tt=1

∑Ts=1 c

4st

T+

6∑T

t=1 c2t,t

T+

4r

T+

4∑T

t=1 c3t,t

T

)(3 +

27

T(T∑t=1

c2t,t) +

18r

T

)−1. (50)

Case $a_i \neq 0$ results in exactly the same expressions, by Lemmas 1 and 2, and as before the proof follows replacing the $e_{it}$ with the $e^*_{it} = e_{it} + a_i$.

33By the cumulants’ theorem (Brillinger (2001)), evaluation of Cov (eiu1eiu2eiu3eiu4 , ejv1ejv2ejv3ejv4) requires considering theindecomposable partitions of the two sets eiu1 , eiu2 , eiu3 , eiu4, ejv1 , ejv2 , ejv3 , ejv4, where indecomposable means that there mustbe at least one subset that includes an element of both sets.


(iii) Set ai = 0. By (27)

1

N

N∑i=1

λiλ′i =

1

N

N∑i=1

(H−1λi +

F′e.iT

+1

TF′(F− FH−1)λi

)(H−1λi +

F′e.iT

+1


)′= H−1

( 1

N

N∑i=1

λiλ′i

)H−1′ +

1

T 2F′(

1

N

N∑i=1

e.ie′.i)F +

1

N

N∑i=1

H−1λi

( F′e.iT

+1


)′+

1

N

N∑i=1

F′e.iT

(H−1λi +

1


)′+

1

N

N∑i=1

1


(H−1λi +

F′e.iT

+1


)′= H−1

( 1

N

N∑i=1

λiλ′i

)H−1′ +

1

T 2F′(

1

N

N∑i=1

e.ie′.i)F +Op(N

− 12 ),

implying1

N

N∑i=1

λiλ′i →p H−1ΣΛH−1′ +

σ2

TIr.

Case ai 6= 0 follows by replacing the eit with the e∗it = eit + ai, implying e∗.i = e.i + ai1T , and using Lemmas 1

and 2. For instance,

1

N

N∑i=1

e∗.ie∗′.i =

1

N

N∑i=1

(e.i + ai1T )(e.i + ai1T )′ =1

N

N∑i=1

e.ie′.i +

1T1′TN

N∑i=1

a2i +

1

N

N∑i=1

aie.i1′T +

1

N

N∑i=1

ai1Te′.i

=1

N

N∑i=1

e.ie′.i +O(N−1) +Op(N

− 12 ).

QED

Remark. The following identity holds:

ΣΛ = U∗.

Remark. Based on Lemma 4 one can construct a plug-in consistent estimator of the asymptotic covariance

matrix.

Lemma 5 (generalization of consistency with rates for every k) Under Assumptions 1-6, as N →∞,

‖ Fkt − H′kFt ‖= Op(N

− 12 ), for every 1 ≤ k, t ≤ T,

setting Fkt ≡ UkF

kt , with

Ukt ≡ Vk −

σ2

TIk, and Hk ≡ HkUk = (

Λ′Λ

N)(

F′Fk

T)being a full-rank r × k matrix.

Proof. This follows from the proof to Theorem 1-(i). Pre-multiplying (52) by Uk, where we express all in

terms of a generic 1 ≤ k ≤ T ,

Fkt − H′kFt = T−1

T∑s=1

Fksζst + T−1

T∑s=1

Fksηst + T−1

T∑s=1

Fksηts, (51)

and then apply Lemma 3, given that ‖ F′F/T ‖= Op(1) and ‖ F′kFk/T ‖= O(1). QED


Lemma 6 (behaviour of $V(k)$) Under Assumptions 1-6,

(i) When $k \le r$:

\[
V(k) = \sigma^2 - \sigma^2 \frac{k}{T} + \sum_{s=k+1}^{r} u_s + O_p(N^{-\frac{1}{2}}),
\]

where the $u_s$ are the random (diagonal) elements of $U$, satisfying $u_1 \ge u_2 \ge \cdots \ge u_r > 0$ a.s.

(ii) When $k > r$:

\[
V(k) = \sigma^2 - \sigma^2 \frac{k}{T} + O_p(N^{-\frac{1}{2}}).
\]

Proof. Consider case $a_i = 0$, as the proof follows along the same lines when $a_i \neq 0$ by Lemmas 1 and 2.

(i) Case k ≤ r:

Given Λk = X′Fk(Fk′Fk)−1 = X′Fk/T , one gets:

λk′i =e′iF

k

T+ λ′i(

F ′F k

T), setting the r × k matrix Hk ≡ (Λ′Λ/N)(F′Fk/T )U−1

k of rank k,

yielding

ekit ≡ xit − λk′i Fkt = eit + λ′iFt − (

e′.iFk

T+ λ′i(

F′Fk

T))(H′kFt + (Fk

t − H′kFt))

= eit + λ′iFt −e′.iF

kFkt

T− λ′i(

F′Fk

T)(H′kFt +Op(N

−1/2)).

By Lemma 3 above, recalling that Fr = F with r ≥ k,

F′Fk

T=

F′F

T[Ik,0k×r−k]

′ →p Q′[Ik,0k×r−k]′,

recalling that Q = U12 ΥΣ

− 12

Λ . Then

(F′Fk

T)H′k →p Q′[Ik,0k×r−k]

′U−1k [Ik,0k×r−k]QΣΛ = Σ

− 12

Λ Υ′(Ik 0k×r−k

0r−k×k 0r−k×r−k)ΥΣ

12Λ,

obtaining

ekit = eit + λ′i(Ir −Σ− 1

2Λ Υ′(

Ik 0k×r−k0r−k×k 0r−k×r−k

)ΥΣ12Λ)Ft −

e′.iFkFk

t

T+Op(N

−1/2),

and implying

1

NT

T∑t=1

N∑i=1

(ekit)2 = V (k) =

1

NT

T∑t=1

N∑i=1

e2it +

1

NT

T∑t=1

N∑i=1

Fk′t Fk′e.iT

e′.iFkFk

t

T− 2

1

NT

T∑t=1

N∑i=1

eite′.iF

kFkt

T

+1

NT

T∑t=1

N∑i=1

F′t(Ir −Σ12ΛΥ′(

Ik 00 0

)ΥΣ− 1

2Λ )λiλ

′i(Ir −Σ

− 12

Λ Υ′(Ik 00 0

)ΥΣ12Λ)Ft +Op(N

−1/2)

→p σ2 + σ2 k

T− 2σ2 k

T+ trace

((Ir −Υ′(

Ik 0k×r−k0r−k×k 0r−k×r−k

)Υ)Σ12ΛΣFΣ

12Λ

)= σ2 − σ2 k

T+(trace[(

0k×k 0k×r−k0r−k×k Ir−k

)U])

= σ2 − σ2 k

T+

r∑s=k+1

us.


(ii) Case k > r:

In this case Hk = (Λ′ΛN )(F′Fk

T ) has full-row rank matrix, implying that its Moore-Penrose satisfies HkH+k =

Ir and (H+k )′H′k = Ir. By Proposition 1 and Lemma 3, V (k) = (NT )−1

∑Tt=1

∑Ni=1(ekit)

2 = (NT )−1∑T

t=1

∑Ni=1(ekit)

2

for

ekit ≡ xit − λk′i Fkt setting Λ =

X′Fk

T.

Then

ekit = eit −e′.iF

k

TFkt + λ′i(H

+k )′H′kFt − λ′i

F′Fk

TFkt

= eit −e′.iF

k

TFkt + λ′i(H

+k )′(H′kFt − Fk

t + Fkt )− λ′i

F′Fk

TFkt

= eit −e′.iF

k

TFkt + λ′i(H


t ) +

(λ′i(H

+k )′Fk

t − λ′iF′Fk

TFkt

)

= eit −e′.iF

k

TFkt + λ′i(H


t )− λ′i(H+k )′

((H′kF

′ − Fk′)Fk

T

)Fkt

= eit −e′.iF

k

TFkt + λi ×Op(N−

12 ),

where the $O_p(N^{-\frac{1}{2}})$ term does not depend on $i$. Thus

V (k) =1

NT

N∑i=1

T∑t=1

e2it +

1

T 3

T∑t=1

Fk′t Fk′(

1

N

N∑i=1

e.ie′.i)F

kFkt −

2

T 2

T∑t=1

Fk′t Fk′(

1

N

N∑i=1

e.ieit) +Op(N− 1

2 )

= σ2 +σ2

Ttr

(Fk′Fk

T

Fk′Fk

T

)− 2σ2

T 2

T∑t=1

Fk′t Fk′ιt,T +Op(N

− 12 )

→p σ2 +

σ2k

T− 2

σ2k

T= σ2(1− k

T).

QED
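The two regimes of Lemma 6 can be illustrated by simulation. The Python sketch below is not part of the proof: the data-generating process (Gaussian factors, loadings and homoskedastic errors), the sizes and the helper `v_profile` are all hypothetical, and `u_hat` is only a sample analogue of the diagonal of $U$, obtained by subtracting the known $\sigma^2/T$ from the leading eigenvalues.

```python
import numpy as np

def v_profile(X):
    """V(k) for k = 0, ..., T-1: sum of the T-k smallest eigenvalues of XX'/NT."""
    T, N = X.shape
    eigval = np.sort(np.linalg.eigvalsh(X @ X.T / (N * T)))[::-1]  # decreasing order
    tail_sums = np.cumsum(eigval[::-1])[::-1]   # tail_sums[k] = eigval[k:].sum()
    return tail_sums, eigval

rng = np.random.default_rng(4)
T, N, r, sigma2 = 6, 20000, 2, 2.0              # small T, large N
F = 2.0 * rng.standard_normal((T, r))
Lam = rng.standard_normal((N, r))
X = F @ Lam.T + np.sqrt(sigma2) * rng.standard_normal((T, N))

V, eigval = v_profile(X)
u_hat = eigval[:r] - sigma2 / T                 # sample analogue of u_1, ..., u_r
limit = sigma2 * (1 - np.arange(T) / T)         # sigma^2 (1 - k/T), the k >= r regime
for k in range(r):
    limit[k] += u_hat[k:r].sum()                # extra eigenvalue mass when k < r

max_gap = np.abs(V - limit).max()               # should be small for large N
```

For $k \ge r$ the profile decreases linearly in $k$ at rate $\sigma^2/T$, while for $k < r$ it jumps by the omitted eigenvalues, which is the feature exploited when selecting the number of factors.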

8.4 Appendix D: Proofs of Theorems

To simplify arguments, we first report the proofs for the static case, that is when $a_{is} = a_i$, $\lambda_{is} = \lambda_i$, $\sigma^2_t = \sigma^2$, $r_t = r$, and then illustrate their generalization to the time-varying case only when this adds further complexity to the arguments.

Proof of Theorem 1. Consider case $a_i = 0$, as the proof follows along the same lines when $a_i \neq 0$ by Lemmas 1 and 2.

The following identity holds (see Bai (2003), equation (D.1)):

Ft − H′Ft = U−1T−1T∑s=1

Fsζst + U−1T−1T∑s=1

Fsηst + U−1T−1T∑s=1

Fsηts. (52)

where, by Assumptions 6 and 7, for every given s, t = 1, · · · , T ,

ζst ≡e′s.et.N− E(

e′s.et.N

) = Op(N− 1

2 ), ηst ≡F′sΛ

′et.N

= Op(N− 1

2 ). (53)


The result follows noting that, by Lemma 3, ‖ U−1 ‖= Op(1), ‖ F′F/T ‖= Op(1) for N large enough, and

‖ F′F/T ‖= O(1) by construction.

(ii) By part (i), the first term on the right hand side of (52) satisfies

U−1

∑Ts=1 FsζstT

= U−1 1

T

T∑s=1

Fs(e′s.et.N− E(

e′s.et.N

))

= U−1 1

TF′(

eet.

N− E(

eet.

N)) = U−1 1

TH′F′(

eet.

N− E(

eet.

N)) + op(1),

where, by Lemma 3,

H→p ΣΛQ′U−1 = H = Σ12ΛΥ′U−

12 = Q−1. (54)

Noticing that

H′ΣFH = Ir,

one obtains √NU−1 1

T

T∑s=1

Fsζst →d N(0r, σ4U−2

T+

(µ4 − 2σ4)

T 2U−1H′FtF

′tHU−1),

as, by having limited the degree of cross-sectional heterogeneity of the moments of the eit as formalized in

Assumption 5, an identical limiting distribution is obtained when centering with respect to E(e′s.et.N )) or to its

large-N limit. The same reasoning applies to all other limiting distributions that follow.

In particular, to derive the asymptotic covariance matrix of√NU−1 1

T

∑Ts=1 Fsζst, consider that the (s, v)th

element of E(e.ieite′.jejt)− E(e.ieit)E(e′.jejt), for every 1 ≤ s, v ≤ T , satisfies

Cov(eiseit, ejvejt) = κ4,iijj,stvt + E(eisejv)E(eitejt) + E(eisejt)E(eitejv)

= κ4,iijj,stvt + σij,svσij,tt + σij,stσij,tv,

yielding, in matrix form,

Cov(e.ieit, e′.jejt) =

κ4,iijj,1t1t · · · κ4,iijj,1tT t

κ4,iijj,2t1t · · · κ4,iijj,2tT t...

......

κ4,iijj,T t1t · · · κ4,iijj,T tT t

+ σij,tt

σij,11 · · · σij,1Tσij,21 · · · σij,2T

......

...σij,T1 · · · σij,TT

+

σij,1t...

σij,T t

(σij,1t . . . σij,T t) .

Therefore,

1

N

N∑i=1

N∑j=1

Cov(e.ieit, e′.jejt) =

1

N

N∑i=1

κ4,iiii,ttttιtι′t + σii,tt

σii,11 · · · 0

0 σii,22 · · ·...

......

0 · · · σii,TT

+σ2ii,ttιtι

′t

+op(1)→ (κ4+σ4)ιtι′t+σ4IT .

For the second term of (52):

√NU−1 1

T

T∑s=1

Fsηst = U−1 1

T

T∑s=1

FsF′s

1√N

N∑i=1

λieit = U−1 F′F

T

1√N

N∑i=1

λieit →d N(0r, σ2U−1QΣΛQ′U−1),


where QΣΛQ′ = U. For the third term of (52):

√NU−1 1

T

T∑s=1

Fsηts =√NU−1 1

T

T∑s=1

FsF′t

1

N

N∑i=1

λieis = U−1 1

T

1√N

N∑i=1

T∑s=1

Fseisλ′iFt

= U−1 1

TF′(

1√N

N∑i=1

e.iλ′i)Ft = U−1 1

T(F′t ⊗ F′)

1√N

N∑i=1

(λi ⊗ e.i)

→d N(0r, σ2(

F′tΣΛFt

T)U−2),

where the asymptotic covariance matrix of T−1(F′t ⊗ F′) 1√N

∑Ni=1(λi ⊗ e.i) is obtained noticing that

T−1(F′t ⊗H′F′)(ΣΛ ⊗ σ2IT )(Ft ⊗ FH) = σ2(F′tΣΛFt)IT .

Given that the three terms are potentially correlated, we need the joint convergence of Assumption 7, as it

emerges by re-writing the right hand side of (52) in matrix form:

Ft − H′Ft = T−1U−1[F′, (F′F), (F′t ⊗ F′)

] 1

N

N∑i=1

(e.ieit − E(e.ieit))λieit

(λi ⊗ e.i)

(55)

= T−1U−1[F′, (F′F), (F′t ⊗ F′)

]At

1

N

N∑i=1

[(e.ieit − E(e.ieit))

(λi ⊗ e.i)

], (56)

where At is defined in (26). By Assumption 5 the eit have zero third-moments, implying that the covariance

matrix of ((e.ieit − E(e.ieit))′, (λi ⊗ e.i)

′)′ is block-diagonal. The diagonal elements have been established

above. When considering (55), one needs the asymptotic (cross) covariance matrix between λieit = (λi ⊗ eit)and (λ′j ⊗ e′.j), which automatically emerges by pre- and post-multiplying the asymptotic covariance matrix in

(25) by At and A′t, respectively, as indicated by (56) However, it can also be derived by direct calculations as

follows: by our assumptions1

N

N∑i,j=1

(λiλ′j ⊗ E(eite

′.j))→p σ

2(ΣΛ ⊗ ι′t,T ),

implying

1

TU−1(

F′F

T)

1

N

N∑i,j=1

(λiλ′j⊗E(eite

′.j))(Ft⊗F)U−1 →p

σ2

TU−1Q(ΣΛ⊗ι′t)(Ft⊗FH)U−1 =

σ2

TU−1QΣΛFtF

′tHU−1.

Let us now extend the proof to the time-varying case. We first show that

1

N

T0∑s=1

[(xs −ΛtFs)

′(xs −ΛtFs)− (xs −ΛsFs)′(xs −ΛsFs)

]ws(t) = op(1).

This implies that in terms of loadings we are in fact estimating Λt, thus constant within the interval Tt. In

fact,[(xs −ΛtFs)

′(xs −ΛtFs)− (xs −ΛsFs)′(xs −ΛsFs)

]=[−F′sΛ

′txs + F′sΛ

′sxs − x′sΛtFs + x′sΛsFs + F′sΛ

′tΛtFs − F′sΛ

′sΛsFs

]=[−F′s(Λt −Λs)

′xs − x′s(Λt −Λs)Fs + F′s(Λt −Λs)′ΛsFs + F′sΛ

′s(Λt −Λt)Fs + F′s(Λt −Λs)

′(Λt −Λs)Fs

].


By the mean value theorem (Λt −Λs) = Λ(1)s∗ (t− s) and, by summing across i, one gets the op(N) rate.

We now show how, within the interval Tt, we can replace σ2, σ4, κ4 by σ2t−1, σ4t−1, κ4t−1 throughout our

results. To simplify the exposition, it is convenient to assume that the set Tt equals 1, · · · , T (instead of

t− T + 1, · · · , t) without loss of generality. Starting with Lemma 3, we just need to consider ee′/N − σ2t−1IT

where now

ee′

N=

1

N

N∑i=1

σii,11 0 . . .

0 σii,22 . . .. . . . . . . . .. . . 0 σii,TT

12

η.iη′.i

σii,11 0 . . .

0 σii,22 . . .. . . . . . . . .. . . 0 σii,TT

12

+ op(1)

=1

N

N∑i=1

σii,tt + σii,1∗(1− t) 0 . . .

0 σii,tt + σii,2∗(2− t) . . .. . . . . . . . .. . . 0 σii,tt + σii,T ∗(T − t)

12

η.iη′.i

×

σii,tt + σii,1∗(1− t) 0 . . .


12

+ op(1)

= σ2t−1IT + op(1).

Let us first generalize Theorem 1. Regarding the first term of the corresponding asymptotic covariance matrix:

1

N

N∑i=1

N∑j=1

Cov(e.ieis, e′.jejs) =

1

N

N∑i=1

κ4,iiiissssιsι′s + σii,ss

σii,11 · · · 0

0 σii,22 · · ·...

......

0 · · · σii,TT

+ σ2ii,ssιsι

′s

+ op(1)

=1

N

N∑i=1

((κ4,iiiitttt + κ

(1)4,iiii,s∗(s− t))ιsι

′s + (σii,tt + σ

(1)ii,s∗(s− t))

×

σii,tt + σ

(1)ii,1∗(1− t) · · · 0

0 σii,tt + σ(1)ii,2∗(2− t) · · ·

......

...

0 · · · σii,tt + σ(1)ii,T ∗(T − t)

+ (σ2ii,tt + 2σii,s∗σ

(1)ii,s∗(t− s))ιsι

′s

)+ op(1)

→ (κ4t−1 + σ4t−1)ιsι′s + σ4t−1IT .


Regarding the second term of the asymptotic covariance matrix:

1

N

N∑i=1

N∑j=1

(λit−1λ′jt−1 ⊗ E(e.ie

′.j)) =

1

N

N∑i=1


′.i)) + op(1)

=1

N

N∑i=1

λit−1λ′it−1 ⊗

σii,11 0 . . .

0 σii,22 . . .. . . . . . . . .. . . 0 σii,TT

+ op(1)

=1

N

N∑i=1

λit−1λ′it−1 ⊗

σii,tt + σii,1∗(1− t) 0 . . .


+ op(1)

= (ΣΛt−1 ⊗ σ2t−1IT ) + op(1).

Finally, regarding the non-zero covariance term:

1

N

N∑i,j=1

(λit−1λ′jt−1 ⊗ E(eise

′.j)) = σ2

t−1(ΣΛt−1 ⊗ ι′s).

QED

Proof of Theorem 2. The result immediately follows from Lemma 4, noting that F′tΣΛQ′ = F′tHH−1ΣΛH−1′

and F′tΣΛFt = F′tHH−1ΣΛH−1′H′Ft.

The time-varying case requires generalizing Lemma 4. Then, part (i) of Lemma 4 follows immediately by the above arguments. For part (ii), one needs to consider terms such as

1

NT

T∑h=1

N∑i=1

E(

T∑s=1

eiscsh)4 =1

NT

T∑h=1

N∑i=1

T∑s1,s2,s3,s4=1

κ4,iiii,s1,s2,s3,s4cs1hcs2hcs3hcs4h

+1

NT

T∑h=1

N∑i=1

T∑s1,s2,s3,s4=1

(σii,s1s2σii,s3s4 + σii,s1s3σii,s2s4 + σii,s1s4σii,s2s3)cs1hcs2hcs3hcs4h

=1

NT

T∑h=1

N∑i=1

T∑s=1

κ4,iiii,ssssc4sh +

T∑s1,s2=1

3σii,s1s1σii,s2s2c2s1hc

2s2h

+ o(1)

=1

NT

T∑h=1

N∑i=1

T∑s=1

(κ4,iiii,tttt + κ(1)4,iiii,s∗(s− t))c

4sh +

T∑s1,s2=1

3((σii,tt + (σ(1)ii,s∗1

(s1 − t))(σii,tt + (σ(1)ii,s∗2

(s2 − t)))c2s1hc

2s2h

+ o(1)

= κ4t−11

T

T∑h=1

T∑s=1

c4sh +

3σ4t−1

T

T∑t=1

(T∑s=1

c2st)

2 + o(1).

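The remaining four terms of part (ii) follow along the same lines. Finally, by the same arguments, part (iii) easily follows, yielding

\[
\frac{1}{N} \sum_{i=1}^{N} \hat\lambda_{it-1} \hat\lambda_{it-1}' \to_p H_{t-1}^{-1} \Sigma_{\Lambda t-1} H_{t-1}^{-1\prime} + \frac{\sigma^2_{t-1}}{T} I_{r_{t-1}}.
\]

The proof is concluded setting $\kappa_{4t-1} = 0$. QED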


Proof of Theorem 3. To show that $\mathrm{Prob}(PC(k) < PC(r)) \to 0$ for any $k \neq r$, consider at first case $k < r$, where $PC(k) - PC(r) = V^*(k) - V^*(r) - (r - k) g(N)$. Then

\[
\mathrm{Prob}\big( V^*(k) - V^*(r) < (r - k) g(N) \big) \to 0,
\]

because $g(N) \downarrow 0$ whereas $V^*(k) - V^*(r)$ has a strictly (a.s.) positive limit. Consider now case $k > r$. Then

\[
\mathrm{Prob}(PC(k) < PC(r)) = \mathrm{Prob}\big( V^*(r) - V^*(k) > (k - r) g(N) \big) \to 0,
\]

because $V^*(r) - V^*(k) = O_p(N^{-\frac{1}{2}})$ whereas $N^{-\frac{1}{2}} / g(N) = o(1)$.

In the time-varying case one needs to evaluate the limit of terms such as $N^{-1} \sum_{i=1}^{N} e_{is}^2$, $N^{-1} \sum_{i=1}^{N} e_{\cdot i} e_{\cdot i}'$ and $N^{-1} \sum_{i=1}^{N} e_{\cdot i} e_{it}$, which follow by our previous arguments, with their limits being functions of $\sigma^2_{t-1}$ and $\sigma^4_{t-1}$.

QED
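The selection argument can be sketched in code. The Python example below is a hypothetical illustration only: the definition of $V^*(k)$ is not given in this appendix, so the sketch assumes the rescaling $V^*(k) = V(k)\,T/(T-k)$, which makes the limit in Lemma 6 (ii) free of $k$; the penalty $g(N) = \log(N)/\sqrt{N}$ is likewise an assumed choice that satisfies $g(N) \downarrow 0$ and $N^{-1/2}/g(N) = o(1)$. The function name `select_num_factors` and the simulated data are also assumptions.

```python
import numpy as np

def select_num_factors(X, k_max, g):
    """Minimise PC(k) = V*(k) + k g(N) over k = 1, ..., k_max (assumed form)."""
    T, N = X.shape
    eigval = np.sort(np.linalg.eigvalsh(X @ X.T / (N * T)))[::-1]  # decreasing order
    pc = []
    for k in range(1, k_max + 1):
        V_k = eigval[k:].sum()                  # V(k) from Proposition 1 (ii)
        V_star = V_k * T / (T - k)              # assumed rescaling, not from the source
        pc.append(V_star + k * g(N))
    pc = np.array(pc)
    return 1 + int(np.argmin(pc)), pc

rng = np.random.default_rng(5)
T, N, r = 10, 5000, 2                           # fixed small T, large N
F = 2.0 * rng.standard_normal((T, r))
Lam = rng.standard_normal((N, r))
X = F @ Lam.T + rng.standard_normal((T, N))

k_hat, pc = select_num_factors(X, k_max=T - 2, g=lambda n: np.log(n) / np.sqrt(n))
```

With strong factors and large $N$, under-fitting ($k < r$) is penalised by the omitted eigenvalues and over-fitting ($k > r$) by the term $k\,g(N)$, so the criterion is minimised at `k_hat = r` in this design.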

Proof of Theorem 4. Consider case $a_i = 0$, as the proof follows along the same lines when $a_i \neq 0$ by Lemmas 1 and 2.

(ii) Consistency of γ follows by Slutzky theorem and Theorem 1 as:

γ =1

T

T∑t=1

Ft →p1

T

T∑t=1

H′Ft = H′γP .

(iii) To derive the limiting distribution of the risk premia estimator, by the same steps adopted for the proof

of Theorem 1,

γ − H′γP =F′1TT− H′F′1T

T= U−1 F′

T(ee′

N− σ2IT )

1TT

+ U−1 F′F

T

Λ′e′1TNT

+ U−1 F′

T

eΛ

NF.

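Regarding the first term on the right hand side:
\[
\sqrt{N}\,\hat U^{-1}\frac{\hat F'}{T}\Big(\frac{ee'}{N}-\sigma^{2}I_T\Big)\frac{1_T}{T}\to_d N\Big(0_r,\ \frac{1}{T^{2}}U^{-1}H'\big((\kappa_4+\sigma_4)\Sigma_F+\sigma_4\bar F\bar F'\big)HU^{-1}\Big),
\]
where, to derive the corresponding asymptotic covariance matrix, one needs to evaluate, for every 1 ≤ s, v ≤ T,
\[
\begin{aligned}
\mathrm{Cov}(e_{is}e_{it},e_{jv}e_{jr})&=\kappa_{4,iijj,stvr}+E(e_{is}e_{jv})E(e_{it}e_{jr})+E(e_{is}e_{jr})E(e_{it}e_{jv})\\
&=\kappa_{4,iijj,stvr}+\sigma_{ij,sv}\sigma_{ij,tr}+\sigma_{ij,sr}\sigma_{ij,tv},
\end{aligned}
\]
yielding, in matrix form,
\[
\mathrm{Cov}(e_{\cdot i}e_{it},e_{\cdot j}'e_{jr})=
\begin{pmatrix}
\kappa_{4,iijj,1t1r}&\cdots&\kappa_{4,iijj,1tTr}\\
\kappa_{4,iijj,2t1r}&\cdots&\kappa_{4,iijj,2tTr}\\
\vdots&\ddots&\vdots\\
\kappa_{4,iijj,Tt1r}&\cdots&\kappa_{4,iijj,TtTr}
\end{pmatrix}
+\sigma_{ij,tr}
\begin{pmatrix}
\sigma_{ij,11}&\cdots&\sigma_{ij,1T}\\
\sigma_{ij,21}&\cdots&\sigma_{ij,2T}\\
\vdots&\ddots&\vdots\\
\sigma_{ij,T1}&\cdots&\sigma_{ij,TT}
\end{pmatrix}
+
\begin{pmatrix}
\sigma_{ij,1t}\\ \vdots\\ \sigma_{ij,Tt}
\end{pmatrix}
\begin{pmatrix}\sigma_{ij,1r}&\cdots&\sigma_{ij,Tr}\end{pmatrix}.
\]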


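Therefore,
\[
\begin{aligned}
&T^{-2}\sum_{t=1}^{T}\sum_{r=1}^{T}\frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{N}\mathrm{Cov}(e_{\cdot i}e_{it},e_{\cdot j}'e_{jr})\\
&\quad=T^{-2}\sum_{t=1}^{T}\sum_{r=1}^{T}\frac{1}{N}\sum_{i=1}^{N}\Big(\kappa_{4,iiii,ttrr}\,\iota_t\iota_r'+\sigma_{ii,tr}
\begin{pmatrix}
\sigma_{ii,11}&\cdots&0\\
0&\sigma_{ii,22}&\cdots\\
\vdots&\ddots&\vdots\\
0&\cdots&\sigma_{ii,TT}
\end{pmatrix}
+\sigma_{ii,tt}\sigma_{ii,rr}\,\iota_t\iota_r'\Big)+o_p(1)\\
&\quad\to\frac{(\kappa_4+\sigma_4)}{T}I_T+\sigma_4\frac{1_T1_T'}{T^{2}}.
\end{aligned}
\]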

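For the second term:
\[
\sqrt{N}\,\hat U^{-1}\frac{\hat F'F}{T}\frac{\Lambda'e'1_T}{NT}\to_d N\Big(0_r,\ \frac{\sigma^{2}}{T}U^{-1}Q\Sigma_\Lambda Q'U^{-1}\Big),
\]
as
\[
\begin{aligned}
\frac{1}{NT^{2}}\sum_{i=1}^{N}\sum_{j=1}^{N}\lambda_i\lambda_j'\,\mathrm{Cov}(1_T'e_{\cdot i},e_{\cdot j}'1_T)
&=\frac{1}{NT^{2}}\sum_{t=1}^{T}\sum_{r=1}^{T}\sum_{i=1}^{N}\sum_{j=1}^{N}\lambda_i\lambda_j'\,\mathrm{Cov}(e_{it},e_{jr})\\
&=\frac{1}{NT^{2}}\sum_{t=1}^{T}\sum_{r=1}^{T}\sum_{i=1}^{N}\sum_{j=1}^{N}\lambda_i\lambda_j'\,\sigma_{ij,tr}\\
&=\frac{1}{NT^{2}}\sum_{t=1}^{T}\sum_{r=1}^{T}\sum_{i=1}^{N}\lambda_i\lambda_i'\,\sigma_{ii,tr}+o_p(1)\to_p\frac{\sigma^{2}}{T}\Sigma_\Lambda,
\end{aligned}
\]
and for the third term:
\[
\sqrt{N}\,\hat U^{-1}\frac{\hat F'}{T}\frac{e\Lambda}{N}\bar F\to_d N\Big(0_r,\ \frac{\sigma^{2}\bar F'\Sigma_\Lambda\bar F}{T}U^{-2}\Big),
\]
noticing that
\[
\sqrt{N}\,\hat U^{-1}\frac{\hat F'}{T}\frac{e\Lambda}{N}\bar F=\hat U^{-1}\Big(\bar F'\otimes\frac{\hat F'}{T}\Big)\frac{1}{\sqrt{N}}\sum_{i=1}^{N}(\lambda_i\otimes e_{\cdot i}).
\]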

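It remains to obtain the covariance terms, with the only non-zero term being
\[
\frac{1}{T}\sum_{t=1}^{T}\hat U^{-1}\frac{\hat F'F}{T}\,\frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{N}\big(\lambda_i\lambda_j'\otimes E(e_{it}e_{\cdot j}')\big)\Big(\bar F\otimes\frac{\hat F}{T}\Big)\hat U^{-1}\to_p\frac{\sigma^{2}}{T}U^{-1}Q\Sigma_\Lambda\bar F\bar F'HU^{-1}.
\]
Finally, the expression for the asymptotic covariance matrix of $\hat\gamma$ simplifies recalling the identities QΣΛQ′ = U and H′ΣFH = Ir. The above four quantities make up the matrix CT.

(iv) Consistency of $\hat C_T$ easily follows along the lines of the proof of Theorem 2, recalling the identity QH = Ir, replacing $H'\bar F$ by the sample mean of the estimated factors $\hat F_t$ and recalling that $\hat\Sigma_\Lambda\to_p H^{-1}\Sigma_\Lambda H^{-1\prime}$. QED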
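The limit ((κ4 + σ4)/T) I_T + σ4 1_T 1_T′/T² used for the first term can be checked by simulation in the simplest setting. The sketch below assumes Gaussian errors that are iid across i and t, so that κ4 = 0 and σ4 = σ⁴; the sample sizes, seed and variable names are illustrative choices, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
T, N, reps, sigma2 = 3, 50, 20000, 1.5

# e has shape (reps, T, N): Gaussian, iid across i and t, variance sigma2.
e = rng.normal(scale=np.sqrt(sigma2), size=(reps, T, N))
ones = np.ones(T)

# v = sqrt(N) * (e e'/N - sigma2 I_T) 1_T / T, one T-vector per replication.
ee = e @ e.transpose(0, 2, 1) / N                        # (reps, T, T)
v = np.sqrt(N) * (ee - sigma2 * np.eye(T)) @ ones / T    # (reps, T)

empirical = np.cov(v, rowvar=False)

# Gaussian case: kappa4 = 0 and sigma4 = sigma^4 (assumption of this sketch).
sigma4 = sigma2 ** 2
theoretical = sigma4 / T * np.eye(T) + sigma4 * np.outer(ones, ones) / T**2
```

Because the summands are iid across i, the covariance of v equals the stated expression for every N in this design, so the comparison is affected only by Monte Carlo error.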

Proof of Theorem 5. Define the following function of an arbitrary, full column rank, T × r matrix G:
\[
m(G;t)\equiv\frac{1}{r_f}-\frac{1}{r_f}\Big(\frac{1_T'}{T}G\Big)\Big(\frac{G'M_{1_T}G}{T}\Big)^{-1}\Big(G'\big(\iota_t-\frac{1_T}{T}\big)\Big).
\]
Then $\hat m_{t-1,t}=m(\hat F;t)$ and $m^{P}_{t-1,t}=m(F;t)$. Notice that $m^{P}_{t-1,t}=m(FC;t)$ for any non-singular r × r matrix C, including the special case C = H.
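The invariance of m(G; t) to a non-singular rotation of the factors is easy to verify numerically. The sketch below assumes that M_{1_T} is the demeaning projector I_T − 1_T 1_T′/T, which is not restated in this proof; the function name `sdf_m`, the value of the gross risk-free rate and the test matrices are hypothetical.

```python
import numpy as np

def sdf_m(G, t, rf=1.01):
    """Evaluate m(G; t) for a T x r matrix G and a period index t in {0, ..., T-1}."""
    T = G.shape[0]
    ones = np.ones(T)
    iota_t = np.zeros(T)
    iota_t[t] = 1.0
    # Assumed definition: M_{1_T} = I_T - 1_T 1_T' / T (demeaning projector).
    M = np.eye(T) - np.outer(ones, ones) / T
    mean_G = G.T @ ones / T          # (1_T' G / T)', an r-vector
    S = G.T @ M @ G / T              # G' M G / T, r x r
    rhs = G.T @ (iota_t - ones / T)  # G'(iota_t - 1_T / T)
    return 1.0 / rf - mean_G @ np.linalg.solve(S, rhs) / rf

rng = np.random.default_rng(2)
G = rng.normal(size=(6, 2))
C = np.array([[2.0, 1.0], [0.5, 3.0]])   # any non-singular r x r matrix
m_original = sdf_m(G, t=2)
m_rotated = sdf_m(G @ C, t=2)
```

The two values coincide because C cancels between the mean term, the inverse of the centred second-moment matrix and the final vector, which is why the rotation H does not need to be known to estimate the stochastic discount factor.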


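Given that T is fixed, we can view vec(F′) as a vector of parameters, and use the delta method, in particular expanding $m(\hat F;t)$ around m(H′F; t), to derive the limiting distribution of $\hat m_t$. Considering the first-order differential of m(G; t) with respect to the arbitrary argument (matrix) G:
\[
\begin{aligned}
dm(G;t)={}&-\frac{1}{r_f}\Big(\frac{1_T'}{T}dG\Big)\Big(\frac{G'M_{1_T}G}{T}\Big)^{-1}\Big(G'\big(\iota_t-\frac{1_T}{T}\big)\Big)\\
&-\frac{1}{r_f}\Big(\frac{1_T'}{T}G\Big)\,d\Big(\frac{G'M_{1_T}G}{T}\Big)^{-1}\Big(G'\big(\iota_t-\frac{1_T}{T}\big)\Big)
-\frac{1}{r_f}\Big(\frac{1_T'}{T}G\Big)\Big(\frac{G'M_{1_T}G}{T}\Big)^{-1}\Big(dG'\big(\iota_t-\frac{1_T}{T}\big)\Big)\\
={}&-\frac{1}{r_f}\Big(G'\big(\iota_t-\frac{1_T}{T}\big)\Big)'\Big(\frac{G'M_{1_T}G}{T}\Big)^{-1}\Big(\frac{1_T'}{T}\otimes I_r\Big)d\mathrm{vec}\,G'
-\frac{1}{r_f}\Big(\frac{1_T'}{T}G\Big)\Big(\frac{G'M_{1_T}G}{T}\Big)^{-1}\Big(\big(\iota_t'-\frac{1_T'}{T}\big)\otimes I_r\Big)d\mathrm{vec}\,G'\\
&+\frac{1}{r_f}\Big(\frac{1_T'}{T}G\Big(\frac{G'M_{1_T}G}{T}\Big)^{-1}\otimes\big(\iota_t-\frac{1_T}{T}\big)'G\Big(\frac{G'M_{1_T}G}{T}\Big)^{-1}\Big)\big(G'M_{1_T}\otimes I_r\big)d\mathrm{vec}\,G'\\
&+\frac{1}{r_f}\Big(\big(\iota_t-\frac{1_T}{T}\big)'G\Big(\frac{G'M_{1_T}G}{T}\Big)^{-1}\otimes\frac{1_T'}{T}G\Big(\frac{G'M_{1_T}G}{T}\Big)^{-1}\Big)\big(G'M_{1_T}\otimes I_r\big)d\mathrm{vec}\,G'.
\end{aligned}
\]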

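Therefore
\[
\begin{aligned}
\frac{\partial m_t(F;t)}{\partial\mathrm{vec}\,G'}
={}&\frac{1}{r_f}\Big(-\Big(\frac{1_T}{T}\otimes I_r\Big)\Omega^{-1}\Big(F'\big(\iota_t-\frac{1_T}{T}\big)\Big)-\Big(\big(\iota_t-\frac{1_T}{T}\big)\otimes I_r\Big)\Omega^{-1}\Big(F'\frac{1_T}{T}\Big)\\
&+(M_{1_T}F\otimes I_r)\Big(\Omega^{-1}F'\frac{1_T}{T}\otimes\Omega^{-1}F'\big(\iota_t-\frac{1_T}{T}\big)\Big)
+(M_{1_T}F\otimes I_r)\Big(\Omega^{-1}F'\big(\iota_t-\frac{1_T}{T}\big)\otimes\Omega^{-1}F'\frac{1_T}{T}\Big)\Big)\\
={}&\frac{1}{r_f}\Big(-\Big(\frac{1_T}{T}\otimes I_r\Big)\Omega^{-1}\Big(F'\big(\iota_t-\frac{1_T}{T}\big)\Big)-\Big(\big(\iota_t-\frac{1_T}{T}\big)\otimes I_r\Big)\Omega^{-1}\Big(F'\frac{1_T}{T}\Big)\\
&+(M_{1_T}F\Omega^{-1}F'\otimes\Omega^{-1}F')\Big[\Big(\frac{1_T}{T}\otimes\big(\iota_t-\frac{1_T}{T}\big)\Big)+\Big(\big(\iota_t-\frac{1_T}{T}\big)\otimes\frac{1_T}{T}\Big)\Big]\Big),
\end{aligned}
\]
satisfying
\[
\frac{\partial m(\hat F;t)}{\partial\mathrm{vec}\,G'}\to_p(I_T\otimes H^{-1})\frac{\partial m(F;t)}{\partial\mathrm{vec}\,G'}.
\]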

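We now evaluate the asymptotic covariance matrix of $\hat F'-H'F'$, where
\[
\hat F'-H'F'=\hat U^{-1}\frac{\hat F'}{T}\Big(\frac{ee'}{N}-\sigma^{2}I_T\Big)+\hat U^{-1}\frac{\hat F'F}{T}\frac{\Lambda'e'}{N}+\hat U^{-1}\frac{\hat F'}{T}\frac{e\Lambda}{N}F',
\]
implying
\[
\begin{aligned}
\mathrm{vec}(\hat F'-H'F')
={}&\mathrm{vec}\Big(\hat U^{-1}\frac{\hat F'}{T}\Big(\frac{ee'}{N}-\sigma^{2}I_T\Big)+\hat U^{-1}\frac{\hat F'F}{T}\frac{\Lambda'e'}{N}+\hat U^{-1}\frac{\hat F'}{T}\frac{e\Lambda}{N}F'\Big)\\
={}&\Big(I_T\otimes\hat U^{-1}\frac{\hat F'}{T}\Big)\frac{1}{N}\sum_{i=1}^{N}\big((e_{\cdot i}\otimes e_{\cdot i})-\sigma^{2}\mathrm{vec}(I_T)\big)\\
&+\Big(I_T\otimes\hat U^{-1}\frac{\hat F'F}{T}\Big)\frac{1}{N}\sum_{i=1}^{N}(e_{\cdot i}\otimes\lambda_i)
+\Big(F\otimes\hat U^{-1}\frac{\hat F'}{T}\Big)\frac{1}{N}\sum_{i=1}^{N}(\lambda_i\otimes e_{\cdot i}).
\end{aligned}
\]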
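The vectorisation step above relies on the rule vec(ABC) = (C′ ⊗ A) vec(B) together with vec(ab′) = b ⊗ a. A small numerical sketch of the three terms follows; the dimensions and the random matrices standing in for U⁻¹F′/T and U⁻¹F′F/T are hypothetical, and vec is taken to stack columns.

```python
import numpy as np

def vec(A):
    """Stack the columns of A (column-major vec operator)."""
    return A.reshape(-1, order="F")

rng = np.random.default_rng(3)
T, N, r = 4, 7, 2
A = rng.normal(size=(r, T))      # stands in for U^{-1} F'/T
B = rng.normal(size=(r, r))      # stands in for U^{-1} F'F/T
e = rng.normal(size=(T, N))      # T x N errors; column i is e_{.i}
Lam = rng.normal(size=(N, r))    # row i is lambda_i'
F = rng.normal(size=(T, r))

# First term: vec(A e e') = (I_T kron A) sum_i (e_i kron e_i).
lhs1 = vec(A @ (e @ e.T))
rhs1 = np.kron(np.eye(T), A) @ sum(np.kron(e[:, i], e[:, i]) for i in range(N))

# Second term: vec(B Lam' e') = (I_T kron B) sum_i (e_i kron lambda_i).
lhs2 = vec(B @ Lam.T @ e.T)
rhs2 = np.kron(np.eye(T), B) @ sum(np.kron(e[:, i], Lam[i]) for i in range(N))

# Third term: vec(A e Lam F') = (F kron A) sum_i (lambda_i kron e_i).
lhs3 = vec(A @ e @ Lam @ F.T)
rhs3 = np.kron(F, A) @ sum(np.kron(Lam[i], e[:, i]) for i in range(N))
```

The order of the Kronecker factors differs between the second and third terms, which is what later brings in the commutation matrices when the cross-covariances are evaluated.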


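Then
\[
\begin{aligned}
&\mathrm{Var}\big(\sqrt{N}\,\mathrm{vec}(\hat F'-H'F')\big)\\
&=\Big(I_T\otimes\hat U^{-1}\frac{\hat F'F}{T}\Big)\frac{1}{N}\sum_{i=1}^{N}E\big[(e_{\cdot i}\otimes\lambda_i)(e_{\cdot i}'\otimes\lambda_i')\big]\Big(I_T\otimes\frac{F'\hat F}{T}\hat U^{-1}\Big)
+\Big(F\otimes\hat U^{-1}\frac{\hat F'}{T}\Big)\frac{1}{N}\sum_{i=1}^{N}E\big[(\lambda_i\otimes e_{\cdot i})(\lambda_i'\otimes e_{\cdot i}')\big]\Big(F'\otimes\frac{\hat F}{T}\hat U^{-1}\Big)\\
&\quad+\Big(I_T\otimes\hat U^{-1}\frac{\hat F'F}{T}\Big)\frac{1}{N}\sum_{i=1}^{N}E\big[(e_{\cdot i}\otimes\lambda_i)(\lambda_i'\otimes e_{\cdot i}')\big]\Big(F'\otimes\frac{\hat F}{T}\hat U^{-1}\Big)
+\Big(F\otimes\hat U^{-1}\frac{\hat F'}{T}\Big)\frac{1}{N}\sum_{i=1}^{N}E\big[(\lambda_i\otimes e_{\cdot i})(e_{\cdot i}'\otimes\lambda_i')\big]\Big(I_T\otimes\frac{F'\hat F}{T}\hat U^{-1}\Big)\\
&\quad+\Big(I_T\otimes\hat U^{-1}\frac{\hat F'}{T}\Big)\frac{1}{N}\sum_{i=1}^{N}E\big[\big((e_{\cdot i}\otimes e_{\cdot i})-\sigma^{2}\mathrm{vec}(I_T)\big)\big((e_{\cdot i}'\otimes e_{\cdot i}')-\sigma^{2}\mathrm{vec}'(I_T)\big)\big]\Big(I_T\otimes\frac{\hat F}{T}\hat U^{-1}\Big)+o_p(1)\\
&\to_p\Big(I_T\otimes U^{-1}\frac{H'F'}{T}\Big)U_e\Big(I_T\otimes\frac{FH}{T}U^{-1}\Big)+(\sigma^{2}I_T\otimes U^{-1})+\Big(F\Sigma_\Lambda F'\otimes\frac{\sigma^{2}}{T}U^{-2}\Big)\\
&\quad+\Big(I_T\otimes U^{-1}\frac{H'F'F}{T}\Big)(\sigma^{2}I_T\otimes\Sigma_\Lambda)K_{Tr}\Big(F'\otimes\frac{FH}{T}U^{-1}\Big)
+\Big(F\otimes U^{-1}\frac{H'F'}{T}\Big)(\Sigma_\Lambda\otimes\sigma^{2}I_T)K_{rT}\Big(I_T\otimes\frac{F'FH}{T}U^{-1}\Big).
\end{aligned}
\]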

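Re-arranging terms, one obtains
\[
\begin{aligned}
\mathrm{Var}\big(\sqrt{N}\,\mathrm{vec}(\hat F'-H'F')\big)\to{}&\Big(I_T\otimes U^{-1}\frac{H'F'}{T}\Big)U_e\Big(I_T\otimes\frac{FH}{T}U^{-1}\Big)+(\sigma^{2}I_T\otimes U^{-1})+\Big(F\Sigma_\Lambda F'\otimes\frac{\sigma^{2}}{T}U^{-2}\Big)\\
&+\Big(\sigma^{2}\frac{FH}{T}U^{-1}\otimes H'F'\Big)K_{rT}+K_{Tr}\Big(\sigma^{2}U^{-1}\frac{H'F'}{T}\otimes FH\Big),
\end{aligned}
\]
using the property $K_{pm}(A\otimes B)=(B\otimes A)K_{qn}$ for every m × n matrix A and p × q matrix B, and $K_{ab}'=K_{ab}^{-1}=K_{ba}$ (see Magnus & Neudecker (2007)[Chapter 3, Section 7, Theorem 9]), and using the identities $H'\Sigma_FH=I_r$ and $U^{-1}H^{-1}\Sigma_\Lambda H^{-1\prime}U^{-1}=U^{-1}$, in particular in the second variance term above, and $U^{-1}H'\Sigma_F\Sigma_\Lambda=U^{-1}H'\Sigma_FHH^{-1}\Sigma_\Lambda=U^{-1}H^{-1}\Sigma_\Lambda=H'$ in the last two covariance terms. QED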
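The commutation-matrix property invoked above can be checked directly. The sketch below builds K_{mn} from its defining relation K_{mn} vec(A) = vec(A′) for an m × n matrix A (column-stacking vec); the helper name `commutation` and the test dimensions are illustrative.

```python
import numpy as np

def commutation(m, n):
    """K_{mn}: the mn x mn matrix with K_{mn} vec(A) = vec(A') for any m x n A."""
    K = np.zeros((m * n, m * n))
    for i in range(m):
        for j in range(n):
            # A[i, j] sits at position i + j*m of vec(A) and j + i*n of vec(A').
            K[j + i * n, i + j * m] = 1.0
    return K

rng = np.random.default_rng(4)
m, n, p, q = 2, 3, 4, 5
A = rng.normal(size=(m, n))
B = rng.normal(size=(p, q))

# K_{pm} (A kron B) = (B kron A) K_{qn}
lhs = commutation(p, m) @ np.kron(A, B)
rhs = np.kron(B, A) @ commutation(q, n)

# K_{mn}' = K_{mn}^{-1} = K_{nm}
K_mn = commutation(m, n)
K_nm = commutation(n, m)
```

In the proof the property is used with (m, n, p, q) chosen so that the Kronecker factors involving F and U⁻¹H′F′/T can be swapped, leaving a single commutation matrix in each cross term.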

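Proof of Theorem 6. For part (i), it is convenient to re-write $F^{mp}_t$ as
\[
\begin{aligned}
F^{mp}_t&=(\gamma,\Omega)\begin{pmatrix}\mathbf a'+\gamma'\Lambda'\\ \Lambda'\end{pmatrix}
\Big(\big[(\mathbf a+\Lambda\gamma),\Lambda\big]\begin{pmatrix}1&0_r'\\ 0_r&\Omega\end{pmatrix}\begin{bmatrix}\mathbf a'+\gamma'\Lambda'\\ \Lambda'\end{bmatrix}+\Sigma_e\Big)^{-1}x_t\\
&=(\gamma,\Omega)\Lambda_\dagger'\big(\Lambda_\dagger\Omega_\dagger\Lambda_\dagger'+\Sigma_e\big)^{-1}x_t\\
&=(\gamma,\Omega)\Lambda_\dagger'\big(\Lambda_\dagger\Omega_\dagger\Lambda_\dagger'+\Sigma_e\big)^{-1}\Big(\Lambda_\dagger\begin{pmatrix}1\\ f_t-E(f_t)\end{pmatrix}+e_{t\cdot}\Big),
\end{aligned}
\]
setting
\[
\Lambda_\dagger\equiv(\mathbf a+\Lambda\gamma,\ \Lambda),\qquad\Omega_\dagger\equiv\begin{pmatrix}1&0_r'\\ 0_r&\Omega\end{pmatrix}.
\]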


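Then
\[
\begin{aligned}
F^{mp}_t&=(\gamma,\Omega)\Big(I_{r+1}-\Lambda_\dagger'\Sigma_e^{-1}\Lambda_\dagger\big(\Omega_\dagger^{-1}+\Lambda_\dagger'\Sigma_e^{-1}\Lambda_\dagger\big)^{-1}\Big)\Lambda_\dagger'\Sigma_e^{-1}\Big(\Lambda_\dagger\begin{pmatrix}1\\ f_t-E(f_t)\end{pmatrix}+e_{t\cdot}\Big)\\
&=(\gamma,\Omega)\Omega_\dagger^{-1}\big(\Omega_\dagger^{-1}+\Lambda_\dagger'\Sigma_e^{-1}\Lambda_\dagger\big)^{-1}\Lambda_\dagger'\Sigma_e^{-1}\Big(\Lambda_\dagger\begin{pmatrix}1\\ f_t-E(f_t)\end{pmatrix}+e_{t\cdot}\Big)\\
&=(\gamma,\Omega)\Omega_\dagger^{-1}
\begin{pmatrix}
1+(\mathbf a'+\gamma'\Lambda')\Sigma_e^{-1}(\mathbf a+\Lambda\gamma)&(\mathbf a'+\gamma'\Lambda')\Sigma_e^{-1}\Lambda\\
\Lambda'\Sigma_e^{-1}(\mathbf a+\Lambda\gamma)&\Omega^{-1}+\Lambda'\Sigma_e^{-1}\Lambda
\end{pmatrix}^{-1}
\Lambda_\dagger'\Sigma_e^{-1}\Big(\Lambda_\dagger\begin{pmatrix}1\\ f_t-E(f_t)\end{pmatrix}+e_{t\cdot}\Big)\\
&=I+II.
\end{aligned}
\]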
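The passage from the N × N inverse to the (r+1) × (r+1) inverse in the display above is an application of the Woodbury identity. A quick numerical confirmation is sketched below; the matrices L, Om and Se merely stand in for Λ†, Ω† and Σe, with a diagonal Σe chosen for simplicity, and all values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(5)
N, r = 30, 2
L = rng.normal(size=(N, r + 1))                # plays the role of Lambda_dagger
Om = np.diag([1.0, 0.5, 2.0])                  # plays the role of Omega_dagger
Se = np.diag(rng.uniform(0.5, 1.5, size=N))    # plays the role of Sigma_e

Se_inv = np.linalg.inv(Se)
Om_inv = np.linalg.inv(Om)
inner = np.linalg.inv(Om_inv + L.T @ Se_inv @ L)   # small (r+1) x (r+1) inverse

# Three equivalent forms of Lambda_dagger' (Lambda_dagger Omega_dagger Lambda_dagger' + Sigma_e)^{-1}.
form_direct = L.T @ np.linalg.inv(L @ Om @ L.T + Se)
form_first = (np.eye(r + 1) - L.T @ Se_inv @ L @ inner) @ L.T @ Se_inv
form_second = Om_inv @ inner @ L.T @ Se_inv
```

Only the small inverse grows with the number of factors rather than with N, which is what makes the large-N limit of terms I and II tractable.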

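Consider first term I. Setting for simplicity the scalar $a\equiv1+(\mathbf a'+\gamma'\Lambda')\Sigma_e^{-1}(\mathbf a+\Lambda\gamma)$, the vector $b\equiv\Lambda'\Sigma_e^{-1}(\mathbf a+\Lambda\gamma)$ and the matrix $B\equiv\Omega^{-1}+\Lambda'\Sigma_e^{-1}\Lambda$, and using the block-wise formula for the inverse of a matrix, gives:
\[
\begin{pmatrix}
1+(\mathbf a'+\gamma'\Lambda')\Sigma_e^{-1}(\mathbf a+\Lambda\gamma)&(\mathbf a'+\gamma'\Lambda')\Sigma_e^{-1}\Lambda\\
\Lambda'\Sigma_e^{-1}(\mathbf a+\Lambda\gamma)&\Omega^{-1}+\Lambda'\Sigma_e^{-1}\Lambda
\end{pmatrix}^{-1}
=\begin{pmatrix}a&b'\\ b&B\end{pmatrix}^{-1}
=\begin{pmatrix}
a^{-1}+a^{-2}b'\big(B-\tfrac{bb'}{a}\big)^{-1}b&-\tfrac{b'}{a}\big(B-\tfrac{bb'}{a}\big)^{-1}\\
-\big(B-\tfrac{bb'}{a}\big)^{-1}\tfrac{b}{a}&\big(B-\tfrac{bb'}{a}\big)^{-1}
\end{pmatrix}.
\]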

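Post-multiplying the latter expression by $\Lambda_\dagger'\Sigma_e^{-1}\Lambda_\dagger=\begin{pmatrix}a-1&b'\\ b&B-\Omega^{-1}\end{pmatrix}$ yields
\[
\begin{aligned}
&\begin{pmatrix}
a^{-1}+a^{-2}b'\big(B-\tfrac{bb'}{a}\big)^{-1}b&-\tfrac{b'}{a}\big(B-\tfrac{bb'}{a}\big)^{-1}\\
-\big(B-\tfrac{bb'}{a}\big)^{-1}\tfrac{b}{a}&\big(B-\tfrac{bb'}{a}\big)^{-1}
\end{pmatrix}
\begin{pmatrix}a-1&b'\\ b&B-\Omega^{-1}\end{pmatrix}\\
&=\begin{pmatrix}
(a-1)\big(a^{-1}+a^{-2}b'\big(B-\tfrac{bb'}{a}\big)^{-1}b\big)-\tfrac{b'}{a}\big(B-\tfrac{bb'}{a}\big)^{-1}b
&\big(a^{-1}+a^{-2}b'\big(B-\tfrac{bb'}{a}\big)^{-1}b\big)b'-\tfrac{b'}{a}\big(B-\tfrac{bb'}{a}\big)^{-1}(B-\Omega^{-1})\\
-(a-1)\big(B-\tfrac{bb'}{a}\big)^{-1}\tfrac{b}{a}+\big(B-\tfrac{bb'}{a}\big)^{-1}b
&-\big(B-\tfrac{bb'}{a}\big)^{-1}\tfrac{b}{a}b'+\big(B-\tfrac{bb'}{a}\big)^{-1}(B-\Omega^{-1})
\end{pmatrix}\\
&=\begin{pmatrix}
\tfrac{(a-1)}{a}-\tfrac{1}{a^{2}}b'\big(B-\tfrac{bb'}{a}\big)^{-1}b&\tfrac{b'}{a}\big(B-\tfrac{bb'}{a}\big)^{-1}\Omega^{-1}\\
\big(B-\tfrac{bb'}{a}\big)^{-1}\tfrac{b}{a}&I_r-\big(B-\tfrac{bb'}{a}\big)^{-1}\Omega^{-1}
\end{pmatrix}.
\end{aligned}
\]
We now evaluate the behaviour of $\big(B-\tfrac{bb'}{a}\big)^{-1}$ and $b/a$ as N → ∞. One needs first to express $B-\tfrac{bb'}{a}$ suitably, as: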

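\[
\begin{aligned}
B-\frac{bb'}{a}&=\Omega^{-1}+\Lambda'\Sigma_e^{-1}\Lambda-\frac{\Lambda'\Sigma_e^{-1}(\mathbf a+\Lambda\gamma)(\mathbf a+\Lambda\gamma)'\Sigma_e^{-1}\Lambda}{1+(\mathbf a+\Lambda\gamma)'\Sigma_e^{-1}(\mathbf a+\Lambda\gamma)}\\
&=\Omega^{-1}+\Lambda'\Sigma_e^{-1/2}\Big[I_N-\frac{\Sigma_e^{-1/2}(\mathbf a+\Lambda\gamma)(\mathbf a+\Lambda\gamma)'\Sigma_e^{-1/2}}{1+(\mathbf a+\Lambda\gamma)'\Sigma_e^{-1}(\mathbf a+\Lambda\gamma)}\Big]\Sigma_e^{-1/2}\Lambda\\
&=\Omega^{-1}+\Lambda'\Sigma_e^{-1/2}\big[I_N+\Sigma_e^{-1/2}(\mathbf a+\Lambda\gamma)(\mathbf a+\Lambda\gamma)'\Sigma_e^{-1/2}\big]^{-1}\Sigma_e^{-1/2}\Lambda,
\end{aligned}
\]
given the (Sherman-Morrison) identity
\[
\Big[I_N-\frac{\Sigma_e^{-1/2}(\mathbf a+\Lambda\gamma)(\mathbf a+\Lambda\gamma)'\Sigma_e^{-1/2}}{1+(\mathbf a+\Lambda\gamma)'\Sigma_e^{-1}(\mathbf a+\Lambda\gamma)}\Big]=\big[I_N+\Sigma_e^{-1/2}(\mathbf a+\Lambda\gamma)(\mathbf a+\Lambda\gamma)'\Sigma_e^{-1/2}\big]^{-1}.
\]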

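Setting the non-singular matrix, where non-singularity holds for N large enough as the last two terms of C only affect its maximum eigenvalue,
\[
C\equiv\Sigma_e^{-1}+\mathbf a\mathbf a'+\Lambda\gamma\mathbf a'+\mathbf a\gamma'\Lambda',
\]
and using it within the inverse of the last expression above, gives
\[
\begin{aligned}
\Big(B-\frac{bb'}{a}\Big)^{-1}
&=\Omega-\Omega\Lambda'\Sigma_e^{-1/2}\big(I_N+\Sigma_e^{-1/2}(\mathbf a+\Lambda\gamma)(\mathbf a+\Lambda\gamma)'\Sigma_e^{-1/2}+\Sigma_e^{-1/2}\Lambda\Omega\Lambda'\Sigma_e^{-1/2}\big)^{-1}\Sigma_e^{-1/2}\Lambda\Omega\\
&=\Omega-\Omega\Lambda'\big(\Sigma_e^{-1}+(\mathbf a+\Lambda\gamma)(\mathbf a+\Lambda\gamma)'+\Lambda\Omega\Lambda'\big)^{-1}\Lambda\Omega\\
&=\Omega-\Omega\Lambda'\big(C+\Lambda(\gamma\gamma'+\Omega)\Lambda'\big)^{-1}\Lambda\Omega\\
&=\Omega-\Omega(\Lambda'C^{-1}\Lambda)\big((\gamma\gamma'+\Omega)^{-1}+\Lambda'C^{-1}\Lambda\big)^{-1}(\gamma\gamma'+\Omega)^{-1}\Omega,
\end{aligned}
\]
implying, as N → ∞, in particular by the divergence of Λ′C−1Λ,
\[
\Big(B-\frac{bb'}{a}\Big)^{-1}\to_p\Omega-\Omega(\gamma\gamma'+\Omega)^{-1}\Omega=\frac{\gamma\gamma'}{(1+\gamma'\Omega^{-1}\gamma)}.
\]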
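Both the rank-one inverse used to rewrite B − bb′/a and the closed form of the limit above are instances of the Sherman-Morrison formula. The following sketch checks them numerically for arbitrary inputs; the vector v stands in for Σe^{-1/2}(a + Λγ), and the dimensions, seed and positive definite Ω are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(6)
r, n = 3, 6
gamma = rng.normal(size=r)
A = rng.normal(size=(r, r))
Omega = A @ A.T + np.eye(r)     # an arbitrary positive definite matrix
v = rng.normal(size=n)          # stands in for Sigma_e^{-1/2} (a + Lambda gamma)

# (I + v v')^{-1} = I - v v' / (1 + v'v)
rank_one_lhs = np.eye(n) - np.outer(v, v) / (1.0 + v @ v)
rank_one_rhs = np.linalg.inv(np.eye(n) + np.outer(v, v))

# Omega - Omega (gamma gamma' + Omega)^{-1} Omega = gamma gamma' / (1 + gamma' Omega^{-1} gamma)
limit_lhs = Omega - Omega @ np.linalg.inv(np.outer(gamma, gamma) + Omega) @ Omega
limit_rhs = np.outer(gamma, gamma) / (1.0 + gamma @ np.linalg.solve(Omega, gamma))
```

The second identity shows that the limit has rank one and is proportional to γγ′, which is what makes the limiting blocks collapse when they are pre-multiplied by (γ, I_r).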

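In fact, divergence of Λ′C−1Λ follows as it is bounded from below by $\Lambda'\Lambda\,g_N(C^{-1})$. In turn $g_N(C^{-1})=g_1^{-1}(C)$ and thus one needs the maximum eigenvalue of C not to diverge too fast. However, the maximum eigenvalue of C diverges at most as $\mathbf a'\Lambda\gamma=O_p(N^{1/2})$, given that $\mathbf a'\mathbf a=O(1)$ by no arbitrage and Σe has minimum eigenvalue bounded away from zero, whereas Λ′Λ = Op(N). It follows that
\[
\frac{b}{a}=\frac{\Lambda'\Sigma_e^{-1}(\mathbf a+\Lambda\gamma)}{1+(\mathbf a'+\gamma'\Lambda')\Sigma_e^{-1}(\mathbf a+\Lambda\gamma)}\to_p\frac{D\gamma}{\gamma'D\gamma},
\]
setting D to be the probability limit of $\Lambda'\Sigma_e^{-1}\Lambda/N$. Therefore, putting terms together,
\[
\begin{aligned}
&\begin{pmatrix}
1+(\mathbf a'+\gamma'\Lambda')\Sigma_e^{-1}(\mathbf a+\Lambda\gamma)&(\mathbf a'+\gamma'\Lambda')\Sigma_e^{-1}\Lambda\\
\Lambda'\Sigma_e^{-1}(\mathbf a+\Lambda\gamma)&\Omega^{-1}+\Lambda'\Sigma_e^{-1}\Lambda
\end{pmatrix}^{-1}\Lambda_\dagger'\Sigma_e^{-1}\Lambda_\dagger
=\begin{pmatrix}
\tfrac{(a-1)}{a}-\tfrac{1}{a^{2}}b'\big(B-\tfrac{bb'}{a}\big)^{-1}b&\tfrac{b'}{a}\big(B-\tfrac{bb'}{a}\big)^{-1}\Omega^{-1}\\
\big(B-\tfrac{bb'}{a}\big)^{-1}\tfrac{b}{a}&I_r-\big(B-\tfrac{bb'}{a}\big)^{-1}\Omega^{-1}
\end{pmatrix}\\
&\to_p
\begin{pmatrix}
1-\dfrac{\gamma'D\gamma\,\gamma'D\gamma}{(\gamma'D\gamma)^{2}(1+\gamma'\Omega^{-1}\gamma)}&\dfrac{(\gamma'D\gamma)\gamma'\Omega^{-1}}{(\gamma'D\gamma)(1+\gamma'\Omega^{-1}\gamma)}\\[2ex]
\dfrac{\gamma\gamma'D\gamma}{(\gamma'D\gamma)(1+\gamma'\Omega^{-1}\gamma)}&I_r-\dfrac{\gamma\gamma'\Omega^{-1}}{(1+\gamma'\Omega^{-1}\gamma)}
\end{pmatrix}
=\begin{pmatrix}
1-\dfrac{1}{(1+\gamma'\Omega^{-1}\gamma)}&\dfrac{\gamma'\Omega^{-1}}{(1+\gamma'\Omega^{-1}\gamma)}\\[2ex]
\dfrac{\gamma}{(1+\gamma'\Omega^{-1}\gamma)}&I_r-\dfrac{\gamma\gamma'\Omega^{-1}}{(1+\gamma'\Omega^{-1}\gamma)}
\end{pmatrix}.
\end{aligned}
\]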

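It follows that, as N diverges, term I satisfies:
\[
I\to_p(\gamma,I_r)
\begin{pmatrix}
1-\dfrac{1}{(1+\gamma'\Omega^{-1}\gamma)}&\dfrac{\gamma'\Omega^{-1}}{(1+\gamma'\Omega^{-1}\gamma)}\\[2ex]
\dfrac{\gamma}{(1+\gamma'\Omega^{-1}\gamma)}&I_r-\dfrac{\gamma\gamma'\Omega^{-1}}{(1+\gamma'\Omega^{-1}\gamma)}
\end{pmatrix}
\begin{pmatrix}1\\ f_t-E(f_t)\end{pmatrix}
=(\gamma,I_r)\begin{pmatrix}1\\ f_t-E(f_t)\end{pmatrix}=F_t.
\]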

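Next, considering the variance of term II (conditioning on the Λ), we will show that it converges to zero as N diverges. In fact,
\[
\begin{aligned}
&\mathrm{var}\Big((\gamma,\Omega)\Big(I_{r+1}-\Lambda_\dagger'\Sigma_e^{-1}\Lambda_\dagger\big(\Omega_\dagger^{-1}+\Lambda_\dagger'\Sigma_e^{-1}\Lambda_\dagger\big)^{-1}\Big)\Lambda_\dagger'\Sigma_e^{-1}e_{t\cdot}\Big)\\
&\quad=(\gamma,I_r)\begin{pmatrix}a&b'\\ b&B\end{pmatrix}^{-1}\begin{pmatrix}a-1&b'\\ b&B-\Omega^{-1}\end{pmatrix}\begin{pmatrix}a&b'\\ b&B\end{pmatrix}^{-1}(\gamma,I_r)'\\
&\quad\to_p(\gamma,I_r)
\begin{pmatrix}
1-\dfrac{1}{(1+\gamma'\Omega^{-1}\gamma)}&\dfrac{\gamma'\Omega^{-1}}{(1+\gamma'\Omega^{-1}\gamma)}\\[2ex]
\dfrac{\gamma}{(1+\gamma'\Omega^{-1}\gamma)}&I_r-\dfrac{\gamma\gamma'\Omega^{-1}}{(1+\gamma'\Omega^{-1}\gamma)}
\end{pmatrix}
\frac{1}{(1+\gamma'\Omega^{-1}\gamma)}\begin{pmatrix}1&-\gamma'\\ -\gamma&\gamma\gamma'\end{pmatrix}(\gamma,I_r)'=0_{r\times r},
\end{aligned}
\]
using
\[
\begin{pmatrix}a&b'\\ b&B\end{pmatrix}^{-1}\to_p\frac{1}{(1+\gamma'\Omega^{-1}\gamma)}\begin{pmatrix}1&-\gamma'\\ -\gamma&\gamma\gamma'\end{pmatrix}.
\]
Therefore II →p 0r, establishing part (i).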


Part (ii) follows from the identity
\[
F^{mp}_t-F=-\sigma^{2}(\sigma^{2}I_r+\Lambda'\Lambda)^{-1}F,
\]
and recalling that, under our assumptions, $(\Lambda'\Lambda)^{-1}=O_p(N^{-1})$. QED

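Proof of Theorem 7. Consider first cases (ii), (iii) and (iv) for the static model, where $a_T=\bar a1_T$ as $a_{it}=a_i$, with $\bar a\equiv a'1_T/N$. By simple algebraic steps:
\[
\begin{aligned}
\begin{pmatrix}\hat\gamma_0\\ \hat\mu_\Lambda\end{pmatrix}-\begin{pmatrix}\gamma_0+\bar a\\ H^{-1}\mu_\Lambda\end{pmatrix}
&=(\hat D'\hat D)^{-1}\hat D'\bar x_T-\begin{pmatrix}\gamma_0+\bar a\\ H^{-1}\mu_\Lambda\end{pmatrix}\\
&=(\hat D'\hat D)^{-1}\hat D'\Big(D\begin{pmatrix}\gamma_0+\bar a\\ \mu_\Lambda\end{pmatrix}+\bar e_T\Big)-\begin{pmatrix}\gamma_0+\bar a\\ H^{-1}\mu_\Lambda\end{pmatrix}\\
&=(\hat D'\hat D)^{-1}\hat D'\Big(D\begin{pmatrix}1&0_r'\\ 0_r&H\end{pmatrix}\begin{pmatrix}\gamma_0+\bar a\\ H^{-1}\mu_\Lambda\end{pmatrix}+\bar e_T\Big)-\begin{pmatrix}\gamma_0+\bar a\\ H^{-1}\mu_\Lambda\end{pmatrix}\\
&=\Big(\frac{\hat D'\hat D}{T}\Big)^{-1}\frac{\hat D'\bar e_T}{T}+\Big(\frac{\hat D'\hat D}{T}\Big)^{-1}\Big[\frac{\hat D'D}{T}\begin{pmatrix}1&0_r'\\ 0_r&H\end{pmatrix}-\Big(\frac{\hat D'\hat D}{T}\Big)\Big]\begin{pmatrix}\gamma_0+\bar a\\ H^{-1}\mu_\Lambda\end{pmatrix}=I+II.
\end{aligned}
\]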

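Consider now
\[
\begin{aligned}
\Big[\frac{\hat D'D}{T}\begin{pmatrix}1&0_r'\\ 0_r&H\end{pmatrix}-\Big(\frac{\hat D'\hat D}{T}\Big)\Big]
&=\begin{pmatrix}0&\gamma^{P\prime}H-\hat\gamma'\\ 0_r&\frac{F^{*\prime}F}{T}H-I_r\end{pmatrix}
=\begin{pmatrix}0&\gamma^{P\prime}H-\hat\gamma'\\ 0_r&\frac{F^{*\prime}F}{T}H-\frac{F^{*\prime}F^{*}}{T}\end{pmatrix}\\
&=\begin{pmatrix}0&\frac{1_T'}{T}(FH-F^{*})\\ 0_r&\frac{F^{*\prime}}{T}(FH-F^{*})\end{pmatrix}
=\Big(0_{r+1},\ \frac{\hat D'}{T}(FH-F^{*})\Big),
\end{aligned}
\]
implying that term II satisfies
\[
\begin{aligned}
&\Big(\frac{\hat D'\hat D}{T}\Big)^{-1}\Big[\frac{\hat D'D}{T}\begin{pmatrix}1&0_r'\\ 0_r&H\end{pmatrix}-\Big(\frac{\hat D'\hat D}{T}\Big)\Big]\begin{pmatrix}\gamma_0+\bar a\\ H^{-1}\mu_\Lambda\end{pmatrix}
=\Big(\frac{\hat D'\hat D}{T}\Big)^{-1}\frac{\hat D'}{T}(FH-F^{*})H^{-1}\mu_\Lambda\\
&\quad=\mathrm{vec}\Big(\Big(\frac{\hat D'\hat D}{T}\Big)^{-1}\frac{\hat D'}{T}(FH-F^{*})H^{-1}\mu_\Lambda\Big)
=-\Big(\mu_\Lambda'H^{-1\prime}\otimes\Big(\frac{\hat D'\hat D}{T}\Big)^{-1}\frac{\hat D'}{T}\Big)\mathrm{vec}(F^{*}-FH)\\
&\quad=-\Big(\mu_\Lambda'H^{-1\prime}\otimes\Big(\frac{\hat D'\hat D}{T}\Big)^{-1}\frac{\hat D'}{T}\Big)K_{rT}\,\mathrm{vec}(F^{*\prime}-H'F').
\end{aligned}
\]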

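Asymptotic normality of $\sqrt{N}\,\mathrm{vec}(F^{*\prime}-H'F')$, and the corresponding asymptotic covariance matrix, follows from the proof of Theorem 5, whereas asymptotic normality of $\sqrt{N}\,\bar e_T$ follows from our assumptions. Thus, to derive the asymptotic distribution of $(\hat\gamma_0,\hat\mu_\Lambda')'$, we need to further derive the asymptotic covariance between $\sqrt{N}\,\mathrm{vec}(F^{*\prime}-H'F')$ and $\sqrt{N}\,\bar e_T$, given by:
\[
\begin{aligned}
N\,\mathrm{Cov}\big(\mathrm{vec}(F^{*\prime}-H'F'),\bar e_T'\big)
={}&N\,\mathrm{Cov}\Big(\Big(I_T\otimes\hat U^{-1}\frac{F^{*\prime}}{T}\Big)\frac{1}{N}\sum_{i=1}^{N}\big((e_{\cdot i}\otimes e_{\cdot i})-\sigma^{2}\mathrm{vec}(I_T)\big),\bar e_T'\Big)\\
&+N\,\mathrm{Cov}\Big(\Big(I_T\otimes\hat U^{-1}\frac{F^{*\prime}F}{T}\Big)\frac{1}{N}\sum_{i=1}^{N}(e_{\cdot i}\otimes\lambda_i),\bar e_T'\Big)
+N\,\mathrm{Cov}\Big(\Big(F\otimes\hat U^{-1}\frac{F^{*\prime}}{T}\Big)\frac{1}{N}\sum_{i=1}^{N}(\lambda_i\otimes e_{\cdot i}),\bar e_T'\Big)\\
={}&N(I_T\otimes U^{-1}Q)\frac{1}{N^{2}}\sum_{i=1}^{N}\big(E(e_{\cdot i}e_{\cdot i}')\otimes\lambda_i\big)
+N\Big(F\otimes U^{-1}\frac{H'F'}{T}\Big)\frac{1}{N^{2}}\sum_{i=1}^{N}\big(\lambda_i\otimes E(e_{\cdot i}e_{\cdot i}')\big)+o(1)\\
\to{}&\sigma^{2}(I_T\otimes U^{-1}Q)(I_T\otimes\mu_\Lambda)+\sigma^{2}\Big(F\otimes U^{-1}\frac{H'F'}{T}\Big)(\mu_\Lambda\otimes I_T)\\
={}&\sigma^{2}(I_T\otimes U^{-1}H^{-1}\mu_\Lambda)+\sigma^{2}\Big(F\mu_\Lambda\otimes U^{-1}\frac{H'F'}{T}\Big),
\end{aligned}
\]
recalling that Q = H−1. Putting terms together, parts (i), (ii) and (iv) follow, where the matrix G is obtained using the property Ka1 = Ia. Part (iii) follows precisely along the same steps of Theorem 2.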

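Consider now case (i), which is only relevant in the dynamic case. Following the previous steps, but recognizing time-variation of the $a_{it-1}$, $\gamma_{0t-1}$ and $\lambda_{it-1}$, as $\bar a\equiv a_T'1_T/T=O_p(N^{-1/2})$, $\bar e_T=O_p(N^{-1/2})$ and $\|F^{*}-FH_{t-1}\|=O_p(N^{-1/2})$,
\[
\begin{aligned}
\begin{pmatrix}\hat\gamma_0\\ \hat\mu_\Lambda\end{pmatrix}
&=(\hat D'\hat D)^{-1}\hat D'\big(a_T+\gamma_0+F\mu_{\Lambda t-1}+\bar e_T\big)
=\Big(\frac{\hat D'\hat D}{T}\Big)^{-1}\frac{\hat D'}{T}\big(a_T+\gamma_0+F^{*}H_{t-1}^{-1}\mu_{\Lambda t-1}\big)+O_p(N^{-1/2})\\
&=\begin{pmatrix}1+\bar F^{*\prime}\Omega^{*-1}\bar F^{*}&-\bar F^{*\prime}\Omega^{*-1}\\ -\Omega^{*-1}\bar F^{*}&\Omega^{*-1}\end{pmatrix}
\begin{pmatrix}\bar\gamma_0+\tilde F^{*\prime}H_{t-1}^{-1}\mu_{\Lambda t-1}\\ \frac{F^{*\prime}\gamma_0}{T}+\frac{F^{*\prime}F^{*}}{T}H_{t-1}^{-1}\mu_{\Lambda t-1}\end{pmatrix}+O_p(N^{-1/2})\\
&=\begin{pmatrix}\bar\gamma_0-\tilde F^{*\prime}\Omega^{*-1}\mathrm{Cov}(F^{*\prime},\gamma_0)\\ H_{t-1}^{-1}\mu_{\Lambda t-1}-\Omega^{*-1}\mathrm{Cov}(F^{*\prime},\gamma_0)\end{pmatrix}+O_p(N^{-1/2})\\
&=\begin{pmatrix}\bar\gamma_0\\ H_{t-1}^{-1}\mu_{\Lambda t-1}\end{pmatrix}-\begin{pmatrix}\tilde F^{*\prime}\\ I_{r_{t-1}}\end{pmatrix}\Omega^{*-1}\mathrm{Cov}(F^{*\prime},\gamma_0)+O_p(N^{-1/2}).
\end{aligned}
\]
QED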

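8.5 Appendix E: The matrices $U_e$, $U_{et-1}$ and $\hat U_e$, $\hat U_{et-1}$.

We now present the closed-form expression for $U_e$, which is defined as the limit:
\[
\frac{1}{N}\sum_{i=1}^{N}E\Big(\big((e_{\cdot i}\otimes e_{\cdot i})-\sigma^{2}\mathrm{vec}(I_T)\big)\big((e_{\cdot i}'\otimes e_{\cdot i}')-\sigma^{2}\mathrm{vec}'(I_T)\big)\Big)\to U_e.
\]
In particular, Raponi et al. (2018) established that the T² × T² matrix $U_e$ has the form:
\[
U_e=
\begin{pmatrix}
U_{11}&\cdots&U_{1t}&\cdots&U_{1T}\\
\vdots&\ddots&\vdots&&\vdots\\
U_{t1}&\cdots&U_{tt}&\cdots&U_{tT}\\
\vdots&&\vdots&\ddots&\vdots\\
U_{T1}&\cdots&U_{Tt}&\cdots&U_{TT}
\end{pmatrix}.\qquad(56)
\]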

Each block of $U_e$ is a T × T matrix. The blocks along the main diagonal, denoted by $U_{tt}$, t = 1, 2, . . . , T, are themselves diagonal matrices with (κ4 + 2σ4) in the (t, t)-th position and σ4 in the (s, s)-th position for every s ≠ t, that is,
\[
U_{tt}=
\begin{pmatrix}
\sigma_4&\cdots&0&0&0&\cdots&0\\
\vdots&\ddots&\vdots&\vdots&\vdots&&\vdots\\
0&\cdots&\sigma_4&0&0&\cdots&0\\
0&\cdots&0&(\kappa_4+2\sigma_4)&0&\cdots&0\\
0&\cdots&0&0&\sigma_4&\cdots&0\\
\vdots&&\vdots&\vdots&\vdots&\ddots&\vdots\\
0&\cdots&0&0&0&\cdots&\sigma_4
\end{pmatrix},\qquad(57)
\]
where (κ4 + 2σ4) sits in the t-th row and t-th column. The blocks outside the main diagonal, denoted by $U_{ts}$, s, t = 1, 2, . . . , T with s ≠ t, are all made of zeros except for the (s, t)-th position that contains σ4, that is,
\[
U_{ts}=
\begin{pmatrix}
0&\cdots&0&0&0&\cdots&0\\
\vdots&\ddots&\vdots&\vdots&\vdots&&\vdots\\
0&\cdots&0&0&0&\cdots&0\\
0&\cdots&0&\sigma_4&0&\cdots&0\\
0&\cdots&0&0&0&\cdots&0\\
\vdots&&\vdots&\vdots&\vdots&\ddots&\vdots\\
0&\cdots&0&0&0&\cdots&0
\end{pmatrix},\qquad(58)
\]
where σ4 sits in the s-th row and t-th column.

The time-varying matrix $U_{et}$ is defined by replacing κ4, σ4 with κ4t, σ4t in the expressions above.

Finally, noticing that $U_e=U_e(\kappa_4,\sigma_4)$ and $U_{et-1}=U_e(\kappa_{4t-1},\sigma_{4t-1})$, consistent plug-in estimators can be easily obtained as $\hat U_e=U_e(0,\hat\sigma_4)$ and $\hat U_{et-1}=U_e(0,\hat\sigma_{4t-1})$ when κ4 = κ4t−1 = 0, where $\hat\sigma_4$ and $\hat\sigma_{4t-1}$ denote the consistent estimators adopted in the paper.
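The block description of U_e translates directly into code. The sketch below assembles U_e(κ4, σ4) from the diagonal and off-diagonal blocks described in this appendix and, as a sanity check, compares the case κ4 = 0 with σ4(I + K_TT), the standard form of the covariance of e ⊗ e for Gaussian iid errors with σ4 = σ⁴ (K_TT is the commutation matrix). The function names `build_Ue` and `commutation` are illustrative, not the paper's.

```python
import numpy as np

def build_Ue(T, kappa4, sigma4):
    """Assemble the T^2 x T^2 matrix U_e block by block, following (56)-(58)."""
    U = np.zeros((T * T, T * T))
    for t in range(T):
        for s in range(T):
            block = np.zeros((T, T))
            if s == t:
                # Diagonal block U_tt: sigma4 on the diagonal,
                # kappa4 + 2*sigma4 in the (t, t)-th position.
                block[np.diag_indices(T)] = sigma4
                block[t, t] = kappa4 + 2.0 * sigma4
            else:
                # Off-diagonal block U_ts: single non-zero entry sigma4 at (s, t).
                block[s, t] = sigma4
            U[t * T:(t + 1) * T, s * T:(s + 1) * T] = block
    return U

def commutation(T):
    """K_TT, defined by K_TT vec(A) = vec(A') for a T x T matrix A."""
    K = np.zeros((T * T, T * T))
    for i in range(T):
        for j in range(T):
            K[j + i * T, i + j * T] = 1.0
    return K

T, sigma4 = 3, 2.0
U_gauss = build_Ue(T, 0.0, sigma4)                       # plug-in form U_e(0, sigma4)
reference = sigma4 * (np.eye(T * T) + commutation(T))    # Gaussian benchmark
```

A plug-in estimate in the spirit of the text would be `build_Ue(T, 0.0, sigma4_hat)`, where `sigma4_hat` is a placeholder for whichever consistent estimator of σ4 is used.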

8.6 Tables

N      T   PCp1   ICp1   AIC1   BIC1   PC(0.10)   PC(0.25)   PC(0.45)
5      2   2      2      2      2      1          1          1
15     2   2      2      2      2      1          1          1
30     2   2      2      2      2      1          1          1
100    2   2      2      2      2      1          1          1
200    2   2      2      2      2      1          1          1
500    2   2      2      2      2      1          1          1
1000   2   2      2      2      2      1          1          1
3000   2   2      2      2      2      1          1          1
5      3   3      3      3      3      1.28       1.132      1.052
15     3   3      3      3      3      1.316      1.08       1
30     3   3      3      3      3      1.256      1.024      1
100    3   3      3      3      3      1.248      1.008      1
200    3   3      3      3      3      1.196      1          1
500    3   3      3      3      3      1.184      1          1
1000   3   3      3      3      3      1.136      1          1
3000   3   3      3      3      3      1.072      1          1

Table I: Estimated number of factors, k (average across Monte Carlo iterations). The true number of factors is r = 1 and we search for 0 ≤ k ≤ T = 2, 3. The eit are assumed iid N(0, 1) across i and t. Columns 3 to 6 correspond to the competing criteria PCp1, ICp1, AIC1, BIC1 (see Bai & Ng (2002)[Section 5] for details). Columns 7 to 9 correspond to our large-N criterion $PC(\varepsilon_2)=\big(\tfrac{T}{T-k}\big)V(k)+k\,g(N)$ with $g(N)=\frac{(\log N)^{\varepsilon_1}N^{\varepsilon_2}}{\sqrt{N}}$ and ε1 = 0, ε2 = 0.1, 0.25, 0.45.

N      T     PCp1   ICp1   AIC1   BIC1   PC(0.10)   PC(0.25)   PC(0.45)
10     5     5.16   5.11   5.13   5.19   1.11       1.01       1
30     5     5.03   5.03   5.03   5.04   1.06       1          1
100    5     5.05   5.03   5.03   5.05   6          1          1
200    5     5.07   5.06   5.06   5.07   6          1          1
500    5     5.07   5.07   5.07   5.07   6          1.2        1
1000   5     5.05   5.05   5.05   5.05   6          6          1
3000   5     5.01   5.01   5.01   5.01   6          6          1
10     10    8      8      8      8      1.02       1          1
30     10    8      8      8      8      1          1          1
100    10    8      8      8      8      1          1          1
200    10    8      8      8      8      1          1          1
500    10    8      8      8      8      1          1          1
1000   10    8      7.51   8      8      1          1          1
3000   10    8      1      8      8      1          1          1
10     30    8      8      8      8      1          1          1
30     30    6.15   1      8      8      1          1          1
100    30    1.5    1      8      3.6    1          1          1
200    30    1      1      8      1      1          1          1
500    30    1      1      4.84   1      1          1          1
1000   30    1      1      1.29   1      1          1          1
3000   30    1      1      1      1      1          1          1
10     60    8      8      8      8      1          1          1
30     60    3.54   1      8      8      1          1          1
100    60    1      1      8      2.18   1          1          1
200    60    1      1      8      1      1          1          1
500    60    1      1      4.78   1      1          1          1
1000   60    1      1      1      1      1          1          1
3000   60    1      1      1      1      1          1          1
10     120   8      8      8      8      1          1          1
30     120   1.13   1      8      8      1          1          1
100    120   1      1      8      4.2    1          1          1
200    120   1      1      8      1      1          1          1
500    120   1      1      8      1      1          1          1
1000   120   1      1      1.43   1      1          1          1
3000   120   1      1      1      1      1          1          1

Table II: Estimated number of factors, k (average across Monte Carlo iterations). The true number of factors is r = 1 and we search for 0 ≤ k ≤ T. The eit are assumed iid N(0, 1) across i and t. Columns 3 to 6 correspond to the competing criteria PCp1, ICp1, AIC1, BIC1 (see Bai & Ng (2002)[Section 5] for details). Columns 7 to 9 correspond to our large-N criterion $PC(\varepsilon_2)=\big(\tfrac{T}{T-k}\big)V(k)+k\,g(N)$ with $g(N)=\frac{(\log N)^{\varepsilon_1}N^{\varepsilon_2}}{\sqrt{N}}$ and ε1 = 0, ε2 = 0.1, 0.25, 0.45.

N      T     PCp1   ICp1   AIC1   BIC1   PC(0.10)   PC(0.25)   PC(0.45)
10     5     5.05   5.05   5.05   5.05   4.38       3.57       2.23
30     5     5.01   5      5      5.01   6          5.98       2.43
100    5     5.01   5      5      5.01   6          6          3.6
200    5     5.05   5.05   5.05   5.05   6          6          4.22
500    5     5.05   5.02   5.01   5.05   6          6          5.11
1000   5     5.03   5.02   5      5.03   6          6          5.7
3000   5     5.08   5.07   5.06   5.08   6          6          5.85
10     10    8      8      8      8      4.4        3.39       2.04
30     10    8      8      8      8      6.19       3.23       1.8
100    10    8      8      8      8      6.76       2.87       1.76
200    10    8      8      8      8      6.5        2.9        1.87
500    10    8      8      8      8      5.77       2.91       1.86
1000   10    8      8      8      8      5.29       2.95       1.95
3000   10    8      8      8      8      4.39       2.99       1.94
10     30    8      8      8      8      4.22       2.88       1.88
30     30    6.34   2.93   8      8      3.61       2.91       1.94
100    30    3.09   3      8      4.46   3.04       3          2.07
200    30    3      3      8      3      3          3          2.15
500    30    3      3      5.52   3      3          3          2.26
1000   30    3      3      3.03   3      3          3          2.38
3000   30    3      3      3      3      3          3          2.55
10     60    8      8      8      8      4.15       2.79       1.95
30     60    4.18   2.98   8      8      3.02       2.95       2.03
100    60    3      3      8      3.41   3          3          2.23
200    60    3      3      8      3      3          3          2.32
500    60    3      3      5.49   3      3          3          2.62
1000   60    3      3      3      3      3          3          2.7
3000   60    3      3      3      3      3          3          2.76
10     120   8      8      8      8      3.53       2.78       1.98
30     120   3.01   2.99   8      8      3          2.97       2.07
100    120   3      3      8      4.91   3          3          2.37
200    120   3      3      8      3      3          3          2.45
500    120   3      3      8      3      3          3          2.81
1000   120   3      3      3.12   3      3          3          2.9
3000   120   3      3      3      3      3          3          2.96

Table III: Estimated number of factors, k (average across Monte Carlo iterations). The true number of factors is r = 3 and we search for 0 ≤ k ≤ T. The eit are assumed iid N(0, 1) across i and t. Columns 3 to 6 correspond to the competing criteria PCp1, ICp1, AIC1, BIC1 (see Bai & Ng (2002)[Section 5] for details). Columns 7 to 9 correspond to our large-N criterion $PC(\varepsilon_2)=\big(\tfrac{T}{T-k}\big)V(k)+k\,g(N)$ with $g(N)=\frac{(\log N)^{\varepsilon_1}N^{\varepsilon_2}}{\sqrt{N}}$ and ε1 = 0, ε2 = 0.1, 0.25, 0.45.

        N=10    N=100   N=1000  N=3000
T=2     -       -       -       -
T=5     0.942   0.996   0.999   0.999
T=50    0.955   0.996   0.999   0.999

Table IV: Sample correlations between $F_t$ and $\hat F_t$ for 1 ≤ t ≤ T (average across Monte Carlo iterations).

        N=10    N=100   N=1000  N=3000
T=2     0.610   0.667   0.674   0.675
T=5     0.875   0.905   0.907   0.907
T=50    0.989   0.991   0.992   0.992

Table V: Sample correlations between $\lambda_i$ and $\hat\lambda_i$ for 1 ≤ i ≤ N (average across Monte Carlo iterations).

            mkt      smb      hml      rmw      cma
factor 1    -0.163   -0.013   0.024    0.040    0.040
factor 2    0.013    0.007    -0.071   -0.091   -0.030
factor 3    -0.032   -0.113   -0.004   0.015    0.070
factor 4    0.005    0.004    -0.002   0.025    -0.043
factor 5    -0.043   -0.020   0.027    0.011    0.041

Table VI: Identifying risk factors: entire period (Dec 1960 to Dec 2013). We report the correlation between mktt (market excess return), smbt (small-minus-big portfolio return), hmlt (high-minus-low portfolio return), rmwt (profitability portfolio return), cmat (investment portfolio return) and the first five estimated risk factors. Rolling windows of T = 12 observations from Dec 1960 to Dec 2013. Data source: Kenneth French's website.

            mkt      smb      hml      rmw      cma
factor 1    0.367    0.083    0.378    0.173    -0.064
factor 2    -0.194   -0.037   -0.311   -0.160   0.109
factor 3    0.161    0.433    0.439    -0.125   -0.046
factor 4    0.071    0.085    -0.342   -0.361   -0.462
factor 5    0.007    0.191    -0.232   -0.155   -0.412

Table VII: Identifying risk factors: from the burst of the sub-prime bubble to recovery (Aug 2007 to Feb 2009). We report the correlation between mktt (market excess return), smbt (small-minus-big portfolio return), hmlt (high-minus-low portfolio return), rmwt (profitability portfolio return), cmat (investment portfolio return) and the first five estimated risk factors. Rolling windows of T = 12 observations from Dec 1960 to Dec 2013. Data source: Kenneth French's website.

1/N     k = k̂    k = 1    k = 5     k = 10
0.58    0.88     0.65     0.18      0.14
-       (2.13)   (0.49)   (-2.86)   (-3.15)

Table VIII: Column one reports the out-of-sample Sharpe ratio (annualized) corresponding to the equally-weighted portfolio excess return. Columns two to five report the out-of-sample Sharpe ratios (annualized) corresponding to the tangency portfolio excess returns:
\[
r^{mv,k}_{t+1}=w^{mv\prime}_t x_{t+1\cdot},
\]
corresponding to an estimated k-factor model, with portfolio weights:
\[
w^{mv}_t=\frac{\hat\Sigma_t^{-1}\hat\Lambda_t\hat\gamma}{1_N'\hat\Sigma_t^{-1}\hat\Lambda_t\hat\gamma},\qquad\text{where}\quad\hat\Sigma_t=\big(\hat\Lambda_t\hat\Omega\hat\Lambda_t'+\hat\sigma_t^{2}I_N\big).
\]
Rolling windows of T = 36 observations from Dec 1960 to Dec 2013. Data source: individual equity returns from CRSP. The second row reports the t-ratios for testing equality between the Sharpe ratios of the portfolios associated with the factor model and the Sharpe ratio of the equally weighted portfolio (see Lo (2003) for details).
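The weight formula in the caption of Table VIII can be sketched in a few lines. The code below is only an illustration of that formula, not the paper's implementation: the function name `tangency_weights` and the simulated inputs standing in for the estimated loadings, factor covariance, risk premia and idiosyncratic variance are hypothetical.

```python
import numpy as np

def tangency_weights(Lam, Omega, gamma, sigma2):
    """Weights proportional to Sigma^{-1} Lam gamma, normalised to sum to one,
    with Sigma = Lam Omega Lam' + sigma2 * I_N."""
    N = Lam.shape[0]
    Sigma = Lam @ Omega @ Lam.T + sigma2 * np.eye(N)
    raw = np.linalg.solve(Sigma, Lam @ gamma)   # Sigma^{-1} Lam gamma without forming the inverse
    return raw / raw.sum()                      # divide by 1_N' Sigma^{-1} Lam gamma

# Hypothetical inputs standing in for the estimates of one rolling window.
rng = np.random.default_rng(7)
N, k = 40, 2
Lam_hat = rng.normal(1.0, 0.5, size=(N, k))
Omega_hat = np.array([[1.0, 0.2], [0.2, 0.5]])
gamma_hat = np.array([0.6, 0.3])
sigma2_hat = 0.8
w = tangency_weights(Lam_hat, Omega_hat, gamma_hat, sigma2_hat)
```

For large N one could replace the N × N solve with the Woodbury form, since the covariance matrix is a rank-k update of a scaled identity; the direct solve is kept here for readability.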

8.7 Figures

Figure I: T = 10, N = 100. The figure presents the histograms (across Monte Carlo iterations) of the sample correlations between $F_t$ and $\hat F_t$ (dark blue) and of the sample correlations between $\lambda_i$ and $\hat\lambda_i$ (red). The correlations have been taken in absolute value.

[Figure I plot: "Histogram of abs(corre)"; x-axis: abs(corre), y-axis: Density.]

Figure II: T = 100, N = 100. The figure presents the histograms (across Monte Carlo iterations) of the sample correlations between $F_t$ and $\hat F_t$ (dark blue) and of the sample correlations between $\lambda_i$ and $\hat\lambda_i$ (red). The correlations have been taken in absolute value.

[Figure II plot: "Histogram of abs(corre)"; x-axis: abs(corre), y-axis: Density.]

Figure III: T = 2, N = 10. The figure presents the histogram of (studentized) estimated factors (across Monte Carlo iterations), namely $f^{small-T}_s=(A'B_sA)^{-1/2}N^{1/2}(\hat F_s-HF_s)$ (green area) and $f^{large-T}_s=(\sigma^{2}U^{*-1})^{-1/2}N^{1/2}(\hat F_s-HF_s)$ (blue area) for s = 1. The standard normal density function is superimposed on the histograms (dark blue line).

[Figure III plot: "Histogram of fstn"; x-axis: fstn, y-axis: Density.]

Figure IV: T = 2, N = 1000. The figure presents the histogram of (studentized) estimated factors






[Figure IV plot: "Histogram of fstn"; x-axis: fstn, y-axis: Density.]

Figure V: T = 5, N = 10. The figure presents the histogram of (studentized) estimated factors






[Figure V plot: "Histogram of fstn"; x-axis: fstn, y-axis: Density.]

Figure VI: T = 5, N = 1000. The figure presents the histogram of (studentized) estimated factors






[Figure VI plot: "Histogram of fstn"; x-axis: fstn, y-axis: Density.]

Figure VII: T = 50, N = 10. The figure presents the histogram of (studentized) estimated factors






[Figure VII plot: "Histogram of fstn"; x-axis: fstn, y-axis: Density.]

Figure VIII: T = 50, N = 1000. The figure presents the histogram of (studentized) estimated factors (across Monte Carlo iterations), namely $f^{small-T}_s=(A'B_sA)^{-1/2}N^{1/2}(\hat F_s-HF_s)$ (green area) and $f^{large-T}_s=(\sigma^{2}U^{*-1})^{-1/2}N^{1/2}(\hat F_s-HF_s)$ (blue area) for s = 1. The standard normal density function is superimposed on the histograms (dark blue line).

[Figure VIII plot: "Histogram of fstn"; x-axis: fstn, y-axis: Density.]

Figure IX: T = 50, N = 1000. The figure presents the histogram of (studentized) estimated factors




12 (Fs−HFs) (blue area) for s = 1. We also report (red area) the histogram

of (studentized) estimated factors using Bai (2003) robust standard errors. The standard normal density function is superimposed on the histograms (dark blue line).

[Figure IX plot: "Histogram of fstn"; x-axis: fstn, y-axis: Density.]

Figure X: T = 100, N = 1000. The figure presents the histogram of (studentized) estimated factors




12 (Fs−HFs) (blue area) for s = 1. We also report (red area) the histogram

of (studentized) estimated factors using Bai (2003) robust standard errors. The standard normal density function is superimposed on the histograms (dark blue line).

[Figure X plot: "Histogram of fstn"; x-axis: fstn, y-axis: Density.]

Figure XI: T = 5, N = 10. The figure presents the time series (average across Monte Carlo iterations) of the 95% confidence intervals (red lines) for the true factor (black line):
\[
\Big(\hat\beta\hat F_t-1.96\,\frac{\hat\beta}{N^{1/2}}(A'B_tA)^{1/2},\ \hat\beta\hat F_t+1.96\,\frac{\hat\beta}{N^{1/2}}(A'B_tA)^{1/2}\Big),\qquad t=1,\cdots,T,
\]
where $\hat\beta$ is the OLS estimator from projecting $F$ on $\hat F$ (without intercept).

[Figure XI plot: "95% Confidence Interval for Factors: T=5, N=10"; x-axis: t, y-axis: t(ftrue).]

Figure XII: T = 5, N = 1000. The figure presents the time series (average across Monte Carlo iterations) of the 95% confidence intervals (red lines) for the true factor (black line):
\[
\Big(\hat\beta\hat F_t-1.96\,\frac{\hat\beta}{N^{1/2}}(A'B_tA)^{1/2},\ \hat\beta\hat F_t+1.96\,\frac{\hat\beta}{N^{1/2}}(A'B_tA)^{1/2}\Big),\qquad t=1,\cdots,T,
\]
where $\hat\beta$ is the OLS estimator from projecting $F$ on $\hat F$ (without intercept).

[Figure XII plot: "95% Confidence Interval for Factors: T=5, N=1000"; x-axis: t, y-axis: t(ftrue).]

Figure XIII: The figure presents the time series of the estimated number of factors kt−1 (black line), using the criterion of Theorem t-numfac, together with the S&P 500 Index (green line) from January 1966 to December 2013, with rolling windows of size T = 12. The grey bands indicate the NBER recession indicators and various economic and financial crises, and are numbered as follows: [1] 1969.10-1970.11, [2] 1973.11-1975.3, [3] 1987.8-1977.11, [4] 1980.1-1980.7, [5] 1981.7-1982.11, [6] 1986.10-1986.12, [7] 1987.9-1987.11, [8] 1989.9-1989.12, [9] 1990.7-1991.3, [10] 1991.8-1992.12, [11] 1994.7-1994.10, [12] 1997.5-1997.9, [13] 1998.8-1998.10, [14] 2000.2-2000.4, [15] 2001.3-2001.11, [16] 2005.8-2005.11, [17] 2007.12-2009.6, [18] 2010.8-2010.10, [19] 2012.5-2012.7. Data are monthly from Dec 1960 until Dec 2013.

Figure XIV: The figure presents the time series of the estimated number of factors kt−1, from January 1966 to December 2013, using the criterion of Theorem t-numfac, with rolling windows of size T = 12, together with grey bands indicating the NBER recession indicators and various economic and financial crises, numbered as follows: [1] 1969.10-1970.11, [2] 1973.11-1975.3, [3] 1987.8-1977.11, [4] 1980.1-1980.7, [5] 1981.7-1982.11, [6] 1986.10-1986.12, [7] 1987.9-1987.11, [8] 1989.9-1989.12, [9] 1990.7-1991.3, [10] 1991.8-1992.12, [11] 1994.7-1994.10, [12] 1997.5-1997.9, [13] 1998.8-1998.10, [14] 2000.2-2000.4, [15] 2001.3-2001.11, [16] 2005.8-2005.11, [17] 2007.12-2009.6, [18] 2010.8-2010.10, [19] 2012.5-2012.7. Data are monthly from Dec 1960 until Dec 2013.

Figure XV: The figure presents the time series of the first five estimated risk factors, namely the first five entries of Ft,t, based on rolling windows of T = 12 observations. The number of stocks, corresponding to each rolling window, varies from 800 to 5,000 circa. The grey bands indicate the NBER recession indicators and various economic and financial crises, and are numbered as follows: [1] 1969.10-1970.11, [2] 1973.11-1975.3, [3] 1987.8-1977.11, [4] 1980.1-1980.7, [5] 1981.7-1982.11, [6] 1986.10-1986.12, [7] 1987.9-1987.11, [8] 1989.9-1989.12, [9] 1990.7-1991.3, [10] 1991.8-1992.12, [11] 1994.7-1994.10, [12] 1997.5-1997.9, [13] 1998.8-1998.10, [14] 2000.2-2000.4, [15] 2001.3-2001.11, [16] 2005.8-2005.11, [17] 2007.12-2009.6, [18] 2010.8-2010.10, [19] 2012.5-2012.7. Data are monthly from Dec 1960 until Dec 2013.

Figure XVI: The figure presents the time series of the estimated correlation, over rolling windows of 60 months, of the market return with the first PCA, of the smb with the second PCA and of the hml with the third PCA, where the PCA is implemented over rolling windows of T = 12 observations. The number of stocks, corresponding to each rolling window, varies from 800 to 5,000 circa. The grey bands indicate the NBER recession indicators and various economic and financial crises, and are numbered as follows: [1] 1969.10-1970.11, [2] 1973.11-1975.3, [3] 1987.8-1977.11, [4] 1980.1-1980.7, [5] 1981.7-1982.11, [6] 1986.10-1986.12, [7] 1987.9-1987.11, [8] 1989.9-1989.12, [9] 1990.7-1991.3, [10] 1991.8-1992.12, [11] 1994.7-1994.10, [12] 1997.5-1997.9, [13] 1998.8-1998.10, [14] 2000.2-2000.4, [15] 2001.3-2001.11, [16] 2005.8-2005.11, [17] 2007.12-2009.6, [18] 2010.8-2010.10, [19] 2012.5-2012.7. Data are monthly from Dec 1960 until Dec 2013.

Figure XVII: The figure presents the time series of the estimated correlation, over rolling windows of 60 months, of the rmw and cma returns with the fourth PCA, where the PCA is implemented over rolling windows of T = 12 observations. The number of stocks, corresponding to each rolling window, varies from 800 to 5,000 circa. The grey bands indicate the NBER recession indicators and various economic and financial crises, and are numbered as follows: [1] 1969.10-1970.11, [2] 1973.11-1975.3, [3] 1987.8-1977.11, [4] 1980.1-1980.7, [5] 1981.7-1982.11, [6] 1986.10-1986.12, [7] 1987.9-1987.11, [8] 1989.9-1989.12, [9] 1990.7-1991.3, [10] 1991.8-1992.12, [11] 1994.7-1994.10, [12] 1997.5-1997.9, [13] 1998.8-1998.10, [14] 2000.2-2000.4, [15] 2001.3-2001.11, [16] 2005.8-2005.11, [17] 2007.12-2009.6, [18] 2010.8-2010.10, [19] 2012.5-2012.7. Data are monthly from Dec 1960 until Dec 2013.

Figure XVIII: The figure presents the time series of the estimated correlation, over rolling windows of 60 months, of the rmw and cma returns with the fifth PCA, where the PCA is implemented over rolling windows of T = 12 observations. The number of stocks, corresponding to each rolling window, varies from 800 to 5,000 circa. The grey bands indicate the NBER recession indicators and various economic and financial crises, and are numbered as follows: [1] 1969.10-1970.11, [2] 1973.11-1975.3, [3] 1987.8-1977.11, [4] 1980.1-1980.7, [5] 1981.7-1982.11, [6] 1986.10-1986.12, [7] 1987.9-1987.11, [8] 1989.9-1989.12, [9] 1990.7-1991.3, [10] 1991.8-1992.12, [11] 1994.7-1994.10, [12] 1997.5-1997.9, [13] 1998.8-1998.10, [14] 2000.2-2000.4, [15] 2001.3-2001.11, [16] 2005.8-2005.11, [17] 2007.12-2009.6, [18] 2010.8-2010.10, [19] 2012.5-2012.7. Data are monthly from Dec 1960 until Dec 2013.

Figure XIX: The figure presents the time series of the estimated expected equally-weighted portfolio excess return (black line) $\hat\mu^{ew}_t=w^{ew\prime}_N\hat\Lambda_t\hat\gamma$, where $w^{ew}_N=N^{-1}1_N$, based on the estimated number of factors reported in Figure XIV, with rolling windows of size T = 12. We also report (red line) the market dividend yield (ratio of aggregate dividends to the S&P 500 index; source Robert Shiller's website), (blue line) the default spread (Moody's Seasoned Baa Corporate Bond Yield Relative to Yield on 10-Year Treasury Constant Maturity, Not Seasonally Adjusted; source FRED), (light blue line) the term spread (10-Year Treasury Constant Maturity Minus 3-Month Treasury Constant Maturity, Not Seasonally Adjusted; source FRED) and (green line) the CAPE ratio (Cyclically Adjusted Price Earnings Ratio P/E10; source Robert Shiller's website). The grey bands indicate the NBER recession indicators and various economic and financial crises, and are numbered as follows: [1] 1969.10-1970.11, [2] 1973.11-1975.3, [3] 1987.8-1977.11, [4] 1980.1-1980.7, [5] 1981.7-1982.11, [6] 1986.10-1986.12, [7] 1987.9-1987.11, [8] 1989.9-1989.12, [9] 1990.7-1991.3, [10] 1991.8-1992.12, [11] 1994.7-1994.10, [12] 1997.5-1997.9, [13] 1998.8-1998.10, [14] 2000.2-2000.4, [15] 2001.3-2001.11, [16] 2005.8-2005.11, [17] 2007.12-2009.6, [18] 2010.8-2010.10, [19] 2012.5-2012.7. Data are monthly from Dec 1960 until Dec 2013.

Figure XX: The figure presents the time series of the adjusted-R2 of the regression, over rolling win-dows of 60 observations, of µewt = wew′

N Λtγ, where wewN = N−11N , based on the estimated number

of factors reported in Figure XIV, over four state variables, namely the market dividend yield (ratioof aggregate dividends to the S&P 500 index; source Robert Shiller’s website), the default spread(Moody’s Seasoned Baa Corporate Bond Yield Relative to Yield on 10-Year Treasury ConstantMaturity, Not Seasonally Adjusted; source FRED), the term spread (10-Year Treasury ConstantMaturity Minus 3-Month Treasury Constant Maturity, Not Seasonally Adjusted; source FRED) andthe CAPE ratio (Cyclically Adjusted Price Earnings Ratio P/E10; source Robert Shiller website).The black line corresponds to the R2 when the four variables are considered jointly, and the colouredlines when each state variable is considered alone in the regression (red line for CAPE, green linefor term spread, blue line for default spread, light blue for dividend yield). The grey bands indi-cating the NBER recession indicators and various economic and financial crises, and are numberedas follows: [1]: 1969.10 - 1970.11, [2]: 1973.11 - 1975.3, [3]: 1987.8-1977.11, [4]: 1980.1-1980.7, [5]:1981.7-1982.11, [6]: 1986.10-1986.12, [7] 1987.9 - 1987.11, [8] 1989.9 - 1989.12, [9] 1990.7 - 1991.3,[10] 1991.8 -01992.12, [11] 1994.7-1994.10, [12] 1997.5-1997.9, [13]1998.8-1998.10, [14]2000.2-2000.4,[15]2001.3-2001.11, [16]2005.8-2005.11, [17]2007.12-2009.6, [18]2010.8-2010.10, [19]2012.5-2012.7.Data are monthly from Dec 1960 until Dec 2013.


Figure XXI: The figure presents the time series of the estimated risk premium (black line) associated with the first risk factor, with rolling windows of size T = 12. When it is statistically significantly positive or negative at the 95% confidence level, we report it in green or red, respectively, based on the results of Theorem 4. The grey bands mark the NBER recession indicators and various economic and financial crises, and are numbered as follows: [1]: 1969.10 - 1970.11, [2]: 1973.11 - 1975.3, [3]: 1977.8 - 1977.11, [4]: 1980.1 - 1980.7, [5]: 1981.7 - 1982.11, [6]: 1986.10 - 1986.12, [7]: 1987.9 - 1987.11, [8]: 1989.9 - 1989.12, [9]: 1990.7 - 1991.3, [10]: 1991.8 - 1992.12, [11]: 1994.7 - 1994.10, [12]: 1997.5 - 1997.9, [13]: 1998.8 - 1998.10, [14]: 2000.2 - 2000.4, [15]: 2001.3 - 2001.11, [16]: 2005.8 - 2005.11, [17]: 2007.12 - 2009.6, [18]: 2010.8 - 2010.10, [19]: 2012.5 - 2012.7. Data are monthly from Dec 1960 until Dec 2013.


Figure XXII: The figure presents the time series of the estimated risk premium (black line) associated with the second risk factor, with rolling windows of size T = 12. When it is statistically significantly positive or negative at the 5% significance level, we report it in green or red, respectively, based on the results of Theorem 4. The grey bands mark the NBER recession indicators and various economic and financial crises, and are numbered as follows: [1]: 1969.10 - 1970.11, [2]: 1973.11 - 1975.3, [3]: 1977.8 - 1977.11, [4]: 1980.1 - 1980.7, [5]: 1981.7 - 1982.11, [6]: 1986.10 - 1986.12, [7]: 1987.9 - 1987.11, [8]: 1989.9 - 1989.12, [9]: 1990.7 - 1991.3, [10]: 1991.8 - 1992.12, [11]: 1994.7 - 1994.10, [12]: 1997.5 - 1997.9, [13]: 1998.8 - 1998.10, [14]: 2000.2 - 2000.4, [15]: 2001.3 - 2001.11, [16]: 2005.8 - 2005.11, [17]: 2007.12 - 2009.6, [18]: 2010.8 - 2010.10, [19]: 2012.5 - 2012.7. Data are monthly from Dec 1960 until Dec 2013.


Figure XXIII: The figure presents the time series of the estimated risk premium (black line) associated with the third risk factor, with rolling windows of size T = 12. When it is statistically significantly positive or negative at the 5% significance level, we report it in green or red, respectively, based on the results of Theorem 4. The grey bands mark the NBER recession indicators and various economic and financial crises, and are numbered as follows: [1]: 1969.10 - 1970.11, [2]: 1973.11 - 1975.3, [3]: 1977.8 - 1977.11, [4]: 1980.1 - 1980.7, [5]: 1981.7 - 1982.11, [6]: 1986.10 - 1986.12, [7]: 1987.9 - 1987.11, [8]: 1989.9 - 1989.12, [9]: 1990.7 - 1991.3, [10]: 1991.8 - 1992.12, [11]: 1994.7 - 1994.10, [12]: 1997.5 - 1997.9, [13]: 1998.8 - 1998.10, [14]: 2000.2 - 2000.4, [15]: 2001.3 - 2001.11, [16]: 2005.8 - 2005.11, [17]: 2007.12 - 2009.6, [18]: 2010.8 - 2010.10, [19]: 2012.5 - 2012.7. Data are monthly from Dec 1960 until Dec 2013.


Figure XXIV: The figure presents the time series (black line), estimated over rolling windows of T = 12 observations, of the pricing performance statistic:

$$\Delta_{t-1} = \frac{1}{N} \sum_{i=1}^{N} \delta_{it-1}^{2} - \frac{\sigma_{t-1}^{2}}{T}\,(1 - \gamma'\gamma),$$

where $\delta_{it-1} = x_i - \lambda_{it-1}'\gamma$. The black line corresponds to the number of factors $k_{t-1}$ estimated with our criterion. We also report the cases k = 1 (red line), k = 2 (green line), k = 3 (blue line) and k = 5 (light blue line). The grey bands mark the NBER recession indicators and various economic and financial crises, and are numbered as follows: [1]: 1969.10 - 1970.11, [2]: 1973.11 - 1975.3, [3]: 1977.8 - 1977.11, [4]: 1980.1 - 1980.7, [5]: 1981.7 - 1982.11, [6]: 1986.10 - 1986.12, [7]: 1987.9 - 1987.11, [8]: 1989.9 - 1989.12, [9]: 1990.7 - 1991.3, [10]: 1991.8 - 1992.12, [11]: 1994.7 - 1994.10, [12]: 1997.5 - 1997.9, [13]: 1998.8 - 1998.10, [14]: 2000.2 - 2000.4, [15]: 2001.3 - 2001.11, [16]: 2005.8 - 2005.11, [17]: 2007.12 - 2009.6, [18]: 2010.8 - 2010.10, [19]: 2012.5 - 2012.7. Data are monthly from Dec 1960 until Dec 2013.
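As a rough illustration, the pricing performance statistic in the caption of Figure XXIV can be computed for a single window as follows. The sketch takes the inputs (the vector of $x_i$, the loadings, $\gamma$ and $\sigma^2$) as given, since how they are estimated is defined elsewhere in the paper; the function name `pricing_statistic` and its argument names are hypothetical.

```python
import numpy as np

def pricing_statistic(x, Lam, gamma, sigma2, T):
    """Delta = mean_i(delta_i^2) - (sigma2 / T) * (1 - gamma'gamma).

    x:      (N,) vector of x_i, one entry per asset (assumed already computed).
    Lam:    (N, k) matrix whose i-th row is the loading vector lambda_i'.
    gamma:  (k,) vector of risk premia.
    sigma2: scalar sigma^2 for the window.
    T:      number of time-series observations in the window.
    """
    x = np.asarray(x, dtype=float)
    Lam = np.asarray(Lam, dtype=float)
    gamma = np.asarray(gamma, dtype=float)
    # Pricing errors: delta_i = x_i - lambda_i' gamma.
    delta = x - Lam @ gamma
    return np.mean(delta ** 2) - (sigma2 / T) * (1.0 - gamma @ gamma)
```

Evaluating this function window by window, for each candidate number of factors k, would produce one curve per k of the kind the caption describes.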


Figure XXV: The figure presents the time series (black line) of the estimated SDF:

$$m_{t,t+1} \equiv \frac{1}{r_{ft}} - \frac{1}{r_{ft}}\,\gamma'\Omega^{-1}(F_{t+1} - \gamma),$$

together with its large-N 95% confidence interval (red lines), based on Theorem 5, with rolling windows of size T = 24. The grey bands mark the NBER recession indicators and various economic and financial crises, and are numbered as follows: [1]: 1969.10 - 1970.11, [2]: 1973.11 - 1975.3, [3]: 1977.8 - 1977.11, [4]: 1980.1 - 1980.7, [5]: 1981.7 - 1982.11, [6]: 1986.10 - 1986.12, [7]: 1987.9 - 1987.11, [8]: 1989.9 - 1989.12, [9]: 1990.7 - 1991.3, [10]: 1991.8 - 1992.12, [11]: 1994.7 - 1994.10, [12]: 1997.5 - 1997.9, [13]: 1998.8 - 1998.10, [14]: 2000.2 - 2000.4, [15]: 2001.3 - 2001.11, [16]: 2005.8 - 2005.11, [17]: 2007.12 - 2009.6, [18]: 2010.8 - 2010.10, [19]: 2012.5 - 2012.7. Data are monthly from Dec 1960 until Dec 2013.
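The SDF formula in the caption of Figure XXV can be evaluated with a short sketch once its ingredients are available. The sketch assumes that $r_{ft}$ is supplied as a scalar, that $\Omega$ is an invertible k x k matrix, and that $\gamma$ and $F_{t+1}$ are k-vectors; the function name `sdf_value` is hypothetical, and the confidence interval of Theorem 5 is not reproduced here.

```python
import numpy as np

def sdf_value(rf, gamma, Omega, F_next):
    """m_{t,t+1} = 1/rf - (1/rf) * gamma' Omega^{-1} (F_{t+1} - gamma).

    rf:     scalar r_{ft} (assumed non-zero).
    gamma:  (k,) vector of risk premia.
    Omega:  (k, k) invertible matrix.
    F_next: (k,) vector of factor realisations F_{t+1}.
    """
    gamma = np.asarray(gamma, dtype=float)
    Omega = np.asarray(Omega, dtype=float)
    F_next = np.asarray(F_next, dtype=float)
    # Solve Omega w = F_{t+1} - gamma instead of forming the inverse explicitly.
    w = np.linalg.solve(Omega, F_next - gamma)
    return (1.0 - gamma @ w) / rf
```

Using `np.linalg.solve` rather than an explicit inverse is a numerical choice only; it yields the same value as the formula whenever $\Omega$ is well conditioned.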


Figure XXVI: We estimate the three linear regressions, using rolling windows of 60 months,

$$m_{t-1,t} = \alpha + \beta_1 \mathrm{mkt}_t + u_t,$$

$$m_{t-1,t} = \alpha + \beta_1 \mathrm{mkt}_t + \beta_2 \mathrm{smb}_t + \beta_3 \mathrm{hml}_t + u_t,$$

$$m_{t-1,t} = \alpha + \beta_1 \mathrm{mkt}_t + \beta_2 \mathrm{smb}_t + \beta_3 \mathrm{hml}_t + \beta_4 \mathrm{rmw}_t + \beta_5 \mathrm{cma}_t + u_t,$$

and report the corresponding adjusted-$R^2$. Here $\mathrm{mkt}_t$ denotes the market excess return, $\mathrm{smb}_t$ denotes the small-minus-big portfolio return, $\mathrm{hml}_t$ denotes the high-minus-low portfolio return, $\mathrm{rmw}_t$ denotes the profitability portfolio return and $\mathrm{cma}_t$ denotes the investment portfolio return (source: Kenneth French's website); t-ratios in parentheses. The CAPM, the Fama & French (1993) and the Fama & French (2015) models are reported with the black, red and green lines, respectively. The grey bands mark the NBER recession indicators and various economic and financial crises, and are numbered as follows: [1]: 1969.10 - 1970.11, [2]: 1973.11 - 1975.3, [3]: 1977.8 - 1977.11, [4]: 1980.1 - 1980.7, [5]: 1981.7 - 1982.11, [6]: 1986.10 - 1986.12, [7]: 1987.9 - 1987.11, [8]: 1989.9 - 1989.12, [9]: 1990.7 - 1991.3, [10]: 1991.8 - 1992.12, [11]: 1994.7 - 1994.10, [12]: 1997.5 - 1997.9, [13]: 1998.8 - 1998.10, [14]: 2000.2 - 2000.4, [15]: 2001.3 - 2001.11, [16]: 2005.8 - 2005.11, [17]: 2007.12 - 2009.6, [18]: 2010.8 - 2010.10, [19]: 2012.5 - 2012.7. Data are monthly from Dec 1960 until Dec 2013.