65
Smart Stochastic Discount Factors SOFONIAS ALEMU KORSAYE, ALBERTO QUAINI and FABIO TROJANI * January 9, 2020 ABSTRACT We introduce model-free Smart Stochastic Discount Factors (S–SDFs) minimizing various notions of SDF dispersion, under general convex constraints on non zero pricing errors. S–SDFs can be naturally motivated by market frictions, asymptotic no-arbitrage conditions in an APT framework or a need for SDF regulariza- tion. More broadly, we show that they are always supported by a suitable viable economy with transaction costs. Minimum dispersion S–SDFs give rise to new nonparametric bounds for asset pricing models, under weaker assumptions on a model’s ability to price cross-sections of assets. They arise from a simple transfor- mation of the optimal payoff in a dual penalized portfolio problem. We clarify the deep properties of S–SDFs induced by various economically motivated pricing error structures and develop a systematic tractable ap- proach for their empirical analysis. For various APT settings, we demonstrate the improved out-of-sample pricing performance of minimum dispersion S–SDFs, which directly corresponds to highly profitable dual portfolio strategies. Keywords: SDF, Pricing Errors, Fundamental Theorem of Asset Pricing, Market Frictions, Model-free SDF, Minimum Dispersion SDF, Arbitrage Pricing Theory, Factor Models, Tests of Asset Pricing Models, SDF Bounds, Transaction Costs, Portfolio Regularization, Lasso, Ridge, Elastic Net. * Sofonias Alemu Korsaye (email: [email protected]) and Fabio Trojani (email: [email protected]) are with University of Geneva, Geneva Finance Research Institute & Swiss Finance Institute. Alberto Quaini (email: [email protected]) is with University of Geneva, Geneva Finance Research Institute. We thank Caio Almeida, Fed- erico Bandi, Tony Berrada, Federico Carlini, Ines Chaieb, George Constantinides, Jerome Detemple, Patrick Gagliardini, Eric Ghysels, Lars Hansen, Oliver Linton, Stefan Nagel, Olivier Scaillet, Paul Schneider, Michael Weber, Dacheng Xiu, conference participants at the 2019 SoFiE annual Meeting in Shanghai, the 2019 European Econometric Society Meeting in Manchester, the 2019 Workshop on Big Data and Economic Forecasting in Ispra, the 2019 Conference on Quantitative Finance and Fi- nancial Econometrics in Marseille, the 2019 Financial Econometrics Conference in Toulouse, the 2019 Vienna Congress on Mathematical Finance, the 2019 Swiss Finance Institute Research days, the 2019 ESSEC Workshop on Monte Carlo Methods and Approximate Dynamic Programming with Applications in Finance, the 2019 Paris December Finance Meeting and seminar participants at BI Norwegian Business School, Bocconi University, University of Zurich, University of Lugano, University of Geneva, University of Lund, JRC in Ispra and Luiss Business School. All errors are ours.

Smart Stochastic Discount Factors · SOFONIAS ALEMU KORSAYE, ALBERTO QUAINI and FABIO TROJANI January 9, 2020 ABSTRACT We introduce model-free Smart Stochastic Discount Factors (S{SDFs)

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Smart Stochastic Discount Factors · SOFONIAS ALEMU KORSAYE, ALBERTO QUAINI and FABIO TROJANI January 9, 2020 ABSTRACT We introduce model-free Smart Stochastic Discount Factors (S{SDFs)

Smart Stochastic Discount Factors

SOFONIAS ALEMU KORSAYE, ALBERTO QUAINI and FABIO TROJANI∗

January 9, 2020

ABSTRACT

We introduce model-free Smart Stochastic Discount Factors (S–SDFs) minimizing various notions of SDF

dispersion, under general convex constraints on non zero pricing errors. S–SDFs can be naturally motivated

by market frictions, asymptotic no-arbitrage conditions in an APT framework or a need for SDF regulariza-

tion. More broadly, we show that they are always supported by a suitable viable economy with transaction

costs. Minimum dispersion S–SDFs give rise to new nonparametric bounds for asset pricing models, under

weaker assumptions on a model’s ability to price cross-sections of assets. They arise from a simple transfor-

mation of the optimal payoff in a dual penalized portfolio problem. We clarify the deep properties of S–SDFs

induced by various economically motivated pricing error structures and develop a systematic tractable ap-

proach for their empirical analysis. For various APT settings, we demonstrate the improved out-of-sample

pricing performance of minimum dispersion S–SDFs, which directly corresponds to highly profitable dual

portfolio strategies.

Keywords: SDF, Pricing Errors, Fundamental Theorem of Asset Pricing, Market Frictions, Model-free SDF,

Minimum Dispersion SDF, Arbitrage Pricing Theory, Factor Models, Tests of Asset Pricing Models, SDF

Bounds, Transaction Costs, Portfolio Regularization, Lasso, Ridge, Elastic Net.

∗Sofonias Alemu Korsaye (email: [email protected]) and Fabio Trojani (email: [email protected])are with University of Geneva, Geneva Finance Research Institute & Swiss Finance Institute. Alberto Quaini (email:[email protected]) is with University of Geneva, Geneva Finance Research Institute. We thank Caio Almeida, Fed-erico Bandi, Tony Berrada, Federico Carlini, Ines Chaieb, George Constantinides, Jerome Detemple, Patrick Gagliardini, EricGhysels, Lars Hansen, Oliver Linton, Stefan Nagel, Olivier Scaillet, Paul Schneider, Michael Weber, Dacheng Xiu, conferenceparticipants at the 2019 SoFiE annual Meeting in Shanghai, the 2019 European Econometric Society Meeting in Manchester,the 2019 Workshop on Big Data and Economic Forecasting in Ispra, the 2019 Conference on Quantitative Finance and Fi-nancial Econometrics in Marseille, the 2019 Financial Econometrics Conference in Toulouse, the 2019 Vienna Congress onMathematical Finance, the 2019 Swiss Finance Institute Research days, the 2019 ESSEC Workshop on Monte Carlo Methodsand Approximate Dynamic Programming with Applications in Finance, the 2019 Paris December Finance Meeting and seminarparticipants at BI Norwegian Business School, Bocconi University, University of Zurich, University of Lugano, University ofGeneva, University of Lund, JRC in Ispra and Luiss Business School. All errors are ours.

Page 2: Smart Stochastic Discount Factors · SOFONIAS ALEMU KORSAYE, ALBERTO QUAINI and FABIO TROJANI January 9, 2020 ABSTRACT We introduce model-free Smart Stochastic Discount Factors (S{SDFs)

Contents

1 Introduction 4

2 Theory of Smart Stochastic Discount Factors 7

2.1 Existence of S–SDFs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

2.2 Minimum dispersion S–SDFs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

3 Relevant examples of S–SDFs 13

3.1 S–SDFs directly induced by market frictions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

3.2 S–SDFs induced by machine learning penalization devices . . . . . . . . . . . . . . . . . . . . 15

3.3 S– SDFs induced by constrained pricing error minimizations . . . . . . . . . . . . . . . . . . . 17

3.3.1 Good-Deal Bounds S–SDFs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

3.3.2 Robust linear S–SDFs with shrinkage . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

3.4 S–SDFs induced by the Arbitrage Pricing Theory . . . . . . . . . . . . . . . . . . . . . . . . . 20

3.4.1 APT–consistent S–SDFs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

3.4.2 Penalized principal components S–SDFs . . . . . . . . . . . . . . . . . . . . . . . . . . 22

4 Econometrics of minimum dispersion S–SDFs 24

4.1 Asymptotic properties of dispersion bound and dual portfolio weight estimators . . . . . . . . 25

4.2 S–SDF dispersion bound diagnostics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

5 Empirical Analysis: S–SDFs in the APT Framework 27

5.1 Dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

5.2 In–sample properties of APT–consistent minimum dispersion S–SDFs . . . . . . . . . . . . . 29

5.3 Out-of-sample pricing properties of APT–consistent S–SDFs . . . . . . . . . . . . . . . . . . . 31

5.3.1 S–SDF estimation setting and benchmark empirical asset pricing models . . . . . . . . 32

5.3.2 Out-of-sample pricing results for APT–consistent minimum variance S–SDFs . . . . . 33

2

Page 3: Smart Stochastic Discount Factors · SOFONIAS ALEMU KORSAYE, ALBERTO QUAINI and FABIO TROJANI January 9, 2020 ABSTRACT We introduce model-free Smart Stochastic Discount Factors (S{SDFs)

5.3.3 Out-of-sample pricing results for other APT–consistent minimum dispersion S–SDFs . 34

5.4 Optimal trading strategies supporting APT–consistent S–SDFs . . . . . . . . . . . . . . . . . 35

6 Conclusions 37

Appendices 38

A - Figures and Tables 38

B - Proofs 49

Appendices 58

A - Cressie-Read Dispersions 58

B - Estimation and inference for minimum dispersion S–SDFs 59

B.1 Asymptotic distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60

B.1.1 Asymptotic distribution of estimators for S–SDF dispersion bounds . . . . . . . . . . 60

B.1.2 Asymptotic distribution of estimator for S–SDF dual portfolio weights . . . . . . . . . 61

C - Proofs of Section B 63

3

Page 4: Smart Stochastic Discount Factors · SOFONIAS ALEMU KORSAYE, ALBERTO QUAINI and FABIO TROJANI January 9, 2020 ABSTRACT We introduce model-free Smart Stochastic Discount Factors (S{SDFs)

1 Introduction

Hansen and Jagannathan [1991] provide a unified analysis and testing framework for asset pricing models

under frictionless arbitrage-free markets, based on the minimum variance stochastic discount factor (SDF)

that exactly prices a given set of securities. By construction, the minimum variance SDF provides both a

lower bound on the variance of any admissible SDF and - by means of its dual portfolio characterization - an

upper bound on the maximal attainable Sharpe ratio using linear portfolios of traded securities. In this way,

it provides a useful reality check for any asset pricing model, in terms of both the required SDF variability and

the attainable risk-return tradeoff. A large literature has built on these findings, by deriving the implications

of more general notions of SDF dispersion for the admissible SDF variability and the attainable risk return

tradeoff, giving rise to an extended family of model reality checks that can incorporate a concern for, e.g.,

higher-moment risk; see Backus et al. [2011], Juilliard and Ghosh [2012], Gosh et al. [2017] and Almeida and

Garcia [2016], among others. In this paper, we extend this literature to general asset pricing settings where

SDFs can incorporate widespread constraints on non-zero pricing errors. We call such SDFs Smart SDFs

(S–SDFs).

Economically, S–SDFs naturally arise already in arbitrage-free economies with frictions that allow unbounded

total wealth allocations to risky assets - such as proportional transaction costs, short-sale constraints or

margin requirements - because of the sublinearity of the pricing functional in such settings; see, e.g, Luttmer

[1996]. We first show that S–SDFs are more generally supported by viable markets with convex transaction

costs that can directly constrain the total wealth allocation to risky assets, such as, e.g., quadratic transaction

costs or constraints on total leverage. In such settings, we carefully embed total wealth penalizations in our

S–SDF methodology, because they are essential to obtain well-behaved minimum dispersion S–SDFs also

for markets where the standard SDF duality characterizations in Hansen and Jagannathan [1991], Almeida

and Garcia [2016] and Luttmer [1996], among others, do not apply; see, e.g., Borwein [1993] for counter-

examples. We further show that S–SDFs naturally arise also under asymptotic notions of no-arbitrage,

which are typically assumed in Ross’s Arbitrage Pricing Theory (APT). In this context, we allow for flexible

constraints on non zero pricing errors that can address also situations where the underlying factor model for

returns may be misspecified; see, e.g., Uppal et al. [2018].

We define minimum dispersion S–SDFs as the solutions of general minimum SDF dispersion problems with

flexible convex constraints on the pricing errors of a subset of assets. The generality of our setting allows

4

Page 5: Smart Stochastic Discount Factors · SOFONIAS ALEMU KORSAYE, ALBERTO QUAINI and FABIO TROJANI January 9, 2020 ABSTRACT We introduce model-free Smart Stochastic Discount Factors (S{SDFs)

us to address with one coherent framework several specifications of frictions and total wealth constraints

previously considered in the literature, various pricing error bounds implied by the APT, and more broadly

any pricing error geometry induced by convex pricing error metrics. In parallel, our class of SDF dispersions

is broad enough to embed a concern for second and higher-order risks, as measured, e.g., by notions of SDF

variability induced by Cressie and Read [1984] power divergences; see also Kitamura et al. [2004] and Newey

and Smith [2004], among others.

We formally link minimum dispersion S–SDFs to the solutions of dual portfolio selection problems with

penalized portfolio weights, under a uniquely determined convex penalization function. This dual formulation

explicitly measures the direct effect on optimal portfolio weights of, e.g., market frictions, of the type of

portfolio weight regularization used to obtain a well-behaved S–SDF, of a possible violation of the Arbitrage

Pricing Theory factor model assumptions, or a combination of these and other features. Equivalently,

the one-to-one relation between the specification of convex pricing error constraints and the corresponding

penalization on portfolio weights uniquely identifies the deep S–SDF properties that are induced, e.g., by a

given market friction, a particular regularization choice or a specific form of misspecification in the APT.

The established duality between minimum dispersion S–SDFs and the solution of the corresponding penalized

optimal portfolio problem allows us to develop a systematic approach also for the empirical analysis of

minimum dispersion S–SDFs and their corresponding minimum dispersion bounds. In contrast to standard

minimum dispersion SDFs, the portfolio optimization problem underlying minimum dispersion S–SDFs can

imply a non smooth objective function, because its solution depends on a possibly nonsmooth penalization.

In order to develop our empirical methodology for a broad class of convex pricing error specifications, we

systematically make use of accurate approximations of portfolio weights penalties based on Moreau [1962]

envelopes. This approach gives rise to smooth S–SDF estimation problems that are compatible with key S–

SDF properties, such as pricing errors or portfolio weights sparsity, and are at the same time more tractable

for applications.

In our empirical analysis, we systematically study minimum dispersion S–SDFs that are consistent with the

pricing errors bounds induced by APT–like asymptotic no arbitrage conditions. We consider several such

S–SDFs, using dispersion measures in the class of Cressie-Read power dispersions and several pricing error

metrics that are compatible with various degrees of sparsity on pricing errors or portfolio weights.

We first show that pricing error sparsity is a key determinant of large S–SDF dispersions when pricing a

5

Page 6: Smart Stochastic Discount Factors · SOFONIAS ALEMU KORSAYE, ALBERTO QUAINI and FABIO TROJANI January 9, 2020 ABSTRACT We introduce model-free Smart Stochastic Discount Factors (S{SDFs)

cross-section of asset returns without prior assumptions about exactly priced risk factors in the APT. In

contrast, the trade-off between portfolio weight sparsity and S–SDF dispersion is less tight. This evidence

indicates that S–SDFs with dense pricing errors may be better able to explain a given cross-section of asset

returns, while targeting a given pricing error bound in the APT without taking a stand on which systematic

risk factors are exactly priced.

Second, we explore the distinct properties of minimum dispersion S–SDFs induced by different notions

of dispersion, including Hansen-Jagannathan distance, Kullback-Leibler, negative entropy and Hellinger

dispersions. We find that these S–SDFs can be very different when no pricing error is admitted, due to

the strict higher-moment effects in the empirical return distribution. However, differences tend to shrink

substantially when introducing pricing errors that are consistent with the pricing errors bounds induced by

APT-type no-arbitrage conditions. Hence, SDF regularizations induced by APT pricing error bounds make

minimimum dispersion S–SDFs less dependent on the dispersion criterion used.

We then study the out-of-sample pricing properties of mimimum dispersion S–SDFs, relative to those of

well-known linear SDF benchmarks in the literature that correspond to various APT–like factor model

specifications. In doing so, we also systematically explore the implications of pricing error and portfolio

weight sparsity for the out-of-sample pricing performance, based on GLS-adjusted R2s in standard testing

procedures for empirical asset pricing models. We find that Hansen-Jagannathan minimum variance S–

SDFs with dense pricing errors and dense portfolio weights clearly outperform minimum variance S–SDFs

imposing sparsity on pricing errors or portfolio weights. Moreover, the out-of-sample pricing performance of

such S–SDFs dominates the one of minimum dispersion S–SDFs induced by notions of dispersion different

from variance, such as negative entropy or Kullback-Leibler dispersion. We also find that the out-of-sample

pricing performance of minimum variance S–SDFs is dramatically higher than the one of benchmark linear

asset pricing models resulting from APT–like factor model specifications. For instance, we find that while

the out-of-sample GLS-adjusted R2 generated by bechmark three- and five-factor Fama-French models on

an intermediate dimensional dataset with about 200 assets is only about 6%, the one generated by our

data-driven minimum variance S–SDFs is about 42%.

The large out-of-sample pricing performance of Hansen-Jagannathan minimum variance S–SDFs with dense

pricing errors and dense portfolio weights naturally corresponds to highly profitable optimal trading strate-

gies. Indeed, in our out-of sample period from July 1963 to June 2018, the unconditional annualized out-of-

6

Page 7: Smart Stochastic Discount Factors · SOFONIAS ALEMU KORSAYE, ALBERTO QUAINI and FABIO TROJANI January 9, 2020 ABSTRACT We introduce model-free Smart Stochastic Discount Factors (S{SDFs)

sample Sharpe ratio of the portfolio supporting our minimum variance S–SDFs is about 1.4, while those of

market capitalization-weighted and equally-weighted portfolio strategies are only 0.42 and 0.51, respectively.

Importantly, the out-of-sample Sharpe ratios of data-driven minimum variance S–SDF portfolios are largely

unrelated to the time-varying leverage in the underlying optimal S–SDF strategy. This feature indicates that

the excellent performance of minimum variance S–SDF portfolios predominantly arises from their ability to

successfully exploit the conditional mean-variance trade-offs, rather than from their dynamic timing features.

The remainder of the paper proceeds as follows. Section 2 presents our main theoretical findings. First,

we formally define S–SDFs and prove their existence in arbitrage-free (viable) financial markets supported

by corresponding sublinear (convex) transaction costs. Subsequently, we derive optimal model-free S–SDFs

that minimize various established notions of statistical dispersion. To this end, we employ Fenchel duality to

reduce the infinite dimensional optimization problem defining minimum dispersion S–SDFs to a correspond-

ing finite dimensional penalized portfolio problem. We then fully characterize minimum dispersion S–SDFs

as simple transformations of the optimal portfolio payoff resulting from the corresponding penalized dual

portfolio problems. In between these theoretical results, we provide various relevant economic examples of

S–SDFs, demonstrating how different friction, regularization and pricing error specifications in the litera-

ture are embedded into our approach, and we highlight their connections. Subsequently, we introduce our

systematic modelling methodology based on Moreau envelopes to construct S–SDFs inducing tractable, i.e.,

smooth, dual portfolio problems, given any target pricing error specification with desirable properties such

as sparsity. Finally, we introduce and explain in detail the notion of APT–consistent minimum dispersion

S–SDFs. Section B addresses estimation and inference for S–SDFs and the corresponding minimum disper-

sion bounds. These findings provide the foundations for a general testing framework of asset pricing models

based on S–SDFs, which is also developed. Section 5 presents our empirical analysis of S–SDFs induced by

APT-type asymptotic no-arbitrage conditions. Finally, Section 6 concludes.

2 Theory of Smart Stochastic Discount Factors

Consider an economy consisting of a fixed number N of basis securities with random payoffsX := (Xn)Nn=1 at

time 1 and associated quoted prices P ∈ RN at time 0. These securities are partitioned into what we call sure

and dubious securities, by means of two index sets, S ⊂ 1, . . . , N and its complement D := 1, . . . , N \S,

having cardinalities NS and ND, respectively. That is, PS := (Pn)n∈S denotes the vector of prices and

7

Page 8: Smart Stochastic Discount Factors · SOFONIAS ALEMU KORSAYE, ALBERTO QUAINI and FABIO TROJANI January 9, 2020 ABSTRACT We introduce model-free Smart Stochastic Discount Factors (S{SDFs)

XS := (Xn)n∈S the vector of payoffs of the sure assets. Similarly, PD and XD denote the vectors of dubious

prices and payoffs, respectively. The partition’s rationale asks that sure securities are exactly priced by a

linear pricing functional represented by a corresponding stochastic discount factor, while dubious securities

might be mispriced to some extent by such linear pricing rule.1 The vector of pricing errors under a generic

stochastic discount factor M is given by E[MX] − P .2 Given the willingness or the necessity to tolerate

some pricing errors for the dubious assets, a general approach to control their size and geometry can rely on

following pricing error constraints:

E[MXS ]− PS = 0 and h(E[MXD]− PD) ≤ τ , (1)

where τ > 0 and h : RND → [0,+∞] is a closed and convex pricing error function such that h(0) = 0.3

Literally, linear pricing functional E[M ·] in equation (1) matches exactly the quoted prices of sure payoffs

and it implies pricing errors on dubious assets bounded by threshold τ under pricing error function h.

Definition 1 (Smart Stochastic Discount Factors). Given pricing error function h and bound τ , a Smart

Stochastic Discount Factor (S–SDF) is a nonnegative random variable M satisfying pricing constraints (1).

Intuitively, function h in Definition 1 fixes the geometry of the pricing errors generated by an S–SDF and

it thus crucially determines the deep S–SDFs properties. For instance, the chosen pricing error geometry

has direct implications for the S–SDF characterization of assets excess returns. Indeed, following standard

expected excess return identity holds for any sure asset payoff:

E[Xn]− PnE[M ]

= −Cov[M

E[M ], Xn

], n ∈ S . (2)

In contrast, the expected excess returns of dubious assets satisfy:

E[Xn]− PnE[M ]

= −Cov[M

E[M ], Xn

]+

E[MXn]− PnE[M ]

, n ∈ D . (3)

It follows that the expected excess returns of dubious assets are given by the sum of a risk premium com-

1 One of these two index sets might be empty, in which case either only sure or only dubious securities exist. Note thatthis partitioning needs to be specified ex-ante, at the formation of an asset pricing model. As shown later, there are ways toendogenously discover which assets among the dubious index set can be considered ex-post as sure.

2 In more general multi-period economies, the vector of (conditional) pricing errors under stochastic discount factor Mis given by E[MX|T ] − P , where T is the information set available to investors at the time of the trade and E[·|T ] is thecorresponding conditional expectation operator. In this article, we do not model explicitly the information set T and workwithout loss of generality with the vector of expected pricing errors E[MX]−P . As is standard in the literature, conditioninginformation can be incorporated in this setting by considering prices and payoffs instrumented by random variables in theinformation set T , which gives rise to unconditional pricing constraints for dynamically managed portfolios of asset payoffs.

3 In this study all closed convex functions are also proper, i.e., finite in at least one point. A function h : RND → [0,∞] isclosed if all its sublevel sets η : h(η) ≤ t (t ∈ R) are closed. The set of closed convex and proper functions is broad enoughto encompass all pricing error functions that are relevant for our work.

8

Page 9: Smart Stochastic Discount Factors · SOFONIAS ALEMU KORSAYE, ALBERTO QUAINI and FABIO TROJANI January 9, 2020 ABSTRACT We introduce model-free Smart Stochastic Discount Factors (S{SDFs)

ponent, reflecting exposure to systematic S–SDF risk, and an additional pricing error component. Hence,

a given pricing error geometry directly influences the way how an S–SDF explains a cross-section of asset

excess returns, in terms of their exposure to S–SDF risk and asset pricing errors, respectively.

Most of the existing asset pricing models imply SDFs that are consistent with pricing constraints of the form

(1). Obvious ones are standard SDFs that price exactly all payoffs, which can be specified by considering

all assets as sure. Many other relevant examples of S–SDFs are discussed in more detail in Section 3. More

broadly, we provide in the next section a general arbitrage-free foundation for S–SDFs, by showing that they

are always supported by a suitable arbitrage-free economy with transaction costs.

2.1 Existence of S–SDFs

Let (Ω, F , P) be a probability space and Lp with p ∈ [1,+∞] be the space of p−integrable random variables

on this probability space.4 Throughout this study, we assume that payoff Xn ∈ Lp with p ∈ (1,+∞), for

each basis security indexed by n = 1, . . . , N . Suppose further that portfolio positions θS ∈ RNS on the

sure payoffs involve no transaction costs, while portfolio positions θD ∈ RND on dubious payoffs involve

transaction costs measured by some function γ : RND → [0,+∞]. We specify the set of traded payoffs as the

set of portfolio payoffs that can be replicated with finite transaction costs, by means of the basis payoffs X:

Z := Z = X ′θ : γ(θD) < +∞ , (4)

where θ := (θ′S ,θ′D)′. Accordingly, we define a pricing functional π on Z as the smallest replication cost of

a payoff Z by means of portfolios of basis payoffs X, after accounting for transaction costs:

π(Z) := infθ∈RN

P ′θ + γ(θD) : Z = X ′θ . (5)

Equations (4) and (5) imply a market structure (Z, π) consistent with established asset pricing settings in

the literature. For instance, if γ is sublinear one obtains the asset pricing framework with frictions in Jouini

and Kallal [1995].5 Here, market Z is a convex cone and pricing functional π is sublinear. If markets are

frictionless, γ is the zero function and the classical asset pricing framework in Harrison and Kreps [1979]

follows. Here, market Z is a linear space and pricing functional π is linear. Finally, note that whenever assets

4 We equip Lp := x : Ω→ R : x is measurable, E[|x|p]1/p < +∞, p ∈ [1,+∞], with the partial order ≤ defined by: x ≤ yif and only if P(x ≤ y) = 1. Also, the duality pair between Lp and Lq , for 1/p + 1/q = 1, is given by E[xy], for any x ∈ Lp,y ∈ Lq . In what follows, we do not consider payoffs in Lp with p = 1 or p = +∞ in order to obtain compact proofs involvingLp − Lq duality and remain in the realm of the norm topology.

5 Function γ is sublinear if γ(θD + θD) ≤ γ(θD) + γ(θD) and γ(aθD) = aγ(θD) for any a ≥ 0 and any θD, θD ∈ RND .

9

Page 10: Smart Stochastic Discount Factors · SOFONIAS ALEMU KORSAYE, ALBERTO QUAINI and FABIO TROJANI January 9, 2020 ABSTRACT We introduce model-free Smart Stochastic Discount Factors (S{SDFs)

redundancy can be excluded in market (Z, π), i.e., X ′θ1 = X ′θ2 and π(X ′θ1) = π(X ′θ2) imply θ1 = θ2,

then pricing functional (5) is directly given for any traded payoff Z = X ′θ by π(Z) = P ′θ + γ(θD).

We now characterize arbitrage-free market structures (Z, π) that support the existence of S–SDFs. Regarding

the notion of arbitrage-free markets, we follow Harrison and Kreps [1979] and define an arbitrage opportunity

as a traded payoff Z ∈ Z such that Z ≥ 0, π(Z) ≤ 0 and P(Z > 0) > 0. Regarding the arbitrage-free market

structures supporting S–SDFs, we derive appropriate transaction cost structures for these markets. To this

end, we specify in equations (4) and (5) a market structure (Zh, πh) with a transaction cost function defined

by:

γh∗(θD) := infλ>0λh∗(θD/λ) + λτ , (6)

where h∗ is the convex conjugate of pricing error metric h in equation (1), given by:

h∗(θD) := supη∈RN

η′θD − h(η) . (7)

In transaction cost definition (6), term λh∗(θD/λ) is interpretable as a varying transaction cost component

associated with portfolios θD of dubious assets that are executed with λ orders of size 1/λ. Similarly, λτ

is interpretable as a transaction cost component cumulating a fix execution cost τ over λ separated orders.

Therefore, transaction cost function γh∗(θD) is interpretable as the minimum execution cost for implementing

a target portfolio θD of dubious assets.

Importantly, function h∗ in equation (7) stays in a one-to-one relation with the pricing error metric h in

equation (1).6 Hence, transaction cost function (6) and market structure (Zh, πh) are uniquely linked to

the S–SDF pricing error metric h. A natural question is under which additional conditions on market

structure (Zh, πh) does and S–SDF induced by a pricing error metric h exist. The next proposition shows

that arbitrage-freeness of (Zh, πh) is the characterizing condition for the existence of such an S–SDF.

Proposition 1 (Existence of S–SDFs). Given market structure (Zh, πh) induced by transaction cost

function (6), following statements are equivalent: (i) (Zh, πh) is arbitrage-free; (ii) there exists a strictly

positive S-SDF M ∈ Lq, with 1/p+ 1/q = 1, satisfying pricing constraints (1).

According to Proposition 1, there always exist an arbitrage-free economy with frictions that supports a given

S–SDF, and vice versa. Moreover, we can always uniquely identify from pricing error metric h the type of

frictions allowing to support a given S–SDF in the corresponding arbitrage-free economy. Similarly, given

6See, e.g., [Rockafellar, 1970, Cor. 12.2.1].

10

Page 11: Smart Stochastic Discount Factors · SOFONIAS ALEMU KORSAYE, ALBERTO QUAINI and FABIO TROJANI January 9, 2020 ABSTRACT We introduce model-free Smart Stochastic Discount Factors (S{SDFs)

an arbitrage-free economy with a given transaction cost function of the form (6), we can always uniquely

identify from function h∗ the implied pricing error metric h as its convext conjugate. Therefore, Proposition

1 also uniquely characterizes the hidden pricing properties of S–SDFs in general arbitrage-free economies

with transaction costs of the form (6). Finally, we show in Section 3 that transaction function (6) is available

in closed-form for a broad variety of practically relevant S–SDF settings.7 Therefore, the characterization of

S–SDF in Proposition 1 also provides useful insights for the empirical analysis of S–SDFs.

2.2 Minimum dispersion S–SDFs

Granted the existence of S–SDFs, an analysis and selection framework of S–SDFs satisfying pricing con-

straints (1) in incomplete markets is required. For this reason, this section extends the analysis and selection

frameworks in Hansen and Jagannathan [1991], Luttmer [1996] and Almeida and Garcia [2016], among others,

by introducing optimal S–SDFs minimizing various established notions of stochastic dispersion. Borrowing

from Rockafellar [1971], we measure the SDFs dispersion via Φ−dispersions, which are integral functionals

of the form M 7→ E[φ(M)], where φ : R → [0,+∞] is a closed and strictly convex function.8 Many well-

known measures of SDF dispersion are Φ−dispersions, such as the Hansen and Jagannathan [1991] distance,

entropy-based dispersions and more generally dispersions in the Cressie and Read [1984] family.9

Definition 2 (Minimum dispersion S–SDFs). SDFs that are consistent with pricing constraints (1) and

minimize a particular Φ−dispersion are called minimum dispersion S–SDFs. That is, they are solutions to:10

Π(τ) := infM∈M

E[φ(M)] : h(E[MXD]− PD) ≤ τ , (8)

where M⊂ Lq denotes the convex subset of nonnegative SDFs that exactly price the sure assets:

M := M ∈ Lq : M ≥ 0 ; E[MXS ]− PS = 0S . (9)

In Definition 2, minimum dispersion S–SDF problems naturally extend the settings in Hansen and Jagan-

nathan [1991] and Luttmer [1996], by providing generalized SDF dispersion bounds incorporating widespread

pricing error features. Basic theoretical properties of generalized S–SDF bounds are clarified in the next

7In particular, whenever function h∗ is equation (7) is sublinear it is easy to see that γh∗ = h∗.8 We further require that φ+ is finite on (0,+∞).9 Online Appendix A collects relevant examples of Φ−dispersions in the Cressie and Read [1984] family.

10 Note that the implicit assumption that there exists at least one feasible S–SDF M satisfying E[φ(M)] < +∞ can be ensuredby requiring that basis payoffs and S–SDFs live in some convenient Lp − Lq dual spaces. For instance, if φ determines theHansen-Jagannathan dispersion and any element of vector X is in L2, then, under Proposition 1, there exists S–SDF M ∈ L2,i.e., having finite Hansen-Jagannathan dispersion.

11

Page 12: Smart Stochastic Discount Factors · SOFONIAS ALEMU KORSAYE, ALBERTO QUAINI and FABIO TROJANI January 9, 2020 ABSTRACT We introduce model-free Smart Stochastic Discount Factors (S{SDFs)

remark.

Remark 1. Let MS0 be the S–SDF solving problem Π(∞) := limτ→+∞Π(τ), i.e., MS

0 is only required to

price the sure assets exactly, and define:

τmax := h(E[MS0 XD]− PD) . (10)

The generalized S–SDF minimum dispersion bound Π : R+ → [0,+∞] is convex and strictly decreasing if

τ ∈ [0, τmax], while it is constant at E[φ(MS0 )] if τ > τmax.11

Working directly with problem (8) is undesirable, as it is an infinite-dimensional optimization problem.

Therefore, we provide next the dual characterization of minimum dispersion S–SDF problems, in terms of

unconstrained penalized portfolio problems defined on RN . The economic condition implying such duality

result is absence of arbitrage.

Proposition 2 (Dual portfolio problem). Consider the portfolio problem:

∆(τ) := minθ∈RN

E[φ∗+(−X ′θ)] + P ′θ + γh∗(θD)

, (11)

with function γh∗ defined in equation (6), and assume that the supporting economy (Zh, πh) of Proposition 1

is arbitrage-free. Then, Π(τ) = −∆(τ). Moreover, let θ0 be the solution of problem ∆(τ). Then, the unique

solution of problem Π(τ) in equation (8) is:12

M0 = (φ∗+)′(−X ′θ0) . (12)

In Proposition 2, minimum dispersion S–SDF problems are uniquely linked to corresponding finite-dimensional

regularized portfolio problems, in which the portfolio weights of the dubious securities are penalized with

transaction function γh∗ from equation (6). The dispersion of the portfolio payoff is measured by the dual

Φ−dispersion, E[φ∗+(·)], where φ∗+ : R → [0,+∞] is the convex conjugate of φ+, the restriction of φ to

R+.13 Moreover, the dual Φ−dispersion directly determines in equation (12) the optimal S–SDF M0 from

the optimal portfolio payoff −X ′θ0. Online Appendix A collects closed-form expressions of φ+, φ∗+ and the

minimum dispersion S–SDF (12) for the family of Cressie and Read [1984] dispersions.

11 This follows by [Luenberger, 1997, Prop. 1 and 2, p. 216-217], by strict convexity of Φ−dispersions and convexity ofpricing error function h.

12 With a slight abuse of notation, we denote by (φ∗+)′(x) the first derivative of function φ∗+ in an interior point x.13 That is, φ+(x) = φ(x) if x ≥ 0 and φ+(x) = +∞ if x < 0.

12

Page 13: Smart Stochastic Discount Factors · SOFONIAS ALEMU KORSAYE, ALBERTO QUAINI and FABIO TROJANI January 9, 2020 ABSTRACT We introduce model-free Smart Stochastic Discount Factors (S{SDFs)

Remark 2. The no arbitrage assumption in Proposition 2 for the supporting economy (Zh, πh) is essential,

as it ensures with Proposition 1 existence of a strictly positive S–SDF in the first place. A violation of this

assumption can imply that a candidate solution M0 in equation (12) is not a solution of the primal S–SDF

problem Π(τ). Figure 2 illustrates such a duality failure empirically, based on the datasets used in the

empirical study of Section 5. We solve the empirical version of dual problem ∆(τ) for various pricing error

bounds τ , using sample variance as a measure of S–SDF dispersion and an l2−penalization function for dual

portfolio weights, which is APT–consistent in the sense discussed in Section 3.4 below. We then estimate a

candidate minimum variance S–SDF M0 and the corresponding pricing error vector E[M0XD]−PD. Finally,

we verify whether indeed the condition h(E[M0XD]−PD) ≤ τ is satisfied, consistently with the pricing error

restriction in primal S–SDF problem (8). We find that for a range of values for parameter τ strictly smaller

than the point of discontinuity in the plot of Figure 2, the candidate SDF M0 implied by the dual portfolio

problem yields a serious violation of the inequality h(E[M0XD]− PD) ≤ τ , i.e., M0 is not a solution of the

empirical primal S–SDF problem in those cases. Therefore, no empirical minimum variance SDF exactly

pricing all returns according to the finite-sample pricing restrictions exists in such cases. In contrast, an

empirical minimum variance S–SDF solution exists for sufficiently large pricing error bounds τ .14

3 Relevant examples of S–SDFs

From Definition 1 and Proposition 1, the deep S–SDF properties are characterized by their pricing error ge-

ometry or, equivalently, by the transaction cost structure of their supporting arbitrage free market. Starting

from this twofold interpretation, we study in more detail several concrete settings generating economically

relevant S–SDFs, for which we obtain in closed-form the expressions for h, h∗, and penalization γh∗ in

Proposition 2. These settings give rise to a large family of tractable minimum dispersion S–SDFs that are

directly computable for applications. For some of the S–SDF examples we discuss, Table 1 summarizes the

closed-form pricing error and transaction cost function h and h∗, together with the corresponding portfolio

penalty function γh∗ in Proposition 2.

We first discuss S–SDFs that naturally arise from a direct specification of market frictions, including short-

sale restrictions, bid-ask spreads, leverage restrictions, and proportional or quadratic transaction costs.15

14 In the empirical analysis of Section 5, we systematically ensure that our estimated S–SDFs are solutions of the empiricalprimal S–SDF problem, using a sufficiently large pricing error threshold τ .

15 Examples of studies involving (i) borrowing and short-sale constraints, bid-ask spreads and margin requirements are foundin Luttmer [1996] and Naik and Uppal [1994], (ii) proportional transaction costs are found in Mei et al. [2016] and Constantinides[1986], and (iii) quadratic transaction costs are the focus in Garleanu and Pedersen [2013] and DeMiguel et al. [2015]. The use

13

Page 14: Smart Stochastic Discount Factors · SOFONIAS ALEMU KORSAYE, ALBERTO QUAINI and FABIO TROJANI January 9, 2020 ABSTRACT We introduce model-free Smart Stochastic Discount Factors (S{SDFs)

Afterwards, we provide a foundation for several recent approaches, which aim to obtain regularized SDFs

in large asset markets, based on various penalization devices from the machine learning literature. Third,

we show that several other approaches in the literature, which rely on a minimization of pricing errors

subject to a bound on the SDF variability, actually induce particular S–SDF choices, for which we elicit

the corresponding pricing error geometries and transaction cost structures. These settings, include, e.g., the

good-deal bounds theory in Cochrane and Saa-Requejo [2000] and the robust linear SDF modelling in Nagel

et al. [2018]. We then explicitly link the pricing predictions of Ross [2013]’s Arbitrage Pricing Theory (APT)

and its extended version with misspecification in Uppal et al. [2018] to suitable APT–consistent minimum

dispersion S–SDFs. In this setting, we also highlight the links between APT–consistent minimum dispersion

S–SDFs and penalized principal components approaches as in Lettau and Pelger [2019].

3.1 S–SDFs directly induced by market frictions

We first address S–SDFs directly induced by several forms of market frictions. Minimum variance SDFs in-

corporating bid-ask spreads and short-sale constraints in arbitrage-free markets have been studied in Luttmer

[1996]. In general, such market frictions can be modelled with a portfolio constraint θD ∈ K, using a convex

cone K ⊂ RND . Luttmer [1996] shows that SDFs in such markets imply pricing errors constrained to belong

to the polar cone of K, call it K0.16 For instance, short-sale constraints are defined by non-negative portfolio

positions, i.e., K = RND+ . Since the polar cone of RND+ is RND− , pricing errors are such that E[MXD] ≤ PD,

i.e., the pricing functional is sublinear. In our framework, these market frictions are easily modelled with a

transaction cost function given by the characteristic function of K:

h∗(θD) = δK(θD) :=

0 θD ∈ K

+∞ θD /∈ K, (13)

i.e., investors incur zero transaction costs if portfolio positions belong to K and arbitrarily large transaction

costs otherwise. The pricing error function follows as the convex conjugate of h∗: h = δK0 , i.e., pricing

errors are constrained to belong to K0.

Transaction costs homogeneous in the portfolio positions on the dubious assets are naturally modelled with

norm-based transaction cost functions: h∗ = α ‖·‖, where ‖·‖ is some norm in RND and α > 0 is a scaling

of convex functions modeling transaction costs is advocated in, e.g., Lillo et al. [2003] and Constantinides [1979].16 The polar cone of set K is the set K0 := y ∈ RND : x′y ≤ 0, for all x ∈ K.

14

Page 15: Smart Stochastic Discount Factors · SOFONIAS ALEMU KORSAYE, ALBERTO QUAINI and FABIO TROJANI January 9, 2020 ABSTRACT We introduce model-free Smart Stochastic Discount Factors (S{SDFs)

parameter. S–SDFs in these markets imply pricing errors such that:

‖E[MXD]− PD‖∗ ≤ α , (14)

where ‖·‖∗ is the dual norm of ‖·‖.17 This is so, because the convex conjugate of h∗ is the characteristic

function of the ball of radius α under the dual norm: h = δαB(‖·‖∗). For instance, a specification of

proportional transaction costs with an l1−norm implies maximal pricing errors bounded by α. Hence, larger

bounds on pricing errors correspond to larger proportional transaction costs, and vice versa.

Frictions corresponding to constraints on total leverage are easily modelled, e.g, by bounding maximal

absolute positions on the dubious assets with a threshold α > 0: ‖θD‖∞ ≤ α. This is equivalent to a

specification of transactions costs using the characteristic function of the ball of radius α under the l∞−norm:

h∗ = δαB(‖·‖∞). Since the convex conjugate of h∗ is the l1−norm scaled by α, S–SDFs in these markets

imply:

‖E[MXD]− PD‖1 ≤τ

α. (15)

This is intuitive, as less binding leverage constraints are linked to S–SDF with smaller pricing errors.

Finally, various specifications of quadratic transaction costs, which can naturally arise in markets char-

acterized by price impact costs, can also be easily incorporated in our S–SDF framework. Since these

specifications are also naturally related to various penalization devices proposed in the machine learning

literature, we discuss them in the next section.

3.2 S–SDFs induced by machine learning penalization devices

The proportional transaction cost specification h∗ = α ‖·‖1 with an l1−norm in the previous section cor-

responds to the widely used lasso penalty in the machine learning literature; see Tibshirani [1996]. As is

well-known, this penalty is useful to produce sparse optimal solutions, i.e., sparse optimal portfolios in our

approach. Given the link (12) between optimal portfolios and minimum dispersion S–SDFs, this choice natu-

rally gives rise to S–SDFs that only depend on a strict subset of dubious returns. For instance, Qiu and Otsu

[2018] define with link (12) a particular S–SDF from the solution of a penalized portfolio problem of the form

(11), in which the dispersion of portfolio payoffs (function φ∗+) is measured by the convex conjugate of the

17The dual norm ‖·‖∗ of norm ‖·‖ is defined as ‖θD‖∗ := maxη∈RND θ

′Dη : ‖η‖ ≤ 1. For instance, the dual norm of

an lp−norm, that is ‖x‖p := (∑NDi=1 x

pi )1/p when p ∈ [1,+∞) and ‖x‖p := max

NDi=1 |xi| when p = +∞, is the lq−norm with

1/p+ 1/q = 1. In particular, the l∞−norm is the dual norm of the l1−norm and vice-versa, while the l2−norm is self-dual.

15

Page 16: Smart Stochastic Discount Factors · SOFONIAS ALEMU KORSAYE, ALBERTO QUAINI and FABIO TROJANI January 9, 2020 ABSTRACT We introduce model-free Smart Stochastic Discount Factors (S{SDFs)

Kullback-Leibler SDF dispersion (function φ+).18 They then use this S–SDF to estimate a target unknown

SDF that exactly prices all assets, under the assumption that the logarithm of the latter is well approximated

by linear combinations of excess returns as the asset dimension grows. In our approach, we do not need

additional assumptions regarding, e.g., exact pricing constraints or some spanning properties of returns, and

derive instead a general duality link in Proposition 2, between primal S–SDF problems with non zero pricing

errors and dual penalized portfolio problems. Beside allowing us to handle relevant economic settings in

which exact linear pricing cannot be granted, this approach allows us (i) to provide a natural foundation and

additional economic insight for S–SDF specifications generated by solutions of penalized portfolio problems,

in terms of the pricing error geometry or the market frictions associated with these S–SDFs, and (ii) to

obtain extended diagnostics for asset pricing models in these settings.

A key insight of our theory is that sparsity cannot be in general obtained both for pricing errors and

supporting optimal portfolio weights of minimum dispersion S–SDFs. For instance, in equation (15) the

maximum pricing error bound induced by a lasso penalization for dual optimal portfolios implies dense

pricing errors. Equivalently, an l1−pricing error bound yields sparse pricing errors and dense dual portfolio

weights. This sparsity trade-off between pricing errors and supporting portfolio weights arises also for

penalization devices implying a more flexible relation between shrinkage and sparsity of the underlying

optimal solutions. One such example is the Elastic Net penalization proposed in Zou and Hastie [2005] and

used in Nagel et al. [2018], among others, to shrink the cross-section of returns: h∗ = α1 ‖·‖1+ α2

2 ‖·‖22, where

α1, α2 > 0 are scaling parameters. This specification corresponds to a quadratic transaction cost function

incorporating, e.g., price impact, and can induce sparse optimal portfolios. The resulting pricing error metric

h = 12α2

dist2α1B∞(·) in Table 1 is proportional to the squared Euclidean distance from the Ball of radius α1

under the l∞−metric. Here, only maximal pricing errors larger than α1 are penalized, which gives rise to

a quadratic penalization above a threshold implying dense pricing errors. The limit case α1 = 0, α2 = α

gives rise to a standard ridge transaction cost specification h∗ = 12α ‖·‖

22 with self-dual pricing error metric

h = α2 ‖·‖

22. In this setting, pricing errors and supporting portfolio weights of minimum dispersion S–SDFs

are both not sparse.

A quadratic portfolio penalization above a threshold can also directly specify a transaction cost functions h∗ =

α2

2 dist2α1B∞(·). In this case, the resulting pricing error metric arises from a quadratic Elastic Net specification

h = α1 ‖·‖1 + 12α2‖·‖22, which induces sparse pricing errors and dense portfolio weights. Alternatively, an

18See Case 1 of Examples 1 in Online Appendix A for the closed-form expressions of these functions.

16

Page 17: Smart Stochastic Discount Factors · SOFONIAS ALEMU KORSAYE, ALBERTO QUAINI and FABIO TROJANI January 9, 2020 ABSTRACT We introduce model-free Smart Stochastic Discount Factors (S{SDFs)

l1−penalty above a threshold and an l2−penalty below the same threshold can be applied, which gives rise

to the Huber-type specification of transaction costs in Table 1: h∗ = Ψα1,α2(·). For a sufficiently low choice

of the threshold value, this function resembles a smoothed version of a lasso penalization, which induces

approximately sparse optimal portfolios, i.e., a large fraction of small but not exactly zero portfolio weights.

The resulting pricing error metric in Table 1 gives rise to a quadratic penalization only below threshold

α1 > 0.

The specifications in Table 1 cover various examples of penalized portfolio problems relevant for applications,

to which we can naturally supply the associated minimum dispersion S–SDFs. Obvious modifications of these

settings can incorporate, e.g., asset specific proportional transaction costs and/or proportional portfolio

reallocation costs relative to a reference point, using a transaction cost function of the form:

h∗ := ‖Λ1·‖1 +∥∥∥Λ2(· − θ(b)D )

∥∥∥1, (16)

where Λ1,Λ2 are positive definite diagonal matrices and θ(b)D is a reference portfolio of dubious assets

that needs to be rebalanced. For instance, DeMiguel et al. [2019] adopt such a penalization to study

the implications of transaction costs for the number of characteristics included in optimal mean variance

portfolios. Also for such settings, our approach can directly provide the minimum variance S–SDFs consistent

with these optimal portfolios, simply by computing the implied pricing error metric h as the convex conjugate

of function (16).

3.3 S– SDFs induced by constrained pricing error minimizations

A useful equivalent definition of minimum dispersion S–SDFs can rely on the solution of a minimum pricing

error problem, subject to an upper bound ν > 0 on the S–SDF dispersion:

Π(ν) := infM∈M

h(E[MXD − PD]) : E[φ+(M)] ≤ ν . (17)

Indeed, Proposition 3 of Appendix B shows that the solution of problem Π(ν) can equivalently be obtained

as the solution of problem Π(τ) in equation (8), for a corresponding threshold τ > 0. Reversely, the solution

of problem Π(τ) can equivalently be obtained as the solution of problem Π(ν), for a corresponding threshold

ν > 0. As we show next, the equivalent definition of minimum dispersion S–SDFs with problem (17) allows

us to naturally link our methodology to additional important directions in the literature.

17

Page 18: Smart Stochastic Discount Factors · SOFONIAS ALEMU KORSAYE, ALBERTO QUAINI and FABIO TROJANI January 9, 2020 ABSTRACT We introduce model-free Smart Stochastic Discount Factors (S{SDFs)

3.3.1 Good-Deal Bounds S–SDFs

The Good-Deal Bounds theory in Cochrane and Saa-Requejo [2000] specifies a maximal price PD and a

minimal price PD for a newly issued security with payoff XD, given an exact pricing condition for a set

of traded securities and under an upper bound on the maximum attainable Sharpe Ratio. This bound is

introduced to rule out unreasonably good deals in the extended market including the new security. The

maximal and minimal price for the newly issued security are obtained as follows:

PD := PD + infM∈M

E[MXD − PD] : E[φ+(M)] ≤ ν , (18)

PD := PD − infM∈M

E[−(MXD − PD)] : E[φ+(M)] ≤ ν , (19)

where (i) PD is the price of the new security implied by the minimum variance SDF that exactly prices all

traded assets and (ii) an upper maximal Sharpe ratio bound arises using variance as a measure of S–SDF

dispersion, i.e., function φ+ in case (3a) of Examples 1 in Online Appendix A. Apparently, both PD and PD

are obtained by computing a corresponding minimum dispersion S–SDF, solving a problem of the form (17)

with pricing error functions given by h(η) = η and h(η) = −η, respectively.

3.3.2 Robust linear S–SDFs with shrinkage

Nagel et al. [2018] propose robust linear pricing rules for large asset markets, based on a parametric S–SDF

M(θ0) := θ0S − θ′0D(F e − E[F e]), where F e is a vector of ND characteristics-based factor excess returns

with covariance matrix ΣF e . This S–SDF is defined by the solution of following optimization problem, for

some µ > 0:19

M(θ0) := arg minθ∈RN

1

2||E[M(θ)F e]||22 +

µ

2V ar(M(θ)) : E[M(θ)] = 1

. (20)

Hence, M(θ0) is equivalently given by the solution of a particular form of a constrained problem (17), in

which no positivity constraint appears in set M of admissible S–SDFs exactly pricing the sure assets. Such

a constrained problem is specified using a single sure asset payoff XS := 1 with price PS = 1, and dubious

asset payoffs XD := F e with dubious price vector PD = 0ND :

M(θ0) = arg minθ∈RN

1

2||E[M(θ)F e]||22 :

1

2E[M2(θ)] ≤ ν and E[M(θ)] = 1

, (21)

19This S–SDF is motivated by a Bayesian prior on expected returns that shrinks the Sharpe ratio of low eigenvalue principalcomponent portfolios relative to the Sharpe ratio of high eigenvalue principal component portfolios.

18

Page 19: Smart Stochastic Discount Factors · SOFONIAS ALEMU KORSAYE, ALBERTO QUAINI and FABIO TROJANI January 9, 2020 ABSTRACT We introduce model-free Smart Stochastic Discount Factors (S{SDFs)

for some ν > 0. The equivalence between problem Π(ν) in equation (17) and problem Π(τ) in equation (8)

follows from Proposition 3 of Appendix B, for a corresponding threshold τ > 0:

M(θ0) = arg minθ∈RN

1

2E[M2(θ)] :

1

2||E[M(θ)F e]||22 ≤ τ and E[M(θ)] = 1

, (22)

which is a parametric minimum variance S–SDF problem with pricing error metric h = 12 ‖·‖

22. However, the

characterization (12) of the duality link in Proposition 2 shows that minimum variance S–SDFs are always

described by a linear function of asset payoffs. Therefore,

M(θ0) = arg minM∈L2

1

2E[M2] :

1

2||E[MF e]||22 ≤ τ and E[M ] = 1

, (23)

i.e., M(θ0) is the nonparametric minimum variance S–SDF induced by a ridge pricing error metric 12 ‖·‖

22.

Self-duality of the ridge penalization also implies h∗ = 12 ‖·‖

22, i.e., a quadratic transaction cost specification

in the supporting arbitrage-free economy of Proposition 1 and in dual portfolio problem (11).

Overall, S–SDF (20) implies dense pricing errors and dense portfolio weights, with no selection of a sparse

subset of excess returns on which this S–SDF can depend. To obtain sparse portfolio weights, one may add

to the variance penalty in problem (20) a sparsity inducing penalization, such as a scaled lasso penalization

α||Σ1/2 · ||1, for some α > 0, which gives rise to a scaled elastic net-type penalty, as in Nagel et al. [2018]:

µ

2V ar(M(θ)) + α||Σ1/2θD||1 =

µ

2θ′DΣθD + α||Σ1/2θD||1 . (24)

However, it is hard to see how the solution of this problem can be a minimum variance S–SDF, because

problem (20) extended with penalization (24) is hardly written as a problem of the form (17). Therefore, it

is also not obvious how Proposition 1 can provide an arbitrage-free foundation for this solution. However,

using the theory in Section 3.2 we have shown that following minimum variance S–SDF:

M(θ0) := arg minθ∈RN

α2

2dist2α1B∞(E[M(θ)F ]) +

µ

2V ar(M(θ)) : E[M(θ)] = 1

, (25)

implies sparse portfolio weights θ induced by an elastic net type of portfolio penalty. This S–SDF is fully

consistent with the arbitrage-free foundation provided by Proposition 1.

19

Page 20: Smart Stochastic Discount Factors · SOFONIAS ALEMU KORSAYE, ALBERTO QUAINI and FABIO TROJANI January 9, 2020 ABSTRACT We introduce model-free Smart Stochastic Discount Factors (S{SDFs)

3.4 S–SDFs induced by the Arbitrage Pricing Theory

The Arbitrage Pricing Theory (APT) first proposed by Ross [2013] relies on a linear factor model for a N×1

vector Re of asset excess returns:

Re = ΛF e + ζ , (26)

where F e is a NS×1 vector of observed traded factor excess returns and Λ a N×NS matrix of factor loadings.

The N × 1 vector of residuals ζ is assumed orthogonal to traded factor excess returns, but potentially cross-

sectionally correlated and not zero mean. Its (full rank) variance covariance matrix is denoted with Σζ .

As shown in Ingersoll Jr [1984], and more broadly in Uppal and Zaffaroni [2016], the absence of asymptotic

arbitrage opportunities implies constraints on asset expected excess returns and pricing errors in factor model

(26).20 Precisely, [Uppal and Zaffaroni, 2016, Thm 3.3] implies the existence of a NS×1 vector λ = E[F e] of

traded factor risk premia such that for any N there exists τ > 0 satisfying following asset pricing bound:21

α := E[ζ] = ΛHλH + η ; ‖η‖2,Σ−1/2ζ

:=√η′Σ−1ζ η ≤ τ , (27)

where ΛHλH is an expected return component that may be induced by an unknown number of KH hid-

den systematic risk factors, ΛH a N × KH matrix of associated factor loadings, λH a KH × 1 vector of

corresponding factor risk premia and η an asset specific expected return component. We crucially exploit

in the next section these fundamental APT asset pricing relations to formulate a theory of APT–consistent

minimum dispersion S–SDFs.

3.4.1 APT–consistent S–SDFs

By construction, in statement (27) the pricing error on traded factor excess returns is zero, i.e., we can

naturally treat traded factors as sure assets in our S–SDF setting. On the other hand, α in equation (27) has

the interpretation of an expected excess return component not generated by an exposure to observed traded

factor risk, but rather by exposures to hidden systematic factor risks and some form of asset mispricing.

20 A sequence of portfolios θkk∈N, corresponding to a sequence of growing economies Rkk∈N with k → ∞, is said togenerate an asymptotic arbitrage opportunity if along some subsequence θk′k′∈N:

V ar(θ′k′Rk′ )→ 0 and E[θ′k′ (Rk′ −Rf1)] ≥ τ > 0 .

21 The original APT pricing error bound is α′Σ−1ζ α ≤ τ2 for some τ > 0. [Uppal and Zaffaroni, 2016, Thm 3.3] states

the existence of τη ≥ 0 such that η′Σ−1ε η ≤ τ2η , where Σε is the covariance matrix of the non-pervasive component ε of ζ.

However, note that Σ−1ε −Σ−1

ζ is positive semi-definite under the assumptions of [Uppal and Zaffaroni, 2016, Thm 3.3]. Hence

bound (27) follows.

20

Page 21: Smart Stochastic Discount Factors · SOFONIAS ALEMU KORSAYE, ALBERTO QUAINI and FABIO TROJANI January 9, 2020 ABSTRACT We introduce model-free Smart Stochastic Discount Factors (S{SDFs)

These remarks motivate following notion of an APT–consistent minimum dispersion S–SDF.

Definition 3 (APT–consistent S–SDF). In the above economy, an S–SDF M is APT–consistent if:

1. It exactly prices the risk-free return and the vector of traded factor returns, i.e, E[M ] = 1 and

E[MF e] = 0NS .22

2. It implies pricing errors η := E[M(Re −ΛF e)] that satisfy the APT bound (27) for some τ > 0.

In summary, APT–consistent S–SDFs induce traded factor risk premia that exactly reproduce the physical

expected excess returns of traded factors. In parallel, they induce a particular expected component in the

excess returns orthogonal to traded factor risk, which reflects exposures to hidden factor risks and asset

mispricing consistently with the APT bound (27). More precisely, borrowing from identity (3) we obtain

following decomposition of asset risk premia, using APT–consistent S–SDFs:

E[Re] = −Cov(M,ΛF e

)− Cov

(M,Re −ΛF e

)+ η . (28)

In this equation, the first component on the RHS is the risk premium for exposure to traded factor risk,

which equals ΛE[Fe]. The second component is a risk premium component for exposure to hidden systematic

factor risk and it is equal to E[(1− M)(Re −ΛF e)]. The third term, η = E[M(Re −ΛF e)], is an expected

excess return component due to (bounded) mispricing, which satisfies the inequality in equation (27) for

some τ > 0.

Given that an APT–consistent S–SDF exactly prices observable traded factor returns, it is convenient to

decompose asset excess returns into the two orthogonal components that are spanned and unspanned by

traded factor risk, respectively. To this end, we introduce the projection of excess returns on the space

orthogonal to the traded factor excess return space:

Re := Re − projspan(F e)(Re) . (29)

Following Definition 3, we can finally denote by RS := (1, (1NS +F e)′)′ the vector of sure asset returns and

by RD := 1ND + Re the vector of dubious asset returns for an APT–consistent S–SDF. With this notation,

the notion of an APT–consistent minimum dispersion S–SDF naturally follows.

Definition 4 (Minimum dispersion APT–consistent S–SDF). Let sure and dubious asset returns be

22The hidden risk free return is normalized to one without loss of generality.

21

Page 22: Smart Stochastic Discount Factors · SOFONIAS ALEMU KORSAYE, ALBERTO QUAINI and FABIO TROJANI January 9, 2020 ABSTRACT We introduce model-free Smart Stochastic Discount Factors (S{SDFs)

given by RS := (1, (1NS +F e)′)′ and RD := 1ND +ReN . An APT–consistent minimum Φ−dispersion S–SDF

is defined for some τ > 0 by the solution of a constrained optimization problem:

K(τ) := infM∈M

E[φ(M)] : h(E[MRD]− 1) ≤ τ , (30)

in which pricing error metric h is such that for any pricing error vector ηD ∈ RND :

h(ηD) ≥ ‖ηD‖2,Σ−1/2ζ

. (31)

In other words, a minimum dispersion S–SDF is APT–consistent when its pricing error metric is stronger

than the pricing error metric that defines the APT bound (27). This feature ensures that any threshold τ > 0

satisfied by pricing errors under APT-consistent metric h is also satisfied under the APT metric ‖·‖2,Σ−1/2ζ

.

3.4.2 Penalized principal components S–SDFs

Nagel et al. [2018] and Lettau and Pelger [2019] consider a linear factor model for a N ×1 vector Re of asset

excess returns:

Re = Λ0Fe0 + ε0 , (32)

where F e0 is a NS0×1 vector of unobservable factor excess returns and Λ0 a N×NS0 matrix of factor loadings.

We denote by µF e0 and ΣF e0 the NS0×1 factor expectation vector and the NS0

×NS0factor covariance matrix,

respectively. The N×1 vector of residuals ε0 only depends on non pervasive risks orthogonal to factor excess

returns. Its mean is α0 := E[ε0] and its (full rank) variance covariance matrix is denoted with Σε0 .

Compared to the previous section, factor model (32) is correctly specified, because ε0 does not contain

pervasive factors, but the underlying factor structure is hidden and needs to be recovered from the cross-

section of returns. If follows from our theory that given factor structure (S0,Λ0,Fe0 ) any APT–consistent

minimum variance S–SDF with no positivity constraint and a unitary expectation is of the form:23

M(θ0) = 1− θ′0S(F e0 − E[F e0 ]) ; θ0S = Σ−1F e0µF e0 , (34)

i.e., only systematic factor risk is priced and the minimum variance S–SDF factor loadings are given by the

23These S–SDFs take the form:

M(θ0) = 1− θ′0S(F e0 − E[F e0 ])− θ′0D(ε0 −α0) , (33)

for corresponding optimal portfolio weights θ0S and θ0D of sure and dubious assets. However, APT-consistency in Definition3, together with the orthogonality of F e0 and ε0, further implies E[M(θ0)(ε0 −α0)] = 0N and E[M(θ0)F e0 ] = 0NS . Therefore,

θ0D = 0ND and θ0S = Σ−1F e0µF e0 .

22

Page 23: Smart Stochastic Discount Factors · SOFONIAS ALEMU KORSAYE, ALBERTO QUAINI and FABIO TROJANI January 9, 2020 ABSTRACT We introduce model-free Smart Stochastic Discount Factors (S{SDFs)

weights of the maximum Sharpe ratio factor portfolio. A key question is how the underlying factor structure

in model (32) can be recovered from observed asset returns. Common approaches start from a rotation of

asset returns based on a form of Principal Component Analysis (PCA):

Re = ΛF e , (35)

where Λ is a N ×N matrix of factor loadings, and F e a N ×1 vector of principal components excess returns

with covariance matrix ΣF e . Nagel et al. [2018] apply to a vector of standard principal component excess

returns a penalized pricing error minimization (20), in which sparsity inducing penalizations generate S–SDFs

with shrunk factor loadings for a sparse subset of principal components. Lettau and Pelger [2019] directly

select the relevant S–SDF factors using a risk premium PCA (RP-PCA) approach, in which a penalization

of factor model pricing errors improves the statistical power in detecting weak asset pricing factors.

Our theory of APT–consistent S–SDFs naturally gives rise to a simple different approach for the identification

of factor based S–SDFs. To see this, note that any APT–consistent minimum variance S–SDF with a single

sure return RS = 1 takes the form:

M(θD(τ)) = 1− θ(τ)′(Re − E[Re]) , (36)

with portfolio weight vector θ(τ) ∈ RN solving optimization problem (30) and for some τ > 0. Since this

S–SDF implies a pricing error vector α(τ) := E[M(θD(τ))Re], it is also uniquely determined from α(τ) by

the N exact pricing constraints:

E[M(θD(τ))(Re −α(τ))] = 0ND . (37)

Recalling rotation (35), M(θD(τ)) = 1− θ(τ)′Λ(F e−E[F e]) and we obtain following principal components

based asset pricing relations:

α(τ) = E[M(θD(τ))ΛF e] = ΛE[F e]−ΛΣF eΛ′θ(τ) , (38)

with the unique solution:

Λ′θ(τ) = Σ−1F e(E[F e]−Λ−1α(τ)

). (39)

In this equation, Λ′θ(τ) is given by the portfolio weights of a modified maximum Sharpe ratio principal

23

Page 24: Smart Stochastic Discount Factors · SOFONIAS ALEMU KORSAYE, ALBERTO QUAINI and FABIO TROJANI January 9, 2020 ABSTRACT We introduce model-free Smart Stochastic Discount Factors (S{SDFs)

components portfolio, in which principal components expected returns are corrected by the pricing error term

Λ−1α(τ). Whenever N −NS(τ) components of E[F e] equal the corresponding components of Λ−1α(τ), the

vector of principal components weights belongs to a NS(τ)−dimensional subspace of RN , i.e., the minimum

variance S–SDF belongs to a subspace of L2 having dimension NS(τ) strictly smaller than N . This feature

is particularly apparent for rotations implying a diagonal principal components covariance matrix. Here, all

principal components excess returns with expected return equal to the corresponding component of scaled

pricing error vector Λ−1α(τ) do not give rise to prices systematic risk in minimum variance S–SDF (36).

In summary, we have shown that the APT asset pricing restrictions always give rise to corresponding APT–

consistent S–SDFs, from which the underlying factors structure of asset returns can be naturally studied.

4 Econometrics of minimum dispersion S–SDFs

Consistent estimation procedures of S–SDF dispersion bounds and dual portfolio weights in stationary dy-

namic economies can be naturally developed within our approach. To this end, let (Xt+1,Pt)t∈N be a

time series of payoffs and quoted prices of the N basis assets, which is defined on the filtered probability

space (Ω, Ftt∈N,P). The N–dimensional vectors of payoffs Xt+1 and reference prices Pt are observed at

time t+ 1 and time t, respectively.

We denote by Q : RN → [0,+∞] the objective function in the dual optimization problem (11), i.e.,

Q(θ) := Et[φ∗+(−X ′t+1θ)

]+ P ′tθ + γh∗(θD) . (40)

By Proposition 2, the minimum S–SDF dispersion bound is given by ∆ = minQ(θ) : θ ∈ Θ, where Θ is

the domain of Q, and the optimal portfolio weights are given by θ0 := arg minQ(θ) : θ ∈ Θ. Given a

sample of size T > 0, the empirical analogue to population function (40) is:

QT (θ) :=1

T

T∑t=1

[φ∗+(−X ′t+1θ) + P ′tθ] + γh∗(θD) . (41)

The natural estimator for population value ∆(τ) is thus:

∆T (τ) := min QT (θ) : θ ∈ RN . (42)

Accordingly, any θT ∈ arg minQT (θ) : θ ∈ RN is the corresponding estimator of θ0.

24

Page 25: Smart Stochastic Discount Factors · SOFONIAS ALEMU KORSAYE, ALBERTO QUAINI and FABIO TROJANI January 9, 2020 ABSTRACT We introduce model-free Smart Stochastic Discount Factors (S{SDFs)

4.1 Asymptotic properties of dispersion bound and dual portfolio weight esti-

mators

Under standard assumptions detailed in Section B of the Online Appendix, ∆T (τ) and θT are consistent

estimators of ∆(τ) and θ0, respectively.24 These assumptions are satisfied, e.g., by S–SDF settings based

on Cressie and Read [1984] measures of dispersion in Section A of the Online Appendix and lower semi-

continuous penalization functions γh∗ .25

The asymptotic distribution of ∆T (τ) and θT also follows, under essentially the same standard assumptions

used for minimum dispersion SDF settings with no penalization (see Sections B.1 of the Online Appendix),

but it can be characterized in quite different ways, depending on whether penalty function γh∗ is just lower

semi-continuous on the relevant optimization domain, continuously differentiable, or twice differentiable.

To provide a unified theory, we make use of continuously differentiable penalizations γh∗ in Assumption 4

of the Online Appendix. Further, in order to theoretically demonstrate the wide applicability of our theory,

we show by means of Moreau [1962] envelopes theory how not differentiable penalizations can always be

approximated with arbitrary accuracy using continuously differentiable penalizations. Indeed, for any closed

and convex pricing error function h, we can always define following accurate Ridge approximation:

hα := h+α

2‖·‖22 ,

for a sufficiently small parameter α > 0. Using Moreau [1962] envelopes theory, we then show that the cor-

responding penalization γh∗α , as defined in (6), is always continuously differentiable. Therefore, for practical

purposes our theory covers all relevant applications with a single approach.

Under the assumptions in Section B of the Online Appendix, we fully characterize in Proposition 6 of

the Online Appendix the asymptotic distribution of minimum dispersion bound estimator ∆T (τ), which is

Gaussian because of our ability to work with continuously differentiable penalizations. We then also fully

characterize in Proposition 7 of the Online Appendix the asymptotic distribution of θT . To this end, we

introduce the equicontinuity Assumption 7, which overcomes the issues arising in settings where penalty

term γh∗ is not twice differentiable. The asymptotic distribution of θT is not Gaussian in general. We

characterize it in Proposition 6 of the Online Appendix as the distribution of the minimizer of a random

function with Gaussian finite dimensional distributions.

24See Proposition 4 in Section B of the Online Appendix for details.25 A convex and proper function is closed if and only if it is lower semi-continuous.

25

Page 26: Smart Stochastic Discount Factors · SOFONIAS ALEMU KORSAYE, ALBERTO QUAINI and FABIO TROJANI January 9, 2020 ABSTRACT We introduce model-free Smart Stochastic Discount Factors (S{SDFs)

4.2 S–SDF dispersion bound diagnostics

In our S–SDF minimum dispersion setting, we can develop natural extended diagnostics for asset pricing

models with pricing errors. To this end, suppose that M∗t+1Tt=1 is the time series of S–SDF realizations

generated by a specific asset pricing model M∗. Given pricing error function h and threshold τ > 0, we aim

to test following null hypothesis:

H0 : E[M∗XS ]− PS = 0S and h(E[M∗XD]− PD) ≤ τ . (43)

The null of correct SDF specification under exact pricing in Hansen and Jagannathan [1991] is obtained

when, e.g., all assets are treated as sure. Similarly, the null hypothesis in Hansen et al. [1995] of correct SDF

specification in presence of short-selling constraints or bid-ask spreads is obtained by setting h = δ−RND+

.

However, our testing framework provides model diagnostics for much broader convex pricing error geometries

or transaction cost specifications.

A natural diagnostics in our setting simply tests whether an S–SDF M∗ satisfies the minimum dispersion

bounds introduced in Section 2.2, given that constraint (43) holds and while parametrizing the bound with

the bond price E[M∗] in a bond market without frictions. Borrowing from the asymptotic distribution

results for minimum dispersion bounds in the previous section, we obtain asymptotically normal minimum

dispersion S–SDF bounds.

The primal minimum dispersion S–SDF problem incorporating constraint (43) and the exact pricing con-

straint in bond markets is:

Π(τ) := infM∈M

E[φ(M)] : E[M ] = E[M∗], h(E[MXD]− PD) ≤ τ .

Moreover, Π(τ) = −∆(τ) from Proposition 2, with the penalized portfolio problem:

∆(τ) := minγ∈R, θ∈RN

E[φ∗+(−γ −X ′θ)] + γE[M∗] + P ′θ + γh∗(θD) .

Clearly, under null hypothesis (43) the following null also holds:

H′0 : ξ(τ) := ∆(τ) + E[φ(M∗)] ≥ 0 , (44)

26

Page 27: Smart Stochastic Discount Factors · SOFONIAS ALEMU KORSAYE, ALBERTO QUAINI and FABIO TROJANI January 9, 2020 ABSTRACT We introduce model-free Smart Stochastic Discount Factors (S{SDFs)

and the sample version of population quantity ξ(τ) is the statistic:

ξT (τ) := ∆T (τ) +1

T

T∑t=1

φ(M∗t+1) . (45)

By straightforwardly adjusting the arguments in the proof of Proposition 6 of the Online Appendix, we find

that√T (ξT (τ)− ξ(τ)) converges in distribution to a normally distributed zero-mean random variable with

asymptotic variance:

v(θ0) = limT→∞

V ar

(√T (∆T (τ)−∆(τ)) +

1√T

T∑t=1

φ(M∗t+1)− E[φ(M∗t+1)]

). (46)

In equation (46), the first term isolates the variance contribution of the minimum S–SDF dispersion bound

estimator ΠT (τ) := −∆T (τ) and the second term the contribution of the estimated dispersion of S–SDF

M∗.

We can now formulate a model diagnostics procedure for the composite null hypothesis (44). The most

conservative critical value for statistic (45) under null hypothesis (44) is the one implied by the simple null

ξ(τ) = 0. Given this null, we can use the above asymptotic distribution to construct conservative asymptotic

critical values for null hypothesis (44).

In Section 5 below, we study the empirical properties of estimated minimum dispersion S–SDF bounds

for various specifications of APT–consistent S–SDFs, which are linked to distinct pricing error and dual

portfolios weights sparsity features.

5 Empirical Analysis: S–SDFs in the APT Framework

This section explores the empirical properties of minimum dispersion S–SDFs under an Arbitrage Pricing

Theory setting, in which pricing error bounds are motivated by the absence of asymptotic arbitrage oppor-

tunities; see Section 3.4. We estimate various minimum dispersion S–SDFs consistent with the APT pricing

error bounds (27) and illustrate their in-sample and out-of-sample properties and implications in terms of

the resulting pricing errors, dual portfolio representations, dispersion bounds and time series realizations.

Moreover, we examine the out-of-sample ability of these APT–consistent S–SDFs in explaining asset risk

premia and study the performances of portfolio strategies supporting them.

These empirical findings are explored along varying modelling choices regarding the dispersion measure and

the imposed pricing error geometry and size. Instead, the market partition is kept fixed and such that all

27

Page 28: Smart Stochastic Discount Factors · SOFONIAS ALEMU KORSAYE, ALBERTO QUAINI and FABIO TROJANI January 9, 2020 ABSTRACT We introduce model-free Smart Stochastic Discount Factors (S{SDFs)

risky returns are dubious, i.e., might be misspriced, and only the risk-free return is always priced exactly.

This way, the results are independent of the specification of the relevant traded risk factors in a potentially

misspecified APT factor model for returns. Four dispersion measures are considered, namely the Hansen-

Jagannathan, Kullback-Leiber, negative entropy and Hellinger dispersions; see cases 3a. (γ = 2), 1., 2. and

3b. (γ = 1/2) of Examples 1 in Online Appendix A.

Beside the natural APT pricing error metric in equation (27), we further consider pricing error geometries

generated by following APT–consistent pricing error metrics, which are defined for every ηD ∈ RND and

λ ∈ [0, 1] by:26

h1(ηD) := (1− λ)∥∥∥Σ−1/2ηD∥∥∥

1+ λ

∥∥∥Σ−1/2ηD∥∥∥2, (47)

and

h∞(ηD) := (1− λ)√ND

∥∥∥Σ−1/2ηD∥∥∥∞

+ λ∥∥∥Σ−1/2ηD∥∥∥

2. (48)

Figure 3 illustrates the relation between these APT–consistent pricing error metrics for a setting with two

assets such that Σ = I. While pricing error metric h1 promotes varying degrees of sparsity on the resulting

pricing errors, h∞ implies sparsity on the portfolio weights supporting minimum dispersion S–SDFs, due to

the presence of the l1−norm in h1 and in the dual norm h∗∞, respectively.27

5.1 Dataset

Our empirical evidence relies on standard databases of different dimensions, in order to easily benchmark our

results with related empirical evidence in the literature. We use three datasets that differ in the dimension

of the set of assets considered, in order to better isolate the empirical merits of APT–consistent S–SDFs for

26 APT–consistency of these pricing error metrics directly follows from the standard norm inequalities ‖·‖2 ≤ ‖·‖1 and‖·‖2 ≤

√ND ‖·‖∞.

27The dual portfolio problem for APT–consistent S–SDFs in this setting directly follows from our theory in Section 2:

∆ = minθ∈RN

E[φ∗+(−R′θ)] + 1′θ + τh∗i (θD)

, (49)

with dual norms given by:

(i) When λ = 1:

h∗1(θD) = h∗∞(θD) = h∗2(θD) =∥∥∥Σ1/2θD

∥∥∥2.

(ii) When λ = 0:

h∗1(θD) =∥∥∥Σ1/2θD

∥∥∥∞

; h∗∞(θD) =∥∥∥Σ1/2θD

∥∥∥1/√ND .

(iii) When λ ∈ (0, 1):

h∗1(θD) = minz∈RND

max

∥∥Σ1/2θD∥∥∞

1− λ,

∥∥Σ1/2(θD − z)∥∥2

λ

,

h∗∞(θD) = minz∈RND

max

∥∥Σ1/2θD∥∥1√

ND(1− λ),

∥∥Σ1/2(θD − z)∥∥2

λ

. (50)

28

Page 29: Smart Stochastic Discount Factors · SOFONIAS ALEMU KORSAYE, ALBERTO QUAINI and FABIO TROJANI January 9, 2020 ABSTRACT We introduce model-free Smart Stochastic Discount Factors (S{SDFs)

return spaces of different dimensions. Our datasets consist of monthly gross returns of sorted portfolios from

Kenneth French’s data library.

1. Low dimensional dataset: The returns considered are the risk-free return, 25 portfolio returns

sorted on size and book to market, 10 portfolio returns sorted on momentum and 25 portfolio returns

sorted on size and long term reversal.

2. Intermediate dimensional dataset: The returns considered are the risk-free return, 100 portfolio

returns sorted on size and book to market, 25 portfolio returns sorted on momentum, 25 portfolio

returns sorted on size and long term reversal, 25 portfolio returns sorted on size and short term

reversal and 49 portfolio returns sorted on industry.

3. Augmented intermediate dimensional dataset: For this dataset, the returns in the intermediate

dimensional dataset are augmented with characteristics-based factors from the WRDS financial ratios

(WFR) dataset, which consists of 69 ratios for 10 industries based on Fama and French industry

classification. The construction of the factor returns follows Nagel et al. [2018].

We also consider the returns of the Fama-French five-factors for constructing simple benchmark linear

SDFs. Data availability allows accessing a sample of asset returns from January 1931 (January 1970 for

the characteristic-based factors in the augmented intermediate dimensional dataset) until June 2018. Port-

folios with missing time series observations are removed. After such removal, the low dimensional dataset

consists of 57 assets and 1054 time series observations, the intermediate dimensional dataset consists of 188

assets and 1054 time series observations, while the augmented intermediate dataset has 260 asset returns

with 561 time series observations.

5.2 In–sample properties of APT–consistent minimum dispersion S–SDFs

We first explore the in-sample properties of APT–consistent S–SDFs induced by their deep characteristics:

the S–SDF dispersion criterion (function φ+) and the S–SDF pricing error geometry (function h). In a

nutshell, we find that allowing pricing errors on a subset of assets can help (i) restoring the SDF dispersion

to more realistic levels and (ii) avoiding to reach the zero lower bound for an S–SDF when variance is used

to measure dispersion. These two S–SDF features are more easily attained by pricing error geometries with

dense pricing errors than by sparsity inducing geometries, showing that pricing error sparsity can be costly

29

Page 30: Smart Stochastic Discount Factors · SOFONIAS ALEMU KORSAYE, ALBERTO QUAINI and FABIO TROJANI January 9, 2020 ABSTRACT We introduce model-free Smart Stochastic Discount Factors (S{SDFs)

in terms of minimal S–SDF dispersion. We also find that the importance of introducing a concern for higher

order risks by means of S–SDF dispersion criteria other than variance diminishes as pricing errors are allowed.

Figure 4 shows the tradeoff between minimum S–SDF dispersion and pricing error size, under the family

of APT–consistent S–SDFs described at the beginning of this section. These results are based on the

intermediate dimensional dataset and variance as a measure of S–SDF dispersion. By construction, the

minimum variance bound decreases in the pricing error bound τ . However, this tradeoff is different for S–

SDFs with sparse pricing errors, as they require a proportionally higher variance to satisfy the same pricing

error bound, as opposed to S–SDFs with dense pricing errors.

Using the pricing error threshold highlighted by the vertical line in Figure 4, we next illustrate in more

detail in Figure 5 the deep properties of APT–consistent S–SDFs. In order to show explicitly the different

geometries induced by the l1, l2 and the (scaled) l∞ APT–consistent pricing error metrics, we report in

panel (A) of Figure 5 pricing errors in their standardized APT metric, i.e., the empirical version of vector

Σ−1/2E[M0RD − 1]. Similarly, optimal portfolio weights are reported in their penalized APT metric, i.e.,

we report the empirical counterpart of Σ1/2θ0D.

In Figure 5, sparse pricing errors under the l1−pricing error metric are linked to dense portfolio weights

and to the largest S–SDF variance. On the other hand, dense pricing errors under the (scaled) l∞−pricing

error metric are linked to sparse portfolio weights and the second largest S–SDF variance. Finally, the

l2−pricing error metric induces an S–SDF with dense portfolio weights and pricing errors and the lowest

variance. Notice that the l2−pricing error metric implies a qualitatively identical pattern in APT pricing

errors and dual portfolio weights, due to both the self-duality of the l2−pricing error metric and the linearity

of minimum variance S–SDFs in their optimal dual portfolio returns.

Panel (B) of Figure 5 reports the time series of S–SDFs estimated for the various APT–consistent pricing

error metrics. As S–SDFs with dense portfolio weights and pricing errors imply the lowest volatility, they

rarely attain the zero lower bound. Their lowest volatility is the result of the portfolio weights in figure

5, which are on average smaller in absolute value than the portfolio weights of S–SDFs with sparse pricing

errors or sparse portfolio weights. The larger portfolio weights implied by the two latter S–SDFs follow from

the more restrictive pricing error geometries under the l1− and the (scaled) l∞−metrics, which increase the

S–SDF volatility bound.

In panel (A) of Figure 6 we report the time series of minimum variance S–SDFs for different pricing error

30

Page 31: Smart Stochastic Discount Factors · SOFONIAS ALEMU KORSAYE, ALBERTO QUAINI and FABIO TROJANI January 9, 2020 ABSTRACT We introduce model-free Smart Stochastic Discount Factors (S{SDFs)

thresholds, expressed as fractions of τmax, the estimate of the maximum pricing error threshold τmax in

equation (10). The minimum variance SDF pricing all assets exactly is highly volatile and often degenerate,

i.e., it reaches the zero lower bound. In contrast, APT–consistent minimum variance S–SDFs feature by

construction much less volatility and hence reach less often, if at all, the zero lower bound, when higher

pricing error thresholds are allowed.

We next address the properties of APT–consistent minimum dispersion S–SDFs linked to notions of disper-

sion beyond variance. To this end we further consider Kullback Leibler (KL), Negative Entropy (NE) and

Hellinger (HE) dispersions, where the pricing error metrics is given by the APT-consistent l2-norm. Note

that while minimizing variance entails S–SDFs that are piecewise linear in the dual optimal portfolio payoffs,

minimizing the KL dispersion entails S–SDFs exponential in the payoffs and minimizing NE or HE dispersion

provides a hyperbolic relation between minimum dispersion S–SDFs and optimal payoffs; see Corollary 2 of

Appendix A. Therefore, while minimum variance S–SDFs can attain the zero lower bound, minimum KL,

NE or HE dispersion S–SDFs do not.

Minimum KL divergence S–SDFs in panel (B) of Figure 6 are strictly positive in all cases and more positively

skewed, especially for low pricing error thresholds. However, as the admissible pricing error threshold

increases to, e.g., 60% of the maximal pricing error threshold τmax, minimum variance and minimum KL

divergence S–SDFs become more and more similar and virtually perfectly positively correlated. This feature

holds also with respect to minimum NE and HE dispersion S–SDFs not reported in the scatter plots of

Figure 6 panel (B). To understand this last evidence better, Figure 7 reports the correlations between the

optimal portfolio payoffs underlying minimum variance S–SDFs and the optimal portfolio payoffs representing

minimum KL, NE and HE dispersion S–SDFs, as a function of the corresponding pricing error threshold.

Here, all S–SDF correlations are increasing in the pricing error threshold and the smallest correlation between

minimum variance S–SDFs and the other S–SDFs is no less than 97.5% when the corresponding pricing error

threshold is at least as large as half of τmax. Overall, this evidence demonstrates that the differences in

minimum dispersion S–SDFs induced by different notions of dispersion tend to shrink in presence of less

constrained pricing errors that correspond to stronger portfolio weight penalizations.

5.3 Out-of-sample pricing properties of APT–consistent S–SDFs

In this section we explore the explanatory power of minimum dispersion APT–consistent S–SDFs in fitting

the cross-section of asset risk premia. To this end, similarly to Ghosh et al. [2016], we sequentially estimate

31

Page 32: Smart Stochastic Discount Factors · SOFONIAS ALEMU KORSAYE, ALBERTO QUAINI and FABIO TROJANI January 9, 2020 ABSTRACT We introduce model-free Smart Stochastic Discount Factors (S{SDFs)

on rolling windows of 30 years of monthly returns the S–SDFs described at the beginning of Section 5.

Then their out-of-sample pricing performance are evaluated by means of GLS adjusted cross-sectional R2−

of cross-sectional regressions of average excess returns on the estimated factor loadings of these different

empirical asset pricing models.

The main finding is that allowing for pricing errors in-sample helps to significantly improve the S–SDF out-

of-sample pricing performance. Such performance clearly dominates the one of benchmark empirical asset

pricing models, including three- and five-factor Fama-French models. The S–SDF out-of-sample pricing

performance also clearly dominates the one of linear SDFs implied by informal regularizations based on the

principal components of asset returns. Within our family of APT–consistent S–SDFs, we find that minimum

variance S–SDFs based on the standard APT pricing error metric in equation (27) achieve the highest and

most robust out-of-sample pricing performance. Other APT–consistent pricing error metrics or different

notions of S–SDF dispersion than variance produce inferior out-of-sample pricing results.

5.3.1 S–SDF estimation setting and benchmark empirical asset pricing models

For every in-sample window of 30 years of monthly observations having last observation in June of year

y − 1, the in-sample S–SDF estimation requires the specification of a pricing error threshold, denoted by

τy.28 We use our S–SDF duality framework to naturally determine the range of relevant empirical pricing

error thresholds. We estimate the maximal admissible threshold τmaxy with the finite-sample version of

equation (10), which corresponds to the estimated pricing error threshold of a standard minimum variance

SDF that exactly prices only the risk-free asset in the estimation window of 30 years previous to year y.

Similarly, the minimal admissible threshold τminy is estimated as the smallest threshold value for which in the

given estimation window no empirical duality failure arises; see again the discussion in Section 2.2 following

Proposition 2.

Given estimated range [τminy , τmaxy ] of admissible thresholds for a fixed estimation window of 30 years of

monthly observations, we estimate an optimal threshold τ∗y that maximizes the estimated GLS-adjusted cross-

sectional R2−metric within the given estimation window. Using the methodology introduced in Section B,

this gives rise to a sequence τ∗y , θy of estimated pricing error thresholds and S–SDF dual portfolio weights,

which is updated with an annual frequency. For any given month m in year y, we denote by R(m)y the

28 Given our data sample, this gives us an out-of-sample period of monthly observations from July 1963 to June 2018. Thefirst in-sample window of 30 years consists of monthly observations from July 1933 to June 1963.

32

Page 33: Smart Stochastic Discount Factors · SOFONIAS ALEMU KORSAYE, ALBERTO QUAINI and FABIO TROJANI January 9, 2020 ABSTRACT We introduce model-free Smart Stochastic Discount Factors (S{SDFs)

vector of returns for month m and estimate with Proposition 2 the corresponding sequence of out-of-sample

monthly minimum dispersion S–SDFs, i.e. M (m)y := (φ∗+)′(−θ′yR

(m)y ). In this way, we obtain a monthly

time series M (m)y of fully data driven APT–consistent S–SDFs with no forward looking bias, for which

we can evaluate the out-of-sample fit, based on standard testing methodologies for empirical asset pricing

models applied to out-of-sample monthly S–SDF and asset return realizations.

We compare the out-of sample fit of our data-driven APT–consistent S–SDFs with the following benchmarks.

First, a Fama-French three-factor model with factors given by the returns of market (in excess of the risk

free return), size and book-to-market portfolios. Second, a 5-factor Fama-French model that extends the

three-factor model by an operational profitability and an investment factor. Third, a linear factor model

with factors consisting of the first three principal components of returns estimated from out-of-sample return

realizations. While this empirical asset pricing model implies a forward-looking bias due to the estimation of

the covariance matrix of returns from out-of-sample observations, it provides a natural reference for assessing

the out-of-sample pricing performance that is achievable by linear factor models with time-invariant factor

compositions. Our last benchmark empirical asset pricing model exploits in a different way forward-looking

information from the out-of-sample period and is based on a sequence of corresponding S–SDFs. We obtain

these S–SDFs using the threshold parametrization τy(w) := wτmaxy , where parameter w is estimated using

out-of-sample data and is such that τy(w) ≥ maxyτminy , in order to ensure existence of an empirical

minimum dispersion S–SDF satisfying pricing error threshold τy(w) for every estimation window.29 We

obtain the ex-post optimal sequence τy := τy(w) of these SDFs using an estimated weight w that maximizes

the GLS-adjusted cross-sectional R2−metric with respect to the out-of-sample S–SDF and asset return

realizations. While as mentioned this last empirical asset pricing model implies a forward-looking bias

generated by the estimation of parameter w, it is a natural benchmark for assessing the out-of-sample

pricing performance of data driven S–SDFs in which time-varying threshold τy varies roughly proportionally

to the maximal threshold τmaxy .

5.3.2 Out-of-sample pricing results for APT–consistent minimum variance S–SDFs

For the intermediate dimensional dataset, Figure 8 reports various curves of out-of-sample GLS adjusted

R2s implied by data-driven minimum variance S–SDFs for weights w in the admissible range, together with

the optimal weight w and GLS adjusted R2 of the optimal forward looking minimum variance S–SDF. The

29 See again the discussion in Section 2.2 following Proposition 2.

33

Page 34: Smart Stochastic Discount Factors · SOFONIAS ALEMU KORSAYE, ALBERTO QUAINI and FABIO TROJANI January 9, 2020 ABSTRACT We introduce model-free Smart Stochastic Discount Factors (S{SDFs)

left and right panels report results for the APT–consistent pricing error metrics in equations (47) and (48),

respectively. We find that the maximal forward-looking GLS adjusted R2 in the left (right) panel ranges

between a value of about 44.1% for the l2 APT–consistent pricing error metric, to a value of about 32.3%

(39.5%) for the l1 (the scaled l∞) APT–consistent pricing error metric.30

The above evidence indicates that data driven S–SDFs based on a time-varying pricing error bound roughly

proportional to maximal threshold τmaxy may hardly produce out-of-sample GLS adjusted R2 larger than

about 45%, which is a quite high benchmark for the out-of-sample pricing performance in the intermediate

dimensional dataset. For comparison, we find that the out-of-sample GLS adjusted R2 of the three-factor

Fama-French model is only about 6.3%, while that of the five-factor Fama-French model is only about 5.9%.

In parallel, the GLS adjusted R2 of a linear three-factor model with factors given by the first three principal

components of returns is only about 5.7%. In contrast, we find that the out-of-sample GLS adjusted R2 of

APT–consistent S–SDFs without forward looking bias based on a l2−pricing error metric is as large as 41.8%,

which is only slightly lower than the maximal GLS adjusted R2 of forward-looking APT–consistent S–SDFs.

Table 2 produces additional details on the out-of-sample fit provided by the various S–SDF approaches

considered above.

In summary, we conclude that our data-driven APT–consistent minimum variance S–SDFs with no forward

looking bias yield a substantial out-of-sample pricing performance relative to all benchmark empirical asset

pricing settings considered.

5.3.3 Out-of-sample pricing results for other APT–consistent minimum dispersion S–SDFs

Our previous analysis has focused on APT–consistent minimum variance S–SDFs for the intermediate di-

mensional dataset. A natural question is how the corresponding out-of-sample results depend on the notion

of S–SDF dispersion used and on the dimension of the set of traded assets under scrutiny. To answer these

questions, Panel (A) and (B) of Figure 9 report the evidence obtained for minimum Kullback Leibler diver-

gence S–SDFs, using again the set of APT–consistent pricing error metrics introduced at the beginning of

Section 5. In each panel, we examine the GLS adjusted R2 achieved by minimum variance and minimum

KL divergence S–SDFs for both the low and the intermediate dimensional dataset.

Overall, we find that minimum KL divergence S–SDFs do not improve on the out-of-sample fit of minimum

30 The minimal forward-looking GLS adjusted R2 in all panels of Figure 8 is about 10% and corresponds to a sequence ofstandard minimum variance SDFs exactly pricing in each estimation window only the risk free rate.

34

Page 35: Smart Stochastic Discount Factors · SOFONIAS ALEMU KORSAYE, ALBERTO QUAINI and FABIO TROJANI January 9, 2020 ABSTRACT We introduce model-free Smart Stochastic Discount Factors (S{SDFs)

variance S–SDFs. For instance, in the intermediate dimensional dataset, the maximal GLS adjusted R2 under

a forward-looking APT–consistent minimum KL divergence S–SDF is even lower than the GLS adjusted R2

under the minimum variance APT–consistent S–SDF with no forward looking bias introduced above.

Second, we find that the region of pricing error thresholds with large GLS adjusted R2 under the minimum

variance APT–consistent S–SDF is quite broad. In contrast, the same region under the minimum KL

divergence S–SDF is much thinner, reflecting a higher sensitivity of the out-of-sample fit of these S–SDFs

with respect to the choice of the relevant APT pricing error bound. Such a large sensitivity can be a challenge

for constructing fully data-driven minimum KL divergence S–SDFs with good out-of-sample fit, based on a

corresponding estimated sequence of thresholds τ∗y with no forward looking bias.

Third, in the low-dimensional dataset the optimal weight w for minimum KL divergence SDFs is basically

equal to 1, indicating that the optimal S–SDF sequence actually consists of standard minimum KL divergence

SDFs that exactly price in each estimation window only the risk-free rate. Related results for minimum

variance S–SDFs show that the out-of-sample pricing performance is essentially independent of the pricing

error threshold used. Here, the fit of standard minimum variance SDFs exactly pricing the sure assets alone

or the sure and the dubious assets together is basically identical to the one of any other minimum variance

S–SDF. Therefore, in the low dimensional dataset the degree of portfolio weight penalization does not seem

to matter much for the resulting out-of-sample fit, especially when using minimum variance SDFs based on

APT-consistent pricing errors under the l2 metric.

5.4 Optimal trading strategies supporting APT–consistent S–SDFs

The time series of our data-driven monthly minimum variance S–SDFs M (m)y is spanned by a corresponding

time series of dual optimal portfolio monthly payoffs θ′yR(m)y , in which portfolio weights θy are estimated

on a training window of previous 30 years and rebalanced yearly. Hence, the price θ′y1 of these payoffs varies

with a yearly frequency and implies, e.g., a total leverage that varies at the same frequency.

To obtain an excess return with a normalized total investment, we compute following excess return, which

corresponds to a one dollar investment in the optimal portfolio financed by borrowing at the risk-free rate

Rfy(m)

:

Rey(m) :=

θ′y

(R

(m)y −Rfy

(m)1)

|θ′y1|. (51)

35

Page 36: Smart Stochastic Discount Factors · SOFONIAS ALEMU KORSAYE, ALBERTO QUAINI and FABIO TROJANI January 9, 2020 ABSTRACT We introduce model-free Smart Stochastic Discount Factors (S{SDFs)

Note that since excess return (51) conditionally spans S–SDF M(m)y , it induces the same conditional pricing

fit and is thus a natural candidate to define a single-factor linear asset pricing model based on a traded

excess return.31

To obtain an excess return with a uniformly bounded leverage on each asset, we can further scale Rey(m)

proportionally to its (time-varying) maximal allocation to individual assets, e.g., by twice the maximal

allocation to individual assets, which ensures a bound of 0.5 on the maximal absolute asset allocation:

Rey(m)

:=θ′y

(R

(m)y −Rfy

(m)1)

2 ·maxi∈1,...N |θyi|. (52)

Similar to excess return (51), excess return (52) induces by construction the same conditional pricing fit as

S–SDF M(m)y .

Panel (A) of Table 3 documents the out-of-sample fit of empirical asset pricing models based on single

factor excess returns (51) and (52) implied by an l2 APT-consistent pricing error metric, in comparison

to benchmark single factor models based on the Fama-French excess returns and the excess return of an

equally-weighted portfolio, respectively. Consistently with our previous findings, we find that the model

based on excess return factors (51) and (52) clearly outperforms all other single factor benchmarks.

Panel (B) of Table 3 collects summary statistics of the various excess returns underlying the single-factor

models of Panel (A). It shows that excess return (51) produces a monthly Sharpe ratio larger than about

three times the market Sharpe ratio, the Sharpe ratio of the equally weighted portfolio or the Sharpe ratios

of size and book-to-market excess returns. In contrast to the market and the equally weighted portfolio

returns, excess return (51) is positively skewed. However, it also implies a higher kurtosis. The time-varying

rescaling underlying the definition of excess return Rey(m)

in Table 3 is reflected in the lower expected return

and volatility of Rey(m)

relative to Rey(m). While the Sharpe ratio of these two excess returns is very similar,

Rey(m)

dominates both the market and the equally weighted portfolio returns in a mean variance sense.

Figure 10 reports the cumulative returns of a one dollar investment in various excess returns, namely the

market excess return, excess return (51) leveraged to have the same volatility as the market portfolio, excess

return (52) leveraged to have the same volatility as the market portfolio, the excess return of the SMB

portfolio, the excess return of the HML portfolio and the excess return of an equally-weighted portfolio of

31The unconditional pricing fit of M(m)y and Rey

(m) can in principle be different, due to the unconditional variations in portfolio

weights θy . However, the low-frequency (yearly) character of these variations implies small differences in out-of-sample pricing,as we document below.

36

Page 37: Smart Stochastic Discount Factors · SOFONIAS ALEMU KORSAYE, ALBERTO QUAINI and FABIO TROJANI January 9, 2020 ABSTRACT We introduce model-free Smart Stochastic Discount Factors (S{SDFs)

all sorted portfolios in the intermediate dimensional dataset. As is apparent from Figure 10, the optimal

portfolios implied by our data-driven S–SDFs M (m)y largely outperform all other benchmarks in terms

of cumulative performance. They also produce on average lower conditional return volatilities and are less

subject to downside risks during recessions or periods of financial distress.

Overall, we conclude that the high profitability of our trading strategies induced by minimum variance S–

SDFs predominantly arises from their ability to optimize conditional mean-variance trade-offs and less from

their dynamic leverage of timing features.

6 Conclusions

We propose a general approach to study model-free smart stochastic discount factors (S–SDFs), i.e., stochas-

tic discount factors incorporating widespread constraints on the non-zero pricing errors of a subset of assets.

To this end, we introduce minimum dispersion S–SDFs as the solutions of a general minimum SDF disper-

sion problem with convex constraints on some subset of nonzero pricing errors. This approach allows us

to incorporate, e.g., several specifications of frictions and total wealth constraints in the literature, various

useful approaches to regularize SDFs in arbitrage-free asset markets, as well as various pricing error bounds

implied by the Arbitrage Pricing Theory, both under a correctly specified or a misspecified factor model for

returns.

We first show that S–SDFs are generally always supported by a corresponding viable market with convex

transaction costs, such as, e.g., quadratic transaction costs or constraints on total leverage. We then formally

link minimum dispersion S–SDFs to the solution of a dual portfolio selection problem with penalized port-

folio weights, under a uniquely determined convex penalization function. This dual penalization explicitly

measures the direct effect on optimal portfolio weights of a given pricing error metric for non zero pricing

errors. Equivalently, it uniquely identifies the deep S–SDF properties that are induced by a given market

friction, a particular regularization choice or a specific APT pricing error bound.

We develop the necessary methodology for the empirical analysis of S–SDFs, under a broad choice of convex

pricing error constraints. In contrast to standard minimum dispersion SDFs, the dual optimization problem

of minimum dispersion S–SDFs often yields a non smooth objective function. For a broad class of pricing

error constraints and convex portfolio penalization, we systematically make use of Moreau envelopes to

obtain tractable estimation problems that are compatible with various geometries of sparse pricing errors or

37

Page 38: Smart Stochastic Discount Factors · SOFONIAS ALEMU KORSAYE, ALBERTO QUAINI and FABIO TROJANI January 9, 2020 ABSTRACT We introduce model-free Smart Stochastic Discount Factors (S{SDFs)

sparse optimal portfolio weights.

In order to systematically construct more robust model-free stochastic discount factors motivated by plausible

no-arbitrage assumptions, we make use of Cressie-Read power dispersions to obtain APT–consistent S–

SDFs in asset markets where APT-type no arbitrage conditions can be invoked. We empirically quantify

the tradeoff between pricing error sparsity and pricing error size in getting APT–consistent S–SDFs with a

plausible dispersion. We then clarify the deep properties of pricing error and dual portfolio weight geometries

induced by various APT–consistent penalization choices. Finally, we propose a systematic data-driven

approach to construct APT–consistent minimum variance S–SDFs that yield substantial improvements in

out-of-sample pricing performance, relative to a variety of natural empirical asset pricing benchmarks. We

directly show how the large out-of-sample pricing performance of APT–consistent minimum variance S–SDFs

naturally corresponds to highly profitable optimal trading strategies, which are able to very successfully

optimize the conditional mean-variance trade-off without implying hardly practicable timing features.

Our S–SDF methodology is applicable to study several important topics that could not be addressed in

this paper, such as the estimation of conditional S–SDFs incorporating information from time-varying asset

characteristics and market frictions, the validation of macro-based asset pricing models in presence of general

transaction costs, or the measurement of segmentation between markets with nonlinear (convex) pricing rules.

We plan to address these topics in future research.

Appendix A - Figures and Tables

Figure 1: Two-dimensional unit balls for pricing errors (orange) and portfolio weights (blue), under the APT

pricing error function h = 12

∥∥Σ−1/2·∥∥22, thus h∗ = 1

2

∥∥Σ1/2·∥∥22, for the cases: Σ = I (left panel); Σ11 = 1,

Σ12 = 0, Σ22 = 1.5 (middle panel); Σ11 = 2, Σ12 = 0.5, Σ22 = 1 (right panel). The pricing error (portfolioweight) unit ball is given by the set x ∈ R2 : h(x) ≤ 1 (x ∈ R2 : h∗(x) ≤ 1).

38

Page 39: Smart Stochastic Discount Factors · SOFONIAS ALEMU KORSAYE, ALBERTO QUAINI and FABIO TROJANI January 9, 2020 ABSTRACT We introduce model-free Smart Stochastic Discount Factors (S{SDFs)

Figure 2: Empirical duality failure. We solve for the minimum variance APT–consistent S–SDF withl2−pricing error function for varying pricing error bounds τ ≥ 0. On the y-axis we report the estimate ofthe pricing error function value h(E[P −M0X]) for each τ . The point of discontinuity in the plot identifiesthe smallest pricing error bound τ , for which a solution of the empirical primal S–SDF problem exists. Thelargest pricing error threshold τmax in the plot is computed as the sample version of the maximal thresholdτmax in equation (10). All calculations are based on the augmented intermediate dimensional dataset fromJanuary 1970 to June 2018, using the risk-free asset as the only sure asset while sorted portfolios and financialratios constitute the dubious assets.

Figure 3: Various APT–consistent pricing error bounds. Two-dimensional unit balls induced byvarious APT–consistent pricing error functions on pricing errors transformed by Σ−1/2. The pricing errorfunctions are given by the elastic-net norm h = (1 − λ) ‖·‖1 + λ ‖·‖2 on the left panel and by h = (1 −λ)√ND ‖·‖∞ + λ ‖·‖2 on the right panel, for various λ ∈ [0, 1].

39

Page 40: Smart Stochastic Discount Factors · SOFONIAS ALEMU KORSAYE, ALBERTO QUAINI and FABIO TROJANI January 9, 2020 ABSTRACT We introduce model-free Smart Stochastic Discount Factors (S{SDFs)

(1) Panel (A)

(2) Panel (B)

Figure 4: Minimum volatility bounds. Panel (A) reports the annualized volatility of minimum varianceAPT–consistent SDFs as function of the SDF mean for two different values of pricing error bound, on theleft (right) 1/5 (1/4) of the maximum pricing error bound for l1 pricing error metric is used. We considerthe following three pricing error metrics: h = ‖·‖1 (red), h = ‖·‖2 (black) and h =

√ND ‖·‖∞ (blue). In

Panel (B), the two figures report the annualized volatility for various minimum variance APT–consistentS–SDFs, with mean fixed to 1/E[Rf ], as a function of pricing error threshold τ . S–SDFs are determined forvarious λ ∈ [0, 1] by pricing error function h = (1 − λ) ‖·‖1 + λ ‖·‖2 in the left figure, and by pricing errorfunction h = (1−λ)

√ND ‖·‖∞+λ ‖·‖2 in the right figure. All calculations are based on the low dimensional

dataset from July 1963 to June 2018. The dashed-dotted vertical line indicates the pricing error thresholdτ = 0.45 used in Figures 5

40

Page 41: Smart Stochastic Discount Factors · SOFONIAS ALEMU KORSAYE, ALBERTO QUAINI and FABIO TROJANI January 9, 2020 ABSTRACT We introduce model-free Smart Stochastic Discount Factors (S{SDFs)

(1) Panel (A)

(2) Panel (B)

Figure 5: Properties of S–SDFs. Panel (A) reports, from top to bottom, Σ−1/2−transformed pricingerrors and Σ1/2−transformed optimal portfolio weights for the set of dubious assets which consists of 56portfolios, where l2 (first row), l1 (second row) and scaled l∞ (third row) are used as pricing error metric.The assets are grouped in the folowing order: Size-book-to market. Momentum (shaded area) and Size-LTRportfolios. These results are based on an identical pricing error threshold τ = 0.45 and for E[M ] = 1/E[Rf ](green dashed line) from Figure 4. Panel (B) reports the corresponding minimum variance S–SDF timeseries. All calculations are based on the low dimensional dataset from July 1963 to June 2018. Grey shadedareas represent NBER recessions.

41

Page 42: Smart Stochastic Discount Factors · SOFONIAS ALEMU KORSAYE, ALBERTO QUAINI and FABIO TROJANI January 9, 2020 ABSTRACT We introduce model-free Smart Stochastic Discount Factors (S{SDFs)

(1) Panel (A)

(2) Panel (B)

Figure 6: Panel (A): Time series of minimum variance APT–consistent S–SDFs. Time seriesof minimum variance APT–consistent S–SDFs with l2 pricing error function for pricing error thresholdτ = 0 (top-left panel), τ = 0.3 × τmax (top-right panel), τ = 0.6 × τmax (bottom-left panel) and τ =0.9 × τmax (bottom-right panel), where τmax is the empirical version of population quantity τmax definedby equation (10). Panel (B): Scatter plots of minimum variance S–SDFs and minimum KullbackLeibler divergence S–SDFs. Scatter plots of the time series of minimum Kullback Leibler divergence andminimum variance APT–consistent S–SDFs, based on the l2 pricing error function computed with pricingerror threshold τ = 0 (top-left panel), τ = 0.3× τmax (top-right panel), τ = 0.6× τmax (bottom-left panel)and τ = 0.9 × τmax The intermediate dimensional dataset from July 1963 to June 2018 is used, with therisk-free asset being the only sure asset and sorted portfolios constituting the dubious assets.

42

Page 43: Smart Stochastic Discount Factors · SOFONIAS ALEMU KORSAYE, ALBERTO QUAINI and FABIO TROJANI January 9, 2020 ABSTRACT We introduce model-free Smart Stochastic Discount Factors (S{SDFs)

Figure 7: Correlation between optimal portfolio payoffs underlying different minimum disper-sion S–SDFs. The figure reports for various pairs of optimal portfolio payoffs their time series correlation,in dependence of different pricing error thresholds that correspond to different fractions of threshold τmax,the empirical version of population quantity τmax in equation (10). The payoffs are induced by S–SDFs min-imizing various notions of dispersions collected in Appendix A, i.e., Variance (Hansen-Jagannathan, HJ),Kullback-Leibler divergence (KL), Negative entropy (NE) and Hellinger divergence (HE). The intermediatedimensional dataset from July 1963 to June 2018 is used, with the risk-free asset being the only sure assetand sorted portfolios constituting the dubious assets.

43

Page 44: Smart Stochastic Discount Factors · SOFONIAS ALEMU KORSAYE, ALBERTO QUAINI and FABIO TROJANI January 9, 2020 ABSTRACT We introduce model-free Smart Stochastic Discount Factors (S{SDFs)

(1) Panel (A)

(2) Panel (B)

Figure 8: Out-Of-Sample (OOS) GLS adjusted R2 of various S–SDFs. Panel (A) reports the OOSGLS adjusted R2 of the cross-sectional regressions of average excess returns on estimated factor loadings fordifferent empirical asset pricing models. The red, black and blue curves report the OOS GLS adjusted R2 forthe forward-looking APT–consistent minimum variance S–SDFs described in Section 5.3, based on pricingerror functions given for λ ∈ [0, 1] by h = (1−λ) ‖·‖1 +λ ‖·‖2 (left panel) and h = (1−λ)

√ND ‖·‖∞+λ ‖·‖2

(right panel). We compute these S–SDFs for various fractions, reported on the x-axis, of the empiricalmaximal pricing error thresholds τmaxy . The dashed horizontal line reports the OOS GLS adjusted R2

achieved by the Fama-French three-factor model. The dashed-dotted horizontal line reports the OOS GLSadjusted R2 attained by the data-driven minimum variance S–SDF (OPT(A)) with l2 APT–consistent pricingerror function and no forward-looking bias; see again Section 5.3 for details. Panel (B) reports the timeseries of following out-of-sample S–SDFs from Figure 8: (A), OPT(A), (B) and (C). Shaded areas correspondto NBER recessions. The intermediate dataset is used with an OOS period starting in July 1963 and endingin June 2018. For each of these S–SDFs, the risk-free asset is the only sure asset and sorted portfoliosconstitute the dubious assets. See Section 5.3 for details.

44

Page 45: Smart Stochastic Discount Factors · SOFONIAS ALEMU KORSAYE, ALBERTO QUAINI and FABIO TROJANI January 9, 2020 ABSTRACT We introduce model-free Smart Stochastic Discount Factors (S{SDFs)

(1) Panel (A)

(2) Panel (B)

Figure 9: OOS comparison of minimum variance (H-J) and minimum Kullback Leibler (KL)dispersion S–SDFs. The figure reports the OOS GLS adjusted R2s of the cross-sectional regressionsof average excess returns in the low dimensional dataset (left panels) and the intermediate dimensionaldataset (right panels) on estimated factor loadings of minimum H-J dispersion (top panels) and minimumKL dispersion (bottom panels) S–SDFs for various APT–consistent pricing error functions. The OOS periodstarts in July 1963 and ends in June 2018. The solid black cure depicts the OOS GLS adjusted R2s obtainedwith an APT–consistent l2 pricing error function, In Panel (A) the dashed red lines correspond to a pricingerror function given for various λ ∈ (0, 1) by h = (1− λ) ‖·‖1 + λ ‖·‖2. In Panel (B) the dashed blue linescorrespond to a pricing error function given for various λ ∈ (0, 1) by h = (1 − λ)

√ND ‖·‖∞ + λ ‖·‖2. In

every panel, the risk-free asset is the only sure asset, while sorted portfolios constitute the dubious assets.See Section 5.3 for details.

45

Page 46: Smart Stochastic Discount Factors · SOFONIAS ALEMU KORSAYE, ALBERTO QUAINI and FABIO TROJANI January 9, 2020 ABSTRACT We introduce model-free Smart Stochastic Discount Factors (S{SDFs)

Figure 10: OOS portfolios cumulative returns. The figure reports the cumulative returns of a onedollar investment in different portfolios, financed by borrowing at the risk-free rate: the market portfolio

(Mkt, yellow line), Portfolio (51) leveraged to have the same volatility as the market portfolio (Re(m)y , blue

line), Portfolio (52) leveraged to have the same volatility as the market portfolio (Re(m)y , red line), portfolio

SMB (purple line), portfolio HML (green line) and the equally-weighted portfolio of all sorted portfolios inthe intermediate dataset (EW, light blue line). The out-of-sample valuation period starts in July 1963 andends in June 2018. Shaded areas correspond to NBER recessions.

46

Page 47: Smart Stochastic Discount Factors · SOFONIAS ALEMU KORSAYE, ALBERTO QUAINI and FABIO TROJANI January 9, 2020 ABSTRACT We introduce model-free Smart Stochastic Discount Factors (S{SDFs)

Table 1: Specifications for market friction and portfolio penalizations

Supporting friction h∗ h γh∗

Short-sale constraint,Bid-Ask spread

δK δKo δK

Proportional transaction costs α ‖·‖ δαB(‖·‖∗) α ‖·‖

Leverage constraints δαB(‖·‖) α ‖·‖∗ δαB(‖·‖)

Quadratic costs (Ridge) α2 ‖·‖

22

12α ‖·‖

22

√2τα ‖·‖2

Quadratic costs (Elastic Net) α1 ‖·‖1 + α2

2 ‖·‖22

12α2

dist2α1B∞(·) α1 ‖·‖1 +√

2τα2 ‖·‖2

Quadratic after a threshold α2

2 dist2α1B∞(·) α1 ‖·‖1 + 12α2‖·‖22 λelh

∗(·/λel) + λelτ

Quadratic within a thresholdand Linear after the threshold

Ψα1,α2(·) δα1B∞ + α2

2 ‖·‖22 λhuh

∗(·/λhu) + λhuτ

The table reports the functional specifications of different transaction cost functions h∗, associated pricingerror metrics h and corresponding regularization functions γh∗ . Here, h∗ and γh∗ are functions of portfolioweights, while h is a function of pricing errors. In the first row, we consider market frictions that restrictportfolio weights in a cone K, such as short-sale constraints (K = RND+ ) and Bid-Ask spreads (K = RND+ ×RND− . The second row considers proportional transaction costs measured by norms, while the third rowaddresses constraints on leverage, i.e., portfolio weights restricted to lay in norm-balls. Row four reportsquadratic transaction costs modelled by a ridge parametrization, while in row five this is extended to amore general elastic net parametrization of quadratic costs. In row six, portfolio allocations are subject to(quadratic) transaction costs only above a threshold. The resulting pricing error metric is reproduced by anelastic net parametrization. The portfolio regularization term in function γh∗ is the solution of the equation

‖θD/λ‖22 −∥∥projα1B∞(θD/λ)

∥∥22− 2τ/α2 = 0, where projα1B∞ is the Euclidean projection on the l∞−ball

with radius α1. In row seven, we consider portfolio allocations subject to quadratic transaction costs below acertain threshold, and linear above this threshold. The resulting transaction cost function is the multivariateHuber function, defined by Ψα1,α2(θD) :=

∑Ndi=1 ψα1,α2(θDi), where ψα1,α2(x) := α1(|x| − α2α1/2) if |x| ≥

α1α2 and x2/(2α2) otherwise. For this case the portfolio regularization parameter λhu in penalization γh∗

is the solution of the equation: θ′D

[∑NDi=1

(α11|θDi|≥α1α2

+ (θDi/α2)1|θDi|≤α1τ

)]− τλ2 = 0.

47

Page 48: Smart Stochastic Discount Factors · SOFONIAS ALEMU KORSAYE, ALBERTO QUAINI and FABIO TROJANI January 9, 2020 ABSTRACT We introduce model-free Smart Stochastic Discount Factors (S{SDFs)

Tab

le2:

Tw

o-s

tep

ou

t-of-

sam

ple

ass

et

pri

cin

gte

sts

Con

st(%

)λ1(%

)λ2(%

)λ3(%

)λ4(%

)λ5(%

)R

2 OLS

(%)

R2 GLS

(%)

OP

T(A

)0.

86(3

6.65

)-4

8.6

8(-

7.3

3)

22.0

741.7

6

(A)

0.80

(46.

95)

-16.8

7(-

7.8

2)

24.4

544.1

3

(B)

0.65

(33.

51)

-16.8

7(-

6.5

9)

18.5

832.3

2

(C)

0.89

(26.

17)

-41.0

6(-

5.1

9)

12.2

539.5

0

FF

31.

01(7

.91)

-0.4

0(-

3.2

6)

0.2

3(6

.27)

0.1

6(3

.12)

27.5

16.2

9

FF

50.

71(4

.53)

-0.1

3(-

0.8

6)

0.2

9(7

.42)

0.0

9(3

.60)

-0.0

1(-

0.1

6)

0.3

4(3

.60)

29.2

35.9

2

PC

A3

0.99

(8.2

6)-3

.22

(-2.0

1)

2.0

2(5

.67)

0.0

6(0

.32)

22.7

05.6

9

GM

M0.

84(3

8.58

)-8

8.0

3(-

7.1

4)

21.1

838.8

3

Th

eta

ble

sum

mar

izes

the

resu

lts

ofou

t-of

-sam

ple

two-s

tep

cross

-sec

tion

al

regre

ssio

ns

of

aver

age

exce

ssre

turn

son

the

esti

mate

dfa

ctor

load

ings

of

diff

eren

tem

pir

ical

asse

tp

rici

ng

model

s.T

he

colu

mn

sre

port

from

the

left

toth

eri

ght

the

esti

mate

din

terc

ept,

fact

or

risk

pre

mia

,ad

just

edR

2an

dG

LS

adju

sted

R2

inp

erce

nta

gete

rms,

wit

ht-

stati

stic

sin

pare

nth

esis

.T

he

firs

tro

wre

port

sre

sult

sfo

rth

ed

ata

-dri

ven

S–S

DF

OP

T(A

)w

ith

no

forw

ard

-look

ing

bia

sd

escr

ibed

inS

ecti

on5.

3.R

ows

two

tofo

ur

rep

ort

resu

lts

for

the

S–S

DF

s(A

),(B

)an

d(C

)w

ith

forw

ard

-lookin

gb

ias

from

Fig

ure

8;se

eag

ain

Sec

tion

5.3

for

det

ails

.T

he

fift

han

dsi

xth

row

sre

port

resu

lts,

resp

ecti

vel

y,fo

rth

eF

am

a-F

ren

chth

ree-

fact

or

an

dfi

ve-f

act

or

mod

els.

Row

seve

nre

por

tsre

sult

sfo

ra

lin

ear

SD

Fb

ased

on

the

firs

tth

ree

PC

Afa

ctor

retu

rns

com

pu

ted

usi

ng

data

from

the

ou

t-of-

sam

ple

per

iod

.T

he

inte

rmed

iate

dat

aset

isu

sed

wit

han

out-

of-s

amp

lep

erio

dst

art

ing

inJu

ly1963

an

den

din

gin

Ju

ne

2018.

Th

ela

stro

wre

port

sre

sult

sfo

ra

lin

ear

out-

of-s

amp

leS

DF

,R′ θGMM

,es

tim

ated

min

imiz

ing

pri

cin

ger

rors

inea

chtr

ain

ing

win

dow

via

GM

M,

i.e.

,θ G

MM

solv

esth

ep

rob

lem

minθ∈RNE

[(R′ θ

)R−

1]′ (Cov

(R−Rf))−1E[

(R′ θ

)R−

1]

:R′ θ≥

0,E[

(R′ θ

)Rf]

=1

.

48

Page 49: Smart Stochastic Discount Factors · SOFONIAS ALEMU KORSAYE, ALBERTO QUAINI and FABIO TROJANI January 9, 2020 ABSTRACT We introduce model-free Smart Stochastic Discount Factors (S{SDFs)

Table 3: Portfolio performance summary

Panel (A) Panel (B)

Mean Std Sharpe Ratio Skewness Kurtosis Const(%) λ(%) R2OLS(%) R2

GLS(%)

Re(m)y 0.301 0.740 0.407 0.541 11.894 0.78 30.99 34.40 40.61

(52.47) (9.95)

Re(m)y 0.015 0.036 0.414 0.967 17.816 0.78 1.31 28.69 38.95

(49.45) (8.71)Mkt 0.005 0.044 0.121 -0.539 5.029 0.54 0.17 0.99 -0.19

(5.00) (1.69)SMB 0.002 0.031 0.073 0.493 8.426 0.54 0.24 16.78 1.71

(15.56) (6.22)HML 0.003 0.028 0.118 0.087 5.103 0.77 0.17 4.00 3.07

(34.99) (2.96)EW 0.007 0.050 0.147 -0.525 5.670 0.39 0.34 6.31 -0.37

(4.27) (3.69)

Panel (A) reports out-of-sample summary statistics of various monthly excess return strategies. Thecolumns report from the left to the right the sample mean, standard deviation, Sharpe ratio, skewnessand kurtosis of various monthly portfolio excess returns. The first two rows report results for the data-

driven excess returns Rey(m) and Rey

(m)described in Section 5.4. Rows three to six report results for the

excess returns on the three Fama-French factors and on the equally weighted portfolio, respectively. Theintermediate dataset is used with out-of-sample period starting in July 1963 and ending in June 2018.Panel (B) summarizes the results of out-of-sample two-step cross-sectional regressions of average excessreturns on the estimated factor loadings of different single factor empirical asset pricing models. The columnsreport from the left to the right the estimated intercept, factor risk premium, adjusted R2 and GLS adjustedR2 in percentage terms, with t-statistics in parenthesis.

B - Proofs

Lemma 1. Optimal transaction cost function γh∗ defined in (6) satisfies for any θD ∈ RND :

γh∗(θD) = σC(θD) := supη∈RND

θ′Dη : h(η) ≤ τ . (A-1)

Thus, γh∗ is sublinear, closed and proper. Moreover, γh∗(θD) is attained at some λ∗ > 0 for any θD 6= 0.

Proof. Since τ > 0 and h(0) = 0, we can use [Luenberger, 1997, Thm. 1, pag. 224] thereby obtaining

σC(θD) = minλ≥0

sup

η∈RNDθ′Dη − λh(η)+ λτ

,

with attainment at some λ∗ ≥ 0. As (λh)∗(θD) = λh∗(θD/λ), we can further write

σC(θD) = minλ≥0

f(λ,θD), where f(λ,θD) :=

λh∗(θD/λ) + λτ λ > 0

δ0(θD) λ = 0,

and δ0(θD) is the characteristic function of singleton set 0. But λ∗ = 0 if θD = 0 Again since τ > 0 and

49

Page 50: Smart Stochastic Discount Factors · SOFONIAS ALEMU KORSAYE, ALBERTO QUAINI and FABIO TROJANI January 9, 2020 ABSTRACT We introduce model-free Smart Stochastic Discount Factors (S{SDFs)

h∗(0) = 0, we have that λ∗ = 0 for θD = 0. Thus,

σC(θD) =

minλ>0λh∗(θD/λ) + λτ θD 6= 0

0 θD = 0,

with attainment at some λ∗ > 0 when θD 6= 0. Since h∗ is the convex conjugate of h, it follows thath∗(0) = 0, and thus we can conclude that

σC(θD) = infλ>0λh∗(θD/λ) + λ =: γh∗(θD) .

Finally, [Rockafellar, 1970, Thm. 13.2] implies that γh∗ is proper, closed and sublinear.

Proof of Proposition 1. By Lemma 1, γh∗ is a sublinear function. This implies that the set of payoffs, Zis a subspace. Moreover, since Z is finitely generated, by [Clark, 1993, Thm. 6], the assumption of absenceof arbitrage is equivalent to the no-free-lunch assumption in Jouini and Kallal [1995]. Hence, by [Jouini andKallal, 1995, Thm. 2.1], absence of arbitrage is equivalent to the existence of a strictly positive, continuouslinear functional f on Lp, such that f |Z ≤ π, where f |Z indicates the restriction of f to Z. Since f is alinear continuous functional on Lp, it belongs to the dual space of Lp. By Riesz Representation Theorem,this is true if and only if there exists M ∈ Lq, with 1/p + 1/q = 1, such that λ(Z) = E[MZ], see e.g.,[Royden and Fitzpatrick, 1988, Ch. 8]. The positivity of λ is equivalent to the condition that M > 0 almostsurely. Hence, absence of arbitrage is equivalent to the existence of a strictly positive SDF, M , such thatE[MZ] ≤ π(Z) for any Z ∈ Z.

Now, by definition of cost functional π, it equivalently follows for any θ ∈ RN that θ′P+γh∗(θD) ≥ E[Mθ′X],where γh∗ is given in (6), i.e., for any θ ∈ RN

θ′SPS + θ′DPD + γh∗(θD) ≥ θ′SE[MXS ] + θ′DE[MXD] . (A-2)

(A-2) is equivalent to the following two inequalities: θ′SPS ≥ θ′SE[MXS ] for any θS ∈ RNS and θ′D (E[MXD]− PD) ≤γh∗(θD) for any θD ∈ RND . The first inequality holds if and only if E[MXS ] − PS = 0 and the secondinequality, by [Bauschke et al., 2011, Prop. 13.10 (i)], if and only if

(γh∗)∗(E[MXD]− PD) ≤ 0 . (A-3)

Since by Lemma 1, γh∗ = σC , where C := y : h(y) ≤ τ, the convex conjugate (γh∗)∗ is given by the

characteristic function δC , see [Rockafellar, 1970, Thm. 13.2], i.e.,

(γh∗)∗(E[MXD]− PD) =

0 h(E[MXD]− PD) ≤ τ+∞ else

.

Thus, inequality (A-3) is equivalent to condition h(E[MXD]− PD) ≤ τ . This concludes the proof.

Proof of Proposition 2. We first prove that Π(τ) = −∆(τ). To do so, we rewrite problem Π(τ) accordingto the notation given in [Borwein and Lewis, 1992, Sec. 4] and employ Fenchel’s duality to obtain its dualproblem. Finally, we employ Lagrange duality to obtain a portfolio problem interpretation of the dualproblem.

Define for any stochastic discount factor M the linear function A : Lq → RN by A(M) := E[MX]. Further,for every vector y ∈ RN , let g : RN → [0,+∞] be defined by g(y) := δPS(yS) + δC(yD − PD), whereC := η ∈ RND : h(η) ≤ τ.32 Finally, let Iφ+

:= E[φ+(·)], where φ+ is the restriction of φ to R+, i.e.,

32 δC is the characteristic function of set C, i.e., for any y ∈ C we define δC(y) :=

0 y ∈ C+∞ y /∈ C

.

50

Page 51: Smart Stochastic Discount Factors · SOFONIAS ALEMU KORSAYE, ALBERTO QUAINI and FABIO TROJANI January 9, 2020 ABSTRACT We introduce model-free Smart Stochastic Discount Factors (S{SDFs)

φ+(x) = φ(x) if x ≥ 0 and φ+(x) = +∞ if x < 0. Notice that

Iφ(M) + δLq+(M) = Iφ+(M) =

Iφ(M) M ≥ 0

+∞ else.

Thus we can write:Π(τ) = inf

M∈LqIφ+

(M) + g(A(M)) .

As payoffs are in Lp with 1/p+ 1/q = 1, A is continuous, while absence of arbitrage implies that Iφ+and g

are both convex functions. Thus, in order to apply [Borwein and Lewis, 1992, Cor. 4.3] and obtain the dualproblem of Π we need to check that

A[qri(dom Iφ+)] ∩ dom g 6= ∅ , (A-4)

as g is piecewise linear.33

As proved in [Borwein and Lewis, 1991, Cor.2.6], our requirement (0,+∞) ⊂ domφ+ implies that

A(dom Iφ+∩ Lq++) ⊂ A[qri(dom Iφ+

)] .

Moreover, dom g = PS × C. Hence, showing that A(dom Iφ+∩ Lq++) ∩ dom g 6= ∅ is enough to prove

(A-4). This result follows from absence of arbitrage.34 Indeed, we have M ∈ dom Iφ+∩Lq++, hence A(M) ∈

A(dom Iφ+∩ Lq++). Moreover, PS = E[MXS ] and E[MXD]− PD ∈ C, which means that A(M) ∈ dom g.

Therefore, condition (A-4) is satisfied and by [Borwein and Lewis, 1992, Cor. 4.3] we have

Π(τ) = maxθ∈RN

−I∗φ+(tA(θ))− g∗(−θ) , (A-5)

where I∗φ+: Lp → ([0) −∞,+∞] is the conjugate function of Iφ+

and tA : RN → Lp is the adjoint map of

A, given by tA(θ) = X ′θ.35

We now show that the right hand side of equation (A-5) is equivalent to the dual portfolio problem definedin equation (11). As φ+ is convex and closed, we can apply [Rockafellar, 1968, Thm. 2] to obtain I∗φ+

= Iφ∗+ .

Moreover, for every θ ∈ RN ,

g∗(−θ) = supη∈RN

−θ′η − g(η)

= supηS∈RNS

−θ′SηS − δPS(ηS)

+ supηD∈RND

−θ′DηD − δC(ηD − PD)

= −P ′SθS + supk∈RND

−θ′D(k + PD) : h(k) ≤ τ

= −P ′θ + supk∈RND

−θ′Dk : h(k) ≤ τ .

Thus, after a change of variable, (A-5) reads:

Π(τ) = − minθ∈RN

E[φ∗+(−X ′θ)] + P ′θ + σC(θD)

,

where σC(θD) is defined in (A-1). By Lemma 1, we can write

Π(τ) = − minθ∈RN

E[φ∗+(−X ′θ)] + P ′θ + γh∗(θD)

,

33 See [Borwein and Lewis, 1992, Def. 2.3] for the definition of quasi relative interior, qri.34 Again, the implicit assumption behind primal problem 8 is that the Lp−−Lq dual spaces are chosen so that the integrability

condition E[φ+(M)] < +∞ is satisfied.35 The adjoint map of A, tA, is characterized by the identity E[tA(θ)M ] = θ′E[MX], for each M ∈ Lq and each portfolio

weights θ ∈ RND .

51

Page 52: Smart Stochastic Discount Factors · SOFONIAS ALEMU KORSAYE, ALBERTO QUAINI and FABIO TROJANI January 9, 2020 ABSTRACT We introduce model-free Smart Stochastic Discount Factors (S{SDFs)

thereby proving strong duality between Π(τ) and −∆(τ).

We now prove the relation between the optimal solutions of Π(τ) and ∆(τ) given in (12). The dual problemcan be furthermore rewritten as

∆(τ) = minθ∈RN

Iφ∗+(tA(−θ)) + g∗(θ) . (A-6)

Since φ+ is strictly convex on its domain, by [Borwein and Lewis, 1991, Thm. 4.6] φ∗+ is differentiable onthe interior of its domain, which is non-empty by [Borwein and Lewis, 1991, Lem. 4.2]. Thus assuming that−X ′θ0 is in the interior of the domain of φ∗+, implies that φ∗+ is differentiable in tA(−θ0)(ω) almost surely.Let us denote such derivative by M0(ω) := (φ∗+)′(−X ′(ω)θ0).

Below, we will use the fact that M0 is the unique element of ∂Iφ∗+(tA(−θ0)) ⊂ Lq. To show this, we first

claim that for any M ∈ ∂Iφ∗+(tA(−θ0)) we must have M(ω) ∈ ∂φ∗+(tA(−θ0)(ω)) ⊂ R almost surely. If this

is true, the fact that φ∗+ is differentiable in −X(ω)′θ0 a.s., that is, the only element of its subdifferential isgiven by M0(ω), implies that M = M0 almost surely, i.e., M0 is the unique element of ∂Iφ∗+(tA(−θ0)). To

show this uniqueness, we start from following identity, which holds by [Rockafellar, 1970, Thm. 23.5]:

Iφ∗+(tA(−θ0)) + Iφ+(M) = 〈tA(−θ0), M〉 .

Explicitly, this gives∫ [φ∗+(tA(−θ0)(ω)) + φ+(M(ω))−t A(−θ0)(ω)M(ω)

]dP(ω) = 0. Here, the integrand

is nonnegative by Fenchel’s inequality. Therefore, it is zero almost surely. Applying again [Rockafellar, 1970,Thm. 23.5] to this integrand, we obtain M(ω) ∈ ∂φ∗+(tA(−θ0)(ω)) almost surely and hence M0 = M ∈ Lq,as claimed.

To show the primal feasibility of M0, notice first that since θ0 is a solution to problem (A-6), if we denotewith Q the dual objective function, the following first order condition holds:

0 ∈ ∂Q(θ0) = ∂Iφ∗+(tA(−θ0)) + ∂g∗(θ0) .

Moreover, by [Borwein, 1981, Thm. 4.1], ∂Iφ∗+(tA(−θ0)) = −ttA(∂Iφ∗+(tA(−θ0)), which is given by the

singleton −A(M0) since ttA|Lq = A. Hence, there exists ν ∈ ∂g∗(θ0) ⊆ RN , such that

−A(M0) + ν = 0 . (A-7)

Equation (A-7), together with the fact that ν ∈ ∂g∗(θ0) ⊂ RN and M0 ∈ ∂Iφ∗+(tA(θ0)), implies the primal

feasibility of M0. Indeed, by [Rockafellar, 1970, Thm. 23.5], ν ∈ ∂g∗(θ0) ⊂ RN implies g(ν) = ν′θ0−g∗(θ0).Since g∗ is proper and by definition of the subgradient, we must thus have g∗(θ0) < +∞. Hence, g(ν) isfinite, which from equation (A-7) means that also g(A(M0)) is finite. Using the explicit definition of g, thismeans E[MXS ] − PS = 0S and h(E[MXD] − PD) ≤ τ . Similarly, using again [Rockafellar, 1970, Thm.23.5], the fact that M0 ∈ ∂Iφ∗+(tA(−θ0)), the definition of the subgradient and properness of Iφ∗+ imply

M0 ∈ dom Iφ+, so that M0 ∈ Lq+ is indeed primal feasible.

We finally show that M0 is a primal solution. To this end, consider the following equalities:

Iφ∗+(tA(−θ0)) + Iφ+(M0) = 〈tA(−θ0),M0〉 = 〈−θ0, A(M0)〉 = −ν′θ0 ,

where the first equality follows again from [Rockafellar, 1970, Thm. 23.5], the second one from the definitionof the adjoint map, and the third one from optimality condition (A-7). Thus, we can explicitly write theprimal objective function computed in M0 as:

Iφ+(M0) + g(A(M0)) = −Iφ∗+(tA(−θ0))− g∗(θ0) + [−ν′θ0 + g∗(θ0) + g(A(M0))] .

Using equation (A-7), −ν′θ0 + g∗(θ0) + g(A(M0)) = −A(M0)′θ0 + g∗(θ0) + g(A(M0)), which is zero by[Rockafellar, 1970, Thm. 23.5]. This shows that M0 is a primal solution. Uniqueness of this solution can beshown following [Borwein and Lewis, 1991, Prop. 2.11]. This concludes the proof.

52

Page 53: Smart Stochastic Discount Factors · SOFONIAS ALEMU KORSAYE, ALBERTO QUAINI and FABIO TROJANI January 9, 2020 ABSTRACT We introduce model-free Smart Stochastic Discount Factors (S{SDFs)

Proposition 3 (Equivalence with dispersion constrained minimum pricing error problems).Suppose that Π(τ) and

P(ν) := infM∈M

h(E[MXD − PD]) : E[φ+(M)] ≤ ν (A-8)

for ν ∈ R are finite and attained. (i) If M0 solves Π(τ), then there exists ν0 ∈ R such that M0 solves P(ν0).(ii) If M0 is the unique solution of P, then there exists τ0 ∈ R such that M0 solves Π(τ0).

Proof. (i) Let ν0 := E[φ+(M0)]. By strict convexity of φ+ and [Borwein and Lewis, 1991, Prop. 2.11],M0 is the unique solution if Π(τ). Therefore, M0 is the unique element of M ∈ M : E[φ+(M)] ≤ν0, h(E[MXD − PD]) ≤ τ. Thus, M0 solves P(ν0). (ii) Let τ0 := h(E[M0XD − PD]). If M0 is the uniquesolution of P(ν), it is the unique element of M ∈ M : E[φ+(M)] ≤ ν, h(E[MXD − PD]) ≤ τ0. Thus,M0 solves Π(τ0).

53

Page 54: Smart Stochastic Discount Factors · SOFONIAS ALEMU KORSAYE, ALBERTO QUAINI and FABIO TROJANI January 9, 2020 ABSTRACT We introduce model-free Smart Stochastic Discount Factors (S{SDFs)

References

Caio Almeida and Rene Garcia. Economic implications of nonlinear pricing kernels. Management Science,

63(10):3361–3380, 2016.

Donald WK Andrews. Empirical process methods in econometrics. Handbook of econometrics, 4:2247–2294,

1994.

David Backus, Mikhail Chernov, and Ian Martin. Disasters implied by equity index options. The journal of

finance, 66(6):1969–2012, 2011.

Heinz H Bauschke, Patrick L Combettes, et al. Convex analysis and monotone operator theory in Hilbert

spaces, volume 408. Springer, 2011.

JM Borwein. A lagrange multiplier theorem and a sandwich theorem for convex relations. Mathematica

Scandinavica, pages 189–204, 1981.

Jonathan M Borwein. On the failure of maximum entropy reconstruction for fredholm equations and other

infinite systems. Mathematical programming, 61(1-3):251–261, 1993.

Jonathan M Borwein and Adrian S Lewis. Duality relationships for entropy-like minimization problems.

SIAM Journal on Control and Optimization, 29(2):325–338, 1991.

Jonathan M Borwein and Adrian S Lewis. Partially finite convex programming, part i: Quasi relative

interiors and duality theory. Mathematical Programming, 57(1-3):15–48, 1992.

Lawrence D Brown, R Purves, et al. Measurable selections of extrema. The annals of statistics, 1(5):902–912,

1973.

Stephen A Clark. The valuation problem in arbitrage price theory. Journal of Mathematical Economics, 22

(5):463–478, 1993.

John H Cochrane and Jesus Saa-Requejo. Beyond arbitrage: Good-deal asset price bounds in incomplete

markets. Journal of political economy, 108(1):79–119, 2000.

George M Constantinides. Multiperiod consumption and investment behavior with convex transactions costs.

Management Science, 25(11):1127–1137, 1979.

George M Constantinides. Capital market equilibrium with transaction costs. Journal of political Economy,

94(4):842–862, 1986.

Noel Cressie and Timothy RC Read. Multinomial goodness-of-fit tests. Journal of the Royal Statistical

Society. Series B (Methodological), pages 440–464, 1984.

James Davidson. Stochastic limit theory: An introduction for econometricians. OUP Oxford, 1994.

54

Page 55: Smart Stochastic Discount Factors · SOFONIAS ALEMU KORSAYE, ALBERTO QUAINI and FABIO TROJANI January 9, 2020 ABSTRACT We introduce model-free Smart Stochastic Discount Factors (S{SDFs)

Victor DeMiguel, Alberto Martın-Utrera, and Francisco J Nogales. Parameter uncertainty in multiperiod

portfolio optimization with transaction costs. Journal of Financial and Quantitative Analysis, 50(6):

1443–1471, 2015.

Victor DeMiguel, Alberto Martın-Utrera, Francisco J Nogales, and Raman Uppal. A transaction-cost per-

spective on the multitude of firm characteristics. Review of Financial Studies, forthcoming, 2019.

Nicolae Garleanu and Lasse Heje Pedersen. Dynamic trading with predictable returns and transaction costs.

The Journal of Finance, 68(6):2309–2340, 2013.

Charles J Geyer. On the asymptotics of convex stochastic optimization. Unpublished manuscript, 37, 1996.

Anisha Ghosh, Christian Julliard, and Alex P Taylor. An information-theoretic asset pricing model. Technical

report, Working Paper, London School of Economics, 2016.

Anisha Gosh, Christian Julliard, and Alex Taylor. What is the consumption-capm missing? an information-

theoretic framework for the analysis of asset pricing models. The Review of Financial Studies, 30:442–504,

2017.

Lars Peter Hansen and Ravi Jagannathan. Implications of security market data for models of dynamic

economies. Journal of political economy, 99(2):225–262, 1991.

Lars Peter Hansen, John Heaton, and Erzo GJ Luttmer. Econometric evaluation of asset pricing models.

The Review of Financial Studies, 8(2):237–274, 1995.

J Michael Harrison and David M Kreps. Martingales and arbitrage in multiperiod securities markets. Journal

of Economic theory, 20(3):381–408, 1979.

Jonathan E Ingersoll Jr. Some results in the theory of arbitrage pricing. The Journal of Finance, 39(4):

1021–1039, 1984.

Elyes Jouini and Hedi Kallal. Martingales and arbitrage in securities markets with transaction costs. Journal

of Economic Theory, 66(1):178–197, 1995.

Christian Juilliard and Anisha Ghosh. Can rare events explain the equity premium puzzle? Review of

Financial Studies, 25:3037–3076, 2012.

Yuichi Kitamura, Gautam Tripathi, and Hyungtaik Ahn. Empirical likelihood-based inference in conditional

moment restriction models. Econometrica, 72(6):1667–1714, 2004.

Martin Lettau and Markus Pelger. Estimating latent asset-pricing factors. Journal of Financial Economet-

rics, forthcoming, 2019.

Fabrizio Lillo, J Doyne Farmer, and Rosario N Mantegna. Econophysics: Master curve for price-impact

55

Page 56: Smart Stochastic Discount Factors · SOFONIAS ALEMU KORSAYE, ALBERTO QUAINI and FABIO TROJANI January 9, 2020 ABSTRACT We introduce model-free Smart Stochastic Discount Factors (S{SDFs)

function. Nature, 421(6919):129, 2003.

David G Luenberger. Optimization by vector space methods. John Wiley & Sons, 1997.

Erzo GJ Luttmer. Asset pricing in economies with frictions. Econometrica: Journal of the Econometric

Society, pages 1439–1467, 1996.

Xiaoling Mei, Victor DeMiguel, and Francisco J Nogales. Multiperiod portfolio optimization with multiple

risky assets and general transaction costs. Journal of Banking & Finance, 69:108–120, 2016.

Jean-Jaques Moreau. Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes

Rendus de l’Academie des Sciences, pages 2897–2899, 1962.

Stefan Nagel, Serhiy Kozak, and Shrihari Santosh. Shrinking the cross-section. Journal of Financial Eco-

nomics, 2018.

Vasanttilak Naik and Raman Uppal. Leverage constraints and the optimal hedging of stock and bond options.

Journal of Financial and Quantitative Analysis, 29(2):199–222, 1994.

Whitney K Newey and Daniel McFadden. Large sample estimation and hypothesis testing. Handbook of

econometrics, 4:2111–2245, 1994.

Whitney K Newey and Richard J Smith. Higher order properties of gmm and generalized empirical likelihood

estimators. Econometrica, 72(1):219–255, 2004.

Chen Qiu and Taisuke Otsu. Information theoretic approach to high dimensional multiplicative models:

Stochastic discount factor and treatment effect. 2018.

Ralph Rockafellar. Integrals which are convex functionals. Pacific journal of mathematics, 24(3):525–539,

1968.

Ralph Rockafellar. Integrals which are convex functionals. ii. Pacific Journal of Mathematics, 39(2):439–469,

1971.

Ralph Tyrell Rockafellar. Convex analysis. Princeton university press, 1970.

Stephen A Ross. The arbitrage theory of capital asset pricing. In Handbook of the Fundamentals of Financial

Decision Making: Part I, pages 11–30. World Scientific, 2013.

Halsey Lawrence Royden and Patrick Fitzpatrick. Real analysis, volume 32. Macmillan New York, 1988.

Robert Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society.

Series B (Methodological), pages 267–288, 1996.

Raman Uppal and Paolo Zaffaroni. Portfolio choice with model misspecification: a foundation for alpha and

beta portfolios. 2016.

56

Page 57: Smart Stochastic Discount Factors · SOFONIAS ALEMU KORSAYE, ALBERTO QUAINI and FABIO TROJANI January 9, 2020 ABSTRACT We introduce model-free Smart Stochastic Discount Factors (S{SDFs)

Raman Uppal, Paolo Zaffaroni, and Irina Zviadadze. Beyond the bound: Pricing assets with misspecified

stochastic discount factors. 2018.

Hui Zou and Trevor Hastie. Regularization and variable selection via the elastic net. Journal of the royal

statistical society: series B (statistical methodology), 67(2):301–320, 2005.

57

Page 58: Smart Stochastic Discount Factors · SOFONIAS ALEMU KORSAYE, ALBERTO QUAINI and FABIO TROJANI January 9, 2020 ABSTRACT We introduce model-free Smart Stochastic Discount Factors (S{SDFs)

Online Appendix A - Cressie-Read Dispersions

We collect for easy reference relevant explicit examples of Φ−dispersions from the Cressie-Read family,together with their corresponding functions φ+ and φ∗+; see also Kitamura et al. [2004] and Newey andSmith [2004], among others.

Examples 1. 1. Kullback-Leibler dispersion:

φ+(x) =

x lnx− x+ 1 x > 0

1 x = 0

+∞ x < 0

, with φ∗+(y) = exp(y)− 1 .

2. Negative entropy:

φ+(x) =

− lnx+ x− 1 x > 0

+∞ x ≤ 0, with φ∗+(y) =

− ln(1− y), y < 1

+∞ y ≥ 1.

3. Power dispersion:

(a) for γ > 1 and β = γ/(γ − 1),

φ+(x) =

xγ/γ x ≥ 0

+∞ x < 0, with φ∗+(y) = maxy, 0β/β ,

(b) for 0 < γ < 1 and β = γ/(γ − 1),

φ+(x) =

γx−xγ1−γ x ≥ 0

+∞ x < 0, with φ∗+(y) =

(1 + y/β)β − 1 y < −β+∞ y ≥ −β

,

(c) for γ < 0 and β = γ/(γ − 1),

φ+(x) =

1−xγ+γ(x−1)

γ x > 0

+∞ x ≤ 0, with φ∗+(y) =

− (1−y)β+1

β y < 1

+∞ y ≥ 0.

In the above settings, function φ+ is always closed and strictly convex, while function φ∗+ is always dif-ferentiable in the interior of its domain. Moreover, function φ∗+ is strictly convex in all cases, but case(3a).

The next leemma summarizes the analytical expressions for minimum dispersion S–SDFs corresponding tothe particular choices of Φ−dispersions from Examples 1.

Lemma 2. Under the conditions of Proposition 2 in the main text, consider the Φ−dispersions induced byfunctions φ+ in Examples 1. It then follows:

(1) M0 = exp(−X ′θ0).

(2) M0 = 1/(1 +X ′θ0), if −X ′θ0 < 1.

(3a) M0 = max−X ′θ0, 0β−1.

(3b) M0 = (1−X ′θ0/β)β−1

, if −X ′θ0 < −β.

(3c) M0 = (1 +X ′θ0)β−1, if −X ′θ0 < 1.

58

Page 59: Smart Stochastic Discount Factors · SOFONIAS ALEMU KORSAYE, ALBERTO QUAINI and FABIO TROJANI January 9, 2020 ABSTRACT We introduce model-free Smart Stochastic Discount Factors (S{SDFs)

Proof of Lemma 2. For every Φ−dispersion considered under Examples 1, φ+ is strictly convex on itsdomain. Moreover, for Φ−dispersions (1) and (3a) the interior of the domain of φ∗+, int(domφ∗+) is theentire real line , i.e., there is no restriction on the values of the optimal portfolio payoff. With regard to theother cases, the optimal portfolio payoff is restricted to be almost surely in the interior of the domain of φ∗+,which is (−∞, 1) in cases (2) and (3c) and (−∞,−β) in case (3b), respectively.

Online Appendix B - Estimation and inference for minimum dis-persion S–SDFs

In this section, we develop consistent estimation procedures of S–SDF dispersion bounds and dual portfolioweights using data generated by a dynamic economy with stationarity properties. To this end, consider atime series (Xt+1,Pt)t∈N of payoffs and quoted prices of the N basis assets that are defined on the filteredprobability space (Ω, Ftt∈N,P). The N–dimensional vectors of payoffs Xt+1 and reference prices Pt areobserved at time t + 1 and time t, respectively. In order to estimate the dipersion bounds and portfolioweights from the time series observation, we make the following standard stationarity assumption.

Assumption 1. Stochastic process (Xt+1,Pt)t∈N is stationary and ergodic.

Let Q : RN → (−∞,+∞] denote the objective function in the dual optimization problem (11), i.e.,

Q(θ) := Et[φ∗+(−X ′t+1θ)

]+ P ′tθ + γh∗(θD) . (OA-1)

By Proposition 2, we have that the minimum S–SDF dispersion bound is given by ∆(τ) = minQ(θ) : θ ∈Θ, where Θ is the domain of Q, and the optimal portfolio weights are given by θ0 := arg minQ(θ) : θ ∈ θ.Given a sample of size T > 0, the empirical analogue to population function (OA-1) is QT : RN → (−∞,+∞]defined by

QT (θ) :=1

T

T∑t=1

[φ∗+(−X ′t+1θ) + P ′tθ] + γh∗(θD) . (OA-2)

The natural estimator for population value ∆(τ) is thus:

∆T (τ) := min QT (θ) : θ ∈ RN . (OA-3)

Accordingly, θT ∈ arg minQT (θ) : θ ∈ RN is the corresiponding estimator of θ0.36

We study the consistency of the estimator ∆T (τ), under following assumption on the set of minimizers of Q.

Assumption 2. The set of minimizers of Q is a compact subset of the interior of Θ.

Proposition 8 in Appendix B shows that the excluding redundancies in the securities is the key economiccondition implying compactness of the set of minimizers of Q. We prove consistency of the estimator θTunder following additional identification assumption.

Assumption 3. θ0 is the unique minimizer of Q.

Proposition 8 also discusses the economic conditions implying uniqueness of the set of minimizers of Q.

Consistency of estimators for S–SDF dispersion bounds and dual portfolio weights is stated in the nextresult.

Proposition 4 (Consistent estimation of ∆(τ) and θ0). Suppose there are no arbitrage opportunitiesin the supporitng economy, Assumption 2 holds and the dual Φ−dispersion in ∆(τ) is one of Examples 1 inAppendix A. (i) Then ∆T (τ) converges to ∆(τ) almost surely as T →∞. (ii) Let Assumption 3 be satisfied,then θT converges to θ0 in probability as T →∞.

36 The set arg min QT (θ) : θ ∈ RN is not empty for sufficiently large T . This is shown in Brown et al. [1973] becausefunction Q is lower semi-continuous, convex and proper if the primal problem 8 is feasible, i.e., if there are no arbitrageopportunities in the supporting economy, see Proposition 2. Note that a proper and convex function is lower semi-continuousif and only if it is closed.

59

Page 60: Smart Stochastic Discount Factors · SOFONIAS ALEMU KORSAYE, ALBERTO QUAINI and FABIO TROJANI January 9, 2020 ABSTRACT We introduce model-free Smart Stochastic Discount Factors (S{SDFs)

Online Appendix B.1 Asymptotic distributions

We next study the asymptotic distribution of S–SDF dispersion bound estimator ∆T (τ) and the correspond-ing dual portfolio weight estimator θT , based on differentiable penalization functions γh∗ . To this end, wefirst introduce a unifying methodology to model the geometry of S–SDF pricing errors and the portfolioweight penalizations, by means of a convenient class of pricing error functions h that ensure continuousdifferentiability of penalizations γh∗ . Given a closed and convex pricing error function h, we propose toaccurately approximate it using following ridge correction:

hα := h+α

2‖·‖22 , (OA-4)

for some α ≥ 0. In definition (OA-4), hα is always closed and convex, and it always leads to smooth dualpenalizations γh∗α , where h∗α is he convex conjugate of hα. To prove this, it is sufficient to show with thenext proposition that convex conjugate h∗α is continuously differentiable.

Proposition 5. Let hα be defined by equation (OA-4). Then it follows for any θD ∈ RND :

h∗α(θD) =

(eαh

∗)(θD) α > 0

h∗(θD) α = 0, (OA-5)

where closed and convex function eαh∗ is the Moreau approximation of h∗, defined by:

(eαh∗)(θD) := inf

z∈RND

h∗(z) +

1

2α‖z − θD‖22

.

Moreover, for any α > 0 function h∗α is differentiable, with α−1–Lipschitz continuous gradient given by

∇h∗α(θD) =1

α(θD − proxα h

∗(θD)) , (OA-6)

where proxα h∗ : RND → RND is the proximal operator of h∗, defined by

proxα h∗(θD) = arg min

z∈RND

h∗(z) +

1

2α‖z − θD‖22

.

Proof of Proposition 5. For α > 0 and since h∗ is closed and convex, [Bauschke et al., 2011, Prop. 13.21(i)] yields:

(eαh∗)∗ = h+

α

2‖·‖22 .

Moreover, from [Bauschke et al., 2011, Prop. 12.15], eαh∗ is closed and convex. Therefore,

h∗α = (h+α

2‖·‖22)∗ = ((eαh

∗)∗)∗ = eαh∗ . (OA-7)

The result for α = 0 follows from the fact that eαh∗ converges to h∗ as α ↓ 0. This proves result (OA-5).

Result (OA-6) now follows from [Bauschke et al., 2011, Prop. 12.29]. This concludes the proof.

Assumption 4. Prcing error metric h is of type (OA-4) with α > 0 and dual dispersion φ∗+ is continuouslydifferentiable in a neighborhood of θ0.

Differentiability condition on φ∗+ in Assumption 4 is verified, e.g., for primal dispersion functions in theCressie-Read family of Examples 1 in Appendix A.

Online Appendix B.1.1 Asymptotic distribution of estimators for S–SDF dispersion bounds

To obtain the asymptotic distribution of S–SDF dispersion bounds estimators, we first assume interchange-ability between differentiation and integration in gradient ∇ E[φ∗+(−X ′t+1θ) + P ′tθ]

∣∣θ=θ0

; see, e.g., [Newey

60

Page 61: Smart Stochastic Discount Factors · SOFONIAS ALEMU KORSAYE, ALBERTO QUAINI and FABIO TROJANI January 9, 2020 ABSTRACT We introduce model-free Smart Stochastic Discount Factors (S{SDFs)

and McFadden, 1994, Lem. 3.6]. Any assumption allowing such an interchangeability can be used insteadof Assumption 5 below.

Assumption 5. Given θ0, the unique minimizer of Q, and the continuous differentiability of φ∗+ in an openneighborhood of θ0, denoted by N (θ0), the following condition holds:

E

[sup

θ∈N (θ0)

∥∥Pt − (φ∗+)′(−X ′t+1θ0)Xt+1

∥∥] <∞.The high-level Central Limit Theorem assumption for the asymptotic distribution of the S–SDF dispersionbound estimator reads as follows.

Assumption 6. The sequence of random variables

1√T

T∑t=1

φ∗+(−X ′t+1θ0) + P ′tθ0 − E[φ∗+(−X ′t+1θ0) + P ′tθ0]

,

converges in distribution to a normally distributed random variable with mean zero and strictly positivevariance

v(θ0) := limT→∞

V ar

[1√T

T∑t=1

φ∗+(−X ′t+1θ0) + P ′tθ0 − E[φ∗+(−X ′t+1θ0) + P ′tθ0]

]. (OA-8)

With the above conditions, the asymptotic Gaussian distribution of dispersion bound estimator ∆T (τ) followswith standard methods, using an additional boundedness in probability assumption.

Proposition 6 (Asymptotic distribution of dispersion bound estimator ∆T (τ)). Let Assumptions1–6 be satisfied and the sequence of random vectors

1√T

T∑t=1

Pt − (φ∗+)′(−X ′t+1θ0)Xt+1 − E[Pt − (φ∗+)′(−X ′t+1θ0)Xt+1]

, (OA-9)

be bounded in probability. Then,√T (∆T (τ)−∆(τ)) converges in distribution as T → ∞ to a normally

distributed random variable with mean zero and variance v(θ0) given in equation (OA-8).37

Despite the presence of regularization, the asymptotic distribution of estimator ∆T (τ) is Gaussian, becauseof the convexity and differentiability under Assumption 4 of the objective function defining the bound inthe penalized dual optimal portfolio problem. The boundedness in probability of sequence (OA-9) follows,e.g., under the assumptions of a suitable Central Limit Theorem, such as the one in Assumption 8 below.Boundedness in probability of sequence (OA-9) is typically assumed to obtain the asymptotic distributionof estimators for dispersion bounds of SDFs that are subject only to exact pricing constraints. Similarly,Central Limit Theorems of the type of the one stated in Assumption 6 are also assumed in such settings.Therefore, our asymptotic results in Proposition 6 follow essentially by extendig the standard assumptionsin unpenalized settings by Assumptions 4.

Online Appendix B.1.2 Asymptotic distribution of estimator for S–SDF dual portfolio weights

To derive the asymptotic distribution of estimator θT for S–SDF dual optimal portfolio weights vector θ0,recall that the objective function of the corresponding dual penalized portfolio problem is not in general twicedifferentiable, because penalization function γh∗ is typically not twice differentiable everywhere. However,

37 The same result can be obtained for general convex conjugates h∗, provided they are continuously differentiable in aneighborhood of θ0.

61

Page 62: Smart Stochastic Discount Factors · SOFONIAS ALEMU KORSAYE, ALBERTO QUAINI and FABIO TROJANI January 9, 2020 ABSTRACT We introduce model-free Smart Stochastic Discount Factors (S{SDFs)

we can still work under a sufficient degree of generality using the standard high-level assumption of stochasticequicontinuity.

Assumption 7. Following stochastic equicontinuity condition holds for any u ∈ RN :

QT

(θ0 +

u√T

)−QT (θ0)−∇QT (θ0)

u√T−(Q

(θ0 +

u√T

)−Q(θ0)

)= op(1/T ).

Stochastic equicontinuity Assumption 7 allows us to derive the asymptotic distribution of portfolio weightestimator θT without requiring twice differentiability of the sample objective function QT .38 Clearly, in set-tings where QT and Q are twice differentiable Assumption 7 holds. High-level conditions implying stochasticequicontinuity are provided, e.g., in Andrews [1994].

In order to obtain the asymptotic distribution of estimator θT , we finally need a standard high-level CentralLimit Theorem assumption for the gradient of the objective function in the unpenalized dual portfolioproblem, which is complemented by a nondegeneracy assumption for the corresponding Hessian matrix ofthe expectaton.

Assumption 8. The sequence of random vectors (OA-9) converges in distribution as T → ∞ to a zeromean normally distributed random vector with positive definite variance covariance matrix:

V (θ0) := limT→∞

V ar

[1√T

T∑t=1

Pt − (φ∗+)′(−X ′t+1θ0)Xt+1 − E[Pt − (φ∗+)′(−X ′t+1θ0)Xt+1]

]. (OA-10)

Moreover, matrix G(θ0) := ∇2E[φ∗+(−X ′t+1θ) + P ′tθ]∣∣θ=θ0

exists.

In Assumption 8, we require twice differentiability of convex function θ 7→ E[φ∗+(X ′t+1θ)−P ′tθ] in θ0, whichdoes not necessarily require twice differentiability of φ∗+ everywhere, due to the smoothing properties ofthe expectation operator. Note that the convexity of φ∗+ directly yields positive semi-definiteness of matrixG(θ0). Positive definiteness of this matrix further follows under Assumption 3.

Under the above conditions, the asymptotic distribution of dual optimal portfolio weight estimator θT ischaracterized as follows.

Proposition 7 (Asymptotic distribution of optimal portfolio weight estimator θT ). Let Assump-tions 1–5 and 7–8 be satisfied. Then, random sequence

√T (θT − θ0) converges to random variable

u0 = arg minuZ(u) ,

in distribution as T →∞, where

Z(u) := u′Y +1

2u′G(θ0)u+R(u) , (OA-11)

Y is a zero mean normally distributed random vector with variance covariance matrix V (θ0) and R(·) is thelimit function of the sequence TRT (·), with:39

RT (u) := γh∗

(θ0D +

uD√T

)− γh∗(θ0D)−∇γh∗(θ0D)

uD√T. (OA-12)

The convergence of the process TRT is implied by Assumption 4. From Proposition 7,√T (θT −θ0) is not in

38Stochastic equicontinuity Assumption 7 can also be more commonly stated as follows. Let RQT (u) := QT

(θ0 + u√

T

)−

QT (θ0) − 1√T∇QT (θ0)u −

(Q(θ0 + u√

T

)−Q(θ0)

). Stochastic equicontinuity then requires that for any sequence δn → 0

following condition holds: sup‖u‖≤δn

√TR

QT(u)

‖u/√T‖ = op(1).

39The same result can be obtained for general dual penalty function γh∗ by assuming that TRT converges pointwise to limitfunction R as T →∞.

62

Page 63: Smart Stochastic Discount Factors · SOFONIAS ALEMU KORSAYE, ALBERTO QUAINI and FABIO TROJANI January 9, 2020 ABSTRACT We introduce model-free Smart Stochastic Discount Factors (S{SDFs)

general asymptotically normally distributed, due to the contribution of function R(u) in equation (OA-11).

Online Appendix C - Proofs of Section B

Before proving Proposition 4, we provide conditions under which the solution set of problem ∆(τ) is compact.

Proposition 8 (Compactness of set of minimizers of ∆(τ)). Suppose there are no arbitrage opportu-nities in the supporting economy, Assumption 1 holds and there are no redundancies in the securities, thatis, if X ′θ1 = X ′θ2 and P ′θ1 = P ′θ2 for some θ1 and θ2 involving finite transaction costs, then θ1 = θ2.If the dual Φ−dispersion in ∆(τ) is given by case (3a) of Examples 1 of Appendix A, then the solution setof problem ∆(τ) is non-empty and compact. If the dual Φ−dispersion is in any other case of Examples 1 ofAppendix A, then the solution set of ∆(τ) is a singleton.

Proof of Proposition 8. If there are no redundancy in the securities we have ∆(τ) = minθ∈RN Q(θ), wherefor any θ ∈ RN :

Q(θ) := E[φ∗+(−X ′θ)] + π(X ′θ) .

By the properties of φ∗+ and γh∗ , function Q is closed and convex. Moreover, under absence of arbitrage∆(τ) is finite and attained, thus function Q is proper. Therefore the set of solutions of ∆(τ) is a nonemptyclosed and convex set, see [Rockafellar, 1970, Section 7].

Suppose that Q is defined in terms of any dual Φ−dispersions in Examples 1 of Appendix A that is not incase (3a). Then φ∗+ is strictly convex and, under no redundancy in the securities, the set of solutions of ∆(τ)is a singleton.

If instead Q is defined in terms of the dual Φ−dispersions in case (3a), we next show that in this case Q iscoercive, which implies that the set of solution of ∆(τ) is also bounded, and therefore compact. Q is coercivewhen, for every portfolio weight θ 6= 0,

limy→+∞

yβE[max−X ′t+1θ, 0β/β] + yP ′tθ + γh∗(yθD) = +∞ ,

where β = γ/(γ − 1) > 1.

Recall that γh∗ is a nonnegative function and notice that we do not need to explore directions θ such thatX ′t+1θ = 0 almost surely, because in this case under no-arbitrage and no redundancies in the securitiesit must be that θ = 0. Consider directions θ such that X ′t+1θ < 0 with positive probability. ThenE[max−X ′t+1θ, 0β/β] is strictly positive and therefore

yβE[max−X ′t+1θ, 0β/β] + yP ′tθ −→ +∞ ,

as y > 0 increases, since β > 1. Finally, consider directions θ for which X ′t+1θ ≥ 0 almost surely, andX ′t+1θ > 0 with positive probability. In this case E[max−X ′t+1θ, 0β/β] = 0 and, since there are noarbitrage opportunities, P ′tθ + γh∗(θD) is strictly positive. Therefore, for these directions +yE[P ′tθ] growsunboundedly as y > 0 increases.

We have shown that in this case function Q is coercive, thereby the set of solutions to problem ∆(τ) iscompact. This concludes the proof.

Proof of Proposition 4. Given claim (i), convergence in probability of θT to θ0 in statement (ii) followsusing, e.g., [Newey and McFadden, 1994, Thm 2.7]. We now prove claim (i).

Under absence of arbitrage in the supporting economy and Assumption 1, Θ is non-empty and stochasticprocess φ∗+(−X ′t+1θ) + P ′tθt∈N is ergodic, stationary and integrable for all θ ∈ Θ. We can thus use theErgodic theorem (see, e.g., [Davidson, 1994, Thm. 13.12]) to conclude that QT converges to Q almost surelyon Θ. Then, by [Rockafellar, 1970, Thm. 10.8], we obtain almost sure convergence of QT to Q, uniformlyon any compact subset of the interior of Θ.

63

Page 64: Smart Stochastic Discount Factors · SOFONIAS ALEMU KORSAYE, ALBERTO QUAINI and FABIO TROJANI January 9, 2020 ABSTRACT We introduce model-free Smart Stochastic Discount Factors (S{SDFs)

Under Assumption 2, Θ contains in its interior the compact set of minimizers of Q, which we denote by Q.Hence, the exists a compact subset Ξ of the interior of Θ, which contains Q in its interior. Therefore, theuniform convergence of sequence QT to Q on Ξ implies that for a sufficiently large T the minimum of QT onΞ is taken almost surely in set Q. For such a large T , convexity of QT implies that also the minimum of QTon RN is taken almost surely in set Q, i.e., ∆T (τ) := minQT (θ) : θ ∈ RN = minQT (θ) : θ ∈ Ξ almostsurely. Therefore, given the almost sure uniform convergence of sequence QT to Q on Ξ, we finally obtainthe almost sure convergence of ∆T (τ) to minQ(θ) : θ ∈ Ξ = minQ(θ) : θ ∈ Θ =: ∆(τ). This concludesthe proof.

Proof of Proposition 6. We have:

√T (∆T (τ)−∆(τ)) = −

√T (QT (θT )−Q(θ0))

= −√T (QT (θT )−QT (θ0))−

√T (QT (θ0)−Q(θ0)) .

(OA-13)

We analyze the two components of the RHS of equation (OA-13). The second component,√T (QT (θ0) −

Q(θ0)), reads:

1√T

T∑t=1

φ∗+(−X ′t+1θ0) + P ′tθ0 − E[φ∗+(−X ′t+1θ0) + P ′tθ0]

. (OA-14)

Assumption 6 implies that the random sequence (OA-14) converges in distribution to a zero mean normallydistributed random variable with variance v(θ0).

Now, recall that QT is a convex, differentiable function minimized in θT . Therefore, the first component onthe RHS of equation (OA-13) satisfies:

0 ≤ −√T (QT (θT )−QT (θ0)) ≤ −

√T∇QT (θ0)(θT − θ0) , (OA-15)

where

√T∇QT (θ0) =

1√T

T∑t=1

−(φ∗+)′(−X ′t+1θ0)Xt+1 + Pt + [0′S ,∇γh∗(θ0D)′]′

.

From the first-order condition ∇Q(θ0) = 0 for an extremum in the interior of Θ and Assumption 5, we have[0′S ,∇γh∗(θ0D)′]′ = −E

[(−φ∗+)′(−X ′t+1θ0)Xt+1 + Pt

]. Moreover, given the boundedness in probability of

sequence

1√T

T∑t=1

[−(φ∗+)′(−X ′t+1θ0)Xt+1 + Pt − E[−(φ∗+)′(−X ′t+1θ0)Xt+1 + Pt]

](OA-16)

in equation (OA-9), we obtain that√T∇QT (θ0) is bounded in probability as well. This, together with

the consistency of θT in Proposition 4, shows that the last term of on the RHS of inequalities (OA-15)converges in probability to zero, implying −

√T (QT (θT )−QT (θ0)) = op(1). Therefore,

√T (∆T (τ)−∆(τ))

is asymptotically equivalent to the random sequence (OA-14), which converges in distribution to a zero meannormally distributed random variable with variance v(θ0). This concludes the proof.

Proof of Proposition 7. We first define for u, z ∈ RN the auxiliary functions: QT (u) = QT

(θ0 + u√

T

),

Q(u) = Q(θ0 + u√

T

), gT (z) := 1

T

∑Tt=1

(φ∗+(−X ′t+1z) + P ′tz

)and g(z) := E[φ∗+

(−X ′t+1z

)+ P ′tz]. We

then define the estimation problem in terms of local parameter u ∈ RN , using the process:

ZT (u) = TQT (u)− QT (0)

. (OA-17)

64

Page 65: Smart Stochastic Discount Factors · SOFONIAS ALEMU KORSAYE, ALBERTO QUAINI and FABIO TROJANI January 9, 2020 ABSTRACT We introduce model-free Smart Stochastic Discount Factors (S{SDFs)

Note that uT :=√T (θT − θ0) minimizes ZT (u), because QT (θ) is minimized in θT . Using the stochastic

equicontinuity Assumption 7, we obtain:

QT (u)− QT (0) =1√T∇QT (0)u+

(Q(u)− Q(0)

)+ op(1/T ) . (OA-18)

The second component of the RHS of equation (OA-18) reads:

Q(u)− Q(0) = g

(θ0 +

u√T

)− g(θ0) + γh∗

(θ0D +

uD√T

)− γh∗(θ0D). (OA-19)

Assumption 8 now allows us to compute a second order Taylor expansion of auxiliary function g in θ0.Together with the expression of the remainder sequence RT , this yields:

Q(u)− Q(0) =1√T

(∇g(θ0) + [0′S ,∇γh∗(θ0D)′]′)u+1

2Tu′G(θ0)u+RT (u) + op(1/T ) . (OA-20)

Given that θ0 minimizes Q(θ) in the interior of Θ, the first component on the RHS of equality (OA-20)vanishes. Substituting identity (OA-20) in equation (OA-18) and multiplying by

√T , we obtain:

ZT (u) =√T (∇gT (θ0) + [0′S ,∇γh∗(θ0D)′]′)u+

1

2u′G(θ0)u+ TRT (u) + op(1) .

Assumptions 5 and 8, together with the first-order condition ∇Q(θ0) = 0, imply the convergence in dis-tribution of term

√T (∇gT (θ0) + [0′S ,∇γh∗(θ0D)′]′) to a zero mean normally distributed random vector Y

with variance covariance matrix V (θ0). Finally, Lipschitz continuity of ∇γh∗ implies that the term TRT (u)converges pointwise to a limit R(u) described by convex function R.

In summary, we obtain that all finite-dimensional distributions of process ZT converge to the correspondingfinite-dimensional distributions of a process Z defined by Z(u) := u′Y + 1

2u′G(θ0)u + R(u), as T → ∞.

Assumption 3 and the convexity of function Q also imply that matrix G(θ0) is positive definite. Hence,random function u 7→ Z(u) is strictly convex in u and has a unique minimizer u0 := arg minu∈RN Z(u)almost surely. This implies that

√T (θT − θ0) → arg minu∈RN Z(u) in distribution as T → ∞; see, e.g.,

Geyer [1996]. This concludes the proof.

65