ACTSC372 Course Project


University of Waterloo

Markowitz’s Portfolio Optimization vs. 1/N Portfolio

Bill Zhuo

Abstract & Reader’s Guide

This is a comprehensive report on the work I did for the ACTSC 372 project. The original idea is based on the paper "Optimal Versus Naive Diversification: How Inefficient is the 1/N Portfolio Strategy?". Please focus on Part I of the report, where readers can find the required analysis and detailed code. Part II is my personal research on the practical estimation of S&P 500 companies’ β, which plays a minor part in Part I.

November 2020

Contents

I Main Content

1 Markowitz’s Portfolio vs. 1/N Portfolio
1.1 Data Collection
1.2 Estimation
1.2.1 Estimation on Training Set
1.3 Construction of Portfolios
1.3.1 Grid Search for Optimal τ
1.3.2 1/N Strategy
1.4 In-Sample Performance Evaluation
1.4.1 Portfolio Distribution and Efficient Frontier
1.4.2 Estimation of β
1.4.3 Evaluation Metrics
1.5 Out-of-Sample Performance Evaluation
1.5.1 Enumeration of Attributes
1.5.2 Estimation of β in Testing Set
1.5.3 Performance Comparison
1.5.4 Performance Metrics
1.6 Ad-hoc Analysis
1.6.1 Estimation Methods
1.6.2 Overfitting and Weighting Problem
1.6.3 Real-World Issues
1.7 Discussion

2 Detailed Codebook
2.1 Markowitz’s Portfolio Optimization
2.1.1 Data Collection
2.1.2 Estimation
2.1.3 Construction of Portfolios
2.1.4 In-sample Performance Evaluation
2.1.5 Out-of-sample Performance Evaluation

II Side Research

3 Causal Inference on S&P 500 Stocks’ β
3.1 Problem Definition
3.2 Data Processing
3.3 Market Average Return and Benchmark β120
3.3.1 Estimation Benchmark: β120
3.4 Model Evaluation
3.5 Exploration of Additional Models
3.5.1 Ordinary Linear Regression (OLS)
3.5.2 Reversed Weighted Linear Regression
3.5.3 Robust Regression (Huber Regression)
3.5.4 Model Evaluation: Frequentist Approaches
3.5.5 Bayesian Linear Regression
3.6 Potential Improvements and Recommendations
3.6.1 Sector Effect
3.6.2 Value or Growth?
3.6.3 Foreign or Domestic?
3.6.4 COVID-19
3.6.5 Link to Rates
3.7 Acknowledgement
3.8 Appendix

I


Main Content

1. Markowitz’s Portfolio vs. 1/N Portfolio

1.1 Data Collection

We were advised to pick 10 stocks with at least a 2-year trading history to construct a portfolio. The main idea guiding my selection was to have predominant exposure to the growing technology sector, minor exposure to industrial and energy companies, a hedge position in gold mining, and an emphasis on ESG investing. Based on this investment thesis, I picked the 10 stocks displayed in the table below.

| Company | Ticker | Sector | Highlights |
|---|---|---|---|
| Apple | AAPL | Technology | A global leader in multiple lines of electronic devices and related businesses. Very strong and consistent earning ability. |
| Microsoft Corporation | MSFT | Technology | A global leader in computer hardware and software services. Very strong and consistent earning ability. |
| Amazon | AMZN | Technology/Consumer Discretionary | A tech giant that operates international e-commerce businesses and also has a presence in the entertainment industry and cloud computing. Very strong and consistent earning ability. |
| Tesla, Inc. | TSLA | Technology/Industrial | An electric vehicle manufacturer with extensive investment in self-driving cars and clean energy. Volatile earnings reports result in highly volatile returns. |
| Equinix | EQIX | REIT/Technology | A global leader in data centres and colocation. Very consistent earning ability. |
| General Motors | GM | Industrial | A traditional vehicle and parts manufacturer. |
| Gilead Sciences | GILD | Pharmaceutical | A biopharmaceutical company that researches, develops, and commercializes drugs. |
| Enbridge | ENB | Energy | A Canadian energy transportation company. Earnings are subject to energy commodity markets. |
| Barrick Gold Corporation | GOLD | Mineral | A Canadian mining company with predominant operations in gold mining. |
| First Solar | FSLR | Energy | A fast-growing solar energy company. Very ESG-oriented. |

The data was retrieved on November 2nd, 2020, using the Yahoo Finance data provider.

```python
import pandas as pd
import pandas_datareader as pdr
import yfinance as yf

yf.pdr_override()  # route pdr.get_data_yahoo through yfinance (as in Section 2.1.1)

ticker_list = ['AAPL', 'MSFT', 'GOLD', 'GM', 'GILD', 'AMZN', 'TSLA', 'ENB',
               'EQIX', 'FSLR']
price_data = list()

start_date = '2018-11-02'
end_date = '2020-11-02'

for ticker in ticker_list:
    prices = pdr.get_data_yahoo(ticker, start=start_date, end=end_date)
    price_data.append(prices[['Adj Close']])  # keep the adjusted closing price only

df_stocks = pd.concat(price_data, axis=1)  # one column per stock, indexed by date
df_stocks.columns = ticker_list
df_stocks.head()
```

Using the code chunk above, we retrieve the daily adjusted closing prices of these 10 stocks from November 2nd, 2018, to November 2nd, 2020. The detailed retrieval of the S&P 500 index and the 10-year US Treasury yield can be found in Section 2.1.1 Data Collection in Chapter 2.

Training/Testing Dataset Preparation

We use a standard 2 : 1 train-test ratio, with the first 2/3 of the rows in the training set and the rest in the testing set, preserving the chronological order of the dates. This practice is reasonable in our setting because it tests whether analysis of historical data can give a reliable forecast of the future. This can be done easily in sklearn, as shown below.

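```python
from sklearn.model_selection import train_test_split

# shuffle=False keeps the date order: the first ~2/3 of the trading days form
# the training set and the remaining ~1/3 form the testing set
X_train, X_test, y_train, y_test = train_test_split(df_stocks, sp500,
                                                    test_size=0.33, shuffle=False)
```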

where sp500 is the S&P 500 index time series.

1.2 Estimation

1.2.1 Estimation on Training Set

To perform mean-variance optimization, we need estimates of the mean returns and of the covariance matrix. As is common practice, to account for compounding, we compute daily returns using the log-return formula

$$R_i = \log\left(\frac{P_i}{P_{i-1}}\right)$$

where $P_i$ is the adjusted closing price of the stock on the $i$-th trading day. This calculation is carried out below.

```python
import numpy as np
import pandas as pd

return_data = list()
for ticker in ticker_list:
    # daily log return log(P_i) - log(P_{i-1}); the first row is NaN
    log_ret = np.log(X_train[ticker]) - np.log(X_train[ticker].shift(1))
    return_data.append(log_ret)

df_ret = pd.concat(return_data, axis=1)
df_ret.columns = ticker_list
df_ret.tail()
```
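The compounding property that motivates log returns is that they add up over time: the sum of the daily log returns equals the log of the overall price ratio. A minimal check on a made-up price path (the prices are illustrative only, not data from this project):

```python
import numpy as np
import pandas as pd

# Hypothetical adjusted closing prices, for illustration only
prices = pd.Series([100.0, 102.0, 99.0, 105.0])

# Daily log returns R_i = log(P_i / P_{i-1}); the first entry is NaN
log_ret = np.log(prices) - np.log(prices.shift(1))

# Summing the daily log returns recovers log(P_last / P_first)
total_log_ret = log_ret.sum()  # Series.sum skips the leading NaN by default
total_from_prices = np.log(prices.iloc[-1] / prices.iloc[0])

print(total_log_ret, total_from_prices)
```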


For each stock, we compute the sample mean return to form the mean return vector µ̂, and we compute the sample covariance matrix Σ̂, using the following expressions, where $R_{i,j}$ denotes the log return of stock $j$ on day $i$.

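$$\hat{\mu}_j = \frac{1}{n}\sum_{i=1}^{n} R_{i,j}, \quad \forall j \in \{1, \cdots, 10\} \implies \hat{\mu} = \begin{pmatrix} \hat{\mu}_1 \\ \hat{\mu}_2 \\ \vdots \\ \hat{\mu}_{10} \end{pmatrix}$$

$$\hat{\Sigma} = \begin{pmatrix}
\operatorname{Var}(\{R_{i,1}\}_{i=1}^{n}) & \operatorname{Cov}(\{R_{i,1}\}_{i=1}^{n}, \{R_{i,2}\}_{i=1}^{n}) & \cdots & \operatorname{Cov}(\{R_{i,1}\}_{i=1}^{n}, \{R_{i,10}\}_{i=1}^{n}) \\
\operatorname{Cov}(\{R_{i,2}\}_{i=1}^{n}, \{R_{i,1}\}_{i=1}^{n}) & \operatorname{Var}(\{R_{i,2}\}_{i=1}^{n}) & \cdots & \operatorname{Cov}(\{R_{i,2}\}_{i=1}^{n}, \{R_{i,10}\}_{i=1}^{n}) \\
\vdots & \vdots & \ddots & \vdots \\
\operatorname{Cov}(\{R_{i,10}\}_{i=1}^{n}, \{R_{i,1}\}_{i=1}^{n}) & \operatorname{Cov}(\{R_{i,10}\}_{i=1}^{n}, \{R_{i,2}\}_{i=1}^{n}) & \cdots & \operatorname{Var}(\{R_{i,10}\}_{i=1}^{n})
\end{pmatrix}$$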
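As a sanity check on these definitions, the sketch below computes µ̂ and Σ̂ directly from the formulas on a small made-up return matrix (two hypothetical stocks, four days) and compares them with pandas' `DataFrame.mean()` and `DataFrame.cov()`. Note that `DataFrame.cov()` uses the unbiased n − 1 denominator, so that is what the sketch reproduces.

```python
import numpy as np
import pandas as pd

# Hypothetical daily log returns: rows are days i, columns are stocks j
df = pd.DataFrame({'A': [0.010, -0.020, 0.015, 0.005],
                   'B': [0.000, 0.010, -0.010, 0.020]})
R = df.to_numpy()
n = R.shape[0]

# mu_hat_j = (1/n) * sum_i R_{i,j}
mu_hat = R.sum(axis=0) / n

# Sample covariance with the unbiased (n - 1) denominator
centered = R - mu_hat
sigma_hat = centered.T @ centered / (n - 1)

print(np.allclose(mu_hat, df.mean().to_numpy()))
print(np.allclose(sigma_hat, df.cov().to_numpy()))
```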

To make portfolio performance easier to compare, we use annualized values. This is primarily because the 10-year US Treasury yields are quoted as annual rates, and interpolating such data down to daily rates is usually considered flawed, or at least prone to errors. Meanwhile, the mean-variance optimization proposed in Markowitz’s portfolio theory can accommodate different time horizons as long as µ and Σ are consistent. Thus, we use

$$\hat{\mu}_{\text{annual}} = (1 + \hat{\mu})^{252} - 1, \qquad \hat{\Sigma}_{\text{annual}} = 252 \times \hat{\Sigma}$$

where 252 is the approximate number of trading days in a year and the power is applied element-wise. The following two functions perform these calculations.

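```python
import numpy as np

def mean_estimate(df):
    """Annualized mean return of each column: (1 + daily mean)^252 - 1."""
    mean_dict = dict()
    for col_name in df.columns:
        mean_dict[col_name] = (np.mean(df[col_name]) + 1) ** 252 - 1
    return mean_dict

def cov_mat_estimate(df):
    """Annualized sample covariance matrix: 252 * daily covariance."""
    return df.cov() * 252
```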

Similar logic was applied to the S&P 500 index return as shown below.

```python
# annualized mean and volatility of the S&P 500 daily log returns (training set)
sp_mean = (np.mean(y_train['Return']) + 1) ** 252 - 1
sp_vol = np.std(y_train['Return']) * np.sqrt(252)
```
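For a sense of scale, the sketch below applies these annualization rules to a hypothetical daily mean log return and daily volatility (the inputs are made up). Because the daily means here are log returns, exp(252 µ̂) − 1 is the exact annual simple return implied by a constant daily mean; for small daily means it is very close to (1 + µ̂)^252 − 1, the rule used above. Volatility scales by √252 because the variance scales by 252.

```python
import numpy as np

# Hypothetical daily statistics, for illustration only
mu_daily = 0.0008   # mean daily log return
vol_daily = 0.02    # daily standard deviation

mu_annual_compound = (1 + mu_daily) ** 252 - 1  # rule used in mean_estimate
mu_annual_exact = np.exp(252 * mu_daily) - 1     # exact for a mean *log* return
vol_annual = vol_daily * np.sqrt(252)            # variance x 252 => sd x sqrt(252)

print(round(mu_annual_compound, 4), round(mu_annual_exact, 4), round(vol_annual, 4))
```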

Some data wrangling is omitted here; it can be found in Section 2.1.2 Estimation in Chapter 2.

1.3 Construction of Portfolios

We adopt one of the formulations of mean-variance portfolio optimization, given by

$$\max_{w}\; \tau w^{\top}\mu - \frac{1}{2} w^{\top}\Sigma w \quad \text{s.t.} \quad e^{\top}w = 1$$

where τ is the risk-return trade-off parameter. We test different values of τ to plot the efficient frontier of this set of stocks. The closed-form optimal weight $w_{\text{opt}}$ is given below.

Self-financing portfolio

$$w_z = \Sigma^{-1}\mu - \frac{e^{\top}\Sigma^{-1}\mu}{e^{\top}\Sigma^{-1}e}\,\Sigma^{-1}e$$

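```python
import numpy as np

def w_z(mu, sigma):
    """Self-financing portfolio: its weights sum to zero."""
    e = np.ones(len(mu))  # one entry per stock (10 in our case)
    sigma_inv = np.linalg.inv(sigma)
    factor = np.dot(e.T, np.dot(sigma_inv, mu)) / np.dot(e.T, np.dot(sigma_inv, e))
    return np.dot(sigma_inv, mu) - factor * np.dot(sigma_inv, e)
```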

Min-risk portfolio

$$w_m = \frac{\Sigma^{-1}e}{e^{\top}\Sigma^{-1}e}$$

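```python
import numpy as np

def w_m(mu, sigma):
    """Minimum-risk portfolio (mu is unused; kept for a uniform signature)."""
    e = np.ones(len(mu))
    sigma_inv = np.linalg.inv(sigma)
    return np.dot(sigma_inv, e) / np.dot(e.T, np.dot(sigma_inv, e))
```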

Optimal portfolio

$$w_{\text{opt}} = \tau w_z + w_m$$

Since we do not know µ and Σ, we use $\hat{\mu}_{\text{annual}}$ and $\hat{\Sigma}_{\text{annual}}$ instead.
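The closed forms can be checked numerically. With the Lagrangian $\tau w^{\top}\mu - \frac{1}{2}w^{\top}\Sigma w - \lambda(e^{\top}w - 1)$, the first-order condition is $\tau\mu - \Sigma w = \lambda e$, so $\tau\mu - \Sigma w_{\text{opt}}$ should be a constant vector for every τ. The sketch below uses made-up µ and Σ (not our estimates) and re-implements the two helpers with `np.linalg.solve` instead of an explicit inverse; this `w_m` also drops the unused `mu` argument, so it is an illustrative variant, not the report's function.

```python
import numpy as np

def w_z(mu, sigma):
    """Self-financing portfolio: Sigma^-1 mu - (e'Sigma^-1 mu / e'Sigma^-1 e) Sigma^-1 e."""
    e = np.ones(len(mu))
    s_inv_mu = np.linalg.solve(sigma, mu)
    s_inv_e = np.linalg.solve(sigma, e)
    return s_inv_mu - (e @ s_inv_mu) / (e @ s_inv_e) * s_inv_e

def w_m(sigma):
    """Minimum-risk portfolio: Sigma^-1 e / e'Sigma^-1 e."""
    e = np.ones(sigma.shape[0])
    s_inv_e = np.linalg.solve(sigma, e)
    return s_inv_e / (e @ s_inv_e)

# Illustrative inputs only (3 assets)
mu = np.array([0.10, 0.15, 0.08])
sigma = np.array([[0.040, 0.006, 0.002],
                  [0.006, 0.090, 0.010],
                  [0.002, 0.010, 0.030]])

for tau in (0.0, 0.05, 0.19):
    w_opt = tau * w_z(mu, sigma) + w_m(sigma)
    grad = tau * mu - sigma @ w_opt          # should equal lambda * e
    print(tau, w_opt.sum(), np.ptp(grad))    # weights sum to 1; spread of grad ~ 0
```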

1.3.1 Grid Search for Optimal τ

Since there is no clear objective function against which to optimize τ, we cannot use more advanced techniques such as Bayesian optimization. Instead, we perform a rudimentary grid search and interpret the performance metrics manually later. We pick $T = \{0, 0.01, \cdots, 0.19\}$ as our candidate set of τ values and construct the corresponding optimal portfolios as shown below.

```python
import numpy as np

stocks_num = 10
tau_array = np.round(np.arange(0, 0.20, 0.01), 2)  # grid T = {0, 0.01, ..., 0.19}
tau_mean = np.zeros(len(tau_array))
tau_vol = np.zeros(len(tau_array))
all_weights = list()

for index, tau in enumerate(tau_array):
    # w_opt = tau * w_z + w_m, using the annualized training-set estimates
    weight_opt = (tau * w_z(train_mean_estimate_np, train_cov_mat_estimate_np)
                  + w_m(train_mean_estimate_np, train_cov_mat_estimate_np))
    tau_mean[index] = np.sum(train_mean_estimate_np * weight_opt)
    tau_vol[index] = np.sqrt(weight_opt @ train_cov_mat_estimate_np @ weight_opt)
    all_weights.append(weight_opt)
```

We also store the mean and volatility (standard deviation) of each optimal portfolio for performance measurement later.

Definition 1.3.1 (τ-Optimized Portfolios). A τ-optimized portfolio is the $w_{\text{opt}}$ produced by

$$\max_{w}\; \tau w^{\top}\mu - \frac{1}{2} w^{\top}\Sigma w \quad \text{s.t.} \quad e^{\top}w = 1$$


1.3.2 1/N Strategy

One of the objectives of this endeavour is to test whether the mean-variance optimized portfolio can consistently beat the naive 1/N portfolio. Thus, we also construct a 1/N portfolio and keep track of its mean return and volatility.

```python
naive_weight = np.repeat(1 / stocks_num, stocks_num)  # equal weight 1/N on each stock
naive_mean = np.sum(train_mean_estimate_np * naive_weight)
naive_vol = np.sqrt(naive_weight @ train_cov_mat_estimate_np @ naive_weight)
```

Detailed code can be found in Section 2.1.3 Construction of Portfolios in Chapter 2.

1.4 In-Sample Performance Evaluation

1.4.1 Portfolio Distribution and Efficient Frontier

We analyzed both the empirical efficient frontier and the theoretical efficient frontier traced by the τ-optimized portfolios. We first generated 50000 random portfolio weights for these 10 stocks and plotted them on volatility-versus-return axes, as shown below.

The colouring is based on each portfolio’s Sharpe ratio, given by

$$\text{Sharpe}(P) = \frac{\mu_P - r_f}{\sigma_P}$$

where $\mu_P$ is the portfolio mean return, $\sigma_P$ is the portfolio volatility, and $r_f$ is the annualized mean risk-free rate derived from the 10-year US Treasury yields. Using the minimize function provided by scipy.optimize, we compute the envelope of these randomly generated portfolios, i.e. the empirical efficient frontier. We then add our τ-optimized portfolios, the S&P 500 index, and the 1/N portfolio to this chart, with the following result.

We can observe that the min-risk portfolio (the τ-portfolio with τ = 0) is highlighted in orange at the leftmost point of the theoretical efficient frontier. Moreover, we have several interesting observations to report:

• Our empirical efficient frontier and theoretical efficient frontier overlap in most places. This suggests that our original formulation of the problem is practical.

• Given our choice of 10 stocks, the S&P 500’s return-volatility pair does not lie within the region enclosed by the efficient frontier. This tells us that, allowing for some estimation error, we cannot use these 10 stocks to replicate the risk-return profile of the S&P 500 index over this period. Moreover, the S&P 500 data point sits at the bottom left, which is less desirable than a large number of portfolios, including the τ-portfolios. One natural question is why a market portfolio consisting of many of the best companies in the world underperformed over this period. One obvious answer, though not the only one, is the COVID-19 pandemic and the March 2020 sell-offs in the US stock market.

• As expected, the 1/N portfolio is a feasible portfolio to construct and is therefore contained within the efficient frontier. Based on its position, it is likely to beat the S&P 500 index on some of the metrics we examine later, but it is clearly inefficient compared to some of the random portfolios and certainly to the τ-optimized portfolios on the frontier.

• The theoretical efficient frontier is a hyperbola, as we have seen in lectures. We prefer portfolios closer to the top-left corner of the graph, which offer a more desirable risk-return trade-off.
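The report does not spell out how the 50000 random weights were drawn or which constraints were passed to scipy.optimize.minimize, so the sketch below rests on stated assumptions: weights are long-only, drawn uniformly and rescaled to sum to 1, and each frontier point is the least volatile long-only portfolio reaching a target return (SLSQP with equality constraints). The function names and the 3-asset inputs are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

def random_portfolios(mu, sigma, rf, n_portfolios=50000, seed=0):
    """Random long-only portfolios (assumption): uniform draws rescaled to sum to 1."""
    rng = np.random.default_rng(seed)
    w = rng.random((n_portfolios, len(mu)))
    w /= w.sum(axis=1, keepdims=True)
    rets = w @ mu
    vols = np.sqrt(np.einsum('ij,jk,ik->i', w, sigma, w))  # sqrt(w' Sigma w) per row
    sharpe = (rets - rf) / vols                             # used for the colouring
    return w, rets, vols, sharpe

def min_vol_for_target(mu, sigma, target):
    """One empirical frontier point: least volatile long-only portfolio with return = target."""
    n = len(mu)
    constraints = ({'type': 'eq', 'fun': lambda w: np.sum(w) - 1},
                   {'type': 'eq', 'fun': lambda w: w @ mu - target})
    res = minimize(lambda w: np.sqrt(w @ sigma @ w), np.repeat(1 / n, n),
                   method='SLSQP', bounds=[(0, 1)] * n, constraints=constraints)
    return res.x, res.fun

# Illustrative inputs (not the report's estimates)
mu = np.array([0.10, 0.20, 0.15])
sigma = np.array([[0.040, 0.010, 0.005],
                  [0.010, 0.090, 0.020],
                  [0.005, 0.020, 0.060]])
w, rets, vols, sharpe = random_portfolios(mu, sigma, rf=0.02, n_portfolios=1000)
targets = np.linspace(0.11, 0.19, 5)
frontier = [min_vol_for_target(mu, sigma, t) for t in targets]
```

Allowing short sales (dropping the bounds) would reproduce the theoretical frontier instead; the long-only choice only mirrors how random weights are usually generated.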


To compare these portfolios more quantitatively, we define some performance metrics. First, however, we need to estimate a crucial parameter.

1.4.2 Estimation of β

We calculate $\beta_P$ for each τ-optimized portfolio using the linearity of β,

$$\beta_P = w_{\text{opt}}^{\top}\vec{\beta}$$

where $\vec{\beta} = [\beta_1, \cdots, \beta_{10}]^{\top}$. By definition, $\beta_i$ measures the sensitivity of the $i$-th stock’s risk premium over the risk-free rate $r_f$ to the market risk premium over $r_f$, i.e. $\beta_i = \operatorname{Cov}(r_i, r_M)/\operatorname{Var}(r_M)$. Please see Chapter 3, Causal Inference on S&P 500 Stocks’ β, for detailed research on the estimation approaches that I tested on S&P 500 companies as side research supervised by Cubist Systematic Strategies. Here, we introduce two approaches.

Simple Linear Regression

Based on the CAPM, we know that

$$r_P - r_f = \beta_P (r_M - r_f)$$

holds in expectation, where $r_P$ and $r_M$ are the portfolio return and the market return, respectively. Given data on $r_M$, $r_P$, and $r_f$, we formulate this as a simple linear regression problem,

$$r_P - r_f = \beta_0 + \beta_P (r_M - r_f) + \varepsilon$$

where $\varepsilon \sim N(0, \sigma^2)$. Using the following function, we can perform this estimation with ease.

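```python
import numpy as np
import statsmodels.api as sm

def ols_beta(ticker, df_x, df_y, r_f):
    # stock daily log return minus the risk-free rate
    # (note: DGS10 / 100 is an annualized yield; it is subtracted as-is)
    ret_raw_stock = np.log(df_y[ticker]) - np.log(df_y[ticker].shift(1))
    ret_raw_stock = ret_raw_stock[1:] - r_f['DGS10'][1:] / 100
    # market excess return
    market_ret = np.array(df_x['Return'][1:] - r_f['DGS10'][1:] / 100)
    X = sm.add_constant(market_ret)
    Y = ret_raw_stock
    results = sm.OLS(Y, X).fit()
    print(results.summary())
    return results.params[1]  # slope coefficient = beta estimate
```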

Robust Linear Regression

Diversification using a market portfolio can only eliminate non-systematic risk. Sudden market events, such as 9/11, the 2008 financial crisis, and the March 2020 COVID-19 sell-off, can easily create outliers in the return series. This calls for a more robust regression framework, so we use the well-known Huber loss function

$$\rho_k(r) = \begin{cases} \frac{1}{2} r^2, & |r| \le k \\ k|r| - \frac{1}{2} k^2, & |r| > k \end{cases}$$

In practice, it is common to set k = 1.345, which balances efficiency under normally distributed errors against resistance to outliers. The following function performs this robust estimation.

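```python
import numpy as np
import statsmodels.api as sm

def huber_beta(ticker, df_x, df_y, r_f):
    # stock daily log return minus the risk-free rate (DGS10 is quoted in percent)
    ret_raw_stock = np.log(df_y[ticker]) - np.log(df_y[ticker].shift(1))
    ret_raw_stock = ret_raw_stock[1:] - r_f['DGS10'][1:] / 100
    # market excess return
    market_ret = np.array(df_x['Return'][1:] - r_f['DGS10'][1:] / 100)
    X = sm.add_constant(market_ret)
    Y = ret_raw_stock
    # Huber M-estimation with tuning constant k = 1.345
    huber = sm.RLM(Y, X, M=sm.robust.norms.HuberT(1.345))
    results = huber.fit()
    print(results.summary())
    return results.params[1]  # slope coefficient = robust beta estimate
```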
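To see what the Huber loss does inside the regression, the sketch below implements ρ_k from the formula above together with the weight ψ(r)/r = min(1, k/|r|) that an iteratively reweighted least squares fit assigns to each residual. These are illustrative helpers, not statsmodels internals; to our understanding, statsmodels’ `RLM` applies its `HuberT` norm to residuals divided by a robust scale estimate (MAD by default), so k = 1.345 is measured in units of that scale.

```python
import numpy as np

def huber_rho(r, k=1.345):
    """Huber loss: 0.5 r^2 for |r| <= k, k|r| - 0.5 k^2 otherwise."""
    r = np.asarray(r, dtype=float)
    return np.where(np.abs(r) <= k, 0.5 * r**2, k * np.abs(r) - 0.5 * k**2)

def huber_weight(r, k=1.345):
    """IRLS weight psi(r)/r: 1 inside [-k, k], k/|r| outside, so outliers are down-weighted."""
    r = np.asarray(r, dtype=float)
    abs_r = np.maximum(np.abs(r), 1e-12)  # avoid division by zero at r = 0
    return np.minimum(1.0, k / abs_r)

# Standardized residuals, for illustration only
residuals = np.array([0.2, -1.0, 1.345, 3.0, -10.0])
print(huber_rho(residuals))
print(huber_weight(residuals))
```

A residual of −10 (e.g. a crash day) thus contributes with weight 0.1345 instead of 1, which is why the robust β is less driven by a few extreme days than the OLS β.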

Using both approaches, we obtain the following β estimates.

We note some differences between the simple linear regression (OLS) estimates and the robust estimates, in particular for GOLD, which is held in the portfolio as a hedge against market downturns. Thus, we stick with the robust estimation approach and its estimate $\hat{\vec{\beta}}$. Now, we can calculate $\beta_P$ for each τ-optimized portfolio P by

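$$\hat{\beta}_P = w_{\text{opt}}^{\top}\hat{\vec{\beta}}$$

The corresponding code is

```python
# beta_df: one row per estimation method (including 'Robust Beta'), one column per ticker
# weights: pd.Series of w_opt indexed by ticker
beta = (beta_df.loc['Robust Beta'] * weights).sum()
```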
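The linearity $\beta_P = w^{\top}\vec{\beta}$ can be verified on synthetic data: for OLS it holds exactly, because the estimated slope is a linear function of the response, so regressing the portfolio’s returns on the market gives the same number as weighting the individual stock betas. The sketch uses simulated returns and illustrative weights (none of it is project data). For Huber estimates the identity holds only approximately, since the IRLS weights depend on the response.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 250
market = rng.normal(0.0, 0.01, n)                    # synthetic market excess returns
true_betas = np.array([1.2, 0.8, -0.3])
noise = rng.normal(0.0, 0.01, (n, 3))
stocks = market[:, None] * true_betas + noise        # synthetic stock excess returns
w = np.array([0.5, 0.7, -0.2])                       # illustrative weights summing to 1

def ols_slope(x, y):
    """Slope of the simple linear regression y = b0 + b1 x + eps."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]

stock_betas = np.array([ols_slope(market, stocks[:, j]) for j in range(3)])
beta_p_linear = w @ stock_betas                      # w' beta
beta_p_direct = ols_slope(market, stocks @ w)        # regress portfolio returns directly
print(beta_p_linear, beta_p_direct)
```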

Now, we are ready to define evaluation metrics.

1.4.3 Evaluation Metrics

The required daily returns of the τ-optimized portfolios, the 1/N portfolio, and the S&P 500 index are computed by the following code chunk.

```python
from collections import OrderedDict
import pandas as pd

### Daily return calculation (training set)
tau_opt_port_daily_ret_dict = OrderedDict()

for index, tau in enumerate(tau_array):
    weights = pd.Series(all_weights[index], index=ticker_list)
    # portfolio daily return = weighted sum of the stocks' daily log returns
    # (an approximation: log returns are not exactly additive across assets)
    tau_opt_port_daily_ret_dict['tau=' + str(tau)] = (df_ret * weights).sum(1)

tau_opt_port_daily_ret_dict['1/N'] = (df_ret * naive_weight).sum(1)
tau_opt_port_daily_ret_dict['SPX'] = y_train['Return']  # in-sample: training-period S&P 500

tau_opt_port_daily_ret_df = pd.DataFrame.from_dict(tau_opt_port_daily_ret_dict)
```

Cumulative Return

Besides mean return and volatility, we can also look directly at the cumulative return of each portfolio. To facilitate visualization, we use the dtale library, open-sourced by Man Group, which uses Plotly to create interactive graphs. For our case, the cumulative return graph is shown below.


As we can observe, the higher the τ value, the higher the cumulative return of the τ-optimized portfolio compared to the 1/N portfolio or the S&P 500 index. Even our min-risk portfolio performs considerably better than both the 1/N portfolio and the S&P 500 index in terms of cumulative return. However, this metric does not account for the volatility of returns.
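The report does not show how the cumulative return series were computed from the daily log returns; one natural choice, assumed here, is exp(cumulative sum of log returns) − 1, which is exact for a single asset and approximate for portfolio series built as weighted sums of log returns. The helper name and the two-portfolio data are hypothetical.

```python
import numpy as np
import pandas as pd

def cumulative_return(log_ret):
    """Cumulative simple return from daily log returns: exp(running sum) - 1."""
    return np.exp(log_ret.fillna(0).cumsum()) - 1

# Illustrative daily log returns for two portfolios (first row NaN, as after shift(1))
daily = pd.DataFrame({'P1': [np.nan, 0.01, -0.02, 0.03],
                      'P2': [np.nan, 0.00, 0.01, 0.01]})
cum = cumulative_return(daily)
print(cum)
```

Applied to a frame such as `tau_opt_port_daily_ret_df`, this yields one cumulative curve per column, ready for plotting.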

Performance Measures

With β estimated, we use it to define the following performance measures.

| | Sharpe Ratio | Treynor Ratio | Jensen’s Alpha |
|---|---|---|---|
| Formula | $\frac{\mu_P - r_f}{\sigma_P}$ | $\frac{\mu_P - r_f}{\beta_P}$ | $(\mu_P - r_f) - \beta_P(\mu_M - r_f)$ |
| Benchmark | Relative performance to the risk-free asset | Relative performance to the market portfolio | Relative performance to an efficient portfolio |
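The helpers `sharpe_ratio`, `treynor_ratio`, and `jensen_alpha` are called by `metric_summary` below, but their bodies are not shown in this part of the report. The sketch reconstructs them directly from the formulas in the table, with the argument order of those calls (`spm` being the market mean return µ_M); treat it as a plausible version rather than the exact codebook code. The numeric inputs are illustrative.

```python
def sharpe_ratio(mu, sigma, rf):
    """Excess return per unit of total risk: (mu_P - r_f) / sigma_P."""
    return (mu - rf) / sigma

def treynor_ratio(mu, beta, rf):
    """Excess return per unit of systematic risk: (mu_P - r_f) / beta_P."""
    return (mu - rf) / beta

def jensen_alpha(mu, beta, rf, spm):
    """Excess return above the CAPM prediction: (mu_P - r_f) - beta_P (mu_M - r_f)."""
    return (mu - rf) - beta * (spm - rf)

# Illustrative inputs: mu_P = 12%, sigma_P = 20%, beta_P = 1.25, r_f = 2%, mu_M = 8%
print(sharpe_ratio(0.12, 0.20, 0.02))
print(treynor_ratio(0.12, 1.25, 0.02))
print(jensen_alpha(0.12, 1.25, 0.02, 0.08))
```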

With the following code, we collect these metrics for every portfolio:

```python
from collections import OrderedDict
import pandas as pd

metric_dict = OrderedDict()

def metric_summary(mu, sigma, beta, rf, spm):
    """Return [mean, volatility, beta, Sharpe, Treynor, Jensen's alpha] for one portfolio."""
    sharpe = sharpe_ratio(mu, sigma, rf)
    treynor = treynor_ratio(mu, beta, rf)
    jensen = jensen_alpha(mu, beta, rf, spm)
    return [mu, sigma, beta, sharpe, treynor, jensen]

for index, tau in enumerate(tau_array):
    weights = pd.Series(all_weights[index], index=ticker_list)
    beta = (beta_df.loc['Robust Beta'] * weights).sum()  # portfolio beta w' beta
    metric_dict['tau=' + str(tau)] = metric_summary(tau_mean[index], tau_vol[index],
                                                    beta, rf_mean, sp_mean)

naive_beta = (beta_df.loc['Robust Beta'] * naive_weight).sum()

metric_dict['1/N'] = metric_summary(naive_mean, naive_vol, naive_beta, rf_mean, sp_mean)
metric_dict['SPX'] = metric_summary(sp_mean, sp_vol, 1, rf_mean, sp_mean)  # market beta = 1
```

The tabulated performance comparison is shown below.

It can be summarized using a grouped bar chart.


We have several interesting observations to report:

• With increasing τ ≥ 0, the τ-optimized portfolios consistently outperform the 1/N portfolio and the S&P 500 index on the training set.

• As τ increases, the mean return, standard deviation, estimated $\hat{\beta}_P$, Treynor ratio, and Jensen’s alpha all increase. This means the τ-optimized portfolios perform increasingly well relative to the market portfolio (Treynor ratio) and to an efficient portfolio (Jensen’s alpha) as τ increases.

• However, the Sharpe ratio is not monotonically increasing. On this grid of τ, it peaks at τ = 0.18. Although this is not necessarily the global maximum of the Sharpe ratio, it is likely close to it.

• Based on these performance metrics, our τ-optimized portfolios outperform the S&P 500 index and the 1/N portfolio by a considerable margin.

Now, we turn our attention to the testing set to see whether these τ-optimized portfolios, generated on the training set, remain consistently superior to the 1/N portfolio and the S&P 500 index.

Detailed code can be found in Section 2.1.4 In-sample Performance Evaluation in Chapter 2.

1.5 Out-of-Sample Performance Evaluation

1.5.1 Enumeration of Attributes

Again, we need to compute the mean return and volatility of the τ-optimized portfolios, the 1/N portfolio, and the S&P 500 index on our testing set, which ranges from March 10th, 2020, to November 2nd, 2020. This is done by the following code chunk.

```python
import numpy as np

test_mean_estimate = mean_estimate(df_ret_test)
test_mean_estimate_np = np.array(list(test_mean_estimate.values()))

test_cov_mat_estimate = cov_mat_estimate(df_ret_test)
test_cov_mat_estimate_np = test_cov_mat_estimate.to_numpy()

tau_mean_test = np.zeros(len(tau_array))
tau_vol_test = np.zeros(len(tau_array))

# weights stay fixed from the training set; only mu and Sigma come from the testing set
for index, tau in enumerate(tau_array):
    weight_opt = all_weights[index]
    tau_mean_test[index] = np.sum(test_mean_estimate_np * weight_opt)
    tau_vol_test[index] = np.sqrt(weight_opt @ test_cov_mat_estimate_np @ weight_opt)

naive_mean_test = np.sum(test_mean_estimate_np * naive_weight)
naive_vol_test = np.sqrt(naive_weight @ test_cov_mat_estimate_np @ naive_weight)

sp_mean_test = (np.mean(y_test['Return']) + 1) ** 252 - 1
sp_vol_test = np.std(y_test['Return']) * np.sqrt(252)
rf_mean_test = np.mean(rf_test['DGS10'] / 100)  # DGS10 is quoted in percent
rf_vol_test = np.std(rf_test['DGS10'] / 100)
```

As with the training set, we also compute the daily and cumulative returns of the τ-optimized portfolios, the 1/N portfolio, and the S&P 500 index.

1.5.2 Estimation of β in Testing Set

Using robust regression on the testing set’s market return and stock return data, we obtain the following result.

1.5.3 Performance Comparison

Cumulative Return

We plot the cumulative returns in dtale.

From the cumulative return chart (we picked τ ∈ {0, 0.05, 0.15, 0.19} for clearer graphing), our τ-optimized portfolios outperform the 1/N portfolio and the S&P 500 by huge margins in the early stage. For the majority of the time, our portfolios produce better cumulative returns than the 1/N portfolio and the S&P 500, at the cost of inconsistency and high volatility. By the very end of the testing set, however, the 1/N portfolio performs better than all of the τ-optimized portfolios and the S&P 500. Moreover, the 1/N portfolio consistently beats the market’s cumulative return over time. This is drastically different from what we observed in the training set.

1.5.4 Performance Metrics

Using code similar to that of the in-sample evaluation, we obtain the following mean return, volatility, Sharpe ratio, Treynor ratio, and Jensen’s alpha.

Consistent with the cumulative return graph, the 1/N portfolio significantly outperforms all τ-optimized portfolios on almost all performance metrics while also beating the market by a wide margin. We can visualize these metrics in the following grouped bar plot.


Not only does the 1/N portfolio achieve significantly better results across all metrics, but our τ-optimized portfolios also seem to have a hard time beating the market (the S&P 500 index) with a high Sharpe ratio.

This experiment suggests that the mean-variance optimization approach is inconsistent over time. The 1/N portfolio, although it may not be optimal over a given time frame, can deliver more consistent performance at some cost in efficiency.

Detailed code can be found in Section 2.1.5 Out-of-sample Performance Evaluation in Chapter 2.

1.6 Ad-hoc Analysis

We now discuss what happened to our τ-optimized portfolios and why their performance relative to the 1/N portfolio and the S&P 500 index differs so drastically between the two sets.

1.6.1 Estimation Methods

We made certain assumptions about the daily mean returns and the covariance matrix, such as how compounding is handled. Moreover, the risk-free rate was taken to be the annual average yield in the corresponding dataset because of mismatches between trade dates and the update dates of the 10-year US Treasury yield. There is also room to improve our β estimates, as shown in Chapter 3.

1.6.2 Overfitting and Weighting Problem

We may have encountered overfitting: the mean-variance optimization performs extraordinarily on the training set but becomes mediocre on the testing set. The S&P 500 index reflects changing market dynamics, whereas our τ-optimized portfolios are static once training is done. Although this does not explain the consistency of the 1/N portfolio, which is also static, it points toward performing mean-variance optimization on a more continuous basis, for example by periodically re-estimating and rebalancing. Moreover, looking back, we check the weight of each stock as τ ≥ 0 increases. Using a dtale object, we obtain


```python
import dtale
import pandas as pd

# one row per tau value, one column per stock
weights_df = pd.DataFrame(data=all_weights)
weights_df.columns = ticker_list
weights_df.index = tau_array
weights_df_dtale = dtale.show(weights_df, ignore_duplicate=True)
weights_df_dtale
```

Since $w_z$ does not depend on τ, the direction of our weights becomes insensitive to τ beyond a certain level: for large τ, $w_{\text{opt}} = \tau w_z + w_m$ is dominated by $\tau w_z$, so the sign of each weight is fixed by the sign of the corresponding entry of $w_z$. This can lead to a stubborn portfolio that insists on shorting or going long a certain stock regardless of market conditions. In our case, Amazon.com (AMZN) is shorted consistently for most values of τ, which is counterintuitive since AMZN was one of the top technology stocks in 2020.
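This sign lock-in can be made concrete. Each weight $\tau w_{z,i} + w_{m,i}$ is linear in τ, so when $w_{z,i}$ and $w_{m,i}$ have opposite signs it crosses zero once, at $\tau = -w_{m,i}/w_{z,i}$, and keeps the sign of $w_{z,i}$ afterwards. The helper below (a hypothetical name) computes that threshold per stock; the vectors are illustrative, not our estimated weights.

```python
import numpy as np

def sign_switch_tau(w_z, w_m):
    """Per asset, the tau >= 0 beyond which tau * w_z + w_m has the sign of w_z.
    Returns 0 where w_m already has that sign (or where w_z is 0)."""
    w_z = np.asarray(w_z, dtype=float)
    w_m = np.asarray(w_m, dtype=float)
    out = np.zeros_like(w_z)
    flips = w_z * w_m < 0            # opposite signs: the weight crosses zero once
    out[flips] = -w_m[flips] / w_z[flips]
    return out

# Illustrative vectors: w_z sums to 0, w_m sums to 1
w_z_demo = np.array([0.8, -0.5, -0.3])
w_m_demo = np.array([0.2, 0.3, 0.5])
taus = sign_switch_tau(w_z_demo, w_m_demo)
print(taus)  # asset 2 turns short for tau > 0.6, asset 3 for tau > 5/3
```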

1.6.3 Real-World Issues

The timing of this project, especially the date splitting the training and testing datasets, is very tricky. It falls amid the March 2020 sell-off, with multiple market crashes due to the COVID-19 pandemic. Such a historic event may make this period an outlier in stock trading history, and the mean-variance model fitted on it is affected accordingly. As the portfolio composition shows, one reason our τ-optimized portfolios did well on the training set is that they short industrial, manufacturing, and overvalued technology stocks while going long on data centre companies such as EQIX. With a swift "V-shaped" recovery, at least in the stock market, such an aggressive two-way strategy suffers badly. Moreover, an unusual boost in the gold price driven by pandemic uncertainty makes the τ-optimized portfolios look even worse, since they short GOLD.

1.7 Discussion

This is indeed an intriguing topic: the sophistication of Markowitz portfolio theory does not deliver more consistent or outstanding performance than a seemingly naive 1/N portfolio. The main trade-off seems to be efficiency versus consistency of performance.

2. Detailed Codebook

All coding was done in Python 3.7.3 with Jupyter Notebook.

2.1 Markowitz’s Portfolio Optimization

2.1.1 Data Collection

```python
## Import necessary libraries
import pandas_datareader as pdr
import yfinance as yf

yf.pdr_override()

from datetime import datetime
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import dtale
```

```python
## Get 10 stock data
ticker_list = ['AAPL', 'MSFT', 'GOLD', 'GM', 'GILD', 'AMZN', 'TSLA',
               'ENB', 'EQIX', 'FSLR']
price_data = list()

start_date = '2018-11-02'
end_date = '2020-11-02'

for index, ticker in enumerate(ticker_list):
    prices = pdr.get_data_yahoo(ticker, start=start_date, end=end_date)
    price_data.append(prices.assign(ticker=ticker)[['Adj Close']])

df_stocks = pd.concat(price_data, axis=1)
df_stocks.columns = ticker_list
df_stocks.head()
```

| Date | AAPL | MSFT | GOLD | GM | GILD | AMZN | TSLA | ENB | EQIX | FSLR |
|---|---|---|---|---|---|---|---|---|---|---|
| 2018-11-02 | 50.078476 | 103.341148 | 12.807792 | 33.803833 | 64.442024 | 1665.530029 | 69.281998 | 27.326300 | 378.091675 | 43.000000 |
| 2018-11-05 | 48.656826 | 104.655296 | 12.846692 | 34.010242 | 64.571953 | 1627.800049 | 68.279999 | 28.109039 | 381.974487 | 43.930000 |
| 2018-11-06 | 49.183006 | 104.859718 | 12.778618 | 34.207264 | 65.286530 | 1642.810059 | 68.211998 | 28.804808 | 381.569824 | 42.830002 |
| 2018-11-07 | 50.674641 | 108.987129 | 12.623017 | 34.601311 | 67.263222 | 1755.489990 | 69.632004 | 28.848291 | 385.385101 | 44.169998 |
| 2018-11-08 | 50.497833 | 108.782707 | 12.739717 | 34.310467 | 66.706398 | 1754.910034 | 70.279999 | 28.613470 | 375.249512 | 42.669998 |

```python
## Get S&P 500 data
sp500 = pdr.get_data_yahoo('^GSPC', start=start_date, end=end_date)[['Adj Close']]
sp500['Return'] = np.log(sp500['Adj Close']) - np.log(sp500['Adj Close'].shift(1))
sp500.head()
```

| Date | Adj Close | Return |
|---|---|---|
| 2018-11-02 | 2723.060059 | NaN |
| 2018-11-05 | 2738.310059 | 0.005585 |
| 2018-11-06 | 2755.449951 | 0.006240 |
| 2018-11-07 | 2813.889893 | 0.020987 |
| 2018-11-08 | 2806.830078 | -0.002512 |

```python
## Perform train/test dataset split
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(df_stocks, sp500,
                                                    test_size=0.33, shuffle=False)
```

[30]: ## Trim risk-free rate to have the same trade dates as the stock info
rf_train = pdr.DataReader('DGS10', 'fred', start_date, '2020-03-09')
rf_test = pdr.DataReader('DGS10', 'fred', '2020-03-10', end_date)

rf_train = rf_train[rf_train.index.isin(X_train.index)].ffill()
rf_test = rf_test[rf_test.index.isin(X_test.index)].ffill()
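The filter-then-forward-fill step above keeps only FRED dates that are also trading dates. A trading day with no FRED row at all would therefore be missing from the result. One alternative is to reindex the rate series onto the stock calendar before forward-filling, which guarantees one rate per trading day. The sketch uses made-up dates and yields, not real DGS10 data:

```python
import numpy as np
import pandas as pd

# Made-up trading calendar and FRED-style yield series in percent (not real DGS10 values)
trading_days = pd.DatetimeIndex(["2020-01-06", "2020-01-07", "2020-01-08", "2020-01-09"])
dgs10 = pd.DataFrame(
    {"DGS10": [1.50, 1.40, np.nan]},   # NaN mimics a date FRED lists without a value
    index=pd.DatetimeIndex(["2020-01-06", "2020-01-07", "2020-01-08"]),
)

# Every trading day gets a row (2020-01-09 is added as NaN), then the last yield is carried forward
rf_aligned = dgs10.reindex(trading_days).ffill()
print(rf_aligned["DGS10"].tolist())   # [1.5, 1.4, 1.4, 1.4]
```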

2.1.2 Estimation

Training Set Estimation

To account for compounding, we consider the daily log return defined by

$$R_i = \log\left(\frac{P_i}{P_{i-1}}\right)$$

where $P_i$ is the adjusted closing price on trading day $i$.
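Log returns handle compounding well because they add up over time. The sum of the daily log returns over a period equals the log of the total gross return, so one exponential recovers the cumulative growth. Here is a small check on a made-up price path:

```python
import numpy as np
import pandas as pd

# Made-up adjusted closing prices (illustrative, not data from the report)
prices = pd.Series([100.0, 102.0, 99.0, 104.0, 103.5])

# Daily log return R_i = log(P_i / P_{i-1}), written the same way as in the notebook cells
log_ret = np.log(prices) - np.log(prices.shift(1))

total_log_return = log_ret.sum()                  # pandas skips the leading NaN
gross_from_logs = np.exp(total_log_return)        # compounding recovered by one exponential
gross_direct = prices.iloc[-1] / prices.iloc[0]   # 103.5 / 100

print(round(gross_from_logs, 6), round(gross_direct, 6))   # 1.035 1.035
```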

[31]: ## Compute daily log return of each stock
return_data = list()

for index, ticker in enumerate(ticker_list):
    log_ret = np.log(X_train[ticker]) - np.log(X_train[ticker].shift(1))
    return_data.append(log_ret)

df_ret = pd.concat(return_data, axis=1)
df_ret.columns = ticker_list
df_ret.tail()

[31]:
                AAPL      MSFT      GOLD        GM      GILD      AMZN  \
Date
2020-03-03 -0.032274 -0.049106  0.037554 -0.029062 -0.015908 -0.023279
2020-03-04  0.045341  0.036057 -0.003401  0.032557  0.023966  0.034414
2020-03-05 -0.032975 -0.025415  0.029252 -0.034289  0.001577 -0.026567
2020-03-06 -0.013369 -0.028674  0.003303 -0.047977  0.052331 -0.011995
2020-03-09 -0.082395 -0.070179 -0.063189 -0.150150 -0.087216 -0.054302

                TSLA       ENB      EQIX      FSLR
Date
2020-03-03  0.002538 -0.009733 -0.006029 -0.016600
2020-03-04  0.005338  0.026091  0.048127  0.028322
2020-03-05 -0.033869 -0.008795 -0.045676 -0.000220
2020-03-06 -0.029498 -0.012286 -0.013058 -0.047285
2020-03-09 -0.145865 -0.196413 -0.058542 -0.102367

Under Markowitz's portfolio theory, we treat µ as the (annualized) expected return vector and the covariance matrix Σ as the measure of risk. The true values of µ and Σ are unknown, so we use the estimates µ̂ and Σ̂. For µ̂ we use the sample mean (annualized):

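$$\hat{\mu}_j = \frac{1}{n}\sum_{i=1}^{n} R_{i,j}, \quad \forall j \in \{1, \cdots, 10\} \implies \hat{\mu} = \begin{pmatrix} \hat{\mu}_1 \\ \hat{\mu}_2 \\ \vdots \\ \hat{\mu}_{10} \end{pmatrix}$$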


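For Σ̂ we use the sample covariance matrix (annualized):

$$\hat{\Sigma} = \begin{pmatrix}
\mathrm{Var}(\{R_{i,1}\}_{i=1}^{n}) & \mathrm{Cov}(\{R_{i,1}\}_{i=1}^{n}, \{R_{i,2}\}_{i=1}^{n}) & \cdots & \mathrm{Cov}(\{R_{i,1}\}_{i=1}^{n}, \{R_{i,10}\}_{i=1}^{n}) \\
\mathrm{Cov}(\{R_{i,2}\}_{i=1}^{n}, \{R_{i,1}\}_{i=1}^{n}) & \mathrm{Var}(\{R_{i,2}\}_{i=1}^{n}) & \cdots & \mathrm{Cov}(\{R_{i,2}\}_{i=1}^{n}, \{R_{i,10}\}_{i=1}^{n}) \\
\vdots & \vdots & \ddots & \vdots \\
\mathrm{Cov}(\{R_{i,10}\}_{i=1}^{n}, \{R_{i,1}\}_{i=1}^{n}) & \mathrm{Cov}(\{R_{i,10}\}_{i=1}^{n}, \{R_{i,2}\}_{i=1}^{n}) & \cdots & \mathrm{Var}(\{R_{i,10}\}_{i=1}^{n})
\end{pmatrix}$$

Here $R_{i,j}$ is the log return of stock $j$ on day $i$.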
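In matrix form, with $\bar R$ the vector of column means, $\hat\Sigma = \frac{1}{n-1}\sum_{i}(R_i - \bar R)(R_i - \bar R)^{\top}$. This is what pandas' `DataFrame.cov()` computes (divisor n − 1). The sketch below uses simulated returns, not the report's data. It checks that equivalence, then annualizes the same way as the `mean_estimate` and `cov_mat_estimate` cells: the mean daily return is compounded over 252 trading days and the covariance is multiplied by 252.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n, k = 336, 10   # 336 daily observations of 10 stocks, as in the training set
R = pd.DataFrame(rng.normal(0.0005, 0.02, size=(n, k)),
                 columns=[f"S{j}" for j in range(1, k + 1)])   # simulated daily log returns

values = R.to_numpy()
mu_daily = values.mean(axis=0)                  # mu_hat_j = (1/n) * sum_i R_{i,j}
centered = values - mu_daily
sigma_daily = centered.T @ centered / (n - 1)   # sample covariance with divisor n - 1

# Annualize with the same conventions as mean_estimate and cov_mat_estimate
mu_annual = (mu_daily + 1) ** 252 - 1
sigma_annual = sigma_daily * 252

print(np.allclose(sigma_daily, R.cov().to_numpy()))   # True: matches DataFrame.cov()
```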

[32]: ## annualized mean return estimate
def mean_estimate(df):
    # Compound each column's mean daily return over 252 trading days
    mean_dict = dict()
    for col_name in df.columns:
        mean_dict[col_name] = (np.mean(df[col_name]) + 1) ** 252 - 1
    return mean_dict

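[33]: ## annualized sample covariance
def cov_mat_estimate(df):
    # pandas uses the n - 1 divisor and excludes NaN values pairwise;
    # scale the daily covariance by 252 trading days
    return df.cov() * 252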

[34]: train_mean_estimate = mean_estimate(df_ret)
train_mean_estimate_np = np.array(list(train_mean_estimate.values()))

[35]: train_cov_mat_estimate = cov_mat_estimate(df_ret)
train_cov_mat_estimate_np = train_cov_mat_estimate.to_numpy()

[36]: ## Sample mean and standard deviation of S&P500 index (annualized)
sp_mean = (np.mean(y_train['Return']) + 1) ** 252 - 1
sp_vol = np.std(y_train['Return']) * np.sqrt(252)

[37]: ## Sample mean and standard deviation of the risk-free rate (annualized)
rf_mean = np.mean(rf_train['DGS10'] / 100)
rf_vol = np.std(rf_train['DGS10'] / 100)

2.1.3 Construction of Portfolios

Following Markowitz's theory of mean-variance portfolio optimization, we use the following risk-return trade-off formulation:

$$\max_{w} \; \tau\, w^{\top}\mu - \frac{1}{2}\, w^{\top}\Sigma w \quad \text{subject to} \quad e^{\top}w = 1$$

Here τ is the risk-return trade-off parameter and e is the vector of ones. We test a range of τ values and compare the performance of the resulting optimized portfolios.
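The closed-form weights coded in `w_z` and `w_m` follow from the first-order conditions of this problem. A short derivation, assuming Σ is symmetric positive definite (so the objective is strictly concave and Σ⁻¹ exists):

$$
\begin{aligned}
\mathcal{L}(w,\lambda) &= \tau\, w^{\top}\mu - \tfrac{1}{2}\, w^{\top}\Sigma w - \lambda\,(e^{\top}w - 1) \\
\nabla_w \mathcal{L} = 0 &\implies \tau\mu - \Sigma w - \lambda e = 0 \implies w = \Sigma^{-1}(\tau\mu - \lambda e) \\
e^{\top}w = 1 &\implies \lambda = \frac{\tau\, e^{\top}\Sigma^{-1}\mu - 1}{e^{\top}\Sigma^{-1}e} \\
w^{*}(\tau) &= \underbrace{\frac{\Sigma^{-1}e}{e^{\top}\Sigma^{-1}e}}_{w_m}
 + \tau\, \underbrace{\left(\Sigma^{-1}\mu - \frac{e^{\top}\Sigma^{-1}\mu}{e^{\top}\Sigma^{-1}e}\,\Sigma^{-1}e\right)}_{w_z}
\end{aligned}
$$

Since $e^{\top}w_m = 1$ and $e^{\top}w_z = 0$, every $w^{*}(\tau)$ is fully invested, and τ = 0 recovers the min-risk portfolio.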

[38]: ## Self-financing portfolio (weights sum to 0)
def w_z(mu, sigma):
    e = np.ones(len(mu))
    sigma_inv = np.linalg.inv(sigma)
    factor = (np.dot(e.T, np.dot(sigma_inv, mu))) / (np.dot(e.T, np.dot(sigma_inv, e)))
    return np.dot(sigma_inv, mu) - factor * np.dot(sigma_inv, e)


## Min-risk portfolio (weights sum to 1)
def w_m(mu, sigma):
    e = np.ones(len(mu))
    sigma_inv = np.linalg.inv(sigma)
    return (np.dot(sigma_inv, e)) / (np.dot(e.T, np.dot(sigma_inv, e)))
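Here is a quick numerical check of these two functions on a made-up three-asset example (the numbers are illustrative, not estimates from the report). The budget constraint e⊤w = 1 should hold and `w_z` should sum to zero. At the optimum, the gradient τμ − Σw must also be a constant vector λe, which is the first-order condition of the constrained problem. The block restates `w_z` and `w_m`, sizing the ones vector with `len(mu)`, so it runs on its own:

```python
import numpy as np


def w_z(mu, sigma):
    # Self-financing direction: Sigma^{-1} mu minus a multiple of Sigma^{-1} e
    e = np.ones(len(mu))
    sigma_inv = np.linalg.inv(sigma)
    factor = (e @ sigma_inv @ mu) / (e @ sigma_inv @ e)
    return sigma_inv @ mu - factor * (sigma_inv @ e)


def w_m(mu, sigma):
    # Minimum-variance, fully invested portfolio
    e = np.ones(len(mu))
    sigma_inv = np.linalg.inv(sigma)
    return (sigma_inv @ e) / (e @ sigma_inv @ e)


mu = np.array([0.10, 0.15, 0.05])              # made-up annualized means
sigma = np.array([[0.040, 0.010, 0.004],       # made-up positive definite covariance
                  [0.010, 0.090, 0.006],
                  [0.004, 0.006, 0.020]])
tau = 0.1

w = w_m(mu, sigma) + tau * w_z(mu, sigma)
grad = tau * mu - sigma @ w   # should equal lambda * e for a single scalar lambda

print(np.isclose(w.sum(), 1.0), np.isclose(w_z(mu, sigma).sum(), 0.0))   # True True
print(np.allclose(grad, grad[0]))                                        # True
```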

Grid Search for Optimal τ

[39]: ## Initialize a line vector of tau
tau_array = np.arange(0, 0.2, 0.01)

[40]: stocks_num = 10
tau_mean = np.zeros(len(tau_array))
tau_vol = np.zeros(len(tau_array))
all_weights = list()

for index, tau in enumerate(tau_array):
    weight_opt = tau * w_z(train_mean_estimate_np, train_cov_mat_estimate_np) + \
        w_m(train_mean_estimate_np, train_cov_mat_estimate_np)
    tau_mean[index] = np.sum((train_mean_estimate_np * weight_opt))
    tau_vol[index] = np.sqrt(np.dot(weight_opt.T, np.dot(train_cov_mat_estimate_np, weight_opt)))
    all_weights.append(weight_opt)
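The mean and variance separate cleanly along this grid because $w_m^{\top}\Sigma w_z = e^{\top}w_z / (e^{\top}\Sigma^{-1}e) = 0$. The expected return grows linearly in τ, $\mu_p(\tau) = w_m^{\top}\mu + \tau\, w_z^{\top}\mu$, while the variance grows quadratically, $\sigma_p^2(\tau) = w_m^{\top}\Sigma w_m + \tau^2\, w_z^{\top}\Sigma w_z$. This is why the "Mean Return" column of the in-sample metrics table rises in equal steps of 0.064726. The sketch below checks the identities on made-up inputs:

```python
import numpy as np

# Made-up inputs (illustrative only)
mu = np.array([0.10, 0.15, 0.05])
sigma = np.array([[0.040, 0.010, 0.004],
                  [0.010, 0.090, 0.006],
                  [0.004, 0.006, 0.020]])
e = np.ones(len(mu))
sigma_inv = np.linalg.inv(sigma)

wm = sigma_inv @ e / (e @ sigma_inv @ e)                                              # min-risk
wz = sigma_inv @ mu - (e @ sigma_inv @ mu) / (e @ sigma_inv @ e) * (sigma_inv @ e)    # self-financing

taus = np.arange(0, 0.2, 0.01)
port_mean = np.array([(wm + t * wz) @ mu for t in taus])
port_var = np.array([(wm + t * wz) @ sigma @ (wm + t * wz) for t in taus])

# Closed-form path: linear mean, quadratic variance (the cross term vanishes)
mean_formula = wm @ mu + taus * (wz @ mu)
var_formula = wm @ sigma @ wm + taus**2 * (wz @ sigma @ wz)

print(np.isclose(wm @ sigma @ wz, 0.0))                                              # True
print(np.allclose(port_mean, mean_formula), np.allclose(port_var, var_formula))     # True True
```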

1/N Portfolio

[41]: ## naive portfolio
naive_weight = np.repeat(1 / stocks_num, stocks_num)
naive_mean = np.sum((train_mean_estimate_np * naive_weight))
naive_vol = np.sqrt(np.dot(naive_weight.T, np.dot(train_cov_mat_estimate_np, naive_weight)))

2.1.4 In-sample Performance Evaluation

[42]: ## Generate random portfolios to map out the region of possible portfolios
np.random.seed(42)
num_ports = 50000
ret_arr = np.zeros(num_ports)
vol_arr = np.zeros(num_ports)
sharpe_arr = np.zeros(num_ports)

for x in range(num_ports):
    # Weights
    weights = np.array(np.random.random(stocks_num) * 2 - 0.15)
    weights = weights / np.sum(weights)
    # Expected return
    ret_arr[x] = np.sum((train_mean_estimate_np * weights))
    # Expected volatility
    vol_arr[x] = np.sqrt(np.dot(weights.T, np.dot(train_cov_mat_estimate_np, weights)))
    # Sharpe Ratio
    sharpe_arr[x] = (ret_arr[x] - rf_mean) / vol_arr[x]
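The 50,000-iteration loop can also be written with array operations. The sketch assumes made-up inputs `mu`, `cov` and `rf` in place of `train_mean_estimate_np`, `train_cov_mat_estimate_np` and `rf_mean`. It draws every weight vector at once with the same `* 2 - 0.15` scheme and normalizes each row to sum to one. It then computes all returns, volatilities and Sharpe ratios without a Python loop. It uses NumPy's `default_rng`, so its draws differ from those produced after `np.random.seed(42)`:

```python
import numpy as np

rng = np.random.default_rng(42)
k, num_ports = 10, 50_000
mu = np.linspace(0.02, 0.20, k)             # made-up annualized mean returns
cov = np.diag(np.linspace(0.01, 0.09, k))   # made-up annualized covariance (uncorrelated assets)
rf = 0.02                                   # made-up risk-free rate

# One row per random portfolio: U[0,1) * 2 - 0.15, then each row scaled to sum to 1
W = rng.random((num_ports, k)) * 2 - 0.15
W = W / W.sum(axis=1, keepdims=True)

ret_arr = W @ mu                                         # expected return of every portfolio
vol_arr = np.sqrt(np.einsum("ij,jk,ik->i", W, cov, W))   # sqrt(w' Sigma w) for every row
sharpe_arr = (ret_arr - rf) / vol_arr                    # excess return per unit of volatility
```

The vectorized version computes the same quantities as the loop; only the random stream differs.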

[43]: ## Plot of random portfolios
plt.figure(figsize=(12, 8))
plt.scatter(vol_arr, ret_arr, c=sharpe_arr, cmap='viridis', alpha=0.3)
plt.colorbar(label='Sharpe Ratio')
plt.xlabel('Volatility')
plt.ylabel('Return')
# plt.scatter(max_sr_vol, max_sr_ret, c='red', s=50)  # red dot
plt.title('Randomly Generated Portfolios')
plt.show()

[44]: ## define functions to get the empirical efficient frontier
from scipy.optimize import minimize

def get_ret_vol_sr(weights):
    weights = np.array(weights)
    ret = np.sum((train_mean_estimate_np * weights))
    vol = np.sqrt(np.dot(weights.T, np.dot(train_cov_mat_estimate_np, weights)))
    sr = ret / vol   # return-to-volatility ratio (no risk-free adjustment)
    return np.array([ret, vol, sr])

def neg_sharpe(weights):
    # index 2 is the Sharpe ratio in the output of get_ret_vol_sr
    return get_ret_vol_sr(weights)[2] * -1

def check_sum(weights):
    # returns 0 if the weights sum to 1
    return np.sum(weights) - 1

[45]: frontier_y = np.arange(-0.1, 0.35, 0.001)

def minimize_volatility(weights):
    return get_ret_vol_sr(weights)[1]

[46]: frontier_x = []
init_guess = list(np.repeat(1 / stocks_num, stocks_num))
bounds = [(0, 1)] * stocks_num   # long-only: each weight between 0 and 1

[47]: for possible_return in frontier_y:
    cons = ({'type': 'eq', 'fun': check_sum},
            {'type': 'eq', 'fun': lambda w: get_ret_vol_sr(w)[0] - possible_return})
    result = minimize(minimize_volatility, init_guess, method='SLSQP',
                      bounds=bounds, constraints=cons)
    frontier_x.append(result['fun'])

[51]: ## plot everything and highlight portfolios
plt.figure(figsize=(12, 8))
plt.scatter(vol_arr, ret_arr, c=sharpe_arr, cmap='viridis', alpha=0.3,
            label="Randomly Generated Portfolios")
plt.colorbar(label='Sharpe Ratio')
plt.plot(tau_vol, tau_mean, 'r--', linewidth=3, label="Tau-Theoretical Efficient Frontier")
plt.scatter(tau_vol[1:], tau_mean[1:], c=tau_array[1:], cmap='inferno',
            marker="D", label="Tau Optimized Portfolios")
plt.colorbar(label='tau value')
plt.scatter(sp_vol, sp_mean, c='black', marker="X", label="S&P 500 Index")
plt.scatter(tau_vol[0], tau_mean[0], c='orange', marker="P", label="Min-Risk Portfolio")
plt.scatter(naive_vol, naive_mean, c='red', marker="p", label="1/N Naive Portfolio")
plt.xlabel('Annualized Volatility')
plt.ylabel('Annualized Return')
plt.plot(frontier_x, frontier_y, '--', linewidth=2, label="Empirical Efficient Frontier")

plt.legend()
plt.title("Portfolios Comparison")
# save after adding the legend and title so that both appear in the saved file
plt.savefig('EfficientFrontier.png')
plt.show()

Estimation of β

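[53]: ## Regression models for beta
import statsmodels.api as sm

# df_x: market data with a 'Return' column (S&P 500); df_y: stock prices; r_f: DGS10 yields in percent.
# DGS10 / 100 is an annualized rate and is subtracted from the daily log returns as is.
def ols_beta(ticker, df_x, df_y, r_f):
    ret_raw_stock = np.log(df_y[ticker]) - np.log(df_y[ticker].shift(1))
    ret_raw_stock = ret_raw_stock.iloc[1:] - r_f['DGS10'].iloc[1:] / 100          # stock excess return
    market_ret = np.array(df_x['Return'].iloc[1:] - r_f['DGS10'].iloc[1:] / 100)  # market excess return
    X = sm.add_constant(market_ret)
    Y = ret_raw_stock
    OLS = sm.OLS(Y, X)
    results = OLS.fit()
    print(results.summary())
    return results.params.iloc[1]   # slope on the market excess return = beta

def huber_beta(ticker, df_x, df_y, r_f):
    ret_raw_stock = np.log(df_y[ticker]) - np.log(df_y[ticker].shift(1))
    ret_raw_stock = ret_raw_stock.iloc[1:] - r_f['DGS10'].iloc[1:] / 100
    market_ret = np.array(df_x['Return'].iloc[1:] - r_f['DGS10'].iloc[1:] / 100)
    X = sm.add_constant(market_ret)
    Y = ret_raw_stock
    # Robust linear model with Huber's T norm (tuning constant 1.345)
    huber = sm.RLM(Y, X, M=sm.robust.norms.HuberT(1.345))
    results = huber.fit()
    print(results.summary())
    return results.params.iloc[1]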
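`sm.RLM` with `HuberT(1.345)` fits the Huber M-estimator by iteratively reweighted least squares (IRLS). Residuals more than 1.345 robust standard units from the fit get weight c/|u| instead of 1, so a few extreme days move the slope far less than they do under OLS. Below is a simplified NumPy sketch of that idea on simulated data with injected outliers. It is not statsmodels' implementation: the MAD scale update and the fixed iteration count are assumptions, and `huber_irls` is a hypothetical helper.

```python
import numpy as np


def huber_irls(x, y, c=1.345, n_iter=50):
    """Simplified Huber regression of y on [1, x] via IRLS (illustrative, not statsmodels)."""
    X = np.column_stack([np.ones_like(x), x])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]             # start from the OLS fit
    for _ in range(n_iter):
        r = y - X @ beta
        s = np.median(np.abs(r - np.median(r))) / 0.6745    # MAD-based scale estimate
        u = np.abs(r / s)
        w = c / np.maximum(u, c)                            # Huber weights: 1 if u <= c, else c / u
        sw = np.sqrt(w)
        beta = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
    return beta   # [intercept, slope]


rng = np.random.default_rng(1)
x = rng.normal(0.0, 0.01, 300)                        # simulated excess market returns
y = 0.001 + 1.2 * x + rng.normal(0.0, 0.005, 300)     # true beta = 1.2
y[np.argsort(x)[-10:]] += 0.2                         # outliers on the 10 largest market days

X = np.column_stack([np.ones_like(x), x])
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
beta_huber = huber_irls(x, y)
print(beta_ols[1], beta_huber[1])   # the Huber slope stays much closer to 1.2
```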

[54]: # Perform regressions
beta_dict = dict()

for ticker in ticker_list:
    beta_dict[ticker] = np.array([ols_beta(ticker, y_train, X_train, rf_train),
                                  huber_beta(ticker, y_train, X_train, rf_train)])

Coefficient estimates printed by ols_beta (OLS) and huber_beta (RLM) for each ticker on the training set (Fri, 13 Nov 2020). Each fit uses 336 observations with 334 residual degrees of freedom. x1 is the slope on the market excess return, i.e. β. The RLM fits use IRLS with the HuberT norm, MAD scale estimate and H1 covariance. The OLS condition number is 83.5 throughout.

Ticker  Model  Term      coef  std err    t / z  P>|t|     [0.025    0.975]
AAPL    OLS    const   0.0108    0.001    8.676  0.000      0.008     0.013
AAPL    OLS    x1      1.4682    0.051   28.914  0.000      1.368     1.568
AAPL    RLM    const   0.0104    0.001    9.652  0.000      0.008     0.012
AAPL    RLM    x1      1.4519    0.044   33.258  0.000      1.366     1.537
MSFT    OLS    const   0.0061    0.001    6.404  0.000      0.004     0.008
MSFT    OLS    x1      1.2314    0.038   32.055  0.000      1.156         …
MSFT    RLM    const   0.0065    0.001    8.114  0.000      0.005     0.008
MSFT    RLM    x1      1.2613    0.033   38.756  0.000      1.198     1.325
GOLD    OLS    const  -0.0162    0.002   -6.685  0.000     -0.021    -0.011
GOLD    OLS    x1      0.1875    0.098    1.905  0.058     -0.006         …
GOLD    RLM    const  -0.0163    0.002   -6.891  0.000     -0.021    -0.012
GOLD    RLM    x1      0.1854    0.096    1.934  0.053     -0.002     0.373
GM      OLS    const   0.0012    0.001    0.858  0.392     -0.002     0.004
GM      OLS    x1      1.1028    0.059   18.652  0.000      0.986     1.219
GM      RLM    const  -0.0005    0.001   -0.373  0.709     -0.003     0.002
GM      RLM    x1      1.0206    0.051   20.074  0.000      0.921     1.120
GILD    OLS    const  -0.0027    0.002   -1.711  0.088     -0.006     0.000
GILD    OLS    x1      0.8596    0.065   13.223  0.000      0.732         …
GILD    RLM    const  -0.0025    0.001   -2.032  0.042     -0.005  -8.93e-05
GILD    RLM    x1      0.8953    0.050   17.839  0.000      0.797     0.994
AMZN    OLS    const   0.0048    0.001    3.378  0.001      0.002     0.008
AMZN    OLS    x1      1.2133    0.058   21.052  0.000      1.100         …
AMZN    RLM    const   0.0037    0.001    3.371  0.001      0.002     0.006
AMZN    RLM    x1      1.1873    0.044   26.860  0.000      1.101     1.274
TSLA    OLS    const   0.0134    0.004    3.538  0.000      0.006     0.021
TSLA    OLS    x1      1.5463    0.154   10.054  0.000      1.244         …
TSLA    RLM    const   0.0133    0.003    4.714  0.000      0.008     0.019
TSLA    RLM    x1      1.5208    0.115   13.231  0.000      1.296     1.746
ENB     OLS    const  -0.0034    0.001   -2.398  0.017     -0.006    -0.001
ENB     OLS    x1      0.8307    0.058   14.261  0.000      0.716         …
ENB     RLM    const  -0.0061    0.001   -6.167  0.000     -0.008    -0.004
ENB     RLM    x1      0.6596    0.040   16.460  0.000      0.581     0.738
EQIX    OLS    const  -0.0039    0.001   -2.689  0.008     -0.007    -0.001
EQIX    OLS    x1      0.7632    0.059   12.968  0.000      0.647     0.879
EQIX    RLM    const  -0.0044    0.001   -3.342  0.001     -0.007    -0.002
EQIX    RLM    x1      0.7430    0.053   13.888  0.000      0.638     0.848
FSLR    OLS    const   0.0009    0.002    0.372  0.710     -0.004     0.006
FSLR    OLS    x1      1.0558    0.098   10.808  0.000      0.864         …
FSLR    RLM    const   0.0041    0.002    2.158  0.031      0.000     0.008
FSLR    RLM    x1      1.1751    0.078   15.120  0.000      1.023     1.327

Other OLS statistics:
AAPL  R-squared 0.715, Adj. R-squared 0.714, F-statistic 836.0 (Prob 6.15e-93), Log-Likelihood 1035.0, AIC -2066, BIC -2058, Omnibus 53.086 (Prob 0.000), Durbin-Watson 1.963, Jarque-Bera 553.979 (Prob 5.07e-121), Skew 0.044, Kurtosis 9.290
MSFT  Skew 0.195
GOLD  F-statistic 3.629 (Prob 0.0577), Skew -0.132, Prob(JB) 0.0348, Kurtosis 3.640
GM    Omnibus 39.811, Skew 0.092
GILD  F-statistic 174.8 (Prob 2.17e-32), Log-Likelihood 951.95, AIC -1900, BIC -1892, Skew 1.123
AMZN  Skew 1.177
TSLA  Skew -0.554
ENB   Skew -3.500, Prob(JB) 0.00, Kurtosis 34.000
EQIX  Omnibus 40.319, Skew 0.470
FSLR  F-statistic 116.8, AIC -1626, BIC -1619, Skew -1.227

OLS standard errors assume that the covariance matrix of the errors is correctly specified.


[55]: beta_df = pd.DataFrame.from_dict(beta_dict)
beta_df = beta_df.rename(index={0: 'OLS Beta', 1: 'Robust Beta'})

[56]: beta_df

[56]:
                 AAPL      MSFT      GOLD        GM      GILD      AMZN  \
OLS Beta     1.468194  1.231412  0.187470  1.102756  0.859581  1.213319
Robust Beta  1.451862  1.261327  0.185402  1.020601  0.895288  1.187281

                 TSLA       ENB      EQIX      FSLR
OLS Beta     1.546329  0.830657  0.763201  1.055762
Robust Beta  1.520798  0.659591  0.743018  1.175148

We proceed with the robust beta estimates. See the report for details.

[57]: beta_df.T[['Robust Beta']]

[57]:
      Robust Beta
AAPL     1.451862
MSFT     1.261327
GOLD     0.185402
GM       1.020601
GILD     0.895288
AMZN     1.187281
TSLA     1.520798
ENB      0.659591
EQIX     0.743018
FSLR     1.175148
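β is linear in the weights, so the β of any portfolio built from these stocks is the weighted sum of the stock betas, β_p = Σ_j w_j β_j. This is how the portfolio betas in the metrics table are obtained. For the 1/N portfolio it is simply the average of the ten robust betas listed above:

```python
import numpy as np

# Robust (Huber) training-set betas from the output above
robust_beta = {
    "AAPL": 1.451862, "MSFT": 1.261327, "GOLD": 0.185402, "GM": 1.020601,
    "GILD": 0.895288, "AMZN": 1.187281, "TSLA": 1.520798, "ENB": 0.659591,
    "EQIX": 0.743018, "FSLR": 1.175148,
}
betas = np.array(list(robust_beta.values()))
naive_weight = np.repeat(1 / len(betas), len(betas))

naive_beta = naive_weight @ betas   # beta_p = sum_j w_j * beta_j
print(round(naive_beta, 6))         # 1.010032
```

The result agrees with the "Estimated Beta" of the 1/N row in the metrics table.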

Evaluation Metrics

[58]: ### Daily Return Calculation
from collections import OrderedDict

tau_opt_port_daily_ret_dict = OrderedDict()

for index, tau in enumerate(tau_array):
    weights = pd.Series(all_weights[index], index=ticker_list)
    tau_opt_port_daily_ret_dict['tau=' + str(tau)] = (df_ret * weights).sum(1)

tau_opt_port_daily_ret_dict['1/N'] = (df_ret * naive_weight).sum(1)
tau_opt_port_daily_ret_dict['SPX'] = y_train[['Return']].sum(1)

tau_opt_port_daily_ret_df = pd.DataFrame.from_dict(tau_opt_port_daily_ret_dict)
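Note that `(df_ret * weights).sum(1)` weights the stocks' log returns. For a portfolio holding weights w at the start of the day, the exact one-day log return is log(Σ_j w_j e^{R_j}). The weighted sum is therefore a first-order approximation, close for typical daily moves. Here is a small check with made-up one-day returns:

```python
import numpy as np

w = np.array([0.5, 0.3, 0.2])          # made-up portfolio weights (sum to 1)
r = np.array([0.010, -0.020, 0.015])   # made-up one-day log returns of three stocks

approx = w @ r                         # weighted sum of log returns, as in the cell above
exact = np.log(w @ np.exp(r))          # exact one-day log return of the portfolio
print(approx, exact, exact - approx)
```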

[59]: ## Cumulative Return Calculation
cum_return_data = list()

for index, ticker in enumerate(ticker_list):
    cum_ret = df_ret[ticker].cumsum()
    cum_return_data.append(cum_ret)

df_ret_cum = pd.concat(cum_return_data, axis=1)
df_ret_cum.columns = ticker_list

[60]: ## Consolidate metrics
from collections import OrderedDict

tau_opt_port_cum_ret_dict = OrderedDict()

for index, tau in enumerate(tau_array):
    weights = pd.Series(all_weights[index], index=ticker_list)
    tau_opt_port_cum_ret_dict['tau=' + str(tau)] = (df_ret_cum * weights).sum(1)

tau_opt_port_cum_ret_dict['1/N'] = (df_ret_cum * naive_weight).sum(1)
tau_opt_port_cum_ret_dict['SPX'] = y_train[['Return']].cumsum().sum(1)

tau_opt_port_cum_ret_df = pd.DataFrame.from_dict(tau_opt_port_cum_ret_dict)
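The weights are fixed, so weighting each stock's cumulative log return (as in cell [60]) gives exactly the running sum of the weighted daily returns from cell [58]. Applying `np.exp(...) - 1` then converts a cumulative log return into a simple cumulative return. Here is a quick check on simulated data:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
toy_ret = pd.DataFrame(rng.normal(0.0, 0.01, size=(50, 3)), columns=["A", "B", "C"])  # simulated log returns
toy_weights = pd.Series([0.6, 0.3, 0.1], index=["A", "B", "C"])                       # fixed weights

cum_of_weighted = (toy_ret * toy_weights).sum(1).cumsum()   # running sum of daily portfolio returns
weighted_of_cum = (toy_ret.cumsum() * toy_weights).sum(1)   # weights applied to cumulative returns

growth = np.exp(weighted_of_cum) - 1                        # cumulative log return -> simple return
print(np.allclose(cum_of_weighted, weighted_of_cum))        # True
```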

[61]: ## dtale object to plot cumulative returns
dtale_summary = dtale.show(tau_opt_port_cum_ret_df, ignore_duplicate=True)
dtale_summary



[62]: def sharpe_ratio(mu, sigma, rf):
    return (mu - rf) / sigma

def treynor_ratio(mu, beta, rf):
    return (mu - rf) / beta

def jensen_alpha(mu, beta, rf, spm):
    return (mu - rf) - beta * (spm - rf)
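In formulas, with portfolio mean μ_p, volatility σ_p, beta β_p, risk-free rate r_f and market mean μ_M:

- Sharpe = (μ_p − r_f)/σ_p
- Treynor = (μ_p − r_f)/β_p
- Jensen's α = (μ_p − r_f) − β_p(μ_M − r_f)

As a worked check, the sketch below reproduces the τ = 0 and 1/N rows of the in-sample metrics table. The notebook never prints the risk-free rate, so it is backed out from the SPX row. With β = 1, the Treynor ratio equals μ_M − r_f = −0.015080, so r_f = 0.006465 + 0.015080 = 0.021545.

```python
def sharpe_ratio(mu, sigma, rf):
    return (mu - rf) / sigma

def treynor_ratio(mu, beta, rf):
    return (mu - rf) / beta

def jensen_alpha(mu, beta, rf, spm):
    return (mu - rf) - beta * (spm - rf)

sp_mean = 0.006465            # SPX mean return from the metrics table
rf = sp_mean - (-0.015080)    # implied by the SPX Treynor ratio (beta = 1)

rows = {
    # name: (mean return, standard deviation, estimated beta) from the metrics table
    "tau=0.0": (0.173520, 0.167380, 0.686467),
    "1/N": (0.173845, 0.207520, 1.010032),
}
for name, (mu, sigma, beta) in rows.items():
    print(name,
          round(sharpe_ratio(mu, sigma, rf), 4),
          round(treynor_ratio(mu, beta, rf), 4),
          round(jensen_alpha(mu, beta, rf, sp_mean), 4))
```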

[63]: metric_dict = OrderedDict()

def metric_summary(mu, sigma, beta, rf, spm):
    mean = mu
    vol = sigma
    sharpe = sharpe_ratio(mu, sigma, rf)
    treynor = treynor_ratio(mu, beta, rf)
    jensen = jensen_alpha(mu, beta, rf, spm)
    return [mean, vol, beta, sharpe, treynor, jensen]

for index, tau in enumerate(tau_array):
    weights = pd.Series(all_weights[index], index=ticker_list)
    # portfolio beta = weighted sum of the robust stock betas
    beta = (beta_df.loc['Robust Beta'] * weights).sum()
    metric_dict['tau=' + str(tau)] = metric_summary(tau_mean[index], tau_vol[index], beta, rf_mean, sp_mean)

naive_beta = (beta_df.loc['Robust Beta'] * naive_weight).sum()

metric_dict['1/N'] = metric_summary(naive_mean, naive_vol, naive_beta, rf_mean, sp_mean)

metric_dict['SPX'] = metric_summary(sp_mean, sp_vol, 1, rf_mean, sp_mean)

metric_df = pd.DataFrame.from_dict(metric_dict)
metric_df = metric_df.rename(index={0: 'Mean Return', 1: 'Standard Deviation', 2: 'Estimated Beta',
                                    3: 'Sharpe Ratio', 4: 'Treynor Ratio', 5: 'Jensen Alpha'})

[64]: metric_df = metric_df.T

[65]: metric_df

[65]:
          Mean Return  Standard Deviation  Estimated Beta  Sharpe Ratio  Treynor Ratio  Jensen Alpha
tau=0.0      0.173520            0.167380        0.686467      0.907961       0.221387      0.162327
tau=0.01     0.238246            0.169303        0.695817      1.279960       0.311433      0.227194
tau=0.02     0.302972            0.174943        0.705166      1.608673       0.399092      0.292061
tau=0.03     0.367698            0.183960        0.714516      1.881668       0.484457      0.356928
tau=0.04     0.432424            0.195888        0.723866      2.097513       0.567617      0.421794
tau=0.05     0.497150            0.210232        0.733215      2.262281       0.648656      0.486661
tau=0.06     0.561876            0.226534        0.742565      2.385212       0.727654      0.551528
tau=0.07     0.626602            0.244401        0.751915      2.475671       0.804687      0.616395
tau=0.08     0.691328            0.263516        0.761265      2.541714       0.879829      0.681262
tau=0.09     0.756054            0.283627        0.770614      2.589700       0.953146      0.746129
tau=0.1      0.820780            0.304536        0.779964      2.624434       1.024707      0.810996
tau=0.11     0.885506            0.326090        0.789314      2.649455       1.094571      0.875863
tau=0.12     0.950232            0.348169        0.798663      2.667345       1.162800      0.940730
tau=0.13     1.014958            0.370679        0.808013      2.679979       1.229450      1.005597
tau=0.14     1.079684            0.393547        0.817363      2.688723       1.294576      1.070464
tau=0.15     1.144410            0.416713        0.826713      2.694576       1.358228      1.135331
tau=0.16     1.209136            0.440130        0.836062      2.698270       1.420456      1.200198
tau=0.17     1.273862            0.463761        0.845412      2.700349       1.481308      1.265065
tau=0.18     1.338588            0.487574        0.854762      2.701216       1.540829      1.329932
tau=0.19     1.403314            0.511544        0.864111      2.701173       1.599062      1.394799
1/N          0.173845            0.207520        1.010032      0.733904       0.150787      0.167531
SPX          0.006465            0.180789        1.000000     -0.083413      -0.015080      0.000000

[66]: metric_dtale = dtale.show(metric_df, ignore_duplicate=True)
metric_dtale

## All the charting was done in the dtale object



2.1.5 Out-of-sample Performance Evaluation

Estimation of β on the testing set

[67]: ## beta estimate on testing set
beta_dict_test = dict()

for ticker in ticker_list:
    beta_dict_test[ticker] = np.array([ols_beta(ticker, y_test, X_test, rf_test),
                                       huber_beta(ticker, y_test, X_test, rf_test)])







Coefficient estimates printed by ols_beta (OLS) and huber_beta (RLM) for each ticker on the testing set. Each fit uses 165 observations with 163 residual degrees of freedom. x1 is the slope on the market excess return, i.e. β. The RLM fits use IRLS with the HuberT norm, MAD scale estimate and H1 covariance. The OLS condition number is 41.3 throughout.

Ticker  Model  Term      coef  std err    t / z  P>|t|     [0.025    0.975]
AAPL    OLS    const   0.0023    0.001    1.599  0.112     -0.001     0.005
AAPL    OLS    x1      1.0777    0.057   18.915  0.000      0.965         …
AAPL    RLM    const   0.0017    0.001    1.443  0.149     -0.001     0.004
AAPL    RLM    x1      1.0664    0.048   22.164  0.000      0.972     1.161
MSFT    OLS    const   0.0013    0.001    1.163  0.247     -0.001     0.003
MSFT    OLS    x1      1.1095    0.043   25.587  0.000      1.024         …
MSFT    RLM    const   0.0007    0.001    0.702  0.483     -0.001     0.003
MSFT    RLM    x1      1.0786    0.043   25.363  0.000      0.995     1.162
GOLD    OLS    const  -0.0026    0.003   -1.021  0.309     -0.008     0.002
GOLD    OLS    x1      0.3989    0.103    3.855  0.000      0.195         …
GOLD    RLM    const  -0.0031    0.002   -1.388  0.165     -0.007     0.001
GOLD    RLM    x1      0.4176    0.088    4.744  0.000      0.245     0.590
GM      OLS    const   0.0020    0.003    0.805  0.422     -0.003     0.007
GM      OLS    x1      1.2246    0.100   12.187  0.000      1.026     1.423
GM      RLM    const   0.0015    0.002    0.653  0.514     -0.003     0.006
GM      RLM    x1      1.2177    0.092   13.291  0.000      1.038     1.397
GILD    OLS    const  -0.0053    0.002   -3.055  0.003     -0.009    -0.002
GILD    OLS    x1      0.4599    0.069    6.622  0.000      0.323         …
GILD    RLM    const  -0.0063    0.001   -4.464  0.000     -0.009    -0.004
GILD    RLM    x1      0.4955    0.057    8.769  0.000      0.385     0.606
AMZN    OLS    const  -0.0002    0.002   -0.136  0.892     -0.003     0.003
AMZN    OLS    x1      0.6470    0.066    9.797  0.000      0.517         …
AMZN    RLM    const  -0.0010    0.002   -0.681  0.496     -0.004     0.002
AMZN    RLM    x1      0.6680    0.061   10.907  0.000      0.548     0.788
TSLA    OLS    const   0.0073    0.004    1.824  0.070     -0.001     0.015
TSLA    OLS    x1      1.2083    0.161    7.526  0.000      0.891         …
TSLA    RLM    const   0.0076    0.004    2.133  0.033      0.001     0.015
TSLA    RLM    x1      1.2571    0.142    8.824  0.000      0.978     1.536
ENB     OLS    const  -0.0011    0.002   -0.666  0.506     -0.004     0.002
ENB     OLS    x1      1.0864    0.064   16.990  0.000      0.960         …
ENB     RLM    const  -0.0023    0.001   -2.130  0.033     -0.004    -0.000
ENB     RLM    x1      0.9594    0.043   22.104  0.000      0.874     1.044
EQIX    OLS    const  -0.0005    0.001   -0.335  0.738     -0.003     0.002
EQIX    OLS    x1      0.8667    0.056   15.544  0.000      0.757     0.977
EQIX    RLM    const  -0.0004    0.001   -0.287  0.774     -0.003     0.002
EQIX    RLM    x1      0.8577    0.052   16.474  0.000      0.756     0.960
FSLR    OLS    const   0.0029    0.003    1.080  0.282     -0.002     0.008
FSLR    OLS    x1      0.8533    0.109    7.857  0.000      0.639         …
FSLR    RLM    const   0.0023    0.002    1.084  0.278     -0.002     0.007
FSLR    RLM    x1      1.0354    0.086   12.028  0.000      0.867     1.204

Other OLS statistics:
AAPL  BIC -854.7, Skew 0.708
MSFT  Skew 0.402
GOLD  Skew 0.202
GM    Omnibus 13.852 (Prob 0.001), Durbin-Watson 1.672, Jarque-Bera 41.162 (Prob 1.15e-09), Skew -0.077, Kurtosis 5.442
GILD  AIC -795.5, BIC -789.3, Skew 0.741
AMZN  Skew 0.406
TSLA  Skew -0.196
ENB   Skew 0.166
EQIX  Omnibus 2.244 (Prob 0.326), Durbin-Watson 2.037, Jarque-Bera 2.093 (Prob 0.351), Skew -0.028, Kurtosis 3.549
FSLR  AIC -648.0, BIC -641.8, Skew 0.673

OLS standard errors assume that the covariance matrix of the errors is correctly specified.


[68]: beta_df_test = pd.DataFrame.from_dict(beta_dict_test)
beta_df_test = beta_df_test.rename(index={0: 'OLS Beta', 1: 'Robust Beta'})

[69]: beta_df_test

[69]:
                 AAPL      MSFT      GOLD        GM      GILD      AMZN  \
OLS Beta     1.077664  1.109520  0.398892  1.224639  0.459944  0.647042
Robust Beta  1.066432  1.078614  0.417648  1.217697  0.495541  0.667989

                 TSLA       ENB      EQIX      FSLR
OLS Beta     1.208272  1.086403  0.866686  0.853298
Robust Beta  1.257081  0.959386  0.857697  1.035389

[71]: return_data_test = list()

for index, ticker in enumerate(ticker_list):
    log_ret = np.log(X_test[ticker]) - np.log(X_test[ticker].shift(1))
    return_data_test.append(log_ret)

df_ret_test = pd.concat(return_data_test, axis=1)
df_ret_test.columns = ticker_list
df_ret_test.tail()

[71]:
                AAPL      MSFT      GOLD        GM      GILD      AMZN  \
Date
2020-10-27  0.013382  0.014977  0.017126 -0.026306  0.000834  0.024423
2020-10-28 -0.047419 -0.050837 -0.044150 -0.023196 -0.021731 -0.038320
2020-10-29  0.036381  0.010015  0.009599  0.023196 -0.003241  0.015134
2020-10-30 -0.057648 -0.011051  0.021173 -0.010372 -0.006514 -0.055995
2020-11-02 -0.000827 -0.000692  0.015959  0.000579  0.006514 -0.010486

                TSLA       ENB      EQIX      FSLR
Date
2020-10-27  0.010415 -0.006689  0.003666 -0.016372
2020-10-28 -0.044934 -0.027213 -0.022546  0.124463
2020-10-29  0.011777  0.000363 -0.020565 -0.064746
2020-10-30 -0.057071  0.000000 -0.012827 -0.004699
2020-11-02  0.031630 -0.002543  0.009636  0.020917

[72]: ## mean return, sample variance, sample covariance of test sets
      test_mean_estimate = mean_estimate(df_ret_test)
      test_mean_estimate_np = np.array(list(test_mean_estimate.values()))

      test_cov_mat_estimate = cov_mat_estimate(df_ret_test)
      test_cov_mat_estimate_np = test_cov_mat_estimate.to_numpy()

      tau_mean_test = np.zeros(len(tau_array))
      tau_vol_test = np.zeros(len(tau_array))

      for index, tau in enumerate(tau_array):
          weight_opt = all_weights[index]
          tau_mean_test[index] = np.sum((test_mean_estimate_np * weight_opt))
          tau_vol_test[index] = np.sqrt(np.dot(weight_opt.T, np.dot(test_cov_mat_estimate_np, weight_opt)))

      naive_mean_test = np.sum((test_mean_estimate_np * naive_weight))
      naive_vol_test = np.sqrt(np.dot(naive_weight.T, np.dot(test_cov_mat_estimate_np, naive_weight)))
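Cell [72] evaluates each fixed weight vector $w$ on the test-set estimates through the portfolio mean $\mu^\top w$ and volatility $\sqrt{w^\top \Sigma w}$. A minimal NumPy sketch of those two formulas on made-up two-asset inputs (the helper name `portfolio_mean_vol` is hypothetical, not part of the project code):

```python
import numpy as np


def portfolio_mean_vol(mu, cov, w):
    """Return (w' mu, sqrt(w' Sigma w)) for a weight vector w."""
    mu = np.asarray(mu, dtype=float)
    cov = np.asarray(cov, dtype=float)
    w = np.asarray(w, dtype=float)
    mean = float(w @ mu)               # same as np.sum(mu * w)
    vol = float(np.sqrt(w @ cov @ w))  # same as sqrt(w.T @ (cov @ w))
    return mean, vol


# Hypothetical annualized inputs for two assets (not from the project data)
mu = [0.10, 0.20]
cov = [[0.04, 0.00],
       [0.00, 0.09]]
w_naive = np.full(2, 1 / 2)            # the 1/N portfolio

mean, vol = portfolio_mean_vol(mu, cov, w_naive)
print(round(mean, 4), round(vol, 4))   # 0.15 0.1803
```

Even with zero correlation, the 1/N volatility (about 0.18) is below the average of the two asset volatilities (0.25), a small illustration of diversification.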

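[73]: # annualize the daily S&P 500 mean (compounded) and volatility (square-root-of-time)
      sp_mean_test = (np.mean(y_test['Return']) + 1) ** 252 - 1
      sp_vol_test = np.std(y_test['Return']) * np.sqrt(252)
      # DGS10 is the 10-year Treasury constant-maturity yield, quoted in percent
      rf_mean_test = np.mean(rf_test['DGS10'] / 100)
      rf_vol_test = np.std(rf_test['DGS10'] / 100)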

[74]: ### Daily Return Calculation

      tau_opt_port_daily_ret_dict = OrderedDict()

      for index, tau in enumerate(tau_array):
          weights = pd.Series(all_weights[index], index = ticker_list)
          tau_opt_port_daily_ret_dict['tau=' + str(tau)] = (df_ret_test * weights).sum(1)

      tau_opt_port_daily_ret_dict['1/N'] = (df_ret_test * naive_weight).sum(1)
      tau_opt_port_daily_ret_dict['SPX'] = y_test[['Return']].sum(1)

      tau_opt_port_daily_ret_df_test = pd.DataFrame.from_dict(tau_opt_port_daily_ret_dict)
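The first log return of every ticker is NaN (there is no previous close), and `DataFrame.sum` skips NaN by default, so every portfolio column in the output of cell [75] starts at 0.000000 on 2020-03-10, while the SPX column (taken directly from `y_test`) does not. A small pandas sketch of this behaviour with made-up tickers `AAA` and `BBB`; passing `min_count=1` is one way to keep such a row as NaN instead:

```python
import numpy as np
import pandas as pd

# Hypothetical closing prices for two tickers over three days
prices = pd.DataFrame({"AAA": [100.0, 102.0, 101.0],
                       "BBB": [50.0, 49.0, 50.0]})
weights = pd.Series({"AAA": 0.5, "BBB": 0.5})

# Daily log returns as in cell [71]; row 0 is NaN (no previous close)
log_ret = np.log(prices) - np.log(prices.shift(1))

# Weighted sum across tickers as in cell [74]
port_ret = (log_ret * weights).sum(axis=1)                      # all-NaN row -> 0.0
port_ret_strict = (log_ret * weights).sum(axis=1, min_count=1)  # all-NaN row -> NaN

print(port_ret.iloc[0], port_ret_strict.iloc[0])  # 0.0 nan
```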

[75]: tau_opt_port_daily_ret_df_test

[75]:
             tau=0.0  tau=0.01  tau=0.02  tau=0.03  tau=0.04  tau=0.05  \
Date
2020-03-10  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000
2020-03-11 -0.029702 -0.030061 -0.030420 -0.030778 -0.031137 -0.031496
2020-03-12 -0.099652 -0.098807 -0.097963 -0.097119 -0.096274 -0.095430
2020-03-13  0.065279  0.069593  0.073906  0.078219  0.082533  0.086846
2020-03-16 -0.070922 -0.075645 -0.080369 -0.085092 -0.089815 -0.094539
2020-03-17  0.052776  0.060717  0.068658  0.076598  0.084539  0.092480
2020-03-18 -0.047563 -0.042846 -0.038129 -0.033412 -0.028695 -0.023978
2020-03-19 -0.000370 -0.004759 -0.009148 -0.013537 -0.017926 -0.022315
2020-03-20 -0.028541 -0.035464 -0.042387 -0.049310 -0.056233 -0.063156
2020-03-23 -0.010157 -0.008822 -0.007487 -0.006153 -0.004818 -0.003484
2020-03-24  0.099640  0.101194  0.102749  0.104303  0.105857  0.107411
2020-03-25 -0.006434 -0.006616 -0.006799 -0.006981 -0.007164 -0.007347
2020-03-26  0.049734  0.050114  0.050493  0.050873  0.051253  0.051633
2020-03-27 -0.025266 -0.022848 -0.020430 -0.018012 -0.015594 -0.013176
2020-03-30  0.039509  0.045785  0.052062  0.058338  0.064614  0.070891
2020-03-31 -0.023186 -0.024426 -0.025666 -0.026907 -0.028147 -0.029387
2020-04-01 -0.028198 -0.025301 -0.022405 -0.019509 -0.016612 -0.013716
2020-04-02  0.033395  0.038281  0.043166  0.048052  0.052938  0.057824
2020-04-03  0.001711  0.001819  0.001927  0.002035  0.002142  0.002250
2020-04-06  0.032442  0.034300  0.036158  0.038016  0.039874  0.041732
2020-04-07 -0.016542 -0.023420 -0.030297 -0.037175 -0.044052 -0.050930
2020-04-08  0.026900  0.023552  0.020204  0.016855  0.013507  0.010159
2020-04-09  0.029056  0.029519  0.029981  0.030444  0.030906  0.031369
2020-04-13  0.004730  0.005897  0.007065  0.008232  0.009399  0.010566
2020-04-14  0.024657  0.028094  0.031531  0.034968  0.038405  0.041842
2020-04-15 -0.025490 -0.021628 -0.017767 -0.013905 -0.010044 -0.006182
2020-04-16  0.014084  0.017847  0.021610  0.025373  0.029135  0.032898
2020-04-17  0.029213  0.022840  0.016467  0.010094  0.003721 -0.002653
2020-04-20 -0.011402 -0.013340 -0.015279 -0.017217 -0.019155 -0.021093
2020-04-21 -0.020535 -0.018995 -0.017454 -0.015914 -0.014373 -0.012833
...              ...       ...       ...       ...       ...       ...
2020-09-22  0.011252  0.012016  0.012781  0.013545  0.014309  0.015074
2020-09-23 -0.022933 -0.026569 -0.030205 -0.033841 -0.037478 -0.041114
2020-09-24  0.003250  0.005818  0.008386  0.010954  0.013523  0.016091
2020-09-25  0.007727  0.010013  0.012298  0.014584  0.016870  0.019155
2020-09-28  0.008370  0.007389  0.006408  0.005427  0.004446  0.003465
2020-09-29 -0.005524 -0.003954 -0.002384 -0.000815  0.000755  0.002325
2020-09-30  0.001690  0.001120  0.000551 -0.000019 -0.000588 -0.001157
2020-10-01  0.006172  0.004913  0.003655  0.002396  0.001138 -0.000121
2020-10-02 -0.004034 -0.005881 -0.007728 -0.009575 -0.011422 -0.013268
2020-10-05  0.017211  0.016514  0.015817  0.015120  0.014423  0.013725
2020-10-06 -0.020545 -0.020985 -0.021424 -0.021864 -0.022303 -0.022742
2020-10-07  0.011559  0.009023  0.006487  0.003951  0.001416 -0.001120
2020-10-08  0.018554  0.018287  0.018019  0.017752  0.017485  0.017218
2020-10-09  0.009538  0.011416  0.013295  0.015173  0.017051  0.018929
2020-10-12  0.005744  0.006783  0.007823  0.008862  0.009902  0.010942
2020-10-13 -0.003359 -0.003323 -0.003287 -0.003251 -0.003215 -0.003179
2020-10-14 -0.007364 -0.006369 -0.005375 -0.004380 -0.003386 -0.002391
2020-10-15 -0.005648 -0.007616 -0.009583 -0.011550 -0.013517 -0.015484
2020-10-16  0.000436 -0.000765 -0.001967 -0.003168 -0.004369 -0.005571
2020-10-19 -0.015634 -0.017129 -0.018623 -0.020118 -0.021613 -0.023107
2020-10-20  0.006864  0.002470 -0.001925 -0.006320 -0.010715 -0.015109
2020-10-21  0.000028  0.001066  0.002105  0.003144  0.004182  0.005221
2020-10-22 -0.002538 -0.007164 -0.011790 -0.016416 -0.021042 -0.025668
2020-10-23 -0.002474 -0.001870 -0.001266 -0.000662 -0.000058  0.000546
2020-10-26 -0.014511 -0.015049 -0.015586 -0.016124 -0.016662 -0.017200
2020-10-27  0.003525  0.006464  0.009403  0.012342  0.015281  0.018220
2020-10-28 -0.031044 -0.036029 -0.041014 -0.045999 -0.050985 -0.055970
2020-10-29 -0.000135 -0.000185 -0.000236 -0.000286 -0.000336 -0.000387
2020-10-30 -0.000921  0.000189  0.001299  0.002409  0.003519  0.004630
2020-11-02  0.003766  0.004635  0.005503  0.006372  0.007240  0.008109

            tau=0.06  tau=0.07  tau=0.08  tau=0.09  ...  tau=0.12  tau=0.13  \
Date
2020-03-10  0.000000  0.000000  0.000000  0.000000  ...  0.000000  0.000000
2020-03-11 -0.031854 -0.032213 -0.032572 -0.032930  ... -0.034006 -0.034365
2020-03-12 -0.094585 -0.093741 -0.092897 -0.092052  ... -0.089519 -0.088675
2020-03-13  0.091160  0.095473  0.099786  0.104100  ...  0.117040  0.121353
2020-03-16 -0.099262 -0.103985 -0.108709 -0.113432  ... -0.127602 -0.132325
2020-03-17  0.100421  0.108362  0.116303  0.124244  ...  0.148066  0.156007
2020-03-18 -0.019261 -0.014544 -0.009827 -0.005110  ...  0.009040  0.013757
2020-03-19 -0.026704 -0.031093 -0.035482 -0.039871  ... -0.053038 -0.057427
2020-03-20 -0.070079 -0.077003 -0.083926 -0.090849  ... -0.111618 -0.118541
2020-03-23 -0.002149 -0.000815  0.000520  0.001854  ...  0.005858  0.007193
2020-03-24  0.108965  0.110519  0.112073  0.113627  ...  0.118290  0.119844
2020-03-25 -0.007529 -0.007712 -0.007894 -0.008077  ... -0.008625 -0.008807
2020-03-26  0.052013  0.052392  0.052772  0.053152  ...  0.054291  0.054671
2020-03-27 -0.010758 -0.008340 -0.005922 -0.003504  ...  0.003749  0.006167
2020-03-30  0.077167  0.083443  0.089719  0.095996  ...  0.114824  0.121101
2020-03-31 -0.030627 -0.031867 -0.033107 -0.034347  ... -0.038067 -0.039307
2020-04-01 -0.010820 -0.007923 -0.005027 -0.002131  ...  0.006558  0.009455
2020-04-02  0.062709  0.067595  0.072481  0.077366  ...  0.092023  0.096909
2020-04-03  0.002358  0.002466  0.002574  0.002682  ...  0.003005  0.003113
2020-04-06  0.043590  0.045448  0.047306  0.049164  ...  0.054738  0.056596
2020-04-07 -0.057807 -0.064685 -0.071563 -0.078440  ... -0.099073 -0.105950
2020-04-08  0.006811  0.003462  0.000114 -0.003234  ... -0.013279 -0.016627
2020-04-09  0.031832  0.032294  0.032757  0.033219  ...  0.034607  0.035069
2020-04-13  0.011733  0.012900  0.014068  0.015235  ...  0.018736  0.019903
2020-04-14  0.045280  0.048717  0.052154  0.055591  ...  0.065902  0.069339
2020-04-15 -0.002321  0.001541  0.005402  0.009263  ...  0.020848  0.024709
2020-04-16  0.036661  0.040424  0.044187  0.047950  ...  0.059238  0.063001
2020-04-17 -0.009026 -0.015399 -0.021772 -0.028145  ... -0.047264 -0.053638
2020-04-20 -0.023031 -0.024970 -0.026908 -0.028846  ... -0.034661 -0.036599
2020-04-21 -0.011292 -0.009752 -0.008211 -0.006671  ... -0.002049 -0.000509
...              ...       ...       ...       ...  ...       ...       ...
2020-09-22  0.015838  0.016602  0.017367  0.018131  ...  0.020424  0.021188
2020-09-23 -0.044750 -0.048386 -0.052022 -0.055658  ... -0.066566 -0.070202
2020-09-24  0.018659  0.021227  0.023795  0.026363  ...  0.034067  0.036635
2020-09-25  0.021441  0.023727  0.026012  0.028298  ...  0.035155  0.037441
2020-09-28  0.002484  0.001503  0.000522 -0.000459  ... -0.003402 -0.004383
2020-09-29  0.003895  0.005465  0.007034  0.008604  ...  0.013314  0.014884
2020-09-30 -0.001727 -0.002296 -0.002866 -0.003435  ... -0.005143 -0.005713
2020-10-01 -0.001379 -0.002638 -0.003896 -0.005155  ... -0.008931 -0.010189
2020-10-02 -0.015115 -0.016962 -0.018809 -0.020656  ... -0.026196 -0.028043
2020-10-05  0.013028  0.012331  0.011634  0.010937  ...  0.008846  0.008149
2020-10-06 -0.023182 -0.023621 -0.024061 -0.024500  ... -0.025818 -0.026258
2020-10-07 -0.003656 -0.006192 -0.008728 -0.011263  ... -0.018871 -0.021407
2020-10-08  0.016950  0.016683  0.016416  0.016149  ...  0.015347  0.015080
2020-10-09  0.020807  0.022685  0.024563  0.026442  ...  0.032076  0.033954
2020-10-12  0.011981  0.013021  0.014060  0.015100  ...  0.018218  0.019258
2020-10-13 -0.003143 -0.003106 -0.003070 -0.003034  ... -0.002926 -0.002890
2020-10-14 -0.001397 -0.000402  0.000592  0.001587  ...  0.004570  0.005565
2020-10-15 -0.017451 -0.019418 -0.021386 -0.023353  ... -0.029254 -0.031221
2020-10-16 -0.006772 -0.007974 -0.009175 -0.010376  ... -0.013980 -0.015182
2020-10-19 -0.024602 -0.026096 -0.027591 -0.029085  ... -0.033569 -0.035064
2020-10-20 -0.019504 -0.023899 -0.028294 -0.032689  ... -0.045873 -0.050268
2020-10-21  0.006259  0.007298  0.008337  0.009375  ...  0.012491  0.013530
2020-10-22 -0.030294 -0.034920 -0.039546 -0.044172  ... -0.058049 -0.062675
2020-10-23  0.001150  0.001754  0.002358  0.002962  ...  0.004774  0.005378
2020-10-26 -0.017737 -0.018275 -0.018813 -0.019351  ... -0.020964 -0.021502
2020-10-27  0.021159  0.024098  0.027037  0.029976  ...  0.038793  0.041732
2020-10-28 -0.060955 -0.065940 -0.070925 -0.075910  ... -0.090866 -0.095851
2020-10-29 -0.000437 -0.000487 -0.000538 -0.000588  ... -0.000739 -0.000790
2020-10-30  0.005740  0.006850  0.007960  0.009070  ...  0.012401  0.013511
2020-11-02  0.008978  0.009846  0.010715  0.011584  ...  0.014189  0.015058

            tau=0.14  tau=0.15  tau=0.16  tau=0.17  tau=0.18  tau=0.19  \
Date
2020-03-10  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000
2020-03-11 -0.034723 -0.035082 -0.035441 -0.035799 -0.036158 -0.036517
2020-03-12 -0.087830 -0.086986 -0.086142 -0.085297 -0.084453 -0.083608
2020-03-13  0.125667  0.129980  0.134293  0.138607  0.142920  0.147234
2020-03-16 -0.137048 -0.141772 -0.146495 -0.151218 -0.155942 -0.160665
2020-03-17  0.163948  0.171889  0.179829  0.187770  0.195711  0.203652
2020-03-18  0.018474  0.023191  0.027908  0.032625  0.037342  0.042059
2020-03-19 -0.061816 -0.066205 -0.070594 -0.074983 -0.079372 -0.083761
2020-03-20 -0.125464 -0.132388 -0.139311 -0.146234 -0.153157 -0.160080
2020-03-23  0.008527  0.009862  0.011196  0.012531  0.013865  0.015200
2020-03-24  0.121398  0.122952  0.124506  0.126060  0.127614  0.129169
2020-03-25 -0.008990 -0.009172 -0.009355 -0.009538 -0.009720 -0.009903
2020-03-26  0.055051  0.055430  0.055810  0.056190  0.056570  0.056950
2020-03-27  0.008585  0.011003  0.013421  0.015839  0.018257  0.020675
2020-03-30  0.127377  0.133653  0.139929  0.146206  0.152482  0.158758
2020-03-31 -0.040548 -0.041788 -0.043028 -0.044268 -0.045508 -0.046748
2020-04-01  0.012351  0.015247  0.018144  0.021040  0.023936  0.026833
2020-04-02  0.101795  0.106681  0.111566  0.116452  0.121338  0.126223
2020-04-03  0.003221  0.003329  0.003437  0.003544  0.003652  0.003760
2020-04-06  0.058454  0.060313  0.062171  0.064029  0.065887  0.067745
2020-04-07 -0.112828 -0.119705 -0.126583 -0.133460 -0.140338 -0.147216
2020-04-08 -0.019975 -0.023324 -0.026672 -0.030020 -0.033368 -0.036716
2020-04-09  0.035532  0.035994  0.036457  0.036919  0.037382  0.037845
2020-04-13  0.021071  0.022238  0.023405  0.024572  0.025739  0.026906
2020-04-14  0.072776  0.076213  0.079650  0.083087  0.086524  0.089961
2020-04-15  0.028571  0.032432  0.036293  0.040155  0.044016  0.047878
2020-04-16  0.066764  0.070527  0.074289  0.078052  0.081815  0.085578
2020-04-17 -0.060011 -0.066384 -0.072757 -0.079130 -0.085503 -0.091876
2020-04-20 -0.038537 -0.040475 -0.042414 -0.044352 -0.046290 -0.048228
2020-04-21  0.001032  0.002572  0.004113  0.005653  0.007194  0.008734
...              ...       ...       ...       ...       ...       ...
2020-09-22  0.021952  0.022717  0.023481  0.024245  0.025010  0.025774
2020-09-23 -0.073838 -0.077474 -0.081110 -0.084746 -0.088382 -0.092018
2020-09-24  0.039203  0.041771  0.044339  0.046907  0.049476  0.052044
2020-09-25  0.039726  0.042012  0.044298  0.046583  0.048869  0.051155
2020-09-28 -0.005364 -0.006345 -0.007326 -0.008307 -0.009288 -0.010269
2020-09-29  0.016453  0.018023  0.019593  0.021163  0.022733  0.024302
2020-09-30 -0.006282 -0.006852 -0.007421 -0.007991 -0.008560 -0.009129
2020-10-01 -0.011448 -0.012706 -0.013965 -0.015223 -0.016482 -0.017740
2020-10-02 -0.029890 -0.031737 -0.033584 -0.035431 -0.037277 -0.039124
2020-10-05  0.007452  0.006754  0.006057  0.005360  0.004663  0.003966
2020-10-06 -0.026697 -0.027137 -0.027576 -0.028016 -0.028455 -0.028894
2020-10-07 -0.023942 -0.026478 -0.029014 -0.031550 -0.034086 -0.036621
2020-10-08  0.014812  0.014545  0.014278  0.014011  0.013743  0.013476
2020-10-09  0.035832  0.037710  0.039589  0.041467  0.043345  0.045223
2020-10-12  0.020297  0.021337  0.022376  0.023416  0.024456  0.025495
2020-10-13 -0.002854 -0.002817 -0.002781 -0.002745 -0.002709 -0.002673
2020-10-14  0.006559  0.007554  0.008549  0.009543  0.010538  0.011532
2020-10-15 -0.033188 -0.035156 -0.037123 -0.039090 -0.041057 -0.043024
2020-10-16 -0.016383 -0.017585 -0.018786 -0.019987 -0.021189 -0.022390
2020-10-19 -0.036558 -0.038053 -0.039547 -0.041042 -0.042537 -0.044031
2020-10-20 -0.054662 -0.059057 -0.063452 -0.067847 -0.072241 -0.076636
2020-10-21  0.014568  0.015607  0.016645  0.017684  0.018723  0.019761
2020-10-22 -0.067301 -0.071927 -0.076553 -0.081179 -0.085805 -0.090431
2020-10-23  0.005982  0.006586  0.007190  0.007794  0.008399  0.009003
2020-10-26 -0.022040 -0.022577 -0.023115 -0.023653 -0.024191 -0.024728
2020-10-27  0.044671  0.047610  0.050549  0.053487  0.056426  0.059365
2020-10-28 -0.100836 -0.105822 -0.110807 -0.115792 -0.120777 -0.125762
2020-10-29 -0.000840 -0.000890 -0.000941 -0.000991 -0.001041 -0.001092
2020-10-30  0.014621  0.015732  0.016842  0.017952  0.019062  0.020173
2020-11-02  0.015927  0.016795  0.017664  0.018533  0.019401  0.020270

                 1/N       SPX
Date
2020-03-10  0.000000  0.048215
2020-03-11 -0.037002 -0.050103
2020-03-12 -0.096001 -0.099945
2020-03-13  0.056336  0.088808
2020-03-16 -0.109684 -0.127652
2020-03-17  0.034542  0.058226
2020-03-18 -0.065133 -0.053222
2020-03-19  0.035991  0.004697
2020-03-20 -0.022527 -0.044328
2020-03-23 -0.009073 -0.029732
2020-03-24  0.094909  0.089683
2020-03-25  0.006443  0.011469
2020-03-26  0.043726  0.060544
2020-03-27 -0.033707 -0.034268
2020-03-30  0.025645  0.032967
2020-03-31 -0.007622 -0.016142
2020-04-01 -0.046336 -0.045146
2020-04-02  0.015317  0.022573
2020-04-03  0.002471 -0.015253
2020-04-06  0.051315  0.067968
2020-04-07  0.002747 -0.001604
2020-04-08  0.025942  0.033489
2020-04-09  0.025220  0.014383
2020-04-13  0.022737 -0.010156
2020-04-14  0.038056  0.030115
2020-04-15 -0.016715 -0.022277
2020-04-16  0.010612  0.005800
2020-04-17  0.024005  0.026441
2020-04-20 -0.013238 -0.018043
2020-04-21 -0.035981 -0.031155
...              ...       ...
2020-09-22  0.003260  0.010463
2020-09-23 -0.033385 -0.024007
2020-09-24  0.004330  0.002983
2020-09-25  0.019479  0.015850
2020-09-28  0.017373  0.015982
2020-09-29 -0.006347 -0.004824
2020-09-30  0.008523  0.008220
2020-10-01  0.015911  0.005279
2020-10-02 -0.020850 -0.009624
2020-10-05  0.024955  0.017813
2020-10-06 -0.018241 -0.014072
2020-10-07  0.022695  0.017247
2020-10-08  0.008323  0.007978
2020-10-09  0.013644  0.008755
2020-10-12  0.017289  0.016283
2020-10-13  0.000685 -0.006327
2020-10-14 -0.000706 -0.006645
2020-10-15 -0.007899 -0.001529
2020-10-16 -0.001814  0.000135
2020-10-19 -0.016969 -0.016465
2020-10-20  0.006857  0.004716
2020-10-21 -0.004326 -0.002198
2020-10-22  0.002610  0.005205
2020-10-23 -0.000337  0.003440
2020-10-26 -0.011932 -0.018764
2020-10-27  0.003546 -0.003030
2020-10-28 -0.019588 -0.035926
2020-10-29  0.001791  0.011877
2020-10-30 -0.019500 -0.012204
2020-11-02  0.007069  0.012243

[166 rows x 22 columns]

[76]: ## Cumulative Return Calculation
      cum_return_data = list()

      for index, ticker in enumerate(ticker_list):
          cum_ret = df_ret_test[ticker].cumsum()
          cum_return_data.append(cum_ret)

      df_ret_cum_test = pd.concat(cum_return_data, axis=1)
      df_ret_cum_test.columns = ticker_list

[77]: tau_opt_port_cum_ret_dict = OrderedDict()

      for index, tau in enumerate(tau_array):
          weights = pd.Series(all_weights[index], index = ticker_list)
          tau_opt_port_cum_ret_dict['tau=' + str(tau)] = (df_ret_cum_test * weights).sum(1)

      tau_opt_port_cum_ret_dict['1/N'] = (df_ret_cum_test * naive_weight).sum(1)
      tau_opt_port_cum_ret_dict['SPX'] = y_test[['Return']].cumsum().sum(1)

      tau_opt_port_cum_ret_df_test = pd.DataFrame.from_dict(tau_opt_port_cum_ret_dict)
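Cell [77] weights the per-ticker cumulative sums of log returns. Because summation is linear, this is the same as cumulating the weighted daily series from cell [74], and $e^{x} - 1$ turns a cumulative log return $x$ into a simple cumulative return. A pandas sketch with hypothetical tickers `AAA` and `BBB` (note that a weighted sum of log returns only approximates the log return of a portfolio held at those weights):

```python
import numpy as np
import pandas as pd

# Hypothetical daily log returns for two tickers
log_ret = pd.DataFrame({"AAA": [0.01, -0.02, 0.03],
                        "BBB": [0.00, 0.01, -0.01]})
w = pd.Series({"AAA": 0.7, "BBB": 0.3})

# Cell [77] style: weight the per-ticker cumulative sums
cum_a = (log_ret.cumsum() * w).sum(axis=1)
# Equivalent: cumulate the weighted daily series of cell [74]
cum_b = (log_ret * w).sum(axis=1).cumsum()
assert np.allclose(cum_a, cum_b)

# Simple (arithmetic) cumulative return implied by each cumulative log return
simple_cum = np.exp(cum_b) - 1
print(simple_cum.round(6).tolist())
```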

[78]: cum_ret_dtale_test = dtale.show(tau_opt_port_cum_ret_df_test, ignore_duplicate=True)
      cum_ret_dtale_test


[79]: metric_dict = OrderedDict()

      for index, tau in enumerate(tau_array):
          weights = pd.Series(all_weights[index], index = ticker_list)
          # portfolio beta = weighted sum of the robust test-set betas
          beta = (beta_df_test.T[['Robust Beta']].T * weights).sum(1).iloc[0]
          metric_dict['tau=' + str(tau)] = metric_summary(tau_mean_test[index], tau_vol_test[index], beta, rf_mean_test, sp_mean_test)

      naive_beta = (beta_df_test.T[['Robust Beta']].T * naive_weight).sum(1).iloc[0]

      metric_dict['1/N'] = metric_summary(naive_mean_test, naive_vol_test, naive_beta, rf_mean_test, sp_mean_test)

      metric_dict['SPX'] = metric_summary(sp_mean_test, sp_vol_test, 1, rf_mean_test, sp_mean_test)

      metric_df_test = pd.DataFrame.from_dict(metric_dict)
      metric_df_test = metric_df_test.rename(index={0: 'Mean Return', 1: 'Standard Deviation', 2: 'Estimated Beta',
                                                    3: 'Sharpe Ratio', 4: 'Treynor Ratio', 5: 'Jensen Alpha'})

[76]: metric_df_test = metric_df_test.T
      metric_df_test

[76]:
          Mean Return  Standard Deviation  Estimated Beta  Sharpe Ratio  \
tau=0.0      0.038918            0.323248        0.732397      0.098603
tau=0.01     0.059313            0.334633        0.742281      0.156194
tau=0.02     0.079708            0.352000        0.752165      0.206427
tau=0.03     0.100102            0.374517        0.762049      0.248472
tau=0.04     0.120497            0.401319        0.771933      0.282697
tau=0.05     0.140891            0.431607        0.781817      0.310111
tau=0.06     0.161286            0.464700        0.791701      0.331915
tau=0.07     0.181681            0.500043        0.801584      0.349241
tau=0.08     0.202075            0.537191        0.811468      0.363055
tau=0.09     0.222470            0.575795        0.821352      0.374135
tau=0.1      0.242864            0.615580        0.831236      0.383085
tau=0.11     0.263259            0.656333        0.841120      0.390372
tau=0.12     0.283654            0.697884        0.851004      0.396353
tau=0.13     0.304048            0.740098        0.860888      0.401303
tau=0.14     0.324443            0.782868        0.870771      0.405430
tau=0.15     0.344838            0.826107        0.880655      0.408896
tau=0.16     0.365232            0.869747        0.890539      0.411829
tau=0.17     0.385627            0.913729        0.900423      0.414326
tau=0.18     0.406021            0.958006        0.910307      0.416465
tau=0.19     0.426416            1.002539        0.920191      0.418309
1/N          1.024719            0.374655        0.905348      2.716296
SPX          0.327399            0.387441        1.000000      0.826845

          Treynor Ratio  Jensen Alpha
tau=0.0        0.043519     -0.202753
tau=0.01       0.070415     -0.185525
tau=0.02       0.096604     -0.168297
tau=0.03       0.122114     -0.151069
tau=0.04       0.146971     -0.133840
tau=0.05       0.171199     -0.116612
tau=0.06       0.194822     -0.099384
tau=0.07       0.217863     -0.082155
tau=0.08       0.240342     -0.064927
tau=0.09       0.262281     -0.047699
tau=0.1        0.283697     -0.030470
tau=0.11       0.304610     -0.013242
tau=0.12       0.325038      0.003986
tau=0.13       0.344996      0.021214
tau=0.14       0.364502      0.038443
tau=0.15       0.383569      0.055671
tau=0.16       0.402214      0.072899
tau=0.17       0.420449      0.090128
tau=0.18       0.438288      0.107356
tau=0.19       0.455743      0.124584
1/N            1.124070      0.727642
SPX            0.320354      0.000000
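The ratios in this table agree with the standard definitions: Sharpe ratio $(\mu - r_f)/\sigma$, Treynor ratio $(\mu - r_f)/\beta$ and Jensen's alpha $\mu - [r_f + \beta(r_M - r_f)]$. The sketch below uses `metric_summary_sketch`, a hypothetical stand-in for the project's `metric_summary` (whose source is not shown here). It reproduces the $\tau = 0.0$ row to rounding, using $r_f \approx 0.007045$, the value implied by the SPX row, where $\beta = 1$ makes the Treynor ratio equal to $r_M - r_f$:

```python
def metric_summary_sketch(mean, vol, beta, rf, market_mean):
    """Hypothetical stand-in for metric_summary:
    returns [mean, vol, beta, Sharpe, Treynor, Jensen alpha]."""
    excess = mean - rf
    sharpe = excess / vol                                   # reward per unit of total risk
    treynor = excess / beta                                 # reward per unit of market risk
    jensen_alpha = mean - (rf + beta * (market_mean - rf))  # return above the CAPM line
    return [mean, vol, beta, sharpe, treynor, jensen_alpha]


# tau = 0.0 row of the table above; rf implied by the SPX row (beta = 1)
sp_mean = 0.327399
rf = sp_mean - 0.320354
row = metric_summary_sketch(0.038918, 0.323248, 0.732397, rf, sp_mean)
print([round(x, 6) for x in row[3:]])
```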

[80]: weights_df = pd.DataFrame(data = all_weights)
      weights_df.columns = ticker_list
      weights_df.index = tau_array
      weights_df_dtale = dtale.show(weights_df, ignore_duplicate=True)
      weights_df_dtale


[81]: weights_df.index = tau_array

[82]: weights_df


[82]:
          AAPL      MSFT      GOLD        GM      GILD      AMZN      TSLA  \
0.00 -0.048357  0.118213  0.218855  0.106989  0.175218  0.086370 -0.051216
0.01 -0.035994  0.212194  0.236188  0.044762  0.152940  0.039427 -0.040718
0.02 -0.023630  0.306176  0.253520 -0.017465  0.130662 -0.007517 -0.030220
0.03 -0.011267  0.400157  0.270852 -0.079692  0.108383 -0.054461 -0.019723
0.04  0.001097  0.494139  0.288184 -0.141918  0.086105 -0.101404 -0.009225
0.05  0.013460  0.588120  0.305517 -0.204145  0.063827 -0.148348  0.001273
0.06  0.025824  0.682102  0.322849 -0.266372  0.041549 -0.195292  0.011771
0.07  0.038187  0.776083  0.340181 -0.328599  0.019271 -0.242235  0.022268
0.08  0.050551  0.870065  0.357514 -0.390826 -0.003007 -0.289179  0.032766
0.09  0.062914  0.964046  0.374846 -0.453053 -0.025285 -0.336123  0.043264
0.10  0.075278  1.058028  0.392178 -0.515280 -0.047563 -0.383066  0.053761
0.11  0.087641  1.152009  0.409511 -0.577506 -0.069842 -0.430010  0.064259
0.12  0.100005  1.245991  0.426843 -0.639733 -0.092120 -0.476954  0.074757
0.13  0.112368  1.339972  0.444175 -0.701960 -0.114398 -0.523897  0.085254
0.14  0.124732  1.433954  0.461507 -0.764187 -0.136676 -0.570841  0.095752
0.15  0.137095  1.527935  0.478840 -0.826414 -0.158954 -0.617785  0.106250
0.16  0.149459  1.621917  0.496172 -0.888641 -0.181232 -0.664729  0.116747
0.17  0.161823  1.715898  0.513504 -0.950868 -0.203510 -0.711672  0.127245
0.18  0.174186  1.809880  0.530837 -1.013094 -0.225788 -0.758616  0.137743
0.19  0.186550  1.903862  0.548169 -1.075321 -0.248067 -0.805560  0.148241

           ENB      EQIX      FSLR
0.00  0.172940  0.225602 -0.004615
0.01  0.154576  0.256599 -0.019974
0.02  0.136212  0.287595 -0.035333
0.03  0.117848  0.318592 -0.050692
0.04  0.099484  0.349588 -0.066051
0.05  0.081120  0.380585 -0.081409
0.06  0.062756  0.411582 -0.096768
0.07  0.044392  0.442578 -0.112127
0.08  0.026028  0.473575 -0.127486
0.09  0.007664  0.504571 -0.142845
0.10 -0.010700  0.535568 -0.158203
0.11 -0.029064  0.566564 -0.173562
0.12 -0.047428  0.597561 -0.188921
0.13 -0.065792  0.628557 -0.204280
0.14 -0.084156  0.659554 -0.219639
0.15 -0.102521  0.690550 -0.234997
0.16 -0.120885  0.721547 -0.250356
0.17 -0.139249  0.752543 -0.265715
0.18 -0.157613  0.783540 -0.281074
0.19 -0.175977  0.814537 -0.296433
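Each column of the weights table changes by the same amount for every 0.01 step in τ (the weights are affine in τ), and negative weights appear freely. Both features match the closed-form solution of the common risk-tolerance problem $\max_w \tau\mu^\top w - \tfrac{1}{2} w^\top \Sigma w$ subject to $\mathbf{1}^\top w = 1$ with no short-sale constraint: the global minimum-variance portfolio plus τ times a zero-sum tilt. Whether this is exactly the construction used in Section 1.3 is an assumption here; the sketch uses hypothetical inputs and a hypothetical helper `tau_weights`:

```python
import numpy as np


def tau_weights(mu, cov, tau):
    """Weights maximizing tau * mu'w - 0.5 * w'Sigma w subject to sum(w) = 1
    (no short-sale constraint)."""
    mu = np.asarray(mu, dtype=float)
    ones = np.ones_like(mu)
    inv_ones = np.linalg.solve(cov, ones)  # Sigma^{-1} 1
    inv_mu = np.linalg.solve(cov, mu)      # Sigma^{-1} mu
    a = ones @ inv_ones                    # 1' Sigma^{-1} 1
    b = ones @ inv_mu                      # 1' Sigma^{-1} mu
    w_min_var = inv_ones / a               # global minimum-variance portfolio
    tilt = inv_mu - (b / a) * inv_ones     # zero-sum direction, scaled by tau
    return w_min_var + tau * tilt


# Hypothetical annualized inputs for three assets
mu = np.array([0.08, 0.12, 0.10])
cov = np.array([[0.040, 0.006, 0.004],
                [0.006, 0.090, 0.010],
                [0.004, 0.010, 0.060]])

taus = np.arange(20) / 100                 # 0.00, 0.01, ..., 0.19 as in the grid above
W = np.array([tau_weights(mu, cov, t) for t in taus])
print(W.sum(axis=1))                       # each row sums to 1
print(np.allclose(np.diff(W, axis=0), W[1] - W[0]))  # constant step per 0.01 in tau
```

Because the tilt sums to zero, every τ keeps the budget constraint, and larger τ simply leans harder on the assets with high $\Sigma^{-1}\mu$, which is how weights above 1 and below 0 arise.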

[83]: weights_df_dtale = dtale.show(weights_df, ignore_duplicate=True)
      weights_df_dtale


[85]: metric_dtale_test = dtale.show(metric_df_test.T, ignore_duplicate=True)
      metric_dtale_test


II Side Research

3 Causal Inference on S&P 500 Stocks’ β
3.1 Problem Definition
3.2 Data Processing
3.3 Market Average Return and Benchmark $\beta_{120}$
3.4 Model Evaluation
3.5 Exploration of Additional Models
3.6 Potential Improvements and Recommendations
3.7 Acknowledgement
3.8 Appendix

3. Causal Inference on S&P 500 Stocks’ β

3.1 Problem Definition

We are given a data set consisting of S&P 500 stock time series, each with a time index and raw daily returns. The central theme of this project is to estimate the causal β of S&P 500 stocks with respect to the S&P 500 market return. For convenience, we use the following definitions.

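Definition 3.1.1 — Returns. Let $\text{Raw\_Return}(A, t)$ be the raw daily return of stock A on day t, and let $\mathcal{A}_t$ be the whole basket of S&P 500 stocks at time t. We let

$$ \text{Market\_Return}(A, t) = \sum_{A' \in \mathcal{A}_t} w_{A',t} \times \text{Raw\_Return}(A', t) $$

denote the market return on day t, where $w_{A',t}$ is the given market-capitalization weight of stock $A'$ in the basket, expressed as a fraction of the total so that the weights sum to 1. The market return is the same for every A; the argument A only keeps the notation aligned with the stock being studied.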
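A minimal NumPy sketch of this cap-weighted sum for a single day, assuming (as the division by 100 in the code chunk of Section 3.3 suggests) that the stored S&P 500 weights are in percent; the three-stock cross-section is made up:

```python
import numpy as np

# Hypothetical one-day cross-section of three index members
ret_raw = np.array([0.01, -0.02, 0.03])       # raw daily returns (decimals)
sp_weight_pct = np.array([50.0, 30.0, 20.0])  # index weights in percent (sum to 100)

# Cap-weighted market return: sum over A of w_A * Raw_Return(A, t), w_A as fractions
market_ret = np.dot(ret_raw, sp_weight_pct) / 100
print(market_ret)
```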

Remark. A stock's S&P 500 membership is not constant over time. This results in NaN values at the beginning or the end of some stock time series.

3.2 Data Processing

The original data set is stored in .h5 (HDF5) format. The data file can be found at the following link:

https://student.cs.uwaterloo.ca/~w3zhuo/Public_Notes/ACTSC372/sp500.h5

To facilitate the analysis in later parts, we consolidate the data so that ticker-level information can be retrieved with ease.

import h5py as h5
import numpy as np

f = h5.File('sp500.h5', 'r')
ticker_array = np.array(f['ut']['axis1_level1'])
unique_key_array = np.array(f['ut']['axis1_label1'])
date_array = np.array(f['ut']['axis1_label0'])
data_table_header = np.array(f['ut']['block0_items'])
data_table = np.array(f['ut']['block0_values'])
# map each column name (bytes) to its full column of values
data_table_master = dict(zip(data_table_header, np.transpose(data_table)))
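The step that builds the per-ticker dictionary `sp_data` (used below as `sp_data[b'AAPL']`) is not shown here. A minimal pandas sketch of the same idea, assuming a hypothetical long-format table with one row per (ticker, date_index) pair; the column names and the name `sp_data_sketch` are illustrative only:

```python
import pandas as pd

# Hypothetical long-format table: one row per (ticker, date_index)
long_df = pd.DataFrame({
    "ticker": [b"AAPL", b"AAPL", b"DTE", b"DTE", b"DTE"],
    "date_index": [0, 1, 0, 1, 2],
    "ret_raw": [0.010, -0.004, 0.002, 0.001, -0.003],
})

# One date-sorted DataFrame per ticker, keyed by the ticker bytes
sp_data_sketch = {
    ticker: group.drop(columns="ticker").sort_values("date_index").reset_index(drop=True)
    for ticker, group in long_df.groupby("ticker")
}
print(sp_data_sketch[b"DTE"])
```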

Example 3.1 — AAPL Sample Output.

sp_data[b'AAPL'].head()

3.3 Market Average Return and Benchmark $\beta_{120}$

Based on our definition of the market return, we compute $\text{Market\_Return}(A, t)$ for all $1 \le t \le T_{\max}$, where $T_{\max} = 5030$ in this data set, using the following code chunk.

def calc_date_market_ret(date):
    date_key = np.where(date_array == date)
    date_key = list(date_key[0])

    date_ret_raw = data_table_master[b'ret_raw'][date_key]
    date_sp_weight = data_table_master[b'sp_weight'][date_key]

    return np.dot(date_ret_raw, date_sp_weight)

date_range = np.arange(np.max(date_array) + 1)

date_market_ret = dict(
    date = date_range,
    market_ret = np.array([calc_date_market_ret(date) / 100 for date in date_range])
)
date_market_ret_df = pd.DataFrame.from_dict(date_market_ret)
date_market_ret_df['cumulative_ret'] = date_market_ret_df.market_ret.map(
    lambda x: 1 + x).cumprod().map(lambda x: (x ** (1/252) - 1))

The resulting DataFrame can be found in the attached notebook, as it would be too long to include here.


This definition of the market average return is consistent with the S&P 500 index return, as it weights each stock by its share of the total S&P 500 market capitalization. Such a weighted sum is common practice when calculating daily average returns.

3.3.1 Estimation Benchmark: $\beta_{120}$

The benchmark $\beta_{120}$ is defined as follows.

Definition 3.3.1 — $\beta_{120}$. For stock A on day T, $\beta_{120}(A, T)$ is the slope of the weighted linear regression of $\{\text{Raw\_Return}(A, t) : t_0 \le t < T\}$ on $\{\text{Market\_Return}(A, t) : t_0 \le t < T\}$, where $t_0$ is the first day on which stock A has data. The weights are defined using a half-life of 120 days, i.e.,

$$ \rho_t = 0.5^{(t - t_0)/120}, \quad t_0 \le t < T. $$

This benchmark is a special case of weighted least squares (WLS). The idea of WLS is to give different weights (penalty factors) to the unexplained residuals, as shown below:

$$ L(\vec{\beta}; (x_i, y_i)) = \sum_{i=1}^{n} \rho_i \left(y_i - (\beta_0 + \beta_1 x_i)\right)^2 $$

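We specialize the WLS by fixing the weights $\rho_i$ to the exponential schedule of Definition 3.3.1. The intercept $\beta_0$ is still estimated, but it is expected to be close to 0.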
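The objective above has the closed-form minimizer $\hat{\vec\beta} = (X^\top W X)^{-1} X^\top W y$ with $W = \operatorname{diag}(\rho_i)$, which is the weighted sum of squares that statsmodels' `WLS` minimizes when given `weights=rho`. A NumPy sketch on synthetic data (the helpers `decay_weights` and `wls_fit` are hypothetical, not part of the project code). With $\rho_t = 0.5^{(t - t_0)/120}$ the oldest observation gets weight 1 and newer ones progressively less:

```python
import numpy as np


def decay_weights(n, half_life=120):
    """rho_t = 0.5 ** ((t - t0) / half_life) for t = t0, ..., t0 + n - 1."""
    return 0.5 ** (np.arange(n) / half_life)


def wls_fit(x, y, w):
    """Minimize sum_i w_i * (y_i - b0 - b1 * x_i) ** 2; return (b0, b1)."""
    X = np.column_stack([np.ones_like(x), x])
    XtW = X.T * w                      # X' W without forming the n-by-n matrix W
    b0, b1 = np.linalg.solve(XtW @ X, XtW @ y)
    return b0, b1


rng = np.random.default_rng(0)
n = 500
market = rng.normal(0.0, 0.01, n)                        # synthetic market returns
stock = 0.0002 + 0.8 * market + rng.normal(0.0, 0.005, n)
rho = decay_weights(n)
print(rho[0], rho[120])                                  # 1.0 0.5
print(wls_fit(market, stock, rho))
```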

• One reason to consider such exponentially decaying weights is to penalize the model more when it fails to explain older data, and less for the more recent residuals. This is only an observation, but one rationale for the practice is that investors place more trust in a stock's fundamentals and historical performance. As for the 120-day half-life, it is close to half the number of trading days in a year, which may reflect an assumption about how quickly the significance of data decays.

• The framework that justifies assuming a linear relationship between the market return and an individual stock's return is the well-known CAPM:

$$ r_A = r_f + \beta (r_M - r_f) $$

where $r_f$ is the risk-free rate, usually the 1-year US Treasury bill rate. As we shall see empirically, it is fairly small, and even smaller in the recent low-interest-rate era.
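Rearranging the CAPM equation shows why the fitted intercept is expected to be close to 0 when $\text{Raw\_Return}(A, t)$ is regressed on $\text{Market\_Return}(A, t)$ (a sketch that treats $r_f$ as constant over the sample):

```latex
r_A = r_f + \beta\,(r_M - r_f)
    = \underbrace{(1-\beta)\, r_f}_{\beta_0} + \underbrace{\beta}_{\beta_1}\, r_M
```

For example, a 1% annual risk-free rate is roughly $0.01/252 \approx 4 \times 10^{-5}$ per trading day, so for any β between 0 and 2 the implied intercept satisfies $|\beta_0| = |1-\beta|\, r_f \lesssim 4 \times 10^{-5}$.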

To compute $\beta_{120}(A, T)$ for any stock and any in-sample date T, we use the following code chunk.

import numpy as np
import statsmodels.api as sm

bench_data_dict = dict()   # cache: (date, ticker) -> (fitted results, beta)

def bench_beta(date, ticker):
    if (date, ticker) in bench_data_dict:
        return bench_data_dict[(date, ticker)]
    ind_stock = sp_data[ticker]
    start_date = min(ind_stock.date_index)
    # stock and market returns on days start_date <= t < date
    ret_raw_stock = np.array(ind_stock[ind_stock.date_index < date][b'ret_raw'])
    market_ret = np.array(date_market_ret_df[(date_market_ret_df.date < date) &
                                             (date_market_ret_df.date >= start_date)].market_ret)
    # half-life of 120 days: rho_t = 0.5 ** ((t - t0) / 120)
    decay_weights = 0.5 ** ((np.arange(start_date, date) - start_date) / 120)
    X = sm.add_constant(market_ret)
    Y = ret_raw_stock
    results = sm.WLS(Y, X, weights = decay_weights).fit()
    bench_data_dict[(date, ticker)] = (results, results.params[1])
    return (results, results.params[1])

Example 3.2 — DTE Energy. We compute one $\beta_{120}$ directly using the function bench_beta().

bench_beta(5030, b'DTE')[0].summary()

## Output:
                            WLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.034
Model:                            WLS   Adj. R-squared:                  0.034
Method:                 Least Squares   F-statistic:                     176.4
Date:                Sun, 11 Oct 2020   Prob (F-statistic):           1.37e-39
Time:                        22:33:52   Log-Likelihood:                -15332.
No. Observations:                5029   AIC:                         3.067e+04
Df Residuals:                    5027   BIC:                         3.068e+04
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          0.0011      0.000      3.977      0.000       0.001       0.002
x1             0.3218      0.024     13.281      0.000       0.274       0.369
==============================================================================
Omnibus:                     4812.539   Durbin-Watson:                   1.775
Prob(Omnibus):                  0.000   Jarque-Bera (JB):          4917471.580
Skew:                           3.553   Prob(JB):                         0.00
Kurtosis:                     156.027   Cond. No.                         89.0
==============================================================================

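As we can see from the output, the intercept is fairly small (0.0011) and $\beta_{120}(\text{DTE}, 5030) = 0.3218$. The p-value of 0.000 gives very strong evidence against $H_0: \beta = 0$, i.e., a statistically significant linear association between DTE's return and the market return. The low $R^2$ of 0.034, however, shows that the market return explains only a small share of DTE's daily variation.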

3.4 Model Evaluation

The general framework is based on the following target: we need $\beta_{120}(A, T) \times \text{Market\_Return}(A, T)$ to be a "good" estimate of $\text{Raw\_Return}(A, T)$. Since these are time-series data, we do not use traditional (shuffled) cross-validation, because of the time dependence between observations. Instead, we define the following evaluation metrics.

Definition 3.4.1 — Pointwise Squared Difference (PSD). For a given stock A, date T, and estimate $\hat\beta(A, T)$, the pointwise squared difference (PSD) is defined as

$$ \text{PSD}(A, T) = \left[\hat\beta(A, T) \times \text{Market\_Return}(A, T) - \text{Raw\_Return}(A, T)\right]^2 $$

Remark. PSD is a very straightforward metric that evaluates the estimation performance of $\beta_{120}$ on a single day T. However, it does not capture the whole picture of all the preceding data points.

Definition 3.4.2 — Root Cumulative Squared Difference (RCSD). For a given stock A, date T, and estimate $\hat\beta(A, T)$, the root cumulative squared difference (RCSD) is defined as

$$ \text{RCSD}(A, T) = \sqrt{\frac{1}{T - t_0} \sum_{t=t_0}^{T-1} \left[\hat\beta(A, T) \times \text{Market\_Return}(A, t) - \text{Raw\_Return}(A, t)\right]^2} $$

Remark. RCSD is the root mean squared error (RMSE) of the fixed estimate $\hat\beta(A, T)$ over all preceding days, so it captures more about the effectiveness of the WLS model than a single-day PSD.
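A vectorized NumPy sketch of Definitions 3.4.1 and 3.4.2 on made-up, already-aligned return arrays (the helper names `psd` and `rcsd` are hypothetical; the project's own loop-based versions appear below):

```python
import numpy as np


def psd(beta_hat, market_ret_T, raw_ret_T):
    """Pointwise squared difference on a single day T (Definition 3.4.1)."""
    return (beta_hat * market_ret_T - raw_ret_T) ** 2


def rcsd(beta_hat, market_ret, raw_ret):
    """Root mean squared difference over aligned days t0, ..., T-1
    (Definition 3.4.2); the same beta_hat is applied to every day."""
    resid = beta_hat * np.asarray(market_ret) - np.asarray(raw_ret)
    return float(np.sqrt(np.mean(resid ** 2)))


# Hypothetical aligned daily returns for days t0, ..., T-1
market = np.array([0.01, -0.02, 0.015, 0.0])
stock = np.array([0.012, -0.018, 0.01, 0.002])

print(psd(0.9, 0.01, 0.012))
print(rcsd(0.9, market, stock))
```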

PSD and RCSD focus on model performance for an individual stock's return relative to the market return. We also want a more general metric that measures model performance over a larger set of S&P 500 companies. To that end, we define:

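Definition 3.4.3 — S&P 500 Estimation Variance (SPEV). For the date $T = T_{\max}$ and estimates $\hat\beta(A, T_{\max})$, the S&P 500 estimation variance (SPEV) is defined as

$$ \text{SPEV} = \frac{1}{|I|} \sum_{A \in I} \text{RCSD}(A, T_{\max}) $$

where $I = \{A \in \mathcal{A}_{T_{\max}} : w_{A, T_{\max}} \ge 0.01\}$. Despite its name, SPEV is an average of RMSE values rather than a variance.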


Remark. For computational convenience, I is the set of S&P 500 stocks that each account for at least 1% of the total market capitalization. It is computed by the following code chunk.

date_key = np.where(date_array == 5030)
# Note: date_key is not used below, so this filter keeps every stock whose
# value in column -2 of data_table is >= 1 on any date, not only on date 5030
top_stocks_index = unique_key_array[np.where(data_table[:, -2] >= 1)]
top_stocks = np.unique(ticker_array[top_stocks_index])

## Outputs:
array([b'AAPL', b'AMZN', b'BRK-B', b'FB', b'GOOG', b'GOOGL', b'HD',
       b'JNJ', b'JPM', b'MSFT', b'PG', b'T', b'V', b'VZ', b'XOM'],
      dtype='|S5')
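Definition 3.4.3 restricts I to the weights on the final date $T_{\max}$ only. A pandas sketch of that selection, followed by SPEV as the plain average of RCSD over I, on a made-up panel (the column names `ticker`, `date` and `sp_weight` in percent, and the RCSD values, are assumptions for illustration):

```python
import pandas as pd

# Hypothetical panel of (ticker, date, index weight in percent)
panel = pd.DataFrame({
    "ticker": ["AAA", "BBB", "CCC", "AAA", "BBB", "CCC"],
    "date": [5029, 5029, 5029, 5030, 5030, 5030],
    "sp_weight": [1.2, 0.9, 1.1, 1.3, 1.0, 0.8],
})

# Set I: stocks with weight >= 1% on the final date only
t_max = panel["date"].max()
last_day = panel[panel["date"] == t_max]
top = sorted(last_day.loc[last_day["sp_weight"] >= 1, "ticker"])
print(top)

# SPEV: average of the per-stock RCSD values over I
rcsd_by_ticker = {"AAA": 0.012, "BBB": 0.020, "CCC": 0.030}  # hypothetical values
spev = sum(rcsd_by_ticker[t] for t in top) / len(top)
print(spev)
```

Here CCC clears the 1% threshold on day 5029 but not on day 5030, so it is excluded under the final-date rule.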

Besides the numerical statistics we can compute for model evaluation, we can also introduce some graphical summaries.

• Scatter plot with regression line: used to see whether outliers have significantly affected the regression.

• Line plot of $\hat\beta$ over time: used to see whether the estimated $\hat\beta$ is stable over time and how soon it converges.

All of these metrics are defined in a general way, so that we can apply them to the additional models introduced later.

def beta_evaluation_pointwise(bench_method, date, ticker):
    diff = (bench_method(date, ticker)[1] *
            date_market_ret_df[date_market_ret_df.date == date].market_ret -
            sp_data[ticker][sp_data[ticker].date_index == date][b'ret_raw'].tolist()[0]) ** 2
    return diff.values[0]

def beta_evaluation_cum(bench_method, date, ticker):
    ind_stock = sp_data[ticker]
    start_date = min(ind_stock.date_index)
    date_index_array = np.arange(start = start_date + 1, stop = date)
    local_beta = bench_method(date, ticker)[1]
    f = lambda d: ((local_beta * date_market_ret_df[date_market_ret_df.date == d].market_ret -
                    ind_stock[ind_stock.date_index == d][b'ret_raw'].tolist()[0]) ** 2).values[0]
    return np.sqrt(np.mean([f(d) for d in date_index_array]))

def sp_model_evaluation(method):
    f = lambda ticker: beta_evaluation_cum(method, max(sp_data[ticker].date_index), ticker)
    return np.mean([f(ticker) for ticker in top_stocks])

```python
import numpy as np
import matplotlib.pyplot as plt


def scatter_summary_max(ticker, date, method):
    # Scatter of the raw stock return against the market return, with the fitted line
    ind_stock = sp_data[ticker]
    start_date = min(ind_stock.date_index)

    LS = method(date, ticker)[0]  # fitted statsmodels results object

    y = np.array(ind_stock[ind_stock.date_index < date][b'ret_raw'])
    x = np.array(date_market_ret_df[(date_market_ret_df.date < date) &
                                    (date_market_ret_df.date >= start_date)].market_ret)

    fig, ax = plt.subplots(figsize=(10, 8))
    ax.plot(x, y, 'o', label="Data")
    ax.plot(x, LS.fittedvalues, 'g--.', label="Fitted")
    ax.legend(loc="best")
    name = str(ticker)[2:-1]  # strip the b'...' wrapper of the bytes ticker
    plt.xlabel('Market Return')
    plt.ylabel(name + " Raw Return")
    plt.title(name + " Raw Return vs. Market Return (All Dates)")


def running_beta(ticker, method):
    # Line plot of the causal beta estimate, re-fitted at every date
    ind_stock = sp_data[ticker]
    start_date = min(ind_stock.date_index)
    end_date = max(ind_stock.date_index)

    x = np.arange(start=start_date + 2, stop=end_date)
    y = np.array([method(date, ticker)[1] for date in x])

    fig, ax = plt.subplots(figsize=(12, 8))
    ax.plot(x, y, 'b-', label="Estimated Causal Beta")
    ax.legend(loc="best")
    name = str(ticker)[2:-1]
    plt.xlabel("Time Index")
    plt.ylabel(name + " Estimated Beta")
    plt.title(name + " Estimated Causal Beta vs. Time")
```
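Because `running_beta` refits the model at every date, drawing the line plot costs one full regression per day. For the unweighted (OLS) case, the whole expanding-window path can be computed in a single pass from cumulative sums, since the OLS slope with an intercept is the sample covariance divided by the sample variance. The sketch below assumes two aligned 1-D arrays (market return and stock return) with no missing days; `expanding_ols_beta` is a hypothetical helper, not part of the project code, and the cumulative-sum formula can lose precision on very long or poorly scaled series.

```python
import numpy as np


def expanding_ols_beta(market_ret, stock_ret, min_obs=3):
    """Expanding-window OLS slope (with intercept) for every prefix of the data.

    Entry i uses observations 0..i inclusive; entries with fewer than
    min_obs observations (or zero market variance) are NaN.
    """
    x = np.asarray(market_ret, dtype=float)
    y = np.asarray(stock_ret, dtype=float)
    n = np.arange(1, len(x) + 1)          # number of observations in each prefix
    sx, sy = np.cumsum(x), np.cumsum(y)
    sxx, sxy = np.cumsum(x * x), np.cumsum(x * y)
    cov = sxy - sx * sy / n               # n times the sample covariance
    var = sxx - sx * sx / n               # n times the sample variance
    beta = np.full(len(x), np.nan)
    ok = (n >= min_obs) & (var > 0)
    beta[ok] = cov[ok] / var[ok]
    return beta


# Synthetic check data: true slope 1.2
rng = np.random.default_rng(0)
x = rng.normal(0.0, 0.01, 300)
y = 0.0002 + 1.2 * x + rng.normal(0.0, 0.005, 300)
beta_path = expanding_ols_beta(x, y)   # one pass instead of 300 separate regressions
```

With this indexing, the causal estimate that uses only days strictly before day t is `beta_path[t - 1]`.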

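Example 3.3 — DTE and AAPL Model Evaluation. For T = Tmax, we have

| Stock | PSD | RCSD |
|---|---|---|
| DTE | 3.151 × 10⁻⁵ | 0.011592 |
| AAPL | 3.577 × 10⁻⁸ | 0.024149 |

The SPEV of the benchmark model, which is averaged over I and therefore does not depend on the individual stock, is 0.0158983.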


So far, we cannot say much about the numerical summaries, as there is no competing model yet. However, the scatter plots show that the benchmark captures the general trend, though not impressively. The line plots of β̂ over time both show large swings at the beginning that then damp out, which is understandable since there are fewer data points early on. Both plots converge fairly quickly, before day 1000.

3.5 Exploration of Additional Models

In this section, we explore some alternatives to the benchmark β120 estimation.

3.5.1 Ordinary Linear Regression (OLS)

Our benchmark β120 model uses an exponentially decaying weighted linear regression. The decaying weights build in a prior assumption about how the individual stock return relates to the running market return over time. This assumption might be too strict, so it does no harm to examine the simplest case, ordinary least squares, which treats all residuals equally by setting ρt = 1 for every t.
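To make the role of the weights explicit (notation ours: m_t is the market return and r_t the stock return on day t), the weighted least-squares slope with an intercept has the closed form below. Setting ρt = 1 for all t turns the weighted means into ordinary means and recovers the OLS slope, the sample covariance of m and r divided by the sample variance of m.

$$
\hat\beta = \frac{\sum_t \rho_t\,(m_t - \bar m_\rho)(r_t - \bar r_\rho)}{\sum_t \rho_t\,(m_t - \bar m_\rho)^2},
\qquad
\bar m_\rho = \frac{\sum_t \rho_t m_t}{\sum_t \rho_t},
\qquad
\bar r_\rho = \frac{\sum_t \rho_t r_t}{\sum_t \rho_t}.
$$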

```python
import numpy as np
import statsmodels.api as sm

ols_data_dict = dict()  # cache of fitted models keyed by (date, ticker)


def ols_beta(date, ticker):
    if (date, ticker) in ols_data_dict:
        return ols_data_dict[(date, ticker)]
    ind_stock = sp_data[ticker]
    start_date = min(ind_stock.date_index)
    # stock and market returns strictly before `date` (causal estimate)
    ret_raw_stock = np.array(ind_stock[ind_stock.date_index < date][b'ret_raw'])
    market_ret = np.array(date_market_ret_df[(date_market_ret_df.date < date) &
                                             (date_market_ret_df.date >= start_date)].market_ret)
    X = sm.add_constant(market_ret)
    Y = ret_raw_stock
    results = sm.OLS(Y, X).fit()
    ols_data_dict[(date, ticker)] = (results, results.params[1])  # params[1] is the slope beta
    return (results, results.params[1])
```

3.5.2 Reversed Weighted Linear Regression

We were not satisfied with our β120 estimate, either numerically and graphically or in terms of its decaying assumption. From a time-series perspective, we usually model the stock price as a stochastic process (either a discrete- or continuous-time Markov chain, or a martingale). Such time dependency implies that more recent residuals should receive larger weights (penalty factors). Thus, we reverse the weight factors, as implemented below.

```python
import numpy as np
import statsmodels.api as sm

reverse_data_dict = dict()


## Cache for efficiency
def reverse_beta(date, ticker):
    if (date, ticker) in reverse_data_dict:
        return reverse_data_dict[(date, ticker)]
    ind_stock = sp_data[ticker]
    start_date = min(ind_stock.date_index)
    ret_raw_stock = np.array(ind_stock[ind_stock.date_index < date][b'ret_raw'])
    market_ret = np.array(date_market_ret_df[(date_market_ret_df.date < date) &
                                             (date_market_ret_df.date >= start_date)].market_ret)
    # 0.5 ** (age / 120), flipped so the most recent day gets weight 1 and the
    # weight halves every 120 days going back in time (half-life = 120)
    days = np.arange(start=start_date, stop=date)
    decay_weights = np.flip(0.5 ** ((days - start_date) / 120))
    X = sm.add_constant(market_ret)
    Y = ret_raw_stock
    results = sm.WLS(Y, X, weights=decay_weights).fit()
    reverse_data_dict[(date, ticker)] = (results, results.params[1])
    return (results, results.params[1])
```
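The only thing the reversal changes is which end of the sample receives weight 1. The self-contained sketch below builds both weight vectors for a 120-day half-life and computes the weighted least-squares slope directly. The helper names `halflife_weights` and `wls_slope` are hypothetical; the un-flipped vector is presumably the direction the benchmark uses, given that `reverse_beta` flips it.

```python
import numpy as np


def halflife_weights(n_days, halflife=120.0, reverse=False):
    """Exponential weights 0.5 ** (age / halflife) for n_days observations.

    reverse=False: weight 1 on the first (oldest) day, like the array before np.flip.
    reverse=True:  weight 1 on the most recent day, like decay_weights in reverse_beta.
    """
    w = 0.5 ** (np.arange(n_days) / halflife)
    return w[::-1] if reverse else w


def wls_slope(x, y, w):
    """Slope b of the weighted least-squares fit y ~ a + b * x with weights w."""
    x, y, w = (np.asarray(v, dtype=float) for v in (x, y, w))
    x_bar = np.sum(w * x) / np.sum(w)
    y_bar = np.sum(w * y) / np.sum(w)
    return np.sum(w * (x - x_bar) * (y - y_bar)) / np.sum(w * (x - x_bar) ** 2)


w_rev = halflife_weights(500, reverse=True)
print(w_rev[-1], w_rev[-121])  # 1.0 0.5 (most recent day, and 120 days before it)
```

Passing `halflife_weights(len(x), reverse=True)` as `w` to `wls_slope` mirrors the slope that `sm.WLS` returns in `reverse_beta`, while `np.ones(len(x))` gives the OLS slope.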

R It would certainly be worthwhile to optimize the half-life parameter (halflife = 120), but this section is only an attempt to challenge the benchmark methodology.

3.5.3 Robust Regression (Huber Regression)

From the CAPM, we know that diversifying into the market portfolio can only eliminate non-systematic risk; market-wide shocks remain. Sudden market events can easily create outliers in return data, for example 9/11, the 2008 financial crisis, and the COVID-19 sell-off in March 2020. This calls for a more robust regression framework, and we use the well-known Huber loss function

$$
\rho_k(r) =
\begin{cases}
\frac{1}{2} r^2, & |r| \le k, \\
k|r| - \frac{1}{2} k^2, & |r| > k.
\end{cases}
$$

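In practice, it is common to set k = 1.345 (with residuals standardized by a robust scale estimate), which gives 95% efficiency relative to least squares when the errors are normal while still limiting the influence of outliers. Our approach uses the iteratively reweighted least squares (IRLS) method to find β. IRLS repeatedly solves a weighted least-squares problem, recomputing each observation's weight from its current residual (for the Huber loss, weight 1 when |r| ≤ k and k/|r| otherwise) until the estimates converge. Although it is more computationally expensive than a single least-squares fit, it finds the coefficients that minimize the Huber objective above rather than the plain squared error.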
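The IRLS loop described above can be sketched from scratch in NumPy. This is a minimal illustration under stated assumptions, not statsmodels' exact internals: it starts from OLS, rescales the residuals by a normalized median absolute deviation at every step, and stops when the coefficients stop moving. The helper names and the synthetic data (a true β of 1.3 plus ten large idiosyncratic outliers) are hypothetical.

```python
import numpy as np


def huber_weights(u, k=1.345):
    """IRLS weights psi(u)/u for the Huber loss: 1 inside [-k, k], k/|u| outside."""
    a = np.abs(u)
    return np.where(a <= k, 1.0, k / np.maximum(a, 1e-12))


def irls_huber(x, y, k=1.345, tol=1e-10, max_iter=200):
    """Fit y ~ a + b * x by Huber M-estimation using IRLS; returns array [a, b]."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(x), x])
    params = np.linalg.lstsq(X, y, rcond=None)[0]  # start from OLS
    for _ in range(max_iter):
        resid = y - X @ params
        # robust scale: median absolute deviation, normalized for Gaussian errors
        scale = np.median(np.abs(resid - np.median(resid))) / 0.6745
        sw = np.sqrt(huber_weights(resid / scale, k))
        # weighted least squares = ordinary least squares on rows scaled by sqrt(weight)
        new_params = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
        converged = np.max(np.abs(new_params - params)) < tol
        params = new_params
        if converged:
            break
    return params


rng = np.random.default_rng(0)
x = rng.normal(0.0, 0.01, 500)
y = 1.3 * x + rng.normal(0.0, 0.003, 500)
x = np.concatenate([x, np.full(10, 0.02)])   # ten outlier days
y = np.concatenate([y, np.full(10, -0.10)])

ols_b = np.polyfit(x, y, 1)[0]
huber_b = irls_huber(x, y)[1]
print(f"OLS beta {ols_b:.3f}, Huber beta {huber_b:.3f}")
```

On data like this the OLS slope is dragged far below 1.3 by the ten outliers, while the Huber slope stays close to it, because each outlier's weight shrinks to roughly k/|u|.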

```python
import numpy as np
import statsmodels.api as sm

huber_data_dict = dict()  # cache of fitted models keyed by (date, ticker)


def huber_beta(date, ticker):
    if (date, ticker) in huber_data_dict:
        return huber_data_dict[(date, ticker)]
    ind_stock = sp_data[ticker]
    start_date = min(ind_stock.date_index)
    ret_raw_stock = np.array(ind_stock[ind_stock.date_index < date][b'ret_raw'])
    market_ret = np.array(date_market_ret_df[(date_market_ret_df.date < date) &
                                             (date_market_ret_df.date >= start_date)].market_ret)
    X = sm.add_constant(market_ret)
    Y = ret_raw_stock
    # robust linear model with the Huber norm (k = 1.345), fitted by IRLS
    huber = sm.RLM(Y, X, M=sm.robust.norms.HuberT(t=1.345))
    results = huber.fit()
    huber_data_dict[(date, ticker)] = (results, results.params[1])
    return (results, results.params[1])
```

3.5.4 Model Evaluation: Frequentist Approaches

So far, we have discussed three linear regression approaches that are usually favoured by frequentists.

Example 3.4 — AAPL: All Model Evaluations. We summarize the computed evaluation metrics in the following table.

| Model | β̂ | PSD | RCSD | SPEV |
|---|---|---|---|---|
| Benchmark | 2.4743 | 3.5772 × 10⁻⁸ | 0.024149 | 0.0158983 |
| OLS | 1.3432 | 1.04801 × 10⁻⁵ | 0.020787 | 0.0151624 |
| Reversed WLS | 1.4562 | 8.38024 × 10⁻⁶ | 0.020822 | 0.0153588 |
| Robust Regression | 1.2777 | 1.18044 × 10⁻⁵ | 0.020798 | 0.0151754 |

As this example shows, the estimates of β for AAPL (Apple Inc.) are roughly consistent among OLS, reversed WLS, and robust regression, while the benchmark β120 estimate (2.4743) appears to be off. Moreover, OLS and robust regression are the top performers in terms of both RCSD and the overall SPEV, which suggests that they are appropriate estimators of β.

R Even though this is just an example on a single stock, SPEV's signal seems consistent with RCSD, and together they separate the good estimators from the mediocre ones.

Furthermore, we can look at the scatter plots and line plots.



The scatter plot shows that OLS, reversed WLS, and robust regression capture the general linear trend without being distracted by the outliers. The line plot, however, indicates that the reversed WLS estimate keeps oscillating without converging, even at the most recent dates. This plot also suggests that OLS and robust regression should be selected.

For the implementations of the summary plots, please see 3.11 and 3.12.

3.5.5 Bayesian Linear Regression

The previous methods take a frequentist point of view of this problem. It is also worthwhile to consider Bayesian linear regression to obtain a posterior distribution of β. Because the posterior combines a prior distribution with the observed data, Bayesian linear regression does not depend entirely on the sample data and can generalize to more scenarios, which is especially relevant in recent volatile markets.
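The `posterior` function below implements the standard conjugate update for a Gaussian linear model. Assuming (as the code implies, though it is not stated explicitly) a zero-mean isotropic Gaussian prior $\mathbf{w} \sim \mathcal{N}(\mathbf{0}, \alpha^{-1}\mathbf{I})$ on the coefficient vector $\mathbf{w} = (\text{intercept}, \beta)^\top$ and Gaussian noise with precision $\beta_{\text{noise}}$ (the argument called `beta` in the code, not the CAPM β), the posterior is again Gaussian. Here $\boldsymbol{\Phi}$ is the design matrix with rows $(1, m_t)$ and $\mathbf{t}$ is the vector of stock returns.

$$
\begin{aligned}
p(\mathbf{w} \mid \mathbf{t}) &= \mathcal{N}(\mathbf{w} \mid \mathbf{m}_N, \mathbf{S}_N), \\
\mathbf{S}_N^{-1} &= \alpha \mathbf{I} + \beta_{\text{noise}}\, \boldsymbol{\Phi}^\top \boldsymbol{\Phi}, \\
\mathbf{m}_N &= \beta_{\text{noise}}\, \mathbf{S}_N \boldsymbol{\Phi}^\top \mathbf{t}
= \left(\tfrac{\alpha}{\beta_{\text{noise}}} \mathbf{I} + \boldsymbol{\Phi}^\top \boldsymbol{\Phi}\right)^{-1} \boldsymbol{\Phi}^\top \mathbf{t}.
\end{aligned}
$$

The last form shows that the posterior mean is a ridge estimate with penalty $\alpha/\beta_{\text{noise}}$: as $\alpha \to 0$ it tends to the OLS estimate, and a larger prior precision shrinks the Bayesian β toward 0.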

```python
import numpy as np
import matplotlib.pyplot as plt


def posterior(Phi, t, alpha, beta, return_inverse=False):
    # Gaussian posterior over weights: prior N(0, alpha^-1 I), noise precision beta
    S_N_inv = alpha * np.eye(Phi.shape[1]) + beta * Phi.T.dot(Phi)
    S_N = np.linalg.inv(S_N_inv)
    m_N = beta * S_N.dot(Phi.T).dot(t)

    if return_inverse:
        return m_N, S_N, S_N_inv
    else:
        return m_N, S_N


def expand(x, bf, bf_args=None):
    # Design matrix: a column of ones followed by the basis-function columns
    if bf_args is None:
        return np.concatenate([np.ones(x.shape), bf(x)], axis=1)
    else:
        return np.concatenate([np.ones(x.shape)] + [bf(x, bf_arg) for bf_arg in bf_args], axis=1)


def blr_beta(date, ticker, lr_num=6):
    # alpha (prior precision) and beta (noise precision) are read from
    # global variables that must be set before calling this function
    ind_stock = sp_data[ticker]
    start_date = min(ind_stock.date_index)
    ret_raw_stock = np.array(ind_stock[ind_stock.date_index < date][b'ret_raw'])
    market_ret = np.array(date_market_ret_df[(date_market_ret_df.date < date) &
                                             (date_market_ret_df.date >= start_date)].market_ret)

    X = market_ret.reshape(-1, 1)
    Y = ret_raw_stock.reshape(-1, 1)

    # Design matrix of training observations (intercept + market return)
    Phi = expand(X, lambda x: x)
    # The sampled lines are evaluated at the same market returns
    Phi_test = Phi

    plt.figure(figsize=(10, 8))

    # Mean and covariance matrix of posterior
    m, S = posterior(Phi, Y, alpha, beta)

    # Draw lr_num random weight samples from posterior and compute y values
    w_samples = np.random.multivariate_normal(m.ravel(), S, lr_num).T
    y_samples = Phi_test.dot(w_samples)
    plt.plot(X, Y, 'o', label="Data")

    for index, y in enumerate(y_samples.T):
        plt.plot(X, y, 'b-', label="Regression Line " + str(index))
    name = str(ticker)[2:-1]
    plt.xlabel('Market Return')
    plt.ylabel(name + " Raw Return")
    plt.title(name + " Raw Return vs. Market Return (All Dates)")
```

Example 3.5 — AAPL: Bayesian Linear Regression.

```python
blr_beta(5030, b'AAPL', 20)
```
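Example 3.5 only draws sample regression lines. If a single number is needed, the posterior mean of β and a credible interval can be read off m_N and S_N directly. The minimal, self-contained sketch below does this on synthetic data; the prior precision `alpha`, the assumed-known noise precision `noise_prec`, and the helper name `blr_posterior` are hypothetical choices, not values from the report.

```python
import numpy as np


def blr_posterior(x, y, alpha, noise_prec):
    """Posterior mean and covariance of (intercept, slope) under a N(0, alpha^-1 I)
    prior and Gaussian noise with known precision noise_prec."""
    Phi = np.column_stack([np.ones_like(x), x])
    S = np.linalg.inv(alpha * np.eye(2) + noise_prec * Phi.T @ Phi)
    m = noise_prec * S @ Phi.T @ y
    return m, S


rng = np.random.default_rng(1)
x = rng.normal(0.0, 0.01, 1000)            # synthetic market returns
y = 1.2 * x + rng.normal(0.0, 0.01, 1000)  # synthetic stock returns, true beta 1.2

m, S = blr_posterior(x, y, alpha=1e-6, noise_prec=1.0 / 0.01 ** 2)
beta_mean = m[1]
beta_sd = np.sqrt(S[1, 1])
lo, hi = beta_mean - 1.96 * beta_sd, beta_mean + 1.96 * beta_sd  # central 95% interval
print(f"posterior beta {beta_mean:.3f}, 95% credible interval [{lo:.3f}, {hi:.3f}]")
```

With a nearly flat prior (tiny `alpha`), the posterior mean coincides with the OLS slope; raising `alpha` pulls it toward 0.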

3.6 Potential Improvements and Recommendations

3.6.1 Sector Effect

In today's S&P 500, tech stocks make up a large proportion of the overall market capitalization. It would be naive to assume that the tech sector itself does not affect the β of a tech stock. In fact, tech stocks are usually considered to have a higher β than stocks in other sectors, where β measures the sensitivity of a stock's return to the market return. As we can see below, the comparison of cumulative returns between the Nasdaq-100 and the S&P 500 suggests a higher β in the tech sector.


Meanwhile, sectors such as utilities and mining usually have a lower β than the overall S&P 500. This can be incorporated into our modeling if we have data classifying stocks by sector.
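One way to act on this, sketched below under the assumption that a sector label is available for every stock, is a pooled regression with a sector interaction term, r = a + b·m + c·(m × tech), so that a tech stock's β is b + c and any other stock's β is b. The panel, the true β values, and the `tech` indicator are all synthetic and hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n_days = 750
market = rng.normal(0.0, 0.01, n_days)

# Hypothetical panel: three "tech" stocks with beta 1.4 and three others with beta 0.6
true_betas = {"TECH1": 1.4, "TECH2": 1.4, "TECH3": 1.4,
              "UTIL1": 0.6, "UTIL2": 0.6, "UTIL3": 0.6}
m_parts, tech_parts, r_parts = [], [], []
for name, beta_true in true_betas.items():
    m_parts.append(market)
    tech_parts.append(np.full(n_days, 1.0 if name.startswith("TECH") else 0.0))
    r_parts.append(beta_true * market + rng.normal(0.0, 0.01, n_days))

m = np.concatenate(m_parts)        # market return, stacked once per stock
tech = np.concatenate(tech_parts)  # 1 for tech stocks, 0 otherwise
r = np.concatenate(r_parts)        # stock returns

# Pooled regression r = a + b * m + c * (m * tech)
X = np.column_stack([np.ones_like(m), m, m * tech])
a, b, c = np.linalg.lstsq(X, r, rcond=None)[0]
print(f"non-tech beta ~ {b:.2f}, tech beta ~ {b + c:.2f}")
```

The same design extends to any categorical label (several sector dummies, or the value/growth and foreign/domestic labels discussed below) by adding one interaction column per category.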

3.6.2 Value or Growth?

Value stocks tend to have a β near 1 (at par with the S&P 500), while growth stocks tend to have a β higher than the S&P 500's. Two reasons come to mind.

• Investor perspectives: investors in value stocks are usually less inclined to adjust their positions in response to general market movements, while investors in growth stocks are more sensitive to general market movements because of the uncertain nature of these companies at their early stages.

• Cash flows: value companies usually generate stable cash flows and less volatile dividends, while growth companies often report unexpectedly high or low earnings (for example, TSLA in recent years, which is possibly why it is not yet included in the S&P 500).

This can be incorporated into the modeling if we can obtain the adjusted P/E ratios of all companies.

3.6.3 Foreign or Domestic?

The S&P 500 does not only contain US-headquartered companies. This might play a role in β estimation, since foreign companies might be less affected by US macroeconomic conditions or domestic policies when their operations and sources of earnings are overseas. This lower correlation would lead to a lower β than average. It can be incorporated into the modeling by labeling the location of each company's headquarters.


3.6.4 COVID-19

We are in unprecedented times: the pandemic has made the tech giants even larger, while companies with more retail-oriented businesses have tended to suffer. Recently, major market movements have sometimes been driven by the tech sector itself, for example at the start of the COVID-19 pandemic in the US and during SoftBank's large-scale purchases of tech stock options. This should be researched in further detail.

3.6.5 Link to Rates

It is apparent that the loose monetary policy maintained by the Fed has kept, and will continue to keep, the overnight lending rate near 0%. Such action has indeed induced several market surges. However, sectors such as financials tend to move in the opposite direction. In an era that requires the Fed to support the economy, any stock whose returns are correlated with rate changes will see its β change accordingly.
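One possible way to separate rate sensitivity from market sensitivity, sketched here as an assumption rather than something tested in this project, is to add daily changes in a short-term rate as a second regressor: r = a + β·m + γ·Δy. When rate changes co-move with the market, leaving Δy out biases the single-factor β̂ by γ·Cov(m, Δy)/Var(m), which is one concrete mechanism by which a stock's estimated β shifts with rates. All data below are synthetic.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2000
m = rng.normal(0.0, 0.01, n)                 # market return
dy = 0.5 * m + rng.normal(0.0, 0.01, n)      # rate change, partly co-moving with the market
r = 1.0 * m + 0.8 * dy + rng.normal(0.0, 0.01, n)  # true beta 1.0, rate loading 0.8

one = np.ones(n)
beta_single = np.linalg.lstsq(np.column_stack([one, m]), r, rcond=None)[0][1]
_, beta_two, gamma = np.linalg.lstsq(np.column_stack([one, m, dy]), r, rcond=None)[0]
print(f"single-factor beta {beta_single:.2f}; "
      f"two-factor beta {beta_two:.2f}, rate loading {gamma:.2f}")
```

Here the single-factor β̂ absorbs the rate effect and lands near 1.0 + 0.8 × 0.5 = 1.4, while the two-factor fit recovers β ≈ 1.0 and γ ≈ 0.8.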

3.7 Acknowledgement

Lastly, I would like to thank Cubist Systematic Strategies for providing this data set and this interesting topic for me to explore.


3.8 Appendix

```python
import numpy as np
import matplotlib.pyplot as plt


def scatter_summary(ticker, date):
    # Overlay the fitted lines of all four frequentist models on one scatter plot
    ind_stock = sp_data[ticker]
    start_date = min(ind_stock.date_index)

    fig, ax = plt.subplots(figsize=(10, 8))
    method_list = [bench_beta, ols_beta, reverse_beta, huber_beta]
    color_list = ["red", "green", "purple", "orange"]
    y = np.array(ind_stock[ind_stock.date_index < date][b'ret_raw'])
    x = np.array(date_market_ret_df[(date_market_ret_df.date < date) &
                                    (date_market_ret_df.date >= start_date)].market_ret)
    ax.plot(x, y, 'o', label="Data")
    for method, color in zip(method_list, color_list):
        LS = method(date, ticker)[0]
        # '--.' sets only the line style; the colour comes from color_list
        ax.plot(x, LS.fittedvalues, '--.', label=method.__name__, color=color)
    ax.legend(loc="best")

    name = str(ticker)[2:-1]
    plt.xlabel('Market Return')
    plt.ylabel(name + " Raw Return")
    plt.title(name + " Raw Return vs. Market Return (All Dates)")
```

```python
import numpy as np
import matplotlib.pyplot as plt


def running_beta_summary(ticker):
    # Estimated causal beta over time for all four frequentist models
    ind_stock = sp_data[ticker]
    start_date = min(ind_stock.date_index)
    end_date = max(ind_stock.date_index)
    fig, ax = plt.subplots(figsize=(12, 8))
    x = np.arange(start=start_date + 3, stop=end_date)
    method_list = [bench_beta, ols_beta, reverse_beta, huber_beta]
    color_list = ["red", "green", "purple", "orange"]
    for method, color in zip(method_list, color_list):
        y = np.array([method(date, ticker)[1] for date in x])
        ax.plot(x, y, '-', label=method.__name__, color=color)

    ax.legend(loc="best")
    name = str(ticker)[2:-1]
    plt.xlabel("Time Index")
    plt.ylabel(name + " Estimated Beta")
    plt.title(name + " Estimated Causal Beta vs. Time")
```
