25
Data-Driven Portfolio Optimization Utilizing Machine Learning MELINDA HSIEH RIDER UNIVERSITY 5/23/2018 SYMPOSIUM ON DATA SCIENCE AND STATISTICS 1

Data Driven Portfolio Optimization utilizing Machine Learning

  • Upload
    others

  • View
    7

  • Download
    0

Embed Size (px)

Citation preview

Data-Driven Portfolio Optimization Utilizing Machine Learning

MELINDA HSIEH

RIDER UNIVERSITY

5/23/2018 SYMPOSIUM ON DATA SCIENCE AND STATISTICS 1

Topics

Data-Driven Portfolio Optimization with Drawdown Constraints

Prescribe Optimal Portfolio utilizing Machine Learning Methods

Portfolio Performance Assessment

5/23/2018 SYMPOSIUM ON DATA SCIENCE AND STATISTICS 2

Portfolio Performance

5/23/2018 SYMPOSIUM ON DATA SCIENCE AND STATISTICS 3

Drawdown – Peak to Trough decline

𝑫 𝒕 = 𝐦𝐚𝐱𝟎≤𝝉≤𝒕

𝑹 𝝉 − 𝑹(𝒕)

5/23/2018 SYMPOSIUM ON DATA SCIENCE AND STATISTICS 4

𝑡1 𝑡2 𝑡3 ⋯⋯ 𝑇 𝑡𝑖𝑚𝑒

Po

rtfo

lio C

um

ula

tive

R

etu

rn

R(t)

D(t)

Portfolio Optimization – with constraint in drawdown

𝐦𝐚𝐱𝝎

𝐸[𝝎𝑇 𝑅𝑵]

s.t. max 𝐷 𝝎, 𝑡 ≤ 𝐶

𝐶1 ≤ 𝝎𝑇1𝑁 ≤ 𝐶2

𝑹𝑵: the annualized cumulative returns of N assets over the time period [0,T]

𝝎: investment weights of the N assets

5/23/2018 SYMPOSIUM ON DATA SCIENCE AND STATISTICS 5

Data-Driven Portfolio Optimization – with constraint in drawdown

𝐦𝐚𝐱𝝎

𝝎𝑇 𝑅𝑵

s.t. max 𝐷 𝝎, 𝑡 ≤ 𝐶

𝐶1 ≤ 𝝎𝑇1𝑁 ≤ 𝐶2

𝑹𝑵: the estimated average annualized cumulative returns of N assets over the time period [0,T]

𝝎: investment weights of the N assets

5/23/2018 SYMPOSIUM ON DATA SCIENCE AND STATISTICS 6

Big Data

An explosion in the availability and accessibility in data

For example, people frequently browse the internet: shopping, streaming, and searching for key words

Online activities generate footprints - source of data

A variety of data sources may be related to stock returns

Optimal portfolio weights should take into account of these

auxiliary variables

5/23/2018 SYMPOSIUM ON DATA SCIENCE AND STATISTICS 7

Portfolio Optimization with Auxiliary Variables

max𝝎(𝒙)

𝝎𝑇(𝑥)𝐸[ 𝑅𝑁 |𝒙]

s.t. max 𝐷 𝝎(𝑥), 𝑡 ≤ 𝐶

𝐶1 ≤ 𝝎(𝒙)𝑇1𝑁 ≤ 𝐶2

5/23/2018 SYMPOSIUM ON DATA SCIENCE AND STATISTICS 8

𝒙 ∈ 𝑹𝒅 are auxiliary variables with d-dimensions

Data-Driven Portfolio Optimization with Auxiliary Variables

max𝝎(𝒙)

𝝎𝑇(𝑥) 𝑅𝑁 (𝒙)

s.t. max 𝐷 𝝎(𝑥), 𝑡 ≤ 𝐶

𝐶1 ≤ 𝝎(𝒙)𝑇1𝑁 ≤ 𝐶2

5/23/2018 SYMPOSIUM ON DATA SCIENCE AND STATISTICS 9

𝑅𝑁 𝑥 : the estimated conditional mean of annualized cumulative returns of N assets

Searching Optimal Portfolio Weights – Linear Programming Problem

The portfolio optimization can be represented as a

linear programming problem

Compute average accumulative returns to obtain

𝑅𝑁

The derived optimal portfolio weight is data-driven,

depending on the estimated 𝑅𝑁

5/23/2018 SYMPOSIUM ON DATA SCIENCE AND STATISTICS 10

Machine Learning

Apply machine learning methods to estimate the conditional means

of returns 𝑹𝑵 𝒙 and derive the optimal portfolio weight 𝝎(𝒙):

1. Use machine learning methods to classify asset returns into several groups based on the values of auxiliary variables

2. For each group, compute 𝑹𝑵(𝒙)

3. Solve the linear programing problem to find the optimal portfolio weight 𝝎(𝒙)

5/23/2018 SYMPOSIUM ON DATA SCIENCE AND STATISTICS 11

5/23/2018 SYMPOSIUM ON DATA SCIENCE AND STATISTICS 12

𝑅𝑁(1); 𝑥1

(1), 𝑥2

(1)… , 𝑥𝑑

(1)

𝑅𝑁(2)

; 𝑥1(2), 𝑥2

(2)… , 𝑥𝑑

(2)

𝑅𝑁(3)

; 𝑥1(3), 𝑥2

(3)… , 𝑥𝑑

(3)

𝑅𝑁(𝑇); 𝑥1

(𝑇), 𝑥2

(𝑇)… , 𝑥𝑑

(𝑇)

𝑅𝑁(1); 𝑥1

(1), 𝑥2

(1)… , 𝑥𝑑

(1)

𝑅𝑁(3); 𝑥1

(3), 𝑥2

(3)… , 𝑥𝑑

(3)

𝑅𝑁(4); 𝑥1

(4), 𝑥2

(4)… , 𝑥𝑑

(4)

𝑅𝑁(5); 𝑥1

(5), 𝑥2

(5)… , 𝑥𝑑

(5)

𝑅𝑁(10)

; 𝑥1(10)

, 𝑥2(10)

… , 𝑥𝑑(10)

𝑅𝑁(2); 𝑥1

(2), 𝑥2

(2)… , 𝑥𝑑

(2)

𝑅𝑁(6); 𝑥1

(6), 𝑥2

(6)… , 𝑥𝑑

(6)

𝑅𝑁(20)

; 𝑥1(20)

, 𝑥2(20)

… , 𝑥𝑑(20)

𝑅𝑁(𝑇); 𝑥1

(𝑇), 𝑥2

(𝑇)… , 𝑥𝑑

(𝑇)

Train data 𝝎 𝒙(𝟏), 𝒙(𝟑), 𝒙(𝟒)

𝝎 𝒙(𝟓), 𝒙(𝟏𝟎)

𝝎 𝒙(𝟐), 𝒙(𝟔), 𝒙(𝟐𝟎), 𝒙(𝑻)

Prescribe Investment Weights

𝑅𝑁(𝑖): cumulative returns observed at time i

KNN - K Nearest Neighborhood

𝝎𝑲𝑵𝑵(𝒙𝟎) ∈ 𝐦𝐚𝐱𝒙 𝒊 ∈ ℵ𝒌(𝒙𝟎)

𝝎𝑇 𝑹𝑁 (𝒙(𝒊))

s.t. max 𝐷 𝝎, 𝑡 ≤ 𝐶

𝐶1 ≤ 𝝎𝑇1𝑁 ≤ 𝐶2

where ℵ𝐾 𝒙𝟎 = 𝑖 = 1,… , 𝑇 ∶ 𝑗=1𝑇 𝐼 𝒙𝟎 − 𝒙(𝒊) ≥ 𝒙𝟎 − 𝒙(𝒋) ≤ 𝑘 is the neighborhood of

k data points that are closest to the point 𝒙𝟎

5/23/2018 SYMPOSIUM ON DATA SCIENCE AND STATISTICS 13

Ctree - Conditional Inference Tree (Hothorn et al (2006)

𝝎𝑪𝒕𝒓𝒆𝒆(𝒙) ∈ 𝐦𝐚𝐱𝒊:𝑹(𝒙 𝒊 ) ∈ 𝑹(𝒙)

𝝎𝑇 𝑹𝑁 (𝒙(𝒊))

s.t. max 𝐷 𝝎, 𝑡 ≤ 𝐶

𝐶1 ≤ 𝝎𝑇1𝑁 ≤ 𝐶2

where R(x) is the splitting rule implied by a regression tree trained based on the training data

Ctree applies the inference test to determine if a possible split is significant

5/23/2018 SYMPOSIUM ON DATA SCIENCE AND STATISTICS 14

Random Forest

𝝎𝑹𝑭𝒌 (𝒙) ∈ 𝐦𝐚𝐱

𝒊:𝑹𝒌(𝒙 𝒊 ) ∈ 𝑹𝒌(𝒙)𝝎𝑇 𝑹𝑁 (𝒙(𝒊))

s.t. max 𝐷 𝝎, 𝑡 ≤ 𝐶

𝐶1 ≤ 𝝎𝑇1𝑁 ≤ 𝐶2

𝝎𝑹𝑭 𝒙 =𝟏

𝒎

𝒌=𝟏

𝒎

𝝎𝑹𝑭𝒌 (𝒙)

5/23/2018 SYMPOSIUM ON DATA SCIENCE AND STATISTICS 15

Simulation

Let 𝑋1, 𝑋2, 𝑋3 be the market factors that are related to the returns of the underlying assets in the portfolio.

Assume 𝑋 𝑡 = {𝑋1(𝑡), 𝑋2(𝑡), 𝑋3(𝑡)} follows a multivariate ARMA(2,2) process

Generate 12 time series of returns based on the three market factors

𝑅𝑁 𝑡 = 𝐴𝑇 𝑋 𝑡 +𝛿

4+ (𝐵𝑇(𝑋(𝑡))𝜂

where 𝛿 and 𝜂 are independent Gaussian noises

5/23/2018 SYMPOSIUM ON DATA SCIENCE AND STATISTICS 16

Portfolio Performance – Validation

For each realization of the simulated returns

split the series into in-sample (training) and out-of-sample (validation) data.

Trained the in-sample data based on the machine learning algorithm and prescribe optimal portfolio weights

Apply the prescribed weights to the out-of-sample (validation) returns and compute the portfolio returns

5/23/2018 SYMPOSIUM ON DATA SCIENCE AND STATISTICS 17

Performance Metrics

1. Average Cumulative Return

𝟏

𝒏

𝒕=𝟏

𝒏

𝑹𝒕𝒗

2. Maximum Drawdown

𝐦𝐚𝐱𝟎≤𝒕≤𝒏

𝐦𝐚𝐱𝟎≤𝝉≤𝒕

𝑹 𝝉 − 𝑹 𝒕

3. Reward Risk 𝟏𝒏 𝒕=𝟏𝒏 𝑹𝒕

𝒗

𝐦𝐚𝐱𝟎≤𝒕≤𝒏

𝐦𝐚𝐱𝟎≤𝝉≤𝒕

𝑹 𝝉 − 𝑹 𝒕

5/23/2018 SYMPOSIUM ON DATA SCIENCE AND STATISTICS 18

5/23/2018 SYMPOSIUM ON DATA SCIENCE AND STATISTICS 19

0.0

0.2

0.4

10 100 1000 2500 5000 10000Training Size

Av

era

ge

Cu

mu

lati

ve

Re

turn

(%)

method

CtreeKNNRFSAA

Average Cumulative Returns

5/23/2018 SYMPOSIUM ON DATA SCIENCE AND STATISTICS 20

0.5

1.0

1.5

2.0

10 100 1000 2500 5000 10000Training Size

Ma

x D

raw

do

wn

method

CtreeKNNRFSAA

Maximum Drawdown

5/23/2018 SYMPOSIUM ON DATA SCIENCE AND STATISTICS 21

0.0

0.5

1.0

1.5

10 100 1000 2500 5000 10000Training Size

Re

wa

rd R

isk method

CtreeKNNRFSAA

Reward Risk

Summary

Machine Learning method improves the out-of sample performance of the data-driven optimal portfolio

The performance improves as the size of training data increases

Tree-based approaches such as random forest and Ctreeoutperform the SAA method which does not incorporate the inputs from auxiliary variables.

5/23/2018 SYMPOSIUM ON DATA SCIENCE AND STATISTICS 22

Future Work

how the tuning parameters affect optimal portfolio weights and portfolio performance?

how the correlations of the auxiliary variables affect optimal portfolio weights and portfolio performance?

5/23/2018 SYMPOSIUM ON DATA SCIENCE AND STATISTICS 23

5/23/2018 SYMPOSIUM ON DATA SCIENCE AND STATISTICS 24

Appendix

Searching Optimal Portfolio Weights – Linear Programming Problem

𝐦𝐚𝐱𝝎

𝝎𝑇 𝑹𝑻×𝑵

s.t. 𝑢𝑘− 𝝎𝑇 𝑅𝑘×𝑁 ≤ 𝐶 , 1 ≤ 𝑘 ≤ 𝑁

𝑢𝑘 ≥ 𝝎𝑇 𝑅𝑘×𝑁 , 1 ≤ 𝑘 ≤ 𝑁

𝑢𝑘 ≥ 𝑢𝑘−1, 1 ≤ 𝑘 ≤ 𝑁

𝑢0= 0

𝐶1 ≤ 𝝎𝑇1𝑁 ≤ 𝐶2

5/23/2018 SYMPOSIUM ON DATA SCIENCE AND STATISTICS 25

where 𝑢𝑘 are auxiliary variables, 1 ≤ 𝑘 ≤ 𝑁