Upload
others
View
7
Download
0
Embed Size (px)
Citation preview
Data-Driven Portfolio Optimization Utilizing Machine Learning
MELINDA HSIEH
RIDER UNIVERSITY
5/23/2018 SYMPOSIUM ON DATA SCIENCE AND STATISTICS 1
Topics
Data-Driven Portfolio Optimization with Drawdown Constraints
Prescribe Optimal Portfolio utilizing Machine Learning Methods
Portfolio Performance Assessment
5/23/2018 SYMPOSIUM ON DATA SCIENCE AND STATISTICS 2
Drawdown – Peak to Trough decline
𝑫 𝒕 = 𝐦𝐚𝐱𝟎≤𝝉≤𝒕
𝑹 𝝉 − 𝑹(𝒕)
5/23/2018 SYMPOSIUM ON DATA SCIENCE AND STATISTICS 4
𝑡1 𝑡2 𝑡3 ⋯⋯ 𝑇 𝑡𝑖𝑚𝑒
Po
rtfo
lio C
um
ula
tive
R
etu
rn
R(t)
D(t)
Portfolio Optimization – with constraint in drawdown
𝐦𝐚𝐱𝝎
𝐸[𝝎𝑇 𝑅𝑵]
s.t. max 𝐷 𝝎, 𝑡 ≤ 𝐶
𝐶1 ≤ 𝝎𝑇1𝑁 ≤ 𝐶2
𝑹𝑵: the annualized cumulative returns of N assets over the time period [0,T]
𝝎: investment weights of the N assets
5/23/2018 SYMPOSIUM ON DATA SCIENCE AND STATISTICS 5
Data-Driven Portfolio Optimization – with constraint in drawdown
𝐦𝐚𝐱𝝎
𝝎𝑇 𝑅𝑵
s.t. max 𝐷 𝝎, 𝑡 ≤ 𝐶
𝐶1 ≤ 𝝎𝑇1𝑁 ≤ 𝐶2
𝑹𝑵: the estimated average annualized cumulative returns of N assets over the time period [0,T]
𝝎: investment weights of the N assets
5/23/2018 SYMPOSIUM ON DATA SCIENCE AND STATISTICS 6
Big Data
An explosion in the availability and accessibility in data
For example, people frequently browse the internet: shopping, streaming, and searching for key words
Online activities generate footprints - source of data
A variety of data sources may be related to stock returns
Optimal portfolio weights should take into account of these
auxiliary variables
5/23/2018 SYMPOSIUM ON DATA SCIENCE AND STATISTICS 7
Portfolio Optimization with Auxiliary Variables
max𝝎(𝒙)
𝝎𝑇(𝑥)𝐸[ 𝑅𝑁 |𝒙]
s.t. max 𝐷 𝝎(𝑥), 𝑡 ≤ 𝐶
𝐶1 ≤ 𝝎(𝒙)𝑇1𝑁 ≤ 𝐶2
5/23/2018 SYMPOSIUM ON DATA SCIENCE AND STATISTICS 8
𝒙 ∈ 𝑹𝒅 are auxiliary variables with d-dimensions
Data-Driven Portfolio Optimization with Auxiliary Variables
max𝝎(𝒙)
𝝎𝑇(𝑥) 𝑅𝑁 (𝒙)
s.t. max 𝐷 𝝎(𝑥), 𝑡 ≤ 𝐶
𝐶1 ≤ 𝝎(𝒙)𝑇1𝑁 ≤ 𝐶2
5/23/2018 SYMPOSIUM ON DATA SCIENCE AND STATISTICS 9
𝑅𝑁 𝑥 : the estimated conditional mean of annualized cumulative returns of N assets
Searching Optimal Portfolio Weights – Linear Programming Problem
The portfolio optimization can be represented as a
linear programming problem
Compute average accumulative returns to obtain
𝑅𝑁
The derived optimal portfolio weight is data-driven,
depending on the estimated 𝑅𝑁
5/23/2018 SYMPOSIUM ON DATA SCIENCE AND STATISTICS 10
Machine Learning
Apply machine learning methods to estimate the conditional means
of returns 𝑹𝑵 𝒙 and derive the optimal portfolio weight 𝝎(𝒙):
1. Use machine learning methods to classify asset returns into several groups based on the values of auxiliary variables
2. For each group, compute 𝑹𝑵(𝒙)
3. Solve the linear programing problem to find the optimal portfolio weight 𝝎(𝒙)
5/23/2018 SYMPOSIUM ON DATA SCIENCE AND STATISTICS 11
5/23/2018 SYMPOSIUM ON DATA SCIENCE AND STATISTICS 12
𝑅𝑁(1); 𝑥1
(1), 𝑥2
(1)… , 𝑥𝑑
(1)
𝑅𝑁(2)
; 𝑥1(2), 𝑥2
(2)… , 𝑥𝑑
(2)
𝑅𝑁(3)
; 𝑥1(3), 𝑥2
(3)… , 𝑥𝑑
(3)
⋮
𝑅𝑁(𝑇); 𝑥1
(𝑇), 𝑥2
(𝑇)… , 𝑥𝑑
(𝑇)
𝑅𝑁(1); 𝑥1
(1), 𝑥2
(1)… , 𝑥𝑑
(1)
𝑅𝑁(3); 𝑥1
(3), 𝑥2
(3)… , 𝑥𝑑
(3)
𝑅𝑁(4); 𝑥1
(4), 𝑥2
(4)… , 𝑥𝑑
(4)
𝑅𝑁(5); 𝑥1
(5), 𝑥2
(5)… , 𝑥𝑑
(5)
𝑅𝑁(10)
; 𝑥1(10)
, 𝑥2(10)
… , 𝑥𝑑(10)
𝑅𝑁(2); 𝑥1
(2), 𝑥2
(2)… , 𝑥𝑑
(2)
𝑅𝑁(6); 𝑥1
(6), 𝑥2
(6)… , 𝑥𝑑
(6)
𝑅𝑁(20)
; 𝑥1(20)
, 𝑥2(20)
… , 𝑥𝑑(20)
𝑅𝑁(𝑇); 𝑥1
(𝑇), 𝑥2
(𝑇)… , 𝑥𝑑
(𝑇)
Train data 𝝎 𝒙(𝟏), 𝒙(𝟑), 𝒙(𝟒)
𝝎 𝒙(𝟓), 𝒙(𝟏𝟎)
𝝎 𝒙(𝟐), 𝒙(𝟔), 𝒙(𝟐𝟎), 𝒙(𝑻)
Prescribe Investment Weights
𝑅𝑁(𝑖): cumulative returns observed at time i
KNN - K Nearest Neighborhood
𝝎𝑲𝑵𝑵(𝒙𝟎) ∈ 𝐦𝐚𝐱𝒙 𝒊 ∈ ℵ𝒌(𝒙𝟎)
𝝎𝑇 𝑹𝑁 (𝒙(𝒊))
s.t. max 𝐷 𝝎, 𝑡 ≤ 𝐶
𝐶1 ≤ 𝝎𝑇1𝑁 ≤ 𝐶2
where ℵ𝐾 𝒙𝟎 = 𝑖 = 1,… , 𝑇 ∶ 𝑗=1𝑇 𝐼 𝒙𝟎 − 𝒙(𝒊) ≥ 𝒙𝟎 − 𝒙(𝒋) ≤ 𝑘 is the neighborhood of
k data points that are closest to the point 𝒙𝟎
5/23/2018 SYMPOSIUM ON DATA SCIENCE AND STATISTICS 13
Ctree - Conditional Inference Tree (Hothorn et al (2006)
𝝎𝑪𝒕𝒓𝒆𝒆(𝒙) ∈ 𝐦𝐚𝐱𝒊:𝑹(𝒙 𝒊 ) ∈ 𝑹(𝒙)
𝝎𝑇 𝑹𝑁 (𝒙(𝒊))
s.t. max 𝐷 𝝎, 𝑡 ≤ 𝐶
𝐶1 ≤ 𝝎𝑇1𝑁 ≤ 𝐶2
where R(x) is the splitting rule implied by a regression tree trained based on the training data
Ctree applies the inference test to determine if a possible split is significant
5/23/2018 SYMPOSIUM ON DATA SCIENCE AND STATISTICS 14
Random Forest
𝝎𝑹𝑭𝒌 (𝒙) ∈ 𝐦𝐚𝐱
𝒊:𝑹𝒌(𝒙 𝒊 ) ∈ 𝑹𝒌(𝒙)𝝎𝑇 𝑹𝑁 (𝒙(𝒊))
s.t. max 𝐷 𝝎, 𝑡 ≤ 𝐶
𝐶1 ≤ 𝝎𝑇1𝑁 ≤ 𝐶2
𝝎𝑹𝑭 𝒙 =𝟏
𝒎
𝒌=𝟏
𝒎
𝝎𝑹𝑭𝒌 (𝒙)
5/23/2018 SYMPOSIUM ON DATA SCIENCE AND STATISTICS 15
Simulation
Let 𝑋1, 𝑋2, 𝑋3 be the market factors that are related to the returns of the underlying assets in the portfolio.
Assume 𝑋 𝑡 = {𝑋1(𝑡), 𝑋2(𝑡), 𝑋3(𝑡)} follows a multivariate ARMA(2,2) process
Generate 12 time series of returns based on the three market factors
𝑅𝑁 𝑡 = 𝐴𝑇 𝑋 𝑡 +𝛿
4+ (𝐵𝑇(𝑋(𝑡))𝜂
where 𝛿 and 𝜂 are independent Gaussian noises
5/23/2018 SYMPOSIUM ON DATA SCIENCE AND STATISTICS 16
Portfolio Performance – Validation
For each realization of the simulated returns
split the series into in-sample (training) and out-of-sample (validation) data.
Trained the in-sample data based on the machine learning algorithm and prescribe optimal portfolio weights
Apply the prescribed weights to the out-of-sample (validation) returns and compute the portfolio returns
5/23/2018 SYMPOSIUM ON DATA SCIENCE AND STATISTICS 17
Performance Metrics
1. Average Cumulative Return
𝟏
𝒏
𝒕=𝟏
𝒏
𝑹𝒕𝒗
2. Maximum Drawdown
𝐦𝐚𝐱𝟎≤𝒕≤𝒏
𝐦𝐚𝐱𝟎≤𝝉≤𝒕
𝑹 𝝉 − 𝑹 𝒕
3. Reward Risk 𝟏𝒏 𝒕=𝟏𝒏 𝑹𝒕
𝒗
𝐦𝐚𝐱𝟎≤𝒕≤𝒏
𝐦𝐚𝐱𝟎≤𝝉≤𝒕
𝑹 𝝉 − 𝑹 𝒕
5/23/2018 SYMPOSIUM ON DATA SCIENCE AND STATISTICS 18
5/23/2018 SYMPOSIUM ON DATA SCIENCE AND STATISTICS 19
0.0
0.2
0.4
10 100 1000 2500 5000 10000Training Size
Av
era
ge
Cu
mu
lati
ve
Re
turn
(%)
method
CtreeKNNRFSAA
Average Cumulative Returns
5/23/2018 SYMPOSIUM ON DATA SCIENCE AND STATISTICS 20
0.5
1.0
1.5
2.0
10 100 1000 2500 5000 10000Training Size
Ma
x D
raw
do
wn
method
CtreeKNNRFSAA
Maximum Drawdown
5/23/2018 SYMPOSIUM ON DATA SCIENCE AND STATISTICS 21
0.0
0.5
1.0
1.5
10 100 1000 2500 5000 10000Training Size
Re
wa
rd R
isk method
CtreeKNNRFSAA
Reward Risk
Summary
Machine Learning method improves the out-of sample performance of the data-driven optimal portfolio
The performance improves as the size of training data increases
Tree-based approaches such as random forest and Ctreeoutperform the SAA method which does not incorporate the inputs from auxiliary variables.
5/23/2018 SYMPOSIUM ON DATA SCIENCE AND STATISTICS 22
Future Work
how the tuning parameters affect optimal portfolio weights and portfolio performance?
how the correlations of the auxiliary variables affect optimal portfolio weights and portfolio performance?
5/23/2018 SYMPOSIUM ON DATA SCIENCE AND STATISTICS 23