©2015 Morningstar, Inc. All rights reserved.
Andy Tang, CFA
@QWAFAFEW Pittsburgh
June 9th, 2015
The Wisdom of Crowds - Random Forest™ and Morningstar Quant Equity Rating
Ensemble Models
• Crowds aren’t wise unless:
  - Independence – every guess must be independent of the others
  - Talent – every guess must be slightly better than a random guess
  - Crowd – need a lot of guesses
• Random Forests™ are structurally designed to take advantage of this phenomenon
Ensemble models mathematically formalize and capitalize on the concept of wisdom of crowds.
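A minimal simulation can make the three conditions concrete. The sketch below is illustrative only: the true value, the noise level, and the function name `crowd_error` are assumptions, not part of the original talk. Each guesser is modeled as the truth plus independent noise, and the crowd's answer is the plain average of the guesses.

```python
import random
import statistics

def crowd_error(n_guessers, truth=100.0, noise=20.0, trials=2000, seed=0):
    """Mean absolute error of the crowd's average guess.

    Each guess is the truth plus independent, zero-mean Gaussian noise,
    so every guesser is informed but imprecise, and independent of the rest.
    """
    rng = random.Random(seed)
    errors = []
    for _ in range(trials):
        guesses = [rng.gauss(truth, noise) for _ in range(n_guessers)]
        errors.append(abs(statistics.fmean(guesses) - truth))
    return statistics.fmean(errors)

single = crowd_error(1)    # one guesser on their own
crowd = crowd_error(100)   # average of 100 independent guessers
print(f"one guesser:  {single:.2f}")
print(f"100 guessers: {crowd:.2f}")
```

Under these assumptions the averaged guess of 100 independent guessers should land much closer to the truth than a single guess. If the guesses shared a common error instead of being independent, averaging would not remove it.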
Random Forest™
• Collection of independent decision trees (“forest”)
• Random subspaces to fit the trees (“random”)
• Idea: Wisdom of crowds
  - Several weak predictions can be averaged to make ONE strong prediction
Random Forest™ Methodology
• How to construct a decision tree?
  - What to split?
  - Where to split?
  - When to stop splitting?
• How to use Random Forest™?
  - How to predict?
  - Variable Importance
Methodology – what to split?
• Start with the training set (Y, X)
  - X: independent variables, or features
  - Y: categorical or continuous
• Random subspaces
  - Step 1: start with a randomly sampled training set;
  - Step 2: at every node, randomly sample the features that are candidates for splitting.
[Figure: a tree in which Node 1 splits into Node 2 and Node 3]
→ Independence
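The two sampling steps can be sketched in a few lines of Python. This is a sketch under stated assumptions: the helper names `bootstrap_rows` and `candidate_features`, the choice of sampling rows with replacement, and the value of 3 candidate features per node are illustrative and not taken from the slides. The feature and company names come from the example table used later in the deck.

```python
import random

def bootstrap_rows(n_rows, rng):
    """Step 1: sample row indices with replacement (one sample per tree)."""
    return [rng.randrange(n_rows) for _ in range(n_rows)]

def candidate_features(features, k, rng):
    """Step 2: pick k distinct features to consider at one node."""
    return rng.sample(features, k)

rng = random.Random(42)
features = ["ROA", "Market Cap", "Enterprise Value", "Volatility",
            "Drawdown", "P/E", "Sector", "P/B"]
companies = list("ABCDEFGHIJKL")

rows = bootstrap_rows(len(companies), rng)
in_bag = [companies[i] for i in rows]                # training set for this tree
out_of_bag = sorted(set(companies) - set(in_bag))    # companies this tree never sees
node_features = candidate_features(features, 3, rng) # redrawn at every node

print("in bag:       ", in_bag)
print("out of bag:   ", out_of_bag)
print("node features:", node_features)
```

Because every tree draws its own rows and every node draws its own features, two trees rarely see the same data through the same variables, which is what keeps their errors from moving together.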
Methodology – Build 1st Decision Tree
Training data (placeholder values):
Company | ROA | Market Cap | Enterprise Value | Volatility | Drawdown | P/E | Sector | P/E | P/B | Fair Value Ratio
A | X.XX | X.XX | X.XX | X.XX | X.XX | X.XX | X.XX | X.XX | X.XX | X.XX
B | X.XX | X.XX | X.XX | X.XX | X.XX | X.XX | X.XX | X.XX | X.XX | X.XX
C | X.XX | X.XX | X.XX | X.XX | X.XX | X.XX | X.XX | X.XX | X.XX | X.XX
D | X.XX | X.XX | X.XX | X.XX | X.XX | X.XX | X.XX | X.XX | X.XX | X.XX
E | X.XX | X.XX | X.XX | X.XX | X.XX | X.XX | X.XX | X.XX | X.XX | X.XX
F | X.XX | X.XX | X.XX | X.XX | X.XX | X.XX | X.XX | X.XX | X.XX | X.XX
G | X.XX | X.XX | X.XX | X.XX | X.XX | X.XX | X.XX | X.XX | X.XX | X.XX
H | X.XX | X.XX | X.XX | X.XX | X.XX | X.XX | X.XX | X.XX | X.XX | X.XX
I | X.XX | X.XX | X.XX | X.XX | X.XX | X.XX | X.XX | X.XX | X.XX | X.XX
J | X.XX | X.XX | X.XX | X.XX | X.XX | X.XX | X.XX | X.XX | X.XX | X.XX
K | X.XX | X.XX | X.XX | X.XX | X.XX | X.XX | X.XX | X.XX | X.XX | X.XX
L | X.XX | X.XX | X.XX | X.XX | X.XX | X.XX | X.XX | X.XX | X.XX | X.XX
[Figure: the first decision tree grown from the data above. Split rules shown, each with True/False branches: ROA > 10%, Market Cap > 20, Market Cap > 40, Sector = Tech. Node statistics shown: 500 companies, avg P/FV = 0.8; 500 companies, avg P/FV = 0.8; 100 companies, avg P/FV = 1.0; 400 companies, avg P/FV = 1.2; 200 companies, avg P/FV = 0.8; 100 companies, avg P/FV = 1.8.]
→ Talent
Methodology – Build the forest
[Figure: trees Tree 1 … Tree N, all grown from the company data table (Company, ROA, Market Cap, Enterprise Value, Volatility, Drawdown, P/E, Sector, P/E, P/B, Fair Value Ratio; rows A to L, placeholder values).]
→ Crowds
Methodology – where to split?
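• Classification Trees: Categorical Y
  - Gini index (CART)
  - Information Gain (C4.5/C5.0)
• Regression Trees: Continuous Y
  - Variance Reduction: the pre-split variance minus the sum of the left and right variances post-split

$$\mathrm{VarDiff} = \frac{\sum_{i \in \text{presplit}} (y_i - \bar{y}_{\text{presplit}})^2}{N_{\text{presplit}}} - \left( \frac{\sum_{i \in \text{left}} (y_i - \bar{y}_{\text{left}})^2}{N_{\text{left}}} + \frac{\sum_{i \in \text{right}} (y_i - \bar{y}_{\text{right}})^2}{N_{\text{right}}} \right)$$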
Methodology – where to split?
• Search all values of the selected variables, trying each one as a potential split point (greedy search)
• Choose the split point that maximizes the variance reduction (VarDiff)
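The greedy search can be sketched for a single numeric feature as follows. This is a minimal illustration, not the presenter's implementation: the function names and the toy ROA and P/FV values are assumptions. The score follows the slide's definition, pre-split variance minus the plain sum of the left and right variances; many implementations instead weight each side by its share of the observations.

```python
def variance(ys):
    """Population variance: sum((y - mean)^2) / N."""
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys) / len(ys)

def var_diff(ys, left, right):
    """Pre-split variance minus the sum of the left and right variances."""
    return variance(ys) - (variance(left) + variance(right))

def best_split(xs, ys):
    """Try every observed value t of one feature as a split 'x > t'."""
    best = None
    # Skip the largest value: splitting there would leave the right side empty.
    for t in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = var_diff(ys, left, right)
        if best is None or score > best[1]:
            best = (t, score)
    return best

# Toy data (hypothetical): ROA in percent, target is price / fair value.
roa = [1, 2, 3, 11, 12, 13]
p_fv = [0.8, 0.8, 0.8, 1.2, 1.2, 1.2]
threshold, score = best_split(roa, p_fv)
print(f"best split: ROA > {threshold}, VarDiff = {score:.4f}")
```

In a forest, the same search would be repeated over each of the randomly sampled candidate features at the node, keeping the feature and threshold with the highest score.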
Methodology – when to stop splitting?
• Variance reduction threshold
• Minimum number of companies per end-node
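Both stopping rules reduce to a simple check on a proposed split. The sketch below is hypothetical: the function name `should_stop` and the default thresholds (5 companies, 0.01 variance reduction) are illustrative assumptions, since the slides do not give the values used.

```python
def should_stop(n_left, n_right, var_reduction,
                min_leaf_size=5, min_var_reduction=0.01):
    """Return True when a proposed split should be rejected,
    leaving the current node as an end-node."""
    too_small = min(n_left, n_right) < min_leaf_size   # an end-node would be too thin
    too_weak = var_reduction < min_var_reduction       # the split barely helps
    return too_small or too_weak

print(should_stop(400, 100, 0.05))    # both rules satisfied: keep splitting
print(should_stop(400, 3, 0.05))      # right side has too few companies
print(should_stop(400, 100, 0.001))   # variance reduction below the threshold
```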
Methodology – How to predict (in a Single Tree)?
[Figure: the first decision tree, with a new company dropped down it. Split rules shown, each with True/False branches: ROA > 10%, Market Cap > 20, Market Cap > 40, Sector = Tech. Node statistics shown: 500 companies, avg P/FV = 0.8; 500 companies, avg P/FV = 0.8; 100 companies, avg P/FV = 1.0; 400 companies, avg P/FV = 1.2; 200 companies, avg P/FV = 0.8; 100 companies, avg P/FV = 1.8.]
• New company: XYZ Corp, with ROA = 9%, Market Cap = 19, Sector = Energy
• Follow the True/False branch at each split until an end-node is reached
• Final Tree Prediction = 0.8, the average P/FV of the companies in that end-node
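The walk down the tree can be sketched in code. The exact arrangement of the splits in the original diagram does not survive in text form, so the layout below is an assumption chosen only to be consistent with the split rules, the node averages, and the final prediction of 0.8 for XYZ Corp; the dictionary structure and the function name `predict` are likewise illustrative.

```python
# Hypothetical layout: which split sits under which branch is assumed.
tree = {
    "test": lambda c: c["ROA"] > 0.10,
    "true": {
        "test": lambda c: c["Market Cap"] > 40,
        "true": {"leaf": 1.2},
        "false": {"leaf": 1.0},
    },
    "false": {
        "test": lambda c: c["Market Cap"] > 20,
        "true": {"leaf": 1.8},
        "false": {
            "test": lambda c: c["Sector"] == "Tech",
            "true": {"leaf": 1.0},
            "false": {"leaf": 0.8},
        },
    },
}

def predict(node, company):
    """Follow True/False branches until an end-node (leaf) is reached."""
    while "leaf" not in node:
        node = node["true"] if node["test"](company) else node["false"]
    return node["leaf"]  # average P/FV of the training companies in that leaf

xyz = {"ROA": 0.09, "Market Cap": 19, "Sector": "Energy"}
prediction = predict(tree, xyz)
print(prediction)  # 0.8: ROA > 10% is False, Market Cap > 20 is False, Sector = Tech is False
```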
Methodology – How to predict (in a Forest)?
[Figure: Tree 1 … Tree N, each producing its own prediction]
• Tree 1 Prediction: 0.8, …, Tree N Prediction: 1.1
• Random Forest™ Prediction = (0.8 + … + 1.1) / N = 0.95
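For a continuous target, the forest prediction is the plain average of the tree predictions. The sketch below uses only the two tree predictions that are visible on the slide (0.8 and 1.1), so N = 2 here is an illustrative assumption; the function name `forest_predict` is also hypothetical.

```python
from statistics import fmean

def forest_predict(tree_predictions):
    """Average the individual tree predictions (regression forest)."""
    return fmean(tree_predictions)

# Only the first and last tree predictions are shown on the slide.
tree_predictions = [0.8, 1.1]
forest_prediction = forest_predict(tree_predictions)
print(round(forest_prediction, 2))  # 0.95
```

For a categorical target, the usual counterpart of the average is a majority vote across the trees.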
Methodology – Variable Importance
• What variables or features are important for our prediction, ROA or Market Cap?
• OOB (Out-Of-Bag) Error Rate
  - The OOB companies for a tree are those left out of its randomly sampled training set

$$\text{OOB Error} = \sum_{i \in \text{OOB}} (y_i - \hat{y}_i)^2$$

• Variable Importance
  - Start from the OOB companies and their true ROA values:
OOB Company | ROA
A | 1%
B | 5%
C | 5%
I | 7%
J | 2%
K | 9%
  - True OOB Error = 5
  - Permutation on ROA: shuffle the ROA values across the OOB companies and predict again
  - Permuted OOB Error = 10
  - Variable Importance of ROA = Increase in OOB Error = 10 − 5 = 5
Methodology – Variable Importance
• If the variable importance of variable i is close to 0, then variable i is NOT important
• If the variable importance of variable i is very large (→ ∞), then variable i is SUPER important
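The permutation idea can be sketched as follows. Everything beyond the OOB company names and ROA values is an assumption made for illustration: the stand-in `model` (a rule that looks only at ROA), the Market Cap values, the targets, and the choice to average over 20 shuffles rather than use a single permutation as on the slide.

```python
import random

def squared_error(model, rows, targets):
    """Sum of squared prediction errors over the given rows."""
    return sum((y - model(row)) ** 2 for row, y in zip(rows, targets))

def permutation_importance(model, rows, targets, feature, rng, n_repeats=20):
    """Average increase in error after shuffling one feature across rows."""
    baseline = squared_error(model, rows, targets)
    increases = []
    for _ in range(n_repeats):
        values = [row[feature] for row in rows]
        rng.shuffle(values)
        permuted = [{**row, feature: v} for row, v in zip(rows, values)]
        increases.append(squared_error(model, permuted, targets) - baseline)
    return sum(increases) / n_repeats

# Stand-in for a fitted forest (hypothetical): it uses ROA and ignores Market Cap.
def model(row):
    return 0.8 if row["ROA"] <= 0.05 else 1.2

# OOB companies A, B, C, I, J, K with the ROA values from the slide;
# Market Cap values and targets are made up for the example.
rows = [
    {"ROA": 0.01, "Market Cap": 10}, {"ROA": 0.05, "Market Cap": 30},
    {"ROA": 0.05, "Market Cap": 50}, {"ROA": 0.07, "Market Cap": 20},
    {"ROA": 0.02, "Market Cap": 40}, {"ROA": 0.09, "Market Cap": 60},
]
targets = [0.8, 0.8, 0.8, 1.2, 0.8, 1.2]

rng = random.Random(0)
roa_importance = permutation_importance(model, rows, targets, "ROA", rng)
cap_importance = permutation_importance(model, rows, targets, "Market Cap", rng)
print("ROA importance:       ", roa_importance)
print("Market Cap importance:", cap_importance)
```

Shuffling a variable the model never uses leaves the error unchanged, so its importance is 0; shuffling a variable the model relies on raises the error, and the size of that increase is the importance score.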
Morningstar Quantitative Research Overview
Morningstar quantitative equity ratings vastly expand our coverage of the equity universe. They are designed to replicate the proven, proprietary, forward-looking analysis of our research team.
Morningstar Global Analyst Team
Morningstar Quantitative Research
Our accomplished team of award-winning analysts applies one consistent valuation methodology. Their work forms the basis for our quantitative model.
Equity and Credit Analysts
• Our analysts are specialized and organized by sector
• Average coverage per analyst is 16 companies
• More than 2/3 of our analysts have an MBA or are CFA charterholders; 1/3 have both
• 13 Analyst Awards from the Wall Street Journal
Valuation Methodology
• Discounted cash flow foundation applied to all companies, across all sectors
• Economic Moat Committee ensures consistent assignment of Economic Moat™ Rating
Morningstar’s Analyst Research Methodology
Our analysts focus on proprietary data points such as economic moats and compare the market price to our fair value estimate. Our quantitative model is designed to replicate this focus.
The Effectiveness of Our Ratings Over Time
Our methodology has proved effective, with five-star rated stocks outperforming all others.
Trailing Annualized Returns (%)
Morningstar Rating™ for stocks | 1-Year | 3-Year | 5-Year | 10-Year | Since Inception (08/06/2001)
★★★★★ | 17.7 | 17.6 | 19.1 | 17.7 | 13.0
★★★★ | 5.9 | 13.5 | 12.9 | 11.2 | 12.1
★★★ | 14.4 | 16.7 | 13.9 | 11.0 | 9.8
★★ | 11.5 | 12.2 | 10.9 | 6.6 | 5.7
★ | 4.5 | 14.4 | 10.0 | 14.9 | 10.2
Morningstar Coverage Universe | 11.8 | 15.0 | 13.4 | 12.0 | —
S&P 500 Index (cap-weighted) | 12.7 | 16.1 | 14.5 | 8.0 | —
Source: Morningstar. Time-weighted returns through March 31, 2015.
Morningstar’s Quantitative Research Methodology
We generate the Morningstar® Quantitative Rating for each stock daily, deriving it from the qualitative ratings our analysts assign to their coverage universe.
Our Quantitative Model Matches Our Analyst Ratings
The model shows meaningful disagreement with the direction of the analyst recommendation less than 12% of the time.
[Figure: scatter plot of Quant Price/Fair Value Ratio (y-axis, 0.00 to 3.00) against Analyst Price/Fair Value Ratio (x-axis, 0 to 5). R² = 0.7528. Data as of March 31, 2015.]
How Our Model Has Performed Over Time
We back-tested the model and found it performs as expected—the most undervalued stocks outperform all others.
[Figure: line chart in USD (y-axis, 0.00 to 3.50) by year, 2002 to 2014, with one line each for Quintile 1 through Quintile 5. Data as of March 31, 2015.]
Role of the Economic Moat
We’ve harnessed our data and analyst insight to significantly expand our equity research coverage.
Region | Analyst Coverage | Quantitative Coverage | Multiple
North America | 736 | 14,510 | 19x
Europe | 253 | 8,524 | 33x
Asia | 128 | 13,898 | 108x
Latin America | 32 | 1,041 | 32x
Eurasia/India/Middle East/Africa | 42 | 7,638 | 181x
Australia | 225 | 2,067 | 9x
Global total | 1,416 | 50,506 |
Data as of March 31, 2015. Coverage numbers are calculated by the number of companies in their respective countries of domicile.
Comprehensive Global Quantitative Equity Coverage
Our quantitative coverage is both broad across regions and deep within countries.
North America
Country Companies
U.S. 10,746
Canada 3,277
Latin America
Brazil 485
Chile 222
Mexico 130
Argentina 100
Colombia 59
Asia Pacific
Country Companies
Japan 3,574
China 2,632
South Korea 1,810
Taiwan 1,780
Malaysia 903
Singapore 622
Thailand 641
Indonesia 501
Vietnam 302
Philippines 255
Hong Kong 223
Australia & New Zealand
Australia 1,837
New Zealand 141
Africa, India, Pakistan, & Middle East
Country Companies
India 4,718
Israel 481
Pakistan 487
South Africa 308
Bangladesh 276
Egypt 231
Kuwait 199
Nigeria 191
Saudi Arabia 163
Oman 116
U.A.E. 107
Iraq 92
Europe
Country Companies
U.K. 1,506
Germany 851
France 845
Poland 859
Russia 805
Sweden 408
Turkey 430
Italy 289
Switzerland 219
Greece 237
Spain 151
Norway 166
Netherlands 96
Belgium 152
Denmark 150
Finland 127
Austria 82
Ireland 28
Luxembourg 15
Portugal 60
Data as of March 31, 2015. Displaying countries with over 50 companies covered.
Applying the Quantitative Rating
Filtering across and within sectors on a variety of data points makes it possible to uncover investment ideas.
Next Steps
• Morningstar’s quantitative research:
  - Is forward-looking and distinct from other quantitative tools
  - Applies the principles of 10 years of successful Morningstar analyst experience
  - Aims to predict future alpha
  - Offers broad and deep global coverage
• Meet with one of our quantitative analysts to learn more about our methodology and explore sample data
Andy Tang, CFA
Quantitative Analyst
(312) 384-4839
Family Tree of Random Forest™
• Decision Tree: Robust, Not Accurate
  - CART by Leo Breiman, 1984
  - C4.5 by Ross Quinlan, 1993 (successor to his ID3, 1986)
• Aggregated Trees: Robust, but high-correlation
  - Bagging by Leo Breiman, 1994
  - (Boosting by Robert Schapire, 1990)
• Randomized Trees: NO correlation
  - By Yali Amit, 1997
• Random Forest™: Bagging + Randomized Trees
  - By Leo Breiman, 2001
Random Forest™ updates
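• Arborist: Random Forest™ and GPU, by Mark Seligman
• mobForest: model-based partitioning forest
  - Regression models for node splitting (e.g. linear model, GLM, etc.)

$$\mathrm{VarDiff} = \frac{\sum_{i \in \text{presplit}} (y_i - \bar{y}_{\text{presplit}})^2}{N_{\text{presplit}}} - \left( \frac{\sum_{i \in \text{left}} (y_i - \bar{y}_{\text{left}})^2}{N_{\text{left}}} + \frac{\sum_{i \in \text{right}} (y_i - \bar{y}_{\text{right}})^2}{N_{\text{right}}} \right)$$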