Predictably Unequal? The Effects of Machine Learning on Credit Markets
Andreas Fuster, Paul Goldsmith-Pinkham, Tarun Ramadorai, Ansgar Walther
SNB, Yale SOM, Imperial College (×2)
September 2020
Advances in Technology and Inequality
- Machine learning has been rapidly adopted in many industries
- Central application: default prediction in credit markets (e.g. Khandani, Kim, and Lo, 2010; Sirignano, Sadhwani, and Giesecke, 2017)
- This paper: What are the distributional effects of new technology?
This Paper
- Theory: Distributional implications of “better” statistical technology
- Mortgage default prediction: Using US administrative data with traditional technology (Logit) and Machine Learning
- Distributional consequences of new technology
  - Across racial groups: fewer winners in some minority groups; increased dispersion
- Equilibrium implications in a model of competitive loan pricing
  - Outcomes differ on both extensive and intensive margins
A Lender’s Prediction Problem
Observe borrowers with characteristics x and default outcome y
Predict y = P(x) to minimize MSE
- Old technology: Restricted class of functions P (e.g. linear)
- New technology: Wider class of permitted functions
Lemma. Optimal predictions with new technology are a mean-preserving spread of those with old technology
⇒ There are winners and losers
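A minimal sketch of what the lemma implies, on simulated data. Everything here is an assumption for illustration: x takes five values, default risk is U-shaped in x, the "old" technology is a least-squares line, and the "new" technology is the unrestricted default rate in each x cell. The sketch checks two implications of a mean-preserving spread (equal means, weakly larger variance), not the full statement.

```python
import random

random.seed(0)

# Hypothetical data: x takes values 0..4 and default risk is U-shaped in x.
xs = [random.randrange(5) for _ in range(20000)]
ys = [1 if random.random() < 0.02 + 0.01 * (x - 2) ** 2 else 0 for x in xs]

def mean(v):
    return sum(v) / len(v)

def variance(v):
    m = mean(v)
    return sum((a - m) ** 2 for a in v) / len(v)

# Old technology: least-squares line y ~ a + b * x.
mx, my = mean(xs), mean(ys)
b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
a = my - b * mx
p_old = [a + b * x for x in xs]

# New technology: any function of x, i.e. the observed default rate in each x cell.
cell_rate = {v: mean([y for x, y in zip(xs, ys) if x == v]) for v in set(xs)}
p_new = [cell_rate[x] for x in xs]

print("mean prediction:", mean(p_old), mean(p_new))
print("variance of predictions:", variance(p_old), variance(p_new))
```

Both technologies predict the same average default rate, but the flexible predictions are more dispersed, so some borrowers must be assessed as riskier and others as safer than before.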
Winners and Losers
[Figure: default probability against x, comparing the nonlinear prediction P_nl with the linear prediction P_lin.]
- Convex quadratic: “extreme” x lose, others gain
- Two groups: “blue” borrowers lose due to high variance
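The two-group case can be illustrated with a tiny deterministic population. The numbers are hypothetical: both groups have mean x = 0, blue borrowers sit at more extreme x, and the true default probability is an assumed convex quadratic in which group membership plays no role.

```python
# Hypothetical population: both groups have mean x = 0, blue has more extreme x.
borrowers = [(-1.0, "red"), (1.0, "red"), (-2.0, "blue"), (2.0, "blue")]

def p_true(x):
    # Assumed convex quadratic default probability; the group does not enter.
    return 0.01 + 0.005 * x ** 2

xs = [x for x, _ in borrowers]
ps = [p_true(x) for x in xs]
n = len(xs)

# Best linear predictor of the default probability (population least squares).
mx, mp = sum(xs) / n, sum(ps) / n
slope = sum((x - mx) * (p - mp) for x, p in zip(xs, ps)) / sum((x - mx) ** 2 for x in xs)
intercept = mp - slope * mx

def p_lin(x):
    return intercept + slope * x

def group_mean(f, group):
    vals = [f(x) for x, g in borrowers if g == group]
    return sum(vals) / len(vals)

for g in ("red", "blue"):
    print(g, "linear:", round(group_mean(p_lin, g), 4), "nonlinear:", round(group_mean(p_true, g), 4))
```

The linear predictor treats both groups alike because they share the same mean x. The quadratic predictor raises the average predicted default probability of the high-variance blue group and lowers that of the red group.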
Sources of Unequal Effects
- Previous example could arise from
  y = P(x) + ε,
  where P is nonlinear and the group g does not matter for y.
  ⇒ Winners/losers arise from additional flexibility of new technology. Effects across g depend on functional form of new technology, and the differences in distribution of characteristics
- Alternative:
  y = β · x + γ · g + ε,
  i.e. true relationship is linear, but g predictive of default.
  ⇒ Effects of new technology arise due to “triangulating” g
Triangulation
[Figure: default probability against x, showing the group-specific lines βx + γg_red and βx + γg_blue, the linear prediction P_lin and, in the second panel, the nonlinear prediction P_nl.]
- No linear correlation between x and g → linear model simply recovers average
- Blue borrowers more likely to have extreme x → nonlinear model penalizes.
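A sketch of triangulation under stated assumptions. The true relationship is linear in x and g, the lender never observes g, and the population (shares of blue borrowers at each x, and the coefficients ALPHA, BETA, GAMMA) is hypothetical. The flexible technology is taken to be the conditional mean E[y | x].

```python
# Hypothetical population cells: (x, g, weight), with g = 1 for "blue".
# Blue borrowers are concentrated at extreme x, but x and g are linearly uncorrelated.
blue_share = {-2: 0.8, -1: 0.2, 0: 0.2, 1: 0.2, 2: 0.8}
cells = []
for x, s in blue_share.items():
    cells.append((x, 1, 0.2 * s))        # blue borrowers at this x
    cells.append((x, 0, 0.2 * (1 - s)))  # red borrowers at this x

ALPHA, BETA, GAMMA = 0.02, 0.005, 0.01   # assumed true coefficients

def p_true(x, g):
    return ALPHA + BETA * x + GAMMA * g

def wmean(pairs):
    total = sum(w for _, w in pairs)
    return sum(v * w for v, w in pairs) / total

# Linear technology: population least squares of y on x only.
mx = wmean([(x, w) for x, g, w in cells])
mp = wmean([(p_true(x, g), w) for x, g, w in cells])
cov = sum(w * (x - mx) * (p_true(x, g) - mp) for x, g, w in cells)
var = sum(w * (x - mx) ** 2 for x, g, w in cells)
slope = cov / var
p_lin = {x: mp + slope * (x - mx) for x in blue_share}

# Flexible technology: E[y | x], which implicitly loads on P(blue | x).
p_nl = {x: wmean([(p_true(x, g), w) for xx, g, w in cells if xx == x]) for x in blue_share}

def group_avg(pred, group):
    return wmean([(pred[x], w) for x, g, w in cells if g == group])

print("blue:", group_avg(p_lin, 1), group_avg(p_nl, 1))
print("red: ", group_avg(p_lin, 0), group_avg(p_nl, 0))
```

The linear model recovers the slope β and an average intercept, so it cannot tell the groups apart. The flexible model learns that extreme x signals a higher chance of being blue and raises predicted default there, even though g is never an input.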
US Mortgage Data
HMDA
- Application date, applicant income, loan type, size, purpose
- Race, ethnicity, gender
McDash (Black Knight)
- Underwriting, contract and performance: e.g. FICO, LTV, interest rate, default status
Linked Dataset
- 9.4m mortgage loans from 2009-2013
- Portfolio and GSE loans, < $1m
- Default: 90+ days delinquent within 3 years of origination
Default Rates Across Race
[Figure: average three-year default rate (axis 0 to 0.02) by group: Black, White Hispanic, Native American, White Non-Hispanic, Asian.]
Mean FICO Across Race
[Figure: average FICO score (axis 730 to 770) by group: Black, White Hispanic, Native American, White Non-Hispanic, Asian.]
S.D. of FICO Across Race
[Figure: S.D. of FICO score (axis 40 to 60) by group: Black, White Hispanic, Native American, White Non-Hispanic, Asian.]
Interest Rates Across Race
[Figure: average spread at origination (axis −0.1 to 0.1) by group: Black, White Hispanic, Native American, White Non-Hispanic, Asian.]
Estimating Probabilities of Default: Technologies
Traditional: Probability of Default = Logit(x) (e.g. Demyanyk and Van Hemert, 2011; Elul et al., 2010)
- Using nonlinear “bin” dummies for FICO, LTV, income
Machine Learning: Decision trees estimate step functions
1. Random forest (w/ cross-validation)
2. Calibration (isotonic regression)
- (Similar if use “XGBoost”)

Excerpt shown on the slide (from Khandani, Kim, and Lo, 2010):
  Figure 11: An example of a CART model for a discrete dependent variable with two outcomes, good and bad, and two independent variables {x1, x2}.
  [...] increases the types of relations that can be captured and the number of independent variables that can be used. Moreover, CART models produce easily interpretable decision rules whose logic is clearly laid out in the tree. This aspect is particularly relevant for applications in the banking sector in which “black-box” models are viewed with suspicion and skepticism.
  CART models can easily be applied to problems with high-dimensional feature spaces. Suppose we have N observations of the dependent variable {y1, ..., yN} and its corresponding D-dimensional feature vectors {x1, ..., xN}. We estimate the parameters of the CART model on the training dataset by recursively selecting features from x ∈ {x1, ..., xD} and parameters {Lj} that minimize the residual sum-of-squared errors. Of course, we must impose a “pruning criterion” for stopping the expansion of the tree so as to avoid overfitting the training data. One of the most widely used measures for pruning is the Gini measure:
  \[ G(\tau) \equiv \sum_{k=1}^{K} P_\tau(k)\,\bigl(1 - P_\tau(k)\bigr) \qquad (1) \]
  where τ refers to a leaf node of a CART model and P_τ(k) refers to the proportion of training data assigned to class k at leaf node τ. Then the pruning criterion for CART model T is [...]
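The calibration step can be sketched with the pool-adjacent-violators algorithm, which is the standard way to compute an isotonic (nondecreasing) fit. This is an illustration only, not the authors' code: the function names `isotonic_fit` and `calibrate` and the example scores are hypothetical, and a library implementation (for instance scikit-learn's `IsotonicRegression`) would normally be used instead.

```python
def isotonic_fit(scores, outcomes):
    """Pool-adjacent-violators: nondecreasing fit of outcomes on raw scores.

    Returns a list of (upper score of block, calibrated value) pairs.
    Tied scores are not pooled in this sketch.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    blocks = []  # each block: [sum of outcomes, count, highest score in block]
    for i in order:
        blocks.append([float(outcomes[i]), 1, scores[i]])
        # Merge backwards while adjacent block means violate monotonicity.
        while len(blocks) > 1 and (blocks[-2][0] / blocks[-2][1]
                                   > blocks[-1][0] / blocks[-1][1]):
            s, c, top = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
            blocks[-1][2] = top
    return [(top, s / c) for s, c, top in blocks]

def calibrate(score, fitted):
    """Map a raw score to its calibrated value (step function, clipped at the top)."""
    for top, value in fitted:
        if score <= top:
            return value
    return fitted[-1][1]

# Hypothetical raw model scores and observed default indicators.
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
outcomes = [0, 0, 1, 0, 1, 1]
fitted = isotonic_fit(scores, outcomes)
calibrated = [calibrate(s, fitted) for s in scores]
print(fitted)
print(calibrated)
```

The fit preserves the ranking produced by the raw scores and the overall default rate, while replacing each score with the observed default frequency of its pooled block.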
Explanatory Variables

Logit                       Nonlinear Logit
Applicant Income (linear)   Applicant Income (25k bins, from 0-500k)
LTV Ratio (linear)          LTV Ratio (5-point bins, from 20 to 100%; separate dummy for LTV=80%)
FICO (linear)               FICO (20-point bins, from 600 to 850; separate dummy for FICO<600)
                            (with dummy variables for missing values)

Common Covariates
- Spread at Origination “SATO” (linear)
- Origination Amount (linear and log)
- Documentation Type (dummies for full/low/no/unknown documentation)
- Occupancy Type (dummies for vacation/investment property)
- Jumbo Loan (dummy)
- Coapplicant Present (dummy)
- Loan Purpose (dummies for purchase, refinance, home improvement)
- Loan Term (dummies for 10, 15, 20, 30 year terms)
- Funding Source (dummies for portfolio, Fannie Mae, Freddie Mac, other)
- Mortgage Insurance (dummy)
- State (dummies)
- Year of Origination (dummies)
Model Performance
Estimate on training set (70%), evaluate on test set (30%).
Out-of-sample performance:
- R2 ↑ by 14.30%
- Precision Score ↑ by 5.1%
  (How many predicted defaults are true defaults?)
- Bootstrap analysis confirms significant differences
→ Random Forest method substantially better predictor of Pr(default|X)
Unequal Effects of New Technology: Example
- Fix all characteristics but income + FICO
- Compare distribution vs. predictions by race
- Logit not very flexible
- RF much more flexible
[Figures: contour plots over Income (axis 50 to 200) and FICO (axis 650 to 800), in three panels: “Nonlinear Logit: White Non-Hispanic” (contour labels 0.1 to 0.5), “Random Forest: White Non-Hispanic” (contour labels 0.2 to 1.3), “Random Forest: Black” (contour labels 0.2 to 1.3).]
Unequal Effects of New Technology: Population
[Figure: cumulative share of borrowers (axis 0 to 1) against Log(PD from Random Forest) - Log(PD from Nonlinear Logit) (axis −1.0 to 1.0), by group: Asian, White Non-Hispanic, White Hispanic, Black.]
Unequal Effects of New Technology: Alternative Approaches
[Figure: two panels comparing alternative specifications (More Interactions, GSE+Full Doc, White, 2009−2011, No Unknowns, Without FICO, With Int. Rates, Without Int. Rates). Left panel: Random Forest improvement over NL Logit in R^2 (p.p.), axis 0 to 20. Right panel: percent share more creditworthy in Random Forest vs. Nonlinear Logit, axis 0 to 15, by race (Hispanic, Black, Asian, White).]
Predicting Race using Identical Observables

Model             ROC AUC   Precision Score   Brier Score × 10   R2
Logit             0.7478    0.1948            0.5791             0.0609
Nonlinear Logit   0.7485    0.1974            0.5783             0.0622
Random Forest     0.7527    0.2110            0.5665             0.0813

→ RF model is strikingly better at predicting Black / Hispanic borrowers
Flexibility versus Triangulation
Decomposition of model improvements:
1. Add race as an explanatory variable to Logit
2. Allow use of ML technology to the model with race (i.e. “add” nonlinear functions / interactions of x as explanatory variables)

Metric      Race    Race Int.   Technology
ROC-AUC     6.28    2.04        91.69
Precision   9.05    22.43       68.52
R2          3.37    4.39        92.24

Metric      Technology   Race
ROC-AUC     89.77        10.23
Precision   94.14        5.86
R2          92.95        7.05

⇒ Improved performance mostly due to flexibility, not triangulation
Interest Rates in Competitive Equilibrium
Simple 2-period model:
\[ NPV(x, R) = \frac{1}{1+\rho}\Bigl[\bigl(1 - P(x,R)\bigr)(1+R)L + P(x,R)\,\tilde{L}\Bigr] - L \]
- Equilibrium R^*(x) solves NPV = 0
- Reject x-borrowers if NPV(x, R) < 0 for all feasible R
[Figure: NPV against R for two borrower types, with curves N(x_L, R) and N(x_H, R) and the rate R(x_L) marked.]
- Calibration:
  - Recovery: \tilde{L} = min((1+R)L, 0.75V) − 0.1L (second part: carrying costs, liquidation expenses)
  - WACC: ρ = quarterly average interest rate − 30bps
  - 3-year PD to lifetime via “standard default assumption” (MBS market convention)
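A sketch of the pricing rule in the two-period model: search for the lowest rate at which the NPV is non-negative, and reject the borrower if no feasible rate works. The recovery rule follows the calibration stated on the slide, reading the recovery value as a separate quantity from the loan amount L. Everything else is an assumption for illustration: the default-probability functions, the loan size, house value, ρ, the rate cap `r_max`, and the grid-plus-bisection search. The WACC and lifetime-PD calibration steps are not modeled.

```python
def npv(R, pd, L, V, rho):
    """Two-period NPV of lending L at rate R, where pd(R) is the default probability."""
    recovery = min((1 + R) * L, 0.75 * V) - 0.1 * L  # recovery rule from the slide
    p = pd(R)
    return ((1 - p) * (1 + R) * L + p * recovery) / (1 + rho) - L

def equilibrium_rate(pd, L, V, rho, r_max=0.25, steps=2500, tol=1e-10):
    """Lowest R in [0, r_max] with NPV >= 0 (grid scan, then bisection); None means reject."""
    prev = 0.0
    if npv(prev, pd, L, V, rho) >= 0:
        return prev
    for k in range(1, steps + 1):
        R = r_max * k / steps
        if npv(R, pd, L, V, rho) >= 0:
            lo, hi = prev, R
            while hi - lo > tol:
                mid = (lo + hi) / 2
                if npv(mid, pd, L, V, rho) < 0:
                    lo = mid
                else:
                    hi = mid
            return hi
        prev = R
    return None

# Hypothetical borrowers: default probability rises with the rate charged.
def low_risk(R):
    return 0.01 + 0.2 * R

def high_risk(R):
    return min(1.0, 0.5 + 2.0 * R)

r_low = equilibrium_rate(low_risk, L=80.0, V=100.0, rho=0.03)
r_high = equilibrium_rate(high_risk, L=80.0, V=100.0, rho=0.03)
print("low-risk break-even rate:", r_low)
print("high-risk borrower:", "rejected" if r_high is None else r_high)
```

Scanning a grid before bisecting matters because P(x, R) rises with R, so the NPV need not be monotone in the rate and an endpoint check alone could miss a feasible break-even rate.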
Identification
In data, only observe one R per loan. But likely not randomly allocated.
→ bias in P for counterfactual rates R
Proposed solution:
1. Restrict attention to GSE / full documentation loans
   → Likely selection on observable variables, not soft information (Keys et al., 2010)
2. Adjust ∂P/∂R downwards using ratio of causal to reduced-form estimates based on Fuster and Willen (2017)
   - Estimated ∂P/∂R over first 3 years ≈ 1.7 × causal ∂P/∂R
Model Outcomes
- Acceptance rates
- Average SATO (= R − R_t)
- S.D. of SATO
→ new technology increases dispersion across and within groups

Acceptance rates (%)   RF       Logit
Asian                  93.3     92.4
White Non-Hispanic     91.1     90.3
Other                  89.5     88.9
White Hispanic         86.4     85.6
Black                  79.3     77.7

Average SATO           RF       Logit
Asian                  −0.123   −0.108
White Non-Hispanic     −0.09    −0.083
Other                  −0.088   −0.083
White Hispanic         −0.008   −0.031
Black                  0.06     0.022

S.D. of SATO           RF       Logit
Asian                  0.322    0.274
White Non-Hispanic     0.356    0.296
Other                  0.36     0.296
White Hispanic         0.414    0.333
Black                  0.461    0.365
Inclusion and Exclusion in Equilibrium
[Figures: bar charts of mean SATO (axis 0 to 1.2) and S.D. of SATO (axis 0 to 0.5) by inclusion status (Always In, Excluded, Included) and technology (RF, Logit), shown separately for White + Asian and Black + Hispanic borrowers. Bar labels as extracted, in plot order:]
- Average SATO, White + Asian borrowers: −0.12, −0.10, 0.71, 0.95
- Average SATO, Black + Hispanic borrowers: −0.03, −0.04, 0.72, 1.00
- S.D. of SATO, White + Asian borrowers: 0.30, 0.27, 0.39, 0.42
- S.D. of SATO, Black + Hispanic borrowers: 0.37, 0.31, 0.39, 0.38
Conclusion
- Improvements in statistical technology create
  - Greater predictive power and gains for producers
  - Increased disparity in outcomes for consumers
- Based on US mortgage data, Black + Hispanic borrowers bear larger changes
  - First-moment effects: More likely to be perceived as high risk
  - Second-moment effects: Greater increase in dispersion of outcomes
  - Improvement comes from more than just “putting race in”
- Equilibrium effects
  - Positive extensive-margin effect of new technology
  - Unequal effects persist at intensive margin
Performance of Different Statistical Technologies Predicting Default

                  ROC AUC            Precision Score    Brier Score × 100   R2
                  (1)       (2)      (3)       (4)      (5)       (6)       (7)       (8)
Model             No Race   Race     No Race   Race     No Race   Race      No Race   Race
Logit             0.8522    0.8526   0.0589    0.0592   0.7172    0.7171    0.0245    0.0246
Nonlinear Logit   0.8569    0.8573   0.0598    0.0601   0.7146    0.7145    0.0280    0.0281
Random Forest     0.8634    0.8641   0.0630    0.0641   0.7114    0.7110    0.0323    0.0329
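Measuring Model Performance
MSE is one natural way to evaluate P:
\[ MSE(P) = n^{-1} \sum_{i=1}^{n} \bigl(P(x_i) - y_i\bigr)^2 \]
Can be decomposed into three components:
\[ MSE(P) = \underbrace{n^{-1} \sum_{k=1}^{K} n_k (\hat{y}_k - \bar{y}_k)^2}_{\text{Reliability}} \;-\; \underbrace{n^{-1} \sum_{k=1}^{K} n_k (\bar{y}_k - \bar{y})^2}_{\text{Resolution}} \;+\; \underbrace{\bar{y}(1 - \bar{y})}_{\text{Uncertainty}} \]
(\hat{y}_k: predicted value in bin k; \bar{y}_k: observed default rate in bin k; n_k: number of observations in bin k; \bar{y}: overall default rate)
- Brier Scores: RF 0.00711, Logit 0.00714, but overall uncertainty: 0.00735
- Reliability: Logit is 3500% worse than RF
- Resolution: Logit is 40% better than RF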
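The three-part decomposition can be checked numerically. In this sketch the bins are the distinct predicted values, in which case the identity holds exactly; the function name `brier_decomposition` and the example predictions are hypothetical.

```python
from collections import defaultdict

def brier_decomposition(preds, outcomes):
    """Return (mse, reliability, resolution, uncertainty), binning by distinct prediction."""
    n = len(preds)
    ybar = sum(outcomes) / n
    bins = defaultdict(list)  # predicted value -> observed outcomes in that bin
    for p, y in zip(preds, outcomes):
        bins[p].append(y)
    # Reliability: gap between the prediction and the observed rate within each bin.
    reliability = sum(len(ys) * (p - sum(ys) / len(ys)) ** 2 for p, ys in bins.items()) / n
    # Resolution: how far bin-level observed rates move away from the overall rate.
    resolution = sum(len(ys) * (sum(ys) / len(ys) - ybar) ** 2 for ys in bins.values()) / n
    uncertainty = ybar * (1 - ybar)
    mse = sum((p - y) ** 2 for p, y in zip(preds, outcomes)) / n
    return mse, reliability, resolution, uncertainty

# Hypothetical predictions and default outcomes.
preds = [0.1, 0.1, 0.1, 0.1, 0.5, 0.5, 0.9, 0.9]
outcomes = [0, 0, 0, 1, 1, 0, 1, 1]
mse, rel, res, unc = brier_decomposition(preds, outcomes)
print(mse, rel - res + unc)
```

Lower reliability (better calibration) and higher resolution (sharper separation of risk) both reduce the MSE, while the uncertainty term depends only on the overall default rate.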
Isotonic regressions and calibration
[Figure: calibration curves, fraction of positives against mean predicted value (both axes 0 to 1), for Logit, Nonlinear Logit, Random Forest and Random Forest - Isotonic, together with the perfectly calibrated line.]
Decomposition of Performance Improvement

Panel A: Race Controls First
Metric      Race    Technology
ROC-AUC     5.88    94.12
Precision   7.90    92.10
Brier       3.25    96.75
R2          2.04    97.96

Panel B: New Technology First
Metric      Technology   Race
ROC-AUC     91.16        8.84
Precision   77.21        22.79
Brier       90.63        9.37
R2          87.75        12.25

- Panel A: Nonlinear Logit → add race dummies. This yields less than 8% of the fit improvement obtained from moving to Random Forest (w/o race)
- Panel B: Random Forest → add race dummies. Slightly larger improvements from having race, but still much less important than the benefit of flexibility
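One plausible reading of the "race controls first" ordering is a sequential split of the total gain: first add race to the baseline model, then switch technology. The sketch below applies that reading to the rounded ROC AUC entries from the performance table in this deck. The function name `improvement_shares` and the choice of Nonlinear Logit as the baseline are assumptions, and because the inputs are rounded the result need not match the reported shares exactly.

```python
def improvement_shares(base, base_plus_race, full):
    """Split the gain from `base` to `full` into a race share and a technology share (in %)."""
    total = full - base
    race = 100 * (base_plus_race - base) / total
    return race, 100 - race

# Rounded ROC AUC values from the performance table:
# Nonlinear Logit without race, Nonlinear Logit with race, Random Forest with race.
race_share, tech_share = improvement_shares(0.8569, 0.8573, 0.8641)
print(round(race_share, 2), round(tech_share, 2))
```

Reversing the order (technology first, then race) uses Random Forest without race as the intermediate model, which is why the two panels attribute different shares to race.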
Predicting Minority Status from x
[Figures: ROC curves (true positive rate against false positive rate) and precision-recall curves (precision against recall) for Logit, Nonlinear Logit and Random Forest; all axes 0 to 1.]

Model             ROC AUC   Precision Score   Brier Score × 10   R2
Logit             0.7478    0.1948            0.5791             0.0609
Nonlinear Logit   0.7485    0.1974            0.5783             0.0622
Random Forest     0.7527    0.2110            0.5665             0.0813