
Predictably Unequal? The Effects of Machine Learning on Credit Markets

Andreas Fuster, Paul Goldsmith-Pinkham, Tarun Ramadorai, Ansgar Walther

SNB, Yale SOM, Imperial College (×2)

September 2020

1 / 26

Advances in Technology and Inequality

- Machine learning has been rapidly adopted in many industries

- Central application: default prediction in credit markets (e.g. Khandani, Kim, and Lo, 2010; Sirignano, Sadhwani, and Giesecke, 2017)

- This paper: What are the distributional effects of new technology?

2 / 26


This Paper

Theory: Distributional implications of “better” statistical technology

Mortgage default prediction: Using US administrative data with traditional technology (Logit) and Machine Learning

Distributional consequences of new technology
- Across racial groups: fewer winners in some minority groups; increased dispersion

Equilibrium implications in a model of competitive loan pricing
- Outcomes differ on both extensive and intensive margins

4 / 26

A Lender’s Prediction Problem

Observe borrowers with characteristics x and default outcome y

Predict y = P(x) to minimize MSE
- Old technology: Restricted class of functions P (e.g. linear)
- New technology: Wider class of permitted functions

Lemma. Optimal predictions with new technology are a mean-preserving spread of those with old technology
⇒ There are winners and losers
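The lemma can be illustrated with a small simulation. This is only a sketch under stated assumptions: the data are simulated, and both "technologies" are stand-ins (bin means on a coarse versus a finer, nested partition of x) rather than the Logit and Random Forest models used in the paper. The function name `bin_mean_predictions` and all parameter values are hypothetical.

```python
import random
import statistics

random.seed(0)

# Simulated borrowers (assumption): x uniform on [0, 1), default probability convex in x.
xs = [random.random() for _ in range(20000)]
ys = [1 if random.random() < 0.02 + 0.2 * (x - 0.5) ** 2 else 0 for x in xs]

def bin_mean_predictions(xs, ys, n_bins):
    """MSE-optimal prediction among functions that are constant on n_bins equal-width bins."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for x, y in zip(xs, ys):
        b = min(int(x * n_bins), n_bins - 1)
        sums[b] += y
        counts[b] += 1
    means = [s / c if c else 0.0 for s, c in zip(sums, counts)]
    return [means[min(int(x * n_bins), n_bins - 1)] for x in xs]

p_old = bin_mean_predictions(xs, ys, 2)  # restricted class: 2 bins
p_new = bin_mean_predictions(xs, ys, 8)  # wider class: 8 bins, which nests the 2-bin class

# Same average prediction, more dispersed predictions.
mean_old, mean_new = statistics.fmean(p_old), statistics.fmean(p_new)
var_old, var_new = statistics.pvariance(p_old), statistics.pvariance(p_new)

# Winners are borrowers whose predicted default probability falls under the wider class.
share_winners = sum(pn < po for pn, po in zip(p_new, p_old)) / len(xs)
```

Because the finer partition nests the coarser one, each coarse prediction is the average of the fine predictions inside it, which is what makes the new predictions a mean-preserving spread of the old ones in this toy setting.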

5 / 26

Winners and Losers

[Figure: default probability plotted against x, comparing the nonlinear prediction P_nl with the linear prediction P_lin; shown in two versions.]

- Convex quadratic: “extreme” x lose, others gain
- Two groups: “blue” borrowers lose due to high variance

6 / 26

Sources of Unequal Effects

- Previous example could arise from

  y = P(x) + ε,

  where P is nonlinear and the group g does not matter for y.
  ⇒ Winners/losers arise from additional flexibility of new technology. Effects across g depend on functional form of new technology, and the differences in distribution of characteristics

- Alternative:

  y = β · x + γ · g + ε,

  i.e. true relationship is linear, but g is predictive of default.
  ⇒ Effects of new technology arise due to “triangulating” g

7 / 26


Triangulation

[Figure: default probability plotted against x, showing the group-specific lines βx + γg_red and βx + γg_blue, the linear prediction P_lin, and (in the second version) the nonlinear prediction P_nl.]

- No linear correlation between x and g → linear model simply recovers average
- Blue borrowers more likely to have extreme x → nonlinear model penalizes.

8 / 26
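A minimal simulation of the triangulation mechanism, assuming a linear truth y = β·x + γ·g + ε in which the group g is unobserved by the lender. Every number here is an assumption chosen for illustration (β = 0.5, γ = 1, group spreads of 2.0 and 0.5), and the "new technology" is a deliberately simple stand-in: a step function in |x| added to the linear fit.

```python
import random
import statistics

random.seed(1)
beta, gamma = 0.5, 1.0  # assumed parameters

# g = 1 ("blue") has the same mean x as g = 0 ("red") but a larger spread,
# so x and g are linearly uncorrelated while |x| is informative about g.
g = [random.randint(0, 1) for _ in range(20000)]
x = [random.gauss(0.0, 2.0 if gi else 0.5) for gi in g]
y = [beta * xi + gamma * gi + random.gauss(0.0, 1.0) for xi, gi in zip(x, g)]

# Old technology: least-squares line in x only.
mx, my = statistics.fmean(x), statistics.fmean(y)
slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
p_lin = [my + slope * (xi - mx) for xi in x]

# New technology (stand-in): add a step function in |x| fitted to the linear residuals.
def bucket(xi):
    return min(int(abs(xi)), 3)  # |x| in [0,1), [1,2), [2,3), [3, inf)

resid = {b: [] for b in range(4)}
for xi, yi, pi in zip(x, y, p_lin):
    resid[bucket(xi)].append(yi - pi)
step = {b: statistics.fmean(r) for b, r in resid.items()}
p_nl = [pi + step[bucket(xi)] for xi, pi in zip(x, p_lin)]

def mean_change(group):
    """Average change in the prediction for one group when moving to the flexible model."""
    return statistics.fmean(pn - pl for pn, pl, gi in zip(p_nl, p_lin, g) if gi == group)

change_blue, change_red = mean_change(1), mean_change(0)
```

In this setup the linear slope stays close to β, while the flexible model raises predictions at extreme x, so `change_blue` is positive and `change_red` is negative even though g never enters either model.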

US Mortgage Data

HMDA

- Application date, applicant income, loan type, size, purpose
- Race, ethnicity, gender

McDash (Black Knight)

- Underwriting, contract and performance: e.g. FICO, LTV, interest rate, default status

Linked Dataset

- 9.4m mortgage loans from 2009-2013
- Portfolio and GSE loans, < $1m
- Default: 90+ days delinquent within 3 years of origination

9 / 26
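The default definition above (90+ days delinquent within 3 years of origination) could be coded roughly as follows. This is a sketch only: the function and argument names are hypothetical, and counting the window in whole calendar months is an assumption rather than the convention of the underlying dataset.

```python
from datetime import date

def months_between(start, end):
    """Whole calendar months from start to end."""
    return (end.year - start.year) * 12 + (end.month - start.month)

def defaulted_within_3y(origination, first_90dpd):
    """True if the loan first became 90+ days delinquent within 36 months of origination.

    first_90dpd is None for loans that never reached 90+ days delinquent.
    """
    if first_90dpd is None:
        return False
    return 0 <= months_between(origination, first_90dpd) <= 36
```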

Default Rates Across Race

[Figure: bar chart of the average three-year default rate (axis from 0 to 0.02) by group: Black, White Hispanic, Native American, White Non-Hispanic, Asian.]

10 / 26

Mean FICO Across Race

[Figure: bar chart of the average FICO score (axis from 730 to 770) by group: Black, White Hispanic, Native American, White Non-Hispanic, Asian.]

11 / 26

S.D. of FICO Across Race

[Figure: bar chart of the S.D. of FICO score (axis from 40 to 60) by group: Black, White Hispanic, Native American, White Non-Hispanic, Asian.]

12 / 26

Interest Rates Across Race

[Figure: bar chart of the average spread at origination (axis from −0.1 to 0.1) by group: Black, White Hispanic, Native American, White Non-Hispanic, Asian.]

13 / 26

Estimating Probabilities of Default: Technologies

Traditional: Probability of Default = Logit(x) (e.g. Demyanyk and Van Hemert, 2011; Elul et al., 2010)

- Using nonlinear “bin” dummies for FICO, LTV, income

Machine Learning: Decision trees estimate step functions

Figure 11: An example of a CART model for a discrete dependent variable with two outcomes, good and bad, and two independent variables {x1, x2}.

... increases the types of relations that can be captured and the number of independent variables that can be used. Moreover, CART models produce easily interpretable decision rules whose logic is clearly laid out in the tree. This aspect is particularly relevant for applications in the banking sector in which “black-box” models are viewed with suspicion and skepticism.

CART models can easily be applied to problems with high-dimensional feature spaces. Suppose we have N observations of the dependent variable {y1, ..., yN} and its corresponding D-dimensional feature vectors {x1, ..., xN}. We estimate the parameters of the CART model on the training dataset by recursively selecting features from x ∈ {x1, ..., xD} and parameters {Lj} that minimize the residual sum-of-squared errors. Of course, we must impose a “pruning criterion” for stopping the expansion of the tree so as to avoid overfitting the training data. One of the most widely used measures for pruning is the Gini measure:

\[ G(\tau) \equiv \sum_{k=1}^{K} P_\tau(k)\,(1 - P_\tau(k)) \qquad (1) \]

where τ refers to a leaf node of a CART model and P_τ(k) refers to the proportion of training data assigned to class k at leaf node τ. Then the pruning criterion for CART model T is ...

(from Khandani, Kim, and Lo, 2010)

1. Random forest (w/ cross-validation)

2. Calibration (isotonic regression)

- (Similar if using “XGBoost”)
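The Gini measure in equation (1) of the excerpt is straightforward to compute for a single leaf. The sketch below is illustrative; the function name `gini` and the example labels are assumptions, not part of the original slides.

```python
from collections import Counter

def gini(labels):
    """Gini measure G(tau) = sum_k P(k) * (1 - P(k)) for the class labels at one leaf."""
    n = len(labels)
    if n == 0:
        return 0.0
    return sum((c / n) * (1 - c / n) for c in Counter(labels).values())

pure_leaf = gini(["good"] * 4)                      # a leaf with a single class
mixed_leaf = gini(["good", "good", "good", "bad"])  # 3/4 good, 1/4 bad
```

A pure leaf scores zero, and for two classes the measure peaks at 0.5 when the leaf is split evenly.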

14 / 26

Explanatory Variables

Logit vs. Nonlinear Logit (with dummy variables for missing values)
- Applicant Income: linear vs. 25k bins, from 0-500k
- LTV Ratio: linear vs. 5-point bins, from 20 to 100%; separate dummy for LTV=80%
- FICO: linear vs. 20-point bins, from 600 to 850; separate dummy for FICO<600

Common Covariates
- Spread at Origination “SATO” (linear)
- Origination Amount (linear and log)
- Documentation Type (dummies for full/low/no/unknown documentation)
- Occupancy Type (dummies for vacation/investment property)
- Jumbo Loan (dummy)
- Coapplicant Present (dummy)
- Loan Purpose (dummies for purchase, refinance, home improvement)
- Loan Term (dummies for 10, 15, 20, 30 year terms)
- Funding Source (dummies for portfolio, Fannie Mae, Freddie Mac, other)
- Mortgage Insurance (dummy)
- State (dummies)
- Year of Origination (dummies)
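As an illustration of the Nonlinear Logit inputs, the FICO binning described above might be built as follows. This is a hedged sketch: the names `fico_bin`, `fico_dummies` and `LABELS` are hypothetical, and the bin-edge conventions (for example a final 840-850 bin) are assumptions that the slides do not specify.

```python
def fico_bin(fico):
    """Label of the dummy a FICO score falls into (20-point bins from 600 to 850)."""
    if fico is None:
        return "missing"          # separate dummy for missing values
    if fico < 600:
        return "<600"             # separate dummy for FICO < 600
    lo = min(600 + 20 * ((int(fico) - 600) // 20), 840)
    return f"{lo}-{lo + 19}" if lo < 840 else "840-850"

# All dummy names, in a fixed order.
LABELS = ["missing", "<600"] + [f"{lo}-{lo + 19}" for lo in range(600, 840, 20)] + ["840-850"]

def fico_dummies(fico):
    """One-hot vector over LABELS for a single FICO score."""
    label = fico_bin(fico)
    return [1 if label == name else 0 for name in LABELS]
```

For example, `fico_bin(745)` gives `"740-759"`, and `fico_dummies` always returns a vector with exactly one entry equal to 1.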

15 / 26

Model Performance

Estimate on training set (70%), evaluate on test set (30%).

Out-of-sample performance:
- R2 ↑ by 14.30%
- Precision Score ↑ by 5.1% (how many predicted defaults are true defaults?)
- Bootstrap analysis confirms significant differences

→ Random Forest method is a substantially better predictor of Pr(default|X)

16 / 26

Unequal Effects of New Technology: Example

- Fix all characteristics but income + FICO
- Compare distribution vs. predictions by race
- Logit not very flexible
- RF much more flexible

[Figure: three contour panels over Income (x-axis, 50 to 200) and FICO (y-axis, 650 to 800): “Nonlinear Logit: White Non-Hispanic”, “Random Forest: White Non-Hispanic”, and “Random Forest: Black”.]

17 / 26

Unequal Effects of New Technology: Population

[Figure: cumulative share of borrowers (y-axis, 0 to 1) against Log(PD from Random Forest) - Log(PD from Nonlinear Logit) (x-axis, −1 to 1), by group: Asian, White Non-Hispanic, White Hispanic, Black.]

18 / 26

Unequal Effects of New Technology: Alternative Approaches

[Figure: two panels comparing alternative specifications (More Interactions, GSE+Full Doc, White, 2009−2011, No Unknowns, Without FICO, With Int. Rates, Without Int. Rates). Left panel: Random Forest improvement over NL Logit in R^2 (p.p.), axis from 0 to 20. Right panel: percent share more creditworthy in Random Forest vs. Nonlinear Logit, axis from 0 to 15, by race: Hispanic, Black, Asian, White.]

19 / 26

Predicting Race using Identical Observables

Model             ROC AUC   Precision Score   Brier Score × 10   R2
Logit             0.7478    0.1948            0.5791             0.0609
Nonlinear Logit   0.7485    0.1974            0.5783             0.0622
Random Forest     0.7527    0.2110            0.5665             0.0813

→ RF model is strikingly better at predicting Black / Hispanic borrowers

20 / 26

Flexibility versus Triangulation

Decomposition of model improvements:
1. Add race as an explanatory variable to Logit
2. Allow use of ML technology to the model with race (i.e. “add” nonlinear functions / interactions of x as explanatory variables)

⇒ Improved performance mostly due to flexibility, not triangulation

            Race    Race Int.   Technology
ROC-AUC     6.28    2.04        91.69
Precision   9.05    22.43       68.52
R2          3.37    4.39        92.24

            Technology   Race
ROC-AUC     89.77        10.23
Precision   94.14        5.86
R2          92.95        7.05

21 / 26

Interest Rates in Competitive Equilibrium

Simple 2-period model:

\[ NPV(x, R) = \frac{1}{1+\rho}\left[(1 - P(x,R))(1+R)L + P(x,R)\tilde{L}\right] - L \]

- Equilibrium R^*(x) solves NPV = 0

- Reject x-borrowers if NPV(x, R) < 0 for all feasible R

[Figure: NPV plotted against R, with curves N(x_L, R) and N(x_H, R) and the rate R(x_L) marked.]

- Calibration:
  - recovery: \tilde{L} = min((1+R)L, 0.75V) − 0.1L (second part: carrying costs, liquidation exp.)
  - WACC: ρ = quarterly average interest rate − 30bps
  - 3-year PD to lifetime via “standard default assumption” (MBS mkt convention)
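The break-even condition NPV = 0 can be solved numerically. The sketch below is one way to do it under stated assumptions: a grid scan followed by bisection, a hypothetical cap `r_max` on feasible rates, and made-up inputs (L = 100, V = 125, ρ = 0.04, constant default probabilities). It treats the recovery term as min((1+R)L, 0.75V) − 0.1L, following the calibration bullet, and is not the authors' implementation.

```python
def npv(R, pd, L, V, rho):
    """Two-period NPV of lending L at rate R to a borrower with default probability pd."""
    recovery = min((1 + R) * L, 0.75 * V) - 0.1 * L  # calibrated recovery value
    return ((1 - pd) * (1 + R) * L + pd * recovery) / (1 + rho) - L

def break_even_rate(pd_of_R, L, V, rho, r_max=0.25, steps=1000):
    """Lowest R in [0, r_max] with NPV >= 0, or None if NPV < 0 on the whole grid (reject)."""
    f = lambda R: npv(R, pd_of_R(R), L, V, rho)
    grid = [r_max * i / steps for i in range(steps + 1)]
    if f(grid[0]) >= 0:
        return grid[0]
    for lo, hi in zip(grid, grid[1:]):
        if f(hi) >= 0:                # first sign change lies between lo and hi
            for _ in range(60):       # bisection
                mid = (lo + hi) / 2
                if f(mid) >= 0:
                    hi = mid
                else:
                    lo = mid
            return hi
    return None

# Hypothetical borrowers: default probability does not respond to R in this toy example.
r_safe = break_even_rate(lambda R: 0.01, L=100.0, V=125.0, rho=0.04)
r_risky = break_even_rate(lambda R: 0.90, L=100.0, V=125.0, rho=0.04)
```

The grid scan matters because P(x, R) may rise with R, so NPV need not be monotone in R; checking only the endpoints could wrongly reject a borrower. Here the risky borrower has no break-even rate below `r_max` and is rejected.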

22 / 26


Identification

In data, only observe one R per loan. But likely not randomly allocated.
→ bias in P for counterfactual rates R

Proposed solution:
1. Restrict attention to GSE / full documentation loans
   → Likely selection on observable variables, not soft information (Keys et al., 2010)

2. Adjust ∂P/∂R downwards using ratio of causal to reduced-form estimates based on Fuster and Willen (2017)
   - Estimated ∂P/∂R over first 3 years ≈ 1.7 × causal ∂P/∂R

23 / 26

Model Outcomes

- Acceptance rates
- Average SATO (= R − R_t)
- S.D. of SATO
→ new technology increases dispersion across and within groups

Acceptance rates (axis from 0 to 100)

                     RF      Logit
Asian                93.3    92.4
White Non-Hispanic   91.1    90.3
Other                89.5    88.9
White Hispanic       86.4    85.6
Black                79.3    77.7

Average SATO (axis from −0.15 to 0.10)

                     RF       Logit
Asian                −0.123   −0.108
White Non-Hispanic   −0.09    −0.083
Other                −0.088   −0.083
White Hispanic       −0.008   −0.031
Black                0.06     0.022

S.D. of SATO (axis from 0 to 0.5)

                     RF      Logit
Asian                0.322   0.274
White Non-Hispanic   0.356   0.296
Other                0.36    0.296
White Hispanic       0.414   0.333
Black                0.461   0.365

24 / 26

Inclusion and Exclusion in Equilibrium

[Figure: four bar-chart panels of SATO for borrowers who are Always In, Excluded, or Included, with bars for RF and Logit. Bar labels are listed in the order they appear.]

- Average SATO: White + Asian Borrowers (Mean SATO, axis from 0 to 1.2): −0.12, −0.10, 0.71, 0.95
- Average SATO: Black + Hispanic Borrowers (Mean SATO, axis from 0 to 1.2): −0.03, −0.04, 0.72, 1.00
- SD of SATO: White + Asian Borrowers (SD SATO, axis from 0 to 0.5): 0.30, 0.27, 0.39, 0.42
- SD of SATO: Black + Hispanic Borrowers (SD SATO, axis from 0 to 0.4): 0.37, 0.31, 0.39, 0.38

25 / 26

Conclusion

- Improvements in statistical technology create
  - Greater predictive power and gains for producers
  - Increased disparity in outcomes for consumers

- Based on US mortgage data, Black + Hispanic borrowers bear larger changes
  - First-moment effects: More likely to be perceived as high risk
  - Second-moment effects: Greater increase in dispersion of outcomes
  - Improvement comes from more than just “putting race in”

- Equilibrium effects
  - Positive extensive-margin effect of new technology
  - Unequal effects persist at intensive margin

26 / 26

Performance of Different Statistical Technologies Predicting Default

                  ROC AUC            Precision Score    Brier Score × 100   R2
                  (1)       (2)      (3)       (4)      (5)       (6)       (7)       (8)
Model             No Race   Race     No Race   Race     No Race   Race      No Race   Race
Logit             0.8522    0.8526   0.0589    0.0592   0.7172    0.7171    0.0245    0.0246
Nonlinear Logit   0.8569    0.8573   0.0598    0.0601   0.7146    0.7145    0.0280    0.0281
Random Forest     0.8634    0.8641   0.0630    0.0641   0.7114    0.7110    0.0323    0.0329

1 / 5

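Measuring Model Performance

MSE is one natural way to evaluate P:

\[ \mathrm{MSE}(P) = n^{-1} \sum_{i=1}^{n} (P(x_i) - y_i)^2 \]

Can be decomposed into three components:

\[ \mathrm{MSE}(P) = \underbrace{n^{-1} \sum_{k=1}^{K} n_k (\hat{y}_k - \bar{y}_k)^2}_{\text{Reliability}} - \underbrace{n^{-1} \sum_{k=1}^{K} n_k (\bar{y}_k - \bar{y})^2}_{\text{Resolution}} + \underbrace{\bar{y}(1 - \bar{y})}_{\text{Uncertainty}} \]

where observations are grouped into K bins by predicted value, n_k is the number of observations in bin k, \hat{y}_k is the predicted value in bin k, \bar{y}_k is the realized default rate in bin k, and \bar{y} is the overall default rate.

- Brier Scores: RF 0.00711, Logit 0.00714, but overall uncertainty: 0.00735

- Reliability: Logit is 3500% worse than RF

- Resolution: Logit is 40% better than RF

2 / 5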
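The three-way decomposition of MSE into reliability, resolution and uncertainty can be checked numerically. In this sketch, observations are grouped by distinct predicted value, in which case the identity holds exactly for binary outcomes; the function name `brier_decomposition` and the toy data are illustrative assumptions.

```python
from collections import defaultdict

def brier_decomposition(preds, outcomes):
    """Return (reliability, resolution, uncertainty), grouping by distinct predicted value."""
    n = len(preds)
    groups = defaultdict(list)
    for p, y in zip(preds, outcomes):
        groups[p].append(y)
    ybar = sum(outcomes) / n
    # Reliability: gap between each predicted value and the realized rate in its group.
    reliability = sum(len(ys) * (p - sum(ys) / len(ys)) ** 2 for p, ys in groups.items()) / n
    # Resolution: how far group-level realized rates are from the overall rate.
    resolution = sum(len(ys) * (sum(ys) / len(ys) - ybar) ** 2 for ys in groups.values()) / n
    uncertainty = ybar * (1 - ybar)
    return reliability, resolution, uncertainty

# Toy data (assumption): three distinct predicted probabilities, binary outcomes.
preds = [0.1, 0.1, 0.1, 0.1, 0.5, 0.5, 0.9, 0.9]
outcomes = [0, 0, 0, 1, 1, 0, 1, 1]

rel, res, unc = brier_decomposition(preds, outcomes)
mse = sum((p - y) ** 2 for p, y in zip(preds, outcomes)) / len(preds)
```

Lower reliability and higher resolution both reduce MSE, while uncertainty depends only on the overall default rate and is the same for every model.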

Isotonic regressions and calibration

[Figure: calibration plot of the fraction of positives (y-axis, 0 to 1) against the mean predicted value (x-axis, 0 to 1), with curves for Perfectly calibrated, Logit, Nonlinear Logit, Random Forest, and Random Forest - Isotonic.]
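Isotonic calibration fits a non-decreasing map from raw model scores to realized outcomes. A minimal sketch using the pool-adjacent-violators algorithm is given below; it illustrates the general technique and is not the calibration code behind the slides. The function names are hypothetical, and tied scores are not pooled, which is a simplification.

```python
def isotonic_fit(values):
    """Pool-adjacent-violators: least-squares non-decreasing fit to a sequence."""
    blocks = []  # each block is [sum, count]
    for v in values:
        blocks.append([float(v), 1])
        # Merge while the previous block's mean exceeds the current block's mean.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)
    return fitted

def calibrate(scores, outcomes):
    """Calibrated probabilities from an isotonic regression of outcomes on raw scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    fitted = isotonic_fit([outcomes[i] for i in order])
    calibrated = [0.0] * len(scores)
    for rank, i in enumerate(order):
        calibrated[i] = fitted[rank]
    return calibrated

# Toy example (assumption): four raw scores with binary outcomes.
calibrated = calibrate([0.2, 0.9, 0.5, 0.7], [0, 1, 1, 0])
```

The calibrated values preserve the ranking of the raw scores, so ranking-based measures such as ROC AUC are essentially unaffected apart from ties, while the probability levels are pulled toward realized frequencies.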

3 / 5

Decomposition of Performance Improvement

Panel A: Race Controls First

            Race    Technology
ROC-AUC     5.88    94.12
Precision   7.90    92.10
Brier       3.25    96.75
R2          2.04    97.96

Panel B: New Technology First

            Technology   Race
ROC-AUC     91.16        8.84
Precision   77.21        22.79
Brier       90.63        9.37
R2          87.75        12.25

- Panel A: Nonlinear Logit → add race dummies. This yields less than 8% of the fit improvement obtained from moving to Random Forest (w/o race)

- Panel B: Random Forest → add race dummies. Slightly larger improvements from having race, but still much less important than the benefit of flexibility
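One reading of these shares, which appears consistent with the R2 rows, is a sequential decomposition of the total gain from Nonlinear Logit without race to Random Forest with race. The sketch below uses the rounded values from the table "Performance of Different Statistical Technologies Predicting Default", so other rows only match the panels approximately. The names `METRICS` and `shares` are hypothetical, and the formula itself is an assumption rather than something the slides state.

```python
# Rounded metric values from the performance table: (no race, with race).
METRICS = {
    "ROC-AUC": {"nl_logit": (0.8569, 0.8573), "rf": (0.8634, 0.8641)},
    "R2": {"nl_logit": (0.0280, 0.0281), "rf": (0.0323, 0.0329)},
}

def shares(metric, race_first=True):
    """Percent of the total improvement attributed to the first step and to the second step."""
    (nl, nl_race) = METRICS[metric]["nl_logit"]
    (rf, rf_race) = METRICS[metric]["rf"]
    total = rf_race - nl                      # Nonlinear Logit (no race) -> Random Forest (race)
    first_step = (nl_race - nl) if race_first else (rf - nl)
    first = 100 * first_step / total
    return first, 100 - first

race_share_a, tech_share_a = shares("R2", race_first=True)    # Panel A ordering
tech_share_b, race_share_b = shares("R2", race_first=False)   # Panel B ordering
```

The share attributed to race depends on the order in which the two changes are applied, which is why the two panels differ.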

4 / 5

Predicting Minority Status from x

[Figure: two panels comparing Logit, Nonlinear Logit, and Random Forest. Left: ROC curve (true positive rate against false positive rate). Right: precision against recall.]

Model             ROC AUC   Precision Score   Brier Score × 10   R2
Logit             0.7478    0.1948            0.5791             0.0609
Nonlinear Logit   0.7485    0.1974            0.5783             0.0622
Random Forest     0.7527    0.2110            0.5665             0.0813

5 / 5