Peter Sarlin. Toward robust early-warning models: A horse race, ensembles and model uncertainty

RiskLab

Toward robust early-warning models:

A horse race, ensembles and model uncertainty

Peter Sarlin, joint with Markus Holopainen

Hanken School of Economics and RiskLab Finland

Seminar at Bank of Estonia

June 30, 2015

RiskLab Motivation

I An acute interest in new approaches to assess systemic risk

I Financial crises triggered by various shocks (unpredictable)...

I ...but widespread imbalances build up ex ante (identifiable)

I Early-warning models to identify systemic risk at early stages

I Yet: which method(s) to use & when can we trust results?

RiskLab Systemic risk

I Systemic risk along two dimensions (Borio, 2009)

    1. Build-up of risk in tranquil times & abrupt unraveling in crisis
    2. How risk is distributed and how shocks transmit in the system

I Three types of systemic risks (ECB, 2010):

    I endogenous build-up and unraveling of widespread imbalances
    I exogenous aggregate shocks
    I contagion and spillover

RiskLab Early-warning indicators & models

I Textbook example of 2-class classification: crisis vs. tranquil

I To identify vulnerable states of a country you need...

    I Dates of historical crisis occurrences
    I Indicators to identify sources of vulnerability

I Estimate the probability of being in a vulnerable state

    I Signaling: Monitor univariate indicators
    I Linear and non-linear approaches for combining indicators

I Set a threshold on the probability to optimize a loss function

    I Transforms probabilities into binary point forecasts (0/1)
    I Depends on preferences between type I/II errors

RiskLab EWIs & Financial Stability Maps

I Mapping the State of Financial Stability (joint with Peltonen)

I How to represent multiple indicators visually?

I Large-volume and high-dimensional data

    I Clustering: Reduce large volumes of data
    I Projection: From high-dimensional to low-dimensional

I Financial Stability Map based upon 14 macro-financial indicators for 28 economies from 1990Q1 to 2011Q4

VisRisk: A visualization platform for systemic risk analytics

http://vis.risklab.fi/

RiskLab Interconnectedness & EWMs

Interconnectedness of the banking sector as a vulnerability to crises (joint with Rancan & Peltonen)

I This paper enriches an EWM with network measures

I Financial networks of institutional sectors in Europe

    I MFI, INS, OFI, NFC, GOV, HH and ROW
    I Loans, deposits, debt and shares

I Centrality of the MFI as an indicator for banking crises

I Interconnectedness of the banking sector entails a vulnerability

    I Cross-border linkages capture vulnerabilities to crises...
    I ...and larger domestic sectoral linkages amplify vulnerability...
    I ...which yields useful predictions.
    I Most vulnerability descends from loans and debt securities

RiskLab Bank EWM

Predicting Bank Distress in Europe (with Betz, Oprica, Peltonen)

I One of the first EWMs for individual banks and analysis of determinants of bank vulnerabilities in the EU

I Introduces a new dataset of bank distress in Europe

I Micro-macro perspective: banking sector & MIP indicators

I Loss function accounts for importance of individual banks

Conclusions

I Importance of complementing bank-specific vulnerabilities with macro-financial indicators

I EWM based on publicly available data would have been useful to predict individual bank distress during this crisis

I For a policymaker, it is essential to be more concerned about type I/II errors related to systemically important banks

RiskLab Networks & EWMs

Network linkages to predict bank distress (with Piloiu & Peltonen)

I Does predictive performance improve if the EWM is augmented with estimated bank interdependencies?

I Banks are interconnected, yet EWMs model individual distress

I A bank's risk modeled as a function of its neighbors' risk

Conclusions

I Two-step estimation incorporating neighbors' vulnerabilities

I Accounting for interconnections improves EWM performance

I Allows comparing relative efficiency of different networks

RiskLab RiskRank: Joint measurement

RiskRank: Measuring interconnected systemic risk (with Mezei)

I EWMs aggregate indicators & network measures connectivity

I We assume a hierarchical system of interconnected nodes

I RiskRank: Joint measure of cyclical & cross-sectional risk

Conclusions

I Bottom-up aggregation: direct, indirect & feedback effects

I Improved performance for bank and country models

I General framework to combine the 2 systemic risk dimensions

RiskLab This paper

A three-fold contribution:

I Conduct a horse race of early-warning models (EWMs)

I Test various approaches to aggregating these methods

I Estimate model performance and output uncertainty

Key questions:

I How do EWMs perform in an objective & robust ranking?

I Is one method above the others or should they be used concurrently?

I Statistical significance

    I is a method better than others?
    I are the probabilities above the threshold?

RiskLab Literature

Early-warning method comparisons

I Often entirely missing

I Bilateral tests (e.g., Peltonen, '06; Marghescu et al., '11)

I ESCB's horse race shows little comparability (Alessi et al., '14)

Aggregation or ensemble learning

I No previous use of model aggregation

I Partly incorporated in RandomForest by Alessi & Detken ('14)

Statistical significance and uncertainty

I El-Shagi et al. ('13): is a model useful?

I Hurlin et al. ('14): similarity of two firms' risk measures

RiskLab Data

I Quarterly data for 15 EU countries, from 1976 to 2014Q3

I Systemic banking crises

    I Laeven and Valencia & ESCB Heads of Research
    I Pre-crisis indicator: 5-12 quarters
    I Late pre-crisis, crisis, and post-crisis periods removed

I Macro-financial indicators

    I asset price misalignments (house and stock prices)
    I excessive credit growth (growth and gaps)
    I business cycle indicators (GDP and inflation)
    I macroeconomic factors (debt and CA)
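The labelling rule on this slide (pre-crisis window of 5-12 quarters; late pre-crisis, crisis and post-crisis quarters removed) can be sketched as below. This is only an illustration under stated assumptions: the function name `pre_crisis_labels` is hypothetical, the window is assumed to be counted back from the first crisis quarter, and the length of the post-crisis period is not given on the slide, so it is left as a parameter.

```python
def pre_crisis_labels(crisis, near=5, far=12, post_crisis=0):
    """Label quarters for one country: 1 = pre-crisis, 0 = tranquil, None = removed.

    crisis      -- list of 0/1 crisis flags, one per quarter
    near, far   -- pre-crisis window in quarters before crisis onset (5-12 on the slide)
    post_crisis -- quarters removed after each crisis quarter (assumed; not on the slide)
    """
    n = len(crisis)
    labels = [0] * n
    onsets = [t for t in range(n) if crisis[t] and (t == 0 or not crisis[t - 1])]
    for onset in onsets:
        for lead in range(1, far + 1):
            t = onset - lead
            if t >= 0 and labels[t] is not None:
                # Quarters closer than `near` are late pre-crisis and are removed.
                labels[t] = 1 if lead >= near else None
    for t in range(n):
        if crisis[t]:
            # Remove the crisis quarter itself and the post-crisis quarters after it.
            for lag in range(post_crisis + 1):
                if t + lag < n:
                    labels[t + lag] = None
    return labels
```

Quarters labelled `None` would be dropped before estimation, leaving a binary target of pre-crisis versus tranquil quarters.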

RiskLab Methods in this paper

I A horse race of multiple methods for early-warning exercises

    I Signal extraction
    I LDA & QDA
    I Logit & Logit LASSO
    I Naive Bayes
    I KNN
    I Classification tree & Random forest
    I ANN & ELM
    I SVM

RiskLab Taxonomy of methods

[Figure: taxonomy of methods]

Predictive analytics
    Clustering
    Classification
        Covariance matrix: LDA, QDA, Logit, Logit LASSO
        Frequency table: Signal extraction, Naive Bayes, Decision tree, Random forest
        Similarity functions: KNN
        Others: ANN, ELM, SVM
    Regression

RiskLab Ensembles and uncertainty

Ensemble approaches for concurrent use of EWMs

I Best-of & voting

I Arithmetic & weighted averages of probabilities

Empirical resampling distributions to assess uncertainty

I Use repeated cross-validation and bootstrapping

I Model performance uncertainty

I Variation in relative Usefulness of EWMs

I Model output uncertainty

I Variation in probabilities and thresholds
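The ensemble approaches listed on this slide (best-of, voting, arithmetic and weighted averages of probabilities) might be sketched as follows. The function names are hypothetical, and two details are assumptions rather than statements from the slides: voting is taken to be a simple majority of the methods' binary signals, and the weights are supplied by the caller (for example, in-sample relative usefulness).

```python
def best_of(probs, in_sample_score):
    """Return the probabilities of the method with the highest in-sample score."""
    best = max(in_sample_score, key=in_sample_score.get)
    return probs[best]


def vote(probs, thresholds):
    """Majority vote over each method's binary signal (probability >= its threshold)."""
    methods = list(probs)
    n = len(probs[methods[0]])
    signals = []
    for i in range(n):
        votes = sum(probs[m][i] >= thresholds[m] for m in methods)
        signals.append(1 if votes > len(methods) / 2 else 0)
    return signals


def average(probs, weights=None):
    """Arithmetic mean of probabilities, or weighted mean if weights are given."""
    methods = list(probs)
    if weights is None:
        weights = {m: 1.0 for m in methods}
    total = sum(weights[m] for m in methods)
    n = len(probs[methods[0]])
    return [sum(weights[m] * probs[m][i] for m in methods) / total for i in range(n)]
```

Here `probs` maps a method name to its list of predicted probabilities, one per observation. The averaged probabilities still need their own threshold before they become signals, whereas voting outputs signals directly.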

RiskLab Evaluation criterion

I Apply usefulness criterion (Alessi-Detken, '11 & Sarlin, '13):

                                  Actual class Ij
                                  Crisis                No crisis
Predicted class Pj   Signal       True positive (TP)    False positive (FP)
                     No signal    False negative (FN)   True negative (TN)

I Find the threshold that minimizes a loss function that depends on policymakers' preferences µ between Type I errors (T1 = FN/(FN + TP), missed crises) and Type II errors (T2 = FP/(TN + FP), false alarms), and on the unconditional probabilities of the events, P1 and P2

L(µ) = µ T1 P1 + (1 − µ) T2 P2

I Define absolute usefulness Ua as the difference between the loss of disregarding the model (available Ua) and the loss of the model

Ua(µ) = min[µ P1, (1 − µ) P2] − L(µ)
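A minimal sketch of the loss function and absolute usefulness defined on this slide, together with a threshold search over a grid. The function names are hypothetical. Two assumptions are made: a signal is issued when the probability is at or above the threshold, and P1 and P2 are estimated as the sample shares of crisis and non-crisis observations. Both classes must be present, otherwise T1 or T2 is undefined.

```python
def confusion(signals, actual):
    """Count TP, FP, FN, TN for binary signals against binary outcomes."""
    tp = sum(1 for s, a in zip(signals, actual) if s == 1 and a == 1)
    fp = sum(1 for s, a in zip(signals, actual) if s == 1 and a == 0)
    fn = sum(1 for s, a in zip(signals, actual) if s == 0 and a == 1)
    tn = sum(1 for s, a in zip(signals, actual) if s == 0 and a == 0)
    return tp, fp, fn, tn


def loss(signals, actual, mu):
    """L(mu) = mu * T1 * P1 + (1 - mu) * T2 * P2."""
    tp, fp, fn, tn = confusion(signals, actual)
    t1 = fn / (fn + tp)            # share of missed crises
    t2 = fp / (tn + fp)            # share of false alarms
    p1 = (tp + fn) / len(actual)   # unconditional crisis probability
    p2 = (fp + tn) / len(actual)   # unconditional tranquil probability
    return mu * t1 * p1 + (1 - mu) * t2 * p2


def absolute_usefulness(signals, actual, mu):
    """Ua(mu) = min[mu * P1, (1 - mu) * P2] - L(mu)."""
    p1 = sum(actual) / len(actual)
    p2 = 1 - p1
    return min(mu * p1, (1 - mu) * p2) - loss(signals, actual, mu)


def best_threshold(probs, actual, mu, grid):
    """Return the grid value whose signals give the smallest loss."""
    def loss_at(threshold):
        signals = [1 if p >= threshold else 0 for p in probs]
        return loss(signals, actual, mu)
    return min(grid, key=loss_at)
```

For example, with 2 crisis and 8 tranquil observations and µ = 0.8, both µP1 and (1 − µ)P2 equal 0.16, so a model with T1 = 0 and T2 = 0.125 has L = 0.02 and Ua = 0.14.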

RiskLab Evaluation & estimation strategies

I Relative usefulness Ur is the ratio of captured Ua to available Ua, given µ and P1

Ur(µ) = Ua(µ) / min[µ P1, (1 − µ) P2]

I Model selection to optimize free parameters via a grid search

I Cross-validation exercise (repeated CV)

    I Assess generalization performance
    I 10 folds

I Real-time recursive exercise (bootstrapping)

    I Test prediction performance from 2006Q2 to 2014Q3
    I Use only data available at that specific point in time
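Relative usefulness and the real-time recursive exercise can be sketched as below. The names `relative_usefulness` and `recursive_splits` are hypothetical. The split generator assumes an expanding window in which each test quarter is predicted from all earlier quarters; publication lags are ignored here.

```python
def relative_usefulness(t1, t2, p1, p2, mu):
    """Ur(mu) = Ua(mu) / min[mu * P1, (1 - mu) * P2]."""
    available = min(mu * p1, (1 - mu) * p2)
    model_loss = mu * t1 * p1 + (1 - mu) * t2 * p2
    return (available - model_loss) / available


def recursive_splits(quarters, first_test):
    """Yield (training quarters, test quarter) pairs with an expanding window.

    quarters   -- chronologically ordered quarter labels, e.g. "2006Q2"
    first_test -- first quarter to be predicted out of sample
    """
    start = quarters.index(first_test)
    for i in range(start, len(quarters)):
        yield quarters[:i], quarters[i]
```

A model that never signals has T1 = 1 and T2 = 0, so its loss is µP1; a model that always signals has loss (1 − µ)P2. Ur is therefore at most 0 for both, and a negative Ur means the model does worse than the better of these two trivial rules.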

RiskLab Cross-validated horse race

Rank(*)  Method          Ur(µ)   SE     AUC    SE
1(4)     KNN             92 %    0.016  0.987  0.006
2(7)     SVM             91 %    0.017  0.998  0.001
3(8)     Neural network  90 %    0.022  0.996  0.003
4(8)     ELM             88 %    0.023  0.991  0.005
5(8)     Weighted        88 %    0.012  0.995  0.0006
6(8)     Voting          88 %    0.017  0.947  0.008
7(11)    Best-of         84 %    0.030  0.991  0.005
8(11)    Non-weighted    83 %    0.010  0.992  0.0007
9(11)    Random forest   82 %    0.042  0.996  0.001
10(11)   QDA             79 %    0.024  0.984  0.001
11(13)   Classif. tree   64 %    0.027  0.882  0.018
12(13)   Naive Bayes     60 %    0.019  0.948  0.002
13(15)   Logit           54 %    0.018  0.933  0.008
14(15)   Logit LASSO     53 %    0.017  0.934  0.001
15(16)   LDA             48 %    0.022  0.927  0.002
16(-)    Signaling       4 %     0.014  0.712  0.000

RiskLab Recursive horse race

Rank(*)  Method          Ur(µ)   SE     AUC   SE
1(8)     Best-of         76 %    0.074  0.92  0.023
2(5)     Weighted        75 %    0.034  0.95  0.010
3(10)    Non-weighted    72 %    0.040  0.94  0.011
4(10)    KNN             66 %    0.047  0.97  0.016
5(10)    Voting          64 %    0.044  0.86  0.016
6(10)    Neural network  64 %    0.063  0.94  0.011
7(10)    QDA             61 %    0.071  0.97  0.008
8(10)    ELM             60 %    0.066  0.91  0.020
9(13)    SVM             52 %    0.122  0.84  0.069
10(16)   Logit           44 %    0.055  0.90  0.012
11(16)   Random forest   39 %    0.162  0.94  0.010
12(16)   Logit LASSO     37 %    0.054  0.87  0.010
13(16)   Naive Bayes     24 %    0.076  0.86  0.015
14(16)   LDA             23 %    0.064  0.83  0.013
15(16)   Classif. tree   22 %    0.108  0.75  0.059
16(-)    Signaling       -39 %   0.057  0.62  0.007

RiskLab Model output uncertainty

I Probabilities for UK & SWE, real-time recursive exercise

I Confidence bands for probabilities and thresholds

[Figure: two panels, "Country: United Kingdom" (2002 to 2014) and "Country: Sweden" (2004 to 2014). Vertical axis: probability (0.0 to 1.0), method: kknn. Legend: probability, insignificant probability, threshold, crisis, pre-crisis.]

RiskLab Model output uncertainty

Rank  Method          All Ur(µ)  Sig Ur(µ)
1     KNN             92 %       93 %
2     SVM             91 %       100 %
3     Neural network  90 %       100 %
4     ELM             88 %       100 %
5     Random forest   82 %       100 %
6     Weighted        88 %       94 %
8     Best-of         84 %       97 %
9     Non-weighted    83 %       92 %
10    QDA             79 %       88 %
11    Classif. tree   64 %       82 %
12    Naive Bayes     60 %       75 %
13    Logit           54 %       56 %
14    Logit LASSO     53 %       58 %
15    LDA             48 %       55 %
16    Signaling       4 %        -7 %

RiskLab Conclusion

A three-fold contribution...

I Objectively test many methods for early-warning analysis [1]

I Introduce ensemble learning to early-warning analysis

I Estimate model performance and output uncertainty

...and conclusion

I Machine and ensemble learning approaches perform well

I Aggregation decreases variation in model performance

I Accounting for output uncertainty improves model performance

http://cm.infolytika.com

RiskLab

Thanks for your attention!

RiskLab Extra

RiskLab Variables

Variable name           Definition                                                                          Transformation and additional information
House prices to income  Nominal house prices and nominal disposable income per head                         Ratio, index based in 2010
Current account to GDP  Nominal current account balance and nominal GDP                                     Ratio
Government debt to GDP  Nominal general government consolidated gross debt and nominal GDP                  Ratio
Debt to service ratio   Debt service costs and nominal income of households and non-financial corporations  Ratio
Loans to income         Nominal household loans and gross disposable income                                 Ratio
Credit to GDP           Nominal total credit to the private non-financial sector and nominal GDP            Ratio
Bond yield              Real long-term government bond yield                                                Level
GDP growth              Real gross domestic product                                                         1-year growth rate
Credit growth           Real total credit to private non-financial sector                                   1-year growth rate
Inflation               Real consumer price index                                                           1-year growth rate
House price growth      Real residential property price index                                               1-year growth rate
Stock price growth      Real stock price index                                                              1-year growth rate
Credit to GDP gap       Nominal bank credit to the private non-financial sector and nominal GDP             Absolute deviation from trend, λ = 400,000
House price gap         Deviation from trend of the real residential property price index                   Relative deviation from trend, λ = 400,000

RiskLab Machine learning

I Unsupervised learning

    I Exploring the past
    I Univariate, bivariate and multivariate

I Supervised learning

    I Predicting the future
    I Regression and classification

RiskLab Predictive modelling

I Examples of approaches for supervised learning:

    I linear discriminant analysis
    I logit analysis
    I decision trees
    I artificial neural networks
    I support vector machines

I As well as ensembles of multiple models

RiskLab Bias vs. variance

I Model fit: Opportunity and risk

    I ANNs are universal approximators for any continuous function
    I Logit analysis tends to be robust on any sample

I Bias: error from erroneous assumptions in the learning algorithm (underfit)

I Variance: error from sensitivity to small fluctuations in the training set (overfit)

I Regularize complexity with model selection criteria

I Cross-validation: partitioning into folds and testing on the fold left out

I but also leave-one-out CV, AIC, BIC etc
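The fold partitioning described above can be sketched as follows. The function name `k_fold_indices` is hypothetical, and the random assignment of observations to folds is an assumption (the slides do not say how folds are drawn).

```python
import random


def k_fold_indices(n, k, seed=0):
    """Yield (train, test) index lists: each fold is left out once for testing."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]   # k roughly equal-sized folds
    for i in range(k):
        test = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test
```

Repeating the procedure with different seeds gives a repeated cross-validation, and setting k equal to n gives leave-one-out CV.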

RiskLab What is an ANN?

I ANNs are composed of nodes connected by links

I Layers of nodes: Input, hidden and output

I Link weights are network parameters that are tuned iteratively by a learning algorithm

    I Optimization to update network parameters
    I Commonly backpropagation to compute the actual gradients
    I Derivative of the cost function with respect to the weights
    I Update weights in a gradient-related direction

I Optimization through gradient descent, Levenberg-Marquardt, Gauss-Newton, ML, etc.

RiskLab Logit/LDA vs. ANN

I Logit/LDA through ANNs

I Input: $x_1, x_2, x_3$ (and intercept $b$)

I Output: $h_{w,b}(x) = f(w^T x) = f\left(\sum_{i=1}^{3} w_i x_i + b\right)$

I Let $f(\cdot)$ be a sigmoidal function: $f(z) = \frac{1}{1 + \exp(-z)}$

I Or a step function with threshold $\theta$: $f(z) = \begin{cases} 1 & \text{if } z \geq \theta \\ 0 & \text{if } z < \theta \end{cases}$
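The single-node view of logit on this slide can be illustrated with a few lines of code. The function names are hypothetical; the weights and intercept would come from estimation, which is not shown.

```python
import math


def sigmoid(z):
    """Sigmoidal activation: f(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def step(z, theta=0.0):
    """Step activation with threshold theta: 1 if z >= theta, else 0."""
    return 1 if z >= theta else 0


def neuron(x, w, b, f=sigmoid):
    """Single output node: f(sum_i w_i * x_i + b)."""
    return f(sum(wi * xi for wi, xi in zip(w, x)) + b)
```

With the sigmoid, the node returns a probability, as in logit; with the step function it returns a binary signal directly.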

RiskLab ANN as an "ensemble"

RiskLab What is a Random Forest?

I Decision tree

    I Top-down approach by splitting data into two classes
    I Sequential signal extraction
    I Trees are grown as long as it benefits the classification
    I This might lead to overfitting: pruned via CV to generalize

I Random Forest: Bagging of decision trees

    I Draw samples with replacement and m variables from data
    I Estimate decision tree models for each resampling
    I Use voting to combine model output
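The three bagging steps above can be sketched with a deliberately simplified learner. This is not a full random forest: each "tree" here is a one-split decision stump (a single threshold on one variable, much like signal extraction), and all function names are hypothetical. A real random forest grows deeper trees and samples m variables at each split.

```python
import random


def fit_stump(X, y, features):
    """Find the single split (feature, cut, class_if_above) with most correct labels."""
    best = None
    for j in features:
        for cut in sorted({row[j] for row in X}):
            for above in (0, 1):
                correct = sum((above if row[j] >= cut else 1 - above) == label
                              for row, label in zip(X, y))
                if best is None or correct > best[0]:
                    best = (correct, j, cut, above)
    return best[1:]


def predict_stump(stump, row):
    j, cut, above = stump
    return above if row[j] >= cut else 1 - above


def fit_forest(X, y, n_trees=25, m=1, seed=0):
    """Bagging: bootstrap the observations and sample m variables for each stump."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        rows = [rng.randrange(n) for _ in range(n)]   # sample with replacement
        features = rng.sample(range(p), m)            # m variables without replacement
        forest.append(fit_stump([X[i] for i in rows], [y[i] for i in rows], features))
    return forest


def predict_forest(forest, row):
    """Combine the stumps by majority vote."""
    votes = sum(predict_stump(stump, row) for stump in forest)
    return 1 if votes > len(forest) / 2 else 0
```

Averaging over many resampled models is what reduces the variance of a single overfitted tree.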

RiskLab Ensemble learning

I Simultaneous use of multiple statistical learning algorithms to improve predictive performance

    I Often gains in accuracy, generalization and robustness
    I Gains from uncorrelated output/diversity

I Bagging: aggregate (var/obs) resampled models into one model output

I Boosting: output from multiple models averaged with specified weights

I Stacking: another layer of models on top of individual model output

RiskLab Model uncertainty

I General procedure applied to model performance & output

    I Estimate SE from empirical resampling distributions
    I Find critical t values from the empirical distribution
    I Perform mean-comparison tests, as overlapping confidence intervals do not assure statistical significance
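One way to read the procedure above is sketched below. It rests on assumptions that the slides do not spell out: the SE of a statistic is taken to be the standard deviation of its resampled values, and the two resampling distributions are treated as independent when the SEs are combined. The function names are hypothetical.

```python
import math
import statistics


def resampling_se(values):
    """SE of a statistic, taken as the standard deviation of its resampled values."""
    return statistics.stdev(values)


def mean_comparison_t(a, b):
    """t statistic for the difference in means of two resampling distributions."""
    diff = statistics.mean(a) - statistics.mean(b)
    se = math.sqrt(resampling_se(a) ** 2 + resampling_se(b) ** 2)
    return diff / se
```

The same statistic can be applied to model performance (resampled Ur of two methods) or to model output (resampled probabilities against resampled thresholds); the result would then be compared with the chosen critical value.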

RiskLab Model selection

Method             Parameters
Signal extraction  Debt service ratio
LASSO              λ = 0.0012
KNN                k = 2; Distance = 1
Random forest      No. of trees = 180; No. of predictors sampled = 5
ANN                No. of hidden layer units = 8; Max no. of iterations = 200; Weight decay = 0.005
ELM                No. of hidden layer units = 300; Activation function = Tan-sig
SVM                γ = 0.4; Cost = 1; Kernel = Radial basis
