Model Risk With Estimates of Probabilities of Default...The Machine Learning perspective I Classiﬁcation on datasets with changed distributions is a problem well-known in Machine

Model Risk With Estimates of Probabilities ofDefault

Dirk Tasche

Imperial College, London

June 2015

2

The views expressed in the following material are the

author’s and do not necessarily represent the views of

the Global Association of Risk Professionals (GARP),

its Membership or its Management.

Outline

Two forecasting problems

A taxonomy of dataset shift

Estimation under prior probability shift assumption

Estimation under ’invariant density ratio’ assumption

An application to the mitigation of model risk for PD estimation

Concluding remarks

References

Dirk Tasche (Imperial College) Model Risk With Estimates of Probabilities of Default 2 / 18


Single borrower’s probability of default (PD)

I Moody’s corporate issuer and default counts in 20082.

Grade Caa-C B Ba Baa A Aa Aaa AllIssuers 417 1151 528 1021 966 582 140 4805Defaults 63 25 6 5 5 4 0 108

I January 1, 2009: What is a Baa-rated borrower’s probability todefault in 2009?

I Natural (?) estimate: 51021 ≈ 0.49%.

2Source: Moody’s (2015)Dirk Tasche (Imperial College) Model Risk With Estimates of Probabilities of Default 3 / 18


Rating profile known

Moody’s corporate issuer proportions and default rates in 2008 andissuer proportions in 20093. All numbers in %.

Grade 2008 2009Issuers Default rate Issuers Default rate

Caa-C 8.7 15.1 11.4 ?B 24.0 2.2 20.9 ?Ba 11.0 1.1 11.0 ?Baa 21.2 0.5 21.9 ?A 20.1 0.5 20.7 ?Aa 12.1 0.7 11.2 ?Aaa 2.9 0.0 2.8 ?All 100.0 2.2 100.0 ?

How to take account of the additional data?

3Source: Moody’s (2015)Dirk Tasche (Imperial College) Model Risk With Estimates of Probabilities of Default 4 / 18


Some thoughts

I Compared to 2008, the rating profile in January 2009 haschanged.

I Why should the grade-level or total default rates remain the same?I Invariant grade-level default rates would imply almost invariant

discriminatory power:I Observed accuracy ratio in 2008: 63.4%I Forecast accuracy ratio for 2009: 66.6%

I What other ways are there to reflect ’almost invariant’discriminatory power?

I Assuming an invariant accuracy ratio is not sufficient for inferringPDs if only the rating profile is known.



Alternatives to ’invariant grade-level default rates’

I Geometric interpretations of accuracy ratio (for continuous score):I Based on area under Receiver Operating Characteristic (ROC)I Based on area between Cumulative Accuracy Profile (CAP) and

diagonalI Derivative of CAP is (essentially) the PD curve of the scores.I Derivative of ROC is (essentially) the density ratio of the scores.I Is ’invariant density ratio’ a viable alternative?I We also look at ’invariant rating profiles of defaulters and

non-defaulters’ as an alternative assumption on ’almost invariant’discriminatory power.



The Machine Learning perspective

I Classification on datasets with changed distributions is a problemwell-known in Machine Learning.

I Moreno-Torres et al. (2012) proposed a taxonomy for datasetshifts.

I Setting:I Each item in a dataset has a class y and a covariates vector x .I ptest(x , y) and ptrai(x , y) are the joint distributions of (x , y) on the

test and training sets respectively.I ptrai(x , y) is known from observation but for ptest(x , y) only the

marginal distribution ptest(x) is observable now.I How to determine unconditional class probabilities ptest(y = c) and

conditional class probabilities ptest(y = c | x)?I Definition: Dataset shift occurs if ptest(x , y) 6= ptrai(x , y).I On Slide 4, y = default status, x = rating grade.



The Moreno-Torres et al. taxonomyI Four types of dataset shift ptest(x , y) 6= ptrai(x , y):

I Covariate shift:

ptest(x) 6= ptrai(x), but ptest(y | x) = ptrai(y | x).

I Prior probability shift:

ptest(y) 6= ptrai(y), but ptest(x | y) = ptrai(x | y).

I Concept shift:

ptest(x) = ptrai(x), but ptest(y | x) 6= ptrai(y | x), orptest(y) = ptrai(y), but ptest(x | y) 6= ptrai(x | y).

I Other shifts.I Assuming ’invariant grade-level default rates’ on Slide 4 is

equivalent to an assumption of covariate shift.



Moody’s corporate rating profiles 2008

Caa−C B Ba Baa A Aa Aaa

DefaultersAllNon−defaulters

Fre

quen

cy

0.0

0.1

0.2

0.3

0.4

0.5



The least-squares estimatorI Setting as on Slide 4:

I y = default status (classes D default and N non-default), x = ratinggrade

I ptest(x) known, but ptest(x) 6= ptrai(x)I Want to determine ptest = ptest(y = D) and ptest(y = D | x)

I Prior probability shift assumption:

ptest(x | y = c) = ptrai(x | y = c) for c = D,N.

I Hence, for all x , the class probability ptest should satisfy

ptest(x) = ptest ptrai(x | y = D) + (1− ptest)ptrai(x | y = N).

I This is unlikely to be achievable. Therefore least squaresapproximation

p̂test =∫ (

ptest (x)−ptrai (x | y=N))(

ptrai (x | y=D)−ptrai (x | y=N))

dx∫ (ptrai (x | y=D)−ptrai (x | y=N)

)2dx

. (1)



Fitted and observed Moody’s corporate rating profiles


Fitted profile 2009Observed profile 2009

Fre

quen

cy

0.00

0.05

0.10

0.15

0.20



The ’invariant density ratio’ estimatorI Setting as on Slide 4:

I y = default status (classes D default and N non-default), x = ratinggrade

I ptest(x) known, but ptest(x) 6= ptrai(x)I Want to determine ptest = ptest(y = D) and ptest(y = D | x)

I Invariant density ratio assumption:

ptest(x | y = D)

ptest(x | y = N)=

ptrai(x | y = D)

ptrai(x | y = N)def= λ(x).

I Then the class probability ptest must satisfy

0 =

∫λ(x)−1

1+(λ(x)−1) ptestd ptest(x). (2a)

I There is a unique solution to (2a) if and only if∫λ(x)d ptest(x) > 1 and

∫λ(x)−1 d ptest(x) > 1. (2b)



PropertiesI If condition (2b) is not satisfied then the profiles ptest(x) and

ptrai(x) are so different that any ’inheritance’ of discriminatorypower seems questionable.

I Exact fit: With the ’invariant density ratio’ estimate p̃test comeestimates of the conditional densities p̃test(x | y = c), c = D,N,such that

λ(x) = p̃test (x | y=D)p̃test (x | y=N)

, and

ptest(x) = p̃test p̃test(x | y = D) + (1− p̃test) p̃test(x | y = N).

I Call ptest =∫

ptrai(y = D | x)d ptest(x) the ’covariate shift’estimator of ptest . Then it holds that

ptest = (1− π)ptrai(y = D) + π p̃test . (3)

Where 0 ≤ π ≤ 1 and π is the closer to 1 the more discriminatorypower the scores x have on the training set.



Conditional profiles 2009: Invariant from 2008 vs. fitted


Observed 2008Fitted 2009Observed 2009

Defaulters

Fre

quen

cy

0.0

0.2

0.4

0.6


Observed 2008Fitted 2009Observed 2009

Non−defaulters

Fre

quen

cy

0.00

0.10

0.20



Different forecast methods

I Three methods of forecasting portfolio-wide default rate ptest :I Covariate shift estimator ptest (Slide 13)I Prior probability shift estimator p̂test , (1)I Invariant density ratio estimator p̃test , (2a)

I p̂test and p̃test give the same estimate under a prior probabilityshift.

I Estimates p̂test and p̃test of ptest provide estimates of theconditional default rates ptest(y = D | x) by

ptest(y = D | x) =ptest λ(x)

1 + (λ(x)− 1)ptest.

I (3) suggests that max(ptest , p̃test) could be an upper bound fornext year’s portfolio-wide default rate.



Observed vs. forecast corporate default rates4

1990 1995 2000 2005 2010 2015

24

68

Year

Per c

ent

ObservedCovariate shift forecastInvariant Density Ratio forecast

4Source of observed rates: Moody’s (2015).Dirk Tasche (Imperial College) Model Risk With Estimates of Probabilities of Default 16 / 18

Concluding remarks

I Straightforward ’covariate shift’ (or ’invariant conditional defaultrate’) PD estimates sometimes may seriously underestimatefuture default rates.

I The ’invariant density ratio’ approach often provides very differentestimates that may be used for model risk mitigation.

I The ’invariant density ratio’ approach can also be applied for theestimation of loss rates.

I Rating agency data like Moody’s (2015) possibly are ’subjective’,making results of the approach conservative.

I Further reading:I Background and more details: Tasche (2013), Tasche (2014)I More sophisticated approaches: Hofer and Krempl (2013), Hofer

(2015)


References

V. Hofer. Adapting a classification rule to local and global shift whenonly unlabelled data are available. European Journal of OperationalResearch, 243(1):177–189, 2015.

V. Hofer and G. Krempl. Drift mining in data: A framework foraddressing drift in classification. Computational Statistics & DataAnalysis, 57(1):377–391, 2013.

Moody’s. Annual Default Study: Corporate Default and RecoveryRates, 1920-2014. Special comment, Moody’s Investors Service,March 2015.

J.G. Moreno-Torres, T. Raeder, R. Alaiz-Rodriguez, N.V. Chawla, andF. Herrera. A unifying view on dataset shift in classification. PatternRecognition, 45(1):521–530, 2012.

D. Tasche. The art of probability-of-default curve calibration. Journal ofCredit Risk, 9(4):63–103, 2013.

D. Tasche. Exact fit of simple finite mixture models. Journal of Riskand Financial Management, 7(4):150–164, 2014.


Documents

Model Risk With Estimates of Probabilities of Default...The Machine Learning perspective I Classiﬁcation on datasets with changed distributions is a problem well-known in Machine