Credit Scoring and Data Mining



    Mang 6054 Credit Scoring and Data Mining
    Dr. Bart Baesens

    School of Management

    University of Southampton

    [email protected]


    Lecturer: Bart Baesens
    Studied at the Catholic University of Leuven (Belgium): Business Engineer in Management Informatics, 1998; Ph.D. in Applied Economic Sciences, 2003
    Ph.D. title: Developing Intelligent Systems for Credit Scoring Using Machine Learning Techniques
    Associate professor at the K.U.Leuven, Belgium, and Vlerick Leuven Ghent Management School
    Lecturer at the School of Management at the University of Southampton, United Kingdom
    Research: data mining, Web intelligence, neural networks, SVMs, rule extraction, survival analysis, CRM, credit scoring, Basel II
    Twitter: www.twitter.com/DataMiningApps
    Facebook: Data Mining with Bart
    www.dataminingapps.com


    Software Support

    SAS

    SAS Enterprise Miner

    SAS Credit Scoring Nodes

    interactive grouping, scorecard, reject inference, credit exchange

    Weka

    SPSS

    Microsoft Excel


    Relevant Background: Credit Scoring Books

    Credit Risk Management: Basic Concepts, Van Gestel and Baesens, Oxford University Press, 2008.
    Credit Scoring and Its Applications, Thomas, Edelman, and Crook, SIAM Monographs on Mathematical Modeling and Computation, 2002.
    The Credit Scoring Toolkit, Anderson, Oxford University Press, 2007.
    Consumer Credit Models: Pricing, Profit and Portfolios, Thomas, Oxford University Press, 2009.
    The Basel Handbook, Ong, 2004.
    Basel II Implementation: A Guide to Developing and Validating a Compliant Internal Risk Rating System, Ozdemir and Miu, McGraw-Hill, 2008.
    The Basel II Risk Parameters: Estimation, Validation and Stress Testing, Engelmann and Rauhmeier, Springer, 2006.
    Recovery Risk: The Next Challenge in Credit Risk Management, Edward Altman, Andrea Resti, Andrea Sironi (editors), 2005.
    Developing Intelligent Systems for Credit Scoring, Bart Baesens, Ph.D. thesis, K.U.Leuven, 2003.


    Relevant Background: Data mining books

    Hastie T., Tibshirani R., Friedman J., Elements of Statistical Learning: Data Mining, Inference and Prediction, Springer-Verlag, 2001.
    Hand D.J., Mannila H., Smyth P., Principles of Data Mining, the MIT Press, 2001.
    Han J., Kamber M., Data Mining: Concepts and Techniques, 2nd edition, Morgan Kaufmann, 2006.
    Bishop C.M., Neural Networks for Pattern Recognition, Oxford University Press, 1995.
    Cristianini N., Shawe-Taylor J., An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods, Cambridge University Press, 2000.
    Cox D.R., Oakes D., Analysis of Survival Data, Chapman and Hall, 1984.
    Allison P.D., Survival Analysis Using the SAS System, SAS Institute Inc., 1995.


    Relevant Background: Credit scoring journals

    Journal of Credit Risk

    Risk Magazine

    Journal of Banking and Finance

    Journal of the Operational Research Society

    European Journal of Operational Research

    Management Science

    IMA Journal of Management Mathematics


    Relevant Background: Data mining journals

    Data Mining and Knowledge Discovery

    Machine Learning

    Expert Systems with Applications

    Intelligent Data Analysis

    Applied Soft Computing

    European Journal of Operational Research

    IEEE Transactions on Neural Networks

    Journal of Machine Learning Research

    Neural Computation


    Relevant Background: Credit scoring Web sites

    www.defaultrisk.com

    http://www.bis.org/

    Websites of regulators:

    www.fsa.gov.uk (United Kingdom)

    www.hkma.gov.hk (Hong Kong)

    www.apra.gov.au (Australia)

    www.mas.gov.sg (Singapore)

    In the U.S., 4 agencies: OCC, Federal Reserve, FDIC, OTS

    Proposed Supervisory Guidance for Internal Ratings Based Systems for Credit Risk, Advanced Measurement Approaches for Operational Risk, and the Supervisory Review Process (Pillar 2), Federal Register, Vol. 72, No. 39, February 2007.

    Risk-based capital standards: Advanced Capital Adequacy Framework-Basel II; Final rule, Federal Register, December 2007.


    Relevant Background: Data mining web sites

    www.kdnuggets.com (data mining)

    www.kernel-machines.org (SVMs)

    www.sas.com

    See support.sas.com for interesting macros

    www.twitter.com/DataMiningApps


    Course Overview

    Credit Scoring

    Basel I and Basel II

    Data preprocessing for PD modeling

    Classification techniques for PD modeling

    Measuring performance of PD models

    Input selection

    Reject inference

    Behavioural scoring

    Other applications


    Section 1

    An Introduction to Credit Scoring


    Credit Scoring

    Estimate whether an applicant will successfully repay his/her loan based on various information:
    Applicant characteristics (e.g. age, income, ...)
    Credit bureau information
    Application information of other applicants
    Repayment behavior of other applicants
    Develop models (also called scorecards) estimating the probability of default of a customer
    Typically, assign points to each piece of information, add all the points, and compare the total with a threshold (cut-off)


    Judgmental versus Statistical Approach

    Judgmental

    Based on experience

    Five Cs: character, capital, collateral, capacity, condition

    Statistical

    Based on multivariate correlations between inputs and risk of default

    Both assume that the future will resemble the past.


    Types of Credit Scoring

    Application scoring

    Behavioral scoring

    Profit scoring

    Bankruptcy prediction


    Application Scoring

    Estimate probability of default at the time the applicant applies for the loan!

    Use predetermined definition of default

    For example, three months of payment arrears
    Has now been fixed in Basel II

    Use application variables (age, income, marital status, years at address, years with employer, known client, ...)

    Use bureau variables

    Bureau score, raw bureau data (for example, number of credit checks, total amount of credits, delinquency history), ...

    In the US

    FICO scores: between 300 and 850

    Experian, Equifax, TransUnion

    Others

    Baycorp Advantage (Australia and New Zealand)

    Schufa (Germany)

    BKR (the Netherlands)

    CKP (Belgium)

    Dun & Bradstreet (MidCorp, SME)


    Example Application Scorecard


    So, a new customer applies for credit:

    AGE 32          120 points
    GENDER Female   180 points
    SALARY $1,150   160 points
    Total           460 points

    Let cut-off = 500: REFUSE CREDIT (460 < 500)
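    A minimal SAS sketch of this scoring logic; the data set and variable names are assumptions, and the point values and the 500 cut-off are the illustrative numbers from the slide:

    data applicant;
       input name $ age_pts gender_pts salary_pts;
       datalines;
    NewCust 120 180 160
    ;

    data decision;
       set applicant;
       total = age_pts + gender_pts + salary_pts;   /* add all the points   */
       if total >= 500 then decision = 'ACCEPT';    /* compare with cut-off */
       else decision = 'REFUSE';                    /* here: 460 < 500      */
    run;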

  • 8/4/2019 Credit Scoring and Data Mining

    17/170

    Example Variables Used in Application Scorecard Development

    Age

    Time at residence

    Time at employment

    Time in industry

    First digit of postal code

    Geographical: Urban/Rural/Regional/Provincial

    Residential Status

    Employment status

    Lifestyle code

    Existing client (Y/N)

    Years as client

    Number of products internally

    Internal bankruptcy/write-off

    Number of delinquencies

    Beacon/Empirical

    Time at bureau

    Total Inquiries

    Time since last Inquiry

    Inquiries in the last 3/6/12 mths

    Inquiries in the last 3/6/12 mths as % of total

    Income

    Total Liabilities

    Total Debt

    Total Debt service $

    Total Debt Service Ratio

    Gross Debt Service Ratio

    Revolving Debt/Total Debt

    Other Bank card

    Number of other bank cards

    Total Trades

    Trades opened in last six months

    Trades opened in last six months / Total Trades

    Total Inactive Trades

    Total Revolving trades

    Total term trades

    % revolving trades

    % term trades

    Number of trades current

    Number of trades R1, R2..R9

    % trades R1-R2

    % Trades R3 or worse

    % trades O1..O9

    % trades O1-O2

    % Trades O3 or worse

    Total credit lines - revolving

    Total balance - revolving

    Total Utilization

    Total balance - all


    Application Scoring: Summary

    New customers
    Two snapshots of the state of the customer
    Second snapshot typically taken around 18 months after loan origination
    Static procedure
    Scorecards are out-of-date, even before they are implemented (Hand 2003).
    Application scoring aims at ranking customers: no calibrated probabilities of default needed!


    Behavioral Scoring

    Study risk behavior of existing customers
    Update risk assessment taking into account recent behavior
    Examples of behavior:
    Average/max/min/trend in checking account balance, bureau score, ...
    Delinquency history (payment arrears, ...)
    Job changes, home address changes, ...
    Dynamic
    Video clip to snapshot (snapshot typically taken after 12 months)

    [Timeline: the performance period ("video clip") runs from the observation point to the outcome point/snapshot]


    Behavioral Scoring

    Behavioral scoring can be used for:
    Debt provisioning and profit scoring
    Authorizing accounts to go in excess
    Setting credit limits
    Renewals/reviews
    Collection strategies
    Characteristics:
    Many, many variables (input selection needed, cf. infra)
    Purpose is again scoring = ranking customers with respect to their default likelihood


    Profit Scoring

    Calculate profit over the customer's lifetime
    Need loss estimates for bad accounts
    Decide on time horizon
    Survival analysis
    Dynamic models: incorporate changes in the economic climate
    Score the profitability of a customer for a particular product (product profit score)
    Score the profitability of a customer over all products (customer profit score)
    Video clip to video clip
    Not being adopted by many financial institutions (yet!)


    Corporate Default Risk Modeling

    Prediction approach
    Predict bankrupt versus non-bankrupt given ratios (solvency, liquidity, ...) describing the financial status of companies
    Assumes the availability of data (and defaults!)
    For example, Altman's z-model (1968, linear discriminant analysis)
    For public industrial companies: Z = 1.2X1 + 1.4X2 + 3.3X3 + 0.6X4 + 1.0X5 (healthy if Z > 2.99, unhealthy if Z < 1.81)
    For private industrial companies: Z = 6.56X1 + 3.26X2 + 6.72X3 + 1.05X4 (healthy if Z > 2.60, unhealthy if Z < 1.1)
    X1: Working Capital/Total Assets, X2: Retained Earnings/Total Assets, X3: EBIT/Total Assets, X4: Market (Book) Value of Equity/Total Liabilities, X5: Net Sales/Total Assets
    Expert based approach
    Expert or expert committee decides upon the criteria
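    A small SAS sketch of the public-firm z-model above; the five input ratios are made-up illustrative values, and firms between the two thresholds are labeled as falling in the grey zone:

    data zscore;
       length status $ 12;
       input x1 x2 x3 x4 x5;   /* the five ratios defined above */
       z = 1.2*x1 + 1.4*x2 + 3.3*x3 + 0.6*x4 + 1.0*x5;
       if z > 2.99 then status = 'healthy';
       else if z < 1.81 then status = 'unhealthy';
       else status = 'grey zone';
       datalines;
    0.25 0.30 0.10 1.50 1.20
    ;

    proc print data=zscore; run;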


    Example expert based scorecard (Ozdemir and Miu, 2009)

    Business Risk                          Score
    Industry Position                      6
    Market Share Trends and Prospects      2
    Geographical Diversity of Operations   2
    Diversity of Products and Services     6
    Customer Mix                           1
    Management Quality and Depth           4
    Executive Board oversight              2
    ...                                    ...


    Corporate default risk modeling

    Agency rating approach
    In the absence of default data (for example, sovereign exposures, low default portfolios)
    Purchase ratings (for example, AAA, AA, A, BBB) to reflect creditworthiness
    Each rating corresponds to a default probability (PD)

    Shadow rating approach
    Purchase ratings from a ratings agency
    Estimate/mimic the ratings using statistical models and data


    The Altman Model: Enron Case

    [Figure] Source: Saunders and Allen (2002)


    Rating Agencies

    Moody's Investors Service, Standard & Poor's, Fitch IBCA
    Assign ratings to debt and fixed income securities at the borrower's request to reflect the financial strength (creditworthiness) of the economic entity
    Ratings for:
    Companies (private and public)
    Countries and governments (sovereign ratings)
    Local authorities
    Banks
    Models are based on quantitative and qualitative (judgmental) analysis (black box)


    Rating Agencies

    [Figure] Source: Van Gestel, Baesens, 2008


    Section 2

    Basel I, Basel II and Basel III


    Regulatory versus Economic Capital

    Role of capital
    Banks should have sufficient capital to protect themselves and their depositors from the risks (credit/market/operational) they are taking
    Regulatory capital
    Amount of capital a bank should have according to a regulation (e.g. Basel I or Basel II)
    Economic capital
    Amount of capital a bank has based on internal modeling strategy and policy
    Types of capital:
    Tier 1: common stock, preferred stock, retained earnings
    Tier 2: revaluation reserves, undisclosed reserves, general provisions, subordinated debt (can vary from country to country)

    Tier 3: specific to market risk


    The Basel I and II Capital Accords

    Basel Committee
    Central banks/bank regulators of the major (G10) industrial countries
    Meet every 3 months at the Bank for International Settlements (BIS) in Basel
    Basel I Capital Accord, 1988
    Aim is to set up minimum regulatory capital requirements in order to ensure that banks are able, at all times, to give back depositors' funds
    Capital ratio = available capital/risk-weighted assets
    Capital ratio should be > 8%
    Both Tier 1 and Tier 2 capital
    Need for a new Accord
    Regulatory arbitrage
    Not sufficient recognition of collateral guarantees
    Solvency of debtor not taken into account

    Market risk later introduced in 1996

    No operational risk


    Basel I: Example

    Minimum capital = 8% of risk-weighted assets
    Fixed risk rates. Retail:
    Cash: 0%
    Mortgages: 50%
    Other commercial: 100%
    Example:
    A $100 mortgage at a 50% risk weight gives $50 risk-weighted assets, so 8% x $50 = $4 capital needed
    Risk weights are independent of obligor and facility characteristics!


    Three Pillars of Basel II

    Pillar 1: Minimum Capital Requirement
    Credit Risk: Standard Approach, or Internal Ratings Based Approach (Foundation or Advanced)
    Operational Risk
    Market Risk

    Pillar 2: Supervisory Review Process
    Sound internal processes to evaluate risk (ICAAP)
    Supervisory monitoring

    Pillar 3: Market Discipline and Public Disclosure
    Semi-annual disclosure of: the bank's risk profile, qualitative and quantitative information, risk management processes, risk management strategy
    Objective is to inform potential investors


    Pillar 1: Minimum Capital Requirements

    Credit Risk: key components
    Probability of default (PD) (decimal)
    Loss given default (LGD) (decimal)
    Exposure at default (EAD) (currency)
    Maturity (M)
    Expected Loss (EL) = PD x LGD x EAD
    For example, PD = 1%, LGD = 20%, EAD = 1000 Euros gives EL = 2 Euros
    Standardized Approach
    Internal Ratings Based Approach
    In the U.S., initially, in the notice of proposed rule making, a difference was made between ELGD (expected LGD) and LGD (based on economic downturn). This was later abandoned.


    The Standardized Approach

    Risk assessments (for example, AAA, AA, BBB, ...) are based on external credit assessment institutions (ECAIs) (for example, Standard & Poor's, Moody's, ...)
    Eligibility criteria for ECAIs given in the Accord (objectivity, independence, ...)
    Risk weights given in the Accord to compute Risk Weighted Assets (RWA)
    Risk weights for sovereigns, banks, corporates, ...
    Minimum Capital = 0.08 x RWA


    The Standardized Approach

    Example:
    Corporate / 1 mio dollars / maturity 5 years / unsecured / external rating AA
    RW = 20%, RWA = 0.2 mio, Reg. Cap. = 0.016 mio
    Credit risk mitigation (collateral)
    Guidelines provided in the Accord (simple versus comprehensive approach)
    For retail:
    Risk weight = 75% for non-mortgage; 35% for mortgage

    But:

    Inconsistencies

    Sufficient coverage?

    Need for individual risk profile!

    No LGD, EAD


    Internal Ratings Based (IRB) Approach

    Internal Ratings Based Approach: Foundation versus Advanced

                          PD                  LGD                    EAD
    Foundation approach   Internal estimate   Regulator's estimate   Regulator's estimate
    Advanced approach     Internal estimate   Internal estimate      Internal estimate


    IRB Approach

    Split exposure into 5 categories:
    corporate (5 subclasses)
    sovereign
    bank
    retail (residential mortgage, revolving, other retail exposures)
    equity
    Foundation IRB approach not allowed for retail
    Risk weight functions to derive capital requirements
    Phased rollout of the IRB approach across the banking group using an implementation plan
    Transition period of 3 years


    Basel II: Retail Specifics

    Retail exposures can be pooled
    Estimate PD/LGD/EAD per rating/pool/segment! Note: pool = segment = grade = rating
    Default definition
    Obligor is past due more than 90 days (can be relaxed during the transition period)
    Default can be applied at the level of the credit facility
    Varies per country (e.g. with materiality thresholds)!
    PD is the greater of the one-year estimated PD or 0.03%
    Length of the underlying historical observation period to estimate loss characteristics must be at least 5 years (can be relaxed during the transition period, minimum 2 years at the start)

    No maturity (M) adjustment for retail!


    Developing a Rating Based System

    Application and behavioral scoring models provide a ranking of customers according to risk
    This was OK in the past (for example, for loan approval), but Basel II requires well-calibrated default probabilities
    Map the scores or probabilities to a number of distinct borrower ratings/pools
    Decide on the number of ratings and their definition
    Typically around 15 ratings
    Can be defined by mapping onto the Moody's or S&P scale, or expert based!
    Impact on regulatory capital!

    [Figure: a score axis with cut-offs at 520, 570, 580 and 600 dividing the scores into rating grades A to E]


    Developing a Rating System

    For corporate, sovereign, and bank exposures:
    "To meet this objective, a bank must have a minimum of seven borrower grades for non-defaulted borrowers and one for those that have defaulted", paragraph 404 of the Basel II Capital Accord.

    For retail:
    "For each pool identified, the bank must be able to provide quantitative measures of loss characteristics (PD, LGD, and EAD) for that pool. The level of differentiation for IRB purposes must ensure that the number of exposures in a given pool is sufficient so as to allow for meaningful quantification and validation of the loss characteristics at the pool level. There must be a meaningful distribution of borrowers and exposures across pools. A single pool must not include an undue concentration of the bank's total retail exposure", paragraph 409 of the Basel II Capital Accord.


    A Basel II Project Plan

    New customers: application scoring
    1-year PD (needed for Basel II)
    Loan term/18-month PD (needed for loan approval)
    Customers having credit: behavioral scoring
    1-year PD
    Other existing customers applying for credit: mix of application/behavioural scoring
    1-year PD
    Loan term/18-month PD


    Estimating Well-calibrated PDs

    Calculate empirical 1-year realized default rates for borrowers in a specific grade/pool
    Use 5 years of data to estimate the future PD (retail)
    How to calibrate the PD?
    Average
    Maximum
    Upper percentile (stressed/conservative PD)
    Dynamic model (time series, e.g. ARIMA)

    1-year PD for rating grade X = (Number of obligors with rating X at the beginning of the given time period that defaulted during the time period) / (Number of obligors with rating X at the beginning of the given time period)
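    A sketch of this ratio in SAS; COHORT is an assumed data set with one row per obligor, its rating grade at the beginning of the period, and a 0/1 flag DEFAULTED for the period:

    proc sql;
       create table grade_pd as
       select rating,
              count(*)                                      as n_obligors,
              sum(defaulted)                                as n_defaults,
              calculated n_defaults / calculated n_obligors as pd_1yr
       from cohort
       group by rating;
    quit;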


    Estimating Well-calibrated PDs

    [Figure: empirical 12-month PD for rating grade B per month over roughly 60 months, fluctuating between 0% and 5%]


    Basel II Model Architecture

    Level 0: Data (Internal Data, External Data, Expert Judgment)
    Level 1: Model (Scorecard, Logistic Regression)
    Level 2: Risk ratings definition, PD calibration

    Example scorecard (Level 1):

    Characteristic Name   Attribute    Scorecard Points
    AGE 1                 Up to 26     100
    AGE 2                 26 - 35      120
    AGE 3                 35 - 37      185
    AGE 4                 37+          225
    GENDER 1              Male         90
    GENDER 2              Female       180
    SALARY 1              Up to 500    120
    SALARY 2              501-1000     140
    SALARY 3              1001-1500    160
    SALARY 4              1501-2000    200
    SALARY 5              2001+        240

    [Figure (Level 2): empirical 12-month PD for rating grade B per month over roughly 60 months, fluctuating between 0% and 5%]


    Risk Weight Functions for Retail

    K = LGD x N( (N^-1(PD) + sqrt(rho) x N^-1(0.999)) / sqrt(1 - rho) ) - PD x LGD

    N(.) is the cumulative standard normal distribution, N^-1(.) is the inverse cumulative standard normal distribution, and rho is the asset correlation
    Regulatory capital = K x EAD

    Residential mortgage exposures: rho = 0.15
    Qualifying revolving exposures: rho = 0.04
    Other retail exposures:
    rho = 0.03 x (1 - e^(-35 x PD)) / (1 - e^(-35)) + 0.16 x (1 - (1 - e^(-35 x PD)) / (1 - e^(-35)))
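    A SAS sketch of the retail formula, using PROBIT for N^-1(.) and PROBNORM for N(.); the PD/LGD/EAD inputs are illustrative and match the credit card example used later in this deck:

    data retail_k;
       pd = 0.03; lgd = 0.5; ead = 10000;

       /* asset correlation: qualifying revolving exposures */
       rho = 0.04;

       /* K = LGD x N((N^-1(PD) + sqrt(rho) x N^-1(0.999))/sqrt(1-rho)) - PD x LGD */
       k = lgd*probnorm((probit(pd) + sqrt(rho)*probit(0.999))/sqrt(1 - rho))
           - pd*lgd;

       regcap = k*ead;   /* regulatory capital, here roughly $344 */
    run;

    proc print data=retail_k; var rho k regcap; run;

    For other retail exposures, rho would be replaced by the PD-dependent correlation formula above.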


    Risk Weight Functions for Corporates/Sovereigns/Banks

    K = ( LGD x N( (N^-1(PD) + sqrt(rho) x N^-1(0.999)) / sqrt(1 - rho) ) - PD x LGD ) x (1 + (M - 2.5) x b) / (1 - 1.5 x b)

    with b = (0.11852 - 0.05478 x ln(PD))^2

    M represents the nominal or effective maturity (between 1 and 5 years)

    rho = 0.12 x (1 - e^(-50 x PD)) / (1 - e^(-50)) + 0.24 x (1 - (1 - e^(-50 x PD)) / (1 - e^(-50)))

    Firm size adjustment for SMEs:

    rho = 0.12 x (1 - e^(-50 x PD)) / (1 - e^(-50)) + 0.24 x (1 - (1 - e^(-50 x PD)) / (1 - e^(-50))) - 0.04 x (1 - (S - 5)/45)

    S represents total annual sales in millions of euros (between 5 and 50 million)
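    The same kind of sketch for the corporate function, maturity adjustment included; the PD, LGD and M inputs are illustrative:

    data corp_k;
       pd = 0.01; lgd = 0.45; m = 2.5;   /* illustrative inputs */

       w   = (1 - exp(-50*pd)) / (1 - exp(-50));
       rho = 0.12*w + 0.24*(1 - w);            /* corporate asset correlation */
       b   = (0.11852 - 0.05478*log(pd))**2;   /* maturity adjustment factor  */

       k = (lgd*probnorm((probit(pd) + sqrt(rho)*probit(0.999))/sqrt(1 - rho))
            - pd*lgd) * (1 + (m - 2.5)*b) / (1 - 1.5*b);
    run;

    proc print data=corp_k; var rho b k; run;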


    Risk Weight Functions

    Regulatory Capital = K(PD, LGD) x EAD
    Only covers unexpected loss (UL)!
    Expected loss = PD x LGD (per unit of EAD)
    Expected loss covered by provisions!
    RWA not explicitly calculated; it can be backed out as follows: since Regulatory Capital = 8% x RWA, RWA = 12.50 x Regulatory Capital = 12.50 x K x EAD
    Note: scaling factor of 1.06 for credit-risk-weighted assets (based on QIS 3, 4 and 5)
    The asset correlations are chosen depending on the business segment (corporate, sovereign, bank, residential mortgages, revolving, other retail) using some empirical but not published procedure!
    Based on reverse engineering of economic capital models from large banks: the asset value correlations "reflect a combination of supervisory judgment and empirical evidence" (Federal Register)
    Note: asset correlations also measure how the asset class depends on the state of the economy


    LGD/EAD Errors More Expensive Than PD Errors

    Consider a credit card portfolio where PD = 0.03; LGD = 0.5; EAD = $10,000
    The Basel formulae give a capital requirement of K(PD, LGD) x EAD:
    K(0.03, 0.50) x 10,000 = $343.7
    A 10% over-estimate on PD means the capital required is K(0.033, 0.50) x 10,000 = $367.3
    A 10% over-estimate on LGD means the capital required is K(0.03, 0.55) x 10,000 = $378.0
    A 10% over-estimate on EAD means the capital required is K(0.03, 0.50) x 11,000 = $378.0


    The Basel II Risk Weight Functions

    [Figure: the Basel II risk weight functions plotted against PD]


    Basel III

    Objective: fundamentally strengthen global capital standards
    Key attention points:
    Focus on tangible equity capital since this is the component with the greatest loss-absorbing capacity
    Reduced reliance on banks' internal models
    Reduced reliance on external ratings
    Greater focus on stress testing
    Loss absorbing capacity beyond common standards for systemically important banks
    Tier 3 capital eliminated
    Address model risk by e.g. introducing a non-risk-based leverage ratio as a backstop
    Liquidity Coverage Ratio and the Net Stable Funding Ratio
    No major impact on the underlying credit risk models!
    The new standards will take effect on 1 January 2013 and for the most part will become fully effective by January 2019


    Basel II versus Basel III

    Tier 1 capital ratio: Basel II 4% of RWA; Basel III 6% of RWA (by 2015)
    Core Tier 1 capital ratio (common equity = shareholder capital + retained earnings): Basel II 2% of RWA; Basel III 4.5% of RWA (by 2015)
    Capital conservation buffer (common equity): Basel II none; Basel III 2.5% of RWA (by 2019)
    Countercyclical buffer: Basel II none; Basel III 0% - 2.5% of RWA (by 2019)
    Non-risk-based leverage ratio (Tier 1 capital): Basel II none; Basel III 3% of assets (by 2018). Note: assets include off-balance-sheet exposures and derivatives.
    Total capital ratio: Basel II = [Tier 1 Capital Ratio] + [Tier 2 Capital Ratio] + [Tier 3 Capital Ratio]; Basel III = [Tier 1 Capital Ratio] + [Capital Conservation Buffer] + [Countercyclical Capital Buffer] + [Capital for Systemically Important Banks]


    Section 3

    Preprocessing Data for Credit Scoring and PD Modeling


    Motivation

    Dirty, noisy data
    For example, Age = -2003
    Inconsistent data
    Value 0 means an actual zero or a missing value
    Incomplete data
    Income = ?
    Data integration and data merging problems
    Amounts in euros versus amounts in dollars
    Duplicate data
    Salary versus Professional Income
    Success of the preprocessing steps is crucial to the success of the following steps
    Garbage in, garbage out (GIGO)
    Very time consuming (80% rule)


    Preprocessing Data for Credit Scoring

    Types of variables

    Sampling

    Missing values

    Outlier detection and treatment

    Coarse classification of attributes

    Weights of Evidence and Information Value

    Segmentation

    Definition of target variable


    Types of Variables

    Continuous
    Defined on a continuous interval
    For example, income, amount on savings account, ...
    In SAS: Interval
    Discrete
    Nominal: no ordering between values; for example, purpose of loan, marital status
    Ordinal: implicit ordering between values; for example, credit rating (AAA is better than AA, AA is better than A, ...)
    Binary: for example, gender


    Sampling

    Take a sample of past applicants to build the scoring model
    Think carefully about the population on which the model that is going to be built using the sample will be used
    Timing of sample
    How far back do I go to get my sample?
    Trade-off: much data versus recent data
    Number of bads versus number of goods
    Undersampling or oversampling may be needed (dependent on the classification algorithm, cf. infra)
    The sample taken must be from a normal business period, to get as accurate a picture as possible of the target population
    Make sure the performance window is long enough to stabilize the bad rate (e.g. 18 months!)
    Example sampling problems
    Application scoring: reject inference
    Behavioural scoring: seasonality depending upon the choice of the observation point


    Missing Values

    Keep or Delete or Replace
    Keep
    The fact that a variable is missing can be important information!
    Encode the variable in a special way (e.g. a separate category during coarse classification)
    Delete
    When there are too many missing values, removing the variable or the observation might be an option
    Horizontally versus vertically missing values
    Replace
    Estimate the missing value using imputation procedures


    Imputation Procedures for Missing Values

    For continuous attributes

    Replace with median/mean (median more robust to outliers)
    Replace with median/mean of all instances of the same class

    For ordinal/nominal attributes

    Replace with modal value (= most frequent category)

    Replace with modal value of all instances of the same class

    Advanced schemes

    Regression or tree-based imputation

    Predict missing value using other variables

    Nearest neighbour imputation

    Look at nearest neighbours (e.g. in the Euclidean sense) for imputation

    Markov Chain Monte Carlo methods

    Not recommended!


    Dealing with Missing Values in SAS

    data credit;
    input income savings;
    datalines;
    1300 400
    1000 .
    2000 800
    . 200
    1700 600
    2500 1000
    2200 900
    . 1000
    1500 .
    ;

    /* replace substitutes missing values with the variable mean */
    proc standard data=credit replace out=creditnomissing;
    run;

    Impute node in Enterprise Miner!


    Outliers

    E.g. due to recording or data entry errors, or noise
    Types of outliers:
    Valid observation: e.g. salary of the boss, ratio variables
    Invalid observation: age = -2003
    Outliers can be hidden in one-dimensional views of the data (multidimensional nature of data)
    Univariate outliers versus multivariate outliers

    Detection versus treatment


    Univariate Outlier Detection Methods

    Visually
    Histogram based
    Box plots
    z-score
    Measures how many standard deviations an observation is away from the mean for a specific variable:

    z_i = (x_i - m) / s

    where m is the mean of variable x and s its standard deviation
    Outliers are observations having |z_i| > 3 (or 2.5)
    Note: the mean of the z-scores is 0, their standard deviation is 1
    Calculate the z-score in SAS using PROC STANDARD


    Univariate Outlier Detection Methods


    Box Plot

    A box plot is a visual representation of five numbers:
    Median M: P(X <= M) = 0.50
    First Quartile Q1: P(X <= Q1) = 0.25
    Third Quartile Q3: P(X <= Q3) = 0.75
    Minimum
    Maximum
    IQR = Q3 - Q1

    [Figure: box plot with the box from Q1 to Q3, the median M inside, whiskers extending 1.5 x IQR beyond the box, and points beyond the whiskers marked as outliers]


    Multivariate Outlier Detection Methods

    Multivariate outliers!


    Treatment of Outliers

    For invalid outliers:
    E.g. age = 300 years
    Treat as a missing value (keep, delete, replace)
    For valid outliers:
    Truncation based on z-scores:
    Replace all variable values having z-scores > 3 by the mean + 3 times the standard deviation
    Replace all variable values having z-scores < -3 by the mean - 3 times the standard deviation
    Truncation based on the IQR (more robust than z-scores)
    Truncate to M +/- 3s, with M = median and s = IQR/(2 x 0.6745)
    See Van Gestel, Baesens et al., 2007
    Truncation using a sigmoid
    Use a sigmoid transform, f(x) = 1/(1 + e^(-x))
    Also referred to as winsorizing
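    A data step sketch of the z-score truncation just described; the mean and standard deviation are hard-coded illustrative values, where in practice they would come from PROC MEANS:

    data truncated;
       set credit;
       m = 1700; s = 550;                       /* illustrative mean and std */
       z = (income - m)/s;
       if z > 3 then income = m + 3*s;          /* truncate the right tail   */
       else if z < -3 then income = m - 3*s;    /* truncate the left tail    */
    run;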


    Computing the z-scores in SAS

    data credit;
    input age income;
    datalines;
    34 1300
    24 1000
    20 2000
    40 2100
    54 1700
    39 2500
    23 2200
    34 700
    56 1500
    ;

    proc standard data=credit mean=0 std=1 out=creditnorm;
    run;


    Discretization and Grouping of Attribute Values

    Motivation
    Group values of categorical variables for robust analysis
    Introduce non-linear effects for continuous variables
    Classification technique requires discrete inputs (for example, Bayesian network classifiers)
    Create concept hierarchies (group low level concepts, for example raw age data, into higher level concepts, for example young, middle aged, old)
    Also called coarse classification, classing, categorization, binning, grouping, ...
    Methods
    Equal interval binning
    Equal frequency binning (histogram equalization)
    Chi-squared analysis
    Entropy based discretization (using e.g. decision trees)


    Coarse Classifying Continuous Variables

    Lyn Thomas, A survey of credit and behavioural scoring: forecasting financial risk of lending to consumers, International Journal of Forecasting, 16, 149-172, 2000


    The Binning Method

    For example, consider the attribute income: 1000, 1200, 1300, 2000, 1800, 1400
    Equal interval binning
    Bin width = 500
    Bin 1, [1000, 1500[: 1000, 1200, 1300, 1400
    Bin 2, [1500, 2000]: 1800, 2000
    Equal frequency binning (histogram equalization)
    2 bins
    Bin 1: 1000, 1200, 1300
    Bin 2: 1400, 1800, 2000
    However, both these methods do not take the default risk into account!
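    Equal frequency binning can be sketched in SAS with PROC RANK, whose GROUPS= option assigns observations to equally sized bins:

    data income;
       input income @@;
       datalines;
    1000 1200 1300 2000 1800 1400
    ;

    /* groups=2 gives 2 equal-frequency bins (0 and 1) in income_bin */
    proc rank data=income groups=2 out=binned;
       var income;
       ranks income_bin;
    run;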


    The Chi-squared Method

    Consider the following example (taken from the book Credit Scoring and Its Applications, by Thomas, Edelman and Crook, 2002)
    Suppose we want three categories
    Should we take option 1 (owners, renters, and others) or option 2 (owners, with parents, and others)?

    Attribute       Owner   Rent Unfurnished   Rent Furnished   With Parents   Other   No Answer   Total
    Goods           6000    1600               350              950            90      10          9000
    Bads            300     400                140              100            50      10          1000
    Good:bad odds   20:1    4:1                2.5:1            9.5:1          1.8:1   1:1         9:1


    The Chi-squared Method

    The number of good owners, given that the odds are the same as in the whole population, is (6000+300) x 9000/10000 = 5670.
    Likewise, the number of bad renters, given that the odds are the same as in the whole population, is (1600+400+350+140) x 1000/10000 = 249.
    Compare the observed frequencies with the theoretical frequencies assuming equal odds using a chi-squared test statistic.

    Option 1: chi-squared = (6000-5670)^2/5670 + (300-630)^2/630 + (1950-2241)^2/2241 + (540-249)^2/249 + (1050-1089)^2/1089 + (160-121)^2/121 = 583

    Option 2: chi-squared = (6000-5670)^2/5670 + (300-630)^2/630 + (2050-2385)^2/2385 + (600-265)^2/265 + (950-945)^2/945 + (100-105)^2/105 = 662

    The higher the test statistic, the better the split (formally, compare with a chi-squared distribution with k-1 degrees of freedom for k classes of the characteristic).
    Option 2 is the better split!


    Coarse Classification in SAS

    data residence;
    input default$ resstatus$ count;
    datalines;
    good owner 6000
    good rentunf 1600
    good rentfurn 350
    good withpar 950
    good other 90
    good noanswer 10
    bad owner 300
    bad rentunf 400
    bad rentfurn 140
    bad withpar 100
    bad other 50
    bad noanswer 10
    ;


    data coarse1;
    input default$ resstatus$ count;
    datalines;
    good owner 6000
    good renter 1950
    good other 1050
    bad owner 300
    bad renter 540
    bad other 160
    ;

    data coarse2;   /* option 2 grouping: owners, with parents, others */
    input default$ resstatus$ count;
    datalines;
    good owner 6000
    good withpar 950
    good other 2050
    bad owner 300
    bad withpar 100
    bad other 600
    ;

    proc freq data=coarse1;
    weight count;
    tables default*resstatus / chisq;
    run;

    proc freq data=coarse2;
    weight count;
    tables default*resstatus / chisq;
    run;


    Recoding Nominal/Ordinal Variables

    Nominal variables
    Dummy encoding
    Ordinal variables
    Thermometer encoding
    Weights of evidence coding


    Dummy and Thermometer Encoding

    Dummy encoding:

                            Input 1   Input 2   Input 3
    Purpose = car           1         0         0
    Purpose = real estate   0         1         0
    Purpose = other         0         0         1

    Thermometer encoding:

    Original Input                    Categorical Input   Input 1   Input 2   Input 3
    Income <= 800 Euro                1                   0         0         0
    Income > 800 and <= 1200 Euro     2                   0         0         1
    Income > 1200 and <= 10000 Euro   3                   0         1         1
    Income > 10000 Euro               4                   1         1         1

    Many inputs: explosion of the data set!


    Weights of Evidence

    Measures the risk represented by each (grouped) attribute category (for example, age 23-26)
    The higher the weight of evidence (in favor of being good), the lower the risk for that category

    WOE_category = ln(p_good_category / p_bad_category), where
    p_good_category = number of goods_category / number of goods_total
    p_bad_category = number of bads_category / number of bads_total

    If p_good_category > p_bad_category then WOE > 0
    If p_good_category < p_bad_category then WOE < 0


    Weights of Evidence

    Age       Count   Distr. Count   Goods   Distr. Goods   Bads   Distr. Bads   WOE
    Missing   50      2.50%          42      2.33%          8      4.12%         -57.28%
    18-22     200     10.00%         152     8.42%          48     24.74%        -107.83%
    23-26     300     15.00%         246     13.62%         54     27.84%        -71.47%
    27-29     450     22.50%         405     22.43%         45     23.20%        -3.38%
    30-35     500     25.00%         475     26.30%         25     12.89%        71.34%
    35-44     350     17.50%         339     18.77%         11     5.67%         119.71%
    44+       150     7.50%          147     8.14%          3      1.55%         166.08%
    Total     2000                   1806                   194

    WOE = ln(Distr. Good / Distr. Bad) x 100


    Information Value

    Information Value (IV) is a measure of predictive power used to:
    assess the appropriateness of the classing
    select predictive variables
    IV is similar to an entropy (cf. infra):

    IV = sum over categories of (p_good_category - p_bad_category) x WOE_category

    Rule of thumb:
    < 0.02: unpredictive
    0.02 - 0.1: weak
    0.1 - 0.3: medium
    0.3+: strong

    Age       Distr. Goods   Distr. Bads   WOE        IV
    Missing   2.33%          4.12%         -57.28%    0.0103
    18-22     8.42%          24.74%        -107.83%   0.1760
    23-26     13.62%         27.84%        -71.47%    0.1016
    27-29     22.43%         23.20%        -3.38%     0.0003
    30-35     26.30%         12.89%        71.34%     0.0957
    35-44     18.77%         5.67%         119.71%    0.1568
    44+       8.14%          1.55%         166.08%    0.1095
    Information Value: 0.6502
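    A PROC SQL sketch of these two computations; AGEGRP, with one row per category and columns GOODS and BADS, is an assumed input table, and the WOE is scaled by 100 as in the tables above:

    proc sql;
       create table woe_iv as
       select category,
              goods / (select sum(goods) from agegrp) as p_good,
              bads  / (select sum(bads)  from agegrp) as p_bad,
              log(calculated p_good / calculated p_bad) * 100 as woe,
              (calculated p_good - calculated p_bad)
                 * log(calculated p_good / calculated p_bad) as iv_part
       from agegrp;

       /* the information value is the sum of the category contributions */
       select sum(iv_part) as information_value from woe_iv;
    quit;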


    WOE: Logical Trend?

    [Figure: bar chart of the WOE ("predictive strength") per age group, rising from about -108 for 18-22 to +166 for 44+]

    Younger people tend to represent a higher risk than the older population


    Segmentation

    When more than one scorecard is required
    Build a scorecard for each segment separately
    Three reasons (Thomas, Jo, and Scherer, 2001):
    Strategic
    Banks might want to adopt special strategies for specific segments of customers (for example, a lower cut-off score depending on age or residential status).
    Operational
    New customers must have a separate scorecard because the characteristics in the standard scorecard do not make sense operationally for them.
    Variable Interactions
    If one variable interacts strongly with a number of others, it might be sensible to segment according to this variable.


    Segmentation

    Two ways
    Experience based
    In cooperation with a financial expert
    Statistically based
    Use clustering algorithms (k-means, decision trees, ...)
    Let the data speak
    Better predictive power than a single model if successful!


    Decision Trees for Clustering

    All Goods/Bads (bad rate = 3.8%)
      Existing Customer (bad rate = 1.2%)
        Customer > 2 yrs (bad rate = 0.3%)
        Customer < 2 yrs (bad rate = 2.8%)
      New Applicant (bad rate = 6.3%)
        Age < 30 (bad rate = 11.7%)
        Age > 30 (bad rate = 3.2%)

    Segmentation: the 4 leaf nodes give 4 segments


    Definitions of Bad

    30/60/90 days past due
    Charge off/write off
    Bankrupt
    Claim over $1000
    Profit based
    Negative NPV
    Less than x% of the amount owed collected
    Fraud over $500
    Basel II: 90 days, but can be changed by national regulators (for example, U.S., UK, Belgium, ...)


    Section 4

    Classification Techniques for Credit Scoring and PD Modeling

    The Classification Problem

    The classification problem can be stated as follows: given an observation (also called a pattern) with characteristics x = (x1, ..., xn), determine its class c from a predetermined set of classes {c1, ..., cm}
    The classes {c1, ..., cm} are known beforehand
    Supervised learning!
    Binary (2 classes) versus multiclass (> 2 classes) classification


    Regression for classification

    Linear regression gives: Y = b0 + b1 x Age + b2 x Income + b3 x Gender
    Can be estimated using OLS (in SAS use proc reg or proc glm)
    Two problems:
    No guarantee that Y is between 0 and 1 (i.e. a probability)
    Target/errors not normally distributed
    Use a bounding function to limit the outcome between 0 and 1:

    f(z) = 1 / (1 + e^(-z))

    Code the target as Y = 1 for goods and Y = 0 for bads:

    Customer   Age   Income   Gender   G/B   Y
    John       30    1200     M        B     0
    Sarah      25    800      F        G     1
    Sophie     52    2200     F        G     1
    David      48    2000     M        B     0
    Peter      34    1800     M        G     1

    [Figure: the sigmoid f(z) = 1/(1 + e^(-z)), rising from 0 to 1 as z goes from -7 to 7]


    Logistic Regression

    Linear regression with a transformation such that the output is always between 0 and 1 can thus be interpreted as a probability (e.g. the probability of being a good customer):

    P(customer = good | age, income, ..., gender) = 1 / (1 + e^-(b0 + b1 x age + b2 x income + b3 x gender))

    Or, alternatively:

    ln( P(customer = good | age, income, ..., gender) / P(customer = bad | age, income, ..., gender) ) = b0 + b1 x age + b2 x income + b3 x gender

    The parameters can be estimated in SAS using proc logistic
    Once the model has been estimated using historical data, we can use it to score or assign probabilities to new data

    Logistic Regression in SAS

    Historical data:

    Customer   Age   Income   Gender   Response
    John       30    1200     M        No
    Sarah      25    800      F        Yes
    Sophie     52    2200     F        Yes
    David      48    2000     M        No
    Peter      34    1800     M        Yes

    proc logistic data=responsedata;
    class Gender;
    model Response=Age Income Gender;
    run;

    New data, scored with the estimated model
    P(response = yes | Age, Income, Gender) = 1 / (1 + e^-(0.10 + 0.22 x Age + 0.050 x Income + 0.80 x Gender)):

    Customer   Age   Income   Gender   Response score
    Emma       28    1000     F        0.44
    Will       44    1500     M        0.76
    Dan        30    1200     M        0.18
    Bob        58    2400     M        0.88
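    Scoring the new applicants can also be sketched directly with the SCORE statement of PROC LOGISTIC; RESPONSEDATA and NEWDATA are the assumed data set names:

    proc logistic data=responsedata;
       class Gender;
       model Response(event='Yes') = Age Income Gender;
       score data=newdata out=scored;   /* adds predicted probabilities */
    run;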


    Logistic Regression

    The logistic regression model is formulated as follows:

    P(Y = 1 | X1, ..., Xn) = 1 / (1 + e^-(b0 + b1X1 + ... + bnXn)) = e^(b0 + b1X1 + ... + bnXn) / (1 + e^(b0 + b1X1 + ... + bnXn))

    Hence,

    P(Y = 0 | X1, ..., Xn) = 1 - P(Y = 1 | X1, ..., Xn) = 1 / (1 + e^(b0 + b1X1 + ... + bnXn))

    so that P(Y = 1 | X1, ..., Xn) + P(Y = 0 | X1, ..., Xn) = 1.

    Model reformulation:

    P(Y = 1 | X1, ..., Xn) / P(Y = 0 | X1, ..., Xn) = e^(b0 + b1X1 + ... + bnXn)


    Logistic Regression

    ln( P(Y = 1 | X1, ..., Xn) / P(Y = 0 | X1, ..., Xn) ) = b0 + b1X1 + ... + bnXn

    P(Y = 1 | X1, ..., Xn) / (1 - P(Y = 1 | X1, ..., Xn)) is the odds in favor of Y = 1

    ln( P(Y = 1 | X1, ..., Xn) / (1 - P(Y = 1 | X1, ..., Xn)) ) is called the logit


    Logistic Regression

    If Xi increases by 1 (other variables remaining constant/ceteris paribus), the odds are multiplied by e^bi
    e^bi is the odds ratio: the multiplicative increase in the odds when Xi increases by one
    bi > 0 means e^bi > 1: odds and probability increase with Xi
    bi < 0 means e^bi < 1: odds and probability decrease with Xi


    Logistic Regression and Weight of Evidence Coding

    CustID   Age   Age Group         Age WOE
    1        20    1: until 22       -1.1
    2        31    2: 22 until 35     0.2
    3        49    3: 35+             0.9

    (Actual age, after classing, after re-coding)


    Logistic Regression and Weight of Evidence Coding

    P(Y = 1 | X1, ..., Xn) = 1 / (1 + e^-(b0 + b1 x age_woe + b2 x purpose_woe + ...))

    No dummy variables!
    More robust


    Decision Trees

    Recursively partition the training data
    Various tree induction algorithms:
    C4.5 (Quinlan 1993)
    CART: Classification and Regression Trees (Breiman, Friedman, Olshen, and Stone 1984)
    CHAID: Chi-squared Automatic Interaction Detection (Hartigan 1975)
    Classification tree: target is categorical
    Regression tree: target is continuous (interval)


    Decision Tree Example

    Income > $50,000?
      No: Job > 3 Years?
        No: Bad Risk
        Yes: Good Risk
      Yes: High Debt?
        Yes: Bad Risk
        No: Good Risk


    Decision Trees Terminology

    [Same tree as above: the top node (Income > $50,000) is the root node, the intermediate test nodes (Job > 3 Years, High Debt) are internal nodes, the connections between nodes are branches, and the terminal Good Risk/Bad Risk nodes are leaf nodes]


    Estimating decision trees from data

    Splitting decision
    Which variable to split at what value (e.g. age < 30 or not, income < 1000 or not, marital status = married or not, ...)?
    Stopping decision
    When to stop growing the tree?
    When to stop adding nodes to the tree?
    Assignment decision
    Which class (e.g. good or bad customer) to assign to a leaf node?
    Usually the majority class!


    Splitting decision: entropy

    Think of green dots as good customers and red dots as bad customers
    Impurity measures the disorder/chaos in a data set
    The impurity of a data sample S can be measured as follows:
    Entropy: E(S) = -pG log2(pG) - pB log2(pB) (C4.5)
    Gini: Gini(S) = 2 pG pB (CART)

    [Figure: three dot clouds, two with minimal impurity (all dots of one class) and one with maximal impurity (half good, half bad)]


    Measuring Impurity using Entropy

    Entropy is minimal when pG = 1 or pB = 1 and maximal when pG = pB = 0.50
    Consider different candidate splits and measure for each the decrease in entropy, or gain

    [Figure: entropy as a function of pG, equal to 0 at pG = 0 and pG = 1 and reaching its maximum of 1 at pG = 0.5]

    Example: Gain Calculation

    Top node: 400 goods, 400 bads. Candidate split on Age < 30: the left node (Age < 30) gets 200 goods and 400 bads, the right node (Age >= 30) gets 200 goods and 0 bads.

    Entropy top node = -1/2 x log2(1/2) - 1/2 x log2(1/2) = 1
    Entropy left node = -1/3 x log2(1/3) - 2/3 x log2(2/3) = 0.91
    Entropy right node = -1 x log2(1) - 0 x log2(0) = 0

    Gain = 1 - (600/800) x 0.91 - (200/800) x 0 = 0.32

    Consider different alternative splits and pick the one with the biggest gain:

    Split                      Gain
    Age < 30                   0.32
    Marital status = married   0.05
    Income < 2000              0.48
    Savings amount < 50000     0.12
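    The same arithmetic as a SAS data step, with the counts taken from the split above:

    data gain;
       /* top node: 400 G, 400 B; left (age < 30): 200 G, 400 B; right: 200 G, 0 B */
       e_top   = -0.5*log2(0.5) - 0.5*log2(0.5);
       e_left  = -(1/3)*log2(1/3) - (2/3)*log2(2/3);
       e_right = 0;   /* pure node */
       gain    = e_top - (600/800)*e_left - (200/800)*e_right;
    run;

    proc print data=gain; run;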

    Stopping decision: early stopping

    As the tree is grown, the data becomes more and more segmented and splits are based on fewer and fewer observations
    Need to avoid the tree becoming too specific and fitting the noise of the data (aka overfitting)
    Avoid overfitting by partitioning the data into a training set (used for the splitting decision) and a validation set (used for the stopping decision)
    The optimal tree is found where the validation set error is minimal

    [Figure: misclassification error versus number of tree nodes; the training set error keeps decreasing while the validation set error reaches a minimum and then rises again: stop growing the tree at that minimum]

    Advantages/Disadvantages of Trees

    Advantages

    Ease of interpretation
    Non-parametric
    No need for transformations of variables
    Robust to outliers

    Disadvantages
    Sensitive to changes in the training data (unstable)
    Combinations of decision trees: bagging, boosting, random forests


    Section 5

    Measuring the Performance of Credit Scoring Classification Models

    How to Measure Performance?

    How well does the trained model perform in predicting new, unseen (!) instances?
    Decide on a performance measure
    Classification: Percentage Correctly Classified (PCC), Sensitivity, Specificity, Area Under the ROC curve (AUROC), ...
    Regression: Mean Absolute Deviation (MAD), Mean Squared Error (MSE), ...
    Methods
    Split sample method
    Single sample method
    N-fold cross-validation

    Split Sample Method

    For large data sets

    Large = more than 1000 observations, with more than 50 bad customers
    Set aside a test set (typically one-third of the data) which is NOT used during training!
    Estimate the performance of the trained classifier on the test set
    Note: for decision trees, the validation set is part of the training set
    Stratification
    Same class distribution in training set and test set

    Performance Measures for Classification

    Classification accuracy, confusion matrix, sensitivity, specificity

    Area under the ROC curve

    The Lorenz (Power) curve

    The Cumulative lift curve

    Kolmogorov-Smirnov distance

    Classification performance: confusion matrix

    Cut-off = 0.50

    Apply the cut-off to the model scores to get the predicted class:

               G/B   Score   Predicted
    John       G     0.72    G
    Sophie     B     0.56    G
    David      G     0.44    B
    Emma       B     0.18    B
    Bob        B     0.36    B

    Confusion matrix:

                                Actual Positive (Good)   Actual Negative (Bad)
    Predicted Positive (Good)   True Positive (John)     False Positive (Sophie)
    Predicted Negative (Bad)    False Negative (David)   True Negative (Emma, Bob)

    Classification accuracy = (TP+TN)/(TP+FP+FN+TN) = 3/5
    Classification error = (FP+FN)/(TP+FP+FN+TN) = 2/5
    Sensitivity = TP/(TP+FN) = 1/2
    Specificity = TN/(FP+TN) = 2/3

    All these measures depend upon the cut-off!
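    A SAS sketch of these computations: derive the predicted class from the cut-off and cross-tabulate it against the actual class (IFC returns one of two character values depending on the condition):

    data preds;
       input name $ actual $ score;
       predicted = ifc(score >= 0.5, 'G', 'B');   /* cut-off = 0.50 */
       datalines;
    John G 0.72
    Sophie B 0.56
    David G 0.44
    Emma B 0.18
    Bob B 0.36
    ;

    proc freq data=preds;
       tables actual*predicted;   /* the confusion matrix */
    run;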

    Classification Performance: ROC analysis


    Make a table with sensitivity and specificity for each possible cut-off

    The ROC curve plots sensitivity versus 1-specificity for each possible cut-off

    A perfect model has sensitivity of 1 and specificity of 1 (i.e. upper left corner)

    Scorecard A is better than B in the figure below

    The ROC curve can be summarized by the area underneath (AUC); the bigger the better!

      Cut-off   Sensitivity   Specificity   1-Specificity
      0         1             0             1
      0.01      ...           ...           ...
      0.02      ...           ...           ...
      ...
      0.99      ...           ...           ...
      1         0             1             0

    [Figure: ROC curves (sensitivity versus 1-specificity) for Scorecard A, Scorecard B, and a random scorecard; Scorecard A lies closest to the upper left corner]
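    The cut-off table underlying the ROC curve can be generated automatically; a sketch, reusing the assumed bart.dmagecr data. The OUTROC= option writes one row per cut-off with the variables _SENSIT_ and _1MSPEC_:

    proc logistic data=bart.dmagecr;
       model good_bad = duration amount / outroc=rocdata;
    run;

    proc sgplot data=rocdata;
       series x=_1mspec_ y=_sensit_;       /* ROC curve */
       lineparm x=0 y=0 slope=1;           /* diagonal = random scorecard */
    run;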


    The Area Under the ROC Curve


    How to compare intersecting ROC curves?

    The area under the ROC curve (AUC)

    The AUC provides a simple figure-of-merit for the performance of the constructed classifier

    An intuitive interpretation of the AUC is that it provides an estimate of the probability that a randomly chosen instance of class 1 is correctly ranked higher than a randomly chosen instance of class 0 (Hanley and McNeil, 1983); this is the Wilcoxon or Mann-Whitney U statistic

    The higher the better

    A good classifier should have an AUC larger than 0.5
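    In symbols (a standard identity, stated here for completeness): writing S_G and S_B for the scores of a randomly drawn good and a randomly drawn bad,

    \[
    \mathrm{AUC} = P(S_G > S_B) + \tfrac{1}{2} P(S_G = S_B),
    \]

    which is the normalised Mann-Whitney U (Wilcoxon rank-sum) statistic; for a random scorecard both orderings are equally likely, giving AUC = 0.5.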

    Cumulative Accuracy Profile (CAP)


    Sort the population from high model score to low model score

    Measure the (cumulative) percentage of bads for each score decile

    E.g. the top 30% most likely bads according to the model captures 65% of the true bads

    Often summarised as top-decile lift: how many bads are in the top 10%

    Accuracy Ratio


    The accuracy ratio (AR) is defined as:
    (area below the power curve for the current model - area below the power curve for the random model) /
    (area below the power curve for the perfect model - area below the power curve for the random model)

    The perfect model has an AR of 1

    A random model has an AR of 0

    AR is sometimes also called the Gini coefficient

    AR = 2*AUC - 1

    [Figure: CAP curves for the current model and the perfect model against the random diagonal; with B the area between the current model and the random model, and A the area between the perfect model and the current model, AR = B/(A+B)]
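    Why AR = 2*AUC - 1 (a one-line argument in ROC space): the area between the model's ROC curve and the diagonal is AUC - 1/2, while the perfect model achieves 1 - 1/2 = 1/2, so

    \[
    \mathrm{AR} = \frac{\mathrm{AUC} - \tfrac{1}{2}}{1 - \tfrac{1}{2}} = 2\,\mathrm{AUC} - 1 .
    \]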

    The Kolmogorov-Smirnov (KS) Distance


    Separation measure

    Measures the distance between the cumulative score distributions P(s|B) and P(s|G)

    KS = max_s |P(s|G) - P(s|B)|, where:
      P(s|G) = \(\sum_{x \le s} p(x|G)\) (equals 1 - sensitivity)
      P(s|B) = \(\sum_{x \le s} p(x|B)\) (equals the specificity)

    The KS distance is the maximum vertical distance between both curves

    The KS distance can also be measured on the ROC graph: it is the maximum vertical distance between the ROC curve and the diagonal

    The Kolmogorov-Smirnov Distance


    [Figure: cumulative score distributions P(s|B) and P(s|G) plotted against the score; the KS distance is the maximum vertical gap between the two curves]
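    The KS statistic is a standard two-sample test and can be computed directly; a minimal sketch on the hypothetical scores data set used earlier:

    proc npar1way data=scores edf;
       class good_bad;                     /* goods versus bads */
       var score;
    run;
    /* The reported Kolmogorov-Smirnov D is the maximum vertical distance */
    /* between the two empirical cumulative score distributions.          */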


    Section 6

    Setting the Classification Cut-off

    Setting the Classification Cut-off


    The scorecard outputs P(customer is good payer | x); a cut-off turns this into an accept/reject decision

    Depends upon the risk preference strategy

    No input provided by the Basel accord; capital should be in line with the risks taken

    Strategy curve
      Choose the cut-off that produces the same acceptance rate as the previous scorecard but improves the bad rate
      Choose the cut-off that produces the same bad rate as the previous scorecard but improves the acceptance rate

    Strategy Curve


    [Figure: strategy curve plotting bad rate (%) against accept rate (%); each point on the curve corresponds to a cut-off]

    Setting the Cut-off Based on Marginal Good-Bad Rates


    Usually one chooses a marginal good-bad rate of about 5:1 or 3:1

    Dependent upon the granularity of the cut-off change

      Cut-off   Cum. goods above score   Cum. bads above score   Marginal good-bad rate
      0.9       1400                     50                      -
      0.8       2000                     100                     12:1
      0.7       2700                     170                     10:1
      0.6       2950                     220                     5:1
      0.5       3130                     280                     3:1
      0.4       3170                     300                     2:1
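    As a worked example from the table above: lowering the cut-off from 0.7 to 0.6 accepts an extra 2950 - 2700 = 250 goods and 220 - 170 = 50 bads, so the marginal good-bad rate at that step is

    \[
    \frac{2950 - 2700}{220 - 170} = \frac{250}{50} = 5{:}1 .
    \]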

    Including a Refer Option


    Refer option
      Gray zone: requires further (human) inspection

    [Figure: score scale from 0 to 400; reject below a score of 190, refer between 190 and 210, accept above 210]


    Baesens B., Van Gestel T., Viaene S., Stepanova M., Suykens J., Vanthienen J., Benchmarking state-of-the-art classification algorithms for credit scoring, Journal of the Operational Research Society, 54(6), 627-635, 2003.

    Case Study 1

    Benchmarking Study (Baesens et al., 2003)


    8 real-life credit scoring data sets (UK, Benelux, Internet)

    17 algorithms or variations of algorithms

    Various cut-off setting schemes

    Classification accuracy + area under the receiver operating characteristic curve

    McNemar test + DeLong, DeLong and Clarke-Pearson test

    Benchmarking Study


    Benchmarking Study: Conclusions


    NN and RBF LS-SVM classifiers consistently yield good performance in terms of both PCC and AUC

    Simple linear classifiers such as LDA and logistic regression also gave very good performance

    Most credit scoring data sets are only weakly non-linear

    Flat maximum effect

    Benchmarking Study: Conclusions


    Although the differences are small, they may be large enough to have commercial implications. (Henley and Hand, 1997)

    Credit is an industry where improvements, even when small, can represent vast profits if such improvement can be sustained. (Kelly, 1998)

    For mortgage portfolios, a small increase in PD scorecard discrimination will significantly lower the minimum capital requirements in the context of Basel II. (Experian, 2003)

    The best way to augment the performance of a scorecard is to look for better predictors!

    Role of credit bureaus!

    Credit Bureaus


    Credit reference agencies or credit bureaus

    UK: information typically accessed through name and address

    Types of information:
      Publicly available information (for example, time at address, electoral rolls, court judgments)
      Previous searches from other financial institutions
      Shared contributed information by several financial institutions (check whether the applicant has a loan elsewhere and how he pays)
      Aggregated information (for example, data at postcode level)
      Fraud warnings (check whether there are prior fraud warnings at the given address)
      Bureau added value (for example, build a generic scorecard for small populations or a new product)

    In the UK: Experian, Equifax, Call Credit

    In the US: Equifax, Experian, TransUnion


    Section 7

    Input Selection for Classification

    Input Selection


    Inputs = features = attributes = characteristics = variables

    Also called attribute selection or feature selection

    If n features are present, 2^n - 1 possible feature sets can be considered

    Heuristic search methods are needed!

    Good feature subsets contain features highly correlated with (predictive of) the class, yet uncorrelated with (not predictive of) each other (Hall and Smith, 1998)

    Can improve the performance and estimation time of the classifier

    Input Selection


    Curse of dimensionality
      The number of training examples required increases exponentially with the number of inputs

    Remove irrelevant and redundant inputs

    Interaction and correlation effects

    Correlation implies redundancy

    Difference between a redundant input and an irrelevant input: an input can be redundant (due to, e.g., a high correlation with another input already in the model) without being irrelevant

    Input selection procedure


    Step 1: Use a filter procedure
      For quick filtering of inputs
      Inputs are selected independently of the classification algorithm

    Step 2: Wrapper/stepwise regression
      Use the p-value of the regression for input selection

    Step 3: AUC-based pruning
      Iterative procedure based on AUC

    Filter Methods for Input Selection


                          Continuous target (e.g. LGD)       Discrete target (e.g. PD)
      Continuous input    Pearson correlation,               Fisher score
                          Spearman rank-order correlation
      Discrete input      Fisher score                       Chi-squared analysis,
                                                             Cramer's V,
                                                             Information Value,
                                                             Gain/Entropy

    Chi-squared Based Filter


    Under the independence assumption,
    P(married and good payer) = P(married) * P(good payer) = 0.6 * 0.8

    The expected number of good payers that are married is then 0.6 * 0.8 * 1000 = 480

      Observed        Good Payer   Bad Payer   Total
      Married         500          100         600
      Not Married     300          100         400
      Total           800          200         1000

    Chi-squared Based Filter


    If both criteria were independent, one should observe the following contingency table:

      Expected        Good Payer   Bad Payer   Total
      Married         480          120         600
      Not Married     320          80          400
      Total           800          200         1000

    Compare observed and expected counts using a chi-squared test statistic

    Chi-squared Based Filter


    The test statistic

    \[
    \chi^2 = \frac{(500-480)^2}{480} + \frac{(100-120)^2}{120} + \frac{(300-320)^2}{320} + \frac{(100-80)^2}{80} = 10.41
    \]

    is chi-squared distributed with k-1 degrees of freedom, with k being the number of classes of the characteristic

    The bigger (lower) the value of chi-squared, the more (less) predictive the attribute is

    Apply as a filter: rank all attributes with respect to their p-value and select the most predictive ones

    Note: Cramer's V,

    \[
    V = \sqrt{\frac{\chi^2}{n}} = \sqrt{\frac{10.41}{1000}} \approx 0.10,
    \]

    is always bounded between 0 and 1; higher values indicate a better predictive input
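    In SAS, the chi-squared statistic and Cramer's V come out of a single cross-tabulation; a minimal sketch on a hypothetical applicants data set:

    proc freq data=applicants;
       tables married*good_bad / chisq;    /* CHISQ prints chi-squared and Cramer's V */
    run;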

    Example: Filters (German credit data set)

    [Figure: filter values for the inputs of the German credit data set (Van Gestel, Baesens, 2008), not reproduced here]

    Fisher Score


    Defined as

    \[
    \text{Fisher score} = \frac{|\bar{x}_G - \bar{x}_B|}{\sqrt{s_G^2 + s_B^2}}
    \]

    where \(\bar{x}_G\) (\(\bar{x}_B\)) denotes the mean of the input for the goods (bads) and \(s_G^2\) (\(s_B^2\)) the corresponding variance

    Higher value indicates a more predictive input

    Can also be used if the target is continuous and the input discrete

    [Figure: Fisher scores for the inputs of the German credit data set (Van Gestel, Baesens, 2008), not reproduced here]
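    A quick numeric illustration with made-up values: if the goods have a mean income of 3000 and the bads 2400, with standard deviations 800 and 600 respectively, then

    \[
    \text{Fisher score} = \frac{|3000 - 2400|}{\sqrt{800^2 + 600^2}} = \frac{600}{1000} = 0.6 .
    \]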

    Wrapper/Stepwise Regression


    Use the p-value to decide upon the importance of the inputs:
      p-value < 0.01: highly significant
      0.01 < p-value < 0.05: significant
      0.05 < p-value < 0.10: weakly significant
      p-value > 0.10: not significant

    Can be used in different ways:
      Forward
      Backward
      Stepwise

    Search Strategies


    Direction of search

    Forward selection

    Backward elimination

    Floating search

    Oscillating search

    Traversing the input space
      Greedy hill-climbing search (best-first search)

    Beam search

    Genetic algorithms

    Example Search Space for Four Inputs


    [Figure: lattice of candidate input subsets, from the empty set to the full set]

    {}
    {I1}  {I2}  {I3}  {I4}
    {I1,I2}  {I1,I3}  {I1,I4}  {I2,I3}  {I2,I4}  {I3,I4}
    {I1,I2,I3}  {I1,I2,I4}  {I1,I3,I4}  {I2,I3,I4}
    {I1,I2,I3,I4}

    Stepwise Logistic Regression


    SELECTION=FORWARD: PROC LOGISTIC first estimates parameters for effects forced into the model. These effects are the intercepts and the first n explanatory effects in the MODEL statement, where n is the number specified by the START= or INCLUDE= option in the MODEL statement (n is zero by default). Next, the procedure computes the score chi-square statistic for each effect not in the model and examines the largest of these statistics. If it is significant at the SLENTRY= level, the corresponding effect is added to the model. Once an effect is entered in the model, it is never removed from the model. The process is repeated until none of the remaining effects meet the specified level for entry or until the STOP= value is reached.


    Stepwise Logistic Regression


    SELECTION=BACKWARD: Parameters for the complete model as specified in the MODEL statement are estimated unless the START= option is specified. In that case, only the parameters for the intercepts and the first n explanatory effects in the MODEL statement are estimated, where n is the number specified by the START= option. Results of the Wald test for individual parameters are examined. The least significant effect that does not meet the SLSTAY= level for staying in the model is removed. Once an effect is removed from the model, it remains excluded. The process is repeated until no other effect in the model meets the specified level for removal or until the STOP= value is reached.

    Stepwise Logistic Regression


    SELECTION=STEPWISE: Similar to the SELECTION=FORWARD option, except that effects already in the model do not necessarily remain. Effects are entered into and removed from the model in such a way that each forward selection step may be followed by one or more backward elimination steps. The stepwise selection process terminates if no further effect can be added to the model or if the effect just entered into the model is the only effect removed in the subsequent backward elimination.

    SELECTION=SCORE: PROC LOGISTIC uses the branch-and-bound algorithm of Furnival and Wilson (1974).

    Stepwise Logistic Regression: Example


    proc logistic data=bart.dmagecr;
       class checking history purpose savings employed marital coapp
             resident property age other housing;
       model good_bad = checking duration history purpose amount savings
                        employed installp marital coapp resident property other
                        / selection=stepwise slentry=0.10 slstay=0.01;
    run;

    Note the asymmetric thresholds: a variable entering at the lenient 10% level (SLENTRY=0.10) must reach the stricter 1% level (SLSTAY=0.01) to remain in the model!

    AUC-Based Pruning


    Start from a model (e.g. logistic regression) with all n inputs

    Remove each input in turn and re-estimate the model

    Drop the input whose removal yields the best (highest) AUC

    Repeat this procedure until the AUC performance decreases significantly

    [Figure: average AUC performance plotted against the number of remaining inputs]


    Section 8

    Reject Inference

    Population


    [Figure: the through-the-door population splits into rejects and accepts; goods and bads are known for the accepts but unknown (?) for the rejects]

    Reject Inference


    Problem:
      Good/bad information is only available for past accepts, not for past rejects
      Reject bias when building classification models
      Need to create models for the through-the-door population
      If the past application policy had been random, reject inference would pose no problem; this is, however, not realistic

    Solution:
      Reject inference: a process whereby the performance of previously rejected applications is analyzed to estimate their behavior

    Reject Inference


    Through-the-door population: 10,000
      Rejects: 2,950
        Inferred bads: 914
        Inferred goods: 2,036
      Accepts: 7,050
        Known bads: 874
        Known goods: 6,176

    Techniques to Perform Reject Inference


    Define as bad:
      Define all rejects as bads
      Reinforces the credit scoring policy and prejudices of the past

    In SAS Enterprise Miner (reject inference node):
      Hard cut-off augmentation
      Parcelling
      Fuzzy augmentation

    Hard Cut-Off Augmentation


    AKA Augmentation 1

    Build a model using known goods and bads

    Score rejects using this model and establish expected bad rates

    Set an expected bad rate level above which an account is deemed bad

    Classify rejects as good or bad based on this level

    Add to known goods/bads

    Remodel

    Parcelling


    Rather than classifying rejects as good or bad, parcelling assigns them in proportion to the expected bad rate at each score band:
      Score the rejects with the good/bad model
      Split the rejects into proportional good and bad groups

      Score band   # Bad   # Good   % Bad   % Good   # Rejects   Rej-Bad   Rej-Good
      0-99         24      10       70.3%   29.7%    342         240       102
      100-199      54      196      21.6%   78.4%    654         141       513
      200-299      43      331      11.5%   88.5%    345         40        305
      300-399      32      510      5.9%    94.1%    471         28        443
      400+         29      1,232    2.3%    97.7%    778         18        760

    Parcelling


    However, the bad proportion among the rejects cannot be the same as among the approved

    Therefore, allocate a higher proportion of bads to the rejects

    Rule of thumb: the bad rate for rejects should be 2-4 times that of the approves (a scripted variant is sketched below)
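    A parcelling pass can be scripted as a banded data step. The sketch below is one illustrative variant: it reuses the band bad rates from the table above, scales them by a factor of 2 per the rule of thumb, and randomly assigns each scored reject (the data set rejects and variable score are hypothetical names):

    data parcelled;
       set rejects;                        /* rejects scored by the accepts model */
       /* accepts' band bad rates, scaled by 2 for the reject population */
       if score < 100 then p_bad = min(1, 2*0.703);   /* capped at 1 */
       else if score < 200 then p_bad = 2*0.216;
       else if score < 300 then p_bad = 2*0.115;
       else if score < 400 then p_bad = 2*0.059;
       else p_bad = 2*0.023;
       length good_bad $4;
       if ranuni(12345) < p_bad then good_bad = 'bad';
       else good_bad = 'good';             /* random allocation within the band */
    run;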

    Nearest Neighbor Methods


    Create two sets of clusters: goods and bads

    Run rejects through both clusters

    Compare Euclidean distances to assign the most likely performance

    Combine accepts and rejects and remodel

    However, rejects may be situated in regions far away from the accepts

    Bureau-Based Reject Inference


    Bureau data: performance of declined applications on similar products with other companies

    Legal issues

    Difficult to implement in practice: timings, definitions, programming

    [Figure: timeline; a declined applicant obtains credit elsewhere (Jan 99) and its performance there is analyzed over the following period (to Dec 00)]

    Additional Techniques to Perform Reject Inference


    Approve all applications

    Three-group approach

    Iterative re-classification

    Mixture distribution modelling

    Reject Inference


    "There is no unique best method of universal applicability, unless extra information is obtained. That is, the best solution is to obtain more information (perhaps by granting loans to some potential rejects) about those applicants who fall in the reject region." (David Hand, 1998)

    Cost of granting credit to rejects versus the benefit of a better scoring model (for example, work by Jonathan Crook)

    Reject Inference


    Banasik et al. were in the exceptional situation of being able to observe the repayment behavior of customers that would normally be rejected. They concluded that the scope for improving scorecard performance by including the rejected applicants in the model development process is present but modest.

    Banasik J., Crook J.N., Thomas L.C., Sample selection bias in credit scoring models, In: Proceedings of the Seventh Conference on Credit Scoring and Credit Control (CSCC VII 2001), Edinburgh, Scotland, 2001.

    Reject Inference


    No method of universal applicability

    Much controversy

    Withdrawal inference

    Competitive market

    Withdrawal Inference


    [Figure: the through-the-door population splits into withdrawn, rejects and accepts; goods and bads are unknown (?) for both the withdrawn and the rejects, and known only for the accepts]


    Section 10

    Behavioral Scoring

    Why Use Behavioral Scoring?


    Debt provisioning

    Setting credit limits (revolving credit)

    Authorizing accounts to go in excess

    Collection strategies

    What preventive actions should be taken if the customer starts going into arrears? (early warning system)

    Marketing applications (for example, customer segmentation and targeted mailings)

    Basel II!

    Behavioral Scoring


    Once behavioral scores have been obtained, they can be used to increase/decrease the credit limits

    But: the score measures the risk of default given the current operating policy and credit limit!

    So, using the score to change the credit limit invalidates the effectiveness of the score!

    Analogy: only those who have no or few accidents when they drive a car at 30 mph in towns should be allowed to drive at 70 mph on the motorways (Thomas, Edelman, and Crook, 2002)
      Other skills may be needed to drive faster!

    Similarly, other characteristics are required to manage accounts with large credit limits compared to accounts with small credit limits

    Typically, divide the behavioral score into bands and give a different credit limit to each band


    Variables Used in Behavioural Scoring

    Define derived variables measuring the financial solvency of the customer


    Customers can have many products and a different role for each product
      For example, primary owner, secondary owner, guarantor; private versus professional products, ...

    Think about a product taxonomy to define customer behavior
      For example, for an average savings amount, look at the different savings products

    The best way to augment the performance of a scorecard is to create better variables!

    Aggregation Functions

    Average (sensitive to outliers)

    Median (insensitive to outliers)


    Minimum/Maximum
      For example, worst status during the last 6 months, highest utilization during the last 12 months, ...

    Absolute trend (over, e.g., a 6-month window): \(x_T - x_{T-6}\)

    Relative trend: \((x_T - x_{T-6}) / x_{T-6}\)

    Most recent value, value 1 month ago, ...

    Ratio variables
      E.g. obligation/debt ratio: calculated by adding the proposed monthly house payment and any regular monthly installment and revolving debt, and dividing by the monthly income
      Others: current balance/credit limit, ...

    (A scripted aggregation example follows below.)
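    Such aggregates are easily derived from a monthly history table; a minimal sketch with hypothetical names (monthly holds one row per customer per month for, say, the last six months):

    proc sort data=monthly;
       by customer_id;
    run;

    proc means data=monthly noprint;
       by customer_id;
       var balance;
       output out=aggr mean=avg_balance median=med_balance
              min=min_balance max=max_balance;
    run;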

    Impact of Using Aggregation Functions


    Impact of missing values

    How to calculate trends?

    Ratio variables might have distributions with fat tails (large positive and negative values)
      Use truncation/winsorisation!

    The data set expands in the number of attributes
      Input selection!

    What if input selection allows you to choose between, for example, the average and the most recent value of a ratio?
      Choose the average value for cyclically dependent ratios
      Choose the most recent value for structurally (non-cyclically) dependent ratios

    Behavioral Scoring


    Implicit assumption: the relationship between the performance characteristics and the subsequent delinquency status of a customer is the same now as 2 or 3 years ago

    Classification approach to behavioral scoring

    Markov chain approach to behavioral scoring

    Classification Approach to Behavioral Scoring


    Performance period versus observation period

    Develop a classification model estimating the probability of default during the observation period

    Length of the performance period and the observation period
      Trade-off
      Typically 6 months to 1 year

    Seasonality effects

    [Figure: timeline showing the performance period running between the observation point and the outcome point]

    How to Choose the Observation Point?

    Seasonality depends upon the observation point


    Develop 12 PD models (one per observation month)
      Less maintainable
      Need to backtest/stress test/monitor all these models!

    Use sampling methods to remove seasonality (see the sketch below)
      Use 3 years of data and sample the observation point randomly for each customer during the second year
      Only 1 model
      Gives a seasonally neutral data set
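    The random observation point can be drawn in a one-line data step; a sketch with hypothetical names:

    data sampled;
       set customers;                      /* one row per customer */
       /* random observation month within the second year of history */
       obs_month = ceil(12 * ranuni(12345));
    run;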