125
Comparing Different Classification Techniques in Credit Scoring Saed Sayad 1 www.ismartsoft.com iSmartsoft

Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

Embed Size (px)

Citation preview

Page 1: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

Comparing Different Classification Techniques in Credit Scoring 

Saed Sayad 

1 www.ismartsoft.com 

iSmartsoft

Page 2: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

Data Mining 

www.ismartsoft.com  2 

Data mining is about explaining the past and predicting the future by 

means of data analysis

Page 3: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

Credit Scoring 

www.ismartsoft.com  3 

(Mester, 1997) 

Credit scoring is a statistical method that is used to predict the probability that a loan applicant or existing borrower will 

default or become delinquent 

(Mester, 1997)

Page 4: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

Data Mining in Credit Scoring 

1  • Problem Definition 

2  • Data Preparation 

3  • Data Exploration 

4  • Modeling 

5  • Evaluation 

6  • Deployment 

www.ismartsoft.com  4

Page 5: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

1. Problem Definition 

www.ismartsoft.com  5 

•  Develop a credit scoring model to predict the credit risk of credit applicants as bad risk (default) and good risk. 

•  The credit issuer intends to deploy the model for the existing borrower.

Page 6: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

Data Mining Team 

Modeler 

Analyst DBA 

www.ismartsoft.com  6 

Domain Expert

Page 7: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

Data Mining Software Vendors Gartner Magic Quadrant 

www.ismartsoft.com  7

Page 8: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

2. Data Preparation 

www.ismartsoft.com  8 

•  Training set –  No of cases: 35,500 –  No of cases with bad risk (target): 2,500 (7%) 

–  Number of variables: 25 

–  Total balance for all cases: $554,000,000 –  Total balance for cases with bad risk: $58,000,000 

•  Test set –  Total number of cases: 8,167 

–  Total number of targets: 560 

–  Total balance for cases with bad risk : $12,281,589

Page 9: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

3. Data Exploration 

Data 

Exploration 

Univariate Analysis 

Frequency, Average, Min, 

Max, ... 

Bar, Line, Pie, ... 

Charts 

Bivariate Analysis 

Correlation 

Z test, Chi2 ,... 

Bivariate Charts 

www.ismartsoft.com  9

Page 10: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

Data Exploration ‐ Bivariate 

10 

Target% 

Months in Business

Page 11: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

4. Modeling 

www.ismartsoft.com  11 

f Age  Default 

Y or N 

Logistic Regression

Page 12: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

5. Evaluation 

Stats  Charts 

K‐S Chart 

ROC Chart 

Gain/Lift Chart 

Variables Contribution 

Mean Square Error 

Confusion Matrix 

www.ismartsoft.com  12

Page 13: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

Result 

•  Total number of cases = 8,167 

•  Total number of targets = 560 

•  Total balance for targets = $12,281,589 •  Top 10% Random 

–  Number of targets = 56 

–  Total balance = $1,230,000 •  Top 10%Model 

–  Number of targets = 305 

–  Total balance = $7,655,772 

www.ismartsoft.com  13 

600% Lift

Page 14: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

6. Deployment 

www.ismartsoft.com  14 

SQL  Batch Scoring 

HTML Web‐ based Scoring

Page 15: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

Do we need to try other modeling techniques? 

If so, how do we compare them? 

www.ismartsoft.com  15

Page 16: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

Model Comparison 

www.ismartsoft.com  16 

Theoretically 

Robust 

Practically 

Measurable Truly 

Scalable 

No black box please! 

Don’t be Sisyphus of the analytics world!  Inputs & Outputs

Page 17: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

True Scalability 

•  Incremental/Decremental Learning: utilizing new data without the necessity of pooling new data with old data and repeating the model training step; 

•  Variable Addition/Deletion:  add or remove variables without re‐training; 

•  Scenario Testing: rapid formulation and testing of multiple and diverse models; 

•  Parallel and Distributed Processing: carrying out parallel and/or distributed processing simultaneously for all datasets or data segments to enable a single model using all of the data to be obtained. 

www.ismartsoft.com  17

Page 18: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

www.ismartsoft.com  18 

Please, fasten your belt!

Page 19: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

Modeling (Classification & Regression) 

Frequency 

Table 

OneR 

Bayesian 

Decision Tree 

Covariance 

Matrix 

Linear 

Regression 

LDA 

(Z Score) 

Logistic 

Regression 

SVM 

Similarity 

Function 

KNN 

Neural 

Networks 

Perceptron 

Back 

Propagation 

RBF 

www.ismartsoft.com  19 Scalable Methods

Page 20: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

Models based on 

Frequency Tables 

www.ismartsoft.com  20

Page 21: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

ZeroR Classification 

www.ismartsoft.com  21 

Outlook  Temp.  Humidity  Windy  Play Golf Rainy  Hot  High  False  No 

Rainy  Hot  High  True  No 

Overcast  Hot  High  False  Yes 

Sunny  Mild  High  False  Yes 

Sunny  Cool  Normal  False  Yes 

Sunny  Cool  Normal  True  No 

Overcast  Cool  Normal  True  Yes 

Rainy  Mild  High  False  No 

Rainy  Cool  Normal  False  Yes 

Sunny  Mild  Normal  False  Yes 

Rainy  Mild  Normal  True  Yes 

Overcast  Mild  High  True  Yes 

Overcast  Hot  Normal  False  Yes 

Sunny  Mild  High  True  No 

Predictors  Target

Page 22: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

Classification ‐ ZeroR 

www.ismartsoft.com  22 

Yes 

Sort 

5 / 14 = 0.36 

9 / 14 = 0.64 

Play Golf No 

No 

Yes 

Yes 

Yes 

No 

Yes 

No 

Yes 

Yes 

Yes 

Yes 

Yes 

No 

Play Golf No 

No 

No 

No 

No 

Yes 

Yes 

Yes 

Yes 

Yes 

Yes 

Yes 

Yes 

Yes

Page 23: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

Classification ‐ ZeroR 

www.ismartsoft.com  23 

No 

2493/35500 = 0.07 

Default 

33007/35500 = 0.93

Page 24: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

www.ismartsoft.com 

Outlook  Temp  Humidity  Windy  Play Golf Rainy  Hot  High  False  No 

Rainy  Hot  High  True  No 

Overcast  Hot  High  False  Yes 

Sunny  Mild  High  False  Yes 

Sunny  Cool  Normal  False  Yes 

Sunny  Cool  Normal  True  No 

Overcast  Cool  Normal  True  Yes 

Rainy  Mild  High  False  No 

Rainy  Cool  Normal  False  Yes 

Sunny  Mild  Normal  False  Yes 

Rainy  Mild  Normal  True  Yes 

Overcast  Mild  High  True  Yes 

Overcast  Hot  Normal  False  Yes 

Sunny  Mild  High  True  No 

Classification ‐ OneR Which one is the best predictor ? 

24

Page 25: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

OneR ‐ Frequency Tables 

Play Golf 

Yes  No 

Outlook 

Sunny  3  2 

Overcast  4  0 

Rainy  2  3 

www.ismartsoft.com  25 

Play Golf 

Yes  No 

Humidity High  3  4 

Normal  6  1 

Play Golf 

Yes  No 

Temp. 

Hot  2  2 

Mild  4  2 

Cool  3  1 

Play Golf 

Yes  No 

Windy False  6  2 

True  3  3

Page 26: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

OneR – The Best Rule 

www.ismartsoft.com  26 

MAXDELQ: < 2  ‐> N >= 2  ‐> Y ?  ‐> N 

94%

Page 27: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

Can we incorporate all predictors? 

www.ismartsoft.com  27 

Bayesian Classifier

Page 28: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

( ) ( ) ( ) ( ) X | X X | P 

T P T P T P = 

Bayes’ Rule 

www.ismartsoft.com  28 

Predictor Prior Probability 

Likelihood  Target Prior Probability 

Posterior Probability

Page 29: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

Likelihood Tables 

Play Golf 

Yes  No 

Outlook 

Sunny  3/9  2/5 

Overcast  4/9  0/5 

Rainy  2/9  3/5 

www.ismartsoft.com  29 

Play Golf 

Yes  No 

Humidity High  3/9  4/5 

Normal  6/9  1/5 

Play Golf 

Yes  No 

Temp. 

Hot  2/9  2/5 

Mild  4/9  2/5 

Cool  3/9  1/5 

Play Golf 

Yes  No 

Windy False  6/9  2/5 

True  3/9  3/5 

33 . 0 9 / 3 ) | ( = = = =  Yes PlayGolf Sunny Outlook P

Page 30: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

Predictor Probability 

Play Golf 

Yes  No 

Outlook 

Sunny  3/9  2/5  5/14 

Overcast  4/9  0/5  4/14 

Rainy  2/9  3/5  5/14 

www.ismartsoft.com  30 

36 . 0 14 / 5 ) ( = = = Sunny Outlook P

Page 31: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

Target Probability 

Play Golf 

Yes  No 

Outlook 

Sunny  3/9  2/5  5/14 

Overcast  4/9  0/5  4/14 

Rainy  2/9  3/5  5/14 

9/14  5/14 

www.ismartsoft.com  31 

64 . 0 14 / 9 ) ( = = = Yes PlayGolf P

Page 32: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

Posterior Probability 

www.ismartsoft.com  32 

33 . 0 9 / 3 ) | ( = = = =  Yes PlayGolf Sunny Outlook P 

64 . 0 14 / 9 ) ( = = = Yes PlayGolf P 

36 . 0 14 / 5 ) ( = = =  Sunny Outlook P 

60 . 0 36 . 0 64 . 0 33 . 0 ) | ( = ÷ × = = =  Sunny Outlook Yes PlayGolf P

( ) ( ) ( ) ( ) X | X X | P 

T P T P T P =

Page 33: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

Bayesian ‐ Prediction 

Posterior Probability 

Play Golf 

Yes  No 

Outlook 

Sunny  0.60  0.40 

www.ismartsoft.com  33 

Posterior Probability 

Play Golf 

Yes  No 

Humidity High  0.43  0.57 

Posterior Probability 

Play Golf 

Yes  No 

Temp.  Mild  0.67  0.33 

Posterior Probability 

Play Golf 

Yes  No 

Windy True  0.50  0.50 

P(PlayGolf = Yes) = 0.60 x 0.43 x 0.67 x 0.50 = 0.08643 

P(PlayGolf = No) = 0.40 x 0.57 x 0.33 x 0.50 = 0.03762 

Case = [Outlook=Sunny;  Humidity=High;  Temp=Mild;  Windy=True]

Page 34: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

Naïve Bayesian– Multiple Predictors 

www.ismartsoft.com  34 

Strong Independence assumption about predictors 

) ( ) | ( ) | ( ) | ( X) | (  2 1  i i n i i i  T P T x P T x P T x P T P × × × × = L

Page 35: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

Can we incorporate all predictors Can we incorporate all predictors + Strong dependency? 

www.ismartsoft.com  35 

Decision Trees

Page 36: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

Decision Tree 

www.ismartsoft.com  36 

Outlook 

Sunny 

Windy 

FALSE 

Play 

TRUE 

Not Play 

Overcast 

Play 

Rainy 

Humidity 

High 

Not Play 

Normal 

Play

Page 37: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

Entropy 

www.ismartsoft.com  37 

Entropy  = ‐p log 2 p – q log 2 q 

Entropy = ‐0.5 log 2 0.5 – 0.5 log 2 0.5 = 1

Page 38: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

Entropy – Frequency 

www.ismartsoft.com  38 

Entropy (5,3,2) = Entropy (0.5,0.3,0.2) 

= ‐ (0.5 * log 2 0.5) ‐ (0.3 * log 2 0.3) ‐ (0.2 * log 2 0.2) 

= 1.49 

∑ =

− = c 

i i i  p p S E 

1 2 log ) (

Page 39: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

Entropy ‐ Target 

www.ismartsoft.com  39 

Play Golf No 

No 

Yes 

Yes 

Yes 

No 

Yes 

No 

Yes 

Yes 

Yes 

Yes 

Yes 

No 

Play Golf No 

No 

No 

No 

No 

Yes 

Yes 

Yes 

Yes 

Yes 

Yes 

Yes 

Yes 

Yes 

Sort 

5 / 14 = 0.36 

9 / 14 = 0.64 

Entropy(PlayGolf) =  Entropy (5,9) 

= Entropy (0.36, 0.64) 

= ‐ (0.36 log 2 0.36) ‐ (0.64 log 2 0.64) 

= 0.94

Page 40: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

Frequency Tables 

Play Golf 

Yes  No 

Outlook 

Sunny  3  2 

Overcast  4  0 

Rainy  2  3 

www.ismartsoft.com  40 

Play Golf 

Yes  No 

Humidity High  3  4 

Normal  6  1 

Play Golf 

Yes  No 

Temp. 

Hot  2  2 

Mild  4  2 

Cool  3  1 

Play Golf 

Yes  No 

Windy False  6  2 

True  3  3

Page 41: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

Entropy – Frequency Table Play Golf 

Yes  No 

Outlook 

Sunny  3  2  5 

Overcast  4  0  4 

Rainy  2  3  5 

14 

www.ismartsoft.com  41 

∑ ∈

= X v 

v E v P X T E  ) ( ) ( ) , ( 

E(PlayGolf, Outlook) = P(Sunny)*E(3,2) + P(Overcast)*E(4,0) + P(Rainy)*E(2,3) 

= (5/14)*0.971 + (4/14)*0.0 + (5/14)*0.971 

= 0.693

∑ =

− = c 

i i i  p p v E 

1 2 log ) (

Page 42: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

Information Gain 

www.ismartsoft.com  42 

) , ( ) ( ) , (  X T Entropy T Entropy X T Gain − =

G(PlayGolf, Outlook) = E(PlayGolf) – E(PlayGolf, Outlook) 

= 0.940 – 0.693 = 0.247

Page 43: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

Information Gain – the best predictor? 

Play Golf 

Yes  No 

Outlook 

Sunny  3  2 

Overcast  4  0 

Rainy  2  3 

Gain = 0.247 

www.ismartsoft.com  43 

Play Golf 

Yes  No 

Humidity High  3  4 

Normal  6  1 

Gain = 0.152 

Play Golf 

Yes  No 

Temp. 

Hot  2  2 

Mild  4  2 

Cool  3  1 

Gain = 0.029 

Play Golf 

Yes  No 

Windy False  6  2 

True  3  3 

Gain = 0.048

Page 44: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

Decision Tree – Root Node 

Outlook 

Sunny  Overcast  Rainy 

www.ismartsoft.com  44

Page 45: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

Dataset – Divide and Conqure 

www.ismartsoft.com  45 

Outlook  Temp.  Humidity  Windy  Play Golf 

Sunny  Mild  High  FALSE  Yes Sunny  Cool  Normal  FALSE  Yes Sunny  Cool  Normal  TRUE  No Sunny  Mild  Normal  FALSE  Yes 

Sunny  Mild  High  TRUE  No 

Rainy  Hot  High  FALSE  No 

Rainy  Hot  High  TRUE  No Rainy  Mild  High  FALSE  No 

Rainy  Cool  Normal  FALSE  Yes Rainy  Mild  Normal  TRUE  Yes Overcast  Hot  High  FALSE  Yes Overcast  Cool  Normal  TRUE  Yes Overcast  Mild  High  TRUE  Yes Overcast  Hot  Normal  FALSE  Yes

Page 46: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

Subset (Outlook = Sunny) 

www.ismartsoft.com  46 

Temp.  Humidity  Windy  Play Golf 

Mild  High  FALSE  Yes Cool  Normal  FALSE  Yes Mild  Normal  FALSE  Yes Cool  Normal  TRUE  No 

Mild  High  TRUE  No 

Outlook 

Sunny 

Windy 

FALSE 

Play 

TRUE 

Not Play Not Play 

Overcast 

Play Play 

Rainy

Page 47: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

Final Decision Tree 

www.ismartsoft.com  47 

Outlook 

Sunny 

Windy 

FALSE 

Play 

TRUE 

Not Play 

Overcast 

Play 

Rainy 

Humidity 

High 

Not Play 

Normal 

Play

Page 48: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

Numeric Variables and Missing Values 

www.ismartsoft.com  48 

Outlook  Temp  Humidity  Windy  Play Golf Rainy  85  High  False  No 

Rainy  80  High  True  No 

Overcast  83  High  False  Yes 

Sunny  70  High  False  Yes 

Sunny  68  ?  False  Yes 

Sunny  65  Normal  True  No 

Overcast  64  Normal  True  Yes 

Rainy  72  High  ?  No 

Rainy  69  Normal  False  Yes 

Sunny  75  Normal  False  Yes 

Rainy  75  Normal  True  Yes 

?  72  High  True  Yes 

Overcast  81  Normal  False  Yes 

Sunny  71  High  True  No

Page 49: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

Numeric Variables ‐ Binning 

www.ismartsoft.com  49 

Temp  B_Temp  Play Golf 85  80­90  No 

80  80­90  No 

83  80­90  Yes 

70  70­80  Yes 

68  60­70  Yes 

65  60­70  No 

64  60­70  Yes 

72  70­80  No 

69  60­70  Yes 

75  70­80  Yes 

75  70­80  Yes 

72  70­80  Yes 

81  80­90  Yes 

71  70­80  No 

Play Golf 

Yes  No 

B_Temp 

60‐70  3  1 

70‐80  4  2 

80‐90  2  2

Page 50: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

Models based on Frequency Table 

www.ismartsoft.com  50 

Robust  Measurable  Scalable 

OneR 

Naïve Bayesian 

Decision Tree

Page 51: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

Models based on 

Covariance Matrix 

www.ismartsoft.com  51

Page 52: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

Y X 1  X 3 

52 www.ismartsoft.com 

X 2 

Models based on Covariance Matrix

Page 53: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

Covariance 

n x ∑ = µ 

Covariance matrix 

Mean 

53 www.ismartsoft.com 

n y x 

C  y x ∑ − − = 

) )( ( µ µ

Page 54: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

www.ismartsoft.com  54 

Multiple Linear Regression MLR 

Find β  by minimizing ε

Page 55: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

MLR : Ordinary Least Square 

1 ( ' ) ' X X X Y β − = Predicted Values: 

Residuals: 

Intercept and Slopes: 

X Y β = ′ 

Y Y ′ − 55 www.ismartsoft.com 

) ( ) , ( 

x Var y x Cov b =

Page 56: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

Regression Statistics 

∑ − =  2 ) (  Y Y SST

∑ ′ − ′ =  2 ) (  Y Y SSR

∑ ′ − =  2 ) (  Y Y SSE 

56 www.ismartsoft.com

Page 57: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

Regression Statistics How good is our model? 

SST SSR R = 2 

Coefficient of Determination to judge the adequacy of the regression model 

57 www.ismartsoft.com

Page 58: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

ANOVA 

df  SS  MS  F  P‐value 

Regression  k  SSR  SSR / df  MSR / MSE  P(F) 

Residual  n‐k‐1  SSE  SSE / df 

Total  n‐1  SST 

If P(F)<a then we know that we get significantly better prediction of Y from the regression model than by just predicting mean of Y. 

0 : 0 ... :  2 1 0

≠ = = = = 

i A 

H H

β β β β 

at least one! 

58 www.ismartsoft.com

Page 59: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

Hypotheses Tests for Regression Coefficients 

ii e 

i i 

i e 

i k n 

C S b 

b S b t 

2 1 

) 1 (  ) ( β β −

= −

= − − 

0 : 0 : 0

≠ = 

i A 

H H

β β 

59 www.ismartsoft.com

Page 60: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

Multicolinearity 

§  If the F‐test for significance of regression is significant, but tests on the individual regression coefficients are not, multicolinearity may be present. 

§  Variance Inflation Factors (VIFs) are very useful measures of multicolinearity.  If any VIF exceed 5, multicolinearity is a problem. 

ii i 

i  C R 

VIF = −

=  2 1 1 ) (β 

60 www.ismartsoft.com

Page 61: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

Linear Regression – Binary Target 

•  If actual Y is a binary variable the predicted Y can be less than zero or greater than 1. 

•  If actual Y is a binary variable error is not normally distributed. 

1 i o i i Y X β β ε = + + 

61 www.ismartsoft.com

Page 62: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

Linear Regression Model 

0

1 Y 

62 www.ismartsoft.com

Page 63: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

Logistic Regression 

63 www.ismartsoft.com 

) (  1 0 1 1 

X e p β β + − +

=

Page 64: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

Frequency Table 

www.ismartsoft.com  64 

Months in Business  Count Target Count 

Target Probability 

<50  4  0  0 

50‐100  12  1  0.083 

100‐150  4  1  0.25 

150‐200  4  2  0.5 

200‐250  4  3  0.75 

250‐300  1  1  1 

>300  4  4  1

Page 65: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

Frequency Plot 

www.ismartsoft.com 65 

0.2 

0.4 

0.6 

0.8 

1  2  3  4  5  6  7 

Months in Business ‐ Bins 

Target Probability

Page 66: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

Logistic Function 

www.ismartsoft.com  66 

z e z f − +

= 1 1 ) (

Page 67: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

Logistic Regression 

§  The logistic distribution constrains the estimated probabilities to lie between 0 and 1. 

§  Maximum Likelihood Estimation is a statistical method for estimating the coefficients of a model. 

67 www.ismartsoft.com 

) (  1 0 1 1 

X e p β β + − +

=

Page 68: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

Logistic Regression Model 

0

1

Linear Model

Logistic Model 

68 www.ismartsoft.com

Page 69: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

Log Likelihood (LL) 

www.ismartsoft.com  69 

•  Likelihood is the probability that the dependent variable may be predicted from the independent variables. 

•  LL is calculated through iteration, using maximum likelihood estimation (MLE). 

•  Log likelihood is the basis for tests of a logistic model.

Page 70: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

Wald Test 

•  A Wald test is used to test the statistical significance of each coefficient (β) in the model. 

•  A Wald test calculates a Z statistic, which is: 

•  This Z value is then squared, yielding a Wald statistic with a chi‐square distribution. 

www.ismartsoft.com  70 

SE Z β ̂

=

Page 71: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

Linear Discriminant Analysis ‐ LDA 

$0 

$50,000 

$100,000 

$150,000 

$200,000 

$250,000 

0  10  20  30  40  50  60  70 

Non‐Default 

Default 

Age 

Loan$ 

2 2 1 1  x x Z β β + = 

71 www.ismartsoft.com

Page 72: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

LDA – Training 

p p x x x Z β β β + + + =  ... 2 2 1 1

β β µ β µ β β 

C S  T 

T T 2 1 ) ( −

=  Score function 

groups within Z of Variance 2 1 ) (  Z Z S

− = β 

72 www.ismartsoft.com

Page 73: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

LDA – Training 

n x ∑ = µ 

Covariance matrix 

Mean 

73 www.ismartsoft.com 

n y x 

C  y x ∑ − − = 

) )( ( µ µ

Page 74: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

LDA – Training 

) ( 1 2 2 1 1 

2 1 

C n C n n n 

C + +

) (  2 1 1 µ µ β − = − C 

: , : : 

2 1 µ µ

β C 

Linear model coefficients 

Pooled covariance matrix 

Mean vectors 

74 www.ismartsoft.com

Page 75: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

LDA – Model Assessment 

) (  2 1 2 µ µ β − = ∆  T 

: ∆  Mahalanobis distance between two groups 

75 www.ismartsoft.com

Page 76: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

LDA – Prediction 

+

= 2 

2 1 0

µ µ β T Z 

1 1 µ β T Z = 

2 2 µ β T Z = 

76 www.ismartsoft.com

Page 77: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

Can we handle non‐linearity? 

www.ismartsoft.com  77 

Support Vector Machines

Page 78: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

Support Vector Machines‐ SVM 

$0 

$50,000 

$100,000 

$150,000 

$200,000 

$250,000 

0  10  20  30  40  50  60  70 

Non‐Default 

Default 

Support Vectors 

Age 

Loan$ 

78 www.ismartsoft.com

Page 79: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

SVM – Dual Form 

www.ismartsoft.com  79

Page 80: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

SVM – Dual Form 

www.ismartsoft.com  80 

∑ =

+ = n 

j j j  b x w x f 

) (

∑ =

+ = m

i i i  b x x k x f 

1 ) ( ) ( α

Page 81: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

SVM – Kernel functions 

www.ismartsoft.com  81 

d j i j i k  ) x . x ( ) x , x ( =

− − =  2 

x x exp ) x , x (

σ j i 

j i k 

Polynomial 

Gaussian Radial Basis function

Page 82: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

Linear SVM 

www.ismartsoft.com  82 

Linear Proximal SVM Algorithm defined by Fung and Mangasarian: Given m data points in R n represented by them x n matrix A and a diagonal D of +1 and ‐1 labels denoting the class of each row of A, we generate the linear classifier as follows: 

where e is an m x 1 vectors of ones. 

Typically v is chosen by means of a tuning (validating) set.

Page 83: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

Models based on Covariance Matrix 

www.ismartsoft.com  83 

Robust  Measurable  Scalable 

MLR 

LDA 

Logistic Reg. 

SVM

Page 84: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

Models based on Similarity 

www.ismartsoft.com  84

Page 85: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

KNN ‐ Definition 

KNN is a simple algorithm that stores all available cases and classifies new cases based on a similarity measure. 

85 www.ismartsoft.com

Page 86: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

KNN – different names 

K‐Nearest Neighbors • K‐Nearest Neighbors • Memory‐Based Reasoning • Example‐Based Reasoning •  Instance‐Based Learning • Case‐Based Reasoning •  Lazy Learning 

86 www.ismartsoft.com

Page 87: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

KNN Classification 

$0 

$50,000 

$100,000 

$150,000 

$200,000 

$250,000 

0  10  20  30  40  50  60  70 

Non‐Default 

Default 

Age 

Loan$ 

87 www.ismartsoft.com

Page 88: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

KNN Classification – Distance Age  Loan  Default  Distance 25  $40,000  N  102000 35  $60,000  N  82000 45  $80,000  N  62000 20  $20,000  N  122000 35  $120,000  N  22000 52  $18,000  N  124000 23  $95,000  Y  47000 40  $62,000  Y  80000 60  $100,000  Y  42000 48  $220,000  Y  78000 33  $150,000  Y  8000 

48  $142,000  ? 

2 2 1 

2 2 1  ) ( ) (  y y x x D − + − = Euclidean Distance 

88 www.ismartsoft.com

Page 89: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

KNN Classification – Standardized Distance Age  Loan  Default  Distance 

0.125  0.11  N  0.7652 0.375  0.21  N  0.5200 0.625  0.31  N  0.3160 

0  0.01  N  0.9245 0.375  0.50  N  0.3428 

0.8  0.00  N  0.6220 0.075  0.38  Y  0.6669 

0.5  0.22  Y  0.4437 1  0.41  Y  0.3650 

0.7  1.00  Y  0.3861 0.325  0.65  Y  0.3771 

0.7  0.61  ? 

Min Max Min X X s −

− = 

89 www.ismartsoft.com

Page 90: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

KNN – Number of Neighbors 

•  If K=1, select the nearest neighbor •  If K>1, – For classification select the most frequent neighbor. 

– For regression calculate the average of K neighbors. 

– What is the optimal K? 

90 www.ismartsoft.com

Page 91: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

Models based on Similarity 

www.ismartsoft.com  91 

Robust  Measurable  Scalable 

KNN

Page 92: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

Neural Networks 

www.ismartsoft.com  92

Page 93: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

Biological Neuron and Integrated Circuit 

93 www.ismartsoft.com

Page 94: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

Biological Neuron Synapse 

www.ismartsoft.com  94

Page 95: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

95 www.ismartsoft.com 

Neural Network ‐ Neuron 

∑  f 

ni w 

i w 1 

ji w

∑ = j 

j ji i  x w I 

) (  i i  I f y =

(1)  Summation 

(2) Transfer 

i y

Page 96: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

Transfer Functions 

www.ismartsoft.com  96

Page 97: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

www.ismartsoft.com  97 

Weight Adjustment or Error Propagation 

i i  X Y D W  ) ( − = ∆ η

∑  f 

ni w 

i w 1 

ji w ε

Page 98: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

Neural Networks 

www.ismartsoft.com  98 

Robust  Measurable  Scalable 

Neural Networks

Page 99: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

Model Comparison 

www.ismartsoft.com  99 

Theoretically 

Robust 

Practically 

Measurable Truly 

Scalable 

Simple 

Practical Scalable

Page 100: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

Measurable (Model Evaluation) 

www.ismartsoft.com  100

Page 101: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

Model Evaluation 

Evaluation Evaluation 

Classification Classification 

Confusion Matrix 

Gain, Lift, ... Charts 

Regression Regression 

Mean Squared Error 

Residuals Chart 

www.ismartsoft.com  101

Page 102: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

Classification ‐ Confusion Matrix 

www.ismartsoft.com  102 

True 

Positive 

False 

Positive 

False 

Negative 

True 

Negative 

CM 

Positive Cases  Negative Cases 

Pred

icted 

Positiv

e Pred

icted 

Negative

Page 103: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

Classification Matrix 

www.ismartsoft.com  103 

Target 

Y  N 

Model 

Y  TP  FP Positive 

Predictive Value TP/(TP+FP) 

N  FN  TN Negative 

Predictive Value TN/(TN+FN) 

Sensitivity TP/(TP+FN) 

Specificity TN/(TN+FP) 

Accuracy 

FN FP TN TP TN TP

+ + + +

Page 104: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

Classification Matrix ‐ OneR 

www.ismartsoft.com  104 

OneR Target 

Y  N 

Model 

Y  129  431 Positive 

Predictive Value 0.23 

N  79  7528 Negative 

Predictive Value 0.99 

Sensitivity 0.62 

Specificity 0.95 

Accuracy 0.94

Page 105: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

Classification Matrix – Logistic Regression 

www.ismartsoft.com  105 

LogReg Target 

Y  N 

Model 

Y  135  425 Positive 

Predictive Value 0.24 

N  50  7557 Negative 

Predictive Value 0.99 

Sensitivity 0.73 

Specificity 0.95 

Accuracy 0.94

Page 106: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

Classification – Gain Chart 

www.ismartsoft.com  106 

Population% 

50% 

100% 

100% 0% 

Target% 

Wizard 

Random 

Model

Page 107: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

Gain Chart 

www.ismartsoft.com  107 

Population% 

50% 10% 

100% 

100% 

10% 

Target% 

Random 

Wizard 

50% 

A

Page 108: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

Classification – Gain Chart 

www.ismartsoft.com  108 

Population% 

10%    20%     30%     40%    50% 

100% 

100% 

36% 

Target% 

54% 

66% 

76% 85% 

A B Index Gini = 

A

Page 109: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

Gain Chart 

www.ismartsoft.com  109 

Target  Score 

1  880 

1  724 

1  676 

1  556 

0  480 

0  368 

0  345 

0  235 

0  195 

...  ... 

Sorted by Score 

Population%  Target% 

10  36 

20  54 

30  66 

40  76 

50  85 

60  90 

70  94 

80  98 

90  100 

100  100 

Gain Table

Page 110: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

Lift Chart 

www.ismartsoft.com  110 

Population%  Lift 

10  3.6 

20  2.7 

30  2.2 

40  1.9 

50  1.7 

60  1.5 

70  1.3 

80  1.2 

90  1.1 

100  1 

Lift 

Population%  Target% 

10  36 

20  54 

30  66 

40  76 

50  85 

60  90 

70  94 

80  98 

90  100 

100  100 

Gain 

36/10

Page 111: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

Lift Chart 

www.ismartsoft.com  111 

Lift 

Population% 

0.5 

1.5 

2.5 

3.5 

10  20  30  40  50  60  70  80  90  100

Page 112: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

K‐S Chart 

www.ismartsoft.com  112 

0% 

10% 

20% 

30% 

40% 

50% 

60% 

70% 

80% 

90% 

100% 

0 ‐ 100  100 ‐ 200 

200 ‐ 300 

300 ‐ 400 

400 ‐ 500 

500 ‐ 600 

600 ‐ 700 

700 ‐ 800 

800 ‐ 900 

900 ‐ 1000 

Target 

Non‐Target 

Score 

Count%

Page 113: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

K‐S Chart (Kolmogorov‐Smirnov) 

www.ismartsoft.com  113 

Score Range  Count  Cumulative Count 

Lower  Upper  Target  Non‐Target  Target  Non‐Target  K‐S 

0  100  3  62  0.5%  0.8%  0.3% 

100  200  0  23  0.5%  1.1%  0.6% 

200  300  1  66  0.7%  2.0%  1.3% 

300  400  7  434  2.0%  7.7%  5.7% 

400  500  181  5627  34.3%  81.7%  47.4% 

500  600  112  886  54.3%  93.3%  39.0% 

600  700  83  332  69.1%  97.7%  28.6% 

700  800  45  63  77.1%  98.5%  21.4% 

800  900  29  37  82.3%  99.0%  16.7% 

900  1000  99  77  100.0%  100.0%  0.0% 

K‐S 

K(0.95) = 6.0% K(0.99) = 7.1%

Page 114: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

ROC Chart 

www.ismartsoft.com  114 

0.1 

0.2 

0.3 

0.4 

0.5 

0.6 

0.7 

0.8 

0.9 

0  0.1  0.2  0.3  0.4  0.5  0.6  0.7  0.8  0.9  1 

Random 

Model 

1‐Specificity 

Sensitivity

Page 115: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

Area Under ROC Curve 

www.ismartsoft.com  115

Page 116: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

Model Comparison 

www.ismartsoft.com  116

Page 117: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

Model Comparison 

www.ismartsoft.com  117 

Model  Accuracy  PPV  Sensitivity  ROC Area 

ZeroR  0.93  0.0  0.0  0.50 

OneR  0.94  0.23  0.62  0.61 

Bayesian  0.93  0.36  0.44  0.79 

Decision Tree  0.94  0.20  0.70  0.73 

Logistic Regression  0.94  0.24  0.73  0.85 

LDA  0.93  0.24  0.71  0.83 

SVM (Lin)  0.94  0.07  0.83  0.54 

KNN (5NN)  0.78  0.27  0.09  0.59 

Neural Network (RBF)  0.94  0.25  0.62  0.78

Page 118: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

Evaluation – Variables Contribution 

www.ismartsoft.com  118 

0.0%  5.0%  10.0%  15.0%  20.0%  25.0%  30.0%  35.0% 

Total Balance 

Number of Line of Credit 

Months in Business 

Business Type 

Total Number of Deliquent Days 

Total Number of Deliquency 

Credit Score 

Number of Delinquent Days 

Maximum Line Usage 

Maximum Number of Delinquency 

Variables Contribution ‐ Top 10

Page 119: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

Data Exploration ‐ Bivariate 

119 

Target% 

Months in Business

Page 120: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

Evaluation – Gain Chart 

www.ismartsoft.com  120 

Population% 

50% 10% 

100% 

100% 

58% 

10% 

Default%

Page 121: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

K‐S Chart 

www.ismartsoft.com  121 

0% 

10% 

20% 

30% 

40% 

50% 

60% 

70% 

80% 

90% 

100% 

0 ‐ 100  100 ‐ 200 

200 ‐ 300 

300 ‐ 400 

400 ‐ 500 

500 ‐ 600 

600 ‐ 700 

700 ‐ 800 

800 ‐ 900 

900 ‐ 1000 

Target 

Non‐Target 

Score 

Count%

Page 122: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

Final Result 

•  Total number of cases = 8,167 

•  Total number of targets = 560 

•  Total balance for targets = $12,281,589 •  Top 10% Random 

–  Number of targets = 56 

–  Total balance = $1,230,000 •  Top 10%Model 

–  Number of targets = 305 

–  Total balance = $7,655,772 

www.ismartsoft.com  122 

600% Lift

Page 123: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

The Amount of Balance? 

www.ismartsoft.com  123 

Loss Given Default?

Page 124: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

www.ismartsoft.com  124 

[email protected]

Page 125: Techniques in Credit Scoring · PDF fileTechniques in Credit Scoring ... the credit risk of credit applicants as bad ... – No of cases: 35,500

Prepayment Modeling (Refinance, Move, and Default) 

www.ismartsoft.com  125 

Market 

• Market rates • House prices • Product choice 

Demographics 

• Age, life stage • Income, assets 

Property Data 

Geography 

Loan Data 

• Lien history • Property valuation • Loan purpose 

• Refinance costs • Appreciation • Affordability • Employment 

• Payment history • Equity • Loan characteristics 

Specifically no credit information due to FCRA