A case study on using generalized additive models to fit credit rating scores
Marlene Müller, [email protected]
This version: July 8, 2009, 14:32
Contents

- Application: Credit Rating
- Aim of this Talk
- Case Study: German Credit Data, Australian Credit Data, French Credit Data, UC2005 Credit Data
- Simulation Study
- Conclusions
- Appendix: Further Plots (Australian Credit Data, French Credit Data, UC2005 Credit Data)
Application: Credit Rating

- Basel II: capital requirements of a bank are adapted to the individual credit portfolio
- core terms: determine a rating score and subsequently default probabilities (PDs) as a function of some explanatory variables
- further terms: loss given default, portfolio dependence structure
- in practice: often classical logit/probit-type models to estimate linear predictors (scores) and probabilities (PDs)
- statistically: a 2-group classification problem

risk management issues

- credit risk is only one part of a bank's total risk; it will be aggregated with other risks
- credit risk estimation from historical data: stress-tests to simulate future extreme situations; need to easily adapt the rating system to possible future changes; possible need to extrapolate to segments without observations
(Simplified) Development of Rating Score and Default Probability

- raw data: X_j, measurements of several variables ("risk factors")
- (nonlinear) transformation: X_j → X̃_j = m_j(X_j); handles outliers and allows for a nonlinear dependence on the raw risk factors
- rating score: S = w_1 X̃_1 + ... + w_d X̃_d
- default probability: PD = P(Y = 1 | X) = G(w_1 X̃_1 + ... + w_d X̃_d), where G is e.g. the logistic or Gaussian cdf (logit or probit)
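The score and PD formulas above can be sketched in a few lines. The following Python snippet is only an illustration (the weights and transformed risk factors are made-up values, not estimates from any of the datasets in this talk):

```python
import math

def rating_score(x_tilde, w):
    """Linear rating score S = w1*X~1 + ... + wd*X~d."""
    return sum(wj * xj for wj, xj in zip(w, x_tilde))

def default_probability(x_tilde, w, link="logit"):
    """PD = P(Y = 1 | X) = G(S), with G the logistic cdf (logit)
    or the standard Gaussian cdf (probit)."""
    s = rating_score(x_tilde, w)
    if link == "logit":
        return 1.0 / (1.0 + math.exp(-s))            # logistic cdf
    return 0.5 * (1.0 + math.erf(s / math.sqrt(2)))  # Gaussian cdf via erf

# made-up transformed risk factors X~_j and weights w_j
x_tilde = [0.4, -1.2, 0.7]
w = [0.8, 0.5, -0.3]
pd = default_probability(x_tilde, w, link="logit")
```

A score of 0 maps to a PD of 0.5 under both links, and larger scores map monotonically to larger PDs.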
Aim of this Talk

case study on (cross-sectional) rating data

- compare different approaches to generalized additive models (GAM)
- consider models that allow for additional categorical variables: partial linear terms (combination of GAM/GPLM)
- generalized additive models allow for a simultaneous fit of the transformations from the raw data, the linear rating score and the default probabilities
Outline of the Study

- credit data case study: 4 credit datasets

                                            regressors
  dataset             sample   defaults   continuous   discrete   categorical
  German Credit         1000     30.00%            3          –            17
  Australian Credit      678     55.90%            3          1             8
  French Credit         8178      5.86%            5          3            15
  UC2005 Credit         5058     23.92%           12          3            21

  - differences between the different approaches?
  - improvement of default predictions?

- simulation study: comparison of additive model (AM) and GAM fits

  - differences between the different approaches?
  - what if regressors are concurve (concurvity: the nonlinear analogue of multicollinearity)?
  - do sample size and default rate matter?
Generalized Additive Model

- logit/probit are special cases of the generalized linear model (GLM)

  E(Y | X) = G(X^T β)

- "classic" generalized additive model

  E(Y | X) = G{ c + Σ_{j=1}^p m_j(X_j) },   m_j nonparametric

- generalized additive partial linear model (semiparametric GAM)

  E(Y | X1, X2) = G{ c + X1^T β + Σ_{j=1}^p m_j(X_{2j}) },   m_j nonparametric, X1^T β the linear part

- allows for known transformation functions
- allows to add / control for categorical regressors
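The classic additive model above is traditionally fitted by backfitting (combined with local scoring in the binary-response case). As a minimal sketch of the backfitting idea for the identity-link additive model, the following uses a simple k-nearest-neighbour running mean as a stand-in smoother; this is illustrative only and is not the smoother used by gam::gam or mgcv::gam:

```python
import numpy as np

def knn_smooth(x, y, k=30):
    """Running-mean smoother: average y over a window of k x-neighbours."""
    order = np.argsort(x)
    ys = y[order]
    n = len(x)
    fit_sorted = np.empty_like(ys)
    for i in range(n):
        lo = max(0, i - k // 2)
        hi = min(n, lo + k)
        fit_sorted[i] = ys[lo:hi].mean()
    fit = np.empty_like(fit_sorted)
    fit[order] = fit_sorted                          # undo the sorting
    return fit

def backfit_am(X, y, n_iter=20, k=30):
    """Backfitting for the additive model y = c + sum_j m_j(X_j) + eps."""
    n, p = X.shape
    c = y.mean()
    m = np.zeros((p, n))
    for _ in range(n_iter):
        for j in range(p):
            partial = y - c - m.sum(axis=0) + m[j]   # partial residuals for X_j
            m[j] = knn_smooth(X[:, j], partial, k)
            m[j] -= m[j].mean()                      # centre m_j for identifiability
    return c, m

rng = np.random.default_rng(0)
n = 500
X = rng.uniform(-1, 1, size=(n, 2))
y = 1.0 + np.sin(np.pi * X[:, 0]) + X[:, 1] ** 2 + rng.normal(0, 0.2, n)
c, m = backfit_am(X, y)
resid = y - c - m.sum(axis=0)
```

Each pass smooths the partial residuals of one component against its regressor and re-centres the fit; the loop converges quickly when the regressors are not too strongly dependent.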
R "Standard" Tools

two main approaches for GAM in R:

- gam::gam → backfitting with local scoring (Hastie and Tibshirani; 1990)
- mgcv::gam → penalized regression splines (Wood; 2006)

→ compare these procedures under the default settings of gam::gam and mgcv::gam

competing estimators:

- logit: binary GLM with G(u) = 1/{1 + exp(−u)} (logistic cdf as link)
- logit2, logit3: binary GLM with 2nd / 3rd order polynomial terms for the continuous regressors
- logitc: binary GLM with the continuous regressors categorized (4–5 levels)
- gam: binary GAM using gam::gam with s() terms for the continuous regressors
- mgcv: binary GAM using mgcv::gam
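The logit2/logit3 competitors are ordinary binary GLMs fitted on a polynomial expansion of the continuous regressors. A rough sketch of what such a fit does, on simulated data and with a hand-rolled IRLS loop standing in for R's glm():

```python
import numpy as np

def fit_logit(X, y, n_iter=25):
    """Binary GLM with logistic link, fitted by IRLS (Fisher scoring)."""
    n, p = X.shape
    Z = np.column_stack([np.ones(n), X])            # add an intercept column
    beta = np.zeros(p + 1)
    for _ in range(n_iter):
        eta = Z @ beta
        mu = 1.0 / (1.0 + np.exp(-eta))             # G(eta), logistic cdf
        W = np.maximum(mu * (1.0 - mu), 1e-10)      # IRLS weights
        z_adj = eta + (y - mu) / W                  # adjusted dependent variable
        WZ = Z * W[:, None]
        beta = np.linalg.solve(Z.T @ WZ, Z.T @ (W * z_adj))
    return beta

def poly_expand(X, degree):
    """logit2/logit3-style expansion: polynomial terms per continuous regressor."""
    return np.column_stack([X ** d for d in range(1, degree + 1)])

rng = np.random.default_rng(1)
x = rng.uniform(-2, 2, size=(400, 1))
eta_true = 0.5 - 1.0 * x[:, 0] + 0.8 * x[:, 0] ** 2     # nonlinear true score
y = (rng.uniform(size=400) < 1 / (1 + np.exp(-eta_true))).astype(float)
beta3 = fit_logit(poly_expand(x, 3), y)                 # a "logit3"-style fit
```

The cubic expansion lets the linear-predictor machinery of the GLM capture a nonlinear dependence on x, which is exactly the role these benchmarks play against the nonparametric GAM fits.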
German Credit Data

- from http://www.stat.uni-muenchen.de/service/datenarchiv/kredit/kredit_e.html

                                       regressors
  dataset name   sample   defaults   continuous   discrete   categorical
  German           1000     30.00%            3          –            17

- 3 continuous regressors: age, amount, duration (time to maturity)
- use 10 CV subsamples for validation
- stratified data (true default rate ≈ 5%)
- important findings:
  - some observation(s) seem to confuse mgcv::gam in one CV subsample (→ see following slides)
  - however, mgcv::gam seems to improve deviance and discriminatory power w.r.t. gam::gam
  - estimation times of mgcv::gam are between 4 to 7 times higher than for gam::gam (not more than around a second, though)
  - if we only use the continuous regressors: both GAM estimators are comparable to a logit fit with cubic additive functions
German Credit Data: Additive Functions

[Figure: estimated additive functions for age, amount and duration (mgcv::gam in black, gam::gam in blue); the function for amount is clearly nonlinear (edf ≈ 4.5), while age and duration are fitted approximately linearly (edf ≈ 1)]
How to Compare Binary GLM Fits?

- preferably by out-of-sample validation → block cross-validation approach: leave out subsamples of x% from the fitting procedure, estimate from the remaining (100−x)% and calculate validation criteria from the x% left out
- two criteria for comparison: deviance (→ goodness of fit) and accuracy ratios AR from CAP curves (→ discriminatory power)
- CAP curve (Lorenz curve) and the accuracy ratio AR:
  - plot the empirical cdf of the fitted scores against the empirical cdf of the fitted default sample scores (precisely 1 − F̂ vs. 1 − F̂(·|Y = 1))
  - AR is the area between the CAP curve and the diagonal, in relation to the corresponding area for the best possible CAP curve (best possible ≙ perfect separation)
  - relation to ROC: the ROC curve compares F̂(·|Y = 0) and F̂(·|Y = 1), and it holds that AR = 2 AUC − 1

[Figure: CAP curve and best possible CAP curve; percentage of defaults vs. percentage of applicants, both axes 0–100%]
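The relation AR = 2 AUC − 1 gives a convenient way to compute accuracy ratios without constructing the CAP curve explicitly. A small sketch (the O(n²) pairwise comparison is fine for validation samples of the sizes used here):

```python
import numpy as np

def accuracy_ratio(score, y):
    """Accuracy ratio AR = 2*AUC - 1, with AUC the Mann-Whitney statistic
    comparing scores of defaults (y = 1) and non-defaults (y = 0)."""
    s1 = score[y == 1]                              # scores of defaults
    s0 = score[y == 0]                              # scores of non-defaults
    greater = (s1[:, None] > s0[None, :]).mean()    # default scored higher
    ties = (s1[:, None] == s0[None, :]).mean()
    auc = greater + 0.5 * ties                      # ties count one half
    return 2.0 * auc - 1.0

y = np.array([0, 0, 0, 1, 1])
ar_perfect = accuracy_ratio(np.array([0.1, 0.2, 0.3, 0.8, 0.9]), y)  # perfect separation
ar_useless = accuracy_ratio(np.ones(5), y)                           # constant score
```

A perfectly separating score gives AR = 1, a constant (uninformative) score gives AR = 0, and a score that ranks all defaults below all non-defaults gives AR = −1.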
German Credit Data: Comparison

[Figure: Out-of-sample comparison (blockwise CV with 10 blocks) for logit1, logit2, logit3, logitc, gam and mgcv; accuracy ratios from CAP curves (upper panels), deviance values and estimation times (lower panels)]
German Credit Data: Models with only Continuous Regressors

[Figure: Out-of-sample comparison (blockwise CV with 10 blocks) for logit1, logit2, logit3, logitc, gam and mgcv; accuracy ratios from CAP curves (upper panels), deviance values and estimation times (lower panels)]
Australian Credit Data

- from http://archive.ics.uci.edu/ml/datasets/Statlog+(Australian+Credit+Approval)
- used for estimation:

                                       regressors
  dataset name   sample   defaults   continuous   discrete   categorical
  Australian        678     55.90%            3          1             8

- use only 7 CV subsamples for validation
- the original A13 and A14 were dropped since they are actually multicollinear with A10; some observations were dropped because of very sparse categories
- A10 was transformed to log(1 + A10), but nevertheless used only as a linear predictor (as half of the observations share the same value)
- important findings:
  - essentially, the estimated additive function for A2 differs between mgcv::gam and gam::gam
  - gam::gam mostly outperforms all other estimators (recall, however, that the number of CV subsamples is rather small!)
  - estimation times of mgcv::gam are around 3 to 5 times higher than for gam::gam (less than a second, though)
French Credit Data

- the data were already analyzed with GPLMs in Müller and Härdle (2003); here used for estimation:

                                       regressors
  dataset name   sample   defaults   continuous   discrete   categorical
  French           8178      5.86%            5          3            15

- use the same preprocessing as in Müller and Härdle (2003)
- the original estimation + validation samples were merged; use 20 CV subsamples for validation instead
- continuous variables are X1, X2, X3, X4 and X6; in particular X3, X4 and X6 are known to have nonlinear form in a GAM
- important findings:
  - it is confirmed that the additive functions for X3, X4 and X6 should be nonlinear
  - again observation(s) "confusing" mgcv::gam in one of the subsamples
  - all estimates show similar discriminatory power, though with a slightly better performance for both mgcv::gam and gam::gam
  - estimation times of mgcv::gam are around 15 to 24 times higher than for gam::gam (for the largest model: 20-40 sec. on a 3 GHz Intel CPU for the subsamples of about 7800 observations)
UC2005 Credit Data

- data from the 2005 UC data mining competition, here used for estimation:

                                       regressors
  dataset name   sample   defaults   continuous   discrete   categorical
  UC2005           5058     23.92%           12          3            21

- the original estimation + validation + quiz samples were merged; use again 20 CV subsamples for validation
- stratified data (true default rate ≈ 5%)
- several of the variables have been preprocessed with a log-transform or to binary
- in general, the data haven't been analysed very carefully; their use is rather meant as a "proof of concept"
- important findings:
  - there are again observations "confusing" mgcv::gam in one of the subsamples
  - the performance of mgcv::gam and gam::gam is very similar and outperforms the other approaches (closest to them is the logit fit with cubic additive functions)
  - estimation times of mgcv::gam are around 8 to 40 times higher than for gam::gam (for the largest model: 5-8 min on a 3 GHz Intel CPU for the subsamples of about 4800 observations)
Simulation Study for (G)PLM

E(Y | X, T) = β1 X1 + β2 X2 + m(T)   (PLM; for the GPLM the right-hand side is wrapped in the logistic cdf G)

which of the (G)AM estimators is preferable ...?

- ... to fit the additive component functions and/or the regression function?
- ... w.r.t. discriminatory power in the GPLM/GAM cases?
- ... from a practical point of view (computational speed, numerical stability etc.)?

simulation setup:

- β1 = 1, β2 = −1, m(t) = 1.5 cos(πt) + c
- X1, U, T ~ Uniform[−1, 1], X2 = m(ρT + (1 − ρ)U) (centered)
- nsim = 1000, n ∈ {100, 1000, 10000}, ρ ∈ {0.0, 0.7}, c ∈ {0, −1, −2}
- X2 and T are nonlinearly dependent (if ρ = 0.7) or independent otherwise
- sample size n up to 10000, which is a possible size for credit data
- the intercept c controls the default rate (15%–50%) in the GPLM
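The data-generating process of the setup above can be sketched as follows (my reading of the design; G is taken as the logistic cdf, matching the logit models used throughout the talk):

```python
import numpy as np

def simulate_gplm(n, rho, c, rng):
    """One sample from the simulation design:
    X1, U, T ~ Uniform[-1, 1]; X2 = m(rho*T + (1-rho)*U), centred;
    m(t) = 1.5*cos(pi*t) + c; Y ~ Bernoulli(G(beta1*X1 + beta2*X2 + m(T)))
    with beta1 = 1, beta2 = -1 and G the logistic cdf."""
    x1 = rng.uniform(-1, 1, n)
    u = rng.uniform(-1, 1, n)
    t = rng.uniform(-1, 1, n)
    m = lambda s: 1.5 * np.cos(np.pi * s) + c
    x2 = m(rho * t + (1 - rho) * u)
    x2 -= x2.mean()                  # centred; nonlinearly dependent on T if rho > 0
    eta = 1.0 * x1 - 1.0 * x2 + m(t)
    y = (rng.uniform(size=n) < 1 / (1 + np.exp(-eta))).astype(int)
    return x1, x2, t, y

rng = np.random.default_rng(2009)
_, _, _, y0 = simulate_gplm(20000, rho=0.7, c=0, rng=rng)
_, _, _, y2 = simulate_gplm(20000, rho=0.7, c=-2, rng=rng)
rate0, rate2 = y0.mean(), y2.mean()  # c shifts the default rate downwards
```

Since X2 is centred, the intercept c enters only through m(T), so moving c from 0 to −2 lowers the default rate from around one half toward the lower end of the 15%–50% range.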
Simulation Study: Additive Components for GPLM

[Figure: boxplots of the MSE for m(t), beta1 and beta2, gam vs. mgcv; ρ = 0.7, c = −2, with n = 1000 (upper row) and n = 10000 (lower row)]
Simulation Study: Independent Components vs. Dependent

[Figure: boxplots of the MSE for m(t), beta1 and beta2, gam vs. mgcv; n = 10000, with c = 0 (upper row) and c = −2 (lower row)]
Simulation Study: Comparison with Components for PLM

[Figure: boxplots of the MSE for m(t), beta1 and beta2, gam vs. mgcv; ρ = 0.7, c = 0, with n = 1000 (upper row) and n = 10000 (lower row)]
Simulation Study: Deviance and Discriminatory Power for GPLM

[Figure: boxplots of deviances, accuracy ratios (AR) and Kolmogorov statistics (T), gam vs. mgcv; ρ = 0.7, c = −2, with n = 1000 (upper row) and n = 10000 (lower row)]

in fact, most of the gam::gam deviances are larger here than the mgcv::gam deviances, and the gam::gam fits have smaller discriminatory power
Simulation Study: Estimation Times for GPLM

[Figure: boxplots of estimation times, gam vs. mgcv; ρ = 0.7, c = −2, with n = 1000 (left) and n = 10000 (right)]

(estimation times in sec. on a Xeon 2.50 GHz)
Conclusions

- typically, categorical regressors improve the fit significantly; estimation methods for credit data should therefore make adequate use of them
- backfitting + local scoring (gam::gam) provides fast and numerically stable results
- there is, however, a clear indication that penalized regression splines (mgcv::gam) may provide more precise estimates of the additive component functions; their current drawbacks are:
  - estimation time (increasing with model complexity and categorical variables)
  - the effects are to be seen only in large samples
- thus: no clear recommendation, no "ultimate method"; clearly topics for more research
References

Härdle, W., Müller, M., Sperlich, S. and Werwatz, A. (2004). Nonparametric and Semiparametric Modeling: An Introduction, Springer, New York.

Hastie, T. J. and Tibshirani, R. J. (1990). Generalized Additive Models, Vol. 43 of Monographs on Statistics and Applied Probability, Chapman and Hall, London.

Müller, M. (2001). Estimation and testing in generalized partial linear models: a comparative study, Statistics and Computing 11: 299–309.

Müller, M. and Härdle, W. (2003). Exploring credit data, in G. Bol, G. Nakhaeizadeh, S. Rachev, T. Ridder and K.-H. Vollmer (eds), Credit Risk: Measurement, Evaluation and Management, Physica-Verlag.

Speckman, P. E. (1988). Regression analysis for partially linear models, Journal of the Royal Statistical Society, Series B 50: 413–436.

Wood, S. N. (2006). Generalized Additive Models: An Introduction with R, Texts in Statistical Science, Chapman and Hall, London.
Australian Credit Data: Additive Functions

[Figure: estimated additive functions for A2, A3 and A7 (mgcv::gam in black, gam::gam in blue); A2 is fitted approximately linearly (edf ≈ 1), while A3 (edf ≈ 4.4) and A7 (edf ≈ 6.3) are clearly nonlinear]
Australian Credit Data: Comparison

[Figure: Out-of-sample comparison (blockwise CV with 7 blocks) for logit1, logit2, logit3, logitc, gam and mgcv; accuracy ratios from CAP curves (upper panels), deviance values and estimation times (lower panels)]
Australian Credit Data: Models with only Continuous Regressors

[Figure: Out-of-sample comparison (blockwise CV with 7 blocks) for logit1, logit2, logit3, logitc, gam and mgcv; accuracy ratios from CAP curves (upper panels), deviance values and estimation times (lower panels)]
French Credit Data: Additive Functions

[Figure: estimated additive functions for X1, X2, X3, X4 and X6 (mgcv::gam in black, gam::gam in blue); X1, X2 and X3 are fitted approximately linearly (edf ≈ 1), X4 (edf ≈ 5.2) and X6 (edf ≈ 3.3) nonlinearly]
French Credit Data: Comparison

[Figure: Out-of-sample comparison (blockwise CV with 20 blocks) for logit1, logit2, logit3, logitc, gam and mgcv; accuracy ratios from CAP curves (upper panels), deviance values and estimation times (lower panels)]
French Credit Data: Models with only Significant Regressors

[Figure: Out-of-sample comparison (blockwise CV with 20 blocks) for logit1, logit2, logit3, logitc, gam and mgcv; accuracy ratios from CAP curves (upper panels), deviance values and estimation times (lower panels)]
French Credit Data: Models with only Metric Regressors

[Figure: Out-of-sample comparison (blockwise CV with 20 blocks) for logit1, logit2, logit3, logitc, gam and mgcv; accuracy ratios from CAP curves (upper panels), deviance values and estimation times (lower panels)]
UC2005 Credit Data: Additive Functions

[Figure: estimated additive functions for X1, X4, X5, X13, X14, X15, X26, X28, X29, X30, X33 and X37 (mgcv::gam in black, gam::gam in blue), with effective degrees of freedom between about 1.6 and 8]
UC2005 Credit Data: Comparison

[Figure: Out-of-sample comparison (blockwise CV with 20 blocks) for logit1, logit2, logit3, logitc, gam and mgcv; accuracy ratios from CAP curves (upper panels), deviance values and estimation times (lower panels)]
UC2005 Credit Data: Models with only Metric Regressors

[Figure: Out-of-sample comparison (blockwise CV with 20 blocks) for logit1, logit2, logit3, logitc, gam and mgcv; accuracy ratios from CAP curves (upper panels), deviance values and estimation times (lower panels)]