eric-webb
GETTING READY FOR DISCRIMINANT ANALYSIS
Defining Dummy Variables
Why dummies?
Dummies are not necessary for predictive models, but they have some advantages.
A subset of a variable (a certain range of values) may affect the dependent variable differently, even when the variable used as a continuous one is not significant.
Easier to interpret for business applications.
For credit bureau variables, dummies can handle special cases (no record, inquiries only, missing, etc.) a little better, based on the dependent variable characteristics for those categories.
How to define them
Compute ratio of column percentages for each category (Good Column Percent / Bad Column Percent).
Use the pattern of these ratios to determine how many categories (and hence number of dummies) to create.
Must have a neutral category.
Example: Customer Age
VAGE        Bad    Bad Col Pct   Good   Good Col Pct   Total   Ratio
0 TO 17        1       0.02         1       0.01            2
18 TO 20     521      11.36       827       8.79         1348   0.774
21 TO 23     703      15.32      1096      11.65         1799   0.760
24 TO 27     866      18.88      1519      16.15         2385   0.855
28 TO 34    1123      24.48      2328      24.75         3451   1.011
35 TO 39     533      11.62      1297      13.79         1830   1.187
40 TO 44     376       8.20       986      10.48         1362   1.278
45 TO 49     251       5.47       639       6.79          890   1.241
50 TO 54     126       2.75       356       3.78          482   1.375
55 TO 61      62       1.35       241       2.56          303   1.896
62+           26       0.57       116       1.23          142   2.158
Total       4588                 9406                   13994
Slide callouts: one Neutral category and Dummies 1 to 7.
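The Ratio column of the Customer Age table can be reproduced from the Bad and Good counts alone. The sketch below is one way to do it in plain Python; the function name `good_bad_ratios` is made up for illustration, and rounding the column percentages to two decimals before dividing is an assumption that appears to match the printed figures.

```python
# Bad and Good counts per age band, copied from the Customer Age table.
counts = {
    "0 TO 17": (1, 1),
    "18 TO 20": (521, 827),
    "21 TO 23": (703, 1096),
    "24 TO 27": (866, 1519),
    "28 TO 34": (1123, 2328),
    "35 TO 39": (533, 1297),
    "40 TO 44": (376, 986),
    "45 TO 49": (251, 639),
    "50 TO 54": (126, 356),
    "55 TO 61": (62, 241),
    "62+": (26, 116),
}


def good_bad_ratios(counts):
    """Return {category: (bad_col_pct, good_col_pct, ratio)}.

    Assumption: percentages are rounded to two decimals before the
    ratio is taken, as the printed table seems to do.
    """
    total_bad = sum(bad for bad, _ in counts.values())
    total_good = sum(good for _, good in counts.values())
    result = {}
    for category, (bad, good) in counts.items():
        bad_pct = round(100.0 * bad / total_bad, 2)
        good_pct = round(100.0 * good / total_good, 2)
        # Ratio = Good Column Percent / Bad Column Percent
        result[category] = (bad_pct, good_pct, good_pct / bad_pct)
    return result


ratios = good_bad_ratios(counts)
for category, (bad_pct, good_pct, ratio) in ratios.items():
    print(f"{category:<10} {bad_pct:6.2f} {good_pct:6.2f} {ratio:6.3f}")
```

A ratio above 1 means the category holds a larger share of the Goods than of the Bads; a ratio near 1 marks a natural candidate for the neutral category.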
Some Guidelines
Look for a logical pattern. E.g., ratios get better with age: does that make sense? Why or why not?
If a higher age category has a lower ratio, combine it with the previous (or next) category.
If the pattern is contrary to business expectation, investigate the data and/or drop the variable.
If there is no pattern (no variation in ratios) at all, drop the variable: it has no discriminatory power.
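The combining guideline can be sketched as a single pass over the age bands. This is only an illustration under stated assumptions: the expected pattern is that ratios rise with age, an offending band is always pooled with the previous group (the guideline also allows the next one), and the helper name `merge_reversals` is hypothetical. Ratios here come from unrounded pooled counts, so they can differ slightly from the printed table.

```python
# (label, bad, good) in ascending age order, from the Customer Age table.
bands = [
    ("0 TO 17", 1, 1), ("18 TO 20", 521, 827), ("21 TO 23", 703, 1096),
    ("24 TO 27", 866, 1519), ("28 TO 34", 1123, 2328), ("35 TO 39", 533, 1297),
    ("40 TO 44", 376, 986), ("45 TO 49", 251, 639), ("50 TO 54", 126, 356),
    ("55 TO 61", 62, 241), ("62+", 26, 116),
]


def merge_reversals(bands):
    """Pool each band into the previous group when its ratio is lower."""
    total_bad = sum(bad for _, bad, _ in bands)
    total_good = sum(good for _, _, good in bands)

    def ratio(bad, good):
        # Good column share divided by Bad column share.
        return (good / total_good) / (bad / total_bad)

    groups = []  # each group is [labels, bad, good]
    for label, bad, good in bands:
        groups.append([[label], bad, good])
        # Keep pooling backwards while the newest group breaks the
        # rising pattern set by the group before it.
        while len(groups) > 1 and ratio(*groups[-1][1:]) < ratio(*groups[-2][1:]):
            labels, b, g = groups.pop()
            groups[-1][0] += labels
            groups[-1][1] += b
            groups[-1][2] += g
    return [(" + ".join(labels), b, g, ratio(b, g)) for labels, b, g in groups]


merged = merge_reversals(bands)
for label, bad, good, r in merged:
    print(f"{label:<22} {bad:5d} {good:5d} {r:6.3f}")
```

On this table the pass pools 21 TO 23 into 18 TO 20 and 45 TO 49 into 40 TO 44, the two places where the ratio dips. It does not address very small categories such as 0 TO 17, which still need a judgment call.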
Special Cases
What to do with 'No Record', 'Inquiries Only', etc. when dealing with credit bureau variables?
Look at the Good/Bad ratio for those categories. Find the category with the closest match and make that the Neutral category.
The special cases should also be part of the Neutral category for all variables.
Assess their impact only once in the model, by defining dummies for the CBTYPE variable.
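Finding the closest match is a nearest-value lookup on the ratios. The sketch below is generic and hypothetical: `closest_category` is an invented helper, the regular-category ratios are borrowed from the Customer Age table purely to have real numbers to search, and the special-case ratio of 1.2 is a made-up input, not a figure from the data.

```python
# Printed Good/Bad ratios from the Customer Age table, used here only
# as stand-in values for the regular categories of some variable.
regular_ratios = {
    "18 TO 20": 0.774, "21 TO 23": 0.760, "24 TO 27": 0.855,
    "28 TO 34": 1.011, "35 TO 39": 1.187, "40 TO 44": 1.278,
    "45 TO 49": 1.241, "50 TO 54": 1.375, "55 TO 61": 1.896,
    "62+": 2.158,
}


def closest_category(regular_ratios, special_ratio):
    """Return the regular category whose ratio is nearest the special case."""
    return min(
        regular_ratios,
        key=lambda category: abs(regular_ratios[category] - special_ratio),
    )


# 1.2 is an invented ratio for a special case such as 'No Record'.
neutral = closest_category(regular_ratios, 1.2)
print(neutral)
```

The category returned would then serve as the Neutral category for that variable, with the special-case records folded into it.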
CBTYPE Variable
Key to CBTYPE variable
1 = Record with Trades
2 = Record w/Inqs. and Pub Recs Only
3 = Record w/Inqs. Only
4 = Record w/Pub Recs Only
5 = No Record
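Defining dummies for CBTYPE means one 0/1 indicator per code, with one code left out as the Neutral category. The sketch below assumes code 1 (Record with Trades) is the neutral one, which the key itself does not state; the function name `cbtype_dummies` and the `CBTYPE_n` column names are likewise illustrative.

```python
CBTYPE_LABELS = {
    1: "Record with Trades",
    2: "Record w/Inqs. and Pub Recs Only",
    3: "Record w/Inqs. Only",
    4: "Record w/Pub Recs Only",
    5: "No Record",
}


def cbtype_dummies(values, neutral=1):
    """Return {column_name: [0/1, ...]} for every non-neutral CBTYPE code.

    Assumption: `neutral` defaults to 1; choose it from the Good/Bad
    ratios in practice.
    """
    unknown = set(values) - set(CBTYPE_LABELS)
    if unknown:
        raise ValueError(f"Unknown CBTYPE codes: {sorted(unknown)}")
    codes = [code for code in sorted(CBTYPE_LABELS) if code != neutral]
    # A neutral record gets 0 in every dummy column.
    return {
        f"CBTYPE_{code}": [1 if value == code else 0 for value in values]
        for code in codes
    }


sample = [1, 3, 5, 3]  # made-up CBTYPE values for four records
dummies = cbtype_dummies(sample)
for name, column in dummies.items():
    print(name, column)
```

Because the neutral code has no column of its own, its effect is absorbed into the model's baseline, and each dummy is read relative to it.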