GETTING READY FOR DISCRIMINANT ANALYSIS Defining Dummy Variables

Why dummies?

Dummies are not strictly necessary for predictive models, but they have some advantages:

A subset of a variable (a certain range of values) may affect the dependent variable differently, even when the variable used as a continuous one is not significant.

Easier to interpret for business applications.

For credit bureau variables, dummies can handle special cases (no record, inquiries only, missing, etc.) a little better, based on the dependent variable's characteristics for those categories.

How to define them

Compute ratio of column percentages for each category (Good Column Percent / Bad Column Percent).

Use the pattern of these ratios to determine how many categories (and hence number of dummies) to create.

Must have a neutral category.
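
As a minimal sketch of the ratio computation described above: the function name `good_bad_ratios`, the category labels, and the counts below are illustrative assumptions, not taken from the slides. Each category's Good column percent is divided by its Bad column percent; a ratio near 1 suggests a candidate for the neutral category.

```python
def good_bad_ratios(counts):
    """Return Good Col % / Bad Col % for each category.

    counts: dict mapping category -> (bad_count, good_count).
    Column percents are taken within the Bad and Good columns separately.
    """
    total_bad = sum(bad for bad, _ in counts.values())
    total_good = sum(good for _, good in counts.values())
    ratios = {}
    for category, (bad, good) in counts.items():
        bad_pct = 100.0 * bad / total_bad
        good_pct = 100.0 * good / total_good
        # A category with no Bad accounts has an undefined ratio.
        ratios[category] = good_pct / bad_pct if bad_pct > 0 else None
    return ratios


# Hypothetical counts for three categories: (bad, good).
example_counts = {"low": (40, 20), "mid": (40, 40), "high": (20, 40)}
ratios = good_bad_ratios(example_counts)
print(ratios)
```

In this made-up example the "mid" category has a ratio of 1.0, so it would be the natural neutral category, with one dummy each for "low" and "high".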

Example: Customer Age

VAGE        Bad   Good  Total  Bad Col %  Good Col %  Ratio
0 TO 17       1      1      2       0.02        0.01
18 TO 20    521    827   1348      11.36        8.79  0.774
21 TO 23    703   1096   1799      15.32       11.65  0.760
24 TO 27    866   1519   2385      18.88       16.15  0.855
28 TO 34   1123   2328   3451      24.48       24.75  1.011
35 TO 39    533   1297   1830      11.62       13.79  1.187
40 TO 44    376    986   1362       8.20       10.48  1.278
45 TO 49    251    639    890       5.47        6.79  1.241
50 TO 54    126    356    482       2.75        3.78  1.375
55 TO 61     62    241    303       1.35        2.56  1.896
62+          26    116    142       0.57        1.23  2.158
Total      4588   9406  13994

[Slide annotation: the age bands are grouped into one Neutral category and seven Dummies, numbered 1 to 7.]

Some Guidelines

Look for a logical pattern. E.g., ratios get better with age: does that make sense? Why or why not?

If a higher age category has a lower ratio, combine it with the previous (or next) category.

If the pattern is contrary to business expectation, investigate the data and/or drop the variable.

If there is no pattern (no variation in ratios) at all, drop the variable: it has no discriminatory power.
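
The "combine it with the previous category" guideline can be sketched as a backward merge over the ordered bands. This is only one possible reading, assuming the expected pattern is a ratio that rises with age; the function name `merge_non_monotonic` is hypothetical. The counts are the Bad/Good counts from the Customer Age example. Because the totals are the same for every band, comparing Good Col % / Bad Col % ratios is equivalent to comparing good:bad odds, which the code does with integer cross-multiplication.

```python
def merge_non_monotonic(bands):
    """Merge each band into the previous group while its ratio is lower.

    bands: ordered list of (label, bad_count, good_count).
    Returns a list of groups, each a list of the labels it contains.
    """
    groups = []  # each entry: [labels, bad_total, good_total]
    for label, bad, good in bands:
        groups.append([[label], bad, good])
        while len(groups) > 1:
            prev, last = groups[-2], groups[-1]
            # last odds < prev odds  <=>  last_good * prev_bad < prev_good * last_bad
            if last[2] * prev[1] < prev[2] * last[1]:
                prev[0].extend(last[0])
                prev[1] += last[1]
                prev[2] += last[2]
                groups.pop()
            else:
                break
    return [labels for labels, _, _ in groups]


# (label, bad, good) from the Customer Age example.
age_bands = [
    ("0 TO 17", 1, 1), ("18 TO 20", 521, 827), ("21 TO 23", 703, 1096),
    ("24 TO 27", 866, 1519), ("28 TO 34", 1123, 2328), ("35 TO 39", 533, 1297),
    ("40 TO 44", 376, 986), ("45 TO 49", 251, 639), ("50 TO 54", 126, 356),
    ("55 TO 61", 62, 241), ("62+", 26, 116),
]
merged = merge_non_monotonic(age_bands)
for group in merged:
    print(" + ".join(group))
```

On these counts the sketch combines 21 TO 23 with 18 TO 20 and 45 TO 49 with 40 TO 44, the two places where the ratio dips. It does not handle very small categories such as 0 TO 17 (two accounts), which would still need a judgment call.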

Special Cases

What to do with 'No Record', 'Inquiries Only', etc. when dealing with credit bureau variables?

Look at the Good/Bad ratio for those categories.

Find the category with the closest matching ratio and make that the Neutral category.

The special cases should also be part of the Neutral category for all variables.

Assess their impact only once in the model by defining dummies for the CBTYPE variable.

CBTYPE Variable

Key to CBTYPE variable

1 = Record with Trades
2 = Record w/Inqs. and Pub Recs Only
3 = Record w/Inqs. Only
4 = Record w/Pub Recs Only
5 = No Record
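
A minimal sketch of dummy coding for CBTYPE, using the key above. The function name `cbtype_dummies`, the `CBTYPE_n` column names, and the choice of code 1 as the default neutral category are assumptions for illustration; in practice the neutral code would be chosen from the Good/Bad ratios. The neutral category gets no dummy of its own, so it is represented by all dummies being 0.

```python
CBTYPE_KEY = {
    1: "Record with Trades",
    2: "Record w/Inqs. and Pub Recs Only",
    3: "Record w/Inqs. Only",
    4: "Record w/Pub Recs Only",
    5: "No Record",
}


def cbtype_dummies(cbtype, neutral=1):
    """Return 0/1 dummies for one CBTYPE value, omitting the neutral code.

    neutral=1 is an illustrative default, not a recommendation from the slides.
    """
    if cbtype not in CBTYPE_KEY:
        raise ValueError(f"unknown CBTYPE code: {cbtype!r}")
    return {
        f"CBTYPE_{code}": int(cbtype == code)
        for code in sorted(CBTYPE_KEY)
        if code != neutral
    }


print(cbtype_dummies(5))  # a 'No Record' applicant
print(cbtype_dummies(1))  # the assumed neutral category: all dummies are 0
```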