Changing Requirements of Business Analytics in Financial Services

Don’t blame the Retriever; Who threw the ball?

Pay No Attention to the Man Behind The Curtain

The Changing Requirements of Business Analytics in Financial Services

• “There is no business challenge that cannot be solved if one considers that a Business Challenge is simply a Tennis Ball waiting to be thrown….”

– Jon Farrar

Once Upon A Time….

There was this dream that everyone who needed a loan would always be treated fairly

But there were factors at work that made the dream almost


But the needs were so great….

Then all of a sudden

Someone invented something called Credit Scores

They were a bit odd, at first, but they were also kind of an elegant accessory and they fit real good.

Once folks found out about ’em, Everybody wanted ’em

Everybody was Happy…

They Seemed to go with EVERYTHING,

and they were a little Magical besides…

But, trouble was a-brewin….

The Wizard of OCC (“awk”) found out about the Credit Scores and he was not happy.

The Wizard of OCC thought Credit Scores looked like this…

And The Wizard Wanted them to look more like this…

So, The Wizard sent his Minions to do some work….

Storm after Storm blew down on everyone using Credit Scores

And Because the Wizard of OCC

Wasn’t always real clear about what he wanted everybody to do. People were confused…


So the Wizard Tried Again,

More Confusion……

OCC 97-24

And Again….


And Still, NOBODY seemed to know what to do….

And then the wisest one of them all had an Idea….

C’mon, you guys! We just gotta go talk to the old bird

Toto’s Right!

So there was only one thing left to do…..

They formed their little group and they went off to see the Wizard….

So, they followed the FICO-Built Road


But that proved kinda scary,

everybody said The Wizard was REAL MEAN!

And the Minions seemed to like that everybody was confused

And all along the road

There were “Empirically Derived”s and “Demonstrably and Statistically Sound”s. There were Models, Reporting, and BackTests


They knew they weren’t in Kansas Anymore….

But They Carried On, In spite of attempts to deter them from their road…

And When they finally got to The Wizard

They made an appointment with his Admin

When They First Got Inside, they WERE scared

But Then They Realized something funny

The Wizard of OCC wasn’t such a bad guy after all

He Just wanted Everybody to Understand How the Ruby Slippers Were Made

So that they held up, and didn’t fall apart, and were the right size, and were available to all, And so everybody could buy and sell more slippers, In a kinda sorta fair way….

So the Wizard Of OCC Created



And everybody sort of understood, And everybody was sort of happy

It STILL wasn’t perfect, But it was a gosh-darn sight better than what came before…..




Timeline: OCC 97-24, OCC 2000-


There was something for everybody

And Toto too….

Dorothy Understood that she needed to spread the word

And with the help of a very good Travel Agent….

They loaded the Ruby Slippers and the New Instructions into

the Open Gray Box

And set off back to where it all started…

* Well, for the time being anyway….


Pay No Attention… Part II
• What we have learned thus far
– Since the beginning, Models were Magical
– Regulators were always concerned with Fairness and measurability
– Models offer Promise but lots of confusion
• Models are used for lots of different functions
• Models are not always clearly understood
• Regulating them lagged behind their prevalence and use
– Multiple attempts to regulate, but never clear
– Finally catching up, but still lagging
– OCC 2011-12 is the best so far, and gets us a large part of the way there

• Dem’s da rules, Dat’s how we gotta play…

Models offered Promise but lots of confusion too

• We started using models for all sorts of different functions

• Consumers started asking lots of questions
• “You didn’t Score enough” didn’t cut it
• “Lemme talk to your MANAGER!”

You see, time was…..

Characteristic Points

Home Ownership: Lives with parents | Other
Years On Job: < 2 years | 2 – 5 years | 5 – 8 years | 8+ years
Credit History: < 2 years | 2-4 years | 4-7 years | 7+ years
Credit Report:
  < 3 Inquiries | 3+ Inquiries
  < 3 Satisfactory | 3+ Satisfactory
  Worst Rating 60+ Delinq | Worst Rating Derog | Worst Rating Satisfactory




Earlier Models could be rendered very simply

One just added up the points

If there were enough to pass the cutoff, the customer was approved

But still nobody really knew how to explain them

And we started using models for all kinds of things



Typical Sources of Data
• Credit Bureau, Demographics
• New Account Scoring: Application, Credit Bureau
• Masterfile, Credit Bureau, Loan Details
• Behavior Scoring: Masterfile, Purchases & Payments, Loan Details

Linear Regression Models, Logistic Regression Models

Models offered Promise but lots of confusion

• Models used for lots of different functions
• Consumers started asking lots of questions
– Why did I only get that Loan Amount?
– Why was I turned down?
– Why didn’t you renew my Credit Line?
– Why did you call me for a payment?

Farrar -45


Models offered Promise but lots of confusion

• Models used for lots of different functions
• Consumers gained Savvy and asked lots of questions
• “You didn’t Score enough” didn’t cut it
– Customers didn’t get it
– Loan Officers also didn’t get it
– The Tin Woodsman didn’t get it (and he had an Axe!)

And now look where we are (not to mention where we’re going…)




Typical Sources of Data
• Masterfile, Credit Bureau
• Cross Sell
• Web Logs, Transactional Databases, Historical time series databases
• Internal system databases (DDA, Collection, Recovery, Financial, etc.)
• DataMarts, Data Warehouses
• Traditional MS Office Suite, Data extraction tools
• Leading edge Statistical packages (SAS, SPSS, R), Data Mining packages
• Pattern Recognition Algorithms, Classification and Regression Trees (CART®)
• Stochastic Gradient Boosting (TreeNet®), Programming and Application Languages

Models offered Promise but lots of confusion

• Models used for lots of different functions
• Consumers started asking lots of questions
• “You didn’t Score enough” didn’t cut it

But how ya gonna keep ’em down on the Farm…?

• A plethora of Modeling techniques and Methodologies is part of Statistical training

• Reality Bites
• Only a very small number of learned statistical techniques can actually be used in most business scenarios
• Where we can apply them in Business, even fewer of those meet usability requirements
– Tracking, Monitoring, Maintaining, Refreshing
– Time to Develop, Validate, Test, Deploy
– Extensible, Scalable, contribute to KPIs and Financial Measures like ROA, RAROC, ROI, etc.
– EXPLAINABLE! (ahhhh… back to the Regulations….in a moment…)

• So in general, it makes more sense to use simpler types of models for most business applications

So How ya gonna keep ’em down on the Farm?

• Easy.
• Tell ’em they have to follow 2011-12
• They’ll NEVER leave!

OCC 2011-12

• The design, theory, and logic underlying the model should be well documented and generally supported by published research and sound industry practice. The model methodologies and processing components that implement the theory, including the mathematical specification and the numerical techniques and approximations, should be explained in detail with particular attention to merits and limitations.

OCC 2011-12 (2)

• Without adequate documentation, model risk assessment and management will be ineffective. Documentation of model development and validation should be sufficiently detailed so that parties unfamiliar with a model can understand how the model operates, its limitations, and its key assumptions. Documentation provides for continuity of operations, makes compliance with policy transparent, and helps track recommendations, responses, and exceptions.

Vital organs of 2011-12
• Oversight – Model Risk Management Division
– Manage Model Risk like any other type of risk
– Detailed Policies and procedures for Models, their uses and permitted Overrides
– Rigorous assessment of Data quality, relevance, appropriateness and documentation
– All model assumptions must be tracked and monitored
– Appropriateness of chosen Methodology must be defensible (design and construction)
– Audit and Compliance Signoffs

• Rigorous Testing before Implementation
– Stress testing against multiple economic and Financial Scenarios to identify model uncertainty and potential for inaccuracy
• Independent Validation prior to Implementation (internal unit or Contracted External resource)
• Model used only on the population it was designed for
• Formalized Reporting, with pre-established thresholds for performance effectiveness and stability
• Exhaustive documentation to EXPLAIN everything
– Business Goals, Assumptions, Data, Intended Use, Methodology, How Model Works, ties in to Policy and Procedures, Adverse Action, Testing, Validation and tracking protocols, etc.

EXPLAINING now is a really BIG thing…

The sum of the square roots of any two sides of an isosceles triangle is equal to the square root of the remaining side. Oh joy! Rapture! I got a brain! How can I ever thank you enough?

Explaining Models
• Logistic and Linear Regression Models are very well understood, have been reliably used in Business Applications for over 60 years, and when properly built are stable, very good predictors of outcomes

• Logistic and Linear Regression Models are relatively easy to explain
– A linear regression line has an equation of the form Y = a + bX, where X is the explanatory variable and Y is the dependent variable. The slope of the line is b, and a is the intercept (the value of Y when X = 0)*

– Logistic regression is used for predicting binary outcomes (Bernoulli trials) rather than continuous outcomes, and models a transformation of the expected value (the log-odds, or logit) as a linear function of the predictors, rather than the expected value itself**


**http://en.wikipedia.org/wiki/Logistic_regression#Definition

Explaining Models (2)

• Regression Models generally work best with well-behaved, roughly normal data (strictly, classical linear regression assumes normally distributed errors; logistic regression makes no normality assumption)

• Both Linear and Logistic Models rest on the correlation between multiple variables and the predicted outcome, and require some type of linear relationship between each variable and the predicted outcome (for Logistic Models, its log-odds)
– Sometimes (generally) the data must first be transformed in a variety of ways to establish an optimal linear relationship
– Use a given variable only once in a given model, according to the (derived) linear relationship
• One variable (or range), one coefficient

On The Other Hand…

• Business Data is becoming less and less normally distributed
• Businesses must now pay more and more attention to exceptions and outliers in order to maximize targeting and profitability

• Linear and Logistic methodologies are no longer always adequate to solve the more complex business challenges
– Some build model suites to address a single challenge
– Lead times for developing, validating, testing and documenting suites of models are therefore much longer
– Newer methodologies can help here, in the sense that often one model can be built, but…..
• but 2011-12 rears its head again….

2011-12 rears its head again

• If ya’ can’t explain it, ya’ can’t use it
• Neural Networks, Bayesian Networks, Stochastic Gradient Boosting, etc. all need to be explained
• Mathematical formulas, and underpinnings like assumptions, must be justified, can be difficult to objectively explain, and may be difficult if not impossible to place into an Adverse Action context

Why CART is so cool…
See, Decision Trees are “easy” because we can explain this one, no problem:

INDUS <= 6.145

INDUS > 6.145 && PT <= 18.65 && DIS <= 4.91145

INDUS > 6.145 && PT <= 18.65 && DIS > 4.91145

INDUS > 6.145 && PT > 18.65 && NOX <= 0.755 && LSTAT <= 5.165

INDUS > 6.145 && PT > 18.65 && NOX <= 0.755 && LSTAT > 5.165

INDUS > 6.145 && PT > 18.65 && NOX > 0.755

Even if it’s a bigger tree….

INDUS <= 6.145 && MV <= 45.7

INDUS <= 6.145 && MV > 45.7

INDUS > 6.145 && PT <= 18.65 && DIS <= 4.91145 && TAX <= 17.5

INDUS > 6.145 && PT <= 18.65 && DIS <= 4.91145 && TAX > 17.5

INDUS > 6.145 && PT <= 18.65 && DIS > 4.91145

INDUS > 6.145 && PT > 18.65 && NOX <= 0.755 && LSTAT <= 5.165

INDUS > 6.145 && PT > 18.65 && NOX <= 0.755 && LSTAT > 5.165 && DIS <= 1.1333

INDUS > 6.145 && PT > 18.65 && NOX <= 0.755 && LSTAT > 5.165 && DIS > 1.1333

INDUS > 6.145 && PT > 18.65 && NOX > 0.755

But how in Munchkin Land can you explain this thing?

And what if ya’ had something like THIS …


Oh MY!

Even the TREES get confused…


BAD news

• ya really can’t explain this one

OCC 2011-12


Good news
• Ya CAN explain this one…..


But what the Kansas is this thing anyway?

A Woodman’s view of TreeNet®
• Borrowing from Dan Steinberg’s introductory video….
– TreeNet® is Salford’s implementation of Stochastic Gradient Boosting
– Its speed and accuracy are unparalleled in Modeling, and it has a number of advantages over more traditional methodologies
• I will leave the Sales Pitch to Salford, but it is my favorite tool and we used it for every kind of model you can think of
– I am no expert, but here is kind of how it works (and TreeNet® does this automatically and keeps track of it all for you):
• build an initial tree and identify its errors (the misclassifications, or more precisely the residuals)
• using those errors as the new target, draw a random sample of your data again and develop a new tree based on that
• keep going until more trees stop improving the results. Could be hundreds or thousands of builds, all happening very quickly
• You then “simply” add up the small contributions of all of the individual trees

and Voilà!



Think about it like this….
• So you get your one tree…

• TreeNet® changes your target to the Misclasses (the errors) and creates a second tree….

• And TreeNet® does it again and again and again while you get a treat for Toto……

• In the end, TreeNet® adds the outputs of all the trees together…..

• Then you simply export the code and implement the model!

Here’s a bit of what a TreeNet® Model looks like to a C Programmer

/**********************************************************
 * The following C source code was automatically generated
 * by the TRANSLATE feature in Salford Predictive Miner(tm).
 * Modeling version:, Translation version:
 **********************************************************/

#include <string.h> /* for strcmp() */
#include <math.h>   /* for exp() */

/**********************************************************
 * **** APPLICATION-DEPENDENT MISSING VALUES ****
 * The two constants must be set **by you** to whatever
 * value(s) you use in your data management or programming
 * workflow to represent missing data.
 **********************************************************/

const double DBL_MISSING_VALUE = /* value needed here! */ ;
const int INT_MISSING_VALUE = /* value needed here! */ ;

/************
 * PREDICTORS
 ************/

/* (declarations of CRIM, ZN, INDUS, NOX, RM, AGE, DIS, RAD, TAX, PT, B, LSTAT and MV not shown) */

/***************************************************************
 * Here come the treenets in the grove. A shell for calling them
 * appears at the end of this source file.
 ***************************************************************/
double TreeNet_1(double * const pProb0, double * const pProb1)
{
    /* TreeNet version: */
    /* TreeNet: TreeNet_1 */
    /* Timestamp: 2012043172135 */
    /* Grove: C:\DOCUME~1\Office\LOCALS~1\Temp\s5u137 */
    /* Target: CHAS */
    /* N trees: 197 */
    /* N target classes: 2 */

    double target, net_response = 0.0;
    int node, done;
    int response = 0;

    /***************************/
    /* Class-specific treenets */
    /***************************/

    double expsum = 0.0;
    double prob0, score0; /* CHAS = 0 */
    double prob1, score1; /* CHAS = 1 */

    /*******************************************************/
    /* The following predictors had no missing data in     */
    /* the learn sample, so the TreeNet model is unable to */
    /* accommodate missing data for them during scoring.   */
    /* They must be imputed. These particular values are   */
    /* the learn sample medians and/or modes. These are    */
    /* provided as a convenience, you may wish to replace  */
    /* these expressions with your own.                    */
    /*******************************************************/
    if (CRIM == DBL_MISSING_VALUE) CRIM = 0.2102;
    if (ZN == DBL_MISSING_VALUE) ZN = 0;
    if (INDUS == DBL_MISSING_VALUE) INDUS = 8.14;
    if (NOX == DBL_MISSING_VALUE) NOX = 0.515;
    if (RM == DBL_MISSING_VALUE) RM = 6.251;
    if (AGE == DBL_MISSING_VALUE) AGE = 74.3;
    if (DIS == DBL_MISSING_VALUE) DIS = 3.4211;
    if (RAD == DBL_MISSING_VALUE) RAD = 5;
    if (TAX == DBL_MISSING_VALUE) TAX = 207;
    if (PT == DBL_MISSING_VALUE) PT = 18.6;
    if (B == DBL_MISSING_VALUE) B = 192.11;
    if (LSTAT == DBL_MISSING_VALUE) LSTAT = 10.3;
    if (MV == DBL_MISSING_VALUE) MV = 21.7;

    /* Tree 1 of 197 */
    /* N terminal nodes = 6, Depth = 5 */
    target = 0.0;
    node = 1; /* start at root node */
    done = 0; /* set at terminal node */
    while (!done) switch (node) {
    case 1: if (NOX != DBL_MISSING_VALUE && NOX < 0.755) node = 2; else node = -6; break;
    case 2: if (TAX != DBL_MISSING_VALUE && TAX < 278) node = 3; else node = 5; break;
    case 3: if (RM != DBL_MISSING_VALUE && RM < 5.93) node = -1; else node = 4; break;
    case -1: target = -1.202511; node = 1; done = 1; break;
    case 4: if (LSTAT != DBL_MISSING_VALUE && LSTAT < 6.13) node = -2; else node = -3; break;
    case -2: target = -1.217944; node = 2; done = 1; break;
    case -3: target = -1.2337965; node = 3; done = 1; break;
    case 5: if (MV != DBL_MISSING_VALUE && MV < 27.3) node = -4; else node = -5; break;
    case -4: target = -1.2337965; node = 4; done = 1; break;
    case -5: target = -1.2231822; node = 5; done = 1; break;
    case -6: target = -1.2087922; node = 6; done = 1; break;
    default: /* error */ target = 0.0; done = 1; node = 0; break;
    }
    net_response += target;

    /* Tree 2 of 197 */
    /* N terminal nodes = 6, Depth = 5 */
    target = 0.0;
    node = 1; /* start at root node */
    done = 0; /* set at terminal node */
    while (!done) switch (node) {
    case 1: if (NOX != DBL_MISSING_VALUE && NOX < 0.7155) node = 2; else node = -6; break;
    case 2: if (PT != DBL_MISSING_VALUE && PT < 17.7) node = 3; else node = 5; break;
    case 3: if (TAX != DBL_MISSING_VALUE && TAX < 40.5) node = -1; else node = 4; break;
    case -1: target = 0.024272515; node = 1; done = 1; break;
    case 4: if (CRIM != DBL_MISSING_VALUE && CRIM < 0.191425) node = -2; else node = -3; break;
    case -2: target = -0.005427301; node = 2; done = 1; break;
    case -3: target = 0.0093125903; node = 3; done = 1; break;
    case 5: if (RM != DBL_MISSING_VALUE && RM < 5.5815) node = -4; else node = -5; break;
    case -4: target = 0.00081652142; node = 4; done = 1; break;
    case -5: target = -0.0047567333; node = 5; done = 1; break;
    case -6: target = 0.01884071; node = 6; done = 1; break;
    default: /* error */ target = 0.0; done = 1; node = 0; break;
    }
    net_response += target;

    /* Tree 3 of 197 */
    /* N terminal nodes = 6, Depth = 5 */
    (…..)

Code for the first 3 Trees in the Model…*

*NOTE! We multiplied the results by 10,000 to eliminate double precision problems during implementation… Ask me!

Imagine that for THOUSANDS of trees….

But back to the Wizard of OCC….

• What you need to focus on is explaining it all for the Wizard….

• And that doesn’t mean slapping down a bunch of code lines

• The Wizard needs to understand how come the Ruby Slippers fit so well, how the Slippers were put together, and where the material comes from (the variables and weights that drive the results)

• Especially if you need to communicate to customers the effects of wearing the Slippers (in modeling terms, say, an Origination model needing Score Factor Codes for Adverse Action Letters…)

• So here’s one way to do that….

• Forget about the code… it’s just text! IT can handle it!

CASE STUDY from Real Life….

• Attrition Model – Customer will close all accounts
• Needed Talking Points (Score Factors) to facilitate attempts to save customer accounts
• Built TreeNet® model to predict the probability that a customer will close all of their accounts
• Identified CART Equivalent Rules for all Accounts
• Pulled new out-of-sample data for recent periods
• Scored and Validated the results against known outcomes
• Based on the Probability, generated a list of high-risk accounts and pushed it to Branches with Score Factors (rules) appended

Attrition Model Process
• Built TreeNet® Model
• Scored Validation Set using the model built
• Created new data set appending probability score and Node identifier to each sample point
• Identified Variable Importance
• Used CART to derive a Regression tree using the TreeNet® score as the target
• Compared Variable Importances
• Looked at rules governing each of the like nodes
• Manually went through tree finding Terminal Nodes with like Mean values
• Generalized like nodes based on rules and split thresholds, creating factors such as “Low Balance,” “Short Time On Books,” “Diminishing Balance Over Last 6 Months,” etc.
• Pruned Tree where possible (without fundamentally changing Rules and split thresholds)
• Analyzed each step to understand Utility vs. Complexity tradeoffs
• Tested outcome (same data) with the generalized variables
• Tested with repeated out-of-sample Validation sets
• Subjected process to the Model Risk Management Unit, which independently validated the model and documentation
• Implemented Model

A Schematical* Representation of what I just explained…

Initial Regression Tree (post-TreeNet®)

*HAH! I love new words….

Look at which cases hit the same Node and group them

Step 1
CASEID RESPONSE NODE | CASEID RESPONSE NODE | CASEID RESPONSE NODE | CASEID RESPONSE NODE
1 -0.000004613 20 | 31 -0.00000251 14 | 61 0.000006376 3 | 91 0.00001477 21
2 -0.000064770 12 | 32 0.000005463 1 | 62 0.000006376 3 | 92 0.00001477 21
3 0.002201621 9 | 33 -0.00000251 14 | 63 0.000006376 3 | 93 0.00001477 21
4 0.002201621 9 | 34 0.000005463 1 | 64 0.000006376 3 | 94 0.00001477 21
5 0.002201621 9 | 35 0.000005463 1 | 65 -0.000046164 13 | 95 0.00001477 21
6 0.002201621 9 | 36 0.00002013 4 | 66 -0.000004613 20 | 96 0.000020736 22
7 0.000004718 19 | 37 0.00002013 4 | 67 0.000004618 2 | 97 0.000020736 22
8 -0.000000771 13 | 38 0.000016180 14 | 68 0.000005193 16 | 98 0.002227983 11
9 -0.000004596 18 | 39 0.000016180 14 | 69 0.000005193 16 | 99 0.002227983 11
10 0.000004718 19 | 40 0.002227895 7 | 70 0.000005193 16 | 100 -0.000068815 17
11 0.000004718 19 | 41 0.00221521 5 | 71 0.000005463 1 | 101 0.001503144 3
12 0.000004718 19 | 42 0.00221521 5 | 72 0.000005463 1 | 102 0.000014534 24
13 0.000004718 19 | 43 0.00221521 5 | 73 0.000005463 1 | 103 0.000005463 1
14 0.000005463 1 | 44 -0.000064770 12 | 74 0.000005463 1 | 104 0.000005463 1
15 0.000005463 1 | 45 -0.000064770 12 | 75 0.000005463 1 | 105 0.000005463 1
16 0.000005463 1 | 46 0.004469995 2 | 76 0.000005463 1 | 106 0.000005463 1
17 0.000005463 1 | 47 0.004469995 2 | 77 0.000005463 1 | 107 0.000005463 1
18 0.000005463 1 | 48 -0.000064770 12 | 78 0.000005463 1 | 108 0.000005463 1
19 0.000015537 7 | 49 0.004469995 2 | 79 0.000005463 1 | 109 0.000005463 1
20 0.000005463 1 | 50 0.004469995 2 | 80 0.000005463 1 | 110 0.000005463 1
21 0.000005463 1 | 51 -0.000046164 13 | 81 0.001503144 3 | 111 0.000005463 1
22 0.000005463 1 | 52 -0.000046164 13 | 82 0.00002013 4 | 112 0.000014534 24
23 0.000005463 1 | 53 0.00222916 8 | 83 0.000016180 5 | 113 0.000005463 1
24 0.000005463 1 | 54 -0.000046164 13 | 84 0.00002013 4 | 114 0.000005463 1
25 0.000005463 1 | 55 0.000005193 16 | 85 0.000008446 15 | 115 0.000005463 1
26 0.000005463 1 | 56 0.002220315 6 | 86 0.000008446 15 | 116 0.000005463 1
27 0.000005463 1 | 57 -0.000004613 20 | 87 0.000008446 15 | 117 0.000005463 1
28 0.000005463 1 | 58 0.00240118 10 | 88 0.000008446 15 | 118 0.000005463 1
29 0.000005463 1 | 59 0.000006376 3 | 89 0.00240118 10 | 119 0.000005463 1
30 0.000005463 1 | 60 0.000006376 3 | 90 0.002227983 11 | 120 0.000005463 1

Sort by Response and determine if it makes sense to group similar outcomes

Step 2
CASEID RESPONSE NODE | CASEID RESPONSE NODE | CASEID RESPONSE NODE | CASEID RESPONSE NODE
100 -0.0000688150 17 | 18 0.0000054630 1 | 108 0.0000054630 1 | 38 0.0000161800 14
2 -0.0000647700 12 | 20 0.0000054630 1 | 109 0.0000054630 1 | 39 0.0000161800 14
44 -0.0000647700 12 | 21 0.0000054630 1 | 110 0.0000054630 1 | 83 0.0000161800 14
45 -0.0000647700 12 | 22 0.0000054630 1 | 111 0.0000054630 1 | 36 0.0000201300 4
48 -0.0000647700 12 | 23 0.0000054630 1 | 113 0.0000054630 1 | 37 0.0000201300 4
51 -0.0000461640 13 | 24 0.0000054630 1 | 114 0.0000054630 1 | 82 0.0000201300 4
52 -0.0000461640 13 | 25 0.0000054630 1 | 115 0.0000054630 1 | 84 0.0000201300 4
54 -0.0000461640 13 | 26 0.0000054630 1 | 116 0.0000054630 1 | 96 0.0000207360 22
65 -0.0000461640 13 | 27 0.0000054630 1 | 117 0.0000054630 1 | 97 0.0000207360 22
1 -0.0000046130 20 | 28 0.0000054630 1 | 118 0.0000054630 1 | 81 0.0015031440 3
57 -0.0000046130 20 | 29 0.0000054630 1 | 119 0.0000054630 1 | 101 0.0015031440 3
66 -0.0000046130 20 | 30 0.0000054630 1 | 120 0.0000054630 1 | 3 0.0022016210 9
9 -0.0000045960 18 | 32 0.0000054630 1 | 59 0.0000063760 3 | 4 0.0022016210 9
31 -0.0000025100 14 | 34 0.0000054630 1 | 60 0.0000063760 3 | 5 0.0022016210 9
33 -0.0000025100 14 | 35 0.0000054630 1 | 61 0.0000063760 3 | 6 0.0022016210 9
8 -0.0000007710 13 | 71 0.0000054630 1 | 62 0.0000063760 3 | 41 0.0022152100 5
67 0.0000046180 2 | 72 0.0000054630 1 | 63 0.0000063760 3 | 42 0.0022152100 5
7 0.0000047180 21 | 73 0.0000054630 1 | 64 0.0000063760 3 | 43 0.0022152100 5
10 0.0000047180 21 | 74 0.0000054630 1 | 85 0.0000084460 15 | 56 0.0022203150 6
11 0.0000047180 21 | 75 0.0000054630 1 | 86 0.0000084460 15 | 40 0.0022278950 7
12 0.0000047180 21 | 76 0.0000054630 1 | 87 0.0000084460 15 | 90 0.0022279830 11
13 0.0000047180 21 | 77 0.0000054630 1 | 88 0.0000084460 15 | 98 0.0022279830 11
55 0.0000051930 16 | 78 0.0000054630 1 | 102 0.0000145340 24 | 99 0.0022279830 11
68 0.0000051930 16 | 79 0.0000054630 1 | 112 0.0000145340 24 | 53 0.0022291600 8
69 0.0000051930 16 | 80 0.0000054630 1 | 91 0.0000147700 19 | 58 0.0024011800 10
70 0.0000051930 16 | 103 0.0000054630 1 | 92 0.0000147700 19 | 89 0.0024011800 10
14 0.0000054630 1 | 104 0.0000054630 1 | 93 0.0000147700 19 | 46 0.0044699950 2
15 0.0000054630 1 | 105 0.0000054630 1 | 94 0.0000147700 19 | 47 0.0044699950 2
16 0.0000054630 1 | 106 0.0000054630 1 | 95 0.0000147700 19 | 49 0.0044699950 2
17 0.0000054630 1 | 107 0.0000054630 1 | 19 0.0000155370 7 | 50 0.0044699950 2

Similar Outcomes, Consolidate Rules?

Tnode 4 (0.00002013): YRS_OB > 6.145 && CONTACTS <= 18.65 && NUM_ACCTS <= 4.5 && BRANCH <= 417.5 && C_BTL > 0.04427 && PROFIT_ILE <= 24.95

Tnode 22 (0.00002074): YRS_OB > 6.145 && CONTACTS > 18.65 && FEE_LEVEL <= 0.755 && TRANS_NUM > 540.5 && M_VAL <= 39.9

Tnode 8 (0.00222916): YRS_OB > 6.145 && CONTACTS <= 18.65 && NUM_ACCTS <= 4.5 && C_BTL > 0.04427 && PROFIT_ILE > 24.95 && BRANCH <= 290.5 && TRANS_NUM > 1325.5 && TRANS_NUM <= 2731

Tnode 11 (0.00222798): YRS_OB > 6.145 && CONTACTS <= 18.65 && NUM_ACCTS <= 4.5 && C_BTL > 0.04427 && BRANCH > 290.5 && BRANCH <= 417.5 && NUM_PROD > 4.5 && FEE_LEVEL > 0.5055 && PROFIT_ILE > 24.95 && PROFIT_ILE <= 96.05 && MOS_ACTIVE <= 367.315 && TRANS_NUM <= 1563

Node Rule Consolidation Decisions

Tnode 4 (0.00002013): YRS_OB > 6.145 && CONTACTS <= 18.65 && NUM_ACCTS <= 4.5 && BRANCH <= 417.5 && C_BTL > 0.04427 && PROFIT_ILE <= 24.95

Tnode 22 (0.00002074): YRS_OB > 6.145 && CONTACTS > 18.65 && FEE_LEVEL <= 0.755 && TRANS_NUM > 540.5 && M_VAL <= 39.9

Tnode 8 (0.00222916): YRS_OB > 6.145 && CONTACTS <= 18.65 && NUM_ACCTS <= 4.5 && C_BTL > 0.04427 && PROFIT_ILE > 24.95 && BRANCH <= 290.5 && TRANS_NUM > 1325.5 && TRANS_NUM <= 2731

Tnode 11 (0.002227983): YRS_OB > 6.145 && CONTACTS <= 18.65 && NUM_ACCTS <= 4.5 && C_BTL > 0.04427 && BRANCH > 290.5 && BRANCH <= 417.5 && NUM_PROD > 4.5 && FEE_LEVEL > 0.5055 && PROFIT_ILE > 24.95 && PROFIT_ILE <= 96.05 && MOS_ACTIVE <= 367.315 && TRANS_NUM <= 1563

TN4 and TN22: Not good for Consolidation, too many differences

TN8 and TN11: Good for Consolidation, differences can be dealt with

Rules Tnodes 8 & 11



Determine if Pruning the tree will appreciably affect the generic rules

TNode 5: YRS_OB > 6.145 && CONTACTS > 18.65 && FEE_LEVEL <= 0.755 && TRANS_NUM <= 540.5

TNode 8: YRS_OB > 6.145 && CONTACTS > 18.65 && FEE_LEVEL > 0.755

Rules Tnodes 5 & 8



Analyze Differences: Full vs. Pruned Trees

F_CASEID F_NODE P_CASEID P_NODE Same Node
1 1 1 1 TRUE
2 3 2 3 TRUE
3 3 3 3 TRUE
4 1 4 1 TRUE
5 1 5 1 TRUE
6 1 6 1 TRUE
7 20 7 4 FALSE
8 20 8 4 FALSE
9 20 9 4 FALSE
10 20 10 4 FALSE
11 20 11 4 FALSE
12 20 12 4 FALSE
13 20 13 4 FALSE
14 22 14 6 FALSE
15 22 15 6 FALSE
16 22 16 6 FALSE
17 22 17 6 FALSE
18 22 18 6 FALSE
19 22 19 6 FALSE
20 22 20 6 FALSE
21 22 21 6 FALSE
22 22 22 6 FALSE
23 22 23 6 FALSE
24 22 24 6 FALSE
25 22 25 6 FALSE
26 22 26 6 FALSE
27 22 27 6 FALSE
28 22 28 6 FALSE
29 22 29 6 FALSE
30 22 30 6 FALSE
31 22 31 6 FALSE
32 22 32 6 FALSE
33 22 33 6 FALSE
34 22 34 6 FALSE

Counts (rows: F_NODE 1–24 of the full tree; columns: P_NODE 1–8 of the pruned tree)
Row totals by F_NODE: 1: 154 | 2: 9 | 3: 5 | 4: 5 | 5: 1 | 6: 9 | 7: 3 | 8: 9 | 9: 12 | 10: 6 | 11: 9 | 12: 2 | 13: 4 | 14: 10 | 15: 1 | 16: 9 | 17: 1 | 18: 4 | 19: 9 | 20: 21 | 21: 5 | 22: 208 | 23: 2 | 24: 8
Column totals by P_NODE: 1: 154 | 2: 9 | 3: 99 | 4: 21 | 5: 5 | 6: 208 | 7: 2 | 8: 8
Grand Total: 506
(Each F_NODE row has a single non-zero cell, so every full-tree node maps entirely into one pruned-tree node.)


Variable Importance Changes

Variable      Full Tree   Pruned Tree
M_VAL         100.00      100.00
TRANS_NUM      90.09       80.71
CONTACTS       88.67       80.93
FEE_LEVEL      79.38       75.06
BRANCH         70.26       41.87
YRS_OB         66.95       70.90
REGION         56.57       56.84
NUM_ACCTS      56.23       60.14
PROFIT_ILE     39.96       17.74
FAM_MEMS       35.23       28.42
C_BTL          34.01       16.08
NUM_PROD       23.71        8.96
MOS_ACTIVE      9.20       (not listed)


Tnode 8 (0.00222916): YRS_OB > 6.145 && CONTACTS <= 18.65 && NUM_ACCTS <= 4.5 && C_BTL > 0.04427 && PROFIT_ILE > 24.95 && BRANCH <= 290.5 && TRANS_NUM > 1325.5 && TRANS_NUM <= 2731

Tnode 11 (0.002227983): YRS_OB > 6.145 && CONTACTS <= 18.65 && NUM_ACCTS <= 4.5 && C_BTL > 0.04427 && BRANCH > 290.5 && BRANCH <= 417.5 && NUM_PROD > 4.5 && FEE_LEVEL > 0.5055 && PROFIT_ILE > 24.95 && PROFIT_ILE <= 96.05 && MOS_ACTIVE <= 367.315 && TRANS_NUM <= 1563

TN8 and TN11: Good for Consolidation, differences can be dealt with

Rules Tnodes 8 & 11



TNode 5: YRS_OB > 6.145 && CONTACTS > 18.65 && FEE_LEVEL <= 0.755 && TRANS_NUM <= 540.5

TNode 8: YRS_OB > 6.145 && CONTACTS > 18.65 && FEE_LEVEL > 0.755

Rules TNodes 5 & 8



and After…

Effect after pruning (Where Art Meets Science):
• TNodes change from 8 and 11 to 5 and 8 (Smaller tree)
• “BRANCH” kept since it applied prior to pruning and aids in list generation and routing
• “YRS_OB” split threshold becomes a rounded, generalized threshold
• Generalization can still be used
• In this example, “FEE_LEVEL” was not included (“<=” and “>” cancel each other out)
• “CONTACTS” thresholds change (“<=” becomes “>”), but the threshold can still be used within the “Low” designation
• “TRANS_NUM” was kept since it applied prior to pruning and aided in talking points

RUN, Toto, RUN!!!!
• Implement the Dog-gone thing!

Customer | Branch | Risk | Point 1 | Point 2 | Point 3 | Point 4
Bill Muchkinovski | 200 | Low | Long time on books | 6 Mos. Moderate Balance | Moderate number products | Moderate Profit
Millie Smoller | 27 | Med | Short time on books | 6 Mos. Low Balance | Low Number Products | 6 Mos. Low number contacts
Beulah Diminuitive | 343 | Med | Short time on books | 6 Mos. Low Balance | Low Number Products | 6 Mos. High number contacts
Casper Lollipopovich | 721 | High | Long time on books | 6 Mos. Diminishing Balance | 6 Mos. High Number of Contacts | Moderate Profit
Martha Smallkind | 14 | High | Long time on books | 6 Mos. Diminishing Balance | 6 Mos. High Number of Contacts | Moderate Profit
Elmo Munchkinovich | 1 | High | Long time on books | 6 Mos. High Number of Contacts | 6 Mos. High Balance | High Profit

Save those CUSTOMERS!

Happily Down the Road….

There’s No Place Like Home….

The End

The now departed Zeppelin, best human being I ever knew, proudly displaying the four balls he so loved to retrieve…

Contact Information:

[email protected]

Jon’s 30+ years of Predictive Modeling expertise comes from various segments of the financial industry including Banking, Consumer Finance, Mortgage, and Modeling Vendor. He has experience in the U.S., Canada, Australia and the United Kingdom. As SVP and Manager of Predictive Modeling at Union Bank, Jon introduced Scoring technology in 1995 and provided Credit Risk research, analytics and Customer Segmentation strategies, along with many of the Bank’s Business Intelligence and Operations statistical models.

Jon’s Expertise includes Regulatory oversight and all things AVM (Automated Valuation Modeling).

Jon also takes on Consulting and Expert Witness engagements, holds a Master’s Degree in Counseling Psychology, and speaks at a variety of Industry conferences.
